<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Future: AWS Community Builders </title>
    <description>The latest articles on Future by AWS Community Builders  (@aws-builders).</description>
    <link>https://future.forem.com/aws-builders</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F2794%2F88da75b6-aadd-4ea1-8083-ae2dfca8be94.png</url>
      <title>Future: AWS Community Builders </title>
      <link>https://future.forem.com/aws-builders</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://future.forem.com/feed/aws-builders"/>
    <language>en</language>
    <item>
      <title>AWS Data &amp; AI Stories #04: Multimodal RAG on AWS</title>
      <dc:creator>Sedat SALMAN</dc:creator>
      <pubDate>Wed, 22 Apr 2026 19:07:00 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/aws-data-ai-stories-04-multimodal-rag-on-aws-2ppp</link>
      <guid>https://future.forem.com/aws-builders/aws-data-ai-stories-04-multimodal-rag-on-aws-2ppp</guid>
      <description>&lt;p&gt;In the first article, I talked about multimodal AI at a high level.&lt;/p&gt;

&lt;p&gt;In the second article, I focused on Amazon Bedrock Data Automation as the processing layer.&lt;/p&gt;

&lt;p&gt;In the third article, I explained multimodal knowledge bases as the retrieval layer.&lt;/p&gt;

&lt;p&gt;Now it is time to connect these pieces together.&lt;/p&gt;

&lt;p&gt;This is where multimodal RAG becomes important. Amazon Bedrock Knowledge Bases now supports multimodal content including images, audio, and video, and AWS positions it as a managed way to build end-to-end RAG workflows over enterprise data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is multimodal RAG?
&lt;/h2&gt;

&lt;p&gt;RAG stands for Retrieval-Augmented Generation.&lt;/p&gt;

&lt;p&gt;The idea is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve relevant content from your own data&lt;/li&gt;
&lt;li&gt;send that context to the model&lt;/li&gt;
&lt;li&gt;generate a grounded answer&lt;/li&gt;
&lt;/ul&gt;
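&lt;p&gt;The three steps above can be sketched as a minimal pipeline. This is a toy illustration, not an AWS API: the keyword-overlap retriever stands in for a real vector search, and the prompt builder just shows where the retrieved context is injected.&lt;/p&gt;

```python
# Toy RAG pipeline: retrieve relevant chunks, then build a grounded prompt.
# The keyword-overlap scoring is a stand-in for a real vector search.

def retrieve(query, index, top_k=2):
    """Return the top_k chunks whose text shares the most words with the query."""
    words = set(query.lower().split())
    def score(chunk):
        return len(words.intersection(chunk["text"].lower().split()))
    return sorted(index, key=score, reverse=True)[:top_k]

def build_prompt(query, chunks):
    """Inject the retrieved context into the prompt sent to the model."""
    context = "\n".join(c["text"] for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

index = [
    {"id": "doc1", "text": "The pump shuts down when the pressure valve fails"},
    {"id": "doc2", "text": "Quarterly report for the finance team"},
]
chunks = retrieve("why did the pump shut down", index)
prompt = build_prompt("why did the pump shut down", chunks)
```

&lt;p&gt;The model then answers from the injected context rather than from its training data alone, which is what makes the answer grounded.&lt;/p&gt;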

&lt;p&gt;A multimodal RAG system follows the same logic, but the retrieved context is not limited to text. It can also include images, audio, video, or processed outputs derived from those inputs. AWS documentation for multimodal knowledge bases explicitly supports multimedia ingestion and querying, including image queries and time-based retrieval metadata for audio and video.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is multimodal RAG different from normal RAG?
&lt;/h2&gt;

&lt;p&gt;Traditional RAG is usually text-focused.&lt;/p&gt;

&lt;p&gt;That works well for manuals, policies, reports, and similar documents.&lt;/p&gt;

&lt;p&gt;But in many real environments, important knowledge is spread across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;diagrams&lt;/li&gt;
&lt;li&gt;screenshots&lt;/li&gt;
&lt;li&gt;scanned pages&lt;/li&gt;
&lt;li&gt;recorded calls&lt;/li&gt;
&lt;li&gt;videos&lt;/li&gt;
&lt;li&gt;field images&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the challenge is no longer only “Which paragraph should I retrieve?”&lt;/p&gt;

&lt;p&gt;The new challenge becomes:&lt;br&gt;
Which content is relevant, regardless of format?&lt;/p&gt;

&lt;p&gt;That is the real value of multimodal RAG. AWS’s newer multimodal retrieval guidance is built around this exact shift from text-only retrieval to retrieval across media types.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I see the architecture
&lt;/h2&gt;

&lt;p&gt;A simple multimodal RAG architecture on AWS looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is collected in a source such as Amazon S3&lt;/li&gt;
&lt;li&gt;Raw files are processed if needed&lt;/li&gt;
&lt;li&gt;A knowledge base indexes the usable content&lt;/li&gt;
&lt;li&gt;A query retrieves relevant multimodal context&lt;/li&gt;
&lt;li&gt;A foundation model generates the answer&lt;/li&gt;
&lt;li&gt;The application returns the answer, often with source grounding&lt;/li&gt;
&lt;/ul&gt;
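&lt;p&gt;As a sketch of how the query step could look in code, the helper below builds the request for the Bedrock Agent Runtime &lt;code&gt;retrieve_and_generate&lt;/code&gt; API. The knowledge base ID and model ARN are placeholders you would supply from your own account:&lt;/p&gt;

```python
# Build the request payload for Bedrock's managed RAG query API.
# The IDs below are placeholders; supply your own knowledge base and model.

def build_rag_request(question, kb_id, model_arn):
    """Return kwargs for bedrock-agent-runtime retrieve_and_generate."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

request = build_rag_request(
    "What should I check first?", "KB_ID_PLACEHOLDER", "MODEL_ARN_PLACEHOLDER"
)
# With AWS credentials configured, the call itself would look like:
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   response = client.retrieve_and_generate(**request)
#   print(response["output"]["text"])
```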

&lt;p&gt;AWS describes Knowledge Bases as a fully managed RAG capability that handles ingestion, retrieval, and prompt augmentation, which is why it fits this workflow so well. AWS also shows multimodal examples where Bedrock Data Automation is used before Knowledge Bases to improve downstream retrieval.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two main multimodal RAG patterns
&lt;/h2&gt;

&lt;p&gt;This is the most important design point for this article.&lt;/p&gt;

&lt;p&gt;Not every multimodal RAG system should be built the same way.&lt;/p&gt;

&lt;p&gt;AWS currently describes two main approaches for multimodal processing in Knowledge Bases:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Retrieval-first approach
&lt;/h3&gt;

&lt;p&gt;This is the better option when the main goal is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;visual similarity&lt;/li&gt;
&lt;li&gt;image search&lt;/li&gt;
&lt;li&gt;cross-modal retrieval&lt;/li&gt;
&lt;li&gt;media-aware search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this pattern, Amazon Nova Multimodal Embeddings is the main enabler. AWS describes this approach as the right fit for visual similarity searches and multimodal semantic retrieval.&lt;/p&gt;
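&lt;p&gt;The core mechanism behind retrieval-first search is embedding similarity: every item, whatever its modality, is mapped into one shared vector space, and the closest vectors to the query are returned. A minimal sketch with made-up vectors (a real system would get these from an embedding model such as Nova Multimodal Embeddings):&lt;/p&gt;

```python
# Cosine similarity over a shared embedding space: the mechanism behind
# cross-modal retrieval. The vectors here are made up for illustration.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend embeddings: one text chunk, one image, one audio clip.
items = {
    "text_chunk": [0.9, 0.1, 0.0],
    "image": [0.6, 0.4, 0.2],
    "audio_clip": [0.0, 0.2, 0.9],
}
query_vec = [0.85, 0.2, 0.05]  # embedding of the user query

ranked = sorted(items, key=lambda name: cosine(query_vec, items[name]), reverse=True)
```

&lt;p&gt;Because text, images, audio, and video all live in the same space, one query can rank them all against each other.&lt;/p&gt;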

&lt;h3&gt;
  
  
  2. Processing-first approach
&lt;/h3&gt;

&lt;p&gt;This is the better option when the main goal is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extracting structured meaning from raw media&lt;/li&gt;
&lt;li&gt;turning audio, video, or documents into usable searchable content&lt;/li&gt;
&lt;li&gt;supporting downstream question answering with processed output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this pattern, Amazon Bedrock Data Automation becomes the first major step before retrieval. AWS documentation describes BDA as the text-based processing path for multimedia content in multimodal knowledge bases, and AWS has also published solution examples combining BDA with Knowledge Bases for multimodal RAG applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to decide between the two
&lt;/h3&gt;

&lt;p&gt;For me, the design question is simple.&lt;/p&gt;

&lt;p&gt;If I want to ask:&lt;br&gt;
“Find content that looks or feels similar.”&lt;br&gt;
then I would think retrieval-first.&lt;/p&gt;

&lt;p&gt;If I want to ask:&lt;br&gt;
“Extract useful content from media and use that in RAG.”&lt;br&gt;
then I would think processing-first.&lt;/p&gt;

&lt;p&gt;AWS’s own “choose your multimodal processing approach” guidance makes this distinction very clear, and I think that is the right way to avoid overdesigning the solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical workflow example
&lt;/h2&gt;

&lt;p&gt;Imagine a support or operations use case.&lt;/p&gt;

&lt;p&gt;Your data may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF maintenance procedures&lt;/li&gt;
&lt;li&gt;field images&lt;/li&gt;
&lt;li&gt;audio notes from engineers&lt;/li&gt;
&lt;li&gt;short troubleshooting videos&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A user asks:&lt;br&gt;
“What is the likely issue and what should I check first?”&lt;/p&gt;

&lt;p&gt;A text-only RAG system may retrieve a manual section.&lt;/p&gt;

&lt;p&gt;A multimodal RAG system can do more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve a relevant text section&lt;/li&gt;
&lt;li&gt;identify matching visual evidence&lt;/li&gt;
&lt;li&gt;point to the correct moment in a video&lt;/li&gt;
&lt;li&gt;use processed audio or image context to improve the answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS documentation for querying multimodal knowledge bases shows response metadata such as source modality, MIME type, and start and end timestamps for audio and video segments, which makes this type of experience much more practical.&lt;/p&gt;
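&lt;p&gt;A small helper can turn that metadata into a user-facing citation. The record shape below is a simplified assumption modeled on the modality, MIME type, and timestamp fields the documentation describes, not the exact API response schema:&lt;/p&gt;

```python
# Format retrieval results that may point at text, image, audio, or video.
# The record layout is a simplified assumption, not the exact Bedrock schema.

def format_citation(record):
    """Build a human-readable source reference from a retrieval record."""
    modality = record["modality"]
    if modality in ("audio", "video"):
        start, end = record["start_sec"], record["end_sec"]
        return f"{record['source']} ({modality}, {start}s-{end}s)"
    return f"{record['source']} ({modality})"

results = [
    {"source": "manual.pdf", "modality": "text"},
    {"source": "repair.mp4", "modality": "video", "start_sec": 42, "end_sec": 58},
]
citations = [format_citation(r) for r in results]
```

&lt;p&gt;For a video hit, the answer can then say “see repair.mp4 at 42s” instead of pointing at the whole file.&lt;/p&gt;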

&lt;h2&gt;
  
  
  Why Bedrock Knowledge Bases matters here
&lt;/h2&gt;

&lt;p&gt;You can always build your own RAG system.&lt;/p&gt;

&lt;p&gt;But one reason Bedrock Knowledge Bases matters is that it reduces the amount of custom plumbing.&lt;/p&gt;

&lt;p&gt;AWS positions it as a managed RAG capability that simplifies setup, handles parts of preprocessing and retrieval, and helps ground model responses in proprietary data. For many teams, this is a better starting point than building a fully custom retrieval pipeline from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where BDA still matters in multimodal RAG
&lt;/h2&gt;

&lt;p&gt;Even though this article is about RAG, BDA still plays an important role.&lt;/p&gt;

&lt;p&gt;Multimodal RAG does not always mean retrieving directly from raw multimedia.&lt;/p&gt;

&lt;p&gt;In many cases, the better pattern is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;process the content first&lt;/li&gt;
&lt;li&gt;extract structured insights&lt;/li&gt;
&lt;li&gt;store or index those outputs&lt;/li&gt;
&lt;li&gt;use them in RAG&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS has shown this pattern in solution examples where Amazon Bedrock Data Automation processes multimodal content, the extracted information is stored in a knowledge base, and then a RAG interface is used for question answering.&lt;/p&gt;
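&lt;p&gt;The process-first flow can be sketched as a small pipeline. Here &lt;code&gt;process_media&lt;/code&gt; is a hypothetical stand-in for a Bedrock Data Automation job; in this toy version it just fakes a transcript so the indexing step has something to store:&lt;/p&gt;

```python
# Process-first sketch: media is turned into text before it is indexed.
# process_media is a hypothetical stand-in for a Bedrock Data Automation job.

def process_media(media_file):
    """Pretend extraction step: return text derived from a media file."""
    fake_transcripts = {
        "call_0417.wav": "Customer reports the pump overheating after startup.",
        "site_photo.jpg": "Image shows corrosion on the intake valve.",
    }
    return fake_transcripts.get(media_file, "")

def ingest(media_files):
    """Process each file, keep non-empty outputs, and build the search index."""
    index = []
    for media_file in media_files:
        text = process_media(media_file)
        if text:
            index.append({"source": media_file, "text": text})
    return index

index = ingest(["call_0417.wav", "site_photo.jpg", "unsupported.bin"])
```

&lt;p&gt;After this step, retrieval works over plain text, even though the original sources were audio and images.&lt;/p&gt;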

&lt;h2&gt;
  
  
  One point people often miss
&lt;/h2&gt;

&lt;p&gt;A common mistake is to assume multimodal RAG is only about attaching files to a chatbot.&lt;/p&gt;

&lt;p&gt;That is too simple.&lt;/p&gt;

&lt;p&gt;A real multimodal RAG system usually includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ingestion&lt;/li&gt;
&lt;li&gt;processing&lt;/li&gt;
&lt;li&gt;indexing&lt;/li&gt;
&lt;li&gt;retrieval&lt;/li&gt;
&lt;li&gt;prompt augmentation&lt;/li&gt;
&lt;li&gt;response generation&lt;/li&gt;
&lt;li&gt;source grounding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why I see multimodal RAG as an architecture pattern, not just a model feature. AWS Prescriptive Guidance describes Knowledge Bases as covering the RAG workflow from ingestion to retrieval and prompt augmentation, which supports this architecture view.&lt;/p&gt;

&lt;h2&gt;
  
  
  Constraints to remember
&lt;/h2&gt;

&lt;p&gt;There are also a few practical points to remember.&lt;/p&gt;

&lt;p&gt;First, AWS states that multimodal support in Bedrock Knowledge Bases is available with unstructured data sources. Structured data sources do not support multimodal content processing. Second, the available query types and features depend on the processing approach you choose.&lt;/p&gt;

&lt;p&gt;So it is important to design the knowledge layer with the right data source model from the start.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this is useful
&lt;/h2&gt;

&lt;p&gt;I think multimodal RAG is especially useful in cases like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;technical support&lt;/li&gt;
&lt;li&gt;operations knowledge assistants&lt;/li&gt;
&lt;li&gt;document and image search&lt;/li&gt;
&lt;li&gt;inspection workflows&lt;/li&gt;
&lt;li&gt;compliance evidence review&lt;/li&gt;
&lt;li&gt;media-rich enterprise search&lt;/li&gt;
&lt;li&gt;predictive maintenance assistants&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS has published examples including multimodal root-cause diagnosis and agentic multimodal assistants, which shows that this pattern is already moving into real business use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;For me, multimodal RAG is where the previous three topics come together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multimodal AI gives the overall direction&lt;/li&gt;
&lt;li&gt;Bedrock Data Automation helps process raw content&lt;/li&gt;
&lt;li&gt;Multimodal Knowledge Bases provide the retrieval layer&lt;/li&gt;
&lt;li&gt;Multimodal RAG turns all of that into useful answers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS now provides a much clearer path for building these solutions than before, especially with managed multimodal retrieval in Knowledge Bases and guidance on choosing between BDA and Nova Multimodal Embeddings depending on the use case.&lt;/p&gt;

&lt;p&gt;For me, the key lesson is simple:&lt;/p&gt;

&lt;p&gt;Do not start with the model.&lt;/p&gt;

&lt;p&gt;Start with the question:&lt;br&gt;
What kind of content do I need to retrieve, and why?&lt;/p&gt;

&lt;p&gt;If that answer is clear, the multimodal RAG design becomes much easier.&lt;/p&gt;

&lt;p&gt;In the next article, I will move on to the next logical topic:&lt;/p&gt;

&lt;p&gt;Amazon Nova Multimodal Embeddings.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>datascience</category>
      <category>aws</category>
      <category>awsbigdata</category>
    </item>
    <item>
      <title>Technologies And Concepts: Cheat Sheet for Solutions Architect Associate (SAA-C03)</title>
      <dc:creator>Ntombizakhona Mabaso</dc:creator>
      <pubDate>Wed, 22 Apr 2026 17:31:28 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/technologies-and-concepts-cheat-sheet-for-solutions-architect-associate-saa-c03-h52</link>
      <guid>https://future.forem.com/aws-builders/technologies-and-concepts-cheat-sheet-for-solutions-architect-associate-saa-c03-h52</guid>
      <description>&lt;p&gt;☁️ &lt;strong&gt;Exam Guide:&lt;/strong&gt; Solutions Architect Associate&lt;br&gt;
&lt;strong&gt;Technologies And Concepts Cheat Sheet&lt;/strong&gt;&lt;br&gt;
📘 &lt;em&gt;Cheat Sheet&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The SAA-C03 exam guide lists technologies and concepts across all four domains. This cheat sheet consolidates that information into a &lt;strong&gt;compact, exam-aligned reference&lt;/strong&gt;, organized domain by domain and designed for quick review and efficient study.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  📖 Exam Overview
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;th&gt;Info&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Exam Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SAA-C03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Questions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;65 total (50 scored, 15 unscored)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Passing Score&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;720 / 1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Question Types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple choice &amp;amp; Multiple response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Experience Required&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1+ year hands-on designing cloud solutions on AWS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Domain Weightings
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Design Secure Architectures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;30%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Design Resilient Architectures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;26%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Design High-Performing Architectures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;24%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Design Cost-Optimized Architectures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🔒 Domain 1
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design Secure Architectures
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1.1&lt;/strong&gt; Secure Access to AWS Resources
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;IAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Users, Groups, Roles, Policies: Design flexible authorization models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;IAM Identity Center&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Centralized SSO across multiple AWS accounts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MFA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apply to IAM users and root users as a security best practice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Cross-Account Access&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Use IAM Roles + STS for role switching and cross-account patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Organizations &amp;amp; SCPs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manage multi-account security strategy with Service Control Policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Control Tower&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automate landing zones and guardrails across accounts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Resource Policies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Determine when to use resource-based vs identity-based policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Federated Access&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Directory service + IAM roles for external identity federation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Least Privilege&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Core security principle: grant only minimum required permissions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Shared Responsibility Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS is responsible for security &lt;em&gt;of&lt;/em&gt; the cloud; you are responsible for security &lt;em&gt;in&lt;/em&gt; the cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
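&lt;p&gt;As a concrete example of least privilege, the policy below (built as a Python dict for illustration; the bucket and prefix names are placeholders) grants read-only access to a single S3 prefix instead of &lt;code&gt;s3:*&lt;/code&gt;:&lt;/p&gt;

```python
# Least-privilege IAM policy sketch: read-only access to one S3 prefix.
# Bucket and prefix names are placeholders.
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-reports-bucket/team-a/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-reports-bucket",
            # Limit listing to the same prefix the role can read.
            "Condition": {"StringLike": {"s3:prefix": ["team-a/*"]}},
        },
    ],
}
policy_json = json.dumps(policy, indent=2)
```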

&lt;h4&gt;
  
  
  &lt;strong&gt;1.2&lt;/strong&gt; Secure Workloads and Applications
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;VPC Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security groups, route tables, NACLs, NAT gateways&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Subnets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Public vs private subnet segmentation strategies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Shield&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DDoS protection (Standard free, Advanced paid)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS WAF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web Application Firewall for Layer 7 (SQL injection, XSS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Secrets Manager&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rotate, manage, retrieve secrets (DB credentials, API keys)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Cognito&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;User authentication for web/mobile apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS GuardDuty&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Threat detection using ML on logs/events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Macie&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Discover and protect sensitive data (PII) in S3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;VPN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Site-to-Site VPN and Client VPN for encrypted connectivity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Direct Connect&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dedicated private network connection to AWS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1.3&lt;/strong&gt; Data Security Controls
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;KMS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed key creation, rotation, and control for encryption at rest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;ACM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Certificate Manager: TLS/SSL for encryption in transit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;CloudHSM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hardware Security Module for customer-managed key control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Data Classification&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Categorize data by sensitivity to apply appropriate controls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Versioning &amp;amp; MFA Delete&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Protect object data from accidental deletion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Backup &amp;amp; Replication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Implement data backup, point-in-time recovery, cross-region replication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Data Lifecycle Policies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manage retention and expiry of data at rest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Compliance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Align AWS services to regulatory requirements (GDPR, HIPAA, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🏗️ Domain 2
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design Resilient Architectures
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2.1&lt;/strong&gt; Scalable and Loosely Coupled Architectures
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon SQS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Decouple components with message queuing (Standard and FIFO)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon SNS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pub/sub messaging for fan-out patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EventBridge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Event-driven routing across AWS services and SaaS apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Step Functions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Workflow orchestration for distributed applications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;API Gateway&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Create, publish, and manage REST/HTTP/WebSocket APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon AppFlow&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed data integration between SaaS apps and AWS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS AppSync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed GraphQL API service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Serverless Patterns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lambda + API Gateway + SQS/SNS for event-driven design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Microservices&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stateless vs stateful workloads &amp;amp; Independent scaling of components&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Caching Strategies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reduce load &amp;amp; know when to use caching vs direct reads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Horizontal vs Vertical Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scale out (add instances) vs scale up (bigger instance)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Load Balancers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ALB (Layer 7), NLB (Layer 4), GWLB (Layer 3/4, for security appliances)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon MQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed message broker (ActiveMQ/RabbitMQ) for migrations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-tier Architectures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web / App / DB tiers with distinct roles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;CDN / Edge Accelerators&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CloudFront for caching, Global Accelerator for routing performance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2.2&lt;/strong&gt; Highly Available and Fault-Tolerant Architectures
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Availability Zones&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deploy across ≥2 AZs for high availability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Regions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Choose regions based on latency, compliance, and redundancy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Disaster Recovery Strategies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Backup &amp;amp; Restore → Pilot Light → Warm Standby → Active-Active&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RPO / RTO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Recovery Point Objective (data loss tolerance) vs Recovery Time Objective (downtime tolerance)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Route 53&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DNS with health checks, failover routing, latency-based routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RDS Proxy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pooled DB connections for Lambda and high-concurrency apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Distributed Design Patterns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retry with backoff, circuit breaker, bulkhead patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Service Quotas &amp;amp; Throttling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Plan for limits in standby environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS X-Ray&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Distributed tracing for workload visibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Immutable Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Replace rather than patch: ensures consistency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Auto Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;EC2 Auto Scaling + AWS Auto Scaling for elastic capacity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Storage Durability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;S3 (11 nines), EBS (99.8% to 99.999% depending on volume type); choose the appropriate tier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  ⚡ Domain 3
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design High-Performing Architectures
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3.1&lt;/strong&gt; Storage Solutions
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Service / Concept&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon S3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Object storage: scalable, durable, lifecycle policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EBS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Block storage for EC2: SSD (gp3, io2) or HDD (st1, sc1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EFS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed NFS: shared file storage for Linux workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon FSx&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed file systems: Windows (SMB), Lustre (HPC), NetApp, OpenZFS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Storage Gateway&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hybrid storage: file, volume, tape gateway types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Storage Types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Object vs File vs Block: know performance and use-case differences&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Storage Classes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standard, Intelligent-Tiering, IA, Glacier, Glacier Deep Archive&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3.2&lt;/strong&gt; Compute Solutions
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Service / Concept&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EC2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Virtual machines: choose instance type/family for workload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EC2 Auto Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatically add/remove instances based on demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Lambda&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless functions: event-driven, scale to zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Fargate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless containers: no EC2 management needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon ECS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Container orchestration on EC2 or Fargate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EKS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed Kubernetes: supports Anywhere and Distro variants&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Batch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed batch processing: compute-intensive jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EMR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Big data on managed Hadoop/Spark clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Elastic Beanstalk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PaaS: deploy web apps without managing infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Outposts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS infrastructure on-premises (hybrid)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Wavelength&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deploy workloads at the edge of 5G networks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  3.3 Database Solutions
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Service / Concept&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon RDS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed relational DB: MySQL, PostgreSQL, SQL Server, Oracle, MariaDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Aurora&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High-performance relational DB (MySQL/PostgreSQL compatible)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Aurora Serverless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;On-demand autoscaling for Aurora (v2 generally available)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon DynamoDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless NoSQL: millisecond latency at any scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon ElastiCache&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;In-memory caching: Redis (complex data) vs Memcached (simple)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Redshift&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data warehouse: columnar storage for analytics queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon DocumentDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed MongoDB-compatible document database&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Neptune&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Graph database for connected data (social graphs, fraud detection)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Keyspaces&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed Apache Cassandra-compatible service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Read Replicas&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Offload read traffic &amp;amp; know when to use vs Multi-AZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Caching Patterns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cache-aside, write-through, TTL strategies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;DB Capacity Planning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Capacity Units (DynamoDB), Provisioned IOPS, instance sizing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
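&lt;p&gt;The DB Capacity Planning row is easier to remember with numbers. Below is a small sketch (function names are my own) of the documented DynamoDB math: 1 RCU covers one strongly consistent read per second of up to 4 KB (eventually consistent reads need half as many), and 1 WCU covers one write per second of up to 1 KB.&lt;/p&gt;

```python
import math

def read_capacity_units(item_size_kb: float, reads_per_sec: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: 1 RCU = one strongly consistent read/sec of up to 4 KB.
    Eventually consistent reads need half as many RCUs."""
    units_per_read = math.ceil(item_size_kb / 4)
    rcu = units_per_read * reads_per_sec
    return rcu if strongly_consistent else math.ceil(rcu / 2)

def write_capacity_units(item_size_kb: float, writes_per_sec: int) -> int:
    """WCUs needed: 1 WCU = one write/sec of up to 1 KB."""
    return math.ceil(item_size_kb) * writes_per_sec

# 8 KB items read 10 times/sec (strongly consistent): ceil(8/4) * 10
print(read_capacity_units(8, 10))         # 20
print(read_capacity_units(8, 10, False))  # 10 (eventually consistent)
# 3 KB items written 5 times/sec: ceil(3) * 5
print(write_capacity_units(3, 5))         # 15
```

&lt;p&gt;Exam questions tend to test exactly this rounding: item sizes are rounded up to the next 4 KB (reads) or 1 KB (writes) before multiplying by the request rate.&lt;/p&gt;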

&lt;h4&gt;
  
  
  &lt;strong&gt;3.4&lt;/strong&gt; Network Architectures
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Service / Concept&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon VPC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Isolated virtual network: subnets, route tables, IGW, NAT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon CloudFront&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CDN: cache content at edge locations globally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Global Accelerator&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Route users to optimal endpoints using AWS global network&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Elastic Load Balancing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ALB (HTTP/S), NLB (TCP/UDP), GLB (appliances)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Direct Connect&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dedicated private line to AWS (predictable performance)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Transit Gateway&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hub-and-spoke for connecting many VPCs and on-prem networks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;VPC Peering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Direct VPC-to-VPC connectivity (no transitive routing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS PrivateLink&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Private access to AWS services and third-party services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Route 53&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DNS. Routing policies: simple, weighted, latency, failover, geolocation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Network Topology&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Global, hybrid, multi-tier &amp;amp; design for scale&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
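&lt;p&gt;For the VPC row, subnet planning is pure CIDR arithmetic, which Python's standard &lt;code&gt;ipaddress&lt;/code&gt; module handles. The 10.0.0.0/16 layout below is an illustrative choice, not an AWS default:&lt;/p&gt;

```python
import ipaddress

# Carve a VPC CIDR into per-AZ subnets, the way you would plan a VPC layout.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))  # 256 possible /24 subnets

public = subnets[:2]    # e.g. one public subnet per AZ
private = subnets[2:4]  # and one private subnet per AZ

print([str(s) for s in public])   # ['10.0.0.0/24', '10.0.1.0/24']
print([str(s) for s in private])  # ['10.0.2.0/24', '10.0.3.0/24']

# AWS reserves 5 addresses in every subnet, so a /24 leaves 256 - 5 usable.
print(public[0].num_addresses - 5)  # 251
```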

&lt;h4&gt;
  
  
  &lt;strong&gt;3.5&lt;/strong&gt; Data Ingestion and Transformation
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Service / Concept&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Kinesis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time streaming data: Data Streams, Data Firehose, Video Streams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Data Firehose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Load streaming data to S3, Redshift, OpenSearch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Glue&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless ETL: transform and catalog data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Athena&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless SQL queries on S3 data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Lake Formation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Build, secure, and manage data lakes on S3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EMR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Process large datasets with Hadoop, Spark, Hive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon MSK&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed Apache Kafka for streaming pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS DataSync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automate data transfer between on-prem and AWS storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Transfer Family&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed SFTP/FTPS/FTP to S3 or EFS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Quick Suite&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;BI and data visualization service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon OpenSearch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Search and analytics &amp;amp; also supports vector similarity (RAG)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Redshift&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Query structured data at petabyte scale&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  💰 Domain 4
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design Cost-Optimized Architectures
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;4.1&lt;/strong&gt; Cost-Optimized Storage
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Storage Classes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Match class to access frequency &amp;amp; Glacier for archival&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Lifecycle Policies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automate transitions between storage classes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Intelligent-Tiering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Auto-move objects between tiers based on access patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EBS Volume Types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;gp3 vs io2 vs st1 vs sc1 &amp;amp; match to IOPS and cost needs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Requester Pays&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Transfer cost charged to requester, not bucket owner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Data Lifecycle Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retain only what's needed &amp;amp; expire or archive the rest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Hybrid Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DataSync, Transfer Family, Storage Gateway for on-prem cost reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Backup Strategy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Balance recovery needs with cost (snapshots, replication)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
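&lt;p&gt;Lifecycle policies from rows 1-3 come down to a rule document attached to the bucket. The sketch below builds the dict shape accepted by boto3's &lt;code&gt;put_bucket_lifecycle_configuration&lt;/code&gt;; the rule ID, prefix, and day counts are illustrative choices, not defaults.&lt;/p&gt;

```python
# Shape of an S3 lifecycle configuration: transition objects under "logs/"
# to Standard-IA after 30 days, Glacier after 90, and expire after 365.
# This is the dict you would pass as LifecycleConfiguration to boto3's
# s3.put_bucket_lifecycle_configuration (rule ID and prefix are placeholders).
lifecycle = {
    "Rules": [
        {
            "ID": "archive-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

# Sanity check: transitions must move to colder classes as days increase.
days = [t["Days"] for t in lifecycle["Rules"][0]["Transitions"]]
assert days == sorted(days)
print(days)  # [30, 90]
```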

&lt;h4&gt;
  
  
  4.2 Cost-Optimized Compute
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;On-Demand Instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pay per use: highest flexibility, highest per-hour cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Reserved Instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 or 3 year commitment: up to 72% savings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Savings Plans&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flexible commitment (Compute, EC2, SageMaker)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Spot Instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Up to 90% savings for fault-tolerant/interruptible workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Compute Optimizer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ML-based recommendations for right-sizing EC2, Lambda, EBS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Serverless Application Repository&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pre-built serverless apps: reduce build cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EC2 Hibernation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Save instance state to EBS: resume without full reboot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Containerization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ECS/EKS/Fargate for higher density and cost efficiency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Instance Families&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;General purpose, compute optimized, memory optimized, storage optimized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;VMware Cloud on AWS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extend VMware workloads to AWS without refactoring&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
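&lt;p&gt;A quick way to internalize the purchase options is to price one steady 730-hour month. The hourly rate below is a hypothetical round figure, not a real price; only the "up to 72%" and "up to 90%" maximums come from AWS's published figures, and actual discounts vary by instance family, term, and Region.&lt;/p&gt;

```python
# Compare EC2 purchase options for a steady 730-hour month.
HOURS_PER_MONTH = 730

def monthly_cost(on_demand_hourly: float, discount: float = 0.0,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Cost for the month after applying a fractional discount."""
    return round(on_demand_hourly * (1 - discount) * hours, 2)

rate = 0.10  # hypothetical on-demand $/hour
print(monthly_cost(rate))        # 73.0  (on-demand)
print(monthly_cost(rate, 0.72))  # 20.44 (RI at the advertised maximum)
print(monthly_cost(rate, 0.90))  # 7.3   (Spot at the advertised maximum)
```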

&lt;h4&gt;
  
  
  &lt;strong&gt;4.3&lt;/strong&gt; Cost-Optimized Databases
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;DynamoDB On-Demand vs Provisioned&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;On-demand for unpredictable; provisioned for predictable + cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Aurora Serverless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pay per ACU-hour: ideal for intermittent workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RDS Reserved Instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Commit to 1 or 3 years for significant savings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Read Replicas&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Offload reads to reduce primary DB load (and cost)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;DB Snapshot Policies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Balance frequency vs storage cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Caching&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ElastiCache reduces DB query load and cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Data Retention Policies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Define how long to keep data: archive vs delete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Right-Sized DB Instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Don't over-provision: use metrics to guide sizing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  4.4 Cost-Optimized Network Architectures
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;NAT Gateway vs NAT Instance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NAT Gateway scales automatically but costs more &amp;amp; NAT instance is cheaper at low traffic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;VPC Endpoints&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Eliminate NAT costs for S3/DynamoDB &amp;amp; use Gateway Endpoints (free)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Direct Connect vs VPN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Direct Connect more expensive but predictable; VPN cheaper for low volume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Region-to-Region Transfer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data egress fees apply &amp;amp; minimize cross-region traffic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Same-AZ Traffic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free &amp;amp; architect to keep traffic within same AZ where possible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;CloudFront&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reduce origin data transfer costs with edge caching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Transit Gateway Pricing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Attachment + data processing fees &amp;amp; evaluate vs VPC peering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Throttling Strategy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Use API Gateway throttling to control overuse and cost spikes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
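&lt;p&gt;The NAT Gateway vs NAT instance trade-off in row 1 comes down to a flat hourly fee plus a per-GB processing fee versus a plain instance bill. A toy comparison with placeholder prices (not real pricing) shows the shape of the decision:&lt;/p&gt;

```python
# Break-even between a NAT Gateway and a self-managed NAT instance.
# All prices are HYPOTHETICAL placeholders; the point is the cost shape:
# the gateway adds a per-GB processing fee, the instance a flat hourly rate.
HOURS = 730

def nat_gateway_cost(gb: float, hourly: float = 0.045,
                     per_gb: float = 0.045) -> float:
    return round(hourly * HOURS + per_gb * gb, 2)

def nat_instance_cost(gb: float, hourly: float = 0.01) -> float:
    # Flat cost regardless of traffic; ignores the operational burden
    # of patching, scaling, and failing over the instance yourself.
    return round(hourly * HOURS, 2)

for gb in (10, 1000):
    gw, inst = nat_gateway_cost(gb), nat_instance_cost(gb)
    print(f"{gb:>5} GB/month: gateway ${gw}, instance ${inst}")
```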




&lt;h2&gt;
  
  
  🛠️ AWS Cost Management Tools
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Cost Explorer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Visualize and analyze historical spend and forecast costs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Budgets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Set spend/usage thresholds with alerts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Cost and Usage Report&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Granular billing data exportable to S3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Savings Plans&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flexible commitment model for compute savings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost Allocation Tags&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tag resources to attribute costs to teams/projects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Compute Optimizer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Right-sizing recommendations based on usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Trusted Advisor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Best-practice checks across cost, security, performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Well-Architected Tool&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Review architecture against the Well-Architected Framework&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  💡 Disaster Recovery Strategy Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;RPO&lt;/th&gt;
&lt;th&gt;RTO&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backup &amp;amp; Restore&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hours&lt;/td&gt;
&lt;td&gt;Hours&lt;/td&gt;
&lt;td&gt;💰 Lowest&lt;/td&gt;
&lt;td&gt;Back up to S3/Glacier &amp;amp; restore on failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pilot Light&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;10s of minutes&lt;/td&gt;
&lt;td&gt;💰💰&lt;/td&gt;
&lt;td&gt;Core services always running &amp;amp; scale up on failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Warm Standby&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Seconds/Minutes&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;💰💰💰&lt;/td&gt;
&lt;td&gt;Scaled-down live environment &amp;amp; quickly scale to full&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Active-Active&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Near zero&lt;/td&gt;
&lt;td&gt;Near zero&lt;/td&gt;
&lt;td&gt;💰💰💰💰 Highest&lt;/td&gt;
&lt;td&gt;Full duplicate environment &amp;amp; traffic split between sites&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
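&lt;p&gt;One way to revise this table is to turn it into a decision rule. The second thresholds below are my own illustrative readings of "hours", "minutes", and "near zero"; real targets come from business requirements, and the tighter of RPO and RTO drives the choice.&lt;/p&gt;

```python
# Map required RPO/RTO (in seconds) to the cheapest matching DR strategy
# from the table above. Thresholds are ILLUSTRATIVE, not AWS-defined.
def choose_dr_strategy(rpo_s: float, rto_s: float) -> str:
    target = min(rpo_s, rto_s)  # the tighter objective is the binding one
    if target >= 3600:
        return "Backup and Restore"
    if target >= 600:
        return "Pilot Light"
    if target >= 60:
        return "Warm Standby"
    return "Active-Active"

print(choose_dr_strategy(4 * 3600, 8 * 3600))  # Backup and Restore
print(choose_dr_strategy(900, 1800))           # Pilot Light
print(choose_dr_strategy(90, 300))             # Warm Standby
print(choose_dr_strategy(1, 1))                # Active-Active
```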




&lt;h2&gt;
  
  
  🔑 Key Abbreviations
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Abbreviation&lt;/th&gt;
&lt;th&gt;Full Term&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Identity and Access Management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Service Control Policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MFA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-Factor Authentication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;STS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security Token Service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ACM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Certificate Manager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KMS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Key Management Service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VPC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Virtual Private Cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NACL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Network Access Control List&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ALB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Application Load Balancer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NLB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Network Load Balancer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GLB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gateway Load Balancer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CDN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Content Delivery Network&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RPO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Recovery Point Objective&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RTO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Recovery Time Objective&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Disaster Recovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EBS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Elastic Block Store&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EFS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Elastic File System&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FSx&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon FSx (managed file systems)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple Queue Service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SNS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple Notification Service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ETL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extract, Transform, Load&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HDD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hard Disk Drive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Solid State Drive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IOPS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Input/Output Operations Per Second&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reserved Instance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ACU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Aurora Capacity Unit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PII&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Personally Identifiable Information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single Sign-On&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🚀 In Scope AWS Services Quick Reference
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Compute
&lt;/h3&gt;

&lt;p&gt;Amazon EC2 · EC2 Auto Scaling · AWS Lambda · AWS Fargate · AWS Elastic Beanstalk · AWS Batch · AWS Outposts · VMware Cloud on AWS · AWS Wavelength · AWS Serverless Application Repository&lt;/p&gt;

&lt;h3&gt;
  
  
  Containers
&lt;/h3&gt;

&lt;p&gt;Amazon ECR · Amazon ECS · ECS Anywhere · Amazon EKS · EKS Anywhere · Amazon EKS Distro&lt;/p&gt;

&lt;h3&gt;
  
  
  Storage
&lt;/h3&gt;

&lt;p&gt;Amazon S3 · Amazon EBS · Amazon EFS · Amazon FSx · AWS Storage Gateway · AWS Snow Family&lt;/p&gt;

&lt;h3&gt;
  
  
  Database
&lt;/h3&gt;

&lt;p&gt;Amazon RDS · Amazon Aurora · Aurora Serverless · Amazon DynamoDB · Amazon ElastiCache · Amazon Redshift · Amazon DocumentDB · Amazon Neptune · Amazon Keyspaces&lt;/p&gt;

&lt;h3&gt;
  
  
  Networking &amp;amp; Content Delivery
&lt;/h3&gt;

&lt;p&gt;Amazon VPC · Amazon CloudFront · AWS Direct Connect · Elastic Load Balancing · AWS Global Accelerator · AWS PrivateLink · Amazon Route 53 · AWS Site-to-Site VPN · AWS Client VPN · AWS Transit Gateway&lt;/p&gt;

&lt;h3&gt;
  
  
  Analytics
&lt;/h3&gt;

&lt;p&gt;Amazon Athena · Amazon EMR · AWS Glue · Amazon Kinesis · Amazon Data Firehose · Amazon Kinesis Video Streams · Amazon MSK · Amazon OpenSearch Service · Amazon Quick Suite · Amazon Redshift · AWS Lake Formation · AWS Data Exchange&lt;/p&gt;

&lt;h3&gt;
  
  
  Application Integration
&lt;/h3&gt;

&lt;p&gt;Amazon SQS · Amazon SNS · Amazon EventBridge · Amazon MQ · AWS Step Functions · Amazon AppFlow · AWS AppSync&lt;/p&gt;

&lt;h3&gt;
  
  
  Security, Identity &amp;amp; Compliance
&lt;/h3&gt;

&lt;p&gt;AWS IAM · AWS IAM Identity Center · Amazon Cognito · AWS KMS · AWS CloudHSM · AWS ACM · Amazon GuardDuty · Amazon Macie · Amazon Detective · AWS Shield · AWS WAF · AWS Secrets Manager · AWS Directory Service · AWS Artifact · AWS Audit Manager&lt;/p&gt;

&lt;h3&gt;
  
  
  Management &amp;amp; Governance
&lt;/h3&gt;

&lt;p&gt;AWS Organizations · AWS Control Tower · AWS CloudFormation · AWS CloudTrail · Amazon CloudWatch · AWS Config · AWS Systems Manager · AWS Auto Scaling · AWS Compute Optimizer · AWS Trusted Advisor · AWS Well-Architected Tool · AWS Service Catalog · AWS Health Dashboard · AWS License Manager · Amazon Managed Grafana · Amazon Managed Service for Prometheus&lt;/p&gt;

&lt;h3&gt;
  
  
  Migration &amp;amp; Transfer
&lt;/h3&gt;

&lt;p&gt;AWS DMS · AWS DataSync · AWS Snow Family · AWS Transfer Family · AWS Application Migration Service&lt;/p&gt;

&lt;h3&gt;
  
  
  Machine Learning
&lt;/h3&gt;

&lt;p&gt;Amazon SageMaker AI · Amazon Comprehend · Amazon Kendra · Amazon Lex · Amazon Polly · Amazon Rekognition · Amazon Textract · Amazon Transcribe · Amazon Translate&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Management
&lt;/h3&gt;

&lt;p&gt;AWS Budgets · AWS Cost Explorer · AWS Cost and Usage Report · Savings Plans&lt;/p&gt;

&lt;h3&gt;
  
  
  Developer Tools
&lt;/h3&gt;

&lt;p&gt;AWS X-Ray&lt;/p&gt;

&lt;h3&gt;
  
  
  Serverless
&lt;/h3&gt;

&lt;p&gt;AWS Lambda · AWS Fargate · Amazon API Gateway · Amazon DynamoDB · Amazon EventBridge · Amazon SQS · Amazon SNS&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important:&lt;/strong&gt; Always refer to the official exam guide for the most up-to-date list of in-scope and out-of-scope services.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  📚 Additional Resources
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/pdfs/aws-certification/latest/solutions-architect-associate-03/solutions-architect-associate-03.pdf" rel="noopener noreferrer"&gt;AWS Certified Solutions Architect – Associate (SAA-C03) Exam Guide (PDF)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/pdfs/aws-certification/latest/examguides/aws-certification-exam-guides.pdf" rel="noopener noreferrer"&gt;AWS Certification: All Exam Guides&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/ntombizakhona/series/35366"&gt;Exam Guide: Solutions Architect Associate Series&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;Good luck with your exam! 🚀&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>certification</category>
      <category>solutionsarchitect</category>
    </item>
    <item>
      <title>Keeping Pirate Weather Afloat: Inside the AWS Pipeline and the Christmas Eve Outage</title>
      <dc:creator>Alexander Rey</dc:creator>
      <pubDate>Wed, 22 Apr 2026 15:30:47 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/keeping-pirate-weather-afloat-inside-the-aws-pipeline-and-the-christmas-eve-outage-1g51</link>
      <guid>https://future.forem.com/aws-builders/keeping-pirate-weather-afloat-inside-the-aws-pipeline-and-the-christmas-eve-outage-1g51</guid>
      <description>&lt;p&gt;Since it's been a while since I last covered Pirate Weather's AWS infrastructure, I thought it was time to write a short update on how everything fits together, and also explain where things have gone wrong. At a high level, Pirate Weather is a Python script that reads Zarr files. These files are created from a series of scripts that run on a schedule, download the data, perform some light processing, and save .zip files for the response script. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion &amp;amp; Processing:&lt;/strong&gt; A suite of Python scripts runs on a precise schedule, triggered by &lt;strong&gt;Amazon EventBridge&lt;/strong&gt;. These scripts are orchestrated by &lt;strong&gt;AWS Step Functions&lt;/strong&gt;, which manage &lt;strong&gt;AWS Fargate&lt;/strong&gt; containers (using our &lt;a href="https://gallery.ecr.aws/j9v4j3c7/pirate-wgrib-python-arm" rel="noopener noreferrer"&gt;custom ARM-based image&lt;/a&gt;). These containers download raw data, perform light processing, and "chunk" the data into Zarr format for lightning-fast retrieval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage Strategy:&lt;/strong&gt; The processed Zarr data is initially persisted as zip files on &lt;strong&gt;Amazon S3&lt;/strong&gt;. To minimize latency, an &lt;strong&gt;rclone&lt;/strong&gt; container syncs these files to &lt;strong&gt;autoscaled EC2 NVMe instances&lt;/strong&gt;. 

&lt;ul&gt;
&lt;li&gt;By serving data from local NVMe storage rather than directly from S3, we achieve the IOPS necessary for real-time weather requests.&lt;/li&gt;
&lt;li&gt;Using zip files avoids having a huge number of S3 objects and the associated transaction costs.&lt;/li&gt;
&lt;li&gt;Notably, the time for each model forecast is included in every chunk, which avoids having to rely on metadata. &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;ECS service:&lt;/strong&gt; An ECS service coordinates five containers running on the EC2 instances: rclone for syncing, the &lt;a href="https://gallery.ecr.aws/j9v4j3c7/pirate-alpine-zarr" rel="noopener noreferrer"&gt;production FastAPI container&lt;/a&gt;, the development container, the historic data (Time Machine) container, and Kong.

&lt;ul&gt;
&lt;li&gt;This ensures that containers are restarted if there are issues, handles placement on the instances, and manages container updates.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Traffic Management &amp;amp; Security:&lt;/strong&gt; Inbound requests are routed through &lt;strong&gt;Amazon CloudFront&lt;/strong&gt; to a &lt;strong&gt;Network Load Balancer (NLB)&lt;/strong&gt;, which passes them to the EC2 instances. From there, traffic hits a &lt;strong&gt;Kong Gateway&lt;/strong&gt; container, which manages authentication and rate limiting.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Data Persistence:&lt;/strong&gt; The gateway and API layers are supported by &lt;strong&gt;Amazon ElastiCache (Redis)&lt;/strong&gt; for rapid session/rate-limit caching and an &lt;strong&gt;Amazon RDS&lt;/strong&gt; database for persistent metadata and user information.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FPirate-Weather%2Fpirateweather%2Fblob%2Fmain%2Fdocs%2Fimages%2FArch_Diagram_2026.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FPirate-Weather%2Fpirateweather%2Fblob%2Fmain%2Fdocs%2Fimages%2FArch_Diagram_2026.png" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are quite a few nuances to the various pieces, but this is the "meat and potatoes" of it. &lt;/p&gt;

&lt;h4&gt;
  
  
  December 24, 2025 downtime incident
&lt;/h4&gt;

&lt;p&gt;The four-hour production downtime had two root causes. The first was traced to a configuration conflict between our AWS Step Function definitions and the underlying ECS cluster strategy. While our ECS cluster is architected to run a resilient 50:50 mix of Fargate Spot and Fargate On-Demand instances, the Step Function definition responsible for triggering the ingestion tasks contained an explicit override. As seen in the configuration snippet below, the task was hardcoded to rely exclusively on &lt;code&gt;FARGATE_SPOT&lt;/code&gt;. During a period of high Spot instance reclamation in our availability zone, these ingestion containers were repeatedly terminated by AWS before completion, halting the data pipeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"CapacityProviderStrategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"CapacityProvider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FARGATE_SPOT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Weight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was an issue on its own, but it should have been recoverable. The ingestion failure was amplified by a logic error in the processing scripts, which lacked a fallback mechanism for missing GFS data when the two-day buffer was exceeded, causing forecast generation to fail entirely rather than serving stale or partial data. To resolve this, I have updated all Step Function task definitions to remove the explicit &lt;code&gt;CapacityProviderStrategy&lt;/code&gt; override. The tasks now defer to the ECS cluster’s default capacity provider strategy, ensuring a stable 50:50 distribution between Spot and On-Demand instances. This means that even if Spot capacity is volatile, the On-Demand instances will allow the ingestion process to complete successfully. I've also added additional logging for failed ingest tasks, so that failures in the underlying data are no longer missed, as well as a check to avoid serving stale model results (&lt;a href="https://github.com/Pirate-Weather/pirate-weather-code/pull/542" rel="noopener noreferrer"&gt;PR #542&lt;/a&gt;).&lt;/p&gt;
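&lt;p&gt;As a sketch of the corrected setup, the 50:50 split lives on the cluster itself rather than in each task. The ECS cluster API exposes a &lt;code&gt;defaultCapacityProviderStrategy&lt;/code&gt; field for this; the provider names below match the snippet above, and the weights are illustrative:&lt;/p&gt;

```json
{
  "defaultCapacityProviderStrategy": [
    { "capacityProvider": "FARGATE_SPOT", "weight": 1 },
    { "capacityProvider": "FARGATE", "weight": 1 }
  ]
}
```

&lt;p&gt;With this in place, Step Function task definitions simply omit any &lt;code&gt;CapacityProviderStrategy&lt;/code&gt; block and inherit the cluster default.&lt;/p&gt;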

</description>
      <category>weather</category>
      <category>aws</category>
    </item>
    <item>
      <title>Stop Paying Too Much for CloudWatch Logs — Auto-Archive to S3 via Firehose</title>
      <dc:creator>Yuichi Sato</dc:creator>
      <pubDate>Tue, 21 Apr 2026 11:58:09 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/stop-paying-too-much-for-cloudwatch-logs-auto-archive-to-s3-via-firehose-3f2f</link>
      <guid>https://future.forem.com/aws-builders/stop-paying-too-much-for-cloudwatch-logs-auto-archive-to-s3-via-firehose-3f2f</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This article was originally written in Japanese and published on Qiita. It has been translated with the help of AI.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Original article: &lt;a href="https://qiita.com/sassssan68/items/da2aa98bba12748daca7" rel="noopener noreferrer"&gt;https://qiita.com/sassssan68/items/da2aa98bba12748daca7&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Have you ever calculated how much it actually costs to keep CloudWatch Logs long-term?&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TL;DR&lt;br&gt;
Keeping logs in CloudWatch Logs long-term is expensive.&lt;br&gt;
Subscription → Firehose → S3 (Deep Archive) is more stable and cost-effective.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I recently had an audit requirement to retain logs for 18 months. When I estimated the CloudWatch Logs cost…&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;18 months = $1,069.20&lt;br&gt;
— That's way too much!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I followed the AWS-recommended architecture — &lt;strong&gt;Subscription → Firehose → S3&lt;/strong&gt; — and combined it with a lifecycle policy to transition to Deep Archive. The result:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;~85% cost reduction&lt;br&gt;
Plus fully automated, stable operations&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This article covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why CreateExportTask is not recommended (per AWS)&lt;/li&gt;
&lt;li&gt;How much cost you can actually save (with formulas)&lt;/li&gt;
&lt;li&gt;How to set up Subscription → Firehose → S3 (Deep Archive)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Why You Shouldn't Keep Logs in CloudWatch Logs Long-Term
&lt;/h1&gt;

&lt;p&gt;Audit and regulatory requirements often mandate log retention for years. However, storing large volumes of logs in CloudWatch Logs gets expensive fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High storage cost&lt;/strong&gt; — CloudWatch Logs storage pricing is heavy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scales linearly&lt;/strong&gt; — The more data you store, the worse it gets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not designed for long-term archival&lt;/strong&gt; — It's a monitoring tool, not a storage solution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This raises the question: &lt;strong&gt;What's the right way to handle long-term log retention?&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  AWS Says Export Task Is "Not Recommended" — Here's Why
&lt;/h1&gt;

&lt;p&gt;From the CloudWatch console, you can manually export logs to S3. To automate this, you'd use the &lt;code&gt;CreateExportTask&lt;/code&gt; API:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_CreateExportTask.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_CreateExportTask.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You could call this periodically from Lambda or EventBridge Scheduler. However, the AWS documentation explicitly discourages this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;&lt;br&gt;
We recommend that you don't regularly export to Amazon S3 as a way to continuously archive your logs. For that use case, we instead recommend that you use subscriptions. For more information about subscriptions, see Real-time processing of log data with subscriptions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On top of that, there's a &lt;strong&gt;concurrency limit of 1 export task at a time&lt;/strong&gt;. If you're exporting from multiple log groups or across multiple time ranges, tasks will queue up, causing failures and delays.&lt;/p&gt;

&lt;p&gt;Given these limitations, CreateExportTask is unreliable for audit-grade long-term retention. &lt;strong&gt;As the AWS docs say, subscriptions are the way to go.&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The AWS-Recommended Architecture: Subscription → Firehose → S3
&lt;/h1&gt;

&lt;p&gt;AWS recommends using CloudWatch Logs subscription filters:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Subscriptions.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Subscriptions.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Firehose, logs are transferred in near real-time — no manual operations required, stable and suitable for long-term archival.&lt;/p&gt;

&lt;p&gt;But when I first saw this architecture, I thought: &lt;em&gt;"This looks expensive. Is it actually cheaper than just leaving logs in CloudWatch?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So I ran the numbers.&lt;/p&gt;

&lt;h1&gt;
  
  
  Cost Comparison: CloudWatch Logs vs. Subscription → Firehose → S3 (Deep Archive)
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt;&lt;br&gt;
Over 18 months, Subscription → Firehose → S3 (Deep Archive) is approximately &lt;strong&gt;85% cheaper&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Note:&lt;/strong&gt;&lt;br&gt;
If your retention period is &lt;strong&gt;2 months or less&lt;/strong&gt;, CloudWatch Logs may actually be cheaper.&lt;br&gt;
The conclusion in this article assumes &lt;strong&gt;long-term retention&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Assumptions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Retention period: 18 months&lt;/li&gt;
&lt;li&gt;Monthly log volume: 100 GB&lt;/li&gt;
&lt;li&gt;Tokyo region pricing (as of April 2026):

&lt;ul&gt;
&lt;li&gt;CloudWatch Logs storage: $0.033/GB/month&lt;/li&gt;
&lt;li&gt;Firehose delivery: $0.036/GB&lt;/li&gt;
&lt;li&gt;S3 Standard: $0.025/GB/month&lt;/li&gt;
&lt;li&gt;Glacier Deep Archive: $0.002/GB/month&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Case 1:&lt;/strong&gt; Keep all logs in CloudWatch Logs for the full 18 months&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Case 2:&lt;/strong&gt; Keep logs in CloudWatch for 2 weeks (for analysis), simultaneously stream via Firehose → S3 Standard → Glacier Deep Archive after 1 day&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Comparison Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Case 1&lt;/th&gt;
&lt;th&gt;Formula&lt;/th&gt;
&lt;th&gt;Case 2&lt;/th&gt;
&lt;th&gt;Formula&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Logs storage&lt;/td&gt;
&lt;td&gt;$1,069.20&lt;/td&gt;
&lt;td&gt;0.033 × (100 × 18) × 18&lt;/td&gt;
&lt;td&gt;$27.72&lt;/td&gt;
&lt;td&gt;0.033 × (100 × 14/30) × 18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Firehose delivery&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;$64.80&lt;/td&gt;
&lt;td&gt;0.036 × 100 × 18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 Standard (1 day)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;$1.50&lt;/td&gt;
&lt;td&gt;0.025 × (100 × 1/30) × 18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Glacier Deep Archive&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;$64.80&lt;/td&gt;
&lt;td&gt;0.002 × (100 × 18) × 18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,069.20&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$158.82&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Savings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$910.38 (~85% reduction)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Note:

&lt;ul&gt;
&lt;li&gt;Average stored volume for 2-week retention: monthly log volume × (14/30)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By keeping logs in CloudWatch for just 2 weeks (for analysis) and archiving older logs in Deep Archive, you can achieve approximately &lt;strong&gt;85% cost savings&lt;/strong&gt; over 18 months.&lt;/p&gt;
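&lt;p&gt;The arithmetic behind the table can be reproduced in a few lines of Python, using the rates and the simplified storage formulas from the assumptions above:&lt;/p&gt;

```python
# Rough recomputation of the 18-month comparison above.
# Rates are Tokyo-region prices from the assumptions, USD per GB-month
# (Firehose is per GB delivered).
CW_STORAGE = 0.033
FIREHOSE = 0.036
S3_STANDARD = 0.025
DEEP_ARCHIVE = 0.002

GB_PER_MONTH = 100
MONTHS = 18

# Case 1: everything stays in CloudWatch Logs (article's simplified formula).
case1 = CW_STORAGE * (GB_PER_MONTH * MONTHS) * MONTHS

# Case 2: 2 weeks in CloudWatch, 1 day in S3 Standard, rest in Deep Archive.
case2 = (
    CW_STORAGE * (GB_PER_MONTH * 14 / 30) * MONTHS    # 2-week retention
    + FIREHOSE * GB_PER_MONTH * MONTHS                # Firehose delivery
    + S3_STANDARD * (GB_PER_MONTH * 1 / 30) * MONTHS  # 1 day in Standard
    + DEEP_ARCHIVE * (GB_PER_MONTH * MONTHS) * MONTHS # long-term archive
)

print(round(case1, 2), round(case2, 2), round(1 - case2 / case1, 2))
# → 1069.2 158.82 0.85
```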

&lt;h1&gt;
  
  
  How to Set It Up
&lt;/h1&gt;

&lt;p&gt;The setup involves five steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create an S3 bucket with a lifecycle rule (transition to Glacier Deep Archive after 1 day)&lt;/li&gt;
&lt;li&gt;Create a Firehose stream (source: Direct PUT, destination: S3)&lt;/li&gt;
&lt;li&gt;Create an IAM role for the subscription filter&lt;/li&gt;
&lt;li&gt;Create a CloudWatch Logs subscription filter&lt;/li&gt;
&lt;li&gt;Verify logs are flowing to S3&lt;/li&gt;
&lt;/ol&gt;
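&lt;p&gt;For step 1, a minimal lifecycle configuration that transitions objects to Deep Archive after one day could look like the following (the rule ID is a placeholder; the shape follows the S3 &lt;code&gt;PutBucketLifecycleConfiguration&lt;/code&gt; JSON format):&lt;/p&gt;

```json
{
  "Rules": [
    {
      "ID": "archive-logs-to-deep-archive",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 1, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
```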

&lt;p&gt;For steps 3 and 4, the AWS documentation provides a complete walkthrough including the IAM policy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#FirehoseExample" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#FirehoseExample&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Export Task is not recommended&lt;/strong&gt; (per AWS documentation)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Firehose is the most operationally practical solution&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;S3 lifecycle rules enable cost-optimized long-term archival&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For short-term log retention, CloudWatch Logs works just fine. But if you need to retain logs for &lt;strong&gt;months to years&lt;/strong&gt;, &lt;strong&gt;Subscription → Firehose → S3 (Deep Archive) is the practical solution&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When long-term retention becomes a requirement, it's worth revisiting your architecture.&lt;/p&gt;

&lt;p&gt;I hope this helps anyone else dealing with the same log retention cost challenges.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloudwatch</category>
      <category>s3</category>
      <category>costoptimization</category>
    </item>
    <item>
      <title>AWS Data &amp; AI Stories #03: Multimodal Knowledge Bases</title>
      <dc:creator>Sedat SALMAN</dc:creator>
      <pubDate>Mon, 20 Apr 2026 19:01:00 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/aws-data-ai-stories-03-multimodal-knowledge-bases-34af</link>
      <guid>https://future.forem.com/aws-builders/aws-data-ai-stories-03-multimodal-knowledge-bases-34af</guid>
      <description>&lt;p&gt;In the first article, I talked about multimodal AI at a high level.&lt;/p&gt;

&lt;p&gt;In the second one, I focused on Amazon Bedrock Data Automation as the processing layer.&lt;/p&gt;

&lt;p&gt;Now the next question is simple:&lt;/p&gt;

&lt;p&gt;After we process the content, how do we make it searchable and useful for AI applications?&lt;/p&gt;

&lt;p&gt;This is where multimodal knowledge bases come in.&lt;/p&gt;

&lt;p&gt;Amazon Bedrock Knowledge Bases now supports multimodal content, including images, audio, and video, in addition to traditional unstructured text sources. It also supports multimodal querying, including image-based search and retrieval across media types.&lt;/p&gt;

&lt;p&gt;For me, this is the layer that turns processed content into usable context.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a multimodal knowledge base?
&lt;/h2&gt;

&lt;p&gt;A knowledge base is a managed retrieval layer for your own content.&lt;/p&gt;

&lt;p&gt;Instead of asking a model to rely only on general training knowledge, a knowledge base helps the system retrieve information from your own files and data sources before generating a response. That is the main idea behind Retrieval Augmented Generation, or RAG. Amazon Bedrock Knowledge Bases is designed for exactly this purpose: it retrieves relevant information from your data sources and uses it to improve response relevance and accuracy.&lt;/p&gt;

&lt;p&gt;A multimodal knowledge base extends that idea beyond text.&lt;/p&gt;

&lt;p&gt;So instead of only working with documents, the system can also work with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;images&lt;/li&gt;
&lt;li&gt;audio&lt;/li&gt;
&lt;li&gt;video&lt;/li&gt;
&lt;li&gt;mixed-content files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because enterprise knowledge is rarely text only.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why does this matter?
&lt;/h2&gt;

&lt;p&gt;Because many real-world systems do not store knowledge in perfect written documents.&lt;/p&gt;

&lt;p&gt;A lot of value exists in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;diagrams&lt;/li&gt;
&lt;li&gt;scanned files&lt;/li&gt;
&lt;li&gt;screenshots&lt;/li&gt;
&lt;li&gt;inspection photos&lt;/li&gt;
&lt;li&gt;recorded calls&lt;/li&gt;
&lt;li&gt;training videos&lt;/li&gt;
&lt;li&gt;equipment images&lt;/li&gt;
&lt;li&gt;operational media&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If our knowledge layer only understands text, a large part of business context stays outside the system.&lt;/p&gt;

&lt;p&gt;With multimodal retrieval in Bedrock Knowledge Bases, AWS now supports ingesting, indexing, and retrieving information from text, images, video, and audio in a more unified workflow. AWS also notes that applications can search using an image query to find visually similar content or relevant scenes in multimedia sources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it fits in the architecture
&lt;/h2&gt;

&lt;p&gt;I see the flow like this:&lt;/p&gt;

&lt;p&gt;Raw content → processing layer → knowledge base → retrieval → answer or action&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Part 1 was the general multimodal AI view&lt;/li&gt;
&lt;li&gt;Part 2 was the processing layer with Bedrock Data Automation&lt;/li&gt;
&lt;li&gt;Part 3 is the retrieval layer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That means the knowledge base is not the first step.&lt;/p&gt;

&lt;p&gt;It comes after the content is already available in a usable form, whether directly from unstructured sources or after preprocessing.&lt;/p&gt;

&lt;p&gt;AWS documentation also makes this separation clearer now by distinguishing multimodal processing approaches depending on the goal: Nova Multimodal Embeddings for visual similarity and cross-modal retrieval, or Bedrock Data Automation for text-oriented processing of multimedia content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two ways to think about multimodal retrieval
&lt;/h2&gt;

&lt;p&gt;This is the most important design point.&lt;/p&gt;

&lt;p&gt;Not every multimodal use case is the same.&lt;/p&gt;

&lt;p&gt;AWS currently describes two main multimodal processing approaches for knowledge bases:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Nova Multimodal Embeddings approach
&lt;/h3&gt;

&lt;p&gt;This is better when the focus is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;visual similarity&lt;/li&gt;
&lt;li&gt;image search&lt;/li&gt;
&lt;li&gt;cross-modal retrieval&lt;/li&gt;
&lt;li&gt;searching media with text or image input&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS documentation says this approach is suited for visual similarity searches and multimodal semantic retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Bedrock Data Automation approach
&lt;/h3&gt;

&lt;p&gt;This is better when the focus is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extracting structured meaning from multimedia&lt;/li&gt;
&lt;li&gt;turning media into searchable text-oriented outputs&lt;/li&gt;
&lt;li&gt;using processed content in downstream RAG&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS documentation describes this option as the text-based processing path for multimedia content.&lt;/p&gt;

&lt;p&gt;For me, the decision is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If I want to find similar content across modalities, I think retrieval-first.&lt;/li&gt;
&lt;li&gt;If I want to extract useful content from media and then search it, I think processing-first.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What can you query?
&lt;/h2&gt;

&lt;p&gt;This is one of the nice parts of the newer multimodal support.&lt;/p&gt;

&lt;p&gt;After ingesting multimodal content, Bedrock Knowledge Bases supports different query patterns depending on the selected approach. AWS documentation for testing and querying multimodal knowledge bases shows support for metadata such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;source modality&lt;/li&gt;
&lt;li&gt;MIME type&lt;/li&gt;
&lt;li&gt;chunk start time for audio/video&lt;/li&gt;
&lt;li&gt;chunk end time for audio/video&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also mentions playback controls with automatic segment positioning for multimedia results in the console.&lt;/p&gt;

&lt;p&gt;That means this is not just “retrieve a paragraph.”&lt;/p&gt;

&lt;p&gt;It can also become:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve a scene from a video&lt;/li&gt;
&lt;li&gt;return the relevant moment in an audio file&lt;/li&gt;
&lt;li&gt;find a matching image&lt;/li&gt;
&lt;li&gt;connect retrieved media segments to an answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a big step forward compared with traditional text-only RAG.&lt;/p&gt;
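&lt;p&gt;As a small sketch of what querying this looks like from code, the request below builds the kwargs for the Bedrock Agent Runtime &lt;code&gt;retrieve&lt;/code&gt; call with a metadata filter. The filter key name &lt;code&gt;"sourceModality"&lt;/code&gt;, the knowledge base ID, and the query text are stand-ins; check the current Bedrock documentation for the exact metadata attribute names:&lt;/p&gt;

```python
# Sketch: metadata-filtered retrieval against a multimodal knowledge base.
# "sourceModality" is a hypothetical metadata key for illustration only.

def build_retrieve_request(kb_id: str, query: str, modality: str) -> dict:
    """Build the kwargs for bedrock-agent-runtime's retrieve() call."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": 5,
                "filter": {
                    "equals": {"key": "sourceModality", "value": modality}
                },
            }
        },
    }

req = build_retrieve_request("KB123EXAMPLE", "pump seal replacement steps", "VIDEO")

# To execute for real (requires AWS credentials and a provisioned knowledge base):
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# response = client.retrieve(**req)

print(req["retrievalConfiguration"]["vectorSearchConfiguration"]["filter"])
```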

&lt;h2&gt;
  
  
  How I would explain it simply
&lt;/h2&gt;

&lt;p&gt;A traditional knowledge base answers:&lt;/p&gt;

&lt;p&gt;“Which text chunk is relevant?”&lt;/p&gt;

&lt;p&gt;A multimodal knowledge base can answer:&lt;/p&gt;

&lt;p&gt;“Which content is relevant, regardless of whether it is text, image, audio, or video?”&lt;/p&gt;

&lt;p&gt;That is the real difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data source point to remember
&lt;/h2&gt;

&lt;p&gt;There is one important limitation to keep in mind.&lt;/p&gt;

&lt;p&gt;AWS documentation states that multimodal support in Bedrock Knowledge Bases is available when creating a knowledge base with unstructured data sources. Structured data sources do not support multimodal content processing.&lt;/p&gt;

&lt;p&gt;That is important for design.&lt;/p&gt;

&lt;p&gt;If your use case depends heavily on images, audio, or video, you should think in terms of unstructured content pipelines, not only structured tables.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical example
&lt;/h2&gt;

&lt;p&gt;Imagine a support or operations platform.&lt;/p&gt;

&lt;p&gt;Your users may store:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF manuals&lt;/li&gt;
&lt;li&gt;field photos&lt;/li&gt;
&lt;li&gt;recorded troubleshooting calls&lt;/li&gt;
&lt;li&gt;short maintenance videos&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A user asks:&lt;br&gt;
“Show me the relevant maintenance guidance for this equipment issue.”&lt;/p&gt;

&lt;p&gt;A traditional text-only system may retrieve only written manuals.&lt;/p&gt;

&lt;p&gt;A multimodal knowledge base can potentially retrieve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a relevant text section&lt;/li&gt;
&lt;li&gt;a matching image&lt;/li&gt;
&lt;li&gt;a useful audio segment&lt;/li&gt;
&lt;li&gt;a video moment with the right scene&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then that context can be passed to the model for answer generation.&lt;/p&gt;

&lt;p&gt;That is why this is more than just a storage feature.&lt;/p&gt;

&lt;p&gt;It is a better retrieval model for real-world knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I like this layer
&lt;/h2&gt;

&lt;p&gt;I like multimodal knowledge bases because they make AI architecture more realistic.&lt;/p&gt;

&lt;p&gt;In many enterprise environments, the problem is not lack of data.&lt;/p&gt;

&lt;p&gt;The problem is that the useful data is trapped inside different formats and scattered across different files.&lt;/p&gt;

&lt;p&gt;A multimodal knowledge base helps solve that by creating a retrieval layer that can work across those formats. AWS positions Knowledge Bases as an out-of-the-box RAG capability that reduces the effort of building pipelines and helps applications answer queries using proprietary content, with source-grounded responses and citations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistake
&lt;/h2&gt;

&lt;p&gt;A common mistake is to assume that all multimodal use cases need the same architecture.&lt;/p&gt;

&lt;p&gt;They do not.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;image similarity search is not the same as document extraction&lt;/li&gt;
&lt;li&gt;video segment retrieval is not the same as audio transcription&lt;/li&gt;
&lt;li&gt;cross-modal search is not the same as text-based RAG over processed media&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS’s own multimodal guidance now separates these choices clearly, and I think that is the right way to approach the design.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would decide early
&lt;/h2&gt;

&lt;p&gt;Before building the knowledge base, I would answer these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do I need visual similarity or text-oriented retrieval?&lt;/li&gt;
&lt;li&gt;Am I retrieving directly from raw multimodal content, or from processed output?&lt;/li&gt;
&lt;li&gt;Do I need image queries?&lt;/li&gt;
&lt;li&gt;Do I need timestamped retrieval from audio or video?&lt;/li&gt;
&lt;li&gt;Do I want the knowledge base mainly for search, RAG, or both?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These questions make the architecture much clearer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;For me, multimodal knowledge bases are the point where multimodal AI becomes operational.&lt;/p&gt;

&lt;p&gt;They connect processed or stored media-rich content to retrieval, and they make it possible to build AI systems that are grounded in more than just text. With Amazon Bedrock Knowledge Bases, AWS now supports multimodal ingestion and retrieval across images, audio, video, and text, along with query-time metadata that can point to the right file type and even the right media segment.&lt;/p&gt;

&lt;p&gt;That makes this layer very important.&lt;/p&gt;

&lt;p&gt;Because once retrieval improves, the answers improve.&lt;/p&gt;

&lt;p&gt;And once the answers improve, the AI system becomes much more useful.&lt;/p&gt;

&lt;p&gt;In the next article, I would move to the next logical topic:&lt;/p&gt;

&lt;p&gt;How to use multimodal retrieval in a real RAG workflow on AWS.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>datascience</category>
      <category>awsbigdata</category>
    </item>
    <item>
      <title>Cloud Myths You Should Probably Stop Believing</title>
      <dc:creator>Faisal Ibrahim Sadiq</dc:creator>
      <pubDate>Mon, 20 Apr 2026 14:41:18 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/cloud-myths-you-should-probably-stop-believing-5c4b</link>
      <guid>https://future.forem.com/aws-builders/cloud-myths-you-should-probably-stop-believing-5c4b</guid>
      <description>&lt;p&gt;If you spend enough time around cloud conversations, you’ll notice a pattern: a lot of people repeat the same ideas about the cloud as if they’re universally true.&lt;/p&gt;

&lt;p&gt;The problem is, many of these ideas are either oversimplified or just wrong. And if you build your understanding on them, you’ll make poor decisions when it actually matters; like during architecture design, cost planning, or scaling.&lt;/p&gt;

&lt;p&gt;Here are some of the most common cloud myths worth unlearning.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. “Cloud is always cheaper”
&lt;/h2&gt;

&lt;p&gt;This one gets repeated a lot, especially in beginner discussions.&lt;/p&gt;

&lt;p&gt;Cloud &lt;em&gt;can&lt;/em&gt; be cheaper, but only if you know what you’re doing.&lt;/p&gt;

&lt;p&gt;With a provider like Amazon Web Services, pricing is usage-based. That sounds great until you realize how easy it is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;leave instances running 24/7&lt;/li&gt;
&lt;li&gt;over-provision resources&lt;/li&gt;
&lt;li&gt;ignore data egress costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, you’re not saving money but just scaling your bill.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Cloud is cost-efficient when optimized. Not by default.&lt;/p&gt;
&lt;/blockquote&gt;
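&lt;p&gt;To make the always-on pitfall concrete, here is the back-of-the-envelope arithmetic for one instance running 24/7 versus one scheduled for business hours only. The 5-cents-per-hour rate is purely illustrative, not a real AWS price:&lt;/p&gt;

```python
RATE_CENTS = 5  # illustrative: 5 cents per instance-hour (not a real AWS price)

always_on = RATE_CENTS * 24 * 30       # running 24/7 for a 30-day month
business_hours = RATE_CENTS * 10 * 22  # 10 h/day, 22 working days

# Roughly 70% of the always-on bill is paying for idle time.
print(f"${always_on / 100:.2f} vs ${business_hours / 100:.2f}")
# → $36.00 vs $11.00
```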




&lt;h2&gt;
  
  
  2. “The cloud is secure out of the box”
&lt;/h2&gt;

&lt;p&gt;Cloud providers invest heavily in security, but that doesn’t mean your application is automatically secure.&lt;/p&gt;

&lt;p&gt;There’s something called the &lt;strong&gt;Shared Responsibility Model&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The provider secures the infrastructure&lt;/li&gt;
&lt;li&gt;You secure everything you deploy on top of it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Misconfigured storage, weak &lt;strong&gt;IAM&lt;/strong&gt; policies, exposed APIs: these are still your responsibility.&lt;/p&gt;

&lt;p&gt;Most cloud breaches don’t happen because the provider failed. They happen because of bad configuration.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. “You don’t need DevOps in the cloud”
&lt;/h2&gt;

&lt;p&gt;If anything, cloud environments &lt;em&gt;increase&lt;/em&gt; the need for DevOps practices.&lt;/p&gt;

&lt;p&gt;You’re still dealing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI/CD pipelines&lt;/li&gt;
&lt;li&gt;monitoring and logging&lt;/li&gt;
&lt;li&gt;infrastructure provisioning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tools like Docker and Kubernetes become even more important, not less.&lt;/p&gt;

&lt;p&gt;Cloud doesn’t remove operational complexity, it just changes how you handle it.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. “Cloud means no downtime”
&lt;/h2&gt;

&lt;p&gt;Cloud providers are reliable, but they can't make up for poor architectural decisions.&lt;/p&gt;

&lt;p&gt;If your system goes down because you deployed everything in one region, that’s not a cloud failure; that’s an architecture decision.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;High availability is something you design for. It’s not automatically included.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  5. “Cloud is just someone else’s computer”
&lt;/h2&gt;

&lt;p&gt;You’ve probably heard this one😂.&lt;/p&gt;

&lt;p&gt;It’s not completely wrong, but it misses the point.&lt;/p&gt;

&lt;p&gt;Cloud is more than remote servers. You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;managed databases&lt;/li&gt;
&lt;li&gt;serverless computing&lt;/li&gt;
&lt;li&gt;global infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, with &lt;strong&gt;AWS&lt;/strong&gt; Lambda, you don’t even manage servers directly.&lt;/p&gt;

&lt;p&gt;That’s a very different model from traditional infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. “You don’t need to understand networking”
&lt;/h2&gt;

&lt;p&gt;This is a quick way to struggle in cloud engineering.&lt;/p&gt;

&lt;p&gt;You still need to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VPCs&lt;/li&gt;
&lt;li&gt;subnets&lt;/li&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;firewalls/security groups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In many cases, cloud networking is &lt;em&gt;more complex&lt;/em&gt; than on-prem setups because of how flexible it is.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. “Auto-scaling will fix performance issues”
&lt;/h2&gt;

&lt;p&gt;Auto-scaling helps with load, not bad design.&lt;/p&gt;

&lt;p&gt;If your system is inefficient:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it will still be inefficient at scale&lt;/li&gt;
&lt;li&gt;it will just cost more while being inefficient&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scaling a poorly designed system doesn’t automatically fix it.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. “Serverless means there are no servers”
&lt;/h2&gt;

&lt;p&gt;There are still servers.&lt;/p&gt;

&lt;p&gt;You just don’t manage them.&lt;/p&gt;

&lt;p&gt;That abstraction is powerful, but it doesn’t remove concepts like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cold starts&lt;/li&gt;
&lt;li&gt;execution limits&lt;/li&gt;
&lt;li&gt;resource constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding what’s happening under the hood still matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. “Cloud migration is a one-time thing”
&lt;/h2&gt;

&lt;p&gt;A lot of teams think moving to the cloud is the finish line.&lt;/p&gt;

&lt;p&gt;It’s not.&lt;/p&gt;

&lt;p&gt;After migration, you still need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;optimize costs&lt;/li&gt;
&lt;li&gt;improve architecture&lt;/li&gt;
&lt;li&gt;monitor performance&lt;/li&gt;
&lt;li&gt;tighten security&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloud is an ongoing process, not a one-off project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;The cloud solves many problems, but you don't want to go into it misinformed. So give the cloud a try and make use of the AWS free tier. Learn, build, and learn again. Build a solid foundation now, and when the time comes you'll be equipped to maximize the advantages of the cloud.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Stop Giving AI Agents AWS Credentials: A Better Way to Secure Access</title>
      <dc:creator>Sarvar Nadaf</dc:creator>
      <pubDate>Mon, 20 Apr 2026 14:00:16 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/stop-giving-ai-agents-aws-credentials-a-better-way-to-secure-access-5gih</link>
      <guid>https://future.forem.com/aws-builders/stop-giving-ai-agents-aws-credentials-a-better-way-to-secure-access-5gih</guid>
      <description>&lt;p&gt;👋 Hey there, tech enthusiasts! &lt;/p&gt;

&lt;p&gt;I'm Sarvar, a Cloud Architect with a passion for transforming complex technological challenges into elegant solutions. With extensive experience spanning Cloud Operations (AWS &amp;amp; Azure), Data Operations, Analytics, DevOps, and Generative AI, I've had the privilege of architecting solutions for global enterprises that drive real business impact. Through this article series, I'm excited to share practical insights, best practices, and hands-on experiences from my journey in the tech world. Whether you're a seasoned professional or just starting out, I aim to break down complex concepts into digestible pieces that you can apply in your projects.&lt;/p&gt;

&lt;p&gt;Let's dive in and explore the fascinating world of cloud technology together! 🚀&lt;/p&gt;




&lt;h2&gt;
  
  
  The Wake-Up Call
&lt;/h2&gt;

&lt;p&gt;Three months ago, our security team flagged something concerning. Developers were feeding production logs, error messages, and configuration snippets to ChatGPT for debugging help.&lt;/p&gt;

&lt;p&gt;The problem? Those logs contained customer identifiers, internal service names, and architectural details we definitely didn't want leaving our network.&lt;/p&gt;

&lt;p&gt;We couldn't just block ChatGPT - developers needed AI assistance. The productivity gains were real. But we also couldn't keep hemorrhaging sensitive data to external APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The requirements were clear:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI agents need AWS access for legitimate automation tasks&lt;/li&gt;
&lt;li&gt;Zero sensitive data leaves our AWS environment&lt;/li&gt;
&lt;li&gt;Every action must be auditable&lt;/li&gt;
&lt;li&gt;Principle of least privilege, always&lt;/li&gt;
&lt;li&gt;No impact on developer velocity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's when I started looking at Model Context Protocol (MCP) as a security boundary.&lt;/p&gt;




&lt;h2&gt;
  
  
  Understanding MCP as a Security Layer
&lt;/h2&gt;

&lt;p&gt;Before diving into implementation, let's clarify what MCP actually does and why it matters for security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Context Protocol&lt;/strong&gt; is an open standard that sits between your AI agent and your resources. Think of it as a translator and gatekeeper combined.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Developer → AI Agent → MCP Server → AWS IAM → AWS Resources
                          ↓
                    Security Layer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MCP server doesn't just pass requests through. It acts as a security boundary that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validates every request before execution&lt;/li&gt;
&lt;li&gt;Translates AI intentions into specific AWS API calls&lt;/li&gt;
&lt;li&gt;Enforces authentication and authorization&lt;/li&gt;
&lt;li&gt;Logs everything for audit trails&lt;/li&gt;
&lt;li&gt;Provides a single point of control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; Instead of giving AI agents direct AWS credentials, you give them access to an MCP server that has carefully scoped permissions. The AI never touches AWS credentials. It doesn't even know they exist.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Security Architecture
&lt;/h2&gt;

&lt;p&gt;After several iterations, here's the pattern that survived production. I'll explain the thinking behind each layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Authentication Without Permanent Credentials
&lt;/h3&gt;

&lt;p&gt;The first principle: no permanent credentials anywhere in the system.&lt;/p&gt;

&lt;p&gt;Developers authenticate with our existing identity provider (Okta in our case). The identity provider issues a JWT token containing the user's identity and group memberships. The MCP server validates this JWT and issues a short-lived session token - 15 minutes, no exceptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why 15 minutes?&lt;/strong&gt; Long enough for a debugging session, short enough that a leaked token becomes useless quickly. If someone steals a session token, they have a 15-minute window at most. Compare that to permanent AWS credentials that work forever until manually revoked.&lt;/p&gt;

&lt;p&gt;The MCP server never stores these tokens. They're validated, used, and discarded. When they expire, users re-authenticate. It's a minor inconvenience that prevents major security incidents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Request Validation
&lt;/h3&gt;

&lt;p&gt;This is where MCP shines as a security boundary. Every request goes through multiple validation checks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action Allowlist:&lt;/strong&gt; The MCP server maintains a strict list of allowed AWS actions. If the AI requests something not on the list, it's blocked immediately. No wildcards, no "just in case" permissions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern Detection:&lt;/strong&gt; I scan every request for dangerous patterns. Words like "delete", "terminate", "destroy" trigger additional scrutiny. Even if the action is technically allowed, suspicious patterns can block the request or require additional approval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parameter Sanitization:&lt;/strong&gt; Before logging or processing, all sensitive parameters get redacted. Passwords, tokens, API keys - anything that looks like a credential gets replaced with &lt;code&gt;[REDACTED]&lt;/code&gt; in logs. This prevents credential leakage through audit trails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate Limiting:&lt;/strong&gt; Each user gets a request budget. Exceed it, and requests start getting throttled. This prevents both accidental runaway scripts and intentional abuse.&lt;/p&gt;

&lt;p&gt;The validation happens in milliseconds. Developers don't notice the overhead, but it's the difference between a secure system and a disaster waiting to happen.&lt;/p&gt;
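&lt;p&gt;As a sketch, the first two checks might look like this. The action names and the regex are illustrative placeholders, not the actual production configuration:&lt;/p&gt;

```python
import re

# Illustrative allowlist; the real list is driven by the legitimate use cases.
ALLOWED_ACTIONS = {
    "logs:FilterLogEvents",
    "s3:ListAllMyBuckets",
    "s3:GetObject",
    "cloudwatch:GetMetricData",
}

# Dangerous words trigger a block even when the action itself is allowed.
DANGEROUS_PATTERNS = re.compile(r"\b(delete|terminate|destroy)\b", re.IGNORECASE)

def validate_request(action, prompt):
    """Return (allowed, reason) for a single MCP request."""
    if action not in ALLOWED_ACTIONS:
        return False, "action not on allowlist"
    if DANGEROUS_PATTERNS.search(prompt):
        return False, "dangerous pattern detected"
    return True, "ok"
```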

&lt;h3&gt;
  
  
  Layer 3: AWS Execution with Scoped Permissions
&lt;/h3&gt;

&lt;p&gt;The MCP server uses an IAM role with specific permissions. Not admin. Not power user. Just what's needed for legitimate use cases.&lt;/p&gt;

&lt;p&gt;I started by listing every legitimate use case developers had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read CloudWatch logs for debugging&lt;/li&gt;
&lt;li&gt;List S3 buckets to find data&lt;/li&gt;
&lt;li&gt;Get objects from specific buckets&lt;/li&gt;
&lt;li&gt;Query CloudWatch metrics for dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then I created IAM policies that allow exactly those actions and nothing else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insight:&lt;/strong&gt; Explicit denies for dangerous actions, even if they're not in the allow list. This protects against future policy changes or misconfigurations.&lt;/p&gt;

&lt;p&gt;Example: Even if someone accidentally adds &lt;code&gt;s3:*&lt;/code&gt; to the allow list, an explicit deny on &lt;code&gt;s3:DeleteBucket&lt;/code&gt; still blocks it. Defense in depth.&lt;/p&gt;
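&lt;p&gt;Here's a hedged sketch of what such a policy can look like, written as a Python dict (the action lists are illustrative and the resources are left unscoped for brevity). Because IAM evaluates an explicit Deny before any Allow, the second statement wins no matter what the first one grows into:&lt;/p&gt;

```python
import json

# Illustrative policy: narrow allows plus explicit denies that hold even if
# the allow block is later widened by mistake (Deny always beats Allow in IAM).
MCP_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowScopedReads",
            "Effect": "Allow",
            "Action": [
                "logs:FilterLogEvents",
                "s3:ListAllMyBuckets",
                "s3:GetObject",
                "cloudwatch:GetMetricData",
            ],
            "Resource": "*",  # a real policy would scope this to known ARNs
        },
        {
            "Sid": "DenyDestructiveActions",
            "Effect": "Deny",
            "Action": [
                "s3:DeleteBucket",
                "s3:DeleteObject",
                "ec2:TerminateInstances",
                "iam:*",
                "kms:*",
            ],
            "Resource": "*",
        },
    ],
}

print(json.dumps(MCP_ROLE_POLICY, indent=2))
```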

&lt;h3&gt;
  
  
  Layer 4: Comprehensive Audit Trail
&lt;/h3&gt;

&lt;p&gt;CloudTrail logs every AWS API call, but it doesn't capture the context we need. Who made the request? What was the AI prompt? What resources were accessed?&lt;/p&gt;

&lt;p&gt;I built a custom logging layer that captures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User identity (email, not just IAM role)&lt;/li&gt;
&lt;li&gt;Original AI prompt (hashed, not stored in plain text)&lt;/li&gt;
&lt;li&gt;AWS action requested&lt;/li&gt;
&lt;li&gt;Resources accessed&lt;/li&gt;
&lt;li&gt;Result (success/failure)&lt;/li&gt;
&lt;li&gt;Execution time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this goes to CloudWatch Logs in structured JSON format. Now I can query: "Show me all S3 access by user X this week" or "What resources did the AI access when processing this prompt?"&lt;/p&gt;

&lt;p&gt;The logs are immutable and retained for 90 days for compliance.&lt;/p&gt;
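&lt;p&gt;A minimal sketch of one such log record, combining the redaction and prompt-hashing ideas from the validation layer. The field names and the credential-matching regex are assumptions for illustration:&lt;/p&gt;

```python
import hashlib
import json
import re
import time

# Anything that looks like a credential assignment is redacted before logging.
SECRET_PATTERN = re.compile(r"(password|token|api[_-]?key)\s*[:=]\s*\S+", re.IGNORECASE)

def redact(text):
    return SECRET_PATTERN.sub(r"\1=[REDACTED]", text)

def audit_record(user_email, prompt, action, resources, success, elapsed_ms):
    """Build one structured JSON log line; the prompt is hashed, never stored."""
    return json.dumps({
        "user": user_email,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "action": action,
        "resources": [redact(r) for r in resources],
        "success": success,
        "elapsed_ms": elapsed_ms,
        "timestamp": time.time(),
    })
```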




&lt;h2&gt;
  
  
  How We Built It
&lt;/h2&gt;

&lt;p&gt;The deployment came down to three critical security decisions. Each one was driven by a specific threat we wanted to prevent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 1: Network Isolation Over Convenience
&lt;/h3&gt;

&lt;p&gt;I put the MCP server in a completely separate VPC from production. No shared networks, no VPC peering, nothing. The only communication path is through VPC endpoints to AWS APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; If someone compromises the MCP server, they're trapped. No internet access means they can't exfiltrate data. No production VPC access means they can't pivot to other systems. They're stuck in a cage that only opens to specific AWS services.&lt;/p&gt;

&lt;p&gt;I chose ECS Fargate because it gave me this isolation without the overhead of managing EC2 instances. No patching, no scaling configuration, just containers in a locked-down network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trade-off:&lt;/strong&gt; More complex networking setup. But the security benefit was worth it. A compromised MCP server becomes useless to an attacker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 2: Explicit Denies as the Last Line of Defense
&lt;/h3&gt;

&lt;p&gt;The IAM policy has two blocks: allows and denies. The allows are specific - exact actions on exact resources. But the denies are what keep me sleeping at night.&lt;/p&gt;

&lt;p&gt;I explicitly deny all delete operations, all terminate operations, all IAM changes, and all KMS key operations. Even if someone misconfigures the allow block and adds &lt;code&gt;s3:*&lt;/code&gt;, the deny on &lt;code&gt;s3:DeleteBucket&lt;/code&gt; still holds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; Policies get changed. People make mistakes. The deny block is the safety net that catches those mistakes before they become incidents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trade-off:&lt;/strong&gt; More rigid system. If we need to add a delete operation later, we have to modify both blocks. But that friction is intentional - it forces us to think twice about dangerous permissions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 3: Real-Time Alerting Over Post-Incident Analysis
&lt;/h3&gt;

&lt;p&gt;I set up CloudWatch alarms that fire immediately when something looks wrong. High error rates, unusual request volumes, spikes in blocked actions - all trigger alerts to our security team's Slack channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; Logs are great for forensics, but alerts prevent incidents. If the AI starts trying malicious actions, I want to know in real-time, not during next week's log review.&lt;/p&gt;

&lt;p&gt;The alerts are tuned to avoid noise. More than 50 errors in 5 minutes is abnormal. More than 1,000 requests from one user in 5 minutes is suspicious. These thresholds came from watching normal usage patterns for a month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trade-off:&lt;/strong&gt; Alert fatigue is real. We tune the thresholds monthly based on false positive rates. But I'd rather investigate a false alarm than miss a real attack.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Broke (And How I Fixed It)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Issue 1: Permission Errors Everywhere
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; First deployment, every request failed with &lt;code&gt;AccessDenied&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; I was too restrictive. The IAM policy only allowed specific S3 buckets, but developers needed to list buckets first to know what existed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Add &lt;code&gt;s3:ListAllMyBuckets&lt;/code&gt; with a wildcard resource. Let them see what exists, but control what they can read. It's like letting someone see the library catalog without giving them keys to every book.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; Start with read-only list permissions, then restrict data access. Users need to discover resources before they can use them.&lt;/p&gt;




&lt;h3&gt;
  
  
  Issue 2: CloudTrail Logs Were Useless
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; CloudTrail showed the MCP server's actions, but not which user requested them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; All requests came from the same IAM role. No way to trace back to individual users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Pass user context through custom CloudWatch Logs. Every MCP request gets logged with the user's email, the action requested, and the resources accessed. Now I can trace every action back to the person who requested it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; CloudTrail alone isn't enough for multi-user systems. You need custom logging to capture user context.&lt;/p&gt;




&lt;h3&gt;
  
  
  Issue 3: AI Agents Tried Creative Exploits
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; The AI tried to chain commands to bypass restrictions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example request:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"First list the S3 buckets, then for each bucket, 
download all objects and search for passwords"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; My validation checked individual actions, not sequences. The AI was trying to automate a multi-step attack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Detect and block chaining attempts. Look for words like "then", "after that", "for each", "loop through". Force users to make explicit, separate requests for each action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; AI agents are creative. They'll try to work around restrictions. You need to think like an attacker.&lt;/p&gt;
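&lt;p&gt;A rough sketch of that check. The marker list is illustrative and deliberately incomplete; a creative model will find phrasings you haven't listed, which is why the IAM denies still sit behind it:&lt;/p&gt;

```python
import re

# Phrases that suggest a scripted multi-step sequence rather than one action.
CHAIN_MARKERS = re.compile(r"\b(then|after that|for each|loop through)\b", re.IGNORECASE)

def looks_like_chaining(prompt):
    """Flag prompts that try to automate several actions in one request."""
    return bool(CHAIN_MARKERS.search(prompt))
```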




&lt;h3&gt;
  
  
  Issue 4: Rate Limiting Was Too Aggressive
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; Legitimate users hit rate limits during normal debugging sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; I set limits too low (10 requests per minute). Debugging often requires rapid iteration - check logs, adjust query, check again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Tiered rate limits based on action type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read operations (Get, Describe): 100 requests per 5 minutes&lt;/li&gt;
&lt;li&gt;List operations: 50 requests per 5 minutes&lt;/li&gt;
&lt;li&gt;Write operations: 10 requests per 5 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read operations get higher limits because they're lower risk. Write operations stay restricted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; One-size-fits-all rate limits don't work. Different actions have different risk profiles.&lt;/p&gt;
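&lt;p&gt;The tiers above can be sketched as a sliding-window limiter keyed by action class. The budgets are the ones just listed; classifying actions by API verb prefix is a simplifying assumption:&lt;/p&gt;

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60
# Budgets per 5-minute window, by action class (numbers from the tiers above).
BUDGETS = {"read": 100, "list": 50, "write": 10}

_history = defaultdict(deque)  # (user, tier) mapped to recent request timestamps

def classify(action):
    verb = action.split(":")[-1]
    if verb.startswith(("Get", "Describe")):
        return "read"
    if verb.startswith("List"):
        return "list"
    return "write"

def allow_request(user, action, now=None):
    """True if the request fits in the user's budget for this action tier."""
    now = time.time() if now is None else now
    tier = classify(action)
    window = _history[(user, tier)]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # drop timestamps that fell out of the window
    if len(window) >= BUDGETS[tier]:
        return False
    window.append(now)
    return True
```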




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;After three months in production, here's what actually matters:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Explicit Denies Are Your Friend
&lt;/h3&gt;

&lt;p&gt;Don't rely on "not allowing" something. Explicitly deny dangerous actions. Even if someone misconfigures the allow rules, the denies hold.&lt;/p&gt;

&lt;p&gt;I have explicit denies for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All delete operations&lt;/li&gt;
&lt;li&gt;All terminate operations&lt;/li&gt;
&lt;li&gt;All IAM operations&lt;/li&gt;
&lt;li&gt;All KMS key operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the "break glass" protections. They prevent catastrophic mistakes.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Log Everything, But Make It Searchable
&lt;/h3&gt;

&lt;p&gt;CloudTrail is great, but you need custom logs for MCP-specific context. I send everything to CloudWatch Logs with structured JSON.&lt;/p&gt;

&lt;p&gt;Now I can query: "Show me all S3 access by user X in the last hour" or "What resources did the AI access when processing this prompt?"&lt;/p&gt;

&lt;p&gt;The logs are immutable and retained for 90 days. If something goes wrong, I can reconstruct exactly what happened.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Sanitize Everything
&lt;/h3&gt;

&lt;p&gt;Never log the actual AI prompts. They might contain sensitive data. I hash them instead.&lt;/p&gt;

&lt;p&gt;You can still correlate requests (same hash = same prompt), but you're not storing potentially sensitive prompts in logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Network Isolation Matters
&lt;/h3&gt;

&lt;p&gt;The MCP server runs in a private VPC with no internet access. It can only reach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS API endpoints (via VPC endpoints)&lt;/li&gt;
&lt;li&gt;Internal authentication service&lt;/li&gt;
&lt;li&gt;CloudWatch Logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If someone compromises the MCP server, they can't exfiltrate data. They're stuck in an isolated network.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Test Your Security Controls
&lt;/h3&gt;

&lt;p&gt;I wrote tests to verify the security controls actually work. Tests like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verify delete operations are blocked&lt;/li&gt;
&lt;li&gt;Verify IAM operations are blocked&lt;/li&gt;
&lt;li&gt;Verify rate limits work&lt;/li&gt;
&lt;li&gt;Verify audit logs capture user context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run these tests in CI/CD. If they pass, your security controls are working. If they fail, you know immediately.&lt;/p&gt;




&lt;h2&gt;
  
  
  Alternative Approaches I Considered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Direct IAM Roles for AI Agents
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simpler architecture&lt;/li&gt;
&lt;li&gt;No MCP server to maintain&lt;/li&gt;
&lt;li&gt;Lower latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No request validation layer&lt;/li&gt;
&lt;li&gt;Can't block dangerous patterns&lt;/li&gt;
&lt;li&gt;Harder to audit user actions&lt;/li&gt;
&lt;li&gt;AI has direct AWS credentials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why I didn't use it:&lt;/strong&gt; Too risky. One prompt injection and the AI could delete production resources. The MCP layer provides defense in depth.&lt;/p&gt;




&lt;h3&gt;
  
  
  Option 2: AWS Lambda as MCP Server
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serverless, no infrastructure&lt;/li&gt;
&lt;li&gt;Automatic scaling&lt;/li&gt;
&lt;li&gt;Pay per request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cold starts (500ms+)&lt;/li&gt;
&lt;li&gt;15-minute timeout limit&lt;/li&gt;
&lt;li&gt;Harder to maintain state (rate limiting)&lt;/li&gt;
&lt;li&gt;More complex networking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why I didn't use it:&lt;/strong&gt; Cold starts killed the developer experience. Waiting 500ms for every request was frustrating. Fargate has no cold starts.&lt;/p&gt;




&lt;h3&gt;
  
  
  Option 3: API Gateway + Lambda
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-in rate limiting&lt;/li&gt;
&lt;li&gt;API key management&lt;/li&gt;
&lt;li&gt;Request/response transformation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More complex setup&lt;/li&gt;
&lt;li&gt;Higher cost at scale&lt;/li&gt;
&lt;li&gt;Still has Lambda cold starts&lt;/li&gt;
&lt;li&gt;Overkill for internal use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why I didn't use it:&lt;/strong&gt; The built-in rate limiting was nice, but not worth the complexity for an internal tool. Fargate + ALB was simpler.&lt;/p&gt;




&lt;h2&gt;
  
  
  Best Practices That Actually Matter
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Start With Read-Only
&lt;/h3&gt;

&lt;p&gt;Deploy with read-only permissions first. Let developers use it for a week. Then gradually add write permissions based on actual needs.&lt;/p&gt;

&lt;p&gt;This prevents over-permissioning. You'll discover what developers actually need, not what they think they need.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Use Separate AWS Accounts
&lt;/h3&gt;

&lt;p&gt;Run the MCP server in a separate AWS account from your production workloads. Use cross-account roles for access.&lt;/p&gt;

&lt;p&gt;If the MCP account is compromised, production is still isolated. It's an extra layer of defense.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Monitor for Anomalies
&lt;/h3&gt;

&lt;p&gt;Set up CloudWatch alarms for unusual patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High error rates (&amp;gt;50 errors in 5 minutes)&lt;/li&gt;
&lt;li&gt;Unusual access patterns (&amp;gt;1,000 requests in 5 minutes)&lt;/li&gt;
&lt;li&gt;Blocked actions (&amp;gt;100 blocks in 5 minutes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These alerts go to your security team. Response time is critical.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Regular Security Reviews
&lt;/h3&gt;

&lt;p&gt;Every month, review:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which actions are being used most&lt;/li&gt;
&lt;li&gt;Which permissions are never used (remove them)&lt;/li&gt;
&lt;li&gt;Any blocked requests (are they legitimate needs?)&lt;/li&gt;
&lt;li&gt;Rate limit effectiveness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Security isn't set-and-forget. It requires ongoing attention.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Document Everything
&lt;/h3&gt;

&lt;p&gt;Create a runbook for common scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to add a new allowed action&lt;/li&gt;
&lt;li&gt;How to investigate suspicious activity&lt;/li&gt;
&lt;li&gt;How to rotate credentials&lt;/li&gt;
&lt;li&gt;How to handle a security incident&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When something goes wrong at 2 AM, you'll be glad you documented it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Three months in production taught me that securing AI agent access isn't about perfect security - it's about making attacks harder than they're worth while keeping developers productive.&lt;/p&gt;

&lt;p&gt;The MCP pattern works because it gives you a single point of control. You're not trying to secure the AI agent itself. You're securing the gateway it uses to access your resources. That gateway validates every request, enforces least privilege, logs everything, and runs in an isolated network.&lt;/p&gt;

&lt;p&gt;We went from developers sending production data to ChatGPT to having a secure, auditable system where AI agents help without creating risk. The benefit? No more 2 AM calls about data leaks.&lt;/p&gt;

&lt;p&gt;Is it perfect? No. Can a determined attacker find ways around it? Probably. But it's dramatically better than the alternatives: giving AI agents direct AWS credentials or blocking AI tools entirely and watching developers find workarounds.&lt;/p&gt;

&lt;p&gt;The key insight: Security is not about building walls. It's about building gates with guards. MCP is that gate.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Security is not about building walls. It's about building gates with guards."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  📌 Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Thank you for reading! I hope this gave you practical ideas for securing AI agent access in your environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Found this useful?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❤️ Like if it helped you think through your security approach&lt;/li&gt;
&lt;li&gt;🦄 Unicorn if you're implementing this pattern&lt;/li&gt;
&lt;li&gt;💾 Save for your next security review&lt;/li&gt;
&lt;li&gt;🔄 Share with your security team&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Follow me for more on:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS security patterns&lt;/li&gt;
&lt;li&gt;AI/ML infrastructure&lt;/li&gt;
&lt;li&gt;Cloud architecture&lt;/li&gt;
&lt;li&gt;DevSecOps practices&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  💡 What's Next
&lt;/h2&gt;

&lt;p&gt;I'm working on a follow-up article about monitoring and alerting for MCP deployments. Follow for updates.&lt;/p&gt;

&lt;p&gt;Also exploring: Multi-region MCP deployments and disaster recovery patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  🌐 Portfolio &amp;amp; Work
&lt;/h2&gt;

&lt;p&gt;Explore my full body of work, certifications, and architecture projects:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://sarvarnadaf.com" rel="noopener noreferrer"&gt;Visit My Website&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Services I Offer
&lt;/h2&gt;

&lt;p&gt;Looking for hands-on guidance with cloud security or AI infrastructure?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud Security Architecture (AWS / Azure)&lt;/li&gt;
&lt;li&gt;AI/ML Infrastructure Design&lt;/li&gt;
&lt;li&gt;Security Audit &amp;amp; Remediation&lt;/li&gt;
&lt;li&gt;Technical Writing &amp;amp; Documentation&lt;/li&gt;
&lt;li&gt;Architecture Reviews&lt;/li&gt;
&lt;li&gt;1:1 Technical Mentorship&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🤝 Let's Connect
&lt;/h2&gt;

&lt;p&gt;Questions about implementing this pattern? Drop a comment or connect with me on &lt;a href="https://www.linkedin.com/in/sarvar04/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For consulting or technical discussions: &lt;a href="mailto:simplynadaf@gmail.com"&gt;simplynadaf@gmail.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stay secure!&lt;/strong&gt; 🔒&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>agents</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Scaling AWS Serverless in Production: Event Sources, Throttling, and Zero-Downtime Deploys</title>
      <dc:creator>Collins Ushi</dc:creator>
      <pubDate>Mon, 20 Apr 2026 07:58:00 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/tighter-and-more-concrete-scaling-aws-serverless-in-production-event-sources-throttling-and-icn</link>
      <guid>https://future.forem.com/aws-builders/tighter-and-more-concrete-scaling-aws-serverless-in-production-event-sources-throttling-and-icn</guid>
      <description>&lt;p&gt;"Serverless scales automatically" is one of those claims that is technically true and practically misleading. The platform will scale your code but the rate, the ceiling, and most importantly the failure modes of that scaling are determined by decisions you make at three specific layers of the system. Get any of them wrong and your perfectly elastic, pay-per-use architecture will either stall out during a traffic spike, silently corrupt data under load, or quietly DDoS itself into a tarpit.&lt;/p&gt;

&lt;p&gt;This post is about the parts of serverless scaling that aren't on the marketing page. It's organised around the three boundaries where scale is actually decided: the event source that feeds your functions, the throughput quotas AWS enforces against you, and the deployment pipeline that ships changes without breaking production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three boundaries where scale is decided
&lt;/h2&gt;

&lt;p&gt;Every serverless system has the same topology. Work enters at an edge (API Gateway, an event bus, a queue, a stream). It's handed to a compute layer (Lambda, Fargate) that does the work. The compute layer writes to downstream systems (DynamoDB, S3, another queue, a third-party API). And a control plane (the Lambda service itself, plus IAM and your deployment tooling) governs how much of this can happen concurrently.&lt;/p&gt;

&lt;p&gt;Almost every scaling problem I've seen in production falls into one of three buckets:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Event source misconfiguration. The Lambda is fine, but the queue or stream feeding it is throttling throughput, triggering duplicate deliveries, or creating head-of-line blocking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Quota collision. Your function can scale, but something upstream or downstream can't - API Gateway burst, downstream database connections, account-wide Lambda concurrency, a third-party rate limit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deployment fragility. The system scales correctly, but a bad deploy takes it down globally in thirty seconds because there's no canary and no automatic rollback.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The rest of this post works through each of those three layers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 1: SQS-backed Lambda
&lt;/h2&gt;

&lt;p&gt;Lambda integrates with SQS through a pull model that most people never inspect. When you connect a queue to a function, AWS stands up a small fleet of pollers on your behalf, typically starting at five, which continuously ask the queue "any work?" and invoke your function when the answer is yes.&lt;/p&gt;

&lt;p&gt;That polling fleet is the scaling unit. As the queue backlog grows, AWS adds pollers, which spawn more concurrent function invocations, which drain the queue faster. The ramp continues until one of three things happens: the queue empties, your function's reserved concurrency ceiling is hit, or the account-wide concurrency limit is reached.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Batch size and batch window&lt;/strong&gt;&lt;br&gt;
The single biggest throughput lever on an SQS-backed Lambda is how much work each invocation does. By default, a poller might hand your function a single message, which is wasteful because the invocation overhead dominates. Raising the batch size (up to 10,000 for standard queues, 10 for FIFO) lets a single invocation drain many messages at once.&lt;/p&gt;

&lt;p&gt;The tradeoff is latency. If you tell Lambda to wait for 10 messages before invoking, and traffic is light, the first message in the batch sits idle until nine others arrive. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;MaximumBatchingWindowInSeconds&lt;/code&gt; setting (the batch window) puts a ceiling on that wait. It says "gather up to N messages, but if this many seconds pass, send whatever you have." Setting it to a few seconds typically captures most of the batching benefit while keeping tail latency bounded.&lt;/p&gt;
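&lt;p&gt;To make the two levers concrete, here is how they map onto the event source mapping parameters. The queue ARN, function name, and values below are placeholder assumptions, not recommendations:&lt;/p&gt;

```python
# Placeholders: substitute your own queue ARN and function name.
sqs_mapping = {
    "EventSourceArn": "arn:aws:sqs:eu-west-1:123456789012:orders-queue",
    "FunctionName": "process-orders",
    "BatchSize": 1000,                    # up to 10,000 for standard queues, 10 for FIFO
    "MaximumBatchingWindowInSeconds": 5,  # flush a partial batch after 5 seconds
}
# Applied with: boto3.client("lambda").create_event_source_mapping(**sqs_mapping)
```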

&lt;p&gt;&lt;strong&gt;The visibility timeout rule&lt;/strong&gt;&lt;br&gt;
When a poller hands your function a batch, those messages become invisible to other pollers for a configurable period, the visibility timeout. If the function succeeds, it deletes the messages. If the function crashes or times out, the messages become visible again and get retried.&lt;/p&gt;

&lt;p&gt;The failure mode to understand is subtle. Suppose your function has a 10 second timeout and the queue's visibility timeout is also 10 seconds. If a single invocation hits a slow downstream and runs for 9 seconds, then deletes the messages at 9.5 seconds, you're fine. But if anything causes the invocation to slip past 10 seconds, the message reappears in the queue while the original function is still running. A second function picks it up. Now two invocations are processing the same message: duplicated work, possible data corruption, and, if the downstream isn't idempotent, a real mess.&lt;/p&gt;

&lt;p&gt;The rule of thumb is to set the visibility timeout to at least six times the function timeout. Overkill? Yes. But this is one of those parameters where being paranoid costs you nothing and the failure mode is insidious enough that you want margin.&lt;/p&gt;
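&lt;p&gt;The rule is trivial arithmetic, but worth encoding next to your infrastructure code so it survives timeout changes. The 30-second function timeout below is an example value:&lt;/p&gt;

```python
FUNCTION_TIMEOUT_SECONDS = 30  # example value; read this from your Lambda config

# Rule of thumb from above: visibility timeout at least 6x the function timeout,
# so a slow invocation never overlaps with a retry of the same message.
visibility_timeout = 6 * FUNCTION_TIMEOUT_SECONDS

# Applied with: sqs.set_queue_attributes(QueueUrl=..., Attributes=queue_attributes)
queue_attributes = {"VisibilityTimeout": str(visibility_timeout)}
```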
&lt;h2&gt;
  
  
  Layer 2: Kinesis-backed Lambda
&lt;/h2&gt;

&lt;p&gt;Kinesis looks like SQS on the surface but behaves nothing like it. SQS is a buffer: messages are independent, order doesn't matter, and consumers scale horizontally. Kinesis is an ordered stream: records are partitioned into shards, order within a shard matters, and concurrency is fundamentally bounded by shard count.&lt;/p&gt;

&lt;p&gt;The rule is one Lambda execution environment per shard. A stream with four shards gets four concurrent Lambda invocations regardless of backlog size. You could have a billion records waiting and still have only four workers draining them. This is the piece that bites teams migrating from SQS: "just add more Lambda" doesn't work.&lt;/p&gt;

&lt;p&gt;There are two ways to scale past the shard ceiling. The infrastructure answer is to reshard: splitting four shards into eight doubles your concurrency. The software answer is the Parallelization Factor, which lets a single shard be processed by up to 10 concurrent Lambda invocations, as long as records with the same partition key are still delivered to the same invocation. Order is preserved within a partition key, not across the whole shard. For most analytics and event-processing workloads, that's a meaningful distinction that buys you a 10x concurrency boost without resharding.&lt;/p&gt;
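&lt;p&gt;In event source mapping terms, the software answer is a single parameter. The stream ARN and function name below are placeholders:&lt;/p&gt;

```python
# Placeholders: substitute your own stream ARN and function name.
kinesis_mapping = {
    "EventSourceArn": "arn:aws:kinesis:eu-west-1:123456789012:stream/clickstream",
    "FunctionName": "process-clicks",
    "StartingPosition": "LATEST",
    "ParallelizationFactor": 10,  # up to 10 concurrent invocations per shard
}
# Applied with: boto3.client("lambda").create_event_source_mapping(**kinesis_mapping)
```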

&lt;p&gt;&lt;strong&gt;Iterator age: the lag signal&lt;/strong&gt;&lt;br&gt;
In SQS you watch queue depth to know you're falling behind. In Kinesis you watch iterator age - the age of the most recent record your function has processed. A flat iterator age means you're keeping up. A climbing iterator age means records are entering the stream faster than you can drain them, and data is aging toward the retention cliff. If iterator age crosses retention (24 hours by default, up to 365 days with extended retention), records fall off the back of the stream and are gone.&lt;/p&gt;

&lt;p&gt;Iterator age is the single most important metric to alarm on for any Kinesis-backed Lambda. Queue depth tells you about volume; iterator age tells you about time remaining before data loss.&lt;/p&gt;
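&lt;p&gt;As a sketch, wiring up that alarm with boto3 might look like this; the one-hour threshold and the names are assumptions, so tune them against your stream's actual retention:&lt;/p&gt;

```python
def iterator_age_alarm_kwargs(function_name, threshold_ms=3_600_000):
    """Arguments for cloudwatch.put_metric_alarm(**kwargs): fire when the
    function's iterator age exceeds one hour, i.e. records are aging
    toward the retention cliff."""
    return {
        "AlarmName": f"{function_name}-iterator-age",
        "Namespace": "AWS/Lambda",
        "MetricName": "IteratorAge",  # reported in milliseconds
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": float(threshold_ms),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [],  # add an SNS topic ARN here
    }
```

&lt;p&gt;Pass the result to &lt;code&gt;boto3.client('cloudwatch').put_metric_alarm(**kwargs)&lt;/code&gt;.&lt;/p&gt;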

&lt;p&gt;&lt;strong&gt;Enhanced Fan-Out&lt;/strong&gt;&lt;br&gt;
The default Kinesis read bandwidth is 2 MB/s per shard, shared across all consumers. Attach a Lambda and a Firehose to the same stream and each effectively gets 1 MB/s. Add a third consumer and now everyone gets roughly 667 KB/s. This is the noisy-neighbour problem applied to data streams.&lt;/p&gt;

&lt;p&gt;Enhanced Fan-Out solves it by giving each registered consumer its own dedicated 2 MB/s pipe. For production pipelines with multiple downstream consumers, this is not optional; it's the difference between a stream that scales with consumers and one that gets slower with every addition.&lt;/p&gt;
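&lt;p&gt;The arithmetic is worth making explicit (a sketch; 2 MB/s per shard is the published default):&lt;/p&gt;

```python
def per_consumer_read_mb_s(consumer_count, enhanced_fanout=False, shard_read_mb_s=2.0):
    """Per-consumer read bandwidth for a single shard, in MB/s."""
    if enhanced_fanout:
        return shard_read_mb_s  # dedicated pipe per registered consumer
    return shard_read_mb_s / consumer_count  # shared polling bandwidth

print(per_consumer_read_mb_s(3))        # shared: each consumer gets ~0.67 MB/s
print(per_consumer_read_mb_s(3, True))  # Enhanced Fan-Out: each keeps 2.0 MB/s
```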
&lt;h2&gt;
  
  
  DynamoDB Streams vs Kinesis Data Streams
&lt;/h2&gt;

&lt;p&gt;When you need to capture changes from DynamoDB, you have two architecturally similar but operationally very different options. Both use shards as the parallelism unit, but the management model diverges sharply.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fux25t4aype0ajyswilpy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fux25t4aype0ajyswilpy.png" alt=" " width="666" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The choice is almost entirely about how much scaling responsibility you want to own. DynamoDB Streams is the right default for triggers, CDC to a single downstream, and most small-to-medium workloads - you pay nothing for operational simplicity. Kinesis Data Streams is the right choice when you have many consumers, need long replay windows (reprocessing the last week of events for a new feature is a common pattern), or need dedicated bandwidth per consumer for SLA reasons.&lt;/p&gt;
&lt;h2&gt;
  
  
  Poison pills and the negative scaling trap
&lt;/h2&gt;

&lt;p&gt;There's a counterintuitive behaviour of Lambda's event source integrations that every serverless team eventually discovers the hard way. If your function starts returning errors at a high rate (crashing, timing out, throwing exceptions), the Lambda service doesn't scale up to retry faster. It scales down. It reduces polling rate, reduces concurrency, and backs off.&lt;/p&gt;

&lt;p&gt;From Lambda's perspective this is sensible: a wave of errors probably means a downstream database is struggling, and pouring more traffic at it will turn a degradation into an outage. The service is protecting your infrastructure from your own code. But for an operations team watching the dashboard, this self-imposed slowdown shows up as a rapidly climbing iterator age or queue depth exactly when you can least afford it.&lt;/p&gt;

&lt;p&gt;The way out is to stop throwing hard errors when individual records fail. Instead of letting one bad record crash the entire batch, use the &lt;code&gt;ReportBatchItemFailures&lt;/code&gt; response pattern. This tells Lambda "the invocation succeeded overall, but here are the specific record IDs that failed; retry those, and treat the rest as done." The healthy records move forward, the failed ones go to a DLQ or on-failure destination, and Lambda sees a succeeding function and maintains full polling velocity.&lt;/p&gt;

&lt;p&gt;Here's a clean implementation of the pattern for an SQS or DynamoDB Streams-backed function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process a batch, reporting per-record failures to preserve scaling velocity.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_extract_record_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_extract_payload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;process_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Log with structured context so the failure is diagnosable
&lt;/span&gt;            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;record_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}))&lt;/span&gt;
            &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;itemIdentifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record_id&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# Returning this shape keeps the invocation status "Success" from Lambda's
&lt;/span&gt;    &lt;span class="c1"&gt;# perspective, while telling the poller exactly which records to retry.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batchItemFailures&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_record_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;SQS uses messageId; DynamoDB/Kinesis use sequenceNumber.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messageId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SequenceNumber&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequenceNumber&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_payload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NewImage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Empty payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Actual business logic here idempotent, please
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You also need to enable &lt;code&gt;ReportBatchItemFailures&lt;/code&gt; on the event source mapping itself (in SAM, CDK, or the console); the function-side response is inert without it.&lt;/p&gt;
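&lt;p&gt;With boto3, the mapping-side switch is the &lt;code&gt;FunctionResponseTypes&lt;/code&gt; field. A sketch (the names, ARN, and batch size are placeholders):&lt;/p&gt;

```python
def esm_kwargs(function_name, source_arn):
    """Arguments for lambda_client.create_event_source_mapping(**kwargs).
    Without FunctionResponseTypes, a returned batchItemFailures list is
    silently ignored and the whole batch is retried on any failure."""
    return {
        "FunctionName": function_name,
        "EventSourceArn": source_arn,
        "BatchSize": 10,
        "FunctionResponseTypes": ["ReportBatchItemFailures"],
    }
```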

&lt;h2&gt;
  
  
  Layer 2 intermission: the token bucket
&lt;/h2&gt;

&lt;p&gt;Underneath almost every AWS throttling decision is the same algorithm: the token bucket. API Gateway uses it for per-route throttling. Lambda uses it for burst concurrency. DynamoDB uses it for provisioned-throughput tables. Every AWS SDK client uses one internally for retry management. Understanding it is the difference between tuning limits with intent and adjusting them until the alarms stop firing.&lt;/p&gt;

&lt;p&gt;The mental model has three pieces. The bucket has a maximum capacity (the burst limit) and starts full. Each successful request consumes one token; if the bucket is empty when a request arrives, the request is throttled (HTTP 429). Tokens refill at a steady rate (the rate limit), expressed as requests per second. A bucket with a 1,000-request burst and a 100 RPS refill rate can absorb a 1,000-request spike instantly, but then needs 10 seconds of zero traffic to fully recover its burst capacity.&lt;/p&gt;
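&lt;p&gt;The whole algorithm fits in a dozen lines. A minimal sketch (class and parameter names are mine):&lt;/p&gt;

```python
class TokenBucket:
    """Burst-sized capacity, steady refill, one token per request."""

    def __init__(self, burst, rate_per_s):
        self.capacity = burst
        self.tokens = float(burst)  # the bucket starts full
        self.rate = rate_per_s

    def refill(self, elapsed_s):
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed_s)

    def try_acquire(self):
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # empty bucket: HTTP 429

bucket = TokenBucket(burst=1000, rate_per_s=100)
served = sum(bucket.try_acquire() for _ in range(1500))
print(served)  # 1000: the spike is absorbed up to the burst, the rest throttled
```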

&lt;p&gt;Three things about this are operationally painful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CloudWatch lies about it&lt;/strong&gt;. Standard metrics aggregate over 1- or 5-minute windows. If 6,000 requests arrive at 100 RPS evenly across a minute, the graph looks identical to 6,000 requests arriving in the first five seconds. The first scenario is healthy; the second emptied your bucket, throttled hundreds of requests, and then sat idle. The only metric that tells you the truth is the throttle count itself: in API Gateway, &lt;code&gt;4XXError&lt;/code&gt; or &lt;code&gt;ThrottledCount&lt;/code&gt;; in Lambda, &lt;code&gt;Throttles&lt;/code&gt;. Alarm on throttles, not on request counts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enforcement is distributed&lt;/strong&gt;. There isn't one bucket sitting on one server. API Gateway enforces its quotas across a fleet of nodes, and tokens don't refill in perfect synchrony across all of them. At the edges you'll see "jitter": a request throttled on node A that would have succeeded on node B a millisecond later. This is why single-burst load tests often pass and then production fails: you tested an idealised bucket, not the real distributed one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mismatched buckets upstream and downstream create phantom capacity&lt;/strong&gt;. If API Gateway has a 5,000 RPS burst but the Lambda it fronts has a reserved concurrency of 500, the API Gateway quota is fiction. The real ceiling is 500. Every quota in your chain has to be reconciled against the weakest link, or you'll think you have headroom you don't actually have.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical tuning&lt;/strong&gt;&lt;br&gt;
Four habits make token-bucket behaviour predictable in practice:&lt;br&gt;
First, implement exponential backoff with jitter on every client. A fixed backoff from 100 simultaneous throttled clients causes all 100 to retry at exactly the same millisecond, re-emptying the bucket instantly. Randomised backoff spreads the retries out so the bucket has time to refill between waves.&lt;/p&gt;
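&lt;p&gt;A sketch of the "full jitter" variant of that backoff (the base and cap values are illustrative):&lt;/p&gt;

```python
import random

def backoff_with_full_jitter(attempt, base_s=0.1, cap_s=20.0):
    """Sleep a random duration between 0 and min(cap, base * 2^attempt),
    so a wave of throttled clients spreads its retries out instead of
    re-emptying the bucket in lockstep."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return random.uniform(0.0, ceiling)
```

&lt;p&gt;The randomisation is the point: with a fixed schedule, every throttled client computes the same sleep and the retry wave arrives intact.&lt;/p&gt;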

&lt;p&gt;Second, calculate time-to-refill explicitly: &lt;code&gt;refill_seconds = burst_limit / rate_limit&lt;/code&gt;. If your burst is 1,000 and your rate is 100, you need 10 seconds of quiet to recover full burst capacity. If your traffic is continuous, you may never recover it, which means your effective capacity is the rate limit, not the burst.&lt;/p&gt;

&lt;p&gt;Third, load-test for sustained burst, not just peak. A burst of 500 with a rate of 100 RPS can absorb 200 RPS for about five seconds before the bucket drains; after that you'll see ~50% throttling. If your expected peak is sustained, you need to size the rate limit, not the burst.&lt;/p&gt;
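&lt;p&gt;The drain arithmetic generalises into two lines (a sketch; names are mine):&lt;/p&gt;

```python
def sustained_load_profile(burst, rate_rps, offered_rps):
    """Returns (seconds until the bucket drains, steady-state throttled
    fraction) for a sustained offered load."""
    if offered_rps > rate_rps:
        drain_s = burst / (offered_rps - rate_rps)
        throttled = (offered_rps - rate_rps) / offered_rps
        return drain_s, throttled
    return float("inf"), 0.0  # refill keeps up: no throttling

print(sustained_load_profile(500, 100, 200))  # (5.0, 0.5): 5 s of grace, then 50% throttled
```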

&lt;p&gt;Fourth, use Lambda Provisioned Concurrency as a "floor of warm tokens" for latency-sensitive paths, but understand the cost. Provisioned concurrency is subtracted from your account's unreserved pool. Provisioning 500 units for one function permanently removes those 500 units from every other function in the account, even when your provisioned function is idle. Over-provisioning quietly starves the rest of your workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pre-production scaling review
&lt;/h2&gt;

&lt;p&gt;Before putting any ingestion-heavy serverless pipeline in front of real traffic, there are four questions worth writing down the answers to. I've seen each of them caught in review and missed in launch, with predictable outcomes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are the hard limits in every hop of this chain?&lt;/strong&gt; &lt;br&gt;
Not the defaults, the actual limits on this account, in this region, this month. Lambda concurrency, API Gateway RPS, DynamoDB provisioned throughput, SQS message size, Kinesis shard count. Put them in a table. The ceiling of the whole system is the lowest number on the page.&lt;/p&gt;
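&lt;p&gt;Once the table exists, finding the ceiling is one line. A sketch with illustrative numbers (normalise every hop's quota to effective requests per second before comparing):&lt;/p&gt;

```python
def system_ceiling(hop_limits_rps):
    """The chain's real capacity is its weakest hop."""
    weakest = min(hop_limits_rps, key=hop_limits_rps.get)
    return weakest, hop_limits_rps[weakest]

limits_rps = {        # illustrative: use your account's actual quotas
    "api_gateway": 5000,
    "lambda": 500,    # concurrency divided by average duration in seconds
    "dynamodb": 2000,
}
print(system_ceiling(limits_rps))  # ('lambda', 500)
```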

&lt;p&gt;*&lt;em&gt;Is the timeout hierarchy consistent? *&lt;/em&gt;&lt;br&gt;
The function timeout must be shorter than the visibility timeout, which must be shorter than the retry window, which must be shorter than any upstream timeout. Any inversion creates ghost retries invocations that succeed but get replayed because the upstream decided they'd failed.&lt;/p&gt;
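&lt;p&gt;The hierarchy check is trivial to automate. A sketch (all values in seconds):&lt;/p&gt;

```python
def timeout_hierarchy_ok(function_timeout, visibility_timeout, retry_window, upstream_timeout):
    """True when every layer leaves room for the one below it: function
    timeout, then visibility timeout, then retry window, then upstream
    timeout, each strictly larger than the last."""
    chain = [function_timeout, visibility_timeout, retry_window, upstream_timeout]
    return all(later > earlier for earlier, later in zip(chain, chain[1:]))

print(timeout_hierarchy_ok(10, 60, 300, 900))  # True
print(timeout_hierarchy_ok(10, 10, 300, 900))  # False: the SQS trap from earlier
```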

&lt;p&gt;&lt;strong&gt;What's the error strategy, and is it written down?&lt;/strong&gt; &lt;br&gt;
Is this system at-least-once or exactly-once? When a record fails, does it halt the pipeline (preserving order, stopping throughput) or go to a dead-letter queue (preserving throughput, losing order)? There's no universally right answer, but there is a right answer for your business and it should be decided before traffic arrives, not during an incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are there native integrations you're replacing with custom glue?&lt;/strong&gt; If you're moving data from Kinesis to S3 with a Lambda function, you're probably reimplementing Amazon Data Firehose, badly. If you're parsing DynamoDB Stream records and writing them to OpenSearch with a Lambda, the Zero-ETL integration likely exists. Custom glue is the highest-maintenance part of any pipeline; push it into a managed service wherever possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 3: Shipping changes without breaking production
&lt;/h2&gt;

&lt;p&gt;A perfectly scaling system is one bad deploy away from an outage. The final part of the playbook is the deployment pipeline: specifically, how SAM and CodeDeploy work together to make Lambda deploys boring.&lt;/p&gt;

&lt;p&gt;The core primitives are Lambda versions (immutable snapshots of function code) and aliases (mutable pointers to versions, like &lt;code&gt;live&lt;/code&gt; or &lt;code&gt;canary&lt;/code&gt;). A SAM template with &lt;code&gt;AutoPublishAlias: live&lt;/code&gt; tells the deploy pipeline: every time my code changes, publish a new immutable version and shift the &lt;code&gt;live&lt;/code&gt; alias to point to it gradually, with monitoring and a kill switch.&lt;/p&gt;

&lt;p&gt;The mechanism behind that gradual shift is &lt;code&gt;DeploymentPreference&lt;/code&gt;. Three strategies are available:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AllAtOnce&lt;/strong&gt;: the default Lambda behaviour. Instant cutover. Fast and risky; appropriate only for non-production or for tooling functions where a failed invocation is inconvenient but not expensive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Linear&lt;/strong&gt;: shift traffic in fixed increments (e.g., &lt;code&gt;Linear10PercentEvery10Minutes&lt;/code&gt;). Simple, predictable, and gives alarms time to notice problems before they're global.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Canary&lt;/strong&gt;: shift a small slice (say 10%) immediately, hold for a configurable bake time, then shift the rest. Reaches full rollout faster than linear while still catching regressions on the canary slice.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
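&lt;p&gt;In SAM terms, the canary variant looks roughly like the following (resource, alarm, and hook names are illustrative):&lt;/p&gt;

```yaml
MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: app.handler
    AutoPublishAlias: live
    DeploymentPreference:
      Type: Canary10Percent15Minutes
      Alarms:
        - !Ref ErrorRateAlarm        # any alarm firing rolls traffic back
        - !Ref P99LatencyAlarm
      Hooks:
        PreTraffic: !Ref PreTrafficHookFunction
        PostTraffic: !Ref PostTrafficHookFunction
```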

&lt;p&gt;&lt;strong&gt;The deployment runs four phases:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Publish&lt;/strong&gt;. SAM publishes the new version as an immutable snapshot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pre-traffic validation.&lt;/strong&gt; CodeDeploy invokes a PreTraffic Lambda hook you provide: a synthetic transaction or smoke test run against the new version before any real traffic sees it. If the hook fails, the deploy halts immediately and &lt;code&gt;live&lt;/code&gt; stays on the old version.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weighted traffic shift.&lt;/strong&gt; CodeDeploy updates the alias to use weighted routing, sending a configurable percentage to the new version and the rest to the old. During the shift window, it watches the CloudWatch alarms you've listed (typically error rate, p99 latency, downstream throttling). If any alarm fires, traffic snaps back to 100% on the old version automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Post-traffic validation.&lt;/strong&gt; Once the shift completes, CodeDeploy runs a PostTraffic hook for final verification, then marks the deploy done.&lt;/p&gt;

&lt;p&gt;This is what "safe deploys" actually means in practice: not a manual runbook, but a machine that's watching metrics and can undo itself faster than you can type.&lt;/p&gt;

&lt;h2&gt;
  
  
  SAM or CDK?
&lt;/h2&gt;

&lt;p&gt;Both deploy via CloudFormation under the hood. SAM is declarative (YAML), with shorthand resources like &lt;code&gt;AWS::Serverless::Function&lt;/code&gt; that expand into a dozen primitive resources; it's the right choice when your infrastructure is mostly serverless and mostly stable. CDK is imperative (TypeScript, Python) and gives you loops, conditionals, abstractions, and IDE autocomplete; it's the right choice when your infrastructure has real logic, many environments, or needs reusable constructs across teams. For a single-team serverless app, SAM will get you there faster. For a platform that many teams build on, CDK's abstraction power pays off.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four habits that keep the pipeline boring
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Test at every stage, not at the end.&lt;/strong&gt; Linters and unit tests in the build stage; integration tests against a deployed staging environment; synthetic transactions in pre-traffic hooks; post-deploy smoke tests. Each stage catches a different class of bug and none of them are substitutes for each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One AWS account per environment.&lt;/strong&gt; Dev, staging, and production should be separate accounts, not separate regions or separate resource prefixes in one account. The boundary is for blast radius (a compromised dev IAM role can't reach production), cost attribution (one bill per environment), and accident prevention (you can't accidentally &lt;code&gt;terraform destroy&lt;/code&gt; prod if prod is in an account you're not authenticated against).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One template, parameterised per environment.&lt;/strong&gt; If your staging and production templates diverge, you stop testing production in staging. Use CloudFormation parameters for environment-specific values (table names, instance sizes, domain names) and keep the resource shape identical across environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secrets in Secrets Manager or Parameter Store, referenced dynamically.&lt;/strong&gt; Never bake credentials into environment variables at deploy time; you'll end up redeploying the app to rotate a secret. Reference secrets by ARN in the template, grant the function IAM permission to read them, and fetch them at runtime (with caching). Rotation becomes a secrets-manager operation, not a code deploy.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>awscommunity</category>
      <category>awscommunitybuilder</category>
    </item>
    <item>
      <title>Personal token factory: OpenClaw in AWS but Nvidia GB10 at home</title>
      <dc:creator>Piotr Pabis</dc:creator>
      <pubDate>Mon, 20 Apr 2026 07:14:17 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/personal-token-factory-openclaw-in-aws-but-nvidia-gb10-at-home-3klk</link>
      <guid>https://future.forem.com/aws-builders/personal-token-factory-openclaw-in-aws-but-nvidia-gb10-at-home-3klk</guid>
      <description>&lt;p&gt;Even though Nemotron 3 Super is still free on OpenRouter, the agreement is that you donate all your exchanges to Nvidia for training. A paid version is available also quite cheaply ($0.10/M input, $0.50/M output.) I still decided to utilize my ASUS GX10 (aka DGX Spark aka GB10) as the token source for my agent hosted at AWS. But the trick here is the following: I don't want to open my home network to the outside Internet! Another rule: I don't want to pay for any public IPv4 address to AWS. Will I be able to achieve that? That's actually simple! Let me guide you today how I achieved that setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanxmh8apbb4iw3ivtat2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanxmh8apbb4iw3ivtat2.jpg" alt="Diagram of the setup" width="800" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Above you can see that I plan to tunnel between the home and AWS networks using Wireguard. I could potentially make this site-to-site, but to keep it simpler and safer for my home devices, I will only connect to the Wireguard server (listener) on the AWS side, with the DGX Spark as a client. I will do it over IPv6. At the bottom you can see that my home devices have normal connectivity to both the IPv6 and IPv4 Internet, but on the AWS side I am relying solely on IPv6 (although internally the VPC still uses a private IPv4 range). There will be some issues with that, but we will fix them later.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Companion GitHub repo: &lt;a href="https://github.com/ppabis/wireguard-openclaw-dgx" rel="noopener noreferrer"&gt;github.com/ppabis/wireguard-openclaw-dgx&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up VPC and Wireguard server
&lt;/h2&gt;

&lt;p&gt;I have created a simple VPC using a module, with the range &lt;code&gt;10.189.80.0/21&lt;/code&gt; and IPv6 enabled. It has an Internet Gateway for the public subnets and an egress-only Internet Gateway for IPv6 outbound connectivity. I disabled the NAT gateway on purpose; we will cover this issue later. Instances in the public subnet will be reachable over IPv6 from the outside - this is where we will place our Wireguard server. OpenClaw will remain inside the private subnet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_availability_zones"&lt;/span&gt; &lt;span class="s2"&gt;"available"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"available"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"vpc"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/vpc/aws"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 6.6.0"&lt;/span&gt;

  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-vpc"&lt;/span&gt;
  &lt;span class="nx"&gt;cidr&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.189.80.0/21"&lt;/span&gt;

  &lt;span class="nx"&gt;azs&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_availability_zones&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;available&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;public_subnets&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.189.80.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.189.81.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.189.82.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;private_subnets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.189.83.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.189.84.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.189.85.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;enable_dns_hostnames&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;enable_dns_support&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;enable_nat_gateway&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

  &lt;span class="nx"&gt;enable_ipv6&lt;/span&gt;                                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;public_subnet_assign_ipv6_address_on_creation&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;private_subnet_assign_ipv6_address_on_creation&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;public_subnet_ipv6_prefixes&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;private_subnet_ipv6_prefixes&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;private_subnet_tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"private"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;public_subnet_tags&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"public"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
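
&lt;p&gt;A quick note on the &lt;code&gt;*_ipv6_prefixes&lt;/code&gt; lists: AWS assigns the VPC a /56 block, and each prefix index selects one /64 subnet inside it. With a hypothetical VPC block of &lt;code&gt;2a05:d012:e21:2300::/56&lt;/code&gt; (yours will differ), the indices map out like this sketch shows:&lt;/p&gt;

```shell
# Hypothetical /56 the VPC received from AWS; only the last byte of the
# fourth hextet is free for subnetting into /64s.
base='2a05:d012:e21:23'
for idx in 0 1 2 3 4 5; do
  # Index N fills that low byte (in hex), selecting the N-th /64.
  printf 'prefix %d selects %s%02x::/64\n' "$idx" "$base" "$idx"
done
```

So prefixes 0-2 become the three public /64s and 3-5 the private ones, matching the subnet lists in the module.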



&lt;h2&gt;
  
  
  Creating Wireguard server instance
&lt;/h2&gt;

&lt;p&gt;First of all, I will create an EC2 instance with the latest Amazon Linux 2023. It already includes WireGuard in its repositories, and the update servers work over IPv6. I will run my WireGuard server on port &lt;code&gt;51280&lt;/code&gt;, so that's what I will open on the security group. All egress should also be open. I will attach an IAM role to it as well - no permissions yet, but it will come in handy later. First, define all the smaller components.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_ssm_parameter"&lt;/span&gt; &lt;span class="s2"&gt;"al2023_arm64"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name_prefix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"wireguard-sg"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;51280&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;51280&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"udp"&lt;/span&gt;
    &lt;span class="nx"&gt;ipv6_cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"::/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;egress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"-1"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;ipv6_cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"::/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard_assume_role_policy"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;principals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Service"&lt;/span&gt;
      &lt;span class="nx"&gt;identifiers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ec2.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"wg-ec2-role"&lt;/span&gt;
  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard_assume_role_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_instance_profile"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"wg-ec2-profile"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And finally we can define the EC2 instance. It will be the cheapest and smallest one I can find, which is &lt;code&gt;t4g.nano&lt;/code&gt;. The AMI ID is pulled from a publicly shared SSM parameter (easier than an AMI data source in Terraform). On the networking side, I'm disabling public IPv4 assignment, forcing at least one IPv6 address, and disabling the source/destination check (more on that later). I didn't define any SSH connectivity, nor EC2 Instance Connect or Session Manager. The cheapest option here would be to define a key pair and open port &lt;code&gt;22&lt;/code&gt; over IPv6 to your subnet if you need debugging.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3zcptdl3gik6sz0dd3ar.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3zcptdl3gik6sz0dd3ar.jpg" alt="EC2 instance diagram" width="637" height="304"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_instance"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ami&lt;/span&gt;                         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_ssm_parameter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;al2023_arm64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;
  &lt;span class="nx"&gt;instance_type&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"t4g.nano"&lt;/span&gt;
  &lt;span class="nx"&gt;iam_instance_profile&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_instance_profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_subnets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_security_group_ids&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;associate_public_ip_address&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;ipv6_address_count&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="nx"&gt;source_dest_check&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

  &lt;span class="nx"&gt;user_data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_data&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Wireguard"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ignore_changes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;ami&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"ipv6"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ipv6_addresses&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing and configuring Wireguard on the server
&lt;/h2&gt;

&lt;p&gt;How would we install the server if we have no SSH or any other shell access to the instance? As you see above, I have defined user data. With plain Bash, this is just a script that runs on the first instance boot. For our use case, we want to be able to control the instance contents dynamically, so we will use cloud-init, which allows for more flexibility in the user data contents. We will start with a draft defining "attachments" containing both cloud-init's cloud-config (YAML) and a standard Bash script that runs on boot. I'm writing this in a new file called &lt;code&gt;user-data.yaml&lt;/code&gt; (although it's not a valid YAML file).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Content-Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;multipart/mixed; boundary="//"&lt;/span&gt;
&lt;span class="na"&gt;MIME-Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt;

&lt;span class="s"&gt;--//&lt;/span&gt;
&lt;span class="na"&gt;Content-Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;text/cloud-config; charset="us-ascii"&lt;/span&gt;
&lt;span class="na"&gt;MIME-Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt;
&lt;span class="na"&gt;Content-Transfer-Encoding&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;7bit&lt;/span&gt;
&lt;span class="na"&gt;Content-Disposition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;attachment; filename="cloud-config.txt"&lt;/span&gt;

&lt;span class="c1"&gt;#cloud-config&lt;/span&gt;
&lt;span class="na"&gt;package_update&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="s"&gt;--//&lt;/span&gt;
&lt;span class="na"&gt;Content-Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;text/x-shellscript; charset="us-ascii"&lt;/span&gt;
&lt;span class="na"&gt;MIME-Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt;
&lt;span class="na"&gt;Content-Transfer-Encoding&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;7bit&lt;/span&gt;
&lt;span class="na"&gt;Content-Disposition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;attachment; filename="userdata.txt"&lt;/span&gt;

&lt;span class="c1"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="s"&gt;echo "Configuring WireGuard..."&lt;/span&gt;
&lt;span class="s"&gt;--//--&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;em&gt;Side note&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;If you know cloud-init, you might wonder why I also want to use user data. For some reason &lt;code&gt;runcmd&lt;/code&gt; doesn't always execute when I want it to, and &lt;code&gt;bootcmd&lt;/code&gt; happens too early.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As you can see, we have two sections in this file. The first is a cloud-config that will just update packages on startup. The second, a script, will also execute on first boot and just echo a message. Now we can configure each module to run on every restart. In the cloud-config, add this (inserted between the boundaries):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;#cloud-config&lt;/span&gt;
&lt;span class="na"&gt;package_update&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;cloud_final_modules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;scripts-user&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;cloud_config_modules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;write_files&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;package_update_upgrade_install&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now all the sections (that we will define soon) will run on every instance reboot. This gives us some flexibility in changing the configuration (it requires a reboot, but how often do you plan to change this 😄). Let us install the required packages: WireGuard itself, iptables for routing capabilities, and &lt;code&gt;tmux&lt;/code&gt; and &lt;code&gt;htop&lt;/code&gt;, which come in handy if you need to debug.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;packages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;wireguard-tools&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;iptables-nft&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;htop&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tmux&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's proceed with the WireGuard configuration. As previously stated, I want to use &lt;code&gt;10.155.222.0/24&lt;/code&gt; as the subnet and listen on port &lt;code&gt;51280&lt;/code&gt;. The private key will be a placeholder for safety reasons. When the tunnel is brought up, we are going to enable IP forwarding in the kernel and allow forwarding between &lt;code&gt;wg0&lt;/code&gt;, WireGuard's interface, and &lt;code&gt;ens5&lt;/code&gt; (the primary network card, at least in AL2023). The router (server) address will be the first one in the subnet: &lt;code&gt;10.155.222.1&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;write_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/wireguard/wg0.conf&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|-&lt;/span&gt;
      &lt;span class="s"&gt;[Interface]&lt;/span&gt;
      &lt;span class="s"&gt;Address    = 10.155.222.1/24&lt;/span&gt;
      &lt;span class="s"&gt;ListenPort = 51280&lt;/span&gt;
      &lt;span class="s"&gt;PrivateKey = _PRIVATE_KEY_&lt;/span&gt;

      &lt;span class="s"&gt;# Enable routing + NAT for WG clients to reach VPC&lt;/span&gt;
      &lt;span class="s"&gt;PostUp   = sysctl -w net.ipv4.ip_forward=1&lt;/span&gt;
      &lt;span class="s"&gt;PostUp   = iptables -A FORWARD -i wg0 -o ens5 -j ACCEPT&lt;/span&gt;
      &lt;span class="s"&gt;PostUp   = iptables -A FORWARD -i ens5 -o wg0 -j ACCEPT&lt;/span&gt;

      &lt;span class="s"&gt;PostDown = iptables -D FORWARD -i wg0 -o ens5 -j ACCEPT&lt;/span&gt;
      &lt;span class="s"&gt;PostDown = iptables -D FORWARD -i ens5 -o wg0 -j ACCEPT&lt;/span&gt;
      &lt;span class="s"&gt;# End of file&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This configuration is not usable yet - we still need another part of the user data attachment. In the Bash script that runs on every machine startup, we are going to generate WireGuard's key pair if it doesn't exist, and then replace the &lt;code&gt;_PRIVATE_KEY_&lt;/code&gt; placeholder with &lt;code&gt;sed&lt;/code&gt;. But that's not all! We also need the server's public key in order to connect. As I don't want any SSH-like connectivity to this server, the script will export the public key to AWS Systems Manager Parameter Store.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;--&lt;/span&gt;//
Content-Type: text/x-shellscript&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nv"&gt;charset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"us-ascii"&lt;/span&gt;
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nv"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"userdata.txt"&lt;/span&gt;

&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-eo&lt;/span&gt; pipefail

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Configuring WireGuard..."&lt;/span&gt;
&lt;span class="c"&gt;# If there's no private key, generate the private key into a file and derive&lt;/span&gt;
&lt;span class="c"&gt;# the public one also into a file.&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /etc/wireguard/private.key &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;wg genkey | &lt;span class="nb"&gt;tee&lt;/span&gt; /etc/wireguard/private.key | wg pubkey &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /etc/wireguard/public.key
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Replace the private key placeholder if it exists in wg0.conf&lt;/span&gt;
&lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"s#_PRIVATE_KEY_#&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /etc/wireguard/private.key&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;#g"&lt;/span&gt; /etc/wireguard/wg0.conf

&lt;span class="c"&gt;# Export the public key to SSM Parameter Store. Use IPv6 endpoint.&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Public key: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /etc/wireguard/public.key&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_USE_DUALSTACK_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;aws ssm put-parameter &lt;span class="nt"&gt;--type&lt;/span&gt; &lt;span class="s2"&gt;"String"&lt;/span&gt; &lt;span class="nt"&gt;--overwrite&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"/wireguard/public-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--value&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /etc/wireguard/public.key&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Start the tunnel.&lt;/span&gt;
wg-quick up wg0

&lt;span class="nt"&gt;--&lt;/span&gt;//--
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
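
&lt;p&gt;One detail worth calling out: WireGuard keys are base64 and may contain &lt;code&gt;/&lt;/code&gt; and &lt;code&gt;+&lt;/code&gt;, which is why the &lt;code&gt;sed&lt;/code&gt; command above uses &lt;code&gt;#&lt;/code&gt; as its delimiter instead of the usual slash. A minimal local check of that substitution (the key below is a dummy value, not a real one):&lt;/p&gt;

```shell
# Simulate the placeholder replacement on a throwaway config file.
conf=$(mktemp)
echo 'PrivateKey = _PRIVATE_KEY_' | tee "$conf"
key='aBc/dEf+GhIjKlMnOpQrStUvWxYz0123456789ABCDE='   # dummy key
# '#' as delimiter keeps the '/' inside the key from breaking the expression.
sed -i "s#_PRIVATE_KEY_#${key}#g" "$conf"
grep "PrivateKey = ${key}" "$conf"
rm -f "$conf"
```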



&lt;p&gt;Currently, if you run the script, the machine will not be able to write into Parameter Store because the IAM role doesn't have such permissions. Add the following policy to the previously defined role.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_caller_identity"&lt;/span&gt; &lt;span class="s2"&gt;"X"&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_region"&lt;/span&gt; &lt;span class="s2"&gt;"X"&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard_ssm_policy"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ssm:PutParameter"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;resources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:ssm:&lt;/span&gt;&lt;span class="k"&gt;${data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_region&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_caller_identity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account_id&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:parameter/wireguard/public-key"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard_ssm_policy"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"wg-ssm-policy"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard_ssm_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the last part, add the user data local variable. I will use &lt;code&gt;templatefile&lt;/code&gt; because we are going to do some dynamic things later, so it will come in handy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;user_data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;templatefile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;module}&lt;/span&gt;&lt;span class="s2"&gt;/user-data.yaml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you now deploy this infrastructure, after a few minutes you should get a public key in SSM Parameter Store under &lt;code&gt;/wireguard/public-key&lt;/code&gt;. You can use it to configure the client connections. You can also use the following command to get the public key value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ssm get-parameter &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--name&lt;/span&gt; /wireguard/public-key &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--query&lt;/span&gt; Parameter.Value &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--output&lt;/span&gt; text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
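
&lt;p&gt;A WireGuard key is 32 random bytes encoded as base64, so whatever the command returns should be exactly 44 characters ending in &lt;code&gt;=&lt;/code&gt;. A quick sanity check like this sketch can catch a truncated copy-paste (the key below is a dummy value):&lt;/p&gt;

```shell
# 32 bytes of base64 is 43 base64 characters plus one '=' of padding.
key='sJQTQ7sytRuLBK74Y24TNtXrHqrmpRb+ZsT9olMfXQ4='   # dummy key
if printf '%s' "$key" | grep -Eq '^[A-Za-z0-9+/]{43}=$'; then
  echo 'key looks well-formed'
else
  echo 'key is malformed, re-check the SSM parameter'
fi
```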



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcqlnup2vlths6gmvxsx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcqlnup2vlths6gmvxsx.png" alt="Public key in parameter store" width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up client
&lt;/h2&gt;

&lt;p&gt;On the client side (my DGX Spark), I will now SSH in and install WireGuard there as well. I am using the default Ubuntu installation. We are going to generate a new private key, derive the public key, and save the configuration. You need to run the following commands as root.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apt &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nb"&gt;install &lt;/span&gt;wireguard wireguard-tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Define some variables: the public key from SSM Parameter Store and the IPv6 address from the Terraform output. Also configure the path where you want to keep the configuration. I will use the &lt;code&gt;wg1&lt;/code&gt; interface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;WG_SERVER_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sJQTQ7sytRuLBK74Y24TNtXrHqrmpRb+ZsT9olMfXQ4="&lt;/span&gt; &lt;span class="c"&gt;# Key from SSM&lt;/span&gt;
&lt;span class="nv"&gt;WG_SERVER_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"2a05:d012:e21:2345:6789:01ab:cdef:9dd7"&lt;/span&gt; &lt;span class="c"&gt;# IPv6 from AWS&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /etc/wireguard/
&lt;span class="nv"&gt;WG_CONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/etc/wireguard/wg1.conf
&lt;span class="nv"&gt;WG_PRIVATE_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;wg genkey&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;WG_PUBLIC_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$WG_PRIVATE_KEY&lt;/span&gt; | wg pubkey&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Public key = &lt;/span&gt;&lt;span class="nv"&gt;$WG_PUBLIC_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
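&lt;p&gt;A quick sanity check on the key material: Wireguard keys are 32-byte Curve25519 keys, base64-encoded into 44 characters. As a minimal sketch (a hypothetical helper, not part of the setup), you can validate a key you paste from SSM like this:&lt;/p&gt;

```python
import base64
import binascii

def is_valid_wg_key(key: str) -> bool:
    """A Wireguard public or private key decodes to exactly 32 bytes."""
    try:
        return len(base64.b64decode(key, validate=True)) == 32
    except binascii.Error:
        return False

# The server key retrieved from SSM earlier in the post
print(is_valid_wg_key("sJQTQ7sytRuLBK74Y24TNtXrHqrmpRb+ZsT9olMfXQ4="))  # True
```

&lt;p&gt;This catches the usual copy-paste accidents (truncated value, stray whitespace) before you commit a broken key to the config.&lt;/p&gt;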



&lt;p&gt;Then generate the configuration. If you are behind a firewall, pick the port you want to listen on for incoming VPN traffic. Choose a unique address from the VPN subnet pool. Optionally set the DNS resolver to the AWS VPC one (the second address of the subnet's CIDR). &lt;code&gt;AllowedIPs&lt;/code&gt; is an unfortunate name, but these are the routes that should go through the VPN - I set them to the VPC CIDR and the VPN's internal subnet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$WG_CONFIG&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
[Interface]
PrivateKey = &lt;/span&gt;&lt;span class="nv"&gt;$WG_PRIVATE_KEY&lt;/span&gt;&lt;span class="sh"&gt;
# Public Key: &lt;/span&gt;&lt;span class="nv"&gt;$WG_PUBLIC_KEY&lt;/span&gt;&lt;span class="sh"&gt;
Address = 10.155.222.3/32
DNS = 10.189.80.2    # Optional
ListenPort = 62910   # You can skip this if you don't have firewall

[Peer]
PublicKey = &lt;/span&gt;&lt;span class="nv"&gt;$WG_SERVER_KEY&lt;/span&gt;&lt;span class="sh"&gt;
AllowedIPs = 10.189.80.0/21, 10.155.222.0/24 # Connectivity to VPC and VPN
PersistentKeepalive = 25
Endpoint = [&lt;/span&gt;&lt;span class="nv"&gt;$WG_SERVER_IP&lt;/span&gt;&lt;span class="sh"&gt;]:51280
&lt;/span&gt;&lt;span class="no"&gt;
EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
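&lt;p&gt;To make the &lt;code&gt;AllowedIPs&lt;/code&gt; behaviour concrete: wg-quick installs one route per listed prefix, and only destinations inside those prefixes are sent through the tunnel. A minimal Python sketch of that decision, using the two prefixes from the config above (illustration only, not part of the setup):&lt;/p&gt;

```python
import ipaddress

# Routes that wg-quick installs from AllowedIPs: VPC CIDR + VPN subnet
ALLOWED_IPS = [
    ipaddress.ip_network("10.189.80.0/21"),   # VPC CIDR
    ipaddress.ip_network("10.155.222.0/24"),  # Wireguard internal subnet
]

def routed_via_vpn(destination: str) -> bool:
    """Return True if traffic to `destination` would be sent through wg1."""
    addr = ipaddress.ip_address(destination)
    return any(addr in net for net in ALLOWED_IPS)

print(routed_via_vpn("10.189.83.70"))  # address inside the VPC -> True
print(routed_via_vpn("1.1.1.1"))       # public internet -> False
```

&lt;p&gt;Everything else keeps using the default route, which is why the rest of the client's traffic is unaffected by the VPN.&lt;/p&gt;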



&lt;h2&gt;
  
  
  Allowing new client on the Wireguard server
&lt;/h2&gt;

&lt;p&gt;Now that you have the new public key for your home client, we need to enable it on the Wireguard server. As you remember, we used &lt;code&gt;templatefile&lt;/code&gt; to load the user data. This comes in handy now, as it lets us configure multiple clients. Let's revisit the &lt;code&gt;write_files&lt;/code&gt; section and modify the end of the file, after the iptables commands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;write_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/wireguard/wg0.conf&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|-&lt;/span&gt;
      &lt;span class="s"&gt;[Interface]&lt;/span&gt;
      &lt;span class="s"&gt;Address    = 10.155.222.1/24&lt;/span&gt;
      &lt;span class="s"&gt;ListenPort = 51280&lt;/span&gt;
      &lt;span class="s"&gt;PrivateKey = _PRIVATE_KEY_&lt;/span&gt;

      &lt;span class="s"&gt;# Enable routing + NAT for WG clients to reach VPC&lt;/span&gt;
      &lt;span class="s"&gt;PostUp   = sysctl -w net.ipv4.ip_forward=1&lt;/span&gt;
      &lt;span class="s"&gt;PostUp   = iptables -A FORWARD -i wg0 -o ens5 -j ACCEPT&lt;/span&gt;
      &lt;span class="s"&gt;PostUp   = iptables -A FORWARD -i ens5 -o wg0 -j ACCEPT&lt;/span&gt;

      &lt;span class="s"&gt;PostDown = iptables -D FORWARD -i wg0 -o ens5 -j ACCEPT&lt;/span&gt;
      &lt;span class="s"&gt;PostDown = iptables -D FORWARD -i ens5 -o wg0 -j ACCEPT&lt;/span&gt;

      &lt;span class="s"&gt;%{~ for peer in peers ~}&lt;/span&gt;
      &lt;span class="s"&gt;[Peer]&lt;/span&gt;
      &lt;span class="s"&gt;PublicKey = ${peer.public_key}&lt;/span&gt;
      &lt;span class="s"&gt;AllowedIPs = ${peer.address}/32&lt;/span&gt;
      &lt;span class="s"&gt;%{~ endfor ~}&lt;/span&gt;

      &lt;span class="s"&gt;# End of file&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above for loop generates a peer entry for each client on the Wireguard server. In the template variables you now have to set &lt;code&gt;peers&lt;/code&gt;, a list of objects with &lt;code&gt;public_key&lt;/code&gt; and &lt;code&gt;address&lt;/code&gt; keys. Revisit the locals of the EC2 instance. Applying this will reboot the instance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;user_data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;templatefile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"user-data.yaml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;peers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;address&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.155.222.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# the IP you selected for the client&lt;/span&gt;
        &lt;span class="nx"&gt;public_key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"bxmMoVvXlVVRg7uaTnxI6Vf7wxeI0XWj5d6zREqDkzk="&lt;/span&gt; &lt;span class="c1"&gt;# the public key of the client&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
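&lt;p&gt;For illustration, here is roughly what the &lt;code&gt;%{ for peer in peers }&lt;/code&gt; loop produces, sketched in Python as a stand-in for Terraform's template engine (the key and address are the example values from the locals above):&lt;/p&gt;

```python
# A stand-in for the %{ for peer in peers } loop in user-data.yaml
peers = [
    {
        "address": "10.155.222.3",
        "public_key": "bxmMoVvXlVVRg7uaTnxI6Vf7wxeI0XWj5d6zREqDkzk=",
    },
]

def render_peers(peers: list) -> str:
    """Render one [Peer] block per client, as the templatefile loop does."""
    blocks = []
    for peer in peers:
        blocks.append(
            "[Peer]\n"
            f"PublicKey = {peer['public_key']}\n"
            f"AllowedIPs = {peer['address']}/32"
        )
    return "\n\n".join(blocks)

print(render_peers(peers))
```

&lt;p&gt;Adding a second client is then just another entry in the list, followed by a Terraform apply.&lt;/p&gt;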



&lt;p&gt;Now if you bring up the interface, the status should show a latest handshake and some received data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;wg-quick up wg1
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#] ip link add wg1 type wireguard&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#] wg setconf wg1 /dev/fd/63&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#] ip -4 address add 10.155.222.3/32 dev wg1&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#] ip link set mtu 1420 up dev wg1&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#] ip -4 route add 10.155.222.0/24 dev wg1&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#] ip -4 route add 10.189.80.0/21 dev wg1&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;wg
interface: wg1
  public key: &lt;span class="nv"&gt;bxmMoVvXlVVRg7uaTnxI6Vf7wxeI0XWj5d6zREqDkzk&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;
  private key: &lt;span class="o"&gt;(&lt;/span&gt;hidden&lt;span class="o"&gt;)&lt;/span&gt;
  listening port: 62910

peer: sJQTQ7sytRuLBK74Y24TNtXrHqrmpRb+ZsT9olMfXQ4&lt;span class="o"&gt;=&lt;/span&gt;
  endpoint: &lt;span class="o"&gt;[&lt;/span&gt;2a05:d012:e21:2345:6789:01ab:cdef:9dd7]:51280
  allowed ips: 10.189.80.0/21, 10.155.222.0/24
  latest handshake: 22 seconds ago
  transfer: 92 B received, 180 B sent
  persistent keepalive: every 25 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have created a test internal application load balancer. It listens on port 80 and allows traffic from the CIDR &lt;code&gt;10.155.222.0/24&lt;/code&gt; (&lt;strong&gt;not&lt;/strong&gt; the VPC CIDR!). But before this can be used, you also have to define routes for Wireguard's subnet. This is why we turned off the source/destination check on the network interface of the EC2 instance: with the check disabled, packets destined for Wireguard clients (whether requests or responses) will be accepted by the EC2 instance even though the destination isn't any of the instance's own IPs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_route"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard_tunnel_prefix"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;for_each&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_route_table_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_route_table_ids&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="nx"&gt;route_table_id&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;
  &lt;span class="nx"&gt;destination_cidr_block&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.155.222.0/24"&lt;/span&gt;
  &lt;span class="nx"&gt;network_interface_id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;primary_network_interface_id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Afterwards I tested with cURL, and connectivity was established! The resolved IPs are in the private range, as you can see below. I even tried DNS, and it was also functional through the Wireguard interface - that way we can later set up some private domains or use Cloud Map.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl http://internal-mytest-alb-1234567890.eu-west-3.elb.amazonaws.com
hello world

&lt;span class="nv"&gt;$ &lt;/span&gt;dig +short internal-mytest-alb-1234567890.eu-west-3.elb.amazonaws.com
10.189.83.70
10.189.85.144

&lt;span class="nv"&gt;$ &lt;/span&gt;resolvectl query internal-mytest-alb-1234567890.eu-west-3.elb.amazonaws.com
internal-mytest-alb-1234567890.eu-west-3.elb.amazonaws.com: 10.189.83.70 &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nb"&gt;link&lt;/span&gt;: wg1
                                                            10.189.85.144 &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nb"&gt;link&lt;/span&gt;: wg1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing LLM server
&lt;/h2&gt;

&lt;p&gt;Now we need to install and configure Ollama. Just follow the instructions on &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;ollama.com&lt;/a&gt; to install it, or use any other LLM server you wish. Be sure that it is listening on all addresses and not just localhost. Create the following SystemD override (on Ubuntu).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /etc/systemd/system/ollama.service.d/override.conf &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=212000"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_KEEP_ALIVE=2400"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="nt"&gt;--now&lt;/span&gt; ollama
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can alternatively run it in Docker; using it with the GPU should be supported on DGX Spark out of the box. Choose only one option or the other, because both occupy the same port (11434), as in the command below!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--gpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;all &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-v&lt;/span&gt; ollama:/root/.ollama &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-p&lt;/span&gt; 11434:11434 &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--name&lt;/span&gt; ollama &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;unless-stopped &lt;span class="se"&gt;\&lt;/span&gt;
 ollama/ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For direct installations, you can just use &lt;code&gt;ollama pull &amp;lt;model&amp;gt;&lt;/code&gt; to download the model in advance. I will use Nvidia's Nemotron 3 Super, which was one of the best medium-sized models when I started writing this post. However, Gemma 4 and Qwen 3.6 have been released since, so you can experiment with those as well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull nemotron-3-super:120b-a12b-q4_K_M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
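&lt;p&gt;Once the model is pulled, Ollama serves an HTTP API on port 11434, reachable from the VPC over the tunnel. A minimal sketch of a request from the agent side, assuming Ollama's standard &lt;code&gt;/api/generate&lt;/code&gt; endpoint and the client VPN address &lt;code&gt;10.155.222.3&lt;/code&gt; configured earlier:&lt;/p&gt;

```python
import json
import urllib.request

# The DGX Spark is reachable over the tunnel at its VPN address
# (assumption: 10.155.222.3, the client IP configured earlier).
OLLAMA_URL = "http://10.155.222.3:11434/api/generate"

payload = {
    "model": "nemotron-3-super:120b-a12b-q4_K_M",  # model pulled above
    "prompt": "Why is the sky blue?",
    "stream": False,  # return one JSON object instead of a stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the tunnel is up:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["response"])
```

&lt;p&gt;Anything that can speak HTTP inside the VPC can use the model this way, which is exactly what the agent instance in the next section relies on.&lt;/p&gt;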



&lt;h2&gt;
  
  
  Connecting from an EC2 instance
&lt;/h2&gt;

&lt;p&gt;I will create another EC2 instance, which will be used for OpenClaw or any other system you want, such as Hermes or just a web app for chatting. I will prepare the instance first, using Ubuntu 24.04 and a &lt;code&gt;t4g.medium&lt;/code&gt; instance type. I will also create a new user data script that bootstraps some of the required packages. As we have IPv6 outbound connectivity from the private subnet, APT repositories should work without issues. I will also enable SSH access from Wireguard's inner subnet so that I can SSH to the instance; alternatively, you can use IPv6 after moving it to a public subnet, or AWS Systems Manager (SSM) if you configure it - it's up to you.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_ssm_parameter"&lt;/span&gt; &lt;span class="s2"&gt;"ubuntu_2404"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/aws/service/canonical/ubuntu/server/24.04/stable/current/arm64/hvm/ebs-gp3/ami-id"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"agent"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"openclaw-agent-sg"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;22&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;22&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.155.222.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;egress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"-1"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;ipv6_cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"::/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_instance"&lt;/span&gt; &lt;span class="s2"&gt;"agent"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ami&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_ssm_parameter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ubuntu_2404&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;
  &lt;span class="nx"&gt;instance_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"t4g.medium"&lt;/span&gt;
  &lt;span class="nx"&gt;user_data&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"openclaw.yaml"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;tags&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"openclaw-agent"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_security_group_ids&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;associate_public_ip_address&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;ipv6_address_count&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

  &lt;span class="nx"&gt;metadata_options&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;http_endpoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"enabled"&lt;/span&gt;
    &lt;span class="nx"&gt;http_tokens&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"required"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;root_block_device&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;volume_size&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
    &lt;span class="nx"&gt;volume_type&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"gp3"&lt;/span&gt;
    &lt;span class="nx"&gt;encrypted&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;delete_on_termination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ignore_changes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;ami&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"private_ip"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_ip&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the system config I will do the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add Node.js APT repository of version 24,&lt;/li&gt;
&lt;li&gt;install Node.js and unattended upgrades,&lt;/li&gt;
&lt;li&gt;enable unattended upgrades,&lt;/li&gt;
&lt;li&gt;enable AWS SSM (optional but useful).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I will also create a separate user for OpenClaw so that it has its own home directory and permissions. It is also very important to keep the default user, as this will allow you to SSH to the instance to onboard OpenClaw. If you wish, you can also specify SSH keys here or via AWS key pairs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;#cloud-config&lt;/span&gt;
&lt;span class="na"&gt;cloud_final_modules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;scripts-user&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;cloud_config_modules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;write_files&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;apt_configure&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;package_update_upgrade_install&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openclaw-agent&lt;/span&gt;
&lt;span class="na"&gt;create_hostname_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;apt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nodejs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;keyid&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2F59B5F99B1BE0B4&lt;/span&gt;
      &lt;span class="na"&gt;keyserver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;keyserver.ubuntu.com&lt;/span&gt;
      &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deb [signed-by=$KEY_FILE] https://deb.nodesource.com/node_24.x nodistro main&lt;/span&gt;

&lt;span class="na"&gt;package_update&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;package_upgrade&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;packages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nodejs&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;unattended-upgrades&lt;/span&gt;

&lt;span class="na"&gt;users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openclaw&lt;/span&gt;
    &lt;span class="na"&gt;uid&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2200&lt;/span&gt;

&lt;span class="na"&gt;ssh_authorized_keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPgVNNOeuUqMgobgeIIkndXXYekOmC/e5bqty3f0UXDa my-ssh-key&lt;/span&gt;

&lt;span class="na"&gt;write_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/apt/apt.conf.d/20auto-upgrades&lt;/span&gt;
    &lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0644"&lt;/span&gt;
    &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root:root&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;APT::Periodic::Update-Package-Lists "1";&lt;/span&gt;
      &lt;span class="s"&gt;APT::Periodic::Unattended-Upgrade "1";&lt;/span&gt;

&lt;span class="na"&gt;runcmd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;systemctl enable --now unattended-upgrades || &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;loginctl enable-linger openclaw || &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="c1"&gt;# Enable SystemD on openclaw's user&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing OpenClaw - with a caveat 😳
&lt;/h2&gt;

&lt;p&gt;I SSH'd into the instance; Node.js should already be there based on the provided user data. So I switched to the new &lt;code&gt;openclaw&lt;/code&gt; user I defined and started the installation with npm.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh ubuntu@&lt;span class="si"&gt;$(&lt;/span&gt;tofu output &lt;span class="nt"&gt;-raw&lt;/span&gt; private_ip&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="c"&gt;# replace with your private IP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ssh"&gt;&lt;code&gt;&lt;span class="k"&gt;The&lt;/span&gt; authenticity of host '10.189.83.247 (10.189.83.247)' can't be established.
&lt;span class="k"&gt;ED25519&lt;/span&gt; key fingerprint is: SHA256:iMfJBU8iSEc5ikspbNKGD8jCAlLGwrOs28lbI4aPw2Q
&lt;span class="k"&gt;This&lt;/span&gt; key is not known by any other names.
&lt;span class="k"&gt;Are&lt;/span&gt; you sure you want to continue connecting (yes/no/[fingerprint])? &lt;span class="no"&gt;yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inside the machine&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;su openclaw &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /bin/bash
npm &lt;span class="nb"&gt;install &lt;/span&gt;openclaw@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="c"&gt;...
&lt;/span&gt;&lt;span class="go"&gt;4564 error A git connection error occurred
4565 error command git --no-replace-objects ls-remote ssh://git@github.com/whiskeysockets/libsignal-node.git
4566 error ssh: connect to host github.com port 22: Connection timed out
4566 error fatal: Could not read from remote repository.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This timeout happens because we don't have public IPv4 connectivity! There are two standard solutions - a public IP for the instance or a NAT Gateway - or...&lt;/p&gt;

&lt;h3&gt;
  
  
  Hosting tinyproxy on DGX Spark
&lt;/h3&gt;

&lt;p&gt;As we already have a connection to the other machines on the VPN, we can simply use one of them as the exit to the IPv4 internet. As a bonus, we retain our residential IP! So let's spin it up in Docker, but for that we first need some configuration in a &lt;code&gt;Dockerfile&lt;/code&gt; and &lt;code&gt;tinyproxy.conf&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine:latest&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;apk add tinyproxy
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; tinyproxy.conf /etc/tinyproxy/tinyproxy.conf&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8888&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/usr/bin/tinyproxy"]&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["-d"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="err"&gt;Port&lt;/span&gt; &lt;span class="err"&gt;8888&lt;/span&gt;
&lt;span class="err"&gt;Timeout&lt;/span&gt; &lt;span class="err"&gt;600&lt;/span&gt;
&lt;span class="err"&gt;MaxClients&lt;/span&gt; &lt;span class="err"&gt;100&lt;/span&gt;
&lt;span class="err"&gt;ViaProxyName&lt;/span&gt; &lt;span class="err"&gt;"tinyproxy"&lt;/span&gt;

&lt;span class="err"&gt;User&lt;/span&gt; &lt;span class="err"&gt;nobody&lt;/span&gt;
&lt;span class="err"&gt;Group&lt;/span&gt; &lt;span class="err"&gt;nobody&lt;/span&gt;

&lt;span class="err"&gt;DefaultErrorFile&lt;/span&gt; &lt;span class="err"&gt;"/usr/share/tinyproxy/default.html"&lt;/span&gt;
&lt;span class="err"&gt;StatFile&lt;/span&gt; &lt;span class="err"&gt;"/usr/share/tinyproxy/stats.html"&lt;/span&gt;
&lt;span class="err"&gt;LogLevel&lt;/span&gt; &lt;span class="err"&gt;Info&lt;/span&gt;

&lt;span class="err"&gt;Allow&lt;/span&gt; &lt;span class="err"&gt;127.0.0.1&lt;/span&gt;
&lt;span class="py"&gt;Allow&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;:1&lt;/span&gt;
&lt;span class="err"&gt;Allow&lt;/span&gt; &lt;span class="err"&gt;10.155.222.0/24&lt;/span&gt;
&lt;span class="err"&gt;Allow&lt;/span&gt; &lt;span class="err"&gt;10.189.80.0/21&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From that config we can easily build the new Tinyproxy image and set it up to start on boot. Of course, all this connectivity then relies on our local machine being up, so it's only suitable for occasional IPv4 needs such as GitHub.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; local-tinyproxy:latest &lt;span class="nb"&gt;.&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--name&lt;/span&gt; tinyproxy &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;unless-stopped &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-p&lt;/span&gt; 8989:8888 &lt;span class="se"&gt;\&lt;/span&gt;
 local-tinyproxy:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can easily point clients at the new proxy service within Wireguard's network. However, to do this, we need to open port &lt;code&gt;8989&lt;/code&gt; (and &lt;code&gt;11434&lt;/code&gt; for Ollama) on Wireguard's instance. Why is that? Any packet sent from OpenClaw's instance in that direction will look like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;source address: &lt;code&gt;10.189.83.247&lt;/code&gt; (example),&lt;/li&gt;
&lt;li&gt;destination address: &lt;code&gt;10.155.222.3&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;source port: &lt;code&gt;59123&lt;/code&gt; (example),&lt;/li&gt;
&lt;li&gt;destination port: &lt;code&gt;8989&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Filtering at the AWS security group level cares about the source address and destination port, little else. The destination address is taken care of by the "source/destination" check of the network interface, the feature we just disabled. Let's update our security group.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name_prefix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"wireguard-sg"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;51280&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;51280&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"udp"&lt;/span&gt;
    &lt;span class="nx"&gt;ipv6_cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"::/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11434&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11434&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;security_groups&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8989&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8989&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;security_groups&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;egress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"-1"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;ipv6_cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"::/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9p5otcbagpnyidjkz1aq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9p5otcbagpnyidjkz1aq.jpg" alt="Connection between OpenClaw and DGX over VPN" width="764" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you can configure NPM and Git to use the proxy, and OpenClaw should install with no issues. You can even test the connectivity with cURL first: if it responds with 500, this is fine; if 403, there might be a problem with &lt;code&gt;tinyproxy.conf&lt;/code&gt;; if the cURL command times out or reports "couldn't connect to server", suspect security groups, routes, or another firewall.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://10.155.222.3:8989 &lt;span class="nt"&gt;-X&lt;/span&gt; CONNECT | &lt;span class="nb"&gt;grep &lt;/span&gt;title
&lt;span class="c"&gt;# Test results&lt;/span&gt;
&lt;span class="c"&gt;# &amp;lt;title&amp;gt;500 Unable to connect&amp;lt;/title&amp;gt;&lt;/span&gt;
npm &lt;span class="nb"&gt;set &lt;/span&gt;https-proxy&lt;span class="o"&gt;=&lt;/span&gt;http://10.155.222.3:8989
npm &lt;span class="nb"&gt;set &lt;/span&gt;&lt;span class="nv"&gt;proxy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://10.155.222.3:8989
git config &lt;span class="nt"&gt;--global&lt;/span&gt; http.proxy http://10.155.222.3:8989
git config &lt;span class="nt"&gt;--global&lt;/span&gt; https.proxy http://10.155.222.3:8989
npm &lt;span class="nb"&gt;install &lt;/span&gt;openclaw@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  OpenClaw - Onboard!
&lt;/h2&gt;

&lt;p&gt;And we are almost done! The only thing left is to follow the onboarding process. Use the Ollama provider in local mode, set the DGX's IP on the Wireguard network, and choose the model from the list!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;XDG_RUNTIME_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/run/user/&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="c"&gt;# for systemd support&lt;/span&gt;
npx openclaw onboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6roww7inka2rusu3zdp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6roww7inka2rusu3zdp.png" alt="OpenClaw onboarding" width="610" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I will demonstrate usage via the TUI rather than any instant messenger here. The first load of the latest OpenClaw build took around 90 seconds with Nemotron 3 Super (q4). After resetting the session, the first message took around 20 seconds (to load 12k of context), so most of the time went into loading the model into VRAM. How long the model stays in memory is controllable on Ollama's side. Each subsequent message still needs some extra time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx openclaw tui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdmfxiz0oswkh5rcsdhsr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdmfxiz0oswkh5rcsdhsr.png" alt="OpenClaw first message" width="518" height="209"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifveulpztugtv33r23jb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifveulpztugtv33r23jb.png" alt="OpenClaw any other message" width="518" height="209"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To have some performance comparison, I decided to also try running the agent on latest Qwen 3.6 35B (&lt;code&gt;qwen3.6:35b-a3b-q8_0&lt;/code&gt;). The startup took around 30 seconds and each message takes maybe 10.&lt;/p&gt;

&lt;h2&gt;
  
  
  Power usages
&lt;/h2&gt;

&lt;p&gt;I also ordered a power meter to see how much the DGX machine draws in different situations. When it's completely idle, it takes around 30 W; with the model loaded into memory but unused, around 40 W; and during response generation it oscillates around 170 W. Let's make some assumptions: while you sleep you don't use OpenClaw at all but keep the DGX Spark on, so it draws 30 W for 7 hours. You are a very heavy user, writing to OpenClaw all day and scheduling a lot of tasks, basically treating it as a thinking extension, which totals 8 hours of pure generative work. For all the other time it just sits idle with the model loaded in memory.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;7 h * 30 W = 210 Wh&lt;/li&gt;
&lt;li&gt;8 h * 170 W = 1360 Wh&lt;/li&gt;
&lt;li&gt;9 h * 40 W = 360 Wh&lt;/li&gt;
&lt;li&gt;in total it is about 2 kWh&lt;/li&gt;
&lt;li&gt;assuming a German electricity price of 0.50€ per kilowatt-hour, that is about 1€ per day, or 30€ per month, just for pure token generation.&lt;/li&gt;
&lt;/ul&gt;
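&lt;p&gt;The napkin maths above can be reproduced in a few lines; the usage split and the 0.50€/kWh price are the assumptions stated above, not measured billing data:&lt;/p&gt;

```python
# Daily energy cost of the DGX Spark under the assumed usage pattern:
# 7 h asleep (idle), 8 h generating, 9 h loaded into memory but unused.
PRICE_EUR_PER_KWH = 0.50  # assumed German electricity price

usage_hours_watts = {
    "idle": (7, 30),
    "generating": (8, 170),
    "loaded_idle": (9, 40),
}

daily_wh = sum(h * w for h, w in usage_hours_watts.values())
daily_eur = daily_wh / 1000 * PRICE_EUR_PER_KWH

print(daily_wh)   # 1930 Wh, about 2 kWh per day
print(daily_eur)  # about 1 EUR per day, so roughly 30 EUR per month
```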

&lt;p&gt;Obviously you also have to consider the cost of the AWS EC2 instances. For the VPN server, if you commit to a one-year EC2 savings plan, you will pay around 25€; for the agent instance, if it's online all year round, around 200€ (note that OpenClaw is especially heavy compared to other harnesses).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxbf4jjt8youfqi2ymha.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxbf4jjt8youfqi2ymha.jpg" alt="Watt measurements" width="800" height="222"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you use a small model like Qwen 35B in Q8, you will still have room in VRAM for an image generator like Z-Image-Turbo, a small TTS model, or Whisper Turbo for speech recognition. I managed to easily fit image generation and VibeVoice ASR alongside the chatbot model in 100 gigs of RAM.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03r9u0n4n3hx4k6gm5g2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03r9u0n4n3hx4k6gm5g2.png" alt="100G RAM filled up" width="800" height="576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Cons of this setup
&lt;/h2&gt;

&lt;p&gt;Of course, a setup like this comes with tradeoffs. You keep your privacy and you maybe pay less for inference (that's debatable), but that's about it. The best setup is to have the GB10 as the primary model provider and fall back on something small on OpenRouter. There's always the possibility of mixing multiple models and providers using subagents: for example, for a coding task you use Sonnet 4.6 but keep local Qwen 3.6 for orchestration. Such a setup is far from highly available. Not only can the machine break, you can have a blackout, or the internet in your flat can go down. Another tradeoff is speed: most medium-sized models from OpenAI or Anthropic will still run faster than any decent model on a DGX Spark.&lt;/p&gt;

&lt;p&gt;The prefill (context loading) speed is 500 tokens per second for Nemotron 3 Super and token generation runs at 20 tokens per second. Assuming that in each 15-minute window we have to load 200k of context, we can generate 10k tokens; within 8 hours of daily generation that makes 6.4 M input tokens and 0.32 M output tokens. Prompt prefill accounts for around 40% of the time spent, so of the daily spend, 40 cents go to input tokens and 60 cents to output tokens. Normalized, this is around 0.06€ per million input tokens and 1.88€ per million output tokens.&lt;/p&gt;

&lt;p&gt;For Qwen 3.6 the speeds look different: 1150 tokens/s prefill and 39 tokens/s generation. In a 15-minute window (200k input context) the prefill accounts for about 20% of the time; the remaining 80% goes to token generation and produces about 28k tokens. So within a day we get 6.4 M input and 0.896 M output tokens; normalized, this makes 0.03€/Mtok input and 0.89€/Mtok output.&lt;/p&gt;
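&lt;p&gt;The same normalization, written out for both models. All figures are the assumptions from above (1€ of energy per day, 32 fifteen-minute windows, 200k context per window); the exact values round to the numbers quoted:&lt;/p&gt;

```python
# Normalize daily energy cost into EUR per million tokens for a model,
# given the fraction of time spent in prefill and the daily token counts.
def eur_per_mtok(daily_cost_eur, prefill_share, in_tokens, out_tokens):
    in_cost = daily_cost_eur * prefill_share / (in_tokens / 1e6)
    out_cost = daily_cost_eur * (1 - prefill_share) / (out_tokens / 1e6)
    return in_cost, out_cost

# Nemotron 3 Super: ~40% of time in prefill, 6.4M in / 0.32M out per day
nemotron = eur_per_mtok(1.0, 0.40, 6_400_000, 320_000)
# Qwen 3.6: ~20% of time in prefill, 6.4M in / 0.896M out per day
qwen = eur_per_mtok(1.0, 0.20, 6_400_000, 896_000)

print(nemotron)  # about (0.06, 1.88) EUR/Mtok in/out
print(qwen)      # about (0.03, 0.89) EUR/Mtok in/out
```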

&lt;p&gt;Are any of these competitive? It highly depends on your use case and usage patterns; everything above is just napkin maths. Gemma 4 31B over OpenRouter is just $0.13/$0.38 in/out, but GPT-5.4 Nano is $0.20/$1.25.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>nvidia</category>
      <category>vpn</category>
      <category>aws</category>
    </item>
    <item>
      <title>Serverless CDC and Event Ingestion Patterns into Analytics Pipelines on AWS</title>
      <dc:creator>Renaldi</dc:creator>
      <pubDate>Sun, 19 Apr 2026 23:00:00 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/serverless-cdc-and-event-ingestion-patterns-into-analytics-pipelines-on-aws-4i56</link>
      <guid>https://future.forem.com/aws-builders/serverless-cdc-and-event-ingestion-patterns-into-analytics-pipelines-on-aws-4i56</guid>
      <description>&lt;p&gt;When I work on analytics pipelines for event-driven systems, one of the biggest mistakes I see is treating ingestion as “just connect source A to sink B.”&lt;/p&gt;

&lt;p&gt;In production, ingestion is where a lot of the hard engineering lives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deciding which transport is actually right (EventBridge vs Kinesis vs SQS)&lt;/li&gt;
&lt;li&gt;handling ordering, duplication, and replay&lt;/li&gt;
&lt;li&gt;transforming events into a canonical analytics schema&lt;/li&gt;
&lt;li&gt;delivering to multiple sinks like S3, OpenSearch, and Redshift&lt;/li&gt;
&lt;li&gt;keeping the design cost-efficient as volume grows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why I like this topic. It is architecture-heavy, it shows real trade-offs, and it comes up constantly in real workloads.&lt;/p&gt;

&lt;p&gt;In this post, I will walk through a practical pattern for serverless CDC/event ingestion into analytics pipelines on AWS, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EventBridge vs Kinesis vs SQS decisioning&lt;/li&gt;
&lt;li&gt;Lambda transformations (normalization, enrichment, routing)&lt;/li&gt;
&lt;li&gt;delivery patterns to S3 / OpenSearch / Redshift&lt;/li&gt;
&lt;li&gt;handling ordering, duplication, and replay&lt;/li&gt;
&lt;li&gt;partitioning and cost optimization&lt;/li&gt;
&lt;li&gt;an end-to-end walkthrough and implementation discussion with code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I will focus on patterns that are accurate, scalable, and maintainable rather than “one service solves everything.”&lt;/p&gt;




&lt;h2&gt;
  
  
  The design principle I start with
&lt;/h2&gt;

&lt;p&gt;I design ingestion in layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ingress transport for delivery semantics (routing, throughput, ordering, buffering)&lt;/li&gt;
&lt;li&gt;Transformation layer for canonicalization and enrichment&lt;/li&gt;
&lt;li&gt;Durable landing zone (usually S3 first)&lt;/li&gt;
&lt;li&gt;Serving/analytics sinks (OpenSearch, Redshift, dashboards, ML features, etc.)&lt;/li&gt;
&lt;li&gt;Replay and recovery path as a first-class capability&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That structure helps me evolve the system without constantly rewriting downstream consumers.&lt;/p&gt;




&lt;h2&gt;
  
  
  EventBridge vs Kinesis vs SQS decisioning
&lt;/h2&gt;

&lt;p&gt;This is the first architectural decision, and it shapes everything else.&lt;/p&gt;

&lt;p&gt;The short version is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EventBridge is great for event routing and integration&lt;/li&gt;
&lt;li&gt;Kinesis Data Streams is great for high-throughput ordered streaming plus replay&lt;/li&gt;
&lt;li&gt;SQS is great for buffering and decoupled async processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I do not treat them as mutually exclusive. In many production designs, I use two or even all three, each for what it is best at.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick decision guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Amazon EventBridge when I need
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;event routing between services and teams&lt;/li&gt;
&lt;li&gt;content-based filtering and fan-out&lt;/li&gt;
&lt;li&gt;SaaS integrations and AWS service events&lt;/li&gt;
&lt;li&gt;schema governance and event contracts&lt;/li&gt;
&lt;li&gt;archive/replay on the event bus (for supported replay workflows)&lt;/li&gt;
&lt;li&gt;lower-to-moderate throughput domain events where strict ordering is not required&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use Amazon Kinesis Data Streams when I need
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;high-throughput event or CDC ingestion&lt;/li&gt;
&lt;li&gt;ordering per partition key&lt;/li&gt;
&lt;li&gt;multiple independent consumers at stream scale&lt;/li&gt;
&lt;li&gt;explicit replay from stream retention&lt;/li&gt;
&lt;li&gt;near-real-time analytics pipelines with controlled parallelism&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use Amazon SQS when I need
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;durable buffering and backpressure absorption&lt;/li&gt;
&lt;li&gt;decoupling between producers and consumers&lt;/li&gt;
&lt;li&gt;cheap asynchronous processing&lt;/li&gt;
&lt;li&gt;retry isolation and DLQ handling&lt;/li&gt;
&lt;li&gt;workload smoothing (especially spiky ingest)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  My common production pattern
&lt;/h3&gt;

&lt;p&gt;I often use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EventBridge for domain routing&lt;/li&gt;
&lt;li&gt;Kinesis for analytics ingestion backbone&lt;/li&gt;
&lt;li&gt;SQS for retry/backpressure side paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives me clean producer contracts and strong ingestion behavior.&lt;/p&gt;




&lt;h2&gt;
  
  
  What each service is not
&lt;/h2&gt;

&lt;p&gt;I find it useful to say this explicitly during architecture reviews.&lt;/p&gt;

&lt;h3&gt;
  
  
  EventBridge is not a high-throughput ordered stream
&lt;/h3&gt;

&lt;p&gt;It is excellent for routing, but it does not give me shard-style ordering or the stream-style replay semantics that Kinesis retention provides.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kinesis is not a drop-in replacement for event bus routing
&lt;/h3&gt;

&lt;p&gt;It gives throughput and ordering, but not the same out-of-the-box event routing and filtering ergonomics as EventBridge.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQS is not an analytics event backbone by itself
&lt;/h3&gt;

&lt;p&gt;It is amazing for buffering, but replay, retention, and consumer fan-out semantics are different from Kinesis, and standard queues do not preserve ordering.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reference architecture at a glance
&lt;/h2&gt;

&lt;p&gt;For this post, I will use a practical hybrid pattern that I use often for analytics ingestion:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;application and domain events are published to EventBridge&lt;/li&gt;
&lt;li&gt;CDC or high-volume events go to Kinesis Data Streams (directly or via a CDC bridge)&lt;/li&gt;
&lt;li&gt;Lambda transformer normalizes records into a canonical analytics schema&lt;/li&gt;
&lt;li&gt;canonical events are delivered to:

&lt;ul&gt;
&lt;li&gt;S3 (primary durable analytics landing zone, partitioned)&lt;/li&gt;
&lt;li&gt;OpenSearch (near-real-time search and observability use cases)&lt;/li&gt;
&lt;li&gt;Redshift Serverless (warehouse analytics, usually S3-first load pattern)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;SQS is used for retry isolation and backpressure for sink-specific processors&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Mermaid diagram (reference architecture)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuskg431o9qz8eil2phoj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuskg431o9qz8eil2phoj.png" alt=" " width="800" height="182"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  End-to-end walkthrough (what I will build conceptually)
&lt;/h2&gt;

&lt;p&gt;To make this concrete, I will walk through an example using an e-commerce platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;domain events like &lt;code&gt;OrderPlaced&lt;/code&gt; and &lt;code&gt;OrderShipped&lt;/code&gt; are published on EventBridge&lt;/li&gt;
&lt;li&gt;high-volume change events (for example inventory or order status updates) are ingested via Kinesis&lt;/li&gt;
&lt;li&gt;a Lambda transformer converts everything into a canonical analytics event&lt;/li&gt;
&lt;li&gt;events land in S3 as compressed JSON (or Parquet via Firehose conversion)&lt;/li&gt;
&lt;li&gt;selected events are indexed into OpenSearch&lt;/li&gt;
&lt;li&gt;Redshift loads from S3 for warehouse analytics&lt;/li&gt;
&lt;/ul&gt;
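&lt;p&gt;For the S3 landing step above, the partition layout is what keeps downstream scans cheap. A minimal sketch of one possible key scheme, partitioned by event type and event time (the prefix layout and helper name here are illustrative, not a fixed convention):&lt;/p&gt;

```python
# Build a partitioned S3 object key from the canonical event fields,
# so Athena/Redshift can prune by event_type, date, and hour.
from datetime import datetime

def s3_key(event_type, occurred_at, event_id):
    ts = datetime.strptime(occurred_at, "%Y-%m-%dT%H:%M:%SZ")
    return (f"events/event_type={event_type}/"
            f"dt={ts:%Y-%m-%d}/hour={ts:%H}/{event_id}.json.gz")

print(s3_key("order.placed", "2026-02-25T10:15:30Z", "evt_01HXYZ"))
# events/event_type=order.placed/dt=2026-02-25/hour=10/evt_01HXYZ.json.gz
```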

&lt;h3&gt;
  
  
  Why I like this pattern
&lt;/h3&gt;

&lt;p&gt;It lets me separate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;operational event routing (EventBridge)&lt;/li&gt;
&lt;li&gt;analytics ingestion behavior (Kinesis)&lt;/li&gt;
&lt;li&gt;durable storage and replay (S3 + retention)&lt;/li&gt;
&lt;li&gt;sink-specific delivery (OpenSearch, Redshift)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives me a pipeline that is easier to evolve as analytics use cases grow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Canonical event schema (the contract that keeps the pipeline sane)
&lt;/h2&gt;

&lt;p&gt;Before I write any code, I define a canonical schema. This is one of the highest-leverage things I do in analytics ingestion.&lt;/p&gt;

&lt;p&gt;I do not want every downstream consumer decoding a different source format.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example canonical schema
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"evt_01HXYZ..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"order.placed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"commerce.orders"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tenant_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"entity_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"entity_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ord_987"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"occurred_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-25T10:15:30Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ingested_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-25T10:15:31Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trace-abc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"idempotency_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"order.placed:tenant_123:ord_987:v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sequence_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tenant_123#ord_987"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"payload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"customer_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cus_1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;149.90&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"transport"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eventbridge"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"schema_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"order-events"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"schema_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.4.0"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Fields I specifically care about
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;event_id&lt;/code&gt;: unique event identity for dedupe and tracing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;occurred_at&lt;/code&gt;: source event time (for analytics)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ingested_at&lt;/code&gt;: pipeline time (for operations)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sequence_key&lt;/code&gt;: ordering scope (important for Kinesis partitioning and reasoning)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;idempotency_key&lt;/code&gt;: sink-safe dedupe key when replaying or retrying&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;event_version&lt;/code&gt;: schema evolution support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I skip this step, the pipeline quickly becomes fragile.&lt;/p&gt;
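&lt;p&gt;A quick guard at the start of the transformer makes these fields enforceable rather than aspirational. A minimal sketch (the helper name and the error format are mine; the field names come from the envelope above):&lt;/p&gt;

```python
# Envelope fields from the canonical schema above.
REQUIRED_FIELDS = {
    "event_id", "event_type", "occurred_at", "ingested_at",
    "sequence_key", "idempotency_key", "event_version", "payload",
}

def validate_envelope(event: dict) -> list:
    """Return a list of problems; an empty list means the envelope is usable."""
    problems = [f"missing:{name}" for name in sorted(REQUIRED_FIELDS - event.keys())]
    if not isinstance(event.get("payload"), dict):
        problems.append("payload:not_object")
    return problems
```

&lt;p&gt;Records that fail this check are what I quarantine instead of silently dropping.&lt;/p&gt;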




&lt;h2&gt;
  
  
  Reference implementation pattern (AWS services)
&lt;/h2&gt;

&lt;p&gt;For this walkthrough, the main flow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;EventBridge receives domain events&lt;/li&gt;
&lt;li&gt;EventBridge rule forwards analytics-relevant events to Kinesis Data Streams&lt;/li&gt;
&lt;li&gt;High-volume event sources and CDC sources publish directly to Kinesis Data Streams&lt;/li&gt;
&lt;li&gt;Lambda transformer consumes Kinesis batches&lt;/li&gt;
&lt;li&gt;Lambda normalizes and enriches records and writes:

&lt;ul&gt;
&lt;li&gt;primary path to Firehose -&amp;gt; S3&lt;/li&gt;
&lt;li&gt;selective path to SQS for OpenSearch indexing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Redshift Serverless loads from S3 (COPY and MERGE pattern)&lt;/li&gt;
&lt;li&gt;Replay and backfill can occur from Kinesis retention or S3 reprocessing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This keeps the ingestion backbone consistent while allowing different producers.&lt;/p&gt;
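&lt;p&gt;On the producer side, a service only needs to emit a well-formed EventBridge entry; the rule decides whether it reaches Kinesis. A minimal sketch (the helper and the order field names are illustrative; the &lt;code&gt;commerce.orders&lt;/code&gt; source and the &lt;code&gt;$.detail.orderId&lt;/code&gt; partition key path come from the reference template in this article):&lt;/p&gt;

```python
import json

def build_order_entry(order: dict) -> dict:
    """Build one put_events entry for a domain order event.
    Source matches the analytics rule's source filter; the Detail
    fields mirror what the downstream transformer reads."""
    return {
        "Source": "commerce.orders",
        "DetailType": "order placed",
        "Detail": json.dumps({
            "eventType": "order.placed",
            "orderId": order["order_id"],  # PartitionKeyPath $.detail.orderId reads this
            "tenantId": order.get("tenant_id", "default"),
        }),
    }

# A producer would then call:
#   boto3.client("events").put_events(Entries=[build_order_entry(order)])
```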




&lt;h2&gt;
  
  
  Infrastructure example (SAM / CloudFormation snippets)
&lt;/h2&gt;

&lt;p&gt;The snippet below shows a minimal but realistic foundation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kinesis Data Stream&lt;/li&gt;
&lt;li&gt;Lambda transformer&lt;/li&gt;
&lt;li&gt;Firehose delivery stream to S3&lt;/li&gt;
&lt;li&gt;SQS queue for indexing&lt;/li&gt;
&lt;li&gt;EventBridge rule that forwards selected events to Kinesis&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;This is intentionally a reference snippet (not a full production template) so the article stays readable.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;AWSTemplateFormatVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2010-09-09'&lt;/span&gt;
&lt;span class="na"&gt;Transform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless-2016-10-31&lt;/span&gt;
&lt;span class="na"&gt;Description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Serverless CDC/Event ingestion to analytics pipeline&lt;/span&gt;

&lt;span class="na"&gt;Resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;AnalyticsEventsStream&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Kinesis::Stream&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;StreamModeDetails&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;StreamMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ON_DEMAND&lt;/span&gt;
      &lt;span class="na"&gt;RetentionPeriodHours&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;48&lt;/span&gt;

  &lt;span class="na"&gt;RawAnalyticsBucket&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::S3::Bucket&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;BucketEncryption&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;ServerSideEncryptionConfiguration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;ServerSideEncryptionByDefault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;SSEAlgorithm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AES256&lt;/span&gt;

  &lt;span class="na"&gt;AnalyticsIndexQueue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::SQS::Queue&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;VisibilityTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;120&lt;/span&gt;
      &lt;span class="na"&gt;RedrivePolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;deadLetterTargetArn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;AnalyticsIndexDLQ.Arn&lt;/span&gt;
        &lt;span class="na"&gt;maxReceiveCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;

  &lt;span class="na"&gt;AnalyticsIndexDLQ&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::SQS::Queue&lt;/span&gt;

  &lt;span class="na"&gt;FirehoseToS3Role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::IAM::Role&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;AssumeRolePolicyDocument&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2012-10-17'&lt;/span&gt;
        &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
            &lt;span class="na"&gt;Principal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;Service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;firehose.amazonaws.com&lt;/span&gt;
            &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sts:AssumeRole&lt;/span&gt;
      &lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;PolicyName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;FirehoseS3Write&lt;/span&gt;
          &lt;span class="na"&gt;PolicyDocument&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2012-10-17'&lt;/span&gt;
            &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
                &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;s3:AbortMultipartUpload&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;s3:GetBucketLocation&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;s3:ListBucket&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;s3:ListBucketMultipartUploads&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;s3:PutObject&lt;/span&gt;
                &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;RawAnalyticsBucket.Arn&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${RawAnalyticsBucket.Arn}/*"&lt;/span&gt;

  &lt;span class="na"&gt;AnalyticsFirehose&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::KinesisFirehose::DeliveryStream&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DeliveryStreamType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DirectPut&lt;/span&gt;
      &lt;span class="na"&gt;ExtendedS3DestinationConfiguration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;BucketARN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;RawAnalyticsBucket.Arn&lt;/span&gt;
        &lt;span class="na"&gt;RoleARN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;FirehoseToS3Role.Arn&lt;/span&gt;
        &lt;span class="na"&gt;Prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dataset=events/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/"&lt;/span&gt;
        &lt;span class="na"&gt;ErrorOutputPrefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;errors/!{firehose:error-output-type}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/"&lt;/span&gt;
        &lt;span class="na"&gt;CompressionFormat&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GZIP&lt;/span&gt;
        &lt;span class="na"&gt;BufferingHints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;IntervalInSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
          &lt;span class="na"&gt;SizeInMBs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;

  &lt;span class="na"&gt;AnalyticsTransformerFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::Function&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python3.12&lt;/span&gt;
      &lt;span class="na"&gt;Handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app.lambda_handler&lt;/span&gt;
      &lt;span class="na"&gt;CodeUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;src/&lt;/span&gt;
      &lt;span class="na"&gt;MemorySize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1024&lt;/span&gt;
      &lt;span class="na"&gt;Timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;120&lt;/span&gt;
      &lt;span class="na"&gt;Environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;FIREHOSE_STREAM_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;AnalyticsFirehose&lt;/span&gt;
          &lt;span class="na"&gt;INDEX_QUEUE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;AnalyticsIndexQueue&lt;/span&gt;
      &lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
              &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;firehose:PutRecordBatch&lt;/span&gt;
              &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;AnalyticsFirehose.Arn&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
              &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sqs:SendMessageBatch&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sqs:SendMessage&lt;/span&gt;
              &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;AnalyticsIndexQueue.Arn&lt;/span&gt;
      &lt;span class="na"&gt;Events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;KinesisIngest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kinesis&lt;/span&gt;
          &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Stream&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;AnalyticsEventsStream.Arn&lt;/span&gt;
            &lt;span class="na"&gt;StartingPosition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LATEST&lt;/span&gt;
            &lt;span class="na"&gt;BatchSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt;
            &lt;span class="na"&gt;MaximumBatchingWindowInSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
            &lt;span class="na"&gt;FunctionResponseTypes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReportBatchItemFailures&lt;/span&gt;

  &lt;span class="na"&gt;EventBridgeToKinesisRole&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::IAM::Role&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;AssumeRolePolicyDocument&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2012-10-17'&lt;/span&gt;
        &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
            &lt;span class="na"&gt;Principal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;Service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;events.amazonaws.com&lt;/span&gt;
            &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sts:AssumeRole&lt;/span&gt;
      &lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;PolicyName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PutToKinesis&lt;/span&gt;
          &lt;span class="na"&gt;PolicyDocument&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2012-10-17'&lt;/span&gt;
            &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
                &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kinesis:PutRecord&lt;/span&gt;
                &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;AnalyticsEventsStream.Arn&lt;/span&gt;

  &lt;span class="na"&gt;AnalyticsEventRule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Events::Rule&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;EventPattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;commerce.orders&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;commerce.inventory&lt;/span&gt;
      &lt;span class="na"&gt;Targets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;AnalyticsEventsStream.Arn&lt;/span&gt;
          &lt;span class="na"&gt;Id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KinesisAnalyticsTarget&lt;/span&gt;
          &lt;span class="na"&gt;RoleArn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;EventBridgeToKinesisRole.Arn&lt;/span&gt;
          &lt;span class="na"&gt;KinesisParameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;PartitionKeyPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$.detail.orderId"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why this foundation works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Kinesis on-demand mode absorbs changing volume without shard capacity planning&lt;/li&gt;
&lt;li&gt;Firehose to S3 gives durable landing, buffering, and compression&lt;/li&gt;
&lt;li&gt;Lambda centralizes canonicalization and routing&lt;/li&gt;
&lt;li&gt;SQS isolates OpenSearch indexing retries from the main ingest path&lt;/li&gt;
&lt;li&gt;EventBridge feeds analytics without forcing every producer to know about Kinesis directly&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Lambda transformation layer (the part that pays for itself)
&lt;/h2&gt;

&lt;p&gt;This is the heart of the pattern.&lt;/p&gt;

&lt;p&gt;I use the Lambda transformation layer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;normalize different event formats into one canonical schema&lt;/li&gt;
&lt;li&gt;enrich records (tenant, derived dimensions, lookup joins if lightweight)&lt;/li&gt;
&lt;li&gt;attach dedupe metadata and ordering keys&lt;/li&gt;
&lt;li&gt;route records to the right sinks&lt;/li&gt;
&lt;li&gt;drop or quarantine malformed records&lt;/li&gt;
&lt;/ul&gt;
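&lt;p&gt;The dedupe metadata in that list should be derived deterministically, so a replayed event produces the exact same key. A minimal sketch (the key layout is my assumption):&lt;/p&gt;

```python
import hashlib

def make_idempotency_key(event_type: str, entity_id: str, occurred_at: str) -> str:
    """Deterministic sink-safe key: the same logical event always hashes
    to the same value, so replays and retries collapse in upsert sinks."""
    raw = "#".join((event_type, entity_id, occurred_at))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]
```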

&lt;h3&gt;
  
  
  Rules I follow for transformations
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Keep it deterministic (same input -&amp;gt; same normalized output)&lt;/li&gt;
&lt;li&gt;Keep it fast (avoid heavy network calls in the hot path)&lt;/li&gt;
&lt;li&gt;Keep it observable (emit counts by event type and error reason)&lt;/li&gt;
&lt;li&gt;Fail individual records, not whole batches, whenever possible&lt;/li&gt;
&lt;li&gt;Preserve original payload if analytics or debugging needs it&lt;/li&gt;
&lt;/ol&gt;
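&lt;p&gt;Rule 4 maps directly onto the &lt;code&gt;ReportBatchItemFailures&lt;/code&gt; setting on the Kinesis event source mapping: instead of raising (which retries the entire batch), the handler reports the sequence numbers of only the records that failed. A minimal sketch of that contract (helper names are mine):&lt;/p&gt;

```python
def handle_batch(records: list, process) -> dict:
    """Process each Kinesis record independently and collect failures
    in the shape Lambda expects when ReportBatchItemFailures is enabled."""
    failures = []
    for record in records:
        try:
            process(record)
        except Exception:
            # Record-level failure: report this sequence number, keep going.
            failures.append({"itemIdentifier": record["kinesis"]["sequenceNumber"]})
    return {"batchItemFailures": failures}
```

&lt;p&gt;For Kinesis, Lambda checkpoints just before the lowest reported sequence number and retries from there, which is exactly why the &lt;code&gt;idempotency_key&lt;/code&gt; in the envelope matters downstream.&lt;/p&gt;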




&lt;h2&gt;
  
  
  Example Lambda transformer (Kinesis -&amp;gt; Firehose + SQS)
&lt;/h2&gt;

&lt;p&gt;This example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reads a Kinesis batch&lt;/li&gt;
&lt;li&gt;normalizes records from multiple sources (EventBridge-shaped or direct JSON)&lt;/li&gt;
&lt;li&gt;writes canonical events to Firehose (S3 path)&lt;/li&gt;
&lt;li&gt;sends selected event types to SQS for OpenSearch indexing&lt;/li&gt;
&lt;li&gt;returns partial batch failures for retriable records
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;firehose&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;firehose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sqs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;FIREHOSE_STREAM_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FIREHOSE_STREAM_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;INDEX_QUEUE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INDEX_QUEUE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;INDEXABLE_EVENT_TYPES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order.placed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order.shipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product.updated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;utc_now_iso&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;+00:00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sha256_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalize_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;ingested_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;utc_now_iso&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detail-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;detail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eventType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detail-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;entity_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orderId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;productId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entityId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenantId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;ingested_at&lt;/span&gt;
        &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eventId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;sha256_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sort_keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;
        &lt;span class="n"&gt;source_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;entity_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;occurred_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;ingested_at&lt;/span&gt;
        &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;sha256_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sort_keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;source_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;direct-producer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;event_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;entity_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entityType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sequence_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;#&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entity_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;#&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;idempotency_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:v&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event_version&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event_version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity_type&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;occurred_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ingested_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ingested_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traceId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idempotency_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;idempotency_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequence_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sequence_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transport&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;normalized_by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analytics-transformer-lambda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schema_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;canonical-analytics-event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schema_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_kinesis_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;raw_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_bytes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chunked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_to_firehose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;separators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;chunked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;firehose&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_record_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DeliveryStreamName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FIREHOSE_STREAM_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Records&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FailedPutCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Firehose batch write had &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;FailedPutCount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; failed records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_index_jobs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;chunked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MessageBody&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;occurred_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;occurred_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idempotency_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idempotency_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;sqs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;QueueUrl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;INDEX_QUEUE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Entries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;transformed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;index_jobs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;batch_item_failures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="n"&gt;start_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;sequence_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequenceNumber&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_kinesis_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;canonical&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;normalize_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;canonical&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;canonical&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;INDEXABLE_EVENT_TYPES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;index_jobs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;canonical&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;itemIdentifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sequence_number&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to parse/normalize record&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequence_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sequence_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;}))&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;send_to_firehose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Firehose write failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)}))&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
                &lt;span class="n"&gt;seq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequenceNumber&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;itemIdentifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;itemIdentifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batchItemFailures&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;index_jobs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;send_index_jobs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_jobs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Often I do not fail primary ingest if indexing queue write fails.
&lt;/span&gt;            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Index queue write failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)}))&lt;/span&gt;

    &lt;span class="n"&gt;duration_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_ms&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INFO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Batch processed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;records_in&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;records_transformed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;records_indexed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_jobs&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;records_failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;duration_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batchItemFailures&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why this implementation pattern works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;I treat S3 landing as the primary success path&lt;/li&gt;
&lt;li&gt;I isolate OpenSearch indexing via SQS&lt;/li&gt;
&lt;li&gt;I use partial batch failure for source retries&lt;/li&gt;
&lt;li&gt;I preserve enough metadata (&lt;code&gt;event_id&lt;/code&gt;, &lt;code&gt;idempotency_key&lt;/code&gt;, &lt;code&gt;sequence_key&lt;/code&gt;) for dedupe and replay&lt;/li&gt;
&lt;/ul&gt;
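
&lt;p&gt;If your normalizer does not already populate those keys, here is a minimal sketch of deriving them deterministically. The field names follow the canonical schema used in this article; &lt;code&gt;build_dedupe_keys&lt;/code&gt; itself is illustrative, not part of the pipeline code above.&lt;/p&gt;

```python
import hashlib

def build_dedupe_keys(canonical: dict) -> dict:
    """Derive deterministic dedupe/replay keys from a canonical event (illustrative)."""
    # Idempotency key: a stable hash over the fields that define "the same event",
    # so replaying the same source record always yields the same key.
    basis = "|".join([
        canonical["event_type"],
        canonical["tenant_id"],
        canonical["entity_id"],
        canonical["occurred_at"],
    ])
    idempotency_key = hashlib.sha256(basis.encode("utf-8")).hexdigest()
    # Sequence key: groups events that must be replayed in order per entity.
    sequence_key = f'{canonical["tenant_id"]}#{canonical["entity_id"]}'
    return {"idempotency_key": idempotency_key, "sequence_key": sequence_key}
```

&lt;p&gt;Because the key is a pure function of event content, downstream consumers can dedupe on it no matter how many times the source record is retried.&lt;/p&gt;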




&lt;h2&gt;
  
  
  OpenSearch delivery pattern (what I do in practice)
&lt;/h2&gt;

&lt;p&gt;For OpenSearch, I do not assume the ingest path and indexing path should share the same retry semantics.&lt;/p&gt;

&lt;p&gt;That is why I often decouple indexing with SQS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why SQS in front of OpenSearch helps
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;OpenSearch can throttle under load&lt;/li&gt;
&lt;li&gt;index mapping errors or payload issues should not block S3 landing&lt;/li&gt;
&lt;li&gt;I can tune retry behavior independently&lt;/li&gt;
&lt;li&gt;I can replay index jobs from S3 if needed&lt;/li&gt;
&lt;/ul&gt;
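
&lt;p&gt;Independent retry tuning is mostly a matter of queue attributes. A sketch of what I would pass to &lt;code&gt;sqs.set_queue_attributes&lt;/code&gt; (the DLQ ARN, receive count, and timeout here are placeholder values, not recommendations for every workload):&lt;/p&gt;

```python
import json

def redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """SQS queue attributes for independent retry tuning (values are examples)."""
    return {
        # After max_receive_count failed deliveries, SQS moves the message
        # to the dead-letter queue instead of retrying it forever.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receive_count,
        }),
        # Keep redeliveries from overlapping an in-flight indexer run;
        # AWS guidance is at least 6x the Lambda function timeout.
        "VisibilityTimeout": "180",
    }
```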

&lt;h3&gt;
  
  
  Simple SQS -&amp;gt; Lambda -&amp;gt; OpenSearch indexer (illustrative snippet)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opensearchpy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenSearch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RequestsHttpConnection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;helpers&lt;/span&gt;

&lt;span class="n"&gt;OPENSEARCH_HOST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENSEARCH_HOST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;INDEX_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENSEARCH_INDEX&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analytics-events&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;hosts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OPENSEARCH_HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;use_ssl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verify_certs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;connection_class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RequestsHttpConnection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;to_index_doc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;occurred_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;occurred_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idempotency_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idempotency_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;evt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_op_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;INDEX_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;to_index_doc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;helpers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bulk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;indexed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Best practice note
&lt;/h3&gt;

&lt;p&gt;Using &lt;code&gt;_id = event_id&lt;/code&gt; gives me idempotent-friendly indexing behavior (retries overwrite the same document rather than creating duplicates). That is usually what I want for analytics and search event documents.&lt;/p&gt;
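
&lt;p&gt;A toy in-memory model makes the overwrite semantics concrete. &lt;code&gt;apply_actions&lt;/code&gt; below is illustrative only, not an OpenSearch API; it just shows that replaying the same action keyed by &lt;code&gt;_id&lt;/code&gt; collapses to one document:&lt;/p&gt;

```python
def apply_actions(index: dict, actions: list) -> dict:
    """Tiny stand-in for an index keyed by _id (illustrative, not an OpenSearch API)."""
    for action in actions:
        # An "index" op with an explicit _id is an upsert: a retry of the
        # same delivery overwrites the document instead of duplicating it.
        index[action["_id"]] = action["_source"]
    return index

# Delivering the same action twice leaves exactly one document behind.
docs: dict = {}
action = {"_id": "evt-1", "_source": {"event_type": "order.created"}}
apply_actions(docs, [action])
apply_actions(docs, [action])  # simulated SQS redelivery
```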




&lt;h2&gt;
  
  
  Delivery to Redshift (S3-first is the pattern I recommend most)
&lt;/h2&gt;

&lt;p&gt;For analytics warehouses, I usually prefer S3-first ingestion rather than writing directly to Redshift from the transformation Lambda.&lt;/p&gt;

&lt;p&gt;Why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 is a durable landing zone for replay and audit&lt;/li&gt;
&lt;li&gt;Redshift loads can be batched efficiently&lt;/li&gt;
&lt;li&gt;I can rebuild tables from historical data&lt;/li&gt;
&lt;li&gt;I keep ingestion and warehouse modeling decoupled&lt;/li&gt;
&lt;/ul&gt;
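
&lt;p&gt;Most of those benefits hinge on a predictable key layout. A sketch of the date-partitioned landing key I use (the &lt;code&gt;dataset=/year=/month=/day=&lt;/code&gt; layout is my own convention, not an AWS requirement):&lt;/p&gt;

```python
from datetime import datetime

def s3_landing_key(dataset: str, event_id: str, occurred_at: str) -> str:
    """Date-partitioned S3 key so replays and warehouse loads can target one day."""
    # occurred_at is the canonical ISO-8601 timestamp, e.g. "2026-02-25T19:07:00Z".
    ts = datetime.fromisoformat(occurred_at.replace("Z", "+00:00"))
    return (
        f"dataset={dataset}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{event_id}.json.gz"
    )
```

&lt;p&gt;With this layout, a day-level replay is a single prefix listing, and a warehouse load can target one day's prefix.&lt;/p&gt;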

&lt;h3&gt;
  
  
  Common pattern
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Land canonical events in S3&lt;/li&gt;
&lt;li&gt;Load into a staging table in Redshift&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MERGE&lt;/code&gt; into analytics tables (or fact tables)&lt;/li&gt;
&lt;li&gt;Keep a watermark or batch manifest for operations&lt;/li&gt;
&lt;/ol&gt;
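
&lt;p&gt;For step 4, a small JSON manifest per load batch is usually enough. A minimal sketch (the field names are my own convention):&lt;/p&gt;

```python
import json
from datetime import datetime, timezone

def build_manifest(batch_id: str, s3_keys: list) -> str:
    """JSON manifest recording exactly which S3 objects one warehouse load consumed."""
    return json.dumps({
        "batch_id": batch_id,
        "object_count": len(s3_keys),
        # Sorted so re-generated manifests for the same batch compare equal.
        "objects": sorted(s3_keys),
        "created_at": datetime.now(timezone.utc).isoformat(),
    })
```

&lt;p&gt;I store the manifest next to the data and advance the load watermark only after the merge commits, so a crashed load can be re-run from the manifest instead of re-scanning prefixes.&lt;/p&gt;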

&lt;h3&gt;
  
  
  Example Redshift SQL (staging + merge)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;staging_events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event_id&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt;       &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;event_version&lt;/span&gt;    &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;source&lt;/span&gt;           &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tenant_id&lt;/span&gt;        &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;entity_type&lt;/span&gt;      &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;entity_id&lt;/span&gt;        &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;occurred_at&lt;/span&gt;      &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ingested_at&lt;/span&gt;      &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;trace_id&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;idempotency_key&lt;/span&gt;  &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;sequence_key&lt;/span&gt;     &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;          &lt;span class="n"&gt;SUPER&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt; &lt;span class="n"&gt;staging_events&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'s3://your-bucket/dataset=events/year=2026/month=02/day=25/'&lt;/span&gt;
&lt;span class="n"&gt;IAM_ROLE&lt;/span&gt; &lt;span class="s1"&gt;'arn:aws:iam::&amp;lt;account-id&amp;gt;:role/RedshiftCopyRole'&lt;/span&gt;
&lt;span class="n"&gt;FORMAT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt; &lt;span class="s1"&gt;'auto'&lt;/span&gt;
&lt;span class="n"&gt;TIMEFORMAT&lt;/span&gt; &lt;span class="s1"&gt;'auto'&lt;/span&gt;
&lt;span class="n"&gt;GZIP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;fact_order_events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event_id&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tenant_id&lt;/span&gt;        &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt;       &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;occurred_at&lt;/span&gt;      &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ingested_at&lt;/span&gt;      &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;           &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;currency&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;          &lt;span class="n"&gt;SUPER&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;MERGE&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;fact_order_events&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;tgt&lt;/span&gt;
&lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;entity_id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ingested_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;TRY_CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currency&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;staging_events&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;entity_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'order'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;tgt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;MATCHED&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt;
    &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ingested_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ingested_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;currency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="n"&gt;MATCHED&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ingested_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ingested_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;TRUNCATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;staging_events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why &lt;code&gt;MERGE&lt;/code&gt; is important in CDC and event pipelines
&lt;/h3&gt;

&lt;p&gt;Retries and replays happen. &lt;code&gt;MERGE&lt;/code&gt; lets me keep warehouse loads idempotent at the table level rather than assuming every batch is perfectly unique.&lt;/p&gt;




&lt;h2&gt;
  
  
  Ordering, duplication, and replay (the part that breaks naive designs)
&lt;/h2&gt;

&lt;p&gt;This is where I spend a lot of time in reviews because it directly affects data correctness.&lt;/p&gt;




&lt;h2&gt;
  
  
  Ordering: what I can and cannot guarantee
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Kinesis ordering
&lt;/h3&gt;

&lt;p&gt;Kinesis preserves record order within a shard, so in practice I reason about ordering per partition key.&lt;/p&gt;

&lt;p&gt;If I need ordering for &lt;code&gt;orderId&lt;/code&gt;, I choose a partition key tied to that ordering scope (for example &lt;code&gt;tenantId#orderId&lt;/code&gt;).&lt;/p&gt;
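&lt;p&gt;As a rough sketch of why the partition key controls ordering: Kinesis hashes the key with MD5 into a 128-bit integer, and each shard owns a contiguous hash key range. The function and key names below are illustrative, and real shard ranges shift when shards are split or merged:&lt;/p&gt;

```python
import hashlib

# Approximate sketch of how Kinesis assigns a partition key to a shard:
# MD5 of the key read as a 128-bit integer, located in one of
# shard_count equal hash-key ranges. Real streams can have uneven
# ranges after splits/merges; this only illustrates the mechanism.
def shard_for_key(partition_key, shard_count):
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_value = int(digest, 16)
    range_size = (2 ** 128) // shard_count
    return min(hash_value // range_size, shard_count - 1)

# Every event with the same tenant#order key maps to the same shard,
# which is what preserves per-order arrival order.
key = "tenant-42#order-1001"
assert shard_for_key(key, 8) == shard_for_key(key, 8)
```

Because the mapping is deterministic, all events for one `tenantId#orderId` stay on one shard, while different orders spread across shards.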

&lt;h3&gt;
  
  
  EventBridge ordering
&lt;/h3&gt;

&lt;p&gt;I do not assume EventBridge preserves strict ordering across events. If ordering matters for analytics correctness, I enforce it downstream with event timestamps, versions, and conflict resolution logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQS ordering
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Standard queues: no strict ordering, duplicates possible&lt;/li&gt;
&lt;li&gt;FIFO queues: ordered per MessageGroupId, with a bounded dedupe window&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  My rule of thumb
&lt;/h3&gt;

&lt;p&gt;I preserve ordering only where it matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;choose a sequence key&lt;/li&gt;
&lt;li&gt;partition based on that key when using Kinesis or FIFO&lt;/li&gt;
&lt;li&gt;store &lt;code&gt;event_version&lt;/code&gt; and &lt;code&gt;occurred_at&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;make downstream upserts resilient to out-of-order arrivals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trying to preserve global ordering everywhere usually makes the system slower and more expensive than it needs to be.&lt;/p&gt;
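&lt;p&gt;The upsert resilience mentioned above can be sketched as a version-aware write. This is an illustrative in-memory sketch, not warehouse code; field names like &lt;code&gt;event_version&lt;/code&gt; follow the canonical record described in this article:&lt;/p&gt;

```python
# Illustrative in-memory sketch of a version-aware upsert: an event
# only lands if it is new, or at least as new as the stored version.
# Ties are harmless replays of the same event, so they are applied.
def upsert(table, event):
    current = table.get(event["event_id"])
    if current is None:
        table[event["event_id"]] = event
        return
    newest = max(event["event_version"], current["event_version"])
    if newest == event["event_version"]:
        table[event["event_id"]] = event
    # otherwise: a stale, out-of-order arrival; keep the newer row

store = {}
upsert(store, {"event_id": "e1", "event_version": 2, "status": "shipped"})
upsert(store, {"event_id": "e1", "event_version": 1, "status": "created"})
assert store["e1"]["status"] == "shipped"  # late v1 did not clobber v2
```

The same rule translates to a `WHERE src.event_version` comparison inside a warehouse `MERGE`.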




&lt;h2&gt;
  
  
  Duplication: assume it will happen
&lt;/h2&gt;

&lt;p&gt;I assume duplicates can appear because of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;producer retries&lt;/li&gt;
&lt;li&gt;Lambda retries and partial batch retries&lt;/li&gt;
&lt;li&gt;EventBridge target retries&lt;/li&gt;
&lt;li&gt;SQS redrives&lt;/li&gt;
&lt;li&gt;replay and backfill operations&lt;/li&gt;
&lt;li&gt;manual reprocessing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How I handle duplicates
&lt;/h3&gt;

&lt;p&gt;I include in the canonical record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;event_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idempotency_key&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then I make sinks safe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3: duplicates can exist physically, but I dedupe downstream in queries or ETL&lt;/li&gt;
&lt;li&gt;OpenSearch: use &lt;code&gt;_id = event_id&lt;/code&gt; to overwrite same document on retry&lt;/li&gt;
&lt;li&gt;Redshift: &lt;code&gt;MERGE&lt;/code&gt; on &lt;code&gt;event_id&lt;/code&gt; or business key&lt;/li&gt;
&lt;/ul&gt;
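&lt;p&gt;The OpenSearch strategy can be sketched as building &lt;code&gt;_bulk&lt;/code&gt; request lines with &lt;code&gt;_id&lt;/code&gt; set to &lt;code&gt;event_id&lt;/code&gt;, so a retried batch overwrites the same documents instead of duplicating them. The index name below is illustrative:&lt;/p&gt;

```python
import json

# Sketch of building OpenSearch _bulk newline-delimited lines where
# _id is the event_id. A retried batch re-indexes the same documents,
# which makes the sink idempotent. Index name is illustrative.
def bulk_lines(events, index="events-v1"):
    lines = []
    for event in events:
        action = {"index": {"_index": index, "_id": event["event_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"

batch = [{"event_id": "e1", "event_type": "order_created"}]
payload = bulk_lines(batch)
```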

&lt;p&gt;This is the "exactly-once is a myth, at-least-once is reality" principle applied to analytics ingestion.&lt;/p&gt;




&lt;h2&gt;
  
  
  Replay: make it a feature, not an emergency procedure
&lt;/h2&gt;

&lt;p&gt;I design replay paths intentionally from day one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replay options in this architecture
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Kinesis retention replay (re-read retained stream window)&lt;/li&gt;
&lt;li&gt;EventBridge archive and replay (for applicable event bus scenarios)&lt;/li&gt;
&lt;li&gt;S3 reprocessing (most flexible for historical rebuilds and backfills)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why S3 replay matters most
&lt;/h3&gt;

&lt;p&gt;Even if I have Kinesis or EventBridge replay features, S3 is usually my best long-term replay layer because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it keeps historical data longer&lt;/li&gt;
&lt;li&gt;it is cheap and durable&lt;/li&gt;
&lt;li&gt;I can reprocess with new transformation logic&lt;/li&gt;
&lt;li&gt;I can rebuild OpenSearch or Redshift if needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why I strongly prefer S3-first landing for analytics pipelines.&lt;/p&gt;




&lt;h2&gt;
  
  
  Partitioning and cost optimization (where big savings come from)
&lt;/h2&gt;

&lt;p&gt;This topic matters a lot because ingestion costs scale with volume, and bad partitioning creates pain in both storage and query engines.&lt;/p&gt;




&lt;h2&gt;
  
  
  S3 partitioning strategy (practical guidance)
&lt;/h2&gt;

&lt;p&gt;A common anti-pattern is over-partitioning too early.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I usually start with
&lt;/h3&gt;

&lt;p&gt;For general event analytics, I start with time-based partitions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;dataset=events/year=YYYY/month=MM/day=DD/&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then, if query patterns justify it, I add one more selective dimension:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tenant_id=...&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;event_type=...&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
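&lt;p&gt;A minimal sketch of that layout, assuming UTC event timestamps (the object naming is illustrative):&lt;/p&gt;

```python
from datetime import datetime, timezone

# Illustrative builder for a time-partitioned S3 key. The prefix
# layout mirrors the dataset/year/month/day scheme above; the
# object naming is an assumption for the example.
def s3_key(event_id, occurred_at, dataset="events"):
    t = occurred_at.astimezone(timezone.utc)
    return (
        f"dataset={dataset}/"
        f"year={t:%Y}/month={t:%m}/day={t:%d}/"
        f"{event_id}.json.gz"
    )

ts = datetime(2026, 4, 22, 19, 7, tzinfo=timezone.utc)
assert s3_key("e1", ts) == "dataset=events/year=2026/month=04/day=22/e1.json.gz"
```

Adding `tenant_id=` or `event_type=` later is a matter of one more path segment, which is why I keep the base scheme simple.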

&lt;h3&gt;
  
  
  What I avoid early on
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;highly granular partitions that create too many small files&lt;/li&gt;
&lt;li&gt;partitioning on high-cardinality IDs like &lt;code&gt;order_id&lt;/code&gt; or &lt;code&gt;user_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;changing partition schemes frequently without a migration plan&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Small files are a real cost problem
&lt;/h3&gt;

&lt;p&gt;Too many tiny files hurt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Athena or Redshift query planning and performance&lt;/li&gt;
&lt;li&gt;metadata overhead&lt;/li&gt;
&lt;li&gt;downstream ETL efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why I use batching and buffering (Firehose or app-side batching) and aim for healthy object sizes.&lt;/p&gt;
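&lt;p&gt;App-side batching can be sketched as a buffer that flushes on a record-count or byte-size threshold. The thresholds and the &lt;code&gt;sink&lt;/code&gt; callback below are illustrative, not Firehose settings:&lt;/p&gt;

```python
# Illustrative app-side batcher: buffer records and flush to a sink
# when either a record-count or byte-size threshold is reached, so
# landed objects stay a healthy size instead of one file per event.
class Batcher:
    def __init__(self, sink, max_records=500, max_bytes=5_000_000):
        self.sink = sink
        self.max_records = max_records
        self.max_bytes = max_bytes
        self.buffer = []
        self.size = 0

    def add(self, record_bytes):
        self.buffer.append(record_bytes)
        self.size += len(record_bytes)
        byte_limit_hit = min(self.size, self.max_bytes) == self.max_bytes
        if len(self.buffer) == self.max_records or byte_limit_hit:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer = []
            self.size = 0

flushed = []
batcher = Batcher(sink=flushed.append, max_records=3)
for _ in range(7):
    batcher.add(b"event-bytes")
batcher.flush()  # drain the remainder at shutdown
assert [len(batch) for batch in flushed] == [3, 3, 1]
```

Firehose does the same thing as a managed service, with buffer size and buffer interval as the two knobs.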




&lt;h2&gt;
  
  
  Firehose and file size optimization
&lt;/h2&gt;

&lt;p&gt;Firehose helps reduce operational overhead, and I use it a lot for S3 landing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Best practices I apply
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;enable compression (GZIP minimum; Parquet or ORC conversion when appropriate)&lt;/li&gt;
&lt;li&gt;tune buffer interval and size&lt;/li&gt;
&lt;li&gt;use error output prefixes for bad records&lt;/li&gt;
&lt;li&gt;keep schemas stable enough if using format conversion&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When I choose Parquet conversion
&lt;/h3&gt;

&lt;p&gt;I choose Parquet when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;analytics queries dominate&lt;/li&gt;
&lt;li&gt;schema is reasonably stable&lt;/li&gt;
&lt;li&gt;I want lower scan cost and faster query performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I keep JSON initially if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;schema changes rapidly&lt;/li&gt;
&lt;li&gt;debugging raw payloads is a priority&lt;/li&gt;
&lt;li&gt;multiple downstream consumers still need semi-structured payloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A common compromise is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;raw JSON landing plus curated Parquet later&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Kinesis cost and throughput optimization
&lt;/h2&gt;

&lt;p&gt;Kinesis can be very cost-effective when used intentionally, but I still tune it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decisions I make explicitly
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;On-demand vs provisioned

&lt;ul&gt;
&lt;li&gt;start with on-demand for uncertain traffic&lt;/li&gt;
&lt;li&gt;move to provisioned when traffic is predictable and steady&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;partition key distribution to avoid hot shards&lt;/li&gt;

&lt;li&gt;batch sizes and Lambda windowing to reduce invocation overhead&lt;/li&gt;

&lt;li&gt;consumer count (Enhanced Fan-Out only when justified)&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Hot shard warning sign
&lt;/h3&gt;

&lt;p&gt;If one key dominates (for example a single tenant or entity), I can get uneven throughput and throttling.&lt;/p&gt;

&lt;p&gt;Fixes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;better partition key strategy&lt;/li&gt;
&lt;li&gt;partition key suffixing (only if ordering requirements allow)&lt;/li&gt;
&lt;li&gt;separating noisy tenants or workloads&lt;/li&gt;
&lt;/ul&gt;
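&lt;p&gt;Partition key suffixing can be sketched as deriving a deterministic sub-key from the event id, so one noisy tenant spreads across several shards. This is only safe when per-key ordering is not required; the fan-out value is illustrative:&lt;/p&gt;

```python
import hashlib

# Illustrative key suffixing for a hot key: spread one noisy tenant
# across `fanout` sub-keys. Deterministic, so retries of the same
# event always produce the same key. Only use this when ordering
# within the base key is not required.
def suffixed_key(base_key, event_id, fanout=8):
    digest = hashlib.md5(event_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % fanout
    return f"{base_key}#{bucket}"

key = suffixed_key("tenant-noisy", "e1")
assert key == suffixed_key("tenant-noisy", "e1")  # stable across retries
```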




&lt;h2&gt;
  
  
  Lambda cost optimization in ingest pipelines
&lt;/h2&gt;

&lt;p&gt;Lambda is often not the dominant cost at first, but it can become noticeable at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tuning areas I care about
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;batch size and batching window&lt;/li&gt;
&lt;li&gt;memory sizing (to get better CPU and network, and shorter runtime)&lt;/li&gt;
&lt;li&gt;avoiding heavy per-record network calls&lt;/li&gt;
&lt;li&gt;reusing clients across invocations&lt;/li&gt;
&lt;li&gt;minimizing unnecessary JSON serialization churn&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A practical optimization
&lt;/h3&gt;

&lt;p&gt;I treat the transformer as a batch processor, not a record-at-a-time handler. That usually improves both throughput and cost.&lt;/p&gt;
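&lt;p&gt;A minimal sketch of that shape, using Lambda's partial batch response (&lt;code&gt;batchItemFailures&lt;/code&gt;) so one bad record does not force the whole batch to retry. The &lt;code&gt;transform&lt;/code&gt; function is a stand-in for real normalization logic:&lt;/p&gt;

```python
import base64
import json

# Stand-in for real normalization logic; raises on malformed payloads.
def transform(payload):
    record = json.loads(payload)
    record["normalized"] = True
    return record

# Sketch of a Kinesis-triggered transformer that processes the whole
# batch and reports only failed records back via Lambda's partial
# batch response shape: {"batchItemFailures": [{"itemIdentifier": seq}]}.
def handler(event, context=None):
    failures = []
    for rec in event["Records"]:
        try:
            payload = base64.b64decode(rec["kinesis"]["data"])
            transform(payload)
        except Exception:
            failures.append(
                {"itemIdentifier": rec["kinesis"]["sequenceNumber"]}
            )
    return {"batchItemFailures": failures}
```

Only the failed sequence numbers are retried, which keeps retries cheap and avoids reprocessing the healthy part of the batch.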




&lt;h2&gt;
  
  
  Redshift cost optimization in event ingestion
&lt;/h2&gt;

&lt;p&gt;When Redshift is the warehouse sink, I optimize the load pattern, not just the compute.&lt;/p&gt;

&lt;h3&gt;
  
  
  Best practices I use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;load from S3 in batches (COPY), not row-by-row inserts from Lambda&lt;/li&gt;
&lt;li&gt;stage then MERGE&lt;/li&gt;
&lt;li&gt;align file sizes to efficient COPY behavior&lt;/li&gt;
&lt;li&gt;keep raw event retention in S3 so warehouse tables can be rebuilt and re-modeled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many teams, the biggest cost win is simply moving from ad hoc inserts to an S3 batch load pattern.&lt;/p&gt;




&lt;h2&gt;
  
  
  End-to-end implementation discussion (how I wire this in production)
&lt;/h2&gt;

&lt;p&gt;This is the part I care about most because architecture decisions show up in operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) I define the source of truth for ingestion success
&lt;/h3&gt;

&lt;p&gt;In this design, the source of truth is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Successful normalized delivery to S3 (via Firehose).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 is durable&lt;/li&gt;
&lt;li&gt;S3 supports replay&lt;/li&gt;
&lt;li&gt;downstream sinks can catch up independently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This prevents me from coupling ingestion success to OpenSearch availability, for example.&lt;/p&gt;




&lt;h3&gt;
  
  
  2) I decouple sink-specific SLAs
&lt;/h3&gt;

&lt;p&gt;Different sinks serve different users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenSearch may need near-real-time indexing for search or ops views&lt;/li&gt;
&lt;li&gt;Redshift loads may run in micro-batches&lt;/li&gt;
&lt;li&gt;lake consumers may process hourly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By decoupling them, I avoid making the entire pipeline as fragile as the most sensitive sink.&lt;/p&gt;




&lt;h3&gt;
  
  
  3) I make replay and backfill a documented operation
&lt;/h3&gt;

&lt;p&gt;I document:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;replay source (Kinesis, EventBridge archive, or S3)&lt;/li&gt;
&lt;li&gt;dedupe keys and merge behavior&lt;/li&gt;
&lt;li&gt;expected lag and throughput limits&lt;/li&gt;
&lt;li&gt;how to avoid double-indexing side effects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns replay into an operational capability instead of a risky one-off script.&lt;/p&gt;




&lt;h3&gt;
  
  
  4) I design for schema evolution early
&lt;/h3&gt;

&lt;p&gt;Events change. They always do.&lt;/p&gt;

&lt;p&gt;I version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;event schema (&lt;code&gt;event_version&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;transformation logic (deployable version)&lt;/li&gt;
&lt;li&gt;warehouse model migrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also preserve raw payloads so I can re-derive curated data if the schema evolves.&lt;/p&gt;
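&lt;p&gt;Schema versioning can be sketched as dispatching on &lt;code&gt;event_version&lt;/code&gt; so old and new producers coexist during a migration. The field renames below are illustrative:&lt;/p&gt;

```python
# Illustrative version dispatch: v1 producers used "orderId", v2
# producers use the canonical "order_id". Both normalize to the
# same contract, so a migration can roll out gradually.
def normalize_v1(raw):
    return {"order_id": raw["orderId"], "event_version": 1}

def normalize_v2(raw):
    return {"order_id": raw["order_id"], "event_version": 2}

NORMALIZERS = {1: normalize_v1, 2: normalize_v2}

def normalize(raw):
    # Events without a version predate versioning, so default to v1.
    version = raw.get("event_version", 1)
    return NORMALIZERS[version](raw)

assert normalize({"orderId": "o1"})["order_id"] == "o1"
assert normalize({"order_id": "o2", "event_version": 2})["order_id"] == "o2"
```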




&lt;h2&gt;
  
  
  Common mistakes I see (and how I avoid them)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mistake 1: Using only EventBridge for everything and expecting stream semantics
&lt;/h3&gt;

&lt;p&gt;EventBridge is excellent for routing, but it is not the same as Kinesis when I need sustained high-throughput ordered ingestion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; use EventBridge for routing, Kinesis for the analytics ingestion backbone when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 2: Letting sink failures block primary landing
&lt;/h3&gt;

&lt;p&gt;If OpenSearch throttles and that blocks the whole ingest path, the pipeline becomes fragile.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; make S3 landing primary, and decouple secondary sinks with SQS or replay.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 3: No canonical schema
&lt;/h3&gt;

&lt;p&gt;Every producer emits a different shape, and downstream SQL gets messy fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; normalize once in Lambda and publish a canonical analytics contract.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 4: Ignoring duplication until dashboards look wrong
&lt;/h3&gt;

&lt;p&gt;Retries, redrives, and replay all create duplicates eventually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; include &lt;code&gt;event_id&lt;/code&gt; and dedupe keys, and make each sink idempotent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 5: Over-partitioning S3 on day one
&lt;/h3&gt;

&lt;p&gt;This creates small files, metadata overhead, and poor performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; start with time partitions and compression, then add dimensions based on real query patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical best practices checklist
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Transport decisioning
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] EventBridge used for routing and integration use cases&lt;/li&gt;
&lt;li&gt;[ ] Kinesis used where throughput, order, and replay requirements justify it&lt;/li&gt;
&lt;li&gt;[ ] SQS used for buffering and retry isolation where needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Transformation layer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Canonical schema defined and versioned&lt;/li&gt;
&lt;li&gt;[ ] Transformer is deterministic and observable&lt;/li&gt;
&lt;li&gt;[ ] Partial batch failure behavior is configured for stream or queue consumers&lt;/li&gt;
&lt;li&gt;[ ] Original payload preserved when needed for replay and debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sink delivery
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] S3 is durable landing zone (preferred for analytics)&lt;/li&gt;
&lt;li&gt;[ ] OpenSearch indexing path is decoupled from primary ingest&lt;/li&gt;
&lt;li&gt;[ ] Redshift loads are batch-based (COPY and MERGE), not row-by-row Lambda inserts&lt;/li&gt;
&lt;li&gt;[ ] Dedupe and idempotency strategy exists per sink&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Ordering / duplication / replay
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Partition keys align to ordering scope&lt;/li&gt;
&lt;li&gt;[ ] Duplicate handling defined across retries and replays&lt;/li&gt;
&lt;li&gt;[ ] Replay and backfill path documented and tested&lt;/li&gt;
&lt;li&gt;[ ] Metrics and alarms exist for lag, failure, and sink throttling&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cost / performance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] S3 compression enabled&lt;/li&gt;
&lt;li&gt;[ ] Partitioning strategy avoids small-file explosion&lt;/li&gt;
&lt;li&gt;[ ] Kinesis mode (on-demand or provisioned) chosen intentionally&lt;/li&gt;
&lt;li&gt;[ ] Lambda batching and memory tuned with real metrics&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;If I had to summarize this architecture pattern in one line, it would be:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use the right service for the right ingestion job, normalize once, land durably in S3, and make every downstream sink replay-safe.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That combination gives me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cleaner producer integrations&lt;/li&gt;
&lt;li&gt;better analytics correctness&lt;/li&gt;
&lt;li&gt;safer reprocessing&lt;/li&gt;
&lt;li&gt;more predictable scaling and cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most teams, the biggest improvement is not a new service. It is adopting a clearer ingestion architecture with explicit semantics for ordering, duplication, replay, and sink ownership.&lt;/p&gt;

&lt;p&gt;If you are building serverless analytics pipelines on AWS, this pattern will give you a strong foundation that can grow with both event volume and analytics complexity.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Amazon EventBridge documentation (event buses, rules, targets, archive/replay)&lt;/li&gt;
&lt;li&gt;Amazon Kinesis Data Streams documentation (stream modes, ordering, retention, consumers)&lt;/li&gt;
&lt;li&gt;Amazon SQS documentation (standard vs FIFO, retries, DLQs)&lt;/li&gt;
&lt;li&gt;AWS Lambda documentation (event source mappings, partial batch response)&lt;/li&gt;
&lt;li&gt;Amazon Kinesis Data Firehose documentation (S3/OpenSearch/Redshift delivery, buffering, compression)&lt;/li&gt;
&lt;li&gt;Amazon S3 documentation (partitioning and storage best practices)&lt;/li&gt;
&lt;li&gt;Amazon OpenSearch Service documentation&lt;/li&gt;
&lt;li&gt;Amazon Redshift and Redshift Serverless documentation (COPY, MERGE, SUPER)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>aws</category>
      <category>serverless</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Rebuilding TLS, Part 3 — Building Our First Handshake</title>
      <dc:creator>Dmytro Huz</dc:creator>
      <pubDate>Sun, 19 Apr 2026 17:09:17 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/rebuilding-tls-part-3-building-our-first-handshake-4a2j</link>
      <guid>https://future.forem.com/aws-builders/rebuilding-tls-part-3-building-our-first-handshake-4a2j</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Overview: Where We Are and What Is Still Missing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In the previous part of this series, we made our fake secure channel much less fake.&lt;/p&gt;

&lt;p&gt;We started with the broken encrypted transport from &lt;a href="https://www.dmytrohuz.com/p/rebuilding-tls-part-1-why-encryption" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;, added integrity with HMAC, added sequence numbers to make the record layer less naive, &lt;a href="https://www.dmytrohuz.com/p/rebuilding-tls-part-2-adding-integrity" rel="noopener noreferrer"&gt;and then moved to AEAD&lt;/a&gt; — the approach modern systems usually use to protect records.&lt;/p&gt;

&lt;p&gt;At that point, our protocol could already do something meaningful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;encrypt application data&lt;/li&gt;
&lt;li&gt;detect tampering&lt;/li&gt;
&lt;li&gt;reject modified records&lt;/li&gt;
&lt;li&gt;keep some minimal record-layer state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That was a real step forward.&lt;/p&gt;

&lt;p&gt;But it still relied on one very unrealistic assumption:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;both sides already shared the secret keys&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And that is exactly what we need to remove now.&lt;/p&gt;

&lt;p&gt;Because a real secure protocol cannot stop at protecting data after the keys already exist. It also has to answer one of the harder questions first:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;if client and server do not already share a secret, how can they create one over an insecure network in the first place?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the goal of this part.&lt;/p&gt;

&lt;p&gt;We are going to build the next missing layer of the protocol: the handshake.&lt;/p&gt;

&lt;p&gt;The architecture of this step is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client                           Server
------                           ------
Handshake messages  &amp;lt;---------&amp;gt;  Handshake messages
       |                               |
       v                               v
  shared secret                  shared secret
       |                               |
       +---------&amp;gt; HKDF &amp;lt;--------------+
                    |
                    v
              session keys
                    |
                    v
         protected application data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The idea is to let the connection create fresh key material dynamically instead of starting with a hardcoded application key.&lt;/p&gt;

&lt;p&gt;We will implement that in three steps.&lt;/p&gt;

&lt;p&gt;First, we will build a handshake with classic Diffie-Hellman, where the shared prime and base are still explicit and visible in the protocol. Then we will replace that version with X25519 to show how modern protocols simplify the same idea. After that, we will use HKDF to derive proper session keys from the raw shared secret.&lt;/p&gt;

&lt;p&gt;That will take us one big step closer to the shape of real TLS.&lt;/p&gt;

&lt;p&gt;But still not all the way.&lt;/p&gt;

&lt;p&gt;Because even if both sides manage to derive the same fresh session keys, one critical problem will remain: they still do not know who is on the other side.&lt;/p&gt;

&lt;p&gt;And that is where this part is heading.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;A Very Short Note on Public Key Exchange&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The basic idea of public key exchange is simple.&lt;/p&gt;

&lt;p&gt;Two sides communicate over an insecure network. They exchange some public information. And from that exchange, both sides derive the same shared secret — without ever sending that secret directly over the wire.&lt;/p&gt;

&lt;p&gt;That is the key point.&lt;/p&gt;

&lt;p&gt;The network can be fully visible.&lt;/p&gt;

&lt;p&gt;An observer can see all handshake messages.&lt;/p&gt;

&lt;p&gt;But the observer still should not be able to derive the same secret.&lt;/p&gt;

&lt;p&gt;That is exactly the kind of mechanism we need now.&lt;/p&gt;

&lt;p&gt;Until this point in the series, our protocol always started with a secret that already existed. Public key exchange changes that. It gives the connection a way to create fresh shared key material dynamically.&lt;/p&gt;

&lt;p&gt;In this article, I do not want to go deep into the mathematics behind it. I only want to use the core idea as the next building block of the protocol.&lt;/p&gt;

&lt;p&gt;If you want the deeper intuition behind why this works, I already wrote about it here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The aha moment of public key encryption&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dmytrohuz.com/p/the-aha-moment-of-public-key-encryption" rel="noopener noreferrer"&gt;https://www.dmytrohuz.com/p/the-aha-moment-of-public-key-encryption&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For now, the main idea we need is this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;each side contributes its own private value&lt;/li&gt;
&lt;li&gt;both sides exchange some public values&lt;/li&gt;
&lt;li&gt;both sides derive the same shared secret&lt;/li&gt;
&lt;li&gt;that secret can then become the basis for session keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So let’s build that first in the most explicit way, with classic Diffie-Hellman where the shared public parameters are still visible in the handshake.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Implementation Part 1 — Our First Handshake with Classic Diffie-Hellman&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now let’s build the first real handshake in the series.&lt;/p&gt;

&lt;p&gt;I want to start with classic Diffie-Hellman, not because this is the final form we want to keep, but because it makes the mechanics of key exchange much more visible.&lt;/p&gt;

&lt;p&gt;In this version, both sides work with the same public parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a prime p&lt;/li&gt;
&lt;li&gt;a generator g&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These values are not secret. In our implementation, the client sends them in the handshake, which makes the whole mechanism more explicit on the wire. That is exactly what I want at this stage. Before we hide the details behind a cleaner modern primitive, I want to make the structure fully visible.&lt;/p&gt;

&lt;p&gt;The actual secret material comes from somewhere else:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the client chooses a private exponent a&lt;/li&gt;
&lt;li&gt;the server chooses a private exponent b&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From those private values, both sides compute public values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the client computes A = g^a mod p&lt;/li&gt;
&lt;li&gt;the server computes B = g^b mod p&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then they exchange A and B.&lt;/p&gt;

&lt;p&gt;And this is the key step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the client computes s = B^a mod p&lt;/li&gt;
&lt;li&gt;the server computes s = A^b mod p&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both sides end up with the same shared secret, without ever sending that secret directly over the network.&lt;/p&gt;
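&lt;p&gt;The symmetry can be checked with deliberately tiny toy numbers, far too small to be secure but small enough to verify by hand:&lt;/p&gt;

```python
# Toy parameters: insecure on purpose, chosen so the arithmetic
# can be checked by hand.
p = 23   # "prime modulus"
g = 5    # generator

a = 6    # client's private exponent (stays local)
b = 15   # server's private exponent (stays local)

A = pow(g, a, p)   # client's public value: 8
B = pow(g, b, p)   # server's public value: 19

client_secret = pow(B, a, p)   # B^a mod p
server_secret = pow(A, b, p)   # A^b mod p

print(client_secret, server_secret)  # prints: 2 2
```

Both exponentiations land on the same value because B^a = (g^b)^a = g^(ab) = (g^a)^b = A^b, all mod p.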

&lt;p&gt;In diagram form, the handshake looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client                                        Server
------                                        ------
choose private a
compute A = g^a mod p

ClientHello(p, g, A)        ---------&amp;gt;

                                              choose private b
                                              compute B = g^b mod p

                            &amp;lt;---------          ServerHello(B)

compute s = B^a mod p                           compute s = A^b mod p
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is our first real handshake.&lt;/p&gt;

&lt;p&gt;Until now, the protocol always started with a secret key that already existed.&lt;/p&gt;

&lt;p&gt;Now the connection itself creates the secret.&lt;/p&gt;

&lt;p&gt;That is a major shift.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The raw Diffie-Hellman math&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At the lowest level, the core operations are very small. That is one of the nice things about starting with classic Diffie-Hellman: the whole idea is still visible in a few functions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# RFC 3526 Group 14: 2048-bit MODP prime
&lt;/span&gt;&lt;span class="n"&gt;DH_PRIME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;29024E088A67CC74020BBEA63B139B22514A08798E3404DD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EF9519B3CD3A431B302B0A6DF25F14374FE1356D6D51C245&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;E485B576625E7EC6F44C42E9A637ED6B0BFF5CB6F406B7ED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EE386BFB5A899FA5AE9F24117C4B1FE649286651ECE45B3D&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;C2007CB8A163BF0598DA48361C55D39A69163FA8FD24CF5F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;83655D23DCA3AD961C62F356208552BB9ED529077096966D&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;670C354E4ABC9804F1746C08CA18217C32905E462E36CE3B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;E39E772C180E86039B2783A2EC07A28FB5C55DF06F4C52C9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DE2BCBF6955817183995497CEA956AE515D2261898FA0510&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15728E5A8AACAA68FFFFFFFFFFFFFFFF&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;DH_GENERATOR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_private_exponent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urandom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;big&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_public_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;private&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;private&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_shared_secret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;peer_public&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;private&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;peer_public&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;private&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the whole core idea in code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;private exponent stays local&lt;/li&gt;
&lt;li&gt;public value goes on the wire&lt;/li&gt;
&lt;li&gt;shared secret is derived independently on both sides&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the heart of Diffie-Hellman.&lt;/p&gt;
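&lt;p&gt;Running the three helpers locally, with both endpoints simulated in one process, shows the two derivations agree. The definitions are repeated here so the snippet runs on its own:&lt;/p&gt;

```python
import os

# RFC 3526 Group 14 prime (2048-bit MODP), same constant as above.
DH_PRIME = int(
    "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD1"
    "29024E088A67CC74020BBEA63B139B22514A08798E3404DD"
    "EF9519B3CD3A431B302B0A6DF25F14374FE1356D6D51C245"
    "E485B576625E7EC6F44C42E9A637ED6B0BFF5CB6F406B7ED"
    "EE386BFB5A899FA5AE9F24117C4B1FE649286651ECE45B3D"
    "C2007CB8A163BF0598DA48361C55D39A69163FA8FD24CF5F"
    "83655D23DCA3AD961C62F356208552BB9ED529077096966D"
    "670C354E4ABC9804F1746C08CA18217C32905E462E36CE3B"
    "E39E772C180E86039B2783A2EC07A28FB5C55DF06F4C52C9"
    "DE2BCBF6955817183995497CEA956AE515D2261898FA0510"
    "15728E5A8AACAA68FFFFFFFFFFFFFFFF",
    16,
)
DH_GENERATOR = 2

def generate_private_exponent():
    return int.from_bytes(os.urandom(32), "big")

def compute_public_value(private, g, p):
    return pow(g, private, p)

def compute_shared_secret(peer_public, private, p):
    return pow(peer_public, private, p)

# Simulate both endpoints in one process.
a = generate_private_exponent()                      # client's private
b = generate_private_exponent()                      # server's private
A = compute_public_value(a, DH_GENERATOR, DH_PRIME)  # goes on the wire
B = compute_public_value(b, DH_GENERATOR, DH_PRIME)  # goes on the wire

client_side = compute_shared_secret(B, a, DH_PRIME)
server_side = compute_shared_secret(A, b, DH_PRIME)
print(client_side == server_side)  # prints: True
```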

&lt;h3&gt;
  
  
  &lt;strong&gt;Client side&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;client_handshake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Perform the client side of the classic DH handshake.

    The client picks the public parameters (p, g) and sends them to the
    server along with its own public DH value.  The server uses those
    parameters to compute its own public value and sends it back.

    Returns the shared secret as bytes.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# The client chooses p and g.  These are PUBLIC — not secret.
&lt;/span&gt;    &lt;span class="c1"&gt;# Anyone on the wire can see them, and that is perfectly fine.
&lt;/span&gt;    &lt;span class="c1"&gt;# The security of DH depends on the hardness of the discrete
&lt;/span&gt;    &lt;span class="c1"&gt;# logarithm problem, not on hiding p and g.
&lt;/span&gt;    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DH_PRIME&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DH_GENERATOR&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Public parameters (chosen by client, sent to server):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    p = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;... (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bit_length&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; bits)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    g = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Generate client's private exponent and public value.
&lt;/span&gt;    &lt;span class="c1"&gt;# The private exponent is the ONE thing that stays secret.
&lt;/span&gt;    &lt;span class="n"&gt;client_private&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_private_exponent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;client_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_public_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_private&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_public&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Send ClientHello with p, g, and our public value.
&lt;/span&gt;    &lt;span class="c1"&gt;# All three are public.  The private exponent is NOT included.
&lt;/span&gt;    &lt;span class="n"&gt;p_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client_hello&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encode_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TAG_DH_P&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_bytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TAG_DH_G&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g_bytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TAG_DH_PUBLIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_public_bytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 3: send p, g, and the client’s public value inside ClientHello
&lt;/span&gt;    &lt;span class="nf"&gt;send_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_hello&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Receive ServerHello with the server's public value.
&lt;/span&gt;    &lt;span class="n"&gt;server_hello_raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recv_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decode_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_hello_raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TAG_DH_PUBLIC&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ServerHello missing DH public value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;server_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bytes_to_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_public_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &amp;lt;- Received ServerHello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Server public value B:   &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;hex_preview&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_public_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 5: Compute the shared secret.
&lt;/span&gt;    &lt;span class="c1"&gt;# shared = B^a mod p = (g^b)^a mod p = g^(ab) mod p
&lt;/span&gt;    &lt;span class="n"&gt;shared_int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_shared_secret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_public&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_private&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;shared_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shared_int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;shared_bytes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the client side, the flow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;choose a private exponent&lt;/li&gt;
&lt;li&gt;compute the public value&lt;/li&gt;
&lt;li&gt;send p, g, and the client’s public value inside ClientHello&lt;/li&gt;
&lt;li&gt;receive the server’s public value&lt;/li&gt;
&lt;li&gt;derive the shared secret&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the first point in the series where the client does not begin with the application key. It participates in creating it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Server side&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;server_handshake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Perform the server side of the classic DH handshake.

    The server receives p, g, and client_public from the ClientHello,
    uses those parameters to generate its own keypair, and sends its
    public value back.

    Returns the shared secret as bytes.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Receive ClientHello — parse p, g, and client's public value.
&lt;/span&gt;    &lt;span class="c1"&gt;# The server does NOT assume any particular p or g.  It uses whatever
&lt;/span&gt;    &lt;span class="c1"&gt;# the client proposes.  (In a production system, the server would
&lt;/span&gt;    &lt;span class="c1"&gt;# validate that p is a safe prime and g is a proper generator.
&lt;/span&gt;    &lt;span class="c1"&gt;# We skip that here for clarity.)
&lt;/span&gt;    &lt;span class="n"&gt;client_hello_raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recv_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decode_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_hello_raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;p_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;g_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;client_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TAG_DH_P&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;p_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TAG_DH_G&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;g_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TAG_DH_PUBLIC&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;client_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ClientHello missing DH prime (p)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;g_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ClientHello missing DH generator (g)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;client_public_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ClientHello missing DH public value (A)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Deserialize the parameters from bytes.
&lt;/span&gt;    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bytes_to_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bytes_to_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bytes_to_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_public_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Generate server's private exponent and public value
&lt;/span&gt;    &lt;span class="c1"&gt;# using the p and g received from the client.
&lt;/span&gt;    &lt;span class="n"&gt;server_private&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_private_exponent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Compute server's public value
&lt;/span&gt;    &lt;span class="n"&gt;server_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_public_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_private&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_public&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Send ServerHello with our public value.
&lt;/span&gt;    &lt;span class="c1"&gt;# Only B is sent — p and g are already known from the ClientHello.
&lt;/span&gt;    &lt;span class="n"&gt;server_hello&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encode_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TAG_DH_PUBLIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_public_bytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;send_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_hello&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 5: Compute the shared secret.
&lt;/span&gt;    &lt;span class="c1"&gt;# shared = A^b mod p = (g^a)^b mod p = g^(ab) mod p
&lt;/span&gt;    &lt;span class="n"&gt;shared_int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_shared_secret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_public&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_private&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;shared_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shared_int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;shared_bytes&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server does the mirror image:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;receive p, g, and the client’s public value&lt;/li&gt;
&lt;li&gt;choose its own private exponent&lt;/li&gt;
&lt;li&gt;compute its own public value&lt;/li&gt;
&lt;li&gt;send that value back in ServerHello&lt;/li&gt;
&lt;li&gt;derive the same shared secret from the client’s public value&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So at the end of the handshake, both sides have the same secret — but that secret was never transmitted directly.&lt;/p&gt;

&lt;p&gt;That is the big win.&lt;/p&gt;

&lt;p&gt;After this step, the connection can create fresh shared key material dynamically.&lt;/p&gt;
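&lt;p&gt;As a preview of how that fresh key material might become session keys, here is a minimal HKDF-style sketch using only the standard library. The salt and direction labels are illustrative assumptions, not part of the protocol so far:&lt;/p&gt;

```python
import hashlib
import hmac

def derive_session_keys(shared_secret):
    """HKDF-style extract-and-expand; a placeholder KDF, not the protocol's final one."""
    # Extract: concentrate the secret's entropy into a fixed-size key (PRK).
    prk = hmac.new(b"handshake-salt", shared_secret, hashlib.sha256).digest()
    # Expand: one key per direction, so the two traffic directions
    # never share a key.
    c2s = hmac.new(prk, b"client-to-server" + b"\x01", hashlib.sha256).digest()
    s2c = hmac.new(prk, b"server-to-client" + b"\x01", hashlib.sha256).digest()
    return c2s, s2c

c2s_key, s2c_key = derive_session_keys(b"example shared secret bytes")
print(len(c2s_key), len(s2c_key))  # prints: 32 32
```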

&lt;p&gt;That is a much more realistic foundation.&lt;/p&gt;

&lt;p&gt;But it is also still awkward.&lt;/p&gt;

&lt;p&gt;Not conceptually awkward — educationally this version is very useful — but operationally awkward. We now have explicit p and g in the handshake, which is nice for understanding the mechanism, but clunky for a modern protocol design.&lt;/p&gt;

&lt;p&gt;That is exactly why the next step will replace this version with X25519.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Implementation Part 2 — Simplifying the Handshake with X25519&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The classic Diffie-Hellman version was useful because it made the mechanics of the handshake fully visible.&lt;/p&gt;

&lt;p&gt;But it also makes something else visible:&lt;/p&gt;

&lt;p&gt;it is a bit clunky.&lt;/p&gt;

&lt;p&gt;Not conceptually clunky — educationally it is great — but operationally clunky. There are more moving parts in the handshake, more explicit protocol fields, and more visible math than modern protocols usually want to expose directly.&lt;/p&gt;

&lt;p&gt;So now we keep the same core idea and simplify the workflow.&lt;/p&gt;

&lt;p&gt;That is where &lt;strong&gt;X25519&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;The conceptual goal stays exactly the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;both sides generate ephemeral private/public key pairs&lt;/li&gt;
&lt;li&gt;both sides exchange public keys&lt;/li&gt;
&lt;li&gt;both sides derive the same shared secret&lt;/li&gt;
&lt;li&gt;that secret will later become the basis for session keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What changes is the &lt;em&gt;shape&lt;/em&gt; of the handshake.&lt;/p&gt;

&lt;p&gt;We no longer need to carry an explicit prime and generator through the protocol. We no longer manually perform modular exponentiation with visible p and g. X25519 gives us the same public-key exchange idea in a much cleaner modern form.&lt;/p&gt;

&lt;p&gt;That is why I wanted this section right after the classic DH version.&lt;/p&gt;

&lt;p&gt;Classic DH makes the mechanism visible.&lt;/p&gt;

&lt;p&gt;X25519 shows what the modern streamlined version looks like.&lt;/p&gt;
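&lt;p&gt;As a standalone sketch, both sides of an X25519 exchange fit in a few lines with the &lt;code&gt;cryptography&lt;/code&gt; package (the same library the handshake code below uses):&lt;/p&gt;

```python
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

# Each side generates an ephemeral keypair; no p or g travels on the wire.
client_private = X25519PrivateKey.generate()
server_private = X25519PrivateKey.generate()

# Only the 32-byte public keys are exchanged.
client_public = client_private.public_key()
server_public = server_private.public_key()

# Both sides derive the same 32-byte shared secret independently.
client_secret = client_private.exchange(server_public)
server_secret = server_private.exchange(client_public)

print(client_secret == server_secret, len(client_secret))  # prints: True 32
```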

&lt;h3&gt;
  
  
  &lt;strong&gt;Client-side handshake structure&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here is the current client handshake implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;client_handshake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Perform the client side of the X25519 handshake.

    Returns the 32-byte shared secret.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[handshake] Client: starting X25519 handshake&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Generate an ephemeral X25519 keypair.
&lt;/span&gt;    &lt;span class="c1"&gt;# "Ephemeral" means we create a fresh keypair for this session only.
&lt;/span&gt;    &lt;span class="c1"&gt;# The private key never leaves this process and is discarded after use.
&lt;/span&gt;    &lt;span class="n"&gt;client_private&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X25519PrivateKey&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;client_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client_private&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;public_key&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;client_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client_public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;public_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Encoding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PublicFormat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Send ClientHello with our public key.
&lt;/span&gt;    &lt;span class="n"&gt;client_hello&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encode_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TAG_X25519_PUBLIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_public_bytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;send_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_hello&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Receive ServerHello with the server's public key.
&lt;/span&gt;    &lt;span class="n"&gt;server_hello_raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recv_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decode_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_hello_raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TAG_X25519_PUBLIC&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ServerHello missing X25519 public key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Deserialize the server's public key from raw bytes.
&lt;/span&gt;    &lt;span class="n"&gt;server_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X25519PublicKey&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_public_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_public_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Compute the shared secret.
&lt;/span&gt;    &lt;span class="c1"&gt;# X25519(client_private, server_public) = X25519(server_private, client_public)
&lt;/span&gt;    &lt;span class="c1"&gt;# This is the elliptic-curve equivalent of g^(ab) mod p from v1.
&lt;/span&gt;    &lt;span class="n"&gt;shared_secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client_private&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exchange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_public&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;shared_secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
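&lt;p&gt;The server side mirrors this exactly, so the core property can be checked in one process without sockets. The sketch below skips the record framing (send_record, encode_message, and the tag constants belong to the series code and are not reproduced here) and only verifies what the handshake relies on: two ephemeral keypairs, serialized to raw 32-byte wire form and back, produce the same shared secret on both sides.&lt;/p&gt;

```python
# Self-contained sanity check: two X25519 keypairs, one shared secret.
# No sockets or record framing -- just the math the handshake relies on.
from cryptography.hazmat.primitives.asymmetric.x25519 import (
    X25519PrivateKey,
    X25519PublicKey,
)
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# Each side generates an ephemeral keypair for this session only.
client_private = X25519PrivateKey.generate()
server_private = X25519PrivateKey.generate()

# Each side serializes its public key to the raw 32-byte wire form.
client_pub = client_private.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
server_pub = server_private.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
assert len(client_pub) == 32 and len(server_pub) == 32

# Each side reconstructs the peer's key from bytes and runs exchange().
client_secret = client_private.exchange(X25519PublicKey.from_public_bytes(server_pub))
server_secret = server_private.exchange(X25519PublicKey.from_public_bytes(client_pub))

assert client_secret == server_secret
```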



&lt;p&gt;I like this version because it makes the transition very clear.&lt;/p&gt;

&lt;p&gt;The client code no longer has to think about p and g at all. It just performs the handshake and returns the shared secret. That is exactly the point of this stage in the series: the workflow becomes smaller, but the underlying purpose stays the same.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What changed conceptually&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Compared to the classic DH version, the protocol has become simpler in three important ways.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. No explicit shared public parameters in the handshake&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In the previous version, the client sent the prime and generator so the whole structure of classic Diffie-Hellman stayed visible.&lt;/p&gt;

&lt;p&gt;Now that goes away.&lt;/p&gt;

&lt;p&gt;X25519 already gives us a fixed, standard structure for the exchange, so the handshake only needs to carry the public key material.&lt;/p&gt;

&lt;p&gt;That makes the protocol smaller and cleaner.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. The public values are much more compact&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In the classic DH version, the public values were large integers in a prime field, so each one took hundreds of bytes on the wire for typical parameter sizes.&lt;/p&gt;

&lt;p&gt;In this version, the public keys are just 32 bytes.&lt;/p&gt;

&lt;p&gt;That is a huge practical simplification.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. The code starts to look more like real modern protocol code&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This line from the comments says it well:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;generate(), exchange(), done.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is exactly the feeling this section should create.&lt;/p&gt;

&lt;p&gt;We are still doing public-key exchange.&lt;/p&gt;

&lt;p&gt;We are still deriving a shared secret.&lt;/p&gt;

&lt;p&gt;But the implementation shape is now much closer to what modern systems actually use.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What this version still does not solve&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Even after switching to X25519, this version is still simplified:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;there is still &lt;strong&gt;no authentication&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;the shared secret is &lt;strong&gt;not yet turned into session keys&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;there is still &lt;strong&gt;no record-layer encryption using the new keys&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the next step, we will add &lt;strong&gt;HKDF&lt;/strong&gt; and derive proper working session keys from the shared secret.&lt;/p&gt;

&lt;p&gt;That is where the handshake starts to connect back to the record protection we built earlier.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Implementation Part 3 — Deriving Session Keys with HKDF&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At this point, both the classic Diffie-Hellman version and the X25519 version give us the same kind of output:&lt;/p&gt;

&lt;p&gt;a shared secret that both sides can compute independently.&lt;/p&gt;

&lt;p&gt;That is already a big step forward compared to the pre-shared-key model from the previous parts. The connection can now create fresh key material dynamically instead of starting with one hardcoded application key.&lt;/p&gt;

&lt;p&gt;But there is still one important design question left:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;should we use that raw shared secret directly as the application key?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a toy demo, we probably could.&lt;/p&gt;

&lt;p&gt;But even here, that would be the wrong direction.&lt;/p&gt;

&lt;p&gt;Because a cleaner protocol separates these two ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the handshake creates a shared secret&lt;/li&gt;
&lt;li&gt;the protocol derives working session keys from that secret&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly where &lt;strong&gt;HKDF&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;HKDF is a key-derivation function. Its job is not to invent secrecy out of nowhere, but to take existing secret material and turn it into keys that are better structured and easier to use safely inside the protocol.&lt;/p&gt;

&lt;p&gt;So instead of treating the X25519 output as “the AES key,” we will use HKDF to derive proper session keys from it.&lt;/p&gt;

&lt;p&gt;That already makes the protocol feel much closer to real TLS.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What changes conceptually&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The structure now becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X25519 shared secret
        |
        v
      HKDF
        |
        v
  session key material
        |
        v
 protected application data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is an important shift.&lt;/p&gt;

&lt;p&gt;Before this step, the handshake produced something secret and we could have stopped there.&lt;/p&gt;

&lt;p&gt;After this step, the handshake produces an &lt;em&gt;input&lt;/em&gt; to a key schedule.&lt;/p&gt;

&lt;p&gt;That is a much better protocol design.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why this matters&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;There are two main reasons to do this.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. The raw shared secret is handshake output, not final protocol state&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The shared secret is the result of key exchange. That does not automatically mean it should be used directly as the application-data key.&lt;/p&gt;

&lt;p&gt;Protocols usually want a cleaner boundary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;handshake result first&lt;/li&gt;
&lt;li&gt;working keys second&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. We can derive keys for different purposes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Once we introduce a key-derivation step, we are no longer forced into “one secret for everything.”&lt;/p&gt;

&lt;p&gt;Even in this toy protocol, that opens the door to a much more realistic design.&lt;/p&gt;

&lt;p&gt;For example, instead of one single AEAD key, we can derive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;client → server key&lt;/li&gt;
&lt;li&gt;server → client key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is already much closer to how real secure protocols think.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Deriving the keys&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In the current implementation, HKDF takes the X25519 shared secret and stretches it into 64 bytes of key material.&lt;/p&gt;

&lt;p&gt;Then that material is split into two 32-byte keys:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one for traffic from client to server&lt;/li&gt;
&lt;li&gt;one for traffic from server to client&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives us directional keys instead of one shared application key for both directions.&lt;/p&gt;

&lt;p&gt;Here is the key schedule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# key_schedule_x25519.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cryptography.hazmat.primitives&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashes&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cryptography.hazmat.primitives.kdf.hkdf&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HKDF&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;derive_session_keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shared_secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;key_material&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HKDF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;algorithm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;hashes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SHA256&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;salt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;toy-tls-part-3-x25519&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;derive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shared_secret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client_to_server_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key_material&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;server_to_client_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key_material&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client_to_server_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_to_client_key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
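&lt;p&gt;Copying that schedule into a quick self-contained check makes its two essential properties visible: the derivation is deterministic (both peers, given the same shared secret, compute identical key material), and the two directions get distinct 32-byte keys. The all-zero secret below is just a stand-in input for the check, not something a real handshake would produce.&lt;/p&gt;

```python
# Check the key schedule: same input -> same keys, split into two halves.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_session_keys(shared_secret: bytes) -> tuple[bytes, bytes]:
    # Same parameters as the schedule above; HKDF objects are single-use,
    # so a fresh instance is built per derivation.
    key_material = HKDF(
        algorithm=hashes.SHA256(),
        length=64,
        salt=None,
        info=b"toy-tls-part-3-x25519",
    ).derive(shared_secret)
    return key_material[:32], key_material[32:]

secret = bytes(32)  # stand-in for a real X25519 shared secret
c2s_a, s2c_a = derive_session_keys(secret)
c2s_b, s2c_b = derive_session_keys(secret)

assert len(c2s_a) == 32 and len(s2c_a) == 32
assert (c2s_a, s2c_a) == (c2s_b, s2c_b)  # deterministic: both peers agree
assert c2s_a != s2c_a                    # directions get independent keys
```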



&lt;p&gt;I like this step a lot because it is small in code, but it changes the protocol mindset in an important way.&lt;/p&gt;

&lt;p&gt;We are no longer thinking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;handshake gives us the key&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We are now thinking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;handshake gives us secret material, and the protocol derives the keys it actually wants to use&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a much stronger model.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A small but important detail&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Notice that the two sides must interpret the derived keys consistently.&lt;/p&gt;

&lt;p&gt;If the client treats the first 32 bytes as the client → server key, then the server must do the same. Otherwise the channel will immediately break.&lt;/p&gt;

&lt;p&gt;So now the handshake is not only producing shared secret material. It is also establishing a shared rule for how that material becomes working traffic keys.&lt;/p&gt;

&lt;p&gt;That is another reason protocols need structure, not just primitives.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Connecting HKDF back to the record layer&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now we can finally connect this part back to what we built earlier.&lt;/p&gt;

&lt;p&gt;In Part 2, we already built an AEAD-protected record layer. But that record layer still depended on hardcoded keys.&lt;/p&gt;

&lt;p&gt;Now that changes.&lt;/p&gt;

&lt;p&gt;The AEAD layer no longer starts with a static key from configuration.&lt;/p&gt;

&lt;p&gt;It receives fresh traffic keys from the handshake.&lt;/p&gt;

&lt;p&gt;So the protocol shape becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Handshake -&amp;gt; X25519 shared secret -&amp;gt; HKDF -&amp;gt; directional session keys -&amp;gt; AEAD protected records
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
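&lt;p&gt;The whole pipeline can be sketched in a few lines. The protect and unprotect helpers below are minimal stand-ins (AES-GCM with the sequence number as nonce), not the series' exact record-layer functions, and the handshake is collapsed into one process with no serialization; the point is only to show the shape: handshake output feeds HKDF, HKDF output feeds the record layer.&lt;/p&gt;

```python
# End-to-end sketch: X25519 -> HKDF -> directional AEAD keys -> one record.
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Minimal stand-ins for the record helpers: AES-256-GCM, 12-byte nonce
# built from the record sequence number.
def protect(key: bytes, seq: int, plaintext: bytes) -> bytes:
    return AESGCM(key).encrypt(seq.to_bytes(12, "big"), plaintext, None)

def unprotect(key: bytes, seq: int, ciphertext: bytes) -> bytes:
    return AESGCM(key).decrypt(seq.to_bytes(12, "big"), ciphertext, None)

# Handshake: both sides compute the same shared secret.
client_priv, server_priv = X25519PrivateKey.generate(), X25519PrivateKey.generate()
secret = client_priv.exchange(server_priv.public_key())

# Key schedule: stretch the secret into two directional 32-byte keys.
material = HKDF(
    algorithm=hashes.SHA256(),
    length=64,
    salt=None,
    info=b"toy-tls-part-3-x25519",
).derive(secret)
client_write_key, server_write_key = material[:32], material[32:]

# Record layer: the client protects with its write key, and the server
# unprotects the same record with the same key.
record = protect(client_write_key, 0, b"GET / HTTP/1.1\r\n\r\n")
assert unprotect(client_write_key, 0, record) == b"GET / HTTP/1.1\r\n\r\n"
```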



&lt;p&gt;That is a major milestone in the series.&lt;/p&gt;

&lt;p&gt;At this point, the protocol no longer just looks secure because we wrapped some bytes in encryption. It now has a real high-level structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;first establish shared key material&lt;/li&gt;
&lt;li&gt;then derive traffic keys&lt;/li&gt;
&lt;li&gt;then use those keys to protect application data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is already much closer to the shape of real TLS.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Using the new session keys&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Once the keys are derived, the record layer can use them directly.&lt;/p&gt;

&lt;p&gt;Conceptually, the flow now looks like this:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Client&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AF_INET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOCK_STREAM&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Connected to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;HOST&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;    &lt;span class="c1"&gt;# PHASE 1: HANDSHAKE
&lt;/span&gt;    &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;    &lt;span class="c1"&gt;# New in Part 3: the handshake dynamically establishes session keys.
&lt;/span&gt;    &lt;span class="c1"&gt;# No pre-shared secret needed.
&lt;/span&gt;    &lt;span class="n"&gt;client_write_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_write_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;client_handshake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;    &lt;span class="c1"&gt;# PHASE 2: APPLICATION DATA
&lt;/span&gt;    &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;    &lt;span class="c1"&gt;# The record layer now uses HKDF-derived keys instead of hardcoded ones.
&lt;/span&gt;    &lt;span class="c1"&gt;# The record format is the same as Part 2 Stage 3 (AEAD).
&lt;/span&gt;
    &lt;span class="c1"&gt;# --- Send request (encrypted with client_write_key) ---
&lt;/span&gt;    &lt;span class="n"&gt;protected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;protect_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_write_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;send_seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;send_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;protected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;send_seq&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="c1"&gt;# --- Receive response (decrypted with server_write_key) ---
&lt;/span&gt;    &lt;span class="n"&gt;raw_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recv_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;unprotect_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_write_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recv_seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;recv_seq&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;  Decrypted response:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;  *** REJECTED: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ***&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Done.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;use client_write_key to protect outgoing application data&lt;/li&gt;
&lt;li&gt;use server_write_key to unprotect incoming application data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Server&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AF_INET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOCK_STREAM&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setsockopt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOL_SOCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SO_REUSEADDR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Listening on &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;HOST&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;accept&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;        &lt;span class="c1"&gt;# PHASE 1: HANDSHAKE
&lt;/span&gt;        &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;        &lt;span class="n"&gt;client_write_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_write_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;server_handshake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;        &lt;span class="c1"&gt;# PHASE 2: APPLICATION DATA
&lt;/span&gt;        &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;
        &lt;span class="c1"&gt;# --- Receive request (decrypted with client_write_key) ---
&lt;/span&gt;        &lt;span class="n"&gt;raw_request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recv_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;unprotect_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_write_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recv_seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;recv_seq&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;  *** REJECTED: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ***&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Connection closed — refusing to process invalid data.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

            &lt;span class="c1"&gt;# --- Send response (encrypted with server_write_key) ---
&lt;/span&gt;            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HTTP/1.1 200 OK&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type: text/plain&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Length: 13&lt;/span&gt;&lt;span class="se"&gt;\r\n\r\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello, client&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;protected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;protect_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_write_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;send_seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;send_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;protected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;send_seq&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Done.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;use client_write_key to unprotect incoming client traffic&lt;/li&gt;
&lt;li&gt;use server_write_key to protect outgoing server traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the two directions are now separated.&lt;/p&gt;

&lt;p&gt;This is cleaner than a single symmetric application key shared blindly in both directions, and it makes the protocol feel more deliberate.&lt;/p&gt;

&lt;p&gt;Even in this simplified version, that is a meaningful step.&lt;/p&gt;
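&lt;p&gt;To make the idea concrete, here is a minimal stdlib-only sketch of that derivation (illustrative names, not the article's exact code): one HKDF extract, then two expands with different &lt;code&gt;info&lt;/code&gt; labels, so the two directions get unrelated keys from the same handshake secret.&lt;/p&gt;

```python
# A minimal HKDF (RFC 5869) sketch using only hashlib/hmac.
# The labels and the fake shared secret are illustrative.
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes

def hkdf_extract(salt, ikm):
    # Extract: turn the non-uniform DH output into a uniform PRK.
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk, info, length):
    # Expand: stretch the PRK into independent output bound to `info`.
    nblocks = -(-length // HASH_LEN)  # ceiling division
    okm, block = b"", b""
    for counter in range(1, nblocks + 1):
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
    return okm[:length]

# One shared secret, two directions: different labels, unrelated keys.
shared_secret = b"\x11" * 32          # stand-in for the X25519 output
prk = hkdf_extract(b"\x00" * HASH_LEN, shared_secret)
client_write_key = hkdf_expand(prk, b"client write", 32)
server_write_key = hkdf_expand(prk, b"server write", 32)
```

&lt;p&gt;Because the &lt;code&gt;info&lt;/code&gt; label is mixed into every HMAC block, the two outputs are computationally independent even though they come from the same secret.&lt;/p&gt;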




&lt;h2&gt;
  
  
  &lt;strong&gt;What this step really gave us&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;By adding HKDF, we improved the protocol in a way that is easy to underestimate.&lt;/p&gt;

&lt;p&gt;We did not just “derive another key.”&lt;/p&gt;

&lt;p&gt;We made the protocol architecture cleaner.&lt;/p&gt;

&lt;p&gt;Now the handshake and the traffic layer are connected in a more principled way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the handshake creates shared secret material&lt;/li&gt;
&lt;li&gt;the key schedule turns that material into working keys&lt;/li&gt;
&lt;li&gt;the record layer consumes those keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a much better model than treating the raw X25519 result as the final answer.&lt;/p&gt;

&lt;p&gt;And it brings us one step closer to real TLS, where key derivation is not an optional detail, but one of the central pieces of the protocol design.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;But we are still not secure&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;And now we arrive at the uncomfortable but necessary part.&lt;/p&gt;

&lt;p&gt;Even with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a real handshake&lt;/li&gt;
&lt;li&gt;X25519&lt;/li&gt;
&lt;li&gt;HKDF&lt;/li&gt;
&lt;li&gt;fresh directional session keys&lt;/li&gt;
&lt;li&gt;AEAD-protected records&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the protocol still cannot be considered secure.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because all of this still says nothing about &lt;strong&gt;who&lt;/strong&gt; is on the other side.&lt;/p&gt;

&lt;p&gt;The handshake can successfully create shared secrets.&lt;/p&gt;

&lt;p&gt;HKDF can successfully derive traffic keys.&lt;/p&gt;

&lt;p&gt;The record layer can successfully protect application data.&lt;/p&gt;

&lt;p&gt;And an attacker can still sit in the middle and run two separate handshakes.&lt;/p&gt;

&lt;p&gt;That is the next lesson.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Still Not Secure — The Man-in-the-Middle Problem&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At this point, our protocol already looks much more serious than the one we started with.&lt;/p&gt;

&lt;p&gt;We now have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a real handshake&lt;/li&gt;
&lt;li&gt;fresh shared secrets&lt;/li&gt;
&lt;li&gt;X25519 instead of a pre-shared application key&lt;/li&gt;
&lt;li&gt;HKDF-derived session keys&lt;/li&gt;
&lt;li&gt;AEAD-protected application records&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a long way from the fake secure channel in Part 1.&lt;/p&gt;

&lt;p&gt;But it is still not enough.&lt;/p&gt;

&lt;p&gt;The missing piece is one of the most important ideas in this whole series:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;key exchange is not authentication&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That sentence is easy to read quickly and move on from. But it is worth stopping here, because this is exactly where many protocols fail.&lt;/p&gt;

&lt;p&gt;Our handshake proves that both sides can derive the same shared secret.&lt;/p&gt;

&lt;p&gt;What it does &lt;strong&gt;not&lt;/strong&gt; prove is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;who&lt;/strong&gt; is actually on the other side.&lt;/p&gt;

&lt;p&gt;And that difference is the whole problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The attack&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Imagine an active attacker sitting between the client and the server.&lt;/p&gt;

&lt;p&gt;Let’s call her Mallory.&lt;/p&gt;

&lt;p&gt;The client thinks it is talking to the server.&lt;/p&gt;

&lt;p&gt;The server thinks it is talking to the client.&lt;/p&gt;

&lt;p&gt;But Mallory intercepts the handshake and replaces the exchanged public keys with her own.&lt;/p&gt;

&lt;p&gt;In simplified form, the flow looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiq3eu6p0rk0j2mynzq9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiq3eu6p0rk0j2mynzq9z.png" alt="attack schema"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And now something very important happens.&lt;/p&gt;

&lt;p&gt;The handshake still “works.”&lt;/p&gt;

&lt;p&gt;But it works in the wrong way.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;strong&gt;client&lt;/strong&gt; ends up with a shared secret with &lt;strong&gt;Mallory&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;the &lt;strong&gt;server&lt;/strong&gt; ends up with a different shared secret with &lt;strong&gt;Mallory&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;and &lt;strong&gt;Mallory&lt;/strong&gt; now has one valid secure channel to each side&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From the point of view of the client and the server, everything looks normal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;key exchange succeeded&lt;/li&gt;
&lt;li&gt;keys were derived&lt;/li&gt;
&lt;li&gt;encrypted records verify correctly&lt;/li&gt;
&lt;li&gt;AEAD tags are valid&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And yet the protocol has already failed.&lt;/p&gt;

&lt;p&gt;Because Mallory can now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;decrypt the client’s traffic&lt;/li&gt;
&lt;li&gt;read it or modify it&lt;/li&gt;
&lt;li&gt;re-encrypt it toward the server&lt;/li&gt;
&lt;li&gt;receive the server’s response&lt;/li&gt;
&lt;li&gt;read it or modify it&lt;/li&gt;
&lt;li&gt;re-encrypt it back toward the client&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Neither side can detect this.&lt;/p&gt;
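&lt;p&gt;The attack fits in a few lines. This toy uses classic modular Diffie-Hellman in place of X25519, with a Mersenne prime modulus purely for convenience; everything here is illustrative, not secure code.&lt;/p&gt;

```python
# Toy demonstration: an unauthenticated key exchange lets Mallory
# run one valid handshake with each side. Parameters are illustrative.
import secrets

P = 2 ** 521 - 1   # a Mersenne prime; fine for a toy demo, not for real use
G = 5

def keypair():
    priv = secrets.randbelow(P - 2) + 2
    return priv, pow(G, priv, P)

# Honest parties generate their key pairs...
client_priv, client_pub = keypair()
server_priv, server_pub = keypair()
# ...but Mallory intercepts the handshake and substitutes her own key.
mallory_priv, mallory_pub = keypair()

# The client computes a "shared" secret, but with Mallory, not the server.
client_secret = pow(mallory_pub, client_priv, P)
# The server does the same on its side.
server_secret = pow(mallory_pub, server_priv, P)
# Mallory can derive both, so she now bridges two valid channels.
mallory_client_side = pow(client_pub, mallory_priv, P)
mallory_server_side = pow(server_pub, mallory_priv, P)
```

&lt;p&gt;Both handshakes succeed, both sides derive working keys, every AEAD tag verifies, and Mallory holds both secrets. Nothing in the math detects the substitution.&lt;/p&gt;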

&lt;h3&gt;
  
  
  &lt;strong&gt;In The Next Part — Building the Certificate Infrastructure&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The handshake only proves one thing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I computed a shared secret with whoever sent me this public key.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; prove:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This public key came from the server I actually intended to talk to.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the missing half.&lt;/p&gt;

&lt;p&gt;To fix this, the client needs a way to verify that the public key it receives during the handshake actually belongs to the server it wanted to talk to.&lt;/p&gt;

&lt;p&gt;That is where the next layer enters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;certificates&lt;/li&gt;
&lt;li&gt;signatures&lt;/li&gt;
&lt;li&gt;trust chains&lt;/li&gt;
&lt;li&gt;certificate authorities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, this is where the protocol must stop proving only that “someone” is there and start proving &lt;strong&gt;who&lt;/strong&gt; that someone is.&lt;/p&gt;

&lt;p&gt;That is exactly what the next article will build.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;Our protocol now has secrecy against passive observers.&lt;/p&gt;

&lt;p&gt;It has integrity for protected records.&lt;/p&gt;

&lt;p&gt;It has fresh session keys.&lt;/p&gt;

&lt;p&gt;But it still does not have &lt;strong&gt;identity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And without identity, a correct shared secret with the wrong party is still a protocol failure.&lt;/p&gt;

&lt;p&gt;That is the deeper lesson of Part 3.&lt;/p&gt;

&lt;p&gt;Part 1 taught us:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;confidentiality is not integrity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Part 2 taught us:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;protecting records is not the same thing as establishing trust&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And now Part 3 adds the next lesson:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;key exchange is not authentication&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We will solve that in the next article!&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Final Code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The full code for this part is available here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/DmytroHuzz/rebuilding_tls/tree/main/part_3" rel="noopener noreferrer"&gt;https://github.com/DmytroHuzz/rebuilding_tls/tree/main/part_3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>learning</category>
      <category>development</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>Amazon Aurora DSQL: A Practical Guide to AWS's Distributed SQL Database</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Sun, 19 Apr 2026 16:23:59 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/amazon-aurora-dsql-a-practical-guide-to-awss-distributed-sql-database-2n58</link>
      <guid>https://future.forem.com/aws-builders/amazon-aurora-dsql-a-practical-guide-to-awss-distributed-sql-database-2n58</guid>
      <description>&lt;p&gt;&lt;em&gt;Architecture, features, Terraform setup, and real application code - April 2026&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;When AWS announced Aurora DSQL at re:Invent 2024, I was very interested. We had heard promises about distributed SQL databases before and I really wanted to try it out. I experimented with it locally for a while and then built the &lt;a href="https://darryl-ruggles.cloud/dsql-kabob-store/" rel="noopener noreferrer"&gt;Kabob Store&lt;/a&gt; example on it. Fifteen months later, DSQL has gone from preview to general availability, expanded to 14 regions, and shipped a steady stream of features. It fills the gap between DynamoDB's serverless economics and Aurora PostgreSQL's SQL power - and it does it well.&lt;/p&gt;

&lt;p&gt;This is my comprehensive look at where DSQL stands in April 2026: what it does, what it doesn't do yet, how to set it up with Terraform, and practical application code you can use today.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Aurora DSQL?
&lt;/h2&gt;

&lt;p&gt;For years, the database decision on AWS looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need serverless economics?&lt;/strong&gt; DynamoDB. But learn single-table design and give up SQL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need SQL?&lt;/strong&gt; RDS or Aurora PostgreSQL. But accept always-on costs, instance sizing, and 10-15 minute provisioning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need multi-Region?&lt;/strong&gt; DynamoDB Global Tables. SQL wasn't an option without manual replication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aurora DSQL eliminates the tradeoff. Four things make it different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless to zero&lt;/strong&gt; - No instances, no capacity planning. Zero DPU charges when idle. Provisions in under 60 seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL compatible&lt;/strong&gt; - Based on PostgreSQL 16. Use psql, psycopg2, pgx, JDBC - the drivers you already know.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strongly consistent&lt;/strong&gt; - Not eventually consistent. Snapshot isolation with linearizability. Readers always see committed data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Active-active multi-Region&lt;/strong&gt; - Two full regions with concurrent reads and writes. No leader, no failover, no replication lag on commit.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is Aurora DSQL?
&lt;/h2&gt;

&lt;p&gt;Aurora DSQL is a serverless, distributed SQL database that disaggregates every component of a traditional database engine. Unlike Aurora PostgreSQL (which separates storage from compute but keeps them coupled), DSQL breaks the database into six independently scaling components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Query Processors (QPs)&lt;/strong&gt; - Run customized PostgreSQL engines inside Firecracker MicroVMs. Handle SQL parsing, planning, and execution. Scale independently based on query load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adjudicators&lt;/strong&gt; - Validate transactions at COMMIT time using Optimistic Concurrency Control (OCC). Stateless and reconstructible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Journal&lt;/strong&gt; - A Paxos-based distributed transaction log (same technology as MemoryDB). Provides cross-AZ and cross-Region durability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crossbar&lt;/strong&gt; - Merges journal streams and publishes committed writes to storage replicas. Sits between the Journal and Storage layers, ensuring all storage replicas receive the same ordered stream of committed transactions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt; - MVCC storage replicas distributed across 3 AZs. Consume committed entries from the Crossbar. Scale independently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control Plane&lt;/strong&gt; - Coordinates all components, handles cluster lifecycle and scaling.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Note: The&lt;/em&gt; &lt;a href="https://docs.aws.amazon.com/aurora-dsql/latest/userguide/what-is-aurora-dsql.html" rel="noopener noreferrer"&gt;official AWS User Guide&lt;/a&gt; &lt;em&gt;describes these layers as "Relay and connectivity, Compute and databases, Transaction log/concurrency control/isolation, Storage, and Control plane." The component names used here (Query Processors, Adjudicators, Journal, Crossbar) come from&lt;/em&gt; &lt;a href="https://brooker.co.za/blog/2024/12/03/aurora-dsql.html" rel="noopener noreferrer"&gt;Marc Brooker's architecture deep-dive series&lt;/a&gt; &lt;em&gt;and the&lt;/em&gt; &lt;a href="https://aws.amazon.com/blogs/database/introducing-amazon-aurora-dsql/" rel="noopener noreferrer"&gt;AWS Database Blog&lt;/a&gt;&lt;em&gt;, which provide more implementation detail.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7lx19wkxwxif02puesbh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7lx19wkxwxif02puesbh.png" alt="Aurora DSQL Single Region Architecture" width="800" height="1558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key design achievement, as Marc Brooker (VP/Distinguished Engineer at AWS) explained in his &lt;a href="https://brooker.co.za/blog/2024/12/03/aurora-dsql.html" rel="noopener noreferrer"&gt;DSQL blog series&lt;/a&gt;, is that cross-region latency is incurred &lt;strong&gt;only at COMMIT time&lt;/strong&gt;, not per-statement. During a transaction, reads and writes execute locally on the Query Processor. Only when you commit does the system coordinate with the Adjudicator and Journal for conflict detection and durability. Read-only transactions need no validation, no persistence, and no cross-region coordination at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Concepts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Optimistic Concurrency Control (OCC)&lt;/strong&gt; - DSQL doesn't use locks. Transactions proceed without blocking each other. At COMMIT, the Adjudicator checks for write-write conflicts. If two transactions modified the same rows, one succeeds and the other gets a serialization failure (SQLSTATE 40001). Your application retries the failed transaction. No deadlocks, ever.&lt;/p&gt;
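&lt;p&gt;In application code, this means wrapping commits in a retry loop. The sketch below is generic Python, not DSQL driver code; &lt;code&gt;SerializationFailure&lt;/code&gt; and &lt;code&gt;flaky_commit&lt;/code&gt; are illustrative stand-ins for a driver exception carrying SQLSTATE 40001 and your transaction function.&lt;/p&gt;

```python
# Hedged sketch of retrying OCC serialization failures (SQLSTATE 40001).
# The exception class and transaction function are illustrative.
import random
import time

class SerializationFailure(Exception):
    sqlstate = "40001"

def with_occ_retry(txn_fn, max_attempts=5, base_delay=0.05):
    # Run txn_fn; on a serialization failure, back off and retry.
    for attempt in range(max_attempts):
        try:
            return txn_fn()
        except SerializationFailure:
            # Exponential backoff with jitter before retrying.
            time.sleep(base_delay * (2 ** attempt) * random.random())
    raise RuntimeError("transaction kept conflicting; giving up")

# Simulate a transaction whose first two commits hit a write-write conflict.
attempts = {"n": 0}
def flaky_commit():
    attempts["n"] += 1
    if attempts["n"] in (1, 2):   # first two tries conflict
        raise SerializationFailure()
    return "committed"

result = with_occ_retry(flaky_commit)
```

&lt;p&gt;The key point is that the retry re-runs the whole transaction, not just the commit: the new attempt gets a fresh snapshot, so it sees the rows the winning transaction wrote.&lt;/p&gt;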

&lt;p&gt;&lt;strong&gt;Snapshot Isolation&lt;/strong&gt; - Each transaction sees a consistent snapshot of the database as of its start time (tau_start). All reads within a transaction see the same data, regardless of concurrent commits by other transactions. Equivalent to PostgreSQL's REPEATABLE READ.&lt;/p&gt;
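&lt;p&gt;A toy model makes the snapshot rule concrete: a reader pinned to its start timestamp keeps seeing the same versions even while later commits land. This is purely illustrative and says nothing about DSQL's internal implementation.&lt;/p&gt;

```python
# Toy MVCC model: each read returns the newest version committed at or
# before the transaction's start timestamp (tau_start). Illustrative only.
import bisect

class VersionedStore:
    def __init__(self):
        self.versions = {}   # key: ([commit_ts, ...], [value, ...]), sorted
        self.clock = 0

    def commit(self, key, value):
        self.clock += 1
        ts_list, val_list = self.versions.setdefault(key, ([], []))
        ts_list.append(self.clock)
        val_list.append(value)
        return self.clock

    def read(self, key, tau_start):
        # Newest version committed at or before tau_start, else None.
        ts_list, val_list = self.versions.get(key, ([], []))
        i = bisect.bisect_right(ts_list, tau_start)
        return val_list[i - 1] if i else None

store = VersionedStore()
store.commit("balance", 100)
tau_start = store.clock            # transaction T begins here
store.commit("balance", 250)       # a concurrent commit after T started
seen_by_t = store.read("balance", tau_start)   # T still sees 100
latest = store.read("balance", store.clock)    # a new transaction sees 250
```

&lt;p&gt;Every read inside T uses the same &lt;code&gt;tau_start&lt;/code&gt;, so T's view never changes mid-transaction, exactly the REPEATABLE READ behavior described above.&lt;/p&gt;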

&lt;p&gt;&lt;strong&gt;IAM Authentication&lt;/strong&gt; - No database passwords. Period. Applications generate tokens using &lt;code&gt;generate_db_connect_auth_token&lt;/code&gt; (for runtime DML) or &lt;code&gt;generate_db_connect_admin_auth_token&lt;/code&gt; (for schema migrations only). Integrates with IAM roles, so your ECS tasks and Lambda functions authenticate using their execution role. Tokens default to 15 minutes but can be configured up to one week using the &lt;code&gt;token-duration-secs&lt;/code&gt; parameter in the connectors and CLI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Asynchronous Indexes&lt;/strong&gt; - DSQL requires &lt;code&gt;CREATE INDEX ASYNC&lt;/code&gt; (synchronous &lt;code&gt;CREATE INDEX&lt;/code&gt; is not supported). The index builds asynchronously while transactions continue. You can monitor build progress through system catalog queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single DDL Per Transaction&lt;/strong&gt; - Each &lt;code&gt;CREATE TABLE&lt;/code&gt;, &lt;code&gt;ALTER TABLE&lt;/code&gt;, or &lt;code&gt;CREATE INDEX&lt;/code&gt; statement needs its own transaction with an explicit commit before the next DDL statement.&lt;/p&gt;
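&lt;p&gt;This rule is easy to trip over when porting migration scripts. Here is a minimal sketch of the pattern using Python's DB-API, with sqlite3 standing in for a DSQL connection; against a real cluster you would use a PostgreSQL driver and &lt;code&gt;CREATE INDEX ASYNC&lt;/code&gt; instead of &lt;code&gt;CREATE INDEX&lt;/code&gt;.&lt;/p&gt;

```python
# The "one DDL statement per transaction" pattern, sketched with the
# DB-API. sqlite3 stands in for a DSQL connection here; on DSQL the
# index statement would be CREATE INDEX ASYNC.
import sqlite3

def run_ddl_each_in_own_txn(conn, statements):
    # DSQL rejects multiple DDL statements in one transaction, so commit
    # explicitly after every statement before issuing the next one.
    for ddl in statements:
        conn.execute(ddl)
        conn.commit()

conn = sqlite3.connect(":memory:")
run_ddl_each_in_own_txn(conn, [
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)",
    "CREATE INDEX idx_orders_customer ON orders (customer)",
])
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
```

&lt;p&gt;Batching several DDL statements into one transaction is the single most common migration-script failure on DSQL, so migration tooling (Flyway, Prisma) needs each statement in its own step.&lt;/p&gt;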




&lt;h2&gt;
  
  
  Feature Timeline: From Preview to Production
&lt;/h2&gt;

&lt;p&gt;DSQL has shipped features at a steady pace since launch. Here's what has been added:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;February 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;DSQL Playground&lt;/strong&gt; (browser-based, no AWS account needed), sequences and identity columns, Go/Ruby/Python (asyncpg)/Node.js (WebSocket) connectors, numeric index support, AI steering (Kiro Powers, Claude/Gemini/Codex Skills), DBeaver plugin, SQLTools VS Code driver, Tortoise ORM adapter, Flyway dialect, Prisma CLI tools, expanded to 14 regions (added Canada, Sydney, Melbourne)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;December 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cluster lifecycle management, enhanced PrivateLink (Direct Connect + VPC peering), PostgreSQL migration guide&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;November 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Query Editor in console, JupyterLab integration, Python and Node.js connectors, storage quota increased to 256 TiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;October 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Resource-based policies for fine-grained cluster access control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;September 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JDBC connector for Java applications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;August 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Fault Injection Service (FIS) integration for chaos testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;May 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;General Availability&lt;/strong&gt; - CloudWatch monitoring, AWS Backup, KMS CMK encryption, CloudFormation support, PrivateLink, Views&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;December 2024&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Preview launch at re:Invent (3 US regions)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Region Availability (April 2026)
&lt;/h3&gt;

&lt;p&gt;DSQL is now available in &lt;strong&gt;14 regions&lt;/strong&gt; across 4 continents:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Continent&lt;/th&gt;
&lt;th&gt;Regions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;North America&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;us-east-1 (Virginia), us-east-2 (Ohio), us-west-2 (Oregon), ca-central-1 (Montreal), ca-west-1 (Calgary)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Europe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;eu-central-1 (Frankfurt), eu-west-1 (Ireland), eu-west-2 (London), eu-west-3 (Paris)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Asia Pacific&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ap-northeast-1 (Tokyo), ap-northeast-2 (Seoul), ap-northeast-3 (Osaka), ap-southeast-2 (Sydney), ap-southeast-4 (Melbourne)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Multi-Region Cluster Sets
&lt;/h3&gt;

&lt;p&gt;Multi-Region clusters must stay within one geographic set:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;US&lt;/strong&gt;: us-east-1, us-east-2, us-west-2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Europe&lt;/strong&gt;: eu-central-1, eu-west-1, eu-west-2, eu-west-3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asia Pacific&lt;/strong&gt;: ap-northeast-1, ap-northeast-2, ap-northeast-3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Canada (ca-central-1, ca-west-1), Sydney (ap-southeast-2), and Melbourne (ap-southeast-4) are available as single-region clusters only and are not part of any multi-Region set. This is a common gotcha for customers in those regions.&lt;/p&gt;

&lt;p&gt;Cross-continent multi-Region clusters are not supported. For global data sync across continents, DynamoDB Global Tables remain the go-to option.&lt;/p&gt;




&lt;h2&gt;
  
  
  DSQL vs the Alternatives
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Aurora DSQL&lt;/th&gt;
&lt;th&gt;Aurora PostgreSQL Serverless v2&lt;/th&gt;
&lt;th&gt;DynamoDB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PostgreSQL SQL&lt;/td&gt;
&lt;td&gt;PostgreSQL SQL&lt;/td&gt;
&lt;td&gt;PartiQL / NoSQL API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Provisioning time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Under 60 seconds&lt;/td&gt;
&lt;td&gt;10-15 minutes&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scales to zero&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (no DPU charges)&lt;/td&gt;
&lt;td&gt;Yes (0 ACU with auto-pause, ~15s cold start)&lt;/td&gt;
&lt;td&gt;Yes (on-demand mode)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-Region&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Active-active, strong consistency&lt;/td&gt;
&lt;td&gt;Read replicas, eventual&lt;/td&gt;
&lt;td&gt;Global Tables, eventual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Availability SLA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;99.99% / 99.999% multi-Region&lt;/td&gt;
&lt;td&gt;99.99%&lt;/td&gt;
&lt;td&gt;99.99% / 99.999% global&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IAM only (no passwords)&lt;/td&gt;
&lt;td&gt;IAM or passwords&lt;/td&gt;
&lt;td&gt;IAM or passwords&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Foreign keys&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not yet&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (NoSQL)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stored procedures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not yet&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;256 TiB&lt;/td&gt;
&lt;td&gt;128 TiB&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transaction limits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3,000 rows, 10 MiB, 5 min&lt;/td&gt;
&lt;td&gt;Practical limits (memory, storage, lock timeouts)&lt;/td&gt;
&lt;td&gt;100 items, 4 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pricing model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per DPU ($8/million)&lt;/td&gt;
&lt;td&gt;Per ACU-hour ($0.12+)&lt;/td&gt;
&lt;td&gt;Per RRU/WRU or provisioned&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use What
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Choose DSQL when&lt;/strong&gt; you need SQL with serverless economics, multi-Region strong consistency, or you're building new applications that benefit from zero infrastructure management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Aurora PostgreSQL when&lt;/strong&gt; you need foreign keys, stored procedures, triggers, pgvector for AI embeddings, or you're running an existing PostgreSQL application that uses unsupported features. Aurora Serverless v2 now scales to 0 ACUs with auto-pause (since November 2024), so it also offers scale-to-zero economics - with the tradeoff of a ~15-second cold start on resume.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose DynamoDB when&lt;/strong&gt; your data model fits key-value or document patterns naturally, you need sub-millisecond latency, cross-continent global replication, or unlimited throughput scaling.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting Up DSQL with Terraform
&lt;/h2&gt;

&lt;p&gt;All the Terraform code below uses Terraform &amp;gt;= 1.11 and the AWS provider ~&amp;gt; 6.0. The &lt;code&gt;terraform-aws-modules/rds-aurora&lt;/code&gt; DSQL submodule requires Terraform &amp;gt;= 1.11 and provider &amp;gt;= 6.18. The complete examples are in the &lt;a href="https://github.com/RDarrylR/aurora-dsql-2026" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Single-Region Cluster
&lt;/h3&gt;

&lt;p&gt;This is the simplest setup. One resource, 60 seconds to provision, automatically distributed across 3 AZs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;required_version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;gt;= 1.11"&lt;/span&gt;

  &lt;span class="nx"&gt;required_providers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/aws"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 6.0"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_region"&lt;/span&gt; &lt;span class="s2"&gt;"current"&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_dsql_cluster"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;deletion_protection_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

  &lt;span class="c1"&gt;# For production, enable deletion protection and add a KMS CMK:&lt;/span&gt;
  &lt;span class="c1"&gt;# deletion_protection_enabled = true&lt;/span&gt;
  &lt;span class="c1"&gt;# kms_encryption_key          = aws_kms_key.dsql.arn&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-app-dsql"&lt;/span&gt;
    &lt;span class="nx"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# DSQL has no "endpoint" attribute - construct it from the identifier&lt;/span&gt;
&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"dsql_endpoint"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${aws_dsql_cluster.main.identifier}.dsql.${data.aws_region.current.id}.on.aws"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"dsql_arn"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_dsql_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No instance class, no storage allocation, no replica configuration. One resource gives you a PostgreSQL-compatible database with 99.99% availability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Region Cluster with Terraform Module
&lt;/h3&gt;

&lt;p&gt;For production workloads requiring 99.999% availability, use multi-Region clusters. The official &lt;code&gt;terraform-aws-modules/rds-aurora&lt;/code&gt; module includes a DSQL submodule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;alias&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"secondary"&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-2"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"dsql_primary"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/rds-aurora/aws//modules/dsql"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 10.0"&lt;/span&gt;

  &lt;span class="nx"&gt;deletion_protection_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;witness_region&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-west-2"&lt;/span&gt;
  &lt;span class="nx"&gt;create_cluster_peering&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;clusters&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsql_secondary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-app-dsql-primary"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"dsql_secondary"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/rds-aurora/aws//modules/dsql"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 10.0"&lt;/span&gt;

  &lt;span class="nx"&gt;providers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;secondary&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;deletion_protection_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;witness_region&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-west-2"&lt;/span&gt;
  &lt;span class="nx"&gt;create_cluster_peering&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;clusters&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsql_primary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-app-dsql-secondary"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The module handles cluster peering automatically. One &lt;code&gt;terraform apply&lt;/code&gt; creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Primary cluster in us-east-1 with full read/write endpoint&lt;/li&gt;
&lt;li&gt;Secondary cluster in us-east-2 with full read/write endpoint&lt;/li&gt;
&lt;li&gt;Witness region in us-west-2 that stores journal data for quorum only (no endpoint, no user access)&lt;/li&gt;
&lt;li&gt;Bidirectional peering with synchronous replication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both endpoints present a single logical database. Your application can read and write through either endpoint and gets strong consistency across both regions, with zero replication lag on commit.&lt;/p&gt;

&lt;p&gt;If you prefer using the native &lt;code&gt;aws_dsql_cluster&lt;/code&gt; resource directly instead of the module, the multi-Region interface uses &lt;code&gt;multi_region_properties&lt;/code&gt; with &lt;code&gt;witness_region&lt;/code&gt; - see the commented-out Option B in the &lt;a href="https://github.com/RDarrylR/aurora-dsql-2026/blob/main/terraform/dsql-multi-region.tf" rel="noopener noreferrer"&gt;dsql-multi-region.tf&lt;/a&gt; example. Also note that AWS provider 6.x introduced per-resource &lt;code&gt;region&lt;/code&gt; attributes, which can eliminate the need for provider aliases in some configurations.&lt;/p&gt;

&lt;h3&gt;
  
  
  IAM Authentication Policy
&lt;/h3&gt;

&lt;p&gt;DSQL uses two IAM permission levels. Use the right one for each role:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;dsql:DbConnect&lt;/code&gt; - Generates tokens for connecting with custom database roles. &lt;strong&gt;Use this for application runtime.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dsql:DbConnectAdmin&lt;/code&gt; - Generates tokens for connecting as the &lt;code&gt;admin&lt;/code&gt; database user (full DDL + DML). &lt;strong&gt;Use this only for schema migrations and admin tasks.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that the DDL/DML restriction is enforced at the database role level, not the IAM layer. &lt;code&gt;DbConnect&lt;/code&gt; generates a token that can only authenticate as a custom role (not &lt;code&gt;admin&lt;/code&gt;), and custom roles only have the permissions you grant them. &lt;code&gt;DbConnectAdmin&lt;/code&gt; generates a token that authenticates as &lt;code&gt;admin&lt;/code&gt;, which has full privileges. AWS's security best practices are clear: don't use the admin role for everyday operations. Create separate IAM roles and custom database roles for application access.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Application runtime policy - DML only (least privilege)&lt;/span&gt;
&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"dsql_app_runtime"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dsql:DbConnect"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;resources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_dsql_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Admin/migration policy - DDL + DML (for CI/CD pipelines, not app runtime)&lt;/span&gt;
&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"dsql_admin"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dsql:DbConnectAdmin"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;resources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_dsql_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ECS task role for application runtime - uses DbConnect, NOT DbConnectAdmin&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"app_task"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-app-task-role"&lt;/span&gt;

  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ecs-tasks.amazonaws.com"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"app_dsql"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dsql-runtime-access"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;app_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsql_app_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Separate role for schema migrations (CI/CD pipeline, not the running app)&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"migration_role"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-app-migration-role"&lt;/span&gt;

  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ecs-tasks.amazonaws.com"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"migration_dsql"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dsql-admin-access"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;migration_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsql_admin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Always scope DSQL permissions to the specific cluster ARN. No wildcard resources. Your running application should never have &lt;code&gt;DbConnectAdmin&lt;/code&gt; - reserve that for migration tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Custom Database Role (Least Privilege at the Database Layer)
&lt;/h3&gt;

&lt;p&gt;IAM controls which token type you can generate, but you should also avoid connecting as &lt;code&gt;admin&lt;/code&gt; for everyday operations. Create a custom database role and map it to an IAM identity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Connect as admin (one-time setup via DbConnectAdmin)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;app_role&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;LOGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Map the IAM role ARN to the custom database role (DSQL-specific syntax)&lt;/span&gt;
&lt;span class="n"&gt;AWS&lt;/span&gt; &lt;span class="n"&gt;IAM&lt;/span&gt; &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="n"&gt;app_role&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="s1"&gt;'arn:aws:iam::123456789012:role/my-app-task-role'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- To revoke later:&lt;/span&gt;
&lt;span class="c1"&gt;-- AWS IAM REVOKE app_role FROM 'arn:aws:iam::123456789012:role/my-app-task-role';&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then connect as the custom role in your application code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cluster_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app_role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Custom role, not admin
&lt;/span&gt;    &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Token from generate_db_connect_auth_token
&lt;/span&gt;    &lt;span class="n"&gt;sslmode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;require&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This completes the least-privilege story at both layers: IAM controls token generation (&lt;code&gt;DbConnect&lt;/code&gt; vs &lt;code&gt;DbConnectAdmin&lt;/code&gt;), and the database role controls what SQL the connection can execute.&lt;/p&gt;
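&lt;p&gt;As a sketch of how the two layers meet in application code, a small hypothetical helper can pick the matching token method for the database user you intend to connect as (the &lt;code&gt;generate_token&lt;/code&gt; wrapper and its dispatch logic are illustrative; the two &lt;code&gt;generate_db_connect_*&lt;/code&gt; calls are the boto3 DSQL client methods used elsewhere in this article):&lt;br&gt;
&lt;/p&gt;

```python
# Hypothetical helper: the token method must match the database user.
# Connecting as "admin" requires an admin token; a custom role such as
# app_role requires a regular DbConnect token - mixing them up fails auth.
def generate_token(client, cluster_endpoint, region, user):
    if user == "admin":
        # Requires dsql:DbConnectAdmin on the cluster ARN
        return client.generate_db_connect_admin_auth_token(cluster_endpoint, region)
    # Requires only dsql:DbConnect on the cluster ARN
    return client.generate_db_connect_auth_token(cluster_endpoint, region)
```

&lt;p&gt;In production, the application's task role should only ever reach the non-admin branch - its IAM policy doesn't allow &lt;code&gt;DbConnectAdmin&lt;/code&gt; at all.&lt;/p&gt;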

&lt;h3&gt;
  
  
  PrivateLink (Production)
&lt;/h3&gt;

&lt;p&gt;For production workloads, keep database traffic off the public internet using VPC endpoints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"dsql"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_dsql_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_endpoint_service_name&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_endpoint_type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Interface"&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private&lt;/span&gt;&lt;span class="p"&gt;[*].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;security_group_ids&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsql_endpoint&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;private_dns_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"dsql_endpoint"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name_prefix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-dsql-endpoint-"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5432&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5432&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;security_groups&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;egress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"-1"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow all outbound (required for VPC endpoint communication)"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;private_dns_enabled = true&lt;/code&gt;, your application connects using the same cluster endpoint - no code changes needed. For connections from on-premises via Direct Connect without private DNS, use the &lt;code&gt;amzn-cluster-id&lt;/code&gt; connection parameter.&lt;/p&gt;




&lt;h2&gt;
  
  
  Application Code: Python
&lt;/h2&gt;

&lt;p&gt;The examples below use Python 3.13+ with psycopg2 2.9.11 and boto3. The full example is in &lt;a href="https://github.com/RDarrylR/aurora-dsql-2026/blob/main/python/dsql_connection.py" rel="noopener noreferrer"&gt;dsql_connection.py&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection with IAM Auth
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;psycopg2.extras&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RealDictCursor&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dsql&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cluster_endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cluster_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.dsql.us-east-1.on.aws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Token method must match the user:
# - user="admin" -&amp;gt; generate_db_connect_admin_auth_token (DDL + DML)
# - custom role  -&amp;gt; generate_db_connect_auth_token (DML only)
&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_db_connect_admin_auth_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cluster_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# For production, use a custom database role - see "Custom Database Role" section
&lt;/span&gt;    &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sslmode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;require&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cursor_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RealDictCursor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the official connector (&lt;code&gt;pip install aurora-dsql-python-connector&lt;/code&gt;, v0.2.6+) which handles token refresh automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aurora_dsql_python_connector&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;connect&lt;/span&gt;

&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cluster_endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;abc123.dsql.us-east-1.on.aws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The OCC Retry Pattern
&lt;/h3&gt;

&lt;p&gt;This is the most important pattern for DSQL applications. Since DSQL uses Optimistic Concurrency Control instead of locks, write transactions can fail at COMMIT when concurrent modifications conflict:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2.errors&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;with_occ_retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base_delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retry wrapper for OCC conflicts (SQLSTATE 40001).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SerializationFailure&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt;
            &lt;span class="n"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_delay&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_do_insert&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            INSERT INTO orders (customer_email, items, total_amount)
            VALUES (%s, %s, %s)
            RETURNING *
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;with_occ_retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_do_insert&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key points about OCC:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read-only transactions never conflict - they don't need retry logic&lt;/li&gt;
&lt;li&gt;OCC conflicts are SQLSTATE 40001 (serialization_failure)&lt;/li&gt;
&lt;li&gt;Use exponential backoff to avoid retry storms&lt;/li&gt;
&lt;li&gt;Design transactions to be small and fast to minimize conflict windows&lt;/li&gt;
&lt;li&gt;Avoid hot-spot writes (e.g., incrementing a single counter row from many threads)&lt;/li&gt;
&lt;/ul&gt;
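
&lt;p&gt;The hot-spot point is worth a concrete sketch. Instead of incrementing one row from many writers, spread increments across a handful of shard rows and sum them on read. This is a generic pattern rather than anything DSQL-specific - the &lt;code&gt;counters&lt;/code&gt; table and helper names below are hypothetical:&lt;/p&gt;

```python
import random

NUM_SHARDS = 16  # more shards = fewer concurrent writers touching the same row

def increment_counter(cur, name):
    """Bump one randomly chosen shard row (assumes shard rows are pre-created)."""
    shard = random.randrange(NUM_SHARDS)
    cur.execute(
        "UPDATE counters SET value = value + 1 WHERE name = %s AND shard = %s",
        (name, shard),
    )

def read_counter(cur, name):
    """Read-only: sum the shards. No OCC retry wrapper needed here."""
    cur.execute(
        "SELECT COALESCE(SUM(value), 0) FROM counters WHERE name = %s",
        (name,),
    )
    return cur.fetchone()[0]
```

&lt;p&gt;More shards means fewer write-write collisions on the same row, at the cost of a slightly heavier read.&lt;/p&gt;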

&lt;h3&gt;
  
  
  Schema Setup with DDL Limits
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_tables&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# One DDL per transaction - commit before next DDL
&lt;/span&gt;    &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        CREATE TABLE IF NOT EXISTS products (
            id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
            name VARCHAR(200) NOT NULL,
            price NUMERIC(10, 2) NOT NULL,
            category VARCHAR(50),
            created_at TIMESTAMPTZ DEFAULT now()
        )
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Must commit before next DDL
&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        CREATE TABLE IF NOT EXISTS orders (
            id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
            customer_email VARCHAR(255) NOT NULL,
            items TEXT NOT NULL,
            total_amount NUMERIC(10, 2) NOT NULL,
            status VARCHAR(20) DEFAULT &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,
            created_at TIMESTAMPTZ DEFAULT now()
        )
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Separate transaction for each DDL
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Sequences and Identity Columns (New - February 2026)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Using identity columns for auto-incrementing IDs
&lt;/span&gt;&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    CREATE TABLE IF NOT EXISTS audit_log (
        id BIGINT GENERATED ALWAYS AS IDENTITY (CACHE 65536) PRIMARY KEY,
        event_type VARCHAR(50) NOT NULL,
        payload TEXT,
        created_at TIMESTAMPTZ DEFAULT now()
    )
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Or use sequences directly
&lt;/span&gt;&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CREATE SEQUENCE IF NOT EXISTS invoice_seq START 1000 CACHE 65536&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT nextval(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;invoice_seq&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;next_invoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nextval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Application Code: Node.js
&lt;/h2&gt;

&lt;p&gt;The examples below use Node.js 24.x LTS with &lt;code&gt;@aws-sdk/dsql-signer&lt;/code&gt; and &lt;code&gt;pg&lt;/code&gt; 8.20+. The full example is in &lt;a href="https://github.com/RDarrylR/aurora-dsql-2026/blob/main/nodejs/dsql-connection.mjs" rel="noopener noreferrer"&gt;dsql-connection.mjs&lt;/a&gt;. You can also use the official connector &lt;code&gt;@aws/aurora-dsql-node-postgres-connector&lt;/code&gt; (v0.1.8+) which wraps &lt;code&gt;pg&lt;/code&gt; with automatic IAM auth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection with AWS SDK Signer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DsqlSigner&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/dsql-signer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;pg&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;signer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DsqlSigner&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;abc123.dsql.us-east-1.on.aws&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;us-east-1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Token method matches the user:&lt;/span&gt;
&lt;span class="c1"&gt;// - "admin" -&amp;gt; getDbConnectAdminAuthToken (DDL + DML)&lt;/span&gt;
&lt;span class="c1"&gt;// - custom role -&amp;gt; getDbConnectAuthToken (DML only)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;signer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDbConnectAdminAuthToken&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;pg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;abc123.dsql.us-east-1.on.aws&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5432&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;postgres&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;admin&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  OCC Retry in Node.js
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;withOccRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;txnFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;BEGIN&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;txnFn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;COMMIT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ROLLBACK&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;40001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;withOccRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`INSERT INTO orders (customer_email, items, total_amount)
     VALUES ($1, $2, $3) RETURNING *`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Multi-Region Application Architecture
&lt;/h2&gt;

&lt;p&gt;For applications that need 99.999% availability and low-latency reads from multiple regions, deploy your application stack in each DSQL region with Route 53 latency-based routing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdev6rp5g2tcuq34ks6m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdev6rp5g2tcuq34ks6m.png" alt="Aurora DSQL Multi-Region Application Architecture" width="800" height="589"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Route 53 latency-based routing&lt;/strong&gt; to direct users to the nearest region&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudFront&lt;/strong&gt; for static asset caching and edge termination&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ECS Fargate&lt;/strong&gt; running application containers in each region&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aurora DSQL&lt;/strong&gt; with active-active clusters in both regions and a witness for quorum&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both DSQL endpoints present a single logical database. East Coast users connect to us-east-1 and West Coast users to us-east-2 (or us-west-2 where it is available as a full endpoint), both reading and writing the same strongly consistent data. The witness region in us-west-2 stores only encrypted Journal entries for quorum and exposes no user endpoint.&lt;/p&gt;

&lt;p&gt;This is conceptually similar to DynamoDB Global Tables, but with full PostgreSQL SQL support and strong consistency instead of eventual consistency.&lt;/p&gt;




&lt;h2&gt;
  
  
  Database Limits to Know
&lt;/h2&gt;

&lt;p&gt;DSQL has intentional limits that prevent tail latency and keep the system predictable. These aren't bugs - they're design choices:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Limit&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Why It Matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rows per transaction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3,000&lt;/td&gt;
&lt;td&gt;Keeps OCC conflict windows small. Batch large inserts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transaction size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10 MiB&lt;/td&gt;
&lt;td&gt;Prevents oversized commits from impacting the Journal.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transaction duration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5 minutes&lt;/td&gt;
&lt;td&gt;Forces short, focused transactions. No long-running locks (because there are no locks).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Connection duration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;60 minutes&lt;/td&gt;
&lt;td&gt;Aligns with IAM token lifecycle. Reconnect periodically.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max connections&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10,000 per cluster&lt;/td&gt;
&lt;td&gt;Configurable via Service Quotas.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Connection rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100/second (1,000 burst)&lt;/td&gt;
&lt;td&gt;Not configurable. Critical for Lambda cold-start scenarios.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tables per database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;One database per cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schemas per database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Not configurable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Indexes per table&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;Including primary key.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max row size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2 MiB&lt;/td&gt;
&lt;td&gt;Individual column max is 1 MiB.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;256 TiB (with quota increase)&lt;/td&gt;
&lt;td&gt;Default is 10 TiB.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sequences&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5,000 per database&lt;/td&gt;
&lt;td&gt;Added February 2026.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Views&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5,000 per database&lt;/td&gt;
&lt;td&gt;Added at GA, May 2025.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
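
&lt;p&gt;The row limit is the one most likely to surprise a bulk loader. A minimal chunking sketch - the chunk size and table are illustrative, and each chunk commits as its own transaction to stay well under 3,000 rows:&lt;/p&gt;

```python
CHUNK_SIZE = 500  # well under the 3,000-row-per-transaction limit

def chunked(rows, size=CHUNK_SIZE):
    """Yield successive slices of at most `size` rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def bulk_insert_products(conn, rows):
    """Insert rows as multiple small transactions, one commit per chunk."""
    cur = conn.cursor()
    for chunk in chunked(rows):
        cur.executemany(
            "INSERT INTO products (name, price, category) VALUES (%s, %s, %s)",
            chunk,
        )
        conn.commit()  # each chunk is its own transaction
```

&lt;p&gt;The tradeoff is that a failure mid-load leaves earlier chunks committed, so bulk loads should be written to be resumable or idempotent.&lt;/p&gt;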




&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;DSQL uses a DPU (Distributed Processing Unit) billing model that covers all database activity - compute, I/O, and transaction processing - in a single metric.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DPU rate&lt;/strong&gt;: $8 per million DPUs (us-east-2)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: $0.33 per GB-month (pay for one logical copy per region)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Region writes&lt;/strong&gt;: Additional DPU charges equal to originating write DPUs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free tier&lt;/strong&gt;: 100,000 DPUs + 1 GB storage per month (roughly 700K TPC-C equivalent transactions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scales to zero&lt;/strong&gt;: No DPU charges when idle&lt;/li&gt;
&lt;/ul&gt;
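
&lt;p&gt;The arithmetic is simple enough to sketch. Using the single-region us-east-2 rates above - and keeping in mind that actual DPU consumption depends heavily on query shape, so this is an illustration only:&lt;/p&gt;

```python
DPU_RATE_PER_MILLION = 8.00   # us-east-2
STORAGE_PER_GB_MONTH = 0.33
FREE_DPUS = 100_000
FREE_STORAGE_GB = 1

def monthly_cost(dpus_used, storage_gb):
    """Estimate a single-region monthly bill after the free tier."""
    billable_dpus = max(dpus_used - FREE_DPUS, 0)
    billable_gb = max(storage_gb - FREE_STORAGE_GB, 0)
    return round(billable_dpus / 1_000_000 * DPU_RATE_PER_MILLION
                 + billable_gb * STORAGE_PER_GB_MONTH, 2)

# 5M DPUs and 10 GB: (4.9M / 1M) * $8 + 9 GB * $0.33
print(monthly_cost(5_000_000, 10))  # 42.17
```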

&lt;h3&gt;
  
  
  Cost Comparison for a Modest Workload
&lt;/h3&gt;

&lt;p&gt;For an application processing 1,000 transactions per hour, 10 GB storage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Aurora DSQL&lt;/strong&gt; (single region)&lt;/td&gt;
&lt;td&gt;~$50-80/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Aurora DSQL&lt;/strong&gt; (idle dev environment)&lt;/td&gt;
&lt;td&gt;~$3/month (storage only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Aurora PostgreSQL Serverless v2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$90-120/month active, or storage-only when paused at 0 ACU (~15s cold start on resume)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RDS PostgreSQL&lt;/strong&gt; (db.t3.medium)&lt;/td&gt;
&lt;td&gt;~$60-80/month (runs 24/7)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;DynamoDB&lt;/strong&gt; (on-demand, equivalent)&lt;/td&gt;
&lt;td&gt;~$30-50/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both DSQL and Aurora Serverless v2 can now scale to zero. The difference: DSQL resumes instantly with no cold start, while Aurora Serverless v2 takes approximately 15 seconds to resume from a paused state. For development environments with intermittent traffic, both cost pennies when idle. For production workloads that need instant response times, DSQL's zero cold start matters. DSQL is also eligible for Database Savings Plans for predictable workloads.&lt;/p&gt;

&lt;p&gt;You can monitor DPU breakdown in CloudWatch under the &lt;code&gt;AWS/AuroraDSQL&lt;/code&gt; namespace: &lt;code&gt;ComputeDPU&lt;/code&gt;, &lt;code&gt;ReadDPU&lt;/code&gt;, &lt;code&gt;WriteDPU&lt;/code&gt;, and &lt;code&gt;MultiRegionWriteDPU&lt;/code&gt;.&lt;/p&gt;
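
&lt;p&gt;As a sketch, pulling a day of those metrics with boto3 might look like the following. The &lt;code&gt;ClusterId&lt;/code&gt; dimension name is an assumption on my part - verify the actual dimension in the CloudWatch console for your cluster:&lt;/p&gt;

```python
import datetime

def dpu_request(cluster_id, metric, hours=24):
    """Build a get_metric_statistics request for one DSQL DPU metric.

    Note: the "ClusterId" dimension name is an assumption - check the
    metric's dimensions in the CloudWatch console before relying on it.
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "Namespace": "AWS/AuroraDSQL",
        "MetricName": metric,  # ComputeDPU, ReadDPU, WriteDPU, MultiRegionWriteDPU
        "Dimensions": [{"Name": "ClusterId", "Value": cluster_id}],
        "StartTime": now - datetime.timedelta(hours=hours),
        "EndTime": now,
        "Period": 3600,  # one datapoint per hour
        "Statistics": ["Sum"],
    }

def fetch_dpu_usage(cluster_id, metric="ComputeDPU"):
    import boto3  # imported here so the request builder works without boto3
    cw = boto3.client("cloudwatch")
    resp = cw.get_metric_statistics(**dpu_request(cluster_id, metric))
    return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])
```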




&lt;h2&gt;
  
  
  Developer Experience and Tooling
&lt;/h2&gt;

&lt;p&gt;DSQL's tooling ecosystem has grown quickly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connectors&lt;/strong&gt; (official, handle IAM auth automatically):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python: &lt;code&gt;aurora-dsql-python-connector&lt;/code&gt; v0.2.6 - wraps psycopg, psycopg2, asyncpg&lt;/li&gt;
&lt;li&gt;Node.js: &lt;code&gt;@aws/aurora-dsql-node-postgres-connector&lt;/code&gt; v0.1.8 (pg) and &lt;code&gt;@aws/aurora-dsql-postgresjs-connector&lt;/code&gt; v0.2.1 (Postgres.js)&lt;/li&gt;
&lt;li&gt;Java: JDBC connector (PgJDBC wrapper)&lt;/li&gt;
&lt;li&gt;Go: pgx v5.8.0 wrapper (February 2026)&lt;/li&gt;
&lt;li&gt;Ruby: &lt;code&gt;aurora-dsql-ruby-pg-connector&lt;/code&gt; (February 2026)&lt;/li&gt;
&lt;/ul&gt;
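
&lt;p&gt;If you do manage tokens yourself instead of using a connector, the refresh logic amounts to a small cache that replaces the token well before the default 15-minute expiry. The fetcher is injected here to keep the sketch generic - in practice it would wrap &lt;code&gt;DsqlSigner&lt;/code&gt; or the boto3 &lt;code&gt;dsql&lt;/code&gt; client:&lt;/p&gt;

```python
import time

class TokenCache:
    """Cache an IAM auth token and refresh it before the 15-minute expiry.

    `fetch_token` is any zero-argument callable returning a fresh token,
    e.g. a wrapper around DsqlSigner.
    """
    def __init__(self, fetch_token, refresh_after=600, clock=time.monotonic):
        self._fetch = fetch_token
        self._refresh_after = refresh_after  # refresh at 10 minutes, not 15
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._refresh_after:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token
```

&lt;p&gt;New pool connections then call &lt;code&gt;cache.get()&lt;/code&gt; for their password instead of baking a single token in at startup.&lt;/p&gt;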

&lt;p&gt;&lt;strong&gt;ORM and Migration Tooling&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tortoise ORM adapter (Python async ORM)&lt;/li&gt;
&lt;li&gt;Prisma CLI tools (Node.js ORM integration)&lt;/li&gt;
&lt;li&gt;Flyway dialect (database migration tooling)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;IDE Integrations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DBeaver plugin (Community and Pro editions)&lt;/li&gt;
&lt;li&gt;VS Code SQLTools driver&lt;/li&gt;
&lt;li&gt;JupyterLab and SageMaker AI integration&lt;/li&gt;
&lt;li&gt;AWS Console Query Editor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AI Steering&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aurora DSQL MCP Server for AI-assisted development&lt;/li&gt;
&lt;li&gt;Kiro Powers for Kiro IDE&lt;/li&gt;
&lt;li&gt;Skills for Claude Code, Cursor, Gemini, Codex&lt;/li&gt;
&lt;li&gt;Steering ensures AI assistants generate DSQL-compatible code (handling OCC retries, DDL limits, IAM auth)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform 1.11+ with AWS provider 6.18+ - native &lt;code&gt;aws_dsql_cluster&lt;/code&gt; resource&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;terraform-aws-modules/rds-aurora&lt;/code&gt; DSQL submodule for multi-Region&lt;/li&gt;
&lt;li&gt;CloudFormation support&lt;/li&gt;
&lt;li&gt;AWS Backup integration for automated backups&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Implement OCC retry logic on every write path.&lt;/strong&gt; Use exponential backoff with 3-5 retries. Read-only transactions don't need retries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep transactions small and fast.&lt;/strong&gt; The 3,000 row and 5-minute limits exist for good reason. Batch large operations into chunks of 500 rows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use UUID primary keys.&lt;/strong&gt; Random UUIDs distribute writes evenly across storage shards. Sequential IDs create hot spots that increase OCC conflicts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refresh IAM tokens proactively.&lt;/strong&gt; Tokens default to 15 minutes (configurable up to one week via &lt;code&gt;token-duration-secs&lt;/code&gt;). With the default, refresh at 10 minutes to avoid connection failures. The official connectors handle this automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use the official connectors for production SSL.&lt;/strong&gt; Raw psycopg2 with &lt;code&gt;sslmode="require"&lt;/code&gt; encrypts the connection but doesn't verify the server's identity. The official &lt;code&gt;aurora-dsql-python-connector&lt;/code&gt; and &lt;code&gt;@aws/aurora-dsql-node-postgres-connector&lt;/code&gt; handle full certificate verification automatically. For production, use the connectors rather than managing SSL configuration yourself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One DDL per transaction.&lt;/strong&gt; Always commit after each &lt;code&gt;CREATE TABLE&lt;/code&gt;, &lt;code&gt;ALTER TABLE&lt;/code&gt;, or &lt;code&gt;CREATE INDEX&lt;/code&gt;. This trips up many migration scripts that batch multiple DDL statements into one transaction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scope IAM policies to cluster ARNs.&lt;/strong&gt; Never use wildcard resources for DSQL permissions. Scope &lt;code&gt;dsql:DbConnect&lt;/code&gt; and &lt;code&gt;dsql:DbConnectAdmin&lt;/code&gt; to specific cluster ARNs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;EXPLAIN ANALYZE VERBOSE&lt;/code&gt; for query optimization.&lt;/strong&gt; Covering indexes can significantly reduce DPU costs by enabling index-only scans instead of full table scans.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement referential integrity in application code.&lt;/strong&gt; Without foreign keys, enforce relationships through application-level validation and carefully designed transaction boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test with AWS FIS.&lt;/strong&gt; Use Fault Injection Service to simulate region failures and validate your application's multi-Region behavior before you need it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor DPU breakdown in CloudWatch.&lt;/strong&gt; Watch &lt;code&gt;ComputeDPU&lt;/code&gt;, &lt;code&gt;ReadDPU&lt;/code&gt;, &lt;code&gt;WriteDPU&lt;/code&gt; separately. High &lt;code&gt;WriteDPU&lt;/code&gt; relative to reads may indicate OCC conflict storms.&lt;/li&gt;
&lt;/ol&gt;
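
&lt;p&gt;Point 9 is the one that changes day-to-day code the most. A minimal sketch of an application-level parent check (table names hypothetical), meant to run inside the OCC retry wrapper shown earlier:&lt;/p&gt;

```python
def add_order_item(cur, order_id, product_id, qty):
    """Insert a line item only if its parent order exists, in one transaction."""
    cur.execute("SELECT 1 FROM orders WHERE id = %s", (order_id,))
    if cur.fetchone() is None:
        raise ValueError(f"order {order_id} does not exist")
    cur.execute(
        "INSERT INTO order_items (order_id, product_id, qty) VALUES (%s, %s, %s)",
        (order_id, product_id, qty),
    )
```

&lt;p&gt;Note the check is only as strong as the isolation level: a concurrent delete of the parent can still race it. A common belt-and-braces approach is to also touch the parent row (for example, bump an &lt;code&gt;updated_at&lt;/code&gt; column) in the same transaction so a conflicting delete triggers an OCC abort instead of slipping through.&lt;/p&gt;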




&lt;h2&gt;
  
  
  What's Not There Yet - And Why
&lt;/h2&gt;

&lt;p&gt;This is the most contentious part of DSQL. If you're coming from standard RDS PostgreSQL or Aurora PostgreSQL, the list of missing features is significant. But these aren't oversights - the DSQL team made deliberate engineering tradeoffs to deliver strong consistency and predictable performance across a distributed, multi-Region architecture. Some of these features are fundamentally difficult in a disaggregated, OCC-based system. Others were simply deprioritized based on customer usage patterns.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Full Gap List vs Standard PostgreSQL
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;PostgreSQL Feature&lt;/th&gt;
&lt;th&gt;DSQL Status&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Foreign key constraints&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not yet - deprioritized based on customer usage patterns&lt;/td&gt;
&lt;td&gt;Cascading operations (e.g., deleting an order with 1,000 line items) create large implicit transactions that conflict with DSQL's 3,000-row transaction limit and OCC model. Many high-scale customers avoid foreign keys even in standard PostgreSQL for this reason. Marc Brooker has noted the team "haven't built foreign key constraints yet" because many customers take the same approach.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stored procedures (PL/pgSQL)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;Procedural code running inside the database conflicts with the serverless, stateless Query Processor model. The DSQL team sees this as an architectural direction, not a gap - business logic belongs in CI/CD-deployed application code, not inside the database.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Triggers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;Same reasoning as stored procedures. Database-side event processing creates hidden coupling and unpredictable transaction sizes. Use EventBridge, Lambda, or application-level event patterns instead.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TRUNCATE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;DELETE FROM table_name&lt;/code&gt; or &lt;code&gt;DROP TABLE&lt;/code&gt; + &lt;code&gt;CREATE TABLE&lt;/code&gt;. TRUNCATE's behavior is difficult to implement consistently across distributed storage replicas.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Temporary tables&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;The stateless, multi-tenant Query Processor model means there's no persistent session state. Use CTEs (&lt;code&gt;WITH&lt;/code&gt; clauses), subqueries, or regular tables with cleanup logic.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VACUUM / ANALYZE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not needed&lt;/td&gt;
&lt;td&gt;DSQL's MVCC garbage collection is automatic. The 5-minute transaction time limit enables simple, efficient cleanup without the complexity of PostgreSQL's vacuum process. No maintenance windows required.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;pgvector / vector support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not yet&lt;/td&gt;
&lt;td&gt;Vector similarity search is planned. In the meantime, AWS offers S3 Vectors and Aurora PostgreSQL with pgvector for embedding workloads.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSONB columns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not as a column type&lt;/td&gt;
&lt;td&gt;Store JSON in &lt;code&gt;TEXT&lt;/code&gt; columns and cast to &lt;code&gt;jsonb&lt;/code&gt; at query time (e.g., &lt;code&gt;my_column::jsonb-&amp;gt;&amp;gt;'key'&lt;/code&gt;). JSON functions and operators work at runtime, but you lose JSONB indexing (GIN indexes).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Full-text search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;No &lt;code&gt;tsvector&lt;/code&gt;/&lt;code&gt;tsquery&lt;/code&gt;. Use OpenSearch Serverless or Amazon Kendra for full-text search workloads.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multiple databases per cluster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 database (&lt;code&gt;postgres&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Use schemas for logical separation within a cluster, or create separate clusters. This simplifies distributed metadata management.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tablespaces&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;Storage is fully managed and auto-scaled. No manual storage allocation or placement decisions needed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Advisory locks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;OCC replaces all locking mechanisms. Advisory locks are a pessimistic concurrency pattern that doesn't fit the OCC model.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LISTEN / NOTIFY&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;The stateless Query Processor model has no persistent connections for push notifications. Use SQS, SNS, or EventBridge for pub/sub patterns.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Extensions (PostGIS, etc.)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;The managed, multi-tenant architecture doesn't support arbitrary extensions. Use purpose-built AWS services (Location Service for geo, OpenSearch for search).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom collations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;C&lt;/code&gt; collation only&lt;/td&gt;
&lt;td&gt;Consistent collation across distributed storage simplifies sort ordering and index behavior across regions. UTF-8 encoding is supported.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Configurable isolation levels&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;REPEATABLE READ&lt;/code&gt; only&lt;/td&gt;
&lt;td&gt;A single isolation level eliminates an entire class of consistency bugs. Strong snapshot isolation is the sweet spot between anomaly prevention and distributed performance.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Password authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IAM only&lt;/td&gt;
&lt;td&gt;No database passwords, ever. This is a security decision - IAM tokens integrate with CloudTrail, roles, and temporary credentials.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CREATE INDEX (synchronous)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;CREATE INDEX ASYNC&lt;/code&gt; only&lt;/td&gt;
&lt;td&gt;Asynchronous index creation prevents DDL from blocking running transactions. You monitor build progress through system catalog queries. This is actually an improvement for production workloads.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multiple DDL per transaction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 DDL per transaction&lt;/td&gt;
&lt;td&gt;Distributed schema changes are coordinated across all Query Processors and storage replicas. Limiting to one DDL per transaction keeps this coordination simple and predictable.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
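&lt;p&gt;The JSONB workaround from the table is worth seeing concretely. A minimal sketch of the store-as-&lt;code&gt;TEXT&lt;/code&gt;, cast-at-query-time pattern - the &lt;code&gt;events&lt;/code&gt; table and &lt;code&gt;payload&lt;/code&gt; column are hypothetical, and any psycopg2-style cursor will do:&lt;/p&gt;

```python
# Sketch of the JSON-in-TEXT pattern: store JSON in a TEXT column and
# cast to jsonb at query time. Table and column names are hypothetical.
import json

INSERT_SQL = "INSERT INTO events (id, payload) VALUES (%s, %s)"
# Cast the TEXT column to jsonb at read time; ->> extracts a key as text.
SELECT_SQL = "SELECT payload::jsonb->>'status' FROM events WHERE id = %s"

def insert_event(cur, event_id, payload_dict):
    # Serialize in the application; DSQL only ever sees TEXT.
    cur.execute(INSERT_SQL, (event_id, json.dumps(payload_dict)))

def event_status(cur, event_id):
    cur.execute(SELECT_SQL, (event_id,))
    row = cur.fetchone()
    return row[0] if row else None
```

&lt;p&gt;The tradeoff stays visible in the SQL: every read pays the &lt;code&gt;::jsonb&lt;/code&gt; cast, and without GIN indexes a filter on keys inside the document can't use an index.&lt;/p&gt;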

&lt;h3&gt;
  
  
  The Engineering Reasoning
&lt;/h3&gt;

&lt;p&gt;Marc Brooker addressed the feature gaps directly in his &lt;a href="https://brooker.co.za/blog/2025/11/02/thinking-dsql.html" rel="noopener noreferrer"&gt;Simplifying Architectures&lt;/a&gt; post. The key insight: DSQL's limits aren't arbitrary restrictions - they're what make the system's guarantees possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transaction limits (3,000 rows, 10 MiB, 5 minutes)&lt;/strong&gt; prevent head-of-line blocking. In a traditional database, one long-running transaction holding locks can stall every other transaction behind it. DSQL's OCC model doesn't have locks, but oversized commits would still create contention at the Adjudicator and Journal layers. The limits keep individual transactions fast and predictable, which keeps the entire system fast and predictable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No stored procedures or triggers&lt;/strong&gt; is the most opinionated choice. The DSQL team observed that customers are increasingly moving business logic out of the database and into application code deployed through CI/CD pipelines. Code in the database is hard to version, hard to test, and hard to debug. DSQL leans into this direction rather than supporting both models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No foreign keys yet&lt;/strong&gt; is the gap most customers notice first. The team has acknowledged the gap and may add support where it makes sense for the distributed architecture, but has deprioritized it based on customer feedback. The challenge is that cascading operations (CASCADE DELETE, CASCADE UPDATE) can create implicit transactions that exceed the row limits and generate unpredictable OCC conflict windows. Many high-scale PostgreSQL users already avoid foreign keys for exactly these reasons - but having the option matters.&lt;/p&gt;
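&lt;p&gt;In practice, application-level referential integrity under OCC means two things in code: check the parent inside the same transaction as the child insert, and retry on serialization failures. A minimal sketch - &lt;code&gt;OCCConflict&lt;/code&gt; is a stand-in for your driver's serialization-failure error (SQLSTATE 40001), and the table names are hypothetical:&lt;/p&gt;

```python
# Sketch: app-level referential check plus OCC retry. In real code the
# exception to catch is the driver's serialization-failure error
# (SQLSTATE 40001); OCCConflict is a stand-in so the sketch stands alone.
import random
import time

class OCCConflict(Exception):
    """Stand-in for a serialization failure (SQLSTATE 40001)."""

def with_occ_retry(fn, attempts=5, base_delay=0.05):
    """Run fn(); on an OCC conflict, back off with jitter and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except OCCConflict:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) * random.random())

def add_line_item(conn, order_id, sku):
    def txn():
        with conn.cursor() as cur:
            # Parent check and child insert share one REPEATABLE READ
            # snapshot, so a concurrent order delete surfaces as a
            # commit-time conflict rather than a dangling reference.
            cur.execute("SELECT 1 FROM orders WHERE id = %s", (order_id,))
            if cur.fetchone() is None:
                raise ValueError("order does not exist")
            cur.execute(
                "INSERT INTO line_items (order_id, sku) VALUES (%s, %s)",
                (order_id, sku),
            )
        conn.commit()
    return with_occ_retry(txn)
```

&lt;p&gt;The retry wrapper is the part that generalizes: any DSQL write path should assume the commit can fail with a conflict and be safe to re-run.&lt;/p&gt;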

&lt;h3&gt;
  
  
  What to Use Instead
&lt;/h3&gt;

&lt;p&gt;For applications that depend heavily on the missing features today, here's the practical guidance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need foreign keys, stored procedures, triggers?&lt;/strong&gt; Use Aurora PostgreSQL Serverless v2. Full PostgreSQL feature set with serverless scaling (though not to zero).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need vector search?&lt;/strong&gt; Aurora PostgreSQL with pgvector, S3 Vectors, or OpenSearch Serverless.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need full-text search?&lt;/strong&gt; OpenSearch Serverless or Amazon Kendra.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need pub/sub notifications?&lt;/strong&gt; EventBridge + Lambda instead of LISTEN/NOTIFY.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need geospatial queries?&lt;/strong&gt; Amazon Location Service instead of PostGIS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DSQL is best for new applications that can work within these constraints, or existing applications that were already avoiding the missing features. The team is actively expanding compatibility - views, sequences, identity columns, and the Go connector all shipped based on direct customer feedback. Foreign key constraints remain a known gap, and customer demand will likely influence when they're addressed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Things to Know
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prepared statement caching&lt;/strong&gt; - DSQL manages prepared statements cluster-wide. You may see more prepared statements per connection than expected. This is by design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IPv4 connections&lt;/strong&gt; - Some PostgreSQL clients attempt IPv6 first in dualstack mode. If you're on IPv4-only hosts, configure your client for IPv4 explicitly to avoid &lt;code&gt;NetworkUnreachable&lt;/code&gt; errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Permission propagation&lt;/strong&gt; - &lt;code&gt;GRANT&lt;/code&gt; and &lt;code&gt;REVOKE&lt;/code&gt; changes propagate to existing connections within the connection lifetime (up to one hour). For immediate effect, reconnect after permission changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Catalog cache&lt;/strong&gt; - After creating schemas or tables, refresh your connection (disconnect/reconnect or &lt;code&gt;SET search_path&lt;/code&gt; again) to update the catalog cache. This avoids spurious "Schema Already Exists" errors caused by a stale cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deletion protection&lt;/strong&gt; - Enable &lt;code&gt;deletion_protection_enabled = true&lt;/code&gt; in production Terraform configs. If you need to destroy a DSQL cluster, disable protection first, then run &lt;code&gt;terraform apply&lt;/code&gt; before &lt;code&gt;terraform destroy&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Row counts&lt;/strong&gt; - For large tables, use the system catalog instead of &lt;code&gt;COUNT(*)&lt;/code&gt; for row counts. DSQL stores approximate counts in &lt;code&gt;pg_class.reltuples&lt;/code&gt;.&lt;/p&gt;
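&lt;p&gt;A minimal sketch of that catalog query - keep in mind the result is an estimate maintained by the engine, not an exact count:&lt;/p&gt;

```python
# Sketch: approximate row count from the system catalog instead of
# COUNT(*). reltuples is an estimate, not an exact count.
APPROX_COUNT_SQL = "SELECT reltuples::bigint FROM pg_class WHERE relname = %s"

def approx_row_count(cur, table_name):
    """Return the catalog's row-count estimate for a table (0 if absent)."""
    cur.execute(APPROX_COUNT_SQL, (table_name,))
    row = cur.fetchone()
    return int(row[0]) if row else 0
```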

&lt;p&gt;&lt;strong&gt;TRUNCATE&lt;/strong&gt; - Not supported. Use &lt;code&gt;DELETE FROM table_name&lt;/code&gt; to clear all rows, or &lt;code&gt;DROP TABLE&lt;/code&gt; followed by &lt;code&gt;CREATE TABLE&lt;/code&gt; for a full reset. This is a common migration stumbling block for scripts that use &lt;code&gt;TRUNCATE&lt;/code&gt; for test data cleanup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection pooling&lt;/strong&gt; - With 60-minute connection limits and IAM token refresh, pool refresh behavior matters. Configure your connection pool to close and recreate connections before the 60-minute limit. The official connectors handle token refresh, but pool-level eviction still needs configuration. Set &lt;code&gt;idleTimeoutMillis&lt;/code&gt; (Node.js) or equivalent to well under 60 minutes.&lt;/p&gt;
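&lt;p&gt;A minimal sketch of the eviction check a pool needs - the 55-minute cutoff is an assumption chosen to leave margin under the 60-minute limit:&lt;/p&gt;

```python
# Sketch: recycle pooled connections well before DSQL's 60-minute
# connection limit. 55 minutes is an assumed safety margin.
import datetime

MAX_LIFETIME = datetime.timedelta(minutes=55)

def should_recycle(connected_at, now=None):
    """True when a pooled connection is old enough to close and replace."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return now - connected_at >= MAX_LIFETIME
```

&lt;p&gt;With SQLAlchemy, setting &lt;code&gt;pool_recycle&lt;/code&gt; (in seconds) on &lt;code&gt;create_engine&lt;/code&gt; achieves the same eviction without a hand-rolled check.&lt;/p&gt;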

&lt;p&gt;&lt;strong&gt;PostgreSQL client version&lt;/strong&gt; - AWS recommends PostgreSQL client version 17 or later for best compatibility with DSQL.&lt;/p&gt;




&lt;h2&gt;
  
  
  Recent Features Worth Highlighting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DSQL Playground (February 2026)&lt;/strong&gt; - A browser-based sandbox where you can create schemas, load sample data, and run SQL queries against a real DSQL database - no AWS account required. This is the fastest way to try DSQL. Visit the &lt;a href="https://playground.dsql.demo.aws/" rel="noopener noreferrer"&gt;Aurora DSQL Playground&lt;/a&gt; and start writing queries in seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sequences and Identity Columns (February 2026)&lt;/strong&gt; - The most requested feature after foreign keys. You can now use &lt;code&gt;GENERATED ALWAYS AS IDENTITY&lt;/code&gt; columns and explicit &lt;code&gt;CREATE SEQUENCE&lt;/code&gt; / &lt;code&gt;nextval()&lt;/code&gt; calls. Up to 5,000 sequences per database.&lt;/p&gt;
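&lt;p&gt;A quick sketch of what the new support enables - the table and sequence names are hypothetical, and the helper commits after each statement because DSQL allows only one DDL per transaction:&lt;/p&gt;

```python
# Sketch: identity-column and sequence DDL for DSQL. Names are
# hypothetical; the helper commits per statement because DSQL
# permits one DDL statement per transaction.
CREATE_TABLE_DDL = """
CREATE TABLE orders (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer TEXT NOT NULL
)
"""

CREATE_SEQUENCE_DDL = "CREATE SEQUENCE invoice_seq"
NEXT_INVOICE_SQL = "SELECT nextval('invoice_seq')"

def apply_ddl(conn, statements):
    """Run each DDL statement in its own transaction."""
    for stmt in statements:
        with conn.cursor() as cur:
            cur.execute(stmt)
        conn.commit()
```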

&lt;p&gt;&lt;strong&gt;AI Steering (February 2026)&lt;/strong&gt; - The DSQL MCP server and IDE skills ensure AI coding assistants generate code that handles DSQL's specific patterns - OCC retries, DDL limits, IAM auth. If you use Claude Code, Cursor, or similar tools, install the DSQL steering skill. It saves real debugging time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PrivateLink with Direct Connect (December 2025)&lt;/strong&gt; - Connect to DSQL from on-premises networks without traversing the public internet. Uses the &lt;code&gt;amzn-cluster-id&lt;/code&gt; connection option for clusters behind PrivateLink without private DNS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resource-Based Policies (October 2025)&lt;/strong&gt; - Attach policies directly to DSQL clusters for cross-account access patterns. Useful for shared database architectures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS FIS Integration (August 2025)&lt;/strong&gt; - Inject connection errors into specific regions to test your application's failover behavior. For multi-Region deployments, run experiments in one region while the other continues normal operations.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Project: The Kabob Store on DSQL
&lt;/h2&gt;

&lt;p&gt;I built the &lt;a href="https://darryl-ruggles.cloud/dsql-kabob-store/" rel="noopener noreferrer"&gt;Kabob Store&lt;/a&gt; as a real-world test of DSQL. It's a full e-commerce platform with menu browsing, cart management, and order processing, running on ECS Fargate with a FastAPI backend.&lt;/p&gt;

&lt;p&gt;Key architectural decisions from that project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direct psycopg2 instead of an ORM&lt;/strong&gt; - Better control over transaction boundaries and DSQL-specific patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container-based architecture&lt;/strong&gt; - The same Docker image deploys to Fargate, Lambda, EC2, or EKS without code changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Region DSQL with single-Region compute&lt;/strong&gt; - Data replication for disaster recovery, with plans to add multi-Region compute with Route53 routing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defense-in-depth security&lt;/strong&gt; - Six layers from client validation through parameterized queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM token refresh manager&lt;/strong&gt; - Thread-safe connection management with 55-minute token refresh&lt;/li&gt;
&lt;/ul&gt;
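&lt;p&gt;The token refresh manager is the piece most DSQL applications end up writing. A minimal thread-safe sketch in the spirit of that last bullet - the fetch function is injected so the cache logic stands alone; in production it would wrap the boto3 DSQL token generator (&lt;code&gt;generate_db_connect_auth_token&lt;/code&gt;):&lt;/p&gt;

```python
# Sketch: thread-safe IAM token cache with a 55-minute refresh window.
# fetch_token is injected; in production it would wrap boto3's DSQL
# token generator. The clock is injectable for testing.
import threading
import time

class TokenManager:
    def __init__(self, fetch_token, ttl_seconds=55 * 60, clock=time.monotonic):
        self._fetch = fetch_token
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._token = None
        self._issued_at = None

    def get(self):
        """Return a cached token, refreshing before the 60-minute limit."""
        with self._lock:
            now = self._clock()
            if self._token is None or now - self._issued_at >= self._ttl:
                self._token = self._fetch()
                self._issued_at = now
            return self._token
```

&lt;p&gt;Every thread that opens a connection calls &lt;code&gt;get()&lt;/code&gt;; the lock ensures only one thread pays for a refresh when the token ages out.&lt;/p&gt;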

&lt;p&gt;The architecture principles from that project apply to any DSQL application. I covered &lt;a href="https://darryl-ruggles.cloud/elastic-container-service-ecs-my-default-choice-for-containers-on-aws/" rel="noopener noreferrer"&gt;ECS as my default container runtime&lt;/a&gt; and &lt;a href="https://darryl-ruggles.cloud/amazon-eventbridge-the-event-driven-backbone-of-aws-and-my-favourite-service/" rel="noopener noreferrer"&gt;EventBridge for event-driven patterns&lt;/a&gt; in previous posts - DSQL fits naturally into both patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cleanup
&lt;/h2&gt;

&lt;p&gt;If you deployed a DSQL cluster to follow along, destroy your resources to avoid ongoing charges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;terraform

&lt;span class="c"&gt;# If you enabled deletion protection, disable it first:&lt;/span&gt;
&lt;span class="c"&gt;# Edit dsql-single-region.tf: set deletion_protection_enabled = false&lt;/span&gt;
&lt;span class="c"&gt;# terraform apply&lt;/span&gt;

terraform destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For CI/CD pipelines and automated testing, set &lt;code&gt;deletion_protection_enabled = false&lt;/code&gt; from the start, or use the &lt;code&gt;force_destroy&lt;/code&gt; option in the Terraform module to skip the protection check during teardown.&lt;/p&gt;

&lt;p&gt;DSQL charges only for DPUs consumed and storage used - there are no idle compute charges. But storage charges ($0.33/GB-month) continue as long as data exists in the cluster. For multi-Region clusters, destroy both the primary and secondary clusters. The witness region has no standalone resources to clean up.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Aurora DSQL is 15 months old and has matured quickly. It went from a 3-region preview to a 14-region GA service with CloudWatch monitoring, AWS Backup, PrivateLink, FIS chaos testing, resource-based policies, sequences, and a growing ecosystem of connectors and IDE integrations.&lt;/p&gt;

&lt;p&gt;The gaps are real - no foreign keys, no stored procedures, no vector support. These matter for some workloads. But for new applications that need SQL with serverless economics, multi-Region strong consistency without managing replicas, or a database that actually scales to zero, DSQL delivers.&lt;/p&gt;

&lt;p&gt;My decision tree for new projects now has a clear path:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need key-value at scale?&lt;/strong&gt; DynamoDB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need full PostgreSQL?&lt;/strong&gt; Aurora PostgreSQL Serverless v2.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need SQL + serverless + multi-Region?&lt;/strong&gt; Aurora DSQL.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The code examples in this post are in the &lt;a href="https://github.com/RDarrylR/aurora-dsql-2026" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; - Terraform for infrastructure, Python and Node.js for application patterns. If you want to try DSQL without even creating an AWS account, the &lt;a href="https://playground.dsql.demo.aws/" rel="noopener noreferrer"&gt;DSQL Playground&lt;/a&gt; lets you run queries in your browser in seconds. When you're ready for your own cluster, it's sixty seconds from &lt;code&gt;terraform apply&lt;/code&gt; to a running PostgreSQL-compatible database with no instances to manage.&lt;/p&gt;

&lt;p&gt;If you've been waiting for a serverless SQL database on AWS that isn't a compromise, this is it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://playground.dsql.demo.aws/" rel="noopener noreferrer"&gt;Aurora DSQL Playground&lt;/a&gt; - Try DSQL in your browser, no AWS account needed&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/aurora-dsql/latest/userguide/what-is-aurora-dsql.html" rel="noopener noreferrer"&gt;Aurora DSQL User Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/rds/aurora/dsql/pricing/" rel="noopener noreferrer"&gt;Aurora DSQL Pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/aurora-dsql/latest/userguide/doc-history.html" rel="noopener noreferrer"&gt;Aurora DSQL Document History&lt;/a&gt; - Track every feature addition&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://brooker.co.za/blog/2024/12/03/aurora-dsql.html" rel="noopener noreferrer"&gt;Marc Brooker's DSQL Blog Series&lt;/a&gt; - Essential reading. Marc is the VP/Distinguished Engineer behind DSQL. His five-part series covers the architecture internals (reads, writes, transactions, multi-Region, simplifying architectures) in detail you won't find anywhere else.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://discord.com/invite/nEF6ksFWru" rel="noopener noreferrer"&gt;Aurora DSQL Discord&lt;/a&gt; - Community Discord for questions, feedback, and discussion with the DSQL team&lt;/li&gt;
&lt;li&gt;&lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/rds-aurora/aws/latest/submodules/dsql" rel="noopener noreferrer"&gt;terraform-aws-modules/rds-aurora DSQL Module&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/aurora-dsql/latest/userguide/SECTION_aurora-dsql-mcp-server.html" rel="noopener noreferrer"&gt;Aurora DSQL MCP Server&lt;/a&gt; - AI steering for DSQL-aware code generation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/aurora-dsql/latest/userguide/SECTION_connectors.html" rel="noopener noreferrer"&gt;Aurora DSQL Connectors&lt;/a&gt; - Official Python, Node.js, Java, Go, Ruby connectors&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/dsql-kabob-store/" rel="noopener noreferrer"&gt;My Kabob Store Project&lt;/a&gt; - My previous DSQL blog - building a multi-Region e-commerce platform&lt;/li&gt;
&lt;li&gt;&lt;a href="https://darryl-ruggles.cloud/elastic-container-service-ecs-my-default-choice-for-containers-on-aws/" rel="noopener noreferrer"&gt;ECS: My Default Choice for Containers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://darryl-ruggles.cloud/amazon-eventbridge-the-event-driven-backbone-of-aws-and-my-favourite-service/" rel="noopener noreferrer"&gt;EventBridge: The Event-Driven Backbone of AWS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Connect with me on&lt;/em&gt; &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://bsky.app/profile/darrylruggles.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://github.com/RDarrylR" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;&lt;em&gt;, or the&lt;/em&gt; &lt;a href="https://community.aws/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;&lt;em&gt;. Check out more of my projects at&lt;/em&gt; &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; &lt;em&gt;and join the&lt;/em&gt; &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; &lt;em&gt;community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>aurora</category>
      <category>postgres</category>
      <category>database</category>
    </item>
  </channel>
</rss>
