<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Future: AWS Community Builders </title>
    <description>The latest articles on Future by AWS Community Builders  (@aws-builders).</description>
    <link>https://future.forem.com/aws-builders</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F2794%2F88da75b6-aadd-4ea1-8083-ae2dfca8be94.png</url>
      <title>Future: AWS Community Builders </title>
      <link>https://future.forem.com/aws-builders</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://future.forem.com/feed/aws-builders"/>
    <language>en</language>
    <item>
      <title>AWS Data &amp; AI Stories #04: Multimodal RAG on AWS</title>
      <dc:creator>Sedat SALMAN</dc:creator>
      <pubDate>Wed, 22 Apr 2026 19:07:00 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/aws-data-ai-stories-04-multimodal-rag-on-aws-2ppp</link>
      <guid>https://future.forem.com/aws-builders/aws-data-ai-stories-04-multimodal-rag-on-aws-2ppp</guid>
      <description>&lt;p&gt;In the first article, I talked about multimodal AI at a high level.&lt;/p&gt;

&lt;p&gt;In the second article, I focused on Amazon Bedrock Data Automation as the processing layer.&lt;/p&gt;

&lt;p&gt;In the third article, I explained multimodal knowledge bases as the retrieval layer.&lt;/p&gt;

&lt;p&gt;Now it is time to connect these pieces together.&lt;/p&gt;

&lt;p&gt;This is where multimodal RAG becomes important. Amazon Bedrock Knowledge Bases now supports multimodal content including images, audio, and video, and AWS positions it as a managed way to build end-to-end RAG workflows over enterprise data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is multimodal RAG?
&lt;/h2&gt;

&lt;p&gt;RAG stands for Retrieval-Augmented Generation.&lt;/p&gt;

&lt;p&gt;The idea is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve relevant content from your own data&lt;/li&gt;
&lt;li&gt;send that context to the model&lt;/li&gt;
&lt;li&gt;generate a grounded answer&lt;/li&gt;
&lt;/ul&gt;
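&lt;p&gt;The three steps above can be sketched as a minimal pipeline. This is a toy illustration, not an AWS API: the keyword-overlap retriever stands in for a real vector search, and the prompt builder just shows where the retrieved context is injected.&lt;/p&gt;

```python
# Toy RAG pipeline: retrieve relevant chunks, then build a grounded prompt.
# The keyword-overlap scoring is a stand-in for a real vector search.

def retrieve(query, index, top_k=2):
    """Return the top_k chunks whose text shares the most words with the query."""
    words = set(query.lower().split())
    def score(chunk):
        return len(words.intersection(chunk["text"].lower().split()))
    return sorted(index, key=score, reverse=True)[:top_k]

def build_prompt(query, chunks):
    """Inject the retrieved context into the prompt sent to the model."""
    context = "\n".join(c["text"] for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

index = [
    {"id": "doc1", "text": "The pump shuts down when the pressure valve fails"},
    {"id": "doc2", "text": "Quarterly report for the finance team"},
]
chunks = retrieve("why did the pump shut down", index)
prompt = build_prompt("why did the pump shut down", chunks)
```

&lt;p&gt;The model then answers from the injected context rather than from its training data alone, which is what makes the answer grounded.&lt;/p&gt;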

&lt;p&gt;A multimodal RAG system follows the same logic, but the retrieved context is not limited to text. It can also include images, audio, video, or processed outputs derived from those inputs. AWS documentation for multimodal knowledge bases explicitly supports multimedia ingestion and querying, including image queries and time-based retrieval metadata for audio and video.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is multimodal RAG different from normal RAG?
&lt;/h2&gt;

&lt;p&gt;Traditional RAG is usually text-focused.&lt;/p&gt;

&lt;p&gt;That works well for manuals, policies, reports, and similar documents.&lt;/p&gt;

&lt;p&gt;But in many real environments, important knowledge is spread across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;diagrams&lt;/li&gt;
&lt;li&gt;screenshots&lt;/li&gt;
&lt;li&gt;scanned pages&lt;/li&gt;
&lt;li&gt;recorded calls&lt;/li&gt;
&lt;li&gt;videos&lt;/li&gt;
&lt;li&gt;field images&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the challenge is no longer only “Which paragraph should I retrieve?”&lt;/p&gt;

&lt;p&gt;The new challenge becomes:&lt;br&gt;
Which content is relevant, regardless of format?&lt;/p&gt;

&lt;p&gt;That is the real value of multimodal RAG. AWS’s newer multimodal retrieval guidance is built around this exact shift from text-only retrieval to retrieval across media types.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I see the architecture
&lt;/h2&gt;

&lt;p&gt;A simple multimodal RAG architecture on AWS looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is collected in a source such as Amazon S3&lt;/li&gt;
&lt;li&gt;Raw files are processed if needed&lt;/li&gt;
&lt;li&gt;A knowledge base indexes the usable content&lt;/li&gt;
&lt;li&gt;A query retrieves relevant multimodal context&lt;/li&gt;
&lt;li&gt;A foundation model generates the answer&lt;/li&gt;
&lt;li&gt;The application returns the answer, often with source grounding&lt;/li&gt;
&lt;/ul&gt;
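&lt;p&gt;As a sketch of how the query step could look in code, the helper below builds the request for the Bedrock Agent Runtime &lt;code&gt;retrieve_and_generate&lt;/code&gt; API. The knowledge base ID and model ARN are placeholders you would supply from your own account:&lt;/p&gt;

```python
# Build the request payload for Bedrock's managed RAG query API.
# The IDs below are placeholders; supply your own knowledge base and model.

def build_rag_request(question, kb_id, model_arn):
    """Return kwargs for bedrock-agent-runtime retrieve_and_generate."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

request = build_rag_request(
    "What should I check first?", "KB_ID_PLACEHOLDER", "MODEL_ARN_PLACEHOLDER"
)
# With AWS credentials configured, the call itself would look like:
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   response = client.retrieve_and_generate(**request)
#   print(response["output"]["text"])
```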

&lt;p&gt;AWS describes Knowledge Bases as a fully managed RAG capability that handles ingestion, retrieval, and prompt augmentation, which is why it fits this workflow so well. AWS also shows multimodal examples where Bedrock Data Automation is used before Knowledge Bases to improve downstream retrieval.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two main multimodal RAG patterns
&lt;/h2&gt;

&lt;p&gt;This is the most important design point for this article.&lt;/p&gt;

&lt;p&gt;Not every multimodal RAG system should be built the same way.&lt;/p&gt;

&lt;p&gt;AWS currently describes two main approaches for multimodal processing in Knowledge Bases:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Retrieval-first approach
&lt;/h3&gt;

&lt;p&gt;This is the better option when the main goal is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;visual similarity&lt;/li&gt;
&lt;li&gt;image search&lt;/li&gt;
&lt;li&gt;cross-modal retrieval&lt;/li&gt;
&lt;li&gt;media-aware search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this pattern, Amazon Nova Multimodal Embeddings is the main enabler. AWS describes this approach as the right fit for visual similarity searches and multimodal semantic retrieval.&lt;/p&gt;
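&lt;p&gt;The core mechanism behind retrieval-first search is embedding similarity: every item, whatever its modality, is mapped into one shared vector space, and the closest vectors to the query are returned. A minimal sketch with made-up vectors (a real system would get these from an embedding model such as Nova Multimodal Embeddings):&lt;/p&gt;

```python
# Cosine similarity over a shared embedding space: the mechanism behind
# cross-modal retrieval. The vectors here are made up for illustration.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend embeddings: one text chunk, one image, one audio clip.
items = {
    "text_chunk": [0.9, 0.1, 0.0],
    "image": [0.6, 0.4, 0.2],
    "audio_clip": [0.0, 0.2, 0.9],
}
query_vec = [0.85, 0.2, 0.05]  # embedding of the user query

ranked = sorted(items, key=lambda name: cosine(query_vec, items[name]), reverse=True)
```

&lt;p&gt;Because text, images, audio, and video all live in the same space, one query can rank them all against each other.&lt;/p&gt;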

&lt;h3&gt;
  
  
  2. Processing-first approach
&lt;/h3&gt;

&lt;p&gt;This is the better option when the main goal is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extracting structured meaning from raw media&lt;/li&gt;
&lt;li&gt;turning audio, video, or documents into usable searchable content&lt;/li&gt;
&lt;li&gt;supporting downstream question answering with processed output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this pattern, Amazon Bedrock Data Automation becomes the first major step before retrieval. AWS documentation describes BDA as the text-based processing path for multimedia content in multimodal knowledge bases, and AWS has also published solution examples combining BDA with Knowledge Bases for multimodal RAG applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to decide between the two
&lt;/h3&gt;

&lt;p&gt;For me, the design question is simple.&lt;/p&gt;

&lt;p&gt;If I want to ask:&lt;br&gt;
“Find content that looks or feels similar.”&lt;br&gt;
then I would think retrieval-first.&lt;/p&gt;

&lt;p&gt;If I want to ask:&lt;br&gt;
“Extract useful content from media and use that in RAG.”&lt;br&gt;
then I would think processing-first.&lt;/p&gt;

&lt;p&gt;AWS’s own “choose your multimodal processing approach” guidance makes this distinction very clear, and I think that is the right way to avoid overdesigning the solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical workflow example
&lt;/h2&gt;

&lt;p&gt;Imagine a support or operations use case.&lt;/p&gt;

&lt;p&gt;Your data may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF maintenance procedures&lt;/li&gt;
&lt;li&gt;field images&lt;/li&gt;
&lt;li&gt;audio notes from engineers&lt;/li&gt;
&lt;li&gt;short troubleshooting videos&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A user asks:&lt;br&gt;
“What is the likely issue and what should I check first?”&lt;/p&gt;

&lt;p&gt;A text-only RAG system may retrieve a manual section.&lt;/p&gt;

&lt;p&gt;A multimodal RAG system can do more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve a relevant text section&lt;/li&gt;
&lt;li&gt;identify matching visual evidence&lt;/li&gt;
&lt;li&gt;point to the correct moment in a video&lt;/li&gt;
&lt;li&gt;use processed audio or image context to improve the answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS documentation for querying multimodal knowledge bases shows response metadata such as source modality, MIME type, and start and end timestamps for audio and video segments, which makes this type of experience much more practical.&lt;/p&gt;
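&lt;p&gt;A small helper can turn that metadata into a user-facing citation. The record shape below is a simplified assumption modeled on the modality, MIME type, and timestamp fields the documentation describes, not the exact API response schema:&lt;/p&gt;

```python
# Format retrieval results that may point at text, image, audio, or video.
# The record layout is a simplified assumption, not the exact Bedrock schema.

def format_citation(record):
    """Build a human-readable source reference from a retrieval record."""
    modality = record["modality"]
    if modality in ("audio", "video"):
        start, end = record["start_sec"], record["end_sec"]
        return f"{record['source']} ({modality}, {start}s-{end}s)"
    return f"{record['source']} ({modality})"

results = [
    {"source": "manual.pdf", "modality": "text"},
    {"source": "repair.mp4", "modality": "video", "start_sec": 42, "end_sec": 58},
]
citations = [format_citation(r) for r in results]
```

&lt;p&gt;For a video hit, the answer can then say “see repair.mp4 at 42s” instead of pointing at the whole file.&lt;/p&gt;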

&lt;h2&gt;
  
  
  Why Bedrock Knowledge Bases matters here
&lt;/h2&gt;

&lt;p&gt;You can always build your own RAG system.&lt;/p&gt;

&lt;p&gt;But one reason Bedrock Knowledge Bases matters is that it reduces the amount of custom plumbing.&lt;/p&gt;

&lt;p&gt;AWS positions it as a managed RAG capability that simplifies setup, handles parts of preprocessing and retrieval, and helps ground model responses in proprietary data. For many teams, this is a better starting point than building a fully custom retrieval pipeline from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where BDA still matters in multimodal RAG
&lt;/h2&gt;

&lt;p&gt;Even though this article is about RAG, BDA still plays an important role.&lt;/p&gt;

&lt;p&gt;Multimodal RAG does not always mean retrieving directly from raw multimedia.&lt;/p&gt;

&lt;p&gt;In many cases, the better pattern is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;process the content first&lt;/li&gt;
&lt;li&gt;extract structured insights&lt;/li&gt;
&lt;li&gt;store or index those outputs&lt;/li&gt;
&lt;li&gt;use them in RAG&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS has shown this pattern in solution examples where Amazon Bedrock Data Automation processes multimodal content, the extracted information is stored in a knowledge base, and then a RAG interface is used for question answering.&lt;/p&gt;
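&lt;p&gt;The process-first flow can be sketched as a small pipeline. Here &lt;code&gt;process_media&lt;/code&gt; is a hypothetical stand-in for a Bedrock Data Automation job; in this toy version it just fakes a transcript so the indexing step has something to store:&lt;/p&gt;

```python
# Process-first sketch: media is turned into text before it is indexed.
# process_media is a hypothetical stand-in for a Bedrock Data Automation job.

def process_media(media_file):
    """Pretend extraction step: return text derived from a media file."""
    fake_transcripts = {
        "call_0417.wav": "Customer reports the pump overheating after startup.",
        "site_photo.jpg": "Image shows corrosion on the intake valve.",
    }
    return fake_transcripts.get(media_file, "")

def ingest(media_files):
    """Process each file, keep non-empty outputs, and build the search index."""
    index = []
    for media_file in media_files:
        text = process_media(media_file)
        if text:
            index.append({"source": media_file, "text": text})
    return index

index = ingest(["call_0417.wav", "site_photo.jpg", "unsupported.bin"])
```

&lt;p&gt;After this step, retrieval works over plain text, even though the original sources were audio and images.&lt;/p&gt;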

&lt;h2&gt;
  
  
  One point people often miss
&lt;/h2&gt;

&lt;p&gt;A common mistake is to assume multimodal RAG is only about attaching files to a chatbot.&lt;/p&gt;

&lt;p&gt;That is too simple.&lt;/p&gt;

&lt;p&gt;A real multimodal RAG system usually includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ingestion&lt;/li&gt;
&lt;li&gt;processing&lt;/li&gt;
&lt;li&gt;indexing&lt;/li&gt;
&lt;li&gt;retrieval&lt;/li&gt;
&lt;li&gt;prompt augmentation&lt;/li&gt;
&lt;li&gt;response generation&lt;/li&gt;
&lt;li&gt;source grounding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why I see multimodal RAG as an architecture pattern, not just a model feature. AWS Prescriptive Guidance describes Knowledge Bases as covering the RAG workflow from ingestion to retrieval and prompt augmentation, which supports this architecture view.&lt;/p&gt;

&lt;h2&gt;
  
  
  Constraints to remember
&lt;/h2&gt;

&lt;p&gt;There are also a few practical points to remember.&lt;/p&gt;

&lt;p&gt;First, AWS states that multimodal support in Bedrock Knowledge Bases is available with unstructured data sources. Structured data sources do not support multimodal content processing. Second, the available query types and features depend on the processing approach you choose.&lt;/p&gt;

&lt;p&gt;So it is important to design the knowledge layer with the right data source model from the start.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this is useful
&lt;/h2&gt;

&lt;p&gt;I think multimodal RAG is especially useful in cases like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;technical support&lt;/li&gt;
&lt;li&gt;operations knowledge assistants&lt;/li&gt;
&lt;li&gt;document and image search&lt;/li&gt;
&lt;li&gt;inspection workflows&lt;/li&gt;
&lt;li&gt;compliance evidence review&lt;/li&gt;
&lt;li&gt;media-rich enterprise search&lt;/li&gt;
&lt;li&gt;predictive maintenance assistants&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS has published examples including multimodal root-cause diagnosis and agentic multimodal assistants, which shows that this pattern is already moving into real business use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;For me, multimodal RAG is where the previous three topics come together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multimodal AI gives the overall direction&lt;/li&gt;
&lt;li&gt;Bedrock Data Automation helps process raw content&lt;/li&gt;
&lt;li&gt;Multimodal Knowledge Bases provide the retrieval layer&lt;/li&gt;
&lt;li&gt;Multimodal RAG turns all of that into useful answers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS now provides a much clearer path for building these solutions than before, especially with managed multimodal retrieval in Knowledge Bases and guidance on choosing between BDA and Nova Multimodal Embeddings depending on the use case.&lt;/p&gt;

&lt;p&gt;For me, the key lesson is simple:&lt;/p&gt;

&lt;p&gt;Do not start with the model.&lt;/p&gt;

&lt;p&gt;Start with the question:&lt;br&gt;
What kind of content do I need to retrieve, and why?&lt;/p&gt;

&lt;p&gt;If that answer is clear, the multimodal RAG design becomes much easier.&lt;/p&gt;

&lt;p&gt;In the next article, I will move on to the next logical topic:&lt;/p&gt;

&lt;p&gt;Amazon Nova Multimodal Embeddings.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>datascience</category>
      <category>aws</category>
      <category>awsbigdata</category>
    </item>
    <item>
      <title>Technologies And Concepts: Cheat Sheet for Solutions Architect Associate (SAA-C03)</title>
      <dc:creator>Ntombizakhona Mabaso</dc:creator>
      <pubDate>Wed, 22 Apr 2026 17:31:28 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/technologies-and-concepts-cheat-sheet-for-solutions-architect-associate-saa-c03-h52</link>
      <guid>https://future.forem.com/aws-builders/technologies-and-concepts-cheat-sheet-for-solutions-architect-associate-saa-c03-h52</guid>
      <description>&lt;p&gt;☁️ &lt;strong&gt;Exam Guide:&lt;/strong&gt; Solutions Architect Associate&lt;br&gt;
&lt;strong&gt;Technologies And Concepts Cheat Sheet&lt;/strong&gt;&lt;br&gt;
📘 &lt;em&gt;Cheat Sheet&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The SAA-C03 exam guide lists technologies and concepts across all four domains. This cheat sheet consolidates that information into a &lt;strong&gt;compact, exam-aligned reference&lt;/strong&gt;, organized domain by domain and designed for quick review and efficient study.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  📖 Exam Overview
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;th&gt;Info&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Exam Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SAA-C03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Questions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;65 total (50 scored, 15 unscored)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Passing Score&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;720 / 1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Question Types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple choice &amp;amp; Multiple response&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Experience Required&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1+ year hands-on designing cloud solutions on AWS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Domain Weightings
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Design Secure Architectures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;30%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Design Resilient Architectures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;26%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Design High-Performing Architectures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;24%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Design Cost-Optimized Architectures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🔒 Domain 1
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design Secure Architectures
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1.1&lt;/strong&gt; Secure Access to AWS Resources
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;IAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Users, Groups, Roles, Policies: Design flexible authorization models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;IAM Identity Center&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Centralized SSO across multiple AWS accounts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MFA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apply to IAM users and root users as a security best practice&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Cross-Account Access&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Use IAM Roles + STS for role switching and cross-account patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Organizations &amp;amp; SCPs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manage multi-account security strategy with Service Control Policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Control Tower&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automate landing zones and guardrails across accounts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Resource Policies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Determine when to use resource-based vs identity-based policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Federated Access&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Directory service + IAM roles for external identity federation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Least Privilege&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Core security principle: grant only minimum required permissions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Shared Responsibility Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS is responsible for security &lt;em&gt;of&lt;/em&gt; the cloud; you are responsible for security &lt;em&gt;in&lt;/em&gt; the cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
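&lt;p&gt;As a concrete example of least privilege, the policy below (built as a Python dict for illustration; the bucket and prefix names are placeholders) grants read-only access to a single S3 prefix instead of &lt;code&gt;s3:*&lt;/code&gt;:&lt;/p&gt;

```python
# Least-privilege IAM policy sketch: read-only access to one S3 prefix.
# Bucket and prefix names are placeholders.
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-reports-bucket/team-a/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-reports-bucket",
            # Limit listing to the same prefix the role can read.
            "Condition": {"StringLike": {"s3:prefix": ["team-a/*"]}},
        },
    ],
}
policy_json = json.dumps(policy, indent=2)
```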

&lt;h4&gt;
  
  
  &lt;strong&gt;1.2&lt;/strong&gt; Secure Workloads and Applications
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;VPC Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security groups, route tables, NACLs, NAT gateways&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Subnets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Public vs private subnet segmentation strategies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Shield&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DDoS protection (Standard free, Advanced paid)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS WAF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web Application Firewall for Layer 7 (SQL injection, XSS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Secrets Manager&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rotate, manage, retrieve secrets (DB credentials, API keys)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Cognito&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;User authentication for web/mobile apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS GuardDuty&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Threat detection using ML on logs/events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Macie&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Discover and protect sensitive data (PII) in S3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;VPN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Site-to-Site VPN and Client VPN for encrypted connectivity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Direct Connect&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dedicated private network connection to AWS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1.3&lt;/strong&gt; Data Security Controls
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;KMS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed key creation, rotation, and control for encryption at rest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;ACM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Certificate Manager: TLS/SSL for encryption in transit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;CloudHSM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hardware Security Module for customer-managed key control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Data Classification&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Categorize data by sensitivity to apply appropriate controls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Versioning &amp;amp; MFA Delete&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Protect object data from accidental deletion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Backup &amp;amp; Replication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Implement data backup, point-in-time recovery, cross-region replication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Data Lifecycle Policies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manage retention and expiry of data at rest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Compliance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Align AWS services to regulatory requirements (GDPR, HIPAA, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🏗️ Domain 2
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design Resilient Architectures
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2.1&lt;/strong&gt; Scalable and Loosely Coupled Architectures
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon SQS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Decouple components with message queuing (Standard and FIFO)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon SNS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pub/sub messaging for fan-out patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EventBridge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Event-driven routing across AWS services and SaaS apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Step Functions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Workflow orchestration for distributed applications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;API Gateway&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Create, publish, and manage REST/HTTP/WebSocket APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon AppFlow&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed data integration between SaaS apps and AWS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS AppSync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed GraphQL API service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Serverless Patterns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lambda + API Gateway + SQS/SNS for event-driven design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Microservices&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stateless vs stateful workloads &amp;amp; Independent scaling of components&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Caching Strategies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reduce load &amp;amp; know when to use caching vs direct reads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Horizontal vs Vertical Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scale out (add instances) vs scale up (bigger instance)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Load Balancers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ALB (Layer 7), NLB (Layer 4), GWLB (Layer 3/4, for security appliances)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon MQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed message broker (ActiveMQ/RabbitMQ) for migrations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-tier Architectures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web / App / DB tiers with distinct roles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;CDN / Edge Accelerators&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CloudFront for caching, Global Accelerator for routing performance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2.2&lt;/strong&gt; Highly Available and Fault-Tolerant Architectures
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Availability Zones&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deploy across ≥2 AZs for high availability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Regions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Choose regions based on latency, compliance, and redundancy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Disaster Recovery Strategies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Backup &amp;amp; Restore → Pilot Light → Warm Standby → Active-Active&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RPO / RTO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Recovery Point Objective (data loss tolerance) vs Recovery Time Objective (downtime tolerance)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Route 53&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DNS with health checks, failover routing, latency-based routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RDS Proxy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pooled DB connections for Lambda and high-concurrency apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Distributed Design Patterns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retry with backoff, circuit breaker, bulkhead patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Service Quotas &amp;amp; Throttling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Plan for limits in standby environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS X-Ray&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Distributed tracing for workload visibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Immutable Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Replace rather than patch: ensures consistency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Auto Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;EC2 Auto Scaling + AWS Auto Scaling for elastic capacity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Storage Durability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;S3 (11 nines), EBS (99.8% to 99.999% depending on volume type); choose the appropriate tier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  ⚡ Domain 3
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design High-Performing Architectures
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3.1&lt;/strong&gt; Storage Solutions
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Service / Concept&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon S3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Object storage: scalable, durable, lifecycle policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EBS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Block storage for EC2: SSD (gp3, io2) or HDD (st1, sc1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EFS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed NFS: shared file storage for Linux workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon FSx&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed file systems: Windows (SMB), Lustre (HPC), NetApp, OpenZFS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Storage Gateway&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hybrid storage: file, volume, tape gateway types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Storage Types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Object vs File vs Block: know performance and use-case differences&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Storage Classes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standard, Intelligent-Tiering, IA, Glacier, Glacier Deep Archive&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3.2&lt;/strong&gt; Compute Solutions
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Service / Concept&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EC2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Virtual machines: choose instance type/family for workload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EC2 Auto Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatically add/remove instances based on demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Lambda&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless functions: event-driven, scale to zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Fargate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless containers: no EC2 management needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon ECS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Container orchestration on EC2 or Fargate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EKS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed Kubernetes: supports Anywhere and Distro variants&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Batch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed batch processing: compute-intensive jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EMR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Big data on managed Hadoop/Spark clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Elastic Beanstalk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PaaS: deploy web apps without managing infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Outposts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS infrastructure on-premises (hybrid)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Wavelength&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deploy workloads at the edge of 5G networks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  3.3 Database Solutions
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Service / Concept&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon RDS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed relational DB: MySQL, PostgreSQL, SQL Server, Oracle, MariaDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Aurora&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High-performance relational DB (MySQL/PostgreSQL compatible)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Aurora Serverless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;On-demand autoscaling for Aurora (v2 generally available)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon DynamoDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless NoSQL: millisecond latency at any scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon ElastiCache&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;In-memory caching: Redis (complex data) vs Memcached (simple)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Redshift&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data warehouse: columnar storage for analytics queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon DocumentDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed MongoDB-compatible document database&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Neptune&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Graph database for connected data (social graphs, fraud detection)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Keyspaces&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed Apache Cassandra-compatible service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Read Replicas&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Offload read traffic &amp;amp; know when to use vs Multi-AZ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Caching Patterns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cache-aside, write-through, TTL strategies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;DB Capacity Planning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Capacity Units (DynamoDB), Provisioned IOPS, instance sizing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
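&lt;p&gt;The DB Capacity Planning row is easier to remember with numbers. Below is a small sketch (function names are my own) of the documented DynamoDB math: 1 RCU covers one strongly consistent read per second of up to 4 KB (eventually consistent reads need half as many), and 1 WCU covers one write per second of up to 1 KB.&lt;/p&gt;

```python
import math

def read_capacity_units(item_size_kb: float, reads_per_sec: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: 1 RCU = one strongly consistent read/sec of up to 4 KB.
    Eventually consistent reads need half as many RCUs."""
    units_per_read = math.ceil(item_size_kb / 4)
    rcu = units_per_read * reads_per_sec
    return rcu if strongly_consistent else math.ceil(rcu / 2)

def write_capacity_units(item_size_kb: float, writes_per_sec: int) -> int:
    """WCUs needed: 1 WCU = one write/sec of up to 1 KB."""
    return math.ceil(item_size_kb) * writes_per_sec

# 8 KB items read 10 times/sec (strongly consistent): ceil(8/4) * 10
print(read_capacity_units(8, 10))         # 20
print(read_capacity_units(8, 10, False))  # 10 (eventually consistent)
# 3 KB items written 5 times/sec: ceil(3) * 5
print(write_capacity_units(3, 5))         # 15
```

&lt;p&gt;Exam questions tend to test exactly this rounding: item sizes are rounded up to the next 4 KB (reads) or 1 KB (writes) before multiplying by the request rate.&lt;/p&gt;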

&lt;h4&gt;
  
  
  &lt;strong&gt;3.4&lt;/strong&gt; Network Architectures
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Service / Concept&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon VPC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Isolated virtual network: subnets, route tables, IGW, NAT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon CloudFront&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CDN: cache content at edge locations globally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Global Accelerator&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Route users to optimal endpoints using AWS global network&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Elastic Load Balancing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ALB (HTTP/S), NLB (TCP/UDP), GLB (appliances)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Direct Connect&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dedicated private line to AWS (predictable performance)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Transit Gateway&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hub-and-spoke for connecting many VPCs and on-prem networks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;VPC Peering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Direct VPC-to-VPC connectivity (no transitive routing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS PrivateLink&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Private access to AWS services and third-party services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Route 53&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DNS. Routing policies: simple, weighted, latency, failover, geolocation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Network Topology&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Global, hybrid, multi-tier &amp;amp; design for scale&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
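&lt;p&gt;For the VPC row, subnet planning is pure CIDR arithmetic, which Python's standard &lt;code&gt;ipaddress&lt;/code&gt; module handles. The 10.0.0.0/16 layout below is an illustrative choice, not an AWS default:&lt;/p&gt;

```python
import ipaddress

# Carve a VPC CIDR into per-AZ subnets, the way you would plan a VPC layout.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))  # 256 possible /24 subnets

public = subnets[:2]    # e.g. one public subnet per AZ
private = subnets[2:4]  # and one private subnet per AZ

print([str(s) for s in public])   # ['10.0.0.0/24', '10.0.1.0/24']
print([str(s) for s in private])  # ['10.0.2.0/24', '10.0.3.0/24']

# AWS reserves 5 addresses in every subnet, so a /24 leaves 256 - 5 usable.
print(public[0].num_addresses - 5)  # 251
```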

&lt;h4&gt;
  
  
  &lt;strong&gt;3.5&lt;/strong&gt; Data Ingestion and Transformation
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Service / Concept&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Kinesis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time streaming data: Data Streams, Data Firehose, Video Streams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Data Firehose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Load streaming data to S3, Redshift, OpenSearch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Glue&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless ETL: transform and catalog data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Athena&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless SQL queries on S3 data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Lake Formation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Build, secure, and manage data lakes on S3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon EMR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Process large datasets with Hadoop, Spark, Hive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon MSK&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed Apache Kafka for streaming pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS DataSync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automate data transfer between on-prem and AWS storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Transfer Family&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed SFTP/FTPS/FTP to S3 or EFS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Quick Suite&lt;/strong&gt;&lt;/td&gt;

&lt;td&gt;BI and data visualization service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon OpenSearch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Search and analytics &amp;amp; also supports vector similarity (RAG)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Redshift&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Query structured data at petabyte scale&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  💰 Domain 4
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design Cost-Optimized Architectures
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;4.1&lt;/strong&gt; Cost-Optimized Storage
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Storage Classes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Match class to access frequency &amp;amp; Glacier for archival&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Lifecycle Policies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automate transitions between storage classes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Intelligent-Tiering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Auto-move objects between tiers based on access patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EBS Volume Types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;gp3 vs io2 vs st1 vs sc1 &amp;amp; match to IOPS and cost needs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Requester Pays&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Transfer cost charged to requester, not bucket owner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Data Lifecycle Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retain only what's needed &amp;amp; expire or archive the rest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Hybrid Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DataSync, Transfer Family, Storage Gateway for on-prem cost reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Backup Strategy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Balance recovery needs with cost (snapshots, replication)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
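&lt;p&gt;Lifecycle policies from rows 1-3 come down to a rule document attached to the bucket. The sketch below builds the dict shape accepted by boto3's &lt;code&gt;put_bucket_lifecycle_configuration&lt;/code&gt;; the rule ID, prefix, and day counts are illustrative choices, not defaults.&lt;/p&gt;

```python
# Shape of an S3 lifecycle configuration: transition objects under "logs/"
# to Standard-IA after 30 days, Glacier after 90, and expire after 365.
# This is the dict you would pass as LifecycleConfiguration to boto3's
# s3.put_bucket_lifecycle_configuration (rule ID and prefix are placeholders).
lifecycle = {
    "Rules": [
        {
            "ID": "archive-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

# Sanity check: transitions must move to colder classes as days increase.
days = [t["Days"] for t in lifecycle["Rules"][0]["Transitions"]]
assert days == sorted(days)
print(days)  # [30, 90]
```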

&lt;h4&gt;
  
  
  4.2 Cost-Optimized Compute
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;On-Demand Instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pay per use: highest flexibility, highest per-hour cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Reserved Instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 or 3 year commitment: up to 72% savings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Savings Plans&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flexible commitment (Compute, EC2, SageMaker)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Spot Instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Up to 90% savings for fault-tolerant/interruptible workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Compute Optimizer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ML-based recommendations for right-sizing EC2, Lambda, EBS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Serverless Application Repository&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pre-built serverless apps: reduce build cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EC2 Hibernation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Save instance state to EBS: resume without full reboot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Containerization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ECS/EKS/Fargate for higher density and cost efficiency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Instance Families&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;General purpose, compute optimized, memory optimized, storage optimized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;VMware Cloud on AWS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extend VMware workloads to AWS without refactoring&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
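&lt;p&gt;A quick way to internalize the purchase options is to price one steady 730-hour month. The hourly rate below is a hypothetical round figure, not a real price; only the "up to 72%" and "up to 90%" maximums come from AWS's published figures, and actual discounts vary by instance family, term, and Region.&lt;/p&gt;

```python
# Compare EC2 purchase options for a steady 730-hour month.
HOURS_PER_MONTH = 730

def monthly_cost(on_demand_hourly: float, discount: float = 0.0,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Cost for the month after applying a fractional discount."""
    return round(on_demand_hourly * (1 - discount) * hours, 2)

rate = 0.10  # hypothetical on-demand $/hour
print(monthly_cost(rate))        # 73.0  (on-demand)
print(monthly_cost(rate, 0.72))  # 20.44 (RI at the advertised maximum)
print(monthly_cost(rate, 0.90))  # 7.3   (Spot at the advertised maximum)
```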

&lt;h4&gt;
  
  
  &lt;strong&gt;4.3&lt;/strong&gt; Cost-Optimized Databases
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;DynamoDB On-Demand vs Provisioned&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;On-demand for unpredictable; provisioned for predictable + cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Aurora Serverless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pay per ACU-hour: ideal for intermittent workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RDS Reserved Instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Commit to 1 or 3 years for significant savings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Read Replicas&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Offload reads to reduce primary DB load (and cost)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;DB Snapshot Policies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Balance frequency vs storage cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Caching&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ElastiCache reduces DB query load and cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Data Retention Policies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Define how long to keep data: archive vs delete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Right-Sized DB Instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Don't over-provision: use metrics to guide sizing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  4.4 Cost-Optimized Network Architectures
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;What to Know&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;NAT Gateway vs NAT Instance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NAT Gateway scales automatically but costs more &amp;amp; NAT instance is cheaper at low traffic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;VPC Endpoints&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Eliminate NAT costs for S3/DynamoDB &amp;amp; use Gateway Endpoints (free)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Direct Connect vs VPN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Direct Connect more expensive but predictable; VPN cheaper for low volume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Region-to-Region Transfer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data egress fees apply &amp;amp; minimize cross-region traffic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Same-AZ Traffic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free &amp;amp; architect to keep traffic within same AZ where possible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;CloudFront&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reduce origin data transfer costs with edge caching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Transit Gateway Pricing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Attachment + data processing fees &amp;amp; evaluate vs VPC peering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Throttling Strategy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Use API Gateway throttling to control overuse and cost spikes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
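&lt;p&gt;The NAT Gateway vs NAT instance trade-off in row 1 comes down to a flat hourly fee plus a per-GB processing fee versus a plain instance bill. A toy comparison with placeholder prices (not real pricing) shows the shape of the decision:&lt;/p&gt;

```python
# Break-even between a NAT Gateway and a self-managed NAT instance.
# All prices are HYPOTHETICAL placeholders; the point is the cost shape:
# the gateway adds a per-GB processing fee, the instance a flat hourly rate.
HOURS = 730

def nat_gateway_cost(gb: float, hourly: float = 0.045,
                     per_gb: float = 0.045) -> float:
    return round(hourly * HOURS + per_gb * gb, 2)

def nat_instance_cost(gb: float, hourly: float = 0.01) -> float:
    # Flat cost regardless of traffic; ignores the operational burden
    # of patching, scaling, and failing over the instance yourself.
    return round(hourly * HOURS, 2)

for gb in (10, 1000):
    gw, inst = nat_gateway_cost(gb), nat_instance_cost(gb)
    print(f"{gb:>5} GB/month: gateway ${gw}, instance ${inst}")
```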




&lt;h2&gt;
  
  
  🛠️ AWS Cost Management Tools
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Cost Explorer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Visualize and analyze historical spend and forecast costs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Budgets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Set spend/usage thresholds with alerts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Cost and Usage Report&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Granular billing data exportable to S3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Savings Plans&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flexible commitment model for compute savings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost Allocation Tags&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tag resources to attribute costs to teams/projects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Compute Optimizer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Right-sizing recommendations based on usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Trusted Advisor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Best-practice checks across cost, security, performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Well-Architected Tool&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Review architecture against the Well-Architected Framework&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  💡 Disaster Recovery Strategy Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;RPO&lt;/th&gt;
&lt;th&gt;RTO&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backup &amp;amp; Restore&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hours&lt;/td&gt;
&lt;td&gt;Hours&lt;/td&gt;
&lt;td&gt;💰 Lowest&lt;/td&gt;
&lt;td&gt;Back up to S3/Glacier &amp;amp; restore on failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pilot Light&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;10s of minutes&lt;/td&gt;
&lt;td&gt;💰💰&lt;/td&gt;
&lt;td&gt;Core services always running &amp;amp; scale up on failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Warm Standby&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Seconds/Minutes&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;💰💰💰&lt;/td&gt;
&lt;td&gt;Scaled-down live environment &amp;amp; quickly scale to full&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Active-Active&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Near zero&lt;/td&gt;
&lt;td&gt;Near zero&lt;/td&gt;
&lt;td&gt;💰💰💰💰 Highest&lt;/td&gt;
&lt;td&gt;Full duplicate environment &amp;amp; traffic split between sites&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
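&lt;p&gt;One way to revise this table is to turn it into a decision rule. The second thresholds below are my own illustrative readings of "hours", "minutes", and "near zero"; real targets come from business requirements, and the tighter of RPO and RTO drives the choice.&lt;/p&gt;

```python
# Map required RPO/RTO (in seconds) to the cheapest matching DR strategy
# from the table above. Thresholds are ILLUSTRATIVE, not AWS-defined.
def choose_dr_strategy(rpo_s: float, rto_s: float) -> str:
    target = min(rpo_s, rto_s)  # the tighter objective is the binding one
    if target >= 3600:
        return "Backup and Restore"
    if target >= 600:
        return "Pilot Light"
    if target >= 60:
        return "Warm Standby"
    return "Active-Active"

print(choose_dr_strategy(4 * 3600, 8 * 3600))  # Backup and Restore
print(choose_dr_strategy(900, 1800))           # Pilot Light
print(choose_dr_strategy(90, 300))             # Warm Standby
print(choose_dr_strategy(1, 1))                # Active-Active
```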




&lt;h2&gt;
  
  
  🔑 Key Abbreviations
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Abbreviation&lt;/th&gt;
&lt;th&gt;Full Term&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Identity and Access Management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Service Control Policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MFA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-Factor Authentication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;STS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security Token Service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ACM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Certificate Manager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KMS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Key Management Service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VPC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Virtual Private Cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NACL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Network Access Control List&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ALB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Application Load Balancer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NLB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Network Load Balancer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GLB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gateway Load Balancer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CDN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Content Delivery Network&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RPO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Recovery Point Objective&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RTO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Recovery Time Objective&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Disaster Recovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EBS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Elastic Block Store&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EFS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Elastic File System&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FSx&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon FSx (managed file systems)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple Queue Service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SNS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple Notification Service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ETL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extract, Transform, Load&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HDD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hard Disk Drive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Solid State Drive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IOPS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Input/Output Operations Per Second&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reserved Instance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ACU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Aurora Capacity Unit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PII&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Personally Identifiable Information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single Sign-On&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🚀 In Scope AWS Services Quick Reference
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Compute
&lt;/h3&gt;

&lt;p&gt;Amazon EC2 · EC2 Auto Scaling · AWS Lambda · AWS Fargate · AWS Elastic Beanstalk · AWS Batch · AWS Outposts · VMware Cloud on AWS · AWS Wavelength · AWS Serverless Application Repository&lt;/p&gt;

&lt;h3&gt;
  
  
  Containers
&lt;/h3&gt;

&lt;p&gt;Amazon ECR · Amazon ECS · ECS Anywhere · Amazon EKS · EKS Anywhere · Amazon EKS Distro&lt;/p&gt;

&lt;h3&gt;
  
  
  Storage
&lt;/h3&gt;

&lt;p&gt;Amazon S3 · Amazon EBS · Amazon EFS · Amazon FSx · AWS Storage Gateway · AWS Snow Family&lt;/p&gt;

&lt;h3&gt;
  
  
  Database
&lt;/h3&gt;

&lt;p&gt;Amazon RDS · Amazon Aurora · Aurora Serverless · Amazon DynamoDB · Amazon ElastiCache · Amazon Redshift · Amazon DocumentDB · Amazon Neptune · Amazon Keyspaces&lt;/p&gt;

&lt;h3&gt;
  
  
  Networking &amp;amp; Content Delivery
&lt;/h3&gt;

&lt;p&gt;Amazon VPC · Amazon CloudFront · AWS Direct Connect · Elastic Load Balancing · AWS Global Accelerator · AWS PrivateLink · Amazon Route 53 · AWS Site-to-Site VPN · AWS Client VPN · AWS Transit Gateway&lt;/p&gt;

&lt;h3&gt;
  
  
  Analytics
&lt;/h3&gt;

&lt;p&gt;Amazon Athena · Amazon EMR · AWS Glue · Amazon Kinesis · Amazon Data Firehose · Amazon Kinesis Video Streams · Amazon MSK · Amazon OpenSearch Service · Amazon Quick Suite · Amazon Redshift · AWS Lake Formation · AWS Data Exchange&lt;/p&gt;

&lt;h3&gt;
  
  
  Application Integration
&lt;/h3&gt;

&lt;p&gt;Amazon SQS · Amazon SNS · Amazon EventBridge · Amazon MQ · AWS Step Functions · Amazon AppFlow · AWS AppSync&lt;/p&gt;

&lt;h3&gt;
  
  
  Security, Identity &amp;amp; Compliance
&lt;/h3&gt;

&lt;p&gt;AWS IAM · AWS IAM Identity Center · Amazon Cognito · AWS KMS · AWS CloudHSM · AWS ACM · Amazon GuardDuty · Amazon Macie · Amazon Detective · AWS Shield · AWS WAF · AWS Secrets Manager · AWS Directory Service · AWS Artifact · AWS Audit Manager&lt;/p&gt;

&lt;h3&gt;
  
  
  Management &amp;amp; Governance
&lt;/h3&gt;

&lt;p&gt;AWS Organizations · AWS Control Tower · AWS CloudFormation · AWS CloudTrail · Amazon CloudWatch · AWS Config · AWS Systems Manager · AWS Auto Scaling · AWS Compute Optimizer · AWS Trusted Advisor · AWS Well-Architected Tool · AWS Service Catalog · AWS Health Dashboard · AWS License Manager · Amazon Managed Grafana · Amazon Managed Service for Prometheus&lt;/p&gt;

&lt;h3&gt;
  
  
  Migration &amp;amp; Transfer
&lt;/h3&gt;

&lt;p&gt;AWS DMS · AWS DataSync · AWS Snow Family · AWS Transfer Family · AWS Application Migration Service&lt;/p&gt;

&lt;h3&gt;
  
  
  Machine Learning
&lt;/h3&gt;

&lt;p&gt;Amazon SageMaker AI · Amazon Comprehend · Amazon Kendra · Amazon Lex · Amazon Polly · Amazon Rekognition · Amazon Textract · Amazon Transcribe · Amazon Translate&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Management
&lt;/h3&gt;

&lt;p&gt;AWS Budgets · AWS Cost Explorer · AWS Cost and Usage Report · Savings Plans&lt;/p&gt;

&lt;h3&gt;
  
  
  Developer Tools
&lt;/h3&gt;

&lt;p&gt;AWS X-Ray&lt;/p&gt;

&lt;h3&gt;
  
  
  Serverless
&lt;/h3&gt;

&lt;p&gt;AWS Lambda · AWS Fargate · Amazon API Gateway · Amazon DynamoDB · Amazon EventBridge · Amazon SQS · Amazon SNS&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important:&lt;/strong&gt; Always refer to the official exam guide for the most up-to-date list of in-scope and out-of-scope services.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  📚 Additional Resources
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/pdfs/aws-certification/latest/solutions-architect-associate-03/solutions-architect-associate-03.pdf" rel="noopener noreferrer"&gt;AWS Certified Solutions Architect – Associate (SAA-C03) Exam Guide (PDF)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/pdfs/aws-certification/latest/examguides/aws-certification-exam-guides.pdf" rel="noopener noreferrer"&gt;AWS Certification: All Exam Guides&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/ntombizakhona/series/35366"&gt;Exam Guide: Solutions Architect Associate Series&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;Good luck with your exam! 🚀&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>certification</category>
      <category>solutionsarchitect</category>
    </item>
    <item>
      <title>Keeping Pirate Weather Afloat: Inside the AWS Pipeline and the Christmas Eve Outage</title>
      <dc:creator>Alexander Rey</dc:creator>
      <pubDate>Wed, 22 Apr 2026 15:30:47 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/keeping-pirate-weather-afloat-inside-the-aws-pipeline-and-the-christmas-eve-outage-1g51</link>
      <guid>https://future.forem.com/aws-builders/keeping-pirate-weather-afloat-inside-the-aws-pipeline-and-the-christmas-eve-outage-1g51</guid>
      <description>&lt;p&gt;Since it's been a while since I last covered Pirate Weather's AWS infrastructure, I thought it was time to write a short update on how everything fits together, and also explain where things have gone wrong. At a high level, Pirate Weather is a Python script that reads Zarr files. These files are created from a series of scripts that run on a schedule, download the data, perform some light processing, and save .zip files for the response script. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion &amp;amp; Processing:&lt;/strong&gt; A suite of Python scripts runs on a precise schedule, triggered by &lt;strong&gt;Amazon EventBridge&lt;/strong&gt;. These scripts are orchestrated by &lt;strong&gt;AWS Step Functions&lt;/strong&gt;, which manage &lt;strong&gt;AWS Fargate&lt;/strong&gt; containers (using our &lt;a href="https://gallery.ecr.aws/j9v4j3c7/pirate-wgrib-python-arm" rel="noopener noreferrer"&gt;custom ARM-based image&lt;/a&gt;). These containers download raw data, perform light processing, and "chunk" the data into Zarr format for lightning-fast retrieval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage Strategy:&lt;/strong&gt; The processed Zarr data is initially persisted as zip files on &lt;strong&gt;Amazon S3&lt;/strong&gt;. To minimize latency, an &lt;strong&gt;rclone&lt;/strong&gt; container syncs these files to &lt;strong&gt;autoscaled EC2 NVMe instances&lt;/strong&gt;. 

&lt;ul&gt;
&lt;li&gt;By serving data from local NVMe storage rather than directly from S3, we achieve the IOPS necessary for real-time weather requests.&lt;/li&gt;
&lt;li&gt;Using zip files avoids having a huge number of S3 objects and the associated transaction costs.&lt;/li&gt;
&lt;li&gt;Notably, the time for each model forecast is included in every chunk, which avoids having to rely on metadata. &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;ECS service:&lt;/strong&gt; An ECS service coordinates five containers running on the EC2 instances: rclone for syncing, the &lt;a href="https://gallery.ecr.aws/j9v4j3c7/pirate-alpine-zarr" rel="noopener noreferrer"&gt;production FastAPI container&lt;/a&gt;, the development container, the historic data (Time Machine) container, and Kong.

&lt;ul&gt;
&lt;li&gt;This ensures that containers are restarted if there are issues, handles placement on the instances, and manages container updates.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Traffic Management &amp;amp; Security:&lt;/strong&gt; Inbound requests are routed through &lt;strong&gt;Amazon CloudFront&lt;/strong&gt; to a &lt;strong&gt;Network Load Balancer (NLB)&lt;/strong&gt;, which passes them to the EC2 instances. From there, traffic hits a &lt;strong&gt;Kong Gateway&lt;/strong&gt; container, which manages authentication and rate limiting.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Data Persistence:&lt;/strong&gt; The gateway and API layers are supported by &lt;strong&gt;Amazon ElastiCache (Redis)&lt;/strong&gt; for rapid session/rate-limit caching and an &lt;strong&gt;Amazon RDS&lt;/strong&gt; database for persistent metadata and user information.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FPirate-Weather%2Fpirateweather%2Fblob%2Fmain%2Fdocs%2Fimages%2FArch_Diagram_2026.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FPirate-Weather%2Fpirateweather%2Fblob%2Fmain%2Fdocs%2Fimages%2FArch_Diagram_2026.png" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are quite a few nuances to the various pieces, but this is the "meat and potatoes" of it. &lt;/p&gt;

&lt;h4&gt;
  
  
  December 24, 2025 downtime incident
&lt;/h4&gt;

&lt;p&gt;The four-hour production downtime had two root causes. The first was traced to a configuration conflict between our AWS Step Function definitions and the underlying ECS cluster strategy. While our ECS cluster is architected to run a resilient 50:50 mix of Fargate Spot and Fargate On-Demand instances, the Step Function definition responsible for triggering the ingestion tasks contained an explicit override. As seen in the configuration snippet below, the task was hardcoded to rely exclusively on &lt;code&gt;FARGATE_SPOT&lt;/code&gt;. During a period of high Spot instance reclamation in our availability zone, these ingestion containers were repeatedly terminated by AWS before completion, halting the data pipeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"CapacityProviderStrategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"CapacityProvider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FARGATE_SPOT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Weight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was an issue on its own, but it should have been recoverable. The ingestion failure was amplified by a logic error in the processing scripts, which lacked a fallback mechanism for missing GFS data when the two-day buffer was exceeded, causing forecast generation to fail entirely rather than serving stale or partial data. To resolve this, I have updated all Step Function task definitions to remove the explicit &lt;code&gt;CapacityProviderStrategy&lt;/code&gt; override. The tasks now defer to the ECS cluster’s default capacity provider strategy, ensuring a stable 50:50 distribution between Spot and On-Demand instances. This means that even if Spot capacity is volatile, the On-Demand instances will allow the ingestion process to complete successfully. I've also added additional logging for failed ingest tasks, so that failures in the underlying data are no longer missed, as well as a check to avoid serving stale model results (&lt;a href="https://github.com/Pirate-Weather/pirate-weather-code/pull/542" rel="noopener noreferrer"&gt;PR #542&lt;/a&gt;).&lt;/p&gt;
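&lt;p&gt;As a sketch of the corrected setup, the 50:50 split lives on the cluster itself rather than in each task. The ECS cluster API exposes a &lt;code&gt;defaultCapacityProviderStrategy&lt;/code&gt; field for this; the provider names below match the snippet above, and the weights are illustrative:&lt;/p&gt;

```json
{
  "defaultCapacityProviderStrategy": [
    { "capacityProvider": "FARGATE_SPOT", "weight": 1 },
    { "capacityProvider": "FARGATE", "weight": 1 }
  ]
}
```

&lt;p&gt;With this in place, Step Function task definitions simply omit any &lt;code&gt;CapacityProviderStrategy&lt;/code&gt; block and inherit the cluster default.&lt;/p&gt;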

</description>
      <category>weather</category>
      <category>aws</category>
    </item>
    <item>
      <title>Stop Paying Too Much for CloudWatch Logs — Auto-Archive to S3 via Firehose</title>
      <dc:creator>Yuichi Sato</dc:creator>
      <pubDate>Tue, 21 Apr 2026 11:58:09 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/stop-paying-too-much-for-cloudwatch-logs-auto-archive-to-s3-via-firehose-3f2f</link>
      <guid>https://future.forem.com/aws-builders/stop-paying-too-much-for-cloudwatch-logs-auto-archive-to-s3-via-firehose-3f2f</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This article was originally written in Japanese and published on Qiita. It has been translated with the help of AI.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Original article: &lt;a href="https://qiita.com/sassssan68/items/da2aa98bba12748daca7" rel="noopener noreferrer"&gt;https://qiita.com/sassssan68/items/da2aa98bba12748daca7&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Have you ever calculated how much it actually costs to keep CloudWatch Logs long-term?&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TL;DR&lt;br&gt;
Keeping logs in CloudWatch Logs long-term is expensive.&lt;br&gt;
Subscription → Firehose → S3 (Deep Archive) is more stable and cost-effective.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I recently had an audit requirement to retain logs for 18 months. When I estimated the CloudWatch Logs cost…&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;18 months = $1,069.20&lt;br&gt;
— That's way too much!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I followed the AWS-recommended architecture — &lt;strong&gt;Subscription → Firehose → S3&lt;/strong&gt; — and combined it with a lifecycle policy to transition to Deep Archive. The result:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;~85% cost reduction&lt;br&gt;
Plus fully automated, stable operations&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This article covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why CreateExportTask is not recommended (per AWS)&lt;/li&gt;
&lt;li&gt;How much cost you can actually save (with formulas)&lt;/li&gt;
&lt;li&gt;How to set up Subscription → Firehose → S3 (Deep Archive)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Why You Shouldn't Keep Logs in CloudWatch Logs Long-Term
&lt;/h1&gt;

&lt;p&gt;Audit and regulatory requirements often mandate log retention for years. However, storing large volumes of logs in CloudWatch Logs gets expensive fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High storage cost&lt;/strong&gt; — CloudWatch Logs storage pricing is heavy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scales linearly&lt;/strong&gt; — The more data you store, the worse it gets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not designed for long-term archival&lt;/strong&gt; — It's a monitoring tool, not a storage solution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This raises the question: &lt;strong&gt;What's the right way to handle long-term log retention?&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  AWS Says Export Task Is "Not Recommended" — Here's Why
&lt;/h1&gt;

&lt;p&gt;From the CloudWatch console, you can manually export logs to S3. To automate this, you'd use the &lt;code&gt;CreateExportTask&lt;/code&gt; API:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_CreateExportTask.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_CreateExportTask.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You could call this periodically from Lambda or EventBridge Scheduler. However, the AWS documentation explicitly discourages this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;&lt;br&gt;
We recommend that you don't regularly export to Amazon S3 as a way to continuously archive your logs. For that use case, we instead recommend that you use subscriptions. For more information about subscriptions, see Real-time processing of log data with subscriptions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On top of that, there's a &lt;strong&gt;concurrency limit of 1 export task at a time&lt;/strong&gt;. If you're exporting from multiple log groups or across multiple time ranges, tasks will queue up, causing failures and delays.&lt;/p&gt;

&lt;p&gt;Given these limitations, CreateExportTask is unreliable for audit-grade long-term retention. &lt;strong&gt;As the AWS docs say, subscriptions are the way to go.&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The AWS-Recommended Architecture: Subscription → Firehose → S3
&lt;/h1&gt;

&lt;p&gt;AWS recommends using CloudWatch Logs subscription filters:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Subscriptions.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Subscriptions.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Firehose, logs are transferred in near real-time — no manual operations required, stable and suitable for long-term archival.&lt;/p&gt;

&lt;p&gt;But when I first saw this architecture, I thought: &lt;em&gt;"This looks expensive. Is it actually cheaper than just leaving logs in CloudWatch?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So I ran the numbers.&lt;/p&gt;

&lt;h1&gt;
  
  
  Cost Comparison: CloudWatch Logs vs. Subscription → Firehose → S3 (Deep Archive)
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt;&lt;br&gt;
Over 18 months, Subscription → Firehose → S3 (Deep Archive) is approximately &lt;strong&gt;85% cheaper&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Note:&lt;/strong&gt;&lt;br&gt;
If your retention period is &lt;strong&gt;2 months or less&lt;/strong&gt;, CloudWatch Logs may actually be cheaper.&lt;br&gt;
The conclusion in this article assumes &lt;strong&gt;long-term retention&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Assumptions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Retention period: 18 months&lt;/li&gt;
&lt;li&gt;Monthly log volume: 100 GB&lt;/li&gt;
&lt;li&gt;Tokyo region pricing (as of April 2026):

&lt;ul&gt;
&lt;li&gt;CloudWatch Logs storage: $0.033/GB/month&lt;/li&gt;
&lt;li&gt;Firehose delivery: $0.036/GB&lt;/li&gt;
&lt;li&gt;S3 Standard: $0.025/GB/month&lt;/li&gt;
&lt;li&gt;Glacier Deep Archive: $0.002/GB/month&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Case 1:&lt;/strong&gt; Keep all logs in CloudWatch Logs for the full 18 months&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Case 2:&lt;/strong&gt; Keep logs in CloudWatch for 2 weeks (for analysis), simultaneously stream via Firehose → S3 Standard → Glacier Deep Archive after 1 day&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Comparison Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Case 1&lt;/th&gt;
&lt;th&gt;Formula&lt;/th&gt;
&lt;th&gt;Case 2&lt;/th&gt;
&lt;th&gt;Formula&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Logs storage&lt;/td&gt;
&lt;td&gt;$1,069.20&lt;/td&gt;
&lt;td&gt;0.033 × (100 × 18) × 18&lt;/td&gt;
&lt;td&gt;$27.72&lt;/td&gt;
&lt;td&gt;0.033 × (100 × 14/30) × 18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Firehose delivery&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;$64.80&lt;/td&gt;
&lt;td&gt;0.036 × 100 × 18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 Standard (1 day)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;$1.50&lt;/td&gt;
&lt;td&gt;0.025 × (100 × 1/30) × 18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Glacier Deep Archive&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;$64.80&lt;/td&gt;
&lt;td&gt;0.002 × (100 × 18) × 18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,069.20&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$158.82&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Savings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$910.38 (~85% reduction)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Note:

&lt;ul&gt;
&lt;li&gt;Average stored volume for 2-week retention: monthly log volume × (14/30)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By keeping logs in CloudWatch for just 2 weeks (for analysis) and archiving older logs in Deep Archive, you can achieve approximately &lt;strong&gt;85% cost savings&lt;/strong&gt; over 18 months.&lt;/p&gt;
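&lt;p&gt;The arithmetic behind the table can be reproduced in a few lines of Python, using the rates and the simplified storage formulas from the assumptions above:&lt;/p&gt;

```python
# Rough recomputation of the 18-month comparison above.
# Rates are Tokyo-region prices from the assumptions, USD per GB-month
# (Firehose is per GB delivered).
CW_STORAGE = 0.033
FIREHOSE = 0.036
S3_STANDARD = 0.025
DEEP_ARCHIVE = 0.002

GB_PER_MONTH = 100
MONTHS = 18

# Case 1: everything stays in CloudWatch Logs (article's simplified formula).
case1 = CW_STORAGE * (GB_PER_MONTH * MONTHS) * MONTHS

# Case 2: 2 weeks in CloudWatch, 1 day in S3 Standard, rest in Deep Archive.
case2 = (
    CW_STORAGE * (GB_PER_MONTH * 14 / 30) * MONTHS    # 2-week retention
    + FIREHOSE * GB_PER_MONTH * MONTHS                # Firehose delivery
    + S3_STANDARD * (GB_PER_MONTH * 1 / 30) * MONTHS  # 1 day in Standard
    + DEEP_ARCHIVE * (GB_PER_MONTH * MONTHS) * MONTHS # long-term archive
)

print(round(case1, 2), round(case2, 2), round(1 - case2 / case1, 2))
# → 1069.2 158.82 0.85
```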

&lt;h1&gt;
  
  
  How to Set It Up
&lt;/h1&gt;

&lt;p&gt;The setup involves five steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create an S3 bucket with a lifecycle rule (transition to Glacier Deep Archive after 1 day)&lt;/li&gt;
&lt;li&gt;Create a Firehose stream (source: Direct PUT, destination: S3)&lt;/li&gt;
&lt;li&gt;Create an IAM role for the subscription filter&lt;/li&gt;
&lt;li&gt;Create a CloudWatch Logs subscription filter&lt;/li&gt;
&lt;li&gt;Verify logs are flowing to S3&lt;/li&gt;
&lt;/ol&gt;
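&lt;p&gt;For step 1, a minimal lifecycle configuration that transitions objects to Deep Archive after one day could look like the following (the rule ID is a placeholder; the shape follows the S3 &lt;code&gt;PutBucketLifecycleConfiguration&lt;/code&gt; JSON format):&lt;/p&gt;

```json
{
  "Rules": [
    {
      "ID": "archive-logs-to-deep-archive",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 1, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
```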

&lt;p&gt;For steps 3 and 4, the AWS documentation provides a complete walkthrough including the IAM policy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#FirehoseExample" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#FirehoseExample&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Export Task is not recommended&lt;/strong&gt; (per AWS documentation)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Firehose is the most operationally practical solution&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;S3 lifecycle rules enable cost-optimized long-term archival&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For short-term log retention, CloudWatch Logs works just fine. But if you need to retain logs for &lt;strong&gt;months to years&lt;/strong&gt;, &lt;strong&gt;Subscription → Firehose → S3 (Deep Archive) is the practical solution&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When long-term retention becomes a requirement, it's worth revisiting your architecture.&lt;/p&gt;

&lt;p&gt;I hope this helps anyone else dealing with the same log retention cost challenges.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloudwatch</category>
      <category>s3</category>
      <category>costoptimization</category>
    </item>
    <item>
      <title>AWS Data &amp; AI Stories #03: Multimodal Knowledge Bases</title>
      <dc:creator>Sedat SALMAN</dc:creator>
      <pubDate>Mon, 20 Apr 2026 19:01:00 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/aws-data-ai-stories-03-multimodal-knowledge-bases-34af</link>
      <guid>https://future.forem.com/aws-builders/aws-data-ai-stories-03-multimodal-knowledge-bases-34af</guid>
      <description>&lt;p&gt;In the first article, I talked about multimodal AI at a high level.&lt;/p&gt;

&lt;p&gt;In the second one, I focused on Amazon Bedrock Data Automation as the processing layer.&lt;/p&gt;

&lt;p&gt;Now the next question is simple:&lt;/p&gt;

&lt;p&gt;After we process the content, how do we make it searchable and useful for AI applications?&lt;/p&gt;

&lt;p&gt;This is where multimodal knowledge bases come in.&lt;/p&gt;

&lt;p&gt;Amazon Bedrock Knowledge Bases now supports multimodal content, including images, audio, and video, in addition to traditional unstructured text sources. It also supports multimodal querying, including image-based search and retrieval across media types.&lt;/p&gt;

&lt;p&gt;For me, this is the layer that turns processed content into usable context.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a multimodal knowledge base?
&lt;/h2&gt;

&lt;p&gt;A knowledge base is a managed retrieval layer for your own content.&lt;/p&gt;

&lt;p&gt;Instead of asking a model to rely only on general training knowledge, a knowledge base helps the system retrieve information from your own files and data sources before generating a response. That is the main idea behind Retrieval Augmented Generation, or RAG. Amazon Bedrock Knowledge Bases is designed for exactly this purpose: it retrieves relevant information from your data sources and uses it to improve response relevance and accuracy.&lt;/p&gt;

&lt;p&gt;A multimodal knowledge base extends that idea beyond text.&lt;/p&gt;

&lt;p&gt;So instead of only working with documents, the system can also work with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;images&lt;/li&gt;
&lt;li&gt;audio&lt;/li&gt;
&lt;li&gt;video&lt;/li&gt;
&lt;li&gt;mixed-content files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because enterprise knowledge is rarely text only.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why does this matter?
&lt;/h2&gt;

&lt;p&gt;Because many real-world systems do not store knowledge in perfect written documents.&lt;/p&gt;

&lt;p&gt;A lot of value exists in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;diagrams&lt;/li&gt;
&lt;li&gt;scanned files&lt;/li&gt;
&lt;li&gt;screenshots&lt;/li&gt;
&lt;li&gt;inspection photos&lt;/li&gt;
&lt;li&gt;recorded calls&lt;/li&gt;
&lt;li&gt;training videos&lt;/li&gt;
&lt;li&gt;equipment images&lt;/li&gt;
&lt;li&gt;operational media&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If our knowledge layer only understands text, a large part of business context stays outside the system.&lt;/p&gt;

&lt;p&gt;With multimodal retrieval in Bedrock Knowledge Bases, AWS now supports ingesting, indexing, and retrieving information from text, images, video, and audio in a more unified workflow. AWS also notes that applications can search using an image query to find visually similar content or relevant scenes in multimedia sources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it fits in the architecture
&lt;/h2&gt;

&lt;p&gt;I see the flow like this:&lt;/p&gt;

&lt;p&gt;Raw content → processing layer → knowledge base → retrieval → answer or action&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Part 1 was the general multimodal AI view&lt;/li&gt;
&lt;li&gt;Part 2 was the processing layer with Bedrock Data Automation&lt;/li&gt;
&lt;li&gt;Part 3 is the retrieval layer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That means the knowledge base is not the first step.&lt;/p&gt;

&lt;p&gt;It comes after the content is already available in a usable form, whether directly from unstructured sources or after preprocessing.&lt;/p&gt;

&lt;p&gt;AWS documentation also makes this separation clearer now by distinguishing multimodal processing approaches depending on the goal: Nova Multimodal Embeddings for visual similarity and cross-modal retrieval, or Bedrock Data Automation for text-oriented processing of multimedia content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two ways to think about multimodal retrieval
&lt;/h2&gt;

&lt;p&gt;This is the most important design point.&lt;/p&gt;

&lt;p&gt;Not every multimodal use case is the same.&lt;/p&gt;

&lt;p&gt;AWS currently describes two main multimodal processing approaches for knowledge bases:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Nova Multimodal Embeddings approach
&lt;/h3&gt;

&lt;p&gt;This is better when the focus is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;visual similarity&lt;/li&gt;
&lt;li&gt;image search&lt;/li&gt;
&lt;li&gt;cross-modal retrieval&lt;/li&gt;
&lt;li&gt;searching media with text or image input&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS documentation says this approach is suited for visual similarity searches and multimodal semantic retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Bedrock Data Automation approach
&lt;/h3&gt;

&lt;p&gt;This is better when the focus is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extracting structured meaning from multimedia&lt;/li&gt;
&lt;li&gt;turning media into searchable text-oriented outputs&lt;/li&gt;
&lt;li&gt;using processed content in downstream RAG&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS documentation describes this option as the text-based processing path for multimedia content.&lt;/p&gt;

&lt;p&gt;For me, the decision is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If I want to find similar content across modalities, I think retrieval-first.&lt;/li&gt;
&lt;li&gt;If I want to extract useful content from media and then search it, I think processing-first.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What can you query?
&lt;/h2&gt;

&lt;p&gt;This is one of the nice parts of the newer multimodal support.&lt;/p&gt;

&lt;p&gt;After ingesting multimodal content, Bedrock Knowledge Bases supports different query patterns depending on the selected approach. AWS documentation for testing and querying multimodal knowledge bases shows support for metadata such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;source modality&lt;/li&gt;
&lt;li&gt;MIME type&lt;/li&gt;
&lt;li&gt;chunk start time for audio/video&lt;/li&gt;
&lt;li&gt;chunk end time for audio/video&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also mentions playback controls with automatic segment positioning for multimedia results in the console.&lt;/p&gt;

&lt;p&gt;That means this is not just “retrieve a paragraph.”&lt;/p&gt;

&lt;p&gt;It can also become:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve a scene from a video&lt;/li&gt;
&lt;li&gt;return the relevant moment in an audio file&lt;/li&gt;
&lt;li&gt;find a matching image&lt;/li&gt;
&lt;li&gt;connect retrieved media segments to an answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a big step forward compared with traditional text-only RAG.&lt;/p&gt;
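&lt;p&gt;As a small sketch of what querying this looks like from code, the request below builds the kwargs for the Bedrock Agent Runtime &lt;code&gt;retrieve&lt;/code&gt; call with a metadata filter. The filter key name &lt;code&gt;"sourceModality"&lt;/code&gt;, the knowledge base ID, and the query text are stand-ins; check the current Bedrock documentation for the exact metadata attribute names:&lt;/p&gt;

```python
# Sketch: metadata-filtered retrieval against a multimodal knowledge base.
# "sourceModality" is a hypothetical metadata key for illustration only.

def build_retrieve_request(kb_id: str, query: str, modality: str) -> dict:
    """Build the kwargs for bedrock-agent-runtime's retrieve() call."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": 5,
                "filter": {
                    "equals": {"key": "sourceModality", "value": modality}
                },
            }
        },
    }

req = build_retrieve_request("KB123EXAMPLE", "pump seal replacement steps", "VIDEO")

# To execute for real (requires AWS credentials and a provisioned knowledge base):
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# response = client.retrieve(**req)

print(req["retrievalConfiguration"]["vectorSearchConfiguration"]["filter"])
```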

&lt;h2&gt;
  
  
  How I would explain it simply
&lt;/h2&gt;

&lt;p&gt;A traditional knowledge base answers:&lt;/p&gt;

&lt;p&gt;“Which text chunk is relevant?”&lt;/p&gt;

&lt;p&gt;A multimodal knowledge base can answer:&lt;/p&gt;

&lt;p&gt;“Which content is relevant, regardless of whether it is text, image, audio, or video?”&lt;/p&gt;

&lt;p&gt;That is the real difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data source point to remember
&lt;/h2&gt;

&lt;p&gt;There is one important limitation to keep in mind.&lt;/p&gt;

&lt;p&gt;AWS documentation states that multimodal support in Bedrock Knowledge Bases is available when creating a knowledge base with unstructured data sources. Structured data sources do not support multimodal content processing.&lt;/p&gt;

&lt;p&gt;That is important for design.&lt;/p&gt;

&lt;p&gt;If your use case depends heavily on images, audio, or video, you should think in terms of unstructured content pipelines, not only structured tables.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical example
&lt;/h2&gt;

&lt;p&gt;Imagine a support or operations platform.&lt;/p&gt;

&lt;p&gt;Your users may store:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF manuals&lt;/li&gt;
&lt;li&gt;field photos&lt;/li&gt;
&lt;li&gt;recorded troubleshooting calls&lt;/li&gt;
&lt;li&gt;short maintenance videos&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A user asks:&lt;br&gt;
“Show me the relevant maintenance guidance for this equipment issue.”&lt;/p&gt;

&lt;p&gt;A traditional text-only system may retrieve only written manuals.&lt;/p&gt;

&lt;p&gt;A multimodal knowledge base can potentially retrieve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a relevant text section&lt;/li&gt;
&lt;li&gt;a matching image&lt;/li&gt;
&lt;li&gt;a useful audio segment&lt;/li&gt;
&lt;li&gt;a video moment with the right scene&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then that context can be passed to the model for answer generation.&lt;/p&gt;

&lt;p&gt;That is why this is more than just a storage feature.&lt;/p&gt;

&lt;p&gt;It is a better retrieval model for real-world knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I like this layer
&lt;/h2&gt;

&lt;p&gt;I like multimodal knowledge bases because they make AI architecture more realistic.&lt;/p&gt;

&lt;p&gt;In many enterprise environments, the problem is not lack of data.&lt;/p&gt;

&lt;p&gt;The problem is that the useful data is trapped inside different formats and scattered across different files.&lt;/p&gt;

&lt;p&gt;A multimodal knowledge base helps solve that by creating a retrieval layer that can work across those formats. AWS positions Knowledge Bases as an out-of-the-box RAG capability that reduces the effort of building pipelines and helps applications answer queries using proprietary content, with source-grounded responses and citations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistake
&lt;/h2&gt;

&lt;p&gt;A common mistake is to assume that all multimodal use cases need the same architecture.&lt;/p&gt;

&lt;p&gt;They do not.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;image similarity search is not the same as document extraction&lt;/li&gt;
&lt;li&gt;video segment retrieval is not the same as audio transcription&lt;/li&gt;
&lt;li&gt;cross-modal search is not the same as text-based RAG over processed media&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS’s own multimodal guidance now separates these choices clearly, and I think that is the right way to approach the design.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would decide early
&lt;/h2&gt;

&lt;p&gt;Before building the knowledge base, I would answer these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do I need visual similarity or text-oriented retrieval?&lt;/li&gt;
&lt;li&gt;Am I retrieving directly from raw multimodal content, or from processed output?&lt;/li&gt;
&lt;li&gt;Do I need image queries?&lt;/li&gt;
&lt;li&gt;Do I need timestamped retrieval from audio or video?&lt;/li&gt;
&lt;li&gt;Do I want the knowledge base mainly for search, RAG, or both?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These questions make the architecture much clearer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;For me, multimodal knowledge bases are the point where multimodal AI becomes operational.&lt;/p&gt;

&lt;p&gt;They connect processed or stored media-rich content to retrieval, and they make it possible to build AI systems that are grounded in more than just text. With Amazon Bedrock Knowledge Bases, AWS now supports multimodal ingestion and retrieval across images, audio, video, and text, along with query-time metadata that can point to the right file type and even the right media segment.&lt;/p&gt;

&lt;p&gt;That makes this layer very important.&lt;/p&gt;

&lt;p&gt;Because once retrieval improves, the answers improve.&lt;/p&gt;

&lt;p&gt;And once the answers improve, the AI system becomes much more useful.&lt;/p&gt;

&lt;p&gt;In the next article, I would move to the next logical topic:&lt;/p&gt;

&lt;p&gt;How to use multimodal retrieval in a real RAG workflow on AWS.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>datascience</category>
      <category>awsbigdata</category>
    </item>
    <item>
      <title>Cloud Myths You Should Probably Stop Believing</title>
      <dc:creator>Faisal Ibrahim Sadiq</dc:creator>
      <pubDate>Mon, 20 Apr 2026 14:41:18 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/cloud-myths-you-should-probably-stop-believing-5c4b</link>
      <guid>https://future.forem.com/aws-builders/cloud-myths-you-should-probably-stop-believing-5c4b</guid>
      <description>&lt;p&gt;If you spend enough time around cloud conversations, you’ll notice a pattern: a lot of people repeat the same ideas about the cloud as if they’re universally true.&lt;/p&gt;

&lt;p&gt;The problem is, many of these ideas are either oversimplified or just wrong. And if you build your understanding on them, you’ll make poor decisions when it actually matters; like during architecture design, cost planning, or scaling.&lt;/p&gt;

&lt;p&gt;Here are some of the most common cloud myths worth unlearning.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. “Cloud is always cheaper”
&lt;/h2&gt;

&lt;p&gt;This one gets repeated a lot, especially in beginner discussions.&lt;/p&gt;

&lt;p&gt;Cloud &lt;em&gt;can&lt;/em&gt; be cheaper, but only if you know what you’re doing.&lt;/p&gt;

&lt;p&gt;With a provider like Amazon Web Services, pricing is usage-based. That sounds great until you realize how easy it is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;leave instances running 24/7&lt;/li&gt;
&lt;li&gt;over-provision resources&lt;/li&gt;
&lt;li&gt;ignore data egress costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, you’re not saving money but just scaling your bill.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Cloud is cost-efficient when optimized. Not by default.&lt;/p&gt;
&lt;/blockquote&gt;
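&lt;p&gt;To make the always-on pitfall concrete, here is the back-of-the-envelope arithmetic for one instance running 24/7 versus one scheduled for business hours only. The 5-cents-per-hour rate is purely illustrative, not a real AWS price:&lt;/p&gt;

```python
RATE_CENTS = 5  # illustrative: 5 cents per instance-hour (not a real AWS price)

always_on = RATE_CENTS * 24 * 30       # running 24/7 for a 30-day month
business_hours = RATE_CENTS * 10 * 22  # 10 h/day, 22 working days

# Roughly 70% of the always-on bill is paying for idle time.
print(f"${always_on / 100:.2f} vs ${business_hours / 100:.2f}")
# → $36.00 vs $11.00
```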




&lt;h2&gt;
  
  
  2. “The cloud is secure out of the box”
&lt;/h2&gt;

&lt;p&gt;Cloud providers invest heavily in security, but that doesn’t mean your application is automatically secure.&lt;/p&gt;

&lt;p&gt;There’s something called the &lt;strong&gt;Shared Responsibility Model&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The provider secures the infrastructure&lt;/li&gt;
&lt;li&gt;You secure everything you deploy on top of it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Misconfigured storage, weak &lt;strong&gt;IAM&lt;/strong&gt; policies, exposed APIs: these are still your responsibility.&lt;/p&gt;

&lt;p&gt;Most cloud breaches don’t happen because the provider failed. They happen because of bad configuration.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. “You don’t need DevOps in the cloud”
&lt;/h2&gt;

&lt;p&gt;If anything, cloud environments &lt;em&gt;increase&lt;/em&gt; the need for DevOps practices.&lt;/p&gt;

&lt;p&gt;You’re still dealing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI/CD pipelines&lt;/li&gt;
&lt;li&gt;monitoring and logging&lt;/li&gt;
&lt;li&gt;infrastructure provisioning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tools like Docker and Kubernetes become even more important, not less.&lt;/p&gt;

&lt;p&gt;Cloud doesn’t remove operational complexity, it just changes how you handle it.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. “Cloud means no downtime”
&lt;/h2&gt;

&lt;p&gt;Cloud providers are reliable, but they can't make up for poor architectural decisions.&lt;/p&gt;

&lt;p&gt;If your system goes down because you deployed everything in one region, that’s not a cloud failure; that’s an architecture decision.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;High availability is something you design for. It’s not automatically included.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  5. “Cloud is just someone else’s computer”
&lt;/h2&gt;

&lt;p&gt;You’ve probably heard this one😂.&lt;/p&gt;

&lt;p&gt;It’s not completely wrong, but it misses the point.&lt;/p&gt;

&lt;p&gt;Cloud is more than remote servers. You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;managed databases&lt;/li&gt;
&lt;li&gt;serverless computing&lt;/li&gt;
&lt;li&gt;global infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, with &lt;strong&gt;AWS&lt;/strong&gt; Lambda, you don’t even manage servers directly.&lt;/p&gt;

&lt;p&gt;That’s a very different model from traditional infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. “You don’t need to understand networking”
&lt;/h2&gt;

&lt;p&gt;This is a quick way to struggle in cloud engineering.&lt;/p&gt;

&lt;p&gt;You still need to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VPCs&lt;/li&gt;
&lt;li&gt;subnets&lt;/li&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;firewalls/security groups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In many cases, cloud networking is &lt;em&gt;more complex&lt;/em&gt; than on-prem setups because of how flexible it is.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. “Auto-scaling will fix performance issues”
&lt;/h2&gt;

&lt;p&gt;Auto-scaling helps with load, not bad design.&lt;/p&gt;

&lt;p&gt;If your system is inefficient:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it will still be inefficient at scale&lt;/li&gt;
&lt;li&gt;it will just cost more while being inefficient&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scaling a poorly designed system doesn’t automatically fix it.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. “Serverless means there are no servers”
&lt;/h2&gt;

&lt;p&gt;There are still servers.&lt;/p&gt;

&lt;p&gt;You just don’t manage them.&lt;/p&gt;

&lt;p&gt;That abstraction is powerful, but it doesn’t remove concepts like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cold starts&lt;/li&gt;
&lt;li&gt;execution limits&lt;/li&gt;
&lt;li&gt;resource constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding what’s happening under the hood still matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. “Cloud migration is a one-time thing”
&lt;/h2&gt;

&lt;p&gt;A lot of teams think moving to the cloud is the finish line.&lt;/p&gt;

&lt;p&gt;It’s not.&lt;/p&gt;

&lt;p&gt;After migration, you still need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;optimize costs&lt;/li&gt;
&lt;li&gt;improve architecture&lt;/li&gt;
&lt;li&gt;monitor performance&lt;/li&gt;
&lt;li&gt;tighten security&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloud is an ongoing process, not a one-off project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;The cloud solves many problems, but you don't want to go into it misinformed. So give the cloud a try and make use of the AWS free tier. Learn, build, and learn again. Build a solid foundation now, and when the time comes you'll be equipped to maximize the advantages of the cloud.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Stop Giving AI Agents AWS Credentials: A Better Way to Secure Access</title>
      <dc:creator>Sarvar Nadaf</dc:creator>
      <pubDate>Mon, 20 Apr 2026 14:00:16 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/stop-giving-ai-agents-aws-credentials-a-better-way-to-secure-access-5gih</link>
      <guid>https://future.forem.com/aws-builders/stop-giving-ai-agents-aws-credentials-a-better-way-to-secure-access-5gih</guid>
      <description>&lt;p&gt;👋 Hey there, tech enthusiasts! &lt;/p&gt;

&lt;p&gt;I'm Sarvar, a Cloud Architect with a passion for transforming complex technological challenges into elegant solutions. With extensive experience spanning Cloud Operations (AWS &amp;amp; Azure), Data Operations, Analytics, DevOps, and Generative AI, I've had the privilege of architecting solutions for global enterprises that drive real business impact. Through this article series, I'm excited to share practical insights, best practices, and hands-on experiences from my journey in the tech world. Whether you're a seasoned professional or just starting out, I aim to break down complex concepts into digestible pieces that you can apply in your projects.&lt;/p&gt;

&lt;p&gt;Let's dive in and explore the fascinating world of cloud technology together! 🚀&lt;/p&gt;




&lt;h2&gt;
  
  
  The Wake-Up Call
&lt;/h2&gt;

&lt;p&gt;Three months ago, our security team flagged something concerning. Developers were feeding production logs, error messages, and configuration snippets to ChatGPT for debugging help.&lt;/p&gt;

&lt;p&gt;The problem? Those logs contained customer identifiers, internal service names, and architectural details we definitely didn't want leaving our network.&lt;/p&gt;

&lt;p&gt;We couldn't just block ChatGPT - developers needed AI assistance. The productivity gains were real. But we also couldn't keep hemorrhaging sensitive data to external APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The requirements were clear:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI agents need AWS access for legitimate automation tasks&lt;/li&gt;
&lt;li&gt;Zero sensitive data leaves our AWS environment&lt;/li&gt;
&lt;li&gt;Every action must be auditable&lt;/li&gt;
&lt;li&gt;Principle of least privilege, always&lt;/li&gt;
&lt;li&gt;No impact on developer velocity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's when I started looking at Model Context Protocol (MCP) as a security boundary.&lt;/p&gt;




&lt;h2&gt;
  
  
  Understanding MCP as a Security Layer
&lt;/h2&gt;

&lt;p&gt;Before diving into implementation, let's clarify what MCP actually does and why it matters for security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Context Protocol&lt;/strong&gt; is an open standard that sits between your AI agent and your resources. Think of it as a translator and gatekeeper combined.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Developer → AI Agent → MCP Server → AWS IAM → AWS Resources
                          ↓
                    Security Layer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MCP server doesn't just pass requests through. It acts as a security boundary that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validates every request before execution&lt;/li&gt;
&lt;li&gt;Translates AI intentions into specific AWS API calls&lt;/li&gt;
&lt;li&gt;Enforces authentication and authorization&lt;/li&gt;
&lt;li&gt;Logs everything for audit trails&lt;/li&gt;
&lt;li&gt;Provides a single point of control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; Instead of giving AI agents direct AWS credentials, you give them access to an MCP server that has carefully scoped permissions. The AI never touches AWS credentials. It doesn't even know they exist.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Security Architecture
&lt;/h2&gt;

&lt;p&gt;After several iterations, here's the pattern that survived production. I'll explain the thinking behind each layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Authentication Without Permanent Credentials
&lt;/h3&gt;

&lt;p&gt;The first principle: no permanent credentials anywhere in the system.&lt;/p&gt;

&lt;p&gt;Developers authenticate with our existing identity provider (Okta in our case). The identity provider issues a JWT token containing the user's identity and group memberships. The MCP server validates this JWT and issues a short-lived session token - 15 minutes, no exceptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why 15 minutes?&lt;/strong&gt; Long enough for a debugging session, short enough that a leaked token becomes useless quickly. If someone steals a session token, they have a 15-minute window at most. Compare that to permanent AWS credentials that work forever until manually revoked.&lt;/p&gt;

&lt;p&gt;The MCP server never stores these tokens. They're validated, used, and discarded. When they expire, users re-authenticate. It's a minor inconvenience that prevents major security incidents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Request Validation
&lt;/h3&gt;

&lt;p&gt;This is where MCP shines as a security boundary. Every request goes through multiple validation checks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action Allowlist:&lt;/strong&gt; The MCP server maintains a strict list of allowed AWS actions. If the AI requests something not on the list, it's blocked immediately. No wildcards, no "just in case" permissions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern Detection:&lt;/strong&gt; I scan every request for dangerous patterns. Words like "delete", "terminate", "destroy" trigger additional scrutiny. Even if the action is technically allowed, suspicious patterns can block the request or require additional approval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parameter Sanitization:&lt;/strong&gt; Before logging or processing, all sensitive parameters get redacted. Passwords, tokens, API keys - anything that looks like a credential gets replaced with &lt;code&gt;[REDACTED]&lt;/code&gt; in logs. This prevents credential leakage through audit trails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate Limiting:&lt;/strong&gt; Each user gets a request budget. Exceed it, and requests start getting throttled. This prevents both accidental runaway scripts and intentional abuse.&lt;/p&gt;

&lt;p&gt;The validation happens in milliseconds. Developers don't notice the overhead, but it's the difference between a secure system and a disaster waiting to happen.&lt;/p&gt;
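&lt;p&gt;As a sketch, the first two checks might look like this. The action names and the regex are illustrative placeholders, not the actual production configuration:&lt;/p&gt;

```python
import re

# Illustrative allowlist; the real list is driven by the legitimate use cases.
ALLOWED_ACTIONS = {
    "logs:FilterLogEvents",
    "s3:ListAllMyBuckets",
    "s3:GetObject",
    "cloudwatch:GetMetricData",
}

# Dangerous words trigger a block even when the action itself is allowed.
DANGEROUS_PATTERNS = re.compile(r"\b(delete|terminate|destroy)\b", re.IGNORECASE)

def validate_request(action, prompt):
    """Return (allowed, reason) for a single MCP request."""
    if action not in ALLOWED_ACTIONS:
        return False, "action not on allowlist"
    if DANGEROUS_PATTERNS.search(prompt):
        return False, "dangerous pattern detected"
    return True, "ok"
```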

&lt;h3&gt;
  
  
  Layer 3: AWS Execution with Scoped Permissions
&lt;/h3&gt;

&lt;p&gt;The MCP server uses an IAM role with specific permissions. Not admin. Not power user. Just what's needed for legitimate use cases.&lt;/p&gt;

&lt;p&gt;I started by listing every legitimate use case developers had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read CloudWatch logs for debugging&lt;/li&gt;
&lt;li&gt;List S3 buckets to find data&lt;/li&gt;
&lt;li&gt;Get objects from specific buckets&lt;/li&gt;
&lt;li&gt;Query CloudWatch metrics for dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then I created IAM policies that allow exactly those actions and nothing else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insight:&lt;/strong&gt; Explicit denies for dangerous actions, even if they're not in the allow list. This protects against future policy changes or misconfigurations.&lt;/p&gt;

&lt;p&gt;Example: Even if someone accidentally adds &lt;code&gt;s3:*&lt;/code&gt; to the allow list, an explicit deny on &lt;code&gt;s3:DeleteBucket&lt;/code&gt; still blocks it. Defense in depth.&lt;/p&gt;
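&lt;p&gt;Here's a hedged sketch of what such a policy can look like, written as a Python dict (the action lists are illustrative and the resources are left unscoped for brevity). Because IAM evaluates an explicit Deny before any Allow, the second statement wins no matter what the first one grows into:&lt;/p&gt;

```python
import json

# Illustrative policy: narrow allows plus explicit denies that hold even if
# the allow block is later widened by mistake (Deny always beats Allow in IAM).
MCP_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowScopedReads",
            "Effect": "Allow",
            "Action": [
                "logs:FilterLogEvents",
                "s3:ListAllMyBuckets",
                "s3:GetObject",
                "cloudwatch:GetMetricData",
            ],
            "Resource": "*",  # a real policy would scope this to known ARNs
        },
        {
            "Sid": "DenyDestructiveActions",
            "Effect": "Deny",
            "Action": [
                "s3:DeleteBucket",
                "s3:DeleteObject",
                "ec2:TerminateInstances",
                "iam:*",
                "kms:*",
            ],
            "Resource": "*",
        },
    ],
}

print(json.dumps(MCP_ROLE_POLICY, indent=2))
```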

&lt;h3&gt;
  
  
  Layer 4: Comprehensive Audit Trail
&lt;/h3&gt;

&lt;p&gt;CloudTrail logs every AWS API call, but it doesn't capture the context we need. Who made the request? What was the AI prompt? What resources were accessed?&lt;/p&gt;

&lt;p&gt;I built a custom logging layer that captures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User identity (email, not just IAM role)&lt;/li&gt;
&lt;li&gt;Original AI prompt (hashed, not stored in plain text)&lt;/li&gt;
&lt;li&gt;AWS action requested&lt;/li&gt;
&lt;li&gt;Resources accessed&lt;/li&gt;
&lt;li&gt;Result (success/failure)&lt;/li&gt;
&lt;li&gt;Execution time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this goes to CloudWatch Logs in structured JSON format. Now I can query: "Show me all S3 access by user X this week" or "What resources did the AI access when processing this prompt?"&lt;/p&gt;

&lt;p&gt;The logs are immutable and retained for 90 days for compliance.&lt;/p&gt;
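&lt;p&gt;A minimal sketch of one such log record, combining the redaction and prompt-hashing ideas from the validation layer. The field names and the credential-matching regex are assumptions for illustration:&lt;/p&gt;

```python
import hashlib
import json
import re
import time

# Anything that looks like a credential assignment is redacted before logging.
SECRET_PATTERN = re.compile(r"(password|token|api[_-]?key)\s*[:=]\s*\S+", re.IGNORECASE)

def redact(text):
    return SECRET_PATTERN.sub(r"\1=[REDACTED]", text)

def audit_record(user_email, prompt, action, resources, success, elapsed_ms):
    """Build one structured JSON log line; the prompt is hashed, never stored."""
    return json.dumps({
        "user": user_email,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "action": action,
        "resources": [redact(r) for r in resources],
        "success": success,
        "elapsed_ms": elapsed_ms,
        "timestamp": time.time(),
    })
```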




&lt;h2&gt;
  
  
  How We Built It
&lt;/h2&gt;

&lt;p&gt;The deployment came down to three critical security decisions. Each one was driven by a specific threat we wanted to prevent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 1: Network Isolation Over Convenience
&lt;/h3&gt;

&lt;p&gt;I put the MCP server in a completely separate VPC from production. No shared networks, no VPC peering, nothing. The only communication path is through VPC endpoints to AWS APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; If someone compromises the MCP server, they're trapped. No internet access means they can't exfiltrate data. No production VPC access means they can't pivot to other systems. They're stuck in a cage that only opens to specific AWS services.&lt;/p&gt;

&lt;p&gt;I chose ECS Fargate because it gave me this isolation without the overhead of managing EC2 instances. No patching, no scaling configuration, just containers in a locked-down network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trade-off:&lt;/strong&gt; More complex networking setup. But the security benefit was worth it. A compromised MCP server becomes useless to an attacker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 2: Explicit Denies as the Last Line of Defense
&lt;/h3&gt;

&lt;p&gt;The IAM policy has two blocks: allows and denies. The allows are specific - exact actions on exact resources. But the denies are what keep me sleeping at night.&lt;/p&gt;

&lt;p&gt;I explicitly deny all delete operations, all terminate operations, all IAM changes, and all KMS key operations. Even if someone misconfigures the allow block and adds &lt;code&gt;s3:*&lt;/code&gt;, the deny on &lt;code&gt;s3:DeleteBucket&lt;/code&gt; still holds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; Policies get changed. People make mistakes. The deny block is the safety net that catches those mistakes before they become incidents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trade-off:&lt;/strong&gt; More rigid system. If we need to add a delete operation later, we have to modify both blocks. But that friction is intentional - it forces us to think twice about dangerous permissions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 3: Real-Time Alerting Over Post-Incident Analysis
&lt;/h3&gt;

&lt;p&gt;I set up CloudWatch alarms that fire immediately when something looks wrong. High error rates, unusual request volumes, spikes in blocked actions - all trigger alerts to our security team's Slack channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; Logs are great for forensics, but alerts prevent incidents. If the AI starts trying malicious actions, I want to know in real-time, not during next week's log review.&lt;/p&gt;

&lt;p&gt;The alerts are tuned to avoid noise. More than 50 errors in 5 minutes is abnormal. More than 1,000 requests from one user in 5 minutes is suspicious. These thresholds came from watching normal usage patterns for a month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trade-off:&lt;/strong&gt; Alert fatigue is real. We tune the thresholds monthly based on false positive rates. But I'd rather investigate a false alarm than miss a real attack.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Broke (And How I Fixed It)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Issue 1: Permission Errors Everywhere
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; First deployment, every request failed with &lt;code&gt;AccessDenied&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; I was too restrictive. The IAM policy only allowed specific S3 buckets, but developers needed to list buckets first to know what existed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Add &lt;code&gt;s3:ListAllMyBuckets&lt;/code&gt; with a wildcard resource. Let them see what exists, but control what they can read. It's like letting someone see the library catalog without giving them keys to every book.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; Start with read-only list permissions, then restrict data access. Users need to discover resources before they can use them.&lt;/p&gt;




&lt;h3&gt;
  
  
  Issue 2: CloudTrail Logs Were Useless
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; CloudTrail showed the MCP server's actions, but not which user requested them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; All requests came from the same IAM role. No way to trace back to individual users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Pass user context through custom CloudWatch Logs. Every MCP request gets logged with the user's email, the action requested, and the resources accessed. Now I can trace every action back to the person who requested it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; CloudTrail alone isn't enough for multi-user systems. You need custom logging to capture user context.&lt;/p&gt;




&lt;h3&gt;
  
  
  Issue 3: AI Agents Tried Creative Exploits
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; The AI tried to chain commands to bypass restrictions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example request:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"First list the S3 buckets, then for each bucket, 
download all objects and search for passwords"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; My validation checked individual actions, not sequences. The AI was trying to automate a multi-step attack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Detect and block chaining attempts. Look for words like "then", "after that", "for each", "loop through". Force users to make explicit, separate requests for each action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; AI agents are creative. They'll try to work around restrictions. You need to think like an attacker.&lt;/p&gt;
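&lt;p&gt;A rough sketch of that check. The marker list is illustrative and deliberately incomplete; a creative model will find phrasings you haven't listed, which is why the IAM denies still sit behind it:&lt;/p&gt;

```python
import re

# Phrases that suggest a scripted multi-step sequence rather than one action.
CHAIN_MARKERS = re.compile(r"\b(then|after that|for each|loop through)\b", re.IGNORECASE)

def looks_like_chaining(prompt):
    """Flag prompts that try to automate several actions in one request."""
    return bool(CHAIN_MARKERS.search(prompt))
```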




&lt;h3&gt;
  
  
  Issue 4: Rate Limiting Was Too Aggressive
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; Legitimate users hit rate limits during normal debugging sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; I set limits too low (10 requests per minute). Debugging often requires rapid iteration - check logs, adjust query, check again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Tiered rate limits based on action type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read operations (Get, Describe): 100 requests per 5 minutes&lt;/li&gt;
&lt;li&gt;List operations: 50 requests per 5 minutes&lt;/li&gt;
&lt;li&gt;Write operations: 10 requests per 5 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read operations get higher limits because they're lower risk. Write operations stay restricted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; One-size-fits-all rate limits don't work. Different actions have different risk profiles.&lt;/p&gt;
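&lt;p&gt;The tiers above can be sketched as a sliding-window limiter keyed by action class. The budgets are the ones just listed; classifying actions by API verb prefix is a simplifying assumption:&lt;/p&gt;

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60
# Budgets per 5-minute window, by action class (numbers from the tiers above).
BUDGETS = {"read": 100, "list": 50, "write": 10}

_history = defaultdict(deque)  # (user, tier) mapped to recent request timestamps

def classify(action):
    verb = action.split(":")[-1]
    if verb.startswith(("Get", "Describe")):
        return "read"
    if verb.startswith("List"):
        return "list"
    return "write"

def allow_request(user, action, now=None):
    """True if the request fits in the user's budget for this action tier."""
    now = time.time() if now is None else now
    tier = classify(action)
    window = _history[(user, tier)]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # drop timestamps that fell out of the window
    if len(window) >= BUDGETS[tier]:
        return False
    window.append(now)
    return True
```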




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;After three months in production, here's what actually matters:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Explicit Denies Are Your Friend
&lt;/h3&gt;

&lt;p&gt;Don't rely on "not allowing" something. Explicitly deny dangerous actions. Even if someone misconfigures the allow rules, the denies hold.&lt;/p&gt;

&lt;p&gt;I have explicit denies for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All delete operations&lt;/li&gt;
&lt;li&gt;All terminate operations&lt;/li&gt;
&lt;li&gt;All IAM operations&lt;/li&gt;
&lt;li&gt;All KMS key operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the "break glass" protections. They prevent catastrophic mistakes.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Log Everything, But Make It Searchable
&lt;/h3&gt;

&lt;p&gt;CloudTrail is great, but you need custom logs for MCP-specific context. I send everything to CloudWatch Logs with structured JSON.&lt;/p&gt;

&lt;p&gt;Now I can query: "Show me all S3 access by user X in the last hour" or "What resources did the AI access when processing this prompt?"&lt;/p&gt;

&lt;p&gt;The logs are immutable and retained for 90 days. If something goes wrong, I can reconstruct exactly what happened.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Sanitize Everything
&lt;/h3&gt;

&lt;p&gt;Never log the actual AI prompts. They might contain sensitive data. I hash them instead.&lt;/p&gt;

&lt;p&gt;You can still correlate requests (same hash = same prompt), but you're not storing potentially sensitive prompts in logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Network Isolation Matters
&lt;/h3&gt;

&lt;p&gt;The MCP server runs in a private VPC with no internet access. It can only reach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS API endpoints (via VPC endpoints)&lt;/li&gt;
&lt;li&gt;Internal authentication service&lt;/li&gt;
&lt;li&gt;CloudWatch Logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If someone compromises the MCP server, they can't exfiltrate data. They're stuck in an isolated network.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Test Your Security Controls
&lt;/h3&gt;

&lt;p&gt;I wrote tests to verify the security controls actually work. Tests like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verify delete operations are blocked&lt;/li&gt;
&lt;li&gt;Verify IAM operations are blocked&lt;/li&gt;
&lt;li&gt;Verify rate limits work&lt;/li&gt;
&lt;li&gt;Verify audit logs capture user context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run these tests in CI/CD. If they pass, your security controls are working. If they fail, you know immediately.&lt;/p&gt;




&lt;h2&gt;
  
  
  Alternative Approaches I Considered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Direct IAM Roles for AI Agents
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simpler architecture&lt;/li&gt;
&lt;li&gt;No MCP server to maintain&lt;/li&gt;
&lt;li&gt;Lower latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No request validation layer&lt;/li&gt;
&lt;li&gt;Can't block dangerous patterns&lt;/li&gt;
&lt;li&gt;Harder to audit user actions&lt;/li&gt;
&lt;li&gt;AI has direct AWS credentials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why I didn't use it:&lt;/strong&gt; Too risky. One prompt injection and the AI could delete production resources. The MCP layer provides defense in depth.&lt;/p&gt;




&lt;h3&gt;
  
  
  Option 2: AWS Lambda as MCP Server
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serverless, no infrastructure&lt;/li&gt;
&lt;li&gt;Automatic scaling&lt;/li&gt;
&lt;li&gt;Pay per request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cold starts (500ms+)&lt;/li&gt;
&lt;li&gt;15-minute timeout limit&lt;/li&gt;
&lt;li&gt;Harder to maintain state (rate limiting)&lt;/li&gt;
&lt;li&gt;More complex networking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why I didn't use it:&lt;/strong&gt; Cold starts killed the developer experience. Waiting 500ms for every request was frustrating. Fargate has no cold starts.&lt;/p&gt;




&lt;h3&gt;
  
  
  Option 3: API Gateway + Lambda
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-in rate limiting&lt;/li&gt;
&lt;li&gt;API key management&lt;/li&gt;
&lt;li&gt;Request/response transformation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More complex setup&lt;/li&gt;
&lt;li&gt;Higher cost at scale&lt;/li&gt;
&lt;li&gt;Still has Lambda cold starts&lt;/li&gt;
&lt;li&gt;Overkill for internal use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why I didn't use it:&lt;/strong&gt; The built-in rate limiting was nice, but not worth the complexity for an internal tool. Fargate + ALB was simpler.&lt;/p&gt;




&lt;h2&gt;
  
  
  Best Practices That Actually Matter
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Start With Read-Only
&lt;/h3&gt;

&lt;p&gt;Deploy with read-only permissions first. Let developers use it for a week. Then gradually add write permissions based on actual needs.&lt;/p&gt;

&lt;p&gt;This prevents over-permissioning. You'll discover what developers actually need, not what they think they need.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Use Separate AWS Accounts
&lt;/h3&gt;

&lt;p&gt;Run the MCP server in a separate AWS account from your production workloads. Use cross-account roles for access.&lt;/p&gt;

&lt;p&gt;If the MCP account is compromised, production is still isolated. It's an extra layer of defense.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Monitor for Anomalies
&lt;/h3&gt;

&lt;p&gt;Set up CloudWatch alarms for unusual patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High error rates (&amp;gt;50 errors in 5 minutes)&lt;/li&gt;
&lt;li&gt;Unusual access patterns (&amp;gt;1,000 requests in 5 minutes)&lt;/li&gt;
&lt;li&gt;Blocked actions (&amp;gt;100 blocks in 5 minutes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These alerts go to your security team. Response time is critical.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Regular Security Reviews
&lt;/h3&gt;

&lt;p&gt;Every month, review:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which actions are being used most&lt;/li&gt;
&lt;li&gt;Which permissions are never used (remove them)&lt;/li&gt;
&lt;li&gt;Any blocked requests (are they legitimate needs?)&lt;/li&gt;
&lt;li&gt;Rate limit effectiveness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Security isn't set-and-forget. It requires ongoing attention.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Document Everything
&lt;/h3&gt;

&lt;p&gt;Create a runbook for common scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to add a new allowed action&lt;/li&gt;
&lt;li&gt;How to investigate suspicious activity&lt;/li&gt;
&lt;li&gt;How to rotate credentials&lt;/li&gt;
&lt;li&gt;How to handle a security incident&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When something goes wrong at 2 AM, you'll be glad you documented it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Three months in production taught me that securing AI agent access isn't about perfect security - it's about making attacks harder than they're worth while keeping developers productive.&lt;/p&gt;

&lt;p&gt;The MCP pattern works because it gives you a single point of control. You're not trying to secure the AI agent itself. You're securing the gateway it uses to access your resources. That gateway validates every request, enforces least privilege, logs everything, and runs in an isolated network.&lt;/p&gt;

&lt;p&gt;We went from developers sending production data to ChatGPT to having a secure, auditable system where AI agents help without creating risk. The benefit? No more 2 AM calls about data leaks.&lt;/p&gt;

&lt;p&gt;Is it perfect? No. Can a determined attacker find ways around it? Probably. But it's dramatically better than the alternatives: giving AI agents direct AWS credentials or blocking AI tools entirely and watching developers find workarounds.&lt;/p&gt;

&lt;p&gt;The key insight: Security is not about building walls. It's about building gates with guards. MCP is that gate.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Security is not about building walls. It's about building gates with guards."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  📌 Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Thank you for reading! I hope this gave you practical ideas for securing AI agent access in your environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Found this useful?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❤️ Like if it helped you think through your security approach&lt;/li&gt;
&lt;li&gt;🦄 Unicorn if you're implementing this pattern&lt;/li&gt;
&lt;li&gt;💾 Save for your next security review&lt;/li&gt;
&lt;li&gt;🔄 Share with your security team&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Follow me for more on:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS security patterns&lt;/li&gt;
&lt;li&gt;AI/ML infrastructure&lt;/li&gt;
&lt;li&gt;Cloud architecture&lt;/li&gt;
&lt;li&gt;DevSecOps practices&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  💡 What's Next
&lt;/h2&gt;

&lt;p&gt;I'm working on a follow-up article about monitoring and alerting for MCP deployments. Follow for updates.&lt;/p&gt;

&lt;p&gt;Also exploring: Multi-region MCP deployments and disaster recovery patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  🌐 Portfolio &amp;amp; Work
&lt;/h2&gt;

&lt;p&gt;Explore my full body of work, certifications, and architecture projects:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://sarvarnadaf.com" rel="noopener noreferrer"&gt;Visit My Website&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Services I Offer
&lt;/h2&gt;

&lt;p&gt;Looking for hands-on guidance with cloud security or AI infrastructure?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud Security Architecture (AWS / Azure)&lt;/li&gt;
&lt;li&gt;AI/ML Infrastructure Design&lt;/li&gt;
&lt;li&gt;Security Audit &amp;amp; Remediation&lt;/li&gt;
&lt;li&gt;Technical Writing &amp;amp; Documentation&lt;/li&gt;
&lt;li&gt;Architecture Reviews&lt;/li&gt;
&lt;li&gt;1:1 Technical Mentorship&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🤝 Let's Connect
&lt;/h2&gt;

&lt;p&gt;Questions about implementing this pattern? Drop a comment or connect with me on &lt;a href="https://www.linkedin.com/in/sarvar04/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For consulting or technical discussions: &lt;a href="mailto:simplynadaf@gmail.com"&gt;simplynadaf@gmail.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stay secure!&lt;/strong&gt; 🔒&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>agents</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Scaling AWS Serverless in Production: Event Sources, Throttling, and Zero-Downtime Deploys</title>
      <dc:creator>Collins Ushi</dc:creator>
      <pubDate>Mon, 20 Apr 2026 07:58:00 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/tighter-and-more-concrete-scaling-aws-serverless-in-production-event-sources-throttling-and-icn</link>
      <guid>https://future.forem.com/aws-builders/tighter-and-more-concrete-scaling-aws-serverless-in-production-event-sources-throttling-and-icn</guid>
      <description>&lt;p&gt;"Serverless scales automatically" is one of those claims that is technically true and practically misleading. The platform will scale your code but the rate, the ceiling, and most importantly the failure modes of that scaling are determined by decisions you make at three specific layers of the system. Get any of them wrong and your perfectly elastic, pay-per-use architecture will either stall out during a traffic spike, silently corrupt data under load, or quietly DDoS itself into a tarpit.&lt;/p&gt;

&lt;p&gt;This post is about the parts of serverless scaling that aren't on the marketing page. It's organised around the three boundaries where scale is actually decided: the event source that feeds your functions, the throughput quotas AWS enforces against you, and the deployment pipeline that ships changes without breaking production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three boundaries where scale is decided
&lt;/h2&gt;

&lt;p&gt;Every serverless system has the same topology. Work enters at an edge (API Gateway, an event bus, a queue, a stream). It's handed to a compute layer (Lambda, Fargate) that does the work. The compute layer writes to downstream systems (DynamoDB, S3, another queue, a third-party API). And a control plane (the Lambda service itself, plus IAM and your deployment tooling) governs how much of this can happen concurrently.&lt;/p&gt;

&lt;p&gt;Almost every scaling problem I've seen in production falls into one of three buckets:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Event source misconfiguration. The Lambda is fine, but the queue or stream feeding it is throttling throughput, triggering duplicate deliveries, or creating head-of-line blocking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Quota collision. Your function can scale, but something upstream or downstream can't - API Gateway burst, downstream database connections, account-wide Lambda concurrency, a third-party rate limit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deployment fragility. The system scales correctly, but a bad deploy takes it down globally in thirty seconds because there's no canary and no automatic rollback.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The rest of this post works through each of those three layers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 1: SQS-backed Lambda
&lt;/h2&gt;

&lt;p&gt;Lambda integrates with SQS through a pull model that most people never inspect. When you connect a queue to a function, AWS stands up a small fleet of pollers on your behalf, typically starting at five, which continuously ask the queue "any work?" and invoke your function when the answer is yes.&lt;/p&gt;

&lt;p&gt;That polling fleet is the scaling unit. As the queue backlog grows, AWS adds pollers, which spawn more concurrent function invocations, which drain the queue faster. The ramp continues until one of three things happens: the queue empties, your function's reserved concurrency ceiling is hit, or the account-wide concurrency limit is reached.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Batch size and batch window&lt;/strong&gt;&lt;br&gt;
The single biggest throughput lever on an SQS-backed Lambda is how much work each invocation does. By default, a poller might hand your function a single message, which is wasteful because the invocation overhead dominates. Raising the batch size (up to 10,000 for standard queues, 10 for FIFO) lets a single invocation drain many messages at once.&lt;/p&gt;

&lt;p&gt;The tradeoff is latency. If you tell Lambda to wait for 10 messages before invoking, and traffic is light, the first message in the batch sits idle until nine others arrive. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;MaximumBatchingWindowInSeconds&lt;/code&gt; setting (the batch window) puts a ceiling on that wait. It says "gather up to N messages, but if this many seconds pass, send whatever you have." Setting it to a few seconds typically captures most of the batching benefit while keeping tail latency bounded.&lt;/p&gt;
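&lt;p&gt;To make the two levers concrete, here is how they map onto the event source mapping parameters. The queue ARN, function name, and values below are placeholder assumptions, not recommendations:&lt;/p&gt;

```python
# Placeholders: substitute your own queue ARN and function name.
sqs_mapping = {
    "EventSourceArn": "arn:aws:sqs:eu-west-1:123456789012:orders-queue",
    "FunctionName": "process-orders",
    "BatchSize": 1000,                    # up to 10,000 for standard queues, 10 for FIFO
    "MaximumBatchingWindowInSeconds": 5,  # flush a partial batch after 5 seconds
}
# Applied with: boto3.client("lambda").create_event_source_mapping(**sqs_mapping)
```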

&lt;p&gt;&lt;strong&gt;The visibility timeout rule&lt;/strong&gt;&lt;br&gt;
When a poller hands your function a batch, those messages become invisible to other pollers for a configurable period, the visibility timeout. If the function succeeds, it deletes the messages. If the function crashes or times out, the messages become visible again and get retried.&lt;/p&gt;

&lt;p&gt;The failure mode to understand is subtle. Suppose your function has a 10 second timeout and the queue's visibility timeout is also 10 seconds. If a single invocation hits a slow downstream and runs for 9 seconds, then deletes the messages at 9.5 seconds, you're fine. But if anything causes the invocation to slip past 10 seconds, the message reappears in the queue while the original function is still running. A second function picks it up. Now two invocations are processing the same message: duplicated work, possible data corruption, and, if the downstream isn't idempotent, a real mess.&lt;/p&gt;

&lt;p&gt;The rule of thumb is to set the visibility timeout to at least six times the function timeout. Overkill? Yes. But this is one of those parameters where being paranoid costs you nothing and the failure mode is insidious enough that you want margin.&lt;/p&gt;
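&lt;p&gt;The rule is trivial arithmetic, but worth encoding next to your infrastructure code so it survives timeout changes. The 30-second function timeout below is an example value:&lt;/p&gt;

```python
FUNCTION_TIMEOUT_SECONDS = 30  # example value; read this from your Lambda config

# Rule of thumb from above: visibility timeout at least 6x the function timeout,
# so a slow invocation never overlaps with a retry of the same message.
visibility_timeout = 6 * FUNCTION_TIMEOUT_SECONDS

# Applied with: sqs.set_queue_attributes(QueueUrl=..., Attributes=queue_attributes)
queue_attributes = {"VisibilityTimeout": str(visibility_timeout)}
```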
&lt;h2&gt;
  
  
  Layer 2: Kinesis-backed Lambda
&lt;/h2&gt;

&lt;p&gt;Kinesis looks like SQS on the surface but behaves nothing like it. SQS is a buffer: messages are independent, order doesn't matter, and consumers scale horizontally. Kinesis is an ordered stream: records are partitioned into shards, order within a shard matters, and concurrency is fundamentally bounded by shard count.&lt;/p&gt;

&lt;p&gt;The rule is one Lambda execution environment per shard. A stream with four shards gets four concurrent Lambda invocations regardless of backlog size. You could have a billion records waiting and still have only four workers draining them. This is the piece that bites teams migrating from SQS: "just add more Lambda" doesn't work.&lt;/p&gt;

&lt;p&gt;There are two ways to scale past the shard ceiling. The infrastructure answer is to reshard: splitting four shards into eight doubles your concurrency. The software answer is the Parallelization Factor, which lets a single shard be processed by up to 10 concurrent Lambda invocations, as long as records with the same partition key are still delivered to the same invocation. Order is preserved within a partition key, not across the whole shard. For most analytics and event-processing workloads, that's a meaningful distinction that buys you a 10x concurrency boost without resharding.&lt;/p&gt;
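&lt;p&gt;In event source mapping terms, the software answer is a single parameter. The stream ARN and function name below are placeholders:&lt;/p&gt;

```python
# Placeholders: substitute your own stream ARN and function name.
kinesis_mapping = {
    "EventSourceArn": "arn:aws:kinesis:eu-west-1:123456789012:stream/clickstream",
    "FunctionName": "process-clicks",
    "StartingPosition": "LATEST",
    "ParallelizationFactor": 10,  # up to 10 concurrent invocations per shard
}
# Applied with: boto3.client("lambda").create_event_source_mapping(**kinesis_mapping)
```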

&lt;p&gt;&lt;strong&gt;Iterator age: the lag signal&lt;/strong&gt;&lt;br&gt;
In SQS you watch queue depth to know you're falling behind. In Kinesis you watch iterator age - the age of the most recent record your function has processed. A flat iterator age means you're keeping up. A climbing iterator age means records are entering the stream faster than you can drain them, and data is aging toward the retention cliff. If iterator age crosses retention (24 hours by default, up to 365 days with extended retention), records fall off the back of the stream and are gone.&lt;/p&gt;

&lt;p&gt;Iterator age is the single most important metric to alarm on for any Kinesis-backed Lambda. Queue depth tells you about volume; iterator age tells you about time remaining before data loss.&lt;/p&gt;
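&lt;p&gt;As a sketch, wiring up that alarm with boto3 might look like this; the one-hour threshold and the names are assumptions, so tune them against your stream's actual retention:&lt;/p&gt;

```python
def iterator_age_alarm_kwargs(function_name, threshold_ms=3_600_000):
    """Arguments for cloudwatch.put_metric_alarm(**kwargs): fire when the
    function's iterator age exceeds one hour, i.e. records are aging
    toward the retention cliff."""
    return {
        "AlarmName": f"{function_name}-iterator-age",
        "Namespace": "AWS/Lambda",
        "MetricName": "IteratorAge",  # reported in milliseconds
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": float(threshold_ms),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [],  # add an SNS topic ARN here
    }
```

&lt;p&gt;Pass the result to &lt;code&gt;boto3.client('cloudwatch').put_metric_alarm(**kwargs)&lt;/code&gt;.&lt;/p&gt;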

&lt;p&gt;&lt;strong&gt;Enhanced Fan-Out&lt;/strong&gt;&lt;br&gt;
The default Kinesis read bandwidth is 2 MB/s per shard, shared across all consumers. Attach a Lambda and a Firehose to the same stream and each effectively gets 1 MB/s. Add a third consumer and now everyone gets roughly 667 KB/s. This is the noisy-neighbour problem applied to data streams.&lt;/p&gt;

&lt;p&gt;Enhanced Fan-Out solves it by giving each registered consumer its own dedicated 2 MB/s pipe. For production pipelines with multiple downstream consumers, this is not optional; it's the difference between a stream that scales with consumers and one that gets slower with every addition.&lt;/p&gt;
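&lt;p&gt;The arithmetic is worth making explicit (a sketch; 2 MB/s per shard is the published default):&lt;/p&gt;

```python
def per_consumer_read_mb_s(consumer_count, enhanced_fanout=False, shard_read_mb_s=2.0):
    """Per-consumer read bandwidth for a single shard, in MB/s."""
    if enhanced_fanout:
        return shard_read_mb_s  # dedicated pipe per registered consumer
    return shard_read_mb_s / consumer_count  # shared polling bandwidth

print(per_consumer_read_mb_s(3))        # shared: each consumer gets ~0.67 MB/s
print(per_consumer_read_mb_s(3, True))  # Enhanced Fan-Out: each keeps 2.0 MB/s
```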
&lt;h2&gt;
  
  
  DynamoDB Streams vs Kinesis Data Streams
&lt;/h2&gt;

&lt;p&gt;When you need to capture changes from DynamoDB, you have two architecturally similar but operationally very different options. Both use shards as the parallelism unit, but the management model diverges sharply.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fux25t4aype0ajyswilpy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fux25t4aype0ajyswilpy.png" alt=" " width="666" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The choice is almost entirely about how much scaling responsibility you want to own. DynamoDB Streams is the right default for triggers, CDC to a single downstream, and most small-to-medium workloads - you pay nothing for operational simplicity. Kinesis Data Streams is the right choice when you have many consumers, need long replay windows (reprocessing the last week of events for a new feature is a common pattern), or need dedicated bandwidth per consumer for SLA reasons.&lt;/p&gt;
&lt;h2&gt;
  
  
  Poison pills and the negative scaling trap
&lt;/h2&gt;

&lt;p&gt;There's a counterintuitive behaviour of Lambda's event source integrations that every serverless team eventually discovers the hard way. If your function starts returning errors at a high rate (crashing, timing out, throwing exceptions), the Lambda service doesn't scale up to retry faster. It scales down. It reduces polling rate, reduces concurrency, and backs off.&lt;/p&gt;

&lt;p&gt;From Lambda's perspective this is sensible: a wave of errors probably means a downstream database is struggling, and pouring more traffic at it will turn a degradation into an outage. The service is protecting your infrastructure from your own code. But for an operations team watching the dashboard, this self-imposed slowdown shows up as a rapidly climbing iterator age or queue depth exactly when you can least afford it.&lt;/p&gt;

&lt;p&gt;The way out is to stop throwing hard errors when individual records fail. Instead of letting one bad record crash the entire batch, use the &lt;code&gt;ReportBatchItemFailures&lt;/code&gt; response pattern. This tells Lambda "the invocation succeeded overall, but here are the specific record IDs that failed; retry those, and treat the rest as done." The healthy records move forward, the failed ones go to a DLQ or on-failure destination, and Lambda sees a succeeding function and maintains full polling velocity.&lt;/p&gt;

&lt;p&gt;Here's a clean implementation of the pattern for an SQS or DynamoDB Streams-backed function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process a batch, reporting per-record failures to preserve scaling velocity.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_extract_record_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_extract_payload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;process_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Log with structured context so the failure is diagnosable
&lt;/span&gt;            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;record_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}))&lt;/span&gt;
            &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;itemIdentifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record_id&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# Returning this shape keeps the invocation status "Success" from Lambda's
&lt;/span&gt;    &lt;span class="c1"&gt;# perspective, while telling the poller exactly which records to retry.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batchItemFailures&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_record_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;SQS uses messageId; DynamoDB/Kinesis use sequenceNumber.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messageId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SequenceNumber&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequenceNumber&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_payload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NewImage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Empty payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Actual business logic here idempotent, please
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You also need to enable &lt;code&gt;ReportBatchItemFailures&lt;/code&gt; on the event source mapping itself (in SAM, CDK, or the console); the function-side response is inert without it.&lt;/p&gt;
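&lt;p&gt;With boto3, the mapping-side switch is the &lt;code&gt;FunctionResponseTypes&lt;/code&gt; field. A sketch (the names, ARN, and batch size are placeholders):&lt;/p&gt;

```python
def esm_kwargs(function_name, source_arn):
    """Arguments for lambda_client.create_event_source_mapping(**kwargs).
    Without FunctionResponseTypes, a returned batchItemFailures list is
    silently ignored and the whole batch is retried on any failure."""
    return {
        "FunctionName": function_name,
        "EventSourceArn": source_arn,
        "BatchSize": 10,
        "FunctionResponseTypes": ["ReportBatchItemFailures"],
    }
```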

&lt;h2&gt;
  
  
  Layer 2 intermission: the token bucket
&lt;/h2&gt;

&lt;p&gt;Underneath almost every AWS throttling decision is the same algorithm: the token bucket. API Gateway uses it for per-route throttling. Lambda uses it for burst concurrency. DynamoDB uses it for provisioned-throughput tables. Every AWS SDK client uses one internally for retry management. Understanding it is the difference between tuning limits with intent and adjusting them until the alarms stop firing.&lt;/p&gt;

&lt;p&gt;The mental model has three pieces. The bucket has a maximum capacity (the burst limit) and starts full. Each successful request consumes one token; if the bucket is empty when a request arrives, the request is throttled (HTTP 429). Tokens refill at a steady rate (the rate limit), expressed as requests per second. A bucket with a 1,000-request burst and a 100 RPS refill rate can absorb a 1,000-request spike instantly, but then needs 10 seconds of zero traffic to fully recover its burst capacity.&lt;/p&gt;
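&lt;p&gt;The whole algorithm fits in a dozen lines. A minimal sketch (class and parameter names are mine):&lt;/p&gt;

```python
class TokenBucket:
    """Burst-sized capacity, steady refill, one token per request."""

    def __init__(self, burst, rate_per_s):
        self.capacity = burst
        self.tokens = float(burst)  # the bucket starts full
        self.rate = rate_per_s

    def refill(self, elapsed_s):
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed_s)

    def try_acquire(self):
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # empty bucket: HTTP 429

bucket = TokenBucket(burst=1000, rate_per_s=100)
served = sum(bucket.try_acquire() for _ in range(1500))
print(served)  # 1000: the spike is absorbed up to the burst, the rest throttled
```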

&lt;p&gt;Three things about this are operationally painful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CloudWatch lies about it&lt;/strong&gt;. Standard metrics aggregate over 1- or 5-minute windows. If 6,000 requests arrive at 100 RPS evenly across a minute, the graph looks identical to 6,000 requests arriving in the first five seconds. The first scenario is healthy; the second emptied your bucket, throttled hundreds of requests, and then sat idle. The only metric that tells you the truth is the throttle count itself: in API Gateway, &lt;code&gt;4XXError&lt;/code&gt; or &lt;code&gt;ThrottledCount&lt;/code&gt;; in Lambda, &lt;code&gt;Throttles&lt;/code&gt;. Alarm on throttles, not on request counts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enforcement is distributed&lt;/strong&gt;. There isn't one bucket sitting on one server. API Gateway enforces its quotas across a fleet of nodes, and tokens don't refill in perfect synchrony across all of them. At the edges you'll see "jitter": a request throttled on node A that would have succeeded on node B a millisecond later. This is why single-burst load tests often pass and then production fails: you tested an idealised bucket, not the real distributed one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mismatched buckets upstream and downstream create phantom capacity&lt;/strong&gt;. If API Gateway has a 5,000 RPS burst but the Lambda it fronts has a reserved concurrency of 500, the API Gateway quota is fiction. The real ceiling is 500. Every quota in your chain has to be reconciled against the weakest link, or you'll think you have headroom you don't actually have.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical tuning&lt;/strong&gt;&lt;br&gt;
Four habits make token-bucket behaviour predictable in practice:&lt;br&gt;
First, implement exponential backoff with jitter on every client. A fixed backoff from 100 simultaneous throttled clients causes all 100 to retry at exactly the same millisecond, re-emptying the bucket instantly. Randomised backoff spreads the retries out so the bucket has time to refill between waves.&lt;/p&gt;
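&lt;p&gt;A sketch of the "full jitter" variant of that backoff (the base and cap values are illustrative):&lt;/p&gt;

```python
import random

def backoff_with_full_jitter(attempt, base_s=0.1, cap_s=20.0):
    """Sleep a random duration between 0 and min(cap, base * 2^attempt),
    so a wave of throttled clients spreads its retries out instead of
    re-emptying the bucket in lockstep."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return random.uniform(0.0, ceiling)
```

&lt;p&gt;The randomisation is the point: with a fixed schedule, every throttled client computes the same sleep and the retry wave arrives intact.&lt;/p&gt;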

&lt;p&gt;Second, calculate time-to-refill explicitly: &lt;code&gt;refill_seconds = burst_limit / rate_limit&lt;/code&gt;. If your burst is 1,000 and your rate is 100, you need 10 seconds of quiet to recover full burst capacity. If your traffic is continuous, you may never recover it, which means your effective capacity is the rate limit, not the burst.&lt;/p&gt;

&lt;p&gt;Third, load-test for sustained burst, not just peak. A burst of 500 with a rate of 100 RPS can absorb 200 RPS for about five seconds before the bucket drains; after that you'll see ~50% throttling. If your expected peak is sustained, you need to size the rate limit, not the burst.&lt;/p&gt;
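&lt;p&gt;The drain arithmetic generalises into two lines (a sketch; names are mine):&lt;/p&gt;

```python
def sustained_load_profile(burst, rate_rps, offered_rps):
    """Returns (seconds until the bucket drains, steady-state throttled
    fraction) for a sustained offered load."""
    if offered_rps > rate_rps:
        drain_s = burst / (offered_rps - rate_rps)
        throttled = (offered_rps - rate_rps) / offered_rps
        return drain_s, throttled
    return float("inf"), 0.0  # refill keeps up: no throttling

print(sustained_load_profile(500, 100, 200))  # (5.0, 0.5): 5 s of grace, then 50% throttled
```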

&lt;p&gt;Fourth, use Lambda Provisioned Concurrency as a "floor of warm tokens" for latency-sensitive paths, but understand the cost. Provisioned concurrency is subtracted from your account's unreserved pool. Provisioning 500 units for one function permanently removes those 500 units from every other function in the account, even when your provisioned function is idle. Over-provisioning quietly starves the rest of your workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pre-production scaling review
&lt;/h2&gt;

&lt;p&gt;Before putting any ingestion-heavy serverless pipeline in front of real traffic, there are four questions worth writing down the answers to. I've seen each of them caught in review and missed in launch, with predictable outcomes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are the hard limits in every hop of this chain?&lt;/strong&gt; &lt;br&gt;
Not the defaults, the actual limits on this account, in this region, this month. Lambda concurrency, API Gateway RPS, DynamoDB provisioned throughput, SQS message size, Kinesis shard count. Put them in a table. The ceiling of the whole system is the lowest number on the page.&lt;/p&gt;
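&lt;p&gt;Once the table exists, finding the ceiling is one line. A sketch with illustrative numbers (normalise every hop's quota to effective requests per second before comparing):&lt;/p&gt;

```python
def system_ceiling(hop_limits_rps):
    """The chain's real capacity is its weakest hop."""
    weakest = min(hop_limits_rps, key=hop_limits_rps.get)
    return weakest, hop_limits_rps[weakest]

limits_rps = {        # illustrative: use your account's actual quotas
    "api_gateway": 5000,
    "lambda": 500,    # concurrency divided by average duration in seconds
    "dynamodb": 2000,
}
print(system_ceiling(limits_rps))  # ('lambda', 500)
```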

&lt;p&gt;*&lt;em&gt;Is the timeout hierarchy consistent? *&lt;/em&gt;&lt;br&gt;
The function timeout must be shorter than the visibility timeout, which must be shorter than the retry window, which must be shorter than any upstream timeout. Any inversion creates ghost retries invocations that succeed but get replayed because the upstream decided they'd failed.&lt;/p&gt;
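&lt;p&gt;The hierarchy check is trivial to automate. A sketch (all values in seconds):&lt;/p&gt;

```python
def timeout_hierarchy_ok(function_timeout, visibility_timeout, retry_window, upstream_timeout):
    """True when every layer leaves room for the one below it: function
    timeout, then visibility timeout, then retry window, then upstream
    timeout, each strictly larger than the last."""
    chain = [function_timeout, visibility_timeout, retry_window, upstream_timeout]
    return all(later > earlier for earlier, later in zip(chain, chain[1:]))

print(timeout_hierarchy_ok(10, 60, 300, 900))  # True
print(timeout_hierarchy_ok(10, 10, 300, 900))  # False: the SQS trap from earlier
```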

&lt;p&gt;&lt;strong&gt;What's the error strategy, and is it written down?&lt;/strong&gt; &lt;br&gt;
Is this system at-least-once or exactly-once? When a record fails, does it halt the pipeline (preserving order, stopping throughput) or go to a dead-letter queue (preserving throughput, losing order)? There's no universally right answer, but there is a right answer for your business and it should be decided before traffic arrives, not during an incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are there native integrations you're replacing with custom glue?&lt;/strong&gt; If you're moving data from Kinesis to S3 with a Lambda function, you're probably reimplementing Amazon Data Firehose, badly. If you're parsing DynamoDB Stream records and writing them to OpenSearch with a Lambda, the Zero-ETL integration likely exists. Custom glue is the highest-maintenance part of any pipeline; push it into a managed service wherever possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 3: Shipping changes without breaking production
&lt;/h2&gt;

&lt;p&gt;A perfectly scaling system is one bad deploy away from an outage. The final part of the playbook is the deployment pipeline: specifically, how SAM and CodeDeploy work together to make Lambda deploys boring.&lt;/p&gt;

&lt;p&gt;The core primitives are Lambda versions (immutable snapshots of function code) and aliases (mutable pointers to versions, like &lt;code&gt;live&lt;/code&gt; or &lt;code&gt;canary&lt;/code&gt;). A SAM template with &lt;code&gt;AutoPublishAlias: live&lt;/code&gt; tells the deploy pipeline: every time my code changes, publish a new immutable version and shift the &lt;code&gt;live&lt;/code&gt; alias to point to it gradually, with monitoring and a kill switch.&lt;/p&gt;

&lt;p&gt;The mechanism behind that gradual shift is &lt;code&gt;DeploymentPreference&lt;/code&gt;. Three strategies are available:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AllAtOnce&lt;/strong&gt;: the default Lambda behaviour. Instant cutover. Fast and risky; appropriate only for non-production or for tooling functions where a failed invocation is inconvenient but not expensive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Linear&lt;/strong&gt;: shift traffic in fixed increments (e.g., &lt;code&gt;Linear10PercentEvery10Minutes&lt;/code&gt;). Simple, predictable, and gives alarms time to notice problems before they're global.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Canary&lt;/strong&gt;: shift a small slice (say 10%) immediately, hold for a configurable bake time, then shift the rest. Reaches full rollout faster than linear while still catching regressions on the canary slice.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
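&lt;p&gt;In SAM terms, the canary variant looks roughly like the following (resource, alarm, and hook names are illustrative):&lt;/p&gt;

```yaml
MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: app.handler
    AutoPublishAlias: live
    DeploymentPreference:
      Type: Canary10Percent15Minutes
      Alarms:
        - !Ref ErrorRateAlarm        # any alarm firing rolls traffic back
        - !Ref P99LatencyAlarm
      Hooks:
        PreTraffic: !Ref PreTrafficHookFunction
        PostTraffic: !Ref PostTrafficHookFunction
```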

&lt;p&gt;&lt;strong&gt;The deployment runs four phases:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Publish&lt;/strong&gt;. SAM publishes the new version as an immutable snapshot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pre-traffic validation.&lt;/strong&gt; CodeDeploy invokes a PreTraffic Lambda hook you provide: a synthetic transaction or smoke test run against the new version before any real traffic sees it. If the hook fails, the deploy halts immediately and &lt;code&gt;live&lt;/code&gt; stays on the old version.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weighted traffic shift.&lt;/strong&gt; CodeDeploy updates the alias to use weighted routing, sending a configurable percentage to the new version and the rest to the old. During the shift window, it watches the CloudWatch alarms you've listed (typically error rate, p99 latency, downstream throttling). If any alarm fires, traffic snaps back to 100% on the old version automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Post-traffic validation.&lt;/strong&gt; Once the shift completes, CodeDeploy runs a PostTraffic hook for final verification, then marks the deploy done.&lt;/p&gt;

&lt;p&gt;This is what "safe deploys" actually means in practice: not a manual runbook, but a machine that's watching metrics and can undo itself faster than you can type.&lt;/p&gt;

&lt;h2&gt;
  
  
  SAM or CDK?
&lt;/h2&gt;

&lt;p&gt;Both deploy via CloudFormation under the hood. SAM is declarative (YAML), with shorthand resources like &lt;code&gt;AWS::Serverless::Function&lt;/code&gt; that expand into a dozen primitive resources; it's the right choice when your infrastructure is mostly serverless and mostly stable. CDK is imperative (TypeScript, Python) and gives you loops, conditionals, abstractions, and IDE autocomplete; it's the right choice when your infrastructure has real logic, many environments, or needs reusable constructs across teams. For a single-team serverless app, SAM will get you there faster. For a platform that many teams build on, CDK's abstraction power pays off.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four habits that keep the pipeline boring
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Test at every stage, not at the end.&lt;/strong&gt; Linters and unit tests in the build stage; integration tests against a deployed staging environment; synthetic transactions in pre-traffic hooks; post-deploy smoke tests. Each stage catches a different class of bug and none of them are substitutes for each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One AWS account per environment.&lt;/strong&gt; Dev, staging, and production should be separate accounts, not separate regions or separate resource prefixes in one account. The boundary is for blast radius (a compromised dev IAM role can't reach production), cost attribution (one bill per environment), and accident prevention (you can't accidentally &lt;code&gt;terraform destroy&lt;/code&gt; prod if prod is in an account you're not authenticated against).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One template, parameterised per environment.&lt;/strong&gt; If your staging and production templates diverge, you stop testing production in staging. Use CloudFormation parameters for environment-specific values (table names, instance sizes, domain names) and keep the resource shape identical across environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secrets in Secrets Manager or Parameter Store, referenced dynamically.&lt;/strong&gt; Never bake credentials into environment variables at deploy time; you'll end up redeploying the app to rotate a secret. Reference secrets by ARN in the template, grant the function IAM permission to read them, and fetch them at runtime (with caching). Rotation becomes a secrets-manager operation, not a code deploy.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>awscommunity</category>
      <category>awscommunitybuilder</category>
    </item>
    <item>
      <title>Personal token factory: OpenClaw in AWS but Nvidia GB10 at home</title>
      <dc:creator>Piotr Pabis</dc:creator>
      <pubDate>Mon, 20 Apr 2026 07:14:17 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/personal-token-factory-openclaw-in-aws-but-nvidia-gb10-at-home-3klk</link>
      <guid>https://future.forem.com/aws-builders/personal-token-factory-openclaw-in-aws-but-nvidia-gb10-at-home-3klk</guid>
      <description>&lt;p&gt;Even though Nemotron 3 Super is still free on OpenRouter, the agreement is that you donate all your exchanges to Nvidia for training. A paid version is available also quite cheaply ($0.10/M input, $0.50/M output.) I still decided to utilize my ASUS GX10 (aka DGX Spark aka GB10) as the token source for my agent hosted at AWS. But the trick here is the following: I don't want to open my home network to the outside Internet! Another rule: I don't want to pay for any public IPv4 address to AWS. Will I be able to achieve that? That's actually simple! Let me guide you today how I achieved that setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanxmh8apbb4iw3ivtat2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanxmh8apbb4iw3ivtat2.jpg" alt="Diagram of the setup" width="800" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Above you can see that I plan to tunnel between the home and AWS networks using Wireguard. I could potentially make this site-to-site, but to keep it simpler and safer for my home devices, I will only connect to the Wireguard server (listener) on the AWS side, with the DGX Spark as a client. I will do it over IPv6. At the bottom you can see that my home devices have normal connectivity to both the IPv6 and IPv4 Internet, but on the AWS side I am relying solely on IPv6 (although internally the VPC still uses a private IPv4 range). There will be some issues with that, but we will fix them later.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Companion GitHub repo: &lt;a href="https://github.com/ppabis/wireguard-openclaw-dgx" rel="noopener noreferrer"&gt;github.com/ppabis/wireguard-openclaw-dgx&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up VPC and Wireguard server
&lt;/h2&gt;

&lt;p&gt;I have created a simple VPC using a module, with the range &lt;code&gt;10.189.80.0/21&lt;/code&gt; and IPv6 enabled. It has an Internet Gateway for the public subnets and an egress-only Internet Gateway for IPv6 outbound connectivity. I disabled the NAT gateway on purpose; we will cover this issue later. Instances in the public subnet will be reachable over IPv6 from the outside - this is where we will place our Wireguard server. OpenClaw will remain inside the private subnet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_availability_zones"&lt;/span&gt; &lt;span class="s2"&gt;"available"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"available"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"vpc"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/vpc/aws"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 6.6.0"&lt;/span&gt;

  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-vpc"&lt;/span&gt;
  &lt;span class="nx"&gt;cidr&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.189.80.0/21"&lt;/span&gt;

  &lt;span class="nx"&gt;azs&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_availability_zones&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;available&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;public_subnets&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.189.80.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.189.81.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.189.82.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;private_subnets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.189.83.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.189.84.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.189.85.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;enable_dns_hostnames&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;enable_dns_support&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;enable_nat_gateway&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

  &lt;span class="nx"&gt;enable_ipv6&lt;/span&gt;                                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;public_subnet_assign_ipv6_address_on_creation&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;private_subnet_assign_ipv6_address_on_creation&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;public_subnet_ipv6_prefixes&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;private_subnet_ipv6_prefixes&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;private_subnet_tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"private"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;public_subnet_tags&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"public"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
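
&lt;p&gt;A quick note on the &lt;code&gt;*_ipv6_prefixes&lt;/code&gt; lists: AWS assigns the VPC a /56 block, and each prefix index selects one /64 subnet inside it. With a hypothetical VPC block of &lt;code&gt;2a05:d012:e21:2300::/56&lt;/code&gt; (yours will differ), the indices map out like this sketch shows:&lt;/p&gt;

```shell
# Hypothetical /56 the VPC received from AWS; only the last byte of the
# fourth hextet is free for subnetting into /64s.
base='2a05:d012:e21:23'
for idx in 0 1 2 3 4 5; do
  # Index N fills that low byte (in hex), selecting the N-th /64.
  printf 'prefix %d selects %s%02x::/64\n' "$idx" "$base" "$idx"
done
```

So prefixes 0-2 become the three public /64s and 3-5 the private ones, matching the subnet lists in the module.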



&lt;h2&gt;
  
  
  Creating Wireguard server instance
&lt;/h2&gt;

&lt;p&gt;First of all, I will create an EC2 instance with the latest Amazon Linux 2023. It already includes WireGuard in its repositories, and the update servers work over IPv6. I will run my WireGuard server on port &lt;code&gt;51280&lt;/code&gt;, so that's what I will open on the security group. All egress should also be open. I will attach an IAM role to it as well - no permissions yet, but it will come in handy later. First, define all the smaller components.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_ssm_parameter"&lt;/span&gt; &lt;span class="s2"&gt;"al2023_arm64"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name_prefix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"wireguard-sg"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;51280&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;51280&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"udp"&lt;/span&gt;
    &lt;span class="nx"&gt;ipv6_cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"::/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;egress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"-1"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;ipv6_cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"::/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard_assume_role_policy"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;principals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Service"&lt;/span&gt;
      &lt;span class="nx"&gt;identifiers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ec2.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"wg-ec2-role"&lt;/span&gt;
  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard_assume_role_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_instance_profile"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"wg-ec2-profile"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And finally we can define the EC2 instance. It will be the cheapest and smallest one I can find, which is &lt;code&gt;t4g.nano&lt;/code&gt;. The AMI ID is pulled from a publicly shared SSM parameter (easier than an AMI data source in Terraform). On the networking side, I'm disabling public IPv4 assignment, forcing at least one IPv6 address, and disabling the source/destination check (more on that later). I didn't define any SSH connectivity, nor EC2 Instance Connect or Session Manager. The cheapest option here would be to define a key pair and open port &lt;code&gt;22&lt;/code&gt; over IPv6 to your subnet if you need debugging.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3zcptdl3gik6sz0dd3ar.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3zcptdl3gik6sz0dd3ar.jpg" alt="EC2 instance diagram" width="637" height="304"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_instance"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ami&lt;/span&gt;                         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_ssm_parameter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;al2023_arm64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;
  &lt;span class="nx"&gt;instance_type&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"t4g.nano"&lt;/span&gt;
  &lt;span class="nx"&gt;iam_instance_profile&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_instance_profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_subnets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_security_group_ids&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;associate_public_ip_address&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;ipv6_address_count&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="nx"&gt;source_dest_check&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

  &lt;span class="nx"&gt;user_data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_data&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Wireguard"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ignore_changes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;ami&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"ipv6"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ipv6_addresses&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing and configuring Wireguard on the server
&lt;/h2&gt;

&lt;p&gt;How would we install the server if we have no SSH or any other shell access to the instance? As you see above, I have defined user data. With plain Bash, this is just a script that runs on the first instance boot. For our use case, we want to be able to control the instance contents dynamically, so we will use cloud-init, which allows for more flexibility in the user data contents. We will start with a draft defining "attachments" containing both cloud-init's cloud-config (YAML) and a standard Bash script that runs on boot. I'm writing this in a new file called &lt;code&gt;user-data.yaml&lt;/code&gt; (although it's not a valid YAML file).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Content-Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;multipart/mixed; boundary="//"&lt;/span&gt;
&lt;span class="na"&gt;MIME-Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt;

&lt;span class="s"&gt;--//&lt;/span&gt;
&lt;span class="na"&gt;Content-Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;text/cloud-config; charset="us-ascii"&lt;/span&gt;
&lt;span class="na"&gt;MIME-Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt;
&lt;span class="na"&gt;Content-Transfer-Encoding&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;7bit&lt;/span&gt;
&lt;span class="na"&gt;Content-Disposition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;attachment; filename="cloud-config.txt"&lt;/span&gt;

&lt;span class="c1"&gt;#cloud-config&lt;/span&gt;
&lt;span class="na"&gt;package_update&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="s"&gt;--//&lt;/span&gt;
&lt;span class="na"&gt;Content-Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;text/x-shellscript; charset="us-ascii"&lt;/span&gt;
&lt;span class="na"&gt;MIME-Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt;
&lt;span class="na"&gt;Content-Transfer-Encoding&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;7bit&lt;/span&gt;
&lt;span class="na"&gt;Content-Disposition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;attachment; filename="userdata.txt"&lt;/span&gt;

&lt;span class="c1"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="s"&gt;echo "Configuring WireGuard..."&lt;/span&gt;
&lt;span class="s"&gt;--//--&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;em&gt;Side note&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;If you know cloud-init, you might wonder why I also want to use user data. For some reason &lt;code&gt;runcmd&lt;/code&gt; doesn't always execute when I want it to, and &lt;code&gt;bootcmd&lt;/code&gt; happens too early.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As you can see, we have two sections in this file. The first is a cloud-config that will just update packages on startup. The second, a script, will also execute on first boot and just echo a message. Now we can configure each module to run on every restart. In the cloud-config, add this (inserted between the boundaries):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;#cloud-config&lt;/span&gt;
&lt;span class="na"&gt;package_update&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;cloud_final_modules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;scripts-user&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;cloud_config_modules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;write_files&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;package_update_upgrade_install&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now all the sections (that we will define soon) will run on every instance reboot. This gives us some flexibility in changing the configuration (it requires a reboot, but how often do you plan to change this 😄). Let us install the required packages: WireGuard itself, iptables for routing capabilities, and &lt;code&gt;tmux&lt;/code&gt; and &lt;code&gt;htop&lt;/code&gt;, which come in handy if you need to debug.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;packages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;wireguard-tools&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;iptables-nft&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;htop&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tmux&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's proceed with the WireGuard configuration. As previously stated, I want to use &lt;code&gt;10.155.222.0/24&lt;/code&gt; as the subnet and listen on port &lt;code&gt;51280&lt;/code&gt;. The private key will be a placeholder for safety reasons. When the tunnel is brought up, we are going to enable IP forwarding in the kernel and allow forwarding between &lt;code&gt;wg0&lt;/code&gt;, WireGuard's interface, and &lt;code&gt;ens5&lt;/code&gt; (the primary network card, at least in AL2023). The router (server) address will be the first one in the subnet: &lt;code&gt;10.155.222.1&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;write_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/wireguard/wg0.conf&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|-&lt;/span&gt;
      &lt;span class="s"&gt;[Interface]&lt;/span&gt;
      &lt;span class="s"&gt;Address    = 10.155.222.1/24&lt;/span&gt;
      &lt;span class="s"&gt;ListenPort = 51280&lt;/span&gt;
      &lt;span class="s"&gt;PrivateKey = _PRIVATE_KEY_&lt;/span&gt;

      &lt;span class="s"&gt;# Enable routing + NAT for WG clients to reach VPC&lt;/span&gt;
      &lt;span class="s"&gt;PostUp   = sysctl -w net.ipv4.ip_forward=1&lt;/span&gt;
      &lt;span class="s"&gt;PostUp   = iptables -A FORWARD -i wg0 -o ens5 -j ACCEPT&lt;/span&gt;
      &lt;span class="s"&gt;PostUp   = iptables -A FORWARD -i ens5 -o wg0 -j ACCEPT&lt;/span&gt;

      &lt;span class="s"&gt;PostDown = iptables -D FORWARD -i wg0 -o ens5 -j ACCEPT&lt;/span&gt;
      &lt;span class="s"&gt;PostDown = iptables -D FORWARD -i ens5 -o wg0 -j ACCEPT&lt;/span&gt;
      &lt;span class="s"&gt;# End of file&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This configuration is not usable yet - we still need another part of the user data attachment. In the Bash script that runs on every machine startup, we are going to generate WireGuard's key pair if it doesn't exist, and then replace the &lt;code&gt;_PRIVATE_KEY_&lt;/code&gt; placeholder with &lt;code&gt;sed&lt;/code&gt;. But that's not all! We also need the server's public key in order to connect. As I don't want any SSH-like connectivity to this server, the script will export the public key to AWS Systems Manager Parameter Store.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;--&lt;/span&gt;//
Content-Type: text/x-shellscript&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nv"&gt;charset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"us-ascii"&lt;/span&gt;
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nv"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"userdata.txt"&lt;/span&gt;

&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-eo&lt;/span&gt; pipefail

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Configuring WireGuard..."&lt;/span&gt;
&lt;span class="c"&gt;# If there's no private key, generate the private key into a file and derive&lt;/span&gt;
&lt;span class="c"&gt;# the public one also into a file.&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /etc/wireguard/private.key &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;wg genkey | &lt;span class="nb"&gt;tee&lt;/span&gt; /etc/wireguard/private.key | wg pubkey &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /etc/wireguard/public.key
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Replace the private key placeholder if it exists in wg0.conf&lt;/span&gt;
&lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"s#_PRIVATE_KEY_#&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /etc/wireguard/private.key&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;#g"&lt;/span&gt; /etc/wireguard/wg0.conf

&lt;span class="c"&gt;# Export the public key to SSM Parameter Store. Use IPv6 endpoint.&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Public key: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /etc/wireguard/public.key&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_USE_DUALSTACK_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;aws ssm put-parameter &lt;span class="nt"&gt;--type&lt;/span&gt; &lt;span class="s2"&gt;"String"&lt;/span&gt; &lt;span class="nt"&gt;--overwrite&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"/wireguard/public-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--value&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /etc/wireguard/public.key&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Start the tunnel.&lt;/span&gt;
wg-quick up wg0

&lt;span class="nt"&gt;--&lt;/span&gt;//--
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
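
&lt;p&gt;One detail worth calling out: WireGuard keys are base64 and may contain &lt;code&gt;/&lt;/code&gt; and &lt;code&gt;+&lt;/code&gt;, which is why the &lt;code&gt;sed&lt;/code&gt; command above uses &lt;code&gt;#&lt;/code&gt; as its delimiter instead of the usual slash. A minimal local check of that substitution (the key below is a dummy value, not a real one):&lt;/p&gt;

```shell
# Simulate the placeholder replacement on a throwaway config file.
conf=$(mktemp)
echo 'PrivateKey = _PRIVATE_KEY_' | tee "$conf"
key='aBc/dEf+GhIjKlMnOpQrStUvWxYz0123456789ABCDE='   # dummy key
# '#' as delimiter keeps the '/' inside the key from breaking the expression.
sed -i "s#_PRIVATE_KEY_#${key}#g" "$conf"
grep "PrivateKey = ${key}" "$conf"
rm -f "$conf"
```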



&lt;p&gt;Currently, if you run the script, the machine will not be able to write into Parameter Store because the IAM role doesn't have such permissions. Add the following policy to the previously defined role.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_caller_identity"&lt;/span&gt; &lt;span class="s2"&gt;"X"&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_region"&lt;/span&gt; &lt;span class="s2"&gt;"X"&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard_ssm_policy"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ssm:PutParameter"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;resources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:ssm:&lt;/span&gt;&lt;span class="k"&gt;${data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_region&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_caller_identity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account_id&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:parameter/wireguard/public-key"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard_ssm_policy"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"wg-ssm-policy"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard_ssm_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the last part, add the user data local variable. I will use &lt;code&gt;templatefile&lt;/code&gt; because we are going to do some dynamic things later, so it will come in handy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;user_data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;templatefile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;module}&lt;/span&gt;&lt;span class="s2"&gt;/user-data.yaml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you now deploy this infrastructure, after a few minutes you should get a public key in SSM Parameter Store under &lt;code&gt;/wireguard/public-key&lt;/code&gt;. You can use it to configure the client connections. You can also use the following command to get the public key value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ssm get-parameter &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--name&lt;/span&gt; /wireguard/public-key &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--query&lt;/span&gt; Parameter.Value &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--output&lt;/span&gt; text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
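
&lt;p&gt;A WireGuard key is 32 random bytes encoded as base64, so whatever the command returns should be exactly 44 characters ending in &lt;code&gt;=&lt;/code&gt;. A quick sanity check like this sketch can catch a truncated copy-paste (the key below is a dummy value):&lt;/p&gt;

```shell
# 32 bytes of base64 is 43 base64 characters plus one '=' of padding.
key='sJQTQ7sytRuLBK74Y24TNtXrHqrmpRb+ZsT9olMfXQ4='   # dummy key
if printf '%s' "$key" | grep -Eq '^[A-Za-z0-9+/]{43}=$'; then
  echo 'key looks well-formed'
else
  echo 'key is malformed, re-check the SSM parameter'
fi
```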



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcqlnup2vlths6gmvxsx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcqlnup2vlths6gmvxsx.png" alt="Public key in parameter store" width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up client
&lt;/h2&gt;

&lt;p&gt;On the client side (my DGX Spark), I will now SSH in and install WireGuard there as well. I am using the default Ubuntu installation. We are going to generate a new private key, derive the public key, and save the configuration. You need to run the following commands as root.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apt &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nb"&gt;install &lt;/span&gt;wireguard wireguard-tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Define some variables: the public key from SSM Parameter Store and the IPv6 address from the Terraform output. Also configure the path where you want to keep the configuration. I will use the &lt;code&gt;wg1&lt;/code&gt; interface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;WG_SERVER_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sJQTQ7sytRuLBK74Y24TNtXrHqrmpRb+ZsT9olMfXQ4="&lt;/span&gt; &lt;span class="c"&gt;# Key from SSM&lt;/span&gt;
&lt;span class="nv"&gt;WG_SERVER_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"2a05:d012:e21:2345:6789:01ab:cdef:9dd7"&lt;/span&gt; &lt;span class="c"&gt;# IPv6 from AWS&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /etc/wireguard/
&lt;span class="nv"&gt;WG_CONFIG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/etc/wireguard/wg1.conf
&lt;span class="nv"&gt;WG_PRIVATE_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;wg genkey&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;WG_PUBLIC_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$WG_PRIVATE_KEY&lt;/span&gt; | wg pubkey&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Public key = &lt;/span&gt;&lt;span class="nv"&gt;$WG_PUBLIC_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
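&lt;p&gt;A quick sanity check on the key material: Wireguard keys are 32-byte Curve25519 keys, base64-encoded into 44 characters. As a minimal sketch (a hypothetical helper, not part of the setup), you can validate a key you paste from SSM like this:&lt;/p&gt;

```python
import base64
import binascii

def is_valid_wg_key(key: str) -> bool:
    """A Wireguard public or private key decodes to exactly 32 bytes."""
    try:
        return len(base64.b64decode(key, validate=True)) == 32
    except binascii.Error:
        return False

# The server key retrieved from SSM earlier in the post
print(is_valid_wg_key("sJQTQ7sytRuLBK74Y24TNtXrHqrmpRb+ZsT9olMfXQ4="))  # True
```

&lt;p&gt;This catches the usual copy-paste accidents (truncated value, stray whitespace) before you commit a broken key to the config.&lt;/p&gt;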



&lt;p&gt;Then generate the configuration. If you are behind a firewall, pick the port you want to listen on for incoming VPN traffic. Choose a unique address from the VPN subnet pool. Optionally set the DNS resolver to the AWS VPC one (the second address of the subnet's CIDR). &lt;code&gt;AllowedIPs&lt;/code&gt; is an unfortunate name, but these are the routes that should go through the VPN - I set them to the VPC CIDR and the VPN's internal subnet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$WG_CONFIG&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
[Interface]
PrivateKey = &lt;/span&gt;&lt;span class="nv"&gt;$WG_PRIVATE_KEY&lt;/span&gt;&lt;span class="sh"&gt;
# Public Key: &lt;/span&gt;&lt;span class="nv"&gt;$WG_PUBLIC_KEY&lt;/span&gt;&lt;span class="sh"&gt;
Address = 10.155.222.3/32
DNS = 10.189.80.2    # Optional
ListenPort = 62910   # You can skip this if you don't have firewall

[Peer]
PublicKey = &lt;/span&gt;&lt;span class="nv"&gt;$WG_SERVER_KEY&lt;/span&gt;&lt;span class="sh"&gt;
AllowedIPs = 10.189.80.0/21, 10.155.222.0/24 # Connectivity to VPC and VPN
PersistentKeepalive = 25
Endpoint = [&lt;/span&gt;&lt;span class="nv"&gt;$WG_SERVER_IP&lt;/span&gt;&lt;span class="sh"&gt;]:51280
&lt;/span&gt;&lt;span class="no"&gt;
EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
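&lt;p&gt;To make the &lt;code&gt;AllowedIPs&lt;/code&gt; behaviour concrete: wg-quick installs one route per listed prefix, and only destinations inside those prefixes are sent through the tunnel. A minimal Python sketch of that decision, using the two prefixes from the config above (illustration only, not part of the setup):&lt;/p&gt;

```python
import ipaddress

# Routes that wg-quick installs from AllowedIPs: VPC CIDR + VPN subnet
ALLOWED_IPS = [
    ipaddress.ip_network("10.189.80.0/21"),   # VPC CIDR
    ipaddress.ip_network("10.155.222.0/24"),  # Wireguard internal subnet
]

def routed_via_vpn(destination: str) -> bool:
    """Return True if traffic to `destination` would be sent through wg1."""
    addr = ipaddress.ip_address(destination)
    return any(addr in net for net in ALLOWED_IPS)

print(routed_via_vpn("10.189.83.70"))  # address inside the VPC -> True
print(routed_via_vpn("1.1.1.1"))       # public internet -> False
```

&lt;p&gt;Everything else keeps using the default route, which is why the rest of the client's traffic is unaffected by the VPN.&lt;/p&gt;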



&lt;h2&gt;
  
  
  Allowing new client on the Wireguard server
&lt;/h2&gt;

&lt;p&gt;Now that you have the new public key for your home client, we need to enable it on the Wireguard server. As you remember, we used &lt;code&gt;templatefile&lt;/code&gt; to load the user data. This comes in handy now, as it lets us configure multiple clients. Let's revisit the &lt;code&gt;write_files&lt;/code&gt; section and modify the end of the file, after the iptables commands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;write_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/wireguard/wg0.conf&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|-&lt;/span&gt;
      &lt;span class="s"&gt;[Interface]&lt;/span&gt;
      &lt;span class="s"&gt;Address    = 10.155.222.1/24&lt;/span&gt;
      &lt;span class="s"&gt;ListenPort = 51280&lt;/span&gt;
      &lt;span class="s"&gt;PrivateKey = _PRIVATE_KEY_&lt;/span&gt;

      &lt;span class="s"&gt;# Enable routing + NAT for WG clients to reach VPC&lt;/span&gt;
      &lt;span class="s"&gt;PostUp   = sysctl -w net.ipv4.ip_forward=1&lt;/span&gt;
      &lt;span class="s"&gt;PostUp   = iptables -A FORWARD -i wg0 -o ens5 -j ACCEPT&lt;/span&gt;
      &lt;span class="s"&gt;PostUp   = iptables -A FORWARD -i ens5 -o wg0 -j ACCEPT&lt;/span&gt;

      &lt;span class="s"&gt;PostDown = iptables -D FORWARD -i wg0 -o ens5 -j ACCEPT&lt;/span&gt;
      &lt;span class="s"&gt;PostDown = iptables -D FORWARD -i ens5 -o wg0 -j ACCEPT&lt;/span&gt;

      &lt;span class="s"&gt;%{~ for peer in peers ~}&lt;/span&gt;
      &lt;span class="s"&gt;[Peer]&lt;/span&gt;
      &lt;span class="s"&gt;PublicKey = ${peer.public_key}&lt;/span&gt;
      &lt;span class="s"&gt;AllowedIPs = ${peer.address}/32&lt;/span&gt;
      &lt;span class="s"&gt;%{~ endfor ~}&lt;/span&gt;

      &lt;span class="s"&gt;# End of file&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above for loop generates a peer entry for each client on the Wireguard server. In the template variables you now have to set &lt;code&gt;peers&lt;/code&gt;, a list of objects with &lt;code&gt;public_key&lt;/code&gt; and &lt;code&gt;address&lt;/code&gt; keys. Revisit the locals of the EC2 instance. Applying this will reboot the instance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;user_data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;templatefile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"user-data.yaml"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;peers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;address&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.155.222.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# the IP you selected for the client&lt;/span&gt;
        &lt;span class="nx"&gt;public_key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"bxmMoVvXlVVRg7uaTnxI6Vf7wxeI0XWj5d6zREqDkzk="&lt;/span&gt; &lt;span class="c1"&gt;# the public key of the client&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
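&lt;p&gt;For illustration, here is roughly what the &lt;code&gt;%{ for peer in peers }&lt;/code&gt; loop produces, sketched in Python as a stand-in for Terraform's template engine (the key and address are the example values from the locals above):&lt;/p&gt;

```python
# A stand-in for the %{ for peer in peers } loop in user-data.yaml
peers = [
    {
        "address": "10.155.222.3",
        "public_key": "bxmMoVvXlVVRg7uaTnxI6Vf7wxeI0XWj5d6zREqDkzk=",
    },
]

def render_peers(peers: list) -> str:
    """Render one [Peer] block per client, as the templatefile loop does."""
    blocks = []
    for peer in peers:
        blocks.append(
            "[Peer]\n"
            f"PublicKey = {peer['public_key']}\n"
            f"AllowedIPs = {peer['address']}/32"
        )
    return "\n\n".join(blocks)

print(render_peers(peers))
```

&lt;p&gt;Adding a second client is then just another entry in the list, followed by a Terraform apply.&lt;/p&gt;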



&lt;p&gt;Now if you bring up the interface, the status should show a latest handshake and some received data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;wg-quick up wg1
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#] ip link add wg1 type wireguard&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#] wg setconf wg1 /dev/fd/63&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#] ip -4 address add 10.155.222.3/32 dev wg1&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#] ip link set mtu 1420 up dev wg1&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#] ip -4 route add 10.155.222.0/24 dev wg1&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="c"&gt;#] ip -4 route add 10.189.80.0/21 dev wg1&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;wg
interface: wg1
  public key: &lt;span class="nv"&gt;bxmMoVvXlVVRg7uaTnxI6Vf7wxeI0XWj5d6zREqDkzk&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;
  private key: &lt;span class="o"&gt;(&lt;/span&gt;hidden&lt;span class="o"&gt;)&lt;/span&gt;
  listening port: 62910

peer: sJQTQ7sytRuLBK74Y24TNtXrHqrmpRb+ZsT9olMfXQ4&lt;span class="o"&gt;=&lt;/span&gt;
  endpoint: &lt;span class="o"&gt;[&lt;/span&gt;2a05:d012:e21:2345:6789:01ab:cdef:9dd7]:51280
  allowed ips: 10.189.80.0/21, 10.155.222.0/24
  latest handshake: 22 seconds ago
  transfer: 92 B received, 180 B sent
  persistent keepalive: every 25 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have created a test internal application load balancer. It listens on port 80 and allows traffic from the CIDR &lt;code&gt;10.155.222.0/24&lt;/code&gt; (&lt;strong&gt;not&lt;/strong&gt; the VPC CIDR!). But before this can be used, you also have to define routes for Wireguard's subnet. This is why we turned off the source/destination check on the network interface of the EC2 instance: with the check disabled, packets destined for Wireguard clients (whether requests or responses) will be accepted by the EC2 instance even though the destination isn't any of the instance's own IPs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_route"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard_tunnel_prefix"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;for_each&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_route_table_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_route_table_ids&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="nx"&gt;route_table_id&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;
  &lt;span class="nx"&gt;destination_cidr_block&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.155.222.0/24"&lt;/span&gt;
  &lt;span class="nx"&gt;network_interface_id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wireguard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;primary_network_interface_id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Afterwards I tested with cURL, and connectivity was established! The resolved IPs are in the private range, as you can see below. I even tried DNS, and it was also functional through the Wireguard interface - that way we can later set up some private domains or use Cloud Map.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl http://internal-mytest-alb-1234567890.eu-west-3.elb.amazonaws.com
hello world

&lt;span class="nv"&gt;$ &lt;/span&gt;dig +short internal-mytest-alb-1234567890.eu-west-3.elb.amazonaws.com
10.189.83.70
10.189.85.144

&lt;span class="nv"&gt;$ &lt;/span&gt;resolvectl query internal-mytest-alb-1234567890.eu-west-3.elb.amazonaws.com
internal-mytest-alb-1234567890.eu-west-3.elb.amazonaws.com: 10.189.83.70 &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nb"&gt;link&lt;/span&gt;: wg1
                                                            10.189.85.144 &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nb"&gt;link&lt;/span&gt;: wg1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing LLM server
&lt;/h2&gt;

&lt;p&gt;Now we need to install and configure Ollama. Just follow the instructions on &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;ollama.com&lt;/a&gt; to install it, or use any other LLM server you wish. Be sure that it is listening on all addresses and not just localhost. Create the following SystemD override (on Ubuntu).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /etc/systemd/system/ollama.service.d/override.conf &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=212000"
Environment="OLLAMA_KV_CACHE_TYPE=q8_0"
Environment="OLLAMA_KEEP_ALIVE=2400"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="nt"&gt;--now&lt;/span&gt; ollama
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can alternatively run it in Docker; using it with the GPU should be supported on DGX Spark out of the box. Choose only one option or the other, because both occupy the same port (11434), as in the command below!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--gpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;all &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-v&lt;/span&gt; ollama:/root/.ollama &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-p&lt;/span&gt; 11434:11434 &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--name&lt;/span&gt; ollama &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;unless-stopped &lt;span class="se"&gt;\&lt;/span&gt;
 ollama/ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For direct installations, you can just use &lt;code&gt;ollama pull &amp;lt;model&amp;gt;&lt;/code&gt; to download the model in advance. I will use Nvidia's Nemotron 3 Super, which was one of the best medium-sized models when I started writing this post. However, Gemma 4 and Qwen 3.6 have been released since, so you can experiment with those as well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull nemotron-3-super:120b-a12b-q4_K_M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
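&lt;p&gt;Once the model is pulled, Ollama serves an HTTP API on port 11434, reachable from the VPC over the tunnel. A minimal sketch of a request from the agent side, assuming Ollama's standard &lt;code&gt;/api/generate&lt;/code&gt; endpoint and the client VPN address &lt;code&gt;10.155.222.3&lt;/code&gt; configured earlier:&lt;/p&gt;

```python
import json
import urllib.request

# The DGX Spark is reachable over the tunnel at its VPN address
# (assumption: 10.155.222.3, the client IP configured earlier).
OLLAMA_URL = "http://10.155.222.3:11434/api/generate"

payload = {
    "model": "nemotron-3-super:120b-a12b-q4_K_M",  # model pulled above
    "prompt": "Why is the sky blue?",
    "stream": False,  # return one JSON object instead of a stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the tunnel is up:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["response"])
```

&lt;p&gt;Anything that can speak HTTP inside the VPC can use the model this way, which is exactly what the agent instance in the next section relies on.&lt;/p&gt;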



&lt;h2&gt;
  
  
  Connecting from an EC2 instance
&lt;/h2&gt;

&lt;p&gt;I will create another EC2 instance, which will be used for OpenClaw or any other system you want, such as Hermes or just a web app for chatting. I will prepare the instance first, using Ubuntu 24.04 and a &lt;code&gt;t4g.medium&lt;/code&gt; instance type. I will also create a new user data script that bootstraps some of the required packages. As we have IPv6 outbound connectivity from the private subnet, APT repositories should work without issues. I will also enable SSH access from Wireguard's inner subnet so that I can SSH to the instance; alternatively, you can use IPv6 after moving it to a public subnet, or AWS Systems Manager (SSM) if you configure it - it's up to you.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_ssm_parameter"&lt;/span&gt; &lt;span class="s2"&gt;"ubuntu_2404"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/aws/service/canonical/ubuntu/server/24.04/stable/current/arm64/hvm/ebs-gp3/ami-id"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"agent"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"openclaw-agent-sg"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;22&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;22&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.155.222.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;egress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"-1"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;ipv6_cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"::/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_instance"&lt;/span&gt; &lt;span class="s2"&gt;"agent"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;ami&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_ssm_parameter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ubuntu_2404&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;
  &lt;span class="nx"&gt;instance_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"t4g.medium"&lt;/span&gt;
  &lt;span class="nx"&gt;user_data&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"openclaw.yaml"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;tags&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"openclaw-agent"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_security_group_ids&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;associate_public_ip_address&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;ipv6_address_count&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

  &lt;span class="nx"&gt;metadata_options&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;http_endpoint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"enabled"&lt;/span&gt;
    &lt;span class="nx"&gt;http_tokens&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"required"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;root_block_device&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;volume_size&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
    &lt;span class="nx"&gt;volume_type&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"gp3"&lt;/span&gt;
    &lt;span class="nx"&gt;encrypted&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;delete_on_termination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ignore_changes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;ami&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"private_ip"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_ip&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the system config I will do the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add Node.js APT repository of version 24,&lt;/li&gt;
&lt;li&gt;install Node.js and unattended upgrades,&lt;/li&gt;
&lt;li&gt;enable unattended upgrades,&lt;/li&gt;
&lt;li&gt;enable AWS SSM (optional but useful).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I will also create a separate user for OpenClaw so that it has its own home directory and permissions. It is also very important to keep the default user, as this will allow you to SSH to the instance to onboard OpenClaw. If you wish, you can also specify SSH keys here or via AWS key pairs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;#cloud-config&lt;/span&gt;
&lt;span class="na"&gt;cloud_final_modules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;scripts-user&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;cloud_config_modules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;write_files&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;apt_configure&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;package_update_upgrade_install&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openclaw-agent&lt;/span&gt;
&lt;span class="na"&gt;create_hostname_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;apt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nodejs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;keyid&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2F59B5F99B1BE0B4&lt;/span&gt;
      &lt;span class="na"&gt;keyserver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;keyserver.ubuntu.com&lt;/span&gt;
      &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deb [signed-by=$KEY_FILE] https://deb.nodesource.com/node_24.x nodistro main&lt;/span&gt;

&lt;span class="na"&gt;package_update&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;package_upgrade&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;packages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nodejs&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;unattended-upgrades&lt;/span&gt;

&lt;span class="na"&gt;users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openclaw&lt;/span&gt;
    &lt;span class="na"&gt;uid&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2200&lt;/span&gt;

&lt;span class="na"&gt;ssh_authorized_keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPgVNNOeuUqMgobgeIIkndXXYekOmC/e5bqty3f0UXDa my-ssh-key&lt;/span&gt;

&lt;span class="na"&gt;write_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/apt/apt.conf.d/20auto-upgrades&lt;/span&gt;
    &lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0644"&lt;/span&gt;
    &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root:root&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;APT::Periodic::Update-Package-Lists "1";&lt;/span&gt;
      &lt;span class="s"&gt;APT::Periodic::Unattended-Upgrade "1";&lt;/span&gt;

&lt;span class="na"&gt;runcmd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;systemctl enable --now unattended-upgrades || &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;loginctl enable-linger openclaw || &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="c1"&gt;# Enable SystemD on openclaw's user&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing OpenClaw - with a caveat 😳
&lt;/h2&gt;

&lt;p&gt;I SSH'd into the instance; Node.js should already be there based on the provided user data. So I switched to the new &lt;code&gt;openclaw&lt;/code&gt; user I defined and started the installation with npm.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh ubuntu@&lt;span class="si"&gt;$(&lt;/span&gt;tofu output &lt;span class="nt"&gt;-raw&lt;/span&gt; private_ip&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="c"&gt;# replace with your private IP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ssh"&gt;&lt;code&gt;&lt;span class="k"&gt;The&lt;/span&gt; authenticity of host '10.189.83.247 (10.189.83.247)' can't be established.
&lt;span class="k"&gt;ED25519&lt;/span&gt; key fingerprint is: SHA256:iMfJBU8iSEc5ikspbNKGD8jCAlLGwrOs28lbI4aPw2Q
&lt;span class="k"&gt;This&lt;/span&gt; key is not known by any other names.
&lt;span class="k"&gt;Are&lt;/span&gt; you sure you want to continue connecting (yes/no/[fingerprint])? &lt;span class="no"&gt;yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inside the machine&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;su openclaw &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /bin/bash
npm &lt;span class="nb"&gt;install &lt;/span&gt;openclaw@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="c"&gt;...
&lt;/span&gt;&lt;span class="go"&gt;4564 error A git connection error occurred
4565 error command git --no-replace-objects ls-remote ssh://git@github.com/whiskeysockets/libsignal-node.git
4566 error ssh: connect to host github.com port 22: Connection timed out
4566 error fatal: Could not read from remote repository.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This timeout happens because we don't have public IPv4 connectivity! There are two standard solutions - a public IP for the instance or a NAT Gateway - or...&lt;/p&gt;

&lt;h3&gt;
  
  
  Hosting tinyproxy on DGX Spark
&lt;/h3&gt;

&lt;p&gt;As we already have a connection to the other machines on the VPN, we can simply use one of them as the exit to the IPv4 internet. As a bonus, we retain our residential IP! So let's spin it up in Docker, but for that we first need some configuration in a &lt;code&gt;Dockerfile&lt;/code&gt; and &lt;code&gt;tinyproxy.conf&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; alpine:latest&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;apk add tinyproxy
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; tinyproxy.conf /etc/tinyproxy/tinyproxy.conf&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8888&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/usr/bin/tinyproxy"]&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["-d"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="err"&gt;Port&lt;/span&gt; &lt;span class="err"&gt;8888&lt;/span&gt;
&lt;span class="err"&gt;Timeout&lt;/span&gt; &lt;span class="err"&gt;600&lt;/span&gt;
&lt;span class="err"&gt;MaxClients&lt;/span&gt; &lt;span class="err"&gt;100&lt;/span&gt;
&lt;span class="err"&gt;ViaProxyName&lt;/span&gt; &lt;span class="err"&gt;"tinyproxy"&lt;/span&gt;

&lt;span class="err"&gt;User&lt;/span&gt; &lt;span class="err"&gt;nobody&lt;/span&gt;
&lt;span class="err"&gt;Group&lt;/span&gt; &lt;span class="err"&gt;nobody&lt;/span&gt;

&lt;span class="err"&gt;DefaultErrorFile&lt;/span&gt; &lt;span class="err"&gt;"/usr/share/tinyproxy/default.html"&lt;/span&gt;
&lt;span class="err"&gt;StatFile&lt;/span&gt; &lt;span class="err"&gt;"/usr/share/tinyproxy/stats.html"&lt;/span&gt;
&lt;span class="err"&gt;LogLevel&lt;/span&gt; &lt;span class="err"&gt;Info&lt;/span&gt;

&lt;span class="err"&gt;Allow&lt;/span&gt; &lt;span class="err"&gt;127.0.0.1&lt;/span&gt;
&lt;span class="py"&gt;Allow&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;:1&lt;/span&gt;
&lt;span class="err"&gt;Allow&lt;/span&gt; &lt;span class="err"&gt;10.155.222.0/24&lt;/span&gt;
&lt;span class="err"&gt;Allow&lt;/span&gt; &lt;span class="err"&gt;10.189.80.0/21&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From that config we can easily build the new Tinyproxy image and set it up to start on boot. Of course, all this connectivity then relies on our local machine being up, so it's only suitable for occasional IPv4 needs such as GitHub.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; local-tinyproxy:latest &lt;span class="nb"&gt;.&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--name&lt;/span&gt; tinyproxy &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--restart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;unless-stopped &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-p&lt;/span&gt; 8989:8888 &lt;span class="se"&gt;\&lt;/span&gt;
 local-tinyproxy:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can easily point clients at the new proxy service within Wireguard's network. However, to do this, we need to open port &lt;code&gt;8989&lt;/code&gt; (and &lt;code&gt;11434&lt;/code&gt; for Ollama) on Wireguard's instance. Why is that? Any packet sent from OpenClaw's instance in that direction will look like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;source address: &lt;code&gt;10.189.83.247&lt;/code&gt; (example),&lt;/li&gt;
&lt;li&gt;destination address: &lt;code&gt;10.155.222.3&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;source port: &lt;code&gt;59123&lt;/code&gt; (example),&lt;/li&gt;
&lt;li&gt;destination port: &lt;code&gt;8989&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Filtering at the AWS security group level cares about the source address and destination port, little else. The destination address is taken care of by the "source/destination" check of the network interface, the feature we just disabled. Let's update our security group.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"wireguard"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name_prefix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"wireguard-sg"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;51280&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;51280&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"udp"&lt;/span&gt;
    &lt;span class="nx"&gt;ipv6_cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"::/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11434&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11434&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;security_groups&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8989&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8989&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;security_groups&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;egress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"-1"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;ipv6_cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"::/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9p5otcbagpnyidjkz1aq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9p5otcbagpnyidjkz1aq.jpg" alt="Connection between OpenClaw and DGX over VPN" width="764" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you can configure NPM and Git to use the proxy, and OpenClaw should install with no issues. You can even test the connectivity with cURL first: if it responds with 500, this is fine; if 403, there might be a problem with &lt;code&gt;tinyproxy.conf&lt;/code&gt;; if the cURL command times out or reports "couldn't connect to server", suspect security groups, routes, or another firewall.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://10.155.222.3:8989 &lt;span class="nt"&gt;-X&lt;/span&gt; CONNECT | &lt;span class="nb"&gt;grep &lt;/span&gt;title
&lt;span class="c"&gt;# Test results&lt;/span&gt;
&lt;span class="c"&gt;# &amp;lt;title&amp;gt;500 Unable to connect&amp;lt;/title&amp;gt;&lt;/span&gt;
npm &lt;span class="nb"&gt;set &lt;/span&gt;https-proxy&lt;span class="o"&gt;=&lt;/span&gt;http://10.155.222.3:8989
npm &lt;span class="nb"&gt;set &lt;/span&gt;&lt;span class="nv"&gt;proxy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://10.155.222.3:8989
git config &lt;span class="nt"&gt;--global&lt;/span&gt; http.proxy http://10.155.222.3:8989
git config &lt;span class="nt"&gt;--global&lt;/span&gt; https.proxy http://10.155.222.3:8989
npm &lt;span class="nb"&gt;install &lt;/span&gt;openclaw@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  OpenClaw - Onboard!
&lt;/h2&gt;

&lt;p&gt;And we are almost done! The only thing left is to follow the onboarding process. Use the Ollama provider in local mode, set the DGX's IP on the Wireguard network, and choose the model from the list!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;XDG_RUNTIME_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/run/user/&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="c"&gt;# for systemd support&lt;/span&gt;
npx openclaw onboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6roww7inka2rusu3zdp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn6roww7inka2rusu3zdp.png" alt="OpenClaw onboarding" width="610" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I will demonstrate usage via the TUI rather than any instant messenger here. The first load of the latest OpenClaw build took around 90 seconds with Nemotron 3 Super (q4). After resetting the session, the first message took around 20 seconds (to load 12k of context), so most of the time went into loading the model into VRAM. How long the model stays in memory is controllable on Ollama's side. Each subsequent message still needs some extra time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx openclaw tui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdmfxiz0oswkh5rcsdhsr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdmfxiz0oswkh5rcsdhsr.png" alt="OpenClaw first message" width="518" height="209"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifveulpztugtv33r23jb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fifveulpztugtv33r23jb.png" alt="OpenClaw any other message" width="518" height="209"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To have some performance comparison, I decided to also try running the agent on latest Qwen 3.6 35B (&lt;code&gt;qwen3.6:35b-a3b-q8_0&lt;/code&gt;). The startup took around 30 seconds and each message takes maybe 10.&lt;/p&gt;

&lt;h2&gt;
  
  
  Power usages
&lt;/h2&gt;

&lt;p&gt;I also ordered a power meter to see how much the DGX machine draws in different situations. When it's completely idle, it takes around 30 W; with the model loaded into memory but unused, around 40 W; and during response generation it oscillates around 170 W. Let's make some assumptions: while you sleep you don't use OpenClaw at all but keep the DGX Spark on, so it draws 30 W for 7 hours. You are a very heavy user, writing to OpenClaw all day and scheduling a lot of tasks, basically treating it as a thinking extension, which totals 8 hours of pure generative work. For all the other time it just sits idle with the model loaded in memory.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;7 h * 30 W = 210 Wh&lt;/li&gt;
&lt;li&gt;8 h * 170 W = 1360 Wh&lt;/li&gt;
&lt;li&gt;9 h * 40 W = 360 Wh&lt;/li&gt;
&lt;li&gt;in total it is about 2 kWh&lt;/li&gt;
&lt;li&gt;assuming a German electricity price of 0.50€ per kilowatt-hour, that is about 1€ per day, or 30€ per month, just for pure token generation.&lt;/li&gt;
&lt;/ul&gt;
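&lt;p&gt;The napkin maths above can be reproduced in a few lines; the usage split and the 0.50€/kWh price are the assumptions stated above, not measured billing data:&lt;/p&gt;

```python
# Daily energy cost of the DGX Spark under the assumed usage pattern:
# 7 h asleep (idle), 8 h generating, 9 h loaded into memory but unused.
PRICE_EUR_PER_KWH = 0.50  # assumed German electricity price

usage_hours_watts = {
    "idle": (7, 30),
    "generating": (8, 170),
    "loaded_idle": (9, 40),
}

daily_wh = sum(h * w for h, w in usage_hours_watts.values())
daily_eur = daily_wh / 1000 * PRICE_EUR_PER_KWH

print(daily_wh)   # 1930 Wh, about 2 kWh per day
print(daily_eur)  # about 1 EUR per day, so roughly 30 EUR per month
```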

&lt;p&gt;Obviously you also have to consider the cost of the AWS EC2 instances. For the VPN server, if you commit to a one-year EC2 savings plan, you will pay around 25€; for the agent instance, if it's online all year round, around 200€ (note that OpenClaw is especially heavy compared to other harnesses).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxbf4jjt8youfqi2ymha.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxbf4jjt8youfqi2ymha.jpg" alt="Watt measurements" width="800" height="222"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you use a small model like Qwen 35B in Q8, you will still have room in VRAM for an image generator like Z-Image-Turbo, a small TTS model, or Whisper Turbo for speech recognition. I managed to easily fit image generation and VibeVoice ASR alongside the chatbot model in 100 gigs of RAM.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03r9u0n4n3hx4k6gm5g2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03r9u0n4n3hx4k6gm5g2.png" alt="100G RAM filled up" width="800" height="576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Cons of this setup
&lt;/h2&gt;

&lt;p&gt;Of course, a setup like this comes with tradeoffs. You keep your privacy and you maybe pay less for inference (that's debatable), but that's about it. The best setup is to have the GB10 as the primary model provider and fall back on something small on OpenRouter. There's always the possibility of mixing multiple models and providers using subagents: for example, for a coding task you use Sonnet 4.6 but keep local Qwen 3.6 for orchestration. Such a setup is far from highly available. Not only can the machine break, you can have a blackout, or the internet in your flat can go down. Another tradeoff is speed: most medium-sized models from OpenAI or Anthropic will still run faster than any decent model on a DGX Spark.&lt;/p&gt;

&lt;p&gt;The prefill (context loading) speed is 500 tokens per second for Nemotron 3 Super and token generation runs at 20 tokens per second. Assuming that in each 15-minute window we have to load 200k of context, we can generate 10k tokens; within 8 hours of daily generation that makes 6.4 M input tokens and 0.32 M output tokens. Prompt prefill accounts for around 40% of the time spent, so of the daily spend, 40 cents go to input tokens and 60 cents to output tokens. Normalized, this is around 0.06€ per million input tokens and 1.88€ per million output tokens.&lt;/p&gt;

&lt;p&gt;For Qwen 3.6 the speeds look different: 1150 tokens/s prefill and 39 tokens/s generation. In a 15-minute window (200k input context) the prefill accounts for about 20% of the time; the remaining 80% goes to token generation and produces about 28k tokens. So within a day we get 6.4 M input and 0.896 M output tokens; normalized, this makes 0.03€/Mtok input and 0.89€/Mtok output.&lt;/p&gt;
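&lt;p&gt;The same normalization, written out for both models. All figures are the assumptions from above (1€ of energy per day, 32 fifteen-minute windows, 200k context per window); the exact values round to the numbers quoted:&lt;/p&gt;

```python
# Normalize daily energy cost into EUR per million tokens for a model,
# given the fraction of time spent in prefill and the daily token counts.
def eur_per_mtok(daily_cost_eur, prefill_share, in_tokens, out_tokens):
    in_cost = daily_cost_eur * prefill_share / (in_tokens / 1e6)
    out_cost = daily_cost_eur * (1 - prefill_share) / (out_tokens / 1e6)
    return in_cost, out_cost

# Nemotron 3 Super: ~40% of time in prefill, 6.4M in / 0.32M out per day
nemotron = eur_per_mtok(1.0, 0.40, 6_400_000, 320_000)
# Qwen 3.6: ~20% of time in prefill, 6.4M in / 0.896M out per day
qwen = eur_per_mtok(1.0, 0.20, 6_400_000, 896_000)

print(nemotron)  # about (0.06, 1.88) EUR/Mtok in/out
print(qwen)      # about (0.03, 0.89) EUR/Mtok in/out
```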

&lt;p&gt;Are any of these competitive? It highly depends on your use case and usage patterns; everything above is just napkin maths. Gemma 4 31B over OpenRouter is just $0.13/$0.38 in/out, but GPT-5.4 Nano is $0.20/$1.25.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>nvidia</category>
      <category>vpn</category>
      <category>aws</category>
    </item>
    <item>
      <title>Serverless CDC and Event Ingestion Patterns into Analytics Pipelines on AWS</title>
      <dc:creator>Renaldi</dc:creator>
      <pubDate>Sun, 19 Apr 2026 23:00:00 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/serverless-cdc-and-event-ingestion-patterns-into-analytics-pipelines-on-aws-4i56</link>
      <guid>https://future.forem.com/aws-builders/serverless-cdc-and-event-ingestion-patterns-into-analytics-pipelines-on-aws-4i56</guid>
      <description>&lt;p&gt;When I work on analytics pipelines for event-driven systems, one of the biggest mistakes I see is treating ingestion as “just connect source A to sink B.”&lt;/p&gt;

&lt;p&gt;In production, ingestion is where a lot of the hard engineering lives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deciding which transport is actually right (EventBridge vs Kinesis vs SQS)&lt;/li&gt;
&lt;li&gt;handling ordering, duplication, and replay&lt;/li&gt;
&lt;li&gt;transforming events into a canonical analytics schema&lt;/li&gt;
&lt;li&gt;delivering to multiple sinks like S3, OpenSearch, and Redshift&lt;/li&gt;
&lt;li&gt;keeping the design cost-efficient as volume grows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why I like this topic. It is architecture-heavy, it shows real trade-offs, and it comes up constantly in real workloads.&lt;/p&gt;

&lt;p&gt;In this post, I will walk through a practical pattern for serverless CDC/event ingestion into analytics pipelines on AWS, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EventBridge vs Kinesis vs SQS decisioning&lt;/li&gt;
&lt;li&gt;Lambda transformations (normalization, enrichment, routing)&lt;/li&gt;
&lt;li&gt;delivery patterns to S3 / OpenSearch / Redshift&lt;/li&gt;
&lt;li&gt;handling ordering, duplication, and replay&lt;/li&gt;
&lt;li&gt;partitioning and cost optimization&lt;/li&gt;
&lt;li&gt;an end-to-end walkthrough and implementation discussion with code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I will focus on patterns that are accurate, scalable, and maintainable rather than “one service solves everything.”&lt;/p&gt;




&lt;h2&gt;
  
  
  The design principle I start with
&lt;/h2&gt;

&lt;p&gt;I design ingestion in layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ingress transport for delivery semantics (routing, throughput, ordering, buffering)&lt;/li&gt;
&lt;li&gt;Transformation layer for canonicalization and enrichment&lt;/li&gt;
&lt;li&gt;Durable landing zone (usually S3 first)&lt;/li&gt;
&lt;li&gt;Serving/analytics sinks (OpenSearch, Redshift, dashboards, ML features, etc.)&lt;/li&gt;
&lt;li&gt;Replay and recovery path as a first-class capability&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That structure helps me evolve the system without constantly rewriting downstream consumers.&lt;/p&gt;




&lt;h2&gt;
  
  
  EventBridge vs Kinesis vs SQS decisioning
&lt;/h2&gt;

&lt;p&gt;This is the first architectural decision, and it shapes everything else.&lt;/p&gt;

&lt;p&gt;The short version is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EventBridge is great for event routing and integration&lt;/li&gt;
&lt;li&gt;Kinesis Data Streams is great for high-throughput ordered streaming plus replay&lt;/li&gt;
&lt;li&gt;SQS is great for buffering and decoupled async processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I do not treat them as mutually exclusive. In many production designs, I use two or even all three, each for what it is best at.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick decision guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Amazon EventBridge when I need
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;event routing between services and teams&lt;/li&gt;
&lt;li&gt;content-based filtering and fan-out&lt;/li&gt;
&lt;li&gt;SaaS integrations and AWS service events&lt;/li&gt;
&lt;li&gt;schema governance and event contracts&lt;/li&gt;
&lt;li&gt;archive/replay on the event bus (for supported replay workflows)&lt;/li&gt;
&lt;li&gt;lower-to-moderate throughput domain events where strict ordering is not required&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use Amazon Kinesis Data Streams when I need
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;high-throughput event or CDC ingestion&lt;/li&gt;
&lt;li&gt;ordering per partition key&lt;/li&gt;
&lt;li&gt;multiple independent consumers at stream scale&lt;/li&gt;
&lt;li&gt;explicit replay from stream retention&lt;/li&gt;
&lt;li&gt;near-real-time analytics pipelines with controlled parallelism&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use Amazon SQS when I need
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;durable buffering and backpressure absorption&lt;/li&gt;
&lt;li&gt;decoupling between producers and consumers&lt;/li&gt;
&lt;li&gt;cheap asynchronous processing&lt;/li&gt;
&lt;li&gt;retry isolation and DLQ handling&lt;/li&gt;
&lt;li&gt;workload smoothing (especially spiky ingest)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  My common production pattern
&lt;/h3&gt;

&lt;p&gt;I often use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EventBridge for domain routing&lt;/li&gt;
&lt;li&gt;Kinesis for analytics ingestion backbone&lt;/li&gt;
&lt;li&gt;SQS for retry/backpressure side paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives me clean producer contracts and strong ingestion behavior.&lt;/p&gt;




&lt;h2&gt;
  
  
  What each service is not
&lt;/h2&gt;

&lt;p&gt;I find it useful to say this explicitly during architecture reviews.&lt;/p&gt;

&lt;h3&gt;
  
  
  EventBridge is not a high-throughput ordered stream
&lt;/h3&gt;

&lt;p&gt;It is excellent for routing, but it does not give me shard-style ordering or the stream-style replay semantics that Kinesis retention provides.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kinesis is not a drop-in replacement for event bus routing
&lt;/h3&gt;

&lt;p&gt;It gives throughput and ordering, but not the same out-of-the-box event routing and filtering ergonomics as EventBridge.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQS is not an analytics event backbone by itself
&lt;/h3&gt;

&lt;p&gt;It is amazing for buffering, but replay, retention, and consumer fan-out semantics are different from Kinesis, and standard queues do not preserve ordering.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reference architecture at a glance
&lt;/h2&gt;

&lt;p&gt;For this post, I will use a practical hybrid pattern that I use often for analytics ingestion:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;application and domain events are published to EventBridge&lt;/li&gt;
&lt;li&gt;CDC or high-volume events go to Kinesis Data Streams (directly or via a CDC bridge)&lt;/li&gt;
&lt;li&gt;Lambda transformer normalizes records into a canonical analytics schema&lt;/li&gt;
&lt;li&gt;canonical events are delivered to:

&lt;ul&gt;
&lt;li&gt;S3 (primary durable analytics landing zone, partitioned)&lt;/li&gt;
&lt;li&gt;OpenSearch (near-real-time search and observability use cases)&lt;/li&gt;
&lt;li&gt;Redshift Serverless (warehouse analytics, usually S3-first load pattern)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;SQS is used for retry isolation and backpressure for sink-specific processors&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Mermaid diagram (reference architecture)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuskg431o9qz8eil2phoj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuskg431o9qz8eil2phoj.png" alt=" " width="800" height="182"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  End-to-end walkthrough (what I will build conceptually)
&lt;/h2&gt;

&lt;p&gt;To make this concrete, I will walk through an example using an e-commerce platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;domain events like &lt;code&gt;OrderPlaced&lt;/code&gt; and &lt;code&gt;OrderShipped&lt;/code&gt; are published on EventBridge&lt;/li&gt;
&lt;li&gt;high-volume change events (for example inventory or order status updates) are ingested via Kinesis&lt;/li&gt;
&lt;li&gt;a Lambda transformer converts everything into a canonical analytics event&lt;/li&gt;
&lt;li&gt;events land in S3 as compressed JSON (or Parquet via Firehose conversion)&lt;/li&gt;
&lt;li&gt;selected events are indexed into OpenSearch&lt;/li&gt;
&lt;li&gt;Redshift loads from S3 for warehouse analytics&lt;/li&gt;
&lt;/ul&gt;
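&lt;p&gt;For the S3 landing step above, the partition layout is what keeps downstream scans cheap. A minimal sketch of one possible key scheme, partitioned by event type and event time (the prefix layout and helper name here are illustrative, not a fixed convention):&lt;/p&gt;

```python
# Build a partitioned S3 object key from the canonical event fields,
# so Athena/Redshift can prune by event_type, date, and hour.
from datetime import datetime

def s3_key(event_type, occurred_at, event_id):
    ts = datetime.strptime(occurred_at, "%Y-%m-%dT%H:%M:%SZ")
    return (f"events/event_type={event_type}/"
            f"dt={ts:%Y-%m-%d}/hour={ts:%H}/{event_id}.json.gz")

print(s3_key("order.placed", "2026-02-25T10:15:30Z", "evt_01HXYZ"))
# events/event_type=order.placed/dt=2026-02-25/hour=10/evt_01HXYZ.json.gz
```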

&lt;h3&gt;
  
  
  Why I like this pattern
&lt;/h3&gt;

&lt;p&gt;It lets me separate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;operational event routing (EventBridge)&lt;/li&gt;
&lt;li&gt;analytics ingestion behavior (Kinesis)&lt;/li&gt;
&lt;li&gt;durable storage and replay (S3 + retention)&lt;/li&gt;
&lt;li&gt;sink-specific delivery (OpenSearch, Redshift)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives me a pipeline that is easier to evolve as analytics use cases grow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Canonical event schema (the contract that keeps the pipeline sane)
&lt;/h2&gt;

&lt;p&gt;Before I write any code, I define a canonical schema. This is one of the highest-leverage things I do in analytics ingestion.&lt;/p&gt;

&lt;p&gt;I do not want every downstream consumer decoding a different source format.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example canonical schema
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"evt_01HXYZ..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"order.placed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"commerce.orders"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tenant_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"entity_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"entity_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ord_987"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"occurred_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-25T10:15:30Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ingested_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-25T10:15:31Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trace-abc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"idempotency_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"order.placed:tenant_123:ord_987:v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sequence_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tenant_123#ord_987"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"payload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"customer_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cus_1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;149.90&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"transport"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eventbridge"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"schema_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"order-events"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"schema_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.4.0"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Fields I specifically care about
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;event_id&lt;/code&gt;: unique event identity for dedupe and tracing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;occurred_at&lt;/code&gt;: source event time (for analytics)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ingested_at&lt;/code&gt;: pipeline time (for operations)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sequence_key&lt;/code&gt;: ordering scope (important for Kinesis partitioning and reasoning)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;idempotency_key&lt;/code&gt;: sink-safe dedupe key when replaying or retrying&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;event_version&lt;/code&gt;: schema evolution support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I skip this step, the pipeline quickly becomes fragile.&lt;/p&gt;
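&lt;p&gt;A quick guard at the start of the transformer makes these fields enforceable rather than aspirational. A minimal sketch (the helper name and the error format are mine; the field names come from the envelope above):&lt;/p&gt;

```python
# Envelope fields from the canonical schema above.
REQUIRED_FIELDS = {
    "event_id", "event_type", "occurred_at", "ingested_at",
    "sequence_key", "idempotency_key", "event_version", "payload",
}

def validate_envelope(event: dict) -> list:
    """Return a list of problems; an empty list means the envelope is usable."""
    problems = [f"missing:{name}" for name in sorted(REQUIRED_FIELDS - event.keys())]
    if not isinstance(event.get("payload"), dict):
        problems.append("payload:not_object")
    return problems
```

&lt;p&gt;Records that fail this check are what I quarantine instead of silently dropping.&lt;/p&gt;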




&lt;h2&gt;
  
  
  Reference implementation pattern (AWS services)
&lt;/h2&gt;

&lt;p&gt;For this walkthrough, the main flow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;EventBridge receives domain events&lt;/li&gt;
&lt;li&gt;EventBridge rule forwards analytics-relevant events to Kinesis Data Streams&lt;/li&gt;
&lt;li&gt;High-volume event sources and CDC sources publish directly to Kinesis Data Streams&lt;/li&gt;
&lt;li&gt;Lambda transformer consumes Kinesis batches&lt;/li&gt;
&lt;li&gt;Lambda normalizes and enriches records and writes:

&lt;ul&gt;
&lt;li&gt;primary path to Firehose -&amp;gt; S3&lt;/li&gt;
&lt;li&gt;selective path to SQS for OpenSearch indexing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Redshift Serverless loads from S3 (COPY and MERGE pattern)&lt;/li&gt;
&lt;li&gt;Replay and backfill can occur from Kinesis retention or S3 reprocessing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This keeps the ingestion backbone consistent while allowing different producers.&lt;/p&gt;
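&lt;p&gt;On the producer side, a service only needs to emit a well-formed EventBridge entry; the rule decides whether it reaches Kinesis. A minimal sketch (the helper and the order field names are illustrative; the &lt;code&gt;commerce.orders&lt;/code&gt; source and the &lt;code&gt;$.detail.orderId&lt;/code&gt; partition key path come from the reference template in this article):&lt;/p&gt;

```python
import json

def build_order_entry(order: dict) -> dict:
    """Build one put_events entry for a domain order event.
    Source matches the analytics rule's source filter; the Detail
    fields mirror what the downstream transformer reads."""
    return {
        "Source": "commerce.orders",
        "DetailType": "order placed",
        "Detail": json.dumps({
            "eventType": "order.placed",
            "orderId": order["order_id"],  # PartitionKeyPath $.detail.orderId reads this
            "tenantId": order.get("tenant_id", "default"),
        }),
    }

# A producer would then call:
#   boto3.client("events").put_events(Entries=[build_order_entry(order)])
```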




&lt;h2&gt;
  
  
  Infrastructure example (SAM / CloudFormation snippets)
&lt;/h2&gt;

&lt;p&gt;The snippet below shows a minimal but realistic foundation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kinesis Data Stream&lt;/li&gt;
&lt;li&gt;Lambda transformer&lt;/li&gt;
&lt;li&gt;Firehose delivery stream to S3&lt;/li&gt;
&lt;li&gt;SQS queue for indexing&lt;/li&gt;
&lt;li&gt;EventBridge rule that forwards selected events to Kinesis&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;This is intentionally a reference snippet (not a full production template) so the article stays readable.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;AWSTemplateFormatVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2010-09-09'&lt;/span&gt;
&lt;span class="na"&gt;Transform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless-2016-10-31&lt;/span&gt;
&lt;span class="na"&gt;Description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Serverless CDC/Event ingestion to analytics pipeline&lt;/span&gt;

&lt;span class="na"&gt;Resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;AnalyticsEventsStream&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Kinesis::Stream&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;StreamModeDetails&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;StreamMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ON_DEMAND&lt;/span&gt;
      &lt;span class="na"&gt;RetentionPeriodHours&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;48&lt;/span&gt;

  &lt;span class="na"&gt;RawAnalyticsBucket&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::S3::Bucket&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;BucketEncryption&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;ServerSideEncryptionConfiguration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;ServerSideEncryptionByDefault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;SSEAlgorithm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AES256&lt;/span&gt;

  &lt;span class="na"&gt;AnalyticsIndexQueue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::SQS::Queue&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;VisibilityTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;120&lt;/span&gt;
      &lt;span class="na"&gt;RedrivePolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;deadLetterTargetArn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;AnalyticsIndexDLQ.Arn&lt;/span&gt;
        &lt;span class="na"&gt;maxReceiveCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;

  &lt;span class="na"&gt;AnalyticsIndexDLQ&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::SQS::Queue&lt;/span&gt;

  &lt;span class="na"&gt;FirehoseToS3Role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::IAM::Role&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;AssumeRolePolicyDocument&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2012-10-17'&lt;/span&gt;
        &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
            &lt;span class="na"&gt;Principal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;Service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;firehose.amazonaws.com&lt;/span&gt;
            &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sts:AssumeRole&lt;/span&gt;
      &lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;PolicyName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;FirehoseS3Write&lt;/span&gt;
          &lt;span class="na"&gt;PolicyDocument&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2012-10-17'&lt;/span&gt;
            &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
                &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;s3:AbortMultipartUpload&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;s3:GetBucketLocation&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;s3:ListBucket&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;s3:ListBucketMultipartUploads&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;s3:PutObject&lt;/span&gt;
                &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;RawAnalyticsBucket.Arn&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${RawAnalyticsBucket.Arn}/*"&lt;/span&gt;

  &lt;span class="na"&gt;AnalyticsFirehose&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::KinesisFirehose::DeliveryStream&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DeliveryStreamType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DirectPut&lt;/span&gt;
      &lt;span class="na"&gt;ExtendedS3DestinationConfiguration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;BucketARN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;RawAnalyticsBucket.Arn&lt;/span&gt;
        &lt;span class="na"&gt;RoleARN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;FirehoseToS3Role.Arn&lt;/span&gt;
        &lt;span class="na"&gt;Prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dataset=events/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/"&lt;/span&gt;
        &lt;span class="na"&gt;ErrorOutputPrefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;errors/!{firehose:error-output-type}/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/"&lt;/span&gt;
        &lt;span class="na"&gt;CompressionFormat&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GZIP&lt;/span&gt;
        &lt;span class="na"&gt;BufferingHints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;IntervalInSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
          &lt;span class="na"&gt;SizeInMBs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;

  &lt;span class="na"&gt;AnalyticsTransformerFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::Function&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python3.12&lt;/span&gt;
      &lt;span class="na"&gt;Handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app.lambda_handler&lt;/span&gt;
      &lt;span class="na"&gt;CodeUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;src/&lt;/span&gt;
      &lt;span class="na"&gt;MemorySize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1024&lt;/span&gt;
      &lt;span class="na"&gt;Timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;120&lt;/span&gt;
      &lt;span class="na"&gt;Environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;FIREHOSE_STREAM_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;AnalyticsFirehose&lt;/span&gt;
          &lt;span class="na"&gt;INDEX_QUEUE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;AnalyticsIndexQueue&lt;/span&gt;
      &lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
              &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;firehose:PutRecordBatch&lt;/span&gt;
              &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;AnalyticsFirehose.Arn&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
              &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sqs:SendMessageBatch&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sqs:SendMessage&lt;/span&gt;
              &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;AnalyticsIndexQueue.Arn&lt;/span&gt;
      &lt;span class="na"&gt;Events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;KinesisIngest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Kinesis&lt;/span&gt;
          &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Stream&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;AnalyticsEventsStream.Arn&lt;/span&gt;
            &lt;span class="na"&gt;StartingPosition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LATEST&lt;/span&gt;
            &lt;span class="na"&gt;BatchSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt;
            &lt;span class="na"&gt;MaximumBatchingWindowInSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
            &lt;span class="na"&gt;FunctionResponseTypes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReportBatchItemFailures&lt;/span&gt;

  &lt;span class="na"&gt;EventBridgeToKinesisRole&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::IAM::Role&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;AssumeRolePolicyDocument&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2012-10-17'&lt;/span&gt;
        &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
            &lt;span class="na"&gt;Principal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;Service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;events.amazonaws.com&lt;/span&gt;
            &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sts:AssumeRole&lt;/span&gt;
      &lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;PolicyName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PutToKinesis&lt;/span&gt;
          &lt;span class="na"&gt;PolicyDocument&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2012-10-17'&lt;/span&gt;
            &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
                &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kinesis:PutRecord&lt;/span&gt;
                &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;AnalyticsEventsStream.Arn&lt;/span&gt;

  &lt;span class="na"&gt;AnalyticsEventRule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Events::Rule&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;EventPattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;commerce.orders&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;commerce.inventory&lt;/span&gt;
      &lt;span class="na"&gt;Targets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;AnalyticsEventsStream.Arn&lt;/span&gt;
          &lt;span class="na"&gt;Id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KinesisAnalyticsTarget&lt;/span&gt;
          &lt;span class="na"&gt;RoleArn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;EventBridgeToKinesisRole.Arn&lt;/span&gt;
          &lt;span class="na"&gt;KinesisParameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;PartitionKeyPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$.detail.orderId"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why this foundation works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Kinesis on-demand mode absorbs changing volume without shard capacity planning&lt;/li&gt;
&lt;li&gt;Firehose to S3 gives durable landing, buffering, and compression&lt;/li&gt;
&lt;li&gt;Lambda centralizes canonicalization and routing&lt;/li&gt;
&lt;li&gt;SQS isolates OpenSearch indexing retries from the main ingest path&lt;/li&gt;
&lt;li&gt;EventBridge feeds analytics without forcing every producer to know about Kinesis directly&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Lambda transformation layer (the part that pays for itself)
&lt;/h2&gt;

&lt;p&gt;This is the heart of the pattern.&lt;/p&gt;

&lt;p&gt;I use the Lambda transformation layer to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;normalize different event formats into one canonical schema&lt;/li&gt;
&lt;li&gt;enrich records (tenant, derived dimensions, lookup joins if lightweight)&lt;/li&gt;
&lt;li&gt;attach dedupe metadata and ordering keys&lt;/li&gt;
&lt;li&gt;route records to the right sinks&lt;/li&gt;
&lt;li&gt;drop or quarantine malformed records&lt;/li&gt;
&lt;/ul&gt;
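&lt;p&gt;The dedupe metadata in that list should be derived deterministically, so a replayed event produces the exact same key. A minimal sketch (the key layout is my assumption):&lt;/p&gt;

```python
import hashlib

def make_idempotency_key(event_type: str, entity_id: str, occurred_at: str) -> str:
    """Deterministic sink-safe key: the same logical event always hashes
    to the same value, so replays and retries collapse in upsert sinks."""
    raw = "#".join((event_type, entity_id, occurred_at))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]
```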

&lt;h3&gt;
  
  
  Rules I follow for transformations
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Keep it deterministic (same input -&amp;gt; same normalized output)&lt;/li&gt;
&lt;li&gt;Keep it fast (avoid heavy network calls in the hot path)&lt;/li&gt;
&lt;li&gt;Keep it observable (emit counts by event type and error reason)&lt;/li&gt;
&lt;li&gt;Fail individual records, not whole batches, whenever possible&lt;/li&gt;
&lt;li&gt;Preserve original payload if analytics or debugging needs it&lt;/li&gt;
&lt;/ol&gt;
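&lt;p&gt;Rule 4 maps directly onto the &lt;code&gt;ReportBatchItemFailures&lt;/code&gt; setting on the Kinesis event source mapping: instead of raising (which retries the entire batch), the handler reports the sequence numbers of only the records that failed. A minimal sketch of that contract (helper names are mine):&lt;/p&gt;

```python
def handle_batch(records: list, process) -> dict:
    """Process each Kinesis record independently and collect failures
    in the shape Lambda expects when ReportBatchItemFailures is enabled."""
    failures = []
    for record in records:
        try:
            process(record)
        except Exception:
            # Record-level failure: report this sequence number, keep going.
            failures.append({"itemIdentifier": record["kinesis"]["sequenceNumber"]})
    return {"batchItemFailures": failures}
```

&lt;p&gt;For Kinesis, Lambda checkpoints just before the lowest reported sequence number and retries from there, which is exactly why the &lt;code&gt;idempotency_key&lt;/code&gt; in the envelope matters downstream.&lt;/p&gt;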




&lt;h2&gt;
  
  
  Example Lambda transformer (Kinesis -&amp;gt; Firehose + SQS)
&lt;/h2&gt;

&lt;p&gt;This example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reads a Kinesis batch&lt;/li&gt;
&lt;li&gt;normalizes records from multiple sources (EventBridge-shaped or direct JSON)&lt;/li&gt;
&lt;li&gt;writes canonical events to Firehose (S3 path)&lt;/li&gt;
&lt;li&gt;sends selected event types to SQS for OpenSearch indexing&lt;/li&gt;
&lt;li&gt;returns partial batch failures for retriable records
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;firehose&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;firehose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sqs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;FIREHOSE_STREAM_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FIREHOSE_STREAM_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;INDEX_QUEUE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INDEX_QUEUE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;INDEXABLE_EVENT_TYPES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order.placed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order.shipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product.updated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;utc_now_iso&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;+00:00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sha256_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalize_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;ingested_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;utc_now_iso&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detail-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;detail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eventType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detail-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;entity_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orderId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;productId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entityId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenantId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;ingested_at&lt;/span&gt;
        &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eventId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;sha256_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sort_keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;
        &lt;span class="n"&gt;source_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;entity_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;occurred_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;ingested_at&lt;/span&gt;
        &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;sha256_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sort_keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;source_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;direct-producer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;event_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;entity_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entityType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sequence_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;#&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entity_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;#&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;idempotency_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:v&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event_version&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event_version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity_type&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;occurred_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ingested_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ingested_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traceId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;source_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idempotency_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;idempotency_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequence_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sequence_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transport&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;normalized_by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analytics-transformer-lambda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schema_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;canonical-analytics-event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schema_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_kinesis_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;raw_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_bytes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chunked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_to_firehose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;separators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;chunked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;firehose&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_record_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DeliveryStreamName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FIREHOSE_STREAM_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Records&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FailedPutCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Firehose batch write had &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;FailedPutCount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; failed records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_index_jobs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;chunked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MessageBody&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;occurred_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;occurred_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idempotency_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idempotency_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;sqs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;QueueUrl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;INDEX_QUEUE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Entries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;transformed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;index_jobs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;batch_item_failures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="n"&gt;start_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;sequence_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequenceNumber&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_kinesis_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;canonical&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;normalize_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;canonical&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;canonical&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;INDEXABLE_EVENT_TYPES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;index_jobs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;canonical&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;itemIdentifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sequence_number&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to parse/normalize record&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequence_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sequence_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;}))&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;send_to_firehose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Firehose write failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)}))&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
                &lt;span class="n"&gt;seq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kinesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sequenceNumber&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;itemIdentifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;itemIdentifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batchItemFailures&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;index_jobs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;send_index_jobs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_jobs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Often I do not fail primary ingest if indexing queue write fails.
&lt;/span&gt;            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Index queue write failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)}))&lt;/span&gt;

    &lt;span class="n"&gt;duration_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_ms&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INFO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Batch processed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;records_in&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;records_transformed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;records_indexed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_jobs&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;records_failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;duration_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batchItemFailures&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;batch_item_failures&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why this implementation pattern works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;I treat S3 landing as the primary success path&lt;/li&gt;
&lt;li&gt;I isolate OpenSearch indexing via SQS&lt;/li&gt;
&lt;li&gt;I use partial batch failure for source retries&lt;/li&gt;
&lt;li&gt;I preserve enough metadata (&lt;code&gt;event_id&lt;/code&gt;, &lt;code&gt;idempotency_key&lt;/code&gt;, &lt;code&gt;sequence_key&lt;/code&gt;) for dedupe and replay&lt;/li&gt;
&lt;/ul&gt;
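
&lt;p&gt;If your normalizer does not already populate those keys, here is a minimal sketch of deriving them deterministically. The field names follow the canonical schema used in this article; &lt;code&gt;build_dedupe_keys&lt;/code&gt; itself is illustrative, not part of the pipeline code above.&lt;/p&gt;

```python
import hashlib

def build_dedupe_keys(canonical: dict) -> dict:
    """Derive deterministic dedupe/replay keys from a canonical event (illustrative)."""
    # Idempotency key: a stable hash over the fields that define "the same event",
    # so replaying the same source record always yields the same key.
    basis = "|".join([
        canonical["event_type"],
        canonical["tenant_id"],
        canonical["entity_id"],
        canonical["occurred_at"],
    ])
    idempotency_key = hashlib.sha256(basis.encode("utf-8")).hexdigest()
    # Sequence key: groups events that must be replayed in order per entity.
    sequence_key = f'{canonical["tenant_id"]}#{canonical["entity_id"]}'
    return {"idempotency_key": idempotency_key, "sequence_key": sequence_key}
```

&lt;p&gt;Because the key is a pure function of event content, downstream consumers can dedupe on it no matter how many times the source record is retried.&lt;/p&gt;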




&lt;h2&gt;
  
  
  OpenSearch delivery pattern (what I do in practice)
&lt;/h2&gt;

&lt;p&gt;For OpenSearch, I do not assume the ingest path and indexing path should share the same retry semantics.&lt;/p&gt;

&lt;p&gt;That is why I often decouple indexing with SQS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why SQS in front of OpenSearch helps
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;OpenSearch can throttle under load&lt;/li&gt;
&lt;li&gt;index mapping errors or payload issues should not block S3 landing&lt;/li&gt;
&lt;li&gt;I can tune retry behavior independently&lt;/li&gt;
&lt;li&gt;I can replay index jobs from S3 if needed&lt;/li&gt;
&lt;/ul&gt;
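
&lt;p&gt;Independent retry tuning is mostly a matter of queue attributes. A sketch of what I would pass to &lt;code&gt;sqs.set_queue_attributes&lt;/code&gt; (the DLQ ARN, receive count, and timeout here are placeholder values, not recommendations for every workload):&lt;/p&gt;

```python
import json

def redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """SQS queue attributes for independent retry tuning (values are examples)."""
    return {
        # After max_receive_count failed deliveries, SQS moves the message
        # to the dead-letter queue instead of retrying it forever.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receive_count,
        }),
        # Keep redeliveries from overlapping an in-flight indexer run;
        # AWS guidance is at least 6x the Lambda function timeout.
        "VisibilityTimeout": "180",
    }
```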

&lt;h3&gt;
  
  
  Simple SQS -&amp;gt; Lambda -&amp;gt; OpenSearch indexer (illustrative snippet)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opensearchpy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenSearch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RequestsHttpConnection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;helpers&lt;/span&gt;

&lt;span class="n"&gt;OPENSEARCH_HOST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENSEARCH_HOST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;INDEX_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENSEARCH_INDEX&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analytics-events&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;hosts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OPENSEARCH_HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;use_ssl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verify_certs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;connection_class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RequestsHttpConnection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;to_index_doc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;occurred_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;occurred_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idempotency_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;idempotency_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;evt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_op_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;INDEX_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;to_index_doc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;helpers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bulk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;indexed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Best practice note
&lt;/h3&gt;

&lt;p&gt;Using &lt;code&gt;_id = event_id&lt;/code&gt; gives me idempotent-friendly indexing behavior (retries overwrite the same document rather than creating duplicates). That is usually what I want for analytics and search event documents.&lt;/p&gt;
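
&lt;p&gt;A toy in-memory model makes the overwrite semantics concrete. &lt;code&gt;apply_actions&lt;/code&gt; below is illustrative only, not an OpenSearch API; it just shows that replaying the same action keyed by &lt;code&gt;_id&lt;/code&gt; collapses to one document:&lt;/p&gt;

```python
def apply_actions(index: dict, actions: list) -> dict:
    """Tiny stand-in for an index keyed by _id (illustrative, not an OpenSearch API)."""
    for action in actions:
        # An "index" op with an explicit _id is an upsert: a retry of the
        # same delivery overwrites the document instead of duplicating it.
        index[action["_id"]] = action["_source"]
    return index

# Delivering the same action twice leaves exactly one document behind.
docs: dict = {}
action = {"_id": "evt-1", "_source": {"event_type": "order.created"}}
apply_actions(docs, [action])
apply_actions(docs, [action])  # simulated SQS redelivery
```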




&lt;h2&gt;
  
  
  Delivery to Redshift (S3-first is the pattern I recommend most)
&lt;/h2&gt;

&lt;p&gt;For analytics warehouses, I usually prefer S3-first ingestion rather than writing directly to Redshift from the transformation Lambda.&lt;/p&gt;

&lt;p&gt;Why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 is a durable landing zone for replay and audit&lt;/li&gt;
&lt;li&gt;Redshift loads can be batched efficiently&lt;/li&gt;
&lt;li&gt;I can rebuild tables from historical data&lt;/li&gt;
&lt;li&gt;I keep ingestion and warehouse modeling decoupled&lt;/li&gt;
&lt;/ul&gt;
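
&lt;p&gt;Most of those benefits hinge on a predictable key layout. A sketch of the date-partitioned landing key I use (the &lt;code&gt;dataset=/year=/month=/day=&lt;/code&gt; layout is my own convention, not an AWS requirement):&lt;/p&gt;

```python
from datetime import datetime

def s3_landing_key(dataset: str, event_id: str, occurred_at: str) -> str:
    """Date-partitioned S3 key so replays and warehouse loads can target one day."""
    # occurred_at is the canonical ISO-8601 timestamp, e.g. "2026-02-25T19:07:00Z".
    ts = datetime.fromisoformat(occurred_at.replace("Z", "+00:00"))
    return (
        f"dataset={dataset}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{event_id}.json.gz"
    )
```

&lt;p&gt;With this layout, a day-level replay is a single prefix listing, and a warehouse load can target one day's prefix.&lt;/p&gt;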

&lt;h3&gt;
  
  
  Common pattern
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Land canonical events in S3&lt;/li&gt;
&lt;li&gt;Load into a staging table in Redshift&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MERGE&lt;/code&gt; into analytics tables (or fact tables)&lt;/li&gt;
&lt;li&gt;Keep a watermark or batch manifest for operations&lt;/li&gt;
&lt;/ol&gt;
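
&lt;p&gt;For step 4, a small JSON manifest per load batch is usually enough. A minimal sketch (the field names are my own convention):&lt;/p&gt;

```python
import json
from datetime import datetime, timezone

def build_manifest(batch_id: str, s3_keys: list) -> str:
    """JSON manifest recording exactly which S3 objects one warehouse load consumed."""
    return json.dumps({
        "batch_id": batch_id,
        "object_count": len(s3_keys),
        # Sorted so re-generated manifests for the same batch compare equal.
        "objects": sorted(s3_keys),
        "created_at": datetime.now(timezone.utc).isoformat(),
    })
```

&lt;p&gt;I store the manifest next to the data and advance the load watermark only after the merge commits, so a crashed load can be re-run from the manifest instead of re-scanning prefixes.&lt;/p&gt;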

&lt;h3&gt;
  
  
  Example Redshift SQL (staging + merge)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;staging_events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event_id&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt;       &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;event_version&lt;/span&gt;    &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;source&lt;/span&gt;           &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tenant_id&lt;/span&gt;        &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;entity_type&lt;/span&gt;      &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;entity_id&lt;/span&gt;        &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;occurred_at&lt;/span&gt;      &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ingested_at&lt;/span&gt;      &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;trace_id&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;idempotency_key&lt;/span&gt;  &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;sequence_key&lt;/span&gt;     &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;          &lt;span class="n"&gt;SUPER&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt; &lt;span class="n"&gt;staging_events&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="s1"&gt;'s3://your-bucket/dataset=events/year=2026/month=02/day=25/'&lt;/span&gt;
&lt;span class="n"&gt;IAM_ROLE&lt;/span&gt; &lt;span class="s1"&gt;'arn:aws:iam::&amp;lt;account-id&amp;gt;:role/RedshiftCopyRole'&lt;/span&gt;
&lt;span class="n"&gt;FORMAT&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt; &lt;span class="s1"&gt;'auto'&lt;/span&gt;
&lt;span class="n"&gt;TIMEFORMAT&lt;/span&gt; &lt;span class="s1"&gt;'auto'&lt;/span&gt;
&lt;span class="n"&gt;GZIP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;fact_order_events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event_id&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tenant_id&lt;/span&gt;        &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt;       &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;occurred_at&lt;/span&gt;      &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ingested_at&lt;/span&gt;      &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;           &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;currency&lt;/span&gt;         &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;          &lt;span class="n"&gt;SUPER&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;MERGE&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;fact_order_events&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;tgt&lt;/span&gt;
&lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;entity_id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ingested_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;TRY_CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;DECIMAL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currency&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;staging_events&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;entity_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'order'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;tgt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;MATCHED&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt;
    &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;occurred_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ingested_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ingested_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;currency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="n"&gt;MATCHED&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ingested_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;occurred_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ingested_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;TRUNCATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;staging_events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why &lt;code&gt;MERGE&lt;/code&gt; is important in CDC and event pipelines
&lt;/h3&gt;

&lt;p&gt;Retries and replays happen. &lt;code&gt;MERGE&lt;/code&gt; lets me keep warehouse loads idempotent at the table level rather than assuming every batch is perfectly unique.&lt;/p&gt;




&lt;h2&gt;
  
  
  Ordering, duplication, and replay (the part that breaks naive designs)
&lt;/h2&gt;

&lt;p&gt;This is where I spend a lot of time in reviews because it directly affects data correctness.&lt;/p&gt;




&lt;h2&gt;
  
  
  Ordering: what I can and cannot guarantee
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Kinesis ordering
&lt;/h3&gt;

&lt;p&gt;Kinesis preserves record order within a shard, so in practice I reason about ordering per partition key.&lt;/p&gt;

&lt;p&gt;If I need ordering for &lt;code&gt;orderId&lt;/code&gt;, I choose a partition key tied to that ordering scope (for example &lt;code&gt;tenantId#orderId&lt;/code&gt;).&lt;/p&gt;
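&lt;p&gt;As a rough sketch of why the partition key controls ordering: Kinesis hashes the key with MD5 into a 128-bit integer, and each shard owns a contiguous hash key range. The function and key names below are illustrative, and real shard ranges shift when shards are split or merged:&lt;/p&gt;

```python
import hashlib

# Approximate sketch of how Kinesis assigns a partition key to a shard:
# MD5 of the key read as a 128-bit integer, located in one of
# shard_count equal hash-key ranges. Real streams can have uneven
# ranges after splits/merges; this only illustrates the mechanism.
def shard_for_key(partition_key, shard_count):
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_value = int(digest, 16)
    range_size = (2 ** 128) // shard_count
    return min(hash_value // range_size, shard_count - 1)

# Every event with the same tenant#order key maps to the same shard,
# which is what preserves per-order arrival order.
key = "tenant-42#order-1001"
assert shard_for_key(key, 8) == shard_for_key(key, 8)
```

Because the mapping is deterministic, all events for one `tenantId#orderId` stay on one shard, while different orders spread across shards.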

&lt;h3&gt;
  
  
  EventBridge ordering
&lt;/h3&gt;

&lt;p&gt;I do not assume EventBridge preserves strict ordering across events. If ordering matters for analytics correctness, I enforce it downstream with event timestamps, versions, and conflict resolution logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQS ordering
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Standard queues: no strict ordering, duplicates possible&lt;/li&gt;
&lt;li&gt;FIFO queues: ordered per MessageGroupId, with a bounded dedupe window&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  My rule of thumb
&lt;/h3&gt;

&lt;p&gt;I preserve ordering only where it matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;choose a sequence key&lt;/li&gt;
&lt;li&gt;partition based on that key when using Kinesis or FIFO&lt;/li&gt;
&lt;li&gt;store &lt;code&gt;event_version&lt;/code&gt; and &lt;code&gt;occurred_at&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;make downstream upserts resilient to out-of-order arrivals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trying to preserve global ordering everywhere usually makes the system slower and more expensive than it needs to be.&lt;/p&gt;
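&lt;p&gt;The upsert resilience mentioned above can be sketched as a version-aware write. This is an illustrative in-memory sketch, not warehouse code; field names like &lt;code&gt;event_version&lt;/code&gt; follow the canonical record described in this article:&lt;/p&gt;

```python
# Illustrative in-memory sketch of a version-aware upsert: an event
# only lands if it is new, or at least as new as the stored version.
# Ties are harmless replays of the same event, so they are applied.
def upsert(table, event):
    current = table.get(event["event_id"])
    if current is None:
        table[event["event_id"]] = event
        return
    newest = max(event["event_version"], current["event_version"])
    if newest == event["event_version"]:
        table[event["event_id"]] = event
    # otherwise: a stale, out-of-order arrival; keep the newer row

store = {}
upsert(store, {"event_id": "e1", "event_version": 2, "status": "shipped"})
upsert(store, {"event_id": "e1", "event_version": 1, "status": "created"})
assert store["e1"]["status"] == "shipped"  # late v1 did not clobber v2
```

The same rule translates to a `WHERE src.event_version` comparison inside a warehouse `MERGE`.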




&lt;h2&gt;
  
  
  Duplication: assume it will happen
&lt;/h2&gt;

&lt;p&gt;I assume duplicates can appear because of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;producer retries&lt;/li&gt;
&lt;li&gt;Lambda retries and partial batch retries&lt;/li&gt;
&lt;li&gt;EventBridge target retries&lt;/li&gt;
&lt;li&gt;SQS redrives&lt;/li&gt;
&lt;li&gt;replay and backfill operations&lt;/li&gt;
&lt;li&gt;manual reprocessing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How I handle duplicates
&lt;/h3&gt;

&lt;p&gt;I include in the canonical record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;event_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idempotency_key&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then I make sinks safe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3: duplicates can exist physically, but I dedupe downstream in queries or ETL&lt;/li&gt;
&lt;li&gt;OpenSearch: use &lt;code&gt;_id = event_id&lt;/code&gt; to overwrite same document on retry&lt;/li&gt;
&lt;li&gt;Redshift: &lt;code&gt;MERGE&lt;/code&gt; on &lt;code&gt;event_id&lt;/code&gt; or business key&lt;/li&gt;
&lt;/ul&gt;
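&lt;p&gt;The OpenSearch strategy can be sketched as building &lt;code&gt;_bulk&lt;/code&gt; request lines with &lt;code&gt;_id&lt;/code&gt; set to &lt;code&gt;event_id&lt;/code&gt;, so a retried batch overwrites the same documents instead of duplicating them. The index name below is illustrative:&lt;/p&gt;

```python
import json

# Sketch of building OpenSearch _bulk newline-delimited lines where
# _id is the event_id. A retried batch re-indexes the same documents,
# which makes the sink idempotent. Index name is illustrative.
def bulk_lines(events, index="events-v1"):
    lines = []
    for event in events:
        action = {"index": {"_index": index, "_id": event["event_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"

batch = [{"event_id": "e1", "event_type": "order_created"}]
payload = bulk_lines(batch)
```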

&lt;p&gt;This is the "exactly-once is a myth, at-least-once is reality" principle applied to analytics ingestion.&lt;/p&gt;




&lt;h2&gt;
  
  
  Replay: make it a feature, not an emergency procedure
&lt;/h2&gt;

&lt;p&gt;I design replay paths intentionally from day one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replay options in this architecture
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Kinesis retention replay (re-read retained stream window)&lt;/li&gt;
&lt;li&gt;EventBridge archive and replay (for applicable event bus scenarios)&lt;/li&gt;
&lt;li&gt;S3 reprocessing (most flexible for historical rebuilds and backfills)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why S3 replay matters most
&lt;/h3&gt;

&lt;p&gt;Even if I have Kinesis or EventBridge replay features, S3 is usually my best long-term replay layer because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it keeps historical data longer&lt;/li&gt;
&lt;li&gt;it is cheap and durable&lt;/li&gt;
&lt;li&gt;I can reprocess with new transformation logic&lt;/li&gt;
&lt;li&gt;I can rebuild OpenSearch or Redshift if needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why I strongly prefer S3-first landing for analytics pipelines.&lt;/p&gt;




&lt;h2&gt;
  
  
  Partitioning and cost optimization (where big savings come from)
&lt;/h2&gt;

&lt;p&gt;This topic matters a lot because ingestion costs scale with volume, and bad partitioning creates pain in both storage and query engines.&lt;/p&gt;




&lt;h2&gt;
  
  
  S3 partitioning strategy (practical guidance)
&lt;/h2&gt;

&lt;p&gt;A common anti-pattern is over-partitioning too early.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I usually start with
&lt;/h3&gt;

&lt;p&gt;For general event analytics, I start with time-based partitions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;dataset=events/year=YYYY/month=MM/day=DD/&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then, if query patterns justify it, I add one more selective dimension:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tenant_id=...&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;event_type=...&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
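&lt;p&gt;A minimal sketch of that layout, assuming UTC event timestamps (the object naming is illustrative):&lt;/p&gt;

```python
from datetime import datetime, timezone

# Illustrative builder for a time-partitioned S3 key. The prefix
# layout mirrors the dataset/year/month/day scheme above; the
# object naming is an assumption for the example.
def s3_key(event_id, occurred_at, dataset="events"):
    t = occurred_at.astimezone(timezone.utc)
    return (
        f"dataset={dataset}/"
        f"year={t:%Y}/month={t:%m}/day={t:%d}/"
        f"{event_id}.json.gz"
    )

ts = datetime(2026, 4, 22, 19, 7, tzinfo=timezone.utc)
assert s3_key("e1", ts) == "dataset=events/year=2026/month=04/day=22/e1.json.gz"
```

Adding `tenant_id=` or `event_type=` later is a matter of one more path segment, which is why I keep the base scheme simple.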

&lt;h3&gt;
  
  
  What I avoid early on
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;highly granular partitions that create too many small files&lt;/li&gt;
&lt;li&gt;partitioning on high-cardinality IDs like &lt;code&gt;order_id&lt;/code&gt; or &lt;code&gt;user_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;changing partition schemes frequently without a migration plan&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Small files are a real cost problem
&lt;/h3&gt;

&lt;p&gt;Too many tiny files hurt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Athena or Redshift query planning and performance&lt;/li&gt;
&lt;li&gt;metadata overhead&lt;/li&gt;
&lt;li&gt;downstream ETL efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why I use batching and buffering (Firehose or app-side batching) and aim for healthy object sizes.&lt;/p&gt;
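&lt;p&gt;App-side batching can be sketched as a buffer that flushes on a record-count or byte-size threshold. The thresholds and the &lt;code&gt;sink&lt;/code&gt; callback below are illustrative, not Firehose settings:&lt;/p&gt;

```python
# Illustrative app-side batcher: buffer records and flush to a sink
# when either a record-count or byte-size threshold is reached, so
# landed objects stay a healthy size instead of one file per event.
class Batcher:
    def __init__(self, sink, max_records=500, max_bytes=5_000_000):
        self.sink = sink
        self.max_records = max_records
        self.max_bytes = max_bytes
        self.buffer = []
        self.size = 0

    def add(self, record_bytes):
        self.buffer.append(record_bytes)
        self.size += len(record_bytes)
        byte_limit_hit = min(self.size, self.max_bytes) == self.max_bytes
        if len(self.buffer) == self.max_records or byte_limit_hit:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer = []
            self.size = 0

flushed = []
batcher = Batcher(sink=flushed.append, max_records=3)
for _ in range(7):
    batcher.add(b"event-bytes")
batcher.flush()  # drain the remainder at shutdown
assert [len(batch) for batch in flushed] == [3, 3, 1]
```

Firehose does the same thing as a managed service, with buffer size and buffer interval as the two knobs.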




&lt;h2&gt;
  
  
  Firehose and file size optimization
&lt;/h2&gt;

&lt;p&gt;Firehose helps reduce operational overhead, and I use it a lot for S3 landing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Best practices I apply
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;enable compression (GZIP minimum; Parquet or ORC conversion when appropriate)&lt;/li&gt;
&lt;li&gt;tune buffer interval and size&lt;/li&gt;
&lt;li&gt;use error output prefixes for bad records&lt;/li&gt;
&lt;li&gt;keep schemas stable enough if using format conversion&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When I choose Parquet conversion
&lt;/h3&gt;

&lt;p&gt;I choose Parquet when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;analytics queries dominate&lt;/li&gt;
&lt;li&gt;schema is reasonably stable&lt;/li&gt;
&lt;li&gt;I want lower scan cost and faster query performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I keep JSON initially if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;schema changes rapidly&lt;/li&gt;
&lt;li&gt;debugging raw payloads is a priority&lt;/li&gt;
&lt;li&gt;multiple downstream consumers still need semi-structured payloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A common compromise is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;raw JSON landing plus curated Parquet later&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Kinesis cost and throughput optimization
&lt;/h2&gt;

&lt;p&gt;Kinesis can be very cost-effective when used intentionally, but I still tune it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decisions I make explicitly
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;On-demand vs provisioned

&lt;ul&gt;
&lt;li&gt;start with on-demand for uncertain traffic&lt;/li&gt;
&lt;li&gt;move to provisioned when traffic is predictable and steady&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;partition key distribution to avoid hot shards&lt;/li&gt;

&lt;li&gt;batch sizes and Lambda windowing to reduce invocation overhead&lt;/li&gt;

&lt;li&gt;consumer count (Enhanced Fan-Out only when justified)&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Hot shard warning sign
&lt;/h3&gt;

&lt;p&gt;If one key dominates (for example a single tenant or entity), I can get uneven throughput and throttling.&lt;/p&gt;

&lt;p&gt;Fixes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;better partition key strategy&lt;/li&gt;
&lt;li&gt;partition key suffixing (only if ordering requirements allow)&lt;/li&gt;
&lt;li&gt;separating noisy tenants or workloads&lt;/li&gt;
&lt;/ul&gt;
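&lt;p&gt;Partition key suffixing can be sketched as deriving a deterministic sub-key from the event id, so one noisy tenant spreads across several shards. This is only safe when per-key ordering is not required; the fan-out value is illustrative:&lt;/p&gt;

```python
import hashlib

# Illustrative key suffixing for a hot key: spread one noisy tenant
# across `fanout` sub-keys. Deterministic, so retries of the same
# event always produce the same key. Only use this when ordering
# within the base key is not required.
def suffixed_key(base_key, event_id, fanout=8):
    digest = hashlib.md5(event_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % fanout
    return f"{base_key}#{bucket}"

key = suffixed_key("tenant-noisy", "e1")
assert key == suffixed_key("tenant-noisy", "e1")  # stable across retries
```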




&lt;h2&gt;
  
  
  Lambda cost optimization in ingest pipelines
&lt;/h2&gt;

&lt;p&gt;Lambda is often not the dominant cost at first, but it can become noticeable at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tuning areas I care about
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;batch size and batching window&lt;/li&gt;
&lt;li&gt;memory sizing (to get better CPU and network, and shorter runtime)&lt;/li&gt;
&lt;li&gt;avoiding heavy per-record network calls&lt;/li&gt;
&lt;li&gt;reusing clients across invocations&lt;/li&gt;
&lt;li&gt;minimizing unnecessary JSON serialization churn&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A practical optimization
&lt;/h3&gt;

&lt;p&gt;I treat the transformer as a batch processor, not a record-at-a-time handler. That usually improves both throughput and cost.&lt;/p&gt;
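&lt;p&gt;A minimal sketch of that shape, using Lambda's partial batch response (&lt;code&gt;batchItemFailures&lt;/code&gt;) so one bad record does not force the whole batch to retry. The &lt;code&gt;transform&lt;/code&gt; function is a stand-in for real normalization logic:&lt;/p&gt;

```python
import base64
import json

# Stand-in for real normalization logic; raises on malformed payloads.
def transform(payload):
    record = json.loads(payload)
    record["normalized"] = True
    return record

# Sketch of a Kinesis-triggered transformer that processes the whole
# batch and reports only failed records back via Lambda's partial
# batch response shape: {"batchItemFailures": [{"itemIdentifier": seq}]}.
def handler(event, context=None):
    failures = []
    for rec in event["Records"]:
        try:
            payload = base64.b64decode(rec["kinesis"]["data"])
            transform(payload)
        except Exception:
            failures.append(
                {"itemIdentifier": rec["kinesis"]["sequenceNumber"]}
            )
    return {"batchItemFailures": failures}
```

Only the failed sequence numbers are retried, which keeps retries cheap and avoids reprocessing the healthy part of the batch.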




&lt;h2&gt;
  
  
  Redshift cost optimization in event ingestion
&lt;/h2&gt;

&lt;p&gt;When Redshift is the warehouse sink, I optimize the load pattern, not just the compute.&lt;/p&gt;

&lt;h3&gt;
  
  
  Best practices I use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;load from S3 in batches (COPY), not row-by-row inserts from Lambda&lt;/li&gt;
&lt;li&gt;stage then MERGE&lt;/li&gt;
&lt;li&gt;align file sizes to efficient COPY behavior&lt;/li&gt;
&lt;li&gt;keep raw event retention in S3 so warehouse tables can be rebuilt and re-modeled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many teams, the biggest cost win is simply moving from ad hoc inserts to an S3 batch load pattern.&lt;/p&gt;




&lt;h2&gt;
  
  
  End-to-end implementation discussion (how I wire this in production)
&lt;/h2&gt;

&lt;p&gt;This is the part I care about most because architecture decisions show up in operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) I define the source of truth for ingestion success
&lt;/h3&gt;

&lt;p&gt;In this design, the source of truth is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Successful normalized delivery to S3 (via Firehose).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 is durable&lt;/li&gt;
&lt;li&gt;S3 supports replay&lt;/li&gt;
&lt;li&gt;downstream sinks can catch up independently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This prevents me from coupling ingestion success to OpenSearch availability, for example.&lt;/p&gt;




&lt;h3&gt;
  
  
  2) I decouple sink-specific SLAs
&lt;/h3&gt;

&lt;p&gt;Different sinks serve different users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenSearch may need near-real-time indexing for search or ops views&lt;/li&gt;
&lt;li&gt;Redshift loads may run in micro-batches&lt;/li&gt;
&lt;li&gt;lake consumers may process hourly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By decoupling them, I avoid making the entire pipeline as fragile as the most sensitive sink.&lt;/p&gt;




&lt;h3&gt;
  
  
  3) I make replay and backfill a documented operation
&lt;/h3&gt;

&lt;p&gt;I document:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;replay source (Kinesis, EventBridge archive, or S3)&lt;/li&gt;
&lt;li&gt;dedupe keys and merge behavior&lt;/li&gt;
&lt;li&gt;expected lag and throughput limits&lt;/li&gt;
&lt;li&gt;how to avoid double-indexing side effects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns replay into an operational capability instead of a risky one-off script.&lt;/p&gt;




&lt;h3&gt;
  
  
  4) I design for schema evolution early
&lt;/h3&gt;

&lt;p&gt;Events change. They always do.&lt;/p&gt;

&lt;p&gt;I version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;event schema (&lt;code&gt;event_version&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;transformation logic (deployable version)&lt;/li&gt;
&lt;li&gt;warehouse model migrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also preserve raw payloads so I can re-derive curated data if the schema evolves.&lt;/p&gt;
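&lt;p&gt;Schema versioning can be sketched as dispatching on &lt;code&gt;event_version&lt;/code&gt; so old and new producers coexist during a migration. The field renames below are illustrative:&lt;/p&gt;

```python
# Illustrative version dispatch: v1 producers used "orderId", v2
# producers use the canonical "order_id". Both normalize to the
# same contract, so a migration can roll out gradually.
def normalize_v1(raw):
    return {"order_id": raw["orderId"], "event_version": 1}

def normalize_v2(raw):
    return {"order_id": raw["order_id"], "event_version": 2}

NORMALIZERS = {1: normalize_v1, 2: normalize_v2}

def normalize(raw):
    # Events without a version predate versioning, so default to v1.
    version = raw.get("event_version", 1)
    return NORMALIZERS[version](raw)

assert normalize({"orderId": "o1"})["order_id"] == "o1"
assert normalize({"order_id": "o2", "event_version": 2})["order_id"] == "o2"
```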




&lt;h2&gt;
  
  
  Common mistakes I see (and how I avoid them)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mistake 1: Using only EventBridge for everything and expecting stream semantics
&lt;/h3&gt;

&lt;p&gt;EventBridge is excellent for routing, but it is not the same as Kinesis when I need sustained high-throughput ordered ingestion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; use EventBridge for routing, Kinesis for the analytics ingestion backbone when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 2: Letting sink failures block primary landing
&lt;/h3&gt;

&lt;p&gt;If OpenSearch throttles and that blocks the whole ingest path, the pipeline becomes fragile.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; make S3 landing primary, and decouple secondary sinks with SQS or replay.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 3: No canonical schema
&lt;/h3&gt;

&lt;p&gt;Every producer emits a different shape, and downstream SQL gets messy fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; normalize once in Lambda and publish a canonical analytics contract.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 4: Ignoring duplication until dashboards look wrong
&lt;/h3&gt;

&lt;p&gt;Retries, redrives, and replay all create duplicates eventually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; include &lt;code&gt;event_id&lt;/code&gt; and dedupe keys, and make each sink idempotent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 5: Over-partitioning S3 on day one
&lt;/h3&gt;

&lt;p&gt;This creates small files, metadata overhead, and poor performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; start with time partitions and compression, then add dimensions based on real query patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical best practices checklist
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Transport decisioning
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] EventBridge used for routing and integration use cases&lt;/li&gt;
&lt;li&gt;[ ] Kinesis used where throughput, order, and replay requirements justify it&lt;/li&gt;
&lt;li&gt;[ ] SQS used for buffering and retry isolation where needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Transformation layer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Canonical schema defined and versioned&lt;/li&gt;
&lt;li&gt;[ ] Transformer is deterministic and observable&lt;/li&gt;
&lt;li&gt;[ ] Partial batch failure behavior is configured for stream or queue consumers&lt;/li&gt;
&lt;li&gt;[ ] Original payload preserved when needed for replay and debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sink delivery
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] S3 is durable landing zone (preferred for analytics)&lt;/li&gt;
&lt;li&gt;[ ] OpenSearch indexing path is decoupled from primary ingest&lt;/li&gt;
&lt;li&gt;[ ] Redshift loads are batch-based (COPY and MERGE), not row-by-row Lambda inserts&lt;/li&gt;
&lt;li&gt;[ ] Dedupe and idempotency strategy exists per sink&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Ordering / duplication / replay
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Partition keys align to ordering scope&lt;/li&gt;
&lt;li&gt;[ ] Duplicate handling defined across retries and replays&lt;/li&gt;
&lt;li&gt;[ ] Replay and backfill path documented and tested&lt;/li&gt;
&lt;li&gt;[ ] Metrics and alarms exist for lag, failure, and sink throttling&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cost / performance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] S3 compression enabled&lt;/li&gt;
&lt;li&gt;[ ] Partitioning strategy avoids small-file explosion&lt;/li&gt;
&lt;li&gt;[ ] Kinesis mode (on-demand or provisioned) chosen intentionally&lt;/li&gt;
&lt;li&gt;[ ] Lambda batching and memory tuned with real metrics&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;If I had to summarize this architecture pattern in one line, it would be:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use the right service for the right ingestion job, normalize once, land durably in S3, and make every downstream sink replay-safe.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That combination gives me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cleaner producer integrations&lt;/li&gt;
&lt;li&gt;better analytics correctness&lt;/li&gt;
&lt;li&gt;safer reprocessing&lt;/li&gt;
&lt;li&gt;more predictable scaling and cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most teams, the biggest improvement is not a new service. It is adopting a clearer ingestion architecture with explicit semantics for ordering, duplication, replay, and sink ownership.&lt;/p&gt;

&lt;p&gt;If you are building serverless analytics pipelines on AWS, this pattern will give you a strong foundation that can grow with both event volume and analytics complexity.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Amazon EventBridge documentation (event buses, rules, targets, archive/replay)&lt;/li&gt;
&lt;li&gt;Amazon Kinesis Data Streams documentation (stream modes, ordering, retention, consumers)&lt;/li&gt;
&lt;li&gt;Amazon SQS documentation (standard vs FIFO, retries, DLQs)&lt;/li&gt;
&lt;li&gt;AWS Lambda documentation (event source mappings, partial batch response)&lt;/li&gt;
&lt;li&gt;Amazon Kinesis Data Firehose documentation (S3/OpenSearch/Redshift delivery, buffering, compression)&lt;/li&gt;
&lt;li&gt;Amazon S3 documentation (partitioning and storage best practices)&lt;/li&gt;
&lt;li&gt;Amazon OpenSearch Service documentation&lt;/li&gt;
&lt;li&gt;Amazon Redshift and Redshift Serverless documentation (COPY, MERGE, SUPER)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>aws</category>
      <category>serverless</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Rebuilding TLS, Part 3 — Building Our First Handshake</title>
      <dc:creator>Dmytro Huz</dc:creator>
      <pubDate>Sun, 19 Apr 2026 17:09:17 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/rebuilding-tls-part-3-building-our-first-handshake-4a2j</link>
      <guid>https://future.forem.com/aws-builders/rebuilding-tls-part-3-building-our-first-handshake-4a2j</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Overview: Where We Are and What Is Still Missing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In the previous part of this series, we made our fake secure channel much less fake.&lt;/p&gt;

&lt;p&gt;We started with the broken encrypted transport from &lt;a href="https://www.dmytrohuz.com/p/rebuilding-tls-part-1-why-encryption" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;, added integrity with HMAC, added sequence numbers to make the record layer less naive, &lt;a href="https://www.dmytrohuz.com/p/rebuilding-tls-part-2-adding-integrity" rel="noopener noreferrer"&gt;and then moved to AEAD&lt;/a&gt; — the approach modern systems usually use to protect records.&lt;/p&gt;

&lt;p&gt;At that point, our protocol could already do something meaningful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;encrypt application data&lt;/li&gt;
&lt;li&gt;detect tampering&lt;/li&gt;
&lt;li&gt;reject modified records&lt;/li&gt;
&lt;li&gt;keep some minimal record-layer state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That was a real step forward.&lt;/p&gt;

&lt;p&gt;But it still relied on one very unrealistic assumption:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;both sides already shared the secret keys&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And that is exactly what we need to remove now.&lt;/p&gt;

&lt;p&gt;Because a real secure protocol cannot stop at protecting data after the keys already exist. It also has to answer one of the harder questions first:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;if client and server do not already share a secret, how can they create one over an insecure network in the first place?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the goal of this part.&lt;/p&gt;

&lt;p&gt;We are going to build the next missing layer of the protocol: the handshake.&lt;/p&gt;

&lt;p&gt;The architecture of this step is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client                           Server
------                           ------
Handshake messages  &amp;lt;---------&amp;gt;  Handshake messages
       |                               |
       v                               v
  shared secret                  shared secret
       |                               |
       +---------&amp;gt; HKDF &amp;lt;--------------+
                    |
                    v
              session keys
                    |
                    v
         protected application data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The idea is to let the connection create fresh key material dynamically instead of starting with a hardcoded application key.&lt;/p&gt;

&lt;p&gt;We will implement that in three steps.&lt;/p&gt;

&lt;p&gt;First, we will build a handshake with classic Diffie-Hellman, where the shared prime and base are still explicit and visible in the protocol. Then we will replace that version with X25519 to show how modern protocols simplify the same idea. After that, we will use HKDF to derive proper session keys from the raw shared secret.&lt;/p&gt;

&lt;p&gt;That will take us one big step closer to the shape of real TLS.&lt;/p&gt;

&lt;p&gt;But still not all the way.&lt;/p&gt;

&lt;p&gt;Because even if both sides manage to derive the same fresh session keys, one critical problem will remain: they still do not know who is on the other side.&lt;/p&gt;

&lt;p&gt;And that is where this part is heading.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;A Very Short Note on Public Key Exchange&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The basic idea of public key exchange is simple.&lt;/p&gt;

&lt;p&gt;Two sides communicate over an insecure network. They exchange some public information. And from that exchange, both sides derive the same shared secret — without ever sending that secret directly over the wire.&lt;/p&gt;

&lt;p&gt;That is the key point.&lt;/p&gt;

&lt;p&gt;The network can be fully visible.&lt;/p&gt;

&lt;p&gt;An observer can see all handshake messages.&lt;/p&gt;

&lt;p&gt;But the observer still should not be able to derive the same secret.&lt;/p&gt;

&lt;p&gt;That is exactly the kind of mechanism we need now.&lt;/p&gt;

&lt;p&gt;Until this point in the series, our protocol always started with a secret that already existed. Public key exchange changes that. It gives the connection a way to create fresh shared key material dynamically.&lt;/p&gt;

&lt;p&gt;In this article, I do not want to go deep into the mathematics behind it. I only want to use the core idea as the next building block of the protocol.&lt;/p&gt;

&lt;p&gt;If you want the deeper intuition behind why this works, I already wrote about it here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The aha moment of public key encryption&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dmytrohuz.com/p/the-aha-moment-of-public-key-encryption" rel="noopener noreferrer"&gt;https://www.dmytrohuz.com/p/the-aha-moment-of-public-key-encryption&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For now, the main idea we need is this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;each side contributes its own private value&lt;/li&gt;
&lt;li&gt;both sides exchange some public values&lt;/li&gt;
&lt;li&gt;both sides derive the same shared secret&lt;/li&gt;
&lt;li&gt;that secret can then become the basis for session keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So let’s build that first in the most explicit way, with classic Diffie-Hellman where the shared public parameters are still visible in the handshake.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Implementation Part 1 — Our First Handshake with Classic Diffie-Hellman&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now let’s build the first real handshake in the series.&lt;/p&gt;

&lt;p&gt;I want to start with classic Diffie-Hellman, not because this is the final form we want to keep, but because it makes the mechanics of key exchange much more visible.&lt;/p&gt;

&lt;p&gt;In this version, both sides work with the same public parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a prime p&lt;/li&gt;
&lt;li&gt;a generator g&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These values are not secret. In our implementation, the client sends them in the handshake, which makes the whole mechanism more explicit on the wire. That is exactly what I want at this stage. Before we hide the details behind a cleaner modern primitive, I want to make the structure fully visible.&lt;/p&gt;

&lt;p&gt;The actual secret material comes from somewhere else:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the client chooses a private exponent a&lt;/li&gt;
&lt;li&gt;the server chooses a private exponent b&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From those private values, both sides compute public values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the client computes A = g^a mod p&lt;/li&gt;
&lt;li&gt;the server computes B = g^b mod p&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then they exchange A and B.&lt;/p&gt;

&lt;p&gt;And this is the key step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the client computes s = B^a mod p&lt;/li&gt;
&lt;li&gt;the server computes s = A^b mod p&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both sides end up with the same shared secret, without ever sending that secret directly over the network.&lt;/p&gt;
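&lt;p&gt;The symmetry can be checked with deliberately tiny toy numbers, far too small to be secure but small enough to verify by hand:&lt;/p&gt;

```python
# Toy parameters: insecure on purpose, chosen so the arithmetic
# can be checked by hand.
p = 23   # "prime modulus"
g = 5    # generator

a = 6    # client's private exponent (stays local)
b = 15   # server's private exponent (stays local)

A = pow(g, a, p)   # client's public value: 8
B = pow(g, b, p)   # server's public value: 19

client_secret = pow(B, a, p)   # B^a mod p
server_secret = pow(A, b, p)   # A^b mod p

print(client_secret, server_secret)  # prints: 2 2
```

Both exponentiations land on the same value because B^a = (g^b)^a = g^(ab) = (g^a)^b = A^b, all mod p.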

&lt;p&gt;In diagram form, the handshake looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client                                        Server
------                                        ------
choose private a
compute A = g^a mod p

ClientHello(p, g, A)        ---------&amp;gt;

                                              choose private b
                                              compute B = g^b mod p

                            &amp;lt;---------          ServerHello(B)

compute s = B^a mod p                           compute s = A^b mod p
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is our first real handshake.&lt;/p&gt;

&lt;p&gt;Until now, the protocol always started with a secret key that already existed.&lt;/p&gt;

&lt;p&gt;Now the connection itself creates the secret.&lt;/p&gt;

&lt;p&gt;That is a major shift.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The raw Diffie-Hellman math&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At the lowest level, the core operations are very small. That is one of the nice things about starting with classic Diffie-Hellman: the whole idea is still visible in a few functions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# RFC 3526 Group 14: 2048-bit MODP prime
&lt;/span&gt;&lt;span class="n"&gt;DH_PRIME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;29024E088A67CC74020BBEA63B139B22514A08798E3404DD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EF9519B3CD3A431B302B0A6DF25F14374FE1356D6D51C245&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;E485B576625E7EC6F44C42E9A637ED6B0BFF5CB6F406B7ED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EE386BFB5A899FA5AE9F24117C4B1FE649286651ECE45B3D&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;C2007CB8A163BF0598DA48361C55D39A69163FA8FD24CF5F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;83655D23DCA3AD961C62F356208552BB9ED529077096966D&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;670C354E4ABC9804F1746C08CA18217C32905E462E36CE3B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;E39E772C180E86039B2783A2EC07A28FB5C55DF06F4C52C9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DE2BCBF6955817183995497CEA956AE515D2261898FA0510&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15728E5A8AACAA68FFFFFFFFFFFFFFFF&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;DH_GENERATOR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_private_exponent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urandom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;big&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_public_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;private&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;private&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_shared_secret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;peer_public&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;private&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;peer_public&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;private&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the whole core idea in code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;private exponent stays local&lt;/li&gt;
&lt;li&gt;public value goes on the wire&lt;/li&gt;
&lt;li&gt;shared secret is derived independently on both sides&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the heart of Diffie-Hellman.&lt;/p&gt;
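&lt;p&gt;Running the three helpers locally, with both endpoints simulated in one process, shows the two derivations agree. The definitions are repeated here so the snippet runs on its own:&lt;/p&gt;

```python
import os

# RFC 3526 Group 14 prime (2048-bit MODP), same constant as above.
DH_PRIME = int(
    "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD1"
    "29024E088A67CC74020BBEA63B139B22514A08798E3404DD"
    "EF9519B3CD3A431B302B0A6DF25F14374FE1356D6D51C245"
    "E485B576625E7EC6F44C42E9A637ED6B0BFF5CB6F406B7ED"
    "EE386BFB5A899FA5AE9F24117C4B1FE649286651ECE45B3D"
    "C2007CB8A163BF0598DA48361C55D39A69163FA8FD24CF5F"
    "83655D23DCA3AD961C62F356208552BB9ED529077096966D"
    "670C354E4ABC9804F1746C08CA18217C32905E462E36CE3B"
    "E39E772C180E86039B2783A2EC07A28FB5C55DF06F4C52C9"
    "DE2BCBF6955817183995497CEA956AE515D2261898FA0510"
    "15728E5A8AACAA68FFFFFFFFFFFFFFFF",
    16,
)
DH_GENERATOR = 2

def generate_private_exponent():
    return int.from_bytes(os.urandom(32), "big")

def compute_public_value(private, g, p):
    return pow(g, private, p)

def compute_shared_secret(peer_public, private, p):
    return pow(peer_public, private, p)

# Simulate both endpoints in one process.
a = generate_private_exponent()                      # client's private
b = generate_private_exponent()                      # server's private
A = compute_public_value(a, DH_GENERATOR, DH_PRIME)  # goes on the wire
B = compute_public_value(b, DH_GENERATOR, DH_PRIME)  # goes on the wire

client_side = compute_shared_secret(B, a, DH_PRIME)
server_side = compute_shared_secret(A, b, DH_PRIME)
print(client_side == server_side)  # prints: True
```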

&lt;h3&gt;
  
  
  &lt;strong&gt;Client side&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;client_handshake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Perform the client side of the classic DH handshake.

    The client picks the public parameters (p, g) and sends them to the
    server along with its own public DH value.  The server uses those
    parameters to compute its own public value and sends it back.

    Returns the shared secret as bytes.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# The client chooses p and g.  These are PUBLIC — not secret.
&lt;/span&gt;    &lt;span class="c1"&gt;# Anyone on the wire can see them, and that is perfectly fine.
&lt;/span&gt;    &lt;span class="c1"&gt;# The security of DH depends on the hardness of the discrete
&lt;/span&gt;    &lt;span class="c1"&gt;# logarithm problem, not on hiding p and g.
&lt;/span&gt;    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DH_PRIME&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DH_GENERATOR&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Public parameters (chosen by client, sent to server):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    p = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;... (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bit_length&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; bits)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    g = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Generate client's private exponent and public value.
&lt;/span&gt;    &lt;span class="c1"&gt;# The private exponent is the ONE thing that stays secret.
&lt;/span&gt;    &lt;span class="n"&gt;client_private&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_private_exponent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;client_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_public_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_private&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_public&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Send ClientHello with p, g, and our public value.
&lt;/span&gt;    &lt;span class="c1"&gt;# All three are public.  The private exponent is NOT included.
&lt;/span&gt;    &lt;span class="n"&gt;p_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client_hello&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encode_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TAG_DH_P&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p_bytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TAG_DH_G&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g_bytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TAG_DH_PUBLIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_public_bytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 3: send p, g, and the client’s public value inside ClientHello
&lt;/span&gt;    &lt;span class="nf"&gt;send_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_hello&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Receive ServerHello with the server's public value.
&lt;/span&gt;    &lt;span class="n"&gt;server_hello_raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recv_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decode_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_hello_raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TAG_DH_PUBLIC&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ServerHello missing DH public value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;server_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bytes_to_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_public_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &amp;lt;- Received ServerHello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Server public value B:   &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;hex_preview&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_public_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 5: Compute the shared secret.
&lt;/span&gt;    &lt;span class="c1"&gt;# shared = B^a mod p = (g^b)^a mod p = g^(ab) mod p
&lt;/span&gt;    &lt;span class="n"&gt;shared_int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_shared_secret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_public&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_private&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;shared_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shared_int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;shared_bytes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the client side, the flow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;choose a private exponent&lt;/li&gt;
&lt;li&gt;compute the public value&lt;/li&gt;
&lt;li&gt;send p, g, and the client’s public value inside ClientHello&lt;/li&gt;
&lt;li&gt;receive the server’s public value&lt;/li&gt;
&lt;li&gt;derive the shared secret&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the first point in the series where the client does not begin with the application key. It participates in creating it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Server side&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;server_handshake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Perform the server side of the classic DH handshake.

    The server receives p, g, and client_public from the ClientHello,
    uses those parameters to generate its own keypair, and sends its
    public value back.

    Returns the shared secret as bytes.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Receive ClientHello — parse p, g, and client's public value.
&lt;/span&gt;    &lt;span class="c1"&gt;# The server does NOT assume any particular p or g.  It uses whatever
&lt;/span&gt;    &lt;span class="c1"&gt;# the client proposes.  (In a production system, the server would
&lt;/span&gt;    &lt;span class="c1"&gt;# validate that p is a safe prime and g is a proper generator.
&lt;/span&gt;    &lt;span class="c1"&gt;# We skip that here for clarity.)
&lt;/span&gt;    &lt;span class="n"&gt;client_hello_raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recv_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decode_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_hello_raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;p_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;g_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;client_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TAG_DH_P&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;p_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TAG_DH_G&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;g_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TAG_DH_PUBLIC&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;client_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ClientHello missing DH prime (p)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;g_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ClientHello missing DH generator (g)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;client_public_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ClientHello missing DH public value (A)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Deserialize the parameters from bytes.
&lt;/span&gt;    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bytes_to_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bytes_to_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bytes_to_int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_public_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Generate server's private exponent and public value
&lt;/span&gt;    &lt;span class="c1"&gt;# using the p and g received from the client.
&lt;/span&gt;    &lt;span class="n"&gt;server_private&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_private_exponent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Compute server's public value
&lt;/span&gt;    &lt;span class="n"&gt;server_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_public_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_private&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_public&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Send ServerHello with our public value.
&lt;/span&gt;    &lt;span class="c1"&gt;# Only B is sent — p and g are already known from the ClientHello.
&lt;/span&gt;    &lt;span class="n"&gt;server_hello&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encode_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TAG_DH_PUBLIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_public_bytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;send_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_hello&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 5: Compute the shared secret.
&lt;/span&gt;    &lt;span class="c1"&gt;# shared = A^b mod p = (g^a)^b mod p = g^(ab) mod p
&lt;/span&gt;    &lt;span class="n"&gt;shared_int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_shared_secret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_public&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_private&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;shared_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shared_int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;shared_bytes&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server does the mirror image:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;receive p, g, and the client’s public value&lt;/li&gt;
&lt;li&gt;choose its own private exponent&lt;/li&gt;
&lt;li&gt;compute its own public value&lt;/li&gt;
&lt;li&gt;send that value back in ServerHello&lt;/li&gt;
&lt;li&gt;derive the same shared secret from the client’s public value&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So at the end of the handshake, both sides have the same secret — but that secret was never transmitted directly.&lt;/p&gt;

&lt;p&gt;That is the big win.&lt;/p&gt;

&lt;p&gt;After this step, the connection can create fresh shared key material dynamically.&lt;/p&gt;
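&lt;p&gt;As a preview of how that fresh key material might become session keys, here is a minimal HKDF-style sketch using only the standard library. The salt and direction labels are illustrative assumptions, not part of the protocol so far:&lt;/p&gt;

```python
import hashlib
import hmac

def derive_session_keys(shared_secret):
    """HKDF-style extract-and-expand; a placeholder KDF, not the protocol's final one."""
    # Extract: concentrate the secret's entropy into a fixed-size key (PRK).
    prk = hmac.new(b"handshake-salt", shared_secret, hashlib.sha256).digest()
    # Expand: one key per direction, so the two traffic directions
    # never share a key.
    c2s = hmac.new(prk, b"client-to-server" + b"\x01", hashlib.sha256).digest()
    s2c = hmac.new(prk, b"server-to-client" + b"\x01", hashlib.sha256).digest()
    return c2s, s2c

c2s_key, s2c_key = derive_session_keys(b"example shared secret bytes")
print(len(c2s_key), len(s2c_key))  # prints: 32 32
```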

&lt;p&gt;That is a much more realistic foundation.&lt;/p&gt;

&lt;p&gt;But it is also still awkward.&lt;/p&gt;

&lt;p&gt;Not conceptually awkward — educationally this version is very useful — but operationally awkward. We now have explicit p and g in the handshake, which is nice for understanding the mechanism, but clunky for a modern protocol design.&lt;/p&gt;

&lt;p&gt;That is exactly why the next step will replace this version with X25519.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Implementation Part 2 — Simplifying the Handshake with X25519&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The classic Diffie-Hellman version was useful because it made the mechanics of the handshake fully visible.&lt;/p&gt;

&lt;p&gt;But it also makes something else visible:&lt;/p&gt;

&lt;p&gt;it is a bit clunky.&lt;/p&gt;

&lt;p&gt;Not conceptually clunky — educationally it is great — but operationally clunky. There are more moving parts in the handshake, more explicit protocol fields, and more visible math than modern protocols usually want to expose directly.&lt;/p&gt;

&lt;p&gt;So now we keep the same core idea and simplify the workflow.&lt;/p&gt;

&lt;p&gt;That is where &lt;strong&gt;X25519&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;The conceptual goal stays exactly the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;both sides generate ephemeral private/public key pairs&lt;/li&gt;
&lt;li&gt;both sides exchange public keys&lt;/li&gt;
&lt;li&gt;both sides derive the same shared secret&lt;/li&gt;
&lt;li&gt;that secret will later become the basis for session keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What changes is the &lt;em&gt;shape&lt;/em&gt; of the handshake.&lt;/p&gt;

&lt;p&gt;We no longer need to carry an explicit prime and generator through the protocol. We no longer manually perform modular exponentiation with visible p and g. X25519 gives us the same public-key exchange idea in a much cleaner modern form.&lt;/p&gt;

&lt;p&gt;That is why I wanted this section right after the classic DH version.&lt;/p&gt;

&lt;p&gt;Classic DH makes the mechanism visible.&lt;/p&gt;

&lt;p&gt;X25519 shows what the modern streamlined version looks like.&lt;/p&gt;
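&lt;p&gt;As a standalone sketch, both sides of an X25519 exchange fit in a few lines with the &lt;code&gt;cryptography&lt;/code&gt; package (the same library the handshake code below uses):&lt;/p&gt;

```python
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

# Each side generates an ephemeral keypair; no p or g travels on the wire.
client_private = X25519PrivateKey.generate()
server_private = X25519PrivateKey.generate()

# Only the 32-byte public keys are exchanged.
client_public = client_private.public_key()
server_public = server_private.public_key()

# Both sides derive the same 32-byte shared secret independently.
client_secret = client_private.exchange(server_public)
server_secret = server_private.exchange(client_public)

print(client_secret == server_secret, len(client_secret))  # prints: True 32
```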

&lt;h3&gt;
  
  
  &lt;strong&gt;Client-side handshake structure&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here is the current client handshake implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;client_handshake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Perform the client side of the X25519 handshake.

    Returns the 32-byte shared secret.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[handshake] Client: starting X25519 handshake&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Generate an ephemeral X25519 keypair.
&lt;/span&gt;    &lt;span class="c1"&gt;# "Ephemeral" means we create a fresh keypair for this session only.
&lt;/span&gt;    &lt;span class="c1"&gt;# The private key never leaves this process and is discarded after use.
&lt;/span&gt;    &lt;span class="n"&gt;client_private&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X25519PrivateKey&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;client_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client_private&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;public_key&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;client_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client_public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;public_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Encoding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PublicFormat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Send ClientHello with our public key.
&lt;/span&gt;    &lt;span class="n"&gt;client_hello&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encode_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TAG_X25519_PUBLIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_public_bytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;send_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;client_hello&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Receive ServerHello with the server's public key.
&lt;/span&gt;    &lt;span class="n"&gt;server_hello_raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recv_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decode_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_hello_raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TAG_X25519_PUBLIC&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;server_public_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ServerHello missing X25519 public key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Deserialize the server's public key from raw bytes.
&lt;/span&gt;    &lt;span class="n"&gt;server_public&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X25519PublicKey&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_public_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_public_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Compute the shared secret.
&lt;/span&gt;    &lt;span class="c1"&gt;# X25519(client_private, server_public) = X25519(server_private, client_public)
&lt;/span&gt;    &lt;span class="c1"&gt;# This is the elliptic-curve equivalent of g^(ab) mod p from v1.
&lt;/span&gt;    &lt;span class="n"&gt;shared_secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client_private&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exchange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_public&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;shared_secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
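&lt;p&gt;The server side mirrors this exactly, so the core property can be checked in one process without sockets. The sketch below skips the record framing (send_record, encode_message, and the tag constants belong to the series code and are not reproduced here) and only verifies what the handshake relies on: two ephemeral keypairs, serialized to raw 32-byte wire form and back, produce the same shared secret on both sides.&lt;/p&gt;

```python
# Self-contained sanity check: two X25519 keypairs, one shared secret.
# No sockets or record framing -- just the math the handshake relies on.
from cryptography.hazmat.primitives.asymmetric.x25519 import (
    X25519PrivateKey,
    X25519PublicKey,
)
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# Each side generates an ephemeral keypair for this session only.
client_private = X25519PrivateKey.generate()
server_private = X25519PrivateKey.generate()

# Each side serializes its public key to the raw 32-byte wire form.
client_pub = client_private.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
server_pub = server_private.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
assert len(client_pub) == 32 and len(server_pub) == 32

# Each side reconstructs the peer's key from bytes and runs exchange().
client_secret = client_private.exchange(X25519PublicKey.from_public_bytes(server_pub))
server_secret = server_private.exchange(X25519PublicKey.from_public_bytes(client_pub))

assert client_secret == server_secret
```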



&lt;p&gt;I like this version because it makes the transition very clear.&lt;/p&gt;

&lt;p&gt;The client code no longer has to think about p and g at all. It just performs the handshake and returns the shared secret. That is exactly the point of this stage in the series: the workflow becomes smaller, but the underlying purpose stays the same.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What changed conceptually&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Compared to the classic DH version, the protocol has become simpler in three important ways.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. No explicit shared public parameters in the handshake&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In the previous version, the client sent the prime and generator so the whole structure of classic Diffie-Hellman stayed visible.&lt;/p&gt;

&lt;p&gt;Now that goes away.&lt;/p&gt;

&lt;p&gt;X25519 already gives us a fixed, standard structure for the exchange, so the handshake only needs to carry the public key material.&lt;/p&gt;

&lt;p&gt;That makes the protocol smaller and cleaner.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. The public values are much more compact&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In the classic DH version, the public values were large integers in a prime field, so each one took hundreds of bytes on the wire for typical parameter sizes.&lt;/p&gt;

&lt;p&gt;In this version, the public keys are just 32 bytes.&lt;/p&gt;

&lt;p&gt;That is a huge practical simplification.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. The code starts to look more like real modern protocol code&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This line from the comments says it well:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;generate(), exchange(), done.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is exactly the feeling this section should create.&lt;/p&gt;

&lt;p&gt;We are still doing public-key exchange.&lt;/p&gt;

&lt;p&gt;We are still deriving a shared secret.&lt;/p&gt;

&lt;p&gt;But the implementation shape is now much closer to what modern systems actually use.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What this version still does not solve&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Even after switching to X25519, this version is still simplified:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;there is still &lt;strong&gt;no authentication&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;the shared secret is &lt;strong&gt;not yet turned into session keys&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;there is still &lt;strong&gt;no record-layer encryption using the new keys&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the next step, we will add &lt;strong&gt;HKDF&lt;/strong&gt; and derive proper working session keys from the shared secret.&lt;/p&gt;

&lt;p&gt;That is where the handshake starts to connect back to the record protection we built earlier.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Implementation Part 3 — Deriving Session Keys with HKDF&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At this point, both the classic Diffie-Hellman version and the X25519 version give us the same kind of output:&lt;/p&gt;

&lt;p&gt;a shared secret that both sides can compute independently.&lt;/p&gt;

&lt;p&gt;That is already a big step forward compared to the pre-shared-key model from the previous parts. The connection can now create fresh key material dynamically instead of starting with one hardcoded application key.&lt;/p&gt;

&lt;p&gt;But there is still one important design question left:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;should we use that raw shared secret directly as the application key?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a toy demo, we probably could.&lt;/p&gt;

&lt;p&gt;But even here, that would be the wrong direction.&lt;/p&gt;

&lt;p&gt;Because a cleaner protocol separates these two ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the handshake creates a shared secret&lt;/li&gt;
&lt;li&gt;the protocol derives working session keys from that secret&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly where &lt;strong&gt;HKDF&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;HKDF is a key-derivation function. Its job is not to invent secrecy out of nowhere, but to take existing secret material and turn it into keys that are better structured and easier to use safely inside the protocol.&lt;/p&gt;

&lt;p&gt;So instead of treating the X25519 output as “the AES key,” we will use HKDF to derive proper session keys from it.&lt;/p&gt;

&lt;p&gt;That already makes the protocol feel much closer to real TLS.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What changes conceptually&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The structure now becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X25519 shared secret
        |
        v
      HKDF
        |
        v
  session key material
        |
        v
 protected application data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is an important shift.&lt;/p&gt;

&lt;p&gt;Before this step, the handshake produced something secret and we could have stopped there.&lt;/p&gt;

&lt;p&gt;After this step, the handshake produces an &lt;em&gt;input&lt;/em&gt; to a key schedule.&lt;/p&gt;

&lt;p&gt;That is a much better protocol design.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why this matters&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;There are two main reasons to do this.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. The raw shared secret is handshake output, not final protocol state&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The shared secret is the result of key exchange. That does not automatically mean it should be used directly as the application-data key.&lt;/p&gt;

&lt;p&gt;Protocols usually want a cleaner boundary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;handshake result first&lt;/li&gt;
&lt;li&gt;working keys second&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. We can derive keys for different purposes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Once we introduce a key-derivation step, we are no longer forced into “one secret for everything.”&lt;/p&gt;

&lt;p&gt;Even in this toy protocol, that opens the door to a much more realistic design.&lt;/p&gt;

&lt;p&gt;For example, instead of one single AEAD key, we can derive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;client → server key&lt;/li&gt;
&lt;li&gt;server → client key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is already much closer to how real secure protocols think.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Deriving the keys&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In the current implementation, HKDF takes the X25519 shared secret and stretches it into 64 bytes of key material.&lt;/p&gt;

&lt;p&gt;Then that material is split into two 32-byte keys:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one for traffic from client to server&lt;/li&gt;
&lt;li&gt;one for traffic from server to client&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives us directional keys instead of one shared application key for both directions.&lt;/p&gt;

&lt;p&gt;Here is the key schedule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# key_schedule_x25519.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cryptography.hazmat.primitives&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashes&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cryptography.hazmat.primitives.kdf.hkdf&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HKDF&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;derive_session_keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shared_secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;key_material&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HKDF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;algorithm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;hashes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SHA256&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;salt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;toy-tls-part-3-x25519&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;derive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shared_secret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;client_to_server_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key_material&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;server_to_client_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key_material&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client_to_server_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_to_client_key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
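&lt;p&gt;Copying that schedule into a quick self-contained check makes its two essential properties visible: the derivation is deterministic (both peers, given the same shared secret, compute identical key material), and the two directions get distinct 32-byte keys. The all-zero secret below is just a stand-in input for the check, not something a real handshake would produce.&lt;/p&gt;

```python
# Check the key schedule: same input -> same keys, split into two halves.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_session_keys(shared_secret: bytes) -> tuple[bytes, bytes]:
    # Same parameters as the schedule above; HKDF objects are single-use,
    # so a fresh instance is built per derivation.
    key_material = HKDF(
        algorithm=hashes.SHA256(),
        length=64,
        salt=None,
        info=b"toy-tls-part-3-x25519",
    ).derive(shared_secret)
    return key_material[:32], key_material[32:]

secret = bytes(32)  # stand-in for a real X25519 shared secret
c2s_a, s2c_a = derive_session_keys(secret)
c2s_b, s2c_b = derive_session_keys(secret)

assert len(c2s_a) == 32 and len(s2c_a) == 32
assert (c2s_a, s2c_a) == (c2s_b, s2c_b)  # deterministic: both peers agree
assert c2s_a != s2c_a                    # directions get independent keys
```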



&lt;p&gt;I like this step a lot because it is small in code, but it changes the protocol mindset in an important way.&lt;/p&gt;

&lt;p&gt;We are no longer thinking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;handshake gives us the key&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We are now thinking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;handshake gives us secret material, and the protocol derives the keys it actually wants to use&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a much stronger model.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A small but important detail&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Notice that the two sides must interpret the derived keys consistently.&lt;/p&gt;

&lt;p&gt;If the client treats the first 32 bytes as the client → server key, then the server must do the same. Otherwise the channel will immediately break.&lt;/p&gt;

&lt;p&gt;So now the handshake is not only producing shared secret material. It is also establishing a shared rule for how that material becomes working traffic keys.&lt;/p&gt;

&lt;p&gt;That is another reason protocols need structure, not just primitives.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Connecting HKDF back to the record layer&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now we can finally connect this part back to what we built earlier.&lt;/p&gt;

&lt;p&gt;In Part 2, we already built an AEAD-protected record layer. But that record layer still depended on hardcoded keys.&lt;/p&gt;

&lt;p&gt;Now that changes.&lt;/p&gt;

&lt;p&gt;The AEAD layer no longer starts with a static key from configuration.&lt;/p&gt;

&lt;p&gt;It receives fresh traffic keys from the handshake.&lt;/p&gt;

&lt;p&gt;So the protocol shape becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Handshake -&amp;gt; X25519 shared secret -&amp;gt; HKDF -&amp;gt; directional session keys -&amp;gt; AEAD protected records
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
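&lt;p&gt;The whole pipeline can be sketched in a few lines. The protect and unprotect helpers below are minimal stand-ins (AES-GCM with the sequence number as nonce), not the series' exact record-layer functions, and the handshake is collapsed into one process with no serialization; the point is only to show the shape: handshake output feeds HKDF, HKDF output feeds the record layer.&lt;/p&gt;

```python
# End-to-end sketch: X25519 -> HKDF -> directional AEAD keys -> one record.
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Minimal stand-ins for the record helpers: AES-256-GCM, 12-byte nonce
# built from the record sequence number.
def protect(key: bytes, seq: int, plaintext: bytes) -> bytes:
    return AESGCM(key).encrypt(seq.to_bytes(12, "big"), plaintext, None)

def unprotect(key: bytes, seq: int, ciphertext: bytes) -> bytes:
    return AESGCM(key).decrypt(seq.to_bytes(12, "big"), ciphertext, None)

# Handshake: both sides compute the same shared secret.
client_priv, server_priv = X25519PrivateKey.generate(), X25519PrivateKey.generate()
secret = client_priv.exchange(server_priv.public_key())

# Key schedule: stretch the secret into two directional 32-byte keys.
material = HKDF(
    algorithm=hashes.SHA256(),
    length=64,
    salt=None,
    info=b"toy-tls-part-3-x25519",
).derive(secret)
client_write_key, server_write_key = material[:32], material[32:]

# Record layer: the client protects with its write key, and the server
# unprotects the same record with the same key.
record = protect(client_write_key, 0, b"GET / HTTP/1.1\r\n\r\n")
assert unprotect(client_write_key, 0, record) == b"GET / HTTP/1.1\r\n\r\n"
```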



&lt;p&gt;That is a major milestone in the series.&lt;/p&gt;

&lt;p&gt;At this point, the protocol no longer just looks secure because we wrapped some bytes in encryption. It now has a real high-level structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;first establish shared key material&lt;/li&gt;
&lt;li&gt;then derive traffic keys&lt;/li&gt;
&lt;li&gt;then use those keys to protect application data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is already much closer to the shape of real TLS.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Using the new session keys&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Once the keys are derived, the record layer can use them directly.&lt;/p&gt;

&lt;p&gt;Conceptually, the flow now looks like this:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Client&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AF_INET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOCK_STREAM&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Connected to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;HOST&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;    &lt;span class="c1"&gt;# PHASE 1: HANDSHAKE
&lt;/span&gt;    &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;    &lt;span class="c1"&gt;# New in Part 3: the handshake dynamically establishes session keys.
&lt;/span&gt;    &lt;span class="c1"&gt;# No pre-shared secret needed.
&lt;/span&gt;    &lt;span class="n"&gt;client_write_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_write_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;client_handshake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;    &lt;span class="c1"&gt;# PHASE 2: APPLICATION DATA
&lt;/span&gt;    &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;    &lt;span class="c1"&gt;# The record layer now uses HKDF-derived keys instead of hardcoded ones.
&lt;/span&gt;    &lt;span class="c1"&gt;# The record format is the same as Part 2 Stage 3 (AEAD).
&lt;/span&gt;
    &lt;span class="c1"&gt;# --- Send request (encrypted with client_write_key) ---
&lt;/span&gt;    &lt;span class="n"&gt;protected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;protect_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_write_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;send_seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;send_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;protected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;send_seq&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="c1"&gt;# --- Receive response (decrypted with server_write_key) ---
&lt;/span&gt;    &lt;span class="n"&gt;raw_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recv_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;unprotect_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_write_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recv_seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;recv_seq&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;  Decrypted response:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;  *** REJECTED: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ***&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Done.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;use client_write_key to protect outgoing application data&lt;/li&gt;
&lt;li&gt;use server_write_key to unprotect incoming application data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Server&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AF_INET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOCK_STREAM&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setsockopt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SOL_SOCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SO_REUSEADDR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Listening on &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;HOST&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;accept&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;        &lt;span class="c1"&gt;# PHASE 1: HANDSHAKE
&lt;/span&gt;        &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;        &lt;span class="n"&gt;client_write_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_write_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;server_handshake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;        &lt;span class="c1"&gt;# PHASE 2: APPLICATION DATA
&lt;/span&gt;        &lt;span class="c1"&gt;# ==========================================
&lt;/span&gt;
        &lt;span class="c1"&gt;# --- Receive request (decrypted with client_write_key) ---
&lt;/span&gt;        &lt;span class="n"&gt;raw_request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recv_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;unprotect_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_write_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recv_seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;recv_seq&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;  *** REJECTED: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ***&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Connection closed — refusing to process invalid data.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

            &lt;span class="c1"&gt;# --- Send response (encrypted with server_write_key) ---
&lt;/span&gt;            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HTTP/1.1 200 OK&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type: text/plain&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Length: 13&lt;/span&gt;&lt;span class="se"&gt;\r\n\r\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello, client&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;protected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;protect_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_write_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;send_seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;send_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;protected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;send_seq&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Done.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;use client_write_key to unprotect incoming client traffic&lt;/li&gt;
&lt;li&gt;use server_write_key to protect outgoing server traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the two directions are now separated.&lt;/p&gt;

&lt;p&gt;This is cleaner than a single symmetric application key shared blindly in both directions, and it makes the protocol feel more deliberate.&lt;/p&gt;

&lt;p&gt;Even in this simplified version, that is a meaningful step.&lt;/p&gt;
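&lt;p&gt;To make the idea concrete, here is a minimal stdlib-only sketch of that derivation (illustrative names, not the article's exact code): one HKDF extract, then two expands with different &lt;code&gt;info&lt;/code&gt; labels, so the two directions get unrelated keys from the same handshake secret.&lt;/p&gt;

```python
# A minimal HKDF (RFC 5869) sketch using only hashlib/hmac.
# The labels and the fake shared secret are illustrative.
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes

def hkdf_extract(salt, ikm):
    # Extract: turn the non-uniform DH output into a uniform PRK.
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk, info, length):
    # Expand: stretch the PRK into independent output bound to `info`.
    nblocks = -(-length // HASH_LEN)  # ceiling division
    okm, block = b"", b""
    for counter in range(1, nblocks + 1):
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
    return okm[:length]

# One shared secret, two directions: different labels, unrelated keys.
shared_secret = b"\x11" * 32          # stand-in for the X25519 output
prk = hkdf_extract(b"\x00" * HASH_LEN, shared_secret)
client_write_key = hkdf_expand(prk, b"client write", 32)
server_write_key = hkdf_expand(prk, b"server write", 32)
```

&lt;p&gt;Because the &lt;code&gt;info&lt;/code&gt; label is mixed into every HMAC block, the two outputs are computationally independent even though they come from the same secret.&lt;/p&gt;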




&lt;h2&gt;
  
  
  &lt;strong&gt;What this step really gave us&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;By adding HKDF, we improved the protocol in a way that is easy to underestimate.&lt;/p&gt;

&lt;p&gt;We did not just “derive another key.”&lt;/p&gt;

&lt;p&gt;We made the protocol architecture cleaner.&lt;/p&gt;

&lt;p&gt;Now the handshake and the traffic layer are connected in a more principled way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the handshake creates shared secret material&lt;/li&gt;
&lt;li&gt;the key schedule turns that material into working keys&lt;/li&gt;
&lt;li&gt;the record layer consumes those keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a much better model than treating the raw X25519 result as the final answer.&lt;/p&gt;

&lt;p&gt;And it brings us one step closer to real TLS, where key derivation is not an optional detail, but one of the central pieces of the protocol design.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;But we are still not secure&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;And now we arrive at the uncomfortable but necessary part.&lt;/p&gt;

&lt;p&gt;Even with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a real handshake&lt;/li&gt;
&lt;li&gt;X25519&lt;/li&gt;
&lt;li&gt;HKDF&lt;/li&gt;
&lt;li&gt;fresh directional session keys&lt;/li&gt;
&lt;li&gt;AEAD-protected records&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the protocol still cannot be considered secure.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because all of this still says nothing about &lt;strong&gt;who&lt;/strong&gt; is on the other side.&lt;/p&gt;

&lt;p&gt;The handshake can successfully create shared secrets.&lt;/p&gt;

&lt;p&gt;HKDF can successfully derive traffic keys.&lt;/p&gt;

&lt;p&gt;The record layer can successfully protect application data.&lt;/p&gt;

&lt;p&gt;And an attacker can still sit in the middle and run two separate handshakes.&lt;/p&gt;

&lt;p&gt;That is the next lesson.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Still Not Secure — The Man-in-the-Middle Problem&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At this point, our protocol already looks much more serious than the one we started with.&lt;/p&gt;

&lt;p&gt;We now have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a real handshake&lt;/li&gt;
&lt;li&gt;fresh shared secrets&lt;/li&gt;
&lt;li&gt;X25519 instead of a pre-shared application key&lt;/li&gt;
&lt;li&gt;HKDF-derived session keys&lt;/li&gt;
&lt;li&gt;AEAD-protected application records&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a long way from the fake secure channel in Part 1.&lt;/p&gt;

&lt;p&gt;But it is still not enough.&lt;/p&gt;

&lt;p&gt;The missing piece is one of the most important ideas in this whole series:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;key exchange is not authentication&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That sentence is easy to read quickly and move on from. But it is worth stopping here, because this is exactly where many protocols fail.&lt;/p&gt;

&lt;p&gt;Our handshake proves that both sides can derive the same shared secret.&lt;/p&gt;

&lt;p&gt;What it does &lt;strong&gt;not&lt;/strong&gt; prove is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;who&lt;/strong&gt; is actually on the other side.&lt;/p&gt;

&lt;p&gt;And that difference is the whole problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The attack&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Imagine an active attacker sitting between the client and the server.&lt;/p&gt;

&lt;p&gt;Let’s call her Mallory.&lt;/p&gt;

&lt;p&gt;The client thinks it is talking to the server.&lt;/p&gt;

&lt;p&gt;The server thinks it is talking to the client.&lt;/p&gt;

&lt;p&gt;But Mallory intercepts the handshake and replaces the exchanged public keys with her own.&lt;/p&gt;

&lt;p&gt;In simplified form, the flow looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiq3eu6p0rk0j2mynzq9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiq3eu6p0rk0j2mynzq9z.png" alt="attack schema"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And now something very important happens.&lt;/p&gt;

&lt;p&gt;The handshake still “works.”&lt;/p&gt;

&lt;p&gt;But it works in the wrong way.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;strong&gt;client&lt;/strong&gt; ends up with a shared secret with &lt;strong&gt;Mallory&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;the &lt;strong&gt;server&lt;/strong&gt; ends up with a different shared secret with &lt;strong&gt;Mallory&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;and &lt;strong&gt;Mallory&lt;/strong&gt; now has one valid secure channel to each side&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From the point of view of the client and the server, everything looks normal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;key exchange succeeded&lt;/li&gt;
&lt;li&gt;keys were derived&lt;/li&gt;
&lt;li&gt;encrypted records verify correctly&lt;/li&gt;
&lt;li&gt;AEAD tags are valid&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And yet the protocol has already failed.&lt;/p&gt;

&lt;p&gt;Because Mallory can now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;decrypt the client’s traffic&lt;/li&gt;
&lt;li&gt;read it or modify it&lt;/li&gt;
&lt;li&gt;re-encrypt it toward the server&lt;/li&gt;
&lt;li&gt;receive the server’s response&lt;/li&gt;
&lt;li&gt;read it or modify it&lt;/li&gt;
&lt;li&gt;re-encrypt it back toward the client&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Neither side can detect this.&lt;/p&gt;
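&lt;p&gt;The attack fits in a few lines. This toy uses classic modular Diffie-Hellman in place of X25519, with a Mersenne prime modulus purely for convenience; everything here is illustrative, not secure code.&lt;/p&gt;

```python
# Toy demonstration: an unauthenticated key exchange lets Mallory
# run one valid handshake with each side. Parameters are illustrative.
import secrets

P = 2 ** 521 - 1   # a Mersenne prime; fine for a toy demo, not for real use
G = 5

def keypair():
    priv = secrets.randbelow(P - 2) + 2
    return priv, pow(G, priv, P)

# Honest parties generate their key pairs...
client_priv, client_pub = keypair()
server_priv, server_pub = keypair()
# ...but Mallory intercepts the handshake and substitutes her own key.
mallory_priv, mallory_pub = keypair()

# The client computes a "shared" secret, but with Mallory, not the server.
client_secret = pow(mallory_pub, client_priv, P)
# The server does the same on its side.
server_secret = pow(mallory_pub, server_priv, P)
# Mallory can derive both, so she now bridges two valid channels.
mallory_client_side = pow(client_pub, mallory_priv, P)
mallory_server_side = pow(server_pub, mallory_priv, P)
```

&lt;p&gt;Both handshakes succeed, both sides derive working keys, every AEAD tag verifies, and Mallory holds both secrets. Nothing in the math detects the substitution.&lt;/p&gt;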

&lt;h3&gt;
  
  
  &lt;strong&gt;In The Next Part — Building the Certificate Infrastructure&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The handshake only proves one thing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I computed a shared secret with whoever sent me this public key.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; prove:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This public key came from the server I actually intended to talk to.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the missing half.&lt;/p&gt;

&lt;p&gt;To fix this, the client needs a way to verify that the public key it receives during the handshake actually belongs to the server it wanted to talk to.&lt;/p&gt;

&lt;p&gt;That is where the next layer enters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;certificates&lt;/li&gt;
&lt;li&gt;signatures&lt;/li&gt;
&lt;li&gt;trust chains&lt;/li&gt;
&lt;li&gt;certificate authorities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, this is where the protocol must stop proving only that “someone” is there and start proving &lt;strong&gt;who&lt;/strong&gt; that someone is.&lt;/p&gt;

&lt;p&gt;That is exactly what the next article will build.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;Our protocol now has secrecy against passive observers.&lt;/p&gt;

&lt;p&gt;It has integrity for protected records.&lt;/p&gt;

&lt;p&gt;It has fresh session keys.&lt;/p&gt;

&lt;p&gt;But it still does not have &lt;strong&gt;identity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And without identity, a correct shared secret with the wrong party is still a protocol failure.&lt;/p&gt;

&lt;p&gt;That is the deeper lesson of Part 3.&lt;/p&gt;

&lt;p&gt;Part 1 taught us:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;confidentiality is not integrity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Part 2 taught us:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;protecting records is not the same thing as establishing trust&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And now Part 3 adds the next lesson:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;key exchange is not authentication&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We will solve that in the next article!&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Final Code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The full code for this part is available here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/DmytroHuzz/rebuilding_tls/tree/main/part_3" rel="noopener noreferrer"&gt;https://github.com/DmytroHuzz/rebuilding_tls/tree/main/part_3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>learning</category>
      <category>development</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>Amazon Aurora DSQL: A Practical Guide to AWS's Distributed SQL Database</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Sun, 19 Apr 2026 16:23:59 +0000</pubDate>
      <link>https://future.forem.com/aws-builders/amazon-aurora-dsql-a-practical-guide-to-awss-distributed-sql-database-2n58</link>
      <guid>https://future.forem.com/aws-builders/amazon-aurora-dsql-a-practical-guide-to-awss-distributed-sql-database-2n58</guid>
      <description>&lt;p&gt;&lt;em&gt;Architecture, features, Terraform setup, and real application code - April 2026&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;When AWS announced Aurora DSQL at re:Invent 2024, I was very interested. We had heard promises about distributed SQL databases before and I really wanted to try it out. I experimented with it locally for a while and then built the &lt;a href="https://darryl-ruggles.cloud/dsql-kabob-store/" rel="noopener noreferrer"&gt;Kabob Store&lt;/a&gt; example on it. Fifteen months later, DSQL has gone from preview to general availability, expanded to 14 regions, and shipped a steady stream of features. It fills the gap between DynamoDB's serverless economics and Aurora PostgreSQL's SQL power - and it does it well.&lt;/p&gt;

&lt;p&gt;This is my comprehensive look at where DSQL stands in April 2026: what it does, what it doesn't do yet, how to set it up with Terraform, and practical application code you can use today.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Aurora DSQL?
&lt;/h2&gt;

&lt;p&gt;For years, the database decision on AWS looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need serverless economics?&lt;/strong&gt; DynamoDB. But learn single-table design and give up SQL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need SQL?&lt;/strong&gt; RDS or Aurora PostgreSQL. But accept always-on costs, instance sizing, and 10-15 minute provisioning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need multi-Region?&lt;/strong&gt; DynamoDB Global Tables. SQL wasn't an option without manual replication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aurora DSQL eliminates the tradeoff. Four things make it different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless to zero&lt;/strong&gt; - No instances, no capacity planning. Zero DPU charges when idle. Provisions in under 60 seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL compatible&lt;/strong&gt; - Based on PostgreSQL 16. Use psql, psycopg2, pgx, JDBC - the drivers you already know.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strongly consistent&lt;/strong&gt; - Not eventually consistent. Snapshot isolation with linearizability. Readers always see committed data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Active-active multi-Region&lt;/strong&gt; - Two full regions with concurrent reads and writes. No leader, no failover, no replication lag on commit.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is Aurora DSQL?
&lt;/h2&gt;

&lt;p&gt;Aurora DSQL is a serverless, distributed SQL database that disaggregates every component of a traditional database engine. Unlike Aurora PostgreSQL (which separates storage from compute but keeps them coupled), DSQL breaks the database into six independently scaling components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Query Processors (QPs)&lt;/strong&gt; - Run customized PostgreSQL engines inside Firecracker MicroVMs. Handle SQL parsing, planning, and execution. Scale independently based on query load.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adjudicators&lt;/strong&gt; - Validate transactions at COMMIT time using Optimistic Concurrency Control (OCC). Stateless and reconstructible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Journal&lt;/strong&gt; - A Paxos-based distributed transaction log (same technology as MemoryDB). Provides cross-AZ and cross-Region durability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crossbar&lt;/strong&gt; - Merges journal streams and publishes committed writes to storage replicas. Sits between the Journal and Storage layers, ensuring all storage replicas receive the same ordered stream of committed transactions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt; - MVCC storage replicas distributed across 3 AZs. Consume committed entries from the Crossbar. Scale independently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control Plane&lt;/strong&gt; - Coordinates all components, handles cluster lifecycle and scaling.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Note: The&lt;/em&gt; &lt;a href="https://docs.aws.amazon.com/aurora-dsql/latest/userguide/what-is-aurora-dsql.html" rel="noopener noreferrer"&gt;official AWS User Guide&lt;/a&gt; &lt;em&gt;describes these layers as "Relay and connectivity, Compute and databases, Transaction log/concurrency control/isolation, Storage, and Control plane." The component names used here (Query Processors, Adjudicators, Journal, Crossbar) come from&lt;/em&gt; &lt;a href="https://brooker.co.za/blog/2024/12/03/aurora-dsql.html" rel="noopener noreferrer"&gt;Marc Brooker's architecture deep-dive series&lt;/a&gt; &lt;em&gt;and the&lt;/em&gt; &lt;a href="https://aws.amazon.com/blogs/database/introducing-amazon-aurora-dsql/" rel="noopener noreferrer"&gt;AWS Database Blog&lt;/a&gt;&lt;em&gt;, which provide more implementation detail.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7lx19wkxwxif02puesbh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7lx19wkxwxif02puesbh.png" alt="Aurora DSQL Single Region Architecture" width="800" height="1558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key design achievement, as Marc Brooker (VP/Distinguished Engineer at AWS) explained in his &lt;a href="https://brooker.co.za/blog/2024/12/03/aurora-dsql.html" rel="noopener noreferrer"&gt;DSQL blog series&lt;/a&gt;, is that cross-region latency is incurred &lt;strong&gt;only at COMMIT time&lt;/strong&gt;, not per-statement. During a transaction, reads and writes execute locally on the Query Processor. Only when you commit does the system coordinate with the Adjudicator and Journal for conflict detection and durability. Read-only transactions need no validation, no persistence, and no cross-region coordination at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Concepts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Optimistic Concurrency Control (OCC)&lt;/strong&gt; - DSQL doesn't use locks. Transactions proceed without blocking each other. At COMMIT, the Adjudicator checks for write-write conflicts. If two transactions modified the same rows, one succeeds and the other gets a serialization failure (SQLSTATE 40001). Your application retries the failed transaction. No deadlocks, ever.&lt;/p&gt;
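&lt;p&gt;In application code, this means wrapping commits in a retry loop. The sketch below is generic Python, not DSQL driver code; &lt;code&gt;SerializationFailure&lt;/code&gt; and &lt;code&gt;flaky_commit&lt;/code&gt; are illustrative stand-ins for a driver exception carrying SQLSTATE 40001 and your transaction function.&lt;/p&gt;

```python
# Hedged sketch of retrying OCC serialization failures (SQLSTATE 40001).
# The exception class and transaction function are illustrative.
import random
import time

class SerializationFailure(Exception):
    sqlstate = "40001"

def with_occ_retry(txn_fn, max_attempts=5, base_delay=0.05):
    # Run txn_fn; on a serialization failure, back off and retry.
    for attempt in range(max_attempts):
        try:
            return txn_fn()
        except SerializationFailure:
            # Exponential backoff with jitter before retrying.
            time.sleep(base_delay * (2 ** attempt) * random.random())
    raise RuntimeError("transaction kept conflicting; giving up")

# Simulate a transaction whose first two commits hit a write-write conflict.
attempts = {"n": 0}
def flaky_commit():
    attempts["n"] += 1
    if attempts["n"] in (1, 2):   # first two tries conflict
        raise SerializationFailure()
    return "committed"

result = with_occ_retry(flaky_commit)
```

&lt;p&gt;The key point is that the retry re-runs the whole transaction, not just the commit: the new attempt gets a fresh snapshot, so it sees the rows the winning transaction wrote.&lt;/p&gt;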

&lt;p&gt;&lt;strong&gt;Snapshot Isolation&lt;/strong&gt; - Each transaction sees a consistent snapshot of the database as of its start time (tau_start). All reads within a transaction see the same data, regardless of concurrent commits by other transactions. Equivalent to PostgreSQL's REPEATABLE READ.&lt;/p&gt;
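&lt;p&gt;A toy model makes the snapshot rule concrete: a reader pinned to its start timestamp keeps seeing the same versions even while later commits land. This is purely illustrative and says nothing about DSQL's internal implementation.&lt;/p&gt;

```python
# Toy MVCC model: each read returns the newest version committed at or
# before the transaction's start timestamp (tau_start). Illustrative only.
import bisect

class VersionedStore:
    def __init__(self):
        self.versions = {}   # key: ([commit_ts, ...], [value, ...]), sorted
        self.clock = 0

    def commit(self, key, value):
        self.clock += 1
        ts_list, val_list = self.versions.setdefault(key, ([], []))
        ts_list.append(self.clock)
        val_list.append(value)
        return self.clock

    def read(self, key, tau_start):
        # Newest version committed at or before tau_start, else None.
        ts_list, val_list = self.versions.get(key, ([], []))
        i = bisect.bisect_right(ts_list, tau_start)
        return val_list[i - 1] if i else None

store = VersionedStore()
store.commit("balance", 100)
tau_start = store.clock            # transaction T begins here
store.commit("balance", 250)       # a concurrent commit after T started
seen_by_t = store.read("balance", tau_start)   # T still sees 100
latest = store.read("balance", store.clock)    # a new transaction sees 250
```

&lt;p&gt;Every read inside T uses the same &lt;code&gt;tau_start&lt;/code&gt;, so T's view never changes mid-transaction, exactly the REPEATABLE READ behavior described above.&lt;/p&gt;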

&lt;p&gt;&lt;strong&gt;IAM Authentication&lt;/strong&gt; - No database passwords. Period. Applications generate tokens using &lt;code&gt;generate_db_connect_auth_token&lt;/code&gt; (for runtime DML) or &lt;code&gt;generate_db_connect_admin_auth_token&lt;/code&gt; (for schema migrations only). Integrates with IAM roles, so your ECS tasks and Lambda functions authenticate using their execution role. Tokens default to 15 minutes but can be configured up to one week using the &lt;code&gt;token-duration-secs&lt;/code&gt; parameter in the connectors and CLI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Asynchronous Indexes&lt;/strong&gt; - DSQL requires &lt;code&gt;CREATE INDEX ASYNC&lt;/code&gt; (synchronous &lt;code&gt;CREATE INDEX&lt;/code&gt; is not supported). The index builds asynchronously while transactions continue. You can monitor build progress through system catalog queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single DDL Per Transaction&lt;/strong&gt; - Each &lt;code&gt;CREATE TABLE&lt;/code&gt;, &lt;code&gt;ALTER TABLE&lt;/code&gt;, or &lt;code&gt;CREATE INDEX&lt;/code&gt; statement needs its own transaction with an explicit commit before the next DDL statement.&lt;/p&gt;
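&lt;p&gt;This rule is easy to trip over when porting migration scripts. Here is a minimal sketch of the pattern using Python's DB-API, with sqlite3 standing in for a DSQL connection; against a real cluster you would use a PostgreSQL driver and &lt;code&gt;CREATE INDEX ASYNC&lt;/code&gt; instead of &lt;code&gt;CREATE INDEX&lt;/code&gt;.&lt;/p&gt;

```python
# The "one DDL statement per transaction" pattern, sketched with the
# DB-API. sqlite3 stands in for a DSQL connection here; on DSQL the
# index statement would be CREATE INDEX ASYNC.
import sqlite3

def run_ddl_each_in_own_txn(conn, statements):
    # DSQL rejects multiple DDL statements in one transaction, so commit
    # explicitly after every statement before issuing the next one.
    for ddl in statements:
        conn.execute(ddl)
        conn.commit()

conn = sqlite3.connect(":memory:")
run_ddl_each_in_own_txn(conn, [
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)",
    "CREATE INDEX idx_orders_customer ON orders (customer)",
])
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
```

&lt;p&gt;Batching several DDL statements into one transaction is the single most common migration-script failure on DSQL, so migration tooling (Flyway, Prisma) needs each statement in its own step.&lt;/p&gt;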




&lt;h2&gt;
  
  
  Feature Timeline: From Preview to Production
&lt;/h2&gt;

&lt;p&gt;DSQL has shipped features at a steady pace since launch. Here's what has been added:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;February 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;DSQL Playground&lt;/strong&gt; (browser-based, no AWS account needed), sequences and identity columns, Go/Ruby/Python (asyncpg)/Node.js (WebSocket) connectors, numeric index support, AI steering (Kiro Powers, Claude/Gemini/Codex Skills), DBeaver plugin, SQLTools VS Code driver, Tortoise ORM adapter, Flyway dialect, Prisma CLI tools, expanded to 14 regions (added Canada, Sydney, Melbourne)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;December 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cluster lifecycle management, enhanced PrivateLink (Direct Connect + VPC peering), PostgreSQL migration guide&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;November 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Query Editor in console, JupyterLab integration, Python and Node.js connectors, storage quota increased to 256 TiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;October 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Resource-based policies for fine-grained cluster access control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;September 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JDBC connector for Java applications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;August 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Fault Injection Service (FIS) integration for chaos testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;May 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;General Availability&lt;/strong&gt; - CloudWatch monitoring, AWS Backup, KMS CMK encryption, CloudFormation support, PrivateLink, Views&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;December 2024&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Preview launch at re:Invent (3 US regions)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Region Availability (April 2026)
&lt;/h3&gt;

&lt;p&gt;DSQL is now available in &lt;strong&gt;14 regions&lt;/strong&gt; across 4 continents:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Continent&lt;/th&gt;
&lt;th&gt;Regions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;North America&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;us-east-1 (Virginia), us-east-2 (Ohio), us-west-2 (Oregon), ca-central-1 (Montreal), ca-west-1 (Calgary)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Europe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;eu-central-1 (Frankfurt), eu-west-1 (Ireland), eu-west-2 (London), eu-west-3 (Paris)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Asia Pacific&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ap-northeast-1 (Tokyo), ap-northeast-2 (Seoul), ap-northeast-3 (Osaka), ap-southeast-2 (Sydney), ap-southeast-4 (Melbourne)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Multi-Region Cluster Sets
&lt;/h3&gt;

&lt;p&gt;Multi-Region clusters must stay within one geographic set:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;US&lt;/strong&gt;: us-east-1, us-east-2, us-west-2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Europe&lt;/strong&gt;: eu-central-1, eu-west-1, eu-west-2, eu-west-3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asia Pacific&lt;/strong&gt;: ap-northeast-1, ap-northeast-2, ap-northeast-3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Canada (ca-central-1, ca-west-1), Sydney (ap-southeast-2), and Melbourne (ap-southeast-4) are available as single-region clusters only and are not part of any multi-Region set. This is a common gotcha for customers in those regions.&lt;/p&gt;

&lt;p&gt;Cross-continent multi-Region clusters are not supported. For global data sync across continents, DynamoDB Global Tables remain the go-to option.&lt;/p&gt;




&lt;h2&gt;
  
  
  DSQL vs the Alternatives
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Aurora DSQL&lt;/th&gt;
&lt;th&gt;Aurora PostgreSQL Serverless v2&lt;/th&gt;
&lt;th&gt;DynamoDB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PostgreSQL SQL&lt;/td&gt;
&lt;td&gt;PostgreSQL SQL&lt;/td&gt;
&lt;td&gt;PartiQL / NoSQL API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Provisioning time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Under 60 seconds&lt;/td&gt;
&lt;td&gt;10-15 minutes&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scales to zero&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (no DPU charges)&lt;/td&gt;
&lt;td&gt;Yes (0 ACU with auto-pause, ~15s cold start)&lt;/td&gt;
&lt;td&gt;Yes (on-demand mode)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-Region&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Active-active, strong consistency&lt;/td&gt;
&lt;td&gt;Read replicas, eventual&lt;/td&gt;
&lt;td&gt;Global Tables, eventual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Availability SLA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;99.99% / 99.999% multi-Region&lt;/td&gt;
&lt;td&gt;99.99%&lt;/td&gt;
&lt;td&gt;99.99% / 99.999% global&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IAM only (no passwords)&lt;/td&gt;
&lt;td&gt;IAM or passwords&lt;/td&gt;
&lt;td&gt;IAM or passwords&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Foreign keys&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not yet&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (NoSQL)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stored procedures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not yet&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;256 TiB&lt;/td&gt;
&lt;td&gt;128 TiB&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transaction limits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3,000 rows, 10 MiB, 5 min&lt;/td&gt;
&lt;td&gt;Practical limits (memory, storage, lock timeouts)&lt;/td&gt;
&lt;td&gt;100 items, 4 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pricing model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per DPU ($8/million)&lt;/td&gt;
&lt;td&gt;Per ACU-hour ($0.12+)&lt;/td&gt;
&lt;td&gt;Per RRU/WRU or provisioned&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Use What
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Choose DSQL when&lt;/strong&gt; you need SQL with serverless economics, multi-Region strong consistency, or you're building new applications that benefit from zero infrastructure management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Aurora PostgreSQL when&lt;/strong&gt; you need foreign keys, stored procedures, triggers, pgvector for AI embeddings, or you're running an existing PostgreSQL application that uses unsupported features. Aurora Serverless v2 now scales to 0 ACUs with auto-pause (since November 2024), so it also offers scale-to-zero economics - with the tradeoff of a ~15-second cold start on resume.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose DynamoDB when&lt;/strong&gt; your data model fits key-value or document patterns naturally, you need sub-millisecond latency, cross-continent global replication, or unlimited throughput scaling.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setting Up DSQL with Terraform
&lt;/h2&gt;

&lt;p&gt;All the Terraform code below uses Terraform &amp;gt;= 1.11 and the AWS provider ~&amp;gt; 6.0. The &lt;code&gt;terraform-aws-modules/rds-aurora&lt;/code&gt; DSQL submodule requires Terraform &amp;gt;= 1.11 and provider &amp;gt;= 6.18. The complete examples are in the &lt;a href="https://github.com/RDarrylR/aurora-dsql-2026" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Single-Region Cluster
&lt;/h3&gt;

&lt;p&gt;This is the simplest setup. One resource, 60 seconds to provision, automatically distributed across 3 AZs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;required_version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;gt;= 1.11"&lt;/span&gt;

  &lt;span class="nx"&gt;required_providers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/aws"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 6.0"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_region"&lt;/span&gt; &lt;span class="s2"&gt;"current"&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_dsql_cluster"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;deletion_protection_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

  &lt;span class="c1"&gt;# For production, enable deletion protection and add a KMS CMK:&lt;/span&gt;
  &lt;span class="c1"&gt;# deletion_protection_enabled = true&lt;/span&gt;
  &lt;span class="c1"&gt;# kms_encryption_key          = aws_kms_key.dsql.arn&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-app-dsql"&lt;/span&gt;
    &lt;span class="nx"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dev"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# DSQL has no "endpoint" attribute - construct it from the identifier&lt;/span&gt;
&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"dsql_endpoint"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${aws_dsql_cluster.main.identifier}.dsql.${data.aws_region.current.id}.on.aws"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"dsql_arn"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_dsql_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No instance class, no storage allocation, no replica configuration. One resource gives you a PostgreSQL-compatible database with 99.99% availability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Region Cluster with Terraform Module
&lt;/h3&gt;

&lt;p&gt;For production workloads requiring 99.999% availability, use multi-Region clusters. The official &lt;code&gt;terraform-aws-modules/rds-aurora&lt;/code&gt; module includes a DSQL submodule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;alias&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"secondary"&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-2"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"dsql_primary"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/rds-aurora/aws//modules/dsql"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 10.0"&lt;/span&gt;

  &lt;span class="nx"&gt;deletion_protection_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;witness_region&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-west-2"&lt;/span&gt;
  &lt;span class="nx"&gt;create_cluster_peering&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;clusters&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsql_secondary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-app-dsql-primary"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"dsql_secondary"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/rds-aurora/aws//modules/dsql"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 10.0"&lt;/span&gt;

  &lt;span class="nx"&gt;providers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;secondary&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;deletion_protection_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;witness_region&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-west-2"&lt;/span&gt;
  &lt;span class="nx"&gt;create_cluster_peering&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;clusters&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsql_primary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-app-dsql-secondary"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The module handles cluster peering automatically. One &lt;code&gt;terraform apply&lt;/code&gt; creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Primary cluster in us-east-1 with full read/write endpoint&lt;/li&gt;
&lt;li&gt;Secondary cluster in us-east-2 with full read/write endpoint&lt;/li&gt;
&lt;li&gt;Witness region in us-west-2 that stores journal data for quorum only (no endpoint, no user access)&lt;/li&gt;
&lt;li&gt;Bidirectional peering with synchronous replication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both endpoints present a single logical database. Your application can read and write through either endpoint and gets strong consistency across both regions, with zero replication lag on commit.&lt;/p&gt;

&lt;p&gt;If you prefer using the native &lt;code&gt;aws_dsql_cluster&lt;/code&gt; resource directly instead of the module, the multi-Region interface uses &lt;code&gt;multi_region_properties&lt;/code&gt; with &lt;code&gt;witness_region&lt;/code&gt; - see the commented-out Option B in the &lt;a href="https://github.com/RDarrylR/aurora-dsql-2026/blob/main/terraform/dsql-multi-region.tf" rel="noopener noreferrer"&gt;dsql-multi-region.tf&lt;/a&gt; example. Also note that AWS provider 6.x introduced per-resource &lt;code&gt;region&lt;/code&gt; attributes, which can eliminate the need for provider aliases in some configurations.&lt;/p&gt;

&lt;h3&gt;
  
  
  IAM Authentication Policy
&lt;/h3&gt;

&lt;p&gt;DSQL uses two IAM permission levels. Use the right one for each role:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;dsql:DbConnect&lt;/code&gt; - Generates tokens for connecting with custom database roles. &lt;strong&gt;Use this for application runtime.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dsql:DbConnectAdmin&lt;/code&gt; - Generates tokens for connecting as the &lt;code&gt;admin&lt;/code&gt; database user (full DDL + DML). &lt;strong&gt;Use this only for schema migrations and admin tasks.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that the DDL/DML restriction is enforced at the database role level, not the IAM layer. &lt;code&gt;DbConnect&lt;/code&gt; generates a token that can only authenticate as a custom role (not &lt;code&gt;admin&lt;/code&gt;), and custom roles only have the permissions you grant them. &lt;code&gt;DbConnectAdmin&lt;/code&gt; generates a token that authenticates as &lt;code&gt;admin&lt;/code&gt;, which has full privileges. AWS's security best practices are clear: don't use the admin role for everyday operations. Create separate IAM roles and custom database roles for application access.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Application runtime policy - DML only (least privilege)&lt;/span&gt;
&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"dsql_app_runtime"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dsql:DbConnect"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;resources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_dsql_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Admin/migration policy - DDL + DML (for CI/CD pipelines, not app runtime)&lt;/span&gt;
&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"dsql_admin"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dsql:DbConnectAdmin"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;resources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_dsql_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ECS task role for application runtime - uses DbConnect, NOT DbConnectAdmin&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"app_task"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-app-task-role"&lt;/span&gt;

  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ecs-tasks.amazonaws.com"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"app_dsql"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dsql-runtime-access"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;app_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsql_app_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Separate role for schema migrations (CI/CD pipeline, not the running app)&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"migration_role"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-app-migration-role"&lt;/span&gt;

  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ecs-tasks.amazonaws.com"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"migration_dsql"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dsql-admin-access"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;migration_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsql_admin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Always scope DSQL permissions to the specific cluster ARN. No wildcard resources. Your running application should never have &lt;code&gt;DbConnectAdmin&lt;/code&gt; - reserve that for migration tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Custom Database Role (Least Privilege at the Database Layer)
&lt;/h3&gt;

&lt;p&gt;IAM controls which token type you can generate, but you should also avoid connecting as &lt;code&gt;admin&lt;/code&gt; for everyday operations. Create a custom database role and map it to an IAM identity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Connect as admin (one-time setup via DbConnectAdmin)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;app_role&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;LOGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_role&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Map the IAM role ARN to the custom database role (DSQL-specific syntax)&lt;/span&gt;
&lt;span class="n"&gt;AWS&lt;/span&gt; &lt;span class="n"&gt;IAM&lt;/span&gt; &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="n"&gt;app_role&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="s1"&gt;'arn:aws:iam::123456789012:role/my-app-task-role'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- To revoke later:&lt;/span&gt;
&lt;span class="c1"&gt;-- AWS IAM REVOKE app_role FROM 'arn:aws:iam::123456789012:role/my-app-task-role';&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then connect as the custom role in your application code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cluster_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app_role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Custom role, not admin
&lt;/span&gt;    &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Token from generate_db_connect_auth_token
&lt;/span&gt;    &lt;span class="n"&gt;sslmode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;require&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This completes the least-privilege story at both layers: IAM controls token generation (&lt;code&gt;DbConnect&lt;/code&gt; vs &lt;code&gt;DbConnectAdmin&lt;/code&gt;), and the database role controls what SQL the connection can execute.&lt;/p&gt;
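&lt;p&gt;As a sketch of how the two layers meet in application code, a small hypothetical helper can pick the matching token method for the database user you intend to connect as (the &lt;code&gt;generate_token&lt;/code&gt; wrapper and its dispatch logic are illustrative; the two &lt;code&gt;generate_db_connect_*&lt;/code&gt; calls are the boto3 DSQL client methods used elsewhere in this article):&lt;br&gt;
&lt;/p&gt;

```python
# Hypothetical helper: the token method must match the database user.
# Connecting as "admin" requires an admin token; a custom role such as
# app_role requires a regular DbConnect token - mixing them up fails auth.
def generate_token(client, cluster_endpoint, region, user):
    if user == "admin":
        # Requires dsql:DbConnectAdmin on the cluster ARN
        return client.generate_db_connect_admin_auth_token(cluster_endpoint, region)
    # Requires only dsql:DbConnect on the cluster ARN
    return client.generate_db_connect_auth_token(cluster_endpoint, region)
```

&lt;p&gt;In production, the application's task role should only ever reach the non-admin branch - its IAM policy doesn't allow &lt;code&gt;DbConnectAdmin&lt;/code&gt; at all.&lt;/p&gt;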

&lt;h3&gt;
  
  
  PrivateLink (Production)
&lt;/h3&gt;

&lt;p&gt;For production workloads, keep database traffic off the public internet using VPC endpoints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"dsql"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_dsql_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_endpoint_service_name&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_endpoint_type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Interface"&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private&lt;/span&gt;&lt;span class="p"&gt;[*].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;security_group_ids&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dsql_endpoint&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;private_dns_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"dsql_endpoint"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name_prefix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-dsql-endpoint-"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5432&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5432&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;security_groups&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;egress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"-1"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow all outbound (required for VPC endpoint communication)"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;private_dns_enabled = true&lt;/code&gt;, your application connects using the same cluster endpoint - no code changes needed. For connections from on-premises via Direct Connect without private DNS, use the &lt;code&gt;amzn-cluster-id&lt;/code&gt; connection parameter.&lt;/p&gt;




&lt;h2&gt;
  
  
  Application Code: Python
&lt;/h2&gt;

&lt;p&gt;The examples below use Python 3.13+ with psycopg2 2.9.11 and boto3. The full example is in &lt;a href="https://github.com/RDarrylR/aurora-dsql-2026/blob/main/python/dsql_connection.py" rel="noopener noreferrer"&gt;dsql_connection.py&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection with IAM Auth
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;psycopg2.extras&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RealDictCursor&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dsql&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;cluster_endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cluster_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.dsql.us-east-1.on.aws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Token method must match the user:
# - user="admin" -&amp;gt; generate_db_connect_admin_auth_token (DDL + DML)
# - custom role  -&amp;gt; generate_db_connect_auth_token (DML only)
&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_db_connect_admin_auth_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cluster_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# For production, use a custom database role - see "Custom Database Role" section
&lt;/span&gt;    &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sslmode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;require&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cursor_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RealDictCursor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the official connector (&lt;code&gt;pip install aurora-dsql-python-connector&lt;/code&gt;, v0.2.6+) which handles token refresh automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aurora_dsql_python_connector&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;connect&lt;/span&gt;

&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cluster_endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;abc123.dsql.us-east-1.on.aws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The OCC Retry Pattern
&lt;/h3&gt;

&lt;p&gt;This is the most important pattern for DSQL applications. Since DSQL uses Optimistic Concurrency Control instead of locks, write transactions can fail at COMMIT when concurrent modifications conflict:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2.errors&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;with_occ_retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base_delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retry wrapper for OCC conflicts (SQLSTATE 40001).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SerializationFailure&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt;
            &lt;span class="n"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_delay&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_do_insert&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            INSERT INTO orders (customer_email, items, total_amount)
            VALUES (%s, %s, %s)
            RETURNING *
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;with_occ_retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_do_insert&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key points about OCC:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read-only transactions never conflict - they don't need retry logic&lt;/li&gt;
&lt;li&gt;OCC conflicts are SQLSTATE 40001 (serialization_failure)&lt;/li&gt;
&lt;li&gt;Use exponential backoff to avoid retry storms&lt;/li&gt;
&lt;li&gt;Design transactions to be small and fast to minimize conflict windows&lt;/li&gt;
&lt;li&gt;Avoid hot-spot writes (e.g., incrementing a single counter row from many threads)&lt;/li&gt;
&lt;/ul&gt;
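
&lt;p&gt;The hot-spot point is worth a concrete sketch. Instead of incrementing one row from many writers, spread increments across a handful of shard rows and sum them on read. This is a generic pattern rather than anything DSQL-specific - the &lt;code&gt;counters&lt;/code&gt; table and helper names below are hypothetical:&lt;/p&gt;

```python
import random

NUM_SHARDS = 16  # more shards = fewer concurrent writers touching the same row

def increment_counter(cur, name):
    """Bump one randomly chosen shard row (assumes shard rows are pre-created)."""
    shard = random.randrange(NUM_SHARDS)
    cur.execute(
        "UPDATE counters SET value = value + 1 WHERE name = %s AND shard = %s",
        (name, shard),
    )

def read_counter(cur, name):
    """Read-only: sum the shards. No OCC retry wrapper needed here."""
    cur.execute(
        "SELECT COALESCE(SUM(value), 0) FROM counters WHERE name = %s",
        (name,),
    )
    return cur.fetchone()[0]
```

&lt;p&gt;More shards means fewer write-write collisions on the same row, at the cost of a slightly heavier read.&lt;/p&gt;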

&lt;h3&gt;
  
  
  Schema Setup with DDL Limits
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_tables&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_connection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# One DDL per transaction - commit before next DDL
&lt;/span&gt;    &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        CREATE TABLE IF NOT EXISTS products (
            id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
            name VARCHAR(200) NOT NULL,
            price NUMERIC(10, 2) NOT NULL,
            category VARCHAR(50),
            created_at TIMESTAMPTZ DEFAULT now()
        )
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Must commit before next DDL
&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        CREATE TABLE IF NOT EXISTS orders (
            id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
            customer_email VARCHAR(255) NOT NULL,
            items TEXT NOT NULL,
            total_amount NUMERIC(10, 2) NOT NULL,
            status VARCHAR(20) DEFAULT &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,
            created_at TIMESTAMPTZ DEFAULT now()
        )
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Separate transaction for each DDL
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Sequences and Identity Columns (New - February 2026)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Using identity columns for auto-incrementing IDs
&lt;/span&gt;&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    CREATE TABLE IF NOT EXISTS audit_log (
        id BIGINT GENERATED ALWAYS AS IDENTITY (CACHE 65536) PRIMARY KEY,
        event_type VARCHAR(50) NOT NULL,
        payload TEXT,
        created_at TIMESTAMPTZ DEFAULT now()
    )
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Or use sequences directly
&lt;/span&gt;&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CREATE SEQUENCE IF NOT EXISTS invoice_seq START 1000 CACHE 65536&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT nextval(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;invoice_seq&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;next_invoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nextval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Application Code: Node.js
&lt;/h2&gt;

&lt;p&gt;The examples below use Node.js 24.x LTS with &lt;code&gt;@aws-sdk/dsql-signer&lt;/code&gt; and &lt;code&gt;pg&lt;/code&gt; 8.20+. The full example is in &lt;a href="https://github.com/RDarrylR/aurora-dsql-2026/blob/main/nodejs/dsql-connection.mjs" rel="noopener noreferrer"&gt;dsql-connection.mjs&lt;/a&gt;. You can also use the official connector &lt;code&gt;@aws/aurora-dsql-node-postgres-connector&lt;/code&gt; (v0.1.8+) which wraps &lt;code&gt;pg&lt;/code&gt; with automatic IAM auth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection with AWS SDK Signer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DsqlSigner&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/dsql-signer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;pg&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;signer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DsqlSigner&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;abc123.dsql.us-east-1.on.aws&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;us-east-1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Token method matches the user:&lt;/span&gt;
&lt;span class="c1"&gt;// - "admin" -&amp;gt; getDbConnectAdminAuthToken (DDL + DML)&lt;/span&gt;
&lt;span class="c1"&gt;// - custom role -&amp;gt; getDbConnectAuthToken (DML only)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;signer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDbConnectAdminAuthToken&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;pg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;abc123.dsql.us-east-1.on.aws&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5432&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;postgres&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;admin&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  OCC Retry in Node.js
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;withOccRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;txnFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;BEGIN&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;txnFn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;COMMIT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ROLLBACK&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;40001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;withOccRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`INSERT INTO orders (customer_email, items, total_amount)
     VALUES ($1, $2, $3) RETURNING *`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Multi-Region Application Architecture
&lt;/h2&gt;

&lt;p&gt;For applications that need 99.999% availability and low-latency reads from multiple regions, deploy your application stack in each DSQL region with Route 53 latency-based routing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdev6rp5g2tcuq34ks6m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdev6rp5g2tcuq34ks6m.png" alt="Aurora DSQL Multi-Region Application Architecture" width="800" height="589"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Route 53 latency-based routing&lt;/strong&gt; to direct users to the nearest region&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudFront&lt;/strong&gt; for static asset caching and edge termination&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ECS Fargate&lt;/strong&gt; running application containers in each region&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aurora DSQL&lt;/strong&gt; with active-active clusters in both regions and a witness for quorum&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both DSQL endpoints present a single logical database. East Coast users connect to us-east-1 and West Coast users to us-east-2 (or us-west-2 where it is available as a full endpoint), both reading and writing the same strongly consistent data. The witness region in us-west-2 stores only encrypted Journal entries for quorum and exposes no user endpoint.&lt;/p&gt;

&lt;p&gt;This is conceptually similar to DynamoDB Global Tables, but with full PostgreSQL SQL support and strong consistency instead of eventual consistency.&lt;/p&gt;




&lt;h2&gt;
  
  
  Database Limits to Know
&lt;/h2&gt;

&lt;p&gt;DSQL has intentional limits that prevent tail latency and keep the system predictable. These aren't bugs - they're design choices:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Limit&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Why It Matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rows per transaction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3,000&lt;/td&gt;
&lt;td&gt;Keeps OCC conflict windows small. Batch large inserts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transaction size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10 MiB&lt;/td&gt;
&lt;td&gt;Prevents oversized commits from impacting the Journal.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transaction duration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5 minutes&lt;/td&gt;
&lt;td&gt;Forces short, focused transactions. No long-running locks (because there are no locks).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Connection duration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;60 minutes&lt;/td&gt;
&lt;td&gt;Aligns with IAM token lifecycle. Reconnect periodically.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max connections&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10,000 per cluster&lt;/td&gt;
&lt;td&gt;Configurable via Service Quotas.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Connection rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100/second (1,000 burst)&lt;/td&gt;
&lt;td&gt;Not configurable. Critical for Lambda cold-start scenarios.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tables per database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;One database per cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schemas per database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Not configurable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Indexes per table&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;Including primary key.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max row size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2 MiB&lt;/td&gt;
&lt;td&gt;Individual column max is 1 MiB.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;256 TiB (with quota increase)&lt;/td&gt;
&lt;td&gt;Default is 10 TiB.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sequences&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5,000 per database&lt;/td&gt;
&lt;td&gt;Added February 2026.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Views&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5,000 per database&lt;/td&gt;
&lt;td&gt;Added at GA, May 2025.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
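
&lt;p&gt;The row limit is the one most likely to surprise a bulk loader. A minimal chunking sketch - the chunk size and table are illustrative, and each chunk commits as its own transaction to stay well under 3,000 rows:&lt;/p&gt;

```python
CHUNK_SIZE = 500  # well under the 3,000-row-per-transaction limit

def chunked(rows, size=CHUNK_SIZE):
    """Yield successive slices of at most `size` rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def bulk_insert_products(conn, rows):
    """Insert rows as multiple small transactions, one commit per chunk."""
    cur = conn.cursor()
    for chunk in chunked(rows):
        cur.executemany(
            "INSERT INTO products (name, price, category) VALUES (%s, %s, %s)",
            chunk,
        )
        conn.commit()  # each chunk is its own transaction
```

&lt;p&gt;The tradeoff is that a failure mid-load leaves earlier chunks committed, so bulk loads should be written to be resumable or idempotent.&lt;/p&gt;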




&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;DSQL uses a DPU (Distributed Processing Unit) billing model that covers all database activity - compute, I/O, and transaction processing - in a single metric.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DPU rate&lt;/strong&gt;: $8 per million DPUs (us-east-2)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: $0.33 per GB-month (pay for one logical copy per region)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Region writes&lt;/strong&gt;: Additional DPU charges equal to originating write DPUs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free tier&lt;/strong&gt;: 100,000 DPUs + 1 GB storage per month (roughly 700K TPC-C equivalent transactions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scales to zero&lt;/strong&gt;: No DPU charges when idle&lt;/li&gt;
&lt;/ul&gt;
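
&lt;p&gt;The arithmetic is simple enough to sketch. Using the single-region us-east-2 rates above - and keeping in mind that actual DPU consumption depends heavily on query shape, so this is an illustration only:&lt;/p&gt;

```python
DPU_RATE_PER_MILLION = 8.00   # us-east-2
STORAGE_PER_GB_MONTH = 0.33
FREE_DPUS = 100_000
FREE_STORAGE_GB = 1

def monthly_cost(dpus_used, storage_gb):
    """Estimate a single-region monthly bill after the free tier."""
    billable_dpus = max(dpus_used - FREE_DPUS, 0)
    billable_gb = max(storage_gb - FREE_STORAGE_GB, 0)
    return round(billable_dpus / 1_000_000 * DPU_RATE_PER_MILLION
                 + billable_gb * STORAGE_PER_GB_MONTH, 2)

# 5M DPUs and 10 GB: (4.9M / 1M) * $8 + 9 GB * $0.33
print(monthly_cost(5_000_000, 10))  # 42.17
```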

&lt;h3&gt;
  
  
  Cost Comparison for a Modest Workload
&lt;/h3&gt;

&lt;p&gt;For an application processing 1,000 transactions per hour, 10 GB storage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Aurora DSQL&lt;/strong&gt; (single region)&lt;/td&gt;
&lt;td&gt;~$50-80/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Aurora DSQL&lt;/strong&gt; (idle dev environment)&lt;/td&gt;
&lt;td&gt;~$3/month (storage only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Aurora PostgreSQL Serverless v2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$90-120/month active, or storage-only when paused at 0 ACU (~15s cold start on resume)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;RDS PostgreSQL&lt;/strong&gt; (db.t3.medium)&lt;/td&gt;
&lt;td&gt;~$60-80/month (runs 24/7)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;DynamoDB&lt;/strong&gt; (on-demand, equivalent)&lt;/td&gt;
&lt;td&gt;~$30-50/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both DSQL and Aurora Serverless v2 can now scale to zero. The difference: DSQL resumes instantly with no cold start, while Aurora Serverless v2 takes approximately 15 seconds to resume from a paused state. For development environments with intermittent traffic, both cost pennies when idle. For production workloads that need instant response times, DSQL's zero cold start matters. DSQL is also eligible for Database Savings Plans for predictable workloads.&lt;/p&gt;

&lt;p&gt;You can monitor DPU breakdown in CloudWatch under the &lt;code&gt;AWS/AuroraDSQL&lt;/code&gt; namespace: &lt;code&gt;ComputeDPU&lt;/code&gt;, &lt;code&gt;ReadDPU&lt;/code&gt;, &lt;code&gt;WriteDPU&lt;/code&gt;, and &lt;code&gt;MultiRegionWriteDPU&lt;/code&gt;.&lt;/p&gt;
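
&lt;p&gt;As a sketch, pulling a day of those metrics with boto3 might look like the following. The &lt;code&gt;ClusterId&lt;/code&gt; dimension name is an assumption on my part - verify the actual dimension in the CloudWatch console for your cluster:&lt;/p&gt;

```python
import datetime

def dpu_request(cluster_id, metric, hours=24):
    """Build a get_metric_statistics request for one DSQL DPU metric.

    Note: the "ClusterId" dimension name is an assumption - check the
    metric's dimensions in the CloudWatch console before relying on it.
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "Namespace": "AWS/AuroraDSQL",
        "MetricName": metric,  # ComputeDPU, ReadDPU, WriteDPU, MultiRegionWriteDPU
        "Dimensions": [{"Name": "ClusterId", "Value": cluster_id}],
        "StartTime": now - datetime.timedelta(hours=hours),
        "EndTime": now,
        "Period": 3600,  # one datapoint per hour
        "Statistics": ["Sum"],
    }

def fetch_dpu_usage(cluster_id, metric="ComputeDPU"):
    import boto3  # imported here so the request builder works without boto3
    cw = boto3.client("cloudwatch")
    resp = cw.get_metric_statistics(**dpu_request(cluster_id, metric))
    return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])
```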




&lt;h2&gt;
  
  
  Developer Experience and Tooling
&lt;/h2&gt;

&lt;p&gt;DSQL's tooling ecosystem has grown quickly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connectors&lt;/strong&gt; (official, handle IAM auth automatically):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python: &lt;code&gt;aurora-dsql-python-connector&lt;/code&gt; v0.2.6 - wraps psycopg, psycopg2, asyncpg&lt;/li&gt;
&lt;li&gt;Node.js: &lt;code&gt;@aws/aurora-dsql-node-postgres-connector&lt;/code&gt; v0.1.8 (pg) and &lt;code&gt;@aws/aurora-dsql-postgresjs-connector&lt;/code&gt; v0.2.1 (Postgres.js)&lt;/li&gt;
&lt;li&gt;Java: JDBC connector (PgJDBC wrapper)&lt;/li&gt;
&lt;li&gt;Go: pgx v5.8.0 wrapper (February 2026)&lt;/li&gt;
&lt;li&gt;Ruby: &lt;code&gt;aurora-dsql-ruby-pg-connector&lt;/code&gt; (February 2026)&lt;/li&gt;
&lt;/ul&gt;
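
&lt;p&gt;If you do manage tokens yourself instead of using a connector, the refresh logic amounts to a small cache that replaces the token well before the default 15-minute expiry. The fetcher is injected here to keep the sketch generic - in practice it would wrap &lt;code&gt;DsqlSigner&lt;/code&gt; or the boto3 &lt;code&gt;dsql&lt;/code&gt; client:&lt;/p&gt;

```python
import time

class TokenCache:
    """Cache an IAM auth token and refresh it before the 15-minute expiry.

    `fetch_token` is any zero-argument callable returning a fresh token,
    e.g. a wrapper around DsqlSigner.
    """
    def __init__(self, fetch_token, refresh_after=600, clock=time.monotonic):
        self._fetch = fetch_token
        self._refresh_after = refresh_after  # refresh at 10 minutes, not 15
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._refresh_after:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token
```

&lt;p&gt;New pool connections then call &lt;code&gt;cache.get()&lt;/code&gt; for their password instead of baking a single token in at startup.&lt;/p&gt;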

&lt;p&gt;&lt;strong&gt;ORM and Migration Tooling&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tortoise ORM adapter (Python async ORM)&lt;/li&gt;
&lt;li&gt;Prisma CLI tools (Node.js ORM integration)&lt;/li&gt;
&lt;li&gt;Flyway dialect (database migration tooling)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;IDE Integrations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DBeaver plugin (Community and Pro editions)&lt;/li&gt;
&lt;li&gt;VS Code SQLTools driver&lt;/li&gt;
&lt;li&gt;JupyterLab and SageMaker AI integration&lt;/li&gt;
&lt;li&gt;AWS Console Query Editor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AI Steering&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aurora DSQL MCP Server for AI-assisted development&lt;/li&gt;
&lt;li&gt;Kiro Powers for Kiro IDE&lt;/li&gt;
&lt;li&gt;Skills for Claude Code, Cursor, Gemini, Codex&lt;/li&gt;
&lt;li&gt;Steering ensures AI assistants generate DSQL-compatible code (handling OCC retries, DDL limits, IAM auth)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform 1.11+ with AWS provider 6.18+ - native &lt;code&gt;aws_dsql_cluster&lt;/code&gt; resource&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;terraform-aws-modules/rds-aurora&lt;/code&gt; DSQL submodule for multi-Region&lt;/li&gt;
&lt;li&gt;CloudFormation support&lt;/li&gt;
&lt;li&gt;AWS Backup integration for automated backups&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Implement OCC retry logic on every write path.&lt;/strong&gt; Use exponential backoff with 3-5 retries. Read-only transactions don't need retries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep transactions small and fast.&lt;/strong&gt; The 3,000 row and 5-minute limits exist for good reason. Batch large operations into chunks of 500 rows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use UUID primary keys.&lt;/strong&gt; Random UUIDs distribute writes evenly across storage shards. Sequential IDs create hot spots that increase OCC conflicts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refresh IAM tokens proactively.&lt;/strong&gt; Tokens default to 15 minutes (configurable up to one week via &lt;code&gt;token-duration-secs&lt;/code&gt;). With the default, refresh at 10 minutes to avoid connection failures. The official connectors handle this automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use the official connectors for production SSL.&lt;/strong&gt; Raw psycopg2 with &lt;code&gt;sslmode="require"&lt;/code&gt; encrypts the connection but doesn't verify the server's identity. The official &lt;code&gt;aurora-dsql-python-connector&lt;/code&gt; and &lt;code&gt;@aws/aurora-dsql-node-postgres-connector&lt;/code&gt; handle full certificate verification automatically. For production, use the connectors rather than managing SSL configuration yourself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One DDL per transaction.&lt;/strong&gt; Always commit after each &lt;code&gt;CREATE TABLE&lt;/code&gt;, &lt;code&gt;ALTER TABLE&lt;/code&gt;, or &lt;code&gt;CREATE INDEX&lt;/code&gt;. This trips up many migration scripts that batch multiple DDL statements into one transaction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scope IAM policies to cluster ARNs.&lt;/strong&gt; Never use wildcard resources for DSQL permissions. Scope &lt;code&gt;dsql:DbConnect&lt;/code&gt; and &lt;code&gt;dsql:DbConnectAdmin&lt;/code&gt; to specific cluster ARNs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;EXPLAIN ANALYZE VERBOSE&lt;/code&gt; for query optimization.&lt;/strong&gt; Covering indexes can significantly reduce DPU costs by enabling index-only scans instead of full table scans.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement referential integrity in application code.&lt;/strong&gt; Without foreign keys, enforce relationships through application-level validation and carefully designed transaction boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test with AWS FIS.&lt;/strong&gt; Use Fault Injection Service to simulate region failures and validate your application's multi-Region behavior before you need it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor DPU breakdown in CloudWatch.&lt;/strong&gt; Watch &lt;code&gt;ComputeDPU&lt;/code&gt;, &lt;code&gt;ReadDPU&lt;/code&gt;, &lt;code&gt;WriteDPU&lt;/code&gt; separately. High &lt;code&gt;WriteDPU&lt;/code&gt; relative to reads may indicate OCC conflict storms.&lt;/li&gt;
&lt;/ol&gt;
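
&lt;p&gt;Point 9 is the one that changes day-to-day code the most. A minimal sketch of an application-level parent check (table names hypothetical), meant to run inside the OCC retry wrapper shown earlier:&lt;/p&gt;

```python
def add_order_item(cur, order_id, product_id, qty):
    """Insert a line item only if its parent order exists, in one transaction."""
    cur.execute("SELECT 1 FROM orders WHERE id = %s", (order_id,))
    if cur.fetchone() is None:
        raise ValueError(f"order {order_id} does not exist")
    cur.execute(
        "INSERT INTO order_items (order_id, product_id, qty) VALUES (%s, %s, %s)",
        (order_id, product_id, qty),
    )
```

&lt;p&gt;Note the check is only as strong as the isolation level: a concurrent delete of the parent can still race it. A common belt-and-braces approach is to also touch the parent row (for example, bump an &lt;code&gt;updated_at&lt;/code&gt; column) in the same transaction so a conflicting delete triggers an OCC abort instead of slipping through.&lt;/p&gt;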




&lt;h2&gt;
  
  
  What's Not There Yet - And Why
&lt;/h2&gt;

&lt;p&gt;This is the most contentious part of DSQL. If you're coming from standard RDS PostgreSQL or Aurora PostgreSQL, the list of missing features is significant. But these aren't oversights - the DSQL team made deliberate engineering tradeoffs to deliver strong consistency and predictable performance across a distributed, multi-Region architecture. Some of these features are fundamentally difficult in a disaggregated, OCC-based system. Others were simply deprioritized based on customer usage patterns.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Full Gap List vs Standard PostgreSQL
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;PostgreSQL Feature&lt;/th&gt;
&lt;th&gt;DSQL Status&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Foreign key constraints&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not yet - deprioritized based on customer usage patterns&lt;/td&gt;
&lt;td&gt;Cascading operations (e.g., deleting an order with 1,000 line items) create large implicit transactions that conflict with DSQL's 3,000-row transaction limit and OCC model. Many high-scale customers avoid foreign keys even in standard PostgreSQL for this reason. Marc Brooker has noted the team "haven't built foreign key constraints yet" because many customers take the same approach.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stored procedures (PL/pgSQL)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;Procedural code running inside the database conflicts with the serverless, stateless Query Processor model. The DSQL team sees this as an architectural direction, not a gap - business logic belongs in CI/CD-deployed application code, not inside the database.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Triggers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;Same reasoning as stored procedures. Database-side event processing creates hidden coupling and unpredictable transaction sizes. Use EventBridge, Lambda, or application-level event patterns instead.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TRUNCATE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;DELETE FROM table_name&lt;/code&gt; or &lt;code&gt;DROP TABLE&lt;/code&gt; + &lt;code&gt;CREATE TABLE&lt;/code&gt;. TRUNCATE's behavior is difficult to implement consistently across distributed storage replicas.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Temporary tables&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;The stateless, multi-tenant Query Processor model means there's no persistent session state. Use CTEs (&lt;code&gt;WITH&lt;/code&gt; clauses), subqueries, or regular tables with cleanup logic.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VACUUM / ANALYZE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not needed&lt;/td&gt;
&lt;td&gt;DSQL's MVCC garbage collection is automatic. The 5-minute transaction time limit enables simple, efficient cleanup without the complexity of PostgreSQL's vacuum process. No maintenance windows required.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;pgvector / vector support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not yet&lt;/td&gt;
&lt;td&gt;Vector similarity search is planned. In the meantime, AWS offers S3 Vectors and Aurora PostgreSQL with pgvector for embedding workloads.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSONB columns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not as a column type&lt;/td&gt;
&lt;td&gt;Store JSON in &lt;code&gt;TEXT&lt;/code&gt; columns and cast to &lt;code&gt;jsonb&lt;/code&gt; at query time (e.g., &lt;code&gt;my_column::jsonb-&amp;gt;&amp;gt;'key'&lt;/code&gt;). JSON functions and operators work at runtime, but you lose JSONB indexing (GIN indexes).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Full-text search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;No &lt;code&gt;tsvector&lt;/code&gt;/&lt;code&gt;tsquery&lt;/code&gt;. Use OpenSearch Serverless or Amazon Kendra for full-text search workloads.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multiple databases per cluster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 database (&lt;code&gt;postgres&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Use schemas for logical separation within a cluster, or create separate clusters. This simplifies distributed metadata management.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tablespaces&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;Storage is fully managed and auto-scaled. No manual storage allocation or placement decisions needed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Advisory locks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;OCC replaces all locking mechanisms. Advisory locks are a pessimistic concurrency pattern that doesn't fit the OCC model.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LISTEN / NOTIFY&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;The stateless Query Processor model has no persistent connections for push notifications. Use SQS, SNS, or EventBridge for pub/sub patterns.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Extensions (PostGIS, etc.)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;The managed, multi-tenant architecture doesn't support arbitrary extensions. Use purpose-built AWS services (Location Service for geo, OpenSearch for search).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom collations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;C&lt;/code&gt; collation only&lt;/td&gt;
&lt;td&gt;Consistent collation across distributed storage simplifies sort ordering and index behavior across regions. UTF-8 encoding is supported.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Configurable isolation levels&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;REPEATABLE READ&lt;/code&gt; only&lt;/td&gt;
&lt;td&gt;A single isolation level eliminates an entire class of consistency bugs. Strong snapshot isolation is the sweet spot between anomaly prevention and distributed performance.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Password authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IAM only&lt;/td&gt;
&lt;td&gt;No database passwords, ever. This is a security decision - IAM tokens integrate with CloudTrail, roles, and temporary credentials.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CREATE INDEX (synchronous)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;CREATE INDEX ASYNC&lt;/code&gt; only&lt;/td&gt;
&lt;td&gt;Asynchronous index creation prevents DDL from blocking running transactions. You monitor build progress through system catalog queries. This is actually an improvement for production workloads.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multiple DDL per transaction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 DDL per transaction&lt;/td&gt;
&lt;td&gt;Distributed schema changes are coordinated across all Query Processors and storage replicas. Limiting to one DDL per transaction keeps this coordination simple and predictable.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
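&lt;p&gt;The JSONB workaround from the table is worth seeing concretely. A minimal sketch of the store-as-&lt;code&gt;TEXT&lt;/code&gt;, cast-at-query-time pattern - the &lt;code&gt;events&lt;/code&gt; table and &lt;code&gt;payload&lt;/code&gt; column are hypothetical, and any psycopg2-style cursor will do:&lt;/p&gt;

```python
# Sketch of the JSON-in-TEXT pattern: store JSON in a TEXT column and
# cast to jsonb at query time. Table and column names are hypothetical.
import json

INSERT_SQL = "INSERT INTO events (id, payload) VALUES (%s, %s)"
# Cast the TEXT column to jsonb at read time; ->> extracts a key as text.
SELECT_SQL = "SELECT payload::jsonb->>'status' FROM events WHERE id = %s"

def insert_event(cur, event_id, payload_dict):
    # Serialize in the application; DSQL only ever sees TEXT.
    cur.execute(INSERT_SQL, (event_id, json.dumps(payload_dict)))

def event_status(cur, event_id):
    cur.execute(SELECT_SQL, (event_id,))
    row = cur.fetchone()
    return row[0] if row else None
```

&lt;p&gt;The tradeoff stays visible in the SQL: every read pays the &lt;code&gt;::jsonb&lt;/code&gt; cast, and without GIN indexes a filter on keys inside the document can't use an index.&lt;/p&gt;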

&lt;h3&gt;
  
  
  The Engineering Reasoning
&lt;/h3&gt;

&lt;p&gt;Marc Brooker addressed the feature gaps directly in his &lt;a href="https://brooker.co.za/blog/2025/11/02/thinking-dsql.html" rel="noopener noreferrer"&gt;Simplifying Architectures&lt;/a&gt; post. The key insight: DSQL's limits aren't arbitrary restrictions - they're what make the system's guarantees possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transaction limits (3,000 rows, 10 MiB, 5 minutes)&lt;/strong&gt; prevent head-of-line blocking. In a traditional database, one long-running transaction holding locks can stall every other transaction behind it. DSQL's OCC model doesn't have locks, but oversized commits would still create contention at the Adjudicator and Journal layers. The limits keep individual transactions fast and predictable, which keeps the entire system fast and predictable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No stored procedures or triggers&lt;/strong&gt; is the most opinionated choice. The DSQL team observed that customers are increasingly moving business logic out of the database and into application code deployed through CI/CD pipelines. Code in the database is hard to version, hard to test, and hard to debug. DSQL leans into this direction rather than supporting both models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No foreign keys yet&lt;/strong&gt; is the gap most customers notice first. The team has acknowledged the gap and may add support where it makes sense for the distributed architecture, but has deprioritized it based on customer feedback. The challenge is that cascading operations (CASCADE DELETE, CASCADE UPDATE) can create implicit transactions that exceed the row limits and generate unpredictable OCC conflict windows. Many high-scale PostgreSQL users already avoid foreign keys for exactly these reasons - but having the option matters.&lt;/p&gt;
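&lt;p&gt;In practice, application-level referential integrity under OCC means two things in code: check the parent inside the same transaction as the child insert, and retry on serialization failures. A minimal sketch - &lt;code&gt;OCCConflict&lt;/code&gt; is a stand-in for your driver's serialization-failure error (SQLSTATE 40001), and the table names are hypothetical:&lt;/p&gt;

```python
# Sketch: app-level referential check plus OCC retry. In real code the
# exception to catch is the driver's serialization-failure error
# (SQLSTATE 40001); OCCConflict is a stand-in so the sketch stands alone.
import random
import time

class OCCConflict(Exception):
    """Stand-in for a serialization failure (SQLSTATE 40001)."""

def with_occ_retry(fn, attempts=5, base_delay=0.05):
    """Run fn(); on an OCC conflict, back off with jitter and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except OCCConflict:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) * random.random())

def add_line_item(conn, order_id, sku):
    def txn():
        with conn.cursor() as cur:
            # Parent check and child insert share one REPEATABLE READ
            # snapshot, so a concurrent order delete surfaces as a
            # commit-time conflict rather than a dangling reference.
            cur.execute("SELECT 1 FROM orders WHERE id = %s", (order_id,))
            if cur.fetchone() is None:
                raise ValueError("order does not exist")
            cur.execute(
                "INSERT INTO line_items (order_id, sku) VALUES (%s, %s)",
                (order_id, sku),
            )
        conn.commit()
    return with_occ_retry(txn)
```

&lt;p&gt;The retry wrapper is the part that generalizes: any DSQL write path should assume the commit can fail with a conflict and be safe to re-run.&lt;/p&gt;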

&lt;h3&gt;
  
  
  What to Use Instead
&lt;/h3&gt;

&lt;p&gt;For applications that depend heavily on the missing features today, here's the practical guidance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need foreign keys, stored procedures, triggers?&lt;/strong&gt; Use Aurora PostgreSQL Serverless v2. Full PostgreSQL feature set with serverless scaling (though not to zero).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need vector search?&lt;/strong&gt; Aurora PostgreSQL with pgvector, S3 Vectors, or OpenSearch Serverless.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need full-text search?&lt;/strong&gt; OpenSearch Serverless or Amazon Kendra.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need pub/sub notifications?&lt;/strong&gt; EventBridge + Lambda instead of LISTEN/NOTIFY.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need geospatial queries?&lt;/strong&gt; Amazon Location Service instead of PostGIS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DSQL is best for new applications that can work within these constraints, or existing applications that were already avoiding the missing features. The team is actively expanding compatibility - views, sequences, identity columns, and the Go connector all shipped based on direct customer feedback. Foreign key constraints remain a known gap, and customer demand will likely influence when they're addressed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Things to Know
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prepared statement caching&lt;/strong&gt; - DSQL manages prepared statements cluster-wide. You may see more prepared statements per connection than expected. This is by design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IPv4 connections&lt;/strong&gt; - Some PostgreSQL clients attempt IPv6 first in dualstack mode. If you're on IPv4-only hosts, configure your client for IPv4 explicitly to avoid &lt;code&gt;NetworkUnreachable&lt;/code&gt; errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Permission propagation&lt;/strong&gt; - &lt;code&gt;GRANT&lt;/code&gt; and &lt;code&gt;REVOKE&lt;/code&gt; changes propagate to existing connections within the connection lifetime (up to one hour). For immediate effect, reconnect after permission changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Catalog cache&lt;/strong&gt; - After creating schemas or tables, refresh your connection (disconnect/reconnect or &lt;code&gt;SET search_path&lt;/code&gt; again) to update the catalog cache. This avoids spurious "Schema Already Exists" errors caused by a stale cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deletion protection&lt;/strong&gt; - Enable &lt;code&gt;deletion_protection_enabled = true&lt;/code&gt; in production Terraform configs. If you need to destroy a DSQL cluster, disable protection first, then run &lt;code&gt;terraform apply&lt;/code&gt; before &lt;code&gt;terraform destroy&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Row counts&lt;/strong&gt; - For large tables, use the system catalog instead of &lt;code&gt;COUNT(*)&lt;/code&gt; for row counts. DSQL stores approximate counts in &lt;code&gt;pg_class.reltuples&lt;/code&gt;.&lt;/p&gt;
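&lt;p&gt;A minimal sketch of that catalog query - keep in mind the result is an estimate maintained by the engine, not an exact count:&lt;/p&gt;

```python
# Sketch: approximate row count from the system catalog instead of
# COUNT(*). reltuples is an estimate, not an exact count.
APPROX_COUNT_SQL = "SELECT reltuples::bigint FROM pg_class WHERE relname = %s"

def approx_row_count(cur, table_name):
    """Return the catalog's row-count estimate for a table (0 if absent)."""
    cur.execute(APPROX_COUNT_SQL, (table_name,))
    row = cur.fetchone()
    return int(row[0]) if row else 0
```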

&lt;p&gt;&lt;strong&gt;TRUNCATE&lt;/strong&gt; - Not supported. Use &lt;code&gt;DELETE FROM table_name&lt;/code&gt; to clear all rows, or &lt;code&gt;DROP TABLE&lt;/code&gt; followed by &lt;code&gt;CREATE TABLE&lt;/code&gt; for a full reset. This is a common migration stumbling block for scripts that use &lt;code&gt;TRUNCATE&lt;/code&gt; for test data cleanup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection pooling&lt;/strong&gt; - With 60-minute connection limits and IAM token refresh, pool refresh behavior matters. Configure your connection pool to close and recreate connections before the 60-minute limit. The official connectors handle token refresh, but pool-level eviction still needs configuration. Set &lt;code&gt;idleTimeoutMillis&lt;/code&gt; (Node.js) or equivalent to well under 60 minutes.&lt;/p&gt;
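&lt;p&gt;A minimal sketch of the eviction check a pool needs - the 55-minute cutoff is an assumption chosen to leave margin under the 60-minute limit:&lt;/p&gt;

```python
# Sketch: recycle pooled connections well before DSQL's 60-minute
# connection limit. 55 minutes is an assumed safety margin.
import datetime

MAX_LIFETIME = datetime.timedelta(minutes=55)

def should_recycle(connected_at, now=None):
    """True when a pooled connection is old enough to close and replace."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return now - connected_at >= MAX_LIFETIME
```

&lt;p&gt;With SQLAlchemy, setting &lt;code&gt;pool_recycle&lt;/code&gt; (in seconds) on &lt;code&gt;create_engine&lt;/code&gt; achieves the same eviction without a hand-rolled check.&lt;/p&gt;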

&lt;p&gt;&lt;strong&gt;PostgreSQL client version&lt;/strong&gt; - AWS recommends PostgreSQL client version 17 or later for best compatibility with DSQL.&lt;/p&gt;




&lt;h2&gt;
  
  
  Recent Features Worth Highlighting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DSQL Playground (February 2026)&lt;/strong&gt; - A browser-based sandbox where you can create schemas, load sample data, and run SQL queries against a real DSQL database - no AWS account required. This is the fastest way to try DSQL. Visit the &lt;a href="https://playground.dsql.demo.aws/" rel="noopener noreferrer"&gt;Aurora DSQL Playground&lt;/a&gt; and start writing queries in seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sequences and Identity Columns (February 2026)&lt;/strong&gt; - The most requested feature after foreign keys. You can now use &lt;code&gt;GENERATED ALWAYS AS IDENTITY&lt;/code&gt; columns and explicit &lt;code&gt;CREATE SEQUENCE&lt;/code&gt; / &lt;code&gt;nextval()&lt;/code&gt; calls. Up to 5,000 sequences per database.&lt;/p&gt;
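&lt;p&gt;A quick sketch of what the new support enables - the table and sequence names are hypothetical, and the helper commits after each statement because DSQL allows only one DDL per transaction:&lt;/p&gt;

```python
# Sketch: identity-column and sequence DDL for DSQL. Names are
# hypothetical; the helper commits per statement because DSQL
# permits one DDL statement per transaction.
CREATE_TABLE_DDL = """
CREATE TABLE orders (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer TEXT NOT NULL
)
"""

CREATE_SEQUENCE_DDL = "CREATE SEQUENCE invoice_seq"
NEXT_INVOICE_SQL = "SELECT nextval('invoice_seq')"

def apply_ddl(conn, statements):
    """Run each DDL statement in its own transaction."""
    for stmt in statements:
        with conn.cursor() as cur:
            cur.execute(stmt)
        conn.commit()
```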

&lt;p&gt;&lt;strong&gt;AI Steering (February 2026)&lt;/strong&gt; - The DSQL MCP server and IDE skills ensure AI coding assistants generate code that handles DSQL's specific patterns - OCC retries, DDL limits, IAM auth. If you use Claude Code, Cursor, or similar tools, install the DSQL steering skill. It saves real debugging time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PrivateLink with Direct Connect (December 2025)&lt;/strong&gt; - Connect to DSQL from on-premises networks without traversing the public internet. Uses the &lt;code&gt;amzn-cluster-id&lt;/code&gt; connection option for clusters behind PrivateLink without private DNS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resource-Based Policies (October 2025)&lt;/strong&gt; - Attach policies directly to DSQL clusters for cross-account access patterns. Useful for shared database architectures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS FIS Integration (August 2025)&lt;/strong&gt; - Inject connection errors into specific regions to test your application's failover behavior. For multi-Region deployments, run experiments in one region while the other continues normal operations.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Project: The Kabob Store on DSQL
&lt;/h2&gt;

&lt;p&gt;I built the &lt;a href="https://darryl-ruggles.cloud/dsql-kabob-store/" rel="noopener noreferrer"&gt;Kabob Store&lt;/a&gt; as a real-world test of DSQL. It's a full e-commerce platform with menu browsing, cart management, and order processing, running on ECS Fargate with a FastAPI backend.&lt;/p&gt;

&lt;p&gt;Key architectural decisions from that project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direct psycopg2 instead of an ORM&lt;/strong&gt; - Better control over transaction boundaries and DSQL-specific patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container-based architecture&lt;/strong&gt; - The same Docker image deploys to Fargate, Lambda, EC2, or EKS without code changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Region DSQL with single-Region compute&lt;/strong&gt; - Data replication for disaster recovery, with plans to add multi-Region compute with Route53 routing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defense-in-depth security&lt;/strong&gt; - Six layers from client validation through parameterized queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM token refresh manager&lt;/strong&gt; - Thread-safe connection management with 55-minute token refresh&lt;/li&gt;
&lt;/ul&gt;
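&lt;p&gt;The token refresh manager is the piece most DSQL applications end up writing. A minimal thread-safe sketch in the spirit of that last bullet - the fetch function is injected so the cache logic stands alone; in production it would wrap the boto3 DSQL token generator (&lt;code&gt;generate_db_connect_auth_token&lt;/code&gt;):&lt;/p&gt;

```python
# Sketch: thread-safe IAM token cache with a 55-minute refresh window.
# fetch_token is injected; in production it would wrap boto3's DSQL
# token generator. The clock is injectable for testing.
import threading
import time

class TokenManager:
    def __init__(self, fetch_token, ttl_seconds=55 * 60, clock=time.monotonic):
        self._fetch = fetch_token
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._token = None
        self._issued_at = None

    def get(self):
        """Return a cached token, refreshing before the 60-minute limit."""
        with self._lock:
            now = self._clock()
            if self._token is None or now - self._issued_at >= self._ttl:
                self._token = self._fetch()
                self._issued_at = now
            return self._token
```

&lt;p&gt;Every thread that opens a connection calls &lt;code&gt;get()&lt;/code&gt;; the lock ensures only one thread pays for a refresh when the token ages out.&lt;/p&gt;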

&lt;p&gt;The architecture principles from that project apply to any DSQL application. I covered &lt;a href="https://darryl-ruggles.cloud/elastic-container-service-ecs-my-default-choice-for-containers-on-aws/" rel="noopener noreferrer"&gt;ECS as my default container runtime&lt;/a&gt; and &lt;a href="https://darryl-ruggles.cloud/amazon-eventbridge-the-event-driven-backbone-of-aws-and-my-favourite-service/" rel="noopener noreferrer"&gt;EventBridge for event-driven patterns&lt;/a&gt; in previous posts - DSQL fits naturally into both patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cleanup
&lt;/h2&gt;

&lt;p&gt;If you deployed a DSQL cluster to follow along, destroy your resources to avoid ongoing charges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;terraform

&lt;span class="c"&gt;# If you enabled deletion protection, disable it first:&lt;/span&gt;
&lt;span class="c"&gt;# Edit dsql-single-region.tf: set deletion_protection_enabled = false&lt;/span&gt;
&lt;span class="c"&gt;# terraform apply&lt;/span&gt;

terraform destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For CI/CD pipelines and automated testing, set &lt;code&gt;deletion_protection_enabled = false&lt;/code&gt; from the start, or use the &lt;code&gt;force_destroy&lt;/code&gt; option in the Terraform module to skip the protection check during teardown.&lt;/p&gt;

&lt;p&gt;DSQL charges only for DPUs consumed and storage used - there are no idle compute charges. But storage charges ($0.33/GB-month) continue as long as data exists in the cluster. For multi-Region clusters, destroy both the primary and secondary clusters. The witness region has no standalone resources to clean up.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Aurora DSQL is 15 months old and has matured quickly. It went from a 3-region preview to a 14-region GA service with CloudWatch monitoring, AWS Backup, PrivateLink, FIS chaos testing, resource-based policies, sequences, and a growing ecosystem of connectors and IDE integrations.&lt;/p&gt;

&lt;p&gt;The gaps are real - no foreign keys, no stored procedures, no vector support. These matter for some workloads. But for new applications that need SQL with serverless economics, multi-Region strong consistency without managing replicas, or a database that actually scales to zero, DSQL delivers.&lt;/p&gt;

&lt;p&gt;My decision tree for new projects now has a clear path:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need key-value at scale?&lt;/strong&gt; DynamoDB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need full PostgreSQL?&lt;/strong&gt; Aurora PostgreSQL Serverless v2.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need SQL + serverless + multi-Region?&lt;/strong&gt; Aurora DSQL.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The code examples in this post are in the &lt;a href="https://github.com/RDarrylR/aurora-dsql-2026" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; - Terraform for infrastructure, Python and Node.js for application patterns. If you want to try DSQL without even creating an AWS account, the &lt;a href="https://playground.dsql.demo.aws/" rel="noopener noreferrer"&gt;DSQL Playground&lt;/a&gt; lets you run queries in your browser in seconds. When you're ready for your own cluster, it's sixty seconds from &lt;code&gt;terraform apply&lt;/code&gt; to a running PostgreSQL-compatible database with no instances to manage.&lt;/p&gt;

&lt;p&gt;If you've been waiting for a serverless SQL database on AWS that isn't a compromise, this is it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://playground.dsql.demo.aws/" rel="noopener noreferrer"&gt;Aurora DSQL Playground&lt;/a&gt; - Try DSQL in your browser, no AWS account needed&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/aurora-dsql/latest/userguide/what-is-aurora-dsql.html" rel="noopener noreferrer"&gt;Aurora DSQL User Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/rds/aurora/dsql/pricing/" rel="noopener noreferrer"&gt;Aurora DSQL Pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/aurora-dsql/latest/userguide/doc-history.html" rel="noopener noreferrer"&gt;Aurora DSQL Document History&lt;/a&gt; - Track every feature addition&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://brooker.co.za/blog/2024/12/03/aurora-dsql.html" rel="noopener noreferrer"&gt;Marc Brooker's DSQL Blog Series&lt;/a&gt; - Essential reading. Marc is the VP/Distinguished Engineer behind DSQL. His five-part series covers the architecture internals (reads, writes, transactions, multi-Region, simplifying architectures) in detail you won't find anywhere else.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://discord.com/invite/nEF6ksFWru" rel="noopener noreferrer"&gt;Aurora DSQL Discord&lt;/a&gt; - Community Discord for questions, feedback, and discussion with the DSQL team&lt;/li&gt;
&lt;li&gt;&lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/rds-aurora/aws/latest/submodules/dsql" rel="noopener noreferrer"&gt;terraform-aws-modules/rds-aurora DSQL Module&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/aurora-dsql/latest/userguide/SECTION_aurora-dsql-mcp-server.html" rel="noopener noreferrer"&gt;Aurora DSQL MCP Server&lt;/a&gt; - AI steering for DSQL-aware code generation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/aurora-dsql/latest/userguide/SECTION_connectors.html" rel="noopener noreferrer"&gt;Aurora DSQL Connectors&lt;/a&gt; - Official Python, Node.js, Java, Go, Ruby connectors&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/dsql-kabob-store/" rel="noopener noreferrer"&gt;My Kabob Store Project&lt;/a&gt; - My previous DSQL blog - building a multi-Region e-commerce platform&lt;/li&gt;
&lt;li&gt;&lt;a href="https://darryl-ruggles.cloud/elastic-container-service-ecs-my-default-choice-for-containers-on-aws/" rel="noopener noreferrer"&gt;ECS: My Default Choice for Containers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://darryl-ruggles.cloud/amazon-eventbridge-the-event-driven-backbone-of-aws-and-my-favourite-service/" rel="noopener noreferrer"&gt;EventBridge: The Event-Driven Backbone of AWS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Connect with me on&lt;/em&gt; &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://bsky.app/profile/darrylruggles.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://github.com/RDarrylR" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;&lt;em&gt;, or the&lt;/em&gt; &lt;a href="https://community.aws/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;&lt;em&gt;. Check out more of my projects at&lt;/em&gt; &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; &lt;em&gt;and join the&lt;/em&gt; &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; &lt;em&gt;community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>aurora</category>
      <category>postgres</category>
      <category>database</category>
    </item>
  </channel>
</rss>
