AWS Architecture Blog

Simplify multi-tenant encryption with a cost-conscious AWS KMS key strategy

In this post, we explore an efficient approach to managing encryption keys in a multi-tenant SaaS environment through centralization, addressing challenges like key proliferation, rising costs, and operational complexity across multiple AWS accounts and services. We demonstrate how implementing a centralized key management strategy using a single AWS KMS key per tenant can maintain security and compliance while reducing operational overhead as organizations scale.

How CommBank made their CommSec trading platform highly available and operationally resilient

In this post, we explore how CommSec, Australia’s leading online broker, transitioned from a multicloud environment to AWS as their sole cloud provider while implementing Amazon Application Recovery Controller (ARC) zonal shift to maintain high availability and operational resilience. The consolidation delivered a 25% reduction in base capacity, deployments that are twice as fast, and improved failover capabilities through ARC zonal shift, enabling CommSec to continue serving millions of customers while meeting strict regulatory requirements.

How Karrot built a feature platform on AWS, Part 1: Motivation and feature serving

This two-part series shows how Karrot developed a new feature platform, which consists of three main components: feature serving, a stream ingestion pipeline, and a batch ingestion pipeline. This post starts by presenting our motivation, our requirements, and the solution architecture, focusing on feature serving.

How Karrot built a feature platform on AWS, Part 2: Feature ingestion

This two-part series shows how Karrot developed a new feature platform, which consists of three main components: feature serving, a stream ingestion pipeline, and a batch ingestion pipeline. This post covers the process of collecting features in real-time and batch ingestion into an online store, and the technical approaches for stable operation.

Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers

In this post, we demonstrate how to deploy the DeepSeek-R1-Distill-Qwen-32B model using AWS Deep Learning Containers (DLCs) for vLLM on Amazon EKS, showcasing how these purpose-built containers simplify deployment of this powerful open source inference engine. This solution can help you solve the complex infrastructure challenges of deploying LLMs while maintaining performance and cost-efficiency.

Maximizing Business Value Through Strategic Cloud Optimization

As cloud spending continues to surge, organizations must focus on strategic cloud optimization to maximize business value. This blog post explores key insights from MIT Technology Review’s publication on cloud optimization, highlighting the importance of viewing optimization as a continuous process that encompasses all six AWS Well-Architected pillars.

How Zapier runs isolated tasks on AWS Lambda and upgrades functions at scale

In this post, you’ll learn how Zapier has built their serverless architecture focusing on three key aspects: using Lambda functions to build isolated Zaps, operating over a hundred thousand Lambda functions through Zapier’s control plane infrastructure, and enhancing security posture while reducing maintenance efforts by introducing automated function upgrades and cleanup workflows into their platform architecture.

How HashiCorp made cross-Region switchover seamless with Amazon Application Recovery Controller

In this post, we discuss HashiCorp’s journey from manual, stress-inducing failover procedures to a streamlined, confident approach that fundamentally changed how they deliver on their enterprise-grade resilience promises.

Implement monitoring for Amazon EKS with managed services

In this post, we show you how to implement comprehensive monitoring for Amazon Elastic Kubernetes Service (Amazon EKS) workloads using AWS managed services. This solution demonstrates building an EKS platform that combines flexible compute options with enterprise-grade observability using AWS native services and OpenTelemetry.

How Scale to Win uses AWS WAF to block DDoS events

In this post, you’ll learn how Scale to Win configured their network topology and AWS WAF to protect against DDoS events that reached peaks of over 2 million requests per second during the 2024 US presidential election campaign season. The post details how they implemented comprehensive DDoS protection by segmenting human and machine traffic, using tiered rate limits with CAPTCHA, and preventing CAPTCHA token reuse through AWS WAF Bot Control.

Top Architecture Blog Posts of 2024