Lambda Lowdown: Unpacking the Internals of AWS Lambda

Are you curious about how AWS Lambda makes deploying your code a cakewalk? Do you want to understand the nitty-gritty details of what's going on behind the scenes? Let's peel back the layers of the mysterious world of serverless computing.

Lambda is made up of two parts: the Control Plane and the Data Plane. The Control Plane manages integrations with other AWS services and provides the management APIs. The Data Plane, on the other hand, triggers your function invocations through its Invoke API.
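
To make the split concrete, here is a minimal boto3 sketch (the function name `my-function` is a placeholder): configuration calls such as `get_function_configuration` go through the Control Plane, while `invoke` goes through the Data Plane.

```python
# The same boto3 client exposes both planes, but the operations map to
# different Lambda subsystems. "my-function" is a placeholder name.
import json
import boto3

lambda_client = boto3.client("lambda")

# Control-plane call: read the function's configuration.
config = lambda_client.get_function_configuration(FunctionName="my-function")
print(config["Runtime"], config["MemorySize"])

# Data-plane call: the Invoke API actually runs the function.
response = lambda_client.invoke(
    FunctionName="my-function",
    Payload=json.dumps({"hello": "world"}),
)
print(json.load(response["Payload"]))
```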

Let's dig deep and find out how Lambda performs the heavy lifting for you, from provisioning to scaling.

How to Deploy Your Code on AWS Lambda:

For example, let's say you're creating a serverless application that sends out an email whenever a new user signs up on your website. You can either store the code for this application as a container image in Amazon ECR, or you can package the code into a .zip file and upload it to AWS Lambda.
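
Here is what the .zip path might look like with boto3; the function name, IAM role ARN, and handler are placeholders for the signup-email example. (The container path would instead pass `PackageType="Image"` and `Code={"ImageUri": ...}` pointing at an ECR image.)

```python
# Minimal sketch of the .zip deployment path using boto3. The function name,
# role ARN, and handler below are placeholders for the signup-email example.
import boto3

lambda_client = boto3.client("lambda")

with open("signup_email.zip", "rb") as f:
    zipped_code = f.read()

lambda_client.create_function(
    FunctionName="signup-email",
    Runtime="python3.12",
    Role="arn:aws:iam::123456789012:role/signup-email-role",  # placeholder role
    Handler="app.handler",          # module "app", function "handler"
    Code={"ZipFile": zipped_code},  # Lambda stores and encrypts this for you
)
```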

If you choose to upload a .zip file, AWS Lambda optimizes the code once and then encrypts it with AES-GCM, using a key managed by AWS. The encrypted package is stored in an S3 bucket that customers cannot access directly, from which it can be quickly retrieved whenever the function is invoked (in our example, whenever a new user signs up on your website).

If you opt to store the code as a container image in Amazon ECR, the image will be retrieved, optimized into smaller chunks, and encrypted using a combination of AES-CTR, AES-GCM, and SHA-256 MAC. This encryption method allows Lambda to deduplicate encrypted chunks securely, ensuring that your application runs smoothly and efficiently.
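
The deduplication trick is essentially convergent encryption: when each chunk's key is derived from the chunk's own contents, identical chunks produce identical ciphertexts and only need to be stored once. The sketch below illustrates that general idea only; it is not Lambda's actual scheme, and the chunk size, key derivation, and storage layer are invented for the example.

```python
# Illustrative convergent-encryption sketch (NOT Lambda's real scheme):
# deriving the AES key from a hash of the chunk means identical plaintext
# chunks encrypt to identical ciphertexts, so the store can deduplicate them.
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

CHUNK_SIZE = 512 * 1024  # 512 KiB chunks, an arbitrary choice for the example

def encrypt_chunk(chunk: bytes) -> tuple[str, bytes]:
    key = hashlib.sha256(chunk).digest()   # content-derived 256-bit key
    nonce = b"\x00" * 12                   # fixed nonce is tolerable here only
                                           # because each key encrypts one chunk
    ciphertext = AESGCM(key).encrypt(nonce, chunk, None)
    chunk_id = hashlib.sha256(ciphertext).hexdigest()  # dedup by ciphertext hash
    return chunk_id, ciphertext

store: dict[str, bytes] = {}
image = open("container_image.bin", "rb").read()   # placeholder image file
for offset in range(0, len(image), CHUNK_SIZE):
    chunk_id, ciphertext = encrypt_chunk(image[offset:offset + CHUNK_SIZE])
    store.setdefault(chunk_id, ciphertext)          # identical chunks stored once
```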

In either case, AWS Lambda takes care of the heavy lifting, freeing you up to focus on developing your serverless application.

The architecture

At the heart of AWS Lambda lies a sophisticated architecture that ensures seamless deployment of your code. The service creates an execution environment, also known as a worker, on a fleet of EC2 instances. These workers are optimized for high performance and run on bare-metal Nitro instances in an isolated AWS account for security purposes. The instances host hardware-virtualized microVMs (micro virtual machines) powered by Firecracker, an open-source virtual machine monitor built on Linux's KVM.

To ensure efficient resource utilization, workers have a maximum lease lifetime of 14 hours. When a worker approaches this limit, no further invocations are sent to it and it is terminated. Each worker hosts one concurrent invocation at a time, but it is reused if the same function is invoked multiple times. AWS Lambda never reuses an execution environment across different functions.

For added security, all communication between workers is encrypted with AES-GCM (Galois/Counter Mode). Customers do not have direct access to workers, which are hosted in a network-isolated VPC within Lambda's own AWS service accounts. This helps maintain the integrity and confidentiality of customer data.

AWS Lambda's Load Balancing and Scaling Strategy:

Lambda focuses on distributing the load evenly across the smallest number of busy "sandboxes." A sandbox is either busy serving an invocation or idle, so counting the busy sandboxes gives a clear picture of the system's load.

To optimize the load balancing, Lambda combines different types of workloads onto the same server. For example, if Function A has spikes in CPU usage, combining it with a completely different Function B can help even out the spikes and balance the load.

This technique is called "statistical multiplexing," and it works best when the workloads are uncorrelated. The larger the scale of the operation, the easier it is to find these uncorrelated workloads and pack them together for better performance.

So, to simplify, AWS Lambda balances the load by packing together different, uncorrelated workloads, maximizing the performance and stability of the system.
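
A toy simulation makes the intuition concrete: packing many spiky but uncorrelated loads together yields a far flatter aggregate than any single load on its own. The numbers below are invented purely for illustration.

```python
# Toy illustration of statistical multiplexing: many spiky, uncorrelated
# workloads packed together have a much lower peak-to-average ratio than
# a single spiky workload on its own.
import random

def spiky_load(steps, spike_prob=0.05):
    # Mostly idle, occasionally spiking to full CPU.
    return [1.0 if random.random() < spike_prob else 0.05 for _ in range(steps)]

def peak_to_average(load):
    return max(load) / (sum(load) / len(load))

random.seed(0)
steps, n_functions = 10_000, 100

single = spiky_load(steps)
aggregate = [
    sum(column)
    for column in zip(*(spiky_load(steps) for _ in range(n_functions)))
]

print(f"peak/average, one function:         {peak_to_average(single):.1f}")
print(f"peak/average, 100 packed functions: {peak_to_average(aggregate):.1f}")
```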

Lambda's Worker Layers and Isolated Environments

The Worker layers in AWS Lambda can be thought of as a series of layers that make up the environment where your code runs. The layers are:

  • Function code: This is your own code, which is provisioned by the Worker Manager, then downloaded and initialized on the Worker.

  • Lambda runtimes: These are the built-in runtimes that Lambda supports, such as .NET Core, Node.js, Python, etc.

  • Sandbox: This layer is made up of a Linux guest kernel, which has most of its features stripped away.

  • Guest OS: This layer is the Amazon Linux 2 distribution, which can run multiple isolated functions on a single Worker. The isolation between accounts is achieved through virtualization for security reasons.

  • Host OS: This layer is bound to the physical instance and its hardware.

All these layers work together to create the environment where your code runs. For example, the sandbox layer uses tools like cgroups and seccomp to restrict the function's maximum memory footprint and prevent certain syscalls from being performed.
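
As a rough illustration of the cgroups half of that sentence, the snippet below caps a process's memory using the generic Linux cgroups v2 interface. This is only a sketch of the mechanism (it needs root and a cgroup v2 mount at /sys/fs/cgroup), not Lambda's internal code.

```python
# Generic cgroups v2 sketch (requires root, Linux, cgroup v2 mounted at
# /sys/fs/cgroup); not Lambda's actual implementation.
import os
import pathlib

def cap_memory(pid: int, limit_bytes: int, name: str = "demo-sandbox") -> None:
    cgroup = pathlib.Path("/sys/fs/cgroup") / name
    cgroup.mkdir(exist_ok=True)
    # memory.max is the hard memory ceiling for every process in the group.
    (cgroup / "memory.max").write_text(str(limit_bytes))
    # Writing the PID into cgroup.procs moves the process under that ceiling.
    (cgroup / "cgroup.procs").write_text(str(pid))

cap_memory(os.getpid(), 256 * 1024 * 1024)  # cap this process at 256 MiB
```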

The technology behind these layers is similar to containerization, but with added security features. Firecracker, a KVM-based virtual machine monitor, is what creates the isolated microVMs on the Workers.

Synchronous Execution Path

Synchronous invocations are requests that expect a result immediately. For example, invoking a function that computes a value and then immediately feeding the result into another function.
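
With boto3, a synchronous call uses the RequestResponse invocation type (the default), which blocks until the payload comes back; the function names below are placeholders.

```python
# Synchronous (RequestResponse) invocations block until the result is returned,
# so the output of one function can be fed directly into another.
# Function names are placeholders.
import json
import boto3

lambda_client = boto3.client("lambda")

first = lambda_client.invoke(
    FunctionName="compute-price",
    InvocationType="RequestResponse",   # the default: wait for the result
    Payload=json.dumps({"items": [1, 2, 3]}),
)
price = json.load(first["Payload"])

second = lambda_client.invoke(
    FunctionName="format-invoice",
    InvocationType="RequestResponse",
    Payload=json.dumps({"price": price}),
)
print(json.load(second["Payload"]))
```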

When a synchronous invocation is made to a Lambda function, the request first passes through an internal Application Load Balancer, which distributes requests across the underlying infrastructure.

The load balancer forwards the request to a Frontend Worker, which authenticates the caller and checks that the per-function concurrency limit hasn't been exceeded.

If it's a first-time (cold-start) synchronous invocation, there is no sandbox ready to serve it. The Worker Manager asks the Placement Service to pick a host and provision a sandbox on it. The Worker Manager then initializes the function for execution by downloading the Lambda package from S3 and setting up the Lambda runtime, after which the Frontend Worker can call Invoke.

On a warm-start (the majority of invocations), the Worker Manager is aware of a readily available warm sandbox to invoke. In this case, the execution path is shorter and does not involve the Placement Service.
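
You can observe the difference from outside by timing two back-to-back invocations of a freshly deployed function: the first typically pays the cold-start cost, the second usually lands on a warm sandbox. The function name is a placeholder.

```python
# Timing two back-to-back synchronous invocations of a freshly deployed
# function usually shows the cold-start vs warm-start gap.
# The function name is a placeholder.
import time
import boto3

lambda_client = boto3.client("lambda")

for attempt in ("cold (first call)", "warm (second call)"):
    start = time.perf_counter()
    lambda_client.invoke(FunctionName="signup-email", Payload=b"{}")
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{attempt}: {elapsed_ms:.0f} ms")
```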

The Placement Service decides which Workers sandboxes are placed on, aiming to maximize packing density without impacting cold-start latency.

Overall, the synchronous execution path involves several steps, including load balancing, authentication, validation, and sandbox provisioning. Warm-starts are preferred, as they involve a shorter execution path and do not require sandbox provisioning.

Asynchronous Execution Path

Asynchronous invocations and events in AWS Lambda are handled differently than synchronous invocations. In the case of asynchronous invocations, the Application Load Balancer forwards the request to an available Frontend. The Frontend then places the event onto an internal queue, which is watched by a set of pollers assigned to that queue. The pollers pick events off the queue and hand them to a Frontend, where they are invoked synchronously.

Once the event reaches the Frontend, it follows the same synchronous invocation call pattern covered earlier: the Frontend authenticates the caller and checks the function metadata to confirm the per-function concurrency limit hasn't been exceeded. If everything checks out, the Frontend routes the request to a Worker Manager, which tracks and manages warm Lambda sandboxes ready for invocation.
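
From the caller's side, the only visible difference is the invocation type: an Event invocation returns HTTP 202 as soon as the event is queued, instead of waiting for the result. The function name is again a placeholder.

```python
# Asynchronous invocation: Lambda queues the event and returns immediately
# with HTTP 202; the pollers deliver it to the function later.
# The function name is a placeholder.
import json
import boto3

lambda_client = boto3.client("lambda")

response = lambda_client.invoke(
    FunctionName="signup-email",
    InvocationType="Event",            # async: fire-and-forget
    Payload=json.dumps({"email": "new.user@example.com"}),
)
print(response["StatusCode"])          # 202 means the event was accepted
```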

In the case of stream-based processing, the event source is responsible for executing the invocation request. Stream-based processing allows for processing a stream of events in real-time. For example, an AWS Lambda function could be configured to process events from an Amazon Kinesis stream. When a new event is added to the stream, the event source automatically triggers the Lambda function to process the event.

Similarly, with DynamoDB Streams, when a change is made to a table, an event is generated and sent to the Lambda service. This event is placed onto the internal queue and picked up by the assigned pollers, which then hand it to the Frontend for processing through the synchronous execution path.

So, the process for stream sources like Kinesis and DynamoDB involves the same synchronous execution path as for asynchronous invocations and events, but with the added step of events being generated by the stream source and sent to Lambda for processing.
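
For stream sources you don't call Invoke yourself; instead you attach the function to the stream with an event source mapping, and Lambda's pollers read the stream on your behalf. The stream ARN and function name below are placeholders.

```python
# Attaching a function to a Kinesis stream with an event source mapping;
# Lambda's internal pollers read the stream and invoke the function.
# The stream ARN and function name are placeholders.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/signups",
    FunctionName="signup-email",
    StartingPosition="LATEST",   # only process records added from now on
    BatchSize=100,               # records delivered per invocation
)
```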

Firecracker

Firecracker is an open-source virtual machine monitor (VMM) optimized for multi-tenant serverless and container workloads. In 2018, AWS announced Firecracker to improve the economics and scale of serverless applications. It provides both strong security and minimal overhead, making it ideal for public infrastructure providers. Firecracker is generally useful for containers, functions, and other compute workloads within a reasonable set of constraints.
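
Outside of Lambda, you can run Firecracker yourself: a microVM is described by a small JSON configuration (kernel, root filesystem, vCPU and memory size) and launched with the firecracker binary. The sketch below assumes a Linux host with KVM, an installed firecracker binary, an uncompressed kernel, and a rootfs image; all paths are placeholders.

```python
# Minimal sketch of launching a Firecracker microVM by writing a JSON config
# and starting the firecracker binary with it. Kernel and rootfs paths are
# placeholders; this assumes Firecracker is installed and KVM is available.
import json
import subprocess

vm_config = {
    "boot-source": {
        "kernel_image_path": "/path/to/vmlinux",           # uncompressed kernel
        "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
    },
    "drives": [
        {
            "drive_id": "rootfs",
            "path_on_host": "/path/to/rootfs.ext4",         # root filesystem image
            "is_root_device": True,
            "is_read_only": False,
        }
    ],
    "machine-config": {"vcpu_count": 1, "mem_size_mib": 128},  # a tiny microVM
}

with open("vm_config.json", "w") as f:
    json.dump(vm_config, f)

subprocess.run(
    ["firecracker", "--api-sock", "/tmp/firecracker.sock",
     "--config-file", "vm_config.json"],
    check=True,
)
```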

Serverless containers and functions are widely used for deploying and managing software in the cloud. Their popularity is due to reduced cost of operations, improved utilization of hardware, and faster scaling than traditional deployment methods. Achieving those economics, however, means running workloads from multiple customers on the same hardware with minimal overhead, while preserving strong security and performance isolation. Traditionally, there was a choice between virtualization with strong security and high overhead, and container technologies with weaker security and minimal overhead. That tradeoff was unacceptable to public infrastructure providers, who need both strong security and minimal overhead.

To meet this need, the AWS team developed Firecracker, which is specialized for serverless workloads. It is designed to support millions of production workloads and trillions of requests per month, and it has been deployed in two publicly available serverless compute services at AWS, Lambda and Fargate. The Firecracker paper describes how specializing for serverless informed its design, and what the team learned from seamlessly migrating Lambda customers onto it.

The development of Firecracker took a team from vision to execution. While it was designed to be simple and well suited to a relatively small set of tasks, it is not simplistic. Choosing what to do, and what not to do, were among the most interesting decisions made during its development.

Readings