Avoiding AWS Lambda cold starts

May 11, 2023
Dmitry Chuyko

The summer is just around the corner, but it's freezing in your Lambdas, and the cold starts are the culprit.

If you have noticed the detrimental effect of this condition on your business, this article is for you. We will look into the reasons for cold starts, find out why your instances warm up agonizingly slowly and learn how to dodge the issue with a few simple techniques.

What causes cold starts in AWS Lambda

A cold start is an inherent side effect of serverless computing, where machine capacity for code execution is allocated on demand. AWS Lambda, a highly available serverless compute service, runs your code as functions that execute only when they are invoked, whether that is once per day or thousands of times per second. To maintain high availability, AWS Lambda needs a ready pool of containers to run functions at any moment. So a function that is no longer needed is kept warm for a limited time and then shut down to free up its container. Each execution environment goes through a lifecycle of three phases:

  • The init phase, during which Lambda starts the extensions (monitoring and security tools, etc.), the runtime environment (OS, programming language, libraries), and the function's initialization code.
  • The invoke phase, during which Lambda runs the function handler to serve the request. This stage ends after the runtime and extensions signal that they are done.
  • The shutdown phase, during which Lambda shuts down the runtime and extensions.

After the invoke phase, Lambda maintains the execution environment in case the function is invoked again quickly. If so, Lambda reuses the execution environment and skips the init phase, going straight to the invoke phase — a warm start. If the function is called after being shut down, the process starts from scratch with the init phase — a cold start.

Lifecycle of an AWS Lambda function

As a result, companies find themselves in the following situation:

  • No available instances lead to cold starts;
  • The application starts up slowly;
  • All the requests are satisfied, and the functions are shut down;
  • The process commences anew with cold starts. 

A pattern of sudden high load can be costly: because of slow warm-up, you end up paying much more than you expected. Meanwhile, users wait for a response while your instances are warming up. Therefore, long and frequent cold starts increase cloud bills and are highly disruptive to the user experience with your product.

Several factors contribute to even longer cold starts.

Programming languages

The speed of function initialization depends on the language. Python and Node.js functions typically load much faster, whereas Java and .NET (strictly speaking, a platform, not a language) take longer to initialize. Although the situation has improved compared to 2018, and the difference is no longer striking, it should still be considered.

The main reason for the slow startup of Java and .NET is just-in-time (JIT) compilation. The JIT compiler converts bytecode into machine code at runtime. At the same time, it performs the optimizations needed for better long-term performance, which results in an extended warm-up.

Dependencies

The more dependencies and packages you use, the more time it takes for the provider to load them. To make matters worse, the process repeats with every function initialization.

Lambda Function chains

Function chains are an excellent method of splitting up a huge function by assigning related tasks to smaller functions. Each function calls the next one after completing its job, which keeps response times to user requests short. But if something goes wrong at any stage of the process, it can cause a Lambda timeout (the function exceeds the maximum time it is allowed to run), and function initialization will start again, leading to even worse latency.

HTTPS calls

An HTTPS call triggers an SSL/TLS handshake, an exchange between client and server that establishes a secure connection. When an HTTPS call happens inside the function, the handshake prolongs the invocation time.

Recommendations on minimizing cold starts

Unfortunately, we cannot eliminate cold starts, but we can minimize their frequency and duration with the following techniques. The good part is that they can deliver solid results without paying for additional services such as provisioned concurrency.

Keep functions warm

Warm Lambda functions are initialized but not fully invoked (“frozen”), and they can be “thawed” when a request comes through. To keep a function warm, ping it every few minutes so that it is not shut down, and add warming logic to the handler so that it recognizes the ping and returns early. For the pings, you can use third-party tools or scheduled CloudWatch Events rules (the service is now called Amazon EventBridge).
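
Here is a minimal sketch of such warming logic in Java. It assumes the pings come from a scheduled CloudWatch Events rule, whose events carry "source": "aws.events". The class name WarmAwareHandler and the doBusinessLogic placeholder are made up for illustration, and the event is modeled as a plain Map so that the snippet runs without AWS libraries; a real function would typically implement RequestHandler from aws-lambda-java-core instead.

```java
import java.util.Map;

// Sketch only: a handler that recognizes warm-up pings and returns early.
final class WarmAwareHandler {

    // Scheduled CloudWatch Events (EventBridge) rules send events whose
    // "source" field is "aws.events".
    static final String SCHEDULED_SOURCE = "aws.events";

    static boolean isWarmUpPing(Map<String, Object> event) {
        return SCHEDULED_SOURCE.equals(event.get("source"));
    }

    static String handleRequest(Map<String, Object> event) {
        if (isWarmUpPing(event)) {
            // The ping only has to keep the execution environment alive,
            // so skip the real work and return immediately.
            return "warmed";
        }
        return doBusinessLogic(event);
    }

    // Hypothetical placeholder for the real work of the function.
    static String doBusinessLogic(Map<String, Object> event) {
        return "processed " + event.getOrDefault("orderId", "unknown");
    }

    public static void main(String[] args) {
        System.out.println(handleRequest(Map.of("source", "aws.events"))); // warmed
        System.out.println(handleRequest(Map.of("orderId", "42")));        // processed 42
    }
}
```

Returning early keeps the billed duration of each ping short. Note that one ping keeps only one execution environment warm, so this approach suits functions with low concurrency best.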

Reduce dependencies and packages

Reduce the number of dependencies and leave only the direct ones required by your application to run. Another approach is to preload dependencies, thus accelerating startup time.

The same goes for OS packages: utilize a minimally sufficient set. You can manually remove unnecessary packages or opt for a minimalistic distribution like Alpaquita Linux. The base image of 3.69MB (musl) or 8.32MB (glibc) is sufficient for simple Lambda functions, but it is possible to pull any other essential packages from Linux repositories.

Develop the app with cold starts in mind

Developers should consider the response times and timeout risks when developing and preparing an app for deployment in AWS Lambda. For instance, they can reduce the number of static variables or use lazy loading (for example, for database connections) whenever possible.
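
The lazy-loading idea could look roughly like this in Java. This is a sketch under assumptions, not a recipe: DbClient is a hypothetical stand-in for a real database client, LazyDbHandler is an invented name, and the handler takes a plain String path so that the snippet runs without AWS libraries. The sketch relies on Lambda reusing the same handler object for warm invocations, so the client is created at most once per execution environment.

```java
// Sketch only: create an expensive resource on first use, not during init.
final class LazyDbHandler {

    // Hypothetical stand-in for a real database client that is slow to create.
    static final class DbClient {
        String query(String sql) {
            return "rows for: " + sql;
        }
    }

    // Left null on purpose: creating the client in a field initializer or
    // constructor would add its cost to every cold start.
    private DbClient db;

    // Counts constructions so the example can show the client is built once.
    int clientsCreated = 0;

    private synchronized DbClient db() {
        if (db == null) {
            db = new DbClient();
            clientsCreated++;
        }
        return db;
    }

    String handleRequest(String path) {
        if ("/health".equals(path)) {
            return "ok"; // this path never pays for the database client
        }
        return db().query("SELECT 1");
    }

    public static void main(String[] args) {
        LazyDbHandler handler = new LazyDbHandler();
        System.out.println(handler.handleRequest("/health") + " | clients: " + handler.clientsCreated);
        System.out.println(handler.handleRequest("/report") + " | clients: " + handler.clientsCreated);
        System.out.println(handler.handleRequest("/report") + " | clients: " + handler.clientsCreated);
    }
}
```

The trade-off is that the first request needing the database pays the creation cost instead of the init phase, so it is worth measuring both variants for your workload.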

In addition, it is vital to keep functions small and straightforward. Huge functions take a lot of time to load and are associated with significant latency when processing complex requests. Minimalistic single-purpose functions start up fast, complete their task, and release the instance into the pool; the more instances are available, the lower the chance that new ones have to be created for further requests.

Use the Coordinated Restore at Checkpoint (CRaC) API

Coordinated Restore at Checkpoint (CRaC) is an OpenJDK project aimed at reducing the startup and warmup times of Java applications from minutes to milliseconds. Java with CRaC enables developers to take a snapshot of a running application, save it to a file, replicate it among instances, and then restore the state of the application from the file. As a result, the application starts almost instantly at peak performance, so you don't have to pay for the CPU cycles required for the warmup.

BellSoft offers ready-to-use Alpaquita Containers with CRaC support so that you don't have to configure Java or the OS to work with this feature. Head to our tutorial on using CRaC with Java containers and give it a try!

Use AOT-compilation

AOT compilation is an alternative to the CRaC Project. The ahead-of-time (AOT) compiler translates bytecode into machine code before program execution. It generates a native executable file, which starts almost instantly because there is no need to interpret bytecode or search for hotspots at runtime. All the optimizations are performed before the program starts. An AOT compiler also eliminates unused code and dependencies. Coupled with the fact that the resulting file doesn’t need a JVM to run, this helps to:

  • drastically reduce memory consumption;
  • accelerate startup to as little as 1/10 s;
  • reach peak performance immediately without warm-up;
  • increase the security thanks to a smaller attack surface. 

AOT compilation is made possible in Java through the GraalVM compiler, which aims to produce highly performant and resource-efficient native images, perfect for cloud-native applications. Indeed, native images with instant startup eliminate the possible damage from cold starts by reducing the function warm-up times.

Note that the AOT compiler cannot automatically detect every use of dynamic features (JNI, Reflection, etc.), so you need to make the compiler aware of them through configuration or rebuild the app accordingly.

Try out the AOT-compilation right now: BellSoft develops Liberica Native Image Kit (NIK), a GraalVM-based tool for native image generation recommended by Spring. Liberica NIK is always based on the latest GraalVM and Liberica JDK 11 & 17 versions with security patches, bug fixes, and enhancements.

Utilize logging and performance monitoring

Monitor the behavior of your functions: how often they are invoked, how long the cold starts take, etc. Measure KPIs such as latency and throughput. Understanding the bottlenecks in the app’s performance helps you introduce improvements and, eventually, reduce cold starts. Amazon CloudWatch is an outstanding tool for collecting, visualizing, and analyzing metrics for your Lambda functions.
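
To tell cold invocations from warm ones in your own logs, one option is to tag each invocation from inside the function. The Java sketch below assumes that a single ColdStartProbe (an invented name) is created once per execution environment, for example as a field of the handler class, so that only the first invocation after init is reported as cold. The log format is made up for illustration; Lambda forwards anything printed to standard output to CloudWatch Logs.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: marks the first invocation in an execution environment as cold.
final class ColdStartProbe {

    // Captured when the probe is created, i.e. while the environment initializes.
    private final long createdAtNanos = System.nanoTime();

    // True until the first invocation has been recorded.
    private final AtomicBoolean firstInvocation = new AtomicBoolean(true);

    // Builds a log line for one invocation; call it at the start of the handler.
    String describeInvocation(String requestId) {
        boolean coldStart = firstInvocation.getAndSet(false);
        long msSinceInit = (System.nanoTime() - createdAtNanos) / 1_000_000;
        return String.format("requestId=%s coldStart=%b msSinceInit=%d",
                requestId, coldStart, msSinceInit);
    }

    public static void main(String[] args) {
        ColdStartProbe probe = new ColdStartProbe();
        System.out.println(probe.describeInvocation("req-1")); // coldStart=true
        System.out.println(probe.describeInvocation("req-2")); // coldStart=false
    }
}
```

Filtering the logs for coldStart=true then shows how often cold starts happen. For their duration, the REPORT line that Lambda writes after a cold invocation includes an Init Duration value.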

Conclusion

If you work with Lambdas or any other serverless computing service, you must deal with cold starts regularly. It’s impossible to eliminate their occurrence, but following the recommendations above will end the Ice Age in your Lambdas: your functions will warm up much faster, and instances will be used more efficiently.

Heads up for the Java developers: you don’t have to migrate your application from Java to Python to accelerate function startup. Java may require more space in the cloud and warm up slowly. This is true. But there are numerous ways to alleviate the matter:

  • Use JRE instead of JDK for deployment so that your runtime initializes faster;
  • Consider AOT-compilation for almost instant app startup;
  • Migrate to a smaller base image so that fewer packages have to be loaded during initialization.

In addition, BellSoft develops a full stack of Java technologies for cloud-native Java applications:

  • Liberica Native Image Kit for native image generation;
  • Alpaquita Linux, a lightweight but highly performant Linux tailor-made for Java. It can be used as a lightweight base image for Lambdas or any other cloud service and CI/CD. We provide ready images with Java, Python, GCC, and Native Image;
  • Liberica Lite, a version of our Liberica JDK optimized for the cloud.

All our technologies are free to use. But if you need an enterprise-grade solution, we have the Alpaquita Cloud Native Platform that combines the above technologies plus 24/7 support from a major OpenJDK vendor. Alpaquita Cloud Native Platform is a convenient all-in-one solution for Lambdas, but compatibility with multiple system configurations enables you to develop applications for any purpose (cloud, server, desktop, and embedded), deploy them to any environment, and unify the enterprise Java stack. 

Click on the button below to learn more about the offer, and may there be spring in your Lambdas!

 
