How to reduce the size of Docker container images

Dec 14, 2022
Dmitry Chuyko

Docker images are often filled with software components your application doesn’t need to run. As a result, they slow down the development process and take up cloud storage that you have to pay for.

How do we reduce the Docker image size? Find out in this article!

Understanding Docker Images

Basic Structure

Before adjusting the container size, we must understand its structure. The software in containers forms a stack. The top layers with an application, its dependencies, and OS packages are the most frequently changed ones. A parent image and a base image compose the bottom layers. We rarely change base layers, so there are solutions for patching separate container layers to accelerate update times.

As per Docker documentation, a parent image is the one that your image is based on. It refers to the contents of the FROM directive in the Dockerfile. If the Dockerfile starts with FROM scratch, the image has no parent and is considered a base image. Most Docker images start from a parent image rather than a base image. However, these two terms are often used interchangeably.
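To make the distinction concrete, here is a minimal shell sketch that reads the last FROM instruction of a Dockerfile and reports whether the image is a base image or is built on a parent image. The function name image_kind and the throwaway Dockerfile are hypothetical, and the parsing is deliberately simple (for example, it does not resolve ARG substitutions).

```shell
#!/bin/sh
# image_kind FILE: classify a Dockerfile by the FROM line of its final stage.
image_kind() {
  # Take the first non-flag argument of the last FROM instruction,
  # so "FROM --platform=linux/amd64 alpine AS app" yields "alpine".
  from_image=$(awk 'toupper($1) == "FROM" {
      for (i = 2; i <= NF; i++) if ($i !~ /^--/) { img = $i; break }
    } END { print img }' "$1")
  if [ "$from_image" = "scratch" ]; then
    echo "base image (no parent)"
  else
    echo "parent image: $from_image"
  fi
}

# Demo with a throwaway Dockerfile (contents are illustrative only).
demo_dir=$(mktemp -d)
printf 'FROM ubuntu:latest\nRUN echo somedata\n' > "$demo_dir/Dockerfile"
image_kind "$demo_dir/Dockerfile"
```

For the demo file, the function reports ubuntu:latest as the parent image; a Dockerfile whose final stage starts with FROM scratch is reported as a base image.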

The structure of a typical Docker image can be depicted as follows.

Container layers

Common Issues

So why are Docker images prone to bloating? The most common root causes include:

  • Unnecessary base OS packages,
  • JDK instead of JRE for Java applications,
  • Unnecessary image layers.

In the subsequent sections, we will look into methods of adjusting the layers to keep the image lightweight yet performant. Note that there are two types of images — for developing and deploying applications. We will focus on containers for deployment.

Best Practices for Reducing Docker Image Size

Choosing a Base Image

There are two ways to reduce the base image size: use distroless images or lightweight Linux distributions tailored to containers.

Distroless images are designed to be as small as possible. They contain only the packages necessary for the application to run and none of the usual Linux components such as a shell or a package manager. The smallest distroless image takes up only 2MB, but it is suitable only for certain statically linked applications. Distroless images for applications that require a libc, need a JRE to run, or have dynamic features are a lot bigger: the size ranges up to 200 MB! To learn more about distroless and whether these images are suitable for your project, refer to the article Are distroless images small and secure?

Another option is to use a small Linux distribution for a base image.

But the variation between Linux image sizes is quite significant.

 

Distribution                     Container image size (compressed)
Alpaquita (musl)                 3.57MB
Alpaquita (glibc)                11.55MB
Alpine (musl)                    3.46MB
RHEL (Distroless UBI 8 Micro)    11.3MB
Ubuntu Jammy                     28.17MB
Debian Slim                      30.72MB

Size is not the only factor — albeit quite a substantial one judging by the cost of Cloud resources — when selecting a Linux distribution. Other important characteristics include available and affordable commercial support, LTS releases, security features, C library implementation, etc. A detailed comparison of popular Linux distributions for cloud and server can be found in our previous article.

But as you can see, you can significantly reduce the Docker image size by migrating to another OS.

When it comes to minimizing resource consumption, Alpine Linux has been a distribution of choice for a long time. Alpine is an open-source community-driven project. It is based on musl libc in contrast to other popular distributions utilizing glibc. Coupled with the fact that it includes only the essential OS packages, it is a great minimalistic and secure distro for containers.

But Alpine has several drawbacks:

  • There’s no enterprise support;
  • Stock musl implemented in Alpine may have worse performance than glibc in certain cases;
  • There are no builds with glibc, which may make migration from glibc-based distros challenging.

If these factors are critical to you, consider Alpaquita Linux. It is a lightweight open-source distribution inspired by Alpine, but with several enhancements, such as a performance-optimized musl build and a glibc variant.

Alpaquita can be used with various programming languages. For running Java applications, there’s a Liberica Runtime Container that takes up only 52MB. For most users, such image size reduction will already be enough to drastically lower cloud resources consumption even without further adjustments.

Minimizing Layers

Each layer of a Docker image represents an instruction in the Dockerfile. Instructions that modify the filesystem (RUN, COPY, and ADD) each create a new layer, and every extra layer can add to the image size. So instead of running

FROM ubuntu:latest
RUN echo somedata > somefile
RUN mv somefile /opt/somefile

we can merge the commands into a single instruction:

FROM ubuntu:latest
RUN echo somedata > somefile && mv somefile /opt/somefile

Yet, an even better approach is to use Docker multi-stage builds. With this technique, you write one Dockerfile that contains several FROM statements, with each FROM statement using its own base image and starting a new build phase. You can copy only the artifacts you need at a new stage. This way, the final image won’t contain layers from the previous build phases. 

Let’s see how we can implement multi-stage builds using Spring Petclinic as an example app and Liberica Runtime Container as a base image:

# Stage 1: build the application with a full JDK
FROM bellsoft/liberica-runtime-container:jdk-21-stream-musl AS builder

WORKDIR /home/app
ADD spring-petclinic-main /home/app/spring-petclinic-main
RUN cd spring-petclinic-main && ./mvnw clean package

# Stage 2: copy only the built JAR into a JRE-only runtime image
FROM bellsoft/liberica-runtime-container:jre-21-musl

WORKDIR /home/app
EXPOSE 8080
COPY --from=builder /home/app/spring-petclinic-main/target/*.jar petclinic.jar
# The JAR is copied into WORKDIR, so reference it there
ENTRYPOINT ["java", "-jar", "/home/app/petclinic.jar"]

Another way to minimize the number of layers is to use the docker-squash tool, which squashes the last N layers into one. This will help you keep layers with temporary or deleted files out of the image. So if you created a lot of large files (for instance, added Spring resources) and then deleted them in a new layer, you can squash the image and make it several times smaller.
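As a rough sketch, squashing could look like the snippet below. It assumes the docker-squash tool is installed and relies on its -f option (the number of layers to squash) and -t option (the tag for the resulting image). The function name squash_last, the SQUASH override variable, and the image names are hypothetical.

```shell
#!/bin/sh
# squash_last IMAGE N NEW_TAG: squash the last N layers of IMAGE into one
# layer and tag the result as NEW_TAG.
squash_last() {
  # SQUASH can point at another binary, or at "echo" for a dry run.
  "${SQUASH:-docker-squash}" -f "$2" -t "$3" "$1"
}

# Usage (needs a local image and a running Docker daemon):
#   squash_last myapp:latest 3 myapp:squashed
```

Setting SQUASH=echo before calling the function prints the arguments instead of running the tool, which is a convenient way to check the command first.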

Optimizing Dockerfile Instructions

The fact that each instruction in the Dockerfile creates a new layer gives rise to another issue, related not to the size of the image but to its build time. As layers are stacked on top of each other, whenever a layer changes, it must be rebuilt together with all subsequent layers. So it is better to put instructions that change rarely at the beginning of the Dockerfile (the bottom layers) and instructions that change often toward the end (the top layers).

For instance, the following Dockerfile

FROM bellsoft/alpaquita-linux-base:stream-musl
COPY . .
RUN apk add --no-cache liberica17-lite-jdk-all

will trigger the reinstallation of dependencies whenever any of the project files change. Instead, we can do this:

FROM bellsoft/alpaquita-linux-base:stream-musl
RUN apk add --no-cache liberica17-lite-jdk-all
COPY . .

This way, Docker caches the layer with the installed dependencies and reuses it the next time only the later layers change.

In addition, it is advisable to remove temporary files in the same RUN instruction that created them; files deleted in a later instruction still take up space in the earlier layer.

FROM bellsoft/alpaquita-linux-base:stream-musl
RUN apk add --no-cache liberica17-lite-jdk-all \
# use jlink to cut out a custom jre
&& apk del liberica17-lite-jdk-all

Using Minimal Packages

One of the reasons for Linux's immense popularity is customization. You can eliminate unnecessary packages and add modules required for your project, thus keeping the distribution clean and compact. In most cases, a base Linux image already contains numerous modules, which can be later removed manually.

However, starting with a minimal set of packages is more manageable, like in the case of Alpine and Alpaquita. The micro base image can be used as is for simple tasks or Lambdas. But you can pull the rest of the essential packages from Linux repos.

You should also reduce the number of dependencies by removing unnecessary ones and making sure that no unneeded dependencies are added.

Package managers can be helpful if you want to control the installation of dependencies. Direct dependencies are essential, whereas indirect (optional) ones are merely nice to have. The most popular Linux package managers (apt, yum, etc.) install all of them by default, but you can regulate this behavior. For instance, on Ubuntu/Debian the command

$ apt-get install -y --no-install-recommends <package-name>

ensures that optional recommended packages are not installed. As for Alpaquita and Alpine, there’s no need for a similar command because these distros install only direct dependencies.

Removing Unnecessary Files

When you build an image, Docker sends the files from the build context (usually the project directory) to the builder. But the project often contains large files that the application doesn't need to run. The .dockerignore file lets developers specify files and directories that should be excluded from the build context and therefore can't be copied into the image. This helps to accelerate the build process and reduce the image size. It also increases security by reducing the risk of putting sensitive data (commit history, credentials, etc.) into the image.
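As an illustration, the following shell sketch writes a starter .dockerignore into a temporary directory. The entries are assumptions about a typical Maven project, not a list taken from any specific repository, so adjust them to your own layout.

```shell
#!/bin/sh
# Write a starter .dockerignore into a scratch directory.
# The entries are typical for a Maven project and purely illustrative.
project_dir=$(mktemp -d)

cat > "$project_dir/.dockerignore" <<'EOF'
# Version control history
.git
# Local build output
target/
# Logs and local secrets
*.log
.env
EOF

cat "$project_dir/.dockerignore"
```

In a real project, the file lives in the root of the build context, next to the Dockerfile.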

In addition, package managers keep a cache of downloaded packages and other files, which we don’t need in the Docker image. Clean the cache in the same RUN instruction that installs the packages so that it never ends up in a layer. For Debian/Ubuntu, run

$ apt-get clean

With Alpaquita and Alpine, you can utilize the command

$ apk add --no-cache <package-name>

to avoid storing the package in the cache in the first place.
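For a Debian-based image, the cleanup advice above can be sketched as follows. The script generates a Dockerfile in which installation and cleanup share one RUN instruction; the debian:bookworm-slim tag and the ca-certificates package are placeholders chosen for illustration. Removing /var/lib/apt/lists is an extra step beyond apt-get clean that drops the downloaded package indexes.

```shell
#!/bin/sh
# Generate a Dockerfile that installs a package and cleans up in ONE RUN
# instruction, so the package cache never lands in a committed layer.
build_dir=$(mktemp -d)

cat > "$build_dir/Dockerfile" <<'EOF'
FROM debian:bookworm-slim
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*
EOF

echo "Dockerfile written to $build_dir"
# To build it (requires Docker):
#   docker build -t apt-clean-demo "$build_dir"
```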

Find out more tips on working with APK in a dedicated guide.

You can also exclude man pages and documentation from your Docker image if you don’t need them.

Analyzing and Monitoring Image Size

When optimizing the Docker image size, it is important to understand what is going on in your image: which layers it contains, which layers take the most space, and so on. The usual docker images command gives only general information about the image size, but there are other tools you can use to peek into your image structure.

First of all, you can use docker image history [IMAGE] that displays the information about image layers and their size, for instance:

$ docker image history <image-tag>
IMAGE          CREATED              CREATED BY                                      SIZE      COMMENT
6988370c72b1   About a minute ago   CMD ["java" "-jar" "petclinic.jar"]             0B        buildkit.dockerfile.v0
<missing>      About a minute ago   COPY /home/app/spring-petclinic-main/target/…   61.2MB    buildkit.dockerfile.v0
<missing>      7 minutes ago        WORKDIR /home/app                               0B        buildkit.dockerfile.v0
<missing>      12 days ago          /bin/sh -c #(nop)  ENV JAVA_HOME=/usr/lib/jv…   0B
<missing>      12 days ago          |8 LIBERICA_BUILD=9 LIBERICA_GENERATE_CDS=fa…   129MB
<missing>      12 days ago          /bin/sh -c #(nop)  ARG LIBERICA_GENERATE_CDS…   0B
<missing>      12 days ago          /bin/sh -c #(nop)  ARG LIBSUFFIX=-musl          0B
<missing>      12 days ago          /bin/sh -c #(nop)  ARG LIBERICA_USE_LITE=1      0B
<missing>      12 days ago          /bin/sh -c #(nop)  ARG LIBERICA_RELEASE_TAG=    0B
<missing>      12 days ago          /bin/sh -c #(nop)  ARG LIBERICA_ROOT=/usr/li…   0B
<missing>      12 days ago          /bin/sh -c #(nop)  ARG LIBERICA_VARIANT=jre     0B
<missing>      12 days ago          /bin/sh -c #(nop)  ARG LIBERICA_BUILD=9         0B
<missing>      12 days ago          /bin/sh -c #(nop)  ARG LIBERICA_VERSION=21.0…   0B
<missing>      2 weeks ago          /bin/sh -c #(nop)  ENV LANG=en_US.UTF-8 LANG…   0B
<missing>      2 weeks ago          /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>      2 weeks ago          /bin/sh -c #(nop) ADD file:a71f7e9bc66668361…   8.83MB

Another useful tool for analyzing image contents is dive. The tool displays information about the image layers, changes to the file tree, and estimates image efficiency. Using it is as simple as running

$ dive <image-tag>

And the screenshot below shows an example output.

Image evaluation by dive
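For scripted checks, the per-layer sizes reported by docker image history can also be printed in a compact, untruncated form. The sketch below uses the command's --no-trunc and --format options; the function name layer_sizes and the DOCKER override variable are hypothetical conveniences.

```shell
#!/bin/sh
# layer_sizes IMAGE: print "<size> <instruction>" for every layer, untruncated.
layer_sizes() {
  # DOCKER can be overridden (e.g. DOCKER=echo for a dry run).
  "${DOCKER:-docker}" image history --no-trunc \
    --format '{{.Size}} {{.CreatedBy}}' "$1"
}

# Usage (requires Docker and an existing image):
#   layer_sizes customjdk
```

With --no-trunc, long instructions are shown in full, which makes it easier to see which command produced a large layer.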

Optimizing Application Dependencies

Packing the application with all its dependencies into a Docker image may significantly contribute to its size.

Let’s see how we can optimize dependencies using Java applications as an example.

Thin JARs 

A traditional way of packing a Java application into an executable is building a fat JAR. A fat JAR, or uber-JAR, contains the application class files and all its dependencies, resulting in a self-contained executable that needs only a JRE to run. But we can trim down the size of our executable by creating a thin JAR that includes the application code without the dependencies. The dependencies are stored in the local repository, so there's no need to push the application with all dependencies across the dev, test, and prod environments, which makes the process more efficient. In a container, the thin JAR forms a separate layer: the overall image size stays the same, but the portion that changes with each update is tiny. It is also possible to use class files without the JAR packaging, which accelerates startup and reduces the compressed image size because the files are not compressed twice.

Using thin JARs also lets us separate the layers of a container image and put the ones that get frequent updates on top. This method saves the developers a lot of time when they need to introduce changes to the application because only the top image layers get affected.

Another possible solution is to use Application Class Data Sharing (AppCDS), a JVM feature that stores pre-processed metadata of system and application classes in an archive that can be shared by multiple JVM processes. The main goal of AppCDS is to reduce Java application startup time, but it can also help you reduce the memory footprint, depending on the number of application instances that share the archive.

The great news is that Spring Boot 3.3 comes with support for CDS, so creating an archive is very convenient. CDS with Spring Boot apps yields about 40% faster startup, and if you couple it with Spring AOT, startup is more than 50% faster! Find out how to use CDS with Spring Boot in this tutorial.
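A minimal two-step sketch of creating and using a CDS archive is shown below. It assumes JDK 13 or newer (for -XX:ArchiveClassesAtExit) and a Spring Boot application that honors the spring.context.exit property; the function names, the JAVA override variable, and the file names are hypothetical. Spring's documentation recommends running from an extracted JAR for the best results, a step this sketch skips for brevity.

```shell
#!/bin/sh
# Step 1: training run. The app starts, refreshes its context, exits,
# and the JVM dumps the loaded classes into the archive file.
cds_create() {
  "${JAVA:-java}" -XX:ArchiveClassesAtExit="$2" \
    -Dspring.context.exit=onRefresh -jar "$1"
}

# Step 2: normal run that maps the archive created in step 1.
cds_run() {
  "${JAVA:-java}" -XX:SharedArchiveFile="$2" -jar "$1"
}

# Usage (file names are hypothetical):
#   cds_create petclinic.jar petclinic.jsa
#   cds_run    petclinic.jar petclinic.jsa
```

In a container build, the training run would typically happen in a build stage so that the archive is baked into the final image.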

JRE images

We need the JDK (Java Development Kit) to develop Java applications. It includes the Java Runtime Environment (JRE) and development tools such as a debugger and a compiler. To run Java apps, we need only the JRE, which includes the Java Virtual Machine (JVM) and the class libraries required for program execution.

Developers sometimes put the JDK into containers meant for app deployment, increasing their size unnecessarily, while JRE images are more suitable for use in production. To compare, here are the sizes of Liberica Runtime Container images with Alpaquita Linux:

  • JDK, including tools such as jlink, takes up about 180 MB
  • JDK Lite, optimized for Cloud, takes up about 85 MB
  • JRE is only about 50 MB

In the Minimizing Layers section, we showed how to use a base image with a JRE.

Using Slimming Tools

The DockerSlim tool (docker-slim) automatically optimizes the size of a Docker image. It creates a temporary container and decides which files an application requires using various analysis techniques. The resulting single-layer image with only necessary files can be 30 times smaller than the original one, thus reducing memory consumption and enhancing security due to a minimal attack surface.

However, the tool should be used with caution. Because of lazy loading, docker-slim may accidentally throw away files the application needs, which can lead to production errors or even an unusable container. To avoid such situations, use the --http-probe and --include-path flags to exercise dynamically loaded code paths and preserve required files. It may be more complicated with Java, because a typical Java application has multiple dependencies, and some of them are non-obvious and loaded dynamically.
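As a hedged sketch, a docker-slim invocation with both flags could be wrapped like this. The function name slim_image, the SLIM override variable, the image name, and the preserved path are all hypothetical; check the options against the version of the tool you have installed.

```shell
#!/bin/sh
# slim_image IMAGE KEEP_PATH: minify IMAGE while probing its HTTP endpoints
# and force-keeping KEEP_PATH even if it is not touched during the probe.
slim_image() {
  # SLIM can be overridden, e.g. SLIM=echo for a dry run.
  "${SLIM:-docker-slim}" build --http-probe --include-path "$2" "$1"
}

# Usage (image name and path are hypothetical):
#   slim_image petclinic:latest /home/app
```

After slimming, it is worth running the application's full test suite against the new image before promoting it.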

Case Studies and Examples

How to create a custom JDK container image

In this section, we will build a custom JDK image using jlink, Liberica JDK Lite and Alpaquita Linux.

First, choose a C library implementation. Alpaquita Linux offers two C libraries in three variants:

  • musl-perf optimized for performance
  • musl-default (upstream build)
  • glibc

If you choose musl-perf, the Dockerfile will start with 

FROM bellsoft/alpaquita-linux-base:stream-musl

Note that Docker images with musl-based Alpaquita contain musl-perf by default. If you want to switch to the upstream version, add the following instruction to the Dockerfile:

RUN apk add musl-default

Next, choose a Java version. jlink was introduced in Java 9, so among the LTS versions only JDK 11, 17, and 21 support it. If you are still using Java 8 for enterprise development, this is a good incentive to migrate to a newer version.

We will build a custom image based on Java 17. For this purpose, install the package liberica17-lite-jdk-all, which contains everything you may need to build the image. 

RUN apk add --no-cache liberica17-lite-jdk-all

Configure jlink execution parameters. The most important ones are:

  • --add-modules <list>: lets us specify only the modules we really need. We choose the java.base module, which defines the foundational APIs of the Java platform
  • --vm <minimal/client/server>: we will use server as the most suitable option for typical user needs
  • --no-header-files: we don't need headers as we aren’t going to compile JNI code
  • --no-man-pages: we don't need documentation
  • --compress <0/1/2>: 0 means no compression, 1 is constant string sharing, 2 is ZIP. The compression level affects the disk size of the runtime image. The highest level results in a smaller image, but with a potential penalty to startup. Another caveat is that the container image itself is compressed when it is pushed or pulled, and data already compressed with ZIP compresses less efficiently than uncompressed data. In other words, if you want to save network bandwidth, use 0; other options may require additional experiments
  • --strip-debug: we don't need debug information, and this option reduces the size of the runtime image by approx. 20%
  • --module-path: a path to Java modules (jmods), usually $JAVA_HOME/jmods
  • --output: a path to the location where the resulting image will be created
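For a real application, java.base alone is rarely enough. One way to find the module list for --add-modules is the jdeps tool shipped with the JDK, sketched below. The --ignore-missing-deps option assumes JDK 12 or newer; the function name required_modules, the JDEPS override variable, and app.jar are hypothetical. Note that for fat JARs with nested dependencies, the result may be incomplete.

```shell
#!/bin/sh
# required_modules JAR: ask jdeps which JDK modules the JAR depends on,
# printed as a comma-separated list suitable for jlink --add-modules.
required_modules() {
  # JDEPS can be overridden, e.g. JDEPS=echo for a dry run.
  "${JDEPS:-jdeps}" --ignore-missing-deps --print-module-deps "$1"
}

# Usage (app.jar is a placeholder):
#   jlink --add-modules "$(required_modules app.jar)" --output /opt/customjdk
```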

Let’s summarize it all for our Dockerfile:

RUN apk add --no-cache liberica17-lite-jdk-all \
&& jlink \
--add-modules java.base \
--compress=2 \
--no-header-files \
--no-man-pages \
--strip-debug \
--module-path $JAVA_HOME/jmods \
--vm=server \
--output /opt/customjdk

Now, let’s remove our supplementary package liberica17-lite-jdk-all because we don't need it anymore:

&& apk del --no-cache liberica17-lite-jdk-all

The --no-cache option is not really required, but without it, apk will issue a warning that there are no index files.

Next, add necessary environment variables:

ENV JAVA_HOME="/opt/customjdk"
ENV PATH="$JAVA_HOME/bin:$PATH"

Add the default execution command, i.e., when it's run without any parameters. It will show the Java version:

CMD ["java", "-version"]

Below is the resulting Dockerfile.

FROM bellsoft/alpaquita-linux-base:stream-musl
RUN apk add --no-cache liberica17-lite-jdk-all \
 && jlink \
      --add-modules java.base \
      --compress=2 \
      --no-header-files \
      --no-man-pages \
      --strip-debug \
      --module-path $JAVA_HOME/jmods \
      --vm=server \
      --output /opt/customjdk \
 && apk del liberica17-lite-jdk-all
ENV JAVA_HOME="/opt/customjdk"
ENV PATH="$JAVA_HOME/bin:$PATH"
CMD ["java", "-version"]

We can now build the image:

$ docker build . -t customjdk

Finally, let’s run our image:

$ docker run --rm -it customjdk
openjdk version "17.0.12" 2024-07-16 LTS
OpenJDK Runtime Environment (build 17.0.12+10-LTS)
OpenJDK 64-Bit Server VM (build 17.0.12+10-LTS, mixed mode, sharing)

We can check the size of the image with the following command:

$ docker inspect --format '{{.Size}}' customjdk
40300400

The Docker image size is only about 40.3 MB (roughly 38.4 MiB), and its compressed size will be around 20.4MiB. You can play with the --compress option described above to achieve the desired result.
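Since docker inspect reports the size in bytes, a small helper can convert it into both decimal megabytes and binary mebibytes. The function name human_size is hypothetical; the input is the byte count from the output above.

```shell
#!/bin/sh
# human_size BYTES: print the size in decimal megabytes and binary mebibytes.
human_size() {
  awk -v b="$1" 'BEGIN { printf "%.1f MB / %.1f MiB\n", b / 1000000, b / 1048576 }'
}

human_size 40300400
```

For 40300400 bytes, this prints 40.3 MB / 38.4 MiB.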

Of note is that the same image with Minimal VM (--vm minimal) has an uncompressed size of 24.4 MiB and compressed size of 14.93 MiB.

Conclusion

As you can see, there are numerous ways of keeping Docker container images clean and lightweight. The key takeaway — keep unnecessary files out of the image and use the smallest base image possible.

 
