Docker images can support multiple architectures, which means that a single image may contain variants for different architectures, and sometimes for different operating systems, such as Windows.
When running an image with multi-architecture support, docker automatically selects an image variant that matches your OS and architecture.
Most of the official images on Docker Hub provide a variety of architectures. For example, the busybox image supports amd64, arm32v5, arm32v6, arm32v7, arm64v8, i386, ppc64le, and s390x. When running this image on an x86_64 / amd64 machine, the amd64 variant will be pulled and run.
Docker Desktop provides binfmt_misc multi-architecture support, which means you can run containers for different Linux architectures such as arm, mips, ppc64le, and even s390x.
This does not require any special configuration in the container itself, as it uses qemu-static from the Docker for Mac VM. Because of this, you can run a non-native container, such as the arm32v7 or ppc64le variants of the busybox image.
Buildx (Experimental)
Docker is now making it easier than ever to develop containers on, and for, Arm servers and devices. Using the standard Docker tooling and processes, you can start to build, push, pull, and run images seamlessly on different compute architectures. Note that you don’t have to make any changes to Dockerfiles or source code to start building for Arm.
Docker introduces a new CLI command called buildx. You can use the buildx command on Docker Desktop for Mac and Windows to build multi-arch images, link them together with a manifest file, and push them all to a registry using a single command. With the included emulation, you can transparently build more than just native images. Buildx accomplishes this by adding new builder instances based on BuildKit, and leveraging Docker Desktop’s technology stack to run non-native binaries.
For more information about the Buildx CLI command, see Buildx.
Install
Download the latest version of Docker Desktop.
Follow the on-screen instructions to complete the installation process. After you have successfully installed Docker Desktop, you will see the Docker icon in your task tray.
Click About Docker Desktop from the Docker menu and ensure you have installed Docker Desktop version 2.0.4.0 (33772) or higher.
Build and run multi-architecture images
Run the command docker buildx ls to list the existing builders. This displays the default builder, which is our old builder.
Create a new builder which gives access to the new multi-architecture features.
Alternatively, run docker buildx create --name mybuilder --use to create a new builder and switch to it using a single command.
Switch to the new builder and inspect it.
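The builder steps above can be sketched as a small shell helper. This is only a sketch: the function name setup_builder is hypothetical and not part of Docker, and the builder name defaults to the mybuilder used in this text. The docker buildx subcommands it calls (ls, create --name, use, inspect --bootstrap) are existing buildx commands.

```shell
# Hypothetical helper (not part of Docker) wrapping the builder steps.
# Usage: setup_builder [builder-name]
setup_builder() {
  builder_name="${1:-mybuilder}"
  # List the existing builders (initially only the default one).
  docker buildx ls
  # Create a new builder instance with the chosen name.
  docker buildx create --name "$builder_name"
  # Switch to the new builder.
  docker buildx use "$builder_name"
  # Inspect the current builder; --bootstrap starts it first if needed.
  docker buildx inspect --bootstrap
}
```

Calling setup_builder with no argument runs the four steps for a builder named mybuilder.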
Test the workflow to ensure you can build, push, and run multi-architecture images. Create a simple example Dockerfile, build a couple of image variants, and push them to Docker Hub.
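A minimal sketch of that test workflow follows. The directory name multiarch-demo, the Dockerfile contents, and the helper name build_and_push are illustrative assumptions. The three platform identifiers correspond to the AMD 64-bit, Arm 64-bit, and Armv7 targets discussed in this section, and username/demo:latest is the tag used in this text.

```shell
# Create a throwaway build context with a deliberately simple Dockerfile.
# Directory name and Dockerfile contents are illustrative assumptions.
mkdir -p multiarch-demo
cat > multiarch-demo/Dockerfile <<'EOF'
FROM ubuntu
RUN apt-get update && apt-get install -y curl
EOF

# Hypothetical helper: build for three platforms and push in one step.
# Usage: build_and_push username/demo:latest
build_and_push() {
  docker buildx build \
    --platform linux/amd64,linux/arm64,linux/arm/v7 \
    -t "$1" --push multiarch-demo
}
```

Running build_and_push username/demo:latest would then build the three variants and push them under a single tag, assuming you are logged in to Docker Hub.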
Here, username is a valid Docker username.
Notes:
- The --platform flag informs buildx to generate Linux images for AMD 64-bit, Arm 64-bit, and Armv7 architectures.
- The --push flag generates a multi-arch manifest and pushes all the images to Docker Hub.
Inspect the image using the imagetools subcommand, for example docker buildx imagetools inspect username/demo:latest.
The image is now available on Docker Hub with the tag username/demo:latest. You can use this image to run a container on Intel laptops, Amazon EC2 A1 instances, Raspberry Pis, and on other architectures. Docker pulls the correct image for the current architecture, so Raspberry Pis run the 32-bit Arm version and EC2 A1 instances run 64-bit Arm. The SHA tags identify a fully qualified image variant. You can also run images targeted for a different architecture on Docker Desktop.
You can run the images using the SHA tag, and verify the architecture. For example, when you run the following on macOS:
In the above example, uname -m returns aarch64 and armv7l as expected, even when running the commands on a native macOS developer machine.
Series Index
Reducing Image Size
Details Specific To Different Languages
Going Farther To Reduce Image Size
Introduction
When getting started with containers, it’s pretty easy to be shocked by the size of the images that we build. We’re going to review a number of techniques to reduce image size, without sacrificing developers’ and ops’ convenience. In this first part, we will talk about multi-stage builds, because that’s where anyone should start if they want to reduce the size of their images. We will also explain the differences between static and dynamic linking, as well as why we should care about that. This will be the occasion to introduce Alpine.
In the second part, we will see some particularities relevant to various popular languages. We will talk about Go, but also Java, Node, Python, Ruby, and Rust. We will also talk more about Alpine and how to leverage it across the board.
In the third part, we will cover some patterns (and anti-patterns!) relevant to most languages and frameworks, like using common base images, stripping binaries and reducing asset size. We will wrap up with some more exotic or advanced methods like Bazel, Distroless, DockerSlim, or UPX. We will see how some of these will be counter-productive in some scenarios, but might be useful in some particular cases.
Note that the sample code, and all the Dockerfiles mentioned here, are conveniently available in a public GitHub repository, with a Compose file to build all the images and easily compare their sizes.
What we’re trying to solve
I bet that everyone who built their first Docker image that compiled some code was surprised (not in a good way) by the size of that image.
Look at this trivial “hello world” program in C:
We could build it with the following Dockerfile:
… But the resulting image will be more than 1 GB, because it will have the whole gcc image in it!
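For reference, the program and the single-stage Dockerfile discussed above could look like the following sketch. The file name Dockerfile.gcc and the WORKDIR /src line are illustrative choices made here so that paths are explicit; the use of puts matches the function this article mentions later.

```shell
# hello.c: a trivial "hello world" that calls puts().
cat > hello.c <<'EOF'
#include <stdio.h>

int main(void) {
  puts("Hello, world!");
  return 0;
}
EOF

# Single-stage Dockerfile: compiler and binary end up in the same image.
# WORKDIR /src is an illustrative choice, not required by the gcc image.
cat > Dockerfile.gcc <<'EOF'
FROM gcc
WORKDIR /src
COPY hello.c .
RUN gcc -o hello hello.c
CMD ["./hello"]
EOF

# To build and check the size (needs a running Docker engine):
#   docker build -f Dockerfile.gcc -t hello:gcc .
#   docker images hello
```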
If we use e.g. the Ubuntu image, install a C compiler, and build the program, we get a 300 MB image; which looks better, but is still way too much for a binary that, by itself, is less than 20 kB:
Same story with the equivalent Go program:
Building this code with the golang image, the resulting image is 800 MB, even though the hello program is only 2 MB:
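A sketch of the Go variant, under the same caveats: the file name Dockerfile.golang and the WORKDIR /src line are assumptions made for illustration, and the program is simply the usual fmt-based hello world.

```shell
# hello.go: the equivalent program in Go.
cat > hello.go <<'EOF'
package main

import "fmt"

func main() {
	fmt.Println("Hello, world!")
}
EOF

# Single-stage Dockerfile based on the golang image.
# "go build hello.go" writes a binary named "hello" into the working directory.
cat > Dockerfile.golang <<'EOF'
FROM golang
WORKDIR /src
COPY hello.go .
RUN go build hello.go
CMD ["./hello"]
EOF

# To build and check the size (needs a running Docker engine):
#   docker build -f Dockerfile.golang -t hello:golang .
#   docker images hello
```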
There has to be a better way!
Let’s see how to drastically reduce the size of these images. In some cases, we will achieve 99.8% size reduction (but we will see that it’s not always a good idea to go that far).
Pro Tip: To easily compare the size of our images, we are going to use the same image name, but different tags. For instance, our images will be hello:gcc, hello:ubuntu, hello:thisweirdtrick, etc. That way, we can run docker images hello and it will list all the tags for that hello image, with their sizes, without being encumbered with the bazillions of other images that we have on our Docker engine.
Multi-stage builds
This is the first step (and the most drastic) to reduce the size of our images. We need to be careful, though, because if it’s done incorrectly, it can result in images that are harder to operate (or could even be completely broken).
Multi-stage builds come from a simple idea: “I don’t need to include the C or Go compiler and the whole build toolchain in my final application image. I just want to ship the binary!”
We obtain a multi-stage build by adding another FROM line in our Dockerfile. Look at the example below:
We use the gcc image to build our hello.c program. Then, we start a new stage (that we call the “run stage”) using the ubuntu image. We copy the hello binary from the previous stage. The final image is 64 MB instead of 1.1 GB, so that’s about 95% size reduction:
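The multi-stage build just described can be sketched as follows. The stage name mybuildstage, the file name Dockerfile.multistage, and the explicit WORKDIR /src are assumptions made for this illustration; hello.c is assumed to be in the build context.

```shell
cat > Dockerfile.multistage <<'EOF'
# Build stage: contains the compiler and the whole toolchain.
FROM gcc AS mybuildstage
WORKDIR /src
COPY hello.c .
RUN gcc -o hello hello.c

# Run stage: starts from ubuntu and only receives the compiled binary.
FROM ubuntu
COPY --from=mybuildstage /src/hello /hello
CMD ["/hello"]
EOF

# To build and compare sizes (needs a running Docker engine):
#   docker build -f Dockerfile.multistage -t hello:multistage .
#   docker images hello
```

Only the last stage ends up in the final image; the gcc stage is used during the build and then left out.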
Not bad, right? We can do even better. But first, a few tips and warnings.
You don’t have to use the AS keyword when declaring your build stage. When copying files from a previous stage, you can simply indicate the number of that build stage (starting at zero).
In other words, the two lines below are identical:
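As a sketch of that equivalence, the Dockerfile below refers to the build stage by its index. If the first line were FROM gcc AS mybuildstage instead, then COPY --from=mybuildstage /src/hello /hello would copy the same file. The stage name, the file name Dockerfile.numbered, and the /src paths are assumptions made for this example.

```shell
cat > Dockerfile.numbered <<'EOF'
# Stage 0: unnamed build stage.
FROM gcc
WORKDIR /src
COPY hello.c .
RUN gcc -o hello hello.c

# Stage 1: the build stage is referenced by its index, counting from zero.
FROM ubuntu
COPY --from=0 /src/hello /hello
CMD ["/hello"]
EOF
```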
Personally, I think it’s fine to use numbers for build stages in short Dockerfiles (say, 10 lines or less), but as soon as your Dockerfile gets longer (and possibly more complex, with multiple build stages), it’s a good idea to name the stages explicitly. It will help maintenance for your team mates (and also for future you who will review that Dockerfile months later).
Warning: use classic images
I strongly recommend that you stick to classic images for your “run” stage. By “classic”, I mean something like CentOS, Debian, Fedora, Ubuntu; something familiar. You might have heard about Alpine and be tempted to use it. Do not! At least, not yet. We will talk about Alpine later, and we will explain why we need to be careful with it.
Warning: COPY --from uses absolute paths
When copying files from a previous stage, paths are interpreted as relative to the root of the previous stage.
The problem appears as soon as we use a builder image with a WORKDIR, for instance the golang image.
If we try to build this Dockerfile:
We get an error similar to the following one:
This is because the COPY command tries to copy /hello, but since the WORKDIR in golang is /go, the program path is really /go/hello.
If we are using official (or very stable) images in our build, it’s probably fine to specify the full absolute path and forget about it.
However, if our build or run images might change in the future, I suggest specifying a WORKDIR in the build image. This will make sure that the files are where we expect them, even if the base image that we use for our build stage changes later.
Following this principle, the Dockerfile to build our Go program will look like this:
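A sketch of such a Dockerfile is shown below. The stage name build, the /src directory, and the file name Dockerfile.golang-multistage are illustrative assumptions; hello.go is assumed to be in the build context.

```shell
cat > Dockerfile.golang-multistage <<'EOF'
# Build stage: WORKDIR is set explicitly, so the binary's path is known.
FROM golang AS build
WORKDIR /src
COPY hello.go .
RUN go build hello.go

# Run stage: copy the binary using its full absolute path.
FROM ubuntu
COPY --from=build /src/hello /hello
CMD ["/hello"]
EOF

# To build (needs a running Docker engine):
#   docker build -f Dockerfile.golang-multistage -t hello:golang-multistage .
```

Because the path /src/hello is fixed by our own WORKDIR, the COPY line keeps working even if the golang image changes its default working directory.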
If you’re wondering about the efficiency of multi-stage builds for Golang, well, they let us go (no pun intended) from an 800 MB image down to a 66 MB one:
Using FROM scratch
Back to our “Hello World” program. The C version is 16 kB, the Go version is 2 MB. Can we get an image of that size?
Can we build an image with just our binary and nothing else?
Yes! All we have to do is use a multi-stage build, and pick scratch as our run image. scratch is a virtual image. You can’t pull it or run it, because it’s completely empty. This is why if a Dockerfile starts with FROM scratch, it means that we’re building from scratch, without using any pre-existing ingredient.
This gives us the following Dockerfile:
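A possible version, sketched under the same assumptions as before (stage name build, explicit /src working directory, hello.go in the build context; the file name Dockerfile.scratch is also made up for this example):

```shell
cat > Dockerfile.scratch <<'EOF'
# Build stage: compile the Go program.
FROM golang AS build
WORKDIR /src
COPY hello.go .
RUN go build hello.go

# Run stage: start from the empty image and add only the binary.
FROM scratch
COPY --from=build /src/hello /hello
# JSON (exec) syntax, because scratch has no shell to interpret a string.
CMD ["/hello"]
EOF

# To build (needs a running Docker engine):
#   docker build -f Dockerfile.scratch -t hello:scratch .
```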
If we build that image, its size is exactly the size of the binary (2 MB), and it works!
There are, however, a few things to keep in mind when using scratch as a base.
No shell
The scratch image doesn’t have a shell. This means that we cannot use the string syntax with CMD (or RUN, for that matter). Consider the following Dockerfile:
If we try to docker run the resulting image, we get the following error message:
It’s not presented in a very clear way, but the core information is here: /bin/sh is missing from the image.
This happens because when we use the string syntax with CMD or RUN, the argument gets passed to /bin/sh. This means that our CMD ./hello above will execute /bin/sh -c './hello', and since we don’t have /bin/sh in the scratch image, this fails.
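The workaround is simple: use the JSON syntax in the Dockerfile. CMD ./hello becomes CMD ["./hello"]. (The double quotes matter: JSON does not accept single quotes, and an array that isn’t valid JSON is treated as the string syntax again.) When Docker detects the JSON syntax, it runs the arguments directly, without a shell.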
No debugging tools
The scratch image is, by definition, empty; so it doesn’t have anything to help us troubleshoot the container. No shell (as we said in the previous paragraph) but also no ls, ps, ping, and so on and so forth. This means that we won’t be able to enter the container (with docker exec or kubectl exec) to look into it.
(Note that strictly speaking, there are some methods to troubleshoot our container anyway. We can use docker cp to get files out of the container; we can use docker run --net container: to interact with the network stack; and a low-level tool like nsenter can be very powerful. Recent versions of Kubernetes have the concept of ephemeral container, but it’s still in alpha. And let’s keep in mind that all these techniques will definitely make our lives more complicated, especially when we have so much to deal with already!)
One workaround here is to use an image like busybox or alpine instead of scratch. Granted, they’re bigger (respectively 1.2 MB and 5.5 MB), but in the grand scheme of things, it’s a small price to pay if we compare it to the hundreds of megabytes, or the gigabytes, of our original image.
No libc
That one is trickier to troubleshoot. Our simple “hello world” in Go worked fine, but if we try to put a C program in the scratch image, or a more complex Go program (for instance, anything using network packages), we will get the following error message:
Some file seems to be missing. But it doesn’t tell us which file is missing exactly.
The missing file is a dynamic library that is necessary for our program to run.
What’s a dynamic library and why do we need it?
After a program is compiled, it gets linked with the libraries that it is using. (As simple as it is, our “hello world” program is still using libraries; that’s where the puts function comes from.) A long time ago (before the 90s), we used mostly static linking, meaning that all the libraries used by a program would be included in the binary. This is perfect when software is executed from a floppy disk or a cartridge, or when there is simply no standard library. However, on a timesharing system like Linux, we run many concurrent programs that are stored on a hard disk; and these programs almost always use the standard C library.
In that scenario, it gets more advantageous to use dynamic linking. With dynamic linking, the final binary doesn’t contain the code of all the libraries that it uses. Instead, it contains references to these libraries, like “this program needs functions cos and sin and tan from libtrigonometry.so”. When the program is executed, the system looks for that libtrigonometry.so and loads it alongside the program so that the program can call these functions.
Dynamic linking has multiple advantages.
- It saves disk space, since common libraries don’t have to be duplicated anymore.
- It saves memory, since these libraries can be loaded once from disk, and then shared between multiple programs using them.
- It makes maintenance easier, because when a library is updated, we don’t need to recompile all the programs using that library.
(If we want to be thorough, memory savings aren’t a result of dynamic libraries but rather of shared libraries. That being said, the two generally go together. Did you know that on Linux, dynamic library files typically have the extension .so, which stands for shared object? On Windows, it’s .DLL, which stands for Dynamic-link library.)
Back to our story: by default, C programs are dynamically linked. This is also the case for Go programs that are using some packages. Our specific program uses the standard C library, which on recent Linux systems will be in file libc.so.6. So to run, our program needs that file to be present in the container image. And if we’re using scratch, that file is obviously absent. This is the same if we use busybox or alpine, because busybox doesn’t contain a standard library, and alpine is using another one, that is incompatible. We’ll say more about that later.
How do we solve this? There are at least 3 options.
Building a static binary
We can tell our toolchain to make a static binary. There are various ways to achieve that (depending on how we build our program in the first place), but if we’re using gcc, all we have to do is add -static to the command line:
The resulting binary is now 760 kB (on my system) instead of 16 kB. Of course, we’re embedding the library in the binary, so it’s much bigger. But that binary will now run correctly in the scratch image.
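Put together, a static build targeting scratch might look like the sketch below. The stage name mybuildstage, the /src directory, and the file name Dockerfile.static are assumptions made for illustration, and hello.c is assumed to be in the build context; the only essential change from the earlier multi-stage idea is the -static flag and the scratch run stage.

```shell
cat > Dockerfile.static <<'EOF'
# Build stage: -static embeds the C library in the binary.
FROM gcc AS mybuildstage
WORKDIR /src
COPY hello.c .
RUN gcc -o hello hello.c -static

# Run stage: no libc is needed, since the binary is self-contained.
FROM scratch
COPY --from=mybuildstage /src/hello /hello
CMD ["/hello"]
EOF

# To build and compare sizes (needs a running Docker engine):
#   docker build -f Dockerfile.static -t hello:static .
#   docker images hello
```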
We can get an even smaller image if we build a static binary with Alpine. The result is less than 100 kB!
Adding the libraries to our image
We can find out which libraries our program needs with the ldd tool:
We can see the libraries needed by the program, and the exact path where each of them was found by the system.
In the example above, the only “real” library is libc.so.6. linux-vdso.so.1 is related to a mechanism called VDSO (virtual dynamic shared object), which accelerates some system calls. Let’s pretend it’s not there. As for ld-linux-x86-64.so.2, it’s actually the dynamic linker itself. (Technically, our hello binary contains information saying, “hey, this is a dynamic program, and the thing that knows how to put all its parts together is ld-linux-x86-64.so.2”.)
If we were so inclined, we could manually add all the files listed above by ldd to our image. It would be fairly tedious, and difficult to maintain, especially for programs with lots of dependencies. For our little hello world program this would work fine. But for a more complex program, for instance something using DNS, we would run into another issue. The GNU C library (used on most Linux systems) implements DNS (and a few other things) through a fairly complex mechanism called the Name Service Switch (NSS in short). This mechanism needs a configuration file, /etc/nsswitch.conf, and additional libraries. But these libraries don’t show up with ldd, because they are loaded later, when the program is running. If we want DNS resolution to work correctly, we still need to include them! (These libraries are typically found at /lib64/libnss_*.)
I personally can’t recommend going that route, because it is quite arcane, difficult to maintain, and it might easily break in the future.
Using busybox:glibc
There is an image designed specifically to solve all these issues: busybox:glibc. It is a small image (5 MB) using busybox (so providing a lot of useful tools for troubleshooting and operations) and providing the GNU C library (or glibc). That image contains precisely all these pesky files that we were mentioning earlier. This is what we should use if we want to run a dynamic binary in a small image.
Keep in mind, however, that if our program uses additional libraries, these libraries will need to be copied as well.
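As a sketch, running the dynamically linked hello binary on busybox:glibc could look like this. The stage name, the /src directory, and the file name Dockerfile.busybox-glibc are assumptions made for the example, and it also assumes that the glibc shipped in the run image is recent enough for a binary built in the gcc image.

```shell
cat > Dockerfile.busybox-glibc <<'EOF'
# Build stage: ordinary (dynamically linked) build.
FROM gcc AS mybuildstage
WORKDIR /src
COPY hello.c .
RUN gcc -o hello hello.c

# Run stage: busybox tools plus the GNU C library.
FROM busybox:glibc
COPY --from=mybuildstage /src/hello /hello
CMD ["/hello"]
EOF

# To build (needs a running Docker engine):
#   docker build -f Dockerfile.busybox-glibc -t hello:busybox-glibc .
```

Any library beyond libc that the program links against would need its own COPY line in the run stage.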
Recap and (partial) conclusion
Let’s see how we did for our “hello world” program in C. Spoiler alert: this list includes results obtained by leveraging Alpine in ways that will be described in the next part of this series.
- Original image built with gcc: 1.14 GB
- Multi-stage build with gcc and ubuntu: 64.2 MB
- Static glibc binary in alpine: 6.5 MB
- Dynamic binary in alpine: 5.6 MB
- Static binary in scratch: 940 kB
- Static musl binary in scratch: 94 kB
That’s a 12000x size reduction, or 99.99% less disk space.
Not bad.
Personally, I wouldn’t go with the scratch images (because troubleshooting them might be, well, trouble) but if that’s what you’re after, they’re here for you!
In the next part, we will mention some aspects specific to the Go language, including cgo and tags. We will also cover other popular languages, and we will talk more about Alpine, because it’s pretty awesome if you ask me.