Docker Two-Stage Builds

Mar 7

“Let me recommend the best medicine in the world: a long journey, at a mild season, through a pleasant country, in easy stages.” -- James Madison

Phases of a lunar eclipse — Photo by Ganapathy Kumar on Unsplash

I struggled to understand Docker multistage builds for longer than I’d like to admit. They have a steep learning curve, most of my projects have long chains of dependencies, and I fought with lengthy build times. I hope this article spares you some of that pain. We begin with a tutorial on the humble two-stage build.

The source code lives on my GitHub: https://github.com/mday299/keypuncher/tree/main/Docker/multistage/2Stage

Introduction

Used correctly, Docker multistage builds can save you a lot of pain in the long term, but people commonly make mistakes such as:

Copying Unnecessary Files: Copying the entire file system between stages instead of specific files can lead to larger image sizes and longer build times. Be selective about what you copy!

Overlooking Multistage Build Benefits: Not taking full advantage of what multistage builds offer, for example shipping the whole compiler toolchain in the final image instead of only the compiled artifact, which costs both build time and image size.

Poor Dependency Management: Not managing dependencies correctly can lead to bloated images and longer build times. Pay attention to which dependencies are included in each stage and ensure that only necessary ones are carried over.

Coding

Sample code for this section lives on my GitHub: https://github.com/mday299/keypuncher/tree/main/Docker/multistage/2Stage

Simple: allows root access

We’ll start with a simple 2-stage build for a C++ application. The two stages are build and production.

First, open an editor or IDE (integrated development environment) of your choice and add a Dockerfile.simple file:

# Stage 1: Build
FROM gcc:14 AS build

# Set working directory
WORKDIR /app

# Copy source files to the container
COPY main.cpp .

# Compile the program
RUN g++ -o myapp main.cpp

# Stage 2: Production
FROM ubuntu:24.04 AS production

# Set working directory
WORKDIR /app

# Copy the binary from the build stage
COPY --from=build /app/myapp .

Dockerfile.simple summary:

Build Stage:

Uses the gcc:14 image.
Sets the working directory to /app.
Copies the main.cpp source file to the container.
Compiles the program with g++ and generates a binary named myapp.

Production Stage:

Uses the ubuntu:24.04 image.
Sets the working directory to /app.
Copies the compiled binary myapp from the build stage into the production stage.

Then add main.cpp as follows:

#include <iostream>
#include <fstream>
#include <string>  // std::string is used below; include it explicitly

int main() {
    std::ofstream outFile("output.txt");

    if (!outFile) {
        std::cerr << "Error opening file for writing" << std::endl;
        return 1;
    }

    std::string content = "Hello, World!\n";
    outFile << content;

    outFile.close();
    std::cout << "File written successfully" << std::endl;

    return 0;
}

This C++ program does the following:

Includes the necessary headers for input/output operations: <iostream> and <fstream>.
In the main() function:
- Creates an ofstream object named outFile to write to a file named output.txt.
- Checks if the file was successfully opened. If not, it prints an error message and returns 1.
- Writes the string "Hello, World!\n" to the file.
- Closes the file.
- Prints a success message to the console.

To build the image, run:

docker build -f Dockerfile.simple -t mycppapp .

To view your image enter:

docker images

and you should see something like the following:

REPOSITORY   TAG       IMAGE ID       CREATED        SIZE
mycppapp     latest    2d5de5618e30   20 hours ago   78.2MB

Now start a container from the image with:

docker run -it mycppapp /bin/bash

This should drop you into a prompt inside the container. You can verify the app is already built inside the container by listing the files:

ls

Which should show:

myapp

You can run the app with:

./myapp

This should result in an output file which can be seen by listing the files in the directory again:

ls
myapp output.txt

Now, there are 2 stages to this build. It’s easier to see them if we build each one separately with --target:

docker build --target build -f Dockerfile.simple -t mycppapp-build .
docker build --target production -f Dockerfile.simple -t mycppapp-prod .

Listing the Docker images should now show 3 images:

docker images
REPOSITORY       TAG       IMAGE ID       CREATED      SIZE
mycppapp-prod    latest    2d5de5618e30   3 days ago   78.2MB
mycppapp         latest    2d5de5618e30   3 days ago   78.2MB
mycppapp-build   latest    98ea7128baec   3 days ago   1.42GB

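Notice that mycppapp and mycppapp-prod share the same image ID: production is the last stage, so building without --target produces exactly that image. The build stage, which carries the whole GCC toolchain, weighs in at 1.42GB, while the production image is only 78.2MB.

And huzzah! You’ve just done a simple 2-stage build for a C++ application.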
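Dockerfile.simple never tells Docker to run myapp, so the image inherits the default command of its base image. As a minimal sketch (an assumed variation on the file above, not something the tutorial requires), the production stage could end with a CMD instruction so the container runs the binary by default:

```dockerfile
# Sketch: Dockerfile.simple plus a default command (assumed variation)

# Stage 1: Build
FROM gcc:14 AS build
WORKDIR /app
COPY main.cpp .
RUN g++ -o myapp main.cpp

# Stage 2: Production
FROM ubuntu:24.04 AS production
WORKDIR /app
COPY --from=build /app/myapp .

# Run the binary by default, relative to WORKDIR.
# Arguments given to `docker run` still override this.
CMD ["./myapp"]
```

With this in place, docker run --rm mycppapp should run the program directly, while docker run -it mycppapp /bin/bash should still open a shell as before.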

Advanced: nonRoot

Now we’ll get a bit more complicated by running everything as a non-root user instead of the default root user, which can enhance security if you do it right. Please note: these containers have not been vetted by any IT department and are provided as non-secure examples!

The Dockerfile.nonRoot is as follows:

# Stage 1: Build
FROM gcc:14 AS build

# Create a non-root user and group
RUN groupadd -r myuser && useradd -r -g myuser myuser

# Set working directory
WORKDIR /app

# Change ownership of the working directory
RUN chown myuser:myuser /app

# Switch to the non-root user
USER myuser

# Copy source files to the container
COPY main.cpp .

# Compile the program
RUN g++ -o myapp main.cpp

# Stage 2: Production
FROM ubuntu:24.04

# Create a non-root user and group
RUN groupadd -r myuser && useradd -r -g myuser myuser

# Set working directory
WORKDIR /app

# Change ownership of the working directory
RUN chown myuser:myuser /app

# Switch to the non-root user
USER myuser

# Copy the binary from the build stage
COPY --from=build /app/myapp .

Dockerfile.nonRoot summary:

Build Stage:
- Uses the gcc:14 image.
- Creates a non-root user myuser and group.
- Sets the working directory to /app and changes its ownership to myuser.
- Switches to the non-root user myuser.
- Copies the main.cpp source file to the container.
- Compiles the program with g++ and generates a binary named myapp.

Production Stage:
- Uses the ubuntu:24.04 image.
- Creates a non-root user myuser and group.
- Sets the working directory to /app and changes its ownership to myuser.
- Switches to the non-root user myuser.
- Copies the compiled binary myapp from the build stage into the production stage.

The main.cpp file remains unchanged.

Now you can build this Dockerfile:

docker build -f Dockerfile.nonRoot -t mycppapp-nr .

and then you should be left with 4 images:

docker images
REPOSITORY       TAG       IMAGE ID       CREATED      SIZE
mycppapp-nr      latest    5dabe85d2e8f   3 days ago   78.2MB
mycppapp-prod    latest    2d5de5618e30   3 days ago   78.2MB
mycppapp         latest    2d5de5618e30   3 days ago   78.2MB
mycppapp-build   latest    98ea7128baec   3 days ago   1.42GB

To run the image:

docker run -it mycppapp-nr /bin/bash

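The run process should be the same as for the simple Dockerfile above, except that you are now myuser rather than root (running whoami inside the container should confirm this).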

Redundancies

You may have noticed some redundancy in the nonRoot example Dockerfile. Docker doesn’t have the notion of C-style includes (see Include directive) per se, but it is possible to achieve roughly the same end with Dockerfile ARGs https://docs.docker.com/reference/dockerfile/#arg. That is covered at https://forums.docker.com/t/use-arg-in-from-in-multistage-build/124610 and https://docs.docker.com/reference/dockerfile/#impact-on-build-caching but personally I haven’t yet had much need to use ARG in this way.
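For the curious, here is a rough sketch of what that could look like for Dockerfile.nonRoot. The argument names APP_USER, GCC_TAG, and UBUNTU_TAG are made up for illustration. An ARG declared before the first FROM is only visible to FROM lines, so each stage that needs the value has to re-declare it without a default:

```dockerfile
# Global ARGs (hypothetical names): usable in FROM lines,
# and re-declared inside each stage that needs them.
ARG GCC_TAG=14
ARG UBUNTU_TAG=24.04
ARG APP_USER=myuser

# Stage 1: Build
FROM gcc:${GCC_TAG} AS build
ARG APP_USER
RUN groupadd -r "$APP_USER" && useradd -r -g "$APP_USER" "$APP_USER"
WORKDIR /app
RUN chown "$APP_USER:$APP_USER" /app
USER $APP_USER
COPY main.cpp .
RUN g++ -o myapp main.cpp

# Stage 2: Production
FROM ubuntu:${UBUNTU_TAG} AS production
ARG APP_USER
RUN groupadd -r "$APP_USER" && useradd -r -g "$APP_USER" "$APP_USER"
WORKDIR /app
RUN chown "$APP_USER:$APP_USER" /app
USER $APP_USER
COPY --from=build /app/myapp .
```

This only removes the repeated literal values, not the repeated instructions, which is part of why it may not be worth the extra indirection in a file this small. The defaults can be overridden at build time with --build-arg, for example --build-arg APP_USER=builder.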

Cleaning Up:

To remove all containers (running containers must be stopped first):

docker rm $(docker ps -a -q)

To remove all images:

docker rmi -f $(docker images -q)

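Go nuclear (this removes all stopped containers, unused networks, dangling images, and unused build cache; add -a to also remove every image not used by a container):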

docker system prune

Feedback

As always, do make a comment or write me an email if you have something to say about this post!

Details For Interested Parties

Docker multistage builds are a powerful feature, but they can present some challenges. A few reasons why they might be difficult to work with:

Debugging: Debugging issues can be more challenging because you need to understand the interactions between different stages and ensure that artifacts are correctly passed from one stage to another.
Learning Curve: There's a learning curve associated with understanding how to effectively use multistage builds, including knowing how to name and reference stages correctly.
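Build Time: Depending on the complexity of the build process, multistage builds can sometimes increase the overall build time, since every stage the final image depends on has to be built. BuildKit, the default builder in current Docker releases, softens this by building independent stages in parallel and skipping stages the target does not need.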
Dependency Management: Ensuring that all necessary dependencies are included in the final image while excluding unnecessary ones requires careful planning.
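One way to make the debugging point less painful is a throwaway stage that only inspects what an earlier stage produced. The sketch below is an assumption of mine rather than part of the tutorial files; the stage name inspect is hypothetical:

```dockerfile
# Stage 1: Build (same as Dockerfile.simple)
FROM gcc:14 AS build
WORKDIR /app
COPY main.cpp .
RUN g++ -o myapp main.cpp

# Hypothetical inspection stage: starts from the build stage and
# lists the artifact plus the shared libraries it links against.
FROM build AS inspect
RUN ls -l /app && ldd /app/myapp

# Stage 2: Production (unchanged)
FROM ubuntu:24.04 AS production
WORKDIR /app
COPY --from=build /app/myapp .
```

Building with --target inspect and --progress=plain should print the output of the RUN line, so you can check what would be copied before the production stage ever runs. Because production does not depend on inspect, a normal BuildKit build of the final image skips it.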

Some tips to decrease the build time for Docker multistage builds:

Minimize Layers: Instructions such as RUN, COPY, and ADD each create a new layer. Combine related commands to reduce the number of layers. For example, instead of several RUN statements, chain the commands in a single RUN with &&.
Example: https://www.reddit.com/r/docker/comments/wx2pss/do_dockerfile_run_command_still_need_at_end_to/
Use Cached Layers: Take advantage of Docker’s build cache. Reorder commands so that less frequently changing commands come first. This way, Docker can reuse cached layers and avoid rebuilding the entire image from scratch.
See https://docs.docker.com/build/cache/
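Optimize Base Images: Choose lightweight base images, such as alpine, whenever possible. This reduces the image size and the time spent pulling and pushing it. Note that alpine uses musl rather than glibc, so a binary compiled in a glibc-based image such as gcc:14 may not run there unless it is statically linked or built against musl.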
Exclude Unnecessary Files: Use a .dockerignore file to exclude files and directories that are not needed in the build context. This reduces the amount of data that Docker needs to process.
Optimize COPY Instructions: Place COPY instructions after instructions that are less likely to change, and put the COPY of frequently changing files (such as source code) as low in the Dockerfile as possible so the layers above it stay cached. Combine COPY instructions only when the files they copy change together.
Use Build Stages Efficiently: Only copy necessary artifacts between stages. Avoid copying the entire file system when only specific files or directories are needed.
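Several of the tips above can be seen together in one small sketch. It reuses the tutorial's stages; the cmake package is a hypothetical build dependency added only so there is something to cache:

```dockerfile
# Stage 1: Build
FROM gcc:14 AS build
WORKDIR /app

# Rarely changes, so it goes first and stays cached across source edits.
# One RUN chained with && instead of two separate RUN instructions.
# cmake is a hypothetical dependency, used only for illustration.
# No cleanup here: this stage never ships, so its size does not matter.
RUN apt-get update \
    && apt-get install -y --no-install-recommends cmake

# Changes often, so it goes last: editing main.cpp only reruns from here down.
COPY main.cpp .
RUN g++ -o myapp main.cpp

# Stage 2: Production
FROM ubuntu:24.04 AS production
WORKDIR /app

# Copy only the artifact, not the whole /app directory or the toolchain.
COPY --from=build /app/myapp .
```

After the first build, changing only main.cpp should leave the apt-get layer cached, so the rebuild starts at the COPY line.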

Some common mistakes that people make when working with Docker multistage builds:

Overcomplicating Stages: Using too many stages can make the Dockerfile difficult to read and maintain. Keep the stages as simple and focused as possible.
Ignoring Security Best Practices: Not considering security when building images can lead to vulnerabilities. For example, always use the least privileged user in the final stage.
Neglecting Cleanup: Leaving behind temporary files or build artifacts that are not needed in the final image can increase the image size. Make sure to clean up any unnecessary files.
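As a hedged illustration of the last two points, here is one way the production stage might look when it needs an extra package. The ca-certificates package is a hypothetical runtime dependency chosen for the example; the user name myuser matches the nonRoot example:

```dockerfile
# Stage 1: Build
FROM gcc:14 AS build
WORKDIR /app
COPY main.cpp .
RUN g++ -o myapp main.cpp

# Stage 2: Production
FROM ubuntu:24.04 AS production

# Install, clean up, and create the user in the same RUN, so the
# apt package lists never end up in a layer of the final image.
# ca-certificates is a hypothetical runtime dependency.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/* \
    && groupadd -r myuser && useradd -r -g myuser myuser

WORKDIR /app
RUN chown myuser:myuser /app

# Copied while still root: myuser can execute the binary but not replace it.
COPY --from=build /app/myapp .

# Drop privileges last, so the container runs as the least privileged user.
USER myuser
```

The cleanup only helps if it happens in the same RUN as the install; deleting the files in a later instruction would hide them but leave them in the earlier layer.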