Multi-stage Docker Builds
Introduction to Multi-stage Builds
Multi-stage builds are a powerful feature in Docker that allows you to create more efficient and smaller images by separating your build environment from your runtime environment. This approach was introduced in Docker 17.05 and has since become a best practice for creating production-ready Docker images.
With multi-stage builds, you can:
- Create significantly smaller images by only including what's necessary to run your application
- Improve security by excluding build tools and dependencies from your final image
- Simplify your CI/CD pipeline by encapsulating the build process in your Dockerfile
- Maintain clear separation between development and production environments
The Problem with Single-stage Builds
Before multi-stage builds, creating efficient Docker images often required two separate Dockerfiles or complex shell scripts to build and package applications. Let's examine why single-stage builds can be problematic:
- Single-stage build (920MB): all build tools and dependencies remain in the final image, even though they're not needed at runtime.
- Multi-stage build (145MB): only the production artifacts are copied to the final image, resulting in a much smaller size.
Consider a typical single-stage Dockerfile for a Node.js application:
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]
This Dockerfile results in a large image that contains:
- The Node.js runtime
- All build tools
- All dependencies (including dev dependencies)
- Source code
- Build artifacts
This approach is inefficient and can lead to security vulnerabilities by exposing unnecessary software in production.
How Multi-stage Builds Work
Multi-stage builds solve these problems by using multiple FROM statements in your Dockerfile. Each FROM instruction begins a new stage of the build, and you can selectively copy artifacts from one stage to another, leaving behind everything you don't need in the final image.
Here's the basic pattern of a multi-stage Dockerfile:
# Build stage
FROM [base image for building] AS build
WORKDIR /app
COPY [source code]
RUN [build commands]
# Final stage
FROM [minimal runtime base image]
WORKDIR /app
COPY --from=build [build artifacts] [destination]
CMD ["run", "command"]
The key feature here is the COPY --from=build instruction, which copies files from a previous build stage. Everything else from the build stage is left out of the final image.
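Stages do not have to be named. As a minimal sketch, assuming a Go module in the build context (the image tags and paths here are illustrative, not taken from the examples in this tutorial), an unnamed stage can be referenced by its zero-based index:

```dockerfile
# Stage 0: unnamed build stage (assumes go.mod and the source are in the build context)
FROM golang:1.20
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app

# Stage 1: final image
FROM alpine:3.18
# --from=0 refers to the first FROM in this file
COPY --from=0 /out/app /usr/local/bin/app
CMD ["/usr/local/bin/app"]
```

Index references break silently when stages are reordered or inserted, which is one reason named stages are usually the safer choice.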
Example: Multi-stage Build for a Node.js App
Let's convert our earlier single-stage Node.js Dockerfile to a multi-stage build:
# Build stage
FROM node:18 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# Production stage
FROM node:18-alpine
WORKDIR /app
COPY --from=build /app/dist /app/dist
COPY --from=build /app/package*.json ./
RUN npm install --omit=dev
EXPOSE 3000
CMD ["npm", "start"]
In this multi-stage Dockerfile:
- We use the full Node.js image for building our application
- We install all dependencies (including dev dependencies) and build the app
- We start a new stage with a smaller Alpine-based Node.js image
- We copy only the build artifacts (dist directory) and package.json files
- We install only production dependencies in the final image
The result is a much smaller image that only contains what's needed to run the application in production.
Example: Go Application with Multi-stage Builds
Go applications benefit especially from multi-stage builds because the compiled binary doesn't require the Go toolchain at runtime. Let's see an example:
# Build stage
FROM golang:1.20 AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /go/bin/app
# Final stage
FROM scratch
COPY --from=build /go/bin/app /app
EXPOSE 8080
CMD ["/app"]
This example shows the dramatic benefits of multi-stage builds. The golang:1.20 image is over 800MB, but the final image based on scratch (an empty image) is just a few MB because it only contains the compiled Go binary.
The scratch image is the most minimal base image available, containing nothing except the binary we copied from the build stage. This is perfect for compiled languages like Go, but might not be suitable for interpreted languages that require a runtime environment.
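Because scratch is empty, it also lacks files that a binary may expect at runtime, such as CA certificates. The following sketch assumes the application makes outbound HTTPS requests (an assumption, not something stated above) and copies the certificate bundle from the Debian-based golang image alongside the binary:

```dockerfile
# Build stage (same as the example above)
FROM golang:1.20 AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /go/bin/app

# Final stage
FROM scratch
# Assumption: the app needs to verify TLS certificates, so it needs a CA bundle
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=build /go/bin/app /app
EXPOSE 8080
CMD ["/app"]
```

If the application never opens TLS connections, the extra COPY is unnecessary.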
Naming Build Stages
In the examples above, we've used the AS build syntax to name our build stage. This makes it easier to reference the stage when copying files. You can have multiple named stages and copy from any of them:
# Dependencies stage
FROM node:18 AS dependencies
WORKDIR /app
COPY package*.json ./
RUN npm install
# Build stage
FROM node:18 AS build
WORKDIR /app
COPY --from=dependencies /app/node_modules ./node_modules
COPY . .
RUN npm run build
# Production stage
FROM node:18-alpine
WORKDIR /app
COPY --from=dependencies /app/package*.json ./
COPY --from=dependencies /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
EXPOSE 3000
CMD ["npm", "start"]
In this example, we have three stages:
- The dependencies stage installs all dependencies
- The build stage copies the dependencies and builds the application
- The production stage copies the package files and node_modules from the dependencies stage and the dist directory from the build stage
This approach further optimizes the build process by separating concerns and potentially leveraging Docker's caching more effectively.
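One limitation of the three-stage layout above is that the node_modules directory copied from the dependencies stage still contains dev dependencies. A possible variant, sketched here with a hypothetical stage name prod-deps, installs production dependencies in their own stage. That stage uses the same Alpine base as the final image so any native modules are compiled for the right platform:

```dockerfile
# Production dependencies only (prod-deps is an illustrative stage name)
FROM node:18-alpine AS prod-deps
WORKDIR /app
COPY package*.json ./
RUN npm install --omit=dev

# Build stage: needs dev dependencies to run the build
FROM node:18 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Production stage: runtime dependencies plus build output
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
COPY --from=prod-deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
EXPOSE 3000
CMD ["npm", "start"]
```

The trade-off is a second npm install, but the two dependency stages do not depend on each other and are cached separately.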
Advanced Techniques
Using Build Arguments to Control Stages
You can use build arguments to control which stage is used as the final image. This allows you to create different images for development and production from the same Dockerfile:
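# Global build argument: it must be declared before the first FROM to be usable in FROM lines
ARG TARGET=production
# Build stage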
FROM node:18 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# Development stage
FROM node:18 AS development
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["npm", "run", "dev"]
# Production stage
FROM node:18-alpine AS production
WORKDIR /app
COPY --from=build /app/dist /app/dist
COPY --from=build /app/package*.json ./
RUN npm install --omit=dev
EXPOSE 3000
CMD ["npm", "start"]
# Final stage: defaults to production (override with --build-arg TARGET=<stage>)
FROM ${TARGET:-production}
With this Dockerfile, you can build different images by specifying the target stage:
# Build production image (default)
docker build -t myapp:production .
# Build development image
docker build --build-arg TARGET=development -t myapp:development .
Alternatively, you can use the --target flag to specify which stage to build up to:
docker build --target development -t myapp:development .
Using External Images for Building
You don't have to use stages only for building. You can also use external images to get specific tools or files for your build process:
# Get a tool from an external image
FROM alpine/git:latest AS git
WORKDIR /app
RUN git clone https://github.com/example/repo.git
# Build stage
FROM node:18 AS build
WORKDIR /app
COPY --from=git /app/repo .
RUN npm install && npm run build
# Production stage
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
This approach lets you use specialized tools for specific tasks without including them in the final image.
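COPY --from also accepts an image reference directly, without declaring a stage for it first. A minimal sketch, with image tags and the destination path chosen only for illustration:

```dockerfile
FROM alpine:3.18
# Copy a single file straight out of a published image; the image is pulled if not present locally
COPY --from=nginx:1.25-alpine /etc/nginx/nginx.conf /reference/nginx.conf
```

This is convenient for grabbing a single binary or config file, while a named stage is the better fit when commands must run first, as with the git clone above.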
Best Practices for Multi-stage Builds
1. Use Specific Base Image Tags
Always use specific version tags for your base images to ensure reproducible builds. Avoid using latest, as it can lead to unexpected changes.
# Good
FROM node:18.16.0-alpine3.17 AS build
# Avoid
FROM node:latest AS build
2. Organize Instructions by Change Frequency
Place instructions that change less frequently at the beginning of each stage to maximize Docker's build cache.
# Dependencies change less frequently than source code
COPY package*.json ./
RUN npm install
# Source code changes more frequently
COPY . .
3. Minimize the Number of Layers in the Final Stage
In the final stage, combine RUN commands when possible to reduce the number of layers.
# Good
RUN apk add --no-cache curl && \
    mkdir -p /app/data
# Avoid
RUN apk add --no-cache curl
RUN mkdir -p /app/data
4. Use Non-root Users in the Final Stage
For security, run your application as a non-root user in the final stage.
# Final stage
FROM node:18-alpine
WORKDIR /app
COPY --from=build /app/dist ./
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
CMD ["node", "server.js"]
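The official Node.js images already ship with an unprivileged user named node, so creating a new account is optional. The sketch below assumes, as in the snippet above, that the build output in /app/dist contains server.js. It uses --chown so the copied files belong to that user:

```dockerfile
# Build stage
FROM node:18 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Final stage
FROM node:18-alpine
WORKDIR /app
# --chown sets file ownership at copy time, avoiding a separate chown layer
COPY --from=build --chown=node:node /app/dist ./
# Run as the built-in unprivileged user from the official image
USER node
CMD ["node", "server.js"]
```

A dedicated user, as in the addgroup/adduser version, is still needed on base images that do not provide one.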
5. Remove Unnecessary Files
Make sure you only copy the files you need from the build stage, and clean up any temporary files or caches in each stage.
RUN npm install && \
    npm run build && \
    npm cache clean --force
Conclusion
Multi-stage builds are a powerful Docker feature that helps create smaller, more efficient, and more secure images. By separating your build environment from your runtime environment, you can ensure that your production images only contain what's necessary to run your application.
The key benefits of multi-stage builds include:
- Smaller image sizes (often 50-90% smaller)
- Improved security by excluding build tools and dependencies
- Simplified build process in a single Dockerfile
- Cleaner separation of concerns
As you've seen from the examples, multi-stage builds are particularly beneficial for compiled languages like Go, but they provide significant advantages for any application type.
Ready to Optimize Your Docker Builds Further?
Check out our tutorial on Optimizing Docker Layers to learn how to make your Docker images even more efficient.