FinanceHub: A Microservices Journey
FinanceHub (name changed for privacy) is a leading financial technology company providing services to over 5 million users across North America. Their platform consists of 70+ microservices handling everything from payments processing to investment analytics.
As organizations scale their microservices architectures, Docker build pipelines often become bottlenecks in the development process. This case study explores how FinanceHub transformed their Docker build process to dramatically improve developer productivity and reduce infrastructure costs.
The Initial Situation
FinanceHub's engineering team had grown to over 200 developers across 15 teams, each responsible for multiple microservices. Their architecture included services built with various technologies:
- 35 Node.js services
- 20 Java/Spring Boot services
- 8 Python services
- 7 Go services
- Various other specialized services
Each service had its own repository and CI/CD pipeline. The team was deploying to production approximately 50 times per day across all services, with each deployment requiring a Docker build and push to their container registry.
The Breaking Point
The engineering leadership realized they had a problem when their CI/CD costs reached $28,000 per month, with Docker builds accounting for nearly 60% of build minutes. Many developers complained about waiting 15-20 minutes for CI/CD pipelines to complete for even minor changes.
Assessment and Planning
A dedicated team of 3 DevOps engineers and 2 senior developers conducted a thorough assessment of their Docker build processes. They discovered numerous inefficiencies, including unoptimized Dockerfiles, lack of caching, and oversized base images.
Implementation Phase
The team implemented a series of Docker build optimizations across all services, starting with the most frequently updated ones. They created standardized Dockerfile templates for each technology stack and updated CI/CD pipelines to leverage BuildKit and remote caching.
Results and Ongoing Improvements
By May, all 70+ services had been migrated to the optimized Docker build process. The team continued to refine their approach, implementing automation to ensure all new services followed best practices.
Key Challenges and Solutions
Challenge #1: Slow Build Times
Most services had build times of 8-15 minutes in CI, with an average of 12 minutes. This resulted in developers waiting for feedback and delayed deployments, especially for urgent fixes.
Analysis revealed common issues (see the illustrative anti-pattern after this list):
- Poor layer ordering causing unnecessary rebuilds
- No caching between builds in CI/CD
- Full rebuilds for minor code changes
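Many of the original Dockerfiles followed a pattern like the sketch below (reconstructed for illustration, not an actual FinanceHub file). Because the entire source tree is copied before dependencies are installed, any code change invalidates the cached dependency layer and forces a full reinstall:

```dockerfile
# Illustrative anti-pattern: COPY . . comes before dependency installation,
# so every source change forces a full npm install on rebuild
FROM node:18
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
CMD ["node", "dist/main.js"]
```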
Solution: Optimized Dockerfile Templates
The team created standardized Dockerfile templates for each technology stack with:
- Optimized layer ordering for better cache utilization
- Multi-stage builds to separate build and runtime dependencies
- BuildKit cache mounts for package manager caches
- Remote cache storage and retrieval in CI/CD
Example for Node.js services:
```dockerfile
# syntax=docker/dockerfile:1.4

# Dependency stage: copy only the manifests so this layer is invalidated
# only when dependencies change; the BuildKit cache mount preserves the
# npm cache across builds
FROM node:18-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci

# Build stage: compile the application (assumes a .dockerignore that
# excludes node_modules so COPY . . does not clobber the installed deps)
FROM node:18-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build

# Production dependency stage: install runtime dependencies only, so
# devDependencies never reach the final image
FROM node:18-alpine AS prod-deps
WORKDIR /app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev

# Runtime stage: minimal image with just the build output and prod deps
FROM node:18-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/dist ./dist
COPY --from=prod-deps /app/node_modules ./node_modules
USER node
EXPOSE 3000
CMD ["node", "dist/main.js"]
```
Challenge #2: Oversized Images
Container images were unnecessarily large, causing:
- Slower deployments due to large image pulls
- Higher storage costs in container registries
- Increased attack surface with unneeded tools
- Wasted resources in production
The average Node.js service image was 1.2GB, and Java services averaged 850MB.
Solution: Image Size Optimization
The team implemented several image size reduction techniques:
- Strict multi-stage builds with minimal final images
- Alpine-based images where appropriate
- Distroless images for Java services
- Production-only dependencies in final stage
- Removal of development tools and documentation
Example for Java services:
```dockerfile
# syntax=docker/dockerfile:1.4

# Build stage: copy the Gradle wrapper and build scripts first so the
# dependency layers survive source-code changes
FROM eclipse-temurin:17-jdk-alpine AS builder
WORKDIR /app
COPY gradle/ gradle/
COPY gradlew build.gradle settings.gradle ./
RUN chmod +x gradlew
# Resolve dependencies before copying sources, with a persistent Gradle cache
RUN --mount=type=cache,target=/root/.gradle \
    ./gradlew --no-daemon dependencies
COPY src/ src/
RUN --mount=type=cache,target=/root/.gradle \
    ./gradlew --no-daemon bootJar

# Runtime stage: distroless image containing only the JRE and the app jar
FROM gcr.io/distroless/java17-debian11
WORKDIR /app
COPY --from=builder /app/build/libs/*.jar app.jar
EXPOSE 8080
USER nonroot
# The distroless Java image's entrypoint already runs `java -jar`,
# so CMD only needs to supply the jar path
CMD ["app.jar"]
```
Challenge #3: CI/CD Inefficiencies
CI/CD pipelines were not optimized for Docker builds:
- No reuse of layer cache between pipeline runs
- Separate build and push steps causing duplication
- BuildKit features not enabled in CI
- High CI minutes consumption (approximately 500,000 minutes/month)
Solution: CI/CD Pipeline Optimization
The team redesigned their CI/CD pipeline approach:
- Implemented BuildKit's remote caching in all pipelines
- Created shared base images for common dependencies
- Added distributed caching to store and retrieve layers
- Combined build and push steps to reduce overhead
Example GitHub Actions workflow excerpt:
```yaml
steps:
  - name: Set up Docker Buildx
    uses: docker/setup-buildx-action@v2

  - name: Login to Container Registry
    uses: docker/login-action@v2
    with:
      registry: ${{ env.REGISTRY }}
      username: ${{ secrets.REGISTRY_USERNAME }}
      password: ${{ secrets.REGISTRY_PASSWORD }}

  - name: Build and push
    uses: docker/build-push-action@v4
    with:
      context: .
      push: true
      tags: ${{ env.IMAGE_NAME }}:${{ env.TAG }}
      cache-from: type=registry,ref=${{ env.IMAGE_NAME }}:cache
      cache-to: type=registry,ref=${{ env.IMAGE_NAME }}:cache,mode=max
```
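Note that mode=max exports all layers to the cache reference, including those from intermediate build stages, so even the dependency and build stages can be restored on a cold runner; the default mode=min caches only the final image's layers.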
Results and Impact
The optimization efforts led to dramatic improvements across all key metrics:
| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Avg. Build Time (CI) | 12 minutes | 3.6 minutes | 70% |
| Avg. Node.js Image Size | 1.2 GB | 154 MB | 87% |
| Avg. Java Image Size | 850 MB | 180 MB | 79% |
| CI Compute Minutes/Month | 500,000 | 150,000 | 70% |
| Deployment Time | 4.5 minutes | 1.8 minutes | 60% |
| Monthly CI Cost | $28,000 | $9,000 | 68% |
These improvements had a profound effect on the development process:
- Developer Productivity: Faster feedback cycles led to more iterations and fewer context switches
- Incident Response: Time to deploy critical fixes reduced by 65%
- Infrastructure Costs: Annual savings of approximately $228,000 in CI costs alone ($28,000 down to $9,000 per month)
- Security Posture: Smaller attack surface with minimal production images
- Deployment Reliability: Faster, more reliable deployments with fewer timeout issues
The contrast between the two pipelines is stark: the original had long build times, many redundant steps, and no layer caching between runs, while the optimized pipeline leverages BuildKit caching, parallel builds, and shared base images for dramatic speed improvements.
Key Learnings and Best Practices
The FinanceHub team identified several key learnings that can be applied to other microservices environments:
Standardize Across Teams
Creating standardized Dockerfile templates for each technology stack ensured consistency and simplified maintenance. The team developed a central repository of Dockerfile templates that teams could easily adapt to their specific services.
This standardization made it easier to implement improvements across all services and onboard new services with optimized builds from day one.
Invest in Shared Base Images
The team created a set of custom base images for each major technology stack that included common dependencies and security configurations. These images were rebuilt weekly with the latest security patches.
This approach reduced duplication, improved security compliance, and further decreased build times by providing optimized starting points for service builds.
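A weekly rebuild can be expressed as a scheduled pipeline. The GitHub Actions sketch below is illustrative, not FinanceHub's actual workflow; the base-image path, registry variables, and tag are placeholders reusing the conventions from the earlier excerpt:

```yaml
name: rebuild-node-base
on:
  schedule:
    - cron: "0 4 * * 1"  # every Monday at 04:00 UTC
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-buildx-action@v2
      - uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ secrets.REGISTRY_USERNAME }}
          password: ${{ secrets.REGISTRY_PASSWORD }}
      - uses: docker/build-push-action@v4
        with:
          context: ./base-images/node  # hypothetical path to the base Dockerfile
          push: true
          tags: ${{ env.REGISTRY }}/base/node:18
```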
Measure Everything
The team implemented comprehensive metrics collection for their Docker build process:
- Build times for each pipeline stage
- Image sizes and layer counts
- Cache hit ratios in CI/CD
- CI minutes consumed per service
These metrics allowed them to identify bottlenecks, prioritize optimizations, and quantify improvements.
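Much of this data can be collected with the Docker CLI alone. A minimal sketch of a CI step that records build duration, image size, and layer count (the IMAGE variable is a placeholder set by the pipeline):

```bash
#!/usr/bin/env sh
# Record build duration, image size, and layer count for a service image.
# IMAGE is assumed to be set by the pipeline,
# e.g. registry.example.com/payments-api:latest
start=$(date +%s)
docker buildx build --load --tag "$IMAGE" .
end=$(date +%s)

size=$(docker image inspect "$IMAGE" --format '{{.Size}}')
layers=$(docker image inspect "$IMAGE" --format '{{len .RootFS.Layers}}')
echo "build_seconds=$((end - start)) image_bytes=${size} layers=${layers}"
```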
Pipeline Architecture Matters
The team discovered that the design of CI/CD pipelines significantly impacted build performance. They restructured their pipelines to:
- Run tests in parallel with Docker builds where possible
- Use ephemeral environments for integration tests
- Implement smart skipping of stages when appropriate
- Optimize for cold starts with distributed caching
These pipeline architecture improvements complemented the Dockerfile optimizations for maximum effect.
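As one example of the parallelism point above, giving tests and the Docker build separate jobs with no dependency between them lets CI run both concurrently. A simplified GitHub Actions sketch (job contents are illustrative):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm ci && npm test
  build:
    # No `needs: test` dependency, so this job runs in parallel with the tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-buildx-action@v2
      - uses: docker/build-push-action@v4
        with:
          context: .
          push: false  # pushing would be gated on tests passing in a later job
```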
Lessons for Other Organizations
Based on FinanceHub's experience, here are key recommendations for organizations looking to optimize their Docker build pipelines for microservices:
- Start with Measurement: Collect baseline metrics before making changes to quantify improvements
- Prioritize High-Impact Services: Begin with the most frequently built services or those with the longest build times
- Standardize for Scale: Create templates and standards that can be applied consistently across all services
- Educate Teams: Ensure all developers understand Docker build best practices through workshops and documentation
- Automate Compliance: Implement CI checks to ensure Dockerfile best practices are followed (see the sketch after this list)
- Consider Total Costs: Factor in both infrastructure costs and developer time when evaluating optimizations
- Iterate and Improve: Continuously monitor metrics and refine your approach based on real-world results
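One common way to implement the automated compliance checks recommended above is a Dockerfile linter such as hadolint run in CI; a minimal sketch using the official hadolint image:

```bash
# Lint a service's Dockerfile with hadolint;
# a non-zero exit code fails the CI job
docker run --rm -i hadolint/hadolint < Dockerfile
```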
"Our Docker build optimization project started as a cost-saving initiative, but quickly became a major productivity win. Developers who previously had to wait 15 minutes for feedback now get it in under 4 minutes. That translates to more iterations, better code quality, and happier engineers."
— VP of Engineering, FinanceHub
Conclusion
FinanceHub's journey to optimize their Docker build pipelines demonstrates that with careful analysis and implementation of best practices, organizations can achieve dramatic improvements in build performance, image size, and costs.
The key to their success was a holistic approach that addressed:
- Dockerfile structure and optimization
- CI/CD pipeline architecture
- Caching strategies at multiple levels
- Standardization across teams and services
- Continuous measurement and improvement
For organizations with growing microservices architectures, investing in Docker build optimization can yield significant returns in terms of both direct costs and developer productivity. The principles and techniques demonstrated in this case study can be adapted to microservices environments of any size.