HEALTHCHECK reference
Last reviewed on 2026-05-02
How the HEALTHCHECK instruction works, how to tune its options, and the pitfalls teams hit when running it in production.
The HEALTHCHECK instruction tells Docker how to test that a container is still working. The runtime calls the configured command at a chosen interval and uses its exit code to set the container's health status. That status is exposed in docker ps, docker inspect, and to higher-level orchestrators that consult it.
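To see what Docker actually records, you can read the health status back from docker inspect. The Go sketch below shells out to the docker CLI with a Go template and decodes the result. It assumes docker is on PATH and reads only the Status, FailingStreak, and Log fields of .State.Health. The program name healthstatus is hypothetical.

```go
// healthstatus prints a container's health as recorded by `docker inspect`.
// Sketch only: it shells out to the docker CLI, which must be on PATH.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// health mirrors the subset of .State.Health that this sketch reads.
type health struct {
	Status        string
	FailingStreak int
	Log           []struct {
		ExitCode int
		Output   string
	}
}

// parseHealth decodes the JSON printed by
// `docker inspect --format '{{json .State.Health}}'`.
// A container without a healthcheck prints null, which decodes to the zero value.
func parseHealth(raw []byte) (health, error) {
	var h health
	err := json.Unmarshal(raw, &h)
	return h, err
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: healthstatus CONTAINER")
		os.Exit(1)
	}
	out, err := exec.Command("docker", "inspect", "--format", "{{json .State.Health}}", os.Args[1]).Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "docker inspect failed:", err)
		os.Exit(1)
	}
	h, err := parseHealth(out)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot decode health:", err)
		os.Exit(1)
	}
	if h.Status == "" {
		fmt.Println("no healthcheck configured")
		return
	}
	fmt.Printf("%s (failing streak %d)\n", h.Status, h.FailingStreak)
	// The last log entry is the most recent probe result.
	if n := len(h.Log); n > 0 {
		last := h.Log[n-1]
		fmt.Printf("last probe: exit=%d output=%q\n", last.ExitCode, strings.TrimSpace(last.Output))
	}
}
```

The same status string also appears in the STATUS column of docker ps.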
Syntax
```dockerfile
HEALTHCHECK [OPTIONS] CMD command
HEALTHCHECK NONE
```
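Two forms exist. The first registers a probe. The second explicitly disables a healthcheck inherited from a base image. The CMD keyword here belongs to the HEALTHCHECK syntax and has nothing to do with the separate CMD instruction. The command after it can be written in shell form or exec form (a JSON array), just as with CMD. If a Dockerfile contains more than one HEALTHCHECK, only the last one takes effect.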
Options
| Option | Default | What it controls |
|---|---|---|
| --interval | 30s | Time between probes outside the start period. Each probe starts this long after the previous one finishes. |
| --timeout | 30s | How long a single probe may run before it counts as a failure. |
| --start-period | 0s | Grace window after the container starts. Failures inside it do not count towards --retries. Once a probe succeeds, the container counts as started and later failures do count. |
| --start-interval | 5s | Time between probes during the start period, so a fast-starting container is reported healthy quickly. Requires Docker Engine 25.0 or later. |
| --retries | 3 | Consecutive failed probes required before the container is marked unhealthy. |
Exit-code semantics
- 0 — healthy. The service is responding correctly.
- 1 — unhealthy. The service is not responding correctly. Docker increments the failure count.
- 2 — reserved. Don't use it.
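The container starts in the starting state. The first successful probe moves it to healthy. A run of --retries consecutive failures moves it to unhealthy, and failures inside the start period do not count towards that run. A later successful probe returns an unhealthy container to healthy.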
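You can model these transitions as a small state machine. This is a simplified sketch, not Docker's implementation. It treats each probe as instantaneous and receives "is this probe inside the start period?" as a flag instead of tracking time. It also models the rule from the options table: after the first success, failures count even if the start period has not elapsed.

```go
// A simplified model of HEALTHCHECK status transitions (not Docker's source).
package main

import "fmt"

type Status string

const (
	Starting  Status = "starting"
	Healthy   Status = "healthy"
	Unhealthy Status = "unhealthy"
)

type Health struct {
	Status        Status
	FailingStreak int // consecutive failures that counted
	Retries       int // value of --retries
}

// Observe records one probe result. Exit code 0 means healthy. Any other code
// (including a probe killed by --timeout) is treated as a failure here.
// inStartPeriod reports whether the probe ran inside --start-period.
func (h *Health) Observe(exitCode int, inStartPeriod bool) {
	if exitCode == 0 {
		h.FailingStreak = 0
		h.Status = Healthy
		return
	}
	// Failures are forgiven only while still "starting" and inside the start period.
	if h.Status == Starting && inStartPeriod {
		return
	}
	h.FailingStreak++
	if h.FailingStreak >= h.Retries {
		h.Status = Unhealthy
	}
}

func main() {
	h := &Health{Status: Starting, Retries: 3}
	h.Observe(1, true) // slow boot: not counted
	h.Observe(0, true) // first success: healthy
	fmt.Println(h.Status, h.FailingStreak) // healthy 0
	for i := 0; i < 3; i++ {
		h.Observe(1, false) // three consecutive failures
	}
	fmt.Println(h.Status, h.FailingStreak) // unhealthy 3
}
```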
Worked example: HTTP service
```dockerfile
FROM nginx:1.27-alpine
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --quiet --tries=1 --spider http://127.0.0.1/ || exit 1
```
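This probe asks the in-container nginx to serve its index page. wget --spider checks that the URL responds without saving the response to disk. The appended || exit 1 maps any wget failure to exit code 1. wget has its own non-zero exit codes (GNU wget, for example, returns 4 for a network failure and 8 for an HTTP error response), while Docker defines only 0 and 1 and reserves 2. --tries=1 matters too. GNU wget otherwise retries transient errors up to 20 times, which can stretch a failing probe past --timeout.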
Worked example: a service with a dedicated /healthz
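```dockerfile
# Final stage only: assumes an earlier build stage named "builder" that
# produces /out/server and a statically linked /usr/bin/healthcheck.
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /out/server /server
COPY --from=builder /usr/bin/healthcheck /healthcheck
HEALTHCHECK --interval=10s --timeout=2s --start-period=15s --retries=3 \
  CMD ["/healthcheck", "--addr=127.0.0.1:8080", "--path=/healthz"]
USER nonroot
EXPOSE 8080
ENTRYPOINT ["/server"]
```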
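A distroless image has no shell, so the probe must be a single executable invoked with the exec form (the JSON-array form). Many teams ship a tiny healthcheck binary alongside the application for exactly this reason. The binary should enforce its own timeout, shorter than the HEALTHCHECK --timeout. It should exit 1 on any error and keep its output brief. Docker stores the first 4096 bytes of each probe's stdout and stderr in the health log shown by docker inspect.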
Tuning interval, timeout, and start period
The defaults are usable but rarely optimal. A useful set of decision criteria:
- Interval: short enough to catch a stuck process quickly. 10s–30s is a typical range. Lower values waste work on probes; higher values mean slower detection.
- Timeout: shorter than interval. Docker runs one probe at a time and starts the next one interval after the previous one finishes. A probe that hangs until its timeout therefore delays every later check. Set timeout to the worst-case latency of your /healthz handler with margin (often 2s–5s).
- Start period: long enough to cover cold-start work such as JVM warm-up, DB pool initialisation, and cache priming. Probes inside this window may fail without counting towards --retries, which avoids tearing down a perfectly healthy container that simply hasn't finished booting.
- Retries: keep at 3 unless you have a good reason. A higher number masks real failures; a lower number makes transient blips fatal.
Common pitfalls
- Probing through the public network. A healthcheck is checking this container, not the world. Always probe 127.0.0.1 or a Unix socket, never a public hostname or load balancer.
- Probes that are heavier than the request they protect. A probe that touches a database, runs a query, and warms a cache will fail under load, exactly when you don't want it to. Keep /healthz cheap; put dependency checks behind a separate /readyz if you need them.
- Distroless images without a shell. The shell form of CMD runs through /bin/sh -c, and distroless images have no shell. Use the JSON-array exec form, or pick a base image that includes a shell.
- No --start-period on slow-booting services. Many JVM and Python services need 10–30 seconds to start. Without a start period, the first --retries failures during boot mark the container unhealthy, and an orchestrator may kill it before it ever finishes starting.
- Inheriting a base image's HEALTHCHECK by accident. If your base image declares one and yours does not, you inherit it. Use HEALTHCHECK NONE to disable it, or set your own.
Healthcheck and orchestrators
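Most orchestrators have their own probe model. Kubernetes ignores the image's HEALTHCHECK and relies on liveness, readiness, and startup probes configured on each container in the Pod spec. Docker Swarm honours HEALTHCHECK directly and replaces tasks that become unhealthy. Amazon ECS does not act on a HEALTHCHECK embedded in the image; it uses the health check defined in the task definition's container definition instead. Nomad defines health through its own service checks. As a rule of thumb: include a HEALTHCHECK for portability and for local docker compose use, but assume the orchestrator's own probes are the source of truth in production.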
Related
- CMD reference — the difference between shell form and exec form, which applies here too.
- ENTRYPOINT reference — how the main process is launched.
- FROM reference — picking a base image that includes a shell when your healthcheck needs one.
- Security best practices for Docker images — minimal base images and probe shape.