`HEALTHCHECK` reference

Last reviewed on 2026-05-02

How the HEALTHCHECK instruction works, how to tune its options, and the pitfalls teams hit when running it in production.

The HEALTHCHECK instruction tells Docker how to test that a container is still working. The runtime calls the configured command at a chosen interval and uses its exit code to set the container's health status. That status is exposed in docker ps, docker inspect, and to higher-level orchestrators that consult it.

Syntax

HEALTHCHECK [OPTIONS] CMD command
HEALTHCHECK NONE

Two forms exist. The first registers a probe; the second explicitly disables a healthcheck inherited from a base image. The CMD here is unrelated to the CMD instruction — it is part of the healthcheck syntax.

Options

Option	Default	What it controls
`--interval`	`30s`	How often the probe runs outside the start period. Each probe starts one interval after the previous one completes.
`--timeout`	`30s`	How long a single probe is allowed to take before it counts as a failure.
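`--start-period`	`0s`	Grace window during startup. Failures inside this window do not count towards `--retries`, but the first success inside it marks the container as started, after which failures count normally.
`--start-interval`	`5s`	Probe interval during the start period. Lets startup health be detected quickly. Requires Docker Engine 25.0 or later.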
`--retries`	`3`	Consecutive failed probes required before the container is marked unhealthy.

Exit-code semantics

0 — healthy. The service is responding correctly.
1 — unhealthy. The service is not responding correctly. Docker increments the failure count.
2 — reserved. Don't use it.
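
Since only 0 and 1 carry a defined meaning, a probe script can funnel every outcome into one of the two. The following is a minimal sketch, not something Docker ships: `probe_status` is a hypothetical helper name, and it assumes the wrapped command signals failure with any non-zero exit code.

```shell
# probe_status is a hypothetical helper, not part of Docker.
# It runs the given command, discards its output, and collapses the
# result to the two exit codes HEALTHCHECK defines:
#   0 = healthy, 1 = unhealthy (never the reserved 2).
probe_status() {
  if "$@" >/dev/null 2>&1; then
    return 0
  fi
  return 1
}
```

A probe script could call it as `probe_status wget --quiet --tries=1 --spider http://127.0.0.1/` and exit with the returned status.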

The container starts in the starting state. The first successful probe moves it to healthy; `--retries` consecutive failures move it to unhealthy. A later successful probe returns an unhealthy container to healthy.
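
The health state can be read back with docker inspect through the `.State.Health.Status` field. As a rough sketch of how a deploy script might wait for a container to leave the starting state: `wait_healthy` is a hypothetical helper, and the container name, one-second polling delay, and return code 2 for "gave up" are assumptions of this example rather than Docker conventions.

```shell
# wait_healthy NAME [MAX_ATTEMPTS]
# Hypothetical helper: polls the container's health status once per second.
# Returns 0 when healthy, 1 when unhealthy, 2 if neither state is reached
# within MAX_ATTEMPTS polls (default 30).
wait_healthy() {
  name=$1
  attempts=${2:-30}
  while [ "$attempts" -gt 0 ]; do
    # An inspect error (for example, no healthcheck configured) yields an
    # empty status and simply counts as another attempt.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null) || status=""
    case "$status" in
      healthy) return 0 ;;
      unhealthy) return 1 ;;
    esac
    attempts=$((attempts - 1))
    sleep 1
  done
  return 2
}
```

For example, `wait_healthy web 60` would poll a container named web for up to roughly a minute.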

Worked example: HTTP service

FROM nginx:1.27-alpine

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --quiet --tries=1 --spider http://127.0.0.1/ || exit 1

This probe asks the in-container nginx to serve its index. wget --spider checks that the URL responds without saving the response to disk. The trailing || exit 1 collapses wget's various non-zero exit codes to 1, so the probe never returns the reserved code 2 by accident. --tries=1 is important: it keeps wget from retrying inside the probe, which would otherwise use up the timeout before Docker records a failure.

Worked example: a service with a dedicated `/healthz`

FROM gcr.io/distroless/static:nonroot
COPY --from=builder /out/server /server
COPY --from=builder /usr/bin/healthcheck /healthcheck

HEALTHCHECK --interval=10s --timeout=2s --start-period=15s --retries=3 \
  CMD ["/healthcheck", "--addr=127.0.0.1:8080", "--path=/healthz"]

USER nonroot
EXPOSE 8080
ENTRYPOINT ["/server"]

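A distroless image has no shell, so the probe must be a single executable invoked with the exec form (the JSON-array form). Many teams ship a tiny healthcheck binary alongside the application for exactly this reason. The binary should set a short timeout, exit with 1 on any error, and keep its output brief: Docker stores what the probe writes to stdout and stderr in the container's health log (only the first 4096 bytes are kept), where docker inspect shows it.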

Tuning interval, timeout, and start period

The defaults are usable but rarely optimal. A useful set of decision criteria:

Interval: short enough that a hung process is noticed promptly, long enough that probing does not become a load of its own. 10s–30s is a typical range. Lower is wasteful; higher means slower detection.
Timeout: shorter than the interval, and sized to the worst-case latency of your /healthz handler with margin (often 2s–5s). Docker schedules each probe one interval after the previous one completes, so probes do not overlap, but a long timeout stretches every failed probe and delays the unhealthy verdict.
Start period: long enough to cover cold-start work such as JVM warm-up, DB pool initialisation, and cache priming. Probes inside this window may fail without counting towards --retries, which avoids tearing down a perfectly healthy container that simply hasn't finished booting.
Retries: keep at 3 unless you have a good reason. A higher number masks real failures; a lower number makes transient blips fatal.
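
To see how these settings combine, it can help to estimate how long a hung service might go unnoticed. The sketch below is a back-of-envelope approximation, not a Docker guarantee: it assumes the service hangs just after a passing probe, that every failing probe runs for the full timeout, and that each probe starts one interval after the previous one finishes. `worst_case_detection` is a hypothetical helper name.

```shell
# worst_case_detection INTERVAL TIMEOUT RETRIES
# Hypothetical helper: approximate upper bound, in seconds, between a
# service hanging and the container being marked unhealthy.
# Each of the RETRIES failing probes costs one interval of waiting plus
# one full timeout of blocked probing.
worst_case_detection() {
  interval=$1
  timeout=$2
  retries=$3
  echo $(( retries * (interval + timeout) ))
}

worst_case_detection 30 30 3   # Docker defaults: prints 180
worst_case_detection 10 2 3    # settings from the /healthz example: prints 36
```

Under these assumptions, moving from the defaults to the 10s/2s settings cuts the estimated detection time from three minutes to about half a minute.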

Common pitfalls

Probing through the public network. A healthcheck is checking this container, not the world. Always probe 127.0.0.1 or a Unix socket, never a public hostname or load balancer.
Probes that are heavier than the request they protect. A probe that touches a database, runs a query, and warms a cache is a probe that will fail under load — exactly when you don't want it to. Keep /healthz cheap; put dependency checks behind a separate /readyz if you need them.
Distroless images without a shell. The shell form (CMD command) is run through /bin/sh -c, and distroless images have no /bin/sh. Use the JSON-array exec form, or pick a base image that includes a shell.
No --start-period on slow-booting services. Many JVM and Python services need 10–30 seconds to start. Without a start period, the first --retries failures during boot mark the container unhealthy and an orchestrator may kill it before it ever finishes starting.
Inheriting a base image's HEALTHCHECK by accident. If your base image declares one and yours does not, you inherit it. Use HEALTHCHECK NONE to disable, or set your own.
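
One way to catch an inherited healthcheck is to look at the image's configuration, which `docker image inspect --format '{{json .Config.Healthcheck}}' IMAGE` prints as JSON. The sketch below interprets that output. `classify_healthcheck` is a hypothetical helper, and the simple string matching assumes the output is either `null`, a `{"Test":["NONE"]}` object for a disabled check, or an object describing a real probe.

```shell
# classify_healthcheck JSON
# Hypothetical helper: interprets the JSON printed by
#   docker image inspect --format '{{json .Config.Healthcheck}}' IMAGE
# Prints "none" when no healthcheck is configured, "disabled" when it was
# turned off with HEALTHCHECK NONE, and "defined" otherwise.
classify_healthcheck() {
  case "$1" in
    ''|null) echo "none" ;;
    *'"NONE"'*) echo "disabled" ;;
    *) echo "defined" ;;
  esac
}
```

A build pipeline could run it against the base image and the final image, for example `classify_healthcheck "$(docker image inspect --format '{{json .Config.Healthcheck}}' nginx:1.27-alpine)"`, and flag any "defined" result that the team did not write itself.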

Healthcheck and orchestrators

Most orchestrators have their own probe model. Kubernetes ignores the Dockerfile HEALTHCHECK entirely and relies on liveness, readiness, and startup probes configured per container in the Pod spec. Amazon ECS acts only on a health check declared in the task definition, and Nomad uses the checks declared in its job specification. Docker Swarm honours HEALTHCHECK directly and replaces tasks whose containers become unhealthy. As a rule of thumb: include a HEALTHCHECK for portability and for local docker compose use, but assume the orchestrator's own probes are the source of truth in production.

See also

CMD reference: the difference between shell form and exec form, which applies here too.
ENTRYPOINT reference: how the main process is launched.
FROM reference: picking a base image that includes a shell when your healthcheck needs one.
Security best practices for Docker images: minimal base images and probe shape.
