Spring Boot in production: config, observability, graceful shutdown

Spring Boot makes it easy to get an app running on your laptop. Getting it to behave well in production is a different set of problems, and they tend to surface at the worst time - during an incident, during a deploy, during a traffic spike. These are the three that have bitten us most: configuration, observability, and shutdown.

Config belongs in the environment

Hardcoded values and committed application.properties with real credentials are how secrets leak. We keep configuration in environment variables, injected at deploy time, with sensible defaults in the code only for things that are safe to default. The twelve-factor rule holds up: the same artifact runs in every environment, and only the config changes.

Secrets come from the environment or a secret manager, never from the repo
Use Spring profiles for behavior differences, not for hiding credentials
Fail fast on startup if a required variable is missing - do not limp along
Bind config to typed @ConfigurationProperties so a typo is a startup error, not a runtime surprise

Observability is not optional

When something breaks at 3am, you are debugging with whatever you instrumented beforehand. Actuator plus Micrometer gives us metrics with almost no code, and we wire structured JSON logging so a log line is queryable, not just human-readable. Every request carries a trace id that flows through logs and downstream calls, so we can follow one user's request across services.

java

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus

The three things we make sure we can answer without redeploying: what is the error rate, how long do requests take at the 95th percentile, and which downstream call is slow. If the answer to any of those requires adding a log line and shipping, we instrumented too little.

You cannot add observability during the incident. You either had it, or you are guessing in the dark.

Graceful shutdown so deploys do not drop requests

By default a container can be killed mid-request during a rolling deploy, and the user gets a connection reset for no reason of their own. Spring Boot supports graceful shutdown: stop accepting new requests, finish the in-flight ones, then exit. We set it explicitly and pair it with a shutdown grace period in the orchestrator that is longer than our slowest reasonable request.

There is a subtlety we learned the hard way. Readiness has to flip to "not ready" before shutdown begins, so the load balancer stops sending new traffic a few seconds before the process actually starts draining. Setting shutdown: graceful and a timeout-per-shutdown-phase of 30s handles the draining, but without that readiness gap you still drop the requests that arrive in the window between the kill signal and the load balancer noticing.

Config belongs in the environment

Secrets come from the environment or a secret manager, never from the repo

Use Spring profiles for behavior differences, not for hiding credentials

Fail fast on startup if a required variable is missing - do not limp along

Bind config to typed @ConfigurationProperties so a typo is a startup error, not a runtime surprise

Observability is not optional

java

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus

You cannot add observability during the incident. You either had it, or you are guessing in the dark.

Graceful shutdown so deploys do not drop requests

Spring Boot in production: config, observability, graceful shutdown

Config belongs in the environment

Observability is not optional

Graceful shutdown so deploys do not drop requests

João Matos

Other notes from the team.

Splitting a monolith without a rewrite

API design that ages well

A testing strategy that pays for itself

Spring Boot in production: config, observability, graceful shutdown

Config belongs in the environment

Observability is not optional

Graceful shutdown so deploys do not drop requests

João Matos

Other notes from the team.

Splitting a monolith without a rewrite

API design that ages well

A testing strategy that pays for itself