The first postmortem I ran badly, I asked who pushed the change. The room went quiet, the engineer got defensive, and we spent an hour establishing innocence instead of understanding the system. We learned nothing useful and the same class of failure hit us again four months later. Blame is not just unkind. It is operationally useless.
Separate the incident from the postmortem
During the incident there is one job: stop the bleeding. We name a single incident commander whose authority is total for the duration, and everyone else either does what they ask or gets out of the way. We do not analyze root cause while the site is down. We do not assign fault in the chat. We restore service, then we go home, then we meet to learn.
- One incident commander with clear authority, rotated so it is not always the same person
- A scribe capturing a timeline in real time, because memory rewrites itself within hours
- No root-cause debate during the fire, only during the review
- A hard rule that the person closest to the change is treated as the best source, never the suspect
Why blameless is selfish, not soft
Blameless is often sold as kindness. It is, but that is not why we do it. We do it because the engineer who made the change has the most context about what happened, and a frightened person hides context to protect themselves. You cannot fix a system whose operators are incentivized to obscure how it actually behaves. We want the messy truth, so we make telling it safe.
If your postmortem produces a name instead of a changed system, you ran a trial, not a review. The same incident is already scheduled to return.
Ask how the system allowed it
The phrasing matters. We do not ask why did you deploy that. We ask how was it possible to deploy that without a check catching it. The deploy was a person doing their job inside a system that permitted the outcome. The fix lives in the system: a missing guardrail, a confusing dashboard, an alert that fired too late. People are not bugs you patch.
Action items or it didn't happen
A postmortem with no owned, dated action items is therapy. We close every review with a small number of concrete changes, each with a name and a date, and we track them like any other work. And we publish the writeup widely, because a postmortem read only by the team that lived it teaches no one else, and the next team inherits the same trap we just walked out of.