Daniel Leeder


Early in my tenure at a company, a major production issue occurred. A meeting was scheduled, labeled "RCA." The COO, CTO, and a couple of directors were present. The COO initiated the meeting with two questions: "Whose fault is this?" and "Who do we need to talk to?"

These questions were immediate red flags, an early indicator of a mismanaged organization where accountability was confused with blame. A culture where the goal is to punish individuals is not just toxic; it's ineffective. It's not individuals who allow critical issues like these to occur; it's the lack of systems in place to prevent them.


The Problem with a Culture of Blame

A "whodunit" approach to failure creates a culture of fear. When people are afraid to make mistakes, they stop taking risks. They hide problems instead of surfacing them. Innovation grinds to a halt.

Furthermore, this mindset is fundamentally incompatible with scale. Scolding certain individuals, or keeping a close eye on them, is neither feasible nor effective as a strategy. The more complex your system becomes, the more you need to rely on the system itself for protection, not on the heroic or cautious actions of any single person. Well-implemented systems, not individual vigilance, are the only dependable safeguard for quality and against failure.


From "RCA" to Blameless Post-Mortem

This is why the goal should not be a simple "Root Cause Analysis" (RCA) that identifies a person or a single event. The goal must be a comprehensive Post-Mortem focused on learning and systemic improvement. A post-mortem is a process, not a tribunal.

A constructive, blameless post-mortem seeks to answer a series of questions:

  1. What happened, and what was the root cause? Build a timeline of the events leading up to the failure and trace it back to the underlying cause.
  2. How did our process fail? Why did our testing, CI/CD pipeline, or review process not catch this before it made it to production?
  3. What was the blast radius? Who and what was affected by the issue?
  4. What were the immediate resolutions? What steps were taken to mitigate the issue right away?
  5. What are the long-term preventative measures? What new systems, automations, rules, or workflows can we establish to prevent this entire class of problem from ever happening again?
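
One way to keep a post-mortem focused on these five questions is to record it in a fixed template. The sketch below is a minimal illustration in Python, not a prescribed tool: the class name, field names, and sample entries are all hypothetical, invented for this example.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class PostMortem:
    """Hypothetical template: one field per post-mortem question."""
    timeline: List[str] = field(default_factory=list)               # 1. events leading up to the failure
    process_gaps: List[str] = field(default_factory=list)           # 2. where testing, CI/CD, or review fell short
    blast_radius: List[str] = field(default_factory=list)           # 3. who and what was affected
    immediate_resolutions: List[str] = field(default_factory=list)  # 4. steps taken to mitigate right away
    preventative_measures: List[str] = field(default_factory=list)  # 5. long-term systemic fixes

    def unanswered(self) -> List[str]:
        """Return the names of the sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_complete(self) -> bool:
        """A post-mortem is complete only when every question has an answer."""
        return not self.unanswered()


# Made-up sample data, for illustration only.
draft = PostMortem(
    timeline=["14:02 deploy started", "14:10 error-rate alarm fired"],
    blast_radius=["checkout API", "customers placing orders"],
    immediate_resolutions=["rolled back the deploy"],
)

print(draft.unanswered())  # ['process_gaps', 'preventative_measures']
```

Note what the template leaves out: there is no field for a person's name. In this sketch the record cannot be closed until the process gaps and preventative measures are filled in, which pushes the discussion toward the system rather than toward an individual.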

Fix the System, Not the Blame

The key goal of a post-mortem is to fix the issue now and in the future. It's about building a more resilient organization, not about finding a scapegoat and then waiting for the same systemic failure to let the issue happen again.

Changing this mindset is not easily accomplished if a company has become accustomed to blame as a normal operating practice, but it is the most essential lesson a scaling organization can learn. The failure is not with the individual; it's with the system.