Daniel Leeder


It’s 2:00 PM on a Tuesday. Monitoring alerts turn red. Customer support tickets spike. The site is down.

And then, inevitable as the tide, a message pops up in the incident channel from an executive: "ETA?"

It is a delicate balance when pressures are high, teams are stressed, and leadership is pushing for updates. But as an engineering leader, your job is not simply to pass that stress down to a team that is already scrambling to resolve a crisis.

The Psychology of Panic Coding

When you demand an immediate timeline from an engineer who is deep in the logs trying to figure out why the database locked up, you aren't getting accurate information. You are simply increasing their cognitive load.

A stressed-out team will inevitably gravitate towards the first fix rather than the best fix just to alleviate the pressure.

Your goal is recovery, but your pressure is incentivizing a band-aid.

Preparation: The Antidote to Chaos

The time to manage an outage is not during the outage. It is in the months leading up to it. You avoid the boiling point by building systems that allow for safety and speed.

  1. The Panic Button: Have easily accessible rollback and reversion mechanisms in place. If a deploy breaks the site, the fix shouldn't be "write new code," it should be "press the undo button."
  2. Permissionless Repair: Build trust by letting engineers make corrective changes on their own, without approval gates. If an engineer needs approval from a VP to restart a server, your recovery time will always be too slow.
  3. Observability, Not Just Monitoring: Surface potential issues before they cause serious impact. Monitoring tells you the site is down; well-implemented observability tells you why (e.g., "Latency spiked in the payment service after the last config change").
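
To make the "undo button" idea concrete, here is a minimal sketch in Python. It assumes a deliberately simplified model in which releases are version strings kept in an in-memory list. The `ReleaseHistory` class and its methods are hypothetical names invented for illustration, not part of any real deployment tool. A real system would re-point traffic or redeploy an artifact, but the shape of the operation is the same: rolling back selects a known-good release instead of writing new code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReleaseHistory:
    """Hypothetical in-memory record of deployed versions, newest last."""

    versions: List[str] = field(default_factory=list)

    def deploy(self, version: str) -> str:
        """Record a new release and make it the live one."""
        self.versions.append(version)
        return version

    @property
    def live(self) -> Optional[str]:
        """The currently live version, or None if nothing is deployed."""
        return self.versions[-1] if self.versions else None

    def rollback(self) -> str:
        """The 'panic button': drop the live release and return to the previous one."""
        if len(self.versions) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self.versions.pop()
        return self.versions[-1]


history = ReleaseHistory()
history.deploy("v1")
history.deploy("v2")           # suppose this deploy breaks the site
restored = history.rollback()  # one call, no new code written
```

The design choice worth noting is that `rollback` takes no arguments and needs no diagnosis: the engineer on call does not have to understand why the latest release failed before restoring service.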

The Leader's Role: Shield and Support

There is a human element to reliability. You can build resilience by delegating high-stakes tasks to engineers during normal operations to establish communication habits and reduce the level of discomfort when urgent issues do arise.

When sudden problems do occur, your role shifts from "Manager" to "Support Staff."

Reliability is a culture. If you lead with trust and preparation, your team will respond with competence. If you lead with panic, they will respond with patches.