Konstantin is a Site Reliability Engineering (SRE) and DevOps consultant with extensive experience leading teams at top companies using a variety of tech stacks. He currently leads an SRE team automating the Auckland power grid.
A look at a few real-world catastrophic incidents and the post-mortems that followed.
The tech is often easy to focus on, but what about the team? You've fixed the platform, but have you stabilised the team?