Who is the Chaos Monkey in your team?

A post by Harish.

Netflix added 15.7 million users in the first three months of 2020. The global stay-at-home situation placed heavy demands on the company, yet it rose to this scaling challenge. How? One answer is its famed Chaos Monkey.

As a CMU case study put it, the Chaos Monkey "is a script that runs continually in all Netflix environments, causing chaos by randomly shutting down server instances. Thus, while writing code, Netflix developers are constantly operating in an environment of unreliable services and unexpected outages."

So you'd expect Netflix to be ready for chaos. In 2020, a Chaos Monkey took on the entire world. Or should we call it the Chaos Swan?

The Taleb Dictionary

Unlike in previous crises, we were all better prepared for COVID-19... in our vocabulary. Over the last ten years, the world has soaked up the ideas of Nassim Taleb. So we hear many people talk about Black Swans, antifragility, and making the most of a crisis. But what did we do with these ideas?

When the lockdowns struck, most organizations immediately adopted remote working. They successfully executed their Business Continuity Planning playbook. But most of these changes aimed only to get back to pre-COVID-19 levels. In Taleb's dictionary, this is merely 'robustness' and 'reduction in fragility'. Being antifragile goes beyond robustness: it's about systems that improve because of shocks.

For an organization, antifragility is a great ideal to aim for. But temper your hopes with a dose of reality.

The One-Two Punch

I see a two-step hierarchy in how organizations should plan for shocks. The first step is to increase robustness; the second is to become antifragile. Good organizations already know how to do the first, using best practices, time-tested systems, and A-players in the right roles. But these very steps can hold you back from becoming antifragile. The system becomes so finely tuned that any shock can shake it up.

This takes me back to early 2020, when I heard a talk by Monish Darda, Co-founder and CTO of Icertis. He spoke of how he often deliberately takes the A-players out of a functioning team. The rest of the team then has to withstand this shock and deliver without them. Meanwhile, the A-players get to grow by taking on new ideas. Monish is the Chaos Monkey!

(Pic: courtesy Monish, no doubt while thinking about antifragility!)

When asked, most leaders will agree with the spirit of this action. But they'll think twice before doing it. That's understandable: they are not sure how things will pan out. But in the process, they'll never find out how good their teams are, or whether their system can become antifragile. It takes immense courage for leaders to pull the plug and see what happens. It also takes psychological safety for team members to respond to crises in a positive way.

‘It don't matter if you're black or white’. Or does it?

For several years, many experts have warned us about pandemics. So is COVID-19 a Black, Grey, or White Swan? Does it matter which colour it is? I don't think so. When you are aware of a potentially significant event, the choice to prepare for it is yours.

We don’t know when the next grave crisis will come, but we know that, sooner or later, another one will come along. So think about this: who is the Chaos Monkey in your team?

It’s like that old Road Safety campaign ad: "It’s your choice. After all, it’s your head".


Our thanks to Monish Darda for reading a draft of this post.