Sotto Voce.

"Qui plume a, guerre a." — Voltaire

Failure Modes

A couple days ago, after work, I was tooling around YouTube and came across a documentary I hadn’t seen in a while: “Failure is Not an Option: A Flight Control History of NASA.” It’s ever so slightly over the top, but that’s okay because the subject — how a group of young engineers invented the procedures that got us to the Moon — is part of a modern epic that should be told in the grand, sweeping language of the sagas and ballads of yore. Growing up, my idols were guys like Christopher Columbus Kraft, Gene Kranz, and John Aaron. So I don’t mind seeing them idolized. And of course, “failure is not an option” entered the popular lexicon with the movie Apollo 13, which elevated the lunar program’s pit crew to the status of heroes.

Yesterday, following SpaceX’s terrific success landing its Falcon orbital booster autonomously on its ocean-going barge, I came across an article on the tech news website Ars Technica titled “Because failure is an option SpaceX can do stuff like land rockets on a boat.” With the documentary fresh in mind, I was naturally curious to read it.

The article opens by noting that “there is a belief among some that, since the heady Apollo days, such an attitude [that of failure not being an option] has made NASA’s managers too timid and too risk averse.” It then contrasts this attitude with something SpaceX’s Elon Musk said in a 2005 Fast Company article: “There’s a silly notion that failure’s not an option at NASA. Failure is an option here. If things are not failing, you are not innovating enough.”

In reading that, it struck me that we’re talking about two fundamentally different ideas of failure here: failure of imagination, and failure of responsibility.

Following the pad fire in January 1967 that killed the flight crew of Apollo 1 (Gus Grissom, Ed White, and Roger Chaffee), Gene Kranz assembled his flight controllers in a room, shut the door, and gave the greatest speech of his career:

“Spaceflight will never tolerate carelessness, incapacity, and neglect. Somewhere, somehow, we screwed up. It could have been in design, build, or test. Whatever it was, we should have caught it. We were too gung ho about the schedule and we locked out all of the problems we saw each day in our work. Every element of the program was in trouble and so were we. The simulators were not working, Mission Control was behind in virtually every area, and the flight and test procedures changed daily. Nothing we did had any shelf life. Not one of us stood up and said, ‘Dammit, stop!’

I don’t know what Thompson’s committee will find as the cause, but I know what I find. We are the cause! We were not ready! We did not do our job. We were rolling the dice, hoping that things would come together by launch day, when in our hearts we knew it would take a miracle. We were pushing the schedule and betting that the Cape would slip before we did.

From this day forward, Flight Control will be known by two words: “Tough” and “Competent.” “Tough” means we are forever accountable for what we do or what we fail to do. We will never again compromise our responsibilities. Every time we walk into Mission Control we will know what we stand for. “Competent” means we will never take anything for granted. We will never be found short in our knowledge and in our skills. Mission Control will be perfect.

When you leave this meeting today you will go to your office and the first thing you will do there is to write “Tough and Competent” on your blackboards. It will never be erased. Each day when you enter the room these words will remind you of the price paid by Grissom, White, and Chaffee.

These words are the price of admission to the ranks of Mission Control.”

That day, NASA had failed in its duty of care to the three astronauts who perished. What Gene Kranz was saying was that as far as Mission Control was concerned, it would never again abdicate its responsibilities to others.

The losses of the Challenger in 1986 and Columbia in 2003 were not the result of failures of imagination; everyone involved in both missions knew that spaceflight is a risky business. They were failures of responsibility. In both cases, crucial decisions were made by people who were more concerned with the consequences to themselves than the consequences to others, which is precisely the opposite of Kranz’s “tough and competent” dictum.

There were plenty of engineering failures in the testing and development phases of Apollo: engines, structures, design, you name it. They were expected, anticipated, and sought out. How else could they figure out everything that could go wrong before they sent people to the Moon? Nothing could be left to the imagination; everything had to be foreseen, tested, and designed for or against. Apollo’s track record makes it clear that there was no failure of imagination there.

The thing is, they worked out all those kinks before they ever put anyone up into space. They were willing to risk failure all the way up to the moment when the astronauts climbed in. Those failures were sought out in order to prevent that failure.

The most recent successful Falcon flight is a perfect example of the balancing of the two kinds of failure I’m talking about here. The design of the Falcon launch vehicle brilliantly isolates two distinct event chains: the delivery of the payload to orbit and the return of the booster to the landing barge. SpaceX does not risk failing its responsibility to its launch customers by tying the success of the former to the success of the latter. Here’s a simple table:

Function          Failure Type    Duty of Care
Payload delivery  Responsibility  Customer
Booster landing   Imagination     SpaceX

SpaceX doesn’t take risks with the customer’s payload, because the customer is paying them to do the job. (Sure, SpaceX has liability in the event that a booster malfunction results in the loss of the payload, but that’s an incentive, not a risk.) On the other hand, SpaceX can afford to take risks with the booster landing because, hey, whatever SpaceX wants to do with its own property after fulfilling its contractual duty is its own business. SpaceX’s approach neatly compartmentalizes the respective failure modes.

Now SpaceX is talking about sending a Falcon to Mars. Certainly no failure of imagination there. But I think we can be damn sure that the company isn’t proposing to send astronauts up in a ship that hasn’t been tested to its limits and beyond. Musk surely knows that, as a private company, SpaceX would never survive a public revelation that it had failed its responsibility to the crew.

“Failure is not an option” is not synonymous with “risk averse.” Fear of failure in the everyday sense — “what if I try this and it doesn’t work?” — is not something that NASA currently suffers from. If you have any doubts, just recall the batshit-crazy descent profile flown by the Mars Curiosity lander. Which worked, by the way.

No, “failure is not an option” means “tough and competent.” It means having the balls to hold yourself accountable and the audacity to aspire to perfection. NASA has it, SpaceX has it, Blue Origin has it, and anyone who blazes a trail — to outer space, to inner space, to anywhere in between — has it too.

So say we all.

Categorised as: Life the Universe and Everything


One Comment

  1. JoeVC says:

    Well said!