How to succeed without knowing how to succeed

Nature achieves more than any human mind conceives. The powers of predictive models and reasoning are dwarfed by a system without a brain. Why? If we understand this, we can achieve such greatness in our teams, and in our codebases.

Here’s an abstraction. 3 components (and billions of years) take us from amino acids to human beings:
– some level of random variation,
– a way to recognize “better” and keep those variations around, and
– no real cost to the system when a variation is worse.

In nature, mutation is the variation, and natural selection keeps more of “better” around than “worse.” If a particular organism is less fit and dies, the system doesn’t care. That cost is minute, while the potential benefits of progress are large. On average, each generation tends to be at least as well adapted as the one before. On and on.

In a reasonably free market, all kinds of people jump in, competition keeps the “better” businesses around, and one company going under doesn’t bother the system. New entrants learn from successes before them. Growth.

There is no requirement that anybody predict what will succeed. Only a way to recognize success when it happens, and go with it. Costs of failure are borne by individuals, while success benefits the whole system.

We can set up these conditions. Methods based on MCMC (Markov chain Monte Carlo) for solving complicated problems totally use this. Take a problem too complex to solve explicitly, but with a way to calculate how good a particular solution is. Example: image recognition, building a model for a scene. Start with some guess (“there’s a tree in the middle”), generate an image from the model, compare it with the real one. Tweak the model. Did it get better? Keep it. Did it get worse? Probably discard that change and try again. [1] The result ratchets closer and closer to the real solution. The algorithm recognizes success and discards failure quickly. Discovery.
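Here is a minimal sketch of that tweak-and-compare loop, under some loud assumptions. The "scene" is just a short list of numbers standing in for the real image, the "model" is a guess at those numbers, and the names `error`, `tweak`, `search`, and the `temperature` parameter are made up for illustration. The acceptance rule is Metropolis-style, used here as a search rather than as a full MCMC sampler: improvements are always kept, and a worse candidate is sometimes kept, with a probability that shrinks the worse it is.

```python
import math
import random


def error(model, target):
    """How far the model is from the real data (lower is better)."""
    return sum((m - t) ** 2 for m, t in zip(model, target))


def tweak(model, rng, step=0.5):
    """Return a copy of the model with one randomly chosen parameter nudged."""
    candidate = list(model)
    i = rng.randrange(len(candidate))
    candidate[i] += rng.gauss(0.0, step)
    return candidate


def search(target, steps=20000, temperature=0.05, seed=0):
    """Tweak, compare, keep or discard; remember the best model seen."""
    rng = random.Random(seed)
    current = [0.0] * len(target)  # the initial guess
    current_err = error(current, target)
    best, best_err = current, current_err

    for _ in range(steps):
        candidate = tweak(current, rng)
        candidate_err = error(candidate, target)
        worse_by = candidate_err - current_err

        # Better (or equal): keep it. Worse: keep it only occasionally,
        # with probability exp(-worse_by / temperature).
        if worse_by <= 0 or rng.random() < math.exp(-worse_by / temperature):
            current, current_err = candidate, candidate_err
            if current_err < best_err:
                best, best_err = current, current_err

    return best, best_err


if __name__ == "__main__":
    real_scene = [3.0, -1.0, 4.0]  # hypothetical stand-in for the real image
    model, err = search(real_scene)
    print(model, err)
```

Nothing in the loop predicts which tweak will help. It only needs `error` to recognize a better model when one shows up, and a discarded candidate costs a single evaluation.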

These three examples show problems insoluble by reasoning or prediction, surmounted by recognizing success when it happens, repeatedly. Nature and the market use competition to recognize success, but that is not the only way. MCMC uses a calculation, comparing new results only to previous results. We can do the same when we have a single sample to optimize — one app or one team.

We can set up these conditions for code. At Etsy, there’s a monolithic PHP web app.[4] That doesn’t sound easy to keep reliable under change. Yet, they’ve got this figured out. Lots of developers are making changes, deploying 50x/day. Every feature includes monitoring that tells people whether it’s working, and whether it’s broken. This shows success. And if a deploy breaks something, people find out quickly. They can roll it back, or fix it, right away. Many changes; successes kept; failures mitigated. The app improves over time, with no grand scheme or Architect of Doom watching over it. Productivity.

There’s another element necessary when people are the ones making all the little changes. If failure has a high cost to the individual, then the incentive exists to hold still. But we need little variations net-nudging us toward success. Etsy has removed the cost of failure from the individual by keeping a blameless culture. They treat any outage as a learning opportunity for the whole company, and they do not punish the unfortunate developer whose deploy triggered it. There’s a safety there.

In the market, this safety is LLCs, where you can lose your business but keep your house. In nature or MCMC, the organism or parameter set doesn’t vary voluntarily, so no disincentive exists.

No project plan and organization chart can reach the potential of an agile team, when the team takes Linda Rising’s advice[2]. She said, every week in the retro, pick one thing to change about how the team works. Tweak it, try something new. If it doesn’t help, go back after a week. If it makes your team work better, keep it. Each week, the team is at least as good as the week before. Excellence.

The Romans didn’t develop their political system through a grand plan, according to Nassim Taleb in Antifragile. They did it by tinkering. Taleb calls the property of “Recognize success and win; discard failure and lose nothing” optionality. Combine optionality and randomness for unlimited success.[3]

What does this mean?

It means our apps don’t have to be beautifully architected if they’re well instrumented.

It means that the most important part of treating failure as a source of learning may well be removing failure as a source of persecution.

Removing fear of failure lets people try different ways of doing things. Metrics help recognize the right ones, the variations to keep. Monitoring and quick rollbacks make failures cheap at the system level. Maybe these 3 things, and time, are all we need to build complex software more useful than we can imagine.

[1] In MCMC, you sometimes keep a worse solution, with a probability based on how much worse it is. That gets you out of local optima.
[2] Linda Rising’s closing keynote to GOTO Amsterdam 2014.
[3] This post comes out of this section in Antifragile. The book is annoying, but the ideas in it are crucial.
[4] Daniel Schauenberg’s presentation at GOTO Amsterdam 2014.