My work does not reduce to measurable outcomes. Much of what I accomplish as an engineer and as a developer advocate amounts to creating conditions that make it more likely for the company to succeed. I resist and resent most metrics, yet I don’t mind OKRs the way Honeycomb does them.
How not to OKR: manage to them
OKRs (Objectives and Key Results) were popularized by Google, and many companies use them for evil. If meeting the numbers in the Key Results determines whether we get a raise or get fired, then we skip the Objective and completely ignore externalities (effects on the world outside the system or beyond the timeframe).
OKRs consist of an Objective in words (what are we trying to accomplish?) and two to four measurable Key Results (how will we know that we succeeded?).
An entirely hypothetical example:
| Objective: | Increase purchases through Alexa’s voice interface |
|---|---|
| Key Result: | 5% fewer people report that Alexa heard the wrong item |
| Key Result: | 0.5s faster response time to purchase requests |
| Key Result: | 3% more people who ask for an item say ‘Buy it now’ |
OKRs are quarterly. Teams have about three OKRs for a quarter, then evaluate them at the end of those three months. Did we hit our Key Results? Did we meet the objective? Usually people stop at the first question.
Asked to increase purchases from Alexa, as measured by the percent of people who say ‘Buy it now’ after asking for an item, a team might study what factors affect customers’ likelihood to say these magic words. Perhaps they notice that when Alexa says “with delivery by today,” more people say ‘Buy it now.’ Perhaps the team can decrease the response time and get more magic words by defaulting to “with delivery by today” on all items.
Then after I tell Alexa ‘Buy it now,’ she says ‘Your item has an updated delivery date…’ and gives a realistic delivery estimate.
Have you also noticed Alexa doing that on every single item all the time recently? It’s infuriating! It makes me hate using Alexa to purchase things.
It advances the key results but not the objective.
My rate of completing purchases went to zero after the third consecutive time Alexa reneged on the delivery date. But hey, that’s someone else’s metric!
I imagine the team who made this decision getting high performance ratings and a pizza party for rocking this quarter’s Key Results. And then the entire Alexa program shutting down a few years later because overall purchases tanked.
How to OKR: learn from them
No small set of metrics can tell you you’re doing a good job. They’re a limited clue. This is why I resist measuring my work: reducing my job to numbers ignores the most significant parts. So why use Key Results at all?
An incomplete measurement is better than no measurement, if you treat it as a clue instead of a judgement. At Honeycomb, Key Results remind us to look at the world to see whether we had the effect we wanted.
For instance, my team is Developer Relations. Our job is to tell the story of Honeycomb’s value to engineers. Right now, we don’t have enough videos out there to show what it’s like using our product. People hear about us, they come looking for an explainer, and they don’t find a current one on YouTube.
| Objective: | People find compelling examples of using Honeycomb on our YouTube channel. |
|---|---|
| Key Result: | We publish 6 new videos this quarter. |
| Key Result: | People watch 2000 minutes of these videos. |
| Key Result: | Our channel has 5% more subscribers. |
These key results are not equivalent to meeting the objective. Instead, they are three observable things that we think will be true if we do meet the objective. Most importantly, they are three conversations we want to have.
In this example, one key result is output-based: publish 6 videos. Barring technical difficulties, my team has direct control over whether this happens. That makes it motivating. However, this one is the farthest from representing the objective.
The second KR, “People watch 2000 minutes of these videos,” is a made-up number. I don’t know whether that number is low or high. I do know that I care how much time people spend watching the videos. This KR exists to get us to talk about the performance of the videos. The number doesn’t matter; the measurement does. We will notice which videos keep attention, and let that guide our future choices.
More engagement is a clue that we might be meeting our objective. The third KR, “5% more subscribers,” is a number we have little control over. I don’t know whether it’s realistic, and that doesn’t matter. It matters that we look at how many subscribers we have, we look at how that is growing, and we wonder about it together.
The key results remind us to look at the world as it is, and compare it to our expectations. At the beginning of the quarter I recorded a baseline for the number of subscribers; I hadn’t looked at that number before, so the KR was useful already.
At the end of the quarter, it may not be 2000 minutes of viewing, or it may be way more. Either way, I notice what it is. I am more informed for future expectations, and my team has a conversation about how to make our future videos more engaging.
We are able to look thoughtfully at the Key Results, to use them as clues without gaming them, because nobody’s performance review is based on them. Information, not judgement. Nobody at Honeycomb pretends that our work reduces to a number, so we don’t have to warp our systems to stretch the numbers over someone’s threshold. We never do work that influences the Key Result but not the objective.
OKRs give us a few directions (of the many valid ones) to pull toward in a quarter, and key results are markers along the way. Within the team and across the teams, they help us keep in touch with each other and with the world.