Taking care of code … more and more code

(This is a shorter version of my talk for DeliveryConf, January 2020. Video of slides+audio; Slides as pdf)

Good software is still alive.

The other day, I asked my twelve year old daughter for recommendations of drawing programs. She told me about one (FireAlpaca?) “It’s free, and it updates pretty often.” She contrasted that with one that cost money “but isn’t as good. It never updates.”

The next generation appreciates that good software is updated regularly. Anything that doesn’t update is eventually terrible.

Software that doesn’t change falls behind. People’s standards rise, their needs change. At best, old software looks dumb. At worst, it doesn’t run on modern devices.

Software that doesn’t change is dead. You might say, if it still runs, it is not dead. Oh, sure, it’s moving around — but it’s a zombie. If it isn’t learning, it’ll eventually fall over, and it might eat your face.

I want to use software that’s alive. And when I make software, I want it to stay alive as long as it’s in use. I want it to be “done” when it’s out of production.

Software is like people. The only “done” is death.

Alive software belongs to a team.

What’s the alternative? Keep learning to keep living. Software needs to keep improving, at least in small ways, for as long as it is running.

We have to be able to change it, easily. If Customer Service says, “Hey, this text is unclear, can you change it to this?” then pushing that out should be as easy as updating some text. It should not be harder than when the software was in constant iteration.

This requires automated delivery, of course. And you have to know that delivery works. So you have to have run it recently.

But it takes more than that. Someone has to know — or find out quickly — where that text lives. They have to know how to trigger the deployment and how to check whether it worked.

More than that, someone has to know what that text means. A developer needs to understand that application. Probably, this is a developer who was part of its implementation, or the last major set of changes.

For the software to be alive, it has to be alive in someone’s head.

And one head will never do; the unit of delivery is the team. That’s more resilient.

Alive software is owned and cared for by an active team. Some people keep learning, keep teaching the software, and the shared sociotechnical system keeps living. The team and software form a symmathesy.

How do we keep all our software alive, while still growing more?

Okay, but what if the software is good enough right now? How do we keep it alive when there’s no big initiative to change it?

Hmm. We can ask, what kind of code is easy to change?

Code needs to be clean and modern.

Well, it’s consistent. It is up-to-date with the language versions and frameworks and libraries that we currently use for development.

It is “readable” by our current selves. It uses familiar styles and idioms.

What you don’t want is to come at the “simple” (from outside perspective) task of updating some text, and find you need to install a bunch of old tools, oh wait, there’s security patches that need to happen before this will pass pre-deployment checks. Oh now we have to upgrade more stuff to the modern versions of those libraries to work. You don’t want to have to resuscitate the software before you can breathe new life into it.

If changing the software isn’t easy enough, we won’t do it. And then it gets terrible.

So all those tool upgrades, security patches, library updates gotta have been done already, in the regular course of business.

Keeping those up to date gives us an excuse to change the code, trigger a release, and then notice any problems in the deployment pipeline. We keep confidence that we can deploy it, because we deploy it every week whether we need to or not.

People need to be on stable teams with customer contact.

More interesting than properties of the code: what are some properties of people who can keep code alive?

The team is stable. There’s continuity of knowledge.

The team understands the reason the software exists. The business significance of that text and everything else.

And we still care. We have contact with people who use this software, so we can check in on whether this text change works for them. We continue to learn.

Code belongs to one team.

More interesting still: what kind of relationship does the alive-keeping team have with the still-alive code?

Ownership. The code is under the care of a single team.

Good communication. We can teach the code (by changing it), so we have good deployment automation and we understand the programming language, etc. And the code can teach us — it has good tests, so we know when we broke something. It is accountable to us, in the sense that it can tell us the story of what happens. This means observability. With this, we can learn (or re-learn) how it works while it’s running. Keep the learning moving, keep the system living.

The team is a learning system, within a learning system.

Finally: what kind of environment can hold such a relationship?

(diagram of code, people, relationship, environment)

It’s connected; the teams are in touch with the people who use software, or with customer support. The culture accepts continued iteration as good, it doesn’t fear change. Learning flows into and out of the symmathesy.

It supports learning. Software is funded as a profit center, as operational costs, not as capital expenditure, where a project is “done” and gets depreciated over years. How the accounting works around development teams is a good indication of whether a company is powered by software, or subject to software.

Then there’s the tricky one: the team doesn’t have too much else on their plate.

How do we keep adding code to our responsibilities?

The team that owns this code also owns other code. We don’t want to update libraries all day across various systems we’ve written before. We want to do new work.

It’s like a garden; we want to keep the flowers we planted years ago healthy, and we also want to plant new flowers. How do we increase the number of plants we can care for?

And, at a higher level — how can we, as people who think about DevOps, make every team in our organization able to keep code alive?

Teams are limited by cognitive load.

This is not: how do we increase the amount of work that we do. If all we did was type the same stuff all the time, we know what to do — we automate it.

Our work is not typing; it’s making decisions. Our limitation is not what we can do, it is what we can know.

In Team Topologies, Manuel Pais and Matthew Skelton emphasize: the unit of delivery is the team, and the limitation of a team is cognitive load.

We have to know what that software is about, and what the next software we’re working on is about, and the programming languages they’re in, and how to deploy them, and how to fill out our timesheets, and which kitchen has the best bubbly water selection, and who just had a baby, and so on. It takes a lot of knowledge to do our work well.

Team Topologies lists three categories of cognitive load.

The germane cognitive load, we want that.

Germane cognitive load is the business domain. It is why our software exists. We want complexity here, because the more complex work our software does, the less the people who use it have to bother with. Maximize the percentage of our cognitive load taken up by this category.

So which software systems a team owns matters; group by business domain.

Intrinsic cognitive load increases if we let code get out of date.

Intrinsic cognitive load is essential to the task. This is our programming language and frameworks and libraries. It is the quirks of the systems we integrate with. How to write a healthy database query. How the runtime works: browser behavior, or sometimes the garbage collector.

The fewer languages we have to know, the better. I used to be all about “the best language for the problem.” Now I recommend “the language your team knows best, as long as it’s good enough.”

And “fewer” includes versions of the language, so again, consistency in the code matters.

Extrinsic cognitive load is a property of the work environment. Work on this.

Finally, extrinsic cognitive load is everything else. It’s the timesheet system. The health insurance forms. It’s our build tools. It’s Kubernetes. It’s how to get credentials to the database to test those queries. It’s who has to review a pull request, and when it’s OK to merge.

This is not the stuff we want to spend our brain on. The less extrinsic cognitive load on the team, the more we have room for the business and systems knowledge, the more responsibility we can take on.

And this is a place where carefully integrated tools can help.

DevOps is about moving system boundaries to work better. How can we do that?

We can move knowledge within the team, and we can move knowledge out to a different team.

We can move work below the line.

Within the team, we can move knowledge from the social side to the technical side of the symmathesy. We can package up our personal knowledge into code that can be shared.

Automations encapsulate knowledge of how to do something

Automate bits of our work. I do this with scripts.

The trick is, can we make sharing it with the team almost as easy as writing it for ourselves?

Especially automate anything we want to remain consistent.

For instance, when I worked on the docs at Atomist, I wrote the deployment automation for them. I made a glossary, and I wanted it in alphabetical order. I didn’t want to put it in alphabetical order; I wanted it to constantly be alphabetical. This is a case for automation.

I wrote a function to alphabetize the markdown sections, and told it to run with every build and push the changes back to the repository.
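Here is a minimal sketch of what such a function could look like, in Python. It is not the actual Atomist code. The name `alphabetize_sections` and the assumption that every glossary entry is a `## ` heading are mine, and it does not try to handle headings that appear inside fenced code blocks.

```python
def alphabetize_sections(markdown: str, heading_prefix: str = "## ") -> str:
    """Sort the sections that start with `heading_prefix` by heading text.

    Everything before the first such heading (title, intro) stays on top.
    Deeper headings ("### ...") travel with the section that contains them.
    """
    preamble: list[str] = []
    sections: list[list[str]] = []
    for line in markdown.splitlines():
        if line.startswith(heading_prefix):
            sections.append([line])      # a new section begins
        elif sections:
            sections[-1].append(line)    # body of the current section
        else:
            preamble.append(line)        # text above the first section

    # Case-insensitive sort on the heading text; list.sort is stable.
    sections.sort(key=lambda s: s[0][len(heading_prefix):].strip().casefold())

    # Trim blank lines around each block, then rejoin with one blank line
    # between blocks so the output does not depend on the input order.
    blocks = ["\n".join(block).strip("\n") for block in [preamble, *sections]]
    return "\n\n".join(block for block in blocks if block) + "\n"
```

The property that matters for running it on every build is that it is idempotent: run on an already sorted file, it returns the file unchanged, so there is nothing to push back.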

Autofixes like this also keep the third party licenses up to date (all the npm dependencies and their licenses). This is a legal requirement that a human is not going to do. Another one puts the standard license header on any code that’s committed without it. So I never copied the headers, I just let the automation do that. Formatting and linting, same thing.
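The license-header autofix can be sketched the same way. This is an illustration under assumptions, not the automation described above: the header text, the `LICENSE_HEADER` constant and the `ensure_license_header` name are all hypothetical, and it only knows about `#`-style comments.

```python
# Hypothetical header text; a real one would come from your legal team.
LICENSE_HEADER = "# Copyright (c) Example Corp. Licensed under the Apache License, Version 2.0.\n"


def ensure_license_header(source: str, header: str = LICENSE_HEADER) -> str:
    """Return `source` with `header` present, adding it only if it is missing."""
    if header in source:
        return source                    # already there: change nothing
    if source.startswith("#!"):
        # Keep a shebang as the very first line and put the header after it.
        shebang, _, rest = source.partition("\n")
        return shebang + "\n" + header + rest
    return header + source
```

Run on every commit, a function like this changes only the files that lack the header and leaves the rest alone.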

If you care about consistency, put it in code. Please don’t nag a human.

Some of that knowledge can help with keeping code alive

Then there’s all that drudgery of updating versions and code styles and so on: weeding the section of the garden we planted last year and earlier. How much of that can we automate?

We can write code to do some of our coding for us. To find the inconsistencies, and then fix some of them.
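As a rough sketch of "find the inconsistencies, then fix some of them", assume each repository's pinned dependency versions have already been read into a dictionary, and that the team keeps one dictionary of the versions it currently wants. The function names and that data shape are assumptions for illustration.

```python
def find_drift(repos: dict[str, dict[str, str]], targets: dict[str, str]):
    """List (repo, dependency, current, wanted) wherever a pinned version is off target."""
    drift = []
    for repo, deps in sorted(repos.items()):
        for dep, wanted in sorted(targets.items()):
            current = deps.get(dep)
            # Only report dependencies the repo actually uses.
            if current is not None and current != wanted:
                drift.append((repo, dep, current, wanted))
    return drift


def fix_drift(deps: dict[str, str], targets: dict[str, str]) -> dict[str, str]:
    """Return a copy of one repo's dependencies, moved to the target versions.

    Dependencies without a target are left exactly as they were.
    """
    return {dep: targets.get(dep, version) for dep, version in deps.items()}
```

Finding is the cheap part. Whether a bump is safe still depends on the tests and the deployment pipeline, which is why this only fixes "some of them".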

Encapsulate knowledge about -when- to do something

Often the work is more than knowledge of -how- to do something. It is also -when-, and that requires attentiveness. Very expensive for humans. When my pull request has been approved, then I need to push merge. Then I need to wait for a build, and then I need to use that new artifact in some other repository.

Can we make a computer wait, instead of a person?

This is where you need an event stream to run automations in response to.
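To make the idea concrete, here is a toy, in-process sketch of automations that wait on events so a person doesn't have to. It is not how Atomist or Devhose work. The `EventHub` class, the event names and the event fields are all hypothetical, and the handlers only describe the action they would take instead of calling a real API.

```python
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], list[str]]


class EventHub:
    """Routes each published event to the handlers registered for its type."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event_type: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for one event type."""
        def register(handler: Handler) -> Handler:
            self._handlers[event_type].append(handler)
            return handler
        return register

    def publish(self, event: Event) -> list[str]:
        """Run every handler for this event and collect the actions they ask for."""
        actions: list[str] = []
        for handler in self._handlers.get(event["type"], []):
            actions.extend(handler(event))
        return actions


hub = EventHub()


@hub.on("pull_request_approved")
def merge_when_approved(event: Event) -> list[str]:
    # The "when" I used to watch for myself: approval arrived, so merge.
    return [f"merge PR #{event['number']} in {event['repo']}"]


@hub.on("build_succeeded")
def use_new_artifact(event: Event) -> list[str]:
    # A new artifact exists, so propose it to every downstream repository.
    return [f"open PR in {repo} to use {event['artifact']}"
            for repo in event.get("downstream", [])]
```

Publishing `{"type": "pull_request_approved", "repo": "docs", "number": 42}` yields the single action `merge PR #42 in docs`. An event type with no registered handler yields nothing.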

Galo Navarro has an excellent description of how this helped smooth the development experience at Adevinta. They created an event hub for software development and operations related activities, called Devhose. (This is what Atomist works to let everyone do, without implementing the event hub themselves.)

We can move some of that to a platform team.

Yet, every automation we build is code that we need to keep alive.

We can move knowledge across team boundaries, with a platform team. I want my team’s breadth of responsibility to increase, as we keep more software alive, so I want its depth to be reduced.

Team Topologies describes this structure. The business software teams are called “stream aligned” because they’re working in a particular value stream, keeping software alive for someone else. We want to thin out their extrinsic cognitive load.

Move some of it to a platform team. That team can take responsibility for a lot of those automations. And deep knowledge of delivery and operational tooling. Keep the human judgement of what to deploy when in the stream-aligned teams, and a lot of the “how” and “some common things to watch out for” in the platform team.

Some things a platform team can do:

onboarding
onboarding of code (delivery setup)
delivery
checks every team needs, like licenses

And then, all of this needs to stay alive, too. Your delivery process needs to keep updating for every repository. If delivery is event-based, and the latest delivery logic responds to every push (instead of what the repo was last configured for), then this keeps happening.

But keep thinning our platforms.

Platforms are not business value, though. We don’t really want more and more software there, in the platform.

We do want to keep adding services and automation that help the team. But growing the platform team is not a goal. Instead, we need to make our platforms thinner.

There is such a thing as “done”

The best way to thin our software is outsourcing to another company. Not the development work, not the decisions. But software as a service, IaaS, logging, tooling of all sorts — hire a professional. Software someone else runs is tech debt you don’t have.

So maybe Galo could move Devhose on top of Atomist and retire some code.

Because any code that isn’t describing business complexity, we do want to die. As soon as we can move onto someone else’s service, win. Kill it, take it out of production. Then, finally, it’s done.

So yeah. There is such a thing as done. “Done” is death. You don’t want it for your value-producing code. You do want it for all other code you run.

Don’t do boring work.

If keeping software alive sounds boring, then let’s change that. Go up a level of abstraction and ask, how much of this can we automate?

Writing code to change code is hard. Automating is hard.

That will challenge your knowledge of your own job, as you try to encode it into a computer. Best case, you get the computer doing the boring bits for you. Worst case, you learn that your job really is hard, and you feel smart.

Keep learning to keep living. Works for software, and it works for us.