Working Skillfully in Complexity (for VDDD)

For: [Virtual open Space] Systems Thinking and Skillful Interaction

20 September, 2023

Keynote by Jessica Kerr, jessitron.com

These are my notes, published for people who were there (or anyone who wants to read them).

Working Skillfully in Complexity

Plan: about half on the technical side of sociotechnical systems, half on social.

I can’t define what Systems Thinking is. It is many things to many people. Personally, my focus area is symmathesy in software teams.

I can tell you some things Systems Thinking is not.

Systems Thinking is NOT linear.

We don’t get a nice clean cause-and-effect story out of it. We get many stories instead. (just as systems thinking is many things)

For instance, say there’s a big production incident: you’re a cannabis distribution company and you’re down for 12 hours on 4/20. Your developers are digging through logs, arguing about which error is the real one. Your infrastructure people are rolling back deploys and restarting processes in hope. Afterward, once the company has lost millions of dollars in sales and ??? in reputation, why did that suffering happen? Who fell down on the job?

When software has grown into multiplicitous distributed systems, it can be really hard to tell.

Systems Thinking is NOT simplifying. It is working with complexity.

Microservices said, shrink the parts until each one is simple!
So their interrelations are more complex than ever.
All our Reasoning About Code ends at the process boundary.
- So does “100%” test coverage, even with property-based testing, because you did not test every possible network partition.

Systems Thinking does NOT predict. Instead, it responds.

We can’t stop incidents from happening.

In software, in our distributed systems, we have the opportunity to make the software easier to respond to, by making it tell us what’s going on.

This is Observability. I want to describe the state of the art in observability–which is distributed tracing–and explain how it works in enough detail that it makes sense.

Now to PowerPoint, to explain how observability works.

With distributed tracing, we see the shape of the forest (overall system health; whether the user experience is degraded) AND some relevant individual trees (requests that failed or were slow, plus representative happy requests).
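To make the forest-and-trees idea concrete, here is a toy sketch in plain Python. It is not how any real tracing library is written (in practice you would use something like OpenTelemetry), and every name in it (`Span`, `child_of`, `render`, `worth_keeping`, the thresholds) is made up for illustration. The idea it shows: each unit of work records a span, every span in one request shares a trace ID, and each span points at its parent, so the whole request can be reassembled as a tree even though it crossed process boundaries. The last function sketches one possible rule for choosing which trees to keep: failed or slow requests, plus a sample of happy ones.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """One unit of work in one service. Field names are illustrative."""
    trace_id: str             # shared by every span in the same request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # None for the root span
    service: str
    name: str
    duration_ms: float
    error: bool = False


def new_id() -> str:
    return uuid.uuid4().hex[:16]


def child_of(parent: Span, service: str, name: str,
             duration_ms: float, error: bool = False) -> Span:
    """Create a span in the same trace as `parent`.

    Passing the trace ID and the parent's span ID along with each call
    is what ties work in separate processes into one story.
    """
    return Span(parent.trace_id, new_id(), parent.span_id,
                service, name, duration_ms, error)


def render(spans: List[Span]) -> List[str]:
    """Rebuild the tree from parent IDs and return an indented outline."""
    children: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            flag = " ERROR" if span.error else ""
            lines.append(f"{'  ' * depth}{span.service}: {span.name} "
                         f"({span.duration_ms:.0f} ms){flag}")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


def worth_keeping(spans: List[Span], slow_ms: float = 1000.0,
                  keep_one_in: int = 100) -> bool:
    """Keep failed or slow traces, plus roughly 1 in `keep_one_in` happy ones.

    The thresholds are assumptions for this sketch, not recommendations.
    """
    root = next(s for s in spans if s.parent_id is None)
    if any(s.error for s in spans) or root.duration_ms >= slow_ms:
        return True
    # Deterministic sample: the same trace ID always gets the same answer.
    return int(root.trace_id, 16) % keep_one_in == 0


# A hypothetical checkout request that touched three services.
root = Span(uuid.uuid4().hex, new_id(), None, "web", "POST /checkout", 1840.0)
payment = child_of(root, "payments", "charge card", 1700.0, error=True)
order = child_of(root, "orders", "INSERT order", 35.0)

for line in render([root, payment, order]):
    print(line)
print("keep this trace:", worth_keeping([root, payment, order]))
```

The outline is the individual tree: one request, which services it touched, where the time went, and where it failed. Counting and grouping over many such traces gives the forest.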

Systems Thinking is NOT “the bigger picture.” It is the current, local picture in its context with its constituents.

We aren’t trying to solve for every team, every software system.

Systems Thinking (or at least, Symmathesy) is NOT abstract. It is particular.

Sure, there are patterns of forces that have circularities, and so fall outside of linear, “rational” thought. These can be useful. But abstraction is not the point. Generality is not the point. Understanding is the point: understanding the system we are working in and on.

So how do we see inside this system?

We ask people. Individually. Listen and watch. “People have a very rich debugging interface, not totally reliable.”

Ask, how do you feel about this? Can you say more? How do you think this works?

And it’s OK for different stories to contradict.

Aim for shared language within scope. (But never universal language, not a thing.)

Systems Thinking is NOT universal. It is what works right here.

Here’s an example: the ACM published an interesting article about robots: should they have rights (like freedoms) or rites (like rituals)? That’s a little confusing and also I don’t care about the robots, I care about people right now. But the article raises some important points that apply to people.

“Rights” as in the US Declaration of Independence, ‘we are endowed by our Creator’ (different problem, and I don’t have time to talk about how Bible stories presage our need to blame a human in incident reports; Sidney Dekker has a great paper about this) ‘with certain inalienable rights’

… are a useful framework for interacting with strangers.

When we have a group of people who interact regularly, who get things done together, “rights” are not useful. Much more useful is “ritual”–how we interact. From how we greet each other, to how we share information, to how we run a regular meeting. These little ceremonies show civility, they show mutual humanity to each other. They’re crucial to the local system.

And they’re gonna be different everywhere!

Because we want diversity, right? This is where we evolve team norms and rituals particular to the people on the team.

Keep company norms limited.

Where interdependencies exist, encourage rituals: meetings, Slack conversations, in-person gatherings where possible, something. This is an intervention point. We can be deliberate about it; as leaders and influencers, we can use it as a slow leverage point.

Systems Thinking does NOT predict or guarantee. It responds.

(again)

For any given incident or disruption, can we say “this will never happen again”? No.

Can we say whose fault it is? No.

We don’t get to find some proportionate moral failure behind every occurrence of suffering.

We don’t get retribution. (This is from Sidney Dekker’s paper.)

Instead, respond to suffering with compassion. Holy cow the team had a rough day on that 4/20 incident. Let’s put feature work on hold for a while and think about this. Work through–not a root cause analysis–ways to make the system easier to see into (improve observability) and use that to notice problems when they’re smaller. Improve our interactions with the wider world (maybe the trigger was a hiccup in payment processing, can we handle that better?) and between our participants (shared view of what’s wrong).

Creating that shared view might require digging into the technical weeds of how do the OpenTelemetry libraries interact with ESM Modules on Node version 16 and our particular web framework. From deep in the weeds of code you never wanted to look at, back up to how this lets me see what my software is doing “oh it doesn’t make that many database calls” “oh yes it does, the trace is reality”, back up to influencing all teams to send uniform telemetry so we can talk about this together, so that next incident we can find the problem in fifteen minutes instead of ten hours.

Get to the right tree, look at the whole thing, diagnose it.

That kind of zooming in and out, always grounded in the present situation and needs, that is systems thinking (symmathesy). It is hard.

Systems Thinking is NOT for everyone. And that’s OK.

At some point ya gotta descend into a limited scope and do the work. Or somebody does. If we can support people in that focus, while making sure they see the relevant parts of the bigger and smaller pictures, we contribute a lot.