Site icon Jessitron

Weakness and Vulnerability

Weakness and vulnerability are different. Separate the concerns: [1]

Vulnerability is an openness to being wounded.
Weakness is inability to live through wounds.

In D&D terms: vulnerability is a low armor class, weakness is low hit points. Armor class determines how hard it is for an enemy to hit you, and hit points determine how many hits you can take. So you have a choice: prevent hits, or endure more hits.

If you try to make your software perfect, so that it never experiences a failure, that’s a high armor class. That’s aiming for invulnerability.

Thing is, in D&D, no matter how high your armor class, if the enemy makes a perfect roll (a 20 on a d20, a twenty-sided die), that’s a critical hit and it strikes you. Even if your software is bug-free, hardware goes down or misbehaves.

If you’ve spent all your energy on armor class and little on hit points, that single hit can kill you.

Embrace failure by letting go of ideal invulnerability, and think about recovery instead. I could implement signal handlers, and maintain them, and this is a huge pain and makes my code ugly. Or I could implement a separate cleanup mechanism for crashed processes. That’s a separation of concerns, and it’s more robust: signal handlers don’t help when the app is out of memory, a separate recovery does.

In the software I currently work on, I take the strategy of building safety nets at the application, process, subsystem, and module levels, as feasible.[3] Then while I try to get my code right, I don’t convolute my code looking for hardware and network failures, bad data and every error I can conceive. There are always going to be errors I don’t conceive. Fail gracefully, and pick up the pieces.

—–
An expanded version of this post, adding the human element, is on True in Software, True in Life.

—–
[1] Someone tweeted a quote from some book on this, on the difference between weakness and vulnerability, a few weeks ago and it clicked with me. I can’t find the tweet or the quote anymore. Anyone recognize this?
[3] The actor model (Akka in my case) helps with recovery. It implements “Have you restarted your computer?” at the small scale.

Exit mobile version