complexity as entropy
In a random dataset, there are no internal relationships; with each element, our explanation must begin anew.
In the previous post, I characterized Ron Jeffries' meandering approach to software design as "a shrug in the face of entropy." Some readers seem to have taken this as a strange, flowery metaphor. It wasn't.
In this newsletter, our analytic framework is borrowed from information theory: simplicity is a fitness between content and expectation. When we say software is complex, we mean it's difficult to explain; we need to bridge the distance between our audience's expectations and the realities of our software. This distance — the subjective complexity of our software — can be measured in bits.
In some literature, this metric is called surprisal. This is a good name; it reminds us that the complexity of our code varies with its audience. But there is a more common term for the number of bits in a message, borrowed from thermodynamics: entropy.
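To make "measured in bits" concrete, here is a small sketch of surprisal and entropy (mine, not from the original post; the probability distributions are invented for illustration):

```python
import math

def surprisal(p):
    """Bits needed to convey an event that occurs with probability p."""
    return -math.log2(p)

def entropy(dist):
    """Expected surprisal: the average number of bits per symbol."""
    return sum(p * surprisal(p) for p in dist if p > 0)

# A fair coin is maximally surprising for a two-outcome source: 1 bit per flip.
print(surprisal(0.5))

# A biased coin is more predictable, so its flips are cheaper to describe.
print(entropy([0.9, 0.1]))  # roughly 0.47 bits per flip
print(entropy([0.5, 0.5]))  # 1 bit per flip
```

The less our audience expects an outcome, the more bits we need to explain it; entropy is just that cost averaged over everything we might have to say.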
To understand the relationship between information and disorder in a physical system, consider why a sequence of random numbers cannot be compressed. When we compress data, we exploit its internal relationships; we use one part to describe another. You can see this in run-length encoding, or in how gzip encodes duplicate strings as back-references. But in a random dataset, there are no internal relationships; with each element, our explanation must begin anew.
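The difference is easy to demonstrate with an off-the-shelf compressor. This sketch (an illustration, not from the original post) feeds zlib a repetitive byte string and an equal-length random one:

```python
import os
import zlib

# Structured data: one part describes another, so deflate's
# back-references collapse it to a handful of bytes.
structured = b"abcdefgh" * 1000

# Random data: no internal relationships to exploit.
random_data = os.urandom(8000)

print(len(zlib.compress(structured)))   # far smaller than 8000 bytes
print(len(zlib.compress(random_data)))  # slightly larger than the input
```

The random input not only fails to shrink; the container format's overhead makes it marginally bigger. Every byte must be stated in full, because nothing earlier in the stream can stand in for it.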
And this is why, in this newsletter, we spend so much time on the structures that bind the disparate parts of our software together. These structures compress our code, amplify our explanations. They make our software simpler.
entropy as decay
While entropy is an instantaneous measure, it's often used as a metonym for the second law of thermodynamics: in physical systems, entropy increases over time. When we talk about entropy, it connotes more than temporary disorder; it has the weight of inevitability, an inexorable decline.
This also applies to entropy in our software. Consider how, in the previous post, Jeffries' code was full of tiny, needless complexities: muddled naming conventions, a single mutable method on an otherwise immutable class. These small inconsistencies make the existing relationships harder to see, easier to ignore. If Jeffries continued to tinker with his solver, we'd expect this complexity to compound.
We cannot prevent our software from being misunderstood, even by ourselves. Its structures will, inevitably, be weakened as our software changes. Good design requires continuous maintenance.
According to Ron Jeffries, his pinhole approach to software design provides this maintenance. This is demonstrably false; despite spending months iterating on a few hundred lines of code, the inconsistencies remain. But Jeffries is not alone in his belief. Many writers espouse a similar approach to software maintenance, and all of them should be approached with skepticism.
broken windows, clean code
Shortly before co-authoring the Agile Manifesto, Andrew Hunt and David Thomas wrote The Pragmatic Programmer. In the first few pages, they introduce the metaphor of broken windows:
One broken window, left unrepaired for any substantial length of time, instills in the inhabitants of the building a sense of abandonment.1
And the same was true for software. "We've seen," they said, "clean, functional systems deteriorate pretty quickly once windows start breaking."
This metaphor was borrowed from an article in The Atlantic, published in 1982. It described how a social psychologist named Philip Zimbardo had parked a car in Palo Alto. He left the hood up and removed the license plate, as an invitation for passersby to do something antisocial. He observed it for an entire week, but nothing happened.
"Then," the article tells us, "Zimbardo smashed part of it with a sledgehammer. Soon, passersby were joining in. Within a few hours, the car had been turned upside down and utterly destroyed."2 It is implied, but never stated, that the car was destroyed because of a simple broken window.
This is not what actually happened3, but few people noticed or cared. The article justified a belief that most readers of The Atlantic already held: systems decay from the bottom-up. And this, in turn, suggests a hopeful corollary: systems are restored from the bottom-up.
This preexisting belief was a reflection of both the publication and the time. The Atlantic is a center-right magazine, and individualism was the ascendant political philosophy of the 1980s. As Margaret Thatcher put it:
There is no such thing as society. There is living tapestry of men and women and people and the beauty of that tapestry and the quality of our lives will depend upon how much each of us is prepared to take responsibility for ourselves.4
The moral of the broken window fable was one of personal responsibility. If we have problems — in our neighborhood or in our software — we should look to ourselves. Systems are just the sum of our choices.
We can find these same themes in Robert Martin's paean to hygiene, Clean Code:
Have you ever waded through a mess so grave that it took weeks to do what should have taken hours? Have you seen what should have been a one-line change, made instead in hundreds of modules?5
"Why does this happen to code?" he asks. Is it external constraints? Is it poorly articulated goals? No, says Martin, this only happens to us when "[w]e are unprofessional." If our software is complex, it's only because we haven't kept it clean.
Martin does not offer a concise definition of clean code. His book, he says, will explain it "in hideous detail."6 He does, however, ask some of his friends to define the term for him. Ward Cunningham provides a familiar answer:
You know you are working on clean code when each routine you read turns out to be pretty much what you expected.7
Martin calls this "profound." But where are these expectations set? His rules are syntactic, focusing entirely on the text of our software.8 It is implied, then, that our expectations arise from that text. Higher-level structures are just the sum of our syntactic choices.
This implies that code ought to be self-documenting. And indeed, Martin has repeatedly warned that comments are best avoided. If our software's paratext has anything to teach us, he says, that only means its text has fallen short:
It's very true that there is important information that is not, or cannot be, expressed in code. That's a failure. A failure of our languages, or of our ability to use them to express ourselves. In every case a comment is a failure of our ability to use our languages to express our intent.
And we fail at that very frequently, and so comments are a necessary evil — or, if you prefer, an unfortunate necessity. If we had the perfect programming language (TM) we would never write another comment.9
As you read through Clean Code, you begin to understand the ideal Martin is chasing. Our classes should be small, with an eye towards reuse in other applications. Our functions should be "just two, or three, or four lines long."10 Martin wants his software to resemble Thatcher's atomized society: a collection of components which are small, clean, decoupled.
But this atomization doesn't make our software simpler. It leads, instead, towards entropy; without structures, our software becomes little more than random numbers.
When we maintain our software, we are trying to preserve its simplicity. The cleanliness of our code is, at best, a small part of that simplicity. To focus entirely on broken windows — to dismiss the rest as incidental, regrettable — is less than a shrug. It's closing our eyes, and hoping the rest will take care of itself.
- According to Zimbardo's report, his research assistants destroyed the car on their own. After getting over their "considerable reluctance to take that first blow," he said, they found that "the next one comes more easily, with more force, and feels even better." For Zimbardo, this "awakening of dark impulses" in his students was a glimpse of the violence that lurked beneath polite society. Two years later, he would return to these themes in the now-discredited Stanford Prison Experiment. ↩
- Martin 2009, p. 5 ↩
- ibid, p. 12 ↩
- ibid, p. 11 ↩
- Martin went on to publish Clean Architecture, Clean Craftsmanship, and Clean Agile, but this seems to be largely a branding exercise. The themes in Clean Code are rarely mentioned, and never built upon. ↩
- Martin 2009, p. 34 ↩