similar, but different
In the software design literature, cohesion is often referred to by a different name: single responsibility. As Sandi Metz explains it:
When everything in a class is related to its central purpose, the class is said to be highly cohesive or to have a single responsibility.[1]
To determine if a method belongs inside a class, Metz suggests posing it as a question. "Mr. Bicycle Gear, what is your ratio," she says, makes sense. "Mr. Bicycle Gear, what is your tire size," does not.
In a cohesive class, then, all the methods are alike. They all extend from the same singular responsibility. They all share a common locus.
We do not need this similarity to be reaffirmed at the method level. We do not ask, "Mr. Bicycle Gear, what is your gear ratio." Instead, a method's name only needs to describe its specific role. When we give something a name, it is to explain how it is different.
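As a minimal sketch of that point, here is a hypothetical Python rendering of the bicycle gear. The field names `chainring` and `cog` are assumptions made for illustration; the only thing that matters is that the method is called `ratio`, not `gear_ratio`.

```python
from dataclasses import dataclass


@dataclass
class Gear:
    chainring: int  # teeth on the front chainring (assumed field name)
    cog: int        # teeth on the rear cog (assumed field name)

    def ratio(self) -> float:
        # The class already says "gear", so the method name only has to
        # say how this method differs from its siblings.
        return self.chainring / self.cog


gear = Gear(chainring=30, cog=15)
# Reads as "gear, what is your ratio?", not "gear, what is your gear ratio?"
print(gear.ratio())
```

Calling it `gear_ratio` would repeat what the receiver already tells us at every call site.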
Consider the names inside a function. A variable named count doesn't tell us what is being counted, but it shouldn't have to. The surrounding lexical scopes (the function, the class, the namespace) should explain most of it. A quick glance at the code directly adjacent to count should explain the rest.
A name like count describes its relationship to the surrounding code, the singular way in which it differs. In all other ways, we must assume, it is similar. If it were at odds with its surroundings, counting something unrelated, then the name would be longer.
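A small illustration, assuming a hypothetical helper named `count_blank_lines`: the function name carries the context, so the local variable can stay short.

```python
from typing import Iterable


def count_blank_lines(lines: Iterable[str]) -> int:
    # Within this scope, `count` can only mean "blank lines seen so far";
    # the enclosing function name supplies what the variable name omits.
    count = 0
    for line in lines:
        if not line.strip():
            count += 1
    return count


print(count_blank_lines(["first", "", "   ", "last"]))
```

If the same function also had to track something unrelated, such as characters read, that second variable would need a longer name to mark it as the odd one out.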
Names, then, should be as short as possible. A long name tells the reader to pay close attention; the code is doing something surprising, something at odds with its surroundings. It signals a lack of cohesion; the disparate components don't quite fit.
Simple names arise from cohesion. They are only possible when each component is similar, but different.
This difference means that a name conveys two things. It describes what its referent is, but also what nearby referents are not. This is why Util is a bad name; it suggests one class has utility, and all the others don't.
This is also why we prefer count to n. We want to partition our variables by role, not by whether they're numeric. Simple names are short, but meaningful.
Single-letter names can, however, have real meaning. There is, for instance, a long-standing convention that i is an index for the outermost loop, j for the first inner loop, and so on. In semiotics, these are called indexical signs; like a finger, they point to their referent. As long as the loops are nearby, these indices tell us everything we need to know.
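The convention can be sketched with a hypothetical `transpose` function, assuming a non-empty, rectangular list of lists. With both loops only a line or two away, i and j need no further explanation.

```python
def transpose(matrix):
    # Assumes a non-empty, rectangular list of lists.
    rows, cols = len(matrix), len(matrix[0])
    result = [[None] * rows for _ in range(cols)]
    for i in range(rows):      # i: index of the outermost loop
        for j in range(cols):  # j: index of the first inner loop
            result[j][i] = matrix[i][j]
    return result


print(transpose([[1, 2, 3], [4, 5, 6]]))
```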
Single responsibility, like cohesion, is a tacit concept. Robert Martin tells us that "[a] class should only have one reason to change,"[2] but Sandi Metz cautions that the "SRP doesn't require that a class do only one very narrow thing or that it change for only a single nitpicky reason." It's left to us to find the golden mean.
Nothing in this post changes that. Similarity and difference are both continuums; only we can decide how much is enough. It does, however, suggest an indirect measure for cohesion. If every name within some lexical scope is short and meaningful, then that scope is probably cohesive.
This is why, as a software designer, I constantly return to the names. Sometimes, they tell me I've done enough. And the rest of the time, they at least tell me if I'm moving in the right direction.
-
[2] Martin 2003, p. 95
This post is an excerpt from my (incomplete) book on software design. For more about the book, see the overview.
A variable named count doesn't tell us what is being counted, but it shouldn't have to.
You may be right, but I wasn't persuaded by this bit. I like making names as self-evident as possible, without requiring the reader to scan through the context to guess at what it means. Tradeoffs, of course.
When you choose a name, you should have some expectations for what the reader already has in their head (I call this the "prefix" here). For instance, I think it's fair to expect that someone reading a method is passingly familiar with the surrounding class. Even if they're not, naming choices that build upon the class will force them to double back and understand it.
I believe this is much simpler (in the information-theoretic sense) than trying to re-explain the class at every level. I'll be expanding on this in the next post.
Interesting point that a name should show how a thing is different. I agree with most of it, except that n is as good as count. i, j, k and n all come from mathematics. Look, for example, at how a sum, product or limit is written: https://en.wikipedia.org/wiki/Riemann_sum . However, n, m and k are common as indices in sequences (https://en.wikipedia.org/wiki/Sequence). 🤷♂️
If you're sure that people reading your code have that association with n, m, and k, then the same line of reasoning used for i, j, and k applies. But since it's impossible to control your audience - teams change over time, etc - I wouldn't be that confident. In my experience, when programmers use n it's to refer to "the number" within some lexical scope. That would be my default expectation, and I would expect the same of the average reader.