B669: Personalized Data Mining and Mapping
The Keypuncher's Assumption


When I was a kid, programming meant keypunching cards. In those ancient days it was an insane effort to change even a single line of code, because it meant repunching the entire card (not to mention that there was no such thing as "backspace").

Mistyping a keyword wasn't the trivial matter it is today with our color-coded integrated development environments and dynamic linking and loading. Back then even the tiniest change meant resubmitting the job and waiting half an hour (sometimes even days) because everything was batch-oriented. So my earliest coding style used variable names like i and j and x and y.

I didn't think too much about it after learning to code that way because in those antediluvian days pretty much all the books also used that style of naming variables. It was only after I started learning about structured design that I began to give it any thought.

Nowadays, decades later, I regularly name my variables numberOfRows, partialTotal, transformed3DMatrix and things like that. Why the difference?

Well, with variables like i and x and so on, I regularly found myself writing the following kinds of comments (here cast in terms of Java, but this was in the days of Fortran, Pascal, PL/I, Basic, Cobol, Snobol, Watfor, APL, and C):

//multiply the number of rows by the default number of columns
i = j * 15;
...

//for each pixel, transform the pixel using the new colormap
for (int k = 0; k < i; k++)
	...

Now, I'm continually looking for ways to improve my code. After reading excellent books like The Elements of Programming Style, Code Complete, Writing Efficient Programs, Writing Solid Code, and so on, I realized that what I was doing was just plain stupid.

Instead, I should have been writing the following:

numPixels = numRows * 15;
...

for (int pixel = 0; pixel < numPixels; pixel++)
	...

The compiler doesn't care once it's built its symbol table, and while it may take more time to type more descriptive names, it actually takes much less time overall because:


* I don't have to write those pointless explanatory comments anymore, since the meaning is clear from the code itself.

* I'm much less likely to make a coding error when variable names are this explicit, and that has incalculable benefit when it comes to bugs.

* Other programmers can follow my code much more easily, so the code is easier to share and to modify.

Since I'm on the topic of variable names, let me mention a related point that shows up in the snippet above: the magic number 15 that's been hardcoded into the program. This is also bad. It makes the program less flexible, and I can easily forget what the number means when I'm deep inside the code, or five months after I first wrote it. Is it the 15 that stands for the default number of columns, or the 15 that stands for the number of pictures to display? Oops.

Nowadays I would produce something like the following:

private static final int DEFAULT_NUM_COLUMNS = 15;
...

numPixels = numRows * DEFAULT_NUM_COLUMNS;
for (int pixel = 0; pixel < numPixels; pixel++)
	...

The character count (and therefore the initial typing time) has gone up a little, but the code is much cleaner, much easier to read, and much easier to modify than the original. Further, note that the number of characters really isn't that much more when you count the silly comments the previous version needed.
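Pulled together into a compilable unit (a minimal sketch; the class name, the method wrapping, and the sample row count are my own additions for illustration), the refactored snippet looks like this:

```java
// PixelCounter.java -- a sketch assembling the refactored snippet above.
// The class and method names are illustrative, not from the original text.
public class PixelCounter {
    private static final int DEFAULT_NUM_COLUMNS = 15;

    // The descriptive name replaces the old "multiply the number of
    // rows by the default number of columns" comment.
    public static int computeNumPixels(int numRows) {
        return numRows * DEFAULT_NUM_COLUMNS;
    }

    public static void main(String[] args) {
        int numPixels = computeNumPixels(4);
        for (int pixel = 0; pixel < numPixels; pixel++) {
            // transform each pixel using the new colormap here
        }
        System.out.println(numPixels); // prints 60
    }
}
```

Note that every name in the unit carries its own documentation; the only comment left is the one marking where real work would go.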

This little lesson extends naturally to methods, classes, abstract classes, interfaces, and packages. Don't call your class MyClass or Display; name it descriptively: ParenthesizedExtractor, AnimationSequenceController.
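As a sketch of what that looks like in practice (the fields and method here are invented for illustration; only the class name comes from the text above):

```java
// Instead of "class Display { void go() {...} }", name the class and
// its methods after what they actually do.
public class AnimationSequenceController {
    private final int totalFrames;
    private int currentFrameIndex = 0;

    public AnimationSequenceController(int totalFrames) {
        this.totalFrames = totalFrames;
    }

    // The method name says what advancing does; no comment needed.
    public int advanceToNextFrame() {
        currentFrameIndex = (currentFrameIndex + 1) % totalFrames;
        return currentFrameIndex;
    }
}
```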

A lot of our programming problems stem from the "keypuncher's assumption": we all do things a certain way because we did them that way when we first learned to code, and those ways made sense to somebody at the time. Maybe it was hard to do things any other way (as in the keypunching example), maybe that's simply how you were taught, or maybe someone once decided it was more efficient for the machine.

The point is that, unlike practitioners in almost every other discipline, good programmers can't afford to be that slack, because the technology we use is changing constantly. Assumptions that were true five years ago are ancient history today. If you never reexamine why you're doing whatever it is that you're doing, you're going to be ancient history right along with it.