Avoiding inline versioning of code when using an SCM

Saturday, August 8, 2015

An SCM tool such as Git, Mercurial, Subversion, or even the venerable CVS makes it easy to keep a working history of an application. An added bonus of using such a tool is the ability to see what changed in a particular commit (e.g. git diff), who changed a particular block of code and when (e.g. git blame), and so on.

This, of course, makes the assumption that developers are letting the SCM do the versioning. Whether because a change is temporary or just untrusted, an unfortunately common idiom is to comment out a block of code, replacing it with a completely new and different block of code in the same commit. Since the SCM already contains this code, the commented-out block is redundant, and can even break some of the more useful features the SCM may offer.

Example application

The next few sections make use of a Git bundle of a simple node.js application for demonstration purposes. The application being implemented is a prime number generator.

Revision 1: A prime number generator

The first revision of the prime counter program (563de5c) consists of an isPrime function, which calculates whether a given number is prime, and a block which finds a given number of primes. The invocation is pretty simple: call node primes _n_ with the desired number of prime numbers n as an argument.

Revision 2: Adding error handling

In revision 2 (9adf8b0), some basic error handling is added. This is a purely additive change. A usage statement is printed if the incorrect number of parameters is passed, and an appropriate error is displayed if fewer than one prime is requested. This is simple to visualize in git diff:

C:\scm_comments>git diff
diff --git a/primes.js b/primes.js
index c72c70f..60014a1 100644
--- a/primes.js
+++ b/primes.js
@@ -7,6 +7,15 @@ function isPrime(x) {
        return true;
 }

+// Determine if the program was called correctly.
+if(process.argv.length != 3) {
+       console.error("Usage: " + process.argv[0] + " " + process.argv[1] + " (number of primes)");
+       process.exit(1);
+} else if(process.argv[2] < 2) {
+       console.error("Number of primes to calculate must be at least one.");
+       process.exit(2);
+}
+>
 // Find a given number of primes.
 primeCounter = 1;
 for(var i = 2; primeCounter <= process.argv[2]; ++i) {

Revision 3: Making a better prime number generator

In revision 3 (fe9046b), better logic for determining whether a number is prime (via Fermat’s little theorem) is added to the program, changing the implementation of isPrime. In this case, in an attempt to preserve the “old” algorithm, the method body is commented out, rather than overwritten. Looking at git diff, it’s not immediately obvious what’s happened to the original algorithm - bounding comments have been added, but do not show any change to the rest of the block:

C:\scm_comments>git diff
diff --git a/primes.js b/primes.js
index 60014a1..4d776e8 100644
--- a/primes.js
+++ b/primes.js
@@ -1,10 +1,14 @@
 // Determine whether a number is prime.
 function isPrime(x) {
-       for(var i = 2; i < x; ++i) {
+       /* for(var i = 2; i < x; ++i) {
                if (x%i==0) return false;
        }

-       return true;
+       return true; */
+
+       var anmn = Math.pow(2, x) % x;
+       var amn = 2 % x;
+       return anmn == amn;
 }

 // Determine if the program was called correctly.

Likewise, other valuable tools such as git blame will not reflect that, due to the change, code in an older commit is no longer actually being run. The ease with which an interpreter or static analyzer can ignore a multi-line comment of this sort doesn’t translate at all to human beings, be it while code reviewing, debugging, or otherwise.

Avoiding SCM pitfalls

Because this antipattern is not a code problem, code reviews aren’t necessarily sufficient to remedy this post-facto. Should such behavior be called out at a code review, there are several options for fixing the acute, checked-in issue:

Recommend removal of the offending commented-out code block via a new checkin. This further exacerbates the problem, as now two contextless changes have been made to this block.
Amend commits and force-push to the repository. Rewriting history is generally a bad idea. It’s up to process to determine whether this sufficiently constitutes extenuating circumstances. In the case of centralized systems like Subversion, this is not necessarily an option to begin with.
Create a new branch altogether and merge changes on top of it, commit for commit, modifying the appropriate commit as necessary. By deleting the old branch, it will eventually be garbage collected, and the master branch’s history can stay clean. Again, however, this is not particularly relevant to a centralized SCM.

The best way to avoid the issue is to learn what features are available in whichever SCM is being used, and then make use of those features. This, in turn, avoids reliance on workarounds such as this to easily revert or navigate back to old code.