Saturday, February 17, 2007

Code Read 5 - Knuth's "Structured Programming with Go Tos"

Code Reads 5 takes us from Dijkstra to Knuth, and his humorously titled "Structured Programming with Go To Statements". In it, Knuth addresses how the field seems to have missed the real point of Dijkstra's structured programming, and instead focused on mindlessly eliminating "go to" statements. Knuth quotes Hoare to argue that the most important point is "the systematic use of abstraction to control a mass of detail", not the elimination of a particular programming tool.

The main topic of Knuth's article is not abstraction, but rather examining the uses of "go to" that improve a program, and how newer (in 1974) language constructs (like break statements and case statements) can almost always be used to express the meaning of those "go to" statements. In fact, he ends by saying that although he would personally like to keep "go to" available in higher level languages, he would probably never need to use it.

In general, this piece feels a bit dated, which is a measure of its own success - Knuth said that his aim was to put the "go to" controversy to rest by showing it to be moot, which, except for a few lifelong partisans, is where it is now. And many of the structures he discusses (using break to exit from loops, early versions of case statements) are common in almost all languages. So it makes a fitting end to Code Read's exploration of the roots of language semantics for abstraction.

What I found most interesting was his study of optimization, which is the process of changing a program so that it does its work more efficiently. Almost all of his examples of beneficial "go to" statements are optimizations. Here he makes some very relevant points, but I think one of his observations is particularly dated.

His most important point is something which many beginning programmers still do not understand: "Premature optimization is the root of all evil". We should write our programs in the clearest way, he says, so they will almost certainly be correct. Then, if there are performance problems, we should analyze the code to find where those problems are, and only optimize those sections. Optimizing first is a surefire way to render a program difficult to understand and likely to contain bugs.
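As a minimal sketch of this measure-first workflow in Python (the file name and function names here are just placeholders, not anything from Knuth's paper), you might write the program plainly and then let the profiler tell you what, if anything, deserves tuning:

```python
import cProfile
import pstats


def load_records(path):
    # Placeholder: read the input in the clearest way possible.
    with open(path) as f:
        return [line.strip() for line in f]


def process(records):
    # Placeholder for the real work of the program.
    return sorted(records)


def main():
    records = load_records("names.txt")
    process(records)


if __name__ == "__main__":
    # Profile first: only the functions that dominate this report
    # are candidates for hand optimization.
    cProfile.run("main()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
```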

He goes on to give some good examples of fine-grained optimization techniques, things that all programmers should be aware of but will only rarely need to use.

Where I feel his view has become outdated is in how he treats small optimizations. "In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal". I think this is no longer true for computer science.

People often ask why building software is not like building bridges. This is one reason. Nobody expects a footbridge to carry a truck. Yet we routinely use the same software to handle hundreds of items that we use to handle hundreds of millions of items. So while a 12% improvement in bridge strength is something meaningful, a 12% improvement in software performance is not nearly so impressive. (In very specific cases, it might be meaningful, but generally it's not.)

Suppose you're writing part of a program which is interactive - that is, a person will be waiting for the result. And suppose that it takes 60 seconds. A 12 percent improvement is just over 7 seconds (0.12 × 60 s = 7.2 s), which means the optimized code still takes almost 53 seconds. Still too long to wait, and not long enough to get a cup of coffee. In fact, I doubt most people would notice the difference, since we are much more susceptible to psychological factors when waiting for computers. (Just one example of this.)

In Knuth's day, people had a different relationship with computers, so this kind of performance gain was more relevant. Not so today. In fact, these days algorithm performance is typically described using "big O notation", which loosely quantifies how an algorithm's running time grows as the size of its input changes. And it provides what is by and large a more meaningful estimate of the performance of an algorithm.

To understand this, it is helpful to use an example. Suppose we want to compare three ways of sorting a list containing 1000n names (so for 1000 names n = 1, for 2000 names n = 2, etc.). We analyze the algorithms as Knuth did in his paper, and come up with the following formulas. The first method takes 6(2ⁿ) milliseconds, the second takes 3(2ⁿ) + 15n milliseconds, and the third takes 25n² milliseconds. From a simple test of 1000 names, with n = 1, we get:

Method 1: 12 milliseconds.
Method 2: 21 milliseconds.
Method 3: 25 milliseconds.


We might decide that method 1 is the best way to go. But we'd be wrong if we wanted to sort bigger lists. Using big O notation makes that clear.

In big O notation, we ignore any constant multiplication, and only take the most rapidly increasing term. So we say method 1 is O(2ⁿ), method 2 is also O(2ⁿ), and method 3 is O(n²). This is the essential information about which is faster, and since 2ⁿ grows much faster than n² for large n, we know that the third method is actually the fastest in general. For example, when n = 10:

Method 1: 6.1 seconds.
Method 2: 3.2 seconds.
Method 3: 2.5 seconds.

This is exactly reversed from what we had before - method 3 is the fastest and method 1 is the slowest. And things get much worse for methods 1 and 2 when n = 100:

Method 1: 240,000,000,000,000,000,000 years
Method 2: 120,000,000,000,000,000,000 years
Method 3: 250 seconds

At this scale methods 1 and 2 are completely useless, leaving method 3 as the only option.
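For anyone who wants to check these figures, here is a small Python sketch that evaluates the three timing formulas at n = 1, 10, and 100. (The formulas are the made-up ones from the example above, not measurements of real sorting code.)

```python
# Timing formulas from the example above, in milliseconds,
# for a list holding 1000 * n names.
def method1(n):
    return 6 * 2 ** n            # O(2^n)

def method2(n):
    return 3 * 2 ** n + 15 * n   # O(2^n)

def method3(n):
    return 25 * n ** 2           # O(n^2)

MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25

def pretty(ms):
    # Report in whichever unit keeps the number readable.
    if ms < 1000:
        return f"{ms} milliseconds"
    if ms < MS_PER_YEAR:
        return f"{ms / 1000:,.1f} seconds"
    return f"{ms / MS_PER_YEAR:,.0f} years"

for n in (1, 10, 100):
    print(f"n = {n}:")
    for name, f in (("Method 1", method1),
                    ("Method 2", method2),
                    ("Method 3", method3)):
        print(f"  {name}: {pretty(f(n))}")
```

Running it reproduces the three little tables above: milliseconds at n = 1, a few seconds at n = 10, and astronomical ages for methods 1 and 2 at n = 100.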

To tie all of this back to Knuth's paper, almost all of his cases for the use of "go to" statements involve optimizations which do not change the big O performance of the program at all. So while there are programs out there where this is important (certain key inner loops in operating systems or scientific software), most of us will always be able to rely on library calls where someone else has already done this kind of super-tight optimization, and we should focus solely on selecting the appropriate abstractions for our algorithms. That is, we need to select abstractions that give us the best big O performance. And using "go to" or not using "go to" has nothing to do with this kind of optimization.


© 2007 Andrew Sacamano. All rights reserved.