Sunday, April 29, 2007

Code Read 9 - John Backus and Functional Programming

The 1977 Turing Award went to John Backus, and in his Turing Lecture, "Can Programming Be Liberated from the von Neumann Style?", he made a vigorous case against traditional "von Neumann" programming (VNP), and for functional programming (FP). Unfortunately, the vigorous rhetorical style of this paper has infused the discussion of FP, resulting in countless flame wars over the last 30 years. The overly broad and weakly supported statements that make up the bulk of the discussion often boil down to these:

"Traditional programming languages are clumsy. FP will solve all your problems, if only you are smart enough to use it."

versus

"FP is a toy. Nobody has done any real work with it, so it must be useless."

Both sides are throwing out the baby with the bathwater, and the good ideas of FP are tainted by their association with their more problematic brethren.

Backus's Paper

Let's start with Backus's paper. From the beginning, he clearly has an ax to grind - he describes the foundations of conventional programming languages as "complex, bulky, not useful", and says that conventional programs, while "moderately clear, are not very useful conceptually". FP programs, he claims later in the paper, "can accommodate a great range of changeable parts, parts whose power and flexibility exceed that of any von Neumann language so far." Yet he offers very few points to support these statements.

Expressive Power and Simplicity

One of his recurrent themes is the superior "expressive power" of FP. When he does define "expressive power", it is not clear that FP is any more expressive than VNP even under his own definition. That definition revolves around functions that are expressible in one language and not in another, yet he offers no supporting example.

He also asserts that VNP languages are necessarily complex, requiring new features to be added to the language itself, rather than implemented using existing language features. As one of the creators of many VNP languages, he's more than entitled to make such statements. But newer languages (like C, and many OO languages) have reversed this trend by simplifying the underlying language and making it easier to define new functions. They do this without resorting to FP, so this argument also falls apart.

An Algebra of Programs

The most coherent argument in the paper, and also the longest, is that FP allows a relatively straightforward mathematical theory of programming, which is much more difficult in VNP. This is true, but I'm not convinced it is that important. He proves the equivalence of two algorithms for matrix multiplication, and also discusses using this algebra of programs to optimize algorithms safely.
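To give a flavor of the kind of algebraic law Backus has in mind (this toy is my own illustration, not his matrix-multiplication proof), here is the classic "map fusion" law: mapping g over a list and then mapping f gives the same result as mapping the composition of f and g once. Laws like this let a program be rewritten into an equivalent, sometimes cheaper, form by purely formal manipulation.

```python
# Toy illustration of an "algebra of programs" law: map fusion.
# fmap(f, fmap(g, xs)) == fmap(compose(f, g), xs)
def compose(f, g):
    # function composition: (f . g)(x) = f(g(x))
    return lambda x: f(g(x))

def fmap(f, xs):
    # apply f to every element of the list
    return [f(x) for x in xs]

f = lambda x: x + 1
g = lambda x: x * 2
xs = [1, 2, 3]

two_passes = fmap(f, fmap(g, xs))    # traverses the list twice
one_pass = fmap(compose(f, g), xs)   # fused form: traverses it once
```

The rewrite from `two_passes` to `one_pass` is the sort of optimization that can be justified formally, without knowing anything about what f and g actually do.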

However, formal proofs of programs are only marginally useful: they usually take far longer to do well than writing the algorithm itself, and they are just as prone to misunderstandings of the problem being addressed. So while it may be easier to prove that an FP algorithm does what it tries to do, it is just as hard to prove that what it tries to do correctly solves the underlying problem.

Optimization is important, but as I've argued before, the most effective optimizations involve replacing the algorithm with a newer one that is equivalent not in a formal way, but only by virtue of a deeper understanding of the problem. For example, I once had to optimize an algorithm that went through a list doing pairwise operations on each element and all previous elements (e.g. f(1,2), f(1,3), f(2,3), f(1,4), f(2,4), f(3,4), ...). This approach takes O(n²) operations, so the implementation ran in seconds for n = 100, but took hours for n = 1,000 and days for n = 10,000. No amount of code optimization was going to improve that. But when I realized there was a way to operate on each element and only a few previous elements, the program went from O(n²) operations to O(n), which meant a run time of a few minutes even for n = 100,000. No formal manipulation of the algorithm would ever have gotten me there; only the insight that most of the operations were unnecessary by virtue of the problem itself, not by virtue of the comparison algorithm.
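The shape of that optimization can be sketched as follows. The pairwise operation, the window size, and the data are all hypothetical stand-ins for the real problem; the point is that the O(n) version is only correct because the problem guarantees that distant pairs can never matter, a fact no formal rewriting of the O(n²) version could supply.

```python
# Hypothetical sketch of the anecdote above. `interact` stands in for the
# real pairwise operation; WINDOW is the problem-specific bound on how far
# back an element can matter.
WINDOW = 3

def interact(a, b):
    # placeholder pairwise operation: "close" elements interact
    return abs(a - b) < 2

def all_pairs(items):
    # O(n^2): every element against all previous elements
    hits = 0
    for i in range(len(items)):
        for j in range(i):
            if interact(items[j], items[i]):
                hits += 1
    return hits

def windowed(items):
    # O(n): only the last WINDOW predecessors. Valid only when the
    # problem itself guarantees more distant pairs never interact.
    hits = 0
    for i in range(len(items)):
        for j in range(max(0, i - WINDOW), i):
            if interact(items[j], items[i]):
                hits += 1
    return hits
```

On data that satisfies the guarantee (e.g. a sorted list with spacing of at least 1), the two functions agree, but only `windowed` scales.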

Oddly enough, when talking about this algebra of programs, Backus himself says that "severely restricted FP systems" are easier to analyze, "as compared with the much more powerful classical systems." He seems to be implying that the ease of analyzing FP systems justifies their restrictions, but I think the consensus view today is the other way. Outside of academic circles, formal analysis has never caught on.

Rethinking Computers from the Ground Up

One of the most visionary themes in his paper is the proposal that we change the way we relate to state (aka history, or storage). VNP relies on making many small, incremental changes to the stored data in a computer, so the overall state of that data is consistent and meaningful only rarely - usually it represents a job half done. Moreover, subsequent or concurrent execution runs the risk of operating on that inconsistent data, the hazard known as "side effects". Instead of operating piecewise on small bits of data, leaving the overall state inconsistent during the calculation, Backus proposed not just a programming style but a change in hardware that prevents these side effects. Not only does this make the code easier to analyze, it becomes trivial to split larger programs into smaller pieces of work and run those pieces in parallel. Indeed, this idea seems to be the root of many of the "FP" successes - for instance its use in telecommunications software, and in Google's MapReduce. However, I put FP in quotes, because this idea is a fringe benefit of FP and can easily be implemented in traditional languages (in the right context) - even Ericsson's Erlang projects make extensive use of C libraries. Ironically, the most successful idea of FP - a formalized way to eliminate side effects - seems to be a side effect itself.
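A small sketch of why side-effect-free code splits so easily: when the per-chunk work is a pure function (it depends only on its argument and mutates nothing shared), the pieces can be computed in any order, or on separate cores or machines, and then combined - the MapReduce pattern. The word-count job and the chunking here are illustrative, not taken from Backus's paper.

```python
# Pure map step plus pure combine step: the essence of why side-effect-free
# code parallelizes trivially.
from functools import reduce

def count_words(chunk):
    # pure: result depends only on `chunk`; nothing outside is mutated
    counts = {}
    for word in chunk.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def merge(a, b):
    # pure combiner: builds a new dict instead of updating shared state
    merged = dict(a)
    for word, n in b.items():
        merged[word] = merged.get(word, 0) + n
    return merged

chunks = ["to be or", "not to be"]
# Because count_words has no side effects, these calls could be dispatched
# to separate processes or machines with no locking and no risk of one
# piece observing another's half-done state.
partials = map(count_words, chunks)
total = reduce(merge, partials, {})
```

The same structure works in a conventional imperative language, which is exactly the point made above: the benefit comes from the discipline of avoiding shared mutable state, not from FP as such.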

Conclusions

Since Turing and Church proved that "Turing Machine Computable" and "Lambda Calculus Computable" are the same thing, and thus FP and VNP can perform exactly the same computations (with suitable extensions to FP), there are two basic questions. Is either one inherently better than the other in:
1) simplicity/ease of writing code (fewer bugs, easier to understand, better at expressing underlying problem space, easier to fix bugs)
2) speed of execution?

I do not find FP to be any more "expressive" or easier to understand than VNP (and my first programming language was LISP). If anything, the restrictions of FP make it harder to model some problems in code, and thus make it harder to understand that code. It is easier to optimize FP in some cases, and thus optimization bugs are less likely. But these bugs are a small fraction of all bugs, and among the easier to fix once identified.

As for speed of execution, on existing hardware FP programs tend to be slower, whereas they may be faster on purpose-built hardware. This is because of the elimination of side effects. On conventional hardware, eliminating side effects often requires duplicating data. On purpose-built hardware, eliminating side effects allows automatic parallelization. So FP programs may be faster in cases where both the problem and the hardware support simple parallelization. But parallelization is notoriously difficult, and automatic parallelization techniques are only modestly successful on general problems. The marginal payoff of adding more processors quickly diminishes. Instead, special parallel algorithms need to be built which carefully manage data movement in the system, minimize duplication, and so on. These techniques are equally suited to FP and VNP.

So in the end, I think FP will fade away, except for leaving its name attached to extensions to VNP languages which encourage side-effect-free programming for easier application to distributed systems. These extensions will prove to be very useful, especially as programming for the web and for multicore processors becomes more and more common.

Hopefully the dogmatic proponents of FP will be satisfied by this, and the dogmatic detractors too, so we can all move on to more productive discussions.

2 comments:

Anonymous said...

The primary difference, in my opinion, of functional programming compared to more traditional imperative methods, is its attempt to minimise side effects. The more side effects are done away with, the more 'functional' a language is. If there are no side effects whatsoever, then the language is purely functional. The elimination of side effects is not, itself, a side effect - it's one of the basic principles by which FP is defined.

A lack of side effects is not only useful for parallel programming; it's also useful for testing and reliability. You can unit test a pure function however much you want, safe in the knowledge that for any argument x, you'll always get the same return value, y. You can also be sure it won't inadvertently change your system in any way.

Combined with a robust type system, you have further opportunities to check your program is correct. Since all effects of a function must occur through its return value, this means all the effects of a function must also be typed. A type system in a purely functional language is more pervasive than one in an imperative language can be.

Regarding efficiency, pure functional languages are not significantly less efficient than their imperative equivalents in most cases. Certainly the difference is much smaller than the difference between, say, C++ and C#.

Whilst FP languages can be hard to read and understand, I'm not certain that has to be the case. And even if that is the case, I don't think a programming language has ever died from being too complex :). Maybe pure FP will never be mainstream, but I don't think it will disappear, either; there are considerable advantages to building software in a language that is entirely free of side effects.

Andrew Sacamano said...

Thanks for the feedback, Weave Jester. You make some very important points, particularly about testing. And I whole-heartedly agree with you that side-effect free programming is the most important idea to come out of FP, and is now generally accepted to be one of, if not the most important, distinguishing ideas of FP. Like you say, "there are considerable advantages to building software in a language that is entirely free of side effects."

But in Backus's paper, side-effect free programming was a definite tangent - his primary argument was that FP was more expressive and conceptually clearer than imperative programming (VNP).

And FP is not the only side-effect minimizing tool. Another style of programming that encourages side-effect free code (but is even less popular than FP) is dataflow programming - look at Prograph, for example.

Lastly, in terms of efficiency, I'm still not convinced. Call by reference, which inherently produces side effects, is a significant performance advantage. It may be possible for skilled FP coders to work around that issue, but I'm not sure. Convince me, somebody...

© 2007 Andrew Sacamano. All rights reserved.