Friday, March 9, 2007

Code Read 7 - David Parnas on Star Wars Software

In 1985, David Parnas resigned from his position on a panel convened by the Strategic Defense Initiative Organization, which was overseeing the "Star Wars", or SDI anti-ballistic missile defense program. Along with his resignation he submitted several short essays explaining why he thought the software required for Star Wars could not be built, at that time or in the foreseeable future. Those essays were later collected and published, and are the subject of Code Read 7.

Some of the essays deal with issues such as using SDI to fund basic research (an idea in which he did not believe), or why AI would not solve the problems, but his core arguments focus around two main themes.

1) Software can not be reliable without extensive high-quality testing, and such testing could not be done for SDI.
2) Our ability to build software is insufficient to build SDI.

Scott Rosenberg, the author of Code Reads, seems to be asking what, if anything, has changed since 1985. Sadly, the answer is "Not much". Indeed, Parnas's paper is the most current of all the Code Reads sources, in is view of the software industry. It could have been written in 2007 just as easily as in 1985.

Testing is important

Of his two main themes, the first is the easiest to discuss. Basically, software is built broken. It needs to be tested before it works smoothly enough to be considered functional. Nobody has ever done it otherwise, despite their best efforts. Some have come close, but many have failed completely. All software needs to be refined, in situations very similar to its real usage, before it can be considered reliable. This is not news to anyone. Parnas makes it very clear how difficult it will be to do this with SDI.

Even if you exhaustively work to prove each component correct, or test each component extensively as you build it, the resulting system is still not trustworthy until it has been tested.

"If we wrote a formal specification for the software, we would have no way of proving that a program that satisfied the specification would actually do what we expected it to do. The specification itself might be wrong or incomplete." - David Parnas


A classic, and tragic, example of this problem is the Mars Climate Orbiter. Despite a rigorous testing process, a software error still caused the crash of the probe.

Software is hard

His arguments about our ability to build software go along three basic steps:

- Software is harder than other things we build.
- The way be build programs ensures there will be bugs.
- There does not seem to be hope for a much better way to build software.

Dijkstra, Brooks, and Knuth (as quoted in previous posts) have explained many reasons why software is hard. Parnas provides another reason - the discontinuity of software. He compares the structures of analog hardware, digital computer hardware, and software, and argues that since software is discontinuous, and has a large number of discrete states, it is much less amenable to mathematical analysis. This analysis is the main reason why non-software engineering projects are reliable.

For example, a structural member in a bridge has two states, "intact" and "failed". The behavior of the "intact" state is well understood: we have good mathematical models for the part's deformation under load, response to temperature, resistance to wind or water, degradation over time, and so on. The transition between "intact" and "failed" happens under fairly well understood circumstances. And we mostly just hope the "failed" state never happens. The same pattern of logic can be applied to almost all parts of the bridge.

Software systems, on the other hand, have many components, each with generally poorly understood behavior (compared to physical engineering), and many states. Indeed, most software approaches the what we now call "chaotic" behavior. Although it does always fulfill all three formal requirements, most software comes quite close. So on top of the layers of complexity, and depth of scale, most software is also, for practical purposes, chaotic.

How we build software with bugs

We try to manage this complexity by creating a logical model which we can use to break out smaller components, which themselves are broken into smaller components, and so on, until we are writing step-by-step instructions.

But this process is hard to do well. While we can write precise formal specs, "it is hard to make the decisions that must be made to write such a document. We often do not know how to make those decisions until we can play with the system... The result will be a structure that does not fully separate concerns and minimize complexity."

And "even in highly structured systems, surprises and unreliability occur because the human mind is not able to fully comprehend the many conditions that can arise because of the interaction of these components. Moreover, finding the right structure has proved to be very difficult. Well-structured real software systems are rare."

Additionally, we have the difficulty of translating those structures into code. Generally, we write programs as step-by-step algorithms, "thinking like a computer". We can sometimes do this in a top-down fashion, as Dijkstra proposed in his "Notes on Structured Programming", but even that uses a "do-this-then-do-that" approach. Various attempts have been made to find other ways, but none has found wide success.

"In recent years many programmers have tried to improve their working methods using a variety of software design approaches. However, when they get down to writing executable programs, they revert to the conventional way of thinking. I have yet to find a substantial program in practical use whose structure was not based on the expected execution sequence."

This provoked a heated discussion at Code Reads, but I think the fundamental point is that while other techniques exist and do provide real benefit in many cases, they are all ways of working with a larger problem, of structuring the overall approach. At its finest level, software is algorithmic, and algorithms are specified in sequential steps.

There are generally two main non-algorithmic ways to program. The first is to relieve ourselves of some of the work of creating complex sequential algorithms by specifying rules and having some system to implement those rules. And the second is formally isolating non-related portions of an algorithm so they can be run in parallel.

In the first case, using rules, is simple enough in restricted cases, but such systems become as complex and general programming languages in more general cases, and one eventually finds oneself creating rules to describe an underlying algorithm.

The second case works quite well, but each of the many parallel computations ends up being done using the same old step-by-step sequential executions.

And in both cases, we continue to make the same fundamental mistakes about the overall structure of our system because we don't fully understand its behavior.

Better tools and tecniques

Lastly, Parnas addresses the hope that improvements in methodology or tools will alleviate these problems. At the time he was writing, he saw four main threads in tool and methodology improvements (I'm combining two of his essays):

1) Structured programming
2) Formal abstraction
3) Cooperating sequential processes
4) Better housekeeping tools

Back in 1970's, according to Parnas, this was academic "motherhood" - nobody could object. Today, in my experience, this view is industry wide. A few people will argue that we've all been brainwashed and are now blind to alternatives, but even in the portion of our community which is most open to new ideas, these four still are the dominant paradigms.

Parnas argues that we are now in the days of incremental improvements in software, rather than rapid and dramatic advances, saying "Programing languages are now sufficiently flexible that we can use almost any of them for almost any task." And even things like non-algorithmic specifications still suffer from the same problems as writing code: "...our experience in writing nonalgorithmic specifications has shown that people make mistakes in writing them just as they do in writing algorithms. "

Conclusion

One could argue that Parnas is a dead-end thinker, saying nothing more than the status quo is bad, and it is all we will ever get. Instead, we must remember Parnas is talking about the most ambitious and complex software project ever conceived, and saying that the particular project is beyond our capabilities, not that software in general is beyond our capabilities.

However, I do think he misses a possible way out. I say "possible" because I do not know if it is a real solution, or just a fantasy. I think we need to improve the way we think about our own solutions. Then we can build systems that are less prone to the kinds of complexities which befuddle us.

To use an analogy, after the Tacoma Narrows Bridge disaster, civil engineers added some new factors to the way they think about bridges: "wind resistance" and "harmonic effects". Software engineers are still looking for what those factors are. I do not believe that we have a mature set of factors to consider when we design software, and I do believe that we can discover what those factors are.

In fact, I hope this blog will help me discover them, and I'd love to hear any suggestions.

Read more...

© 2007 Andrew Sacamano. All rights reserved.