Tuesday, January 31, 2006

From Magic Bullets to Puzzle Pieces

Arising from a discussion about expanding the unit testing activities on our current project, a colleague of mine sent along an excerpt from Robert L. Glass' Facts and Fallacies of Software Engineering. His impression was that this book provided a clear argument against the need for us to write comprehensive software tests. I don't agree, and my sense is that Glass' comments in Facts and Fallacies are not intended as an attack against the practice of unit testing in particular. This article is an attempt to work through some of the common misconceptions about test-driven development, articulating the advantages of testing as I have experienced them in practice. Although I haven't yet had a chance to read the whole Facts and Fallacies book, I will quote from the portions of the book that address the issue of testing as they were forwarded to me.

The Successful Software Puzzle

My main position here is that comprehensive unit testing delivers a number of key benefits to the software development process, making it an indispensable tool for writing better software. To understand the true value of unit testing, it's essential to move away from the dogmatic notion that test-driven development is a "methodological magic bullet" (as some skeptics have put it), seeing it instead as one technique among several that are highly effective in producing successful software. One piece of the puzzle, so to speak.

The argument against unit testing, as it was articulated to me at a recent project meeting, is that one hundred percent test coverage is not sufficient to catch all program errors. I'm not sure this argument holds up, since most would agree there is still significant value in catching some percentage of bugs rather than doing nothing at all. But before we get into the numbers, let's start by looking at what kinds of errors generally occur in software, and how other agile development techniques can augment unit testing and further reduce errors.

In his Fact #33, Robert Glass states that, based on his own research, there are essentially two kinds of errors that can occur in software. He says:

"I was working in the aerospace industry at the time and I had access to lots of error data from lots of real aerospace projects. I decided to examine some of that error data, one error report at a time from the point of view of this question: Would complete logic path coverage have allowed someone to detect this error? ...

"The answer, as you could already guess from my earlier treatment of this topic, was all too seldom "yes". There were a large number of errors that thorough structural testing would not have been sufficient to detect. As I looked at error report after error report, those particular errors began to separate themselves into two major classes. Those two classes were:

"1. Errors of omission, in which the logic to perform a required task was simply not in the program.
2. Errors of combinatorics, in which an error would manifest itself only when a particular combination of logic paths was executed."


In the first case, it's unlikely that unit tests are sufficient in themselves to catch an error if the programmer was not aware of the need for the logic in the first place. Fair enough. So if we look at unit testing as one piece of the puzzle, what other techniques are available to help prevent this type of error?

In my mind, user-driven requirements and acceptance testing provide some support in this regard. Agile development shifts the emphasis of requirements definition away from time-consuming and often-ignored paper documentation and towards a more customer-driven approach. Tasks are defined and prioritized by the users, allowing them to participate fully in the process of shaping the software and in helping verify its success. With the customer driving the features, important business logic is less likely to be omitted by the developer, and if it is forgotten, is more likely to be caught in testing.

In the second case, Glass refers to the fact that individual logic path tests aren't able to catch bugs which rely on the user triggering certain combinations of behaviour or functionality to make themselves known. Certainly, every software developer has encountered these kinds of errors and is aware of the difficulty in tracking them down. Once again, this is a case where other techniques work in concert with unit testing to reduce bugs. While unit testing focusses on verifying individual classes and methods in as loosely-coupled a fashion as possible, user acceptance tests help to verify that large-scale usage patterns are correctly handled by the application. Although it is impossible to test all of the possible permutations and combinations of an application's functionality, acceptance testing ensures that the most important types of behaviour--those paths which the customer determines are essential to their work--are correctly supported.
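To make the combinatorial case concrete, here is a toy sketch in Python. The Account class and its planted bug are entirely my own invention, not an example from Glass: each method passes its isolated unit test, but a scenario-level test covering a realistic sequence of calls is what exposes the bug.

```python
class Account:
    """Toy class with a deliberately planted combinatorial bug."""

    def __init__(self):
        self.balance = 0
        self.frozen = False

    def deposit(self, amount):
        if not self.frozen:
            self.balance += amount

    def freeze(self):
        self.frozen = True

    def unfreeze(self):
        self.frozen = False
        # Bug: unfreezing resets the balance. A unit test that calls
        # unfreeze() on a fresh account (balance already 0) passes,
        # so only the deposit-freeze-unfreeze *combination* fails.
        self.balance = 0

# Isolated unit tests: each passes on its own.
acct = Account()
acct.deposit(50)
assert acct.balance == 50      # deposit works alone

acct = Account()
acct.unfreeze()
assert acct.balance == 0       # unfreeze "works" on a fresh account

# Scenario test, of the kind an acceptance suite would exercise:
acct = Account()
acct.deposit(50)
acct.freeze()
acct.unfreeze()
# The customer-level expectation would be balance == 50; the planted
# bug makes it 0, which the path-by-path unit tests never noticed.
assert acct.balance == 0
```

The point isn't that unit tests are useless here, but that only a test spanning a customer-meaningful sequence of operations had any chance of tripping over the interaction.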

Once again, I don't believe that Robert Glass is talking about the failures of unit testing in particular here. Fact #33 ultimately states what we know from experience: that software is highly complex and, as a result, preventing bugs is one of our most difficult challenges. It takes time, effort, and skill to write software that meets the needs of its users reliably. Glass reminds us that testing needs to be taken seriously and should be considered from several perspectives--from the lowest level of individual units through to broad user acceptance tests--throughout the software development lifecycle. Glass himself says:

"The fact of the matter is, producing successful, reliable software involves mixing and matching an all-too-variable number of error removal approaches, typically the more of them the better. There is no known magic solution to this problem."

Okay, so let's put this issue of the magic bullets to rest once and for all. It's a cynical conclusion to assume that unit testing is just another snake-oil sales job, promising one more cure-all for the ailment of software complexity. This notion is perhaps driven by the hype currently surrounding agile software development, and has led some people to think that Extreme Programming (XP) is like many of the other approaches to software we've been sold over the years: a so-called "big M" methodology. But at its core, XP is not a methodological prescription. In reality, it attempts to shift the focus of the development team to a number of proven techniques that have been successful in the past, regardless of methodology. What XP brings is the recognition that certain best practices support each other and work best in combination. The true value of agile development is that it challenges us to find the simplest, most effective techniques for creating successful software, and to recognize their benefits as an integrated whole, rather than in isolation. That's not to say that many agile techniques, including unit testing, can't be used on their own, but that their full value isn't always apparent without other supports in place.

Returns on Testing Investment

That having been said, let's go back to looking at unit testing on its own for a moment. I certainly agree with my colleague's underlying point that it is important to think critically about the benefits and costs of unit testing. Assuming Glass' informal study is accurate, is unit testing really worth its cost? Glass believes that structural testing of any kind has the potential to catch only about twenty-five percent of all errors in a software product. He found that:

"Errors of omitted logic represented 35 percent of the database of errors I was examining... Combinatorics represented a quite surprising additional 40 percent. Thus 100 percent structural test coverage, far from being a step toward error-free software, is an approach that is seductive but insufficient - insufficient at the 75 percent level (that is, complete structural coverage will still guarantee us the detection of only 25 percent of the errors in a software product)."

The problem here is that error reduction isn't the only benefit provided by a comprehensive suite of unit tests. These numbers simply don't reflect the true value of unit testing, since they ignore the several beneficial roles that unit testing plays in addition to the reduction of errors. Here is how I see it:

1) A comprehensive test suite eases the process of tracking down and fixing bugs once they've been discovered. Of course, unit tests are no substitute for user-level testing, but when bugs are encountered by users or testers, the existing test suite serves as an effective fixture for quickly writing a new test to prove the existence of the bug. Once that test has been written, the unit tests provide immediate feedback about the bug's status and help to identify regression errors that may occur later in development.

2) Testing improves code design. Comprehensive test coverage demands implementing code in smaller, simpler units. This benefits overall class design and facilitates refactoring. Even more importantly, the process of testing classes in isolation requires a stronger emphasis on loose-coupling and coding to interfaces, two key ways to improve the flexibility of objects. The impact of testing on the design process is an often-underestimated feature of test-driven development.

3) Unit tests are a more effective way to articulate our intentions and expectations of how a class should be used than documentation alone. Well-written tests represent the first client for the code, and help to explicitly declare certain assumptions made while the code was written.
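To illustrate point 1 above, here's a small sketch using Python's standard unittest framework. The parse_price function, its bug, and the test names are all hypothetical, invented for illustration: when a tester reports a failure, the existing test class acts as a ready-made fixture for a new bug-reproducing test, which then guards against regressions for the life of the project.

```python
import unittest

def parse_price(text):
    """Hypothetical function under test: parse '$1,234.56' into cents."""
    cleaned = text.strip().lstrip('$').replace(',', '')
    dollars, _, cents = cleaned.partition('.')
    # ljust pads single-digit cents ('5' -> '50'); this line is the fix
    # for the bug reproduced by test_single_digit_cents below.
    return int(dollars) * 100 + int(cents.ljust(2, '0'))

class ParsePriceTest(unittest.TestCase):
    # The original tests, written alongside the code.
    def test_simple(self):
        self.assertEqual(parse_price('$3.50'), 350)

    def test_thousands_separator(self):
        self.assertEqual(parse_price('$1,234.56'), 123456)

    # Added the day a tester reported '$2.5' parsing as 205 cents.
    # Written first to *prove* the bug, it now runs with every build
    # and will flag any regression immediately.
    def test_single_digit_cents(self):
        self.assertEqual(parse_price('$2.5'), 250)
```

The new test cost almost nothing to write because the surrounding TestCase, with its assertions and conventions, was already in place.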
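Point 2 is easiest to see with a dependency that resists isolated testing, such as the system clock. In this sketch (the class and function names are mine, purely illustrative), the need to test in isolation pushes the design toward an injected collaborator, leaving the code more loosely coupled than a direct call to the clock would have:

```python
import datetime

class SystemClock:
    """Production collaborator: wraps the real system date."""
    def today(self):
        return datetime.date.today()

class FixedClock:
    """Test stub honouring the same informal interface as SystemClock."""
    def __init__(self, date):
        self._date = date

    def today(self):
        return self._date

def report_header(clock):
    # Depends only on the today() interface, not on the real clock --
    # a design choice forced on us by the desire to test in isolation.
    return 'Report for ' + clock.today().isoformat()
```

In a test we can pass FixedClock(datetime.date(2006, 1, 31)) and assert the exact header text; the same seam later makes it trivial to swap in, say, a time-zone-aware clock without touching the reporting code.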
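And for point 3, a short sketch of tests acting as documentation (the Stack class here is a stand-in I've invented): a new developer can learn the intended usage and the edge-case behaviour faster from these tests than from prose comments alone.

```python
import unittest

class Stack:
    """Minimal stack; its 'contract' is spelled out by the tests below."""
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()   # raises IndexError when empty

class StackContractTest(unittest.TestCase):
    # These tests double as the class's usage documentation: they are
    # the first client of the code, written as it was written.
    def test_lifo_order(self):
        s = Stack()
        s.push('first')
        s.push('second')
        self.assertEqual(s.pop(), 'second')

    def test_pop_on_empty_stack_raises(self):
        # Makes an otherwise-implicit assumption explicit: callers must
        # expect IndexError, not a None return, from an empty stack.
        self.assertRaises(IndexError, Stack().pop)
```

Unlike a comment, these declarations of intent are executable, so they can never silently drift out of date.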

So unit testing not only reduces program errors; it also provides a useful infrastructure for fixing bugs and catching regression errors, encourages better code design, and gives us a clear way of articulating our intentions to other developers. As I see it, unit testing provides much more than a 25% return on my investment.

Once again, I want to be clear here: I don't see unit testing as a methodological "magic bullet." Software development is complex, and requires the integration of a number of techniques and approaches to be successful. No tool or methodology can substitute for the knowledge and experience of good developers. But unit testing has proven itself to be an extremely valuable technique, and I have come to view writing tests as one of the fundamental responsibilities that all programmers hold. Interestingly, I don't think Robert Glass disagrees with me on this point, so I'll let him have the last word here. In an article in the IEEE Software journal, Glass states:

"Unit testing: Few, other than formal-method and fault tolerance zealots, will argue against unit testing. It might well be the most agreed-upon software best practice of all time."

Well said.