I watched the movie K-19: The Widowmaker on TV last night; it’s about a Russian nuclear submarine on its maiden voyage in 1961 where pretty much everything goes wrong. In the midst of watching reactor failures and other slightly less catastrophic mishaps, I started thinking about software testing. I’ve seen software that exhibited the functional equivalent of a reactor failure: a major point of failure that required immediate shutdown for repairs. Fortunately, since I have worked primarily on back-office BPM systems for financial services clients over the years, the impact of these catastrophic system failures is measured in the efficiencies (time and money) lost by reverting to paper-based processes, not in human lives.
When I owned a professional services company in the ’90s, I spent many years being directly responsible for the quality of the software that left our hands and was installed on our clients’ systems. In the early days, I did much of the design, although that was later spread over a team of designers, and I like to think that good design led to systems with a low “incident” rate. That’s only part of the equation, however. Without doubt, the single most important thing that I did to maximize the quality of our product was to create an autonomous quality assurance and testing team that was equivalent in rank (and capabilities) to the design and development teams, and had the power to stop the release of software to a client. Because of this, virtually all of our “showstopper” bugs occurred while the system was still in testing, saving our clients the expense of production downtime and protecting our own professional reputation. We always created emergency system failure plans that would allow our client to revert to a manual process, but these plans were rarely executed because of faults in our software; I did, however, see them used in cases of hardware and environmental failures.
When I watched Liam Neeson’s character in K-19 try to stop the sea trials of the sub because it wasn’t ready, and be overruled for political reasons, I heard echoes of so many software projects gone wrong, so many systems put into production with inadequate testing despite a QA team’s protests. But not on my watch.