More on ancient engineering

It’s good to see that I’m not the only blogger slacking off with a European vacation and trying to compensate by blogging about ancient engineering feats. John Reynolds ponders Pisa’s leaning tower, exploring the parallels between Renaissance engineers and all of us who implement IT systems today:

  • Renaissance engineers were asked (or forced) to build on foundations that they knew were flawed
  • Renaissance engineers had to deal with “legacy systems”
  • Renaissance engineers had to implement “quick-fixes” that made the original problem worse
  • Renaissance engineers had to expend great effort over many years to patch and maintain defective projects (instead of starting over)

He echoes my sentiment somewhat by hoping that something of his will become as significant as Pisa’s tower someday.

Think big, start small

I watched an SAP presentation today about their NetWeaver platform, and although the product was only of peripheral interest to me, I love their philosophy on how to get started on a project: think big, start small.

I can’t even count how many projects I’ve seen fail, or miss their targets significantly, because they over-reached in the first phase. Usually, someone gets excited about the technology, and before you know it, they’re trying to implement the 8th wonder of the world in Phase I. Schedules slip, but even worse, vision slips: often, no one is left with a clear idea of what is to be accomplished, or of the path to getting there. [Of course, the converse is just as bad: the pilot (a.k.a. Project In Lots Of Trouble), where someone hacks together a system without proper design or testing, and it becomes the cornerstone of a future legacy system, but that’s a story for another day.]

This type of scope creep is especially prevalent on BPM projects, since it’s so (conceptually) easy to add another step to the initial process, then another, and another. What starts out as a simple process with 2 human touch-points using an out-of-the-box interface and 3 system touch-points using standard adapters becomes a morass of custom interfaces and extraneous exception paths. Without fail, my biggest argument on any BPM project is about keeping the first phase small so that something gets into production sooner.

Of course, as a designer, I believe in getting some of the design work done up front: you have to understand the overall scope and the required functionality to provide a framework for the work, but you also have to carve off a reasonable first phase that won’t take too long and will deliver a useful system once implemented. On a BPM project, if you can’t get that first phase into production within 6 months, something is wrong with what you’re doing.

Think big, start small.

Testing for real life

I watched the movie K-19: The Widowmaker on TV last night; it’s about a Russian nuclear submarine on its maiden voyage in 1961, where pretty much everything goes wrong. In the midst of watching reactor failures and other slightly less catastrophic mishaps, I started thinking about software testing. I’ve seen software that exhibited the functional equivalent of a reactor failure: a major point of failure that required immediate shutdown for repairs. Fortunately, since I have worked primarily on back-office BPM systems for financial services clients over the years, the impact of these catastrophic system failures is measured in lost efficiency (time and money) from having to revert to paper-based processes, not in human lives.

When I owned a professional services company in the 90’s, I spent many years being directly responsible for the quality of the software that left our hands and was installed on our clients’ systems. In the early days, I did much of the design myself, although that was later spread over a team of designers, and I like to think that good design led to systems with a low “incident” rate. That’s only part of the equation, however. Without doubt, the single most important thing that I did to maximize the quality of our product was to create an autonomous quality assurance and testing team that was equivalent in rank (and capabilities) to the design and development teams, and that had the power to stop the release of software to a client. Because of this, virtually all of our “showstopper” bugs occurred while the system was still in testing, saving our clients the expense of production downtime and protecting our own professional reputation. We always created emergency system failure plans that would allow our clients to revert to a manual process, but these plans were rarely executed because of faults in our software; I did see them used in cases of hardware and environmental failures.

When I watched Liam Neeson’s character in K-19 try to stop the sea trials of the sub because it wasn’t ready, and be overruled for political reasons, I heard echoes of so many software projects gone wrong, so many systems put into production with inadequate testing despite a QA team’s protests. But not on my watch.