Yesterday, I posted a rather whiny entry about rude customers (and Bob McIlree was kind enough to give me a comforting pat on the shoulder, virtually speaking — thanks Bob!) so today I decided to get a bit more productive. Moving from whine to wine, I finally made my first cut of a Squidoo lens about Australian Wine in Toronto (yes, I’m a geek and this is how I spent part of my Sunday). Sort of a niche topic, true, but that’s what Squidoo lenses are all about: they let you quickly build a one-page portal with links to other sites, Amazon products, eBay, RSS feeds, and a number of other kinds of information. Since it’s all on the web, you can update it anywhere, which is why I’ve moved quite a bit of information about both wine and BPM from my websites to my two Squidoo lenses.
I want to add a bit of meat to this post to offset the whine of yesterday, and coincidentally (before I saw his comment), I was reading Bob’s post on SOA and Data Issues and the need to maintain a source system of record (SSoR) for data. In particular, he discusses a conversation that was passed along to him from another organization:
A, the SOA implementer, argues that application-specific databases have no need to retain SSoR data at all since applications can invoke services at any time to receive data. He further opined that the SOA approach will eliminate application silos as his primary argument in the thread.
B, the applications development manager, is worried that he won’t get the ‘correct’ value from A’s services and that he has to retain what he receives from SSoRs to reconcile aggregations and calculated values at any point in time.
Since I’m usually working on customer projects that involve the intersection of legacy systems, operational databases, BPMS and analytical databases, I see this problem a lot. In addition to B’s argument about getting the “correct” value, I also hear the efficiency argument, which usually manifests as “we have to replicate [source data] into [target system] because it’s too slow to invoke the call to the source system at runtime”. If you have to screen-scrape data from a legacy CICS screen and reformat it at every call, I might go for the argument to replicate the mainframe data into an operational database for faster access. However, if you’re pulling data from an operational database and merging it with data from your BPMS, I’m going to find it harder to accept efficiency as a valid reason for replicating the data into the BPMS. I know, it’s easier to do it that way, but it’s just not right.
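To make the alternative concrete, here's a minimal sketch of what I mean by merging data at runtime instead of replicating it. Everything here is invented for illustration — the function names, the fields, and the dictionaries standing in for the operational database and the BPMS — but the point is that the composite view calls back to the source of record on every request, so there's no stale copy to fall out of sync.

```python
def get_customer(customer_id, source_db):
    """Stand-in for a service call to the operational database (the SSoR)."""
    return source_db[customer_id]

def get_process_state(customer_id, bpms_store):
    """Stand-in for a BPMS API call returning process instance data."""
    return bpms_store[customer_id]

def customer_view(customer_id, source_db, bpms_store):
    """Merge SSoR data with BPMS data at request time -- no replicated copy."""
    record = dict(get_customer(customer_id, source_db))  # always current
    record.update(get_process_state(customer_id, bpms_store))
    return record

# Toy data standing in for the real systems
operational_db = {42: {"name": "Acme Corp", "credit_limit": 50000}}
bpms = {42: {"process_step": "credit review", "assigned_to": "analyst_7"}}

print(customer_view(42, operational_db, bpms))
```

If the operational database changes, the next call to `customer_view` sees the change immediately — which is exactly the "correct value" property that B is worried about losing.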
When data is replicated between systems, the notion of the SSoR, or “golden copy”, of the data is often lost. The most common problem arises when the replicated data is updated and never synchronized back to the original source. This is exacerbated by synchronization applications that attempt to update the source but were written by someone who didn’t understand that they were effectively building a heterogeneous two-phase commit: if the update on the SSoR fails, no effective action is taken to roll back the change to the replicated data, or even to raise a big red flag before anyone starts making further decisions based on either of the data sources. And it gets worse: what if two developers each take the same approach against the same SSoR data, replicating it to application-specific databases, updating it, then trying to synchronize their changes back to the source?
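Here's a sketch of that failure mode and the minimum compensation it demands. The names and the in-memory stores are all hypothetical; the `write_to_ssor` callable stands in for whatever service updates the source system. A real implementation would need a genuine transactional protocol, but even this simple version does what so many synchronization applications don't: it restores the replicated copy when the write-back fails, instead of leaving the two stores silently divergent.

```python
class SSoRUnavailable(Exception):
    """Raised when the write-back to the source system of record fails."""

def update_with_compensation(key, new_value, local_copy, write_to_ssor):
    """Update the replicated copy, then the SSoR; roll back the copy on failure.

    This is only a compensation sketch, not a real two-phase commit --
    but it at least refuses to leave the copy and the source disagreeing.
    """
    old_value = local_copy.get(key)
    local_copy[key] = new_value
    try:
        write_to_ssor(key, new_value)
    except SSoRUnavailable:
        # Compensate: restore the replicated copy, then surface the failure
        # so someone raises the big red flag.
        local_copy[key] = old_value
        raise

# Demonstrate the failure path with a write-back that always fails
local = {"credit_limit": 50000}

def failing_ssor_write(key, value):
    raise SSoRUnavailable("source system rejected the update")

try:
    update_with_compensation("credit_limit", 75000, local, failing_ssor_write)
except SSoRUnavailable:
    pass

print(local["credit_limit"])  # rolled back to 50000, not left at 75000
```

The second problem in the paragraph above — two applications independently replicating and writing back — is worse still, because no amount of per-application compensation resolves conflicting updates; that's precisely why a single source of record consumed through services is the safer design.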
I’m definitely in the A camp: services eliminate (or greatly reduce) the need to replicate data between systems, and create a much cleaner and safer data environment. In the days before services ruled the earth, you could be forgiven for that little data replication transgression. In today’s SOA world, however, there are virtually no excuses to hide behind any more.