Can RSS replace trickle feeds for BI?

I was having a conversation late last week with a SaaS BI vendor about how organizations get data into their online data warehouse (FTP seems to be the most popular method), when it struck me: why couldn’t they use an RSS feed (or Atom, for that matter) from a transactional system to feed data into the data warehouse for BI purposes? Near-real-time data is essential for many types of BI analysis, so there has to be something better than once-daily uploads.

5 thoughts on “Can RSS replace trickle feeds for BI?”

  1. The problem with RSS/Atom is that the Producer doesn’t know whether the Consumer has read the data or not, so you either stuff the RSS feed with more data than it needs or (really, and/or) run the risk of data dropouts. Now, there are technological solutions to this, using ETags and various clever techniques, but since you’re coding this yourself, the question has to be: why not just use TIBCO, MQ, or whatever?

  2. I have to agree with David on this. You want the systems to sync in a transaction-safe way, and you will need appropriate technology for that. Data push or publish-and-subscribe technologies that work fine for this are already out there.
    I do see a use case for RSS feeds on certain process data, though, aimed at people who do day-to-day monitoring: non-critical monitoring (for the C & I in RACI of certain processes).

    Regarding the daily uploads: yes, I agree there. We need to move away from batch thinking for both tasks and data transfer. As a driver, I would not want my car to upload and show my speed, gasoline level and traffic jam info only every hour or so, let alone every 24 hours. So why do we settle for less in business? For some reason people end up firefighting as a result of delayed information, but never have that role in their job description…
    Unfortunately, in many IT people’s heads there is still a lot of mainframe thinking (send blob of data here, perform update XYZ, print signal list, send data further), instead of asking: what business processes are we running here (24×7), and what operational control do we need over them?

    Regards,
    Roeland

  3. David and Roeland, thanks for your comments. I agree that there’s an issue with no delivery guarantee mechanism, but there could be a lot of intraday BI that is more focussed on aggregates than individual data points for which this might be suitable.

    David, I agree that if you’re doing this between two on-premise systems, then a proper message bus is the way to go; I’m thinking of the case of an on-premise transaction system sending data to a SaaS BI system, where the SaaS system is more likely to consume RSS feeds than TIBCO or MQ messages, at least until the SaaS offerings mature somewhat.

    Roeland, the batch thinking definitely needs to go, but it’s a constant battle with old-style IT departments. The problem is, as you state, that business needs near-real-time information to manage things effectively, and IT is only giving them daily updates.

    Also check out the post after this one on the TIBCO PageBus stuff that was just announced today. That’s client-side, but it’s interesting to start seeing the pub-sub paradigm being used more widely.

  4. Hi Sandy,

    Ah, but then the issue is RSS isn’t “sent” — it’s consumed via HTTP GET, which means the SaaS consumer/system has to tunnel in somehow to business data. This is a quibble, I know.

    We did the [Large Car Company] website by using MQ messages to replicate parts of their DB. The site was off-premises and outside of the firewall, but once again…. Funnily enough, they didn’t send the messages as updates happened but exactly in the batch orientation that Roeland was talking about.

    There are still options. First, the Atom Publishing Protocol (not to be confused with the Atom Syndication Format, which is more analogous to RSS) allows secure connections to push item by item from the producer to the consumer. This is how Google populates Google Base (they call the protocol GData, but it’s APP). The nice thing is that there’s starting to be good tool support for this, and if I were doing SaaS services I would look at this seriously. You’d still have to deal with queuing internally, but with a single consumer this isn’t too tough.

    Another SaaS option is the Amazon Simple Queue Service:
    http://www.amazon.com/gp/browse.html?node=13584001

    I’d seriously consider this if there were “freeware” alternatives, just in case Amazon decides to get out of the software services industry.
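
The dropout risk raised in the first comment can be made concrete with a small sketch. This is an in-memory simulation, not real HTTP: `FeedProducer`, `FeedConsumer`, `window` and `demo` are hypothetical names invented for illustration, the ETag is simply a hash of the feed content, and gap detection assumes the producer numbers its entries with consecutive integers. Under those assumptions it shows how a conditional GET avoids re-reading an unchanged feed, and how entries are still lost when more items are published between polls than the feed window holds.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FeedProducer:
    """Hypothetical transactional system exposing its newest rows as a feed."""
    window: int                                   # entries the feed retains
    entries: List[Tuple[int, str]] = field(default_factory=list)

    def publish(self, entry_id: int, payload: str) -> None:
        self.entries.append((entry_id, payload))
        self.entries = self.entries[-self.window:]  # older entries fall off

    def get(self, if_none_match: Optional[str] = None):
        """Mimic a conditional GET; returns (status, etag, entries)."""
        etag = hashlib.sha1(repr(self.entries).encode()).hexdigest()
        if if_none_match == etag:
            return 304, etag, []                  # feed unchanged since last poll
        return 200, etag, list(self.entries)


@dataclass
class FeedConsumer:
    """Hypothetical SaaS-side loader that polls the feed."""
    last_etag: Optional[str] = None
    last_id: int = 0                              # high-water mark of loaded ids
    loaded: List[Tuple[int, str]] = field(default_factory=list)
    gaps: List[Tuple[int, int]] = field(default_factory=list)

    def poll(self, producer: FeedProducer) -> int:
        status, etag, entries = producer.get(self.last_etag)
        if status == 304:
            return 0
        self.last_etag = etag
        new = [(i, p) for i, p in entries if i > self.last_id]
        # Assumes consecutive integer ids: a jump means entries were dropped.
        if new and new[0][0] > self.last_id + 1:
            self.gaps.append((self.last_id + 1, new[0][0] - 1))
        self.loaded.extend(new)
        if new:
            self.last_id = new[-1][0]
        return len(new)


def demo() -> FeedConsumer:
    producer = FeedProducer(window=3)
    consumer = FeedConsumer()
    for i in range(1, 4):
        producer.publish(i, f"order {i}")
    consumer.poll(producer)        # first poll loads entries 1 to 3
    consumer.poll(producer)        # unchanged feed: 304, nothing reloaded
    for i in range(4, 9):          # five new entries, but the window holds three
        producer.publish(i, f"order {i}")
    consumer.poll(producer)        # entries 4 and 5 have already fallen off
    return consumer
```

Calling `demo()` leaves the consumer with a recorded gap for the entries it never saw. A larger `window` or more frequent polling shrinks that risk but cannot remove it, which is the point behind preferring a message queue when every transaction must arrive.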
