The second half of the workshop started with Renée Miller from the University of Toronto digging into the deeper database levels of BI, and the evolving role of schema: from a prescriptive role (time-invariant, used to ensure data consistency) to a descriptive role (used to describe and understand data, and to capture business knowledge). In the old world, a schema was meant to reduce redundancy (via Boyce-Codd normal form); in the new world, a schema is used to understand data, and the schema may evolve. There are a lot of reasons why data can be “dirty” – my other half, who does data warehouse/BI for a living, often tells me about how web developers create their operational database models mostly by accident, then don’t constrain data values at the UI – but the fact remains that no matter how clean you try to make it, there are always going to be operational data stores with data that needs some sort of cleansing before it can support effective BI. In some cases, rules can be used to maintain data consistency, especially where those rules are context-dependent. In cases where the constraints are inconsistent with the existing data (besides asking how that came to be), you can either repair the data, or discover new constraints from the data and repair the constraints. Some human judgment may be involved in determining whether the data or the constraint requires repair, although statistical models based on data semantics can be used to judge when a constraint is likely invalid and needs repair. In large enterprise databases as well as web databases, this sort of schema management and discovery could be used to identify and leverage redundancy in data to discover metadata such as rules and constraints, which in turn could be used to modify the data in classic data repair scenarios, or to modify the schema to adjust for a changing reality.
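To make the data-versus-constraint repair idea a bit more concrete, here's a minimal Python sketch that checks a single functional dependency (such as zip -> city) against a set of rows and then decides what to repair. The majority-vote fix and the `max_change_ratio` threshold are my own hypothetical stand-ins for the statistical models Miller described, not her method; the function names and the toy data are also made up for illustration.

```python
from collections import Counter, defaultdict


def fd_violations(rows, lhs, rhs):
    """Return {lhs value: [row indices]} for every group that breaks the FD lhs -> rhs."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[lhs]].append(i)
    return {key: idxs for key, idxs in groups.items()
            if len({rows[i][rhs] for i in idxs}) > 1}


def data_or_constraint(rows, lhs, rhs, max_change_ratio=0.2):
    """Hypothetical decision rule: if a majority-vote fix changes only a small
    fraction of rows, repair the data; otherwise flag the constraint for repair."""
    if not rows:
        return "repair_data", []
    violations = fd_violations(rows, lhs, rhs)
    fixes = {}  # row index -> majority rhs value within its lhs group
    for idxs in violations.values():
        majority, _ = Counter(rows[i][rhs] for i in idxs).most_common(1)[0]
        for i in idxs:
            if rows[i][rhs] != majority:
                fixes[i] = majority
    if len(fixes) / len(rows) <= max_change_ratio:
        # Few disagreements: treat them as dirty values and overwrite them (copies only).
        repaired = [{**row, rhs: fixes[i]} if i in fixes else dict(row)
                    for i, row in enumerate(rows)]
        return "repair_data", repaired
    # Widespread disagreement: the constraint, not the data, is the likely problem.
    return "repair_constraint", sorted(violations)


# Toy data: one typo breaks zip -> city.
addresses = [
    {"zip": "M5S", "city": "Toronto"}, {"zip": "M5S", "city": "Toronto"},
    {"zip": "M5S", "city": "Toronot"}, {"zip": "M5S", "city": "Toronto"},
    {"zip": "K1A", "city": "Ottawa"}, {"zip": "K1A", "city": "Ottawa"},
    {"zip": "K1A", "city": "Ottawa"}, {"zip": "H2X", "city": "Montreal"},
    {"zip": "H2X", "city": "Montreal"}, {"zip": "H2X", "city": "Montreal"},
]
# Toy data: customers now legitimately have several phones, so customer -> phone no longer holds.
phones = [
    {"customer": "A", "phone": "555-0101"}, {"customer": "A", "phone": "555-0102"},
    {"customer": "B", "phone": "555-0201"}, {"customer": "B", "phone": "555-0202"},
    {"customer": "C", "phone": "555-0301"},
]

action, fixed = data_or_constraint(addresses, "zip", "city")
print(action, fixed[2]["city"])  # repair_data Toronto
print(data_or_constraint(phones, "customer", "phone"))  # ('repair_constraint', ['A', 'B'])
```

Where to set the threshold is the kind of judgment call Miller mentioned; in practice a flagged constraint would more likely go to a person for review than be rewritten automatically.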
Sheila McIlraith from the University of Toronto presented on a use-centric model of data for customizing and constraining processes. I spoke last week at Building Business Capability on some of the links between data and processes, and McIlraith characterized processes as a purposeful view of data: processes provide a view of the data, and impose policies on data relative to some metrics. Processes are also, as she pointed out, a delivery vehicle for BI – from a BPM standpoint, this is a bit of a trivial publishing process – to ensure that the right data gets to the right stakeholder. The objective of her research is to develop a business process modeling formalism that treats data and processes as first-class citizens, and supports specification of abstract (ad hoc) business processes while allowing the specification of stakeholder policies, preferences and priorities. Sounds like data+process+rules to me. The approach is to specify processes as flexible templates, with policies as further constraints; although she represents this as allowing for customizable processes, it really just appears to be a few predefined variations on a process model with a strong reliance on rules (expressed in linear temporal logic) for policy enforcement, not full dynamic process definition.
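As a rough illustration of policies as temporal constraints on a process, the sketch below evaluates linear temporal logic formulas over the finite execution trace of a single process instance. It assumes finite-trace (LTLf-style) semantics, and the event names, the order-handling template and both policies are hypothetical; this is not McIlraith's formalism, just a way to see how an LTL rule can rule out some variations of a process.

```python
# Formulas are nested tuples, e.g. ("always", ("implies", ("atom", "approve"), ...)).
# A trace is a non-empty list of sets: the events observed at each step of one process instance.


def holds(f, trace, i=0):
    """Evaluate LTL formula f at position i of a finite trace (LTLf-style semantics)."""
    op = f[0]
    if op == "atom":
        return f[1] in trace[i]
    if op == "not":
        return not holds(f[1], trace, i)
    if op == "and":
        return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == "or":
        return holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == "implies":
        return not holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == "next":  # strong next: false at the last step of the trace
        return i + 1 < len(trace) and holds(f[1], trace, i + 1)
    if op == "eventually":
        return any(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "always":
        return all(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "until":  # f[2] must eventually hold, with f[1] holding at every step before it
        return any(holds(f[2], trace, j) and all(holds(f[1], trace, k) for k in range(i, j))
                   for j in range(i, len(trace)))
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical stakeholder policies layered on top of an order-handling template.
POLICIES = {
    "approved orders ship": ("always", ("implies", ("atom", "approve"),
                                        ("eventually", ("atom", "ship")))),
    "no payment before approval": ("until", ("not", ("atom", "pay")), ("atom", "approve")),
}


def violated(trace):
    """Names of the policies that a completed trace breaks."""
    return [name for name, f in POLICIES.items() if not holds(f, trace)]


print(violated([{"receive"}, {"approve"}, {"pay"}, {"ship"}]))  # []
print(violated([{"receive"}, {"pay"}, {"approve"}, {"ship"}]))  # ['no payment before approval']
```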
Lastly, we heard from Rock Leung from SAP’s academic research center and Stephan Jou from IBM CAS on industry challenges (SAP and IBM are both industry partners in the NSERC Business Intelligence Network). They listed 10 industry challenges for BI, but focused on big data, consumable analytics, mobility, and geospatial and temporal analytics.
- Big data: Issues center on the volume of data, the variety of information and sources, and the velocity of decision-making. IBM’s Watson has raised expectations about what can be done with big data, but there are challenges in how to model, navigate, analyze and visualize it.
- Consumable analytics: There is a need to increase usability and offer new interactions, making the analytics consumable by everyone – not just statistical wizards – on every type of device.
- Mobility: Since users need to be connected anywhere, there is a need to design for smaller devices (and intermittent connectivity) so that information is represented effectively and fits seamlessly with its representation on other devices. Both presenters said that there is nothing their respective companies are doing where mobile device support is not at least a topic of conversation, if not already a reality.
- Geospatial and temporal analytics: Geospatial data isn’t just about Google Maps mashups any more: location and time are being used as key constraints in any business analytics, especially when you want to join internal business information with external events.
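To illustrate location and time acting as join constraints, here's a minimal sketch that pairs internal business records with external events only when they fall within both a distance threshold (using the haversine great-circle formula) and a time window. The record layout, thresholds and data are hypothetical, and the brute-force nested loop is chosen for clarity rather than scale.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))  # clamp guards float rounding


def spatiotemporal_join(records, events, max_km, max_gap):
    """Pair internal records with external events that are close in both space and time.
    Naive nested loop; a real system would use spatial and temporal indexes."""
    return [(r["id"], e["id"])
            for r in records
            for e in events
            if abs(r["when"] - e["when"]) <= max_gap
            and haversine_km(r["lat"], r["lon"], e["lat"], e["lon"]) <= max_km]


# Made-up internal sales records and one external event.
sales = [
    {"id": "sale-1", "lat": 43.6532, "lon": -79.3832, "when": datetime(2024, 5, 1, 19, 30)},
    {"id": "sale-2", "lat": 43.6532, "lon": -79.3832, "when": datetime(2024, 4, 30, 12, 0)},
    {"id": "sale-3", "lat": 45.4215, "lon": -75.6972, "when": datetime(2024, 5, 1, 20, 0)},
]
events = [
    {"id": "concert-1", "lat": 43.6414, "lon": -79.3894, "when": datetime(2024, 5, 1, 20, 0)},
]

print(spatiotemporal_join(sales, events, max_km=5, max_gap=timedelta(hours=2)))
# [('sale-1', 'concert-1')]
```

Here sale-2 is nearby but a day early, and sale-3 is at the right time but in a different city, so only sale-1 satisfies both constraints.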
They touched briefly on social in response to a question (it was on their list of 10, but not the short list), seeing it as a way to make better decisions.
For a workshop on business intelligence, I was surprised at how many of the presentations included aspects of business rules and business process, as well as the expected data and analytics. Maybe I shouldn’t have been surprised, since data, rules and process are tightly tied in most business environments. A fascinating morning, and I’m looking forward to the keynote and other presentations this afternoon.