I had a bit of blog fatigue earlier, but Keith Swenson blogged the session on process cloud concepts for case management that I attended but didn’t write about, and I’m back at it for the last set of papers for the day at BPM 2012, all with a focus on process mining.
Repairing Process Models to Reflect Reality
Dirk Fahland of Eindhoven University presented a paper on process repair, as opposed to process mining, with a focus on adjusting the original process model to maximize fitness, where fitness is measured by the ability to replay traces in the event log: if a model can replay all of the traces of actual process execution, then it is perfectly fit. Their method uses a conformance checker to align the event log with the process model, which can be accomplished with Adriansyah et al.’s cost-based replayer; the resulting alignment provides the diagnostic information.
The result identifies activities in the model that were skipped in the log, and activities in the log that must be added to the model. The activities to be added can be fed to an existing process discovery algorithm to create subprocesses that must be added to the existing process, and the activities that were skipped are either made optional or removed from the original process model.
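To make the skipped/added distinction concrete, here is a minimal sketch of aligning one logged trace against one sequential run of the model. It is my own simplification, not the cost-based replayer from the paper: a real replayer searches over all runs of a Petri net with configurable costs, whereas this assumes the model has a single linear path and uses a plain longest-common-subsequence alignment. The function name `align` and the activity names are hypothetical.

```python
def align(trace, model_run):
    """Align a logged trace with one sequential model run (assumed linear).

    Returns (skipped, added):
      skipped: model activities that never appear in the trace at that point
      added:   logged activities that the model run does not allow at that point
    """
    n, m = len(trace), len(model_run)
    # lcs[i][j] = length of the longest common subsequence
    # of trace[i:] and model_run[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if trace[i] == model_run[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    skipped, added = [], []
    i = j = 0
    while i < n and j < m:
        if trace[i] == model_run[j]:
            # Synchronous move: log and model agree.
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            # Move on log only: the event is not in the model here.
            added.append(trace[i])
            i += 1
        else:
            # Move on model only: the model activity was skipped.
            skipped.append(model_run[j])
            j += 1
    added.extend(trace[i:])
    skipped.extend(model_run[j:])
    return skipped, added


model_run = ["register", "check", "approve", "pay"]
trace = ["register", "approve", "call customer", "pay"]
skipped, added = align(trace, model_run)
```

In this example, `skipped` holds `check` (a candidate to make optional or remove) and `added` holds `call customer` (a candidate to feed into discovery as a new subprocess).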
Obviously, this is relevant in situations where the process model isn’t automated, that is, the event logs are from other systems, not directly executed from the process model; this is common when processes are implemented in ERP and other systems rather than in a BPMS, and process models are created manually in order to document the business processes and discover opportunities for optimization. However, as we implement more semi-structured and dynamic processes automated by a BPMS, the event logs of the BPMS itself will include many events that are not part of the original process model; this could be a useful technique for improving understanding of ad hoc processes. By understanding and modeling ad hoc processes that occur frequently, there is the potential to identify emergent subprocesses and add those to the original model in order to reduce time spent by workers creating the same common ad hoc processes over and over again.
There are other measurements of model quality besides fitness, including precision, generalization and simplicity; future research will be looking at these as well as improving the quality of alignment and repair.
Where Did I Misbehave? Diagnostic Information in Compliance Checking
Elham Ramezani of Eindhoven University presented a paper on compliance checking. Compliance checking covers the full BPM lifecycle: compliance verification during modeling, design and implementation; compliance monitoring during execution; and compliance auditing during evaluation. The challenge is that compliance requirements have to be decomposed and used to create compliance rules that can be formalized into a machine-understandable form, then compared to the event logs using a conformance checker. This is somewhat the opposite of the previous paper, which used conformance checking to find ways to modify the process model to fit reality; this looks at using conformance checking to ensure that compliance rules, represented by a particular process model, are being followed during execution.
Again, this is valuable for processes that are not automated using a BPMS or BRMS (since rules can be strictly enforced in that environment), but rather processes executing in other systems or manually: event logs from systems are compared to the process models that represent the compliance rules using a conformance checker, and the alignment calculated to identify non-compliant instances. There were some case studies with data from a medical clinic, detecting non-compliant actions such as performing an MRI and CT scan of the same organ, or registering a patient twice on one visit.
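The two clinic examples can be sketched as simple checks over an event log. This is only an illustration of what "non-compliant instance" means: the paper expresses the rules as Petri nets and finds violations through alignments, while this sketch hard-codes the two rules directly in Python. The field names (`case`, `activity`, `organ`) and activity labels are my assumptions, not the format of the case study data.

```python
from collections import defaultdict


def find_violations(events):
    """Return {case_id: [violation messages]} for two illustrative rules.

    Rule 1: a patient is registered at most once per visit (case).
    Rule 2: the same organ is not scanned with both MRI and CT in one visit.
    """
    by_case = defaultdict(list)
    for event in events:
        by_case[event["case"]].append(event)

    violations = defaultdict(list)
    for case_id, case_events in by_case.items():
        registrations = sum(1 for e in case_events if e["activity"] == "register")
        if registrations > 1:
            violations[case_id].append(f"registered {registrations} times")

        # Collect which scan types were performed on each organ.
        scans = defaultdict(set)
        for e in case_events:
            if e["activity"] in ("MRI", "CT"):
                scans[e["organ"]].add(e["activity"])
        for organ, kinds in scans.items():
            if kinds == {"MRI", "CT"}:
                violations[case_id].append(f"MRI and CT of the same organ: {organ}")
    return dict(violations)


log = [
    {"case": "visit-1", "activity": "register"},
    {"case": "visit-1", "activity": "MRI", "organ": "knee"},
    {"case": "visit-1", "activity": "CT", "organ": "knee"},
    {"case": "visit-2", "activity": "register"},
    {"case": "visit-2", "activity": "register"},
    {"case": "visit-3", "activity": "register"},
    {"case": "visit-3", "activity": "MRI", "organ": "knee"},
    {"case": "visit-3", "activity": "CT", "organ": "liver"},
]
violations = find_violations(log)
```

Here `visit-1` and `visit-2` are flagged, while `visit-3` is compliant because the two scans cover different organs.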
There was an audience question that was in my mind as well, which is why they express the compliance rules in Petri nets rather than a declarative form; she pointed out that the best conformance checkers available for aligning with event logs use operational models such as Petri nets, although they may consider adding declarative rules to this method in the future in addition to other planned extensions to the research. She also mentioned that they were exploring applicability to monitoring service level agreement compliance, which has a huge potential for business applications where SLA measurements are not built into the operational systems but must be detected from the event logs.
FNet: An Index for Advanced Business Process Querying
Zhiqiang Yan, also of Eindhoven University (are you seeing a theme here in process mining?), presented on querying within a large collection of process models based on certain criteria; much of the previous research has been on defining expressive query languages (such as BPMN-Q) that can be very slow to execute, but here they have focused on developing efficient techniques for executing the queries. They identify basic features, or small fragments, of process models, and advanced elements such as transitive or negative edges that form advanced features.
To perform a query, both the query and the target process models are decomposed into features, where the features are small and representative: specific sequences, joins, splits and loops. Keywords for the nodes in the graphs are used in addition to the topology of the basic features. [There was a great deal of graph theory in the paper concerned with constructing directed graphs based on these features, but I think that I forgot all of my graph theory shortly after graduation.]
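The general idea of decomposing models into features and indexing them can be sketched as follows. This is not FNet itself: it is a plain inverted index over sequence, split and join features, with exact label matching, and it leaves out loops as well as the transitive and negative edges that the paper handles. The function names and the representation of a model as a list of labelled edges are my assumptions.

```python
from collections import defaultdict
from itertools import combinations


def features(edges):
    """Decompose a model, given as (source, target) label pairs, into features."""
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)

    # Sequence features: one per edge.
    feats = {("seq", src, dst) for src, dst in edges}
    # Split features: one per pair of successors of the same node.
    for node, outs in succ.items():
        for a, b in combinations(sorted(outs), 2):
            feats.add(("split", node, a, b))
    # Join features: one per pair of predecessors of the same node.
    for node, ins in pred.items():
        for a, b in combinations(sorted(ins), 2):
            feats.add(("join", a, b, node))
    return feats


def build_index(models):
    """Build an inverted index: feature -> set of model ids containing it."""
    index = defaultdict(set)
    for model_id, edges in models.items():
        for feat in features(edges):
            index[feat].add(model_id)
    return index


def query(index, query_edges):
    """Return ids of models that contain every feature of the query."""
    candidate_sets = [index.get(f, set()) for f in features(query_edges)]
    return set.intersection(*candidate_sets) if candidate_sets else set()


models = {
    "m1": [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")],
    "m2": [("A", "B"), ("B", "D")],
}
index = build_index(models)
matches = query(index, [("A", "B"), ("A", "C")])
```

The speed gain of an index comes from the lookup step: a query only touches the models listed under its own features instead of scanning the whole repository.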
The results seem impressive: two orders of magnitude increase in speed over BPMN-Q. As organizations continue to develop large repositories of process models and hope to get some degree of reuse, process querying will become more important in practical applications.
Using MapReduce to scale events correlation discovery for business processes mining
The last paper of this session, and of the day, was presented by Hicham Reguieg of Blaise Pascal University in Clermont-Ferrand. One of the challenges in process mining and discovery is big data: the systems that are under consideration generate incredible amounts of log data, and it’s not something that you’re going to just open up in a spreadsheet and analyze manually. This paper looks at using MapReduce, a programming model for processing large data sets (usually by distributing processing across clusters of computers), applied to the specific step of event correlation discovery, which analyzes the event logs in order to find relationships between events that belong to the same business process.
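As a rough picture of how the MapReduce model applies to correlation discovery, here is a tiny in-memory simulation of the map, shuffle and reduce phases. It is not the algorithm from the paper and it does not use Hadoop or any other framework; it only shows the shape of the computation, under the assumption that events are flat records and that two events are candidates for correlation when they share the same value for the same attribute. All names here are hypothetical.

```python
from collections import defaultdict


def map_event(event):
    """Map: emit ((attribute, value), event_id) for every attribute of an event."""
    for attr, value in event.items():
        if attr != "id":
            yield (attr, value), event["id"]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, event_ids):
    """Reduce: keep only attribute values shared by at least two events."""
    if len(event_ids) > 1:
        return key, sorted(event_ids)
    return None


def correlate(events):
    """Run the three phases in memory and return {(attribute, value): [event ids]}."""
    pairs = (pair for event in events for pair in map_event(event))
    results = (reduce_group(key, ids) for key, ids in shuffle(pairs).items())
    return dict(r for r in results if r is not None)


events = [
    {"id": 1, "order": "o1", "customer": "c1"},
    {"id": 2, "order": "o1"},
    {"id": 3, "order": "o2", "customer": "c1"},
]
correlated = correlate(events)
```

Because each map call looks at a single event and each reduce call looks at a single key, both phases can be spread across a cluster, which is the point of using the model on very large logs.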
Although he didn’t mention the specific MapReduce framework that they are using for their experiments, I know that there’s a Hadoop one – inevitable that we would start seeing some applicability for Hadoop in some of the big data process problems.