TIBCO’s Recent Acquisitions: DataSynapse, Foresight, Netrics and Spotfire

No rest for the wicked: at the analyst lunch, we had sessions on four of TIBCO’s recent acquisitions while we were eating:

DataSynapse

This is a significant part of TIBCO’s cloud and grid strategy, with a stack of four key products:

  • Grid Server, which allows multiple servers to be pooled and used as a single resource
  • Fabric Server, which is the platform-as-a-service platform on top of Grid Server
  • Federator, a self-service provisioning portal
  • DataSynapse Analytics, providing metering of the grid

The real meat is in the Grid Server, which has been used to create private clouds of over 40,000 connected cores; these can be either internal or externally-facing, so are being used for customer-facing applications as well as internal ones. They position Grid Server for situations where the application and configuration complexity are just beyond the capabilities of a platform like VMware, and see three main use cases:

  • Dynamic application scalability
  • Server virtualization to improve utilization and reduce deployment times
  • Rolling out new applications quickly

Foresight

A recent acquisition, Foresight is used for transaction modernization and cross-industry EDI, with some particularly strong healthcare solutions. They have several products:

  • Gateway/portal for managing healthcare insurance transactions between parties
  • EDISIM, for EDI authoring, testing and compliance
  • HIPAA Validator, for compliance and validation of HIPAA transactions
  • Instream, for routing, acknowledgement, management and translation of messages and events
  • Community Manager, for mass testing and migration

From cloud to EDI was a bit of a retro comparison, although there’s a lot of need for both.

Netrics

Netrics does data matching of (semi-)structured data, such as name matching in databases, in order to clean up data, reduce errors and repeats, and improve decision-making. They have two products:

  • Matching Engine models human similarity measures for comparing data
  • Machine Learning Engine models human decisions on data

There was an interesting discussion about some of the algorithms that they’re using, which go far beyond the simple Soundex-type calculations that are more commonly available.
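
Netrics didn’t disclose the details of those algorithms, so as a rough illustration only (my own sketch, not Netrics’ code): the difference between a phonetic bucket and a graded similarity score is the kind of thing a matching engine goes far beyond, shown here with a simplified Soundex and Python’s built-in difflib.

```python
# Illustrative only: Netrics' actual algorithms are proprietary and weren't
# described in detail. This just contrasts a phonetic bucket (simplified
# Soundex) with a graded similarity score.
from difflib import SequenceMatcher

def simple_soundex(name: str) -> str:
    """Simplified 4-character Soundex code: same code means 'sounds alike'."""
    codes = {}
    for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")]:
        for letter in letters:
            codes[letter] = digit
    name = name.upper()
    out, prev = [], codes.get(name[0], "")
    for c in name[1:]:
        d = codes.get(c, "")              # vowels and H/W/Y map to ""
        if d and d != prev:
            out.append(d)
        prev = d
    return (name[0] + "".join(out) + "000")[:4]

def similarity(a: str, b: str) -> float:
    """Graded 0..1 similarity; tolerant of typos, not just sound-alikes."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for a, b in [("Smith", "Smyth"), ("Catherine", "Katherine"), ("Jon", "John")]:
    print(f"{a}/{b}: soundex match={simple_soundex(a) == simple_soundex(b)}, "
          f"similarity={similarity(a, b):.2f}")
```

Note how “Catherine” and “Katherine” fail the phonetic match entirely (different first letters) but score very high on graded similarity, which is exactly the kind of case that simple Soundex-style matching misses.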

Spotfire

Spotfire is the oldest acquisition of the four presented here (three years ago), and was presented as much to illustrate TIBCO’s model for acquisition and assimilation as to talk about Spotfire’s capabilities.

Spotfire, as I’ve written about previously, provides easy-to-use visual analytics, using in-memory data for near-instantaneous results. Since becoming part of TIBCO, they’ve integrated with other TIBCO products to provide visualization for a wide range of process and event-driven applications. Their integration with iProcess BPM was shown back in 2008, and they’ve developed links with the SOA and CEP products as well.

This acquisition shows how TIBCO’s acquisition process works with these smaller companies – different from either the Borg or the death-by-1,000-cuts methods of their competitors – starting with the fact that they specifically target companies that allow them to leapfrog their competition technologically by buying cool and innovative technology. Once acquired, Spotfire had access to TIBCO’s large base of customers, partners and markets, providing an immediate boost to their sales efforts. As they reorganized, the product group focused on preserving what worked at Spotfire, while optimizing for execution within the larger TIBCO context. Alongside this, the Spotfire product group worked with other TIBCO areas to integrate with other technologies, weaving Spotfire into the TIBCO portfolio.

WebSphere BPM Product Portfolio Technical Update

The keynote sessions this morning were typical “big conference” fare: too much loud music, and too many comedians and irrelevant speakers for my taste, although the brief addresses by Steve Mills and Craig Hayman as well as this morning’s press release showed that process is definitely high on IBM’s mind. The breakout session that I attended following that, however, contained more of the specifics about what’s happening with IBM WebSphere BPM. This is a portfolio of products – in some cases, not yet really integrated – including Process Server and Lombardi.

Some of the new features:

  • A whole bunch of infrastructure stuff such as clustering for simple/POC environments
  • WS CloudBurst Appliance supports Process Server Hypervisor Edition for fast, repeatable deployments
  • Database configuration tools to help simplify creation and configuration of databases, rather than the back and forth with a DBA that was required with previous versions
  • Business Space has some enhancements, and is being positioned as the “Web 2.0 interface into BPM” (a message that they should probably pass on to GBS)
  • A number of new and updated widgets for Business Space and Lotus Mashups
  • UI integration between Business Space and WS Portal
  • Webform Server removes the need for a client form viewer on each desktop in order to interact with Lotus Forms – this is huge in cases where forms are used as a UI for BPM participant tasks
  • Version migration tools
  • BPMN 2.0 support, using different levels/subclasses of the language in different tools
  • Enhancements to WS Business Modeler (including the BPMN 2.0 support), including team support, and new constructs including case and compensation
  • Parallel routing tasks in WPS (amazing that they existed this long without that, but an artifact of the BPEL base)
  • Improved monitoring support in WS Business Monitor for ad hoc human tasks
  • Work baskets for human workflow in WPS, allowing for runtime reallocation of tasks – I’m definitely interested in more details on this
  • The ability to add business categories to tasks in WPS to allow for easier searching and sorting of human tasks; these can be assigned at design time or runtime
  • Instance migration to move long-running process instances to a new process schema
  • A lot of technical implementation enhancements, such as new WESB primitives and improvements to the developer environment, that likely meant a lot to the WebSphere experts in the room (which I’m not)
  • Allowing Business Monitor to better monitor BPEL processes
  • Industry accelerators (previously known as industry content packs) that include capability models, process flows, service interfaces, business vocabulary, data models, dashboards and solution templates – note that these are across seven different products, not some sort of all-in-one solution
  • WAS and BPM performance enhancements enabling scalability
  • WS Lombardi Edition: not sure what’s really new here except for the bluewashing

I’m still fighting with the attendee site to get a copy of the presentation, so I’m sure that I’ve missed things here, but I have some roundtable and one-on-one sessions later today and tomorrow that should clarify things further. Looking at the breakout sessions for the rest of the day, I’m definitely going to have to clone myself in order to attend everything that looks interesting.

In terms of the WPS enhancements, many of the things that we saw in this session seem to be starting to bring WebSphere BPM level with other full BPM suites: it’s definitely expanding beyond being just a BPEL-based orchestration tool to include full support for human tasks and long-running processes. The question lurking in my mind, of course, is what happens to FileNet P8 BPM and WS Lombardi (formerly TeamWorks) as mainstream BPM engines if WPS can do it all in the future? Given that my recommendation at the time of the FileNet acquisition was to rip out BPM and move it over to the WebSphere portfolio, and the spirited response that I had recently to a post about customers not wanting 3 BPMSs, I definitely believe that more BPM product consolidation is required in this portfolio.

Lean Sigma Tools Applied to BPM

Chris Rocke and Jane Long from Whirlpool presented on their experiences with integrating LSS tools into BPM practices to move beyond traditional process mapping. Whirlpool is a mature Six Sigma company: starting in their manufacturing areas, it has spread to all other functions, and they’ve insourced their own training certification program. Six Sigma is not tracked as separate cost/benefit within a project, but is an inherent part of the way every project is done.

They introduced BPM during a large-scale overhaul of their systems, processes and practices; their use of BPM includes process modeling and monitoring, but not explicit process automation with a BPMS outside of their existing financial and ERP systems. However, they are creating a process-centric culture that does manage business processes in the governance and management sense, if not the automation sense in all cases. They brought LSS tools to their BPM efforts, such as process failure mode and effects analysis (PFMEA), data sampling and structure methods, thought maps and control charts; these provide more rigorous analysis than is often done within BPM projects.

Looking at their dashboards, they had the same problem as Johnson & Johnson: lots of data but no consistent and actionable information. They developed some standard KPIs, visualized in a suite of seven dashboards, with alerts when certain control limits are exceeded. Their Six Sigma analytics are embedded within the dashboards, not explicit, so that the business owners view and click through the dashboards in their own terms. The items included in the dashboard are fairly dynamic: for example, in the shipping dashboard, the products that vary widely from expected and historic values are brought forward, while those that are within normal operating parameters may not even appear. Obviously, building the models underlying this was a big part of the work in creating the dashboards. For example, shipping dashboard alerts are based on year-over-year differences (because sales of most products are seasonal), with control limits set at the mean of the YOY differences plus or minus two standard deviations for a yellow alert, or three standard deviations for a red alert; on top of that, there are other factors such as checking whether the previous year’s value was an anomaly, weighting by the number of units shipped, and a few other things thrown in.
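
The core of that alert logic – flag a product when its year-over-year difference strays more than two or three standard deviations from the historical mean – can be sketched in a few lines. This is a minimal illustration only: Whirlpool’s anomaly checks and unit-volume weighting are omitted, and the function and field names are placeholders, not anything from their actual implementation.

```python
# Minimal sketch of the alert logic as described: yellow when the current
# year-over-year difference is more than two standard deviations from the
# mean of past YOY differences, red beyond three. Anomaly checks and
# unit-volume weighting are omitted; all names are placeholders.
from statistics import mean, stdev

def yoy_alert(past_yoy_diffs: list[float], current: float, prior_year: float) -> str:
    mu, sigma = mean(past_yoy_diffs), stdev(past_yoy_diffs)
    diff = current - prior_year
    if abs(diff - mu) > 3 * sigma:
        return "red"
    if abs(diff - mu) > 2 * sigma:
        return "yellow"
    return "ok"

# Example: one product's historical YOY shipment differences, then this year's numbers
history = [120, 95, 110, 130, 105, 90, 115]
print(yoy_alert(history, current=2450, prior_year=2100))  # diff of 350 -> "red"
```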

The analytical calculations behind a dashboard might include internal forecasts or market/industry values, include seasonal fluctuations or not, depending on the particular measurement. The dashboard visuals, however, conceal all the complications of the underlying model. Alerts aren’t necessarily bad, but indicate a data point that’s outside the expected range and warrants investigation or explanation. They’ve seen some success in reducing variability and therefore making their forecasts more accurate: preventing rather than detecting defects.

They’re also using SAP’s Xcelsius for the dashboard itself; that’s the third company I’ve heard from here that is using it, which is likely due in part to the large number of SAP users, but also speaks to the flexibility and ease of use of that tool. They’re using SAP’s Business Warehouse for housing the data, which extracts from their core ERP system nightly: considerably more up-to-date than some of the others that we’ve seen here, which rely on monthly extracts manipulated in Excel. Although IT was involved in creating and maintaining BW, the LSS team owns their own use of Xcelsius, which allows them to modify the dashboards quickly.

Using Dashboards to Run the Business and Focus Improvements

David Haigh of Johnson & Johnson presented on how they’re using dashboards in their process improvement efforts; this is much further into my comfort zone, since dashboards are an integral part of any BPM implementation. He’s part of the consumer products division rather than pharmaceutical or medical: lots of name brands that we all see and use every day.

Their process excellence program covers a range of methods and tools, but today’s talk was focused on dashboards as a visualization of a management system for your business: to set strategy, track progress, and make corrections. Like many companies, J&J has a lot of data but not very much that has been transformed into actionable information. He makes an automotive analogy: a car engine typically has 43 inputs and 35 outputs, but we drive using a dashboard that has that information rolled up into a few key indicators: speed, RPM, temperature and so on.

They see dashboards as being used for governing the company, but also for informing the company, which means that the dashboards are visible to all employees so that they understand how the company is doing, and how their job fits into the overall goals and performance. Dashboards can – and should – leverage existing reporting, especially automated reporting, in order to reduce the incremental work required to create them. They have to be specific, relating jobs to results, and relevant in terms of individual compensation metrics. They have dashboards with different levels of detail for different audiences: real-time detailed cockpits, medium-level dashboards, and reports for when a repeatable question can’t be answered from a dashboard within three clicks (great idea for deciding when to use a dashboard versus a report, btw). They used a fairly standard, slightly waterfall-y method for developing their dashboards, although they did their first rollout in about 3 months with the idea that the dashboards would be customizable to suit changing requirements. One challenge is their wide variety of data sources and the need for data manipulation and transformation before reporting and feeding into dashboards.

They had most of their reports in Excel already, and added SAP’s Xcelsius to generate dashboards from those Excel reports. That provided them with a lot of flexibility in visualization without having to rewrite their entire ETL and reporting structure (I know, export to Excel isn’t the best ETL, but if it’s already there, use it).

One of the big benefits is the cross-departmental transparency: sales and logistics can see what’s happening in each other’s areas, and understand how their operations interrelate. This highlights their non-traditional approach to dashboard visibility: instead of just having management view the dashboards, as happens in most companies, they expose relevant parts of the dashboard to all employees in order to bring everyone into the conversation. They actually have it on monitors in their cafeteria, as well as on the intranet. I love this approach, because I’m a big believer in the benefits of transparency within organizations: better-informed people make better decisions, and are happier in their work environment. They’re able to weave the dashboards into their process improvements and how they engage with employees in running the business: being able to show why certain decisions were made, or the impact of decisions on performance.

Their next steps are to review and improve the metrics that they collect and display, and to start involving IT to automate more of the data collection by pushing information directly to Cognos rather than Excel. There were a ton of questions from the audience on this; some are using dashboards, but many are not, and are interested in how this can help them. I’m interested in how they plan to push the dashboard results beyond just human consumption and into triggering other automated processes through event processing, but I’ll have to catch David offline for that conversation.

Lean Six Sigma & Process Improvement: David Brown of Motorola

I missed the first morning of the IQPC Lean Six Sigma & Process Improvement conference in Toronto today, but with my usual impeccable timing, showed up just in time for lunch (where we had to explain the rules of curling to the American attendees). The first session this afternoon is with David Brown, a black belt at Motorola, where the term “Six Sigma” was first coined and is still used to make their processes more effective, efficient, productive, and transparent.

There has been a transformation for them in how they analyze their processes: ranging from just looking at transactions to high-level intelligence including complex simulations and forecasting. Since they run SAP for their ERP, they have a number of SAP business intelligence (Xcelsius and Business Objects) products, although their most complex analysis is done with Oracle Crystal Ball.

Brown’s presentation was short – less than 10 minutes – and the rest of the session was an interactive one-on-one interview with questions from Charles Spina of e-Zsigma, the conference chair. The Q&A explored much more about how Motorola uses business analytics tools, and opened it up to the (small) audience for their experience with analytics. Not surprisingly, there has been quite a bit of success through the introduction of analytics to process improvement teams: sometimes it’s the black belts themselves doing the analytics, sometimes it’s a separate analytics group that works closely with them to develop the reports, analysis, and more complex intelligence based on the large volumes of data collected as part of any process improvement project.

Reporting tools range from Excel – for simple needs – through more complex solutions that include ETL from multiple data sources and regularly scheduled reports, such as Crystal Reports and Xcelsius. Legacy systems can make that a bit of a challenge; often these end up as extracts to Excel or Access, which are then remixed with other sources. Extracts like these can be really problematic, as I’ve seen first-hand with many of my customers, since there’s no way to keep the data completely in sync with the underlying systems, and typically any one legacy system doesn’t have all the relevant data, so there can be a real problem in matching up related data from multiple systems. Brown underlined that the key issue is to get all of your data into a central data warehouse in order to determine if your data is complete and clean, and to facilitate reporting and analytics. This is especially important for process engineers when trying to do time studies over long periods of time: if you don’t have some consistent representation of the processes over the time period in question, then your analysis will suffer.

Motorola is using their data analytics to improve operational processes, such as order shipping, but also what-if scenarios to inform salespeople of the impact of discount levels on the bottom line. In many cases, this is an issue of data integration: Sabrina Lemos from United Airlines (who will be on the following panel) shared what they were able to recover in late container fees just by integrating their container tracking system with a database (Access, alas) that generates their invoices. Interestingly, I wouldn’t have thought of this as a process improvement initiative – although it is – but rather just as an artifact of doing some clever system integration.

They also discussed the challenges with presenting the results of analytics to the less numerically inclined, which often entails rolling data up to some simpler charts that can be drilled into as required, or just presented in a PowerPoint or PDF file. The real ROI may come from more interactive tools, however, such as dashboards that show operational alerts, or real-time what-if analysis to support human and automated decisions. Since Lean and Six Sigma tools are inherently analytical, this isn’t a new problem for the people in this audience; this is a matter of building relationships early with the non-analytical business managers, getting some early successes in projects to encourage adoption, and using different presentation and learning styles to present the information.

Because of the nature of this audience, the analytics that they’re discussing are typically for human consumption; in the BPM world, this is more and more moving to using the analytics to generate events that feed back into processes, or to inform automated decisioning. Either way, it’s all about improving the business processes.

NetWeaver BPM update #SAPTechEd09

Wolfgang Hilpert and Thomas Volmering gave us an update on NetWeaver BPM, since I was last updated at SAPPHIRE when they were releasing the product to full general availability. They’re readying the next wave of BPM – NetWeaver 7.2 – with beta customers now, for ramp-up near the beginning of the year and GA in spring of 2010.

There are a number of enhancements in this version, aimed at increasing productivity and incorporating feedback from customers:

  • Creating user interfaces: instead of just Web DynPro for manual creation of UI using code, they can auto-generate a UI for a human-facing task step.
  • New functions in notifications.
  • Handling intermediate events for asynchronous interfaces with other systems and services.
  • More complete coverage of BPMN in terms of looping, boundary events, exception handling and other constructs.
  • Allowing a process participant to invite other people on their team to participate in a task, even if not defined in the process model (ad hoc collaboration at a step).
  • The addition of a reporting activity to the process model to help merge the process instance data and the process flow data for in-process analytics using a tool such as BusinessObjects; the reporting activity takes a snapshot of the process instance data to the reporting database at that point in the process, without having to call APIs.
  • Deeper integration with other SAP business services, making it easier to discover and consume those services directly within the NetWeaver Process Composer, even if the customer hasn’t upgraded to a version of SAP ERP that has SOA capabilities.
  • Better integration of the rules management (the former Yasu product) to match the NetWeaver UI paradigms, expose more of the functionality in the Composer and allow better use of rules flow for defining rules as well as rules testing.
  • Business analyst perspective in process modeler so that the BA can sketch out a model, then allow a developer to do more of the technical underpinnings; this uses a shared model so that the BA can return to make modifications to the process model at a later time.

I’d like to see more about the ad hoc runtime collaboration at a task (being able to invite team members to participate in a task) as well as the BA perspective in the process modeler and the auto-generation of user interfaces; I’m sure that there’s a 7.2 demo in my future sometime soon.

They also talked briefly about plans for post-7.2:

  • Gravity and similar concepts for collaborative process modeling.
  • Common process model to allow for modeling of the touchpoints of ERP processes in BPM, in order to leverage their natural advantage of direct access to SAP business applications.
  • Push further into the business through more comprehensive business-focused modeling tools.
  • Goal-driven processes where the entire structure of the process model is not defined at design time, only the goals.

In the future, there will continue to be a focus on productivity with the BPM tools, greater evolution of the common process model, and better use of BI and analytics as the BusinessObjects assets are leveraged in the context of BPM.

Social media and business activity monitoring #BTF09

James Kobielus and Natalie Petouhoff presented at a breakout session on social media as a method for gaining visibility into your customer service processes: customers will react on social media channels such as Twitter, Facebook, review and community sites, and blogs if they have either a good or bad customer service experience. I’m not sure that this fits into the classic definition of BAM, but it does provide insight into how well you’re working with your customers.

They referred to the “witness factor” that social media has on business transformation: if people within the company know that they are being watched and commented upon, they often change their behavior in order to make those comments more favorable. Social media provides one window for a company into their customers’ impressions of the company and products; since people are much more likely to comment if they have a bad experience than a good one, those are overwhelmingly negative, but still represent valid complaints.

One problem with many current BAM applications is that they’re trapped within a BPMS framework, and are focused primarily on the data and events generated by that BPMS. Instead, we need to move towards a more comprehensive monitoring environment that can accept information from a number of different sources, including social media channels. Just think of tweets as events that can feed into a monitoring dashboard, allowing a customer service representative to review and respond to those in the context of any other customer-related events and information. Kobielus mentioned that there is little integration of social media into traditional BAM tools, but I think that we’ll see this sort of functionality being offered by other tools, such as more forward-thinking CRM.
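
As a purely hypothetical sketch of what “tweets as events” might mean in practice – no particular BAM product or Twitter API implied, and all field names invented for illustration – the transformation is really just mapping a tweet onto the same kind of event record that a monitoring dashboard already consumes:

```python
# Hypothetical sketch only: no specific BAM product or Twitter API is referenced.
# The point is simply that a tweet can be reshaped into the same kind of event
# record that a monitoring dashboard already consumes.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MonitoringEvent:
    source: str         # e.g. "twitter", "order_system", "call_center"
    timestamp: datetime
    customer_ref: str   # whatever key lets you correlate with CRM/BPM data
    severity: str       # naive keyword flag here; a real system would score sentiment
    payload: str

def tweet_to_event(tweet: dict) -> MonitoringEvent:
    """Map a tweet (as a plain dict of fields) onto a generic monitoring event."""
    negative_words = ("broken", "terrible", "worst", "refund", "cancel")
    severity = "alert" if any(w in tweet["text"].lower() for w in negative_words) else "info"
    return MonitoringEvent(source="twitter", timestamp=tweet["created_at"],
                           customer_ref=tweet["user"], severity=severity,
                           payload=tweet["text"])

event = tweet_to_event({
    "text": "Third week waiting for my refund. Worst support ever.",
    "created_at": datetime(2009, 10, 8, 14, 32),
    "user": "@some_customer",
})
print(event.severity)  # "alert": surfaces alongside any other customer-related events
```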

This seemed to be a bit of a disjointed presentation, with social media on one side and BAM on the other, but there are ways to bring this together: in advance of this session, I started a discussion with my fellow Enterprise Irregulars about Twitter being used for customer engagement (not just one-way PR blasts), which has resulted in a fascinating stream of messages that weave around these same issues. After I’ve had a chance to digest those a bit more, and think about how this impacts on business processes, I’ll bring some of those ideas forward.

HandySoft BizFlow BPM

I caught up with Garth Knudson from HandySoft a few weeks ago; I’ve looked at their BizFlow product previously, and they’re currently at version 11.3, so have a pretty long track record. Although HandySoft handles the same sort of structured processes as you see in most other BPMS vendors, they really focus on ad hoc and dynamic (unstructured) processes, where either a user needs to jump out of an existing process definition at a particular step to an unstructured flow and bring the results back to the structured process, or even create a new dynamically-defined process. Some processes just can’t be modeled in advance due to non-standard processes, changing roles and responsibilities, or process participants and actions that depend on the participating user’s request: this is more like managing a project than a traditional process, but with BPM capabilities and structure applied to it rather than trying to manage it in email. These types of dynamic processes can form a huge portion of an organization’s processes: think of all the ad hoc processes that you have now in email, only with no control or monitoring. Some significant research efforts are underway on dealing with dynamic processes, as I saw at the academic conference in Ulm two weeks ago; Gartner and Forrester are all over this area as well, so I expect that we’ll see some advances from many vendors in this area in the next few years.

The structured parts of the process are managed by BizFlow BPM, whereas the unstructured workflow portions, whether spawned from a structured process or initiated directly, are managed by the OfficeEngine front-end application; in both cases, the process engine is BizFlow. Although you use an email-like interface to kick things off, and email is used as a transport for external recipients, this provides tracking of ad hoc processes that’s just not possible in email.

[Image: HandySoft – specify ad hoc task detail]

To start a completely ad hoc process, you create a task, specify properties such as instructions and deadlines, and attach any documents required or link to documents in an ECM repository using a URL in the rich comments field on the launch form. You use ActiveDirectory/LDAP or type in external email addresses to select participants, specify whether the participants can reassign the task further, and submit the task; then, the task is available for monitoring and you can see who has done what in a graphical view. Process participants receive tasks as calendar invitations, then click through to login to BizFlow and work on the task assigned to them, which may include adding other people to the collaboration. The web-based user interface includes a list of ad hoc tasks in which you are participating, a work list for your activities within structured processes, a launch pad for initiating new tasks or processes, and a graphical view of your SLA scorecard. From there, you can click through to the task monitor for ad hoc tasks that you have created, and see the state of each participant.

[Image: HandySoft – task monitoring]

Since external participants can’t access BizFlow directly, they do their work outside the system and reply; replies from external participants are returned as a proxy, and an internal user must enter the response manually. This sort of one-step collaborative process – including multiple participants and reassignments – can replace the current practice of emailing around to multiple people for information or comments, then manually tracking to see who has responded. In an environment dominated by ad hoc processes in email, this provides a big benefit for tracking who is doing what, and when.

It’s fairly similar for launching an ad hoc task from a structured process: the structured process is modeled (using BPMN) in a similar fashion to other BPMS tools, and launched using a web form. From the participant’s UI at any step, however, you have an “Assign a task” tab that pops up the same form as was used for the purely ad hoc tasks; essentially, this allows delegating the structured process activity to the collaborative task, which can then include people who were not originally involved in the structured process. It doesn’t change the structured process; it just pops out to a collaborative task at this point, and when that completes, it returns to this step in the structured process and continues on. Just as with the standalone ad hoc tasks, this reduces the amount of unmonitored email activity that is prevalent in many structured processes where someone needs to request more information at a step.

In many BPM implementations, there is an attempt to capture all possible exceptions and collaborations as part of the structured process, but in reality, this just isn’t possible; they end up in email, phone calls and other untraceable activities. As Clay Richardson of Forrester pointed out in his vendor snapshot on HandySoft:

Traditional BPM platforms perpetuate the myth of neatly structured processes – with most vendors providing ample support for capturing reoccurring and well-defined workflows, but minimal support for managing unstructured and dynamic business processes. This chasm between the worlds of structured and unstructured processes forces teams to develop custom workarounds to handle ad hoc routing and collaborative interactions, ultimately increasing the time and cost to deliver BPM solutions.

[Image: HandySoft – detailed stats of an SLA violation]

Allowing an ad hoc, yet monitored, task to be launched from any point in a structured process has the effect of reducing the complexity of the structured process without sacrificing monitoring and auditability of the process. In BizFlow, launching an ad hoc task from a structured process causes an indicator to appear on the graphical view of the executing process to indicate that a task has been launched from that point, and the complete audit trail of structured and unstructured processes is maintained. If the ad hoc task isn’t completed within the specified deadline, that SLA violation shows on the structured process monitoring, and you can click to the OfficeEngine interface for detailed monitoring of the task.

On their product release agenda for later this year are a reporting service module – there’s already fairly capable BAM functionality – and full rich internet application development capabilities to create more usable web forms for the user interface.

Discovering Reference Models by Mining Process Variants Using a Heuristic Approach #BPM2009

Chen Li of University of Twente gave the last presentation of the conference on process variant mining. We heard yesterday in the tutorial on flexibility about process variants; one issue with process variants is that there needs to be some way to identify which of the variants are important enough to update the original process model. The paper describes a heuristic search algorithm for determining which of the variants are important, by considering both the similarity to the original process model and the most common changes amongst the variants, e.g., the same new step is added to almost all process variants.

Since the process can be varied at runtime, the new process model doesn’t have to be perfect, it just has to be better than the original one. In general, the more activities contained in a candidate model and the more that its structure matches that of the variants, the better it is: they have created a fitness function that combines these two parameters and calculates how good a specific candidate model is. The search tree used to find the optimal candidate process model generates all potential candidates by changing one activity at a time, calculating the best fit, then replacing the original with the candidate if it is better than the original. This continues until no better candidate model can be calculated, or until you reach your maximum search distance (which would be set in order to bound computations).
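
The paper’s actual fitness function (balancing activity coverage against structural fit to the variants) and its model-edit operations are considerably richer than anything shown here, but the overall shape of the search is a bounded greedy hill climb. Here’s a toy sketch, with placeholder fitness and neighbour functions of my own, just to make that shape concrete:

```python
# The paper's fitness function and model-edit operations are far richer than
# the toy versions below; this only shows the "change one thing, keep it if
# fitness improves, stop at the search-distance bound" loop.

def discover_reference_model(original, variants, neighbours, fitness, max_distance=5):
    current, current_fit = original, fitness(original, variants)
    for _ in range(max_distance):                 # bound on edits away from the original
        candidates = list(neighbours(current))    # all models one change away
        if not candidates:
            break
        best = max(candidates, key=lambda m: fitness(m, variants))
        best_fit = fitness(best, variants)
        if best_fit <= current_fit:               # nothing beats the current model: stop
            break
        current, current_fit = best, best_fit
    return current

# Toy usage: models as sets of activity labels, fitness as average Jaccard
# similarity to the variants (a stand-in, not the paper's actual measure).
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

alphabet = {"A", "B", "C", "D", "E"}

def neighbours(model):
    return [model | {x} for x in alphabet - model] + [model - {x} for x in model]

def avg_jaccard(model, variants):
    return sum(jaccard(model, v) for v in variants) / len(variants)

original = {"A", "B"}
variants = [{"A", "B", "C"}, {"A", "C"}, {"A", "B", "C", "D"}]
print(sorted(discover_reference_model(original, variants, neighbours, avg_jaccard)))
# ['A', 'B', 'C'] -- the change common to most variants gets pulled into the model
```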

The algorithm was evaluated with simulations, indicating that the most important changes tend to be performed at the beginning of the search.

Divide-and-Conquer Strategies for Process Mining #BPM2009

In the first of two papers in the final session of the conference, Josep Carmona of Universitat Politecnica de Catalunya presented on process mining calculation strategies. The theory of regions shows how to derive a Petri net representation of a process model from the process log, which shows the transition between states, but it’s very computationally expensive. This paper deals with ways of making that computation less expensive in order to deal effectively with large logs.

First is a decompositional strategy, which partitions the regions in a way that allows the identification of a set of state machines that cover all the events, then uses parallel composition to assemble the state machines into a Petri net.

The second approach is a higher-level divide-and-conquer strategy, where the event log is recursively partitioned by event class until the log sections are small enough to use other techniques. The clustering of the events is the key thing here: first, compute the causal dependency graph, then use spectral graph theory to find clusters of highly related events that will be partitioned off into their own section of the event log.
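
As a rough illustration of the clustering step only – not the algorithm from the paper, whose partitioning and per-cluster mining are more involved – the basic spectral move is to build a symmetric relatedness matrix from direct-follows counts in the log and split the events using the Fiedler vector of the graph Laplacian. A sketch, with an invented toy log:

```python
# A sketch of the clustering step only, not the full algorithm from the paper:
# build a symmetric "relatedness" matrix from direct-follows counts in the log,
# then split the events in two using the Fiedler vector of the graph Laplacian.
import numpy as np

def directly_follows_matrix(log, events):
    idx = {e: i for i, e in enumerate(events)}
    w = np.zeros((len(events), len(events)))
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            w[idx[a], idx[b]] += 1
    return w + w.T  # symmetrize: we only care how strongly two events are related

def spectral_bipartition(log):
    events = sorted({e for trace in log for e in trace})
    w = directly_follows_matrix(log, events)
    laplacian = np.diag(w.sum(axis=1)) - w
    _, eigvecs = np.linalg.eigh(laplacian)        # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                       # eigenvector of 2nd-smallest eigenvalue
    part_a = [e for e, v in zip(events, fiedler) if v < 0]
    part_b = [e for e, v in zip(events, fiedler) if v >= 0]
    return part_a, part_b

log = [["a", "b", "c", "x", "y", "z"],
       ["a", "c", "b", "x", "z", "y"],
       ["a", "b", "c", "x", "y", "z"]]
print(spectral_bipartition(log))  # expect {a, b, c} and {x, y, z} to separate
```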

What they’ve seen in experiments using this technique is that there is a significant computational improvement (from minutes to seconds) from the decompositional approach, and that the divide-and-conquer approach allows for the processing of event logs that are just too large for other techniques.

You can get Genet, the tool that they developed to do this, here.