Process Mining Update

There has been a surprising amount happening around Process Mining recently, notably the process intelligence workshop, the IEEE task force meeting, and some other links.

Business Process Intelligence Workshop

The 6th International Workshop on Business Process Intelligence (BPI 2010) was held on Monday, September 13, the workshop day of the BPM 2010 conference.  BPI and process mining are closely related, though they are not exactly the same.  Probably half the papers were on process mining.

1. BPI Keynote. Michael zur Muehlen kicked off the workshop with a good overview of the goals we should be seeking.  This was a good talk, and only a fraction of it is captured here, but you can find the slides posted here.

Researchers need to remain grounded in the real world.  He talked about a simple process implemented at a German bank.  Such a simple process, which focused only on one particular difficult-to-handle exception case, would not be exciting for researchers, but the organization itself was thrilled.

He asked: Why can’t we have “Amazon for BPM”?  Amazon’s affinity mechanism shows you other books similar to the one you are looking at.  If you are interested in one process, imagine if the system could show you another five processes that are similar on some basis.  This would be a good use of BPI.

We must remember that the customers of a process are often not interested in the process, only in the result of the process.  Their perspective includes customer needs, customer issues, and service level agreements.  Six Sigma calls it “the voice of the customer”.  Another driver is business strategy: the voice of the bottom line.

The voice of the process, however, is about key goal indicators (KGIs) and key performance indicators (KPIs).  The former measure the process objectives, while the latter measure efficiency targets.

Semantics and Context.  There was an EU project called “Super” to study this.  If you look at a process, there are several classes of information.  At design time there are BPMN semantics, process semantics, metadata, payload semantics, and layout.  Then at runtime you have processing behavior and payload instance semantics.

Open Issues for Business Process Intelligence

  • predicting performance based on case data (scheduling)
  • dealing with events outside of workflow scope (non-workflow systems)
  • modeling complex event processing (reactive / adaptive systems)
  • linking technical metrics to goals (traceability)

Look outside the process box, consider context, and there are exciting possibilities.

2. Mining Context Dependent & Interactive Business Process Maps using Execution Patterns (Li) described a promising approach to making output maps understandable.  The problem with mining is that in a real world situation, the resulting map is usually a very messy spaghetti diagram.  This approach has the user categorize groups of activities into subprocesses.  In return for this modest up-front investment, a significantly simplified diagram can be generated.
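To make the idea of grouping concrete, here is a minimal sketch, assuming a hypothetical hand-made mapping from activities to subprocess names and a toy two-case log.  It is not the algorithm from the paper; it only shows how collapsing grouped activities shrinks the set of "directly follows" edges that a process map has to draw.

```python
from collections import Counter

# Hypothetical user-supplied grouping: activity name -> subprocess name.
GROUPS = {
    "check id": "Verification",
    "check credit": "Verification",
    "print contract": "Contracting",
    "sign contract": "Contracting",
}

def abstract_trace(trace, groups):
    """Replace each activity by its subprocess and merge consecutive repeats."""
    result = []
    for activity in trace:
        label = groups.get(activity, activity)  # ungrouped activities stay as-is
        if not result or result[-1] != label:
            result.append(label)
    return result

def directly_follows(traces):
    """Count how often one step is directly followed by another."""
    edges = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges

# Toy log (invented for illustration): two cases that differ only in ordering.
log = [
    ["receive", "check id", "check credit", "print contract", "sign contract"],
    ["receive", "check credit", "check id", "print contract", "sign contract"],
]

raw_edges = directly_follows(log)
simple_edges = directly_follows([abstract_trace(t, GROUPS) for t in log])
print(len(raw_edges), "edges before grouping,", len(simple_edges), "after")
```

The ordering difference between the two cases disappears once both checks are treated as one "Verification" step, which is the kind of simplification the user's categorization buys.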

3. Toward Robust Conformance Checking (Adriansyah)  One of the main uses of BPI is to tell whether an organization is doing everything they said they would.  In this case the desired process diagram is known, and this paper proposes some ways of producing a numerical measure representing how closely the log of events matches the expected process.
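As a rough illustration of what such a numerical measure could look like, here is a deliberately naive sketch.  It assumes the expected process can be written as a table of allowed transitions (the `MODEL` dictionary and the log are invented), and it simply scores the fraction of steps the model permits.  The measures proposed in the paper are more sophisticated than this.

```python
# Hypothetical expected process: each step maps to the set of steps allowed next.
MODEL = {
    "START": {"register"},
    "register": {"review"},
    "review": {"approve", "reject"},
    "approve": {"END"},
    "reject": {"END"},
}

def trace_fitness(trace, model):
    """Fraction of moves in the trace that the model allows (1.0 = perfect fit)."""
    steps = ["START", *trace, "END"]
    moves = list(zip(steps, steps[1:]))
    allowed = sum(1 for a, b in moves if b in model.get(a, set()))
    return allowed / len(moves)

def log_fitness(log, model):
    """Average trace fitness over the whole event log."""
    return sum(trace_fitness(t, model) for t in log) / len(log)

# Toy log (invented for illustration).
log = [
    ["register", "review", "approve"],  # follows the model
    ["register", "approve"],            # skips the review step
]

print(round(log_fitness(log, MODEL), 3))
```

A score of 1.0 means every case did exactly what the diagram says; anything lower points at cases worth inspecting.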

4. User Assistance During Process Execution – An Experimental Evaluation of Recommendation Strategies (Haisjackl)  My choice for the most interesting paper of the day.  This paper starts with the understanding that some people (a.k.a. knowledge workers) need a flexible environment that allows them to self-determine what to do next.  At the same time, the history of what has gone before can be very helpful as guidance.  This paper describes a way to consider what has happened so far in a case, find matching cases in history, and suggest possible “next steps” to the person running the case.  The results are promising.  I am very interested in seeing more work in this direction!
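The simplest version of this idea I can think of is sketched below, assuming "matching" means that a past case began with exactly the same steps as the current one.  The function name and the history are invented, and the strategies evaluated in the paper are not reproduced here.

```python
from collections import Counter

def recommend_next(history, current, top=3):
    """Suggest next steps by looking at past cases that began the same way."""
    n = len(current)
    votes = Counter(
        case[n]
        for case in history
        if len(case) > n and case[:n] == current
    )
    # Most frequent continuation first.
    return [step for step, _ in votes.most_common(top)]

# Toy case history (invented for illustration).
history = [
    ["intake", "call customer", "send offer"],
    ["intake", "call customer", "send offer"],
    ["intake", "call customer", "escalate"],
    ["intake", "request documents"],
]

print(recommend_next(history, ["intake", "call customer"]))
```

The person running the case remains free to ignore the suggestion, which is what keeps the environment flexible.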

5. Run-time Auditing for Business Process Data using Constraints (Gomez-Lopez) proposes that as part of designing a process definition, one include a set of constraints that the data values must satisfy.  These can be evaluated every time the data is changed, yielding early detection of errors.  This will be effective sometimes, but only in those processes which can be completely and unambiguously expressed in advance.  Also, it is not clear how much trouble the constraint expressions will cost to create, versus the benefit of catching the error early.  To me this is another programming tool for highly automated processes.
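A minimal sketch of the general pattern, assuming constraints are plain predicates over the case data; the class name, the constraints, and the field names are all invented, and the paper's own constraint language is not shown here.

```python
class AuditedData:
    """Process data whose constraints are re-checked on every change."""

    def __init__(self, constraints):
        # constraints: mapping of description -> predicate over the data dict
        self._constraints = constraints
        self._values = {}

    def set(self, name, value):
        # Check the constraints against the proposed state before committing it.
        candidate = {**self._values, name: value}
        violated = [
            label for label, check in self._constraints.items()
            if not check(candidate)
        ]
        if violated:
            raise ValueError("constraint violated: " + ", ".join(violated))
        self._values = candidate

    def get(self, name):
        return self._values[name]

# Hypothetical constraints for an order-handling process.
constraints = {
    "amount must not be negative": lambda d: d.get("amount", 0) >= 0,
    "discount must not exceed amount":
        lambda d: d.get("discount", 0) <= d.get("amount", 0),
}

data = AuditedData(constraints)
data.set("amount", 100)
data.set("discount", 10)
```

A later `data.set("discount", 200)` would raise `ValueError` at the moment of the change rather than at the end of the process.  Even this toy shows the cost side: every rule has to be thought of and written down in advance.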

6. A critical evaluation of model-log metrics in Process Discovery (De Weerdt)  Various process mining algorithms will yield different resulting models.  This paper proposes several ways to measure the quality of the resulting model, which can be used in determining which mining algorithms are better.

7. BPAF: A Standard for the Interchange of Process Analytics Data (zur Muehlen, with myself as co-author) presents the result of WfMC work on standardizing the event stream.  It is meant mainly as a means to promote discussion of such a standard and of the requirements it should cover.  See the IEEE discussion below.

8. Revising Process Models through Inductive Learning (Maggi) once again we see the recognition that processes are not fixed, and must change over time.  This proposes an approach to find a minimal change to an existing process model, which can incorporate the changes mined from the event log.  It is really a very compelling vision for supporting the long term maintenance of process models.

9. Improving the Diagnosability of Business Process Management Systems Using Test Points (Borrego) an approach is given to reduce the complexity of the job of determining if a process is running correctly.  Seems to me to be aimed mostly at fully automated (programmed) processes.

10. Toward Obtaining Event Logs from Legacy Code (Perez-Castillo) presents an approach which might be useful when an organization is dependent upon a legacy system and no longer has any idea what it does.  You can generate log files from the code, but indiscriminately peppering the code with log statements will not necessarily yield useful results.  This paper tells you how to approach this task so that you get useful output.

11. Dimensions of Business Process Intelligence (Linden) presents a discussion of the terminology of the field, and the direction it might be taking.

12. PLG: a framework for the Generation of Business Process Models and their Execution Logs (Burattin) presents a way to generate event logs which can be used for testing of process mining tools.

IEEE Task Force on Process Mining

The goal of this Task Force is to promote the research, development, education and understanding of process mining.  Web site:

I have been involved with this task force since it was first proposed in summer 2009, but September 15th at BPM 2010 at Stevens Institute of Technology was the first time this group held a meeting.  The meeting was called by Wil van der Aalst, who has been motivating the task force along with others at TU Eindhoven.  This being the first meeting, we were not expecting any conclusions, but rather setting the stage for getting work done.

Attendance was mostly professors and grad students, which is not surprising given that BPM 2010 is primarily an academic conference.  John Hoogland (Pallas Athena) and I were there as representatives from industry.  Unfortunately I had a panel session just before the meeting, and a talk scheduled after, so I had to come late and leave early.

There was a lot of discussion about clarifying terminology. For example, Fujitsu calls this “Automatic Process Discovery” instead of process mining.  Why not have everyone use a single term?  At the same time, there was discussion of some people who call it process discovery when it is not.  What is needed is a clear definition, and probably a glossary of terms.

The task force might be the right place to create standards.  The most important standard needed is for the exchange of event data.  A proposal exists: the XES (Extensible Event Stream) format, which is already supported by the open source ProM project.  XES is comparable to BPAF from the WfMC, and Michael zur Muehlen and I presented a paper covering the similarities and differences (see above).  Such a standard will help in two ways: first, it will help vendors and practitioners by making it easy to connect various products together, and second, it will help the formation of a large collection of use case event histories that can be used for research activities.  BPAF contains some concepts not yet present in XES, so I now have an action item to figure out how to extend XES to have this capability.

XES is an unusual approach to encoding information in XML, with the tags being data types, and the name of the data being an attribute of the tag.  I believe this unusual approach is yet another attempt to remain flexible while complying with the completely inappropriate demands of XML Schema.  The more I work with XML Schema, particularly “Validation”, the more I think that it, like the WS-* approach in general, simply misses the mark.  Why am I not surprised to see SalesForce endorse a REST approach just this week?
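To show what "tags as data types" means in practice, here is a small sketch that reads a hand-written fragment in the style of XES using Python's standard library.  The fragment is illustrative rather than taken from a real log: `concept:name` and `time:timestamp` are standard XES keys, while `cost` is an invented attribute.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of XES; not taken from a real log.
XES_FRAGMENT = """
<log>
  <trace>
    <string key="concept:name" value="case-1"/>
    <event>
      <string key="concept:name" value="register"/>
      <date key="time:timestamp" value="2010-09-13T09:00:00"/>
      <int key="cost" value="25"/>
    </event>
  </trace>
</log>
"""

# The tag is the data type, so the tag name selects a converter.
CONVERTERS = {
    "string": str,
    "int": int,
    "float": float,
    "boolean": lambda v: v == "true",
}

def attributes(element):
    """Read the typed key/value children of a trace or event element."""
    result = {}
    for child in element:
        if "key" in child.attrib:  # skips nested <event> elements
            convert = CONVERTERS.get(child.tag, str)  # dates stay as text here
            result[child.attrib["key"]] = convert(child.attrib["value"])
    return result

root = ET.fromstring(XES_FRAGMENT)
trace = root.find("trace")
case_attributes = attributes(trace)
events = [attributes(e) for e in trace.findall("event")]
print(case_attributes, events)
```

Because the name lives in the `key` attribute, a reader cannot look up an element by name in the usual XML way; it has to walk the children and rebuild a dictionary, as `attributes` does.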

Of all this, most important is for the task force to serve as a place to promote the proper uses of process mining, educate the market about the potential, and to discuss and resolve disagreements when they arise.

Other Process Mining Resources

Process mining is clearly a new and very hot field.  It has gone from initial conception 10 years ago, to shipping products today, very fast by any measure.  And the value of this mining is easy to prove.  I anticipate in the next few years process mining will become increasingly a regular part of our technology infrastructure.


9 Responses to Process Mining Update

  1. Hi Keith,

    Great overview about all things process mining at BPM2010, thanks!

    For people interested in an overview about the XES standard, we have written up a short introduction on our blog here:

    — Christian

  5. Hi Keith,

    I’ve been using BPAF (as described in the spec currently available at the wfmc website) in an environment where we are tracking business process events. Here are some of my observations:

    1. Strictly resource related events: One type of event we are capturing is when resources come on and off shift. In this way we can calculate the overall time a resource is available for work. The problem is that the current state model doesn’t really work well for these events. I suppose you could say that the shift is “In Progress”, but I think you’d agree that being on shift is a different sort of thing than a work item being processed. Specifically, the time a resource is scheduled is a different kind of measure than work processing time.

    I’d like to see some resource related additions to the state model (like “On Shift” and “Off Shift”), or some other way to differentiate a strictly resource related event from a process activity related event. I know I could differentiate between events via a data element, but then it would be by convention instead of by spec, and this seems a little too fundamental IMHO for that kind of approach.

    2. Event log specifications like ProM’s MXML have an explicit element for the activity performer(s). I think this is useful in that events will often be associated with specific activity instances and specific resources. I know I could specify the resources as a data element; however, I am thinking that this is a rather common requirement.

    3. I have a need to capture both actual and virtual events. The virtual events are the output of simulation, forecasting and other predictive tools. Therefore, the actual and virtual events can “overlap” and two events can represent the same thing – the same activity instance in the same process. It would be great to be able to differentiate between these. In this case we could also use a data element to specify the “scenario”, e.g. actual vs. virtual; however, I thought I’d throw it out there as an idea to add the scenario directly and explicitly to the schema.
