Peter Schoof interviewed me last week on the subject of robust BPM. (Thanks, Peter!) The topic was the basis of a talk I gave in Montreal at the Workshop on Methodologies for Robustness Injection into Business Processes. The interview is a quick 15-minute summary:
The main point is that the standard mechanism for reliability in software engineering is the database transaction. A single system can always be kept consistent through proper use of transactions; however, BPM often spans many systems. Large distributed transactions, while theoretically possible, are not practical. Therefore, you will always run into consistency problems, which must be dealt with.

The answer to making a reliable BPM solution is not to sweep the problems under the rug, but rather to make sure that any such problem is quickly and reliably reported. When a problem occurs, stop processing right away, and just record the issue. Instead of designing processes as an opaque black box for the user, allow dashboard-like visibility into some of the key, separate parts of the process, and give status lights to indicate whether the remote process ran correctly or not. This means that your process system must be instrumented to be able to report on status, particularly error status. (It is not OK to fail, and then just go dark. The system has to be able to report about failures.) Instead of trying to prevent all possible failures, the system needs to be designed with the idea that failures will happen, and to be able to record and communicate about them.
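To make the "stop, record, and show a status light" idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the step names (`update_crm`, `update_billing`, `send_confirmation`), the `Light` colors, and the `run_process` helper are all hypothetical and do not come from any particular BPM product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Light(Enum):
    """Hypothetical dashboard status lights, one per process step."""
    GREY = "not run"
    GREEN = "ok"
    RED = "failed"


@dataclass
class StepStatus:
    light: Light = Light.GREY
    error: Optional[str] = None


def run_process(steps: List[Tuple[str, Callable[[], None]]]) -> Dict[str, StepStatus]:
    """Run steps in order; on the first failure, record it and stop."""
    board = {name: StepStatus() for name, _ in steps}
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            # Do not go dark: keep the error where a person can see it.
            board[name] = StepStatus(Light.RED, f"{type(exc).__name__}: {exc}")
            break  # stop right away; later steps stay grey
        board[name] = StepStatus(Light.GREEN)
    return board


# Stand-ins for calls to separate remote systems (all invented for this sketch).
def update_crm() -> None:
    pass


def update_billing() -> None:
    raise ConnectionError("billing system unreachable")


def send_confirmation() -> None:
    pass


board = run_process([
    ("update_crm", update_crm),
    ("update_billing", update_billing),
    ("send_confirmation", send_confirmation),
])
for name, status in board.items():
    print(f"{name:18} {status.light.value:8} {status.error or ''}")
```

The design choice worth noticing is that the failure is neither retried silently nor swallowed: the failing step turns red with its error text attached, and the steps after it are left visibly untouched.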
- You want your BPM diagrams to be a clean, pure representation of the business logic, not muddied by the realities of the hosting environment. Such a process would run in an idealized, perfect environment, but we don’t actually have such an environment.
- Do not confuse this BPM diagram with the system architecture! You need a system architect to take the business logic and translate it to the realities of the actual environment. Parts of the system are reliable and can have a faithful translation. Other parts might get out of sync, and for those, special mechanisms must be included to notify people of problems (failures) and to give them controls to restart things when necessary.
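The "controls to restart things" in the last point can be sketched in the same spirit. This is a hypothetical example, not a description of any real product: the `ProcessInstance` class and the step names are invented, and it assumes that completed steps should not be repeated when an operator restarts a failed process.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


class ProcessInstance:
    """Hypothetical process instance that remembers where it stopped."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps
        self.next_index = 0  # first step that has not completed yet
        self.last_error: Optional[str] = None

    def run(self) -> bool:
        """Run from the current position; return True once every step is done."""
        self.last_error = None
        while self.next_index < len(self.steps):
            name, action = self.steps[self.next_index]
            try:
                action()
            except Exception as exc:
                self.last_error = f"{name}: {exc}"  # reported, not hidden
                return False  # stop here; a person decides when to restart
            self.next_index += 1
        return True


# Simulated remote system that is down at first (invented for this sketch).
remote_is_up = {"value": False}
calls: List[str] = []


def reserve_stock() -> None:
    calls.append("reserve_stock")


def charge_card() -> None:
    if not remote_is_up["value"]:
        raise ConnectionError("payment system unreachable")
    calls.append("charge_card")


instance = ProcessInstance([
    ("reserve_stock", reserve_stock),
    ("charge_card", charge_card),
])

first = instance.run()                # fails at charge_card
error_after_first = instance.last_error
remote_is_up["value"] = True          # someone fixes the remote system
second = instance.run()               # operator presses "restart"
```

Here the second call to `run()` stands in for the restart button: it resumes at the step that failed rather than starting the whole process over.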
It is ironic that you make a system robust and reliable not by hiding problems, but by exposing every problem as it happens.