Process Mining can be defined as a combination of data science and business process management (BPM) techniques that enables the analysis of business processes by examining logs and event data generated by information systems. These data originate from various business sources, such as ERP (Enterprise Resource Planning), CRM (Customer Relationship Management), human resource management systems, and production management systems. The core principle of Process Mining is that it utilizes data actually generated by processes.
In its classical form, Process Mining algorithms are used to build a process model from the extracted data. This model visualizes the flow of activities, representing the sequences of events and the various paths followed within the process. The output can be a graphical map that displays all possible process variants, including exceptions or deviations from standard flows. Models in this category are referred to as imperative models and, in the case of well-defined and standardized processes, they can accurately and comprehensively describe the analyzed processes.
However, for processes lacking these characteristics, imperative models often prove unsuitable: they either lose generality by excluding valid variants or become unreadable and unusable, the so-called “spaghetti models.”
In Figure 2, a representation of a real process is shown as the green area on the left. An imperative model can be represented as the blue ellipse in the center of the figure; it may capture a portion of the real process while potentially including variants and paths that are not present in reality. A declarative model, on the other hand, is based on rules and constraints that are used not to define all possible process paths but to delimit the region where the process traces can be found. This is graphically represented on the right side of the figure.
Revelis invests in research and development and has recently developed explainable and aware process intelligence techniques. To this end, recent advances in artificial intelligence and in declarative languages and techniques have been integrated into Process Mining to represent flexible processes.
The work carried out by Revelis aimed to implement a demonstrator prototype of these techniques by applying them to an industrial case study.
Declarative Process Mining
Declarative Process Mining represents an innovative evolution in process analysis. Unlike traditional Process Mining, which mainly relies on extracting process models from event logs, declarative Process Mining introduces a higher level of abstraction. Declare is a declarative language for process modeling that consists of a set of templates expressing the temporal properties of process execution traces.
The Declare models (or templates) can be classified into four distinct categories, each addressing different aspects of process behavior:
- Existence Models: Specify the necessity or prohibition of executing a particular activity, potentially with constraints on the number of occurrences.
- Choice Models: Focus on the concept of execution choices, modeling scenarios where there is an option regarding which activities can be executed.
- Relation Models: Establish dependencies between activities, specifying that the execution of one activity requires the execution of another, often based on specific conditions or requirements.
- Negation Models: Model mutual exclusivity or prohibitive conditions in activity execution.
For example, the Existence(A) model indicates that activity A must appear at least once in every trace of the process log. The Exclusive Choice(A, B) model specifies that exactly one of the activities A and B appears in a trace: at least one of them must occur, but never both. The ChainResponse(A, B) model requires that every time A appears in a process instance, B is the immediately following activity.
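The three templates just mentioned can be sketched as simple checks over a single trace. This is only an illustration, assuming the standard Declare reading of each template and a trace represented as an ordered list of activity names; the function names are hypothetical and do not come from the Revelis prototype.

```python
from typing import List

Trace = List[str]  # one process instance as an ordered list of activity names


def existence(trace: Trace, a: str) -> bool:
    """Existence(A): A occurs at least once in the trace."""
    return a in trace


def exclusive_choice(trace: Trace, a: str, b: str) -> bool:
    """Exclusive Choice(A, B): exactly one of A and B occurs in the trace."""
    return (a in trace) != (b in trace)


def chain_response(trace: Trace, a: str, b: str) -> bool:
    """ChainResponse(A, B): every occurrence of A is immediately followed by B."""
    for i, activity in enumerate(trace):
        if activity == a:
            # A at the end of the trace, or followed by something else, is a violation
            if i + 1 >= len(trace) or trace[i + 1] != b:
                return False
    return True


trace = ["A", "B", "C", "A", "B"]
print(existence(trace, "A"))              # True
print(exclusive_choice(trace, "A", "D"))  # True: A occurs, D does not
print(chain_response(trace, "A", "B"))    # True
print(chain_response(trace, "B", "C"))    # False: the last B is not followed by C
```

Note that ChainResponse is trivially satisfied by a trace in which A never occurs.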
Answer Set Programming
Answer Set Programming (ASP) is a form of non-monotonic logic programming that focuses on solving problems by representing them as sets of logical rules. Instead of providing a single result, ASP provides a set of models (or answer sets) that satisfy all the rules of the program. This makes it particularly suitable for tackling complex and uncertain problems, where multiple valid solutions may exist.
Encoding Declare Models in ASP
Within the project, an automatic translation technique for Declare models and process logs into ASP logic programs was developed. This allows the use of ASP for tasks such as Conformance Checking and Query Checking. In the first case, the goal is to verify the adherence of an event log to a set of Declare templates. In the second case, the aim is to identify all possible solutions to a query by specifying a Declare template, a process activity, and one or more variables.
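The encoding used in the project is not shown here. As a purely illustrative sketch, the first step of such a translation could serialize the log and the constraints as ASP facts. The predicate names `trace/3` and `constraint/4` and both helper functions are assumptions made for this example, not the prototype's actual encoding.

```python
from typing import List


def log_to_asp_facts(log: List[List[str]]) -> List[str]:
    """Serialize a log as facts of the (hypothetical) form
    trace(TraceId, Position, "activity")."""
    facts = []
    for trace_id, trace in enumerate(log):
        for position, activity in enumerate(trace):
            facts.append(f'trace({trace_id},{position},"{activity}").')
    return facts


def constraint_to_asp_fact(constraint_id: int, template: str, *activities: str) -> str:
    """Serialize a Declare constraint as a (hypothetical) fact
    constraint(Id, template, "a1", "a2", ...).
    The template must be a lowercase identifier to be a valid ASP constant."""
    args = ",".join(f'"{a}"' for a in activities)
    return f"constraint({constraint_id},{template},{args})."


log = [["VT Merit to be assigned", "VT merit in progress"]]
for fact in log_to_asp_facts(log):
    print(fact)
print(constraint_to_asp_fact(
    21, "chain_response", "VT Merit to be assigned", "VT merit in progress"))
```

An ASP solver would then combine these facts with rules defining when each template is violated. The sketch assumes activity names contain no double quotes; otherwise they would need escaping.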
Example:
ChainResponse(‘A’, X) with minimum support 0.8
Let A be a process activity and X a variable. The result of the query is the set of all activities X for which ChainResponse(A, X) holds, that is, X immediately follows every occurrence of A, in at least 80% of the cases.
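The query above can be mimicked in a few lines of plain Python. This is a minimal sketch and not the ASP-based implementation: it assumes that support is the fraction of traces in which the constraint holds (traces without A count as satisfied), and the log and function names are invented for the example.

```python
from typing import Dict, List


def chain_response_holds(trace: List[str], a: str, x: str) -> bool:
    """True if every occurrence of a in the trace is immediately followed by x."""
    return all(
        i + 1 < len(trace) and trace[i + 1] == x
        for i, activity in enumerate(trace)
        if activity == a
    )


def query_chain_response(log: List[List[str]], a: str, min_support: float) -> Dict[str, float]:
    """Return every activity X whose ChainResponse(a, X) support reaches min_support."""
    candidates = sorted({activity for trace in log for activity in trace})
    result = {}
    for x in candidates:
        satisfied = sum(chain_response_holds(trace, a, x) for trace in log)
        support = satisfied / len(log)
        if support >= min_support:
            result[x] = support
    return result


# Hypothetical log with five traces
log = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "B", "A", "B"],
    ["A", "B", "D"],
    ["A", "C"],
]
print(query_chain_response(log, "A", 0.8))  # {'B': 0.8}
```

In this toy log, B immediately follows A in four of the five traces, so it is the only binding of X that reaches the 0.8 threshold.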
Case Study
The techniques presented so far were applied using data on the management of applications submitted for obtaining Development Contracts (CdS) with an important national development agency. The process under consideration is divided into two independent macro-phases (Instruction and Technical Verification), which were analyzed separately. Using the specifically implemented prototype, the first task was to derive a Declare model for the two phases. An excerpt of the model for the Technical Verification phase is shown here.
0: Existence1[VT merit in progress]
6: Exactly1[VT merit in progress]
8: End[VT merit in progress]
14: Init[VT Merit to be assigned]
15: Choice[VT merit in progress, VT Merit to be assigned]
16: Choice[VT Merit to be assigned, VT merit in progress]
21: Chain Response[VT Merit to be assigned, VT merit in progress]
23: Alternate Precedence[VT Merit to be assigned, VT merit in progress]
24: Chain Precedence[VT Merit to be assigned, VT merit in progress]
27: Not Chain Response[VT merit in progress, VT Merit to be assigned]
28: Not Chain Precedence[VT merit in progress, VT Merit to be assigned]
Each line represents a rule that holds for the log traces analyzed by the algorithm. For example, constraints 0 and 6 (Existence1 and Exactly1) together state that the activity “VT merit in progress” occurs exactly once in each trace. Once suitably revised, this model can be used for Conformance Checking, which determines whether process instances violate the specified constraints. The technique was applied to a set of traces provided by the client that had not been used to create the model. Among the analyzed instances, the following two violated constraints, as shown in the table below:
Trace: 0 | Violated constraints: 15, 16
Trace: 1 | Violated constraints: 23, 24
By checking the violated constraints in the traces, we can see that:
- In trace 0, constraints 15 and 16 are violated. A Choice constraint holds as long as at least one of its two activities occurs, so the violation means that neither “VT merit in progress” nor “VT Merit to be assigned” appears in the trace.
- In trace 1, constraints 23 and 24 are violated, indicating that an occurrence of “VT merit in progress” is not preceded by the activity “VT Merit to be assigned.”
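The kind of check behind this report can be sketched as follows. The traces are invented for illustration (the client's traces are not reproduced in this article), the constraint semantics follow the standard Declare definitions, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Trace = List[str]


def choice(trace: Trace, a: str, b: str) -> bool:
    """Choice(A, B): at least one of A and B occurs in the trace."""
    return a in trace or b in trace


def chain_precedence(trace: Trace, a: str, b: str) -> bool:
    """Chain Precedence(A, B): every B is immediately preceded by A."""
    return all(i > 0 and trace[i - 1] == a
               for i, activity in enumerate(trace) if activity == b)


def alternate_precedence(trace: Trace, a: str, b: str) -> bool:
    """Alternate Precedence(A, B): every B is preceded by A with no other B in between."""
    a_available = False
    for activity in trace:
        if activity == a:
            a_available = True
        elif activity == b:
            if not a_available:
                return False
            a_available = False  # a new A is needed before the next B
    return True


IN_PROGRESS = "VT merit in progress"
TO_ASSIGN = "VT Merit to be assigned"

# Subset of the model above, keyed by constraint id
MODEL: Dict[int, Tuple[Callable[..., bool], Tuple[str, str]]] = {
    15: (choice, (IN_PROGRESS, TO_ASSIGN)),
    16: (choice, (TO_ASSIGN, IN_PROGRESS)),
    23: (alternate_precedence, (TO_ASSIGN, IN_PROGRESS)),
    24: (chain_precedence, (TO_ASSIGN, IN_PROGRESS)),
}


def check_conformance(log: List[Trace], model) -> Dict[int, List[int]]:
    """Map each non-conforming trace index to the ids of the constraints it violates."""
    report = {}
    for index, trace in enumerate(log):
        violated = [cid for cid, (check, args) in sorted(model.items())
                    if not check(trace, *args)]
        if violated:
            report[index] = violated
    return report


# Hypothetical traces, chosen to reproduce the same violation pattern
LOG = [
    ["VT Fast Track to be assigned", "VT Fast Track in progress"],
    [IN_PROGRESS, "VT merit completed"],
    [TO_ASSIGN, IN_PROGRESS, "VT merit completed"],
]
print(check_conformance(LOG, MODEL))  # {0: [15, 16], 1: [23, 24]}
```

The third trace satisfies all four constraints and is therefore absent from the report.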
In the Declare model shown earlier, the activity “VT Merit to be assigned” is marked as Init by constraint 14 of the Technical Verification model. To check which activities are performed only when it is present in the log, the Query Checking technique can be used by querying the prototype with the Precedence template, with “VT Merit to be assigned” as the first parameter and the variable X as the second. The results are:
- X: VT merit completed – support: 1
- X: VT merit in progress – support: 1
- X: VT Merit to be assigned – support: 1
This indicates that each of the three activities listed above occurs only after “VT Merit to be assigned” has been executed.
To check for alternative paths to the one starting with “VT Merit to be assigned,” an “Exclusive Choice” query can be used, with “VT Merit to be assigned” as the first parameter and the variable X as the second. The results are:
- X: VT merit completed – support: 0.7272727272727273
- X: VT Fast Track to be assigned – support: 0.7727272727272727
- X: VT Fast Track in progress – support: 0.7727272727272727
- X: VT fast Track completed – support: 0.7272727272727273
These results reveal an alternative branch of the Technical Verification phase for the instances, called “Fast Track.”
Conclusions
The techniques and algorithms described here enable a new approach to Process Mining, offering a more flexible analysis. In particular, it is possible to describe the properties of a process using constraints that are either predefined or automatically discovered. These constraints can be used to identify anomalies in process execution through Conformance Checking analysis. Furthermore, the characteristics of the process can be explored by posing queries using Query Checking techniques.
Author: Luigi Granata