Guide to PM4Py: Python Framework for Process Mining Algorithms – Analytics India Magazine

Processes are all around us. Any sequence of tasks that together achieve an objective can be called a process. Thanks to the digital revolution, copious amounts of data related to various processes are being generated and accumulated. In the field of data science, analysing operational processes and drawing insights from them is of particular importance. Modelling a process allows us to perform conformance checks and even gives us the potential to improve the process. This kind of extraction of insights from event data is called process mining. In this article, let's dive deeper into process mining techniques with Python.

Process mining is the amalgamation of computational intelligence, data mining and process management. It refers to the data-oriented analysis techniques used to draw insights into organizational processes. The following is a general framework of process mining.

Real-world events and business processes control software systems, which in turn generate event logs. Each log entry corresponds to an activity, together with additional information such as the timestamp, type and context of the event. The availability of this kind of data is essential for the application of process mining. A model is built on top of this data, which can present the processes taking place in an actionable way.
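
To make the idea of an event log concrete, here is a minimal sketch in plain Python (no pm4py involved). The records, case ids and activity names are made up for illustration. It groups flat event records into one ordered trace per case, which is the shape most process mining algorithms expect.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical flat event records: (case id, activity name, ISO timestamp).
events = [
    ("case-2", "Receive request", "2020-03-01T10:05:00"),
    ("case-1", "Receive request", "2020-03-01T09:00:00"),
    ("case-1", "Approve transfer", "2020-03-01T09:30:00"),
    ("case-2", "Reject transfer", "2020-03-01T10:20:00"),
    ("case-1", "Notify customer", "2020-03-01T09:45:00"),
]

def build_traces(records):
    """Group events by case id and order each group by timestamp."""
    grouped = defaultdict(list)
    for case_id, activity, timestamp in records:
        grouped[case_id].append((datetime.fromisoformat(timestamp), activity))
    return {
        case_id: [activity for _, activity in sorted(items)]
        for case_id, items in grouped.items()
    }

traces = build_traces(events)
for case_id in sorted(traces):
    print(case_id, "->", traces[case_id])
```

The three fields used here (case id, activity name and timestamp) are the minimum needed to reconstruct the order of activities within each case.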

Model Discovery

Process mining consists of three main components: Model Discovery, Conformance Checking and Model Enhancement. Discovery is the process of automatically generating a model from event logs that can explain the logs themselves without any prior knowledge. There are several algorithms that can be used for this discovery process.

An example process model generated by an automated platform
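
One simple building block shared by many discovery algorithms is the directly-follows relation: how often one activity is immediately followed by another within a case. The sketch below computes it in plain Python. The traces are hypothetical, and this is an illustration of the idea rather than pm4py's implementation.

```python
from collections import Counter

# Hypothetical traces: each is the ordered list of activities of one case.
traces = [
    ["register", "check", "approve", "pay"],
    ["register", "check", "reject"],
    ["register", "check", "approve", "pay"],
]

def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

dfg = directly_follows(traces)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

Drawing each pair as an edge weighted by its count gives a first, rough picture of the process without any prior knowledge of it.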

Conformance Checking

The second component of process mining is conformance checking. In this step, we compare the event logs with the process model of the same process. This reveals any non-conformances. Example: transactions over 1 lakh rupees require the PAN card of the user. This constraint can be expressed in the process model. We can then examine all the event logs to check whether this rule is followed.
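
The PAN example can be sketched as a rule check over traces. Everything in this snippet is an assumption made for illustration: the case data, the activity names "Verify PAN" and "Transfer", and the reading of the rule as "the PAN check must happen before the transfer".

```python
# Hypothetical cases: an amount in rupees and an ordered list of activities.
cases = {
    "T1": {"amount": 50_000, "activities": ["Initiate", "Transfer"]},
    "T2": {"amount": 250_000, "activities": ["Initiate", "Verify PAN", "Transfer"]},
    "T3": {"amount": 180_000, "activities": ["Initiate", "Transfer"]},
    "T4": {"amount": 120_000, "activities": ["Initiate", "Transfer", "Verify PAN"]},
}

PAN_LIMIT = 100_000  # 1 lakh rupees

def violates_pan_rule(case):
    """True if a high-value case lacks a PAN check before the transfer."""
    if case["amount"] <= PAN_LIMIT:
        return False
    acts = case["activities"]
    if "Verify PAN" not in acts:
        return True
    # Assumed reading of the rule: the check must come before the transfer.
    return "Transfer" in acts and acts.index("Verify PAN") > acts.index("Transfer")

violations = [case_id for case_id, case in cases.items() if violates_pan_rule(case)]
print(violations)
```

Here T3 skips the check entirely and T4 performs it too late, so both are reported as non-conforming.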

Model Enhancement

In the third step, we use the discovered process model and the results of the conformance checks to identify bottlenecks, circular loops and undesired deviations in the processes. Equipped with this knowledge, a new, enhanced process is implemented and a target process model is built. This new process model is again enhanced using the same steps. Repeating these steps over and over results in the continuous improvement of organizational processes.


PM4Py is an open-source Python library built by the Fraunhofer Institute for Applied Information Technology to support process mining. The following is the command for installation.

!pip install -U pm4py

Data Loading

This library supports tabular data input such as CSV with the help of pandas. However, the recommended data format for event logs is XES (eXtensible Event Stream). This is an XML-based, hierarchical, tag-based log storage format prescribed by IEEE as a standard.
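
To show what "hierarchical and tag-based" means in practice, the sketch below parses a tiny hand-written XES-style document with Python's standard library. The sample content is made up, and it omits the XML namespace and extension declarations that real XES files usually carry, so treat it as an illustration of the log/trace/event nesting rather than a general XES reader.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written XES-style document (illustrative, not the bank dataset).
XES_SAMPLE = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="case-1"/>
    <event>
      <string key="concept:name" value="Receive request"/>
      <date key="time:timestamp" value="2020-03-01T09:00:00.000+00:00"/>
    </event>
    <event>
      <string key="concept:name" value="Approve transfer"/>
      <date key="time:timestamp" value="2020-03-01T09:30:00.000+00:00"/>
    </event>
  </trace>
</log>"""

def read_traces(xml_text):
    """Return {case id: [(activity, timestamp), ...]} from XES-style XML."""
    root = ET.fromstring(xml_text)
    result = {}
    for trace in root.findall("trace"):
        # Direct children of <trace> other than <event> are trace attributes.
        attrs = {c.get("key"): c.get("value") for c in trace if c.tag != "event"}
        events = []
        for event in trace.findall("event"):
            fields = {c.get("key"): c.get("value") for c in event}
            events.append((fields["concept:name"], fields["time:timestamp"]))
        result[attrs["concept:name"]] = events
    return result

parsed = read_traces(XES_SAMPLE)
print(parsed)
```

The keys `concept:name` and `time:timestamp` are the standard XES attribute names for the activity (or case) name and the event time.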

Let's load some bank transaction logs stored in XES format. The data is downloaded from this website.

 from pm4py.objects.log.importer.xes import importer as xes_importer
 log = xes_importer.apply('/content/banktransfer(2000-all-noise).xes')

If we prefer to use pandas to analyse the data, we can convert the imported log as follows.

 import pandas as pd
 from pm4py.objects.conversion.log import converter as log_converter
 df = log_converter.apply(log, variant=log_converter.Variants.TO_DATA_FRAME)

We can see that the three most important attributes (case id, timestamp and name of the event) are present. Let us reduce the number of rows by limiting the number of traces. This can be done with pm4py's own suite of filtering functions.

 from pm4py.algo.filtering.log.timestamp import timestamp_filter
 filtered_log = timestamp_filter.filter_traces_contained(log, "2013-01-01 00:00:00", "2020-01-01 23:59:59") 
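
As far as the function name suggests, this filter keeps only the traces that lie entirely within the given interval. The plain-Python sketch below illustrates that assumed behaviour on made-up data. It is not pm4py's implementation, and the function name `traces_contained` is hypothetical.

```python
from datetime import datetime

# Hypothetical traces: case id -> list of (activity, timestamp) pairs.
traces = {
    "case-1": [("A", datetime(2012, 12, 30)), ("B", datetime(2013, 1, 5))],
    "case-2": [("A", datetime(2014, 6, 1)), ("B", datetime(2014, 6, 3))],
    "case-3": [("A", datetime(2019, 12, 31)), ("B", datetime(2020, 2, 1))],
}

def traces_contained(traces, start, end):
    """Keep only traces whose first and last events both fall in [start, end]."""
    kept = {}
    for case_id, events in traces.items():
        stamps = [ts for _, ts in events]
        if stamps and start <= min(stamps) and max(stamps) <= end:
            kept[case_id] = events
    return kept

window = (datetime(2013, 1, 1), datetime(2020, 1, 1, 23, 59, 59))
kept = traces_contained(traces, *window)
print(list(kept))
```

Traces that start before the window or end after it are dropped as a whole, so no case is left half-recorded in the filtered log.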

Model Discovery

PM4Py supports three formalisms for representing process models: Petri nets (place/transition nets), directly-follows graphs and process trees. We will confine ourselves to using Petri nets in this article. A description of Petri nets is published in the pm4py documentation.
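
For readers new to Petri nets, the following is a minimal sketch of the firing rule in plain Python. The place and transition names are made up, and the dictionary representation is an assumption chosen for brevity, not how pm4py stores its nets.

```python
# A tiny Petri net: places hold tokens, transitions consume and produce them.
# All names below are made up for illustration.
transitions = {
    "register": {"inputs": ["start"], "outputs": ["p1"]},
    "approve": {"inputs": ["p1"], "outputs": ["end"]},
}

def is_enabled(marking, name):
    """A transition is enabled when every input place holds a token."""
    return all(marking.get(p, 0) > 0 for p in transitions[name]["inputs"])

def fire(marking, name):
    """Return the new marking after firing an enabled transition."""
    if not is_enabled(marking, name):
        raise ValueError(f"transition {name!r} is not enabled")
    new_marking = dict(marking)
    for p in transitions[name]["inputs"]:
        new_marking[p] -= 1
    for p in transitions[name]["outputs"]:
        new_marking[p] = new_marking.get(p, 0) + 1
    return new_marking

initial_marking = {"start": 1}
final_marking = {"end": 1}

marking = initial_marking
for step in ["register", "approve"]:
    marking = fire(marking, step)

# Drop empty places before comparing with the final marking.
reached = {p: n for p, n in marking.items() if n > 0}
print(reached == final_marking)
```

A marking is simply the number of tokens in each place. The initial and final markings describe where a case starts and where it is expected to end.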

Petri nets can be obtained using several different mining algorithms. We will use one such algorithm, called the Alpha Miner.

 from pm4py.algo.discovery.alpha import algorithm as alpha_miner
 net, initial_marking, final_marking = alpha_miner.apply(filtered_log)

Visualizing a Petri Net

 from pm4py.visualization.petrinet import visualizer as pn_visualizer
 gviz = pn_visualizer.apply(net, initial_marking, final_marking)
 # Render the Petri net
 pn_visualizer.view(gviz)

Conformance Checking

The following is example code to perform conformance checking. We generate a model using a part of the log and then validate the entire log.

 from pm4py.algo.discovery.inductive import algorithm as inductive_miner
 from pm4py.algo.filtering.log.auto_filter.auto_filter import apply_auto_filter
 from pm4py.algo.conformance.tokenreplay import algorithm as token_based_replay
 from pm4py.algo.conformance.tokenreplay.diagnostics import duration_diagnostics

 # Generate the model using only a part of the log
 filtered_log = apply_auto_filter(log)
 net, initial_marking, final_marking = inductive_miner.apply(filtered_log)

 # Check the entire log for conformance with the model
 tbr_params = token_based_replay.Variants.TOKEN_REPLAY.value.Parameters
 parameters_tbr = {tbr_params.DISABLE_VARIANTS: True, tbr_params.ENABLE_PLTR_FITNESS: True}
 replayed_traces, place_fitness, trans_fitness, unwanted_activities = token_based_replay.apply(
     log, net, initial_marking, final_marking, parameters=parameters_tbr)

 # Display diagnostics for activities that are in the log but not in the model
 act_diagnostics = duration_diagnostics.diagnose_from_notexisting_activities(log, unwanted_activities)
 for act in act_diagnostics:
     print(act, act_diagnostics[act])
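
Token-based replay is commonly summarised by a fitness value computed from four token counts: produced, consumed, missing and remaining. The sketch below implements the usual textbook formula in plain Python. The token counts in the usage lines are made up, and the function name is hypothetical rather than part of pm4py.

```python
def token_replay_fitness(produced, consumed, missing, remaining):
    """Fitness = 0.5 * (1 - missing/consumed) + 0.5 * (1 - remaining/produced)."""
    if produced <= 0 or consumed <= 0:
        raise ValueError("produced and consumed must be positive")
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)

# A trace that replays cleanly: no missing or remaining tokens.
perfect = token_replay_fitness(produced=5, consumed=5, missing=0, remaining=0)

# A trace that needed 2 tokens the model did not provide and left 2 behind.
imperfect = token_replay_fitness(produced=8, consumed=8, missing=2, remaining=2)

print(perfect, imperfect)
```

A fitness of 1 means the trace could be replayed on the model without any missing or leftover tokens. Lower values indicate deviations from the model.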

Pavan Kandru


AI enthusiast with a flair for NLP. I love playing with exotic data.

