Published in TDAN.com July 2001
Business Rule Mining is the process of extracting business rules from program code and restating them in business terms. Why would someone want to do that? Business Rule Mining is appropriate in a number of situations:
- Application documentation and code maintenance – Many legacy applications have sub-optimal documentation, if they have any at all. Often, because of the age of the system, the SMEs (subject matter experts) who were familiar with the application are no longer available. Business rule mining can document these significant rules and provide an understanding of the application. This is of particular interest when maintenance needs to be applied. Depending on the characteristics of the system, mining 20-35% of the attributes provides a good understanding of the application.
- Accelerating and improving the quality of system re-engineering efforts – The same techniques can be used to focus quickly on the most “rule rich” portions of a legacy application and to gain a thorough and unambiguous understanding of them. Such an approach not only speeds up the re-engineering process, but also provides a much higher level of confidence that key business policies embedded in the existing system (and no longer understood) are excavated and forward engineered correctly.
- Enhancing a Data Quality effort – Quite often, the values actually found in attributes during data quality analysis do not match the SMEs’ knowledge of a business rule. In this situation, mining the rule from the code either initiates program maintenance to state the rule correctly in the code, or documents the additional processing that occurs and informs the SMEs of it.
- Fast Track Rules Engine Deployment from COBOL – Business rules engines allow rules to be implemented quickly while giving SMEs better control over how and when rules are implemented. Before these rules can be implemented in a rules engine, the current rule implementation must be documented and validated by SMEs.
- Package Implementation – Many companies are currently moving from developing proprietary applications to implementing package solutions. These packages likewise require that the organization understand how its rules are currently implemented, in order to determine how well the package meets its goals and where enhancements must be made.
Business Rule Mining is an effective and repeatable technique to facilitate these efforts. It is an inherently tool-assisted process that also requires solid, “hands on” analytical skills. So while there is no ‘silver bullet’, the good news is that supporting tools go a long way, not only toward automating tedious manual tasks but also toward facilitating the analytical process itself, and their ability to perform these tasks is improving every day. Note, however, that today these tools are targeted specifically at COBOL code.
Business Rule Mining can be broken down into four distinct phases: archeology, program inspection, data analysis and rule integration.
- Archeology is the process of inventorying an existing application and determining which attributes should be mined. The main purpose of this phase is to identify the portions of the system that are likely to contain the important business rules. Deliverables from this phase include an application inventory listing all programs, copybooks, JCL, and procs; a CRUD (create, read, update, delete) matrix; application flows; and a list of programs/attributes to mine.
- Program inspection is the process of examining legacy program code to uncover and document the business rules. Deliverables from this phase include mined rules and attribute traces.
- Data Analysis is the process of modeling the data used by the application. The level of data abstraction depends on the rule mining objectives. Where the objective is program documentation, the data model may be physical in nature. When the purpose is to verify and validate the rules against a package, it is more beneficial to elevate the model to a logical level. Deliverables from this phase include a logical and/or physical data model and the rules restated in the terms used in the data model.
- Rule integration determines whether mined rules are the same as or inconsistent with one another, and facilitates incorporating the individual mined rules into a single integrated rule. Deliverables from this phase are the identification of rule redundancies and inconsistencies, and the integration of any rule implemented in multiple places.
There are several tools in the marketplace today that enable the rule mining process in the areas of application inventory, narrowing the focus on where to mine, isolating code slices (see
explanation of “code slices” below), attribute traces and identification of rules to integrate. They are:
- Mosaic Studio from SEEC – http://www.seec.com
- HotRod from Netron – www.netron.com
- RescueWare from Relativity – www.relativity.com
- Cosmos/ES from Emendo – http://www.emendo.com
- MineIt from Intercomp – http://www.legacy2web.com
For purposes of this article, only the Seec Mosaic Studio and Netron HotRod have been reviewed and will be considered.
Application Inventory – In the Archeology Phase, both Mosaic Studio and HotRod assist in the application inventory process by importing the source components (COBOL source, JCL, procs, copybooks) into the product. The components are then catalogued and a number of reports produced. A comprehensive inventory of an application can be completed within a relatively short timeframe, usually a few days, whereas producing the same inventory manually is time consuming and error prone.
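The cross-referencing behind such an inventory can be sketched in a few lines of Python. This is a minimal illustration, not how Mosaic Studio or HotRod work internally: it assumes the source members have already been read into memory, that COBOL programs pull in copybooks with `COPY name` statements, and that JCL steps run programs with `EXEC PGM=name`. The member names (`CUSTUPD`, `NIGHTLY`, and so on) are hypothetical, and the simplified patterns ignore comment lines, procs, dynamic CALLs, and INCLUDE statements.

```python
import re
from collections import defaultdict

# Hypothetical in-memory source library: member name -> (member type, source text).
LIBRARY = {
    "CUSTUPD": ("COBOL", "       COPY CUSTREC.\n       COPY ERRCODES.\n"),
    "CUSTRPT": ("COBOL", "       COPY CUSTREC.\n"),
    "NIGHTLY": ("JCL", "//STEP1 EXEC PGM=CUSTUPD\n//STEP2 EXEC PGM=CUSTRPT\n"),
}

# Simplified patterns: COPY statements in COBOL, EXEC PGM= steps in JCL.
COPY_RE = re.compile(r"\bCOPY\s+([A-Z0-9$#@-]+)", re.IGNORECASE)
EXEC_RE = re.compile(r"\bEXEC\s+PGM=([A-Z0-9$#@]+)", re.IGNORECASE)


def build_inventory(library):
    """Return (member counts by type, program -> copybooks, program -> jobs)."""
    counts = defaultdict(int)
    copybooks_used = defaultdict(set)
    jobs_running = defaultdict(set)
    for name, (kind, text) in library.items():
        counts[kind] += 1
        if kind == "COBOL":
            copybooks_used[name].update(m.upper() for m in COPY_RE.findall(text))
        elif kind == "JCL":
            for program in EXEC_RE.findall(text):
                jobs_running[program.upper()].add(name)
    return dict(counts), dict(copybooks_used), dict(jobs_running)


counts, copies, jobs = build_inventory(LIBRARY)
print(counts)                     # {'COBOL': 2, 'JCL': 1}
print(sorted(copies["CUSTUPD"]))  # ['CUSTREC', 'ERRCODES']
print(sorted(jobs["CUSTRPT"]))    # ['NIGHTLY']
```

The resulting maps answer typical inventory questions, such as which copybooks a program depends on or which jobs run a given program.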
Narrowing the focus on where to mine – One of the most important tasks in mining is figuring out where to do it. Where will the gold be found? A good tool will point the user
in the appropriate direction.
Mosaic Studio assists by providing a CRUD matrix. Programs that insert or update persistent files or tables are the primary candidates for rule mining. Seec also provides metrics that rate programs based on complexity. The intersection of these two indicators helps focus the effort and assists in gaining a rapid understanding of the overall system flow.
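How these two indicators combine can be illustrated with a minimal Python sketch. The access records and complexity scores below are hypothetical stand-ins for what a tool would extract from the code; the sketch builds the CRUD matrix and then lists the programs that create or update persistent data, most complex first.

```python
from collections import defaultdict

# Hypothetical access records: (program, file/table, operation), where the
# operation is "C" (create/insert), "R" (read), "U" (update), or "D" (delete).
ACCESSES = [
    ("CUSTUPD", "CUSTMAST", "R"),
    ("CUSTUPD", "CUSTMAST", "U"),
    ("CUSTRPT", "CUSTMAST", "R"),
    ("ORDENTRY", "ORDERS", "C"),
    ("ORDENTRY", "CUSTMAST", "R"),
]

# Hypothetical complexity scores, e.g. from a cyclomatic-complexity-style metric.
COMPLEXITY = {"CUSTUPD": 42, "CUSTRPT": 7, "ORDENTRY": 18}


def crud_matrix(accesses):
    """Map program -> file/table -> set of CRUD letters."""
    matrix = defaultdict(lambda: defaultdict(set))
    for program, target, op in accesses:
        matrix[program][target].add(op)
    return matrix


def mining_candidates(matrix, complexity):
    """Programs that insert or update persistent data, most complex first."""
    writers = [
        program
        for program, targets in matrix.items()
        if any(ops & {"C", "U"} for ops in targets.values())
    ]
    return sorted(writers, key=lambda p: complexity.get(p, 0), reverse=True)


matrix = crud_matrix(ACCESSES)
print(mining_candidates(matrix, COMPLEXITY))  # ['CUSTUPD', 'ORDENTRY']
```

The read-only report program drops out, and the complexity ordering suggests where to start digging.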
HotRod has several proprietary algorithms that assist in this process. The first, called ‘clone detection’, analyzes all the code and detects programs that may be duplicates of one another, which reduces the set of programs to analyze. The second algorithm (based on the algorithm used by the Hubble Telescope to analyze collections of stars) uses a pattern-based search to identify (fingerprint) programs or routines that are likely to contain business rules.
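HotRod’s algorithms are proprietary, so the following sketch does not reproduce them. It instead illustrates a common and much simpler clone-detection technique: normalize each program’s token stream (keeping keywords, abstracting data names and numeric literals) and group programs whose normalized streams hash to the same value. The keyword subset and sample programs are hypothetical, and comment lines, the COBOL sequence area, and near-miss clones are not handled.

```python
import hashlib
import re
from collections import defaultdict

# Illustrative subset of COBOL reserved words that are kept verbatim;
# every other word is treated as a data name and abstracted to "ID".
KEYWORDS = {"MOVE", "TO", "IF", "ELSE", "END-IF", "COMPUTE", "ADD",
            "PERFORM", "GREATER", "LESS", "THAN", "EQUAL", "DISPLAY"}
# A token is a word (letters, digits, hyphens) or a single punctuation mark.
TOKEN_RE = re.compile(r"[A-Z0-9][A-Z0-9-]*|[^\sA-Z0-9]")


def fingerprint(source):
    """Hash the token stream with data names and numeric literals abstracted."""
    tokens = []
    for token in TOKEN_RE.findall(source.upper()):
        if token in KEYWORDS:
            tokens.append(token)
        elif token[0].isdigit():
            tokens.append("NUM")
        elif token[0].isalpha():
            tokens.append("ID")
        else:
            tokens.append(token)
    return hashlib.sha256(" ".join(tokens).encode()).hexdigest()


def find_clones(programs):
    """Group program names whose fingerprints are identical."""
    groups = defaultdict(list)
    for name, source in programs.items():
        groups[fingerprint(source)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical program bodies: the first two differ only in data names.
PROGRAMS = {
    "DISCA": "IF ORDER-AMT GREATER THAN 1000 MOVE 5 TO DISC-PCT.",
    "DISCB": "IF ORD-TOTAL GREATER THAN 1000 MOVE 5 TO WS-DISC.",
    "RPT01": "DISPLAY CUST-NAME.",
}
print(find_clones(PROGRAMS))  # [['DISCA', 'DISCB']]
```

Because literals are abstracted too, grouped programs only "may be the same"; an analyst still confirms each group before dropping duplicates from the mining scope.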
Isolating code slices – A code slice is the set of program statements, in source code order, that is executed before the selected statement can be executed. In effect, slicing throws out all the statements that will not be processed before the selected statement, which is the place in the program where the business rule is finally stated. All the code leading up to that statement, and only that code, must be considered to ensure that the context and the actual business rule are correctly understood. Both Mosaic Studio and HotRod produce a code slice containing all the logic necessary to extract a business rule. Once the slice is created, the analyst extracts the rule by:
- Removing all the noise statements, such as processing-specific logic
- Removing the intermediate variables
- Restating the rule in business (or logical) terms
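The first two of these steps can be illustrated with a textbook backward slice. This is a simplified sketch, not the algorithm used by either tool: it assumes straight-line code (no IF, PERFORM, or GO TO, so control dependences are ignored), represents each statement with a hypothetical `Stmt` type holding the assigned data name and a right-hand-side expression, and follows only data dependences. Statements that cannot affect the selected statement (the noise) drop out, and the second function substitutes intermediate definitions into the final statement. Restating the result in business terms remains the analyst’s job.

```python
import re
from dataclasses import dataclass

# Data names: a letter followed by letters, digits, or hyphens (COBOL style;
# the minus operator must be surrounded by spaces, as COBOL requires).
NAME_RE = re.compile(r"[A-Z][A-Z0-9-]*")


@dataclass
class Stmt:
    """Hypothetical statement form: `target` is assigned the value of `expr`."""
    target: str
    expr: str

    def uses(self):
        return set(NAME_RE.findall(self.expr))


def backward_slice(program, criterion):
    """Indices of statements whose values can reach statement `criterion`.

    Straight-line code only: walking backwards, a statement is kept when it
    assigns a name that is still relevant; that name is then replaced in the
    relevant set by the names its own expression uses.
    """
    relevant = program[criterion].uses()
    kept = [criterion]
    for i in range(criterion - 1, -1, -1):
        stmt = program[i]
        if stmt.target in relevant:
            kept.append(i)
            relevant.discard(stmt.target)
            relevant |= stmt.uses()
    return sorted(kept)


def inline_intermediates(program, indices):
    """Rewrite the last sliced statement in terms of the slice's inputs."""
    definitions = {}
    for i in indices:
        stmt = program[i]
        # One-pass substitution, so each name refers to its latest prior value.
        definitions[stmt.target] = NAME_RE.sub(
            lambda m: f"({definitions[m.group()]})" if m.group() in definitions else m.group(),
            stmt.expr,
        )
    final = program[indices[-1]].target
    return f"{final} = {definitions[final]}"


# Hypothetical program fragment; statement 4 is where the rule is stated.
PROGRAM = [
    Stmt("WS-COUNT", "WS-COUNT + 1"),       # record counter (noise)
    Stmt("WS-BASE", "ORDER-AMT"),
    Stmt("WS-LINE", "WS-LINE + 1"),         # report line counter (noise)
    Stmt("WS-RATE", "DISC-PCT / 100"),
    Stmt("DISCOUNT", "WS-BASE * WS-RATE"),  # selected statement
]
indices = backward_slice(PROGRAM, 4)
print(indices)                                 # [1, 3, 4]
print(inline_intermediates(PROGRAM, indices))  # DISCOUNT = (ORDER-AMT) * (DISC-PCT / 100)
```

Here the two counters are discarded as noise, and the remaining expression could be restated as "the discount is the discount percentage of the order amount".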
While similar in overall objective, Mosaic Studio and HotRod differ slightly in their approach to code slicing. Mosaic Studio provides these features as an inherent part of the tool. HotRod uses its pattern-matching capabilities to let you assemble the pieces of the system that appear to provide the same functionality or incorporate the same business rules into a “Reusable Object Document” or ROD (hence the name “HotRod”). Detailed analysis of this content can then be performed using Merant’s Revolve product, which HotRod sits on top of and integrates with. Netron ships this integrated set of tools as a single package; Merant also markets this product directly under another name.
Attribute traces – During the Program Inspection phase, an attribute trace serves two purposes. The first is to trace the attribute’s path through an individual program to find the various names by which the attribute is known, so that intermediate variables can be removed from a mined rule. The second is to follow the attribute’s flow across programs to determine whether the complete rule has been uncovered.
Seec’s Mosaic Studio facilitates this in several ways. Within a program, an attribute can be traced to all statements that modify it, and this can be repeated iteratively until all modifiers have been identified. The synonym analysis process detects synonyms of a selected attribute and then finds all program references to both the initial attribute and its synonyms across the system. Together with the application flows and job flows, this provides an analysis path for determining the attribute flow within the system. Again, Netron relies on the underlying functionality of Merant’s Revolve to accomplish this task.
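Synonym analysis can be approximated by treating each `MOVE source TO target` statement as a link between two data names and collecting everything reachable from the selected attribute. This minimal sketch is not either tool’s implementation: it assumes single-target MOVE statements, treats identically named fields in different programs (for example, fields from a shared copybook) as the same attribute, and ignores MOVE CORRESPONDING, REDEFINES, group-level moves, and CALL USING parameters. The program and field names are hypothetical.

```python
import re
from collections import defaultdict, deque

# Single-target MOVE statements between data names (numeric literals are
# skipped because the source must start with a letter).
MOVE_RE = re.compile(r"\bMOVE\s+([A-Z][A-Z0-9-]*)\s+TO\s+([A-Z][A-Z0-9-]*)")
# Figurative constants are not data names, so they never form synonyms.
FIGURATIVE = {"SPACE", "SPACES", "ZERO", "ZEROS", "ZEROES",
              "HIGH-VALUE", "HIGH-VALUES", "LOW-VALUE", "LOW-VALUES"}


def build_move_graph(sources):
    """Undirected graph: data name -> data names it is MOVEd to or from."""
    graph = defaultdict(set)
    for text in sources.values():
        for src, dst in MOVE_RE.findall(text.upper()):
            if src not in FIGURATIVE:
                graph[src].add(dst)
                graph[dst].add(src)
    return graph


def synonyms(graph, attribute):
    """All data names reachable from `attribute` through MOVE links (BFS)."""
    seen = {attribute}
    queue = deque([attribute])
    while queue:
        for neighbour in graph.get(queue.popleft(), ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def programs_referencing(sources, names):
    """Programs whose text mentions any of `names` as a whole data name."""
    alternatives = "|".join(re.escape(n) for n in sorted(names))
    pattern = re.compile(rf"(?<![A-Z0-9-])(?:{alternatives})(?![A-Z0-9-])")
    return {prog for prog, text in sources.items() if pattern.search(text.upper())}


# Hypothetical sources; OUT-BAL is a field shared through a common record.
SOURCES = {
    "CUSTUPD": "MOVE CUST-BAL TO WS-BALANCE.\nMOVE WS-BALANCE TO OUT-BAL.",
    "CUSTRPT": "MOVE SPACES TO RPT-LINE.\nMOVE OUT-BAL TO RPT-BAL.",
    "ORDENTRY": "MOVE ORD-AMT TO WS-AMT.",
}
graph = build_move_graph(SOURCES)
names = synonyms(graph, "CUST-BAL")
print(sorted(names))  # ['CUST-BAL', 'OUT-BAL', 'RPT-BAL', 'WS-BALANCE']
print(sorted(programs_referencing(SOURCES, names)))  # ['CUSTRPT', 'CUSTUPD']
```

Treating links as undirected suits synonym discovery; a trace of the rule itself would keep the direction of each MOVE.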
Identification of rules to integrate – During the Rule Integration Phase, rules that are the same are identified and integrated into a single rule, even where their implementations differ. Rule inconsistencies are also identified.
While this is to some degree an analytical step performed after the rules have been mined, the tool features used to support Archeology and Program Inspection can accelerate the discovery of these discrepancies and redundancies early on. Seec’s Mosaic Studio provides this functionality by searching for all attributes that have the same name, as well as through the synonym analysis described above. Netron’s HotRod provides a common code detection algorithm that uses the fingerprinting described above to identify pieces of code that are the same or similar.
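Once rules have been mined and restated, a first pass at spotting redundancies and inconsistencies can be automated. In this sketch, each mined rule is a hypothetical `MinedRule` record holding an attribute, a restated condition, and a result; rules for the same attribute with the same condition are reported as redundant when their results agree and as inconsistent when they differ. Matching is purely textual after whitespace and case normalization, so logically equivalent conditions written differently still need the analyst.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MinedRule:
    """Hypothetical record for one mined, restated rule."""
    rule_id: str    # identifier assigned by the analyst
    program: str    # where the rule was mined
    attribute: str  # attribute the rule determines
    condition: str  # restated condition, e.g. "ORDER-AMT > 1000"
    result: str     # restated outcome, e.g. "DISC-PCT = 5"


def normalize(text):
    """Upper-case and collapse whitespace so trivial differences vanish."""
    return " ".join(text.upper().split())


def integrate(rules):
    """Return (redundant groups, inconsistent groups) of rule ids."""
    by_key = defaultdict(list)
    for rule in rules:
        by_key[(normalize(rule.attribute), normalize(rule.condition))].append(rule)
    redundant, inconsistent = [], []
    for group in by_key.values():
        if len(group) < 2:
            continue
        ids = sorted(rule.rule_id for rule in group)
        if len({normalize(rule.result) for rule in group}) == 1:
            redundant.append(ids)      # same rule implemented in several places
        else:
            inconsistent.append(ids)   # same condition, different outcomes
    return redundant, inconsistent


# Hypothetical mined rules from three programs.
RULES = [
    MinedRule("R1", "ORDENTRY", "DISC-PCT", "ORDER-AMT > 1000", "DISC-PCT = 5"),
    MinedRule("R2", "ORDUPD", "disc-pct", "order-amt  >  1000", "disc-pct = 5"),
    MinedRule("R3", "BILLING", "DISC-PCT", "CUST-TYPE = 'GOLD'", "DISC-PCT = 10"),
    MinedRule("R4", "ORDENTRY", "DISC-PCT", "CUST-TYPE = 'GOLD'", "DISC-PCT = 12"),
]
print(integrate(RULES))  # ([['R1', 'R2']], [['R3', 'R4']])
```

Redundant groups become candidates for a single integrated rule, while inconsistent groups go back to the SMEs for a decision.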
In summary, Business Rule Mining is a combination of tools and methods that can greatly accelerate the rule mining effort and improve results in several areas. In any of these cases, though, the objectives must be clearly stated up front for the process to be most effective. From there, tailor the methodology, tool selection, and training to fit the situation. And don’t forget to include some planning for management of the mined rules. After all that digging, the last thing you want to do is rebury them! We’ll talk more about that in the next column.