Published in TDAN.com October 2000
Remembering – What is Business Rule Mining?
|Rule Mining Phases||Description|
|Archeology||The aptly named discovery and collection of the initial artifacts: the evidence of the organization’s life and culture.|
|Program Inspection||The process of mining the procedural code for business rules.|
|Data Inspection||The process of mining the data structures, as well as some program code components, for business rules.|
|Rule Integration and Validation||Integrating the results of program and data inspection, recasting mined system rules into business terms, verifying correctness, and validating that these are actually the right (desired) rules and that they are complete.|
|Forward Engineering||Implementing the rules in new technology or restructuring existing technology.|
|Rule Management||The task, spanning phases, that records, organizes and makes available the mined rules.|
In this edition of TDAN.com, we want to continue the discussion of Business Rule Mining by examining how it can be applied to improve the Sourcing, Data Quality, Data Analysis and Meta Data
components of a Data Warehousing Project.
So – Where are the Business Rules in Data Warehousing?
In the course of designing and populating a data warehouse, some key questions must be answered about the data being incorporated into the warehouse. More often than not, many of the answers are
not known at the outset of the project, but they must be established if the data warehouse is to succeed. Interestingly, these answers for the most part represent the same contextual information about the data
that business users of the warehouse need in order to fully understand the information provided and to trust its reliability. The questions include:
- What are the valid values for the attributes of the data warehouse?
- What are the valid data sources for the data warehouse?
- When, in the data’s operational life cycle, should it be captured and sent to the data warehouse?
- What are the “cleansing rules” for the source data?
- What are the transformation rules to move the source data to the target database?
- How was the data calculated in the operational database?
Today, a common approach to answering these questions is to:
- Interview the business clients to identify their information requirements
- Review the system documentation
- Interview the application SMEs and attempt to identify the right source(s) and timing with their help
- Perform some data quality analysis on field content (we hope) and
- Specify and code the transformations.
There are potential weaknesses with this approach, primarily to do with accuracy and completeness. The following problems often arise:
- Technical and business subject matter expertise is lacking.
- Specifically, there is limited knowledge of what the data sources should be and what they mean.
- There are multiple possible data sources to choose from. For example, which of the 10 customer files should be the source for the data warehouse?
- There is no documentation on how the source data was calculated or derived.
- Documentation of the valid values is missing or limited, and/or unexplained values are showing up in data quality analysis.
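Unexplained values typically surface through simple field-content profiling. The sketch below is a minimal illustration, assuming a source file has been extracted into Python dictionaries and that some documented valid-value set exists for the field; the `status` field, its codes, and the helper name `profile_field` are hypothetical.

```python
from collections import Counter


def profile_field(records, field, documented_values):
    """Count each value of `field` and separate out values not in the documented set."""
    counts = Counter(rec.get(field) for rec in records)
    unexplained = {value: n for value, n in counts.items() if value not in documented_values}
    return counts, unexplained


# Hypothetical extract of a legacy customer file.
customers = [
    {"cust_id": 1, "status": "A"},
    {"cust_id": 2, "status": "I"},
    {"cust_id": 3, "status": "X"},
    {"cust_id": 4, "status": "A"},
    {"cust_id": 5, "status": " "},
]

counts, unexplained = profile_field(customers, "status", documented_values={"A", "I"})
print(counts.most_common())  # [('A', 2), ('I', 1), ('X', 1), (' ', 1)]
print(unexplained)           # {'X': 1, ' ': 1}
```

Profiling can flag `X` and the blank as unexplained, but it cannot say what they mean. That explanation usually lives in the program code that sets the field.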
Applying a Business Rule Mining approach to system sourcing and data analysis can help address these gaps, accelerate analysis, and improve its completeness and reliability. In fact, there are some
areas where, in the absence of business rule mining – or at least some type of rigorous, tool-assisted examination of the actual source systems’ flow and code for rules – the underlying
rules may not be uncovered at all. The chart below identifies the types of business rules that can be found during your data warehouse development life cycle, and where business rule mining can be applied.
|Rule Type||Business Rule Mining||Data Analysis/Data Quality||Logical Data Modeling|
|Terms (data elements and definitions)|| ||X||X|
|Facts (data relationships)|| ||X||X|
|Constraints (on data element values)||X||X||X|
|Inferred Knowledge (knowing something about one data element’s state from the state(s) of other data elements)||X|| || |
|Action Enabling (states that enable certain system responses)||X|| || |
In fact, Business Rule Mining can also assist data modeling in discovering terms and facts, so it applies to all types of rules.
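To make the distinction between rule types concrete, here is a hypothetical sketch that records mined rules with their type and an executable check. The rule IDs, field names, and banking rules are invented for illustration. Terms and facts, which describe data elements and relationships rather than testable conditions, appear in the enumeration but are not given checks.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class RuleType(Enum):
    TERM = "Term"
    FACT = "Fact"
    CONSTRAINT = "Constraint"
    INFERRED_KNOWLEDGE = "Inferred Knowledge"
    ACTION_ENABLING = "Action Enabling"


@dataclass
class MinedRule:
    rule_id: str
    rule_type: RuleType
    statement: str                  # the rule recast in business terms
    holds: Callable[[dict], bool]   # True if the record satisfies the rule


# Hypothetical rules mined from a hypothetical account-maintenance system.
rules = [
    MinedRule("R1", RuleType.CONSTRAINT,
              "Account type must be CHK or SAV.",
              lambda r: r["acct_type"] in {"CHK", "SAV"}),
    MinedRule("R2", RuleType.INFERRED_KNOWLEDGE,
              "An account with a negative balance is overdrawn.",
              lambda r: r["balance"] >= 0 or r["status"] == "OVERDRAWN"),
    MinedRule("R3", RuleType.ACTION_ENABLING,
              "An overdrawn account triggers an overdraft notice.",
              lambda r: r["status"] != "OVERDRAWN" or r["notice_sent"]),
]

record = {"acct_type": "CHK", "balance": -25.0, "status": "OPEN", "notice_sent": False}
violated = [rule.rule_id for rule in rules if not rule.holds(record)]
print(violated)  # ['R2']
```

The record satisfies the constraint but breaks the inferred-knowledge rule: its balance is negative while its status is still OPEN. An inconsistency like that points back to whatever code derives the status.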
It would be overkill, and not cost-effective, to apply Business Rule Mining to every attribute that will be included in your data warehouse. You WILL want to take advantage of a Business Rule Mining
approach in the following situations:
- There are high-impact metrics that must be accurate.
- There are values in your source data files that no one can explain.
- There is more than one candidate source for a data element in the target data warehouse.
- There is a complex life cycle for a data element and it needs to be validated.
- The full extent of an element’s life cycle is uncertain.
- It is uncertain where and when to capture the element in its life cycle.
- There is minimal availability of subject matter experts.
- There is minimal system documentation available.
- Additional metadata is required and/or desired for the Data warehouse Repository.
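The criteria above amount to a triage over candidate data elements. A minimal sketch, assuming each element's situation has been recorded as a handful of yes/no attributes; the `ElementProfile` fields and element names are hypothetical and simply paraphrase the list.

```python
from dataclasses import dataclass


@dataclass
class ElementProfile:
    name: str
    high_impact_metric: bool = False
    unexplained_values: bool = False
    candidate_sources: int = 1
    complex_life_cycle: bool = False
    life_cycle_uncertain: bool = False    # extent or capture point unclear
    sme_available: bool = True
    documentation_available: bool = True
    extra_metadata_needed: bool = False


def mining_reasons(e: ElementProfile) -> list[str]:
    """Return the criteria that justify mining this element (empty list: skip it)."""
    checks = [
        (e.high_impact_metric, "high-impact metric"),
        (e.unexplained_values, "unexplained values in source data"),
        (e.candidate_sources > 1, "more than one candidate source"),
        (e.complex_life_cycle, "complex life cycle to validate"),
        (e.life_cycle_uncertain, "uncertain life cycle or capture point"),
        (not e.sme_available, "minimal SME availability"),
        (not e.documentation_available, "minimal system documentation"),
        (e.extra_metadata_needed, "additional repository metadata required"),
    ]
    return [reason for flagged, reason in checks if flagged]


elements = [
    ElementProfile("CUSTOMER_NAME"),
    ElementProfile("CREDIT_LIMIT", high_impact_metric=True,
                   candidate_sources=3, sme_available=False),
]
for e in elements:
    print(e.name, mining_reasons(e) or "no mining needed")
```

Keeping the reasons, rather than a bare yes/no, also records why an element was mined, which is useful metadata in its own right.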
The Business Rule Mining Process in Data Warehousing
There are three major steps in Business Rule Mining. These steps were explained in the last issue of TDAN.com. However, that article focused on mining the business rules from a system as a whole.
In applying the business rule mining approach to data warehousing, many of the methods remain the same, but the focus is slightly different. In data warehousing, the perspective is that of a single
data element. The objective is to trace the data element’s life cycle in order to discover the right capture source and time, and to identify all the relevant business rules associated with
its creation, update and contents.
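One way to picture tracing a data element's life cycle is as a backward walk over derivations, from the warehouse element to the fields and programs that feed it. This sketch assumes the derivations have already been recorded as a mapping; the file, field, and program names are hypothetical, and in practice each hop is established by inspecting code rather than given up front.

```python
from collections import deque

# Hypothetical derivations: each "FILE.FIELD" maps to the (program, source field)
# pairs that create or update it.
derivations = {
    "DW_CUST.CREDIT_LIMIT": [("EXTR010", "CUSTMAST.CR-LIMIT")],
    "CUSTMAST.CR-LIMIT": [("UPDT200", "CRAPPL.APPROVED-AMT"),
                          ("ADJ300", "CRADJ.NEW-LIMIT")],
    "CRAPPL.APPROVED-AMT": [("ONL100", "SCREEN.CR-AMT")],
}


def trace_life_cycle(element):
    """Breadth-first walk back from `element`, returning (program, source, target) hops."""
    hops, seen, queue = [], {element}, deque([element])
    while queue:
        target = queue.popleft()
        for program, source in derivations.get(target, []):
            hops.append((program, source, target))
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return hops


hops = trace_life_cycle("DW_CUST.CREDIT_LIMIT")
for program, source, target in hops:
    print(f"{source} -> {target} via {program}")

# Fields with no recorded derivation are candidate entry points
# (or simply places the tracing has not reached yet).
origins = sorted({s for _, s, _ in hops if s not in derivations})
print(origins)  # ['CRADJ.NEW-LIMIT', 'SCREEN.CR-AMT']
```

Two programs update `CUSTMAST.CR-LIMIT`, so when the extract runs relative to `UPDT200` and `ADJ300` matters; that is the capture-timing question from the list of key questions.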
The major business rule mining steps defined earlier (system archeology, data and program inspection, and rule integration and validation) remain much the same, with some variations in emphasis.
Typically, you will start with a candidate file, display screen or report that shows the visible candidate element (the tip of the iceberg). From here, the business rule mining steps
are applied as follows:
You will still want to do a full inventory of the system involved, and develop an overall system flow. This will provide the context in which to understand the data element life cycle as it is
unraveled. Otherwise, you’re always inside the forest looking out, examining one tree at a time. As noted in the first article on business rule mining, automated tools can greatly facilitate
this step. Such tools can rapidly register all source components of a system, validate that you have established a full list of components, provide graphical representations of job
stream flows, and create CRUD (create, read, update, delete) matrices of data files by program.
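A CRUD matrix is easy to picture as a file-by-program grid of operations. The sketch below assumes the (program, file, operation) triples have already been extracted, for example by a tool or by reading each program's I/O statements; the program and file names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (program, file, operation) triples; operations are C, R, U, D.
accesses = [
    ("ONL100", "CRAPPL", "C"),
    ("UPDT200", "CRAPPL", "R"),
    ("UPDT200", "CUSTMAST", "U"),
    ("ADJ300", "CUSTMAST", "U"),
    ("EXTR010", "CUSTMAST", "R"),
    ("PURG900", "CUSTMAST", "D"),
]


def crud_matrix(triples):
    """Build file -> program -> set of operations."""
    matrix = defaultdict(lambda: defaultdict(set))
    for program, file_name, op in triples:
        matrix[file_name][program].add(op)
    return matrix


matrix = crud_matrix(accesses)
for file_name in sorted(matrix):
    cells = ", ".join(
        f"{program}:{''.join(sorted(ops, key='CRUD'.index))}"
        for program, ops in sorted(matrix[file_name].items())
    )
    print(f"{file_name:<10} {cells}")

# Programs that change CUSTMAST are the first places to look for update rules.
writers = sorted(p for p, ops in matrix["CUSTMAST"].items() if ops & {"C", "U", "D"})
print(writers)  # ['ADJ300', 'PURG900', 'UPDT200']
```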
The next step in archeology is a preliminary identification of all the potential synonyms for this particular data element. In the absence of an automated tool, this can be a difficult and
labor-intensive analysis process. Tools such as SEEC Corporation’s Reengineering Workbench can be applied to rapidly give you a list of candidates.
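Without a tool, a first pass at synonyms is often name-based. This is a rough sketch of that idea, not a description of how Reengineering Workbench works: field names are split into tokens, abbreviations are expanded through a hypothetical dictionary, and a field becomes a candidate if its tokens include every token of the target element. All field names are hypothetical.

```python
import re

# Hypothetical abbreviation dictionary, built up during archeology.
ABBREVIATIONS = {"CUST": "CUSTOMER", "CR": "CREDIT", "LMT": "LIMIT",
                 "LIM": "LIMIT", "AMT": "AMOUNT"}


def tokens(field_name):
    """Split a field name on hyphens, underscores, or spaces and expand abbreviations."""
    parts = re.split(r"[-_\s]+", field_name.upper())
    return {ABBREVIATIONS.get(p, p) for p in parts if p}


def synonym_candidates(element, field_names):
    """Return field names whose tokens include every token of `element`."""
    target = tokens(element)
    return [name for name in field_names if target <= tokens(name)]


fields = ["CUST-CR-LMT", "CR-LIMIT", "CREDIT_LIMIT_AMT", "CUST-NAME", "CR-LIM-OVRD"]
print(synonym_candidates("CREDIT LIMIT", fields))
# ['CUST-CR-LMT', 'CR-LIMIT', 'CREDIT_LIMIT_AMT', 'CR-LIM-OVRD']
```

Name matching both over- and under-reports. `CR-LIM-OVRD` (presumably an override) is a false positive to rule out by inspection, and a field that receives the value under an unrelated name is missed entirely; finding those means following the data flow through the code.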
Having identified the synonym list, you can, with the use of such software tools, also identify the list of data files and program modules, within the total system flow, that need to be examined,
narrowing your target.
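Narrowing the target can be sketched as two lookups: the files that hold any synonym, then the programs that access any of those files. Both mappings here are hypothetical stand-ins for what a tool inventory would provide.

```python
# Hypothetical inventory data: files that define each synonym, and files each program accesses.
field_locations = {
    "CUST-CR-LMT": {"CUSTMAST"},
    "CR-LIMIT": {"CRAPPL", "CRHIST"},
}
program_files = {
    "ONL100": {"CRAPPL"},
    "UPDT200": {"CRAPPL", "CUSTMAST"},
    "EXTR010": {"CUSTMAST"},
    "RPT500": {"GLMAST"},
}


def mining_scope(synonyms):
    """Return (files holding any synonym, programs accessing any of those files)."""
    files = set().union(*(field_locations.get(s, set()) for s in synonyms))
    programs = {p for p, accessed in program_files.items() if accessed & files}
    return files, programs


files, programs = mining_scope(["CUST-CR-LMT", "CR-LIMIT"])
print(sorted(files))     # ['CRAPPL', 'CRHIST', 'CUSTMAST']
print(sorted(programs))  # ['EXTR010', 'ONL100', 'UPDT200']

# Files holding a synonym that no inventoried program accesses deserve a question.
untouched = sorted(f for f in files
                   if not any(f in accessed for accessed in program_files.values()))
print(untouched)  # ['CRHIST']
```

`CRHIST` holds a synonym but no inventoried program touches it. That may point to an incomplete inventory or to a file maintained outside the system under study; either way it is worth asking about.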
Having identified the domain of target system modules for business rule mining, and using the system flow to prioritize the sequence of mining activities, the next step is to mine the rules from
these modules, then integrate and validate the resultant rule set, as covered in the earlier TDAN article.
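Using the system flow to sequence the work can be sketched as a topological sort of the in-scope programs over the job stream dependencies. This assumes the flow is acyclic and that mining proceeds in the direction the data flows (creation before update before extract). Python's standard `graphlib` module (3.9+) does the ordering; the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical job stream: each program maps to the programs whose output it reads.
job_flow = {
    "UPDT200": {"ONL100"},
    "ADJ300": set(),
    "EXTR010": {"UPDT200", "ADJ300"},
    "RPT500": {"EXTR010"},
}
in_scope = {"ONL100", "UPDT200", "ADJ300", "EXTR010"}  # from the narrowing step

# Order all jobs by dependency, then keep only the ones to be mined.
mining_order = [job for job in TopologicalSorter(job_flow).static_order()
                if job in in_scope]
print(mining_order)
```

Reversing the list gives a backward order, starting from the extract that feeds the warehouse, if you prefer to work from the visible element back to its origins. `TopologicalSorter` raises `graphlib.CycleError` if the flow contains a cycle.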
In every case, rule management must be established to capture and manage this rule metadata for your data warehouse repository. Rule management can be expanded once rule stewards are assigned,
and the rule repository can also be used to assist in maintaining the legacy systems.
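A rule repository need not be elaborate to be useful. This sketch assumes a simple in-memory store; the `RuleRecord` fields (element, source module, status, steward) are hypothetical choices reflecting the uses described here: metadata for the warehouse repository, stewardship, and lookups by legacy module for maintenance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleRecord:
    rule_id: str
    element: str                   # data warehouse element the rule concerns
    rule_type: str                 # e.g. "Constraint", "Action Enabling"
    statement: str                 # the rule in business terms
    source_module: str             # program or file the rule was mined from
    status: str = "Mined"          # becomes "Validated" after review
    steward: Optional[str] = None  # assigned rule steward, if any


class RuleRepository:
    """In-memory store of mined rules, queryable by element or by legacy module."""

    def __init__(self):
        self._rules = {}

    def add(self, rule: RuleRecord) -> None:
        if rule.rule_id in self._rules:
            raise ValueError(f"duplicate rule id: {rule.rule_id}")
        self._rules[rule.rule_id] = rule

    def validate(self, rule_id: str, steward: str) -> None:
        rule = self._rules[rule_id]
        rule.status = "Validated"
        rule.steward = steward

    def for_element(self, element: str) -> list:
        return [r for r in self._rules.values() if r.element == element]

    def for_module(self, module: str) -> list:
        return [r for r in self._rules.values() if r.source_module == module]


repo = RuleRepository()
repo.add(RuleRecord("R1", "CREDIT_LIMIT", "Constraint",
                    "Credit limit may not exceed 50,000.", "UPDT200"))
repo.add(RuleRecord("R2", "CREDIT_LIMIT", "Action Enabling",
                    "A limit adjustment triggers a notice to the customer.", "ADJ300"))
repo.validate("R1", steward="Credit Operations")

print([(r.rule_id, r.status) for r in repo.for_element("CREDIT_LIMIT")])
# [('R1', 'Validated'), ('R2', 'Mined')]
print([r.rule_id for r in repo.for_module("ADJ300")])  # ['R2']
```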
As we said in the beginning, Business Rule Mining is not easy, but is sometimes necessary, such as when other sources of information are limited, the data life cycle is complex, or when multiple
candidate sources exist. In the latter case, business rule mining would be conducted on each potential source, to the point necessary to distinguish meaning and suitability for sourcing
across the candidates. The full mining exercise might then be completed on the chosen source.
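Alongside mining each candidate source, a quick comparison of the element's values across candidates can show where their rules diverge. The sketch below is that data-side complement, not a substitute for mining; it assumes both candidates can be keyed by a shared customer number, and the file names and values are hypothetical.

```python
def compare_sources(source_a, source_b):
    """Compare one element's values in two candidate sources keyed by a shared key.

    Returns (agreement rate on shared keys, mismatched keys, keys only in A, keys only in B).
    The agreement rate is reported as 0.0 when the sources share no keys.
    """
    shared = source_a.keys() & source_b.keys()
    mismatches = sorted(k for k in shared if source_a[k] != source_b[k])
    agreement = 1 - len(mismatches) / len(shared) if shared else 0.0
    only_a = sorted(source_a.keys() - source_b.keys())
    only_b = sorted(source_b.keys() - source_a.keys())
    return agreement, mismatches, only_a, only_b


# Hypothetical credit limits by customer number from two candidate files.
custmast = {101: 5000, 102: 2500, 103: 1000, 104: 750}
crhist = {101: 5000, 102: 3000, 103: 1000, 105: 400}

agreement, mismatches, only_a, only_b = compare_sources(custmast, crhist)
print(f"agreement {agreement:.0%}, mismatches {mismatches}, only A {only_a}, only B {only_b}")
```

Here the sources agree on two of three shared keys. The disagreement on customer 102, and the customers present in only one file, are the places to look for differing update or selection rules.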
The rigor of a structured Business Rule Mining process, combined with appropriate use of software tools now on the market, makes this process feasible, and helps ensure the completeness and
reliability of excavated metadata.
When thoroughly done, the business rules uncovered can be leveraged even beyond the Data warehouse. The rules so extracted represent a reverse engineering of the truly essential business logic from
the subject systems, at least as it relates to the elements being mined. As such, they provide a basis for business understanding of the system, for system maintenance, and for forward engineering.
Business Rules in Data Warehousing