If you are a seasoned data professional, you appreciate well the history of data management. It begins in the early days before normalization principles and continues into today’s world of complex Big Data. For you, normalization is so native it is instinctual. Some of you can name and define the various normal forms. Still others automatically recognize a normalized data structure just because it feels right.2 It feels right because normalization is part of a rigorous theory applied to the characteristics of data.3
Our book and this month’s column resurrect the formal notion of normalization, giving it a new life in a new field. We use the word “normalization” in a strict sense along the lines of its usage in “database normalization.”4 Specifically, we begin by exploring why some of us still love normalization: it was a major step forward in the evolution of data management. But more pressing today is whether the notion of normalization is relevant (and important) to assets other than data. Specifically, is it possible to apply the rigor of normalization to business logic in decision models? If so, what is its value and how do we define and apply it?
Why Some of Us Still Love Normalization
Normalization is a fancy word with an important history. Its importance lies in the fact that the Relational Model and its normal forms provided the database field with a stable, scientific foundation. The premise of this column is that a similar (but necessarily different) set of normal forms for The Decision Model should accomplish the same. And, for data professionals, this is very good news indeed.
So, the remainder of this column reveals three decision model normal forms. It introduces their formal definitions and provides step-by-step illustrations using realistic examples.5
Part 1: A Critical Time for The Decision Model
But first, why introduce decision model normalization now? The reason is that decision modeling is well on its way to becoming the standard technique for representing, managing, and automating business logic of operational decisions, regulatory compliance, and best practices. Since publication of our book in 2009, various approaches and software for decision modeling have emerged and will continue to do so. As decision modeling comes of age, it is time to pose an important question: Is decision modeling an art or is it partially based in rigor or science?
We believe that normalization is a cornerstone for true decision modeling. Without normalization, decision modelers can create useful diagrams, but these are not the same as delivering a more formal model. The Decision Model with normalization adds simple and practical rigor to the logic of business decisions in much the same way as the Relational Model did for data. So, let’s start by understanding the commonality between data and business logic.
Part 2: Seeking a Commonality between Data and Business Logic
It is important to point out that data and business logic are fundamentally different intellectual assets.
Therefore, their normal forms cannot be identical. However, the similarities between the Relational Model and The Decision Model are interesting:
- Each defines a technology-independent way of organizing an important, somewhat intangible asset
- Each is implementable in various technologies
- Each is a solution to an unsolved problem of its day.
With these points in mind, we can align their normal forms on the purpose of each normal form. This means recognizing that each normal form must constrain the corresponding model structure (i.e., data or business logic) so that the resulting structure delivers the full value of that normal form. In this way, the purpose of each normal form is universal across The Decision Model, the Relational Model, and any other usage that may emerge in the future.
Part 3: Where Normalization Begins and Why
First Normal Form is where it all begins. In fact, First Normal Form is required for data to be sound from a relational model perspective. First Normal Form is also required for business logic to be sound from The Decision Model perspective.
First Normal Form is required because it delivers a simple single representation for an entire model of data or of business logic. The important point is that the entire content of a model in First Normal Form is represented in one and only one way, leading to one and only one set of governing principles. So, every structure in The Decision Model looks and feels the same as every other structure. Likewise, every structure in the Relational Model looks and feels the same as every other structure.
For example, a relational model never contains hierarchical, networked, or indexed data structures that are visible to non-technical users. By design, it presents to non-technical users only relations adhering to First Normal Form.
Likewise, The Decision Model never contains variations of decision trees, decision tables, or other types of logical structures that are visible to non-technical users. Instead, it presents to non-technical users only Rule Families adhering to First Normal Form.
Part 4: A Refresher on the Normalization Process
While First Normal Form is mandatory, all other normal forms are optional, but desirable. The process of normalization starts with a structure in First Normal Form and simply decomposes it to deliver the highest integrity of data or of business logic. Highest integrity is reached when it is possible to make additions, deletions, and updates in one place and have these propagate throughout the model using pre-defined naturally occurring relationships. That’s because the entire model is a holistic deliverable, operating according to pre-defined principles as one integrated unit. In this way, the model is free of anomalies (i.e., errors) that can arise from insert, update, and delete activities.
Each higher level of normalization usually results in the decomposition of an original structure into multiple ones of higher quality. Any argument against more versus fewer structures theoretically makes no sense. What makes sense is choosing between higher and lower quality. That is what is at stake.
This ultimate simplicity of First Normal Form delivers business-friendly, rigorous, and easily maintained models. Let’s explore the first three normal forms for The Decision Model, one at a time, in detail.
Part 5: First Normal Form
As indicated above, the universal purpose of First Normal Form (for data and business logic) is to result in a model that is represented and interpreted in one and only one way.
First Normal Form for data and business logic applies to the population of a structure already adhering to special properties. Specifically, such a structure has the following special properties to start with:
- Is two-dimensional,
- Entries in columns are of the same kind,
- No duplicate rows; each row is unique,
- Each column has a unique name,
- Sequence of columns and sequence of rows is insignificant.
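The starting properties listed above can be checked mechanically. The following is a minimal sketch, not part of The Decision Model itself: the function name `has_starting_properties`, the tuple-based table encoding, and the sample values are all assumptions made for illustration, and "same kind" is approximated by Python type.

```python
def has_starting_properties(headings, rows):
    """Check the starting properties for a table given as heading names plus row tuples."""
    # Two-dimensional: every row has exactly one cell per column heading.
    two_dimensional = all(len(row) == len(headings) for row in rows)
    # Each column has a unique name.
    unique_headings = len(set(headings)) == len(headings)
    # No duplicate rows; comparing as a set also ignores row sequence.
    unique_rows = len(set(rows)) == len(rows)
    # "Same kind" is approximated by Python type; empty (None) cells are ignored.
    same_kind = all(
        len({type(value) for value in column if value is not None}) <= 1
        for column in zip(*rows)
    )
    return two_dimensional and unique_headings and unique_rows and same_kind


# Hypothetical sample table (values are illustrative only).
headings = ("Property Type", "Secondary Financing", "Max LTV")
rows = [
    ("1-unit", "No", "95%"),
    ("1-unit", "Yes", "95%"),
]
ok = has_starting_properties(headings, rows)
with_duplicate = has_starting_properties(headings, rows + [rows[0]])
```

Here `ok` is true, while `with_duplicate` is false because the repeated row violates the uniqueness property.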
However, the differences in normal forms between data and business logic start with First Normal Form. First Normal Form for data applies to a collection of individual attributes that together convey information. On the other hand, First Normal Form for business logic applies to a collection of logical expressions that together infer a conclusion.
First Normal Form for Business Logic
When a business logic structure with the above properties is translated into First Normal Form, the result is a structure in which every row has one and only one conclusion column (TDM Principle 5), each row-and-column position holds an atomic logical expression6 conforming to its column heading (TDM Principle 3), and all populated condition cells in a row must evaluate to true for the corresponding conclusion cell to be true (TDM Principle 6).
Most people explain decision model First Normal Form as a set of rows in a Rule Family in which conditions are connected only by AND, meaning there are no ORs, BUTs, ELSEs, YETs, or OTHERWISEs.
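To make the AND-only reading concrete, here is a minimal sketch of how a Rule Family in First Normal Form might be evaluated. The encoding is an assumption for illustration: each populated cell is an "operator + operand" pair, an unpopulated cell is `None`, and the names `OPERATORS`, `row_applies`, `infer`, and the sample rows are hypothetical rather than taken from the book.

```python
import operator

# Assumed encoding: a populated cell is an atomic expression (operator, operand).
OPERATORS = {"is": operator.eq, "<=": operator.le, ">": operator.gt}


def row_applies(conditions, facts):
    """True when every populated condition cell holds (the implicit AND)."""
    for heading, cell in conditions.items():
        if cell is None:  # unpopulated cell: imposes no constraint
            continue
        op, operand = cell
        if not OPERATORS[op](facts[heading], operand):
            return False
    return True


def infer(rule_family, facts):
    """Return the conclusion cells of all rows whose conditions hold."""
    return [
        conclusion
        for conditions, conclusion in rule_family
        if row_applies(conditions, facts)
    ]


# Hypothetical Rule Family: (condition cells, single conclusion cell) per row.
max_ltv_family = [
    ({"Property Type": ("is", "1-unit"), "Secondary Financing": None}, ("is", "95%")),
    ({"Property Type": ("is", "2-unit"), "Secondary Financing": ("is", "No")}, ("is", "85%")),
    ({"Property Type": ("is", "2-unit"), "Secondary Financing": ("is", "Yes")}, ("is", "80%")),
]
facts = {"Property Type": "2-unit", "Secondary Financing": "Yes"}
result = infer(max_ltv_family, facts)
```

Because there is no OR or ELSE inside a row, each row can be read and tested on its own, which is the property the sketch relies on.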
Step by Step First Normal Form in The Decision Model
In the real world there are lots of business lookup tables. These come in all kinds of shapes and formats. The act of transforming a business lookup table into First Normal Form means recasting it in one and only one way so that everyone knows how to interpret it. Figure 1 is a real-world business lookup table7 as our starting point.
The transformation to decision model First Normal Form has three steps. The first step is to identify the Rule Family’s single conclusion column heading (Remember: TDM Principle 5). After studying the business lookup table in Figure 1 to understand how to read it, we can conclude that it provides the means for finding the Max LTV8 for various mortgage purposes, property types, and secondary financing. So, the Rule Family’s conclusion values are the percentages inside the cells of the second two columns.
The second step is to identify the Rule Family’s atomic condition column headings (Remember: TDM Principle 3). After studying Figure 1 in search of conditions, we can conclude that its first column heading contains two conditions (i.e., an overloaded data field). These two conditions are Mortgage Purpose Type and Property Type. Further study reveals there is another condition hidden in the column heading of the second two columns. This condition is Secondary Financing.
The third step is to fill in the rows so that the populated condition cells, connected only by ANDs, lead to the corresponding populated conclusion cell (Remember: TDM Principles 3 and 6). The resulting Rule Family (in First Normal Form) is in Figure 2.
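The three steps can be sketched in code. This is only an illustration under stated assumptions: the lookup table below is a hypothetical stand-in shaped like the one described (an overloaded first column and two value columns that hide the Secondary Financing condition), the function name `to_first_normal_form` is invented, and every value other than the 95% for a 1-unit primary residence is a placeholder, not published data.

```python
# Hypothetical lookup table: (overloaded first column,
#                             Max LTV without secondary financing,
#                             Max LTV with secondary financing)
lookup_table = [
    ("Primary residence / 1-unit", "95%", "95%"),
    ("Primary residence / 2-unit", "85%", "80%"),  # placeholder values
]


def to_first_normal_form(table):
    """Recast the cross-tab as rows with atomic conditions and one conclusion."""
    rows = []
    for overloaded, ltv_without, ltv_with in table:
        # Step 2: split the overloaded field into two atomic conditions.
        purpose, property_type = [part.strip() for part in overloaded.split("/")]
        # Step 2 (continued): surface the condition hidden in the column headings.
        for secondary, max_ltv in (("No", ltv_without), ("Yes", ltv_with)):
            # Step 3: one row per combination, conditions joined only by AND.
            rows.append({
                "Mortgage Purpose Type": purpose,
                "Property Type": property_type,
                "Secondary Financing": secondary,
                "Max LTV": max_ltv,  # Step 1: the single conclusion column
            })
    return rows


rule_family = to_first_normal_form(lookup_table)
```

Each source row yields two Rule Family rows here, one per value of Secondary Financing, so the two hypothetical source rows become four.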
Hopefully, you appreciate the clarity of Figure 2. Each column heading and its content are in atomic (i.e., non-decomposable) form and all populated condition cells lead to or infer the corresponding conclusion column cell. A subtle but important realization is that, because the populated condition cells lead to the corresponding conclusion cells, the conclusion column is functionally dependent on its concatenated populated condition columns. Also valuable is that every Rule Family in a decision model has the same look and feel as the one in Figure 2.
These two properties, standard look and feel and functional dependency, set the stage for new, similar forms of normalization.
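The functional dependency described above can be tested directly: no two rows may share the same condition values yet reach different conclusions. The sketch below assumes rows are plain dictionaries keyed by column heading; the function name and sample rows are hypothetical.

```python
def is_functionally_dependent(rows, condition_headings, conclusion_heading):
    """True when identical condition values never lead to different conclusions."""
    seen = {}
    for row in rows:
        key = tuple(row[heading] for heading in condition_headings)
        conclusion = row[conclusion_heading]
        # setdefault stores the first conclusion seen for this key and returns it.
        if seen.setdefault(key, conclusion) != conclusion:
            return False
    return True


conditions = ["Property Type", "Secondary Financing"]
consistent = [
    {"Property Type": "1-unit", "Secondary Financing": "No", "Max LTV": "95%"},
    {"Property Type": "1-unit", "Secondary Financing": "Yes", "Max LTV": "95%"},
]
# Same conditions as the first row, but a different (illustrative) conclusion.
contradictory = consistent + [
    {"Property Type": "1-unit", "Secondary Financing": "No", "Max LTV": "90%"},
]
fd_holds = is_functionally_dependent(consistent, conditions, "Max LTV")
fd_broken = is_functionally_dependent(contradictory, conditions, "Max LTV")
```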
Part 6: Second Normal Form
The universal purpose of Second Normal Form (for data and business logic) is to eliminate functional dependencies involving only part of the identifier. That is, there are no partial key dependencies.
Second Normal Form for Business Logic
Second Normal Form in The Decision Model means business logic is already in First Normal Form and also every conclusion value is fully functionally dependent (i.e., inferentially dependent) on the entire set of populated condition columns. More simply, there is no populated condition cell that is irrelevant to deciding the corresponding populated conclusion cell.
Second Normal Form is easier to understand through an example than through its definition.
Step by Step Second Normal Form in The Decision Model
A careful look at Figure 2 reveals that it is not in Second Normal Form. That’s because Secondary Financing for Mortgage Purpose Type of “Primary residence” and Property Type of “1-unit” is irrelevant to the conclusion value. This is obvious because the value for Max LTV is “95%” for this Mortgage Purpose Type and Property Type regardless of the value of Secondary Financing. So, Secondary Financing is an unnecessary condition.
Unnecessary conditions are bad. They add unnecessary redundancy, which brings unnecessary complexity and a greater likelihood of introducing errors during updates. Maintaining values for these rows for Secondary Financing is useless and error-prone, and these values have no effect on the conclusion. Removing the contents of those cells leaves two identical rows (Row ID 1 and 2) as shown in Figure 3.
Studying Figure 3, it is easy to see that it is now not even in First Normal Form. That’s because it contains duplicate rows (Row ID 1 and 2), which violates one of the starting properties.
Duplicate rows are bad. They are another example of unnecessary redundancy and its associated problems. Deleting one of these rows results in the Rule Family in Figure 4.
Figure 4: Second Normal Form
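The two moves just described (emptying an irrelevant condition cell, then deleting the resulting duplicate row) can be sketched together. This is a simplified illustration, not the authors' procedure: it assumes the rows in each group cover every possible value of the candidate condition, the function name is hypothetical, and all values except the 95% case are placeholders.

```python
def to_second_normal_form(rows, condition, conclusion):
    """Blank `condition` where it cannot affect the conclusion, merging duplicates."""
    # Group rows that agree on every condition other than the candidate one.
    groups = {}
    for row in rows:
        key = tuple(sorted(
            (heading, value) for heading, value in row.items()
            if heading not in (condition, conclusion)
        ))
        groups.setdefault(key, []).append(row)

    result = []
    for group in groups.values():
        conclusions = {row[conclusion] for row in group}
        if len(group) > 1 and len(conclusions) == 1:
            # The candidate condition is irrelevant here: keep a single row
            # with that cell left unpopulated.
            merged = dict(group[0])
            merged[condition] = None
            result.append(merged)
        else:
            result.extend(group)
    return result


first_normal_form = [
    {"Mortgage Purpose Type": "Primary residence", "Property Type": "1-unit",
     "Secondary Financing": "No", "Max LTV": "95%"},
    {"Mortgage Purpose Type": "Primary residence", "Property Type": "1-unit",
     "Secondary Financing": "Yes", "Max LTV": "95%"},
    {"Mortgage Purpose Type": "Primary residence", "Property Type": "2-unit",
     "Secondary Financing": "No", "Max LTV": "85%"},   # placeholder value
    {"Mortgage Purpose Type": "Primary residence", "Property Type": "2-unit",
     "Secondary Financing": "Yes", "Max LTV": "80%"},  # placeholder value
]
second_normal_form = to_second_normal_form(
    first_normal_form, "Secondary Financing", "Max LTV"
)
```

In this hypothetical data the two 1-unit rows collapse into one row with an unpopulated Secondary Financing cell, while the 2-unit rows are kept because their conclusions differ.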
Part 7: Third Normal Form
The universal purpose of Third Normal Form (for data and business logic) is to decompose Second Normal Form structures to eliminate transitive dependencies in which non-key portions depend on other non-key portions.
Third Normal Form for Business Logic
Third Normal Form in The Decision Model means business logic is already in Second Normal Form and also every populated condition is non-transitively functionally dependent (i.e., inferentially dependent) on other conditions. Simply stated, there is no condition column that actually represents a conclusion related to other condition columns.
Third Normal Form is easier to understand through an example than through its definition.
Third Normal Form Example for Business Logic
Since the Rule Family in Figure 4 is already in Third Normal Form, consider a different Rule Family, the one in Figure 5. This is a fictitious but realistic table of logic for determining if a certain type of property and mortgage qualify for a homeowner’s relief program.
Looking closely, notice that the conditions for Property Unit Quantity and Mortgage Unpaid Balance Amount appear to determine the value in the condition column Mortgage Unpaid Balance Amount Eligibility. Specifically, Row IDs 1-4 seem to correlate to an “Eligible” value in the condition column for Mortgage Unpaid Balance Amount Eligibility while Row IDs 5-8 appear to correlate to a “Not Eligible” value for the same condition. Moreover, when the value of Mortgage Unpaid Balance Amount Eligibility is “Eligible”, the value in the conclusion column is “Eligible.” When the value of Mortgage Unpaid Balance Amount Eligibility is “Not Eligible”, the value in the conclusion column is “Not Eligible.” If business experts validate this finding, it represents a transitive dependency among condition columns.
As you might have guessed by now, transitive dependencies are bad. Transitive dependencies are yet another example of unnecessary redundancy and its associated problems.
Eliminating this transitive dependency means moving the first three condition columns to their own Rule Family whose conclusion is Mortgage Unpaid Balance Amount Eligibility. The original Rule Family retains Mortgage Unpaid Balance Amount Eligibility as a condition along with Mortgage Origination Date. The resulting Rule Families are in Figure 6.
Notice what has happened. When transformed to Third Normal Form, the original First Normal Form structure becomes two Rule Families with an inferential relationship between them. The resulting two Rule Families also contain the fewest rows needed to represent the full logic. Interpretation and future updates become simpler.
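The inferential relationship between the two Rule Families can be sketched as chained inference: the conclusion of the first family becomes a condition of the second. Everything specific below is an assumption for illustration, including the balance limits, the cutoff date, the conclusion heading "Relief Program Eligibility", and the helper names; only the condition column names come from the discussion above.

```python
import operator
from datetime import date

OPERATORS = {"is": operator.eq, "<=": operator.le, ">": operator.gt}


def infer(rule_family, facts):
    """Return the conclusion of the first row whose conditions all hold, else None."""
    for conditions, conclusion in rule_family:
        if all(OPERATORS[op](facts[heading], operand)
               for heading, (op, operand) in conditions.items()):
            return conclusion
    return None


# Rule Family 1 (hypothetical limits): concludes the balance eligibility.
BALANCE_FAMILY = [
    ({"Property Unit Quantity": ("is", 1), "Mortgage Unpaid Balance Amount": ("<=", 700_000)}, "Eligible"),
    ({"Property Unit Quantity": ("is", 1), "Mortgage Unpaid Balance Amount": (">", 700_000)}, "Not Eligible"),
    ({"Property Unit Quantity": ("is", 2), "Mortgage Unpaid Balance Amount": ("<=", 900_000)}, "Eligible"),
    ({"Property Unit Quantity": ("is", 2), "Mortgage Unpaid Balance Amount": (">", 900_000)}, "Not Eligible"),
]

# Rule Family 2 (hypothetical cutoff date): uses that conclusion as a condition.
CUTOFF = date(2009, 1, 1)
PROGRAM_FAMILY = [
    ({"Mortgage Unpaid Balance Amount Eligibility": ("is", "Eligible"),
      "Mortgage Origination Date": ("<=", CUTOFF)}, "Eligible"),
    ({"Mortgage Unpaid Balance Amount Eligibility": ("is", "Eligible"),
      "Mortgage Origination Date": (">", CUTOFF)}, "Not Eligible"),
    ({"Mortgage Unpaid Balance Amount Eligibility": ("is", "Not Eligible")}, "Not Eligible"),
]


def relief_program_eligibility(facts):
    """Chain the families: infer the intermediate conclusion, then the final one."""
    derived = dict(facts)
    derived["Mortgage Unpaid Balance Amount Eligibility"] = infer(BALANCE_FAMILY, facts)
    return infer(PROGRAM_FAMILY, derived)


outcome = relief_program_eligibility({
    "Property Unit Quantity": 1,
    "Mortgage Unpaid Balance Amount": 500_000,
    "Mortgage Origination Date": date(2008, 6, 1),
})
```

A change to a balance limit now touches only the first family, which is the maintenance benefit the decomposition is meant to deliver.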
Part 8: A Word about Higher Normal Forms
Originally, in 2009, we stated “The Decision Model is introduced in this book with three basic normal forms (first, second, and third). Higher normal forms are likely to exist. The higher the normal form, the more desirable the Decision Model structure and content.” As of 2013, we have discovered fourth and fifth normal forms in decision models. This is not a surprise. It underscores the value of The Decision Model and its science.
Part 9: Why this is Good News for Data Professionals
Imagine the value of normalization in delivering large, complex decision models where the entire model is in Third Normal Form or higher, such as the one in Figure 7. This means the entire model is of highest integrity with the minimal representation (i.e., least unnecessary redundancy) and is compliant with all corresponding integrity principles. This is when the useful diagram becomes a living formal model – a new business asset.
Figure 7: Real-World Decision Model all in Third Normal Form or Higher
Decision Modeling is a recognized practice in companies of all sizes and in all industries. Examples of new roles appropriate for data professionals with respect to decision models are:
- Decision modeler who translates business input into Rule Families connected together in decision models
- Decision model reviewer who seeks and fixes logic and normalization errors
- Glossary administrator who refines condition and conclusion headings with fact type business-friendly names, definitions, and domains
- Technical support person who links decision model column headings to object models and data sources.
Interesting Points to Remember
Below are the most important points to remember about decision model normalization. These should be very familiar to data professionals.
- Never be afraid to normalize decision models because higher normal forms deliver higher quality logic.
- First Normal Form is the most important and often the least understood. It dictates a universal structure from which higher forms of normalization are possible.
- Don’t be concerned if normalization delivers more structures in your decision models than expected. The goal is to achieve highest integrity. In fact, if the history of data normalization is proof, the more structures the better.
- Don’t feel compelled to memorize the normal forms. Simply follow your intuition regarding structures that feel right and those that feel wrong. As always, if you find unnecessary redundancy, do what comes quite naturally – decompose!
Most important of all, be sure your decision modeling approach includes the science of normalization. Otherwise, you are missing the most important part of decision modeling and its greatest value.9 As data professionals, you know this all too well.
- von Halle, Barbara and Larry Goldberg, The Decision Model: A Business Logic Framework Linking Business and Technology, © 2009 Auerbach Publications/Taylor & Francis, LLC.
- This was an observation by Date, CJ, An Introduction to Database Systems, Volume 1, Fifth Edition, 1990, Addison-Wesley Publishing Company.
- The Relational Model is based on a theoretical foundation which includes the notions of functional dependency, normalization, and set theory applied to the inherent nature of data itself and nothing more.
- There are general definitions of “normalization” meaning conformance to a standard or to make something normal. Our usage in this paper is more specific.
- Readers interested in examples of data normal forms should see Chapter 11.
- An atomic logical expression in The Decision Model is of the form “operator + operand”.
- This is a partial representation of the LTV/TLTV/HTLTV Ratio Requirements for Conforming Mortgages published at http://www.freddiemac.com/sell/factsheets/ltv_tltv.htm.
- Maximum loan-to-value ratio.
- Readers interested in a more in-depth analysis of the Relational Model and The Decision Model will find one in Chapter 11.