This is the fourth column in my series about applying Agile techniques to data projects in our Tips and Tricks from the Trenches Column. This column’s goal is to share insights gained from experiences in the field through case studies. Painful experiences can sometimes lead to powerful lessons learned, and many lessons are won the hard way. This series will describe both how to avoid the pain and how to achieve success with winning strategies.
Summary of Agile Methodologies
In the first article in this series we described the Agile Process, plus we referenced another TDAN series on Agile projects. To provide a quick level-set for this article, suffice it to say that change is inherent in the Agile process. Requirements are not completely known up front, but are discovered iteratively throughout the project. To be descriptive, data models depend upon data requirements being mostly known up front. This has typically caused problems both for data modelers and for the programmers who depend on those data models. This series of articles presents various methods for data modelers to produce models that find a balance between flexibility and semantic adherence.
In the last three installments, we discussed some ideas for data modelers who are working on an Agile project. In the last column, we presented techniques in generic modeling that can be used in the absence of any prior information. Generic models can essentially model anything, but tell you nothing in their structure about the data contained therein. This column introduces patterns, which require a little bit of prior information, but less than is usually required. These techniques focus on bringing the level of abstraction down a notch and adding a little more intelligence into the data structures, but still remaining flexible.
There is a wonderful book entitled The Nimble Elephant: Agile Delivery of Data Models using a Pattern-Based Approach by John Giles. This book is mandatory reading for every data modeler working on an Agile project. We will dive into some of his suggestions in this column. In addition, it is impossible to discuss the subject of Data Model Patterns without giving proper recognition to the two “fathers” of patterns: David Hay and Len Silverston. Both authors have written a series of books on the subject that are extremely helpful. This column serves as an introduction to data model patterns; the area is rich and the models are intricate and extensive. Exhaustive coverage is well beyond the scope of this column. I hope this column whets your appetite and encourages you to delve into some of the books by these authors.
Introduction to Data Model Patterns
Many things in the world have a similar structure. The generic approach to data modeling, as explored in the last column, does not really “bake” any structure into the data model at all. But many things share commonalities based on their function or classification, and these commonalities can often be expressed in data model patterns.
A familiar example of the pattern approach at work is the pervasive use of accounting COTS (Commercial Off-The-Shelf) software products. This is possible because the field of accounting is heavily standardized, as represented in standards like GAAP (Generally Accepted Accounting Principles). That standardization makes the accounting problem space well suited to pattern-based data modeling: both David Hay and Len Silverston provide extensive data model patterns for financial management.
Another common example involves the modeling of organizations. Organizations have addresses such as mailing and/or geographic location. There are many types of organizations, such as profit, non-profit, government entities, and medical, to name a few, but all organizations have addresses.
There are other common entities that also have addresses, such as people. The data model pattern called “Party” was created to generalize person and organization. See Figure 1 below. This model provides the flexibility to track attributes specific to either Person or Organization, and also track attributes common to both, by including them in the Party entity.
As mentioned above, both Persons and Organizations have addresses. The Address entity can be linked to Party, which means that an address can belong to either a Person or an Organization. See Figure 2.
Each subtype of Party can also store attributes or have relationships specific to each that don’t involve the other. Figure 3 shows a few attribute examples and some relationship differences.
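As a rough illustration of the Party pattern described above, here is a minimal Python sketch. It assumes a simple supertype/subtype arrangement; the attribute names (street, city, address_type, legal_name, tax_id, and so on) and the helper functions are hypothetical placeholders chosen for the example, not taken from the figures or from Hay's or Silverston's books.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Address:
    # Illustrative attributes only; a real model would likely be richer.
    street: str
    city: str
    address_type: str  # e.g. "mailing" or "physical"


@dataclass
class Party:
    """Supertype: holds what Person and Organization have in common."""
    party_id: int
    addresses: List[Address] = field(default_factory=list)


@dataclass
class Person(Party):
    # Attributes that only make sense for a Person.
    first_name: str = ""
    last_name: str = ""
    birth_date: Optional[date] = None


@dataclass
class Organization(Party):
    # Attributes that only make sense for an Organization.
    legal_name: str = ""
    tax_id: Optional[str] = None


def display_name(party: Party) -> str:
    """Pick a printable name depending on the subtype."""
    if isinstance(party, Person):
        return f"{party.first_name} {party.last_name}"
    if isinstance(party, Organization):
        return party.legal_name
    return f"Party {party.party_id}"


def mailing_labels(parties: List[Party]) -> List[str]:
    """Works for any Party because Address is linked to the supertype."""
    return [
        f"{display_name(p)}, {a.street}, {a.city}"
        for p in parties
        for a in p.addresses
        if a.address_type == "mailing"
    ]
```

Because Address hangs off Party rather than off Person or Organization, code such as mailing_labels does not need to know which subtype it is dealing with, while subtype-specific attributes still live where they belong.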
The Party model usually incorporates a “Role” Entity. Both Persons and Organizations can play roles with respect to the universe of discourse you are modeling. For example, a retail firm has Customers. A Customer can be either a Person or Organization. Therefore, a Customer is a Role played by a Party. See Figure 4 below.
Figure 4 shows a many-to-many relationship: one Party can play multiple Roles, and one Role can be played by many Parties. For example, John can be both a Customer and an Employee, and the retail firm has many Customers.
A great feature of this modeling method is the ability to handle point-in-time. The many-to-many relationship is resolved through the creation of a concept known as a Role Type, such as Customer, related to a Party through the intersection entity called Party Role. This provides a vehicle to relate the Role to the Party with begin/end dates. Thus we see the usual manifestation of this pattern, known as Party Role, shown in Figure 5. A Role Type in the example would be “Customer,” and the Party Role entity connects the Role Type to a specific Party. John Doe, a Person, was a Customer (Role Type) starting on March 4, 2017 (Party Role).
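The point-in-time behavior of the Party Role pattern can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the field names from_date and thru_date, the convention that a missing end date means "still active", and the helper roles_on are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class RoleType:
    name: str  # e.g. "Customer" or "Employee"


@dataclass
class PartyRole:
    """Intersection entity: one Party playing one Role Type for a date range."""
    party_id: int
    role_type: RoleType
    from_date: date
    thru_date: Optional[date] = None  # assumption: None means still active

    def active_on(self, day: date) -> bool:
        # Both ends of the range are treated as inclusive.
        return self.from_date <= day and (
            self.thru_date is None or day <= self.thru_date
        )


def roles_on(party_roles: List[PartyRole], party_id: int, day: date) -> List[str]:
    """Names of the Role Types a Party was playing on a given day."""
    return sorted(
        pr.role_type.name
        for pr in party_roles
        if pr.party_id == party_id and pr.active_on(day)
    )
```

With a row such as PartyRole(1, RoleType("Customer"), date(2017, 3, 4)), asking for the roles of party 1 on any day before March 4, 2017 returns nothing, and on or after that date it returns "Customer".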
Len Silverston has three Data Model Patterns books that are chock-full of common patterns that are seen across industries; some are specific to particular industries. I have used some of the other pattern types (besides Party Role) including Retail (Order and Product), Insurance, and various flavors of Classification. Reference data also follows a common pattern.
Here’s another example of a classic pattern: Customers purchase products via an Order, shown in Figure 6 below. This is a very simple example. Ordering products encompasses an entire chapter in Silverston’s Volume 1 book. And Product itself has its own chapter in his book.
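To make the Order pattern a little more concrete, here is a minimal Python sketch. It is a simplification under stated assumptions: the OrderItem entity, the unit_price attribute, and the customer_party_role_id reference are hypothetical names added for the example, and Silverston's actual Order and Product patterns are far more detailed.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Product:
    product_id: int
    name: str
    unit_price: Decimal


@dataclass
class OrderItem:
    """One line of an Order: a Product and how many were ordered."""
    product: Product
    quantity: int

    def line_total(self) -> Decimal:
        return self.product.unit_price * self.quantity


@dataclass
class Order:
    order_id: int
    # Assumption: refers to a Party Role row whose Role Type is Customer.
    customer_party_role_id: int
    order_date: date
    items: List[OrderItem] = field(default_factory=list)

    def total(self) -> Decimal:
        # Start from Decimal("0") so an empty order totals to a Decimal, not int 0.
        return sum((item.line_total() for item in self.items), Decimal("0"))
```

The Order points at the Customer role rather than directly at a Person or an Organization, so the same structure serves both kinds of Party.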
Figure 6 illustrates that an Order is a transaction placed by a Party playing a Role (Customer) at a point in time. As mentioned above, the Product part of this model has an entire pattern associated with it, and so does Order. The complexities of each can be handled with standard patterns. For example, Product variations can be based on their classification. This involves yet another pattern type: Classification. A simplified version of this is shown in Figure 7.
This Classification pattern can apply to all sorts of things, not just a Product. It is quite reusable! If you look closely, you can see the resemblance to the Party Role pattern.
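The resemblance to Party Role shows up clearly in code. The sketch below is a simplified, hypothetical rendering of a Classification pattern applied to Product: Category, ProductClassification, and the date fields are my own illustrative names, not the entities in Figure 7.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Category:
    name: str  # e.g. "Office Supplies"


@dataclass
class ProductClassification:
    """Intersection entity, shaped much like Party Role:
    one Product placed in one Category for a date range."""
    product_id: int
    category: Category
    from_date: date
    thru_date: Optional[date] = None  # assumption: None means still classified


def products_by_category(
    rows: List[ProductClassification], day: date
) -> Dict[str, List[int]]:
    """Group product ids by the categories they belonged to on a given day."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        if row.from_date <= day and (row.thru_date is None or day <= row.thru_date):
            grouped[row.category.name].append(row.product_id)
    return {name: sorted(ids) for name, ids in grouped.items()}
```

Swap Product for Party and Category for Role Type and you are back at the Party Role structure, which is why the pattern is so easy to reuse.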
Any basic pattern can be extended by adding additional patterns. For example, a Person playing an Employee role has Human Resources (HR)-related data associated with it that Customers don’t have. These HR entities also follow patterns, so you can use these patterns to extend your model. Silverston’s Volume 1 Pattern book has a chapter dedicated to HR patterns. In the same way, the pattern above can be extended by adding all the nuances in the Product pattern as outlined in Silverston’s book.
Use of Patterns in Agile
These patterns are very well understood, and they are extremely versatile. It seems I always begin an Agile project with some variation of the Party Role pattern; almost all projects have some sort of Person and Organization data that they need to capture.
How do patterns work in Agile? When you gather your first set of basic requirements, you can easily figure out what sort of basic pattern is being expressed. You can then begin your modeling with this pattern. As more requirements are gathered, the model may need to be tweaked here and there, but the basic structure generally remains the same. There are many variations of the patterns, and you can follow these variations.
John Giles brings up a project he worked on that was focused on financial lending data. The firm had over 200 types of agreements. The data model pattern approach uses the Party Role type pattern and adds Agreement, plus several other entities specific to his problem domain (Asset and Location). This basic model allowed the flexibility to model all the 200 agreement types plus others that have yet to be discovered.
Another technique you can use is a combination of a data model pattern with the generic modeling approach we introduced in our prior column. For example, you may want to use a single Classification pattern to classify many different things in your application, including but not limited to Products. Or, you can replicate the Classification pattern where a particular situation calls for more specificity.
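One possible way to combine the Classification pattern with generic modeling is to let a single classification structure point at any kind of entity. The Python sketch below assumes a hypothetical entity_type/entity_id pair as the generic reference, plus a scheme name and a value; none of these names come from the books cited here.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class GenericClassification:
    entity_type: str  # e.g. "Product", "Party", "Order"
    entity_id: int    # identifier within that entity type
    scheme: str       # which classification scheme, e.g. "Product Category"
    value: str        # the category within that scheme


def find_entities(
    rows: List[GenericClassification], scheme: str, value: str
) -> Set[Tuple[str, int]]:
    """Return (entity_type, entity_id) pairs classified under scheme/value."""
    return {
        (row.entity_type, row.entity_id)
        for row in rows
        if row.scheme == scheme and row.value == value
    }
```

The trade-off is the one discussed throughout this series: this structure can classify anything, but nothing in it says which entity types may carry which schemes, so that knowledge has to live in data or in application rules.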
Interestingly enough, the patterns can get very complex, and can consequently handle many different data nuances. However, as we said in our last article, the more specificity you add to your models, the more expressive but the less flexible they become. Hence you always trade flexibility for expressivity and description. Silverston explains this principle very well in the first few chapters, and shows several ways to model the same thing, one being more generic than another, which is similar to the points we made in the last article in our series. This is something you must always keep in mind when modeling in an Agile environment, and it will require some judgment calls. Sometimes you can judge the amount of flexibility you will need by how much is known in advance. But other times you may not have much foreknowledge, and it may prove wise to err on the side of the more generic patterns and approaches.
Extending a Pattern
Many times, you may have to extend the patterns to model appropriately.
Sometimes, the extension can be accomplished by simply adding a few more patterns of different types onto the model. The addition of HR patterns to the basic Party model is an example of this.
The situations I have come across where more custom extensions are necessary stem from the specific goals of an application, nuances in how a specific organization does business (its business processes), or specific functions of the business or its expertise that mandate specialization. A good example of this is product features. Every business has specialty products. But even product features can be generalized if so desired, using generic modeling. An extension may be required, however, when a particular feature is tied to something specific, such as rules, and the business needs to track those rules.
Often, an enterprise decides to custom-build software rather than buy a commercial-off-the-shelf (COTS) software package. The reason behind such a decision is sometimes rooted in the unique nature of the business; it doesn’t easily fit common business models. An example of a business like this is Fannie Mae, whose business involves selling mortgage-backed securities. It is a kind of hybrid business: its products differ from stocks and bonds, mutual funds, and mortgages. This is a case where doing business is not “one size fits all.” Yet even a business this unique still has organizations, persons, and products. Some patterns can almost always apply, although special extensions are needed.
A similar situation where extensions may be necessary is when the enterprise has a unique business process that differs from those of its competitors. I have seen this in the pharmaceutical industry, where a company’s processes may differ based on the type of drug or disease it is focused on. Different processes necessitate tracking different attributes and relationships.
Before you add a lot of complex extensions, however, look before you leap! They may not be necessary. For example, I worked on an application requiring medical diagnostic codes. I was attempting to model one of the diagnostic code systems (in this case, ICD-10) and was using the coding manual for guidance. It turned out that the application did not require all the rich distinctions in the coding manual; we were able to stay fairly high level and achieve the functionality required. The coding manual has many extensive hierarchies. We did not need to capture the hierarchies and the parent-child relationships for our purposes. All we required was the code and its value, so a very simple model was sufficient. This is a very important lesson to learn in Agile projects: don’t over-complicate things. Many times, simple is best. In science, this principle is called “Occam’s Razor.”
In the next column in the series, we will discuss Agile data warehousing. The “father” of Agile data warehousing is my long-time friend and colleague, Ralph Hughes. He has written two books on the subject. So if you’d like to do some reading ahead, grab a copy of one of his books.
 John Giles. The Nimble Elephant: Agile Delivery of Data Models using a Pattern-Based Approach. Westfield, NJ: Technics Publications, 2012.
 David C. Hay. Enterprise Model Patterns. Bradley Beach, NJ: Technics Publications, 2011. Chapter 11, Accounting, page 221.
 Len Silverston. The Data Model Resource Book, Volume 1. New York: John Wiley & Sons, 2001. Chapter 9, Accounting and Budgeting, page 259.
 Silverston, Chapter 4, Ordering Products, page 105.
 Ibid, Chapter 3, Products, page 69.
 Silverston, Chapter 9, page 299.
 Giles, ibid. page 37.
 One example is different ways to model party relationships in Silverston, ibid, Chapter 2, People and Organizations, starting at page 39; another example is different ways to model hierarchies in Silverston and Paul Agnew. The Data Model Resource Book, Volume 3, Indianapolis, IN: Wiley Publishing, 2009. Chapter 4, page 133.
 Ralph Hughes. Agile Data Warehousing Project Management: Business Intelligence Systems Using Scrum. Waltham, MA: Morgan Kaufmann. 2012 and Agile Data Warehousing for the Enterprise: A Guide for Solution Architects and Project Leaders. Waltham, MA: Morgan Kaufmann, 2015.