Tales & Tips from the Trenches: Data Model Patterns for Agile Projects

This is the fourth column in my series about applying Agile techniques to data projects in our Tales & Tips from the Trenches column. This column’s goal is to share insights gained from experiences in the field through case studies. Painful experiences can sometimes lead to powerful lessons learned, and many lessons are won the hard way. This series will describe both how to avoid the pain and how to achieve success with winning strategies.

Summary of Agile Methodologies

In the first article in this series we described the Agile Process, plus we referenced another TDAN series on Agile projects. To provide a quick level-set for this article, suffice it to say that change is inherent in the Agile process. Requirements are not completely known up front, but are discovered iteratively throughout the project. Data models, to be descriptive, depend on data requirements being mostly known up front. This has typically caused problems both for data modelers and for the programmers who depend on those data models. This series of articles presents various methods for data modelers to produce models that find a balance between flexibility and semantic adherence.

In the last three installments, we discussed some ideas for data modelers who are working on an Agile project. In the last column, we presented techniques in generic modeling that can be used in the absence of any prior information. Generic models can essentially model anything, but tell you nothing in their structure about the data contained therein. This column introduces patterns, which require a little bit of prior information, but less than is usually required. These techniques focus on bringing the level of abstraction down a notch and adding a little more intelligence into the data structures, but still remaining flexible.

Helpful References

There is a wonderful book entitled The Nimble Elephant: Agile Delivery of Data Models using a Pattern-Based Approach by John Giles.[2] This book is mandatory reading for every data modeler working on an Agile project. We will dive into some of his suggestions in this column. In addition, it is impossible to discuss the subject of Data Model Patterns without giving proper recognition to the two “fathers” of patterns: David Hay and Len Silverston. Both authors have written a series of books on the subject that are extremely helpful. This column serves as an introduction to data model patterns; the area is rich and the models are intricate and extensive. An exhaustive coverage is well beyond the scope of this column. It is hoped that this column will whet your appetite and will make you want to read more and delve into some of the books by these authors.

Introduction to Data Model Patterns

Many things in the world are similar and have a similar structure. The generic approach to data modeling, as explored in the last column, does not really “bake” into the data model any structure at all. But many things in the world can often be observed to have many commonalities based on their function or classification. These commonalities can often be expressed in data model patterns.

A familiar example of the pattern approach at work is the pervasive use of accounting COTS (Commercial Off-The-Shelf) software products. This is possible because the field of accounting is heavily standardized, as represented in standards like GAAP (Generally Accepted Accounting Principles). This standardization makes the accounting problem space well suited to pattern-based data modeling: both David Hay[3] and Len Silverston[4] provide extensive data model patterns for financial management.

Another common example involves the modeling of organizations. Organizations have addresses such as mailing and/or geographic location. There are many types of organizations, such as profit, non-profit, government entities, and medical, to name a few, but all organizations have addresses.

There are other common entities that also have addresses, such as people. The data model pattern called “Party” was created to generalize person and organization. See Figure 1 below. This model provides the flexibility to track attributes specific to either Person or Organization, and also track attributes common to both, by including them in the Party entity.

[Figure 1: Party as the generalization of Person and Organization]

As mentioned above, both Persons and Organizations have addresses. The Address entity can be linked to Party, which means that an address can belong to either a Person or an Organization. See Figure 2.

[Figure 2: Address linked to Party]

Each subtype of Party can also store attributes or have relationships specific to each that don’t involve the other.  Figure 3 shows a few attribute examples and some relationship differences.

[Figure 3: Attributes and relationships specific to Person and to Organization]
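To make the Party structure in Figures 1 through 3 concrete, here is a minimal sketch in Python. The attribute names (first_name, legal_name, kind, and so on) are illustrative assumptions, not taken from the figures; only the shape matters: a Party supertype that owns the addresses, and Person and Organization subtypes that add their own attributes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    street: str
    city: str
    kind: str = "mailing"  # hypothetical: e.g. "mailing" or "geographic"


@dataclass
class Party:
    """Supertype: holds what Person and Organization have in common."""
    party_id: int
    addresses: List[Address] = field(default_factory=list)


@dataclass
class Person(Party):
    # Attributes specific to a Person (names are assumptions).
    first_name: str = ""
    last_name: str = ""


@dataclass
class Organization(Party):
    # Attribute specific to an Organization (name is an assumption).
    legal_name: str = ""


john = Person(party_id=1, first_name="John", last_name="Doe")
acme = Organization(party_id=2, legal_name="Acme Corp")

# Address hangs off Party, so the same code serves both subtypes.
for party in (john, acme):
    party.addresses.append(Address("1 Main St", "Springfield"))
```

Because Address is attached to Party rather than to each subtype, anything written against Party works unchanged for Persons and Organizations.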

The Party model usually incorporates a “Role” Entity. Both Persons and Organizations can play roles with respect to the universe of discourse you are modeling. For example, a retail firm has Customers. A Customer can be either a Person or Organization. Therefore, a Customer is a Role played by a Party. See Figure 4 below.

[Figure 4: Party and Role in a many-to-many relationship]

Figure 4 shows a many-to-many relationship: one Party can play multiple Roles, and one Role can be played by many Parties. For example, John can be both a Customer and an Employee.

A great feature of this modeling method is the ability to handle point-in-time data. The many-to-many relationship is resolved through the creation of a concept known as a Role Type, such as Customer, related to a Party through the intersection entity called Party Role. This provides a vehicle to relate the Role to the Party with begin/end dates. Thus we see the usual manifestation of this pattern, known as Party Role, shown in Figure 5. A Role Type in the example would be “Customer,” and the Party Role entity connects the Role Type to a specific Party. John Doe, a Person, was a Customer (Role Type) starting on March 4, 2017 (Party Role).
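The point-in-time behavior described above can be sketched as follows. This is a minimal illustration rather than Silverston's actual model: the field names from_date and thru_date, the helper roles_on, and John's earlier Employee period are all assumptions made for the example. Only the Customer start date of March 4, 2017 comes from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class RoleType:
    name: str  # e.g. "Customer", "Employee"


@dataclass
class PartyRole:
    """Intersection entity: one Party playing one Role Type over a period."""
    party_id: int
    role_type: RoleType
    from_date: date
    thru_date: Optional[date] = None  # None means the role is still active

    def active_on(self, day: date) -> bool:
        started = self.from_date <= day
        not_ended = self.thru_date is None or day <= self.thru_date
        return started and not_ended


def roles_on(party_roles: List[PartyRole], party_id: int, day: date) -> List[str]:
    """Names of the roles a party was playing on a given day."""
    return [
        pr.role_type.name
        for pr in party_roles
        if pr.party_id == party_id and pr.active_on(day)
    ]


CUSTOMER = RoleType("Customer")
EMPLOYEE = RoleType("Employee")

history = [
    # From the text: John Doe became a Customer on March 4, 2017.
    PartyRole(party_id=1, role_type=CUSTOMER, from_date=date(2017, 3, 4)),
    # Hypothetical earlier role, added only to show an end date.
    PartyRole(party_id=1, role_type=EMPLOYEE,
              from_date=date(2015, 1, 1), thru_date=date(2016, 12, 31)),
]

print(roles_on(history, 1, date(2017, 3, 4)))
```

Keeping the dates on Party Role, rather than on Party or Role Type, is what lets the same Party gain and lose roles over time without any change to the structure.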

[Figure 5: The Party Role pattern]

Len Silverston has three Data Model Patterns books that are chock-full of common patterns that are seen across industries; some are specific to particular industries. I have used some of the other pattern types (besides Party Role) including Retail (Order and Product), Insurance, and various flavors of Classification. Reference data also follows a common pattern.

Here’s another example of a classic pattern: Customers purchase products via an Order, shown in Figure 6 below. This is a very simple example. Ordering products encompasses an entire chapter in Silverston’s Volume 1 book[5]. And Product itself has its own chapter in his book[6].

[Figure 6: A Customer (Party Role) placing an Order for a Product]
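As a rough relational sketch of this very simple Order example, the following uses Python's built-in sqlite3 module with an in-memory database. The table and column names are assumptions made for illustration, and Party is folded into party_role for brevity. Like the example itself, it allows a single product per order, which a fuller Order pattern would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE party_role (
    party_role_id INTEGER PRIMARY KEY,
    party_name    TEXT NOT NULL,   -- simplification: Party folded in here
    role_type     TEXT NOT NULL    -- e.g. 'Customer'
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id      INTEGER PRIMARY KEY,
    party_role_id INTEGER NOT NULL REFERENCES party_role(party_role_id),
    product_id    INTEGER NOT NULL REFERENCES product(product_id),
    order_date    TEXT NOT NULL    -- the point in time of the transaction
);
""")

conn.execute("INSERT INTO party_role VALUES (1, 'John Doe', 'Customer')")
conn.execute("INSERT INTO product VALUES (10, 'Widget')")
conn.execute("INSERT INTO customer_order VALUES (100, 1, 10, '2017-03-04')")

# Who ordered what, in which role, and when.
rows = conn.execute("""
    SELECT pr.party_name, pr.role_type, p.name, o.order_date
    FROM customer_order AS o
    JOIN party_role AS pr ON pr.party_role_id = o.party_role_id
    JOIN product AS p ON p.product_id = o.product_id
""").fetchall()
print(rows)
```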

Figure 6 illustrates that an Order is a transaction placed by a Party playing a Role (Customer) at a point in time. As mentioned above, the Product part of this model has an entire pattern associated with it, and so does Order. The complexities of each can be handled with standard patterns. For example, Product variations can be based on their classification. This involves yet another pattern type: Classification. A simplified version of this is shown in Figure 7.

[Figure 7: A simplified Classification pattern]
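A simplified Product classification might look like the sketch below. The entity and field names are assumptions and are not read off the figure; the structure is a category entity plus an intersection entity that ties a product to a category for a period of time.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ProductCategory:
    name: str  # e.g. "Hardware"


@dataclass
class ProductCategoryClassification:
    """Intersection entity: one product in one category for a period."""
    product_id: int
    category: ProductCategory
    from_date: date
    thru_date: Optional[date] = None  # None means still classified this way


def categories_for(product_id: int,
                   classifications: List[ProductCategoryClassification]) -> List[str]:
    """Sorted category names recorded for a product."""
    return sorted(
        c.category.name for c in classifications if c.product_id == product_id
    )


HARDWARE = ProductCategory("Hardware")
CLEARANCE = ProductCategory("Clearance")

# Hypothetical data: product 10 sits in two categories at once.
classifications = [
    ProductCategoryClassification(10, HARDWARE, date(2017, 1, 1)),
    ProductCategoryClassification(10, CLEARANCE, date(2018, 1, 1)),
    ProductCategoryClassification(11, HARDWARE, date(2017, 6, 1)),
]

print(categories_for(10, classifications))
```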

This Classification pattern can apply to all sorts of things, not just a Product. It is quite reusable! If you look closely, you can see the resemblance to the Party Role pattern.

Any basic pattern can be extended by adding additional patterns. For example, a Person playing an Employee role has Human Resources (HR)-related data associated with it that Customers don’t have. These HR entities also follow patterns, so you can use these patterns to extend your model. Silverston’s Volume 1 Pattern book has a chapter dedicated to HR patterns[7]. In the same way, the pattern above can be extended by adding all the nuances in the Product pattern as outlined in Silverston’s book.

Use of Patterns in Agile

These patterns are very well understood, and they are extremely versatile. It seems I always begin an Agile project with some variation of the Party Role pattern; almost all projects have some sort of Person and Organization data that they need to capture.

How do patterns work in Agile? When you gather your first set of basic requirements, you can easily figure out what sort of basic pattern is being expressed. You can then begin your modeling with this pattern.  As more requirements are gathered, the model may need to be tweaked here and there, but the basic structure generally remains the same.  There are many variations of the patterns, and you can follow these variations.

John Giles brings up a project he worked on that was focused on financial lending data. The firm had over 200 types of agreements. The data model pattern approach uses the Party Role type pattern and adds Agreement, plus several other entities specific to his problem domain (Asset and Location). This basic model allowed the flexibility to model all the 200 agreement types plus others that have yet to be discovered[8].

Another technique you can use is to combine a data model pattern with the generic modeling approach we introduced in our prior column. For example, you may want to use a single Classification pattern to classify many different things in your application, including but not limited to Products. Or, you can replicate the Classification pattern where a particular situation calls for more specificity.
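One way to picture this combination is a single Classification structure that records which kind of thing it is classifying. The sketch below is an assumption about how that might be done, with hypothetical names (subject_kind, scheme, classify, lookup); it is not the model from the prior column or from Silverston's books.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Classification:
    subject_kind: str  # which entity is classified, e.g. "Product" or "Party"
    subject_id: int
    scheme: str        # e.g. "Product Line" or "Organization Type"
    category: str


def classify(store: Set[Classification], kind: str, subject_id: int,
             scheme: str, category: str) -> None:
    """Record that a subject belongs to a category within a scheme."""
    store.add(Classification(kind, subject_id, scheme, category))


def lookup(store: Set[Classification], kind: str,
           subject_id: int) -> List[Tuple[str, str]]:
    """Sorted (scheme, category) pairs recorded for one subject."""
    return sorted(
        (c.scheme, c.category)
        for c in store
        if c.subject_kind == kind and c.subject_id == subject_id
    )


store: Set[Classification] = set()
classify(store, "Product", 10, "Product Line", "Hardware")
classify(store, "Party", 2, "Organization Type", "Non-profit")

print(lookup(store, "Party", 2))
```

The flexibility has a cost: nothing in this structure says which subject kinds or schemes are valid, so that knowledge has to live in data or in application code. Replicating the pattern per entity, for example as a dedicated Product classification, puts it back into the model.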

Interestingly enough, the patterns can get very complex, and can consequently handle many different data nuances. However, as we said in our last article, the more specificity you add to your models, the less flexible they become, even as they become more expressive. Hence you always trade flexibility for expressivity and description. Silverston explains this principle very well in the first few chapters, and shows several ways to model the same thing, one being more generic than another[9], which is similar to the points we made in the last article in our series. This is something you must always keep in mind when modeling in an Agile environment, and it will require some judgment calls. Sometimes you can judge the amount of flexibility you will need by how much is known in advance. But other times you may not have much foreknowledge, and it may prove wise to err on the side of the more generic patterns and approaches.

Extending a Pattern

Many times, you may have to extend the patterns to model appropriately.

Sometimes, the extension can be accomplished by simply adding a few more patterns of different types onto the model.  The addition of HR patterns to the basic Party model is an example of this.

In my experience, the need for more custom extensions stems from the specific goals of an application, nuances in how a specific organization does business (business process), or specific business functions or areas of expertise that mandate specialization. A good example of this is product features. Every business has specialty products. But even product features can be generalized if so desired, using generic modeling. An extension may be required, however, when a particular feature is tied to something specific, such as rules, and the rules are very important for the business to track.

Often, an enterprise decides to custom build software rather than buy a commercial off-the-shelf (COTS) software package. The reason behind such a decision is sometimes rooted in the unique nature of the business; it doesn’t easily fit common business models. An example of a business like this is Fannie Mae, whose business involves selling mortgage-backed securities. It is a kind of hybrid business: its securities differ from stocks, bonds, mutual funds, and mortgages. This is a case where doing business is not “one size fits all.” Yet even a highly unusual business still has organizations, persons, and products. Some patterns almost always apply, but special extensions are needed.

A similar situation where extensions may be necessary is when the enterprise has a unique business process that differs from those of its competitors. I have seen this in the pharmaceutical industry, where processes may differ based on the type of drug or disease a company is focused on. Different processes necessitate tracking different attributes and relationships.

Before you add a lot of complex extensions, however, look before you leap! They may not be necessary. For example, I worked on an application requiring medical diagnostic codes. I was attempting to model one of the diagnostic code systems (in this case, ICD-10) and was using the coding manual for guidance. It turned out that the application did not require all the rich distinctions in the coding manual; we were able to stay fairly high level and achieve the functionality required. The coding manual has many extensive hierarchies. We did not need to capture the hierarchies and the parent-child relationships for our purposes. All we required was the code and its value, so a very simple model was all that was needed. This is a very important lesson to learn in Agile projects: don’t over-complicate things. Many times, simple is best. In science, this principle is called “Occam’s Razor.”

In the next column in the series, we will discuss Agile data warehousing. The “father” of Agile data warehousing is my long-time friend and colleague, Ralph Hughes. He has written two books on the subject[10]. So if you’d like to do some reading ahead, grab a copy of one of his books.

[1] bkoneil@mitre.org

[2] John Giles. The Nimble Elephant: Agile Delivery of Data Models using a Pattern-Based Approach. Westfield, NJ: Technics Publications, 2012.

[3] David C. Hay. Enterprise Model Patterns. Bradley Beach, NJ: Technics Publications, 2011. Chapter 11, Accounting, page 221.

[4] Len Silverston. The Data Model Resource Book, Volume 1. New York: John Wiley & Sons, 2001. Chapter 9, Accounting and Budgeting, page 259.

[5] Silverston, Chapter 4, Ordering Products, page 105.

[6] Ibid, Chapter 3, Products, page 69.

[7] Silverston, Chapter 9, page 299.

[8] Giles, ibid. page 37.

[9] One example is different ways to model party relationships in Silverston, ibid, Chapter 2, People and Organizations, starting at page 39; another example is different ways to model hierarchies in Silverston and Paul Agnew. The Data Model Resource Book, Volume 3, Indianapolis, IN: Wiley Publishing, 2009. Chapter 4, page 133.

[10] Ralph Hughes. Agile Data Warehousing Project Management: Business Intelligence Systems Using Scrum. Waltham, MA: Morgan Kaufmann, 2012; and Agile Data Warehousing for the Enterprise: A Guide for Solution Architects and Project Leaders. Waltham, MA: Morgan Kaufmann, 2015.


About Bonnie O'Neil

Bonnie O'Neil is a Principal Computer Scientist at the MITRE Corporation, and is internationally recognized on all phases of data architecture including data quality, business metadata, and governance. She is a regular speaker at many conferences and has also been a workshop leader at the Meta Data/DAMA Conference, and others; she was the keynote speaker at a conference on Data Quality in South Africa. She has been involved in strategic data management projects in both Fortune 500 companies and government agencies, and her expertise includes specialized skills such as data profiling and semantic data integration. She is the author of three books including Business Metadata (2007) and over 40 articles and technical white papers.

  • David Hay

    First, I definitely like the message Ms. O’Neil is conveying here. Data modeling is not such a burden that it should interfere with “agile” development. And it is true that there are some basic structures, mastery of which will make the developer’s job much easier.

    My only problem is with the structures she showed.

    Now, in fairness, my problem is not with her as much as with Len Silverston, whose book she apparently read. He has always contended that you could simply hang “Party Role” off “Party”. My problem with that has always been that, by definition, while all roles are typically played by a Party (Person or Organization), the term is not meaningful by itself. The role must be with respect to something: “Project Role”, “Contract Role”, “Geographic Role (Jurisdiction)”, etc.

    In Figure 6, she has each “Party Role” being played for one or more “Orders”. It should have been that each “Order Role” must be played by one and only one Party and played for one and only one Order. You can still have that each Order Role must be of one and only one Order Role Type, like “customer”, “vendor”, “shipper”, etc.

    Note that in fact each order should be for more than one Product, which would be addressed by making the Order composed of one or more Line Items, each of which is for one and only one Product.

    Now, if she’d read my book (“Enterprise Model Patterns: Describing the World”), she would have known that. :)
