The preceding article in this series looks at some real-world case studies suggesting that starting with an enterprise-wide, top-down model can, at least under certain circumstances, deliver greater value than beginning by fighting your way through the excruciating detail forced on you by bottom-up modeling.
This follow-on article identifies several scenarios where top-down modelling is a winner, and explains why some old-school data modellers resist it even though this quality-at-speed approach is exactly what business people are demanding. If these modellers can't or won't engage with the business to deliver value in a timely manner, at best they may be undervalued, and at worst shunned.
This article, together with the preceding article, forms a paper that is the first in a trilogy, arguing "why" top-down models should be considered. A second, companion set of materials articulates "how" to develop such models. But any model is useless shelfware if it's not applied. One of the most common applications of these models is the design of a Data Warehouse, and this is the topic for the third member of the trilogy, which uses Data Vault as the platform for discussion.
However, it is important to note that top-down models can deliver tangible business value across a variety of other scenarios, such as the formulation of enterprise strategies, the delivery of MDM solutions, the design of a service-oriented architecture, and more. Those interested only in Data Vaults will hopefully enjoy the entire trilogy; others are encouraged to read and apply the first two parts, and then perhaps add their own paper on how they applied top-down models to their particular area of interest.
When & Why to Start With Top-Down Modeling
Some Common Catalysts
There are plenty of scenarios where the business wants a high-level, not-too-technical data model of their organization. And they want it soon, not next year.
- Sometimes they want an IT strategy, and enterprise data models (as-is and to-be) are foundational pieces of the jigsaw puzzle.
- Similarly, maybe they want an overall business strategy, and an IT strategy as described above is but one component, albeit an important one.
- Maybe you want to implement a business rules engine, but it needs to express the rules using business terms. That’s where an enterprise high-level model comes in to help define a glossary of business terms.
- If you’ve got an Enterprise Service Bus (ESB) initiative underway, the data payloads of services can be shaped by considering the overall enterprise.
All of those have their challenges, but if you’ve got corporate mergers going on, the problems are magnified. I worked with one organisation that was formed by the merging of 83 autonomous agencies on the evening of June 30th one year. They all knew it was coming, but the resultant single organisation formed on July 1 had some serious data integration issues.
On a much smaller scale, an organisation with a few hundred workers wanted some “pain point” workshops. As a foundation for discussion using agreed, common terms, we assembled a top-down big picture enterprise data model, involving business and data professionals, in just two days.
The list of possible catalysts for you wanting a top-down big-picture model goes on and on. You can probably articulate your own drivers fairly quickly. But there is one theme I see again and again where top-down enterprise data modelling should be the starting point: enterprise data warehousing.
Data Warehousing
For those not familiar with Data Vaults, they can be seen as an evolutionary advance on earlier data warehousing approaches. The relative merits of dimensional data marts as put forward by Ralph Kimball, the so-called "third normal form" (3NF) approach initially described by Bill Inmon, and the Data Vault founded by Dan Linstedt are too big a topic to cover here. For the purposes of looking at the role of top-down big-picture modeling in the data warehouse space, I am restricting myself to the application of enterprise thinking to Data Vault.
Dan Linstedt was very direct in a blog comment on his website in late 2016. He said that an ontology built at the enterprise level was a “very, very important asset to the corporation”, and that practitioners “… must focus on ontologies [when] building the Data Vault solution …”. In the same blog, he states that “Data Vault modeling was, is and always will be about the business”, and is then very direct, claiming that “… if the Data Vault you have in place today is not currently about the business, then unfortunately you’ve hired the wrong people.” Ouch. Truth hurts?
If ontologies are so important in the Data Vault world, what the heck is an ontology? The Oxford dictionary gives two definitions. The first is its traditional, historical meaning: "a branch of metaphysics dealing with the nature of being." I'm not sure about you, but the first time I read that, I didn't feel overly enlightened. Thankfully, the second definition is closer to what we need in a Data Vault design. An ontology is "a set of concepts and categories in a subject area or domain that shows their properties and the relations between them."
Apologies to the Oxford Dictionary folk, but I am going to do a simple mapping of their second definition to Data Vault. Let's assume that we've talked to the business and come up with a conceptual data model that has entities representing the business concepts, major attributes (also known as properties) for each of the entities, and relationships between the entities, all captured in an entity-relationship diagram (ERD) notation. It looks like we've got ourselves an ontology. Too easy!
Now all we need to do is to map the newly minted ontology to a Data Vault design. We can take the conceptual model’s bits and pieces, and use them to map the concepts to hubs, the relations to links, and the properties/attributes to satellites. Job done.
That’s a vastly over-simplified methodology, based on Dan’s highlighting of the need for ontologies.
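To make that mapping concrete, here is a minimal sketch in Python, purely for illustration. The entities, attributes, and naming conventions (HUB_, LNK_, SAT_ prefixes) are invented for the example and are assumptions on my part, not a prescribed Data Vault standard:

```python
# A toy mapping from a conceptual model (ontology) to candidate Data Vault
# structures. Entity and attribute names are invented for the example;
# a real project validates every step with the business.

concepts = {
    "Customer": ["Customer Number", "Customer Name", "Date of Birth"],
    "Product": ["Product Code", "Product Description"],
}
relationships = [("Customer", "places order for", "Product")]

# Concepts become candidate hubs (keyed on business keys, not surrogates).
hubs = {name: f"HUB_{name.upper()}" for name in concepts}

# Relationships between concepts become candidate links.
links = [f"LNK_{a.upper()}_{b.upper()}" for (a, _, b) in relationships]

# Descriptive properties/attributes hang off their concept as satellites.
satellites = {f"SAT_{name.upper()}_DETAILS": attrs for name, attrs in concepts.items()}

print(hubs)        # {'Customer': 'HUB_CUSTOMER', 'Product': 'HUB_PRODUCT'}
print(links)       # ['LNK_CUSTOMER_PRODUCT']
print(satellites)  # {'SAT_CUSTOMER_DETAILS': [...], 'SAT_PRODUCT_DETAILS': [...]}
```

Of course, real designs need decisions about business keys, same-as links, multiple satellites per hub, and so on; the sketch only shows the shape of the concept-to-hub, relation-to-link, property-to-satellite thinking.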
I was in a phone hook-up with Mike Magalsky from Micron, a significant American manufacturer. The volume of the loads to their Data Vault is impressive – billions of rows a day. Mike claimed that one of the major reasons for the success of their Data Vault implementation was their use of a conceptual framework delivered with a few days of effort at the project’s outset. This framework guided their implementation, and proved to be particularly robust, requiring little change to the resultant core Data Vault hubs.
Why Some Practitioners Avoid Top-Down Modeling
I was recently having a coffee with some great people that inspire me (thanks for your friendship, Rob, Emma, and Natalia). The question was asked: “How come good practice recommends the creation and application of high-level data models that reflect business concepts, yet so many organizations don’t follow the recommendations?”
By way of an answer, I really don't want to be confrontational, but I think we may need to take a good look at how some "data" people are behaving.
Institutes of Learning: Data Modeling
Graeme Simsion and Graham Witt’s book, Data Modeling Essentials, talks about the merits of developing top-down data model diagrams. But they go further. My interpretation of what they’re saying goes something like this. Normalization is taught mechanistically. Get a source into first normal form, then second, then third. Job done (assuming the student exercise has no need for higher levels of normalization).
The student now thinks he or she has done database design: find the raw sources, normalize them, consolidate the results, and then generate the physical data model.
These authors note that experienced modelers do it differently. They actually talk to the clients to get a business-centered view, rather than sitting in technical isolation and constructing a bottom-up, technology-driven database design. The seasoned modelers may not even consciously think about normalization; they collect business concepts, attributes, and relationships, and a design follows. If they think about normalization at all, it's an afterthought used to cross-check their top-down approach.
More Theory on Approaches: Fact-Based Modeling
Fact-based modeling and its variants have been around for a long time. Maybe it’s not as well-known as traditional relational modelling, but it isn’t going away. I don’t want to get into a debate on the relative merits of its notation. Rather, I want to make a passing observation on how it seems to be taught, at least by some.
Wikipedia's notes on Object-Role Modeling (ORM) describe how you start with examples at the instance level. You observe that "John Smith was hired on 5 January 1995" and that "Mary Joe was hired on 3 March 2010." From such observations, you spot the pattern and record a fact type such as "Person was hired on Date."
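A tiny sketch may help illustrate the instance-to-fact-type idea (this is my own toy illustration in Python, not ORM tooling or notation; the names are taken from the example above):

```python
from dataclasses import dataclass
from datetime import date

# Instance-level observations, as they might be gathered from the business.
@dataclass
class HiredFact:
    person: str     # e.g. "John Smith"
    hired_on: date  # e.g. 5 January 1995

observed_facts = [
    HiredFact("John Smith", date(1995, 1, 5)),
    HiredFact("Mary Joe", date(2010, 3, 3)),
]

# The generalisation the modeller reads off the samples: a fact type.
fact_type = "Person was hired on Date"

# The catch: the fact type is only as sound as the samples it was induced from.
```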
Such an approach, reinforced by what has been communicated to me by some fact-based enthusiasts, raises many concerns for me in the context of bottom-up modeling dangers.
- Let’s recall the example I’ve already shared where the developer for a school system looked at some sample data and concluded that all family members shared one Family Name and one Family Address. The fact-based approach may be at risk if there are not enough samples.
- To counter this scope issue, an analyst may collect a vast array of samples. This has its own problems: too many samples can make the fact-gathering effort laborious, and make it hard to spot the patterns in the mass of "facts". This can make the approach inefficient.
- Based on what I've been told by some about fact-based modeling methodologies, the samples provided are likely to reflect today's facts, not tomorrow's. There seems to be no reason why scenarios for the future couldn't be provided, but that doesn't appear to be how it is commonly taught.
- There seems to be an underlying assumption that the detection of the generalised patterns is deterministic. One set of samples must result in one set of “fact type” generalisations, or so the theory seems to go. In his book, Data Modeling Theory and Practice, Graeme Simsion exposes the flawed and dangerous thinking that one problem can and must have one solution, and any other design therefore must be wrong.
All of this seems to stack up as a bottom-up approach. Yes, you may eventually get to a definition of business concepts and relationships, but it’s a heck of a long journey, with uncertain results. Before the fact-based enthusiasts react, let me possibly save them the effort. I have aired this list with a good friend, Graeme Port, who is well-versed in fact-based modelling. He agrees with the dangers, and concedes that there are those who follow these risky approaches.
In contrast, he has the maturity and breadth of experience to do fact-based modeling refreshingly differently. I suggest that the dangers I see may fundamentally lie in how some teach and practice fact-based modeling, rather than being inherent problems with the modeling notation itself. In fact, a top-down model can actually be captured using this notation, and with the right tooling, can generate more traditional models.
Bottom-Up in the Data Vault World
Some of you may have seen the Data Vault tool vendor demonstrations. Quite a number of vendors point their tool at a raw source, press what I call “the big green ‘Go’ button”, and as if by magic, a Data Vault appears. It’s bottom-up, source-centric thinking. And it’s a big problem.
Let me be clear: the tools are fine. The problem is the way they are shown creating what Dan Linstedt, the founder of Data Vault, opposes and calls a "source system Data Vault", where the Data Vault is seen as being all about bottom-up creation, based on how source systems see their entities and their keys. Such demonstrations sometimes assume a source table's primary key is a business key, even if it's a meaningless surrogate, and the tool may name the generated hub by simply sticking the word "HUB" on the front or back of the table name, instead of giving it a meaningful business name.
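The following sketch contrasts the two mindsets. It is illustrative only; the table name, column names, and hub naming are invented for the example and do not represent any particular vendor's tool:

```python
# Illustrative only: naive, source-driven hub generation versus a
# business-driven hub. All names here are made up for the example.
source_table = {
    "name": "T_CUST_MSTR",
    "primary_key": "CUST_SK",       # a meaningless surrogate key
    "business_key": "CUSTOMER_NO",  # the key the business actually uses
}

# "Big green Go button" style: take the table name and its primary key as given.
naive_hub = {
    "hub_name": source_table["name"] + "_HUB",  # e.g. T_CUST_MSTR_HUB
    "hub_key": source_table["primary_key"],     # the surrogate, not a business key
}

# Business-driven alternative: name the hub after the business concept and
# key it on the business key agreed with the business.
business_hub = {
    "hub_name": "HUB_CUSTOMER",
    "hub_key": source_table["business_key"],
}

print(naive_hub)
print(business_hub)
```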
Unfortunately, it’s not only some tool vendors that imply a bottom-up approach as the default way to build a Data Vault; some consultants carry the same message.
Thankfully there are those who present a different approach. Last year I attended the World Wide Data Vault Consortium conference in the US. Paul Watson-Gover presented a delightful analogy: if a customer wants a car and he delivers a container of car parts, the customer won't be happy. Likewise, if the business customer wants a Data Vault with business data hanging off business concepts identified by business keys, don't think that delivering source-centric raw Data Vault components means you've done your job.
If People Believe Top-Down Modeling Can’t Be Done in a Timely Manner, Why Would They Even Try?
Even if lots of “data” people have been taught to do bottom-up modelling, some will have understood the desirability of doing top-down modelling in certain cases. Now here’s the really sad bit. They may believe in its merits, but either think it will take too long to do an enterprise-wide top-down model, or simply don’t know where to start.
Which leads me to the next topic.
Patterns: The Secret to “Fast-&-Good”
One Australian telecommunications company put in an initial investment of 5 years to build an enterprise data model. Some years later, an ex-employee was at another telco. They wanted something similar, but in time for the start-up of three projects with data interdependencies – in three weeks’ time! Can it be done?
Martin Fowler, in Analysis Patterns, says the creation of an elegant model pays you back through a simpler build and easier maintenance, but he notes (emphasis mine) that it can take time to develop this elegant model:
“People often react to a simple model by saying ‘Oh yes, that’s obvious’ and thinking ‘So why did it take so long to come up with it?’ But simple models are always worth the effort. Not only do they make things easier to build, but more importantly they make them easier to maintain and extend in the future.”
So elegant models are better in the long term. But are they useful in the short term? In the words of David Hay (again with my emphasis),
“… using simpler and more generic models, we will find they stand the test of time better, are cheaper to implement and maintain, and often cater to changes in the business not known about initially.”
Thankfully, David Hay, Len Silverston, and others have already given us libraries of reusable, extensible data model patterns. Not only do these authors provide kick-start patterns for use in almost any situation (Len calls them "universal" data model patterns); some patterns are pre-packaged for specific industries. To deliver to the telco in the three weeks demanded, I leaned hard on Len's patterns for the telecommunications industry, and delivered a "sufficient" model on time.
Full Circle on Top-Down vs Bottom-Up Debate
As an example, suppose we intend to run three agile projects with data interdependencies; the diagram below portrays a top-down approach using patterns. We could start with a library of proven patterns, and then quickly assemble a pattern-based model whose scope reflects the entire enterprise (or at least enough to span the affected projects). From there we seed the agile projects with their own kick-start models that are a consistent subset of the larger picture.
That’s top-down use of patterns. But let’s turn this on its head. We’ve still got the same three agile projects to run, and they still have data interdependencies. We could start each discrete project by forming a kick-start Sprint Zero model from patterns. Later we can pull together the discrete project models and aggregate them to form at least the start of an enterprise data model.
We may be forced into such an approach if the projects begin months apart, and when the first project kicks off, there is no vision for ever having the second or third project. Here’s the magic. If each project works somewhat independently but uses the same library of patterns, we can get what I call “accidental integration”. We didn’t set out to have an enterprise data model as one of the responsibilities for the agile projects. But by building each with common patterns, the chance of integration is improved.
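A small sketch may make the idea of "accidental integration" more tangible. It is a toy illustration only, assuming a hypothetical shared Party pattern and two invented projects; real pattern libraries are, of course, far richer:

```python
# Two projects that never talk to each other both instantiate the same shared
# pattern, so their models line up when brought together later.
PARTY_PATTERN = {
    "Party": ["Party Id", "Party Name"],
    "Party Role": ["Role Type", "Effective Date"],
}

def seed_project_model(project_specific_roles):
    """Start a project model from the shared pattern, then specialise it."""
    model = {entity: list(attrs) for entity, attrs in PARTY_PATTERN.items()}
    model["Party Role"] = model["Party Role"] + project_specific_roles
    return model

crm_project = seed_project_model(["Customer Segment"])
billing_project = seed_project_model(["Billing Account Status"])

# Both models share Party and Party Role, so merging them towards an
# enterprise view is far easier than reconciling two unrelated designs.
shared_entities = set(crm_project) & set(billing_project)
print(shared_entities)  # {'Party', 'Party Role'}
```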
Based on this scenario, it looks like we can start top-down or bottom-up and reach the same outcome. Isn't that exactly the opposite of what I have been saying throughout this paper? Yes, for this particular scenario the case for top-down may be weaker. But this paper is suggesting that there are still a number of situations where a top-down approach is preferable.
Where To From Here
This paper has looked at why you may sometimes want to start top-down.
One companion set of material provides guidelines for how you can quickly develop a top-down framework that may be sufficient. Even if your situation looks more like the bottom-up scenario above, the companion materials may prove helpful as they include a light-weight form of common patterns.
Another companion set of material is intended to be a collection of tips-&-techniques for applying top-down big-picture models to deliver tangible benefits to the business. I am hoping that I, or others, will progressively provide helpful hints for applying top-down models to:
- facilitate the shaping of a Data Vault design,
- develop IT and business strategies,
- identify services and their data payloads for service-oriented architecture initiatives,
- seed glossaries that underpin business rules,
- manage master and reference data,
- govern enterprise data, and
- last but not least, integrate individual agile projects with corporate data objectives (the latter is addressed in my book, "The Nimble Elephant: Agile Delivery of Data Models Using a Pattern-based Approach").
Even if your motivation for pursuing top-down big-picture modelling is only one of the above reasons, that's enough. How much stronger is the case if you are involved in two or more such initiatives? The multiplier effect kicks in as the investment in one area contributes to another related area. For example, if several agile projects produce data that will be required in a Data Vault, why not base all such projects on a common data architecture? Everybody wins.
Finally, I end with a quote from “Design Patterns” by Gamma, Helm, Johnson & Vlissides that I feel carries the message that top-down modelling delivers greater flexibility for the future than the strict disciplines of bottom-up modelling of today’s perceived realities:
“Strict modelling of the real world leads to a system that reflects today’s realities but not necessarily tomorrow’s. The abstractions that emerge during design are key to making a design flexible.”
This paper expresses one perspective. I hope it has proven to be helpful to you. But I do sincerely encourage you to continue the conversation by offering your own experiences and views. Let the conversation begin!
Acknowledgments
A number of the ideas in this paper have resulted from interaction over the years with many great people. A recent "thought leader" who has sharpened the ideas is Rob Barnard, and it is with gratitude that I acknowledge his contribution.