Why Data Models Cannot Work

A data model represents things of significance to an enterprise and their interrelationships. It does this in such a way that characteristics of these things can be identified and understood as discrete facts. Data is a stored representation of these facts. Thus, a data model provides an organized way of cataloging the things of significance to an enterprise in terms of the information that is recorded about them. Additionally, and very practically, a data model can specify a design for a database to hold this data.

Yet, can data models really do all of this? Do they tell us the truth, the whole truth and nothing but the truth about what it is they are purporting to describe? Or do data models have inherent limits that prevent them from meeting our expectations?

This is not the same thing as asking if data models can be done badly. There is no doubt that they can be, but this is more a reflection on data modelers than data models. What we are asking is whether data models that are done as well as they can be tell us everything we need to know about the structure of the data we are managing.

The Yardstick for Judging a Data Model

If a data model does represent characteristics of things that can be stored as facts in a database, then the data model should tell us as much (or maybe more) about these facts as we can find out from looking at the corresponding database. In other words, we should not discover more about the characteristics of the things of significance to an enterprise by looking at a physical database than we can by looking at the corresponding data model. A database is thus the yardstick for judging the data model from which it is built. Where is the “truth” – is it in the data model, or the database, or both?

Let us try an exercise by taking a very simple example of a data model that has a single entity and a database built from it that consists of a single table. The entity is Financial Instrument, which can be defined as an obligation issued by a third party that conveys an interest in ownership, debt or other thing of value. Common examples would be stocks, bonds and options. Figure 1 shows the structure of this entity taken from the data model.

Figure 1: The Financial Instrument Entity from the Data Model

We now build a database table from the data model, and eventually this is populated with five records as shown in Figure 2.

 

Figure 2: The Financial Instrument Table in the Database

What Does the Data Model Tell Us?

If we are to ask what a data model is telling us, then we should have some expectation of the way in which it will provide answers.

I suggest that the best way to understand a data model is to find what propositions it is stating. Propositions are statements that can be judged as true or false, or indeterminate if they cannot be decided upon. They are a major component of traditional logic, which provides a set of rules for stating and inspecting propositions that are extremely useful in data analysis.

We can look at both the data model and the database to see what propositions can be mined from each source. Figure 3 shows the propositions that can be extracted from the data model. It also shows whether each of these propositions is true or false in absolute terms, that is, in what is generally termed the “real world” as opposed to the context of the data model or database. The reasons for judging certain propositions as false or indeterminate are given in Figure 3.

 

Figure 3: Propositions Extracted from the Data Model

Figure 4 shows propositions that can be extracted from the database table shown in Figure 2. It is possible that even more propositions could be extracted.

 

Figure 4: Propositions Extracted from the Database

Three things are immediately apparent:

  • Far more propositions can be extracted from the database (17) than from the data model (7).

  • Only three propositions (P1, P3 and P4) can be extracted from both the data model and the database.

  • Some of the propositions extracted from the data model are false, and a subset of these (P2, P5, P6 and P7) can be proven false by the database.

Thus, the database provides more information about the data being managed than the data model does. Furthermore, the database can contradict the data model on certain points, and on these points the database is right and the data model is wrong.

So the data model tells us only some of the things that are true. It does not tell us everything that is true, and it tells us some falsehoods that the very database it specifies can disprove.
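The counts quoted above can be checked with a little set arithmetic. The following is only a sketch: it uses the proposition labels and totals stated in the text, not the contents of Figures 3 and 4, and the variable names are invented for illustration.

```python
# Labels and totals as quoted in the text; the figures themselves are not reproduced.
model = {f"P{i}" for i in range(1, 8)}      # the 7 propositions from the data model
shared = {"P1", "P3", "P4"}                 # extractable from both sources
contradicted = {"P2", "P5", "P6", "P7"}     # provably false in the database
DATABASE_TOTAL = 17                         # propositions extracted from the database

# Propositions that only the database yields.
database_only = DATABASE_TOTAL - len(shared)

# Model propositions that are neither shared nor contradicted.
unaccounted = model - shared - contradicted

print(database_only)  # 14
print(unaccounted)    # set()
```

On these numbers, every proposition taken from the data model is either confirmed or contradicted by the database, while 14 of the database's propositions have no counterpart in the model at all.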

 

Figure 5: Analysis of Propositions that are False or Indeterminate

Analysis of Data Model Propositions

Now let us look a little further into the propositions that have been extracted from the data model to see what we can understand in general from them.

1.  Every data model expresses propositions about each entity it contains.

We are only examining what can be said of a single entity here. Obviously, data models also express propositions about combinations of entities and relationships.

2.  Every such proposition has the entity as the basis of the subject.

E.g., Every Financial Instrument has a Financial Instrument Description.

Here the entity is the subject. We may be able to phrase the proposition in some other way, such as “A Financial Instrument Description is a necessary attribute of a Financial Instrument.” However, we can always convert such propositions back to the form where the entity appears as the subject.

3.  Every such proposition has an attribute as the basis of the predicate.

E.g., Every Financial Instrument has a Financial Instrument Description.

4.  Every such proposition is universal.

E.g., Every Financial Instrument has a Financial Instrument Description.

This applies to all instances of the entity Financial Instrument. A universal proposition is one that is asserted of every instance of the subject, without exception. There are also particular propositions, which apply to only some instances, and singular propositions, which apply to only one instance. We never find particular or singular propositions in data models.

5.  Every such proposition is affirmative.

E.g., Every Financial Instrument has a Financial Instrument Description.

Nothing is denied, or stated negatively. Propositions can be affirmative or negative. We never find negative propositions expressed in data models.

6.  The only terms that can appear in such propositions are the entity and the attributes defined for the entity.

7.  A data model cannot distinguish attributes that are characteristics of the entity being modeled from attributes that represent other entities.

E.g., Record Number is not an attribute of a Financial Instrument. It is an attribute of a Record.

In general, the other entities seem to be related to metadata.
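The fixed form of these propositions can be illustrated with a short sketch. It assumes a hypothetical `Entity` class and `propositions` function, neither of which comes from any modeling tool, and it treats every attribute named in this article the same way. Whether Figure 1 marks each attribute as mandatory is not known here, so the output shows only the shape of the propositions, not the actual contents of Figure 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a single entity in a data model."""
    name: str
    attributes: tuple[str, ...]


def propositions(entity: Entity) -> list[str]:
    """Return one universal, affirmative proposition per attribute.

    The entity is always the subject and the attribute always the
    predicate; the only terms available are the entity and attribute names.
    """
    return [f"Every {entity.name} has a {attr}." for attr in entity.attributes]


# Attribute names are the ones mentioned in the text.
financial_instrument = Entity(
    name="Financial Instrument",
    attributes=(
        "Record Number",
        "CUSIP",
        "Financial Instrument Type",
        "Financial Instrument Description",
        "Coupon",
        "Maturity Date",
    ),
)

for proposition in propositions(financial_instrument):
    print(proposition)
```

Nothing in this structure offers a place to say "some", "no" or "not", and nothing distinguishes Record Number from the attributes that genuinely describe a Financial Instrument.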

When we look at the database as well as the data model, we can see that there are additional issues.

8.  No data model can express a particular proposition about an entity and its attributes.

E.g., Some Financial Instruments have a Maturity Date.

This proposition cannot be expressed in a data model, but it is expressed in the database table. Thus, the data model is not capable of describing how data is actually structured in the database.

9.  No data model can express a negative proposition about an entity and its attributes.

E.g., Some Financial Instruments do not have a Coupon.

This proposition is expressed by records 1 and 2 of the database table. Again, therefore, the data model is not capable of describing how data is actually structured in the database.

10.  No data model can use terms other than entities and attributes to express propositions.

E.g., Only Financial Instrument, CUSIP, Financial Instrument Type, Financial Instrument Description, Coupon and Maturity Date are available in the above data model.

Yet we need to use the terms “Equity,” “Bond” and “Option” to form certain propositions, such as:

No Financial Instrument that is an Equity has a Maturity Date.

These terms occur only as physical data values in the column INST_TYPE. They do not exist in the data model, yet we need them to express propositions like the one above.
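The kinds of proposition discussed in points 8 to 10 can be tested directly against table rows. The sketch below uses invented rows, not the contents of Figure 2. Only the column name INST_TYPE and the values "Equity", "Bond" and "Option" come from the text; the other column names, the coupon values and the dates are assumptions made for illustration.

```python
from datetime import date

# Hypothetical rows, NOT the records shown in Figure 2.
# COUPON and MATURITY_DATE are assumed column names.
rows = [
    {"INST_TYPE": "Equity", "COUPON": None, "MATURITY_DATE": None},
    {"INST_TYPE": "Equity", "COUPON": None, "MATURITY_DATE": None},
    {"INST_TYPE": "Bond", "COUPON": 5.0, "MATURITY_DATE": date(2030, 1, 15)},
    {"INST_TYPE": "Bond", "COUPON": 4.25, "MATURITY_DATE": date(2028, 6, 30)},
    {"INST_TYPE": "Option", "COUPON": None, "MATURITY_DATE": date(2026, 3, 20)},
]


def has(row: dict, column: str) -> bool:
    """A record 'has' an attribute when the column holds a value."""
    return row[column] is not None


# Universal affirmative: "Every Financial Instrument has a Maturity Date."
every_has_maturity = all(has(r, "MATURITY_DATE") for r in rows)

# Particular affirmative: "Some Financial Instruments have a Maturity Date."
some_have_maturity = any(has(r, "MATURITY_DATE") for r in rows)

# Particular negative: "Some Financial Instruments do not have a Coupon."
some_lack_coupon = any(not has(r, "COUPON") for r in rows)

# Universal negative that needs the data value "Equity" as a term:
# "No Financial Instrument that is an Equity has a Maturity Date."
no_equity_has_maturity = not any(
    r["INST_TYPE"] == "Equity" and has(r, "MATURITY_DATE") for r in rows
)

print(every_has_maturity, some_have_maturity, some_lack_coupon, no_equity_has_maturity)
```

For these invented rows the universal affirmative proposition is false while the other three are true. The last check cannot even be written without the literal "Equity", which exists only as a data value.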

Conclusion

The above analysis shows that data models have a structure that only allows us to express certain kinds of knowledge about an individual entity. Only universal, affirmative propositions that use the terms corresponding to the entity and its attributes can be expressed. Yet the underlying database can express much more knowledge about the way it is organizing its information, including particular and negative propositions. The structure of a data model can force it to misrepresent and ignore truths that are present in a database. Individuals who have to work with a database, be they business users or IT staff, need to fully understand databases. They will not be able to do so from data models.

Data models have some advantages and can be very useful. However, we cannot claim that they are the sum total of knowledge about databases. Enterprise-level information knowledge management will never be attained by producing a comprehensive set of data models.
