Traditional entity-relationship (E-R) modeling recognizes three different levels of abstraction at which models are developed:
— Conceptual: A conceptual model should be focused on things related to the business and its requirements.
— Logical: A logical model should be focused on the design of data about those things, but without reference to a particular physical implementation.
— Physical: A physical model should be focused on how the logical data should be represented and stored in a particular physical database.
Three Levels of Business
In parallel, data modelers recognize that models can be developed across three levels in a business:
— Enterprise: An enterprise model should represent those things that are common to every part of the enterprise.
— Domain: A domain model should represent those things that are specific to one part of the enterprise; e.g., a sales domain model, a fabrication domain model, or a finance domain model.
— Application: An application model should represent those things that are specific to a given application or system.
Pairing Up the Levels
Since both of these lists have three items, and both move from a high level to a low level, it is very tempting to pair them up, as follows:
— Enterprise conceptual model
— Domain logical model
— Application physical model
Here’s what’s wrong with this approach:
— There are more things that are common to an entire enterprise than just concepts. For example, messages, files, or documents that are exchanged across the enterprise have physical models that are common to the enterprise.
— A given business domain has more than just logical data tying its applications together. It is likely also tied together by common concepts and common physical models.
— Even an application has concepts and logical data definitions.
In fact, there are four kinds of things with which we must concern ourselves in a data model:
— Real world concepts: Concepts are ideas. We have ideas such as orders, non-tangible products, etc. Categories of things (e.g., person, tangible product) are also ideas.
— Real world objects: We often forget that individual physical things in the real world are very important. For instance, individual customers are important, not just the category of “customer”. Tangible products often must be tracked at an individual level.
— Logical data: By this phrase we mean something conceptual: a design for symbols and values, and for how we will use those symbols and values to represent concepts and objects. In this context, the word “logical” really just means “not physical.” We are better off not deciding too soon how data will be represented physically, and deferring such decisions until our logical data design is complete. (In the world of big data, however, we will probably start with data as physically received and deduce its logical significance.)
— Computer objects: Computer memory is composed of billions of real world objects, each of which can have either of two meaningless physical states at any one time. We choose to organize these objects in such a way that we can attach meaning to the states, by declaring how the meaningless physical states represent our data concepts.
When we relate these four kinds of things to the three business levels of enterprise, domain, and application, we discover first of all that we need all four kinds at all three business levels, but to varying degrees.
— Enterprise model: An enterprise model should model all of the real world concepts and objects that are common to every part of the business. This will most likely include employees and products, and possibly also internal financial information that is reported to the general ledger. To the extent that data, like financial and human resource information, is shared in a particular form across the enterprise, its logical and physical specification should appear in the enterprise model.
— Domain model: Similarly, a domain model should model all of the real world concepts and objects that are specific to that domain and that don’t already appear in the enterprise model. The domain model should also include the logical and physical specifications of data that is shared between applications in the domain and isn’t already in the enterprise model.
— Application model: It’s unlikely, though possible, that a single application will deal with real world concepts and objects that aren’t already modeled in the enterprise and domain models. It is likely that a single application will have data specific to it for which application-specific logical and physical models would be appropriate.
In summary, all four kinds of things—real world concepts, real world objects, logical data, and computer objects—are relevant at all three levels of models, but different subsets of them are relevant to each. This amounts to a four-by-three matrix.
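As a rough illustration, the four-by-three matrix can be sketched in Python. The names `Kind`, `Level`, `matrix`, and `naive_pairing` are hypothetical and exist only for this sketch. Treating "conceptual" as real world concepts, "logical" as logical data, and "physical" as computer objects is a simplifying assumption made here so that the tempting one-to-one pairing can be compared against the full matrix.

```python
from enum import Enum
from itertools import product


class Kind(Enum):
    """The four kinds of things a data model may describe."""
    REAL_WORLD_CONCEPT = "real world concept"
    REAL_WORLD_OBJECT = "real world object"
    LOGICAL_DATA = "logical data"
    COMPUTER_OBJECT = "computer object"


class Level(Enum):
    """The three business levels at which models are developed."""
    ENTERPRISE = "enterprise"
    DOMAIN = "domain"
    APPLICATION = "application"


# Every (level, kind) combination is a cell that a model may need to fill.
matrix = {cell: [] for cell in product(Level, Kind)}

# The tempting one-to-one pairing, under the assumed mapping described above.
naive_pairing = {
    (Level.ENTERPRISE, Kind.REAL_WORLD_CONCEPT),
    (Level.DOMAIN, Kind.LOGICAL_DATA),
    (Level.APPLICATION, Kind.COMPUTER_OBJECT),
}

# Cells the naive pairing leaves without a home.
uncovered = [cell for cell in matrix if cell not in naive_pairing]

print(len(matrix))     # 12
print(len(uncovered))  # 9
```

Under these assumptions, the one-to-one pairing accounts for only three of the twelve cells, which is one way to see why it falls short.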
Let’s use a COMN model to help visualize this. At the top of the figure below we have a (rather incomplete) hypothetical enterprise model, with real world objects of customers, tangible products, employees, buildings, and vehicles. Real world business concepts include order, shipment, payment, and fulfillment. Data relevant to the entire business includes customer IDs and names, product IDs and descriptions, etc. The physical representations of this data as computer objects matter when they appear in messages and documents shared across the enterprise.
The Finance domain model doesn’t add any new real world objects, but it does add concepts such as ledger, balance, journal, post, and close. There is logically defined data related to these concepts, including the general ledger, sub-ledgers, journals, accounts, and other reference data.
When we get down to the journal entry application, we see a new real world object, which is a two-factor authentication chip that journal entry personnel must use to access the application. There’s also a bit of newly defined logical data, which is the raw, non-validated journal data that is entered into the application. The application has to allow a journal entry to exist in non-validated form so that the entry can be validated, and it will post only a validated journal entry. That posting will go into the journal table, and use account and other reference tables.
This graphic is meant to illustrate how four different kinds of entities can appear at the three levels of models. If each of these models were complete, there would be relationship lines between them, labeled with relationship names, roles, and the meaning (predicates) of each relationship. COMN doesn’t force isolation of the different kinds of entities into separate models.
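The journal entry application's rule, that an entry may exist in non-validated form but only a validated entry is posted, can be sketched with two separate types. This is a minimal sketch under stated assumptions, not a description of any real system: the class names, the two fields, the validation checks, and the account IDs are all hypothetical, and a plain list stands in for the journal table.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class RawJournalEntry:
    """Non-validated journal data, exactly as entered (hypothetical fields)."""
    account_id: str
    amount: str


@dataclass(frozen=True)
class ValidatedJournalEntry:
    """A journal entry that has passed validation and may be posted."""
    account_id: str
    amount: Decimal


def validate(raw: RawJournalEntry, accounts: set[str]) -> ValidatedJournalEntry:
    """Check a raw entry against account reference data (illustrative rules)."""
    if raw.account_id not in accounts:
        raise ValueError(f"unknown account: {raw.account_id!r}")
    try:
        amount = Decimal(raw.amount)
    except InvalidOperation:
        raise ValueError(f"amount is not a number: {raw.amount!r}") from None
    if not amount.is_finite():
        raise ValueError(f"amount is not finite: {raw.amount!r}")
    return ValidatedJournalEntry(raw.account_id, amount)


def post(entry: ValidatedJournalEntry, journal: list) -> None:
    """Append an entry to the journal, refusing anything not validated."""
    if not isinstance(entry, ValidatedJournalEntry):
        raise TypeError("only validated journal entries can be posted")
    journal.append(entry)


accounts = {"1000", "2000"}  # stand-in for the account reference table
journal = []                 # stand-in for the journal table

post(validate(RawJournalEntry("1000", "250.00"), accounts), journal)
```

Keeping the raw and validated forms as distinct types mirrors the point that the non-validated data is its own piece of application-specific logical data, separate from what ends up in the journal.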
In summary, don’t confuse three business levels with three data model levels. There are really four kinds of things that can be relevant to a model at any business level.
This monthly blog talks about data architecture and data modeling topics, focusing especially, though not exclusively, on the non-traditional modeling needs of NoSQL databases. The modeling notation I use is the Concept and Object Modeling Notation, or COMN (pronounced “common”), and is fully described in my book, NoSQL and SQL Data Modeling (Technics Publications, 2016). See http://comn.dataversity.net/ for more information.