One of the most illuminating lessons I’ve learned about data modeling came from Len Silverston’s universal data models, which introduced what is now known as the party model. In this model, a party is a person or an organization. The model teaches us, among other things, that “employee”, “customer”, “account holder”, etc., are not types of persons, but rather roles that persons play. See Figure 1. The party model is essential to dealing with overlapping sets of parties that show up in different databases playing different roles. If we collapse identity and role, we can’t handle the fact that the same party might simultaneously be, for instance, a customer, an account holder, a beneficiary, and an employee. We end up with duplicate party identities, and consistent operations and reporting become almost impossible.
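To make the separation of identity and role concrete, here is a minimal Python sketch. It is not Silverston's model itself; the names `Party`, `PartyKind`, `Role`, and the sample party are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class PartyKind(Enum):
    """What a party is (its type). This does not change over time."""
    PERSON = "person"
    ORGANIZATION = "organization"


class Role(Enum):
    """What a party does (its roles). Hypothetical list for illustration."""
    CUSTOMER = "customer"
    ACCOUNT_HOLDER = "account holder"
    BENEFICIARY = "beneficiary"
    EMPLOYEE = "employee"


@dataclass
class Party:
    """One identity per party, no matter how many roles it plays."""
    party_id: int
    kind: PartyKind
    roles: Set[Role] = field(default_factory=set)


# A single party record carries several roles at once.
pat = Party(party_id=1, kind=PartyKind.PERSON)
pat.roles.update({Role.CUSTOMER, Role.ACCOUNT_HOLDER, Role.EMPLOYEE})

# A role can end without touching the party's identity or type.
pat.roles.discard(Role.EMPLOYEE)
```

Because the roles hang off a single `Party` record, there is no need for a second "employee" record and a third "customer" record describing the same person.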
Roles Data Plays
It turns out that the separation of type and role applies universally, not just to parties. Even data itself plays roles. Consider names. We have names of people: nicknames, legal names, former names (such as maiden names), alias names, stage names, pen names. We have names of organizations: legal names, common names, “doing business as” names. We have names of countries: countries’ names in the countries’ official languages (and some countries have multiple official languages), names in English and French (as codified by ISO), common names, former names. Such names are interesting because they overlap so much. For example, “Gary” is a personal name and the name of a city (in the US state of Indiana). “Charlotte” is another personal name that is also a city name (in the US state of North Carolina). “Apple” is the name of a type of fruit and also the name of a company that makes computers. As a result, you can’t glance at a name in isolation and always deduce what role it plays.
How can we tell the difference between a type of name and a role a name plays? We can start by thinking about what a type is. A type is defined as that which designates a set. A major use for a type is to limit the values a variable can take on to only those values in the set that the type designates. This is very helpful in ensuring data quality. For example, we can count on a database field with the type date to contain only a valid date, and a field with the type Boolean to contain either the value false or the value true, and nothing else.
Here’s our first challenge. Looking at the various examples above, there is no clear separation between, for example, the set of personal names, the set of city names, the set of company names, and even the set of fruit names! So it becomes just about impossible to use a type to guarantee, for instance, that a personal name field is not accidentally populated with the name of a fruit.
The best we can do is to have a generic name type that restricts a database field to contain only a string of human-readable text: no control characters and no strings of special characters other than those used as punctuation. In other words, we define a name type as a restriction on a string type. (Those who have taken my NoSQL and SQL Data Modeling course know that a restriction on a type specifies a subtype.) This will eliminate obvious data quality problems; for instance, gibberish can’t be entered in a name field, nor can something purely numeric.
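As a rough sketch of a name type defined as a restriction on a string type, the Python below subclasses `str` and rejects obviously bad values. The class name `Name` and the specific checks are assumptions; in particular, the rule "must contain at least one letter" is only a simplified stand-in for "no strings of special characters".

```python
import unicodedata


class Name(str):
    """A hypothetical generic name type: a restriction on str."""

    def __new__(cls, value: str) -> "Name":
        text = value.strip()
        if not text:
            raise ValueError("a name must not be empty")
        # Unicode general categories starting with "C" cover control,
        # format, surrogate, private-use, and unassigned characters.
        if any(unicodedata.category(ch).startswith("C") for ch in text):
            raise ValueError("a name must not contain control characters")
        # Rejects purely numeric values and runs of punctuation.
        if not any(ch.isalpha() for ch in text):
            raise ValueError("a name must contain at least one letter")
        return super().__new__(cls, text)


def is_valid_name(value: str) -> bool:
    """Return True if value is acceptable to the Name type."""
    try:
        Name(value)
        return True
    except ValueError:
        return False
```

Note what this type cannot do: `Name("Apple")` is accepted whether the field is meant to hold a fruit, a company, or a person. The type screens out gibberish, but it says nothing about how the name is used.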
The distinctions listed above between various names are, then, not distinctions of type but rather distinctions of role: not distinctions between what sets of values names are drawn from, but rather distinctions between how names are used. Distinctions of role appear in a data model in the names of data attributes. Conventional data attribute names begin with the name of the entity type to which the data attribute belongs, followed by words qualifying the data attribute, and ending with a classword that implies the data attribute type. Role indications belong in the middle position, as qualifiers. In our case, the classword will be “Name”. Then, we might have:
- Person Given Name
- Person Family Name
- Company Legal Name
- Country Common Name
- City Name
- Fruit Type Name
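The naming convention above can be sketched in Python as follows. Every attribute shares one type, and the role appears only in the attribute name. The `Name` alias, the classes, and the sample values are assumptions for illustration, not part of any particular schema.

```python
from dataclasses import dataclass
from typing import NewType

# One generic name type for every role (hypothetical alias over str).
Name = NewType("Name", str)


@dataclass
class Person:
    # entity type + role qualifier + classword "name"
    person_given_name: Name
    person_family_name: Name


@dataclass
class City:
    city_name: Name


@dataclass
class Company:
    company_legal_name: Name


person = Person(
    person_given_name=Name("Charlotte"),
    person_family_name=Name("Gary"),
)
city = City(city_name=Name("Gary"))

# The same value plays two different roles; only the attribute names differ.
same_value = person.person_family_name == city.city_name
```

Nothing in the type system separates the family name from the city name here. The attribute name is the only place where the role is recorded.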
One thought that helps distinguish between type and role is that roles can come and go, while type remains; also, an entity can play multiple roles at the same time. For example, a person may “become” (start to play the role of) an employee, cease to be an employee, and then become an employee again, all the while remaining an instance of the type “person”. During that same time that same person might begin to play the role of customer, or cease playing that role. Similarly, a single name can play the role of being a person’s given name, a family name, and a city name.
We’ve talked about how persons can play multiple roles, and how names can play multiple roles. Are there other roles that entity types, such as persons, can play, and that data, such as names, can play? Certainly! For example, in any relation/table, a data attribute might play the role of being a key (that is, a primary, alternate, or candidate key), or a component of a key. The same data in another relation/table might just be an ordinary attribute.
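The key role can be illustrated with a small sketch that uses Python's built-in `sqlite3` module. The tables `country` and `shipment` and their columns are hypothetical. The same country code value is a key in one table and an ordinary attribute in the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (
        country_code        TEXT PRIMARY KEY,  -- plays the key role here
        country_common_name TEXT NOT NULL
    );
    CREATE TABLE shipment (
        shipment_id              INTEGER PRIMARY KEY,
        destination_country_code TEXT NOT NULL  -- ordinary attribute here
    );
    """
)

conn.execute("INSERT INTO country VALUES ('FR', 'France')")

# As an ordinary attribute, the same value may repeat freely.
conn.executemany(
    "INSERT INTO shipment (destination_country_code) VALUES (?)",
    [("FR",), ("FR",)],
)

# As a key, the same value must be unique within its table.
try:
    conn.execute("INSERT INTO country VALUES ('FR', 'French Republic')")
    duplicate_key_accepted = True
except sqlite3.IntegrityError:
    duplicate_key_accepted = False

shipment_count = conn.execute("SELECT COUNT(*) FROM shipment").fetchone()[0]
country_count = conn.execute("SELECT COUNT(*) FROM country").fetchone()[0]
```

The data type is `TEXT` in both tables. What differs is the role the column plays, and that role determines which constraints apply.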
So, when analyzing a data problem, we often ask ourselves, “What is the type of this data?” We should also be asking, “What role does this data play?”
The Interaction of Type and Role
I hope I’ve helped you to see the difference between type and role, and the separation between them. Now that I’ve achieved that, I have to show you how roles can in fact determine types. Let’s go back to the roles persons play. I can divide a set of persons into subsets based on the roles they play. See Figure 2. “Person playing Customer Role” is the subset of persons who are playing that role; the same goes for “Person playing Employee Role”. The role a person plays, in effect, puts them into a subtype. The pentagon symbol indicating subtype has no X in it, meaning that the two subtypes are non-exclusive: a person could play both roles at once.
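The idea that a role carves out a non-exclusive subtype can be sketched with plain Python sets. The persons and role assignments below are made-up sample data, and `playing` is a hypothetical helper.

```python
# The full set of persons (the type) and the roles each currently plays.
persons = {"Ann", "Bob", "Cy"}
roles_played = {
    "Ann": {"customer", "employee"},
    "Bob": {"customer"},
    "Cy": set(),
}


def playing(role: str) -> set:
    """Return the subset of persons who currently play the given role."""
    return {p for p in persons if role in roles_played[p]}


customers = playing("customer")
employees = playing("employee")

# Each role-based subset is a subset of the type "person" ...
both_are_subsets = customers <= persons and employees <= persons

# ... and the subsets may overlap, because the subtypes are non-exclusive.
customer_employees = customers & employees
```

An exclusive subtype would require the intersection to be empty. Here it is not, which matches the pentagon without an X in the figure.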
Why Does it Matter?
If one treats a role as a type, the data model and the database it describes may be unable to represent two facts: that an entity or attribute can play multiple roles at the same time, and that roles can come and go while type remains constant. The resulting database can be plagued with multiple identities for the same entity, with needlessly complicated types, or with both.
This monthly blog talks about data architecture and data modeling topics, focusing especially, though not exclusively, on the non-traditional modeling needs of NoSQL databases. The modeling notation I use is the Concept and Object Modeling Notation, or COMN (pronounced “common”), and is fully described in my book, NoSQL and SQL Data Modeling (Technics Publications, 2016).