Published in TDAN.com October 2002
One of the reasons data modelers are tolerated is that they produce a nice wall-size diagram. It contains a lot of information and saves the time of going through pages and pages of
documentation. Data modelers are very often judged by this most visible deliverable. In spite of both the real and the perceived importance of the diagrams, we usually fail miserably here. Most of them
are just a mess of lines and boxes laid out without any rhyme or reason. Instead of being a helping tool, they hinder understanding and turn people away.
The material that follows is devoted to improving the graphical representation through a precise, meaningful layout and a well-defined color scheme.
2.1. LAYOUT: GENERAL PRINCIPLES
The following are some general diagramming rules:
- Aligning edges of the graphical elements.
“Line up … along the same grid line(s) throughout a piece” [Siebert92]
- Using white space to group related graphical elements together.
“Elements that are close together look like they belong together” [Siebert92]
- Making sure the diagram as a whole does not fall over.
The diagram, imagined as a cutout, should be able to stand on its own and not fall on its side.
“Balance: An equal distribution of weight… Because our own balance is so important to us, we feel uncomfortable when we see something that isn’t balanced. (We avoid leaning trees.) The same is
true of a layout. If a layout isn’t balanced, readers feel uneasy – they feel something is wrong with the page” [Siebert92]
2.2. LAYOUT: BASIC ELEMENTS
The entity boxes and relationship lines are the basic graphical elements. One challenge is that the size of an entity does not always correspond to its importance. The box size depends on the
length of the entity name, and the number of associated relationships. For example, a minor, insignificant associative entity might graphically take a lot of space to accommodate its long name and
all the lines hooked to it.
So the position of an entity on the diagram, rather than its size, should carry the meaning. We are used to reading books from left to right, from top to bottom. The diagram should read the same way.
To get that effect, we position a dependent entity to the right of and below its parent entities.
I also tend to show:
- Hierarchies (such as organizational, product, etc.) vertically from top to bottom
- Time-dependency horizontally from left to right.
2.3. LAYOUT: RELATIONSHIPS
Circular references and multiple relationships between two entities should not be allowed [Reingruber94]. So a model with only transactional data has a relatively small number of relationships.
Advanced applications, which store business rules, tend to have a higher number of relationships. Data warehouses have an even higher number, with each fact table connected to each dimension. A
sampling of models showed 2-3 relationship lines connected to each entity for operational databases, and 3-5 for data warehouses. At these densities it becomes difficult to trace the relationships.
Techniques to improve the situation include:
- Follow the ‘no-line crossing’ rule whenever possible.
- Draw straight parallel lines whenever possible.
- Combine relationship lines into beams; a whole beam becomes a visual element instead of individual lines.
2.4. GRAPHICAL PATTERNS: STRAIGHT LINES
Let’s take a look at an example of layered positioning with straight lines (Fig.1). The grandparent is above the parent, which in turn is above the child entity, etc. The most abstract concepts
get to the top, and the most detailed are at the bottom. It was music to my ears when, after only five minutes with the model, my database administrator said: “all the data below the INSPECTION
entity will be uploaded from a handheld device”. He intuitively understood the convention and started to use it right away.
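The layered convention can also be approximated mechanically. The following is a minimal sketch, assuming the model is available as a list of (parent, child) relationship pairs; the function name `assign_layers` and the sample entity names are hypothetical and are not taken from Fig.1.

```python
def assign_layers(relationships):
    """Return {entity: layer}, where layer 0 is the top row of the diagram.

    Each entity is placed one layer below its lowest parent, so every
    relationship line points downward. Circular references are rejected.
    """
    parents = {}
    entities = set()
    for parent, child in relationships:
        parents.setdefault(child, set()).add(parent)
        entities.update((parent, child))

    layers = {}

    def layer_of(entity, visiting=()):
        if entity in layers:
            return layers[entity]
        if entity in visiting:
            raise ValueError(f"circular reference through {entity}")
        depth = 0
        for p in parents.get(entity, ()):
            depth = max(depth, layer_of(p, visiting + (entity,)) + 1)
        layers[entity] = depth
        return depth

    for entity in entities:
        layer_of(entity)
    return layers


# Hypothetical three-level model: the child hangs off both ancestors.
sample = [("GRANDPARENT", "PARENT"), ("PARENT", "CHILD"), ("GRANDPARENT", "CHILD")]
print(assign_layers(sample))
```

Placing an entity below its lowest parent, rather than its nearest one, is a design choice that keeps all lines flowing in one direction; a real modeling tool may lay things out differently.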
2.5. GRAPHICAL PATTERNS: RELATIONSHIP BEAMS
When model complexity increases, it is no longer possible to use just straight lines and there is a need to introduce something different. Fig.2 shows examples of “relationship beams”:
- A main business entity with a number of code (static) entities
- Multiple subtypes
An agent performs a number of activities on behalf of a client, such as making calls, sending letters and e-mails. There is a relationship beam from the agent to three activities, and a similar one from the client.
2.6. GRAPHICAL PATTERNS: DATA WAREHOUSE
In Fig.3 all dimensions are at the top, and the facts are at the bottom. Relationship lines from each dimension are combined into beams, with lines dropping into individual facts. Introducing
beams reduces the clutter, because the number of beams is much smaller than the number of individual lines. It is also much easier to trace beams of lines than each line separately,
especially if a diagram spans multiple pages.
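The reduction in clutter can be put in rough numbers. This is a sketch under two assumptions: every fact table is connected to every dimension, and one beam is drawn per dimension. The function name and the sample counts are illustrative and are not read off Fig.3.

```python
def elements_to_trace(dimensions, facts, use_beams):
    """Count the relationship elements a reader has to follow.

    Without beams, each dimension-to-fact relationship is its own line.
    With beams, all lines leaving one dimension share a single beam.
    """
    if use_beams:
        return dimensions
    return dimensions * facts


# Hypothetical warehouse with 6 dimensions and 4 fact tables.
print(elements_to_trace(6, 4, use_beams=False))  # 24 separate lines
print(elements_to_trace(6, 4, use_beams=True))   # 6 beams
```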
3. COLOR SCHEMES
Color augments a flat diagram with a third dimension. There are a few common-sense rules:
- Use pastel “see-through” colors. Make sure that the majority of people can differentiate each color.
- Limit the number of different colors. I saw some good quality diagrams with 8-10 colors. But the usual recommendation is to stick with 3-4.
- Define and use a consistent color scheme
- Always include a color legend on the diagram
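These rules can be checked before a scheme is applied to a diagram. The following is a minimal sketch, assuming a scheme is kept as a mapping from category name to fill color; the function name `build_legend`, the category names, and the hex values are all hypothetical.

```python
def build_legend(scheme, max_colors=4):
    """Validate a {category: color} scheme and return its legend lines.

    max_colors defaults to 4, the upper end of the usual recommendation.
    """
    if len(scheme) > max_colors:
        raise ValueError(f"{len(scheme)} colors used, limit is {max_colors}")
    if len(set(scheme.values())) < len(scheme):
        raise ValueError("two categories share one color")
    return [f"{color}  {category}" for category, color in scheme.items()]


# Hypothetical categories and pastel hex values, for illustration only.
scheme = {
    "category A": "#FFF2CC",
    "category B": "#DAE8FC",
    "category C": "#D5E8D4",
}
for line in build_legend(scheme):
    print(line)
```

The check does not cover whether most readers can tell the colors apart; that still needs a human eye.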
3.1. COLOR SCHEMES: DOMAIN-INDEPENDENT
The following are some domain-independent color schemes.
Their advantage is that they can be re-used across models, establishing consistency.
3.1.1. DEVELOPMENT CYCLE SCHEME
This scheme shows where each entity is in the development cycle.
In this example (Fig. 4), it separates release 1 entities from release 2 entities.
3.1.2. ABSTRACTION-LAYER SCHEME
This scheme (Fig.5) separates entities into layers, reinforcing the layered positioning patterns discussed earlier. The three layers are:
- The middle layer is operational data.
- Above operational data is the meta-data level, or the business rule entities. The relationship from a business rule entity to an operational entity is of the ‘classify’ - ‘be an instance of’ type. In this example: ELEMENT classifies INSPECTED ELEMENT, and INSPECTED ELEMENT is an instance of an ELEMENT.
- Below operational data is summary, history or archival data. It is the data derived from operational data. In this example: HAZARD WEEKLY and HAZARD MONTHLY are summarizations of the DOCUMENTED
HAZARD over different periods of time.
This is the scheme I use most often.
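The abstraction-layer scheme amounts to a two-step lookup: entity to layer, then layer to color. A sketch, using the entities named in the example above; the layer labels and hex colors are hypothetical, and placing DOCUMENTED HAZARD in the operational layer is an inference from the statement that the summaries are derived from operational data.

```python
# Hypothetical pastel fill colors, one per abstraction layer.
LAYER_COLORS = {
    "business rule": "#FFF2CC",
    "operational": "#DAE8FC",
    "summary": "#D5E8D4",
}

# Layer assignments follow the examples in the text.
ENTITY_LAYERS = {
    "ELEMENT": "business rule",
    "INSPECTED ELEMENT": "operational",
    "DOCUMENTED HAZARD": "operational",  # inferred, see lead-in
    "HAZARD WEEKLY": "summary",
    "HAZARD MONTHLY": "summary",
}


def entity_color(entity):
    """Look up the fill color for an entity through its layer."""
    return LAYER_COLORS[ENTITY_LAYERS[entity]]


for name in ENTITY_LAYERS:
    print(name, entity_color(name))
```

Keeping the color on the layer rather than on the entity means the scheme can be re-used across models by swapping only the entity table.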
3.1.3. FOUR ARCHETYPE SCHEME
The four color archetypes by Peter Coad [Coad99] are:
- The ‘party (person or organization), place or thing’ archetype
- The ‘catalog-entry-like description’
- The role (a way of participating in something)
- The moment-interval (a moment in time or an interval of time that you need to track or do something about)
The book [Coad99] is devoted to the models designed and presented in this color scheme.
3.2. COLOR SCHEMES: DOMAIN-SPECIFIC
3.2.1. SUBJECT AREA SCHEME
I remember how much fun we had while picking out colors for subject areas. Greenish colors were slated for all money related subject areas, grayish colors for workflow, etc.
But with 20 to 30 subject areas for the enterprise model, it is just too many colors for the reader to follow. I don’t believe the subject area scheme to be useful.
3.2.2. IMPLEMENTATION SCHEME
In Fig.6 the diagram is color-coded by the target database:
- Entities owned by the server-based database are in one color
- Entities owned by the mobile handheld database are in a different color
Please note that an entity can be used by multiple applications, but is considered to be owned by only one.
What are the benefits of such diagram improvements?
- A systemic approach improves data model content.
- A professional-looking diagram makes people trust it.
- Intuitive layout and color scheme arm developers with additional knowledge.
- Better understanding of data shortens development time.
[Coad99] – Peter Coad et al. Java Modeling in Color with UML. Prentice Hall: NJ (1999).
[Siebert92] – Lori Siebert and Lisa Ballard. Making a Good Layout. North Light Books: Cincinnati (1992).
[Reingruber94] – Michael Reingruber and William W. Gregory. The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models. A Wiley-QED Publication: New York (1994).