In the introduction of the book, Malcolm clearly states its objective: “to appreciate definitions in the context of information management.” In support of this statement, Malcolm lines up an impressive set of well-respected experts in the field, who all advocate the importance of clear definitions as a cornerstone of information management. The following quote from the book beautifully summarizes its contents, and why you should buy and read it:
Early in the book, Malcolm stresses the justification for definitions: they are fundamental to understanding business concepts (e.g., customer, subscription, account…) represented in data (e.g., in Excel tables, reports, databases…), they disambiguate terms (e.g., customer as used in sales versus in accounting), they clarify whether you are working with metadata or data (e.g., update date), they relate instances to their concepts, they are required for consistency in derivation or computation (as opposed to reverse engineering an algorithm hidden in a black box), they help to compare concepts in data mapping, and they allow control over the drift of actual data in fields. Malcolm recognizes that defining is a large effort in source data analysis (SDA). He adds that existing definitions help avoid analysis rework and facilitate data integration.
The subject of semantics (as the study of meaning) is not a new one. Malcolm recognizes this correctly as his book comprehensively surveys the existing literature on definitions and relates it to the problems of data management. Nowadays, a buzz exists around the Semantic Web or Linked Data as new ways of handling the actual data. Rather than buzzing along, Malcolm looks at semantics as data professionals all over the world encounter the issue on a daily basis: “what does X mean” (when encountered in a report, database, meeting, XML…). Malcolm clearly explains this via the difference between a sign, a term, a concept, an instance, and how they fit together. Figure 4.2 in the book depicts this simply and clearly.
The book informs you of the necessity of building and managing definitions separately from particular applications, which is exactly how it should be. Throughout the evolution of information systems, there has always been a trend toward loose coupling as a means of keeping the complexity manageable: databases that manage data for different applications, the three-tier architecture, web services as more granular blocks of business logic, processes and rules as first-class citizens reused in various applications… The same must hold true for definitions: once your organization has a definition for customer, you want it to be aligned and used within all your applications and systems, rather than silently fading out on a piece of paper. Malcolm identifies which data and metadata objects require definitions: database tables and columns, application screens and screen labels, application reports and report labels, and interface files, as well as business concepts, entity types, their attributes and relationships.
The book also provides theory on definitions: real definitions versus nominal ones. Real definitions fully explain the nature of a concept, whereas nominal definitions explain the meaning of a word or term. Furthermore, there is a typology of definitions (essential, distinctive, causal: purpose and cause, accidental, ostensive, stipulative, legislative and indefinables). Malcolm points to the importance of applying the right type at the right place, preferably by taking a strategic approach.
As definitions are an essential part of any data quality initiative, quality of definitions should be a given. Malcolm spends an entire chapter on different aspects of quality for definitions. From a methodological perspective, these can easily be built into your organizational approach toward definitions. This approach ensures that such an important cornerstone of your data management approach is of high quality. After all, it would not be very helpful to invest a lot of effort in what Bob Seiner calls “cheeseburger definitions” [1]. Malcolm clearly understands the issues that come with making definitions (e.g., “It is not possible to expect that all definitions will be complete at the outset. We come to know something initially and then gradually get to know it better.”).
Building and governing a layer of definitions is moving things down a vagueness funnel: inevitably you must start from ordinary terms in all of their vagueness, but gradually you need to push them down (e.g., by adding descriptions, examples, notes…) until the definitions are broken down into their concepts, facts and rules.
Further, Malcolm touches on the subjects of precision (the level of detail to which data is claimed to be reliable) and accuracy (the degree to which data really represents what it is intended to represent).
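The difference between the two notions can be made concrete with a small sketch. It is only an illustration under stated assumptions, not something taken from the book: the function names and the sample values are hypothetical. Precision is read as the number of decimal places a recorded value claims, and accuracy as the distance between the recorded value and the value it is meant to represent.

```python
from decimal import Decimal


def claimed_decimal_places(recorded: str) -> int:
    """Precision (illustrative): decimal places the recorded value claims."""
    exponent = Decimal(recorded).as_tuple().exponent
    return max(0, -exponent)


def absolute_error(recorded: str, true_value: str) -> Decimal:
    """Accuracy (illustrative): how far the recorded value is from reality."""
    return abs(Decimal(recorded) - Decimal(true_value))


# Hypothetical sample: a value recorded with four decimals that is still wrong.
recorded, true_value = "12.3456", "15.0"
print(claimed_decimal_places(recorded))    # high claimed precision
print(absolute_error(recorded, true_value))  # yet a large error
```

A value can thus be very precise and still inaccurate, which is why a definition needs to state both.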
The book dedicates two chapters to two important aspects of definitions: scope and context. Scope is about drawing the borders of your playing field. Malcolm includes the following: populations or subpopulations, subclasses of a general concept and collections of different things. On context, Malcolm provides the following areas of an enterprise that can serve the context-providing purpose: subsidiaries, lines of business, horizontal business functions, geographical locations and applications. His view on the much-used example of customer is as clear as it is confronting (i.e., “What is the purpose of an enterprise-wide definition of ‘customer’?”).
In one of the last chapters, Malcolm handles governance, a topic that is often seen as invasive, where it definitely need not be. According to the book, governance is about setting roles, responsibilities and rights concerning data. Again, Malcolm focuses on the need for openness, rather than hiding definitions deeply in fields particular to some legacy application. They should be stored, managed and published separately, available to anyone in the organization, under the right kind of responsibility (labeled as “trusteeship” in the book). Malcolm handles everything briefly in this chapter: from validation and verification to monitoring, evaluation and metrics.
The final chapters of the book focus on the metadata for definitions (nicely referring to Dublin Core), and insightful conclusions. Some examples:
- Given the right approach, executive management can appreciate the effort of definition making;
- Definition making is a process that needs to be run outside the scope of a single time-bounded project.
Eating his own dog food, Malcolm closes the book with a detailed glossary, definitions of Dublin Core, a good first metamodel and a non-trivial example to help you get started in defining your own business.
Malcolm’s book is limited in size, which makes it quick to read. It stays focused on its core topic, which it explains well and in great detail. Inevitably, there are parts that I am looking forward to finding on the book’s companion website (www.data-definition.com):
- An in-depth explanation of available standards to handle data definitions, such as the Object Management Group’s (OMG) Semantics of Business Vocabulary and Rules (SBVR). The content of this book seems to be in line with this standard.
- A view on the technology space to support data definitions and their governance.
- An overview of methodologies to achieve and sustain a body of definitions within and across organizations.
Overall, I would recommend this book to anyone who feels the need to get a grip on their business, whether they are starting a new business intelligence program (or are already knee-deep in a data warehouse filled with unclear data), figuring out how to get their organization in line with complex regulations or just constructing new applications or databases.
To learn more about the book, or to purchase it, you can visit the book’s companion website.
[1] Definition of cheeseburger: a cheeseburger is a burger with cheese.