Join me in welcoming Robert Lutton of Sandhill Consultants and his new column, ‘Data Management 20/20’, to the pages of TDAN.com.
Integrating Data Modeling with Data Governance
Most authorities on data governance stress that governance processes must be integrated into project activities and ongoing data management tasks. Governance describes the oversight required to plan, build, run, and monitor operational activities. Most organizations do some level of data modeling. The construction of data models captures vital business information and should be managed with some formality so it can be leveraged as a valuable resource. When people think of data governance, they often picture a big governance initiative with new technology, high-level support, and funding. The approach described here is more grassroots: the modeling tool provides most of the technology, and the governance processes provide just enough oversight to ensure a consistent, methodical approach to the design of data.
The integration of data modeling and data governance is a lot like data manufacturing. When something is manufactured it must have a consistent and repeatable construction process, a bill of materials, tools to construct it, people to do the work, and quality control standards and processes to identify, address, and prevent defects. The result is a Standard Data Object that acts as a building block to form the structure of a data model.
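To make the analogy concrete, here is a minimal sketch of what a Standard Data Object might carry as a reusable building block. The class and field names are illustrative assumptions, not a prescribed metamodel:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StandardDataObject:
    """A quality-controlled, reusable building block for data models."""
    name: str          # business name, e.g. "Customer Account Number"
    definition: str    # agreed business definition
    abbreviation: str  # standard physical abbreviation, e.g. "CUST_ACCT_NBR"
    data_type: str     # logical data type, e.g. "CHAR(16)"
    steward: str       # person accountable for the object
    version: int = 1   # incremented through the change control process
```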
Policy
Policies are statements of intent that govern data design and guide the decision-making process. Policies are tied to the standards and procedures that enforce them. The policies that direct the design of data must also align with the business policies, strategies, goals, and other motivators of the business. Below is a sample listing of data design policy statements that support the business objective of data interoperability; a sketch of how such policies might be cataloged follows the list.
- Data design must facilitate data sharing
- Data design must enhance data understanding
- Data design must adhere to documented standards and procedures
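One lightweight way to keep that policy-to-standard traceability explicit is to record each policy alongside the standards that enforce it. The catalog below is a hypothetical sketch; the standard identifiers and wording are invented for illustration:

```python
# Hypothetical catalog linking each data design policy to the
# standards that enforce it, so enforcement stays traceable.
DATA_DESIGN_POLICIES = {
    "Data design must facilitate data sharing": [
        "STD-01: Shared objects are sourced from the Standard Data Object library",
    ],
    "Data design must enhance data understanding": [
        "STD-02: Every entity and attribute carries a business definition",
    ],
    "Data design must adhere to documented standards and procedures": [
        "STD-03: Models are scored against a data model scorecard before release",
    ],
}
```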
A policy without enforcement is simply a guideline, so be careful not to mix the two ideas. Guidelines are important in informing and shaping behavior, and over time they may develop into best practices. A policy describes behavior you must follow; a guideline describes behavior you should follow. You will have a mixture of policies, guidelines, and best practices that direct the intended behavior.
Standards
The primary by-product of the data modeling process is metadata. Standards form the rules that enforce the policies. Data standards are the measuring stick for metadata quality. They can include everything from how something is named and defined to how it can be used. Without data standards you will never be able to measure the quality of your output. A data model scorecard or similar mechanism is useful for monitoring the quality of your data models. Adopting a recognized standard, such as ISO 11179-5, is a quick way to build a set of standards around metadata creation without having to assemble them from scratch. Defect identification becomes much easier once a standard is in place.
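Once the rules are documented, part of the standards check can be automated. The sketch below assumes a simple illustrative standard: title-case business names, abbreviations assembled from an approved list, and a definition that is present and non-circular. These rules stand in for, and are much narrower than, a full standard such as ISO 11179-5:

```python
import re

APPROVED_ABBREVIATIONS = {"CUST", "ACCT", "NBR", "DT", "AMT"}  # illustrative list

def check_against_standard(name: str, abbreviation: str, definition: str) -> list[str]:
    """Return the defects found when a candidate data object is
    checked against the documented naming and definition standards."""
    defects = []
    # Naming standard: title-case words separated by single spaces.
    if not re.fullmatch(r"[A-Z][a-z]+( [A-Z][a-z]+)*", name):
        defects.append(f"Name '{name}' violates the naming standard")
    # Abbreviation standard: underscore-joined approved abbreviations.
    if not all(part in APPROVED_ABBREVIATIONS for part in abbreviation.split("_")):
        defects.append(f"Abbreviation '{abbreviation}' uses unapproved parts")
    # Definition standard: present, substantive, and not circular.
    if len(definition.split()) < 5 or definition.lower().startswith(name.lower()):
        defects.append("Definition is missing, too short, or circular")
    return defects

# A conforming object produces an empty defect list.
print(check_against_standard(
    "Customer Account Number", "CUST_ACCT_NBR",
    "The unique number assigned to a customer account at enrollment."))  # []
```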
Change Control Process
One of the primary tenets of data governance is that change must be managed. Documented procedures must be in place to define how change is made, who is responsible for it, and the review process that validates and accepts it. Easy-to-follow quality control procedures for model development promote consistency, and consistency in turn builds trust. A set of published standard operating procedures removes the ambiguity around how data models are built and assessed. Data models are not static: business is constantly evolving, and data needs change all the time. Establishing a change control process is an effective way to achieve better quality through the reduction of defects and to communicate the process to individuals inside and outside the data management organization.
The process model below illustrates the workflow for reviewing a candidate data object for acceptance as a Standard Data Object (see figure 1). The check against the standard ensures that the submitted object meets the criteria for names, definition, abbreviation, and other pertinent characteristics. A candidate data object may be of sufficient quality to pass the standard, but is it good enough to do the job it was designed to do? Verification evaluates the fitness of the data object. Once the candidate data object passes the quality control process, it becomes a Standard Data Object. Standard Data Objects are contained in a library where they can be reused in other models.
Figure 1: Sample Process Flow
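In code, that review workflow might look like the sketch below. The gate functions are assumed to be supplied by your own standards check and verification procedure, and the candidate follows the Standard Data Object shape sketched earlier:

```python
def review_candidate(candidate, library, check_standard, verify_fitness):
    """Run a candidate data object through the quality control workflow:
    standards check, then fitness verification, then promotion to the library."""
    defects = check_standard(candidate)
    if defects:
        # Fails the standard: send back for correction.
        return "rejected: fails standard", defects
    if not verify_fitness(candidate):
        # Meets the standard but is not fit for its intended purpose.
        return "rejected: not fit for purpose", []
    # Promotion: the candidate becomes a Standard Data Object in the library.
    library[candidate.name] = candidate
    return "accepted as Standard Data Object", []
```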
Reusability
Data model objects are sometimes copied from one model to another without regard for integrity. This leads to error-prone, redundant models that require extra effort to manage. It also reduces trust. Returning to our data manufacturing analogy, a Standard Data Object is part of a warehouse of common data components that can be used to assemble new models without starting from scratch or from a questionable source. A Standard Data Object has already undergone the quality control process and is a trusted component that is free of defects.
It is no secret that application development and data management teams are sometimes at odds with each other. A common complaint is that the data team’s lack of agility slows down the development team. Part of the problem stems from a lack of reusability. The more data objects that can be reused, the shorter the design time.
Measuring Reuse
Another important process in data governance is measurement. Standards help identify nonconforming data objects, and change control procedures help bring those objects into compliance, but neither measures overall progress toward the goal of defect-free data objects that enable greater agility. The measurement we need comes from the level of reuse. A Managed Data Design Program may start out building everything from scratch, using quality control standards and processes to guide development. Over time, more and more data objects can be evaluated and reused, reducing the development time required to produce quality data models. The percentage of reuse relates directly to the level of agility, trust, and traceability. A pie chart can show how much of the model is built from reusable parts (see figure 2).
Figure 2: Sample Reuse Report
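The measure behind that report is straightforward to compute once each model object can be traced to its source. A minimal sketch, assuming the Standard Data Object library can be treated as a set of object names:

```python
def reuse_percentage(model_objects, library):
    """Percentage of a model's objects sourced from the Standard
    Data Object library rather than built from scratch."""
    if not model_objects:
        return 0.0
    reused = sum(1 for name in model_objects if name in library)
    return 100.0 * reused / len(model_objects)

# Example: three of the four objects in this model come from the library.
library = {"Customer", "Account", "Product"}
model = ["Customer", "Account", "Product", "Campaign"]
print(f"Reuse: {reuse_percentage(model, library):.0f}%")  # Reuse: 75%
```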
Communication
Communication is the key to promoting the awareness and adoption of the Managed Data Design Program. Developing the policies, procedures, data objects, and supporting documentation is a significant investment in time. There is no reason to keep it to yourself. Consumers of data models include both business and technical audiences. Knowledge collaboration software is ideal for publishing content about the data modeling program and allowing people to comment.
Summary
Establishing a Managed Data Design Program is a necessary foundation for treating data as a resource. Data models contain the structure and meaning of data, and data governance controls the processes that create and manage data. The convergence of governance and modeling provides for quality data content. Business management consultant Peter Drucker is famously quoted as having said, “You can’t manage what you can’t measure.” I’d like to extend that thought a bit by saying:
- You can’t find what you don’t consistently name
- You can’t understand what you don’t define
- You can’t measure what is not predictably consistent
For more specific details on building quality data models, see my recent TDAN.com article, Improve Data Governance in Your Data Models.