Data is an organization’s ‘prana’ (breath of life).
For virtually all organizations, we can assert:
- Data has grown like an ancient banyan tree (tracing each branched root is very challenging)
- Knowledge management of the data (metadata) is lacking or incomplete
- No one individual has a God’s-eye view of the data
- No one individual can definitively say what data is most important
- There is no current plan to get the data organized
- The organization is always playing catch-up, scrambling just in time for implementations
- The data layer becomes increasingly complex, making everything worse.
Data is gradually (your timeline may vary) recognized by the organization as a major choke point, significantly inhibiting:
- Realization of the business strategy (the Big One)
- Innovation and implementation of desirable features (as steady state spend increases)
- Productivity of staff (rework, excessive integration testing, time to insight, etc.)
And yet, if you devote time and effort solely to improving the data layer, you lose time devoted to key business functions and new possibilities.
So what is an organization to do? [1] The first step is critical and must be chosen carefully. We’ll take a quick tour of some options to help savvy executives decide on a suitable launch point.
Some organizations engage in a major effort that addresses data management while also meeting other vital business needs, e.g., re-architecting a large portion of the data layer, such as modernizing the entire legacy data warehouse environment. We are all familiar with how complex, unwieldy, and expensive legacy data warehouses and marts can become, with redundant data, unreconciled new data, zillions of interfaces, etc. If there is ample budget for staff and technology, and strong planning skills that enable a multi-year transformation to be envisioned and executed, a program like this can prove an effective decision, killing many birds with one stone.
Most organizations are not ready to take such major risks. A more cautious approach is to select a single initial project where both a tangible gain and better data management practices can be addressed at the same time. Advantages of this approach are: it’s easier to approve; it’s a smaller financial bite, and therefore lower risk; it requires fewer staff resources; and it can become a template for increasing data knowledge and implementing sound data management practices. Selection criteria for this initial project include:
- Business value: important for at least one major business goal, e.g., improved reporting, regulatory compliance, competitive advantage, or key mission effectiveness
- The need for the primary deliverable (system, catalogue, report) is understood, approved, and appreciated across business lines
- Few or no dependencies (i.e., the team can get right down to business)
- Able to be completed in a relatively short timeframe (minimizing risk)
- Must engage multiple business areas, strengthening governance
- Must align with data management program needs – e.g., new or improved policies, processes, standards, templates, governance charter, etc.
- Must have a willing, engaged and active executive sponsor.
This analysis should yield one or more candidate projects, to which we then apply overall foundational principles for what every organization should “have, become, and do about data.” Key accomplishments that align with these requirements include: 1) data mastering for core entity types (e.g., client, product) and 2) data categorization. Here we focus on categorizing data by importance factors, employing the commonly used term “Critical Data Elements” (CDEs).
Organizations create and manage thousands of data elements. For some business lines, e.g., financials, data sets are usually well defined and well managed. For other data sets, knowledge may be spotty; for instance, it may be unclear which data elements business processes create and use, and how important each one is. We’re going to posit a scenario, outline an approach to categorizing critical data elements, and associate that approach with DMM process areas – fundamental data management best practices.[2]
Critical Data Elements – Data Management Dependencies
We’ll think through a sample scenario, using a person-centric business (customer, patient, visa applicant, etc.) as our example industry, and posit that a master data hub has been implemented to uniquely identify and describe individuals (typically 20-25 data elements) for all consuming data stores. That enables the organization to achieve a strategic longer-term goal: to assemble and implement a complete profile of the customer for the use of all business lines.
For ‘customer,’ business lines benefiting from a timely, accurate 360-degree view would include (but are not limited to): sales; marketing; customer service; risk; and corporate strategy. Each of these business lines could compile a list of what they most need to know about a customer, or about customers in the aggregate, and be prepared to state WHY. The collective group of lists is a good starting point.
Since our customer profile starts from different lists, and will serve multiple suppliers and consumers, we’ll need to bring the stakeholders together. Initially, they’ll need to: (1) agree on a consolidated list (Governance Management); (2) categorize that list by importance (CDEs); and (3) uniquely define the business terms corresponding to those data elements (Business Glossary). CDE categorization may include determining which data elements fall into these example classifications (see the sketch following the list):
- Regulatory and Financial Audit
- Financial, Operational or Reputational Risk (e.g., potential failures of business process execution, missing revenue targets, etc.)
- Privacy / Sensitive (whether internally generated or regulatory considerations)
- Revenue and Growth (e.g., accurate client count, projected client growth, 360-degree view of the client)
- Product/Service Development and Growth (e.g., sales by customer factors [demographics, location, etc.], returning customers, etc.)
- Data Monetization (e.g., data elements/sets that are candidates for productization).
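To make these work products concrete, below is a minimal sketch (in Python, purely for illustration) of how a consolidated glossary entry might carry both its agreed definition and its assigned CDE categories. The names (CDECategory, GlossaryEntry) and the representation are assumptions for this example, not features of any particular tool.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class CDECategory(Enum):
    """Example criticality classifications, mirroring the list above."""
    REGULATORY_AUDIT = auto()
    FINANCIAL_OPERATIONAL_RISK = auto()
    PRIVACY_SENSITIVE = auto()
    REVENUE_GROWTH = auto()
    PRODUCT_SERVICE_DEVELOPMENT = auto()
    DATA_MONETIZATION = auto()

@dataclass
class GlossaryEntry:
    """A consolidated business term with its agreed definition and the
    CDE categories the governance group assigned to it."""
    term: str                                     # unique business term
    definition: str                               # agreed business definition
    categories: set = field(default_factory=set)  # CDECategory members

    @property
    def is_cde(self) -> bool:
        # An element is "critical" if it falls into at least one category.
        return bool(self.categories)

# Example: one element from the consolidated customer-profile list.
ssn = GlossaryEntry(
    term="Social Security Number",
    definition="Government-issued identifier for a U.S. customer.",
    categories={CDECategory.PRIVACY_SENSITIVE, CDECategory.REGULATORY_AUDIT},
)
print(ssn.is_cde)  # True
```

Representing the categories as a set keeps the later determination of criticality levels a simple function of the assignments.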
A CDE analysis and designation is a practical foundation for determining relative importance, which in turn informs what data should be governed, enhanced, integrated, controlled, and applied to business process improvements and monetization opportunities. It is important to note that an organization may define several levels of “critical” as appropriate, per the priorities of the sample categories mentioned above. Some data is shared less widely, or is less critical for business processes, and may require less rigorous definition, specification, standardization, and mapping; these levels can be determined with input from executive-level data governance. The organization should develop a rational scheme for deciding how closely each level of ‘critical’ should be managed.
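As one hypothetical version of such a scheme, the sketch below scores an element by weighted category assignments and maps the score to a management tier. The weights, thresholds, and tier descriptions are invented for illustration; in practice they would be set with executive-level governance input.

```python
# Hypothetical weights per CDE category; a real scheme would be set by
# executive-level data governance, not hard-coded like this.
CATEGORY_WEIGHTS = {
    "regulatory_audit": 3,
    "privacy_sensitive": 3,
    "financial_operational_risk": 2,
    "revenue_growth": 2,
    "product_service_development": 1,
    "data_monetization": 1,
}

def criticality_tier(categories: set) -> str:
    """Map an element's assigned CDE categories to a management tier."""
    score = sum(CATEGORY_WEIGHTS.get(c, 0) for c in categories)
    if score >= 5:
        return "Tier 1: full definition, standardization, quality rules, and mapping"
    if score >= 2:
        return "Tier 2: standard definition and quality rules"
    return "Tier 3: lightweight definition only"

# Example: a privacy-sensitive, regulated element lands in Tier 1.
print(criticality_tier({"privacy_sensitive", "regulatory_audit"}))
```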
The persistent work products (glossary, metadata properties, CDEs, etc.) should be managed by the Data Management Function, working with governance groups – these work products are valuable and build segments of data asset knowledge. By convening the governance group, the organization gains agreement on the consolidated data set constituting the content needed for a customer profile, and develops business terms with definitions for the data set. At this point, the organization, engaging its data stewards, can determine what information it needs (Metadata Management) about the customer data, according to importance. For example, for customer privacy data, such as Social Security Number, the organization would want to know: when it was captured; whether it was corrected or changed; by whom it was changed; who the data owner is; who has modify or view access; the source data store; the other data stores in which it resides; etc.
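A minimal sketch of how those metadata properties might be recorded for a single element follows; all field names are hypothetical, and a real implementation would live in a metadata repository rather than application code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class ElementMetadata:
    """Illustrative metadata a steward might capture for a
    privacy-critical element; field names are hypothetical."""
    element: str                          # business term, e.g. "Social Security Number"
    captured_at: datetime                 # when the value was first captured
    last_changed_at: Optional[datetime]   # when it was corrected/changed, if ever
    changed_by: Optional[str]             # who made the change
    data_owner: str                       # accountable business owner
    source_store: str                     # authoritative source data store
    modify_roles: list = field(default_factory=list)    # who has modify access
    view_roles: list = field(default_factory=list)      # who has view access
    replicated_in: list = field(default_factory=list)   # other stores holding it
```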
Since multiple business areas will utilize the customer profile, it must be as timely, complete, consistent, and accurate as possible. Therefore, a Data Quality Strategy should be created, essentially answering the question “What are our plans to assure quality?” As part of the identified activities, Data Profiling should be conducted for the supplying data sources. Profiling will discover defects, anomalies, and unreconciled names, formats, values, etc. Governance participants are then engaged to develop quality rules based on data quality dimensions (Data Quality Assessment) to prevent errors in data acquired for the profile data store. Prior to implementation, source data will be corrected through Data Cleansing (in the source if possible; if not, at load time).
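To illustrate what a first profiling pass might surface, here is a small sketch that checks two common data quality dimensions, completeness and validity, over a column of values. The missing-value markers and the SSN format pattern are assumptions for this example.

```python
import re

# Illustrative format rule; a real rule set would come from governance.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
MISSING_MARKERS = (None, "", "N/A")

def profile_column(values):
    """Minimal profiling pass: surface defects and anomalies before
    governance participants write formal quality rules."""
    total = len(values)
    missing = sum(1 for v in values if v in MISSING_MARKERS)
    malformed = sum(
        1 for v in values
        if v not in MISSING_MARKERS and not SSN_PATTERN.match(str(v))
    )
    return {
        "completeness": 1 - missing / total,   # DQ dimension: completeness
        "validity": 1 - malformed / total,     # DQ dimension: validity
        "distinct": len(set(values)),          # a simple uniqueness signal
    }

# Example run over a toy column with one missing and one malformed value.
print(profile_column(["123-45-6789", "123456789", None, "123-45-6789"]))
```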
The selection of the customer profile data set, and determination of criticality, will be reflected in the functional and data requirements (Data Requirements Definition). The organization will designate the authoritative data sources for each data element in the profile, and map the data elements to the business processes that create or modify them (Data Lifecycle Management). Applying data standards for representation (and modeling), security, privacy, access, and provisioning (Architectural Standards) will assure a well-organized design of the customer profile and an orderly path for Data Integration.
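One lightweight way to capture that traceability is sketched below: a mapping from each profile element to its designated authoritative source and to the business processes that create or modify it. The store and process names are hypothetical.

```python
# Hypothetical requirements mapping supporting Data Lifecycle Management:
# each profile element is traced to its authoritative source and to the
# business processes that create or modify it.
AUTHORITATIVE_SOURCES = {
    "Social Security Number": "customer_master_hub",
    "Customer Segment": "marketing_analytics_mart",
}

LIFECYCLE_MAP = {
    "Social Security Number": ["Account Opening", "Identity Verification"],
    "Customer Segment": ["Campaign Planning", "Quarterly Segmentation"],
}

def trace(element: str) -> str:
    """Report the designated source and lifecycle processes for an element."""
    source = AUTHORITATIVE_SOURCES.get(element, "UNRESOLVED")
    processes = ", ".join(LIFECYCLE_MAP.get(element, []))
    return f"{element}: source={source}; created/modified by: {processes}"

print(trace("Social Security Number"))
```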
It should be clear that all the activities described in this scenario harness the collective power of planning and rational analysis, “thinking through” what’s important about the data and capturing significant knowledge that will both support the implementation of the customer profile in the short term, and create a reusable approach that can be leveraged across the organization.
Phased designation of critical data elements, then, lets the organization proceed in manageable segments, gaining deeper knowledge about its data assets segment by segment.
[1] Disclaimer: In our work with clients and students, the CMMI Institute highly recommends that an organization evaluate the current state of data management practices against the Data Management Maturity Model prior to beginning any significant program evolution effort, ensuring that all major gaps and strengths are identified.
[2] If you’ve read my other columns, you’ll be familiar with the convention of showing that “everything is connected” via a hub-and-spoke picture. Each process area can be viewed as the center of the data universe, with direct and indirect dependencies on other activities and work products. The CDE effort is, functionally speaking, the application of activities in many other fundamental processes, with the addition of designating criticality categories.