Data Professional Introspective: Be a Savvy and Confident Data Steward

At conferences and in working with many organizations, I’ve often met individuals engaged in data governance – executives, stewards, owners, custodians, program managers, etc. – who are discouraged about the state of governance in their organizations. They’ve made statements like: “We tried to stand up governance and it failed;” “The stick is wearing out;” “I can’t get people to come to meetings;” “I’m discouraged because we don’t accomplish much;” or “This is all new to me and I feel overwhelmed.”

Let’s review the data governance function – it addresses three lifecycle phases of the data assets: building, sustaining, and controlling; and it executes through collaborative decision making. ‘Enterprise data,’ data which is shared and important for key business operations and decisions, is the target scope for governance. Further, governance is a permanent function, because there is always data to be added, modified, integrated, and managed, just like Finance always needs to manage budgets and funds, and Human Resources always needs to recruit and manage staff resources. Data governance can face many challenges in an organization – summarized here:

Sometimes the primary problem is structure. The organization didn’t select an experienced and motivated leader to get the governance groups up and running, or they didn’t select representatives with the requisite business data knowledge, or they have too few people, or too many, or the assigned roles weren’t understood or willingly accepted.

Sometimes the primary problem is expectations. Executives often don’t think about governance realistically, that is, they would be very happy if governance were to solve all of the organization’s data issues. They may succeed in gaining their peers’ approval for staff time, and that is a definite accomplishment. However, they can also engage in wishful thinking about how quickly governance can accomplish initiatives. This view leads to overloading governance participants with tasks and deadlines, overlooking the fact that most of them have ‘Day Jobs’ and that typically, governance participation is not written into their job description. They can also (conveniently) believe that sustained, active engagement by executives isn’t a key success factor for effective governance.

Sometimes the primary problem is lack of planning or poor management of governance activities. Groups may meet too often, or not often enough. Meetings may become too informal, too long, inefficient, or boring. An example: I recall a two-hour meeting with over 20 senior staff attempting to approve a single new type code. This was a huge waste of time and, if we use an average hourly loaded rate of $90, that meeting cost the organization $3600. Other management problems include action items and assignments that are not fully tracked, and insufficient or incomplete communication between governance representatives and the business lines, or between governance and the data management organization.

However, in this column, I’m not going to address those issues, despite their validity and their impact on the success of data governance. Instead, we’ll concentrate on what is within your personal control. The topic is the key role in managing the organization’s data assets – you, the data steward – and the learning and skills you can acquire to ensure that you remain interested, productive, and satisfied, for your internal pleasure in the knowledge of a job well done. As the saying goes, “Nothing succeeds like satisfaction.” And if you want to get spiritual about it, “The perfection of action creates Chi.”

When I was in my second year of employment, as a business analyst for the Tri-Services Military Information System, I complained to my father, an Air Force Colonel and Distinguished Flying Cross recipient, “I’m only using 25% of my brain – this is so boring.” He didn’t offer me sympathy, or take me to task, or try to tell me that I shouldn’t feel that way. He just said: “Learn your job very well, and you’ll come to enjoy it.” And that is indeed what happened. As I focused on making the most of my positions, projects, and roles, in a few years I found my passion – enterprise data – and have thoroughly enjoyed my career ever since.

With my father’s adage in mind, we’ll outline the skills that a world-class data steward would set out to acquire. We’ll approach this first from the process areas in the Data Management Maturity (DMM) Model, which addresses fundamental data management practices.[1] In the DMM, governance activities are called out in the process areas/topics to which they apply. The hub and spoke diagram below depicts the fundamental processes for which governance is critical.

Governance is Vital for the Success of All Data Management Processes. Note: Business Case and Program Funding are not depicted due to diagram size.


Each of these process areas requires governance activities and decisions. This is one reason why I urge organizations without current governance capabilities to start small; if you fully informed each new data steward of the evolving breadth and scope of their role, they might run screaming down the street.

As a data steward, you are the ambassador from your business line to the organization, engaging in shaping policy, gaining agreements, and representing your ‘country’ in the ‘United Nations’ of the enterprise.

Let’s look under the bed and face the monster – the chart below illustrates, in aggregate, the scope of practices requiring analysis, business input, and decisions from the data steward, aligned with the process areas above. No way around it – though it can be satisfying and at times exciting, the role implies a lot of responsibility.

Your Mission, If You Choose to Accept It

Now we’re going to summarize the most important skills and analytical approaches – WHAT you need to learn to become a savvy and confident data steward, to do a great job for your business line, and to gain recognition for your expertise. Your evolving knowledge can best be acquired by a combination of training courses, publications, mentoring, and doing – working through governance tasks with your business and data steward peers. Each item on the list below can be prefaced by: “The world-class data steward is skilled at…”

  1. Defining business concepts – Managing data as meaning is the heart of data management from the business perspective. Divergent use of terms describing key shared business concepts, or divergent meanings for the same term, is a major business issue. The same meaning is often given various names, and the same term may have different shades of meaning across multiple business lines. As the data steward, your job is to: represent your business line’s use of each term; interpret other business lines’ usage; gain agreement on defining a base term that all stakeholders can approve; and assist in determining additional qualifying name components to precisely define usage. Example: the term ‘Client,’ referring to a person or organization that patronizes a firm’s products or services. In this case, it may be important to differentiate the lifecycle phases: a ‘Prospective Client’ important to Marketing, a ‘Signed Client’ important to Sales, a ‘Verified Client’ important to Operations, etc.
  2. Describing and decomposing business processes – The well-equipped data steward is advised to become familiar with defining and parsing business processes into sub-processes. A common convention for decomposition is:
    • Function – activities that are consistently performed, often mostly within a business unit, such as ‘Identify Client’ (Marketing) or ‘Solicit Client’ (Sales), ‘Bill Client’ (Finance), etc.
    • Process – a series of actions or activity steps taken to achieve a specific end (or output, or outcome), such as ‘Register Client,’ ‘Verify Client Information,’ etc. Processes frequently have sub-processes, for instance ‘Create Client,’ ‘Capture Client Information,’ ‘Initiate Document Request,’ etc. Process diagrams typically depict Inputs (what’s needed to start the process); Outputs (what the process produces); Controls (constraints, regulations, or standards to which the process must conform); and Mechanisms (tools, techniques, frameworks, and software that enable the process). Processes Create, Update, Delete, or Read data. It’s very important for the data steward to coordinate closely with the business unit about the data involved in business processes, and obtain verification at the detailed level.
    • Procedure – actions conducted in a specified manner, such as ‘enter the Client Name, Address, Phone Number, and Email Address on the Client Information screen in [System Name].’ A data steward may be called upon to provide input for procedures, particularly in the phase of data capture (i.e., to prevent Garbage In).
  3. Identifying and specifying data requirements – Data requirements are defined at various levels. At a high level, data requirements for a data store or repository are usually first outlined at a subject area level, such as ‘Clients,’ ‘Services,’ ‘Products,’ etc. You’ll be offering input such as ‘My business line needs accurate Product reference data,’ ‘We need real-time Client information updates,’ etc.

At the point of logical design, when the organization is preparing to enhance, build or acquire a data store or repository, the data scope must be specified in detail. Specific data requirements may be described as detailed requirements statements in a requirements document, such as ‘A Client may have one or more addresses,’ ‘The data store must allow capture of an apartment number, post office box, or suite number,’ ‘A street address shall contain both the street number and the street name.’ However, it is more efficient to organize data requirements by creating a logical data model, which diagrams and captures metadata about ‘entity types’ (persons, places, things, concepts, or events of interest to the business), ‘attributes’ (facts about an entity type, such as ‘Street Address, Zip Code, Effective Date,’ etc.), and data relationships (e.g., a Client ‘may be located at’ one or more addresses). Usually data architects create data models, but they must be created with business input and reviewed and validated by the business, in an iterative manner. The key player on the business side is, naturally, the data steward.[2] The data model should apply approved business terms as the starting point for naming.
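To make the modeling vocabulary concrete, here is a minimal Python sketch of how the three sample requirement statements above might translate into entity types, attributes, and a relationship. It is only an illustration: the class names, field names, and sample values are my assumptions, and treating apartment, post office box, and suite as a single optional `unit` attribute is a simplification. A real logical data model would be diagrammed by a data architect in a modeling tool and validated with the business.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Address:
    """Entity type: a place where a Client may be located (illustrative)."""
    street_number: str            # attribute required by the street-address rule
    street_name: str              # attribute required by the street-address rule
    zip_code: str
    effective_date: date
    unit: Optional[str] = None    # assumption: apartment, PO box, or suite in one field

    def __post_init__(self) -> None:
        # Requirement: a street address shall contain both number and name.
        if not self.street_number.strip() or not self.street_name.strip():
            raise ValueError("Street address needs both a street number and a street name")


@dataclass
class Client:
    """Entity type: a person or organization that patronizes the firm."""
    client_id: str
    client_name: str
    # Relationship: a Client 'may be located at' one or more addresses.
    addresses: List[Address] = field(default_factory=list)

    def add_address(self, address: Address) -> None:
        self.addresses.append(address)


# Hypothetical sample data
client = Client(client_id="C-001", client_name="Example Co.")
client.add_address(Address("12", "Main Street", "20001", date(2024, 1, 1), unit="Suite 400"))
client.add_address(Address("98", "Oak Avenue", "20002", date(2024, 6, 1)))
print(f"{client.client_name} has {len(client.addresses)} addresses")
```

Even a toy like this gives a data steward something to review: is an address without a street number ever valid for my business line, and is one `unit` field enough?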

  4. Determining what information is needed about data assets – Metadata – documented knowledge about the data assets – is typically captured for specific implemented solutions, and is often not organized in a centralized manner to make it accessible. The typical organization has bits and pieces of information about its data layer, which are not integrated and are stored in multiple locations. Well-organized and accessible business, technical, and operational metadata is very important for data repositories, predictive analytics, data store redesigns, custom-to-COTS migrations, and cloud data storage. The data steward’s job is to discover and communicate to the governance group what information the business line needs, thinking through business scenarios carefully, without making premature assumptions. For example, sometimes technical metadata is critical for a business user in making business decisions, e.g., knowing that there was a three-minute gap in stock transactions received during the market day.

Defining the metadata needed by the business, helping to create it, and defining access and retrieval requirements may occupy roughly 30-40% of a data steward’s focus over time, particularly if the organization is implementing a metadata repository. One application of metadata that is vitally important is the determination of authoritative data sources. For instance, if there are eight systems that contain Client data, how does the organization determine which source is the best for the new data warehouse?

  5. Advocating for data quality and defining quality rules – If the bedrock of data management can be reduced to three elements, they are Architecture, Governance, and Quality. Improvements to data quality directly benefit the business, enhancing business activities and decisions, data sharing and analytics. There are two primary areas of knowledge that the data steward needs to master for effective representation of the business line:
    • Business needs – Working closely with the business line, the data steward needs to represent high-level needs for accurate, accessible data. For example: ‘Sales needs to access the latest Client information updates in near-real time;’ ‘Patient Services needs to ensure that there are no duplicate patient records;’ or ‘The Client mailing address must be validated against USPS deliverable addresses.’ This requires persistence, effective communication with business peers, and comprehensive thinking to unearth everything that’s important for the business to excel at executing its function.
    • Quality rules – If the organization is building a new data store, creating business terms, populating a metadata repository, or responding to specific data defects, the data steward will need to be engaged in developing quality rules. Some rules are inherent in the logical data model, e.g., ‘the database will keep a history of previous addresses with effective and expiration dates.’ Others will be added as system processes that run when a record is added or modified (edit checks, validation rules). The best way to tackle developing detailed quality rules is to apply data quality dimensions: criteria against which quality can be expressed, and then measured. Organizations should survey published sets of dimensions, and adopt or adapt them. The DMM, page 70, Data Quality Strategy, offers a useful starter set:
      • Accuracy – criteria related to affinity with original intent, veracity as compared to an authoritative source, and measurement precision
      • Completeness – criteria related to the availability of required data attributes
      • Coverage – criteria related to the availability of required data records
      • Conformity – criteria related to alignment of content with required standards
      • Consistency – criteria related to compliance with required patterns and uniformity rules
      • Duplication (Uniqueness) – criteria related to redundancy of records or attributes
      • Integrity – criteria related to the accuracy of data relationships (parent and child linkage)
      • Timeliness – criteria related to the currency of content and availability to be used when needed

Some organizations add ‘Accessibility’ as another dimension, and there are other variations. Application of dimensions takes some practice. Ideally, each data attribute of importance to the business line should be considered in the light of its threshold (what level of quality is acceptable for the business) and targets (what level of quality is desirable). For example, a target might be ‘The Client ID / Client Name combination must be 100% unique.’ Becoming familiar with dimensions, and digging into the details of their application across the data set(s) of interest to the data steward’s business line, is a valued skill that will help realize direct business benefits – better data, better decisions. Working with IT and governance peers, you’ll acquire deep knowledge of the data and appreciate its intricacies.
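As a rough illustration of how a dimension becomes something measurable, the Python sketch below scores a small, made-up set of Client records on two of the dimensions (Completeness and Uniqueness) and compares each score with a threshold and a target. The record layout, the scoring formulas, and the threshold values are all assumptions chosen for the example; the DMM does not prescribe them, and your organization’s profiling tools will have their own conventions.

```python
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[str]]

# Hypothetical Client records; the last one duplicates the first.
clients: List[Record] = [
    {"client_id": "C-001", "client_name": "Acme Corp", "zip_code": "20001"},
    {"client_id": "C-002", "client_name": "Birch LLC", "zip_code": None},
    {"client_id": "C-003", "client_name": "Cedar Inc", "zip_code": "20003"},
    {"client_id": "C-001", "client_name": "Acme Corp", "zip_code": "20001"},
]


def completeness(records: Sequence[Record], attribute: str) -> float:
    """Fraction of records where the attribute is populated (non-empty input assumed)."""
    populated = sum(1 for r in records if r.get(attribute) not in (None, ""))
    return populated / len(records)


def uniqueness(records: Sequence[Record], attributes: Sequence[str]) -> float:
    """Ratio of distinct attribute combinations to total records (non-empty input assumed)."""
    distinct = {tuple(r.get(a) for a in attributes) for r in records}
    return len(distinct) / len(records)


def evaluate(score: float, threshold: float, target: float) -> str:
    """Compare a score with the acceptable level (threshold) and the desired level (target)."""
    if score >= target:
        return "meets target"
    if score >= threshold:
        return "acceptable"
    return "below threshold"


zip_score = completeness(clients, "zip_code")
id_name_score = uniqueness(clients, ["client_id", "client_name"])

# Assumed business expectations: Zip Code at least 70% populated, ideally 95%;
# the Client ID / Client Name combination must be 100% unique.
print("Zip Code completeness:", zip_score, evaluate(zip_score, 0.70, 0.95))
print("Client ID/Name uniqueness:", id_name_score, evaluate(id_name_score, 1.0, 1.0))
```

The value of the exercise is less in the code than in the conversation it forces: someone from the business has to say what ‘acceptable’ and ‘desirable’ actually mean for each attribute.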

  6. Establishing relative data importance – Whether for the purpose of regulation, competitive advantage, internal dependencies, or productization (data monetization), organizations often need to determine which data is the most important (aka Critical Data Elements).

For example, let’s say that Sales really needs to know the characteristics of customers likely to purchase their new product, which has high revenue potential. Information to consider might include: how long the individual or organization has been a customer (Create Date); total past sales to that customer (aggregation – Total Customer Sale Amount, derived by adding the Sale Amount of previous sales); products purchased (aggregation – the set of Product Types previously purchased by that customer); how many times they have purchased (Customer Sales Number); the NAICS code (industry, if an organization); and location (City, State, Zip Code). Putting all that information together, Sales could construct a more targeted set of potential customers, which they could continue to refine. In this example, the data elements mentioned would be very important for Sales.

Not all data elements in a data set of interest are critical. If my organization sells cosmetics, the fact that Eye Color = Green may be important; but if Eye Color is just a part of a Client Information record and I sell bulldozers, it’s not important for Sales. Considering the data set under analysis, with the goal of determining which data elements can truly be designated ‘Critical,’ is an advanced data management skill. It requires understanding metadata, verification that business terms mean what they imply, analysis of upstream and downstream dependencies, scenario analysis to tease out the details, and finally, gaining agreement with other business lines for shared data. For the data steward, this is an opportunity to really show your stuff and have a direct role in a high-value business initiative.
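The aggregations in the Sales example under item 6 can be sketched in a few lines of Python. This is a hedged illustration only: the sales records, the `CustomerProfile` structure, and the ‘repeat buyer’ filter are hypothetical, standing in for whatever your data warehouse and Sales team actually define.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class CustomerProfile:
    total_sale_amount: float = 0.0                          # Total Customer Sale Amount
    sales_number: int = 0                                   # Customer Sales Number
    product_types: Set[str] = field(default_factory=set)   # Product Types purchased


# Hypothetical sales records: (customer_id, product_type, sale_amount)
sales = [
    ("CUST-1", "Excavator", 120000.0),
    ("CUST-1", "Loader", 80000.0),
    ("CUST-2", "Excavator", 95000.0),
    ("CUST-1", "Excavator", 110000.0),
]


def build_profiles(records: Iterable[Tuple[str, str, float]]) -> Dict[str, CustomerProfile]:
    """Derive the aggregated data elements for each customer from individual sales."""
    profiles: Dict[str, CustomerProfile] = defaultdict(CustomerProfile)
    for customer_id, product_type, amount in records:
        profile = profiles[customer_id]
        profile.total_sale_amount += amount
        profile.sales_number += 1
        profile.product_types.add(product_type)
    return dict(profiles)


profiles = build_profiles(sales)

# Assumed targeting rule: customers who have purchased at least twice.
candidates = sorted(cid for cid, p in profiles.items() if p.sales_number >= 2)
print(candidates)
```

Notice that every derived element depends on Sale Amount, Product Type, and the customer identifier being trustworthy at the source, which is exactly why those elements are candidates for the ‘Critical’ designation.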

The skills descriptions above are not exhaustive; there are many other skills, including soft skills, that also contribute to your qualifications and expertise. Your data steward role may call upon you to participate in:

    • Identifying and sequencing tasks
    • Developing goals, objectives, and data management metrics
    • Communicating and negotiating effectively
    • Fostering change management, and
    • Evaluating and helping to create policies, processes, and standards

If you set yourself the goal of continuous growth and accumulating data management skills, you become, in essence, the brain trust of the business; definitely included in the category of ‘Essential Staff.’ So to the data steward, the dedicated hero of the data layer, I encourage you to grow in knowledge and confidence through the approaches and analytical skills outlined above. As your organization continues its efforts to improve and optimize its data, you are right at the center of its success. May you enjoy the journey!


References

[1] The DMM has 414 functional practice statements – over 100 of those involve governance activities and decisions. Governance permeates the DMM; the practices are embedded in the topic areas in which they are applied, for example, Metadata Management, Data Quality Assessment, Data Integration, etc.

[2] Data modeling is an engaging thought process. It is MOST helpful for a data steward to learn to understand a data model, in order to verify whether essential business data is missing, whether the data is represented accurately, whether the data relationships are correct, whether the values and ranges specified for attributes are correct, etc. I recommend the book “Data Modeling Made Simple: A Practical Guide for Business and IT Professionals” by Steve Hoberman.

About Melanie Mecca

Melanie Mecca, Director of Data Management Products and Services, CMMI Institute, led development of the Data Management Maturity (DMM) SM Model. Her team created a highly interactive method for assessing an organization’s capabilities against the DMM, and she has led numerous assessments for organizations in the financial, Federal, and technology industries. She directed creation of the Building EDM Capabilities, Mastering EDM Capabilities, and Enterprise Data Management Expert (EDME) courses leading to DMM certification. In 30+ years solving enterprise data challenges, Ms. Mecca has architected and implemented data management programs and projects, data strategies and architectures, and designed enterprise data services. She is an active presenter of classes, seminars, webinars and case studies, and is a strong advocate for data management education, with a passion for assisting organizations to realize business value from their data management programs.
