Since January 2012 I have had the privilege of conducting a monthly Data Governance and Data Stewardship webinar series with Dataversity. The series is called Real-World Data Governance (#RWDG) and it takes place on the third Thursday of every month. I am thankful for the outstanding attendance and the quality of the questions and answers that come out of each webinar. Visit with me this month when I talk about Do-It-Yourself and Purchased Data Governance Tools.
This special feature contains the “Best of #RWDG Q&A” from the last year of these webinars. I hope to see you at future webinars. Register today.
What follows is a taste of the questions I receive and the answers that were provided.
May 2016 – Data Governance Framework for Smart Data w/Tony Sarris
Do you think data governance should extend to checking for bias; is there a need for governance around the analysis / modeling process?
The simple answer to that question is yes. But truthfully a lot depends on the degree to which your organization decides to govern the analysis itself. At a minimum it makes sense to assess for bias before exposing results. Assessing results for bias requires knowledgeable resources in this discipline. These people are the data scientists.
What is the name for the aspect that Data Scientists engage in to not identify data sources – is it something like “non-referential” or “non-identifiable”?
Wikipedia has a pretty good definition of this concept, which is called differential privacy. Here is a data science related blog post that explains the concept and gives an example. “Simpler” in the title of the article is a relative term. It’s geared to data scientists and others with a fairly deep math or statistics background. The paper I mentioned during the webinar was co-authored by Zachary Chase Lipton, a colleague of mine who is at the University of California San Diego. He’s an expert on the subject and the person who introduced me to the concept and its application to machine learning.
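For readers who would like a feel for the idea before tackling the math, here is a minimal Python sketch of one common building block of differential privacy, the Laplace mechanism, applied to a simple count. It is an illustration only and is not taken from the blog post or the paper mentioned above. The function names, the sample data, and the choice of epsilon are all assumptions made for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-centred Laplace distribution.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(flags: list[bool], epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the True values in `flags`.

    Adding or removing one person changes a count by at most 1
    (sensitivity = 1), so the noise scale used here is 1 / epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(flags)
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Calling private_count(flags, epsilon=1.0, rng=random.Random()) returns the real count plus a small random offset, so no single individual's presence in the data can be confidently inferred from the published number. Choosing epsilon is a governance decision as much as a technical one.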
April 2016 – How to Implement Data Governance Best Practice
Can you give some examples of how to measure Data Governance success?
There are two ways that I typically measure Data Governance success. The first and more immediate approach is to measure how well Data Governance has been accepted by the organization. Second, I suggest that you measure the business value brought to the organization by DG. The second approach takes more time, requires benchmarks, and should be tempered to make the results more believable. Measures of business value may include issues resolved, issues collected, the value of resolving these issues, reduction in labor, efficiency and effectiveness of governed processes, and so on. I hope this helps.
The trouble I am running into is gaining focus from Data Stewards for Data Governance activities (as they have a normal day job to perform). Everyone knows that DG is an important thing but it keeps falling to the bottom priority. What can I do to prevent this?
Your question is very tricky to answer. I have a lot of questions for you. What is it that you are asking the Stewards to do? Are these activities part of an enterprise initiative/project or is it some activity that Data Governance (on its own) is asking the stewards to perform? In the former case, where it is a planned enterprise activity, the appropriate steward participation should not be optional (since it is part of a planned and funded project). In the latter case, unless stewards are looking for more work or have the necessary bandwidth to participate in your activities, or the stewards are told directly by their management that they are required to work on these governance-specific activities, expect these activities to fall to the bottom of their priorities. Suggestion: govern the activities that are part of projects and initiatives that are a priority to the people funding them.
How many resources should be part of a DG office? Would you need representation from both business and IT liaisons as part of a DG office?
It is difficult to provide a definitive answer to this question. First of all, I am not a believer in the need for a DGO per se. The use of a DGO requires significant funding and provides a potentially unnecessary level of bureaucracy. There are other people that subscribe to using a DGO that may be able to answer this question better.
That being said, I will answer “how many resources should be a part of Data Governance leadership?” and that answer also depends. I have seen data governance teams with as many as 14 people (many consultants) and I have seen teams that consist of 1/8 of one person’s time. Obviously program development and deployment will move quicker if there are several resources rather than one, but I think a data governance team of 14 people is overkill for most organizations.
How does Data Governance differ from Information Governance? How do we socialize this message across the enterprise if you have both initiatives active?
There may be a difference between these two but that is not a certainty. It all depends on how you define “data” and “information.” In my opinion, Information = Data + Metadata. But that is not to say that Data Governance does not cover Metadata. In my definition, Data Governance is the execution and enforcement of authority over the management of data and data-related assets. If you replace the term “data” with “information,” the definition remains the same.
To me they can be considered interchangeable.
March 2016 – The Data Model as a Data Governance Artifact
Do you insert Data Stewardship in the governance of data definitions developed during the data modeling process? How can we effectively manage Business Terms definitions vs. Data Dictionary definitions?
You do not insert stewardship into the governance of data modeling definitions. In fact, you do the exact opposite. The people involved in providing data dictionary and business term definitions are typically data definition stewards because they have the responsibility (possibly formal accountability) of providing the business term or business data definitions.
If your organization maintains the three levels of metadata that I spoke about in the webinar, the business terms and the data dictionary definitions are not the same things, so they can effectively be managed through separate change management processes.
How are organizations handling data modeling when they purchase large systems like electronic health records from a vendor? Are organizations developing their own data models and comparing those to the vendors? Are the vendors allowing inspection of their data models?
Organizations are requesting data models for the investments in the large systems. They may not always get the exact models they request but it makes sense to ask the vendors for whatever information you can about the data that will be housed in their system.
Some vendors will claim that their database design is proprietary information. Others will promote the fact that their database design is open or available. My suggestion is that if you have any interest in accessing the data other than through the application itself, you will want to know at a minimum the design of how the physical data is stored.
February 2016 – Writing Data Governance Policies & Procedures
What top 2-3 techniques do you use to be less invasive?
I do not know if these are the top 2-3 but I suggest these three techniques to stay Non-Invasive:
- Identify or, preferably, recognize people into their Data Governance roles rather than assigning people to the roles. If accountabilities are based on people’s relationships to data (they are!) then we do not have to assign Stewards to be Stewards. The approach calls for formalizing accountabilities based on existing responsibilities. This is non-invasive and less threatening.
- Base your data governance awareness on orientation, on-boarding, and on-going communications tailored to specific audiences and their willingness to accept what is being shared.
- Apply data governance to existing processes first rather than labeling all governed processes as “Data Governance Processes.” You probably do not want people pointing at Data Governance and saying that the reason you are following the process is because of Data Governance. The reason you are following the process or procedure is the reason why the process was defined in the first place … “Granting access to data process,” “issue resolution process,” “project management process,” etc.
What are some alternatives to termination for enforcing policy when it is violated? We have leadership support but not to that extent. The company culture is very bottom-up and Senior Management is reluctant to put down the hammer.
I have a client in this EXACT situation. Companies cannot be self-governed by the masses. There has to be SOME level of formal governance and management.
Termination is NOT the only option and should typically be used as a last resort – for repeat offenders who keep breaking the rules. Let’s use the protection of PII data – or lack thereof – as an example of how staff may go against policy.
As an example, let’s say that the PII Handling Rule for External Data Distribution says that we cannot send customer information to third-party suppliers unless the customer gives written permission to do so. However, a member of your staff is caught doing that and reprimanded while being told (again) why it is against the law and corporate policy to share that data. The person refuses to change their behavior. They get poor reviews.
This person is seen as an employee that breaks the defined rules. Same as a person that wears jeans to work when it is not permitted, but with more risk to the organization. This person can only get away with that behavior for so long before they pay the price – loss of pay, suspension, demotion, or ultimately the loss of their job.
The real difference between your non-invasive approach to Data Governance and an invasive approach to Data Governance seems to be money or the investment in Data Governance technology. Am I correct in my thinking?
Are you correct? No, no, no, no!
The REAL difference between the Non-Invasive Data Governance™ approach and a more invasive approach is the technique that is used to recognize and engage data stewards across the enterprise.
The non-invasive nature of the approach is applied in:
- How the program is sold (“we are already governing data but we are doing it informally, inefficiently, and ineffectively”)
- How the program approach focuses on formalizing accountability rather than handing accountability to people as something new
- How governance is applied to existing or new processes defined by the organization
Nothing about money or the investment in the technology here. Technology may be beneficial in the governance of people’s behavior when applied at the appropriate time, no matter what approach you follow.
January 2016 – Metadata to Support Data Governance
Can you set up a metadata initiative without the support of data SMEs in an organization?
Certainly. You can set up a metadata initiative in a vacuum but I am not certain it is in your best interest to do this. Oftentimes metadata initiatives focus on improving understanding of the data through the development of glossaries, vocabularies and dictionary definitions of the most meaningful data to the organization. It makes sense to lean on the data SMEs to provide the definitions. Without their availability or interest, the definition, production, and use of the data in their subject area become more difficult to develop and record.
We have a lot of institutional resistance on the part of management to implementing data governance. Do you have any recommendations on how to overcome this?
First of all, you are not alone. My first suggestion is to fully understand why there is institutional resistance. Is it funding? Is it concern that governance will be difficult or interfere with how people handle data? Is it an “ownership” issue? I’d love to speak with you in more detail about ANY of these or other reasons.
In many situations, there is institutional resistance because the institution has not been given the appropriate information about the options in front of them for how to approach Data Governance. Most people think that Data Governance has to be expensive, difficult, and that it will interfere with how business is done, when the truth is that there are alternative approaches that address the difficulties many organizations fear. The Non-Invasive approach to data governance, when explained and implemented properly, oftentimes eliminates or greatly reduces the concerns associated with people’s resistance.
The fact is that organizations must find a way to govern their data. They must protect the data to its fullest and get the most value from that data to keep up with or stay ahead of their competition. The way these organizations govern their data will need to be auditable and the results will become public knowledge. It will no longer be a matter of resisting the inevitable. It will become a matter of the best way to govern your data. I say stick with being non-invasive to combat the resistance.
Any recommendations for the type of metrics we can gather to measure the success of our metadata program?
Some examples of metrics for metadata include the amount of business metadata that is collected, the validity of the metadata, the number of uses of the metadata, the number of people with access to metadata … The list can go on and on.
You can measure the volume of metadata, value from the metadata, availability of metadata, business areas or business systems with metadata recorded, data end user satisfaction with available metadata … Please reach out to me if you want to discuss this further.
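To make a few of these measures concrete, here is a minimal Python sketch that computes volume, coverage, and usage figures from a simple inventory of data elements. It is only one possible way to do it. The DataElement structure, its field names, and the metric names are hypothetical and would need to be mapped to whatever your metadata tool or spreadsheet actually records.

```python
from dataclasses import dataclass


@dataclass
class DataElement:
    """One entry in a (hypothetical) metadata inventory."""
    name: str
    business_area: str
    definition: str = ""   # business definition; empty if not yet recorded
    lookups: int = 0       # number of times this metadata was viewed


def metadata_metrics(elements: list[DataElement]) -> dict[str, float]:
    """Summarise volume, coverage, and usage of recorded metadata."""
    total = len(elements)
    defined = [e for e in elements if e.definition.strip()]
    areas = {e.business_area for e in elements}
    areas_with_metadata = {e.business_area for e in defined}
    return {
        # Volume: how much metadata exists at all.
        "elements_total": total,
        "elements_defined": len(defined),
        # Coverage: share of elements and business areas with definitions.
        "definition_coverage_pct": 100.0 * len(defined) / total if total else 0.0,
        "areas_with_metadata_pct": (
            100.0 * len(areas_with_metadata) / len(areas) if areas else 0.0
        ),
        # Usage: how often people actually consulted the metadata.
        "metadata_lookups": sum(e.lookups for e in elements),
    }
```

Run against the same inventory each month, figures like these give you a trend line rather than a single number, which is usually what management wants to see.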
Which role (business or IT) should lead metadata governance?
The answer to this question depends on your organization. Usually the person that heads up a metadata initiative or has the responsibility to collect and make available metadata is fairly technical and thus they usually come out of an IT area. However, the people that provide information about what metadata is most valuable, how they will use the metadata, and the metadata in terms of definitions and handling rules (as examples), are business people. Metadata governance has to be a partnership between business and IT.
How would you start when executives think Data Governance isn’t important but Metadata is?
Interesting conundrum. I can’t say that I have seen that before but I am guessing there are other organizations like yours. I would start by making certain that the Executives understand the relationship between Data Governance and Metadata management. I would then use the industry argument of “you can’t have one without the other” and explain to them how the metadata has to be governed too.
December 2015 – Data Governance and the Internet of Things
Are there any negative consequences to governing IoT data wrong or not governing IoT data?
I believe there are a lot of negative consequences to leaving IoT data (as I described in the webinar) ungoverned or in governing IoT data in such a way that will not work for your organization. The negative consequences of ungoverned IoT data in the business will be the same as the negative consequences of ungoverned non-IoT data. Personally we should all be concerned with the negative consequences of personal IoT data being ungoverned. The results of ungoverned IoT business or personal data include, but are not limited to, unprotected and inaccurate data. Those results may be less apparent for IoT data because of the nature of the beast.
November 2015 – Agile Data Governance
How is Agile Data Governance different from XP (Extreme Programming) Data Governance?
Extreme Programming (XP) is one of several popular Agile Processes. In XP, checkpoints are incorporated to improve quality as an extension to traditional Agile methods. XP Governance is the execution and enforcement of authority to deliver software in an XP way.
I am not familiar with the term XP Governance. I believe that XP Governance differs from Agile Data Governance in exactly the same way XP differs from Agile in that governance is applied to the checkpoints to further improve quality.
I’ve seen agile developers build database schemas that mimic report and input screen mock-ups, complicating ETL data transformations between the two. How can we avoid this pitfall? Is one solution a bit of logical data modelling?
Data management best practices tell us that we should build database schemas based on business data requirements and proven data modeling methods. Best practices do not tell us to build database schemas that mimic report and input screens, and yes, that would complicate much more than the ETL.
Data modeling to design and build data structures makes a lot more sense. Introducing data modelers into the agile methods will take some work. Please refer to what I said in the webinar regarding ways to get Agile teams to align with data management interests.
October 2015 – Govern Metadata: Vocabulary, Dictionaries and Data
Is there a concept of certification of data lineage (technical and business metadata)?
There is a concept of certification and there is a concept of validation. Validation is the act of assuring quality and certification is the act of making certain that you are doing it correctly and with the proper knowledge. Lineage as “where the data came from” most definitely needs to be validated and so does the quality and value of the business and technical metadata. The developers and architects must be certified in the discipline of their trade and end users must be certified in the data and metadata usage.
At my company, our Enterprise Architects have selected two different metadata tools, one for “data” and one for “information.” When it comes to the business terms and definitions, I do not see any distinction between data and information. Thoughts?
Information is data with the context (read metadata) attached. The data “1500” means nothing by itself. Is it an address, a dollar amount, a quantity? It is hard to make information from just the data. I do not see the distinction when it comes to tools. Share with me your architects’ definition of information versus data and I will be glad to answer this question better.
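The “1500” example can be sketched in a few lines of Python. This is an illustration of the idea only. The Metadata structure, its fields, and the sample business terms are made up for the example and are not a recommendation for how a metadata tool should store its content.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metadata:
    """Hypothetical context attached to a raw value."""
    business_term: str
    unit: str = ""


def describe(value: str, meta: Optional[Metadata] = None) -> str:
    """Turn a raw value into information by attaching its metadata."""
    if meta is None:
        return f"{value} (meaning unknown without metadata)"
    unit = f" {meta.unit}" if meta.unit else ""
    return f"{meta.business_term}: {value}{unit}"


print(describe("1500"))
# prints: 1500 (meaning unknown without metadata)
print(describe("1500", Metadata("Order amount", "USD")))
# prints: Order amount: 1500 USD
print(describe("1500", Metadata("Street number")))
# prints: Street number: 1500
```

The same raw value becomes three different pieces of information depending on the context attached to it, which is why I have trouble seeing why the two would need separate tools.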
I suppose an enterprise business data model will be a great start to building the business semantics. But what if there isn’t one?
Well … You could build or purchase one. I suggest build over purchase. Building an enterprise model can be a long and complicated process and requires cooperation and collaboration. It also requires that you have a way to resolve differences of opinion (read Governance), a place to store the business semantics (read metadata tool) and a process to make certain that the semantics are managed (read Data Governance again).
Do you find that organizations implement Metadata all at once (vocabulary, data dictionary, and data), or do they usually start with just one and move to the next? What is the success rate of starting with / implementing the vocabulary first?
Great questions. Some organizations build their metadata incrementally while others bite off a bigger portion of the elephant. I believe it is always best to start small, demonstrate value, learn from the experience and then expand. I cannot quote you a success rate percentage but I will tell you that vocabulary is one of the easier places to start. Want to know why? Let’s continue the conversation some time.
Any tips on business justification strategies for investment in metadata management?
It is very difficult to manage anything or to improve the quality of anything if you do not have information about that thing. This definitely holds true for data. Data cannot be governed if you do not have metadata.
You may want to start with a philosophy that you already have metadata collected in a lot of the tools in your environment. A lot of that metadata is only available for the metadata producers. That limits the value that can come from the metadata. Metadata management addresses that.
The metadata is like the gas your car runs on. It has been proven through repeated studies that without the metadata, the quality of data is lower, data analytics are not as comprehensive, business understanding of the data is complicated (to be nice), it takes longer to find the data you need, people get different answers from the data depending on who they ask … Need I say more?
September 2015 – Data Modeling is Data Governance w/David Hay
Is Data Governance an Art or a Science?
I would say the “art” of Data Governance is in the design of the program to match the organization’s culture and willingness to govern data through formalized accountability. The “science” of Data Governance is in the tactical deployment of the program including gaining Senior Leadership’s support, sponsorship and understanding, incorporating Data Governance into all projects, resolving “unsolvable” problems, and protecting the heck out of the data.
How does governed data address the values, behaviors, and politics of the business?
Governed data demonstrates that the definition, production, and use of the data are important to the company. The Executive Board requires that the data is governed because they require good, timely, high quality data to make the decisions that guide the company. Staff requires governed data to do their jobs efficiently and effectively. Customers require that you govern and therefore protect their data. Ungoverned data assures the company that they will spend more and get less from their data. Not always the best option.
August 2015 – Data Governance & Data Steward Certification
Is there a Data Governance component to the Data Management Maturity (DMM) model? Do you find many organizations pursuing DMM?
There is a data governance component to the DMM model that is shared by the CMMI Institute (CMMI stands for Capability Maturity Model Integration) and this component is linked to almost every other component of their model. The DMM is gaining popularity but has yet to become commonplace among organizations.
For someone who runs the data governance office, what is the best certification?
My suggestion is to gain as much knowledge and education as possible through a variety of resources so that you will be exposed to more than a single philosophy and approach. There are books, web-sites, conferences, professional organizations, etc.
If certification is still required, go with someone whose approach aligns to your way of thinking and the culture of your organization. Make certain your organization sees the value of the certification before putting your eggs in that basket.
Is Data Governance Certification starting to pop up as a desired qualification during the hiring process? Are companies starting to hire certified Data Governance professionals specifically?
Data Governance Certification has not popped up on my radar yet as a requirement for clients or potential clients. I have worked with organizations that have sent their people to education and conferences for improving data governance knowledge.
This does not mean there aren’t organizations that require Data Governance Certification. My points in the webinar regarding what to look for in potential certification should be considered before selecting a certifying body.
What risk is there, if any, if the Data Stewards are not certified?
This is a great question. My honest opinion is the only risk of not certifying your stewards is that the stewards will be less knowledgeable about your data, your rules, your sources, and the data’s meaning. That may seem pretty risky but you may find it difficult to select an external organization that can certify the stewards in your organization. Internal certifications are a better bet.
July 2015 – Big Data and BI Analytics Require Data Governance
Do you think it is a good idea to attempt governance of the data lakes, or only data that was pulled out for a particular business reason?
I believe the idea of the data lake is to have a place to keep the data in its rawest form and format. Governing what gets thrown in the lake may be possible; however, the governance of what’s in the lake may not add much value. It makes better sense to make certain that what is pulled from the data lake for a particular business reason gets governed.
We have so much governance set up in healthcare data. How do you envision a more streamlined version of governance that can be used to govern healthcare data?
You are correct. There are a lot of rules when it comes to governing healthcare data. That does not necessarily mean that all healthcare institutions govern well to those rules.
To streamline governance in a healthcare environment and with healthcare data requires that governance is operationalized and institutionalized as much as possible. When new data is defined, processes must be put in place and followed to assure high quality data definition. When new data is created or data is updated or eliminated, processes must be put in place and followed to assure that the new or changed data is accurate and timely. When data is used, the rules associated with using the data must be communicated, institutionalized, and enforced throughout the data usage process. This is the simplest way to streamline data governance for healthcare (or any other) data.
Thank you for taking the time to walk back through my last dozen RWDG webinars. I hope the Q&A was helpful and that we will see you next time in the Real-World Data Governance webinar series with Dataversity.