Data stewardship is an important role in today’s data-driven business organizations. Data stewards facilitate consensus about data definitions, quality, and usage. They guide activities to complete metadata, improve data quality, and ensure regulatory compliance. Stewards are also responsible for making recommendations about data access, security, distribution, retention, archiving, and disposal.
Unfortunately, typical data stewardship practices often don’t measure up to the importance of the role. All too frequently, data stewards are identified and assigned responsibility without the time and training to do the job well. When we designate busy people as data stewards without making time for them to do stewardship work, we should not expect high-impact results. Nor should we expect success without training stewards about roles, relationships, and accountabilities related to data.
Along with time and training, data stewards need tools that help them to do their work. This article offers a simple tool to help diagnose data problems and find the path from symptoms to causes, and from causes to solutions. The tables below identify common symptoms of data challenges that data stewards frequently encounter, grouped by ten core data management processes – naming data, defining data, designing data, managing quality, integrating data, accessing data, managing metadata, administering databases, managing systems, and governing data. Common causes of and solutions to data problems are identified for each process.
To use the tool, begin by browsing the index of common symptoms to find those related to your data management issues. Then use the associated numbers to find each symptom in the process tables. Note that a single symptom is often listed in several of the process tables. Explore the processes, causes, and solutions to develop problem-solving ideas and plans.
Index: Common Symptoms of Data Problems
application integration difficulty | 47 | inefficient business analysis | 26, 39 |
business rule violations in data | 31, 40, 90 | insufficient data storage capacity | 72 |
can’t access needed data | 52 | lack of data definitions | 10 |
complex system interfaces | 48, 81 | lack of trust in data | 32 |
conflicting documentation | 64 | large change request backlog | 20 |
confusing abbreviations | 5 | limited data sharing | 49 |
confusing documentation | 67 | lost data can’t be recovered | 80 |
corrupted data can’t be repaired | 79 | meaningless data definitions | 12 |
data consolidation difficulties | 98 | meaningless data names | 1 |
data not available when needed | 58 | missing documentation | 62 |
data ownership conflicts | 99 | misunderstood data | 15, 69 |
data privacy compromised | 53, 55, 89, 93 | multiple names & aliases | 6 |
data retention/disposal uncertainty | 97 | need for data standardization | 100 |
data security compromised | 54, 88, 92 | needed access not authorized | 56 |
data-related compliance violations | 94 | needed features not implemented | 78 |
difficult-to-use data | 28, 37 | non-unique data names | 2 |
disaster recovery uncertainties | 95 | obsolete data definitions | 13 |
enterprise reporting difficulty | 46 | obsolete permissions still active | 57 |
excessive database downtime | 76 | outdated documentation | 66 |
failure to meet business needs | 21 | overlapping and conflicting data | 44 |
hard to find data definitions | 14 | poor application performance | 83 |
hard to find documentation | 65 | poor data access performance | 60 |
hard to find needed data | 51 | poor data quality | 86, 91 |
hard-to-identify data | 8 | poor database performance | 30 |
hard-to-navigate databases | 29, 59 | poor query performance | 74 |
high level of data disparity | 9, 17, 25, 43, 70, 85 | poor structural integrity | 19, 33 |
high level of data redundancy | 18, 27, 71, 84 | poor update performance | 75 |
inadequate metadata | 87 | shadow databases | 41 |
inappropriate use of data | 16 | shadow systems & databases | 23 |
incomplete data | 36 | spreadsheet proliferation | 22, 42, 50, 61 |
incomplete documentation | 63 | structureless data names | 4 |
incorrect data | 34 | territorialism inhibits data sharing | 96 |
incorrect data definitions | 11 | unanticipated growth problems | 73 |
incorrect data names | 3 | unnamed data components | 7 |
incorrect reporting | 24, 38 | unreliable database connections | 77 |
| | untimely data | 35 |
| | untraceable data | 45, 68, 82 |
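The lookup described above (symptom, then numbers, then process tables) can be sketched in a few lines of Python. This is only an illustration, not part of the original tool: the names `PROCESS_BOUNDS`, `SYMPTOM_INDEX`, `process_for`, and `processes_for_symptom` are hypothetical, the index holds just three entries copied from the list above, and the number ranges are read off the process tables that follow.

```python
from bisect import bisect_left

# Last symptom number in each process table, in document order
# (ranges taken from the tables in this article).
PROCESS_BOUNDS = [
    (9, "Naming Data"),
    (18, "Defining Data"),
    (31, "Designing Data"),
    (42, "Managing Data Quality"),
    (50, "Integrating Data"),
    (61, "Accessing Data"),
    (71, "Managing Metadata"),
    (80, "Administering Databases"),
    (90, "Managing Systems"),
    (100, "Governing Data"),
]

# Small excerpt of the symptom index; a full version would hold every entry.
SYMPTOM_INDEX = {
    "spreadsheet proliferation": [22, 42, 50, 61],
    "lack of trust in data": [32],
    "data security compromised": [54, 88, 92],
}


def process_for(number: int) -> str:
    """Return the process table that contains the given symptom number."""
    if not 1 <= number <= 100:
        raise ValueError(f"symptom number out of range: {number}")
    last_numbers = [last for last, _ in PROCESS_BOUNDS]
    # The first table whose last number is >= the symptom number holds it.
    return PROCESS_BOUNDS[bisect_left(last_numbers, number)][1]


def processes_for_symptom(symptom: str) -> list[str]:
    """Return every process table in which the symptom is listed."""
    return [process_for(n) for n in SYMPTOM_INDEX[symptom]]


if __name__ == "__main__":
    for name in processes_for_symptom("spreadsheet proliferation"):
        print(name)
```

A symptom that maps to several processes, such as spreadsheet proliferation, is a cue to read the causes and solutions in each of those tables rather than only the first one.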
Naming Data
# | Symptoms | Causes | Solutions |
1 | meaningless data names | informal naming practices; lack of naming standards; standards | data naming taxonomy; data naming vocabulary; standard naming structure; standard abbreviations list; compliance incentives |
2 | non-unique data names | ||
3 | incorrect data names | ||
4 | structureless data names | ||
5 | confusing abbreviations | ||
6 | multiple names & aliases | ||
7 | unnamed data components | ||
8 | hard-to-identify data | ||
9 | high level of data disparity |
Defining Data
# | Symptoms | Causes | Solutions |
10 | lack of data definitions | lack of data definition standards; poor data definition practices; lack of business participation; legacy databases; disparate metadata | data definition standards; data definition templates; data definition wiki; business/tech collaboration; data definition review; metadata repository; definitions system-of-record |
11 | incorrect data definitions | ||
12 | meaningless data definitions | ||
13 | obsolete data definitions | ||
14 | hard to find data definitions | ||
15 | misunderstood data | ||
16 | inappropriate use of data | ||
17 | high level of data disparity | ||
18 | high level of data redundancy |
Designing Data
# | Symptoms | Causes | Solutions |
19 | poor structural integrity | poor modeling techniques; wrong choice of model type; poor business representation; excessive detail; insufficient detail; process-oriented design; application-oriented design | data model standards; E-R model guidelines; dimensional model guidelines; normalization guidelines; atomic data guidelines; aggregate data guidelines; subject-oriented design; consumer-oriented design |
20 | large change request backlog | ||
21 | failure to meet business needs | ||
22 | spreadsheet proliferation | ||
23 | shadow systems & databases | ||
24 | incorrect reporting | ||
25 | high level of data disparity | ||
26 | inefficient business analysis | ||
27 | high level of data redundancy | ||
28 | difficult-to-use data | ||
29 | hard-to-navigate databases | ||
30 | poor database performance | ||
31 | business rule violations in data |
Managing Data Quality
# | Symptoms | Causes | Solutions |
32 | lack of trust in data | poorly defined DQ rules; missing DQ rules; absence of quality measures; absence of quality reporting; lack of accountability; incomplete/incorrect edits | DQ rules taxonomy; defined DQ rules; DQ metrics and measures; published DQ reports; regular DQ audits; designated DQ accountability; DQ tasks in project plans |
33 | poor structural integrity | ||
34 | incorrect data | ||
35 | untimely data | ||
36 | incomplete data | ||
37 | difficult-to-use data | ||
38 | incorrect reporting | ||
39 | inefficient business analysis | ||
40 | business rule violations in data | ||
41 | shadow databases | ||
42 | spreadsheet proliferation |
Integrating Data
# | Symptoms | Causes | Solutions |
43 | high level of data disparity | lack of integration architecture; technology-driven integration; inadequate data warehouse; absence of data marts; unmanaged master data; poor integration practices; missing/wrong data sources | sound integration architecture; business-driven integration; sound warehousing design; targeted data marts; master data management; integration best practices; defined data sourcing criteria |
44 | overlapping and conflicting data | ||
45 | untraceable data | ||
46 | enterprise reporting difficulty | ||
47 | application integration difficulty | ||
48 | complex system interfaces | ||
49 | limited data sharing | ||
50 | spreadsheet proliferation |
Accessing Data
# | Symptoms | Causes | Solutions |
51 | hard to find needed data | missing metadata; inadequate data access tools; insufficient indexing; inadequate search capability; lack of content management; poor user interface; excessive downtime; database design not user; ineffective performance tuning; ineffective security processes | robust metadata; user-friendly tools and interfaces; indexing and searching; data access portals; service level agreements; service level accountability; published service level metrics; security policies & procedures; periodic security/privacy audits; security/privacy accountability |
52 | can’t access needed data | ||
53 | data privacy compromised | ||
54 | data security compromised | ||
55 | data privacy compromised | ||
56 | needed access not authorized | ||
57 | obsolete permissions still active | ||
58 | data not available when needed | ||
59 | hard-to-navigate databases | ||
60 | poor data access performance | ||
61 | spreadsheet proliferation |
Managing Metadata
# | Symptoms | Causes | Solutions |
62 | missing documentation | casual metadata management; fragmented metadata tools; lack of documentation standards; lack of data modeling; undocumented changes; no documentation incentives; no documentation reviews; “rush to production” projects | metadata templates & guidelines; project metadata standards; maintenance metadata standards; metadata registries & portals; metadata system-of-record; metadata accountability; metadata tasks in project plans; incentives and reviews |
63 | incomplete documentation | ||
64 | conflicting documentation | ||
65 | hard to find documentation | ||
66 | outdated documentation | ||
67 | confusing documentation | ||
68 | untraceable data | ||
69 | misunderstood data | ||
70 | high level of data disparity | ||
71 | high level of data redundancy |
Administering Databases
# | Symptoms | Causes | Solutions |
72 | insufficient data storage capacity | ineffective storage management; passive growth management; ineffective performance tuning; unscheduled maintenance; inadequate database; outdated DBMS versions; insufficient backup & recovery | continuous capacity planning; proactive growth management; performance SLAs; availability/uptime SLAs; connection protocol standards; connectivity SLAs; routine DBMS upgrades; backup & recovery practices |
73 | unanticipated growth problems | ||
74 | poor query performance | ||
75 | poor update performance | ||
76 | excessive database downtime | ||
77 | unreliable database connections | ||
78 | needed features not implemented | ||
79 | corrupted data can’t be repaired | ||
80 | lost data can’t be recovered |
Managing Systems
# | Symptoms | Causes | Solutions |
81 | complex system interfaces | lack of data sharing architecture; lack of integration architecture; poor application design; “quick fix” maintenance; “misfit” acquired systems; inconsistent data formats; limited reuse of data functions; testing with production data | application architecture standards; application design review; maintenance & testing standards; application acquisition guidelines; data sharing incentives; database wrappers; SOA-based access & update; reusable data quality rules; managed test data |
82 | untraceable data | ||
83 | poor application performance | ||
84 | high level of data redundancy | ||
85 | high level of data disparity | ||
86 | poor data quality | ||
87 | inadequate metadata | ||
88 | data security compromised | ||
89 | data privacy compromised | ||
90 | business rule violations in data |
Governing Data
# | Symptoms | Causes | Solutions |
91 | poor data quality | lack of data management goals; unclear, uncertain, ambiguous or misaligned RAA (responsibility, authority, & accountability); poor P&P (policies & procedures); understaffed data management; underfunded data management | “data as an asset” culture; clear data management goals; quality RAA + P&P; security RAA + P&P; privacy RAA + P&P; compliance RAA + P&P; disaster recovery RAA + P&P; designated data ownership |
92 | data security compromised | ||
93 | data privacy compromised | ||
94 | data-related compliance violations | ||
95 | disaster recovery uncertainties | ||
96 | territorialism inhibits data sharing | ||
97 | data retention/disposal uncertainty | ||
98 | data consolidation difficulties | ||
99 | data ownership conflicts | ||
100 | need for data standardization |