Agile Data Design – August 2014

Those of you who are as old as I am will remember a children’s book called “The Wind in the Willows”, by Kenneth Grahame. One of the central characters in the book is Mr. Toad, a vain and wealthy squire with an obsession for current fads. He jumps from one all-consuming passion to another, lavishing vast amounts of money on each, but never developing any understanding of the underlying processes that would make his interests educational and life-enriching.

The plot reaches its crisis point when Mr. Toad develops a passion for motor cars. Refusing to learn anything about how to properly drive a motor car, or about the rules of the road, he smashes up one car after another and squanders most of his wealth on lawsuits and court fines. His friends (Badger, Rat, and Mole) finally have to stage an intervention to bring him to his senses.

I’ve been thinking of Mr. Toad a lot since interest in BI and analytics exploded at the company I work for. It seems like everybody and his dog is running out and buying BI tools and data appliances, and arranging proofs of concept (POCs) with every BI vendor under heaven (and lord, aren’t there a lot of them!).

As our company’s BI architect, it’s my job to impose some degree of order and coherence on our BI/analytics infrastructure, and provide some guidance and direction to the business managers who are trying and buying these products, without quenching their enthusiasm for BI itself. After all, the emerging interest in things like BI and Big Data represents a golden opportunity for those of us in data management. For decades, we’ve been preaching the business value of data reuse, and advocating for data management practices that improve the quality, reusability and business value of our data. Now, at long last, our companies are climbing aboard the data reuse bandwagon. This is the acid test of how good (or bad) our data management practices have been!

The best definition of BI I have ever encountered comes from knowledge management expert Thomas Davenport (2007): “BI is a set of technologies and processes that use data to understand and analyze organizational performance.”

This definition tells us two very important things about BI:

  • BI is not simply the use of a tool, or set of tools. There needs to be a defined process associated with the use of BI technology.
  • The purpose of BI is to understand our current organizational processes and improve them; that is, BI needs to be done in the context of process improvement within an organization.
In the first article I wrote for TDAN.com on Big Data[1], I made the point that BI needs to be considered as a process that intersects with other processes that exist (or may exist) within the organization, including:
  • Data management – the process of ensuring the quality, accessibility, reusability and business value of an organization’s data assets. This is usually an IT function.
  • Data governance – the process of determining and allocating responsibility for the definition and content of data. This is usually a business function.
  • Process improvement – the process of determining where and how incremental improvements in a company’s operating processes can be made.
  • Stakeholder management – the various processes of managing a company’s relationships with its key stakeholders (e.g., customer relationship management, supply-chain management, human resource management, etc.).
In my second article on Big Data[2], I made the additional point that failure to understand the process aspects of BI (focusing instead on the capabilities of the cool tools) has led to disaster at a number of companies, including CNN, Target, Wells Fargo and JP Morgan Chase. In particular, these failures stemmed from not understanding that the signal purpose of BI and analytics is to manage stakeholder relationships and drive positive behavioral change. Companies that have succeeded with analytics (including UPS and Express Scripts) have used data to engage their customers, employees, suppliers and other company stakeholders in positive, creative and mutually profitable ways. Companies that have failed with analytics are those that use data to punish their stakeholders for “inappropriate” behavior. Examples include insurance companies that raise premiums for people who buy plus-sized clothing online, and banks that lower people’s credit ratings based on their Facebook associations.

Business decisions based on data analytics should be made carefully, and always in a manner that enhances a company’s public reputation and its relationships with key stakeholders.

So what are some of the critical success factors for a BI or analytics initiative? Answers will vary somewhat from project to project, but here are some things that should be included in all BI projects, including POCs:

  • Identification of one or more business processes that will be improved (or created) using the results of the data analyses (e.g., Customer Retention), along with accompanying business stakeholders and subject matter experts.
  • A target metric, or set of target metrics, to be applied to the business process. For example, “Reduce customer churn for server by 10%”.
  • Identification of the source data needed to solve the problem. The BI tool should provide the ability to sift through large volumes of data quickly and easily, allow the business user to identify and extract the data that is needed, and document the metadata associated with this data (e.g., source, currency, business meaning, etc.) in a metadata repository.
  • Cleansing and transformation of the data. The BI tool should provide the means for doing any necessary cleansing and transformation of the source data (especially if the data is coming from transactional databases), so that computations and aggregations can be done. Again, it should be possible to document these transformations in a metadata repository for future reference.
  • Identification and correction of bad data. The tool should provide the ability to identify data that needs to be corrected, and there should be a defined process for correcting this data, both within the BI tool or repository, and at the source of the data.
  • Identification and extraction of “golden” data. The tool should provide the ability to identify data that can be used across the organization as master data, and allow this data to be extracted out to a separate master data repository for general reuse.
  • Data validation. The tool should support some process of validating the data (as well as the results of the data analyses). Can the results be compared with reports or analyses from other sources to provide at least some degree of reassurance? Do summarizations of the results pass a “smell test”? Also, both the means by which the data was validated and the results of the validation process should be documentable as metadata, so that users of the analyses will know how much confidence they can place in them.
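One common form of the validation step is a reconciliation check. A total computed from the analytics data is compared against a control total from an independent source, such as a finance report. The outcome is then captured as a metadata record, so that users know how much confidence to place in the analysis.

The minimal Python sketch below shows this. The class and function names, the 1% tolerance, and the sample figures are all hypothetical assumptions for illustration. They are not features of any particular BI tool.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass
class ValidationResult:
    """Hypothetical metadata record describing how an analysis was validated."""
    check_name: str
    computed_value: float
    reference_value: float
    relative_difference: float
    passed: bool
    validated_on: str


def reconcile_totals(check_name: str, computed: float, reference: float,
                     tolerance: float = 0.01,
                     on: Optional[date] = None) -> ValidationResult:
    """Compare a total computed in the BI tool with a control total from an
    independent source. The check passes if the two differ by no more than
    `tolerance`, expressed as a fraction of the reference value (assumed 1%)."""
    if reference == 0:
        rel_diff = 0.0 if computed == 0 else float("inf")
    else:
        rel_diff = abs(computed - reference) / abs(reference)
    return ValidationResult(
        check_name=check_name,
        computed_value=computed,
        reference_value=reference,
        relative_difference=rel_diff,
        passed=rel_diff <= tolerance,
        validated_on=(on if on is not None else date.today()).isoformat(),
    )


# Usage with made-up figures: regional revenue totals from the analysis,
# reconciled against a (hypothetical) finance-report total of 264,000.
regional_revenue = {"East": 120_500.0, "West": 98_250.0, "North": 45_000.0}
result = reconcile_totals("Q2 revenue vs. finance report",
                          sum(regional_revenue.values()), 264_000.0,
                          on=date(2014, 8, 1))
print(asdict(result))  # a record like this would be kept with the analysis's metadata
```

A reconciliation like this is only one kind of “smell test”. Passing it does not prove an analysis is right. A failed check, however, is a clear signal that the data or its transformations need a closer look before anyone acts on the results.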
It is very important to understand that the process of BI is, essentially, a risk-management process. At every step in the process, users need to know how much confidence they can place in the data, what degree of uncertainty they need to manage, and what they can (and cannot) confidently use the data for.
  • Data security. The tool should support a process for defining access both to the source data and to the results of data analyses, so that only authorized persons (ideally, specified via Windows Active Directory groups or some similar mechanism) can view the data. The tool should support both row-based and column-based partitioning of the data, so that access can be restricted to particular records as well as to particular fields.
  • Data distribution. The tool should support making the results of analyses available to authorized subscribers, either by distributing them through email (push) or by providing access through some sort of portal (pull).
  • Metadata management. As already mentioned, the tool should provide a means of maintaining metadata about both the source data and the results of data analysis. People looking at a report or a spreadsheet will want to know where the data came from, how current the data is, what transformations or manipulations of the data were done, what formulas were used in the calculations, and some direction as to what business purpose(s) the data can and cannot be used for. Metadata is crucial for managing both expectation and risk; that is, people should be made aware of both the business value of the analysis and the risk to the business (if any) of using it as the basis for business decisions.
  • Data governance. During data discovery and analysis, questions will undoubtedly arise about the business meaning and definition of certain data fields. Questions will also arise about who is responsible for defining this data and managing its content. These are data governance questions. If a data governance structure and process already exists within the organization, then self-service BI should be made a part of that process. Otherwise, some sort of mechanism (perhaps using SharePoint or some similar portal) should be put into place to manage the questions and issues that arise, and to assign responsibility for resolving them to the appropriate persons or groups.
  • Data retention. The tool should support some mechanism for documenting data retention requirements for both the source data and the results of analyses, identifying data that is no longer current enough to be useful, and purging this data from the repository. For example, in the case of one financial institution, a 500 terabyte (TB) database was found to contain 1 TB of active data, and 499 TB of analytics output, much of which was outdated and of no use.[3]
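The metadata-management and data-retention points can be illustrated together. The sketch below (Python 3.9+) assumes a hypothetical metadata-repository entry that records:

  • where a dataset came from
  • how current it is
  • what was done to it
  • what it may be used for
  • how long it should be kept

A simple query then flags entries whose retention period has elapsed as candidates for purging. The field names and the retention rule (retention counted from the data’s as-of date) are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class DatasetMetadata:
    """Hypothetical metadata-repository entry for a source extract or analysis result."""
    name: str
    source: str                       # where the data came from
    as_of: date                       # how current the data is
    transformations: list[str] = field(default_factory=list)  # cleansing/derivation steps applied
    permitted_uses: list[str] = field(default_factory=list)   # business purposes it may support
    retention_days: int = 365         # documented retention requirement, in days

    def expires_on(self) -> date:
        # Assumption: retention is counted from the data's as-of date.
        return self.as_of + timedelta(days=self.retention_days)


def purge_candidates(catalog: list[DatasetMetadata], today: date) -> list[DatasetMetadata]:
    """Return entries whose documented retention period has elapsed."""
    return [m for m in catalog if m.expires_on() < today]


# Usage with made-up entries.
catalog = [
    DatasetMetadata("churn_by_region", "CRM extract", date(2014, 1, 15),
                    transformations=["removed test accounts"],
                    permitted_uses=["customer retention analysis"],
                    retention_days=180),
    DatasetMetadata("orders_q2", "order-entry database", date(2014, 7, 1)),
]
for m in purge_candidates(catalog, today=date(2014, 8, 1)):
    print(m.name, "expired on", m.expires_on())  # churn_by_region expired on 2014-07-14
```

In practice, the purge itself would more likely be a governed step than an automatic delete, with someone first confirming that no report or downstream process still depends on the data.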
One final point: Not only does BI need to be part of a well-crafted strategy and process for solving business problems and improving business processes, it’s also important for the business to be willing to be “data-driven”. That is, business people must be willing to base their business decisions on objective data, rather than hunches, biases or what “everyone else is doing”. It does no good to spend money and effort creating objective data if the business isn’t willing (or able) to act upon it.

What a successful BI project teaches us, ultimately, is humility. We learn that we didn’t know what we thought we knew, and are not doing what we ought to be doing. Humility, as Mr. Toad discovers in the story, is the prerequisite to successful change.

NOTE: I’d like to make this a dialogue, so please feel free to email questions, comments and concerns to me. Thanks for reading!

References:

[1] Burns, Larry. “Big Data and Data Governance”. TDAN, November 2013: http://www.tdan.com/view-articles/17106.
[2] Burns, Larry. “In Search of Big Data, and a Grown-Up in the Room”. TDAN, February 2014: http://www.tdan.com/view-articles/17237.
[3] IDG Communications. “Strategic Guide to Big Data: Challenges and Opportunities”, Spring 2014, p. 7.


About Larry Burns

Larry Burns has worked in IT for more than 25 years as a database administrator, application developer, consultant and teacher. He holds a B.S. in Mathematics from the University of Washington and a Master’s degree in Software Engineering from Seattle University. He currently works for a Fortune 500 company as a database consultant on numerous application development projects, and teaches a series of data management classes for application developers. He was a contributor to DAMA International’s Data Management Body of Knowledge (DAMA-DMBOK), and is a former instructor and advisor in the certificate program for Data Resource Management at the University of Washington in Seattle. You can contact him at Larry_Burns@comcast.net.
