As discussed in previous columns, one of the foundational elements of good analytics is the input and use of good data. An enterprise’s ability to produce and keep data as “clean” as possible rests on solid data management principles and on treating data as a valuable asset of the enterprise. These principles include, but are not limited to, broad and sound data governance policies and organization (including data quality measurement and enforcement), active metadata management, and an enterprise data architecture tied to each application. In addition, a good enterprise data architecture should include the following key elements:
- Enterprise data model
- Application data models
- Data integration and data movement architecture
- Integrated metadata architecture
- Data access architecture
- Logical and physical data store architecture
- Data management architecture (including security, privacy, systems management, etc.)
- Best practices and guidelines for each of the above areas
In this column, I want to focus on the data modeling aspect and on why building and implementing good data models is vital to the analytic food chain.
My assertion is that any good analytic system needs quality data and an agreed-upon, or at least known, definition for each key attribute and measure. An enterprise data architecture, with an enterprise data model as one of its key components, is about getting the right data to the right place, at the right time, in the most usable and trusted format. To achieve success in this area, proper attention must be given to the data structures being designed; a good data model is instrumental to common definitions, reusability and data integrity.
To help set the stage, we need to go back to the olden days of the 1970s, when almost all system design was based on process modeling. For instance, Ed Yourdon and Tom DeMarco advocated and popularized an approach to modeling business flows via the data flow diagram. Around the same time, Dr. Peter Chen developed a form of system design called entity relationship (E/R) modeling, which focused on defining and recording business data requirements through the development of a logical data model. It wasn’t until the 1980s that James Martin and Clive Finkelstein introduced the Information Engineering approach, which made data the focus of the methodology on the premise that business processes are fairly volatile, while the basic data requirements and relationships are a far more stable foundation to design from. The idea is that if you bring an enterprise perspective to your data design, changes to that design will be less frequent and should occur only when your business model changes. This data-focused methodology also fit well with the work Ted Codd had done at IBM on the relational model and with the early implementations of relational database platforms. Eventually, CASE tools such as KnowledgeWare’s IEW and Texas Instruments’ IEF were introduced that modeled both data and processes, but their strong selling point for data people was the ability to capture data requirements in one tool, design and model them logically, and then transform them into a physical implementation by generating DDL for a relational database management system.
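To make that logical-to-physical step concrete, here is a minimal sketch of the same idea using a modern toolchain rather than a 1980s CASE tool. It assumes Python with SQLAlchemy, and the Customer and Order entities are hypothetical; the point is simply that the data requirements are declared once, logically, and the physical DDL is generated from that declaration.

```python
from sqlalchemy import Column, Date, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base
from sqlalchemy.schema import CreateTable

Base = declarative_base()

# Logical design: entities, identifiers, attributes and a relationship,
# declared once in a single model (entity and attribute names are illustrative).
class Customer(Base):
    __tablename__ = "customer"
    customer_id = Column(Integer, primary_key=True)
    customer_name = Column(String(100), nullable=False)

class Order(Base):
    __tablename__ = "customer_order"
    order_id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customer.customer_id"), nullable=False)
    order_date = Column(Date)

# Physical implementation: generate the DDL for a relational DBMS from the
# declared model rather than hand-coding CREATE TABLE statements.
for table in Base.metadata.sorted_tables:
    print(CreateTable(table))
```

Running the sketch prints the CREATE TABLE statements, with the key and referential constraints carried straight from the logical declarations.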
This methodology remains fairly popular for developing “data-rich” systems but has never been a preferred methodology for the object-oriented or service-oriented architecture crowd. Reasons abound, such as culture, training and education, or the perception that data modeling simply takes too long; but any system that is developed, no matter the methodology used, should incorporate a strong data design that takes an enterprise view while allowing for application flexibility.
To make data modeling more palatable in today’s agile and iterative development environments, two key practices should be applied across the development lifecycle: iteration and collaboration.
Iteration:
Iterative development matches well with conceptual and logical modeling approaches.
- Start with a key-based model and add entities, relationships and attributes as functionality is added (see the sketch after this list). Keep the future in mind!
- Once it’s physical, don’t ignore the logical model. Continue to adapt and improve the model for future projects.
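As a sketch of what that iteration can look like in practice (again assuming SQLAlchemy, with hypothetical Product and OrderLine entities), the model starts as a key-based skeleton and gains attributes only as the functionality that needs them arrives:

```python
from sqlalchemy import Column, ForeignKey, Integer, Numeric, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Product(Base):
    __tablename__ = "product"
    # Iteration 1: key-based skeleton -- identifier only.
    product_id = Column(Integer, primary_key=True)
    # Iteration 2: descriptive attributes added with the feature that needed them.
    product_name = Column(String(100), nullable=False)
    list_price = Column(Numeric(10, 2))

class OrderLine(Base):
    __tablename__ = "order_line"
    # Iteration 1: keys and the relationship to Product.
    order_id = Column(Integer, primary_key=True)
    line_number = Column(Integer, primary_key=True)
    product_id = Column(Integer, ForeignKey("product.product_id"), nullable=False)
    # Iteration 3: measure added when reporting required it.
    quantity = Column(Integer, nullable=False)
```

The logical model outlives each release: the comments record when each element arrived, and the same declarations keep feeding the physical schema for future projects.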
Collaboration:
- No “closed doors.” Data modelers should participate fully in use case development and application design sessions with lots of reviews and inspections.
- Tie into the enterprise data model or build from the bottom up if none exists.
- Conform to corporate naming standards and modeling guidelines.
- If a formal data architecture framework is not in place organizationally, set up an informal network to encourage model reuse.
- Don’t throw the model over the wall to the DBA team; instead, encourage their participation in all phases.
No matter the methodology used, a data model and all its related data design tasks should be mandatory; otherwise you will end up with poorly designed data structures, uncontrolled redundancy, conflicting definitions, incompatible data types and sizes, and the inability to run proper queries, to name just a few of the problems caused by omitting a complete data design.
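To make that cost concrete, here is an illustration (hypothetical tables, SQLAlchemy again) of what conflicting definitions and incompatible data types and sizes look like when two teams model the same customer independently:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table

metadata = MetaData()

# Built without a shared data design: the "same" customer identifier ends up
# with different names, types and lengths in each system.
billing_customer = Table(
    "billing_customer", metadata,
    Column("cust_id", String(10), primary_key=True),   # identifier stored as text
    Column("cust_name", String(100)),
)

crm_customer = Table(
    "crm_customer", metadata,
    Column("customer_id", Integer, primary_key=True),  # identifier stored as an integer
    Column("customer_nm", String(30)),                 # name silently truncated at 30
)

# Every query that joins billing_customer.cust_id to crm_customer.customer_id
# now needs ad hoc casting and cleanup -- the recurring tax of skipping data design.
```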
If your analytics program does not have a strong data foundation, no amount of money spent on infrastructure, tools and/or the top analysts in the country will overcome that shortcoming. Before tackling a major analytics initiative, make sure you have a complete data architecture foundation in place that increases the chances your data will be complete, correct and consistent. One of the first places to start is to incorporate all phases of data design into all your projects.