Big Changes in the World of Data Modeling

My work involves consulting and teaching about big data analytics, and I’m fully aware that the real value of big data is in the new knowledge and understanding that can be gained by analyzing the data. But, silly as it may sound, the most exciting thing to me about big data is that it is driving changes in the world of data modeling. I guess it’s true: once a data geek, always a data geek.

So what is it that shakes up the world of data modeling? The changes are driven by many things that have little to do with volume of data. It isn’t really about big data; it’s about a radically changed data pipeline and different kinds of data that don’t fit neatly into rows and columns.

First, let’s look at the data pipeline. Long-standing data modeling practices are based on the idea of designing structures to store data in relational databases. We work from conceptual modeling, through logical design, and ultimately to a physical model that describes how data will be stored. The big data world stands this on its head. The data already exists; it is already stored without need or opportunity for us to design. The modeling purpose changes from design to understanding. Instead of conceptual → logical → physical, we begin with the physical and attempt to deduce a logical model. Instead of starting with entities, then proceeding to attributes and relationships, we begin with fields and try to deduce the things that they describe and the relationships among those things. It is this shift that drove me, along with Chris Adamson and Aaron Fuller, to develop a course, Data Modeling in the Age of Big Data, which we teach frequently at TDWI events.
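To make the "begin with the physical" idea concrete, here is a minimal sketch of deducing field paths and types from data that already exists. The sample records and the names `infer_fields` and `infer_schema` are assumptions made up for illustration; they are not taken from the TDWI course or from any particular tool.

```python
from collections import defaultdict

# Hypothetical sample records, standing in for data that is already stored.
records = [
    {"order_id": 1,
     "customer": {"id": 7, "name": "Ann"},
     "items": [{"sku": "A1", "qty": 2}]},
    {"order_id": 2,
     "customer": {"id": 9, "name": "Bob"},
     "items": [],
     "coupon": "SPRING"},
]

def infer_fields(value, path, found):
    """Walk one value and record the type names seen at each field path."""
    if isinstance(value, dict):
        for key, child in value.items():
            infer_fields(child, f"{path}.{key}" if path else key, found)
    elif isinstance(value, list):
        found[path].add("array")
        for child in value:
            infer_fields(child, path + "[]", found)
    else:
        found[path].add(type(value).__name__)

def infer_schema(docs):
    """Collect every field path and the types observed across all documents."""
    found = defaultdict(set)
    for doc in docs:
        infer_fields(doc, "", found)
    return {path: sorted(types) for path, types in found.items()}

schema = infer_schema(records)
for path in sorted(schema):
    print(path, schema[path])
```

The output is only a list of fields and observed types. Deciding that `customer` and `items[]` describe distinct things, and how those things relate, is still the modeler's job; the sketch just gathers the physical evidence to reason from.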

Now consider the new kinds of data. Everything in the NoSQL world changes how we approach data modeling. NoSQL databases support many constructs that entity-relationship models don’t handle well: many-to-many relationships, multi-valued attributes, embedded arrays, associations implemented without foreign key relationships, and much more. We have to think about modeling differently to accommodate these constructs, and adapting E-R modeling won’t get the job done. Finding a good modeling technique and notation that works was a challenge and something that we touch on only lightly in the TDWI course.
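As a rough illustration of these constructs, the sketch below shows a single document with a multi-valued attribute and an embedded array, followed by the rows needed to hold the same facts in relational form. The document, its field names, and the `to_relational` helper are all hypothetical, invented for this example.

```python
# Hypothetical customer document, as a document store might hold it.
customer = {
    "customer_id": 42,
    "name": "Ann",
    # Multi-valued attribute: several values under one field.
    "emails": ["ann@example.com", "ann@work.example"],
    # Embedded array of sub-documents; no foreign key is involved.
    "addresses": [
        {"kind": "home", "city": "Boston"},
        {"kind": "work", "city": "Chicago"},
    ],
}

def to_relational(doc):
    """Split one document into rows for three tables keyed by customer_id."""
    customers = [(doc["customer_id"], doc["name"])]
    emails = [(doc["customer_id"], email) for email in doc["emails"]]
    addresses = [
        (doc["customer_id"], addr["kind"], addr["city"])
        for addr in doc["addresses"]
    ]
    return customers, emails, addresses

customers, emails, addresses = to_relational(customer)
print(customers)
print(emails)
print(addresses)
```

One document becomes three sets of rows, and the containment that was explicit in the document is now carried only by the repeated `customer_id`. That difference in structure is what a notation has to be able to express.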

Last fall I met Ted Hills and attended his session at the Data Modeling Zone Conference (Chapel Hill, NC, October 2015), where he presented a technique that elegantly steps up to this challenge. Ted has developed an extension of object modeling, which he calls Concept and Object Modeling Notation (COMN), that fills the gap. Recently I had the opportunity to review the manuscript of his upcoming book NoSQL and SQL Data Modeling: Bringing Together Data, Semantics, and Software. Having completed the review, I believe that this is a breakthrough modeling technique – and it is a technique, not just a notation. COMN provides notation to handle all of the constructs that E-R techniques don’t do well, and it steps up to the problem of linking physical and conceptual models. I’m excited and waiting to see the book in published form, which Ted tells me will be at Enterprise Data World in April. I’m convinced that COMN is the future of data modeling.

About Dave Wells

Dave Wells leads the Data Management Practice at Eckerson Group, a business intelligence and analytics research and consulting organization. Dave works at the intersection of information management and business management, where real value is derived from data assets. He is an industry analyst, consultant, and educator dedicated to building meaningful and enduring connections throughout the path from data to business value. Knowledge sharing and skills development are Dave’s passions, carried out through consulting, speaking, teaching, and writing. He is a continuous learner – fascinated with understanding how we think – and a student and practitioner of systems thinking, critical thinking, design thinking, divergent thinking, and innovation. He can be reached at dwells@eckerson.com.

  • John Giles

    Thanks so much, Dave, for sharing your insights. I’d love to toss in a few comments relating to the Concept & Object Modeling Notation (COMN) that might generate further discussion from you and others.

    As a quick personal background, my main focus is on data modelling. When I do work for a client, I am often asked to use the modelling tool (and notation) of their choice. Sometimes they want relational models – examples of nominated tools have included ERwin (a highly respected toolset from the relational world), and Visio. Other times they want an object-oriented (OO) class model e.g. using components of IBM’s Rational suite of tools, Sparx’s Enterprise Architect, or again Visio, but this time using its Unified Modeling Language (UML) facility for class diagrams.

    It’s my use of UML that has triggered this response. In your article, you note limitations that may be encountered in the relational modelling world, e.g. “…many-to-many relationships, multi-valued attributes, embedded arrays, associations implemented without foreign key relationships …”. I believe that all of these requirements you articulate are able to be handled within the UML class modelling notation. I do understand that the beginnings of the UML are rooted in an OO developer’s world, but it is worth noting that David Hay, in his book “UML and Data Modeling: A Reconciliation”, shows how the UML can be successfully used for modelling data structures.

    So where might my thinking relate to Ted Hills and his COMN? Based on my very limited understanding of COMN, it may be possible that his work and the UML notation have much in common. If this is true, there may be some real benefits for Ted and the COMN, especially if he is able to map the COMN constructs to the XML Metadata Interchange (XMI) protocol.

    Firstly, people who wish to take up Ted’s language may be able to retain the UML modelling tool they own and are comfortable with, and use it to assemble their thinking on a COMN model before exporting it (via XMI) to the COMN platform. Similarly, if they have already invested in models using the UML, they could potentially migrate their investment to the COMN platform if they so choose.

    Secondly, if people adopt a COMN modelling tool, they may be able to share their COMN work with others who have XMI-compliant UML class modelling tools.

    And last but not least, maybe one of the UML tools might be able to be applied directly to COMN modelling, in a similar way to how David Hay demonstrates UML’s ability to be used by “data” modellers?

    Just some ideas from this end, and it is more than possible that I have misunderstood COMN. But even if this is true, hopefully my “asking the dumb questions” may provoke further dialogue by others. I hope so.
