My work involves consulting and teaching about big data analytics, and I’m fully aware that the real value of big data is in the new knowledge and understanding that can be gained by analyzing the data. But, silly as it may sound, the most exciting thing to me about big data is that it is driving changes in the world of data modeling. I guess it’s true: once a data geek, always a data geek.
So what is it that shakes up the world of data modeling? The changes are driven by many things that have little to do with volume of data. It isn’t really about big data; it’s about a radically changed data pipeline and different kinds of data that don’t fit neatly into rows and columns.
First, let’s look at the data pipeline. Long-standing data modeling practices are based on the idea of designing structures to store data in relational databases. We work from conceptual modeling, through logical design, and ultimately to a physical model that describes how data will be stored. The big data world stands this on its head. The data already exists; it is already stored, with no need or opportunity for us to design. The modeling purpose changes from design to understanding. Instead of conceptual → logical → physical, we begin with the physical and attempt to deduce a logical model. Instead of starting with entities, then proceeding to attributes and relationships, we begin with fields and try to deduce the things that they describe and the relationships among those things. It is this shift that drove me, along with Chris Adamson and Aaron Fuller, to develop a course, Data Modeling in the Age of Big Data, which we teach frequently at TDWI events.
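To make the reversal concrete, here is a minimal sketch of working physical-first. The records and field names are hypothetical, invented for illustration: we start from raw fields as they arrive, and let the nesting of the data suggest the attributes and embedded entities behind them.

```python
def infer_fields(records, prefix=""):
    """Collect each field path and the value types observed for it."""
    fields = {}
    for rec in records:
        for key, value in rec.items():
            path = f"{prefix}{key}"
            if isinstance(value, dict):
                # A nested document hints at an embedded entity.
                for p, t in infer_fields([value], prefix=f"{path}.").items():
                    fields.setdefault(p, set()).update(t)
            else:
                fields.setdefault(path, set()).add(type(value).__name__)
    return fields

# Hypothetical raw records, as they might arrive from an existing store.
records = [
    {"order_id": 1, "customer": {"name": "Ada", "city": "London"}},
    {"order_id": 2, "customer": {"name": "Alan", "city": "Bletchley"}},
]

for path, types in sorted(infer_fields(records).items()):
    print(path, sorted(types))
```

From output like this, a modeler would deduce an order entity with an embedded customer entity, then go looking for the relationship between them: logical understanding recovered from physical fields.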
Now consider the new kinds of data. Everything in the NoSQL world changes how we approach data modeling. NoSQL databases support many constructs that entity-relationship models don’t handle well: many-to-many relationships, multi-valued attributes, embedded arrays, associations implemented without foreign key relationships, and much more. We have to think about modeling differently to accommodate these constructs, and adapting E-R modeling won’t get the job done. Finding a good modeling technique and notation that work was a challenge, and one that we touch on only lightly in the TDWI course.
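As an illustration, here is a small, hypothetical customer document (the names and fields are invented) that packs several of the constructs listed above into one record:

```python
# Constructs that sit awkwardly in an entity-relationship model:
# a multi-valued attribute (phones), an embedded array of
# sub-documents (orders), and associations expressed by nesting
# rather than by foreign keys.
customer = {
    "_id": "cust-1001",
    "name": "Ada Lovelace",
    "phones": ["+44 20 7946 0018", "+44 20 7946 0099"],  # multi-valued attribute
    "orders": [                                          # embedded array of sub-documents
        {"order_no": 1, "items": [{"sku": "A1", "qty": 2}]},
        {"order_no": 2, "items": [{"sku": "B7", "qty": 1}]},
    ],
}

print(len(customer["orders"]))   # two embedded orders, no orders table in sight
```

A relational design would break this into customer, phone, order, and item tables joined by foreign keys; here every relationship is implied by nesting, which is exactly what classic E-R notation has no natural way to draw.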
Last fall I met Ted Hills and attended his session at the Data Modeling Zone Conference (Chapel Hill, NC, October 2015), where he presented a technique that elegantly steps up to this challenge. Ted has developed an extension of object modeling that he calls Concept and Object Modeling Notation (COMN) that fills the gap. Recently I had the opportunity to review the manuscript of his upcoming book, NoSQL and SQL Data Modeling: Bringing Together Data, Semantics, and Software. Having completed the review, I believe that this is a breakthrough modeling technique: a technique, not just a notation. COMN provides notation to handle all of the constructs that E-R techniques don’t do well, and it steps up to the problem of linking physical and conceptual models. I’m eager to see the book in published form, which Ted tells me will happen at Enterprise Data World in April. I’m convinced that COMN is the future of data modeling.