Data Architecture COMN Sense – The Missing Logical Layer in Data Management

Around 2009 someone used the term “NoSQL” to describe an emerging class of database management systems (DBMSs) that, as the name implies, did not offer the Structured Query Language (SQL) as a means for updating and retrieving data. Such DBMSs also typically did not store data in tables, but rather used structures such as key/value, document, object, and graph.

NoSQL DBMSs advanced especially in megascale Internet applications such as Facebook (which developed Cassandra) and Amazon (which developed Dynamo). Such applications were beyond the reach of traditional SQL DBMSs, whose scalability was limited partly by their commitment to high-quality, ACID-strength transactions. By sacrificing some transaction consistency, these NoSQL DBMSs were able to scale to hundreds of millions of users, something unheard of just a few years earlier.

The “NoSQL movement” spawned conferences, such as NoSQL Now! by Dataversity, and books, including my own (NoSQL and SQL Data Modeling, Technics Publications, 2016).

As NoSQL DBMS vendors struggled to gain market share, they found that one of the things holding them back was a lack of compatibility with the dominant data manipulation language, which is (you guessed it) SQL. Many NoSQL DBMS vendors have now added the option of using SQL or SQL-like languages for data retrieval and manipulation, leading to a reinterpretation of the moniker “NoSQL” as “Not Only SQL”.

At the same time, traditional SQL DBMS vendors, seeing their market share threatened, began to add non-tabular data organization options to their products, right alongside their traditional tables. They also made the ACID features selectable, so that not every transaction has to bear the overhead of the transactional strength needed for financial transactions.

Thus, a convergence has begun, leading Gartner to predict, in its just-released 2016 Magic Quadrant for Operational Database Management Systems, that “By 2017, all leading operational DBMSs will offer multiple data models, relational and non-relational, in a single DBMS platform. By 2017, the ‘NoSQL’ label will cease to distinguish DBMSs, leading data and analytics leaders to select multi-model and/or specific document-style, key-value, graph and table-style engines.”

The ability to store data without pre-specifying a schema is a wonderful thing. However, after storing petabytes of data with little attention to organization, some have realized that, when designing NoSQL databases, most of the effort goes into physical database design: organizing and reorganizing data, copying and denormalizing it, until critical queries perform adequately and critical data has the required keys and indexes. Then application developers must be told how to keep the data consistent, since one piece of data might be stored in many places, and consistency is either a secondary concern or no concern at all to NoSQL DBMSs. Finally, all that physical data reorganization and re-release of application code is costly, and tools and methods that reduce the number of iterations would be valuable.
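To make that consistency burden concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory "document store" made of plain dicts (not any real NoSQL DBMS's API), in which each order document embeds a copy of the customer's name so that a critical query needs only one read. The names `customers`, `orders`, `rename_customer`, and `find_inconsistencies` are illustrative only.

```python
"""Sketch: the consistency burden of denormalized (embedded) data.

Hypothetical in-memory "document store": plain dicts stand in for
collections. This is not the API of any real NoSQL DBMS.
"""

# Master customer records.
customers = {"c1": {"name": "Acme Corp"}}

# Each order embeds (copies) the customer's name so that the query
# "show an order with its customer name" needs a single read.
orders = {
    "o1": {"customer_id": "c1", "customer_name": "Acme Corp", "total": 120},
    "o2": {"customer_id": "c1", "customer_name": "Acme Corp", "total": 75},
}


def rename_customer(customer_id: str, new_name: str) -> int:
    """Update the master record and every embedded copy of the name.

    The store does not know the copies are related, so the application
    must find and update each one. Returns the number of copies updated.
    """
    customers[customer_id]["name"] = new_name
    updated = 0
    for order in orders.values():
        if order["customer_id"] == customer_id:
            order["customer_name"] = new_name
            updated += 1
    return updated


def find_inconsistencies() -> list[str]:
    """Return IDs of orders whose embedded name disagrees with the master."""
    return [
        order_id
        for order_id, order in orders.items()
        if order["customer_name"] != customers[order["customer_id"]]["name"]
    ]
```

Calling `rename_customer("c1", "Acme Inc.")` updates the master record and both embedded copies. Code that changed only `customers` would leave both orders stale, and nothing in the store itself would complain; `find_inconsistencies()` is the kind of after-the-fact check the application then has to supply.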

Some have come to live in a hybrid world, where data might be collected in NoSQL document databases, then analyzed in graph databases and traditional SQL databases. This last group especially has encountered the frustration that those logical data models that were supposed to be independent of physical database design issues are, in fact, affected by physical database design choices.

All this churn and NoSQL/SQL convergence suggests that some widely accepted assumptions may not be so true after all.

First, SQL and entity-relationship data modeling notation—even conceptual and logical data models—are both tied to a particular physical data organization, which is data in tables.

Second, it turns out that the relational model of data does not—or should not—tell us how to store data (in tables). Rather, it tells us how to think about data. Concepts of attributes and types, of primary and alternate keys, of foreign key relationships, of consistency constraints, and of subtypes, are equally applicable to SQL and NoSQL databases. The big difference is that NoSQL databases put the burden on the application programmer of enforcing many constraints throughout their code that SQL databases can enforce directly and centrally. Not always a bad thing, but definitely different.
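The contrast between central and application-side enforcement can be sketched in a few lines of Python. The sketch assumes SQLite (via Python's built-in `sqlite3` module) as a representative SQL DBMS, and plain dicts as a hypothetical stand-in for a schemaless document store; the function names are illustrative, not part of any product.

```python
"""Sketch: the same key constraints, enforced centrally vs. in application code.

Part 1 uses SQLite (Python's built-in sqlite3 module) as an example SQL DBMS.
Part 2 uses plain dicts as a hypothetical stand-in for a document store;
it is not the API of any real NoSQL product.
"""
import sqlite3

# --- Part 1: the SQL DBMS enforces primary and foreign keys itself. ---
# isolation_level=None puts the connection in autocommit mode, so each
# statement is its own transaction and a failed insert rolls back alone.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders (id TEXT PRIMARY KEY, "
    "customer_id TEXT NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer VALUES ('c1', 'Acme Corp')")


def sql_insert_order(order_id: str, customer_id: str) -> bool:
    """Return True if the DBMS accepted the row, False if it rejected it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer_id))
        return True
    except sqlite3.IntegrityError:  # duplicate key or dangling reference
        return False


# --- Part 2: the application must enforce the same rules itself. ---
customer_docs = {"c1": {"name": "Acme Corp"}}
order_docs: dict[str, dict] = {}


def doc_insert_order(order_id: str, customer_id: str) -> bool:
    """Check the primary key and foreign key in code before writing."""
    if order_id in order_docs:            # primary key: must be unique
        return False
    if customer_id not in customer_docs:  # foreign key: must reference a customer
        return False
    order_docs[order_id] = {"customer_id": customer_id}
    return True
```

Both functions accept and reject the same inserts, but in Part 2 every code path that writes orders must repeat those checks. The sketch also ignores concurrency: with multiple writers, the check-then-write in `doc_insert_order` is not atomic, which is one more burden the DBMS in Part 1 takes off the application.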

But then, if our primary data modeling tool and our primary data language are both tied to tabular data organization, it must be that in fact we don’t have a logical data modeling notation, nor a language, in which to rise above physical considerations and focus just on the structure and meaning of data!

I developed the Concept and Object Modeling Notation (COMN) to provide a notation for logical data models that are truly free of physical implementation choices, and that could therefore serve as a universal graphical data notation across hybrid databases, within an enterprise, and across enterprises. At the same time, I made sure that the notation could be used practically to express SQL and NoSQL physical database designs, and to represent the real-world concepts and objects that data is about and that are the focus of ontological models. It’s still early days for COMN (version 1.1 of the notation was just released), and support for it is still developing (and growing). But I, and some others, believe that COMN enables us to reach the goal of true logical data definition, whether we use its models descriptively (to describe the data we’ve already collected) or prescriptively (to define a schema to which data should conform). COMN can be used to describe or define any data that is or will be stored in any SQL or NoSQL form.

The Concept and Object Language (COL) is on the drawing board, and rests on the terminological clarifications brought by COMN. Don’t focus on that vaporware, though; instead, please help put COMN on solid footing with tools (one vendor has committed to date) and users (three corporations are interested, plus a number of individuals). Download reference materials. Read the book. Follow me on LinkedIn. Talk about the notation. Send me questions. Perhaps, after all, we’ll have a useful, practical graphical notation for data independent of its storage, for data in storage, and for the things that data is about.

This monthly blog talks about data architecture and data modeling topics, focusing especially, though not exclusively, on the non-traditional modeling needs of NoSQL databases. The modeling notation I use is the Concept and Object Modeling Notation, or COMN (pronounced “common”), and is fully described in my book, NoSQL and SQL Data Modeling (Technics Publications, 2016). See http://www.tewdur.com/ for more details.


About Ted Hills

Ted Hills has been active in the Information Technology industry since 1975. At LexisNexis, Ted co-leads the work of establishing enterprise data architecture standards and governance processes, working with data models and business and data definitions for both structured and unstructured data.

  • Joel Mamedov

    Thanks for the article. Data modeling tool vendors are now adding capabilities to generate NoSQL “schemas” from traditional logical designs. A logical design, as the name implies, has nothing to do with storage or target database types. In my view, NoSQL is similar to the fast food phenomenon. You can get your junk food quickly and cheaply, but if you keep eating it for a long period of time, the likely result will be some kind of major health problem. Then you need to spend more resources on a cure.
