Isn’t it ironic that a technology labeled “schema-less” is also known for making schema design one of its toughest challenges? Aside from the well-known scalability and cost benefits of NoSQL databases, schema flexibility frees users from many of the constraints imposed by the normalization rules of relational databases.
The JSON-based, dynamic-schema nature of NoSQL is a fantastic opportunity for application developers: they can start storing and accessing data with minimal effort and setup, and the data structure can evolve quickly and easily. But while flexibility brings power, it also brings dangers for designers and developers who are new to NoSQL or less experienced.
This is why NoSQL database vendors (e.g., MongoDB, DynamoDB, Couchbase, and Cassandra) counter their marketing departments’ message of simplicity by devoting countless pages, blogs, and videos to the subject of schema design.
To make matters worse, each NoSQL document database adopts a different storage strategy, even though nearly all of them use JSON. For example, MongoDB expects one “collection” per entity, while Couchbase encourages mixing different entities in as few “buckets” as possible, ideally just one. Each vendor also prescribes a different approach to defining and using the primary key (e.g., DynamoDB’s hash and range keys vs. MongoDB’s system-generated ObjectIds vs. Couchbase’s user-defined IDs).
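To make the contrast concrete, here is a minimal sketch that models the three approaches with plain Python dictionaries rather than any vendor’s driver API. The sample entities, the `customer::1` key format, the `type` discriminator field, and the helpers `insert`, `by_type`, and `query` are all hypothetical, and a simple counter stands in for MongoDB’s ObjectId generation.

```python
import itertools

# Hypothetical sample entities; field names are illustrative only.
customers = [{"name": "Ada"}, {"name": "Grace"}]
orders = [{"customer_name": "Ada", "total": 42.5}]

# Layout A, MongoDB-style: one collection per entity.
# A counter stands in for the system-generated ObjectId (assumption for this sketch).
_next_id = itertools.count(1)

def insert(collection, doc):
    """Append a copy of doc with a generated _id and return that _id."""
    stored = {"_id": next(_next_id), **doc}
    collection.append(stored)
    return stored["_id"]

collections = {"customers": [], "orders": []}
for c in customers:
    insert(collections["customers"], c)
for o in orders:
    insert(collections["orders"], o)

# Layout B, Couchbase-style: a single bucket holding every entity, keyed by
# user-defined IDs; a hypothetical "type" field tells the entities apart.
bucket = {}
for i, c in enumerate(customers, start=1):
    bucket[f"customer::{i}"] = {"type": "customer", **c}
for i, o in enumerate(orders, start=1):
    bucket[f"order::{i}"] = {"type": "order", **o}

def by_type(bucket, entity_type):
    """Return the documents of one entity type stored in the shared bucket."""
    return [doc for doc in bucket.values() if doc["type"] == entity_type]

# Layout C, DynamoDB-style: items addressed by a (hash key, range key) pair,
# here a customer name plus an order number (hypothetical key choice).
table = {
    ("Ada", 1): {"total": 42.5},
    ("Ada", 2): {"total": 10.0},
}

def query(table, hash_key):
    """Return the items sharing a hash key, ordered by range key."""
    return [item for (h, _r), item in sorted(table.items(), key=lambda kv: kv[0])
            if h == hash_key]
```

In the first layout each entity lives in its own container and the database assigns the key; in the second, all entities share one keyspace, so the key format and the discriminator field become part of the schema design; in the third, the choice of hash and range keys determines which access patterns are efficient.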
All of these factors create a steeper learning curve and sometimes an unnecessary barrier to the adoption of NoSQL. A number of negative stories have appeared on the web, but when you read between the lines, the failure almost always comes down to a misunderstanding of, or a lack of experience with, data model design. Further difficulties emerge as the complexity and scale of the data increase.
All of this is compounded by the fact that the data structure is described only implicitly, in the application code. And reading code is hardly the most productive way to foster a fruitful dialog among analysts, architects, designers, developers, and DBAs.
This is where data modeling comes into play as a best practice. A database model describes the business, and it is also the blueprint of the application. Such a map helps evaluate design options beforehand, think through the implications of different alternatives, and recognize potential hurdles before committing sizable development effort. This is even more true with an Agile development approach, where a database model helps teams plan ahead and minimize later rework. In the end, the modeling process accelerates development, increases the quality of the application, and reduces execution risks.