Whether you’re opening a website or running a business, it’s very likely that you’re going to be using some sort of digital database, especially since the days of writing everything on punch cards are thankfully behind us. Not only that, but everybody produces and has to deal with some data, so organizing it in a way that makes it easy to find helps everyone who works with that data do their job well.
The question then becomes: What database to use?
Well, there are plenty of answers to that question, such as MS-SQL, Oracle, or PostgreSQL. There’s also the big player on the battlefield: MySQL, which has been around for ages and is one of the most popular. Pick an RDBMS that uses SQL and you’re pretty much set.
Of course, there are some interesting alternatives, which is why we want to introduce you to a less conventional option: NoSQL.
What is NoSQL?
SQL databases are relational. That means you have to define the structure of the database, such as the schema and the types of fields, before you can actually input any data. This can be a bit of a bottleneck in certain situations and isn’t always a great fit for agile development, where the shape of the data changes often.
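To make the schema-first requirement concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names (users, name, twitter) are made up for illustration. The table has to be created before any insert works, and an insert that mentions a field the schema doesn’t know about is rejected.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# The schema (table name, columns, and types) has to exist first.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))

# A field the schema does not know about is rejected.
try:
    conn.execute(
        "INSERT INTO users (name, twitter) VALUES (?, ?)", ("Bob", "@bob")
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

rows = conn.execute("SELECT id, name FROM users").fetchall()
```

To store the extra field, you would first have to change the schema, for example with ALTER TABLE.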
This is where NoSQL comes in. It doesn’t need any of that up-front definition for you to start inputting data into the database. One of the big benefits is that many NoSQL databases are designed to scale out across many servers, and they can be faster than a relational database for simple operations such as looking up a record by its key. That low latency is what lets them serve millions of users if you wanted to (well, theoretically, if you had that many).
On top of all that, there’s no need for a schema, it can manage large chunks of data, and, most importantly, it’s programmer-friendly, in that most NoSQL databases offer simple APIs in most major programming languages out there.
So NoSQL contrasts sharply with your typical SQL database.
Types of Data Models
There are a variety of NoSQL databases and data models you can pick from, each with its own advantages and uses.
Key Value Store
Key-value stores are essentially based on Amazon’s Dynamo research paper and on distributed hash tables. They use a hash table in which a unique key points to its associated data. As such, they offer very high performance, since you look the value up directly by its key and the data isn’t stored relationally.
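As a rough illustration of the idea, here is a toy in-memory key-value store built on a Python dict, which is itself a hash table. The class and method names (KeyValueStore, put, get, delete) are hypothetical and not taken from any particular product. A real store would also spread the keys across many machines.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a dict (a hash table)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store does not inspect the value; it only files it under the key.
        self._data[key] = value

    def get(self, key, default=None):
        # Direct lookup by key: no joins, no scanning of other records.
        return self._data.get(key, default)

    def delete(self, key):
        # Removing a missing key is treated as a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", {"user": "ada", "cart": ["book"]})
store.put("session:43", "any value works, the store treats it as opaque")
```

Everything goes through the key here. If you need to find records by what is inside the value, that is where the other data models come in.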
Document Store
In contrast to the good old days when things were just simple columns and rows, most modern data representations use XML or JSON. The advantage of a NoSQL document store is that it keeps such documents as they are instead of forcing them into a relational model or tying them together. Since there is no schema, the documents can exist independently of each other, which makes life easier all around.
One popular implementation of this is MongoDB, which has been used by SourceForge and bit.ly.
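Here is a small sketch of what schema-free documents look like in practice, using plain Python and the standard json module. This is not MongoDB’s API. The find helper and the field names are invented for illustration. The two documents have different fields and can still live side by side in the same collection.

```python
import json

# Two documents with different shapes in the same collection.
documents = [
    json.loads('{"_id": 1, "title": "Intro to NoSQL", "tags": ["db", "nosql"]}'),
    json.loads('{"_id": 2, "title": "Short link", "url": "http://example.com", "clicks": 7}'),
]


def find(collection, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = find(documents, clicks=7)
```

A document that lacks the queried field simply doesn’t match, so nothing has to be declared up front.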
Column-Oriented Database
Column-oriented or wide-column databases essentially store information as a three-dimensional array. Basically, the columns of each row are contained within that row, which is a bit hard to wrap your head around at first. Rows do not need to have the same number of columns, so a column can be added to one row without having to add it to the others.
Incidentally, this data model came out of Google’s Bigtable system, which is built on top of the Google File System. Reddit, Facebook, and a few others also use a similar data model for their databases.
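One way to picture the wide-column layout is as nested maps: a row key leads to that row’s own set of columns, and each cell keeps timestamped versions. The sketch below assumes that reading of the "three dimensions" (row, column, timestamp), and the put and get helpers are hypothetical names, not a real client API.

```python
from collections import defaultdict

# table[row_key][column_name] -> list of (timestamp, value) versions
table = defaultdict(lambda: defaultdict(list))


def put(row_key, column, value, timestamp):
    """Write one cell version; other rows are not affected."""
    table[row_key][column].append((timestamp, value))


def get(row_key, column):
    """Return the value with the newest timestamp, or None if the cell is empty."""
    versions = table.get(row_key, {}).get(column)
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


put("user1", "name", "Ada", timestamp=1)
put("user1", "email", "ada@example.com", timestamp=1)
put("user2", "name", "Bob", timestamp=1)  # user2 has no email column at all
put("user1", "name", "Ada L.", timestamp=2)
```

Note that user2 never gets an empty email column. The column just doesn’t exist in that row.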
Object-Oriented Database
There is a bit of an argument to be had over whether object-oriented databases are purely NoSQL or not. We won’t go into that argument here, but they usually are treated as such, so we’ll include them.
As the name suggests, this type of database model lets you store information as objects, so the data in the database looks the same as the objects in your code. This makes them a good fit for research or web-scale applications, and some well-known ones are Neo, Versant, and db4o.
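A very rough sketch of the idea: the application hands a whole object to the store and gets a whole object back, nested data included, without mapping it to tables first. The ObjectStore and Customer classes are invented for this example. A real object database would persist to disk, and here a deep copy stands in for that step.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Customer:
    name: str
    orders: list = field(default_factory=list)


class ObjectStore:
    """Toy object store: saves snapshots of whole objects under an id."""

    def __init__(self):
        self._objects = {}

    def save(self, object_id, obj):
        # Deep copy so later changes to the caller's object don't leak in.
        self._objects[object_id] = copy.deepcopy(obj)

    def load(self, object_id):
        # Return a fresh copy, as if it had been read back from storage.
        return copy.deepcopy(self._objects[object_id])


store = ObjectStore()
customer = Customer("Ada", orders=["book", "pen"])
store.save("c1", customer)
customer.orders.append("not saved yet")
loaded = store.load("c1")
```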
Hierarchical Database
This type of database stores data in a hierarchy, meaning parent-child relationships arranged as a tree. This works well for storing location information in geospatial databases, which is quite useful considering how popular geotagging is these days: the 1:N relationship between a parent and its children makes storing this information easier and more straightforward. It is also a data model that works well with GIS, a well-known example in that space being PostGIS, the spatial extension for PostgreSQL.
Funnily enough, the Windows Registry uses a hierarchical database.
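The parent-child idea can be sketched as a small tree in which every node has exactly one parent and any number of children, and records are reached by a path, much like a registry key. The Node class and the place names are purely illustrative.

```python
class Node:
    """One record in the hierarchy: one parent, any number of children (1:N)."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value
        self.parent = None
        self.children = {}

    def add_child(self, child):
        child.parent = self
        self.children[child.name] = child
        return child

    def find(self, path):
        """Walk down from this node following a slash-separated path."""
        node = self
        for part in path.split("/"):
            node = node.children[part]
        return node

    def path(self):
        """Rebuild the path from the root (the root's own name is left out)."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


root = Node("root")
europe = root.add_child(Node("Europe"))
france = europe.add_child(Node("France"))
france.add_child(Node("Paris", value=(48.8566, 2.3522)))
europe.add_child(Node("Spain"))
```

Lookups follow the path from the top, which is fast when you know where a record sits and awkward when you want to query across branches.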
Network Database
Otherwise known as a graph-network database or graph database, this is a type of database that works well for representing information that would go on a graph.
Essentially, this model relies on nodes and relationships. Each node represents an entity, and a relationship shows how two nodes are related. This means that graph databases can grow massively, and they are especially useful for data that is altered frequently.
One of the most well-known implementations of this is FlockDB, which was developed by Twitter and used to show who follows whom. Using the Gizzard framework, the database can reportedly be queried a whopping 10,000 times per second, which is very impressive, to say the least.
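To give a feel for the who-follows-whom case, here is a minimal sketch of a follower graph held in memory. It is not FlockDB’s API. FollowGraph and its methods are hypothetical, and keeping every edge in both directions is simply a design choice of this sketch so that both kinds of lookup stay cheap.

```python
from collections import defaultdict


class FollowGraph:
    """Toy directed graph: users are nodes, 'follows' is the relationship."""

    def __init__(self):
        self._following = defaultdict(set)  # user -> users they follow
        self._followers = defaultdict(set)  # user -> users who follow them

    def follow(self, follower, followee):
        self._following[follower].add(followee)
        self._followers[followee].add(follower)

    def unfollow(self, follower, followee):
        self._following[follower].discard(followee)
        self._followers[followee].discard(follower)

    def followers_of(self, user):
        return set(self._followers.get(user, ()))

    def following(self, user):
        return set(self._following.get(user, ()))

    def mutual(self, a, b):
        """True when a follows b and b follows a."""
        return b in self.following(a) and a in self.following(b)


graph = FollowGraph()
graph.follow("ada", "bob")
graph.follow("cat", "bob")
graph.follow("bob", "ada")
```

Adding or removing a relationship only touches the two nodes involved, which is part of why this model copes well with data that changes often.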
Triple Store
Data in this model is stored as subject-predicate-object triples, with the predicate describing the link between the subject and the object. Triple stores are actually fairly similar to network databases in the way that they function. They are great for semantic queries that SQL can’t handle as easily, especially as you add complexity.
Some great implementations of this model are Virtuoso and Sesame.
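The subject-predicate-object idea is easy to sketch with plain tuples and a pattern match in which None acts as a wildcard. This is only an illustration: real triple stores are usually queried with a language such as SPARQL, and the match helper and the sample facts below are made up.

```python
# Each fact is one (subject, predicate, object) triple.
triples = {
    ("ada", "knows", "bob"),
    ("bob", "knows", "cat"),
    ("ada", "works_at", "acme"),
    ("cat", "works_at", "acme"),
}


def match(subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None matches anything."""
    return sorted(
        triple
        for triple in triples
        if (subject is None or triple[0] == subject)
        and (predicate is None or triple[1] == predicate)
        and (obj is None or triple[2] == obj)
    )


# Who works at acme?
acme_staff = [s for s, _, _ in match(predicate="works_at", obj="acme")]

# Friends of friends: whom do the people Ada knows know?
friends = [o for _, _, o in match("ada", "knows")]
friends_of_friends = [o for f in friends for _, _, o in match(f, "knows")]
```

Chaining patterns like this is the kind of query that tends to turn into a pile of self-joins in SQL.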
As you can see, NoSQL is incredibly versatile and allows you to handle different types of data depending on your needs. It’s great for storing and recalling massive amounts of data, great for unstructured data or data that doesn’t need to be relational, and excellent for scalability. Client libraries are also readily available for most languages, which is one of the big factors that make it so popular.
All that being said, is NoSQL better than SQL? Absolutely not. It wholly depends on your use case and what you’re trying to achieve. The truth is, sometimes SQL will be better and sometimes NoSQL will be better. Essentially, NoSQL is a more flexible alternative to SQL, but that flexibility means you have to throw yourself into the thicket and cut through the jungle yourself to get things done. If you just want something simpler and more straightforward, SQL works well.