The Curious Case of Modern Consistency

Ideas of the non-relational era in data management and the death of “conventional” databases, met widely in the end of 2000’s [1], have recently been reevaluated. Once it was a strong move against slow but reliable database management systems (DBMS) toward scalable and faster yet narrowed solutions. NoSQL solutions became so popular that developers sometimes viewed them as a “silver bullet.” There is no crucial problem with inconsistency, such as when one deletes compromising photos on a social network, but they are still visible to other people for a couple of seconds afterward. Nevertheless, in recent years, applying similar approaches to business-critical applications has revealed undesired outcomes.

Managing transactions for domains with valuable resources assumes data consistency as a major feature. Among them are online banking, real-time bidding, telecom billing, high-frequency trading, ad networks, sensor management, gaming, medical insurance, booking services, and more. For those systems, consistency means having data in a globally non-conflicting state over time. The stated property can be expressed by a set of constraints. As an example, a non-overdraft bank account should be greater than or equal to zero. This can be violated when several users withdraw from the same account simultaneously, but a constraint is not kept by a DBMS. This is what likely affected an ATM network in 2013 [2] and a Bitcoin market in 2014 [3]; a principal fault in data management led to the physical loss of resources (money, reputation, clients).

One probable reason for such faults is an increasing misunderstanding of consistency. Developers might expect too much of consistency in an exploited DBMS if he/she worked with conventional and strongly consistent databases before. In some DBMS’s, strong consistency is switched off by default or is not presented at all; this might be mistakenly unconsidered until unrecoverable troubles occur. Existing NoSQL solutions support geo-distributed scenarios, which is beneficial for latency reduction in mass online services. At the same time, one cannot build a globally consistent and reliably distributed system. Brewer’s theorem, a proven mathematical fact, states the trade off between these characteristics [4]. Those facts have emerged into different “kinds of consistencies,” which are reviewed in this article along with their business-related implications and challenges.

Strong Consistency

For developers, a database is a black box that is accessed for reads and writes. Things are simple when clients transact without overlaps. In picture 1.a, Alice books a room and Bob sequentially books another room. Anomalies with consistency arise from concurrency in transactions. In picture 1.b, transactions collide: Bob starts booking concurrently while Alice’s booking is in progress, so [A1-A2] intersects [B1-B2]. Strong consistency means exactly what we expect to see in sequential cases. If data is changed by one client, the new value is proved correct and visible to all clients. What if only one room is left? [A1-A2] can lock the room object, so that Alice effectively books it, while Bob’s attempt to book it is rejected at B1 and no overbooking happens. Though, if Alice decides not to book and quits at A2, the room is not booked at all. This is pessimistic concurrency control.

In optimistic concurrency control, [A1-A2] doesn’t lock the room, but if Alice books at A2, a write conflict is detected at B2, and Bob is rejected. If Alice quits, B2 observes no inferred conflict with [A1-A2], and Bob books the room. As such, in the latter mode revenues behave optimistically, while locks from the former are considered expensive. With ACID-compliant databases, not only C stands for Consistency, but all four letters provide transactional concepts to handle global constraints (Atomicity) and concurrent conditions (Isolation) in a Durable and decomposable way.

Eventual Consistency

To serve more users with less latencies, one can geographically distribute a number of servers and replicate the database over. All replicas eventually converge to an identical (eventually consistent) state, usually over a number of seconds. If a Stockholm-based server caters Alice, while a Berlin-based server caters Bob regarding the same room, then Alice’s changes are not visible to Bob immediately. Bearing Brewer’s theorem in mind, the simplest way to deal with strong consistency in that case is to abandon it. Early NoSQL adopters argued that Italian financiers successfully used eventually consistent accountant schemes for centuries in the Middle Ages [5]. Bank representatives claimed, “Let those guys simultaneously exhaust an account to minus, we’ll fix that later” [2,5]. If the widely used eventual principle of “last writer wins” is applied, than Bob gets the room, but not Alice, which is opposite to what should have happened. At the same time, some HORECA experts used to argue that you must understand eventual properties of your booking software and have some rooms permanently reserved for a shadow booking case [6].

Following this, one ends up implementing ad-hoc concurrency control with self-made transaction management tiers, narrowing those to subject domain entities and its specific anomalies. Luckily, lead engineers of Internet giants view building simple abstractions on top of weakly-consistent storage systems one of the top challenges because using weak consistency is difficult [7]. They love eventual consistency but eventually agree that some applications are much easier to implement with strong consistency [8]. Database patriarch Dr. Stonebreaker points out that ACID compliance can be written in at the application layer, though writing the code for such operations “is a fate worse than death” [9]. As a reference, users of some NoSQL DBs once complained of not being able to rely on querying shortly after writing to a DB, because the DB hadn’t yet converged.You can barely survive it if your servers are distant, but on a single multi-core machine, that feels quite curious.

Causal Consistency

For a replicated DB case, some anomalies (picture 2.a-c) can be addressed using a recent approach based on relative operational order [10]. To keep replicas identical for every client, a number called “logical clocks” increases in every replica at every local operation, so that all replicas live in the same “(chrono)logical order”. Matching “clocks” at synchronization allows one replica to hide its current changes from its clients until all preceding changes from other replicas arrive. Eventually all replicas converge, now with causal order preserved. However, this tracks only cause and effect relationships, but not concurrency-related anomalies (like in picture 2.d). To emphasize that, a similar approach, which tracks causal order by a special data structure with implicit clocks, is called “Consistency without concurrency control” [11].

Back to Basics?

Narrowing down causal consistency to trade-off for performance, one can come to “read-your-writes,” “session,” “monotonic read,” “monotonic write,” and even some other consistencies, such as “local.” That feels overly complicated, and strong consistency starts to be a sweet spot. For businesses, it brings reduced time to market and less total cost of ownership. Developers can focus on business logic and stunning things from their native areas rather than reinventing data tiers and sophisticated protocols.

Saving transaction-based consistency, performance becomes a big challenge. Scale-in solutions boost maximum performance using existing hardware instead of pushing on server fleets. They shrink out intermediate APIs and incorporate cutting-edge algorithmic approaches to speed up on commodity machines by orders of magnitude. The scale-out camp tries tackling physics and math to add elements of transactions to distributed DB’s. One of the largest Internet companies believes that “it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions” [12]. They have recently started to adopt high resolution atomic clocks (physical instead of “logical”) and GPS antennas to implement global transactions. “Clocks” become a reference point to look at a distributed system. Though, the need for the physical transfer of changes for long distances remains unavoidable. It is a vexed idea to use a scale-out approach only for increasing transactions per second rather than addressing hot failover replication or reducing networking latency; higher costs of ownership and loss in consistency come into place.

Several NoSQL systems have also started to provide restricted transaction-style facilities. As an example, recent document-oriented databases allow singleton operations scoped to a single document to be atomic, or even provide document-level locks.

Recent movement in the market, which is sometimes referred to as NewSQL, is not a back-to-basics, but rather a search for out-of-the-box DBMS’s to handle data management facets better apiece instead of compromising. Choosing a relevant DBMS solution, lean software vendors can simultaneously reduce time to market and increase business agility and product quality by strong consistency and ease of use, and protect business scalability through outstanding performance.