As a Data Management professional, it is important to understand the top three challenges of working with graph databases.
Modeling Highly Interconnected Data
One distinguishing feature of graph databases is their high degree of expressiveness. This, in turn, makes it difficult to model a domain as a graph; the task is similar to modeling knowledge.
Consider ontology engineering as an example. Here, an expert graph data engineer who is proficient in ontology engineering is needed to model the graph structure. In practice, a requirement like this does not encourage the use of graph databases. What is needed instead is an approach that allows engineers of all skill levels to model their domain easily, without being graph database or ontology experts.
Maintenance of Data Consistency
Once an ontology has been obtained to model the domain and govern the structure of the graph database, it becomes important that data loaded into the graph database adheres to that ontology. However, the data model defined for the domain does not act as a schema for the graph database. Graph databases, like other NoSQL databases, delegate schema handling to the application.
The graph database in Figure 1 depicts:
- XYZ company is related to John Doe through an ‘Employed at’ relationship
- John Doe is related to Doggy through an ‘Owns a’ pet relationship
In this scenario, the graph database should not be allowed to store data implying that a pet named Doggy is employed by company XYZ. It is very challenging to deliver a system that maintains and guarantees data consistency while offering such a high degree of expressiveness. Because of the complexity of the interconnected data that graphs handle, this difficulty in maintaining data consistency is a crucial challenge to overcome before graph databases can be used with confidence.
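Because schema handling is delegated to the application, one way to picture the consistency check is a small validation layer that sits in front of the graph. The following is a minimal sketch, not a feature of any particular graph database: the `ONTOLOGY` mapping, the `Graph` class, and the node type names are hypothetical, and the edge direction (person to company, person to pet) is an assumption about Figure 1.

```python
# Hypothetical ontology: each relationship maps to the
# (source node type, target node type) pair it is allowed to connect.
ONTOLOGY = {
    "Employed at": ("Person", "Company"),
    "Owns a": ("Person", "Pet"),
}


class OntologyViolation(ValueError):
    """Raised when an edge does not conform to the ontology."""


class Graph:
    """Tiny in-memory graph that checks every edge against an ontology."""

    def __init__(self, ontology):
        self.ontology = ontology
        self.node_types = {}  # node name -> node type
        self.edges = []       # (source, relationship, target) triples

    def add_node(self, name, node_type):
        self.node_types[name] = node_type

    def add_edge(self, source, relationship, target):
        # Reject relationships the ontology does not define.
        if relationship not in self.ontology:
            raise OntologyViolation(f"unknown relationship: {relationship!r}")
        expected = self.ontology[relationship]
        actual = (self.node_types[source], self.node_types[target])
        # Reject edges whose endpoint types do not match the ontology.
        if actual != expected:
            raise OntologyViolation(
                f"{relationship!r} expects {expected}, got {actual}"
            )
        self.edges.append((source, relationship, target))


graph = Graph(ONTOLOGY)
graph.add_node("John Doe", "Person")
graph.add_node("XYZ", "Company")
graph.add_node("Doggy", "Pet")

graph.add_edge("John Doe", "Employed at", "XYZ")
graph.add_edge("John Doe", "Owns a", "Doggy")

try:
    # A pet cannot be employed by a company under this ontology.
    graph.add_edge("Doggy", "Employed at", "XYZ")
except OntologyViolation as err:
    print(f"rejected: {err}")
```

In this sketch the two valid edges are stored and the third is rejected before it reaches the graph. The cost is that every application writing to the database has to carry, and agree on, the same validation logic.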
Complexity of Graph Queries
Graph queries are not generic enough to be used across domains. Every question posed to a graph database requires a custom query based on its domain model. In addition, it is not feasible to abstract a graph query into functions that take a user's input as an argument and can be reused across multiple use cases. This query complexity has slowed the adoption of graph databases.
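To illustrate how tightly a query is bound to its domain model, here is a small sketch in plain Python rather than in any real graph query language. The `EDGES` list and the `pets_of_employees` function are hypothetical, and the edge direction is an assumption about Figure 1.

```python
# Edges of the small example graph, stored as
# (source, relationship, target) triples.
EDGES = [
    ("John Doe", "Employed at", "XYZ"),
    ("John Doe", "Owns a", "Doggy"),
]


def pets_of_employees(edges, company):
    """Return the pets owned by people employed at `company`."""
    # Step 1: find everyone with an 'Employed at' edge to the company.
    employees = {
        src for src, rel, dst in edges
        if rel == "Employed at" and dst == company
    }
    # Step 2: follow 'Owns a' edges from those people to their pets.
    return sorted(
        dst for src, rel, dst in edges
        if rel == "Owns a" and src in employees
    )


print(pets_of_employees(EDGES, "XYZ"))
```

The relationship names and the two-step traversal are hardcoded, so the function answers exactly one question for exactly one domain model. A different question, or a different ontology, would need a different traversal.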
How to Identify the Right Graph Database
It is important to map project requirements to a graph database and then gauge the graph database tools using the SMART technique, explained below.
Step 1: Map Project Requirements
To begin identifying the right graph database tool for your project's needs, consider the following questions:
- What is the project’s problem statement?
- What kind of data needs to be stored?
- How do you want to query/explore your data?
- How does the database fit into your system?
- Who will manage your database?
Step 2: SMART Technique
Based on the project's answers to the questions above, gauge the strength of each graph database tool under consideration using the following criteria:
- Speed – With large volumes of data accumulating continuously, the speed at which an enterprise can analyze its data becomes an extremely important factor in reducing cost and decision-making time.
- Meaning – How well the graph database organizes and stores data determines how well it maintains the connectedness of multiple entities, enabling computers to interpret related items in a meaningful context instead of just matching keywords. This greatly helps in retrieving information from the system based on meaning and logical relations.
- Answers – The meaning attached to the entities allows graph databases to answer questions that cannot be answered with simple keyword searches. The querying capability of the graph database is very helpful for retrieving relevant information accurately and effectively.
- Relationships – Graph databases help to explore both visible and hidden relationships despite the complex connections among billions of entities. This capability enables an organization to visualize its data from different perspectives, and even to connect to external sources and uncover new relationships.
- Transformation – Graph databases have wide-ranging potential to transform enterprise data management into a well-interconnected view of all data sets from many different angles.
For example, Product A may excel in Speed, Meaning and Answers (great query language), while Product B may excel in Speed, Relationships and Transformation. You would then decide which of these strengths matter more to the needs of your project.
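One simple way to make that trade-off explicit is a weighted score across the five SMART criteria. The sketch below is only an illustration: the 1-to-5 scores for Product A and Product B and the project weights are made-up numbers, and `weighted_score` is a hypothetical helper, not part of the SMART technique itself.

```python
CRITERIA = ("Speed", "Meaning", "Answers", "Relationships", "Transformation")


def weighted_score(scores, weights):
    """Weighted average of a product's criterion scores."""
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(scores[c] * weights[c] for c in CRITERIA) / total_weight


# Hypothetical scores on a 1 (weak) to 5 (strong) scale.
products = {
    "Product A": {"Speed": 5, "Meaning": 5, "Answers": 5,
                  "Relationships": 2, "Transformation": 2},
    "Product B": {"Speed": 5, "Meaning": 2, "Answers": 2,
                  "Relationships": 5, "Transformation": 5},
}

# Hypothetical weights for a project that cares most about
# speed and relationship discovery.
weights = {"Speed": 3, "Meaning": 1, "Answers": 1,
           "Relationships": 3, "Transformation": 2}

# Rank products from highest to lowest weighted score.
ranking = sorted(
    products,
    key=lambda name: weighted_score(products[name], weights),
    reverse=True,
)

for name in ranking:
    print(name, round(weighted_score(products[name], weights), 2))
```

With these particular weights Product B ranks first. A project that weighted Meaning and Answers more heavily would favor Product A instead, which is why the weights should come from the answers gathered in Step 1.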
Once the basic concepts of graph databases are understood, it becomes easy to see their benefits, which include enabling a user to see information as an interconnected structure. This article is the final installment in a series published here on TDAN.com. I hope it and the series have provided useful insights into what a graph database is, how it differs from other types of data stores, the two classes of graph databases, their use cases, their challenges, and how to identify the right product.
This quarter’s column is written by Anandhi Sutti with THE MITRE CORPORATION who has over 20 years of Data Management experience. She has helped public and private sector clients to oversee and strategize the implementation of Data Management and Business Intelligence (BI) projects. She has strong expertise in Business Intelligence, Data Analytics, Data Modeling, Data Architecture and Data Strategy along with architecting complex systems and applications using a Software Development Life Cycle (SDLC). Anandhi has a master’s degree in computer applications and bachelor’s degree in mathematics.