Part 3 completes this article series by discussing some important topics beyond the critical differentiators in the terminology and capabilities of Property Graphs and Knowledge Graphs covered in Parts 1 and 2.
While it is not possible to cover all capabilities and considerations of Knowledge Graphs or Property Graphs in an article, even one extended to three parts, this final part concludes by covering, at a high level, a few important topics to consider, including:
- Graph analytics
- The widely differing levels of support that Knowledge Graphs and Property Graphs provide for partitioning data
- Limitations of Property Graphs
- The inherent semantics built into Knowledge (RDF) Graphs, which allow them to capture not just data but also the meaning or semantics of data, including rich constraints and highly expressive rules
- Some Guidance for Moving from a Property Graph to a Knowledge Graph
This white paper is not intended to cover all capabilities of Property Graphs or Knowledge Graphs. We have focused only on critical differentiators. That said, two further topics deserve at least a mention:
- Algorithms for Graph Analytics
- Named Graphs
Graph analytics is a key application for property graphs. By analytics, we mean node centrality, node similarity, shortest paths, clustering and other algorithms. Property Graphs are known for offering these algorithms, and many applications of property graphs rely on them. Having said this, nothing in the property graph data model is special in making these algorithms possible. They can be applied equally well over RDF Graphs. In fact, many RDF-based solutions also offer similar algorithms.
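To illustrate that such algorithms need nothing beyond nodes and edges, here is a minimal Python sketch that computes degree centrality and a shortest path directly over a list of triples. The triples, the `ex:` names and the helper functions are hypothetical and chosen only for illustration; the sketch treats every triple as an undirected edge, which is one assumption among several possible ones.

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, predicate, object); names are illustrative only.
TRIPLES = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:carol", "ex:knows", "ex:dave"),
    ("ex:alice", "ex:knows", "ex:carol"),
]


def build_adjacency(triples):
    """Treat every triple as an undirected edge between subject and object."""
    adjacency = defaultdict(set)
    for subject, _predicate, obj in triples:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path as a node list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


ADJACENCY = build_adjacency(TRIPLES)
CENTRALITY = degree_centrality(ADJACENCY)
PATH = shortest_path(ADJACENCY, "ex:alice", "ex:dave")
```

The same two functions would work unchanged on edges exported from a property graph, because they only see pairs of connected nodes.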
The ability to partition data is important. Relational databases partition data using tables and views. Both Property Graphs and RDF Graphs let users work with sets of nodes of a specific type (in the case of Property Graphs, nodes carrying a specific label), e.g., a query can be limited to only work with actors or to only work with directors. This provides a very basic, limited partitioning.
RDF data can also be partitioned into named graphs. A named graph offers us a way to say that some group of triple statements belongs to a “sub-graph.” We can then give it a uniquely identifying name (hence the term “named graph”) and associate with it any other information that we see as important. The idea is somewhat similar to views in relational databases. A single statement can belong to many named graphs. Thus, it is a different concept from physically partitioning distinct graphs across different machines.
We can query a named graph individually, or we can query all available graphs, or a subset of available graphs. We can load a named graph, clear it and perform any other manipulations with it. This again follows the idea of “separate, but connectable.”
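The operations just described can be sketched in a few lines of Python by modelling each statement as a quad, that is, a triple plus the name of the graph it belongs to. This is only an in-memory illustration with hypothetical graph names and data; in an actual RDF store the corresponding operations would typically be expressed with SPARQL constructs such as `GRAPH`, `FROM NAMED` and `CLEAR GRAPH`.

```python
# Hypothetical quads: (graph_name, subject, predicate, object).
QUADS = [
    ("ex:glossary", "ex:Actor", "rdfs:label", "Actor"),
    ("ex:glossary", "ex:Director", "rdfs:label", "Director"),
    ("ex:movies", "ex:person1", "rdf:type", "ex:Actor"),
    ("ex:movies", "ex:person2", "rdf:type", "ex:Director"),
    # The same statement may belong to more than one named graph.
    ("ex:archive", "ex:person1", "rdf:type", "ex:Actor"),
]


def match(quads, graphs=None, subject=None, predicate=None, obj=None):
    """Return distinct triples matching the pattern.

    graphs=None queries all graphs; otherwise only the named graphs given.
    """
    results = []
    for graph, s, p, o in quads:
        if graphs is not None and graph not in graphs:
            continue
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        triple = (s, p, o)
        if triple not in results:
            results.append(triple)
    return results


def clear_graph(quads, graph):
    """Return a new quad list without the statements of one named graph."""
    return [quad for quad in quads if quad[0] != graph]


# Query a single named graph.
ACTORS_IN_MOVIES = match(QUADS, graphs={"ex:movies"}, predicate="rdf:type", obj="ex:Actor")
# Query all graphs; the duplicated statement is reported once.
ACTORS_EVERYWHERE = match(QUADS, predicate="rdf:type", obj="ex:Actor")
# Query a subset of graphs.
LABELS = match(QUADS, graphs={"ex:glossary", "ex:archive"}, predicate="rdfs:label")
# Clear one graph; the statement survives in the other graph that holds it.
AFTER_CLEAR = clear_graph(QUADS, "ex:archive")
```

Clearing `ex:archive` leaves the statement about `ex:person1` intact in `ex:movies`, which is the sense in which a named graph is a logical grouping rather than a physical partition.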
For example, in an enterprise Knowledge Graph solution, a given business glossary or taxonomy is a named graph. Resources in it can be connected to resources in other graphs, but it can also be manipulated as a distinct set of statements. For instance, a purpose could be associated with a glossary as a whole, its users and uses could be identified, and so on. There is no similar concept in the Property Graph world.
Limitations of Property Graphs
In this white paper, we describe some limitations of Property Graphs and how they differ from Knowledge Graphs that are based on RDF. The main vendor for property graph technology, Neo4J, offers a mature system with attractive capabilities that are easy to get started with. There are also a few other Property Graph databases on the market today.
However, we increasingly hear of customers hitting the wall with Property Graphs because as they start to use them, they recognize the need for one or more of the following capabilities:
- Capture of Schema in a Graph
- Support for Validation and Data Integrity
- Capture of Rich Rules
- Support for Inheritance and Inference
- Globally Unique Identifiers
- Resolvable Identifiers
- Connectivity Across Graphs
- Better Solution to Graph Evolvability
Note that these are fundamental limitations that are not addressed in the design of property graphs. In principle, it may be possible to add at least some of these capabilities to a Property Graph, but not easily or elegantly. Some of you may already be on the road to doing this. However, it is a lot of effort, both conceptually (i.e., design and architecture) and in implementation. Even if you succeed, you will end up with a proprietary, home-grown version of capabilities that already exist, are standardized and are well proven.
Inherent Semantics Make It Easy for RDF Graphs to Become Knowledge Graphs
As illustrated in the previous sections, RDF-based graphs capture more than just data. They capture the meaning or semantics of data, including rich constraints and highly expressive rules. All information is stored in a graph and is available for query and for any other algorithms that can help us reason and discover new knowledge based on the available knowledge. And the amount of knowledge available with Knowledge Graphs is practically unlimited, just as it is on the World Wide Web. We can reach out and take advantage of the information available in other graphs. “Separate, but connectable” is a key feature of the web, and of Knowledge Graphs.
With Property Graphs, data modeling happens on paper or on a whiteboard, separate from the graph itself. Property Graphs are not self-describing, and the meaning of the data they store is not part of the graph.
Some Guidance for Moving from a Property Graph to a Knowledge Graph
It is fairly easy to generate one of the RDF standard serializations from a property graph. In fact, Neo4J offers a library for doing this. You can readily get the data out, but you will not be able to get the semantics of the data, because the data model exists only in your initial design sketches and, partially, within Cypher queries and programs.
Further, as we have discussed, the structure of the graph data may be influenced by the specific limitations of the property graph data model and optimizations that were required due to the architecture of a property graph database. We already demonstrated how a decision to use intermediate nodes in a property graph may be based on the need to add information to a property, which is only possible if a property is turned into an edge.
Further, in property graphs some property values, such as dates or names, are often turned into entities because there is no efficient way of querying literal values, especially if they are multi-valued. As a result, you may have an entity for the number 58,811 or the year 1956. This, however, can result in so-called “dense nodes,” that is, nodes that participate in many relationships. Typically, nodes that are targets of thousands of relationships are considered dense in Neo4J, with the potential for performance issues when such nodes are deleted. The design of the model may, therefore, be shaped by density considerations. Similarly, you may have relationships that represent specific dates, e.g., BORN_IN_1956, BORN_IN_1957, etc. This design pattern is used in property graphs because, with a generic BORN_IN relationship, Cypher queries looking for people born in, say, 1956 do not perform well. Once you move to RDF, you may decide to revisit some of these design decisions.
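As one way of revisiting the per-year relationship pattern, the following Python sketch collapses BORN_IN_<year> edges into a single generic predicate whose value is an integer literal, and then filters on that literal. The edge data, the `ex:birthYear` predicate and the helper names are hypothetical; this is an illustration of the remodelling idea, not a prescribed migration procedure.

```python
# Hypothetical property graph edges: (source node, relationship type, target node).
PROPERTY_GRAPH_EDGES = [
    ("person1", "BORN_IN_1956", "year1956"),
    ("person2", "BORN_IN_1957", "year1957"),
    ("person3", "BORN_IN_1956", "year1956"),
    ("person1", "ACTED_IN", "movie1"),
]

PREFIX = "BORN_IN_"


def to_literal_triples(edges):
    """Collapse BORN_IN_<year> edges into one predicate with an integer literal.

    The year entity nodes (year1956, ...) are dropped, so no dense node remains.
    Edges of other types are ignored by this sketch.
    """
    triples = []
    for source, edge_type, _target in edges:
        if edge_type.startswith(PREFIX):
            year = int(edge_type[len(PREFIX):])
            triples.append((f"ex:{source}", "ex:birthYear", year))
    return triples


def born_in(triples, year):
    """Return the subjects whose ex:birthYear literal equals the given year."""
    return sorted(s for s, p, o in triples if p == "ex:birthYear" and o == year)


BIRTH_TRIPLES = to_literal_triples(PROPERTY_GRAPH_EDGES)
BORN_1956 = born_in(BIRTH_TRIPLES, 1956)
```

With the year held as a literal, a range question such as "born before 1957" becomes a comparison on the value rather than an enumeration of relationship types.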
The simplest way forward is to export property graph data as-is and then create a data model in RDF that represents the structure of the data. For example, if you created intermediate nodes in order to link roles to the people who played them, you would mirror this in your RDF model (often called an ontology), even if, strictly speaking, this is not necessary in the RDF-based implementation.
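A minimal Python sketch of this "export as-is, then describe" approach follows. It assumes a hypothetical in-memory property graph (one label per node, a flat property map, typed edges) and a made-up `http://example.org/` namespace; it writes N-Triples lines and then derives a rough per-class list of observed properties as a starting point for the RDF model. It does not use or represent any vendor's export library.

```python
BASE = "http://example.org/"  # hypothetical namespace, for illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical property graph with an intermediate Role node.
NODES = {
    "n1": {"label": "Person", "properties": {"name": "Example Actor", "born": 1956}},
    "n2": {"label": "Role", "properties": {"name": "Example Role"}},
    "n3": {"label": "Movie", "properties": {"title": "Example Movie"}},
}
EDGES = [("n1", "PLAYED", "n2"), ("n2", "IN_MOVIE", "n3")]


def iri(local_name):
    """Wrap a local name in the example namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def literal(value):
    """Serialize integers as xsd:integer, everything else as a plain string."""
    if isinstance(value, int) and not isinstance(value, bool):
        return f'"{value}"^^<{XSD_INTEGER}>'
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def export_ntriples(nodes, edges):
    """Emit one N-Triples line per label, property value and edge."""
    lines = []
    for node_id, node in nodes.items():
        lines.append(f"{iri(node_id)} <{RDF_TYPE}> {iri(node['label'])} .")
        for key, value in node["properties"].items():
            lines.append(f"{iri(node_id)} {iri(key)} {literal(value)} .")
    for source, edge_type, target in edges:
        lines.append(f"{iri(source)} {iri(edge_type)} {iri(target)} .")
    return lines


def derive_model(nodes, edges):
    """Collect, per class, the properties and relationships observed in the data."""
    model = {}
    for node in nodes.values():
        model.setdefault(node["label"], set()).update(node["properties"])
    for source, edge_type, _target in edges:
        model[nodes[source]["label"]].add(edge_type)
    return model


NTRIPLES = export_ntriples(NODES, EDGES)
MODEL = derive_model(NODES, EDGES)
```

The derived model keeps the intermediate Role class exactly as it was in the property graph, which matches the "mirror first, revisit later" advice above.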
Many applications today use GraphQL to read and write data. Neo4J and some other Property Graph offerings support GraphQL access to data. If you have used GraphQL to build your solution on top of a Property Graph, you will be able to keep much of your code as you move to an RDF platform that also supports GraphQL.
For property graphs, GraphQL Schemas need to be manually created and then manually maintained as the graph structures get extended and changed. One of the advantages of a self-describing graph is that GraphQL Schemas can be automatically generated from the data model. This delivers on the promise of frictionless development and graceful systems maintenance by rendering unnecessary any manual effort for defining and maintaining schemas.
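To make the idea of schema generation concrete, here is a small Python sketch that turns a data model into GraphQL schema definition text. The model format (class, property, range), the mapping from XSD datatypes to GraphQL scalars and the decision to expose every class-valued property as a list are all assumptions made for this illustration; an actual product would read the model from the graph itself and apply its own mapping rules.

```python
# Hypothetical data model: class -> property -> range. A range is either an
# XSD datatype or another class in the model. All names are illustrative.
DATA_MODEL = {
    "Person": {"name": "xsd:string", "birthYear": "xsd:integer", "playedRole": "Role"},
    "Role": {"name": "xsd:string", "inMovie": "Movie"},
    "Movie": {"title": "xsd:string"},
}

# Assumed mapping from XSD datatypes to GraphQL built-in scalar types.
SCALARS = {
    "xsd:string": "String",
    "xsd:integer": "Int",
    "xsd:boolean": "Boolean",
    "xsd:decimal": "Float",
}


def graphql_type(range_name, model):
    """Map a property range to a GraphQL type; class ranges become lists."""
    if range_name in SCALARS:
        return SCALARS[range_name]
    if range_name in model:
        return f"[{range_name}]"
    raise ValueError(f"Unknown range: {range_name}")


def generate_sdl(model):
    """Render one GraphQL object type per class in the model."""
    blocks = []
    for class_name, properties in model.items():
        fields = [
            f"  {name}: {graphql_type(range_name, model)}"
            for name, range_name in properties.items()
        ]
        blocks.append("type " + class_name + " {\n" + "\n".join(fields) + "\n}")
    return "\n\n".join(blocks)


SDL = generate_sdl(DATA_MODEL)
print(SDL)
```

When a property is added to the model, rerunning the generator updates the schema text, which is the maintenance benefit the paragraph above describes.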
Summary
Neo4J is a mature solution that popularized Property Graphs and made them easy to get started with. People tend to think that RDF-based Knowledge Graphs are complex, hard to understand and hard to get started with. In the past, there was some truth to that characterization. Today, with mature Knowledge Graph products available, this is no longer the case.
Many users are discovering the limitations of property graphs. Even if you started your first graph project using a property graph, it is likely that sooner or later you will be hindered by these limitations and will want to adopt, or at least explore the feasibility of, an RDF-based Semantic Knowledge Graph system. You will not be alone, as a number of organizations are graduating from property graphs to knowledge graphs. We hope that this paper has provided some insight and value for your decision making.
Part 3 of this article series concludes with a discussion of some implications of the differences in capabilities of the two main graph models. We learned that:
- Knowledge Graphs provide much stronger capabilities for modular separation and composition of graphs, allowing data to be partitioned into named graphs. Just as with the web, “separate, but connectable” is a key feature of Knowledge Graphs.
- Property Graph products such as Neo4J are mature and may be easy to get started with. However, fundamental limitations in the design of property graphs preclude some critical capabilities, and these are not easy to add if and when you need them.
- RDF-based graphs capture not only data, but also the meaning or semantics of data, including rich constraints and highly expressive rules. Property Graphs are not self-describing and the meaning of the data they store is not a part of a graph.
- Though it is possible to move from a Property Graph to a Knowledge Graph, and you can readily get the data out, you cannot directly get the semantics of the data. The simplest way forward is to export property graph data as-is and then create a data model in RDF that represents the structure of the data.