A Quick Guide on How to Prevail in the Graph Database Arena

A brief discussion on criteria to meet a differentiation strategy for graph databases


Introduction

There are endless discussions in the database arena about which DBMS is best suited for operational or data warehousing analytics, which one is the most efficient for online transaction processing, and which one is suitable for semantic integration. Graph databases have recently been growing in popularity, especially in the enterprise space, and this perhaps adds to the headache of vendors trying to differentiate themselves from the competition and of clients who are completely uncertain how to embrace this database technology.

Definition of Graph Databases

Bloor recently published a report on Graph and RDF Databases. Its author, Philip Howard, claims that “the difference between a true graph product and a triple store is that the former supports index free adjacency (which means you can traverse a graph without needing an index) and the latter doesn’t”. In contrast, Claudius Weinberger, CEO of ArangoDB, argues that this is not a fundamental criterion for what constitutes a graph database. In a post titled “Index Free Adjacency or Hybrid Indexes for Graph Databases” he proposes that the definition of a graph database remains

a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data independent of the way the data is stored internally.

Claudius Weinberger
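To make the two positions concrete, here is a minimal Python sketch of the storage styles the debate is about. The class and function names are hypothetical. In the first style each node object holds direct references to its neighbors (index-free adjacency); in the second, edges live in a separate lookup table keyed by node id, standing in for an index. Both answer the same two-hop traversal, which is the point of an implementation-independent definition; the performance characteristics of real engines are not modeled here.

```python
from collections import defaultdict


class Node:
    """Index-free adjacency: each node holds direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.out = []  # outgoing neighbors, stored as Node objects

    def link(self, other):
        self.out.append(other)


class IndexedGraph:
    """Index-based adjacency: neighbors are looked up in a table keyed by node id."""

    def __init__(self):
        self.edge_index = defaultdict(list)  # node id -> outgoing neighbor ids

    def link(self, src, dst):
        self.edge_index[src].append(dst)

    def neighbors(self, node_id):
        # .get() avoids inserting empty entries for unknown ids
        return self.edge_index.get(node_id, [])


def two_hops_free(node):
    # Follow stored pointers hop by hop; no lookup structure is consulted.
    return sorted({n2.name for n1 in node.out for n2 in n1.out})


def two_hops_indexed(graph, node_id):
    # Every hop is a lookup in the edge index.
    return sorted({n2 for n1 in graph.neighbors(node_id) for n2 in graph.neighbors(n1)})


# The same graph, a -> b -> {c, d}, stored both ways.
a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.link(b)
b.link(c)
b.link(d)

g = IndexedGraph()
for src, dst in [("a", "b"), ("b", "c"), ("b", "d")]:
    g.link(src, dst)

print(two_hops_free(a))          # ['c', 'd']
print(two_hops_indexed(g, "a"))  # ['c', 'd']
```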

Indeed, the same Bloor report distinguishes between native and non-native graph databases based on their engine. In my opinion, a definition that avoids any reference to the semantics of nodes and edges or to their internal structure is preferable. A definition that ignores this guideline inevitably favors specific implementations, e.g. Property Graph Databases or Triple Stores, and can easily become blind to other types that are based on different models, e.g. hypergraph databases, or on different data storage paradigms, e.g. key-value stores. Therefore, I propose that we adopt a vendor-neutral definition, such as the following, which does not exclude any future type of graph database.

A Graph Database is a database that uses a graph topology, i.e. vertices and edges, to manage information at the conceptual level independent of the logical and physical implementation of the graph data structure.

Athanassios I. Hatzis, 28th February 2017

Many-to-many Relationships

In another recently published Spotlight paper by Bloor, “All about graphs: a primer”, the author discusses the Graph data model and highlights how a many-to-many relationship is represented differently in bipartite graphs, hypergraphs and associative graphs. He observes that

unlike other new database approaches, graphs cannot easily be subsumed by the leading relational database vendors because the architectural constraints of graphs do not fit easily within the relational paradigm.

Philip Howard

He mentions that the two main variants for representing entity relationships are labeled property graphs and subject-predicate-object triples. In practice, although the idea of relationships (associations) between entities is at the heart of Peter Chen’s Entity-Relationship model, Fig.2 and Fig.3, there are subtle differences in how it is implemented across graph databases. In a series of hands-on posts on associative data modeling, A. Hatzis attempts to cut through the information glut on this topic with a thorough examination of graph data models.
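As a rough illustration of those differences, the Python sketch below stores the same hypothetical many-to-many relationship (two people co-owning one car, one of them also owning a second car) as labeled property graph edges, as subject-predicate-object triples and as hyperedges, and answers the same query over each. The data, the node names such as own1, and the choice to turn each relationship into its own node in the triple form are assumptions for illustration; RDF offers several patterns for attaching data to a relationship.

```python
# 1) Labeled property graph: each binary edge is (source, label, target, properties).
lpg_edges = [
    ("alice", "OWNS", "car1", {"since": 2015}),
    ("bob", "OWNS", "car1", {"since": 2015}),
    ("alice", "OWNS", "car2", {"since": 2016}),
]

# 2) Triples: to attach "since" to a relationship, the relationship becomes
#    a node of its own (own1, own2, own3) described by further triples.
triples = [
    ("own1", "owner", "alice"), ("own1", "car", "car1"), ("own1", "since", 2015),
    ("own2", "owner", "bob"), ("own2", "car", "car1"), ("own2", "since", 2015),
    ("own3", "owner", "alice"), ("own3", "car", "car2"), ("own3", "since", 2016),
]

# 3) Hypergraph: one hyperedge may connect any number of nodes,
#    so the co-ownership of car1 is a single edge.
hyperedges = [
    {"label": "OWNERSHIP", "owners": {"alice", "bob"}, "car": "car1", "since": 2015},
    {"label": "OWNERSHIP", "owners": {"alice"}, "car": "car2", "since": 2016},
]


def owners_lpg(car):
    return sorted(s for s, label, t, _ in lpg_edges if label == "OWNS" and t == car)


def owners_triples(car):
    # First find the relationship nodes that point at the car, then their owners.
    rels = {s for s, p, o in triples if p == "car" and o == car}
    return sorted(o for s, p, o in triples if s in rels and p == "owner")


def owners_hyper(car):
    return sorted(o for e in hyperedges if e["car"] == car for o in e["owners"])


print(owners_lpg("car1"), owners_triples("car1"), owners_hyper("car1"))
```

The query result is identical in all three cases; what changes is how many structural elements one relationship needs (three binary edges, nine triples, or two hyperedges).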

Multi-model Database Engine

The graph engine and the type of data model are critical factors for any graph database. It is therefore not surprising that many vendors have started marketing their DBMS as multi-model. We have long and extensive experience with two such products, OrientDB and InterSystems Caché. The former supports the Graph, Document, Key/Value and Object models; the latter is an object database with relational access, integrated support for JSON documents and a multidimensional key-value storage mechanism that can easily be extended to cover the Graph data model. Generally speaking, we have reason to believe that multi-model DBMSs will dominate the database market. OrientDB has become a leading player among graph databases, and InterSystems Caché is one of the best operational DBMSs according to Gartner's Magic Quadrant report.

Physical versus Logical Perspective

A multi-model database is not only flexible in its logical schema; it also has a unified storage architecture. Although the developer should rarely need access to the physical implementation details of the storage engine, an API for direct use of the engine is desirable and beneficial for many reasons. Most importantly, this kind of architecture allows someone to build a customized database management system. In theory, the ANSI/SPARC three-level architecture (external, conceptual/logical and physical) aims to keep these three perspectives relatively independent of each other, but in practice the front end of a DBMS usually depends strongly on the back-end storage data model.
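A minimal Python sketch of the loose coupling argued for above: a logical layer (the hypothetical PersonStore) talks only to an abstract storage interface, and two interchangeable physical engines sit beneath it, one hash-based and one ordered. The class names and the three-method interface are assumptions for illustration; real storage engines expose far richer APIs.

```python
import bisect
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Physical level: the only operations the logical level may rely on."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def keys_with_prefix(self, prefix): ...


class HashEngine(StorageEngine):
    """Hash-table backend: fast point lookups, prefix scans examine every key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def keys_with_prefix(self, prefix):
        return sorted(k for k in self._data if k[: len(prefix)] == prefix)


class SortedEngine(StorageEngine):
    """Ordered backend: keys kept sorted, so a prefix scan is a contiguous range."""

    def __init__(self):
        self._keys = []
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def keys_with_prefix(self, prefix):
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for k in self._keys[start:]:
            if k[: len(prefix)] != prefix:
                break
            out.append(k)
        return out


class PersonStore:
    """Logical level: speaks about persons and fields, never about storage."""

    def __init__(self, engine):
        self.engine = engine

    def add(self, pid, **fields):
        for name, value in fields.items():
            self.engine.put(("Person", pid, name), value)

    def record(self, pid):
        keys = self.engine.keys_with_prefix(("Person", pid))
        return {k[2]: self.engine.get(k) for k in keys}


for engine in (HashEngine(), SortedEngine()):
    people = PersonStore(engine)
    people.add("p1", name="Ann", city="Athens")
    people.add("p2", name="Bob")
    print(type(engine).__name__, people.record("p1"))
```

Swapping the engine changes nothing in PersonStore, which is the independence the three-level architecture promises; the trade-off moves into the engine, where the hash backend pays for prefix scans and the ordered one pays on insertion.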

A loose coupling can be achieved with associative/multidimensional arrays. Whatever their physical implementation (e.g. hash tables or trees), this abstract data type can model all four NoSQL database types: Key/Value, Tabular/Columnar, Document and Graph. We believe that associative/multidimensional arrays will eventually prevail in the world of databases. There is already strong competition over their best physical implementation, and sparse, column-family databases have proven very popular (HBase, Hypertable, BigTable, InterSystems Caché).
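The following Python sketch treats a plain dict with tuple keys as a sparse multidimensional array, loosely in the spirit of subscripted storage such as InterSystems Caché globals, and encodes all four NoSQL models in it. The key layouts and helper names are illustrative assumptions, not a description of any product's internal format.

```python
# A sparse associative array: subscript tuples -> primitive values.
store = {}

# Key/Value: the key is a single subscript under a namespace.
store[("session", "42")] = "alice"

# Tabular/Columnar: (table, row, column) -> cell; absent cells take no space.
store[("cars", "1", "model")] = "Corolla"
store[("cars", "1", "year")] = 2015
store[("cars", "2", "model")] = "Golf"

# Document: nested fields flattened into a path of subscripts.
store[("doc", "alice", "address", "city")] = "Athens"
store[("doc", "alice", "tags", "0")] = "admin"

# Graph: (namespace, source, label, target) -> edge property.
store[("edge", "alice", "OWNS", "car1")] = 2015
store[("edge", "bob", "OWNS", "car1")] = 2015


def scan(prefix):
    """All (key, value) pairs under a subscript prefix, in key order."""
    n = len(prefix)
    return sorted(((k, v) for k, v in store.items() if k[:n] == prefix), key=lambda kv: kv[0])


def column(table, col):
    """Read one column of a table: row id -> value."""
    return {k[1]: v for k, v in scan((table,)) if k[2] == col}


def owners(target):
    """Follow incoming OWNS edges of a node back to their sources."""
    return sorted(k[1] for k, _ in scan(("edge",)) if k[2] == "OWNS" and k[3] == target)


print(store[("session", "42")])  # alice
print(column("cars", "model"))   # {'1': 'Corolla', '2': 'Golf'}
print(owners("car1"))            # ['alice', 'bob']
```

The only thing that distinguishes the four models here is the convention for the subscripts; the storage structure and the prefix scan are shared.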

Other properties are crucial for operational database management systems, such as ACID transactions, a distributed data architecture and scalability. Whether we are talking about multi-model or single-model graph databases, there is a tendency to use them for online transaction processing, so these properties are worth having. And again, in terms of architectural design, there is always the problem of how to achieve loose coupling between the physical structures of a database and the application logic.
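Of the ACID properties, atomicity is the easiest to illustrate. The Python sketch below wraps writes to an in-memory dict in a hypothetical transaction() context manager that keeps an undo log and rolls everything back if the block raises. It covers only the A in ACID, for a single thread in a single process; isolation, durability and distribution, which an operational graph database also needs, are deliberately out of scope.

```python
from contextlib import contextmanager

_MISSING = object()  # marks keys that did not exist before the transaction


@contextmanager
def transaction(store):
    """All writes made through `write` take effect, or none do (atomicity only)."""
    undo = []  # (key, previous value or _MISSING), in write order

    def write(key, value):
        undo.append((key, store.get(key, _MISSING)))
        store[key] = value

    try:
        yield write
    except BaseException:
        # Undo in reverse order so each key ends up with its pre-transaction value.
        for key, old in reversed(undo):
            if old is _MISSING:
                store.pop(key, None)
            else:
                store[key] = old
        raise


accounts = {"a": 100, "b": 0}

with transaction(accounts) as write:
    write("a", accounts["a"] - 30)
    write("b", accounts["b"] + 30)

try:
    with transaction(accounts) as write:
        write("a", accounts["a"] - 500)
        raise ValueError("insufficient funds")  # aborts the whole transaction
except ValueError:
    pass

print(accounts)  # {'a': 70, 'b': 30}
```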

Conceptual Framework

This brings us to the question of what kind of logical/conceptual data model architecture to use. Our R3DM/S3DM framework is based on the powerful theory of the semiotic triangle. We use numerical vectors (signs) to encode the abstract things in our mind (the signified) to which a sign refers, e.g. Person, name, Car, model. We associate these with data containers, the forms that the sign takes for storing data values (the signifier), i.e. primitive data types (see also Signified and Signifier). This trilateral principle permits a uniform treatment of the semantics, syntax and storage of information based on a symbolic representation. In this way we define a fundamental, atomic information resource unit (AIR). These units, in turn, can easily be shaped into any tabular, hierarchical or graph data structure in a unified way. For an example, see this R3DM hypergraph representation of the QlikView associative model. Data granularity can also be closely tied to the definition of a fundamental unit of processing.
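A hypothetical Python sketch of how the three corners of the semiotic triangle could be held in one unit and then shaped into a table. The AIR field names, the layout of the numeric vector (domain model, entity type, attribute) and the table() helper are all assumptions for illustration; the actual R3DM/S3DM structures are not described in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AIR:
    """Hypothetical AIR unit following the semiotic triangle."""
    sign: tuple                # numeric vector that identifies the unit
    signified: str             # the concept the sign refers to
    signifier: Optional[type]  # primitive type that holds values (None: no values)


# Assumed vector layout: (domain model, entity type, attribute); 0 = the entity itself.
PERSON = AIR((1, 1, 0), "Person", None)
NAME = AIR((1, 1, 1), "name", str)
CAR = AIR((1, 2, 0), "Car", None)
MODEL = AIR((1, 2, 1), "model", str)

# Values are keyed by (attribute sign, record number).
values = {
    (NAME.sign, 1): "Alice",
    (NAME.sign, 2): "Bob",
    (MODEL.sign, 1): "Corolla",
}


def table(entity, attributes):
    """Shape AIR-keyed values into rows of one entity type."""
    rows = {}
    for (sign, rec), value in values.items():
        for attr in attributes:
            # Keep values of this attribute, provided it belongs to the entity type.
            if sign == attr.sign and sign[:2] == entity.sign[:2]:
                if not isinstance(value, attr.signifier):
                    raise TypeError(f"{attr.signified} expects {attr.signifier.__name__}")
                rows.setdefault(rec, {})[attr.signified] = value
    return [rows[r] for r in sorted(rows)]


print(table(PERSON, [NAME]))  # [{'name': 'Alice'}, {'name': 'Bob'}]
print(table(CAR, [MODEL]))    # [{'model': 'Corolla'}]
```

Because every value is reached through a sign, the same units could equally be shaped into a hierarchy or a graph by changing only the shaping function.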

Using this single primitive construct (AIR) as a building block, we have implemented seven type systems for the upper-level management of any DBMS. These are:

| # | System | Short name |
|---|--------|------------|
| 1 | SYS_Dataset | DSS |
| 2 | SYS_DomainModel | DMS |
| 3 | SYS_EntityType | ETS |
| 4 | SYS_AttributeType | ATS |
| 5 | SYS_ValueType | VTS |
| 6 | SYS_LinkType | LTS |
| 7 | SYS_Database | DBS |
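The seven type systems can be mirrored in code, for instance as a Python Enum valued by their short names. This is only a sketch of the naming scheme; how the type systems are stored and linked on top of OrientDB is not described in this article, and the number() helper is hypothetical.

```python
from enum import Enum


class TypeSystem(Enum):
    """The seven type systems, valued by their short names."""
    SYS_Dataset = "DSS"
    SYS_DomainModel = "DMS"
    SYS_EntityType = "ETS"
    SYS_AttributeType = "ATS"
    SYS_ValueType = "VTS"
    SYS_LinkType = "LTS"
    SYS_Database = "DBS"


def number(ts):
    """Position of a type system in the list (1-based, definition order)."""
    return list(TypeSystem).index(ts) + 1


ets = TypeSystem("ETS")       # look up a member by its short name
print(ets.name, number(ets))  # SYS_EntityType 3
```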

We characterize Datasets, Domain Models (schemas), Entities, Attributes, etc. as information resources; values are information realizations; and our AIR units, which represent everything, are called information representations, or simply references. Our current implementation phase has been completed on top of OrientDB, and a forthcoming article will present the R3DM/S3DM architecture in detail. In the past, the Freebase collaborative knowledge graph had a type system that was built on primitive constructs.

Query Language

Yet another decisive criterion for databases is the query language. For the RDF directed, labeled graph data format and the RDF store databases built on it, e.g. OpenLink Virtuoso, AllegroGraph and Ontotext GraphDB, the SPARQL query language is the standard way to retrieve data. In contrast, the query languages of property graph databases vary a lot. There are SQL-like APIs, such as those of OrientDB and ArangoDB; Neo4j uses its own declarative graph query language, Cypher; and there is also Gremlin, the open-source graph traversal language.
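The core of a SPARQL query is a basic graph pattern: a set of triple patterns containing variables, all of which must match at once. The Python sketch below evaluates such a pattern with naive nested-loop joins over an in-memory list of triples. It illustrates the semantics only, not how Virtuoso, AllegroGraph or GraphDB evaluate queries, and it omits IRIs, prefixes and every other SPARQL feature.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to something else
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def query(patterns, triples):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match_pattern(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


triples = [
    ("alice", "owns", "car1"),
    ("bob", "owns", "car1"),
    ("car1", "model", "Corolla"),
]

# Roughly: SELECT ?who ?m WHERE { ?who :owns ?car . ?car :model ?m }
result = query([("?who", "owns", "?car"), ("?car", "model", "?m")], triples)
print(sorted((b["?who"], b["?m"]) for b in result))
# [('alice', 'Corolla'), ('bob', 'Corolla')]
```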

Another approach is that of GraphQL, which is similar to the Freebase MQL query language. Queries are shaped in a hierarchical, JSON-like format, with patterns that follow the schema of the graph database.
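The idea shared by MQL and GraphQL is that the answer takes the shape of the query. The Python sketch below implements a simplified query-by-example over nested dicts: None means "fill in this value" (MQL's null plays a similar role), a nested dict descends into a sub-object, and any other value is a constraint that must match. The data and function names are hypothetical; GraphQL's actual syntax is not JSON, and it resolves fields through a typed schema, which this sketch ignores.

```python
def fill(query, record):
    """Match `query` against `record`; return the filled-in result or None."""
    result = {}
    for key, want in query.items():
        if key not in record:
            return None
        have = record[key]
        if want is None:  # placeholder: return the stored value
            result[key] = have
        elif isinstance(want, dict):  # descend into a nested object
            if not isinstance(have, dict):
                return None
            sub = fill(want, have)
            if sub is None:
                return None
            result[key] = sub
        elif want != have:  # constraint not satisfied
            return None
        else:
            result[key] = have
    return result


def run(query, records):
    return [r for r in (fill(query, rec) for rec in records) if r is not None]


people = [
    {"name": "Alice", "type": "Person", "car": {"model": "Corolla", "year": 2015}},
    {"name": "Bob", "type": "Person", "car": {"model": "Golf", "year": 2018}},
]

print(run({"name": None, "car": {"model": "Golf", "year": None}}, people))
# [{'name': 'Bob', 'car': {'model': 'Golf', 'year': 2018}}]
```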

We have developed a functional RESTful API that can serve as a prototype for a uniform, universal data language. Commands and their parameters can be simplified and made more efficient if we take into account the hierarchical relationship of the Server, Database, Class, Property and Record containers. There are five sets of commands: for getting, updating, deleting, adding and linking information. The current implementation is built with the Wolfram Language, and we will present more details in a forthcoming article analyzing the R3DM/S3DM architecture.
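The URL scheme of this API is not described here, so the following Python sketch only illustrates the idea that the Server, Database, Class, Property, Record hierarchy can carry most of a command's addressing parameters. The path layout, the HTTP method chosen for each of the five command sets and the /link suffix are all hypothetical.

```python
from urllib.parse import quote

# Container hierarchy named in the text, outermost first.
LEVELS = ("server", "database", "class", "property", "record")

# Hypothetical mapping of the five command sets onto HTTP methods.
METHODS = {"get": "GET", "update": "PUT", "delete": "DELETE", "add": "POST", "link": "POST"}


def build_request(command, *names):
    """Return (method, path) for a command scoped by container names.

    `names` are given outermost first, e.g. ("s1", "cars", "Car"); each one
    narrows the scope, so the path itself carries the addressing parameters.
    """
    if command not in METHODS:
        raise ValueError(f"unknown command: {command!r}")
    if not 1 <= len(names) <= len(LEVELS):
        raise ValueError("expected between 1 and 5 container names")
    segments = []
    for level, name in zip(LEVELS, names):
        segments += [level, quote(str(name), safe="")]
    suffix = "/link" if command == "link" else ""  # hypothetical sub-resource
    return METHODS[command], "/" + "/".join(segments) + suffix


print(build_request("get", "s1", "cars", "Car", "model"))
# ('GET', '/server/s1/database/cars/class/Car/property/model')
print(build_request("link", "s1", "cars", "Car"))
# ('POST', '/server/s1/database/cars/class/Car/link')
```

Because each level narrows the one above it, a command needs no separate "where" parameters: the scope is read directly from the path.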

Business Analytics

Last but not least, there is an emerging need for databases that can serve both analytical and operational workloads. In particular, the modern data warehouse should unify all of a client’s transactional databases and integrate external data sources that enable data cleansing, validation and enhancement. Moreover, for quick and smart business analytics, the interface should be both user-friendly and functionally powerful. We know of one player in this market segment whose technology has features similar to those of our R3DM/S3DM framework. This is why we devoted one of our articles to QlikView’s unique, award-winning, in-memory associative technology.
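The behaviour that makes the associative approach attractive for analytics can be sketched in a few lines of Python: selecting values in one field immediately classifies the values of every other field as possible (associated with the selection, shown white in Qlik) or excluded (shown gray). This single-table version is a simplification under stated assumptions; Qlik's engine propagates selections across all associated tables in memory, and the data and function name here are hypothetical.

```python
def associate(rows, field, selected):
    """Split every other field's values into 'possible' and 'excluded'
    given a selection of values in one field (single-table simplification)."""
    matching = [r for r in rows if r[field] in selected]
    state = {}
    for f in rows[0]:
        if f == field:
            continue
        possible = {r[f] for r in matching}
        excluded = {r[f] for r in rows} - possible
        state[f] = {"possible": sorted(possible), "excluded": sorted(excluded)}
    return state


sales = [
    {"region": "North", "product": "Car", "rep": "Alice"},
    {"region": "North", "product": "Bike", "rep": "Bob"},
    {"region": "South", "product": "Car", "rep": "Carol"},
]

state = associate(sales, "region", {"South"})
print(state["product"])  # {'possible': ['Car'], 'excluded': ['Bike']}
print(state["rep"])      # {'possible': ['Carol'], 'excluded': ['Alice', 'Bob']}
```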

Epilogue

Make no mistake, relational databases are the past of computer database technology. Graph databases are the present and the future. This quick review of what we consider important criteria for graph database products may leave the reader more perplexed than satisfied. This is our perspective: we wanted to share some of our knowledge with experts and chief technology officers in this field so that we could discuss the matter with them in more detail. The future will show how many of these points we got right.

Cross-References

R3DM Project Posts

2017

Associative Semiotic Hypergraph API in Mathematica for Next-Generation BI Systems
European Wolfram Technology Conference 19-20 June 2017 in Amsterdam
My talk at the European Wolfram Technology Conference 2017 about R3DM/S3DM, a new data modeling framework implemented on top of the OrientDB graph database and coded in Wolfram Mathematica

Are our old data model standards out of shape?
An overview of critical points to consider when modeling with R3DM/S3DM
Both Topic Maps and RDF/OWL exhibit signs of aging. These signs do not indicate maturity; on the contrary, they signal the need to re-examine the data modeling and information representation problem

The three dimensions of AI and a fourth one as the key to unlock them
Comments on a review of AI by John Launchbury, special assistant to DIRO, DARPA
Although there has been significant progress with first and second generation AI systems in reasoning, learning and perceiving, abstraction has not been part of the game. The mechanism of abstraction can unify these other three processes.

Associative Data Modelling Demystified: Part 6/6
R3DM/S3DM: Build Powerful, Meaningful, Cohesive Relationships Easily
Demonstration of a new data model framework that transforms OrientDB into a HyperGraph Database

Data Modelling Topologies of a Graph Database
Definition and Classification of Graph Databases into Three Categories
The associative data graph database model is still a heavy hitter, stacking up well against property graphs and triples/quadruples. Expect a comeback.

A Quick Guide on How to Prevail in the Graph Database Arena
A brief discussion on criteria to meet a differentiation strategy for graph databases
A swift introduction to the key factors that influence the performance and unification character of graph databases

Associative Data Modeling Demystified: Part 5/6
Qlik Associative Model
Qlik's competitive advantage over other BI tools is that it manages associations in memory at the engine level and not at the application level. Every data point in every field of a table is associated with every other data point anywhere in the entire schema.

2016

Associative Data Modeling Demystified: Part 4/6
Association in RDF Data Model
In this article we will see how to define an association in RDF and how it differs from the other data models analyzed in previous posts of our series

Do you Understand Many-to-Many Relationships?
Associative entities are represented differently in various data models
It is 2016 and in my opinion the situation with associative entities has become darn confusing. Edges of a Property Graph data model are bidirectional but RDF links are unidirectional.

Associative Data Modeling Demystified: Part 3/6
Association in Property Graph Data Model
In this article, we continue our investigation with the Property Graph Data model. We discuss how a many-to-many relationship is represented and compare its structure with that in other data models

Associative Data Modeling Demystified: Part 2/6
Association in Topic Map Data Model
In this post, we demonstrate how Topic Map data model represents associations. In order to link the two, we continue with another SQL query from our relational database

Associative Data Modeling Demystified: Part 1/6
Relation, Relationship and Association
In this article, we introduce the concept of association from the perspective of Entity-Relationship (ER) data model and illustrate it with the modeling of a toy dataset

2015

Towards a New Data Modelling Architecture
Part 2: Atomic Information Resource (AIR)
We introduce the Atomic Information Resource (AIR) unit of R3DM conceptual framework

Towards a New Data Modelling Architecture
Part 1: Relational/ER Constructs in Wolfram Language
We start with terms and constructs that most of us are familiar with from the Relational and Entity-Relationship database management systems