Data Modelling Topologies of a Graph Database

Definition and Classification of Graph Databases into Three Categories

Definition

The definition of a graph database causes a lot of confusion. In my opinion, a definition is preferable when it avoids any reference to the semantics of nodes and edges or to their internal structure. A definition that ignores this guideline inevitably favors specific implementations, e.g. Property Graph Databases or Triple Stores, and makes it easy to become myopic to other types that are based on different models, e.g. hypergraph databases, or on different data storage paradigms, e.g. key-value stores. Therefore, I propose we adopt a vendor-neutral definition, such as the following one, which does not exclude any future type of graph database.

A Graph Database is a database that uses a graph topology, i.e. vertices and edges, to manage information at the conceptual level independent of the logical and physical implementation of the graph data structure - Athanassios I. Hatzis

Graph Databases Per Data Model

That said, graph databases differ considerably in their abstraction layers. These differences affect everything: visualization, query language, indexing, scaling, and transactions. Here I focus on the conceptual/logical layer, where my work is based. Depending on the structure of nodes and edges, one can describe the following three data models.

  1. Property Graph Data Model

    • Directed Labeled Graph
    • Entity-centric with embedded properties and edges with bidirectional linking to nodes
    • Neo4j, OrientDB, ArangoDB, etc.
  2. Triple/Quadruple Data Model

    • Directed Labeled Graph
    • Edge-centric with unidirectional linking on vertices
    • GraphDB, AllegroGraph, OpenLink Virtuoso, etc.
  3. Associative Data Model

    • Hypergraph/Bipartite Graph
    • Hypernodes, Hyperedges with bidirectional linking
    • Topic Map Data Model, R3DM/S3DM, X10SYS (AtomicDB), HyperGraphDB, Qlik Technology
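To make the "Hypergraph/Bipartite Graph" entry of the third model more concrete, here is a minimal Python sketch of one association that links three things at once, stored as a hyperedge and then flattened into a bipartite incidence graph. The names (`supply_1`, `supplier_A`, and so on) and the helper functions are hypothetical and chosen only for illustration. The sketch does not describe how R3DM/S3DM, HyperGraphDB, or any other product listed above is implemented.

```python
# Assumption: a hypergraph is modelled as a dict of hyperedge id -> member nodes.
# All identifiers below are made up for this example.
hyperedges = {
    "supply_1": {"supplier_A", "part_X", "project_P"},
    "supply_2": {"supplier_A", "part_Y", "project_P"},
}


def to_bipartite(hyperedges):
    """Return the (hyperedge, node) incidence pairs of the equivalent bipartite graph."""
    return sorted((e, n) for e, members in hyperedges.items() for n in members)


def node_to_edges(hyperedges):
    """Build the reverse index node -> hyperedges, so links can be followed both ways."""
    index = {}
    for e, members in hyperedges.items():
        for n in members:
            index.setdefault(n, set()).add(e)
    return index


incidence = to_bipartite(hyperedges)
index = node_to_edges(hyperedges)

# Every hyperedge reaches its members, and every member reaches its hyperedges.
print(index["supplier_A"])
```

Each hyperedge keeps its own identity, so a many-to-many association among three or more participants stays a single unit instead of being split into several binary edges.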

There are two main differences between (1) and (2). First, edges in a property graph are, by definition, bidirectional: you can traverse any edge both ways, even though the edge has a direction. With RDF, by contrast, you have to define two labeled edges with opposite directions to achieve bidirectional linking. Second, in literal triples the object parts are properties of the subject part, but they are not first-class citizens, and they are not embedded inside the node structure as they are in the Entity nodes of a property graph.
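The two differences can be sketched in a few lines of Python. This is a toy model that follows the description in this article, not the storage or query engine of any real product. The class `PropertyGraph`, the function `triple_objects`, and the sample data are all hypothetical names introduced for the example.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: properties are embedded in nodes, edges are indexed both ways."""

    def __init__(self):
        self.props = {}                      # node id -> embedded properties
        self.out_edges = defaultdict(list)   # source -> [(label, target)]
        self.in_edges = defaultdict(list)    # target -> [(label, source)]

    def add_node(self, node_id, **props):
        self.props[node_id] = props

    def add_edge(self, src, label, dst):
        # One directed edge is registered on both endpoints.
        self.out_edges[src].append((label, dst))
        self.in_edges[dst].append((label, src))

    def neighbors(self, node_id):
        """Traverse edges regardless of their direction."""
        forward = [dst for _, dst in self.out_edges[node_id]]
        backward = [src for _, src in self.in_edges[node_id]]
        return forward + backward


def triple_objects(triples, subject):
    """Follow triples strictly from subject to object, as assumed in the text."""
    return [o for s, _, o in triples if s == subject]


pg = PropertyGraph()
pg.add_node("alice", name="Alice", age=30)   # properties live inside the node
pg.add_node("acme", name="Acme")
pg.add_edge("alice", "WORKS_FOR", "acme")

triples = [
    ("alice", "worksFor", "acme"),
    ("alice", "name", "Alice"),              # literal triple: the property is its own edge
]

pg_from_acme = pg.neighbors("acme")              # reachable through the single edge
rdf_from_acme = triple_objects(triples, "acme")  # nothing points out of "acme" yet

triples.append(("acme", "employs", "alice"))     # second edge in the opposite direction
rdf_from_acme_after = triple_objects(triples, "acme")
```

In the property graph, a single `WORKS_FOR` edge is enough to get from `acme` back to `alice`. In the triple list, the same hop only works after the opposite `employs` triple has been added.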

I have left the associative data model for last. R3DM/S3DM is the reincarnation of Topic Maps, the de facto standard for the representation of associations. The following series of posts on associative data modeling is written in a hands-on style. It is an attempt to cut through the information glut surrounding many-to-many relationships (a.k.a. associations) through a thorough examination of well-known data models and, at the same time, to introduce R3DM/S3DM to the public.

  • Part 1/6 - Relation, Relationship, and Association
  • Part 2/6 - Association in Topic Map Data Model
  • Part 3/6 - Association in Property Graph Data Model
  • Part 4/6 - Association in RDF Data Model and Sentences associative data model
  • Part 5/6 - Qlik Associative Model
  • Part 6/6 - R3DM/S3DM Associative Semiotic Hypergraph Data Model

Epilogue

The verdict from this quick review of graph databases is that I have reasons to believe that associative data modeling is far more powerful and expressive than the other two models. I foresee that DBMS vendors that incorporate R3DM/S3DM technology in their products will eventually have a significant competitive advantage.

Athanassios I. Hatzis, PhD
Software Engineer - Researcher, Consultant, Independent Contractor
