Are our old data model standards out of shape?

An overview of critical points to consider when modeling with R3DM/S3DM


Introduction

Both Topic Maps and RDF/OWL exhibit signs of aging. In my opinion, these signs do not indicate maturity. On the contrary, they signal that the data modeling and information representation problem needs to be re-examined. There is an emerging need to unify these models and to exchange transformations between serialization formats (XML, JSON, etc.), (graph) DBMS data model standards, and semantic web data models.

This is the background of my talk at the European Wolfram Technology Conference 2017 on a new data modeling framework, R3DM/S3DM, which is implemented on top of the OrientDB graph database and coded in Wolfram Mathematica.

Comparison with other data model standards

Here are a few critical points to consider when you compare this data model with Topic Maps and RDF/OWL:

Namespace problem

Both RDF/OWL and Topic Maps suffer from namespace problems and complexity. In Topic Maps, for example, when you define associations, i.e. n-ary relations or relationships, you must specify at least a type and roles. For example:

Part(08:pid, "Acme Widget Washer":pname, white:pcolor)
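To make the point about the missing handle concrete, here is a minimal Python sketch. It assumes that an association in this simplified notation is nothing more than a type plus (role, player) pairs; the class `Association` and its `player` method are hypothetical illustrations, not the Topic Maps API. Each value acquires meaning only through its role, and two occurrences with the same content cannot be told apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Hypothetical, simplified association: a type plus (role, player) pairs."""
    type: str
    roles: tuple  # ((role, player), ...)

    def player(self, role):
        # A value acquires meaning only through the role it plays.
        for r, p in self.roles:
            if r == role:
                return p
        raise KeyError(role)


part = Association("Part", (("pid", "08"), ("pname", "Acme Widget Washer"), ("pcolor", "white")))
twin = Association("Part", (("pid", "08"), ("pname", "Acme Widget Washer"), ("pcolor", "white")))

print(part.player("pname"))  # Acme Widget Washer
# Only the content identifies an instance: there is no separate handle.
print(part == twin)          # True
```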

But in this representation you do not have a handle for the association instance, and the role context must always be present to assign meaning to values. Things become even more complicated with an RDF association (type or instance), where everything has to be broken down into triples with labeled unidirectional edges.

{Prt1 --pid--> 08, Prt1 --pname--> "Acme Widget Washer", Prt1 --pcolor--> white}
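The decomposition can be sketched in plain Python (no RDF library involved; the helper `tuple_to_triples` is a hypothetical name). It shows the two costs mentioned above: the subject `Prt1` is repeated once per attribute, and every object needs its predicate to carry meaning.

```python
def tuple_to_triples(subject, header, values):
    """Break an n-ary tuple into (subject, predicate, object) triples."""
    if len(header) != len(values):
        raise ValueError("header and values must have the same length")
    # One triple per attribute: the subject is repeated every time.
    return [(subject, predicate, obj) for predicate, obj in zip(header, values)]


triples = tuple_to_triples("Prt1", ("pid", "pname", "pcolor"),
                           ("08", "Acme Widget Washer", "white"))
for s, p, o in triples:
    print(f"{s} --{p}--> {o}")
# Prt1 --pid--> 08
# Prt1 --pname--> Acme Widget Washer
# Prt1 --pcolor--> white
```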

The predicate of the RDF triple causes more harm than good. Any SPARQL traversal is heavily dependent on these predicates, and in practice, large collaborative knowledge bases such as Freebase labeled both directions to make traversal easier. Note also that owl:sameAs adds further complexity to the graph and its traversal.
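Why label both directions? A minimal sketch, assuming a toy in-memory list of triples: a unidirectional edge only gives you a subject-to-object index, so walking from an object back to its subject requires either a full scan or an extra, materialized inverse label. The `reverse_` prefix below is a hypothetical naming scheme for illustration, not Freebase's actual mechanism.

```python
from collections import defaultdict

triples = [
    ("Prt1", "pid", "08"),
    ("Prt1", "pname", "Acme Widget Washer"),
    ("Prt1", "pcolor", "white"),
]

# Forward index: subject -> [(predicate, object)]. This is what a
# unidirectional edge gives you for free.
forward = defaultdict(list)
for s, p, o in triples:
    forward[s].append((p, o))


def subjects_by_scan(obj):
    # Without an inverse label, object -> subject means scanning every triple.
    return [s for s, _, o in triples if o == obj]


# Alternative: materialize an inverse label for every predicate.
reverse = defaultdict(list)
for s, p, o in triples:
    reverse[o].append(("reverse_" + p, s))

print(forward["Prt1"])
print(subjects_by_scan("white"))  # ['Prt1']
print(reverse["white"])           # [('reverse_pcolor', 'Prt1')]
```

Either way the predicate labels dominate the traversal logic, which is the burden described above.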

Now compare these with the simplicity of the Entity-Relationship model: the database vocabulary holds the header of the association (relation), and the body contains the tuples.

tuple type: (pid, pname, pcolor)
tuple instance: (08, "Acme Widget Washer", white)
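As an analogy only, Python's `namedtuple` behaves like a relation with a header and a body: the header fixes the attribute names and their order, and each tuple in the body is interpreted positionally against it.

```python
from collections import namedtuple

# Header: the relation's vocabulary (attribute names in a fixed order).
Part = namedtuple("Part", ["pid", "pname", "pcolor"])

# Body: tuples whose values are interpreted by position against the header.
body = [Part("08", "Acme Widget Washer", "white")]

row = body[0]
print(row.pname)    # Acme Widget Washer (name resolved to a position)
print(row[2])       # white
print(row._fields)  # ('pid', 'pname', 'pcolor')
```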

Is there an alternative representation that combines these? Yes, there is: you create a uniform numerical representation of types and instances, of entities and attributes, of data values and data types, and you bind everything in a hypergraph space.

tuple type: 233:0{85:0, 91:0, 34:0}
tuple instance: 233:1[85:6, 91:2, 34:9]

The vocabulary of the relational model (the header) allows ordered tuples of values (the body), whereas the numerical reference vectors of the R3DM/S3DM model allow unordered tuples, with a handle that represents each tuple instance. In RDF, to represent a tuple you have to break it down into triples in which the subject is repeated, and each value (object) must be semantically accompanied by its predicate. Thus the R3DM/S3DM associative representation with numerical references is simpler, and it also proves to be more efficient for indexing!
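Here is a minimal Python sketch of reading the numerical references, under assumptions that are mine and not a description of the R3DM/S3DM implementation: each reference is a (dimension, position) pair, position 0 marks the type level, 233 stands for Part, 85/91/34 for pid/pname/pcolor, and 6/2/9 point at the values 08, "Acme Widget Washer" and white. The lookup tables `DIMENSIONS` and `VALUES` and the helpers `conforms` and `decode` are hypothetical.

```python
# Hypothetical lookup tables; the numbers come from the example above,
# their interpretation is an assumption made for this sketch.
DIMENSIONS = {233: "Part", 85: "pid", 91: "pname", 34: "pcolor"}
VALUES = {(85, 6): "08", (91, 2): "Acme Widget Washer", (34, 9): "white"}

# Tuple type 233:0{85:0, 91:0, 34:0}: position 0 marks the type level.
tuple_type = ((233, 0), frozenset({(85, 0), (91, 0), (34, 0)}))

# Tuple instance 233:1[85:6, 91:2, 34:9]: (233, 1) is the instance handle.
# A frozenset makes the components unordered; each reference carries its
# own attribute dimension, so no positional header is needed.
tuple_instance = ((233, 1), frozenset({(85, 6), (91, 2), (34, 9)}))


def conforms(instance, ttype):
    """True if the instance has the same dimension and attribute set as the type."""
    (h_dim, _), refs = instance
    (t_dim, _), type_refs = ttype
    return h_dim == t_dim and {d for d, _ in refs} == {d for d, _ in type_refs}


def decode(instance):
    """Resolve each numerical reference to an (attribute, value) pair."""
    handle, refs = instance
    return handle, {DIMENSIONS[d]: VALUES[(d, p)] for d, p in refs}


print(conforms(tuple_instance, tuple_type))  # True
handle, record = decode(tuple_instance)
print(handle)           # (233, 1)
print(record["pname"])  # Acme Widget Washer
```

Because every reference is a pair of integers, such tuples are straightforward to store in ordinary hash or B-tree indexes.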

Separate abstraction layers

It is important to separate digital information resources (e.g. web pages, files, folders, audio/video recordings, images, text documents) from real things (e.g. humans, organizations, objects). It is also important to distinguish between a flexible model and its instances. But it is equally, if not more, important to separate any abstract concept from data values (numbers, strings, bits, etc.), because the former is the vehicle of human thinking, while the latter is how computers process data. This gap therefore has to be bridged somehow. R3DM/S3DM achieves this with an extra abstraction layer where everything is connected with Atomic Information Resource (AIR) units. The AIR unit also defines the level of granularity: instead of building everything with Topics, you use AIR units.

Granularity with AIR units

The AIR unit has the advantage that it can be indexed easily: it is represented by a numerical vector, an address that pinpoints the exact location of an Entity-Attribute-Value item. This is similar to the IPv4 address of a machine (e.g. domain, network, server, node/device/machine). My question is this: if we use such addresses to connect machines on the internet, why don't we establish a similar standard for connecting data? An AIR unit is the fundamental, powerful construction unit for smart data. It knows its siblings, its parent, its type, its nexus and its associated AIR units (nodes). A tuple of such units can stand on its own, without a header, and it is completely meaningful because the context has already been defined.
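The following Python sketch illustrates what "addressable and self-aware" could look like. It assumes a two-level address (dimension, position) like the `233:1` references earlier, with position 0 denoting the type; the classes `AIR` and `AIRSpace` and the second instance `233:2` are hypothetical, and parent and nexus are left out for brevity. It is not the R3DM/S3DM implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class AIR:
    """Hypothetical AIR address (dimension, position); position 0 is the type."""
    dim: int
    pos: int

    def __str__(self):
        return f"{self.dim}:{self.pos}"

    @property
    def type(self):
        # The type of a unit lives at position 0 of the same dimension.
        return AIR(self.dim, 0)

    @property
    def is_type(self):
        return self.pos == 0


class AIRSpace:
    """Toy registry: a unit can reach its type, siblings and associated units."""

    def __init__(self):
        self.units = set()
        self.links = {}  # AIR -> set of AIR, stored in both directions

    def add(self, unit):
        self.units.add(unit)
        self.links.setdefault(unit, set())

    def link(self, a, b):
        self.add(a)
        self.add(b)
        self.links[a].add(b)
        self.links[b].add(a)  # bidirectional edge

    def siblings(self, unit):
        # Other instances in the same dimension (type units excluded).
        return sorted(u for u in self.units
                      if u.dim == unit.dim and not u.is_type and u != unit)

    def associated(self, unit):
        return sorted(self.links.get(unit, set()))


space = AIRSpace()
handle = AIR(233, 1)  # handle of the tuple instance 233:1
for ref in (AIR(85, 6), AIR(91, 2), AIR(34, 9)):
    space.link(handle, ref)
space.add(AIR(233, 2))  # hypothetical second Part instance

print(AIR(85, 6).type)                                 # 85:0
print([str(u) for u in space.associated(handle)])      # ['34:9', '85:6', '91:2']
print([str(u) for u in space.associated(AIR(85, 6))])  # ['233:1']
print([str(u) for u in space.siblings(handle)])        # ['233:2']
```

Since the address is just a pair of integers, it is hashable and totally ordered, which is what makes it cheap to index.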

Filtering instead of querying

Thanks to the uniform representation of everything with AIR units connected by bidirectional edges, there is no need to define a query language. Instead, you define powerful functional operations that filter and add data in an associative manner, in a fully typed environment. R3DM/S3DM supports types for database metadata, data sources, models, entity types, attribute types, items (instances), link types and value types. Again, everything is constructed with AIR units. Both bidirectional edges and a full type system based on primitives were key features of Metaweb's Freebase project and, later, of Google's Knowledge Graph.
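A minimal sketch of "filtering instead of querying", assuming a toy hypergraph in which each tuple handle is linked to its value units and every link is stored in both directions. The data (including the second instance `233:2`) and the helpers `link`, `of_type` and `associated` are hypothetical. A question becomes a chain of associative filter steps rather than a query string.

```python
from collections import defaultdict

# Toy hypergraph: each link is stored in both directions.
links = defaultdict(set)


def link(a, b):
    links[a].add(b)
    links[b].add(a)


# Hypothetical data: two Part instances (dimension 233) sharing colour 34:9.
for handle, refs in {
    (233, 1): [(85, 6), (91, 2), (34, 9)],
    (233, 2): [(85, 7), (91, 3), (34, 9)],
}.items():
    for ref in refs:
        link(handle, ref)


def of_type(dim):
    """Filter predicate: keep units that belong to dimension `dim`."""
    return lambda unit: unit[0] == dim


def associated(units, keep=lambda u: True):
    """One associative step: all neighbours of `units` that pass `keep`."""
    return {n for u in units for n in links[u] if keep(n)}


# "Which parts have colour 34:9, and what are their names?"
# value -> part handles -> name units, with no query language involved.
parts = associated({(34, 9)}, of_type(233))
names = associated(parts, of_type(91))
print(sorted(parts))  # [(233, 1), (233, 2)]
print(sorted(names))  # [(91, 2), (91, 3)]
```

The bidirectional edges are what let the first step start from a value and walk back to the tuples that use it.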

A solid theoretical background

The R3DM/S3DM data model is founded on the theory of semiosis. There have been attempts to connect RDF/OWL with Aristotle's triangle of reference/meaning, but in my opinion they fail to capture the essence of the abstraction mechanism in semiosis, namely the role played by the sign as the vehicle of communication between the signifier and the signified.

Summary

To summarize, the power of R3DM/S3DM lies in its Atomic Information Resource units, which are fully typed, addressable and dereferenceable, and in the formation of n-ary bidirectional associations.

Cross-References

R3DM Project Posts

2017

Associative Semiotic Hypergraph API in Mathematica for Next-Generation BI Systems
European Wolfram Technology Conference 19-20 June 2017 in Amsterdam
My talk at the European Wolfram Technology Conference 2017 on a new data modeling framework, R3DM/S3DM, implemented on top of the OrientDB graph database and coded in Wolfram Mathematica

The three dimensions of AI and a fourth one as the key to unlock them
Comments on a review of AI by John Launchbury, special assistant to DIRO, DARPA
Although there has been significant progress with first and second generation AI systems in reasoning, learning and perceiving, abstraction has not been part of the game. The mechanism of abstraction can unify these other three processes.

Associative Data Modelling Demystified: Part 6/6
R3DM/S3DM: Build Powerful, Meaningful, Cohesive Relationships Easily
Demonstration of a new data model framework that transforms OrientDB into a HyperGraph Database

Data Modelling Topologies of a Graph Database
Definition and Classification of Graph Databases into Three Categories
The associative data graph database model is still a heavy hitter, stacking up well against property graphs and triples/quadruples. Expect a comeback.

A Quick Guide on How to Prevail in the Graph Database Arena
A brief discussion on criteria to meet a differentiation strategy for graph databases
A swift introduction to the key factors that influence the performance and unification character of graph databases

Associative Data Modeling Demystified: Part 5/6
Qlik Associative Model
Qlik's competitive advantage over other BI tools is that it manages associations in memory at the engine level and not at the application level. Every data point in every field of a table is associated with every other data point anywhere in the entire schema.

2016

Associative Data Modeling Demystified: Part 4/6
Association in RDF Data Model
In this article we will see how to define an association in RDF and what the differences are with the other data models that we analyzed in previous posts of our series

Do you Understand Many-to-Many Relationships?
Associative entities are represented differently in various data models
It is 2016 and in my opinion the situation with associative entities has become darn confusing. Edges of a Property Graph data model are bidirectional but RDF links are unidirectional.

Associative Data Modeling Demystified: Part 3/6
Association in Property Graph Data Model
In this article, we continue our investigation with the Property Graph Data model. We discuss how a many-to-many relationship is represented and compare its structure in other data models

Associative Data Modeling Demystified: Part 2/6
Association in Topic Map Data Model
In this post, we demonstrate how Topic Map data model represents associations. In order to link the two, we continue with another SQL query from our relational database

Associative Data Modeling Demystified: Part 1/6
Relation, Relationship and Association
In this article, we introduce the concept of association from the perspective of Entity-Relationship (ER) data model and illustrate it with the modeling of a toy dataset

2015

Towards a New Data Modelling Architecture
Part 2: Atomic Information Resource (AIR)
We introduce the Atomic Information Resource (AIR) unit of R3DM conceptual framework

Towards a New Data Modelling Architecture
Part 1: Relational/ER Constructs in Wolfram Language
We start with terms and constructs that most of us are familiar with from the Relational and Entity-Relationship database management systems