Are our old data model standards out of shape?

An overview of critical points to consider when modeling with R3DM/S3DM

2017-07-09 5 min read Semantic_Web

Project

Introduction

Both Topic Maps and RDF/OWL show signs of aging. In my opinion, these signs do not indicate maturity; on the contrary, they signal that the problem of data modeling and information representation needs to be re-examined. There is an emerging need to unify serialization formats (XML, JSON, etc.), (graph) DBMS data model standards and Semantic Web data models, and to exchange transformations between them.

Hence the subject of my talk at the European Wolfram Technology Conference 2017: R3DM/S3DM, a new data modeling framework implemented on top of the OrientDB graph database and coded in Wolfram Mathematica.

Comparison with other data model standards

These are a few critical points to consider when you compare this data model with Topic Maps and RDF/OWL:

Namespace problem

Both RDF/OWL and Topic Maps suffer from namespace problems and complexity. In Topic Maps, for example, when you define an association, i.e. an n-ary relationship, you must specify at least its type and its roles. For example:

Part(08:pid, "Acme Widget Washer":pname, white:pcolor)

But in this representation there is no handle for the association instance, and the roles must always be present to give meaning to the values. Things become even more complicated with an RDF association (type or instance), where everything has to be broken down into triples with labeled, unidirectional edges.

(Prt1 --pid--> 08, Prt1 --pname--> "Acme Widget Washer", Prt1 --pcolor--> white)
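To make the contrast concrete, the following minimal Python sketch breaks the `Part` association into RDF-style triples. It is an illustration only, not part of R3DM/S3DM: the helper `to_triples` is a hypothetical name, and values are kept as plain strings rather than typed RDF literals.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def to_triples(subject: str, roles: Dict[str, str]) -> List[Triple]:
    """Break an n-ary association into (subject, predicate, object) triples.

    Each role name becomes a predicate, so the subject is repeated once per value.
    """
    return [(subject, predicate, value) for predicate, value in roles.items()]


# The Part association from the text, keyed by role name.
part = {"pid": "08", "pname": "Acme Widget Washer", "pcolor": "white"}

triples = to_triples("Prt1", part)
for s, p, o in triples:
    print(f"{s} --{p}--> {o}")
```

Three values produce three triples, each repeating `Prt1` and each needing its predicate to say what the value means.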

The predicate of the RDF triple causes more harm than good. Any SPARQL traversal depends heavily on these predicates, and in practice, large collaborative knowledge bases such as Freebase labeled both directions of a relationship to make traversal easier. Consider also that owl:sameAs adds further complexity to the graph and its traversal.
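The traversal problem can be illustrated with a small sketch. With forward-labeled edges only, answering "which subjects point to this object?" means scanning every triple; the workaround described above is to store a reverse edge as well. The `reverse_` prefix is a hypothetical naming convention chosen for illustration, not Freebase's actual schema, and the second part (`Prt2`, "Acme Bolt") is made-up data.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

triples: List[Triple] = [
    ("Prt1", "pcolor", "white"),
    ("Prt2", "pcolor", "white"),
    ("Prt2", "pname", "Acme Bolt"),
]


def incoming_by_scan(obj: str, triples: List[Triple]) -> Set[str]:
    """Forward edges only: every triple must be inspected for each lookup."""
    return {s for s, _, o in triples if o == obj}


def add_reverse_edges(triples: List[Triple]) -> List[Triple]:
    """Label both directions by adding one inverse triple per forward triple."""
    return triples + [(o, "reverse_" + p, s) for s, p, o in triples]


def build_index(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index outgoing edges by subject, so a node's neighbours are one lookup away."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


index = build_index(add_reverse_edges(triples))
print(sorted(incoming_by_scan("white", triples)))  # ['Prt1', 'Prt2']
print(index["white"])
```

The price of the reverse edges is that every relationship is stored twice, and every predicate needs a second, inverse name.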

Now compare these with the simplicity of the Entity-Relationship model. In database terminology, an association (relation) has a header, and its body contains tuples.

tuple type: (pid, pname, pcolor)
tuple instance: (08, "Acme Widget Washer", white)
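The header/body split can be sketched in a few lines of Python. The header is stored once, and each tuple in the body is an ordered sequence whose values get their meaning from their position. The names `header`, `body` and `as_record` are illustrative only.

```python
from typing import Dict, List, Tuple

# Header: the attribute names, stored once for the whole relation.
header: Tuple[str, ...] = ("pid", "pname", "pcolor")

# Body: ordered tuples; position i of every tuple belongs to header[i].
body: List[Tuple[str, ...]] = [
    ("08", "Acme Widget Washer", "white"),
]


def as_record(header: Tuple[str, ...], row: Tuple[str, ...]) -> Dict[str, str]:
    """Pair each value with its attribute name by position."""
    if len(header) != len(row):
        raise ValueError("row arity does not match the header")
    return dict(zip(header, row))


print(as_record(header, body[0]))
# {'pid': '08', 'pname': 'Acme Widget Washer', 'pcolor': 'white'}
```

On its own, the tuple `("08", "Acme Widget Washer", "white")` carries no attribute names; reorder its values and the record silently changes meaning.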

Is there an alternative representation that combines these? Yes: you give types and instances, entities and attributes, data values and data types a uniform numerical representation, and you bind everything together in a hypergraph space.

tuple type: 233:0{85:0, 91:0, 34:0}
tuple instance: 233:1[85:6, 91:2, 34:9]
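The numerical form can also be sketched in Python. This rests on an assumed reading of the notation, not on a documented specification: each reference `id:n` names a type by `id`, with `n = 0` standing for the type itself and `n > 0` for an instance; braces mark the tuple type and brackets the tuple instance. The first reference (`233`) is treated as the tuple's handle and the rest as an unordered set of members. `Ref` and `parse_tuple` are hypothetical names, not the R3DM/S3DM API, which is implemented in Wolfram Mathematica.

```python
import re
from typing import FrozenSet, NamedTuple, Tuple


class Ref(NamedTuple):
    """A numerical reference 'id:n' (assumed reading: n == 0 is the type, n > 0 an instance)."""
    type_id: int
    index: int

    @property
    def is_type(self) -> bool:
        return self.index == 0


# Matches each 'id:n' pair, e.g. '233:1' or '85:6'.
REF_PATTERN = re.compile(r"(\d+):(\d+)")


def parse_tuple(text: str) -> Tuple[Ref, FrozenSet[Ref]]:
    """Split '233:1[85:6, 91:2, 34:9]' into a handle and an unordered set of members."""
    refs = [Ref(int(a), int(b)) for a, b in REF_PATTERN.findall(text)]
    return refs[0], frozenset(refs[1:])


type_handle, type_members = parse_tuple("233:0{85:0, 91:0, 34:0}")
inst_handle, inst_members = parse_tuple("233:1[85:6, 91:2, 34:9]")

# Member order does not matter: a shuffled instance parses to the same set.
_, shuffled = parse_tuple("233:1[34:9, 85:6, 91:2]")
assert shuffled == inst_members

# Each instance member refers to an attribute declared in the tuple type.
assert {r.type_id for r in inst_members} == {r.type_id for r in type_members}

print(inst_handle, inst_handle.is_type)  # Ref(type_id=233, index=1) False
```

Under this reading, every member carries its own attribute reference, so the tuple needs neither a positional header nor a repeated subject.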

The relational model's vocabulary (the header) allows ordered tuples of values (the body), whereas the numerical reference vectors of the R3DM/S3DM model allow unordered tuples, with a handle that represents each tuple instance. In RDF, to represent a tuple you have to break it down into triples that repeat the subject, and each value (object) must be accompanied by its predicate to carry meaning. Thus the R3DM/S3DM associative representation with numerical references is simpler, and it proves to be more efficient for indexing too!
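One way to see why numerical references suit indexing: every tuple member is a small integer pair, so an inverted index from member to tuple handles is just a dictionary of sets. The sketch assumes the same `id:n` reading of the notation as before (index 0 = type, above 0 = instance); the extra instances `233:2` and `233:3` are made-up data, and no performance figures are implied.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set, Tuple

Ref = Tuple[int, int]  # (type id, index); index 0 = type, > 0 = instance (assumed)

# Tuple instances keyed by their handle; members are unordered sets of refs.
tuples: Dict[Ref, FrozenSet[Ref]] = {
    (233, 1): frozenset({(85, 6), (91, 2), (34, 9)}),
    (233, 2): frozenset({(85, 7), (91, 3), (34, 9)}),
    (233, 3): frozenset({(85, 8), (91, 2), (34, 1)}),
}


def build_inverted_index(tuples: Dict[Ref, FrozenSet[Ref]]) -> Dict[Ref, Set[Ref]]:
    """Map each member ref to the handles of the tuples that contain it."""
    index: Dict[Ref, Set[Ref]] = defaultdict(set)
    for handle, members in tuples.items():
        for ref in members:
            index[ref].add(handle)
    return index


index = build_inverted_index(tuples)

# Which tuples contain value 9 of attribute 34 AND value 2 of attribute 91?
hits = index[(34, 9)] & index[(91, 2)]
print(sorted(hits))  # [(233, 1)]
```

Because each hit is a handle, the whole tuple instance is recovered with one dictionary lookup (`tuples[(233, 1)]`).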

Separate abstraction layers

It is important to separate digital information resources (e.g. web pages, files, folders, audio/video recordings, images, text documents) from real things (e.g. humans, organizations, objects). It is also important to distinguish between a flexible model and its instances. But it is equally, if not more, important to separate abstract concepts from data values (numbers, strings, bits, etc.), because the former are the vehicle of human thinking and the latter are what computers process. This gap has to be bridged somehow. R3DM/S3DM bridges it with an extra abstraction layer in which everything is connected through Atomic Information Resource (AIR) units. The AIR unit also defines the level of granularity: instead of building everything with Topics, you build it with AIR units.

Granularity with AIR units

The AIR unit has the advantage that it can be indexed easily: it is represented by a numerical vector, an address that pinpoints the exact location of an Entity-Attribute-Value item. This is similar to the IPv4 address of a machine (e.g. domain, network, server, node/device/machine). So my question is: if we use such addresses to connect machines on the internet, why don't we establish a similar standard for connecting data? The AIR unit is the fundamental building block for smart data. It knows its siblings, its parent, its type, its nexus and its associated AIR units (nodes). A tuple of such units can stand on its own, without a header, and is completely meaningful because its context has already been defined.
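The addressing idea can be sketched as a dotted integer vector, in the spirit of the IPv4 analogy. The four-part layout `(dbase, model, container, item)`, the convention that item 0 denotes the type, and the names `AirAddress`, `siblings` and `dereference` are all assumptions made for illustration; the actual R3DM/S3DM vector layout may differ, and the nexus and associated units are not modeled here.

```python
from typing import Dict, List, NamedTuple, Optional


class AirAddress(NamedTuple):
    """Hypothetical AIR address, written dotted like an IPv4 address."""
    dbase: int      # database
    model: int      # data model within the database
    container: int  # entity or attribute set (assumed)
    item: int       # 0 = the container's type, > 0 = an instance (assumed)

    def __str__(self) -> str:
        return ".".join(str(part) for part in self)

    def type_address(self) -> "AirAddress":
        """Address of this unit's type: same container, item 0."""
        return self._replace(item=0)


def siblings(addr: AirAddress, registry: Dict[AirAddress, str]) -> List[AirAddress]:
    """Other instances that share the first three components of the address."""
    return sorted(a for a in registry if a[:3] == addr[:3] and a.item != 0 and a != addr)


def dereference(addr: AirAddress, registry: Dict[AirAddress, str]) -> Optional[str]:
    """Resolve an address to the value stored at it, or None if it is unknown."""
    return registry.get(addr)


# Illustrative contents; the numbers carry no meaning beyond this example.
registry: Dict[AirAddress, str] = {
    AirAddress(1, 2, 91, 0): "pname (attribute type)",
    AirAddress(1, 2, 91, 1): "Acme Widget Washer",
    AirAddress(1, 2, 91, 2): "Acme Bolt",
}

washer = AirAddress(1, 2, 91, 1)
print(washer)                                        # 1.2.91.1
print(dereference(washer.type_address(), registry))  # pname (attribute type)
print(siblings(washer, registry))
```

Type and siblings are derived from the address alone, without consulting a schema, which is the property the IPv4 comparison points at.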

Filtering instead of querying

Because everything is represented uniformly with AIR units connected by bidirectional edges, there is no need to define a query language. Instead, you define powerful functional operations that filter and add data associatively, in a fully typed environment. R3DM/S3DM supports types for database metadata, data sources, models, entity types, attribute types, items (instances), link types and value types, and again, everything is constructed from AIR units. Both bidirectional edges and a full type system built on primitives were key features of Metaweb's Freebase project and, later, of Google's Knowledge Graph.
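A minimal sketch of "filtering instead of querying": links are stored at both ends, and selection is done by composing small predicate functions with Python's built-in `filter` rather than by writing query text. `BiGraph`, `of_kind`, `linked_to`, `both` and `select` are hypothetical names, and the `kind` strings are a crude stand-in for the full R3DM/S3DM type system.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set

Node = str
Pred = Callable[[Node], bool]


class BiGraph:
    """Every link is stored at both ends, so it can be followed in either direction."""

    def __init__(self) -> None:
        self.adj: Dict[Node, Set[Node]] = defaultdict(set)
        self.kind: Dict[Node, str] = {}  # a stand-in for a real type system

    def add(self, node: Node, kind: str) -> None:
        self.kind[node] = kind

    def link(self, a: Node, b: Node) -> None:
        self.adj[a].add(b)
        self.adj[b].add(a)

    def neighbours(self, node: Node) -> Set[Node]:
        return set(self.adj.get(node, set()))


# Small composable predicates replace the clauses of a query language.
def of_kind(g: BiGraph, kind: str) -> Pred:
    return lambda n: g.kind.get(n) == kind


def linked_to(g: BiGraph, target: Node) -> Pred:
    return lambda n: target in g.neighbours(n)


def both(*preds: Pred) -> Pred:
    return lambda n: all(p(n) for p in preds)


def select(nodes: Iterable[Node], pred: Pred) -> Set[Node]:
    return set(filter(pred, nodes))


g = BiGraph()
for part in ("Prt1", "Prt2", "Prt3"):
    g.add(part, "item")
for colour in ("white", "black"):
    g.add(colour, "value")
g.link("Prt1", "white")
g.link("Prt2", "black")
g.link("Prt3", "white")

items_in_white = select(g.kind, both(of_kind(g, "item"), linked_to(g, "white")))
print(sorted(items_in_white))         # ['Prt1', 'Prt3']
print(sorted(g.neighbours("white")))  # ['Prt1', 'Prt3']
```

The filter scans the candidate nodes, while the last line reaches the same answer from the value side in a single lookup; that second path is what the bidirectional edges make possible.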

A solid theoretical background

The R3DM/S3DM data model is founded on the theory of semiosis. There have been attempts to connect RDF/OWL with Aristotle's triangle of reference (meaning), but in my opinion they fail to capture the essence of the abstraction mechanism in semiosis: the role played by the sign as the vehicle of communication between the signifier and the signified.

Summary

To summarize, the power of R3DM/S3DM lies in its Atomic Information Resource units, which are fully typed, addressable and dereferenceable, and in its formation of n-ary bidirectional associations.

Cross-References

Abstraction (software engineering), Information, Sign (semiotics), Triangle of reference, Signified and signifier, Associative model of data, Data model, Semantics, Database, Database management system, Artificial intelligence, Type system, Serialization, Filter (higher-order function), Granularity, Namespace