In early 2009 an increasing number of non-relational DBMS started to appear on the scene. They were based on data models such as Column, Document, Key-Value, Graph and Multi-Model. At about the same time, big corporations in IT industry were concerned with data processing of large and complex data sets. Big data movement was a major driving force to re-examine whether current DBMS were sufficient to cope with the new requirements on data processing. More specifically the four V’s of Big Data, i.e. volume, variety, velocity and veracity, put an increasing demand in major companies and organizations. This had as a result a reinvestigation on what DBMS technology to apply. Many started realizing that physical representation of data on the disk or in memory and data modeling are key factors on how you build or integrate parts of your IT solution.

While most players in the IT sector adopted Graph or Document databases and Hadoop based solutions, Hadoop is an enabler of HBase column store, it went almost unnoticed that several new DBMS, AtomicDB previous database engine of X10SYS, and Sentences, based on associative technology appeared on the scene. We have introduced and discussed about the data modelling architecture and the atomic information resource unit (AIR) of AtomicDB. Similar technology has been the engine power of Qlikview, a very popular software in Business Intelligence and Analytics, since 1993. Perhaps it is less known to the reader that the association construct is a first class citizen in Topic Map semantic web standard and it is translated to (RDF), the other semantic web standard. In other posts of this series we can see how it is possible to implement Associative Technology in multi-model graph databases such as OrientDB, in object-relational DBMS such as Intersystems Cache and Oracle or build the engine for in-memory processing with Wolfram Mathematica. In this article, we introduce the concept of association from the perspective of Entity-Relationship (ER) data model and illustrate it with the modelling of a toy dataset.

Supplier-Part-Catalogue Dataset

Our dataset is a collection of three related database tables. Two of the tables, Supplier ( TSV, XLSX ) and Part ( TSV, XLSX ), represent and describe Entity Types, i.e. Relations. The third one, Catalogue ( TSV, XLSX ) is the associative entity, a.k.a. bridge table, junction table, join table, etc. The typical case where data for the bridge table is captured is from a business inventory where each item is recorded with its unit price and a quantity in stock. Such information is stored in our Catalogue table, together with a date the item record was modified and a flag that indicates inspection of the stock.

The ER Data Model

Quite often (ER) Data Model is confused with the Relational Data Model. The reader should become familiar with the following equivalent terminology.

Entity Type (Set) Relation (Table, Base relvar)
  Heading - ordered set of attributes
  Body - set of n-tuples, extension of a predicate
Entity (instance, member) Tuple (ordered set of attribute values)
Value Set Type name (data type)
Value Attribute value
Relationship set View (Result set or Derived relvar)
Relationship (association)  
Attribute (function) Free variable of a predicate
mapping of Entity Set or Attribute name of the relation (column)
Relationship Set to Value Set ordered pair of attribute name and type name

In this classic paper, “The Entity-Relationship Model - Toward a Unified View of Data” Peter Chen defines relationship as an association among entities. In the same paper entity and relationship are illustrated as hypergraphs. In particular, Fig. 2 at page 13 and Fig. 3 at page 14 show how an instance from an Entity/Relationship set is linked to either values or other entities. Although Peter Chen did not describe these fundamental constructs of his data model with Hypegraph terms, it can be easily seen in these captured figures that a specific entity or relationship form hyperedges with other entities or values that play the role of hypernodes.

(Emploee):Ei at Fig.2 and (PROJECT-WORKER):Ri at Fig.3 form hyperedges

Association and Associative Array

We extend the definition of association in the (ER) data model to cover both Entities and Relationships. In the aforementioned paper, Chen formally defines attribute as a function which maps a given entity to a single value (or a single tuple of values). He also mentions that relationships have their own attributes whose meaning depends on the associated entities. The (key,value) pair of an Associative array, a.k.a. map, symbol table, or dictionary, captures this functional operation of attribute.

Wolfram Language - Association

Wolfram Language is using Association fundamental construct to represent an associative array. The following three Wolfram Language associations represent a relationship of a Catalog instance with a Supplier instance and some Part instance from our toy dataset.

(* Supplier Association *)
<|supID->1081, supName->"Acme Widget Suppliers", supAddress->"1 Grub St., Potemkin Village, IL 61801", supCity->"ILLINOIS", supCountry->"USA", supStatus->10|>

(* Part Association *)
<|prtID->998, prtName->"Fire Hydrant Cap", prtColor->"Red", prtWeight->7.2, prtUnit->"gr"|>

(* Catalogue Association *)
<|supID->1081, prtID->998, catPrice->11.7, catQuantity->400, catDate->"Wed 10 Sep 2014", catCheck->True|>

What we are missing in this representation are the hyperedges, i.e. the nexuses that link together hypernodes. According to hypergraph terminology, the edge that connects a hyperedge to a hypernode is called hyperlink or connector. You may think of these hyperlinks/connectors as hash fields but we need another identifier for the hash key, i.e. an identifier for the instance. We could have used supID, prtID or a combination of these two but we prefer to define new identifiers for hyperedges. The main reason for this is that in principle we want to design a uniform way of addressing and identifying both hyperedge and hypernode instances at a low-dimensional, e.g. (2D,3D,4D), space. If we encode this way, it is possible to avoid namespace problems. In this post we use names to compare the graph with figures from Peter Chen’s paper.

Three Associations in a Named-Row Dataset

We may also connect hyperedges, e.g. we can add a “type” hyperlink that maps instances to classes, a “vendor” hyperlink that maps a Catalog instance to its Supplier instance and a “stockItem” hyperlink that connects Catalog instance to Part instance. In Wolfram Language it is relatively easy to make a graph of these Associations. We colorized the graph according to the different data types we have, i.e. hyperedges, strings, doubles, integers, dates, and booleans, Fig.4.

Fig.4 - Hyperedges are displayed in red, strings are in grey, doubles in lightblue, integers in green, dates in lightgreen and boolean in black,

JSON Serialization

Another very popular representation for Association is JSON Serialization. Those associations we described above can be serialized with the following format.

  "$PartI": {
    "type": "$Item",
    "prtID": 998,
    "prtName": "Fire Hydrant Cap",
    "prtColor": "Red",
    "prtWeight": 7.2,
    "prtUnit": "gr"
  "$SupplierJ": {
    "type": "$Organization",
    "supID": 1081,
    "supName": "Acme Widget Suppliers",
    "supAddress": "1 Grub St., Potemkin Village, IL 61801",
    "supCity": "ILLINOIS",
    "supCountry": "USA",
    "supStatus": 10
  "$CatalogK": {
    "type": "$Inventory",
    "supID": 1081,
    "prtID": 998,
    "catPrice": 11.7,
    "catQuantity": 400,
    "catDate": "2014-09-10",
    "catCheck": true

Catalogue, Parts and Suppliers tables are also available in JSON format.

Association vs Relation

Although the logical view of Chen’s (ER) data model and his popular diagrammatic technique incorporates Entities, Relationships, Attributes and Values, at the physical layer of any relational DBMS we find records, i.e. Entities or Relationships are ordered tuples. A set of these ordered tuples that have the same heading, is called body. This header is another ordered tuple with attribute names. Theoretically speaking the body and the header of a table make the Relation. In practise the header is separated from the body and located at the database catalog together with other metadata for the schema of the database. SQL standard specifies a uniform mean to access this catalog.

Wolfram Language - Relation

The List fundamental construct of Wolfram Language can be used to represent a Relation. An extensive analysis of various representations and transformations on the constructs of the Relational data model in Wolfram Language can be found here.

(* Part Relation with a header and three tuples (body) *)
  {"pid", "pname", "pcolor", "pweight", "punit"},
  {991, "Left Handed Bacon Stretcher Cover", "Red", 15.5, "lb"},
  {992, "Smoke Shifter End", "Black", 3.75, "lb"},
  {993, "Acme Widget Washer", "Red", 142.88, "kg"}

The reader will notice that in both Associative array and Relation representations, the cell of a table, i.e. the value of an attribute, is always functionally dependent on the description of Attribute and Entity/Relationship. It cannot stand on its own without these semantic references. Moreover, for the Relation representation, the exact order of values has to be maintained together with any missing (NULL) values.

Data Redundancy

Relational databases are usually normalized to eliminate duplication of information and ensure referential integrity. This is especially recommended in one-to-many relationships. Normalization involves arranging attributes in tables, nevertheless in a relational database we cannot escape from data redundancy completely. For example if we take the unit attribute of the normalized Part table above we can see that values are still repeated. Most important the schema of a relational database is too rigid and cannot be easily modified without breaking normalization and compromising referential integrity.

Structured Query Language (SQL)

Performing queries of data in a relational database is highly dependent on these two factors we briefly mentioned above, the physical representation of data on the disk and the normalization procedure. One of the main bottlenecks with SQL is the natural join relational operator, counterpart of logical AND. An SQL Join is a means for constructing a result set combining columns from two (or more) tables by using values common to each. The more tables you join and the bigger the size of each table the more time it takes to consolidate the data of the view. On the contrary in many graph databases, connected nodes, i.e. record instances, are directly linked to each other. Each node, contains a list of pointers of its edges, therefore avoiding look-ups. In fact you are not searching for records, you are traversing connections instead. This may dramatically reduce the speed of constructing the result set of a query.

For demonstration and comparison purposes we provide such a JOIN sql query for our toy database together with an ER diagram of the database schema,Fig.5 and the result set on a Microsoft Access table view, Fig.6.

You may also download a Microsoft Access, accdb file format, database to experiment with.

Fig.5 - Left image illustrates the ER diagram of database schema, Fig.6 - Right image displays a result set from a join operation.
SELECT  Catalog . catsid ,
        Catalog . catpid ,
        Suppliers . sname ,
        Parts . pname ,
        Catalog . catcost ,
        Catalog . catqnt ,
        Catalog . catdate ,
        Catalog . catchk
FROM    Suppliers
        INNER  JOIN  ( Parts
                    INNER  JOIN  [Catalog]
                            ON  Parts . pid  =  Catalog . catpid )
                ON  Suppliers . sid  =  Catalog . catsid
ORDER   BY  Suppliers . sname  DESC ,
           Catalog . catcost  DESC ;

One can immediately notice the complexity of SQL query language on the nested SQL Join constructs.


In this article we described several limitations of the ER model that we wish to overcome, in brief these are:

  • Functional dependence of values
  • Data redundancy
  • Join operations

In the next article of our series we continue with an international industry standard for information management and interchange, the Topic Maps Data Model (TMDM). Associations in TMDM are similar to tuples but they have types. Each member of an association plays a role that is defined explicitly. In fact this is in full agreement with Chen’s Entity-Relationship diagrams (see Fig.1 and Fig.2 above). Chen discusses the role of an entity in a relationship and the role of an attribute in a relation and he considers distinct constrains on allowable values for a value set and constraints on permitted values for a certain attribute.

TMDM view is edifying because it divides information space in two layers. At the conceptual level we have topics that can be associated and represent any subject that can be conceived by a human being. At the occurrence level we have addressable information resources that describe those subjects.

Click on the image to continue reading our next article on this series.


R3DM Project Posts


Intersystems Cache Python ORM
Intersystems Cache Object-Relational Mapper in Python 3
Intersystems Cache Object-Relational Mapper in Python 3

Relational Data Model: Back to the roots
Important design and implementation principle arising from studying relational data model theory
Important design and implementation principle arising from studying relational data model theory

Build valuable relations; establish effective communications
A post that explains our philosophy and goals behind HEALIS products
A post that explains our philosophy and goals behind HEALIS products

TRIADB at Connected Data London
My speech at Connected Data London conference and demos of TRIADB implementation on Intersystems Cache DBMS
My speech at Connected Data London conference and demos of TRIADB implementation on Intersystems Cache DBMS


The wizards of stored computer program and the next generation of programmers
The fundamental aspect that software pioneers have been missing when they invented new programming languages or new nosql databases
Have you noticed that what ever the model and data structure in databases we cannot escape from the fundamental principle of managing data allocation space with references, i.e. pointer based logic, memory addressing ?

Associative Semiotic Hypergraph API in Mathematica for Next-Generation BI Systems
European Wolfram Technology Conference 19-20 June 2017 in Amsterdam
My speech at European Wolfram Technology Conference 2017 about a new data modeling framework R3DM/S3DM that is implemented on top of OrientDB graph database and coded in Wolfram Mathematica

Are our old data model standards out of shape ?
An overview of critical points to consider when modeling with R3DM/S3DM
Both Topic Maps and RDF/OWL exhibit signs of aging. These signs do not indicate maturity levels but on the contrary they signal a re-examination of the data modeling, information representation problem

The three dimensions of AI and a fourth one as the key to unlock them
Comments on a review of AI by John Launchbury, special assistant to DIRO, DARPA
Although there has been significant progress with first and second generation AI systems in reasoning, learning and perceiving, abstraction has not been part of the game. The mechanism of abstraction can unify these other three processes.

Associative Data Modelling Demystified: Part 6/6
Demonstration of a new data model framework that transforms OrientDB into a HyperGraph Database
Demonstration of a new data model framework that transforms OrientDB into a HyperGraph Database

Data Modelling Topologies of a Graph Database
Definition and Classification of Graph Databases into Three Categories
The associative data graph database model is still a heavy hitter, stacking up well against property graphs and triples/quadruples. Expect a comeback.

A Quick Guide on How to Prevail in the Graph Database Arena
A brief discussion on criteria to meet a differentiation strategy for graph databases
A swift introduction to the key factors that influence the performance and unification character of graph databases

Associative Data Modeling Demystified: Part 5/6
Qlik Associative Model
Qlik's competitive advantage over other BI tools is that it manages associations in memory at the engine level and not at the application level. Every data point in every field of a table is associated with every other data point anywhere in the entire schema.


Associative Data Modeling Demystified: Part 4/6
Association in RDF Data Model
In this article we will see how we can define an association in RDF and what are the differences with other data models that we analyzed in previous posts of our series

Do you Understand Many-to-Many Relationships ?
Associative entities are represented differently in various data models
It is 2016 and in my opinion the situation with associative entities has become darn confusing. Edges of a Property Graph data model are bidirectional but RDF links are unidirectional.

Associative Data Modeling Demystified: Part 3/6
Association in Property Graph Data Model
In this article, we continue our investigation with the Property Graph Data model. We discuss how a many-to-many relationship is represented and compare its structure in other data models

Associative Data Modeling Demystified: Part 2/6
Association in Topic Map Data Model
In this post, we demonstrate how Topic Map data model represents associations. In order to link the two, we continue with another SQL query from our relational database

Associative Data Modeling Demystified: Part 1/6
Relation, Relationship and Association
In this article, we introduce the concept of association from the perspective of Entity-Relationship (ER) data model and illustrate it with the modeling of a toy dataset


Plerophoria vs Information
The ancient Greek origin of the word information
The ancient Greek origin of the word information

Towards a New Data Modelling Architecture
Part 2: Atomic Information Resource (AIR)
We introduce the Atomic Information Resource (AIR) unit of R3DM conceptual framework

Towards a New Data Modelling Architecture
Part1 - Relational/ER Constructs in Wolfram Language
We start with terms and constructs that most of us are familiar with from the Relational and Entity-Relationship database management systems


R3DM/S3DM Illustration and Formalization
Old wiki pages and LinkedIn posts ported from and
Old wiki pages and LinkedIn posts ported from and


Old wiki pages on R3DM ported from, examples of R3DM in TMDM
Old wiki pages on R3DM ported from, examples of R3DM in TMDM

R3DM Questions and Answers
Old wiki pages on R3DM ported from
Old wiki pages on R3DM ported from

URIs for Real-World Objects
The problem of information resources definition, representation and identification
The problem of information resources definition, representation and identification


Ignite Athens 2012 : From WWW To GGG
My first pitch at Ignite Athens on 20th of September 2012 about Neurorganon Upper Level Ontology (NULON)
My first pitch at Ignite Athens on 20th of September 2012 about Neurorganon Upper Level Ontology (NULON)