Where is the gap in data modeling?

I was motivated to write this post by Christian Kaul's article “Bridging the Knowledge Gap”. He asks what the best way is to bridge the knowledge gap between data modeling experts and people from other fields. But I think an important role of data modeling experts is exactly to bridge the gap between purely technical IT people (developers, database administrators, data engineers) and people from other fields, e.g. domain experts, scientists and business people. A more interesting question, in my opinion, is how this gap is created. Be prepared, because I will answer with more questions:

  • Do we all agree that the best way to model data/information is to create a graph?
  • If yes, the next question is how exactly you represent that graph. As you probably know, there are four major competing data models (the relational/ER model, the labeled property graph, RDF and Topic Maps). Are you convinced that there can be no better alternative?
  • What about the building blocks? For example, in RDF you have triples, and in the relational model you have relations (tables) and tuples. Is that all? Can we make better building blocks to connect data or information?
  • And last but not least, at the physical layer, which I believe is exactly where the gap lies between pure data modelers/architects and database engineers/developers, you have row, column, document and native graph structures. Are you totally convinced that the best way to physically implement a graph data/information model is with a native graph structure on permanent storage?
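To make the questions about representation and building blocks a bit more concrete, here is a minimal sketch of one tiny set of facts encoded three ways: as RDF-style triples, as relational tuples, and as a labeled property graph. The names (alice, acme, worksAt) and the plain Python structures are assumptions made purely for illustration; they are not taken from any of the models' official syntaxes or from any particular product.

```python
# One small set of facts, encoded three ways (illustrative only).

# 1. RDF-style: a set of (subject, predicate, object) triples.
triples = {
    ("alice", "type", "Person"),
    ("acme", "type", "Company"),
    ("alice", "worksAt", "acme"),
}

# 2. Relational: relations (tables) holding tuples; the link is a foreign key pair.
persons = [("alice",)]
companies = [("acme",)]
employment = [("alice", "acme")]  # (person_id, company_id)

# 3. Labeled property graph: labeled nodes plus typed edges.
nodes = {"alice": {"label": "Person"}, "acme": {"label": "Company"}}
edges = [("alice", "worksAt", "acme")]


def triples_from_relational(persons, companies, employment):
    """Flatten the three hypothetical tables into triples."""
    out = {(p, "type", "Person") for (p,) in persons}
    out |= {(c, "type", "Company") for (c,) in companies}
    out |= {(p, "worksAt", c) for (p, c) in employment}
    return out


def triples_from_property_graph(nodes, edges):
    """Flatten node labels and edges into triples."""
    out = {(n, "type", props["label"]) for n, props in nodes.items()}
    out |= set(edges)
    return out


# All three encodings describe the same underlying graph.
same_graph = (
    triples
    == triples_from_relational(persons, companies, employment)
    == triples_from_property_graph(nodes, edges)
)
print(same_graph)
```

In this toy case the three encodings carry the same information and differ only in their building blocks, which is why the choice between them, and the choice of how to store them physically, remains an open question rather than a settled one.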

If you ask me to answer these questions, I suggest you read more about the S3DM/R3DM conceptual, computational semiotics framework and the related projects HyperMorph and TRIADB. This is what I have been involved with for the last ten years.

The nice thing about social media is that if a post like this one earns good feedback from professionals and experts in the field, you might decide to extend it. That is exactly what I did in the following few paragraphs, to respond better to the comments on my post.

Business vs Technological Factors

I believe the resurgence of the NoSQL movement (2009), which is closely related to data modeling, grew out of the emerging need of companies and users to deal with the velocity, volume, variety and other properties of distributed data resources. That same movement pushed our technological boundaries forward to create better DBMSs and better BI, web and desktop applications.

Personally speaking, I joined that movement because of my business need to manage complex medical databases. Another reason was to enrich and manage my own data resources collectively and efficiently, i.e. personal information management. Believe it or not, I am still not satisfied with what exists out there, and that is why I ended up researching and developing my own solution.

Another critical factor is purely the cost of the solution you apply to a complex data/information management problem. That is also why open-source systems are becoming more and more popular nowadays. We need bulletproof open-source tools with strong communities behind them to support, develop and test them. Consider also that in many cases open source is an escape from high-cost cloud solutions and the vendor lock-in that comes with them.

Eventually, somehow, sometime, something similar to the Linux OS will be created for information management at a higher level: something that earns the consensus of developers, end users and companies, emerging out of the practical need for a new data modeling standard, an effective physical layer implementation, highly responsive graphical user interfaces and efficient augmented artificial intelligence tools.


Athanassios I. Hatzis, PhD
Software Engineer - Researcher, Founder/Independent Contractor