TRIADB

Associative Semiotic Hypergraph Database Framework

At A Glance

TRIADB is an innovative, multi-perspective database framework. It is a Python library that sits on top of suitable NoSQL/SQL data store engines and enables them to easily perform integration, correlation, aggregation and hypergraph exploration of multiple data resources. TRIADB is founded on the principles of R3DM/S3DM associative, semiotic, hypergraph technology pioneered by Dr. Athanassios I. Hatzis, the founder of HEALIS.

Architectural Design

In terms of architectural design, TRIADB is based on the R3DM/S3DM Associative Data Model. The foundational principles, theoretical formalization and ontological dimensions of the framework and the data model date back to 2012. R3DM/S3DM is a multi-perspective associative data model that shares certain similarities with Qlik associative technology, AtomicDB and X10SYS associative technology, the Sentences associative data model, the Topic Maps data model and the correlation database model. The main difference between R3DM/S3DM and these technologies is that it has a solid theoretical background and a unified data modeling architecture, while remaining distinct in its design and implementation. For more information download:

HEALIS white paper on TriaClick Architectural Overview

Latest Release

Four TRIADB prototypes have been implemented so far: one for OrientDB with Mathematica, one on top of Intersystems Cache with Python and Jupyter, one for Redis, and the latest one for MariaDB and ClickHouse. In the current release of TRIADB, named TriaClick, MariaDB stores the data dictionary information and ClickHouse storage engines are used for processing and querying data.

Install and Test a fully functional and working freeware demo version of TriaClick

A console-based API is implemented with TriaClick CQL (chain query language). Method chaining is a very popular development approach in Python object-relational mappers (ORMs). Like an ORM, CQL provides a high-level abstraction over the query language of the DBMS, e.g. ClickHouse SQL. This allows the developer to write Python code and use object methods with a fixed set of arguments, instead of writing complex SQL code, to create, read, update and delete data and schemas in the DBMS. CQL also makes it theoretically possible to switch an application between different DBMS, and to reuse the same queries in a different project by modifying only the methods’ arguments. And because objects, i.e. data model and data instances, are constructed dynamically from numerical key references that are stored in TriaClick databases, the “impedance mismatch” problem disappears.
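The method-chaining idea can be illustrated with a minimal sketch. This is not the TriaClick CQL API: the class `ChainQuery`, its methods `select`, `where`, `limit` and `to_sql`, the table name `parts` and the columns `prtColor` and `prtWeight` are all hypothetical names chosen for illustration. The sketch only shows how each method returns the same object so that calls can be chained and later compiled into a SQL string.

```python
class ChainQuery:
    """Hypothetical chain-query builder (illustrative only, not TriaClick CQL)."""

    def __init__(self, table):
        self._table = table
        self._columns = ["*"]
        self._filters = []
        self._limit = None

    def select(self, *columns):
        self._columns = list(columns) or ["*"]
        return self  # returning self is what makes chaining possible

    def where(self, condition):
        self._filters.append(condition)
        return self

    def limit(self, n):
        self._limit = int(n)
        return self

    def to_sql(self):
        # Compile the accumulated method calls into one SQL statement
        sql = f"SELECT {', '.join(self._columns)} FROM {self._table}"
        if self._filters:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self._filters)
        if self._limit is not None:
            sql += f" LIMIT {self._limit}"
        return sql


query = (
    ChainQuery("parts")
    .select("prtName", "prtColor")
    .where("prtWeight > 10")
    .where("prtColor = 'Red'")
    .limit(5)
)
print(query.to_sql())
# SELECT prtName, prtColor FROM parts WHERE (prtWeight > 10) AND (prtColor = 'Red') LIMIT 5
```

Because the query is described only through method arguments, the same chain could in principle be compiled for a different backend by swapping the `to_sql` step.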

TriaClick Logical Inference

Currently, user selections are implemented with the CQL Select operator. After executing a CQL filtering operation, the TriaClick engine immediately calculates all distinct values of each Attribute of an Entity that are relevant to the selection. The engine filters not only the Entity, i.e. the table; all the other attributes, i.e. columns, also filter themselves instantaneously based on that selection. This logical inference allows the TriaClick engine to show the user/developer not just which data is associated with the user’s selections but also which data is excluded by these value selections. And since our logical data model is a HyperGraph (see TriaClick HyperGraphs), every data point in the entire dataset is always associated with all other data points.
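The associated/excluded split described above can be sketched in plain Python on a tiny in-memory table. This is an assumption-laden illustration, not the TriaClick engine, which runs on ClickHouse: the function `associative_filter`, the sample rows and the attributes `prtColor` and `supCity` are hypothetical.

```python
# Hypothetical sample data: one entity (table) with three attributes (columns)
rows = [
    {"prtName": "Washer", "prtColor": "Red", "supCity": "Athens"},
    {"prtName": "Washer", "prtColor": "Silver", "supCity": "London"},
    {"prtName": "Bolt", "prtColor": "Red", "supCity": "London"},
]


def associative_filter(rows, attribute, selected_values):
    """For every attribute, split its distinct values into those still
    associated with the selection and those excluded by it."""
    selected = set(selected_values)
    kept = [r for r in rows if r[attribute] in selected]
    result = {}
    for attr in rows[0]:
        all_values = {r[attr] for r in rows}
        included = {r[attr] for r in kept}
        result[attr] = {"included": included, "excluded": all_values - included}
    return result


states = associative_filter(rows, "prtColor", ["Silver"])
for attr, split in states.items():
    print(attr, sorted(split["included"]), sorted(split["excluded"]))
# prtName ['Washer'] ['Bolt']
# prtColor ['Silver'] ['Red']
# supCity ['London'] ['Athens']
```

A single selection on `prtColor` therefore changes what is shown for every other attribute, including the values that drop out.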

The main difference from other associative engines is that interactive, free-form queries are processed on disk, thanks to the powerful columnar layout of ClickHouse. It is also possible to append or modify data without reloading all or part of the original data set.

Another key differentiating factor from other SQL-based systems is the ability to refine context. In other data models, entities and attributes are discrete and disconnected, and do not stay in context with one another. Filtering does not show the relationship or impact that a selection has on objects within an application. In the TriaClick associative, semiotic, hypergraph data model, the user’s iterative interactions perform a progressive query materialization (Fig. 1) that reflects, at each step, a new context for the items of a data subset, which are aware of their position and their neighbours.

Fig.1 - Incremental query in three stages of progressive associative filtering with iterative selections.

TriaClick State Machine

We can think of the TriaClick engine as a state machine for data sets. When we apply a selection and then ask to filter the data, the engine propagates that filter across the data model based on TriaClick hyperlinks. In the current implementation there are two different states for each distinct attribute value: the input state, i.e. whether the value is selected or not, and the output state, i.e. whether the value is associated with the selected values or excluded due to previously selected values (Fig. 2).
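A minimal sketch of the two states per distinct value follows. It assumes selections on different attributes are combined with AND and selections on the same attribute with OR; the source does not specify this. The names `ValueState` and `DataSetStateMachine` and the part name "Fire Hydrant Cap" are hypothetical, and the code does not reflect how TriaClick stores state internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueState:
    selected: bool  # input state: did the user pick this value?
    included: bool  # output state: is it associated with the selection?


class DataSetStateMachine:
    """Hypothetical in-memory stand-in for the engine's selection state."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._selections = {}  # attribute -> set of selected values

    def select(self, attribute, value):
        self._selections.setdefault(attribute, set()).add(value)
        return self

    def _matching_rows(self):
        # Assumption: AND across attributes, OR within one attribute
        return [
            r for r in self._rows
            if all(r[a] in vals for a, vals in self._selections.items())
        ]

    def states(self, attribute):
        matching = {r[attribute] for r in self._matching_rows()}
        chosen = self._selections.get(attribute, set())
        return {
            v: ValueState(selected=v in chosen, included=v in matching)
            for v in sorted({r[attribute] for r in self._rows})
        }


rows = [
    {"prtName": "Acme Widget Washer", "prtColor": "Red"},
    {"prtName": "Acme Widget Washer", "prtColor": "Silver"},
    {"prtName": "Fire Hydrant Cap", "prtColor": "Red"},
]
machine = DataSetStateMachine(rows).select("prtColor", "Silver")
for value, state in machine.states("prtName").items():
    print(value, state)
```

After selecting the colour "Silver", "Acme Widget Washer" is not selected but is included, while "Fire Hydrant Cap" is neither selected nor included.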

Fig.2 - Distinct values of prtName attribute, i.e. items of a HyperAtom collection.

Demonstration

In the following screen capture demo of TriaClick we show the execution of commands from two Python console applications that are built with the TriaClick library. The various operations (methods) of our Chain Query Language (CQL) aim to make the processing pipeline of data integration, exploratory data analysis and visualization easier, faster, more intuitive, more efficient and more accurate for the database expert or data analyst.

There is also a playlist of screen capture recordings from a previous implementation of TRIADB on top of the Intersystems Cache DBMS. In them we demonstrate the functional, interactive Python client API of the TRIADB system, with commands to add models, resources and records, queries to retrieve data, and hypergraph traversal operations. You may start with the last video of this series to get an overall impression of how TRIADB is used.

Screen Captures

TRIADB Hypergraph Neighbourhood

The first two images of our collection portray a hypergraph of seven records: two on the left (Parts), three in the center (Catalog Entries) and two on the right (Suppliers). The difference between them is that in the first image nodes are shown with values and in the second with 2D numerical references. Images are aligned so you can switch back and forth with the left and right arrow keys.

The third image shows the execution of TRIADB Python commands in a Jupyter notebook. This is a traversal query that fetches the tuples that make up the neighbourhood of the two parts that we saw in the first two images.
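The kind of neighbourhood traversal described here can be sketched with plain dictionaries. Everything in this example is assumed for illustration: the 2D references are modelled as `(model, instance)` integer pairs, each catalog entry is treated as a link between one part and one supplier, and the function names `neighbourhood` and `resolve` and all values are made up. It does not reproduce the TRIADB commands shown in the image.

```python
# Hypothetical records keyed by 2D numerical references (model id, instance id)
parts = {(1, 1): "Part A", (1, 2): "Part B"}
suppliers = {(2, 1): "Supplier X", (2, 2): "Supplier Y"}

# Assumption: each catalog entry ties one part to one supplier
catalog = {
    (3, 1): {"part": (1, 1), "supplier": (2, 1)},
    (3, 2): {"part": (1, 2), "supplier": (2, 1)},
    (3, 3): {"part": (1, 2), "supplier": (2, 2)},
}


def neighbourhood(part_refs):
    """Return (part, catalog entry, supplier) reference triples that
    touch any of the given parts."""
    wanted = set(part_refs)
    return [
        (links["part"], entry_ref, links["supplier"])
        for entry_ref, links in sorted(catalog.items())
        if links["part"] in wanted
    ]


def resolve(triple):
    """Swap the part and supplier references for their values."""
    part_ref, entry_ref, supplier_ref = triple
    return (parts[part_ref], entry_ref, suppliers[supplier_ref])


tuples = neighbourhood([(1, 1), (1, 2)])
for t in tuples:
    print(t, "->", resolve(t))
```

The traversal itself works only on references; values are looked up at the end, which mirrors the two views of the same hypergraph (values versus numerical references).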

In the fourth image two records are drawn that have common values for the weight and the name of the part. There is a red (partID=993) and a silver (partID=994) Acme Widget Washer, but they do not share a common unit for the weight. This cleansing problem is solved in the fifth image, where the two parts share three common values.

The sixth image visualizes the bipartite mapping solution. The fields on the left are mapped onto attributes on the right and the pandas 2D frame in the seventh image presents their names and key references.

Finally the last two images display a hypergraph data model for movies and a movie instance with many participants. Again you can switch back and forth with the left and right arrow keys.

Conference Presentations

Key Differentiating Factors

The following is a list of technical specifications and features in the design and implementation of TRIADB. This same list is what makes TRIADB a unique and valuable software product:

  1. Multi-Perspective Database Framework: tuples, domain sets, objects, hypergraph, hierarchical
  2. Acts as both an operational and a data warehouse database with a 360-degree view
  3. Automatic fixed, primary indexing schema instead of user-defined secondary indexing
  4. Manage the references instead of data: relying on reference-based associations and logical identifiers
  5. No duplicates: single value instance based on system defined primitive data types
  6. Consolidation of multiple data resources and mapping on user-defined data models
  7. Management of data resources, data models and metadata
  8. Python Chain Query Language (CQL) that avoids namespace and impedance mismatch problems
  9. Interactive, free-form, contextual queries
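Items 4 and 5 of the list (managing references instead of data, and keeping a single instance of each value) can be illustrated with a small sketch. The class `ValueStore` and its methods `intern` and `resolve` are hypothetical names, and integer positions stand in for TRIADB's key references, whose actual format is not described here.

```python
class ValueStore:
    """Hypothetical store that keeps one instance of each distinct value
    and hands out an integer reference for it."""

    def __init__(self):
        self._ref_by_value = {}
        self._value_by_ref = []

    def intern(self, value):
        ref = self._ref_by_value.get(value)
        if ref is None:
            # First time we see this value: store it once, assign a reference
            ref = len(self._value_by_ref)
            self._ref_by_value[value] = ref
            self._value_by_ref.append(value)
        return ref

    def resolve(self, ref):
        return self._value_by_ref[ref]

    def __len__(self):
        return len(self._value_by_ref)


store = ValueStore()
records = [
    ("Acme Widget Washer", "Red"),
    ("Acme Widget Washer", "Silver"),
]
# Records hold references only; the shared part name is stored once
encoded = [tuple(store.intern(v) for v in rec) for rec in records]
print(encoded)     # [(0, 1), (0, 2)]
print(len(store))  # 3
```

Two records that share a value end up pointing at the same reference, so an association between them can be found by comparing references rather than the data itself.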

Business Strategy

We create strong partnerships with database vendors to implement and fine-tune TRIADB on top of their technology stack and we offer consulting services on how to apply Associative, Semiotic, Hypergraph technology. We are not selling licenses or software, we provide full stack solutions and add-on value for our own clients, or for the clients of our vendors. And speaking about scaling, availability and performance, we scale together with our vendor. We inherit the performance, availability and TCO of our vendor.

Part of our business plan is to open-source our associative, semiotic, hypergraph technology. We are seeking consensus from the developer community on its use, and we strongly believe that the TRIADB framework will eventually be adopted by major semantic and database technology players in the IT industry.

We embrace competition from full stack BI software, or from semantic technology platforms. We are not limited in the options we have. We will prove in practice that we are able to provide better solutions to our customers.

Next Milestone

We are aiming to reach a TRIADB minimum viable product with one or more pilot studies based on real client problems. Each new release will be available to download and test for free. Stay tuned.

Acknowledgement

We would like to thank InterSystems for providing us with a license of the Caché DBMS to test TRIADB. We would also like to acknowledge the valuable assistance of the ClickHouse engineering team and the help of other developers on GitHub in resolving bugs and answering questions.

Build valuable relations; establish effective communications

© HEALIS - Athanassios I. Hatzis, March 2018
