Table of Contents
Why HyperMorph
Not just another ETL tool
There are many open-source Python tools for ETL or ELT processes: petl, bubbles, mara-pipelines, mara-schema, bonobo, luigi, odo, etlalchemy, mETL, riko, carry, locopy, etlpy, pygrametl. The authors of many of these tools realized that Python developers need a uniform interface based on object-oriented abstractions for commonly used operations.
Well-designed powerful OOP classes
HyperMorph offers interactive console programming and development with high-level OOP components tailored to cover all aspects of database management and analytics. HyperMorph is very rich in this respect and provides `DataSet`, `Table` and `Field` classes for data resources; `DataModel`, `Entity` and `Attribute` classes for data models; `SchemaGraph`, `SchemaNode`, `SchemaLink` and `SchemaPipe` classes for metadata management; `DataGraph`, `DataNode`, `DataLink` and `DataPipe` classes for data management; a `Connector` class for Python drivers/clients; and, at the highest level of management, `ASET` (Associative Entity Set, similar to a Relation) and `HACOL` (HyperAtom collection).
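For orientation, the two parallel hierarchies, data resources (`DataSet` → `Table` → `Field`) and data models (`DataModel` → `Entity` → `Attribute`), can be pictured as nested containers. The sketch below uses plain dataclasses and is purely illustrative; it is not the HyperMorph API.

```python
# Purely illustrative mock-up of the two hierarchies; not the HyperMorph API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Field:                      # a column of a data resource
    name: str

@dataclass
class Table:                      # a table/file within a data resource
    name: str
    fields: List[Field] = field(default_factory=list)

@dataclass
class DataSet:                    # a data resource, e.g. a database or a folder of files
    name: str
    tables: List[Table] = field(default_factory=list)

@dataclass
class Attribute:                  # a property of a modelled business concept
    name: str

@dataclass
class Entity:                     # a business concept in a user-defined data model
    name: str
    attributes: List[Attribute] = field(default_factory=list)

@dataclass
class DataModel:                  # the model that data resources are mapped onto
    name: str
    entities: List[Entity] = field(default_factory=list)
```

Mapping a `DataSet` onto a `DataModel` (Table→Entity, Field→Attribute) is the consolidation step discussed in the next section.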
Schema and data as objects and nodes on a hypergraph
HyperMorph goes one step further than the OOP design principle. It creates objects with 3D numerical vector identities and links them as nodes on a hypergraph. That graph is powered by graph-tool, one of the best and fastest network-analysis tools in Python. HyperMorph keeps schema information, i.e. metadata, separate from structured data (tuples, hierarchical, graph, table, etc.). This unique feature makes it easy to organize data resources and to build complex, customised data models in order to digest data. Data integration (consolidation) requires managing the complexity of mapping data resources onto a data model, something that can be done easily when our objects are hypergraph-enabled and have numerical key vectors that identify their exact location in the schema and data graphs.
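As a minimal sketch of the idea, the snippet below uses graph-tool to store schema objects as vertices that carry a 3D numerical key as a vertex property. The particular key layout shown is an assumption for illustration, not HyperMorph's actual encoding.

```python
# Minimal sketch with graph-tool: schema objects become vertices carrying a
# 3D numerical key. The [datamodel, entity, attribute] layout of the key is
# an assumption for illustration only.
from graph_tool import Graph

g = Graph(directed=True)
nkey = g.new_vertex_property("vector<int64_t>")   # 3D numerical identity
ntype = g.new_vertex_property("string")           # kind of schema object

model = g.add_vertex();  nkey[model] = [1, 0, 0];  ntype[model] = "DataModel"
entity = g.add_vertex(); nkey[entity] = [1, 1, 0]; ntype[entity] = "Entity"
attr = g.add_vertex();   nkey[attr] = [1, 1, 1];   ntype[attr] = "Attribute"

g.add_edge(model, entity)   # DataModel contains Entity
g.add_edge(entity, attr)    # Entity has Attribute

# Internalize the property maps so they persist with the graph.
g.vertex_properties["nkey"] = nkey
g.vertex_properties["ntype"] = ntype

for v in g.vertices():
    print(list(nkey[v]), ntype[v])
```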
HyperMorph Connectors
Another fundamental difference between HyperMorph and ETL tools is on the Python DB driver/adapter side. The current release supports:
- Clickhouse-Driver
- MySQL-Connector
- SQLAlchemy with the following three dialects:
  - pymysql
  - clickhouse
  - sqlite
On top of these drivers HyperMorph uses a `Connector` class to abstract and unify SQL command execution (`sql()`) in a functional way and to wrap commands that extract metadata (`get_tables_metadata()`, `get_columns_metadata()`). Transformation to tuples, JSON rows, columns, and PyArrow batches/tables takes place at this level, and here performance is a critical factor. In our design and implementation of HyperMorph connectors we seek to minimise latency and maximise data-transfer speed; therefore the communication protocol used by the Python database driver/adapter is highly important.
Pipelines
This is a standard approach in ETL frameworks and a very useful one, because pipelines are in general flexible and intuitive to program. HyperMorph is no exception, but we have tried to make a difference here by designing the same pipeline operators for fetching either data or metadata. For example, there is an `over()` operator for projection and a `to_dataframe()` operator for transformation to a pandas DataFrame. We have even wrapped functional commands on pipelines so that you can choose between an OOP (chaining) and a functional style of programming.
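To illustrate the chaining style, the toy pipe below implements an `over()` projection and a `to_dataframe()` terminal operator. It only mimics the idea in miniature and is not HyperMorph's SchemaPipe/DataPipe implementation.

```python
# Toy pipe illustrating the chaining style; not the HyperMorph implementation.
import pandas as pd

class ToyPipe:
    def __init__(self, records, columns):
        self._records = records            # rows fetched from a connector
        self._columns = columns

    def over(self, *names):
        # Projection: keep only the selected columns.
        idx = [self._columns.index(n) for n in names]
        rows = [tuple(rec[i] for i in idx) for rec in self._records]
        return ToyPipe(rows, list(names))

    def to_dataframe(self):
        # Terminal operator: materialise the pipe as a pandas DataFrame.
        return pd.DataFrame(self._records, columns=self._columns)

rows = [(1, "Alice", 34), (2, "Bob", 27)]
df = ToyPipe(rows, ["id", "name", "age"]).over("name", "age").to_dataframe()
print(df)
```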
Not only a data storage, transformation and analytics tool
There is another category of tools related to data storage (in-memory, on-disk), transformations and analytics processing, such as TileDB, datatable, pandas, petl, vaex, pytables, ibis, numpy, dask, pyarrow and gandiva. Most of them construct a table data structure in memory or on disk and use either a column layout or a row layout to process the data; hence they resemble database engines. In fact, previous prototypes of HyperMorph (see the TRIADB project) were based on SQL database engines. This time the current, first release of HyperMorph is powered by PyArrow. There are many reasons for that choice. Most importantly, PyArrow is mature and provides a columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware, including GPUs. But for HyperMorph the killer feature of the PyArrow package is dictionary encoding, which is utilized to implement associative filtering, part of our associative semiotic hypergraph technology, in the style of the Qlik analytics engine.
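The snippet below sketches the dictionary-encoding idea using only the public PyArrow API: a selection reduces to an integer comparison on dictionary indices, and the resulting mask drives the filtered state of the other columns. HyperMorph's actual associative-filtering machinery is of course more involved.

```python
import pyarrow as pa
import pyarrow.compute as pc

# A column is dictionary-encoded: distinct values are stored once in a
# dictionary and the column itself becomes an array of small integer indices.
city = pa.array(["Athens", "Berlin", "Athens", "Lisbon", "Berlin"]).dictionary_encode()
print(city.dictionary)    # ["Athens", "Berlin", "Lisbon"]
print(city.indices)       # [0, 1, 0, 2, 1]

# Selecting "Berlin" reduces to an integer comparison on the indices ...
berlin = pc.index(city.dictionary, "Berlin").as_py()               # -> 1
mask = pc.equal(city.indices, pa.scalar(berlin, city.indices.type))

# ... and the boolean mask drives the filtered ("possible") state of every
# other column, in the associative style of the Qlik engine.
sales = pa.array([10, 20, 30, 40, 50])
print(sales.filter(mask))  # [20, 50]
```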
More promising than data virtualization and cloud analytics services
In recent years there has also been another approach to data management and analytics that aims to skip the weary ETL process. Usually these are SaaS products on the cloud, such as panoply, dremio, knowi, denodo, and many others. They provide GUIs and act as middleware between DBMSs and BI platforms. Naturally these are proprietary products, and details of how they work under the hood are hidden. Developers and power users have to stick with menu- and widget-driven interfaces rather than having the ultimate flexibility of programming at the level of the Python language. You may consider HyperMorph an open-source API with the same role, fetching data for graph visualisation platforms. HyperMorph has three key differentiating points here: data consolidation, user-defined data modeling, and interactive associative filtering for analytics with the option to visualize connected data on a graph. And because HyperMorph is open source, it is all the more promising that our technology can potentially be used by many software vendors for BI applications.
Speechless HyperMorph Screencast
HyperMorph speaks for itself
Watch the demo; check the YouTube settings and make sure the video quality is set to 1080p HD. You may also set the playback speed to 0.75 to give yourself more time to follow the commands being executed.
Now you know that you can …
and the only limit on what you can do is your imagination.
Installation - Demo Test - Documentation
Step by step instructions
on how to install the release.
Demo Guide to Test package
Demonstration of HyperMorph functionality on data resources and demo scripts that are included in the distribution.
Documentation
A draft of the documentation, generated automatically with Sphinx from comments in the source code, is hosted on GitHub.