I have spent a lot of time studying and experimenting with Freebase in detail, and I am truly disappointed to see such a great collective effort suddenly disappear from the software development scene without a good reason. I will justify this, but first let me say a bit about what Freebase is, for those who are not aware of the project.
Freebase, in my opinion, is currently the best collaborative information management system ever built. According to Wikipedia, “It is an online collection of structured data harvested from many sources, including individual, user-submitted contributions. Freebase aims to create a global resource which allows people (and machines) to access common information more effectively”.
Quick History Review
- In 2000 Danny Hillis first described his idea for creating a “knowledge web”, which he called Aristotle.
- In 2003 the project known as “The Metaweb” began inside Applied Minds.
- In October 2006 the One True Graph was born.
- On July 16th 2010 Metaweb was acquired by Google.
- On June 30th 2015 Google plans to shut the service down completely. It is already in read-only mode.
Summary of technology achievements
- Freebase counts twelve years of focused development effort by a strong, dedicated team of experts.
- Freebase counts nine years of online editing with superb web-based client APIs and GUIs.
- Freebase and Google have defined and implemented state-of-the-art technology in collaborative information management. Consider the following topics, just for a start:
Database management system and data architecture
The most important IT asset is their proprietary Graphd database. Some information has been disclosed to the public in the past, but of course the database itself is not going to be released or opened for inspection.
Graphd encodes everything as a tuple. Every structure and construct, such as entity type, entity instance, property, data type, domain, namespace, ontology, you name it, is constructed from tuples. The very first layer of abstraction created on top of this is the object-link layer, which implements RDF-like statements in Freebase. Every item in Freebase is either a link or an object, and you can get ALL the links to or from any object. Defining a single primitive construct as a building block is one of my R3DM semiotic principles in data modelling abstraction. The Topic Maps data model (TMDM) follows the same kind of logic, where everything is a topic and every other construct is built with topics. In RDF/OWL the fundamental construct is the triple, in graph databases you have the node, and in key-value stores you have the key-value primitive.
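To make the object-link idea a bit more tangible, here is a minimal sketch in Python. It is not how Graphd works internally (that is not public); the names `Obj`, `TupleStore`, `links_from` and `links_to` are my own assumptions, and the only point is to show that objects carry no data themselves and that all links to or from any object can be retrieved.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Obj:
    """An object is just an identity; every fact about it lives in links."""
    guid: int


class TupleStore:
    """Toy object-link store (hypothetical, for illustration only)."""

    def __init__(self):
        self._next_guid = count(1)
        self._outgoing = defaultdict(list)  # object -> links leaving it
        self._incoming = defaultdict(list)  # object -> links pointing at it

    def new_object(self):
        return Obj(next(self._next_guid))

    def link(self, source, prop, target):
        # A link is a (source, property, target) tuple; the target is
        # either another object or a literal value such as a string.
        triple = (source, prop, target)
        self._outgoing[source].append(triple)
        if isinstance(target, Obj):
            self._incoming[target].append(triple)
        return triple

    def links_from(self, obj):
        return list(self._outgoing.get(obj, []))

    def links_to(self, obj):
        return list(self._incoming.get(obj, []))


store = TupleStore()
sting, police = store.new_object(), store.new_object()
# Properties are objects too, built from the same primitive.
name, member_of = store.new_object(), store.new_object()

store.link(sting, name, "Sting")
store.link(police, name, "The Police")
store.link(sting, member_of, police)

print(store.links_from(sting))  # both links leaving the Sting object
print(store.links_to(police))   # the single link arriving at The Police
```

Note that the properties `name` and `member_of` are ordinary objects as well, which is the whole point of having one primitive building block.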
The identity crisis solution
The identity crisis, which is still a problem in the linked-data world, has a fair solution in Freebase: not a single URL-based namespace identifier, but three kinds of identifiers that each play a different role:
- Under-the-hood GUIDs that are written internally in the Graphd database
- Long, user-friendly, human-readable IDs
- Short machine IDs that cover the whole lifecycle of anything recorded in the database, i.e. they track changes, solve merging/splitting issues, etc.
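The three identifier roles listed above can be sketched roughly as follows. This is only my illustration of the idea, not Freebase's actual mechanism: the class `IdentityRegistry`, its methods and the identifier values are all hypothetical. It shows how a stable machine ID can keep resolving after two records are merged.

```python
import uuid


class IdentityRegistry:
    """Toy registry with three identifier kinds (hypothetical sketch)."""

    def __init__(self):
        self._by_mid = {}   # short machine id -> internal GUID
        self._by_key = {}   # human-readable id -> internal GUID
        self._merged = {}   # retired GUID -> surviving GUID

    def register(self, mid, key=None):
        guid = uuid.uuid4().hex  # internal identifier, never shown to users
        self._by_mid[mid] = guid
        if key is not None:
            self._by_key[key] = guid
        return guid

    def resolve(self, identifier):
        guid = self._by_mid.get(identifier) or self._by_key.get(identifier)
        if guid is None:
            raise KeyError(identifier)
        # Follow merge redirects so that old identifiers keep working.
        while guid in self._merged:
            guid = self._merged[guid]
        return guid

    def merge(self, duplicate, survivor):
        dup_guid, keep_guid = self.resolve(duplicate), self.resolve(survivor)
        if dup_guid != keep_guid:
            self._merged[dup_guid] = keep_guid


registry = IdentityRegistry()
# Identifier values below are made up for the example.
band = registry.register("/m/0aaa", key="/en/the_police")
registry.register("/m/0bbb")  # a duplicate record for the same band

registry.merge("/m/0bbb", "/m/0aaa")
print(registry.resolve("/m/0bbb") == band)         # old machine id still resolves
print(registry.resolve("/en/the_police") == band)  # so does the readable id
```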
Schema, Type System, Ontology
A flexible, generic schema that covers any other ontology built on top of it, with only four hierarchical containers: Namespace -> Domain -> Types -> Properties
This is the most generic, plain T-Box you can define, with just four levels. Pause for a moment here and read the following very carefully. Are you familiar with these fundamental hierarchies in computer science?
- Namespace - Package - Class - Object (OOP)
- Database - Schema - Table - Row (RDBMS)
- Domain - Type - Property - Instance (Ontologies and XML)
This is not a coincidence, and there is a very good reason that we naturally create abstractions based on this hierarchy. The very first person who came up with this astonishing observation is the inventor Ron Everett, and AtomicDB’s under-the-hood encoding structure (Environment, System, Context, Item) is the heart of their system. This is not the place or the time to cover that in more detail; it suffices to say that the S3DM/R3DM semiotic conceptual model can offer an adequate explanation of the abstraction/reference mechanism behind the scenes.
In Freebase every property, i.e. a bidirectional link between two items, knows exactly what entity types to expect. By the way, this is also a fundamental technology novelty of AtomicDB, stay tuned….
They have implemented a kind of multiple inheritance mechanism, such that one type with its properties can be automatically included within another type. Included types make it possible to create any kind of hierarchical structure and to reference it wherever it is needed. Most importantly, you can easily create custom types based on previously defined types.
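Here is a small sketch of how included types could behave. It is an illustration under my own assumptions, not the Freebase implementation: the class `FType`, the method `all_properties`, the property names and the custom type id are hypothetical.

```python
class FType:
    """Toy type that can include other types (hypothetical sketch)."""

    def __init__(self, type_id, properties=(), includes=()):
        self.type_id = type_id
        self.own_properties = list(properties)
        self.includes = list(includes)

    def all_properties(self):
        """Collect own properties plus those of every included type."""
        seen_types, result = set(), []

        def visit(ftype):
            if ftype.type_id in seen_types:
                return  # guards against cycles and repeated includes
            seen_types.add(ftype.type_id)
            for included in ftype.includes:
                visit(included)
            for prop in ftype.own_properties:
                if prop not in result:
                    result.append(prop)

        visit(self)
        return result


# Illustrative types and property names only.
person = FType("/people/person", ["date_of_birth"])
artist = FType("/music/artist", ["genre"])
musician = FType(
    "/user/example/musician", ["instrument"], includes=[person, artist]
)

print(musician.all_properties())
# ['date_of_birth', 'genre', 'instrument']
```

The custom `musician` type gets the properties of both included types without redefining them, which is what makes building new types on top of existing ones so cheap.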
Identifying similarly-named topics is based on both the namespace and disambiguating properties. A score is also calculated to rank the items in conflict.
Complex/Compound Value Types (CVTs) and Relationships
You can define new data types, e.g. use a dated integer for measurements. You can also define mediator nodes that sit between the source and target of an otherwise simple relationship. No blank nodes and other such RDF/OWL crap in Freebase.
In Freebase you can also store large objects, i.e. text/binary streams.
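The dated integer and the mediator node can be pictured with two tiny Python data classes. This is only a sketch under my own assumptions: `DatedInteger`, `Membership`, `latest` and all the values are hypothetical and not taken from Freebase.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatedInteger:
    """A compound value: a number that only makes sense with its date."""
    number: int
    as_of: date


@dataclass(frozen=True)
class Membership:
    """Mediator node sitting between a member and a group."""
    member: str
    group: str
    role: str


def latest(measurements):
    """Return the most recent measurement in a list of dated values."""
    return max(measurements, key=lambda m: m.as_of)


# Made-up numbers for a made-up town.
population = [
    DatedInteger(1000, date(2000, 1, 1)),
    DatedInteger(1200, date(2010, 1, 1)),
]
# Without the mediator, "Some Person -> Some Band" could not carry a role.
membership = Membership("Some Person", "Some Band", "bass")

print(latest(population).number)  # 1200
print(membership.role)            # bass
```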
Full Access Control and Auditing mechanism
Every user action is recorded inside Freebase. They have defined Unix-like access control on anything, i.e. users, user groups and permissions. It is possible to request the full history of any item in the database.
Fully web-based user interfaces and applications
If you want to attract new, inexperienced users easily, then you must have the best user interface ever built for your purpose, and Freebase and Google have managed to do that. You can access everything with a few clicks. You can view everything in a well-presented hierarchical or table-like format. You can build your queries online and save them as objects in the database, then simply access them with a permanent, tiny URL. Likewise, you can program your applications and save them as objects in the database.
MQL - The Metaweb Query Language
Last but not least, get your hands dirty with a truly magnificent piece of programming art in Freebase, their MQL query/update mechanism. It is based on pattern matching. Programmers, and even advanced non-expert users, can easily define a query pattern and get the result back as a string in the popular JSON format. You can limit or sort results, you can ask for ordered collection items, and you can specify optional directives, constraints and pattern matching with operators. Finally, results have a numerical relevancy score; this is another piece of Google’s secret technology, similar to their page ranking technology. For example, you can get a ranked list of the most notable topics with a given name.
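To give a feeling for the query-by-pattern idea, here is a toy matcher in Python that works on an in-memory list instead of the real service. It is written in the spirit of MQL but it is not MQL: `run_query` is my own hypothetical function, and it only understands literal constraints, `null` placeholders to be filled in, and simplified `sort` and `limit` directives.

```python
import json


def run_query(pattern, topics):
    """Match a JSON-style pattern against a list of topic dicts."""
    query = dict(pattern)
    limit = query.pop("limit", None)    # directive, not a property
    sort_key = query.pop("sort", None)  # directive, not a property

    # A non-null value is a constraint; a null value asks to be filled in.
    matches = [
        topic for topic in topics
        if all(topic.get(key) == value
               for key, value in query.items() if value is not None)
    ]
    if sort_key is not None:
        # Assumes every matching topic has the sort property.
        matches.sort(key=lambda topic: topic[sort_key])
    # Return only the properties the pattern mentions.
    return [{key: topic.get(key) for key in query} for topic in matches[:limit]]


topics = [
    {"type": "/music/album", "artist": "The Police", "name": "Synchronicity"},
    {"type": "/music/album", "artist": "The Police", "name": "Outlandos d'Amour"},
    {"type": "/music/album", "artist": "The Police", "name": "Reggatta de Blanc"},
    {"type": "/music/album", "artist": "The Beatles", "name": "Abbey Road"},
]

pattern = json.loads("""
{"type": "/music/album", "artist": "The Police",
 "name": null, "sort": "name", "limit": 2}
""")

results = run_query(pattern, topics)
print(json.dumps(results, indent=2))
```

The query has the same shape as the answer: you write down what you know, leave `null` where you want a value, and get the filled-in pattern back.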
This is certainly not an exhaustive list of best-of-breed Freebase features, but I hope you have read enough about the technology to make your own judgement on why Google is shutting down their Freebase service completely. Their one-page announcement does not say much about it. It is not signed by anyone in authority at Google or Freebase. In my opinion, the reason given there to all of us who embraced the project is a sham and invites reproach. Most of us know that Google’s Knowledge Graph (2012), which enhances the Google search engine with semantic information, is based on Freebase.
Let me state clearly that I welcome Google’s decision to contribute to the Wikidata project, but I do not understand why they have to shut down Freebase completely in doing so. It is going to take a long time to reach the level of experience that users already have with Freebase. And what about the data curated by many people and the projects they have built on top of it? What about all the technological advances I described in this post: how will anyone be able to see how they operate and compare them with other projects?
For a start, I think Wikidata is a new project that is built on a far worse fundamental data architecture than Freebase. Nevertheless, I acknowledge that there are many good ideas and efforts to share. But it makes no sense to eradicate such a historic project as Freebase in order to continue development on a newcomer.
So, what other alternatives exist in this area? One is the DBpedia project, a purely RDF-based linked-data project that started in 2007. More recent proprietary systems are:
- Pool-Party semantic suite, also RDF linked-data based
- Kamala, a Topic-Map based web application
- Siren Investigative Intelligent Platform
Nevertheless, I anticipate that sooner or later a unicorn will appear in semantic-based information management that will simply shatter and unify all the rest. The reason I believe this is going to happen is that none of these companies has an efficient underlying fundamental data architecture. In this other dimension, in the twilight zone, there is not a single player in the market.