Wednesday, January 13, 2010

Looking towards a universal wrapper/proxy for knowledge and data stores

I often start out by writing code specific to a single project and then refactor it to make the working code more generally useful. I am working on an applications book for the AllegroGraph RDF data store services. Since most people (probably) use Java clients with AllegroGraph, the first step is to wrap Franz's APIs with my own interfaces so that, for my own work (that is, beyond the scope of writing this book), I can write implementations for other back-end RDF data stores as I need them.
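As a rough sketch of the wrapping idea, the Java below puts a small back-end-neutral interface in front of a store. Every name in it (Triple, TripleStore, InMemoryTripleStore) is hypothetical and chosen only for illustration; none of it is Franz's AllegroGraph API or the library for the book. The in-memory class stands in for a real back end, and a null argument is assumed to act as a wildcard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical names throughout: this is NOT the AllegroGraph client API.

/** One RDF statement: subject, predicate, object. */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = Objects.requireNonNull(subject);
        this.predicate = Objects.requireNonNull(predicate);
        this.object = Objects.requireNonNull(object);
    }
}

/** Back-end-neutral interface that client code would program against. */
interface TripleStore {
    void add(Triple triple);

    /** Returns the triples matching the pattern; a null argument is a wildcard. */
    List<Triple> match(String subject, String predicate, String object);
}

/** Trivial in-memory back end, a stand-in for a wrapper around a real store. */
final class InMemoryTripleStore implements TripleStore {
    private final List<Triple> triples = new ArrayList<>();

    @Override
    public void add(Triple triple) {
        triples.add(Objects.requireNonNull(triple));
    }

    @Override
    public List<Triple> match(String subject, String predicate, String object) {
        List<Triple> results = new ArrayList<>();
        for (Triple t : triples) {
            if (matches(subject, t.subject)
                    && matches(predicate, t.predicate)
                    && matches(object, t.object)) {
                results.add(t);
            }
        }
        return results;
    }

    // A null pattern matches any value; otherwise the strings must be equal.
    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}
```

Client code written against TripleStore would not need to change if another implementation, for example one delegating to a vendor's client library, were swapped in for InMemoryTripleStore.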

I also plan on writing "thin" Scala, JRuby, and Clojure friendly interfaces to my Java library. For the purposes of the book, I'll use this library to support example client applications written in Java, JRuby, Scala, and Clojure. I have a lot of material already written with Lisp examples but I think that I am going to set that all aside for a future writing project (that I may, quite honestly, never get back to). I also decided to not support Python in my book: Franz has a good Python interface library and examples and in any case, I am not a Python developer.

So far, I have a fairly clear road map of what I need for this specific book project. Long term, after this book is done, I am aiming to also wrap other knowledge sources like OpenCyc, Freebase, etc. While it is tempting to view most knowledge sources as graph data (RDF), it seems like a poor idea to give up the inferencing available in OpenCyc, all the features of the Freebase MQL query language, etc.

Since I often find myself reusing my own small code examples to access multiple knowledge sources, it may be time soon to step back and decide what can be placed behind common interfaces.

7 comments:

Tom Morris said...

I'd be interested in hearing more about a wrapper for RDF, MQL, SPARQL, etc as you make progress.

I recognize SPARQL is a standard, but it's a darn ugly one. It'd be nice to have MQL style QBE available too.

adam said...

Can you provide an update on the ETA for your book? I'm working on a project right now where it would probably be a fantastic resource.

Mark Watson, author and consultant said...

Tom: I would think that an MQL wrapper could be very nice in idiomatic Scala or Clojure.

Adam: I am not sure. I am waiting for Franz to deliver version 4 of AG. I did a lot of work on this book a year ago, but stopped, waiting for the new release. I am assuming that the beta I am using right now is OK for developing the example programs for a final version 4 release.

Adrian A. said...

What about Neo4j ?
http://neo4j.org/
It looks like a nice Java based alternative.

Mark Watson, author and consultant said...

Adrian: Neo4j is optimized for exploring/traversing a graph given a starting node. RDF data stores are optimized to match patterns in SPARQL queries against the RDF data in the data store. So, all are good tools but for different types of applications.

Adrian A. said...


"So, all are good tools but for different types of applications."

Well, I saw that neo4j competes with AllegroGraph but it's pure Java: that's why I asked.

There's also a SPARQL module on top of neo4j, and it seems to be used by some for RDF too:
http://components.neo4j.org/neo4j-rdf/
http://components.neo4j.org/neo4j-rdf-sail/

Mark Watson, author and consultant said...

Adrian: thanks for your comments. Neo4j uses the AGPL license so I can't use it on some projects. That said, two things: I like the AGPL license and use it for some things that I write, and the commercial use costs of Neo4j are reasonable.

Using Neo4j as a Sesame SAIL back end looks interesting, but the native Sesame SAIL implementation is efficient and "just works."

BTW, I like the JRuby gem neo4j for using Neo4j - if you like Ruby, then check it out.