Monday, July 07, 2014

Using OpenCyc RDF/OWL data in StarDog

Originally published August 7, 2013

Over the years I have used the OpenCyc runtime system to explore and experiment with the OpenCyc data, which consists of 239K terms and 2 million triples. To test how easy the RDF/OWL version of the OpenCyc data is to use, I tried loading the OpenCyc OWL file into StarDog:

$ ./stardog-admin server start
$ ./stardog-admin db create -n opencyc ~/Downloads/opencyc-2012-05-10-readable.owl
Bulk loading data to new database.
Loading data completed...Loaded 3,131,677 triples in 00:00:47 @ 65.8K triples/sec.
Successfully created database 'opencyc'.

After you load the data, you can experiment with the StarDog command line SPARQL interface, which takes SPARQL queries one at a time:

$ ./stardog query opencyc "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

Here are some SPARQL queries to get you started using the command line interface (I am only showing the SPARQL queries):

SELECT ?s ?p ?o WHERE { ?s ?p ?o FILTER(REGEX(?o, "Clinton")) } LIMIT 30
SELECT ?p ?o WHERE { :HillaryClinton ?p ?o } LIMIT 30
SELECT ?o WHERE { :HillaryClinton :wikipediaArticleURL ?o }

Notice that OpenCyc terms like :HillaryClinton are in the default namespace. The results for this last query are:

+-------------------------------------------------------+
|                           o                           |
+-------------------------------------------------------+
| "http://en.wikipedia.org/wiki/Hillary_Rodham_Clinton" |
+-------------------------------------------------------+

You can easily convert Wikipedia URLs to DBPedia URIs; for example, as a DBPedia URI this URL would be <http://dbpedia.org/resource/Hillary_Rodham_Clinton>. Using the DBPedia live SPARQL query web app, you might want to try the SPARQL query:

SELECT ?p ?o WHERE { dbpedia:Hillary_Rodham_Clinton ?p ?o } LIMIT 200
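
The URL-to-URI conversion described above can be sketched in a few lines of Python. This is only a minimal sketch: it assumes English Wikipedia article URLs of the form http://en.wikipedia.org/wiki/Title, and it assumes the dbpedia: prefix expands to http://dbpedia.org/resource/ on the public endpoint. The function name is my own and is not part of any library.

```python
from urllib.parse import urlparse

# Assumption: the namespace behind the dbpedia: prefix on the public endpoint.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
WIKI_PATH_PREFIX = "/wiki/"


def wikipedia_url_to_dbpedia_uri(url):
    """Map an English Wikipedia article URL to the matching DBPedia resource URI."""
    parsed = urlparse(url)
    # Only English Wikipedia titles line up with dbpedia.org resource names.
    if parsed.netloc != "en.wikipedia.org":
        raise ValueError("not an English Wikipedia URL: " + url)
    if not parsed.path.startswith(WIKI_PATH_PREFIX):
        raise ValueError("not a Wikipedia article URL: " + url)
    title = parsed.path[len(WIKI_PATH_PREFIX):]
    if not title:
        raise ValueError("missing article title: " + url)
    # The article title (underscores included) is reused unchanged.
    return DBPEDIA_RESOURCE + title


if __name__ == "__main__":
    print(wikipedia_url_to_dbpedia_uri(
        "http://en.wikipedia.org/wiki/Hillary_Rodham_Clinton"))
```

Run against the URL returned by the :wikipediaArticleURL query, this prints http://dbpedia.org/resource/Hillary_Rodham_Clinton.
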
Some RDF repositories support the new SPARQL 1.1 feature of specifying additional SPARQL SERVICE endpoints so queries can combine triples from different services. Bob DuCharme covers this in his book "Learning SPARQL" at the end of Chapter 3. Without using multiple SPARQL SERVICE endpoints you can still combine data from multiple services on the client side; for example, you can combine the results of multiple queries from a local StarDog or Sesame server with results from the remote DBPedia endpoint.
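
Here is a rough sketch of what client-side combining might look like in Python, using only the standard library. It assumes the endpoints speak the standard SPARQL 1.1 Protocol over HTTP GET and return the SPARQL JSON results format. The function names (bindings_to_rows, run_select, combine) are hypothetical helpers of mine, and the remote fetch is passed in as a function so you can plug in whichever endpoint you use.

```python
import json
import urllib.parse
import urllib.request

WIKIPEDIA_PREFIX = "http://en.wikipedia.org/wiki/"
# Assumption: the namespace behind the dbpedia: prefix on the public endpoint.
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def bindings_to_rows(payload):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_select(endpoint, query):
    """Run a SELECT query over HTTP GET (SPARQL 1.1 Protocol) and return its rows."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request) as response:
        return bindings_to_rows(json.load(response))


def combine(local_rows, fetch_remote):
    """Attach remote DBPedia facts to each local row that holds a Wikipedia URL in ?o."""
    combined = []
    for row in local_rows:
        wikipedia_url = row["o"]
        if not wikipedia_url.startswith(WIKIPEDIA_PREFIX):
            continue  # skip rows that are not English Wikipedia links
        dbpedia_uri = DBPEDIA_PREFIX + wikipedia_url[len(WIKIPEDIA_PREFIX):]
        query = "SELECT ?p ?o WHERE { <%s> ?p ?o } LIMIT 200" % dbpedia_uri
        combined.append({
            "wikipedia": wikipedia_url,
            "dbpedia": dbpedia_uri,
            "facts": fetch_remote(query),
        })
    return combined
```

To try it, pass the rows from the local :wikipediaArticleURL query as local_rows, and pass something like lambda q: run_select("http://dbpedia.org/sparql", q) as fetch_remote. I am assuming that address for the public DBPedia endpoint; the HTTP address and authentication for a local StarDog or Sesame server depend on your setup and are not shown here.
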
