Tuesday, July 08, 2014

Some Haskell hacks: SPARQL queries to DBPedia and using OpenCalais web service

For various personal (and a few consulting) projects I need to access DBPedia and other SPARQL endpoints. I use the hsparql Haskell library written by Jeff Wheeler and maintained by Rob Stewart. The following code snippet:


{-# LANGUAGE ScopedTypeVariables,OverloadedStrings #-}

module Sparql2 where

import Database.HSparql.Connection
import Database.HSparql.QueryGenerator

import Data.RDF hiding (triple)
import Data.RDF.TriplesGraph

simpleDescribe :: Query DescribeQuery
simpleDescribe = do
    resource <- prefix "dbpedia" (iriRef "http://dbpedia.org/resource/")
    uri <- describeIRI (resource .:. "Sedona_Arizona")
    return DescribeQuery { queryDescribe = uri }
    

doit = do
  (rdfGraph:: TriplesGraph) <- describeQuery "http://dbpedia.org/sparql" simpleDescribe
  --mapM_ print (triplesOf rdfGraph)
  --print "\n\n\n"
  --print rdfGraph
  mapM (\(Triple s p o) -> 
          case [s,p,o] of
            [UNode(s), UNode(p), UNode(o)] -> return (s,p,o)
            [UNode(s), UNode(p), LNode(PlainLL o2 l)] -> return (s,p,o2)
            [UNode(s), UNode(p), LNode(TypedL o2 l)] -> return (s,p,o2)
            _ -> return ("no match","no match","no match"))

    (triplesOf rdfGraph)

          
main = do
  results <- doit
  print $ results !! 0
  mapM_ print results

I find the OpenCalais web service for finding entities in text and categorizing text to be very useful. This code snippet uses the same hacks for processing the RDF returned by OpenCalais that I used in my last semantic web book:

NOTE: August 9, 2016: the following example no longer works because of API changes:


module OpenCalais (calaisResults) where

import Network.HTTP
import Network.HTTP.Base (urlEncode)

import qualified Data.Map as M
import qualified Data.Set as S

import Control.Monad.Trans.Class (lift)

import Data.String.Utils (replace)
import Data.List (lines, isInfixOf)
import Data.List.Split (splitOn)
import Data.Maybe (maybe)

import System.Environment (getEnv)

calaisKey = getEnv "OPEN_CALAIS_KEY"

escape s = urlEncode s

baseParams = "<c:params xmlns:c=\"http://s.opencalais.com/1/pred/\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"><c:processingDirectives c:contentType=\"text/txt\" c:outputFormat=\"xml/rdf\"></c:processingDirectives><c:userDirectives c:allowDistribution=\"true\" c:allowSearch=\"true\" c:externalID=\"17cabs901\" c:submitter=\"ABC\"></c:userDirectives><c:externalMetadata></c:externalMetadata></c:params>"

calaisResults s = do
  key <- calaisKey
  let baseUrl = "http://api.opencalais.com/enlighten/calais.asmx/Enlighten?licenseID=" 
                ++ key ++ "&content=" ++ (escape s) ++ "&paramsXML=" 
                ++ (escape baseParams)
  ret <- simpleHTTP (getRequest baseUrl) >>= 
    fmap (take 10000) . getResponseBody 
  return $ map (\z -> splitOn ": " z) $
    filter (\x -> isInfixOf ": " x && length x < 40)
      (lines (replace "\r" "" ret))
  
main = do
  r <- calaisResults "Berlin Germany visited by George W. Bush to see IBM plant. Bush met with President Clinton. Bush said “felt it important to step it up”"
  print r

You need to have your free OpenCalais developer key in the environment variable OPEN_CALAIS_KEY. The key is free and allows you to make 50K API calls a day (throttled to four per second).
I have been trying to learn Haskell for about four years so if anyone has any useful critiques of these code examples, please speak up :-)

No comments: