Surveying semantic web tools

I have been helping a customer with a semantic web project. He likes the Virtuoso platform because it is well documented and widely used, and he wants to learn Clojure, so there are no technology decisions to be made: we have a thin Clojure library for hitting SPARQL endpoints (which I will open source soon) and we run the open source version of Virtuoso. Nice combination.

However, for my own interest I spent a fair amount of my time this weekend taking a stroll through the semantic web tools landscape:

In the early days of the semantic web, SWI-Prolog and its semweb library were the way I got started. I took another look at the ClioPatria project. It is written in SWI-Prolog and has the best web front end I have seen for a SPARQL endpoint. ClioPatria is very compelling to me personally but then, I like Prolog! Few people program in Prolog anymore, but if you like Prolog definitely check out ClioPatria – a cool project!

I also experimented with owlim-lite, which is very conveniently packaged as a Sesame WAR file (with owlim-lite being installed as a SAIL back end data layer). I have a lot of experience using Sesame (you can grab free PDFs of my two semantic web books, one of which uses Sesame, on my web site). owlim-lite is free but not open source; you just need to ask for a copy.

I spent a fair amount of time about 1 1/2 years ago experimenting with Stardog but set it aside because I was more comfortable in the open source world of Sesame. I think I have changed my mind on staying with an open source stack. I found the combination of Stardog with Antonio Garrote Hernández’s Stardog Ruby gem to be a close to zero-friction experience. Everything “just worked:” loading large data sets, client side programming, etc. I was hacking around writing a Clojure library for Stardog last night but it is messy; I am going to wait until the Java client library is packaged separately in the future version 2.0. The free (but not open source) community edition of Stardog has very generous capacity limits.

I have been “believing in” the semantic web for over 10 years and with many high profile semantic web projects in large corporations and government programs, I expect solid growth in this industry. One frustrating aspect of the semantic web is depending on other people’s SPARQL endpoints to be running and available. The availability issue seems to be getting better but for now, I think that the sweet spot is information gathering systems (both automatic and human assisted) that are not real time – basically creating application specific local caches for linked data.
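As a rough sketch of the local-cache idea, the following wraps any fetch function with an in-memory cache keyed by the query URL, so a remote endpoint is hit only once per distinct query. The endpoint URL and the names query-url and cached-fetcher are assumptions for illustration; they are not taken from the library mentioned above.

```clojure
(import '(java.net URLEncoder))

;; Assumed endpoint: a local Virtuoso install usually serves SPARQL here.
(def endpoint "http://localhost:8890/sparql")

(defn query-url
  "Builds a GET URL that passes the SPARQL text in the `query` parameter."
  [endpoint sparql]
  (str endpoint "?query=" (URLEncoder/encode sparql "UTF-8")))

(defn cached-fetcher
  "Wraps fetch-fn (URL string -> response body) with an in-memory cache."
  [fetch-fn]
  (let [cache (atom {})]
    (fn [sparql]
      (let [url (query-url endpoint sparql)]
        (if-let [hit (find @cache url)]
          (val hit)                      ; cached: no network call
          (let [body (fetch-fn url)]     ; first time: call the endpoint
            (swap! cache assoc url body)
            body))))))

;; Possible usage: slurp performs a plain HTTP GET on a URL string.
;; (def fetch (cached-fetcher slurp))
;; (fetch "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
```

A real cache for linked data would persist results to disk and expire them; the atom here only shows the shape of the idea.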

Handling anonymous user IDs in web apps

I don’t like web apps and apps for my Android phone and iPad having more information on users (like me!) than they need to support their functionality. I wrote a “practice” Clojurescript web app a few months ago that allows me to make notes on all of my devices. Really simple, but it does no more and no less than what I wanted for my own use. (You can play with it here.)

Recently I updated this to allow anonymous use with a persistent user ID across all devices. If you hit the default web app URI you will notice a link at the bottom of the page that you can copy and email to yourself. Open the email on your other devices, bookmark the URI, and optionally make a desktop icon link (this is what I do on my Galaxy S III, which is where I most often use this app).

This setup process is a little tedious: using my laptop I created a new account as a test using the root application URI, copied the URI listed at the bottom of the page, and it took a couple of minutes to set up my iPad and phone with the same (new) anonymous account.

One alternative is authentication through Twitter, Google, Facebook, and/or Yahoo, as I have implemented on several customer projects. Easy, but for many types of apps, it annoys me to make users identify themselves. Another alternative is having users create an account name and password, and perhaps this is a better alternative.

I would appreciate comments on how other developers handle user login for anonymous applications.

If you are interested, here are the bits of code I use to implement this. I check for a cookie in the user’s request using:

(def cookie-name "focustestapp")

(defn check-auth [request]
  (let [id (:value ((:cookies request) cookie-name))]
   (or id (str (java.util.UUID/randomUUID)))))
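To make the two cases concrete, here is a small variant that can be tried at a REPL. It assumes the request map has the shape produced by Ring's wrap-cookies middleware (cookie name mapped to a map with a :value key); the name cookie-or-new-id is hypothetical, and get-in is used so that a request with no :cookies key does not throw.

```clojure
(def cookie-name "focustestapp")

(defn cookie-or-new-id
  "Returns the ID stored in the request's cookie, or a fresh random UUID string."
  [request]
  (or (get-in request [:cookies cookie-name :value])
      (str (java.util.UUID/randomUUID))))

;; Returning visitor: the ID already stored in the cookie is reused.
(cookie-or-new-id {:cookies {"focustestapp" {:value "abc-123"}}})

;; First visit (no cookie yet): a new random ID is generated.
(cookie-or-new-id {})
```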

I pass the generated or existing cookie to the Clojurescript code for the single page web app and the “login URI” is displayed for the user to reuse. All of the user’s data is sent to the client side app using the route for /edn. I also have routes for updating/deleting/archiving notes; the client always passes back the user’s unique auth token.

(def index-page (slurp "resources/public/index.html"))

(compojure/defroutes routes
  (compojure/GET "/" request (let [id (check-auth request)]
                               {:body index-page :cookies {cookie-name id}}))
  (compojure/GET "/edn" request (let [id (check-auth request)
                                      data (all-foci-for-user id)]
                                  {:body (pr-str data)}))
  (compojure/GET "/update" request (update-data ((:query-params request) "s")))
  ....
  (route/resources "/"))

By the way, it is just about trivial to pass data back and forth between client side Clojurescript and server side Clojure code. The user’s full data set goes through the route for /edn one time, and then just deltas of changed data flow between client and server. Notice the call to (pr-str data) to serialize the data returned to the client. On the client side, require [clojure.edn :as reader] and read the response like:

(defn edn-to-clojure-data [s]
  (let [data (reader/read-string s)]
    (d/log "raw string for edn data  data= " data)
    data))

Sending data from the client to the server is also pleasantly easy. In this example (assuming a require for [goog.net.XhrIo :as xhr]) I am using GET:

(xhr/send (str "/update?s=" data) handle-ajax-response "GET" "")
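One detail worth watching when putting serialized data in a query string is URL encoding, since printed Clojure data contains spaces, quotes, and braces. The sketch below shows the round trip on the JVM; update-uri and parse-s are hypothetical names, and in the browser js/encodeURIComponent would play the role that URLEncoder plays here.

```clojure
(require '[clojure.edn :as edn])
(import '(java.net URLEncoder URLDecoder))

(defn update-uri
  "Serializes data with pr-str and URL-encodes it into the `s` parameter."
  [data]
  (str "/update?s=" (URLEncoder/encode (pr-str data) "UTF-8")))

(defn parse-s
  "Decodes the raw `s` value and reads it back as EDN data.
  (Ring's wrap-params already decodes parameters, so a real handler
  would only need the edn/read-string step.)"
  [encoded]
  (edn/read-string (URLDecoder/decode encoded "UTF-8")))

;; Round trip: what goes into the URI comes back out unchanged.
(let [note {:name "to do" :text "a & b = c"}
      uri  (update-uri note)
      raw  (subs uri (count "/update?s="))]
  (= note (parse-s raw)))
```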

Catching up

I have been busy lately and haven’t blogged in 3 weeks, so this is a “catch up.” I have been working this year on developing several business ideas and getting more comfortable using Clojure and Clojurescript for almost all of my development.

I have had a break from these entrepreneurial activities: Carol and I went to Rhode Island to visit family for 9 days and after we returned home I have been helping two customers (one with a semantic web genomics project and the other with a natural language processing system for students learning English). I had promised myself that for this year and next year I would limit myself to about 1 hour per day of customer work, and in the last 2 weeks I have run over that limit a little bit, but I am having a lot of fun with this customer work. It is probably best to be flexible and not plan too far ahead!

I have committed myself to using Clojure for almost all of my development work but I am still not 100% convinced that Clojurescript is right for me for most projects. The issue is that it takes much more effort for me to build rich client applications in Clojurescript than traditional web apps using Hiccup (with small bits of AJAX or PJAX). I also haven’t given up playing with/experimenting with Ember.js and I feel close to really getting my head around Ember Data so when I need a rich client Ember.js is definitely still a possibility.

I am taking three Coursera classes right now (Probabilistic Graphical Models, Computational Neuroscience, and just starting Introduction to Data Science). I should probably drop at least one of these classes due to time constraints but I think I will hang in there with all three for a few more weeks.

The weather has been great in Sedona, Arizona, where we live. Here are a few pictures I have taken in Sedona and the surrounding area. In the last 6 weeks Carol and I have painted the outside of our house and caught up on yard work. I also have been hiking and kayaking a lot.

A Clojurescript rookie’s survival guide notes

I have been enjoying the combination of writing Clojurescript clients with server side Clojure + Compojure. However, I have run across a few pain points and use a few simple little hacks to make things easier. My little hacks may be far from “best practice”:

Forget about JSON, just serialize Clojure data. This seems strange since JSON is nice to use and cross platform, but when talking Clojure to Clojure, I find it easier to use pr-str to print Clojure data to a string and in Clojurescript to read it using:

(ns myfocus.client
  (:use-macros [crate.def-macros :only [defpartial]])
  (:require [goog.net.XhrIo :as xhr]
            [goog.object :as gobj]
            [goog.events.KeyHandler :as key-handler]
            [goog.events :as gevents]
            [goog.dom :as dom]
            [domina :as d]
            [domina.events :as events]
            [dommy.template :as template]
            [cljs.reader :as reader])) 

(defn str-to-clojure-data [s]
  (let [data (reader/read-string s)]
    ....))

and on the server side in Clojure (there may be security issues with this):

  (let [data (read-string s)
    ...

EDIT: Lucian and Bernard pointed out that on the server side I should have used clojure.edn/read-string instead of clojure.core/read-string to avoid a security hole.
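A minimal sketch of the safer server-side read follows; safe-parse is a hypothetical helper name. clojure.edn/read-string only reads data and rejects the #= reader-eval form, which clojure.core/read-string would evaluate when *read-eval* is true (the default).

```clojure
(require '[clojure.edn :as edn])

(defn safe-parse
  "Parses a string as EDN data; returns nil when the input is not valid EDN."
  [s]
  (try
    (edn/read-string s)
    (catch Exception _ nil)))

;; Ordinary data reads back as Clojure data structures.
(safe-parse "{:name \"note 1\" :tags #{:work}}")

;; A reader-eval form is refused instead of being executed.
(safe-parse "#=(+ 1 2)")
```

Returning nil for bad input is just one choice; a handler could equally respond with an error status.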

Handle authentication with an anonymous unique user ID and add this to all calls to the server. The first time a user hits an app, generate a unique and anonymous ID for them; on the server side:

(ns myfocus.server
  (:require [compojure.route :as route]
            [compojure.core :as compojure]
            [ring.util.response :as response]
            [ring.adapter.jetty :as jetty]
            clojure.pprint)
  (:use [ring.middleware.cookies :only [wrap-cookies]]
        [ring.middleware.params :only [wrap-params]]
        [ring.middleware.keyword-params :only [wrap-keyword-params]])
  (:use myfocus.utils)
  (:use myfocus.db)
  (:gen-class :main true))

(def cookie-name "magic-cookie-name")

(defn check-auth [request]
  (let [id (:value ((:cookies request) cookie-name))
        id2 (or id (str (java.util.UUID/randomUUID)))]
    id2))

I use this check-auth function in two places: when the user first loads the page, where the user’s auth ID is sent to the client along with a very small HTML file returned by the route

(route/resources "/")

and also later to verify that client requests are valid. The first thing that the Clojurescript (compiled to Javascript) does on page load is to request the user’s data, which I process on the server side with a route like this:

  (compojure/GET "/edn" request
       (let [id (check-auth request)
             data (all-data-for-user id)]
             {:cookies {cookie-name id}
              :body (pr-str data)}))

I ended up forgetting about the proper semantics of REST verbs and always used GET requests, reusing the same functions for different types of requests. I pass a user’s auth ID with every call to the server.

I store the user’s auth ID, user data and dirty data identifiers in the browser using something like:

(def user-data (atom []))
(def dirty-data-names (atom #{}))
(def user-auth (atom ""))

All application data lives in the browser and I send diffs back to the server as data changes. I am writing a one page application and I use a few forms with no action like this on a single HTML page:

<form id="menu-form" class="menu_form" action="javascript:void(0);">
   <input id="recent-button" type="button" value="Recent"/>
   ...

I set event handlers for this input button (and all other controls) in a main function on the client:

(defn ^:export main [] ;;    main static initialization:
  (let [document (dom/getDocument)
        handler (goog.events.KeyHandler. document true)
        timer (goog.Timer. 2000)]  ;; 2 seconds
    (fetch-user-data)
    (when (not (.-enabled timer))
      (. timer (start)))
    (gevents/listen handler "key"
      (fn [evt]
        (let [id (.-id (.-target evt))]
          ...)))
    (gevents/listen timer goog.Timer/TICK flush-dirty-foci-to-server)
    (events/listen!
      (d/by-id "recent-button")
      :click (fn [event]
               (update-menu-focus-list sort-by-most-recent)
               (d/set-styles! (d/by-id "recent-button") {:background-color "darkgrey"})
               (d/set-styles! (d/by-id "important-button") {:background-color "white"})
               (d/set-styles! (d/by-id "archived-button") {:background-color "white"})))
    ...))
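The timer handler above is where dirty data gets pushed to the server. As a sketch of how that bookkeeping might look, the code below marks a note as dirty when it changes and, on each flush, sends only the changed notes. The note shape (maps with :name and :text), update-note!, flush-dirty!, and the send-fn argument are all assumptions for illustration; the real flush-dirty-foci-to-server is not shown in this post. The code uses only core functions, so it runs in both Clojure and Clojurescript.

```clojure
(def user-data (atom []))          ; assumed shape: [{:name "n1" :text "..."} ...]
(def dirty-data-names (atom #{}))  ; names of notes changed since the last flush

(defn update-note!
  "Replaces the text of the named note and marks that note as dirty."
  [note-name text]
  (swap! user-data
         (fn [notes]
           (mapv #(if (= (:name %) note-name) (assoc % :text text) %) notes)))
  (swap! dirty-data-names conj note-name))

(defn flush-dirty!
  "Sends only the dirty notes via send-fn, then clears the names it captured."
  [send-fn]
  (let [dirty   @dirty-data-names
        changed (filterv #(contains? dirty (:name %)) @user-data)]
    (when (seq dirty)
      (send-fn changed)
      ;; Remove only the names captured above, so anything marked dirty
      ;; while sending is kept for the next flush.
      (swap! dirty-data-names #(reduce disj % dirty)))))
```

In the browser, send-fn could be a function that serializes the changed notes with pr-str and passes them to xhr/send.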

I generate new HTML for elements on the page when a user has to edit data, and discard the newly generated HTML when it is not needed.

While it at first seems tedious to handle every event on the client and all communication with the server, I (eventually) found that it just requires a set of small utility functions for communication, creating HTML DOM elements, and modifying and destroying DOM elements. Once the low level stuff is written, the main client side application logic is fairly easy to write. Because I have a lot of code reuse, the client side code is fairly concise even though I have to handle low level details.

I have two apps I am working on (one simple and one moderately complicated) and after a learning curve I now have a better understanding of writing client side apps in Clojurescript.