Tuesday, July 22, 2014

Trying Office 365 on Mac, iPad, and Android

I am evaluating alternative cloud services and the 1 terabyte of OneDrive storage certainly attracted my attention. I have been using Dropbox for many years and have usually been happy with it. While I was very disappointed that Dropbox added Condoleezza Rice to their board of directors (I don't like her strong support of our invasion of Iraq or her views on privacy vs. unencumbered government surveillance), that alone is not enough to make me stop using Dropbox. Still, it is good to have options, and I very much like the direction that Microsoft's new CEO Satya Nadella is taking the company. Don't get me wrong: I don't view Microsoft, Apple, or Google as being perfect either with regard to user privacy. A simple fact of life is that the US government can apply very strong soft pressure against tech companies in the US, to the detriment of our economy. Anyway, enough politics; here are my initial thoughts on Office 365:

I signed up for the free 30-day trial of Office 365 earlier today and installed all of Office 365 on my MacBook Air, and just OneDrive, OneNote, and Microsoft Word on my iPad and Android phone. So far the best feature is that Word documents are actually easy to read and edit on my iPad and Android phone. Sweet.

Satya Nadella's strategy of supporting all devices with Microsoft's productivity tools seems exactly right to me. Anyone who doesn't think that cloud based services will continue to dominate the way people use devices has not been paying attention.

Unfortunately, OneDrive has some really rough edges when opening plain text files on my iPad and Android phone. I keep notes as text files, and the intended workflow for notes seems to be importing everything into OneNote. Note syncing between my MacBook Air, iPad, and Android phone works well, but I really do prefer plain text files. Strangely, OneNote does not store note files on OneDrive! On my Mac, they are hidden in a cache folder under ~/Library. PDF files can be conveniently read from OneDrive on the iPad, but it is not so convenient on my Android phone.

What about security and privacy?

I use encryption when storing sensitive information on Dropbox and I am modifying my backup zsh scripts to also encrypt sensitive information stored on OneDrive. Easy to do! As a consultant, customers trust me with some of their proprietary data, and I always try to keep customer data encrypted on my laptop and in cloud backups.
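
To give a rough idea of the approach (my real scripts are just zsh wrappers around zip and gpg, as described in an older post below), here is a minimal Java sketch of the encrypt-then-sync idea. It assumes gpg is installed and that ~/OneDrive is the local sync folder; the directory being backed up and the passphrase file are made-up placeholders:

import java.io.IOException;
import java.nio.file.*;

public class EncryptedBackup {
  public static void main(String[] args) throws IOException, InterruptedException {
    String home = System.getProperty("user.home");
    // ZIP the sensitive directory, then symmetrically encrypt the archive with GPG.
    run("zip", "-r", "/tmp/notes.zip", home + "/notes");
    run("gpg", "--batch", "--yes", "-c",
        "--passphrase-file", home + "/.backup_pass", "/tmp/notes.zip");
    // Copy only the encrypted archive into the OneDrive sync folder.
    Files.copy(Paths.get("/tmp/notes.zip.gpg"),
               Paths.get(home, "OneDrive", "notes.zip.gpg"),
               StandardCopyOption.REPLACE_EXISTING);
  }

  private static void run(String... cmd) throws IOException, InterruptedException {
    new ProcessBuilder(cmd).inheritIO().start().waitFor();
  }
}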

Why not use Google Drive?

Actually, even though I don't sync my Google Drive to my Mac, I do use the web interface for offline backups. Google Drive, like Microsoft's OneDrive, is not as facile as Dropbox. There is also the simple fact that I rely on Google for so many services that I prefer using an alternative cloud drive.

I am in no hurry to complete my evaluation of Office 365. My Dropbox account is prepaid for another seven months. When my free evaluation period of Office 365 is up I plan on paying for the service for a few months while deciding if I want to make it my primary cloud service.

What about Apple?

I really enjoy using both iOS and Android devices, mostly for the fun of the different experiences. That said, now that I am basically retired (I still consult several hours a week, work on a tech business idea, and write books, so my friends and family take my "retired" status with some skepticism :-) I might end up just living in Apple's little walled garden and using their cloud services. Right now, Apple's cloud services are not very impressive but I expect large improvements. In any case, I am in no hurry, but sometime in the next year I would like to settle on one primary cloud service, using others as backups.

Tuesday, July 08, 2014

Some Haskell hacks: SPARQL queries to DBPedia and using OpenCalais web service

For various personal (and a few consulting) projects I need to access DBPedia and other SPARQL endpoints. I use the hsparql Haskell library written by Jeff Wheeler and maintained by Rob Stewart. The following code snippet runs a DESCRIBE query for the DBPedia resource for Sedona, Arizona and prints the triples in the result:

{-# LANGUAGE ScopedTypeVariables,OverloadedStrings #-}

module Sparql2 where

import Database.HSparql.Connection
import Database.HSparql.QueryGenerator

import Data.RDF hiding (triple)
import Data.RDF.TriplesGraph

simpleDescribe :: Query DescribeQuery
simpleDescribe = do
    resource <- prefix "dbpedia" (iriRef "http://dbpedia.org/resource/")
    uri <- describeIRI (resource .:. "Sedona_Arizona")
    return DescribeQuery { queryDescribe = uri }
    

doit = do
  (rdfGraph:: TriplesGraph) <- describeQuery "http://dbpedia.org/sparql" simpleDescribe
  --mapM_ print (triplesOf rdfGraph)
  --print "\n\n\n"
  --print rdfGraph
  mapM (\(Triple s p o) ->
          case [s,p,o] of
            [UNode(s), UNode(p), UNode(o)] -> return (s,p,o)
            [UNode(s), UNode(p), LNode(PlainLL o2 l)] -> return (s,p,o2)
            [UNode(s), UNode(p), LNode(TypedL o2 l)] -> return (s,p,o2)
            _ -> return ("no match","no match","no match"))
       (triplesOf rdfGraph)

main = do
  results <- doit
  print $ results !! 0
  mapM_ print results

I find the OpenCalais web service for finding entities in text and categorizing text to be very useful. This code snippet uses the same hacks for processing the RDF returned by OpenCalais that I used in my last semantic web book:

module OpenCalais (calaisResults) where

import Network.HTTP
import Network.HTTP.Base (urlEncode)

import qualified Data.Map as M
import qualified Data.Set as S

import Control.Monad.Trans.Class (lift)

import Data.String.Utils (replace)
import Data.List (lines, isInfixOf)
import Data.List.Split (splitOn)
import Data.Maybe (maybe)

import System.Environment (getEnv)

calaisKey = getEnv "OPEN_CALAIS_KEY"

escape s = urlEncode s

baseParams = "<c:params xmlns:c=\"http://s.opencalais.com/1/pred/\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"><c:processingDirectives c:contentType=\"text/txt\" c:outputFormat=\"xml/rdf\"></c:processingDirectives><c:userDirectives c:allowDistribution=\"true\" c:allowSearch=\"true\" c:externalID=\"17cabs901\" c:submitter=\"ABC\"></c:userDirectives><c:externalMetadata></c:externalMetadata></c:params>"

calaisResults s = do
  key <- calaisKey
  let baseUrl = "http://api.opencalais.com/enlighten/calais.asmx/Enlighten?licenseID=" 
                ++ key ++ "&content=" ++ (escape s) ++ "&paramsXML=" 
                ++ (escape baseParams)
  ret <- simpleHTTP (getRequest baseUrl) >>= 
    fmap (take 10000) . getResponseBody 
  return $ map (\z -> splitOn ": " z) $
    filter (\x -> isInfixOf ": " x && length x < 40)
      (lines (replace "\r" "" ret))
  
main = do
  r <- calaisResults "Berlin Germany visited by George W. Bush to see IBM plant. Bush met with President Clinton. Bush said “felt it important to step it up”"
  print r

You need your OpenCalais developer key in the environment variable OPEN_CALAIS_KEY. The key is free and allows you to make 50K API calls a day (throttled to four per second).

I have been trying to learn Haskell for about four years, so if anyone has any useful critiques of these code examples, please speak up :-)

Monday, July 07, 2014

Setting up your own SPARQL endpoint for Freebase (with a Java client)

Originally published July 5, 2014

SINDICETECH has done a great job configuring the RDF data dumps from Freebase and making them available as preconfigured images for both Google Cloud and AWS. You can read their documentation here.

I used SINDICETECH's AMI for Amazon Web Services and getting an EC2 instance set up was very simple (about 15 minutes, including the startup time for Virtuoso). Good job SINDICETECH, and the people at Google and OPENLINK (who helped tune Virtuoso) for making this happen! The directions called for using a small EC2 instance but since I will likely only be running my instance occasionally (as needed) I chose a medium instance, hopefully to make my queries run faster.

If you are used to using Freebase data, the conversion to the rdf.freebase.com/ns/ namespace is easy enough to understand. The easiest way to start exploring the data is to use the SPARQL web interface (see a screen shot at the end of this post). You can also use SPARQL libraries for your favorite languages and programmatically hit the SPARQL endpoint that you have set up on AWS or the Google Cloud. The MID on Freebase that represents me is /m/0b6_g82 and the following Java code runs a SPARQL query matching that MID as the subject:

package kb;

import com.hp.hpl.jena.query.*;

public class FreebaseSparqlTest1 {
  public static void main(String[] args) {
    String sparqlQueryString1=
        "PREFIX fb: <http://rdf.freebase.com/ns/>\n" +
            "\n" +
            "SELECT * WHERE{\n" +
            "   fb:m.0b6_g82 ?p ?o .\n" +
            "}\n" +
            "LIMIT 20";

    Query query = QueryFactory.create(sparqlQueryString1);
    QueryExecution qexec =
      QueryExecutionFactory.sparqlService("http://YOUR_IP_ADDRESS_HERE/sparql", query);

    ResultSet results = qexec.execSelect();
    while (results.hasNext()) {
      QuerySolution sol = results.nextSolution();
      System.out.println(sol.get("p") + "\t" + sol.get("o"));
    }
    qexec.close() ;
  }
}
You should obviously replace YOUR_IP_ADDRESS_HERE with the IP address of the Google Cloud server or AWS EC2 instance that you have started.

The following screenshot shows the interactive SPARQL query web app:
screen shot of SPARQL web app

I have been using the DBPedia SPARQL endpoint a lot recently. The reliability of this endpoint has improved dramatically but I would still like to also run my own instance. I set up DBPedia on a large memory server for a customer a few years ago - not a difficult process but it takes time.

My experience converting my NLP library and demo web app from Clojure to Haskell

Originally published June 20, 2014

Several years ago I took ten years of part time natural language processing (NLP) hacks in Common Lisp, Scheme, and Java and converted a useful subset to Clojure. The resulting code base was much easier to work with both because Clojure is such a clean and practical language, and also because any large scale code cleanup removes technical debt and makes for easier development.

In the last month, in my spare time, I took my Clojure code and converted it to Haskell. Here is the demo web app. I have about 80% of the functionality of the NLP library converted (automatic summarization, entity detection, and categorization). The Haskell web app is very simple compared to the Clojure demo web app; I might improve it sometime, but for now I am more interested in improving the functionality and accuracy of the NLP library.

Even though I am still learning Haskell I am already finding it easier to work on the NLP library using Haskell. I find both the strong type system and the quick edit + compile + run REPL cycles with Emacs + Haskell to be a lot of fun and productive. On the other hand, Clojure + lib-noir + Compojure are hands-down my favorite tools for writing web applications. Haskell + Yesod are certainly nice to use, but I like the Clojure web stack better.

I have also been experimenting using Haskell for SPARQL client code using the HSparql library. Long term, I want to augment my Haskell NLP library to generate useful RDF from text in news articles.

Do your family and friends a favor and watch the movie "Fed Up" with them

Originally published June 18, 2014

Learn how the food industry in the USA is destroying the health of many people. The movie Fed Up will educate you on how to protect your family's health. I wish that everyone would watch it. Here is the Knowledge Graph description:

For the past 30 years, everything we thought we knew about food and exercise is dead wrong. FED UP is the film the food industry doesn't want you to see. From Katie Couric, Laurie David, and director Stephanie Soechtig, FED UP will change the way you eat forever.

It has been a long time since I have seen a documentary as useful and interesting as Fed Up.

More fun with OS X 10.10 Yosemite beta and Apple's new language Swift

Originally published June 4, 2014

I updated to OS X 10.10 Yosemite beta today. One improvement that is immediately noticeable is running Netflix movies in the new Safari browser. The video is smooth and the fan does not run, so I believe the news that Apple and Netflix have cooperated on improvements to use less CPU resources. So far the only problem I had was that the XCode 6.0 beta app is named XCode6beta and the links for accessing gcc and other command line tools were broken. I did the simple hack of renaming the beta app to the standard name and all is well.

If you watched the videos from the Apple Developers Conference then you have seen the new flat UI styling, like iOS. I like it well enough.

For me, the second biggest news from the developers conference is the improved integration between Apple devices. Apple, Google, and Microsoft all want users to live in their walled gardens. Except for my Samsung Android phone, which I love, I am all-in on Apple gear (except of course for a bunch of Linux boxes and servers).

Depending on how well Google and Apple compete with each other to convince me and other end users that they are doing what they can to support security and privacy, I might voluntarily go live in Apple's walled garden in the next year. That said, I have strong hopes that Apple, Google, Facebook, and Microsoft all step up to the plate and lock down their devices, software, and Internet access with strong security and privacy controls. May the company that does right by us, privacy and security wise, win!

As I wrote yesterday, the big news is the new language Swift. I have little experience writing Cocoa and Cocoa Touch applications, though I have worked through some example apps. My main near term interest in Swift is how good a language it is for AI applications, because I think interesting applications in the future will likely be hybrid mobile apps relying on services. Swift compiled code is supposed to be very efficient, so sharing responsibilities for calculations between device and server might make sense. There is room for different AI functionality on both the mobile side and the server side of apps. Initially I am experimenting with how Swift handles numeric calculations (neural networks) and text mining that requires text processing, maps (dictionaries), etc.

XCode playground example

This screen shot shows an XCode 6 beta interactive playground for trying bits of Swift code. Like Light Table and the Scala IDE workbenches, each line is evaluated with the results shown on the right side of the window.

Experimenting with Apple's new language Swift

Originally published June 3, 2014

To be honest, I have always considered developing for Apple's platforms to be less than a joyful experience. When the Mac first came out in 1984 I developed the AI application ExperOPS5, which did well financially but at the cost of a steep learning curve (there were many "telephone book like" Inside Mac manuals to be read, and the tooling was not great). I also developed a commercial Go playing program for the Apple II which was a lot of fun but painful because the UCSD Pascal development environment was so slow.

Anyway, enough of my complaining about the distant past! I was pleased to see Apple's announcement yesterday of a new programming language Swift that seems to have copied nice features from Rust and functional languages like Haskell. I did a quick skim of the free Swift book and downloaded the XCode 6 beta IDE that supports Swift.

Swift seems like a practical language. I like that there are no pointers except for the inout keyword in function arguments, which allows changing the values of passed-in variables - I am sure that none of us would want to do that! I do wish that the language was even more compact, but really, it seems pretty good.

The XCode 6 beta IDE supports "playgrounds" that are like the worksheets provided by the Scala IDE and by Light Table for Clojure and JavaScript. I have less than one hour of hacking experience with the XCode 6 beta IDE but I like it a lot so far. In fact I like Swift and the new XCode enough that I am sure to do a project in Swift in the near future.

Haskell experiments - NLP and Yesod web apps

Originally published May 30, 2014

I am still struggling a bit learning Haskell - so far this has been a multi-year process :-)

I converted some NLP code for tagging/categorization from Clojure to Haskell. I was surprised that this code ended up being only 62 lines, including white space. That said, there is still more code to port but I think the final library will be something like 100 lines of code. I like concise programming languages!

I also wrote a very simple Yesod based web app that uses the Haskell tagging/categorization module (41 lines of code). You can try it here if you want.

I still find coding in Haskell to be slow and painful, but I do have periods when I feel very productive, so I hope eventually to get as efficient as I am in Clojure or Ruby. Wish me luck :-)

Technical diversity

Originally published May 1, 2014

A few times in my technical career/life I have tried to concentrate on one language, or one stack when doing web apps. This usually does not work out for me, both because I have diverse interests and because customers want me to use a variety of tech.

I have several deployed web apps using Clojure (and Clojurescript for one), one using Meteor.js, and a few Ruby/Sinatra apps for customers.

I am playing with Haskell/Yesod and I really like the type checking right down to web assets but due to my relative inexperience with this stack it does not yet seem agile to me. I have experimented with Pharo Smalltalk and Seaside quite a bit over the years, and to some extent Seaside and Meteor.js fill the same ecological niche for me (rich client web apps).

For general programming I also can't settle on one language.

I think that I am going to give up trying to settle on using just one or two languages and live with the overhead of staying spun up on different technologies.

I am still trying to learn Haskell

Originally published April 14, 2014

I started studying Haskell again a few months ago and since I finished my new book last weekend, studying Haskell using the excellent learning resources at FPComplete has been my primary non-working geek activity.

In the last four or five years I have bought four Haskell books, made some effort to learn Haskell, had fun, and indirectly improved my Clojure and Scala chops - but Haskell has never really 'clicked' for me before.

FPComplete has several good tutorials on Haskell that are live web pages: you do the example exercises in the text by using the 'Open in IDE' links. Reading and then immediately trying to solve problems with what you just read is a great way to learn.

You can use large parts of FPComplete for free, but I signed up for their Personal Plan so I could use their online IDE for my own private projects. The people at FPComplete wrote the web apps for their IDE and for the interactive lessons using Haskell (of course!). I have tried nitrous.io's online IDE for Node.js, Ruby, and Python, which is also fairly nice, but it is not quite as effective as FPComplete. Google also has a web based IDE that is used internally (also very nice!) so there is evidence that web based IDEs have real traction.

While entering code in the FPComplete web based IDE it is really helpful to get constant feedback that the edited code is free from compiler errors; if there are errors then you get feedback for making corrections. Running code is fast, even though this relies on remote FPComplete servers. Github integration works well (as it also does for nitrous.io). I set up Emacs with haskell-mode years ago, and I also like it very much. I find myself using the FPComplete IDE for about half of my Haskell hacking and Emacs for the other half.

My new book "Build Intelligent Systems with JavaScript"

Originally published April 12, 2014

I have been working on this book since last summer. You can get more information here.

In some sense I wrote this book for my own requirements. In the last 20+ years I have written information processing systems in Common Lisp, Java, Clojure, and Ruby. Last year I became interested in JavaScript largely because of web frameworks like Express.js and Meteor.js. I then started hacking on some of the same types of problems that I used to use Lisp languages for, and enjoyed using JavaScript.

I started writing small snippets of code in JavaScript for accessing data stores, doing simple NLP, and general information processing. While I was motivated by my own projects I also thought that other people might find the material useful.

Java 8 and Clojure

Originally published April 5, 2014

I have to admit that I have often been lazy about converting Java code to Clojure. Several of my side projects have :java-source-paths set in my project.clj file and I just copy in the bits of old Java code that I am too lazy to convert. In some way I justify this because it is so easy to mix and match Clojure and Java code: the lein build process works well and I have never had any problems.

One thing that does not work, as far as I know, is mixing Java 8 lambdas with Clojure. That may happen sometime, but it is not a big deal for me. Mixing my favorite Java 8 addition (streams) into Clojure is also not so important to me since it would not bring much to the table for Clojure developers.

I am far from being a "Clojure purist" (or any other single language) but one thing that really strikes me after using Clojure for about one third of my development over the last 4 or 5 years is that it is such a practical language for getting stuff done.

Trying out the Google Cloud Platform

Originally published April 2, 2014

I watched the Google Cloud Platform webcast last week and a few days later I received a $500 credit that I need to use in the next three months. The side project I am working on right now is in Clojure. A few years ago I wrote a few small test web apps in Clojure for AppEngine but the loading request time (i.e., the time to serve a request for a cold instance) was several seconds - not so good. With the free credit, I am experimenting now with a Compute Engine instance to run the prototype Clojure web app, just running with "lein trampoline ..."

In the past several years I have experimented a lot with AppEngine. With Java (and using Objectify as a datastore API) loading request time was very quick (about a second) and I wrote a GWT application, hosted my knowledgebooks.com site, and wrote several Google Wave robots hosted on AppEngine. I don't much use Python but I did do one Python AppEngine app for a customer several years ago and that was a nice experience.

Compared to leasing a physical server or a plain old VPS, Google's and Amazon's offerings are expensive, even with recent discounts for Google Cloud Platform and Amazon AWS. For AWS the win is all of the ancillary services like S3, DynamoDB, CloudFront, RDS, etc. Google Cloud Platform is catching up with AWS in the range of services, and with similar pricing I think that the competition will come down to two things: developer happiness and support. I really like AWS and every major consulting customer (except for Google, and that makes sense :-) I have had in the last 6 years has at least partially used AWS. So, understanding that I love AWS, I can list some advantages of the Google Cloud Platform without you thinking that I am dis'ing on AWS:

  • The Google Cloud Platform web console seems to be faster and more responsive (but the command line tools for each service seem comparable).
  • I really like the online log viewing on Google Cloud Platform which can collect all logs for a project in one place - a big win.
  • Using Google Cloud Platform reminds me of working as a contractor at Google last year when I really enjoyed using their internal systems. Not exactly the same at all, but a pleasant reminder of using what has to be one of the best software development platforms (e.g., all source code to everything immediately accessible, working off of trunk, incredible tools for viewing logs and tracking down problems, etc.)
  • Immediate scaling, as needed

For developers competition between Amazon and Google (and other contenders like IBM) is a great thing. The only thing that I think is very important is keeping in mind the cost savings from leasing raw servers or commodity VPS (ignoring the higher costs of devops). At the other end of the cost spectrum, for some applications, going "gold carpet" for PaaS offerings like FP Complete (I am a happy customer) for Haskell development and deployment to managed servers or Heroku still makes a lot of sense - it just depends on cost/time tradeoffs.

Great joy in the world - Java 8 released

Originally published March 18, 2014

I have been using pre-release versions of Java 8 for a long time but it is great to have an official release. You can get it here.

I have found that lambdas and streams have been very useful features.
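
For example, here is the kind of small, declarative pipeline that lambdas and streams make pleasant (a toy example, of course):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamsExample {
  public static void main(String[] args) {
    List<String> languages =
        Arrays.asList("Java", "Clojure", "Haskell", "Ruby", "JavaScript");
    // Filter, transform, sort, and collect in one pipeline.
    List<String> shortNames = languages.stream()
        .filter(s -> s.length() <= 7)   // lambda as a predicate
        .map(String::toUpperCase)       // method reference
        .sorted()
        .collect(Collectors.toList());
    System.out.println(shortNames);     // [CLOJURE, HASKELL, JAVA, RUBY]
  }
}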

Many people, myself included, worried about the future of Java when Oracle bought the bones of Sun Microsystems. I must say now that I think Oracle has been doing a good job supporting Java.

I would like to convert most of my open source Java projects to Java 8, taking advantage of new features as appropriate. The problem is that I hate to break backwards compatibility for people who aren't going to update soon. I probably should not update the code examples for my Java books either.

I finally tried Google Glasses

Originally published February 15, 2014

I never had the chance to try Google Glasses while I consulted at Google last year but my friend Judy just got a pair and I tried them this morning. Judy took the following picture of me wearing her glasses:

Mark wearing Google Glasses

I was pleasantly surprised at how nice they were (after a couple of minutes getting them adjusted). I think that when Google finalizes the product and starts shipping them at a reduced price, the glasses will be popular for a variety of applications like:

  • Workers who need to keep their hands free
  • People with disabilities
  • Walking tours, supplying information about what a person is looking at
  • Hands free web search
  • Hands free photography and videography (but, the Google Glasses need an indicator light warning people when the camera is on!)
  • etc., etc.
I already "talk to my phone" to do web searches, get directions, etc., so talking to the glasses seems natural enough. I think that for general consumers that price point will make or break Google Glasses as a product. For professional applications, price is much less important, but I would bet the Google wants something more than a niche market. One issue I have is that both Android phones and iPhones are already so convenient with the Google Now app and Siri for voice control and getting information that the glasses really need to be inexpensive. It takes me a few seconds to take out my phone and ask Google Now a question - a few seconds saved with the glasses, but I only use Google Now to get information a few times a day so the glasses would not save me very much time.

Long term, wearable mobile devices will probably be very important to Google's business. This is just a guess, but I expect Google Now (in some form) to eventually become the primary way that users access Google's various services, and not only will Google Now be highly adaptable to individuals, but there will likely be different flavors for people's individual contexts, like what kind of job they have and whether they are at work or at play. The big question I have is what the monetization strategy will be. It is difficult to accept that advertisements will be appreciated by users in voice interactions with mobile devices. It is true that the data Google gets from services like Google Now, on-device GPS, etc. obviously helps target advertisements.

The other question I have (and only waiting a few years will answer it) is: how is Apple going to compete? As a very happy owner of a MacBook Air and an iPad, I love Apple gear, and Google Now services run great on my iPad (and I assume also on iPhones), so Google probably does not care too much whether people use iPhones or Android phones to access Google Now services. However, it seems to me that Apple is leaving a lot on the table if they only go after the hardware and operating system market. I find using Siri on my iPad a nice experience, but Apple does not have the same level of detailed information about me to compete with Google Now. It may be a bit of a stretch, but I wouldn't be surprised if Apple ends up owning the market of people who want to maintain more privacy while Google owns the market of people willing to let Google have every bit of information about them in order to make money and provide services. Part of me would rather be in the "privacy consumer market" if Apple steps up and builds an ecosystem that protects people's privacy, and they could do this because they can make money without knowing a lot of personal details about the people who use their gear. Google does not have that flexibility.

What a difference 3 years makes - Cassandra 2.x, CQL, Java, Clojure, and Node APIs

Originally published February 2, 2014

It has been a few years since I had a need for Cassandra (I now have a pilot project for which I would like a scalable data store). I grabbed the latest Cassandra source from GitHub and have been looking at some of the code and running it locally with the CQL shell. I know that CQL has been around for a while, but this is my first experience with it. Huge improvement. I really like it.

I am still trying to decide if I want to use Java (version 8), Clojure, or JavaScript, but the newest Java driver for Cassandra, the Clojure wrapper alia for this driver, and node-cassandra-cql all look very good.
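
To show the flavor of the new Java driver, here is a minimal sketch; it assumes a single-node Cassandra instance running on localhost, and the keyspace and table names are made up:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CqlTest {
  public static void main(String[] args) {
    // Connect to a local single-node Cassandra instance.
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect();
    session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = " +
        "{'class': 'SimpleStrategy', 'replication_factor': 1}");
    session.execute("CREATE TABLE IF NOT EXISTS demo.notes (id int PRIMARY KEY, body text)");
    session.execute("INSERT INTO demo.notes (id, body) VALUES (1, 'hello CQL')");
    // Iterate over the rows returned by a CQL SELECT.
    ResultSet rows = session.execute("SELECT id, body FROM demo.notes");
    for (Row row : rows) {
      System.out.println(row.getInt("id") + ": " + row.getString("body"));
    }
    cluster.close();
  }
}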

I give up some tunable degree of consistency (the "C" in "CAP") with Cassandra but I can easily live with that in my project. For now "developer friendliness" is really important to me since I am just experimenting with some ideas, but if I need to I would like to be able to scale without changing my code.

My Mom Elaine Watson died yesterday

Originally published January 29, 2014

My Mom was a fun person to be with. My very fondest memories of time with her are many hundreds of days spent boating together and also our family dinners.

My Mom was a gourmet cook and when I was growing up she would make incredible dinners for my Dad Ken, my older brother Ron, and me.

Love you Mom!

Here is a picture of my Mom and Dad taken in 1955 when I was four years old:

Mom and Dad

Java and Clojure examples for reading the new WARC Common Crawl files

Originally published January 26, 2014

I just added a Clojure example to my Common Crawl repo. This Clojure example assumes that you have locally copied a crawl segment file to your laptop. In the next week I will add another Clojure example that pulls segment files from S3.

There are two Java examples in the repo for reading local segment files and from S3.

Less Java and Clojure, more JavaScript

Originally published January 21, 2014

... and, almost no Ruby.

I have always been a polyglot (a programmer who uses many programming languages). Since 1974, I have been paid to design and write software in easily over a dozen languages. I still really like (love!!) Clojure as a language, and Java is very practical for many applications because of the JVM's efficiency and the wonderful Java library and framework ecosystem. I don't use Ruby very much anymore except for small utilities.

Why JavaScript? The primary reason is that I am writing a JavaScript book ("Build Intelligent Systems with JavaScript") and the secondary reason is that most of my development efforts are centering around Meteor right now. The more I use Meteor, the more impressed I am. I am not yet a highly skilled JavaScript programmer (I use a small subset of the language, and rely on IntelliJ with JSLint always running to keep me mostly out of trouble) but JavaScript is a good-enough language and I find languages that compile to JavaScript, like ClojureScript and CoffeeScript, to be a little more trouble than they are worth (although I read and digest every article I see on ClojureScript on Hacker News). I have written the same sort of web app in both Clojure + ClojureScript and in JavaScript + Meteor and I find the Meteor experience to be more effective.

I am very pleased to be helping the Common Crawl Organization

Originally published January 13, 2014

I am setting aside some of my time to volunteer with CommonCrawl.org.

Much of the information in the world is now digitized and on the web. Search engines allow people to have a tiny view of the web, sort of like shining a low powered flashlight around in the forest at night. The Common Crawl provides the data from billions of web sites as compressed web archive files in Amazon S3 storage and thus allows individuals and organizations to inexpensively access much of the web for whatever information they need - like turning the lights on :-)

The crawl is now in a different file format (WARC). My first project is working on programming examples and how-to material for using this new format.

Happy New Year

Originally published January 3, 2014

I wanted to wish everyone a happy new year!

Carol and I are now (fairly) permanently back living in our home in Sedona Arizona except for three planned trips to visit family. Working at Google in Mountain View last fall for four months was a fun and very interesting experience but it is good to be back home. Even nice hotels get tiring after a while.

I usually make New Year's resolutions, but this year there is not much to change. I would like to get back to a (mostly) vegetarian diet. We have also been drastically reducing the amount of gluten (and wheat in general) in our diets. I also intend to get back to hiking 10 to 15 hours a week like I used to.

When it comes to work and continual learning, I don't really have any specific New Year's resolutions. I am getting back into Clojure and ClojureScript (and looking at the Reactive libraries). I am also still interested in how the Java 8 rollout will go. Except for working on code examples for a book project I am not doing too much work in JavaScript right now. I also think that I will continue my trend of more often using Clojure instead of Ruby for general programming.

Practical Internet Privacy and Security

Originally published December 23, 2013

I find some conflict between my desire to take advantage of services from Google, Twitter, Facebook, and some other web properties and my desire to maintain some reasonable amount of privacy. To be sure, these services are not really free: these companies make money from information about us. This blog post contains my own practical approach to using these services.

The multiple web browser approach

I use three web browsers on a regular basis:

  • Chrome - I run Chrome with default privacy settings and only use Chrome for accessing Google, Twitter and Facebook services. These companies are very capable of tracking activities on the web. I consciously allow them to track me while using their services.
  • Firefox - I use Firefox for all of my other web browsing. I run Firefox in a very secure privacy mode, using the Ghostery plugin to block tracking. Under the Firefox preferences, I set the maximum security settings.
  • Opera - I only use Opera for my online banking. I am not sure, but it seems logical that using a separate web browser for just online banking makes this more secure.
When I am doing my usual web browsing with Firefox, I do not click "Likes" or Google "+" links. If I really like what someone has written then I will email them and engage them directly in a real conversation.

Using web services as a paying customer

I am a very happy customer of Evernote and Dropbox. The $150 per year total that I pay them for their services is well worth it. I am not going to discuss their services here, but rather why I feel comfortable using them:

  • Encryption that I can control - I back up sensitive files on Dropbox using a bash shell script that ZIPs and GPG encrypts bits of sensitive information that I want to both back up and share between computers that I use. When it runs, this shell script creates about five encrypted files and pushes them to Dropbox. I am a programmer so I am almost always working in a shell window, and it takes about 5 seconds to run the bash script. So sensitive data never gets sent to Dropbox unencrypted. The Evernote service also allows local encryption of any part of any note: just select whatever is sensitive and use the menu "Edit -> Encrypt Selected Text..."
  • Privacy policies - I feel comfortable with the privacy and terms of use policies of both Dropbox and Evernote.
  • Paying customer - Since I am a paying customer I can understand how these companies make money and "keep the lights on."

Own your own web properties

A little off topic from Internet privacy, but my other advice to friends and family members is to own your own content on the web. That is, stake out your property on the web under a domain that you own, using web hosting that you both control and can change anytime you want.

Have a good idea that you want to share? Then write about it on your web site or on your blog that you control hosted under your own domain. When I write something that I want to share, I put it on my own web property, and use the Chrome web browser to access Google+, Facebook, and Twitter to link to whatever I wrote. This just makes good sense to me: own and control your stuff.

Trying to find a Java equivalent for Clojure + Compojure + Noir

Originally published November 17, 2013

For a few years my most used stack for writing web applications has been Clojure + Compojure + Noir, usually using Bootstrap, with some experiments using Ember.js and Meteor. After playing a lot with Java 8 (with lambdas, streams, etc.) I am considering once again using Java as one of my go-to languages for new projects.

Even with the new language features, Java 8 does not offer quite the fun developer experience of Clojure, JRuby, and Scala, but Java has a few advantages: lots of available developers, great and mature tooling, and a wealth of libraries.

I looked hard at Scala and Scalatra but there are rough edges to Scala (like sbt!) that I don't particularly like. Clojure is close to the efficiency of Java (about a factor of 2, in both space and time) but not as efficient. I love coding in Ruby, but the runtime performance keeps me from using Ruby except for small scripts for text processing, quick Sinatra web apps, etc.

I have experimented this weekend with Java 8 + Spark (which has Sinatra-style routing) + Freemarker templating + Bootstrap. I don't have the warm feeling with this combination that I do for Clojure + Compojure + Noir and for Ruby + Sinatra. I need to try a small project to kick the tires, but I would really like to give Java 8 a shot at recapturing my "developer's mindshare."
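
For reference, a Sinatra-style route in Spark looks roughly like this (a sketch - check the Spark documentation for the exact signatures in the version you use):

import static spark.Spark.get;

public class HelloSpark {
  public static void main(String[] args) {
    // One Sinatra-style route: GET /hello returns a plain string.
    get("/hello", (request, response) -> "Hello from Java 8 + Spark");
  }
}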

I am immersed in graph databases

Originally published November 2, 2013

I am old school: most of my data storage requirements used to be well met with Postgres. Now most of what I work with is graph databases of one form or another.

This morning I was in my office at Google working on a Knowledge Graph schema (yeah, I know it is Saturday, so it goes...) and now that I am back "home" (*) this afternoon, I am working on a technical review for the book Neo4j In Action - Neo4j is a very fine open source (and commercial) graph database. Last night I was reviewing third party developer documentation for Facebook's Open Graph, Wolfram Alpha, and the still publicly available Freebase APIs. This is all for my "after Google" project that I have planned for later next year, if my plans don't change.

I don't usually put long quotes from other people in my blog, but I saw something early this morning that really resonated with me: Darin Stewart's blog at Gartner discussing Google's Knowledge Graph and their business:

With a few tools, some semantic know-how and a bit of elbow grease, you could create your own knowledge graph that integrated these public sources with your own internal, proprietary data. The biotech and intelligence industries have been doing it for years.

I think he hits on a huge new area of semantic web and linked data business, one that follows a natural evolutionary progression in IT: businesses with siloed and separate IT systems --> some use of master data systems to tie disparate business units together --> complete integration of internal systems with external public data sources. Ideally this should be done using semantic web techniques like ontologies and controlled vocabularies for all sources of data, supported by standardized (for a specific business or organization) UIs so users don't suffer from cognitive disconnect as they use multi-sourced data.

(*) In some sense of "home": my wife and I did not want to completely move to Mountain View for my current consulting gig so we are staying long term in a suite at a Residence Inn - which I recommend, BTW, if you need to live away from home for a long while but don't feel like moving.

Great talk at Google today by Reddit co-founder Alexis Ohanian

Originally published October 17, 2013

Alexis Ohanian's new book Without Their Permission was just released yesterday. Copies were on sale and I bought one :-) When Alexis signed the book I bought, I joked about my slight disappointment that Reddit switched away from their initial Common Lisp implementation - thus his comment in the inscription. Nice Reddit Alien drawing.

signed book

Alexis talked about freedom of the Internet, and his charitable work. He also had great advice on entrepreneurship. The fellow at Google who interviewed Alexis (I didn't catch his name) did a fine job. Lots of fun!

As long as I am posting pictures, here is a picture of the talk and of my lunch earlier in the day:

photo of the talk

photo of my lunch

Experience Working at Google

Originally published October 8, 2013

I haven't blogged in a while because I have been busy getting used to working as a contractor at Google. My wife Carol and I have also been busy getting settled living in Silicon Valley and exploring the area. I have somewhat reluctantly put my new book project Build Intelligent Systems with JavaScript on hold. You may wonder about my writing a book using JavaScript examples when I am so enthusiastic about using Clojure, ClojureScript, and Ruby. I decided that the use of JavaScript is so pervasive that it would be useful to write a good reference and source of examples using JavaScript for knowledge representation, AI, semantic web, graph databases (and other NoSQL data stores), relational databases, and general information gathering and reuse. I enjoy writing JavaScript code while working in IntelliJ with JSLint turned on to flag errors and style problems.

I think that I will mostly be using Java for development at Google. The training classes and learning materials are excellent. I feel like I am a very tiny cog in a very, very large machine, and as I told my family, so far I am a happy little cog.

The stories about the food at Google are all true: it is of exceptionally high quality, tasty and healthy. There are many restaurants/cafes on the campus. So far, I have just been eating at four of them near where I work.

My team at Google is hiring for a Semantic Web/Linked Data position

Originally published October 8, 2013

Here is the job posting; please read it if you have a background in the Semantic Web and Linked Data. We have an interesting project and Google is a fun place to work.

JVM language shootout

Originally published October 8, 2013

I played with Java 8 over the weekend using IntelliJ, which has editing support for lambdas and other Java 8 features. I think that Java 8 will be a very nice update for the language when it becomes widely used. Will the new features be enough to keep from losing developer mind share to other JVM languages like Scala, JRuby, and Clojure? I really like Scala but the language has a steep learning curve and the compiler is a little slow. Ruby is a great language and the JRuby implementation is very well done and a pleasure to use, but there is a real performance penalty. Clojure is a lot of fun to use but I don't think it will ever be very widely used even though the development tools and available libraries keep getting better - it is a great language for those of us who love Lisp languages. Other languages like Kotlin are also worth keeping an eye on. Kotlin and Clojure get extra points for also compiling to JavaScript.

I was joking about the 'shootout' title of this blog post. You should use a programming language that you enjoy programming in and that meets the requirements of your projects :-)

Working at Google as a consultant

Originally published September 4, 2013

I accepted a contracting gig at Google a while back and, after a vacation with my family, I started this week.

I don't have too much to say except that the people are nice and Google is a very well run company.

Free Software Foundation (FSF) is even more relevant now than ever before

Originally published August 11, 2013

I have been supporting the FSF since before it was created as an organization in 1985 - I bought tapes with GNU software and the Emacs book from Richard Stallman in the late 1970s. Even though I wrote and sold commercial software products for the Xerox Lisp Machine, Macintosh, and Windows in the decade from 1982 to 1991, since the mid-1990s my work and business have been largely centered around using and writing free software and open source software.

I just listened to a recent talk by Richard Stallman (he starts at time point 1 hour, 41 minutes into the video). I think that Richard really gets it in many ways. I understand if some people don't agree with his more extreme views of freedom and privacy, but I try to follow his suggestions (and that is what he does, offer suggestions) as much as I can within the limits of earning a living (e.g., many of my consulting customers do not support development using free software or open source licenses). In case you don't have time to listen to him, here are some of the topics in his talk:

  • We need fair governments to protect people from the excesses of powerful corporations and individuals. Unfortunately, most governments largely serve the interests of the rich and powerful, something that people need to push back against. Effective law enforcement is absolutely necessary and it should be done under the rule of law with proper court orders.
  • People should be anonymous in their purchases. As much as possible avoid being tracked by online sellers like Amazon (my wife and I live in the mountains in Central Arizona, so unfortunately we rely on ordering things from Amazon and other online sellers).
  • As usual, he talked briefly about the four software freedoms supported by GNU software licenses.
  • With the recent revelations about NSA "collect everything they can" operations (and similar activities by and with other governments) having free and open software systems with documented hardware is more important and relevant than ever. He makes the point that with free software we can at least try to preserve our rights.
  • There are good and bad ways to use the Internet. Good: maintaining control of your own data, using your own software on your own computer and using the Internet infrastructure carefully to preserve your privacy and your rights. Bad: providing a lot of personal information; allowing your activities to be tracked unnecessarily.
I do sometimes use the (often very useful) services provided by Google, Facebook, Twitter, Amazon, and Apple but I try to be aware of what information that I am providing. I like to use one web browser for accessing these services (e.g., logging on to Facebook to see what family members are doing, and selectively using Google services) and a separate web browser with locked down privacy controls for everything else. Using two different web browsers really helps me keep straight whether I am temporarily using useful services and sharing some personal data as a trade for those services, or, I am in a "privacy and freedom safe" mode.

We all need to make our own informed decisions on how we use technology, and different people have different needs and concerns. I do enjoy talking with less tech savvy friends and family about these issues, not so much lecturing, but just so they know what their options are. My wish-list for technology that preserves freedoms is:

  • Use of free software on my computers. This is possible since GNU/Linux provides all of the tools that I need for writing and software development.
  • Privacy of data stored as backup and for convenience "in the cloud." This is also possible and very easy to do. For customer and my own proprietary information and data, I have backup scripts that ZIP and encrypt backups for automatic storage on DropBox, SpiderOak, S3, etc. Encryption is an absolute requirement for securely doing business on the Internet. As much as we can, we should all try to encrypt as much of our private data, emails, etc. as possible. It is a good habit to have. Please consider teaching less tech savvy friends and family members how to use encryption.
  • Secure and private communication. This I do not have. My cellphone company Verizon does (I believe) pass on all of my voice calls, text messaging, and Internet connections to my government (against my wishes!). In a similar way, all of our Internet traffic is stored for possible future analysis. Again, I am not happy about this since I believe in the proper rule of law: it is legally correct to get individual court orders to gather data for criminal prosecutions. No court order should mean no widespread data collection. This seems like a no-brainer requirement for a free and open society, but the news media and most elected officials do too good a job of fooling people into accepting giving up freedom for incorrectly perceived extra safety. I have never received a warm response from my elected Congressional representatives when I bring up this subject in correspondence. The defense industry and the "forever war on terrorism" is BIG BUSINESS and has disproportionate influence on our elected representatives; unfortunately, this is not going to change - a case of wealthy and powerful special interests getting their own way.
  • Privacy-supporting web services for search and social media. I am probably in the minority in this, but I would prefer paying a monthly fee for services and receive a strong guarantee of privacy except, of course, in the case of a proper and legal court order in the pursuit of specific investigations that is fully compliant with our rights under the US Constitution and the Bill of Rights.

Using OpenCyc RDF/OWL data in StarDog

Originally published August 7, 2013

Over the years I have used the OpenCyc runtime system to explore and experiment with the OpenCyc data, which consists of 239K terms and 2 million triples. In order to test the ease of use of the RDF/OWL OpenCyc data I tried loading the OpenCyc OWL file into StarDog:

$ ./stardog-admin server start
$ ./stardog-admin db create -n opencyc ~/Downloads/opencyc-2012-05-10-readable.owl
Bulk loading data to new database.
Loading data completed...Loaded 3,131,677 triples in 00:00:47 @ 65.8K triples/sec.
Successfully created database 'opencyc'.

After you load the data you can experiment with the StarDog command line SPARQL interface. You use the interface to enter SPARQL queries one at a time:

./stardog query opencyc "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
Here are some SPARQL queries to get you started using the command line interface (I am only showing the SPARQL queries):
SELECT ?s ?p ?o WHERE { ?s ?p ?o FILTER(REGEX(?o, "Clinton")) } LIMIT 30
SELECT ?p ?o WHERE { :HillaryClinton ?p ?o } LIMIT 30
SELECT ?o WHERE { :HillaryClinton :wikipediaArticleURL ?o }
Notice that OpenCyc terms like :HillaryClinton are in the default namespace. The results for this last query are:
+-------------------------------------------------------+
|                           o                           |
+-------------------------------------------------------+
| "http://en.wikipedia.org/wiki/Hillary_Rodham_Clinton" |
+-------------------------------------------------------+
You can easily convert Wikipedia URLs to DBPedia URIs; for example, this URL as a DBPedia URI would be <http://dbpedia.org/data/Hillary_Rodham_Clinton> and using the DBPedia live SPARQL query web app you might want to try the SPARQL query:
SELECT ?p ?o WHERE { dbpedia:Hillary_Rodham_Clinton ?p ?o } LIMIT 200
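
The conversion itself is just string surgery on the page title. Here is a tiny Java sketch that maps an English Wikipedia page URL to the corresponding URI in DBPedia's resource namespace (ignoring percent-encoding corner cases):

public class WikipediaToDbpedia {
  // Map an English Wikipedia page URL to a DBPedia resource URI.
  static String toDbpediaUri(String wikipediaUrl) {
    String title = wikipediaUrl.substring(wikipediaUrl.lastIndexOf('/') + 1);
    return "http://dbpedia.org/resource/" + title;
  }

  public static void main(String[] args) {
    System.out.println(
        toDbpediaUri("http://en.wikipedia.org/wiki/Hillary_Rodham_Clinton"));
    // prints: http://dbpedia.org/resource/Hillary_Rodham_Clinton
  }
}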
Some RDF repositories support the new SPARQL 1.1 feature of specifying additional SPARQL SERVICE end points so queries can combine triples from different services. Bob DuCharme covers this in his book "Learning SPARQL" at the end of Chapter 3. Without using multiple SPARQL SERVICE end points you can still combine data from multiple services on the client side; for example: combine the results of multiple queries from a local StarDog or Sesame server with results from the remote DBPedia endpoint.
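
Client-side combination can be as simple as running queries against both endpoints with Jena and merging the results in your own code. A minimal sketch (the local StarDog endpoint URL and database name are placeholders; adjust for your setup):

import com.hp.hpl.jena.query.*;

public class ClientSideCombine {
  // Run a SELECT query against one endpoint and print the ?s bindings.
  static void runQuery(String endpoint, String sparql) {
    Query query = QueryFactory.create(sparql);
    QueryExecution qexec = QueryExecutionFactory.sparqlService(endpoint, query);
    try {
      ResultSet results = qexec.execSelect();
      while (results.hasNext()) {
        QuerySolution sol = results.nextSolution();
        System.out.println(endpoint + "\t" + sol.get("s"));
      }
    } finally {
      qexec.close();
    }
  }

  public static void main(String[] args) {
    String sparql = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5";
    runQuery("http://localhost:5820/opencyc/query", sparql);  // local StarDog
    runQuery("http://dbpedia.org/sparql", sparql);            // remote DBPedia
  }
}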

Python SPARQL client example

Originally published August 5, 2013

I wrote about using linked data sources like DBPedia yesterday but I should have included some example code for you to play with. I will get you started with these directions:

Start by installing three packages:

sudo easy_install simplejson
sudo easy_install RDFLib
sudo easy_install SPARQLWrapper
If you use the faceted search browser for DBPedia and search for "Berlin" you will see a likely URI, dbpedia:Berlin_Berlin, to start with. The following code finds all triples that have dbpedia:Berlin_Berlin as an object and displays the subjects and predicates. Then, for all of the subjects in these triples, I search again for all triples with these subjects:
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbpedia: <http://dbpedia.org/resource/>
    SELECT ?s ?p
    WHERE { ?s ?p dbpedia:Berlin_Berlin } LIMIT 5
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

print("subject and predicate of all triples with object == dbpedia:Berlin_Berlin")

for result in results["results"]["bindings"]:
    print("s=" + result["s"]["value"] + "\tp=" + result["p"]["value"])

print("loop over all subjects returned in first query:")
for result in results["results"]["bindings"]:
    sparql.setQuery("""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX dbpedia: <http://dbpedia.org/resource/>
        SELECT ?p ?o
        WHERE { <""" + result["s"]["value"] +
          """> ?p ?o } LIMIT 10""")
    results2 = sparql.query().convert()
    for result2 in results2["results"]["bindings"]:
        print("p=" + result2["p"]["value"] + "\to=" + result2["o"]["value"])
This is really a simple (and contrived) example that you can experiment with in just a few minutes, but it shows the mechanics of "spidering" linked data. You will need to learn the SPARQL query language for any real application. You can find many good SPARQL tutorials on the web, or you can grab a free PDF of either the Java or Common Lisp edition of my book "Practical Semantic Web and Linked Data Applications" on my books web page.

I like to use DBPedia's snorql web app to experiment with SPARQL queries, get the queries working, then implement them in Python or Ruby scripts.

Friday, August 02, 2013

Semantic web and linked data are a form of agile development

I am working on a book on building intelligent systems in JavaScript (1) and I just wrote part of the introduction to the section on semantic web technologies:

In order to understand and process data we must understand the context in which it was created and used. We looked at document oriented data storage in the last chapter. To a large degree documents are an easier source of knowledge to use because they contain some of their own context, especially if they use a specific data schema. In general data items are smaller than documents and some external context is required to make sense of different data sources. The semantic web and linked data technologies provide a way to specify a context for understanding what is in data sources and the associations between data in different sources.

The thing that I like best about semantic web technologies is the support for exploiting data that was originally developed by small teams for specific projects. I view the process as a bottom up process with no need to plan complex schemas and plans for future use of data. Projects can start by solving a specific problem and then the usefulness of data can increase with future reuse. Good software developers learn early in their careers that design and implementation need to be done incrementally. As we develop systems, we get a better understanding of how our systems will be used and how to build them. I believe that this agile software development philosophy can be extended to data science: semantic web and linked data technologies facilitate agile development, allowing us to learn and modify our plans when building data sets and systems that use them.

(1) I prefer other languages like Clojure, Ruby, and Common Lisp, but JavaScript is a practical language for both server and client side development. I use a small subset of JavaScript and have JSLint running in the background while I work: “good enough” with the advantages of ubiquity and good run time performance.

3rd edition of my book just released: “Loving Common Lisp, or the Savvy Programmer’s Secret Weapon”

"Loving Common Lisp, or the Savvy Programmer’s Secret Weapon"

The github repo for the code is here.

Enjoy!

Easy setup for A/B Testing with nginx, Clojure + Compojure

Actually, I figured out the following directions for my Clojure + Compojure web apps, but as long as you are using nginx, this would work for Node.js, Rails, Sinatra, etc.

The first thing you need to do is to make two copies of whatever web app you want to A/B test, and get two Google Analytics account tokens (the _uacct strings beginning with "UA-"), one for each version. I usually use Hiccup, but for the Google Analytics JavaScript code I just add it as a string to the common layout file header, like this (reformatted to fit this page width by adding line breaks):
(html5
    [:head [:title "..."]
     ;; the empty strings below hold raw HTML strings, including the
     ;; Google Analytics <script> snippet with this variant's UA- token
     ""
     (include-css "/css/bootstrap.css")
     (include-css "/css/mark.css")
     (include-css "/css/bootstrap-responsive.css")
     ""
     ""
     ]
The next step is to configure nginx to split requests (hopefully equally!) between the two instances of your web app. In the following example I am assuming that I am running the A test on port 6070 and the B test on port 6072. I modified my nginx.conf file to look like:
upstream backend {
    ip_hash;
    server   127.0.0.1:6070;
    server   127.0.0.1:6072;
  }

  server {
     listen 80;
     server_name  DOMAINNAME.com www.DOMAINNAME.com;
     location / {
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://backend;
    }
    error_page 500 502 503 504  /error.html;
    location = /error.html {
        root  /etc/nginx;
    }
  }
The ip_hash directive assigns each requesting IP address to a fixed backend, which should split traffic roughly evenly. This means that if a user hits your web app from their home and then again at their local coffee shop they might see both the A and B versions of your web app. Other options would be to use a per-device cookie, etc., but I think that assigning version A or B based on a hash of the requesting IP address is sufficient for my needs.
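
As a toy illustration of the idea (this is not nginx's actual hash function), here is a minimal Python sketch showing why hashing the requesting IP address gives each visitor a stable A or B assignment:

# Toy sketch of IP-hash routing: a given client IP always maps to the
# same backend, so each visitor consistently sees variant A or B.
import hashlib

BACKENDS = ["127.0.0.1:6070", "127.0.0.1:6072"]  # the A and B instances

def pick_backend(client_ip):
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return BACKENDS[int(digest, 16) % len(BACKENDS)]

print(pick_backend("203.0.113.7"))   # always the same backend for this IP
print(pick_backend("198.51.100.2"))  # also stable, possibly the other backend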

I am just starting to use this scheme for A/B Testing, but it seems to work as expected. I do suggest that when you clone your web app you keep versions A and B identical for a few days and check the Google Analytics for both account tokens to make sure the statistics for page views, time on page, etc. are close to the same for A and B.

After more testing, Google Analytics shows that the nginx ip_hash directive splits traffic almost perfectly 50/50 between the A and B versions of my web site.

The 4th edition of my book “Practical Artificial Intelligence Programming with Java” is now available

Buy a copy at Leanpub!

The recommended price is $6 and the minimum price is $3. This includes PDF, Kindle, and iPad/iPhone formats and free updates as I fix any errors and update the book with new material. You may want to look at the github repository for the book example code before purchasing this book to make sure that the material covered will be of interest to you. I will probably update the book in a few weeks after getting feedback from early readers. I am also still working on Clojure and JRuby wrappers for the Java code examples, and as I update the code I will frequently push changes to the github repository for the example code.

My version of Ember.js ‘Get Excited Video’ code with Sinatra based service

The Ember.js Get Excited Video has source code for the example in the video but the test data is local on the client side. I forked this project and added a Ruby Sinatra REST service. My version is here. This uses Ember.js version 1.0 RC4. It took me some time to get the example code working with a REST service so I hope my forked example will save you some time and effort.

Rest service and client in DART

The DART language looks like a promising way to write rich clients in a high level language. I have been looking at DART and the Ruby to JavaScript compiler Opal as possible (but not likely) substitutes for Clojurescript with Clojure back ends. It took me a little while to get a simple REST service and client working in development mode inside the DART IDE. The following code snippets might save you some time. Here is a simple service that returns some JSON data:
import 'dart:io';
import 'dart:json' as JSON;

main() {
  var port = 8080;
  HttpServer.bind('localhost', port).then((HttpServer server) {
    print('Server started on port: ${port}');
    server.listen((HttpRequest request) {
      var resp = JSON.stringify({
        'name': 'Mark',
        'hobby': 'hiking'}
      );
      // set response headers: return JSON and allow cross-origin
      // requests from the client during development
      request.response.headers.set(HttpHeaders.CONTENT_TYPE,
                                   'application/json');
      request.response.headers.set('Access-Control-Allow-Origin', '*');
      // write the JSON payload and close the response
      request.response..write(resp)..close();
    });
  });
}
Setting the Access-Control-Allow-Origin header is required so that the client, which is served from a different origin during development, can call the service. Here is the client code (I am not showing the HTML stub that loads the client):
import 'dart:html';
import 'dart:json';

void main() {
  // call the web server asynchronously
  var request = HttpRequest.getString("http://localhost:8080/")
                           .then(onDataLoaded);
}

void onDataLoaded(String responseText) {
  var jsonString = responseText;
  print(jsonString);
  Map map = parse(jsonString);
  var name = map["name"];
  var hobby = map["hobby"];
  query("#sample_text_id").text =
      "My name is $name and my hobby is $hobby";
}
The call to query(…) is similar to a jQuery call. As you might guess, “#sample_text_id” refers to a DOM element from the HTML page with this ID. DART on the client side seems to be very well supported both with components and tooling. I think that DART on the server side is still a work in progress but looks very promising.

Thursday, March 14, 2013

Small example app using Ember.js and Node.js

I have been playing with Ember.js, and generally trying to get a little better at programming in JavaScript, a language I have used for years but for which I am still a novice. I wrote the other day about Small example app using Ember.js and Clojure + Compojure + Noir and I thought I would try replacing the simple Clojure REST backend with an equally simple Node.js backend. The results of this simple exercise are in the github repo emberjs-nodejs. I leave it to you to take a look if you are interested.

I will say that development with JavaScript, Ember.js, and Node.js seems very lightweight and agile, even though I use IntelliJ for editing and project management. Starting an app takes maybe a second. Compared to, for example, Java + GWT, or even Clojure + Compojure, I find JavaScript, Ember.js, and Node.js to be a light and fun combination. It would be even more fun if I were a better JavaScript programmer :-)

Tuesday, March 12, 2013

More Clojure deployment options

I have been very happy running multiple Clojure web apps with the embedded Jetty server using lein trampoline on a large VPS. I start each app on a different port and use nginx to map each app to its own domain name. This is easy, and it lets me adjust the JVM memory individually for each application. This works so well for me that I almost feel guilty trying alternatives :-)

I don't know if I will permanently change my deployment strategy, but I am experimenting with Immutant, which is a Clojure deployment platform built on top of JBoss AS 7. After installing the lein Immutant plugin and the latest version of Immutant, you can run JBoss AS/Immutant using lein, and separately deploy and un-deploy web applications using lein. Pretty slick, but I am still trying to get a grip on interactive development by connecting nREPL (documentation: Interactive development). My usual style of interactive development is pretty nice: I use IntelliJ to edit, and keep the web app running with lein run, with live loading of changes to Clojure code (including Hiccup, which is my favorite "markup"), CSS, etc. I am going to give Immutant with nREPL interactive development a long and patient try, though I am not sure what I will be using a month from now: probably not Immutant, because I prefer stacks that are less complicated. My days of loving heavyweight J2EE apps are over :-)

BTW, I would bet that many of us have suffered a little using a do-it-yourself custom cartridge on Red Hat's OpenShift PaaS. It works, but at least for me, it was painful even with a lot of useful material on the web describing how to do it. This Immutant based example looks more promising.

I have used Heroku in the past for Clojure web apps but since I usually have 3 or 4 distinct web apps deployed the cost of about $35 each is much more than putting them all on a large memory multiple core VPS. I very much wish that Heroku had a slightly less expensive paid 1 dyno plan that never gets swapped out when idle, causing a loading request delay.

I haven't yet tried Jelastic hosting. Their minimum paid plan with 128MB RAM (that I assume would always be active, so no loading request delays) is about $15/month which sounds fair enough. They deserve a try in the near future :-)

Another option that I have used, both for one of my Clojure web apps and for a low traffic web app for a customer, is to pre-pay for a micro EC2 instance for a monthly cost of about $5. My only problem with this is that EC2 micro instances are not so reliable, and I feel like I am better off with a quality VPS hosting company. BTW, if you run a Clojure web app using the embedded Jetty server on a micro EC2, be sure to run it using "nice" to lower its priority and avoid AWS drastically throttling your resources because you used too much CPU time for too long a period; I see much better sustained performance using "nice" - go figure that one out!

One AWS option I haven't tried yet is using lein-beanstalk, which looks good from the documentation. Elastic Load Balancing on AWS costs about $18/month, which drives up the cost of an Elastic Beanstalk deployment of a low traffic web app, but I think it does offer resilience to failing EC2s. You are limited to one web app per Elastic Beanstalk deployment, so this is really only a good option for a high traffic app.

A few years ago I also used appengine-magic for hosting on GAE, but it bothers me that apps are not easily portable to other deployment platforms, especially converting datastore code. This is too bad because when Google raised prices and brought AppEngine out of beta, that made it more attractive to me, even with some developers complaining of large cost increases. Still, given my desire for robust and inexpensive hosting for low or medium traffic web sites, AppEngine is in the running: simply set the minimum number of idle instances to 1 and the maximum number of instances to 1, which should handle modest traffic for about $10/month with (hopefully) no loading request delays.

Edit: someone at my VPS hosting company (RimuHosting) just told me that as long as I set up all my apps, nginx, etc. to automatically start when a system is rebooted, then I probably have no worries: on any hardware problems they restart from a backup on a new physical server. I do occasionally try a soft-reboot of my VPS just to make sure everything gets started properly. I thought that RimuHosting would do this, but I asked to make sure.

Edit: the New York Times has a good article on the big business of offering cloud services that is worth reading: http://www.nytimes.com/2013/03/13/technology/google-takes-on-amazon-and-microsoft-for-cloud-computing-services.html

Edit: 2013-03-14: Jim Crossley set up a demo project that exercises the common services for Immutant/JBoss, and wraps project source and resource files for live loading of changes to Clojure code and resources: Immutant demo.

Monday, March 11, 2013

Small example app using Ember.js and Clojure + Compojure + Noir

I use Clojure with Compojure and Noir for (almost) all of my web apps and lately I have also been experimenting with Ember.js. After buying and reading Marc Bodmer's book Instant Ember.js Application Development How-to yesterday I decided to make a very small template application using Ember.js for the UI and a trivial back end REST service written in Clojure. I used Marc's Ember.js setup and it worked well for me.

The github repo for my small template project is emberjs-clj

Please note that this example is a trivial Ember.js application (about 50 lines of code) and is intended just to show how to make a REST call from an Ember.js front end app, how to implement the REST service in Clojure, and not much else. I wanted a copy and paste type template project to use for starting "real projects."

You can grab the repo from github, or if you just want to see the interface between the UI and back end service, here is the code run by the Javascript UI:

RecipeTracker.GetRecipeItems = function() {
  $.ajax({
    url: '/recipes/',
    dataType: 'json',
    success : function(data) {
      for (var i = 0, len = data.length; i < len; i++) {
        RecipeTracker.recipesController.addItem(
          RecipeTracker.Recipe.create({
            title: data[i]['title'],
            directions: data[i]['directions'],
            ingredients: data[i]['ingredients']
        }));
      }
    } });
};
and here is the Clojure code for returning some canned data:
(def data [
       {"title" "Gazpacho",
        ...},...])

(defn recipes-helper []
  (json/write-str data))

(defpage "/recipes/" [] (recipes-helper ))
Hopefully my demo project will save you some effort if you want to use Ember.js with a Clojure back end.

Edit 2013-03-13: updated the example on github to Ember.js 1.0 RC1, correcting some breaking API changes.

Saturday, March 09, 2013

Google Research's wiki-links data set

wiki-links was created using Google's web crawl by looking for back links to Wikipedia articles. The complete data set is less than 2 gigabytes in size, so playing with the data is "laptop friendly."

The data looks like:

MENTION vacuum tubes 10838 http://en.wikipedia.org/wiki/Vacuum_tube
MENTION electron gun 598  http://en.wikipedia.org/wiki/Electron_gun
MENTION oscilloscope 1307 http://en.wikipedia.org/wiki/Oscilloscope
MENTION radar        1657 http://en.wikipedia.org/wiki/Radar
One possible use for this data might be to compare two (possibly multiple word) terms by looking up their Wikipedia pages, removing the stop words (noise words) from both pages, and calculating a similarity based on "bag of words" features. Looks like a great resource!
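
Here is a rough sketch of that idea in Python; the tiny stop word list and the choice of Jaccard similarity over word sets are illustrative assumptions, and fetching the two Wikipedia pages' text is left out:

# Rough sketch: compare two terms by the word overlap of their
# Wikipedia page texts, using Jaccard similarity on bags of words.
import re

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was"}

def bag_of_words(page_text):
    # lower-case, split into alphabetic tokens, drop stop words
    words = re.findall(r"[a-z]+", page_text.lower())
    return set(w for w in words if w not in STOP_WORDS)

def jaccard_similarity(text1, text2):
    b1, b2 = bag_of_words(text1), bag_of_words(text2)
    if not b1 or not b2:
        return 0.0
    return len(b1 & b2) / float(len(b1 | b2))

# toy usage with stand-in page text:
print(jaccard_similarity("the vacuum tube is an electron device",
                         "an electron gun emits electrons in a vacuum"))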

Another great data set from Google for people interested in NLP (natural language processing) is the Google ngram data set, which has ngram sets for "n" in the range [1,5]. This data set is huge and not "laptop friendly", so last year I leased a very large memory server from hetzner.de for a few months while I used the ngram data sets. I wish that I still had this data online, but the cost of the server eventually became greater than the value of ready access to the data. The next time I need it I am planning on configuring a large memory EC2 instance with enough EBS storage for the data, indices, and application specific stuff. Then I can stop the large memory instance when I don't need the data online, which is probably 99% of the time: most of the costs will just be for the EBS storage itself, and not the (approximately) $0.50/hour when I keep the instance running.

Edit: I just did the math: renting a Hetzner server turns out to be much less expensive than using an EC2 instance that is usually spun down, because 1 terabyte of EBS storage alone is $100/month (almost double what a Hetzner server costs).

Monday, February 25, 2013

Building custom data stores

Creating a custom datastore may seem like a bad idea when great tools like Postgres, MongoDB, CouchDB, etc. are available in their open source goodness, as well as good commercial products such as Datomic, AllegroGraph, Stardog, etc. Still, the frustration of not having just what I needed for a project (more on requirements later) convinced me to spend some time building my own datastore based on some available open source libraries.

Much of the motivation for my work developing kbsportal.com is to make possible the development of a larger turnkey information appliance. I have been using MongoDB for this, but even with an application specific wrapper MongoDB has been a little awkward for my requirements, which are:

  • I want a reasonably efficient document store that supports the usual CRUD operations on arbitrary Clojure maps (which can be nested to any depth). Clojure maps are basically what I use to contain and work with data, so I wanted a datastore that supports this, simply.
  • I want all text in documents (embedded at any depth in the document) to be searchable.
  • I need to be able to annotate stored documents and sometimes the relationships between documents.
  • My preferred notation for annotating data is RDF.
  • I need to be able to efficiently perform SPARQL queries on the RDF annotations.
  • Coupling between documents and RDF: automatically delete any triples referencing a document ID if the referenced document is deleted (see the sketch after this list).
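
That last bullet is the requirement that off-the-shelf combinations did not give me. Here is a minimal sketch of the idea in Python, using sqlite3 and RDFLib purely for illustration; the document URI scheme is a made-up assumption, and this is not the Clojure implementation shown below:

# Minimal sketch of document/RDF coupling: deleting a document also
# deletes any RDF triples that reference that document's ID.
import json
import sqlite3
from rdflib import Graph, URIRef

conn = sqlite3.connect(":memory:")
conn.execute("create table docs (id integer primary key, json text)")
annotations = Graph()

def doc_uri(doc_id):
    # made-up URI scheme for document IDs
    return URIRef("http://example.com/doc/%d" % doc_id)

def insert_doc(doc):
    # store an arbitrary nested map (dict) as JSON
    cur = conn.execute("insert into docs (json) values (?)",
                       (json.dumps(doc),))
    return cur.lastrowid

def delete_doc(doc_id):
    conn.execute("delete from docs where id = ?", (doc_id,))
    uri = doc_uri(doc_id)
    # cascade: remove all triples with the document as subject or object
    annotations.remove((uri, None, None))
    annotations.remove((None, None, uri))

# toy usage: annotate a document, then delete it
doc_id = insert_doc({"title": "Gazpacho"})
annotations.add((doc_uri(doc_id),
                 URIRef("http://example.com/vocab/tag"),
                 URIRef("http://example.com/tags/recipe")))
delete_doc(doc_id)  # the annotation triple goes away with the document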

Initially I was going to write a wrapper library using two SaaS datastores: Cloudant (for CouchDB with Lucene indexing) and Dydra.com (for an RDF datastore, with extras). A small wrapper API would have made this all work, but since a lot of what I am doing is in the experimenting phase I decided that I didn't want to use remote web services for coding experiments. Using these services with a wrapper would be nice for production, but not for hacking.

Anyway, I have built a small project that uses HSQLDB (a relational database) and Sesame (an RDF datastore):

EDIT: Patrick Logan asked about my use of HSQLDB; this is not specific to HSQLDB really, but here is the important code (hand edited to try to fit it on this web page) for adding documents that are nested maps, indexing them, and searching (note: I usually use Clucy/Lucene for search in Clojure code, but for what I am doing right now, this suffices):

;; tokenize a string value and index each lower-cased token under the
;; document ID in the "search" table:
(defn index-if-str [x id]
  (if (= (class x) java.lang.String)
    (sql/with-connection hsql-db
      (doseq [token (map (fn [s] (.toLowerCase s))
                     (clojure.string/split x #"[ ;.,]()"))]
        (if token
          (sql/insert-record "search" {:doc_id id :word token}))))))

;; store a nested map as a JSON string in the "docs" table, then walk
;; the map and index every string value it contains:
(defn insert-doc [map]
  (let [id
        (:id (sql/with-connection hsql-db
               (sql/insert-record
                 "docs" {:json (json/write-str map)})))]
    (postwalk (fn [x] (index-if-str x id)) map)
    id))

;; (insert-doc {:foo "bar" :i 101 :name "sue jones"})

;; return document IDs matching any word in the query string,
;; ranked by the number of matching words:
(defn search [s]
  (map
    first
    (let [indices
          (map
            :doc_id
            (let [tokens
                  (apply str (interpose ", "
                     (map (fn [s] (str "'" (.toLowerCase s) "'"))
                       (clojure.string/split s #"[ ;.,]()"))))]
              (sql/with-connection hsql-db
                 (sql/with-query-results results
                    [(str "select * from search where word in (" tokens ")")]
                    (into [] results)))))]
      (sort (fn [a b] (compare (second b) (second a))) (into [] (frequencies indices))))))

;; (search "sue jones")

Friday, February 15, 2013

Using the Microsoft Translation APIs from Java, Clojure, and JRuby

I wrote last July about my small bit of code on github that wraps the Microsoft Bing Search APIs. I recently extended this to also wrap the Translation APIs, using the open source microsoft-translator-java-api project on Google Code. I just provide a little wrapper around that project, and if you are working in Java you should just use their library directly.

Hopefully this will save you some time if you need to use the translation services. The free tier for the translation services is currently 2 million characters translated per month.

Saturday, February 02, 2013

Goodness of micro frameworks and libraries

I spent 10+ years using large frameworks, mainly J2EE and Ruby on Rails. A large framework is a community and set of tools that really frames our working lives. I have received lots of value and success from J2EE and Rails, but in the last few years I have grown to prefer micro frameworks like Sinatra (Ruby) and Compojure + Noir + Hiccup (Clojure).

Practitioners who have mastered one of the larger frameworks like Rails, J2EE, Spring, etc. can sometimes impressively and quickly prototype and then build large functioning systems. I had an odd thought this morning, and the more I mull it over, the more it makes sense to me: large frameworks seem to be optimized for consultants and consulting companies for the quick kill: get in, build the most impressive system possible with the minimum resources, and leave after finishing a successful project. This is an oversimplification, but seems to be true in many cases.

The flip side to the initial productivity of large frameworks is a very real "tax" in long term maintenance because there are so many interrelated components in a system - components that might not be used or weakly used.

Micro frameworks are designed to do one or just a few things well; other third party libraries and plugins need to be chosen and integrated. This does take extra time, but then the overall codebase with dependencies is smaller and focused (mostly) on just what is required. I view using micro frameworks and libraries, and composing systems out of distinct services, as a strategy to reduce long term maintenance costs.

I still invest a fair amount of time tracking larger frameworks, recently being especially interested in what is available in Rails 4 and the latest SmartGWT (a nice framework for writing both web client and server side code in Java - lots of functionality, but not as great for quick agile development in my opinion).