Friday, August 02, 2013

I moved my blog to http://markwatson.com/blog

I am leaving the old blog in place, just as an archive, but please update your bookmarks to reference:

http://markwatson.com/blog

Semantic web and linked data are a form of agile development

I am working on a book on building intelligent systems in JavaScript (1) and I just wrote part of the introduction to the section on semantic web technologies:

In order to understand and process data we must understand the context in which it was created and used. We looked at document oriented data storage in the last chapter. To a large degree documents are an easier source of knowledge to use because they contain some of their own context, especially if they use a specific data schema. In general data items are smaller than documents and some external context is required to make sense of different data sources. The semantic web and linked data technologies provide a way to specify a context for understanding what is in data sources and the associations between data in different sources.

The thing that I like best about semantic web technologies is the support for exploiting data that was originally developed by small teams for specific projects. I view this as a bottom-up process: there is no need to plan complex schemas up front or to anticipate every future use of the data. Projects can start by solving a specific problem, and the usefulness of the data can then grow with future reuse. Good software developers learn early in their careers that design and implementation need to be done incrementally. As we develop systems, we get a better understanding of how our systems will be used and how to build them. I believe that this agile software development philosophy can be extended to data science: semantic web and linked data technologies facilitate agile development, allowing us to learn and modify our plans as we build data sets and the systems that use them.

(1) I prefer other languages like Clojure, Ruby, and Common Lisp, but JavaScript is a practical language for both server and client side development. I use a small subset of JavaScript and have JSLint running in the background while I work: the result is “good enough,” with the advantages of ubiquity and good runtime performance.

3rd edition of my book just released: “Loving Common Lisp, or the Savvy Programmer’s Secret Weapon”

"Loving Common Lisp, or the Savvy Programmer’s Secret Weapon"

The github repo for the code is here.

Enjoy!

Easy setup for A/B Testing with nginx, Clojure + Compojure

I figured out the following setup for my Clojure + Compojure web apps, but as long as you are using nginx, the same approach works for Node.js, Rails, Sinatra, etc.

The first thing you need to do is make two copies of whatever web app you want to A/B test and get two Google Analytics account tokens (_uacct, i.e., the strings beginning with “UA-”), one for each version. I usually use Hiccup, and for adding the Google Analytics JavaScript code I just add it as a string to the common layout file header, like this (reformatted to fit this page width by adding line breaks):
(html5
    [:head [:title "..."]
     ""
     (include-css "/css/bootstrap.css")
     (include-css "/css/mark.css")
     (include-css "/css/bootstrap-responsive.css")
     ""
     ""
     ]
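
For reference, here is a hedged sketch of the kind of string that fills the empty strings above: one common form of the asynchronous ga.js snippet from that era, with a placeholder account token and a made-up var name (Hiccup emits plain strings literally, so the script tag lands in the rendered header as-is):
(def google-analytics-script-a
  "<script type=\"text/javascript\">
     var _gaq = _gaq || [];
     _gaq.push(['_setAccount', 'UA-XXXXXXX-1']);
     _gaq.push(['_trackPageview']);
     (function() {
       var ga = document.createElement('script');
       ga.type = 'text/javascript'; ga.async = true;
       ga.src = ('https:' == document.location.protocol ?
                 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
       var s = document.getElementsByTagName('script')[0];
       s.parentNode.insertBefore(ga, s);
     })();
   </script>")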
The next step is to configure nginx to split requests (hopefully equally!) between both instances of your web app. In the following example, I am assuming that I am running the A test on port 6070 and the B test on port 6072. I modified my nginx.conf file to look like:
upstream backend {
    ip_hash;
    server   127.0.0.1:6070;
    server   127.0.0.1:6072;
  }

  server {
     listen 80;
     server_name  DOMAINNAME.com www.DOMAINNAME.com;
     location / {
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://backend;
    }
    error_page 500 502 503 504  /error.html;
    location = /error.html {
        root  /etc/nginx;
    }
  }
The ip_hash directive assigns each requesting IP address to one backend, which should split traffic roughly evenly by IP. This means that if a user hits your web app from home and then again from their local coffee shop, they might see both the A and B versions of your web app. Other options would be to use a per-user device cookie, etc., but I think that randomly assigning version A or B based on a hash of the requesting IP address is sufficient for my needs.

I am just starting to use this scheme for A/B Testing, but it seems to work as expected. I do suggest that when you clone your web app you keep versions A and B identical for a few days and check the Google Analytics for both account tokens to make sure the statistics for page views, time on page, etc. are close to the same for A and B.

After more testing, Google Analytics shows that the nginx ip_hash directive splits traffic almost perfectly 50/50 between the A and B versions of my web site.

The 4th edition of my book “Practical Artificial Intelligence Programming with Java” is now available

Buy a copy at Leanpub!

The recommended price is $6 and the minimum price is $3. This includes PDF, Kindle, and iPad/iPhone formats and free updates as I fix any errors and update the book with new material. You may want to look at the github repository for the book example code before purchasing this book to make sure that the material covered in my book will be of interest to you. I will probably update the book in a few weeks after getting feedback from early readers. I am also still working on Clojure and JRuby wrappers for the Java code examples, and as I update the code I will frequently push changes to the github repository for the example code.

My version of Ember.js ‘Get Excited Video’ code with Sinatra based service

The Ember.js Get Excited Video has source code for the example in the video, but the test data is local on the client side. I forked this project and added a Ruby Sinatra REST service. My version is here. This uses Ember.js version 1.0 RC4. It took me some time to get the example code working with a REST service, so I hope my forked example will save you some time and effort.

Rest service and client in DART

The DART language looks like a promising way to write rich clients in a high-level language. I have been looking at DART and the Ruby to JavaScript compiler Opal as possible (but not likely) substitutes for ClojureScript with Clojure back ends. It took me a little while to get a simple REST service and client working in development mode inside the DART IDE. The following code snippets might save you some time. Here is a simple service that returns some JSON data:
import 'dart:io';
import 'dart:json' as JSON;

main() {
  var port = 8080;
  HttpServer.bind('localhost', port).then((HttpServer server) {
    print('Server started on port: ${port}');
    server.listen((HttpRequest request) {
      var resp = JSON.stringify({
        'name': 'Mark',
        'hobby': 'hiking'}
      );
      // return JSON and allow cross-origin requests from the browser client
      request.response.headers.set(HttpHeaders.CONTENT_TYPE,
                                   'application/json');
      request.response.headers.set('Access-Control-Allow-Origin', '*');
      request.response
        ..write(resp)
        ..close();
    });
  });
}
It is required to set the Access-Control-Allow-Origin header because, during development, the client is served from a different origin than the service. Here is the client code (I am not showing the HTML stub that loads the client):
import 'dart:html';
import 'dart:json';

void main() {
  // call the web server asynchronously
  var request = HttpRequest.getString("http://localhost:8080/")
                           .then(onDataLoaded);
}

void onDataLoaded(String responseText) {
  var jsonString = responseText;
  print(jsonString);
  Map map = parse(jsonString);
  var name = map["name"];
  var hobby = map["hobby"];
  query("#sample_text_id").text =
      "My name is $name and my hobby is $hobby";
}
The call to query(…) is similar to a jQuery call. As you might guess, “#sample_text_id” refers to a DOM element from the HTML page with this ID. DART on the client side seems to be very well supported both with components and tooling. I think that DART on the server side is still a work in progress but looks very promising.

Thursday, March 14, 2013

Small example app using Ember.js and Node.js

I have been playing with Ember.js and generally trying to get a little better at programming in JavaScript, a language I have used for years but for which I am still a novice. I wrote the other day about Small example app using Ember.js and Clojure + Compojure + Noir and I thought I would try replacing the simple Clojure REST backend with an equally simple Node.js backend. The results of this simple exercise are in the github repo emberjs-nodejs. I leave it to you to take a look if you are interested.

I will say that development with JavaScript, Ember.js, and Node.js seems very lightweight and agile, even though I use IntelliJ for editing and project management. Starting an app takes maybe a second. Compared to, for example, Java + GWT, or even Clojure + Compojure, I find JavaScript, Ember.js, and Node.js to be a really light and fun combination. It would be even more fun if I were a better JavaScript programmer :-)

Tuesday, March 12, 2013

More Clojure deployment options

I have been very happy running multiple Clojure web apps with the embedded Jetty server via lein trampoline on a large VPS. I start each app on a different port and use nginx to map each app to its own domain name. This is easy, and it also lets me adjust the JVM memory individually for each application. This works so well for me that I almost feel guilty trying alternatives :-)
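
To make the per-application memory tuning concrete, here is a minimal project.clj sketch (the project name, library versions, and heap sizes are just placeholders):
(defproject my-web-app "0.1.0"
  ;; one of several apps, each started with "lein trampoline run" and listening on its own port
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [compojure "1.1.5"]
                 [ring/ring-jetty-adapter "1.1.8"]]
  :main my-web-app.core
  ;; per-application JVM memory settings
  :jvm-opts ["-Xms256m" "-Xmx512m"])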

I don't know if I will permanently change my deployment strategy, but I am experimenting with Immutant, which is a Clojure deployment platform built on top of JBoss AS 7. After installing the lein Immutant plugin and the latest version of Immutant, you can run JBoss AS/Immutant using lein, and separately deploy and un-deploy web applications using lein. Pretty slick, but I am still trying to get a grip on interactive development by connecting nREPL (documentation: Interactive development). My usual style of interactive development is pretty nice: I use IntelliJ to edit and keep the web app running with lein run, with live loading of changes to Clojure code (including Hiccup, which is my favorite "markup"), CSS, etc. I am going to give Immutant with nREPL interactive development a long and patient try, but I am not sure what I will be using a month from now: probably not Immutant, because I prefer stacks that are less complicated. My days of loving heavyweight J2EE apps are over :-)

BTW, I would bet that many of us have suffered a little using a do-it-yourself custom cartridge on Red Hat's OpenShift PaaS. It works, but at least for me it was painful, even with a lot of useful material on the web describing how to do it. This Immutant based example looks more promising.

I have used Heroku in the past for Clojure web apps but since I usually have 3 or 4 distinct web apps deployed the cost of about $35 each is much more than putting them all on a large memory multiple core VPS. I very much wish that Heroku had a slightly less expensive paid 1 dyno plan that never gets swapped out when idle, causing a loading request delay.

I haven't yet tried Jelastic hosting. Their minimum paid plan with 128MB RAM (that I assume would always be active, so no loading request delays) is about $15/month which sounds fair enough. They deserve a try in the near future :-)

Another option that I have used for both one of my Clojure web apps and a low traffic web app for a customer is to pre-pay for a micro EC2 instance for a monthly cost of about $5. My only problem with this is that EC2 instances are not so reliable, and I feel like I am better off with a quality VPS hosting company. BTW, if you run a Clojure web app using the embedded Jetty server on a micro EC2, be sure you run it using "nice" to lower its priority and avoid AWS drastically reducing your resources because you use too much CPU time for too long of a period; I find much better continuous performance using "nice" - go figure that one out!

One AWS option I haven't tried yet is using lein-beanstalk which looks good from the documentation. Elastic Load Balancing on AWS costs about $18/month, which drives up the cost of an Elastic Beanstalk deployment of a low traffic web app, but I think it does offer resilience to failing EC2s. You are limited to one web app per Elastic Beanstalk deployment, so this is really only a good option for a high traffic app.

A few years ago I also used appengine-magic for hosting on GAE but it bothers me that apps are not easily portable to other deployment platforms, especially converting datastore code. This is too bad because when Google raised prices and brought AppEngine out of beta, that made it more attractive to me, even with some developers complaining of large cost increases. Still, for my desire for robust and inexpensive hosting for low or medium traffic web sites, AppEngine is in the running by simply setting the minimum number of idle instances to 1 and the maximum number of instances to 1: that should handle modest traffic for about $10/month with (hopefully) no loading request delays.

Edit: someone at my VPS hosting company (RimuHosting) just told me that as long as I set up all my apps, nginx, etc. to automatically start when a system is rebooted, then I probably have no worries: on any hardware problems they restart from a backup on a new physical server. I do occasionally try a soft-reboot of my VPS just to make sure everything gets started properly. I thought that RimuHosting would do this, but I asked to make sure.

Edit: the New York Times has a good article on the big business of offering cloud services that is worth reading: http://www.nytimes.com/2013/03/13/technology/google-takes-on-amazon-and-microsoft-for-cloud-computing-services.html

Edit: 2013-03-14: Jim Crossley set up a demo project that exercises the common services for Immutant/JBoss, and wraps project source and resource files for live loading of changes to Clojure code and resources: Immutant demo.

Monday, March 11, 2013

Small example app using Ember.js and Clojure + Compojure + Noir

I use Clojure with Compojure and Noir for (almost) all of my web apps and lately I have also been experimenting with Ember.js. After buying and reading Marc Bodmer's book Instant Ember.js Application Development How-to yesterday I decided to make a very small template application using Ember.js for the UI and a trivial back end REST service written in Clojure. I used Marc's Ember.js setup and it worked well for me.

The github repo for my small template project is emberjs-clj.

Please note that this example is a trivial Ember.js application (about 50 lines of code) and is intended just to show how to make a REST call from an Ember.js front end app, how to implement the REST service in Clojure, and not much else. I wanted a copy and paste type template project to use for starting "real projects."

You can grab the repo from github, or if you just want to see the interface between the UI and back end service, here is the code run by the JavaScript UI:

RecipeTracker.GetRecipeItems = function() {
  $.ajax({
    url: '/recipes/',
    dataType: 'json',
    success : function(data) {
      for (var i = 0, len = data.length; i < len; i++) {
        RecipeTracker.recipesController.addItem(
          RecipeTracker.Recipe.create({
            title: data[i]['title'],
            directions: data[i]['directions'],
            ingredients: data[i]['ingredients']
        }));
      }
    } });
};
and here is the Clojure code for returning some canned data:
(def data [
       {"title" "Gazpacho",
        ...},...])

(defn recipes-helper []
  (json/write-str data))

(defpage "/recipes/" [] (recipes-helper ))
Hopefully my demo project will save you some effort if you want to use Ember.js with a Clojure back end.

Edit 2013-03-13: updated the example on github to Ember.js 1.0 RC1, correcting some breaking API changes.

Saturday, March 09, 2013

Google Research's wiki-links data set

wiki-links was created using Google's web crawl and looking for back links to Wikipedia articles. The complete data set is less than 2 gigabytes in size, so playing with the data is "laptop friendly."

The data looks like:

MENTION vacuum tubes 10838 http://en.wikipedia.org/wiki/Vacuum_tube
MENTION electron gun 598  http://en.wikipedia.org/wiki/Electron_gun
MENTION oscilloscope 1307 http://en.wikipedia.org/wiki/Oscilloscope
MENTION radar        1657 http://en.wikipedia.org/wiki/Radar
One possible use for this data might be to compare two (possibly multiple word) terms by looking up their Wikipedia pages, removing the stop (noise) words from both pages, and calculating a similarity based on "bag of words," etc. Looks like a great resource!
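
Here is a minimal Clojure sketch of that idea, assuming the two article texts have already been fetched and using a tiny stop word list just for illustration:
(ns bow-similarity
  (:require [clojure.string :as str]))

(def stop-words #{"the" "a" "an" "of" "and" "or" "in" "on" "to" "is" "are" "was" "for"})

;; lower-case the text, split on non-letters, drop stop words, and count word frequencies
(defn bag-of-words [text]
  (->> (str/split (str/lower-case text) #"[^a-z]+")
       (remove str/blank?)
       (remove stop-words)
       frequencies))

;; cosine similarity between two bag-of-words frequency maps
(defn cosine-similarity [bag-a bag-b]
  (let [shared (filter bag-b (keys bag-a))
        dot    (reduce + (map #(* (bag-a %) (bag-b %)) shared))
        norm   (fn [bag] (Math/sqrt (reduce + (map #(* % %) (vals bag)))))
        denom  (* (norm bag-a) (norm bag-b))]
    (if (zero? denom) 0.0 (/ dot denom))))

;; (cosine-similarity (bag-of-words wikipedia-text-1) (bag-of-words wikipedia-text-2))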

Another great data set from Google for people interested in NLP (natural language processing) is the Google ngram data set that has ngram sets for "n" in the range [1,5]. This data set is huge and not "laptop friendly" so last year I leased a very large memory server from hetzner.de for a few months while I used the ngram data sets. I wish that I still had this data online but the cost of the server eventually became greater than the value of ready access to the data. The next time I need it I am planning on configuring a large memory EC2 instance with enough EBS storage for the data, indices, and application specific stuff - then I can stop the large memory instance when I don't need the data online, which is probably 99% of the time: most of the costs will just be for the EBS storage itself, and not the (approximately) $0.50/hour when I keep the instance running.

Edit: I just did the math: renting a Hetzner server turns out to be much less expensive than using an EC2 instance that is usually spun down because 1 terabyte of EBS storage is $100/month (almost double what a Hetzner server costs).

Monday, February 25, 2013

Building custom data stores

Creating a custom datastore may seem like a bad idea when great tools like Postgres, MongoDB, CouchDB, etc. are available in their open source goodness, as well as good commercial products such as Datomic, AllegroGraph, Stardog, etc. Still, the frustration of not having just what I needed for a project (more on requirements later) convinced me to spend some time building my own datastore based on some available open source libraries.

Much of the motivation for my work developing kbsportal.com is to make possible the development of a larger turnkey information appliance. I have been using MongoDB for this, but even with an application specific wrapper MongoDB has been a little awkward for my requirements, which are:

  • I want a reasonably efficient document store that supports the usual CRUD operations on arbitrary Clojure maps (which can be nested to any depth). Clojure maps are basically what I use to contain and use data so I wanted a datastore that supports this, simply.
  • I want all text in documents (embedded at any depth in the document) to be searchable.
  • I need to be able to annotate stored documents and, sometimes, relationships between documents.
  • My preferred notation for annotating data is RDF.
  • I need to be able to efficiently perform SPARQL queries on the RDF annotations.
  • Coupling between documents and RDF: auto delete of any triples referencing a document ID, if the referenced document is deleted.

Initially I was going to write a wrapper library using two datastores as SaaS products: Cloudant (for CouchDB with Lucene indexing) and Dydra.com (for an RDF datastore, with extras). A small wrapper API would have made this all work, but since a lot of what I am doing is in the experimenting phase I decided that I didn't want to use remote web services for coding experiments. Using these services with a wrapper would be nice for production, but not for hacking.

Anyway, I have built a small project that uses HSQLDB (a relational database) and Sesame (an RDF datastore).

EDIT: Patrick Logan asked about my use of HSQLDB; not specific to HSQLDB really, but here is the important code (hand edited to try to get it to fit on this web page) for adding documents that are nested maps, indexing them, and searching (note: I usually use Clucy/Lucene for search in Clojure code, but for what I am doing right now, this suffices):

;; Index every string value found in a document: lower-case it, split it into
;; tokens, and store a (doc_id, word) row for each token in the "search" table.
(defn index-if-str [x id]
  (if (= (class x) java.lang.String)
    (sql/with-connection hsql-db
      (doseq [token (map (fn [s] (.toLowerCase s))
                     (clojure.string/split x #"[ ;.,]()"))]
        (if token
          (sql/insert-record "search" {:doc_id id :word token}))))))

;; Store a (possibly nested) Clojure map as JSON in the "docs" table, then walk
;; the map and index every string value against the new document id.
(defn insert-doc [map]
  (let [id
        (:id (sql/with-connection hsql-db
               (sql/insert-record
                 "docs" {:json (json/write-str map)})))]
    (postwalk (fn [x] (index-if-str x id)) map)
    id))

;; (insert-doc {:foo "bar" :i 101 :name "sue jones"})

;; Tokenize the query string the same way documents were indexed, look up all
;; matching (doc_id, word) rows, and return document ids ordered by how many
;; query words each document matched.
(defn search [s]
  (map
    first
    (let [indices
          (map
            :doc_id
            (let [tokens
                  (apply str (interpose ", "
                     (map (fn [s] (str "'" (.toLowerCase s) "'"))
                       (clojure.string/split s #"[ ;.,]()"))))]
              (sql/with-connection hsql-db
                 (sql/with-query-results results
                    [(str "select * from search where word in (" tokens ")")]
                    (into [] results)))))]
      (sort (fn [a b] (compare (second b) (second a)))
            (into [] (frequencies indices))))))
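
Here is the kind of REPL usage this supports (the document ids shown in the comments are illustrative):
;; store two nested documents and index all of their string values
(def id-1 (insert-doc {:title "Gazpacho" :cuisine "Spanish" :tags {:season "summer"}}))
(def id-2 (insert-doc {:title "Chicken soup" :notes "a family recipe"}))

;; search returns document ids ordered by how many query words matched
(search "gazpacho")        ;; => (1)
(search "family recipe")   ;; => (2)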

Friday, February 15, 2013

Using the Microsoft Translation APIs from Java, Clojure, and JRuby

I wrote last July about my small bit of code on github that wrapped the Microsoft Bing Search APIs. I recently extended this to also wrap the Translation APIs using the open source microsoft-translator-java-api project on Google Code. I just provide a little wrapper for the microsoft-translator-java-api project, and if you are working in Java you should just use their library directly.

Hopefully this will save you some time if you need to use the translation services. The free tier for the translation services is currently 2 million characters translated per month.

Saturday, February 02, 2013

Goodness of micro frameworks and libraries

I spent 10+ years using large frameworks, mainly J2EE and Ruby on Rails. A large framework is a community and set of tools that really frames our working lives. I have received lots of value and success from J2EE and Rails but in the last few years I have grown to prefer micro frameworks like Sinatra (Ruby) and Compojure + Noir + Hiccup (Clojure).

Practitioners who have mastered one of the larger frameworks like Rails, J2EE, Spring, etc. can sometimes impressively and quickly prototype and then build large functioning systems. I had an odd thought this morning, and the more I mull it over, the more it makes sense to me: large frameworks seem to be optimized for consultants and consulting companies for the quick kill: get in, build the most impressive system possible with the minimum resources, and leave after finishing a successful project. This is an oversimplification, but seems to be true in many cases.

The flip side to the initial productivity of large frameworks is a very real "tax" in long term maintenance because there are so many interrelated components in a system - components that might not be used or weakly used.

Micro frameworks are designed to do one or just a few things well, and other third party libraries and plugins need to be chosen and integrated. This does take extra time, but the overall codebase with dependencies is then smaller and focused (mostly) on just what is required. I view using only micro frameworks and libraries, and composing systems into distinct services, as a strategy for reducing the long-term resource costs of systems.

I still invest a fair amount of time tracking larger frameworks, recently being especially interested in what is available in Rails 4 and the latest SmartGWT (a nice framework for writing both web client and server side code in Java - lots of functionality, but not as great for quick agile development in my opinion).

Monday, January 28, 2013

A little weird, but interesting: using IntelliJ LeClojure's debugger with Clojure code

I have been working on a web app using Compojure, Noir, and Hiccup. I usually run both lein repl and lein run concurrently and use either IntelliJ or Aquamacs (Mac Emacs) as a code editor. If I am working on a small bit of new code and experimenting a lot, then Aquamacs + nrepl works well for me. If I have my MacBook Air plugged into a huge monitor sometimes I run lein run and a LeClojure repl inside IntelliJ and detach a few edit panes as separate windows - so many nice choices for development!

I don't really like debuggers because getting into the habit of using them can end up wasting a lot of time that could be spent writing unit tests, etc. That said, I was looking at some Ruby code I wrote for a customer but have not deployed and it occurred to me to try the RubyMine debugger which worked very well and generally didn't get in my way or waste too much time manually stepping through some code.

So, I decided to spend a little time trying IntelliJ LeClojure's debugger on my Clojure project:

I wrote a little stub file to run as the debug target "script":

(ns cookingspace)
(use 'cookingspace)
(-main)
I placed this file in the top level project directory so the relative path for loading required data files is set up OK. Also, I had to change the default run/debug option to not make the project before running - this is necessary since I have top level expressions that initialize data from reading data files, and the Clojure compiler is not run from the project directory so I would get build errors. You probably will not have that issue with your projects.

One thing that is cool about running the app in the debugger is that the LeClojure plugin also starts a repl (see the second tab at the bottom of the screenshot) that shows all debug println output, without having to start a separate process.

I don't think that I will use the debugger very often (again, using a debugger to understand and debug code is usually not the best use of time) but it is cool to have another tool. That said, always running in the debugger and only setting breakpoints on the rare occasions when I need to better understand what some code is doing might work well. I will try this mode of development for a few hours a week as an experiment.

Friday, January 18, 2013

Cooking functionally

I am not actually using functional programming in the kitchen :-)

I am re-writing my old cookingspace.com web app in Clojure with a new twist: the new web app will be an AI agent for planning recipes based on the food a user has on hand, their history and preferences, and the general (food-wise) mood they are in. I have written some tools to convert the (more or less) static data from my old relational database (USDA nutrition information, recipes) to Clojure literal data.
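
To give an idea of the target format, here is a hypothetical sketch of a single recipe as Clojure literal data (the field names and values are invented for illustration):
(def example-recipe
  {:name        "Vegetable Stir Fry"
   :cuisine     :asian
   :ingredients [{:food "broccoli"  :amount 2 :units :cups}
                 {:food "olive oil" :amount 1 :units :tablespoons}
                 {:food "garlic"    :amount 2 :units :cloves}]
   :directions  ["Heat the oil in a wok."
                 "Add the garlic and broccoli."
                 "Stir fry for five minutes and serve."]})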

Based on the type of food a user feels like cooking and what food they have on hand, I need to transform recipes in an intelligent way so that they (hopefully!) still taste great after ingredients have been substituted or the recipe has morphed because the user prefers, say, a different type of sauce than the base recipe uses, etc.

Anyway, working in Lisp and experimenting in a repl is making the entire process easier.

My wife and I are both very good cooks and we can make up super-tasty recipes on the fly with whatever ingredients we have on hand. I am betting on two things: that I can automate our expertise, and that people will enjoy using the system when it is finished.

Wednesday, January 16, 2013

Faceted search: one take on Facebook's new Graph Search vs. other search services

Faceted search is search where the domain being searched is filtered by categories or some taxonomy. Individuals become first class objects in Facebook's new Graph Search and (apparently) search is relative to a node in their social graph that represents the Facebook user, other users they are connected with, and data for connected users.

I don't yet have access to Facebook's new Graph Search but I have no reason to doubt that as it evolves both Facebook users and Facebook's customers (i.e., advertisers and other organizations that make money from user data) should be happy with the service.

Google's Knowledge Graph and their search in general are also personalized per user. Once again this is made possible by collecting data on users and monetizing by providing this information to their customers (i.e., once more, advertisers, etc.)

Pardon a plug for the Evernote service (I am a happy paying customer): Evernote serves as private search. I throw everything relevant to my digital life into Evernote, and I can later search "just my stuff." I don't doubt that Evernote somehow also makes money by aggregating user information.

I assume that any 3rd party web service I use is somehow monetizing data about me. I decide to use 3rd party services more for the value they provide, since my cynical self assumes the worst about privacy.

Dealing with faceted search and graph databases at Facebook and Google scale is an engineering art form. Fortunately for the rest of us, frameworks/libraries like Solr (faceted search) and Neo4J (a very easy to use graph database) make it straightforward to experiment with and use the same technologies, but admittedly without the advantage of very large data stores.

Sunday, January 13, 2013

The Three Things

Since I (mostly) shut down my consulting business (which I have enjoyed for 15 years) in favor of developing my own business ideas I find that I am working more hours a week now. When consulting the breakdown of my time spent per week was roughly 15 hours consulting, 8 hours writing, 5 hours reading technical material (blogs, books, and papers), 5 hours programming on side projects, and many hours of "me time" (hiking, cooking, reading fiction, meditation, and hanging out with my wife and our friends).

Wondering what "The Three Things" are? Hang on, I will get to that.

I now find myself spending 40+ hours a week on business development. Funny thing is, I still am spending a lot of time on the same activities like hiking, hanging out, reading, and side projects. I feel like a magician who has created extra time. Now for The Three Things:

I have always known this but things have become more clear to me:

  1. Spend time on what energizes you. If work drags you down and robs you of energy then you are probably doing the wrong thing. Whenever you can, engage in activities you love doing rather than maximizing how much money you make.
  2. Continually try to improve yourself by choosing activities for both your personal growth and that have a positive effect on other people.
  3. Even in the face of disappointments in life try to live in a constant state of gratitude for loved ones, skills, opportunities, and the adventure of not really knowing what comes next in life.

Friday, January 11, 2013

Ray Kurzweil On Future AI Project At Google

Here is a good 11 minute interview with Ray Kurzweil.

In the past Google has been fairly open with publishing details of how their infrastructure works (e.g., MapReduce, the Google File System, etc.) so I am hopeful that the work of Ray Kurzweil, Peter Norvig, and their colleagues will be published, sooner rather than later.

Kurzweil talks in the video about how the neocortex builds hierarchical models of the world through experience, and he pioneered the use of hierarchical hidden Markov models (HHMMs). It is beyond my own ability to judge if HHMMs are better than the type of hierarchical models formed in deep neural networks, as discussed a lot by Geoffrey Hinton in his class "Neural Networks for Machine Learning." In this video and in Kurzweil's recent Authors at Google talk he also discusses IBM's Watson project and how it is capable of capturing semantic information from articles it reads; humans do a better job at getting information from a single article, but as Kurzweil says, IBM Watson can read every Wikipedia article - something that we cannot do.

As an old Lisp Hacker it fascinates me that Google does not use Lisp languages for AI since languages like Common Lisp and Clojure are my go-to languages for coding "difficult problems" (otherwise just use Java <grin>). I first met Peter Norvig at the Lisp Users & Vendors Conference (LUV) in San Diego in 1992. His fantastic book "Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp" had just been published in 1991, as had my much less good Springer-Verlag book "Common LISP Modules: Artificial Intelligence in the Era of Neural Networks and Chaos Theory." Anyway, it is not for me to tell companies what programming languages to use :-)

I thought that one of the most interesting parts of the linked video was Kurzweil's mention of how he sees real AI (i.e., being able to understand natural language) will fit into Google's products.

While individuals like myself and small companies don't have the infrastructure and data resources that Google has, if you are interested in "real AI", deep neural networks, etc. I believe that it is still possible to perform useful (or at least interesting) experiments with smaller data sets. I usually use a dump of all Wikipedia articles, without the comments and edit history. Last year I processed Wikipedia with both my KBSPortal NLP software (for which, incidentally, I am hoping to ship a major new release in about two months) and the excellent OpenCalais web services. These experiments only took some patience and a leased Hetzner quad i7 32GB server. As I have time, I would also like to experiment with a deep neural network language model as discussed by Geoffrey Hinton in his class.

Saturday, December 29, 2012

I am trying to improve my skills at design and web development

I built my first simple web page at SAIC in 1992 when my good friend Gregg Hanna set up a publicly accessible web server for my working group. Since then I have had a lot of people suggest that my web sites could look better but frankly I have always been more interested in content and developing cool web application functionality.

Recently I have been putting some effort into improving my design skills, and the best resource that I have found is Robin Williams' book "The Non-Designer's Design Book."

The author does a fantastic job of explaining four basic concepts of design: contrast, repetition, alignment, and proximity. She then provides good examples that show the reader how to recognize bad design and how to correct design errors.

I spent some time redesigning my main web site and really enjoyed the process. I started by determining the worst aspects of the old design based on Robin's advice and then tried to correct the design flaws using her examples.

I understand the technical aspects of using HTML5, CSS, and JavaScript but I was having some problems attempting to build effective web applications for both mobile devices and web browsers. As I have blogged about before, I have experimented and used the following tools on customer projects: plain old JSPs, Rails, Play Framework, and most recently Hiccup with Clojure web applications.

I have purchased several good books on CSS, HTML5, and JavaScript in the last few years, but the one that has helped me the most has been Ethan Marcotte's "Responsive Web Design."

Ethan's short book on responsive design really helped me a lot because he efficiently covered what I needed to know about media queries and effective CSS and HTML5.

I have been using Dojo Mobile and more recently Twitter Bootstrap, which did a lot of the heavy lifting for me in my first attempts at creating responsive multi-platform web applications. Reading Ethan's book helped me understand some of what Dojo and Bootstrap were doing for me "behind the scenes" and also gives me some confidence in writing one page web applications from scratch, without frameworks that might do more than I want and add unnecessary complexity.

Thursday, December 27, 2012

Technology tire kicking: trying Rails 4.0 beta

As I mentioned in my last blog article, I am (mostly) shutting down my consulting business in order to have time to work on writing projects and to try to develop three business ideas. All three involve web apps/services and I want to use Clojure for two of them and Rails for the third.

Rails 4 should be released early in 2013 but I thought I would get a leg up and start experimenting with Rails 4 now. Fairly easy to set up and try:

git clone https://github.com/rails/rails.git
cd rails
gem install sprockets
rake build ; rake install
And then version 4 beta is installed:
✗ rails -v
Rails 4.0.0.beta
✗ rails new testapp -d postgresql
I had to comment out the assets group in the generated Gemfile before bundle install but otherwise everything worked fine.

Monday, December 24, 2012

Happy Holidays and my future plans

I would like to wish you Happy Holidays! I hope you are with family and friends enjoying yourself over the holidays.

I wanted to share with you my plans for the future. Starting in January I am planning on mostly shutting down my consulting business. I have been consulting for about 14 years and consulting has provided me with a great lifestyle, but it is time for a change. Before consulting I worked at SAIC, Physical Dynamics, and Angel Studios. I will still provide some consulting services, but will limit my time to directly helping customers on very small projects.

I plan to spend most of my "work" time writing and developing a few software as a service business ideas.

I have written books for some great publishers (Springer-Verlag, McGraw-Hill, Morgan Kaufman, APress, Sybex, M&T Press, and J. Wiley) but because I prefer writing on niche subjects (things that are of special interest to me!) I will probably only write free books published as PDFs in the future. I don't really need the income generated by publishing books and I would prefer writing on "smaller topics" that would likely have a small market. My wife Carol is an excellent editor and she will help me, as will volunteers who read early versions of my books and provide technical feedback.

One book idea that I have been planning is titled "Single Page Web Applications in Clojure" - this is a niche topic that few people will be interested in but I have a personal interest in writing an open source framework and writing a short book around my software will hopefully make the whole project more useful.

I have had a lot of people help me in my working life. So much of what I have accomplished so far in my life has been made possible by other people mentoring and helping me! When my writing (or open source projects) helps other people I feel like I am paying back all of the people who have helped me.

I plan on making writing my main priority and activity but I also hope to spend a significant amount of time developing some ideas I have for software as a service products. I have always been a polyglot programmer to fit in with whatever languages my employers/customers use: Java, Ruby, Common Lisp, Clojure, Scheme, Scala, Prolog, C/C++, and Python. I will probably just use Clojure on my own projects, with some Ruby glue code for little utilities. I will write more about this when I have prototype web apps in place for people to try.

Happy Holidays and a Happy New Year!

Sunday, December 23, 2012

Home from our Amazon River vacation - here are some pictures

I took a ton of hi-def video and pictures. Here is a Google+ photo album of a few of the pictures (a small representative sample).

I have a Canon T2i camera with a nice 24-105mm L-lens. I mostly take hi-def video hand held. If we were in the same place enjoying a glass of wine together I would show you the hi-def video, but the pictures in the linked photo album are OK as a representation of the experiences Carol and I had.

Sunday, December 16, 2012

Problem fixed with Holland America: they offered a nice refund

Update: Holland America gave us a fair refund - I withdraw my complaints listed in this blog article.

My wife and I have been on 15 cruises and the service provided by Holland America on the 23 day cruise we are on right now is so much worse than the other 14 cruises we have been on that I feel motivated to write up our experiences.

Note: our cruise was up the Amazon River and we did have some memorable experiences which I will blog about in the next week or so and provide links to some of our pictures and videos.

In order of "worse things first":

  • We booked a tour to Santarem (large industrial city) and Alter do Chao (a pretty little town on the water with amazing beaches). The tour guide was from Santarem and spent all 4 3/4 hours of the tour in her home town and blew off taking us to the scheduled stop in Alter do Chao. She did give us lots of unwanted shopping experiences, and wasted time stopping the bus by a new highway project and going into lots of detail about that. Many other cases where she was just killing time. She did have the driver stop on a road for one minute above Alter do Chao so we could catch a glimpse of the beautiful town that we missed. I formally asked Holland America for a partial tour refund but I received a negative response from them to my written request: no partial refund.
  • Our toilet did not work for over seven hours. This was not quite a ship-wide occurrence since we talked to a few people who were not affected. Just hearsay, but we heard people talking of having their stateroom toilets out of order from a few hours to two days. When I first asked the front desk about this I was told it was a ship-wide occurrence and to be patient. After about 6 hours I asked again and was told that our toilet might be fixed that day. I got fed up, went back and told the lady at the front desk that this was a health violation and guess what: an engineer came and fixed our toilet within about 20 minutes. Sometimes complaining helps.
  • We booked an expensive 7 hour small boat tour and because of shallow water could not get to our destination. We were warned about this by the tour office the night before the tour and we decided to go anyway, so that was our own fault, but the tour should probably just have been cancelled.
  • We did not have any hot water in our stateroom for several days.
  • The ship had been in dry dock until the morning of the cruise and the exterior was filthy. The port side of the promenade deck was not fully reconstructed and was in particularly bad shape. However, within about 24 hours everything was cleaned up, so not such a big deal.
  • A nit pick: sometimes there were just paper napkins in the formal dining room, and they were not very good paper napkins.
Those were our own experiences. While drinking and having dinner with other passengers, we heard their complaints also; for example:
  • People's stateroom air-conditioning was not working, often for days. One couple took their pillows and sheets up to a public bar area and slept there because the bar had working air conditioning.
  • Complaints about the dirty state of the ship, toilets, and hot water issues.
One couple who we often met for drinks during happy hour are long time Holland America customers who have logged almost 300 days with this cruise line. They made it very clear that they will not be traveling again with Holland America. Another friend I made on board ship is also a long time customer; he explained that this situation was caused by Holland America being bought by a larger company, and that they still charge a very high premium price for non-premium service.

I would like to say that we have had excellent service from our stateroom stewards and the food waiters. Also, the food has been very good.

Monday, November 26, 2012

I am going to be mostly off the Internet for 3 weeks

I moderate comments because occasionally someone leaves some SPAM as a comment - so, there is usually just a short delay between the time readers post comments and when I moderate/publish them.

I will be on vacation, and for most of the time I will not have an Internet connection, so any comments left on my blog may not get moderated until the end of December when I get back home.

Best regards,
Mark

Saturday, November 24, 2012

Deep Learning

I worked in the field of artificial intelligence during the "AI winter" (a backlash against overly optimistic predictions of achieving "real AI") and to this day I avoid getting too optimistic about huge short-term gains in our field. That said, in the last several months a few things have been stirring up my old optimism!

I have been enjoying Geoffrey Hinton's Coursera course Neural Networks for Machine Learning and I was pleased to see a front page New York Times article this morning on deep learning. Another really nice reference for deep learning is a very large PDF/viewgraph presentation by Richard Socher, Yoshua Bengio and Chris Manning.

Another very good resource is the Deep Learning Tutorial that provides the theory, math, and working Python example code.

Deep neural networks have many hidden layers and have traditionally been difficult to train.

In addition to very fast processors (graphics chipsets), a very neat engineering trick is pre-training the weights in deep networks by stacking Restricted Boltzmann Machines (RBMs) and training them greedily, one layer at a time. After pre-training, the weights can be fine-tuned using back propagation.
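
To make the trick concrete, here is a rough Clojure sketch of CD-1 training for a single RBM and greedy layer-wise stacking (all names are made up, the learning rate is a toy value, and a real implementation would use a matrix library, mini-batches, momentum, etc.):
(ns rbm-pretraining-sketch)

(defn sigmoid [x] (/ 1.0 (+ 1.0 (Math/exp (- x)))))
(defn dot [a b] (reduce + (map * a b)))
(defn sample [p] (if (< (rand) p) 1.0 0.0))

;; W is a vector of rows, one row of visible weights per hidden unit
(defn hidden-probs [{:keys [W hb]} v]
  (mapv (fn [row b] (sigmoid (+ b (dot row v)))) W hb))

(defn visible-probs [{:keys [W vb]} h]
  (mapv (fn [j b] (sigmoid (+ b (dot h (mapv #(nth % j) W)))))
        (range (count vb)) vb))

;; one contrastive divergence (CD-1) update for a single training vector v0
(defn cd1-step [{:keys [W hb vb lr] :as rbm} v0]
  (let [h0 (hidden-probs rbm v0)                 ; positive phase
        v1 (visible-probs rbm (mapv sample h0))  ; one step of Gibbs sampling
        h1 (hidden-probs rbm v1)]                ; negative phase
    (assoc rbm
      :W  (mapv (fn [row h0i h1i]
                  (mapv (fn [w v0j v1j] (+ w (* lr (- (* h0i v0j) (* h1i v1j)))))
                        row v0 v1))
                W h0 h1)
      :hb (mapv (fn [b p0 p1] (+ b (* lr (- p0 p1)))) hb h0 h1)
      :vb (mapv (fn [b p0 p1] (+ b (* lr (- p0 p1)))) vb v0 v1))))

(defn make-rbm [n-visible n-hidden]
  {:W  (vec (repeatedly n-hidden
              #(vec (repeatedly n-visible (fn [] (* 0.01 (- (rand) 0.5)))))))
   :hb (vec (repeat n-hidden 0.0))
   :vb (vec (repeat n-visible 0.0))
   :lr 0.1})

(defn train-rbm [rbm data epochs]
  (reduce cd1-step rbm (apply concat (repeat epochs data))))

;; Greedy layer-wise pre-training: train one RBM per layer, feeding each new RBM
;; the hidden probabilities of the layer below. The learned weights would then
;; initialize a feed-forward network that is fine-tuned with back propagation.
(defn pretrain-stack [data layer-sizes epochs]
  (loop [inputs data, sizes layer-sizes, rbms []]
    (if (empty? sizes)
      rbms
      (let [rbm (train-rbm (make-rbm (count (first inputs)) (first sizes))
                           inputs epochs)]
        (recur (mapv #(hidden-probs rbm %) inputs)
               (rest sizes)
               (conj rbms rbm))))))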

I haven't been this optimistic about (relatively) short term progress in AI since the early 1980s. Hoorah!