Monday, December 05, 2011

(re) learning Clojure

I have been using Clojure for about 25% of my consulting work over the last two years, have read two books on Clojure, and included some Clojure examples in a book I wrote last year.

That said, I don't really feel "expert" at the language the way I do with Java, Ruby, and Common Lisp.

I am trying to fill in some gaps by carefully reading through one of my customers' Clojure codebases and through all of the Clojure libraries that I use, like Noir and Compojure, to pick up more idioms. I enjoy seeing a new trick in someone else's code and then going back to my own code to improve it.

Sunday, December 04, 2011

Using the New York Times Semantic Web APIs

I am working on a side project of my own in Clojure using the AllegroGraph 4 and Stardog RDF repositories (thanks to Franz and to Clark & Parsia for licenses to use their products!) and my own NLP code. I am using the excellent NYT data access APIs to get research/test data.

I am going to show you some simple examples in Ruby for accessing the NYT Semantic Web APIs, which are free to use for up to 5000 API calls a day.

I also use other NYT APIs. Each API has an access key that you need to sign up for. I set my access keys as environment variables that I access in my code; for example in Ruby:

# New York Times API Keys:
NYT_SEMANTIC_WEB = ENV['NYT_SEMANTIC_WEB']
NYT_SEARCH = ENV['NYT_SEARCH']
NYT_NEWSWIRE = ENV['NYT_NEWSWIRE']
NYT_PEOPLE = ENV['NYT_PEOPLE']
NYT_TAGS = ENV['NYT_TAGS']

In the following code snippets, I am only using the Semantic Web APIs. I want to first search for available concept types and concept names, based on keyword search:
require 'simple_http'
require 'json'
require 'cgi' # for CGI.escape

def semantic_concept_search query
  uri = "http://api.nytimes.com/svc/semantic/v2/" +
        "concept/search.json?" +
        "query=#{CGI.escape(query)}&api-key=" +
        NYT_SEMANTIC_WEB
  JSON.parse(SimpleHttp.get(uri))
end

def pp_semantic_concept_search query
  json = semantic_concept_search(query)
  puts "Results:\n"
  json["results"].each do |result|
    puts "\n\tconcept_name:\t#{result['concept_name']}"
    puts "\tconcept_type:\t#{result['concept_type']}"
    puts "\tconcept_uri:\t#{result['concept_uri']}" if result['concept_uri']
  end
end

pp_semantic_concept_search("Obama")
The second method "pretty prints" the JSON data that I am interested in. Some of the sample output looks like:
concept_name: Obama, Barack
concept_type: nytd_per
concept_uri: http://data.nytimes.com/47452218948077706853

concept_name: Obama, Malia
concept_type: nytd_per

concept_name: Obama, Michelle
concept_type: nytd_per
concept_uri: http://data.nytimes.com/N13941567618952269073
Once I have a concept type and concept name I can then look up articles:
def lookup_concept_data concept_type, concept_name
  uri = "http://api.nytimes.com/svc/semantic/v2/" +
        "concept/name/#{concept_type}/" +
        "#{CGI.escape(concept_name)}.json?&" +
        "fields=all&api-key=" + NYT_SEMANTIC_WEB
  JSON.parse(SimpleHttp.get(uri))
end
 
def pp_lookup_concept_data concept_type, concept_name
  puts "** type: #{concept_type} name: #{concept_name}"
  json = lookup_concept_data(concept_type, concept_name)
  puts "Results:\n"
  json["results"].each do |result|    puts "\n\tLinks:"
    result["links"].each do |link|
      puts "\t\trelation: #{link['relation']}"
      puts "\t\tlink: #{link['link']}"
      puts "\t\tlink_type: #{link['link_type']}"
    end
    result["article_list"]["results"].each do |article|
      puts "\tTitle: #{article['title']}"
      puts "\tDate: #{article['date']}"
      puts "\tBody: #{article['body']}\n\n"
    end
  end
end

pp_lookup_concept_data('nytd_per', 'Obama, Barack')
Some sample output looks like:
Links:
 relation: sameAs
 link: http://rdf.freebase.com/ns/en.barack_obama
 link_type: freebase_uri
 relation: sameAs
 link: http://dbpedia.org/resource/Barack_Obama
 link_type: dbpedia_uri
 relation: sameAs
 link: http://en.wikipedia.org/wiki/Barack_Obama
 link_type: wikipedia_uri

Title: U.S. Urges Egypt To Let Civilians Govern Quickly
Date: 20111126
Body: WASHINGTON -- Ever since tens of thousands of protesters converged on Tahrir Square in Cairo for the first Day of Revolution exactly 10 months ago, the Obama administration has struggled to strike the right balance between democracy and stability. In the early morning hours on Friday, President Obama came out on the side of the Arab street, issuing

Title: EDITORIAL; The Solyndra Mess
Date: 20111125
Body: The Republicans on the House Energy and Commerce Committee appear to have hit the pause button on their investigation into the failure of Solyndra, a solar panel maker that entered bankruptcy proceedings in September, defaulting on a $528 million federal loan. What have we learned? Nobody comes out of this looking good. Not the Obama

Great to see useful linked data/Semantic Web data sources being made available! Hopefully these little code snippets will save you some time in getting started using the NYT APIs.

Saturday, November 26, 2011

Closer to the metal: Clojure, Noir, and plain old Javascript

I am wrapping up a long term engagement over the next five to six weeks that uses Java EE 6 on the backend, and SmartGWT (like GWT, but with very nice commercially supported components) clients. As I have time, I am starting up some new work that uses Clojure and Noir, and it is like a breath of fresh air:

I keep a repl open on the lein project and also separately run the web app, so any file changes (including the Javascript in the project) are immediately reflected in the app. It is such a nice development environment that I don't even think about it while I am working, and maybe that is the point!

As I have mentioned in previous blog posts, I really like the Clojure Noir web framework, which builds on several other excellent projects. Developing in Noir is a lot like using the Ruby Sinatra framework: it handles routing and gives you template options, but it is largely a roll-your-own environment.

Monday, November 21, 2011

Ruby Sinatra web apps with background work threads

In Java-land, I have often used the pattern of writing a servlet with an init() method that starts up one or more background work threads. Then, while my web application is handling HTTP requests, the background threads can be doing work like fetching RSS feeds for display in the web app, performing periodic maintenance like flushing old data from a database, etc. This is a simple pattern that is robust and easy to implement with a few extra lines of Java code and an extra servlet definition in a web.xml file.
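A minimal sketch of that servlet pattern looks like this (the class name is hypothetical, and the servlet needs a web.xml entry with load-on-startup so init() runs at deployment time):

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;

public class BackgroundWorkServlet extends HttpServlet {
  private volatile boolean running = true;

  @Override
  public void init() throws ServletException {
    // use a daemon thread so it will not block container shutdown
    Thread worker = new Thread(new Runnable() {
      public void run() {
        while (running) {
          // do periodic work here: fetch RSS feeds, flush old data, etc.
          try {
            Thread.sleep(60 * 1000L); // wait a minute between work cycles
          } catch (InterruptedException ignore) {
            return;
          }
        }
      }
    });
    worker.setDaemon(true);
    worker.start();
  }

  @Override
  public void destroy() {
    running = false; // signal the work thread to stop
  }
}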

In Ruby-land this pattern is even simpler to implement:

require 'rubygems'
require 'sinatra'

$sum = 0

Thread.new do # trivial example work thread
  while true do
     sleep 0.12
     $sum += 1
  end
end

get '/' do
  "Testing background work thread: sum is #{$sum}"
end
While the main thread is waiting for HTTP requests, the background thread can do any other work. This works fine with Ruby 1.8.7 or any 1.9.*, but I would run this in JRuby for a long-running production app since JRuby uses the Java Thread class.

Using the Stardog RDF datastore from JRuby

I was playing with the latest Stardog release during lunch - the quickest way to get going with the included Java examples is to create a project (I use IntelliJ, but use your favorite Java IDE) and include all JAR files under lib/ (including all nested directories) and the source under examples/src.

6/21/2012 note: I just tried these code snippets with the released version 1.0 of Stardog and the APIs have changed.


I took the first Java example class ConnectionAPIExample and converted the RDF loading and query part to JRuby (strange formatting to get it to fit the page width):
require 'java'
Dir.glob("lib/**.jar").each do |fname|
  require fname
end

com.clarkparsia.stardog.security.SecurityUtil.
      setupSingletonSecurityManager()
com.clarkparsia.stardog.StardogDBMS.get().
      createMemory("test")

CONN = com.clarkparsia.stardog.api.
        ConnectionConfiguration.to("test").connect()
CONN.begin()
CONN.add().io().format(org.openrdf.rio.RDFFormat::N3).
  stream(java.io.FileInputStream.new(
            "examples/data/sp2b_10k.n3"))

QUERY = CONN.query("select * where {?s ?p ?o}")
QUERY.limit(10)
RESULTS = QUERY.executeSelect()

while RESULTS.hasNext() do
  result = RESULTS.next()
  result.getBindingNames().toArray().each do |obj|
    puts "#{obj}: #{result.getBinding(obj).getValue().stringValue()}"
  end
  puts
end
This is mostly just a straight conversion from Java to Ruby. The first few lines enumerate all JAR files and require them. The last part, of interpreting the results, took a few minutes to figure out. I used IntelliJ to explore the result values of class MapBindingSet, looking at available methods to call to get the binding names of the variables in my SPARQL query and the values (as strings) for these three variables for each returned result.

Output will look like:
s: http://localhost/vocabulary/bench/Journal
p: http://www.w3.org/2000/01/rdf-schema#subClassOf
o: http://xmlns.com/foaf/0.1/Document

s: http://localhost/vocabulary/bench/Proceedings
p: http://www.w3.org/2000/01/rdf-schema#subClassOf
o: http://xmlns.com/foaf/0.1/Document
...
If you want to run this bit of code, put it in a file test.rb in the top level Stardog distribution directory and just run
jruby test.rb
I wanted to be able to use Stardog from both JRuby and Clojure. My lunch time hacking today is just a first step.

Tuesday, November 15, 2011

Experimenting with Google Cloud SQL

I received a beta invite today and had some time to read the documentation and start experimenting with it tonight.

First, the best thing about Google Cloud SQL: when you create an instance you can specify more than one AppEngine application that can use it. This should give developers a lot of flexibility for coordinating multiple deployed applications that are in an application family. I think that this is a big deal!

Another interesting thing is that you are allowed some access to the database from outside the AppEngine infrastructure. You are limited to 5 external queries per second but that does offer some coordination with other applications hosted on other platforms or host providers.

Their cloud SQL service is free during beta. It will be interesting to see what the cost will be for different SQL instance types.

It was very simple getting the example Java app built and deployed. I created a separate SQL instance (these are separate from other deployed AppEngine application instances), made a new IntelliJ AppEngine project, pasted in the example code, and it all worked.
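From memory, the heart of the example app is just JDBC; my version looked roughly like the following sketch (I am assuming the beta-era AppEngineDriver class name here, and the instance name, database, and table are all made-up placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CloudSqlExample {
  public static Connection connect() throws Exception {
    // the beta JDBC driver that ships with the AppEngine SDK
    Class.forName("com.google.appengine.api.rdbms.AppEngineDriver");
    // "my_instance" and "guestbook" are placeholder names
    return DriverManager.getConnection(
        "jdbc:google:rdbms://my_instance/guestbook");
  }

  public static void dumpEntries() throws Exception {
    Connection conn = connect();
    Statement st = conn.createStatement();
    ResultSet rs = st.executeQuery("select * from entries");
    while (rs.next()) {
      System.out.println(rs.getString(1));
    }
    conn.close();
  }
}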

Perception of quality is often influenced by price. Since developers now have to pay more for using AppEngine, I find myself looking more at AppEngine as a premium service, which it is. Despite my dislike for MySQL (I use PostgreSQL when given a choice), Google's hosted and managed MySQL cloud data service looks good and provides developers with more options. Their SQL service is synchronously replicated between data centers automatically for you.

It has been a few years now since I had to either set up a physical server or a leased raw server for any deployments. I like that! Thank you Platform as a Service (PaaS) providers like Heroku (built on AWS) and AppEngine - they are the future. I still do a lot of work on "plain AWS" but that is still much more agile than provisioning my own servers.

Saturday, November 12, 2011

The quality of new programming languages is apparent from looking at projects that use the language

The community growing around the Clojure language is great. While the Clojure platform is still evolving (quickly!), browsing through available libraries, frameworks, and complete projects is amazing.

My "latest" favorite Clojure project is Noir that simply provides a composable mechanism for building web applications (using defpartial). I get to use Noir on two customer web app projects (and some work with HBase + Clojure) over the next month or two, and I am looking forward to that. The simpler of the two web apps is an admin console exposing some APIs on a private LAN and the Try Clojure web app is a great starting point, as well as an example of a nicely laid out Noir application.

Since Clojure is such a concise language I find it easy to read through, understand, evaluate, and use projects. Since I am still learning Clojure (I have just used Clojure for about 6 months of paid work over the last couple of years) the time spent reading a lot of available code to find useful stuff is very well spent because reading good code with an open repl is a great way to learn new idioms.

Monday, November 07, 2011

Writing a simple SQL data source for the free LGPL version of SmartGWT

While travelling back from a vacation I cleaned up some old experimental code for writing a fairly generic SmartGWT data source with the required server side support code. The commercial versions of SmartGWT have support for connecting client side grid and other components to server side databases. For the free version of SmartGWT you have to roll your own and in this post I'll show you a simple way to do this that should get you started. Copy the sample web app that is included in the free LGPL version of SmartGWT and make the modifications listed below.

I also set up a Github project that contains everything ready to run in IntelliJ.

The goal is to support defining client side grids connected to a database using a simple SQL statement to fetch the required data using a custom class SqlDS. I had to strangely format the following code snippets to get them to fit the content width for my blog:

    ListGrid listGrid = new ListGrid();
    listGrid.setDataSource(
      new SqlDS(
         "select title, content, uri from news where " +
         "content like '%Congress%'"));
    listGrid.setAutoFetchData(true);

The following datasource looks for the column names (i.e., "title", "content", and "uri") in the SQL query and creates fields in the constructed SqlDS instance with those column names. I also assume that there is a servlet defined to process the HTTP GET fetch at the bottom of the constructor:

package com.markwatson.client;

import com.smartgwt.client.data.DataSource;
import com.smartgwt.client.data.DataSourceField;
import com.smartgwt.client.types.DSDataFormat;
import com.smartgwt.client.types.FieldType;

import java.util.Arrays;
import java.util.List;

public class SqlDS extends DataSource {
  public SqlDS(String sql) {
    // generate a DataSource ID from the SQL query string:
    setID("sqlds_" + Math.abs(sql.hashCode()));
    setDataFormat(DSDataFormat.JSON);

    List<String> tokens =
        Arrays.asList(sql.toLowerCase()
           .replaceAll(",", " ").split(" "));
    int index1 = tokens.indexOf("select");
    int index2 = tokens.indexOf("from");
    for (int i=index1+1; i<index2; i++) {
      if (tokens.get(i).length() > 0) {
         addField(new DataSourceField(tokens.get(i),
               FieldType.TEXT, tokens.get(i)));
      }
    }
    // should do a better job of URL-encoding the SQL query:
    setDataURL("/news?query=" + sql.replaceAll(" ", "%20"));
  }
}

The only thing left to do is write a servlet that processes web service requests like /news?query=... and returns JSON data with fields from the SQL query for each returned row for display in the list grid:

package com.markwatson.server;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.PrintWriter;

public class DbRestServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest req,
         HttpServletResponse resp) throws IOException {
      PrintWriter out = resp.getWriter();
      try {
          // remove "query="
          String sql = req.getQueryString().substring(6); 
          int index = sql.indexOf("&");
          sql = sql.substring(0, index);
          out.println(
             DbUtils.doQuery(sql.replaceAll("20%", " ")));
      } catch (Exception ex) {
        ex.printStackTrace(System.err);
        out.println("[]");
      }
    }
}

The utility class DbUtils returns JSON data which is what the client side SqlDS DataSource class expects from the server:

package com.markwatson.server;

import org.codehaus.jackson.map.ObjectMapper;

import java.io.StringWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DbUtils {
  static String dbURL;
  static Connection dbCon;

  static {
    try {
      Class.forName("org.postgresql.Driver");
      // Define the data source for the driver
      dbURL = "jdbc:postgresql://localhost/test_database";
      dbCon = DriverManager.getConnection(
                     dbURL, "postgres", "password");
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  public static String doQuery(String sql)
                               throws Exception {
    ObjectMapper mapper =
       new ObjectMapper(); // should cache and reuse this!
    List<Map<String, String>> ret =
        new ArrayList<Map<String, String>>();
    Statement statement = dbCon.createStatement();
    ResultSet rs = statement.executeQuery(
                            sql.replaceAll("%20", " "));
    java.sql.ResultSetMetaData meta = rs.getMetaData();
    int size = meta.getColumnCount();
    while (rs.next()) {
      Map<String, String> row =
         new HashMap<String, String>();
      for (int i = 1; i <= size; i++) {
        String column = meta.getColumnName(i);
        Object obj = rs.getObject(i);
        row.put(column, "" + obj);
      }
      ret.add(row);
    }
    StringWriter sw = new StringWriter();
    mapper.writeValue(sw, ret);
    return sw.toString();
  }
}

I had to add three JAR files to the SmartGWT sample project:

jackson-core-lgpl-1.8.1.jar
jackson-mapper-lgpl-1.8.1.jar
postgresql-9.0-801.jdbc4.jar

SmartGWT's DataSource abstraction is a real improvement over how I connect to databases in GWT apps, where I tend to write a lot of small RPC services to fetch and save data as required. My simple DataSource subclass SqlDS does not support writing data back to the database from the client; it can either be extended, or you can use an RPC service call to save edited data.

Sunday, November 06, 2011

Annoyed by anti-MongoDB post on HN

I am not going to link to this article - no point in giving it more attention. The anonymous post claimed data loss and basic disaster using MongoDB. I call bullshit on this anonymous rant. Why was it posted anonymously?

I am sitting in an airport waiting to fly home right now: just finished extending a Java+MongoDB+GWT app and I am starting to do more work on a project using Clojure+Noir+MongoDB.

I do have a short checklist for using MongoDB:
  • For each write operation I decide if I can use the default fire-and-forget option or slightly slow down the write operation by checking CommandResult cr = db.getLastError(); - every write operation can be fine-tuned based on the cost of losing data (see the Java sketch after this list). I usually give up a little performance for data robustness unless data can be lost with minimal business cost.
  • I usually use the journalling option.
  • Use replica pairs or a slave.
  • I favor using MongoDB for rapid prototyping and research.
  • I use the right tool for each job. PostgreSQL, various RDF data stores, and sometimes Neo4J are also favorite data store tools.
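For the first item in that checklist, the per-write decision with the Java driver looks roughly like this (a sketch using the 2.x-era driver API; the database and collection names are made up):

import com.mongodb.BasicDBObject;
import com.mongodb.CommandResult;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;

public class WriteCheckExample {
  public static void main(String[] args) throws Exception {
    Mongo mongo = new Mongo("localhost");
    DB db = mongo.getDB("test");
    DBCollection events = db.getCollection("events");

    // fire-and-forget write: fast, but failures go unnoticed
    events.insert(new BasicDBObject("name", "cheap event"));

    // for writes that matter, pay a little latency to check the result:
    events.insert(new BasicDBObject("name", "important event"));
    CommandResult cr = db.getLastError();
    cr.throwOnError(); // throws if the last write failed
  }
}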

Friday, November 04, 2011

Notes on converting a GWT + AppEngine web app using Objectify to a plain GWT + MongoDB web app

There has been a lot of noise in blog-space criticizing Google for the re-pricing of AppEngine services. I don't really agree with a lot of the complaints because it seems fair for Google to charge enough to make AppEngine a long term viable business.

That said, I have never done any customer work targeting the AppEngine platform because no one has requested it. (Although I have enthusiastically used AppEngine for some of my own projects and I have written several AppEngine and Wave specific articles.) I still host KnowledgeBooks.com on AppEngine.

I wrote a GWT + AppEngine app for my own use about a year ago, and since I always have at least one EC2 instance running for my own experiments and development work, I decided to move my app. It turns out that converting it was fairly easy using these steps:
  • Copy my IntelliJ project, renaming it and removing the AppEngine facets and libraries.
  • Add the required MongoDB Java driver JARs.
  • I had all of my Objectify datastore operations in a single utility class on the server side - I converted this class to use MongoDB (see the sketch below).
Sure, a complex application would take a while, but my app only has 6 model classes (all POJOs) so the whole process took less than 90 minutes.
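The conversion of the utility class was mechanical because the datastore operations were already centralized; the MongoDB versions of those operations look roughly like this (a sketch with the 2.x Java driver; the database, collection, and field names are hypothetical):

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

public class DatastoreUtils {
  private static DBCollection users;

  static {
    try {
      Mongo mongo = new Mongo("localhost");
      DB db = mongo.getDB("myapp");
      users = db.getCollection("users");
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  // replaces an Objectify put() of a POJO:
  public static void saveUser(String userId, String name) {
    BasicDBObject doc =
        new BasicDBObject("_id", userId).append("name", name);
    users.save(doc); // upsert keyed on _id
  }

  // replaces an Objectify get() by key:
  public static DBObject findUser(String userId) {
    return users.findOne(new BasicDBObject("_id", userId));
  }
}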

Recent evaluations of web frameworks while on vacation

My wife Carol and I have been visiting family in Rhode Island this week and since our grandkids are in school on weekdays, I have had a lot of time to spend writing the fourth edition of my Java AI book and also catching up on reevaluating web frameworks.

Although my main skill sets are in data/text mining, general artificial intelligence work and Java server side development, I do find myself spending a lot of time also writing web applications. In the last few years, I have done a lot of work with Rails (and some Sinatra), GWT, and most recently with SmartGWT because one of my customers really liked SmartGWT's widgets. (Note: if you are in the San Jose area and want to work on a SmartGWT project with me, please email me!)

For my own use, because I have strong Java and Ruby skills, the combination of Rails, GWT, and SmartGWT works very well for me when I need to write a web app.

That said, I have spent time this week playing with Google's Closure Javascript tools and, to a lesser extent, with ClojureScript, which uses Google's Closure. Frankly, both Closure and ClojureScript look fantastic, but I have a personal bias against making Javascript development a career, and although ClojureScript works around this issue by compiling a nice subset of Clojure to Javascript, I am concerned that the market for developing with ClojureScript is probably small. If you do want to write Lisp code on both the server and client side, definitely spend a few evenings playing with ClojureScript because it may be a good fit for you. I have also recently had a good experience with Clojure and the Noir web framework.

Tuesday, November 01, 2011

Anyone know any SmartGWT and Java developers looking for a job?

A call out for some help: one of my favorite customers is looking for a SmartGWT and Java developer in the San Jose area - anyone know anyone good + available?

Sunday, October 23, 2011

Common Lisp example code for my Semantic Web book is now LGPL licensed

A few days ago I re-released the Java, JRuby, Clojure, and Scala example code for my JVM languages edition of my Semantic Web book under the LGPL.

I just did the same thing today for the Common Lisp edition of this book:

Github repository

Thursday, October 20, 2011

Changed license from AGPLv3 to LGPLv3 for example code in my book "Practical Semantic Web and Linked Data Applications, Java, Scala, Clojure, and JRuby Edition"

Here is the github repository for the source code and all required libraries.

My open content web page is where you can download a free PDF of my book or follow the link to Lulu to buy a print version.

Enjoy!

Semantic Web, Web 3.0, and composable systems

I really enjoyed Steve Yegge's long post last week about the shortcomings of Google's architecture. Google provides great services that I use every day, but Amazon's approach of building more complex products and services out of composable web services (AWS) seems like a better one.

I have been experimenting with Semantic Web (SW) technologies since reading Tim Berners-Lee, James Hendler, and Ora Lassila's 2001 Scientific American article. I have not often had customer interest in using Semantic Web technologies and I think that I am starting to understand why people miss the value-add:

Just as AWS provides composable web services, the SW helps information providers publish structured and semantically meaningful data for customers and users, who decide what information to fetch as they need it. These consumers of SW data sources must have a much higher skill set to build automated systems compared to a user of the web who manually navigates around to find the information they need.

So I think that the issue becomes how to make it relatively easy for system designers and software engineers to fetch and consume information. The easy answer is to point them to a good book on SPARQL and RDF data sources. A better answer is probably to provide examples using common programming languages, the "best" libraries for making SPARQL queries, and small sample applications tailored to the types of data that an information provider publishes and to the kind of inferencing that makes sense for discovering implicit data that is not explicitly in the provider's data store.
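As a tiny example of the kind of starter code I have in mind, here is a Java SPARQL query against the public DBpedia endpoint (a minimal sketch using the Jena ARQ library; the query itself is just an illustration):

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

public class SparqlExample {
  public static void main(String[] args) {
    String sparql =
        "select ?p ?o where { " +
        " <http://dbpedia.org/resource/Barack_Obama> ?p ?o } limit 10";
    // run the query remotely against a public SPARQL endpoint:
    QueryExecution qe = QueryExecutionFactory.sparqlService(
        "http://dbpedia.org/sparql", sparql);
    ResultSet results = qe.execSelect();
    while (results.hasNext()) {
      QuerySolution solution = results.nextSolution();
      System.out.println(solution.get("p") + " : " + solution.get("o"));
    }
    qe.close();
  }
}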

I would describe the SW as building and using composable data sources that are defined in terms of ontologies, which make it possible to merge data from different sources and to discover implicit data through inference/reasoning.

Wednesday, October 19, 2011

A letter to my friends and family: the death of American democracy: not dying, but already dead

Hello family and friends,

Democracy in our country is dead, but you would not know it from reading the highly censored corporate-owned and controlled "news"/propaganda media.

If you look at foreign news, youtube, the general Internet, or talk to friends in foreign countries that have a free press, you will understand that it is not rank-and-file cops, but their supervisors, committing what I think can only be called illegal brutality against the "occupy" movement.

The high-ranking police do this because they are ordered by their puppet-masters to do so. There is a huge disparity between what the general public wants and what the corporate lackeys in Congress and the corporate lackey Obama (following in the über corporate lackey W. Bush's footsteps) do. As Warren Buffett said in a recent interview, the USA is now a plutocracy, and that is a shame. There is a good writeup on a writer's arrest: I enjoy Naomi Wolf's work - a writer with reasonable views.

Our founding fathers warned us about banks, control of currency, control by the rich, etc. taking over our country, but the right-wing extremists have removed civics from school curriculums, used control of the media to scare people into giving up their rights as American citizens, and acted against the common good.

One Republican agenda item is to make it difficult for the poor, the elderly without transportation, and students to register to vote, and these anti-American "values" disgust me. Anyone who has anything to do with keeping American citizens from voting by making registration more difficult is an asshole. These are people who either don't understand what our country is supposed to represent, or don't care.

One last comment: the uneducated are most easily swayed by right-wing propaganda, which is why I believe that our educational system has been deconstructed by those on the far right politically.

It is time for people who self-identify as republicans and conservatives to speak up against the un-American right wingers who corrupt our political system. Barry Goldwater had great conservative ideals which I largely agree with, but I bet he is rolling in his grave in disgust at the current bunch of "conservatives."

Wednesday, October 05, 2011

Appreciating Steve Jobs and the people taking part in "Occupy Wall Street"

First: my condolences to Steve Jobs's family and friends. He was an awesome guy who lived on his own terms and made the world a better place by doing things that he loved and was proud of.

I would also like to give a shout out of appreciation to the broad spectrum of Americans who are taking part in "Occupy Wall Street." They are facing state-sponsored brutality: the elite class doesn't like the legal protests, so they put pressure on the government, and the government influences police to do things that in their hearts they know are not right. I have been reading a lot of strong criticism of the police for their brutality in New York City against mostly peaceful American citizens exercising their first amendment rights; I personally try not to blame the police because I think it is more accurate to blame the people who control them. There are shocking videos on youtube of police brutality against US citizens in New York City during these protests. I thought about putting some links here, but these videos are violent and may be upsetting to many people.

The erosion of rights for American citizens is shocking, even though I have been watching the same thing happening in England, so I have been expecting a similar collapse of basic American rights and values in favor of the powerful.


Sunday, October 02, 2011

Experimenting with Clojure 1.3 and Noir 1.2

Noir is a Clojure "mini framework" that is built on top of Compojure. Chris Granger released a new version today that is updated for Clojure 1.3. After working mostly in Clojure last year but not using it very much this year (lots of work for a Java shop), I decided to check out both Clojure 1.3 and Noir 1.2 this afternoon - and I liked what I saw.

The Noir example application uses a recent version of clj-stacktrace and stack traces are much better: Noir prints a well formatted stack trace on any generated web page if an error occurs. This stack trace is very good: it filters out information that you really don't want to see, identifies where the error occurred, and usually includes a useful error message.

This eliminates the only major complaint I have ever had with Clojure. Very cool!

The Noir web site had a link to an article written by Ignacio Thayer on running a Clojure Noir MongoDB app on Heroku, using a free MongoDB account. Worked great. I made a trivial change to src/noir_mongo_heroku/views/welcome.clj to also work with a local MongoDB service:
 (let [mongo-url (get (System/getenv) "MONGOHQ_URL")]
   (if mongo-url
     (let [config (split-mongo-url mongo-url)]
       (mongo! :db (:db config)
               :host (:host config)
               :port (Integer. (:port config)))
       (authenticate (:user config) (:pass config)))
     (mongo! :db "db")))
Noir supplies roughly the same general level of functionality as Sinatra. Noir's development environment, like that of Compojure on which it is layered, supports live code reloading, so if you are used to an interactive dev style like that of Rails and Sinatra, and if you like Lisp (:-) then give it a try.

I am an old Lisp hacker, starting with Lisp on a DEC 10 and getting my first Lisp Machine in 1982. I think that Lisp (Common Lisp, Scheme, Clojure, etc.) mostly appeals to those of us who would (mostly) like to build up our own infrastructure. I say this even though some Lisps have huge libraries: Franz Lisp, Racket, Clojure (both good native libraries and stuff inherited from Java-land), etc. are certainly "batteries included" languages/platforms, but I still characterize Lispers as build-it-ourselves types.


JPA 2 is the only part of Java EE 6 that I like a lot - how it compares to ActiveRecord

First, in Ruby-land: I am a huge fan of both Datamapper and ActiveRecord. Here I am only going to talk about ActiveRecord because it is freshest for me: I have been reading through a few Rails specific books that Obie Fernandez's publisher Addison-Wesley sent me review copies of earlier this year, and these books use ActiveRecord 3.*. Recently I created two small throw-away learning apps using Rails 3.1 to kick the tires on new features, and I used ActiveRecord for each.

One of my customers is a Java EE 6 shop (although we now do use SmartGWT for web apps) and I have been using JPA 2 (Hibernate provider) a lot. In Java-land, I can't imagine using anything else to access relational databases unless you want to use the Hibernate APIs directly, and I would not be inclined to walk that path.

I used to approach object modeling and design differently in Ruby: I would usually start with a relational database and use ActiveRecord's (fairly) automatically generated wrapping APIs. I now think that this is a generally "less good" approach (unless you are working with a large legacy database); now I start by generating models and migrations for the first cut of my object models and then use new migrations to manage changes to my models' schemas. Perhaps a small difference, but I am happier thinking about Ruby model classes than database schemas.

So, my development approach is now very similar whether I am using Ruby+ActiveRecord or Java+JPA 2.

My belief is that it is much faster to do object modeling and data persistence, including the time for changing object models and associated business logic, in Ruby with ActiveRecord. Not even the most die-hard Java developers should seriously argue with this. Given the more expensive development in Java + JPA 2, I think that it is worthwhile listing some things I like less about ActiveRecord:
  1. It is more difficult to read code and understand the models because access methods are not defined in the code. For example, if a class User has a many to many relationship with the class Project, evaluating something like (User.first.public_methods - Object.methods).sort will indeed show that ActiveRecord provides a method projects that returns an array of associated projects. However, if you don't know the conventions that ActiveRecord uses for inherited class methods like has_and_belongs_to_many, has_many, belongs_to, etc. then reading other people's code may be confusing. I find that navigating large numbers of JPA annotated model classes in a good IDE like IntelliJ is quick and makes understanding a very large codebase manageable (see the sketch after this list).
  2. Even a quick reading of the Hibernate Reference Manual shows more capability and options than I am aware of in ActiveRecord.
  3. For large "enterprise" applications where increased cost of development does not matter as much, the availability of off shelf distributed caching options and other highly scalable infrastructure components favors, in my opinion, the Java platform.
That said, valid counter arguments are:
  1. The learning curve for ActiveRecord is relatively small, so after writing a few complete web applications a developer will know most of what they need, and well written Rails apps should be easy to read and understand. I also find that reading through large Rails applications using RubyMine provides almost as good a developer experience as working with large Java projects in IntelliJ.
  2. Most Java developers will never need to use most of what JPA 2 and Hibernate offer.
  3. Worrying about very large scale optimization is almost always premature optimization. I am reading a great book, "The Lean Startup", that makes this point very well: avoid making very long term plans, instead favoring short iterations of planning and measuring, then pivoting or keeping the current plan.
One thing that I believe both JPA 2 and ActiveRecord do very well is managing transitive relationships: both can be set to perform cascading deletes, etc. I think that JPA 2 and Hibernate offer finer control for maintaining associations between objects, but for my work I have found ActiveRecord to always be adequate.
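As a small illustration on the JPA 2 side, cascading behavior is declared on the association itself (a sketch; orphanRemoval plays a role similar to :dependent => :destroy on an ActiveRecord has_many):

import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import java.util.List;

@Entity
public class BlogPost {
  @Id @GeneratedValue
  public Long id;

  // removing a BlogPost also removes its comments:
  @OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
  public List<Comment> comments;
}

@Entity
class Comment {
  @Id @GeneratedValue
  public Long id;
  public String body;
}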

I enjoy using both ActiveRecord and JPA 2 enough that I don't really care which platform my customers prefer, and I almost never try to talk my customers into switching away from their platform of choice.

Thursday, September 29, 2011

Finally saw movie "Crazy Heart" - thinking that an AI could write country western songs

Inspired by James Meehan’s Tale-Spin program and thesis, I have to say: writing an AI program to write country western music seems possible. Prof$t!

Sunday, September 25, 2011

For work I have been using GWT/SmartGWT. For fun: Seaside and Pharo

I have been enjoying working on two customer web apps written in SmartGWT. SmartGWT is built on Google's GWT with the addition of Isomorphic's smart client library that implements very nice data grids and other UI components and also has good support for wiring rich clients to data sources. Still, there is a lot of ceremony involved in GWT and SmartGWT development so I would recommend these technologies for large projects. For me this ceremony and large learning curve is well worth it because I like coding both rich client and server side components in Java in one development environment (IntelliJ).

For side projects that require a web UI I like using both Play! and Rails (and I have done a lot of Rails development work in the last 3 or 4 years).

Just recently, as another side learning project, I have been revisiting the Seaside continuation based web framework for Smalltalk. This week I bought the PDF version of "Dynamic Web Development with Seaside" and, when I get bits of free time, I have really been enjoying reading through the book with a Pharo Smalltalk Seaside image open next to it so I can work along and experiment as I read. You can also read through this book online if you want to check out Seaside.

Seaside provides a very agile development environment, both for editing code in a Smalltalk browser and through hooks in the web apps themselves.

Thursday, September 15, 2011

Google+ Developer APIs

I received an email from Google today about the release of APIs to access public data. I looked at the Ruby example (a Sinatra App) and the Java JSP example during lunch. Looks like a good kick-start for using public Google+ data in our own web apps. There are also examples for other languages. If I have time this weekend I would like to try deploying an app of my own.

Sunday, August 28, 2011

Getting set up to work on the 4th edition of my Java Artificial Intelligence book

For the 3rd edition, I used Eclipse for both development of the Java examples and to prepare the Latex manuscript (using the Eclipse Latex plugin TeXlipse).

I really prefer using IntelliJ for Java development and TeXShop (Mac OS X only) for editing Latex files so I just converted my writing setup.

I have a fairly good idea of what new topics I want to cover but I am still deciding what material from the 3rd edition I want to remove.

Since I released the 3rd edition over three years ago, I have averaged about 300 downloads of the free PDF version a day, with a few sales of the print edition each month. I like making a free version available for people to read, and the traffic it generates for my web site where I advertise my consulting services is great. Unfortunately, in the last couple of years when I see my book in search results it is very often on someone else's web site, which violates the conditions of the Creative Commons non-commercial use license for the PDF version of my book.

My Latex setup supports output of PDF, HTML, and HTML with automatically generated embedded Google ads. I might change my mind, but I think that the free version of the 4th edition will be HTML pages on my web site, perhaps with a few Google ads, but probably not. I'll continue to sell print copies of my book on Lulu and also offer a PDF version for a few dollars.

Changing the way we use the Internet

Aside from searching for online docs, looking up error codes and error messages, etc., I do relatively little web search and browsing anymore compared to even a year ago. I usually rely on good links from Twitter and Google+ to find things worth reading, keep up with new tech, and sometimes even read the news.

In the 1980s, I was a "find useful stuff at public FTP sites" resource at SAIC. I spent time maintaining lists of useful FTP sites and what they contained so I could help people quickly find stuff. Gopher was a step up. Good search engines were a huge improvement for finding stuff on the web.

Now I find myself mostly depending on what interesting people recommend. Even though I am a techie and don't represent a typical Web user, I still think that the trend of using social media to find interesting (and even useful!) material is widespread. It will be interesting to see how the major web companies like Google, Amazon, Microsoft, Yahoo, etc. perform financially in the future because it seems very difficult to predict new disruptive technologies that will capture people's attention and interest.

Tuesday, August 16, 2011

What I have been working on lately

It has been a while since I blogged. Carol and I are leaving soon on a driving trip with our grandkids, daughter, and son-in-law - it was requested that we both leave our laptops at home, a request that we are planning to honor. So, since I will be without a computer for about 10 days, here is a quick catch-up on what I have been doing:

I have been fairly busy lately working for two customers. At a friend's company we are writing rich client web applications using SmartGWT and Java EE 6 on the backend. I have also been working for Compass Labs on building a large graph database and doing some data mining of Freebase data. Google bought Freebase last year and is working on new features and APIs. I am in a private beta for their new Freebase APIs - good stuff!

Tuesday, August 02, 2011

Second edition of "Semantic Web for the Working Ontologist"

First, thanks to Morgan Kaufmann Publishers for sending me a copy of the second edition. Dean Allemang and Jim Hendler did a great job of updating the examples and fixing a few small glitches from the first edition.

I have been extremely busy with work so I have only been able to spend about 90 minutes so-far with the second edition but I hope to give it a careful reading when I am on vacation in a few weeks.

This book is an excellent guide for anyone who wants to invest a fair amount of time to learn how to write semantic web enabled applications: a very comprehensive book.

I recently bought an eBook copy of Bob DuCharme's short book "Learning SPARQL: Querying and Updating with SPARQL 1.1" that I would also like to recommend as a fairly easy introduction to using SPARQL repositories and generally using SPARQL in applications.

If two good new semantic web books were not enough, I have also had fun recently experimenting with Clark & Parsia's new Stardog RDF datastore that offers fast data loading, fast SPARQL queries, OWL 2 support, and built in search.

Saturday, July 09, 2011

Working on a new GWT application for a personal project

I have been using SmartGWT a lot for a customer's project so I have been generally digging into both the Google Web Toolkit (GWT) and the SmartGWT system that builds on top of GWT.

My personal project is a reimplementation of my Rails application for ConsultingWith.me (very old placeholder page) in GWT. I plan on writing a few long blog articles about GWT and/or SmartGWT in the near future but for now I have learned a few things that are worth sharing:
  • Plan ahead on designing data models that essentially live on the client side, that support caching on the client side, and that efficiently support asynchronous data fetches from the server. The view pages (written in Java, compiled to compact and efficient Javascript) must always make asynchronous calls to the local (in the browser) data model because it is unknown whether the local model already has the data or must itself fetch it asynchronously from the server (see the sketch after this list).
  • Stop thinking about session data in the same way that you do in an old style web application: any class variables in your application are always available in the compiled client side code. The only reason to store session data on the server is security: remember that in principle someone could hack the Javascript code in their browser so for some operations the client needs to pass a security token (defined during login/authentication) to the server and the server must verify access privileges for an operation.
  • Developing in your IDE's debugger is a good idea. I use IntelliJ and there are separate panels for server side log views, warnings generated by the Javascript code on the client, and the debugger itself. Very useful!
  • Once you have designed your (mostly) client side data model and determined its interface to your server side datastore (which is the AppEngine datastore in my current project) try to get as much of the server side support written and debugged as you can before seriously starting to write the client side UI. The main reason for this is that if you are only editing client side Java code (that gets compiled to Javascript) you can edit, save changes, refresh your browser and immediately see the results. Whenever you need to modify server side code you must restart the application which really slows down development.
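For the first point in the list above, the client side model ends up looking something like this (a sketch; ItemServiceAsync and all of the names here are hypothetical):

import com.google.gwt.user.client.rpc.AsyncCallback;
import java.util.HashMap;
import java.util.Map;

// hypothetical GWT-RPC async interface for fetching items from the server
interface ItemServiceAsync {
  void fetchItem(String key, AsyncCallback<String> callback);
}

// client side data model: views always call getItem() asynchronously and
// never know whether the answer comes from the cache or from the server
public class ClientItemModel {
  private final Map<String, String> cache = new HashMap<String, String>();
  private final ItemServiceAsync service;

  public ClientItemModel(ItemServiceAsync service) {
    this.service = service;
  }

  public void getItem(final String key,
                      final AsyncCallback<String> callback) {
    String cached = cache.get(key);
    if (cached != null) {
      callback.onSuccess(cached); // already local: answer immediately
      return;
    }
    service.fetchItem(key, new AsyncCallback<String>() {
      public void onSuccess(String value) {
        cache.put(key, value); // remember for next time
        callback.onSuccess(value);
      }
      public void onFailure(Throwable caught) {
        callback.onFailure(caught);
      }
    });
  }
}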
Writing a rich web application in GWT certainly takes a different mindset than simpler AJAX applications but it is worth the effort in some cases. For my current personal project, the original Rails implementation had great functionality but was not fluid enough to use continuously throughout my work day to track time and write work notes. Because this is for my own use and there is no deadline for finishing it is worth my time to get it just right, working on it an evening or two a week.

At the beginning of this year I was very interested in SproutCore for the same general developer use cases as GWT. SproutCore uses the same sort of asynchronous event handling style as GWT with a big difference that you write the client in Javascript instead of Java. This year my largest customer evaluated both SproutCore and GWT (and derivatives like SmartGWT and GWT-Ext). Since they are a Java shop they decided to use SmartGWT which gave me an incentive to dive more into GWT related technologies. As with most of you, my technical interests are sometimes (but not always!) driven by what people pay me to work on.

Tuesday, July 05, 2011

Apache Google Wave in a Box project is starting to look good

It has been a long time since I tried to run the wave protocol stuff. I just grabbed the latest source and followed these directions on my MacBook. I was quickly up and running with no problems.

The web UI is a lot simpler than Wave but looks similar. Wave in a Box is looking good as a development platform - something to customize for your organization.

I used two browsers, Chrome and Safari, to create two test accounts and, as I expected, the real time messaging, etc. worked fine.

Wednesday, June 29, 2011

Google+ seems to be very well done

Thanks to Marc Chung for the Google+ invite. The web UI is very slick and I know I am going to have fun with g+. Anyone on Google+, let me know if you want to be in my Artificial Intelligence, Clojure, and/or Ruby groups.

Monday, June 27, 2011

I am using SmartGWT on two projects

My largest current customer uses the commercial version of Isomorphic's Smart GWT, and I have spent a few evenings working on one of my own projects (that I will probably open source when it is done) that uses the free LGPL version. 7/9/2011 edit: I ended up re-writing this in straight-up GWT. I should also mention that I have signed a consulting agreement with Isomorphic (svn commit rights :-) but I have not had time in my schedule to do any work for them yet.

SmartGWT uses Google's Java to Javascript compiler but instead of using the standard GWT UI components it uses Isomorphic's SmartClient Javascript library (suitably wrapped for extending GWT). The commercial version's sweet spot is reasonably easy integration with server side data sources like relational databases, JSON web services, etc. The free LGPL version provides an example of a client side data source that you can hook up with custom code to web services that you write yourself. For my for-fun side project my back end server processes requests and returns JSON data: fairly easy but not as nice as using Isomorphic's proprietary data sources.

The developer experience is similar to using Google's GWT: you program the UI in Java and it gets compiled to Javascript. If you are only modifying client side code, then run in the debug/developer mode and your changes to Java UI code get recompiled and loaded into your web browser when you do a refresh.

Sunday, June 19, 2011

Prelude to learning Clojure and Scala: learn some Haskell

I worked through part of the "Real World Haskell" book a few years ago, but settled on mostly using Clojure as a functional language, with some Scala also. I bought "Learn You a Haskell for Great Good!" this week and I have been enjoying the gentle approach to learning the language. Miran Lipovača did a good job writing this book. ("Real World Haskell" is also excellent.)

One thing that occurred to me is that since Clojure and Scala borrow so many good ideas from Haskell, learning some Haskell before diving into either Clojure or Scala might be a good idea.

Wednesday, June 15, 2011

Largest public SPARQL endpoint: sparql.sindice.com

The Sindice project has transitioned from a university and consortium project of DERI to a commercial company. Check out the SPARQL endpoint web form - very impressive. During lunch I tried the Sindice Java client library; it was easy to use, but for some reason it does not support direct SPARQL queries.

Sunday, June 12, 2011

Programmer study time

I love both of my jobs (programming and writing) as long as I don't overdo it and take a lot of time off for other activities like hiking, kayaking, playing musical instruments, and cooking.

I have another "down time" activity that is both fun and relaxing for me: studying things that help me with my jobs. For example, when I first adopted Ruby as my primary scripting language and also started developing using Rails, I spent a lot of time reading through the C implementation of Ruby, the Ruby libraries, and the Rails source code. I find this kind of study relaxing because there are no deliverables and things learned studying the implementation of tools I use really pays off in increased productivity and learning new programming idioms and techniques. I used to base a lot of my work on the Tomcat server and ten years ago I made a real effort to understand its implementation. When I was very young I worked as a systems programmer and kept source listings of interesting parts of the operating system at home.

My "reading" activities today included two hours looking through some of the code in the EJB container sub-package of the Glassfish web application server. One of my customers runs much of their business on Glassfish so I am motivated to understand the platform. I learn things reading through Glassfish code that I never would reading books on EJB 3.1.

Time is our most precious possession and certainly it is not to be wasted. That said, spending at least several hours a month carefully studying the code in a few of the open source software tools I use is time well spent.

Wednesday, May 04, 2011

Polyglot programmers: setting up multi-language access to data

For those of us who tend to use several programming languages, doing some up-front work to make sure that we have access to all required data stores in each language makes it possible to really pick the best language for each task. I have a side project that I have been coding on for over ten years, with lots of code in Common Lisp, Scheme, Java, and Ruby. If I can ever get a long enough break from my consulting business, I have a few good ideas for how to monetize some of this work. I was working on this project during a recent cross-country flight: I started adding a bit of new functionality in Ruby, switched to Common Lisp, and the implementation was a bit easier. I needed access to a PostgreSQL database I use (taking advantage of PostgreSQL's built-in text indexing and search), and once I had Internet access after the flight (for some quick reference), I worked out the few lines of code to interface with my annotated news data collection. Here is a small code snippet in case you need to do the same thing (using the lower level pg library instead of clsql because I only need to hit PostgreSQL):
(ql:quickload "pg")

(pg:with-pg-connection
  (conn "kbsportal_development" "postgres"
         :host "127.0.0.1" :password "password")
  (postgresql::pgresult-tuples
    (pg:pg-exec conn "select * from news")))

;; search support:
(pg:with-pg-connection
  (conn "kbsportal_development" "postgres"
        :host "127.0.0.1" :password "password")
  (postgresql::pgresult-tuples
    (pg:pg-exec conn "select * from news where to_tsvector(content) @@ to_tsquery('obama | congress')")))
I added this snippet to a directory full of snippets for common tasks for each language that I use.

Saturday, April 30, 2011

Text search in SimpleDB: a Ruby example

You might want to use SimpleDB for storage and to support text indexing and search if you do not want to run and administer Solr yourself. Here is a little snippet that shows how to store searchable documents in SimpleDB:
require 'rubygems'
require 'aws_sdb'

SERVICE = AwsSdb::Service.new

# assuming that this domain is already created
DOMAIN = "some_test_domain_7854854"

class Document

  def initialize name, text
    words = (name + ' ' + text).downcase.split.uniq
    attributes = {:words => words, :text => text}
    SERVICE.put_attributes(DOMAIN, name, attributes)
  end
  
  def Document.search query
    # The last inject takes the intersection and
    # insures that all search terms are present:
    keys = query.downcase.split.collect {|x|
      SERVICE.query(DOMAIN,
                    "['words' starts-with '#{x}']")[0]
    }.inject {|x, y| x & y }
    keys.collect {|key|
                  SERVICE.get_attributes(DOMAIN, key)}
  end

end

Document.new('title1',
             'The bird flew to the lake for water')
Document.new('title2',
             'The dog chased the cat')

p Document.search 'flew lake'
The formatting of this code snippet is odd because I was trying to get short lines to fit the page width. This code snippet is not terribly efficient, but since the first 25 Amazon SimpleDB Machine Hours consumed per month are free for your Amazon AWS account, using this code example in your applications can end up being almost free (there are small data storage and bandwidth charges), and you get the advantage of no administration hassles. The output for the above code snippet is:
[{"text"=>["The bird flew to the lake for some water"],
  "words"=>["bird", "flew", "for", "lake", "the",
            "title1", "to", "water"]}]
There are two improvements that you can implement: remove noise/stop words from the words attribute and make the code multithreaded to execute the individual SimpleDB queries in parallel when possible to do so. I was trying to make this example code snippet concise. For simple and/or moderately used applications these improvements aren't necessary.

If you run this example remotely from your laptop, notice that remote SimpleDB access is a little slow. When run on a small EC2 instance, it takes about 0.05 seconds to add a "document" to SimpleDB and about 0.1 seconds to search using two search terms.

Thursday, April 21, 2011

And the best JVM replacement language for Java is: Java?

Although I use Ruby (mostly Rails) and Common Lisp on many customer projects, I am heavily invested in the Java platform and I don't see that changing in the next ten years or so.

Java is more than a little heavy on ceremony however, and I would like a really agile language for the JVM. I have used Clojure a lot in the last year for work on one customer's project, but at least for now the lack of concise and useful runtime error backtraces kills some of the joy of using Clojure. It is a really nice language and community however, and I expect that in a few years Clojure may be my primary JVM language. I love coding in Ruby and the JRuby developers do a great job moving the sub-platform forward. However, except for large Rails applications, I don't see myself writing very large applications in Ruby: for me Ruby is a scripting language for getting stuff done quickly and easily. I do like Scala but the learning curve is steep, and that means it is difficult to find highly skilled Scala developers.

Java has the sweet spot of lots of great tools and a rock solid infrastructure. So, how to make Java more agile? I do a few things that help: I use public attributes so I don't bother with getters/setters anymore unless I am using a framework that needs them for introspection. I very much like JPA, but I am growing less fond of the rest of the Java EE 6 stack - really a lot of layers between designing and writing code and runtime; too much abstraction for my tastes. The Play! Framework is great, in general, and I am using it on my personal project and I look forward to seeing how Play! develops as an agile platform over the next few years.

Tuesday, April 19, 2011

Some new Platform as a Service providers: cloudfoundry.com and dotcloud.com

I am on vacation, so I have not had much chance to try the beta invites I just received for cloudfoundry.com and dotcloud.com, but both look promising as works in progress.

For now, Cloud Foundry is set up for Ruby Rack applications (like Rails and Sinatra) and Java Spring apps. They currently support MongoDB, MySQL and Redis. They will release the core software if you want to run a cloud on your own servers.

Dotcloud supports a wide range of platforms and data stores. Their roadmap shows what is available right now and what is planned.

Both beta programs are free for now. It will be interesting to see what the costs are.

Wednesday, April 13, 2011

(Roughly) comparing Play! version 1.2 with Rails

Both the Play! and Rails frameworks implement MVC and have very agile development environments. Play!, being written in Java (but also supporting Scala development), accomplishes this agility by using the Eclipse incremental Java compiler, so if you edit any Java code or HTML template files (with embedded Java/Groovy expressions), you immediately see the results after refreshing your web browser.

While Play! is not nearly as complete a stack as Rails, it does include modules for
  • MongoDB
  • AppEngine
  • Objectify
  • GWT
  • Search
  • PDF generation of any view
  • Scala use
  • CoffeeScript
  • OpenAuth working with Google, Yahoo, Twitter, etc.
  • Simple CRUD scaffolding
  • Facebook Connect and Graph API
  • Lucene search of JPA models
  • etc.

I have several years of Rails experience and I am using Java EE 6 for a customer project. With this background, I put Play! in the sweet spot between Java EE 6 and Rails: easy to learn if you know Java and supports agile development. My favorite part of Java EE 6 is JPA, which Play! supports.

I have played with Play! off and on for over a year, but just for a few hours at a time, and never on any serious projects. (This is largely because most people who hire me usually want me to do Lisp or Ruby development.) I have more or less decided to use Play! for one of my own projects because I already have so much reusable Java code written for it, and I like the interactive Play! development process. My wife and I are just starting a vacation, and finding myself in a quiet place with time on my hands (we catch an early flight tomorrow morning and are staying near the airport), this afternoon I reimplemented in Java + JPA + Play! a bit of data modeling code that I wrote in Clojure, Ruby, and Common Lisp last weekend. I really have been struggling with the decision of which language and framework to use, so I am in experimentation mode! BTW, I am using PostgreSQL with its native indexing and search functionality, and I find that JPA and Java object models mix fairly well by mostly using JPA with some native queries, like this contrived example using PostgreSQL's indexing and search functionality:
List results = News.em().createNativeQuery(
    "select * from news where to_tsvector(content) " +
    "@@ to_tsquery('japan | nuclear')",
    News.class).getResultList();
that maps results back to my Java POJOs.

Sunday, April 03, 2011

Amazon Cloud Player: make sure you take advantage of their introductory offer

I just purchased the MP3 album "Johnny Winter And / Live" for $5 and got a $20 one-year upgrade to 20 GB of cloud storage - a sweet deal, though considering that you always get 5 GB free, this may not be much of an added value. Amazon has a nifty uploader application that looked at my iTunes MP3s and playlists and is cloning them to Amazon Cloud Player automatically. My entire iTunes library will only take up a few GBs after it is automatically uploaded. Sometimes Amazon kills Apple's iTunes store on price: I was about to buy a few tracks on iTunes last year and then realized I could buy the entire album as MP3s on Amazon for not much more.

Amazon seems to be investing in introductory offers like the Cloud Player upgrade and the first-year AWS developer package (basically free to develop and deploy for one year). These are certainly expensive to provide as free services, but Amazon is playing the long game. My ordered list of the most impressive technology companies:
  • Amazon
  • Google
  • Apple
  • Netflix
Notice that Microsoft is not on my list :-)

Tuesday, March 22, 2011

The Cloud, The Cloud

Reddit was down for over 5 hours last week because of problems with EBS volumes on AWS. Netflix, another AWS user, was down for a few hours today, though I don't yet know what the difficulties were. So Amazon has problems, and the web is full of people complaining about occasional problems with Google's AppEngine cloud hosting service.

No one likes failing to support users 24x7, and users don't like interrupted service, but I think these occasional outages are just growing pains as we move towards a new way of deploying applications that costs less money, requires fewer staff resources, and is likely more energy efficient.

I think that I have only had one customer in the last 3 years who did not at least partially deploy on Amazon's AWS. This is the future, and we need to learn how to work around problems and take advantage of resource savings when we can.

Sunday, March 20, 2011

Node.js

I read this morning about a new beta book from O'Reilly, Up and Running With Node. Before skimming through the beta book, I took a half hour to review the standard Node.js Manual & Documentation. The beta book is very nice (recommended!) and I especially enjoyed the discussion about using the Node REPL. I was lucky enough to receive a Heroku Node beta invitation last year, but so far all I have done with Node is build new versions every few months and play with the examples in the documentation. I think that this is worth the time, even with a very full consulting schedule, because JavaScript is probably going to become as popular for server side development as it already is for browser development. This will happen because of the efficiency of V8, the ability to share code between server and client side, and the inherent scalability of event-driven, rather than multithreaded, systems.

Wednesday, March 16, 2011

David Rumelhart passed away. RIP to a good guy.

Before David won a MacArthur Fellowship, he was a professor at UCSD and co-wrote a few great books on artificial neural networks that really helped me a lot. My company also hired him as a consultant, and he gave me advice in the 1980s when I implemented the 12 then-popular neural network learning and recall algorithms for our ANSim software product. It was a great experience sitting in his living room talking about neural networks.

He was a nice guy and I am sure that he helped many people with his work.

MongoDB 1.8 released - and there is joy throughout the land

I have been running 1.7.5 on my laptop and just upgraded to the stable 1.8 release. While the normal way to run MongoDB (at least in my work) is to use read-only slaves for analytics, etc., I am still glad to see the single server robustness changes, including optional journaling. I also noticed that in the admin shell, "show dbs" now provides database size estimates; a small example of reading the same information from Ruby follows.
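
Here is a small sketch of reading those database size estimates programmatically, assuming the 1.x mongo-ruby-driver API:
require 'rubygems'
require 'mongo'

# Connect to a local mongod and print each database name
# with its estimated size on disk, in bytes:
connection = Mongo::Connection.new('localhost', 27017)
connection.database_info.each do |name, size_on_disk|
  puts "#{name}: #{size_on_disk} bytes"
end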

Another useful change is replica set authentication using identical key files that are placed on each server. You then let one server know about the others (as before). You can read about other improvements here.

Sunday, March 13, 2011

Nourish and manage your career, not your job

I have been working close to full time since the beginning of this year for two customers. This is unusual for me since I have usually capped my work week at a maximum of 32 hours over the last 25 years. I have been enjoying the work and extra earnings, but I believe that working too many hours carries a real cost and some risks.

It is far more important to manage our own careers than any particular job. You don't own your job, but you do own your career. Just as you maintain your home and your car, careers require more or less constant maintenance, including:
  • Lifelong learning of new technical skills.
  • Developing skills that enable people you deal with to also be successful: strive for win-win outcomes.
  • Networking that supports finding work, getting second opinions on important decisions, and leads on new interesting and useful technologies.
  • Time for self analysis: what has worked for you in your career (and life!), possible improvements, and understanding situations and attitudes to avoid in the future.

Individual jobs, including consulting gigs, are just tools for improving a career. While we owe employers our best effort, honesty, and transparency (e.g., share bad news earlier rather than later), we also owe it to ourselves to nourish and care for our own careers.

Sunday, February 20, 2011

Don't stray too far from well supported language, tool and platform combinations

I have been doing a lot of customer work lately using Java EE 6 and Glassfish. Fairly nice development environment even if Java EE 6 is heavy on ceremony (but better than J2EE).

Just for fun today, I took small toy apps I had written in Rails and in Clojure + Compojure and shoe-horned them to run on Glassfish. It was an interesting exercise but, unless there is an overwhelming need for a custom deployment setup, it is so much better to go with the flow and stick with well crafted and mature setups like
  • Java and Java EE 6
  • Clojure + Compojure running with embedded Jetty behind nginx for serving static assets
  • Rails app hosted on Heroku
  • Web apps written in either Java or Python hosted on AppEngine using the supported frameworks
  • etc.
I must admit that I enjoy hacking not only code but also deployment schemes - and I enjoy it too much sometimes. Sometimes it is worthwhile, but most often not.

Monday, February 07, 2011

Curated data

It is difficult to predict which data will have long term value, so it is often safest to archive everything. With data storage costs approaching zero, I think we can expect high value data to last forever, barring a nuclear war or the collapse of society.

Curated data has a higher value than saving "everything." I think that the search engine Blekko is interesting and useful because of what it does not have: its human powered curation yields fewer results but very little spam. The Guardian's curated structured data stores have much higher value than the original raw data (from government sources, etc.). I can imagine The Guardian's curated data becoming a permanent part of our history, much as the ancient stone tablets we see in museums have.

I have long planned on providing curated news and technology data with semantic markup, either on my ancient knowledgebooks.com domain or on the new placeholder kbsportal.com, but I seldom have free time slots because of my consulting business. Hint: I would like to have a few partners who are into statistical natural language processing, and general data geeks, to help me with this. I don't know if it would end up being a viable business or just a public service portal.

Sunday, February 06, 2011

Big Data

For the last few decades, it seems like every few years I work on a project that stretches my expectations of how much data can be effectively processed (starting in the 1980s: processing world-wide seismic data to detect underground nuclear explosions, processing all credit card phone calls to detect theft, etc.).

I was in meetings for three days last week with a new customer, and as we talked about their current needs, I made mental notes of what information they are probably not capturing that they should, because it is likely to be valuable in the future in ways that are difficult to predict.

To generalize a bit: every customer interaction with a company's web sites should be captured (including navigation trails through the sites, to model what people are personally interested in), along with every interaction with support staff, every purchase and return, etc. A minimal sketch of this kind of event capture follows.
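
This sketch appends each interaction as a line of JSON to a log file; the log_interaction helper and its field names are made up for illustration, and a real system would write to a database or message queue instead:
require 'rubygems'
require 'json'
require 'time'

# Append one customer interaction event, as a line of JSON,
# to an append-only log file:
def log_interaction user_id, event_type, payload
  event = {
    'user_id'   => user_id,
    'event'     => event_type,  # e.g. 'page_view', 'purchase', 'support_call'
    'payload'   => payload,
    'timestamp' => Time.now.utc.iso8601
  }
  File.open('interactions.log', 'a') {|f| f.puts(event.to_json)}
end

log_interaction(42, 'page_view',
                {'path' => '/products/123', 'referrer' => '/search'})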

Amazon has set a high standard in user modeling with its suggestions for products that you might want to buy. Collecting data on your customers should not make them feel creeped out about interacting with your company but rather should make them feel important and well-served.

I am a voracious reader, both for fun and to continue a lifelong education. I have found myself shifting some of my reading attention from computer science to statistics, math, and general business intelligence. (That said, I still read several computer science books per month.)

I was disappointed that I could not attend the Strata 2011 data conference, but I did enjoy watching the keynote videos: fairly good material and worth some viewing time.

Monday, January 31, 2011

Two good books on AppEngine development

The publisher PACKT sent me a review copy of Google App Engine Java and GWT Application Development by my friend (via email correspondence) Daniel Guermeur and his co-author Amy Unruh (thanks!), and I bought Code in the Cloud: Programming Google AppEngine by Mark C. Chu-Carroll.

Both are very good books and complement each other.

Mark's book gives an interesting insight into AppEngine from someone who works at Google. He covers both Python and Java development. I relied on the Python sections of the book when I wrote a Python based AppEngine application last December. I am not much of a Python programmer, but his book got me going quickly and I had few problems. He uses Google's Django support for AppEngine for the Python examples and the Google Web Toolkit (GWT) for the Java examples.

Daniel's and Amy's book is a hands-on guide to using the Eclipse IDE and GWT to develop AppEngine applications in Java. They use JDO for the book's examples, which is probably best for the general reader, but my own preference is the lighter weight objectify-appengine datastore wrapper. The example application, Connectr, is neat: it gathers social media and provides some aggregation services. I like that the application is interesting and contains useful code. I would particularly recommend this book if you want to use Eclipse and GWT to build AppEngine applications and want everything you need in one tutorial and reference book.

Friday, January 21, 2011

Social networking: why fewer connections may be better

I have a very public web presence through this blog and my web site. I enjoy sharing information and communicating with people via email and occasionally (with a heads-up email first) talking on the telephone. I also spend an hour a week giving free advice to students on their projects and employment hints, and to a more limited degree I give feedback on technical ideas. I can enjoy doing this because email is asynchronous: I can handle these interactions when they don't interfere with my work or research.

In the past I have accepted connections on LinkedIn and Facebook from people who I don't know, just to be friendly. However, there is a cost to this.

LinkedIn frequently sends out email status updates of what colleagues (current and former) are doing. I like this for people I know well, either personally or through years of email interactions. However, status updates from people I am not closely associated with take time even to ignore.

The situation is worse on Facebook. I used to accept friend connections from anyone whose public profile suggested they were a computer scientist (in addition to people I know).

I am a native English speaker with reading knowledge of French. One of the great things about the web is communicating with people from around the world who have similar interests. Due to my limited natural language skills, however, this has to be in English. Sometimes half of the updates on my Facebook page are in languages I can't read.

I am leaving all of my LinkedIn connections as-is but in the last few weeks I have been removing friend connections on Facebook except for close friends, family, and people I have worked with in the past. Checking my Facebook page a few times a week is much more enjoyable.

For other communication, email is best.

Sunday, January 09, 2011

Java EE 6 is actually pretty good

I do most of my development in very agile programming environments like Ruby and Rails (with DataMapper or ActiveRecord), Clojure with MongoDB, etc. I like languages that have an interactive REPL.

Recently I have taken on work for a new customer (a life-long friend's company), helping them convert to Java EE 6, and I must say that Java EE 6 is very well done. I wrote a J2EE book many years ago and used to be into Java server side development, but I drifted to other platforms, partly because that is what customers hired me to work on and partly because of my own technical interests. In the last 5 years I have probably spent about 30% of my time working with Java (customers mostly want Lisp and Ruby development).

Java EE 6 is so much better than J2EE. Writing POJOs (with EJB annotations), unit testing them as simple POJOs, and then integration testing them in an EJB container makes for a reasonable programming environment. I still think that you get much more bang for your programming buck with Ruby and Rails but there are applications that fit very well into the Java EE 6 infrastructure.

Sunday, January 02, 2011

Recommended: Niall Ferguson's "The Ascent of Money"

I just finished watching the DVD of the PBS series tonight; the book covers the same material. Harvard professor and historian Niall Ferguson puts in its place the (in my opinion) misguided view that governments can, in the long term, spend their way out of problems.

The PBS series is especially fun to watch (in addition to being very educational) because, as Ferguson traces financial "bubbles" throughout history, there is location video footage that helped me picture what happened in ancient, medieval, and modern times.

Worth watching! (There are many excerpts on YouTube if you don't have time to watch the 4-hour series.)

I rented the DVD from Netflix, and their summary is good:
British historian and author Niall Ferguson explains how big money works today as well as the causes of and solutions to economic catastrophes in this extended version The Ascent of Money documentary. Through interviews with top experts, such as former Federal Reserve Chairman Paul Volcker and American currency speculator George Soros, the intricate world of finance, including global commerce, banking and lending, is examined thoroughly.
One bit of advice from the series is that there is no firm economic security in anything but income: for individuals, this is the ability to earn money, and for governments, the ability to raise sufficient taxes to pay for most expenditures. The PBS documentary makes this point very well.

Saturday, January 01, 2011

Happy New Year

I start each day by enumerating the good things in my life (computer scientist-speak for "counting my blessings") followed by meditation and relaxation techniques. A nice way to start each day.

I would like to do the same today, the first day of 2011 (note that 2011 is the sum of eleven consecutive primes: 157+163+167+173+179+181+191+193+197+199+211).
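
Just for fun, a quick check of that sum in Ruby:
puts [157, 163, 167, 173, 179, 181,
      191, 193, 197, 199, 211].inject(:+)
# prints 2011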

I am grateful for my family and friends, living in one of the most beautiful places in the world (Sedona Arizona), having interesting work with great customers, resources to self-fund my own research as a computer scientist, and time to enjoy my hobbies (cooking, hiking and reading).

Happy New Year!