Sunday, May 13, 2012

I am adjusting to a mobile digital life: the acceleration of convenience

The first time I used a computer was in 1962 when I was 11 years old. It was not convenient: my Dad had to take me to his office to use one of the early timesharing systems using a local teletype machine with punched tape for saving and reloading stuff. A few years later I took an extension course at a local UC campus, FORTRAN with punched cards with nice keypunch facilities: definitely a large improvement!

In the 1980s life really got good: my own Xerox 1108 Lisp Machine and an Internet connection.

In the decades starting in 1990 and 2000 there were continual improvements but progress was gradual: better Internet connections, the development of the Web, improvements in software tools like code repositories, IDEs, etc.

Skipping ahead to the present time, I am trying to adjust my digital life, including my work and writing flows, to a more mobile lifestyle.

Probably the biggest change for me is that I used to keep just about everything that I worked with and touched in a repository (first cvs, then svn, and now git). I no longer need incremental backups of all of my work because for low priority stuff I can always dig back in time using simple Time Machine backups of my MacBook Air. I do still use a lot of git repositories for high value assets like:

  • Software development projects for customers
  • My open source projects at github
  • My own high value projects such as software and other assets for my web properties and commercial software products
This is a major change for me. I keep other assets that don't really need versioning on DropBox:
  • All assets for my writing projects: books I have written in the past, my current writing project, etc.
  • A million + 1 small code snippets and small bits of program code organized by all languages that I use (Common Lisp, Java, Ruby, Scheme, Haskell, Python, Prolog, etc.). Whenever I figure out a new coding technique, how to do something generally useful, etc., I always like to save a little code snippet to refer to later.
  • Favorite pictures taken in my lifetime (old ones are now digitized)
  • Favorite videos taken in my lifetime (lots of digitally converted 8mm videos from my childhood through recent vacation videos: everything!)
  • All of my personal notes organized by travel logs, personal writing, etc.
  • A large fraction of the huge amount of music that I have purchased, organized by artist (and sometimes also by album)
  • Most of the eBooks that I have ever purchased if they are not in the Kindle format.
For Kindle books not purchased through Amazon, I email them to my Kindle/Amazon email address and let Amazon manage keeping everything in-sync on all of my devices. Kindle platform syncing the current reading location across devices is a huge convenience since I routinely read on my Kindle, iPad, and MacBook Air.

I am careful not to keep anything that is highly proprietary on DropBox (i.e., assets for customer projects). I used to also keep well organized directories of useful reading material found on the web (e.g., PDFs on AI, machine learning, NLP, server deployments, etc.). I have very recently taken the rather large step of throwing all of this material into Evernote, backing it up, and deleting it.

The really big win in relying on DropBox, Evernote, and the Kindle platforms is being able to switch between computers easily and also have most of my stuff available on my iPad and Droid cellphone. I use several computers and it is a slight nuisance doing a git pull every time I switch computers. I like to use mobile devices for reading and general thinking time and this is a lot easier now. I have been running Apple's beta OS X Mountain Lion that has iCloud integration. There is a chance that, if iCloud is very well implemented, I may slowly transition away from DropBox to iCloud and switch to using an iPhone. However, DropBox is very well implemented so Apple would need to make iCloud's implementation across all Apple devices very, very good for me to make the switch.

Saving the last big win for last: a mobile digital life promotes more of what Clojure creator (and general programming mentor) Rich Hickey calls "hammock time": the time you spend away from the computer thinking. I still use a pad of paper and a pen for away-from-the-computer thinking time, but I find mobile devices augment this activity nicely.

Friday, May 11, 2012

Since Bing search and spelling APIs are no longer free, I did a quick survey of other services and signed up for Yahoo BOSS search.

I signed up for Google's APIs around 2002 and used them until they stopped the service. I later switched to Microsoft's Bing APIs: the cost was great, free! Bing's new entry level cost of $40/month for 20,000 queries does not match my use case. I wish they had a $5/month plan for 2,000 queries.

I do a low volume of search for personal research projects and prototypes. Yahoo BOSS's service has an attractive price but I worry about the service being terminated with the deal with Microsoft. Google offers custom site search, but I didn't see any options for a general paid-for web search API.

So, I signed up with Yahoo BOSS for search and if the service is ever terminated, I will look elsewhere. The price is good, $0.80 per 1000 queries.
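As a rough sanity check on the prices quoted above, here is a small Ruby sketch that compares monthly cost at a few query volumes. It assumes that Bing's entry tier is a flat $40 per month covering up to 20,000 queries and that BOSS bills strictly per query at $0.80 per 1000; both billing models are my reading of the quoted prices, and the helper names are made up for illustration.

```ruby
# Prices quoted in this post; the billing models are assumptions.
BING_FLAT_MONTHLY = 40.00       # dollars, assumed to cover up to 20,000 queries
BING_INCLUDED_QUERIES = 20_000
BOSS_PER_THOUSAND = 0.80        # dollars per 1000 queries

# Hypothetical helper: BOSS cost if billed strictly per query.
def boss_monthly_cost(queries)
  (queries / 1000.0 * BOSS_PER_THOUSAND).round(2)
end

# Hypothetical helper: Bing cost as a flat fee; nil when over the tier.
def bing_monthly_cost(queries)
  queries <= BING_INCLUDED_QUERIES ? BING_FLAT_MONTHLY : nil
end

[500, 2_000, 20_000].each do |q|
  puts "#{q} queries/month: BOSS $#{'%.2f' % boss_monthly_cost(q)}, " \
       "Bing $#{'%.2f' % bing_monthly_cost(q)}"
end
```

Under these assumptions, a low-volume user at 2,000 queries a month would pay about $1.60 with BOSS versus the flat $40 with Bing, which is why the per-query pricing fits a research workload better.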

2012-05-19 edit: Bing search now has a free tier for up to 5000 search requests per month. Cool! For research purposes, this is good enough.

Monday, April 23, 2012

Back from Vacation, catching up on work, and a Java retrospective

Carol and I got home from visiting family in Rhode Island late last night and I have been working catching up on customer work since 4am this morning. Here is a picture of my birthday dinner.
The kids got two pound lobsters for everyone - I am holding mine and a plate of grilled veggies.

Java: I used to do research programming in Common Lisp or Scheme. For ideas that worked I often re-coded them in Java for better deployment options, for better alignment with customer preferences, etc. I fell out of that habit six or seven years ago because I had a long term Common Lisp development job and also really got into Ruby and Clojure development. Both Ruby and Clojure are great for research programming and for some types of applications the deployment options are good.

That said, even Clojure which is efficient (about 1/3 the speed of Java and uses about 3 times the amount of memory) wastes computing resources (hits the environment and the pocketbook). Ruby is much less efficient than Clojure.

I started thinking about efficiency after talking on my flight home to an Intel engineer who was sitting next to me on the plane. He was reading through a book on DSP with tons of differential equations. He said that he used C sometimes but felt better about projects that were mostly done in assembler language because it was a shame to waste processing cycles. We ended up talking about the efficiency of programming languages.

I still believe that it makes sense to prototype systems in whatever language and platform make sense for getting working code in place quickly. The decision is whether to recode for more efficient deployments and if so when to do it.

Sunday, March 18, 2012

Using Wolfram Alpha from Clojure

I have been blown away in the last year by Wolfram Alpha but I haven't done much with the developer's APIs. To make it easier to experiment with Wolfram Alpha, I wrote a simple Clojure wrapper for the Java APIs. You can get a copy at github.

In case you don't want to grab the github repo, here is most of the code:

(ns wolfram)

(def appid (System/getenv "WOLFRAM_APP_ID"))
(def engine (com.wolfram.alpha.WAEngine.))
(.setAppID engine appid)
(.addFormat engine "plaintext")

(defn query [input]
  (let [query (.createQuery engine)]
    (.setInput query input)
    (let [result (.performQuery engine query)]
      {:pods
       (for [pod (.getPods result)]
         {:title (.getTitle pod)
          :sub-pods
          (for [sub-pod (.getSubpods pod)]
            (for [contents (.getContents sub-pod)]
              (.getText contents)))})})))
Notice that you need to set the API key for your application in an environment variable. You get 2000 free API calls a month. Here is some sample output (with some output removed for brevity):
test=> (query "distance between San Diego and San Francisco")
{:pods ({:title "Input interpretation", :sub-pods (("distance | from | San Diego, California\nto | San Francisco, California"))} {:title "Result", :sub-pods (("453.7 miles"))} {:title "Unit conversions", :sub-pods (("730.2 km  (kilometers)") ("730194 meters") ("7.302×10^7 cm  (centimeters)") ("394.3 nmi  (nautical miles)"))} {:title "Direct travel times", :sub-pods (("aircraft  (550 mph) | 49 minutes 30 seconds\nsound | 36 minutes\nlight in fiber | 3.41 ms  (milliseconds)\nlight in vacuum | 2.44 ms  (milliseconds)\n(assuming constant-speed great-circle path)"))} {:title "Map", :sub-pods ((""))})}
user=> (query "pi")
{:pods ({:title "Input", :sub-pods (("pi"))} {:title "Decimal approximation", :sub-pods (("3.1415926535897932384626433832795028841971693993751058..."))} {:title "Property", :sub-pods (("pi is a transcendental number"))} {:title "Number line", :sub-pods ((""))} {:title "Continued fraction", :sub-pods (("[3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, ...]"))} {:title "Alternative representations", :sub-pods (("pi = 180 °") ("pi = -i log(-1)") ("pi = cos^(-1)(-1)"))} {:title "Series representations", :sub-pods (("pi = 4 sum_(k=0)^infinity(-1)^k/(2 k+1)") ("pi = -2+2 sum_(k=1)^infinity2^k/(binomial(2 k, k))") ("pi = sum_(k=0)^infinity(50 k-6)/(2^k binomial(3 k, k))"))} {:title "Integral representations", :sub-pods (("pi = 2 integral_0^infinity1/(t^2+1) dt") ("pi = 4 integral_0^1 sqrt(1-t^2) dt") ("pi = 2 integral_0^infinity(sin(t))/t dt"))})}
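
The nested result shape (pods containing sub-pods containing text entries) is easy to post-process. Here is a minimal sketch in Ruby, the other language I use for examples on this blog, that flattens such a result into a map from pod title to its non-empty texts. The sample hash is an abridged copy of the first REPL output above, retyped by hand as Ruby data; the helper name pod_texts is hypothetical and not part of the wrapper.

```ruby
# Sample data copied (abridged) from the REPL output above and rewritten
# as a Ruby hash purely for illustration.
SAMPLE_RESULT = {
  pods: [
    { title: "Input interpretation",
      sub_pods: [["distance | from | San Diego, California\nto | San Francisco, California"]] },
    { title: "Result", sub_pods: [["453.7 miles"]] },
    { title: "Map", sub_pods: [[""]] }
  ]
}

# Hypothetical helper: map each pod title to its non-empty text entries,
# dropping pods (like "Map") that carry no plaintext.
def pod_texts(result)
  result[:pods].each_with_object({}) do |pod, acc|
    texts = pod[:sub_pods].flatten.reject(&:empty?)
    acc[pod[:title]] = texts unless texts.empty?
  end
end

pod_texts(SAMPLE_RESULT).each do |title, texts|
  puts "#{title}: #{texts.join('; ')}"
end
```

Pods such as "Map" come back with empty strings because only the plaintext format was requested, so filtering them out keeps the summary readable.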

Saturday, March 03, 2012

A bright future, with some potential problems

The news media portrays a dire world situation, but I disagree. In the last few decades the world has become a safer place and fundamental shifts in technology keep driving down the cost of computing resources, networks, and storage that enable greatly increased global productivity. For much of the world globalization is a rising tide that floats most people's boats.

The problem is that not everyone benefits from new paradigms for constant lifelong learning, diminishing advantages of organizations who hold to old mega-scale production and business models, and a free flow of information. The book The Power of Pull is a good reference for ideas on how to take advantage of the transitions that the world is going through, whether you like them or not!

The losers in this new world are people and organizations who cannot (or don't want to) adapt and learn and who expect material rewards that are out of touch with their productivity. The biggest potential problem that concerns me is that some of these "losers" have tremendous political and economic clout and will struggle to hang on to old advantages instead of engaging in more forward thinking and productive activities. You don't have to look further than businesses that are "too big to fail" to understand the real dangers of powerful incumbents to our future prosperity and security.

On a personal level, I do believe that for the most part we have control of our lives and that both our happiness and sadness in life are mostly an internal process in our own minds and are fairly independent of the world at large. Certainly, some people are born into, or live in, very harsh situations, but for most people there is at least the opportunity for material success and personal happiness. A cliche, but true: people who live in the past tend to be depressed, those who live in the future are anxious, and those who live in the moment are usually happy and content. The more we can focus our attention on what we are doing in the moment the happier and more productive we can be.

I leave it up to you how you want to manage your life, but I will mention a few things that work for me:

  • I don't waste much time exposing myself to the negatively toned corporate-slanted news media. It is necessary to understand what is happening in the world, and why, but a few minutes a day reading news stories from multiple sources around the world suffices.
  • Every day I enjoy the time I set aside for learning new technologies, practicing a musical instrument, trying new recipes, hiking with friends, and generally enjoying my family. Without fun time, it is difficult to be productive while working.
  • I spend time and resources helping and mentoring people, and working extra time each week to support three very worthwhile charities. I am convinced that a quality life requires the certain knowledge that we are personally helping to make the world a better place.
  • Time is probably our most precious resource. In addition to saving the time not wasted on corporate news, I try to evaluate how I spend my time, realizing that watching TV, watching too many movies, and other mindless time sinks all have tremendous opportunity costs: how much more can we accomplish and how much more can we enjoy our lives if we apply critical thinking to how we spend our time?

Sunday, February 19, 2012

Using pjax with Clojure and Noir: minimize client side Javascript code while maintaining fast page load times

I don't like doing a lot of client side Javascript (or Coffeescript) development. pjax is a way to minimize client side Javascript while maintaining fast page load times.

I became interested in pjax after reading an article on the development of Basecamp Next. DHH indicated that they looked at pjax but then rolled their own similar system.

Here is a github repo with a Clojure and Noir example web app using pjax that I wrote this morning. There were a few non-obvious aspects to using pjax with Noir so hopefully this will save you some time.

If you don't want to grab the github repo, here are a few interesting code snippets. First, we need to run a little Javascript to process links to set up for AJAX calls setting a "X-PJAX" header:

$(function(){
    // Activate PJAX test links
    // Response will be loaded into #wrapper element
    $('a').pjax('#wrapper')
})
I put this code in resources/public/js/application.js which is loaded in the common Clojure page wrapping code (in common.clj):
(ns noir-pjax-example.views.common
  (:use [noir.core :only [defpartial]]
        hiccup.core
        hiccup.page-helpers))

(defpartial layout [& content]
  (println "\n**** layout\n")
  (html5
    [:head [:title "noir-pjax-example"]
     (include-css "/css/reset.css")
     (include-js "/js/jquery.js")
     (include-js "/js/jquery.pjax.js")
     (include-js "/js/application.js")]
    [:body "<h1><b>Demo of pjax with Clojure and Noir</b></h1><br/><br/>"
     "<div id=\"wrapper\">"
     content
     "</div>"]))
The trick is that this wrapper code only gets executed one time and the browser only needs to set up the page one time. Only the div with id "wrapper" gets replaced by the standard pjax Javascript file jquery.pjax.js.

Sure, there is still Javascript using pjax, but you don't have to write much at all. In this case, I am "pjax-ifying" all HTML links with the small Javascript snippet in application.js; in a real application, you would be more selective and perhaps also set up multiple page elements that pjax updates. The following code snippet shows the file welcome.clj:

(ns noir-pjax-example.views.welcome
  (:require [noir-pjax-example.views.common :as common]
            [noir.content.getting-started])
  (:require noir.request)
  (:use [noir.core :only [defpage]]
        [hiccup form-helpers page-helpers]
        hiccup.core
        hiccup.page-helpers))

(defpage "/" []
  (if (nil? (((noir.request/ring-request) :headers)
             "x-pjax"))
    (common/layout
      [:p "Welcome to noir-pjax-example /"]
      [:a {:href "/foo"} "foo 1"])
    (str
      "<p>Welcome to noir-pjax-example /</p>"
      "<a class=foo href=\"/foo\">foo 2</a>")))

(defpage "/foo" []
  (if (nil? (((noir.request/ring-request) :headers)
             "x-pjax"))
    (common/layout
      [:p "Welcome to noir-pjax-example /foo"]
      [:a {:href "/"} "home 4"])
    (str
      "<p>Welcome to noir-pjax-example /foo</p>"
      "<a class=foo href=\"/\">home 3</a>")))
There is not much of a trick here: I check to see if there is a "x-pjax" header and if there is I don't call the common layout page wrapper function.

Friday, February 17, 2012

Nice discovery: PJAX and Rails

Well this has been a discovery for me :-)

I finished my work early today and started my afternoon working through a simple iOS 5 tutorial. As recreation, I went over to Hacker News and read a linked article by David Heinemeier Hansson on making the response time fast on Basecamp Next while still doing mostly server side processing. The article and the comments were great.

This is of some interest to me because I have recently spent a lot of time writing a lot of client side Javascript for a Dojo + Rails app: straightforward but time consuming. DHH, in the article and HN comments, was making the point that for his company it was a better developer experience doing more with server side Rails and less custom rich client code in Coffeescript or Javascript. I agree. He and other people in the comments mentioned pjax as a library for sending requests to the server that are marked with an HTTP header 'X-PJAX' if the page layout is not to be returned. This makes it relatively easy to still write mostly server side code but make page load times small when most or all of the page layout is not changed. Here is a simple Rails demo program by Edison Machado.
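
The server side of this idea is small enough to sketch in plain Ruby. The following is a minimal illustration, not how Rails or any pjax gem implements it: a lambda that follows Rack's calling convention (an env hash in, a status/headers/body triple out) and skips the layout when the X-PJAX header is present. Rack exposes request headers as upper-cased HTTP_ keys, so the header shows up as env['HTTP_X_PJAX']. The names layout and PJAX_APP and the HTML itself are made up for the example.

```ruby
# Hypothetical page layout; only the contents of #wrapper change between pages.
def layout(content)
  "<html><body><h1>Demo</h1><div id=\"wrapper\">#{content}</div></body></html>"
end

# A Rack-style app: takes the request env hash, returns [status, headers, body].
# When the X-PJAX header is present, return only the fragment for #wrapper.
PJAX_APP = lambda do |env|
  fragment = "<p>Welcome to #{env['PATH_INFO']}</p>"
  body = env['HTTP_X_PJAX'] ? fragment : layout(fragment)
  [200, { 'content-type' => 'text/html' }, [body]]
end

# Simulate a normal request and a pjax request without starting a server.
full = PJAX_APP.call({ 'PATH_INFO' => '/foo' })
partial = PJAX_APP.call({ 'PATH_INFO' => '/foo', 'HTTP_X_PJAX' => 'true' })
puts full[2].first
puts partial[2].first
```

The first request gets the whole page and the second gets only the paragraph, which the pjax Javascript then swaps into the wrapper element.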

I need a few days to digest this, but this is likely going to change how I write Rails applications.

Thursday, February 16, 2012

I feel a bit like a traitor to the open source movement: I just re-signed up as a Mac OS X and iOS developer

I am a Linux enthusiast (downloaded my first distro over a 2400 baud modem, a long time ago!) and I really like the Android platform.

That said, I have really been enjoying the integration between my iPad and my Apple iTV (that my stepson gave me for Christmas) and the Mountain Lion OS X information released today makes me feel fairly certain that the "Apple experience" is what I want when I am not earning money doing server side Java, AI and textmining consulting gigs, etc. For the work I do for making money (i.e., consulting) it doesn't matter what computer and operating system that I use. I am even planning on trading in my Droid cellphone for an iPhone this year.

I also have a long history with Apple. I prepaid for an Apple II and received serial number 71. I wrote the simple little chess program that Apple gave away on a cassette tape for a while. When the Mac shipped in 1984 I bought one right away and wrote a commercial app that generated a lot of revenue. So, Apple and I are old friends :-)

One of the reasons I paid Apple today to (re)join their developers program is that I want to play around with the early developers release of Mountain Lion. I would also like to experiment (play!) with the intersection of iOS and OS X rich clients for web services, etc.

Edit: I installed Mountain Lion. So far I like it and the only disappointment is that AirPlay is not working (yet) to my iTV. As expected, not all Apple apps are updated to use iCloud storage, etc.

Edit #2: AirPlay Mirroring doesn't work yet on an Intel Core 2 Duo processor.

Edit #3: after 1 week: I have spent a few hours working through Apple's iOS 5 and OS X Xcode tutorials - fun stuff, but I am going to spend less time on this in the near future because I am very busy with work projects.

Saturday, February 11, 2012

github repo for 4th edition of my Java AI book

As of right now, this new github repo mostly contains the code from the 3rd edition of my book but as I re-write the book, I'll also be updating my code. Some of this Java code really needs a rewrite: many of the examples from the first edition were written in 1998 - a long time ago! I have reworked the code with each new edition. The code examples are licensed under the LGPL but I am considering dual licensing them under Apache 2 also.

Any suggestions for code improvements, pull requests, etc. will be appreciated.

Saturday, January 21, 2012

Citrusleaf: an interesting (non open source) NoSQL data store

I have been using Citrusleaf for a customer (SiteScout) task. Interesting technology. Maybe it is because I am excessively frugal, but I almost always favor open source tools (Ruby, Clojure, Java, PostgreSQL, MongoDB, Emacs, Rails, GWT, etc.) that I base my businesses on. That said, I also rely on paid for software and services (IntelliJ, Rubymine, Heroku, AWS services, etc.) and it looks like Citrusleaf is a worthy tool because of its speed and scalability (which it gets from Paxos, using lots of memory, efficient multicast when possible for communication between nodes in a cluster, etc.).

Wednesday, January 18, 2012

Yes, the DynamoDB managed data service is a very big deal

Just announced today: DynamoDB solves several problems for developers:

  • No administration except for creating database tables (including some decisions like using simple lookup keys or keys with range indices and whether reads should be consistent or not)
  • Fast and predictable performance at any scale (but see comment below on the requirement for provisioning)
  • Fault tolerance
  • Efficient atomic counters
The probable hassle for developers that I see is in knowing how to provision tables for reasonable numbers of allowed reads and writes per second. When you create tables one option is to get warning emails when you hit 80% of provisioning capacity; I interpret this to mean that you really had better not go over the capacity that you have provisioned. Amazon needs to know how much capacity you need in order to allocate enough computing nodes for your tables. The capacity that you pay for can be raised and lowered to avoid getting runtime exceptions when you go over your provisioned number of reads and/or writes per second.
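
To make the provisioning concern concrete, here is a small Ruby sketch of the kind of check I have in mind: compare observed requests per second against what a table is provisioned for and flag the 80% warning level mentioned above. This is only arithmetic; it does not call AWS, it does not model how DynamoDB actually meters or throttles requests, and the helper name capacity_status and the sample rates are hypothetical.

```ruby
# Threshold taken from the post: warning emails at 80% of provisioned capacity.
WARN_FRACTION = 0.8

# Hypothetical helper: classify observed throughput against provisioned units.
# Returns :ok, :warn (at or above 80%), or :over (above 100%).
def capacity_status(observed_per_sec, provisioned_per_sec)
  ratio = observed_per_sec.to_f / provisioned_per_sec
  return :over if ratio > 1.0
  ratio >= WARN_FRACTION ? :warn : :ok
end

# Example: a table assumed to be provisioned for 10 reads/sec and 5 writes/sec.
puts capacity_status(7, 10)   # reads comfortably under the warning level
puts capacity_status(8, 10)   # reads at the warning level
puts capacity_status(6, 5)    # writes above what was provisioned
```

A check like this is where you would decide to raise the provisioned capacity before requests start failing.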

The latest AWS Java SDK handles DynamoDB. For Ruby, the latest aws-sdk (gem install aws-sdk) supports DynamoDB. I signed up for DynamoDB, looked at the Java example, and wrote a little bit of working Ruby code using the documentation. I had to slightly change the example code to get it to work for me:

require 'aws-sdk'

dynamo_db = AWS::DynamoDB.new(
    :access_key_id => ENV['AMAZON_ACCESS_KEY_ID'],
    :secret_access_key => ENV['AMAZON_SECRET_ACCESS_KEY'])
table = dynamo_db.tables.create('my-table', 10, 5)

begin
  sleep 3
  puts "Waiting on status change #{table.status}"
end while table.status == :creating

# add an item
item = table.items.create('id' => '12345', 'foo' => 'bar')

# add attributes to an item
item.attributes.add 'category' => %w(demo), 'tags' => %w(sample item)
p item

# update an item with mixed add, delete, update
item.attributes.update do |u|
  u.add 'colors' => %w(red)
  u.set 'category' => 'demo-category'
  u.delete 'foo'
end
p item.attributes.to_h

# delete attributes
item.attributes.delete 'colors', 'category'

# get attributes
p item.attributes.to_h

# delete an item and all of its attributes
item.delete
I used the AWS web console to then delete the test table to avoid charges.

DynamoDB is a big deal because while it is easy enough to horizontally scale out web applications and back end business applications, it is a real pain to scale out data storage for session handling and application data. Except for paying for the service, Amazon is trying to remove these hassles for developers.

I think that in addition to deployments to EC2s, DynamoDB will be a very big deal for Heroku users because it gives them another data store option in addition to Heroku's excellent managed PostgreSQL service, MongoHQ, Cloudant, and other 3rd party data service providers.

Wednesday, January 11, 2012

Web 3.0 and the Semantic Web, a slight return

After talking with a friend and a friend of his about the Semantic Web and healthcare yesterday, I re-watched a great video on Web 3.0 by Kate Ray that I bookmarked and blogged about a couple of years ago. I like this video because it frames the problems that the Semantic Web is trying to solve. My last published book (for APress) had Web 3.0 in the title, a term that did not really catch on :-)

At least a little bit of my enthusiasm for Semantic Web technologies has diminished over the last ten years because of problems that I have had on customer projects trying to collect linked data from disparate sources and merge it into something useful. There are (apparently) no silver bullets and any data collection and exploitation activities involve a lot of difficult work.

I would not be surprised if this problem of merging different data sources is not solved by using Ontologies and webs of linked data sites, but rather, by vendors curating data in narrow domains and selling interfaces to this curated data.

In a world of too much information the activity of curation can have a very high value and this value and the market price for these services will determine the amount of resources invested in combinations of automated and manual curation of information.

Tuesday, January 10, 2012

sleepybird.us site is online

Yesterday I wrote about two web portals I have been working on in Clojure. One of them is online: our stock photo web site. This is a simple web app written with Clojure and Noir. I use the excellent stripe.com system for accepting orders for JPEGs (and soon hi-def video clips). In my tests it seems easy enough to buy JPEG files: you just check the ones you want, go to the purchase page, and in a few seconds you are downloading a ZIP file with the JPEGs you purchased. A simple little web app but I think that my wife and I will have fun with it: we are avid photographers.

Monday, January 09, 2012

My two new projects: both web portals written in Clojure

I have three web portal projects that I have wanted to develop for quite some time. I am close to releasing two of them (a text analytics web service and a stock photos and video clip store). My wife and I are avid photographers and we have been wanting to travel more and do more photography; I started putting together the photo site yesterday morning and hope to have it fully online in the next day or two - simple to implement. The text analytics web service will be publicly available within a month or so (right now, just the demo page is active - I short circuited the new account login for now). My third project is a web portal for a single consultant to manage multiple customers. Last year I prototyped this for my own use using Java + GWT + AppEngine and then ported it off of AppEngine, using MongoDB for the data store. I have had such a fun and productive time using Clojure and Noir for my two recent projects that I am considering porting this third project to Clojure. I might leave it as-is except that I have already done most of the design for a more complex web app to manage multiple consultants with multiple customers. I know that the development would go much faster using Lisp.

Monday, January 02, 2012

Using Emacs and org-mode in OS X

I recently ran across David O'Toole's org-mode tutorial and I have been experimenting with using org-mode with Emacs instead of the little utility web app I wrote for my own use a few years ago. I decided that I like org-mode better after learning the basic commands even though I can no longer access my to-do lists from any web browser. Org-mode is useful for more than simply managing to-do lists and tasks but that is what I am mostly using it for.

To make org-mode always easily available I added this to my ~/.profile:

alias orgmode='echo -e "\033]0;org-mode\007";Emacs -nw ~/Documents/org-mode/.'
This will open org-mode in the current terminal window tab and change the tab title to "org-mode." I keep all of my org files in /Users/markw/Documents/org-mode/ so change that bit of bash script to reflect where you want to keep your org data files. You might also want to substitute emacs for Emacs -nw, which is what I use for command line Emacs because I prefer the latest version 24.x of Emacs.

Monday, December 05, 2011

(re) learning Clojure

I have been using Clojure for about 25% of my consulting work in the last 2 years, read two books on Clojure, and I had some Clojure examples in a book I wrote last year.

That said, I don't really feel "expert" at the language the way I do with Java, Ruby, and Common Lisp.

I am trying to fill in some gaps by carefully reading through one of my customers' Clojure code, and all Clojure libraries that I use like Noir, Compojure, etc. I am trying to pick up more idioms. I enjoy it when I see a new trick in someone else's code and go back to my own code to improve it.

Sunday, December 04, 2011

Using the New York Times Semantic Web APIs

I am working on a side project of my own in Clojure using the AllegroGraph 4 and Stardog RDF repositories (thanks to Franz and to Clark & Parsia for licenses to use their products!) and my own NLP code. I am using the excellent NYT data access APIs to get research/test data.

I am going to show you some simple examples in Ruby for accessing the NYT Semantic Web APIs that are free to use up to 5000 API calls a day.

I also use other NYT APIs. Each API has an access key that you need to sign up for. I set my access keys as environment variables that I access in my code; for example in Ruby:

# New York Times API Keys:
NYT_SEMANTIC_WEB = ENV['NYT_SEMANTIC_WEB']
NYT_SEARCH = ENV['NYT_SEARCH']
NYT_NEWSWIRE = ENV['NYT_NEWSWIRE']
NYT_PEOPLE = ENV['NYT_PEOPLE']
NYT_TAGS = ENV['NYT_TAGS']
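
A slightly more defensive variant of the key lookup above, as a minimal sketch: ENV.fetch with a block lets the script fail early with a readable message instead of carrying a nil key into a request URL. The helper name require_api_key is hypothetical, and the placeholder value is set only so that the demonstration runs without a real key.

```ruby
# Hypothetical helper: read an API key from the environment, failing early
# with a clear message when the variable is not set.
def require_api_key(name)
  ENV.fetch(name) { raise KeyError, "Set the #{name} environment variable" }
end

# Demonstration only: use a placeholder so the lookup succeeds without a real key.
ENV['NYT_SEMANTIC_WEB'] ||= 'placeholder-key'
key = require_api_key('NYT_SEMANTIC_WEB')
puts "Loaded a key of #{key.length} characters"
```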

In the following code snippets, I am only using the Semantic Web APIs. I want to first search for available concept types and concept names, based on keyword search:
require 'cgi'          # for CGI.escape, used below
require 'simple_http'
require 'json'

def semantic_concept_search query
  uri = "http://api.nytimes.com/svc/semantic/v2/" +
        "concept/search.json?" +
        "query=#{CGI.escape(query)}&api-key=" +
        NYT_SEMANTIC_WEB
  JSON.parse(SimpleHttp.get(uri))
end

def pp_semantic_concept_search query
  json = semantic_concept_search(query)
  puts "Results:\n"
  json["results"].each do |result|
    puts "\n\tconcept_name:\t#{result['concept_name']}"
    puts "\tconcept_type:\t#{result['concept_type']}"
    puts "\tconcept_uri:\t#{result['concept_uri']}" if result['concept_uri']
  end
end

pp_semantic_concept_search("Obama")
The second method "pretty prints" the JSON data that I am interested in. Some of the sample output looks like:
concept_name: Obama, Barack
concept_type: nytd_per
concept_uri: http://data.nytimes.com/47452218948077706853

concept_name: Obama, Malia
concept_type: nytd_per

concept_name: Obama, Michelle
concept_type: nytd_per
concept_uri: http://data.nytimes.com/N13941567618952269073
Once I have a concept type and concept name I can then look up articles:
def lookup_concept_data concept_type, concept_name
  uri = "http://api.nytimes.com/svc/semantic/v2/" +
        "concept/name/#{concept_type}/" +
        "#{CGI.escape(concept_name)}.json?&" +
        "fields=all&api-key=" + NYT_SEMANTIC_WEB
  JSON.parse(SimpleHttp.get(uri))
end
 
def pp_lookup_concept_data concept_type, concept_name
  puts "** type: #{concept_type} name: #{concept_name}"
  json = lookup_concept_data(concept_type, concept_name)
  puts "Results:\n"
  json["results"].each do |result|
    puts "\n\tLinks:"
    result["links"].each do |link|
      puts "\t\trelation: #{link['relation']}"
      puts "\t\tlink: #{link['link']}"
      puts "\t\tlink_type: #{link['link_type']}"
    end
    result["article_list"]["results"].each do |article|
      puts "\tTitle: #{article['title']}"
      puts "\tDate: #{article['date']}"
      puts "\tBody: #{article['body']}\n\n"
    end
  end
end

pp_lookup_concept_data('nytd_per', 'Obama, Barack')
Some sample output looks like:
Links:
 relation: sameAs
 link: http://rdf.freebase.com/ns/en.barack_obama
 link_type: freebase_uri
 relation: sameAs
 link: http://dbpedia.org/resource/Barack_Obama
 link_type: dbpedia_uri
 relation: sameAs
 link: http://en.wikipedia.org/wiki/Barack_Obama
 link_type: wikipedia_uri

Title: U.S. Urges Egypt To Let Civilians Govern Quickly
Date: 20111126
Body: WASHINGTON -- Ever since tens of thousands of protesters converged on Tahrir Square in Cairo for the first Day of Revolution exactly 10 months ago, the Obama administration has struggled to strike the right balance between democracy and stability. In the early morning hours on Friday, President Obama came out on the side of the Arab street, issuing

Title: EDITORIAL; The Solyndra Mess
Date: 20111125
Body: The Republicans on the House Energy and Commerce Committee appear to have hit the pause button on their investigation into the failure of Solyndra, a solar panel maker that entered bankruptcy proceedings in September, defaulting on a $528 million federal loan. What have we learned? Nobody comes out of this looking good. Not the Obama
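
The sameAs links are the most useful part of this output for linked data work, so it can be handy to pull them out into a hash keyed by link type. Here is a minimal sketch, assuming each result has the "links" array shown in the sample output above. The helper name same_as_links is made up for illustration, and the sample data is built by hand so nothing here calls the NYT API:

```ruby
# Sketch only: assumes each result hash carries a "links" array whose
# entries have "relation", "link" and "link_type" keys, as in the
# sample output above. same_as_links is a hypothetical helper name.
def same_as_links(result)
  index = {}
  (result["links"] || []).each do |link|   # tolerate results with no links
    next unless link["relation"] == "sameAs"
    index[link["link_type"]] = link["link"]
  end
  index
end

# Hand-built data in the same shape as the sample output (no network call).
sample_result = {
  "links" => [
    { "relation" => "sameAs",
      "link" => "http://rdf.freebase.com/ns/en.barack_obama",
      "link_type" => "freebase_uri" },
    { "relation" => "sameAs",
      "link" => "http://dbpedia.org/resource/Barack_Obama",
      "link_type" => "dbpedia_uri" },
    { "relation" => "sameAs",
      "link" => "http://en.wikipedia.org/wiki/Barack_Obama",
      "link_type" => "wikipedia_uri" }
  ]
}

same_as_links(sample_result).each do |link_type, uri|
  puts "#{link_type}: #{uri}"
end
```

With real data you would call it on each element of json["results"] returned by lookup_concept_data.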

Great to see useful linked data/Semantic Web data sources being made available! Hopefully these little code snippets will save you some time in getting started using the NYT APIs.

Saturday, November 26, 2011

Closer to the metal: Clojure, Noir, and plain old Javascript

I am wrapping up a long term engagement over the next five to six weeks that uses Java EE 6 on the backend, and SmartGWT (like GWT, but with very nice commercially supported components) clients. As I have time, I am starting up some new work that uses Clojure and Noir, and it is like a breath of fresh air:

I keep a repl open on the lein project and also separately run the web app so any file changes (including the Javascript in the project) are immediately reflected in the app. Such a nice development environment that I don't even think about it while I am working, and maybe that is the point!

As I have mentioned in previous blog posts, I really like the Clojure Noir web framework, which builds on several other excellent projects. Developing in Noir is a lot like using the Ruby Sinatra framework: it handles routes and offers template support options, but it is largely a roll-your-own environment.

Monday, November 21, 2011

Ruby Sinatra web apps with background work threads

In Java-land, I have often used the pattern of writing a servlet with an init() method that starts up one or more background work threads. Then while my web application is handling HTTP requests the background threads can be doing work like fetching RSS feeds for display in the web app, performing periodic maintenance like flushing old data from a database, etc. This is a simple pattern that is robust and easy to implement with a few extra lines of Java code and an extra servlet definition in a web.xml file.

In Ruby-land this pattern is even simpler to implement:

require 'rubygems'
require 'sinatra'

$sum = 0

Thread.new do # trivial example work thread
  while true do
     sleep 0.12
     $sum += 1
  end
end

get '/' do
  "Testing background work thread: sum is #{$sum}"
end
While the main thread is waiting for HTTP requests the background thread can do any other work. This works fine with Ruby 1.8.7 or any 1.9.*, but I would run this in JRuby for a long-running production app since JRuby uses the Java Thread class.
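
A single integer counter is fine for a demo, but if the background thread maintains something larger (a list of fetched feed items, say), I would guard it with a Mutex. Here is a minimal sketch in plain Ruby, left free of Sinatra so it runs on its own. The FeedCache class and its method names are made up for illustration:

```ruby
require 'thread'  # Mutex; needed on Ruby 1.8, harmless on later versions

# Hypothetical holder for data produced by a background work thread.
class FeedCache
  def initialize
    @lock  = Mutex.new
    @items = []
  end

  # Called from the work thread.
  def add(item)
    @lock.synchronize { @items << item }
  end

  # Called from request handlers; returns a copy so callers never
  # hold a reference to the array the work thread is appending to.
  def snapshot
    @lock.synchronize { @items.dup }
  end
end

cache = FeedCache.new

worker = Thread.new do
  5.times do |i|
    sleep 0.01               # stand-in for real work such as fetching a feed
    cache.add("item #{i}")
  end
end

worker.join                  # only for this demo; a web app would not join
puts cache.snapshot.inspect
```

In a Sinatra app the route handler would call snapshot and render the result, while the work thread keeps looping in the background.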

Using the Stardog RDF datastore from JRuby

I was playing with the latest Stardog release during lunch. The way to quickly get going with the included Java examples is to create a project (I use IntelliJ, but use your favorite Java IDE) and include all JAR files in lib/ (including all nested directories) and the source under examples/src.

I took the first Java example class ConnectionAPIExample and converted the RDF loading and query part to JRuby (strange formatting to get it to fit the page width):
require 'java'
Dir.glob("lib/**/*.jar").each do |fname|  # "**/" also descends into nested directories
  require fname
end

com.clarkparsia.stardog.security.SecurityUtil.
      setupSingletonSecurityManager()
com.clarkparsia.stardog.StardogDBMS.get().
      createMemory("test")

CONN = com.clarkparsia.stardog.api.
        ConnectionConfiguration.to("test").connect()
CONN.begin()
CONN.add().io().format(org.openrdf.rio.RDFFormat::N3).
  stream(java.io.FileInputStream.new(
            "examples/data/sp2b_10k.n3"))

QUERY = CONN.query("select * where {?s ?p ?o}")
QUERY.limit(10)
RESULTS = QUERY.executeSelect()

while RESULTS.hasNext() do
  result = RESULTS.next()
  result.getBindingNames().toArray().each do |obj|
    puts "#{obj}: #{result.getBinding(obj).getValue().stringValue()}"
  end
  puts
end
This is mostly just a straight conversion from Java to Ruby. The first few lines enumerate all JAR files and require them. The last part, interpreting the results, took a few minutes to figure out. I used IntelliJ to explore the result values of class MapBindingSet, looking at available methods to call to get the binding names of the variables in my SPARQL query and the values (as strings) for these three variables for each returned result.
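
One detail worth knowing when collecting JAR files with Dir.glob: a `**` only recurses into subdirectories when it is written as its own path segment (`**/`); otherwise it behaves like a plain `*`. Here is a small sketch that shows the difference on a throwaway directory tree. The helper name compare_jar_globs and the file names are made up for illustration, and nothing here touches a real Stardog install:

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: builds a tiny fake lib/ tree in a temp directory and
# returns what each glob pattern matches, as [flat, recursive].
def compare_jar_globs
  Dir.mktmpdir do |root|
    # One JAR directly in lib/, one in a nested directory.
    FileUtils.mkdir_p(File.join(root, "lib", "nested"))
    FileUtils.touch(File.join(root, "lib", "a.jar"))
    FileUtils.touch(File.join(root, "lib", "nested", "b.jar"))

    Dir.chdir(root) do
      flat      = Dir.glob("lib/**.jar").sort    # "**" here acts like "*"
      recursive = Dir.glob("lib/**/*.jar").sort  # "**/" descends into subdirectories
      [flat, recursive]
    end
  end
end

flat, recursive = compare_jar_globs
puts "lib/**.jar   matches #{flat.inspect}"
puts "lib/**/*.jar matches #{recursive.inspect}"
```

If some classes fail to load from JRuby, checking which JAR files the glob actually picked up is a quick first step.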

Output will look like:
s: http://localhost/vocabulary/bench/Journal
p: http://www.w3.org/2000/01/rdf-schema#subClassOf
o: http://xmlns.com/foaf/0.1/Document

s: http://localhost/vocabulary/bench/Proceedings
p: http://www.w3.org/2000/01/rdf-schema#subClassOf
o: http://xmlns.com/foaf/0.1/Document
...
If you want to run this bit of code, put it in a file test.rb in the top level Stardog distribution directory and just run
jruby test.rb
I wanted to be able to use Stardog from both JRuby and Clojure. My lunch time hacking today is just a first step.

Tuesday, November 15, 2011

Experimenting with Google Cloud SQL

I received a beta invite today and had some time to read the documentation and start experimenting with it tonight.

First, the best thing about Google Cloud SQL: when you create an instance you can specify more than one AppEngine application instance that can use it. This should give developers a lot of flexibility for coordinating multiple deployed applications that are in an application family. I think that this is a big deal!

Another interesting thing is that you are allowed some access to the database from outside the AppEngine infrastructure. You are limited to 5 external queries per second but that does offer some coordination with other applications hosted on other platforms or host providers.

Their cloud SQL service is free during beta. It will be interesting to see what the cost will be for different SQL instance types.

It was very simple getting the example Java app built and deployed. I created a separate SQL instance (these are separate from other deployed AppEngine application instances), made a new IntelliJ AppEngine project, pasted in the example code, and it all worked.

Perception of quality is often influenced by price. Since developers now have to pay more for using AppEngine, I find myself looking more at AppEngine as a premium service, which it is. Despite my dislike for MySQL (I use PostgreSQL when given a choice), Google's hosted and managed MySQL cloud data service looks good and provides developers with more options. Their SQL service is synchronously replicated between data centers automatically for you.

It has been a few years now since I had to set up either a physical server or a leased raw server for any deployments. I like that! Thank you Platform as a Service (PaaS) providers like Heroku (built on AWS) and AppEngine - they are the future. I still do a lot of work on "plain AWS" but even that is much more agile than provisioning my own servers.