Monday, December 05, 2011

(re) learning Clojure

Over the last two years I have used Clojure for about 25% of my consulting work, read two books on Clojure, and included some Clojure examples in a book I wrote last year.

That said, I don't really feel "expert" at the language the way I do with Java, Ruby, and Common Lisp.

I am trying to fill in some gaps by carefully reading through one customer's Clojure code and all of the Clojure libraries that I use, such as Noir and Compojure. I am trying to pick up more idioms. I enjoy seeing a new trick in someone else's code and then going back to my own code to improve it.

Sunday, December 04, 2011

Using the New York Times Semantic Web APIs

I am working on a side project of my own in Clojure using the AllegroGraph 4 and Stardog RDF repositories (thanks to Franz and to Clark & Parsia for licenses to use their products!) and my own NLP code. I am using the excellent NYT data access APIs to get research/test data.

I am going to show you some simple examples in Ruby for accessing the NYT Semantic Web APIs, which are free to use for up to 5000 API calls a day.

I also use other NYT APIs. Each API has an access key that you need to sign up for. I set my access keys as environment variables that I access in my code; for example in Ruby:

# New York Times API Keys:
NYT_SEMANTIC_WEB = ENV['NYT_SEMANTIC_WEB']
NYT_SEARCH = ENV['NYT_SEARCH']
NYT_NEWSWIRE = ENV['NYT_NEWSWIRE']
NYT_PEOPLE = ENV['NYT_PEOPLE']
NYT_TAGS = ENV['NYT_TAGS']
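
If one of these environment variables is unset, the constant is simply nil, and concatenating nil into a URL string later raises a TypeError that does not say which key is missing. As a minimal sketch, a small helper can fail early with a clearer message. The name fetch_api_key is hypothetical and not part of any NYT library; the env parameter exists only so the helper can be tried with a plain Hash instead of the real ENV.

```ruby
# Hypothetical helper (an assumption, not part of the NYT APIs):
# return the API key stored under `name`, or raise a descriptive
# error if it is unset or blank.
def fetch_api_key(name, env = ENV)
  key = env[name]
  if key.nil? || key.strip.empty?
    raise KeyError, "Set the #{name} environment variable to your NYT API key"
  end
  key
end

# Usage with a plain Hash standing in for ENV:
sample_env = { 'NYT_SEMANTIC_WEB' => 'my-secret-key' }
key = fetch_api_key('NYT_SEMANTIC_WEB', sample_env)
puts "Loaded a key of #{key.length} characters"
```

With the real environment, the assignments above could then be written as NYT_SEMANTIC_WEB = fetch_api_key('NYT_SEMANTIC_WEB').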

In the following code snippets, I am only using the Semantic Web APIs. I first want to search for available concept types and concept names, based on a keyword search:

require 'cgi'          # provides CGI.escape, used below
require 'simple_http'
require 'json'

def semantic_concept_search query
  uri = "http://api.nytimes.com/svc/semantic/v2/" +
        "concept/search.json?" +
        "query=#{CGI.escape(query)}&api-key=" +
        NYT_SEMANTIC_WEB
  JSON.parse(SimpleHttp.get(uri))
end

def pp_semantic_concept_search query
  json = semantic_concept_search(query)
  puts "Results:\n"
  json["results"].each do |result|
    puts "\n\tconcept_name:\t#{result['concept_name']}"
    puts "\tconcept_type:\t#{result['concept_type']}"
    puts "\tconcept_uri:\t#{result['concept_uri']}" if result['concept_uri']
  end
end

pp_semantic_concept_search("Obama")

The second method "pretty prints" the JSON data that I am interested in. Some of the sample output looks like:

concept_name: Obama, Barack
concept_type: nytd_per
concept_uri: http://data.nytimes.com/47452218948077706853

concept_name: Obama, Malia
concept_type: nytd_per

concept_name: Obama, Michelle
concept_type: nytd_per
concept_uri: http://data.nytimes.com/N13941567618952269073
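
Note that not every concept in the sample output has a concept_uri (the entry for Obama, Malia does not). As a rough sketch, the parsed search results can be filtered down to the concepts that carry a linked data URI. The helper name concepts_with_uri is hypothetical, and the sample Hash below is an assumption about the parsed JSON shape, built only from the fields the pretty printer above reads and the values in the sample output.

```ruby
# Assumed shape of the parsed search response, using the values
# from the sample output above.
SAMPLE_CONCEPTS = {
  'results' => [
    { 'concept_name' => 'Obama, Barack', 'concept_type' => 'nytd_per',
      'concept_uri' => 'http://data.nytimes.com/47452218948077706853' },
    { 'concept_name' => 'Obama, Malia', 'concept_type' => 'nytd_per' },
    { 'concept_name' => 'Obama, Michelle', 'concept_type' => 'nytd_per',
      'concept_uri' => 'http://data.nytimes.com/N13941567618952269073' }
  ]
}

# Hypothetical helper: return [concept_type, concept_name] pairs for
# every result that has a concept_uri.
def concepts_with_uri(json)
  json.fetch('results', [])
      .select { |result| result['concept_uri'] }
      .map { |result| [result['concept_type'], result['concept_name']] }
end

concepts_with_uri(SAMPLE_CONCEPTS).each do |type, name|
  puts "#{type}\t#{name}"
end
```

Each pair holds the concept type and concept name from one search result.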

Once I have a concept type and concept name, I can then look up articles:

def lookup_concept_data concept_type, concept_name
  uri = "http://api.nytimes.com/svc/semantic/v2/" +
        "concept/name/#{concept_type}/" +
        "#{CGI.escape(concept_name)}.json?" +
        "fields=all&api-key=" + NYT_SEMANTIC_WEB
  JSON.parse(SimpleHttp.get(uri))
end
 
def pp_lookup_concept_data concept_type, concept_name
  puts "** type: #{concept_type} name: #{concept_name}"
  json = lookup_concept_data(concept_type, concept_name)
  puts "Results:\n"
  json["results"].each do |result|
    puts "\n\tLinks:"
    result["links"].each do |link|
      puts "\t\trelation: #{link['relation']}"
      puts "\t\tlink: #{link['link']}"
      puts "\t\tlink_type: #{link['link_type']}"
    end
    result["article_list"]["results"].each do |article|
      puts "\tTitle: #{article['title']}"
      puts "\tDate: #{article['date']}"
      puts "\tBody: #{article['body']}\n\n"
    end
  end
end

pp_lookup_concept_data('nytd_per', 'Obama, Barack')

Some sample output looks like:

Links:
 relation: sameAs
 link: http://rdf.freebase.com/ns/en.barack_obama
 link_type: freebase_uri
 relation: sameAs
 link: http://dbpedia.org/resource/Barack_Obama
 link_type: dbpedia_uri
 relation: sameAs
 link: http://en.wikipedia.org/wiki/Barack_Obama
 link_type: wikipedia_uri

Title: U.S. Urges Egypt To Let Civilians Govern Quickly
Date: 20111126
Body: WASHINGTON -- Ever since tens of thousands of protesters converged on Tahrir Square in Cairo for the first Day of Revolution exactly 10 months ago, the Obama administration has struggled to strike the right balance between democracy and stability. In the early morning hours on Friday, President Obama came out on the side of the Arab street, issuing

Title: EDITORIAL; The Solyndra Mess
Date: 20111125
Body: The Republicans on the House Energy and Commerce Committee appear to have hit the pause button on their investigation into the failure of Solyndra, a solar panel maker that entered bankruptcy proceedings in September, defaulting on a $528 million federal loan. What have we learned? Nobody comes out of this looking good. Not the Obama
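
The sameAs links are the part I find most useful for Semantic Web work, since they tie an NYT concept to Freebase, DBpedia, and Wikipedia. Here is a minimal sketch of pulling them into a Hash keyed by link type, and of turning the compact article dates into Ruby Date objects. The helper names same_as_links and article_dates are hypothetical, and the sample Hash is an assumption about the shape of one parsed result, using only the fields printed above. I also assume the date may arrive as either a string or a number, so the code converts it with to_s first.

```ruby
require 'date'

# Assumed shape of one entry in json["results"], using values from
# the sample output above (article bodies omitted).
SAMPLE_RESULT = {
  'links' => [
    { 'relation' => 'sameAs', 'link_type' => 'freebase_uri',
      'link' => 'http://rdf.freebase.com/ns/en.barack_obama' },
    { 'relation' => 'sameAs', 'link_type' => 'dbpedia_uri',
      'link' => 'http://dbpedia.org/resource/Barack_Obama' },
    { 'relation' => 'sameAs', 'link_type' => 'wikipedia_uri',
      'link' => 'http://en.wikipedia.org/wiki/Barack_Obama' }
  ],
  'article_list' => {
    'results' => [
      { 'title' => 'U.S. Urges Egypt To Let Civilians Govern Quickly',
        'date' => '20111126' },
      { 'title' => 'EDITORIAL; The Solyndra Mess', 'date' => '20111125' }
    ]
  }
}

# Hypothetical helper: map link_type => link for all sameAs links.
def same_as_links(result)
  result.fetch('links', [])
        .select { |link| link['relation'] == 'sameAs' }
        .each_with_object({}) { |link, acc| acc[link['link_type']] = link['link'] }
end

# Hypothetical helper: return [title, Date] pairs, parsing YYYYMMDD dates.
def article_dates(result)
  articles = result.fetch('article_list', {}).fetch('results', [])
  articles.map do |article|
    [article['title'], Date.strptime(article['date'].to_s, '%Y%m%d')]
  end
end

same_as_links(SAMPLE_RESULT).each { |type, link| puts "#{type}: #{link}" }
article_dates(SAMPLE_RESULT).each { |title, date| puts "#{date.iso8601}  #{title}" }
```

The resulting URIs are the kind of identifiers that can be used when loading data into an RDF repository.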

Great to see useful linked data/Semantic Web data sources being made available! Hopefully these little code snippets will save you some time in getting started using the NYT APIs.