Saturday, April 30, 2011

Text search in SimpleDB: a Ruby example

If you do not want to run and administer Solr yourself, you can use SimpleDB both for storage and for text indexing and search. Here is a little snippet that shows how to store searchable documents in SimpleDB:
require 'rubygems'
require 'aws_sdb'

SERVICE = AwsSdb::Service.new

# assuming that this domain is already created
DOMAIN = "some_test_domain_7854854"

class Document

  def initialize name, text
    words = (name + ' ' + text).downcase.split.uniq
    attributes = {:words => words, :text => text}
    SERVICE.put_attributes(DOMAIN, name, attributes)
  end
  
  def Document.search query
    # The last inject takes the intersection and
    # ensures that all search terms are present:
    keys = query.downcase.split.collect {|x|
      SERVICE.query(DOMAIN,
                    "['words' starts-with '#{x}']")[0]
    }.inject {|x, y| x & y } || []
    keys.collect {|key|
                  SERVICE.get_attributes(DOMAIN, key)}
  end

end

Document.new('title1',
             'The bird flew to the lake for water')
Document.new('title2',
             'The dog chased the cat')

p Document.search 'flew lake'
The formatting of this snippet is odd because I was trying to keep the lines short enough to fit the page width. The code is not terribly efficient, but since the first 25 Amazon SimpleDB Machine Hours consumed per month are free for your AWS account, using this example in your applications can end up being almost free (there are small data storage and bandwidth charges), and you get the advantage of no administration hassles. The output for the above code snippet is:
[{"text"=>["The bird flew to the lake for water"],
  "words"=>["bird", "flew", "for", "lake", "the",
            "title1", "to", "water"]}]
There are two improvements that you can implement: remove noise/stop words from the words attribute, and make the code multithreaded so that the individual SimpleDB queries execute in parallel. I left both out to keep this example concise. For simple or moderately used applications these improvements aren't necessary.
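Here is a minimal sketch of both improvements. It is not part of the original snippet: the STOP_WORDS list, the index_words and parallel_search names, and the LOOKUP lambda (an in-memory stand-in for the SERVICE.query call, so the sketch runs without an AWS account) are all assumptions made for illustration.

```ruby
require 'set'

# Hypothetical, deliberately tiny stop-word list; a real one would be longer.
STOP_WORDS = Set.new(%w[a an and for of the to])

# Lowercase, split on whitespace, drop duplicates and stop words.
def index_words(name, text)
  (name + ' ' + text).downcase.split.uniq.reject { |w| STOP_WORDS.include?(w) }
end

# Run one lookup per search term in its own thread, then intersect the
# key lists so that only documents matching every term remain.
# `lookup` is any callable that maps a term to an array of item keys;
# with SimpleDB it would wrap the SERVICE.query call shown earlier.
def parallel_search(query, lookup)
  terms = query.downcase.split.uniq.reject { |w| STOP_WORDS.include?(w) }
  return [] if terms.empty?
  threads = terms.map { |term| Thread.new { lookup.call(term) } }
  threads.map(&:value).inject { |x, y| x & y }
end

# In-memory stand-in for the SimpleDB domain (assumption for this sketch).
INDEX = {
  'title1' => index_words('title1', 'The bird flew to the lake for water'),
  'title2' => index_words('title2', 'The dog chased the cat')
}
LOOKUP = lambda do |term|
  INDEX.select { |_key, words| words.any? { |w| w.start_with?(term) } }.keys
end

p index_words('title1', 'The bird flew to the lake for water')
# => ["title1", "bird", "flew", "lake", "water"]
p parallel_search('flew lake', LOOKUP)
# => ["title1"]
```

Threads help here because each SimpleDB query spends most of its time waiting on the network; for a one- or two-term search the gain is likely to be small.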

If you run this example remotely from your laptop, notice that remote SimpleDB access is a little slow. When run on a small EC2 instance, it takes about 0.05 seconds to add a "document" to SimpleDB and about 0.1 seconds to search using two search terms.
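If you want to take similar measurements yourself, a rough sketch using Ruby's standard Benchmark module might look like the following. The average_seconds helper and the stand-in workload are my assumptions for illustration; swap the block body for a Document.new or Document.search call to time real SimpleDB requests.

```ruby
require 'benchmark'

# Time a block over several runs and return the average seconds per run.
# (Hypothetical helper, not part of the original example.)
def average_seconds(runs = 5)
  total = Benchmark.realtime { runs.times { yield } }
  total / runs
end

# Stand-in workload; replace the block body with Document.new(...) or
# Document.search(...) to time real SimpleDB calls.
avg = average_seconds(10) { ('a'..'z').to_a.shuffle.sort }
puts format('%.6f seconds per call', avg)
```

Averaging over several runs smooths out network jitter, which matters more when you run from a laptop than from an EC2 instance.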

Thursday, April 21, 2011

And the best JVM replacement language for Java is: Java?

Although I use Ruby (mostly Rails) and Common Lisp on many customer projects, I am heavily invested in the Java platform and I don't see that changing in the next ten years or so.

Java is more than a little heavy on ceremony, however, and I would like a really agile language for the JVM. I have used Clojure a lot in the last year for work on one customer's project, but at least for now the lack of concise and useful runtime error backtraces kills some of the joy of using Clojure. It is a really nice language with a really nice community, however, and I expect that in a few years Clojure may be my primary JVM language. I love coding in Ruby, and the JRuby developers do a great job moving the sub-platform forward. However, except for large Rails applications, I don't see myself writing very large applications in Ruby: for me Ruby is a scripting language for getting stuff done quickly and easily. I do like Scala, but the learning curve is steep, and that means that it is difficult to find highly skilled Scala developers who are already trained.

Java has the sweet spot of lots of great tools and a rock solid infrastructure. So, how to make Java more agile? I do a few things that help: I use public attributes, so I don't bother with getters/setters anymore unless I am using a framework that needs them for introspection. I very much like JPA, but I am growing less fond of the rest of the Java EE 6 stack: there are really a lot of layers between designing and writing code and the runtime, which is too much abstraction for my tastes. The Play! Framework is great in general; I am using it on my personal project, and I look forward to seeing how Play! develops as an agile platform over the next few years.

Tuesday, April 19, 2011

Some new Platform as a Service providers: cloudfoundry.com and dotcloud.com

I am on vacation so I have not had much chance to try the beta invites I just received for cloudfoundry.com and dotcloud.com but both look promising as works in progress.

For now, Cloud Foundry is set up for Ruby Rack applications (like Rails and Sinatra) and Java Spring apps. They currently support MongoDB, MySQL and Redis. They will release the core software if you want to run a cloud on your own servers.

Dotcloud supports a wide range of platforms and data stores. Their roadmap shows what is available right now and what is planned.

Both beta programs are free for now. It will be interesting to see what the costs are.

Wednesday, April 13, 2011

(Roughly) comparing Play! version 1.2 with Rails

Both the Play! and Rails frameworks implement MVC and have very agile development environments. Play!, which is written in Java (but also supports Scala development), accomplishes this agility by using the Eclipse incremental Java compiler, so if you edit any Java code or HTML template files (with embedded Java/Groovy expressions) you immediately see the results after refreshing your web browser.

While Play! is not nearly as complete a stack as Rails, it does include modules for
  • MongoDB
  • AppEngine
  • Objectify
  • GWT
  • Search
  • PDF generation of any view
  • Scala use
  • CoffeeScript
  • OpenAuth working with Google, Yahoo, Twitter, etc.
  • Simple CRUD scaffolding
  • Facebook Connect and Graph API
  • Lucene search of JPA models
  • etc.

I have several years of Rails experience and I am using Java EE 6 for a customer project. With this background, I put Play! in the sweet spot between Java EE 6 and Rails: easy to learn if you know Java and supports agile development. My favorite part of Java EE 6 is JPA, which Play! supports.

I have played with Play! off and on for over a year, but only for a few hours at a time and never on any serious projects. (This is largely because most people who hire me want me to do Lisp or Ruby development.) I have more or less decided to use Play! for one of my own projects because I already have so much reusable Java code written for it and I like the interactive Play! development process. My wife and I are just starting a vacation, and finding myself in a quiet place with time on my hands (we catch an early flight tomorrow morning and are staying near the airport), I spent this afternoon reimplementing in Java + JPA + Play! a bit of data modeling code that I wrote in Clojure, Ruby, and Common Lisp last weekend. I really have been struggling with the decision of which language and framework to use, so I am in experimentation mode! BTW, I am using PostgreSQL with its native indexing and search functionality, and I find that JPA and Java object models mix fairly well when I mostly use JPA with some native queries, like this contrived example:
List results = News.em().createNativeQuery("select * from news where to_tsvector(content) @@ to_tsquery('japan | nuclear')", News.class).getResultList();
that maps results back to my Java POJOs.

Sunday, April 03, 2011

Amazon Cloud Player: make sure you take advantage of their introductory offer

I just purchased an MP3 album, "Johnny Winter And / Live", for $5 and got a one-year upgrade to 20 GB of cloud storage (a $20 value). It is a sweet deal, but considering that you always get 5 GB free, this may not be much of an added value. Amazon has a nifty uploader application that looked at my iTunes MP3s and playlists and is cloning them on Amazon Cloud Player automatically. My entire iTunes library will only take up a few GB after it is uploaded. Sometimes Amazon kills Apple's iTunes store on price: I was about to buy a few tracks on iTunes last year and then realized I could buy the entire album as MP3s on Amazon for not much more.

Amazon seems to be investing in introductory offers like the upgrade for Cloud Player and the first-time AWS developer's package (basically free to develop and deploy for one year). These are certainly expensive for Amazon to provide as free services, but Amazon is playing the long game. My ordered list of the most impressive technology companies:
  • Amazon
  • Google
  • Apple
  • Netflix
Notice that Microsoft is not on my list :-)