Thursday, May 31, 2007

Google Gears: a sea change for web applications?

I thought that Adobe Apollo might have hit a sweet spot for letting developers create hybrid desktop and web applications. I think that Google may have hit a sweeter spot with Gears. Gears installs as a Firefox add-on and uses the SQLite database to store data locally on your file system. Each 'Gears enabled' site that you visit pops up a permissions dialog; I recommend being careful about which web sites you allow to use the Gears add-on. I have reviewed the developer documentation, and Gears looks straightforward to set up. Its JavaScript database API makes it easy to use SQL from your JavaScript code.

Saturday, May 19, 2007

Why the ODF is better than Microsoft's document formats

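It takes only a few lines of Ruby code to extract the text from an OpenOffice.org document file. An ODF file is a zip archive, and the document body is stored as XML in the archive entry content.xml: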
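```ruby
require 'rexml/document'
require 'rexml/streamlistener'
# rubyzip gem (gem install rubyzip); releases before 1.0 used
# require 'zip/zipfilesystem' and the class name Zip::ZipFile instead
require 'zip'

# Stream (SAX-style) listener that collects the character data
# found inside text elements such as text:p, text:h and text:span
class OOXmlHandler
  include REXML::StreamListener
  attr_reader :plain_text

  def initialize
    @plain_text = ""
    @last_tag_name = ""
  end

  # remember the name of the most recently opened element
  def tag_start(name, _attrs)
    @last_tag_name = name
  end

  # keep text only when the last opened element is a text element
  def text(s)
    @plain_text << s << "\n" if @last_tag_name.include?('text')
  end
end

class ReadOpenOffice
  attr_reader :text

  def initialize(file_path)
    Zip::File.open(file_path) do |zip_file|
      xml_handler = OOXmlHandler.new
      # the document body lives in content.xml inside the zip archive
      REXML::Document.parse_stream(zip_file.read('content.xml'), xml_handler)
      @text = xml_handler.plain_text
    end
  end
end

puts ReadOpenOffice.new('KBrecipes.odt').text
```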
I have spent too much of my time over the last 10 years dealing "programmatically" with Microsoft document formats. I am tired of wasting my time on them when open document formats are so much easier and less expensive to use.

Monday, May 14, 2007

Data representations: the more the better

A friend of mine, also a longtime Lisp hacker, expressed the opinion about 15 years ago that object-oriented programming might be the "last great programming paradigm". A good thought, but an object representation is just one way to think about and manipulate data. I admit that my favorite style of programming is using an object-oriented language with transparent persistence (e.g., Java with Prevayler, Ruby with ActiveRecord's object-relational mapping, or Lisp with something like AllegroCache).

The ability to write programs (preferably quickly :-) allows us to experiment with data representations. The point I am making here is that different software tools not only solve different problems with different levels of effort; the tools that we choose also inform us and change the way we think.

Sometimes a relational data model just works best, both for efficient access (not letting an object-relational mapping system build an entire object in memory when you are only interested in a few columns of a table) and for thinking about and browsing data.

When too many attributes characterize the data that you need to explore or use, faceted browsing helps a lot: pick the most important attribute, then the second, and so on, eliminating large parts of the data space at each step.
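As a minimal sketch of that drill-down idea in plain Ruby (the Item struct and its data are made up for illustration, and this is not any particular faceted browsing library), each step can count the values of one attribute and then keep only the items that match the chosen value:

```ruby
# Hypothetical items with a few attributes (facets) to browse by.
Item = Struct.new(:name, :category, :price_band, :color)

ITEMS = [
  Item.new('kettle',  'kitchen', 'low',  'red'),
  Item.new('toaster', 'kitchen', 'mid',  'black'),
  Item.new('mixer',   'kitchen', 'high', 'red'),
  Item.new('lamp',    'office',  'low',  'black'),
  Item.new('chair',   'office',  'mid',  'red')
]

# Count how many items have each value of a facet, so a user can see
# which choices are available before drilling down.
def facet_counts(items, facet)
  items.group_by { |item| item.public_send(facet) }
       .transform_values(&:size)
end

# Apply [facet, value] selections in order of importance;
# each selection narrows the remaining set of items.
def drill_down(items, selections)
  selections.reduce(items) do |remaining, (facet, value)|
    remaining.select { |item| item.public_send(facet) == value }
  end
end

p facet_counts(ITEMS, :category)
selected = drill_down(ITEMS, [[:category, 'kitchen'], [:color, 'red']])
p selected.map(&:name) # => ["kettle", "mixer"]
```

Ordering the selections by importance matters for browsing rather than for the result: the first facet removes the largest slice of the data, so the counts shown for later facets are computed over a much smaller set.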

For some problems, graphs are the best data representation, and languages like Lisp and Prolog, which make it easy to cut list data structures apart and put them back together, are the most effective.
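Here is a small Ruby sketch of that list-surgery style, assuming a made-up graph stored Lisp-style as an association list of [node, neighbors] pairs. Ruby's Array#assoc plays the role of Lisp's ASSOC, search paths are built by consing a node onto the front of an existing path, and removing a node builds a new list structure instead of modifying the old one:

```ruby
# Hypothetical directed graph as an association list: [node, [neighbor, ...]].
GRAPH = [
  [:a, [:b, :c]],
  [:b, [:d]],
  [:c, [:d, :e]],
  [:d, [:f]],
  [:e, [:f]],
  [:f, []]
]

# Look up a node's neighbors with Array#assoc (like Lisp's ASSOC).
def neighbors(graph, node)
  entry = graph.assoc(node)
  entry ? entry[1] : []
end

# Breadth-first search returning the shortest path as a list of nodes.
# Each queued path is stored newest-node-first, like a Lisp list built with CONS.
def shortest_path(graph, from, to)
  queue = [[from]]
  visited = [from]
  until queue.empty?
    path = queue.shift
    return path.reverse if path.first == to
    neighbors(graph, path.first).each do |nxt|
      next if visited.include?(nxt)
      visited << nxt
      queue << [nxt] + path # "cons" the neighbor onto the path
    end
  end
  nil
end

# Cut a node out of the graph, returning a new association list;
# the original graph is left untouched.
def remove_node(graph, node)
  graph.reject { |n, _| n == node }
       .map { |n, nbrs| [n, nbrs - [node]] }
end

p shortest_path(GRAPH, :a, :f)                  # => [:a, :b, :d, :f]
p shortest_path(remove_node(GRAPH, :b), :a, :f) # => [:a, :c, :d, :f]
```

Because remove_node returns a new array, GRAPH is unchanged after the second query, which is what makes this style convenient for trying out variations of a graph.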

Wednesday, May 09, 2007

Use of arrays considered harmful?

Pardon me for playing with the title of Edsger Dijkstra's famous paper, "Go To Statement Considered Harmful", but as a Java developer I am starting to believe that arrays should seldom be used, given Java generics and the revised collection classes. I started thinking about this while reading someone else's code and noticing some artificial-looking conversions between arrays and collections. Then, for fun, I did some measurements and found that over 15% of the processing time was spent in toArray() methods.

The big cost in IT is usually in development and maintenance, not in deployment. Most projects are small, not Amazon-size deployments, so I believe that unless you are sure that you are building a very large-scale system, it is best to optimize development to reduce costs for small and medium-size systems. Smaller code size means smaller costs. Java programs should be as short and concise as possible, and generics and the collection classes help a lot. Sometimes when I read the source for Java programs, it looks like the authors were paid by the line of code :-)

Saturday, May 05, 2007

Interesting technology: AllegroGraph

I am using Franz's AllegroGraph for two proof-of-concept projects for a customer: one using the Java APIs (free version) and one using the Lisp version, which is unlimited in the size of stored data. RDF storage and querying are not easy technologies to use (at least for me), but they look very promising.

The thing that I find interesting about AllegroGraph is that you are dealing with disk-based persistent data, but not with objects or object-relational mapping. Instead, you work with graph data structures that are stored on disk, with parts cached in memory. Interesting stuff.

Still, working with RDF is less convenient than working with graphs in memory. As an example: I used to work a lot with Rete networks in Lisp (hacking Charles Forgy's Lisp code), and graph data structures built up from Lisp lists, cons cells, etc. are just easier to work with. In-memory graphs, semantic networks, etc. are easier for me to wrap my thoughts around. However, approaches like AllegroGraph have the advantage of scalability.
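For readers new to RDF, the data model is a set of subject, predicate, object triples, and a query matches patterns against those triples. The toy in-memory store below is only a sketch of that idea: the TripleStore class and the sample facts are hypothetical and have nothing to do with AllegroGraph's actual API, and a real store indexes its triples (and keeps them on disk) rather than scanning an array:

```ruby
# Toy in-memory triple store (hypothetical class, not AllegroGraph's API).
# Every fact is a [subject, predicate, object] triple; a query is a triple
# pattern in which nil matches anything.
class TripleStore
  def initialize
    @triples = []
  end

  # Add a triple once; duplicates are ignored. Returns self so calls can be chained.
  def add(subject, predicate, object)
    triple = [subject, predicate, object]
    @triples << triple unless @triples.include?(triple)
    self
  end

  # Linear scan over all triples; nil in any position acts as a wildcard.
  def query(subject = nil, predicate = nil, object = nil)
    pattern = [subject, predicate, object]
    @triples.select do |triple|
      pattern.zip(triple).all? { |want, have| want.nil? || want == have }
    end
  end
end

# Hypothetical sample facts.
store = TripleStore.new
store.add(:mark, :knows, :alice)
     .add(:mark, :uses, :allegrograph)
     .add(:alice, :uses, :allegrograph)

# Who uses AllegroGraph?
p store.query(nil, :uses, :allegrograph).map(&:first) # => [:mark, :alice]
```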

Wednesday, May 02, 2007

My article "A Java Developer's Guide to Ruby" was just published

Check it out on DevX.

In this article I write about why a Java developer might also want to use Ruby, and I cover a few cool Ruby features.