Saturday, June 30, 2007

Education is a lifelong path

For both personal growth (and fun!) and to stay current as a consultant, I find myself spending about 10 to 12 hours a week in general learning activities. In addition to good books, ACM Portal papers, etc., one of the most enjoyable sources of new and interesting ideas is the Google Video lectures. If you have not already done so, install the Google video player: it gives higher resolution than in-browser viewing, which makes it easier to read view-graphs and follow live demos. On OS X, the video player stores downloads in ~/Movies, where I keep a text file of notes on what I found most useful or interesting in the videos, along with approximate time locations.

Friday, June 29, 2007

I upgraded my open source projects to use version 3 of the GPL and LGPL today

I agree with the new provisions, so why not! I believe that more companies and organizations should make wider use of the GPL (or Apache, BSD, etc.) to lower costs and improve quality. I argue that a small part of IT infrastructure can be proprietary to protect intellectual property, while most of it is layered on top of open source infrastructure software (and supports that software with contributions).

Thursday, June 28, 2007

I revisited the scene of my 2/2/2006 accident today

One of my friends (who is a psychologist) suggested this morning that I go back to the hiking location of my 2/2/2006 accident (broken bone in my shoulder, torn rotator cuff, shoulder dislocation). I was hesitant at first about going back, but I had a great time today: we climbed the mountains separating Boynton and Fay canyons. I had a difficult time getting up the mountain, but the experience was very good. This picture (taken today) is near the area where I fell: [photo: above Fay Canyon looking towards Bear Mountain]. In addition to the views, we enjoyed visiting two Indian ruin sites today. I have many pictures of the Sedona area on my Flickr photo site.

Saturday, June 23, 2007

My best Ruby coding hint

One programming trick that I use to reference Ruby source code for the standard libraries and locally installed gems is to set up TextMate projects (Eclipse Ruby projects also work well) for:
  • /usr/local/lib/ruby/1.8 - location of Ruby source code for standard libraries on my system
  • /usr/local/lib/ruby/gems/1.8/gems - location of local gem installs (often contain examples/tests and documentation files) on my system
When you are using a standard library or a gem, it is great to have the source right in front of you for reference.
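The two paths above are specific to my system. As a minimal sketch, assuming a reasonably recent Ruby with RubyGems installed, you can ask Ruby itself where these directories live before setting up the editor projects:

```ruby
require 'rbconfig'
require 'rubygems'

# Directory holding the Ruby source for the standard libraries
stdlib_dir = RbConfig::CONFIG['rubylibdir']

# Each RubyGems installation root keeps unpacked gem sources in a "gems" subdirectory
gem_source_dirs = Gem.path.map { |root| File.join(root, 'gems') }

puts "Standard library: #{stdlib_dir}"
gem_source_dirs.each { |dir| puts "Gems:             #{dir}" }
```

Gem.path can list more than one root (for example a system directory and a per-user directory), so there may be more than one gems directory worth adding as a project.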

Friday, June 22, 2007

"Everything is Miscellaneous" - book review

I just finished David Weinberger's great book Everything Is Miscellaneous: The Power of the New Digital Disorder during a picnic with my wife today. For me, the main "take away" ideas from the book were:
  • In a digital world, the same information can be in multiple places, and organized the way users need or want it.
  • Rigid tree-structured indexing allows parent/child associations but fails to allow general associations between objects. Example: multiple tags allow users to identify information the way they want to access it.
  • When applying tags indicating that an object belongs to a group, it is good to allow a numeric value. Example: Mary is a: manager (0.75), technical guru (0.9), good in meetings (0.25)
  • Technologies like Wikipedia will continue to make information and knowledge a commodity without the limits placed on paper documents like the Encyclopaedia Britannica. There are no limits to the number and size of articles in Wikipedia. Wikipedia supports rich links that can identify relationships between articles. As articles in Wikipedia mature and stabilize, they transition from being information to representing knowledge.
  • The web is messy: anyone can write and link to anything. RDF and the Semantic Web can provide some degree of order, still allowing messy bottom-up development while providing opportunities to share schemas and ontologies.
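The weighted-tag idea in the third point is easy to sketch in Ruby. This is my own illustration, not something from the book: Mary's weights come from the example above, while the 'Bob' entry, the helper names, and the 0.5 default threshold are hypothetical.

```ruby
# Each person maps tags to a membership weight between 0.0 and 1.0.
# Mary's weights are from the example above; Bob is a made-up entry.
PEOPLE = {
  'Mary' => { 'manager' => 0.75, 'technical guru' => 0.9, 'good in meetings' => 0.25 },
  'Bob'  => { 'manager' => 0.2, 'good in meetings' => 0.8 }
}

# Names of people whose weight for +tag+ is at least +threshold+
# (a missing tag counts as weight 0.0)
def tagged_with(people, tag, threshold = 0.5)
  people.select { |_name, tags| tags.fetch(tag, 0.0) >= threshold }.keys
end

# Everyone who carries +tag+ at all, strongest membership first
def ranked_by(people, tag)
  people.select { |_name, tags| tags.key?(tag) }
        .sort_by { |_name, tags| -tags[tag] }
        .map(&:first)
end

p tagged_with(PEOPLE, 'manager')        # prints ["Mary"]
p ranked_by(PEOPLE, 'good in meetings') # prints ["Bob", "Mary"]
```

Keeping the weight on each tag, rather than a plain yes/no membership, lets the same data answer both "who is a manager?" and "who is the best choice for running a meeting?".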
A fun book to read and a good source of ideas.

BTW, here is a picture taken near where we had our picnic today (Red Rock Crossing in Sedona, Arizona): [photo]

Why GPL version 3 is important

Just my personal take: compatibility with the Apache 2 license is huge. I would guess that most developers of software under the GPL and Apache licenses would like to enable people and organizations to build new open source software with both GPL and Apache licensed components.

Another reason why GPL v3 is important: there are too many open source licenses right now. If GPL v3 is successful, then I hope that an even larger percentage of open source projects will use the GPL. While some corporations do everything that they can to lock in business by spreading FUD about open source and by using non-standard file formats, it is in the general public interest to have a strong open source infrastructure and standard document formats. Standardization of open source licenses is also important, which is why I would encourage standardizing on the GPL (and LGPL), Apache, and BSD licenses: whatever developers feel is right for their projects and their personal philosophy on open source. I also believe that there is no problem mixing BSD and GPL modules in the same system, as long as the individual licenses are included with any distribution.

While my projects are currently LGPL, I prefer the GPL. The LGPL has one huge advantage: LGPL-licensed code can always be used under the GPL. I LGPL my projects to let more people and organizations use my work, but I am considering changing to GPL v3 in the future.

Thursday, June 21, 2007

Hibernate Search: good integration of Lucene and Hibernate

In the last 2 years, when not too busy consulting, I have been working on a knowledge management system, KBSportal. While I have prototyped some ideas in Ruby and Common Lisp, my final target implementation language has always been Java (I am trying to make something that will be very widely used).

In the Java version, I implemented a threaded asynchronous system for both maintaining a relational database and managing Lucene indices and search. Last night I spent some time reading the documentation and playing with the Hibernate Search example programs. Hibernate Search supports both an asynchronous update mode and a simpler synchronous mode where objects are created in a relational database and immediately indexed for search. The search API returns either object IDs with search scores, or simply the Java objects matching a search query. The important thing to me is that by using Hibernate Search I can remove a lot of my own code, making my system easier to understand and modify, and take advantage of future improvements made by the Hibernate Search developers. Writing my own version was fun and educational, but I like the Hibernate Search implementation more than my approach, which was custom fitted to my application. Great stuff!
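To make the difference between the two update modes concrete, here is a minimal Ruby sketch under simplifying assumptions: a Hash stands in for the relational database and a tiny inverted index (word to list of ids) stands in for Lucene. The IndexedStore class and its methods are hypothetical; this is neither the Hibernate Search API nor my KBSportal code.

```ruby
require 'thread'

# Hypothetical store: the "database" is a Hash, the "search index" is an
# inverted index mapping each lowercased word to the ids containing it.
class IndexedStore
  attr_reader :db

  def initialize(async: false)
    @db = {}
    @index = Hash.new { |h, k| h[k] = [] }
    @lock = Mutex.new
    @async = async
    return unless async
    # Asynchronous mode: a worker thread indexes queued objects in order
    @queue = Queue.new
    @worker = Thread.new do
      while (job = @queue.pop) != :stop
        index_document(*job)
      end
    end
  end

  # Always write the object to the "database" first, then index it
  # immediately (synchronous) or hand it to the worker (asynchronous)
  def save(id, text)
    @lock.synchronize { @db[id] = text }
    if @async
      @queue << [id, text]
    else
      index_document(id, text)
    end
  end

  # Ids of saved objects whose text contains +word+ (case-insensitive)
  def search(word)
    @lock.synchronize { @index.fetch(word.downcase, []).dup }
  end

  # In asynchronous mode, let the worker drain the queue, then stop it
  def close
    return unless @async
    @queue << :stop
    @worker.join
  end

  private

  def index_document(id, text)
    @lock.synchronize do
      text.downcase.scan(/\w+/).uniq.each { |word| @index[word] << id }
    end
  end
end
```

In synchronous mode save returns only after the object is searchable; in asynchronous mode save returns right away and the worker thread catches up, so a search issued immediately afterwards can briefly miss new objects. That trade-off (faster writes versus immediately consistent search) is the main reason to offer both modes.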

For Ruby developers: check out acts_as_ferret, which provides the same kind of integration between Ferret (a Ruby/C port of Lucene) and ActiveRecord: also good stuff!

Wednesday, June 20, 2007

Cool: Ruby Google GData client library

This library is still being developed (right now only Calendar and Blogger support are in place). If you need a complete client API implementation right now, use the Java libraries: the examples are very complete, making it easy to get started. I am going to wait for the rest of the Ruby client library to be released before I get serious with GData applications, because I find Ruby to be a more convenient "hacking" language for getting (mostly small) projects done quickly.

Fantasy that may likely come true: Prometeus (experience is the new reality)

This video, at least on a technology level, syncs up very well with my own expectations: a 5-minute video on YouTube, well worth watching if you have not already seen it.

Artificial Intelligence (AI) information assistant avatars, compelling Virtual Reality (VR) with haptics (*), and shared information will almost certainly be in our future (assuming that Bush or someone else does not start using nuclear weapons and destroy civilization).

I think that many of you would agree with this view of our future technology and how it will affect life and work. What is less clear to me is what our society will look like. Assuming our civilization lasts, I think that we have 2 probable outcomes:
  • Novelist William Gibson's view of the future: governments are sidelined, corporations form the structure of economic and work life, "dog eat dog" meritocracy at all levels of society - this is the outcome I expect
  • A system like we have now, but where "competitive" governments vie for skilled workers and citizens by offering low tax rates, public safety, and fair and balanced arbitration between public and corporate interests - probably not going to happen, but this would be the best outcome
(*) haptics: force feedback in human/computer interfaces, plus senses like taste, smell, etc. I implemented a force feedback steering wheel for a VR racing simulator about 12 years ago: if you drove the vehicle off the racing track, you would feel the off-road vibration differently than the on-road steering wheel vibration; hit something and you would feel a realistic effect in the steering wheel. Coupled with good sound experiences with spatial effects (head related transfer function), this kind of thing is very compelling, creating "suspension of disbelief" as it is known in the VR business.

Sunday, June 17, 2007

Using Lucene with JRuby

I use the Ruby Ferret indexing and search library a lot. Ferret is a port (some Ruby, mostly C) of Lucene. I have recently been getting into using JRuby. A few days ago, I discovered that it was reasonably easy to run a simple Rails web application on the Java application server JBoss using JRuby (this took me an hour; next time will be easy). Today, I spent a short while getting Lucene and JRuby working together:
require "java"
require "lib/lucene-core-2.1.0.jar"

# Thin JRuby wrapper around the Lucene 2.1 Java API
class Lucene
  def initialize(an_index_path = "data/")
    @index_path = an_index_path
  end

  def add_documents id_text_pair_array # e.g., [[1,"test1"],[2,'test2']]
    index_available = org.apache.lucene.index.IndexReader.index_exists(@index_path)
    # the third argument tells Lucene to create a new index only if none exists yet
    index_writer = org.apache.lucene.index.IndexWriter.new(
      @index_path,
      org.apache.lucene.analysis.standard.StandardAnalyzer.new,
      !index_available)
    id_text_pair_array.each {|id_text_pair|
      term_to_delete = org.apache.lucene.index.Term.new("id", id_text_pair[0].to_s) # if it exists
      a_document = org.apache.lucene.document.Document.new
      a_document.add(org.apache.lucene.document.Field.new('text', id_text_pair[1],
        org.apache.lucene.document.Field::Store::YES,
        org.apache.lucene.document.Field::Index::TOKENIZED))
      # index the id as a single untokenized term so it can be matched exactly
      a_document.add(org.apache.lucene.document.Field.new('id', id_text_pair[0].to_s,
        org.apache.lucene.document.Field::Store::YES,
        org.apache.lucene.document.Field::Index::UN_TOKENIZED))
      index_writer.updateDocument(term_to_delete, a_document) # delete any old docs with same id
    }
    index_writer.close
  end

  # Returns an array of [score, id, text] triples for documents matching query
  def search(query)
    parse_query = org.apache.lucene.queryParser.QueryParser.new(
      'text',
      org.apache.lucene.analysis.standard.StandardAnalyzer.new)
    query = parse_query.parse(query)
    engine = org.apache.lucene.search.IndexSearcher.new(@index_path)
    hits = engine.search(query).iterator
    results = []
    while (hits.hasNext && hit = hits.next)
      id = hit.getDocument.getField("id").stringValue.to_i
      text = hit.getDocument.getField("text").stringValue
      results << [hit.getScore, id, text]
    end
    engine.close
    results
  end

  def delete_documents id_array # e.g., [1,5,88]
    index_available = org.apache.lucene.index.IndexReader.index_exists(@index_path)
    index_writer = org.apache.lucene.index.IndexWriter.new(
      @index_path,
      org.apache.lucene.analysis.standard.StandardAnalyzer.new,
      !index_available)
    id_array.each {|id|
      index_writer.deleteDocuments(org.apache.lucene.index.Term.new("id", id.to_s))
    }
    index_writer.close
  end
end
This code assumes that the Java Lucene JAR file lucene-core-2.1.0.jar is in the subdirectory lib. A short test program (with the class above saved as lucene.rb) is:
require "lucene"
require 'pp'

ls = Lucene.new
ls.add_documents([[1,"test one two"],[2,'testing 1 2 3'], [3,'this is a longer test string']])
ls.delete_documents([1]) # optional: test document delete from index
pp ls.search("test")
I had some hesitations about JRuby: I was concerned that using JRuby would lack the lightweight feel of hacking in native Ruby. No worries though: JRuby is easy and quick to work with.

Tuesday, June 12, 2007

GMail PowerPoint file viewing

If you use GMail, try searching your email for "pps" (the extension for PowerPoint slideshow attachment files), then select "View as Slideshow". Very cool. Now if Google adds a service like the Gliffy online diagram editor to the Google Office suite, then they will have a real office replacement: web applications that are less capable than desktop applications like OpenOffice.org but are a big win when it comes to group access, search, etc. I have started using Google documents for short papers (less than 3 or 4 pages), notes on how to configure servers for Tomcat, Seaside, etc., and ideas for technical and creative writing. For me, the one last 'must have' feature for Google documents is a technical diagram drawing tool with the ability to embed figures in Google word processing documents.

Monday, June 11, 2007

Apple Safari for Windows

I am surprised: Safari for Windows :-)

I just installed the new Safari 3 beta on my MacBook, but did not grab the Windows version. I see the big win in distributing a Windows version of iTunes, but I don't see the win in tossing yet another web browser into the mix on Windows, unless Apple is planning on making Safari into a platform with "proprietary" extensions. I put "proprietary" in quotes because the core of Safari is open source.

Sunday, June 10, 2007

Politics: a lost cause (I changed my blog name)

As my friends and family know, I have basically given up on the political and economic situation in my country (USA). Total corporate control of the news media, along with large-scale corporate bribes to both Democrats and Republicans, is having two unstoppable effects: we are seeing the largest transfer of wealth to the super rich ever, and our representative democracy loses effectiveness when the corporate media can sideline popular candidates like Ron Paul and Dennis Kucinich.

Oh well. One last bit of advice to the economically marginalized (i.e., "the middle class"): perhaps it would be a good time to downsize your lifestyle and get rid of as much debt as you can, while it is still possible to do so.

On a happier note: when I removed the word "politics" from my web blog title, I added "Knowledge Management" - something that I work on a lot.

Saturday, June 09, 2007

JRuby 1.0 released

Great news! Although I also use Common Lisp and Java a lot, Ruby is a favorite programming language of mine because code is so concise and easy to read. If I never had to worry about runtime performance, I would use Ruby even more. I suspect that JRuby will evolve into a reasonably fast language implementation, even if the JVM is not the best runtime platform for a dynamic language like Ruby.

Monday, June 04, 2007

Why the complaints over purchaser's name embedded in Apple's non-DRMed music?

I do not understand the complaints! If I buy non-DRMed music that I am free to use personally for the rest of my life on any device that I own, why should I care if my name is watermarked in the music? This music is not supposed to be shared with other people anyway; fair enough.

Apple did do something recently that I did not like: they made it more difficult to generate MP3 files. I use 2 cheap generic MP3 players, not iPods, so I do need MP3 files generated from what I purchase. I have seen some workarounds listed on the web, but it is an inconvenience. I suspect that there is a strong connection between the watermarking and the difficulty in converting to MP3s, since conversion would obviously lose the watermarking. It seems to me that Apple should have made the new version of iTunes make it very easy to generate MP3s, with the conversion process copying the watermark: I would not care about that. Looks like Apple does not want to make using generic MP3 players with iTunes an easy process; I understand this but don't like it.

Sunday, June 03, 2007

Emacs 22.1 is released - first major update in 5 years

Check it out!

For Mac OS X users, I built the new version using:

./configure --enable-carbon-app=/Users/mark/bin --without-x

to also build an Aqua application in my personal bin directory, in addition to a command line version in /usr/local/bin/emacs.

Again, for Mac users: I like to put this in my .emacs file:

(global-set-key "\C-xt" 'tool-bar-mode)

This lets me toggle the graphical toolbar icons on and off when running the Aqua application.

The URL for this blog has changed

Please use http://markwatson.com/blog/ now to access this blog. The old URL does a redirect. Please also use the new RSS feed: http://markwatson.com/blog/atom.xml

Saturday, June 02, 2007

Is the Common Lisp 'loop' macro evil?

I must admit to being "old school" when it comes to Lisp coding. I like s-expressions. The loop macro introduces a non-s-expression syntax to the language that I find a little uncomfortable. I spent 20 minutes this morning looking for interesting examples of loop usage and generally playing with them. Definitely some cool stuff, but I think that I will continue to pass on using the loop macro, except when it is already in other people's code: no point in rewriting stuff that works.

Different programming languages have different coding styles that work well and are language specific. As an example, yesterday I wrote some Ruby code with two nested collect blocks. I had been coding non-stop in Common Lisp for weeks, and before I checked my new Ruby code into svn, I took another look at it: it looked strange at first, even though it was a natural Ruby coding style. I looked at it again this morning, after my "thinking in Lisp" mode was temporarily suspended, and the nested collect blocks looked just fine. Amazing how the choice of programming language affects the way we think when solving problems.
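For readers who have not seen this Ruby idiom, here is a small made-up example of two nested collect blocks (an illustration only, not the code I actually wrote): the outer block maps over rows and the inner block maps over columns, building a small multiplication table.

```ruby
# Outer collect walks the rows; inner collect builds each row's values
table = (1..3).collect { |row| (1..3).collect { |col| row * col } }

p table # prints [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
```

The nesting reads naturally once you are in a Ruby frame of mind: each block returns a value, and collect gathers those values into a new array.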