Friday, May 23, 2008

"Indiana Jones and the Kingdom of the Crystal Skull": Cate Blanchett steals the show

Cate Blanchett plays the villainess, with lots of humor. This movie is well worth seeing. Blanchett is an amazing actress who has played many very different types of roles in her career.

Wednesday, May 21, 2008

Packaging Java libraries to be "IDE friendly"

I have been working on packaging some of my (mostly) research code into four libraries (a picture), three of which depend on the fourth.

My concern is mostly "Java IDE kung-fu": users of these libraries (mostly just me :-) only want the highest level APIs to show in popup completion lists, with the entire set of implementation classes remaining invisible. The solution is easy: a public API class with implementation classes in the same package that have package-only visibility (i.e., declared with no access modifier, so neither public nor private).
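Here is a minimal sketch of that layout, not the actual code from my libraries. The names TextTools and Tokenizer are made up for illustration, and everything is squeezed into one file in the default package so it can be run as-is; in a real library each class would sit in its own file under a shared named package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** The only public type: this is what shows up in IDE completion for library users. */
public final class TextTools {
    private TextTools() {}  // static-only facade, no instances

    /** Splits text into lowercase word tokens by delegating to the hidden implementation. */
    public static List<String> tokenize(String text) {
        return new Tokenizer().split(text);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, IDE world!"));  // prints [hello, ide, world]
    }
}

/** No access modifier: package-private, so code outside this package cannot see it. */
final class Tokenizer {
    List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        // Lowercase with a fixed locale, then split on runs of non-alphanumeric characters.
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }
}
```

Code in any other package can call TextTools.tokenize(...) but cannot even name Tokenizer, so it stays out of completion lists there. One trade-off is that the API class and its implementation classes have to share a single package.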

Sunday, May 18, 2008

Scala and the Lift web application framework

I have been playing with Scala for a while - playing is the correct word to use since I am waiting to see how popular the language becomes. I think that Scala will possibly end up being 'the better Java' for the JVM, but for my business I prefer not learning and using another language that is not mainstream (my almost 25 years of using Lisp professionally have sometimes been a hassle because of the unavailability of other skilled Lisp developers and a smaller ecosystem, and I don't want to devote a lot of time to mastering another language that may end up being "on the fringe").

That said, Scala is a very nice language that has two non-language things going for it: very efficient runtime performance with OK memory use, and the fact that it runs on the JVM. Scala looks to be a good language for AI development, and its interactive console adds some of the advantages of interactive bottom-up development - a style I like to use when working in Lisp, Ruby, or Python.

Until this morning I had only read about the Scala Lift web framework, but after reading Vivek Pandey's blog about running Lift I gave it a try. The Maven setup and default web application construction were both very smooth, and the generated code was interesting to read. I also like the way Scala unit tests work and the debug modes supporting both an interactive Scala console and running with an embedded Jetty web server. Everything works very well together and the entire system has a polished feel to it.

Saturday, May 17, 2008

Book review: "Semantic Web for the Working Ontologist"

Dean Allemang and Jim Hendler's book provides a good overview of data modeling for the Semantic Web. Amazon purchase link: Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. As someone who has invested a lot of time with both open source tools (Jena, Redland, Sesame, OwlAPI, Protege, and SWI-Prolog's Semantic Web libraries) and a commercial product (Franz AllegroGraph), it is refreshing to read a good book that abstracts away details like specific tools and RDF/XML serialization and covers concepts and modeling how-to issues. I found it useful to enjoy this book at a high level while stopping occasionally to experiment at the low level with OwlAPI, Redland, Protege, Sesame, and AllegroGraph. BTW, I wish that someone had told me years ago to never view XML serialization of RDF :-) The authors' choice of showing XML serialization one time and then using N3 is very good.

There are a few tiny annoyances with this book, the primary one being small errors in the text that should have been caught in technical review. These do not, however, detract at all from the usefulness of the book - it is just too bad that such a well-thought-out book has mistakes that would have been easy to fix.

For me, one of the potential uses of this book is to lend it or recommend it to customers who might want or need to use Semantic Web technology. I make my living as a consultant, so it is important to have well informed customers. This book will provide a good understanding and rationale for technically inclined customers, especially people with strong domain knowledge who want to (and can) directly participate in modeling efforts.

Saturday, May 10, 2008

Programming: sometimes simpler is better

I recently chose a development environment for a spare time project: I am re-working some of my old algorithms and miscellaneous code (in several different programming languages) for extracting semantic information from plain text after reading through the excellent book The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. I have been working on information extraction for about 20 years (very much part time), and although most of the material in this book was familiar I found the book to be an excellent reference and a good summary of the state of the art in information extraction techniques. I have blogged before about the excellent Reuters/ClearForest system - the authors were principals at ClearForest.

For this project I chose the combination of Gambit-C Scheme, Emacs, and a few customizations of the Gambit-C Emacs code. For "mostly thinking" projects like my information extraction library, I like simplicity: a simple clean programming language and an environment that provides good editing and debugging support but otherwise stays out of my way. Professionally, I do a lot of work with Common Lisp (either Franz + ELI + Emacs, or SBCL + Slime + Emacs), but since I am basically just experimenting with algorithms I felt like using something lightweight. I thought about using Ruby (with either the excellent NetBeans support or TextMate), but I like the ease with which Gambit-C Scheme can be used to build native applications or libraries (it compiles to intermediate C), and I will probably want to share my information extraction program (perhaps a free and a commercial version) but not release the source code. The performance of compiled Gambit-C code is also very good.

Tuesday, May 06, 2008