Tuesday, October 02, 2007

Good PDF book: "Ferret" by David Balmain

Slashdot had a discussion yesterday on indexing and searching documents, a subject of particular interest to me. After reading the comments, I revisited the indexing and search tools that I have used over the years. Ferret (a Lucene clone) is my favorite library for several reasons: it follows the Lucene API (which I have used for years), it is very fast, and coding in Ruby is faster for me than coding in Java (Lucene) or Common Lisp (Montezuma). I bought Dave's book on Ferret yesterday, and it is a good reference with lots of good examples.

I have a "semi alive" open source project (KBSPortal) that is written in Java and uses Lucene along with my own clustering and analysis libraries. I have been mulling over switching to Ruby and Ruby on Rails for three reasons: developing the web interface would be easier, I like coding in Ruby more than Java, and there are some very nice text analysis Ruby gems that I could use in place of some of my own Java analysis code (in the spirit of building on other people's libraries, when possible, to take advantage of shared work). I also get consulting work setting up custom document management systems, and I would like to have a complete stack that could be set up and customized in less than a day.
