Saturday, September 20, 2008

Extracting text from documents

I am happy to see that the Apache POI project's new POI 3.5.1 beta 1 supports some OpenOffice.org document formats. I have been using POI for years to access the contents of Microsoft Office documents from Java applications, and it is great to have one library that supports most of the document types that I need to work with. POI is also usable from JRuby, or from Ruby using the POI-Ruby sub-project (which requires compiling POI with gcj and then using SWIG). BTW, if you want something simple and hackable, my Open Source web page has a Ruby library that I wrote about 4 years ago for working with OpenOffice.org, Word, and AbiWord documents.
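To give a feel for the "simple and hackable" approach, here is a minimal sketch in plain Java that pulls the text out of an OpenDocument (.odt) file. It uses neither POI nor the Ruby library mentioned above. It relies only on the fact that an OpenDocument file is a ZIP archive whose body lives in an entry named content.xml. The class name OdtTextExtractor and its method names are made up for illustration, and the code assumes the document uses the conventional "text:" prefix for paragraph and heading elements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of POI or any other library.
class OdtTextExtractor {

    // Reads an OpenDocument file from the stream and returns its plain text.
    static String extractText(InputStream odf) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("content.xml")) {
                    // readAllBytes (Java 9+) stops at the end of the current entry.
                    return textOf(zip.readAllBytes());
                }
            }
        }
        throw new IOException("no content.xml entry found");
    }

    // Collects all character data, adding a newline after each paragraph or heading.
    private static String textOf(byte[] xml) throws Exception {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Assumption: the conventional "text" namespace prefix is used.
                if (qName.equals("text:p") || qName.equals("text:h")) {
                    out.append('\n');
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java OdtTextExtractor <file.odt>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(extractText(in));
        }
    }
}
```

This ignores tables, footnotes, and styles entirely, which is fine when all you want is searchable text. For Microsoft Office formats a real library like POI is the better tool.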
