Saturday, May 19, 2007

Why the ODF is better than Microsoft's document formats

It takes a few lines of Ruby code to process document files:
require 'rubygems'
require 'rexml/document'
require 'rexml/streamlistener'
require 'zip/zipfilesystem' # install with gem
include REXML

class OOXmlHandler
include StreamListener
attr_reader :plain_text
def initialize; @plain_text = ""; @last_tag_name =""; end
def tag_start name, attrs; @last_tag_name = name; end
def text s
@plain_text << s << "\n" if @last_tag_name.index('text')

class ReadOpenOffice
attr_reader :text
def initialize file_path { |zipFile|
xml_handler =
Document.parse_stream(('content.xml')), xml_handler)
@text = xml_handler.plain_text

I have spent too much of my time over the last 10 years dealing "programatically" with Microsoft document formats. I am tired of wasting my time when open document formats are so much easier and less expensive to use.