Thursday, September 25, 2008

Looking for reviewers for my book "Practical Artificial Intelligence Programming With Java"

I am within a month or so of completing the third edition of my book. This book will always be available as a free PDF from my web site and as an instant-print book.

I would very much appreciate technical feedback on the manuscript, which can be downloaded from my open content page: www.markwatson.com/opencontent/

A direct download link is: www.markwatson.com/opencontent/JavaAI3rd.pdf

Thanks in advance!

Wednesday, September 24, 2008

Inspirational: David Heinemeier Hansson's keynote talk at RailsConf

I enjoyed his talk because we share many of the same values and ideas: work relatively few hours per week but make them count, invest in your own future, etc.

For much of my career I worked at a large corporation (SAIC) but usually worked just 32 hours per week (anything above 30 hours qualified me for full benefits). In exchange for giving up 20% of my pay and not working Mondays, I had time to write, spent more time with family and friends, and simply enjoyed life more. David's company uses a 32-hour work week and he said that they have not lost productivity.

I also liked what he said about figuring out what things really make a difference to society and your career and working hard on those rather than spending too much effort on things that don't matter very much. Good advice, but not so easy to do. I live in a beautiful but remote area so my current career is basically competing with other telecommuters, many of whom live in countries with a much lower cost of living. Getting the most (important) work done in short time periods is crucial for my business, so efficiency "is king" in my home office and, as David points out, significantly more productive tools like Rails really do provide a good competitive advantage. I still like to build systems using Java server side technologies but only if I am certain that a customer can afford larger development costs and the project needs the extra runtime performance and scalability.

David also talked about life outside of technology and I am fairly good at taking time off for hiking, kayaking, cooking, movies, etc. That said, maintaining life outside of technology is a challenge for me because even though I can usually keep my consulting workload between 15 and 25 hours a week, I also enjoy writing (a lot!) and it is not so easy to limit my time. A little off topic, but one of the things that I enjoy most about writing is that the process always makes me understand things better. Often I will feel like I understand something just because I use it in my work, only to discover when I try to explain it that there are gaps in my knowledge, or that my understanding is not as deep as I thought until I spend extra effort.

Sunday, September 21, 2008

Very cool: PracTeX online magazine for LaTeX users

Something for people who love to use LaTeX (all 17 of us): PracTeX RSS feed.

I very much enjoy writing and using LaTeX makes the whole process even more fun. I just discovered PracTeX this morning (thank you Google Reader "suggestions" :-)

Saturday, September 20, 2008

Space4J: similar to Prevayler but takes advantage of Java 1.6 concurrent data access APIs

Prevayler is a great alternative to using a relational database if you need persistence and your application's data easily fits in memory. I wrote a large web app (similar to SharePoint) about 6 years ago using Prevayler and at one point it ran almost three years without restarting (until the server that it was running on needed maintenance). Prevayler is solid stuff.
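The idea behind this style of persistence can be sketched in a few lines: keep all data in memory, and write every change to an append-only journal before applying it, so that the state can be rebuilt by replaying the journal after a restart. The class below is only a hypothetical illustration (TinyPrevalence, put, and get are made-up names, not the Prevayler or Space4J API), and it leaves out the snapshots and command objects that a real library provides.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; this is NOT the Prevayler or Space4J API. */
final class TinyPrevalence {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final Path journal;

    TinyPrevalence(Path journal) throws IOException {
        this.journal = journal;
        if (Files.exists(journal)) {
            // Rebuild the in-memory state by replaying every logged change in order.
            for (String line : Files.readAllLines(journal, StandardCharsets.UTF_8)) {
                int tab = line.indexOf('\t');
                if (tab >= 0) {
                    data.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        }
    }

    /** Appends the change to the journal first, then applies it to memory. */
    synchronized void put(String key, String value) throws IOException {
        // The one-line-per-change journal format cannot represent these characters.
        if (key.contains("\t") || (key + value).contains("\n") || (key + value).contains("\r")) {
            throw new IllegalArgumentException("tabs in keys and line breaks are not supported");
        }
        try (BufferedWriter out = Files.newBufferedWriter(journal, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(key + "\t" + value);
            out.newLine();
        }
        data.put(key, value);
    }

    /** Reads are served from memory and never touch the disk. */
    String get(String key) {
        return data.get(key);
    }
}
```

Creating a second TinyPrevalence on the same journal file, as a restarted application would, replays the logged changes and ends up with the same data.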

Space4J works on the same concept but takes advantage of the Java 1.6 concurrency APIs. Space4J is a new project and I have only had time to look through the source code and examples, but it definitely looks like a possible substitute for Prevayler on future projects. I think that Space4J could take more advantage of generics (e.g., the interface Space and implementing classes could be "generic-ized" for type safety and elimination of unchecked casts).
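To show what "generic-izing" buys in general terms, here is a small sketch of a typed store. TypedStore and MapStore are hypothetical names invented for this example; they are not Space4J's Space interface or any of its classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical typed store; not Space4J's actual Space interface. */
interface TypedStore<K, V> {
    void put(K key, V value);

    V get(K key);
}

/** Map-backed implementation built on a java.util.concurrent collection. */
final class MapStore<K, V> implements TypedStore<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();

    @Override
    public void put(K key, V value) {
        map.put(key, value);
    }

    @Override
    public V get(K key) {
        return map.get(key); // no cast needed: the compiler knows the value type
    }
}
```

With a TypedStore<String, Integer>, calling get returns an Integer directly, and trying to put a String value is rejected by the compiler instead of surfacing later as a ClassCastException.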

Extracting text from documents

I am happy to see that the Apache POI project's new POI 3.5.1 beta 1 is supporting some OpenOffice.org document formats. I have been using POI for years to access the contents of Microsoft Office documents from Java applications. It is great to have one library that supports most document types that I need to work with. POI is also usable with JRuby or with Ruby using the POI-Ruby sub-project (requires compiling POI with gcj and then using SWIG). BTW, I have a Ruby library that I wrote about 4 years ago on my Open Source web page for working with OpenOffice.org, Word, and AbiWord documents if you want something simple and hackable.

Monday, September 15, 2008

Distributed robust system for provenance and trust in Semantic Web Applications and Tim Berners-Lee's new World Wide Web Foundation

With some reluctance, I am going to toss out what I think is a great business idea that is too large and resource intensive for me to pursue myself: develop the infrastructure and business models for a network graph (not a hierarchy) of "trust providers" similar to issuers of SSL certificates like Thawte, but for semantic web data.

First, I want to describe the problem to be solved: assuming the existence of RDF/RDFS/OWL data on the web, how do you know what is correct and what is faked for whatever nefarious reasons? What is the provenance of the data? Even human readers have a difficult time separating out real information from rumors, errors, and outright lies on the web.

Proposed solution: organizations "sign" data with a certificate for either a fee or other motivation. Using the current technology, RDF triples would be reified with one or more "trust tokens" (also implemented as RDF) from known signers who vouch for the provenance and accuracy of data. For now, this rating would have to be performed by human analysts, but could hopefully be done quickly and not too expensively with something like Amazon's Mechanical Turk system. I don't see this trust measurement as a Boolean trust or no-trust value - rather, a numeric range. Further: known signers can rate other signers. Signers would have a trust score. Accuracy and provenance of data could thus be assigned a trust score based on the trust ratings given by one or more signers and the trust score of the signers themselves. The problem is to make this process of assignment a small fraction of the cost of producing RDF/RDFS/OWL knowledge sources while adding significant extra value.
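One simple way to combine these numbers, offered only as a sketch: treat each signer's rating of a piece of data as a value between 0 and 1, and take the average of the ratings weighted by each signer's own trust score. The weighted average, the 0 to 1 ranges, and the names TrustScore and Vouch are all assumptions made for this example; the proposal above does not fix a particular formula.

```java
import java.util.List;

/** Hypothetical scoring sketch: a trust-weighted average of signer ratings. */
final class TrustScore {

    /** One signer vouching for a piece of data. Both values are assumed to be in [0, 1]. */
    static final class Vouch {
        final double signerTrust; // how much the signer itself is trusted
        final double rating;      // the signer's rating of the data

        Vouch(double signerTrust, double rating) {
            this.signerTrust = signerTrust;
            this.rating = rating;
        }
    }

    /** Returns the trust-weighted average rating, or 0.0 when nobody trusted has vouched. */
    static double score(List<Vouch> vouches) {
        double weightedSum = 0.0;
        double totalTrust = 0.0;
        for (Vouch v : vouches) {
            weightedSum += v.signerTrust * v.rating;
            totalTrust += v.signerTrust;
        }
        return totalTrust == 0.0 ? 0.0 : weightedSum / totalTrust;
    }
}
```

For example, a rating of 1.0 from a signer trusted at 0.9 combined with a rating of 0.0 from a signer trusted at 0.1 gives a score of about 0.9, so a barely trusted signer cannot drag down data that a well trusted signer vouches for.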

There is a lot of literature; try searching for "web of trust semantic web" and "provenance semantic web". When I read about Tim Berners-Lee's new World Wide Web Foundation this morning I started to hope that they might develop some open and free infrastructure software to support trust annotation of data. The high economic cost of quality trust-rated RDF/RDFS/OWL knowledge sources is definitely a problem, but it is difficult to even imagine the possible range of financial and social benefits. Having standard open source software to manage trust would help reduce costs for providing trust and provenance data through a network of cooperating trust providers.

Sunday, September 14, 2008

I'll be at MerbCamp in San Diego October 11-12

I usually use Rails but Merb is also a good alternative (more flexible, thread safe, and probably about twice as fast). Anyway, if you are a Ruby enthusiast who reads my blog and will be attending MerbCamp, let me know so we can meet up.

Rails, Trails, Lift, and Seaside

I am fairly much "in like" with Rails: I have been using it for personal and customer projects for almost 3 years. If Ruby had good runtime performance, I would be happy with Ruby and Rails for most of my development. Because Ruby is such a terse language, it is very easy to read and understand the code and (few) configuration files that Rails generates for you and it is easy to write custom models, controllers, and views - mostly because Ruby is such a fun language to work with.

I just took another good look at Trails this morning, and for building CRUD web applications it is starting to look very good because of the great runtime performance of both Java and the Tomcat/Jetty/Hibernate, etc. software stack. Unfortunately even with annotations for POJOs that make dealing with persistence much easier, Java is not a concise language and I find it less fun to browse generated code and customize generated applications. Not the fault of Trails: Java is a language that is optimized for very large projects rather than agile development of small or medium size projects. This is a "right tool for the job" issue.

Lift is written in the Scala language (runs on the JVM with good Java integration) and largely because Scala is more terse than Java, I find the generated code and any customizations to be easier to read, understand, and write. Scala lacks some flexibility of dynamic JVM languages like JRuby and Groovy, but the runtime performance of Scala is excellent. Lift's built-in ORM persistence is modeled after Rails' ActiveRecord. The fact that Scala natively supports embedded XML makes it interesting for building web applications. As a language Scala looks very good but I have not had time to climb very high up the Scala learning curve.

Both Trails and Lift use Maven and are very easy to install and experiment with: check out the Trails quickstart and the Lift quickstart. Well worth experimenting with.

Seaside runs in the open source Squeak Smalltalk environment (free but performance is almost as bad as Ruby) or the VisualWorks Smalltalk environment (fairly inexpensive commercial licensing and good runtime performance). I have not tried Seaside in other supported Smalltalk systems. When I use Squeak (not too often) it is usually to experiment with some old NLP code that I wrote: a good environment for trying out new ideas. I have experimented with Seaside, and it is easy to build small web sites with and also easy to deploy to Linux servers. Definitely a good option if you already know Smalltalk and you want to write very interactive web applications.

Friday, September 12, 2008

Very useful book: "LaTeX Graphics Companion, The (2nd Edition) (Tools and Techniques for Computer Typesetting)"

The authors have done a great job at creating a virtual encyclopedia that documents packages for generating graphics. (Amazon link)

I am using LaTeX for most of my writing projects and this book provided me with a fast start for generating UML diagrams, 2D and 3D graphics, formulas, Chess/Go/Backgammon boards and move lists, a very wide range of engineering diagrams, music scores, etc. This is a large book (almost 1000 pages) but the layout and well organized examples from the book (which are easy to try out) make the whole book feel accessible and lots of fun to work with.

Saturday, September 06, 2008

Java arrays and primitive types should (perhaps) be deprecated

I have been brushing up on my Java skills this year. For 2 years I did mostly Ruby and Common Lisp development. I think that Lisp is a great research language but not so great for deployments and long term maintainability. Ruby is a great prototyping language and for small web portal projects Rails is my favorite tool. However, I keep coming back to Java because of the tools/libraries and the robust deployment software options.

So, I have carefully read through both Effective Java (2nd edition) and Java Generics recently to brush up on my Java skills. As a result, I completely refactored one of my medium size projects to use generics and collection classes exclusively - no arrays. Because array element types must be reifiable, arrays play poorly with generics.
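A short sketch of the mismatch between arrays and generics (the class and method names are made up for this example): arrays are covariant and check element types at run time, while generic collections are invariant and checked at compile time.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative names only; contrasts run-time array checks with compile-time generic checks. */
final class ArraysVersusGenerics {

    /** Arrays are covariant, so this compiles and the mistake shows up only at run time. */
    static boolean arrayStoreFailsAtRuntime() {
        Object[] objects = new String[1]; // legal: a String[] is also an Object[]
        try {
            objects[0] = Integer.valueOf(42); // wrong element type for a String[]
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    /** Generic collections are invariant, so the same mistakes are compile errors. */
    static List<String> typedList() {
        List<String> names = new ArrayList<>();
        names.add("Prevayler");
        // List<Object> objects = names;                // does not compile
        // names.add(Integer.valueOf(42));              // does not compile
        // List<String>[] table = new List<String>[10]; // does not compile: generic array creation
        return names;
    }
}
```

The last commented-out line is the reifiable-type restriction in action: the element type List<String> is erased at run time, so the array could not enforce it, and the compiler refuses to create the array.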

There are some obvious cases where not using primitive types leads to excessive object creation and boxing/unboxing. That said, I expect Java compilers, Hotspot, and the JVM in general to keep getting better and this may be a non-issue in the future.
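A minimal sketch of the kind of case I mean, using made-up names: the two methods below compute the same sum, but the boxed version unboxes and re-boxes its accumulator on every pass through the loop, which may allocate many short-lived Long objects unless the JIT manages to optimize them away.

```java
/** Illustrative names only; both methods return the sum 0 + 1 + ... + (n - 1). */
final class BoxingCost {

    /** Boxed accumulator: each += unboxes the Long, adds, and boxes the result again. */
    static long sumBoxed(int n) {
        Long sum = 0L;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    /** Primitive accumulator: no objects are created inside the loop. */
    static long sumPrimitive(int n) {
        long sum = 0L;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }
}
```

How large the difference is depends on the JVM version and settings, so it is worth measuring on your own workload before deciding either way.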