Sunday, December 28, 2008

Merb and Rails merge

I have some mixed feelings about the integration of Merb and Rails because Merb is a small "micro kernel" type framework that seems just about perfect for projects like lightweight web services and web portals.

On the good side of this merger: if Rails becomes more modular and the resulting Rails 3.0 can also be tailored down to 'just what is needed', then this merge should end up being good for developers.

I have just signed a publishing contract for a new Ruby book that has one part on publishing information for consumption by both humans and software agents. After waiting a few months, I will grab early Rails 3.0 builds and write to Rails 3.0. A bit funny because before my holiday break I was deciding whether to use Merb or Rails for this book example.

Sunday, December 14, 2008

I am very much enjoying learning Haskell

As I have mentioned, I am learning Haskell to fill a gap in my programming language tool kit: a concise, productive language with great run time performance (low memory use and efficient utilization of multi-core CPUs).

I am experiencing another benefit: a different way to analyze and solve problems. Learning Haskell reminds me of using Prolog. Obviously Prolog and Haskell are very different programming languages so the comparison is really in how different they are from the languages that I usually use in my work (Java, Lisp, and Ruby). It has been over two years since I used Prolog in a consulting job but I used to use Prolog a fair amount. For the near future I am happy learning Haskell for a few specific applications and for giving me a fresh programming perspective. I am also pleased to see the wide range of Haskell libraries and the Haskell community seems interesting. I have been using Planet Haskell as a launch point for reading.

How far can corporate oversight reform go? Fix our agriculture industry?

I saw a link to this video on an O'Reilly web blog - good stuff, but how many people are willing to watch a 30 minute video, even for important issues?

A summary of Michael Pollan's points that resonated with my own beliefs (with my opinions added): The real costs of eating unhealthy food that also stresses our environment in production are hidden by huge government subsidies and the politics of agriculture corporations bribing and influencing the US Congress, and more recently the executive branch. In a weak economy and energy crisis the best "green industry" to promote is small farms that grow food close to where people live and run by farmers who are very skilled in running farms in an ecologically intelligent way. Higher costs of locally grown healthy food that leads to a sustainable society look more reasonable when potentially lower health and energy costs are factored in. This could be highly beneficial to farmers but would reduce profits of the giant agriculture corporations.

In the video, John Battelle and Michael Pollan talked about ways that transparency in our food industry would self-correct consumer food choices. If supermarkets were required by law to post pictures or videos of industrial meat factories then many consumers would naturally switch to healthier food alternatives, even with small extra expenses. If consumers understood the health risks of highly government subsidized foods like low quality factory meat, soybean oils, and highly processed corn-based products then free market forces would shift production to more "society friendly" food products.

You can help yourself and society by trying to buy locally grown food.

Monday, December 08, 2008

Best chicken, ever

My wife and I are both great cooks so when she said that the chicken that I made for our Sunday dinner was the best she ever tasted I thought that it was worthwhile posting my recipe.

24 hours before cooking the chicken (a whole, locally raised, organic bird) I washed it in cold water, trimmed a small amount of the fatty skin, split it through the breast, and folded it flat. Optional: at this point I partially de-boned the chicken by removing the backbone but leaving the skin behind the backbone intact.

I added 3 tablespoons of salt, 2 tablespoons of soy sauce, 3 tablespoons of maple syrup, one teaspoon of turmeric, some pepper, and half a finely diced fresh onion to a large deep glass mixing bowl. I added some water and mixed all the "brine" ingredients. I placed the whole (but slightly flattened) chicken in the bowl, and added just enough water to completely cover it. I covered the bowl and placed it in our refrigerator for 24 hours. Note: it is very important to not let the chicken get to room temperature: poultry should always be kept at 40 degrees (F) or cooler until right before it is cooked.

24 hours later: I pre-heated our oven to 500 degrees and started our barbecue (I used real hickory charcoal, but gas barbecues are OK). When the oven was at 500 degrees, I placed the chicken in a glass shallow baking pan, split side with ribs facing down, and cooked it at 500 degrees for 5 minutes, then turned the oven off. 7 or 8 minutes later when the coals were ready, I placed the partially cooked chicken on the grill, rib side down. Cooking time will vary: try to get the breast meat done without drying out the meat. I turned the chicken a few times while cooking it. Note: the chicken meat absorbs a lot of water during the 24 hour "brining" period so a lot of steam is released while cooking the bird.

If you don't want to barbecue, you can just leave the chicken in the oven to cook it. After 5 minutes in a 500 degree oven, turn the oven down to 325 and continue until the meat is done. Note: you always want to start chickens or turkeys in a very hot oven to drive moisture in the meat into the center, near the bones - this helps to keep the meat from drying out.

Sunday, December 07, 2008

Haskell it is

For my own research (not for consulting work, at least for now) I need to speed up machine learning runs and other experiments. I have "4 cores" to work with (and I hope that my next server purchase for my home office has many more than that) so I have been playing around with different programming languages that support concurrency without a lot of effort.

Haskell has impressive run time and memory performance (for example, when comparing Haskell and Scala). I have been reading an online version of "Real World Haskell" and recently ordered a print copy of the book.

I usually do most of my exploratory/research programming in Scheme or Common Lisp so using a different language is fun. Gambit-C Scheme does have the Termite package for concurrency but something more main-stream like Scala or Haskell seemed like a better idea. I invested some learning time in Erlang about a year ago but I think that Erlang is more optimized for concurrency over different computers on the same LAN rather than using many cores in a single server.

Wednesday, December 03, 2008

Good article on adding security to Semantic Web applications

This article on the Sun BabelFish blog provides good design and some implementation notes for using SSL and (possibly) self-signed certificates for authentication for software agents that have write access to the URI that they are associated with. Useful stuff.

Monday, November 24, 2008

Something fun: new book project on the Semantic Web using AllegroGraph

The book is about 15% done (about 50 pages so far) and a rough draft PDF file is available. I realize that the market for this book will be small because AllegroGraph is a commercial product. However, Franz does make a non-commercial use version available for free, so my expectation is that when the book is done (between 2 and 6 months, depending on how busy my consulting schedule is) a fair number of people will enjoy the book with the non-commercial version of AllegroGraph. The finished book will be available for free as a PDF file and as a print book from lulu.com.

This book is fairly easy for me to write because I have existing coding experiments for just about all the Semantic Web application examples in the book. Also, since there are so many good Semantic Web references on the web and in existing books, I am only covering the SW technology that is used in the book examples. I want the book to be self contained: just enough tutorial and reference material covering AllegroGraph and other SW technologies so readers can completely understand the application examples.

Saturday, November 15, 2008

Giving up on just using one IDE

I have tried hard in the last year to standardize all of my work on one IDE (experimenting with Eclipse+Mylar and NetBeans). I have given up: lately it seems like I need NetBeans for Java-TV (Blu-ray Java) and JavaME tasks, IntelliJ for most other Java tasks and playing with Scala, TextMate (mostly) for Ruby, and Emacs for the Lisp examples for my new book project (on Franz's AllegroGraph). Anyway, I have given up on just being able to use one IDE: a nice thought but not realistic.

Monday, November 10, 2008

My new book "Practical Artificial Intelligence Programming in Java" is available in print and as a free PDF download

My book "Practical Artificial Intelligence Programming in Java, third edition"
is available in print and PDF download versions:
Support independent publishing: buy this book on Lulu.

A free download of the PDF version is available on my Open Content web page.

This book uses several excellent open source and public domain libraries and this code is distributed in the ZIP file of book example code. Please read the third party software licenses in the directory licenses-for-3rd-party-libraries. For the book example code that I have written you can use the Commercial Use License if you have purchased either the for-fee PDF version or the print version of this book. If you have downloaded the free PDF version from this web page then you may use my book example code under the Non-commercial Use License.

Sunday, November 09, 2008

Don't repeat yourself: for code, sure, but how about for data?

In individual applications we want to make sure that we don't have replicated code that is identical or very similar. What about replication across all projects, both active and in "freeze mode"?

I periodically like to consolidate source code: keep a single latest svn trunk version on my system and organize the code that I have written and frequently reuse into libraries. I am in the process of packaging up most of my Ruby code in local gems.

I also have issues with many copies of textual data files. For data used in Java libraries and applications the solution is simple: I keep data with the code that needs it in JAR files that are kept in a single library directory on my development system. I have been doing this for over 10 years and this is a really nice way to keep data assets and code together.

Sometimes I simply link data statically into compiled applications that I use (e.g., in the last year I have reimplemented many of my statistical NLP tools in Gambit-C Scheme and I generate a single command line utility program with all the required data statically linked).

For data assets used in programs developed in multiple programming languages, a "separation of concerns" between code and data assets makes more sense.

I need to better organize other data assets like tagged training data, raw text organized into a hierarchy of categories, data that I have culled from the web and stored in XML files, etc. I am starting the process of putting the most up to date versions into a single directory and tweaking my code to check the DATA environment variable value and then load data assets as needed. I will probably not import this data directory into svn or git: most of the data seldom changes and some of the assets are huge.
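As a rough illustration of the DATA lookup described above, here is a minimal Python sketch. Only the DATA variable name comes from the text; the function names `data_dir` and `load_asset` and the idea of caching each asset after its first read are assumptions made for the example.

```python
import os
from functools import lru_cache
from pathlib import Path


def data_dir() -> Path:
    """Return the shared data directory named by the DATA environment variable."""
    root = os.environ.get("DATA")
    if not root:
        raise RuntimeError("the DATA environment variable is not set")
    return Path(root)


@lru_cache(maxsize=None)
def load_asset(relative_name: str) -> str:
    """Read a text asset the first time it is requested and cache it afterwards."""
    # Hypothetical helper: assets are assumed to be UTF-8 text files.
    return (data_dir() / relative_name).read_text(encoding="utf-8")
```

A caller would ask for an asset by its path relative to the data directory, for example `load_asset("nlp/stopwords.txt")` (a made-up file name). Because results are cached by name, this sketch assumes DATA does not change while the program runs.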

Friday, November 07, 2008

Good Ruby support in IntelliJ 8.0

IntelliJ 8.0 was released yesterday and after installing the Ruby + Rails plugin, IntelliJ is very competitive with NetBeans for Rails development.

One feature that I particularly like is the jump links in the editor that let you jump from a controller method to the corresponding view template. There are also links from a method to the super class method that is being overridden (if any). There is currently a small bug in the plugin: multiple identical jump links are shown; all work the same.

In some ways it is nice to have Java and Ruby support in one IDE, but there are "Java only" menus shown while working on Rails projects - that is one advantage of the new RubyMine IDE: basically IntelliJ with all Java support removed. At the current time, Rails support for IntelliJ 8.0 seems to be more stable than the prerelease version of RubyMine but it will be interesting to compare the two next year when RubyMine is released as a product.

Tuesday, November 04, 2008

Bad news: I did not get to read "The Reasoned Schemer" this morning. Good news: no lines at my polling place today!

I have not read through "The Reasoned Schemer" in a long while, and since it is such a light (to carry) book I thought that it would be perfect for standing in line reading :-)

BTW, I do not know anyone who is not voting in this election. Whoever you prefer, McCain or Obama, vote!

Monday, November 03, 2008

Cool: JetBrains' new RubyMine

I have started using JetBrains' IntelliJ 8.x milestone releases for experimenting with Scala and doing Java development so I was very pleased to see JetBrains' new Ruby development IDE. I'll write more about it after using it on a test project. This is a public preview with a release date for next year.

Saturday, November 01, 2008

Complaints about Ruby memory use: false?

Please feel free to add a comment if you disagree with this, but I just don't see problems with excessive memory use in long-running Ruby apps like Rails and Merb. I saw a blog entry this morning where someone was running a small web app with Passenger and Apache: about a dozen processes with a very large combined memory footprint.

I just checked two long-running deployments of mine (one Rails and one Merb) and the Rails application processes totaled about 80 MB VSIZE and the Merb combined processes were about 70MB VSIZE. In my setup, add in about 7.5 MB for nginx and 10 MB for memcached. Nginx and memcached are shared by all web apps, long-running and experiments, running on this particular server.

Have you seen excessive memory use in your deployments? If so, what are your setups? On the other hand, I would also like to hear about low-memory deployment schemes. I have not tried Phusion Passenger but I heard good things at MerbCamp 2008 about reduced memory use with Passenger.

Friday, October 31, 2008

My Merb DevX article is online

I really like developing with Rails but for some types of web applications Merb is a very good fit. The Merb framework is much smaller than Rails and you have a lot of choices for ORM, template engines, etc.

My DevX Merb article was just published this evening and provides a simple example: rewriting my RubyPlanet.net Ruby news filtering web application.

I find Merb to be complementary to Rails so if you like developing with Rails I think that you will like Merb also.

Wednesday, October 29, 2008

Charming and useful: Tim O'Reilly's conversation with Yossi Vardi

I live in a remote area (2 1/2 hours from the nearest large airport) so I am always on the lookout for great online conference videos to watch - avoid travel and expense when possible. So far, my favorite online video from the European Web2Expo is the conversation with Yossi Vardi, who was the founder of ITC and, as O'Reilly says, is "the godfather of Israeli venture capitalists."

Vardi's business model seems to be to choose people whom he likes and who have a passion for something, invest in them, and treat them fairly. I liked his comment that "business plans are a sub-genre of science fiction", meaning that although you need a long term goal and basic business ideas, you need to be adaptive in how you reach your goals. O'Reilly and Vardi also had good thoughts on the cultural stigma of failure and how that can affect entrepreneurship.

A question from the audience addressed when a web 2.0 company might want to transition from free to paid-for use. Their advice was good: if you can get millions of users then free is a good business model. As Yossi Vardi said, you then only need one paying customer: the person who buys your company. Tim O'Reilly pointed out that new businesses cannot count on having a huge number of customers so a good metric is how much real value a web application provides users. I also liked what they had to say about bootstrapping and using money efficiently. Good stuff!

Wednesday, October 22, 2008

Cool: OS X version of Mono 2.0 released

Why would I care? If you read my blog you know that I mostly enjoy developing in Ruby, Lisp, and Java. That said, I tend to get small consulting jobs using a wide range of programming languages and this is one thing that I like so much about OS X: good support for Python, Perl, PHP, Prolog, and yes, now C#. So we all have our favorite programming languages but computer science is mostly about algorithms, understanding operating systems, infrastructure software, etc.

I converted RubyPlanet.net to Ruby + Merb

As I mentioned in my last post, I am making the rewrite of my Java-based RubyPlanet.net web site in Ruby and Merb the subject of a DevX article. I just pushed the code to my server that I use for Ruby and Common Lisp web deployments. By default the site shows recent Ruby news but the more interesting way to use it is to filter the results (for example just show web blog entries containing "JRuby", etc.)

An interesting thing was that both the Java + JSP (which I did about 3 years ago) and the Ruby + Merb versions took about the same amount of time to implement and deploy (about 3 to 4 hours).

Sunday, October 19, 2008

Wow, Merb really is fast

I am re-writing a quick one or two evening hack from 3 years ago (RubyPlanet.net) in Ruby + Merb. I originally wrote this web site in Java - humorous since it is a Ruby blog feed aggregator :-)

I am also using this as the example program for a DevX article on Merb that I am working on. I just did a very quick first cut at the example program for the article (i.e., what soon will be the new RubyPlanet.net) and without doing any caching and hitting a database a lot, I was surprised to see that
ab -n 5 http://localhost:4000/
showed 44 page requests per second. My goal is to get 100 to 200 page requests handled per second (depending on the number of blog entries on the main page) using a single process on my MacBook, and that looks like it may be easy. Deployed on a very low cost VPS with a bunch of other applications, this web app may not be all that fast "out in the field" but I want that to be a limitation of the server, not the code.
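For readers who want to see what a requests-per-second figure like this measures, here is a minimal Python sketch of the same idea: time a fixed number of sequential page fetches and divide. It is not how ab is implemented; the function names `requests_per_second` and `fetch_page` are made up for the example, and the default URL simply reuses the local address from the command above.

```python
import time
import urllib.request


def requests_per_second(fetch, count):
    """Call fetch() count times in sequence and return completed calls per second."""
    if count <= 0:
        raise ValueError("count must be positive")
    start = time.perf_counter()
    for _ in range(count):
        fetch()
    elapsed = time.perf_counter() - start
    return count / elapsed


def fetch_page(url="http://localhost:4000/"):
    """Fetch one page and read the whole body, as a benchmark client would."""
    with urllib.request.urlopen(url) as response:
        return response.read()
```

With the web app running locally, `requests_per_second(fetch_page, 5)` would roughly mirror the five-request run above. A sample that small gives a noisy estimate, so a larger count is a better basis for comparing against a 100 to 200 requests per second goal.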

Tuesday, October 14, 2008

MerbCamp 2008 wrapup

I am back home after attending MerbCamp in San Diego last weekend. Merb is sort of like a micro-kernel architecture version of Rails: a small core with many plugins (and also complete "slice" mini-apps) that (hopefully) do not depend on each other. The idea is that you only add in what you need. This is probably not the time to try Merb for the first time: the developers are working to release version 1.0 RC1 (with version 1.0 to follow as quickly as possible). I am currently using Merb 0.9.9 and I am not going to update until I can do a "gem update merb" to move up to the 1.0 APIs. As announced at MerbCamp, the developers want to stabilize the APIs for version 1.0 and then continually work with minor 1.x releases for about one year, then release 2.0 that is likely to not be very backwards compatible with 1.0. Also, 1.x releases will be backwards compatible with 1.0 but not necessarily with other 1.y releases. I think that this is a good plan, and matches the way I have used Rails for the last three years: I tend to freeze individual projects against a specific version of Rails, and only do security updates. I plan on using Merb for a semantic web project, and I will probably just stick with 1.0 and not follow the API changes during the 1.x path towards 2.0.

Saturday, October 11, 2008

MerbCamp 2008

I am at MerbCamp - so far an enjoyable conference. Merb is a more modular and more efficient version of Rails. As you might expect, Merb does not yet have the easy 'out of the box' experience that Rails provides. That said, I will probably start to use Merb instead of Rails on some new projects because of performance (run time and much less memory required) reasons.

Monday, October 06, 2008

Swi-Prolog and the Semantic Web

A long time ago, my first useful experiments with RDF were based on (after trying other tools) Swi-Prolog's semantic web libraries. Since then, I have also been using other tools (mostly Sesame, some Jena, and some of Franz's commercial AllegroGraph product - which I am planning on writing a short 'applications' book on, BTW, after I finish my Java AI book).

I noticed (see linked PDF paper) this morning that the RDFizing and Interlinking the EuroStat Data Set Effort (riese) architecture (diagram) uses Swi-Prolog on the back end. Very cool. The riese web site itself is interesting: human readable web pages with embedded RDFa for semantic web software agents. (Make sure you view page source on your browser.)

Friday, October 03, 2008

Trying to find a single Java web application framework

While I like using Rails for many types of web applications I still look to the Java platform for applications requiring higher run time efficiency and to take advantage of deployment tools and environments.

The problem I am having is that I would very much like to settle on a single framework in order to reduce the effort of staying on top of too many tools and frameworks. In the past I have used JSP, my own custom JSP tag libraries, other people's JSP tag libraries, Struts, various ORMs, some work with GWT, some work with Spring, and some with Wicket. I used to use the full J2EE stack, but largely gave that up about 5 years ago.

I would like to be able to invest at most 100 hours of study time, and get back up to speed on a single framework, but I am not sure which to choose. GWT is very tempting but GWT does not cover all of the types of web applications and services that I am likely to be contracted to build. Seam looks good as an integrated framework, but I need to set aside a long weekend to give it a good preliminary evaluation.

Having many different frameworks and tools leads to a healthy software ecosystem, but for an individual developer it really is best to choose a small subset of tools to use for business and to set aside a relatively small amount of time (usually evenings and weekends) to keep up with new stuff. I mostly perform many small and medium size development projects for customers, and choosing a single framework like Rails (in the Ruby development space) really helps me stay focused and not waste "spin up" time. I would like to be in the same place in the Java development space.

Wednesday, October 01, 2008

Odd but fun: online avatars

About a year ago a customer asked me to do a quick demo avatar for their Rails-based web site. We decided to use a trial license of OddCast.com's avatar toolkit. The demo was a lot of fun and very simple to implement (a few hours) but my customer decided to not buy the service. Anyway, I just received an OddCast newsletter that contained this link - make sure that you try uploading a picture of a face to animate; also move the mouse cursor around: OddCast avatars act like the old "X eyes" program, looking at the cursor. I used to do development work for both Disney and Nintendo, and the lesson I took away was to make things fun - entertainment is an end in itself. While browser based entertainment has a hard time right now competing with rich clients like World of Warcraft, browser entertainment experiences will keep getting better with JIT compiled JavaScript, Flex, Silverlight, etc.

Thursday, September 25, 2008

Looking for reviewers for my book "Practical Artificial Intelligence Programming With Java"

I am within a month or so of completing the third edition of my book. This book will always be available as a free PDF from my web site and as an instant-print book.

I would very much appreciate technical feedback on the manuscript which can be downloaded from my open content page: www.markwatson.com/opencontent/

A direct download link is: www.markwatson.com/opencontent/JavaAI3rd.pdf

Thanks in advance!

Wednesday, September 24, 2008

Inspirational: David Heinemeier Hansson's keynote talk at RailsConf

I enjoyed his talk because we share many of the same values and ideas: work relatively few hours per week but make them count, invest in your own future, etc.

For much of my career I worked at a large corporation (SAIC) but usually just worked 32 hours per week (anything above 30 hours qualified me for full benefits). In exchange for giving up 20% of my pay by not working Mondays, I had time to write, spend more time with family and friends, and simply enjoy life more. David's company uses a 32 hour work week and he said that they have not lost productivity.

I also liked what he said about figuring out what things really make a difference to society and your career and working hard on those rather than spending too much effort on things that don't matter very much. Good advice, but not so easy to do. I live in a beautiful but remote area so my current career is basically competing with other telecommuters, many of whom live in countries with a much lower cost of living. Getting the most (important) work done in short time periods is crucial for my business so efficiency "is king" in my home office and as David points out, significantly more productive tools like Rails really do provide a good competitive advantage. I still like to build systems using Java server side technologies but only if I am certain that a customer can afford larger development costs and the project needs the extra runtime performance and scalability.

David also talked about life outside of technology and I am fairly good at taking time off for hiking, kayaking, cooking, movies, etc. That said, maintaining life outside of technology is a challenge for me because even though I can usually keep my consulting workload between 15 and 25 hours a week, I also enjoy writing (a lot!) and it is not so easy to limit my time. A little off topic, but one of the things that I enjoy most about writing is that the process always makes me understand things better. Often I will feel like I understand something just because I use it in my work, only to discover, in trying to explain it, that there are gaps in my own knowledge or that my level of understanding is not as deep as I thought until I spend extra effort.

Sunday, September 21, 2008

Very cool: PracTeX online magazine for LaTeX users

Something for people who love to use LaTeX (all 17 of us): PracTeX RSS feed.

I very much enjoy writing and using LaTeX makes the whole process even more fun. I just discovered PracTeX this morning (thank you Google Reader "suggestions" :-)

Saturday, September 20, 2008

Space4J: similar to Prevayler but takes advantage of Java 1.6 concurrent data access APIs

Prevayler is a great alternative to using a relational database if you need persistence and your application's data easily fits in memory. I wrote a large web app (similar to SharePoint) about 6 years ago using Prevayler and at one point it ran almost three years without restarting (until the server that it was running on needed maintenance). Prevayler is solid stuff.

Space4J works on the same concept but takes advantage of the Java 1.6 concurrency APIs. Space4J is a new project and I have only had time to look through the source code and examples, but it definitely looks like a possible substitute for Prevayler on future projects. I think that Space4J might perhaps take more advantage of generics (e.g., the interface Space and implementing classes could be "generic-ized" for type safety and elimination of unchecked exceptions).

Extracting text from documents

I am happy to see that the Apache POI project's new POI 3.5.1 beta 1 is supporting some OpenOffice.org document formats. I have been using POI for years to access the contents of Microsoft Office documents from Java applications. It is great to have one library that supports most document types that I need to work with. POI is also usable with JRuby or with Ruby using the POI-Ruby sub-project (requires compiling POI with gcj and then using SWIG). BTW, I have a Ruby library that I wrote about 4 years ago on my Open Source web page for working with OpenOffice.org, Word, and AbiWord documents if you want something simple and hackable.

Monday, September 15, 2008

Distributed robust system for provenance and trust in Semantic Web Applications and Tim Berners-Lee's new World Wide Web Foundation

With some reluctance, I am going to toss out what I think is a great business idea that is too large and resource intensive for me to pursue myself: develop the infrastructure and business models for a network graph (not a hierarchy) of "trust providers", similar to issuers of SSL certificates like Thawte, but for semantic web data.

First, I want to describe the problem to be solved: assuming the existence of RDF/RDFS/OWL data on the web, how do you know what is correct and what is faked for whatever nefarious reasons? What is the provenance of the data? Even human readers have a difficult time separating out real information from rumors, errors, and outright lies on the web.

Proposed solution: organizations "sign" data with a certificate for either a fee or other motivation. Using the current technology, RDF triples would be reified with one or more "trust tokens" (also implemented as RDF) from known signers who vouch for the provenance and accuracy of data. For now, this rating would have to be performed by human analysts, but could hopefully be done quickly and not too expensively with something like Amazon's Mechanical Turk system. I don't see this trust measurement as a Boolean trust or no-trust value - rather, a numeric range. Further: known signers can rate other signers. Signers would have a trust score. Accuracy and provenance of data could thus be assigned a trust score based on the trust ratings given by one or more signers and the trust score of the signers themselves. The problem is to make this process of assignment a small fraction of the cost of producing RDF/RDFS/OWL knowledge sources while adding significant extra value.
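To make the scoring idea concrete, here is one possible way to combine signer ratings with the signers' own trust scores, sketched in Python. The weighted average, the 0.0 to 1.0 range, and the names `TrustToken` and `trust_score` are all assumptions chosen for illustration; the proposal above does not fix a formula, and a real system would carry these values as RDF rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustToken:
    """One signer's rating of a piece of data; both values assumed to lie in [0.0, 1.0]."""
    signer_trust: float  # how much the signer itself is trusted
    rating: float        # how strongly the signer vouches for the data


def trust_score(tokens):
    """Average the ratings, weighting each one by the trust score of its signer."""
    tokens = list(tokens)
    total_weight = sum(t.signer_trust for t in tokens)
    if total_weight == 0:
        return 0.0  # no trusted signer has vouched for this data
    return sum(t.signer_trust * t.rating for t in tokens) / total_weight
```

With this weighting, a rating from a highly trusted signer moves the result much more than one from a barely trusted signer, and data with no trusted signers scores zero. Letting signers rate each other, as suggested above, would mean computing `signer_trust` itself from other tokens, which this sketch leaves out.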

There is a lot of literature; try searching for "web of trust semantic web" and "provenance semantic web". When I read about Tim Berners-Lee's new World Wide Web Foundation this morning I started to hope that they might develop some open and free infrastructure software to support trust annotation of data. The high economic cost of quality trust-rated RDF/RDFS/OWL knowledge sources is definitely a problem, but it is difficult to even imagine the possible range of financial and social benefits. Having standard open source software to manage trust would help reduce costs for providing trust and provenance data through a network of cooperating trust providers.

Sunday, September 14, 2008

I'll be at MerbCamp in San Diego October 11-12

I usually use Rails but Merb is also a good alternative (more flexible, thread safe, and probably about twice as fast). Anyway, if you are a Ruby enthusiast who reads my blog and will be attending MerbCamp, let me know so we can meet up.

Rails, Trails, Lift, and Seaside

I am fairly much "in like" with Rails: I have been using it for personal and customer projects for almost 3 years. If Ruby had good runtime performance, I would be happy with Ruby and Rails for most of my development. Because Ruby is such a terse language, it is very easy to read and understand the code and (few) configuration files that Rails generates for you and it is easy to write custom models, controllers, and views - mostly because Ruby is such a fun language to work with.

I just took another good look at Trails this morning, and for building CRUD web applications it is starting to look very good because of the great runtime performance of both Java and the Tomcat/Jetty/Hibernate, etc. software stack. Unfortunately even with annotations for POJOs that make dealing with persistence much easier, Java is not a concise language and I find it less fun to browse generated code and customize generated applications. Not the fault of Trails: Java is a language that is optimized for very large projects rather than agile development of small or medium size projects. This is a "right tool for the job" issue.

Lift is written in the Scala language (runs on the JVM with good Java integration) and largely because Scala is more terse than Java, I find the generated code and any customizations to be easier to read, understand, and write. Scala lacks some flexibility of dynamic JVM languages like JRuby and Groovy, but the runtime performance of Scala is excellent. Lift's built-in ORM persistence is modeled after Rails' ActiveRecord. The fact that Scala natively supports embedded XML makes it interesting for building web applications. As a language Scala looks very good but I have not had time to climb very high up the Scala learning curve.

Both Trails and Lift use Maven and are very easy to install and experiment with: check out the Trails quickstart and the Lift quickstart. Well worth experimenting with.

Seaside runs in the open source Squeak Smalltalk environment (free but performance is almost as bad as Ruby) or the VisualWorks Smalltalk environment (fairly inexpensive commercial licensing and good runtime performance). I have not tried Seaside in other supported Smalltalk systems. When I use Squeak (not too often) it is usually to experiment with some old NLP code that I wrote: a good environment for trying out new ideas. I have experimented with Seaside, and it is easy to build small web sites with and also easy to deploy to Linux servers. Definitely a good option if you already know Smalltalk and you want to write very interactive web applications.

Friday, September 12, 2008

Very useful book: "LaTeX Graphics Companion, The (2nd Edition) (Tools and Techniques for Computer Typesetting)"

The authors have done a great job at creating a virtual encyclopedia that documents packages for generating graphics. (Amazon link)

I am using LaTeX for most of my writing projects and this book provided me with a fast start for generating UML diagrams, 2D and 3D graphics, formulas, Chess/Go/Backgammon boards and move lists, a very wide range of engineering diagrams, music scores, etc. This is a large book (almost 1000 pages) but the layout and well organized examples from the book (which are easy to try out) make the whole book feel accessible and lots of fun to work with.

Saturday, September 06, 2008

Java arrays and primitive types should (perhaps) be deprecated

I have been brushing up on my Java skills this year. For 2 years I did mostly Ruby and Common Lisp development. I think that Lisp is a great research language but not so great for deployments and long term maintainability. Ruby is a great prototyping language and for small web portal projects Rails is my favorite tool. However, I keep coming back to Java because of the tools/libraries and the robust deployment software options.

So, I have carefully read through both Effective Java (2nd edition) and Java Generics recently to brush up on my Java skills. As a result, I completely refactored one of my medium size projects to use generics and collection classes exclusively - no arrays. Since arrays must contain reifiable types they play poorly with generics.

There are some obvious cases where not using primitive types leads to excessive object creation and boxing/unboxing. That said, I expect Java compilers, Hotspot, and the JVM in general to keep getting better and this may be a non-issue in the future.

Tuesday, August 26, 2008

Just out: my DevX article on Semantic Web, Sesame with Java and JRuby

I just noticed that my latest article on the Semantic Web has just been posted to DevX. This is a short article, dealing mostly with how to use the Sesame library with Java and JRuby, with some background material on RDF/RDFS (using N-triple and N3). One interesting thing that I do not really cover in the article, but include code for in the ZIP file for the article, is that I produced the sample RDF data using Open Calais. Enjoy :-)

Tuesday, August 19, 2008

Wanted: authors for my 'virtual publishing' business

My wife Carol and I are starting a virtual publishing business. We are seeking a few authors to work with.

Carol is a world class editor and I am going to help authors with book content planning, technical editing, and marketing. We are aiming for the niche book market. I have noticed that the large book chains now have very little shelf space devoted to computer books. With fewer books sold through the major book store chains, instant on demand printing is looking more attractive since print on demand books can still be purchased from Amazon and directly from the printer. Additionally, traditional publishers must consider the size of market for any proposed book project to offset their high overhead per book while the cost is relatively constant for instant printing.

We are planning on flipping the royalty split differently than traditional publishers: authors will get most of the profit from each book sold. The profit from on demand printed books will probably be around $10 to $15 per book. Compare this to a few dollars per book that publishers usually pay in royalties.

I believe that both authors and my wife and I can make sufficient profit from niche market books to make this profitable. I am about 1/3 done with my first book that we will publish ("Practical Artificial Intelligence Programming in Java, third edition") and I have done some work on writing example programs for a work in progress ("Practical Semantic Web Programming in Java").

My advice to people wanting to write a book: if the topic is more "mass market", then I would recommend going with a traditional publisher. I have had 14 books published and with only one exception, every publisher that I have worked with has been great. However, many of us have a real passion for specific subjects that are more niche or small market: here I believe it is better to write what you have real passion for, and hope that the higher profit per book sold makes print on demand work financially.

Wednesday, August 13, 2008

New version of my KBtextmaster NLP library is available

I just released a new version of my KBtextmaster Natural Language Processing (NLP) Java library. Free for non-commercial use, with a small fee for commercial use. Should also work fine with JRuby :-)

Friday, August 08, 2008

More use of Eclipse and Mylyn: new book project using LaTeX

Except for Ruby and Rails (where I use a combination of NetBeans and TextMate) I am switching over just about all of my projects to Eclipse and Mylyn because of Mylyn's task management functionality: if you have not given Mylyn a try, please do :-)

I am working on the 3rd edition of my Java AI book and I set up Eclipse with TeXlipse today. Now, I have always liked using TeXShop on my Mac, and I still really like TeXShop, but the ability to have my book code examples and my LaTeX files in one working environment with Mylyn task management makes it well worth the effort to switch setups.

Saturday, August 02, 2008

Eclipse Mylyn: a show-changer?

I have been using NetBeans this year for most Java, Ruby, and Ruby on Rails development. Except for Ruby on Rails development, I think that I may switch back to Eclipse because of the Mylyn sub-project. I blogged about Mylyn a year and a half ago (when it was called Mylar) and I recently gave the latest versions of Eclipse and Mylyn another try with Java, plain Ruby, and plain Python projects.

Mylyn treats tasks as first class objects that aggregate developer experience with specific source files, emails, web sites visited, bug tracking systems, etc. In the simplest use, you create tasks and let Eclipse know which task that you are working on. Mylyn remembers which files you edit, etc. and only shows you working material that you use most for the current task. You can easily switch back to a "show me everything" mode. Switching tasks immediately changes Eclipse to show you only the working materials for the newly selected task.

As a developer it feels very comfortable to see just what you need as you switch tasks. I like to turn off email and the telephone during development to reduce distractions. Having an IDE hide everything but the working material for the task at hand also helps to stay focused. Cool stuff!

Tuesday, July 29, 2008

New cuil.com search site and other alternative search engines

As someone who has spent a lot of my own time experimenting with Nutch, I have long desired to create my own "niche" search site that indexed only technology sites with clustered result categories. So, I am a little envious of ex-Google employees and their family/friends who (reportedly) had $30 million of venture capital to start the cuil.com search site. Although a lot of the images don't seem to match search results, cuil.com looks pretty good - I especially like the "Explore by Category" tab that works similarly to another favorite search site, clusty.com. "Explore by Category" is both cool and useful!

It is interesting that new search engines can attract a lot of venture capital: with Google, Microsoft, and Yahoo all making very large investments, it must make investors nervous - but with the upside of large financial gains if any search startup gets a good fraction of the market.

Monday, July 28, 2008

Open data sources like Metaweb, Wikipedia, and SEC Edgar database

I just read a few-month-old blog post by Toby Segaran (author of the very useful book Programming Collective Intelligence) on link information for shared board of directors members between large corporations. Many years ago I did something similar from combined CIA Factbook and SEC Edgar data and I still have a SQL dump file on my Open Source web page.

Since Toby works at Metaweb he fetched the corporate director link data from Metaweb (Freebase). Freebase sets a high standard for the ease of finding and extracting information. Other sources like Wikipedia (via custom web scraping or fetching their entire database) or the RDF extraction of Wikipedia (DBpedia) are not as simple to use, but still useful.

I have a long history of organizing and cataloging information, starting in the 1980s at SAIC. Back in the pre-gopher days, I used to maintain lists (as plain text files) of where to find useful tools and information on FTP sites on the Internet and when someone would ask me where to find something then I would grep my own lists. Things have improved a lot since then :-)

I just finished the rough draft for an article on the Semantic Web this morning. Although standards like RDF/RDFS/OWL/SPARQL are very useful, I expect the Semantic Web to also have a strong ad hoc component. However, ad hoc information sources may have standard interfaces built for them (e.g., SPARQL endpoints).

Thursday, July 24, 2008

Dynamic language 'goodness': comparing JRuby and Java Semantic Web example programs

Although there are several Semantic Web libraries or frameworks that I like to use, I had to choose just one for a DevX article that I am finishing up. I chose to use Sesame. After covering what I think are some "big wins" of using RDF/RDFS/OWL (for some applications) I present some example programs that I hope that readers have lots of fun with. The "wrapper" library that I wrote for Sesame works fine for both Java (which Sesame is written in) and JRuby. I must say that for experimenting with Sesame, JRuby is a lot nicer because the example programs are much shorter and with Ruby duck typing it is easier to write callback handlers, etc. for my wrapper library. Being able to work interactively in a JRuby jirb shell is also a big win for experimenting with code, different SPARQL queries, etc.

Thursday, July 17, 2008

Programming for small devices

Several years ago I did a few projects for the "Java cell phone" (J2ME) platform, and had a lot of fun with that.

After recently setting up NetBeans with the Java ME CDC tools and Eclipse with the most recent Android platform tools, late last night and early this morning I installed Apple's latest developer's tools that include the iPhone SDK and Dashcode. Since I very much like my Nokia N800, I am also interested in medium resolution devices (the N800 has a good 800x480 screen).

My interest is in writing web portals that support both browsers and small devices. One option is just creating special CSS for different web browser screen sizes, and another option is rendering page view data as XML or JSON and letting rich clients provide the display and handling of forms, etc. (an option I used several years ago on a customer project).
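The second option can be sketched in a few lines of Ruby. This is only an illustration under my own assumptions: the page data fields, the render method, and the :html and :json format names are hypothetical and do not come from any particular framework. The point is that the same page view data is built once and then serialized differently depending on the client.

```ruby
require 'json'
require 'erb'

# Hypothetical page view data; the field names are illustrative only.
def page_view_data
  { title: "Recent posts",
    items: [{ id: 1, headline: "Programming for small devices" }],
    form:  { action: "/comments", fields: ["name", "comment"] } }
end

# Browsers get HTML (to be styled with screen-size-specific CSS);
# rich clients get JSON and handle display and forms themselves.
def render(data, format)
  case format
  when :json
    JSON.generate(data)
  when :html
    items = data[:items].map { |i| "<li>#{ERB::Util.html_escape(i[:headline])}</li>" }.join
    "<h1>#{ERB::Util.html_escape(data[:title])}</h1><ul>#{items}</ul>"
  else
    raise ArgumentError, "unsupported format: #{format}"
  end
end

puts render(page_view_data, :json)
puts render(page_view_data, :html)
```

Keeping the data separate from its rendering means a new device type only needs a new client or a new output format, not changes to the portal logic.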

Ideally, I would like to be able to support a wide variety of small devices without a very large investment in my time getting (back) up to speed. I have just a little experience with Objective-C and Cocoa so for the iPhone, just using Dashcode looks like a good option (for me).

Sunday, July 13, 2008

I am evaluating Google's Protocol Buffers for my knowledgebooks.com KB_bundle product

I am working on a new Java version of my knowledgebooks.com KB_bundle product (see home page for an overview) that implements an all in one toolbox for Natural Language Processing (NLP), entity extraction from text, text summarizing, text clustering, knowledge extraction to RDF/RDFS, support for document management (file management, index/search), and SPARQL queries of either embedded or external RDF data stores. KB_bundle will be free for non-commercial use and evaluation, and available for a fee for commercial use.

While I designed KB_bundle as an embedded Java library, I have always planned for both RESTful and SOAP web service support. I have been looking at Google's Protocol Buffer documentation and examples this weekend and I think that I will also supply a third wrapper for Protocol Buffer RPC support.

Earlier this year, a project that I was working on had performance problems due to the overhead of serializing data to XML and then parsing it in a REST based system. The problem was that when the project started, relatively little data was transferred between back end processes and a front end Rails application so the overhead of using XML was OK. As the project requirements changed, we passed much more data encoded in XML. I am looking at Protocol Buffer in general as a way to avoid performance problems in the future.

Saturday, July 12, 2008

OpenDS 1.0 LDAPv3 server

OpenDS 1.0 LDAP server has just been released and was easy to install, configure, and run. One thing that I especially like is that it is set up by default to run nicely in a development environment (including test data to play with) with directions for reconfiguring for production use with replication.

I used the JNLP setup file, hitting this link and accepted the standard install options (installed in my home directory in ~/OpenDS). There are test command line clients to test the installation and configuration; for example:
markw$ bin/ldapsearch --hostname localhost --port 1389 --baseDN "dc=example,dc=com" --searchScope base "(objectClass=*)"
dn: dc=example,dc=com
objectClass: domain
objectClass: top
dc: example
and then you can use JNDI APIs for Java client LDAP enabled applications. I think that Sun is going to offer good support for Glassfish + OpenDS (if they don't already). BTW, I have many years of good experiences developing on the Tomcat platform (and a little less use of JBoss) but I am becoming more enthusiastic about Glassfish, integration with NetBeans, etc. The days of consultants developing their own private sets of infrastructure tools are just about over: for me, I look to either a subset of J2EE or Ruby on Rails to save development time on projects. Except for developing my own tools that are very application domain specific (usually AI, text and data mining, NLP, etc.), I prefer spending time studying and using standard frameworks, plugins, and components.

Wednesday, June 25, 2008

A trip down memory lane: Pascal development

In the late 1970s, I used UCSD Pascal to develop what was the world's first commercial Go playing program ("Honinbo Warrior"). A nice language and tools. I just received a small grant to convert one of my LGPL Open Source projects (FastTag: a part of speech tagger for both English and for English + medical terms) from Java to Pascal.

Monday, June 23, 2008

Great tools: the Java advantage

Last February, I wrote an AI blog article on a very simple Ruby library to parse some relationship values returned by the Open Calais web service. I wanted the same functionality today in a Java program. I was surprised that the number of lines of code required was the same - strange since Ruby is a much more concise language than Java.

The trick was that I was able to point my NetBeans IDE at the WSDL file for the Open Calais web service, and the work of calling the web service was essentially done. The Java code for parsing what I wanted out of the returned result was a little longer than the Ruby code, so the two programs ended up being the same length.

Another example of a time saver for Java vs. dynamic languages like Ruby, Python, and Lisp: using a Java IDE (like NetBeans) to generate unit test stubs from any application class - not so good for test first development, but I don't do that: I like to make a first cut at implementing classes, then add unit tests. While the Domain Specific Language (DSL) Ruby on Rails does generate test stubs, they are not as complete or useful as what NetBeans generates for me.

Lastly, because Java is a statically typed language, Java IDEs still do a better job at refactoring, code completion, etc. than environments for dynamic languages although Emacs based Lisp development tools and the Ruby NetBeans tools are very good.

Friday, June 20, 2008

Good news: Microsoft to support ODF as default Word file format

I am going to wait a while, and if their support is good I will probably upgrade to the newest version of Mac Word.

I have written 2 of my last 3 books using OpenOffice.org, not Word, but Word is a slick product and if I feel very comfortable that my Word ODF documents are readable by all of the word processors that already support ODF, then Microsoft gets another sale - this all depends on their sticking to the ODF standard and not messing with it.

I recommended in a blog a few years ago that Microsoft both support ODF and stop releasing new version names of Windows and instead sell a yearly subscription for updates - let's see if they take my advice on that :-)

Thursday, June 12, 2008

PLT Scheme v4.0 is released

The PLT Scheme system has always been impressive and now it looks even better. It supports R6RS and includes many improvements to the documentation. If you have used PLT Scheme (DrScheme) before, it is a good idea to at least read through the Welcome to PLT Scheme Introduction to pick up the changes.

Sunday, June 01, 2008

Ruby on Rails 2.1 and the MagLev Ruby virtual machine

First, 2.1 looks like a great update: I have not seen any compatibility problems with 2.0 that could not be instantly fixed. The named_scope (has_finder) changes look good for organizing database queries, and I especially like the way they nest. I have not tried using the new gems dependency functionality yet but this looks very useful when deploying applications to fresh servers.

The news about the large performance boost using MagLev looks interesting, but I will reserve my enthusiasm until the project is further along and I can try running it myself. I find myself reverting from coding in Ruby back to Common Lisp or Java to get around performance issues, so a much faster Ruby runtime environment sounds good. Ruby is such a slow language that there is plenty of room for improvement.

Friday, May 23, 2008

"Indiana Jones and the Kingdom of the Crystal Skull": Cate Blanchett steals the show

Cate Blanchett plays the villainess, with lots of humor. This movie is well worth seeing. Blanchett is an amazing actress who has played many very different types of roles in her career.

Wednesday, May 21, 2008

Packaging Java libraries to be "IDE friendly"

I have been working on packaging some of my (mostly) research code into four libraries (a picture), three that depend on a fourth.

My concern is mostly "Java IDE kung-fu": users of these libraries (mostly just me :-) only want the highest level APIs to show in popup completion lists, with the entire set of implementation classes remaining invisible. The solution is easy: a public API class with implementation classes in the same package with package-only (i.e., declared with no public, protected, or private modifier) visibility.

Sunday, May 18, 2008

Scala and the Lift web application framework

I have been playing with Scala for a while - playing is the correct word to use since I am waiting to see how popular the language becomes. I think that Scala will possibly end up being 'the better Java' for the JVM, but for my business I prefer not learning and using another language that is not mainstream (my almost 25 years of using Lisp professionally has sometimes been a hassle because of the unavailability of other skilled Lisp developers and a smaller ecosystem, and I don't want to devote a lot of time to mastering another language that may end up being "on the fringe").

That said, Scala is a very nice language that has two non-language things going for it: very efficient runtime performance with OK memory use and that it runs on the JVM. Scala looks to be a good language for AI development and its interactive console adds some of the advantages of interactive bottom up development - a style I like to use when working in Lisp, Ruby, or Python.

Until this morning I have only read about the Scala Lift web framework, but after reading Vivek Pandey's blog about running Lift I gave it a try this morning. The maven setup and default web application construction was all very smooth, and the generated code was interesting to read. I also like the way Scala unit tests work and the debug modes supporting both an interactive Scala console and running with an embedded jetty web server. Everything works very well together and the entire system has a polished feel to it.

Saturday, May 17, 2008

Book review: "Semantic Web for the Working Ontologist"

Dean Allemang and Jim Hendler's book provides a good overview of data modeling for the Semantic Web. Amazon purchase link: Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. As someone who has invested a lot of time with both open source tools (Jena, Redland, Sesame, OwlAPI, Protege, and SWI-Prolog's Semantic Web libraries) and a commercial product (Franz AllegroGraph) it is refreshing to read a good book that abstracts away details like specific tools and RDF XML serialization and covers concepts and modeling how-to issues. I found it useful to enjoy this book at a high level while stopping occasionally to pause and experiment at the low level with OwlAPI, Redland, Protege, Sesame, and AllegroGraph. BTW, I wish that someone had told me years ago to never view XML serialization of RDF :-) The authors' choice of showing XML serialization one time and then using N3 is very good.

There are a few tiny annoyances with this book, the primary one being small errors in the text that should have been caught in technical review. These do not however detract at all from the usefulness of the book - it is just too bad that such a very well thought out book has easily fixed mistakes.

For me one of the potential uses of this book is to loan it to or recommend it to customers who might want or need to use Semantic Web technology: I make my living as a consultant and it is important to have well informed customers, and this book will provide a good understanding and rationale for technically inclined customers, especially people with strong domain knowledge who want to (and can) directly participate in modeling efforts.

Saturday, May 10, 2008

Programming: sometimes simpler is better

I recently chose a development environment for a spare time project: I am re-working some of my old algorithms and miscellaneous code (in several different programming languages) for extracting semantic information from plain text after reading through the excellent book The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. I have been working on information extraction for about 20 years (very much part time), and although most of the material in this book was familiar I found the book to be an excellent reference and a good summary of the state of the art in information extraction techniques. I have blogged before about the excellent Reuters/ClearForest system - the authors were principals at ClearForest.

I chose for this project the combination of Gambit-C Scheme, Emacs, and a few customizations of the Gambit-C Emacs code. For "mostly thinking" projects like my information extraction library, I like simplicity: a simple clean programming language and an environment that provides good editing and debugging support but otherwise stays out of my way. Professionally, I do a lot of work with Common Lisp (either Franz + ELI + Emacs, or SBCL + Slime + Emacs) but since I am basically just experimenting with algorithms I felt like using something light weight. I thought about using Ruby (with either the excellent NetBeans support or TextMate) but I like the ease with which Gambit-C Scheme can be used to build native applications or libraries (compiles to intermediate C) and I will probably want to share my information extraction program (perhaps a free and commercial version) but not release the source code. The performance of compiled Gambit-C code is also very good.

Tuesday, May 06, 2008

Saturday, April 19, 2008

Using JSON for communicating between Ruby and Lisp or Scheme

JSON is very much lighter weight than XML and is meeting a need for easily calling some Scheme code from a Ruby program. The Scheme code I am using is old: I wrote it years ago to extract entities from plain text. Since the Scheme program starts very quickly, I am able to simply start a Scheme interpreter as a separate process using back quotes to capture any output to stdout in a Ruby string variable:
require 'json'
s = `gsi extraction.scm -e '(get-proper-names "President Bush moved to South America.")'`
# parse JSON text...
Here I am using Gambit-C Scheme interpretively. It is very easy to write any structured data in JSON format in the Scheme (or Lisp) code and parse it on the Ruby side. It is also easy to speed things up and compile my program:
gsc -c extraction.scm
gsc -link extraction.c
gcc -o extraction extraction.c extraction_.c -lgambc -I/Library/Gambit-C/current/include/ -L/Library/Gambit-C/current/lib/
which both makes the program faster and reduces the startup time down to a few milliseconds. Since the extraction program starts very quickly, this solution is good enough. However, I have been looking at an alternative idea in case I ever need to use a Lisp or Scheme program that has a long startup time. I have not done this yet, but here are my ideas for Gambit-C Scheme:
  • Write a C wrapper for the Scheme (a good reference) to create a Ruby callable class whose initialization method performs the long running initialization for the Scheme library
  • Wrap the Scheme side library according to this reference
This looks easy enough, but getting memory allocation and deallocation correct might be difficult. For now, I am calling my old Scheme code from Ruby the easy way :-)
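For the "easy way", the Ruby side of the round trip might look like the following minimal sketch. It rests on an assumption that the text above does not confirm: that the Scheme program prints a JSON array of strings to stdout. The method name parse_proper_names and the sample string are made up for illustration; in real use the raw text would be the string captured by the back-quoted gsi call shown earlier.

```ruby
require 'json'

# Assumption: the Scheme program writes a JSON array of strings to stdout.
# Returns an array of names, or [] when the output is empty, is not valid
# JSON, or is valid JSON but not an array.
def parse_proper_names(raw_output)
  text = raw_output.to_s.strip
  return [] if text.empty?
  parsed = JSON.parse(text)
  parsed.is_a?(Array) ? parsed.map(&:to_s) : []
rescue JSON::ParserError
  []
end

# Made-up stand-in for the text captured from the Scheme process.
sample = '["President Bush", "South America"]'
p parse_proper_names(sample)
```

Returning an empty array on bad input keeps the calling code simple when the child process fails or prints nothing, at the cost of hiding the reason for the failure.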

Tuesday, April 08, 2008

New Google Web App Engine

This looks very good. You get access to Google's infrastructure: GFS, Bigtable, email, etc. The IDE is cool: it only requires Python 2.5 to run.

I was too late to get one of the first 10,000 developer invites, but I am hopeful that before too long my turn will come :-)

The Google App Engine supports many Python web development frameworks like Django, CherryPy, Pylons, and web.py. I have only used CherryPy to any great extent, but Django has a good reputation, and I once worked through its tutorial and it looked good (also, very good documentation). Google also supplies their own Python web application framework, webapp.

Sunday, April 06, 2008

National Public Radio: "Our Confusing Economy, Explained" - excellent!

www.npr.org
Perplexed by the U.S. economy? You're not alone. Law professor Michael Greenberger joins Fresh Air to explain the sub-prime mortgage crisis, credit defaults, the shaky future of other types of loans and what we can expect from the U.S. financial markets.

Greenberger is a professor at the University of Maryland School of Law and the director of the University's Center for Health and Homeland Security.
This is a long listen (39 minutes), but it is the sort of thing that, if every voting American listened to it and every politician in Washington knew that everyone had listened, might help avoid a lot of the future corruption in Washington. Professor Greenberger also gives very solid advice on what we must do as a country to avoid having Asian countries "eat our lunch": emphasize science and technology and build companies that produce real products - and don't idolize people and companies who make money speculating instead of producing. I liked that he seemed to think our problems are fixable, but only with positive action. Listening to this interview, and then emailing your 2 senators and your representative in Congress regarding this material would be a good start.

Friday, April 04, 2008

Blu-ray and Java vs. DVDs

As a consumer I have not bought a Blu-ray player yet - I think that it will still be some time before the technology settles. That said, I am intrigued by the BD-J Java programming environment for Blu-ray, and it is something that I would like to look at. About 8 years ago I wrote a set top box emulation in Java for Disney, perhaps similar to what people may do in BD-J with custom directory viewers, ways to annotate material, etc. Who knows what smart developers will do with the platform? Not me.

I am turned off by the extremely high cost of writable Blu-ray media - yuck! The prices will come down, but by how much?

Massive personal storage does not (yet) excite me. We have hidef satellite TV with a hidef DVR, but we don't need to store too many hours of video, and there is nothing so far that we want to keep around for more than a week. My brother owns 1500+ Blu-ray discs, HD DVDs, and DVDs and he takes great pleasure in owning entertainment media. My wife and I prefer renting (Netflix) and time delayed DVR viewing. It is easy enough to re-order a Netflix movie in the future if I want to watch something again.

I still find the storage on a DVD-R to be great for most of my uses, especially since I often compress backup data before writing. If you deal with a lot of text, like I do, it is amazing what fits onto a DVD-R.

Thursday, March 27, 2008

Book review: "Building SOA-Based Composite Applications Using NetBeans IDE 6"

Authors David Salter and Frank Jennings have written a very targeted book specifically for enterprise developers wanting to use NetBeans with the SOA plugins.

Service Oriented Architecture (SOA) can be loosely described as implementing business processes by combining web service calls to available services. Just as you would probably not want to perform database operations without transactions, when you combine web service calls for specific business processes you want to wrap them in transactions and using BPEL (Business Process Execution Language) modules makes it easier to express logic and error handling. The BPEL designer plugin lets you work with a graphical interface and avoid tedious hand editing of XML files.

Just as using the full J2EE stack adds a lot of complexity, with attendant benefits, to large scale Java server side development, SOA also adds a lot of "baggage" to building systems using web services, but as systems become larger and more complicated that structure can really pay off.

There are no magic bullets in software design and development that remove the requirement for hard work in analyzing system requirements and iteratively working on design and staged prototypes: for SOA applications you need to understand your business processes, what events can occur, and what error conditions need to be handled. If you don't understand these basics, no framework is likely to save a project. If you do understand these basics, and you want to use Java SOA frameworks and tools, then this book is a good guide to getting started. I do have a few minor complaints about this book: figures are not numbered (perhaps not such a bad idea, I am just used to seeing figures identified) and there is sometimes too much detail given for some developers: the reader is carefully led through setup, using the IDE plugins specific to SOA (BPEL builder, WSDL and XML Schema editors, etc.) and sample applications with many screen shots. Even with these small complaints, this highly targeted book should be very useful to Java developers looking to develop SOA applications. It would be best if readers already had at least a little knowledge of SOA, BPEL, etc. since this book mostly covers using the tools in NetBeans to facilitate building SOA applications.

Wednesday, March 26, 2008

Minimum number of languages that a developer should master?

A funny moment: recently I heard a senior computer scientist blast Java for very poor performance (server side Java performance is very good - the language has other problems in my opinion) while extolling the virtues of Common Lisp as, if I understood him correctly, the one true 'do everything' language. I enjoy Lisp (I wrote 2 Springer-Verlag Lisp books, many years ago), but it is a marginal language if you count developers, available frameworks, and large deployed systems (with a few notable exceptions). Lisp is a language that I can recommend to some people as a second or third language, but I would have a difficult time recommending it as any developer's primary language for writing production code. Lisp is great for research.

In any case, whenever I hear someone blasting Java, I know that they likely don't have real experience developing large scale Java server side applications and don't understand the benefits of the server JVM and a huge collection of good tools and frameworks.

Whenever I hear someone knocking scripting languages like Ruby, Python, or Perl I always think what a shame it is that they don't save themselves a lot of development time using an agile scripting language - when appropriate.

Then there are times when it is "just right" that development be slow and painful, and C++ is used for runtime speed and memory efficiency :-)

Also if you need a carpenter to do some maintenance on your home: don't hire a carpenter who only uses a saw.

Advantages of open source: quickly working around problems

Yesterday was fun. We use SBCL (Common Lisp) on a project I work on, and my customer hit a limit of SBCL not using all the memory on one of their new servers. A quick email to an SBCL developer, and we were patched and running very quickly (thanks Nikodemus!)

A decision to use any open source project is usually based on how "vibrant" and active the community surrounding the project is. It is easy to gauge developer activity from the release history, and the size of the user base from newsgroups and wikis.

I definitely have my favorite commercial developers (plugs to Franz Lisp, Jetbrains/IntelliJ, OmniGroup/OmniGraffle, and TextMate) and my decision to purchase and use commercial products is based strongly on support, patches, and new versions.

Sunday, March 16, 2008

C++, taking a second look

I earned my living between 1988 and 1997 developing and being a mentor using C++. I usually argue against using C++, a language that offers superb runtime performance (speed and memory utilization) with the penalty of greatly increased development costs.

For most applications, it is less expensive to optimize for developer productivity. In my decade of using C++, with hindsight, the only projects that needed the performance of C++ were a commercial product for Windows 1.03, some VR work for SAIC and Disney, work on two Nintendo video games, a real time expert system for PacBell, and a PC racing game. All other C++ projects could have been done with more economy and effectiveness using other languages.

I used to err on the side of staying current with too many programming languages, although in the last few years I have invested heavy time in only three: Ruby, Java, and Common Lisp. Even though I find C++ development to be less fun, slow, and sometimes even a little painful, I am considering replacing Common Lisp with C++ in my small set of 3 languages that I am willing to invest heavily in. This Benchmark game of C++ vs. SBCL Lisp is one reason for this (possible) decision. The other reason is that I find Ruby to also be very good for quick prototyping and fast agile development, and even though the runtime performance of Ruby is very poor, it seems like the combination of Ruby + Java + C++ covers a wider range of application development than the combination of Ruby + Java + Lisp. In other words: Ruby gives me a lot of what Lisp does and I feel like I need one agile scripting language in my programming language toolbox.

Sure, "the best tool for the job", but I like to also consider the costs of not totally mastering the tools that I use in my work. This is why I have given up (for serious work) some great programming languages like Python, Smalltalk, C#, and Prolog.

Saturday, March 15, 2008

Excellent video: Dan Ingalls demos Lively Kernel at Google

I wrote about Sun's Lively Kernel web browser based programming environment last month and watched this excellent Lively Kernel talk by Dan Ingalls this morning. I especially liked one point that he made: with the right architectural decisions systems can be fairly simple and still be general.

The demonstration on the video is both impressive and serves as documentation if you want to experiment with the Lively Kernel yourself. As it is, the Lively Kernel is a good prototype but if the performance is improved a bit and the programming environment and IDE (which is like Squeak Smalltalk, only for Javascript) continue to improve, then Lively Kernel is on my (personal) radar for something that I will use in the future.

I remember getting a Xerox 1108 Lisp Machine in 1982 and I also wanted the optional Smalltalk environment which was not in the budget. However, a sales engineer at Xerox did let me use Smalltalk for a month on a trial basis. It is amusing to be able to get a Smalltalk-like programming environment like Lively Kernel for free that is hosted in a web browser. Something about Moore's Law and also about free software projects...

Of course, by far the best free Smalltalk experience is Squeak but Lively Kernel is still very cool, if only for its possible future use of quickly writing web interfaces using existing web services, etc.

Wednesday, March 12, 2008

My Spanish4.us web portal

My Spanish4.us web portal is a simple 1 page fits all study center for a Spanish class that my wife and I are taking.

This is something that I love about Ruby on Rails: I had a simple idea and in less than 2 hours I had prototyped the application and deployed it to one of my leased servers. I will add more phrases to the popup translation tool and more general information over the next few weeks.

Friday, March 07, 2008

Ruby becoming a first class language on Mac OS X

Good news: Apple is supporting Ruby 1.9 using the Objective C runtime. This is mainly to support building native Mac applications but it will eventually be good for other more general purpose Ruby development on Macs. Very cool since both Objective C and Ruby have their roots in Smalltalk.

Friday, February 29, 2008

Importance of understanding business and sales issues; dynamic languages

I tend to view work from a consultant's point of view, but this also probably applies to you if you work for a company: while staying on top of a few technologies is obviously important (I find that the combination of Ruby, Ruby on Rails, Java, and Common Lisp covers most of what I need to get just about any job done using tools that are at least reasonably appropriate), success in any information processing career requires more than just technical savvy:

What is much more difficult, but in some ways more fun and rewarding, is the effort to learn as much as possible about the business and sales processes relevant to each project. Ultimately, most software is written and maintained to meet business and sales goals, so it pays to understand non-IT related issues as well as technical IT issues.

One reason that I like to use Ruby is that the language is so concise and terse that I can spend more time thinking about the larger issues of problems that I am trying to solve - the technical aspects of writing code are diminished.

Saturday, February 23, 2008

Ruby client code for accessing OpenCalais and Metaweb/Freebase web services

I wrote a Ruby API for accessing OpenCalais this morning. OpenCalais processes text and extracts semantic information.

On a slightly related subject: I have enjoyed experimenting with the Python Metaweb APIs in the last year, and I just wrote in my AI blog about Christopher Eppstein's new ActiveRecord like API for accessing structured data in Freebase.

Thursday, February 21, 2008

Heavy weight Javascript client applications vs. lighter weight AJAX

I experimented with Mjt last year: Mjt is a client side template system: Javascript is used to merge data from JSON web service calls with templates to generate HTML - all in the browser (except for data fetched from a server). Mjt looks solid and has been fairly widely used; an alternative client side framework is Sun's experimental (not for production use!) Lively Kernel project. If you have not played with Lively Kernel, give it at least a one minute try - it uses the Morphic GUI framework, so if you have used Squeak, it will seem familiar.

The big problem, as I see it, with client side Javascript frameworks is maintainability. I have worked with Javascript heavy web applications that other people originally wrote and they are definitely much more difficult to jump into, understand, and modify compared, for example, to AJAX heavy Rails applications or GWT web applications.

That said, there is something tidy about the idea of writing web applications in two intertwined but separate tasks:
  • Writing JSON web services and separately unit testing them
  • Interactively developing the client side with a framework like Mjt
I like to recognize technologies as early as possible that I might use in the future. Although I don't (yet) feel really comfortable working with frameworks like Mjt my gut feeling is that this is the future because it makes it easier to work with multiple languages and platforms for implementing web services and makes it easier to mix up data from multiple sources.
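
The merge step that a client side template system performs can be illustrated in a few lines. The sketch below is in Ruby rather than browser Javascript, and everything in it is an assumption for illustration: the {{key}} placeholder syntax is made up (it is not Mjt's template language), and a hard-coded JSON string stands in for a web service response.

```ruby
require 'json'

# Characters that must be escaped before inserting text into HTML.
HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape a value so it is safe to place inside HTML text.
def escape_html(value)
  value.to_s.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Replace each {{key}} placeholder with the escaped value from data.
# Hypothetical syntax: unknown keys render as an empty string.
def render(template, data)
  template.gsub(/\{\{(\w+)\}\}/) do
    key = Regexp.last_match(1)
    escape_html(data.fetch(key, ''))
  end
end

# Stand-in for the body of a JSON web service response.
json_response = '{"title": "Ruby & Rails", "count": 3}'
data = JSON.parse(json_response)

html = render('<h1>{{title}}</h1><p>{{count}} articles</p>', data)
puts html
```

Because the template only sees parsed JSON, the service that produced the data could be written in any language, which is the separation of tasks described above.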

Microsoft live.com, Yahoo attempted buyout

I have been following the attempted Yahoo buyout with great interest because I buy into the idea of universal access to online information using many types of devices: PCs, Macs, iPhones, Nokia N800s, secret decoder rings, etc.

In the future that I predict and look forward to, following and exploiting standards will be absolutely required for success. As part of my own research (and fun), I just about continuously try and evaluate every type of online information service (Amazon's web services, Google gdata, freebase.com, dabbledb.com, etc.)

Microsoft's live.com seems to be getting better as far as supporting Mac, Linux, Firefox, etc. The question to me is: how open is Microsoft willing to become?

If I were to sit down and enjoy a beer with Bill Gates and Steve Ballmer (unlikely unless they are vacationing in Sedona, Arizona) I would have some good advice for them: do a sea change and embrace open standards, stop selling new versions of Windows and instead sell yearly subscriptions to Windows and Office (slow improvements, no more big "XP", "Vista", etc. releases), and use their resources to make their software and infrastructure flexible, standard, and valuable to users.

If Microsoft does buy Yahoo, it will be interesting to see if they try to force changing to Microsoft infrastructure: they certainly had problems after buying Hotmail and doing a major conversion to Microsoft server side infrastructure. Yahoo is doing some great things with Open Source (Hadoop, Javascript libraries, etc.) and it will be interesting to see if Microsoft will permit using competing infrastructure software for internal systems.

Friday, February 15, 2008

My DevX article "Real-Life Rails: Develop with NetBeans, Deploy on Linux"

My most recent DevX article has just been published. This was fun material to write about because after some experimentation I feel like I have my Ruby on Rails development environment and server deployment strategy just right, at least for my needs. I should mention that although I have been professionally writing Ruby on Rails applications for a few years, I have not yet written an application that will not run nicely on a single server using nginx, memcache, and a few mongrels. I set my development.rb environment for my MacBook and my production.rb environment for the Linux server I am deploying to, and svn is the glue that holds everything together. If you are interested in deploying very large scale applications, my article will not be very useful to you.

IBM's Project Zero

IBM has an interesting idea with Project Zero, which borrows a lot from ideas behind frameworks like Ruby on Rails: use of a dynamic scripting language (Groovy or PHP), use of a "script aware" HTML template language, and built in support for REST and AJAX.

I worked through the tutorial that uses Groovy (instead of the other supported scripting language PHP), and my first impression is that the Eclipse plugin support is well done (although color and syntax support for editing templates would be good) and the framework meets its goals: support building interactive web applications with little required knowledge of the underlying technologies.

I would be more enthusiastic about Project Zero if I were a Groovy enthusiast. For Groovy loving developers, Project Zero looks to be very useful.

Friday, February 08, 2008

NetBeans 6.1 development build: almost there for my work

I just tried the daily dev build (NetBeans 6.1 Dev 200802080008) for OS X. It is almost there for my daily work - my current Java development project (a commercial version of my old NLBean open source project with a new AI NLP module), Scala coding experiments, and new Rails projects all work great. The one problem: I get errors when using existing Rails NetBeans projects (actually, I get the same errors when trying to modify project properties in new Rails projects but new projects can be created with the desired properties). Close, but not quite there. BTW, the Scala NetBeans plugins, which are very new, are looking very good.

Tuesday, February 05, 2008

PostgreSQL 8.3 on OS X: I like the full text indexing/search features

I built the latest version from source, with one problem: I was only able to install readline from source using "--disable-shared" so I ended up also building PostgreSQL statically linked - oh well, so much for being in a hurry. I have 2 gigs of RAM on my MacBook, so what is a little memory between friends :-)

I have been waiting for version 8.3 because of the full text indexing/search features. Here is the Text Search documentation - enjoy! Here is a little sample of the SQL extensions to support indexing and search:
test=# create table test (id integer, name varchar(30), email varchar(30));
CREATE TABLE

test=# create index test_name_idx on test using gin(to_tsvector('english', name));
CREATE INDEX
test=# insert into test values (1, 'Mark Watson', '[email protected]');
INSERT 0 1
test=# insert into test values (2, 'Carol Watson', '[email protected]');
INSERT 0 1
test=# select * from test where to_tsvector(name) @@ to_tsquery('mark');
id | name | email
----+-------------+---------------
1 | Mark Watson | [email protected]
(1 row)

test=# select * from test where to_tsvector(name) @@ to_tsquery('watsons');
id | name | email
----+--------------+----------------
1 | Mark Watson | [email protected]
2 | Carol Watson | [email protected]
(2 rows)

test=# select * from test where to_tsvector(name) @@ to_tsquery('mark & watson');
id | name | email
----+-------------+---------------
1 | Mark Watson | [email protected]
(1 row)

test=# select * from test where to_tsvector(name) @@ to_tsquery('mark | watson');
id | name | email
----+--------------+----------------
1 | Mark Watson | [email protected]
2 | Carol Watson | [email protected]
(2 rows)

test=#
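Obviously, if you were creating a new table with many rows, you would add the index after the data is added to the table. "gin" selects a GIN (Generalized Inverted Index) index, an inverted index over the words in the text. Specifying 'english' ensures that a word stemmer is used that understands English language conventions. Note that a search for 'watsons' matches because the search terms are stemmed before search. One caveat: the index above was built on to_tsvector('english', name), so it can only be used by queries that call to_tsvector with the same 'english' argument; the one-argument queries shown here return correct results but cannot use the index.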
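
To make the stemming point concrete, here is a toy Ruby sketch of a stemmed inverted index. It is only an illustration under loud assumptions: the naive_stem rule (lowercase, strip one trailing "s") and all of the method names are hypothetical, and PostgreSQL's English stemmer and GIN index are far more sophisticated than this.

```ruby
# Toy stemmed inverted index. Hypothetical illustration only: this is not
# how PostgreSQL implements its stemmer or its GIN index.

# Naive stemmer (an assumption): lowercase and strip one trailing "s".
def naive_stem(word)
  word.downcase.sub(/s\z/, '')
end

# Build a hash mapping each stemmed word to the ids of the rows containing it.
def build_index(rows)
  index = Hash.new { |hash, key| hash[key] = [] }
  rows.each do |id, name|
    name.split.each { |word| index[naive_stem(word)] << id }
  end
  index
end

# Look up one search term, stemming it the same way as the indexed text.
def search(index, term)
  index.fetch(naive_stem(term), [])
end

rows  = { 1 => 'Mark Watson', 2 => 'Carol Watson' }
index = build_index(rows)

p search(index, 'watsons')                         # rows whose name contains "Watson"
p search(index, 'mark') & search(index, 'watson')  # AND, like 'mark & watson'
p search(index, 'mark') | search(index, 'watson')  # OR, like 'mark | watson'
```

The key design point is that the same stemming function is applied to both the indexed text and the search term, which is why 'watsons' and 'Watson' end up under the same index key.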

The search syntax looks odd, but I expect to get used to it quickly. For Rails: I use "acts_as_ferret" a lot; I'll wait a month to see if any handy plugin is written for PostgreSQL specific search - I would rather that someone else write it. I need to check out acts_as_tsearch, but I don't think that it is updated yet to work with the final 8.3 release.

Monday, February 04, 2008

Snowing in Sedona Arizona

My wife and I took a drive this morning after it stopped snowing. Some nice pictures taken near our home in Sedona:
near Boynton Canyon
Dry Creek Road - looking west

XMPP (Jabber)

Experimenting with XMPP has been on my long term list of things to do. I took a 90 minute break from work this afternoon to set up a playground: OpenFire XMPP server and the Ruby XMPP4r client library. Setting up the OpenFire service on one of my leased servers was easy - a very good administration web application and in general an easy install.

I had more problems with XMPP4r but setting Jabber::debug = true helped. I installed the easier to use wrapper library xmpp4r-simple but decided that its API was probably too limited (long term), so I might as well get used to XMPP4r.

I also grabbed the Common Lisp XMPP client cl-xmpp but experimenting with Ruby clients is probably easier. The OpenFire developers also supply a Java client library (Smack) that is on my list of things to try.

I think that XMPP may be a good "push" technology for distributed knowledge sharing systems (an interest of mine). XMPP has a lot going for it: a good security model, straightforward bi-directional communication between any two connected clients, and a publish/subscribe capability like the Java Message Service (JMS). The Comet architecture (uses HTTP and JSON, instead of socket connections and XML) looks interesting but XMPP seems to have a head start and I don't think that I need to learn both technologies (yet).
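
The publish/subscribe idea can be sketched in plain Ruby without any XMPP library. This is an in-process toy, not XMPP or JMS: the TinyBroker class, its methods, and the 'knowledge/updates' node name are all hypothetical, and a real system would deliver items over the network to connected clients.

```ruby
# In-process publish/subscribe sketch. Hypothetical: TinyBroker is not part
# of XMPP4r, OpenFire, or JMS; it only illustrates the "push" pattern.
class TinyBroker
  def initialize
    # Maps a node (topic) name to the list of subscriber callbacks.
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a block to be called for every item published to the node.
  def subscribe(node, &callback)
    @subscribers[node] << callback
  end

  # Push an item to every subscriber of the node.
  # Returns the number of subscribers that were notified.
  def publish(node, item)
    callbacks = @subscribers.fetch(node, [])
    callbacks.each { |callback| callback.call(item) }
    callbacks.size
  end
end

broker   = TinyBroker.new
received = []

broker.subscribe('knowledge/updates') { |item| received << "client A got: #{item}" }
broker.subscribe('knowledge/updates') { |item| received << "client B got: #{item}" }

broker.publish('knowledge/updates', 'new fact')
puts received
```

The publisher never needs to know who is listening, which is what makes this pattern attractive for knowledge sharing between loosely coupled systems.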

Getting Things Done: a perspective from a work at home programmer

While I like to automate repetitive tasks (server deployments, builds, tests, etc.), I also enjoy "tuning up" my personal work habits, tweaking them to get things just right. Hopefully, you will find something useful here (and please add comments on how you "tune up" your own work flow):
  • I keep three lists of things to do: tasks for today (I include errands to run in the same list as work tasks), things to get done in the next week, and long term things that I would like to do, but might not ever get to. I work on a MacBook and use the "Stickies" Dashboard widget to keep these lists.
  • I schedule a break every 20 minutes using the "3-2-1" Dashboard widget. These breaks last a few minutes and give me an opportunity to walk around, get a glass of water or a coffee, step outside, etc.
  • Control interruptions: my wife does not work and is at home with me for most of my work day. Whenever my "3-2-1" 20 minute alarm goes off for a short break, my wife knows that she can talk with me without interrupting my work flow. Also, my parrot has become accustomed to the sound of my 20 minute alarm and gets excited when it goes off: he often gets his head scratched when I get up to walk around. I also like to avoid reading email more often than a couple of times an hour: unless I am on one of my short every 20 minute breaks, I prefer to not interrupt my train of thought. I also have my wife screen my telephone calls for the same reason.
  • Use a pad of paper and a pen/pencil: for me, this is a great way to think, work on algorithms, etc. Computer science does not always have to involve using a computer :-) A pad of paper can save time, delaying coding until I have really thought about the best way to solve a problem.
  • Keep a detailed work log for each project or customer: it may seem counter intuitive, but I find that the 10 minutes a day that I spend maintaining detailed work logs makes me much more productive, long term. Having notes as text files, that you can quickly search, saves a lot of time the "second time" that you need to do something. I have work logs for a few projects that have been actively used for years, and no matter how large these work logs are, I can quickly find information about why decisions were made, how a particular server was set up, etc. -- saves a lot of time!
  • Start work early in the day: I know that this does not work for everyone, but my best work time is early in the morning. I believe that one of the best strategies for getting things done efficiently is taking sufficient breaks. However, as a consultant, I only get paid for the time I spend working with my pad of paper and laptop. By starting work early in the day, I can afford to take longer breaks during the day that keep me efficient. My favorite things to do for long breaks: take a long walk on the hiking trails behind my house, take my wife to a matinee movie, and have a picnic lunch down by Oak Creek (short drive from our house).
  • Don't drink wine every night: I find that I do some of my most creative work later in the evening, an hour or two after eating dinner. For me, working a few evenings a week, sometimes on fun educational projects instead of paid for work, gives me a different perspective. Adding a few evenings a week to my available work time obviously lets me clear more "to do" tasks and allows me to take longer breaks during the day to stay efficient.
  • I vary my work location: this goes against advice from other people to have one "work room", but I find that by rotating between three locations inside my house and my deck I stay fresher mentally, and this helps me relax while I work. When I used to work in an office, I found that I could increase my productivity by having "meetings" while walking around the block; this obviously works best for only 2 or 3 people. Breaking out of your normal working environment is stimulating and short "walking around" meetings can be very productive.
  • Appreciate the value of your work: I find that if I momentarily reflect on how my work helps people, then I can stay focused on tasks that might not be intellectually interesting to me. In a similar way, I find that taking a few minutes, once or twice a day, to reflect on the blessings in my life helps keep me in a grateful frame of mind, and makes me much happier and more productive.
  • Tailor eating habits to your work schedule: small meals and healthy snacks are better than eating just a couple of very large meals each day. You will get tired after a very large meal, reducing productivity. Small and healthy meals and snacks keep your energy level up and keep you mentally alert. Avoid eating sugar; having a dessert a few times a week is fine, but eating sweets every day is unhealthy, reducing efficiency. Like drinking wine, eating a good dessert is more enjoyable if done only occasionally.
  • Try to eliminate things that cause you to worry. Worrying can be a real productivity killer. Two common things that people worry about are health and finances. We can not (yet) control our genetic makeup, but a healthy lifestyle yields a healthier life. Many financial problems can be eased by simply spending less than you earn and having a savings/investment plan. Time not spent worrying can be spent earning money or enjoying life with family and friends.
  • And, saving the most important for last: have a career that you love. It is much more important to work on things that you enjoy than to maximize the amount of money that you earn. You will get more things done if you (mostly) enjoy your work. If you enjoy your work then you may need to spend less money on material things to cheer you up and temporarily make you happy. Remember: there are two kinds of people in the world: consumers and investors.

Wednesday, January 30, 2008

I finally tried Ruby 1.9 (developer's release)

I have been putting this off - I have been busy. Ruby 1.9 uses the YARV virtual machine and seems to be about twice as fast as Ruby 1.8 - a nice speed improvement which is likely to get much better before a Ruby 2.0 release. I tried the new version on several of my POR (plain old Ruby) programs, and everything worked fine for me. Rails would not work for me, but I did not expect it to. I like the way that gem and rake are now bundled with Ruby 1.9 - that makes good sense.

With the good work also being done on JRuby and Rubinius, it is great to know that Ruby is under active development.

Saturday, January 26, 2008

More TV/Internet convergence: Hulu.com

I don't know why, but I just received an invitation to beta hulu.com and they look good! Hulu.com has a very nicely done web interface - playlist management looks especially good. Many, many online TV shows with few and short commercials. I have been using Joost for almost a year, so I am definitely into "watch TV shows and other videos anytime you want".

I am a different kind of consumer than my Dad and older brother: between them, they own 4 high definition TVs, 3 Blu-ray players, and 2 HD DVD players. While they favor high definition (which is great!), I favor convenience - I like to take short breaks from work, exercise, and reading to watch movies and some TV. I find that being able to watch 10 or 20 minutes at a time, bookmark what I am in the middle of, etc. is just right for short breaks. For watching movies with my wife and friends, I am sold on Netflix. I have yet to buy a Blu-ray or HD DVD player for our high def TV, but progressive scan DVDs are good enough for now. If Blu-ray really wins the format war, then I will get a player sooner rather than later. Netflix provides a great service at modest cost (and I very much like their web portal for ordering movies, etc. - really, a brilliant piece of work).

For me, the loser in the video/movie media competition is cable and satellite TV. We spend a small fortune on cable TV and I definitely do not feel like we get our money's worth - I would cancel all but basic service if my wife would let me. I plan on switching to satellite service just to see if there is a difference and to get some high def channels - I might change my opinion :-) A lot of our friends really like using DVRs with cable or satellite TV, but for me, the rented DVD (soon to be Blu-ray disc?) is better. Unlike my brother who owns 1500+ Blu-ray/HD DVD/DVD discs, I don't want to own media - life is simpler renting what I want and not cluttering up our small and very tidy home with stored physical media.

Friday, January 25, 2008

I updated CookingSpace.com - now shows nutrients for meals and daily meal plans

I have been having fun with CookingSpace.com. It is now feature complete as far as what I want in it myself, but I have received a few suggestions for improvement since I hosted a new version a few days ago.

I wanted to track the vitamin K and fats in the recipes that I eat, then decided to show most of the nutrients in the USDA database. I now show nutrient breakdown by complete meals and 1 day meal plans. Meals are aggregates of one or more recipes. The easiest way to try it is to just click on "Show random recipe" and "Show random meal plan" a few times. I am only adding about one recipe a day, so it will be a while before there is a large selection.
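
The roll-up from recipes to meals to a one day meal plan can be sketched as summing hashes of nutrient amounts. This is a minimal illustration, not CookingSpace's actual code: the data model, the sum_nutrients helper, and all of the numbers are made up (they are not USDA values).

```ruby
# Sketch of rolling nutrients up from recipes to meals to a one-day plan.
# Hypothetical data model with invented numbers; not USDA data.

# Add up any number of nutrient hashes (nutrient name => amount).
def sum_nutrients(nutrient_hashes)
  nutrient_hashes.each_with_object(Hash.new(0.0)) do |nutrients, totals|
    nutrients.each { |name, amount| totals[name] += amount }
  end
end

# Illustrative recipes with invented nutrient amounts per serving.
salad   = { 'vitamin_k_mcg' => 120.0, 'fat_g' => 7.0 }
chicken = { 'vitamin_k_mcg' => 2.0, 'fat_g' => 9.5, 'protein_g' => 30.0 }
oatmeal = { 'fat_g' => 3.0, 'protein_g' => 6.0 }

# A meal is an aggregate of one or more recipes.
breakfast = sum_nutrients([oatmeal])
dinner    = sum_nutrients([salad, chicken])

# A one-day meal plan is an aggregate of meals.
day_plan = sum_nutrients([breakfast, dinner])

day_plan.each { |name, amount| puts "#{name}: #{amount}" }
```

Because a meal and a meal plan are both just sums of nutrient hashes, the same helper serves both levels of aggregation.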

CookingSpace, except for the nutrient calculations, is a simple Ruby on Rails application. I run it on a very low cost VPS using mongrel clusters and nginx. The only complex part of the web application is the "admin only" part that is used to define recipes. Some ingredients have many entries in the USDA database; chicken has about 100 entries so when entering a recipe I need to match with one specific entry (how chicken is cooked, skin/no skin, part(s) of the chicken, etc.) Because of this, it takes me 5 to 10 minutes to enter a single recipe.

For AI enthusiasts only: I have spent a fair amount of time thinking about automatically processing text only recipes into my internal structured data for a recipe. Tough problem since, for the chicken example, I would need to determine which of the 100 chicken database entries to assign to a chicken ingredient in a recipe based on the textual description for a recipe. I think that, given a half year, I could do this :-) However, I am doing CookingSpace as a public service to help myself and anyone else who wants to estimate their nutrient intake - so I hope that it will be useful when I eventually get a few hundred representative recipes entered (the 'hard way').

Yes, tools matter

Sometimes people comment to me that I will change my mind rather quickly. I admit that this is sometimes true: life is a little like a game of chess, and you 'play the current position'.

Anyway, I have been perhaps too smug with Ruby and NetBeans 6.0: really a fairly good development environment for a very good language. I have really been enjoying using Ruby+NetBeans - so much better than using TextMate or Emacs.

That said, Patrick Logan's blog on Dynamic Languages: Should the Tools Suck? made me do a double take. The first thing that I did was to fire up Squeak Smalltalk and run Patrick's message sender example. I am grateful to the developers of NetBeans and especially the Ruby support plugins, but yes, Ruby IDE support is still weak compared to Smalltalk - but hopefully getting better faster. (BTW, Squeak is almost as slow as Ruby, execution wise. Commercial VisualWorks Smalltalk is a lot faster if you need the extra speed.)