Friday, February 26, 2010

New (very) rough drafts available for both Java and Lisp editions of "Practical Semantic Web and Linked Data Applications"

While I have been following Semantic Web technologies for over 8 years, I find the slow adoption to be frustrating. I am trying to do my part to promote the Semantic Web by writing two editions of "Practical Semantic Web and Linked Data Applications." One edition has examples in Java/Clojure/Scala/JRuby and the other has examples written in Common Lisp. There is a lot of common material in both editions, so I expect people to read only the edition that covers their favorite programming language.

I will sell physical print books via lulu.com and there will always be free PDF versions on my Open Content web page. I hope to have the Java/Clojure/Scala/JRuby edition finished by early April and the Common Lisp edition finished sometime in early summer.

I just posted PDFs for both editions. That said, these are very rough cuts, with large sections still to be written.

There are a lot of very good Semantic Web tools and frameworks - both open source and commercial. I am using the commercial AllegroGraph product for both books (the free edition is very capable and sufficient for many applications). Additionally, all of the examples in the Java/Clojure/Scala/JRuby edition also work with the open source Sesame library (even the geolocation and text search examples, because I layered this functionality on top of Sesame).

Sunday, February 21, 2010

Ugh, Python. But, PyCharm is charming :-)

I love coding in Ruby but I have always had an aversion to Python. I have tried: about 5 years ago I carefully worked through Mark Pilgrim's excellent "Dive Into Python" book, and I use Python to take advantage of libraries like NLTK (Natural Language Toolkit).

I got pointed back to using Python this morning. I am writing an article for Developer.com magazine comparing Amazon AWS and Google AppEngine. I have used, I think, every available platform service on AWS and I have written a lot of Java code for AppEngine (and experimented with JRuby and Clojure). The one huge gap in my experience for writing this article is writing a Python-based web app for AppEngine. And Python really is the best supported AppEngine language (e.g., smallest server instance startup time for "loading requests," better tool support for importing/exporting data, etc.). Anyway, it would be totally unfair to critique AppEngine without spinning up on their best supported language and tools.

One happy find for Python development, however, is the new preview release of PyCharm from JetBrains. I just installed it and added an Ubuntu panel launcher for it right next to my launchers for IntelliJ and RubyMine. So far, I really like it - even with the rough edges I expect in a preview release of any product. I have been using IntelliJ a lot lately for Java, Scala, and Clojure development and I use RubyMine frequently, so the close-to-zero learning time using PyCharm was nice, as is being able to switch easily between Python 2.5/2.6/3.1, etc.

Saturday, February 20, 2010

Wrappers: staying portable and agile

As developers we build systems using components that other people write. This is the rational way to write software: trying to get the best result using the fewest possible resources. That said, it really pays off in the long run to not only use the best available components but to plan and design for portability. The art of the wrapper. Seeing Jonathan Weiss's simply_stored wrapper for CouchDB and SimpleDB this morning reminded me how important this effort is.

The trick is to spend an appropriate amount of time staying portable. In the last 12 years I have invested quite a lot of time and resources working on my own entity identification code (i.e., finding people, product, and place names in text, relationships between them, associating pronouns in text with names, etc.). Today, I almost always use Open Calais instead of my own code because it performs better, but I do maintain and improve my code just so I keep the flexibility of having my own NLP components.

I like using the commercial product AllegroGraph as an RDF store and reasoner but I also take the time to maintain my own code that also acts as an RDF store with geolocation and full text search. I use a simple wrapper so it is easy switching between either back end.
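
A minimal sketch of this kind of wrapper, in Python. Everything in it is hypothetical and invented for illustration: the TripleStore interface, the two toy back ends, and make_store are not the AllegroGraph API and not the wrapper described above. The point is only that application code (friends_of here) talks to one small interface, and the back end is chosen in exactly one place.

```python
from abc import ABC, abstractmethod


class TripleStore(ABC):
    """Hypothetical minimal interface; application code depends only on this."""

    @abstractmethod
    def add(self, subject, predicate, obj):
        """Store one (subject, predicate, object) triple."""

    @abstractmethod
    def match(self, subject=None, predicate=None, obj=None):
        """Return a sorted list of matching triples; None acts as a wildcard."""


def _matches(triple, pattern):
    # A pattern slot of None matches any value in that position.
    return all(p is None or p == v for p, v in zip(pattern, triple))


class SetStore(TripleStore):
    """Toy back end: one flat set of triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        pattern = (subject, predicate, obj)
        return sorted(t for t in self._triples if _matches(t, pattern))


class SubjectIndexedStore(TripleStore):
    """Toy back end with a different internal layout (indexed by subject)."""

    def __init__(self):
        self._by_subject = {}

    def add(self, subject, predicate, obj):
        self._by_subject.setdefault(subject, set()).add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        pattern = (subject, predicate, obj)
        if subject is None:
            buckets = self._by_subject.values()
        else:
            buckets = [self._by_subject.get(subject, set())]
        return sorted(t for b in buckets for t in b if _matches(t, pattern))


# The only place that knows which concrete back ends exist.
BACKENDS = {"set": SetStore, "indexed": SubjectIndexedStore}


def make_store(name):
    """Choose the back end by name, for example from a config value."""
    return BACKENDS[name]()


def friends_of(store, person):
    """Application code: written against TripleStore, not a concrete back end."""
    return [obj for _, _, obj in store.match(subject=person, predicate="knows")]
```

Switching back ends is then a one-word change, such as make_store("set") versus make_store("indexed"), and friends_of does not change at all. A real wrapper would also have to hide differences in queries, geolocation, and text search, which this sketch does not attempt.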

So, how much effort is appropriate to stay portable? I think that some real effort is worthwhile, especially when it is now common to heavily use other people's infrastructure (in addition to software components). In other words, it is always good to have an exit strategy when using AppEngine, AWS, etc. When I talk with customers, cloud platform exit strategies are not as important as deciding on components, language, and general software stack - but it is still a good conversation to have.

Open source, when appropriate business-wise, is arguably the best protection for controlling the platform and components that you use to build systems.

Wednesday, February 17, 2010

Comparing Clojure and Scala as 'Java replacements'

I have been doing more Java coding than I would like lately <grin>. An antidote has been some work with Clojure and Scala. Unfortunately, I have not reached a sufficient level of proficiency with either language that I can instantly switch between them: it takes me more than a few minutes to get into the flow with either of them.

For customer work, I don't expect either language to be in much demand but since I also work on my own projects I would like to have a good alternative language when Ruby or Lisp (either Common Lisp or Scheme) don't work for me because of lack of libraries or other constraints. (BTW, in the last 3 years, probably 60% of my paid for consulting work has used Ruby, about 30% Common Lisp, and only about 10% Java.)

For my personal development needs, Clojure's and Scala's good support for concurrency is not too large of an issue. What is an issue for me is that I want to be able to use existing Java libraries and frameworks and I like concise programming languages (otherwise, I would simply keep using Java as my Ruby alternative). JRuby is also a powerful contender, except that I find that using alternative languages stretches the mind and is worth the small amount of time lost transitioning from one programming language to another. I already spend so much time using Ruby, because it is a required language on projects, that I would prefer another choice here, even though Ruby is a great (or at least fun) language.

Except for the Common Lisp edition of the AllegroGraph book I am writing, I don't expect to be using Lisp very much in the next few years (unless a customer asks for it), so I would eventually like to settle on using either Scala or Clojure as my "concise Java alternative," but I am in no hurry to make a choice.

I definitely have an easier time integrating existing Java code with Clojure than I do with Scala. That said, I like Scala's syntax better (strange, given the fact that I have been coding in Lisp for over 30 years).

Tuesday, February 16, 2010

New URL for this blog: http://blog.markwatson.com

I had to change the URL for this blog: please update any links to:

http://blog.markwatson.com

Monday, February 15, 2010

Semantic Web: an alternative for RDFa

A few years ago I thought that XHTML would eventually be widely used but when the W3C decided to standardize on HTML5 (which I love for non Semantic Web reasons), that may have been the beginning of the end for RDFa because RDFa is an XML application.

I believe that a better alternative in an HTML5 world is to keep RDF separate from web pages but have a clear set of rules for finding RDF data files that correspond to web pages (either static or generated). One rule might be to look for a file named index.rdf for the top-level URL of a domain; for example, see if http://markwatson.com/index.rdf exists for http://markwatson.com. For a URL like http://markwatson.com/hobbies, look for http://markwatson.com/hobbies.rdf or http://markwatson.com/hobbies/index.rdf.
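
The lookup rules above can be sketched as a small Python function. This is only an illustration of the idea: the function name candidate_rdf_urls is made up, and ignoring a trailing slash and dropping any query string or fragment are assumptions of mine that the rules above do not specify.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_rdf_urls(page_url):
    """Return the RDF file URLs to try for a web page URL, in order.

    Hypothetical helper. Assumptions: a trailing slash on the page URL is
    ignored, and any query string or fragment is dropped.
    """
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")

    def build(new_path):
        # Rebuild a URL on the same scheme and host with a new path.
        return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))

    if not path:
        # Top-level URL of a domain: look for index.rdf.
        return [build("/index.rdf")]
    # Any other page: try <page>.rdf first, then <page>/index.rdf.
    return [build(path + ".rdf"), build(path + "/index.rdf")]
```

A crawler following these rules would request each candidate in turn and use the first one that exists; nothing here fetches anything, it only computes the URLs.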

Although CMS support (e.g., Drupal) for RDFa and helper libraries like the RDFa Rails plugin might make it fairly easy for some web sites to provide RDFa, I think that we need something simpler that might be adopted by more web sites.

I am writing an open source tool (that will be an example program in the Semantic Web book I am writing) that will generate RDF data from web pages. I'll post a link when the code is ready.

Friday, February 12, 2010

Click is now a top level Apache project

Congrats to the Click team. My interest in doing Java web apps is low since I have mostly been using Rails for the last three years.

That said, Click hits a sweet spot with a good templating system and model super classes for pages, forms, tables, etc. Because I might be using Google AppEngine for more projects, I may need to use Java so I have just read through the documentation and code samples for the latest version that is compatible with AppEngine.

Saving money: using a 64bit Ubuntu VPS instead of a 64bit Large EC2 instance

I need to frequently run a 64bit Linux web service (commercial app, not available as a 32bit app). Unfortunately, small EC2 instances are only available as 32bit. I ended up using a 64bit VPS instance (I use RimuHosting, but I think Slicehost also has low cost 64bit VPSs). Saving a lot of money over the long term :-)