Friday, December 31, 2010

Nice: Neo4j version 1.2 final released

Neo4j is a high performance graph database that I usually use with the JRuby neo4j gem and sometimes with Clojure (documentation here).

Neo4j is open source (AGPL v3) and is alternatively available for a reasonable fee under a commercial license (so you don't need to release your project under the AGPL). I took advantage of the free offer for a first Commercial Basic Server license, even though I am likely to open source my project anyway.

Wednesday, December 29, 2010

Good SimpleDB performance tips

I don't usually write blogs that just reference other people's material, but this three part article by Siddharth Anand (first installment, then follow links to other articles) is really worth reading if you use SimpleDB.

BTW, while the local SimpleDB simulator simpledb-dev works OK, for development I usually just access SimpleDB remotely from my laptop. One word of warning: while properly constructed SimpleDB queries can run very quickly from EC2 hosted applications, remote access tends to be 5 to 10 times slower.

WebServius wrapper for web services and data services

WebServius is a wrapper for your web service APIs and data sources. It handles billing, tools that your customers can use to monitor their use of your services, logging, usage statistics, etc. They charge you 10% of what is billed through their service.

When I have a chance I will try the free version of this service (limited number of API calls per day and access to end users must be free) and write up the experience.

I noticed WebServius on Amazon's blog, but WebServius is an independent company that uses the Amazon AWS platform.

Monday, December 27, 2010

Control of news media == ability to set public opinion

In the late 1800s Western Union was able to alter the results of a presidential election (reference: The Master Switch: The Rise and Fall of Information Empires - recommended!)

A similar situation exists today in the USA: all major news sources provide a very slanted pro-corporate agenda that results in a large percentage of the population simply not understanding things like the benefits of a social safety net (a trade of tax money to keep society civil and safer), how we incarcerate a much larger percentage of our population than other countries, that Wikileaks has released a very small percentage of the diplomatic cables and those cables that have been released have been redacted to minimize the chance of putting people in danger, that we spend as much on our military as the rest of the world combined, how military spending enriches only a very small number of powerful people, the level of corruption in Congress (both democrats and republicans), etc.

It is sad that better sources of news come from the Comedy Central channel (for example, The Daily Show), Al Jazeera, foreign services like the German Der Spiegel, some small companies, etc.

The USA is in rapid decline. Part of the decline is due to failing infrastructure, large scale corruption, massive military costs, and also a much less informed population. All of these causes of decline are fueled by a tightly controlled, dishonest news media.

Wednesday, December 22, 2010

Christmas came early: my Google TV arrived at 7pm tonight

Setting it up took 45 minutes because it immediately downloaded a new version of the OS when I set up the wireless Internet connection. I use Directv and it synced up with my DVR and Sony TV with no problems. The keyboard is nice, with a trackpad and a mouse button in the top right corner. It is much nicer using the keyboard rather than the remote for the Directv guide and DVR control.

I have experimented with writing Android cellphone apps with the SDK and now I want to try some HTML 5 apps for both the Android cellphones and Google TV. Fun! I had a consulting job two years ago writing some Java blu-ray example apps and the development environment for that was, frankly, painful. Both Android and Google TV seem much more developer-friendly.

You personalize Google TV by logging into your Google/GMail/Apps account. I had to enter my account information multiple times for different services, including Picasa photo albums. There are some wrinkles to iron out but the platform has a lot of promise.

We watched part of a Netflix movie on demand and it looked good. I have a Hulu Plus account but when I went to Hulu I got a pop up dialog warning me that Hulu is building out for Plus access on Google TV. I am looking forward to that because Hulu Plus is a pretty good service. Along with all of the Directv and Netflix material, it seems like there is an infinite amount of good stuff to watch.

A few improvements I look forward to:
  • When looking at my Picasa photo albums it seemed like the Google TV did not cache-ahead very well. I would like a picture to stay on screen until the next one is ready.
  • When watching one of my wife's documentaries for a charity she helps, Youtube playback was pretty good because we uploaded a fairly lowres video to Youtube and it played back smoothly. I then tried one of my videos on Youtube that was fairly hires and there were too many pauses for buffering. Also, if I paused the video it did not seem to buffer ahead like on my laptop.
I am very excited about the Google TV platform and I thank Google for giving me a free Logitech unit. I am not always good at predicting which nascent technologies will succeed in the marketplace but to me, at least, the Android platform and the Google TV built on very similar technology show a lot of promise.

Excellent product: RubyMine 3.0

I bought RubyMine when it was first released and recently paid for upgrading to version 3.0. I switch between using TextMate (or GEdit on Linux) and RubyMine for both Ruby and for Rails development but since getting version 3.0 I spend almost all of my time using RubyMine.

Rails 3 support is very good and working with RVM is a nice new feature. I never used to use the Ruby debugger (at all!) but I have used it twice briefly in version 3.0. At least for Rails 3 development I still separately run both rails server and rails console outside of the IDE - a matter of personal preference.

I was using PyCharm (also by JetBrains) a lot for 2 weeks and the autocompletion hints and instant syntax warnings really helped me because my Python and Django skills are light-weight. I don't need autocompletion hints and instant syntax warnings as much for Ruby and Rails development but they are unobtrusive and often useful.

The HTML, JavaScript, Erb, and Haml support seems better also. One feature that I don't use in daily development much but that I really like is View->Show Model Dependence Diagram that gives a great overview of your database model classes, their attributes, and their associations. Great for documentation and to serve as a reminder if I have not worked on a project for a while and need to get back into it. Quick navigation support to jump to definitions, etc. also saves me time.

EDIT: Since I wrote this blog I have started using the Model Dependence Diagram fairly often. It would be great to also have an option to show a UML class diagram.

My only real disappointment with version 3.0 is that it does not implement the tear-off editing tabs (i.e., tear-off to create a new and separate window) that the latest version of IntelliJ supports.

That said, using RubyMine is a comfortable experience and the IDE features stay out of my way until I need them.

Getting the Dojo 1.5 rich text editor dijit.Editor working nicely with Rails 3

I could not find much on the web and it took me a little time to get everything working (especially saving edited content) so I'll share some code, hopefully to save you some time. In app/views/layouts/application.html.erb I added some boilerplate (with non-editor stuff not shown) to be included with each rendered page:
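  <link rel="stylesheet" type="text/css"
    href="">
  <link rel="stylesheet" type="text/css"
    href="">
  <script type="text/javascript" src=""
          djConfig="parseOnLoad: true">
  </script>
  <script type="text/javascript">
    // assumed: load the editor widget and its save plugin
    dojo.require("dijit.Editor");
    dojo.require("dojox.editor.plugins.Save");
  </script>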
Then in my view I create an inline rich editor using:
<div dojoType="dijit.Editor" id="editor1"
     extraPlugins="[{name: 'save', url: 'task'}]">
  <%= raw(@task_current.content) %>
</div>
It is very important to use the raw method because Rails 3 automatically makes safe HTML encoded strings for any result returned by <%= ... %>. Also notice the url value of 'task': this sends the content data to the index method of the task_controller. This method checks to see if an incoming request is an XHR POST and if so gets the raw content:
if request.xhr?
  if session[:task_id]
    task = Task.find(session[:task_id])
    task.update_attribute(:content, request.raw_post) # assumed: save the raw POST body
  end
end
For brevity I did not show any error handling.
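
To see why the raw call matters, here is a minimal sketch using only Ruby's standard library (not Rails itself). It shows roughly what default HTML escaping does to markup saved by a rich text editor; the sample content string is made up for illustration.

```ruby
require 'erb'

# Sample content as a rich text editor might save it (made-up string).
content = '<p>Buy <b>milk</b></p>'

# Roughly what Rails 3 does by default to strings rendered with <%= ... %>:
# angle brackets become entities, so the markup is shown as literal text.
escaped = ERB::Util.html_escape(content)

puts escaped
```

Without raw, the editor would be handed the escaped text and would show the tags literally instead of rendering bold text.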


Tuesday, December 21, 2010

Reading "The Rails 3 Way"

Obie had his publisher send me a review copy of his book The Rails 3 Way that was just printed. It is a very good reference for Rails 3, and it really is a reference book meant to be consulted for specific problems. That said, I am reading it straight through because although I have a lot of Rails development experience, I would like to understand Rails at a deeper level.

While Rails itself is "opinionated" Obie's book is even more so: he bases his business on Rails and the book reflects the way things are done in his company. The book uses Haml and RSpec exclusively. I still mostly use Test::Unit and Erb, and the book will probably not change that (but it might!).

I found Paolo Perrotta's "Metaprogramming Ruby" to be a great resource for getting more into the low level details of Ruby and I expect that "The Rails 3 Way" will serve the same purpose for Rails.

Friday, December 17, 2010

Using cloud services when services like and Wave get cancelled

You can still access my public bookmarks if you want them. I will miss the service, as I miss Google Wave.

I tried several cloud based to-do and getting things done style web sites but ended up writing my own web app, and I left the door open to other people by releasing the app as open source. I also ran an online word processor for a few years and made sure to provide data export options.

People and companies who provide free services have no real obligation to continue the services forever but they have a responsibility to give users a good exit strategy. I exported my bookmarks and I suggest you do the same.

Wednesday, December 08, 2010

Suggestions for Python SDK and AppEngine

Partly because I don't often code in Python, I have been feeling some pain using the Python AppEngine SDK. I thought that I would pass on a few things that have made the process easier for me:
  • Use an IDE. I have been using PyCharm but other people have mentioned liking Eclipse with the Python and AppEngine plugins. Live hints when I get a local variable name wrong, suggestions to automatically import one of my own modules, and autocompletion have really helped. If I were more familiar with Python and the Python AppEngine SDK then this would not be as beneficial.
  • Unit tests: I have been using a tool to support unit testing and that has been helpful. I get all of the model code and controller helper functions working and tested before doing the UI.
  • I find the default Python SDK template engine (from Django) to be adequate and fairly easy to use.
  • I have found Mark C. Chu-Carroll's book Code in the Cloud, Programming Google AppEngine to be a useful tutorial.

Platforms and Infrastructure as service

Good news for the people at Heroku (Salesforce just paid 200+ million for the company). This is definitely a kick up for platforms as a service. I have long considered Heroku to be the best platform as a service offering because it is so developer friendly - very different from AppEngine which I would label as "scalability friendly."

I recently had to make a decision between developing for Heroku or AppEngine for my new business ideas. Heroku and Ruby on Rails, along with all of the useful plugins and auxiliary services offered by Heroku, make the most agile web application development and deployment story right now.

In contrast, developing for AppEngine is a pain in many ways. That said, I think that the new AppEngine SDK and services are a solid improvement and since I hope for more than small scale success the relatively inexpensive hosting costs and automatic scalability of AppEngine won me over (at least for these projects).

Infrastructure as a service is also becoming more accepted. My largest consulting customer is considering hosted infrastructure options for very large high performance datastores. As easy as it is to set up a cluster using MongoDB, HBase, BigCouch, etc., there are often good business and technical reasons to pay specialist companies to do it for you.

Thursday, December 02, 2010

AppEngine SDK 1.4 release is likely a game changer

I use AppEngine to host my own projects but not for my customers (everyone who has hired me for a few years has wanted Amazon AWS deployments - no exceptions).

I think that the story for using AppEngine is definitely better with the 1.4 release. A charge of $9/month for keeping 3 compute instances always spun up for an application seems like a modest cost to work around slow loading request times. (I have written twice about this: 1 and 2.)

The ten minute CPU limit for background processes is a nice increase over the old 30 second limit.

I have recently been using the Python AppEngine SDK for my latest side project (something I need for my own use and I am also planning on offering it to other consultants for a small fee).

Platform as a service: I have had the opportunity to help many individuals and companies in 14 years of running my own consulting business and I feel like I have a fairly clear understanding of the opportunity costs of manually managing servers. Small teams especially get hit hard when too much available engineering time is spent on deployments and administration of servers. I expect managed deployment platforms like AppEngine and Heroku will be a strong growth market. Managing servers and deployments may be fun but often does not make business sense.

Tuesday, November 30, 2010

Python is not such a bad language

I have used Python off and on for about 10 years and have never particularly liked it. This is funny since Ruby is my favorite programming language and Ruby and Python are very similar both in features and in the types of applications that they are best used for. For me the largest shortcoming of Python is the lack of blocks.
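
For readers who have not used Ruby, here is a small sketch of what I mean by blocks: an anonymous chunk of code, possibly several statements long, passed to a method call. The method name with_timing is made up for illustration. Python's lambda is limited to a single expression, so the closest equivalent there is usually a named function.

```ruby
# A method that accepts a block and runs it with yield.
def with_timing(label)
  started = Time.now
  result = yield                      # run the caller's block
  elapsed = Time.now - started
  puts "#{label}: #{elapsed} seconds"
  result                              # hand the block's value back to the caller
end

# Blocks passed to built-in iterators:
squares = [1, 2, 3].map { |n| n * n }

# A multi-line block passed to our own method:
total = with_timing("sum") do
  (1..1000).inject(0) { |sum, n| sum + n }
end
```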

That said, I have two small side projects that I wanted to host on AppEngine. I have a fair amount of experience with the Java AppEngine SDK but my gut instinct was to go with the Python SDK using the default webapp library. After spending about 5 hours writing code, I find that Python is fairly comfortable. 7/9/2011 edit: I enjoyed getting a basic system working with Python (a great learning experience) but ended up writing a feature complete version in Rails and using that for several months. In the last week or so I have re-implemented it in Java + GWT for AppEngine.

Sunday, November 28, 2010

Must-have tool for understanding your web site and blog: Google Analytics

I use Google Analytics as a feedback mechanism for what people actually read on my blog and to know which pages on my web site people read (which I equate to which pages people find most useful). As an example, 50% of visitors go directly to my Open Content web page so this gives me a strong incentive to write more free web books and post the PDFs.

As a consultant I interact with many developers and companies and it always surprises me when people don't bother to measure performance, in this case understanding what content people find interesting and/or useful.

Speaking of measuring performance:

Another must have tool if you do deployments: get a free 1-server account at (Disclosure: if you end up eventually signing up for a non-free multiple server account then I get a small perk.) If one of your web applications starts to slow down it can save a lot of time being able to tell at a glance what the I/O, CPU, network, etc. activity has been in the last few days.

Tuesday, November 23, 2010

Wonderful book: "Land of Lisp" - Conrad Barski is a great author and communicator

I have been enjoying Conrad Barski's web based tutorials for years and I recently received his new book Land of Lisp: Learn to Program in Lisp, One Game at a Time! in the mail.

I have been writing Lisp code since the late 1970s and in ancient times I wrote two Lisp books for Springer Verlag. To be honest, I just bought the book for enjoyment but I find myself getting a new perspective and learning more about Common Lisp.


Sunday, November 21, 2010

Distributed NoSQL datastores: Cassandra and Cloudant's BigCouch

In my work for customers in recent years almost everything that I do uses PostgreSQL (sometimes PostGIS) and/or MongoDB. (I write a lot about the Semantic Web but so far no one has paid me to work on a project with an RDF data store like Sesame or AllegroGraph.)

While I think PostgreSQL and MongoDB are great, their replication stories have not been great. MongoDB's master/slave and replica pairs work OK, and replica sets (MongoDB 1.6 and above) look to be a big improvement (it only takes a few minutes to try the MongoDB 1.6.x replica set tutorial example; follow the instructions). I have not tried replica sets yet in a production environment but I am looking forward to it! I find MongoDB to be extremely developer friendly with convenient client libraries in Clojure and Ruby (I don't like dealing with JSON data and hashes in Java).

PostgreSQL 9 replication is easier to set up and administer than Slony but I have not had to use it in production. The replication supports master/slave hot stand-by but it is not a distributed data store with no master process.

Cassandra was designed to be distributed with no specific master server. I am almost done reading "Cassandra, The Definitive Guide." I have been enjoying experimenting with Cassandra a lot recently on both my laptop and transient EC2 instances. One negative about Cassandra for me is that I find the Java and Clojure client libraries to be inconvenient to use compared with their MongoDB counterparts. The Ruby client library is very developer friendly!

CouchDB was the first NoSQL datastore that I used (except for RDF data stores) but I have never found it to be as convenient for my work as MongoDB. That may change thanks to Cloudant's BigCouch, an open-source version of CouchDB that has built in clustering capability. It is extremely easy to set up a test system following the directions on the BigCouch github page. On both my MacBook and EC2 instances, it only took about 15 minutes to set up a cluster. If I had to set up a fault tolerant distributed data store on a cluster of small (or even micro) EC2 instances, BigCouch would be a strong candidate because of the relatively low RSIZE memory footprint compared to Cassandra (for empty systems, about 15MB for CouchDB and 150MB for Cassandra - but expect these memory requirements to increase rapidly with large data stores). BTW, check out the people working at Cloudant: interesting that so many physicists work there. Cloudant bases their CouchDB hosting business on BigCouch.

Good to be a programmer. Or: custom solutions are sometimes better

I signed up for a free Evernote account a year ago last April but never really used it. After reading an article in the New York Times this morning about sharing data across computers and handheld devices I decided to install the Evernote app on my Android cellphone. I also installed the web clipper Evernote Chrome browser extension. After setting up notebooks for major interests (writing tools, writing ideas, different technology areas), I spent some time using Evernote (OS X app, browser interface, and Droid app). Really nice.

I thought of using Evernote for my to-do tasks but my own custom web app works better for me. This reminded me how great it is to be a programmer and make things just the way you want. Now, I must admit that my app is very simple (it has just the functionality that I want and nothing else). This project is open source and took me perhaps 6 hours to write and deploy on Heroku. It is really a great feeling to make something just for yourself (although other people are welcome to use it).

Some background for another "just for me" web app:

About four years ago I almost died very quickly from two large pulmonary embolisms. Fortunately I am almost fully recovered (e.g., I can do 5 or 6 hour hikes, but not the 8 or 9 hour hikes I used to do) but I need to monitor my intake of vitamin K because it interferes with the small doses of blood thinner I take daily. Now, vitamin K is absolutely vital to have in your diet (one example: it keeps calcium deposits out of arteries and heart valves and promotes bone growth and general health). So, four years ago I created a simple web app that contains many of the recipes that I use and some convenience foods that I like. What is different about this app is that it uses the USDA National Nutrient Database for Standard Reference, Release 20 to show the nutrients in the food I eat. At first, I referred to the nutrients per serving for individual recipes and for complete meal plans but after a while, I trained myself to remember the approximate amounts of vitamin K and about 20 other nutrients in food. You can see displays for random recipes by clicking this link a few times. This web app turned out to be a large project, taking me about 40 hours to write because the recommended USDA database had a lot of errors so I ended up using their much larger "research" database and then trimming out things I didn't need. Anyway, for me this was time well spent.

Anyway, it is great to be a programmer :-)

Saturday, November 20, 2010

Is it better to spend time learning new programming languages or study languages and tools you already use?

I have been thinking about a discussion on Hacker News yesterday about which new programming language people want (or need) to learn. While I definitely enjoy learning new programming languages by writing small applications, I think that I personally get a better productivity boost by reviewing languages and tools I already use.

In the last year I have mostly used Ruby and Clojure for my work. In the last few months I have read two books on Clojure and one on Ruby. Sort of: the more you know, the more you can learn and more deeply understand something.

Recently I reviewed the commands and read another tutorial for the screen utility that I have been using almost every day for three years. Well worth the time.

I do a lot of work on remote servers and emacs is not always installed so I have also used vi for years. This morning, I saw a reference to learning vim by using vimtutor and spent 20 minutes working through the complete tutor program (learn by doing: you edit the tutorial as part of the tutorial). Time very well spent.

Friday, November 19, 2010

New AppEngine 1.4.0 features a game changer for JVM languages

As I wrote last April you can use Objectify in Java apps to reduce loading request times (avoiding JDO overhead) but running JRuby and Clojure applications on AppEngine has been hindered by still long loading times. The new feature forthcoming to AppEngine (SDK available, but server side support is not in place yet) of allowing paid for apps to keep three compute instances always running will open up the AppEngine platform for other languages and frameworks. On the development side I have had good JRuby and Sinatra experiences with the AppEngine SDK but have never wanted to deploy anything other than experiments - that will change now.

Thursday, November 18, 2010

Publishing to the Kindle, Android, iPhone, and iPad

I just bought a Kindle and I like it more than I thought that I would. The screen is easy to read and it is much lighter than an iPad.

I have a plan to add support for Kindle, Android, iPhone, and iPad file formats in my writing/publishing pipeline. Currently, my "development system" for writing is Latex with some custom Ruby scripts and a Makefile for each writing project. From Latex source files, I can currently generate:
  • Lulu print books
  • PDFs for laptop viewing
  • HTML pages with automatically inserted Google Adsense ads
  • HTML pages
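
As a rough illustration of the kind of small Ruby script that can handle the ad insertion step, here is a minimal sketch. It is not my actual pipeline code: the insert_ad helper, the placeholder snippet, and the sample page are all hypothetical.

```ruby
# Hypothetical placeholder; a real AdSense script tag would go here.
AD_SNIPPET = "<!-- ad placeholder -->"

# Hypothetical helper: insert a snippet right after the opening <body> tag.
def insert_ad(html, snippet = AD_SNIPPET)
  # sub replaces only the first match; the block form avoids having to
  # escape backslashes in the snippet. Pages without <body> are unchanged.
  html.sub(/<body[^>]*>/i) { |body_tag| "#{body_tag}\n#{snippet}" }
end

page = "<html><body class=\"book\"><h1>Chapter 1</h1></body></html>"
with_ad = insert_ad(page)
puts with_ad
```

A script like this can be run from a Makefile target over every generated HTML file.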
I have been generating fairly good revenue from publishing my Open Content books and selling print copies (some generous people also pay for the PDF instead of using the free downloads). Because I am earning more money from my Open Content writing, I plan on devoting much more time to writing next year. I plan on always making the large format PDFs free for download, and selling print books and versions for hand-held devices.

Benchmarks: memory use can be as important as runtime for some applications

I often look to the benchmark game results. While I am pleased that Clojure is now included, I find the memory use (at least for these benchmark programs) to be disturbing (e.g., Clojure vs. Ruby 1.9 and Clojure vs. Java). When deploying to small servers, VPSs, small EC2 instances, etc. memory use can be critical.

Saturday, November 13, 2010

Clozure Common Lisp 1.6 has been released

Although I have used SBCL for more consulting work than Clozure CL, I have started using Clozure for more of my own projects. One thing I prefer is that standalone applications built with Clozure are about 10 megabytes smaller than apps built with SBCL. Release notes

Thanks to Tom Munnecke for portrait photographs

My friend Tom Munnecke recently took some casual portrait pictures of me, and I am now using one as my Twitter picture. Sometime I will also replace some of the pictures on my main web site.

Friday, November 12, 2010

Contact your Congressional Representatives and Senators and ask them to have their families publicly go through the TSA grope experience

Use this link to contact them.

Also, how about President Obama letting his wife and daughters get groped by TSA? And how about our Senators and Representatives letting their wives and kids get groped?

Hate to break it to you, but the government officials who we elect and who get paid off by corporate lobbyists don't live in the same world we do.

People get the government that they deserve. Are you going to do anything about this? An email is good, but also consider taking the extra time to insist on talking to your elected officials.

BTW, I don't think that TSA has ever caught a terrorist, or done any good at all. (Not like the FBI and other intelligence services who do worthwhile work.)

I am softening my position on Oracle's stewardship of Java

I have to admit that I got a little carried away with my criticism, fueled largely by my agreeing in substance with the position of the Apache Foundation. However, Apple's announcement and IBM's earlier announcements about working with Oracle and OpenJDK have caused me to soften my position.

To be fair to Oracle, my complaints about lack of tolerance for Apache Harmony and the FSF Gnu Java implementation apply equally to the now defunct Sun Microsystems.

The GPL license for OpenJDK is not all that comforting given the patents wrapped up in Java implementations: only passing the Java Compatibility Kit tests gets a patent waiver. Still, for Java developers, I think that all is right in the world, on a practical level.

Wednesday, November 10, 2010

My nephew died this morning: rest in peace Anthony

My nephew Anthony was hit by a car last night and died early this morning.
The following picture shows Anthony, my sister in law Anita, and me in front of my house in Sedona. Anthony was about 17 years old in this picture.

Here is another picture of Anthony and my Dad when they were visiting Sedona. Anthony was 20 years old in this picture.

Anthony loved talking about politics, his family, and music.

Tuesday, November 09, 2010

Question: what would be the legal ramifications of forking Java, but not calling it Java?

I understand that Oracle owns the trademark and there are patent issues. Still, an alternative pure open source platform named wombat or whatever might end up being necessary.

I understand that Oracle would like to monetize Java but there is that "killing the goose that laid the golden egg" metaphor becoming real life :-)


Monday, November 08, 2010

Downloading all of your data from Facebook

Nice: I just tried this: go to your Facebook Account menu on the upper right corner of the home web page and select Account Settings. Near the bottom of the page, click the learn more link for Download Your Information. Follow the instructions and you will get a ZIP file with your home page, friends list, all messages, wall, and photographs.

Friday, November 05, 2010

Free PDF for the Common Lisp edition of my "Practical Semantic Web and Linked Data Applications" book

I have been working on this book on and off for two years and finally finished it recently while on a long vacation. You can get a free PDF on my Open Content web page or if you want to encourage me to write more material on niche topics, you can buy a print copy.

Although this book covers a small market, I believe that the combination of Common Lisp and the AllegroGraph RDF data store is a great combination for developing knowledge intensive software.

Monday, November 01, 2010

Wow: I think QuickLisp will change my Common Lisp development setup

I have been experimenting with the beta for QuickLisp and it looks very good. I never disliked the ASDF package management system, but QuickLisp looks to be easier to use and with a few hundred common Common Lisp libraries already in the repository, getting most dependencies installed is quick and easy. Good job Zach!

For many years I have been using a brute force approach to package management: in every project I work on, I have a utils directory where I un-tar the source code to all dependencies, and their dependencies, etc. I then locally
(push "utils/PACKAGE_DIR/" asdf:*central-registry*)
for all dependencies. This has always worked really well for me because I can ZIP up a project directory, rsync it to a server, and I am good to go with all dependencies. However, it is a pain to keep multiple copies of libraries in multiple projects. For reasons I don't even remember anymore, I never liked to use a global ASDF cache.

It will be a long weekend day project, but if I don't run into any serious problems with QuickLisp, I am going to clean up all of my Common Lisp projects to not use local copies of dependencies. I'll wait until the Common Lisp edition of my Semantic Web book is done, to avoid delaying that effort.

Friday, October 29, 2010

Convergence on HTML5 for user facing software development

News this morning on ZDNet about Microsoft moving away from Silverlight towards using HTML5 is more good news. I expect (hope!) to see more iPhone and Android development to be based on HTML5 rather than native apps.

As we see improvements in device independence (i.e., getting to read email, tweet, watch video, video teleconference, etc. on our smart phones, netbooks, laptops, Google TV, Wii, XBox, etc.) I hope to see real improvements in development platforms for universal applications.

Tuesday, October 26, 2010

Java support on the Mac

Long term, this is a big issue for me: I spend a lot of my time developing Java, Clojure, and Scala code using IntelliJ so I need support for desktop Java apps like IntelliJ. For now, my customer (CompassLabs) is a Clojure shop (at least the work I do) and Emacs+swank+Clojure is a sweet combination, so Clojure development will not really be impacted because this does not require Swing desktop app support. But I am pretty much addicted to IntelliJ for Java and Scala coding.

I would be interested in an official statement from JetBrains how they will handle this issue, long term.

An easy solution is to just use my i5 Windows+Ubuntu laptop in the future.

Monday, October 25, 2010

review of new Hulu Plus service

I got a beta preview invite today, and so far it looks really good: HD, no opening commercials (at least for what I chose to watch during my lunch break), and additional material. Hulu Plus is $10/month so I may not use it permanently but I will give it a try for several months. Just my opinion, but Hulu Plus at a reduced rate of $5 or $6/month would make it more compelling since for $15/month I get 2 Netflix blu-ray movie checkouts (at the same time) and their very good streaming service.

My wife and I are also Directv customers and although we really like the service, it is expensive. Directv must be worried about competition from direct Internet viewing options because they called up several nights ago and offered a small reduction in my monthly rate and a free 6 month promotion for all their channels (except for pay per view). Directv is a good user experience.

If you are like me, you use movies and a few TV shows for something to do while either bored or too tired at the end of the day to do much else. Unless viewing is a social/group activity with side conversation, I feel a little guilty sitting around enjoying media that other people created when I could be getting exercise, working on my current book project, or making some walking around money by consulting.

A little off topic, but I re-watched late last night two of the GoogleTV presentation videos from the last Google I/O and I am excited about more seamless integration of all media that I have purchased (or is free with advertising) and all devices that I use (MacBook Pro, Droid, HD TV with blu-ray and Wii). BTW, I did a small Blu-ray Java app for a customer two years ago that was really just a demo: it put a user's Twitter stream in the upper right corner while playing from a blu-ray disk.

* edit: apparently there are no commercials: I finished watching a recent episode of The Office and the screen went black for a second in places where there were commercial breaks.

Saturday, October 23, 2010

Future society: an optimum strategy for flourishing

I just watched this interview with Tim Wu. He nails it re: the tendency of information industries to move from open to closed systems. I just pre-ordered his new book: The Master Switch: The Rise and Fall of Information Empires. There is little doubt in my mind how our society is going to evolve in an era of consolidated corporate power and ubiquitous information systems.

Although I don't subscribe to the idea that history repeats itself, I do believe that history does inform us about human nature and how powerful people fight to consolidate power and influence. This tendency is firmly stapled into our DNA.

There will almost certainly be strife between what used to be the middle class and financial and political elites. I read yesterday that one of the international rating agencies predicted a loss of "social cohesion" in the USA. Right now, there are large strikes in France over raising the retirement age from 60 to 62. It is interesting that here in the USA, the waning middle class seems silent in comparison. In my country the consolidation of power by control of news by just a few corporations has been brilliant in execution, but not in the public good: we see people swayed by corporate controlled news to back political agendas that are very much against their own interests.

I have always had a strong interest in strategy games like Go (I wrote the first commercially available Go playing program) and Chess (I wrote the simple Chess program Apple shipped with very early Apple IIs). The point is: when playing strategy games, you play the current board position. Now, life is not a perfect knowledge game like Chess and Go but as individuals it still makes sense to "play the position" with what knowledge and resources we have.

What knowledge do we have? I would argue very little if you rely on USA filtered and biased news sources. You can improve your knowledge a little by reading multiple international news sources. I also rely on what people I trust write in their blogs and books.

What resources do we have? As individuals our resources are specialized job skills, education, family and friends, real property (hopefully not debt encumbered), owning your own business, cash assets, and social and professional networking. Negative resources are things like excessive personal debt.

In playing Chess and Go, it is almost always useful to understand what moves your opponent wants you to make, and avoid these moves! In our lives, I believe that it is useful to try understanding what the financial and political elites want the masses of non-elites to do. In recent history, using advertising and media control, a vision of extravagant lifestyles has been pushed at people and this has been carefully combined with control of cheap borrowing to induce people into taking on excessive private debt: optimum elite strategy for increasing control, wealth, and influence. The obvious strategy for non-elites is to avoid debt, avoid getting sucked into a materialistic lifestyle, and to invest personal resources in acquiring specialized job skills and making education a life-long process.

Thursday, October 21, 2010

Social network based authentication done right

Although there are valid privacy concerns using social networks like Facebook, and to a lesser degree Twitter (because almost all tweets are intended to be public), for most of us the value proposition of shared user identity between web sites is consistent login/authentication without multiple accounts, plus the potential for web sites to show you more things that are interesting based on your online behavior.

I have been an occasional Hunch user since they went beta. Today I was looking at their login/authentication scheme that uses either Twitter or Facebook authentication. I tried using both Twitter and Facebook for authentication and liked that Hunch recognized that my previous Hunch account, my Facebook account, and my Twitter account belonged to the same person and immediately offered to merge the accounts.

Giving a site like Hunch the ability to access some Twitter and Facebook data on users opens up even more opportunities for using machine learning to further personalize the user experience for

Most of my work in the last year has been using machine learning to process some form of user data (although I also get a lot of plain Java/Ruby/Lisp development work) and understanding the behind the scenes infrastructure and techniques as well as user experience benefits has softened my previous stance against corporations collecting too much information. A few days ago when my wife and I were in Hong Kong, we were talking about the ubiquitous advertising (the night time skyline is a beautiful canvas for advertisers) and customized advertisement displays in the movie Minority Report. Custom ads that are interesting and useful are a good thing. Just be sure to understand what information you share and how it is used.

For more of a learning experience I have started doing some up-front research for adding Facebook and Twitter authentication to my web site (something I wrote for my own use, but it has users). I would like to have my own "full stack" environment for collecting user preferences and using machine learning for recommendations. For customers, I tend to touch only parts of their systems, so my work on this site is motivated by my desire to understand more of the entire process rather than build a popular web portal (although that would be nice also!)

Nice: Clojure results now in Computer Language Benchmarks Game


Clojure vs. Ruby 1.9: median results 8 times faster

Clojure vs. Java 6 server: median results 4 times slower

Clojure vs. Python 3: median results 10 times faster

My travel journal notes for my Siberia, Japan, and China trip

Here are my rough notes that I was emailing to my family and friends. As I edit them I will post a few of my best pictures here on my public Picasa web album - just look for recent photo albums with "2010" in the title. I did not make notes for the first week as we were going north west through the Aleutian Islands and into the Bering Sea.

Fun excursion: 1.25 hour drive Petropavlovsk to Indigenous village

We went way off the beaten track today, but had a lot of fun. Except for a lot of driving on very bad roads, most of the day was spent inside an Indian style lodge similar to what a large family would live in during the long winter. Entrance was a long very low tunnel. The lodge had a small hole in the ceiling directly above the fire pit.

Three women sang and danced inside the lodge, one spoke a little English and told stories and legends, etc. They also cooked us a meal that was pretty good.

I have some fantastic video of the singing and dancing (similar to Southwest Indians, but more complex, musically). I'll post a few of the video clips when I have a chance. Carol took video of me dancing, but that will be censored.

We are now steaming towards Japan

We have today and tomorrow at sea, then the next day we land in the port of Sendai. Countryside and temples, here we come!

As per the email I sent last night, Carol and I really enjoyed our "Indigenous people's village tour" yesterday. The roads were so very bad that they had to transfer us from a robust tour bus to an extremely robust mini-monster truck (huge wheels, 4-wheel drive) for the last few miles and even then getting to our destination was dicey. Carol just showed me some of the videos she took; using both of our videos, we should be able to put together a good presentation. We just used our digital cameras, but the sound and video quality is good. Carol just showed me a great video sequence from the mini-monster truck going down a very bad road with a large field of yellow wild flowers on our right. Right now she is editing a photo from our ship showing snow covered mountains/volcanoes surrounding the harbor we were in yesterday. Stay tuned for production-ready product :-)

Our tour guide yesterday was a little apologetic for the condition of their city, roads, etc. Definitely, not a lot of wealth to be seen (she said the ultra-rich people lived in guarded enclaves well outside of the city). Still, they have lots of natural resources and I think enough jobs - their life style may compare OK to non-rich people living in the USA in a few years.

We saw an amazing magic show last night, "Moscow Magic." The couple had been married for 30 years and their act together was very smooth and well executed. They have a 25 year old daughter who is a magician with the Moscow Circus.

Carol and I are hanging out in the library this morning at a stone table that is covered with semi-precious stones: reminds me of the inlaid semi-precious stones on the sides of the Taj Mahal. Carol is editing trip pictures and video on her laptop and I am getting ready to work on my current book project. I have only been spending about 90 minutes a day writing, but have been, I think, very productive. I'll say it again: if I were wealthy I would live and work on cruise ships about half the time.

bird rescue (?)

We are passing the far northern Japanese islands this morning, arriving in Sendai early tomorrow morning. It is getting warmer: we are now about the same latitude as Portland Oregon, so we have come far south compared to being in the Bering Sea. I was out on deck very early this morning at the very top of the ship in a short sleeve shirt, so it really is warming up (probably 40 degrees at 6am).

Most days we go to both a history lecture on the areas we are visiting (lecturer is awesome) and also listen to the naturalist on board. Yesterday the naturalist was telling us that large birds land on the ship and can not take off so it is a kindness to pick them up and toss them over the railing. Carol and I went outside on the Promenade deck last night at about 9pm and what do we see but a large bird (the wings looked like a frigate bird but the color was different) trying to take off. Now, these birds have these tiny stubby little legs and he was not getting any lift-off speed. I picked him up (heavy bird, lots of fat on his chest) and carried him to the railing but I could not bring myself to toss him overboard into the darkness. You guessed it: I handed him to Carol and she did the bird tossing. It was so dark that we have no idea if we saved him or tossed him to a watery death. I hope the former option. In any case, it was a thrill to hold such a large bird; as some of you know, I am more than a little fond of birds, and miss my ornery little parrot. We saw four more birds of the same species later in the evening and they too looked really scared to be stuck on deck. I picked up one of them, but Carol and I decided to not toss him, so I set him down near three other birds that were nearby. Early this morning, two of them were still on deck, hiding behind some lounge chairs. edit:It turns out these were water birds and I would have done them all a favor by spending 5 minutes tossing all of them back into the ocean. (This is what the naturalist told me a few days later when I talked with him at lunch time.)

We had a formal dinner last night and the ship's chief medical officer sat with us - interesting to hear about his life. edit: He joined us on two shore tours so we ended up getting to know him fairly well.

I am very excited about landing in Japan tomorrow. We have reservations on four awesome sounding shore excursions (one for each port in Japan we will be visiting). Carol has been taking many good pictures, and eventually we will organize an online photo and video album (our target date for this is February 2011 :-)

Sendai: first port in Japan

Yesterday was very special. I have always wanted to visit Japan and the tour yesterday was the perfect start.

We first went to a tourist area that is one of the most popular destinations for Japanese: a few hundred beautiful stony little islands with a few pine trees on each. Our view point was a three hundred year old tea house on a bluff.

Nearby, there was an ancient forest with a long formal entry, and the main gate framing the same islands. On one side were about a dozen meditation caves and on another an early Shogun's residence. All beautiful and interesting.

Our last stop was a Shinto Shrine on a small mountain, also looking out over the sea in the distance. Our guide was wonderful, with her spiel, giving me more information when I asked questions, and teaching us all the proper way to pray at Shinto Shrines (yes, I will probably convert :-)

A bit of a disappointment today: we were going to go on a self guided tour through caves and catacombs where Buddhist monks have over the centuries made carvings and pictures. Each person was to be given a candle, and they would send us in by ourselves. Anyway, the cruise tour lady on the ship warned people during a talk of claustrophobia problems, and so many people bailed on the tour that they cancelled it. I said how much I liked the tour yesterday, so they are sending us on something similar today, but it will be in Yokohama (we should dock in about 1 hour).

Yokohama: 2nd port in Japan

Yokohama is very different than Sendai; Yokohama is a huge port, but well laid out and very clean. Even the water in the bay is clean. So far, all of Japan has been spotless and tidy.

We took a tour today to the ancient city of Kamakura, made capital of Japan by the Shogun in 1192. Our first stop was at a giant bronze Buddha: taller than you would believe. I went inside it also - sort of interesting.

We also spent time at a Shinto Shrine and walked through town. I bought two t-shirts, some Japanese cookies our guide recommended, and a hand painted bamboo bookmark. Carol and I also walked down several small alleys/walkways in the residential areas: odd because in the middle of the city, it was absolutely quiet.

After the tour, Carol and I had a late lunch on the ship, then spent the afternoon and early evening in a park near the ship that was a popular hangout spot for locals. We leave at 10:30pm for the port of Shimizu.

After dinner, the same music group that was performing in the morning on the pier was on board entertaining. They had a calligraphy artist doing "real time" huge drawings, interpreting the music. Then, with a translator, the calligraphy artist explained the nuances of what she was doing.

Shimizu: port 3 in Japan

We started the day going to one of the Japanese national treasures + places: the Nihondaira Park and Kunozan Toshogu Shrine. The shrine was in a very remote spot: we had to take a 2000 foot cable car to get there. It was built for the Shogun Tokugawa Ieyasu who was an early and especially powerful leader. I can't really describe the area and our pictures don't do it justice. There were beautiful buildings, very old cedar groves, the monument where his body has lain for 800 years, and hawks flying overhead. The breeze made the trees seem like a live animal around the shrine. Sitting there quietly has been the absolutely best time so far on this trip. The park also had an excellent museum of ancient Samurai armor and weapons.

After some more sight seeing, we had a fancy traditional Japanese lunch in a nice hotel - really strange stuff, but I ate everything. Watson traditions to uphold, and all of that.

In the afternoon, we saw something fairly much awesome: the small museum that has most of the existing work of the wood block artist Utagawa Hiroshige. He worked around 1830 to 1840 so this is not really old like most of what we have seen. You have seen his famous "Ocean Wave" wood block print, and the collection of his entire work was amazing.

It is worth noting that on all tours we have taken in Japan, the places we visited have had almost all Japanese tourists and a relatively small number of foreign tourists. The Japanese are very big on visiting their own places of interest. Easy to understand because of the beauty and general historic interest.

Kobe + Kyoto: 4th port in Japan

Carol and I are sad to be leaving Japan in an hour. We did a 10 hour tour today, leaving our ship in the port of Kobe and driving to the ancient capital city of Kyoto. I would like to come back to Japan and rent an apartment for a few months sometime when I am not consulting and just writing. Wonderful place, full of polite and very happy and contented looking people.

The first stop in Kyoto was the part time residence of the Shogun Yoshimitsu that was built in 1397. It was very dangerous being Shogun: enemies were often trying to assassinate the current Shogun. Yoshimitsu's residence had squeaky "nightingale" floors that creaked as we walked on them. A little noisy, but no sneaking up on anyone without a lot of effort. There were a series of rooms that were used for audiences with the Shogun, and the security setup was intense looking.

Calvin: our guide today was an old woman who seemed very knowledgeable about Ninjas. She said that they would hire themselves out to the highest bidder, either to protect the Shogun or assassinate him. She said that they were like the American CIA (not entirely sure what she meant by that). In Yoshimitsu's residence, he paid local Ninjas to masquerade as gardeners to spy on people approaching the residence and generally collect intelligence.

We then went to the Golden Pavilion, a Shinto park + shrine with lots of water, reflections, etc. Lunch was a buffet, and yes, I did eat some Kobe beef. Kyoto was an interesting place, and we spent the afternoon going to several more historic sites.

Two day tour to Beijing

Just a short note because I am tired. Carol and I just got back from a 35 hour trip to Beijing. We left the ship first thing yesterday morning for a 2 1/2 hour drive to Beijing. We had lunch and then headed for the Great Wall. One note: the weather was beautiful and sunny: blue skies except for a few scattered clouds. The Great Wall was most interesting because it gives some indication of the power of the Chinese emperors. We got extremely tired climbing the stairs on one segment of the Great Wall and you will see why when I send out some pictures.

After the Great Wall we moved on to the Ming Tombs in the hills outside of Beijing. This was an interesting lesson in how Feng Shui is done on a very large scale. We then went at dusk on what is called the 'sacred walk' where dead emperors were carried in a procession. Interesting because of large animal sculptures along the way.

We then had a painfully long drive through Beijing traffic to what is reputed to be the best duck restaurant in Beijing. I have no real knowledge if this claim is true but there were lots of pictures of famous people and politicians eating there. The food was tasty.

On the way to our hotel, the InterContinental, we passed the lit up Olympic sites for the swimming structure and the "Bird's Nest" - looked really cool from the bus, but when Carol and I walked back there at about 11pm, the lights were out. It was great taking a long nighttime walk in Beijing anyway.

Our hotel room was unbelievable, probably the nicest room I have ever had. Unfortunately, we had to meet for breakfast at 6:30am (full on Chinese food buffet with all kinds of strange stuff to eat) and get an early start to Tian'anmen Square: odd but interesting experience - people were in a 3 or 4 hour line to see Mao's tomb. We walked around and gawked. Don't underestimate the patriotic fervor in China: we saw it everywhere. Our guide talked about every person pulling together to help the economy. Single minded and to the point. Very scary, us being economic competitors. Also, I didn't mention before how beautifully splendid Beijing itself is. I know that there are poor areas in China, but Beijing was awesome and people just had a happy look about themselves living or working there.

We then went to the Forbidden City. I can not describe how huge this is because we covered such a tiny part of it. The open squares are immense and the imperial buildings, well, are big and imperial. Our guide told me that if a baby was born there and spent each night in a different room, then he would be 30 years old by the time that he slept in all major rooms. Our guide took us to the "Emperor's lovenest" (as he called it) area. The area where he lived with a dozen rooms nearby for his favorite concubines was small compared to the rest of the Forbidden City.

After lunch we went to the Temple of Heaven which was nice after the Forbidden City because the temple area only covered (about) 25 acres and did not numb the mind trying to contemplate the place. Everyone on the tour was happy enough to start the 2 1/2 hour drive back to the ship, after the second long day in a row.

We now have two days at sea before spending two days in Shanghai - looking forward to some down time and working on my current book project.

"Steaming" towards Shanghai

You might remember my previous email about "bird tossing." Sort of a followup: I was lying by the pool this morning doing some writing and I felt a little thump on my stomach. I looked down and there was a tiny canary-type bird that had landed on me. We looked at each other and it jumped down on the lounge chair right beside me. After a pause of a few seconds, it ran up my right pant leg, took a sharp right at my laptop, ran across my stomach, and flew away. Yesterday a similar bird flew right up beside Carol and me with a large moth in its mouth and last night we watched two of them bathing in a pool of fresh water (from the decks being washed down). Cheerful little birds.

Judy Garland's daughter Lorna Luft was the performer for last night. I saw her early yesterday morning while walking on deck and we talked briefly, but I didn't know then who she was. Her show last night was good and we just spent an hour listening to her answering people's questions in the lounge. Frank Sinatra was her godfather and a major influence in her life, and her best friend is Barry Manilow - she had some good stories.

We head up the Yangtze River at 1:30am tomorrow morning and from what I hear the captain will earn his salary. (Actually, he is charming, and has already earned his salary.) We fly home in 6 days and I am looking forward to getting back to real life.

Yahoo! Shanghai!

Patti: Happy Birthday!! I hope you had a great time in Hawaii.

We arrived in Shanghai early yesterday morning after about 3 hours of river navigation. We took the "Zhujiajiao Water Town" excursion that lasted all day yesterday. The water town was like a little Venice: lots of small bridges and very small sampan boats to get around. Our ride on a little sampan only lasted about 10 minutes but it was fun. Lots of narrow pedestrian only streets and we both bought a lot of inexpensive stuff in small shops. Later we had a mediocre lunch and visited a silk factory. I have some interesting video clips I took of the process of getting the silk threads off of the cocoons. Carol bought a beautiful green silk jacket; she is thrilled with it.

Last night we had dinner outside on the tail end of the ship and enjoyed the spectacular Shanghai skyline for several hours. Really nice. We had rack of lamb and a couple of chefs were cooking it up fresh nearby so it did not get cold coming all the way up from the kitchen area. Since I didn't eat much lunch I also had a second entree of fresh mussels and two nice appetizer plates. As expected, food on the ship is fantastic, especially lots of high quality sashimi and sushi every lunch.

After the super-tidy futuristic feel of Beijing, I feel that we are now seeing more of the 'real China', except that Shanghai historically has such a strong western influence, and you see that in the 100 year old western style buildings. This morning we went into the city on our own and walked through the areas that we only drove through yesterday: the Bund, the financial districts, lots, lots, lots of shopping, etc. I had a small cold yesterday, feeling better today, but I wanted to take it easy so after a while I gave Carol all of my Yuan money and I caught a bus back to the area where the ship is.

We now have two sea days starting tomorrow and then a long day in Hong Kong (two tours, totaling 10.5 hours, then we get up early the next morning and fly home). I am really looking forward to getting home - our normal life is pretty sweet, and this trip seems somewhat like a fantasy.

Hong Kong, and a Typhoon

Today has been a long but fun day. We woke up and looked out the window as we were entering Hong Kong bay. Beautiful city.

We took an 8 hour 'best of Hong Kong' tour and saw just about all the famous places. We started in the Bird Market where there were a zillion birds for sale and generally a lot of Chinese bird lovers hanging around. We then went through a food market that was super interesting, if a bit bloody, gory, etc. We then took a funicular to Victoria Peak and ate in a very good restaurant; Carol made sure that I got the seat at our large table right up next to a floor to ceiling window, looking down over Hong Kong - felt like I was dining on the edge of a cliff. We then went to the other side of the island to Aberdeen and took a sampan ride during which I shot a lot of video. We then went to Stanley Market, and then back to the ship.

We had a hurried but tasty dinner and then headed out for a 2+ hour nighttime harbor cruise that was a lot of fun also.

We just put our large bags outside our stateroom for the porters, and we are now going to get some sleep since we get up early to fly home tomorrow.

re: the Typhoon: the ship's captain has said that the cruise might have to be diverted, changing the itinerary. Our new ship-board friends who are on the full 69 day cruise (we just did the first 25 day segment) did not seem to mind a possible change of ports visited. Hopefully we will have no weather problems flying out tomorrow morning.

Wednesday, October 20, 2010

I am back home after a 4 week vacation: blog comments are now enabled

I did not want to deal with blog comment SPAM while travelling so I had temporarily turned off comments. I will post my travel log when I get a chance.

Sunday, September 26, 2010

Big productivity gain: not having an Internet connection in the middle of the Pacific Ocean

Carol and I are on a long cruise, and because of the high cost of Internet connectivity, I am only getting on the web for about 5 minutes every other day.

I have been spending about 2 hours each day working on the Lisp edition of my Semantic Web book, and I must say that my productivity seems a lot better when I am not distracted with an Internet connection.

So far, we have been very good about not overeating on this trip - enjoying the food but eating small portions. Except for some complimentary Champagne the first night we have avoided alcohol, making it easier to not overeat!

We will be onboard for 25 days so we don't feel pressured to engage in all activities that might be fun. So far, we have been enjoying a series of onboard lectures, the movie theater, and lots of walking on deck.

Tuesday, September 21, 2010

I am going to be on travel for 4 weeks: temporarily turning off blog comments

Carol and I are leaving on a long trip. Unfortunately, I get SPAM comments on my blog which are easy enough to remove, but I will be off of the Internet for long periods of time. I'll turn comments back on when I get home.

I have my laptop set up to work on the Common Lisp edition of my Semantic Web book so that will probably be available in final form in about 6 weeks.

Wednesday, September 15, 2010

Rich client web apps: playing with SproutCore, jQuery, and HTML5

In the last 14 years I have worked on two very different types of tasks: AI and text mining, and (mostly server side) web applications. Putting aside the AI stuff (not the topic for today), I know that I need to make a transition to developing rich client applications. This is not such an easy transition for me because I feel much more comfortable with server side development using Java, Ruby on Rails, Sinatra, Merb, etc. On the client side, I just use simple Javascript for AJAX support, HTML and CSS.

As background learning activities I have been working through Bear Bibeault's and Yehuda Katz's jQuery in Action and Mark Pilgrim's HTML5 books. Good learning material.

When I read that Yehuda Katz is leaving Engine Yard to work on the SproutCore framework I took another good look at SproutCore last night, worked through parts of the tutorial with Ruby + Sinatra, and Clojure + Compojure server backends. I find Javascript development to be awkward, but OK. I need to spend some time getting set up using IntelliJ on both jQuery and SproutCore learning projects. If anyone has any development environment suggestions, I am listening.

Tuesday, September 14, 2010

MongoDB "good enough practices"

I have been using MongoDB for about a year for customer jobs and my own work and I have a few practices that are worth sharing:

I use two levels of backup and vary the details according to how important or replaceable the data is: I like to perform rolling backups to S3 periodically. This is easy enough to do using cron, putting something like this in crontab:
5 16 * * 2 (cd /mnt/temp; rm -f -r *.dump*; /usr/local/mongodb/bin/mongodump -o myproject_tuesday.dump > /mnt/temp/mongodump.log; /usr/bin/zip -9 -r myproject_tuesday.dump.zip myproject_tuesday.dump > /mnt/temp/zip.log; /usr/bin/s3cmd put myproject_tuesday.dump.zip s3://mymongodbbackups/)
The other level of backup is to always run at least one master and one read-only slave. By design, the preferred method for robustness is replicating mongod processes on multiple physical servers. Choose master/slave or replica set installations, but don't run just a single mongod.
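The crontab entry above hard-codes "tuesday" in the file names, so a rolling backup needs one entry per weekday. As a rough sketch of the same idea, the helper below builds the three commands (dump, zip, upload) from the day of the week. The function name, project name, and bucket name are placeholders I made up for illustration; only the mongodump, zip, and s3cmd invocations mirror the crontab line.

```python
import datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def rolling_backup_commands(project, bucket, day=None):
    """Return the dump, zip, and upload commands for one weekday-named backup.

    `project` and `bucket` are hypothetical names; each command is an argument
    list that could be passed to subprocess.run().
    """
    day = day or datetime.date.today()
    weekday = WEEKDAYS[day.weekday()]          # Monday is index 0
    dump_dir = f"{project}_{weekday}.dump"     # e.g. myproject_tuesday.dump
    archive = f"{dump_dir}.zip"
    return [
        ["mongodump", "-o", dump_dir],                       # dump all databases
        ["zip", "-9", "-r", archive, dump_dir],              # compress the dump directory
        ["s3cmd", "put", archive, f"s3://{bucket}/{archive}"],  # upload to S3
    ]
```

Because the file name only depends on the weekday, each upload overwrites the backup from the same day last week, which gives a seven day rolling window without any cleanup logic.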

I often need to do a lot of read operations for analytics or simply serving up processed data. Always read from a read-only slave unless the small consistency hit (it takes a very short amount of time to replicate master writes to slaves) is not tolerable for your application. For applications that need to read and write, just either keep two connections open or use a MongoDB ORM like Mongoid that supports multiple read and write mongods.
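The "keep two connections open" option can be as simple as a small wrapper that decides which connection to hand back. This is only a sketch: the class and method names are my own, and the two handles are whatever connection objects your MongoDB driver gives you for the master and the slave.

```python
class ReadWriteRouter:
    """Hand out a master or slave connection depending on the operation.

    `master` and `slave` are hypothetical connection handles created elsewhere
    with your MongoDB driver; this class never talks to the database itself.
    """

    def __init__(self, master, slave):
        self.master = master  # receives every write
        self.slave = slave    # serves reads that tolerate slight staleness

    def for_write(self):
        return self.master

    def for_read(self, must_be_current=False):
        # A read that cannot tolerate replication lag goes to the master.
        return self.master if must_be_current else self.slave
```

Analytics code would call for_read() and never touch the master, while the rare read that has to see a write made a moment ago passes must_be_current=True.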

Another thing I try to do is to place applications that need to perform high volume reads on the same server that runs a MongoDB slave; this eliminates network bandwidth issues for high volume "mostly read" applications.

Saturday, September 11, 2010

Very interesting technology behind Google's new Instant Search

Anyone who uses Google search and is paying attention has noticed the very different end-user experience. Showing search results while typing queries now requires that Google generate at least 5 times the number of results pages, use new JavaScript support for fast rendering of instant search results, and, most interesting to me, take a new approach to their backend processing:

It has been about 7 years since I read the original papers on Google's Bigtable and MapReduce, so it is not at all surprising to me that Google re-worked their web indexing and search. The new approach using Caffeine forgoes the old approach of batch MapReduce processing and maintains a large database that I think is based on Bigtable and now performs continuous incremental updates.

I am sure that Google will release technical papers on Caffeine - I can't wait!

Using Hadoop for analyzing social network data

At CompassLabs my colleague Vivek and I are using Hadoop and Amazon's Elastic MapReduce to process social network data. I can't talk about what we are doing except to say that it is cool.

I blogged last week about taking the time to create a one-page diagram showing all map-reduce steps and data flow (with examples showing data snippets): this really helps manage complexity. I have a few other techniques that I have found useful enough to share:

Take the time to set up a good development environment. Almost all of my map-reduce applications are written in either Ruby or Java (with a few experiments in Clojure and Python). I like to create Makefiles to quickly run multiple map-reduce jobs in a workflow on my laptop. For small development data sets, after editing source code, I can run a workflow and be looking at output in about 10 seconds for Ruby, a little longer for Java apps. Complex workflows are difficult to write and debug so get comfortable with your development environment. My Makefiles build local JAR files (if I am using Java), copy map-reduce code and test data to my local Hadoop installation, remove the output directories, run the jobs in sequence, and optionally open the outputs for each job step in a text editor.

Take advantage of Amazon's Elastic MapReduce. I have only limited experience setting up and using custom multi-server clusters because, for my own needs and so far for work for two customers, Elastic MapReduce has provided good value and saved a lot of setup and administration time. I think that you really need to get to a certain large scale of operations before it makes sense to maintain your own large Hadoop cluster.
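The core of the Makefile-driven workflow described above (remove the output directories, then run the jobs in sequence) can also be sketched in Ruby. This is only an illustration under stated assumptions: the Job struct, the run_workflow helper, the job names, the directories, and the hadoop command lines are all hypothetical, and the example does a dry run that prints the commands instead of executing them.

```ruby
require "fileutils"

# Hypothetical job description: a name, its output directory, and a shell command.
Job = Struct.new(:name, :output_dir, :command)

# Runs jobs in order and stops at the first failure, much as make would.
# The runner is injectable so that a dry run (or a test) can replace system().
def run_workflow(jobs, runner: ->(cmd) { system(cmd) })
  completed = []
  jobs.each do |job|
    # Hadoop refuses to write into an output directory that already exists.
    FileUtils.rm_rf(job.output_dir)
    ok = runner.call(job.command)
    raise "job #{job.name} failed" unless ok
    completed << job.name
  end
  completed
end

# Placeholder commands; substitute your own JAR, classes, and paths.
jobs = [
  Job.new("extract", "out/extract", "hadoop jar build/jobs.jar Extract in/raw out/extract"),
  Job.new("join",    "out/join",    "hadoop jar build/jobs.jar Join out/extract out/join")
]

# Dry run: print each command instead of executing it.
run_workflow(jobs, runner: ->(cmd) { puts cmd; true })
```

Calling run_workflow(jobs) without a runner would execute each command through system, so the same job list serves for both checking and running the workflow.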

Thursday, September 09, 2010

why doesn't iTunes support Ogg sound files 'out of the box'?

You know why: Apple does not mind inconveniencing users in order to keep their little walled garden the way they want it. I have been a long time Apple supporter (I wrote the chess game they gave away with the early Apple IIs, and wrote a commercial Mac app in 1984) but sometimes they do aggravate me.

Two new books today

I just got my delivery from Amazon: "Linear Algebra" (George Shilov) and "Metaprogramming Ruby" (Paolo Perrotta).

I have a degree in Physics but I find my linear algebra to be a little rusty so I bought Shilov's book to brush up. I bought Perrotta's book because while reading over some of the Rails 3 codebase, too often I find bits of code that I don't quite understand, at least without some effort.

Sunday, September 05, 2010

I've improved my Hadoop map reduce development process

I had to design a fairly complicated work flow in the last several days, and I hit upon a development approach that worked really well for me to get things written and debugged on my laptop:

I started by hand-crafting small input data sets for all input sources. I then created a quick and dirty diagram using OmniGraffle (any other diagramming tool would do) showing how I thought my multiple map reduce jobs would play together. I marked up the diagram with job names and input/output directories for each job that included sample data. Each time new output appeared, I added sample output to the diagram. I had a complicated work flow so it was tricky to keep everything on one page for reference, but the advantage of having this overview diagram is that it made it much easier to keep track of what each map reduce job in the workflow needed to do and made it easier to hand-check each job.

As I refactored my workflow by adding or deleting jobs and changing code, I took a few minutes to keep the diagram up to date - well worth it. Another technique that I find convenient is to rely on good old-fashioned make files both to run multiple jobs together on my laptop with a local Hadoop setup, and also to organize the Elastic MapReduce command lines to run on AWS.

I have been experimenting with higher level tools like Cascading and Cascalog that help manage work flows, but I decided to just write my own data source joins, etc. and organize everything as a set of individual map reduce jobs that are run in a specific order.

Friday, September 03, 2010

Efficient: just signed up to write an article on Rails 3 after spending weeks spinning up on Rails 3

I was just asked to write an article on my first impressions of Rails 3. This is very convenient because I have been burning a lot of off-work cycles spinning up on Rails 3 (I have done no work using Rails in 5 months because I have been 100% booked doing text/data mining). Architecturally and implementation-wise, Rails 3 rocks: I will have fun writing about it.

Very cool: a tutorial on using the MongoDB sniff tool

No original material here, I just wanted to link to someone else's cool article on using mongosniff to watch all network traffic going into and out of a mongod process. The output format is easy to read and useful.

Very good news that Google will be providing a "Wave in a Box" open source package

Early this year I played around with the open source code on the Wave protocol site, but "play" is the active word here: I did nothing practical with it.

Although I never used Wave's web UI very much, I did find writing Wave robots interesting and potentially very useful. I invested a fair amount of time in learning the technology. I was disappointed when Google recently announced that they were phasing out support for Wave, but today's announcement that they are completing the open source project to the point of its being a complete system is very good news.

Wednesday, September 01, 2010

I finished reviewing a book proposal tonight for an AI text book

Based on the number of books I have written, it is obvious that I love writing. I also enjoy reviewing book proposals and serving as a tech editor, as long as I am fascinated by the subject matter! The proposal that I just reviewed for Elsevier was very interesting.

I believe that the world (some parts faster than others) is transitioning to a post industrial age where the effective use of information might start to approach the importance of raw labor, physical resources, and capital (and who knows how the world's money systems will transition).

When I was reading this book proposal, and also when I read books and material on the web in general, one litmus test I have for "being interesting" is how forward thinking the technical material is, that is, how well it will help people both cope with and take advantage of new world economic systems.

GMail Priority Inbox

Finally, I got an invitation and I am trying it. One problem that I have is feeling that I have to read email as it arrives, so I find myself not running an email client if I am really concentrating on work or writing. With the new display, I will only see emails at the top of GMail's form if they are deemed important because they are from people I always respond to, etc. It is also convenient being able to switch back and forth between the old style inbox and priority inbox.

Command line tips for OS X and Linux

I wrote last year about keeping .ssh, .gpg, and other sensitive information on an encrypted disk and creating soft links so that when the disk is mounted, the sensitive information is available.

I have a few command line tricks that save me a lot of time that are worth sharing:
  • Use a pattern like history | grep rsync to quickly find recent commands. Much better than wading through your history.
  • Make aliases for accessing services on specific servers, for example alias kb2_mongo='mongo'. By having consistently named aliases for your servers and for running specific services like the mongo console, it is easy to both remember your aliases and use them.
  • Create aliases with consistent naming conventions to ssh to all of your servers. I use different prefixes for my servers and for each of my customers.
  • Create an alias like alias lh='ls -lth | head' to quickly see just the most recently modified files in a directory, most recent first.
  • For your working development system create two letter aliases to get to common working directories (most recent projects, writing, top level code experiment directory, etc.). I try to be consistent and use some of the same aliases on my servers.

Consistent APIs for collections

I have been using Clojure a lot for work this year and the consistent API for anything that is a seq (lists, vectors, maps, trees, etc.) is probably my favorite language feature. Scala 2.8 collections offer the same uniform API. For me, Clojure and Scala, with a fairly small number of operations to remember across most collections, represent a new paradigm for programming compared to some older languages like Java, Scheme, and Common Lisp that force you to remember too many different operation names. The Ruby Enumerable module also provides a nice consistent API over collections. Most Ruby collection classes mix in Enumerable, but the API consistency is not as good as in Scala and Clojure. That said, even though Enumerable only requires a class to implement each (it then supplies map, find, etc.), the ability to combine these methods with blocks is very flexible.
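To make the Enumerable point concrete, here is a minimal sketch. WordBag is a hypothetical class invented for this example: it defines only each, includes Enumerable, and gets map, find, select, and the rest of the API without any further work.

```ruby
# Hypothetical collection class, used only for illustration.
class WordBag
  include Enumerable

  def initialize(*words)
    @words = words
  end

  # each is the only method Enumerable needs from the class.
  def each
    return enum_for(:each) unless block_given?
    @words.each { |w| yield w }
  end
end

bag = WordBag.new("clojure", "scala", "ruby")

upcased    = bag.map { |w| w.upcase }           # ["CLOJURE", "SCALA", "RUBY"]
first_long = bag.find { |w| w.length > 5 }      # "clojure"
short      = bag.select { |w| w.length < 6 }    # ["scala", "ruby"]
```

The same blocks would work unchanged on an Array or a Hash, which is the consistency being praised here.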

Monday, August 30, 2010

Nice, just installed Rubinius-1.0.1

I first tried seriously using the "Ruby implemented in Ruby" Rubinius last spring and really liked it. If you have not done so already, install it (rvm install rbx) and give it a try. Rubinius does not support 1.9.x syntax yet, but that is coming. Great work by the developers of Ruby 1.9.* but I still like the idea of Rubinius, long term. Good show for Engine Yard supporting the Rubinius development.

Sunday, August 29, 2010

Using cljr for Clojure development

At work I now use the Clojure setup that everyone else uses, emacs+swank-clojure, with our custom repositories. For my own Clojure hacking (my own projects) I have just about settled on using cljr for convenience and agility. For me, the big win is being able to access Clojure libraries, Java libraries, and JAR files containing data sets I use often for NLP work from any directory. I don't need a heavyweight project, for example one using Leiningen with all dependencies loaded locally. cljr uses Leiningen to manage the packages in the single ~/.cljr repository. When you start up cljr, everything in ~/.cljr is on your JVM classpath: this may seem a little heavy, but it is very convenient.

As an example, this morning I noticed an old Twitter direct message from the author of Nozzle library asking me if I had a chance to try it. Instead of setting up a separate Leiningen project directory, I just did a cljr install com.ashafa/nozzle 0.2.1, went to my catch-all directory where I keep short snippets of Clojure code, and entered Tunde's test program for Nozzle:
;; assumes: cljr install com.ashafa/nozzle 0.2.1

(use 'com.ashafa.nozzle)

(def username (System/getenv "TWITTER_ACCOUNT"))
(def passwd   (System/getenv "TWITTER_PASSWD"))

(defn my-callback [message]
   (println message))

(def noz (create-nozzle "filter" username passwd my-callback {:track "twitter"}))
and running it is as simple as:
cljr run nozzle-twitter-test.clj
or, using swank and Emacs:
cljr swank
and in Emacs do M-x slime-connect and in the repl: (load "nozzle-twitter-test")

My light weight Clojure wrapper for the PowerLoom knowledge representation and reasoning system

A ZIP file with everything you need to try it is on my open source web page.

PowerLoom has been in development for many years and is available in Common Lisp, C++, and Java editions. I wrapped the Java edition for this project.

This is just a first cut at a wrapper because assertions and queries must be encoded as strings.

Ruby happiness: first Ruby 1.9.2 released, now Rails 3.0

Assuming you have RVM installed, don't wait:
rvm install 1.9.2
rvm 1.9.2
gem install rails
and you will be up to date. I wrote a small utility app to browse MongoDB data we use for text mining for my customer this morning and I used Rails 2.3.8 and hopefully that will be the last time I start a new project < version 3.0. My excuse for not using version 3.0 was that I wrote and deployed the app in less than an hour, and I am just not up to speed on Rails 3 yet. That will change!

I am merging my other three blogs into this (my main) blog

I had what I thought was a good idea in the last year: split out special interests into separate blogs. I am going to leave my other three blogs intact, as-is, but I am going to start doing two things: all of my non-book writing will go into this single blog and I am going to copy a few of my recent articles in the other three blogs to this one. Having four distinct blogs has been a nuisance.

Friday, August 27, 2010

Moving MySQL to a large EBS volume

I had to move a very large customer MySQL database used for data mining to a large EBS RAID. Since following the usual documentation did not work for me, here are my notes: after following the standard instructions for setting up RAID 0 (EBS is robust enough that I see little reason to use error correcting RAID, but do so if you wish), I followed some of the instructions here for mapping the usual installation location for MySQL data to the RAID EBS volume and using some fstab trickery (copied from the linked article):
sudo mkdir /vol/etc /vol/lib /vol/log
sudo mv /etc/mysql /vol/etc/
sudo mv /var/lib/mysql /vol/lib/
sudo mv /var/log/mysql /vol/log/

sudo mkdir /etc/mysql
sudo mkdir /var/lib/mysql
sudo mkdir /var/log/mysql

echo "/vol/etc/mysql /etc/mysql none bind" | sudo tee -a /etc/fstab
sudo mount /etc/mysql

echo "/vol/lib/mysql /var/lib/mysql none bind" | sudo tee -a /etc/fstab
sudo mount /var/lib/mysql

echo "/vol/log/mysql /var/log/mysql none bind" | sudo tee -a /etc/fstab
sudo mount /var/log/mysql
I then did an apt-get remove on MySQL, followed by an apt-get install of MySQL and everything that I wanted is now set up on the large EBS RAID volume and I could start the very long database import process.

Tuesday, August 24, 2010

Good resources for learning HTML5

I recommend starting with for a good overview, get really excited by playing with some demos at, and then using as a tutorial/reference.

Friday, August 20, 2010

filling a tech knowledge hole: I bought "HTML5 Up and Running"

This has been a very busy year for me, and because of that I have been ignoring the tidal wave known as HTML5. The book looks very good so far.

Sunday, August 15, 2010

How programming languages affect thinking; Clojure at work; my Clojure wrapper for PowerLoom

The Sapir-Whorf hypothesis is that the human language that we think in and communicate with affects our thought processes: the way we think.

Because my current job mostly uses the Clojure programming language, I have been thinking in Clojure idioms lately - a big change from Java and a smaller but still significant change to using Ruby. (BTW, you may have noticed that I don't blog here anymore about Ruby; this is because I have a dedicated Ruby blog now.)

At work Clojure has been a good choice because it is concise, has well designed APIs (for example, most built in data structures support the seq uniform APIs: everything mostly works the same for lists, sequences, binary trees, maps, etc.), and can take advantage of available Java libraries. As a personal project, I finished wrapping the PowerLoom knowledge representation and reasoning system in a thin Clojure library this morning. (See my Clojure blog for more information.)

Wednesday, July 28, 2010

Big Data

Since I have been working for CompassLabs I have been gaining even more appreciation for just how much value there is in data. This article in the New York Times also makes the business case for data mining.

My first real taste for the power of data came about 10 years ago when I worked for WebMind. We looked at data from online financial discussion groups, SEC Edgar data, etc. to try to value stocks based on sentiment analysis (text mining) and raw data mining.

Thursday, July 22, 2010

Haskell is much easier after coding in Clojure and Scala for a while

I got very excited by Haskell about 18 months ago and spent a fair amount of time kicking the tires and reading the Haskell book. That said, my interest waned and I moved on to other things (mostly Ruby development with some Clojure, less Scala).

When I noticed the recent new Haskell release for July 2010 I installed it and started working through Miran Lipovača's online book Learn You a Haskell for Great Good!. This time, things seem to "just click" and Haskell's functional style seems very natural. I have real regrets that I probably won't be using Haskell much because I mostly code in what people pay me to use, which in the last 5 years has been Lisp, Ruby, Java, and Clojure.

Tuesday, July 20, 2010

Interesting new Google Buzz API: PubSubHubbub firehose

I spent some time experimenting with the Buzz APIs this morning - well documented and simple to use. The firehose data will be useful for tracking public social media posts.

I set up Google's example app on my AppEngine account and had fun playing with it. Unfortunately, because of the amount of incoming data, it would only run each day for about 4 or 5 hours before hitting the free resource quota limits. Since this was just for fun, I didn't feel like paying for additional resources.

Friday, July 16, 2010

Good news: Google buying Freebase

That is very cool, I think.

I have lately been waist deep in using Freebase for customer work.

While there is a lot of cruft data in Freebase, with some manual effort and some automation, it is a good source of a wide variety of information. Depending on application, DBpedia and GeoNames are other good resources for structured data.

I have a fair amount of example code for Freebase, DBpedia, and GeoNames in my latest book (there is a free PDF download on my open content web page, or you can buy a copy at

Wednesday, July 14, 2010

Scala 2.8 final released. I updated my latest book's Scala examples

Good news, Scala 2.8 has been released.

I updated the github repo for the code examples for my book Practical Semantic Web and Linked Data Applications. Java, Scala, Clojure, and JRuby Edition (links for free PDF download and print book purchase).

I haven't had the opportunity to do very much coding in Scala for several months because the company I have been working for (CompassLabs) is mostly a Clojure and Java shop. That said, Scala is a great language and it is good to see the final release of 2.8 with the new collections library and other changes.

Good job: CouchDB version 1.0

I usually use PostgreSQL and MongoDB (and sometimes RDF data stores) for my data store needs, but I have spent a lot of time in the last couple of years experimenting with CouchDB and always keep it handy on my laptop and one of my servers. I was happy to upgrade to version 1.0 today!

Sunday, July 11, 2010

Monetizing social graphs

Interesting news this morning of Google's investment in Zynga, the 800 pound gorilla of online games, in order to have access to social graph data from people logging into Google accounts to play games.

There has been a lot of buzz about Facebook's effective social graph data and games like those provided by Zynga have helped them. That said, I would still bet on Google having a better chance of making the most money off of social graphs because they get to effectively combine data from at least five sources to build accurate user profiles: statistical NLP analysis of GMail, search terms used by people who are logged in to any Google services, friends and business connections from GMail address books, social connections from Google Buzz (which often includes data from other social graphs like Twitter), and in the near future online multi-player gaming.

There is another issue: infrastructure. While I am willing to roughly equate the capabilities for non-realtime analytics of very large Hadoop clusters and Google's internal (original) MapReduce infrastructure, I would bet that Facebook will have problems with their mixture of highly sharded MySQL, massive use of memcached, and some use of Cassandra for their live systems. At least to me, Google's infrastructure is the most interesting aspect of the company. Facebook has awesome infrastructure, but Google's is even more so.