Tuesday, October 27, 2009

Using nailgun for faster JRuby startup

I finally got around to trying nailgun tonight. On OS X with JRuby 1.4.0RC2, I built nailgun using:
cd $JRUBY_HOME/tool/nailgun
./configure
make # I ignored the warning "no debug symbols in executable (-arch x86_64)"
In one terminal window, start a nailgun server and leave it running:
$ jruby --ng-server
NGServer started on all interfaces, port 2113.
When you want to run JRuby as a nailgun client, try something like:
jruby --ng text-resource.rb
On my MacBook, this cuts about 5 seconds of JRuby startup time off of running this test program.
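If you want to measure the startup savings on your own machine, here is a minimal Ruby sketch. The helpers `time_command` and `average_time` are hypothetical (not part of JRuby or nailgun): they time a command with a monotonic clock and average several runs to smooth out noise.

```ruby
# Wall-clock time for one run of a command, in seconds. The command is passed
# as an argument list, so no shell is involved; its output is discarded.
def time_command(*cmd)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  ok = system(*cmd, out: File::NULL, err: File::NULL)
  raise "command failed: #{cmd.join(' ')}" unless ok
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Average over several runs (a cold first start can skew a single sample).
def average_time(runs, *cmd)
  (1..runs).sum { time_command(*cmd) } / runs
end
```

With the `jruby --ng-server` process from above running, compare `average_time(5, "jruby", "text-resource.rb")` with `average_time(5, "jruby", "--ng", "text-resource.rb")`.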

Sweet. For small programs, plain Ruby is still faster than JRuby, but nailgun makes developing with JRuby faster.

I just tried Amazon's new Relational Database Service (RDS)

Amazon just released a beta of their Relational Database Service (RDS). You pay by the instance hour, at about the same rate as a plain EC2 instance (about $0.01/hour more for a small instance), plus some storage costs, and bandwidth costs if you access the database from outside of an Amazon availability zone.

RDS is MySQL compatible (version 5.1), and instances are automatically monitored, restarted, and backed up.

Currently, there is no master/slave replication, but this is being worked on (the RDS beta just started today).

Here are my notes on my first use of RDS:
  • Install the RDS command line tools
  • rds-create-db-instance --db-instance-identifier marktesting123 --allocated-storage 5 --db-instance-class db.m1.small --engine MySQL5.1 --master-username marktesting123 --master-user-password markpasstesting123
  • Wait a few minutes and see if the RDS instance is ready: rds-describe-db-instances
  • Open up ports for external access, if required (note: here I am opening up access to the whole world, just for this test): rds-authorize-db-security-group-ingress default --cidr-ip 0.0.0.0/0
  • Use a mysql client to connect: mysql -h marktesting123.cyvbi77nio5f.us-east-1.rds.amazonaws.com -u marktesting123 -p
  • In the mysql client: create database recipes;
  • In another bash shell, load the data: cat recipes.sql | mysql -h marktesting123.cyvbi77nio5f.us-east-1.rds.amazonaws.com recipes -u marktesting123 -p
  • In the mysql client: use the remote RDS hosted database and be happy :-)
  • Delete the RDS instance (to stop paying for it): rds-delete-db-instance marktesting123 --skip-final-snapshot
Any MySQL client library should work fine.
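The command line steps above can also be kept in a small Ruby script so a test instance can be created and torn down repeatably. This is a hypothetical sketch: it only rebuilds the commands and flags from my notes, assumes the RDS command line tools are installed and on your PATH, and prints each command without running it unless `dry_run: false` is passed.

```ruby
# Hypothetical helpers that rebuild the RDS command line steps from the notes.
def create_instance_cmd(id, user, password, storage_gb: 5)
  ["rds-create-db-instance",
   "--db-instance-identifier", id,
   "--allocated-storage", storage_gb.to_s,
   "--db-instance-class", "db.m1.small",
   "--engine", "MySQL5.1",
   "--master-username", user,
   "--master-user-password", password]
end

def describe_instances_cmd
  ["rds-describe-db-instances"]
end

# 0.0.0.0/0 allows access from anywhere; use a narrower CIDR for real work.
def authorize_ingress_cmd(cidr)
  ["rds-authorize-db-security-group-ingress", "default", "--cidr-ip", cidr]
end

def delete_instance_cmd(id)
  ["rds-delete-db-instance", id, "--skip-final-snapshot"]
end

# Print a command; execute it (without a shell) only when dry_run is false.
# Returns true for a dry run, otherwise the result of Kernel#system.
def run(cmd, dry_run: true)
  puts cmd.join(" ")
  dry_run || system(*cmd)
end
```

Keeping the instance identifier in one variable (for example `id = "marktesting123"`) and passing it to both `create_instance_cmd` and `delete_instance_cmd` means the two commands always refer to the same instance.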

Securing your Mac laptop

Laptops get lost and stolen a lot. I am extra careful with my laptop because I keep so much of my own and my customers' private data on it. Here are a few steps I take to protect this information (Mac OS X specific):

I keep a small encrypted disk image that contains all my passwords and other sensitive information. It also contains my .ec2, .s3cfg, .profile, .ssh, .gnupg, and .heroku files. Then, in my home directory, I make soft links (ln -s ...) to these files.

I do not keep the password for this disk image in my OS X keychain!

It is a very small hassle: each time I boot up, I mount this image so my .ssh, etc. files are available. This adds 10 seconds of "overhead" each time I boot my laptop.
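Creating the soft links is a one-time step. Here is a minimal Ruby sketch of it, assuming the image mounts at the hypothetical path `/Volumes/Secure` (use your own volume name) and that the dotfiles sit at the top level of the image.

```ruby
# Hypothetical mount point of the encrypted disk image; use your own volume name.
SECURE_VOLUME = "/Volumes/Secure"

# Dotfiles and directories kept on the encrypted image (the list from this post).
DOTFILES = %w[.ec2 .s3cfg .profile .ssh .gnupg .heroku].freeze

# Create soft links in the home directory (like `ln -s target link`) pointing at
# the copies on the mounted image. Anything already present in the home
# directory, including an existing link, is left alone. Returns the links made.
def link_dotfiles(volume: SECURE_VOLUME, home: Dir.home, names: DOTFILES)
  names.each_with_object([]) do |name, created|
    target = File.join(volume, name)
    link = File.join(home, name)
    next unless File.exist?(target)                   # image not mounted, or file not on it
    next if File.exist?(link) || File.symlink?(link)  # never clobber existing files
    File.symlink(target, link)
    created << link
  end
end
```

Run it once with the image mounted. Because existing files are skipped, it is safe to rerun. While the image is unmounted the links simply dangle, so nothing sensitive is readable until you mount it again.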

Whenever I start working for a new customer, I ask them if they would like me to also keep their working materials encrypted (there is some overhead involved, so I like to ask them if I should spend the time doing this).

Update: a reader pointed out that this is only a partial solution, and I agree. Full volume encryption is a much more secure system. With my easy scheme, for example, browser session cookies are still vulnerable, so someone might be able to access your Gmail account, and then any services that allow password resets by email.

Saturday, October 24, 2009

More getting stuff done by doing what I most want to do experiments

I read an interesting article a few weeks ago (sorry, no attribution - can't find the article again) about trying to always do what you want to be doing. I used to do "round robin" style scheduling of my time: keeping a single to-do list and cycling through it (and sometimes just finishing small tasks outright).

I have always thought that I needed to apply some meta-level discipline to get the tasks that I don't enjoy as much done in a timely way. Scheduling work is not so difficult because I usually have just 3 or 4 active customers, and I enjoy most of my work. Other things like yard work (I prefer new projects over maintenance) got the round-robin treatment, and even recreational activities (I like to hike, cook/eat, read, and watch movies) used to be scheduled round-robin style, to a (very) small degree.

Lately, I have been experimenting with not doing any meta-level scheduling. Now, when I finish an activity, I start whatever I most want to do next. This works because even activities that I don't enjoy as much (e.g., I just did some maintenance work on our deck) actually get done: it feels so good to clear them from my to-do list!

Work-wise, this new "non-scheduling" approach also seems to be working fine. I very much enjoy working, but some work is more fun :-) Still, I think that I am getting a small uptick in productivity, and I find that I still get around to all tasks. One reason that this works is that some work tasks really are very difficult technically for me, and if I tackle them when I am in the mood for a harder problem, things that I thought would be very difficult turn out to be simpler.

I was motivated to write up my experiments a few nights ago when I re-watched the documentary "The Hero's Journey" on Joseph Campbell's life and work. One of Campbell's teachings is to "follow your bliss," that is, to do in life what you really want to do. Listening to Joseph Campbell is like getting a tune up :-)

Friday, October 23, 2009

RDF datastores are noSQL also - always keep an RDF data store service running

We tend not to use things that are not "ready at hand."

RDF datastores are noSQL also :-)

I always keep Sesame running as a service just as I run PostgreSQL and MySQL services.

Some things are better stored, queried, and maintained in a graph database.

If you always have something like Sesame (or the free edition of AllegroGraph) running as a service, and if you have client libraries installed for your favorite programming languages then it is easier to quickly choose the best data store for any given task.
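As an example of keeping a client ready at hand, here is a small Ruby sketch that sends a SPARQL SELECT query to a Sesame 2 server over its HTTP interface, using only the standard library. The base URL (`http://localhost:8080/openrdf-sesame`, typical for a Tomcat deployment) and the repository id are assumptions about your own setup, and the sketch asks the server for results in the W3C SPARQL JSON results format.

```ruby
require "json"
require "net/http"
require "uri"

# Build the request URI for a query against one Sesame 2 repository.
# base is the server's HTTP interface, e.g. "http://localhost:8080/openrdf-sesame"
# (an assumed Tomcat deployment); repository is your repository id.
def sesame_query_uri(base, repository, sparql)
  uri = URI("#{base.chomp('/')}/repositories/#{repository}")
  uri.query = URI.encode_www_form(query: sparql)
  uri
end

# Flatten SPARQL JSON results into an array of { variable => value } hashes.
def parse_sparql_json(body)
  JSON.parse(body)["results"]["bindings"].map do |binding|
    binding.transform_values { |term| term["value"] }
  end
end

# Run a SELECT query over HTTP GET, asking for SPARQL JSON results.
def sesame_select(base, repository, sparql)
  uri = sesame_query_uri(base, repository, sparql)
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/sparql-results+json"
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  raise "Sesame query failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  parse_sparql_json(response.body)
end
```

For example, `sesame_select("http://localhost:8080/openrdf-sesame", "test", "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")`, where `test` is a hypothetical repository id.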

BTW, I also always keep a CouchDB service running.

Sunday, October 18, 2009

Cloud computing options and portability

This morning I listened to Paul Miller's podcast with Lew Moorman, president of Rackspace's Cloud Division. I mostly agree with his comments on easy portability between Rackspace cloud services and Amazon's EC2.

I have not yet used Rackspace's cloud offerings, so my comments here are based on their documentation and a conversation I had with one of their support engineers. (For one of my steady customers, I declined some work tasks involving a move to Rackspace because I don't like to spread myself too thin: I spend a lot of effort staying up to speed on Amazon and AppEngine, so I prefer to specialize in those two deployment platforms.) The advantage of Rackspace is that a persistent disk volume is bound to each virtualized server instance (really, they offer a standard sort of VPS hosting service), whereas with Amazon it takes a little extra work to manage EBS volumes separately. For me, I like the benefits of Amazon's SQS, S3, and Elastic MapReduce. That said, I would make a small bet that Rackspace will provide similar services; otherwise they will just be competing in the VPS business space, albeit with good support (though I have always received very good support from smaller hosting companies like RimuHosting).

Lew Moorman was critical of Google's AppEngine because it depends on very proprietary software (mostly their highly scalable non-relational datastore). A fair criticism, but if you really wanted to move off of AppEngine, there are migration paths like using DataNucleus JDO support with a relational database server, or something massively scalable like HBase (from the Hadoop project). One big win for AppEngine is that if you decide to build on top of the Wave platform, software agents, at least for now, need to be hosted on AppEngine, and deploying them there is fairly simple.

Saturday, October 17, 2009

Switching an AppEngine project from JRuby+Sinatra to Java+JSP

It is a bit of a pain to spend several hours converting a working codebase from one language/platform to another. I kept having small problems with JRuby and Sinatra that were specific to AppEngine (Ruby, or JRuby, and Sinatra are awesome).

I am only about 20% into development, and I decided that I wanted a really solid platform and tools. Also, converting working code from one language to another is straightforward, if tedious.

What convinced me to make the switch is that the Java + Eclipse plugin support is just so good for AppEngine development; for now, the change seems like a good decision. For my next AppEngine project, I'll probably go back to JRuby + Sinatra since the support is getting better.

I built the open source IDEA 9.0 git snapshot - works fine

Something to do while watching TV :-)

Now that IDEA is under the Apache 2.0 license, it will be interesting to see how it is used. It is a large git clone, but it builds easily using ant.

I get a free commercial license for IntelliJ IDEA (as I used to get free Enterprise JBuilder licenses from Borland) but I still plan on following the open source IDEA project - hopefully interesting things will happen! I use Eclipse a lot just because the Java AppEngine support is so very good, but for plain old Java coding, I like IDEA.

The open source edition of IDEA is great for plain old Java coding, BTW, but is missing JSP + Tomcat development support (but NetBeans does a good job for J2EE-- development, and who does J2EE development anymore :-)

It takes a while to do a git clone and build the IDE (the default ant build target builds versions for OS X, Windows, and Linux). Since the build process is so easy, it was not much fun, so unless you want to peruse the source code, you might as well just download a built version for your OS platform.

Wednesday, October 14, 2009

Nice tool for writing and maintaining documentation: YUML web service and yumlcmd Ruby gem

Although some customers request using a word processor for producing documentation, when it is my choice I like LaTeX for documents and OmniGraffle for producing diagrams. LaTeX is the fastest tool (that I use) for producing great looking print or PDF documents.

I am experimenting with something else this morning: the YUML web app for creating UML diagrams and the yumlcmd Ruby gem (add http://gemcutter.org to your gem sources and then run gem install yumlcmd). Thanks to Under the Hat for pointing these tools out; check their blog for directions.

Although YUML hardly replaces OmniGraffle, it is cool to keep documentation text based (LaTeX files and YUML files): it is faster and less work.
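Because YUML diagrams are just text, they can also be generated from data and kept under version control next to the LaTeX source. Here is a hypothetical Ruby sketch that builds class-diagram text. The notation it emits (`[A]->[B]` for an association, `[A]1-0..*[B]` with multiplicities, `[Base]^-[Sub]` for inheritance, statements separated by commas) is my reading of YUML's class diagram syntax, so check it against the YUML samples before relying on it.

```ruby
# Hypothetical helpers that emit YUML class-diagram statements.
def yuml_class(name)
  "[#{name}]"
end

# Association, optionally with multiplicities on each end.
def yuml_association(from, to, from_mult: nil, to_mult: nil)
  if from_mult || to_mult
    "#{yuml_class(from)}#{from_mult}-#{to_mult}#{yuml_class(to)}"
  else
    "#{yuml_class(from)}->#{yuml_class(to)}"
  end
end

# Inheritance: base class on the left, subclass on the right.
def yuml_inheritance(base, sub)
  "#{yuml_class(base)}^-#{yuml_class(sub)}"
end

# Join several statements into one diagram, comma separated.
def yuml_diagram(*statements)
  statements.join(", ")
end
```

For example, `yuml_diagram(yuml_association("Customer", "Order", from_mult: "1", to_mult: "0..*"), yuml_association("Order", "LineItem"))` produces `[Customer]1-0..*[Order], [Order]->[LineItem]`.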

Sunday, October 11, 2009

Some frustration with JRuby + Rails on Google AppEngine

A few engineers at Google and other developers are doing some good work towards getting Rails running on AppEngine, both robustly and in a way that provides a good local development environment. One problem is simply that if your web app has not been active recently, initializing JRuby + Rails + all required gems can take longer than the 30 second window AppEngine allows for handling a request, so the request times out.

The Java and Python support for AppEngine is fantastic, but for two projects I want to do (my own projects, but they may be revenue generating :-) I want a more agile programming language than Java, and while my Python skills are sort-of OK, my knowledge of Django is very light.

I should probably just bite the bullet and spin up on Django, but I would strongly prefer working in Ruby. I have been experimenting with the JRuby + Sinatra + ERB + datamapper combination, and an inactive web application, at least, spins up well within the 30 second request timeout window. I very much like datamapper (for how it handles object identity), and it should not be too difficult to stay completely portable across two platforms (given data import/export utilities):
  • JRuby + AppEngine
  • Ruby (1.8.x or 1.9.x) on any server
I like Sinatra as a lightweight framework, and this technology choice is OK with me, except for one worry. My reason for wanting to use AppEngine (rather than Amazon EC2+S3+EBS, which is what I have been using for most customer projects and my own stuff) is to minimize my hosting costs if one or both of my ideas works out, and I worry that JRuby on Java AppEngine will not provide the same high performance as Java or Python web apps. I did compare (using the Apache benchmark tool) request times for JRuby + Sinatra + AppEngine (about 400 milliseconds per request) with Java + JSP + AppEngine (about 600 milliseconds per request). The rendered Java page was more complex, so the JRuby + Sinatra vs. Java + JSP comparison looks like a wash.
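I used the Apache benchmark tool (ab) for those numbers. For a quick check from Ruby, here is a rough sketch with hypothetical helper names that times sequential GET requests over a single keep-alive connection and reports the mean and median. It is not a replacement for ab, which can also measure concurrent load.

```ruby
require "net/http"
require "uri"

# Time `count` sequential GET requests over one keep-alive connection and return
# the latencies in milliseconds. Connection setup happens before timing starts.
def request_latencies_ms(url, count)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    Array.new(count) do
      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      http.request(Net::HTTP::Get.new(uri))
      (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
    end
  end
end

# Mean and median of a non-empty list of latencies.
def summarize(latencies)
  sorted = latencies.sort
  mid = sorted.size / 2
  median = sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  { mean: latencies.sum / latencies.size.to_f, median: median }
end
```

For example, `summarize(request_latencies_ms("http://your-app.appspot.com/", 50))`, where the URL is a placeholder. Reporting the median alongside the mean helps when a few slow requests (such as a cold start) would otherwise dominate.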

For future projects if I need lots of back end processing (map reduce, spidering sites, etc.) then I will stick with Amazon EC2. If I can get by with just a web interface and a data store, then I would prefer AppEngine (to save a little money). Two great platforms!

Saturday, October 10, 2009

Designing for scalability and platform portability

Once an application is designed and at least partially implemented, options for scalability and portability are reduced. If a system's usage profile cannot be predicted, then deploying to physical servers is a real problem because you have to pay for enough capacity to handle peak usage periods; however, relying on cloud infrastructure can severely limit platform portability.

It helps to consider scalability up front! Relying on scalable data store infrastructure like Google's AppEngine datastore or Amazon's SimpleDB can make life easier. For server side Java, coding to JPA makes it possible, with some work, to be portable between the AppEngine datastore, SimpleDB, or a traditional database on your own server. If portability is important, some care is needed to code to a subset of JPA (e.g., no cross-domain queries in SimpleDB). In the Rails world, using Datamapper provides similar flexibility for portability between the AppEngine datastore, SimpleDB, or a conventional database. However, this approach fails when you need a relational database.
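To make the "code to a subset" idea concrete, here is a hypothetical Ruby sketch (not JPA or Datamapper) of a storage interface limited to get/put by key and equality queries within a single kind, operations that map onto the AppEngine datastore, a SimpleDB domain, or a relational table. Only an in-memory backend is shown; a real backend would implement the same three methods.

```ruby
# Hypothetical in-memory backend for a deliberately small storage interface.
# Other backends would expose the same three methods, so application code
# never relies on joins or cross-kind queries.
class MemoryStore
  def initialize
    @kinds = {} # kind => { key => attributes }
  end

  # Store a copy of the attributes under kind/key; returns the key.
  def put(kind, key, attributes)
    (@kinds[kind] ||= {})[key] = attributes.dup
    key
  end

  # Fetch one entity's attributes, or nil if it does not exist.
  def get(kind, key)
    @kinds.fetch(kind, {})[key]
  end

  # Keys of entities in one kind whose attributes equal every filter value.
  def query(kind, filters = {})
    @kinds.fetch(kind, {}).select { |_key, attrs| filters.all? { |name, value| attrs[name] == value } }.keys
  end
end
```

The restriction is the point of the design: if application code only ever calls these methods, swapping the backend does not require rewriting queries.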

Using asynchronous messaging is another good strategy for scalability (and for decoupling). With some care, applications can be somewhat portable between AppEngine's Task Queue API, Amazon's SQS, or running something like RabbitMQ on your own servers.
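Here is a hypothetical Ruby sketch of the same idea for messaging: application code only calls `enqueue` and `dequeue`, and payloads are serialized to JSON strings so that any backend (AppEngine's Task Queue, SQS, or RabbitMQ) could carry them as text. Only an in-memory backend is shown; adapters for the real services would be written against their own client libraries.

```ruby
require "json"

# Hypothetical in-memory backend. Adapters for Task Queue, SQS, or RabbitMQ
# would expose the same two methods.
class InMemoryQueue
  def initialize
    @messages = Queue.new # thread-safe FIFO from Ruby's core library
  end

  # Serialize to a JSON string so any backend can carry the payload as text.
  def enqueue(payload)
    @messages << JSON.generate(payload)
    self
  end

  # Next payload, or nil when the queue is empty (non-blocking).
  def dequeue
    JSON.parse(@messages.pop(true))
  rescue ThreadError
    nil
  end
end

# A worker that knows nothing about which backend it is reading from.
class Worker
  def initialize(queue, &handler)
    @queue = queue
    @handler = handler
  end

  # Process messages until the queue is empty; returns how many were handled.
  def drain
    count = 0
    while (payload = @queue.dequeue)
      @handler.call(payload)
      count += 1
    end
    count
  end
end
```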

Monday, October 05, 2009

Storing Lucene indices in Cassandra; cloud versus running your own server farm

The Lucandra project looks very interesting, but is incomplete at this time (see the "to be dones" at the bottom of the linked page).
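The underlying idea, as I understand it, is an inverted index stored in Cassandra's row/column model. Here is a toy Ruby sketch, with a Hash standing in for a column family: each row key is a term, and each column maps a document id to a term frequency. This only illustrates the idea; it is not Lucandra's actual schema.

```ruby
# Toy model of an inverted index in Cassandra's data model: a Hash stands in for
# a column family whose row key is a term and whose columns map a document id
# to that term's frequency in the document. Not Lucandra's real schema.
class TermIndex
  def initialize
    @rows = Hash.new { |rows, term| rows[term] = {} }
  end

  # Tokenize naively (lowercase word characters) and bump per-document counts.
  def add_document(doc_id, text)
    text.downcase.scan(/\w+/).each do |term|
      @rows[term][doc_id] = @rows[term].fetch(doc_id, 0) + 1
    end
  end

  # AND query: ids of documents containing every term, highest total
  # frequency first (ties broken by document id).
  def search(query)
    terms = query.downcase.scan(/\w+/)
    return [] if terms.empty?
    postings = terms.map { |term| @rows.fetch(term, {}) }
    matches = postings.map(&:keys).reduce(:&)
    matches.sort_by { |doc| [-postings.sum { |columns| columns[doc] }, doc] }
  end
end
```

Reading one row per query term and intersecting the column names is what makes this layout attractive in a store that is fast at fetching whole rows by key.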

Cassandra is a great project. I almost incorporated it into the design of a customer project recently, but we decided to host on Amazon, so using their EC2, S3, SQS, and Elastic MapReduce services won out over rolling a custom stack.

I think that this must be a startup dilemma: long term, it is probably least expensive to run one's own small server farm, but when you are just getting started, a "pay as you go" cloud approach using very solid infrastructure tools like EC2, S3, SQS, SimpleDB, etc. makes sense.

I can't say this from personal experience, but my gut feeling is that if you can live within the constraints of Google's AppEngine, then using AppEngine is probably less expensive than running your own server farm, even long term. BTW, if you have not read my DevX article on implementing search on the Java version of AppEngine, please check it out.

Thursday, October 01, 2009

Interesting new book: "Networks, Crowds, and Markets: Reasoning about a Highly Connected World"

This book will be published in 2010 but a complete pre-publication draft is available here. There is a PDF download link for the entire book near the top of the page.

If you enjoyed reading Albert-László Barabási's classic book "Linked: The New Science of Networks", then this new book looks like a great follow-up. I have not gotten very far into David Easley's and Jon Kleinberg's new book yet, but the range of topics in this 800 page book looks like it will make a good read.

Using Facebook Connect just got a lot easier

A new set of tools makes it much easier to integrate Facebook Connect into your web sites. Good job, Facebook. Their old APIs and support were somewhat difficult to work with.

I would have added another example to my last book (about Web 3.0 stuff) if this had been available a few months ago.