Sunday, February 20, 2011

Don't stray too far from well supported language, tool and platform combinations

I have been doing a lot of customer work lately using Java EE 6 and Glassfish. Fairly nice development environment even if Java EE 6 is heavy on ceremony (but better than J2EE).

Just for fun today I took small play apps I had written in Rails and Clojure + Compojure and shoe-horned them to run on Glassfish. Interesting exercise, but unless there is an overwhelming need to create a custom deployment setup, it is so much better to go with the flow and stick with well crafted and mature setups like
  • Java and Java EE 6
  • Clojure + Compojure running with embedded Jetty behind nginx for serving static assets
  • Rails app hosted on Heroku
  • Web apps written in either Java or Python hosted on AppEngin using the supported frameworks
  • etc.
I must admit that I enjoy hacking not only code but also deployment schemes - enjoy it too much sometimes. Sometimes it is worthwhile, most often not.

Monday, February 07, 2011

Curated data

It is difficult to predict what data will have long term value so it is often safest to archive everything. With data storage costs approaching zero I think that we can expect high value data to last forever, baring a nuclear war or the crash of society.

Curated data has a higher value than saving "everything." I think that the search engine Blekko is interesting and useful because of what it does not have: human powered curation yields fewer results but very little SPAM. The Guardian's curated structured data stores have much higher value than the original raw data (from government sources, etc.). I can imagine The Guardian curated data becoming a permanent part of our history as for example are ancient stone tablets we see in museums.

I have long planned on providing curated news and technology data that has semantic markup either on my ancient knowledgebooks.com domain or a new placeholder kbsportal.com but I seldom have free time slots because of my consulting business. Hint: I would like having a few partners who are into statistical natural language processing and general data geeks to help me with this. I don't know if it would end up being a viable business or just a public service portal.

Sunday, February 06, 2011

Big Data

For the last few decades, it seems like I work on a project every few years that stretches my expectations of how much data can be effectively processed (starting in the 1980s: processing world-wide seismic to detect underground nuclear explosions, all credit card phone calls to detect theft, etc.)

I was in meetings three days last week with a new customer and as we talked about their current needs I made mental notes of what information they are probably not capturing that they should because it is likely to be valuable in the future in ways that are difficult to predict.

To generalize a bit, every customer interaction with a company's web sites should be captured (including navigation trails through web sites to model what people are personally interested in), every interaction with support staff, every purchase and return, etc.

Amazon has set a high standard in user modeling with Amazon suggests for products that you might want to buy. Collecting data on your customers should not make them feel creepy about interacting with your company but rather should make them feel important and well-serviced.

I am a voracious reader both for fun and to continue a life long education. I have found myself shifting some of my reading attention from computer science to statistics, math, and general business intelligence. (That said, I still read several computer science books per month).

I was disappointed that I could attend the Strata 2011 data conference but I did enjoy watching the keynote speech videos: fairly good material and worth some viewing time.