Tuesday, May 19, 2015

using nitrous.io

I recently signed up for the pro version of nitrous.io since they are phasing out the old version that I used for free. So far I am very pleased. Now you get isolated containers to work in with root access. I use the $15/month version that gives you 1 GB of RAM and 20 GB of disk space for a total of two containers. Since I only use nitrous.io for development, I found it easier to just use one container (using the entire 1 GB of RAM) and clone the git projects that I am working on into it.

Some of the things that I particularly like are:

  • The editor built into the web based IDE handles just about all programming languages and file types.
  • There is a separate app that can do two things: sync files between your laptop and your container, and forward ports so you can test run apps in the container and use your local web browser. If you don't want to forward ports, you can use their preview option that opens a new browser tab and lets you test in your browser without any port forwarding.
  • The batteries on my laptops last longer if I do builds and test runs in the container. No (or less) fan noise :-)
  • It is easy to spin up a new container for experiments and just delete it.
  • When I am travelling, I don't have to load up a particular laptop with all of my projects - everything is in my cloud container so all I really need is a web browser.
  • Plays nicely with Heroku. I haven't tried it, but nitrous.io is also set up to play nicely with Google Cloud Services and Microsoft Azure.
  • With a little setup, you can SSH into your container from your phone, tablet, and laptops.
  • The nitrous.io web IDE reminds me of the Cider web IDE that I used internally at Google.
  • In general I think I save a fair amount of time having a container always available with my setup.
  • I tend to use my Mac and Windows 8.1 laptops more often than my Linux laptop, and even though I have SSH, git, IntelliJ, etc. set up on all of my laptops, always having a Linux development environment no matter what device I am on is a more consistent development experience.

Tuesday, May 12, 2015

More infrastructure changes: Heroku

I am pleased that Heroku has introduced a new low volume pricing tier. I never felt very comfortable freeloading on their free tier, and free tier web apps also timed out, leading to longer request loading times. Now for $7/month per app Heroku supports a "hobbyist" mode for lower traffic sites that never get timed out or swapped out. So far I have redeployed three of my low traffic sites to Heroku's new plan, moving them from a dedicated server. They reduced their free tier hosting of a site to being active only 18 hours a day - in other words: great for testing deployments but not good for hosting sites for free 24/7. I think this is a good move on their part although I did hear some complaints on Hacker News about this. Under this new pricing tier my three low volume sites cost about $21/month to host. Under the old paid plan the cost would have been about $105/month.

BTW, I would like to thank everyone who took the survey (or emailed suggestions directly to me) about topics for my new book project Power Java. When this book is released, hopefully by August 2015, I will then continue work on and finish what is essentially the same book, but using Clojure for the example programs (Power Clojure). Thanks! I appreciated the input.

Tuesday, April 14, 2015

Some infrastructure changes

In addition to using several programming languages, I also like to experiment with different web infrastructures.

A few weeks ago I switched from using gmail as my primary email service to using fastmail.com. I still use gmail as a backup email and for my Google identity but I decided that I liked Fastmail a bit better and the yearly cost is not much.

The other change I have made is switching my www.markwatson.com web site from a Ruby + Sinatra web app on my own server to a PHP app running on Google's AppEngine. My absolute favorite feature of AppEngine is the rolling system logs that can be checked easily from the AppEngine console. When I worked at Google in 2013, I loved the internal development environment (Borg, online system logs, the Cider IDE, and much more). Using AppEngine is, in a small way, reminiscent of Google's internal environment - at least enough so to make me nostalgic :-)

I have never been a huge fan of PHP although I have used it over the years for occasional tasks for customers and (rarely) for my own web properties. Without using third party libraries, PHP with HTML, CSS, and a little JavaScript seems pleasantly low level and easy to hack.

As much as I like devops and in general configuring and running Linux servers (often VPSs on Azure, Digital Ocean, and AWS), I sometimes feel a little guilty spending time on operations: perhaps my time could be better spent. My preferred PaaS providers are AppEngine and Heroku, with Cloud Foundry technology services (like IBM's Bluemix) also fairly nice. One thing that has always bothered me with PaaS, however, is the "free" usage tiers. For one thing I like being a paying customer with SLAs, support, etc. Also, free tiers have to affect to some degree pricing for paid users.

Wednesday, April 08, 2015

My two Clojure projects, life in Sedona Arizona, and my new book project

Two Clojure projects?

Well, actually, I had just one Clojure project until today. I refer to my project as KB2 (KnowledgeBooks.com 2) and it is basically a kitchen sink for everything that I thought that I wanted in a personal (and perhaps small group) research and content management system:

  • A personal version of Evernote: allows me to collect eBooks, web page snippets and notes in a personal repository that is searchable. I use a Firefox add-on I wrote to capture multiple selections on web pages and send them to the web app.
  • Uses NLP to identify entities in eBooks, web pages and notes and add an information icon that provides DBPedia (Wikipedia) information on the fly.
  • Uses the Bing search API to find information on what my NLP analysis code considers to be the main topic of eBooks, web pages and notes.

I enjoy meditation (I have also practiced Yoga since about 1975 and Qigong for about two years) and after my early morning meditation this morning I had one of those ah-ha moments:

In using KB2 myself, the automatic Bing searches showing results on the side and DBPedia entity lookups started to get in my way after the novelty of these features "wore off." In other words, the Evernote team knew what they were doing when they designed their (rather good) product! This morning I cloned KB2 into KB3, removing everything but the "personal version of Evernote" functionality. KB2 has a lot of useful Clojure code in it, and if I am not too lazy I might open source it all. KB2 has had so many rewrites that the Clojure and ClojureScript (and a little Java and JavaScript) code really needs some cleanup love.

Life in Sedona Arizona

My wife Carol and I have been enjoying the early spring time in the mountains of Central Arizona. My friend Bill Bohan (one of the authors of the book Great Sedona Hikes) took this picture of me while we were hiking on Bear Mountain last week:


I have also been enjoying gardening. Several friends and I volunteer to keep a 1 mile historic irrigation ditch functioning to provide water to a historic farm and Crescent Moon Red Rock Crossing Park. The following three pictures (the first two taken by Don Fyffe) show me and some friends unloading some wood chips someone gave us for the farm. The third picture shows me holding up a bok choy I grew with the community garden in the background:

My new book project: "Power Java"

I was 'scientific' in my approach to choosing the material for this book: I had 11 topics that I wanted to write about and I used a Google survey (it is located here) to get feedback from people who follow me on social media. Both the survey results and some great suggestions emailed to me by Alex Ott really helped me narrow down the topics for the book. Thanks to everyone who helped! (You can still add to the survey if you want.)

It was a difficult decision choosing Java for this book project. Most of my development in the last year has been in Clojure and Haskell (with a little Ruby and Java) but I decided that the book would have a wider audience written in Java.

I might provide an appendix and additional sample code showing the use of some of the examples with Clojure and JRuby wrappers but it is so easy for Clojure and JRuby developers to reuse Java code that I am not sure if it is worthwhile adding this material to the book. (Feedback on this will be appreciated, BTW.)

Monday, March 02, 2015

Net Neutrality (Yeah!), a new distributed wiki, and my current Clojure project

It is, IMHO, a very good thing that net neutrality now has the force of law behind it. The Internet is the most important artifact for sharing information ever and is worth protecting as a neutral platform. Even though there is a lot of value in walled gardens created by Google, Facebook, and Apple I still hope to see many more systems developed that support individual control of our own data and better support for privacy.

I think that Ward Cunningham's new Federated Wiki (on github: github.com/fedwiki) is a very interesting idea for combining local storage with federated sharing of content. Something to keep our eyes on!

I have been working on a combined document repository and general research tool that is tailored to my own needs but I will also sell it as a low cost commercial product (or I might make a compiled version free with the source code available with a commercial license for inclusion in other products - I am not sure yet). I want a system where I can store PDF (and perhaps other formats) files from eBooks I have purchased, papers published on the web, etc. with the usual search and annotation functionality. I am writing browser plugins for Firefox and Chrome to let me clip material from the web (stored with source URL and text from multiple selections) as JSON data for ingestion into my system. The final layer of functionality is support for research notebooks that can be used to collect references to local and web sources, along with my own notes and writing. My system is mostly written in Clojure and ClojureScript (with some Java and JavaScript). I still need to design and implement sharing by exporting research notebooks as PDF files and an easy to reuse JSON dump format, and a mechanism for making parts of a personal knowledge repository public via a read-only web interface.

The first version of my product will be for individual use (simple install on a personal server, or run locally on a laptop) but I am interested in evolving it into a federated system for use by small teams. A federated system is also useful for the single user use case: for example, sync content from a laptop to a server.

Friday, January 16, 2015

I bought an HP Stream 11" Windows 8.1 laptop

I have good intentions of starting to give talks at my small town's local library on safe internet browsing, privacy issues, and any topics that I get requests for. I bought a $200 HP Stream 11" laptop to get up to speed on Windows 8.1 since I only use OS X and Linux.

I am pleasantly surprised at how much I like this laptop! The display is pretty good, the keyboard is also OK, and the no-fan design is nice. It works well with a second 1080p external monitor also. I ordered the laptop directly from Microsoft: their "signature editions" come with no crap-ware; just Windows 8.1 and nothing else.

The shortcomings:

  • Only 32 GB of solid state disk space. After installing Ruby, Java 8, IntelliJ, git command line tools, Pharo Smalltalk, Chrome, and cloning several git projects I have about 11 GB free.
  • Slow CPU. For using Pharo Smalltalk this is no problem. It is a small problem for Clojure development. I use IntelliJ with the Clojure plugin for editing and have a lein repl running in a command window. Not too slow, but not great either.

I bought this laptop just to experiment/play with Windows 8.1 but I think that it will become my travelling laptop. It is very light and small to carry around. The build quality so far seems very good for a $200 laptop.

Saturday, January 03, 2015

Happy New Year

I would like to wish everyone a Happy New Year!

I am not going to make new year technology predictions, but I do have a few comments on what technology I am using and what I expect to be using this year:

First, I am surprised at how much Microsoft technology I am using. This started last spring when for political reasons I wanted to stop using Dropbox (I did not like their appointment of Condoleezza Rice to their board of directors). At the time, I could not find a good Dropbox replacement but after getting a Microsoft BizSpark business development grant last fall, and very much liking the Azure services, I recently decided to take another look at Office 365. It turns out that for the same $100/year I paid Dropbox for a terabyte of storage, with Microsoft I now get for the same cost one terabyte of cloud storage per family member and everyone in my household also gets Office applications and use of the online Office tools (that run really well in Chrome with Chrome apps). The versions of Microsoft Word, OneNote, and OneDrive for my iPad and my Samsung Note 4 Android "phone" are also excellent. I bet that I end up being a long term customer of Azure and Office 365.

Probably the biggest technology change I am making in the new year is my decision to go back to using Clojure as my primary programming language. I wrote a lot of code in Haskell in 2014, and spent many hundreds of hours studying Haskell - all time very well spent. That said, with Haskell I found myself dropping back to occasional use of Ruby and Java for various tasks so I ended up using three programming languages. It is nice to settle in with one comfortable and productive programming language and not have to do context changes. (That said, I still use Java, Javascript, and Clojure.)

My BizSpark sponsored (with a generous free Azure usage tier for up to 3 years) business plan involves writing a web service that I won't go into right now, but I have already started replacing the Haskell + Ruby prototypes with Clojure code.

The other big change in my life, technology wise, is my Samsung Note 4 Android "phone." We have been listening to the hype surrounding a mobile device lifestyle for years, but for me having a very capable computing device with a very high resolution screen always with me has been a major improvement because I can always write, communicate, do photography, and in a pinch open an SSH window and write a little code or do some admin on a server. It is liberating to not always need to have a laptop with me.

Thursday, December 18, 2014

Happy Holidays

I want to wish everyone a happy holiday season!

Carol and I have been traveling a lot this year and we are staying home this Christmas, hanging out with our friends in Sedona Arizona instead of traveling to see family in California and Rhode Island.

I was going to buy a new laptop for my work and writing (my MacBook Air is 3 years old) but in spite of my laptop being dinged up it is otherwise in good shape. I decided to do something radical with it yesterday:

When I bought my laptop 3 years ago I initialized it with a Time Machine backup from my older MacBook Pro. Needless to say, there was a lot of cruft on my system. I was also running a developer's preview of Yosemite (with all available updates). Yesterday I reformatted the disk drive and did a fresh install of OS X Yosemite - without restoring anything from Time Machine.

I did manually (command line) restore several directories from the latest folder in the Time Machine backups to save the time of doing fresh git clones of all of my projects, work and writing. I also decided to just install a minimal set of writing tools that I currently use (mostly I now use leanpub.com so all I need for writing is a plain text editor). For programming languages I only installed Haskell, Java 8, node.js, and Ruby.
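
A manual restore like this can be sketched as a small shell function. This is only a sketch under assumptions: the `BACKUP_ROOT` and `RESTORE_TO` variables, the disk, machine, and volume names, and the `restore_dir` helper are all hypothetical, and the `Backups.backupdb/<machine>/Latest/<volume>` layout is an assumption about how a locally attached Time Machine disk is organized.

```shell
#!/bin/bash
# Hypothetical locations: adjust the disk, machine, and volume names.
BACKUP_ROOT="${BACKUP_ROOT:-/Volumes/TimeMachine/Backups.backupdb/MyMac/Latest/Macintosh HD/Users/mark}"
RESTORE_TO="${RESTORE_TO:-$HOME}"

# Copy one directory (relative to the home folder) out of the latest backup.
# cp -Rp keeps permissions and timestamps and also copies dot-folders such
# as .git, so restored projects do not need a fresh clone.
restore_dir() {
  local src="$BACKUP_ROOT/$1"
  local dest="$RESTORE_TO/$1"
  if [ ! -d "$src" ]; then
    echo "not found in backup: $src" >&2
    return 1
  fi
  mkdir -p "$dest"
  cp -Rp "$src/." "$dest/"
}

# Example (hypothetical directory names):
#   restore_dir Code
#   restore_dir Writing
```

Restoring directory by directory keeps the fresh install clean: only what is explicitly named comes back from the backup.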

Restoring OS X systems with Time Machine works very well, but it was time for a fresh start. Everything seems to run much faster and my SSD drive is 75% empty.

I am "sort of" retired now. I try to limit myself to about 10 hours a week of consulting and about 20 hours a week working on a commercial software product. Currently I am helping one customer integrate IBM Watson into their product and I am doing some development work for another customer (online sports game: Ember.js front end with a Haskell back end). Fun stuff!

With two friends I volunteer maintaining a historic farm owned by the US Forest Service. Part of this is keeping land clear of weeds and planting and harvesting food, and another part is keeping a 1 mile irrigation ditch (supplies water for the farm and a local park) clear. Tim took these two pictures of Don and me working on the irrigation ditch last week:

I enjoy hiking and kayaking, but this year I have spent more time volunteering on the farm.

Saturday, December 06, 2014

I am back from vacation

Carol and I just got back from a cruise from San Diego to Hawaii to Ensenada Mexico and back to San Diego. We have done this cruise before but several people in our extended family wanted to go and we enjoyed the family time. The best part of this particular cruise is that you get about 10 "sea days" which I enjoy.

I took this picture from the Promenade Deck where I spent most of my time reading (and I hate to admit, hacking some Haskell NLP code):

Carol and I don't gamble but she received a birthday gift from the cruise line of some free slot machine time, as seen in this picture:

We had a lot of fun on shore also. Brother Ron, sister Anita, Carol, and I went snorkeling in Maui on a reef 2/3 mile offshore where we saw many green sea turtles. Here is a picture of me that my brother Ron took and a picture of a sea turtle that I took with Ron's camera:

We also had fun on the big island, Oahu, and Kauai. When the ship stopped in Ensenada a few of us went inland to two wineries - a different experience than in wineries back home: more wine and lots of tasty food to go with it. Here is one last picture of my Dad, me, and my brother Ron in the ship's library:

I have a new phone, a Samsung Galaxy Note 4, that was great for the trip. It takes 4K ultra high definition video with video stabilization and great pictures. I put together an 11 minute video for my family that is really pretty good. The advantage of using a phone is that you always have it with you. The huge size of the Note 4 also made it great for reading Kindle books and listening to music and audio books. Now that I have the Note 4 I don't use my iPad mini very often. Except for 3 or 4 one hour Haskell hacking sprints, I never had to use my laptop; everything (email, web, reading, etc.) was done with my phone.

Friday, October 17, 2014

I updated cookingspace.com

I originally wrote cookingspace.com in Ruby and Rails about 8 years ago, and a few years ago I rewrote it from scratch in Clojure.

This week I made some major improvements. First, I cleaned up some technical debt by rewriting cookingspace.com as a plain Compojure and Hiccup app, removing all reliance on the deprecated Noir library. I also did some major code cleanup.

I also rewrote the code for calculating and displaying the nutritional information for the recipes. The nutritional information used on this site is derived from the USDA Nutrition Database. Nutrition information is shown for each displayed recipe. This includes total percentage of minimum daily requirement and for each nutrient the recipe ingredients supplying most of the nutrient (ingredients providing less than 1% of contribution to daily requirement are not shown). The cookingspace.com system tracks 42 nutrients including most vitamins and minerals important for good health.
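
The per-ingredient calculation described above can be sketched in a few lines of shell and awk. This is an illustration rather than the site's actual Clojure code: the `nutrient_report` function, the tab-separated input format, and passing the daily requirement as an argument are all assumptions made for the example.

```shell
#!/bin/bash
# nutrient_report DAILY_REQUIREMENT
# Reads "ingredient<TAB>amount" lines on stdin (amounts in the same unit as
# the daily requirement) and prints each ingredient as a percentage of the
# daily requirement, hiding ingredients that contribute less than 1%.
nutrient_report() {
  awk -F'\t' -v daily="$1" '
    {
      pct = 100 * $2 / daily      # this ingredient as % of daily requirement
      total += pct                # in this sketch the total counts hidden ones too
      if (pct >= 1) printf "%s\t%.1f%%\n", $1, pct
    }
    END { printf "TOTAL\t%.1f%%\n", total }
  '
}

# Example with made-up numbers (not USDA data):
#   printf 'spinach\t45\nolive oil\t0.5\nkale\t30\n' | nutrient_report 90
```

In this sketch the ingredients are printed in input order; sorting them by contribution is left out to keep the example short.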

I originally wrote cookingspace.com to help me track the amount of vitamin K in my diet. I now have the relative vitamin K levels in common foods memorized and it is easy to eat (approximately) the same amount of vitamin K each day. Mission accomplished! I now use cookingspace.com to get a better understanding of which recipe ingredients contribute good and "bad" nutrients.

Sunday, October 12, 2014

Experimenting with Clojure + Ember.js and ClojureScript with Om

For a personal project I want to make a web app with a "rich client" interface. I had originally planned to write this app in Haskell with the Yesod web framework. However, as much as I like Haskell, I do still have occasional time wasting problems with cabal, Yesod, and sometimes with non-pure Haskell code. My gut feeling is that I will get things done faster if I use Clojure.

In the past I have experimented using Clojure and Ember.js but until today I have not spent much time with ClojureScript and Om (I have written web apps using ClojureScript, so the learning curve is trying to use Om).

Getting started with Om is straightforward. I used the chestnut lein plugin to create a new Clojure + ClojureScript + Om project. The chestnut plugin is very nice - it set up a reasonable development environment without having to go through a learning curve. After experimenting with the generated project, I then started substituting code from David Nolen's Om tutorial into the skeleton project that the chestnut lein plugin created.

I am not much of a user interface expert, and I am not sure how much time I will devote to learning Om. If I earned my living doing web apps, this learning curve would be very worthwhile. I usually use simple tools to make web apps, like Ruby + Sinatra, sometimes Rails, and often Clojure + Compojure + Hiccup.

A few years ago I did experiment with Ember.js and created small projects on my github account to experiment with Ember.js with various backend services. Today I cleaned up my embers-clj sample project. I removed the use of the old noir library and made this a straight-up compojure app. I also changed the JavaScript Ember.js application to remove two deprecation warnings.

I am leaning towards using Ember.js, mostly because I already am familiar with it. I also like the combination of Ember.js with a node.js backend (my starter project for Ember.js and node.js is also on github). There are advantages to using JavaScript for both client and server sides but I like Clojure better.

Saturday, October 11, 2014

It is simple to use the IBM Watson AI APIs

If you sign up for a (free for 30 days) account on IBM BlueMix, it is simple to use a pre-canned IBM Watson instance that contains medical information and travel information. Code samples are provided for Java, node.js, and Ruby. I wanted to use Ruby so I used these setup instructions.

IBM BlueMix uses the Cloud Foundry PaaS tools. If you have any experience using Cloud Foundry then setting up a (free for 30 days) BlueMix account, and deploying one of the sample web applications that uses the pre-canned IBM Watson medical and travel instance can be done in about 20 or 30 minutes. This is a worthwhile exercise because once you deploy your sample web app you can experiment with IBM Watson's ability to parse natural language questions and return relevant data. Very nice stuff!

In order to build a custom application using IBM Watson you need to supply training documents and training questions. I am helping a customer do this right now. Currently you need a partnership arrangement with IBM to train your own IBM Watson instance but I believe that the ability for anyone to do this via BlueMix will be available in the near future.

Sunday, September 28, 2014

I was accepted into the Microsoft BizSpark program

Since I am winding down my consulting business this year (that means that I am limiting myself to a maximum of about 10 hours a week working for consulting customers) I have spent a lot of time getting better at developing in Haskell, reviewing what I hopefully already know about machine learning, and taking classes. In other words, I want to work on my own stuff :-)

I have had an idea for starting a small business and a while ago I applied to the Microsoft BizSpark program. I was just accepted into the program a few days ago. Using my own business idea as my yardstick, Microsoft is taking long term bets with BizSpark. It costs them money and resources to support the development of new business ideas, but the long tail is many years of selling infrastructure services. Even though there is not much lock-in using Microsoft Azure I am absolutely personally committed to using Azure long term if my idea works: Microsoft is providing up to $150/month of free Azure services for up to three years and it seems like really bad form to not reward them with long tail business if things work out. If you have a web based business idea that you want to pursue, I would suggest giving BizSpark a serious look.

I am planning on just using Linux servers on Azure, and it has been really easy to configure an Ubuntu server, hook up domains, etc. So far I am only using a single server for development and test deployments. I am used to doing everything on the command line but the Azure dev dashboard is useful to get a quick view of resource use and configuration. I am just using a small A-series 1 core VPS with 1.75 GB of RAM for development right now but I am pleased by how fast large builds run. It would be interesting to see relative performance of "1 core" VPS systems from many providers.

Azure offers some nice "Amazon-like" add ons for monitoring and setting up clusters for horizontal scaling. While it is definitely less expensive (except for labor costs) to run your own servers, I am a huge fan of (almost) no-admin PaaS services like Heroku, IBM's BlueMix, Google AppEngine, etc. and basic cloud infrastructure providers like Amazon (AWS), Google (Compute Engine) and Microsoft (Azure). I expect the large infrastructure providers to make a healthy profit, and I expect that they will!

Wednesday, September 24, 2014

I pushed an NLP demo to IBM's PaaS service BlueMix

The demo processes news stories to summarize them and map entities found in the text to DBPedia URIs. The Ruby code is similar in functionality to the open source Haskell NLP code on my github account.

Some background: I have been helping a customer integrate the IBM Watson AI platform into his system. I noticed on Hacker News this morning that IBM's PaaS service BlueMix will very soon offer a sandbox for IBM Watson services. I signed up for BlueMix to have an opportunity to get more experience using IBM Watson.

I just spent an hour putting together a quick NLP demo that uses my own entity detection code and the Ruby classification gem which supports pretty good summarization. Give it a try :-)

2014/09/29 update: I stopped this quick demo I put together - it is simple and was just to experiment with BlueMix. A better demo is my KBSPortal.com site.

BlueMix is built using Cloud Foundry so if you are already familiar with the Cloud Foundry command line tools then you will find the development cycle very familiar.

Wednesday, September 17, 2014

Setting up "Heroku like" git push deploys on a VPS is so easy

I was reading about Docker closing a $40M series C round this morning. While containerization is extremely useful at large scale, I think that the vast majority of individual developers and small teams write many web applications that don't need to scale beyond a beefed up VPS or single physical server.

For a good developer experience it is difficult to beat a slightly expensive but convenient PaaS like Heroku. However, if you have many small web app projects and experiments then hosting on a PaaS and paying $30-$50/month per application can add up, year after year. If you need failover and scalability, then paying for a PaaS or implementing a more failsafe system on AWS makes sense. For experimental projects that don't need close to 100% uptime, I set up a .git/hooks/post-commit git hook like this:

./rsync.sh
ssh [email protected] 'bash -s' < run.sh

I have my DNS set up so that myappname.com (this is not a real domain, I am using it as an example) and all other domains for my example/experimental web apps point to the IP address of my large VPS. My rsync.sh files look like this:

rsync -e "ssh" -avz --delete --delete-excluded  \
   --exclude-from=/Users/mark/Code/mywebapps/myappname.com/rsync_exclude \
   /Users/mark/BITBUCKET/myappname.com  [email protected]:/home/mark/

In my rsync_exclude file I specify to not copy my .git folder to the server:

.git

The run.sh file that gets remotely executed on my server looks like this:

#!/bin/bash

# Stop any running copy of the app (matched by name in the process list).
ps aux | grep -e 'myappname.com' | grep -v grep | awk '{print $2}' | xargs -I {} kill {}
# Fetch dependencies, then restart the app in the background.
(cd myappname.com; lein deps; nohup lein trampoline run prod > out.log&)

This is the pattern I use for running Clojure web apps. Running Ruby/Sinatra, Haskell, and Java apps is similar.
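
One caveat with the `grep -e 'myappname.com'` step is that the dot is a regular-expression wildcard, so a similarly named process could match too. A minimal sketch of a literal-match alternative follows; the `pids_for_app` function name is hypothetical and this is not part of the run.sh script above.

```shell
#!/bin/bash
# pids_for_app APP_NAME
# Reads `ps aux` style lines on stdin and prints the PID (second column) of
# every line that contains APP_NAME as a literal string. Lines that mention
# grep or awk are skipped, much like the `grep -v grep` step.
pids_for_app() {
  awk -v app="$1" 'index($0, app) > 0 && $0 !~ /grep|awk/ { print $2 }'
}

# Intended use on the server (xargs -r is a GNU extension that skips the
# kill when there is nothing to stop):
#   ps aux | pids_for_app myappname.com | xargs -r kill
```

Because the function only filters text, it can be tried out on saved `ps aux` output before it is trusted to pick processes to kill.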

Since I tend to run many small experiments on a single large VPS, I use entries like the following in my /etc/rc.local file to restart all applications if I reboot the VPS:

(cd /home/mark/myappname.com ; su mark -c 'nohup lein trampoline run prod > out.log&') &

I use an account on the server that does not have root or sudo privileges so my web apps use non-privileged ports and I use nginx as a proxy. In my nginx.conf file, I have entries like the following to map non-privileged ports to virtual domain names:

 server {
    listen       80;
    server_name myappname.com www.myappname.com;
    location / {
      proxy_pass http://localhost:7070;
      proxy_redirect off;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
    error_page 500 502 503 504  /error.html;
    location = /error.html {
             root  /etc/nginx;
    }
 }

In this example, the myappname.com application is running on the non-privileged port 7070 and this app would be accessed as http://myappname.com or http://www.myappname.com. On my laptop, just doing a git push has the new version of my app running on my server in a few seconds.

Sunday, September 14, 2014

Changed license on my Haskell NLP code and comments on IBM Watson AI system

When I added licensing information on the github repository for my Haskell NLP experiments I specified the AGPL v3 license. I just changed it to GPL v3 so now it can be used as a web service without affecting the rest of any system that you use it for. I also did some code cleanup this morning. In addition to the natural language processing code, this repository also contains some example SPARQL client code and my Open Calais client library that you might find useful.

Some news about IBM Watson: their developer web site now has more documentation and example code available without needing to register to become an IBM Watson Partner.

I am helping a long term customer use IBM Watson as a web service over the next several months so I registered as a partner and have been enjoying reading all of the documentation on training an instance for a specific application, the REST APIs, etc. Good stuff, and I think IBM may grow a huge business around Watson.

Saturday, September 13, 2014

I am open sourcing my Haskell NLP experiments

I just switched the github repository for my NLP experiments to be a public repository. Git pull requests will be appreciated! The code and data are released under the AGPL version 3 license - if you improve the code I want you to share the improvements with me and others :-)

This is just experimental code but hopefully some people may find it useful. My latest changes involve trying to use DBPedia URIs as identifiers for entities detected in text. Simple stuff, but it is a start.

Sunday, July 27, 2014

Testing the new Amazon Zocalo cloud file storage service

I am still looking for Dropbox alternatives. I wrote five days ago about trying Office 365 and OneDrive and today I will briefly go over my first impressions of Zocalo:

The setup was confusing: I used a "-" in the site name and the initialization process failed silently, never sending me a confirmation email. I removed the "-" (as a wild guess!) from the site name and everything worked. Zocalo is a beta service, so this is understandable.

A more serious problem for my particular use case is that there seems to be no support for "selective sync." On Dropbox, I save space on the small SSD drive of my MacBook Air by un-syncing folders that I probably won't need for a while. I like this feature.

Zocalo's strong point is managing shared files within a work team or a family group. This is the best use case for Zocalo.

It really is not fair to compare Zocalo and Office 365/OneDrive with Dropbox because third parties have had time to use the Dropbox APIs in their applications. As an example, text editors on my Android phone and iPad use the Dropbox APIs for live editing of compatible files. In time, third party support will probably be there for Zocalo and OneDrive.

Tuesday, July 22, 2014

Trying Office 365 on Mac, iPad, and Android

I am evaluating alternative cloud services and the 1 terabyte of OneDrive storage certainly attracted my attention. I have been using Dropbox for many years and have usually been happy with it. While I was very disappointed that Dropbox added Condoleezza Rice to their board of directors (I don't like her strong support of our invasion of Iraq and her views on privacy vs. unencumbered government surveillance) that alone is not enough to make me stop using Dropbox. Still, it is good to have options and I very much like the direction that Microsoft's new CEO Satya Nadella is taking the company. Don't get me wrong, I don't view Microsoft, Apple, and Google as being perfect either with regard to user privacy. A simple fact of life is that the US government can apply very strong soft pressure against tech companies in the US, to the detriment of our economy. Anyway, enough politics, here are my initial thoughts on Office 365:

I signed up for the free 30 day trial of Office 365 earlier today and installed all of Office 365 on my MacBook Air and just OneDrive, OneNote, and Microsoft Word on my iPad and Android phone. So far the best feature is that Word documents are actually easy to read and edit on my iPad and Android phone. Sweet.

Satya Nadella's strategy of supporting all devices for Microsoft's productivity tools seems like a great strategy to me. Anyone who doesn't think that cloud based services will continue to dominate the way people use devices has not been paying attention.

Unfortunately, OneDrive has some really rough edges dealing with opening plain text files on my iPad and Android phone. I keep notes as text files and the option for using notes seems to be importing everything into OneNote. Note syncing between my MacBook Air, iPad, and Android phone works well, but I really do prefer plain text files. Strangely, OneNote does not store notes files on OneDrive! On my Mac, they are hidden in ~/Library in a cache folder. PDF files can be conveniently read from OneDrive on iPad, but it is not so convenient on my Android phone.

What about security and privacy?

I use encryption when storing sensitive information on Dropbox and I am modifying my backup zsh scripts to also encrypt sensitive information to OneDrive. Easy to do! As a consultant, customers trust me with some of their proprietary data and information and I always try to keep customer data encrypted on my laptop and cloud backup.

Why not use Google Drive?

Actually, even though I don't sync my Google Drive to my Mac, I do use the web interface and use it for offline backups. Google Drive, like Microsoft's OneDrive, is not as facile as Dropbox. There is also the simple fact that I rely on Google for so many services that I prefer using an alternative cloud drive.

I am in no hurry to complete my evaluation of Office 365. My Dropbox account is prepaid for another seven months. When my free evaluation period of Office 365 is up I plan on paying for the service for a few months while deciding if I want to make it my primary cloud service.

What about Apple?

I really enjoy using both iOS and Android devices, mostly for the fun of the different experience. That said, now that I am basically retired (I still consult several hours a week, work on a tech business idea, and write books, so my friends and family take my "retired" status with some skepticism :-) I might end up just living in Apple's little walled garden, and use their cloud services. Right now, Apple's cloud services are not very impressive but I expect large improvements. In any case, I am in no hurry but sometime in the next year I would like to settle on one primary cloud service, using others as a backup.

Update July 27, 2014: I have been using Office 365 for five days on my OS X laptop, iPad, and Android phone. So far, the only thing that I really dislike is that selective sync does not work on the Mac client: selecting a folder to not sync causes the app to crash. I do like the OneNote application: it works well on my Mac, iPad, and Android phone.

Tuesday, July 08, 2014

Some Haskell hacks: SPARQL queries to DBPedia and using OpenCalais web service

For various personal (and a few consulting) projects I need to access DBPedia and other SPARQL endpoints. I use the hsparql Haskell library written by Jeff Wheeler and maintained by Rob Stewart. The following code snippet runs a DESCRIBE query against the DBPedia SPARQL endpoint and converts the returned triples to tuples:

{-# LANGUAGE ScopedTypeVariables,OverloadedStrings #-}

module Sparql2 where

import Database.HSparql.Connection
import Database.HSparql.QueryGenerator

import Data.RDF hiding (triple)
import Data.RDF.TriplesGraph

simpleDescribe :: Query DescribeQuery
simpleDescribe = do
    resource <- prefix "dbpedia" (iriRef "http://dbpedia.org/resource/")
    uri <- describeIRI (resource .:. "Sedona_Arizona")
    return DescribeQuery { queryDescribe = uri }
    

doit = do
  (rdfGraph:: TriplesGraph) <- describeQuery "http://dbpedia.org/sparql" simpleDescribe
  --mapM_ print (triplesOf rdfGraph)
  --print "\n\n\n"
  --print rdfGraph
  mapM (\(Triple s p o) -> 
          case [s,p,o] of
            [UNode(s), UNode(p), UNode(o)] -> return (s,p,o)
            [UNode(s), UNode(p), LNode(PlainLL o2 l)] -> return (s,p,o2)
            [UNode(s), UNode(p), LNode(TypedL o2 l)] -> return (s,p,o2)
            _ -> return ("no match","no match","no match"))

    (triplesOf rdfGraph)

          
main = do
  results <- doit
  print $ results !! 0
  mapM_ print results
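
One possible variation on the triple handling above is to return Maybe values and drop the unmatched triples with mapMaybe, instead of returning a "no match" placeholder tuple. The sketch below uses simplified stand-in Node and Triple types that are defined here only for illustration; they are not the real rdf4h types, and the names tripleToTuple and triplesToTuples are made up:

```haskell
import Data.Maybe (mapMaybe)

-- Stand-in types for illustration only; the listing above uses the
-- Node and Triple types from the rdf4h library instead.
data Node = UNode String | LNode String | BNode String
  deriving (Eq, Show)

data Triple = Triple Node Node Node
  deriving (Eq, Show)

-- Convert a triple to a tuple of strings, or Nothing when the shape
-- is not one we handle (for example, a blank-node subject).
tripleToTuple :: Triple -> Maybe (String, String, String)
tripleToTuple (Triple (UNode s) (UNode p) (UNode o)) = Just (s, p, o)
tripleToTuple (Triple (UNode s) (UNode p) (LNode o)) = Just (s, p, o)
tripleToTuple _ = Nothing

-- Keep only the triples that matched one of the handled shapes.
triplesToTuples :: [Triple] -> [(String, String, String)]
triplesToTuples = mapMaybe tripleToTuple
```

With this shape the conversion is a pure function, so it can be tested without a network call, and the caller never has to filter out placeholder tuples.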

I find the OpenCalais web service for finding entities in text and categorizing text to be very useful. This code snippet uses the same hacks for processing the RDF returned by OpenCalais that I used in my last semantic web book:

module OpenCalais (calaisResults) where

import Network.HTTP
import Network.HTTP.Base (urlEncode)

import qualified Data.Map as M
import qualified Data.Set as S

import Control.Monad.Trans.Class (lift)

import Data.String.Utils (replace)
import Data.List (lines, isInfixOf)
import Data.List.Split (splitOn)
import Data.Maybe (maybe)

import System.Environment (getEnv)

calaisKey = getEnv "OPEN_CALAIS_KEY"

escape s = urlEncode s

baseParams = "<c:params xmlns:c=\"http://s.opencalais.com/1/pred/\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"><c:processingDirectives c:contentType=\"text/txt\" c:outputFormat=\"xml/rdf\"></c:processingDirectives><c:userDirectives c:allowDistribution=\"true\" c:allowSearch=\"true\" c:externalID=\"17cabs901\" c:submitter=\"ABC\"></c:userDirectives><c:externalMetadata></c:externalMetadata></c:params>"

calaisResults s = do
  key <- calaisKey
  let baseUrl = "http://api.opencalais.com/enlighten/calais.asmx/Enlighten?licenseID=" 
                ++ key ++ "&content=" ++ (escape s) ++ "&paramsXML=" 
                ++ (escape baseParams)
  ret <- simpleHTTP (getRequest baseUrl) >>= 
    fmap (take 10000) . getResponseBody 
  return $ map (\z -> splitOn ": " z) $
    filter (\x -> isInfixOf ": " x && length x < 40)
      (lines (replace "\r" "" ret))
  
main = do
  r <- calaisResults "Berlin Germany visited by George W. Bush to see IBM plant. Bush met with President Clinton. Bush said “felt it important to step it up”"
  print r

You need to have your free OpenCalais developer key in the environment variable OPEN_CALAIS_KEY. The key is free and allows you to make 50K API calls a day (throttled to four per second).
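
The line filtering at the end of calaisResults can also be pulled out into a pure function, which makes it easy to try without a network call or an API key. This is a minimal sketch that uses only the base library; the names splitOnStr and parseCalaisLines are made up, splitOnStr is a small stand-in for splitOn from the split package (it assumes a non-empty separator), and the sample input in the usage note is invented for illustration:

```haskell
import Data.List (isInfixOf, isPrefixOf)

-- Split a string on every occurrence of a non-empty separator.
-- Stand-in for Data.List.Split.splitOn so the sketch needs only base.
splitOnStr :: String -> String -> [String]
splitOnStr sep = go
  where
    go s = case breakOnSep s of
      (chunk, Nothing)   -> [chunk]
      (chunk, Just rest) -> chunk : go rest
    breakOnSep [] = ([], Nothing)
    breakOnSep s@(c:cs)
      | sep `isPrefixOf` s = ([], Just (drop (length sep) s))
      | otherwise =
          let (chunk, rest) = breakOnSep cs
          in (c : chunk, rest)

-- Same filtering as in calaisResults: drop carriage returns, keep short
-- lines that contain ": ", and split each kept line on ": ".
parseCalaisLines :: String -> [[String]]
parseCalaisLines body =
  map (splitOnStr ": ") $
    filter (\x -> ": " `isInfixOf` x && length x < 40)
      (lines (filter (/= '\r') body))
```

For example, parseCalaisLines "Person: George W. Bush\r\nCity: Berlin\r\n" evaluates to [["Person","George W. Bush"],["City","Berlin"]].
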
I have been trying to learn Haskell for about four years so if anyone has any useful critiques of these code examples, please speak up :-)