Tuesday, May 23, 2017

I updated my Natural Language Processing (NLP) library for Pharo Smalltalk

I have recently spent some time playing around in Pharo Smalltalk and in the process made some improvements to my NLP library: I changed the license to MIT and added summarization and sentence segmentation. Older code provides functionality for part-of-speech tagging and categorization.

Code, data, and directions are in my GitHub repository nlp_smalltalk.

My first experience with Smalltalk was in early 1983. The year before, my company had bought a Xerox 1108 Lisp Machine for me, and a Xerox technical salesperson offered me a one-month trial license for their Smalltalk system.

Pharo Smalltalk very much impresses me, both for its interactive programming environment and for the many fine libraries written for it.

I don't spend much time programming in the Pharo environment so I am very far from being an expert. That said, I find it a great programming environment for getting code working quickly.

Thursday, April 06, 2017

I am using Lisp more, and big changes to my consulting business

I haven't been using Lisp languages much in the last five or six years, since I started using Ruby and Haskell more often to experiment with new ideas and projects. Now that I am winding down my remote consulting business (more detail later), I want to spend more time writing:

I have three book projects that I am currently working on: "Practical Scheme Programming (Using Chez Scheme and Chicken Scheme)", "Ruby Programming Playbook", and a fourth edition of "Loving Common Lisp, or the Savvy Programmer's Secret Weapon". All three of these books will be released under a Creative Commons no-commercial-reuse, no-modifications, share-with-attribution license, so copies of these eBooks can be shared with your friends.

I was very excited when the fantastic Chez Scheme system was open sourced, but some of that excitement was soon tempered by the time required to get much of my existing Scheme code running under Chez Scheme and R6RS. To be honest, much of the work for this book is simply for my own use, but I hope to help other Scheme developers by documenting how to get set up for Chez Scheme development, supplying a large number of useful code "recipes", and documenting techniques for using Chez Scheme. I used to mostly use the (also excellent) Gambit Scheme, created by Marc Feeley, for writing small utilities that I wanted to compile down to compact native code. Scheme works better than Common Lisp or Clojure for generating small, fast executables with fast startup times.

Another writing project I am working on now is updating and expanding my Loving Lisp book (it has sold well, and previous buyers get a free upgrade when I am done with the rewrite later this year). There are several new examples I am adding, many of which will simply be rewrites of Haskell, JavaScript, and Java programs that I have developed in recent years. Rewriting them in idiomatic Common Lisp will be fun.

I have a long history using Common Lisp, starting when I upgraded my Xerox 1108 Lisp Machine in 1983. I have written two successful commercial products in Common Lisp, used a derivative of Common Lisp to program a Connection Machine, and used CL on several large projects.

To brush up on Common Lisp, I am (once again) working through several books written by people with far better Common Lisp programming skills than I have: Edi Weitz, Peter Norvig, and Paul Graham. My Common Lisp code sometimes suffers from using too small a subset of the language, and I don't always write idiomatic code. It is definitely time for me to brush up on my skills as I rewrite my old book.

I have only written one Ruby book in the past (for Apress), but since I use Ruby so much for small "getting stuff done" tasks, I intend to finish a new book on Ruby that I started writing about two years ago.

I hope to have all three of these writing projects complete in the next 18 months. When I published my last book "Introduction to Cognitive Computing" last month, I counted my titles and realized that I have written 24 books in the last 29 years. I love every aspect of writing, and I have met many awesome people because I am an author, so the effort is very worthwhile.

re: large changes to my consulting business:


I have combined remote and onsite consulting for the last 18 years. I used to very much enjoy both, but in the last few years I have found that I don't enjoy working remotely as much as I used to, while I have very much been enjoying my onsite gigs, where I get to meet my customers rather than just talk on the phone. Working onsite is a very different experience. So, I have stopped accepting remote work except for longer-term projects with some onsite "face time" with customers.

Sunday, March 05, 2017

Technology, antifragile businesses, and workflow

I have been enjoying Nassim Taleb's book 'Antifragile', from which I have learned (or better understood) how difficult, if not impossible, it is to predict the future, especially low-probability events. Taleb does convince me that it is possible and desirable to rate personal habits, health issues, businesses, governments, etc. on how fragile <--> robust <--> antifragile they are. Robust is good; antifragile is even better.

It is fragile, for example, to depend on the salary from one company to support your family while investing heavily in that company's stock. It is more robust to have a side business to earn extra money and to broadly distribute long-term investments. It is antifragile to own multiple businesses. Taleb argues, and I agree, that it is better to earn less but have safer, more distributed income streams. Personally, I have three businesses: software development consulting, writing books, and being a landlord for income properties. I am in the process of opening a fourth business, iOS and macOS apps for two very different use cases (4/26/2017 edit: I finished prototyping my iOS nutrition/recipe app, but I am putting further iOS development on hold for now).

I read a lot, both books and long-form essays on the web. For years I have categorized and collected useful articles and snippets of text and relied on text search to find what I need later. Since most of my reading is done on my iPad Pro, I have changed the way I manage research materials: I use the iOS app Notability to quickly mark up what I am reading with business ideas, application ideas, etc. I then export what I am reading into one of a hierarchy of folders on Google Drive. I favor Google Drive over iCloud or Dropbox because all PDFs on Google Drive are easily searchable. Using an Apple Pencil makes the Notability app even more convenient. In any case, it is much more useful to capture my own thoughts along with research materials.

This setup replaces a custom 'Evernote-like' system I built a few years ago: using a Firefox plugin I wrote, I could capture snippets of what I was reading on the web to a custom web app. It was a pain to maintain this system for my own use, and it was in no state to develop into a product. Relying on a commercial product and Google Drive for storage is much better.

I have drastically reduced the complexity of my working tools. I have shelved the multitude of Linux systems I own (it would embarrass me to admit how many), and now I just use an iPad Pro, a MacBook with an external Retina display, and a 60 GB RAM, 16-core VPS that I can spin up and down as I need it. Almost all of my development work is now in Ruby (using RubyMine), Java (using IntelliJ), and Haskell (using Emacs and Intero). Having a much simpler work environment cuts my overhead. I have also simplified publishing to the web, favoring Heroku, Google AppEngine, and serving static sites from Google cloud storage.

Saturday, January 14, 2017

Happy New Year 2017

Happy New Year everyone!

We live in interesting times. We are witnessing exponential growth in technologies and social and economic change. I am going to share my personal views on these two topics and then conclude with my plans for 2017 for leading a free and inspired life.

It is difficult for us humans to really understand exponential growth of the kind we are seeing in artificial intelligence and other technologies like genetic engineering. One personal way to come to grips with exponential growth is a thought experiment: compare the technological changes in the world between the times you were ten and twenty years old with the changes in the last ten years. Even a few years ago my cellphone did a fairly poor job of understanding my speech; now it understands me almost perfectly, and speech input is the way many of us now interact with our mobile devices. In my field of machine learning and artificial intelligence, deep learning neural networks have revolutionized speech recognition, language modeling, image recognition, and predictive modeling. I expect that environmentally safe energy advances like solar power, electricity storage, and likely viable fusion power will profoundly alter the world for the better in the next ten years. Also remember that with very inexpensive power, fresh water becomes less of a problem, at least near oceans, because of desalination.

The outcome of rapid social and economic change is much less clear. I believe that people should think for themselves when it comes to politics, and in general politics gets too much of our attention. My thoughts on how to live a free and inspired life are influenced by Catherine Austin Fitts, who suggests that we pay more attention to our own physical, mental, and spiritual health than to externalities like politics and the trappings of materialism that do little to improve our lives. Catherine promotes the idea of concentrating on adding value to our work, businesses, and society at large. I tend to divide events in my environment into "things I can affect" and "things that I can't do anything about," and in principle I like to put almost all of my energy and creativity into things that I can affect.

In addition to enjoying the company of my family and friends, my plans for 2017 include:

  • Spending close to zero time watching the "news" on TV. I believe that 20 or 30 minutes a week reading news on the web, preferably randomly chosen from multiple news agencies in many different countries, is sufficient to understand what I need to know about the world situation. Spending many hours a week in a "mental bubble" watching the same news service on the same TV station in just my own country seems like a colossal waste of time that can be better spent on other activities. In the Data Science sense, I "sample" news sources.
  • I hope to spend even more time writing in 2017. I published Haskell Tutorial and Cookbook at the end of last year and my two current writing projects are Introduction to Cognitive Computing and Haskell Cookbook, Volume 2. My wife Carol helps me with editing my books.
  • Cooking and the science of food is a core personal interest, and I hope to spend a fair amount of time applying AI and machine learning to my recipe site cookingspace.com. Currently I use the USDA Nutrition Database to estimate the amount of core nutrients in recipes and use a machine learning model to predict what additional ingredients would taste good in any recipe (a sketch of the nutrient estimation idea follows this list). I am rewriting the core analysis code in Ruby (the latest version is in Clojure), and I plan major web site updates as well as using the RubyMotion development tools to write apps that use the same analysis code on iOS, Android, and macOS.
  • Spending even more time hiking, kayaking, and at the gym.
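
As mentioned in the list above, here is a sketch of the nutrient estimation idea. It is a minimal illustration with made-up nutrient values (the real site uses the USDA Nutrition Database, and my analysis code is in Clojure/Ruby rather than Python): estimate a recipe's core nutrients by scaling per-100-gram ingredient values by the amounts used.

# Hypothetical per-100g nutrient values; the real data comes from the
# USDA Nutrition Database.
NUTRIENTS_PER_100G = {
    "brown rice":  {"protein": 2.6, "carbs": 23.0, "fat": 0.9},
    "black beans": {"protein": 8.9, "carbs": 23.7, "fat": 0.5},
}

def estimate_nutrients(recipe):
    """recipe is a list of (ingredient, grams) pairs."""
    totals = {"protein": 0.0, "carbs": 0.0, "fat": 0.0}
    for ingredient, grams in recipe:
        for nutrient, per_100g in NUTRIENTS_PER_100G[ingredient].items():
            totals[nutrient] += per_100g * grams / 100.0
    return totals

print(estimate_nutrients([("brown rice", 200), ("black beans", 150)]))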


Saturday, December 10, 2016

Benefits of static web sites hosted on Google Cloud Storage, Azure Storage, or Amazon S3

Most of my sites have no dynamic content, but I often still hosted them as Ruby Sinatra apps or PHP sites.

A few years ago I experimented with the static site generator Jekyll but still hosted the generated site on one of my own servers using nginx. After a while I decided to revert the site to its old implementation as a Sinatra web app (even though the site was always static; that is, no server-side actions were required except serving up static HTML, JS, and CSS files).

I am now using a far superior setup, and I am going to document it here for myself; perhaps other people will find it useful as well:

I chose Google Cloud Storage for personal reasons (I used to work as a contractor at Google, I fondly remember their infrastructure, and using GCP feels slightly similar), but Amazon S3 or Microsoft Azure is also simple to set up.

Start by installing Harp and a static site local web server:

npm install -g harp
npm install -g local-web-server


Harp by default uses Jade, and I spent about 10 minutes of "furious editing" for each of my static sites to convert to the Jade format from HTML, ERB, or PHP files. This is optional, but I like Jade and figured it would save me maintenance effort long term. As you edit your site, use "harp server" to test it locally. When you compile a web site using "harp compile", a subdirectory www is created with your static site ready to deploy. You can test the generated static site using "cd www; ws", where ws is the local web server you just installed.

You need to create a storage bucket with your domain name, which for this example we will refer to as DOMAIN.COM. I created two buckets, DOMAIN.COM and www.DOMAIN.COM, and for www.DOMAIN.COM I created a single index.jade file (that gets compiled to www/index.html) that just has both an HTML redirect header and JavaScript for a redirect to DOMAIN.COM.
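
For reference, here is a minimal sketch of what such a redirect file can look like (my actual index.jade may differ):

//- index.jade: compiles to www/index.html, which redirects to DOMAIN.COM
doctype html
html
  head
    meta(http-equiv='refresh', content='0; url=http://DOMAIN.COM/')
  body
    script.
      window.location.replace("http://DOMAIN.COM/");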

The only part of this process that takes a little time is proving to Google that you own the domain, if you have not done so in the past. Just follow the instructions when creating the buckets and then copy your local files:

cd www
gsutil -m rsync -R . gs://DOMAIN.COM
gsutil defacl ch -u AllUsers:R gs://DOMAIN.COM

I also had to manually set the static files copied to GCS to have public access. You will have to change the DNS settings for your site to create CNAME records for both DOMAIN.COM and www.DOMAIN.COM pointing to c.storage.googleapis.com. Whenever you edit a local file, use "cd www; gsutil -m rsync -R . gs://DOMAIN.COM" to re-sync with your GCS bucket.

After waiting a few minutes, test to make sure your site is visible on the web. For one of my sites I also used a free Cloudflare service for HTTPS support. This is very easy to set up if you already have a Cloudflare login. Just add a free web site and create the same two CNAME definitions pointing to c.storage.googleapis.com, and then Cloudflare will give you two DNS servers of their own that you need to use instead of whatever DNS service you were using before.


Saturday, November 19, 2016

My new Haskell book Haskell Tutorial and Cookbook now available

My new Haskell book Haskell Tutorial and Cookbook is now available for a minimum price of $4.

This book has a Creative Commons share and share alike, no commercial use license - so you can legally (and with my blessings) share it with your friends.

Monday, September 05, 2016

Great short video by Douglas Rushkoff that summarizes his latest book 'Throwing Rocks at the Google Bus'

Those of you who know me in "real life" might remember me talking about how much I enjoyed Rushkoff's newest book, published recently. Well, he just put out a short video that summarizes many of the useful and interesting best parts of the book: YouTube Link

I especially like the part where he explains why family-owned businesses are so much more stable than other businesses. Makes sense. He also explains how the economy has worked since the Dark Ages, and how modern technology platforms, while mostly a good thing, necessitate doing things differently now, or else.

Anyway, enjoy the video, or not :-)

Wednesday, August 24, 2016

The Julia programming language: amazingly nice

Well, at least I am amazed. I took a brief look at Julia a few years ago, but since I understood it to be somewhat derivative of GNU/Octave (or Matlab) and R (I sometimes use GNU/Octave, but not often), I didn't look much further at the time.

Fortunately, a current customer uses Julia, so I have been ramping up on the language and I like it very much. A bit off topic, but I would like to give a shout-out to the O'Reilly Safari Books Online service, which I recently joined when they had a $200/year guaranteed-for-life subscription price (half the regular price). I am reading "Getting Started with Julia" by Ivo Balbaert, which is fine for now. I have "Julia for Data Science" by Zacharias Voulgaris and "Mastering Julia" by Malcolm Sherrington in my reading queue. When learning a new technology, having up-to-date books available really is better than relying only on information found on the web (or at least a good augmentation to it).

I very much like the tooling for Julia. Julia is a new language, but there are already many useful libraries available. Julia uses GitHub for storing the modules in its standard library, and the integration works very well, at least on Ubuntu Linux. So far, I have been happy just using gedit for development. I haven't tried Julia on OS X or Windows 10.

The Julia REPL is great! Color coding and auto-completion are especially well done.

I like just about everything about the Julia language except for 1-based indexing of matrices. Oh well.

Julia is readable, functions are first-class objects, and programming in Julia is very "Lisp like." With optional type hints (mostly on function arguments), Julia is a very high-performance language. I love developing in Ruby, but I do dream of much higher performance. Julia does not seem like a complete replacement for Ruby (for me), however. That might change.

In addition to doing work with Julia, I have also been experimenting with lots of little coding projects: the Merly web framework (simple, sort of like Sinatra), the standard HiddenMarkovModels library, and a few of the neural network libraries. All good stuff.


Sunday, August 21, 2016

My prediction: Immersive real-time VR in Olympic closing ceremonies in 8 years

My wife and I are watching the closing ceremonies right now. Great visual effects that will be even better with immersive virtual reality. I expect that in 8 years we will have the option of changing our point of view from the stands to down on the central floor in a completely immersive VR experience with 3D sound and head tracking.

I haven't worked in VR in almost 20 years, since I helped found the virtual reality systems division at SAIC (where I handled 3D sound with head-related transfer functions, motion, haptics, and some graphics) and then, a year later, did a virtual reality project for Disney while working at Angel Studios. Even though I don't work in VR anymore, I am a huge fan and I have high expectations for the user experiences to come.

Sunday, August 14, 2016

I was surprised that so many of the NAACL 2016 papers described deep learning projects

I attended the North American Chapter of the Association for Computational Linguistics (NAACL) conference this past June in San Diego. Here is a link to the published papers.

The conference was great. The keynote talks, panel discussions, and the talks I attended were interesting! As an independent consultant I paid my own way to the conference, and I found it to be a good investment. Someday I would enjoy attending a European Chapter of the ACL conference.


Saturday, August 13, 2016

Some new love for Scala and Python

I am a practical developer. I do have my favorite programming languages (Ruby, Haskell, Clojure, and Java) but I tend to look first at what libraries are available in different languages for whatever project I am currently working on.

I did a lot of work in machine learning in the 1980s (mostly in neural networks) and since then I have probably spent about 15% of my work time directly working on machine learning problems. That has changed in the last few years since several of my consulting customers wanted help spinning up on machine learning.

I have used Scala a fair amount, but it has never been a "favorite language," mostly because I didn't care for the tooling. Now I find myself motivated to use Scala because of the awesome Apache MLlib and Breeze machine learning libraries. Also, I have solved my "tooling problem" for Scala development; if you are interested, here is my setup: I use a remote high-memory, high-CPU server instance for fast builds. I used to use IntelliJ for Scala development, but now I just keep an SBT console open and use Emacs with Ensime and sbt-mode over SSH shells. This is a simple setup, but now I am happier using Scala.

I have also been spending a fair amount of time with Google's TensorFlow deep learning tools and the easiest path to solving problems with TensorFlow is working in Python. If you are interested, I do almost all of my work with Python using the free community edition of PyCharm.

So, in general I am trying to avoid the "use my favorite programming language" trap. The joy is in solving problems, not in using a favorite language and software stack.

Friday, July 15, 2016

Using TensorFlow Deep Learning neural network with the University of Wisconsin cancer data set

My example of using a TensorFlow Deep Learning neural network to build a prediction model using the University of Wisconsin cancer data: https://github.com/mark-watson/cancer-deep-learning-model

This short example also shows how to use CSV files with TensorFlow. It took me a while to get my data in CSV files into TensorFlow, so hopefully this complete example, with data, will save people a little time.

Look at the source code for a documentation link if you want to change default parameters like using L1 or L2 regularization, etc.
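
To give a flavor of the approach, here is a minimal sketch, not the exact code from the repository: it assumes a CSV file named train.csv with 9 numeric feature columns followed by a 0/1 label, and it uses the current tf.keras API rather than the older TensorFlow interfaces.

import numpy as np
import tensorflow as tf

# Load the CSV into feature and label arrays.
data = np.loadtxt("train.csv", delimiter=",")
features, labels = data[:, :9], data[:, 9]

# A small deep network for binary classification; L2 regularization can be
# added to any layer via kernel_regularizer.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="relu", input_shape=(9,)),
    tf.keras.layers.Dense(20, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(0.001)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(features, labels, epochs=100, batch_size=32)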

Friday, June 10, 2016

Action items after attending the Decentralized Web Summit

I attended the first six hours of the Decentralized Web Summit on Wednesday (I had to leave early to attend a family event). There were great talks and panel sessions, and it was fun to have conversations with Tim Berners-Lee and Vint Cerf, and also to say hello to Cory Doctorow. I would like to thank all of the people I talked with during breaks, breakfast, and lunch: good conversations and shared ideas. The basic theme was what we as technologists can do to "lock open the web" to prevent governments and corporations from removing privacy and freedoms in the future.

There was a lot of discussion of why the GPL is a powerful tool for maintaining freedom. The call to action for the summit was (quoting from the web site): "The current Web is not private or censorship-free. It lacks a memory, a way to preserve our culture’s digital record through time. The Decentralized Web aims to make the Web open, secure and free of censorship by distributing data, processing, and hosting across millions of computers around the world, with no centralized control."

I have been thinking about my own use of the Internet, the trade-offs I sometimes make in order to have an easier and more polished web experience, and things that I will try to do differently. My personal list of action items, which I have already started on, is:

  1. Separate my working use of computers from my mobile experience: on my Linux laptop, I set maximal privacy settings in IceCat (a privacy-tuned Firefox) and avoid social media (Twitter, Facebook, and Google+). I also use Fastmail for most email that does not involve travel arrangements.
  2. For convenience while traveling, I allow myself to use Google Inbox on my Android phone for email related to travel arrangements and Google Now alerts (travel reminders, etc.), and to generally use social media. I had been merging all of my email together, but I have now started to keep GMail distinctly separate from my personal email account on Fastmail.
  3. Re-evaluate my use of cloud services. I am experimenting with GNU Note (GNote) for note taking on my Linux laptop. I am continuing my practice of encrypting backups (saved as date-versioned ZIP files) before transferring them to OneDrive, Dropbox, and Google Drive; using three cloud storage services effectively gives me three backup locations.

Modern smartphones are not privacy-friendly devices, and I have decided to just live with some compromises. On the other hand, on my Linux laptop, used for writing and consulting work, I am attempting to take all reasonable steps to maintain privacy and security.

Tim Berners-Lee mentioned the W3C Solid design and reference implementations for decentralized identity, authorization, and access control. The basic idea is that each user has common decentralized data, kept secure and private, that can be used by that user's multiple client applications.

In the past, I have tried running my own instance of Apache Wave (formerly Google Wave) and asking family and friends to use it as our personal social media. To be honest, the people I know mostly didn't want to use it. Since I view my smartphone as already "damaged goods" as far as privacy goes, I will continue using it to check social media like Facebook, Google+, and Twitter. I have been trying to use GNU Social more often (my feed is https://quitter.no/markwatson), and I do use GNU Social on my Linux laptop.

Last week a friend of mine asked me why I care about privacy and protecting the web against corporate and governmental overreach. That is not an easy question to answer briefly. Certainly, laws like the Digital Millennium Copyright Act have a chilling effect by making it legally dangerous for security experts to evaluate the safety of electronic devices such as medical equipment. Studies have shown that a lack of privacy also has a chilling effect on the exercise of free speech. In addition to my own practices, one of the best things I do to help as an individual is make donations to the FSF, EFF, ACLU, Mozilla, and archive.org.

Wednesday, June 01, 2016

As AI systems make more decisions, we need Libre Software now more than ever

I have been using AI technology on projects since the 1980s (first mostly symbolic AI, then neural networks and machine learning), and in addition to the exponentially growing progress, the other thing that strikes me is how a once-small community of AI developers has grown, perhaps by almost three orders of magnitude, in the number of people working in the field. As the new conventional wisdom goes, AI services will be like cloud computing services and power: ubiquitous.

As AI systems decide what medical care people get, who gets mortgages and at what interest rates, and how employees are ranked in large organizations; as nation states automatically determine who is a threat to their power base or public safety; and as software controls driverless cars, maintains detailed information on everyone, and drives purchasing decisions, having some transparency in algorithms and software implementations is crucial.

Notice how I put "Libre Software" in the title, not "Open Source." While business friendly permissive licenses like Apache 2 and MIT are appropriate for many types of projects, Libre Software licenses like the GPL3 and AGPL3 will ensure that the public commons of AI algorithms and software implementation stays open and transparent.

What about corporations maintaining their proprietary intellectual property for AI? I am sensitive to this issue but I still argue that the combination of a commons of Libre open source AI software with proprietary server infrastructure and proprietary data sources should be sufficient to protect corporations' investments and competitive advantages.

Monday, May 16, 2016

Writing technical books: the craft of simplifying ideas

I am in the process of writing a fairly broad book on setting up a laboratory for cognitive technology / artificial intelligence. I don't find writing to be easy, but I enjoy the process a lot. The main problem that I have is removing unnecessary materials and ideas, leaving just enough so readers can understand the core ideas and experiment with them using example programs. Unnecessary complexity makes understanding difficult and generally does not help readers solve their specific problems.

If a reader understands the core ideas then they will know when to apply them. It is easy enough, when working on a project, to dig down as necessary to learn and solve problems but the difficult thing for most people is knowing what ideas and technologies might work.

In my field (artificial intelligence) the rate of progress has accelerated greatly, leading to much complexity and thus increasing difficulty just to "keep up" with new advances. I organize my thoughts by using a rough hierarchy of classes of useful technologies and form a taxonomy by mentally mapping problems / applications to the most appropriate class in this hierarchy. When I read a new paper or listen to a talk on YouTube, I try to place major themes or technologies into this hierarchy and when someone describes a new problem to me, I try to match the problem with the correct classification in my hierarchy of solutions / technologies. Vocabulary is important to me because I organize notes in small text files that might contain a synopsis of a web page with a URI, a business idea, interesting ideas from reading material, etc. Key vocabulary words are the search terms for finding relevant notes.

Monday, April 04, 2016

I am enjoying my business trip in Singapore

Singapore is a great place: a well run country that caters to business.

Sunday:

After a surprisingly easy trip from Arizona to Singapore I am getting settled in. No jet lag today! I took a very long walk first thing this morning and took a few pictures: https://onedrive.live.com/redir?resid=96BA71CF7F82BA59!56414&authkey=!AJdcHJmgLjMlwWM&ithint=folder%2c 

The first picture is the sunset from 30,000 feet as we started to descend into the Hong Kong airport, followed by a picture after landing. Carol and I have been to Hong Kong before, so that was nice and familiar. I didn't have a window seat landing in Singapore, so no photos from the air. The other pictures are from outside my hotel in Singapore, deep inside the MRT (rapid transit center), and generally around the neighborhood. Singapore has a very nice feel about it. Everything from airport services, port of entry, transportation, and hotel accommodations is well run, friendly, and first rate!

Late morning I took a second long walk, so some more pictures:

https://onedrive.live.com/redir?resid=96BA71CF7F82BA59!56426&authkey=!APSbUaijMVgBxOo&ithint=folder%2c 

The Indian Temple in the pictures was open for some sort of service today. I took off my shoes and entered. There was a holy man in a loincloth near the entrance, and he greeted me warmly enough that I felt comfortable staying in an out-of-the-way spot and meditating a bit. Nice place. I didn't take any pictures inside the Temple because I remember that in Jain and other temples in India, photography was not appreciated. Inside, there were many pictures and sculpted reliefs of Ganesha, the elephant God, and you can see similar ones in the close-up picture of the Temple roof.

Late this afternoon, I wanted to get in a little more sightseeing before my work week starts tomorrow morning, so I walked to the Buddha Tooth Relic Temple and Museum.

Before leaving my hotel I visited its rooftop garden, and while I was there monsoon-strength rain started - a very sudden event, since it had been sunny all day. I waited under an umbrella in the garden for the heavy rain to stop and then did the walk in a light drizzle. The Buddha Tooth Relic Temple was packed with people, a few tourists but mostly devotees.

Monday:

I started work today. I am here for two weeks and I am enjoying working on a great project.

Monday, March 21, 2016

In defense of iPads as productivity devices

I often hear or read people referring to iPads as toys. I don't agree.

I use my iPad Pro as a "productivity device": multiple SSH terminals open at the same time to my servers, the publishing system I now use to write my books, and cloud-based note taking and research (using Google Keep, Evernote, Word, Notes, etc.). I also read eBooks, listen to audio books, and my wife and I use it to watch Hulu TV, Netflix, HBO Go, and purchased Google Play movies and TV shows.

I find the iPad an awesomely useful device. I mostly use my laptops for software development, but since I use Emacs for Lisp, Haskell, and Ruby, with multiple SSH terminals that I can flip between quickly, the iPad also supports programming.

I do spend a fair amount of time in IDEs like RubyMine and IntelliJ on one of my 4 laptops, but I just prefer mobile devices whenever I can use them. In addition to my iPad Pro, I also get a lot of use out of my iPad mini 4 and Android Note 4 phone. The trick is having all of my data available on all devices and realizing that most of the value of a knowledge worker (a software developer in my case) comes from thinking to understand problems rather than typing on a keyboard.

Wednesday, March 09, 2016

History in the making: first Lee Sedol vs. AlphaGo match game

I was up until 1am this morning watching the game live. I became interested in AI in the 1970s, when the game of Go was considered a benchmark for AI systems. I wrote a commercial Go playing program for the Apple II that did not play a very good game by human standards but played legally and understood some common patterns. At about the same time I was fortunate enough to play both the women's world Go champion and the national champion of South Korea in exhibition games.

I am a Go enthusiast!

The game played last night was a real fight in three areas of the board, and in Go local fights affect the global position. AlphaGo played really well, and world champion Lee Sedol resigned near the end of the game.

Saturday, March 05, 2016

OK, now I remember why I like Ruby: reading through the code for the Reality Wikipedia/DBPedia interface

I have been diving deep into Haskell this year, largely working on examples for the Haskell tutorial and cookbook-style book I am writing. I was revisiting some of my own (old) code for using Wikipedia/DBPedia data and ran across the very nice Reality library, which is written in Ruby. Reality is very much better than my old code, and I enjoyed looking at the implementation.

Ruby and Haskell complement each other in the sense that they are at opposite ends of the programming language spectrum. If you were forced to use only two programming languages, Ruby and Haskell would be good choices. Ruby, like Clojure, has ready access to the vast Java ecosystem via JRuby, so the combination of Haskell and Ruby really does cover the bases.

The ability to integrate real-world data as found in Wikipedia/DBPedia into systems is a powerful idea. In building AI systems, large companies like Google, Facebook, and Microsoft preprocess and use available world knowledge (I worked for a while with the Knowledge Graph at Google, so I know their process, and I assume that Microsoft and Facebook are similar). For small organizations and hobbyists/enthusiasts, however, caching and indexing the world's knowledge just isn't possible, but some of the same effect can be had by making live API calls to DBPedia, Wikidata, etc.
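
For example, here is a minimal sketch of such a live API call (in Python rather than Ruby, and assuming only the public DBPedia SPARQL endpoint and the requests library) that fetches the English abstract for an entity:

import requests

query = """
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Singapore>
      <http://dbpedia.org/ontology/abstract> ?abstract .
  FILTER (lang(?abstract) = 'en')
} LIMIT 1
"""
resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["abstract"]["value"])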

While I appreciate the work the 800-pound gorillas (Google/Microsoft/Facebook) are doing, I also hope that a rich, cooperating ecosystem of small organizations continues to claim relevance in building systems that help everyone integrate their own data / knowledge / experience with the deep knowledge that we all (hopefully) contribute to on the web.

I find myself pushing back against the "gorillas" by preferring, when feasible, to participate in community efforts. A good example is using GNU Social as a partial replacement to Google+, Facebook, and Twitter (you can follow me on GNU Social at quitter.no/markwatson). In a similar way, I hope that developers contribute to and use good open source projects that support deep knowledge management, deep learning (yeah, "deep" is probably used too often), and AI in general.

In a world where global corporate powers centralize power and control, I believe that it becomes more important for people to make personal decisions to support local businesses, care about the financial and environmental health of their local communities, and continue to use the Internet and the WWW to promote individualism and community, not globalism.

Tuesday, January 26, 2016

Great talk on Spark

I just listened to an ACM sponsored talk, Making Big Data Processing Simple With Spark, by Matei Zaharia. You may need to be an ACM member to watch the webinar. I first joined ACM in the mid 1970s - recommended.

For handling huge datasets, Spark is evolutionary or revolutionary, depending on your point of view. A bit of personal history before I talk specifically about Spark:

In the late 1980s I was an architect and developer on a multinational project that used seismic data from 38 data collection stations to detect atomic bomb tests. All of our data handling software was custom; if we had had Spark, or even Hadoop, we would have saved a ton of effort. Similarly, in the 1990s I was tech lead on a fraud detection system that used massive real-time telephone record data sets. Modern infrastructure would have saved a lot of time and money.

My first serious use of map reduce was processing large Twitter data sets at Compass Labs, where we used Hadoop on Amazon Elastic MapReduce. Later, when I worked as a contractor at Google, in addition to using map reduce I was introduced to real-time interactive tools like Dremel that made it easy to work interactively with large data sets.

With Spark, everyone gets to interactively work with massive datasets! I think that Spark is evolutionary in that it builds on and plugs into existing work like the Hadoop File Sytem and supports familiar map reduce style operations. I think that it is revolutionary in the memory based distributed architecture and application programming model. Spark was designed based on limitations of map reduce systems like Hadoop that while providing easy to use programming models, have ineffiencies in data access. With Spark, you have an easy to use programming model, more efficiency, and built in interactivity. I have examples of using Spark in my last book Power Java. You can experiment with Spark on your laptop and only worry about accessing a cluster when you need to scale.