Friday, July 15, 2016

Using TensorFlow Deep Learning neural network with the University of Wisconsin cancer data set

My example of using a TensorFlow Deep Learning neural network to build a prediction model using the University of Wisconsin cancer data: https://github.com/mark-watson/cancer-deep-learning-model

This short example also shows how to use CSV files with TensorFlow. It took me a short while getting my data in CSV files into TensorFlow so hopefully this complete example, with data, will save people a little time.

Look at the source code for a documentation link if you want to change default parameters like using L1 or L2 regularization, etc.

Friday, June 10, 2016

Action items after attending the Decentralized Web Summit

I attended the first six hours of the Decentralized Web Summit on Wednesday (I had to leave early to attend a family event). Great talks and panel sessions and it was nice to say hello to Tim Berners-Lee, Vint Cerf, and Cory Doctorow. I would like to thank all of the people I talked with during breaks, breakfast, and lunch: good conversations and shared ideas. The basic theme was what we as technologists can do to "lock open the web" to prevent governments and corporations from removing privacy and freedoms in the future.

There was a lot of discussion of why the GPL is a powerful tool for maintaining freedom. The call to action for the summit was (quoting from the web site) "The current Web is not private or censorship-free. It lacks a memory, a way to preserve our culture’s digital record through time. The Decentralized Web aims to make the Web open, secure and free of censorship by distributing data, processing, and hosting across millions of computers around the world, with no centralized control."

I have been thinking about my own use of the Internet, the trade-offs that I sometimes make in order to have an easier and more polished web experience, and the things that I will try to do differently. My personal list of action items, which I have already started on, is:

  1. Separate my working use of computers from my mobile experience: on my Linux laptop, setting maximal privacy settings on IceCat (privacy tuned Firefox) and avoiding social media use (Twitter, Facebook, and Google+). I also use Fastmail for most email that does not involve travel arrangements.
  2. For convenience when travelling, I allow myself on my Android phone to use Google Inbox for email related to travel arrangements, Google Now alerts (travel reminders, etc.) and generally use social media. I had been merging all of my email together but I have now started to keep Gmail distinctly separate from my personal email account on Fastmail.
  3. Re-evaluating the use of Cloud Services. I am experimenting with using GNU Note (GNote) for note taking on my Linux laptop. I am continuing my practice of encrypting backups (saved as date-versioned ZIP files) before transferring to OneDrive, Dropbox, and Google Drive. I have been using three Cloud storage services to effectively have three backup locations.

Modern smartphones are not privacy friendly devices and I decided to just live with some compromises. On the other hand, on my Linux laptop used for writing and consulting work, I am attempting to take all reasonable steps to maintain privacy and security.

Tim Berners-Lee mentioned the W3C Solid design and reference implementations for decentralized identity, authorization, and access control. The basic idea is to keep each user's data in a common decentralized store that is secure and private, and that all of the user's client applications can share.

In the past, I have tried running my own instance of Apache (used to be Google) Wave and asking family and friends to use it as our personal social media. To be honest, people I know mostly didn't want to use it. Since I view my smartphone as already "damaged goods" as far as privacy goes, I will continue using it to check social media like Facebook, Google+ and Twitter. I have been trying to use GNU Social more often (my feed is https://quitter.no/markwatson). I do use GNU Social on my Linux laptop.

Last week a friend of mine asked me why I care about privacy and protecting the web against corporate and governmental overreach. That is not an easy question to answer with a simple short answer. Certainly, laws like the Digital Millennium Copyright Act have a chilling effect, making it legally dangerous for security experts to evaluate the safety of electronic devices such as medical devices. Studies have shown that the lack of privacy has a chilling effect on exercising the right of free speech. In addition to my own practices, one of the best things that I do as an individual to help is making donations to the FSF, EFF, ACLU, Mozilla, and archive.org.

Wednesday, June 01, 2016

As AI systems make more decisions, we need Libre Software now more than ever

I have been using AI technology on projects since the 1980s (first using mostly symbolic AI, then neural networks and machine learning). In addition to the exponentially growing progress, the other thing that strikes me is how a once small community of AI developers has grown by perhaps almost three orders of magnitude. As the new conventional wisdom goes, AI services will be like cloud computing services and power: ubiquitous.

As AI systems decide what medical care people get, who gets mortgages and at what interest rates, how employees are ranked in large organizations, and whom nation states automatically flag as a threat to their power base or public safety, and as they control driverless cars, maintain detailed information on everyone, and drive purchasing decisions, having some transparency in algorithms and software implementation is crucial.

Notice how I put "Libre Software" in the title, not "Open Source." While business friendly permissive licenses like Apache 2 and MIT are appropriate for many types of projects, Libre Software licenses like the GPL3 and AGPL3 will ensure that the public commons of AI algorithms and software implementation stays open and transparent.

What about corporations maintaining their proprietary intellectual property for AI? I am sensitive to this issue but I still argue that the combination of a commons of Libre open source AI software with proprietary server infrastructure and proprietary data sources should be sufficient to protect corporations' investments and competitive advantages.

Monday, May 16, 2016

Writing technical books: the craft of simplifying ideas

I am in the process of writing a fairly broad book on setting up a laboratory for cognitive technology / artificial intelligence. I don't find writing to be easy but I enjoy the process a lot. The main problem that I have is removing unnecessary materials and ideas, leaving just enough so readers can understand the core ideas and experiment with these core ideas using example programs. Unnecessary complexity makes understanding difficult and generally does not help a reader solve their specific problems.

If a reader understands the core ideas then they will know when to apply them. It is easy enough, when working on a project, to dig down as necessary to learn and solve problems but the difficult thing for most people is knowing what ideas and technologies might work.

In my field (artificial intelligence) the rate of progress has accelerated greatly, leading to much complexity and thus increasing difficulty just to "keep up" with new advances. I organize my thoughts by using a rough hierarchy of classes of useful technologies and form a taxonomy by mentally mapping problems / applications to the most appropriate class in this hierarchy. When I read a new paper or listen to a talk on YouTube, I try to place major themes or technologies into this hierarchy and when someone describes a new problem to me, I try to match the problem with the correct classification in my hierarchy of solutions / technologies. Vocabulary is important to me because I organize notes in small text files that might contain a synopsis of a web page with a URI, a business idea, interesting ideas from reading material, etc. Key vocabulary words are the search terms for finding relevant notes.

Monday, April 04, 2016

I am enjoying my business trip in Singapore

Singapore is a great place: a well run country that caters to business.

Sunday:

After a surprisingly easy trip from Arizona to Singapore I am getting settled in. No jet lag today! I took a very long walk first thing this morning and took a few pictures: https://onedrive.live.com/redir?resid=96BA71CF7F82BA59!56414&authkey=!AJdcHJmgLjMlwWM&ithint=folder%2c 

The first picture is the sunset from 30K feet starting to descend into the Hong Kong airport, then a picture after landing. Carol and I have been in Hong Kong, so that was nice and familiar. I didn't have a window seat landing in Singapore, so no photos from the air. The other pictures are from outside my hotel in Singapore, deep inside the MRT (rapid transit center), and generally around the neighborhood. Singapore has a very nice feel about it. Everything from airport services, port of entry, transportation, hotel accommodations, etc. is well-run, friendly and first rate!

Late morning I took a second long walk, so some more pictures:

https://onedrive.live.com/redir?resid=96BA71CF7F82BA59!56426&authkey=!APSbUaijMVgBxOo&ithint=folder%2c 

The Indian Temple in the pictures was open for some sort of service today. I took off my shoes then entered it. There was a holy man in a loin cloth near the entrance and he greeted me warmly enough that I felt comfortable staying in an out of the way place, where I meditated a bit. Nice place. I didn't take any pictures inside the Temple because I remember that in Jain and other temples in India photography was not appreciated. Inside, there were many pictures and sculpture reliefs of Ganesha, the elephant god, and you can see similar ones in the close up picture of the Temple roof.

Late this afternoon, I wanted to get a little more sightseeing in before my work week starts tomorrow morning so I walked to the Buddha Tooth Relic Temple and Museum.

Before leaving my hotel I visited a rooftop garden in the hotel and while I was there it started monsoon-strength rains - a very sudden event since it had been sunny all day. I waited under an umbrella in the garden for the heavy rain to stop and then did the walk in a light drizzle. The Buddha Tooth Relic Temple was packed with people, a few tourists but mostly devotees.

Monday:

I started work today. I am here for two weeks and I am enjoying working on a great project.

Monday, March 21, 2016

In defense of iPads as productivity devices

I often hear or read people referring to iPads as toys. I don't agree.
I use my iPad Pro as a "productivity device." I keep multiple SSH terminals open at the same time to my servers, and I use the publishing system I now use to write my books and cloud based note taking and research (using Google Keep, Evernote, Word and Notes, etc). I also read eBooks, listen to audio books, and my wife and I use it to watch Hulu TV, Netflix, HBO Go, and purchased Google Play movies and TV shows.
I find the iPad an awesomely useful device. I only use my laptops for software development and since I use Emacs for Lisp, Haskell, and Ruby, with multiple SSH terms that I can flip between quickly, the device also supports programming.
I do spend a fair amount of time in IDEs like RubyMine and IntelliJ on one of my 4 laptops, but I just prefer mobile devices whenever I can use them. In addition to my iPad Pro, I also get a lot of use out of my iPad mini 4 and Android Note 4 phone. The trick is having all of my data available on all devices and realizing that most value of a knowledge worker (software developer in my case) comes from thinking to understand problems rather than typing on a keyboard.

Wednesday, March 09, 2016

History in the making: first Lee Sedol vs. AlphaGo match game

I was up until 1am this morning watching the game live. I became interested in AI in the 1970s and the game of Go was considered to be a benchmark for AI systems. I wrote a commercial Go playing program for the Apple II that did not play a very good game by human standards but did play legally and understood some common patterns. At about the same time I was fortunate enough to get to play both the women's world Go champion and the national champion of South Korea in exhibition games.

I am a Go enthusiast!

The game played last night was a real fight in three areas of the board and in Go local fights affect the global position. AlphaGo played really well and world champion Lee Sedol resigned near the end of the game.

Saturday, March 05, 2016

OK, now I remember why I like Ruby: reading through the code for the Reality Wikipedia/DBPedia interface

I have been diving deep this year using Haskell, largely in working on examples for the Haskell tutorial and cookbook-style book I am writing. I was revisiting some of my own (old) code for using Wikipedia/DBPedia data and I ran across the very nice Reality library which is written in Ruby. Reality is so very much better than my old code and I enjoyed looking at the implementation.

Ruby and Haskell complement each other in the sense that they are at opposite ends of the programming language spectrum. If you were forced to use only two programming languages, Ruby and Haskell would be good choices. Ruby, like Clojure, has ready access to the vast Java ecosystem via JRuby so the combination of Haskell and Ruby really does cover the bases.

The ability to integrate real world data as found in Wikipedia/DBPedia into systems is a powerful idea. In building AI systems, large companies like Google, Facebook, and Microsoft preprocess and use available world knowledge (I worked for a while with the Knowledge Graph at Google, so I know their process and I assume that Microsoft and Facebook are similar). For small organizations and hobbyists/enthusiasts, caching and indexing the world's knowledge just isn't possible, but some of the same effect can be had by making live API calls to DBPedia, Wikidata, etc.

While I appreciate the work the 800 pound gorillas (Google/Microsoft/Facebook) are doing, I also hope that a rich cooperating ecosystem of small organizations continues to also claim relevance in building systems that help everyone integrate their own data / knowledge / experience with the deep knowledge that we all (hopefully) contribute to on the web.

I find myself pushing back against the "gorillas" by preferring, when feasible, to participate in community efforts. A good example is using GNU Social as a partial replacement for Google+, Facebook, and Twitter (you can follow me on GNU Social at quitter.no/markwatson). In a similar way, I hope that developers contribute to and use good open source projects that support deep knowledge management, deep learning (yeah, "deep" is probably used too often), and AI in general.

In a world where global corporate powers centralize power and control, I believe that it becomes more important for people to make personal decisions to support local businesses, care about the financial and environmental health of their local communities, and continue to use the Internet and the WWW to promote individualism and community, not globalism.

Tuesday, January 26, 2016

Great talk on Spark

I just listened to an ACM sponsored talk, "Making Big Data Processing Simple With Spark" by Matei Zaharia. You may need to be an ACM member to watch the webinar. I first joined ACM in the mid 1970s - recommended.

For handling huge datasets Spark is evolutionary or revolutionary depending on your point of view. A bit of personal history before I talk specifically about Spark:

In the late 1980s I was an architect and developer on a multinational project to use seismic data from 38 data collection stations to detect atomic bomb tests. All of our data handling software was custom; if we had Spark, or even Hadoop, we would have saved a ton of effort. Similarly, in the 1990s I was tech lead on a fraud detection system that used massive real time telephone records data sets. Modern infrastructure would have saved a lot of time and money.

My first serious use of map reduce was processing large Twitter data sets at Compass Labs. We used Hadoop on Amazon Elastic MapReduce. Later when I worked as a contractor at Google, in addition to using map reduce, I was introduced to realtime interactive tools like Dremel that made it easy to interactively use large data sets.

With Spark, everyone gets to interactively work with massive datasets! I think that Spark is evolutionary in that it builds on and plugs into existing work like the Hadoop File System and supports familiar map reduce style operations. I think that it is revolutionary in its memory based distributed architecture and application programming model. Spark was designed to address the limitations of map reduce systems like Hadoop, which provide easy to use programming models but have inefficiencies in data access. With Spark, you have an easy to use programming model, more efficiency, and built in interactivity. I have examples of using Spark in my last book Power Java. You can experiment with Spark on your laptop and only worry about accessing a cluster when you need to scale.
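
For readers who have not used map reduce style operations, here is a minimal sketch of their shape in plain Haskell. It is not Spark code and does not use any Spark API: the function names (mapPhase, shuffle, reducePhase, wordCount) are made up for illustration, and everything runs in memory on one machine.

```haskell
import Data.Char (isAlpha, toLower)
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Map phase: emit a (word, 1) pair for every word in one line of text.
-- Non-letters are treated as separators and words are lower-cased.
mapPhase :: String -> [(String, Int)]
mapPhase line = [ (map toLower w, 1) | w <- words (map keep line) ]
  where
    keep c = if isAlpha c then c else ' '

-- Shuffle phase: bring together all values that share the same key.
shuffle :: [(String, Int)] -> [(String, [Int])]
shuffle pairs =
  [ (k, map snd g)
  | g@((k, _) : _) <- groupBy ((==) `on` fst) (sortOn fst pairs) ]

-- Reduce phase: combine the values collected for one key.
reducePhase :: (String, [Int]) -> (String, Int)
reducePhase (w, counts) = (w, sum counts)

-- The whole pipeline: map every line, shuffle by key, reduce each group.
wordCount :: [String] -> [(String, Int)]
wordCount = map reducePhase . shuffle . concatMap mapPhase
```

Loading this in GHCi and evaluating wordCount on a list of lines returns the words in alphabetical order with their counts. A distributed system runs the same three phases, but spreads the map and reduce work across many machines.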


Saturday, January 23, 2016

Simple Haskell: using a sqlite3 database

I have been using Lisp languages professionally since the early 1980s. While I now use Java, Ruby, and Clojure for much of my work, I have slowly been getting up to speed using Haskell over the last 5 years. Almost 100% of my difficulties using Haskell occur when I need to write impure Haskell code. This occasional discomfort is made up for by the fun and productivity of writing pure Haskell code. Using haskell-mode in Emacs I get the same happy feeling writing pure Haskell code that I used to get using Common Lisp, Scheme, and Clojure - and with the advantages of a strongly typed language!

I like to mock up test data and write the pure code first and then write impure code that needs to access the web, RDF data stores, relational databases, file IO, etc. For me, as a student of Haskell, this is the easiest way to write Haskell programs.

About 15 years ago, in one of my Java artificial intelligence books I wrote an example program that provides a natural language processing (NLP) interface to relational databases. I have decided that I would like to do the same, but in Haskell, and take advantage of what I have learned in the last 15 years. Writing the code to convert natural language queries into SQL queries is pure Haskell code (given mockup data for database metadata and sample table data, and test NLP queries) and I am enjoying working on that. Eventually I will need to write some impure code that accesses the popular databases. To make the initial development as easy as possible (a good idea since I may never totally finish this side project) I have decided that I will use sqlite and the sqlite-simple library. For the first proof of concept/prototype, I don't expect to need much impure code. A good thing!
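
To make the idea of pure code driven by mock metadata concrete, here is a minimal sketch. It is not the code from the book project: the schema, the table and column names, and the function toSql are all hypothetical, and the "natural language" handling is deliberately naive keyword matching.

```haskell
import Data.Char (toLower)
import Data.List (intercalate)

-- Mock database metadata: each table name paired with its column names.
-- (Hypothetical schema, used only for illustration.)
type Schema = [(String, [String])]

mockSchema :: Schema
mockSchema =
  [ ("customers", ["id", "name", "city"])
  , ("orders",    ["id", "customer_id", "total"])
  ]

-- Translate a query such as "show name and city from customers" into SQL.
-- The first table in the schema that is mentioned in the query is used, and
-- any of its columns that are mentioned are selected (in schema order).
-- Returns Nothing when no known table name appears in the query.
toSql :: Schema -> String -> Maybe String
toSql schema input =
  case [ (t, cols) | (t, cols) <- schema, t `elem` ws ] of
    []              -> Nothing
    ((t, cols) : _) ->
      let wanted = [ c | c <- cols, c `elem` ws ]
          select = if null wanted then "*" else intercalate ", " wanted
      in  Just ("SELECT " ++ select ++ " FROM " ++ t)
  where
    ws = words (map toLower input)
```

Because toSql has no IO in its type, it can be exercised in GHCi against mockSchema without any database present, which is the point of writing the pure part first.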

This reminds me of a comment Erik Meijer made when he was teaching the edX functional programming class. He said that as developers we can think of pure Haskell code as being islands and impure code that has to maintain state and interact with the world as the ocean containing the islands. I like this metaphor!
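
A minimal sketch of the islands and ocean picture, using only the Haskell Prelude. The names summarize and runReport are made up for this illustration: summarize is an island (no IO in its type) and runReport is a thin strip of ocean around it.

```haskell
-- Island: a pure function that formats a list of rows as a small report.
summarize :: [String] -> String
summarize rows =
  unlines (("rows: " ++ show (length rows)) : zipWith numbered [1 :: Int ..] rows)
  where
    numbered i r = show i ++ ": " ++ r

-- Ocean: the only impure code. It reads all of standard input, hands the
-- lines to the pure function, and writes the result to standard output.
runReport :: IO ()
runReport = interact (summarize . lines)
```

The pure function can be tested directly in GHCi with literal lists, while runReport stays small enough that there is little in it to get wrong.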

I write little code snippets (or sometimes mini-projects) to experiment with impure Haskell code and the following listing, derived from the sqlite-simple library, contains the small experiments with the functionality that I need for now. I thought it was worth sharing in case this saves anyone else some time:


{-# LANGUAGE OverloadedStrings #-}
import Database.SQLite.Simple

{-
   Create sqlite database:
     sqlite3 test.db "create table test (id integer primary key, str text);"

   This is derived from the example at github.com/nurpax/sqlite-simple
-}

main :: IO ()
main = do
  conn <- open "test.db"
  -- start by getting table names in database:
  do
    r <- query_ conn "SELECT name FROM sqlite_master WHERE type='table'" :: IO [(Only String)]
    -- with OverloadedStrings a bare literal passed to print has an
    -- ambiguous type, so the message strings are annotated as String:
    print ("Table names in database test.db:" :: String)
    mapM_ (print . fromOnly) r

  -- get the metadata for table test in test.db:
  do
    r <- query_ conn "SELECT sql FROM sqlite_master WHERE type='table' and name='test'" :: IO [(Only String)]
    print ("SQL to create table 'test' in database test.db:" :: String)
    mapM_ (print . fromOnly) r

  -- add a row to table 'test' and print out the rows in table 'test':
  do
    -- the ? placeholder is filled in from the Only value:
    execute conn "INSERT INTO test (str) VALUES (?)"
      (Only ("test string 2" :: String))
    r2 <- query_ conn "SELECT * from test" :: IO [(Int, String)]
    print ("number of rows in table 'test':" :: String)
    print (length r2)
    print ("rows in table 'test':" :: String)
    mapM_ print r2

  close conn

Just to make this example complete, here is my stack.yaml file:

resolver: lts-4.0
packages:
- '.'
extra-deps: []
flags: {}

And here is my sqlite.cabal file:

name:                sqlite
version:             0.1.0.0
synopsis:            Experiment with sqlite-simple
description:         Derived from example in github.com/nurpax/sqlite-simple
homepage:            https://github.com/mark-watson?tab=repositories
license:             Apache2
license-file:        LICENSE
author:              Mark Watson
maintainer:          [email protected]
copyright:           2016 Mark Watson
category:            Web
build-type:          Simple
-- extra-source-files:
cabal-version:       >=1.10

executable test1
  hs-source-dirs:      .
  main-is:             test1.hs
  ghc-options:         -threaded -rtsopts -with-rtsopts=-N
  build-depends:       base
                     , sqlite-simple
  default-language:    Haskell2010

Here is a build and sample run (assuming that the sqlite database test.db has been created as per the comments in the first source listing):

✗ stack build
✗ stack exec test1
"Table names in database test.db:"
"test"
"SQL to create table 'test' in database test.db:"
"CREATE TABLE test (id INTEGER PRIMARY KEY, str text)"
"number of rows in table 'test':"
3
"rows in table 'test':"
(1,"test string 2")
(2,"test string 2")
(3,"test string 2")

I would like to thank Janne Hellsten for maintaining the sqlite-simple library and I would also like to thank the developers of stack. Using stack has solved most of my build issues with Haskell. Thanks!

Thursday, January 14, 2016

I will not vote for Hillary Clinton. I reject the "lesser of two evils" argument.

I believe that Hillary Clinton is in the pocket of Wall Street, a lackey by any definition. I also believe that she is, as Ralph Nader says, a poster child for the military industrial complex. I also don't like her close ties to agribusiness giant Monsanto and her advocacy for the industry's genetically modified crops.

I believe that our two party system is broken, almost never giving us a choice that matches the preferences of the electorate. Corporate news organizations favor Clinton over Bernie Sanders in subtle and unfair ways: much of their discussion, slanted toward the financial interests of the network owners, assumes that Hillary Clinton will be the Democratic candidate and pushes the false narrative that Bernie Sanders has no chance of winning the general election.

Some of my friends who are Democrats believe that it is a mistake to not vote for whatever Democratic toady the establishment runs. What if a Republican wins? Oh NOoos! The sky will fall.

I believe that the sky will fall on our representative democracy if people don't stand up to the political establishment and the corporations that their preferred candidates represent.

Thursday, December 31, 2015

Happy New Year

Hello everyone. I want to wish everyone a happy new year and say a few things about what I expect for the new year.

I believe that one of the most important issues facing "first world" countries like the USA and England is Internet security and privacy. The umbrage of US congresspeople at this morning's news that the NSA is monitoring their communications with people in the Israeli government is laughable. Let us be clear about this: these people don't care about the privacy of US citizens but they do care about their own privacy and the privacy of leaders of another country. This stinks, and badly.

While privacy is important I believe that a bigger issue is security. I would like to see my government (USA) conduct a multi-year "going to the moon" type project for strengthening our information infrastructures to the benefit of people, companies, and governments. This means that there can be no encryption back doors installed in any software and hardware systems. If governments have universal decryption keys then eventually these keys will leak to organized crime, terrorists, and other governments. Imagine the scenario of everyone waking up some morning to emptied bank accounts - that is a possible scenario if 'back doors' are installed in public infrastructure.

On a happier note, the will of the people in my country regarding labeling of GMO foods has won out, at least for now. I believe that 90% of people in the USA poll in favor of accurate labeling of foods. When you consider the power of the food corporation lobbying bloc, with their paid for politicians (Hillary Clinton and most of the Republican presidential candidates, as well as most of Congress) this is a surprising but good victory. Yay!

Despite environmental and political corruption problems I remain very optimistic about the future.

I expect scientific advances in clean energy, artificial intelligence and robotics, and medical breakthroughs to continue at a rapid pace and yield benefits for most people on earth.

In my field (artificial intelligence) we have seen enormous progress in development of useful systems based on deep learning. That said, I don't believe that deep learning is the path to general artificial intelligence. Deep learning is an elegant hack (for training many layer neural networks) but we need a formal model for true AI.

On personal technology: I have spent an enormous amount of time (very enjoyable time) studying and using Haskell, Clojure, Scala, and other languages on projects. While I will always allocate time for learning and practicing with new languages and technologies, for 2016 I have made a New Year's resolution to "just use Java 8" and "get stuff done." I will continue to mostly use Ruby when I need a scripting language. My decision is to spend more time on artificial intelligence research and projects and Java 8 is usually a practical enough language for this work.

Thursday, December 10, 2015

Raspberry Pi and education

I may be late to the Raspberry Pi party - I just bought my first one this week. The Raspberry Pi is everything that I would hope for in an educational computer: cheap enough for all children to own and based on open source software (Debian Linux, LibreOffice, lots of games, and programming languages like Python, Ruby, Java, Scratch, etc. preinstalled).

The open nature of the Raspberry Pi encourages kids to experiment. RPs might not be as practical as other systems like Chromebooks that have more distributed infrastructure behind them but I think that open systems provide a better environment for experimenting with computers.

I reformatted a 32GB memory card and installed a fresh Debian Linux image provided by the Raspberry Pi project and when hooked up to a large monitor the Raspberry Pi 2 is quite capable. I installed the RubyMine IDE and git cloned a few of my Ruby projects and loaded the manuscript for my current writing project. I find the system is surprisingly fast with its 4 core ARM processor. For fun I have used it for my work for the last day. Of course I am writing this blog article on my RP setup.

Our future lies in how well our educational system works. In the modern world people should never stop learning new things both for the fun of it and to enhance their careers and their contributions to society. Very inexpensive devices like the RP (the latest model costs $5) that can be experimented with provide children with a good model for a life long process of experimenting and learning.

Saturday, December 05, 2015

Digital Life: a modicum of privacy

This post contains my advice for maintaining a reasonable amount of privacy without reducing the utility and entertainment we get from the Internet. It is no news that governments are pushing back against our right of privacy and we should also be concerned by tracking by both corporations and organized crime. Privacy is a basic human right and once rights are lost or reduced in scope they can be very difficult to get back.

To start with I believe that everyone should have the privacy enhanced Tor web browser installed. Tor was developed originally by the US Navy in support of journalists and other people living in countries with oppressive regimes. I strongly recommend using Tor for the following reasons:

  • You want to research any medical conditions that you have.
  • You are interested in buying a product and you don't want advertisers to put ads on web sites you visit because you would rather make independent unbiased purchasing decisions.
  • You want to visit sites, for any reason, that you would not like a future employer to know that you visited. We all look at odd information on the web out of curiosity, research, or for whatever reasons.
  • The availability of privacy enhancing tools is important, and at least occasional use of tools like Tor by the general public helps to legitimize these tools.

I don't think for a minute that privacy enhancement tools prevent major government actors like the NSA and GCHQ from accessing our private data. These tools slow them down a little, which I argue is a good thing, but do not stop them. For the general public the real benefits come from stopping (or slowing down) access to your data by corporations and organized crime. I think that it would be naive to think that organized crime does not have the interest and the ability to collect private data.

Private cloud storage: I use SpiderOak but there are several other good safe storage options.

When I was a kid I enjoyed writing in a diary. I sort of do the same thing as an adult, writing many short categorized notes about things I want to do, personal philosophy and spirituality, ideas for writing projects, travel notes, etc. I think that if it is worthwhile seriously thinking about something then it is worthwhile making notes. I now use the simple text markdown format for these notes - writing notes helps organize our thoughts and later quickly find old ideas we took the time to journal. For years I used cloud services like Google Docs + Keep and Microsoft OneNote. I am mostly transitioning to using secure and private cloud storage and as it turns out, well organized notes in markdown are as convenient as storing my ideas and notes in other less secure cloud services.

Online banking: I prefer to use (relatively) locked down devices like an iPad or a Chromebook for online banking. I think it is less likely that these devices are compromised than Windows, Linux, or Mac laptops. And don't forget to use a private mode window in your browser when doing online banking, accessing sensitive government web sites like Affordable Health Care, etc.

What about using social media? I enjoy social media, especially Google+ and Twitter and I use all social media to shamelessly plug the books that I write. I use a simple trick for using social media and using Google search: I use the Chrome web browser for these tasks and use either Firefox or Safari for all other web browsing. As far as tracking activities go, this helps prevent information leakage. It is a bit of a nuisance: when I see a web link on social media I would like to look at, instead of clicking the link I right-click the link to copy the URI and use a keyboard shortcut to switch to Firefox or Safari and paste in the link. Yes, this takes about 4 or 5 seconds and is a little inconvenient.

Governments and corporations use strong encryption and so should you. Encryption drives safe information flows and is vital to all of the world economies. Encryption cannot have "back doors" because of the threat to the global economy, and to companies and individuals, if organized crime (when I talk about organized crime I am also including organizations that others might call terrorists) gained access to back door encryption keys. The damage this would cause is unimaginable. Fortunately many consumer computing devices support encrypted file storage out of the box: modern Android phones, iPhones, iPads, Mac OS X, Ubuntu Linux, and the professional versions of Windows 10. Use encryption - it is well worth the effort.

Wednesday, November 25, 2015

Ruby SVM text classifier

There are several useful Ruby gems/libraries for using Support Vector Machines (SVM) and another to convert text into SVM style feature vectors. I recently packaged up what I needed with a Ruby script to fold the data for testing, etc.

Here is the github repository.

It took me a short while to get everything working together so hopefully this will save you several minutes of extra effort if you want to use SVM for text classification.
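The two steps mentioned above, converting text into SVM style feature vectors and folding the data for testing, can be sketched roughly as follows. This is a minimal illustration in plain Python and not the Ruby code from my repository: the function names (build_vocabulary, to_feature_vector, k_folds) and the sample sentences are made up for this example, and a real project would hand the resulting vectors to an SVM library.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a 1-based feature index."""
    vocab = {}
    for text in documents:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def to_feature_vector(text, vocab):
    """Return a sparse {feature_index: count} vector; unknown tokens are skipped."""
    counts = Counter(t for t in text.lower().split() if t in vocab)
    return {vocab[token]: count for token, count in counts.items()}


def k_folds(items, k):
    """Yield (train, test) pairs; fold i tests on every k-th item starting at i."""
    for i in range(k):
        test = items[i::k]
        train = [item for j, item in enumerate(items) if j % k != i]
        yield train, test


# Hypothetical sample data, invented for this sketch.
docs = [
    "stocks fell sharply today",
    "the team won the game",
    "markets and stocks rallied",
    "the game went to overtime",
]
labels = ["finance", "sports", "finance", "sports"]

vocab = build_vocabulary(docs)
examples = [(label, to_feature_vector(text, vocab)) for label, text in zip(labels, docs)]

for train, test in k_folds(examples, 2):
    # An SVM library would be trained on `train` and scored on `test` here.
    print(len(train), "training examples,", len(test), "test examples")
```

Each sparse vector maps a feature index to a word count, which is close to the index:value layout that SVM tools commonly read. Interleaving the folds (every k-th item) is just one simple way to split the data; shuffling first is usually a good idea when the input is sorted by class.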

Friday, November 13, 2015

My new book "Power Java" is released today

I recommend that you look through the github repo for my book to see what I cover and if it looks interesting please consider buying my new book on leanpub.

I cover a wide range of topics including machine learning, linked data, network programming techniques for IoT, and some ideas for knowledge management using cloud data.

Thursday, October 01, 2015

My Cognition Technology blog and website

I created a new blog yesterday http://blog.cognition.tech for news and my personal programming experiments involving machine learning and deep learning. There is a companion website www.cognition.tech where I offer consulting, mentoring, and turnkey development.

I will continue using this blog for personal posts, programming languages (mostly Haskell, Java, Clojure, and Ruby) and general discussions on technology.

Monday, September 28, 2015

I need some sympathy: spending most of my time coding in Java and Python

As the universe unfolds, I have been spending most of my time recently working with machine learning and for the foreseeable future that will not change. Let's face it, many of the really great ML libraries are written in Java and Python.

I still love development with Clojure and Ruby, and I am still on my long term quest to become passably proficient with Haskell. That said, it is crazy to not simply use languages that have the best library and framework support for any task.

Sunday, August 02, 2015

We are getting closer to the dream of the 1980s and 1990s: software reuse

I worked (mostly) at SAIC in the 1980s and 1990s. In the groups I worked in we developed large software systems (sometimes with hardware components) for customers. Software reuse was a dream back then that was largely unfulfilled. Our procedure for reuse was mostly cut and paste from old projects, with some effort to write reusable libraries. There was also a movement to use commercial off the shelf (COTS) software.

It occurred to me recently that we are now much closer to the dream of widespread software reuse. What has changed is a healthy open source (and libre) software ecosystem of trusted and vetted libraries, frameworks, and complete applications. I tend to trust software from FSF and the Apache Foundation, for example. Organizations and individuals are motivated to release software for a variety of reasons: for help in development and bug detection, for good publicity and self promotion, and sometimes for ego. All good reasons!

My process when starting a new project is to first identify existing open source software that I can build on. My choice of programming language is often dictated by the language used in the open source software projects that would be most beneficial to my project. It is a thrill to build a new project using mostly existing software. This greatly reduces the cost of projects and I think also greatly increases developer satisfaction. Who wants to spend 6 months writing a project mostly from scratch when it can be done much more quickly building on other people's work?

For me the magic that makes this all happen is public repositories like github and bitbucket. The cost of evaluating an open source project for reuse can be very low: a git clone, build, run the tests, look at the tests, and read through the documentation and source code.

So I believe that we have made incredible progress in software reuse in the last 30 years.

Wednesday, July 29, 2015

I tried Windows 10 the first day of the rollout (today!)

Installing Windows 10 on my 5 month old HP Stream 11 was easy. I have no comments on that process.

Visually two things stand out: windows are all white except for a very thin aqua blue margin and my slow laptop seems to run the UI faster. I don't know how much of the speed bump is making the code more efficient and how much is doing away with some animation effects.

The desktop now seems like a mixture between Windows 7 and 8.1. The start menu is back and the bottom icon navigation bar is always visible along the bottom of the screen unless you put an application in full screen mode. Clicking the windows start menu in the lower left corner of the screen brings up a combo: classic looking menu on the left and the metro large icon interface like Windows 8.1 on the right side of the popup - but this popup only covers about 30% of the screen. It all seems a bit odd to me but I really like it - after using it for ten minutes (it took a little while to adjust to the new interface). I think the new interface in general is very well done. I haven't spent enough time with the new Edge web browser to have a firm opinion yet, but it seems functional and the reading view does a good job of reformatting web pages without advertisements for comfortable reading.

Windows 10 Cortana, similar to Google Now and Apple Siri, is always available just to the right of the Windows start button. Just like the search utility on Windows 8.1 this is the way to quickly find stuff in the system. Need to change the PATH? Just type 'environment variables' and instantly the environment edit utility is shown. I think this actually works a little better than Spotlight on OS X and quite a bit better than hot key search on Ubuntu. I tried using Cortana for searching for things in my community. It did OK, but will hopefully get better as it gets access to more of my life context.

I will spend time locking down some of the privacy settings. I already deleted the Skype application because it leaks a little too much personal information for my taste. Cortana is configurable for setting which types of information are collected. It pays to take the time to check the available privacy settings for products and services from Microsoft, Google, Facebook, Apple, etc. I tend to keep my Linux laptops more locked down, privacy wise, than my Windows and Apple laptops. I try to strike a balance between having some privacy and also enjoying available products and services.