Thursday, November 23, 2017

Easy Jupyter notebook setup on AWS GPU EC2 with machine learning AMI

The Amazon machine learning AMI (link may change in the future) is set up for CUDA/GPU support and comes with TensorFlow, Keras, MXNet, Caffe, Caffe2, PyTorch, Theano, CNTK, and Torch preinstalled.

I chose the least expensive EC2 instance type with a GPU, g2.2xlarge, and used the One Click Launch option (you will need to specify a pem key file for the AWS region where you are starting the instance) to have an instance running and available in about a minute. This GPU instance costs $0.65/hour, so remember to either stop it (if you want to reuse it later and don't mind paying a small cost for persistent local storage) or terminate it if you don't want to be charged for the 60GB of SSD storage associated with the EC2 instance.
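If you have the AWS CLI configured on your laptop, stopping or terminating the instance from the command line looks something like this (a sketch; the instance ID is a placeholder you would replace with your own):

```shell
# stop the instance (attached storage persists and continues to bill)
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

# terminate the instance (releases the instance and its associated storage)
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
```

You can do the same thing from the EC2 web console via Actions -> Instance State.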

I am very comfortable working in SSH shells using Emacs, screen, etc. When an instance boots up, the Actions -> Connect menu shows you the temporary public address which you can use to SSH in:

ssh -i "~/.ssh/key-pair.pem" ubuntu@<EC2 public DNS>

I keep my pem files in ~/.ssh; you might store them in a different place. If you haven't used EC2 instances before and don't already have a pem access file, follow these directions.

Anaconda is preinstalled, so Jupyter is also available and can be started from any directory on the EC2 instance with:

jupyter notebook
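Since the notebook server dies if your SSH session drops, I find it convenient to run it inside screen (a sketch; the session name "jupyter" is arbitrary, and the flags shown are standard Jupyter options):

```shell
# start a named screen session on the EC2 instance
screen -S jupyter

# inside the screen session, start the server without trying to open a browser
jupyter notebook --no-browser --port=8888

# detach with Ctrl-a d; reattach later with:
screen -r jupyter
```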

After some startup output, you will see a local URL for accessing the Jupyter notebook that will look something like this:

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=f25c0cdf24b128c0c7ae2ce92bc0583934bc0c1293f83ccf

In another terminal window, start a second SSH session, but this time map local port 8888 to port 8888 on the EC2 instance:

ssh -L 8888:127.0.0.1:8888 -i "~/.ssh/key-pair.pem" ubuntu@<EC2 public DNS>

Now on your laptop you can attach to the remote Jupyter instance using (your token will be different):
http://localhost:8888/?token=f25c0cdf24b128c0c7ae2ce92bc0583934bc0c1293f83ccf

Alternative to using SSH tunnel:

A nice alternative is to install and use sshuttle (on your laptop; no server-side installation is required). Assuming I have a domain name attached to the sometimes-running EC2 instance, I use the following aliases in my bash configuration file:

alias gpu='ssh -i "~/.ssh/key-pair.pem" ubuntu@MYDOMAIN.COM'
alias tun="sshuttle -r ubuntu@MYDOMAIN.COM 0/0 -e 'ssh -i \"~/.ssh/key-pair.pem\"'"

Note: Keeping an Elastic IP address attached to an EC2 instance that is usually not running will cost you about $3.40/month, but I find having a "permanent" IP address assigned to a domain name convenient.

Goodies in the AWS machine learning AMI:

There are many examples installed for each of these frameworks. I usually use Keras, and I was pleased to see the following examples ready to run:
addition_rnn.py
antirectifier.py
babi_memnn.py
babi_rnn.py
cifar10_cnn.py
conv_filter_visualization.py
conv_lstm.py
deep_dream.py
image_ocr.py
imdb_bidirectional_lstm.py
imdb_cnn.py
imdb_cnn_lstm.py
imdb_fasttext.py
imdb_lstm.py
lstm_benchmark.py
lstm_text_generation.py
mnist_acgan.py
mnist_cnn.py
mnist_hierarchical_rnn.py
mnist_irnn.py
mnist_mlp.py
mnist_net2net.py
mnist_siamese_graph.py
mnist_sklearn_wrapper.py
mnist_swwae.py
mnist_tfrecord.py
mnist_transfer_cnn.py
neural_doodle.py
neural_style_transfer.py
pretrained_word_embeddings.py
reuters_mlp.py
reuters_mlp_relu_vs_selu.py
stateful_lstm.py
variational_autoencoder.py
variational_autoencoder_deconv.py
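To try one of these, run an example script directly on the instance; for instance (a sketch; the location of the Keras examples directory varies by AMI version, so adjust the path before the cd):

```shell
# from inside the Keras examples directory on the EC2 instance
python mnist_cnn.py
```

On a GPU instance the MNIST CNN example trains in a few minutes.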

There are also many examples for the other frameworks: TensorFlow, MXNet, Caffe, Caffe2, PyTorch, Theano, CNTK, and Torch.
