Setting up an IPython Cluster via SSH

Posted 2012-10-06 in python

For my master's thesis I came up with quite a few Monte Carlo simulations. I use them to filter particle size histograms and to estimate the errors of derived distribution parameters.

Everything Monte Carlo related tends to be quite processor intensive, which is rather painful as I usually work on an older MacBook Pro with a Core 2 Duo processor. Fortunately I have an i5 machine at my disposal. But I don't want to work on that machine directly; I prefer my own.

That is why I wanted to set up a small IPython cluster. I work with the fabulous IPython notebook and console all day anyway, so after watching some of the videos on the IPython cluster I was certain that it would serve my purposes quite well. A few local trials easily confirmed that. So I went ahead and dove head first into the documentation on how to set up an IPython cluster via SSH. Usually the IPython documentation is very good, but with the information provided I was not able to set up a working configuration. So I'll walk you through my now working config.[1]

Setup

First make sure you are running the latest IPython version from the master branch of the IPython GitHub repository. In my case it is the 0.14dev version, courtesy of the ScipySuperpack.

Then add a new dedicated user (ipcluster) to the machine the engines are supposed to run on. Make sure the user can run ipython and that all the tests pass. You also have to set up passwordless login via SSH so that you can log in from your laptop to the worker machine.
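A quick way to check the passwordless login before touching any IPython config is to let ssh run a no-op command in batch mode, where it fails instead of prompting for a password. The following is only a minimal sketch: the helper names are my own invention, and it assumes an OpenSSH client on the laptop and the ipcluster user from above.

```python
import subprocess


def ssh_check_command(host, user="ipcluster"):
    """Build an ssh command that fails rather than asking for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt; fail if key login does not work
        "-o", "ConnectTimeout=5",   # give up quickly if the host is unreachable
        "%s@%s" % (user, host),
        "true",                     # no-op command on the remote side
    ]


def can_login_without_password(host, user="ipcluster"):
    """Return True if ssh exits with status 0, i.e. key-based login works."""
    return subprocess.call(ssh_check_command(host, user)) == 0


# Example (replace the placeholder with the real IP of the worker machine):
# can_login_without_password("xxx.xxx.xxx.002")
```

If this returns False, ipcluster will most likely not be able to start the engines either, so it is worth fixing the SSH keys first.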

Configuration

Now create a new profile on your controller machine (the MBP in my case) by running this command in the shell:

ipython profile create --parallel --profile=ssh

The config files are placed into the .ipython/profile_ssh/ folder in your home directory. These are the files you need to alter.

In ipcluster_config.py go ahead and change the following, in line with the docs:

c.IPClusterEngines.engine_launcher_class = 'SSH'
c.SSHEngineSetLauncher.engines = { 
    'xxx.xxx.xxx.002' : 4  # IP of the machine for the engines
    }

This sets up the engines to be started via SSH. The dictionary specifies how many engines to start on which machine. It can easily be extended with more machines to run engines on.[2]
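To illustrate the shape of that dictionary with more than one worker, here is a small sketch. The second host (xxx.xxx.xxx.003) and its engine count are purely hypothetical, I don't have that machine. The dictionary is built as a plain Python object here so the snippet runs on its own; in ipcluster_config.py it would be assigned to c.SSHEngineSetLauncher.engines as shown above.

```python
# Hosts mapped to the number of engines to start on each of them.
engines = {
    'xxx.xxx.xxx.002': 4,  # the i5 machine from this post
    'xxx.xxx.xxx.003': 2,  # hypothetical second worker, not part of my setup
}

# In ipcluster_config.py the assignment would read:
# c.SSHEngineSetLauncher.engines = engines

# Total number of engines ipcluster is asked to start across all hosts.
total_engines = sum(engines.values())
print("engines requested in total: %d" % total_engines)
```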

Unfortunately that was not enough to get the cluster to start, as ipcluster start tries to ssh into the remote machine with your local user name. This can be changed by setting the following in ipcluster_config.py:

c.SSHEngineLauncher.user = 'ipcluster'

Now the engines start on the remote machine, but they are still not able to connect to the locally running controller. The reason is that by default the controller only listens for connections from localhost. To change that behaviour you need to set the following options in ipcontroller_config.py:

c.HubFactory.client_ip = 'xxx.xxx.xxx.001'  # IP of my laptop
c.HubFactory.ip        = 'xxx.xxx.xxx.001'

and in ipengine_config.py:

c.EngineFactory.location = 'xxx.xxx.xxx.001'

That makes the engines try to connect to a controller on xxx.xxx.xxx.001 and makes the controller listen for connections on that same IP.

Then copy the profile_ssh folder to the remote .ipython folder and run

ipcluster start --profile=ssh

on the local machine. This will start four engines on xxx.xxx.xxx.002 and connect these engines to the locally running controller.

Your cluster is now ready to be used. The IPython documentation is a good starting point to read about the ways to use all the computing power that is now at your fingertips.
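As a first smoke test, something along these lines should work from a script or notebook on the laptop. Treat it as a sketch rather than my actual thesis code: the pi estimate is just a stand-in for a real Monte Carlo simulation, and the function names and default arguments are made up for illustration. It uses the IPython.parallel client that ships with this IPython version (in later releases that code moved to the separate ipyparallel package).

```python
def estimate_pi(n_samples, seed=0):
    """Monte Carlo estimate of pi from random points in the unit square."""
    import random  # imported inside so the function also works on a remote engine
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            hits += 1
    return 4.0 * hits / n_samples


def estimate_pi_on_cluster(n_samples_per_task=100000, n_tasks=8, profile="ssh"):
    """Run several independent estimates on the engines and average them."""
    # Needs the running cluster from this post, hence the local import.
    from IPython.parallel import Client
    rc = Client(profile=profile)   # reads the connection info of profile_ssh
    view = rc[:]                   # direct view on all connected engines
    estimates = view.map_sync(
        estimate_pi,
        [n_samples_per_task] * n_tasks,  # same sample count for every task
        list(range(n_tasks)),            # a different seed for every task
    )
    return sum(estimates) / len(estimates)


# With the cluster running:
# print(estimate_pi_on_cluster())
```

Calling estimate_pi directly runs the same computation locally, which makes it easy to compare timings between the laptop and the cluster.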


  1. Remark: Recently Jean-Francis Roy published a good walkthrough on his blog on how to start all the components of an IPython cluster manually and connect them via SSH tunnels. In my case I use the ipcluster start script to automate that process.

  2. I hope that I can acquire at least one more node.