For my master's thesis I came up with quite a few Monte Carlo simulations. I use them to filter particle size histograms and to estimate the errors of derived distribution parameters.
Everything Monte Carlo related tends to be quite processor intensive, which is rather painful as I usually work on an older MacBook Pro with a Core 2 Duo processor. Fortunately I have an i5 machine at my disposal. But I don't want to work on that machine directly; I prefer my own.
That is why I wanted to set up a small IPython cluster. I work with the fabulous IPython notebook and console all day anyway, so after watching some of the videos on the IPython cluster I was certain that it would serve my purposes quite well. A few local trials easily confirmed that. So I went ahead and dove head first into the documentation on how to set up an IPython cluster via SSH. Usually the IPython documentation is very good, but with the information provided I was not able to set up a working configuration. So I'll walk you through my now working config. [1]
Setup
First make sure you are running the latest IPython version from the master branch of the IPython GitHub repository. In my case that is version 0.14dev, courtesy of the ScipySuperpack.
Then add a new dedicated user (ipcluster) to the machine the engines are
supposed to run on. Make sure the user can run ipython and that all the tests
pass. You also have to set up passwordless login via SSH so that you can log in
from your laptop to the worker machine.
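Before involving IPython at all, it can help to confirm that the passwordless login really works. The following is only a minimal sketch: the helper names `build_ssh_check` and `can_login` are made up for illustration, and the address is a placeholder. It relies on the standard OpenSSH option `BatchMode=yes`, which makes ssh fail instead of prompting for a password.

```python
import subprocess


def build_ssh_check(user, host):
    """Build an ssh command that fails instead of asking for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt; fail if key login is not set up
        "-o", "ConnectTimeout=5",   # give up quickly if the host is unreachable
        "{}@{}".format(user, host),
        "true",                     # trivial remote command; exit status 0 on success
    ]


def can_login(user, host):
    """Return True if key-based login works (hypothetical helper)."""
    return subprocess.call(build_ssh_check(user, host)) == 0


# Usage, with the placeholder address replaced by the worker's real IP:
# can_login("ipcluster", "xxx.xxx.xxx.002")
```

If `can_login` returns False, the engines will most likely fail to start later on, so it is worth fixing the SSH keys first.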
Configuration
Now create a new profile on your controller machine (the MBP in my case) by running this command in the shell:
ipython profile create --parallel --profile=ssh
The config files are placed into the .ipython/profile_ssh/ folder in your home
directory. These are the files you need to alter.
In ipcluster_config.py go ahead and change the following, in line with the
docs:
c.IPClusterEngines.engine_launcher_class = 'SSH'
c.SSHEngineSetLauncher.engines = {
    'xxx.xxx.xxx.002': 4,  # IP of the machine for the engines
}
This sets up the engines to be started via SSH. The dictionary maps each machine to the number of engines to start on it. It can easily be extended with more machines to run engines on. [2]
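To illustrate how the dictionary grows with more machines, here is a small sketch. The second address (xxx.xxx.xxx.003) and its engine count are purely hypothetical; I only have the one worker so far.

```python
# Mapping of worker machine -> number of engines to start there.
engines = {
    'xxx.xxx.xxx.002': 4,  # the worker machine from above
    'xxx.xxx.xxx.003': 2,  # hypothetical additional node (an assumption)
}

# In ipcluster_config.py this dictionary would be assigned like so:
# c.SSHEngineSetLauncher.engines = engines

# Total number of engines the cluster would start with this mapping.
total_engines = sum(engines.values())
print(total_engines)  # prints 6
```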
Unfortunately that was not enough to get the cluster to start, as ipcluster
start tries to ssh into the remote machine with your local user name. This can
be changed by setting the following in ipcluster_config.py:
c.SSHEngineLauncher.user = 'ipcluster'
Now the engines start on the remote machine, but they are still not able
to connect to the locally running controller. The reason is that by default the
controller only listens for connections from localhost. To change that behaviour
you need to set the following options in ipcontroller_config.py:
c.HubFactory.client_ip = 'xxx.xxx.xxx.001'  # IP of my laptop
c.HubFactory.ip = 'xxx.xxx.xxx.001'
and in ipengine_config.py:
c.EngineFactory.location = 'xxx.xxx.xxx.001'
That makes the engines try to connect to a controller on xxx.xxx.xxx.001 and the
controller listen for connections on the same IP.
Then copy the profile_ssh folder to the remote .ipython folder and run
ipcluster start --profile=ssh
on the local machine. This will start four engines on xxx.xxx.xxx.002 and
connect these engines to the locally running controller.
Your cluster is now ready to be used. The IPython documentation is a good starting point to read about the ways to use all the computing power that is now at your fingertips.
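As a rough idea of what using the cluster can look like, here is a minimal sketch. The function `simulate_mean` is just a stand-in that I made up for this example (the mean of a synthetic log-normal size sample), not one of my actual simulations, and `spread` and `run_on_cluster` are illustrative helpers as well. The cluster part assumes the `IPython.parallel` module of this IPython version (it lives in the separate `ipyparallel` package in later releases) and a running cluster with the ssh profile.

```python
def simulate_mean(seed, n_particles=1000):
    """One Monte Carlo trial: mean of a synthetic log-normal size sample."""
    import random  # imported inside so the function also works on remote engines
    rng = random.Random(seed)
    sizes = [rng.lognormvariate(0.0, 0.5) for _ in range(n_particles)]
    return sum(sizes) / len(sizes)


def spread(values):
    """Sample standard deviation of the trial results."""
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5


def run_on_cluster(seeds, profile='ssh'):
    """Distribute the trials over all engines of the given profile."""
    from IPython.parallel import Client  # `ipyparallel` in later releases
    rc = Client(profile=profile)
    view = rc[:]  # DirectView on all connected engines
    return view.map_sync(simulate_mean, seeds)


seeds = list(range(20))
local_results = list(map(simulate_mean, seeds))  # serial reference run
# cluster_results = run_on_cluster(seeds)        # same trials on the cluster
```

Because every trial is seeded explicitly, the serial run and the cluster run should give the same numbers, which makes for an easy sanity check. The spread of the results, `spread(local_results)`, is then a simple error estimate for the derived parameter.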
[1] Remark: Recently Jean-Francis Roy published a good walkthrough on his blog on how to start all the components of an IPython cluster manually and connect them via SSH tunnels. In my case I use the ipcluster start script to automate that process.
[2] I hope that I can acquire at least one more node.