System Administration with Capistrano

published on 2009-11-25 in computing

In previous years, when I had to run commands on hundreds or thousands of servers, I'd hack together a combo of expect and perl. It would almost always evolve into a system of complicated config and command files that was rickety and didn't handle errors well. I didn't dare mess with multi-threaded perl, which meant it was serial execution and slow for large clusters. It got the job done but left me wishing for a better system. I have always had cfengine in my sysadmin toolbox, but it's more about entropy reduction and not set up for one-off or occasional situations. I tried a few parallel shell implementations (such as dsh, pdsh) and found them all lacking.

Enter Capistrano. It bills itself as an 'easy deployment' system, with Ruby on Rails application deployment as the main use case. And since I'd never worked in a RoR environment before, I had no real reason to look into it much. But in the last 3 months, I have worked at 2 different companies that use RoR + Capistrano for deployment and have learned enough to see its true power. How I'd describe it to a fellow sysadmin is: "parallel execution of scripts and commands on multiple hosts...easily". Want to quickly execute a command on every host in your cluster? This is the way to do it.

Installing it is pretty easy: you need a modern version of Ruby, a modern version of RubyGems, and one gem install capistrano later, you're good to go. You only need to install all of this on the controlling/deployment server...not on all your clusters/nodes. If you get errors with the version of ruby/gems that comes with your distro, install from source (recommended). I followed this tutorial to get it set up, and to learn the basics. You should read it as well. It skips a few necessary things (such as sudo and useful ENV variables), which I cover below.
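The whole setup boils down to a couple of commands on the controlling host (a sketch; exact steps depend on your distro and Ruby install):

```shell
# On the controlling/deployment host only -- the cluster nodes need
# nothing but a running sshd and your ssh key.
gem install capistrano

# Sanity check: print the installed Capistrano version.
cap -V
```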

An example Capfile of how to restart apache on a whole cluster:

role :apache_cluster, "www1", "www2", "www3"  
desc "restart apache on www hosts"  
task "restart_apache_www", :roles => :apache_cluster do  
  sudo "/etc/init.d/apache2 restart"  
end  

sudo is a built-in method of modern versions of Capistrano. Instead of the 'run' method, you use 'sudo' and it understands and responds to the password prompt (if prompted). Very slick. One thing to keep in mind is that it is running everything as YOU, unless otherwise specified. It will look like you logged into 50 servers all at once and ran sudo commands all at once. I bet that'd look cool on a Splunk log graph.
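To make the run/sudo distinction concrete, here's a sketch of both in a Capfile (Capistrano 2.x syntax; the task names and the :as user are made up, and :as assumes your sudoers config permits running as that user):

```ruby
role :apache_cluster, "www1", "www2", "www3"

desc "tail apache error logs on www hosts"
task "tail_logs", :roles => :apache_cluster do
  # 'run' executes as the user you ssh in as -- no privilege escalation
  run "tail -n 5 /var/log/apache2/error.log"
end

desc "restart apache as another user"
task "restart_as_admin", :roles => :apache_cluster do
  # 'sudo' answers the password prompt for you; the :as option runs
  # the command as a user other than yourself
  sudo "/etc/init.d/apache2 restart", :as => "admin"
end
```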

Now, from the command line, type 'cap -T' to get a list of your documented commands. As long as you describe your commands, you will always get a list of what you can run. 'cap -e command' will explain commands.

$ cap -T  
cap invoke # Invoke a single command on the remote servers.  
cap restart_apache_www # restart apache on www hosts  
cap shell # Begin an interactive Capistrano session.

Run the command we set up: 'cap restart_apache_www'. It will prompt for your password.

$ cap restart_apache_www  
* executing \`restart_apache_www'  
* executing "sudo -p 'sudo password: ' /etc/init.d/apache2 restart"  
servers: ["", "",
[] executing command  
[] executing command  
[] executing command  
** [out ::] * Restarting web server apache2  
** [out ::] ...done.  
** [out ::] * Restarting web server apache2  
** [out ::] ...done.  
** [out ::] * Restarting web server apache2  
** [out ::] ...done.  
command finished

And that was completed in parallel, in about 1 second. What if you have a one-off thing you want to run on all hosts? Try cap invoke, no Capfile required. If you have a Capfile with hosts defined, it will run against all of them by default, or it can take a role by passing ROLES as an env variable.

$ cap COMMAND=uptime HOSTS="www1,www2" invoke  
* executing \`invoke'  
* executing "uptime" servers: ["www1", "www2"]  
[] executing command  
[] executing command  
** [out :: www1] 16:57:04 up 190 days, 4:30, 0 users, load average:
0.30, 0.33, 0.33  
** [out :: www2] 16:57:04 up 190 days, 4:42, 0 users, load average:
0.42, 0.32, 0.32  
command finished


$ cap ROLES=www COMMAND=uptime invoke  
* executing \`invoke'  
* executing "uptime" servers: ["www1", "www2", "www3"]  
[www1] executing command  
[www2] executing command  
[www3] executing command  
** [out :: www1] 17:00:17 up 190 days, 4:33, 0 users, load average:
0.54, 0.37, 0.34  
** [out :: www2] 17:00:17 up 190 days, 4:46, 0 users, load average:
0.18, 0.27, 0.29  
** [out :: www3] 17:00:17 up 190 days, 5:02, 0 users, load average:
0.17, 0.22, 0.25  
command finished

But every time you 'invoke', you must re-type your password. Want to stay connected? Try the shell:

$ cap shell HOSTS="www1,www2"  
* executing \`shell'  
Welcome to the interactive Capistrano shell! This is an experimental  
feature, and is liable to change in future releases. Type 'help' for  
a summary of how to use the shell.  
cap> uptime  
[establishing connection(s) to www1, www2]  
** [out :: www1] 17:03:24 up 190 days, 4:36, 0 users, load average:
0.29, 0.32, 0.32  
** [out :: www2] 17:03:24 up 190 days, 4:49, 0 users, load average:
0.35, 0.30, 0.29  
cap> w  
** [out :: www1] 17:03:37 up 190 days, 4:36, 0 users, load average:
0.24, 0.31, 0.31  
** [out :: www2] 17:03:37 up 190 days, 4:49, 0 users, load average:
0.30, 0.29, 0.28  
cap> ls /tmp/blah  
*** [err :: www1] ls: cannot access /tmp/blah  
*** [err :: www2] ls: cannot access /tmp/blah  
*** [err :: www1] : No such file or directory  
*** [err :: www2] : No such file or directory  
error: failed: "sh -c 'ls /tmp/blah'" on www1,www2

Notice that errors show up with an 'err' line.

While useful and timesaving, this barely scratches the surface of the power of Capistrano. I suggest you read the "From the Beginning" doc on the Capistrano site. If you discover any cool recipes, share them in the comments on this blog and I'll publish them as a followup later (as I learn more recipes myself).

P.S. I think I'm going to add a feature to MachDB to export host lists in a format compatible with the Capfiles.
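Such an export could be as simple as mapping role names to host lists and printing Capfile-style role lines. A sketch in plain Ruby (the hash of hosts is made-up sample data, not MachDB's actual format):

```ruby
# Sample data standing in for whatever a MachDB export would return:
# role name => list of hostnames in that role.
hosts = {
  "www"  => %w[www1 www2 www3],
  "util" => %w[util1 util2],
}

# Build one Capfile "role" line per role, quoting each hostname.
capfile_lines = hosts.map do |role, names|
  quoted = names.map { |n| %("#{n}") }.join(", ")
  "role :#{role}, #{quoted}"
end

puts capfile_lines
```

Running this prints lines you could paste straight into a Capfile:

    role :www, "www1", "www2", "www3"
    role :util, "util1", "util2"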

Tags: sysadmin