Dev Environment as Software

In Mindset: Your Dev Environment Is Software, I showed several small examples of how you can optimize parts of your development workflow, just by realizing that the stuff you're using to develop is also software. (Yes, this is trivial, but stating it explicitly leads to interesting results.)

Today I'd like to explain in detail what a biggish refactor of your workflow might look like. Case in point: SSHing to nodes by IP, by hostname, by EC2 instance id, by Chef role, as different users. Fast.

Just some simple functions

It usually starts with some simple aliases. For example, I'll very often SSH in to a host as root. That's ssh root@host, or alternatively ssh -l root host. Other times I'll want to SSH in as the user publisher. I never want to log in as any other user. So obviously, create trivial functions.

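sr () {
    # Host first, then -l, then any remote command: ssh passes everything
    # after the start of a remote command to the remote side verbatim.
    host=$1; shift
    ssh "$host" -l root "$@"
}
sp () {
    host=$1; shift
    ssh "$host" -l publisher "$@"
}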

This lets us go from ssh root@host to sr host. Not bad, for basically no effort.

Executing stuff on lots of nodes

Sometimes you need to run a one-off command on several nodes. And see the output. Real use-case: I have a list of IPs / hostnames for nodes where Chef has been misbehaving. After uploading the fix, I want to re-run chef just on these nodes, right now. Let's call the function sm for ssh multiple.

sm() {
  cmd=$1; shift
  for h in "$@"
  do
    echo "$cmd on $h" | tee "$h.out"
    sr "$h" "$cmd" | tee -a "$h.out"
  done
}

With just that bit of code we can run a command on all those pesky nodes: sm 'pkill -USR1 chef-client' app{1..5} will tell the running Chef client on app1 through app5 to start a chef run. Note how the script also saves output into a file per host.

If the command is long-running, we may want to run the commands in parallel (after trying it out on a few hosts first to see that it works as intended). Easy, just a tiny modification of the above:

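smp() {
  cmd=$1; shift
  for h in "$@"
  do
    echo "$cmd on $h in background" | tee "$h.out"
    # stdin comes from /dev/null so a background ssh never tries to read the terminal
    (sr "$h" "$cmd" < /dev/null | tee -a "$h.out") &
  done
}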

This could be more elegant and better controlled with xargs or GNU parallel, but we're aiming for quick and dirty. The name of course stands for ssh multiple parallel.
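
For the curious, here is a rough sketch of what the xargs route could look like. The name smx and the variables SM_JOBS and SM_RUNNER are made up for this example; SM_RUNNER only exists so the remote call can be swapped out, and it defaults to plain ssh as root. It relies on xargs -P, which GNU and BSD xargs both have but POSIX doesn't require. Unlike smp, it writes output only to the per-host files, so nothing gets interleaved on the terminal.

```shell
# Hypothetical variant of smp: run a command on many hosts, at most
# $SM_JOBS at a time (default 4), saving output to <host>.out.
smx() {
  cmd=$1; shift
  [ $# -gt 0 ] || return 0          # nothing to do without hosts
  printf '%s\n' "$@" |
    SM_CMD=$cmd xargs -P "${SM_JOBS:-4}" -n 1 \
      sh -c '${SM_RUNNER:-ssh -n -l root} "$1" "$SM_CMD" > "$1.out" 2>&1' smx
}
```

Usage is the same as before: smx 'pkill -USR1 chef-client' app{1..5}. xargs returns once every host is done, so you also know when it's safe to look at the output files.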

EC2 instances

I'll often have just an EC2 instance ID to identify a host. For reference, these look something like i-189d8c21 (id changed to protect the innocent). There are two ways to go about logging in:

I can either go to the EC2 web interface, log in typing the 2-factor token from my phone, oh wait it was invalidated just as I was finishing, ok got it now, find the instance by the id, find the IP on the bottom part of the screen, copy it, then sr <paste>. That's around a minute.
There are a bunch of command-line tools to query and manipulate EC2 instance data. For example, ec2-describe-instances $id will show some data, including the address I need. However, I won't remember the string ec2-describe-instances, and there's also the copy-pasting. This takes around 30 seconds.

I hope by now you're jumping out of your chair screaming BUT WE CAN USE THAT TOOL TO AUTOMATE SOME OF THIS! Unless there's people around you, then I hope you're just metaphorically jumping and only screaming inside your head.

So yes, after some tweaking, we can get the IP address of a host from an instance id:

ec2-describe-instances --show-empty-fields $id | grep '^INSTANCE' | cut -f4

Let's wrap it in a handy function:

ec2_ssh() {
    id=$1; shift
    ssh_debug "Logging in to EC2 instance $id"
    ec2host=$(ec2-describe-instances --show-empty-fields "$id" | grep '^INSTANCE' | cut -f4)
    ssh "$ec2host" "$@"
}
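
The pipeline silently produces an empty string (or a placeholder) when the instance isn't found or has no address, and ssh then gets a nonsense hostname. A minimal sketch of a stricter extraction step follows. The name ec2_address is made up, and it assumes that --show-empty-fields prints empty fields as (nil); check that against your version of the tools.

```shell
# Reads ec2-describe-instances style output on stdin and prints the 4th
# tab-separated field of the first INSTANCE line.
# Fails if there is no such line, or the field is empty or "(nil)"
# (the "(nil)" placeholder is an assumption about the tool's output).
ec2_address() {
  addr=$(grep '^INSTANCE' | head -n 1 | cut -f4)
  case $addr in
    ''|'(nil)') return 1 ;;
  esac
  printf '%s\n' "$addr"
}

# Intended use inside ec2_ssh:
#   ec2host=$(ec2-describe-instances --show-empty-fields "$id" | ec2_address) || return 1
```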

Now we can go ec2_ssh i-189d8c21 -l root (the instance id has to come first, since the function takes it from $1). By the way: I shamelessly stole this function and some of the later ideas from zsol. That's another great way to optimize your workflow: look at how others do it.

You could of course create wrappers like ec2_sr to log in as root, but we'll do better than that. But first...

Log in to lots of instances, by Chef role

This is another very common thing to do. What I'd usually do is something like this:

Go to my chef repo clone (we have all our stuff in a single repository), let's call it $CHEF_HOME
knife ssh something_app -x root cssh
Oh look, no nodes found.
ls roles | ag something
Of course, there's a dash instead of an underscore
knife ssh something-app -x root cssh
And then get to work.

This sucks. Can we optimize it? SURE! First we'll need to figure out which role we actually want to work on. If there are several candidates, we'll just return them all here and deal with it later.

find_roles() {
    regex="$1"; shift
    ls $CHEF_HOME/roles | sed 's/\.json$//' | grep -E "$regex"
}

Once we have a list of roles (ideally containing only one role), we can easily write a function that logs in to all of the nodes in a role. Note that knife ssh uses -x for the username instead of -l.

chef_ssh() {
    role_regex=$1; shift
    role=$(find_roles "$role_regex")
    if [ $(echo "$role" | wc -w) = '0' ]; then
        # $role is empty here, so report the pattern we searched for
        echo "No idea how to SSH into $role_regex"
    elif [ $(echo "$role" | wc -w) != '1' ]; then
        echo "Found more than one matching Chef role:"
        echo "$role"
    else
        ssh_debug "Logging in to nodes with Chef role $role"
        cd "$CHEF_HOME"
        knife ssh "roles:$role" cssh -x root
    fi
}

Note that we use the input as a regex in find_roles (grep -E). This lets us do fancy tricks, like: if you don't remember whether it's something_app or something-app, you can just say something.app. Also, if there are roles called both foo and foobar, then chef_ssh foo will fail saying there are two matching roles. You can of course extend the function to prefer exact matches. I'm a bit lazy, so instead of thinking more, I'd just use chef_ssh foo\$.
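
If you're less lazy than me, preferring exact matches could look roughly like this. It's only a sketch: find_roles_exact is a made-up name, and it assumes the same $CHEF_HOME/roles layout with .json files as find_roles above.

```shell
# Like find_roles, but if a role name equals the input exactly,
# print only that role; otherwise fall back to the regex match.
find_roles_exact() {
    wanted=$1
    roles=$(ls "$CHEF_HOME/roles" | sed 's/\.json$//')
    if printf '%s\n' "$roles" | grep -qxF -- "$wanted"; then
        # -x matches whole lines, -F treats the input as a fixed string
        printf '%s\n' "$wanted"
    else
        printf '%s\n' "$roles" | grep -E -- "$wanted"
    fi
}
```

With roles foo and foobar, find_roles_exact foo prints just foo, while find_roles_exact fo still lists both.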

Putting it all together

I still need to remember which function to call. Since we're already optimizing, let's go all the way and create a single function that will use the above functions to do the right thing. There's some thinking we have to do:

Check if a string is an EC2 instance id: grep -o 'i-[0-9a-f]\{8\}'
Check if a string resolves as a host / ip address. The host command can do this for us, but if the intertubes are slow / your DNS server is down, this can take quite a while, so let's put a timeout on it: timeout 2 host "$1"
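
The first check can be sketched as a tiny predicate. Note that this version is anchored, so it only accepts a string that is an instance id and nothing else, whereas grep -o would also pull an id out of the middle of a longer string. The name is_ec2_id is made up for illustration.

```shell
# Succeeds only if the whole argument looks like an 8-hex-digit EC2 instance id.
is_ec2_id() {
    printf '%s\n' "$1" | grep -qE '^i-[0-9a-f]{8}$'
}
```

For example, is_ec2_id i-189d8c21 succeeds, while is_ec2_id app1 and is_ec2_id xi-189d8c21 both fail. AWS later introduced longer instance ids (17 hex digits), so the count would need adjusting for those.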

We'll also want some verbose output for debugging our logic, which should be off by default. A useful pattern is writing a function like this:

ssh_debug() {
    [ -n "$SSH_DEBUG" ] && echo "$@"
}

Then instead of echo useful debug info you can write ssh_debug useful debug info, and have it only actually written to STDOUT if you ran your command with something like SSH_DEBUG=1 in the ENV.
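
One possible refinement, shown here only as a sketch: send the debug output to STDERR instead, so it never ends up in captured or piped output, and make the function return success even when debugging is off. The name ssh_debug_stderr is made up to keep it apart from the version above.

```shell
# Print debug info to stderr, but only if SSH_DEBUG is set and non-empty.
# Using if/fi (rather than &&) means the function returns 0 when debugging is off.
ssh_debug_stderr() {
    if [ -n "$SSH_DEBUG" ]; then
        echo "$@" >&2
    fi
}
```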

Now, linking this all up:

s() {
    id=$(echo "$1" | grep -o 'i-[0-9a-f]\{8\}')
    if [ $? -eq 0 ]; then
        shift
        ssh_debug "First argument looks like an EC2 instance id"
        ec2_ssh "$id" "$@"
    elif timeout 2 host "$1" > /dev/null; then
        ssh_debug "First argument doesn't look like an EC2 instance id, but I can resolve it as a hostname. Logging in directly."
        ssh "$@"
    else
        ssh_debug "First argument doesn't look like an EC2 instance id, I can't resolve it as a hostname, assuming it's a Chef role (maybe a regex)"
        chef_ssh "$@"
    fi
}

And then the wrapper functions sr and sp can be updated to use s instead of ssh. Since sm and smp were already using sr, they automatically benefit from the implementation of s. Note that when using Chef roles, extra arguments like -l root will be ignored due to the differences in argument schemes.
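
A minimal sketch of the updated wrappers, assuming s is defined as above. The host (or instance id, or role) is peeled off first and passed as the first argument, because that's the one s inspects; -l goes right after it, ahead of any remote command.

```shell
# sr / sp on top of s: the target stays the first argument,
# the login option comes second, a remote command (if any) last.
sr() {
    host=$1; shift
    s "$host" -l root "$@"
}
sp() {
    host=$1; shift
    s "$host" -l publisher "$@"
}
```

So sr i-189d8c21 ends up as s i-189d8c21 -l root, and sm keeps working unchanged because it only ever calls sr.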

To see the whole thing in one piece, just look at my aliases file. The SSH part begins at around line 45 (no direct link since it'll probably change, this link will always point to the latest version instead of the correct line).

Take-away

You're dedicated, having read this far. Good, good.
If you apply just the tiniest bit of the experience you already use in your job (in a somewhat new way), you can significantly speed up how you actually do said job.

Happy hacking!

Dev Environment as Software - SSH and Aliases