The Proc Connector and Socket Filters

The proc connector is one of those interesting kernel features that most people rarely come across, and even more rarely find documentation on. Likewise the socket filter. This is a shame, because they’re both really quite useful interfaces that might serve a variety of purposes if they were better documented.

The proc connector allows you to receive notification of process events such as fork and exec calls, as well as changes to a process’s uid, gid or sid (session id). These are provided through a socket-based interface by reading instances of struct proc_event defined in the kernel header.

#include <linux/cn_proc.h>

The interface is built on the more generic connector API, which itself is built on the generic netlink API. These interfaces add some complexity as they are intended to provide bi-directional communication between the kernel and userspace; the connector API appears to have been largely forgotten as newer such socket interfaces simply declare their own first-class socket classes. So we need the headers for those too.

#include <linux/netlink.h>
#include <linux/connector.h>

(For brevity, I’ll omit any standard boilerplate such as the headers you need for syscalls and library functions that you should be used to as well as function definitions, error checking, and so-forth.)

Ok, now we’re ready to create the connector socket. This is straightforward enough; since we’re dealing with atomic messages rather than a stream, a datagram socket is appropriate.

int sock;
sock = socket (PF_NETLINK, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC,
               NETLINK_CONNECTOR);

To select the proc connector we bind the socket using a struct sockaddr_nl object.

struct sockaddr_nl addr;
addr.nl_family = AF_NETLINK;
addr.nl_pid = getpid ();
addr.nl_groups = CN_IDX_PROC;

bind (sock, (struct sockaddr *)&addr, sizeof addr);

Unfortunately that’s not quite enough yet; the proc connector socket is a bit of a firehose, so it doesn’t in fact send any messages until a process has subscribed to it. So we have to send a subscription message.

As I mentioned before, the proc connector is built on top of the generic connector, and that itself is on top of netlink, so sending that subscription message also involves embedding a message inside a message inside a message.  If you understood Christopher Nolan’s Inception, you should do just fine.

Since we’re nesting a proc connector operation message inside a connector message inside a netlink message, it’s easiest to use an iovec for this kind of thing.

struct iovec iov[3];
char nlmsghdrbuf[NLMSG_LENGTH (0)];
struct nlmsghdr *nlmsghdr = (struct nlmsghdr *)nlmsghdrbuf;
struct cn_msg cn_msg;
enum proc_cn_mcast_op op;

nlmsghdr->nlmsg_len = NLMSG_LENGTH (sizeof cn_msg + sizeof op);
nlmsghdr->nlmsg_type = NLMSG_DONE;
nlmsghdr->nlmsg_flags = 0;
nlmsghdr->nlmsg_seq = 0;
nlmsghdr->nlmsg_pid = getpid ();

iov[0].iov_base = nlmsghdrbuf;
iov[0].iov_len = NLMSG_LENGTH (0);

cn_msg.id.idx = CN_IDX_PROC;
cn_msg.id.val = CN_VAL_PROC;
cn_msg.seq = 0;
cn_msg.ack = 0;
cn_msg.len = sizeof op;

iov[1].iov_base = &cn_msg;
iov[1].iov_len = sizeof cn_msg;

op = PROC_CN_MCAST_LISTEN;

iov[2].iov_base = &op;
iov[2].iov_len = sizeof op;

writev (sock, iov, 3);

The netlink message length is the combined length of the following connector and proc connector operation messages, and is otherwise simply a message from our process id with no following messages.  However all of the interfaces to netlink take a lot of care to make sure the following structure in the message is aligned as wide as possible using the NLMSG_LENGTH macro, to avoid issues with platforms that have fixed alignment for data types, so we have to be careful of that too.

So we actually have a bit of padding between the struct nlmsghdr and the struct cn_msg; this is accomplished by using a character buffer of the right size for the first iovec element and accessing it through a struct nlmsghdr pointer.

The connector message indicates that it is relevant to the proc connector through the idx and val fields, and the length is the length of the proc connector operation message.

Finally the proc connector operation message (just an enum) says we want to subscribe. Why isn’t there padding between the connector and proc connector operation messages? Because the last element in struct cn_msg is a zero-width type which results in the right padding; this interface is rather newer than netlink.

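iovec stitches it all together so it’s sent as a single message. Laid out in order, that message is the struct nlmsghdr (plus any alignment padding), then the struct cn_msg, then the proc connector operation enum.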

There’s a matching PROC_CN_MCAST_IGNORE message if you want to turn off the firehose without closing the socket.
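To make the nesting concrete, here is a minimal sketch that assembles the same subscription message into one contiguous buffer instead of an iovec. The helper name build_mcast_msg is my own invention for illustration and isn't part of any kernel or library API; it assumes the Linux headers shown earlier are installed, and it copies each structure with memcpy to sidestep alignment worries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/connector.h>
#include <linux/cn_proc.h>

/* Hypothetical helper: writes a netlink message wrapping a connector
 * message wrapping a proc connector operation into buf.  Returns the
 * number of bytes written, or 0 if buf is too small. */
static size_t
build_mcast_msg (unsigned char *buf, size_t buflen,
                 enum proc_cn_mcast_op op)
{
        struct nlmsghdr nlh;
        struct cn_msg cn;
        size_t total = NLMSG_LENGTH (sizeof cn + sizeof op);

        assert (buf != NULL);
        if (buflen < total)
                return 0;

        memset (buf, 0, total);

        /* Outer layer: the netlink header. */
        memset (&nlh, 0, sizeof nlh);
        nlh.nlmsg_len = total;
        nlh.nlmsg_type = NLMSG_DONE;
        nlh.nlmsg_pid = getpid ();
        memcpy (buf, &nlh, sizeof nlh);

        /* Middle layer: the connector header, placed after the
         * (aligned) netlink header. */
        memset (&cn, 0, sizeof cn);
        cn.id.idx = CN_IDX_PROC;
        cn.id.val = CN_VAL_PROC;
        cn.len = sizeof op;
        memcpy (buf + NLMSG_LENGTH (0), &cn, sizeof cn);

        /* Inner layer: the proc connector operation itself. */
        memcpy (buf + NLMSG_LENGTH (0) + sizeof cn, &op, sizeof op);

        return total;
}
```

Passing PROC_CN_MCAST_IGNORE instead builds the matching unsubscribe message; either way the buffer can be handed to a single write on the connector socket.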

Ok, the firehose is on; now we need to read the stream of messages.  Just like the message we sent, the messages we receive are actually netlink messages, and inside those netlink messages are connector messages, and inside those are proc connector messages.

Netlink allows for all sorts of things like multi-part messages, but in reality we can ignore most of that since connector doesn’t use them. Still, it’s worth future-proofing ourselves and being liberal in what we accept.

struct msghdr msghdr;
struct sockaddr_nl addr;
struct iovec iov[1];
char buf[PAGE_SIZE];
ssize_t len;

msghdr.msg_name = &addr;
msghdr.msg_namelen = sizeof addr;
msghdr.msg_iov = iov;
msghdr.msg_iovlen = 1;
msghdr.msg_control = NULL;
msghdr.msg_controllen = 0;
msghdr.msg_flags = 0;

iov[0].iov_base = buf;
iov[0].iov_len = sizeof buf;

len = recvmsg (sock, &msghdr, 0);

Why do we use recvmsg rather than just read? Because netlink allows arbitrary processes to send messages to each other, so we need to make sure the message actually comes from the kernel; otherwise you have a potential security vulnerability. recvmsg lets us receive the sender address as well as the data.

if (addr.nl_pid != 0)
        continue;

(I’m assuming you’re reading in a loop there.)

So now we have a netlink message packet from the kernel; this may contain multiple individual netlink messages (it doesn’t, but it may). So we iterate over those.

for (struct nlmsghdr *nlmsghdr = (struct nlmsghdr *)buf;
     NLMSG_OK (nlmsghdr, len);
     nlmsghdr = NLMSG_NEXT (nlmsghdr, len))

And we should ignore error or no-op messages from netlink.

if ((nlmsghdr->nlmsg_type == NLMSG_ERROR)
    || (nlmsghdr->nlmsg_type == NLMSG_NOOP))
        continue;

Inside each individual netlink message is a connector message; we extract that and make sure it comes from the proc connector system.

struct cn_msg *cn_msg = NLMSG_DATA (nlmsghdr);

if ((cn_msg->id.idx != CN_IDX_PROC)
    || (cn_msg->id.val != CN_VAL_PROC))
        continue;

Now we can safely extract the proc connector message; this is a struct proc_event that we haven’t seen before. It’s quite a large structure definition so I won’t paste it here, since it contains a union for each of the different possible message types. Instead here’s code to actually print the relevant contents for an example message.

struct proc_event *ev = (struct proc_event *)cn_msg->data;

switch (ev->what) {
case PROC_EVENT_FORK:
        printf ("FORK %d/%d -> %d/%d\n",
                ev->event_data.fork.parent_pid,
                ev->event_data.fork.parent_tgid,
                ev->event_data.fork.child_pid,
                ev->event_data.fork.child_tgid);
        break;
/* more message types here */
}

As you can see, each message type has an associated member of the event_data union containing the information fields for it. And as you can see, this gives you information about each individual kernel task, not just the top-level processes you’re normally used to seeing. In other words, you see threads as well as processes.
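As a rough illustration of the "more message types here" part, this sketch formats a few event types into a string. The function name format_proc_event and the output format are my own; the union members used (fork, exec, exit) are the ones declared in linux/cn_proc.h, and anything else falls through to a generic case.

```c
#include <assert.h>
#include <stdio.h>
#include <linux/cn_proc.h>

/* Hypothetical helper: describes an event in out, returning the
 * length that snprintf reports. */
static int
format_proc_event (const struct proc_event *ev, char *out, size_t outlen)
{
        assert (ev != NULL && out != NULL);

        switch (ev->what) {
        case PROC_EVENT_FORK:
                /* pid/tgid pairs differ when a thread is involved. */
                return snprintf (out, outlen, "FORK %d/%d -> %d/%d",
                                 ev->event_data.fork.parent_pid,
                                 ev->event_data.fork.parent_tgid,
                                 ev->event_data.fork.child_pid,
                                 ev->event_data.fork.child_tgid);
        case PROC_EVENT_EXEC:
                return snprintf (out, outlen, "EXEC %d/%d",
                                 ev->event_data.exec.process_pid,
                                 ev->event_data.exec.process_tgid);
        case PROC_EVENT_EXIT:
                return snprintf (out, outlen, "EXIT %d/%d code=%u",
                                 ev->event_data.exit.process_pid,
                                 ev->event_data.exit.process_tgid,
                                 (unsigned int) ev->event_data.exit.exit_code);
        default:
                /* Unknown or unhandled event types are still reported. */
                return snprintf (out, outlen, "OTHER 0x%x",
                                 (unsigned int) ev->what);
        }
}
```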

Like I keep saying, it’s a firehose. It would be great if there was some way to filter the socket in the kernel so that our process doesn’t even get woken up for messages. Wake-ups are bad, especially in the embedded space.

Fortunately there is a way to filter sockets on the kernel-side, the kernel socket filter interface. Unfortunately this isn’t too well documented either; but let’s use this opportunity to document an example.

We’ll filter the socket so that we only receive fork notifications, discarding the other types of proc connector event type and most importantly discarding the messages that indicate new threads being created (those where the pid and tgid fields differ). One important part of filtering is that you should be careful so that only expected messages are filtered, and that unexpected messages are still passed through.

The filter machine consists of a set of machine language instructions added to the socket through a special socket option. Fortunately this machine language is copied from the Berkeley Packet Filter from BSD, so we can find documentation for it in the bpf(4) manual page there. Just ignore the structure definitions, because they are different on Linux.

So let’s get started with our example; first we need to add the right header.

#include <linux/filter.h>

And now we need to attach the filter to the socket; just after creating it, before the subscription message is sent, is usually a good place. On Linux the instructions are given as an array of struct sock_filter members which we can construct using the BPF_STMT and BPF_JUMP macros.

Just to make sure everything is working, we’ll create a simple “no-op” filter.

struct sock_filter filter[] = {
        BPF_STMT (BPF_RET|BPF_K, 0xffffffff),
};

struct sock_fprog fprog;
fprog.filter = filter;
fprog.len = sizeof filter / sizeof filter[0];

setsockopt (sock, SOL_SOCKET, SO_ATTACH_FILTER, &fprog, sizeof fprog);

Not very useful, but it means we can now concentrate on writing the filter code itself. This filter consists of a single statement, BPF_RET that tells the kernel to deliver an amount of bytes of the packet to the receiving process and to return from the filter. The BPF_K option means that we give the amount of bytes as the argument to the statement, and in this case we give the largest possible value. In other words, this statement declares to deliver the whole packet and return from the filter.

To not wake up the process at all and filter everything, we deliver no bytes and return from the filter.

BPF_STMT (BPF_RET|BPF_K, 0),

You may want to test that too.

Ok, now let’s actually do some examination of the packets to filter out the noise. Recall that we’re dealing with nested messages here, messages inside messages, inside messages. Visualizing this is really important to understanding what you’re dealing with.

The most basic filter code consists of three operations: load a value from the packet into the machine’s accumulator, compare that against a value and jump to a different instruction if equal (or not equal), and then possibly return or perform another operation.

All of the following filter code replaces whatever you had in the filter[] array before.

So first we should examine the nlmsghdr at the start of the packet; we want to make sure that there is just one netlink message in this packet. If there are multiple, we just pass the whole packet to userspace to deal with. We check the nlmsg_type field to make sure it contains the value NLMSG_DONE.

BPF_STMT (BPF_LD|BPF_H|BPF_ABS,
          offsetof (struct nlmsghdr, nlmsg_type)),
BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_K,
          htons (NLMSG_DONE),
          1, 0),
BPF_STMT (BPF_RET|BPF_K, 0xffffffff),

The first statement says to load (BPF_LD) a “halfword” (16-bit) value (BPF_H) from the absolute offset (BPF_ABS) equivalent to the position of the nlmsg_type member in struct nlmsghdr. Since we expect that structure to be the start of the message, this means the accumulator should now have that value.

The next statement is a jump (BPF_JMP), it says to compare the accumulator for equality (BPF_JEQ) against the constant argument (BPF_K). We only want to continue if this is the sole message, so the value we compare against is NLMSG_DONE – first remembering to deal with host and network ordering.

If true, the jump will jump one statement; if false the jump will not jump any statements. These are the third and fourth arguments to the BPF_JUMP macro.

Note that the error case is always to return the whole packet to the process, waking it up. And the success case is further processing of the packet. This makes sure that we don’t filter unexpected packets that userspace may really need to deal with. Don’t use the socket filter for security filtering; it’s for reducing wake-ups.
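If you'd like to convince yourself of the jump arithmetic without touching a socket, here is a small sketch that evaluates the nlmsg_type check in userspace. The run_filter function is a hypothetical toy that understands only the three opcodes used here; it is not the kernel's interpreter, just an approximation of the documented behaviour (big-endian packet loads, jump offsets relative to the next instruction, out-of-range loads dropping the packet). A trailing "return 0" stands in for the checks that follow in the real filter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <linux/filter.h>
#include <linux/netlink.h>

/* Hypothetical toy evaluator for a tiny subset of classic BPF. */
static uint32_t
run_filter (const struct sock_filter *prog, size_t n,
            const unsigned char *pkt, size_t pktlen)
{
        uint32_t acc = 0;

        assert (prog != NULL && pkt != NULL);

        for (size_t pc = 0; pc < n; pc++) {
                const struct sock_filter *insn = &prog[pc];

                switch (insn->code) {
                case BPF_LD|BPF_H|BPF_ABS:
                        /* Loads past the end of the packet drop it. */
                        if ((size_t) insn->k + 2 > pktlen)
                                return 0;
                        /* Packet loads are in network byte order. */
                        acc = ((uint32_t) pkt[insn->k] << 8)
                                | pkt[insn->k + 1];
                        break;
                case BPF_JMP|BPF_JEQ|BPF_K:
                        /* Skip jt or jf instructions past the next one. */
                        pc += (acc == insn->k) ? insn->jt : insn->jf;
                        break;
                case BPF_RET|BPF_K:
                        return insn->k;
                default:
                        return 0;
                }
        }

        return 0;
}

/* Runs the nlmsg_type check against a bare netlink header. */
static uint32_t
check_nlmsg_type (uint16_t type)
{
        struct nlmsghdr nlh = { .nlmsg_len = NLMSG_LENGTH (0),
                                .nlmsg_type = type };
        unsigned char pkt[sizeof nlh];
        struct sock_filter prog[] = {
                BPF_STMT (BPF_LD|BPF_H|BPF_ABS,
                          offsetof (struct nlmsghdr, nlmsg_type)),
                BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_K, htons (NLMSG_DONE), 1, 0),
                BPF_STMT (BPF_RET|BPF_K, 0xffffffff), /* unexpected: pass */
                BPF_STMT (BPF_RET|BPF_K, 0), /* stand-in for later checks */
        };

        memcpy (pkt, &nlh, sizeof nlh);
        return run_filter (prog, sizeof prog / sizeof prog[0],
                           pkt, sizeof pkt);
}
```

With this, an NLMSG_DONE header falls through to the stand-in instruction, while any other type hits the "pass the whole packet" return.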

So let’s filter the next set of values, we want to make sure that this netlink message is from the connector interface. Again we load the right “word” (32-bit) values (BPF_W) from the appropriate offsets and check them against constants.

/* First the idx field of the connector id... */
BPF_STMT (BPF_LD|BPF_W|BPF_ABS,
          NLMSG_LENGTH (0) + offsetof (struct cn_msg, id)
          + offsetof (struct cb_id, idx)),
BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_K,
          htonl (CN_IDX_PROC),
          1, 0),
BPF_STMT (BPF_RET|BPF_K, 0xffffffff),

/* ...then the val field. */
BPF_STMT (BPF_LD|BPF_W|BPF_ABS,
          NLMSG_LENGTH (0) + offsetof (struct cn_msg, id)
          + offsetof (struct cb_id, val)),
BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_K,
          htonl (CN_VAL_PROC),
          1, 0),
BPF_STMT (BPF_RET|BPF_K, 0xffffffff),

So after this filter code has executed, we know the packet contains a single netlink message from the proc connector. Now we want to make sure it’s a fork message; this is a bit different from before, because now we explicitly do filter out the other message types so the return case for non-equality is to return zero bytes.

BPF_STMT (BPF_LD|BPF_W|BPF_ABS,
          NLMSG_LENGTH (0) + offsetof (struct cn_msg, data)
          + offsetof (struct proc_event, what)),
BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_K,
          htonl (PROC_EVENT_FORK),
          1, 0),
BPF_STMT (BPF_RET|BPF_K, 0),

And now we can compare the pid and tgid values for the parent process and the child process fields. This is again slightly interesting because we can’t compare against an absolute offset with the jump instruction so we use the second index register instead (BPF_X in the jump instruction). Of course it would be too easy if we could load directly into that, so we have to do it via the scratch memory store instead; this requires loading into the accumulator (BPF_LD), storing into scratch memory (BPF_ST) and loading the index register (BPF_LDX) from scratch memory (BPF_MEM).

BPF_STMT (BPF_LD|BPF_W|BPF_ABS,
          NLMSG_LENGTH (0) + offsetof (struct cn_msg, data)
          + offsetof (struct proc_event, event_data)
          + offsetof (struct fork_proc_event, parent_pid)),
BPF_STMT (BPF_ST, 0),
BPF_STMT (BPF_LDX|BPF_W|BPF_MEM, 0),

Then we load the tgid value into the accumulator and we can compare and jump as before; if they are equal we want to continue, if they are not equal we want to filter the packet.

BPF_STMT (BPF_LD|BPF_W|BPF_ABS,
          NLMSG_LENGTH (0) + offsetof (struct cn_msg, data)
          + offsetof (struct proc_event, event_data)
          + offsetof (struct fork_proc_event, parent_tgid)),
BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_X,
          0,
          1, 0),
BPF_STMT (BPF_RET|BPF_K, 0),

Then we do the same for the child fields.

/* Stash child_pid in the index register via scratch memory. */
BPF_STMT (BPF_LD|BPF_W|BPF_ABS,
          NLMSG_LENGTH (0) + offsetof (struct cn_msg, data)
          + offsetof (struct proc_event, event_data)
          + offsetof (struct fork_proc_event, child_pid)),
BPF_STMT (BPF_ST, 0),
BPF_STMT (BPF_LDX|BPF_W|BPF_MEM, 0),

/* Compare child_tgid against it; drop the packet if they differ. */
BPF_STMT (BPF_LD|BPF_W|BPF_ABS,
          NLMSG_LENGTH (0) + offsetof (struct cn_msg, data)
          + offsetof (struct proc_event, event_data)
          + offsetof (struct fork_proc_event, child_tgid)),
BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_X,
          0,
          1, 0),
BPF_STMT (BPF_RET|BPF_K, 0),

After all that filter hurdling, we have a packet that we want to pass through to the process, so the final instruction is a return of the largest packet size.

BPF_STMT (BPF_RET|BPF_K, 0xffffffff),
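For reference, here is a sketch of the whole filter assembled as one array, with the val field checked in the second connector test and the child fields compared in the last one. The offset macros (CN_OFF, ID_OFF, EV_OFF, FORK_OFF), the PASS and DROP shorthands and the build_fork_filter function are all names I've made up to keep the listing short; the result is meant to be handed to SO_ATTACH_FILTER through a struct sock_fprog as shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>
#include <linux/filter.h>
#include <linux/netlink.h>
#include <linux/connector.h>
#include <linux/cn_proc.h>

/* Hypothetical shorthands: byte offsets from the start of the packet. */
#define CN_OFF(field)   (NLMSG_LENGTH (0) + offsetof (struct cn_msg, field))
#define ID_OFF(field)   (CN_OFF (id) + offsetof (struct cb_id, field))
#define EV_OFF(field)   (CN_OFF (data) + offsetof (struct proc_event, field))
#define FORK_OFF(field) (EV_OFF (event_data) \
                         + offsetof (struct fork_proc_event, field))

#define PASS BPF_STMT (BPF_RET|BPF_K, 0xffffffff)
#define DROP BPF_STMT (BPF_RET|BPF_K, 0)

/* Copies the fork-only filter into out; returns the instruction
 * count, or 0 if out has room for fewer than that. */
static size_t
build_fork_filter (struct sock_filter *out, size_t max)
{
        struct sock_filter prog[] = {
                /* A lone netlink message?  If not, pass it up. */
                BPF_STMT (BPF_LD|BPF_H|BPF_ABS,
                          offsetof (struct nlmsghdr, nlmsg_type)),
                BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_K, htons (NLMSG_DONE), 1, 0),
                PASS,
                /* From the proc connector?  If not, pass it up. */
                BPF_STMT (BPF_LD|BPF_W|BPF_ABS, ID_OFF (idx)),
                BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_K, htonl (CN_IDX_PROC), 1, 0),
                PASS,
                BPF_STMT (BPF_LD|BPF_W|BPF_ABS, ID_OFF (val)),
                BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_K, htonl (CN_VAL_PROC), 1, 0),
                PASS,
                /* A fork event?  If not, drop it. */
                BPF_STMT (BPF_LD|BPF_W|BPF_ABS, EV_OFF (what)),
                BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_K, htonl (PROC_EVENT_FORK), 1, 0),
                DROP,
                /* Parent pid == tgid?  If not, drop it. */
                BPF_STMT (BPF_LD|BPF_W|BPF_ABS, FORK_OFF (parent_pid)),
                BPF_STMT (BPF_ST, 0),
                BPF_STMT (BPF_LDX|BPF_W|BPF_MEM, 0),
                BPF_STMT (BPF_LD|BPF_W|BPF_ABS, FORK_OFF (parent_tgid)),
                BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_X, 0, 1, 0),
                DROP,
                /* Child pid == tgid?  If not, drop it. */
                BPF_STMT (BPF_LD|BPF_W|BPF_ABS, FORK_OFF (child_pid)),
                BPF_STMT (BPF_ST, 0),
                BPF_STMT (BPF_LDX|BPF_W|BPF_MEM, 0),
                BPF_STMT (BPF_LD|BPF_W|BPF_ABS, FORK_OFF (child_tgid)),
                BPF_JUMP (BPF_JMP|BPF_JEQ|BPF_X, 0, 1, 0),
                DROP,
                /* Everything matched: deliver the whole packet. */
                PASS,
        };
        size_t n = sizeof prog / sizeof prog[0];

        assert (out != NULL);
        if (max < n)
                return 0;

        memcpy (out, prog, sizeof prog);
        return n;
}
```

The array is built at run time inside a function because htons and htonl aren't guaranteed to be constant expressions, so they can't reliably appear in a static initializer.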

That’s it. Of course, what you do with this is up to you. One example could be a daemon that watches for excessive forks and kills fork bombs before they kill the machine. Since you get notification of changes of uid or gid, another example could be a security audit daemon, etc.

Upstart uses this interface for its own nefarious process tracking purposes.

The Importance of Being Tested

In addition to the regular posts documenting features of 0.6 and giving hints and tips about its usage, release announcements and so forth, I’ll also be posting insights and anecdotes about Upstart’s ongoing development.  A particular story cropped up again this month, and I thought I’d share it with you.

When I began work on Upstart, one of the earliest decisions I made was to make sure the code was very-well covered by a comprehensive test suite.  I’d been working with Robert Collins a lot in the previous couple of years and he is very much an advocate of practices such as Extreme Programming (XP) and Agile Development; especially the discipline of Test Driven Development.

I’d also recently seen a keynote by Andrew Tridgell in which he talked about some of the development of Samba 4, in particular the high use of both test cases and code generation in that code-base.  Something he said in the keynote stuck with me: “untested code is broken code”.

Statistics obviously depend on exactly how you count lines of code, but using a simple semi-colon count the combined source code of libnih and Upstart is slightly over 20,000 lines of code.  The combined source code of the test suite for both is slightly over 120,000 lines of code.

The init daemon is an extremely important part of a Linux system: if it crashes then you’re left with a kernel panic; if it simply misbehaves, you’re left with just severe problems.  Not only was I changing it, but I was replacing a very simple dumb system (Sys V init) with something comparatively complex with rules and behaviours that needed rigorous testing.

It would have been very scary to have developed it without the careful testing, and I would have been very worried if anyone had agreed to replace such a core component of the system without this test suite to back up its behaviour.

That being said, maintaining the test suite can be a huge burden.  Don’t believe what anybody tells you, if you’re writing test cases as well as code, then your pace of development slows as well.  They’re right that you spend a lot less time debugging of course, but unlike in the commercial software business free software developers tend to release first and debug later.   If you use a similarly high test to code ratio in your own project, then you’ll find that the time until your first release will be pretty long and the time between releases longer as well.

Another decision is whether to do Test Driven Development or not; that discipline requires that you always write the tests first, to fail, and only write code in order to make the tests pass.  I’m not a fan of TDD, and I’ve no problem admitting that I mostly did not use it for Upstart.  My gut feel is that TDD produces code that hangs, swings and loops just to deal with testing.  It also just doesn’t suit my coding style: I like to write code from the middle outwards, the function API is the last thing I tend to fix, where TDD forces it to be the first.

I’m also not convinced TDD is really suitable for a language like C; it’s pretty hard to get a test case to compile, run and fail without writing any supporting code such as a header file, etc.

I have found TDD useful when I have code that really does break down into a single unit with a well-defined and obvious API, and that while the inputs and outputs have been obvious, the algorithm for getting between them wasn’t at the time.

What I’ve tended to do instead is write code naturally how I would, and write test cases alongside to run the code and make sure it’s working.  As the code grows more complex, more test cases appear for it.  One big advantage to this is then I don’t need to reboot or fire up a VM as much, I can test a large proportion of Upstart’s operation through testing.

Now, onto the stories.  There are two similar ones.

One of the side-effects of testing Upstart so strongly is that the tests are not only driving the code I’ve written but also code in libraries and even in the kernel.  One particular set of tests was covering the code in libnih and Upstart that handles watching the configuration directory for changes; it’s this code that means Upstart automatically reloads jobs when you edit them without needing an explicit signal.

One day these test cases started failing without warning.  Investigation showed that they passed fine under older kernels, but with the newest kernel update to Ubuntu, they failed.

The inotify subsystem in the kernel had undergone a radical overhaul and rewrite.  Rather than being its own code, it was completely rebased onto the new fsnotify system.  Fortunately I was aware of this, and after careful checking that it was indeed the kernel behaviour that was now incorrect (and that it wasn’t incorrect before), I got in touch with Eric Paris, the author of the new code, and was able to give him minimal example code to replicate the problem.

inotify: check filename before dropping repeat events

This was a while ago, but pretty much the same story happened again recently, just this time not with the kernel.

Again, the story started with Upstart’s test suite failing.  The engineer who first noticed it assumed it was an issue with the new build daemon and disabled the test for the time being.  The test was in the part of the code testing Upstart’s interaction with D-Bus.

Now, sometimes I tend to write tests to deal with corner-cases and “what if” scenarios that I dream up.  This isn’t always about testing my code, often it’s a case of finding out whether something is really possible or whether that thing misbehaves.  These tests still stay in the suite of course.

A particular set of tests was intended to find out what happened if the D-Bus daemon crashed during initial connection. I considered this fairly important because at times the libdbus library has called exit() or abort() when things happened that it didn’t like.  If you call that from the init daemon, the kernel panics.

These tests had worked fine for a couple of years (actually at the time I had to fix bugs in libdbus to make them pass) but now one of these tests was breaking.  The disconnection was causing SIGPIPE to be delivered to the test.

Again, this turned out to be due to a change to D-Bus.  Lennart Poettering had been working on some changes to avoid libdbus’s awkward SIGPIPE handling and replace it with the use of the MSG_NOSIGNAL flag.  Unfortunately he’d missed a case in the authentication code.  The side-effect was that if the D-Bus daemon had crashed, been killed, OOM’d, etc. during initial connection – the connecting application would have gone too.  Especially bad for an init daemon.

Fortunately Upstart’s test suite caught it, and the fix was simple.

sysdeps-unix: use MSG_NOSIGNAL when sending creds

(reposted from http://upstart.at/2010/12/20/the-importance-of-being-tested/ – post comments there)

Events are like Methods

In last week’s post I talked about how Events can be treated like Signals, this week we’ll be looking at how Events can be treated like Methods.  That might seem a little surprising, since normally one considers signals and methods as very different things, but to Upstart they are both just events.

What do I mean by Methods?  You’ve almost certainly done some kind of programming, even if just a little scripting, so you should know about methods or functions.

In contrast to signals, which are just a notification that something happened on the system, a method is a request for the system to do something on your behalf.  Usually to make some kind of change to the system state.

Likewise in contrast to the signals where you don’t care about the result, for a method you want to wait for the changes to be completed and perhaps even be notified if the method failed.

It’s just as easy to implement a method in Upstart as it is to implement something that considers an event a signal.  Here’s an example of how you might implement a suspend method:

start on suspend

task
exec pm suspend

Doesn’t look that much different from a signal; the only new stanza in this is task (and that’s not necessary for a method either).  So what happens if we want to trigger a suspend?  We use the command:

root@worldofwarcraft:~# initctl emit suspend

The difference here from emitting a signal we demonstrated in the previous post is that we aren’t using the --no-wait flag.

So we emit the suspend event, and Upstart will start our job as a result; but initctl emit will not return immediately, it waits for the results of the event to complete before it returns.

Because we used the task stanza in the configuration, we’ve told Upstart that the process we execute is expected to take a limited amount of time and then finish by itself.  This means that Upstart will not believe the job is complete until the process has exited, and will continue to block the event while it is still running.

Finally if the command exited with an error, that error is propagated back to the event that started it, and the initctl emit command will exit with an error code.

So now we can use Upstart events and jobs for two different purposes; we can announce changes to the system, and we can use them as methods to make changes to the system.

The most typical event that is used as a method on your system is the runlevel event used to change the runlevel for System-V compatibility and generally emitted by the telinit and shutdown tools.  The /etc/init/rc.conf script that handles it can be pretty simple and looks not unlike the suspend example above:

start on runlevel [0123456]

task
exec /etc/init.d/rc $RUNLEVEL

What happens if you don’t include task?  Well, that means Upstart will consider the job as ready when the process executed is running, and the event will be unblocked and initctl emit will return.  If the service fails to start, then initctl will return with an error.  This is great for methods that start (or stop) services.

Side-note: the start and stop commands act very much like method events, they block until the service is running or the task has finished and they return errors as well.  However they’re not actually implemented as events right now, an oversight I intend to correct in Upstart 2.

(reposted from http://upstart.at/2010/12/16/events-are-like-methods/ – post comments there)

Event matching in Upstart

A little while ago I was asked to solve a problem that somebody was having with Upstart, and I realised that people weren’t understanding how things were actually working and were just muddling along when doing event matching in jobs.  This is unfortunate, because it hides some of Upstart’s true power, so I thought it high time I actually explained this.

Let’s start with a simple example.  Fire up any Linux distribution with Upstart 0.6, Ubuntu or Fedora current releases will do, and create a file named /etc/init/example1.conf with the following content:

start on surprise

This is pretty simple, it’s a job that does nothing except declare that it’s started when the surprise event happens.  We can demonstrate that works by emitting the event ourselves and checking the status of the job before and afterwards:

root@angrybirds:/etc/init# status example1
example1 stop/waiting
root@angrybirds:/etc/init# initctl emit surprise
root@angrybirds:/etc/init# status example1
example1 start/running

Nothing too surprising after all, I hope.  The job did indeed start on the surprise event, and would now be running if we’d actually told Upstart to run something.

Incidentally I’m often asked why there isn’t a single list of events anywhere; that’s because you can match any event you like as long as you know something emits it.  Events are supposed to come from all manner of sources.  I do try and document them though; try running man 7 startup on your system to see an example of an event’s man page.

If events were just names, they’d be pretty boring.  Events can also have attached environment variables, and these get put into the environment of any job’s process started by the event.  Here’s /etc/init/example2.conf:

start on weather

script
    echo $KIND > /tmp/weather
end script

This will now run a small shell script that outputs the $KIND environment variable to a file.  This isn’t set anywhere, but we can pass it in the event.

root@angrybirds:/etc/init# cat /tmp/weather
cat: /tmp/weather: No such file or directory
root@angrybirds:/etc/init# initctl emit weather KIND=RAIN
root@angrybirds:/etc/init# cat /tmp/weather
RAIN

Ok, these are just examples but there are plenty of useful events on your system right now which carry environment variables such as which network interface just came up, and so on.

If you wanted to only run on a certain type of weather, you might think to check the value of $KIND within the script; you could do that, but it’s inefficient, ideally you don’t want your script run at all.  Fortunately we can match the environment of an event in the job easily enough, here’s /etc/init/example3.conf:

start on weather KIND=snow

Hopefully you’ll figure that this one will only start if it’s snowing, and you’d be right:

root@angrybirds:/etc/init# status example3
example3 stop/waiting
root@angrybirds:/etc/init# initctl emit weather KIND=hail
root@angrybirds:/etc/init# status example3
example3 stop/waiting
root@angrybirds:/etc/init# initctl emit weather KIND=snow
root@angrybirds:/etc/init# status example3
example3 start/running

Events can have more than one environment variable, and you can have more than one match:

start on weather KIND=rain INTENSITY=heavy

The matches are actually globs, so you can use * and ? in there, and as well as = there’s obviously !=.

One useful use for the latter is in the stop on stanza. As well as being available to the job’s processes, the event’s environment variables can also be used in other stanzas within the job.  Here’s a cute example for /etc/init/example4.conf:
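If glob semantics are unfamiliar, this little C sketch shows how such a match behaves using the POSIX fnmatch function. To be clear, env_matches is a hypothetical helper written for this illustration and I'm not claiming it is how Upstart implements matching internally; it only demonstrates what * and ? accept and how != simply inverts the result.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does an event's variable value satisfy a
 * "VAR=pattern" (negate false) or "VAR!=pattern" (negate true) match? */
static bool
env_matches (const char *pattern, const char *value, bool negate)
{
        bool matched;

        assert (pattern != NULL && value != NULL);

        /* fnmatch returns 0 when the value matches the glob. */
        matched = (fnmatch (pattern, value, 0) == 0);

        return negate ? !matched : matched;
}
```

So a pattern of r* would accept rain, s?ow would accept snow, and a != match against rain is satisfied by sun but not by rain itself.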

start on weather KIND=rain or weather KIND=snow
stop on weather KIND!=$KIND

This one takes a bit of explaining.  First of all to start the job we match the weather event with $KIND set to either rain or snow.  Now we supply a condition to stop the job, and we also match the weather event with a given value of $KIND except this time we match what looks like itself.

In fact this expansion of $KIND is the value that variable had when the job was started, not the value in the new event.  It says to stop the job if it stops raining, or stops snowing depending on which of the two started it.  Most importantly, if an event simply repeats the same kind of weather, but maybe with a different intensity, the job carries on running (but it doesn’t have its environment updated – UNIX can’t do that).

root@angrybirds:/etc/init# status example4
example4 stop/waiting
root@angrybirds:/etc/init# initctl emit weather KIND=rain INTENSITY=heavy
root@angrybirds:/etc/init# status example4
example4 start/running
root@angrybirds:/etc/init# initctl emit weather KIND=rain INTENSITY=light
root@angrybirds:/etc/init# status example4
example4 start/running
root@angrybirds:/etc/init# initctl emit weather KIND=sun
root@angrybirds:/etc/init# status example4
example4 stop/waiting

Ok, last fake example before we get onto the fun bits.  Remember the example from above:

start on weather KIND=rain INTENSITY=heavy

Upstart lets us shortcut this a little, the environment variables are specified in an order on the initctl command-line and if we know what that order is, we can just assume what variable is in that position.  So as long as we know a weather event always has a KIND followed by an INTENSITY, we could shortcut that to:

start on weather rain heavy

If you’ve used Upstart at all, you’ve seen that shortcut before.  A lot.  You may not have even realised it was a shortcut at all, and that’s what I hope to fix here.

Here’s an example of where you’ve used that:

start on started dbus

You should hopefully now recognise that started is the name of the event there, and dbus is simply the value of its first argument, whatever that might be.  Remember I mentioned that events have man pages?  Take a look at man 7 started, which is the man page for this event.

It documents which environment variables are attached to the started event, and most importantly what order they come in.

started JOB=JOB INSTANCE=INSTANCE [ENV]...

So really when we wrote the previous, we were just using a shortcut to specify:

start on started JOB=dbus
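
To make the shortcut concrete, here is a small Python sketch of turning positional values back into named ones. It is an illustration rather than Upstart's own code: the EVENT_ARG_ORDER table and the name_positional function are hypothetical, and the orders in the table are just the ones quoted in the examples above.

```python
# Variable order for each event, as given in the examples above.
EVENT_ARG_ORDER = {
    "weather": ["KIND", "INTENSITY"],
    "started": ["JOB", "INSTANCE"],
}


def name_positional(event_name, values):
    """Pair each positional value with the variable documented for that position."""
    order = EVENT_ARG_ORDER[event_name]
    return dict(zip(order, values))


print(name_positional("started", ["dbus"]))           # {'JOB': 'dbus'}
print(name_positional("weather", ["rain", "heavy"]))  # {'KIND': 'rain', 'INTENSITY': 'heavy'}
```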

You might wonder what difference this makes.  A good example of how to exploit this is the stopped event.  If you look at its man page (man 7 stopped) you’ll see it has a large number of environment variables specifying not only which job stopped but the reason for it stopping.  One of those is the exit signal, for example.

Now you know that you’re just matching the $JOB environment variable, it’s obvious that you don’t have to!  You can match any other environment variable or variables in the event, or none at all.

Here’s how to run a script if any other job on the system exits with a segmentation fault:

start on stopped EXIT_SIGNAL=SEGV

I said you didn’t have to match any variables at all, just as we didn’t in the first examples, and there’s a neat use for that with the job events.  The starting event blocks the named job from actually starting until anything run by it is started; or, in the case of jobs marked task, finished.

Here’s a little job that runs every time another job is started, and blocks that job from actually starting until the script finishes.

start on starting
task

script
    ....
end script

Useful both for debugging and performance analysis.

Now for the really neat bit.  So far we’ve concentrated on the environment variables that come from events, and those that Upstart puts into the job events.  But we can influence these in rather useful ways.

Firstly we can declare a default value for an environment variable in a job, if no alternate value is given in the start event or command, then this default value wins:

start on mounted

env MOUNTPOINT=/tmp
script
    ....
end script

This script will run for each occurrence of the mounted event, and will hopefully get the value for $MOUNTPOINT from that event.  But should the value be missing from the event, or the script be started manually by a system administrator, a default value is provided.
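
The precedence described here amounts to a simple merge. As a minimal Python sketch (the function name job_environment is made up for illustration, and this is not how Upstart itself is written):

```python
def job_environment(defaults, start_env):
    """Start from the job's env defaults, then let the start event or command override them."""
    env = dict(defaults)
    env.update(start_env)
    return env


defaults = {"MOUNTPOINT": "/tmp"}

print(job_environment(defaults, {"MOUNTPOINT": "/home"}))  # {'MOUNTPOINT': '/home'}
print(job_environment(defaults, {}))                       # {'MOUNTPOINT': '/tmp'}
```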

This isn’t a false example, that’s from the job on your system that cleans up the /tmp directory on boot.  The default value wasn’t there in earlier versions of Ubuntu, and this had a rather disastrous side-effect when run by hand.

Ok, we can set the values of environment variables from a job, and we don’t have to match the job name in the usual job events.  We can combine these two facts in a very interesting way, because we can export the value of a job’s environment variable into its job events.

Here’s the first job:

env AM_A_DISPLAY_MANAGER=1
export AM_A_DISPLAY_MANAGER

This sets the default value of $AM_A_DISPLAY_MANAGER, but this isn’t a variable we ever expect to be supplied by an event so it just gets passed into the environment of its processes.  It’s not that useful either on its own.

The export line is the useful one, it adds the value of the named environment variable to the job’s events.  That is the starting, started, stopping and stopped events.

Now, in another job, we can do:

start on started AM_A_DISPLAY_MANAGER=1

This is run when any job is started that has that environment variable in its events.  In other words, we can tag classes of services so we don’t have to list every single one.

And because everything in Upstart is the same fundamental type of thing, this can work in the opposite direction.  For example we can put in our job:

env NEED_PORTMAP=1
export NEED_PORTMAP

This means our events will have NEED_PORTMAP=1 in them, now remembering that the job waits for the side-effects of the starting event to complete, we can now write in /etc/init/portmap.conf:

start on starting NEED_PORTMAP=1

So we can implement a dependency-based init system with Upstart, an event-based init system.

I look forwards to finding out what else you can do with it.

(reposted from http://upstart.at/2010/12/03/event-matching-in-upstart/ – post comments there)

Dependency-based & Event-based init daemons and launchd

With the recent announcement of systemd, I’ve noticed some increased confusion around Upstart and what it means to be an event-based init daemon.  Now seems as good a time as any to try and clear that up by describing what I mean by that.

Dependency-based init

Before Upstart came along, the state of the art of init daemon replacements were the dependency-based init daemons.  The two most well-known at the time were the Service Management Facility (SMF) of Solaris, and initng on Linux.

The easiest way to understand how a dependency-based init daemon works is to look at another dependency-based system you’re probably more familiar with: the package manager of your Linux distribution.

When you want to install a package, for example the Apache Web Server, you tell the package manager to do that.  The Apache package will list additional dependencies that it requires to be installed, and those in turn will list additional dependencies, and so on.  The package manager will walk this dependency tree, eliminating those that you already have installed, and it will then flatten the remaining tree to get an order in which those remaining can be safely installed.

To put it simply: you say that you want Apache installed, but you may get more than that installed to ensure that Apache works.

A dependency-based init daemon works in fundamentally the same way.  When you say that you want Apache started, it looks at the configuration for that service for the list of dependency services, and builds up a similar tree.  Eliminating those already running, and flattening the tree, gives you a list of services that must be started in an order that they should be safe to start in.

You say you want Apache running, but you may get more than Apache running as a result.

Booting a system with a dependency-based init daemon, however, is a little strange.  They need to know the target set of services that must be running, otherwise they would start nothing.  SMF simply started all services that were not in manual start mode, initng had the concept of goal services whose dependencies were those that should be running — and used these to define the runlevels.

Once you have that list of goal services, you work out the dependency trees, and flatten them as normal – and thus you get an order that all services on the system should be started in.
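
The walk-and-flatten step is essentially a depth-first topological sort. The following Python sketch shows the idea under stated assumptions: the start_order function and the dependency data are hypothetical, and neither SMF nor initng is claimed to work exactly like this.

```python
def start_order(goals, depends_on, running=()):
    """Flatten the dependency trees of the goal services into a start order.

    Services already running are eliminated, and every dependency appears
    before the service that needs it.
    """
    order = []
    seen = set(running)

    def visit(service, path=()):
        if service in path:
            raise ValueError("dependency cycle at " + service)
        if service in seen:
            return
        for dep in depends_on.get(service, []):
            visit(dep, path + (service,))
        seen.add(service)
        order.append(service)

    for goal in goals:
        visit(goal)
    return order


# Hypothetical dependency data, for illustration only.
depends_on = {
    "apache": ["network", "syslog"],
    "network": ["dbus"],
    "syslog": [],
    "dbus": [],
}

print(start_order(["apache"], depends_on))
# ['dbus', 'network', 'syslog', 'apache']
print(start_order(["apache"], depends_on, running=["dbus", "syslog"]))
# ['network', 'apache']
```

Asking for one goal service yields a longer list than the one you named, which is the "you may get more than Apache running" behaviour described above.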

Dependency-based init daemons work, but I believed there was a better way to do things.  I invented the event-based init daemon instead.

Event-based init

An event-based init daemon isn’t really a great leap from a dependency-based init daemon, it simply does everything backwards.  A simplistic view says that instead of starting Apache’s dependencies because Apache is started, it starts Apache because its dependencies are now running.

But it’s much more interesting than that, and much more flexible.  Most people don’t get the epiphany.

A better description might be that services are started and stopped due to external influences on them.  Those external influences can be anything, for example: hardware coming and going; changes in the time; and not least, other services.

The events represent changes in the system state, and services define the states in which they can be running, and the system reacts accordingly.

I’m still convinced this is the best way to work, not in the least because you can implement a dependency-based system with an event-based init daemon.  Starting a service causes an event for each of its dependencies declaring a need for them, and the service waits for those events to complete; those events cause the dependencies to be started.

launchd

The other well-known init daemon out there is Apple’s launchd, of which Lennart’s recent systemd project is a similar implementation in some ways but not in others.

launchd’s modus operandi is that it starts services on demand, and it does this on the assumption that all services communicate through sockets or through the Mach IPC model.  For the socket-based services, launchd itself creates the listening sockets, and when it receives a connection it starts the service and hands off the listening socket to it.

This has a beautiful engineering elegance, and it’s easy to see why it appeals to us.

You don’t need to configure a service’s dependencies or requirements in the init daemon, instead the service causes its dependencies to be started through this on-demand activation. If the dependency isn’t ready to be started, the service simply blocks in the connect or open syscall until it is ready.

As launchd has matured, Apple have added support to watch for files on the disk and for cron-like schedule events.  In many ways, this makes launchd kinda like an event-based init daemon, except with listening sockets.

systemd takes a similar approach with regard to the listening sockets, though my understanding so far is that it combines it with a dependency-based resolution procedure for other parts of the system, rather than an event-based one.  I’m willing to be corrected on this though.

Upstart

Upstart is an event-based init daemon; it’s taken a little while to develop because it’s the first pure example of its kind, and I only replaced the working sysvinit cautiously.  I basically had to prove to myself, and others, that an event-based init daemon can really work.  That’s why Ubuntu 9.10 and 10.04 were the first versions to really start taking advantage of it.

I also wanted to keep it relatively stable to encourage adoption by other distributions, and I believe this has also paid off given that Fedora, RedHat and OpenSuSE have all adopted it now.

I’ve proven it works, and it’s been adopted, now the fun development can begin!

Two of the main complaints about Upstart are that the start on and stop on mechanism to define services is complicated and exposes far too much of the event model, and that it’s not very well documented. Ironically, these two complaints are entirely related.

The start on/stop on mechanism is basically just a debug interface, it allowed me during early development to access the raw event queue and find out what types of service model we really needed.  Since it’s a debug interface, it wasn’t documented; I knew that future versions of Upstart would have a much better model.

So to correct a common misconception, the hideous start on lines are not a side-effect of event-based init daemons; they’re a side-effect of developing an event-based init daemon in a release early open-source way.

I’ve also mentioned that events can be just about anything, not just directly from other services.  This includes on-demand activation; I don’t see any reason why Upstart should not be able to create sockets as launchd does, a connection on those sockets would simply be an event that would cause a service to be started.

Likewise, I fully intend Upstart to take over activation of system and session bus services from D-Bus, using an event from the D-Bus daemon to start and manage the service on its behalf.

This latter example neatly illustrates how start on will be replaced.  Take a system bus service, you might declare such a service like this:

dbus system-bus org.freedesktop.UDisks
exec /usr/lib/udisks-daemon

That initial line replaces a whole slew of previous verbs.  It tells Upstart that this service should be activated from the D-Bus system bus when a message for the given name has no destination in the bus.  It also tells Upstart that this service should not be considered “ready” until it actually registers that name on the bus.

Finally it tells Upstart that the service can only be run while the D-Bus system bus service is running.  You might think this superfluous, but remember from above that an event-based init daemon can work both ways; starting this service manually as a system administrator would start the message bus for you, if it wasn’t already running.  This can be done with either an event or through the service connecting to the message bus via a known socket.

It’s this flexibility that still leaves me convinced that Upstart is a better all-round approach than the purity of launchd (or systemd).

Take another service, for example, the printing service: CUPS.  At first glance, you might believe that it can be on-demand activated when something connects to its socket.

And that would certainly appear to work, you’d click Print in an application and the printer service would be started.

But that’s not the full picture; what if there was a job in the queue from before you shut down?  You also need the service started if there are any files in the named queue directory.

And that’s still not the full picture; CUPS performs remote printer discovery, you most certainly don’t want to click Print and see no printers because CUPS hasn’t had time to discover them, having only just been started.  Users have short attention spans when made to wait, I know I certainly do.

You need a combination of different conditions to start CUPS; it should be started on demand, it should be started if there are files in the print queue, and it should be still started on boot (just low-priority once the system is idle) to discover remote printers.

A pure on-demand daemon just doesn’t cut it, you need something more flexible.

The last point about user impatience is also my other major disagreement here.  launchd supposes that you should always optimise for the minimum system footprint, at a cost to interaction performance.

It assumes that it’s ok to wait for a service to start when you click a button the first time, or bogusly that all services start immediately!

While this might be true in many situations, it’s also not true in many others.  I’ve met very few system administrators who think that their web server should only ever be started on demand, and shut down again once there are no users browsing it.

And if you’re going to do always-running services like this, you do need to be able to encode their dependencies and requirements in the init-daemon configuration, which negates the engineering precision of avoiding doing so through on-demand activation.

On systemd

I’m sure you’ve all by now read the announcement of systemd, and have probably come running to my blog to see what the reaction of Ubuntu and the Upstart author is!

As you know, improvements to the boot process have been something that Ubuntu have been working on for a few years now and this led to the development of Upstart.  We’re not the only ones working in this area, Intel have also been hard at work with different improvements of their own with the Moblin and MeeGo projects.

So it’s great to see some Fedora and OpenSuSE guys working on this too, and bringing some different ideas to the table!

I can’t say I disagree with some of Lennart’s observations about problems with Upstart, it’s certainly nowhere near perfect.  Now that the stable period leading up to the release of Ubuntu 10.04 LTS is over, I’m looking forwards to getting back into the code and trying to address them.

It’s far too early to tell which approach is going to work out better in the end; but that’s one of the great things about Linux.  The different distributions are able to develop in different directions, and we’re able to try out many different things.

On a personal note, I’m particularly pleased that Lennart has continued the punny naming scheme I began with Upstart. System D is a French concept that embraces responding to challenges when they happen, thinking fast and on your feet and adapting and improvising to get the job done.

Upstart 0.5: Relationships

Even the relatively simple System V rc scripts recognise that there are relationships between services, and that in many cases one or more others must be started before a particular service can itself be started: it allows for such relationships to be expressed by using a directory of numbered scripts that are run in series by the sysv rc script.

Tackling this problem in some way is arguably one of the main reasons that each of the alternate init daemons exists. Even launchd acknowledges the problem, even if its solution is to tell service developers that they should spin or sleep while dependencies aren’t available.

The Competition

The way in which the other leading init replacements tackle the relationship problem is through dependencies. This is not that surprising, since the concept is shared (and effectively mirrored) by both the dynamic link loader and the package manager; both things that a service maintainer knows well.

To illustrate how dependencies work, since I use that term precisely to mean only this behaviour, we’ll use one of the chains of the well known Network Manager service.

  • Network Manager depends on HAL
  • HAL depends on D-Bus

When A depends on B, B is required for A to function properly. Any attempt to start A must first start B.

This works well for the link loader, when we load an executable we also need to load and map the shared objects it links to.

It also works well for the package manager, when we install Network Manager it means we also need to install HAL and D-Bus for it to function.

However for an init daemon, it’s not normally ideal: the only reason that D-Bus and HAL will be running is because Network Manager depends on them. If we were to stop Network Manager, we would also stop HAL and D-Bus.

This obviously isn’t what we want, HAL and D-Bus are both essential services in their own right. Thus we end up with a target or goal set of services that must be started anyway, within this group the dependency relationships are only effective for ordering of them. Ironically, it is very rare indeed for a service to not be a target and so all of the complex ability of the dependency-based daemon is lost; the only reason to generate the dependency tree at runtime at all is to allow for parallel starts.

Upside Down Dependencies

Thus one of the first things that service maintainers have to get used to about Upstart is that its service relationships are upside down from the way that they might expect. Upstart assumes that if a service is installed, not disabled, and the required services, tasks or hardware are available then the service should be running.

In the dependency-based model, starting Network Manager would first start HAL which would first start D-Bus.

In the Upstart (event-based) model, D-Bus is started fulfilling HAL’s requirements so HAL is started, fulfilling Network Manager’s requirements (once a network card is available?) so Network Manager is then started.

Upstart has no notion of targets or goals, it simply ensures that all services that can and should be running are; and ensures that services are stopped when it is no longer the right time for them to be running.

Relationships through Events

The way in which relationships between services are defined is by having services react to each other’s events. To continue with our example, HAL would therefore have the following in its job definition:


start on started dbus
stop on stopping dbus

The first line means that when the dbus service is fully up and running (recall from previous posts that this event can be delayed as necessary), HAL will itself be started.

The second line is a little more interesting. Events in Upstart will block until the jobs they affect complete, and the stopping event is emitted before the dbus job is actually stopped and blocks it from doing so. Put more simply, HAL will be fully stopped before D-Bus is stopped.

Thus we have the simplest kind of Upstart relationship. Starting D-Bus will start HAL immediately afterwards, and stopping D-Bus will stop HAL first.

The portmap problem

Most maintainers at this point will be feeling quite smug and about to hit the comments button because they’ve thought of an example service that actually is a dependency, and should not be running if nothing needs it.

Remember that I said they were rare, not non-existent.

One such example is portmap, another is often something like tomcat. There are a few, but they’re certainly not the common case.

Happily one of the elegant things about Upstart’s design is that it does still support this model where it’s needed. In order for portmap to be started when we start an nfs-server, we simply write the following in portmap’s job definition:


start on starting nfs-server
stop on stopped nfs-server

Compare to the example for D-Bus/HAL and you’ll notice that it’s the events that have changed.

Remember that the starting event, like the stopping event we used in the previous example, blocks the job until jobs affected by the event are completed. Thus this first line means that when we start nfs-server, it will not be started until portmap is started.

And the second line is pretty much the mirror of the first in the previous example, once the nfs-server is stopped, we stop portmap as well since it’s no longer needed.
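
Both orderings, D-Bus/HAL and nfs-server/portmap, fall out of the same rule: an event does not complete until the jobs it starts are running. Here is a toy Python model of that rule. It is a sketch only; the JOBS table, emit and start are names invented for this example, and stopping is left out to keep it short.

```python
# Hypothetical job table: each job lists the (event, job) pairs that start it.
JOBS = {
    "dbus": [],
    "hal": [("started", "dbus")],
    "nfs-server": [],
    "portmap": [("starting", "nfs-server")],
}


def emit(event, source, running, log):
    """Deliver an event and block until every job it starts is running."""
    for job, start_on in JOBS.items():
        if (event, source) in start_on and job not in running:
            start(job, running, log)


def start(job, running, log):
    emit("starting", job, running, log)   # the job waits for this to complete
    running.add(job)
    log.append(job)
    emit("started", job, running, log)


log = []
start("nfs-server", set(), log)
print(log)   # ['portmap', 'nfs-server']

log = []
start("dbus", set(), log)
print(log)   # ['dbus', 'hal']
```

Only the event name in the rule changed between the two cases, yet portmap ends up running before nfs-server while HAL ends up running after D-Bus.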

It may seem a little odd that the rules go in portmap, and not nfs-server, but it makes logical sense. It means that for an admin to work out why portmap is getting started, they just need to read the portmap definition and not hunt around the system to see what else might be doing it.

Also in many of the cases, such requirements are actually conditional. Apache doesn’t need to require tomcat, it’s only a requirement if it’s installed. Thus it makes more sense for tomcat to add itself to Apache’s environment rather than Apache to look for tomcat.

Upstart 0.5: Events

In the previous posts, I’ve covered the various features that make Upstart a good service manager, but these are things you’ll find in most others as well. It’s now time to cover that which is singularly unique to Upstart, Events.

Start and Stop

You’ve already seen the start and stop commands, which do somewhat unsurprising things to jobs. The important thing to remember about these is that they are not events. I just wanted to clear that up before we start, since it’s often been a source of confusion not helped by the design of some earlier versions of Upstart.

start and stop operate directly on jobs, and the command will not normally return until the operation is complete or otherwise interrupted. Services are considered complete when they are running, Tasks are considered complete when they have stopped again; in both cases the stop command is complete when the service or task has actually stopped.

This is important since it provides a common-sense behaviour, ensuring that the following operation is not a race condition:


# start apache
apache running (start), process 3591
# wget http://localhost/

Solving race conditions is one key part of Upstart’s purpose.

Both commands may also set environment variables, those set by the start command form part of the environment of the job itself and those set by the stop command are available to the pre-stop script.


# cat /etc/init/jobs.d/getty
instance $TTY
env SPEED=38400
exec /sbin/getty $SPEED $TTY

# start getty TTY=tty1
getty (tty1) running (start), process 4152

Events

As described above, the start and stop commands are admin instructions that act directly on named jobs. Events have many similar properties: they carry environment variables that end up in the environment of jobs they start, and they are not complete until the jobs that they affected have been started or stopped as appropriate.

The difference is that the start and stop commands are targeted at specific jobs, whereas events have no such targeting and instead it is jobs that specify which events they are interested in.

In the Upstart world events serve three general purposes: they act as signals of state changes that jobs can react to (e.g. hardware going away), as method calls to automatically start or stop jobs (e.g. shutdown) and as a way of passing information between jobs.

Events are identified by their name and have a different namespace to that of jobs. They are emitted by a D-Bus call or by using emit on the command-line, naming the event and providing any associated environment variables you wish:


# emit interface-up IFACE=eth0 ADDRFAM=Ethernet ADDRESS=01:23:45:67:89:0a

Jobs may match them on this name and any number of their environment variables, specifying whether the event would automatically start or stop the Job.


start on interface-up IFACE=eth* ADDRFAM=Ethernet

As a short-hand, where the order of the variables for an event is fixed, the names may be omitted:


start on interface-up wlan*

When a job is started by an event, the environment for that event forms part of the environment for the job and may be used when matching events that can automatically stop the job. Harking back to our getty job from previous posts, we can bind this to the lifetime of the underlying device.


start on tty-added
stop on tty-removed TTY=$TTY

instance $TTY
exec /sbin/getty 38400 $TTY

We can also match multiple events, either requiring that both occur or either using unsurprising operators:


start on a-up and b-up
stop on a-down or b-down

In these situations, once stopped, both the a-up and b-up events must happen again for the job to be restarted.
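
One way to picture this is that the and operator remembers which events it has seen, and that memory is cleared when the job stops. A minimal Python sketch, assuming a hypothetical StartCondition class (this models the behaviour described above, not Upstart's internals):

```python
class StartCondition:
    """Tracks 'start on a-up and b-up': every listed event must be seen."""

    def __init__(self, required):
        self.required = set(required)
        self.seen = set()

    def event(self, name):
        """Record an event; return True once all required events have occurred."""
        if name in self.required:
            self.seen.add(name)
        return self.seen == self.required

    def reset(self):
        """Forget past events, as happens once the job has stopped."""
        self.seen.clear()


start_on = StartCondition(["a-up", "b-up"])
stop_on = {"a-down", "b-down"}   # 'or': any one of these is enough

print(start_on.event("a-up"))    # False, still waiting for b-up
print(start_on.event("b-up"))    # True, the job starts
print("a-down" in stop_on)       # True, the job stops
start_on.reset()
print(start_on.event("b-up"))    # False, a-up must happen again too
```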

Upstart Events

Upstart itself only emits a few events, leaving the rest up to application authors to define. The startup event is the most interesting of these, and is ultimately what nearly all jobs get chained from.

Job Events

As jobs are started and stopped, Upstart emits events on their behalf for four key points in their lifecycle.

  • starting is emitted when the job is first starting, and the job will not actually be started until this event completes.
  • started is emitted once the job is fully running.
  • stopping is emitted when the job is stopping (after the pre-stop has completed), the job will not actually be stopped until this event completes.
  • stopped is emitted once the job is fully stopped.

All of the events have the name of the job in the first variable, JOB and the instance of the job (if applicable) in the second variable, INSTANCE. The stopping and stopped events then have a series of variables indicating the reason for the job stopping: RESULT indicates whether it was a normal stop or a failure then if it failed, PROCESS will say what failed and EXIT_SIGNAL or EXIT_STATUS will contain the terminating signal or exit code.

For example, we can take action to back up a database if the server crashes:


start on stopping hersql RESULT=failed EXIT_SIGNAL=SEGV
task
exec hersql-backup

Jobs can also export variables from their own environment to others through these events by using the export stanza:


start on interface-up
stop on interface-down $IFACE

instance $IFACE
export IFACE
exec ...

Another job may then be started along with this one, and know what interface it’s bound to:


start on started JOBNAME
stop on stopping JOBNAME

instance $IFACE

We’ll look at the various powerful forms of dependency that these events allow us to express in the next post.

Upstart 0.5: Job Lifetime

Continuing the series of posts on Upstart 0.5, in this post I’ll be talking about the various ways that Upstart allows you to manage the lifetime of a job. These are guarantees that Upstart provides you so that when you start a job, you know what will happen if that job dies unexpectedly or someone else tries to start the job as well.

Respawning

We’ve all encountered those daemons that mysteriously die: sometimes they’re taken out by the OOM killer, and sometimes they’re just buggy and crash from time to time. And there’s also those processes that exit when they’re done, and need to be restarted (e.g. getty).

For all of these, Upstart provides the facility to respawn the job; effectively an automatic restart in the case of failure. Respawning is controlled by three things:

  • Whether or not to respawn
  • Whether or not the job exited “normally”
  • Whether it has been respawned too many times recently

Let’s take the sobby server as an example, here’s a job that tends to crash every now and then, and we’d like to keep it running. However, we’re also aware that every now and then, it crashes hard and needs repairing; so we limit its respawning to 10 times in 5 seconds (which happens to be the default).


  exec /usr/bin/sobby --autosave-file=/var/lib/sobby/autosave /var/lib/sobby/autosave

  respawn
  respawn limit 10 5

The daemon will be continually respawned until either the limit is reached, or the service is explicitly stopped by request. This isn’t ideal though, sobby has an exit command which we wish to honour; the daemon is well written enough that it only returns the zero exit code if this command has been run, and otherwise always returns a failure or signal of some description.

In addition, we know that the ABRT signal is raised on the daemon when the session file is corrupted (I’m making this up, btw), so we want to stop respawning in that case.

To accomplish this, we simply state which exit codes and signals are considered a normal exit condition:


  exec /usr/bin/sobby --autosave-file=/var/lib/sobby/autosave /var/lib/sobby/autosave

  respawn
  respawn limit 10 5

  normal exit 0 ABRT

Tasks can be respawned too; the only difference is that zero is always considered a normal exit condition for a task:


  task
  exec /usr/sbin/some-check $DEVICE

  respawn

This task will be continually run until it ends with a zero (success) exit code. We could add additional normal exit conditions as well, just as we can with a service.
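
The three controls listed earlier (whether to respawn, whether the exit was normal, and whether respawning is happening too often) can be combined into one decision. The Python sketch below is illustrative only: the Respawner class is hypothetical, and counting respawns in a sliding time window is an assumption about how "10 times in 5 seconds" might be measured, not a description of Upstart's code.

```python
from collections import deque


class Respawner:
    """Decides whether a job that has just exited should be restarted."""

    def __init__(self, limit=10, interval=5.0, normal_exit=(), task=False):
        self.limit = limit
        self.interval = interval
        self.normal_exit = set(normal_exit)
        if task:
            self.normal_exit.add(0)   # zero is always normal for a task
        self.recent = deque()         # timestamps of recent respawns

    def should_respawn(self, exit_reason, now):
        if exit_reason in self.normal_exit:
            return False              # normal exit: leave the job stopped
        # Assumption: a sliding window; drop respawns older than the interval.
        while self.recent and now - self.recent[0] >= self.interval:
            self.recent.popleft()
        if len(self.recent) >= self.limit:
            return False              # respawning too fast: give up
        self.recent.append(now)
        return True


sobby = Respawner(limit=10, interval=5.0, normal_exit=[0, "ABRT"])
print(sobby.should_respawn("SEGV", now=0.0))   # True
print(sobby.should_respawn(0, now=1.0))        # False
print(sobby.should_respawn("ABRT", now=2.0))   # False
```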

Singletons

All Upstart jobs are singletons by default, this means that only one instance of that job may be running at any one time. To illustrate, let’s continue using the sobby job we defined above and start it:


  # start sobby
  sobby running (start), process 14977

Ok, we have a single instance of the sobby job running, and we can interrogate the status of that:


  # status sobby
  sobby running (start), process 14977

Now what happens if we (or someone else) tries to start another copy:


  # start sobby
  start: cannot start 'sobby': Already running
  zsh: exit 1   start sobby

This is the most sensible and sane default, it saves you having to worry about locking between services and most importantly means that you can treat failures to obtain resources as true errors.

For example, if you request a D-Bus name and don’t get it, or attempt to bind to a socket and fail, you can treat that as an error since you know the service manager is already ensuring you’re a singleton. This means that you won’t silently pretend everything’s ok, and thus won’t hide problems.

Instance jobs

But what if you do want to be able to run multiple copies of the job? Upstart supports this through instance jobs, which may have multiple copies running. As well as being identified by the shared job name, each instance is also identified by a second-level instance name.

The instance name for each instance of a job must be unique within that job. Attempting to start another instance with an already used name will return an already running error again.

Thus the usual method for defining an instance name is by using variables from the job environment, which you’ll recall come from sources including the start request.

Let’s use the getty job we defined in the last post and turn that into an instance job:


  instance $TTY
  exec /sbin/getty 38400 $TTY

The instance keyword is the new addition, this defines the name for each instance of the job. Setting it to an ordinary string wouldn’t be much help, since there could only be one unique expansion, and you’d be back to a singleton job again; so we define it using variables from the job’s environment which will be expanded.

In this case, we can have an instance of the job for each unique value of the $TTY variable. This makes sense since this is also what we pass to getty. This means that Upstart is still able to provide the guarantee that another getty won’t be running with the same tty.

All that we need do is pass the value of the TTY environment variable when we start or stop the getty job:


  # start getty TTY=tty1
  getty (tty1) running (start), process 15001
  # start getty TTY=tty2
  getty (tty2) running (start), process 15006

And if we try and run another copy with the same TTY variable, we’ll still get already running:


  # start getty TTY=tty1
  start: cannot start 'getty': Already running
  zsh: exit 1   start getty TTY=tty1
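
The uniqueness guarantee can be pictured as a set of expanded instance names kept per job, with a singleton being the special case of an empty instance name. A small Python sketch follows; the Job class and AlreadyRunning exception are invented for illustration and do not reflect Upstart's real data structures.

```python
from string import Template


class AlreadyRunning(Exception):
    pass


class Job:
    def __init__(self, name, instance=""):
        self.name = name
        self.instance = Template(instance)  # e.g. "$TTY"; "" means singleton
        self.running = set()                # expanded instance names in use

    def start(self, **env):
        """Expand the instance name from the environment and claim it."""
        instance_name = self.instance.safe_substitute(env)
        if instance_name in self.running:
            raise AlreadyRunning(self.name)
        self.running.add(instance_name)
        return instance_name


getty = Job("getty", instance="$TTY")
print(getty.start(TTY="tty1"))   # tty1
print(getty.start(TTY="tty2"))   # tty2
try:
    getty.start(TTY="tty1")
except AlreadyRunning:
    print("Already running")
```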

There’s no builtin way to allow unlimited instances, since these would tend to eventually consume all available resources. Since any service or task needs to operate on something, or even just write something, then you’ll need some kind of locking and something in the job environment to tell it what to work on or write. If someone manages to come up with a truly unlimited instance job, you could do it trivially by passing a UUID=$(uuidgen) variable and instancing on that.

In the next post, I’ll cover one of the major differences between Upstart and other service managers: events!