Next month I am hoping to release Upstart 0.5.0, the culmination of almost a year’s worth of work. Comparatively, the version that shipped in edgy (0.2.x) was simply an essay to figure out the basics, and the version in feisty through hardy (0.3.x) was a first draft. The new version has been stripped back to the very basics and rebuilt to correct the problems we found with the earlier versions, and to make sure it can handle real-world uses as simply and elegantly as possible.
Over the next few weeks, I’ll be writing about the new version; both how it has improved from previous versions and how it compares to what else is out there.
First we’ll look at how Upstart lets you manage the lifecycle of services and tasks (collectively, jobs). We’ll use the D-Bus daemon as an example service, simply because it’s a modern, well-behaved service that we’re all familiar with.
With SystemV RC, we would have had a single /etc/init.d/dbus file accepting both start and stop as arguments. It may have looked something like this:
```shell
case "$1" in
  start)
    start-stop-daemon --start --pidfile /var/run/dbus.pid --exec /usr/sbin/dbus-daemon
    ;;
  stop)
    start-stop-daemon --stop --pidfile /var/run/dbus.pid
    ;;
esac
```
As you’re well aware, the simple act of starting a daemon and stopping it again is not so simple this way. You nearly always end up needing a helper such as start-stop-daemon, and you rely on accurate PID files and the like.
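To see why accurate PID files matter, here is a rough shell sketch of what stopping a daemon via its PID file boils down to. The function name stop_by_pidfile is hypothetical, and this is not how start-stop-daemon is implemented. It only illustrates how much trust is placed in the file: kill -0 proves that *some* process with that PID exists, not that it is the daemon.

```shell
#!/bin/sh
# Hypothetical illustration of stopping a daemon via its PID file.
# Everything hinges on the file's contents being accurate.
stop_by_pidfile() {
    pidfile=$1
    [ -r "$pidfile" ] || { echo "no pidfile"; return 1; }
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that a process with this
    # PID exists and may be signalled. If the daemon died and its PID was
    # reused, this check would happily pass for the wrong process.
    if kill -0 "$pid" 2>/dev/null; then
        kill -TERM "$pid"
        echo "sent TERM to $pid"
    else
        echo "stale pidfile"
        return 1
    fi
}
```

start-stop-daemon can narrow this down further (for example, by matching the executable given with --exec), but it still depends on the PID file being accurate.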
Upstart, like just about every other modern service manager (but, strangely, not SMF), takes care of all of this hard work for you. Instead of defining how to start and stop a service, you just define what to start. In Upstart, the same service is defined by a single line: exec /usr/sbin/dbus-daemon
Setup and teardown
Of course, we all know that no service definition is ever that simple. I massively simplified the SystemV example for illustration. In reality, we frequently need to do various things to set up the system for the daemon, and to clean up again afterwards. The start branch of a real script probably looks more like this (and even now, I’m simplifying for space):
```shell
mkdir -p /var/run/dbus
chown messagebus:messagebus /var/run/dbus
/usr/bin/dbus-uuidgen --ensure
start-stop-daemon --start --pidfile /var/run/dbus.pid --exec /usr/sbin/dbus-daemon
```
We need a directory for socket files and the like, and we need to create the machine ID if it is missing. Likewise, to shut the daemon down, we need to clean up:
```shell
start-stop-daemon --stop --pidfile /var/run/dbus.pid
rm -rf /var/run/dbus
```
And this is where most init replacements fall down (especially launchd). Ironically, you’ll often find their developers using these minimal service definitions when they talk about how fast their system can boot. You can boot really fast if you don’t start anything properly.
Obviously I wouldn’t be pointing this out if Upstart didn’t allow you to do this properly, so let’s extend our minimal service definition to include the necessary setup and teardown code.
```
pre-start script
    mkdir -p /var/run/dbus
    chown messagebus:messagebus /var/run/dbus
    /usr/bin/dbus-uuidgen --ensure
end script

exec /usr/sbin/dbus-daemon

post-stop script
    rm -rf /var/run/dbus
end script
```
Previously we defined only one process in the job’s lifecycle, known as the main process. Our new definition adds two more: the pre-start and post-stop processes. We’ve chosen to define them as shell scripts embedded in the definition, but we could have defined them as binaries to execute if we preferred (using pre-start exec), and we could have defined the main process as a script (using a script block, closed by end script, instead of exec).
As their names suggest, these processes are run before the main process is started and after it has been stopped, respectively. In fact, Upstart guarantees more than that:
- Every time the job is started, the post-stop process will be run.
- Every time the main process is run, the pre-start process will have completed successfully first.
It might seem a little strange that the post-stop process will always run while the pre-start process doesn’t have as strong a guarantee. This is because it’s possible for the job to be stopped immediately after it is started. Should that happen, Upstart will not run the main process, since there’s no need, and therefore will not run the pre-start process either; however, to ensure the system is clean, it always runs the post-stop process.
These guarantees also provide sane restart behaviour. If you restart a job, the main process is killed, the post-stop process is run, and then the pre-start process is run again before the main process. If you cancel a restart (by stopping the job again) after the post-stop process has been run, the post-stop process will always be run again.
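The two guarantees can be expressed as a tiny shell model. This is a hypothetical sketch, not Upstart’s implementation: run_job, pre_start, main_process and post_stop are illustrative names, with the three hook functions standing in for the job’s processes. It captures only the ordering; it does not model a stop request arriving part-way through.

```shell
#!/bin/sh
# Hypothetical model of two lifecycle guarantees (not Upstart code):
#  - main_process only runs after pre_start has succeeded;
#  - post_stop runs every time, however far the job got.
# The caller defines pre_start, main_process and post_stop as functions.
run_job() {
    status=0
    if pre_start; then
        # pre-start completed successfully, so the main process may run
        main_process || status=$?
    else
        # pre-start failed: main is never run, but post-stop still is
        status=$?
    fi
    post_stop
    return "$status"
}
```

A restart in this model is simply one run_job finishing (ending in post_stop) before the next begins with pre_start.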
Spawned, Running and Killed
Upstart makes important distinctions in the state of the main process: it does not assume that, just because the exec() syscall has succeeded, the process is in a suitable running state. Likewise, it does not assume that, just because the kill() syscall has succeeded, the process is no longer running.
The latter is easy to understand: delivering the TERM signal to a running process normally just invokes its own termination handler, which may perform any number of activities before cleanly shutting down. Upstart waits for the actual child signal signifying termination before running the post-stop process; until that point the process is considered merely “killed”. If the process stays in the “killed” state for too long, Upstart delivers the much more hardcore KILL signal, but that timeout is adjustable.
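The TERM, “killed”, KILL progression can be sketched in shell for a child of the current shell. The helper name stop_with_timeout and its $how variable are hypothetical. Where Upstart waits for the child signal, this sketch relies on wait, which returns only once the child has actually terminated; a background watchdog sends KILL if the timeout expires first.

```shell
#!/bin/sh
# Hypothetical sketch: TERM a child of this shell, treat it as "killed"
# until it really exits, and escalate to KILL after $2 seconds.
# Sets $how to TERM or KILL depending on which signal finished it.
stop_with_timeout() {
    pid=$1
    timeout=$2
    how=
    kill -TERM "$pid" 2>/dev/null || return 0   # nothing left to stop
    # Watchdog: if the child outlives the timeout, send the harder KILL.
    ( sleep "$timeout"; kill -KILL "$pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog=$!
    # wait returns only once the child has actually terminated, which
    # plays the role of the child signal Upstart waits for.
    status=0
    wait "$pid" || status=$?
    kill "$watchdog" 2>/dev/null || true        # no longer needed
    # A shell reports death by signal N as status 128+N; KILL is 9.
    if [ "$status" -eq 137 ]; then how=KILL; else how=TERM; fi
}
```

The helper must run in the shell that started the child, not inside a $(…) subshell, because wait only works on the caller’s own children. A process that ignores TERM is only removed by the watchdog’s KILL.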
The former is harder to understand, since the new binary is in memory and is probably at least initialising, but that’s the point: it isn’t yet ready for other jobs to use. In the SystemV script this wasn’t an issue, since we could generally rely on daemons (well-behaved ones, anyway) to follow the convention that they should not fork() until initialisation has completed successfully.
Since Upstart forks and supervises its own processes, it generally prefers that daemons do not fork() and instead remain as the PID they were given when started. So how do jobs signify that they are ready? There are a few ways:
- By forking as before. As I’ve talked about before, Upstart can supervise processes that fork, and it will wait for the fork to happen before assuming the process is ready.
- By raising the STOP signal. Jobs marked with expect stop will wait for this and, once it is received, will send the process the CONT signal and assume that it is now ready.
- By registering a D-Bus name. An early 0.5.x release will wait for a particular D-Bus name to be registered, and not assume that the job is ready until it has done so.
- By calling listen(). Again planned for an early 0.5.x release, Upstart will use the same mechanism it uses to follow forks to watch for the listen() call.
- With a post-start script, more on that in a second.
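From the supervisor’s side, the STOP handshake amounts to: wait until the process has stopped itself, then continue it and treat it as ready. The sketch below is hypothetical and Linux-specific, since it reads the state letter from /proc/PID/stat (T means stopped by a signal). A real supervisor would more likely be told about the stop by the kernel (for example via waitpid() with WUNTRACED) than poll for it.

```shell
#!/bin/sh
# Hypothetical supervisor side of an "expect stop" style handshake.
# Linux-only: the process state is read from /proc/PID/stat.
# Polls up to $2 times, one second apart; returns 0 once the process
# has stopped itself and been sent CONT, 1 otherwise.
wait_for_stop_then_cont() {
    pid=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        # The process vanished before signalling readiness: give up.
        stat=$(cat "/proc/$pid/stat" 2>/dev/null) || return 1
        # Drop "pid (comm) " so spaces in comm cannot shift the fields;
        # the next word is the state letter, and T means stopped.
        state=${stat##*") "}
        state=${state%% *}
        if [ "$state" = T ]; then
            kill -CONT "$pid"   # wake it up; the job now counts as ready
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

On the daemon side, opting in is just the equivalent of kill -STOP $$ in a script (or raise(SIGSTOP) in C) once initialisation has finished.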
The last two processes
I’ve introduced the three processes that most jobs will tend to use, but there are two more which are used less often but are probably the most powerful of them all. These are the post-start and pre-stop processes, and they’re interesting because they run while the main process is running.
The post-start process, as its name suggests, is run after the main process has been spawned and whatever readiness signal we were expecting (see above) has arrived. The job will not be considered ready until the post-start process completes, so a common use for it is to interrogate the daemon, or to send it commands it can only act on once it’s running.
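A post-start process often simply blocks until the daemon answers. Here is a hypothetical helper for that, wait_until_ready, which repeats a probe command once a second until it succeeds or the attempts run out. The choice of probe is an assumption left to the job; for D-Bus, something like test -S /var/run/dbus/system_bus_socket could serve as a crude check that the system bus socket exists.

```shell
#!/bin/sh
# Hypothetical post-start helper: run a probe command (the remaining
# arguments) once a second until it succeeds, giving up after $1 tries.
# Returns 0 as soon as the probe succeeds, 1 if it never does.
wait_until_ready() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0            # the daemon answered: ready
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1                    # never answered within the allowed tries
}
```

Because the job is not ready until post-start exits, a failing probe here holds back anything waiting on the job, which is exactly the point.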
The pre-stop process is run when a request to stop the job is made (so it is not run if the main process terminates on its own), and the main process is not killed until the pre-stop process finishes. It receives information about the request, and can cause that request to be ignored, leaving the job running. Another common use is to send the daemon commands before it receives the TERM signal.
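The pre-stop veto can be modelled in the same spirit. This is explicitly a hypothetical convention, not Upstart’s interface: the supervisor passes the reason for the request to a pre_stop function and, if that function fails, ignores the request; otherwise it goes on to send TERM. The names request_stop and pre_stop, and the reason strings, are illustrative.

```shell
#!/bin/sh
# Hypothetical model of a pre-stop veto (NOT Upstart's real interface):
# the caller defines pre_stop, which receives the reason for the stop
# request; if it fails, the request is ignored and the process left alone.
request_stop() {
    reason=$1
    pid=$2
    if ! pre_stop "$reason"; then
        echo "stop ignored"
        return 0
    fi
    # pre-stop has finished (and could have sent the daemon commands),
    # so only now is the main process sent TERM.
    kill -TERM "$pid"
    echo "stopping"
}
```

The ordering matters more than the veto convention: nothing is killed until pre-stop has returned.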
So that’s a look at the ways we can define the lifecycle of an Upstart job. In the next couple of posts we’ll look at the environment and session of jobs, and then at matters such as respawning and singletons.