For the last couple of months, both at the Ubuntu Developer Summit in Mountain View and on the #upstart IRC channel, we’ve been discussing the changes we want to make to upstart for the Feisty Fawn release of Ubuntu.
Feisty will ship with a version of upstart based on the 0.3 series (it may end up getting called 0.5 before release); the primary goal for this series is an init system that is suitable for general standalone use in any Linux distribution.
I’ll be giving a talk at linux.conf.au 2007 in Sydney with that aim; I hope to persuade at least one other major Linux distribution that it’s the right solution.
A complete list of the specifications and bugs being targeted for the 0.3 release can be found in Launchpad.
The rest of this post will introduce some of the shiniest new things.
Upstart takes care of starting, supervising and stopping daemons itself, unlike the init script system, where you have to write code to do that yourself, often using a helper like start-stop-daemon. All you need to do is give the path to, and arguments for, the binary you wish to start in the job’s exec stanza.
Some jobs, especially quick tasks, will usually be written as shell scripts. To save having to write a separate file and invoke it, you can include shell script code directly in the job file, using a script stanza instead of the exec stanza:
    script
        echo /usr/share/apport/apport > /proc/sys/kernel/crashdump-helper
    end script
Usually it’s not sufficient to just start a binary and wish it well; you frequently need something to be run before it is started to prepare the system, and sometimes something after it terminates to clean up again.
For these purposes, additional snippets of shell code can be given, to be run before the binary is started and after it has finished. Unlike init scripts, these do not need to start or stop the daemon itself; that’s done automatically based on the exec or script stanza:
    pre-start script
        mkdir -p /var/run/dbus
        chown messagebus:messagebus /var/run/dbus
    end script
    post-stop script
        rm -f /var/run/dbus/pid
    end script
For consistency, executables may be specified with pre-start exec and post-stop exec instead of shell scripts as above.
It’s sometimes useful to be able to run something after the binary has been started; for example, you may wish to attempt to connect to the daemon to determine whether it is ready to serve requests. A post-start script or post-start exec can be used for this:
    post-start script
        # wait for the daemon to listen on port 80
        while ! nc -q0 localhost 80 </dev/null >/dev/null 2>&1; do
            sleep 1
        done
    end script
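The loop above will wait forever if the daemon never starts listening. Below is a minimal sketch of a bounded variant. It is written in plain POSIX sh so that it could sit inside a post-start script block. The wait_until helper, its name and its arguments are hypothetical and are not provided by upstart. Each script block presumably runs in its own shell, so the function would need to be defined in the block that uses it.

```sh
# wait_until TRIES COMMAND [ARGS...]
# Hypothetical helper (not part of upstart): run COMMAND once a second
# until it succeeds, giving up after TRIES attempts.
# Returns 0 as soon as COMMAND succeeds, 1 if it never did.
wait_until() {
    tries_left=$1
    shift
    while [ "$tries_left" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries_left=$((tries_left - 1))
        # no point sleeping after the final failed attempt
        if [ "$tries_left" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}
```

In the job file it would replace the while loop, as wait_until 30 nc -q0 localhost 80 </dev/null. The redirection applies to the whole call, so nc still reads from /dev/null. What upstart does when a post-start script exits non-zero after a timeout is not something this sketch assumes anything about.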
It’s also useful to be able to warn a daemon that it is about to be stopped, or to delay the stop for a while. A pre-stop script or pre-stop exec can be used for this:
    pre-stop script
        # disable the queue, wait for it to become empty
        fooctl disable
        while fooq >/dev/null; do
            sleep 1
        done
    end script
Events are now quite a bit more detailed than in previous versions. They’re still named with simple strings chosen by whatever sends the event, but they can now include arguments and environment variables, which are passed through to any jobs started or stopped as a result. For example:
    initctl emit network-interface-up eth0 -DIFADDR=00:11:D8:98:1B:37
This command will now output all of the effects of this event, and will not terminate until the event has been fully handled inside upstart.
Events such as the above can be used by jobs that examine the event arguments and environment within their script:
    start on network-interface-up
    script
        [ "$1" = lo ] && exit 0
        grep -q "$IFADDR" /etc/network/blacklist && exit 0
        # etc.
    end script
Or they can be matched directly in the start on and stop on stanzas:
    start on block-device-added sda*
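Both kinds of filtering are ordinary shell and glob logic, so they can be tried out by hand before going into a job file. The sketch below rests on two assumptions. First, the event argument arrives as $1 and the -D variable as an environment variable, as described above. Second, a pattern such as sda* in a start on line behaves like a shell glob, which is what the example suggests. The function names should_configure and arg_matches are hypothetical, and the blacklist path is passed in rather than hard-coded.

```sh
# should_configure IFACE IFADDR BLACKLIST
# Hypothetical helper mirroring the job script above: returns 1 (skip)
# for the loopback interface or for a hardware address listed in the
# BLACKLIST file, and 0 (configure) otherwise.
should_configure() {
    iface=$1
    ifaddr=$2
    blacklist=$3
    if [ "$iface" = lo ]; then
        return 1
    fi
    # -F: match the address as a fixed string, not a regular expression
    if [ -r "$blacklist" ] && grep -qF "$ifaddr" "$blacklist"; then
        return 1
    fi
    return 0
}

# arg_matches PATTERN VALUE
# Hypothetical helper: shell-style glob match, as performed by case.
# Assumed (not confirmed) to approximate how an argument like sda* matches.
arg_matches() {
    case $2 in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}
```

Inside the job script the first helper would be called as should_configure "$1" "$IFADDR" /etc/network/blacklist || exit 0. With the second, arg_matches 'sda*' sda1 succeeds while arg_matches 'sda*' sdb1 does not.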
The events generated by job state changes have also changed. Previously jobs and events shared the same namespace, which not only caused confusion but also led to real problems when a job was accidentally given the same name as an event.
The two primary events generated are now simply called started and stopped; they tell you that a job is fully up and running, or fully shut down again. The name of the job is passed as an argument to the event:
    start on started dbus
The started event is not emitted until the post-start task (described above) has finished, so a post-start task can hold back other jobs that would otherwise start before they are able to connect to the daemon.
Likewise, the stopped event is not emitted until after the post-stop task has finished.
The other two events emitted by a job, starting and stopping, are special: the job is not permitted to start or stop until the event has been handled.
This means that if you have a task to perform when your database server is being stopped, but before it’s actually terminated, the job is as simple as:
    start on stopping mysql
    exec /usr/bin/backup-db.py
MySQL won’t be terminated until the backup has finished.
This is especially useful for daemons that depend on each other. For example, HAL needs D-Bus: it shouldn’t be started until D-Bus is running, and D-Bus should not be stopped until HAL has been terminated. All the HAL job needs is:
    start on started dbus
    stop on stopping dbus
Likewise, if Tomcat is installed, Apache should not be started until Tomcat is running, and Tomcat should not be stopped until Apache has been terminated. All the tomcat job needs is:
    start on starting apache
    stop on stopped apache
Nothing goes smoothly all of the time: sometimes the tasks a job runs will fail, or the daemon itself will die. As well as being able to restart a crashed daemon automatically, upstart ensures that other jobs are notified, with a special failed argument to the stopped event:
    start on stopped typo failed
    script
        echo "typo failed again" | mail -s "typo failed" root
    end script
And if any job started or stopped by an event fails, it’s possible to discover that the event itself failed:
    start on network-interface-up/failed
Tasks such as configuring a network interface, or checking and mounting a block device, are usually performed as a result of events; services are more complicated.
Services normally need to be running while the system is in a certain state, not just when a particular event occurs. Therefore upstart allows you to describe arbitrarily complex system states by referring to events that define their changes.
For example, many services should be running only while the filesystem is mounted and at least one network device is up. We have events to indicate the changes into and out of these states; we just need to combine them:
    from fhs-filesystem-mounted until fhs-filesystem-unmounted and from network-up until network-down
The until operator defines a period between two events, and the and operator ensures we’re within both of these periods.
Perhaps we need to be running while any display manager is:
    from started gdm until stopping gdm or started kdm until stopping kdm
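The and and or operators in these two examples can be modelled with one flag per period. The from event sets the flag, the until event clears it, and the job should be running whenever the combined condition holds. What follows is a toy model in POSIX sh for experimenting with event orderings. It is an illustration of that reading, not upstart’s implementation, and model_periods is a hypothetical name. Events that carry an argument, such as started gdm, are passed as single quoted strings.

```sh
# model_periods OP A_FROM A_UNTIL B_FROM B_UNTIL EVENT...
# Toy model (not upstart's implementation) of two "from X until Y"
# periods joined by OP, which is either "and" or "or". Each period is a
# flag set by its "from" event and cleared by its "until" event; after
# every event the combined state of the job is printed.
model_periods() {
    op=$1 a_from=$2 a_until=$3 b_from=$4 b_until=$5
    shift 5
    a=0
    b=0
    for event in "$@"; do
        case $event in
            "$a_from")  a=1 ;;
            "$a_until") a=0 ;;
        esac
        case $event in
            "$b_from")  b=1 ;;
            "$b_until") b=0 ;;
        esac
        if [ "$op" = and ]; then
            if [ "$a" -eq 1 ] && [ "$b" -eq 1 ]; then
                state=running
            else
                state=stopped
            fi
        else
            if [ "$a" -eq 1 ] || [ "$b" -eq 1 ]; then
                state=running
            else
                state=stopped
            fi
        fi
        echo "$event: $state"
    done
}
```

Feeding the filesystem/network expression the events network-up, fhs-filesystem-mounted, network-down reports stopped, running, stopped: the job only runs while both periods are open. Feeding the display manager expression started gdm, started kdm, stopping gdm, stopping kdm reports running after each of the first three events and stopped only after the last, because or needs just one display manager to be up.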
Or maybe we only want to be run if a network interface comes up before bind9 has been started:
    on network-interface-up and from startup until started bind9
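Unlike from ... until, on names a single instant rather than a period. Read that way, this example triggers the job each time network-interface-up arrives while the period from startup until started bind9 is open. Below is a toy model of that reading in POSIX sh. It is an illustration only, not upstart’s code, and model_on_within is a hypothetical name.

```sh
# model_on_within EVENT...
# Toy model (not upstart's implementation) of
#   on network-interface-up and from startup until started bind9
# "on" is an instant: it only counts if the period is open when it occurs.
model_on_within() {
    period_open=0
    for event in "$@"; do
        case $event in
            startup)
                period_open=1
                ;;
            "started bind9")
                period_open=0
                ;;
            network-interface-up)
                if [ "$period_open" -eq 1 ]; then
                    echo "$event: job triggered"
                else
                    echo "$event: ignored"
                fi
                ;;
        esac
    done
}
```

With the events startup, network-interface-up, started bind9, network-interface-up, only the first interface coming up triggers the job; the second arrives after bind9 has started and is ignored.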
These “complex event configurations” can appear in any job file, and any job file can itself serve as a reference for other jobs, which will then be started and stopped at the same time as the named job.
Omitting the exec or script stanza from a job file means that it simply defines a state that can serve as a reference for others. As such, the multiuser state is simply a job file that defines it.
As an added bonus, these states can still have pre-start, post-stop and similar scripts.