With the recent announcement of systemd, I’ve noticed some increased confusion around Upstart and what it means to be an event-based init daemon. Now seems as good a time as any to try and clear that up by describing what I mean by that.
Dependency-based init
Before Upstart came along, the state of the art in init daemon replacements was the dependency-based init daemon. The two best known at the time were the Service Management Facility (SMF) of Solaris, and initng on Linux.
The easiest way to understand how a dependency-based init daemon works is to look at another dependency-based system you’re probably more familiar with: the package manager of your Linux distribution.
When you want to install a package, for example the Apache Web Server, you tell the package manager to do that. The Apache package will list additional dependencies that it requires to be installed, and those in turn will list additional dependencies, and so on. The package manager will walk this dependency tree, eliminating those that you already have installed, and it will then flatten the remaining tree to get an order in which those remaining can be safely installed.
To put it simply: you say that you want Apache installed, but you may get more than that installed to ensure that Apache works.
A dependency-based init daemon works in fundamentally the same way. When you say that you want Apache started, it looks at the configuration for that service for the list of dependency services, and builds up a similar tree. Eliminating those already running, and flattening the tree, gives you a list of services that must be started in an order that they should be safe to start in.
You say you want Apache running, but you may get more than Apache running as a result.
Booting a system with a dependency-based init daemon, however, is a little strange. The daemon needs to know the target set of services that must be running, otherwise it would start nothing. SMF simply started all services that were not in manual start mode; initng had the concept of goal services, whose dependencies were the services that should be running, and used these to define the runlevels.
Once you have that list of goal services, you work out the dependency trees, and flatten them as normal – and thus you get an order that all services on the system should be started in.
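The walk-and-flatten step described above can be sketched as a depth-first traversal. This is only an illustration of the general idea, not the algorithm SMF or initng actually uses; the service names and the DEPENDS table are made up for the example.

```python
# Hypothetical service table: service name -> names of services it depends on.
DEPENDS = {
    "apache": ["network", "syslog"],
    "network": ["udev"],
    "syslog": [],
    "udev": [],
}


def start_order(goals, depends, running=()):
    """Return the services to start, dependencies first, skipping running ones."""
    order = []
    done = set(running)   # already running, so nothing to start
    visiting = set()      # services on the current path, for cycle detection

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError("dependency cycle at " + name)
        visiting.add(name)
        for dep in depends.get(name, []):
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    for goal in goals:
        visit(goal)
    return order


if __name__ == "__main__":
    # udev is already running, so it is eliminated from the result.
    print(start_order(["apache"], DEPENDS, running={"udev"}))
    # prints ['network', 'syslog', 'apache']
```

Asking for one goal service yields every service it needs, in an order that is safe to start them in, which is exactly the "you may get more than Apache" behaviour.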
Dependency-based init daemons work, but I believed there was a better way to do things. I invented the event-based init daemon instead.
Event-based init
An event-based init daemon isn’t really a great leap from a dependency-based init daemon; it simply does everything backwards. A simplistic view says that instead of starting Apache’s dependencies because Apache is started, it starts Apache because its dependencies are now running.
But it’s much more interesting than that, and much more flexible. Most people don’t get the epiphany.
A better description might be that services are started and stopped due to external influences on them. Those external influences can be anything, for example: hardware coming and going; changes in the time; and not least, other services.
The events represent changes in the system state, and services define the states in which they can be running, and the system reacts accordingly.
I’m still convinced this is the best way to work, not least because you can implement a dependency-based system with an event-based init daemon. Starting a service causes an event for each of its dependencies declaring a need for them, and the service waits for those events to complete; those events cause the dependencies to be started.
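To make the "backwards" direction concrete, here is a toy model of services reacting to events. It is a sketch under simplifying assumptions, not how Upstart is implemented: the EventInit class, the event names, and the idea of remembering every event in a set are all invented for the illustration.

```python
class EventInit:
    """Toy event-based init: a job starts once every event it waits for has occurred."""

    def __init__(self):
        self.jobs = {}        # job name -> set of event names it waits for
        self.seen = set()     # events emitted so far (a simplification)
        self.running = []     # jobs in the order they were started

    def add_job(self, name, start_on):
        self.jobs[name] = set(start_on)

    def emit(self, event):
        self.seen.add(event)
        for name, wanted in self.jobs.items():
            if name not in self.running and wanted <= self.seen:
                self.running.append(name)
                # A started job is itself a change of state, hence an event.
                self.emit("started " + name)


if __name__ == "__main__":
    init = EventInit()
    init.add_job("network", ["net-device-added"])
    init.add_job("apache", ["started network", "filesystem-mounted"])
    init.emit("filesystem-mounted")   # nothing can start yet
    init.emit("net-device-added")     # network starts, which in turn starts apache
    print(init.running)               # prints ['network', 'apache']
```

Nobody asked for apache here; it started because the state of the system changed to one in which it could run. Note that the events need not come from other services: the hardware event and the filesystem event are handled in exactly the same way.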
launchd
The other well-known init daemon out there is Apple’s launchd, of which Lennart’s recent systemd project is a similar implementation in some ways but not in others.
launchd’s modus operandi is that it starts services on demand, and it does this on the assumption that all services communicate through sockets or through the Mach IPC model. For the socket-based services, launchd itself creates the listening sockets, and when it receives a connection it starts the service and hands off the listening socket to it.
This has a beautiful engineering elegance, and it’s easy to see why it appeals to us.
You don’t need to configure a service’s dependencies or requirements in the init daemon; instead the service causes its dependencies to be started through this on-demand activation. If the dependency isn’t ready to be started, the service simply blocks in the connect or open syscall until it is ready.
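The socket hand-off can be sketched in a few lines. This is a single-process illustration of the idea only, not launchd's or systemd's actual mechanism: the function names are invented, the "service" is a plain function rather than a forked and exec'd program, and a loopback TCP socket stands in for whatever socket the real service would use.

```python
import select
import socket


def make_listener():
    """The init daemon, not the service, creates the listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
    listener.listen(5)
    return listener


def activate_on_demand(listener, service, timeout=5.0):
    """Wait for a first connection, then hand the listening socket to service."""
    readable, _, _ = select.select([listener], [], [], timeout)
    if not readable:
        return False          # nobody connected, so the service never starts
    service(listener)         # a real init would fork and exec, passing the fd
    return True


def echo_service(listener):
    """Hypothetical service: accepts the pending connection itself."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(b"hello from the service\n")


if __name__ == "__main__":
    listener = make_listener()
    # The client connects before any service exists; the kernel queues it.
    client = socket.create_connection(listener.getsockname())
    activate_on_demand(listener, echo_service)
    print(client.recv(1024))
    client.close()
    listener.close()
```

The client's connection succeeds before the service has started, because the kernel queues it on the socket the init daemon already holds. That queueing is what lets a client simply wait for its dependency without the init daemon knowing about the relationship.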
As launchd has matured, Apple have added support to watch for files on the disk and for cron-like schedule events. In many ways, this makes launchd kinda like an event-based init daemon, except with listening sockets.
systemd takes a similar approach with regard to the listening sockets, though my understanding so far is that it combines it with a dependency-based resolution procedure for other parts of the system, rather than an event-based one. I’m willing to be corrected on this though.
Upstart
Upstart is an event-based init daemon; it’s taken a little while to develop because it’s the first pure example of its kind, and I only replaced the working sysvinit cautiously. I basically had to prove to myself, and others, that an event-based init daemon can really work. That’s why Ubuntu 9.10 and 10.04 were the first versions to really start taking advantage of it.
I also wanted to keep it relatively stable to encourage adoption by other distributions, and I believe this has also paid off given that Fedora, Red Hat and openSUSE have all adopted it now.
I’ve proven it works, and it’s been adopted, now the fun development can begin!
Two of the main complaints about Upstart are that the start on and stop on mechanism to define services is complicated and exposes far too much of the event model, and that it’s not very well documented. Ironically, these two complaints are entirely related.
The start on/stop on mechanism is basically just a debug interface; it allowed me during early development to access the raw event queue and find out what types of service model we really needed. Since it’s a debug interface, it wasn’t documented; I knew that future versions of Upstart would have a much better model.
So to correct a common misconception, the hideous start on lines are not a side-effect of event-based init daemons; they’re a side-effect of developing an event-based init daemon in a release-early, open-source way.
I’ve also mentioned that events can be just about anything, not just directly from other services. This includes on-demand activation; I don’t see any reason why Upstart should not be able to create sockets as launchd does, a connection on those sockets would simply be an event that would cause a service to be started.
Likewise, I fully intend Upstart to take over activation of system and session bus services from D-Bus, using an event from the D-Bus daemon to start and manage the service on its behalf.
This latter example neatly illustrates how start on will be replaced. Take a system bus service, you might declare such a service like this:
dbus system-bus org.freedesktop.UDisks
exec /usr/lib/udisks-daemon
That initial line replaces a whole slew of previous verbs. It tells Upstart that this service should be activated from the D-Bus system bus when a message for the given name has no destination in the bus. It also tells Upstart that this service should not be considered “ready” until it actually registers that name on the bus.
Finally it tells Upstart that the service can only be run while the D-Bus system bus service is running. You might think this superfluous, but remember from above that an event-based init daemon can work both ways; starting this service manually as a system administrator would start the message bus for you, if it wasn’t already running. This can be done with either an event or through the service connecting to the message bus via a known socket.
It’s this flexibility that still leaves me convinced that Upstart is a better all-round approach than the purity of launchd (or systemd).
Take another service, for example, the printing service: CUPS. At first glance, you might believe that it can be on-demand activated when something connects to its socket.
And that would certainly appear to work, you’d click Print in an application and the printer service would be started.
But that’s not the full picture; what if there was a job in the queue from before you shut down? You also need the service started if there are any files in the named queue directory.
And that’s still not the full picture; CUPS performs remote printer discovery, and you most certainly don’t want to click Print and see no printers because CUPS hasn’t had time to discover them, having only just been started. Users have short attention spans; I know I certainly do.
You need a combination of different conditions to start CUPS; it should be started on demand, it should be started if there are files in the print queue, and it should be still started on boot (just low-priority once the system is idle) to discover remote printers.
A pure on-demand daemon just doesn’t cut it, you need something more flexible.
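The three conditions for CUPS can be written down as a single decision. This is a rough sketch with invented names; the function, its parameters and the reason strings are assumptions for illustration and do not correspond to any real Upstart or CUPS interface.

```python
import os
import tempfile


def should_start_printing(connection_pending, queue_dir, system_idle):
    """Return the reasons (possibly none) to start the print service now."""
    reasons = []
    if connection_pending:
        reasons.append("on-demand")          # something connected to the socket
    if os.path.isdir(queue_dir) and os.listdir(queue_dir):
        reasons.append("queued-jobs")        # jobs left over from before shutdown
    if system_idle:
        reasons.append("idle-after-boot")    # time to discover remote printers
    return reasons


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as queue_dir:
        print(should_start_printing(False, queue_dir, False))   # prints []
        open(os.path.join(queue_dir, "job-1"), "w").close()
        print(should_start_printing(False, queue_dir, False))   # prints ['queued-jobs']
```

Any one reason is enough to start the service. A socket connection is just one of several events that can trigger it, which is the flexibility a purely on-demand model lacks.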
The last point about user impatience is also my other major disagreement here. launchd supposes that you should always optimise for the minimum system footprint, at a cost to interaction performance.
It assumes that it’s ok to wait for a service to start when you click a button the first time, or bogusly that all services start immediately!
While this might be true in many situations, it’s also not true in many others. I’ve met very few system administrators who think that their web server should only ever be started on demand, and shut down again once there are no users browsing it.
And if you’re going to do always-running services like this, you do need to be able to encode their dependencies and requirements in the init-daemon configuration, which negates the engineering precision of avoiding doing so through on-demand activation.
