Debug Systemd Service Units
Lastmod: 2023-01-26

Overview

When creating a new service file (or Unit file as they are officially called) it can be very confusing to understand what it’s actually attempting to run and what environment it’s running within. Your command works fine when you run it in the terminal but when systemd tries the command fails. So what is systemd doing differently?

Check RunBook Match

When you run sudo servicectl status <servicename>.service you see an error that looks like this:

     Loaded: loaded (/etc/systemd/system/ .service; disabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since #; # ago
    Process: # ExecStart=# (code=exited, status=127)
   Main PID: # (code=exited, status=127)

# systemd[1]: : Scheduled restart job, restart counter is at 5.
# systemd[1]: Stopped Test Server.
# systemd[1]: test.service: Start request repeated too quickly.
# systemd[1]: test.service: Failed with result 'exit-code'.
# systemd[1]: Failed to start Test Server.

Initial Steps Overview

  1. Check that your Unit Environment variables are correct

  2. Check journalctl for errors

  3. Check the full output of status

  4. Try running the command manually in a similar setup

Detailed Steps

1) Unit Environment variables need to be set

systemd will not inherit the PATH or any other environment variables of the User you have specified in the unit file.

In an environment where you know your command works try running env and then search through the output for any variables that your command might need. Unless you have specified these in your Unit file, they will not be set. This includes the PATH which is used to determine how to find your service.

1.1) PATH needs to be set

Run this: env | grep -i ^path

Copy the line you found to your service Unit file, prefixing it with the word Environment=

It might look something like this:

Environment=PATH=/home/username/.asdf/shims:/home/username/.asdf/bin:/home/username/bin:/home/username/go/bin:/home/username/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin

Then reload your service and see if it’s working:

service_name='test.service'
sudo systemctl daemon-reload
sudo systemctl restart "${service_name}"
sudo systemctl status "${service_name}"

If it’s working, make sure to go back and strip it down to the paths you actually need.

1.2) Other variables

While a misconfigured PATH (see step 1.1) is usually the cause, many languages depend on other environment variables being set so that they can find packages that they depend on. eg. GOPATH, CARGO_HOME, GEM_HOME, NODE_PATH, ASDF_DIR etc

When systemd starts a service it does so in a clean environment, so if you need any environment variables set, then you need to add them to your service unit file: e.g.

Environment=ASDF_DIR=/home/foo/.asdf

2) journalctl

journalctl -u <service_name> -f

Unfortunately, while this command will show you the standard output and standard errors, it will often buffer this output quite differently than if you had run this command directly. This can make it a lot more difficult to link errors to their associated tasks.

Short of updating your command to not buffer output, you can get around this by installing expect via your Linux package manager. Then you can prefix your ExecStart command with /usr/bin/unbuffer e.g.

sudo apt install expect
ExecStart=/usr/bin/unbuffer /path/to/test.py

This has the downside of introducing another process where things could break.

3) systemctl status

sudo system status --full --lines=50 <service-name>

While this command could give us some useful information, more often than not the logs it shows are simply related to needing to restart too many times. If you remove and Restart= option from your service’s Unit file, then you should see far more useful errors.

Make sure to sudo systemctl daemon-reload and then sudo systemctl restart <service> before checking the status again.

The --full flag makes sure the service path isn’t truncated and --lines=<number> allows for showing more lines than the default 20.

4) Manual execution

When all else fails it’s time to attempt to emulate what systemd is trying to do and then addressing any errors that arise. You want to make sure you are running the command as the same user and with the exact same environment and command.

4.1) Constructing the command

We want to construct and run something like the following:

sudo runuser -l <User> -g <Group> -c "cd <WorkingDirectory> && <EnvironmentFile contents> <Environment> <ExecStart>"

-l <User> and -g <Group> should use their values from the Unit file or root if not specified.

cd <WorkingDirectory> should only be set if there is a value in the Unit file.

<EnvironmentFile contents> should just list the simple key-value pairs from this file. Systemd doesn’t need quotes around these values so they should be removed or escaped e.g. \". Not doing so will interfere with the quotes around the command that is being passed to runuser.

<Environment> is much the same; just a list of key-value pairs.

If you have any duplicate environment variables things can get a bit more tricky, in which the value specified last being the one that wins.

Finally <ExecStart> is simply just the value of the command to run.

4.2) Example

In case the section above wasn’t clear, here is an example:

# contents of /etc/default/extra
PATH="/home/test/bin"
DISPLAY=:2
# contents of /etc/systemd/system/test.service
[Unit]
Description=Amazing Test Service
After=nginx.service

[Service]
Type=simple
User=deploy
Environment=NPM_DIR=/home/test/.npm
EnvironmentFile=-/etc/default/extra
WorkingDirectory=/home/test/app
ExecStart=/home/test/app/bin/start_server
Restart=on-failure

[Install]
WantedBy=multi-user.target

This would become the following command:

sudo runuser -l deploy -c "cd /home/test/app && PATH=/home/test/bin DISPLAY=:2 NPM_DIR=/home/test/.npm /home/test/app/bin/start_server"

If you have copied everything correctly, in most cases this should result in the same error that systemd is seeing, but now you have excluded systemd from the problem. I often find at this stage the error becomes very obvious to me as I can instantly see what is different between this and the environment where the command works.

Common issues are normally, that the user or group is not set up to run this command or it’s missing required Environment variables.

4.3) Cannot reproduce?

Unfortunately, systemd is a complicated beast and while the above manual step will cover the most common case, it would be impossible to cover all the possible variants.

You can however build on this manual command by incorporating more variables from your Unit file until you track down the point of contention. This however will require digging into each of the options to figure out how to translate them into our manual version: systemd.exec documentation

Check Resolution

When successful, the following command should no longer have the output Active: failed

sudo system status --full --lines=50 <service-name>

If you see Active: inactive (dead) this doesn’t mean things are failing, it’s just saying that the command has already finished the task and exited.

Further Information

This is a fantastic overview of the basics of Systemd. I’ve linked to partway into the video so that it starts and the useful section: systemd - The Good Parts

General systemd documentation:

Authors

Gerry

comments powered by Disqus