Exit Codes, systemd and SQL Server on Linux
In this blog post we’re going to cover systemd, process exit codes and highlight how systemd reacts in certain exit conditions from SQL Server on Linux. My friend and SQL Server guru Argenis Fernandez – @dbargenis asked about this behavior on Twitter and I’ve been meaning to write this post, so here you go! Also, there’s a Connect item filed by Argenis on this here. Vote!
systemd Basics
Systemd is an initialization daemon, it’s job is to bring the system to usable state. Meaning, it’s responsible for the orderly starting of services on a Linux system. It does much more than that, in fact, one of it’s other core components is journald. Journald stores logging information from systemd units.
Now with respect to SQL Server on Linux, the information logged into journald is systemd’s information about our mssql.service unit and also the SQL Server Error Log. The SQL Server Error Log lands in here because it’s written to standard out and by default, a service unit’s standard out is written into the journal.
We can query the information stored in journald with the command journalctl and below is the syntax to query a particular service unit’s log in journald. Running just journalctl will cause the text output to run off the side of the screen without wrapping the text in your terminal. To get the text to wrap, use the following code…then pipe the output into a pager like more or less because you know…less is more. We’ll use this command in the next section.
journalctl -u mssql-server –no-pager | more
Service units in systemd are configured in unit files and SQL Server on Linux’s unit file lives in the file /usr/lib/systemd/system/mssql-server.service. Inside that file we have the following configuration:
# Restart on non-successful exits.
Restart=on-failure
What Restart=on-failure option does, is if a systemd unit returns an unclean exit code, systemd will automatically restart the service. There are other conditions that cause it to restart such as responding to unclean signals, watchdog and also service timeouts. We’ll save those discussions for another day.
Understanding Process Exit Codes
When a process exits, it will return an integer value to the parent process. If a process terminates cleanly, it conventionally returns 0 to the parent process. This means all is well in the world and the process has shut down. In a relational database system this is significant, in that, we really do want our systems to shut down cleanly. Now when things go wrong, that’s when process return a non-zero value. systemd is the parent process of the sqlservr process and receives it’s exit code. Let’s see what happens in certain exit conditions from SQL Server on Linux.
Let’s say we initiate a shutdown in SQL Server the T-SQL command SHUTDOWN here’s what we’ll get in journald.
Oct 28 05:27:58 sqlb.lab.centinosystems.com sqlservr[1103]: 2017-10-28 05:27:58.37 spid51 Server shut down by request from login sa.
Oct 28 05:27:58 sqlb.lab.centinosystems.com sqlservr[1103]: 2017-10-28 05:27:58.37 spid51 SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.
Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service: main process exited, code=exited, status=1/FAILURE
Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: Unit mssql-server.service entered failed state.
Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service failed.
Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service holdoff time over, scheduling restart.
Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: Started Microsoft SQL Server Database Engine.
Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: Starting Microsoft SQL Server Database Engine…
Oct 28 05:28:02 sqlb.lab.centinosystems.com sqlservr[1822]: 2017-10-28 05:28:02.87 Server Microsoft SQL Server 2017 (RTM-CU1) (KB4038634) – 14.0.3006.16 (X64)
Oct 28 05:31:24 sqlb.lab.centinosystems.com sqlservr[1822]: 2017-10-28 05:31:24.22 spid51 Server shut down by NOWAIT request from login sa.
Oct 28 05:31:24 sqlb.lab.centinosystems.com sqlservr[1822]: 2017-10-28 05:31:24.22 spid51 SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.
Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service: main process exited, code=exited, status=1/FAILURE
Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: Unit mssql-server.service entered failed state.
Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service failed.
Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service holdoff time over, scheduling restart.
Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: Started Microsoft SQL Server Database Engine.
Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: Starting Microsoft SQL Server Database Engine…
Oct 28 05:31:27 sqlb.lab.centinosystems.com sqlservr[2035]: 2017-10-28 05:31:27.64 Server Microsoft SQL Server 2017 (RTM-CU1) (KB4038634) – 14.0.3006.16 (X64)