Exit Codes, systemd and SQL Server on Linux

In this blog post we’re going to cover systemd, process exit codes and highlight how systemd reacts in certain exit conditions from SQL Server on Linux. My friend and SQL Server guru Argenis Fernandez – @dbargenis asked about this behavior on Twitter and I’ve been meaning to write this post, so here you go! Also, there’s a Connect item filed by Argenis on this here. Vote!

systemd Basics

Systemd is an initialization daemon, it’s job is to bring the system to usable state. Meaning, it’s responsible for the orderly starting of services on a Linux system. It does much more than that, in fact, one of it’s other core components is journald. Journald stores logging information from systemd units. 

Now with respect to SQL Server on Linux, the information logged into journald is systemd’s information about our mssql.service unit and also the SQL Server Error Log. The SQL Server Error Log lands in here because it’s written to standard out and by default, a service unit’s standard out is written into the journal. 

We can query the information stored in journald with the command journalctl and below is the syntax to query a particular service unit’s log in journald. Running just journalctl will cause the text output to run off the side of the screen without wrapping the text in your terminal. To get the text to wrap, use the following code…then pipe the output into a pager like more or less because you know…less is more. We’ll use this command in the next section.

journalctl -u mssql-server –no-pager | more

Service units in systemd are configured in unit files and SQL Server on Linux’s unit file lives in the file /usr/lib/systemd/system/mssql-server.service. Inside that file we have the following configuration:

# Restart on non-successful exits.

Restart=on-failure

What Restart=on-failure option does, is if a systemd unit returns an unclean exit code, systemd will automatically restart the service. There are other conditions that cause it to restart such as responding to unclean signals, watchdog and also service timeouts. We’ll save those discussions for another day.

Understanding Process Exit Codes

When a process exits, it will return an integer value to the parent process. If a process terminates cleanly, it conventionally returns 0 to the parent process. This means all is well in the world and the process has shut down. In a relational database system this is significant, in that, we really do want our systems to shut down cleanly. Now when things go wrong, that’s when process return a non-zero value.  systemd is the parent process of the sqlservr process and receives it’s exit code. Let’s see what happens in certain exit conditions from SQL Server on Linux.

Let’s say we initiate a shutdown in SQL Server the T-SQL command SHUTDOWN here’s what we’ll get in journald. 

Oct 28 05:27:58 sqlb.lab.centinosystems.com sqlservr[1103]: 2017-10-28 05:27:58.37 spid51      Server shut down by request from login sa.

Oct 28 05:27:58 sqlb.lab.centinosystems.com sqlservr[1103]: 2017-10-28 05:27:58.37 spid51      SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service: main process exited, code=exited, status=1/FAILURE

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: Unit mssql-server.service entered failed state.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service failed.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service holdoff time over, scheduling restart.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: Started Microsoft SQL Server Database Engine.

Oct 28 05:28:00 sqlb.lab.centinosystems.com systemd[1]: Starting Microsoft SQL Server Database Engine…

Oct 28 05:28:02 sqlb.lab.centinosystems.com sqlservr[1822]: 2017-10-28 05:28:02.87 Server      Microsoft SQL Server 2017 (RTM-CU1) (KB4038634) – 14.0.3006.16 (X64)

 
Now in the output above, you’ll notice a bolded line. In there, you can system that systemd[1] receives a return code from SQL Server of status=1/FAILURE.  Systemd[1] is the parent process to sqlservr, in fact it’s the parent to all processes on our system. It receives the exit code and immediately, systemd initiates a restart of the service due to the configuration we have for our mssql-server systemd unit.
 
What’s interesting is that this happens even on a normal shutdown. But that simply doesn’t make sense, return values on clean exits should return 0. It’s my understanding of the SHUTDOWN command, that it will cause the database engine to shutdown cleanly. 
 
Now let’s say we initiate a SHUTDOWN WITH NOWAIT
 

Oct 28 05:31:24 sqlb.lab.centinosystems.com sqlservr[1822]: 2017-10-28 05:31:24.22 spid51      Server shut down by NOWAIT request from login sa.

Oct 28 05:31:24 sqlb.lab.centinosystems.com sqlservr[1822]: 2017-10-28 05:31:24.22 spid51      SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service: main process exited, code=exited, status=1/FAILURE

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: Unit mssql-server.service entered failed state.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service failed.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: mssql-server.service holdoff time over, scheduling restart.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: Started Microsoft SQL Server Database Engine.

Oct 28 05:31:25 sqlb.lab.centinosystems.com systemd[1]: Starting Microsoft SQL Server Database Engine…

Oct 28 05:31:27 sqlb.lab.centinosystems.com sqlservr[2035]: 2017-10-28 05:31:27.64 Server      Microsoft SQL Server 2017 (RTM-CU1) (KB4038634) – 14.0.3006.16 (X64)


In this output, we see the same behavior, SQL Server returns a status=1/FAILURE and restarts. This makes sense to me, an immediate shutdown should return a non-zero value and it does. 
 
In this post we covered, systemd units and their restart configurations. We looked at exit codes and how they’re used to communicate back to the parent process about the child’s exit status. While it’s just SQL Server…as DBAs we still need to learn about this new operating environment and how it works! Keep learning!
 
For more details on this behavior and it’s configuration I encourage you to read the documentation at this link here. Further, if you’d like a deep dive into systemd check out my Pluralsight course LFCE: Advanced Network and System Administration In this course I cover systemd in great detail, it’s architecture, units and target.