Data Persistency and Advanced SQL Server Disk Topologies in Kubernetes

When working with SQL Server in containers and Kubernetes storage is a key concept. In this post, we’re going to walk through how to deploy SQL Server in Kubernetes with Persistent Volumes for the system and user databases.

One of the key principals of Kubernetes is the ephemerality of Pods. No Pod is every redeployed, a completely new Pod is created. If a Pod dies, for whatever reason, a new Pod is created in its place there is no continuity in the state of that Pod. The newly created Pod will go back to the initial state of the container image defined in the Pod’s spec. This is very valuable for stateless workloads, not so much for stateful workloads like SQL Server.

This means that for a stateful workload like SQL Server we need to store both configuration and data externally from the Pod to maintain state through the recreation of a Pod. Kubernetes give us constructs two constructs to do that, environment variables and Persistent Volumes. 

Using Environment Variables for Container Configuration

Container-based applications use environment variables for configuration at startup. The SQL Server container has a collection of environment variables that can be used to configure it at container startup. We will leverage two of those in this configuration. MSSQL_DATA_DIR and MSSQL_LOG_DIR these allow us to define a file system locations for user database and log files. When the SQL Server container is started inside the Pod, it reads the environment variables at runtime and sets its configuration based on those values. We define these variables as part of the Pod Spec. We will cover that configuration below.

Using Persistent Volumes to Maintain Database State

To persist the state of our SQL Server container, we will configure SQL Server to store its data and log files for both user and system databases on Persistent Volumes.

First, let’s review how SQL Server in a container starts up. During the initial startup, the SQL Server process checks to see if there are any system databases in the default system file location which is, /var/opt/mssql/data. If there are none the system databases are copied there, if they are there no action is taken. 

To add persistently to the system databases, and really all of the other components of SQL Server such as the Error Log and other system files, we will configure /var/opt/mssql so that it is backed by a Persistent Volume.

By placing the system databases on a Persistent Volume, when a Pod is recreated and the Persistent Volumes are attached and mounted in the same location when the SQL Server process starts up it sees the system databases and has what it needs to maintain state between creation.

If there are records for user databases in the system databases, SQL Server will start the process of bringing those databases online as well. We certainly the default location for user databases is /var/opt/mssql/data but we are going to override that with an environment variable for both the data and log directories, placing each on a dedicated Persistent Volumes.

Let’s walk through that configuration together. 

Persistent Volume Claims

In this configuration, we will use dynamic storage provisioning. In dynamic provisioning, a Persistent Volume Claim (PVC) is used to request a Persistent Volume (PV) from a Storage Class. In this case, we’ll be using AKS’s managed-premium Storage Class. 

Here we define three PVCs, one for each place we want Persistent Volume, for the system files and databases and the user database and log files.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "pvc-sql-data"
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: managed-premium
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "pvc-sql-system"
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: managed-premium
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "pvc-sql-log"
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: managed-premium
  resources:
    requests:
      storage: 10Gi

Deployment

In the Pod spec for our Deployment, we want to define several elements to support this configuration. 

  • Volumes – define volumes that can be mounted by this Pod. In this case, we’re creating and naming three volumes, backed by the PVCs defined above.
  • volumeMounts – volumes mounted into the container and their mountPath, location. This maps the names from the named Volumes to a location in the filesystem in the container.
  • env – due to the ephemerality of the container in the Pod, we need to tell SQL Server at start up that the data and log files will be stored in a specified directory. We are leaving the system databases and files in the default location which is /var/opt/mssql
The net effect of this storage configuration is that we are mapping the Persistent Volumes into a particular location in the filesystem inside the container. 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mssql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mssql
    spec:
      
      containers:
      - name: mssql
        image: 'mcr.microsoft.com/mssql/server:2017-latest'
        ports:
        - containerPort: 1433
        env:
        - name: ACCEPT_EULA
          value: 'Y'
        - name: MSSQL_DATA_DIR
          value: '/data'
        - name: MSSQL_LOG_DIR
          value: '/log'
        - name: SA_PASSWORD
          value: 'S0methingS@Str0ng!'
        volumeMounts:
        - name: mssql-system
          mountPath: /var/opt/mssql
        - name: mssql-data
          mountPath: /data
        - name: mssql-log
          mountPath: /log
      volumes:
      - name: mssql-system
        persistentVolumeClaim:
          claimName: pvc-sql-system
      - name: mssql-data
        persistentVolumeClaim:
          claimName: pvc-sql-data
      - name: mssql-log
        persistentVolumeClaim:
          claimName: pvc-sql-log

Service

We’ll front end our SQL Server with a public IP address and a load balancer. 

apiVersion: v1
kind: Service
metadata:
  name: mssql-deployment
spec:
  selector:
    app: mssql
  ports:
    - protocol: TCP
      port: 31433
      targetPort: 1433
  type: LoadBalancer

Apply the Configuration

Save the code above into a YAML file and deploy it into SQL Server.

kubectl apply -f deployment-advanced-disk.yaml

You’ll get this output

persistentvolumeclaim/pvc-sql-data created
persistentvolumeclaim/pvc-sql-system created
persistentvolumeclaim/pvc-sql-log created
deployment.apps/mssql-deployment created
service/mssql-deployment created

Confirm the configuration

We can use kubectl get pv to list the Persistent Volumes (PV) dynamically allocated by our cluster. Here there are three Persistent Volumes. The key here is the status is Bound, which means they are bound to a PVC. I also want to point out the Reclaim Policy is Delete. This means if the PVC is deleted, the PV will be deleted at a cleanup interval sometime in the future. 

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS      REASON   AGE
pvc-e0b418ef-6e69-11e9-a433-f659caf6a6f5   10Gi       RWO            Delete           Bound    default/pvc-sql-data     managed-premium            11m
pvc-e0cf2345-6e69-11e9-a433-f659caf6a6f5   10Gi       RWO            Delete           Bound    default/pvc-sql-system   managed-premium            11m
pvc-e0ea01a8-6e69-11e9-a433-f659caf6a6f5   10Gi       RWO            Delete           Bound    default/pvc-sql-log      managed-premium            11m

With kubectl get pvc we get a list of the PVCs in our configuration, once for each we defined above. The key here is the status is Bound, or that they are bound to a PV.

kubectl get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
pvc-sql-data     Bound    pvc-e0b418ef-6e69-11e9-a433-f659caf6a6f5   10Gi       RWO            managed-premium   12m
pvc-sql-log      Bound    pvc-e0ea01a8-6e69-11e9-a433-f659caf6a6f5   10Gi       RWO            managed-premium   12m
pvc-sql-system   Bound    pvc-e0cf2345-6e69-11e9-a433-f659caf6a6f5   10Gi       RWO            managed-premium   12m 

Now let’s use kubectl describe pods to get the deep dive info about our storage configuration and how it’s mapped into the Pod. 

There are three keep places in the output below I want to point you to

  • Containers: mssql: Environment: you’ll find the two environment variables set for the data and log directories. Configured as /data and /log
  • Mounts: we see the file system location inside the container and the name of the Volumes defined in the Pod Spec
  • Volumes: we see the name of the Volumes, their type, claim name and the read/write status.
  • Events: this is a log of the events for the creation of this Pod. Key here is that sometimes the container will come up prior to the storage being available to the Pod. That’s what the error below is, but it clears itself up and the container is able to start.
kubectl describe pods
Name:               mssql-deployment-df4cf5c4c-nf8lf
Namespace:          default
Priority:           0
PriorityClassName:
Node:               aks-nodepool1-89481420-2/10.240.0.6
Start Time:         Sat, 04 May 2019 07:41:59 -0500
Labels:             app=mssql
                    pod-template-hash=df4cf5c4c
Annotations:
Status:             Running
IP:                 10.244.1.51
Controlled By:      ReplicaSet/mssql-deployment-df4cf5c4c
Containers:
  mssql:
    Container ID:   docker://f2320ae8f94c24fbb04214b903b4a218b82e9548f8d88a95daa7e207eeaa42b4
    Image:          mcr.microsoft.com/mssql/server:2017-latest
    Image ID:       docker-pullable://mcr.microsoft.com/mssql/server@sha256:39554141d307f2d40d2abfc54e3a0eea3aa527e58f616496c6f3ed3245a2e2b1
    Port:           1433/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sat, 04 May 2019 07:44:21 -0500
    Ready:          True
    Restart Count:  0
    Environment:
      ACCEPT_EULA:                   Y
      MSSQL_DATA_DIR:                /data
      MSSQL_LOG_DIR:                 /log
      SA_PASSWORD:                   S0methingS@Str0ng!
      KUBERNETES_PORT_443_TCP_ADDR:  cscluster-kubernetes-cloud-fd0c5e-8bca8b54.hcp.centralus.azmk8s.io
      KUBERNETES_PORT:               tcp://cscluster-kubernetes-cloud-fd0c5e-8bca8b54.hcp.centralus.azmk8s.io:443
      KUBERNETES_PORT_443_TCP:       tcp://cscluster-kubernetes-cloud-fd0c5e-8bca8b54.hcp.centralus.azmk8s.io:443
      KUBERNETES_SERVICE_HOST:       cscluster-kubernetes-cloud-fd0c5e-8bca8b54.hcp.centralus.azmk8s.io
    Mounts:
      /data from mssql-data (rw)
      /log from mssql-log (rw)
      /var/opt/mssql from mssql-system (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-z9sbf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  mssql-system:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-sql-system
    ReadOnly:   false
  mssql-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-sql-data
    ReadOnly:   false
  mssql-log:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-sql-log
    ReadOnly:   false
  default-token-z9sbf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-z9sbf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age   From                               Message
  ----     ------                  ----  ----                               -------
  Normal   Scheduled               13m   default-scheduler                  Successfully assigned default/mssql-deployment-df4cf5c4c-nf8lf to aks-nodepool1-89481420-2
  Normal   SuccessfulAttachVolume  13m   attachdetach-controller            AttachVolume.Attach succeeded for volume "pvc-e0ea01a8-6e69-11e9-a433-f659caf6a6f5"
  Normal   SuccessfulAttachVolume  12m   attachdetach-controller            AttachVolume.Attach succeeded for volume "pvc-e0cf2345-6e69-11e9-a433-f659caf6a6f5"
  Normal   SuccessfulAttachVolume  12m   attachdetach-controller            AttachVolume.Attach succeeded for volume "pvc-e0b418ef-6e69-11e9-a433-f659caf6a6f5"
  Warning  FailedMount             11m   kubelet, aks-nodepool1-89481420-2  Unable to mount volumes for pod "mssql-deployment-df4cf5c4c-nf8lf_default(027c46f7-6e6a-11e9-a433-f659caf6a6f5)": timeout expired waiting for volumes to attach or mount for pod "default"/"mssql-deployment-df4cf5c4c-nf8lf". list of unmounted volumes=[mssql-system mssql-data]. list of unattached volumes=[mssql-system mssql-data mssql-log default-token-z9sbf]
  Normal   Pulled                  11m   kubelet, aks-nodepool1-89481420-2  Container image "mcr.microsoft.com/mssql/server:2017-latest" already present on machine
  Normal   Created                 11m   kubelet, aks-nodepool1-89481420-2  Created container
  Normal   Started                 11m   kubelet, aks-nodepool1-89481420-2  Started container

Creating a Database and Verifying File Location

With this code, we’ll get our IP address for our SQL Server service then we’ll create a database and query master_files for a list of data files. Notice I’m defining my service port as 31443 which is what we defined when creating our service in the earlier step.

SVCIP=$(kubectl get svc mssql-deployment | grep mssql-deployment |  awk '{print $4}')
sqlcmd -S $SVCIP,31433 -U sa -Q 'CREATE DATABASE TestDB1' -P $PASSWORD
sqlcmd -S $SVCIP,31433 -U sa -Q 'SELECT name,physical_name from sys.master_files' -P $PASSWORD

And we’ll get this output, you can see all of the system databases backed by /var/opt/mssql and our user database is on /data and the log is on /log. All backed by Persistent Volumes.

master        /var/opt/mssql/data/master.mdf
mastlog       /var/opt/mssql/data/mastlog.ldf
tempdev       /var/opt/mssql/data/tempdb.mdf
templog       /var/opt/mssql/data/templog.ldf
modeldev      /var/opt/mssql/data/model.mdf
modellog      /var/opt/mssql/data/modellog.ldf
MSDBData      /var/opt/mssql/data/MSDBData.mdf
MSDBLog       /var/opt/mssql/data/MSDBLog.ldf
TestDB1       /data/TestDB1.mdf
TestDB1_log   /log/TestDB1_log.ldf

Confirming Persistency

Let’s go ahead and delete our Pod to confirm that when it’s recreated by our Deployment our data is still there. 

kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
mssql-deployment-df4cf5c4c-nf8lf   1/1     Running   0          4d2h
kubectl delete pod mssql-deployment-df4cf5c4c-nf8lf 
pod "mssql-deployment-df4cf5c4c-nf8lf" deleted

Once the Pod is recreated, let’s query master files to see where our databases are located. And you’ll find that your the database created in the previous step persisted between Pod creations.

sqlcmd -S $SVCIP,31433 -U sa -Q 'SELECT name,physical_name from sys.master_files' -P $PASSWORD
master        /var/opt/mssql/data/master.mdf
mastlog       /var/opt/mssql/data/mastlog.ldf
tempdev       /var/opt/mssql/data/tempdb.mdf
templog       /var/opt/mssql/data/templog.ldf
modeldev      /var/opt/mssql/data/model.mdf
modellog      /var/opt/mssql/data/modellog.ldf
MSDBData      /var/opt/mssql/data/MSDBData.mdf
MSDBLog       /var/opt/mssql/data/MSDBLog.ldf
TestDB1       /data/TestDB1.mdf
TestDB1_log   /log/TestDB1_log.ldf