Memory Settings for Running SQL Server in Kubernetes

People often ask me what’s the number one thing to look out for when running SQL Server on Kubernetes…the answer is memory settings. In this post, we’re going to dig into why you need to configure resource limits in your SQL Server’s Pod Spec when running SQL Server workloads in Kubernetes. I’m running these demos in Azure Kubernetes Service (AKS), but these concepts apply to any SQL Server environment running in Kubernetes. 

Let’s deploy SQL Server in a Pod without any resource limits.  In the yaml below, we’re using a Deployment to run one SQL Server Pod with a PersistentVolumeClaim for our instance directory and also frontending the Pod with a Service for access. 

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mssql-deployment-2017
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
        app: mssql-2017
  template:
    metadata:
      labels:
        app: mssql-2017
    spec:
      hostname: sql3
      containers:
      - name: mssql
        image: 'mcr.microsoft.com/mssql/server:2017-CU16-ubuntu'
        ports:
        - containerPort: 1433
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: pvc-sql-2017
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-sql-2017
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: managed-premium
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-svc-2017
spec:
  selector:
    app: mssql-2017
  ports:
    - protocol: TCP
      port: 1433
      targetPort: 1433
  type: LoadBalancer

Running a Workload Against our Pod…then BOOM!

With that Pod deployed, I loaded up a HammerDB TPC-C test with about 10GB of data and drove a workload against our SQL Server. Then while monitoring the workload…boom HammerDB throws connection errors and crashes. Let’s look at why.

First thing’s first, let’s check the Pods status with kubectl get pods. We’ll that’s interesting I have 13 Pods. 1 has a Status of Running and the remainder have are Evicted

kubectl get pods
NAME                                     READY   STATUS    RESTARTS   AGE
mssql-deployment-2017-8698fb8bf5-2pw2z   0/1     Evicted   0          8m24s
mssql-deployment-2017-8698fb8bf5-4bn6c   0/1     Evicted   0          8m23s
mssql-deployment-2017-8698fb8bf5-4pw7d   0/1     Evicted   0          8m25s
mssql-deployment-2017-8698fb8bf5-54k6k   0/1     Evicted   0          8m27s
mssql-deployment-2017-8698fb8bf5-96lzf   0/1     Evicted   0          8m26s
mssql-deployment-2017-8698fb8bf5-clrbx   0/1     Evicted   0          8m27s
mssql-deployment-2017-8698fb8bf5-cp6ml   0/1     Evicted   0          8m27s
mssql-deployment-2017-8698fb8bf5-ln8zt   0/1     Evicted   0          8m27s
mssql-deployment-2017-8698fb8bf5-nmq65   0/1     Evicted   0          8m21s
mssql-deployment-2017-8698fb8bf5-p2mvm   0/1     Evicted   0          25h
mssql-deployment-2017-8698fb8bf5-stzfw   0/1     Evicted   0          8m23s
mssql-deployment-2017-8698fb8bf5-td24w   1/1     Running   0          8m20s
mssql-deployment-2017-8698fb8bf5-wpgcx   0/1     Evicted   0          8m22s

What Just Happened?

Let’s keep digging and look at kubectl get events to see if that can help us sort out what’s happening…reading through these events a lot is going on. Let’s start at the top, we can see that our original Pod mssql-deployment-2017-8698fb8bf5-p2mvm is Killed and the line below that tells us why. The Node had a MemoryPressure condition. A few lines below that we see that our mssql container was using 4461532Ki which exceeded its request of 0 (more on why it’s 0 in a bit). So then our Deployment Controller sees that our Pod is no longer up and running so the Deployment controller does what it’s supposed to do start a new Pod in the place of the failed Pod.
 
The scheduler in Kubernetes will try to place a Pod back onto the same Node if the Node is still available, in our case aks-agentpool-43452558-0. And each time the scheduler places the Pod back onto the same Node it find that the MemoryPressure condition is still true, so after the 10th try the scheduler selects a new Node, aks-agentpool-43452558-3 to run our Pod. And in the last line of the output below we can see that once the workload is moved to aks-agentpool-43452558-3 the MemoryPressure condition goes away on aks-agentpool-43452558-0 since it’s no longer running our workload. 
 
kubectl get events --sort-by=.metadata.creationTimestamp
LAST SEEN   TYPE      REASON                      OBJECT                                        MESSAGE
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-clrbx    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-clrbx to aks-agentpool-43452558-0
17m         Warning   EvictionThresholdMet        node/aks-agentpool-43452558-0                 Attempting to reclaim memory
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-clrbx
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-ln8zt
17m         Normal    Killing                     pod/mssql-deployment-2017-8698fb8bf5-p2mvm    Stopping container mssql
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-54k6k    The node had condition: [MemoryPressure].
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-p2mvm    The node was low on resource: memory. Container mssql was using 4461532Ki, which exceeds its request of 0.
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-cp6ml    The node had condition: [MemoryPressure].
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-cp6ml    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-cp6ml to aks-agentpool-43452558-0
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-54k6k    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-54k6k to aks-agentpool-43452558-0
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-clrbx    The node had condition: [MemoryPressure].
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-cp6ml
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-54k6k
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-ln8zt    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-ln8zt to aks-agentpool-43452558-0
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-96lzf    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-96lzf to aks-agentpool-43452558-0
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-96lzf
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-ln8zt    The node had condition: [MemoryPressure].
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-96lzf    The node had condition: [MemoryPressure].
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-4pw7d    The node had condition: [MemoryPressure].
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-4pw7d    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-4pw7d to aks-agentpool-43452558-0
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-4pw7d
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-2pw2z    The node had condition: [MemoryPressure].
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-2pw2z    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-2pw2z to aks-agentpool-43452558-0
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-2pw2z
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-4bn6c    The node had condition: [MemoryPressure].
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-4bn6c
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   Created pod: mssql-deployment-2017-8698fb8bf5-stzfw
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-4bn6c    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-4bn6c to aks-agentpool-43452558-0
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-stzfw    The node had condition: [MemoryPressure].
17m         Normal    SuccessfulCreate            replicaset/mssql-deployment-2017-8698fb8bf5   (combined from similar events): Created pod: mssql-deployment-2017-8698fb8bf5-td24w
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-wpgcx    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-wpgcx to aks-agentpool-43452558-0
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-wpgcx    The node had condition: [MemoryPressure].
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-stzfw    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-stzfw to aks-agentpool-43452558-3
17m         Warning   Evicted                     pod/mssql-deployment-2017-8698fb8bf5-nmq65    The node had condition: [MemoryPressure].
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-nmq65    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-nmq65 to aks-agentpool-43452558-0
17m         Normal    NodeHasInsufficientMemory   node/aks-agentpool-43452558-0                 Node aks-agentpool-43452558-0 status is now: NodeHasInsufficientMemory
17m         Normal    Scheduled                   pod/mssql-deployment-2017-8698fb8bf5-td24w    Successfully assigned default/mssql-deployment-2017-8698fb8bf5-td24w to aks-agentpool-43452558-3
16m         Normal    SuccessfulAttachVolume      pod/mssql-deployment-2017-8698fb8bf5-td24w    AttachVolume.Attach succeeded for volume "pvc-f35b270a-e063-11e9-9b6d-ee8baa4f9319"
15m         Normal    Pulling                     pod/mssql-deployment-2017-8698fb8bf5-td24w    Pulling image "mcr.microsoft.com/mssql/server:2017-CU16-ubuntu"
15m         Normal    Pulled                      pod/mssql-deployment-2017-8698fb8bf5-td24w    Successfully pulled image "mcr.microsoft.com/mssql/server:2017-CU16-ubuntu"
15m         Normal    Started                     pod/mssql-deployment-2017-8698fb8bf5-td24w    Started container mssql
15m         Normal    Created                     pod/mssql-deployment-2017-8698fb8bf5-td24w    Created container mssql
12m         Normal    NodeHasSufficientMemory     node/aks-agentpool-43452558-0                 Node aks-agentpool-43452558-0 status is now: NodeHasSufficientMemory
 
But guess what…we’re going to have the same problem on this new Node. If we run our workload again, our memory allocation will grow and Kubernetes will kill the Pod again once the MemoryPressure condition is met. So what do we do…how can we prevent our nodes from going into a MemoryPressure condition? 

Understanding Allocatable Memory in Kubernetes 

Using kubectl describe nodein the output below there’s a section Allocatable. In there we can see that amount of allocatable resources on this Node in terms of CPU, disk, RAM and Pods. These are the amount of resources available to run user Pods on this Node. And there we see the amount of allocatable memory is 4667840Ki (~4.45GB) so we have about that much memory to run our workloads. The amount here is a function of the amount of memory in the Node and reservations made by Kubernetes for system functions, more on that here. Our AKS cluster VMs are Standard DS2 v2 which have 2vCPU and 7GB of RAM, so about 2.55GB is reserved for other uses.  The output below is from after our Pod was evicted so we can see the LastTransitionTime shows the last time a condition occurred and for MemoryPressure we can see an event at 7:53 AM. The other LastTransitionTimes are from when the Node was started. Another key point is in the Events section where we can see the conditions change state.
 
kubectl describe nodes aks-agentpool-43452558-0
Name:               aks-agentpool-43452558-0
...output omitted...
Unschedulable:      false
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 10 Sep 2019 16:20:00 -0500   Tue, 10 Sep 2019 16:20:00 -0500   RouteCreated                 RouteController created a route
  MemoryPressure       False.  Sat, 28 Sep 2019 07:58:56 -0500.  Sat, 28 Sep 2019 07:53:55 -0500.  KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Sat, 28 Sep 2019 07:58:56 -0500   Tue, 10 Sep 2019 16:18:27 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Sat, 28 Sep 2019 07:58:56 -0500   Tue, 10 Sep 2019 16:18:27 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Sat, 28 Sep 2019 07:58:56 -0500   Tue, 10 Sep 2019 16:18:27 -0500   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  Hostname:    aks-agentpool-43452558-0
  InternalIP:  10.240.0.6
Capacity:
 attachable-volumes-azure-disk:  8
 cpu:                            2
 ephemeral-storage:              101584140Ki
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory:                         7113152Ki
 pods:                           110
Allocatable:
  attachable-volumes-azure-disk:  8
 cpu:                            1931m
 ephemeral-storage:              93619943269
 hugepages-1Gi:                  0
 hugepages-2Mi:                  0
 memory: 4667840Ki  pods:                           110
...output omitted...
Events:
Type     Reason                     Age                  From                               Message
  ----     ------                     ----                 ----                               -------
  Warning  EvictionThresholdMet       10m                  kubelet, aks-agentpool-43452558-0  Attempting to reclaim memory
  Normal   NodeHasInsufficientMemory  10m                  kubelet, aks-agentpool-43452558-0  Node aks-agentpool-43452558-0 status is now: NodeHasInsufficientMemory
  Normal   NodeHasSufficientMemory    5m15s (x2 over 14d)  kubelet, aks-agentpool-43452558-0  Node aks-agentpool-43452558-0 status is now: NodeHasSufficientMemory

SQL Server’s View of Memory on Kubernetes Nodes

When using a Pod with no memory limits defined in the Pod Spec (which is why we saw 0 for the limits in the Event entry) SQL Server sees 5557MB (~5.4GB) memory available and thinks it has that to use. Why is that? Well, SQL Server on Linux looks at the base OS to see how much memory is available on the system and by default uses approximately 80% of that memory due its architecture (SQLPAL).
2019-09-28 14:46:16.23 Server      Detected 5557 MB of RAM. This is an informational message; no user action is required. 
This is bad news in our situation, Kubernetes has only 4667840Ki (~4.45GB) to allocate before setting the MemoryPressure condition which will cause our Pod to be Evicted and Terminated. And as with our workload running SQL Server allocates memory, primarily to the buffer pool, and it exceeds the Allocatable amount of memory for the Node Kubernetes kills our Pod to protect the Node and the cluster as a whole. 

Configuring Pod Limits for SQL Server

So how do we fix all of this? We need to set a resource limit in our Pod Spec. Limits allow us to control the amount of a particular resource exposed to a Pod. And in our case, we want to limit the amount of memory we want SQL Server to see. In our environment we know we have  4667840Ki (~4.45GB) of Allocatable memory for user Pods on Nodes so lets set a value lower than that…and to be super safe I’m going to use 3GB. In the code below you can see in the Pod Spec for our mssql container we have a section for resources, limits and a value of memory: “3Gi”.

spec:
      hostname: sql3
      containers:
      - name: mssql
        image: 'mcr.microsoft.com/mssql/server:2017-CU16-ubuntu'
        ports:
        - containerPort: 1433
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD
        resources:
          limits:
            memory: "3Gi"
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: pvc-sql-system-2017
With this configured we limit the amount of memory SQL Server sees to 3GB. Given that the container is running SQL Server on Linux, SQL Server will actually see about 80% of that 2458MB
2019-09-28 14:01:46.16 Server      Detected 2458 MB of RAM. This is an informational message; no user action is required.

Summary

With that, I hope you can see why I consider memory settings the number one thing to look out for when deploying SQL Server in Kubernetes.  Setting appropriate values will ensure that your SQL Server instance on Kubernetes stays up and running and happily with the other workloads you have running in your cluster.  What’s the best value to set? We need to take into account the amount of memory on the Node, the amount of memory we need to run our workload in SQL Server, and the reservations needed by both Kubernetes and SQLPAL. Additionally, we should set max server memory instance level setting inside of SQL Server to limit the amount of memory that’s allocatable. My suggestion to you is to configure both a resource limit at the Pod Spec and configure max server memory at the instance level.

If you want to read more about resource management and Pod eviction check out this resources: