Setting up SQL Server 2022 s3 Object Storage Integration using MinIO with Docker Compose

In this blog post, I’ve implemented two example environments for using SQL Server 2022’s s3 object storage integration: one for backup and restore to s3-compatible object storage and the other for data virtualization using Polybase connectivity to s3-compatible object storage. The goal of this work is to get you up and running as quickly as possible with these new features. I implemented it in Docker Compose since that handles all the implementation and configuration steps for you. The complete code for this is available on my GitHub repo…I’m walking you through the implementation here in this post.

In my post Setting up MinIO for SQL Server 2022 s3 Object Storage Integration, we did this step by step at the command line. Docker Compose does all of that hard work for you, so you can get up and running fast.

Let’s walk through what you’ll get in each environment.

Backup and Restore Test Environment

Up first is backups. In this repo’s backup directory, there’s a script, demo.sh. In this script, you will find the commands needed to start the environment and do a basic connectivity test using a SQL Server backup. To start everything up, change into the backup directory and run docker compose up --detach. At this point, you’ll have a functioning environment, but let’s dig into the details of how this is implemented in Docker Compose.

docker compose up --detach
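
These next commands aren’t part of the repo’s demo.sh; they’re just quick checks I like to run to confirm everything came up cleanly before moving on:

# List the services in this compose project and their current state
docker compose ps

# Follow the logs for all services while the environment initializes
docker compose logs --follow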

First, since SQL Server’s s3 object integration requires a valid and trusted certificate, a service named config runs a container that creates the self-signed certificate and private key needed for this environment and stores them in the current working directory in a subdirectory named certs.

  config:
    platform: linux/amd64
    build:
      context: .
      dockerfile: dockerfile.ssl
    volumes:
      - ./openssl.cnf:/tmp/certs/openssl.cnf
      - ./certs:/certs
    command: openssl req -x509 -nodes -days 3650 -newkey rsa:2048 -keyout /certs/private.key -out /certs/public.crt -config /tmp/certs/openssl.cnf
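
The contents of openssl.cnf aren’t shown here, but the important detail is that the generated certificate covers the name s3.example.com, since that’s the name SQL Server will validate when it connects to MinIO. If you want to eyeball the certificate the config service produced, a quick check from the host looks like this:

# Print the certificate's subject and the names it covers
openssl x509 -in ./certs/public.crt -noout -subject
openssl x509 -in ./certs/public.crt -noout -text | grep -A 1 "Subject Alternative Name"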

Second, we start a service named minio1 that starts a MinIO container on a static IP address of 172.18.0.20 and exposes the data and admin ports on 9000 and 9001. There are also two volumes defined: a data volume, s3-data, for the object data stored in MinIO, and certs, a bind mount exposing the certificates into the MinIO container for use with TLS connections. MinIO automatically configures itself for TLS connectivity when it finds certificates in this location. The final configurations are the root username and password environment variables and the command that starts up the container.

  minio1:
    image: quay.io/minio/minio
    platform: linux/amd64
    depends_on: 
      - config
    hostname: minio1
    networks:
      s3-data:
        ipv4_address: 172.18.0.20
    extra_hosts:
      - "s3.example.com:172.18.0.20"
    ports:
      - 9000:9000
      - 9001:9001
    volumes:
      - s3-data:/data
      - ./certs:/root/.minio/certs/
    environment:
      - MINIO_ROOT_USER=MYROOTUSER
      - MINIO_ROOT_PASSWORD=MYROOTPASSWORD
    command: server /data --console-address ":9001" 
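
Once minio1 is up, you can verify it’s serving TLS with the generated certificate. This isn’t in the compose file; it’s just a quick check from the host using MinIO’s health endpoint, with curl’s --resolve pointing s3.example.com at the published port:

# Resolve s3.example.com to localhost and validate TLS against the certificate we generated
curl --resolve s3.example.com:9000:127.0.0.1 \
     --cacert ./certs/public.crt \
     --head https://s3.example.com:9000/minio/health/live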

Next, the createbucket service creates a user in MinIO that we will use inside SQL Server to access MinIO and also creates a bucket named sqlbackups for our backup and restore testing.

  createbucket:
    image: minio/mc
    platform: linux/amd64
    networks:
      s3-data:
    extra_hosts:
      - "s3.example.com:172.18.0.20"
    depends_on:
      - minio1
    entrypoint: /bin/sh -c "sleep 60; 
                            /usr/bin/mc alias set s3admin https://s3.example.com:9000 MYROOTUSER MYROOTPASSWORD --api S3v4 --insecure;
                            /usr/bin/mc admin user add s3admin anthony nocentino --insecure;
                            /usr/bin/mc admin policy attach s3admin readwrite --user anthony --insecure;
                            /usr/bin/mc alias set anthony https://s3.example.com:9000 anthony nocentino --insecure;
                            /usr/bin/mc mb anthony/sqlbackups  --insecure;"                            
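
The sleep 60 in that entrypoint simply gives MinIO time to come up before mc runs. If you want to confirm the user, policy, and bucket were actually created, checking this service’s logs from the host is the simplest route:

# Review the output of the mc commands that created the user, policy, and bucket
docker compose logs createbucket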

Finally, we start a service named sql1, which runs the latest published container image for SQL Server 2022, mcr.microsoft.com/mssql/server:2022-latest. In this service, we add an extra_host so that the SQL Server container can resolve the DNS name of our MinIO container and make the proper TLS connection. There is a data volume for our SQL Server data, sql-data, and we’re using a bind mount to expose the MinIO container’s public certificate into SQL Server so that it’s trusted, using the line ./certs/public.crt:/var/opt/mssql/security/ca-certificates/public.crt:ro.

  sql1:
    image: mcr.microsoft.com/mssql/server:2022-latest
    platform: linux/amd64
    depends_on: 
      - config
      - createbucket
      - minio1
    hostname: sql1
    networks:
      - s3-data
    extra_hosts:
      - "s3.example.com:172.18.0.20"
    ports:
      - 1433:1433
    volumes:
      - sql-data:/var/opt/mssql
      - ./certs/public.crt:/var/opt/mssql/security/ca-certificates/public.crt:ro
    environment:
      - ACCEPT_EULA=Y
      - MSSQL_SA_PASSWORD=S0methingS@Str0ng!
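
With everything up, a quick connectivity test against the SQL Server container from the host looks like this. I’m assuming you have sqlcmd installed locally; with newer sqlcmd versions you may need -C to trust the instance’s self-signed certificate:

# Connect to the SQL Server container on the published port and confirm the version
sqlcmd -S localhost,1433 -U sa -P 'S0methingS@Str0ng!' -C -Q "SELECT @@VERSION;"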

Once the containers are up and running, you’ll want to create a database, create a credential for access to your s3 bucket in MinIO, and then run a backup. Here’s some example code to back up to our MinIO container.

Create a database in SQL Server

CREATE DATABASE TESTDB1

Create the S3 credential in SQL Server

CREATE CREDENTIAL [s3://s3.example.com:9000/sqlbackups] WITH IDENTITY = 'S3 Access Key', SECRET = 'anthony:nocentino'

Run the backup to the s3 target

BACKUP DATABASE TestDB1 TO URL = 's3://s3.example.com:9000/sqlbackups/TestDB1.bak' WITH COMPRESSION, STATS = 10, FORMAT, INIT
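
To confirm the backup written to MinIO is readable, a RESTORE VERIFYONLY against the same URL is a quick test. Here it’s wrapped in sqlcmd so you can run it from the host (same sqlcmd assumptions as the connectivity check above):

# Verify the backup we just wrote to the sqlbackups bucket is complete and readable
sqlcmd -S localhost,1433 -U sa -P 'S0methingS@Str0ng!' -C \
  -Q "RESTORE VERIFYONLY FROM URL = 's3://s3.example.com:9000/sqlbackups/TestDB1.bak';"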

When you’re all finished, you can use docker compose down --rmi local --volumes to stop all the containers and destroy all the images and volumes associated with this environment.

Polybase and s3 Data Virtualization Environment

Up next is data virtualization. In this repo’s polybase directory, there’s a script demo.sh. This script has the commands you’ll need to start up the environment and do a basic connectivity test using Polybase-based access to s3-compatible object storage. To start everything up, you’ll change into the polybase directory and run docker compose build and then docker compose up --detach. This Docker Compose manifest will do a few things…let’s walk through that.

This Docker Compose manifest starts out the same as the backup one above: it creates the certificates needed, starts a configured MinIO container, and then creates the required user and bucket in MinIO. In addition to that, it copies a simple CSV file into the MinIO container. This is the data we’ll access from SQL Server via Polybase over s3.

Since Polybase isn’t enabled by default in the published container image mcr.microsoft.com/mssql/server:2022-latest, we have to build a container image for SQL Server with Polybase installed. And that’s what we’re doing in the sql1 service in the dockerfile named dockerfile.sql.

Start up the environment

Once you’re ready to go, start up the environment with docker compose up --detach and follow the steps in demo.sh.
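
For reference, the full start-up sequence from the polybase directory looks like this:

# Build the SQL Server image with Polybase installed, then start the environment in the background
docker compose build
docker compose up --detach

# Confirm all the services are running
docker compose ps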

With the SQL Server container up and running, let’s walk through the steps to access data on s3 compatible object storage. All this code is in demo.sql in the repo. But I want to walk you through it here too.

Configure Polybase in SQL Server instance

Confirm if the Polybase feature is installed, 1 = installed

SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

Next, enable Polybase in your instance’s configuration and apply the change with RECONFIGURE

exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

Confirm if Polybase is in your running config, run_value should be 1

exec sp_configure @configname = 'polybase enabled'

Configure access to external data using Polybase over S3

Create a database to hold objects for the demo

CREATE DATABASE [PolybaseDemo];

Switch into the database context for the PolybaseDemo database

USE PolybaseDemo

Create a database master key, this is used to protect the credentials you’re about to create

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0methingS@Str0ng!';  

Create a database scoped credential; this should have, at minimum, ReadOnly and ListBucket access to the s3 bucket

CREATE DATABASE SCOPED CREDENTIAL s3_dc WITH IDENTITY = 'S3 Access Key', SECRET = 'anthony:nocentino' ;

Before you create the external data source, you need to restart the SQL Server container. To restart a container started by Docker Compose, you can use this:

docker compose restart sql1

If you don’t restart the SQL Server container you’ll get this error:

Msg 46530, Level 16, State 11, Line 1
External data sources are not supported with type GENERIC.

Create your external data source on your s3-compatible object storage, referencing where it is on the network with LOCATION, and the credential you just defined

CREATE EXTERNAL DATA SOURCE s3_ds
WITH
(    LOCATION = 's3://s3.example.com:9000/'
,    CREDENTIAL = s3_dc
)
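
If you want to confirm the external data source exists and points at our MinIO endpoint, you can query the catalog view. Here’s that check run from the host with sqlcmd (again assuming sqlcmd is installed locally):

# Confirm the external data source was created and check its location
sqlcmd -S localhost,1433 -U sa -P 'S0methingS@Str0ng!' -C -d PolybaseDemo \
  -Q "SELECT name, location, type_desc FROM sys.external_data_sources;"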

First, we can access data in the s3 bucket; for a simple test, let’s start with CSV. During docker compose up, the build copied a CSV into the bucket it created. This query should output Hello World!

SELECT  * 
FROM OPENROWSET
(    BULK '/sqldatavirt/helloworld.csv'
,    FORMAT       = 'CSV'
,    DATA_SOURCE  = 's3_ds'
) 
WITH ( c1 varchar(50) )             
AS   [Test1]

OPENROWSET is cool for infrequent access, but if you want to layer on SQL Server security or use statistics on the data in the external data source, let’s create an external table. This first requires defining an external file format. In this example, it’s CSV

CREATE EXTERNAL FILE FORMAT CSVFileFormat
WITH
(    FORMAT_TYPE = DELIMITEDTEXT
,    FORMAT_OPTIONS  ( FIELD_TERMINATOR = ','
,                      STRING_DELIMITER = '"'
,                      FIRST_ROW = 1 )
);

Next, we define the table’s structure. The CSV here is mega simple, just a couple of rows with a single column. When defining the external table, we specify where the data lives on our network with DATA_SOURCE, the LOCATION within that DATA_SOURCE, and the FILE_FORMAT

CREATE EXTERNAL TABLE HelloWorld ( c1 varchar(50) )
WITH (
     DATA_SOURCE = s3_ds
,    LOCATION = '/sqldatavirt/helloworld.csv'
,    FILE_FORMAT = CSVFileFormat
);

Now we can access the data just like any other table in SQL Server.

SELECT * FROM [HelloWorld];
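
If you’d rather run that last check from the host instead of a client tool, it looks like this with sqlcmd, and you should see the Hello World! row come back:

# Query the external table over s3 and return the contents of the CSV
sqlcmd -S localhost,1433 -U sa -P 'S0methingS@Str0ng!' -C -d PolybaseDemo \
  -Q "SELECT * FROM [HelloWorld];"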

A note about running Polybase in containers with default settings

If you get the error below, you will want to increase the memory resource in your Docker configuration. The default is 2GB; I set it to 4GB and all worked.

2022-08-13 13:09:43.22 spid41s     There is insufficient system memory in resource pool 'internal' to run this query.
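
How you raise that limit depends on where you’re running Docker; in Docker Desktop it’s under Settings, Resources. To check how much memory your Docker engine currently has available, this works:

# Show the total memory available to the Docker engine (in bytes)
docker info --format '{{.MemTotal}}'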

Wrap things up

When you’re done, you can use docker compose down --volumes --rmi local to clean up all the resources: the images, the network, and the volumes holding the databases in SQL Server and the data in MinIO.