Using T-SQL Snapshot Backup - Are Snapshots Backups?

Introduction

Traditional SQL Server backups can struggle with large databases, resulting in longer backup times and resource contention. T-SQL Snapshot Backup, a new feature in SQL Server 2022, addresses these challenges by allowing storage-based snapshots to be coordinated through T-SQL. This feature delivers faster, more efficient backups, especially for large-scale environments with the most aggressive of recovery objectives.

Anatomy of a Full Backup

Before we start learning about T-SQL Snapshot backup, let’s establish what a backup in SQL Server is. When a backup is initiated in SQL Server, several things occur. Follow along in Figure 1 below; here, you see a database with database files on Disk A and a transaction log file on Disk B.

Let’s go through the process:

  1. A checkpoint runs, stabilizing the database on disk by writing any dirty pages and marking a starting recovery point in the transaction log.
  2. The database engine marks the start of the backup in the transaction log.
  3. A backup file(s) is created with a backup header that describes the file’s contents, the database’s file layout, and the database’s status at the time of the backup.
  4. Database pages in the database files (s) are read sequentially from the database file(s) and written into the backup file(s).
  5. Once the pages are copied into the backup file, the active portion of the transaction log is also copied into the backup.
  6. Mark in the transaction log that the backup is finished.
  7. Upon completion, MSDB records the backup in the backup history. This information includes details such as the backup type, time, start and stop LSN, and other critical backup information.

Full Backup
Figure 1: Anatomy of a Full Backup

Challenges as Databases Grow Larger

The full backup we just discussed is the foundation of a database protection solution for the DBA. The database engine controls what’s written into the backup file, containing all the database data and log at the time of the backup. However, reading and writing all the data into a file can pose challenges as databases grow. When I present on this topic, I often ask the audience how many people have a database larger than 10TB - about half of the room raises their hand. I then ask how many restore their database to meet their business’s recovery objectives; nearly all those hands go down. Sometimes, I’ll ask how many of you run 24x7 shops, and a full backup impacts your production workloads, usually about 10% of the hands in the room up. Let’s talk about some challenges around running full backups as databases grow.

  • Data Size — A full backup includes all the database data and the active part of the log. Databases are constantly growing, so the backup size also increases.
  • Time Consuming — As data grows, the time required for backup and restore operations also increases significantly.
  • Resource Pressure — During backup, there’s pressure on the disk subsystem, network interconnects, and system processors, especially if backup compression is used.
  • Workload Impact — Backup operations may impact the user workload as both compete for the same resources.
  • Costs — Designing a backup system for large databases requires dedicated infrastructure, and replicating backup data between primary and disaster recovery sites is challenging and costly due to the need for dedicated networking.
  • Recovery Time Objective (RTO) — The real challenge we’re trying to solve for is meeting your Recovery Time Objective (RTO), which is how quickly databases can be brought back online after a failure. Traditional full backups and restores take time for backup, restore, and data writing, and growing data size and resource competition further complicate the restore process, making it harder to meet recovery objectives defined by the business.

Let’s Talk About Snapshots

We have been talking about the difficulties of full backups. There is a better way: storage-based snapshots. A storage-based snapshot is a complete, point-in-time, read-only version of the data on the disk(s) when the snapshot is taken. With a storage-based snapshot, you can restore the disk(s) to a previous point in time instantly, as well as copy or clone the disk(s) to grant access to the data to other systems or replicate the snapshot to independent storage or even between data centers for disaster recovery.

  • Data Size — Snapshots on modern storage platforms only capture the changes made since the last snapshot, reducing the storage required for each snapshot.
  • Time Consuming — Snapshots are executed almost instantaneously, as data doesn’t need to be copied as in a traditional full backup.
  • Resource Pressure — Snapshots don’t put additional pressure on server, storage or network resources since they don’t involve data copying.
  • Workload Impact — depending on the type of snapshot executed, there is no impact on the workload. We’ll dive deeper into this in an upcoming session.
  • Costs — Snapshots only store changes made after the snapshot, reducing the amount of storage space needed compared to a full backup.
  • Recovery Time Objective (RTO) — With snapshots, you can instantly revert to a previous snapshot to meet your recovery objectives.

But I’ve Used Snapshots Before

I know, I know…you’ve heard the snapshot song before, tried it, and were burned. Things have changed. I know these pains. I’ve been there. Let’s talk about some of the past’s less than “greatest” snapshot hits.

  • Application Consistent Snapshots — Nobody loves VSS (Volume Shadow Copy Service), nobody. It often required third-party tools, which had interesting support models. Most importantly, it was slow and could cause long and unpredictable IO stun times.

  • Crash Consistent Snapshots — I’m a fan of crash-consistent snapshots. This type of snapshot is useful as it can provide aggressive recovery points, sometimes down to the minute, and does not require an application IO freeze. However, they fall short in one area—point-in-time recovery. You can’t perform a log restore or use them to join an availability group.

  • Database snapshots - such as SQL Server’s mechanism, have limitations. They are restricted to the instance where the snapshot is taken, making them non-portable between instances. They also can’t be used to replicate to another storage media. Additionally, their copy-on-write implementation can lead to performance challenges.

  • Vendor Specific Implementations — Here, I’m talking about things like Azure VM or VMware whole VM snapshots. While they are beneficial for VM recovery, they are not suitable if you need to restore only one database or a subset of databases.

Introducing T-SQL Snapshot Backup

Microsoft SQL Server engineering recognized a need for a better recovery mechanism for SQL Server both on-premises and in Azure. This led to the development of T-SQL Snapshot Backups in SQL Server 2022. This feature enables the taking of SQL Server-aware, application-consistent snapshot-based backups, reducing downtime and helping to meet recovery objectives.

  • Quiesce the Database(s) with No External Tools — This feature allows for the quiescing of the database, a group of databases or the whole instance, directly within SQL Server using T-SQL, eliminating the need for external tools and ensuring consistency and recoverability.

  • SQL Server Aware and in Complete Control — SQL Server controls the process. It knows what is in each snapshot and records this in its backup history.

  • Unlocks Point-in-Time Recovery — T-SQL Based Snapshot backups can be used as a base for log restores. You revert your storage to the your desired recovery point with the snapshot, then place the database into RESTORING mode, and restore log backups up until to your exact recovery point. The key here is you don’t have to read the data from a full backup file and write it back into primary storage. You’re instantly reverting the storage to a previous recovery point, putting the database in RESTORING mode, and restoring your log backups. This dramatically reduces the restore time and resources needed to do a full restore a database, group of databases, or whole instance to seconds.

  • Seeding Availability Groups and Log Shipping - Snapshots can be used to quickly build an availability group, reseed a replica, or initialize a log shipping replica.

  • Enables Cross–Platform Scenarios–Windows and Linux - Since SQL Server controls the snapshot process, it is compatible with both SQL Server Windows and Linux.

  • It’s FAST!!! (Especially When Compared with VSS) - In my testing, the entire snapshot process completes in hundreds of milliseconds, and it does this consistently.

Anatomy of a T-SQLSnapshot Backup

Let’s give a high-level overview of the T-SQL Snapshot Backup process. Follow along in Figure 2 below. In the next section, we’ll get into some code showing what this looks like.

  1. A checkpoint runs, stabilizing the database on disk by writing any dirty pages and marking a starting recovery point in the transaction log.
  2. The database engine marks the start of the backup in the transaction log.
  3. The database and log files are frozen for write IO, while reads can still happen. For more details, refer to this post: Understanding IO Freeze T-SQL Snapshot Backups for more details.
  4. Perform the snapshot at the storage layer.
  5. A backup metadata file(s) is created with a backup header This has all of the same information as a traditional backup header, it just describes what is in the snapshot, rather than a full backup file. More on this later.
  6. Upon successfully completing the snapshot and writing of the metadata file, the database and log thaw for write IOs.
  7. Mark in the transaction log that the backup is finished.
  8. Upon completion, MSDB records the backup in the backup history. Just like a traditional full backup, there are new fields in the table to record that this is a snapshot backup.
  9. You can now take log backups as you normally would, the snapshot backup is the base for the full restore.

Snapshot Backup
Figure 2: Anatomy of a T-SQL Based Snapshot Backup

Let’s take a moment to get a little meta. Consider what you’re aiming for with a full backup. Your goal is to restore the database(s) to its state before a failure. Using traditional backups, we take a full backup, which reads all the data and writes it to an intermediate file and slaps a header on the front of that. Then, when needed, we read all the data and write it out to return to the previous state. With snapshots, you revert the database(s) back to that previous state…instantly. That’s the goal.

T-SQL Snapshot Backup Workflow

We discussed the process in the previous section; now, let’s examine how this appears in code.

  1. Suspend - new syntax exposed in SQL Server 2022 allows you to suspend a database, a group of databases, or the whole instance. Here, we’re suspending a single database. When this executes, backup locks are taken and the database(s) is frozen for write IO.

    ALTER DATABASE DB1 SET SUSPEND_FOR_SNAPSHOT_BACKUP = ON
    
  2. Snapshot - Once the database is suspended for write IO, you execute a snapshot with your underlying storage subsystem. I’m waving my hands here without code, as the implementation details are specific to the storage subsystem. Upcoming blog posts in this series will explore Azure and Pure Storage snapshot examples.

    Take the storage snapshot – Azure Disk, Storage Array, or Hypervisor
    
  3. Backup - After successfully creating a storage snapshot, the database is still frozen for write IO. You now need to execute a BACKUP command for the database, group of databases, or instance. The example below shows a database backup named DB1, with the syntax BACKUP DATABASE DB1 to initiate the backup process. You add the TO DISK=DB1.bkm parameter to define the location where the metadata file will be written. This can be a local disk or network share. The WITH METADATA_ONLY tells SQL Server to write out just the header information and not to copy the database data into the backup file since the database data is in the snapshot. Once this is completed, the database is thawed for writing IOs. In my testing, this process takes several hundred milliseconds.

    Although not mandatory, it’s a good idea to use the MEDIADESCRIPTION parameter to store the location and name of the snapshot backup in the backup header. This will help you easily find the correct snapshot for this backup header file.

    BACKUP DATABASE DB1
        TO DISK=DB1.bkm
        WITH METADATA_ONLY,
        MEDIADESCRIPTION=‘SNAPSHOT_NAME|SNAPSHOT_LOCATION’
    

The T-SQL Snapshot Backup Metadata File

The snapshot backup metadata file in SQL Server 2022 is crucial, as it details what’s in the backup, including the backup type, date, and database name(s). You need this file to perform a point-in-time restore. It’s essential to protect it, just like your actual backup files. However, if you lose this file, all is not lost. Without the metadata file, you can still bring databases online, but you’ll lose the ability to put the database into restoring mode. So you will lose point-in-time recovery, AG seeding, or log shipping replica initialization, but you can still get online the database from the point in time of the snapshot.

Anatomy of a T-SQLSnapshot Restore

Let’s give a high-level overview of the T-SQL Snapshot Backup Restore process. Follow along in Figure 3 below. In the next section, we’ll get into some code showing what this looks like.

  1. Since we’re using storage based snapshots for our restore the first thing that you need to do is take the database offline and also offline/unmount the volumes where the database files are stored.
  2. Then, create a copy or clone of the storage snapshot. If you’re recovering to the same instance, you can overwrite the existing disks with this copy.
  3. After cloning, bring the volumes back online or mount them. This will bring the database files back to their previous state on the volumes.
  4. Next, run a RESTORE command with the new METADATA_ONLY option, pointing to the metadata-only backup file generated when taking the T-SQL Based Snapshot Backup. This informs the database engine that the files are the base of the restore operation. To put the databases in RESTORING mode, specify the NO_RECOVERY option.
  5. With the database in RESTORING mode, let’s restore some transaction log backups; for example, in Figure 1 we are restoring to time t1.
  6. Once the desired recovery point is reached, let’s bring the database online using the RESTORE DATABASE DB1 command, similar to any other restore operation.

Snapshot Restore
Figure 3: Anatomy of a T-SQL Based Snapshot Restore

The restore example above demonstrates restoring over an existing database. Instead of overwriting the current volumes with the snapshot, you have the option to clone to a new set of volumes and then perform the restore operation. This allows you to keep the source database online. You can then complete the restore operation restoring the database on the new volumes with a new database name. This approach enables you to recover data from the restored database back into the source database without taking the source database offline. And you did this…instantly and didn’t have to perform a full restore, which would require additional time, space on disk, compute and network resources.

T-SQL Snapshot Backup Restore Workflow

We discussed the backup process in the previous section; now, let’s examine how the restore process looks in code.

  1. Prepare the database - just like a traditional full, in place restore, the database will need to be offline to perform the T-SQL Backup Snapshot restore process.

    ALTER DATABASE $DbName SET OFFLINE WITH ROLLBACK IMMEDIATE
    
  2. Prepare the Storage - First, take the database offline and then unmount the volumes supporting the database. After that, copy, clone, or revert to a snapshot to return the data to the previous state when the T-SQL snapshot backup was taken. You then need to online or mount the volumes. I’m going to wave my hands again here because the implementation details depend on your storage environment

    Take the storage snapshot – Azure Disk, Storage Array, or Hypervisor
    
  3. Restore the database - After reverting to the snapshot and ensuring that the disks are back online or mounted, the database files are accessible to the operating system and SQL Server. Next, you should initiate a RESTORE using the new METADATA_ONLY option. This command directs the database engine to start the restore process using the available database files on disk as the base for the restoreoperation. You can choose to keep the database in RESTORING mode by using the NORECOVERY option, or bring the database online.

    RESTORE DATABASE $DbName FROM DISK = '$BackupFile' WITH METADATA_ONLY, REPLACE, NORECOVERY
    

Are Snapshots Backups?

DBAs have “one” job:

  • Take backups (and also test restores)
  • Get them off the primary storage
  • Replicate them to another site as quickly as possible

OK, that’s more than one job, but you get the point. It’s our duty to protect our organization’s data. Now, let’s address the elephant in the room: Is this backup?

Modern storage platforms function at 99.9999 availability. Local snapshots to a single storage device are pretty redundant. There’s a reason you run your data on that storage platform: because you expect it always to be there. BUT, any backup, snapshot or traditional, isn’t a backup until you get it onto a second storage subsystem. Further, you must replicate them from your primary site into a disaster recovery site. Modern storage platforms account for this with snapshot replication. Most also use storage-efficient replication, only replicating the changes so you can more efficiently move data between your primary and disaster recovery sites. Significantly reducing replication times when compared to replication of native backups.

So, if I can copy my T-SQL Snapshot Backups to a separate storage environment within my data center and then copy them to a third location in a disaster recovery site, and if I can perform a point-in-time restore on any of these locations, is this a backup?

I want to emphasize that using this feature, our focus here is on quickly restoring the databases you’re having the most challenging time backing up and restoring. Backup systems have evolved to provide additional features such as archiving, regulatory compliance, and information lifecycle management. I still believe it’s important to continue taking SQL Server native backups, as you may need to restore backups from previous periods to facilitate other restore patterns, such as data anomalies caused by bugs in application code. So, will snapshots replace your backup infrastructure? That depends on your situation and the recovery patterns you need to engineer in your environment.

Wrapping Things Up

Look, I know this is a contentious topic. But hopefully, you’ll consider this a solution for your environment’s biggest databases, where you’re struggling with backup times, resource constraints, or meeting recovery objectives. Performing a full restore instantly is a pretty complelling reason to explore this feature more.

In this post, we focused on the snapshot backup process at a high level. In upcoming posts, we will explore what a restore looks like and perform several different restore scenarios.


Using T-SQL Snapshot Backup Blog Series

This article is part of a blog series covering the Using T-SQL Snapshot Backup. Check out the other posts in this series:

You can watch Bob Ward and me present a deep dive on this feature at PASS Summit 2023 in the session “Are Snapshots Backups?”.