Seeding an Availability Group Replica from Snapshot
Background
If you’ve been using Availability Groups, you’re familiar with the replica seeding (sometimes called initializing, preparing or data sychronization) process. Seeding is a size of data operation, copying data from a primary replica to one or more secondary replicas. This is required before joining a database to an Availability Group. You can seed a replica with backup and restore or automatic seeding, each with its own challenges. Regardless of which method you use, the seeding operation can take a long amount of time. The time it takes to seed a replica is based on the size of the database, the speed of the network, and storage. If you have multiple replicas seeding all of them is N times the fun!
But what if I told you that you could seed your Availability Group from a storage-based snapshot and that the re-seeding process can be nearly instantaneous?
In addition to saving you time, this process saves your database systems from the CPU, network, and disk consumption that comes with direct seeding and using backups and restores to seed.
This process described in this post is imlemented on Pure Storage’s FlashArray and works cloud scenarios on Pure’s Cloud Block Store.
Pretty awesome, huh…let’s dig in.
Introducing TSQL Snapshot Backup
In SQL Server 2022, there is a new feature called TSQL snapshot backup. This new feature allowing you to take application-consistent backups using storage-based snapshots. I know, lots of y’all just rolled your eyes because snapshots aren’t backup. That’s true, but if I replicate the snapshot somewhere else and can perform an actual database restore from that snapshot, then we’re talking about backups. This means you can restore a database and leave it in restoring mode for a point-in-time restore with differential and log backups and seeding an Availability Group replicas.
So let’s do it…we’re going to snapshot a database, replicate it to another storage array and clone that snapshot to a second instance of SQL Server, and seed an Availability Group replica from that.
Prerequisites
You’ll need to have some prerequisites for this to work correctly. Let’s talk about those first.
- Volumes grouped in a Protection Group - Most databases are spread across multiple volumes. For snapshot backup to work correctly, the volumes must be snapshot at the exact time. A Protection Group guarantees that a snapshot happens simultaneously across all volumes in the Protection Group.
- Snapshop replication between arrays - You can perform this process on a single array, but I want the data on two separate arrays if we’re talking about availability. So we want to replicate the snapshot between two storage arrays, and that replicated snapshot will be used to seed the replica. Cool sidebar here; if you need to scale out read replicas… clone the snapshot to several AG replicas on the same array, and you’ll benefit from data reduction for your data.
- Pausing or replay log backups - If you’re seeding an AG and a log backup happens on the Primary, that log backup needs to be restored on the Secondary for the AG to synchronize correctly. During this process, you will want to either pause or build a process to continuously restore those logs on the secondary. But hey, I said this was going to be fast…and it is…so this is less of an issue than it used to be when you had to backup and restore the whole database.
- You have an AG already - For this post, I’m not going to cover creating an Availability Group. So I assume that you already have the AG up and running, with both instances configured as replicas, and the database is online on the primary and not in the AG, not on the secondary.
Seeding an Availability Group Replica from Snapshot
Let’s dig into how to seed an Availability Group Replica from Snapshot.
First, we’ll want to set up our environment. I’m using dbatools and the Pure Storage PowerShell SDK to coordinate this process. Here I’m initializing some variables for reuse in the script.
Import-Module dbatools
Import-Module PureStoragePowerShellSDK
$Primary = 'aen-sql-22-a'
$Secondary = 'aen-sql-22-b'
$PrimarySqlInstance = Connect-DbaInstance -SqlInstance $Primary
$SecondarySqlInstance = Connect-DbaInstance -SqlInstance $Secondary -TrustServerCertificate -NonPooledConnection
$SecondaryPsSession = New-PSSession -ComputerName $Secondary -TrustServerCertificate -NonPooledConnection
$Credential = Get-Credential -UserName "$env:USERNAME" -Message 'Enter your credential information...'
Take the Snapshot Backup on the Primary’s FlashArray
Let’s walk through the code block below.
On the primary array…
- Freeze IO on the database using
ALTER DATABASE TestDB1 SET SUSPEND_FOR_SNAPSHOT_BACKUP = ON
- Take a snapshot of the Protection Group that replicates to the Secondary’s array
- Execute the
BACKUP DATABASE TestDB1 TO DISK='\\FILESERVER\BACKUP\TestDB1-Replica.bkm' WITH METADATA_ONLY
command
The BACKUP
command generates a metadata file that describes what’s in the backup. We’ll need this later to restore the database on the Secondary replica. We’ll talk about how long the database is frozen later in the post. Spoiler alert…the amount of time is measured in milliseconds.
#Connect to the Primary replicas FlashArray
$PrimaryFlashArray = New-PfaArray –EndPoint primaryarray.yourdatacenter.local -Credentials $Credential -IgnoreCertificateError
#Freeze the database
Invoke-DbaQuery -SqlInstance $PrimarySqlInstance -Database master -Query 'ALTER DATABASE TestDB1 SET SUSPEND_FOR_SNAPSHOT_BACKUP = ON'
#Take a snapshot of the Protection Group
$PrimarySnapshot = New-PfaProtectionGroupSnapshot -Array $PrimaryFlashArray -Protectiongroupname 'aen-sql-22-pg' -ApplyRetention -ReplicateNow
#Take a metadata backup of the database, this will automatically unfreeze if successful
$Query = "BACKUP DATABASE TestDB1 TO DISK='\\FILESERVER\BACKUP\TestDB1-Replica.bkm' WITH METADATA_ONLY"
Invoke-DbaQuery -SqlInstance $PrimarySqlInstance -Database master -Query $Query
Let’s talk about snapshot replication for a second.
The first time FlashArray replicates a snapshot between the arrays, it will move the data reduced data. On SQL Server, FlashArray generally sees a 3.58:1 data reduction. This reduces the time for that initial seeding of the Secondary replica on the Secondary’s array since less data has to replicate. This technique is immensly helpful in scenarios where you have to seed a replica in a DR site or cloud over a WAN or VPN link.
Now, if this was a re-seed of a replica…when we take a snapshot of the Primary replica’s array and replicate it to the Secondary’s array…only data that has changed on the Primary’s array and not yet on the Secondary’s array will be copied over the wire. Dramatically reducing the amount of data that needs to be replicated and the time it takes to re-seed that Secondary replica. If this is a multi-terabyte database or set of databases, the time savings here is enormous. If you’re using Cloud Block Store in the cloud, and you have replicas on-premises, or replicas across Availability Zones or Regions, this will save you time on the re-seeding process and also save you on network tranfer costs.
Connect to the Secondary’s FlashArray
So now that we have a snapshot that’s replicated to the Secondary’s array…let’s connect to that array and retrieve that replicated snapshot. And that’s what’s happening in the code below.
- Connect to the Secondary’s array
- Get the snapshot. And make sure the snapshot’s name matches the one we just took above.
$SecondaryFlashArray = New-PfaArray –EndPoint secondaryarray.yourdatacenter.local -Credentials $Credential -IgnoreCertificateError
#Get the replicated snapshot on the Secondary's array...it has to be THAT same exact snapshot
$TargetSnapshot = Get-PfaProtectionGroupSnapshots -Array $SecondaryFlashArray -Name 'primaryarray:aen-sql-22-pg' | Where-Object { ($_.Name).EndsWith($PrimarySnapshot.name) }
Offline the volumes on the Secondary
Now, on the Secondary replica, we need to refresh the volumes on that replica with clones of the volumes in the snapshot.
- So you offline the volumes supporting the database
- clone those volumes with
New-PfaVolume
- Online the volumes
You’ll want to make sure your volume names and drive letters/mount points match the Primary’s layout…which, if you’re using Availability Group, you probably already are. I’m using VMware VMs here with vVols attached. This technique works for RDM and physical servers.
Invoke-Command -Session $SecondaryPsSession -ScriptBlock { Get-Disk | Where-Object { $_.SerialNumber -eq '6000c29f238200118f86219b90d11fe6' } | Set-Disk -IsOffline $True }
Invoke-Command -Session $SecondaryPsSession -ScriptBlock { Get-Disk | Where-Object { $_.SerialNumber -eq '6000c296fdfa73527a65de55eb6db3de' } | Set-Disk -IsOffline $True }
#Overwrite the volumes on the Secondary from the protection group snapshot
New-PfaVolume -Array $SecondaryFlashArray -VolumeName 'vvol-aen-sql-22-b-1-1b3811f2-vg/Data-9a615973' -Source ($TargetSnapshot.name + ".vvol-aen-sql-22-a-1-74fc3caf-vg/Data-618800dc") -Overwrite
New-PfaVolume -Array $SecondaryFlashArray -VolumeName 'vvol-aen-sql-22-b-1-1b3811f2-vg/Data-8ef07fb5' -Source ($TargetSnapshot.name + ".vvol-aen-sql-22-a-1-74fc3caf-vg/Data-e9bb1e31") -Overwrite
#Online the volumes on the Secondary
Invoke-Command -Session $SecondaryPsSession -ScriptBlock { Get-Disk | Where-Object { $_.SerialNumber -eq '6000c29f238200118f86219b90d11fe6' } | Set-Disk -IsOffline $False }
Invoke-Command -Session $SecondaryPsSession -ScriptBlock { Get-Disk | Where-Object { $_.SerialNumber -eq '6000c296fdfa73527a65de55eb6db3de' } | Set-Disk -IsOffline $False }
Restore the database from snapshot backup with NORECOVERY
on the secondary
With the data on the volumes attached to the Secondary replica, you can restore the snapshot backup on the Secondary. The critical thing here is the NORECOVERY
option since we’re seeding an Availability Group…the database state needs to be RESTORING
.
$Query = "RESTORE DATABASE TestDB1 FROM DISK = '\\FILESERVER\BACKUP\TestDB1-Replica.bkm' WITH METADATA_ONLY, REPLACE, NORECOVERY"
Invoke-DbaQuery -SqlInstance $SecondarySqlInstance -Database master -Query $Query
Finalize the Seeding of the Replica and Join the AG
From here on out, since the database is a RESTORING
state on the Secondary replica, we’re looking at standard Availability Group manual seeding.
- Take a log backup
- Restore it on the Secondary
- Add the replica to the Availability Group
- Add the database to the Availability Group
- Start data movement
$Query = "BACKUP LOG TestDB1 TO DISK = '\\FILESERVER\BACKUP\TestDB1-seed.trn' WITH FORMAT"
Invoke-DbaQuery -SqlInstance $PrimarySqlInstance -Database master -Query $Query
#Restore it on the Secondary
$Query = "RESTORE LOG TestDB1 FROM DISK = '\\FILESERVER\SHARE\BACKUP\TestDB1-seed.trn' WITH NORECOVERY"
Invoke-DbaQuery -SqlInstance $SecondarySqlInstance -Database master -Query $Query
### Set the seeding mode on the Seconary to manual
Invoke-DbaQuery -SqlInstance $PrimarySqlInstance -Database master -Query 'ALTER AVAILABILITY GROUP [ag1] MODIFY REPLICA ON N''aen-sql-22-b'' WITH (SEEDING_MODE = MANUAL)'
#Add the database to the AG
Invoke-DbaQuery -SqlInstance $PrimarySqlInstance -Database master -Query 'ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [TestDB1];'
#Enable replication
Invoke-DbaQuery -SqlInstance $SecondarySqlInstance -Database master -Query 'ALTER DATABASE [TestDB1] SET HADR AVAILABILITY GROUP = [ag1];'
How long is the freeze?
In my lab, the freeze and snapshot process takes around 150 milliseconds. But look at the code; that time also includes the network time of sending the freeze command over the network since I’m doing it remotely.
$Start = (Get-Date)
#Freeze the database
Invoke-DbaQuery -SqlInstance $PrimarySqlInstance -Database master -Query 'ALTER DATABASE TestDB1 SET SUSPEND_FOR_SNAPSHOT_BACKUP = ON'
#Take a snapshot of the Protection Group
$PrimarySnapshot = New-PfaProtectionGroupSnapshot -Array $PrimaryFlashArray -Protectiongroupname 'aen-sql-22-pg' -ApplyRetention -ReplicateNow
#Take a metadata backup of the database, this will automatically unfreeze if successful
$Query = "BACKUP DATABASE TestDB1 TO DISK='\\FILESERVER\BACKUP\TestDB1-Replica.bkm' WITH METADATA_ONLY"
Invoke-DbaQuery -SqlInstance $PrimarySqlInstance -Database master -Query $Query
$Stop = (Get-Date)
Write-Output "The snapshot time takes...$(($Stop - $Start).Milliseconds)ms!"
The snapshot time takes…157ms!
What’s This All Mean?
Well, first, this helps increase the availability of your database systems. IF you had a replica failure and it’s offline, your system is vulnerable if another replica fails. Leveraging this technique, you can quickly bring your systems back to their full protection. Further, as a DBA, you won’t have to sit around and monitor the re-seeding process so that you focus on different tasks in your organization.
Here’s a GitHub Gist of this code for reference.