SQL Server Disaster Recovery: Key Considerations

SQL Server instances, even if equipped with provisions like automatic backup and recovery, are not sufficiently protected against sudden, disastrous situations like multi-site database failure, ransomware attacks, etc.

In the absence of a diligently tested and documented strategy, SQL Server disaster recovery might take longer than the SLAs or might not happen at all, bringing alive an SQL DBA’s worst nightmare – a prolonged outage.

This article provides an insight into the key considerations while preparing an effective SQL Server disaster recovery plan. These considerations are based on defining the success metrics for SQL recovery, choosing a backup strategy, leveraging SQL availability features, etc.

Define the Success Metrics

SQL Server disaster recovery has two key metrics: recovery point objective (RPO) and recovery time objective (RTO). These metrics are essential components of the service level agreement (SLA).

Recovery Point Objective (RPO)

RPO is the latest “point in time” to restore an SQL Server after disaster recovery. RPO depends upon the backup frequency – more frequent backups allow restoration to a nearer point in time, reducing the extent of data loss.

Recovery Time Objective (RTO)

RTO is the “maximum time allowed” to restore the database to a normal working state after a disaster. RPO depends on the type and size of backup – full backups take longer to restore than differential backups.

Choose the Right Backup Strategy

SQL Server allows multiple backup options that can be combined to speed up the SQL disaster recovery process per the RTO while restoring maximum data per the RPO. The most common types of backup options are:

Full backup – Creates a copy of all database objects and transaction logs. Full backups take the maximum time and storage space but can restore all items in database failure scenarios.

Differential backup – Creates a copy of all the changes in the database since the previous full backup. Differential backup takes less space than a full backup and allows faster restoration.

Transaction log backup – Used to back up all the transaction log records created since the last full backup. It allows point-in-time recovery of SQL Server databases until the previous transaction log before the failure event.

Tail-log backup – This option creates a backup of log records that are not yet backed up by the last transaction log backup. It allows restoring the database to the nearest possible “point in time” after failure using the latest transaction logs.

Leverage SQL Server Availability Features

SQL Server 2017 introduced several availability features to support disaster recovery and extended SQL Server installations to Linux. These features can help you recover SQL databases in vast scenarios, and therefore, you need to consider them in your disaster recovery plan for SQL Server:

Always on availability groups

This feature enables database-level protection by replicating each database transaction on another SQL instance, configured in standalone mode or as a failover cluster. The primary database works as the read/write copy in the availability group, and the replicas receive transactions via this primary database.

In a disastrous event, always-on availability with multi-site implementation allows site resilience and automated disaster recovery by switching over to the secondary data centers on another location.

Failover cluster instance (FCI)

FCI leverages the underlying clustering feature in Windows Server to extend protection to the complete SQL server installation. In a failure event, FCI with always-on availability will move the entire instance to another server.

FCI mechanism extended to multiple locations provides a powerful solution for disaster recovery. However, in that case, the same storage disks need to be shared across all the locations through replication.

Log shipping

Log shipping is an older SQL Server disaster recovery method, suitable where RPO and RTO are flexible and databases are not business-critical. Nonetheless, log shipping is a worthwhile option considering its simplicity. The log shipping process automatically backs up the transaction logs of the primary database and then copies and applies them on another instance called “warm standby.”

Notably, in log shipping, the switching over process is always manual, initiated using T-SQL. Also, log shipping needs to be configured for every database.

End Note

Disastrous situations generally occur without forewarning. The outcome could be a “total outage” of SQL Server for a prolonged duration. You can avert this eventuality by following the SQL Server disaster recovery checklist:

Always have a comprehensive backup strategy, even with always-on availability and FCI.
Store the database backup copies on fault-tolerant media like SAN, tape, RAID, etc.
Maintain multiple backup copies on physical and virtual storage at different locations.
Test the consistency and readiness of backups.
Perform logical and physical consistency checks on all database objects. Learn more about these checks here.
Leverage professional SQL database recovery software to meet contingencies like database corruption, backup availability or restoration issues, etc. Having reliable software like Stellar Toolkit for MS SQL can help DBAs tackle database and backup-related issues in vast scenarios.