SQL Server 2008: Backup Compression - Part 1
First up is Backup Compression. This has been one of the most heavily requested features since before I joined the SQL team back in 1999 – for me it's really cool that it's in this coming release.
It's widely recognized that database sizes are growing significantly, and that the growth is driven by many factors, such as companies needing to keep more data online for longer (e.g. to comply with government regulations) or an increasing amount of data being digitized for storage. Although the per-GB cost of storage is falling, the actual cost of keeping a database available is increasing. Here's why:
- Prior to SQL Server 2008, a backup of a VLDB (or any sized database really) takes, on average, the same amount of disk space as the data in the database itself (remember that database files can be sized much larger than the amount of data they hold). A prudent backup strategy is to have multiple backups online, and multiple redundant copies of each backup. Thus, the amount of disk space required increases rather quickly – more than offsetting the drop in storage costs.
- Keeping safe copies of all database backups means they have to be physically copied to a separate location. Increasing database sizes translates into more bandwidth required to copy larger database backups (or lower bandwidth links get saturated for longer and longer).
- If you have a well designed storage strategy, then you’re probably using RAID – so the more backups you need to store, and the higher level of RAID you use, the more drives you need to store all these backup copies.
In addition to costs, the elapsed times of backup and restore operations need to be considered. Backup and restore operations are essentially IO-bound. For a backup in SQL Server 2005 and before, the whole database has to be read and then written to a new location – with the total IO size of the writes to the backup equalling that of the reads from the database. The same is true for a restore, with reads from the backup equalling writes to the restored database. (As an aside, a restore operation also has the added CPU cost of having to examine each page as it’s read from the backup to figure out where it should be written to in the restored database – this usually makes a restore operation take 10-20% more elapsed time than a backup.)
The perfect solution to these problems, at the expense of sometimes-considerable CPU time, is to compress the database as it’s written into the backup. This reduces storage space per backup, required network bandwidth per backup copy, and elapsed time for backup and restore operations. The last point – reducing elapsed time for restore operations – is especially important in disaster recovery situations, where anything that can help reduce database downtime is a good thing.
A less ideal solution I've seen is to take a regular backup and then compress it after-the-fact before copying it to other locations. While this is a reasonable workaround, it requires more disk space than compressing the database as it's backed up, and it's a more complicated procedure. It also increases the total time to produce the final backup, as the compression is done in a separate step.
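To make the extra step concrete, here's a sketch of that two-step approach (the database name and paths are just examples, and makecab is only one of many compression tools you could use for the second step):

```sql
-- Step 1: take a regular (uncompressed) backup
BACKUP DATABASE [AdventureWorks]            -- example database name
TO DISK = N'D:\Backups\AdventureWorks.bak'  -- example path
WITH INIT;

-- Step 2: compress the backup file in a separate pass, here using the
-- makecab utility that ships with Windows (xp_cmdshell must be enabled,
-- or just run the command from a command prompt instead)
EXEC xp_cmdshell
    'makecab D:\Backups\AdventureWorks.bak D:\Backups\AdventureWorks.bak.cab';
```

Note that both the uncompressed and compressed files exist on disk while step 2 runs – which is exactly where the extra disk space requirement comes from.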
An alternative to compressing the backup itself is to make the backup location a compressed NTFS directory. While this achieves the space-saving goal on that one volume, the backup isn't permanently compressed – so it doesn't reduce the network bandwidth required to copy the backup or the space needed to archive the backup to tape.
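For reference, enabling NTFS compression on a backup folder is a one-liner (again, the path is an example, and you could equally run this from a command prompt rather than through xp_cmdshell):

```sql
-- Mark the backup folder as NTFS-compressed: /c compresses,
-- /s: applies the setting to everything under the directory
EXEC xp_cmdshell 'compact /c /s:D:\Backups';
```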
Up until SQL Server 2008, the only solutions for compression-during-backup have come from third-party software vendors. Although these solutions do the job of alleviating the problems I've described above, there are two major roadblocks to their adoption that I've heard about from SQL Server customers:
- You need to buy another software license on top of SQL Server – this can be pretty expensive across a large number of SQL Server installations. Management is also a headache – you have to ensure that all sites that may need to decompress the backup have the correct software installed.
- Some IT shops are Microsoft-only, which precludes the use of any software not supplied by Microsoft.
In SQL Server 2008, Microsoft will include a long-awaited and much-asked-for backup compression solution – eliminating the two roadblocks above. This is a fantastic first step in improving backup/restore functionality – hopefully in the release after SQL Server 2008 we'll see further innovations such as encrypted backups, table-level restore, and easier validation of the database stored in a backup.
Some points to note:
- Adhering to the principle of least surprise, backup compression will be off by default, with very simple syntax to turn it on – directly with T-SQL or through the tools (see the sketch after this list).
- The compression algorithm used is proprietary to Microsoft and has yielded similar compression ratios to well-known third-party products.
- During Microsoft's in-house testing on real-world customer databases, average compression ratios of 5:1 have been observed.
- A backup set will not be able to contain both compressed and uncompressed backups.
- None of the existing functionality will be altered by compression – e.g. the operation of WITH CHECKSUM or WITH CONTINUE_AFTER_ERROR (see here for more info on those options).
- A restore operation will be able to tell automatically whether a given backup is compressed or not and just do the right thing.
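Although the final syntax isn't public yet, it's a safe bet it will follow the existing WITH option pattern. Something along these lines (the database name and path are mine, and WITH COMPRESSION is my guess at the option name):

```sql
-- Hypothetical syntax sketch: compression as a WITH option, combinable
-- with existing options such as CHECKSUM
BACKUP DATABASE [AdventureWorks]
TO DISK = N'D:\Backups\AdventureWorks.bak'
WITH COMPRESSION, CHECKSUM, INIT;

-- Restore shouldn't need to be told the backup is compressed - the
-- engine will detect it and just do the right thing
RESTORE DATABASE [AdventureWorks]
FROM DISK = N'D:\Backups\AdventureWorks.bak'
WITH RECOVERY;
```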
Once backup compression is available in a public CTP, I’ll blog some sample scripts and report on things like:
- the compression ratios, elapsed time differences, and CPU usages for a few sample databases
- the various configuration options available
- any differences in the MSDB backup history tables or the output from RESTORE HEADERONLY/LABELONLY (today's output is shown below for comparison)
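For reference, here's how you'd inspect those today in SQL Server 2005 – I'll be comparing this against what the CTP produces (the path is an example):

```sql
-- Inspect the backup header - I'll be checking whether new columns
-- (e.g. a compressed size) show up here in the CTP
RESTORE HEADERONLY
FROM DISK = N'D:\Backups\AdventureWorks.bak';

-- The msdb backup history - backup_size is here today; I'd expect
-- something indicating the compressed size to appear alongside it
SELECT bs.database_name, bs.backup_start_date, bs.backup_size
FROM msdb.dbo.backupset AS bs
ORDER BY bs.backup_start_date DESC;
```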
In summary, backup compression is a very exciting feature that should enable many customers to save money – whether on additional software licenses or on storage and network costs – and to save time, especially in all-important disaster recovery situations.