SQL Database snapshots – when things go wrong
Reverting to a snapshot
Many people use snapshots as a way of protecting against mistakes during a complicated set of changes to a database – you can just revert to the snapshot if something goes wrong. However, reverting to a snapshot has a little-known (but documented) problem. Let’s see. I’m going to create a database plus a snapshot of it and then start taking backups.
CREATE DATABASE SnapshotTest;
GO
ALTER DATABASE SnapshotTest SET RECOVERY FULL;
GO
BACKUP DATABASE SnapshotTest TO DISK = ‘C:\SQLskills\SnapshotTest.bak’ WITH INIT;
GO
CREATE TABLE SnapshotTest.dbo.MyTable (c1 INT);
GO
BACKUP LOG SnapshotTest TO DISK = ‘C:\SQLskills\SnapshotTest_log.bak’ WITH INIT;
GO
– Imagine a bunch of things happen here
– Create the snapshot database, by first checking which files exist
SELECT * FROM SnapshotTest.sys.database_files;
GO
CREATE DATABASE ST_Snap ON(NAME = SnapshotTest, FILENAME = ‘C:\SQLskills\SnapshotTest_snap.snp’)AS SNAPSHOT OF SnapshotTest;GO
Now I’m going to do some stuff that’s a mistake:
DROP TABLE SnapshotTest.dbo.MyTable;
GO
Not a problem as I can revert to my database snapshot:
RESTORE DATABASE SnapshotTest FROM DATABASE_SNAPSHOT = ‘ST_Snap’;GO
Oops! I forgot to take a log backup to capture everything that happened since the last log backup…
BACKUP LOG SnapshotTest TO DISK = ‘C:\SQLskills\SnapshotTest_log.bak’;
GO
Msg 4214, Level 16, State 1, Line 1
BACKUP LOG cannot be performed because there is no current database backup.
Msg 3013, Level 16, State 1, Line 1
BACKUP LOG is terminating abnormally.
Reverting to a snapshot rebuilds the transaction log and breaks the log backup chain. The only thing I can do is take a full or differential backup andthen start taking log backups again. So – I lost the ability to do point-in-time restores in the period from the last log backup to the time when I reverted to the snapshot.
So how can I tell when the database was reverted? Books Online documents that the restorehistory table in msdb should have an entry withrestore_type = ‘R’. Let’s try:
SELECT * FROM msdb.dbo.restorehistory WHERE [destination_database_name] = ‘SnapshotTest’;GO
Nope – nothing. That functionality seems to be broken. The only way I could find to trace reverting was in the error log:
2008-01-30 11:09:21.73 spid53 Reverting database ‘SnapshotTest’ to the point in time of database snapshot ‘ST_Snap’ with split point LSN 26000000013800001 (0x0000001a:0000008a:0001). This is an informational message only. No user action is required.
2008-01-30 11:09:21.74 spid53 Starting up database ‘SnapshotTest’.
2008-01-30 11:09:21.74 spid53 The database ‘SnapshotTest’ is marked RESTORING and is in a state that does not allow recovery to be run.
2008-01-30 11:09:21.85 spid53 Starting up database ‘SnapshotTest’.
2008-01-30 11:09:21.87 spid53 Starting up database ‘ST_Snap’.
Note that the snapshot database is still there and continues to work.
Running out of space in a snapshot
The next problem that can happen is if you create a database snapshot on a volume that doesn’t have much disk space. In that case you can actually run out of space and the snapshot will go suspect and be unusable until it’s dropped and recreated. Let’s see what happens:
– Create snapshot of large database on a volume with not much space
SELECT * FROM SalesDB.sys.database_files;
GO
CREATE DATABASE SalesDB_Snap ON
(NAME = SalesDBData, FILENAME = ‘D:\sqlskills\SalesDB_snap.snp’)AS SNAPSHOT OF SalesDB;GO
ALTER INDEX ALL ON SalesDB.dbo.Sales REBUILD;GO
Msg 5128, Level 17, State 2, Line 1
Write to sparse file ‘D:\sqlskills\SalesDB_snap.snp’ failed due to lack of disk space.
The query that causes the sparse file to run out of space doesn’t fail, but the user running the query will see this error. Note that it doesn’t say the snapshot has gone suspect. If you try to use it then it will tell you, plus there info written to the error log:
2008-01-30 11:50:29.14 spid53 Error: 17053, Severity: 16, State: 1.
2008-01-30 11:50:29.14 spid53 D:\sqlskills\SalesDB_snap.snp: Operating system error 112(There is not enough space on the disk.) encountered.
2008-01-30 11:50:29.15 spid53 Error: 3420, Severity: 21, State: 1.
2008-01-30 11:50:29.15 spid53 Database snapshot ‘SalesDB_Snap’ has failed an IO operation and is marked suspect. It must be dropped and recreated.
Running out of space while running DBCC CHECKDB
One thing that isn’t common knowledge is that DBCC CHECKDB uses database snapshots as it’s mechanism for running online in SQL Server 2005. You can’t control where the snapshot is created (it’s actually created in the same location as the files comprising the database being checked – see my previous post on the DBCC CHECKDB stages here for more info) but you can create your own snapshot and check that. Anyway – it’s possible to run out of space in the snapshot while DBCC CHECKDB is running. I setup a situation with a large database on a drive with only 1MB of space left, started a large index rebuild in the database and then tried to run DBCC CHECKDB. Let’s see what happened:
The index rebuild statement:
ALTER INDEX ALL ON SalesDB2.dbo.Sales REBUILD;GO
Msg 5128, Level 17, State 2, Line 1
Write to sparse file ‘d:\sqlskills\salesdbdata.mdf:MSSQL_DBCC20′ failed due to lack of disk space.
So some poor unsuspecting user gets this error and has no idea why. For DBAs its a bit obscure too. The syntax is describing an alternate streamon the existing data file and the snapshot in question can’t be accessed at all.
The DBCC CHECKDB statement:
DBCC CHECKDB (salesdb2) WITH NO_INFOMSGS;GO
Msg 1823, Level 16, State 1, Line 1
A database snapshot cannot be created because it failed to start.
Msg 1823, Level 16, State 2, Line 1
A database snapshot cannot be created because it failed to start.
Msg 7928, Level 16, State 1, Line 1
The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.
Msg 5128, Level 17, State 2, Line 1
Write to sparse file ‘d:\sqlskills\salesdbdata.mdf:MSSQL_DBCC20′ failed due to lack of disk space.
Msg 3313, Level 21, State 2, Line 1
During redoing of a logged operation in database ‘salesdb2′, an error occurred at log record ID (1628:252:1). Typically, the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup, or repair the database.
Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
Wow – major amounts of errors – and it all boils down to running out of space.