Search This Blog

Showing posts with label SQL Full backup. Show all posts
Showing posts with label SQL Full backup. Show all posts

How can a log backup fail but a full backup succeed?

Backup detected log corruption in database FakeDBName. Context is FirstSector. LogFile: 2 'F:\SQLLOGS\XYZ\FakeDBName_Log.ldf' VLF SeqNo: x502e VLFBase: x2ce40000 LogBlockOffset: x2d0a9000 SectorStatus: 2 LogBlock.StartLsn.SeqNo: x4ee3 LogBlock.StartLsn.Blk: x1348 Size: x200 PrevSize: x400
2009-03-06 10:00:02.61 Backup      Error: 3041, Severity: 16, State: 1.
2009-03-06 10:00:02.61 Backup      BACKUP failed to complete the command BACKUP LOG FakeDBName Check the backup application log for detailed messages.
2009-03-06 10:00:03.61 Backup      Error: 3041, Severity: 16, State: 1.
However a full backup succeeds, as does a DBCC CHECKDB. What's going on?
The answer comes with understanding what portions of the transaction log are required for these operations. (For a good understanding of the transaction log itself, along with logging and recovery, see my article in the February TechNet Magazine.)
A transaction log backup, by its very nature, has to backup *all* transaction log generated since the last log backup – so it will try to backup the corrupt portion and fail.
A full database backup only has to backup enough transaction log to allow the database to be restored and recovered to a transactionally consistent point. In other words, it only requires the transaction log back to the beginning of the oldest active transaction at the point that the data-reading section of the full backup completes. This is a source of immense confusion – many people don't believe that a full (or differential) backup needs to also backup some transaction log. For a more in-depth study of this, see my previous blog postsDebunking a couple of myths around full database backups and More on how much transaction log a full backup includes.
A DBCC CHECKDB operation uses a database snapshot to get a transactionally consistent view of the database on which to run consistency checks. When the database snapshot is created, crash recovery is run on it to make it transactionally consistent. That requires the same amount of log as if a full backup was taken – back to the beginning of the oldest active transaction at the time the database snapshot is created. See CHECKDB From Every Angle: Complete description of all CHECKDB stages for more info.
So – it's entirely possible for the situation reported above to exist. The question then becomes, how to recover from it?
Assuming that the database files are intact, there is a simple solution. This solution will break the log backup chain, but given that the log is corrupt so a log backup cannot be taken, the log backup chain is *already* broken. Here's what to do:
  1. Stop all user activity in the database
  2. Switch to the SIMPLE recovery model (breaking the log backup chain and removing the requirement that the damaged portion of log must be backed up)
  3. Switch to the FULL recovery model
  4. Take a full database backup (thus starting a new log backup chain)
  5. Start taking log backups
You might want to manually shrink and grow the log file in between steps 2 and 3 too – in case the log file is on a damaged portion of disk – or maybe even shrink it right down and add another log file on an undamaged disk. You also will need to do some root-cause analysis to determine why the corruption occured in the first place, and to take preventative measures to stop it happening again.

More on how much transaction log a full backup includes


The full backup has to include all the transaction log from the begin LSN of the oldest active transaction at the time the data read portion of the backup ends, until the LSN at which the data read portion ends. If that begin LSN is later in time than the LSN of the checkpoint that backup does initially, why does the full backup need to include all thr transaction log between the checkpoint and the begin LSN? What is it used for?

I replied in the comments with a quip that it would be easier to reply with a whiteboard and a timeline – so I got all enthusiastic and created a picture in Powerpoint to help explain better.
timeline More on how much transaction log a full backup includes
Consider the timeline in the picture above for a full backup (the red numbers match the list below):
  1. The backup operation take a checkpoint to force all dirty pages in the buffer pool to disk – both those containing changes from transactions that have committed and those containing changes from transactions that are still in-flight. The backup operation then starts reading the allocated pages in the database.
  2. The read operation reads page X
  3. Transaction A starts
  4. Transaction A makes a change to page X. The copy in the backup is now out-of-date. Note that the backup will not read page X again – it's already passed that point in the database.
  5. Transaction B starts. It won't complete before the data read operation completes so it's begin LSN is the oldest active transaction begin LSN.
  6. Transaction A commits. This commits the changes to page X.
  7. The backup data read operation completes and transaction log reading starts.
Now, the reason that the transaction log is read is so that the restore operation can recover the database so it is transactionally consistent as of the point in time when the read data operation completed.
If the transaction log was only included from the oldest active transaction begin LSN (point 5), then the copy of page X that was restored from the backup (read at point 2) would not be updated with the changes from transaction A (that happened at point 4). This means that it would not be transactionally consistent with the rest of the database as of the time the read data operation completed (point 7).
So, (ignoring replication) the minimum LSN of the transaction log that's included in the full backup is MIN (LSN of last checkpoint, LSN of oldest active transaction). This ensures that recovery can REDO log records to bring pages up-to-date and UNDO log records for transactions that had not committed.