Catalog Error with Backup Exec

The disaster recovery project has been moving along in fits and starts. I was certainly expecting this to be a learning experience, and it hasn’t disappointed in that regard. Today, I kicked off a catalog of a new tape and promptly received this error:

The requested media is not listed in the media index and could not be mounted. To add the media’s catalog information to the disk-based catalogs, run an inventory operation on the media and resubmit the Catalog operation.

I ran another successful inventory of the tape for good measure, but the error remained. I rebooted the server and the tape drive. No love. Frustrating since I’ve been successfully cataloging tapes for the last few weeks.

Following the links from the error report, I turned off the option to “Use storage media-based catalogs.” With the check box for this option cleared, Backup Exec was forced to ignore any catalog information on the tape itself and build the catalog by reviewing each file on the tape individually. This process takes longer but, in my case, was successful.

This is the recommended change to make when normal catalog methods fail. It’s also something you’ll need to do if you must catalog the contents of a single tape from a backup job that spans multiple tapes; that kind of catalog can also fail if you don’t have all the tapes from the set in inventory. For more information about the differences between storage media-based catalogs and on-disk catalogs for Backup Exec, check out this additional explanation of the “storage media-based catalog” option at Symantec’s website.

Failed SQL Restores That Actually Succeed

This week’s adventure in disaster recovery has been with one of our in-house SQL applications. The application has several databases that need to be restored and we find that the Backup Exec restore job is reported as having failed with the error of “V-79-65323-0 – An error occurred on a query to database .” This error doesn’t prevent SQL from using the databases properly and hasn’t appeared to affect the application.

Once the job completes, Backup Exec also warns that the destination server requires a reboot. We are speculating that Backup Exec is unable to run a validation query against the restored database until that reboot happens, so the error is somewhat superfluous.

We are going to experiment a bit to see if turning off the post-restore consistency checks eliminates this error in the future, but for the moment we just opted to note the error in our recovery documentation so we don’t spend time worrying about it during another test or during a real recovery scenario.

We’ve also found that, for some reason, it’s very important to pre-create subfolders under the FTData folder before restoring the databases. If these folders related to the full-text index aren’t available, the job will fail, too. Our DBA has had to write some scripts, kept on hand for a restore, that create these directories and then drop and recreate the indexes once everything is restored.
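As a rough illustration of the folder pre-creation step, here is a minimal Python sketch. It is not our DBA's actual script; the function name, the FTData path, and the catalog folder names in the usage note are all hypothetical placeholders, and the index drop and recreate steps are not covered.

```python
from pathlib import Path


def ensure_fulltext_folders(ftdata_root, catalog_names):
    """Create each catalog subfolder under ftdata_root if it is missing.

    ftdata_root and catalog_names are supplied by the caller; nothing here
    knows the real folder layout of any particular SQL Server instance.
    Returns the list of folders that were newly created.
    """
    root = Path(ftdata_root)
    created = []
    for name in catalog_names:
        folder = root / name
        if not folder.exists():
            # parents=True also creates the FTData root itself if needed
            folder.mkdir(parents=True)
            created.append(folder)
    return created
```

A call might look like `ensure_fulltext_folders(r"D:\SQLData\FTData", ["AppCatalog1", "AppCatalog2"])`, with the path and names replaced by whatever your own instance uses. Running it a second time is harmless, since existing folders are skipped.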

While I appreciate learning more about the database backend of some of our applications, I’m so glad I’m not a DBA. 🙂

Disaster Recovery Testing – Epic Fail #1

As I’ve mentioned before, my big project for this month is disaster recovery testing. A few things have changed since our last comprehensive test of our backup practices and we are long overdue. Because of this, I expect many “failures” along the way that will need to be remedied. I expect our network documentation to be lacking, and I expect to be missing current versions of software in our disaster kit. I know for a fact that we don’t have detailed recovery instructions for several new enterprise systems. This is why we test – to find and fix these shortcomings.

This week, at the beginning stages of the testing, we encountered our first “failure”. We’ve dubbed it “Epic Failure #1” and it’s all about those backup tapes.

A while back our outside auditor wanted us to password protect our tapes. We were running Symantec Backup Exec 10d at the time and were happy to comply. The password was promptly documented with our other important passwords. Our backup administrator successfully tested restores. Smiles all around.

We faithfully run backups daily. We run assorted restores every month to save lost Word documents, quickly migrate large file structures between servers, and correct data corruption issues. We’ve had good luck with the integrity of our tapes. More smiles.

Earlier this week, I loaded up the first tape I needed to restore in my DR lab. I typed the password to catalog the tape and it told me I had it wrong. I typed it again, because it’s not an easy password and perhaps I had made a mistake. The error message appeared again; my smile did not.

After poking around in the Backup Exec databases on production and comparing existing XML catalog files against those from a tape known to work with the password, we concluded that our regular daily backup jobs simply have a different password. Or at least the password hash is completely different, yet this same difference is repeated across the password-protected backup jobs on all our production backup media servers. Frown.

After testing a series of tapes from different points in time and from different servers, we came to the following disturbing conclusion: The migration of our Backup Exec software from 10d to 12.5, which also required us to install version 11 as part of the upgrade path, mangled the password hashes on the pre-existing job settings. Or perhaps the newer version uses a different algorithm, or something similar with the same result.

Any tapes with backup jobs that came from the 10d version of the software use the known password without issue. And any new jobs that are created without a password (since 12.5 doesn’t support media passwords anymore) are also fine. Tapes that have the “mystery password” on them are only readable by a media server that has the tape cataloged already, in this case the server that created it. So while they are useless in a full disaster scenario, they work for any current restorations we need in production. We upgraded Backup Exec just a few months ago, so the overall damage is limited to a specific time frame.

Correcting this issue required our backup administrator to create new jobs without password protection. Backup Exec 12.5 doesn’t support that type of media protection anymore (it was removed in version 11) so there is no obvious way to remove the password from the original job. Once we have some fresh, reliable backups off-site I can continue with the disaster testing. We’ll also have to look into testing the new tape encryption features in the current version of Backup Exec and see if we can use those to meet our audit requirements.

The lesson learned here was that even though the backup tapes were tested after the software upgrade, they should have been tested on a completely different media server. While our “routine” restore tasks showed our tapes had good data, they didn’t prove the tapes would still save us in a severe disaster scenario.