Arm in ARM with VM Backups

With all the interesting announcements from Build this week, you might have missed one that, to me, signals the wait is nearly over for Azure Backup and DR-related features to be available in “Azure v2”, aka Azure Resource Manager.

Now in public preview: Azure Backup for ARM VMs!  Check out the full post with instructions on how to get started – https://azure.microsoft.com/en-us/documentation/articles/backup-azure-vms-first-look-arm/

Seeing some of these features being ported over means that organizations looking to start using the cloud to supplement their on-prem backup and recovery plans aren’t locked into the Azure Service Management (classic) model.

No word yet on the big guns like Azure Site Recovery, but being able to snapshot an Azure VM for backup purposes makes moving some on-prem workloads to the cloud much easier!
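If you’d rather script it than click through the portal, the general shape looks something like this. This is a minimal sketch of my own, not taken from the linked article, and it assumes the AzureRM PowerShell modules with the Recovery Services backup cmdlets are available to you (cmdlet names and module versions may shift while the feature is in preview, so double-check the documentation first). The vault, resource group, and VM names are placeholders.

# Sketch: protect an ARM VM with Azure Backup via PowerShell (names are placeholders)
Login-AzureRmAccount

# Create (or reuse) a Recovery Services vault in the same region as the VM
$vault = New-AzureRmRecoveryServicesVault -Name "drdemo-vault" `
           -ResourceGroupName "drdemo-rg" -Location "East US"

# Point the backup cmdlets at this vault
Set-AzureRmRecoveryServicesVaultContext -Vault $vault

# Use the vault's default daily policy and enable protection for the VM
$policy = Get-AzureRmRecoveryServicesBackupProtectionPolicy -Name "DefaultPolicy"
Enable-AzureRmRecoveryServicesBackupProtection -Policy $policy `
           -Name "app-vm-01" -ResourceGroupName "drdemo-rg"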


Today I Learned: Azure Site Recovery Mobility Service

Today I learned that if you are switching from the legacy version of Azure Site Recovery (ASR) to the new version, the Mobility Service has a new revision, which is to be expected.  However, you have to manually uninstall the older version from the servers you wish to protect; otherwise, the “push” installation simply fails and doesn’t report anything useful in the Azure portal.

When I went to manually install it on the server, I got better error messages from the application noting that a previous version of the service was installed and needed manual removal.
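If you want to check a server before kicking off the push install, one quick (unofficial) way is to scan the standard uninstall registry keys for an existing Mobility Service entry. The display-name filter below is my guess; match it to whatever shows up in Programs and Features on your servers.

# Run on the server to be protected: look for an existing Mobility Service install
$uninstallKeys = "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*",
                 "HKLM:\SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall\*"

Get-ItemProperty -Path $uninstallKeys -ErrorAction SilentlyContinue |
    Where-Object { $_.DisplayName -like "*Mobility Service*" } |
    Select-Object DisplayName, DisplayVersion, UninstallString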

Perhaps I’ve just saved you some time one day.

Business Continuity and the Cloud

This week marks the start of TechNet on Tour, coming to twelve cities.  The full-day workshops include lectures and hands-on labs where you can learn about some of the ways you can use Microsoft Azure to help with your disaster recovery planning.
But let me tell you about the first “business continuity” plan I was part of.  It involved a stash of tapes: daily backups on a two-week cycle, with the Friday backups held for a month.  The nightly backup job fit on two tapes, and every morning I ejected the tapes from the machine and dropped them in my bag.  They went home with me, across town, and came back every day to be swapped with the latest ones.  Whenever I took a vacation, I designated an available person to perform the same task.
That was it.  The tapes were rarely looked at, the data never tested and fortunately, never needed.  We were partying like it was 1999. Because it was.
Still, the scenario isn’t uncommon.  There are plenty of small businesses with only a single location, and plenty of tapes still out there.  But now there is more data, and more urgency for that data to be recovered as quickly as possible with as little loss as possible.  And there are still only 24 hours in the day.  How annoying to arrive at work in the morning, only to find the overnight backup job still running.
As I moved through jobs and technologies evolved, we addressed the growing data and lack of time in many ways…  Adjusting backup jobs to capture less critical or infrequently changing data only over the weekends.  More jobs that only captured delta changes.  Fancier multiple-tape changers, higher density tapes, local “disk to disk” backups that were later moved to tape, even early “Internet” backup solutions, often offered by the same companies that handled your physical tape and box rotation services.
We also chased that holy grail of “uptime”.  Failures weren’t supposed to happen if you threw enough hardware in a room.  Dual power supplies, redundant disk arrays, multiple disk controllers, UPS systems with various bypass offerings.  Add more layers to protect the computers and the data.
Testing was something we wanted to do more often.  But it was hard to justify additional hardware purchases to upper management, and hard to find the time to set up a comprehensive test.  Still, we tried, and often failed.  And learned.  Because each test or real outage is a great opportunity to learn.  Outages are often perfect storms… if only we had swapped out that dying drive a day before, if only that piece of hardware was better labeled, if only that was better documented… and each time we made improvements.
I remember, after a lengthy call with a co-location facility that wanted us to sign a one-year agreement even though we only wanted space for 3 months to run a recovery test, how I wished for something I could just use for the time I needed.  It’s been a little over 5 years since that phone call, but finally there is an answer, and it’s “the cloud”.
Is there failure in the cloud? Of course, it’s inevitable. For all the abstractness, it’s still just running on hardware. But the cloud provides part of an answer that many businesses simply didn’t have even five years ago.  Businesses that never recovered from the likes of Katrina and other natural or man-made disasters might still have a shot today.
So catch a TechNet on Tour stop if it passes through your area.  Look at taking advantage of things like using the cloud as a target instead of tape, or replicating a VM to Azure with Azure Site Recovery.  Even starting to dabble in better documentation, or in scripting with PowerShell to make your key systems more consistently reproducible, will go a long way.  Do a “table top” dry run of your existing DR plan today.
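If “scripting with PowerShell” sounds like a big leap, it can start as small as capturing what a server is actually running so you can rebuild it (or check a rebuild) later. A rough sketch, with example output paths you’d change to suit:

# Capture installed roles/features and auto-start services for this server
Import-Module ServerManager
New-Item -ItemType Directory -Path "C:\DRDocs" -Force | Out-Null

Get-WindowsFeature |
    Where-Object Installed |
    Select-Object Name, DisplayName |
    Export-Csv -Path "C:\DRDocs\$env:COMPUTERNAME-features.csv" -NoTypeInformation

Get-WmiObject Win32_Service |
    Where-Object { $_.StartMode -eq "Auto" } |
    Select-Object Name, DisplayName, State |
    Export-Csv -Path "C:\DRDocs\$env:COMPUTERNAME-services.csv" -NoTypeInformation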
Sysadmins don’t let other sysadmins drop DLT tapes in their bags.  Let’s party like it’s 2015.  Because it is.

Summer Reads!

Ah, summertime…. Vacations, relaxing on the patio, fruit salads, sparkly drinks and learning. Right? I spent some time by the beach and the pool recently and then came back to a pile of interesting things I wanted to read or try out.

There are also two new video blogs on Channel 9 that will keep adding new content, so you might want to check them out.

TechNet on Tour – Disaster Recovery!

We technical evangelists are at it again!  This September and October, we will visit 10 cities to talk about using Microsoft Azure as part of your disaster recovery plan.

Attendees will receive a free Microsoft Azure pass and the opportunity to complete several disaster recovery related labs during the course of the workshop.
  • 9/1 – Seattle, WA
  • 9/3 – San Francisco, CA
  • 9/22 – Houston, TX
  • 9/29 – Charlotte, NC
  • 9/30 – Malvern, PA
  • 10/6 – Indianapolis, IN
  • 10/7 – Tampa, FL
  • 10/8 – New York, NY
  • 10/14 – Irvine, CA
  • 10/16 – Dallas, TX
Register now to join in!

I’ve Got Nothing: The DR Checklist

So what do you have to lose?  If you’ve been reading along with the blog series, I hope you’ve been thinking a bit about ways you can bring your disaster recovery plans to the next level. My first post in the series, on what to consider, might have gotten you started on some of the items in this list. If you need some ideas of where to go next, or if you happen to be just starting out, here is an even longer list of things you might need.

Disclaimer: I love technology, and I think that cloud computing and virtualization are paramount to increasing the speed at which you can get your data and services back online. But when disaster strikes, you can bet I’m reaching for something on paper to lead the way.  You do not want your recovery plans to hinge on finding the power cable for that dusty laptop that is acting as the offline repository for your documentation. It’s old school, but it works. If you have a better suggestion than multiple copies of printed documentation, please let me know. Until then, finding a ring binder is my Item #0 on the list.  (Okay, Hyper-V Recovery Manager is a pretty cool replacement for paper if you have two locations, but I’d probably still have something printed to check off…)

The Checklist

  1. Backups – I always start at the backups. When your data center is reduced to a pile of rubble, the only thing you may have to start with is your backups; everything else supports turning those backups into usable services again. Document your backup schedule, what servers and data are backed up to what tapes or sets, and how often those backups are tested and rotated. Take note if you are backing up whole servers as VMs, or just the data, or both. (If you haven’t yet, read Brian’s post on the value of virtual machines when it comes to disaster recovery.)
  2. Facilities – Where are you and your backups going to come together to work this recovery magic? Your CEO’s garage? A secondary location that’s been predetermined? The Cloud?  List out anything you know about facilities. If you have a hot site or cold site, include the address, phone numbers and access information. (Look at Keith’s blog about using Azure for a recovery location.)
  3. People – Your DR plan should include a list of people who are part of the recovery process. First and foremost, note who has the right to declare a disaster in the first place. You need to know who can and can’t kick off a process that will start with having an entire set of backups delivered to an alternate location.  Also include the contact information for the people you need to successfully complete a recovery – key IT, facilities and department heads might be needed.  Don’t forget to include their backup person.
  4. Support Services – Do you need to order equipment?  Will you need support from a vendor? Include names and numbers of all these services and if possible, include alternatives outside of your immediate area. Your local vendor might not be available if the disaster is widespread like an earthquake or weather incident.
  5. Employee Notification System – How do you plan on sharing information with employees about the status of the company and what services will be available to use?  Your company might already have something in place – maybe a phone hotline or externally hosted emergency website. Make sure you are aware of it and know how you can get updates made to the information.
  6. Diagrams, Configurations and Summaries – Include copies of any diagrams you have for networking and other interconnected systems. You’ll be glad you have them for reference even if you don’t build your recovery network the same way.
  7. Hardware – Do you have appropriate hardware to recover to? Do you have the networking gear, cables and power to connect everything together and keep it running? You should list out the specifications of the hardware you are using now and what the minimum acceptable replacements would be. Include contact information for where to order hardware from and details about how to pay for equipment. Depending on the type of disaster you are recovering from, your hardware vendor might not be keen on accepting a purchase order or billing you later. If you are looking at Azure as a recovery location, make sure to note what size of compute power would match up.
  8. Step-By-Step Guides – If you’ve started testing your system restores, you should have some guides formed.  If your plans include building servers from the ground up, your guides should include references to the software versions and licensing keys required. When you are running your practice restores, anything that makes you step away from the guide should be noted. For my last disaster recovery book, I broke the binder out into sections, in order of recovery, with the step-by-steps and supporting information in each area. (Extra credit if you have PowerShell ready to automate parts of this; see the sketch after this list.)
  9. Software – If a step in your process includes loading software, it needs to be available on physical media. You do not want to rely on having a working, high-speed Internet connection to download gigs of software.
  10. Clients – Finally, don’t forget your end users. Your plan should include details about how they will be connecting, what equipment they would be expected to use if the office is not available and how you will initially communicate with them.  Part of your testing should include having a pilot group of users attempt to access your test DR setup so you can improve the instructions they will be provided. Chances are, you’ll be too busy to make individual house calls. (For more, check out Matt’s post on using VDI as a way to protect client data.)
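To give item #8 a little more shape, here is a rough, made-up fragment of what a “PowerShell-ready” recovery step might look like for a simple web server. The server role, paths, and restore source are all placeholders you would swap for your own.

# Step 3 of the guide: install the web role on the freshly built server
Install-WindowsFeature -Name Web-Server -IncludeManagementTools

# Step 4: restore site content from wherever your test restore landed
$backupRoot = "E:\Restore\WebContent"
Copy-Item -Path "$backupRoot\*" -Destination "C:\inetpub\wwwroot" -Recurse -Force

# Step 5: confirm the site answers locally before touching DNS
Invoke-WebRequest -Uri "http://localhost" -UseBasicParsing | Select-Object StatusCode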

Once you have a first-pass gathering of all your disaster recovery items and information, put it all in a container that you can send out to your off-site storage vendor or alternate location. Then when you practice, start with just the box – if you can’t kick off a recovery test with only its contents (no Internet connection and no touching your production systems), improve them and try again.  Granted, if you are using the cloud as part of your plan, make sure you know which parts require Internet access, have a procedure for alternative connectivity, and know what parts of your plan would stall while securing that connection.  You won’t be able to plan for every contingency, but knowing where parts of the plan can break down makes it easier to justify where to spend money on improvement, or not.

No matter the result of your testing, it will be better than the last time. Go forth and be prepared.

Oh, one more thing, if you live in a geographic area where weather or other “earthly” disasters are probable, please take some time to do some DR planning for your home as well.  I don’t care who you work for, if your home and family aren’t secure after a disaster you certainly won’t be effective at work. Visit www.ready.gov or www.redcross.org/prepare/disaster-safety-library for more information.

This is post part of a 15 part series on Disaster Recovery and Business Continuity planning by the US based Microsoft IT Evangelists. For the full list of articles in this series see the intro post located here: http://mythoughtsonit.com/2014/02/intro-to-series-disaster-recovery-planning-for-i-t-pros/

Question: Is there value in testing your Disaster Recovery Plan?

Answer: Only if you want a shot at it actually working when you need it.

There are a few reasons you need to regularly test your recovery plans… I’ve got my top three.

  1. Backups only work if they are good.
  2. Your documentation is only useful if you can follow it.
  3. You are soft and easily crushed.

Backups
Everyone knows the mantra of “backup, backup, backup”, but you also have to test those backups for accuracy and functionality. I’m not going to beat this one endlessly, but please read an old post of mine, “Epic Fail #1”, to see how backups can fail in spectacular, unplanned ways.
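One cheap way to put some “accuracy” behind a test restore is to hash a sample of the restored files and compare them against the live copies. A minimal sketch, with placeholder paths:

# Compare a restored folder against the live copy, flagging missing or changed files
$live     = "D:\Data\Finance"
$restored = "E:\RestoreTest\Finance"

Get-ChildItem -Path $live -Recurse -File | Get-FileHash -Algorithm SHA256 |
    ForEach-Object {
        $counterpart = $_.Path.Replace($live, $restored)
        if (-not (Test-Path $counterpart)) {
            Write-Warning "Missing from restore: $counterpart"
        }
        elseif ((Get-FileHash -Path $counterpart -Algorithm SHA256).Hash -ne $_.Hash) {
            Write-Warning "Hash mismatch: $counterpart"
        }
    }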

Documentation
Simply put, you need good documentation. You need easy-to-locate lists of vendors, support numbers, configuration details of machines and applications, notes on how “this” interacts with “that”, what services have dependencies on others, and step-by-step instructions for processes you don’t do often, and even those you DO do every day.

When under pressure to troubleshoot an issue that is causing downtime, it’s likely you’ll lose track of where to find the information you need to successfully recover.  Having clean documentation will keep you calm and focused at a time when you really need to have your head in the game.

Realistically, your documentation will be out of date when you use it.  You won’t mean for it to be, but even if you have a great DR plan in place, I’ll bet you upgraded a system, changed vendors, or altered a process almost immediately after your update cycle. Regular review of your documents is a valuable part of testing, even if you don’t touch your lab.

My personal method is to keep a binder with hard copies of all my DR documentation handy.  Whenever I change a system, I make a note on the hard copy. Quarterly, I update the electronic version and reprint it.  With the binder, I always have information handy in case the electronic version is not accessible, and the version with the handwritten margin notes is often more up to date. Even something declaring a section “THIS IS ENTIRELY WRONG NOW” can save someone hours of heading down the wrong path.

You
No one wants to contemplate their mortality; I completely understand. (Or maybe you just want to go on vacation without getting a call halfway through. Shocker, right?) But if you happen to hold the only knowledge of how something works in your data center, then you are a walking liability for your company. You aren’t securing your job by being the only person with the password to the schema admin account, for example. It only takes one run-in with a cross-town bus to create a business continuity issue for your company that didn’t even touch the data center.

This extends to your documentation. Those step-by-step instructions for recovery need to include information and tips that someone else on your team (or an outside consultant) can follow without having prior intimate knowledge of that system.  Sometimes the first step is “Call Support, the number is 800-555-1212” and that’s okay.

The only way to find out what others don’t know is to test.  Test with tabletop exercises, test with those backup tapes and test with that documentation.  Pick a server or application and have someone who knows it best write the first draft and then hand it to someone else to try to follow. Fill in the blanks. Repeat. Repeat again.

A lot of this process requires only your time. Time you certainly won’t have when your CEO is breathing down your neck about recovering his email.

Additional Resources
This is post part of a 15 part series on Disaster Recovery and Business Continuity planning by the US based Microsoft IT Evangelists. For the full list of articles in this series see the intro post located here: http://mythoughtsonit.com/2014/02/intro-to-series-disaster-recovery-planning-for-i-t-pros/

If you are ready to take things further, check out Automated Disaster Recovery Testing with Hyper-V Replica and PowerShell – http://blogs.technet.com/b/keithmayer/archive/2012/10/05/automate-disaster-recovery-plan-with-windows-server-2012-hyper-v-replica-and-powershell-3-0.aspx
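To give you a feel for what that post automates, the core of a non-disruptive Hyper-V Replica test looks roughly like this. Run it on the replica host; the VM name is a placeholder, and the test copy Hyper-V creates is typically named “<VM name> - Test”.

$vmName = "app-vm-01"

# Confirm replication is healthy before testing
Get-VMReplication -VMName $vmName | Select-Object VMName, State, Health

# Create and start the test failover copy (production replication keeps running)
Start-VMFailover -VMName $vmName -AsTest
Start-VM -Name "$vmName - Test"

# ... run your validation checks against the test VM here ...

# Tear the test copy down when finished
Stop-VMFailover -VMName $vmName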