Arm in ARM with VM Backups

With all the interesting announcements from Build this week, you might have missed one that, to me, signals the wait is nearly over for Azure Backup and DR-related features to be available in “Azure v2”, aka Azure Resource Manager.

Now in public preview: Azure Backup for ARM VMs!  Check out the full post with instructions on how to get started – https://azure.microsoft.com/en-us/documentation/articles/backup-azure-vms-first-look-arm/

Seeing some of these features being ported over means that organizations looking to start using the cloud to supplement their on-prem backup and recovery plans aren’t locked into the Azure Service Management (classic) model.

No word yet on the big guns like Azure Site Recovery, but being able to snapshot an Azure VM for backup purposes makes moving some on-prem workloads to the cloud much easier!
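If you’d rather script it than click through the portal, the general shape looks something like this. This is a minimal sketch of my own, not taken from the linked article, and it assumes the AzureRM PowerShell modules with the Recovery Services backup cmdlets are available to you (cmdlet names and module versions may shift while the feature is in preview, so double-check the documentation first). The vault, resource group, and VM names are placeholders.

# Sketch: protect an ARM VM with Azure Backup via PowerShell (names are placeholders)
Login-AzureRmAccount

# Create (or reuse) a Recovery Services vault in the same region as the VM
$vault = New-AzureRmRecoveryServicesVault -Name "drdemo-vault" `
           -ResourceGroupName "drdemo-rg" -Location "East US"

# Point the backup cmdlets at this vault
Set-AzureRmRecoveryServicesVaultContext -Vault $vault

# Use the vault's default daily policy and enable protection for the VM
$policy = Get-AzureRmRecoveryServicesBackupProtectionPolicy -Name "DefaultPolicy"
Enable-AzureRmRecoveryServicesBackupProtection -Policy $policy `
           -Name "app-vm-01" -ResourceGroupName "drdemo-rg"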


Today I Learned: Azure Site Recovery Mobility Service

Today I learned that if you are switching from the legacy version of Azure Site Recovery (ASR) to the new version, the Mobility Service has a new revision, which is to be expected.  However, you have to manually uninstall the older version from the servers you wish to protect; otherwise, the “push” installation simply fails and doesn’t report anything useful in the Azure portal.

When I went to manually install it on the server, I got better error messages from the application noting that a previous version of the service was installed and needed manual removal.
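If you want to check a server before kicking off the push install, one quick (unofficial) way is to scan the standard uninstall registry keys for an existing Mobility Service entry. The display-name filter below is my guess; match it to whatever shows up in Programs and Features on your servers.

# Run on the server to be protected: look for an existing Mobility Service install
$uninstallKeys = "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*",
                 "HKLM:\SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall\*"

Get-ItemProperty -Path $uninstallKeys -ErrorAction SilentlyContinue |
    Where-Object { $_.DisplayName -like "*Mobility Service*" } |
    Select-Object DisplayName, DisplayVersion, UninstallString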

Perhaps I’ve just saved you some time one day.

Business Continuity and the Cloud

This week marks the start of TechNet on Tour, coming to twelve cities.  The full-day workshops include lectures and hands-on labs where you can learn about some of the ways you can use Microsoft Azure to help with your disaster recovery planning.
But let me tell you about the first “business continuity” plan I was part of.  It involved a stash of tapes: daily backups on a two-week cycle, with the Friday backups held for a month.  The nightly backup job fit on two tapes, and every morning I ejected the tapes from the machine and dropped them in my bag.  They went home with me, across town, and came back every day to be swapped with the latest ones.  Whenever I took a vacation, I designated an available person to perform the same task.
That was it.  The tapes were rarely looked at, the data never tested and fortunately, never needed.  We were partying like it was 1999. Because it was.
Still, the scenario isn’t uncommon.  There are plenty of small businesses with only a single location, and plenty of tapes still out there.  But now there is more data, and more urgency for that data to be recovered as quickly as possible with as little loss as possible.  And there are still only 24 hours in the day.  How annoying to arrive at work in the morning, only to find the overnight backup job still running.
As I moved through jobs and technologies evolved, we addressed the growing data and lack of time in many ways…  Adjusting backup jobs to capture less critical or infrequently changing data only over the weekends.  More jobs that only captured delta changes.  Fancier multiple-tape changers, higher density tapes, local “disk to disk” backups that were later moved to tape, even early “Internet” backup solutions, often offered by the same companies that handled your physical tape and box rotation services.
We also chased that holy grail of “uptime”.  Failures weren’t supposed to happen if you threw enough hardware in a room.  Dual power supplies, redundant disk arrays, multiple disk controllers, UPS systems with various bypass offerings.  Add more layers to protect the computers and the data.
Testing was something we wanted to do more often.  But it was hard to justify additional hardware purchases to upper management, and hard to find the time to set up a comprehensive test.  Still, we tried, and often failed.  And learned.  Because each test or real outage is a great opportunity to learn.  Outages are often perfect storms… if only we had swapped out that dying drive a day before, if only that piece of hardware was better labeled, if only that was better documented… and each time we made improvements.
I remember, after a lengthy call with a co-location facility that wanted us to sign a one-year agreement even though we only wanted space for 3 months to run a recovery test, how I wished for something I could just use for the time I needed.  It’s been a little over 5 years since that phone call, but finally there is an answer, and it’s “the cloud”.
Is there failure in the cloud? Of course, it’s inevitable. For all the abstractness, it’s still just running on hardware. But the cloud provides part of an answer that many businesses simply didn’t have even five years ago.  Businesses that never recovered from the likes of Katrina and other natural or man-made disasters might still have a shot today.
So catch a TechNet on Tour stop if it passes through your area.  Look at taking advantage of things like using the cloud as a target instead of tape, or replicating a VM to Azure with Azure Site Recovery.  Even starting to dabble in better documentation, or in scripting with PowerShell to make your key systems more consistently reproducible, will go a long way.  Do a “table top” dry run of your existing DR plan today.
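If “scripting with PowerShell” sounds like a big leap, it can start as small as capturing what a server is actually running so you can rebuild it (or check a rebuild) later. A rough sketch, with example output paths you’d change to suit:

# Capture installed roles/features and auto-start services for this server
Import-Module ServerManager
New-Item -ItemType Directory -Path "C:\DRDocs" -Force | Out-Null

Get-WindowsFeature |
    Where-Object Installed |
    Select-Object Name, DisplayName |
    Export-Csv -Path "C:\DRDocs\$env:COMPUTERNAME-features.csv" -NoTypeInformation

Get-WmiObject Win32_Service |
    Where-Object { $_.StartMode -eq "Auto" } |
    Select-Object Name, DisplayName, State |
    Export-Csv -Path "C:\DRDocs\$env:COMPUTERNAME-services.csv" -NoTypeInformation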
Sysadmins don’t let other sysadmins drop DLT tapes in their bags.  Let’s party like it’s 2015.  Because it is.

Summer Reads!

Ah, summertime…. Vacations, relaxing on the patio, fruit salads, sparkly drinks and learning. Right? I spent some time by the beach and the pool recently and then came back to a pile of interesting things I wanted to read or try out.

There are also two new video blogs on Channel 9 that will keep adding new content, so you might want to check them out.

TechNet on Tour – Disaster Recovery!

We technical evangelists are at it again!  This September and October, we will visit 10 cities to talk about using Microsoft Azure as part of your disaster recovery plan.

Attendees will receive a free Microsoft Azure pass and the opportunity to complete several disaster recovery related labs during the course of the workshop.
  • 9/1 – Seattle, WA
  • 9/3 – San Francisco, CA
  • 9/22 – Houston, TX
  • 9/29 – Charlotte, NC
  • 9/30 – Malvern, PA
  • 10/6 – Indianapolis, IN
  • 10/7 – Tampa, FL
  • 10/8 – New York, NY
  • 10/14 – Irvine, CA
  • 10/16 – Dallas, TX
Register now to join in!

I’ve Got Nothing: The DR Checklist

So what do you have to lose?  If you’ve been reading along with the blog series, I hope you’ve been thinking a bit about ways you can bring your disaster recovery plans to the next level. My first post in the series, on what to consider, might have gotten you started on some of the items in this list. If you need some ideas of where to go next, or if you happen to be just starting out, here is an even longer list of things you might need.

Disclaimer: I love technology, and I think that cloud computing and virtualization are paramount to increasing the speed at which you can get your data and services back online. But when disaster strikes, you can bet I’m reaching for something on paper to lead the way.  You do not want your recovery plans to hinge on finding the power cable for that dusty laptop that is acting as the offline repository for your documentation. It’s old school, but it works. If you have a better suggestion than multiple copies of printed documentation, please let me know. Until then, finding a ring binder is my Item #0 on the list.  (Okay, Hyper-V Recovery Manager is a pretty cool replacement for paper if you have two locations, but I’d probably still have something printed to check off…)

The Checklist

  1. Backups – I always start at the backups. When your data center is reduced to a pile of rubble, the only thing you may have to start with is your backups; everything else supports turning those backups into usable services again. Document your backup schedule, what servers and data are backed up to what tapes or sets, and how often those backups are tested and rotated. Take note if you are backing up whole servers as VMs, or just the data, or both. (If you haven’t yet, read Brian’s post on the value of virtual machines when it comes to disaster recovery.)
  2. Facilities – Where are you and your backups going to come together to work this recovery magic? Your CEO’s garage? A secondary location that’s been predetermined? The Cloud?  List out anything you know about facilities. If you have a hot site or cold site, include the address, phone numbers and access information. (Look at Keith’s blog about using Azure for a recovery location.)
  3. People – Your DR plan should include a list of people who are part of the recovery process. First and foremost, note who has the right to declare a disaster in the first place. You need to know who can and can’t kick off a process that will start with having an entire set of backups delivered to an alternate location.  Also include the contact information for the people you need to successfully complete a recovery – key IT, facilities and department heads might be needed.  Don’t forget to include their backup person.
  4. Support Services – Do you need to order equipment?  Will you need support from a vendor? Include names and numbers of all these services and if possible, include alternatives outside of your immediate area. Your local vendor might not be available if the disaster is widespread like an earthquake or weather incident.
  5. Employee Notification System – How do you plan on sharing information with employees about the status of the company and what services will be available to use?  Your company might already have something in place – maybe a phone hotline or externally hosted emergency website. Make sure you are aware of it and know how you can get updates made to the information.
  6. Diagrams, Configurations and Summaries – Include copies of any diagrams you have for networking and other interconnected systems. You’ll be glad you have them for reference even if you don’t build your recovery network the same way.
  7. Hardware – Do you have appropriate hardware to recover to? Do you have the networking gear, cables and power to connect everything together and keep it running? You should list out the specifications of the hardware you are using now and what the minimum acceptable replacements would be. Include contact information for where to order hardware from and details about how to pay for equipment. Depending on the type of disaster you are recovering from, your hardware vendor might not be keen on accepting a purchase order or billing you later. If you are looking at Azure as a recovery location, make sure to note what size of compute power would match up.
  8. Step-By-Step Guides – If you’ve started testing your system restores, you should have some guides formed.  If your plans include building servers from the ground up, your guides should include references to the software versions and licensing keys required. When you are running your practice restores, anything that makes you step away from the guide should be noted. For my last disaster recovery book, I broke the binder out into sections, in order of recovery, with the step-by-steps and supporting information in each area. (Extra credit if you have PowerShell ready to automate parts of this; see the sketch after this list.)
  9. Software – If a step in your process includes loading software, it needs to be available on physical media. You do not want to rely on having a working, high-speed Internet connection to download gigs of software.
  10. Clients – Finally, don’t forget your end users. Your plan should include details about how they will be connecting, what equipment they would be expected to use if the office is not available and how you will initially communicate with them.  Part of your testing should include having a pilot group of users attempt to access your test DR setup so you can improve the instructions they will be provided. Chances are, you’ll be too busy to make individual house calls. (For more, check out Matt’s post on using VDI as a way to protect client data.)
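To give item #8 a little more shape, here is a rough, made-up fragment of what a “PowerShell-ready” recovery step might look like for a simple web server. The server role, paths, and restore source are all placeholders you would swap for your own.

# Step 3 of the guide: install the web role on the freshly built server
Install-WindowsFeature -Name Web-Server -IncludeManagementTools

# Step 4: restore site content from wherever your test restore landed
$backupRoot = "E:\Restore\WebContent"
Copy-Item -Path "$backupRoot\*" -Destination "C:\inetpub\wwwroot" -Recurse -Force

# Step 5: confirm the site answers locally before touching DNS
Invoke-WebRequest -Uri "http://localhost" -UseBasicParsing | Select-Object StatusCode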

Once you have a first-pass gathering of all your disaster recovery items and information, put it all in a container that you can send out to your off-site storage vendor or alternate location. Then when you practice, start with just the box – if you can’t kick off a recovery test with only its contents (no Internet connection and no touching your production systems), improve them and try again.  Granted, if you are using the cloud as part of your plan, make sure you know which parts require Internet access, have a procedure for alternative connectivity, and know what parts of your plan would stall while securing that connection.  You won’t be able to plan for every contingency, but knowing where parts of the plan can break down makes it easier to justify where to spend money on improvement, or not.

No matter the result of your testing, it will be better than the last time. Go forth and be prepared.

Oh, one more thing, if you live in a geographic area where weather or other “earthly” disasters are probable, please take some time to do some DR planning for your home as well.  I don’t care who you work for, if your home and family aren’t secure after a disaster you certainly won’t be effective at work. Visit www.ready.gov or www.redcross.org/prepare/disaster-safety-library for more information.

This is post part of a 15 part series on Disaster Recovery and Business Continuity planning by the US based Microsoft IT Evangelists. For the full list of articles in this series see the intro post located here: http://mythoughtsonit.com/2014/02/intro-to-series-disaster-recovery-planning-for-i-t-pros/

Question: Is there value in testing your Disaster Recovery Plan?

Answer: Only if you want a shot at it actually working when you need it.

There are a few reasons you need to regularly test your recovery plans… I’ve got my top three.

  1. Backups only work if they are good.
  2. Your documentation is only useful if you can follow it.
  3. You are soft and easily crushed.

Backups
Everyone knows the mantra of “backup, backup, backup”, but you also have to test those backups for accuracy and functionality. I’m not going to beat this one endlessly, but please read an old post of mine, “Epic Fail #1”, to see how backups can fail in spectacular, unplanned ways.
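One cheap way to put some “accuracy” behind a test restore is to hash a sample of the restored files and compare them against the live copies. A minimal sketch, with placeholder paths:

# Compare a restored folder against the live copy, flagging missing or changed files
$live     = "D:\Data\Finance"
$restored = "E:\RestoreTest\Finance"

Get-ChildItem -Path $live -Recurse -File | Get-FileHash -Algorithm SHA256 |
    ForEach-Object {
        $counterpart = $_.Path.Replace($live, $restored)
        if (-not (Test-Path $counterpart)) {
            Write-Warning "Missing from restore: $counterpart"
        }
        elseif ((Get-FileHash -Path $counterpart -Algorithm SHA256).Hash -ne $_.Hash) {
            Write-Warning "Hash mismatch: $counterpart"
        }
    }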

Documentation
Simply put, you need good documentation. You need easy-to-locate lists of vendors, support numbers, configuration details of machines and applications, notes on how “this” interacts with “that”, what services have dependencies on others, and step-by-step instructions for processes you don’t do often, and even those you DO do every day.

When under pressure to troubleshoot an issue that is causing downtime, it’s likely you’ll lose track of where to find the information you need to successfully recover.  Having clean documentation will keep you calm and focused at a time when you really need to have your head in the game.

Realistically, your documentation will be out of date when you use it.  You won’t mean for it to be, but even if you have a great DR plan in place, I’ll bet you upgraded a system, changed vendors, or altered a process almost immediately after your update cycle. Regular review of your documents is a valuable part of testing, even if you don’t touch your lab.

My personal method is to keep a binder with hard copies of all my DR documentation handy.  Whenever I change a system, I make a note on the hard copy. Quarterly, I update the electronic version and reprint it.  With the binder, I always have information handy in case the electronic version is not accessible, and the version with the handwritten margin notes is often more up to date. Even something declaring a section “THIS IS ENTIRELY WRONG NOW” can save someone hours of heading down the wrong path.

You
No one wants to contemplate their mortality; I completely understand. (Or maybe you just want to go on vacation without getting a call halfway through. Shocker, right?) But if you happen to hold the only knowledge of how something works in your data center, then you are a walking liability for your company. You aren’t securing your job by being the only person with the password to the schema admin account, for example. It only takes one run-in with a cross-town bus to create a business continuity issue for your company that didn’t even touch the data center.

This extends to your documentation. Those step-by-step instructions for recovery need to include information and tips that someone else on your team (or an outside consultant) can follow without having prior intimate knowledge of that system.  Sometimes the first step is “Call Support, the number is 800-555-1212” and that’s okay.

The only way to find out what others don’t know is to test.  Test with tabletop exercises, test with those backup tapes and test with that documentation.  Pick a server or application and have someone who knows it best write the first draft and then hand it to someone else to try to follow. Fill in the blanks. Repeat. Repeat again.

A lot of this process requires only your time. Time you certainly won’t have when your CEO is breathing down your neck about recovering his email.

Additional Resources
This is post part of a 15 part series on Disaster Recovery and Business Continuity planning by the US based Microsoft IT Evangelists. For the full list of articles in this series see the intro post located here: http://mythoughtsonit.com/2014/02/intro-to-series-disaster-recovery-planning-for-i-t-pros/

If you are ready to take things further, check out Automated Disaster Recovery Testing with Hyper-V Replica and PowerShell – http://blogs.technet.com/b/keithmayer/archive/2012/10/05/automate-disaster-recovery-plan-with-windows-server-2012-hyper-v-replica-and-powershell-3-0.aspx
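To give you a feel for what that post automates, the core of a non-disruptive Hyper-V Replica test looks roughly like this. Run it on the replica host; the VM name is a placeholder, and the test copy Hyper-V creates is typically named “<VM name> - Test”.

$vmName = "app-vm-01"

# Confirm replication is healthy before testing
Get-VMReplication -VMName $vmName | Select-Object VMName, State, Health

# Create and start the test failover copy (production replication keeps running)
Start-VMFailover -VMName $vmName -AsTest
Start-VM -Name "$vmName - Test"

# ... run your validation checks against the test VM here ...

# Tear the test copy down when finished
Stop-VMFailover -VMName $vmName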