This month, one of our data center projects was to clean up the mess of cabling that had gotten out of hand after years of adds, moves and changes to switches and other equipment. I find it interesting that with so many wireless devices around and so much talk of using virtualization and the cloud, we still spend so much time tangled in cords and cables! Cable management can often be a challenge and this had become downright embarrassing. Here is a before picture:
We took on a pretty extensive list of tasks as part of this clean up, including replacing several older networking components with a single new Cisco ASA. While it’s usually not recommended to make several logical and physical changes at the same time, since doing so invites troubleshooting nightmares later, we were taking advantage of a planned power outage and wanted to accomplish as much as we could while we had everything turned off – including rebalancing all our servers on our power circuits, updating our UPS firmware and recabling every server and workstation port in the data center.
Here is a shot of the same racks after the project was nearly complete. It’s like night and day!
Everything is labeled and color coded for ease of use. And we were lucky that all of our servers, appliances and services were powered on and returned to service without much trouble. This project also forced me to update several out-of-date diagrams and charts that are used for managing the network.
While it was a crazy weekend with our own version of a “spaghetti western”, the end result was well worth it!
At least once a year, the time comes to re-address the documentation around the IT department regarding disaster recovery. One of the things I’ve been working on improving over the last two years is our network runbook. We keep a copy of this binder in two places – in our document management system (which can be exported to a CD) and in hard copy, because when systems are down the last thing you want to be unable to access is the documentation about how to make things work again.
Here’s a rundown of what I have in mine so far. It’s in 10 sections:
- Runbook Summary – A list of all servers with their IP address, main purpose, a list of notable applications running on each and which are virtual or not. I also include a list of which servers are running which operating system, a list of key databases on servers and finally copies of some of our important passwords.
- Enterprise AD – A listing of all corporate domains and which servers perform what roles. I include all IP information for each server, the partitions and volumes on each and where the AD database is stored. Functional levels for the domain and forest are also documented.
- Primary Servers and Functions – This is similar to the Enterprise AD section, but it’s for all non-domain controllers. I list out server information for file services, database servers and their applications and backup servers. I document shares, partition and volume information (including the size), important services that should be running and where to find copies of installation media.
- ImageRight – Our document management system deserves its own section. In addition to the items similar to the servers in the previous section, I also include some basic recovery steps, dependencies and the boot sequence of the servers and services. Any other information for regular maintenance or activities on this system is also included here.
- Email / Exchange – This is another key system that deserves its own section in my office. I include all server details (like above) and also completely list out every configuration setting in Exchange 2003. This will be less of an issue with Exchange 2007 or 2010 where more of the configuration information is stored in Active Directory. However, it makes me feel better to have it written down. I also include documentation related to our third-party spam firewall and other servers related to email support.
- Backup Details – A listing of each backup server, what jobs it manages and what data each of those jobs capture.
- Telecommunications – Details about the servers and key services. I also include information regarding our auto attendants, menu trees and software keys.
- Networking – Maps and diagrams for VLANs, static IP address assignments and external IP addresses.
- Contacts & Support – Internal and external support numbers. Also include circuit numbers and other important identifying information.
- Disaster Recovery – Information about the location of our disaster recovery kit, hot line and website. A list of the contents of our disaster kit and knowledge base articles related to some of our DR tasks and hard copies of all our disaster recovery steps.
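As a rough illustration of the Runbook Summary section, the server list could also be kept as structured data, so the same records produce both the printed listing and the "which servers run which operating system" view. This is only a sketch: the `ServerEntry` fields, the server names and the addresses (taken from the 192.0.2.0/24 documentation range) are all made up for the example and do not describe our actual environment.

```python
from dataclasses import dataclass, field


@dataclass
class ServerEntry:
    """One row of a runbook summary (field names are illustrative assumptions)."""
    name: str
    ip_address: str
    purpose: str
    operating_system: str
    is_virtual: bool
    applications: list = field(default_factory=list)


def format_summary(servers):
    """Render a plain-text summary, one line per server, sorted by name."""
    lines = []
    for s in sorted(servers, key=lambda e: e.name):
        kind = "virtual" if s.is_virtual else "physical"
        apps = ", ".join(s.applications) or "none listed"
        lines.append(
            f"{s.name} ({s.ip_address}, {kind}, {s.operating_system}): "
            f"{s.purpose}; apps: {apps}"
        )
    return "\n".join(lines)


def group_by_os(servers):
    """Map each operating system to the sorted names of the servers running it."""
    groups = {}
    for s in servers:
        groups.setdefault(s.operating_system, []).append(s.name)
    return {os_name: sorted(names) for os_name, names in groups.items()}


# Placeholder data only; none of these servers or addresses are real.
EXAMPLE_SERVERS = [
    ServerEntry("mail01", "192.0.2.20", "Email", "Windows Server 2003",
                False, ["Exchange 2003"]),
    ServerEntry("file01", "192.0.2.10", "File services", "Windows 2000", False),
]

if __name__ == "__main__":
    print(format_summary(EXAMPLE_SERVERS))
    print(group_by_os(EXAMPLE_SERVERS))
```

Keeping the list in one place like this would make it easier to reprint the hard copy after a change, though a spreadsheet does the same job for a small network.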
This binder is always in flux – I’m always adding and changing information and making notes, as well as trying to keep up with changes that other team members are making to the systems they work with most. It will never be “done” but I’m hoping that whenever I have to reach for it, it will be good enough.
I’ve posted several times about working on a disaster recovery project at the office using Server 2008 Terminal Services. We’ve officially completed the testing and had some regular staffers log on and check things out. That was probably one of the most interesting parts.
One issue with end user access involved the Terminal Services ActiveX components on Windows XP SP3. The control is disabled by default as part of a security update in SP3. This can usually be fixed with a registry change, which I posted about before; however, that requires local administrative privileges that not all our testing users had. There are also ActiveX version issues if the client machine is running an XP service pack earlier than SP3.
Administrative privileges also caused some hiccups with one of our published web apps that required a Java plug-in. At one point, the web page required a Java update that could only be installed by a server administrator and this caused logon errors for all the users until that was addressed.
In this lab setting, we had also restored our file server to a different OS. Our production file server is Windows 2000 and in the lab we used Windows 2008. This resulted in some access permission issues for some shared and “home” directories. We didn’t spend any time troubleshooting the problem this time around, but when we do look to upgrade that server or repeat this disaster recovery test we know to look into the permissions more closely.
Users also experienced trouble getting Outlook 2007 to run properly. I did not have issues when I tested my own – there were some dialog boxes that needed to be addressed before it ran for the first time to confirm the username and such. While the answers to those boxes seem second nature to those of us in IT, we realized that we will need to provide better documentation to ensure that users get email working right the first time.
In the end, detailed documentation proved to be the most important aspect of rolling this test environment out to end users. In the event of a disaster, it’s likely that our primary way of sharing initial access information would be by posting instructions to the Internet. Providing easy-to-follow instructions with step-by-step screenshots that can be followed independently is critical. After a disaster, I don’t expect my department will have a lot of time for individual hand-holding for each user who will be using remote access.
Not only did this project provide an opportunity to update the procedures we use to restore services, it showed that it’s equally important to make sure that end users have instructions so they can independently access those services once they are available.
A colleague of mine asked a valid question about my last post regarding how my office IT department uses ImageRight for document management instead of something else, like a Wiki. Of course a Wiki would work just fine. So would SharePoint or any other software that helps manage documentation and allows for collaboration.
I’m not saying that ImageRight is the end-all, be-all for document management. It’s just that ImageRight is what we have. One of the big topics that came up at the Vertafore Connections conference I attended a few months ago was that many companies using the product only deploy it to one or two departments to perform very specific business functions. I’ve found that it can be used by many other business areas if one just takes the time to carve out a place for their specific documentation and processes.
There is that old “law of the instrument” that can make a familiar tool look like the panacea for all problems, but I’m not trying to make an unsuitable piece of software meet our needs. We are simply using a product that our company has already invested in, instead of looking outside our existing infrastructure for a new solution. Not only does this save licensing, installation and maintenance costs for an additional product, it encourages members of our department to use ImageRight regularly, making us better able to support the other staff members in the office. We are not only supporting the backend of the program, but interacting with it as end users as well – a win-win for everyone.
Since our implementation of ImageRight, our Network Operations team has embraced it as a way to organize our server and application documentation in a manner that makes it accessible to everyone on our team. Any support tickets, change control documents, white papers and configuration information stored in ImageRight are available to anyone in our group for reference.
This reduces version control issues and ensures that a common naming (or “filing”) structure is used across the board, making information easier to find. (For reference, an ImageRight “file” is a collection of documents organized together like a physical file that hangs in a file cabinet.) Plus, the ability to export individual documents or whole ImageRight “files” to a CD with an included viewer application is a great feature that I’m using as part of our Disaster Recovery preparations.
I have a single file that encompasses the contents of our network “runbook”. This file contains server lists and configuration details, IP and DNS information, network maps, application and service dependencies, storage share locations/sizes, support contact information, etc. It consists of text documents, spreadsheets, PDF files and other types of data. I keep a hard copy printed at my desk so I can jot notes when changes are needed, but ImageRight ensures I have an electronic backup that I can edit on a regular basis. Plus, I regularly export an updated copy to a CD that I add to the off-site Disaster Recovery box.
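The application and service dependencies recorded in the runbook are also what determine a safe start-up order after an outage. As a minimal sketch, assuming the dependencies are written down as "this service needs these others running first", Python's standard `graphlib` module (Python 3.9 or later) can work out one valid order. The service names below are hypothetical placeholders, not our real servers or ImageRight's actual architecture.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each key lists what must be running before it.
DEPENDENCIES = {
    "domain-controller": [],
    "database-server": ["domain-controller"],
    "file-server": ["domain-controller"],
    "document-management": ["database-server", "file-server"],
}


def boot_order(dependencies):
    """Return a start-up order in which every service follows its dependencies."""
    try:
        # static_order() yields each node only after all of its predecessors.
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular dependency in runbook: {exc.args[1]}") from exc


if __name__ == "__main__":
    for step, service in enumerate(boot_order(DEPENDENCIES), start=1):
        print(f"{step}. {service}")
```

A circular dependency raises an error instead of producing a misleading order, which is usually a sign that the documentation itself needs correcting.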
The value of ImageRight in a disaster scenario expands beyond just our configuration documents. In an office where we deal with large amounts of paper, encouraging people to see that those documents are added to ImageRight in a timely manner will ensure faster access to work products after an event prevents access to the office or destroys paper originals.