The Network is Down, and other things you don't want to hear in a disaster.
If you are a systems administrator, the refrain "the network is down" is routine. Depending on the sophistication of your users, it might be a more specific lament, such as "the Internet is broken" or "email is unavailable." If you have really sophisticated users, you might actually get a real error message to work with. And if you work in a large organization, the level of sophistication could be all over the map.

Computers and their associated networks have become so much a part of daily life that when they are not available, everything grinds to a halt. But they have also become so user-friendly (to a point) that most companies have eliminated user training for all but the most sophisticated applications. End users are expected to know how to use the standard suite of office automation applications, but rarely anything more. In fact, if you gathered ten random people in a room and asked how information gets from point A to point B, nine of them would look at you blankly and wonder why you were asking them such a technical question.

This point was driven home to me by some reading I am doing. Cisco Press has released the third edition of Priscilla Oppenheimer's Top-Down Network Design. When it was first released, it was really the only reference for good network design. This edition has been updated to reflect the change in direction, and with it the need to rethink the preconceptions we have been building networks under, as we move into the new paradigm of the glass cloud. But what got my attention was the section on disaster recovery. First, she asks this question:
Have you figured out what to do if the disaster involves a serious disease where the server and network administrators need to be quarantined?
If you are looking at this question and your first response is, "it will never happen," I would urge you to think again, especially if you are in line or senior management. As we consolidate, skill sets are being consolidated and jobs are being eliminated. If your IT shop is representative, more than half of the staff are gone. And that is after a period of downsizing in which more than a quarter of the staff were let go since 2000. Or, to put it another way, a really bad cold could knock out your key people for several days, even if they are not the ones who get sick! But the second point that really hit home was this:
Not only must the technology be tested, but employees must be drilled on the actions they should take in a disaster...people should practice working with the network in the configuration it will likely have after a disaster.... The drills should be taken seriously and should be designed to include time and stress pressures to simulate the real thing.
To put it another way, do your users know what their responsibilities are when the network is down, and do they know how to do a little troubleshooting to help out? Most corporate disaster plans have a lot of "magic happens here" sections, where the magic is to be filled in by the responsible department. And if you are the responsible person in that department, I am willing to bet that disaster planning, or continuity of operations planning, is hardly the top item on your to-do list. In fact, I would be surprised if it was even on your to-do list. Yet an emergency can strike at any moment, with or without a plan, and unless you really are a wizard, magic does not happen here. So I encourage you to do your part. You might not get the boss to agree to send everyone to network training, but every little bit helps. And at the end of the day, it might actually make your job easier.
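As one illustration of what "a little troubleshooting" could look like, here is a minimal sketch in Python of a triage helper that turns "the network is down" into something more specific. Nothing in it comes from Oppenheimer's book. The function names, the three-step order (local network, then name lookup, then outside connectivity), the wording of the messages, and the example address and hostname are all my own assumptions, and a TCP connection attempt is only a rough stand-in for a real reachability test.

```python
import socket


def can_resolve(hostname):
    """Return True if the hostname resolves to an address (a basic DNS check)."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        return False


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def describe_outage(local_ok, dns_ok, outside_ok):
    """Turn three yes/no check results into a message a help desk can act on."""
    if not local_ok:
        return "Cannot reach the local network (check the cable or Wi-Fi)."
    if not dns_ok:
        return "The local network is up, but name lookups are failing."
    if not outside_ok:
        return "The local network and name lookups work, but outside sites are unreachable."
    return "Basic connectivity looks fine; the problem may be one specific application."


if __name__ == "__main__":
    # Hypothetical values: replace with your own gateway or intranet server
    # and a site your organization actually depends on.
    local_ok = can_connect("192.0.2.1", 80)
    dns_ok = can_resolve("example.com")
    outside_ok = dns_ok and can_connect("example.com", 443)
    print(describe_outage(local_ok, dns_ok, outside_ok))
```

Even a report this crude tells the administrator where to start looking, which is more than "the network is down" ever does.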