Disaster Recovery
Discussion
Hi,
I am trying to find a good guide to recovering data in the event of a disaster, but I can't find any useful guides on the web or in the library. Our situation is that we have 7 servers running our web site, files, print and email. Does anyone have experience with this? There seem to be a lot of IT professionals here. Any help would be greatly appreciated.
cheers
rac504uk
rac504uk said:
Hi,
I am trying to find a good guide to recovering data in the event of a disaster, but I can't find any useful guides on the web or in the library. Our situation is that we have 7 servers running our web site, files, print and email. Does anyone have experience with this? There seem to be a lot of IT professionals here. Any help would be greatly appreciated.
cheers
rac504uk
Do you need help on the business planning side or the IT side?
The IT side is very straightforward, assuming you have straightforward systems!
Most of our customers back up to a dedicated backup server, then to either a REV drive (SMEs) or to DLT/LTO tape for the larger ones.
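If you want to script the first leg of that flow, here's a rough Python sketch of a stage-then-archive backup (all paths and names are made up; the tar.gz just stands in for whatever you'd write out to REV or tape):

```python
# Sketch of a two-stage backup: copy live data to a staging area on the
# backup server, then roll the staged copy into a dated archive that can
# be written out to removable media. All paths here are hypothetical.
import shutil
import tarfile
from datetime import date
from pathlib import Path

def stage_and_archive(source: Path, staging: Path, archive_dir: Path) -> Path:
    """Copy source into staging, then pack the staged copy into a dated tar.gz."""
    staging.mkdir(parents=True, exist_ok=True)
    dest = staging / source.name
    if dest.exists():
        shutil.rmtree(dest)               # refresh any previous staged copy
    shutil.copytree(source, dest)         # stage 1: disk-to-disk copy
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive = archive_dir / f"backup-{date.today():%Y%m%d}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(dest, arcname=source.name)  # stage 2: archive for tape/REV
    return archive
```

Point it at a real data directory and you get a dated archive ready to be copied off to removable media - and, as mentioned above, test-restore it onto a spare server regularly.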
Once a year we test the backups onto one of our spare servers.
If you don't have this kind of knowledge in-house, you might want to get a friendly IT bod involved, as you have quite a number of servers that could go wrong there!
Give us a bit more info and we can fill in the blanks for you.
This was posted on arstechnica a fair while ago which I saved as it's IMO an excellent strategy for writing your DR plan.
----------
posted by quux, arstechnica forums:
I've written DR plans for various things but they're actually considered confidential and nondisclosable - going through and scrubbing all company data out of them isn't something I'd have fun doing for free. However there are some common elements I can share.
Find some way to place a value on whatever it is you're protecting. Could be dollars per gigabyte or server, could be dollars per hour of lost work, could be liability if loss/damage/leakage of certain information makes you lawsuit-prone, could be the cost to reconstruct from scratch, could be a combination of these and/or other factors. Think out of the box here. It's usually all but impossible to get an exact dollar figure, but estimate as well as you can. I cannot overstress the importance of making the customer (or boss or whomever) understand the value of what they stand to lose; preferably in cold, hard cash figures. This not only sells the plan but justifies its continued upkeep. Think how many DR plans you've seen funded and then sort of wither into nothing for lack of a continued sense of why it's being done.
List as many risks to the protected asset as possible. Again, think out of the box and put even your craziest risks down on paper. Lots of disasters are disasters simply because no one predicted they could happen - but if someone had, a small design change could have made it a small bump in the road rather than a business-halting episode. If we're talking about data, consider these major types of risk: loss of data, corruption of data (imagine a single cell in a spreadsheet changing from 1,000,000 to 0: corruption and loss are not the same thing!), unauthorized copying of data (i.e. corporate espionage), inability to verify data is intact and unchanged, and so on. Consider less obvious disasters as well: what if all the admins of a key financial app got sick?
Try to assign some sort of probability to each risk identified above. Obviously you attack the high-probability/high-cost risks first and work your way down to the low-probability/low-cost things last.
Once you have these three items pretty well sketched out (they'll never be 100% complete because the world is a chaotic place), the rest just starts to fall like dominoes. Consider the recovery plan for each of the risks, and whether the recovery plan costs more than the {protected asset} could ever be worth. Then start writing recovery plans - where possible a single recovery plan can cover multiple risks. But no single recovery strategy will cover ALL risks.
How granular your recovery plans are is up to you. I find that more granular is good; when the fit hits the shan it is extremely calming to get the book out, start with step one, and work through to the finish. You want to minimize the number of decisions people have to make under stress.
Pay very close attention to making sure the tools for disaster recovery are available - this is not a small part of the task! So it's time to recover all the servers from tape ... where are your OS and backup software CDs? Where are the docs for these? Where are the license keys? If the server room burned down, where will your company go to acquire new servers? What sort of lead times can they expect? Who has keys to the door locks, if the badge readers go down? What were the serial numbers on all the hardware? How big was each partition? What's the support number for all your critical hardware and software? And so on, and on... a lot of DR is knowing what you had when it's time to recreate it, and who you can call for help.
Similarly, consider where to get extra people. And generally think about how many people will be available during DR, and how long you'll work them. Whoever is in charge should be thinking in shifts at the beginning of the DR scenario - not at the end when everyone has gotten stupid tired.
Include realistic timelines to DR. Again, realize that many DR scenarios are high stress, and tools and people may not be easy to find. Something that takes 20 minutes in normal situations may take an hour or more under true DR conditions.
Have a communications plan! Be sure it points out who is in charge and who sets the priorities. And be sure it includes someone whose job is to keep everyone else off the backs of the working people. Nothing sucks more than trying to bring back a dead {whatever} with your cellphone ringing every 5 minutes as all the VPs, managers, and well-wishers call for status.
This seems silly, but give some thought and some ink to the question 'how do we know it's a disaster?' Having some threshold at which time the disaster plan is activated can be important. Similarly, think about the question 'when is the disaster over?'
Schedule DR walkthroughs or simulations. You feel silly doing these - until you notice some silly little thing (no backup software CD's? aaa!) that would completely stall a real-life disaster recovery.
Schedule DR reviews and repeat signoffs by all involved. People come and go, they forget things, etc. It'd be hilarious to discover that the DR 'chief' was a new guy completely unaware of the plan...
There's more, but this is what came off the top of my head. The DR plan isn't complete until it has been signed off on by someone with the signing authority to actually DO all the things outlined in the plan. Everyone else named in the plan should also review and sign off.
In the above I tried hard to stay away from actual DR tactics (restore from tape? have an offsite DR location with continuous data replication? outsource the whole shebang?) because these will differ from situation to situation.
DR is almost never given the attention it deserves until just after disaster strikes. Which is of course the worst time. I remember watching a fire chief testify in front of the county council one time. They took turns complaining about how many buildings had burned down that year, and trying to cut his funding. He let them all run down, and quietly asked how many buildings had NOT burned down that year, then went on to show graphs of the steadily declining number of fires in that county. His pitch was, 'We're a good fire department, but we'd rather be a great fire prevention department!' He was fully funded that year. I think this is a great story for DR planners.
----------
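To put rough numbers on quux's point about probabilities, that "attack the high-probability/high-cost risks first" ordering is just a sort by expected loss (probability times estimated cost). A quick Python sketch - every figure here is invented purely for illustration:

```python
# Rank risks by expected loss = probability x estimated cost.
# All probabilities and cost figures below are made up for illustration.
risks = [
    {"risk": "single disk failure",   "probability": 0.30, "cost": 5_000},
    {"risk": "server room fire",      "probability": 0.01, "cost": 250_000},
    {"risk": "accidental deletion",   "probability": 0.50, "cost": 2_000},
    {"risk": "key admin unavailable", "probability": 0.10, "cost": 20_000},
]

def expected_loss(r):
    """Expected loss for one risk entry over the estimation period."""
    return r["probability"] * r["cost"]

# Work the list from the top: highest expected loss gets a plan first.
for r in sorted(risks, key=expected_loss, reverse=True):
    print(f"{r['risk']:<25} expected loss: {expected_loss(r):>8,.0f}")
```

Note how the rare-but-catastrophic fire can outrank the everyday deletion once cost is factored in - which is exactly why you put a value on the assets first.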
There's also a guy who wrote a set of articles for the Novell site - they're an excellent first-hand account of a disaster (basement floor flooding), how they dealt with it and what they learned for the future. Well worth reading.
www.novell.com/coolsolutions/author/1125.html