Disaster recovery

Tue, 1st Sep 2009

If your answer is “backup tapes”, you’re doing it wrong!

After many years as a pilot for the RAF, my father flew commercial airliners and joked that he was paid very well to watch a computer do his job.

Funnily enough, he wasn’t really joking - other than takeoff and landing, commercial planes largely fly themselves. So why was he paid so well? If something went wrong he was the disaster recovery plan. He had the skills and experience to ensure the plane landed safely; the role of IT is no different.

More than ever before, technology underpins most companies’ operations to the extent that any kind of service interruption can bring everything to a halt. Most business processes have become real-time: instant messaging and email have become the communications norm, there is a move toward cloud computing in the enterprise, and data volumes are growing exponentially. Business continuity is the primary driver of disaster recovery planning, and I guarantee that if you ask your CEO what their top three concerns are, business continuity will be one of them.

Recently my team performed our six-monthly test tape restore exercise - it took almost six days to restore the data! When it comes to archiving, tape is king and I wouldn’t dream of stopping our nightly tape backup regimens; but business continuity is all about minimising downtime, so I find it staggering that some IT professionals still rely solely on backup tape as an answer to disaster recovery.

There is no one-size-fits-all solution to disaster recovery planning. It’s a process the entire company needs to be involved in, but the IT department will have the most significant input. From an IT perspective, there are only two basic questions you should be asking yourself to ensure your plans are effective:

1. What is critical to your company?

Identify which systems would affect your company’s operations if they failed or suffered degraded performance. Then consider how long each system could be degraded or out of service for before having a significant impact on your business - one, four, 24 hours? Does it depend on the time of the month? What would the cost be in terms of lost revenue or productivity?

What about the people? Are you and your department all located on one site? Do you support other sites? Are things documented so someone else could step in if needed?

What if road workers accidentally cut all communications, including the internet connection, to your office for 48 hours? How would you interact with your suppliers and your customers? What would you do about email? How would you process the payroll?

2. What constitutes a disaster?

Of course, there are the obvious ones, like fire, earthquake or an extended power outage. In these instances, what are the key systems you’ll need and who are the staff you would need back up and running most quickly? Do you have a second site staff could work from? And could you be replicating data or systems to it now in preparation? How would you communicate with your staff and co-ordinate them?

Then how about the less obvious disasters: a chemical spill, or pandemic like H1N1? Potentially all your systems and data could be fine, but nobody allowed physical access to the building. Is VPN access available and do key staff members have laptops? Do they take them home?

What would you do if just one of your critical systems failed? Should you be running a redundant backup system, or is it sufficient to mirror just the data so it can be instantly accessed?

Maybe your company can’t justify the cost and will accept the operational downtime required to restore from tape, but that’s not your call to make. Effective disaster recovery is about having a documented plan in place, preferably with recovery time-frames, which is communicated to and signed off by management.
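One way to put those recovery time-frames in front of management is to compare, system by system, how long the business says it can tolerate an outage with how long recovery would actually take. The sketch below is illustrative only: the class, the function, the system names and every figure are hypothetical, with 144 hours standing in for a tape restore of roughly six days.

```python
from dataclasses import dataclass


@dataclass
class CriticalSystem:
    name: str
    max_downtime_hours: float  # longest outage the business says it can tolerate
    recovery_hours: float      # estimated time to recover with the current method
    cost_per_hour: float       # estimated lost revenue or productivity per hour


def recovery_gaps(systems):
    """Return (name, shortfall_hours, exposure) for every system whose
    recovery takes longer than the business can tolerate, worst first."""
    gaps = []
    for s in systems:
        shortfall = s.recovery_hours - s.max_downtime_hours
        if shortfall > 0:
            gaps.append((s.name, shortfall, shortfall * s.cost_per_hour))
    return sorted(gaps, key=lambda g: g[2], reverse=True)


# Hypothetical figures, for illustration only.
systems = [
    CriticalSystem("email", max_downtime_hours=4, recovery_hours=144, cost_per_hour=500.0),
    CriticalSystem("payroll", max_downtime_hours=24, recovery_hours=144, cost_per_hour=200.0),
    CriticalSystem("archive", max_downtime_hours=240, recovery_hours=144, cost_per_hour=10.0),
]

for name, shortfall, exposure in recovery_gaps(systems):
    print(f"{name}: recovery exceeds tolerance by {shortfall:g}h, "
          f"estimated exposure {exposure:,.0f}")
```

A system that appears in this list is a decision for management: either fund a faster recovery method or formally accept the downtime.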

The above may seem like two simple questions, but I assure you the more you think on them, the more effective your disaster recovery planning will be. They will no doubt lead to the next obvious question: ‘Assuming you’ve failed over in a disaster situation, how do you get back to normal operations?’

Lastly, don’t forget disaster recovery plans need to be reviewed, updated (if needed) and tested regularly! How else will you have the skill to land that plane if things really go wrong?

Peter Mangin

Peter Mangin is the CIO for Saatchi & Saatchi New Zealand. Mangin also manages the New Zealand operations of Vivaki as well as other Publicis Groupe companies. He has worked in IT for more than 15 years and has been working in advertising for the last nine years.

Phone: +64 9 355 5000
Email: peter.mangin@saatchi.co.nz
Web: www.saatchi.com
