Does NZ business have its head in the sand regarding Disaster Recovery?
Disaster Recovery, or DR as it is often shortened to, is one of those things that ICT departments love to hate. Not because it's technologically difficult, but because NZ organisations seem to just 'not want to know' about it.
Disaster Recovery is often lumped together with business continuity (BC), but businesses should note the distinction: Disaster Recovery from an ICT perspective is only one small part of a business continuity plan. The important thing to bear in mind is that neither should be owned by the ICT department and left for technical staff to create. In fact, the policy on what should happen in given circumstances should be decided by the business. For example, an ICT department can't necessarily make the call on whether 2 hours of downtime is acceptable in a DR situation, or whether 2 weeks is acceptable, and in what context.
NZET spoke this week with a member of the IT department management staff at a major NZ organisation. The staff member (who wishes to remain anonymous) stated: "The board of directors do not have the desire to make a decision regarding disaster recovery, even after being highlighted by recent external audits that this was improper. The directorate delegated the responsibility to the 'already under-resourced' IT department". If this sounds familiar, it's because it happens all over the country, even in high-profile organisations. Take, for example, last month's 3-day outage at IBM's new Highbrook-based NZ cloud hosting data centre.
Here are some basic questions each organisation should ask itself at the executive level (risk management or CIO/CTO level):
- Does the organisation have an up-to-date Business Continuity and Disaster Recovery plan that has been devised at the business (not ICT) level?
- Have both plans been tested? For example:
  - Have the communication methods (SMS, telephone, web) been trialled to see if they work in the event of a disaster?
  - Are there any single points of failure? Are these acceptable?
  - Have all of the critical business systems been backed up and restored to prove integrity?
- Does the RTO (Recovery Time Objective) of the recovery plan meet the business's current needs?
- Has the SLA been clearly defined? Note that most good SLAs are multi-tiered: a lower RTO for critical systems and a higher one for non-essential systems.
- Have all BC avenues been documented for the business? For example, how should the business react to each scenario and its severity, including earthquake, volcanic eruption, aeroplane crash, bomb, fire, flood and electrical failure, as well as the less obvious, e.g. digital and physical security threats?
- Has a capacity management plan been devised for a sudden influx of data?
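To make the multi-tiered RTO question concrete, here is a minimal sketch in Python of how restore-test results might be compared against per-tier targets. The tier names, the RTO values, the system names and the `RestoreTest` structure are all hypothetical; the real figures have to be set by the business, not by ICT.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical tiers and targets; real values must be agreed by the business.
RTO_BY_TIER = {
    "critical": timedelta(hours=2),
    "standard": timedelta(days=2),
    "low": timedelta(weeks=2),
}


@dataclass
class RestoreTest:
    system: str                  # name of the business system (illustrative)
    tier: str                    # key into RTO_BY_TIER
    measured_restore: timedelta  # time taken in the most recent restore test


def rto_breaches(tests):
    """Return the systems whose last tested restore took longer than their tier's RTO."""
    return [t.system for t in tests if t.measured_restore > RTO_BY_TIER[t.tier]]


tests = [
    RestoreTest("payroll", "critical", timedelta(hours=5)),
    RestoreTest("intranet wiki", "low", timedelta(days=3)),
]
print(rto_breaches(tests))  # ['payroll']
```

A three-day restore is acceptable for the low-tier system here but a five-hour restore is not acceptable for the critical one, which is exactly the kind of call that needs business input.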
If an ICT department or engineer is left to create these metrics on their own, the result will invariably be sub-optimal for the business at best. An ideal outcome is one created in collaboration between the major business streams, including the ICT department. Regular checks from outside the department should be made to ensure that compliance requirements (including testing) are being met. Continual monitoring of capacity is also important, because consumable resources such as tape and disk storage, and expansion requirements such as servers and licensing, play a key part in the operational aspects of a working disaster recovery plan.
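As a rough illustration of capacity monitoring, the Python sketch below estimates how long backup storage will last. It assumes simple linear growth, and the function name and figures are hypothetical; real usage data would come from whatever storage or backup tooling the organisation runs.

```python
def days_until_full(capacity_gb, used_gb, daily_growth_gb):
    """Estimate days until storage is exhausted, assuming linear growth."""
    if daily_growth_gb <= 0:
        return None  # no growth, so no projected exhaustion date
    remaining_gb = max(capacity_gb - used_gb, 0)
    return remaining_gb / daily_growth_gb


# Illustrative figures only: 10 TB of backup storage, 8.5 TB used, growing 25 GB a day.
print(days_until_full(10_000, 8_500, 25))  # 60.0
```

A figure like this is only useful if someone outside the ICT department also sees it and can approve the spend before the storage runs out.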
On a final note, with NZ's growing reliance on cloud-based ICT services, and as the recent IBM data centre failure showed (it is believed to have been caused by the failure of a single IBM V7000 series storage array), businesses should pay careful attention to the DR and BC components of their service agreements. Many data centres offer multi-tiered solutions based on affordability. Never assume that a cloud-based offering backs up your data at all. Google, Dropbox and Microsoft are among the big names to have lost data in some of their well-known cloud-based services; the small print made it clear that none of these organisations was obliged in any way to ensure the integrity of your data, and that they would do so only on a best-effort basis. The moral of the story: if you can't afford to lose business information, don't put all of your eggs in one basket, and always check the small print.