Xtracta: Disaster recovery and data loss prevention
Data protection, both physical and digital, is a key part of protecting a business, an organisation or even an individual from the loss of critical information.
All too often when talking to organisations, I find that little thought has been given to the various scenarios that could occur, or to what action plan would be implemented if they did.
A term you will be reading throughout this blog is "what-if" - it sits at the heart of my decision making in this area. Being analytical and considering rare scenarios can open your thinking to new ideas and possibilities, and to the associated risks to your organisation.
Unfortunately it's difficult to insure against these risks, so it's best to protect yourself from the start. The topic is massive, so I will cover just a few interesting points in this post.
Backup is not Disaster Recovery
Many organisations and people believe that as long as they have a good backup procedure, they are protected from disaster. While (depending on the backup strategy) this may well hold true, many organisations don't think about what is actually required to get (and excuse the pun) back up and running again.
For example, we decided to trial a restoration from backups of one of our most popular Xtracta App processing sites to see how it would work. We had a plan for this that we had built, maintained and updated since launching the site - but our theory had a single flaw. While we had then (and at the time of writing still have) a 3-month retention policy, we hadn't factored in the growing file size of source documents.
My guess is that the reasons behind this come down to changes in technology:
• Higher-resolution cameras on smartphones for photographed documents coming in
• Higher resolutions as standard on scanners
• Faster internet connections (both mobile and fixed) and a decrease in the cost of data transmission meaning users weren't worried about the sizes of files they were sending
Our storage capacity itself was fine, as we had massively over-provisioned it from the beginning - and when storage capacity is more than adequate, not much thought is given to the growth rate in data while your mind is on many other parts of the business.
In any case, after talking with a customer who had many problems with their infrastructure and needed to restore from backups (and helping them out a little in the process), it really got me thinking: what would the challenges be for us if we found ourselves in a similar situation? The key challenge the customer faced was the time required to restore from backup and have their system working again - in their case, 25 hours. Luckily it happened over a weekend and they run Monday-Friday, but it's a scary prospect for a key production system.
When looking at our projected data growth (and even our current size), it became clear we would face a similar issue should we ever need to restore from backups. And unfortunately the Xtracta App does not take the weekend off either!
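To make the problem concrete, here is a back-of-envelope sketch in Python. The data sizes, growth rate and throughput are purely illustrative assumptions, not our real figures - the point is how quickly a comfortable restore window can become an uncomfortable one:

```python
# Back-of-envelope restore-time estimate: how long would a full
# restore from backup take as the data set grows?
# All figures below are illustrative assumptions.

def restore_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Hours to stream `data_gb` of backup data at `throughput_mb_s` MB/s."""
    seconds = (data_gb * 1024) / throughput_mb_s
    return seconds / 3600

# A modest 2 TB data set over a 50 MB/s effective restore pipe:
print(round(restore_hours(2048, 50), 1))   # → 11.7 hours

# The same data set after a year of 15% month-on-month growth:
grown_gb = 2048 * (1.15 ** 12)
print(round(restore_hours(grown_gb, 50), 1))  # → 62.3 hours
```

Even with generous assumptions, document-size growth alone can push a restore from "lost weekend" to "lost week".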
As a highly available SaaS application, we need to recover almost instantaneously from any issue to maintain service availability. At any point in time there will be a client who desperately needs the Xtracta App to meet a deadline, and there is nothing worse for them than finding it unavailable in their moment of need and stress.
As such, we implemented greater real-time replication and redundancy so that, firstly, a failure wouldn't necessarily take our systems down and, secondly, if it did, our ability to get up and running again would be significantly expedited.
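As an illustration of the principle (and not of our actual replication stack), here is a minimal Python sketch of one building block of off-site redundancy: copy a file, then confirm the replica is byte-identical before trusting it. The paths are examples only:

```python
# Minimal sketch of a verified off-site copy step: copy each file,
# then confirm the replica's checksum matches before trusting it.
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def replicate(src: Path, dst_dir: Path) -> Path:
    """Copy `src` into `dst_dir` and verify the replica before returning it."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    shutil.copy2(src, dst)  # copy2 preserves timestamps/metadata
    if sha256_of(src) != sha256_of(dst):
        raise IOError(f"replica of {src} failed verification")
    return dst
```

Real deployments would use continuous, block-level or database-native replication rather than file copies - but whatever the mechanism, the replica should be verified, not assumed.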
Businesses that do backups need to think the same way. If a fire at their office destroys all of their servers, they may have data stored off-site, but it's not a 5-minute job to:
• buy new servers
• re-setup the network
• restore all systems/data from backup
The above processes will often take days or even longer, during which your business is halted. In such a case it's highly recommended to have a disaster recovery (DR) site that can take over very quickly.
Location, Location, Location
Holding backups and running DR sites are great, and congratulations to those who are already doing so! One of the key things to consider is where these are stored or run from, and what that means in a disaster. Stored too close and they could be at risk from whatever destroyed the primary data/computing site; too far away and they could be too slow to access.
Take Hurricane Sandy, for example, which brought devastation to the east coast of the USA. A major cause of data hosting facilities going down was basement flooding taking out power sources. While writing this I can already hear the calls of "that's fine, they all have backup generators" - which is true. The problem was that fire regulations meant fuel for the generators had to be stored in the basements. Even that wouldn't have been a problem, but unfortunately the laws of physics meant the fuel pumps had to be there too.
So while the fuel was fine in its storage tanks and the servers/generators were fine on the upper storeys, fuel simply couldn't be brought to the generators. Some enterprising facilities formed human chains, handing buckets from the tanks up the stairs to the generators' small reserve tanks a few floors up - but that isn't what you want to rely on in a DR situation.
So companies further down the coast whose premises had been extensively damaged, and who relied on a New York-based DR site to kick in, were on thin ice. A better strategy would have been to look at central or West Coast hosting facilities, which rarely get hit by the same event as the East Coast. In the situation above, even 500km between the sites didn't make a difference, since both were hit by the same event.
And for the business whose director stores portable hard drives with backups at their home 5km from the office - that home could well be hit by the same tornado or inundated by the same flood. Think about these risks and store backups/choose DR locations accordingly.
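If you want to sanity-check the separation between your own primary and DR sites, the great-circle distance between two points is a few lines of Python (the coordinates below are illustrative examples, not a recommendation of any particular facility):

```python
# Rough check on DR-site separation: great-circle (haversine) distance
# between a primary site and a candidate DR site.
from math import asin, cos, radians, sin, sqrt

def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius

# Example: Auckland CBD vs a Sydney facility - comfortably outside any
# single storm, quake or flood footprint:
print(round(distance_km(-36.85, 174.76, -33.87, 151.21)))  # roughly 2150 km
```

Distance alone isn't the whole story, of course - the list below matters just as much - but it's a quick first filter.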
Other things to consider include:
• Power grid - different is better
• Connections between the locations and how your telco's network could actually utilise these links
• Economic viability of your DR provider - if your city gets hit, could the DR provider itself go out of business?
There are many more, but always think of the "what-ifs".
Auditing your SaaS provider / Thinking of new risks
SaaS (software as a service) companies have soared in popularity lately. It's a great model: OPEX-only costs, generally no contracts (thereby avoiding any CAPEX), and systems that are very easy to roll out - we use a number at Xtracta for a variety of functions (holding client data is not one of them, however!). While the SaaS model makes data protection easier in many ways, it also makes measuring risk more difficult from backup, DR and privacy standpoints. Privacy is beyond the remit of this blog, but from a data protection standpoint it's easy to assume your provider has adequate backup and DR systems. Is that truly the case, and if you consider "what-if" situations, can you see any flaws in their models?
A good example, and I believe an under-rated risk here in New Zealand, is our connectivity to the rest of the world. New Zealand has a single international submarine cable supplier, which provides two submarine cables - one going to Australia and the other to Hawaii. The landing stations are close, perhaps 50km apart, on separate coasts of the North Island. Looking at what happened in the 2011 Tōhoku earthquake in Japan, multiple submarine cables went offline and it was an extended period before they were fixed.
Or, more recently, the SEA-ME-WE 3 West cable linking Perth and Singapore was cut and couldn't be fixed for months because the Indonesian government didn't grant permits for its repair (perhaps "bribes" were not included within the cable's insurance policy). In any case, it's quite possible this situation could happen in New Zealand, and the economic impact would be devastating. Organisations that rely on services like Dropbox, Gmail etc. would lose access to all of them - even if they are only using them to transact with other parties in NZ.
While data loss due to system failures hasn't been a major issue for SaaS providers, possibly a greater risk is the business continuity of the providers themselves. Look at cases like Megaupload, or back to the dot-com busts of the early 2000s, and imagine if one of them had been a critical supplier of services to your organisation's operations. Consider the SaaS provider, get to know its history and, if possible, protect yourself with your own backups of the data held on their systems - and a plan to roll out a replacement for their software, just in case.
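A minimal sketch of that last idea in Python: pull your records out of the provider and keep dated snapshots on storage you control. The `fetch_records` call is a hypothetical stand-in for whatever export API your provider actually offers:

```python
# Sketch of keeping your own backups of SaaS-held data: export the
# records and write a dated JSON snapshot to storage you control.
import json
from datetime import date
from pathlib import Path

def snapshot(records: list, dest_dir: Path) -> Path:
    """Write `records` to a dated JSON file in `dest_dir` and return its path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    out = dest_dir / f"saas-snapshot-{date.today().isoformat()}.json"
    out.write_text(json.dumps(records, indent=2))
    return out

# records = fetch_records()          # hypothetical call to the provider's export API
# snapshot(records, Path("/backups/saas"))  # example destination, run on a schedule
```

Run something like this on a schedule, replicate the snapshots off-site as with any other backup, and a provider outage (or disappearance) stops being an existential problem.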
For those Physical Files
The first thing, and I am compelled to say it: digitise those files. Any physical document is at far greater risk than a digital file (apart from in a rare situation like a coronal mass ejection destroying our ability to use electronics on Earth, but that's another story!) - primarily because the only way to "back it up" is to make another physical copy, which is labour-intensive and inefficient. A number of customers tell us they think they need to keep physical copies because particular legislation in their jurisdictions requires it. Our experience is that this is not correct: in many jurisdictions a digital copy is fine, as long as it is a "like for like" representation and is properly protected from loss.
The reality is that digital data is easier to replicate to locations many kilometres away, can exist in nearly infinite copies and can be used by multiple people at the same time. It also protects files from simply disappearing - as we have seen with historical books with pages ripped out, documentation missing from a filing cabinet, or mail never quite making it to the organisation even though it went into the mailbox!
It wasn't until the Christchurch earthquake that even I had really thought about another risk of holding physical data: access. After the earthquake, the entire CBD of the city was shut down as a "red zone" - it was highly dangerous and the public were kept out. Businesses whose premises had survived the quake couldn't retrieve their data to keep running from new premises.
A similar situation has arisen in Fukushima, with the radiation-induced evacuation zone becoming off-limits. The stories we have heard from businesses in the red zone include clandestine operations where balaclava-clad staff would sneak in at night to recover important data, at great peril to their own well-being. If that data had been in digital format at a DR site in another city, they would have been up and running sooner, focusing on rebuilding their business rather than smuggling themselves into their old premises.
I hope this gets readers thinking about the "what-ifs", and about how they can protect their data and ensure business continuity.
By Jonathan Spence - Xtracta Blog