There is a myriad of networking problems that can make you pull your hair out and scream at your laptop. One that sits near the top of the list is the time it takes to get to the root cause, or 'time-to-visibility'. Whether the issue is an outage, an application problem or an attack, every minute counts when you are trying to correct it. To state the blindingly obvious: the shorter the time-to-visibility, the more you can do with the resources you’ve got.
As any engineer will attest, the most difficult problems to troubleshoot – the ones that can absorb 90% of available resources – are the intermittent ones. Problems that are sometimes there and sometimes not are just plain irritating. Across the industry, days, if not weeks, of effort are wasted chasing intermittent faults.
The traditional strategy employed by most organisations is to wait for a user to report a fault once, or maybe twice, then ship in a probe of some description to capture a trace, which can then be analysed to help diagnose the problem. Ten years ago, when networks ran at sub-gigabit speeds and the network wasn’t mission-critical, this was probably OK. But things have changed – a lot. In organisations where the network is the business, or where the business depends critically on the network for operational continuity, the concept of pervasive capture is catching on. Why? Because when traffic is recorded continuously, the evidence of an intermittent fault has already been captured by the time anyone reports it, and that, accompanied by the right visualisation tools, is the key to reducing time-to-visibility. Consider the following situation:
"We want to give you our money, but you won’t let us"
Service Provider X is having a nightmare. At the same time every day, an unknown anomaly causes its billing system to go dark for about an hour, and the ops team has no idea why. The fallout from a situation like this is fairly obvious: the outage effectively builds a cement wall between the company and incoming payments. It’s not unreasonable to assert that, without proper visibility, fixing a network of this magnitude could take a couple of weeks. It is also safe to assume that a considerable number of customers will become fed up with not being able to pay their bills and opt to give their money to another provider.
"This may take a while, and it won’t be cheap"
Along with the lost revenue, Service Provider X will also have to bear the cost of actually fixing the problem. Without accurate visibility into the network, extra resources will almost certainly be needed to detect and locate the anomaly. Perhaps the engineering team will get lucky and find the problem immediately, but in all likelihood it won’t be that fortunate. And if the company needs outside help to correct the issue, those costs multiply further.
"I’m mad and want to tell the world"
There are also reputation and brand considerations to take into account. We no longer live in a society where an angry customer (or former customer) tells friends and family about the incident in occasional conversations. Today, consumers who feel they are on the short end of the CRM stick express their anger globally, via social media. With each day that passes, more customers will voice their dissatisfaction with your organisation. The last thing you want is your company’s Twitter handle trending alongside a "#badcustomerservice" hashtag. As a side benefit, reduced time-to-visibility could be your marketing department’s best friend.
The example illustrates that there are actually two metrics at play here – time-to-visibility and cost-of-blindness. The concept isn’t rocket science, but a lot of organisations miss it. As networks expand to 40Gbps and 100Gbps, it’s an issue that deserves far more attention in the very near future. Instead of suffering extended downtime, lost revenue, fix costs and a damaged reputation, organisations that invest in visibility can detect problems in a matter of minutes and avoid these pitfalls. Have a frank discussion about your time-to-visibility and cost-of-blindness, and decide whether it’s time to take another look at pervasive packet capture.
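The article doesn’t define a formula for cost-of-blindness, but one way to make the link between the two metrics concrete is a simple linear model: revenue lost for every day the fault stays undiagnosed, plus the engineering effort and any outside help spent finding it. The sketch below is a hypothetical illustration only. The `Incident` fields, the `cost_of_blindness` function and all the numbers are invented for the example, and reputational damage is deliberately left out because it is hard to price.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical inputs for one recurring outage; all values are illustrative."""
    days_to_visibility: float            # days until the root cause is found
    outage_hours_per_day: float          # e.g. the billing system dark for ~1 hour a day
    revenue_lost_per_outage_hour: float  # payments that could not be taken
    engineer_hours: float                # total engineering effort spent on the hunt
    engineer_rate: float                 # loaded cost per engineer-hour
    external_costs: float = 0.0          # consultants, shipped-in probes, etc.


def cost_of_blindness(i: Incident) -> float:
    """Hypothetical linear model: lost revenue grows with every day of blindness,
    plus engineering effort and outside help. Reputation is not modelled."""
    lost_revenue = i.days_to_visibility * i.outage_hours_per_day * i.revenue_lost_per_outage_hour
    fix_cost = i.engineer_hours * i.engineer_rate + i.external_costs
    return lost_revenue + fix_cost


# Made-up scenarios: two weeks of hunting versus diagnosing after the first outage.
blind = Incident(days_to_visibility=14, outage_hours_per_day=1.0,
                 revenue_lost_per_outage_hour=10_000, engineer_hours=240,
                 engineer_rate=100, external_costs=20_000)
visible = Incident(days_to_visibility=1, outage_hours_per_day=1.0,
                   revenue_lost_per_outage_hour=10_000, engineer_hours=4,
                   engineer_rate=100)

print(f"{cost_of_blindness(blind):,.0f}")    # 184,000
print(f"{cost_of_blindness(visible):,.0f}")  # 10,400
```

Under these invented numbers, shortening time-to-visibility from two weeks to one day cuts the modelled cost-of-blindness by more than an order of magnitude. The point of the sketch is the shape of the relationship, not the figures themselves.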
Tim Nichols is vice president of marketing at Endace. This article originally appeared in the May issue of IT Brief.