Observability: A new focus for cloud-native businesses

Mon, 30th Jul 2018


‘Observability' is the word on everybody's lips across the enterprise, whether you work in IT Operations, DevOps, Agile, or Site Reliability Engineering (SRE). Let's take a closer look at what observability means and how it applies both at web scale and in traditional IT.

What is Observability?

As with many new concepts in IT (such as DevOps), the industrial world was the first to coin the term observability. In this case, observability describes an attribute of systems that are internally instrumented, allowing equipment operators to see inside the otherwise hidden processes of their systems.

For example, if an operator at a water treatment plant cannot see inside opaque water pipes, they have no way of determining whether the water is flowing, which way it is flowing, or whether it is dirty or clean. That is a lack of observability.

What the operator could do is add flow gauges and sensors inside the pipes. These would be connected by telemetry to a dashboard, giving the operator full visibility, or observability, of the status of the water in the pipes.

Observability in Software Applications and Services

As in the industrial world, observability can be applied to software services. When developers write code today, they build in measurement and telemetry, which makes their applications observable.

This allows operations teams to:

Detect, contain, and alert sooner on critical incidents and events.
Investigate the root causes of problems more efficiently.
Fix incidents faster with real-time feedback on remediation efforts.
Undertake more accurate post-incident reviews and post-mortems.
Better understand the problem history and prevent recurrence.
Close feedback loops with requirements for continuous improvement.
Use analytics and machine learning to predict and prevent problems.

Observability in the Real World

Observability is becoming the norm for cloud-native businesses, which are unhindered by decades of success and the ‘legacy' systems and applications that come with that success. Large traditional enterprises that do have this history can still build observability into their existing services:

With no code changes – by streaming system-level data directly from infrastructure components (e.g. throughput, utilisation and capacity of servers, storage, virtual machines (VMs), cloud services and containers)
With minimal code changes – by deploying collectd to measure and forward specific infrastructure attributes (e.g. CPU workload, memory usage, I/O rates, or storage utilisation)
With some code changes – by deploying statsd to collect and forward metric data from inside your application (e.g. counters and timers for transaction time, round-trip time, etc.)
With major code changes – by implementing semantic logging to instrument any application activity, from ‘speeds and feeds' to business metrics (e.g. revenue, click-through rate (CTR), customer experience, etc.)
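
To make the last two options more concrete, here is a minimal Python sketch of what application-level instrumentation might look like. It formats a counter and a timer in the plain-text StatsD line format (`name:value|type`) and builds a semantic log event as JSON. The metric names, the `checkout` function, the log fields and the `127.0.0.1:8125` address are all hypothetical, chosen only for illustration.

```python
import json
import socket
import time


def statsd_line(name, value, metric_type):
    """Format one metric in the StatsD line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric over UDP, fire-and-forget. Host and port are assumptions."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


def semantic_event(action, **fields):
    """Build a structured log line as JSON so each field can be queried later."""
    event = {"action": action, **fields}
    return json.dumps(event, sort_keys=True)


def checkout(order_total):
    """Hypothetical transaction instrumented with a counter, a timer and a log event."""
    start = time.perf_counter()
    # ... the real business logic would run here ...
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    metrics = [
        statsd_line("shop.checkout.count", 1, "c"),           # counter
        statsd_line("shop.checkout.time", elapsed_ms, "ms"),  # timer
    ]
    log_line = semantic_event("checkout", revenue=order_total, currency="USD")
    return metrics, log_line
```

In practice, each string in `metrics` would be passed to `send_metric`, and `log_line` would be written to whatever log pipeline is already in place. The counter and timer cover the ‘speeds and feeds', while the `revenue` field in the log event is the kind of business metric that semantic logging makes available.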

Each of these approaches is valuable in itself, and each additional level of effort adds further value. For example, data from legacy data center infrastructure management (DCIM) or application performance management (APM) tools will help to detect and triage technical problems and answer IT questions.

Actioning Observability with AIOps

Possessing new data, graphs, KPIs and dashboards will not, on its own, make your business succeed. Observability has to be acted on to unlock its true value, whether that means triaging problems and incidents in real time, closing DevOps feedback loops, or proactively preventing problems. This means collecting observability data and aligning it with other monitoring outputs, processing it with analytics and using machine learning to begin producing automated responses. Once you have combined monitoring with observability, machine learning, predictive analytics and advanced data integration, you will have what Gartner dubs ‘Artificial Intelligence for IT Operations', or ‘AIOps'.
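
As a rough illustration of turning observability data into an automated response, the Python sketch below flags metric samples that deviate sharply from a rolling baseline and converts them into alert messages. It is a simple statistical stand-in, not machine learning and not a description of any particular AIOps product. The function names, window size and threshold are assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Return (index, value) pairs that deviate strongly from the preceding window."""
    recent = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(samples):
        if len(recent) == window:
            baseline = mean(recent)
            spread = pstdev(recent)
            # A perfectly flat window has zero spread; skip scoring to avoid dividing by zero.
            if spread > 0 and abs(value - baseline) / spread > threshold:
                anomalies.append((index, value))
        recent.append(value)
    return anomalies


def respond(anomalies):
    """Hypothetical automated response: turn each anomaly into an alert message."""
    return [f"ALERT sample {i}: value {v} outside expected range" for i, v in anomalies]


# Example: response times in milliseconds with one obvious spike.
response_times = [100, 102, 98, 101, 99, 100, 500, 101]
alerts = respond(detect_anomalies(response_times))
```

Here `alerts` contains a single message for the 500 ms sample. A production system would replace the fixed threshold with a trained model and route the alert into an incident workflow rather than returning a string.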

True business-technology alignment

For cloud-based startups delivering web-based services, observability is an exciting new concept in IT. For traditional IT Ops, it can still seem difficult to achieve. However, it is achievable for any business, even large enterprises. As an addition to traditional monitoring, observability marks a new era in IT Ops and software service delivery, moving businesses towards true business-technology alignment.

By Andi Mann, Chief Technology Advocate, Splunk