Observability: A new focus for cloud-native businesses

‘Observability’ is the word on everybody’s lips across the enterprise, whether you’re in IT Operations, DevOps, Agile, or Site Reliability Engineering (SRE). Let’s take a closer look at what observability means and how it applies to both web-scale businesses and traditional enterprises.

What is Observability?

As with many new concepts in IT (such as DevOps), the term observability was first coined in the industrial world. There, observability describes an attribute of systems that are internally instrumented, allowing equipment operators to see inside the otherwise hidden processes of their systems.

For example, if an operator at a water treatment plant can’t gain visibility of the inside of opaque water pipes, they have no way of determining if the water is flowing, which way it’s flowing, or whether the water is dirty or clean – a lack of observability.

What the operator could do is add flow gauges and sensors inside the pipes. These would be connected by telemetry to a dashboard, allowing the operator to gain full visibility, or observability, of the status of water in the pipes.

Observability in Software Applications and Services

As in the industrial world, observability can be applied to software services. When developers write code today, they include measurement and telemetry, which make their applications observable.

This allows operations teams to:

  • Detect, contain, and alert sooner on critical incidents and events.
  • Investigate the root causes of problems more efficiently.
  • Fix incidents faster with real-time feedback on remediation efforts.
  • Undertake more accurate post-incident reviews and post-mortems.
  • Better understand the problem history and prevent recurrence.
  • Close feedback loops with requirements for continuous improvement.
  • Use analytics and machine learning to predict and prevent problems.

Observability in the Real World

Observability is becoming the norm for cloud-native businesses, which are unhindered by decades of success and the ‘legacy’ systems and applications that come with that success. Large traditional enterprises that do have this history can still build observability into their existing services:

  • With no code changes – by streaming system-level data directly from infrastructure components (e.g. throughput, utilisation, capacity, etc. of servers, storage, virtual machines (VMs), cloud services, containers, etc.)
  • With minimal code changes – by deploying collectd to measure and forward specific infrastructure attributes (e.g. CPU workload, memory usage, I/O rates, or storage utilisation)
  • With some code changes – by deploying statsd to collect and forward metric data from inside your application (e.g. counters and timers for transaction time, round-trip time, etc.)
  • With major code changes – by implementing semantic logging to instrument any application activity, from ‘speeds and feeds’ to business metrics (e.g. revenue, click-through rate (CTR), customer experience, etc.)
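
To make the last two approaches more concrete, here is a minimal Python sketch rather than a definitive implementation. It assumes the widely used StatsD plain-text line protocol (`<name>:<value>|c` for counters, `<name>:<value>|ms` for timers) sent over UDP, and it uses JSON as one possible format for semantic log events. The metric names, the `checkout_completed` event, its fields, and the host and port are all hypothetical and chosen only for illustration.

```python
import json
import socket
import time


def statsd_counter(name: str, value: int = 1) -> str:
    """Format a counter in the StatsD line protocol: <name>:<value>|c"""
    return f"{name}:{value}|c"


def statsd_timer(name: str, millis: float) -> str:
    """Format a timer in the StatsD line protocol: <name>:<value>|ms"""
    return f"{name}:{millis:.0f}|ms"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send. Host and port are assumptions for this sketch
    (8125 is the port StatsD daemons conventionally listen on)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


def semantic_event(event: str, **fields) -> str:
    """Build one self-describing log line as JSON, so that technical and
    business fields can be searched by name instead of parsed from free text."""
    record = {"event": event, **fields}
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    start = time.perf_counter()
    # ... the application would handle a (hypothetical) checkout request here ...
    elapsed_ms = (time.perf_counter() - start) * 1000

    # 'Some code changes': counters and timers from inside the application.
    print(statsd_counter("checkout.requests"))
    print(statsd_timer("checkout.transaction_time", elapsed_ms))

    # 'Major code changes': a semantic log event that carries business metrics.
    print(semantic_event("checkout_completed", revenue=49.95, currency="NZD"))
```

In a real service the formatted metric lines would be passed to `send_metric` (or to an existing client library), and the semantic events would go through the application’s normal logging pipeline; both are printed here so that the sketch runs without any external daemon.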

While each of these approaches is valuable in itself, each additional level of effort adds further value. For example, data from legacy data centre infrastructure management (DCIM) or application performance management (APM) tools will help to detect and triage technical problem events and answer IT questions.

Actioning Observability with AIOps

Possessing new data, graphs, KPIs and dashboards alone will not allow your business to succeed. Observability has to be actioned for you to unlock its true value, whether that means triaging problems and incidents in real time, closing DevOps feedback loops, or proactively preventing problems.
 
This means collecting observability data and aligning it with other monitoring outputs, processing it with analytics and using machine learning to begin producing automated responses. Once you have combined monitoring with observability, machine learning, predictive analytics and advanced data integration you will have what Gartner dubs ‘Artificial Intelligence for IT Operations’ or ‘AIOps.’
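
As a rough illustration of the ‘analytics’ step, the following Python sketch flags metric samples that deviate sharply from their recent history. It is a deliberately simple rolling z-score check standing in for the analytics and machine learning a real AIOps platform would apply, not a description of any vendor’s method. The function name, the window size, the threshold and the sample response times are all assumptions made for this example.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Yield (index, value) for each sample that lies more than `threshold`
    standard deviations from the mean of the preceding `window` samples."""
    history = deque(maxlen=window)
    for index, value in enumerate(samples):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A perfectly flat history (sigma == 0) is skipped in this sketch
            # to avoid dividing the signal by zero variance.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield index, value
        history.append(value)


if __name__ == "__main__":
    # Hypothetical response times in milliseconds, with one obvious spike.
    response_times = [100, 102, 98, 101, 99, 100, 250, 101]
    for index, value in detect_anomalies(response_times):
        # An automated response (alert, ticket, rollback) would be triggered here.
        print(f"anomaly at sample {index}: {value} ms")
```

The point of the sketch is the shape of the loop: observability data comes in, an analytic decides what is abnormal, and the result drives an action rather than just another dashboard.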

True Business-Technology Alignment

For cloud-based startups delivering web-based services, observability is an exciting new concept in IT. For traditional IT Ops it can still seem difficult to achieve, but it is achievable for any business, even large enterprises. As an addition to traditional monitoring, observability marks a new era in IT Ops and software service delivery, helping businesses move towards true business-technology alignment.

By Andi Mann, Chief Technology Advocate, Splunk
