Observability: A new focus for cloud-native businesses

‘Observability’ is the word on everybody’s lips across the enterprise, whether you work in IT Operations, DevOps, Agile, or Site Reliability Engineering (SRE). Let’s take a closer look at what observability means and how it applies both at web scale and in traditional environments.

What is Observability?

As with many new concepts in IT (such as DevOps), the term observability was first coined in the industrial world. There, observability describes an attribute of systems that are internally instrumented, allowing equipment operators to see inside the otherwise hidden processes of their systems.

For example, if an operator at a water treatment plant cannot see inside opaque water pipes, they have no way of determining whether the water is flowing, which way it is flowing, or whether it is dirty or clean. That is a lack of observability.

What the operator could do is add flow gauges and sensors inside the pipes. These would be connected by telemetry to a dashboard, giving the operator full visibility, or observability, of the status of water in the pipes.

Observability in Software Applications and Services

As in the industrial world, observability can be applied to software services. When developers write code today, they build in measurement and telemetry that make their applications observable.

This allows operations teams to:

  • Detect, contain, and alert sooner on critical incidents and events.
  • Investigate the root causes of problems more efficiently.
  • Fix incidents faster with real-time feedback on remediation efforts.
  • Undertake more accurate post-incident reviews and post-mortems.
  • Better understand the problem history and prevent recurrence.
  • Close feedback loops with requirements for continuous improvement.
  • Use analytics and machine learning to predict and prevent problems.

Observability in the Real World

Observability is becoming the norm for cloud-native businesses, which are unhindered by decades of success and the ‘legacy’ systems and applications that come with it. Large traditional enterprises that do have this history can still implement observability in their existing services:

  • With no code changes – by streaming system-level data directly from infrastructure components (e.g. throughput, utilisation and capacity of servers, storage, virtual machines (VMs), cloud services, containers, etc.)
  • With minimal code changes – by deploying collectd to measure and forward specific infrastructure attributes (e.g. CPU workload, memory usage, I/O rates, or storage utilisation)
  • With some code changes – by deploying statsd to collect and forward metric data from inside your application (e.g. counters and timers for transaction time, round-trip time, etc.)
  • With major code changes – by implementing semantic logging to instrument any application activity, from ‘speeds and feeds’ to business metrics (e.g. revenue, click-through rate (CTR), customer experience, etc.)
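To make the last two options more concrete, here is a minimal Python sketch of what instrumenting an application might look like. It formats a counter and a timer in the plain-text line protocol commonly used by StatsD-style collectors, and builds a ‘semantic’ log line as JSON key/value pairs. The metric names, the checkout function, the revenue field and the agent address are all hypothetical and chosen only for illustration; this is not a description of any particular vendor's implementation.

```python
import json
import socket
import time

# Assumption: a StatsD-compatible agent listens on this UDP address.
STATSD_ADDR = ("127.0.0.1", 8125)


def statsd_counter(name, value=1):
    """Format a counter metric as 'name:value|c'."""
    return f"{name}:{value}|c"


def statsd_timer(name, millis):
    """Format a timer metric as 'name:value|ms'."""
    return f"{name}:{millis}|ms"


def send_metric(line, addr=STATSD_ADDR):
    """Send one metric line over UDP (fire-and-forget, no reply expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), addr)


def semantic_event(event, **fields):
    """Build one self-describing log line as JSON key/value pairs."""
    return json.dumps({"event": event, **fields}, sort_keys=True)


def checkout(order_total):
    """Hypothetical business function, instrumented from the inside."""
    start = time.perf_counter()
    # ... the real checkout logic would run here ...
    elapsed_ms = int((time.perf_counter() - start) * 1000)

    metrics = [
        statsd_counter("shop.checkout.count"),
        statsd_timer("shop.checkout.time", elapsed_ms),
    ]
    # The log line carries a business metric (revenue) next to a technical one.
    log_line = semantic_event(
        "checkout_completed", revenue=order_total, duration_ms=elapsed_ms
    )
    return metrics, log_line
```

In a real service, each metric line would be passed to send_metric and the log line written to whatever log pipeline is already in place. Returning them here simply keeps the sketch easy to inspect.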

Each of these approaches is valuable in itself, and each additional step of effort adds further value. For example, data from legacy data centre infrastructure management (DCIM) or application performance management (APM) tools will help to detect and triage technical problem events and answer IT questions.

Actioning Observability with AIOps

Possessing new data, graphs, KPIs and dashboards alone will not make your business succeed. Observability has to be acted on to unlock its true value, whether that means triaging problems and incidents in real time, closing DevOps feedback loops, or proactively preventing problems.
 
This means collecting observability data, aligning it with other monitoring outputs, processing it with analytics, and using machine learning to begin producing automated responses. Once you have combined monitoring with observability, machine learning, predictive analytics and advanced data integration, you will have what Gartner dubs ‘Artificial Intelligence for IT Operations’ or ‘AIOps’.
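As a rough illustration of moving from data to action, the Python sketch below watches a stream of metric values and triggers a response when a value deviates sharply from the recent baseline. It uses a simple rolling z-score as a stand-in for the analytics and machine learning an AIOps platform would apply. The class name, the window and threshold values and the sample latencies are assumptions made for the example, not part of any product.

```python
from collections import deque
from statistics import mean, stdev


class AnomalyDetector:
    """Flags values that deviate strongly from a rolling window of samples."""

    def __init__(self, window=20, threshold=3.0, min_samples=5):
        self.samples = deque(maxlen=window)  # most recent observations
        self.threshold = threshold           # z-score that counts as anomalous
        self.min_samples = min_samples       # baseline needed before judging

    def observe(self, value):
        """Return True if value is anomalous relative to the current window."""
        is_anomaly = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        # Simplification: anomalous values also enter the baseline window.
        self.samples.append(value)
        return is_anomaly


def process(stream, detector, respond):
    """Feed each value to the detector and collect the automated responses."""
    return [respond(value) for value in stream if detector.observe(value)]


# Hypothetical response-time samples in milliseconds, with one obvious spike.
latencies = [100, 102, 98, 101, 99, 100, 103, 97, 500, 101]
alerts = process(
    latencies,
    AnomalyDetector(),
    lambda value: f"page on-call: latency {value} ms",
)
```

Here the respond callback only builds a message. In practice it could open an incident, restart a service or scale out capacity, which is where the automation starts to pay off.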

True business-technology alignment

For cloud-based startups delivering web-based services, observability is an exciting new concept in IT. For traditional IT Ops it can still seem difficult, but it is achievable for any business, even a large enterprise. As an addition to traditional monitoring, observability marks a new era in IT Ops and software service delivery, helping businesses move towards true business-technology alignment.

By Andi Mann, Chief Technology Advocate, Splunk
