Observability 101


Updated in June 2023

What is observability? (Dynatrace)

In IT and cloud computing, observability is the ability to measure a system’s current state based on the data it generates.

dynatrace.com/blog

What is observability? (Splunk)

Observability is the ability to measure the internal states of a system by examining its outputs.

splunk.com

Observability vs. monitoring

Monitoring requires you to know what you care about before you know you care about it.

Observability allows you to understand your entire system and how it fits together, and then use that information to discover what specifically you should care about when it’s most important.

lightstep.com (comparison)

Observability vs. DevOps


Telemetry data

  • Logs
  • Metrics
  • Traces (& distributed tracing)

Also known as "The Three Pillars of Observability"
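To make the three pillars concrete, here is a minimal, stdlib-only Python sketch of what a single request might emit on each signal. It is not the OpenTelemetry API: the in-memory sinks, the record fields and the `handle_request` function are hypothetical, chosen only to show how a shared trace ID ties a log, a metric and a span together.

```python
import json
import secrets
import time
from collections import Counter

# Hypothetical in-memory sinks, one per signal (not an OpenTelemetry API).
LOGS: list[dict] = []
METRICS: Counter = Counter()
SPANS: list[dict] = []


def handle_request(route: str) -> str:
    """Serve one request and emit a log, a metric and a span for it."""
    trace_id = secrets.token_hex(16)  # 32 hex chars, correlates all three signals
    span_id = secrets.token_hex(8)    # 16 hex chars, identifies this operation
    start_ns = time.time_ns()

    # Log: a discrete, timestamped event, tagged with the trace context.
    LOGS.append({"time_ns": start_ns, "level": "INFO",
                 "message": f"handling {route}",
                 "trace_id": trace_id, "span_id": span_id})

    # Metric: an aggregated number, here a request counter per route.
    METRICS[("http_requests_total", route)] += 1

    # Trace: a span describing the timed operation.
    SPANS.append({"name": f"GET {route}", "trace_id": trace_id,
                  "span_id": span_id, "start_ns": start_ns,
                  "end_ns": time.time_ns()})
    return trace_id


if __name__ == "__main__":
    tid = handle_request("/incidents")
    handle_request("/incidents")
    print(json.dumps(LOGS[0]))
    print(METRICS[("http_requests_total", "/incidents")])  # 2
    print(SPANS[0]["trace_id"] == tid)                     # True
```

The shared `trace_id` is what lets a backend jump from a log line to the trace it belongs to, while the metric stays a cheap aggregate with no per-request detail.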

A personal experience

  • SSH into Linux servers and tail log files
  • RDP into Windows servers and open log files in a text editor
  • Write custom libraries to enrich application logs
  • Send logs to a Redis data store for a fast & decoupled pipeline
  • Parse logs with Logstash (grok filters), store the data in Elasticsearch and view it in Kibana
  • Write custom HTTP processors to read/write fields in messages exchanged between microservices
  • Migrate to Kubernetes and collect metrics with Prometheus
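The Logstash grok step turns unstructured log lines into named fields. Grok patterns are built on regular expressions, so a rough stdlib analogue is a Python regex with named groups. The log layout, field names and `parse_line` helper below are assumptions for illustration; the grok pattern in the comment is the kind of expression one would write, not a tested Logstash configuration.

```python
import re
from typing import Optional

# Simplified stand-in for a grok pattern such as:
#   %{IP:client} %{WORD:method} %{URIPATHPARAM:path} %{NUMBER:status:int} %{NUMBER:duration_ms:float}
LINE_PATTERN = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<method>[A-Z]+) "
    r"(?P<path>\S+) "
    r"(?P<status>\d{3}) "
    r"(?P<duration_ms>\d+(?:\.\d+)?)"
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a structured document (None if it doesn't match)."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        return None  # Logstash would tag such an event with _grokparsefailure
    fields = match.groupdict()
    # Convert numeric fields, as grok's :int and :float suffixes do.
    fields["status"] = int(fields["status"])
    fields["duration_ms"] = float(fields["duration_ms"])
    return fields


print(parse_line("10.0.0.7 GET /api/incidents?limit=5 200 42.5"))
# {'client': '10.0.0.7', 'method': 'GET', 'path': '/api/incidents?limit=5', 'status': 200, 'duration_ms': 42.5}
```

The resulting dictionary is the kind of structured document that would then be indexed in Elasticsearch and queried from Kibana.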

Distributed tracing

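Distributed tracing works because each service forwards the trace identity to the next one, usually in an HTTP header. OpenTelemetry's default propagator uses the W3C Trace Context `traceparent` header, formatted as `version-traceid-parentid-flags`. The stdlib-only sketch below shows one hop between two services; `inject` and `extract` are hypothetical helpers that mimic the idea, not the OpenTelemetry propagator API.

```python
from __future__ import annotations

import re
import secrets

# W3C Trace Context, version 00: version-traceid-parentid-flags (lowercase hex).
TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Caller side: write the current span context into outgoing HTTP headers."""
    flags = "01" if sampled else "00"  # bit 0 of trace-flags = sampled
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: dict) -> tuple[str, str, bool] | None:
    """Callee side: read the remote parent context, or None if missing/invalid."""
    match = TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec rejects version ff and all-zero trace/parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


if __name__ == "__main__":
    # Service A starts a trace and calls service B over HTTP.
    trace_id, span_a = secrets.token_hex(16), secrets.token_hex(8)
    outgoing: dict = {}
    inject(outgoing, trace_id, span_a)
    print(outgoing["traceparent"])

    # Service B continues the same trace; its new span's parent is span_a.
    remote_trace_id, parent_span_id, sampled = extract(outgoing)
    span_b = secrets.token_hex(8)
    print(remote_trace_id == trace_id, parent_span_id == span_a, sampled)  # True True True
```

This is essentially what the hand-written HTTP processors from the personal experience list had to do, which a standard propagator now handles on both sides.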

Cloud Native

Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. (ref. CNCF)


CNCF (Cloud Native Computing Foundation)

The CNCF serves as the vendor-neutral home for many of the fastest-growing open source projects.


CNCF project velocity in 2022


OpenTelemetry (OTel)

High-quality, ubiquitous, and portable telemetry to enable effective observability

OpenTelemetry is a collection of tools, APIs, and SDKs. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.

opentelemetry.io, GitHub

Reasons to choose OpenTelemetry

  • Open source & vendor neutral
  • Active & trending CNCF project (formed by merging OpenCensus and OpenTracing)
  • Reliable & decoupled solution
  • State-of-the-art design and implementation
  • A standard adopted and supported by the leading observability vendors
  • Easy to integrate & extend

Architecture of OpenTelemetry


OpenTelemetry Collector

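A Collector pipeline chains receivers (ingest telemetry), processors (transform or filter it) and exporters (send it to backends). The Collector itself is a Go binary configured in YAML; the Python sketch below only models that flow in memory to illustrate the idea. The `Pipeline` class, its `receive` method (standing in for a receiver), the processor functions and the attribute names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Span = dict  # illustrative: a span is just a dict in this sketch
Processor = Callable[[list[Span]], list[Span]]
Exporter = Callable[[list[Span]], None]


@dataclass
class Pipeline:
    """receiver -> processors (in order) -> every exporter, like a traces pipeline."""
    processors: list[Processor]
    exporters: list[Exporter]

    def receive(self, batch: list[Span]) -> None:
        for process in self.processors:
            batch = process(batch)
        for export in self.exporters:
            export(batch)


def drop_health_checks(batch: list[Span]) -> list[Span]:
    """Processor: filter out noisy spans (similar in spirit to a filter processor)."""
    return [s for s in batch if s.get("http.route") != "/health"]


def add_environment(batch: list[Span]) -> list[Span]:
    """Processor: enrich every span with an extra attribute."""
    return [{**s, "deployment.environment": "dev"} for s in batch]


received: list[Span] = []  # stands in for a tracing backend such as Tempo

pipeline = Pipeline(processors=[drop_health_checks, add_environment],
                    exporters=[received.extend])
pipeline.receive([
    {"name": "GET /health", "http.route": "/health"},
    {"name": "GET /incidents", "http.route": "/incidents"},
])
print(received)
# [{'name': 'GET /incidents', 'http.route': '/incidents', 'deployment.environment': 'dev'}]
```

Keeping this processing in the Collector rather than in each application is what makes the setup decoupled: services only export to the Collector, and backends can be swapped by changing its exporters.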

Kubernetes deployment (DaemonSet)

Helm chart


Tracing Signal

  • Trace
  • Span
  • SpanContext
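In the OpenTelemetry model, a trace is a tree of spans sharing one trace ID; each span records a timed operation and carries a SpanContext (trace ID, span ID, trace flags, trace state), which is the part that gets propagated between services. The dataclasses below are a simplified, hypothetical sketch of that model (trace state omitted), not the SDK's own classes.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Immutable identity of a span: what crosses process boundaries."""
    trace_id: str         # 16 bytes as 32 hex chars, shared by the whole trace
    span_id: str          # 8 bytes as 16 hex chars, unique per span
    trace_flags: int = 1  # bit 0 = sampled


@dataclass
class Span:
    name: str
    context: SpanContext
    parent: Optional[SpanContext] = None
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None
    attributes: dict = field(default_factory=dict)

    def end(self) -> None:
        self.end_ns = time.time_ns()


def start_span(name: str, parent: Optional[Span] = None) -> Span:
    """Start a root span (new trace ID) or a child span (inherits the parent's trace ID)."""
    trace_id = parent.context.trace_id if parent else secrets.token_hex(16)
    ctx = SpanContext(trace_id=trace_id, span_id=secrets.token_hex(8))
    return Span(name=name, context=ctx, parent=parent.context if parent else None)


# One trace = a root span plus its descendants.
root = start_span("GET /api/incidents")
child = start_span("SELECT incidents", parent=root)
child.end()
root.end()
print(child.context.trace_id == root.context.trace_id)  # True
print(child.parent.span_id == root.context.span_id)     # True
```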

Other OpenTelemetry components

Demonstration

  • Clone devpro/servicenow-dotnet-client
  • Start containers with docker-compose up
  • Configure and run the sample web API with dotnet run --project src samples/WebApiSample
  • Make REST API calls from Swagger page
  • Open the local Grafana instance and explore the Loki (logs), Prometheus (metrics) and Tempo (traces) data sources

Getting started

  • Experiment locally with docker compose: OpenTelemetry Collector with Grafana and/or Elastic Stack or Splunk
  • Evaluate OpenTelemetry SDK (exporter & instrumentation): .NET, Python

References