In the first part of our blog series about observability, we covered the basic principles of observability and explained how it differs from classical monitoring. In this article, we'll discuss OpenTelemetry and its instrumentation approaches.
OpenTelemetry is currently the most actively developed standard in the field of observability. It is an incubating project of the Cloud Native Computing Foundation (CNCF). Born from the merger of the former OpenTracing and OpenCensus projects, OpenTelemetry continues to gain popularity, and its supporters include representatives of Google, Microsoft, and Uber.
The goal of the OpenTelemetry project is to provide a standardized, open solution that any development team can use to build a proper observability layer into its project. OpenTelemetry provides a standard protocol description for collecting metrics, traces, and logs. It also brings together, under one umbrella, instrumentation APIs for different target languages and data infrastructure components.
The specifications and all related implementations are developed openly on GitHub, so anyone interested can propose changes.
Instrumentation implementations for different languages are at different stages of development. The current state of readiness can always be found on the related page of the official documentation (for example, PHP).
Logs are the oldest and best-known type of telemetry signal, and they carry a significant legacy. Log collection and storage is a well-understood task, and many established, widely adopted solutions exist for it: for example, the well-known ELK (or EFK) stack, Splunk, and Loki, a lighter alternative to Elasticsearch recently introduced by Grafana Labs.
The main problem is that logs are not integrated with other telemetry signals: traditional logging solutions offer no way to correlate a log record with a related metric or trace. Being able to do so would provide a very powerful introspection framework.
OpenTelemetry specifications try to solve this problem with a proposed standard for the logging format. It allows logs to be correlated via execution context metadata, timing, or the source that emitted them.
However, right now the standard is at an experimental stage and under heavy development, so we won’t focus on it here. You can read more about the current specifications on the OpenTelemetry site.
As discussed previously, metrics are numeric aggregates that represent the performance of a software system. Aggregation combines individual measurements into exact statistics over a time window.
The OpenTelemetry metrics system is flexible by design: it was built to cover existing metric systems without any loss of functionality. As a result, migrating to OpenTelemetry is less painful than moving to other alternatives.
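The OpenTelemetry standard defines three metrics models: the Event model, which describes how instrumentation reports measurements; the Stream model, which describes how metric data streams are transmitted (the model behind the OpenTelemetry Protocol, OTLP); and the Time Series model, which describes how backends store metric data.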
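The metrics standard defines three metric transformations that can happen between the Event and Stream models: temporal reaggregation (combining high-frequency data into longer intervals), spatial reaggregation (dropping unwanted attributes and merging the resulting series), and delta-to-cumulative conversion.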
We will talk about the Stream and Time Series models in the third part of our blog series, where we will discuss signal transportation and storage. For now, let’s focus on the Event model, which is related to instrumentation.
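Creating a metric in OpenTelemetry involves three steps: recording measurements with an instrument, aggregating those measurements, and optionally customizing the result with a View.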
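The OpenTelemetry measurement model defines six instrument types: Counter, UpDownCounter, and Histogram, which are synchronous, and Asynchronous Counter, Asynchronous UpDownCounter, and Asynchronous Gauge, which report their values through callbacks.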
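The practical difference between synchronous and asynchronous instruments is when a value is produced: synchronous instruments are called inline from application code, while asynchronous ones register a callback that is invoked at collection time. The toy Go model below (standard library only; `Counter`, `UpDownCounter`, and `Collector` are hypothetical types, not the OpenTelemetry API) sketches that difference.

```go
package main

import "fmt"

// Counter is a toy synchronous, monotonic instrument: application code calls Add inline.
type Counter struct{ total int64 }

// Add ignores negative increments, since a counter may only go up.
func (c *Counter) Add(n int64) {
	if n >= 0 {
		c.total += n
	}
}

// UpDownCounter is a toy synchronous instrument whose value may also decrease.
type UpDownCounter struct{ total int64 }

func (u *UpDownCounter) Add(n int64) { u.total += n }

// Collector models asynchronous instruments: their callbacks run only when
// metrics are collected, not when the application state changes.
type Collector struct {
	callbacks map[string]func() int64
}

func (c *Collector) RegisterGauge(name string, cb func() int64) {
	if c.callbacks == nil {
		c.callbacks = make(map[string]func() int64)
	}
	c.callbacks[name] = cb
}

// Collect invokes every registered callback once and returns what it observed.
func (c *Collector) Collect() map[string]int64 {
	out := make(map[string]int64, len(c.callbacks))
	for name, cb := range c.callbacks {
		out[name] = cb()
	}
	return out
}

func main() {
	var requests Counter
	var inFlight UpDownCounter
	requests.Add(1)  // a request arrived
	inFlight.Add(1)  // it started processing
	inFlight.Add(-1) // and finished

	queueLen := 7 // hypothetical application state
	var col Collector
	col.RegisterGauge("queue.length", func() int64 { return int64(queueLen) })
	queueLen = 9 // the callback reads the value current at collection time
	fmt.Println(requests.total, inFlight.total, col.Collect()["queue.length"]) // 1 0 9
}
```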
Aggregations combine measurements into the final metric values that are later transported to storage. The aggregations OpenTelemetry defines include Drop, Sum, Last Value, and Explicit Bucket Histogram.
A developer can define their own aggregations, but in most cases the default aggregation predefined for each instrument type will suit their needs.
After aggregation, additional filtering or customization can be carried out at the View level. To summarize, here is an example of creating a simple metric (in Go):
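import (
	"context"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)
func countTestEvent(ctx context.Context) error { // illustrative name; uses the global MeterProvider
	counter, err := otel.Meter("test_app").Int64Counter("test.counter",
		metric.WithUnit("1"), metric.WithDescription("Test Counter"))
	if err != nil {
		return err
	}
	// Synchronously increment the counter.
	counter.Add(ctx, 1, metric.WithAttributes(attribute.String("attribute_name", "attribute_value")))
	return nil
}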
Here we create a simple metric consisting of one counter measurement. As you can see, many of the details we discussed are hidden, but they can be exposed if the developer needs them.
In the next part of our blog series, we will talk about metrics transportation, storage, and visualization.
As we discussed previously, traces represent an execution path inside a software system. The execution path itself is a series of operations. A unit of operation is represented in the form of a span. A span has a start time, duration, an operation name, and additional context attached to it. Spans are interconnected via context propagation and can be nested (one operation can consist of multiple smaller operations inside itself). The resulting hierarchical tree structure of spans represents the trace – an entire execution path inside a software system.
Here is an example of the simplest span creation (in Go):
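import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("test_app")

// handleTestOperation is an illustrative name for the code path being traced.
func handleTestOperation(ctx context.Context) {
	// Create a span; the returned ctx carries it, so spans started from ctx become its children
	ctx, span := tracer.Start(ctx, "test-operation-name",
		trace.WithSpanKind(trace.SpanKindServer))
	// End the span when the function returns
	defer span.End()

	testOperation(ctx)

	// Add attributes (skipped when the span is not being recorded, e.g. sampled out)
	if span.IsRecording() {
		span.SetAttributes(
			attribute.Int64("test.key1", 1),
			attribute.String("test.key2", "2"),
		)
	}
}

// testOperation is a placeholder for the work being traced.
func testOperation(ctx context.Context) {}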
Now we have our first trace.
A trace can span multiple microservices. In this case, to keep the spans connected, the OpenTelemetry SDK can automatically propagate context across the network according to the protocol in use. One example is the W3C Trace Context HTTP header definition. However, not all language SDKs support automatic context propagation, so depending on the language you use, you may have to implement it manually.
The ability to interconnect different types of signals makes an observability framework powerful. For example, it allows you to identify a service response that took too long via metrics and, in one click, jump to the correlating trace of this response execution to identify what part of the system caused the slow processing.
Signals in OpenTelemetry can be interconnected in a couple of ways. One is the use of Exemplars: sample measurements attached to metric data that record the ID of the trace and span active when the measurement was taken, the time of observation, and optionally a set of filtered attributes. Exemplars are specifically designed to allow a direct connection between metrics and traces.
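The following toy model of exemplars is plain Go, not the OpenTelemetry SDK: the hypothetical `LatencyMetric` aggregates latencies and, alongside the aggregate, keeps one `Exemplar` holding the trace and span IDs of the request that produced the slowest sample. Real SDKs select exemplars through configurable filters and reservoirs; keeping the slowest sample is only an illustrative policy.

```go
package main

import (
	"fmt"
	"time"
)

// Exemplar is a toy exemplar: one sample measurement plus the IDs of the
// trace and span that were active when it was recorded.
type Exemplar struct {
	Value         float64
	Time          time.Time
	TraceID       string
	SpanID        string
	FilteredAttrs map[string]string // attributes not kept on the aggregate itself
}

// LatencyMetric aggregates latencies and keeps the slowest sample as its exemplar.
type LatencyMetric struct {
	Count   int
	Sum     float64
	Slowest *Exemplar
}

// Record adds v to the aggregate and, if v is the slowest value so far,
// remembers which trace and span produced it.
func (m *LatencyMetric) Record(v float64, traceID, spanID string, attrs map[string]string) {
	m.Count++
	m.Sum += v
	if m.Slowest == nil || v > m.Slowest.Value {
		m.Slowest = &Exemplar{Value: v, Time: time.Now(), TraceID: traceID, SpanID: spanID, FilteredAttrs: attrs}
	}
}

func main() {
	var m LatencyMetric
	m.Record(40, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", map[string]string{"user.id": "42"})
	m.Record(950, "0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331", map[string]string{"user.id": "7"})
	// The aggregate shows that something was slow; the exemplar names the trace to open.
	fmt.Printf("avg=%.1fms slowest trace=%s\n", m.Sum/float64(m.Count), m.Slowest.TraceID)
}
```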
Another approach to signal interconnection is to associate the same metadata with several signals using Baggage and Context. Baggage is a set of user-defined key-value pairs that is propagated along with the context and can be used to annotate traces, logs, and metrics. By annotating corresponding metrics and traces with the same Baggage values, the user can correlate them.
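W3C Baggage travels in a `baggage` header as comma-separated `key=value` members with percent-encoded values. The sketch below, in plain Go, parses a simplified form of that header (it ignores member properties after `;`) and copies the entries onto the attribute sets of a span and of a metric point, so both carry the same keys to join on. `parseBaggage` and `annotate` are hypothetical helpers; baggage entries are typically not added to spans or metrics automatically, which is why the copy step is explicit here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseBaggage parses a simplified W3C baggage header value: comma-separated
// key=value members with percent-encoded values. Properties after ';' are ignored.
func parseBaggage(header string) map[string]string {
	out := make(map[string]string)
	for _, member := range strings.Split(header, ",") {
		member = strings.TrimSpace(strings.SplitN(member, ";", 2)[0])
		k, v, ok := strings.Cut(member, "=")
		k, v = strings.TrimSpace(k), strings.TrimSpace(v)
		if !ok || k == "" {
			continue // skip malformed members
		}
		if dec, err := url.PathUnescape(v); err == nil {
			out[k] = dec
		}
	}
	return out
}

// annotate copies baggage entries into an attribute set (of a span or a metric point).
func annotate(attrs, bag map[string]string) map[string]string {
	for k, v := range bag {
		attrs[k] = v
	}
	return attrs
}

func main() {
	bag := parseBaggage("tenant.id=acme, release=2024.1;ttl=60")
	spanAttrs := annotate(map[string]string{"http.route": "/orders"}, bag)
	metricAttrs := annotate(map[string]string{"http.status_code": "500"}, bag)
	// Both signals now carry tenant.id and release, so they can be joined on those keys.
	fmt.Println(spanAttrs["tenant.id"], metricAttrs["tenant.id"], metricAttrs["release"]) // acme acme 2024.1
}
```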
We covered the pillars of OpenTelemetry and some details of application instrumentation. But we don’t just need to instrument our applications – we should also introduce tooling for the aggregation, storage, and visualization of the signals we supply. In the third part of this series, we will discuss tooling and the OpenTelemetry collector component in detail.
Want to read more like this?
Get the latest news and tips from NordVPN.