Span or Attribute? in OpenTelemetry custom instrumentation

TL;DR: Attribute. More information on one event gives us more correlation power. It’s also cheaper.

When you want to add some information to your tracing telemetry, you could emit a log, create a span, or add a piece of data to your current span. Adding a piece of data to your current span is the best! Usually.

[Image: a trace with spans (rows with a colored bar on the timeline for their duration), logs (dots on a span), and attributes (fields in a list when you click on a span)]

Attributes are the best, and also the cheapest.

If you have request name, user ID, request properties, feature flags, and notes about what happened in a single event, then you can correlate

  • feature flags with error rate
  • number of items with latency
  • which users hit the same stack trace

The more data on the top-level span, the more answers you can get to “What is different about the requests that failed?”[1]

More information in one place is better! You can call trace.get_current_span().set_attribute("my_module.items.count", len(items)) anywhere in your code and accumulate data on a single event. This might be my favorite thing about OpenTelemetry tracing.

Providers like Honeycomb that charge per event make adding attributes nearly free. (There’s still network, and long-term storage if you use that.)

Spans are for important units of work.

But sometimes it’s better to create a whole new span!

When to start a new span:

  • Incoming request – Gotta create a top-level span to represent the work, so that you can add all those sweet attributes to it! This might be a root span (incoming work from outside, new trace) or a server span (continuing a propagated trace). In services, these come from instrumentation libraries.
  • Network boundaries – spans are great for seeing dependencies between components. When you’re calling out to another service or database, it’s normal to make a client span for the outgoing call. These are created by many instrumentation libraries.
  • Async boundaries – spans are great for seeing what ran concurrently and what waited.
  • Performance concerns – spans are great for seeing what is slow.

Logs are useful sometimes.

If something might happen more than once, then a single-valued attribute can’t record them all. If you want to track how long that thing took, use a span. If it’s a fixed-time event (like an interrupt or error), then a log is good![2]

For example, if there’s only one way an exception could be thrown in the scope of the span, then putting exception.message on the span is great. But if it’s possible for another exception to be thrown, that message would be overwritten! This is a good time to emit a log. Make sure the log participates in the trace (it includes the trace and span IDs), and then it will show up on your current span in the trace view. It doesn’t hurt to put that message on the span as well.

These are suggestions.

These are guidelines, but the choice is yours. What do you want your trace to look like? What do you want to see called out in the trace waterfall, and what do you want to have together for correlation? Maybe you want both: an attribute on the root span, and a span that shows duration and detail.

Tracing tells the story of your application. Tell it the way that works for you.

Prompt

Get the AI to tell the story to you, and to verify that it works by testing. Here’s some advice to give your AI when coding:

## Observability Practices
- Add important data to the current span as attributes. Examples:
  - request parameters, especially internal IDs
  - feature flag values
  - anything that the code branches on
  - counts of how many times a loop was iterated
  - results of downstream calls
- Name attributes like: <application>.<module>.<field>
- Do not create span events; they're expensive.
- Create logs only on exceptions
- Bring in instrumentation libraries for frameworks and client libraries to create the span structure
- When kicking off async work, create a new span around each async task so that we can see what happens concurrently and what waits.
- Use the Honeycomb MCP to check that your attributes and spans show up correctly after testing.

[1] The data doesn’t have to be on the same span to correlate it; Honeycomb can query across spans and logs in a trace. But it’s faster and easier when the data is on the same span, and BubbleUp (“what is different?”) works on single events.

[2] You might wonder, why a log instead of a span event? They are the same inside Honeycomb. Logs are sent immediately and are more likely to arrive. This matters in web clients, where people close the tab and the span never ends.
