Instrument First, Build Second: The Case for Telemetry-Driven Development
Source: lobsters
Most observability work happens after the fact. You ship a feature, it misbehaves in production, and then you go back and add some metrics. You wire up a trace, maybe attach a logger, and promise yourself you will instrument things properly next time. Telemetry-Driven Development, as explored in Noah Betzen’s talk and accompanying repository, is the attempt to make “next time” the default.
The name is deliberately parallel to Test-Driven Development. In TDD you write the test first, let it fail, and then write the code that satisfies it. The discipline is not about the test itself; it is about forcing you to think about the contract before you write the implementation. Telemetry-Driven Development borrows the same loop: define the signals you want to observe before you write the code that emits them. The question shifts from “what should this function do?” to “what should I be able to see when this function runs?”
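As a rough illustration of that loop, here is a minimal Node.js sketch. It assumes a hypothetical in-process event bus built on the standard `events` module; `captureEvents`, `resizeImage`, and the `image.resize.stop` event name are invented for the example and do not come from the talk or the repository. The expectation about what must be visible is written first, and the function is then written to satisfy it.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical in-process event bus; a stand-in for whatever telemetry API you use.
const bus = new EventEmitter();

// Step 1: a helper that records what a piece of code makes observable.
function captureEvents(eventName, fn) {
  const seen = [];
  const handler = (measurements, metadata) => seen.push({ measurements, metadata });
  bus.on(eventName, handler);
  try {
    fn();
  } finally {
    bus.off(eventName, handler);
  }
  return seen;
}

// Step 2: the implementation, written to emit the signal specified up front.
function resizeImage(image, width) {
  const start = process.hrtime.bigint();
  const result = { ...image, width }; // placeholder for the real work
  const durationNs = process.hrtime.bigint() - start;
  bus.emit('image.resize.stop', { durationNs }, { format: image.format, width });
  return result;
}

// The "test": one stop event, carrying the fields we decided we need to see.
const events = captureEvents('image.resize.stop', () => {
  resizeImage({ format: 'png', width: 800 }, 200);
});
```

If `resizeImage` had been written first, nothing would have forced `format` and `width` to be available at the point of emission. Here the expectation on `events` does.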
The Elixir Telemetry Ecosystem as a Model
Betzen works extensively in Elixir, and the Elixir ecosystem is a useful case study here because it has converged on a single, decoupled telemetry primitive. The :telemetry library provides a publish-subscribe bus for runtime events. You call :telemetry.execute/3 with an event name, a map of measurements, and a map of metadata:
start = System.monotonic_time()
# ... handle the request, producing `conn` (a Plug.Conn) ...
:telemetry.execute(
  [:my_app, :request, :stop],
  %{duration: System.monotonic_time() - start},
  %{path: conn.request_path, status: conn.status}
)
Nothing is coupled here. The emitter does not know or care who is listening. Handlers attach via :telemetry.attach/4, and libraries like telemetry_metrics translate those raw events into structured metrics (counters, summaries, last values) that downstream reporters can forward to Prometheus, StatsD, or whatever else you have wired up.
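The same decoupling can be sketched outside Elixir. Node.js ships a `diagnostics_channel` module that plays a loosely similar role; the example below is an analogy under stated assumptions, not a port of `:telemetry`. The channel name, `handleRequest`, and `countByStatus` are hypothetical.

```javascript
const dc = require('node:diagnostics_channel');

const CHANNEL = 'my_app.request.stop'; // illustrative channel name

// Emitter side: publishes an event and knows nothing about who consumes it.
const requestStop = dc.channel(CHANNEL);

function handleRequest(path) {
  const start = process.hrtime.bigint();
  const status = path === '/health' ? 200 : 404; // placeholder for real routing
  requestStop.publish({
    measurements: { durationNs: process.hrtime.bigint() - start },
    metadata: { path, status },
  });
  return status;
}

// Handler side: attached separately, for example at application boot.
const counts = new Map();
function countByStatus(message) {
  const { status } = message.metadata;
  counts.set(status, (counts.get(status) ?? 0) + 1);
}
dc.subscribe(CHANNEL, countByStatus);

handleRequest('/health');
handleRequest('/missing');
handleRequest('/health');

// Detaching the handler does not affect the emitter at all.
dc.unsubscribe(CHANNEL, countByStatus);
handleRequest('/health'); // still works; nobody is listening
```

The handler here is a hand-rolled counter. In the Elixir stack that translation step is what telemetry_metrics and a reporter do for you.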
Phoenix, Ecto, Oban, Broadway, and most major Elixir libraries already emit telemetry events following a consistent naming convention. When you use Ecto.Repo, queries automatically emit [:my_app, :repo, :query] events. Phoenix emits [:phoenix, :router_dispatch, :stop]. You get useful observability before writing a single custom instrument, because the ecosystem decided that emitting events is the library author’s responsibility, not the application author’s.
This is the architectural bet at the core of telemetry-driven thinking: instrumentation belongs in the library, not bolted on top of it.
What OpenTelemetry Gets Right (and Where It Gets Heavy)
At a broader industry level, OpenTelemetry is the attempt to standardize exactly this. It defines three signal types (traces, metrics, and logs) and provides SDKs for most major languages behind a vendor-neutral API. The key design decision mirrors what Elixir’s :telemetry does: the API (what your code calls) is separated from the SDK (how signals are collected and exported). In theory, you can swap exporters without touching application code.
In practice, OpenTelemetry adoption involves real friction. The SDK initialization for a Node.js service looks something like this:
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://collector:4318/v1/traces' }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
This is not particularly onerous, but it is boilerplate that lives outside your main application entry point and is easy to skip during early development when you are moving fast. The telemetry-driven discipline argues that this setup should happen on day one, not after the first production incident.
The OpenTelemetry Collector adds another layer: a process that receives OTLP data, applies transforms (sampling, attribute enrichment, redaction), and forwards it to backends. For production this is essentially mandatory, because you do not want every service instance talking directly to your observability backend. But running a collector locally during development means spinning up another process, and that friction is real.
The Feedback Loop That Actually Matters
The core argument for instrumenting before writing is not philosophical. It is about the feedback loop you are creating for yourself.
Consider a background job processor. If you write the job, ship it, and add metrics later, your metric design is constrained by the implementation you already have. You will emit what is easy to emit. If instead you start by specifying that you want to observe queue depth, processing latency broken down by job type, retry counts, and failure reasons before writing any processing logic, those requirements constrain your data model. You need to know the job type at emission time, which means it needs to be a first-class field on your job struct. You need a monotonic clock measurement at enqueue time, which means you need to record when the job entered the queue, not just when it started processing.
The observability requirements pull on the design of the underlying code, just as a test’s assertions pull on the shape of the function under test.
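A minimal sketch of how those requirements shape the data model, assuming a hypothetical in-memory queue and an `EventEmitter` standing in for a telemetry bus. The function names, event names, and job fields are illustrative and are not taken from any real job library.

```javascript
const { EventEmitter } = require('node:events');
const { performance } = require('node:perf_hooks');

const telemetry = new EventEmitter(); // hypothetical stand-in for a telemetry bus

// `type` and `enqueuedAt` exist on the job only because the signals need them.
function enqueue(queue, type, payload) {
  const job = { type, payload, attempt: 0, enqueuedAt: performance.now() };
  queue.push(job);
  telemetry.emit('job.enqueued', { queueDepth: queue.length }, { type });
  return job;
}

function processNext(queue, handlers) {
  const job = queue.shift();
  if (!job) return null;
  job.attempt += 1;
  const startedAt = performance.now();
  const metadata = { type: job.type, attempt: job.attempt };
  try {
    handlers[job.type](job.payload);
    telemetry.emit('job.stop', {
      queueWaitMs: startedAt - job.enqueuedAt, // needs the enqueue timestamp
      durationMs: performance.now() - startedAt,
    }, metadata);
  } catch (err) {
    telemetry.emit('job.exception',
      { durationMs: performance.now() - startedAt },
      { ...metadata, reason: err.message });
  }
  return job;
}

// Record everything the queue makes observable.
const seen = [];
for (const name of ['job.enqueued', 'job.stop', 'job.exception']) {
  telemetry.on(name, (measurements, metadata) => seen.push({ name, measurements, metadata }));
}

const handlers = {
  email: () => {},
  thumbnail: () => { throw new Error('unsupported format'); },
};
const queue = [];
enqueue(queue, 'email', { to: 'a@example.com' });
enqueue(queue, 'thumbnail', { id: 7 });
processNext(queue, handlers);
processNext(queue, handlers);
```

Had the job been a bare payload, neither the per-type breakdown nor the queue wait time could be emitted without changing every producer.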
This is not purely theoretical. Libraries like Oban for Elixir, which provides background job processing, emit granular telemetry events including [:oban, :job, :start], [:oban, :job, :stop], and [:oban, :job, :exception] with structured metadata about queue, worker, priority, and attempt number. This is possible because Oban’s authors thought about what operators would need to observe before they finalized the data structures jobs carry.
Structured Logging Is Not Enough
One objection to telemetry-driven development is that structured logging handles most of this. If you emit JSON-structured log lines with enough context, you can derive metrics and traces post hoc using something like Loki with LogQL.
The problem is cardinality and query time. Log content is typically unindexed (Loki, for instance, indexes only labels), so deriving time-series data from logs requires either pre-aggregation at collection time (which means you still need to know what you want to measure) or expensive scanning. High-cardinality signals, like latency histograms broken down by user ID, are not practical to derive from logs at scale.
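The difference in where the work happens can be sketched in a few lines. This is a toy comparison with made-up log lines and a hypothetical in-memory counter, not a model of how Loki or Prometheus are implemented.

```javascript
// Log-derived metric: every query re-parses every stored line.
const logLines = [
  '{"event":"request","path":"/a","status":200,"ms":12}',
  '{"event":"request","path":"/b","status":503,"ms":950}',
  '{"event":"cache_miss","key":"user:1"}',
  '{"event":"request","path":"/a","status":500,"ms":40}',
];

function serverErrorsFromLogs(lines) {
  let errors = 0;
  for (const line of lines) {
    const entry = JSON.parse(line);
    if (entry.event === 'request' && entry.status >= 500) errors += 1;
  }
  return errors;
}

// Pre-aggregated metric: what to count is decided at emission time,
// and reading the value later is a single lookup.
const counters = new Map();
function increment(name) {
  counters.set(name, (counters.get(name) ?? 0) + 1);
}
function recordRequest(status) {
  increment('requests_total');
  if (status >= 500) increment('request_errors_total');
}

for (const status of [200, 503, 500]) recordRequest(status);

const fromLogs = serverErrorsFromLogs(logLines);
const fromCounter = counters.get('request_errors_total');
```

Both paths arrive at the same number. The log path pays a scan proportional to log volume on every query, while the counter path required deciding in advance that server errors were worth counting.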
More importantly, logs and metrics serve different observability use cases. A metric tells you something is wrong right now. A log tells you what happened during a specific event. A distributed trace tells you where time went across service boundaries. Using logs as a substitute for metrics is not a portability win; it is a capability reduction.
Tracing as Design Tool
One underused aspect of telemetry-first thinking is using distributed traces during development, not just in production. Tools like Jaeger and Zipkin can run locally via Docker with minimal configuration. If you start a trace at the beginning of an HTTP request and propagate the trace context through your service calls, you get a visual representation of where time is actually spent.
This matters for Discord bot development, for example. A slash command handler that fans out to three database queries and two HTTP calls before responding has a performance structure that is not obvious from reading the code. Without tracing, you profile it by adding timers, running load, and checking logs. With a trace, you see the waterfall on the first request, and the bottleneck is usually obvious immediately.
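To make the waterfall idea concrete without pulling in a tracing SDK, here is a toy span recorder built on Node's `AsyncLocalStorage`. It is a sketch of the concept only, not the OpenTelemetry API. `withSpan`, `renderWaterfall`, the span names, and the `sleep` calls standing in for database and HTTP work are all hypothetical.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { performance } = require('node:perf_hooks');

const currentSpan = new AsyncLocalStorage();
const finished = [];

// Runs fn inside a new span whose parent is whichever span is currently active.
async function withSpan(name, fn) {
  const parent = currentSpan.getStore();
  const span = {
    name,
    parent: parent ? parent.name : null,
    depth: parent ? parent.depth + 1 : 0,
    start: performance.now(),
    end: null,
  };
  try {
    return await currentSpan.run(span, fn);
  } finally {
    span.end = performance.now();
    finished.push(span);
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Simulated handler: three parallel queries, then two sequential HTTP calls.
async function handleSlashCommand() {
  return withSpan('slash_command', async () => {
    await Promise.all([
      withSpan('db.load_user', () => sleep(5)),
      withSpan('db.load_guild', () => sleep(5)),
      withSpan('db.load_settings', () => sleep(5)),
    ]);
    await withSpan('http.fetch_profile', () => sleep(20));
    await withSpan('http.post_reply', () => sleep(10));
    return 'ok';
  });
}

// One line per span, indented by depth, with offset and duration in milliseconds.
function renderWaterfall(spans) {
  const origin = Math.min(...spans.map((s) => s.start));
  return spans
    .slice()
    .sort((a, b) => a.start - b.start)
    .map((s) => `${'  '.repeat(s.depth)}${s.name} +${(s.start - origin).toFixed(0)}ms ${(s.end - s.start).toFixed(0)}ms`)
    .join('\n');
}

const done = handleSlashCommand().then(() => renderWaterfall(finished));
```

Calling `done.then(console.log)` prints the indented waterfall. In this simulation the sequential HTTP calls dominate the total, which is the kind of structure that timers scattered through log output tend to hide.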
The opentelemetry_phoenix and opentelemetry_ecto libraries automate this for Phoenix applications by attaching to the existing telemetry events those libraries already emit. The instrumentation is essentially free once the SDK is initialized.
The Gap Between Emitting and Acting
Betzen’s repository points at something that pure OpenTelemetry documentation rarely addresses directly: the gap between emitting signals and actually doing something with them. You can have thorough instrumentation and still not change your development behavior based on it.
Telemetry-driven development in the full sense means closing that loop. You define what questions you need to be able to answer about your system, you instrument to make those questions answerable, and then you actually look at the answers before and after changes. This is closer to how SLO-based engineering works: you define service level objectives upfront, measure against them continuously, and treat budget burn as a first-class signal in your development process.
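One way to make "look at the answers before and after changes" mechanical is to compute an error-budget burn rate for a window of traffic. The sketch below uses the common definition (observed error rate divided by the error rate the SLO allows). The 99.9% target and the request counts are invented for illustration.

```javascript
// Burn rate: how fast the error budget is being spent relative to plan.
// 1.0 means the budget runs out exactly at the end of the SLO period.
function burnRate({ failed, total }, sloTarget) {
  if (total === 0) return 0; // no traffic, no burn
  const observedErrorRate = failed / total;
  const allowedErrorRate = 1 - sloTarget;
  return observedErrorRate / allowedErrorRate;
}

const slo = 0.999; // hypothetical objective: 99.9% of requests succeed

// Same-sized traffic windows before and after a change (made-up numbers).
const before = burnRate({ failed: 4, total: 10000 }, slo);
const after = burnRate({ failed: 30, total: 10000 }, slo);

// Treat the change as a regression if it pushes the burn rate above 1.
const regression = after > 1 && after > before;
```

Here `before` is roughly 0.4 and `after` is roughly 3, so the change would be flagged. The threshold of 1 is a simplification. Real alerting policies usually combine several windows and thresholds.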
The discipline is easier to maintain if your local development environment includes a dashboarding layer. Running Grafana and Prometheus locally via Docker Compose during development, rather than relying on a staging environment for visibility, means you see metric changes immediately when you alter behavior. The feedback loop shrinks from hours to seconds.
Where This Falls Down
The honest limitation is that telemetry-first discipline requires knowing what you need to observe before you have built the thing. For genuinely exploratory work, particularly when you are not sure what the right abstraction is yet, specifying metrics upfront is guesswork. You will emit events that end up being irrelevant and miss the ones you actually need.
The pragmatic resolution is to treat telemetry design like API design: not something you get perfect on the first pass, but something you iterate on with the same care you give a public interface. Renaming a metric or changing its label cardinality is a breaking change for anyone consuming it. The Elixir telemetry ecosystem has developed conventions around this, including versioned event namespaces, precisely because early instrumentation designs often need revision.
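One hedged way to soften such a breaking change is to emit under both the old and the new name for a migration period. The sketch below assumes a hypothetical `EventEmitter`-based bus and invented event names. It illustrates the idea and is not a convention taken from any particular library.

```javascript
const { EventEmitter } = require('node:events');

const bus = new EventEmitter(); // hypothetical stand-in for a telemetry bus

// New name -> old names that existing dashboards and alerts still consume.
const legacyAliases = new Map([['job.stop', ['job.finished']]]);

// Emits the event under its current name, then under any legacy aliases,
// flagging the aliased copies so consumers can spot what needs migrating.
function emit(name, measurements, metadata) {
  bus.emit(name, measurements, metadata);
  for (const oldName of legacyAliases.get(name) ?? []) {
    bus.emit(oldName, measurements, { ...metadata, deprecatedAlias: true });
  }
}

const received = [];
bus.on('job.stop', (m, meta) => received.push({ name: 'job.stop', meta }));
bus.on('job.finished', (m, meta) => received.push({ name: 'job.finished', meta }));

emit('job.stop', { durationMs: 12 }, { queue: 'default' });
```

Once nothing is subscribed to the old name, the alias entry can be deleted, much as a deprecated API method would eventually be removed.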
The Discipline Is the Point
What makes telemetry-driven development compelling is not any specific tool or framework. It is the habit of treating observability as a first-class design constraint rather than an operational afterthought. The Elixir ecosystem models one way to institutionalize this, by making telemetry emission part of the library contract. OpenTelemetry provides the cross-language scaffolding. But the actual practice, defining what you need to see before you write what produces it, is just discipline.
What makes TDD valuable is not really the tests. It is the forced specification step before implementation. Telemetry-driven development borrows exactly that: you cannot instrument what you have not thought about, and thinking about what to instrument requires thinking clearly about what your code is supposed to do and how it is supposed to perform. That constraint is usually worth more than the signals themselves.