OpenTelemetry — Interactive Visualization

History

OpenCensus (Google, 2017)

Metrics and tracing with vendor-neutral data collection. First to unify signals under one project.

OpenTracing (CNCF, 2016)

Distributed tracing API. No data collection mandate — pure API spec that anyone could implement.

OpenTelemetry (2019)

OpenTracing and OpenCensus merged under the CNCF. The result combines the best ideas of both into a single unified API and SDK, and is now the CNCF's second most active project after Kubernetes.

Why OpenTelemetry?

Before OTel — Vendor Lock-in

Your App

→

Jaeger Agent

traces only

→

Jaeger Backend

Your App

→

StatsD Exporter

metrics only

→

Prometheus

Your App

→

Custom Logging

ELK only

→

Elasticsearch

After OTel: Instrument once. Route to any backend by changing Collector config — not your application code.

The Unified Model

Application Code

Auto-instrumentation + Manual API

↓

OTel SDK

API implementation + sampling + resource attributes

↓

OTel Collector

Receives, processes, exports telemetry

↓

Any Backend

Jaeger · Prometheus · SigNoz · Datadog · Tempo · Loki

Core Goals

Vendor Neutrality

Own your telemetry data. Route to any backend by swapping Collector config.

Unified Signals

Single model for traces, metrics, and logs. Same concepts across all three.

Auto-Instrumentation

Zero-code / low-code observability. Agents attach to libraries automatically.

Cross-Cutting

Context propagation, resource attributes, and semantic conventions unify all signals.

Polyglot Support

First-class SDKs: Go, Python, Java, JavaScript, .NET, Rust, C++, PHP, Ruby.

Open Specification

OTel spec is vendor-neutral. Every implementation follows the same behavior.

Three Signals

OpenTelemetry defines three signals — fundamental telemetry types. All three share the same context propagation model and can be correlated.

Traces

Directed acyclic graph (DAG) of spans. Captures the causal chain of operations across services. The primary observability primitive.

Metrics

Numeric measurements aggregated over time. Collected independently of trace sampling, so counts stay accurate even when most traces are dropped. Designed for alerting and dashboards.

Logs

High-fidelity timestamped records. OTel LogRecords can carry trace_id/span_id for correlation back to traces.

Signal Relationships

Trace (signal)
└── Span (signal-specific data structure)
    ├── Links to other spans (causal graph edges)
    └── Contains events (logs within a trace)

Metric (signal)
└── DataPoints (per-instrument type: counter, histogram, gauge)

Log (signal)
└── LogRecord (timestamped, attributed, severity-rated)
    └── Can carry trace_id + span_id → links to trace

Trace Model

Trace ID: abc123 (16 bytes, globally unique)
│
└── Root Span: "POST /orders"                ← order-service
    │   Span ID: span-1
    │
    ├── Child Span: "validate"               ← order-service
    │       Span ID: span-2
    │       (work)
    │
    └── Child Span: "POST /invoice"          ← order-service
        │   Span ID: span-3
        │   (calls invoice-service)
        │
        │   [ propagation: traceparent header ]
        │
        └── Child Span: "generate_invoice"   ← invoice-service
                Span ID: span-4
                (same trace ID; parent is span-3, read from the incoming traceparent)

Span Model

Field	Type	Description
`name`	string	Human-readable operation name
`trace_id`	16-byte ID	Globally unique trace identifier
`span_id`	8-byte ID	Unique span within the trace
`parent_span_id`	8-byte ID	Parent span ID (empty for root)
`start_time` / `end_time`	Timestamp	Wall-clock start and end
`kind`	SpanKind	`server`, `client`, `producer`, `consumer`, `internal`
`status`	Status	`unset`, `ok`, `error`
`attributes`	Map[string, Value]	Key-value pairs describing the span
`events`	[]SpanEvent	Timestamped log messages during the span
`links`	[]SpanLink	Links to other spans (potentially from other traces)
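
The parent/child structure described by `span_id` and `parent_span_id` can be rebuilt from a flat list of spans. The following is a minimal sketch in plain Python, not the SDK's data model: the `Span` dataclass and `render_tree` helper are hypothetical names, and the sample data assumes the invoice-service span joins the same trace as a child of span-3.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_span_id: Optional[str] = None  # None marks the root span
    attributes: dict = field(default_factory=dict)


def render_tree(spans):
    """Return span names indented by depth, parents before children."""
    children = {}
    for s in spans:
        children.setdefault(s.parent_span_id, []).append(s)

    lines = []

    def walk(parent_id, depth):
        for s in children.get(parent_id, []):
            lines.append("  " * depth + s.name)
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


spans = [
    Span("POST /orders", "abc123", "span-1"),
    Span("validate", "abc123", "span-2", parent_span_id="span-1"),
    Span("POST /invoice", "abc123", "span-3", parent_span_id="span-1"),
    Span("generate_invoice", "abc123", "span-4", parent_span_id="span-3"),
]
tree = render_tree(spans)
print("\n".join(tree))
```

Backends do essentially this grouping when they draw a waterfall view, which is why a missing or wrong `parent_span_id` shows up as an orphaned span.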

SpanKind

Kind	Meaning	Visual
`server`	Incoming request handler	`←——` arrow in
`client`	Outgoing request to a dependency	`——→` arrow out
`producer`	Message sent to queue (no immediate response)	`——↗` arrow to queue
`consumer`	Message received from queue	`↘——` arrow from queue
`internal`	Internal operation (default)	no arrow

Metrics Instruments

Instrument	Sync/Async	Use
Counter	Sync	Monotonic, additive values (requests served, bytes sent)
UpDownCounter	Sync	Additive values that can rise and fall (active connections, queue depth)
Histogram	Sync	Distribution of values (request latencies, payload sizes)
ObservableCounter	Async (callback)	Monotonic totals read from system APIs (CPU time)
ObservableUpDownCounter	Async (callback)	Additive values that can rise and fall, read on demand (memory in use)
ObservableGauge	Async (callback)	Point-in-time values (temperature, queue depth)
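
The difference between Counter and UpDownCounter comes down to whether negative increments are allowed. The sketch below models that rule in plain Python; the class names mirror the instruments, but this is not the SDK. Raising on a negative increment is an assumption made for illustration (real SDKs may log and drop the value instead).

```python
class Counter:
    """Monotonic: only non-negative increments are accepted."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        if amount < 0:
            raise ValueError("Counter increments must be non-negative")
        self.value += amount


class UpDownCounter:
    """Additive but non-monotonic: increments may be negative."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount


requests_total = Counter()
requests_total.add(1)
requests_total.add(1)

active_connections = UpDownCounter()
active_connections.add(3)   # three connections opened
active_connections.add(-1)  # one connection closed
```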

Log Model

Field	Description
`timestamp`	When the event occurred
`severity`	Severity text plus number: TRACE (1), DEBUG (5), INFO (9), WARN (13), ERROR (17), FATAL (21)
`body`	Log message
`resource`	Attributes of the emitting entity (service.name, etc.)
`attributes`	Structured key-value pairs
`trace_id`, `span_id`	If emitted within a traced context (correlation key)

Signal Correlation: Trace context flows into all three signals. A span event carries trace_id. A metric exemplar carries trace_id + span_id. A LogRecord can carry trace_id + span_id. Click any → navigate all.
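
Correlation is, at bottom, a join on `trace_id` and `span_id`. As a rough illustration, the sketch below indexes log records by the span that emitted them. The record layout (plain dicts) and the helper name `index_logs_by_span` are assumptions for the example, not an OTel API.

```python
from collections import defaultdict


def index_logs_by_span(log_records):
    """Group log bodies by (trace_id, span_id); skip standalone logs."""
    index = defaultdict(list)
    for record in log_records:
        trace_id = record.get("trace_id")
        span_id = record.get("span_id")
        if trace_id and span_id:
            index[(trace_id, span_id)].append(record["body"])
    return dict(index)


logs = [
    {"body": "order received", "trace_id": "abc123", "span_id": "span-1"},
    {"body": "validation passed", "trace_id": "abc123", "span_id": "span-2"},
    {"body": "cache warmed"},  # emitted outside any traced context
    {"body": "order stored", "trace_id": "abc123", "span_id": "span-1"},
]
by_span = index_logs_by_span(logs)
print(by_span[("abc123", "span-1")])
```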

Collector Architecture

The OTel Collector is a vendor-neutral proxy that receives, processes, and exports telemetry. It sits between your application and observability backends.

Receivers

Ingest telemetry from apps

OTLP · Jaeger · Zipkin
Prometheus · Kafka · filelog

→

Processors

Modify / filter / sample

batch · memory_limiter · transform
filter · tail_sampling · k8sattributes

→

Exporters

Send to backends

OTLP · Jaeger · Prometheus
Loki · Datadog · AWS X-Ray

Extensions

zpages · health_check · pprof

Connectors

spanmetrics (traces → metrics)

Receivers

Receiver	Protocol	Signal
`otlp`	gRPC / HTTP	traces, metrics, logs
`jaeger`	Thrift / gRPC	traces
`zipkin`	HTTP	traces
`prometheus`	HTTP pull	metrics
`prometheusremotewrite`	HTTP remote write	metrics
`hostmetrics`	System calls	metrics
`kafka`	Kafka	traces, metrics, logs
`filelog`	File tail	logs
`syslog`	Syslog	logs

Processors

Processor	Function
`batch`	Batches spans/metrics/logs to reduce export calls
`memory_limiter`	Rejects data when memory is high (OOM protection)
`transform`	Modify attributes using OTTL (OTel Transformation Language)
`filter`	Filter spans/metrics/logs by criteria
`resource`	Add/modify resource attributes
`attributes`	Add/modify span/log attributes
`probabilistic_sampler`	Sample X% of traces
`tail_sampling`	Sample based on policies (error, latency, SLO)
`routing`	Route to different exporters based on criteria
`k8sattributes`	Inject Kubernetes metadata (pod, namespace, etc.)

Exporters

Exporter	Backend
`otlp`	Any OTel-native backend
`otlphttp`	Any backend via HTTP
`jaeger`	Jaeger (removed from recent Collector releases; Jaeger ingests OTLP directly, so use `otlp`)
`zipkin`	Zipkin
`prometheus`	Prometheus (exposes a scrape endpoint; use `prometheusremotewrite` to push)
`loki`	Grafana Loki (logs)
`datadog`	Datadog
`awsxray`	AWS X-Ray
`awsemf`	AWS CloudWatch EMF (metrics)
`debug`	Stdout (debugging; replaces the deprecated `logging` exporter)

Minimal Config

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    limit_mib: 512
    check_interval: 1s

exporters:
  otlp:
    endpoint: http://tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]   # memory_limiter first, so it can refuse data before batching
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
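
To make the `send_batch_size` and `timeout` settings concrete, here is a simplified model of a batching stage in plain Python. It is a sketch of the idea only: the `Batcher` class is hypothetical, the clock is passed in explicitly so the behavior is easy to follow, and the real processor's timer handling may differ.

```python
class Batcher:
    """Flush when the batch is full, or when the timeout has elapsed."""

    def __init__(self, send_batch_size, timeout_s):
        self.send_batch_size = send_batch_size
        self.timeout_s = timeout_s
        self._items = []
        self._first_at = None  # arrival time of the oldest buffered item

    def add(self, item, now):
        """Buffer one item; return a full batch if the size trigger fires."""
        if not self._items:
            self._first_at = now
        self._items.append(item)
        if len(self._items) >= self.send_batch_size:
            return self._flush()
        return None

    def tick(self, now):
        """Return a partial batch once the timeout has elapsed."""
        if self._items and now - self._first_at >= self.timeout_s:
            return self._flush()
        return None

    def _flush(self):
        batch, self._items = self._items, []
        return batch


batcher = Batcher(send_batch_size=3, timeout_s=1.0)
results = [batcher.add(name, now=0.0) for name in ("a", "b", "c")]
pending = batcher.add("d", now=0.5)
too_early = batcher.tick(now=1.0)
flushed = batcher.tick(now=1.5)
```

The size trigger keeps export calls large under load, while the timeout bounds how long a quiet service waits before its data leaves the Collector.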

Deployment Modes

Agent Mode

Sidecar or DaemonSet on each node. Applications send to localhost. Reduces backend connections, adds local batching/compression.

Gateway Mode

Single Collector Deployment as central aggregation point. Single choke point for routing, filtering, sampling.

Standalone

Single process doing everything. For small deployments and local development.

Agent + Gateway (Production)

App

→

OTel Agent

localhost:4317

→

OTel Gateway

central cluster

→

Backend

SigNoz/Tempo

App

→

OTel Agent

→

OTel Gateway

→

Backend

App

→

OTel Agent

→

OTel Gateway

→

Backend

Tail Sampling Policies

processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 100
    policies:
      - name: errors-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-traces-policy
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
      - name: latency-slo-policy
        type: and
        and:
          and_sub_policy:
            - name: latency-sub-policy
              type: latency
              latency: {threshold_ms: 100}
            - name: status-ok-sub-policy
              type: status_code
              status_code: {status_codes: [OK]}
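
The first three policies above can be read as "keep the trace if any policy matches". The sketch below models that decision in plain Python for a completed trace. It is not the processor's implementation: the span dict layout, the `sampling_decision` name, and the SHA-256 bucketing used for the probabilistic policy are all assumptions chosen to keep the example deterministic.

```python
import hashlib


def sampling_decision(trace_id, spans, threshold_ms=1000, sampling_percentage=10.0):
    """Return the name of the first matching policy, or None to drop the trace."""
    if any(s["status"] == "ERROR" for s in spans):
        return "errors-policy"

    duration_ms = max(s["end_ms"] for s in spans) - min(s["start_ms"] for s in spans)
    if duration_ms > threshold_ms:
        return "slow-traces-policy"

    # Hash the trace ID into 10,000 buckets so the choice is stable per trace.
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 10_000
    if bucket < sampling_percentage * 100:
        return "probabilistic-policy"
    return None


failed = [{"status": "ERROR", "start_ms": 0, "end_ms": 10}]
slow = [{"status": "OK", "start_ms": 0, "end_ms": 1500}]
fast_ok = [{"status": "OK", "start_ms": 0, "end_ms": 50}]

print(sampling_decision("trace-1", failed))
print(sampling_decision("trace-2", slow))
```

Because the decision needs every span of the trace, it can only run where whole traces are visible, which is why tail sampling normally lives in the gateway tier.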

Connectors (Beta)

spanmetrics connector creates RED metrics (Request rate, Error rate, Duration) automatically from traces — no code changes needed.

connectors:
  spanmetrics:   # defaults are enough to start; metrics_exporter belonged to the older spanmetrics processor

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
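
What "RED metrics from traces" means can be shown with a few lines of aggregation. The sketch below is a plain-Python illustration, not the connector's algorithm: the span dict layout and the `red_metrics` helper are hypothetical, and duration is reduced to a simple average rather than a histogram.

```python
from collections import defaultdict


def red_metrics(spans, window_s):
    """Aggregate rate, error ratio, and mean duration per span name."""
    stats = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})
    for s in spans:
        entry = stats[s["name"]]
        entry["calls"] += 1
        entry["errors"] += 1 if s["status"] == "ERROR" else 0
        entry["total_ms"] += s["duration_ms"]

    return {
        name: {
            "rate_per_s": entry["calls"] / window_s,
            "error_ratio": entry["errors"] / entry["calls"],
            "avg_duration_ms": entry["total_ms"] / entry["calls"],
        }
        for name, entry in stats.items()
    }


spans = [
    {"name": "POST /orders", "status": "OK", "duration_ms": 100.0},
    {"name": "POST /orders", "status": "OK", "duration_ms": 200.0},
    {"name": "POST /orders", "status": "ERROR", "duration_ms": 300.0},
    {"name": "POST /orders", "status": "OK", "duration_ms": 400.0},
]
red = red_metrics(spans, window_s=2)
print(red["POST /orders"])
```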

W3C Trace Context

Context propagation links spans across process boundaries (network calls, message queues, async tasks) into a single end-to-end trace.

traceparent Header

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
             │  │                                │                │
             │  │                                │                └── flags (2 hex chars)
             │  │                                └── parent_id / span_id (16 hex chars)
             │  └── trace_id (32 hex chars)
             └── version (2 hex chars)

Field	Length	Description
`version`	2 hex	Protocol version (currently `00`)
`trace_id`	32 hex	16-byte global trace ID
`parent_id` (span_id)	16 hex	8-byte span ID of the parent
`flags`	2 hex	Options (bit 0 = sampled: `01` = sampled, `00` = not)
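
The four fields in the table can be pulled apart with a small parser. This is a minimal sketch for the version `00` layout only; the `parse_traceparent` function and `TraceParent` tuple are illustrative names, not part of any OTel package.

```python
import re
from typing import NamedTuple, Optional

_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a traceparent value; return None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    sampled = bool(int(flags, 16) & 0x01)  # bit 0 is the sampled flag
    return TraceParent(version, trace_id, parent_id, sampled)


parsed = parse_traceparent(
    "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
)
print(parsed)
```

Returning `None` for a malformed header mirrors the usual behavior of starting a fresh trace rather than failing the request.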

tracestate Header

tracestate: congo=t61rcWkgMzE,rojo=00f067aa0ba902b7

Format: key=value,key=value (at most 32 list members; implementations should propagate at least 512 characters). Carries vendor-specific or cross-cutting metadata. Optional: only traceparent is required.
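
A tracestate value is an ordered, comma-separated list of key=value members. The sketch below parses one into a dict while preserving order. `parse_tracestate` is a hypothetical helper, and skipping malformed members is a simplification of the spec's handling.

```python
MAX_MEMBERS = 32  # list-member limit from the W3C Trace Context spec


def parse_tracestate(header: str) -> dict:
    """Parse tracestate into an ordered dict of vendor key -> value."""
    members = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key:
            continue  # skip malformed members in this sketch
        members.setdefault(key, value)  # first occurrence wins
        if len(members) == MAX_MEMBERS:
            break
    return members


state = parse_tracestate("congo=t61rcWkgMzE,rojo=00f067aa0ba902b7")
print(state)
```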

Propagators

The Propagators API injects context into outgoing carriers (HTTP headers, message metadata) and extracts from incoming carriers.

Propagator	traceparent	tracestate	Baggage	Notes
`TraceContext`	W3C standard	W3C standard	No	Default
`Baggage`	No	No	W3C standard	Must be combined with TraceContext
`W3C`	W3C standard	W3C standard	No	Alias for TraceContext
`B3`	B3 single or multi headers	N/A	No	Legacy Zipkin format
`AWS X-Ray`	AWS format	N/A	No	AWS-specific
`Jaeger`	Jaeger headers	N/A	No	Legacy Jaeger format

Go: Setting Propagators

import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/propagation"
)

// Register a composite propagator (trace context + baggage)
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
    propagation.TraceContext{},   // W3C Trace Context
    propagation.Baggage{},        // W3C Baggage
))

Python: Setting Propagators

from opentelemetry.propagate import set_global_textmap
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Register the W3C Trace Context propagator globally
set_global_textmap(TraceContextTextMapPropagator())

Inject and Extract (Go)

// Inject: write the trace context carried by ctx into outgoing HTTP headers
func makeHTTPRequest(ctx context.Context, url string) (*http.Response, error) {
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
    if err != nil {
        return nil, err
    }
    propagator := propagation.TraceContext{}
    propagator.Inject(ctx, propagation.HeaderCarrier(req.Header))
    return http.DefaultClient.Do(req)
}

// Extract: read trace context from incoming HTTP headers
func handleHTTPRequest(w http.ResponseWriter, r *http.Request) {
    propagator := propagation.TraceContext{}
    ctx := propagator.Extract(r.Context(), propagation.HeaderCarrier(r.Header))
    // The new span becomes a child of the caller's span
    ctx, span := tracer.Start(ctx, "handler")
    defer span.End()
    // Pass ctx to downstream calls so their spans join this trace
}

Baggage

Baggage is key-value metadata propagated alongside trace context. Unlike span attributes (scoped to a single span), baggage flows through the entire trace and across all services.

Tenant ID

Propagate tenant context across all services for multi-tenant correlation.

Feature Flags

Carry experiment/flag state through the trace without explicit passing.

Build Info

Git SHA, CI pipeline ID: visible to every service in the trace, and can be copied onto spans as attributes.

Customer ID

Correlate logs across services using customer context from trace entry.

Baggage Limitations: No cardinality limit, so large or high-cardinality values bloat the baggage header sent on every hop. No encryption: baggage travels in plain HTTP headers, so treat it as non-sensitive. Not all proxies forward the baggage header, so check your ingress.

Baggage Format (W3C)

// W3C Baggage travels in its own dedicated header (not in tracestate)
baggage: key1=value1,key2=value2

// Members may carry optional properties after a semicolon
baggage: key1=value1;property1,key2=value2
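
Values in the `baggage` header are percent-encoded, and each member may be followed by `;`-separated properties. The following sketch shows one way to format and parse such a header using only the standard library. `format_baggage` and `parse_baggage` are hypothetical helpers, and properties are simply discarded here.

```python
from urllib.parse import quote, unquote


def format_baggage(entries: dict) -> str:
    """Serialize key/value pairs into a baggage header value."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def parse_baggage(header: str) -> dict:
    """Parse a baggage header value, ignoring member properties."""
    entries = {}
    for member in header.split(","):
        key_value = member.split(";", 1)[0]  # drop optional properties
        key, sep, value = key_value.partition("=")
        if sep and key.strip():
            entries[key.strip()] = unquote(value.strip())
    return entries


header = format_baggage({"tenant.id": "acme corp", "user.role": "admin"})
print(header)
print(parse_baggage("key1=value1, key2=value2;prop=1"))
```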

Baggage: Go

import "go.opentelemetry.io/otel/baggage"

// Add baggage (errors ignored for brevity; NewMember validates keys and values)
tenant, _ := baggage.NewMember("tenant.id", "acme-corp")
role, _ := baggage.NewMember("user.role", "admin")
bag, _ := baggage.New(tenant, role)
ctx = baggage.ContextWithBaggage(ctx, bag)

// Read baggage anywhere in the trace; Value() is empty if the member is absent
if val := baggage.FromContext(ctx).Member("tenant.id").Value(); val != "" {
    span.SetAttributes(attribute.String("tenant.id", val))
}

Context in Message Queues

# Publishing: inject trace context into message headers
from opentelemetry.propagate import inject
headers = {}
inject(headers)  # writes traceparent (and baggage, if that propagator is configured)
producer.send("my-topic", value=data, headers=headers)

# Consuming: extract context from the message and continue the trace
from opentelemetry.propagate import extract
ctx = extract(message.headers)
with tracer.start_as_current_span("process-message", context=ctx) as span:
    # span is a child of the producer span; pass links=[...] instead
    # if the consumer should start its own trace and only link back
    pass

Sampling Flag

traceparent: 00-...-...-01

sampled flag set — record this trace

→

Full Telemetry

all spans exported

traceparent: 00-...-...-00

not sampled — trace ID still propagates

→

Phantom Trace

no spans exported; downstream services still see the decision

Core Constructs

TracerProvider

Top-level factory that creates Tracer instances. Holds sampler, span processor, and resource attributes. Created once at application startup.

Tracer

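Creates spans. Scoped to a library or module. Name the tracer after the instrumenting library or package (the instrumentation scope); the service name belongs in the service.name resource attribute. One tracer per logical component.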

Span

The fundamental unit — a named, timed operation with trace_id, span_id, parent_span_id, start/end time, kind, status, attributes, events, and links.

SpanContext

Minimal data to link a span across process boundaries: trace_id (16 bytes), span_id (8 bytes), trace_flags (1 byte with sampled bit), tracestate, is_remote flag.

Span Lifecycle

Start Span

trace_id assigned
span_id assigned
parent_span_id set
start_time set

➔

Work

attributes set
events added
child spans created

➔

End Span

end_time set
span recorded
batched → exported
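
The three lifecycle stages map naturally onto a context manager: IDs and start time on entry, attributes and events during the body, end time and hand-off on exit. The sketch below models this with dicts and a list standing in for the export queue. It is an illustration only; `start_span` and `exported` are hypothetical names, not the SDK API.

```python
import secrets
import time
from contextlib import contextmanager

exported = []  # stands in for the batch-and-export step


@contextmanager
def start_span(name, parent=None):
    span = {
        "name": name,
        # A child reuses the parent's trace_id; a root span gets a new one.
        "trace_id": parent["trace_id"] if parent else secrets.token_hex(16),
        "span_id": secrets.token_hex(8),
        "parent_span_id": parent["span_id"] if parent else None,
        "start_time": time.time_ns(),
        "attributes": {},
        "events": [],
    }
    try:
        yield span
    finally:
        span["end_time"] = time.time_ns()
        exported.append(span)  # recorded only once the span has ended


with start_span("POST /orders") as root:
    root["attributes"]["order.id"] = "42"
    with start_span("validate", parent=root) as child:
        child["events"].append("order validated")
```

Note that the child ends, and is recorded, before its parent. This is why exporters must tolerate spans arriving out of tree order.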

Go: Manual Tracing

import (
    "log"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    "go.opentelemetry.io/otel/trace"
)

// 1. Create TracerProvider (once at startup)
// resource.New returns (resource, error), so build it before the provider
res, err := resource.New(ctx,
    resource.WithAttributes(
        attribute.String("service.name", "order-service"),
    ),
)
if err != nil {
    log.Fatal(err)
}
tp := sdktrace.NewTracerProvider(
    sdktrace.WithResource(res),
    sdktrace.WithSampler(sdktrace.AlwaysSample()),
    // add sdktrace.WithBatcher(exporter) to actually export spans
)

// 2. Register globally
otel.SetTracerProvider(tp)

// 3. Get a Tracer
tracer := tp.Tracer("order-service")

// 4. Start a span
ctx, span := tracer.Start(ctx, "handleOrders")
defer span.End()

// 5. Add attributes (metadata)
span.SetAttributes(
    attribute.String("order.id", orderID),
    attribute.Float64("order.amount", amount),
    attribute.String("http.method", "POST"),
)

// 6. Add an event (a log point in time)
span.AddEvent("order validated")
span.AddEvent("invoice response received", trace.WithAttributes(
    attribute.Int("http.status_code", 201),
))

// 7. Mark error if needed
span.SetStatus(codes.Error, "failed to call invoice service")

Python: Manual Tracing

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

# 1. Get the global tracer
tracer = trace.get_tracer("invoice-service")

# 2. Start a span (context manager auto-ends)
with tracer.start_as_current_span("generate_invoice") as span:
    span.set_attribute("invoice.order_id", str(order_id))
    span.set_attribute("invoice.amount", amount)
    span.add_event("invoice generation started")

    invoice = generate_invoice(order_id, amount)

    span.set_status(Status(StatusCode.OK))
    # or: span.set_status(Status(StatusCode.ERROR, "reason"))

Key Patterns

Pattern 1: Context Passing

The ctx carries the current trace context. Start a span with it, then pass the returned ctx onward: spans started from that ctx become children of the new span.

Pattern 2: HTTP Client Span

Wrap the HTTP client's transport with otelhttp.NewTransport() to create a client span and inject trace context into each outgoing request. The client span links to its parent automatically.

Pattern 3: Cross-Service Parent-Child

Propagator extracts traceparent from incoming HTTP headers. Child span uses extracted context as parent.

Pattern 4: Auto-Instrumentation

Wrap HTTP handlers with otelhttp.NewHandler(). All routes automatically create spans with HTTP attributes.

Pattern 5: Error Marking

On failure, call span.RecordError(err) and span.SetStatus(codes.Error, "reason"). RecordError only adds an exception event; it does not change the span status by itself.

Sampling

Sampler	Behavior
`AlwaysOn`	Every span recorded (dev)
`AlwaysOff`	No spans recorded (perf testing)
`TraceIdRatioBased`	Sample X% of traces, decided deterministically from the trace ID
`ParentBased`	Child follows parent's sampling decision; root spans use a delegate sampler

// Root spans: sample 10% by trace ID. Child spans: follow the parent's decision.
// (trace here is the SDK package, go.opentelemetry.io/otel/sdk/trace)
sampler := trace.ParentBased(
    trace.TraceIDRatioBased(0.1),
)
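
How a ratio sampler reaches the same decision in every service can be sketched in a few lines: treat part of the trace ID as a number and compare it with a threshold. The version below uses the low 64 bits, which is an assumption for illustration (SDKs differ in which bits they use), and both function names are hypothetical.

```python
from typing import Optional


def ratio_sampled(trace_id_hex: str, ratio: float) -> bool:
    """Deterministic decision: the same trace ID always gives the same answer."""
    low_64_bits = int(trace_id_hex[-16:], 16)
    return low_64_bits < int(ratio * (1 << 64))


def parent_based(trace_id_hex: str, parent_sampled: Optional[bool], ratio: float) -> bool:
    """Follow the parent's decision if there is one; otherwise use the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id_hex, ratio)


trace_id = "0af7651916cd43dd8448eb211c80319c"
print(parent_based(trace_id, parent_sampled=None, ratio=0.1))   # root span
print(parent_based(trace_id, parent_sampled=True, ratio=0.1))   # child of a sampled span
```

Because the decision depends only on the trace ID and the ratio, two services configured with the same ratio agree without exchanging any extra state.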

SpanStatus

Status	Code	When to use
`Unset`	`0`	Default — no status set. Treated as Ok. Backends typically don't display.
`Ok`	`1`	Span completed successfully. Set explicitly when you want guaranteed visibility.
`Error`	`2`	Span ended in failure. Surfaces in error-focused views.

Set Ok explicitly only when you need guaranteed status display in backends that filter by status. Otherwise Unset is fine. Always set Error on failures.

Instruments Overview

This section covers six instruments in two categories. Sync instruments: your code calls .Add() or .Record() directly. Async instruments: the OTel SDK calls your callback on each collection cycle.

Counter

Monotonic: the total only grows. Use for requests_total, bytes_sent. Add() accepts only non-negative values.

UpDownCounter

Non-monotonic — goes up AND down. Use for active_connections, queue_size.

Histogram

Distribution of values. Buckets into predefined boundaries. Use for request_duration_ms, payload_size_bytes.

ObservableCounter

Async version of Counter. OTel SDK calls your callback periodically. Use for system metrics from APIs.

ObservableUpDownCounter

Async version of UpDownCounter. Use for pulled metrics like memory_used_bytes.

ObservableGauge

Point-in-time value. Use for temperature, queue_depth, disk_usage.

Counter: Go

// Create at startup
counter, err := meter.Int64Counter(
    "http_requests_total",
    metric.WithDescription("Total HTTP requests received"),
    metric.WithUnit("requests"),
)

// Record — always Add() with a positive value for counters
counter.Add(ctx, 1,
    metric.WithAttributes(
        attribute.String("method", "GET"),
        attribute.String("path", "/orders"),
        attribute.String("status", "200"),
    ),
)

Histogram: Go

// Create at startup
histogram, err := meter.Float64Histogram(
    "order_processing_duration_ms",
    metric.WithDescription("Order processing time in milliseconds"),
    metric.WithUnit("ms"),
    metric.WithExplicitBucketBoundaries(
        5.0, 10.0, 25.0, 50.0, 100.0, 250.0,
        500.0, 1000.0, 2500.0, 5000.0, 10000.0,
    ),
)

// Record a measurement
histogram.Record(ctx, 127.5,
    metric.WithAttributes(
        attribute.String("method", "POST"),
        attribute.String("path", "/orders"),
    ),
)

Observable Gauge: Go

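// Updated elsewhere via sync/atomic (atomic.StoreInt64 / atomic.AddInt64)
var currentQueueSize int64

_, err := meter.Int64ObservableGauge(
    "queue_size",
    metric.WithDescription("Current number of items in queue"),
    // The SDK invokes this callback on every collection cycle
    metric.WithInt64Callback(func(_ context.Context, o metric.Int64Observer) error {
        o.Observe(atomic.LoadInt64(&currentQueueSize))
        return nil
    }),
)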

Histogram: Python

# Create at startup
duration_histogram = meter.create_histogram(
    name="order_processing_duration_ms",
    description="Order processing time in milliseconds",
    unit="ms",
)

# Record
duration_histogram.record(127.5, {"method": "POST", "path": "/orders"})
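
What `record()` does with explicit bucket boundaries can be modeled in a few lines. The sketch below is a plain-Python illustration, not SDK code: the `ExplicitBucketHistogram` class is hypothetical, the boundaries are the ones from the Go example, and buckets are treated as upper-inclusive, so a value equal to a boundary lands in the bucket that ends at that boundary.

```python
from bisect import bisect_left

BOUNDARIES = [5.0, 10.0, 25.0, 50.0, 100.0, 250.0,
              500.0, 1000.0, 2500.0, 5000.0, 10000.0]


class ExplicitBucketHistogram:
    def __init__(self, boundaries):
        self.boundaries = list(boundaries)
        # One bucket per boundary, plus an overflow bucket at the end.
        self.counts = [0] * (len(self.boundaries) + 1)
        self.sum = 0.0
        self.count = 0

    def record(self, value):
        # bisect_left gives the first boundary >= value (upper-inclusive buckets).
        self.counts[bisect_left(self.boundaries, value)] += 1
        self.sum += value
        self.count += 1


hist = ExplicitBucketHistogram(BOUNDARIES)
for measurement in (3.0, 100.0, 127.5, 20000.0):
    hist.record(measurement)
print(hist.counts)
```

Only the per-bucket counts, the sum, and the count are exported, which is why percentiles computed from a histogram are estimates bounded by the bucket width.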

Attributes (Labels)

Attributes classify metric recordings. Every unique combination creates a new time series.

Cardinality Warning: attribute.String("request_id", unique_id) creates a new time series for every request. Keep attribute values low cardinality (≤ 100 unique values).

# Good — low cardinality
counter.add(1, {"customer_tier": "premium"})

# Bad — high cardinality (one time series per unique value)
counter.add(1, {"request_id": "req-abc-123"})
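
The cost of an attribute is easiest to see by counting series. As a rough sketch, the worst case is the product of the distinct values per attribute, and the observed count is the number of distinct attribute combinations actually recorded. Both helper names and the sample figures below are made up for illustration.

```python
from math import prod


def worst_case_series(distinct_values: dict) -> int:
    """Upper bound: every combination of attribute values occurs."""
    return prod(distinct_values.values())


def observed_series(recordings: list) -> int:
    """Count the distinct attribute sets that were actually recorded."""
    return len({tuple(sorted(attrs.items())) for attrs in recordings})


low = worst_case_series({"method": 5, "path": 20, "status": 6})
high = worst_case_series({"method": 5, "path": 20, "status": 6, "request_id": 1_000_000})

recordings = [
    {"customer_tier": "premium"},
    {"customer_tier": "premium"},
    {"customer_tier": "free"},
]
print(low, high, observed_series(recordings))
```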

Temporality

Temporality	What it means	Use case
Cumulative (default)	Each export carries the running total since the stream started	General use; the model Prometheus expects
Delta	Each export carries only the change since the previous export	Backends that expect deltas; lets the SDK forget idle series

Exemplars

Exemplars are trace references embedded in histogram buckets: actual trace_id and span_id attached to a bucket recording. Enable drill-down from metric → trace.

order_processing_duration_ms

p95 = 450ms

→

Exemplar

trace_id=abc123
span_id=def456
value=447ms

→

Click in SigNoz

Jump to that trace

Exemplars are automatic when your .Record() call runs inside an active trace context. Enable exemplars in your SDK config.

Full Setup: Go Metrics

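import (
    "context"
    "time"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
    sdkmetric "go.opentelemetry.io/otel/sdk/metric"
    "go.opentelemetry.io/otel/sdk/resource"
)

func initMeter(ctx context.Context) (func(), error) {
    exporter, err := otlpmetricgrpc.New(ctx)
    if err != nil {
        return nil, err
    }

    res, err := resource.New(ctx,
        resource.WithAttributes(
            attribute.String("service.name", "order-service"),
        ),
    )
    if err != nil {
        return nil, err
    }

    // Collect and export metrics every 10 seconds
    reader := sdkmetric.NewPeriodicReader(exporter,
        sdkmetric.WithInterval(10*time.Second),
    )

    mp := sdkmetric.NewMeterProvider(
        sdkmetric.WithResource(res),
        sdkmetric.WithReader(reader),
    )

    otel.SetMeterProvider(mp)
    // Shutdown flushes pending metrics; the error is ignored here for brevity
    return func() { _ = mp.Shutdown(ctx) }, nil
}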

LogRecord Model

Logs in OTel are first-class signals. A LogRecord carries timestamp, severity, body, resource, attributes, and optionally trace_id + span_id for correlation.

LogRecord Fields

timestamp, severity (TRACE/DEBUG/INFO/WARN/ERROR/FATAL), body, resource, attributes, trace_id, span_id

Trace Correlation

LogRecords emitted within a traced context carry trace_id and span_id. Enables log → trace drill-down.

Severity Levels

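6 named levels: TRACE (1), DEBUG (5), INFO (9), WARN (13), ERROR (17), FATAL (21). Each name covers a range of four numbers (INFO is 9 to 12, for example), so the numbers give a total order.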

Bridge Patterns

OTel bridges existing log libraries (log4j, stdlib, zap) via SDK log handlers. Existing logs flow into OTel pipeline.

Signal Relationships: Log + Trace

Span

span_id=abc

←——→

trace context propagated

via W3C traceparent header

←——→

LogRecord

span_id=abc

Same trace_id flows through both — click log → navigate to trace

Python: Emit Log (within trace)

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("process-order") as span:
    # Log within this span — trace_id/span_id auto-attached
    span.add_event("log", attributes={
        "message": "order processing started",
        "order_id": order_id,
    })
    # Span events are logs within a trace context
    # Use span.add_event() for in-trace logging

Python: Log Bridge (stdlib → OTel)

import logging
from opentelemetry.sdk._logs import LoggingHandler

# Bridge Python stdlib logging → OTel SDK
otel_handler = LoggingHandler()
logging.root.addHandler(otel_handler)

# All logging.info(), logging.error() calls now flow through OTel
logging.info("payment processed", extra={"order_id": "12345"})
# These carry trace_id/span_id when called within a traced context

Go: LogRecord via Logger

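import (
    "time"

    "go.opentelemetry.io/otel/log"
    "go.opentelemetry.io/otel/log/global"
)

// Get a logger from the global LoggerProvider (once at startup)
logger := global.GetLoggerProvider().Logger("order-service")

// Build and emit a log record within a trace context
var rec log.Record
rec.SetTimestamp(time.Now())
rec.SetSeverity(log.SeverityInfo)
rec.SetBody(log.StringValue("order processed"))
rec.AddAttributes(
    log.String("order.id", orderID),
    log.String("status", "completed"),
)
logger.Emit(ctx, rec)
// ctx carries trace context, so the SDK attaches trace_id + span_id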

Severity Mapping

OTel Severity	Numeric	Typical Mapping
`TRACE`	1	Most verbose: stack traces, full debug dumps
`DEBUG`	5	Debug info, variable dumps
`INFO`	9	Normal operations, state transitions
`WARN`	13	Unexpected but handled (retries, fallbacks)
`ERROR`	17	Failures requiring attention
`FATAL`	21	Unrecoverable failures
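
A log bridge has to translate the source library's levels into OTel severity numbers. As an illustration, the sketch below maps Python's stdlib `logging` levels onto the base number of each OTel severity range (TRACE 1, DEBUG 5, INFO 9, WARN 13, ERROR 17, FATAL 21). The function name and the choice to map anything below DEBUG to TRACE are assumptions; the real bridge may map intermediate levels more finely.

```python
import logging


def otel_severity_number(levelno: int) -> int:
    """Map a stdlib logging level to the base OTel severity number."""
    if levelno >= logging.CRITICAL:
        return 21  # FATAL
    if levelno >= logging.ERROR:
        return 17  # ERROR
    if levelno >= logging.WARNING:
        return 13  # WARN
    if levelno >= logging.INFO:
        return 9   # INFO
    if levelno >= logging.DEBUG:
        return 5   # DEBUG
    return 1       # TRACE


for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
    print(name, otel_severity_number(getattr(logging, name)))
```

Python's own level numbers are 10, 20, 30, 40, and 50, so they must be translated rather than passed through unchanged.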

Logs without trace context: A LogRecord without trace_id/span_id is a standalone log. Ingest it via the Collector filelog receiver and use resource attributes (service.name, etc.) for correlation.

Agent vs Gateway Mode

Agent Mode (DaemonSet)

One Agent pod per node. Applications send to localhost. Agent does local batching, enrichment, and ships to Gateway. Reduces backend connections from apps.

Gateway Mode (Deployment)

Single or few Gateway pods. Agents forward to it. Gateway aggregates, tail-samples, and routes to backends. Central policy enforcement point.

Architecture: Agent + Gateway

Node 1

App Pod

↓

OTel Agent (DaemonSet)

localhost:4317

Node 2

App Pod

↓

OTel Agent (DaemonSet)

localhost:4317

↓

OTel Gateway (Deployment)

k8sattributes processor, tail sampling

↓

Backend

SigNoz / Tempo / Jaeger

Helm Install: Agent Mode

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm install otel-collector open-telemetry/opentelemetry-collector \
  --set mode=daemonset \
  --set config.receivers.otlp.protocols.grpc.endpoint=0.0.0.0:4317 \
  --set config.receivers.otlp.protocols.http.endpoint=0.0.0.0:4318

Helm Install: Gateway Mode

helm install otel-collector-gateway open-telemetry/opentelemetry-collector \
  --set mode=deployment \
  --set config.receivers.otlp.protocols.grpc.endpoint=0.0.0.0:4317 \
  --set config.exporters.otlp.endpoint=http://tempo:4317

Auto-Instrumentation in K8s

Use the OpenTelemetry Operator with the Instrumentation CR to auto-inject OTel SDK into pods without modifying application code.

# Install OpenTelemetry Operator
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

# Create Instrumentation resource (per namespace)
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
  namespace: default
spec:
  exporter:
    endpoint: http://otel-collector:4317
  propagators:
    - tracecontext
    - baggage
  resource:
    addK8sUIDAttributes: true

# Annotate pod to enable auto-instrumentation (one annotation per language)
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    instrumentation.opentelemetry.io/inject-python: "true"   # or inject-java, inject-nodejs, inject-dotnet
spec:
  containers:
  - name: my-app
    image: my-app:latest

K8s Attributes Processor

The k8sattributes processor automatically enriches spans with Kubernetes metadata.

processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
        - k8s.container.name
        - k8s.container.restart_count
    filter:
      node_from_env_var: KUBE_NODE_NAME   # agent mode: only watch pods on this Collector's node

Resource Limits

Memory

Set memory limits to match memory_limiter config. Target 512–1024Mi for agents, 1–2Gi for gateways under load.

CPU

OTel Agent: 250–500m CPU. Gateway: 1–2 CPU. Enable ballast extension only if not using GOMEMLIMIT env var (Collector v0.91+).

Network

Agents expose ports 4317 (gRPC), 4318 (HTTP). Gateway needs egress for OTLP to backend. Ensure NetworkPolicy allows.

GOMEMLIMIT

Collector v0.91+ recommends GOMEMLIMIT instead of ballast. Set it as a container environment variable, for example GOMEMLIMIT=400MiB.
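
A GOMEMLIMIT value is usually derived from the container's memory limit, leaving headroom for memory the Go runtime does not account for. The sketch below computes one using an 80% rule of thumb. The 80% figure and the `gomemlimit` helper are assumptions for illustration, not an official recommendation, so tune the percentage for your workload.

```python
def gomemlimit(container_limit_mib: int, percent: int = 80) -> str:
    """Return a GOMEMLIMIT string as a percentage of the container limit."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Integer arithmetic avoids float rounding surprises.
    return f"{container_limit_mib * percent // 100}MiB"


print(gomemlimit(500))    # agent-sized container
print(gomemlimit(2048))   # gateway-sized container
```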

Production checklist: memory_limiter processor on all pipelines, batch processor for export efficiency, k8sattributes for pod metadata enrichment, and tail_sampling at the gateway for cost control.