
OpenTelemetry (OTel)

Unified telemetry for cloud-native: traces, metrics, and logs under one vendor-neutral standard. Route to any backend — Jaeger, Prometheus, SigNoz, Datadog, Tempo.

OpenCensus (Google, 2017)
Metrics and tracing with vendor-neutral data collection. First to unify signals under one project.
OpenTracing (CNCF, 2016)
Distributed tracing API. No data collection mandate — pure API spec that anyone could implement.
OpenTelemetry (2019)
The two projects merged under the CNCF, folding their best ideas into a single unified API + SDK. OpenTelemetry is now the second most active CNCF project after Kubernetes.
Before OTel — Vendor Lock-in
Your App → Jaeger Agent (traces only) → Jaeger Backend
Your App → StatsD Exporter (metrics only) → Prometheus
Your App → Custom Logging (ELK only) → Elasticsearch
After OTel: Instrument once. Route to any backend by changing Collector config — not your application code.
The Unified Model
Application Code
Auto-instrumentation + Manual API
OTel SDK
API implementation + sampling + resource attributes
OTel Collector
Receives, processes, exports telemetry
Any Backend
Jaeger · Prometheus · SigNoz · Datadog · Tempo · Loki
Vendor Neutrality
Own your telemetry data. Route to any backend by swapping Collector config.
Unified Signals
Single model for traces, metrics, and logs. Same concepts across all three.
Auto-Instrumentation
Zero-code / low-code observability. Agents attach to libraries automatically.
Cross-Cutting
Context propagation, resource attributes, and semantic conventions unify all signals.
Polyglot Support
First-class SDKs: Go, Python, Java, JavaScript, .NET, Rust, C++, PHP, Ruby.
Open Specification
OTel spec is vendor-neutral. Every implementation follows the same behavior.

OpenTelemetry defines three signals — fundamental telemetry types. All three share the same context propagation model and can be correlated.

Traces
Directed acyclic graph (DAG) of spans. Captures the causal chain of operations across services. The primary observability primitive.
Metrics
Point-in-time observations of measurements. Sampled separately from traces. Designed for alerting and dashboards.
Logs
High-fidelity timestamped records. OTel LogRecords can carry trace_id/span_id for correlation back to traces.
Trace (signal)
└── Span (signal-specific data structure)
    ├── Links to other spans (causal graph edges)
    └── Contains events (logs within a trace)
Metric (signal)
└── DataPoints (per-instrument type: counter, histogram, gauge)
Log (signal)
└── LogRecord (timestamped, attributed, severity-rated)
    └── Can carry trace_id + span_id → links to trace
Trace ID: abc123 (16 bytes, globally unique)
│
└── Root Span: "POST /orders"              ← order-service
    │   Span ID: span-1
    │
    ├── Child Span: "validate"             ← order-service
    │       Span ID: span-2
    │
    └── Child Span: "POST /invoice"        ← order-service
        │   Span ID: span-3 (client span, calls invoice-service)
        │
        │   [ propagation: traceparent header ]
        │
        └── Child Span: "generate_invoice" ← invoice-service
                Span ID: span-4 (same trace_id; its parent is span-3, received over the wire)
| Field | Type | Description |
| --- | --- | --- |
| name | string | Human-readable operation name |
| trace_id | 16-byte ID | Globally unique trace identifier |
| span_id | 8-byte ID | Unique span within the trace |
| parent_span_id | 8-byte ID | Parent span ID (empty for root) |
| start_time / end_time | Timestamp | Wall-clock start and end |
| kind | SpanKind | server, client, producer, consumer, internal |
| status | Status | unset, ok, error |
| attributes | Map[string, Value] | Key-value pairs describing the span |
| events | []SpanEvent | Timestamped log messages during the span |
| links | []SpanLink | Links to other spans (potentially from other traces) |
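To make the field list concrete, here is a minimal sketch of a span record and of how a child inherits its parent's trace_id. The `Span` dataclass and `start_span` helper are hypothetical illustrations, not the OpenTelemetry SDK types.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Illustrative span record; not the OpenTelemetry SDK class."""
    name: str
    trace_id: str                       # 16 bytes, shown as 32 hex chars
    span_id: str                        # 8 bytes, shown as 16 hex chars
    parent_span_id: Optional[str] = None
    kind: str = "internal"
    status: str = "unset"
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    start_time: float = field(default_factory=time.time)
    end_time: Optional[float] = None


def start_span(name: str, parent: Optional[Span] = None, kind: str = "internal") -> Span:
    """Create a root span (new trace_id) or a child that reuses the parent's trace_id."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    parent_id = parent.span_id if parent else None
    return Span(name, trace_id, secrets.token_hex(8), parent_id, kind)


root = start_span("POST /orders", kind="server")
child = start_span("validate", parent=root)
```

The only thing that ties `child` to `root` is the shared trace_id plus the parent_span_id pointer, which is exactly what a backend uses to rebuild the tree.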
| Kind | Meaning | Visual |
| --- | --- | --- |
| server | Incoming request handler | ← arrow in |
| client | Outgoing request to a dependency | → arrow out |
| producer | Message sent to queue (no immediate response) | ↗ arrow to queue |
| consumer | Message received from queue | ↘ arrow from queue |
| internal | Internal operation (default) | no arrow |
| Instrument | Sync/Async | Use |
| --- | --- | --- |
| Counter | Sync | Additive, monotonic values (requests served, bytes sent) |
| UpDownCounter | Sync | Additive, non-monotonic values (active connections, queue depth) |
| Histogram | Sync | Distribution of values (request latencies, payload sizes) |
| ObservableCounter | Async (callback) | Monotonic totals read from system APIs (CPU time) |
| ObservableUpDownCounter | Async (callback) | Additive values that can go up and down, read on demand |
| ObservableGauge | Async (callback) | Point-in-time, non-additive values (temperature) |
| Field | Description |
| --- | --- |
| timestamp | When the event occurred |
| severity | Log level as a SeverityNumber range: TRACE (1-4), DEBUG (5-8), INFO (9-12), WARN (13-16), ERROR (17-20), FATAL (21-24) |
| body | Log message |
| resource | Attributes of the emitting entity (service.name, etc.) |
| attributes | Structured key-value pairs |
| trace_id, span_id | If emitted within a traced context (correlation key) |
Signal Correlation: Trace context flows into all three signals. A span event carries trace_id. A metric exemplar carries trace_id + span_id. A LogRecord can carry trace_id + span_id. Click any → navigate all.
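Correlation ultimately comes down to joining records on trace_id. The following is a minimal sketch of that join, assuming log records arrive as plain dictionaries; the `group_logs_by_trace` helper and the sample records are hypothetical.

```python
from collections import defaultdict


def group_logs_by_trace(log_records):
    """Index log records by trace_id; records without one are grouped under None."""
    by_trace = defaultdict(list)
    for record in log_records:
        by_trace[record.get("trace_id")].append(record)
    return by_trace


logs = [
    {"body": "order received", "trace_id": "abc123", "span_id": "span-1"},
    {"body": "invoice failed", "trace_id": "abc123", "span_id": "span-3"},
    {"body": "cache warmed"},  # emitted outside any traced context
]
index = group_logs_by_trace(logs)
```

A backend does the same lookup when you click from a span to its logs: `index["abc123"]` returns every record emitted inside that trace.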

The OTel Collector is a vendor-neutral proxy that receives, processes, and exports telemetry. It sits between your application and observability backends.

Receivers
Ingest telemetry from apps
OTLP · Jaeger · Zipkin
Prometheus · Kafka · filelog
Processors
Modify / filter / sample
batch · memory_limiter · transform
filter · tail_sampling · k8sattributes
Exporters
Send to backends
OTLP · Jaeger · Prometheus
Loki · Datadog · AWS X-Ray
Extensions
zpages · health_check · pprof
Connectors
spanmetrics (traces → metrics)
| Receiver | Protocol | Signal |
| --- | --- | --- |
| otlp | gRPC / HTTP | traces, metrics, logs |
| jaeger | Thrift / gRPC | traces |
| zipkin | HTTP | traces |
| prometheus | HTTP pull | metrics |
| prometheusremotewrite | HTTP remote write | metrics |
| hostmetrics | System calls | metrics |
| kafka | Kafka | traces, metrics, logs |
| filelog | File tail | logs |
| syslog | Syslog | logs |
| Processor | Function |
| --- | --- |
| batch | Batches spans/metrics/logs to reduce export calls |
| memory_limiter | Rejects data when memory is high (OOM protection) |
| transform | Modify attributes using OTTL (OTel Transformation Language) |
| filter | Filter spans/metrics/logs by criteria |
| resource | Add/modify resource attributes |
| attributes | Add/modify span/log attributes |
| probabilistic_sampler | Sample X% of traces |
| tail_sampling | Sample based on policies (error, latency, SLO) |
| routing | Route to different exporters based on criteria |
| k8sattributes | Inject Kubernetes metadata (pod, namespace, etc.) |
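| Exporter | Backend |
| --- | --- |
| otlp | Any OTLP-capable backend, over gRPC |
| otlphttp | Any OTLP-capable backend, over HTTP |
| jaeger | Jaeger (removed from recent Collector releases; Jaeger accepts OTLP directly) |
| zipkin | Zipkin |
| prometheus | Prometheus (exposes a scrape endpoint; use prometheusremotewrite to push) |
| loki | Grafana Loki (logs) |
| datadog | Datadog |
| awsxray | AWS X-Ray |
| awsemf | AWS CloudWatch EMF (metrics) |
| logging | Stdout (debug); superseded by the debug exporter in recent releases |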
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    limit_mib: 512
    check_interval: 1s

exporters:
  otlp:
    endpoint: http://tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    # memory_limiter goes first so data is refused before it is buffered by batch
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
Agent Mode
Sidecar or DaemonSet on each node. Applications send to localhost. Reduces backend connections, adds local batching/compression.
Gateway Mode
Single Collector Deployment as central aggregation point. Single choke point for routing, filtering, sampling.
Standalone
Single process doing everything. For small deployments and local development.
Agent + Gateway (Production)
App → OTel Agent (localhost:4317) → OTel Gateway (central cluster) → Backend (SigNoz/Tempo)
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 100
    policies:
      - name: errors-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-traces-policy
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
      # Keep traces that are slower than 100ms AND finished with status OK
      - name: latency-slo-policy
        type: and
        and:
          and_sub_policy:
            - name: slo-latency
              type: latency
              latency: {threshold_ms: 100}
            - name: slo-status-ok
              type: status_code
              status_code: {status_codes: [OK]}
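The decision logic behind the first three policies in the configuration above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the Collector's implementation: spans are plain dictionaries, the `keep_trace` function is hypothetical, and the probabilistic policy uses a random draw where the real processor hashes the trace ID.

```python
import random


def keep_trace(spans, latency_threshold_ms=1000, sampling_percentage=10, rng=random.random):
    """Decide on a completed trace: keep it if any policy votes to sample."""
    # errors-policy: any span with an ERROR status keeps the whole trace
    if any(s["status"] == "ERROR" for s in spans):
        return True
    # slow-traces-policy: end-to-end duration above the threshold
    duration_ms = max(s["end_ms"] for s in spans) - min(s["start_ms"] for s in spans)
    if duration_ms > latency_threshold_ms:
        return True
    # probabilistic-policy: keep a fixed percentage of everything else
    return rng() * 100 < sampling_percentage


fast_ok = [{"status": "OK", "start_ms": 0, "end_ms": 50}]
failed = [{"status": "OK", "start_ms": 0, "end_ms": 40},
          {"status": "ERROR", "start_ms": 5, "end_ms": 30}]
slow = [{"status": "OK", "start_ms": 0, "end_ms": 1500}]
```

The key property of tail sampling is visible in the signature: the function receives all spans of the trace, which is why the Collector must buffer spans for `decision_wait` before deciding.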
The spanmetrics connector derives RED metrics (Request rate, Error rate, Duration) from traces automatically, with no code changes.
connectors:
  spanmetrics:   # used as an exporter in the traces pipeline and as a receiver in the metrics pipeline

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
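Conceptually, deriving RED metrics from spans is a group-by over finished spans. The sketch below shows the idea with spans as plain dictionaries; the `red_metrics` helper and its output field names are hypothetical and do not match the metric names the connector actually emits.

```python
from collections import defaultdict


def red_metrics(spans):
    """Aggregate call count, error count, and total duration per (service, operation)."""
    out = defaultdict(lambda: {"calls": 0, "errors": 0, "duration_ms_sum": 0.0})
    for s in spans:
        m = out[(s["service"], s["name"])]
        m["calls"] += 1
        m["errors"] += 1 if s["status"] == "ERROR" else 0
        m["duration_ms_sum"] += s["end_ms"] - s["start_ms"]
    return dict(out)


spans = [
    {"service": "order-service", "name": "POST /orders", "status": "OK", "start_ms": 0, "end_ms": 120},
    {"service": "order-service", "name": "POST /orders", "status": "ERROR", "start_ms": 10, "end_ms": 40},
    {"service": "invoice-service", "name": "generate_invoice", "status": "OK", "start_ms": 20, "end_ms": 90},
]
metrics = red_metrics(spans)
```

Rate is the change in `calls` over time, error rate is `errors / calls`, and duration in the real connector is a histogram rather than a running sum.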

Context propagation links spans across process boundaries (network calls, message queues, async tasks) into a single end-to-end trace.

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
             │  │                                │                │
             │  │                                │                └── flags (2 hex chars)
             │  │                                └── parent_id / span_id (16 hex chars)
             │  └── trace_id (32 hex chars)
             └── version (2 hex chars)
| Field | Length | Description |
| --- | --- | --- |
| version | 2 hex | Protocol version (currently 00) |
| trace_id | 32 hex | 16-byte global trace ID |
| parent_id (span_id) | 16 hex | 8-byte span ID of the parent |
| flags | 2 hex | Options (bit 0 = sampled: 01 = sampled, 00 = not) |
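The header layout above is simple enough to parse by hand. Here is a minimal sketch for the version 00 layout, assuming lowercase hex as the W3C format requires; `parse_traceparent` is a hypothetical helper, not part of any OpenTelemetry package.

```python
import re

# version - trace_id - parent_id - flags, all lowercase hex
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Return the header fields as a dict, or None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # version ff and all-zero IDs are explicitly invalid
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # bit 0 of the flags byte
    }


parsed = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
```

In application code the SDK's propagator does this for you; the sketch is only meant to show which characters carry which field.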
tracestate: congo=t61rcWkgMzE,rojo=00f067aa0ba902b7

Format: key=value,key=value (at most 32 entries; implementations should propagate at least 512 characters of the combined header). Carries vendor-specific or cross-cutting trace metadata. Optional; traceparent is mandatory.

The Propagators API injects context into outgoing carriers (HTTP headers, message metadata) and extracts from incoming carriers.

| Propagator | traceparent | tracestate | Baggage | Notes |
| --- | --- | --- | --- | --- |
| TraceContext | W3C standard | W3C standard | No | Default |
| Baggage | No | No | W3C standard | Usually combined with TraceContext |
| W3C | W3C standard | W3C standard | No | Alias for TraceContext |
| B3 | B3 single or multi header | N/A | No | Legacy Zipkin format |
| AWS X-Ray | AWS format | N/A | No | AWS-specific |
| Jaeger | Jaeger headers | N/A | No | Legacy Jaeger format |
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/propagation"
)

// Register a composite propagator (trace context + baggage)
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
    propagation.TraceContext{},   // W3C Trace Context
    propagation.Baggage{},        // W3C Baggage
))
from opentelemetry.propagate import set_global_textmap
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Register W3C Trace Context as the global propagator
set_global_textmap(TraceContextTextMapPropagator())
// Inject: write the trace context carried by ctx into the outgoing HTTP headers
func makeHTTPRequest(ctx context.Context, url string) (*http.Response, error) {
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
    if err != nil {
        return nil, err
    }
    propagator := propagation.TraceContext{}
    propagator.Inject(ctx, propagation.HeaderCarrier(req.Header))
    return http.DefaultClient.Do(req)
}

// Extract: read the trace context from the incoming HTTP headers
func handleHTTPRequest(w http.ResponseWriter, r *http.Request) {
    propagator := propagation.TraceContext{}
    ctx := propagator.Extract(r.Context(), propagation.HeaderCarrier(r.Header))
    // tracer is a package-level trace.Tracer; pass the returned ctx to downstream calls
    ctx, span := tracer.Start(ctx, "handler")
    defer span.End()
}

Baggage is key-value metadata propagated alongside trace context. Unlike span attributes (scoped to a single span), baggage flows through the entire trace and across all services.

Tenant ID
Propagate tenant context across all services for multi-tenant correlation.
Feature Flags
Carry experiment/flag state through the trace without explicit passing.
Build Info
Git SHA, CI pipeline ID. Baggage is not copied onto spans automatically; a span processor or explicit code has to read it and set the attributes.
Customer ID
Correlate logs across services using customer context from trace entry.
Baggage Limitations: No cardinality limit, so large or high-cardinality values bloat the baggage header on every outgoing request. No encryption: baggage travels in plain HTTP headers, so treat it as non-sensitive. Not all proxies forward the baggage header, so check your ingress.
// W3C Baggage travels in its own dedicated header, separate from traceparent and tracestate
baggage: key1=value1,key2=value2
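The wire format of the baggage header is a comma-separated list of percent-encoded key=value members, each optionally followed by `;`-separated properties. A minimal sketch of encoding and decoding it follows; `format_baggage` and `parse_baggage` are hypothetical helpers, and the sketch ignores the size limits a real propagator enforces.

```python
from urllib.parse import quote, unquote


def format_baggage(entries):
    """Serialize a dict into a baggage header value, percent-encoding the values."""
    return ",".join(f"{key}={quote(str(value), safe='')}" for key, value in entries.items())


def parse_baggage(header):
    """Parse a baggage header value into a dict, dropping optional member properties."""
    entries = {}
    for member in header.split(","):
        key_value = member.split(";", 1)[0]  # strip ";property" suffixes
        if "=" not in key_value:
            continue  # skip malformed members instead of failing the request
        key, value = key_value.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


header = format_baggage({"tenant.id": "acme corp", "user.role": "admin"})
```

Because every member is re-sent on every downstream request, the encoded length of `header` is a useful number to watch when deciding what belongs in baggage.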
import (
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/baggage"
)

// Add baggage (errors ignored for brevity; check them in real code)
tenant, _ := baggage.NewMember("tenant.id", "acme-corp")
role, _ := baggage.NewMember("user.role", "admin")
bag, _ := baggage.New(tenant, role)
ctx = baggage.ContextWithBaggage(ctx, bag)

// Read baggage anywhere downstream in the trace
bag = baggage.FromContext(ctx)
if val := bag.Member("tenant.id").Value(); val != "" {
    span.SetAttributes(attribute.String("tenant.id", val))
}
# Publishing: inject trace context into message headers
from opentelemetry.propagate import inject
headers = {}
inject(headers)  # writes traceparent (and baggage, if any) into the dict
# Assumes a kafka-python style producer, which expects headers as (str, bytes) pairs
producer.send("my-topic", value=data,
              headers=[(k, v.encode()) for k, v in headers.items()])

# Consuming: extract context from the message and continue the trace
from opentelemetry.propagate import extract
carrier = {k: v.decode() for k, v in message.headers}
ctx = extract(carrier)
with tracer.start_as_current_span("process-message", context=ctx) as span:
    # span is a child of the producer span; batch consumers often use links=[...] instead
    pass
traceparent: 00-...-...-01 → sampled flag set, record this trace → Full Telemetry: all spans exported
traceparent: 00-...-...-00 → not sampled, trace ID still propagates → Phantom Trace: spans are not exported, but the IDs keep flowing downstream (useful for correlating logs)
TracerProvider
Top-level factory that creates Tracer instances. Holds sampler, span processor, and resource attributes. Created once at application startup.
Tracer
Creates spans. Scoped to a library or module. Name the tracer after the instrumenting library or package (the instrumentation scope); the service name belongs in the resource. One tracer per logical component.
Span
The fundamental unit — a named, timed operation with trace_id, span_id, parent_span_id, start/end time, kind, status, attributes, events, and links.
SpanContext
Minimal data to link a span across process boundaries: trace_id (16 bytes), span_id (8 bytes), trace_flags (1 byte with sampled bit), tracestate, is_remote flag.
Start Span: trace_id assigned, span_id assigned, parent_span_id set, start_time set
→ Work: attributes set, events added, child spans created
→ End Span: end_time set, span recorded, batched → exported
// Package aliases assumed in this example:
//   sdktrace "go.opentelemetry.io/otel/sdk/trace"   (SDK: provider, sampler)
//   trace    "go.opentelemetry.io/otel/trace"       (API: span and event options)

// 1. Create TracerProvider (once at startup)
res, err := resource.New(ctx,
    resource.WithAttributes(
        attribute.String("service.name", "order-service"),
    ),
)
if err != nil {
    log.Fatal(err)
}
tp := sdktrace.NewTracerProvider(
    sdktrace.WithResource(res),
    sdktrace.WithSampler(sdktrace.AlwaysSample()),
    // add sdktrace.WithBatcher(exporter) here to actually export spans
)
defer tp.Shutdown(ctx)

// 2. Register globally
otel.SetTracerProvider(tp)

// 3. Get a Tracer
tracer := tp.Tracer("order-service")

// 4. Start a span
ctx, span := tracer.Start(ctx, "handleOrders")
defer span.End()

// 5. Add attributes (metadata)
span.SetAttributes(
    attribute.String("order.id", orderID),
    attribute.Float64("order.amount", amount),
    attribute.String("http.method", "POST"),
)

// 6. Add an event (a log point in time)
span.AddEvent("order validated")
span.AddEvent("invoice response received", trace.WithAttributes(
    attribute.Int("http.status_code", 201),
))

// 7. Mark error if needed
span.SetStatus(codes.Error, "failed to call invoice service")
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

# 1. Get the global tracer
tracer = trace.get_tracer("invoice-service")

# 2. Start a span (context manager auto-ends)
with tracer.start_as_current_span("generate_invoice") as span:
    span.set_attribute("invoice.order_id", str(order_id))
    span.set_attribute("invoice.amount", amount)
    span.add_event("invoice generation started")

    invoice = generate_invoice(order_id, amount)

    span.set_status(Status(StatusCode.OK))
    # or: span.set_status(Status(StatusCode.ERROR, "reason"))
Pattern 1: Context Passing
The ctx carries the current trace context. Start a span with it, and any span started from the returned ctx becomes its child.
Pattern 2: HTTP Client Span
Inject trace context into outgoing HTTP requests by wrapping the client transport with otelhttp.NewTransport(). The client span automatically links to its parent.
Pattern 3: Cross-Service Parent-Child
Propagator extracts traceparent from incoming HTTP headers. Child span uses extracted context as parent.
Pattern 4: Auto-Instrumentation
Wrap HTTP handlers with otelhttp.NewHandler(). All routes automatically create spans with HTTP attributes.
Pattern 5: Error Marking
On failure, call span.RecordError(err) and span.SetStatus(codes.Error, "reason"). The status is what marks the span as failed; RecordError only adds an exception event.
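| Sampler | Behavior |
| --- | --- |
| AlwaysOn | Every span recorded (dev) |
| AlwaysOff | No spans recorded (perf testing) |
| TraceIdRatio | Samples a fixed fraction of traces, decided deterministically from the trace_id |
| ParentBased | Child follows parent's sampling decision; root spans use the wrapped sampler |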
// 10% of root traces are sampled; child spans follow the parent's decision.
// trace here refers to the SDK package, go.opentelemetry.io/otel/sdk/trace.
sampler := trace.ParentBased(
    trace.TraceIDRatioBased(0.1),
)
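Ratio sampling is deterministic: every service that sees the same trace_id reaches the same decision without coordination. A minimal sketch of the idea follows, assuming the decision compares the low 64 bits of the trace ID against a bound; `ratio_sampled` and `parent_based` are hypothetical helpers, and real SDKs differ in which bits they use.

```python
def ratio_sampled(trace_id_hex, ratio):
    """Deterministically sample a trace: compare its low 64 bits against ratio * 2^64."""
    bound = round(ratio * (1 << 64))
    return (int(trace_id_hex, 16) & ((1 << 64) - 1)) < bound


def parent_based(parent_sampled, trace_id_hex, ratio):
    """Follow the parent's decision if there is one; otherwise fall back to ratio sampling."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id_hex, ratio)


low_id = "0af7651916cd43dd" + "0" * 16   # low 64 bits are all zero
high_id = "0af7651916cd43dd" + "f" * 16  # low 64 bits are all one
```

With a ratio of 0.1, `low_id` is always kept and `high_id` is always dropped, no matter which service evaluates them. Passing `parent_sampled=True` overrides the ratio entirely, which is what keeps traces complete.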
| Status | Code | When to use |
| --- | --- | --- |
| Unset | 0 | Default; no status set. Backends treat it as not an error and typically don't display it. |
| Ok | 1 | Span completed successfully. Set explicitly when you want guaranteed visibility. |
| Error | 2 | Span ended in failure. Surfaces in error-focused views. |
Set Ok explicitly only when you need guaranteed status display in backends that filter by status. Otherwise Unset is fine. Always set Error on failures.

The six instruments covered here fall into two categories. Sync instruments: your code calls .Add() or .Record() directly. Async instruments: the OTel SDK calls your callback periodically.

Counter
Monotonic — only increments. Use for requests_total, bytes_sent. Always use positive values.
UpDownCounter
Non-monotonic — goes up AND down. Use for active_connections, queue_size.
Histogram
Distribution of values. Buckets into predefined boundaries. Use for request_duration_ms, payload_size_bytes.
ObservableCounter
Async version of Counter. OTel SDK calls your callback periodically. Use for system metrics from APIs.
ObservableUpDownCounter
Async version of UpDownCounter. Use for pulled metrics like memory_used_bytes.
ObservableGauge
Point-in-time value. Use for temperature, queue_depth, disk_usage.
// Create at startup
counter, err := meter.Int64Counter(
    "http_requests_total",
    metric.WithDescription("Total HTTP requests received"),
    metric.WithUnit("requests"),
)

// Record — always Add() with a positive value for counters
counter.Add(ctx, 1,
    metric.WithAttributes(
        attribute.String("method", "GET"),
        attribute.String("path", "/orders"),
        attribute.String("status", "200"),
    ),
)
// Create at startup
histogram, err := meter.Float64Histogram(
    "order_processing_duration_ms",
    metric.WithDescription("Order processing time in milliseconds"),
    metric.WithUnit("ms"),
    metric.WithExplicitBucketBoundaries(
        5.0, 10.0, 25.0, 50.0, 100.0, 250.0,
        500.0, 1000.0, 2500.0, 5000.0, 10000.0,
    ),
)

// Record a measurement
histogram.Record(ctx, 127.5,
    metric.WithAttributes(
        attribute.String("method", "POST"),
        attribute.String("path", "/orders"),
    ),
)
// atomic.Int64 (sync/atomic): written by the queue, read by the SDK callback
var currentQueueSize atomic.Int64

_, err := meter.Int64ObservableGauge(
    "queue_size",
    metric.WithDescription("Current number of items in queue"),
    metric.WithInt64Callback(func(_ context.Context, o metric.Int64Observer) error {
        o.Observe(currentQueueSize.Load())
        return nil
    }),
)
# Create at startup
duration_histogram = meter.create_histogram(
    name="order_processing_duration_ms",
    description="Order processing time in milliseconds",
    unit="ms",
)

# Record
duration_histogram.record(127.5, {"method": "POST", "path": "/orders"})
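To see what happens to a recorded value, here is a minimal sketch of explicit-bucket aggregation using the same boundaries as the Go example. Buckets are upper-inclusive, so a value equal to a boundary lands in the bucket that ends at that boundary. The `ExplicitBucketHistogram` class is a hypothetical illustration, not the SDK aggregator.

```python
from bisect import bisect_left

BOUNDARIES = [5.0, 10.0, 25.0, 50.0, 100.0, 250.0,
              500.0, 1000.0, 2500.0, 5000.0, 10000.0]


class ExplicitBucketHistogram:
    """Counts values into (-inf, b0], (b0, b1], ..., (bn, +inf) buckets."""

    def __init__(self, boundaries):
        self.boundaries = list(boundaries)
        self.counts = [0] * (len(self.boundaries) + 1)  # one extra overflow bucket
        self.sum = 0.0
        self.count = 0

    def record(self, value):
        # bisect_left returns the first boundary >= value, i.e. the upper-inclusive bucket
        self.counts[bisect_left(self.boundaries, value)] += 1
        self.sum += value
        self.count += 1


hist = ExplicitBucketHistogram(BOUNDARIES)
for duration_ms in (127.5, 5.0, 20000.0):
    hist.record(duration_ms)
```

Only the bucket counts, sum, and count are exported, which is why percentiles computed from a histogram are estimates whose precision depends on the boundaries you choose.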

Attributes classify metric recordings. Every unique combination creates a new time series.

Cardinality Warning: attribute.String("request_id", unique_id) creates a new time series for every request. Keep attribute values low cardinality (≤ 100 unique values).
# Good — low cardinality
counter.add(1, {"customer_tier": "premium"})

# Bad — high cardinality (one time series per unique value)
counter.add(1, {"request_id": "req-abc-123"})
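The cost of an attribute can be estimated by counting distinct attribute combinations, since each one becomes its own time series. A minimal sketch, with a hypothetical `count_series` helper and synthetic recordings:

```python
def count_series(recordings):
    """Number of time series produced: one per unique attribute combination."""
    return len({frozenset(attrs.items()) for attrs in recordings})


# 1,000 recordings tagged with one of two tiers
low_cardinality = [{"customer_tier": "premium" if i % 2 else "free"} for i in range(1000)]

# 1,000 recordings each tagged with a unique request ID
high_cardinality = [{"request_id": f"req-{i}"} for i in range(1000)]

low_series = count_series(low_cardinality)
high_series = count_series(high_cardinality)
```

The same 1,000 recordings produce 2 series in the first case and 1,000 in the second, and the second number keeps growing with traffic.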
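| Temporality | What it means | Use case |
| --- | --- | --- |
| Cumulative (default) | Each export reports the running total since the stream started | General use; what Prometheus expects |
| Delta | Each export reports only the change since the previous export | Backends that prefer deltas; lowers SDK memory use (it does not reduce cardinality) |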
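The two temporalities carry the same information, and one can be derived from the other. The following sketch converts a series of cumulative counter exports into deltas, treating a drop in value as a process restart; `cumulative_to_delta` is a hypothetical helper.

```python
def cumulative_to_delta(cumulative_points):
    """Convert successive cumulative counter values into per-interval deltas."""
    deltas = []
    previous = 0
    for value in cumulative_points:
        if value < previous:
            # the counter went backwards: assume the process restarted from zero
            deltas.append(value)
        else:
            deltas.append(value - previous)
        previous = value
    return deltas


deltas = cumulative_to_delta([10, 25, 25, 40])
after_restart = cumulative_to_delta([10, 25, 5])
```

Going the other way requires remembering the previous total per series, which is the state a cumulative exporter has to keep in memory.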

Exemplars are trace references embedded in histogram buckets: actual trace_id and span_id attached to a bucket recording. Enable drill-down from metric → trace.

order_processing_duration_ms (p95 = 450ms) → Exemplar (trace_id=abc123, span_id=def456, value=447ms) → Click in SigNoz → Jump to that trace
Exemplars are automatic when your .Record() call runs inside an active trace context. Enable exemplars in your SDK config.
// sdkmetric is "go.opentelemetry.io/otel/sdk/metric"
func initMeter(ctx context.Context) (func(), error) {
    exporter, err := otlpmetricgrpc.New(ctx)
    if err != nil {
        return nil, err
    }

    res, err := resource.New(ctx,
        resource.WithAttributes(
            attribute.String("service.name", "order-service"),
        ),
    )
    if err != nil {
        return nil, err
    }

    // Export accumulated metrics every 10 seconds
    reader := sdkmetric.NewPeriodicReader(exporter,
        sdkmetric.WithInterval(10*time.Second),
    )

    mp := sdkmetric.NewMeterProvider(
        sdkmetric.WithResource(res),
        sdkmetric.WithReader(reader),
    )

    otel.SetMeterProvider(mp)
    // Use a fresh context at shutdown; the startup ctx may already be cancelled
    return func() { _ = mp.Shutdown(context.Background()) }, nil
}

Logs in OTel are first-class signals. A LogRecord carries timestamp, severity, body, resource, attributes, and optionally trace_id + span_id for correlation.

LogRecord Fields
timestamp, severity (TRACE/DEBUG/INFO/WARN/ERROR/FATAL), body, resource, attributes, trace_id, span_id
Trace Correlation
LogRecords emitted within a traced context carry trace_id and span_id. Enables log → trace drill-down.
Severity Levels
6 levels: TRACE (1), DEBUG (5), INFO (9), WARN (13), ERROR (17), FATAL (21). Each level owns a range of four numbers, so severities order numerically.
Bridge Patterns
OTel bridges existing log libraries (log4j, stdlib, zap) via SDK log handlers. Existing logs flow into OTel pipeline.
Span (span_id=abc) ←→ trace context propagated via W3C traceparent header ←→ LogRecord (span_id=abc)
Same trace_id flows through both — click log → navigate to trace
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("process-order") as span:
    # Log within this span — trace_id/span_id auto-attached
    span.add_event("log", attributes={
        "message": "order processing started",
        "order_id": order_id,
    })
    # Span events are logs within a trace context
    # Use span.add_event() for in-trace logging
import logging
from opentelemetry.sdk._logs import LoggingHandler

# Bridge Python stdlib logging → OTel SDK
otel_handler = LoggingHandler()
logging.root.addHandler(otel_handler)

# All logging.info(), logging.error() calls now flow through OTel
logging.info("payment processed", extra={"order_id": "12345"})
# These carry trace_id/span_id when called within a traced context
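import (
    "go.opentelemetry.io/otel/log"
    "go.opentelemetry.io/otel/log/global"
)

// Get a logger (once at startup) from the globally registered LoggerProvider
logger := global.GetLoggerProvider().Logger("order-service")

// Emit a log record within a trace context
var rec log.Record
rec.SetTimestamp(time.Now())
rec.SetSeverity(log.SeverityInfo)
rec.SetBody(log.StringValue("order processed"))
rec.AddAttributes(
    log.String("order.id", orderID),
    log.String("status", "completed"),
)
logger.Emit(ctx, rec)
// ctx carries trace context, so the SDK attaches trace_id + span_id to the record.
// This API targets log bridges; applications usually log through a bridged library (slog, zap).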
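| OTel Severity | Numeric range | Typical Mapping |
| --- | --- | --- |
| TRACE | 1-4 | Most verbose: stack traces, full debug dumps |
| DEBUG | 5-8 | Debug info, variable dumps |
| INFO | 9-12 | Normal operations, state transitions |
| WARN | 13-16 | Unexpected but handled (retries, fallbacks) |
| ERROR | 17-20 | Failures requiring attention |
| FATAL | 21-24 | Unrecoverable failures; the process is about to exit |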
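A log bridge has to translate the source library's levels into OTel severity numbers. In the OpenTelemetry log data model each named level owns a range of four numbers (TRACE 1-4 through FATAL 21-24). The sketch below maps Python stdlib logging levels to the first number of each range; `to_otel_severity` is a hypothetical helper, not the mapping function used by the SDK's handler.

```python
import logging


def to_otel_severity(levelno):
    """Map a stdlib logging level number to (SeverityNumber, SeverityText)."""
    if levelno >= logging.CRITICAL:
        return 21, "FATAL"
    if levelno >= logging.ERROR:
        return 17, "ERROR"
    if levelno >= logging.WARNING:
        return 13, "WARN"
    if levelno >= logging.INFO:
        return 9, "INFO"
    if levelno >= logging.DEBUG:
        return 5, "DEBUG"
    return 1, "TRACE"  # anything below DEBUG, including custom verbose levels


mapped = {name: to_otel_severity(getattr(logging, name))
          for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")}
```

Because the numbers are ordered, a filter such as "severity number of 13 or higher" selects WARN and everything above it regardless of which logging library produced the record.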
Logs without trace context: A LogRecord without trace_id/span_id is a standalone log. Ingest it via the Collector filelog receiver and use resource attributes (service.name, etc.) for correlation.
Agent Mode (DaemonSet)
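One Agent pod per node. Applications send to the agent on their own node (via the node IP and a hostPort, or localhost when the agent runs as a sidecar). The agent does local batching and enrichment, then ships to the Gateway. Reduces backend connections from apps.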
Gateway Mode (Deployment)
Single or few Gateway pods. Agents forward to it. Gateway aggregates, tail-samples, and routes to backends. Central policy enforcement point.
Node 1: App Pod → OTel Agent (DaemonSet, localhost:4317)
Node 2: App Pod → OTel Agent (DaemonSet, localhost:4317)
Both agents → OTel Gateway (Deployment; k8sattributes processor, tail sampling) → Backend (SigNoz / Tempo / Jaeger)
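helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

# Agent: one Collector per node (DaemonSet)
# Recent chart versions require image.repository to be set explicitly
helm install otel-collector open-telemetry/opentelemetry-collector \
  --set mode=daemonset \
  --set image.repository=otel/opentelemetry-collector-k8s \
  --set config.receivers.otlp.protocols.grpc.endpoint=0.0.0.0:4317 \
  --set config.receivers.otlp.protocols.http.endpoint=0.0.0.0:4318

# Gateway: central Collector (Deployment)
helm install otel-collector-gateway open-telemetry/opentelemetry-collector \
  --set mode=deployment \
  --set image.repository=otel/opentelemetry-collector-k8s \
  --set config.receivers.otlp.protocols.grpc.endpoint=0.0.0.0:4317 \
  --set config.exporters.otlp.endpoint=http://tempo:4317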

Use the OpenTelemetry Operator with the Instrumentation CR to auto-inject OTel SDK into pods without modifying application code.

# Install OpenTelemetry Operator (cert-manager must already be installed in the cluster)
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

# Create Instrumentation resource (per namespace)
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
  namespace: default
spec:
  exporter:
    endpoint: http://otel-collector:4317
  propagators:
    - tracecontext
    - baggage
  resource:
    addK8sUIDAttributes: true
# Annotate pod to enable auto-instrumentation
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    # Use the annotation that matches the workload's language:
    # inject-java, inject-python, inject-nodejs, inject-dotnet, or inject-go
    instrumentation.opentelemetry.io/inject-python: "true"
spec:
  containers:
  - name: my-app
    image: my-app:latest

The k8sattributes processor automatically enriches spans with Kubernetes metadata.

processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.pod.start_time
        - k8s.container.name
        - k8s.container.restart_count
    filter:
      node_from_env_var: KUBE_NODE_NAME   # Agent mode: only watch pods on this node (env var set via the Downward API)
Memory
Set memory limits to match memory_limiter config. Target 512–1024Mi for agents, 1–2Gi for gateways under load.
CPU
OTel Agent: 250–500m CPU. Gateway: 1–2 CPU. The memory ballast extension is deprecated; prefer the GOMEMLIMIT env var (Collector v0.91+).
Network
Agents expose ports 4317 (gRPC), 4318 (HTTP). Gateway needs egress for OTLP to backend. Ensure NetworkPolicy allows.
GOMEMLIMIT
Collector v0.91+ recommends GOMEMLIMIT instead of ballast. Set it in the container's env section, for example GOMEMLIMIT=400MiB, and keep it below the container memory limit.
Production checklist: memory_limiter processor on all pipelines, batch processor for export efficiency, k8sattributes for pod metadata enrichment, and tail_sampling at the gateway for cost control.