Kubernetes Networking

From IP ranges to Ingress — an interactive walkthrough of how networking works inside Kubernetes, one layer at a time.

Three Non-Overlapping IP Ranges

Kubernetes requires three completely separate IP address spaces. Operators must configure all three — and they must never overlap.

Node Network
192.168.1.0/24 — Your infrastructure
Pod CIDR
10.244.0.0/16 — CNI allocates from this
Service CIDR
10.96.0.0/12 — Virtual IPs only
Range Name | Used For | Configured By | Example
Node Network | Host/node IP addresses | Network admin / DHCP | 192.168.1.0/24
Pod CIDR | Pod IP addresses | CNI plugin / kubeadm --pod-network-cidr | 10.244.0.0/16
Service CIDR | Virtual ClusterIPs | kube-apiserver --service-cluster-ip-range | 10.96.0.0/12
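Key insight: Service ClusterIPs are virtual. In iptables mode no interface ever holds the address; in IPVS mode kube-proxy binds it only to a local dummy interface (kube-ipvs0). Either way, kube-proxy programs iptables/IPVS rules that redirect traffic from the virtual IP to the actual backing pods.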
Warning: All three ranges must be non-overlapping. A pod IP must never equal a node IP or a service IP, and vice versa.
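
The non-overlap rule can be checked mechanically before a cluster is built. The following is a minimal sketch using Python's standard ipaddress module and the example ranges from the table above; the helper name find_overlaps is hypothetical and not part of any Kubernetes tooling.

```python
import ipaddress
from itertools import combinations

# Example ranges taken from the table above.
RANGES = {
    "node": ipaddress.ip_network("192.168.1.0/24"),
    "pod": ipaddress.ip_network("10.244.0.0/16"),
    "service": ipaddress.ip_network("10.96.0.0/12"),
}


def find_overlaps(ranges):
    """Return the pairs of range names whose address spaces overlap."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(ranges.items(), 2)
        if net_a.overlaps(net_b)
    ]


print(find_overlaps(RANGES))  # [] means the three ranges are disjoint
```

Swapping the pod range for something like 10.100.0.0/16 would make the function report a ("pod", "service") conflict, because that block falls inside 10.96.0.0/12.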

The Kubernetes Networking Guarantee

Kubernetes provides a flat network model where every pod gets its own IP and can communicate with every other pod directly — no NAT required.

Every Pod Gets Its Own IP

Pods are treated like VMs. No port mapping, no NAT — the pod's IP is its identity. Containers inside share the same network namespace.

No NAT Between Pods

Pods communicate directly using pod IPs. Traffic sent from pod A to pod B arrives with source IP = pod A, destination IP = pod B.

Same Flat Network Across All Nodes

Pods on different nodes can communicate directly too. The network fabric ensures the pod IP is routable from anywhere in the cluster.

One IP Per Pod (via Pause)

Containers share the pod's network namespace. The pause container holds the network namespace so app containers can crash and restart without losing the IP.

Pod-to-Pod Communication (No NAT)
Node A (192.168.1.10), Pod A: 10.244.0.2 → Node B (192.168.1.11), Pod B: 10.244.1.3
Packet: 10.244.0.2 → 10.244.1.3, no NAT

The Pause Container & Network Namespace

Every pod runs a "pause" container (sandbox) that holds the network namespace. App containers join this namespace at startup.

Node
Pause Container (Sandbox): holds the network namespace, eth0 @ 10.244.0.2
nginx (shares eth0)
sidecar (shares eth0)
All containers share eth0 from the pause container's network namespace
Shared loopback (lo): localhost works within the pod

Network Namespace Held by Pause

One network namespace per pod, created before app containers start. The pause container is essentially a no-op process that just holds the namespace open.

Crash Isolation

If an app container crashes, the pause container keeps the network namespace alive. The IP persists and Kubernetes can restart the app without renumbering.

Shared loopback

All containers in a pod share the lo interface, so one container can reach another at localhost:port. This is useful for sidecar and adapter patterns.

veth Pairs & the CNI Bridge

On the same node, pods communicate through virtual ethernet (veth) pairs connected to a bridge called cni0.

Same-Node Packet Flow
Pod A (10.244.0.2, veth-A) ↔ veth pair ↔ cni0 ↔ veth pair ↔ Pod B (10.244.0.3, veth-B)
MAC Address | Interface | IP Address
aa:bb:cc:dd:ee:00 | veth-A | 10.244.0.2
aa:bb:cc:dd:ee:01 | veth-B | 10.244.0.3

veth Pair = Virtual Cable

A veth pair is like a pipe — one end in the pod's network namespace, the other on the host connected to the bridge. Traffic going in one end comes out the other.

Bridge Acts Like a Switch

The cni0 bridge learns MAC addresses from incoming frames and forwards them to the correct port. It's a software switch running on the host.
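
The learning behaviour can be illustrated with a toy model. This is only a sketch of the general learning-switch idea, not how the Linux bridge is implemented; the class name and the host-side port name veth-B-host are hypothetical.

```python
class LearningBridge:
    """Toy model of a learning switch (illustrative only)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}  # forwarding database: MAC address -> port

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the sender's port, then return the ports the frame leaves on."""
        self.fdb[src_mac] = in_port
        if dst_mac in self.fdb:
            return {self.fdb[dst_mac]}
        # Unknown destination: flood to every port except the ingress port.
        return self.ports - {in_port}


bridge = LearningBridge(["veth-A-host", "veth-B-host"])
MAC_A, MAC_B = "aa:bb:cc:dd:ee:00", "aa:bb:cc:dd:ee:01"

first = bridge.receive("veth-A-host", MAC_A, MAC_B)   # B not learned yet: flood
reply = bridge.receive("veth-B-host", MAC_B, MAC_A)   # A is known: one port
second = bridge.receive("veth-A-host", MAC_A, MAC_B)  # B is now known as well
```

After the first exchange the forwarding database holds both MAC addresses, so later frames go to exactly one port instead of being flooded.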

ARP Resolution

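Pod A first resolves Pod B's IP to a MAC address with ARP: it broadcasts a request, the bridge floods it to its ports, and Pod B replies. Each pod keeps its own ARP table; the bridge itself only keeps a forwarding table that maps MAC addresses to ports, which it uses to deliver frames to Pod B's port.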

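# Create a veth pair and attach the host end to the bridge
# (POD_PID is a placeholder for the PID of the pod's sandbox process)
ip link add veth-A type veth peer name veth-A-host
ip link set veth-A netns "$POD_PID"
ip link set veth-A-host master cni0
ip link set veth-A-host up
# The pod end now lives in the pod's namespace, so configure it from inside
nsenter -t "$POD_PID" -n ip addr add 10.244.0.2/24 dev veth-A
nsenter -t "$POD_PID" -n ip link set veth-A up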

Container Network Interface

CNI is the contract between Kubernetes and network plugins. When a pod is created, kubelet (through the container runtime) calls the CNI plugin to set up networking.

1 Pod scheduled & created by kubelet
2 kubelet calls CNI ADD with container ID and netns path (ADD /run/netns/pod-netns cni0)
3 CNI plugin configures network (veth, bridge, IP allocation)
4 CNI returns assigned IP to kubelet (IP: 10.244.0.5)
5 Pod deletion triggers CNI DEL to tear down (DEL /run/netns/pod-netns)
# /etc/cni/net.d/10-bridge.conf
# (per-node file: "subnet" is this node's /24 slice of the cluster CIDR)
{
  "cniVersion": "0.4.0",
  "name": "bridge",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/24",
    "routes": [{"dst": "0.0.0.0/0"}]
  }
}
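
To make the ADD call more concrete: under the CNI specification, a plugin is an executable that receives its parameters as environment variables and the network configuration as JSON on stdin. The sketch below only builds those inputs and does not run a plugin. The function name, the container ID, and the /opt/cni/bin search path are assumptions chosen for illustration.

```python
import json


def build_cni_invocation(command, container_id, netns_path, ifname, config):
    """Return (env, stdin) for calling a CNI plugin binary.

    The variable names come from the CNI specification; the plugin search
    path is a commonly used default and is assumed here.
    """
    env = {
        "CNI_COMMAND": command,           # ADD, DEL, CHECK or VERSION
        "CNI_CONTAINERID": container_id,
        "CNI_NETNS": netns_path,          # path to the pod's network namespace
        "CNI_IFNAME": ifname,             # interface name inside the pod
        "CNI_PATH": "/opt/cni/bin",       # assumed plugin directory
    }
    return env, json.dumps(config)


config = {"cniVersion": "0.4.0", "name": "bridge", "type": "bridge", "bridge": "cni0"}
env, stdin = build_cni_invocation(
    "ADD", "example-container-id", "/run/netns/pod-netns", "eth0", config
)
```

A runtime would then execute the plugin named by the config's "type" field with this environment and read the assigned IP from the JSON result the plugin prints on stdout.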
Plugin Types
bridge
host-device
vlan
ipvlan
macvlan
ptp
portmap
bandwidth
tuning
sbr
flannel
calico
cilium
weave

IP Address Management

The host-local IPAM plugin allocates IPs from per-node subnets. When a pod is scheduled, the node's CNI allocates the next available IP from that node's /24.

Cluster CIDR Split into Node Subnets
10.244.0.0/16 (Cluster CIDR)
Node A
10.244.0.0/24
Pod IPs: 10.244.0.1–254
Node B
10.244.1.0/24
Pod IPs: 10.244.1.1–254
Node C
10.244.2.0/24
Pod IPs: 10.244.2.1–254
IP Allocation Flow
1 Pod scheduled to Node A
2 Kubelet calls CNI ADD
3 host-local assigns next from 10.244.0.0/24
4 IP: 10.244.0.5

host-local IPAM

Stores allocated IPs on the node filesystem (/var/lib/cni/networks/). Never reuses an IP until it's explicitly released. Simple, deterministic, per-node.

Node Subnet Pre-allocated

Each node receives its /24 subnet (the node's podCIDR) once, when it joins the cluster, rather than on every pod start. The node's CNI only hands out IPs from this pre-allocated pool.

No IP Conflicts

Because each node allocates only from its own /24, no central IPAM coordinator is needed. The per-node subnets are disjoint, so two nodes can never hand out the same pod IP: 10.244.0.5 can only belong to a pod on the node that owns 10.244.0.0/24.
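
The split of the cluster CIDR into per-node pools can be sketched with the standard ipaddress module. The allocator class below is a hypothetical in-memory stand-in for host-local: the real plugin persists its state on disk, and this sketch does not reserve a gateway address.

```python
import ipaddress

cluster_cidr = ipaddress.ip_network("10.244.0.0/16")
# Carve the cluster CIDR into one /24 per node.
node_subnets = list(cluster_cidr.subnets(new_prefix=24))


class HostLocalLikeAllocator:
    """Hands out the lowest free host address of one node subnet."""

    def __init__(self, subnet):
        self.subnet = subnet
        self.allocated = set()

    def allocate(self):
        for ip in self.subnet.hosts():  # skips network and broadcast addresses
            if ip not in self.allocated:
                self.allocated.add(ip)
                return ip
        raise RuntimeError(f"no free addresses left in {self.subnet}")

    def release(self, ip):
        self.allocated.discard(ip)


node_a = HostLocalLikeAllocator(node_subnets[0])  # 10.244.0.0/24
node_b = HostLocalLikeAllocator(node_subnets[1])  # 10.244.1.0/24
ip_a = node_a.allocate()
ip_b = node_b.allocate()
print(len(node_subnets), ip_a, ip_b)  # 256 10.244.0.1 10.244.1.1
```

Each allocator only ever looks at its own subnet, which is why the nodes need no shared state to stay conflict-free.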

VXLAN Encapsulation

When pods on different nodes need to communicate, the CNI encapsulates packets using VXLAN. The original pod IP packet is wrapped inside a UDP packet with node IPs as outer headers.

Packet Journey: Node A Pod → Node B Pod
Source Node A
Pod A: 10.244.0.2
Node: 192.168.1.10
UDP 4789 VXLAN VNI 1
1. Pod A sends to 10.244.1.3 (Pod B IP)
2. Route table: not local → forward to cni0
3. Host routing: 10.244.1.3 not in local subnet
4. VXLAN VTEP encapsulates packet
5. Outer: src=192.168.1.10, dst=192.168.1.11
Destination Node B
Pod B: 10.244.1.3
Node: 192.168.1.11
Outer Header: src=192.168.1.10, dst=192.168.1.11, protocol=UDP, port=4789
VXLAN Header: VNI=1 (24-bit, supports 16M virtual networks)
Inner Packet: src=10.244.0.2, dst=10.244.1.3 (original pod IP packet)
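
The VXLAN header itself is small. As a sketch based on the layout in RFC 7348 (8 flag bits, 24 reserved bits, the 24-bit VNI, 8 reserved bits), it can be packed and parsed as follows. The function names are hypothetical, and the outer UDP/IP headers are left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348


def pack_vxlan_header(vni):
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte. Word 2: VNI in the top three bytes.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header):
    """Extract the VNI from an 8-byte VXLAN header."""
    _, word = struct.unpack("!II", header)
    return word >> 8


header = pack_vxlan_header(1)
print(header.hex())        # 0800000000000100
print(unpack_vni(header))  # 1
```

The 24-bit field is where the "16M virtual networks" figure comes from: 2**24 is 16,777,216.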
CNI Overlay Modes Comparison
CNI / Mode | Encapsulation | Performance | Use Case
flannel (host-gw) | None | Best | Layer 2 adjacent nodes
flannel (vxlan) | VXLAN | Good | Cross-subnet routing
calico (BGP) | None | Best | Large-scale, no encapsulation
calico (IPIP) | IP-in-IP | Good | Cross-subnet, simple tunnel
cilium | VXLAN/Geneve tunnel or native routing (eBPF datapath) | Best | High-performance, observability

kube-proxy & Service Load Balancing

Services get a virtual ClusterIP that doesn't correspond to any real interface. kube-proxy watches the API server and programs iptables or IPVS rules to load-balance traffic to backing pods.

ClusterIP (Virtual IP) 10.96.0.1 → 10.244.0.10:80, 10.244.0.11:80, 10.244.0.12:80
kube-proxy Watching API Server
kube-apiserver (port 6443) → Endpoints changes (pod IP:port updates) → kube-proxy (updates iptables/IPVS)
Property | iptables | IPVS
Algorithm | Chain traversal | Hash table lookup
LB algorithms | Random only | RR, source hash, least conn
Scale | O(n) rules | O(1) lookup
Default | Yes | No (opt-in)
# iptables -t nat -L KUBE-SERVICES (abbreviated)
KUBE-SVC-XXXX  tcp  --  anywhere  10.96.0.1  tcp dpt:http
# ... the KUBE-SVC-XXXX chain jumps to an endpoint chain ...
KUBE-SEP-YYYY  all  --  anywhere  anywhere
# ... and the KUBE-SEP-YYYY chain DNATs to the endpoint:
DNAT           tcp  --  anywhere  anywhere   tcp to:10.244.0.10:80
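
The "random only" behaviour of iptables mode comes from rules that are evaluated in order. A common way to get an even split from sequential rules, and the pattern usually seen in kube-proxy's generated chains, is to give rule i of n a match probability of 1/(n-i), with the last rule always matching. The sketch below only checks the arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n endpoints: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_share(n):
    """Overall share of traffic each endpoint receives after chain traversal."""
    shares, remaining = [], Fraction(1)
    for p in rule_probabilities(n):
        shares.append(remaining * p)  # reached this rule and matched it
        remaining *= 1 - p            # fell through to the next rule
    return shares


print(rule_probabilities(3))  # [Fraction(1, 3), Fraction(1, 2), Fraction(1, 1)]
print(effective_share(3))     # [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
```

The chain is still walked rule by rule, which is the O(n) cost listed in the comparison above.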

How ClusterIPs Are Allocated

Service ClusterIPs are allocated from the service CIDR by the API server. They are purely virtual — no network interface is ever assigned these addresses.

Service CIDR Address Space
10.96.0.0/12
10.96.0.1 — kubernetes.default.svc (reserved)
Available for Services (10.96.0.2 – 10.111.255.254)
Service ClusterIP Allocation Steps
1 Service created (no clusterIP specified)
2 apiserver allocates a free address from the service CIDR
3 apiserver stores in etcd
4 kube-proxy watches, creates iptables rules
5 ClusterIP never assigned to any interface
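
The bounds of the service CIDR follow directly from the prefix length and can be computed rather than memorised. This is a small sketch with the standard ipaddress module, using the 10.96.0.0/12 example from this page; it does not model the API server's allocator.

```python
import ipaddress

service_cidr = ipaddress.ip_network("10.96.0.0/12")

first_ip = service_cidr[1]  # conventionally used by the kubernetes.default Service
print(service_cidr[0], service_cidr[-1])  # 10.96.0.0 10.111.255.255
print(service_cidr.num_addresses)         # 1048576

# Membership test: is a given ClusterIP inside the service CIDR?
print(ipaddress.ip_address("10.96.0.100") in service_cidr)  # True
```

A /12 leaves 20 host bits, so the block spans the second octets 96 through 111.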

Headless Services

clusterIP: None means no virtual IP is allocated. CoreDNS returns pod IPs directly instead. Used for service discovery when you need direct pod access.

Any Unallocated IP Works

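A Service can request a specific ClusterIP through spec.clusterIP, as long as the address lies inside the service CIDR and is not already allocated. The address does not need to be routable on the physical network, because it is only a virtual mapping managed by iptables/IPVS rules.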


Endpoints & EndpointSlices

An Endpoints object is the collection of IP:port pairs that back a Service. When a pod with matching labels becomes Ready, its IP:port is added; when the pod dies, it is removed.

Service → Endpoints Relationship
Service
selector: app=nginx
ClusterIP: 10.96.0.100
Endpoint: 10.244.0.10:80
Endpoint: 10.244.0.11:80
Endpoint: 10.244.0.12:80
Pod dies: Endpoint removed → kube-proxy updates rules → traffic stops routing to that pod

EndpointSlices (~100 Endpoints per Slice)

Kubernetes places up to 100 endpoints in each EndpointSlice object by default. A large Service is therefore spread across multiple EndpointSlices, which avoids single-object size limits and keeps each update small.
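
As a rough illustration of the grouping, the sketch below chunks a list of endpoints into groups of at most 100. It is a simplification: the real controller also tries to minimise how many slices change on each update. The function name and the generated endpoint list are hypothetical.

```python
def chunk_endpoints(endpoints, max_per_slice=100):
    """Split a list of endpoints into slice-sized groups."""
    return [
        endpoints[i:i + max_per_slice]
        for i in range(0, len(endpoints), max_per_slice)
    ]


# 250 hypothetical pod endpoints for one large Service.
endpoints = [f"10.244.0.{i}:80" for i in range(2, 252)]
slices = chunk_endpoints(endpoints)
print([len(s) for s in slices])  # [100, 100, 50]
```

When a single pod changes, only the slice that holds it has to be rewritten and sent to watchers, instead of one object containing all 250 addresses.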

Pod Death → Endpoint Removal

When a pod terminates, the EndpointSlice controller marks its endpoint as not ready and then removes it from the EndpointSlice. kube-proxy watches these changes and typically updates its rules within seconds.

Headless = No ClusterIP

Headless services (clusterIP: None) still have Endpoints, but no virtual IP. DNS returns the pod IPs directly instead of a service VIP.

# Check endpoints for a service
kubectl get endpoints <svc-name>

# Sample output:
NAME     ENDPOINTS                         AGE
my-svc   10.244.0.10:80,10.244.0.11:80     5d

CoreDNS & Service Discovery

CoreDNS is the default DNS server for Kubernetes clusters. It handles DNS queries for cluster-local service names and converts them to ClusterIPs or pod IPs.

DNS Query Flow
1 busybox runs nslookup my-svc.default.svc.cluster.local
2 Query sent to kube-dns (10.96.0.10), the nameserver listed in /etc/resolv.conf
3 CoreDNS looks the name up in its cache of Service records
4 Returns ClusterIP 10.96.0.100

Pod FQDN

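Pods get a DNS name derived from their IP address, with dots replaced by dashes: pod-ipv4-address.ns.pod.cluster.local (for example, 10-244-0-10.default.pod.cluster.local). Pods that set a hostname and a subdomain matching a headless Service are also reachable at hostname.subdomain.ns.svc.cluster.local, which is useful for direct pod-to-pod communication.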

Headless = Direct Pod IPs

For headless services (clusterIP: None), CoreDNS returns A records for all matching pod IPs directly. No load balancing at the DNS level.

SRV Records

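_http._tcp.my-svc.ns.svc.cluster.local resolves to an SRV record carrying the port number (80) and the target name my-svc.ns.svc.cluster.local. SRV records are created for named Service ports (here a port named http over TCP) and are used by clients that need to discover the port as well as the address.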

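# Pod /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

# CoreDNS ConfigMap (simplified Corefile)
.:53 {
    # answer cluster-domain queries from Kubernetes Service data
    kubernetes cluster.local in-addr.arpa ip6.arpa
    # forward everything else to the upstream resolvers
    forward . /etc/resolv.conf
    cache 30
}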
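
The search list and ndots:5 determine which names are actually sent to CoreDNS. The sketch below approximates the usual stub-resolver rule: a name with fewer dots than ndots is tried against each search domain first, and a trailing dot disables the search list. The function name is hypothetical, and real resolvers differ in details.

```python
def candidate_queries(name, search, ndots=5):
    """Return the names a stub resolver would try, in order."""
    if name.endswith("."):
        return [name]                  # already fully qualified
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded       # enough dots: try the name as-is first
    return expanded + [name]           # too few dots: walk the search list first


SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

short = candidate_queries("my-svc", SEARCH)
external = candidate_queries("example.com", SEARCH)
absolute = candidate_queries("example.com.", SEARCH)
```

Here short starts with my-svc.default.svc.cluster.local, which is why a bare Service name resolves inside its own namespace. An external name such as example.com is tried against all three search domains before the plain name, which is the extra lookup cost often associated with ndots:5.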

Ingress & HTTP Routing

Ingress is the Kubernetes resource for external HTTP/HTTPS access to Services. The Ingress Controller (nginx, contour, traefik) enforces routing rules defined in Ingress resources.

External Client → Ingress Controller (nginx) → Service A (10.96.0.100) → Pod A (10.244.0.10)
Path-Based Routing
Host Header | Path | Service | Port
api.example.com | /api/* | api-svc | 8080
api.example.com | /static/* | static-svc | 8081
api.example.com | / | default-svc | 8082
Host-Based Routing
Host | Backend
api.example.com | api-svc:8080
dashboard.example.com | dashboard-svc:8080
grafana.example.com | monitoring-svc:3000

Ingress Controller vs Ingress Resource

The Ingress Controller is the actual HTTP proxy (nginx, contour, traefik). The Ingress resource is the Kubernetes configuration object that describes routing rules.

TLS Termination

HTTPS traffic is terminated at the Ingress controller. The controller holds the TLS certificate and decrypts traffic before forwarding to backend services over plain HTTP.

# Ingress resource (nginx)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-svc
                port:
                  number: 8080
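
How a controller picks a backend for pathType: Prefix can be sketched in a few lines. The example assumes the documented Prefix semantics (matching whole path elements, longest matching path wins) and reuses the path-based routing table above, with /api/* and /static/* written as the prefixes /api and /static. The function names are hypothetical, and actual controllers may add their own rules.

```python
ROUTES = [  # (host, path prefix, service, port)
    ("api.example.com", "/api", "api-svc", 8080),
    ("api.example.com", "/static", "static-svc", 8081),
    ("api.example.com", "/", "default-svc", 8082),
]


def split_path(path):
    """Split a URL path into its non-empty elements."""
    return [part for part in path.split("/") if part]


def prefix_matches(rule_path, request_path):
    """Prefix matching compares whole path elements, not raw string prefixes."""
    rule, request = split_path(rule_path), split_path(request_path)
    return request[:len(rule)] == rule


def route(host, path):
    """Return (service, port) for the longest matching rule, or None."""
    matches = [r for r in ROUTES if r[0] == host and prefix_matches(r[1], path)]
    if not matches:
        return None
    _, _, service, port = max(matches, key=lambda r: len(split_path(r[1])))
    return service, port


print(route("api.example.com", "/api/users"))  # ('api-svc', 8080)
print(route("api.example.com", "/apiv2"))      # ('default-svc', 8082)
```

The second call shows the element-wise rule: /apiv2 does not match the /api prefix, so the request falls through to the catch-all / rule.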