Three Non-Overlapping IP Ranges
Kubernetes requires three completely separate IP address spaces. Operators must configure all three — and they must never overlap.
| Range Name | Used For | Configured By | Example |
|---|---|---|---|
| Node Network | Host/node IP addresses | Network admin / DHCP | 192.168.1.0/24 |
| Pod CIDR | Pod IP addresses | CNI plugin / kubeadm --pod-network-cidr | 10.244.0.0/16 |
| Service CIDR | Virtual ClusterIPs | kube-apiserver --service-cluster-ip-range | 10.96.0.0/12 |
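You can catch overlapping ranges before running kubeadm init by testing every pair with Python's standard ipaddress module. This is a minimal sketch. The RANGES values are only the examples from the table, and find_overlaps is a hypothetical helper name.

```python
import ipaddress
from itertools import combinations

# Example ranges from the table above; substitute your cluster's real values.
RANGES = {
    "node": "192.168.1.0/24",
    "pod": "10.244.0.0/16",
    "service": "10.96.0.0/12",
}


def find_overlaps(ranges: dict) -> list:
    """Return every pair of range names whose CIDRs overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    clashes = find_overlaps(RANGES)
    print("OK" if not clashes else f"Overlapping ranges: {clashes}")
```

It catches a classic mistake: a pod CIDR of 10.0.0.0/8 swallows the 10.96.0.0/12 service range.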
The Kubernetes Networking Guarantee
Kubernetes requires a flat network model: every pod gets its own IP and can communicate with every other pod directly, with no NAT in between. Kubernetes only defines these rules; the CNI plugin is what implements them.
Every Pod Gets Its Own IP
Pods are treated like VMs. No port mapping, no NAT — the pod's IP is its identity. Containers inside share the same network namespace.
No NAT Between Pods
Pods communicate directly using pod IPs. Traffic sent from pod A to pod B arrives with source IP = pod A, destination IP = pod B.
Same Flat Network Across All Nodes
Pods on different nodes can communicate directly too. The network fabric ensures the pod IP is routable from anywhere in the cluster.
One IP Per Pod (via Pause)
Containers share the pod's network namespace. The pause container holds the network namespace so app containers can crash and restart without losing the IP.
The Pause Container & Network Namespace
Every pod runs a "pause" container (sandbox) that holds the network namespace. App containers join this namespace at startup.
Network Namespace Held by Pause
One network namespace per pod, created before app containers start. The pause container is essentially a no-op process that just holds the namespace open.
Crash Isolation
If an app container crashes, the pause container keeps the network namespace alive. The IP persists and Kubernetes can restart the app without renumbering.
Shared Loopback
All containers in a pod share the lo interface, so one container reaches another at localhost:port. This is what makes sidecar and adapter patterns work.
veth Pairs & the CNI Bridge
On the same node, pods communicate through virtual ethernet (veth) pairs attached to a Linux bridge, typically named cni0 (the default name used by the reference bridge plugin).
| Pod MAC Address | Bridge Port (host-side veth) | Pod IP Address |
|---|---|---|
| aa:bb:cc:dd:ee:00 | veth-A | 10.244.0.2 |
| aa:bb:cc:dd:ee:01 | veth-B | 10.244.0.3 |
veth Pair = Virtual Cable
A veth pair is like a pipe — one end in the pod's network namespace, the other on the host connected to the bridge. Traffic going in one end comes out the other.
Bridge Acts Like a Switch
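The cni0 bridge records the source MAC of each incoming frame against the port it arrived on, in its forwarding database (FDB). Frames for a known MAC go out that port only; broadcasts and frames for unknown MACs are flooded to every other port. It is a software switch running on the host.
ARP Resolution
ARP is handled by the pods, not by the bridge. When Pod A (10.244.0.2) first sends to Pod B (10.244.0.3), it broadcasts an ARP request; the bridge floods it, Pod B replies with its MAC, and Pod A caches the answer in its own neighbor table. From then on the bridge forwards A's frames to B's port using its FDB.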
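You can model the learning and flooding behaviour in a few lines. This toy LearningBridge class is a hypothetical name, not a kernel API, and it keeps only the MAC-to-port table. A real Linux bridge also ages entries out and handles VLANs. The walk-through replays the ARP exchange between Pod A and Pod B from the table above.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningBridge:
    """Toy model of a bridge's forwarding database (FDB); no ageing, no VLANs."""
    ports: list
    fdb: dict = field(default_factory=dict)  # MAC -> port it was last seen on

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> list:
        """Learn the sender's MAC, then return the port(s) the frame leaves on."""
        self.fdb[src_mac] = in_port  # learning step
        if dst_mac != BROADCAST and dst_mac in self.fdb:
            out = self.fdb[dst_mac]
            return [] if out == in_port else [out]
        # Broadcast (e.g. an ARP request) or unknown unicast: flood to all other ports
        return [p for p in self.ports if p != in_port]


if __name__ == "__main__":
    br = LearningBridge(ports=["veth-A", "veth-B", "veth-C"])
    # Pod A broadcasts an ARP request for 10.244.0.3: flooded, and A's MAC is learned
    print(br.receive("veth-A", "aa:bb:cc:dd:ee:00", BROADCAST))  # ['veth-B', 'veth-C']
    # Pod B's unicast ARP reply goes only to A's port, which is now known
    print(br.receive("veth-B", "aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:00"))  # ['veth-A']
```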
Container Network Interface
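CNI is the contract between the container runtime and network plugins. When a pod is created, kubelet asks the runtime (containerd or CRI-O, through the CRI) to create the pod sandbox, and the runtime calls the CNI plugin to set up its networking.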
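Under the CNI specification, the runtime executes the plugin binary. It passes the operation in environment variables (CNI_COMMAND, CNI_CONTAINERID, CNI_NETNS, CNI_IFNAME, CNI_PATH) and the network configuration as JSON on stdin, then reads a JSON result from stdout. The sketch below shows only that contract. It creates no interfaces, and the fixed address and gateway are hypothetical stand-ins for what an IPAM plugin would return.

```python
import json
import os
import sys

# Hypothetical values: a real plugin would get these from an IPAM plugin
# (e.g. host-local) and actually create the veth pair inside CNI_NETNS.
FAKE_ADDRESS = "10.244.0.2/24"
FAKE_GATEWAY = "10.244.0.1"


def handle(env: dict, config: dict) -> dict:
    """Map one CNI invocation (env vars + stdin config) to a result object."""
    command = env["CNI_COMMAND"]
    version = config.get("cniVersion", "1.0.0")
    if command == "ADD":
        return {
            "cniVersion": version,
            "interfaces": [{"name": env["CNI_IFNAME"], "sandbox": env["CNI_NETNS"]}],
            "ips": [{"address": FAKE_ADDRESS, "gateway": FAKE_GATEWAY, "interface": 0}],
        }
    if command in ("DEL", "CHECK"):
        return {}  # success is signalled by exit code 0; no output needed
    if command == "VERSION":
        return {"cniVersion": version, "supportedVersions": ["1.0.0"]}
    raise ValueError(f"unsupported CNI_COMMAND: {command}")


# Only act like a plugin when invoked by a runtime (which always sets CNI_COMMAND).
if __name__ == "__main__" and "CNI_COMMAND" in os.environ:
    result = handle(dict(os.environ), json.load(sys.stdin))
    if result:
        json.dump(result, sys.stdout)
```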
IP Address Management
The host-local IPAM plugin allocates IPs from per-node subnets. When a pod is scheduled, the node's CNI allocates the next available IP from that node's /24.
host-local IPAM
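Records each allocated IP as a file under /var/lib/cni/networks/<network-name>/ on the node, holding the owning container's ID. Allocation resumes after the last IP handed out (round-robin), so a just-released IP is not immediately reused. Simple, deterministic, per-node.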
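You can sketch the round-robin behaviour with an in-memory allocator. This is a simplified model that assumes the gateway address is reserved. RoundRobinAllocator is a hypothetical name, and a dict stands in for the on-disk files that host-local really uses.

```python
import ipaddress


class RoundRobinAllocator:
    """Simplified host-local-style allocator for one node's pod subnet."""

    def __init__(self, subnet: str, gateway: str):
        net = ipaddress.ip_network(subnet)
        gw = ipaddress.ip_address(gateway)
        self.pool = [ip for ip in net.hosts() if ip != gw]  # usable pod IPs
        self.allocated = {}  # IP -> container ID (host-local keeps one file per IP)
        self.last = -1       # index of the last reserved IP

    def allocate(self, container_id: str) -> str:
        n = len(self.pool)
        for step in range(1, n + 1):
            idx = (self.last + step) % n  # resume after the last IP handed out
            ip = str(self.pool[idx])
            if ip not in self.allocated:
                self.allocated[ip] = container_id
                self.last = idx
                return ip
        raise RuntimeError("subnet exhausted")

    def release(self, container_id: str) -> None:
        for ip, owner in list(self.allocated.items()):
            if owner == container_id:
                del self.allocated[ip]
```

After freeing 10.244.0.2, the next pod still gets 10.244.0.4. This gives stale connections aimed at the old pod time to die before the address is recycled.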
Node Subnet Pre-allocated
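Each node is assigned its own subnet (a /24 by default) when it joins the cluster. With --allocate-node-cidrs enabled, kube-controller-manager carves the subnet out of the pod CIDR and records it in the Node's spec.podCIDR. The node's CNI only hands out IPs from this pre-allocated pool.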
No IP Conflicts
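Because every node's subnet is disjoint from every other node's, no central IPAM coordinator is needed when pods are created. Node 1 (10.244.0.0/24) can only hand out 10.244.0.x, and node 2 (10.244.1.0/24) only 10.244.1.x. No two pods in the cluster can therefore receive the same IP.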
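Carving disjoint node subnets amounts to splitting the pod CIDR. This sketch assumes in-order assignment. The real node IPAM controller also reclaims CIDRs when nodes are deleted, and carve_node_subnets is a hypothetical helper.

```python
import ipaddress


def carve_node_subnets(cluster_cidr: str, nodes: list, mask: int = 24) -> dict:
    """Give each node its own disjoint /mask block from the pod CIDR, in order."""
    blocks = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=mask)
    assignment = {}
    for node in nodes:
        try:
            assignment[node] = str(next(blocks))
        except StopIteration:
            raise RuntimeError("pod CIDR has no free node subnets left") from None
    return assignment


if __name__ == "__main__":
    subnets = carve_node_subnets("10.244.0.0/16", ["node-1", "node-2"])
    print(subnets)  # {'node-1': '10.244.0.0/24', 'node-2': '10.244.1.0/24'}
```

With the defaults shown (a /16 split into /24s), the cluster tops out at 256 nodes.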
VXLAN Encapsulation
When pods on different nodes communicate over an overlay (for example flannel's vxlan backend), the CNI encapsulates packets using VXLAN. The original pod-to-pod packet is wrapped inside a UDP datagram whose outer IP header carries the node IPs.
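1. Pod A (10.244.0.2) sends a packet to Pod B (10.244.1.3)
2. Pod route table: 10.244.1.3 is outside the pod's local subnet → send to the default gateway on cni0
3. Host routing: 10.244.1.0/24 lives on another node → send via the VXLAN device (VTEP)
4. VXLAN VTEP encapsulates the packet
5. Node A sends it over the node network: outer src=192.168.1.10, dst=192.168.1.11
6. Node B's VTEP strips the outer headers and delivers the inner packet to Pod B through its own cni0
Encapsulated packet, outermost first:
Outer IP + UDP: src=192.168.1.10, dst=192.168.1.11 (node IPs)
VXLAN Header: VNI=1 (24-bit, supports ~16M virtual networks)
Inner Frame: src=10.244.0.2, dst=10.244.1.3 (original pod IP packet)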
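Python's struct module makes it easy to check the 8-byte VXLAN header from RFC 7348 and the per-packet overhead. This is a minimal sketch. The function names are illustrative, and the overhead figure assumes an IPv4 underlay without VLAN tags.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """8 bytes: flags (8 bits), reserved (24), VNI (24), reserved (8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    _flags_word, vni_word = struct.unpack("!II", header[:8])
    return vni_word >> 8


# Bytes added in front of the inner IP packet, all carried inside the outer IP MTU.
OVERHEAD = {"outer IPv4": 20, "outer UDP": 8, "VXLAN": 8, "inner Ethernet": 14}


def inner_mtu(underlay_mtu: int = 1500) -> int:
    return underlay_mtu - sum(OVERHEAD.values())


if __name__ == "__main__":
    print(vxlan_header(1).hex())  # 0800000000000100
    print(inner_mtu())            # 1450
```

On a 1500-byte underlay that leaves 1450 bytes for the pod's IP packet. VXLAN overlays therefore commonly give pod interfaces a 1450-byte MTU.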
| CNI / Mode | Encapsulation | Performance | Use Case |
|---|---|---|---|
| flannel (host-gw) | None | Best | Layer 2 adjacent nodes |
| flannel (vxlan) | VXLAN | Good | Cross-subnet routing |
| calico (BGP) | None | Best | Large-scale, no encapsulation |
| calico (IPIP) | IP-in-IP | Good | Cross-subnet, simple tunnel |
| cilium | VXLAN/Geneve, or none with native routing (eBPF datapath) | Best | High-performance, observability |
kube-proxy & Service Load Balancing
Services get a virtual ClusterIP that no pod or node owns. kube-proxy runs on every node, watches Services and EndpointSlices through the API server, and programs iptables or IPVS rules that load-balance traffic for each ClusterIP across the backing pods.
| | iptables | IPVS |
|---|---|---|
| Rule matching | Linear chain traversal | Hash table lookup |
| LB algorithms | Random only | RR, source hash, least conn |
| Scale | O(n) rules | O(1) lookup |
| Default | Yes | No (opt-in) |
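iptables has no native "pick a random backend" target. kube-proxy's iptables mode instead builds one rule per endpoint in a per-Service chain using the statistic match (-m statistic --mode random --probability p), and the last rule matches unconditionally. The arithmetic below is a sketch with exact fractions. It shows why probabilities of 1/n, 1/(n-1), ..., 1 give every endpoint the same 1/n share.

```python
from fractions import Fraction


def rule_probabilities(n: int) -> list:
    """Match probability of each rule in chain order; the last one is always 1."""
    return [Fraction(1, n - i) for i in range(n)]


def selection_odds(n: int) -> list:
    """Overall chance each endpoint is chosen when walking the chain top to bottom."""
    odds, remaining = [], Fraction(1)
    for p in rule_probabilities(n):
        odds.append(remaining * p)  # reach this rule, then match it
        remaining *= 1 - p          # otherwise fall through to the next rule
    return odds


if __name__ == "__main__":
    print([str(p) for p in rule_probabilities(3)])  # ['1/3', '1/2', '1']
    print([str(o) for o in selection_odds(3)])      # ['1/3', '1/3', '1/3']
```

The same walk explains the O(n) row: each packet may test several rules before one matches.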
How ClusterIPs Are Allocated
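Service ClusterIPs are allocated from the service CIDR by the API server. They are virtual. In iptables mode no interface holds them at all; in IPVS mode kube-proxy binds them to a dummy interface (kube-ipvs0) only so the kernel accepts the traffic. No pod or node owns them.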
Headless Services
clusterIP: None means no virtual IP is allocated. CoreDNS returns pod IPs directly instead. Used for service discovery when you need direct pod access.
Any Unallocated IP Works
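A Service can request a specific ClusterIP through spec.clusterIP, provided the address lies inside the service CIDR and is not already allocated; otherwise the API server picks a free one. The address never needs to be routable on the underlying network, because kube-proxy's iptables/IPVS rules rewrite it to a pod IP before the packet leaves the node.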
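A simplified allocator shows both paths: honouring a requested spec.clusterIP and picking a random free address. The real allocator in the API server also persists its allocations and biases dynamic picks away from the low end of the range, so user-requested IPs are less likely to collide. ClusterIPAllocator is a hypothetical name.

```python
import ipaddress
import random
from typing import Optional


class ClusterIPAllocator:
    """Simplified sketch of ClusterIP allocation from an IPv4 service CIDR."""

    def __init__(self, service_cidr: str, seed: Optional[int] = None):
        self.net = ipaddress.ip_network(service_cidr)
        self.used = set()
        self.rng = random.Random(seed)

    def allocate(self, requested: Optional[str] = None) -> str:
        if requested is not None:  # like setting spec.clusterIP explicitly
            ip = ipaddress.ip_address(requested)
            if ip not in self.net or ip in (self.net.network_address, self.net.broadcast_address):
                raise ValueError(f"{ip} is not a usable address in {self.net}")
            if ip in self.used:
                raise ValueError(f"{ip} is already allocated")
        else:  # pick a random free address, skipping network and broadcast
            size = self.net.num_addresses
            if len(self.used) >= size - 2:
                raise RuntimeError("service CIDR exhausted")
            while True:
                ip = self.net.network_address + self.rng.randrange(1, size - 1)
                if ip not in self.used:
                    break
        self.used.add(ip)
        return str(ip)
```

Requesting 10.96.0.1 mirrors the built-in kubernetes Service, which receives the first address of the service CIDR.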
Endpoints & EndpointSlices
An Endpoints object (and its scalable successor, EndpointSlice) lists the IP:port pairs that back a Service. When a pod matching the Service's selector becomes Ready, its IP:port is added; when it stops being ready or is deleted, it is removed.
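EndpointSlices (Up to 100 Endpoints Each)
By default the EndpointSlice controller puts at most 100 endpoints in each EndpointSlice; this is configurable with kube-controller-manager's --max-endpoints-per-slice, up to 1000. A large Service is spread across several slices. A change to one pod therefore rewrites a single small object instead of one huge Endpoints object, and no object approaches size limits.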
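The slicing itself is plain chunking. This sketch assumes a flat list of endpoint strings, and chunk_endpoints is a hypothetical simplification. The real controller also groups endpoints by port set and address type, and it updates existing slices in place rather than repacking them.

```python
def chunk_endpoints(endpoints: list, max_per_slice: int = 100) -> list:
    """Split a Service's endpoints into slices of at most max_per_slice entries."""
    if max_per_slice < 1:
        raise ValueError("max_per_slice must be positive")
    return [endpoints[i:i + max_per_slice] for i in range(0, len(endpoints), max_per_slice)]


if __name__ == "__main__":
    eps = [f"10.244.{i // 100}.{i % 100 + 2}:8080" for i in range(250)]
    print([len(s) for s in chunk_endpoints(eps)])  # [100, 100, 50]
```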
Pod Death → Endpoint Removal
When a pod starts terminating, the EndpointSlice controller marks its endpoint as not ready (with the terminating condition set) and drops it once the pod is deleted. kube-proxy watches EndpointSlices and updates its rules, typically within seconds.
Headless = No ClusterIP
Headless services (clusterIP: None) still have Endpoints, but no virtual IP. DNS returns the pod IPs directly instead of a service VIP.
CoreDNS & Service Discovery
CoreDNS is the default DNS server for Kubernetes clusters. It handles DNS queries for cluster-local service names and converts them to ClusterIPs or pod IPs.
Pod FQDN
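Pods get DNS A records named after their IP, not their name: a pod at 10.244.0.2 in namespace ns resolves as 10-244-0-2.ns.pod.cluster.local. A pod whose spec sets hostname and subdomain (matching a headless Service) is also resolvable as hostname.subdomain.ns.svc.cluster.local. This is how StatefulSet pods get stable names for direct pod-to-pod communication.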
Headless = Direct Pod IPs
For headless services (clusterIP: None), CoreDNS returns one A record per ready pod IP. No virtual IP or proxy sits in between; the client decides which address to use.
SRV Records
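For a Service port named http on TCP, a query for _http._tcp.my-svc.ns.svc.cluster.local returns an SRV record carrying the port number (e.g., 80) and the target my-svc.ns.svc.cluster.local. Protocols that need port information use it for service discovery.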
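These naming rules reduce to string templates. This sketch assumes the default cluster domain cluster.local and IPv4 pod addresses. The helper names are hypothetical.

```python
def pod_a_record(pod_ip: str, namespace: str, domain: str = "cluster.local") -> str:
    """Pod A record name: the dots of the IPv4 address become dashes."""
    return f"{pod_ip.replace('.', '-')}.{namespace}.pod.{domain}"


def service_fqdn(service: str, namespace: str, domain: str = "cluster.local") -> str:
    return f"{service}.{namespace}.svc.{domain}"


def srv_name(port_name: str, protocol: str, service: str, namespace: str,
             domain: str = "cluster.local") -> str:
    """SRV query name for a named Service port: _<port>._<proto>.<service FQDN>."""
    return f"_{port_name}._{protocol.lower()}.{service_fqdn(service, namespace, domain)}"


if __name__ == "__main__":
    print(pod_a_record("10.244.0.2", "default"))  # 10-244-0-2.default.pod.cluster.local
    print(srv_name("http", "TCP", "my-svc", "ns"))  # _http._tcp.my-svc.ns.svc.cluster.local
```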
Ingress & HTTP Routing
Ingress is the Kubernetes resource for external HTTP/HTTPS access to Services. The Ingress Controller (nginx, contour, traefik) enforces routing rules defined in Ingress resources.
| Host Header | Path (pathType: Prefix) | Service | Port |
|---|---|---|---|
| api.example.com | /api | api-svc | 8080 |
| api.example.com | /static | static-svc | 8081 |
| api.example.com | / | default-svc | 8082 |
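The sketch below shows routing decisions for the rules in the table, written as pathType: Prefix paths (/api rather than /api/*). Prefix matching compares whole path elements, so /api matches /api/users but not /apiv2, and the longest matching prefix wins. Real controllers add Exact paths, default backends and implementation-specific extensions; route and RULES are hypothetical names.

```python
from typing import Optional, Tuple

# Rules from the routing table: (host, path prefix, service, port)
RULES = [
    ("api.example.com", "/api", "api-svc", 8080),
    ("api.example.com", "/static", "static-svc", 8081),
    ("api.example.com", "/", "default-svc", 8082),
]


def _elements(path: str) -> list:
    return [part for part in path.split("/") if part]


def prefix_matches(prefix: str, path: str) -> bool:
    """Prefix semantics: compare whole path elements, not raw characters."""
    want = _elements(prefix)
    return _elements(path)[:len(want)] == want


def route(host: str, path: str) -> Optional[Tuple[str, int]]:
    candidates = [r for r in RULES if r[0] == host and prefix_matches(r[1], path)]
    if not candidates:
        return None
    _, _, service, port = max(candidates, key=lambda r: len(r[1]))  # longest prefix wins
    return service, port
```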
Ingress Controller vs Ingress Resource
The Ingress Controller is the actual HTTP proxy (nginx, contour, traefik). The Ingress resource is the Kubernetes configuration object that describes routing rules.
TLS Termination
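HTTPS traffic is terminated at the Ingress controller, which loads the certificate from the Kubernetes Secret named in the Ingress's spec.tls. The controller decrypts the traffic and, by default, forwards it to backend Services over plain HTTP. Many controllers can instead be configured to re-encrypt to the backend.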