Kubernetes Networking — Deep Dive Visualization

00 Foundation

Three Non-Overlapping IP Ranges

Kubernetes requires three completely separate IP address spaces. Operators must configure all three — and they must never overlap.

Node Network

192.168.1.0/24 — Your infrastructure

Pod CIDR

10.244.0.0/16 — CNI allocates from this

Service CIDR

10.96.0.0/12 — Virtual IPs only

Range Name	Used For	Configured By	Example
Node Network	Host/node IP addresses	Network admin / DHCP	192.168.1.0/24
Pod CIDR	Pod IP addresses	CNI plugin / kubeadm --pod-network-cidr	10.244.0.0/16
Service CIDR	Virtual ClusterIPs	kube-apiserver --service-cluster-ip-range	10.96.0.0/12

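Key insight: Service ClusterIPs are virtual. No pod or physical interface owns one; kube-proxy programs iptables or IPVS rules that redirect traffic from the virtual IP to the actual backing pods. (In IPVS mode the address is bound to a local dummy interface, kube-ipvs0, only so the kernel accepts the traffic.)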

Warning: All three ranges must be non-overlapping. A pod IP must never equal a node IP or a service IP, and vice versa.
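The non-overlap rule is easy to check mechanically. The following is a minimal sketch using Python's standard ipaddress module and the example ranges from the table above; the helper name find_overlaps is made up for illustration and is not part of any Kubernetes tooling.

```python
import ipaddress
from itertools import combinations

def find_overlaps(ranges):
    """Return a list of (name_a, name_b) pairs whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]

# The three example ranges from this section
cluster_ranges = {
    "node": "192.168.1.0/24",
    "pod": "10.244.0.0/16",
    "service": "10.96.0.0/12",
}
print(find_overlaps(cluster_ranges))  # [] because the example ranges are disjoint

# A hypothetical misconfiguration: 10.100.0.0/16 falls inside 10.96.0.0/12
bad = dict(cluster_ranges, pod="10.100.0.0/16")
print(find_overlaps(bad))  # [('pod', 'service')]
```

Running a check like this before cluster creation is cheap, since these ranges are hard to change afterwards.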

01 Flat Network

The Kubernetes Networking Guarantee

Kubernetes provides a flat network model where every pod gets its own IP and can communicate with every other pod directly — no NAT required.

Every Pod Gets Its Own IP

Pods are treated like VMs. No port mapping, no NAT — the pod's IP is its identity. Containers inside share the same network namespace.

No NAT Between Pods

Pods communicate directly using pod IPs. Traffic sent from pod A to pod B arrives with source IP = pod A, destination IP = pod B.

Same Flat Network Across All Nodes

Pods on different nodes can communicate directly too. The network fabric ensures the pod IP is routable from anywhere in the cluster.

IP Per Pod (via Pause)

Containers share the pod's network namespace. The pause container holds the network namespace so app containers can crash and restart without losing the IP.

Pod-to-Pod Communication (No NAT)

Node A (192.168.1.10), Pod A: 10.244.0.2

Packet: 10.244.0.2 → 10.244.1.3 (no NAT)

Node B (192.168.1.11), Pod B: 10.244.1.3

02 Pod & Pause

The Pause Container & Network Namespace

Every pod runs a "pause" container (sandbox) that holds the network namespace. App containers join this namespace at startup.

Node

eth0 (in Pod netns)

IP: 10.244.0.2

Pause Container (Sandbox)

Holds the network namespace

eth0 @ 10.244.0.2

nginx (shares eth0)

sidecar (shares eth0)

All containers share eth0 from pause container's network namespace

Shared loopback (lo) — localhost works within pod

Network Namespace Held by Pause

One network namespace per pod, created before app containers start. The pause container is essentially a no-op process that just holds the namespace open.

Crash Isolation

If an app container crashes, the pause container keeps the network namespace alive. The IP persists and Kubernetes can restart the app without renumbering.

Shared loopback

All containers in a pod share lo, so one container can reach another at localhost:port. This is useful for sidecar and adapter patterns.

03 veth & Bridge

veth Pairs & the CNI Bridge

On the same node, pods communicate through virtual ethernet (veth) pairs connected to a bridge called cni0.

Same-Node Packet Flow

Pod A (10.244.0.2, veth-A) ↔ veth pair ↔ cni0 ↔ veth pair ↔ Pod B (10.244.0.3, veth-B)

MAC Address	Interface	IP Address
aa:bb:cc:dd:ee:00	veth-A	10.244.0.2
aa:bb:cc:dd:ee:01	veth-B	10.244.0.3

veth Pair = Virtual Cable

A veth pair is like a pipe — one end in the pod's network namespace, the other on the host connected to the bridge. Traffic going in one end comes out the other.

Bridge Acts Like a Switch

The cni0 bridge learns MAC addresses from incoming frames and forwards them to the correct port. It's a software switch running on the host.
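The switch-like behavior can be illustrated with a toy model. This sketch is not how the Linux bridge is implemented; it only mimics the learn-then-forward logic. The class name LearningBridge and the port names veth-B-host and veth-C-host are hypothetical, chosen to match the naming used in this section.

```python
class LearningBridge:
    """Toy model of a learning switch: remembers which port each source MAC arrived on."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}  # forwarding database: MAC -> port

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the sender's port, then return the set of ports to forward out of."""
        self.fdb[src_mac] = in_port
        out_port = self.fdb.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress one
            return self.ports - {in_port}
        if out_port == in_port:
            # Destination is on the port the frame came from: nothing to forward
            return set()
        return {out_port}

bridge = LearningBridge(["veth-A-host", "veth-B-host", "veth-C-host"])
# First frame from Pod A: Pod B's MAC is still unknown, so the frame is flooded
first = bridge.receive("veth-A-host", "aa:bb:cc:dd:ee:00", "aa:bb:cc:dd:ee:01")
# Pod B's reply: Pod A's MAC was learned above, so only one port is used
reply = bridge.receive("veth-B-host", "aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:00")
print(sorted(first), sorted(reply))
```

After the first exchange both MACs are in the table, so later frames between the two pods go to exactly one port.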

ARP Resolution

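The bridge itself keeps a forwarding database (FDB) that maps MAC addresses to bridge ports; it holds no IP-to-MAC table. IP-to-MAC resolution is done by ARP in the sending pod's network stack: Pod A broadcasts an ARP request for Pod B's IP, the bridge floods it, Pod B replies, and the bridge then forwards frames for Pod B's MAC out the port where it learned that MAC.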

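# Create a veth pair, move one end into the pod's netns, attach the other to the bridge
# POD_PID is assumed to hold the PID of the pod's pause process
ip link add veth-A type veth peer name veth-A-host
ip link set veth-A netns "$POD_PID"
ip link set veth-A-host master cni0
ip link set veth-A-host up
# The pod end now lives in the pod's namespace, so configure it from there
nsenter -t "$POD_PID" -n ip addr add 10.244.0.2/24 dev veth-A
nsenter -t "$POD_PID" -n ip link set veth-A up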

04 CNI Plugin

Container Network Interface

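CNI is the contract between the container runtime and network plugins. When a pod is created, kubelet asks the runtime (for example containerd or CRI-O) to create the pod sandbox, and the runtime invokes the CNI plugin to set up networking.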

1 Pod scheduled and created by kubelet

→ kubelet (via the container runtime) calls CNI ADD with the container ID and netns path (ADD /run/netns/pod-netns cni0)

→ CNI plugin configures the network (veth, bridge, IP allocation)

→ CNI returns the assigned IP to kubelet (IP: 10.244.0.5)

→ Pod deletion triggers CNI DEL to tear the network down (DEL /run/netns/pod-netns)

# /etc/cni/net.d/10-bridge.conf
{
  "cniVersion": "0.4.0",
  "name": "bridge",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/24",
    "routes": [{"dst": "0.0.0.0/0"}]
  }
}
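To make the ADD/DEL contract concrete, here is a toy dispatcher. It is a sketch, not a working plugin: it creates no interfaces and only shows how the command arrives (environment variables such as CNI_COMMAND, CNI_CONTAINERID, CNI_NETNS, and CNI_IFNAME, with the network config as JSON on stdin) and what a result in the 0.4.0 style roughly looks like. The function name handle_cni and the allocate_ip callback are hypothetical, and the result is trimmed to a few fields.

```python
import json

def handle_cni(env, stdin_text, allocate_ip):
    """Toy dispatcher for the two core CNI commands. Not a real plugin."""
    conf = json.loads(stdin_text)
    command = env["CNI_COMMAND"]
    if command == "ADD":
        address = allocate_ip(env["CNI_CONTAINERID"])
        # A real plugin would create the veth pair and attach it to conf["bridge"] here
        return {
            "cniVersion": conf["cniVersion"],
            "interfaces": [{"name": env["CNI_IFNAME"], "sandbox": env["CNI_NETNS"]}],
            "ips": [{"version": "4", "address": address, "interface": 0}],
        }
    if command == "DEL":
        # A real plugin would release the IP and delete the interface here
        return {}
    raise ValueError(f"unsupported CNI_COMMAND: {command}")

env = {
    "CNI_COMMAND": "ADD",
    "CNI_CONTAINERID": "abc123",          # hypothetical container ID
    "CNI_NETNS": "/run/netns/pod-netns",
    "CNI_IFNAME": "eth0",
}
conf_text = '{"cniVersion": "0.4.0", "name": "bridge", "type": "bridge", "bridge": "cni0"}'
result = handle_cni(env, conf_text, lambda container_id: "10.244.0.5/24")
print(json.dumps(result, indent=2))
```

The IP allocation is passed in as a callback because, in real configurations, the main plugin delegates that step to a separate IPAM plugin.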

Plugin Types

bridge

host-device

vlan

ipvlan

macvlan

ptp

portmap

bandwidth

tuning

sbr

flannel

calico

cilium

weave

05 CNI IPAM

IP Address Management

The host-local IPAM plugin allocates IPs from per-node subnets. When a pod is scheduled, the node's CNI allocates the next available IP from that node's /24.

Cluster CIDR Split into Node Subnets

10.244.0.0/16 (Cluster CIDR)

Node A

10.244.0.0/24

Pod IPs: 10.244.0.1–254

Node B

10.244.1.0/24

Pod IPs: 10.244.1.1–254

Node C

10.244.2.0/24

Pod IPs: 10.244.2.1–254

IP Allocation Flow

1 Pod scheduled to Node A

→

2 Kubelet calls CNI ADD

→

3 host-local assigns next from 10.244.0.0/24

→

4 IP: 10.244.0.5

host-local IPAM

Stores allocated IPs on the node filesystem (/var/lib/cni/networks/). Never reuses an IP until it's explicitly released. Simple, deterministic, per-node.

Node Subnet Pre-allocated

Each node is assigned its /24 (recorded as the Node's spec.podCIDR) when it joins the cluster, typically by the node IPAM logic in kube-controller-manager. The node's CNI only hands out IPs from this pre-allocated pool.

No IP Conflicts

Since each node manages its own /24, there's no central IPAM coordinator needed. The per-node subnets are disjoint, so two nodes can never hand out the same address: Node A may assign 10.244.0.5 while Node B assigns 10.244.1.5.
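The split of the cluster CIDR into per-node subnets and the per-node allocation can be sketched in a few lines. This is a simplified model under stated assumptions: the class NodeIPAM is hypothetical, it keeps state in memory rather than under /var/lib/cni/networks/, it uses a first-free strategy that may differ from the real host-local plugin, and it reserves the first host address for the bridge gateway.

```python
import ipaddress

class NodeIPAM:
    """Toy per-node allocator, loosely modeled on the host-local behavior described above."""

    def __init__(self, node_subnet):
        self.subnet = ipaddress.ip_network(node_subnet)
        self.allocated = {}  # container ID -> IPv4Address

    def allocate(self, container_id):
        in_use = set(self.allocated.values())
        hosts = self.subnet.hosts()
        next(hosts)  # assumption: the first host address is kept for the bridge gateway
        for ip in hosts:
            if ip not in in_use:
                self.allocated[container_id] = ip
                return ip
        raise RuntimeError(f"no free addresses in {self.subnet}")

    def release(self, container_id):
        # An address becomes reusable only after an explicit release
        self.allocated.pop(container_id, None)

cluster_cidr = ipaddress.ip_network("10.244.0.0/16")
node_subnets = list(cluster_cidr.subnets(new_prefix=24))
print(len(node_subnets), node_subnets[0], node_subnets[1], node_subnets[2])

node_a = NodeIPAM(str(node_subnets[0]))
node_b = NodeIPAM(str(node_subnets[1]))
ip_a = node_a.allocate("pod-a")
ip_b = node_b.allocate("pod-b")
print(ip_a, ip_b)  # 10.244.0.2 10.244.1.2
```

Because each allocator only ever sees its own /24, the two nodes need no coordination to stay conflict-free.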

06 Multi-Node Overlay

VXLAN Encapsulation

When pods on different nodes need to communicate, an overlay CNI (flannel's vxlan backend, for example) encapsulates packets using VXLAN. The original pod IP packet is wrapped inside a UDP packet with node IPs as outer headers.

Packet Journey: Node A Pod → Node B Pod

Source Node A

Pod A: 10.244.0.2

Node: 192.168.1.10

UDP 4789 • VXLAN VNI 1

1. Pod A sends to 10.244.1.3 (Pod B IP)
2. Pod route table: destination is outside the pod's subnet → send to the gateway on cni0
3. Host routing: 10.244.1.3 is not in the local node subnet → route to the VXLAN device
4. VXLAN VTEP encapsulates packet
5. Outer: src=192.168.1.10, dst=192.168.1.11

Destination Node B

Pod B: 10.244.1.3

Node: 192.168.1.11

Outer Header: src=192.168.1.10, dst=192.168.1.11, protocol=UDP, port=4789
VXLAN Header: VNI=1 (24-bit, supports 16M virtual networks)
Inner Packet: src=10.244.0.2, dst=10.244.1.3 (original pod IP packet)
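The VXLAN header itself is small enough to build by hand. The sketch below packs the 8-byte header layout from RFC 7348 (8 flag bits, 24 reserved bits, 24-bit VNI, 8 reserved bits) and works out the per-packet overhead for an IPv4 underlay. The function names are made up for illustration, and the 1500-byte underlay MTU is an assumption.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348

def vxlan_header(vni):
    """Pack the 8-byte VXLAN header: flags(8) reserved(24) VNI(24) reserved(8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def parse_vni(header):
    """Recover the VNI from a packed VXLAN header."""
    _, second_word = struct.unpack("!II", header)
    return second_word >> 8

# Bytes added inside the outer IP packet, assuming an IPv4 underlay:
# inner Ethernet header + outer IPv4 header + outer UDP header + VXLAN header
OVERHEAD = 14 + 20 + 8 + 8

header = vxlan_header(1)
print(header.hex())     # 0800000000000100
print(parse_vni(header))  # 1
print(1500 - OVERHEAD)  # 1450, the usable pod MTU on a 1500-byte underlay
```

This overhead is why overlay interfaces are commonly configured with an MTU of 1450 when the underlying network uses 1500.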

CNI Overlay Modes Comparison

CNI / Mode	Encapsulation	Performance	Use Case
flannel (host-gw)	None	Best	Layer 2 adjacent nodes
flannel (vxlan)	VXLAN	Good	Cross-subnet routing
calico (BGP)	None	Best	Large-scale, no encapsulation
calico (IPIP)	IP-in-IP	Good	Cross-subnet, simple tunnel
cilium	VXLAN/Geneve or none (native routing), eBPF datapath	Best	High-performance, observability

07 kube-proxy

kube-proxy & Service Load Balancing

Services get a virtual ClusterIP that doesn't correspond to any real interface. kube-proxy watches the API server and programs iptables or IPVS rules to load-balance traffic to backing pods.

ClusterIP (Virtual IP)

10.96.0.100

↓ ↓ ↓

10.244.0.10:80

10.244.0.11:80

10.244.0.12:80

kube-proxy Watching API Server

kube-apiserver

Port 6443

←

Endpoints changes

Pod IP:port updates

→

kube-proxy

Updates iptables/IPVS

	iptables	IPVS
Algorithm	Chain traversal	Hash table lookup
LB algorithms	Random only	RR, source hash, least conn
Scale	O(n) rules	O(1) lookup
Default	Yes	No (opt-in)

# iptables -t nat -L KUBE-SERVICES (abbreviated)
KUBE-SVC-XXXX  tcp  --  anywhere  10.96.0.100  tcp  dpt:http
# ... the KUBE-SVC-XXXX chain jumps to a per-endpoint chain ...
KUBE-SEP-YYYY  all  --  anywhere  anywhere

# ... and the KUBE-SEP-YYYY chain DNATs to the endpoint:
DNAT  tcp  --  anywhere  anywhere  tcp  to:10.244.0.10:80
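The "random" load balancing of iptables mode comes from the statistic match: each endpoint rule in the service chain is tried in order with a decreasing chance of being skipped. The sketch below assumes the commonly observed scheme of probabilities 1/n, 1/(n-1), ..., 1 and shows that it gives every endpoint an equal share. The function names are hypothetical.

```python
from fractions import Fraction

def statistic_probabilities(n):
    """Per-rule match probabilities for n endpoints, evaluated in order."""
    return [Fraction(1, n - i) for i in range(n)]

def effective_share(probabilities):
    """Chance that each rule is the one that finally matches."""
    shares, remaining = [], Fraction(1)
    for p in probabilities:
        shares.append(remaining * p)   # reached this rule and it matched
        remaining *= 1 - p             # reached this rule and it did not match
    return shares

rules = statistic_probabilities(3)
print([str(p) for p in rules])                   # ['1/3', '1/2', '1']
print([str(s) for s in effective_share(rules)])  # ['1/3', '1/3', '1/3']
```

The last rule always matches, so no packet falls through the chain, and the sequential evaluation is also why rule count grows linearly with the number of endpoints.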

08 Service IP Allocation

How ClusterIPs Are Allocated

Service ClusterIPs are allocated from the service CIDR by the API server. They are purely virtual — no network interface is ever assigned these addresses.

Service CIDR Address Space

10.96.0.0/12

10.96.0.1 — kubernetes.default.svc (reserved)

Available for Services (10.96.0.2 – 10.111.255.254)

Service ClusterIP Allocation Steps

1 Service created (no clusterIP specified)

2 apiserver allocates a free address from the service CIDR

3 apiserver stores in etcd

4 kube-proxy watches, creates iptables rules

5 ClusterIP never assigned to any interface
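The size and boundaries of the service range follow directly from the prefix length. This sketch uses the standard ipaddress module to compute them and includes a toy allocator; the function allocate_cluster_ip is hypothetical and uses a simple first-free strategy, which is an assumption and not a description of the real apiserver allocator.

```python
import ipaddress

service_cidr = ipaddress.ip_network("10.96.0.0/12")
print(service_cidr.num_addresses)         # 1048576 addresses in a /12
print(service_cidr[0], service_cidr[-1])  # 10.96.0.0 10.111.255.255
print(service_cidr[1])                    # 10.96.0.1, used by kubernetes.default.svc

def allocate_cluster_ip(cidr, in_use):
    """Toy allocator: return the first host address that is not in use."""
    for ip in cidr.hosts():
        if ip not in in_use:
            return ip
    raise RuntimeError("service CIDR exhausted")

in_use = {ipaddress.ip_address("10.96.0.1")}
new_ip = allocate_cluster_ip(service_cidr, in_use)
print(new_ip)  # 10.96.0.2
```

Whatever strategy is used, the allocator only has to guarantee uniqueness within the range; nothing needs to be configured on any interface.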

Headless Services

clusterIP: None means no virtual IP is allocated. CoreDNS returns pod IPs directly instead. Used for service discovery when you need direct pod access.

Any Unallocated IP Works

ClusterIP can be any unallocated IP in the service CIDR — even non-routable ones. It's purely a virtual mapping managed by iptables/IPVS rules.


09 Endpoints

Endpoints & EndpointSlices

An Endpoints object is a collection of IP:port pairs that back a Service. When a pod with matching labels becomes Ready, its IP:port is added; when it terminates or stops being Ready, it is removed.

Service → Endpoints Relationship

Service

selector: app=nginx

ClusterIP: 10.96.0.100

→

Endpoint: 10.244.0.10:80

Endpoint: 10.244.0.11:80

Endpoint: 10.244.0.12:80

Pod dies: Endpoint removed → kube-proxy updates rules → traffic stops routing to that pod

EndpointSlices (up to 100 endpoints each)

By default each EndpointSlice object holds at most 100 endpoints, so a single large Service is spread across multiple EndpointSlices. This keeps individual objects small and avoids rewriting one huge object whenever a single pod changes.
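The grouping can be pictured as simple chunking. The sketch below is only an illustration of the size limit: the function slice_endpoints is hypothetical, the endpoint addresses are generated for the example, and the real controller does not necessarily pack slices this densely.

```python
def slice_endpoints(endpoints, max_per_slice=100):
    """Split a flat endpoint list into chunks of at most max_per_slice entries."""
    return [
        endpoints[i:i + max_per_slice]
        for i in range(0, len(endpoints), max_per_slice)
    ]

# 250 made-up endpoints for one large Service
endpoints = [f"10.244.0.{i + 2}:80" for i in range(250)]
slices = slice_endpoints(endpoints)
print([len(s) for s in slices])  # [100, 100, 50]
```

With this layout, a single pod change touches one slice of at most 100 entries instead of one object listing all 250.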

Pod Death → Endpoint Removal

When a pod terminates or fails its readiness probe, the EndpointSlice controller marks the endpoint not ready or removes it from the EndpointSlice. kube-proxy watches these objects and reprograms its rules, typically within a few seconds.

Headless = No ClusterIP

Headless services (clusterIP: None) still have Endpoints, but no virtual IP. DNS returns the pod IPs directly instead of a service VIP.

# Check endpoints for a service
kubectl get endpoints <svc-name>

# Sample output:
NAME     ENDPOINTS                       AGE
my-svc   10.244.0.10:80,10.244.0.11:80   5d

10 CoreDNS

CoreDNS & Service Discovery

CoreDNS is the default DNS server for Kubernetes clusters. It handles DNS queries for cluster-local service names and converts them to ClusterIPs or pod IPs.

DNS Query Flow

1 busybox runs nslookup my-svc.default.svc.cluster.local

2 Query sent to kube-dns (10.96.0.10), the nameserver listed in the pod's /etc/resolv.conf

3 CoreDNS answers from its in-memory view of Services and EndpointSlices, kept current by watching the API server

4 Returns ClusterIP 10.96.0.100

Pod FQDN

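pod-ip-with-dashes.ns.pod.cluster.local, for example 10-244-0-10.default.pod.cluster.local. Pods get an IP-derived record under the pod subdomain; a stable name-based record (hostname.subdomain.ns.svc.cluster.local) exists only when the pod sets hostname and subdomain and a matching headless Service exists.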

Headless = Direct Pod IPs

For headless services (clusterIP: None), CoreDNS returns A records for all matching pod IPs directly. No load balancing at the DNS level.

SRV Records

_http._tcp.my-svc.ns.svc.cluster.local returns an SRV record with target my-svc.ns.svc.cluster.local and port 80, where http is the name of the Service port. Useful for clients that must discover the port as well as the address.

# Pod /etc/resolv.conf
nameserver  10.96.0.10
search  default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

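# CoreDNS ConfigMap (simplified Corefile)
.:53 {
  kubernetes cluster.local in-addr.arpa ip6.arpa   # answer cluster names from API data
  forward . /etc/resolv.conf                       # send all other queries upstream
  cache 30
}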
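The search list and ndots:5 in the resolv.conf above determine which names a pod's resolver actually sends to CoreDNS. The sketch below models the usual stub-resolver rule: names with fewer dots than ndots are tried against each search domain first, and names with a trailing dot skip the search list. The function candidate_queries is hypothetical and ignores details such as retries and timeouts.

```python
def candidate_queries(name, search, ndots=5):
    """Order in which a stub resolver would try names under resolv.conf search rules."""
    if name.endswith("."):
        return [name]  # already fully qualified: the search list is skipped
    via_search = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + via_search
    return via_search + [absolute]

search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

# A short service name resolves on the first attempt
print(candidate_queries("my-svc", search))

# An external name has only one dot, so three cluster-local lookups fail first
print(candidate_queries("example.com", search))

# A trailing dot avoids the extra lookups entirely
print(candidate_queries("example.com.", search))
```

This is why external lookups from pods can generate several failed queries before the real one, and why appending a trailing dot is a common workaround for latency-sensitive clients.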

11 Ingress

Ingress & HTTP Routing

Ingress is the Kubernetes resource for external HTTP/HTTPS access to Services. The Ingress Controller (nginx, contour, traefik) implements the routing rules defined in Ingress resources.

External Client → Ingress Controller (nginx) → Service A (10.96.0.100) → Pod A (10.244.0.10)

Path-Based Routing

Host Header	Path (Prefix)	Service	Port
api.example.com	/api	api-svc	8080
api.example.com	/static	static-svc	8081
api.example.com	/	default-svc	8082
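Path-based routing can be sketched as a longest-prefix match over path elements. The example below assumes the table's routes are written as pathType: Prefix rules for /api, /static, and /; the functions prefix_matches and route are hypothetical and leave out host matching and the other path types.

```python
def prefix_matches(rule_path, request_path):
    """Element-wise prefix match, in the spirit of pathType: Prefix."""
    rule_parts = [p for p in rule_path.split("/") if p]
    request_parts = [p for p in request_path.split("/") if p]
    return request_parts[:len(rule_parts)] == rule_parts

def route(rules, request_path):
    """Pick the backend whose matching rule path is longest."""
    matches = [(path, backend) for path, backend in rules
               if prefix_matches(path, request_path)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))[1]

rules = [
    ("/api", "api-svc:8080"),
    ("/static", "static-svc:8081"),
    ("/", "default-svc:8082"),
]
print(route(rules, "/api/v1/users"))    # api-svc:8080
print(route(rules, "/static/app.css"))  # static-svc:8081
print(route(rules, "/apiv1"))           # default-svc:8082, since "apiv1" is not the element "api"
```

Matching on whole path elements rather than raw string prefixes is what keeps /apiv1 from being routed to the /api backend.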

Host-Based Routing

api.example.com → Backend: api-svc:8080

dashboard.example.com → Backend: dashboard-svc:8080

grafana.example.com → Backend: monitoring-svc:3000

Ingress Controller vs Ingress Resource

The Ingress Controller is the actual HTTP proxy (nginx, contour, traefik). The Ingress resource is the Kubernetes configuration object that describes routing rules.

TLS Termination

HTTPS traffic is terminated at the Ingress controller. The controller holds the TLS certificate and decrypts traffic before forwarding to backend services over plain HTTP.

# Ingress resource (nginx)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-svc
            port:
              number: 8080