Table of contents
- When a Single Server Isn’t Enough
- Proxies — Something Sitting in the Middle
- L4 vs. L7 — Which Layer Does It Inspect?
- Distribution Algorithms — Who Gets the Next Request?
- Session Affinity — Same User, Same Server
- Health Checks — Dead Servers Must Be Removed
- TLS Termination — Where to Strip the Encryption
- CDN — The Load Balancer’s Distant Relative
- Real-World Products — What Goes Where
- Wrapping Up the Series — And What Comes Next
When a Single Server Isn’t Enough
Let’s briefly recap the journey from Part 1 to here. IP and ports determine where to talk, TCP/UDP set the mode of conversation, DNS translates names to addresses, HTTP carries the request, and TLS wraps it securely. Everything up to this point assumed a one-to-one relationship between client and server.
But when traffic grows, a single server can’t keep up. There are two options: scaling up by beefing up the hardware, or scaling out by running multiple machines with the same function. Scaling up has limits and is expensive. So most web services choose scale-out — spinning up multiple servers and placing a device that distributes traffic in front of them.
That device is the load balancer. And alongside it comes the proxy, which serves a similar but slightly different role. This post covers how these two concepts work, at which layers, and in what ways.
Proxies — Something Sitting in the Middle
Let’s first clarify what a proxy is. A proxy is an intermediary device that sits between client and server, relaying requests and responses. Its name varies depending on where it sits.
```mermaid
flowchart LR
    subgraph FW["Forward Proxy (client-side)"]
        C1[Employee PC] --> FP[Corporate proxy]
        FP --> INT1[Internet servers]
    end
    subgraph RV["Reverse Proxy (server-side)"]
        INT2[External users] --> RP[Reverse proxy]
        RP --> B1[Internal server A]
        RP --> B2[Internal server B]
    end
```
The difference in one sentence: a forward proxy acts on behalf of the client; a reverse proxy acts on behalf of the server.
- Forward proxy: Funnels outbound requests from a corporate network through a single point. Used for internet access control, caching, and audit logging. The client knows it’s using a proxy — you configure the proxy address in the browser
- Reverse proxy: Sits in front of servers and receives external requests. The client typically doesn’t know the proxy exists — they just connect to a single domain. The dozens of backends behind it are hidden by the reverse proxy
A load balancer is a type of reverse proxy. That’s why tools like nginx and HAProxy are simultaneously called reverse proxies and load balancers. In this post, “proxy” generally refers to a reverse proxy.
L4 vs. L7 — Which Layer Does It Inspect?
The most important distinction for load balancers is which OSI layer the decision is made at. OSI divides network functions into 7 layers, as briefly covered in Part 1. In the load balancer context, you only need to remember L4 (Transport layer — TCP/UDP) and L7 (Application layer — HTTP).
```mermaid
flowchart TB
    subgraph L4["L4 Load Balancer (TCP/UDP)"]
        L4IN[Packet] -->|Looks at IP + port only| L4DEC[Distribution decision]
        L4DEC --> L4B1[Backend]
    end
    subgraph L7["L7 Load Balancer (HTTP)"]
        L7IN[HTTP request] -->|Host, path, headers, cookies| L7DEC[Distribution decision]
        L7DEC --> L7B1[Backend]
    end
```
An L4 load balancer distributes packets based on IP and port only. It doesn’t care about HTTP headers or URL paths. The advantage is speed and simplicity — processing at microsecond levels per connection, protocol-agnostic as long as it’s TCP-based. MySQL connections, Redis connections, gRPC, and even custom game server protocols can all be balanced at L4.
An L7 load balancer parses the HTTP message. It examines the Host header, URL path, cookies, and HTTP method. This enables sophisticated routing like “/api goes to backend A, /static goes to backend B” or “route api.example.com and admin.example.com differently.” TLS termination from Part 6 is also L7 territory — you have to decrypt the ciphertext before you can see the HTTP headers.
| Attribute | L4 | L7 |
|---|---|---|
| Information inspected | IP, port | HTTP host, path, headers, cookies |
| Protocols | TCP/UDP broadly | HTTP(S) primarily |
| TLS termination | Typically no (passthrough) | Yes |
| Speed | Very fast | Slower (parsing overhead) |
| Capabilities | Simple distribution | Path-based routing, header modification, authentication |
In practice, both layers are often combined. An L4 front line forwards all incoming traffic to a cluster of L7 load balancers, and the L7 layer handles HTTP specifics. Google’s Maglev and the AWS NLB + ALB combo follow this pattern.
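The L7 decision described above boils down to a function of the request's metadata. Here is a minimal Python sketch of path- and host-based routing; the pool names and rules are hypothetical, chosen to mirror the "/api goes to backend A" example:

```python
# L7 routing sketch: pick a backend pool from the Host header and URL path.
# Pool names and rules are illustrative, not from any real product.
def route(host: str, path: str) -> str:
    if host == "admin.example.com":     # host-based rule
        return "admin-pool"
    if path.startswith("/api"):         # path-based rules
        return "api-pool"
    if path.startswith("/static"):
        return "static-pool"
    return "default-pool"               # fallback pool
```

An L4 balancer could never make these distinctions, because the path and Host header live inside the HTTP payload it never parses.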
Distribution Algorithms — Who Gets the Next Request?
The rule for choosing which server receives the next request among multiple backends is the distribution algorithm. Here are four commonly used ones.
Round-Robin
As the name suggests, requests are distributed sequentially in rotation. The simplest approach. With three backends: 1 → 2 → 3 → 1 → 2 → 3.
```mermaid
flowchart LR
    R[Incoming requests<br/>1,2,3,4,5,6,7,8] --> LB[LB: round-robin]
    LB -->|1,4,7| A[Server A]
    LB -->|2,5,8| B[Server B]
    LB -->|3,6| C[Server C]
```
Works well when backends have equal capacity and load. When request processing times vary wildly, some servers idle while others are swamped.
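Round-robin is little more than an endless loop over the backend list. A minimal Python sketch, with hypothetical server names matching the diagram:

```python
from itertools import cycle

# Round-robin sketch: an endless iterator over the backend list.
backends = ["A", "B", "C"]          # hypothetical server names
next_backend = cycle(backends)

# eight requests land on A, B, C, A, B, C, A, B -- matching the diagram
order = [next(next_backend) for _ in range(8)]
```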
Weighted Round-Robin
Each server gets a weight. Useful when mixing servers of different specs or gradually shifting traffic to a new version (canary deployment). With weights A:3, B:2, C:1, out of every six requests three go to A, two to B, and one to C.
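The simplest way to picture weighted round-robin is repetition: a weight-3 server appears three times in the rotation. A sketch using the A:3, B:2, C:1 example (note that production implementations such as nginx's smooth weighted round-robin interleave the picks instead of grouping them):

```python
# Weighted round-robin sketch: expand each server by its weight.
weights = {"A": 3, "B": 2, "C": 1}   # hypothetical weights from the example
rotation = [server for server, w in weights.items() for _ in range(w)]
# one full cycle of six requests: three to A, two to B, one to C
```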
Least Connections
Sends the next request to the server with the fewest active connections. For workloads where request processing times vary greatly (WebSocket, streaming, long analytics queries), this is much fairer than round-robin. It requires maintaining connection counters, introducing a bit of state, but the cost is minimal.
Source IP Hash (ip-hash)
Hashes the client IP to determine which server to route to. Requests from the same IP always go to the same server. This forms the basis for the next topic: session affinity.
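A sketch of the hashing step. A cryptographic hash keeps the mapping uniform for illustration; real balancers typically use faster non-cryptographic hashes:

```python
import hashlib

# ip-hash sketch: a stable hash of the client IP, modulo the pool size.
backends = ["A", "B", "C"]           # hypothetical server names

def pick(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]

# the same IP always maps to the same server
assert pick("203.0.113.7") == pick("203.0.113.7")
```

Note the flip side: the mapping changes when the pool size changes, which is why consistent hashing exists for larger setups.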
Session Affinity — Same User, Same Server
HTTP is fundamentally stateless. But in legacy apps that store login sessions in server memory, if the same user’s requests don’t go to the same server, the session breaks. This is solved by session affinity (sticky sessions).
Two implementations are common.
- IP-based: Hash the client IP and pin to the same server. If many people are behind a single NAT, they all pile onto one server
- Cookie-based: The load balancer attaches a special cookie to the response. On the next request, it reads that cookie and routes to the same server. Avoids the IP method’s skew and is more accurate
```mermaid
sequenceDiagram
    participant U as User
    participant LB as Load Balancer
    participant B1 as Backend 1
    U->>LB: First request
    LB->>B1: Selected by round-robin
    B1-->>LB: Response
    LB-->>U: Set-Cookie: LB_ROUTE=B1
    U->>LB: Second request (Cookie: LB_ROUTE=B1)
    LB->>B1: Cookie parsed → same server
    B1-->>U: Response with session maintained
```
Session affinity is useful but fundamentally a workaround. Keeping state on servers is the enemy of scale-out. Adding a new server doesn’t migrate existing sessions, and if a server dies, every user with affinity to that server is logged out. The modern preference is to move sessions into an external store like Redis, making servers fully stateless — then any server can produce the same response for any user.
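The externalized-session idea can be shown in a few lines. In this sketch a plain dict stands in for Redis; the point is that session state lives outside the servers, so any backend can serve any request:

```python
# Externalized-session sketch: a dict stands in for Redis.
# State lives outside the servers, so no affinity is needed.
session_store = {}   # in production: Redis or a similar shared store

def handle_request(server: str, session_id: str) -> dict:
    # every backend reads and writes the same shared session object
    session = session_store.setdefault(session_id, {})
    session["last_server"] = server
    return session

s1 = handle_request("backend-1", "abc123")
s2 = handle_request("backend-2", "abc123")   # different server, same session
```

Because both calls resolve to the same stored object, losing `backend-1` costs the user nothing.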
Health Checks — Dead Servers Must Be Removed
If the load balancer doesn’t know some backends are dead or misbehaving, traffic keeps flowing to them and 5xx errors pour out. So load balancers periodically send health checks to backends.
```nginx
# Conceptual health check configuration (nginx upstream).
# These are passive checks: a server is marked down after 3 failed
# requests (max_fails) and retried after 30 seconds (fail_timeout).
upstream backend {
    server 10.0.1.10:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8080 max_fails=3 fail_timeout=30s;
}
```
- L4 health check: Only checks if a TCP connection can be established. Fast, but misses “the app is dead but the port is open” situations
- L7 health check: Sends an HTTP request to a specific URL and expects a 200 response. If the `/health` endpoint is built to also verify the DB connection, it can judge whether "the app is truly alive"
Kubernetes’ readiness and liveness probes are the same concept. Regardless of the layer, the core idea is periodically asking “are you ready to receive traffic?”
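The marking-down logic itself is simple. A sketch mirroring the max_fails idea above, where `probe` stands in for any check returning an HTTP status code (a real one would GET `/health`):

```python
# Health check sketch: a backend survives if any probe in the window
# succeeds; max_fails consecutive failures mark it as down.
def check(probe, max_fails: int = 3) -> str:
    for _ in range(max_fails):
        if probe() == 200:       # any success within the window -> healthy
            return "healthy"
    return "down"                # max_fails consecutive failures -> removed

# `probe` is a stand-in: in practice it performs an HTTP GET.
assert check(lambda: 200) == "healthy"
assert check(lambda: 500) == "down"
```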
TLS Termination — Where to Strip the Encryption
The TLS handshake from Part 6 has a notable CPU cost. If dozens of backend servers each handle TLS independently, resources are redundantly consumed. So in many environments, the load balancer decrypts TLS and sends plain HTTP internally. This point is called TLS termination.
```mermaid
flowchart LR
    CLIENT[Client] -->|HTTPS| LB["L7 LB<br/>TLS termination"]
    LB -->|HTTP| B1[Backend 1]
    LB -->|HTTP| B2[Backend 2]
    LB -->|HTTP| B3[Backend 3]
```
The advantages are clear: certificate management is centralized (easier to renew), backends don’t need to know about TLS, and HTTP header-based routing becomes naturally possible. The downside is that the segment between LB and backends is plaintext. In the zero trust era (a security model that says “don’t trust the internal network either”), the trend is to re-encrypt this segment with TLS as well (mTLS, mutual TLS). Service meshes like Istio automate this work.
CDN — The Load Balancer’s Distant Relative
A CDN (Content Delivery Network — a distributed network that serves content from servers physically close to the user) is similar to a load balancer but with a different flavor. While a load balancer distributes traffic within a single data center, a CDN distributes content across edge servers deployed worldwide.
Responding from the edge closest to the user drastically reduces latency and offloads the origin server. The general mechanism works like this.
```mermaid
flowchart TB
    U1[Seoul user] --> E1[Edge: Seoul]
    U2[London user] --> E2[Edge: London]
    U3[Sao Paulo user] --> E3[Edge: Sao Paulo]
    E1 --> O["Origin<br/>(US-East data center)"]
    E2 --> O
    E3 --> O
    E1 -.->|Cache hit skips origin call| U1
```
- DNS directs to the nearest edge: When a user looks up a domain, the CDN’s DNS returns the IP of the nearest edge based on the user’s IP. Low-level techniques like anycast are also used
- Cache decision at the edge: If the edge already has the requested resource (cache hit), it doesn’t go to the origin. On a miss, it requests from the origin, stores the response at the edge, and returns it
- Cache TTL: The server uses the `Cache-Control` header from Part 5 to dictate "how long this can be stored." Static assets get long TTLs; API responses get short ones or `no-store`
Today’s CDNs (Cloudflare, Fastly, CloudFront, etc.) go beyond simple caching to run code at the edge. Routing rules, A/B testing, WAF (Web Application Firewall — blocks web attacks), edge functions — a lot of work finishes at the edge before ever reaching the origin. The trend is that “the CDN is the application’s front door.”
Real-World Products — What Goes Where
Finally, let’s survey the products that carry weight in practice. Each has different strengths.
nginx
A long-established reverse proxy and web server. Available as a default package in most Linux distributions, with a low barrier to entry and declarative configuration syntax. It handles L7 routing, TLS termination, static file serving, and caching all in one. Its ingress-nginx variant is also the most widely used ingress controller in Kubernetes.
```nginx
# nginx: /api goes to the backend pool, everything else is static
upstream api_backend {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

server {
    listen 443 ssl;
    server_name example.com;
    ssl_certificate     /etc/nginx/tls/example.crt;
    ssl_certificate_key /etc/nginx/tls/example.key;

    location /api/ {
        proxy_pass http://api_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    location / {
        root /var/www/html;
    }
}
```
In this configuration, requests to /api/ are round-robined between the two backends. Passing the original client IP to the backend via the X-Forwarded-For header is standard reverse proxy etiquette.
HAProxy
As its name implies (High Availability Proxy), this product focuses on high-performance load balancing. Strong at both L4 and L7, with rich statistics and fine-grained tuning options. Frequently seen in telecom, financial, and game server environments handling extreme traffic.
Envoy
A modern proxy created at Lyft and now a graduated CNCF project. Built for cloud-native environments, it treats dynamic configuration and observability as first-class citizens. Configuration can be changed remotely via the xDS API, making it widely used as the data plane in service meshes (Istio, Consul Connect). Supports HTTP/2, HTTP/3, and gRPC out of the box.
Cloudflare
Less a product and more a global-scale CDN and security network. Users simply point their domain’s name servers to Cloudflare and immediately ride on that infrastructure. DDoS mitigation, WAF, caching, and edge computing (Workers) are all integrated in one platform. AWS CloudFront, Fastly, and Akamai are competitors in the same space.
The choice ultimately comes down to workload and operational capability. A single service may be perfectly served by one nginx instance, while others need the full stack from edge to mesh with Envoy + Cloudflare. Define the problem first, then pick the tool — not the other way around.
Wrapping Up the Series — And What Comes Next
The networking fundamentals series mapped out the following landscape: addresses are assigned with IP and ports, the conversation style is chosen with TCP and UDP, names are resolved with DNS, secure web requests are built with HTTP and TLS, and those requests are distributed across multiple servers with load balancers and proxies. Threading all seven parts together, what happens when you type a URL in the browser address bar becomes a single, coherent flow.
For those wanting to go deeper, the paths diverge into several directions.
- Network operations: Observing real packets with `tcpdump` and Wireshark, managing sockets and routing with `ss`, `netstat`, and `ip`. Kernel-level networking like `iptables`, `nftables`, and eBPF also belongs here
- Cloud networking: VPCs, subnets, routing tables, NAT Gateways, Transit Gateways, VPC peering, private links — the networking models of cloud providers
- Observability: Collecting and interpreting metrics and traces from load balancers and service meshes with Prometheus and OpenTelemetry
- Advanced security: mTLS, zero trust, workload identity with SPIFFE/SPIRE, WAF, and DDoS mitigation
Networking isn’t an end in itself but closer to a common language for understanding systems. Whether you build backends, do DevOps, or work in security, the vocabulary built here serves as a translator when reading material in other domains. If this series serves as the foundation for that translator, it has fulfilled its purpose.
The networking fundamentals series ends here.