Table of contents
- When a Single Server Isn’t Enough
- Proxies — Something Sitting in the Middle
- L4 vs. L7 — Which Layer Does It Inspect?
- Distribution Algorithms — Who Gets the Next Request?
- Session Affinity — Same User, Same Server
- Health Checks — Dead Servers Must Be Removed
- TLS Termination — Where to Strip the Encryption
- CDN — The Load Balancer’s Distant Relative
- Real-World Products — What Goes Where
- Wrapping Up the Series — And What Comes Next
When a Single Server Isn’t Enough
Let’s briefly recap the journey from Part 1 to here. IP and ports determine where to talk, TCP/UDP set the mode of conversation, DNS translates names to addresses, HTTP carries the request, and TLS wraps it securely. Everything up to this point assumed a one-to-one relationship between client and server.
But when traffic grows, a single server can’t keep up. There are two options: scaling up by beefing up the hardware, or scaling out by running multiple machines with the same function. Scaling up has limits and is expensive. So most web services choose scale-out — spinning up multiple servers and placing a device that distributes traffic in front of them.
That device is the load balancer. And alongside it comes the proxy, which serves a similar but slightly different role. This post covers how these two concepts work, at which layers, and in what ways.
Proxies — Something Sitting in the Middle
Let’s first clarify what a proxy is. A proxy is an intermediary device that sits between client and server, relaying requests and responses. Its name varies depending on where it sits.
```mermaid
flowchart LR
    subgraph FW["Forward Proxy (client-side)"]
        C1[Employee PC] --> FP[Corporate proxy]
        FP --> INT1[Internet servers]
    end
    subgraph RV["Reverse Proxy (server-side)"]
        INT2[External users] --> RP[Reverse proxy]
        RP --> B1[Internal server A]
        RP --> B2[Internal server B]
    end
```
The difference in one sentence: a forward proxy acts on behalf of the client; a reverse proxy acts on behalf of the server.
- Forward proxy: Funnels outbound requests from a corporate network through a single point. Used for internet access control, caching, and audit logging. The client knows it’s using a proxy — you configure the proxy address in the browser
- Reverse proxy: Sits in front of servers and receives external requests. The client typically doesn’t know the proxy exists — they just connect to a single domain. The dozens of backends behind it are hidden by the reverse proxy
A load balancer is a type of reverse proxy. That’s why tools like nginx and HAProxy are simultaneously called reverse proxies and load balancers. In this post, “proxy” generally refers to a reverse proxy.
L4 vs. L7 — Which Layer Does It Inspect?
The most important distinction for load balancers is which OSI layer the decision is made at. OSI divides network functions into 7 layers, as briefly covered in Part 1. In the load balancer context, you only need to remember L4 (Transport layer — TCP/UDP) and L7 (Application layer — HTTP).
```mermaid
flowchart TB
    subgraph L4["L4 Load Balancer (TCP/UDP)"]
        L4IN[Packet] -->|Looks at IP + port only| L4DEC[Distribution decision]
        L4DEC --> L4B1[Backend]
    end
    subgraph L7["L7 Load Balancer (HTTP)"]
        L7IN[HTTP request] -->|Host, path, headers, cookies| L7DEC[Distribution decision]
        L7DEC --> L7B1[Backend]
    end
```
An L4 load balancer distributes packets based on IP and port only. It doesn’t care about HTTP headers or URL paths. The advantage is speed and simplicity — processing at microsecond levels per connection, protocol-agnostic as long as it’s TCP-based. MySQL connections, Redis connections, gRPC, and even custom game server protocols can all be balanced at L4.
An L7 load balancer parses the HTTP message. It examines the Host header, URL path, cookies, and HTTP method. This enables sophisticated routing like “/api goes to backend A, /static goes to backend B” or “route api.example.com and admin.example.com differently.” TLS termination from Part 6 is also L7 territory — you have to decrypt the ciphertext before you can see the HTTP headers.
| Attribute | L4 | L7 |
|---|---|---|
| Information inspected | IP, port | HTTP host, path, headers, cookies |
| Protocols | TCP/UDP broadly | HTTP(S) primarily |
| TLS termination | Typically no (passthrough) | Yes |
| Speed | Very fast | Slower (parsing overhead) |
| Capabilities | Simple distribution | Path-based routing, header modification, authentication |
In practice, both layers are often combined. An L4 front line forwards all incoming traffic to a cluster of L7 load balancers, and the L7 layer handles HTTP specifics. Google’s Maglev and the AWS NLB + ALB combo follow this pattern.
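The L7 decision described above boils down to a function of the request's metadata. Here is a minimal Python sketch of path- and host-based routing; the pool names and rules are hypothetical, chosen to mirror the "/api goes to backend A" example:

```python
# L7 routing sketch: pick a backend pool from the Host header and URL path.
# Pool names and rules are illustrative, not from any real product.
def route(host: str, path: str) -> str:
    if host == "admin.example.com":     # host-based rule
        return "admin-pool"
    if path.startswith("/api"):         # path-based rules
        return "api-pool"
    if path.startswith("/static"):
        return "static-pool"
    return "default-pool"               # fallback pool
```

An L4 balancer could never make these distinctions, because the path and Host header live inside the HTTP payload it never parses.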
Distribution Algorithms — Who Gets the Next Request?
The rule for choosing which server receives the next request among multiple backends is the distribution algorithm. Here are four commonly used ones.
Round-Robin
As the name suggests, requests are distributed sequentially in rotation. The simplest approach. With three backends: 1 → 2 → 3 → 1 → 2 → 3.
```mermaid
flowchart LR
    R[Incoming requests<br/>1,2,3,4,5,6,7,8] --> LB[LB: round-robin]
    LB -->|1,4,7| A[Server A]
    LB -->|2,5,8| B[Server B]
    LB -->|3,6| C[Server C]
```
Works well when backends have equal capacity and load. When request processing times vary wildly, some servers idle while others are swamped.
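Round-robin is little more than an endless loop over the backend list. A minimal Python sketch, with hypothetical server names matching the diagram:

```python
from itertools import cycle

# Round-robin sketch: an endless iterator over the backend list.
backends = ["A", "B", "C"]          # hypothetical server names
next_backend = cycle(backends)

# eight requests land on A, B, C, A, B, C, A, B -- matching the diagram
order = [next(next_backend) for _ in range(8)]
```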
Weighted Round-Robin
Each server gets a weight. Useful when mixing servers of different specs or gradually shifting traffic to a new version (canary deployment). With weights A:3, B:2, C:1, out of every six requests three go to A, two to B, and one to C.
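The simplest way to picture weighted round-robin is repetition: a weight-3 server appears three times in the rotation. A sketch using the A:3, B:2, C:1 example (note that production implementations such as nginx's smooth weighted round-robin interleave the picks instead of grouping them):

```python
# Weighted round-robin sketch: expand each server by its weight.
weights = {"A": 3, "B": 2, "C": 1}   # hypothetical weights from the example
rotation = [server for server, w in weights.items() for _ in range(w)]
# one full cycle of six requests: three to A, two to B, one to C
```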
Least Connections
Sends the next request to the server with the fewest active connections. For workloads where request processing times vary greatly (WebSocket, streaming, long analytics queries), this is much fairer than round-robin. It requires maintaining connection counters, introducing a bit of state, but the cost is minimal.
Source IP Hash (ip-hash)
Hashes the client IP to determine which server to route to. Requests from the same IP always go to the same server. This forms the basis for the next topic: session affinity.
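A sketch of the hashing step. A cryptographic hash keeps the mapping uniform for illustration; real balancers typically use faster non-cryptographic hashes:

```python
import hashlib

# ip-hash sketch: a stable hash of the client IP, modulo the pool size.
backends = ["A", "B", "C"]           # hypothetical server names

def pick(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]

# the same IP always maps to the same server
assert pick("203.0.113.7") == pick("203.0.113.7")
```

Note the flip side: the mapping changes when the pool size changes, which is why consistent hashing exists for larger setups.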
Session Affinity — Same User, Same Server
HTTP is fundamentally stateless. But in legacy apps that store login sessions in server memory, if the same user’s requests don’t go to the same server, the session breaks. This is solved by session affinity (sticky sessions).
Two implementations are common.
- IP-based: Hash the client IP and pin to the same server. If many people are behind a single NAT, they all pile onto one server
- Cookie-based: The load balancer attaches a special cookie to the response. On the next request, it reads that cookie and routes to the same server. Avoids the IP method’s skew and is more accurate
```mermaid
sequenceDiagram
    participant U as User
    participant LB as Load Balancer
    participant B1 as Backend 1
    U->>LB: First request
    LB->>B1: Selected by round-robin
    B1-->>LB: Response
    LB-->>U: Set-Cookie: LB_ROUTE=B1
    U->>LB: Second request (Cookie: LB_ROUTE=B1)
    LB->>B1: Cookie parsed → same server
    B1-->>U: Response with session maintained
```
Session affinity is useful but fundamentally a workaround. Keeping state on servers is the enemy of scale-out. Adding a new server doesn’t migrate existing sessions, and if a server dies, every user with affinity to that server is logged out. The modern preference is to move sessions into an external store like Redis, making servers fully stateless — then any server can produce the same response for any user.
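The externalized-session idea can be shown in a few lines. In this sketch a plain dict stands in for Redis; the point is that session state lives outside the servers, so any backend can serve any request:

```python
# Externalized-session sketch: a dict stands in for Redis.
# State lives outside the servers, so no affinity is needed.
session_store = {}   # in production: Redis or a similar shared store

def handle_request(server: str, session_id: str) -> dict:
    # every backend reads and writes the same shared session object
    session = session_store.setdefault(session_id, {})
    session["last_server"] = server
    return session

s1 = handle_request("backend-1", "abc123")
s2 = handle_request("backend-2", "abc123")   # different server, same session
```

Because both calls resolve to the same stored object, losing `backend-1` costs the user nothing.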
Health Checks — Dead Servers Must Be Removed
If the load balancer doesn’t know some backends are dead or misbehaving, traffic keeps flowing to them and 5xx errors pour out. So load balancers periodically send health checks to backends.
```nginx
# Conceptual health check configuration (nginx upstream).
# These are passive checks: a server is marked down after 3 failed
# requests (max_fails) and retried after 30 seconds (fail_timeout).
upstream backend {
    server 10.0.1.10:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8080 max_fails=3 fail_timeout=30s;
}
```
- L4 health check: Only checks if a TCP connection can be established. Fast, but misses “the app is dead but the port is open” situations
- L7 health check: Sends an HTTP request to a specific URL and expects a 200 response. If the `/health` endpoint is built to also verify the DB connection, it can judge whether "the app is truly alive"
Kubernetes’ readiness and liveness probes are the same concept. Regardless of the layer, the core idea is periodically asking “are you ready to receive traffic?”
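The marking-down logic itself is simple. A sketch mirroring the max_fails idea above, where `probe` stands in for any check returning an HTTP status code (a real one would GET `/health`):

```python
# Health check sketch: a backend survives if any probe in the window
# succeeds; max_fails consecutive failures mark it as down.
def check(probe, max_fails: int = 3) -> str:
    for _ in range(max_fails):
        if probe() == 200:       # any success within the window -> healthy
            return "healthy"
    return "down"                # max_fails consecutive failures -> removed

# `probe` is a stand-in: in practice it performs an HTTP GET.
assert check(lambda: 200) == "healthy"
assert check(lambda: 500) == "down"
```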
TLS Termination — Where to Strip the Encryption
The TLS handshake from Part 6 has a notable CPU cost. If dozens of backend servers each handle TLS independently, resources are redundantly consumed. So in many environments, the load balancer decrypts TLS and sends plain HTTP internally. This point is called TLS termination.
```mermaid
flowchart LR
    CLIENT[Client] -->|HTTPS| LB["L7 LB<br/>TLS termination"]
    LB -->|HTTP| B1[Backend 1]
    LB -->|HTTP| B2[Backend 2]
    LB -->|HTTP| B3[Backend 3]
```
The advantages are clear: certificate management is centralized (easier to renew), backends don’t need to know about TLS, and HTTP header-based routing becomes naturally possible. The downside is that the segment between LB and backends is plaintext. In the zero trust era (a security model that says “don’t trust the internal network either”), the trend is to re-encrypt this segment with TLS as well (mTLS, mutual TLS). Service meshes like Istio automate this work.
CDN — The Load Balancer’s Distant Relative
A CDN (Content Delivery Network — a distributed network that serves content from servers physically close to the user) is similar to a load balancer but with a different flavor. While a load balancer distributes traffic within a single data center, a CDN distributes content across edge servers deployed worldwide.
Responding from the edge closest to the user drastically reduces latency and offloads the origin server. The general mechanism works like this.
```mermaid
flowchart TB
    U1[Seoul user] --> E1[Edge: Seoul]
    U2[London user] --> E2[Edge: London]
    U3[Sao Paulo user] --> E3[Edge: Sao Paulo]
    E1 --> O["Origin<br/>(US-East data center)"]
    E2 --> O
    E3 --> O
    E1 -.->|Cache hit skips origin call| U1
```
- DNS directs to the nearest edge: When a user looks up a domain, the CDN’s DNS returns the IP of the nearest edge based on the user’s IP. Low-level techniques like anycast are also used
- Cache decision at the edge: If the edge already has the requested resource (cache hit), it doesn’t go to the origin. On a miss, it requests from the origin, stores the response at the edge, and returns it
- Cache TTL: The server uses the `Cache-Control` header from Part 5 to dictate "how long this can be stored." Static assets get long TTLs; API responses get short ones or `no-store`
Today’s CDNs (Cloudflare, Fastly, CloudFront, etc.) go beyond simple caching to run code at the edge. Routing rules, A/B testing, WAF (Web Application Firewall — blocks web attacks), edge functions — a lot of work finishes at the edge before ever reaching the origin. The trend is that “the CDN is the application’s front door.”
Real-World Products — What Goes Where
Finally, let’s survey the products that carry weight in practice. Each has different strengths.
nginx
A long-established reverse proxy and web server. Available as a default package in most Linux distributions, with a low barrier to entry and declarative configuration syntax. It handles L7 routing, TLS termination, static file serving, and caching all in one. Its ingress-nginx variant is also the most widely used ingress controller in Kubernetes.
```nginx
# nginx: /api goes to the backend pool, everything else is static
upstream api_backend {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

server {
    listen 443 ssl;
    server_name example.com;
    ssl_certificate     /etc/nginx/tls/example.crt;
    ssl_certificate_key /etc/nginx/tls/example.key;

    location /api/ {
        proxy_pass http://api_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    location / {
        root /var/www/html;
    }
}
```
In this configuration, requests to /api/ are round-robined between the two backends. Passing the original client IP to the backend via the X-Forwarded-For header is standard reverse proxy etiquette.
HAProxy
As its name implies (High Availability Proxy), this product focuses on high-performance load balancing. Strong at both L4 and L7, with rich statistics and fine-grained tuning options. Frequently seen in telecom, financial, and game server environments handling extreme traffic.
Envoy
A modern proxy created at Lyft and now a graduated CNCF project. Built for cloud-native environments, it treats dynamic configuration and observability as first-class citizens. Configuration can be changed remotely via the xDS API, making it widely used as the data plane in service meshes (Istio, Consul Connect). Supports HTTP/2, HTTP/3, and gRPC out of the box.
Cloudflare
Less a product and more a global-scale CDN and security network. Users simply point their domain’s name servers to Cloudflare and immediately ride on that infrastructure. DDoS mitigation, WAF, caching, and edge computing (Workers) are all integrated in one platform. AWS CloudFront, Fastly, and Akamai are competitors in the same space.
The choice ultimately comes down to workload and operational capability. A single service may be perfectly served by one nginx instance, while others need the full stack from edge to mesh with Envoy + Cloudflare. Define the problem first, then pick the tool — not the other way around.
Wrapping Up the Series — And What Comes Next
The networking fundamentals series mapped out the following landscape: addresses are assigned with IP and ports, the conversation style is chosen with TCP and UDP, names are resolved with DNS, secure web requests are built with HTTP and TLS, and those requests are distributed across multiple servers with load balancers and proxies. Threading all seven parts together, what happens when you type a URL in the browser address bar becomes a single, coherent flow.
For those wanting to go deeper, the paths diverge into several directions.
- Network operations: Observing real packets with `tcpdump` and Wireshark, managing sockets and routing with `ss`, `netstat`, and `ip`. Kernel-level networking like `iptables`, `nftables`, and eBPF also belongs here
- Cloud networking: VPCs, subnets, routing tables, NAT Gateways, Transit Gateways, VPC peering, private links — the networking models of cloud providers
- Observability: Collecting and interpreting metrics and traces from load balancers and service meshes with Prometheus and OpenTelemetry
- Advanced security: mTLS, zero trust, workload identity with SPIFFE/SPIRE, WAF, and DDoS mitigation
Networking isn’t an end in itself but closer to a common language for understanding systems. Whether you build backends, do DevOps, or work in security, the vocabulary built here serves as a translator when reading material in other domains. If this series serves as the foundation for that translator, it has fulfilled its purpose.
The networking fundamentals series ends here.