
🦀 RustyLB: High-Performance Layer 7 Load Balancer

Rust Docker Hyper 1.0

A production-grade, asynchronous Load Balancer built in Rust using Tokio and Hyper 1.0. Designed to demonstrate advanced SRE concepts including Consistent Hashing, Graceful Shutdown, Health-Based Failover, and Real-time Observability.

🚀 Key Features

  • ⚡ Hyper 1.0 & Tokio: Built on Rust’s modern async ecosystem (hyper, hyper-util, Tokio) for non-blocking I/O and high concurrency.
  • 🔄 Consistent Hashing: Uses a virtual-node ring algorithm to minimize cache miss impact during scaling events (unlike simple Round Robin).
  • 🛡️ Token Bucket Rate Limiter: Protects backends from DDoS-style floods and "noisy neighbor" clients by enforcing strict per-client-IP RPS limits.
  • 💓 Active Health Monitoring: Background task actively probes backend availability and automatically ejects unhealthy nodes in <3 seconds.
  • 🛑 Graceful Shutdown: Implements "Zero Downtime" deployments. Catches SIGINT, stops accepting new connections, and waits for active requests to drain before exiting.
  • 📊 Observability Stack: Native integration with Prometheus for metrics (Requests, Latency, Errors) and Grafana for visualization.
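The token bucket idea can be sketched in a few dozen lines of std-only Rust. This is an illustrative sketch of the technique, not the repository's actual implementation; the capacity and refill rate below (5 RPS, burst of 5) are made-up example values.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// A per-client token bucket: `capacity` is the burst size,
/// `refill_rate` is tokens added per second.
struct TokenBucket {
    tokens: f64,
    capacity: f64,
    refill_rate: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_rate: f64) -> Self {
        Self { tokens: capacity, capacity, refill_rate, last_refill: Instant::now() }
    }

    /// Returns true if the request is allowed, false if it should be dropped.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        // Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = (self.tokens + elapsed * self.refill_rate).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// One bucket per client IP, keyed by address string.
struct RateLimiter {
    buckets: HashMap<String, TokenBucket>,
}

impl RateLimiter {
    fn new() -> Self { Self { buckets: HashMap::new() } }

    fn allow(&mut self, client_ip: &str) -> bool {
        self.buckets
            .entry(client_ip.to_string())
            .or_insert_with(|| TokenBucket::new(5.0, 5.0)) // 5 RPS, burst of 5
            .try_acquire()
    }
}

fn main() {
    let mut limiter = RateLimiter::new();
    // A burst of 6 requests from one IP: the first 5 pass, the 6th is dropped.
    let results: Vec<bool> = (0..6).map(|_| limiter.allow("10.0.0.1")).collect();
    println!("{results:?}"); // [true, true, true, true, true, false]
}
```

Requests that fail `try_acquire` would be answered with `429 Too Many Requests` instead of being forwarded.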

🛠️ Architecture

```mermaid
graph LR
    Client[Client Traffic] -->|Port 3000| LB[RustyLB Load Balancer]

    subgraph "RustyLB Internals"
        LB --> RateLimit[Token Bucket Limiter]
        RateLimit --> HashRing[Consistent Hash Ring]
        HashRing -->|Select Node| Forward[Hyper 1.0 Proxy]
    end

    subgraph "Backend Services"
        Forward -->|HTTP| S1[Service 8081]
        Forward -->|HTTP| S2[Service 8082]
        Forward -->|HTTP| S3[Service 8083]
    end

    subgraph "Observability"
        Prometheus -->|Scrape /metrics| LB
        Grafana -->|Query| Prometheus
    end

    Health[Health Monitor] -.->|Probe| S1
    Health -.->|Probe| S2
    Health -.->|Probe| S3
```


Diagram renders natively on GitHub.

⚖️ Design Tradeoffs

  • Consistent Hashing vs. Round Robin: I chose Consistent Hashing to preserve cache locality. With Round Robin (or naive hash-modulo routing), the loss of one node remaps nearly every key to a different backend; Consistent Hashing remaps only ~1/N of the keys, preventing cache stampedes.
  • Passive Circuit Breaking: Instead of a complex state machine (Open/Half-Open), I implemented a passive system where the Health Monitor acts as the source of truth for the Hash Ring. This simplifies the logic while maintaining resilience.
  • Labeled Metrics: Instead of structured logging for every request (which is expensive), I used labeled Prometheus metrics to track latency and error rates per-backend in real-time.
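The virtual-node ring behind that first tradeoff can be sketched with a `BTreeMap` and the std hasher. This is a minimal illustration of the technique, not this repository's exact code; the node names, 100 vnodes, and 1000 test keys are example values.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(t: &T) -> u64 {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    h.finish()
}

/// Each backend is placed on the ring at `vnodes` points; a key is routed
/// to the first virtual node clockwise from its own hash.
struct HashRing {
    ring: BTreeMap<u64, String>,
    vnodes: usize,
}

impl HashRing {
    fn new(vnodes: usize) -> Self {
        Self { ring: BTreeMap::new(), vnodes }
    }

    fn add_node(&mut self, node: &str) {
        for i in 0..self.vnodes {
            self.ring.insert(hash_of(&format!("{node}#{i}")), node.to_string());
        }
    }

    fn remove_node(&mut self, node: &str) {
        for i in 0..self.vnodes {
            self.ring.remove(&hash_of(&format!("{node}#{i}")));
        }
    }

    /// First vnode at or after the key's hash, wrapping around the ring.
    fn get(&self, key: &str) -> Option<&String> {
        let h = hash_of(&key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, node)| node)
    }
}

fn main() {
    let mut ring = HashRing::new(100);
    for node in ["8081", "8082", "8083"] {
        ring.add_node(node);
    }
    let before: Vec<String> =
        (0..1000).map(|k| ring.get(&format!("key{k}")).unwrap().clone()).collect();

    // Remove one of three nodes: only the keys that mapped to it move.
    ring.remove_node("8082");
    let moved = (0..1000usize)
        .filter(|&k| ring.get(&format!("key{k}")).unwrap() != &before[k])
        .count();
    println!("{moved} of 1000 keys remapped"); // roughly a third; the rest stay put
}
```

In the real balancer, `remove_node` is what the health monitor calls when it ejects an unhealthy backend.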

📦 Quick Start

1. Prerequisites

  • Rust (Cargo)
  • Docker (for Grafana/Prometheus)

2. Run the Observability Stack

Start Prometheus and Grafana automatically:

```bash
docker-compose up -d
```

3. Run the Load Balancer

```bash
# Run with info logs enabled
RUST_LOG=info cargo run --bin lb
```

4. Test Traffic

You can use curl to send traffic. The LB listens on port 3000.

```bash
curl http://127.0.0.1:3000
```

🧪 Verification: The "Survival" Test

To prove the Graceful Shutdown capability (ensuring no users are disconnected during a deployment):

Note: The slow backend used in this test is an external test harness and is intentionally not part of this repository.

1. Start a Backend That Responds Slowly

(e.g., any HTTP server that sleeps for several seconds before responding)

2. Send a Request

```bash
curl http://127.0.0.1:3000
```

3. Kill the Load Balancer

Immediately hit Ctrl+C in the Rust terminal.

✅ Result

The load balancer will log:

```
🛑 Graceful shutdown...
```

It will wait for the in-flight request to finish, and the client will successfully receive:

```
I survived the shutdown! 🎉
```
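The drain pattern being verified here can be illustrated with std-only primitives: a shutdown flag, an in-flight counter, and a wait-until-zero loop. The real balancer does this with Tokio tasks and a SIGINT handler; this sketch simulates the slow request with a thread and a sleep, and the durations are arbitrary.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    let shutting_down = Arc::new(AtomicBool::new(false));
    let in_flight = Arc::new(AtomicUsize::new(0));

    // A slow request is accepted and counted before "SIGINT" arrives.
    in_flight.fetch_add(1, Ordering::SeqCst);
    let counter = Arc::clone(&in_flight);
    let worker = thread::spawn(move || {
        thread::sleep(Duration::from_millis(200)); // the slow backend
        counter.fetch_sub(1, Ordering::SeqCst);
        "I survived the shutdown! 🎉"
    });

    // "SIGINT": stop accepting new connections...
    shutting_down.store(true, Ordering::SeqCst);
    if shutting_down.load(Ordering::SeqCst) {
        println!("🛑 Graceful shutdown...");
    }

    // ...then wait for in-flight requests to drain before exiting.
    while in_flight.load(Ordering::SeqCst) > 0 {
        thread::sleep(Duration::from_millis(10));
    }
    println!("client received: {}", worker.join().unwrap());
}
```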

📊 Metrics (Prometheus)

The Load Balancer exposes standard SRE metrics at /metrics:

| Metric Name | Type | Description |
|---|---|---|
| `requests_total` | Counter | Total requests routed per backend |
| `requests_dropped_total` | Counter | Requests blocked by the rate limiter |
| `active_connections` | Gauge | Current in-flight requests |
| `request_duration_seconds` | Histogram | Request latency distribution (P50–P99 via PromQL) |

📌 Project Scope

This project focuses on infrastructure correctness and observability, not feature breadth.

Intentionally out of scope:

  • TLS termination
  • HTTP/2 / gRPC
  • Dynamic config reload
  • Full circuit breaker state machines

These tradeoffs keep the codebase small, auditable, and focused on SRE fundamentals.

About

A production-grade Layer 7 Load Balancer built in Rust (Hyper 1.0 + Tokio). Features Consistent Hashing, Passive Circuit Breaking, Token Bucket Rate Limiting, and Graceful Shutdown mechanisms. Verified with chaos testing.
