Add Scotty: Haskell web framework on Warp (first Haskell entry!)#233

Open
BennyFranciscus wants to merge 3 commits into MDA2AV:main from BennyFranciscus:add-scotty

@BennyFranciscus
Collaborator

Scotty

Adds Scotty — a lightweight Haskell web framework inspired by Ruby's Sinatra, running on the Warp HTTP server.

This is the first Haskell entry in HttpArena! 🎉

Details

Language: Haskell (GHC 9.8)
Framework: Scotty 0.30
Engine: Warp 3.4
Type: Framework

Subscribed Tests

baseline, pipelined, noisy, limited-conn, json, upload, compression, mixed, async-db, static

Implementation Notes

  • Compiled with -O2 -threaded -rtsopts and runtime flags -N -A64m -I0 for maximum throughput
  • Dataset and large compression payload pre-loaded into memory at startup
  • Static files cached in a Map at startup with correct MIME types
  • Manual gzip/deflate compression using zlib (compression level 1 for speed)
  • SQLite via sqlite-simple, PostgreSQL via postgresql-simple (connection-per-request with bracket)
  • Multi-stage Docker build: haskell:9.8-slim builder → debian:bookworm-slim runtime
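
The connection-per-request pattern mentioned above can be sketched with `bracket` from `Control.Exception`. This is only an illustration under stated assumptions: `FakeConn`, `openConn`, `closeConn`, and `withConn` are hypothetical stand-ins and not the code in this PR, where the acquire and release steps would presumably be the connect and close calls of postgresql-simple or sqlite-simple.

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for a database connection; a real handler would
-- use the Connection type from postgresql-simple or sqlite-simple.
newtype FakeConn = FakeConn Int

-- Open a "connection" and bump the live-connection counter.
openConn :: IORef Int -> IO FakeConn
openConn live = do
  modifyIORef' live (+ 1)
  FakeConn <$> readIORef live

-- Close it again; bracket runs this even if the handler throws.
closeConn :: IORef Int -> FakeConn -> IO ()
closeConn live _ = modifyIORef' live (subtract 1)

-- Acquire a connection for one request, run the action, always release.
withConn :: IORef Int -> (FakeConn -> IO a) -> IO a
withConn live = bracket (openConn live) (closeConn live)

main :: IO ()
main = do
  live <- newIORef 0
  result <- withConn live (\(FakeConn n) -> pure (n * 10))
  remaining <- readIORef live
  print (result, remaining) -- prints (10,0): the connection was released
```

The trade-off is that every request pays the connection setup cost, in exchange for never leaking a connection when a handler fails.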

Validation

All 29 validation checks pass locally ✅

cc @scotty-web — would love to see Scotty's numbers on the leaderboard!

/validate

Scotty is a lightweight Haskell web framework inspired by Ruby's Sinatra,
built on top of the high-performance Warp HTTP server.

- Language: Haskell (GHC 9.8, compiled with -O2 -threaded)
- Engine: Warp
- Tests: baseline, pipelined, noisy, limited-conn, json, upload,
  compression, mixed, async-db, static
- All 29 validation checks pass

Implementation notes:
- Dataset and large payload pre-loaded into memory at startup
- Static files cached in a Map at startup with correct MIME types
- Manual gzip/deflate compression using zlib (level 1 for speed)
- SQLite via sqlite-simple, PostgreSQL via postgresql-simple
- Multi-stage Docker build with bookworm-slim runtime
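
As a rough illustration of the "static files cached in a Map" note, the sketch below builds a lookup table from path to MIME type and contents. The names `Cached`, `mimeFor`, and `buildCache` and the small extension table are assumptions made for the example; the PR's actual cache may be shaped differently.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Map.Strict as Map
import System.FilePath (takeExtension)

-- A cached file: its Content-Type plus the bytes to serve.
data Cached = Cached
  { cachedMime :: BC.ByteString
  , cachedBody :: BC.ByteString
  }

-- Hypothetical extension-to-MIME table; a real one would cover more types.
mimeFor :: FilePath -> BC.ByteString
mimeFor path = case takeExtension path of
  ".html" -> BC.pack "text/html"
  ".css"  -> BC.pack "text/css"
  ".js"   -> BC.pack "application/javascript"
  ".json" -> BC.pack "application/json"
  _       -> BC.pack "application/octet-stream"

-- Build the cache once from (path, contents) pairs, e.g. read at startup.
buildCache :: [(FilePath, BC.ByteString)] -> Map.Map FilePath Cached
buildCache files =
  Map.fromList [ (p, Cached (mimeFor p) body) | (p, body) <- files ]

main :: IO ()
main = do
  let cache = buildCache
        [ ("index.html", BC.pack "<h1>hi</h1>")
        , ("app.js", BC.pack "1")
        ]
  print (cachedMime <$> Map.lookup "index.html" cache)  -- Just "text/html"
  print (cachedMime <$> Map.lookup "missing.png" cache) -- Nothing
```

Resolving the MIME type once at startup keeps the per-request work down to a single Map lookup.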
@MDA2AV
Owner

MDA2AV commented Mar 28, 2026

/benchmark

@github-actions
Contributor

🚀 Benchmark run triggered for scotty (all profiles). Results will be posted here when done.

@github-actions
Contributor

Benchmark Results

Framework: scotty | Profile: all profiles

scotty / baseline / 512c (p=1, r=0, cpu=64)
  Best: 11479 req/s (CPU: 272.5%, Mem: 658.8MiB)

scotty / baseline / 4096c (p=1, r=0, cpu=64)
  Best: 12871 req/s (CPU: 262.0%, Mem: 2.6GiB)

scotty / baseline / 16384c (p=1, r=0, cpu=64)
  Best: 11927 req/s (CPU: 337.7%, Mem: 632.8MiB)

scotty / pipelined / 512c (p=16, r=0, cpu=unlimited)
  Best: 14055 req/s (CPU: 249.7%, Mem: 540.4MiB)

scotty / pipelined / 4096c (p=16, r=0, cpu=unlimited)
  Best: 15110 req/s (CPU: 273.7%, Mem: 2.3GiB)

scotty / pipelined / 16384c (p=16, r=0, cpu=unlimited)
  Best: 13811 req/s (CPU: 280.6%, Mem: 595.6MiB)

scotty / limited-conn / 512c (p=1, r=10, cpu=unlimited)
  Best: 11441 req/s (CPU: 265.1%, Mem: 1.1GiB)

scotty / limited-conn / 4096c (p=1, r=10, cpu=unlimited)
  Best: 11190 req/s (CPU: 309.2%, Mem: 2.0GiB)

scotty / json / 4096c (p=1, r=0, cpu=unlimited)
  Best: 14748 req/s (CPU: 286.7%, Mem: 2.2GiB)

scotty / json / 16384c (p=1, r=0, cpu=unlimited)
  Best: 13506 req/s (CPU: 304.1%, Mem: 2.6GiB)

scotty / upload / 64c (p=1, r=0, cpu=unlimited)
  Best: 51 req/s (CPU: 443.4%, Mem: 6.3GiB)

scotty / upload / 256c (p=1, r=0, cpu=unlimited)
  Best: 51 req/s (CPU: 448.8%, Mem: 6.7GiB)

scotty / upload / 512c (p=1, r=0, cpu=unlimited)
  Best: 0 req/s (CPU: 0%, Mem: 0MiB)
Full log
[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     64 (1/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    1.07s    1.07s    1.11s    1.15s    1.17s

  256 requests in 5.00s, 256 responses
  Throughput: 51 req/s
  Bandwidth:  8.10KB/s
  Status codes: 2xx=256, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 256 / 256 responses (100.0%)
  CPU: 523.5% | Mem: 11.8GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     64 (1/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    1.08s    1.09s    1.13s    1.15s    1.50s

  257 requests in 5.00s, 257 responses
  Throughput: 51 req/s
  Bandwidth:  8.13KB/s
  Status codes: 2xx=257, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 257 / 257 responses (100.0%)
  CPU: 507.9% | Mem: 20.6GiB

=== Best: 51 req/s (CPU: 443.4%, Mem: 6.3GiB) ===
  Input BW: 1020.00MB/s (avg template: 20971593 bytes)
[dry-run] Results not saved (use --save to persist)
httparena-bench-scotty
httparena-bench-scotty

==============================================
=== scotty / upload / 256c (p=1, r=0, cpu=unlimited) ===
==============================================
530e164a79e329b920f60c39e0b3963f259090c3d504692543d224b88060ecf3
[wait] Waiting for server...
[ready] Server is up

[run 1/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     256 (4/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    4.48s    4.49s    4.49s    4.50s    4.50s

  256 requests in 5.00s, 256 responses
  Throughput: 51 req/s
  Bandwidth:  8.10KB/s
  Status codes: 2xx=256, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 256 / 256 responses (100.0%)
  CPU: 448.8% | Mem: 6.7GiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     256 (4/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    4.51s    4.52s    4.53s    4.53s    4.53s

  256 requests in 5.00s, 256 responses
  Throughput: 51 req/s
  Bandwidth:  8.10KB/s
  Status codes: 2xx=256, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 256 / 256 responses (100.0%)
  CPU: 436.2% | Mem: 12.7GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     256 (4/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    4.56s    4.57s    4.57s    4.58s    4.58s

  256 requests in 5.00s, 256 responses
  Throughput: 51 req/s
  Bandwidth:  8.10KB/s
  Status codes: 2xx=256, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 256 / 256 responses (100.0%)
  CPU: 431.0% | Mem: 21.7GiB

=== Best: 51 req/s (CPU: 448.8%, Mem: 6.7GiB) ===
  Input BW: 1020.00MB/s (avg template: 20971593 bytes)
[dry-run] Results not saved (use --save to persist)
httparena-bench-scotty
httparena-bench-scotty

==============================================
=== scotty / upload / 512c (p=1, r=0, cpu=unlimited) ===
==============================================
86ab59f33af750929dde9b786754f4ae42b49c4d01ea3719f39f75b43bf82556
[wait] Waiting for server...
[ready] Server is up

[run 1/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     512 (8/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency      0us      0us      0us      0us      0us

  0 requests in 5.00s, 0 responses
  Throughput: 0 req/s
  Bandwidth:  0B/s
  Status codes: 2xx=0, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 0 / 0 responses (0.0%)
  CPU: 422.3% | Mem: 6.6GiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     512 (8/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency      0us      0us      0us      0us      0us

  0 requests in 5.00s, 0 responses
  Throughput: 0 req/s
  Bandwidth:  0B/s
  Status codes: 2xx=0, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 0 / 0 responses (0.0%)
  CPU: 429.8% | Mem: 13.7GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     512 (8/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency      0us      0us      0us      0us      0us

  0 requests in 5.00s, 0 responses
  Throughput: 0 req/s
  Bandwidth:  0B/s
  Status codes: 2xx=0, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 0 / 0 responses (0.0%)
  CPU: 430.2% | Mem: 24.0GiB

=== Best: 0 req/s (CPU: 0%, Mem: 0MiB) ===
httparena-bench-scotty
httparena-bench-scotty
httparena-postgres
httparena-postgres
[restore] Restoring CPU governor to performance...

@BennyFranciscus
Collaborator Author

Benchmark results are in — nice!

Baseline/JSON/Pipelined look solid: 11-15K req/s across concurrency levels. Not bad at all for Haskell, especially considering Scotty is a high-level framework on top of Warp.

Upload is the problem: 51 req/s at 64c/256c, then 0 req/s at 512c, with memory ballooning to 24 GiB. That pattern suggests the entire ~20 MB upload body is being buffered in memory before the handler processes it. At 512 concurrent connections that is roughly 10 GiB of request bodies in flight, which likely OOMs or stalls the GHC runtime.
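
The in-flight estimate can be checked with a few lines of arithmetic. The body size is the "avg template" value from the benchmark log; the assumption of exactly one fully buffered copy per connection is a simplification, so the figure is a floor rather than a prediction of the memory actually observed.

```haskell
-- Average upload body size reported by the load generator, in bytes.
bodyBytes :: Integer
bodyBytes = 20971593

-- Bytes held if every connection has one fully buffered body
-- (assumption: a single copy per connection, no GC overhead).
inFlightBytes :: Integer -> Integer
inFlightBytes conns = conns * bodyBytes

-- Convert bytes to GiB (2^30 bytes) for display.
toGiB :: Integer -> Double
toGiB b = fromIntegral b / 1073741824

main :: IO ()
main = mapM_ report [64, 256, 512]
  where
    report c =
      putStrLn (show c ++ " conns: " ++ show (toGiB (inFlightBytes c)) ++ " GiB")
```

At 512 connections this comes out at just over 10 GiB, before counting any intermediate copies or garbage the runtime has not yet collected.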

I'll look into whether Scotty/Wai supports lazy or streaming request body consumption — if we can process chunks incrementally instead of strictRequestBody, that should fix it. Might need getRequestBodyChunk from the Wai Request type.

…emory

Upload was using Scotty's 'body' (strictRequestBody), which reads the
entire request body into a lazy ByteString before processing. At 512
concurrent connections with 20MB payloads, this puts ~10GiB of request
bodies in memory at once and likely OOMs or stalls the GHC runtime
(0 req/s and 24GiB at 512c).

Switch to streaming via Wai's getRequestBodyChunk — reads and counts
bytes incrementally with constant memory overhead.
@BennyFranciscus
Collaborator Author

Pushed a fix — switched upload handler from body (which calls strictRequestBody and buffers everything in memory) to streaming via Wai's getRequestBodyChunk. Now reads and counts bytes incrementally with constant memory overhead.

This should fix the 0 req/s at 512c and the 24 GiB memory balloon.
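
A minimal sketch of the streaming approach, not the code pushed to this PR: the loop below drains a chunk source and keeps only a running total. It is written against a plain `IO ByteString` action so it runs without Wai installed; `chunkSource` is a hypothetical in-memory stand-in that follows the same convention as Wai's `getRequestBodyChunk`, where an empty chunk signals the end of the body.

```haskell
import qualified Data.ByteString as B
import Data.IORef (atomicModifyIORef', newIORef)

-- Drain a chunk source and return the total byte count. An empty chunk
-- means the body is finished. Each chunk can be collected once counted.
countBytes :: IO B.ByteString -> IO Int
countBytes nextChunk = go 0
  where
    go total = do
      chunk <- nextChunk
      if B.null chunk
        then pure total
        else go $! total + B.length chunk -- force the sum, avoid thunks

-- Hypothetical in-memory chunk source for the demo: yields each chunk
-- once, then empty chunks forever.
chunkSource :: [B.ByteString] -> IO (IO B.ByteString)
chunkSource chunks = do
  ref <- newIORef chunks
  pure $ atomicModifyIORef' ref $ \remaining ->
    case remaining of
      []       -> ([], B.empty)
      (c : cs) -> (cs, c)

main :: IO ()
main = do
  next <- chunkSource [B.replicate 4096 0, B.replicate 4096 0, B.replicate 100 0]
  total <- countBytes next
  print total -- prints 8292
```

In a Wai or Scotty handler the source would presumably be `getRequestBodyChunk req` for the current request, with the rest of the loop unchanged.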

@MDA2AV
Owner

MDA2AV commented Mar 28, 2026

very low cpu usage

@BennyFranciscus
Collaborator Author

Yeah, really bad CPU utilization: only ~270% out of the 6400% available. With -N (all cores) the GHC RTS creates 64 capabilities, but they seem to end up contending with each other instead of doing useful work. As far as I know, Warp doesn't scale linearly past ~4-8 cores on the traditional IO manager.

Going to try -N4 or -N8 in the RTS options — GHC typically hits peak throughput around that range for network-heavy workloads. Less scheduler overhead, less lock contention in the green thread runtime.

GHC's IO manager doesn't scale well past a few cores for network
workloads — -N (all 64 cores) was only hitting ~270% CPU due to
scheduler and lock contention. -N4 with -qn4 (4 parallel GC threads)
should give better throughput by reducing overhead.
@BennyFranciscus
Collaborator Author

Pushed a fix — limited RTS to -N4 -qn4 instead of -N (all cores). GHC's thread scheduler creates one capability per core with -N, but for IO-bound network work, all that buys you is contention. 4 capabilities with 4 GC threads should be the sweet spot.

If throughput is still low after re-benchmark, could try -N8 or experiment with a different IO manager (-with-rtsopts=--io-manager=native), though I still need to confirm that option applies on Linux with GHC 9.8.
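
To confirm that an RTS setting such as -N4 actually took effect inside the container, a small check along these lines could be compiled with -threaded and run with the same RTS options. `capabilityShare` is a hypothetical helper added for the example; `getNumCapabilities` and `getNumProcessors` come from `GHC.Conc` in base.

```haskell
import GHC.Conc (getNumCapabilities, getNumProcessors)

-- Fraction of the visible cores that the RTS can run Haskell threads on.
capabilityShare :: Int -> Int -> Double
capabilityShare caps cores = fromIntegral caps / fromIntegral cores

main :: IO ()
main = do
  caps  <- getNumCapabilities -- controlled by the -N RTS flag
  cores <- getNumProcessors   -- processors visible to this process
  putStrLn ("capabilities: " ++ show caps ++ ", processors: " ++ show cores)
  putStrLn ("share of cores: " ++ show (capabilityShare caps cores))
```

With -N4 on the 64-core benchmark host this would report 4 capabilities, a share of 0.0625, so the later CPU figures should be read against a ceiling of roughly 400% rather than 6400%.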

@BennyFranciscus
Collaborator Author

@MDA2AV Could we get a re-benchmark when you get a chance? The -N4 -qn4 RTS fix + streaming upload should improve CPU utilization significantly over the previous run.

@MDA2AV
Owner

MDA2AV commented Mar 29, 2026

/benchmark --save

@github-actions
Contributor

🚀 Benchmark run triggered for scotty (all tests) with --save. Results will be posted here when done.

2 participants