Add Fletch framework implementation for HttpArena (H1 + mixed + async-db)#260
MDA2AV merged 8 commits into MDA2AV:main from
Conversation
/validate

✅ Validation passed — full log
BennyFranciscus
left a comment
Hey @kartikey321, nice first submission! Fletch looks interesting — cool to see more Dart entries. CI passing 29/29 is a great start.
Two things that need fixing before this can be benchmarked:
1. JSON endpoint pre-computes the response
```dart
final jsonResponseBytes = _buildJsonResponseBytes();
// ...
app.get("/json", (req, res) {
  res.bytes(jsonResponseBytes, contentType: "application/json");
});
```

The `total` field (`price * quantity`) must be computed per-request, not cached at startup. The benchmark validates this by checking computation correctness with varying inputs. Pre-computing once and serving the same bytes every time bypasses the actual work the test is measuring.
Fix: Move the _buildJsonResponseBytes() logic into the request handler so _mapItem runs on every request.
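Assuming the Fletch handler API quoted above (and that dart:convert is imported), the fix might look roughly like this — `_items` and `_mapItem` are the PR's helpers, and their exact shapes are assumptions:

```dart
// Sketch: build the JSON body inside the handler so _mapItem runs
// (and total = price * quantity is recomputed) on every request.
app.get("/json", (req, res) {
  final payload = _items.map(_mapItem).toList();
  res.bytes(utf8.encode(jsonEncode(payload)), contentType: "application/json");
});
```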
2. Compression endpoint pre-compresses at startup
```dart
final gzipResponseBytes = _buildGzipResponseBytes();
// ...
app.get("/compression", (req, res) {
  res.bytes(gzipResponseBytes, contentType: "application/json");
  res.setHeader("Content-Encoding", "gzip");
});
```

Gzip compression must happen per-request — pre-compressing the response at startup is not allowed. This is an explicit rule from the repo maintainer.
Fix: Read the large dataset once at startup (that's fine), but do jsonEncode + GZipCodec.encode inside the request handler.
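A sketch under the same assumptions — `largeDataset` stands in for the PR's startup-loaded data (name assumed), and GZipCodec comes from dart:io:

```dart
// Sketch: dataset is loaded once at startup, but jsonEncode + gzip run per request.
final gzip = GZipCodec();
app.get("/compression", (req, res) {
  final compressed = gzip.encode(utf8.encode(jsonEncode(largeDataset)));
  res.setHeader("Content-Encoding", "gzip");
  res.bytes(compressed, contentType: "application/json");
});
```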
Everything else looks solid — the multi-isolate setup with shared: true, SQLite prepared statements, Postgres pool with lazy retry, and the upload Content-Length shortcut are all good patterns. Just need those two fixes and you're good to go! 🚀
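For context, the Content-Length upload shortcut mentioned here might look roughly like this — a hypothetical illustration, with Fletch's request API shape (including the `bodyStream` name) assumed:

```dart
// Sketch: trust the Content-Length header instead of buffering the body.
app.post("/upload", (req, res) async {
  final declared = req.headers.value("content-length"); // API shape assumed
  if (declared != null) {
    res.text("received $declared bytes"); // body never read
  } else {
    var total = 0;
    await for (final chunk in req.bodyStream) { // name assumed
      total += chunk.length;
    }
    res.text("received $total bytes");
  }
});
```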
@BennyFranciscus got it, thank you for the heads up — I have changed what you asked for.

/validate

✅ Validation passed — full log

/benchmark

🚀 Benchmark run triggered for
BennyFranciscus
left a comment
Looks great @kartikey321 — both fixes are clean. JSON and compression now compute per-request, exactly as needed. Validation is green (29/29) and I've kicked off the benchmark run. We'll have numbers soon!
Quick notes on the implementation — nice touches:
- Multi-isolate with `shared: true` is the right call for Dart
- Using Content-Length for uploads instead of buffering is a smart shortcut
- The lazy Postgres pool with retry is solid defensive code
Excited to see how Fletch stacks up. Will post results once the run finishes 🏎️
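The lazy-pool-with-retry pattern praised above can be sketched generically — a hypothetical Dart illustration, not Fletch's or the PR's actual code (no package:postgres API is assumed):

```dart
// Hypothetical lazy connection pool: the first request triggers the open;
// if opening fails, the cached future is cleared so a later request retries
// instead of the pool staying poisoned forever.
class LazyPool<T> {
  LazyPool(this._open);
  final Future<T> Function() _open;
  Future<T>? _pending;

  Future<T> acquire() {
    return _pending ??= _open().catchError((Object e, StackTrace st) {
      _pending = null; // next caller retries the connect
      return Future<T>.error(e, st);
    });
  }
}
```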
Looks good @kartikey321 — the per-request JSON computation and per-request gzip are both correct now. 👍 Benchmark is running, let's see how Fletch does!
Hey @kartikey321 welcome to the arena!

Benchmark Results: comparison with main — no results found (full log)
Looks solid, but CPU usage is quite low — ~1000%, so only 10 of the 128 CPU threads are in use.
Yeah, good catch — Dart's runtime is isolate-based, so it probably defaults to a handful of isolates rather than spreading across all 128 threads. @kartikey321 you might want to look into spawning more isolates to match the available cores. Something like …
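The snippet was cut off in this log; it probably sketched something like the plain dart:io pattern below (an illustration only, not Fletch's code):

```dart
import 'dart:io';
import 'dart:isolate';

// Sketch: spawn one worker isolate per visible core; each binds the same
// port with shared: true so the kernel distributes accepted connections.
Future<void> main() async {
  final workers = Platform.numberOfProcessors;
  for (var i = 1; i < workers; i++) {
    await Isolate.spawn(_serve, null);
  }
  await _serve(null); // the main isolate serves too
}

Future<void> _serve(void _) async {
  final server =
      await HttpServer.bind(InternetAddress.anyIPv4, 8080, shared: true);
  await for (final req in server) {
    req.response
      ..write('ok')
      ..close();
  }
}
```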
@BennyFranciscus @MDA2AV Fletch already scales worker count by CPU visibility in the container, so the isolate count is not hardcoded to a small number. Given the benchmark logs, CPU around ~1200–1400% could also be due to the loadgen thread count. Could we confirm runner/container CPU visibility for this run?
If helpful, I can also test a rerun with a higher loadgen thread count.
Oh nice, you're right — my bad for assuming it wasn't scaling. If it's already doing that, the isolate count isn't the problem. Good call on checking nproc inside the container — that's probably the key data point here. The baseline test runs with a CPU limit; @MDA2AV would be able to confirm the container CPU visibility. Either way the numbers look clean — zero 5xx across the board, which is always nice to see for a first submission.

The baseline test has cpu=64; the other tests are not restricted, and the container has access to the whole CPU. So for baseline the container sees 64 CPUs, and for the other tests it sees 128.
Sorry for the commit confusion. I had added a change using environment variables and pushed before validating it properly. I've reverted that commit. I'm investigating the CPU utilization issue on a VPS now to isolate whether this is runtime behavior or environment related. I'll post updates with findings and any validated fix.
Ah, that settles it then — if other frameworks are hitting 3.5M at 6400% CPU on baseline, the headroom is definitely there and it's not a loadgen bottleneck. So Dart is spawning the isolates, but they're just not saturating the cores. Could be a bottleneck in the listener/accept pattern — if the main isolate is doing all the accept() work and dispatching to workers, that single accept loop can become the ceiling. Some Dart servers bind with shared: true in every isolate instead. @kartikey321 it might be worth checking how Fletch distributes incoming connections across isolates. If it's a single-accept-then-dispatch model, switching to shared binding could unlock a lot more throughput on a 64+ core machine like this.

If you can't find anything, I can just run your app on the server without a container; that can settle any doubts.
Looks good to me. Welcome to the game 😉 @kartikey321 If you want, we can merge a first round so you get a performance diff from the benchmarking platform on subsequent PRs — it is not necessary to create a perfect one in the first place. @MDA2AV Regarding the

Platform.numberOfProcessors reads the host's total CPU count and ignores the --cpus=N Docker limit. The same pattern is used by node/express/fastify/bun/elysia/workerman.
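For reference, the cgroup v2 quota that container-aware tooling consults can be derived by hand. A sketch — the file path is the standard cgroup v2 location, and the contents shown are an assumption for a `--cpus=64` limit:

```shell
# Under --cpus=64, cgroup v2 exposes "6400000 100000" (quota period, in
# microseconds) via /sys/fs/cgroup/cpu.max; effective CPUs = quota / period.
cpu_max="6400000 100000"   # assumed contents of /sys/fs/cgroup/cpu.max
quota=${cpu_max% *}
period=${cpu_max#* }
echo $((quota / period))   # 64
```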
/benchmark

🚀 Benchmark run triggered for

Benchmark Results: comparison with main — no results found (full log)
Numbers are pretty consistent with the last run — baseline hovering around 175–180K, CPU still sitting at ~1360%. So the isolate scaling thing is definitely a Dart runtime characteristic rather than run-to-run variance.

+1 on @Kaliumhexacyanoferrat's suggestion to merge this and iterate. Having the baseline on main means you'll get automatic diffs on follow-up PRs, which makes it way easier to experiment with different accept patterns or isolate strategies and see what actually moves the needle.

Re: the Content-Length shortcut question — that's a fair point. If the upload test is meant to measure body-read throughput, skipping the read via Content-Length kind of defeats the purpose. But if the direction is toward "advance without reading when possible", then it's more of a framework-level optimization that should be allowed. It probably needs a clear call from @MDA2AV on the intent of the test so all implementations play by the same rules.
Performance fix: isolate count now respects the --cpus cgroup quota

Identified and fixed a core scaling issue: Platform.numberOfProcessors in Dart calls sysconf(_SC_NPROCESSORS_ONLN), which reads the host's total CPU count, completely ignoring Docker's --cpus=N. On a 128-CPU runner with --cpus=64 (as the baseline profile sets), the server was spawning 128 isolates competing for 64 CPUs — each isolate getting half the connections it needed to stay busy.

The fix follows the same pattern used by every other framework in this benchmark that explicitly manages worker count: node / express / fastify / koa / hono use os.availableParallelism(). Added an entrypoint.sh that passes $(nproc) as a CLI argument to the server binary; nproc on Linux reads the cgroup CPU quota (/sys/fs/cgroup/cpu.max), returning 64 inside --cpus=64.

And somehow it's not improved. I didn't get the time to debug it on bare metal. Regarding the Content-Length shortcut, please do notify me if it's not up to the mark and I will change it. Yeah, I agree that we can merge it. When I get the time I will set up a profiler on a Linux machine to debug it, alongside native dart:io, to see what the issue is. I feel that Dart maps isolates to raw CPU cores, but as of now I can't tell what the issue is.
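The entrypoint pattern described above can be sketched like this — the flag shape and function name are assumptions, not the PR's actual code:

```dart
import 'dart:io';

// Sketch: take the worker count from argv (supplied by an entrypoint.sh
// running e.g. `exec ./server "$(nproc)"`), falling back to
// Platform.numberOfProcessors, which ignores the --cpus cgroup limit.
int workerCount(List<String> args) {
  final fromArgs = args.isNotEmpty ? int.tryParse(args.first) : null;
  return (fromArgs != null && fromArgs > 0)
      ? fromArgs
      : Platform.numberOfProcessors;
}
```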
That's a really solid detective job tracing it to sysconf(_SC_NPROCESSORS_ONLN).

The fact that it didn't improve is actually the interesting part, though. If you're now spawning 64 isolates on 64 CPUs and still only hitting ~1360% CPU, the bottleneck isn't isolate count — it's something upstream preventing those isolates from saturating their cores. A few things are worth investigating.

The profiler idea is probably the fastest path to an answer — specifically tracking where each isolate spends its time (accept vs parse vs handler vs write). That'll tell you immediately if it's contention or per-isolate throughput. Re: Content-Length — no worries, let's wait for the call on that and iterate. Want me to kick off another benchmark run after you push changes, or should we merge as-is and iterate from main?
/benchmark --save

(running this to persist results)

🚀 Benchmark run triggered for
This PR adds a new fletch framework entry under frameworks/fletch with a Dockerized Dart server implementation for HttpArena profiles.

What's included
- … unavailability can recover on subsequent requests.
- … (Dart SDK ^3.2.0).
- … required by the Dart sqlite FFI.