fix(http): unblock request reader after Zig 0.16 migration#10
Open
Kures wants to merge 2 commits into
Open
Conversation
Every HTTP request was timing out because readSliceShort blocks until the destination buffer is fully filled or the stream reaches EOF. The read chunk is 4096 bytes, but a typical request (e.g. GET /health) is ~84 bytes — so the reader sat waiting for 4 KB of input that never came. After the curl client gave up and FIN'd the connection, the reader would finally return; by then the response was useless. Switch to readVec, which performs a single recv per call and returns whatever the kernel had — typically the entire request in one shot. Effect: - Before: every endpoint hangs until client timeout, then ~5s after the request the response is sent into a closed socket. - After: /health and /metrics respond in <2 ms. zig build test still 340/340 because the test harness invokes api.handleRequest directly and never exercises this read path. The right structural follow-up is a black-box integration test that spawns the binary and asserts a real HTTP response, but that's out of scope for this fix.
…ession The prior commit fixed the symptom (readVec instead of readSliceShort in readHttpRequest) but left the structural gap that allowed the regression to slip through: nothing in the unit-test suite exercises the live accept-read-write loop. All 340 tests invoke api.handleRequest directly, so a stall in the read step is invisible to them. This test binds 127.0.0.1:0, connects from a libc client socket on the same thread (kernel's listen backlog handles the race), sends a short HTTP request without ever closing the write side, and calls readHttpRequest on the accepted Stream. The assertion is wall-clock under 200 ms — comfortable headroom over loopback variance, far below the ~5 s the regression would take to surface (the read had to wait for the buffer to fill or for an EOF from the client; neither happens here). A SO_RCVTIMEO of 2 s on the server-side socket converts a regressed "hangs forever" into a clean test failure rather than a hung CI run. Verified two ways: - with the fix in place: 341/341 pass - with readVec reverted to readSliceShort: this test fails (340 pass, 1 crash) within the SO_RCVTIMEO window Skipped on Windows because the test reaches into libc socket primitives directly; the platform-portable equivalent would need a Winsock branch we do not need for the regression we are guarding.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR5 —
fix(http): unblock request reader after Zig 0.16 migrationBranch:
fix/http-read-blockingFixes: every HTTP endpoint times out on
mainsince54455f9(PR #5, Zig 0.16 merge, 2026-04-17). The published2026.3.2image is unaffected because it predates that merge.Why
After the Zig 0.16 migration landed, anyone building from
mainHEADfinds that
curl http://localhost:8080/healthhangs until its timeout.TCP handshakes succeed, the request is read, the handler runs, but the
response only reaches the wire 3-5 seconds after the request — by
which time the client has already given up.
The unit-test suite (
zig build test) stayed green at 340/340 becausetests exercise
api.handleRequestdirectly and never go through thelive accept-read-write loop in
src/main.zig.Root cause
Not the writer chain. The reader.
readHttpRequestcallsreader.interface.readSliceShort(&chunk)withchunk: [4096]u8. Per the Zig 0.16 documentation:So
readSliceShortblocks until the destination buffer is fullyfilled or the stream EOFs. A typical
GET /healthis ~84 bytes, sothe reader sat waiting for another 4012 bytes that would never come.
The connection only progressed when curl gave up after its timeout
and FIN'd the socket — at which point
readSliceShortreturned the 84bytes it had been holding, the handler ran, and the response was sent
into a socket the client had already closed.
tcpdumpon the loopback interface confirmed the timeline (requestACK'd at
t=0, response packet leaving the server att≈5s, RST fromthe client immediately after).
Fix
Replace
readSliceShortwithreadVec.readVecperforms at mostone
recvsyscall per call and returns whatever the kernel has —typically the entire HTTP request in one shot. The surrounding
whileloop already handles fragmented requests correctly.Diff (
src/main.zig, inreadHttpRequest):Four added lines, one deleted. No other code changes.
Verification
Reproducer from the issue draft (build
mainfrom source, run withthe unmodified
ziglang/zig:0.16.0toolchain in Docker, hitlocalhost:8080with curl):Per-request timing measured in the accept loop:
/healthand/metricsboth respond in roughly 2 ms end-to-end.zig build test --summary allis still 340/340 passing — the fixdoes not regress any covered behaviour.
Closing the test-coverage gap
The reason this regression slipped through was that none of the 340
unit tests exercised the live accept-read-write loop — they all call
api.handleRequestdirectly. So a second commit on this branch(
e7fc120) adds an in-process integration test that:127.0.0.1:0, connects a libc client socket from the samethread (kernel's listen backlog handles the lack of a separate
thread),
readHttpRequeston the acceptedStream,correctly.
SO_RCVTIMEOof 2 s on the server-side socket converts a regressed"hangs forever" into a clean failure rather than a hung CI run.
Verified the test catches the regression by temporarily reverting
readVectoreadSliceShort:readSliceShort(regressed)readVec(fixed)The test is skipped on Windows because it reaches directly into libc
socket primitives. A platform-portable equivalent would need a
Winsock branch, which we do not need for the regression we are
guarding here.
Why I'm filing this as a PR
The original issue draft hypothesised about
writer.flush()swallowingthe bytes and listed two failed workarounds. Those guesses were on the
wrong layer — the writer chain is fine, it just never gets a chance
to run because the reader stalls upstream. Once the reader is fixed,
the unmodified writer code from the Zig 0.16 migration delivers the
response correctly.