
fix: gap-fill and exact-range trim for post-threshold chunks #904

Open
nikooo777 wants to merge 3 commits into edge from fix/post-threshold-chunk-gap-fill

Conversation

@nikooo777
Collaborator

Summary

Fix a silent-corruption bug where ~arweave@2.9/raw= returned wrong bytes for any post-strict-data-split-threshold dataitem whose on-chain chunks didn't happen to align to a 256 KiB stride relative to the dataitem's data start. The advertised content-length was correct; the body was both shorter than that and contained bytes from outside the requested range at the tail.

Root cause is in get_chunk_range_fixed_size/3: it generated candidate query offsets at 256 KiB increments from the requested start, then used a one-extra-chunk fallback if the assembled length was short. That heuristic only covers the trivial "off by one trailing chunk" case — it misses interior gaps when the bucket grid drifts from the request stride, and assemble_chunks/2 made it worse by never trimming the trailing overshoot of the fallback chunk. Aligned dataitems happened to work and masked the bug.
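The failure mode above can be sketched in a few lines. This is an illustrative toy, not code from the diff: the chunk boundaries are invented, and the module and function names are hypothetical. It shows how a chunk smaller than the stride, sitting wholly between two 256 KiB candidate offsets, is never queried by the old heuristic.

```erlang
%% Illustrative sketch only -- chunk boundaries are invented to show why
%% fixed-stride candidate offsets can miss an interior chunk when the
%% on-chain chunk grid drifts from the request stride.
-module(gap_sketch).
-export([demo/0]).

-define(STRIDE, 262144). %% 256 KiB

%% Pretend chunk map: inclusive {StartOffset, EndOffset} of each chunk.
%% The middle chunk is smaller than the stride and lies wholly between
%% two stride points, so no candidate offset ever lands inside it.
chunks() ->
    [{0, 99999}, {100000, 149999}, {150000, 500000}].

chunk_for(Off) ->
    case [C || {S, E} = C <- chunks(), Off >= S, Off =< E] of
        [C | _] -> {ok, C};
        [] -> none
    end.

demo() ->
    %% Old heuristic: probe at Start, Start + STRIDE, ...
    Candidates = [0, ?STRIDE],
    Fetched = lists:usort([C || Off <- Candidates,
                                {ok, C} <- [chunk_for(Off)]]),
    %% The one-extra-chunk fallback only appends past the end; it never
    %% revisits an interior gap like this one.
    {fetched, Fetched, interior_gap, chunks() -- Fetched}.
```

Here `demo/0` reports `{100000, 149999}` as an interior gap: both probes hit the outer chunks, and nothing ever fetches the small middle one.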

Changes (all in src/dev_arweave.erl):

  • get_chunk_range_fixed_size/3 now defers to fill_gaps/4, the same iterative gap-fill helper the pre-threshold path already uses. Interior gaps of any shape get detected and refilled until the range is contiguous.
  • fill_gaps/4 trims trailing overshoot to exactly [Offset, EndOffset] after assembly; assemble_chunks/2 already trimmed the leading edge. The pre-threshold variable-size path also benefits.
  • get_chunk/3 routes global no-length chunk requests through a small new single_chunk_suffix/2 helper. Without this, the tightened trim would collapse no-length responses from "a chunk's suffix" to a single byte, regressing both the public chunk HTTP endpoint and the internal bundle_header/2 caller (which would then report invalid_bundle_header for every bundle).
  • fetch_chunk_range/4 docstring updated to describe the new exact-range contract.

Net diff: 51 insertions, 52 deletions in one file. No exported function signatures change; no schema or config changes.
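The exact-range trim in the second bullet boils down to clamping the assembled iolist to the inclusive range. A simplified sketch (the helper name is illustrative; in the diff this logic lives inside fill_gaps/4, which also drives the refetch loop):

```erlang
%% Simplified sketch of the trailing trim applied after assembly:
%% clamp the assembled binary to the inclusive range [Offset, EndOffset].
%% Leading overshoot is assumed already trimmed by assemble_chunks/2.
trim_to_range(Binaries, Offset, EndOffset) ->
    Bin = iolist_to_binary(Binaries),
    Expected = EndOffset - Offset + 1,
    {ok, binary:part(Bin, 0, min(Expected, byte_size(Bin)))}.
```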

Test plan

  • Index a block containing a post-threshold L2 dataitem whose chunks don't tile cleanly with a 256 KiB stride from its data start. Confirm the indexer detects bundles and writes per-item offsets as before.
  • Fetch the dataitem via ~arweave@2.9/raw=<id> and confirm the response body length matches the advertised content-length exactly and that the bytes decode end-to-end (e.g. an MP4 or WASM whose section/box walker reaches EOF cleanly). Before fix: same request returned ~33 KB short with non-payload bytes near the end.
  • Hit ~arweave@2.9/chunk?offset=... without a length parameter and confirm the response is a chunk's suffix (tens of KiB), not 1 byte. Confirms the no-length contract is preserved for both external HTTP callers and bundle_header/2.
  • rebar3 eunit: all dev_arweave tests pass (35/35), including the post-split chunk path tests. The only failures across the suite are three pre-existing network-dependent dev_name ARNS tests that reproduce identically on unmodified edge.

- Use fill_gaps in get_chunk_range_fixed_size for interior gaps
- Trim trailing overshoot inside fill_gaps to the exact range
- Route global no-length chunk requests through single_chunk_suffix
- Refresh fetch_chunk_range docstring to match new behavior
Comment thread src/dev_arweave.erl Outdated
%% @doc Fetch the single upstream chunk containing Offset and return its
%% suffix from Offset onward. Used when the chunk request has no explicit
%% length: the caller wants "a chunk's worth" rather than an exact range.
single_chunk_suffix(Offset, Opts) ->
Collaborator


Is this necessary for this PR? Did something break with the default Length=1 handling? (if it's not necessary we can also get rid of the HasExplicitLength check above)

Comment thread src/dev_arweave.erl Outdated
Comment on lines +503 to +505
Bin = iolist_to_binary(Binaries),
Expected = EndOffset - Offset + 1,
{ok, binary:part(Bin, 0, min(Expected, byte_size(Bin)))};
Collaborator


Do you know why this is needed here? There is some truncating that happens earlier in the stack - and since fill_gaps was previously called for pre-threshold ranges, is that flow path still good?

Collaborator


I'm still going through the logic, but on the offset 194_794_421_495_003 example, these are the values:

  • Expected 732228
  • byte_size(Bin) 840978

Going through the logic to make sure there is no extra code that we don't need.

Collaborator


All tests pass if we revert this change, because we have the one inside get_chunk to trim the binary.

@JamesPiechota
Collaborator

Extra test I recommend adding to dev_arweave:

get_post_split_mid_chunk_large_module_test_parallel() ->
    Offset = 194_794_421_495_003,
    ExpectedLength = 732_228,
    {ok, Data} = hb_ao:resolve(
        #{ <<"device">> => <<"arweave@2.9">> },
        #{
            <<"path">> => <<"chunk">>,
            <<"offset">> => Offset + 1,
            <<"length">> => ExpectedLength
        },
        #{}
    ),
    ?assertEqual(ExpectedLength, byte_size(Data)).

Comment thread src/dev_arweave.erl Outdated
Comment on lines +406 to +413
%% @doc Fetch a range of chunks in parallel. Determines the appropriate
%% algorithm to use based on offset, length, and an optional relative
%% transaction ID. For global (no relative TX) offsets, returns exactly the
%% bytes in the inclusive range [Offset, Offset + Length - 1]: both leading
%% and trailing overshoot are trimmed and any interior gaps in chunk
%% coverage are filled iteratively. For relative-TX offsets the legacy
%% concatenation path is used and the caller may receive more than Length
%% bytes (no trailing trim), so it must truncate the result itself.
Collaborator


I'm not sure if this is useful. The function isn't public, and the logic should be simple to understand and the code to read.

Collaborator

@JamesPiechota May 12, 2026


Agree. I generally remove all those explanatory comments that Claude adds.

@speeddragon force-pushed the fix/post-threshold-chunk-gap-fill branch from 83336c0 to 7ad7ad6 on May 12, 2026 16:44
Comment thread src/dev_arweave.erl
Comment on lines +1043 to +1045
%% For some reason, if we wait 1500 ms before the request this test doesn't
%% fail with connect_timeout when running in parallel.
timer:sleep(1500),
Collaborator


This is a flaky test: it passes 100% of the time when run as a single test, but fails 80%+ of the time when run in parallel (now the default) with checkout_timeout. The failure surfaces as a response of {ok, checkout_timeout}, which is odd because the error originates on the server side.

There is a separate bug where hackney configuration is applied when setting up test servers, to be fixed in another PR. Even so, increasing hackney's checkout timeout doesn't solve the issue; the request just waits until the timeout and then fails.

Collaborator

@JamesPiechota left a comment


LGTM

