
feat: add filtered L1 Bundle copycat depth-aware indexing by ID #734

Open
charmful0x wants to merge 24 commits into fix/neo-logging from feat/bundles-copycat

Conversation

@charmful0x charmful0x commented Mar 6, 2026

About

performance-aware, resource-minimal, filter-aware L1 bundles copycat indexing - status: wip

configurability

  • add owner alias (reduce address computation at every query):
Opts2 = dev_copycat_arweave:add_owner_alias(
  <<"FPjbN_btYKzcf8QASjs30v5C0FPv7XpwKXENBW8dqVw">>,
  <<"neo-bundler">>,
  Opts1
).
  • set L1 bundle safe size cap (useful for configuring the allowed memory usage per process, taking into account arweave_workers count * MEMORY_SAFE_CAP)
dev_copycat_arweave:set_memory_safe_cap(xxxxbytes, Opts).
  • safe max recursion depth
~copycat@1.0/arweave&depth=safe_max

in the id=.. path: &depth=safe_max recurses into every bundle until DEPTH_RECURSION_CAP. if no depth (1..safe_max) is provided, it defaults to safe_max.

  • defaults
-define(DEPTH_L1_OFFSETS, 1).
-define(DEPTH_RECURSION_CAP, 4).
%% 1GB in bytes
-define(MEMORY_SAFE_CAP, 1024 * 1024 * 1024).
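
the caps above act as fall-back defaults when no per-Opts override is set; a minimal sketch of the override pattern (a sketch only, not the exact module internals):

  %% read the cap, falling back to the compile-time default
  memory_safe_cap(Opts) ->
      maps:get(copycat_memory_cap, Opts, ?MEMORY_SAFE_CAP).

  %% store a per-Opts override
  set_memory_safe_cap(Bytes, Opts) when is_integer(Bytes), Bytes > 0 ->
      Opts#{copycat_memory_cap => Bytes}.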

supported paths

at the moment, the new path is &id=.. plus filters. how it works:

  1. it requires the L1 ID offsets to be present in the store
  2. it fetches the L1 ID headers to retrieve owner and tags, and validates that it is a bundle
  3. it applies the filters to the L1 TX, skipping it if the filters reject it at the gate
  4. if the filters pass, it downloads the L1 TX bytestream, recurses through it in memory, and indexes the children and descendant offsets
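
the steps above, sketched in erlang (helper names here are illustrative, not the actual module internals):

  index_l1_tx(TXID, Depth, Opts) ->
      %% 1. the L1 offset must already be indexed locally
      {ok, Offset} = lookup_l1_offset(TXID, Opts),
      %% 2. fetch the header for owner + tags, validate it is a bundle
      {ok, Header} = fetch_l1_header(TXID, Opts),
      %% 3. gate on the configured filters
      case passes_filters(Header, Opts) of
          false ->
              {skipped, filter_mismatch};
          true ->
              %% 4. download the bytestream and recurse in memory
              Bytes = download_l1_bytestream(Offset, Opts),
              recurse_and_index(Bytes, Depth, Opts)
      end.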

to test the feature locally, and to simulate having the required local L1 TX offset index, index a block with depth=1:

  application:ensure_all_started(hb).
  application:ensure_all_started(inets).
  application:ensure_all_started(ssl).
  hb_http:start().

  TestStore = hb_test_utils:test_store().
  StoreOpts = #{<<"index-store">> => [TestStore]}.
  Store = [
    TestStore,
    #{
      <<"store-module">> => hb_store_arweave,
      <<"name">> => <<"cache-arweave">>,
      <<"index-store">> => [TestStore],
      <<"arweave-node">> => <<"https://arweave.net">>
    }
  ].

  Opts = #{
    store => Store,
    arweave_index_ids => true,
    arweave_index_store => StoreOpts,
    arweave_index_workers => 4,
    prometheus => false,
    http_client => httpc,
    http_retry => 1,
    http_retry_time => 200,
    http_retry_mode => constant,
    http_retry_response => [failure]
  }.

  Opts1 = dev_copycat_arweave:add_owner_alias(
    <<"FPjbN_btYKzcf8QASjs30v5C0FPv7XpwKXENBW8dqVw">>,
    <<"neo-bundler">>,
    Opts
  ).

  %% simulate already having the L1 offset index locally
  hb_ao:resolve(
    <<"~copycat@1.0/arweave&from=1870797&to=1870797&mode=write&depth=1">>,
    Opts1
  ).

for block https://aolink.ar.io/#/block/1870797

now, assuming we have the required L1 TX offsets indexed locally, we can iterate over IDs and assert filters:

hb_ao:resolve(
  <<"~copycat@1.0/arweave&id=6DODXspJYXcMbUvadcAQ9FoP3xh5N0dhDCiOwU7d4Q4&mode=write&depth=safe_max&include-owner-alias=neo-bundler&exclude-tag=Bundler-App-Name:Redstone">>,
  Opts1
).

res:
{ok,#{items_count => 1404,bundle_count => 1,skipped_count => 0,

if we try an L1 TX with the Redstone filter enabled: it must be a turbo-owned L1 TX (bundle), but Redstone tags are excluded.

txid: https://viewblock.io/arweave/tx/5q95xvC3_BbZa5C4hypcgnIHkyLSlgGef-OpAdMjoOY

  Opts1A = dev_copycat_arweave:add_owner_alias(
    <<"JNC6vBhjHY1EPwV3pEeNmrsgFMxH5d38_LHsZ7jful8">>,
    <<"turbo">>,
    Opts1
  ).
42> hb_ao:resolve(
  <<"~copycat@1.0/arweave&id=5q95xvC3_BbZa5C4hypcgnIHkyLSlgGef-OpAdMjoOY&mode=write&include-owner-alias=turbo&exclude-tag=Bundler-App-Name:Redstone">>,
  Opts1A
).
=== HB DEBUG ===[3273092ms in <0.1056.0> @ hb_ao:194 / hb_ao:204 / hb_ao:543 / dev_copycat_arweave:137 / dev_copycat_arweave:689]==>
arweave_tx_skipped, tx_id: [Explicit:] <<"5q95xvC3_BbZa5C4hypcgnIHkyLSlgGef-OpAdMjoOY">>, reason: exclude_tag_match
{ok,#{items_count => 0,bundle_count => 0,skipped_count => 1,
      <<"priv">> =>
          #{<<"hashpath">> =>
                <<"M8hn9wfAiAF8pQirF-j48KgJ1lXxkD02MlDX0uWhUoM/1SgyB85LEUXqwat2_18-vyaJIk2NpNKeiUKNpbrP4XA">>}}}

TODOs:

  • make DEPTH_RECURSION_CAP configurable, with getter/setter parity to MEMORY_SAFE_CAP
  • cleanup & perf optimization

@charmful0x
Author

added support for a comma-separated include-owner-alias filter:

48> hb_ao:resolve(
  <<"~copycat@1.0/arweave&id=6DODXspJYXcMbUvadcAQ9FoP3xh5N0dhDCiOwU7d4Q4&mode=write&include-owner-alias=neo-bundler,turbo&exclude-tag=Bundler-App-Name:Redstone">>,
  Opts2
).
{ok,#{items_count => 1404,bundle_count => 1,skipped_count => 0,
      <<"priv">> =>
          #{<<"hashpath">> =>
                <<"M8hn9wfAiAF8pQirF-j48KgJ1lXxkD02MlDX0uWhUoM/sV4DMZxa1fCz2ajn144goQSnEGoJcaTSHrTaOqvhHps">>}}}

@charmful0x
Author

get/set recursion cap -- overrides DEPTH_RECURSION_CAP and defaults to it if not set.

dev_copycat_arweave:set_depth_recursion_cap(5, Opts).

dev_copycat_arweave:get_depth_recursion_cap(Opts).

@charmful0x
Author

charmful0x commented Mar 6, 2026

new features:

hb_ao:resolve(
  <<"~copycat@1.0/arweave&id=fFt5eteych-ppitofKFoeuzm5I_2CyY1ce4FSAGC3Ow&mode=write&load-l1-offset=true&include-owner-alias=neo-bundler,turbo&exclude-tag=Bundler-App-Name:Redstone&include-tag=Bundler-App-Name:ao">>,
  Opts2
).
  • &include-tag=Key:Value : require the L1 TX header to contain that tag pair
  • &load-l1-offset=true : if the L1 TX offset is not present in the local store, fetch it from the network, write it locally, and continue with the existing id=... indexing path
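
an illustrative predicate for the include-tag / exclude-tag gating (assuming tags are held as a list of {Name, Value} pairs; this is a sketch, not the actual implementation):

  tag_filters_pass(Tags, IncludeTag, ExcludeTag) ->
      Has = fun(undefined) -> false;
               ({K, V}) -> lists:member({K, V}, Tags)
            end,
      %% include-tag must be present (or unset); exclude-tag must be absent
      (IncludeTag =:= undefined orelse Has(IncludeTag))
          andalso not Has(ExcludeTag).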

@charmful0x
Author

charmful0x commented Mar 7, 2026

updated the MEMORY_SAFE_CAP to match the highest recorded L1 data tx size under turbo's ao bundler:

Ardrive Turbo (data uploads stopped at block 867572): JNC6vBhjHY1EPwV3pEeNmrsgFMxH5d38_LHsZ7jful8

{
  "total_size_bytes": "69035238626980",
  "total_size_gb": "64294.076",
  "largest_txid": "DEk-63yOLQNt04ZjUeTYJ4GJ18ur7kDNLg_6wBsVvz0",
  "largest_tx_size": "5369655672",
  "smallest_txid": "Bmbz9xuw3m1whhBXa2hI1OZXp68Bi0Wsu-rdCV2YOwg",
  "smallest_tx_size": "3836"
}

neo-uploader (actively uploading data): FPjbN_btYKzcf8QASjs30v5C0FPv7XpwKXENBW8dqVw

latest snapshot stats:

{
  "total_size_bytes": "3663945608",
  "total_size_gb": "3.412",
  "largest_txid": "wzoLJaO6ahteoIU_UfjC0noJPM1PxV7XVDFH_QLM0nE",
  "largest_tx_size": "84331158",
  "smallest_txid": "eNzAdhwi6GC9HMcjfAn-MFWaD_zHOK8bFokqPeStA6E",
  "smallest_tx_size": "2231"
}

bucketed distribution

{
"input_file": "turbo-txs.json",
"total_entries": 198400,
"entries_with_size": 198400,
"under_100mb": 93767,
"under_100mb_pct": 47.26,
"under_250mb": 139179,
"under_250mb_pct": 70.15,
"under_500mb": 164341,
"under_500mb_pct": 82.83,
"under_1gb": 181073,
"under_1gb_pct": 91.27,
"under_2gb": 192998,
"under_2gb_pct": 97.28,
"under_3gb": 196288,
"under_3gb_pct": 98.94,
"under_4gb": 197542,
"under_4gb_pct": 99.57,
"under_5gb": 198093,
"under_5gb_pct": 99.85,
"over_6gb": 0,
"over_6gb_pct": 0,
"buckets": {
  "bucket_0_100mb": {
    "count": 93767,
    "pct": 47.26
  },
  "bucket_100_250mb": {
    "count": 45412,
    "pct": 22.89
  },
  "bucket_250_500mb": {
    "count": 25162,
    "pct": 12.68
  },
  "bucket_500mb_1gb": {
    "count": 16732,
    "pct": 8.43
  },
  "bucket_1_2gb": {
    "count": 11925,
    "pct": 6.01
  },
  "bucket_2_3gb": {
    "count": 3290,
    "pct": 1.66
  },
  "bucket_3_4gb": {
    "count": 1254,
    "pct": 0.63
  },
  "bucket_4_5gb": {
    "count": 551,
    "pct": 0.28
  },
  "bucket_5_6gb": {
    "count": 307,
    "pct": 0.15
  },
  "bucket_over_6gb": {
    "count": 0,
    "pct": 0
  }
}
}

@charmful0x charmful0x changed the title wip: add filtered L1 Bundle copycat depth-aware indexing by ID feat: add filtered L1 Bundle copycat depth-aware indexing by ID Mar 8, 2026
@charmful0x
Copy link
Author

the feature is functional and performant, ready to be tested. we do have the full AO data subset L1 txids.

however, an upstream limitation: some large L1 roots still fail on the current chunk-based retrieval (when downloading the full L1 bundle bytestream). i also tried to download the full L1 TX data via the /raw gateway path, and it failed as well. example: KGZuBPJkb39s60kvKkGkYDe1SJxG7UfQMo_aCmXT67o (not an ao data protocol L1 TX)

@charmful0x charmful0x requested a review from JamesPiechota March 8, 2026 11:49
Opts#{copycat_depth_recursion_cap => Cap}.
%% @doc Get the set depth recursion cap. if not set, defaults to ?DEPTH_RECURSION_CAP
get_depth_recursion_cap(Opts) ->
case maps:get(copycat_depth_recursion_cap, Opts, not_found) of
Collaborator


Suggested change
case maps:get(copycat_depth_recursion_cap, Opts, not_found) of
case hb_opts:get(copycat_depth_recursion_cap, Opts, not_found) of

Collaborator


I think convention is to use the hb_opts wrapper rather than maps directly to query the Opts.

Comment on lines +55 to +58
case maps:get(copycat_memory_cap, Opts, not_found) of
not_found -> ?MEMORY_SAFE_CAP;
Cap -> Cap
end.
Collaborator


Suggested change
case maps:get(copycat_memory_cap, Opts, not_found) of
not_found -> ?MEMORY_SAFE_CAP;
Cap -> Cap
end.
hb_opts:get(copycat_memory_cap, Opts, ?MEMORY_SAFE_CAP).

@charmful0x
Author

thanks for the review @JamesPiechota! i addressed the suggestions. i didn't apply the GH UI suggestion directly because the hb_opts:get/3 argument order was reversed there, but the getters now read through hb_opts, and i added the defaults in hb_opts so the config key normalization works as expected.

i kept the Opts setters for shell ergonomics, since they are just local override helpers on top of the hb_opts defaults/read path. if you think those add unnecessary API surface, i can drop them too (same for the owner alias set/get, given it's a module-scoped UX feature)

WDYT?
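
for reference, a sketch of the getter as described above, reading through the wrapper (this assumes hb_opts:get/3 takes Key, Default, Opts, i.e. the reverse of the GH suggestion's argument order, per the note above):

  get_depth_recursion_cap(Opts) ->
      hb_opts:get(copycat_depth_recursion_cap, ?DEPTH_RECURSION_CAP, Opts).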

@JamesPiechota JamesPiechota changed the base branch from neo/edge to fix/neo-logging March 10, 2026 18:25
Naming changes:
- index_bundle_header for the light/shallow indexing, index_full_bundle for the branches which read/load all chunks in a bundle before indexing
- process_block_tx for old-style (shallow) indexing, process_l1_tx for new-style (deep) indexing