
feat: add filtered L1 Bundle copycat depth-aware indexing by ID #734

Open
charmful0x wants to merge 24 commits into fix/neo-logging from feat/bundles-copycat

Conversation

@charmful0x charmful0x commented Mar 6, 2026

About

performance-aware, resource-minimal, filter-aware L1 bundles copycat indexing - status: wip

configurability

  • add owner alias (reduce address computation at every query):
Opts2 = dev_copycat_arweave:add_owner_alias(
  <<"FPjbN_btYKzcf8QASjs30v5C0FPv7XpwKXENBW8dqVw">>,
  <<"neo-bundler">>,
  Opts1
).
  • set L1 bundle safe size cap (useful for configuring the allowed memory usage per process, taking into account arweave_workers count * MEMORY_SAFE_CAP)
dev_copycat_arweave:set_memory_safe_cap(xxxxbytes, Opts).
  • safe max recursion depth
~copycat@1.0/arweave&depth=safe_max

in the id=.. path: &depth=safe_max recurses into every bundle until DEPTH_RECURSION_CAP. if no depth (1..safe_max) is provided, it defaults to safe_max.

  • defaults
-define(DEPTH_L1_OFFSETS, 1).
-define(DEPTH_RECURSION_CAP, 4).
%% 1GB in bytes
-define(MEMORY_SAFE_CAP, 1024 * 1024 * 1024).
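
the caps above act as fall-back defaults when no per-Opts override is set; a minimal sketch of the override pattern (a sketch only, not the exact module internals):

  %% read the cap, falling back to the compile-time default
  memory_safe_cap(Opts) ->
      maps:get(copycat_memory_cap, Opts, ?MEMORY_SAFE_CAP).

  %% store a per-Opts override
  set_memory_safe_cap(Bytes, Opts) when is_integer(Bytes), Bytes > 0 ->
      Opts#{copycat_memory_cap => Bytes}.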

supported paths

at the moment, the new path is &id=.. plus filters. how it works:

  1. it requires the L1 ID offsets to be present in the store
  2. it fetches the L1 ID headers to retrieve owner and tags, and validates that it is a bundle
  3. it applies the filters to the L1 TX, skipping it if the filters reject it at the gate
  4. if the filters pass, it downloads the L1 TX bytestream, recurses through it in memory, and indexes the children and descendant offsets
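
the steps above, sketched in erlang (helper names here are illustrative, not the actual module internals):

  index_l1_tx(TXID, Depth, Opts) ->
      %% 1. the L1 offset must already be indexed locally
      {ok, Offset} = lookup_l1_offset(TXID, Opts),
      %% 2. fetch the header for owner + tags, validate it is a bundle
      {ok, Header} = fetch_l1_header(TXID, Opts),
      %% 3. gate on the configured filters
      case passes_filters(Header, Opts) of
          false ->
              {skipped, filter_mismatch};
          true ->
              %% 4. download the bytestream and recurse in memory
              Bytes = download_l1_bytestream(Offset, Opts),
              recurse_and_index(Bytes, Depth, Opts)
      end.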

to test the feature locally, and to simulate having the required local L1 TX offset index, index a block with depth=1:

  application:ensure_all_started(hb).
  application:ensure_all_started(inets).
  application:ensure_all_started(ssl).
  hb_http:start().

  TestStore = hb_test_utils:test_store().
  StoreOpts = #{<<"index-store">> => [TestStore]}.
  Store = [
    TestStore,
    #{
      <<"store-module">> => hb_store_arweave,
      <<"name">> => <<"cache-arweave">>,
      <<"index-store">> => [TestStore],
      <<"arweave-node">> => <<"https://arweave.net">>
    }
  ].

  Opts = #{
    store => Store,
    arweave_index_ids => true,
    arweave_index_store => StoreOpts,
    arweave_index_workers => 4,
    prometheus => false,
    http_client => httpc,
    http_retry => 1,
    http_retry_time => 200,
    http_retry_mode => constant,
    http_retry_response => [failure]
  }.

  Opts1 = dev_copycat_arweave:add_owner_alias(
    <<"FPjbN_btYKzcf8QASjs30v5C0FPv7XpwKXENBW8dqVw">>,
    <<"neo-bundler">>,
    Opts
  ).

  %% simulate already having the L1 offset index locally
  hb_ao:resolve(
    <<"~copycat@1.0/arweave&from=1870797&to=1870797&mode=write&depth=1">>,
    Opts1
  ).

for block https://aolink.ar.io/#/block/1870797

now, assuming we have the required L1 TX offsets indexed locally, we can iterate over IDs and assert filters:

hb_ao:resolve(
  <<"~copycat@1.0/arweave&id=6DODXspJYXcMbUvadcAQ9FoP3xh5N0dhDCiOwU7d4Q4&mode=write&depth=safe_max&include-owner-alias=neo-bundler&exclude-tag=Bundler-App-Name:Redstone">>,
  Opts1
).

res:
{ok,#{items_count => 1404,bundle_count => 1,skipped_count => 0,

if we try an L1 TX with the Redstone filter enabled: it must be a turbo-owned L1 TX (bundle), but Redstone tags are excluded.

txid: https://viewblock.io/arweave/tx/5q95xvC3_BbZa5C4hypcgnIHkyLSlgGef-OpAdMjoOY

  Opts1A = dev_copycat_arweave:add_owner_alias(
    <<"JNC6vBhjHY1EPwV3pEeNmrsgFMxH5d38_LHsZ7jful8">>,
    <<"turbo">>,
    Opts1
  ).
42> hb_ao:resolve(
  <<"~copycat@1.0/arweave&id=5q95xvC3_BbZa5C4hypcgnIHkyLSlgGef-OpAdMjoOY&mode=write&include-owner-alias=turbo&exclude-tag=Bundler-App-Name:Redstone">>,
  Opts1A
).
=== HB DEBUG ===[3273092ms in <0.1056.0> @ hb_ao:194 / hb_ao:204 / hb_ao:543 / dev_copycat_arweave:137 / dev_copycat_arweave:689]==>
arweave_tx_skipped, tx_id: [Explicit:] <<"5q95xvC3_BbZa5C4hypcgnIHkyLSlgGef-OpAdMjoOY">>, reason: exclude_tag_match
{ok,#{items_count => 0,bundle_count => 0,skipped_count => 1,
      <<"priv">> =>
          #{<<"hashpath">> =>
                <<"M8hn9wfAiAF8pQirF-j48KgJ1lXxkD02MlDX0uWhUoM/1SgyB85LEUXqwat2_18-vyaJIk2NpNKeiUKNpbrP4XA">>}}}

TODOs:

  • make DEPTH_RECURSION_CAP configurable, with getter/setter parity to MEMORY_SAFE_CAP
  • cleanup & perf optimization

@charmful0x
Author

added support for a comma-separated include-owner-alias filter:

48> hb_ao:resolve(
  <<"~copycat@1.0/arweave&id=6DODXspJYXcMbUvadcAQ9FoP3xh5N0dhDCiOwU7d4Q4&mode=write&include-owner-alias=neo-bundler,turbo&exclude-tag=Bundler-App-Name:Redstone">>,
  Opts2
).
{ok,#{items_count => 1404,bundle_count => 1,skipped_count => 0,
      <<"priv">> =>
          #{<<"hashpath">> =>
                <<"M8hn9wfAiAF8pQirF-j48KgJ1lXxkD02MlDX0uWhUoM/sV4DMZxa1fCz2ajn144goQSnEGoJcaTSHrTaOqvhHps">>}}}

@charmful0x
Author

get/set recursion cap -- overrides DEPTH_RECURSION_CAP and defaults to it if not set.

dev_copycat_arweave:set_depth_recursion_cap(5, Opts).

dev_copycat_arweave:get_depth_recursion_cap(Opts).

@charmful0x
Author

charmful0x commented Mar 6, 2026

new features:

hb_ao:resolve(
  <<"~copycat@1.0/arweave&id=fFt5eteych-ppitofKFoeuzm5I_2CyY1ce4FSAGC3Ow&mode=write&load-l1-offset=true&include-owner-alias=neo-bundler,turbo&exclude-tag=Bundler-App-Name:Redstone&include-tag=Bundler-App-Name:ao">>,
  Opts2
).
  • &include-tag=Key:Value : require the L1 TX header to contain that tag pair
  • &load-l1-offset=true : if the L1 TX offset is not present in the local store, fetch it from the network, write it locally, and continue with the existing id=... indexing path
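
an illustrative predicate for the include-tag / exclude-tag gating (assuming tags are held as a list of {Name, Value} pairs; this is a sketch, not the actual implementation):

  tag_filters_pass(Tags, IncludeTag, ExcludeTag) ->
      Has = fun(undefined) -> false;
               ({K, V}) -> lists:member({K, V}, Tags)
            end,
      %% include-tag must be present (or unset); exclude-tag must be absent
      (IncludeTag =:= undefined orelse Has(IncludeTag))
          andalso not Has(ExcludeTag).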

@charmful0x
Author

charmful0x commented Mar 7, 2026

updated the MEMORY_SAFE_CAP to match the highest recorded L1 data tx size under turbo's ao bundler:

Ardrive Turbo (data uploads stopped at block 867572): JNC6vBhjHY1EPwV3pEeNmrsgFMxH5d38_LHsZ7jful8

{
  "total_size_bytes": "69035238626980",
  "total_size_gb": "64294.076",
  "largest_txid": "DEk-63yOLQNt04ZjUeTYJ4GJ18ur7kDNLg_6wBsVvz0",
  "largest_tx_size": "5369655672",
  "smallest_txid": "Bmbz9xuw3m1whhBXa2hI1OZXp68Bi0Wsu-rdCV2YOwg",
  "smallest_tx_size": "3836"
}

neo-uploader (actively uploading data): FPjbN_btYKzcf8QASjs30v5C0FPv7XpwKXENBW8dqVw

latest snapshot stats:

{
  "total_size_bytes": "3663945608",
  "total_size_gb": "3.412",
  "largest_txid": "wzoLJaO6ahteoIU_UfjC0noJPM1PxV7XVDFH_QLM0nE",
  "largest_tx_size": "84331158",
  "smallest_txid": "eNzAdhwi6GC9HMcjfAn-MFWaD_zHOK8bFokqPeStA6E",
  "smallest_tx_size": "2231"
}

bucketed distribution

{
"input_file": "turbo-txs.json",
"total_entries": 198400,
"entries_with_size": 198400,
"under_100mb": 93767,
"under_100mb_pct": 47.26,
"under_250mb": 139179,
"under_250mb_pct": 70.15,
"under_500mb": 164341,
"under_500mb_pct": 82.83,
"under_1gb": 181073,
"under_1gb_pct": 91.27,
"under_2gb": 192998,
"under_2gb_pct": 97.28,
"under_3gb": 196288,
"under_3gb_pct": 98.94,
"under_4gb": 197542,
"under_4gb_pct": 99.57,
"under_5gb": 198093,
"under_5gb_pct": 99.85,
"over_6gb": 0,
"over_6gb_pct": 0,
"buckets": {
  "bucket_0_100mb": {
    "count": 93767,
    "pct": 47.26
  },
  "bucket_100_250mb": {
    "count": 45412,
    "pct": 22.89
  },
  "bucket_250_500mb": {
    "count": 25162,
    "pct": 12.68
  },
  "bucket_500mb_1gb": {
    "count": 16732,
    "pct": 8.43
  },
  "bucket_1_2gb": {
    "count": 11925,
    "pct": 6.01
  },
  "bucket_2_3gb": {
    "count": 3290,
    "pct": 1.66
  },
  "bucket_3_4gb": {
    "count": 1254,
    "pct": 0.63
  },
  "bucket_4_5gb": {
    "count": 551,
    "pct": 0.28
  },
  "bucket_5_6gb": {
    "count": 307,
    "pct": 0.15
  },
  "bucket_over_6gb": {
    "count": 0,
    "pct": 0
  }
}
}

@charmful0x charmful0x changed the title wip: add filtered L1 Bundle copycat depth-aware indexing by ID feat: add filtered L1 Bundle copycat depth-aware indexing by ID Mar 8, 2026
@charmful0x
Copy link
Author

the feature is functional and performant, ready to be tested. we do have the full AO data subset L1 txids.

however, an upstream limitation: some large L1 roots still fail on the current chunk-based retrieval (when downloading the full L1 bundle bytestream). i also tried to download the full L1 TX data via the /raw gateway path, and it failed as well. example: KGZuBPJkb39s60kvKkGkYDe1SJxG7UfQMo_aCmXT67o (not an ao data protocol L1 TX)

@charmful0x charmful0x requested a review from JamesPiechota March 8, 2026 11:49
Opts#{copycat_depth_recursion_cap => Cap}.
%% @doc Get the set depth recursion cap. if not set, defaults to ?DEPTH_RECURSION_CAP
get_depth_recursion_cap(Opts) ->
case maps:get(copycat_depth_recursion_cap, Opts, not_found) of
Collaborator


Suggested change
case maps:get(copycat_depth_recursion_cap, Opts, not_found) of
case hb_opts:get(copycat_depth_recursion_cap, Opts, not_found) of

Collaborator


I think convention is to use the hb_opts wrapper rather than maps directly to query the Opts.

Comment on lines +55 to +58
case maps:get(copycat_memory_cap, Opts, not_found) of
not_found -> ?MEMORY_SAFE_CAP;
Cap -> Cap
end.
Collaborator


Suggested change
case maps:get(copycat_memory_cap, Opts, not_found) of
not_found -> ?MEMORY_SAFE_CAP;
Cap -> Cap
end.
hb_opts:get(copycat_memory_cap, Opts, ?MEMORY_SAFE_CAP).

@charmful0x
Author

thanks for the review @JamesPiechota! i addressed the suggestions. i didn't apply the GH UI suggestion directly because the hb_opts:get/3 argument order was reversed there, but the getters now read through hb_opts, and i added the defaults in hb_opts so the config key normalization works as expected.

i kept the Opts setters for shell ergonomics, since they are just local override helpers on top of the hb_opts defaults/read path. if you think those add unnecessary API surface, i can drop them too (same for the owner alias set/get, given it's a module-scoped UX feature)

WDYT?
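
for reference, a sketch of the getter as described above, reading through the wrapper (this assumes hb_opts:get/3 takes Key, Default, Opts, i.e. the reverse of the GH suggestion's argument order, per the note above):

  get_depth_recursion_cap(Opts) ->
      hb_opts:get(copycat_depth_recursion_cap, ?DEPTH_RECURSION_CAP, Opts).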

@JamesPiechota JamesPiechota changed the base branch from neo/edge to fix/neo-logging March 10, 2026 18:25
Naming changes:
- index_bundle_header for the light/shallow indexing, index_full_bundle for the branches which read/load all chunks in a bundle before indexing
- process_block_tx for old-style (shallow) indexing, process_l1_tx for new-style (deep) indexing