feat: add filtered L1 Bundle copycat depth-aware indexing by ID#734

charmful0x wants to merge 24 commits into fix/neo-logging from
Conversation
added support for a comma-separated include-owner-alias filter:

```erlang
48> hb_ao:resolve(
      <<"~copycat@1.0/arweave&id=6DODXspJYXcMbUvadcAQ9FoP3xh5N0dhDCiOwU7d4Q4&mode=write&include-owner-alias=neo-bundler,turbo&exclude-tag=Bundler-App-Name:Redstone">>,
      Opts2).
{ok,#{items_count => 1404,bundle_count => 1,skipped_count => 0,
      <<"priv">> =>
          #{<<"hashpath">> =>
                <<"M8hn9wfAiAF8pQirF-j48KgJ1lXxkD02MlDX0uWhUoM/sV4DMZxa1fCz2ajn144goQSnEGoJcaTSHrTaOqvhHps">>}}}
```
get/set recursion cap overrides:

```erlang
dev_copycat_arweave:set_depth_recursion_cap(5, Opts).
dev_copycat_arweave:get_depth_recursion_cap(Opts).
```
new features:

```erlang
hb_ao:resolve(
    <<"~copycat@1.0/arweave&id=fFt5eteych-ppitofKFoeuzm5I_2CyY1ce4FSAGC3Ow&mode=write&load-l1-offset=true&include-owner-alias=neo-bundler,turbo&exclude-tag=Bundler-App-Name:Redstone&include-tag=Bundler-App-Name:ao">>,
    Opts2).
```
updated the MEMORY_SAFE_CAP to match the highest recorded L1 data tx size under turbo's ao bundler.

Ardrive Turbo (data uploads stopped at block 867572): JNC6vBhjHY1EPwV3pEeNmrsgFMxH5d38_LHsZ7jful8

```json
{
  "total_size_bytes": "69035238626980",
  "total_size_gb": "64294.076",
  "largest_txid": "DEk-63yOLQNt04ZjUeTYJ4GJ18ur7kDNLg_6wBsVvz0",
  "largest_tx_size": "5369655672",
  "smallest_txid": "Bmbz9xuw3m1whhBXa2hI1OZXp68Bi0Wsu-rdCV2YOwg",
  "smallest_tx_size": "3836"
}
```

neo-uploader (actively uploading data): FPjbN_btYKzcf8QASjs30v5C0FPv7XpwKXENBW8dqVw, latest snapshot stats:

```json
{
  "total_size_bytes": "3663945608",
  "total_size_gb": "3.412",
  "largest_txid": "wzoLJaO6ahteoIU_UfjC0noJPM1PxV7XVDFH_QLM0nE",
  "largest_tx_size": "84331158",
  "smallest_txid": "eNzAdhwi6GC9HMcjfAn-MFWaD_zHOK8bFokqPeStA6E",
  "smallest_tx_size": "2231"
}
```

bucketed distribution:

```json
{
  "input_file": "turbo-txs.json",
  "total_entries": 198400,
  "entries_with_size": 198400,
  "under_100mb": 93767,
  "under_100mb_pct": 47.26,
  "under_250mb": 139179,
  "under_250mb_pct": 70.15,
  "under_500mb": 164341,
  "under_500mb_pct": 82.83,
  "under_1gb": 181073,
  "under_1gb_pct": 91.27,
  "under_2gb": 192998,
  "under_2gb_pct": 97.28,
  "under_3gb": 196288,
  "under_3gb_pct": 98.94,
  "under_4gb": 197542,
  "under_4gb_pct": 99.57,
  "under_5gb": 198093,
  "under_5gb_pct": 99.85,
  "over_6gb": 0,
  "over_6gb_pct": 0,
  "buckets": {
    "bucket_0_100mb": { "count": 93767, "pct": 47.26 },
    "bucket_100_250mb": { "count": 45412, "pct": 22.89 },
    "bucket_250_500mb": { "count": 25162, "pct": 12.68 },
    "bucket_500mb_1gb": { "count": 16732, "pct": 8.43 },
    "bucket_1_2gb": { "count": 11925, "pct": 6.01 },
    "bucket_2_3gb": { "count": 3290, "pct": 1.66 },
    "bucket_3_4gb": { "count": 1254, "pct": 0.63 },
    "bucket_4_5gb": { "count": 551, "pct": 0.28 },
    "bucket_5_6gb": { "count": 307, "pct": 0.15 },
    "bucket_over_6gb": { "count": 0, "pct": 0 }
  }
}
```
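The bucketed distribution above also suggests why a per-call memory cap override is useful: roughly 70% of turbo entries fit under 250 MB. A minimal sketch of such an override, assuming the copycat_memory_cap Opts key shown in the review thread below; the 250 MB figure is illustrative, not a value from this PR:

```erlang
%% Sketch only: copycat_memory_cap is the Opts key read for the memory
%% cap in this PR; the 250 MB value is an illustrative assumption based
%% on the bucket stats (~70% of entries fit under it).
resolve_with_cap(Path, Opts) ->
    CapBytes = 250 * 1024 * 1024,
    hb_ao:resolve(Path, Opts#{copycat_memory_cap => CapBytes}).
```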
the feature is functional and performant, ready to be tested. we do have the full AO data subset L1 txids. however, an upstream limitation is that some large L1 roots still fail on current chunk-based retrieval (when downloading the full L1 bundle bytestream). i tried to download the full L1 TX data under the /raw gateway path and it failed as well. example: KGZuBPJkb39s60kvKkGkYDe1SJxG7UfQMo_aCmXT67o (not an ao data protocol L1 TX)
src/dev_copycat_arweave.erl
```erlang
    Opts#{copycat_depth_recursion_cap => Cap}.

%% @doc Get the set depth recursion cap. If not set, defaults to ?DEPTH_RECURSION_CAP.
get_depth_recursion_cap(Opts) ->
    case maps:get(copycat_depth_recursion_cap, Opts, not_found) of
```
Suggested change:

```diff
-    case maps:get(copycat_depth_recursion_cap, Opts, not_found) of
+    case hb_opts:get(copycat_depth_recursion_cap, Opts, not_found) of
```
I think the convention is to use the hb_opts wrapper rather than querying the Opts with maps directly.
src/dev_copycat_arweave.erl
```erlang
    case maps:get(copycat_memory_cap, Opts, not_found) of
        not_found -> ?MEMORY_SAFE_CAP;
        Cap -> Cap
    end.
```
Suggested change:

```diff
-    case maps:get(copycat_memory_cap, Opts, not_found) of
-        not_found -> ?MEMORY_SAFE_CAP;
-        Cap -> Cap
-    end.
+    hb_opts:get(copycat_memory_cap, Opts, ?MEMORY_SAFE_CAP).
```
thanks for the review @JamesPiechota! i addressed the suggestions. i didn't apply the GH UI suggestion directly because i kept the Opts setters for shell ergonomics, since they are just local override helpers on top of the hb_opts defaults/read path. if you think they add unnecessary API surface, i can drop them too (same for the owner-alias set/get, given it's a module-scoped UX feature). WDYT?
force-pushed from b0e92a3 to ca7a964
Naming changes:
- index_bundle_header for the light/shallow indexing; index_full_bundle for the branches which read/load all chunks in a bundle before indexing
- process_block_tx for old-style (shallow) indexing; process_l1_tx for new-style (deep) indexing
About

performance-aware, resource-minimal, filter-aware L1 bundle copycat indexing. status: wip
configurability

in the id=.. path:

- &depth=safe_max recurses every bundle until DEPTH_RECURSION_CAP. if no depth (1..safe_max) is provided, it defaults to safe_max.

supported paths

at the moment, the new path is the &id=.. + filters. how it works:
to test the feature locally, and to simulate having the required local L1 tx offset index, index a block with depth=1, e.g. block https://aolink.ar.io/#/block/1870797
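As a sketch, a depth-limited resolve on the id path would look like the following (the id is reused from the owner-alias example above; the exact block-path form used for this local test is not shown in the PR, so only the documented &depth= parameter is illustrated):

```erlang
%% Sketch: shallow (depth=1) indexing run using the documented &depth=
%% parameter on the id path. The id is reused from an example elsewhere
%% in this PR; this call is illustrative, not output from the PR.
hb_ao:resolve(
    <<"~copycat@1.0/arweave&id=6DODXspJYXcMbUvadcAQ9FoP3xh5N0dhDCiOwU7d4Q4"
      "&mode=write&depth=1">>,
    Opts).
```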
now, if we assume we have the required L1 TX offsets indexed locally, we can iterate over IDs and assert filters:
res:

```erlang
{ok,#{items_count => 1404,bundle_count => 1,skipped_count => 0,
```

if we try an L1 TX with the redstone filter enabled: it must be a turbo-owner L1 TX (bundle) but exclude redstone tags.
txid: https://viewblock.io/arweave/tx/5q95xvC3_BbZa5C4hypcgnIHkyLSlgGef-OpAdMjoOY
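That check can be expressed with the filter parameters documented in this PR (the id is the linked tx; this particular combination is a sketch, not output captured from the PR):

```erlang
%% Sketch: turbo-owner L1 bundle, excluding Redstone-tagged items.
%% Parameters (mode, include-owner-alias, exclude-tag) are the ones
%% documented in this PR; the combination here is illustrative.
hb_ao:resolve(
    <<"~copycat@1.0/arweave&id=5q95xvC3_BbZa5C4hypcgnIHkyLSlgGef-OpAdMjoOY"
      "&mode=write&include-owner-alias=turbo"
      "&exclude-tag=Bundler-App-Name:Redstone">>,
    Opts).
```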
TODOs:
- make DEPTH_RECURSION_CAP configurable, with getter/setter parity to MEMORY_SAFE_CAP
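A minimal sketch of that parity, mirroring the memory-cap accessor pattern from the review thread above (the key and function names follow this PR's convention; the default value is an assumption):

```erlang
%% Assumed default; the real value lives in dev_copycat_arweave.
-define(DEPTH_RECURSION_CAP, 5).

%% Per-call override, stored directly in Opts.
set_depth_recursion_cap(Cap, Opts) when is_integer(Cap), Cap >= 1 ->
    Opts#{copycat_depth_recursion_cap => Cap}.

%% Read via the hb_opts wrapper, falling back to the compile-time default,
%% matching the reviewed hb_opts:get/3 pattern for the memory cap.
get_depth_recursion_cap(Opts) ->
    hb_opts:get(copycat_depth_recursion_cap, Opts, ?DEPTH_RECURSION_CAP).
```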