feat: add filter-based index cleanup for L2 data items #658

vilenarios wants to merge 4 commits into develop from
Conversation
Gateway operators indexing data via ArDrive Turbo for optimistic 30-day retention currently have no way to selectively clean up expired or unwanted data items. The only existing mechanism (prune-stable-data-items) supports height range + indexed_at threshold but no tag or wallet filtering. This adds filter-based index cleanup that deletes L2 data items from SQLite and ClickHouse based on owner wallet address, tags, block height range, and indexing age.

Three interfaces are provided:
- Admin endpoint: POST /ar-io/admin/index-cleanup
- Background worker: configurable via ENABLE_INDEX_CLEANUP_WORKER
- Both share the same core cleanup engine

Key design decisions:
- Dry-run mode defaults ON (must explicitly set dryRun: false)
- Leaves bundles/bundle_data_items tables intact to prevent unbundler re-import loops (follows existing prune pattern)
- Leaves contiguous_data table intact (content-addressed, may be shared)
- Empty filter guard prevents accidental mass deletion
- Batch processing with cursor-based pagination (configurable batch size)
- ClickHouse cleanup via ALTER TABLE DELETE with is_data_item guard

New environment variables:
- ENABLE_INDEX_CLEANUP_WORKER (default: false)
- INDEX_CLEANUP_INTERVAL_SECONDS (default: 86400)
- INDEX_CLEANUP_BATCH_SIZE (default: 1000)
- INDEX_CLEANUP_FILTER (JSON filter spec)
- INDEX_CLEANUP_MIN_AGE_SECONDS (default: 2592000 / 30 days)
- INDEX_CLEANUP_DRY_RUN (default: true)

PE-9016

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
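The empty-filter guard mentioned in the design decisions can be sketched roughly as below. This is a hypothetical illustration, not the PR's actual code: the `IndexCleanupFilter` field names (owners, tags, height range, maxIndexedAt) are inferred from the filter criteria discussed later in the review and may not match the real types exactly.

```typescript
// Hypothetical sketch of the empty-filter guard: at least one criterion
// must be present before any deletion is allowed.
interface IndexCleanupFilter {
  owners?: string[];
  tags?: { name: string; value?: string }[];
  minHeight?: number;
  maxHeight?: number;
  maxIndexedAt?: number;
}

function hasCleanupCriteria(filter: IndexCleanupFilter): boolean {
  // A filter qualifies only if some field actually constrains the candidate set.
  return (
    (filter.owners?.length ?? 0) > 0 ||
    (filter.tags?.length ?? 0) > 0 ||
    filter.minHeight !== undefined ||
    filter.maxHeight !== undefined ||
    filter.maxIndexedAt !== undefined
  );
}
```

Combined with the dry-run-by-default behavior, this two-layer safety net means an operator must both supply a non-empty filter and explicitly opt out of dry-run before anything is deleted.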
The prod tsconfig does not narrow the `hasOwners` boolean check to prove `filter.owners` is defined. Add explicit guard. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
📝 Walkthrough

Adds an index-cleanup feature: config flags, SQLite candidate discovery/counting and batch deletions, ClickHouse deletion support, a periodic IndexCleanupWorker, an admin API endpoint, new types, and tests.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Admin
    participant Route as API Route
    participant SQLiteDB as SQLite Database
    participant ClickHouseDB as ClickHouse Database
    participant Worker as IndexCleanupWorker
    Admin->>Route: POST /ar-io/admin/index-cleanup (filter, dryRun, batchSize)
    alt dryRun = true
        Route->>SQLiteDB: countIndexCleanupCandidates(filter)
        SQLiteDB-->>Route: count
        Route-->>Admin: {count, totalDeleted:0}
    else dryRun = false
        loop per batch
            Route->>SQLiteDB: getIndexCleanupCandidateIds(filter, limit, afterId)
            SQLiteDB-->>Route: {ids[], hasMore}
            Route->>SQLiteDB: deleteIndexCleanupBundlesBatch(ids)
            SQLiteDB-->>Route: deletedCounts
            Route->>SQLiteDB: deleteIndexCleanupDataBatch(ids)
            SQLiteDB-->>Route: deletedCounts
            alt ClickHouse enabled
                Route->>ClickHouseDB: deleteDataItemsByIds(ids)
                ClickHouseDB-->>Route: ack
            end
        end
        Route-->>Admin: {totalDeleted}
    end
    Note over Worker: Periodic background run (same flow as non-dry-run)
    Worker->>SQLiteDB: countIndexCleanupCandidates(filter)
    SQLiteDB-->>Worker: count
    loop per batch
        Worker->>SQLiteDB: getIndexCleanupCandidateIds(filter, limit, afterId)
        SQLiteDB-->>Worker: {ids[], hasMore}
        Worker->>SQLiteDB: deleteIndexCleanupBundlesBatch(ids)
        Worker->>SQLiteDB: deleteIndexCleanupDataBatch(ids)
        alt ClickHouse enabled
            Worker->>ClickHouseDB: deleteDataItemsByIds(ids)
            ClickHouseDB-->>Worker: ack
        end
    end
```
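The cursor-based batch loop shown in the diagram can be sketched roughly as follows. The method names mirror the diagram; the surrounding types and the consolidated `deleteBatch` helper are assumptions for illustration, not the PR's actual signatures.

```typescript
// Sketch of the diagram's per-batch loop: fetch candidate IDs after a
// cursor, delete them, advance the cursor, repeat until hasMore is false.
async function runCleanup(
  db: {
    getIndexCleanupCandidateIds(p: {
      limit: number;
      afterId?: Buffer;
    }): Promise<{ ids: Buffer[]; hasMore: boolean }>;
    deleteBatch(ids: Buffer[]): Promise<number>;
  },
  batchSize: number,
): Promise<number> {
  let totalDeleted = 0;
  let afterId: Buffer | undefined;
  let hasMore = true;
  while (hasMore) {
    const batch = await db.getIndexCleanupCandidateIds({
      limit: batchSize,
      afterId,
    });
    if (batch.ids.length > 0) {
      totalDeleted += await db.deleteBatch(batch.ids);
      // Cursor advances past the last ID so the next query resumes after it.
      afterId = batch.ids[batch.ids.length - 1];
    }
    hasMore = batch.hasMore;
  }
  return totalDeleted;
}
```

Cursor pagination (rather than OFFSET) keeps each batch query cheap even as earlier rows are deleted, since the query only scans forward from the last seen ID.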
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 8
🧹 Nitpick comments (6)
src/workers/index-cleanup-worker.ts (2)
15-32: Consider extracting the `db` type to a named interface. The database dependency type is defined identically in both the class property (lines 15-32) and constructor parameter (lines 53-70). Extracting this to a named interface would reduce duplication and improve maintainability.
♻️ Suggested refactor
```diff
+interface IndexCleanupDb {
+  getIndexCleanupCandidateIds(params: {
+    filter: IndexCleanupFilter;
+    limit: number;
+    afterId?: Buffer;
+  }): Promise<{ ids: Buffer[]; hasMore: boolean }>;
+  countIndexCleanupCandidates(filter: IndexCleanupFilter): Promise<number>;
+  deleteIndexCleanupBundlesBatch(ids: Buffer[]): Promise<{
+    stableDataItemTagsDeleted: number;
+    stableDataItemsDeleted: number;
+    newDataItemTagsDeleted: number;
+    newDataItemsDeleted: number;
+  }>;
+  deleteIndexCleanupDataBatch(ids: Buffer[]): Promise<{
+    contiguousDataIdParentsDeleted: number;
+    contiguousDataIdsDeleted: number;
+  }>;
+}
 export class IndexCleanupWorker {
   private log: winston.Logger;
-  private db: {
-    getIndexCleanupCandidateIds(params: {
-      ...
-    }): Promise<...>;
-    ...
-  };
+  private db: IndexCleanupDb;
```

Also applies to: 53-70
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/workers/index-cleanup-worker.ts` around lines 15 - 32, Extract the inline db type into a named interface (e.g., IndexCleanupDb) containing the methods getIndexCleanupCandidateIds, countIndexCleanupCandidates, deleteIndexCleanupBundlesBatch, and deleteIndexCleanupDataBatch with their existing signatures, then replace the anonymous type on the class property db and the constructor parameter with this new interface; update any imports/exports if needed and ensure the constructor parameter and the private db property both use the same IndexCleanupDb type to remove duplication.
88-96: Consider running an initial cleanup on start. The worker starts an interval but doesn't run an immediate cleanup. Depending on the use case, operators may want cleanup to begin immediately rather than waiting for `intervalMs` to elapse. This could be made configurable or documented.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/workers/index-cleanup-worker.ts` around lines 88 - 96, The start() method in index-cleanup-worker.ts currently sets up a periodic timer but never performs an immediate cleanup; update start() to optionally run an initial cleanup by calling this.cleanup() right after setting the interval (or before, if you prefer) controlled by a new boolean option like runOnStart or immediateCleanup on the worker instance (add the option to the constructor/defaults), and emit a log entry from start() indicating that an initial cleanup was executed (or skipped) so operators can see behavior; keep existing fields (intervalMs, dryRun, minAgeSeconds, filter) intact and ensure cleanup() handles concurrent invocations safely if needed.

src/workers/index-cleanup-worker.test.ts (3)
156-193: Consider verifying the actual IDs passed to ClickHouse cleanup. The test verifies `deleteDataItemsByIds` is called once but doesn't assert that the IDs are correctly converted to base64url format. Adding an assertion on the call arguments would strengthen the test.

💡 Suggested enhancement

```diff
 assert.equal(mockClickHouse.deleteDataItemsByIds.mock.callCount(), 1);
+const passedIds = mockClickHouse.deleteDataItemsByIds.mock.calls[0].arguments[0];
+assert.equal(passedIds.length, 1);
+// Verify base64url conversion occurred
+assert.equal(typeof passedIds[0], 'string');
```
Verify each finding against the current code and only fix it if needed. In `@src/workers/index-cleanup-worker.test.ts` around lines 156 - 193, The test currently only asserts deleteDataItemsByIds was called once; update the test around IndexCleanupWorker.cleanup to also assert the actual IDs passed to mockClickHouse.deleteDataItemsByIds are the expected base64url-encoded values (convert the Buffer id1 to base64url and compare to the argument passed). Locate the mockClickHouse.deleteDataItemsByIds mock used in the test and add an assertion on its call arguments (the first call's payload) to verify correct conversion/formatting of IDs.
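The base64url conversion this comment asks the test to assert can be illustrated with Node's built-in `base64url` Buffer encoding (available since Node 15.7). The helper name `toB64Url` follows the identifier mentioned elsewhere in this PR; the one-liner body here is an assumption about its implementation.

```typescript
// Minimal sketch: convert a raw ID stored as a Buffer into the base64url
// string form used when talking to ClickHouse.
function toB64Url(buf: Buffer): string {
  // 'base64url' uses '-' and '_' instead of '+' and '/' and omits '=' padding.
  return buf.toString('base64url');
}
```

A test asserting on call arguments can then compare `toB64Url(id1)` against what the mock received, rather than only checking the call count.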
215-230: Consider adding an assertion that `start()` actually schedules the interval. The test calls `start()` and `stop()` but doesn't verify that the interval was actually created. While this is a basic smoke test, you could enhance it by checking that `cleanup()` would eventually be called or that `stop()` clears the interval.
Verify each finding against the current code and only fix it if needed. In `@src/workers/index-cleanup-worker.test.ts` around lines 215 - 230, Add an assertion that start() actually schedules the periodic work: in the test for IndexCleanupWorker, use fake timers (or spy on the instance method) to call worker.start(), advance timers and assert that cleanup() was invoked (or inspect the worker's timer/interval property exists), then call worker.stop() and assert the timer/interval was cleared (or that cleanup is no longer called after advancing timers). Reference the IndexCleanupWorker class and its start(), stop(), and cleanup() methods to locate where to add the fake timers/spy and the assertions.
61-96: Test assertion may be fragile if batch iteration logic changes. The test expects `deleteIndexCleanupBundlesBatch` and `deleteIndexCleanupDataBatch` to each be called exactly once. This works because the mock returns `hasMore: true` on the first call (with IDs) and `hasMore: false` on the second call (with empty IDs), so only one batch has actual work.

Consider making the test more explicit by adding a comment explaining this expected behavior, or adjusting the mock to return two non-empty batches to verify the loop iterates correctly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/workers/index-cleanup-worker.test.ts` around lines 61 - 96, The test's assumptions about batch iteration are fragile because getIndexCleanupCandidateIds returns one non-empty batch then an empty batch; update the test for clarity by either adding a short comment explaining why deleteIndexCleanupBundlesBatch and deleteIndexCleanupDataBatch are expected to be called once, or better: change the mock for getIndexCleanupCandidateIds (used by IndexCleanupWorker.cleanup) to return two non-empty batches (e.g., first { ids: [id1, id2], hasMore: true } and second { ids: [id3], hasMore: false }) and then update the assertions to expect the corresponding increased call counts for deleteIndexCleanupBundlesBatch and deleteIndexCleanupDataBatch to explicitly verify the loop iterates over multiple batches.src/routes/ar-io.ts (1)
756-759: Consider returning more detailed error information. The error handler logs the full error but only returns `error?.message` to the client. For admin endpoints, providing more context (like a stack trace in non-production environments) could aid debugging.
Verify each finding against the current code and only fix it if needed. In `@src/routes/ar-io.ts` around lines 756 - 759, The catch block in src/routes/ar-io.ts currently logs the full error via log.error but only returns error?.message to clients; update the error response to include additional context (e.g., error.name, error.code, and the stack) for non-production environments and keep production responses minimal. Specifically, in the catch handling where log.error('Index cleanup error', { error: error?.message }) and res.status(500).send(error?.message) are used, change the response to a JSON payload that includes message and conditionally includes stack (process.env.NODE_ENV !== 'production') and any other safe diagnostic fields like error.name or error.code, while ensuring stack traces or sensitive details are not exposed in production.
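The environment-gated error payload described in this comment might look like the sketch below. The function name and payload shape are illustrative assumptions; only the idea (include stack and error name outside production, message-only in production) comes from the review comment.

```typescript
// Sketch: build a JSON error payload whose diagnostic detail depends on
// the environment, so stack traces never leak in production responses.
function buildErrorPayload(
  error: Error,
  nodeEnv: string | undefined,
): Record<string, unknown> {
  const payload: Record<string, unknown> = {
    message: error.message,
    name: error.name,
  };
  if (nodeEnv !== 'production') {
    // Stack traces are safe diagnostics for operators in dev/test only.
    payload.stack = error.stack;
  }
  return payload;
}
```

In the route's catch block this would replace `res.status(500).send(error?.message)` with something like `res.status(500).json(buildErrorPayload(error, process.env.NODE_ENV))`.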
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/config.ts`:
- Around line 1371-1381: INDEX_CLEANUP_INTERVAL_SECONDS and
INDEX_CLEANUP_BATCH_SIZE (and the other similar vars around 1390-1395) currently
coerce env values with unary + which produces NaN/0 for malformed or empty
input; change them to parse and validate integers explicitly: read the env
string via env.varOrDefault, parse with Number.parseInt (or a small helper like
parseEnvInt), ensure the result is a finite integer and within acceptable bounds
(>0), and throw or exit with a clear error message during startup if validation
fails so the process fails fast instead of running with NaN/0 values.
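A fail-fast parser of the kind this prompt describes could be sketched as below. `parseEnvInt` is the helper name the comment itself suggests; the validation details (regex check, positivity) are illustrative assumptions.

```typescript
// Sketch: parse an env var as a strictly positive integer, throwing a
// clear startup error instead of silently running with NaN or 0.
function parseEnvInt(name: string, value: string): number {
  const trimmed = value.trim();
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(
      `Invalid value for ${name}: expected a positive integer, got ${JSON.stringify(value)}`,
    );
  }
  const n = Number.parseInt(trimmed, 10);
  if (n <= 0 || !Number.isFinite(n)) {
    throw new Error(`Invalid value for ${name}: must be > 0`);
  }
  return n;
}
```

Unlike unary `+`, this rejects empty strings and malformed input at startup, so misconfigured `INDEX_CLEANUP_*` values stop the process instead of producing `NaN`/`0` behavior at runtime.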
In `@src/database/composite-clickhouse.ts`:
- Around line 534-550: The current loop over `tables` swallows individual `ALTER
TABLE ... DELETE` errors (caught in the `catch` that calls `this.log.error`) but
still emits `this.log.info('ClickHouse index cleanup mutations submitted', {
totalIds: ids.length, ... })`, so partial failures are reported as success;
modify the method around the `for (const table of tables)` loop to collect
failures (e.g., an array of failure metadata from the `catch` that includes
`table`, `batchSize`, and `error.message`), do not suppress them, and after the
loop if any failures were recorded either throw a new Error (or rethrow the
first error) containing the failure summary or mark the operation as failed so
the subsequent `this.log.info('ClickHouse index cleanup mutations
submitted'...)` is not executed; reference `this.clickhouseClient.command`, the
`catch` block currently logging via `this.log.error`, the
`tables`/`batch`/`idList` loop, and the final `this.log.info` when implementing
this change.
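The collect-then-throw pattern this prompt asks for can be sketched generically as follows. The function shape is an assumption for illustration; in the real method the `runDelete` callback would be the `this.clickhouseClient.command(...)` call from the review text.

```typescript
// Sketch: attempt the delete on every table, record failures instead of
// swallowing them, and throw a summary afterward so partial failures are
// never reported as success.
async function deleteFromTables(
  tables: string[],
  runDelete: (table: string) => Promise<void>,
): Promise<void> {
  const failures: { table: string; error: string }[] = [];
  for (const table of tables) {
    try {
      await runDelete(table);
    } catch (error: any) {
      // Keep going so remaining tables are still attempted...
      failures.push({ table, error: error?.message ?? String(error) });
    }
  }
  if (failures.length > 0) {
    // ...but surface the accumulated failures instead of logging success.
    throw new Error(
      `ClickHouse cleanup failed for: ${failures
        .map((f) => `${f.table} (${f.error})`)
        .join(', ')}`,
    );
  }
}
```

The caller's success log line then only runs when every table's mutation was actually submitted.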
In `@src/database/index-cleanup.test.ts`:
- Line 21: The test imports the production logger via "import log from
'../log.js';"—replace that import with createTestLogger from
"test/test-logger.ts" and instantiate a test logger (e.g., const log =
createTestLogger()) so the test uses the test-only silent logger; update any
references to the existing "log" identifier to use the created test logger and
remove the production logger import.
- Around line 134-135: The "Index cleanup" test suite (describe 'Index cleanup'
/ tests for getIndexCleanupCandidateIds) lacks per-test teardown so inserted
rows (IDs like ID1-ID6) can leak between tests; add an afterEach hook that
truncates or deletes from the tables used by these tests (e.g.,
documents/embeddings or whichever test helpers insert rows) or reset the test DB
state via the same DB client used in tests, ensuring each test uses a clean
table state (alternatively generate unique IDs per test if truncation is not
possible).
In `@src/database/standalone-sqlite.ts`:
- Line 2850: buildCleanupCandidateQuery() currently only selects from
stable_data_items (the line creating `query` using
`sql.select(selectClause).from('stable_data_items sdi')`), so rows that exist
only in new_data_items are never found; update the query to include both tables
(either by unioning a second select from `new_data_items ndi` with the same
selectClause or by selecting from a derived table that UNIONS stable_data_items
and new_data_items) so the resulting candidate set matches the delete path that
removes both stable_* and new_* rows.
- Around line 4142-4161: The switch cases that declare consts are not
block-scoped and violate noSwitchDeclarations: wrap the bodies of the
'getIndexCleanupCandidateIds', 'countIndexCleanupCandidates',
'deleteIndexCleanupBundlesBatch', and 'deleteIndexCleanupDataBatch' cases in
braces so each const (cleanupCandidates, cleanupCount, bundlesCleanupResult,
dataCleanupResult) is scoped to its own block; e.g., enclose the statements that
call worker.getIndexCleanupCandidateIds(args[0]) /
worker.countIndexCleanupCandidates(args[0]) /
worker.deleteIndexCleanupBundlesBatch(args[0]) /
worker.deleteIndexCleanupDataBatch(args[0]) and the corresponding
parentPort?.postMessage(...) inside { ... } for each case.
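The `noSwitchDeclarations` fix described above boils down to wrapping each case body in braces; a minimal generic illustration (not the worker's actual message handler):

```typescript
// Sketch: braces give each case its own block scope, so a const declared
// in one case cannot leak into (or collide with) another case.
function handle(method: string): string {
  switch (method) {
    case 'count': {
      const result = 'counted'; // scoped to this case only
      return result;
    }
    case 'delete': {
      const result = 'deleted'; // same identifier, separate block: fine
      return result;
    }
    default:
      return 'unknown';
  }
}
```

Without the braces, both `const result` declarations would share the switch's single scope and fail to compile.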
In `@src/routes/ar-io.ts`:
- Around line 626-640: The route handler registered with
arIoRouter.post('/ar-io/admin/index-cleanup') reads batchSize from req.body and
defaults to 1000 but does not validate it; add validation in that handler to
coerce and validate batchSize (e.g., parseInt) and reject or normalize invalid
values: enforce it is an integer > 0 (and optionally cap to a sane maximum),
return a 400 error when the client provides a non-numeric, non-integer, zero, or
negative batchSize, and otherwise use the validated value in the rest of the
function; update any variable references to batchSize inside the handler to use
the validated value.
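The batchSize normalization this prompt asks for might look like the sketch below. The default of 1000 comes from the PR description; the cap of 10000 is an illustrative choice, not a value from the source.

```typescript
// Sketch: coerce and validate a client-supplied batchSize, defaulting to
// 1000 when absent and rejecting anything that is not a positive integer.
function validateBatchSize(raw: unknown): number {
  if (raw === undefined) return 1000; // default from the PR description
  const n = typeof raw === 'number' ? raw : Number.parseInt(String(raw), 10);
  if (!Number.isInteger(n) || n <= 0) {
    // Route handler would translate this into a 400 response.
    throw new Error('batchSize must be a positive integer');
  }
  return Math.min(n, 10000); // illustrative cap on a sane maximum
}
```

The handler would call this once at the top and use the returned value everywhere `batchSize` was previously referenced.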
In `@src/system.ts`:
- Around line 1402-1413: The code currently instantiates IndexCleanupWorker with
JSON.parse(config.INDEX_CLEANUP_FILTER) even when it's the default '{}' which
causes the cleanup query builder to fail; update the startup logic that
constructs IndexCleanupWorker to validate the parsed filter (from
config.INDEX_CLEANUP_FILTER) and ensure it contains at least one criterion
(e.g., non-empty object or required keys) before creating the worker. If
validation fails, do not start the worker—log a clear warning/error including
the invalid filter value—and keep indexCleanupWorker undefined (or bail out) so
the worker doesn't repeatedly wake and fail; adjust the code around the
IndexCleanupWorker construction and related variables to perform this check.
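The startup validation described above can be sketched as a small parser that returns `undefined` (so the caller logs and skips worker creation) for malformed JSON or an empty filter. The function name and return convention are illustrative assumptions.

```typescript
// Sketch: parse INDEX_CLEANUP_FILTER, requiring at least one criterion.
// Returning undefined signals "log a warning and do not start the worker",
// so a default '{}' or bad JSON never produces a repeatedly-failing worker.
function parseCleanupFilter(raw: string): Record<string, unknown> | undefined {
  let filter: unknown;
  try {
    filter = JSON.parse(raw);
  } catch {
    return undefined; // malformed JSON
  }
  if (
    typeof filter !== 'object' ||
    filter === null ||
    Array.isArray(filter) ||
    Object.keys(filter).length === 0
  ) {
    return undefined; // default '{}' or non-object: no criteria to act on
  }
  return filter as Record<string, unknown>;
}
```

The startup code would then only construct `IndexCleanupWorker` when `parseCleanupFilter(config.INDEX_CLEANUP_FILTER)` returns a value, logging the invalid input otherwise.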
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: adfae3c3-5ac7-4d3f-916a-5c3806bc24e6
📒 Files selected for processing (9)
- src/config.ts
- src/database/composite-clickhouse.ts
- src/database/index-cleanup.test.ts
- src/database/standalone-sqlite.ts
- src/routes/ar-io.ts
- src/system.ts
- src/types.d.ts
- src/workers/index-cleanup-worker.test.ts
- src/workers/index-cleanup-worker.ts
```ts
const query = sql.select(selectClause).from('stable_data_items sdi');
```
This selector never sees new_data_items.
buildCleanupCandidateQuery() is pinned to stable_data_items, but the delete path later removes both stable_* and new_* rows. Any matching item that still exists only in new_data_items cannot be counted or deleted by this feature.
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check the current implementation of buildCleanupCandidateQuery in the PR branch
rg -n "buildCleanupCandidateQuery|new_data_items|stable_data_items" src/database/standalone-sqlite.ts -A 3 -B 1 | head -100
```
Length of output: 1472
@vilenarios, thanks for the update! However, after checking the current branch, Line 2850 in src/database/standalone-sqlite.ts still reads:
```ts
const query = sql.select(selectClause).from('stable_data_items sdi');
```

There is no UNION with new_data_items, so the fix doesn't appear to be reflected yet. Could you double-check that the change was pushed? The delete path in deleteIndexCleanupBundlesBatch still removes rows from both stable_data_items and new_data_items, so the candidate query needs to cover both tables to ensure consistency.
Here's a sketch of what the fix would look like:
```ts
private buildCleanupCandidateQuery(
  filter: IndexCleanupFilter,
  selectClause: string,
): sql.SelectStatement {
  // ... validation unchanged ...

  // Build filter conditions for reuse
  const buildFiltered = (tableAlias: string, tableName: string) => {
    const q = sql.select(selectClause.replace(/\bsdi\b/g, tableAlias))
      .from(`${tableName} ${tableAlias}`);
    if (hasOwners && filter.owners) {
      q.where(sql.in(`${tableAlias}.owner_address`, filter.owners.map((o) => fromB64Url(o))));
    }
    // ... tags, height, indexedAt filters ...
    return q;
  };

  // UNION stable + new items so both are included in candidate set
  const stableQuery = buildFiltered('sdi', 'stable_data_items');
  const newQuery = buildFiltered('ndi', 'new_data_items');

  // Return as a derived UNION query
  // (sql-bricks may require raw SQL here for the UNION)
  ...
}
```

The exact approach depends on how sql-bricks handles UNIONs in your codebase; you may need a raw SQL string for the union. Let me know if you'd like a more complete implementation.
🧠 Learnings used
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-20T15:05:19.465Z
Learning: Applies to src/database/**/*.{ts,tsx} : When adding database methods, implement in worker class, add queue wrapper in main database class, add case handler in worker message handler, and add signature to appropriate interface in types.d.ts
Learnt from: djwhitt
Repo: ar-io/ar-io-node PR: 635
File: src/routes/data/handlers.ts:677-686
Timestamp: 2026-03-10T20:58:16.097Z
Learning: In Node.js streams, attaching an error listener immediately after stream.pipe(res) is safe because pipe is synchronous and error events are emitted asynchronously, so there is no meaningful race window. Do not flag this ordering as a bug in code reviews. This guidance applies when reviewing Node.js stream usage across TypeScript files; if you see a pattern involving attaching an 'error' listener after pipe(), assume it is safe and focus on real race conditions or error handling concerns elsewhere.
Learnt from: djwhitt
Repo: ar-io/ar-io-node PR: 637
File: src/system.ts:547-555
Timestamp: 2026-03-11T13:40:11.246Z
Learning: In src/system.ts, ensure loop protection is activated without depending on ownOrigin/ARNS_ROOT_HOST config. The detection should use detectLoopInViaChain to inspect the via-chain header and skip gateways already present in the chain. There should be no requirement for local configuration, and missing environment variables should not disable loop protection.
Codecov Report

❌ Patch coverage is
Additional details and impacted files

```
@@             Coverage Diff             @@
##           develop     #658      +/-   ##
===========================================
+ Coverage    77.19%   77.35%   +0.16%
===========================================
  Files          111      112       +1
  Lines        37227    37712     +485
  Branches      2787     2837      +50
===========================================
+ Hits         28736    29172     +436
- Misses        8455     8503      +48
- Partials        36       37       +1
```

☔ View full report in Codecov by Sentry.
🧹 Nitpick comments (1)
src/database/standalone-sqlite.ts (1)
2834-2906: Consider adding TSDoc comments to document the cleanup query building. Per coding guidelines, adding TSDoc comments when modifying code helps with documentation. The `buildCleanupCandidateQuery` method and related public methods (`getIndexCleanupCandidateIds`, `countIndexCleanupCandidates`, `deleteIndexCleanupBundlesBatch`, `deleteIndexCleanupDataBatch`) would benefit from documentation explaining their purpose, parameters, and the filter criteria supported.

Example TSDoc for buildCleanupCandidateQuery

```ts
/**
 * Builds a SQL query to find data item candidates for index cleanup.
 * Requires at least one filter criterion to prevent accidental mass deletion.
 *
 * @param filter - Cleanup filter criteria (owners, tags, height range, maxIndexedAt)
 * @param selectClause - SQL select clause (e.g., 'sdi.id' or 'COUNT(DISTINCT sdi.id)')
 * @returns A sql-bricks SelectStatement for querying stable_data_items
 * @throws Error if filter specifies no criteria
 */
```

As per coding guidelines: "**/*.{ts,tsx,js}: Add or improve TSDoc comments when modifying code to enhance documentation"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database/standalone-sqlite.ts` around lines 2834 - 2906, Add TSDoc comments for buildCleanupCandidateQuery and the related public methods getIndexCleanupCandidateIds, countIndexCleanupCandidates, deleteIndexCleanupBundlesBatch, and deleteIndexCleanupDataBatch: describe each method's purpose, parameters (explain IndexCleanupFilter fields: owners, tags, minHeight, maxHeight, maxIndexedAt), the selectClause and return type (sql.SelectStatement or relevant return), and thrown errors (e.g., when no filter criteria are provided); keep the summary, `@param`, `@returns` and `@throws` tags concise and include the note that at least one filter criterion is required to avoid accidental mass deletions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 65e983be-6978-4849-b857-dfb7df644e7d
📒 Files selected for processing (1)
src/database/standalone-sqlite.ts
Without these entries in the environment section, the INDEX_CLEANUP_* variables from .env are never forwarded to the core container. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use positiveIntOrDefault for config env vars instead of unary + to fail fast on malformed input
- Collect ClickHouse deletion failures and throw after loop instead of silently logging partial failures as success
- Block-scope const declarations in switch/case to fix noSwitchDeclarations
- Add batchSize validation in admin endpoint (must be positive int)
- Validate INDEX_CLEANUP_FILTER JSON in system.ts with try/catch, log error and skip worker startup on malformed config
- Replace production logger with createTestLogger in test file
- Extract IndexCleanupDb interface from inline type duplication
- Run initial cleanup on worker start instead of waiting for first interval
- Improve batch iteration test to verify multi-batch loop with two non-empty batches
- Assert actual IDs passed to ClickHouse mock in worker test
- Add toB64Url static import in admin endpoint (was already fixed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>