CMR-11070: Fix bug when finalizing rebalancing back to small collections, fix in-memory db to allow rebalancing #2384

Merged
zimzoom merged 4 commits into master from CMR-11070 on Feb 20, 2026

Conversation

zimzoom (Contributor) commented Feb 13, 2026

Overview

What is the objective?

Fixes a bug where, when a granule index is rebalanced back into the small collections index, the old separate granule index is not deleted.

What are the changes?

  • Updated the remove-granule-index-from-index-set function to properly remove the old index from the index set.
  • Added a final step to the finalize-collection-rebalancing function that deletes the old index when the "target" is the small collections index. Although the ticket describes this as occurring when the old granule index was resharded, the index was not being deleted regardless of resharding history, which is why the logic has been added here as a new block.
  • As a supporting change, added a missing protocol extension for the in-memory database used for local development; its absence was preventing rebalancing from running locally.
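
To make the first change concrete, here is a minimal sketch of what removing a granule index from an index set might look like. It is not the code from this PR: the function name `remove-granule-index` is a hypothetical stand-in, and the index-set shape (an `:indexes` vector under `[:index-set :granule]` plus a concept-id to index-name map under `[:index-set :concepts :granule]`) is assumed from the test fixtures quoted later in this conversation.

```clojure
(defn remove-granule-index
  "Hypothetical stand-in for remove-granule-index-from-index-set.
  Returns the index set with the separate index for concept-id removed
  from both the :indexes list and the :concepts mapping."
  [index-set concept-id]
  (let [concept-key (keyword concept-id)
        index-name (get-in index-set [:index-set :concepts :granule concept-key])]
    (if index-name
      (-> index-set
          ;; Step 1: drop the index definition whose :name matches.
          (update-in [:index-set :granule :indexes]
                     (fn [indexes]
                       (vec (remove #(= index-name (:name %)) indexes))))
          ;; Step 2: drop the concept-id -> index-name mapping.
          (update-in [:index-set :concepts :granule] dissoc concept-key))
      ;; Assumption: an unknown concept-id is treated as a no-op.
      index-set)))
```

Returning the updated index set (rather than only a fragment of it) lets the caller pass the result straight on to validation and update steps.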

What areas of the application does this impact?

indexer-app

Required Checklist

  • [x] New and existing unit and int tests pass locally and remotely
  • [x] clj-kondo has been run locally and all errors in changed files are corrected (note: did not correct previously existing unused-binding errors in memory_db because doing so would make the file harder to read)
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made changes to the documentation (if necessary)
  • [x] My changes generate no new warnings

Additional Checklist

  • [ ] I have removed unnecessary/dead code and imports in files I have changed
  • [ ] I have cleaned up integration tests by doing one or more of the following:
    • migrated any are2 tests to are3 in files I have changed
    • de-duped, consolidated, removed dead int tests
    • transformed applicable int tests into unit tests
    • reduced number of system state resets by updating fixtures. Ex) (use-fixtures :each (ingest/reset-fixture {})) to be :once instead of :each

Summary by CodeRabbit

  • New Features

    • Added a public batched in-memory concept search API for efficient bulk retrieval.
  • Tests

    • Added a unit test verifying correct removal of a granule index from index sets.
  • Chores

    • Formatting, naming and structural refinements across index-management flows.
    • Improved error handling, sequencing and store interactions in rebalancing/resharding/reset paths.
    • Minor log/message tweaks; no breaking public-interface changes.

coderabbitai bot commented Feb 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Formatting and local-binding refactors across indexer resharding/rebalancing/reset flows; added a public in-memory helper find-concepts-in-batches that returns lazy annotated concept batches; and a unit test verifying granule-index removal behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Indexer resharding / cleanup: `indexer-app/src/cmr/indexer/services/index_set_service.clj` | Reflowed formatting and whitespace; renamed local bindings; changed remove-granule-index-from-index-set to a two-step update returning the updated index-set and updating both indexes and concepts; added explicit ES-store usage in finalize paths and minor log/message tweaks. No public arity/signature changes. |
| In-memory batched search: `metadata-db-app/src/cmr/metadata_db/data/memory_db.clj` | Added public find-concepts-in-batches (two arities) that yields lazy batches of concepts annotated with provider-id; registered it in concept-search-behaviour; included in-file demo/test usage. |
| Tests: `indexer-app/test/cmr/indexer/test/services/index_set_service_test.clj` | Added test-remove-granule-index-from-index-set to assert removal from indexes and concepts and validate returned index-set structure. |

Sequence Diagram(s)

(Skipped — changes are refactors, a single-component in-memory helper, and a unit test; no multi-component sequential flow to visualize.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

hacktoberfest-accepted

Suggested reviewers

  • eereiter
  • daniel-zamora
  • jceaser

Poem

🐰 I hopped through bindings, tidy and bright,
Bunched concepts in batches, gentle as night.
Granule traces swept from their nest,
Tests gave a nod — the index is blessed.
🥕 A tiny hop to celebrate the light.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main bug fix (granule index deletion when rebalancing back to small collections) and mentions the supporting in-memory database fix. |
| Description check | ✅ Passed | The description covers the Overview section with clear objectives and changes, identifies impacted areas, and includes a mostly-completed Required Checklist; however, documentation and dead-code cleanup items remain unchecked. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

zimzoom marked this pull request as draft February 13, 2026 19:57

coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@indexer-app/src/cmr/indexer/services/index_set_service.clj`:
- Line 514: Remove the stray debug println call `(println
"~~~~~~~coll-base-name: " coll-base-name)` from index_set_service.clj; if you
need runtime visibility of coll-base-name, replace it with a structured logger
call such as `(log/info ...)` or `(log/debug ...)` (or the project's logging fn)
within the same function/scope where `coll-base-name` is bound so observability
uses the app's logging framework instead of println.
- Around line 552-590: The finalize-collection-rebalancing function may call
es/delete-index with a nil old-separate-index; before calling es/delete-index in
the when (= "small-collections" target) branch, guard that old-separate-index is
non-nil (and optionally non-empty) and only call es/delete-index when it exists;
reference the local old-separate-index (from get-in old-gran-index-set
[:index-set :concepts :granule (keyword concept-id)]) and the es/delete-index
call to locate and protect the delete invocation.

In `@metadata-db-app/src/cmr/metadata_db/data/memory_db.clj`:
- Around line 285-305: The batch finder is sorting by a non-existent :id key
causing non-deterministic ordering; update the sort in find-concepts-in-batches
to use :concept-id instead of :id (i.e., change the sort-by call that produces
sorted-concepts to sort by :concept-id) so batches are deterministic and match
the Oracle behavior, leaving the rest of the lazy-find/batch logic unchanged and
still associating :provider-id onto each returned concept.
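
The nil guard requested for finalize-collection-rebalancing could be sketched roughly as follows. This is an illustration only: `delete-old-separate-index!` and its `delete-index-fn` parameter are hypothetical names, with `delete-index-fn` standing in for whatever call actually deletes the index (the review refers to es/delete-index), so that the sketch runs without an Elasticsearch store.

```clojure
(defn delete-old-separate-index!
  "Hypothetical helper: deletes the collection's old separate granule index,
  but only when rebalancing back to small collections and only when the
  old index set actually records an index for this collection."
  [delete-index-fn old-gran-index-set concept-id target]
  (let [old-separate-index (get-in old-gran-index-set
                                   [:index-set :concepts :granule (keyword concept-id)])]
    (when (and (= "small-collections" target)
               ;; Guard: never hand nil to the delete call.
               (some? old-separate-index))
      (delete-index-fn old-separate-index))))
```

Passing the delete function in as an argument is a choice made for this sketch; it keeps the guard testable in isolation.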
🧹 Nitpick comments (1)
metadata-db-app/src/cmr/metadata_db/data/memory_db.clj (1)

699-719: Test/demo code in the comment block is fine for REPL use but note it won't work as-is.

The test snippet calls save-concept with a provider map as the second argument (line 706), but save-concept (line 452) expects the second arg to be a provider and then validates concept fields like :revision-id (via the :pre assertion). The test concepts also lack :metadata and other fields that after-save or downstream logic may expect. This is minor since comment blocks are never evaluated, but it could mislead someone trying to use it at the REPL.

zimzoom force-pushed the CMR-11070 branch 2 times, most recently from b492092 to 383c4f4 on February 13, 2026 20:05

zimzoom (Contributor, Author) commented Feb 13, 2026

Adding/testing CodeRabbit suggestions, please hold.

zimzoom (Contributor, Author) commented Feb 13, 2026

Note that the suite of system int tests for testing rebalancing collections BACK to small collections has been commented out and marked as non-operational as of some time prior to this ticket, which is why that has not been updated here. (larger effort/out of scope)

;; Update the index set. This will create the new collection indexes as needed.
(validate-requested-index-set context es-config/gran-elastic-name gran-index-set true)
(update-index-set context es-config/gran-elastic-name (util/remove-nils-empty-maps-seqs gran-index-set))))
new-gran-index-set (as-> old-gran-index-set index-set
zimzoom (Contributor, Author) commented Feb 13, 2026

All these lines for defining this var new-gran-index-set actually remain the same, except the names in this first line were updated to have the new- and old- prefixes.

(if (= "small-collections" target)
(remove-granule-index-from-index-set index-set concept-id)
index-set))
es-store (indexer-util/context->es-store context es-config/gran-elastic-name)]
Contributor Author

Added es-store because it is needed for deleting indices.

es-store (indexer-util/context->es-store context es-config/gran-elastic-name)]
(try
;; Update the index set. This will create the new collection indexes as needed.
(validate-requested-index-set context es-config/gran-elastic-name new-gran-index-set true)
Contributor Author

These two forms, (validate-requested-index-set ...) and (update-index-set ...) below, remain the same; they have just been wrapped in try.

(update-index-set context es-config/gran-elastic-name (util/remove-nils-empty-maps-seqs new-gran-index-set))

;; Delete the separate index for this collection when moving back into small collections index
(when (= "small-collections" target)
Contributor Author

This when is the new logic for deleting the old index.

(println "Batch:")
(doseq [concept batch]
(println (select-keys concept [:concept-id :native-id]))) (println))
)
Contributor Author

My LSP did not like this but I left it because I could have sworn that rich comments are the one time you are supposed to have the ending paren on its own line, for IDE line-wise selection.

jceaser (Contributor) commented Feb 17, 2026

as a lover of:
{
}
I always hated putting ) with all the others, it makes it so hard to figure out how to add/remove things.

What's up with the floating println? I understand you're just writing a new line after a loop, but it is just hanging out there.

Contributor Author

I am not sure if you are asking what's up with its existence or with its formatting, so:
it exists to make everything easier to read when there are many loops,
and its formatting is interesting; that was the LSP autoformatter (which is the official Clojure one). I had it on its own line and it got moved up to the same line as the previous form with "that number of spaces" buffer. I think it basically detected that it was printing nothing and assumed that logically it should go with the previous form for readability? Never seen that before.

Contributor Author

I am not bothered so much by putting ) with all the others because putting them on their own line would be way way way too cluttered for Lisps, and because it's expected that the IDE has form-wise selection. But in the case of rich comments I could have sworn it was supposed to be on its own line ... IDK, maybe they changed it but I have a feeling it might be a missing LSP feature. Conceptually it also makes it more readable because rich comments are the one time when you should not interpret the entire form as one thing, some pieces of it might execute, others might need something else to resolve, etc.

zimzoom marked this pull request as ready for review February 13, 2026 20:53

codecov-commenter commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 22.58065% with 72 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.92%. Comparing base (b72194d) to head (fc001f7).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...app/src/cmr/indexer/services/index_set_service.clj | 25.33% | 56 Missing ⚠️ |
| ...data-db-app/src/cmr/metadata_db/data/memory_db.clj | 11.11% | 16 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2384      +/-   ##
==========================================
- Coverage   57.93%   57.92%   -0.01%     
==========================================
  Files        1066     1066              
  Lines       73217    73249      +32     
  Branches     2120     2119       -1     
==========================================
+ Hits        42418    42433      +15     
- Misses      28824    28840      +16     
- Partials     1975     1976       +1     

filtered-concepts (concepts/search-with-params concepts (assoc params :provider-id provider-id))
sorted-concepts (vec (sort-by :concept-id filtered-concepts))
start-index (max requested-start-index 0)]
(letfn [(find-batch [start-index]
Contributor

Did AI suggest letfn? I seem to be seeing it more often these days.

Contributor Author

Yes. I think it's neat and useful

jceaser (Contributor) left a comment

minor questions I feel you can handle.

daniel-zamora (Contributor) commented

> Note that the suite of system int tests for testing rebalancing collections BACK to small collections has been commented out and marked as non-operational as of some time prior to this ticket, which is why that has not been updated here. (larger effort/out of scope)

Is it possible to add some smaller unit tests, especially for some of the error cases?

(when (< start-index (count sorted-concepts))
(let [batch (find-batch start-index)]
(if (empty? batch)
nil
Contributor

if blocks with nils should be converted to whens
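
As a small illustration of this suggestion, the two functions below are intended to behave identically; the names `batches-with-if` and `batches-with-when` are made up for the comparison and do not appear in the PR. `when` returns nil when its test fails, so the explicit nil branch becomes unnecessary, and `(seq batch)` replaces the negated `empty?` check.

```clojure
(defn batches-with-if
  "Original style: an if whose 'then' branch is an explicit nil."
  [batch more]
  (if (empty? batch)
    nil
    (cons batch more)))

(defn batches-with-when
  "Suggested style: when yields nil implicitly if the batch is empty."
  [batch more]
  (when (seq batch)
    (cons batch more)))
```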

coderabbitai bot left a comment

🧹 Nitpick comments (2)
indexer-app/test/cmr/indexer/test/services/index_set_service_test.clj (2)

502-509: Third assertion is redundant — consider replacing the two path checks with it.

"returns expected index-set structure" at line 508 is a strict superset of the two individual path assertions above it: if (= expected-index-set result) passes, the sub-path checks trivially pass as well. Either keep only the full-structure assertion (simpler), or drop it and keep only the targeted path assertions (easier to diagnose partial failures). Having all three adds noise without additional coverage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@indexer-app/test/cmr/indexer/test/services/index_set_service_test.clj` around
lines 502 - 509, The two narrow assertions using (get-in result [:index-set
:granule :indexes]) and (get-in result [:index-set :concepts :granule]) are
redundant given the full equality check (is (= expected-index-set result));
remove the two specific (testing ...) blocks and keep only the full-structure
assertion that compares result to expected-index-set so the test is simpler and
non-duplicative (referencing symbols: result, expected-index-set, get-in,
:index-set :granule :indexes, :index-set :concepts :granule).

486-509: Add edge-case tests: selective removal, absent concept-id, and field preservation.

The current test only exercises the happy path with a single-index fixture. Three important cases are uncovered, which is especially relevant given the system integration tests for rebalancing back to small-collections remain commented out:

  1. Selective removal – when multiple indexes exist in :granule :indexes, only the target should be removed; verify the remaining indexes are untouched.
  2. Absent concept-id – calling the function when collection-concept-id is not present in :concepts :granule (verify graceful/no-op behaviour rather than a silent corruption or exception).
  3. Sibling field preservation – when :granule also contains :rebalancing-collections / :resharding-indexes etc., those fields must survive the removal unmodified.
🧪 Suggested additional test cases
+ (testing "does not remove other indexes when multiple are present"
+   (let [other-index "1_c9999_other"
+         multi-index-set {:index-set
+                          {:granule
+                           {:indexes [{:name index-name :number_of_shards 5}
+                                      {:name other-index :number_of_shards 3}]}
+                           :concepts
+                           {:granule {(keyword collection-concept-id) index-name
+                                      (keyword "C9999-OTHER") other-index}}}}
+         result (#'svc/remove-granule-index-from-index-set multi-index-set collection-concept-id)]
+     (is (= [{:name other-index :number_of_shards 3}]
+            (get-in result [:index-set :granule :indexes])))
+     (is (= {(keyword "C9999-OTHER") other-index}
+            (get-in result [:index-set :concepts :granule])))))

+ (testing "preserves unrelated granule fields (e.g. rebalancing-collections)"
+   (let [index-set-with-extras {:index-set
+                                {:granule
+                                 {:indexes [{:name index-name :number_of_shards 5}]
+                                  :rebalancing-collections ["C9999-OTHER"]}
+                                 :concepts
+                                 {:granule {(keyword collection-concept-id) index-name}}}}
+         result (#'svc/remove-granule-index-from-index-set index-set-with-extras collection-concept-id)]
+     (is (= ["C9999-OTHER"]
+            (get-in result [:index-set :granule :rebalancing-collections])))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@indexer-app/test/cmr/indexer/test/services/index_set_service_test.clj` around
lines 486 - 509, Add three new tests for
svc/remove-granule-index-from-index-set: (1) "selective removal" — craft an
initial-index-set where :index-set :granule :indexes contains multiple index
maps including the target (index-name) and assert only that index is removed
while others remain unchanged; (2) "absent concept-id" — call
remove-granule-index-from-index-set with a collection-concept-id not present
under :index-set :concepts :granule and assert the result is a no-op (indexes
and other concept entries unchanged, no exception); (3) "sibling field
preservation" — include extra keys under :index-set :granule (e.g.,
:rebalancing-collections, :resharding-indexes) and assert those keys and their
values are preserved after removal; reference
svc/remove-granule-index-from-index-set and use get-in on [:index-set :granule
:indexes], [:index-set :concepts :granule], and the sibling keys to make your
assertions.

coderabbitai bot left a comment

🧹 Nitpick comments (1)
metadata-db-app/src/cmr/metadata_db/data/memory_db.clj (1)

284-303: Replace (not (empty? batch)) with (seq batch) and add batch-size validation.

Line 301 uses the anti-pattern (not (empty? batch)), which is non-idiomatic Clojure. Use (seq batch) instead. Additionally, the function lacks a guard against invalid batch-size values: a zero value causes an infinite loop (batch never advances), and negative values cause subvec to fail. Add a precondition {:pre [(pos? batch-size)]} to enforce this.

Suggested changes
-  ([db provider params batch-size requested-start-index]
+  ([db provider params batch-size requested-start-index]
+   {:pre [(pos? batch-size)]}
    (let [{:keys [concept-type]} params
          provider-id (:provider-id provider)
          concepts @(:concepts-atom db)
          filtered-concepts (concepts/search-with-params concepts (assoc params :provider-id provider-id))
          sorted-concepts (vec (sort-by :concept-id filtered-concepts))
          start-index (max requested-start-index 0)]
      (letfn [(find-batch [start-index]
                (let [end-index (min (+ start-index batch-size) (count sorted-concepts))
                      batch (subvec sorted-concepts start-index end-index)]
                  (mapv #(assoc % :provider-id provider-id) batch)))
              (lazy-find [start-index]
                (when (< start-index (count sorted-concepts))
-                 (let [batch (find-batch start-index)]
-                   (when (not (empty? batch))
-                     (cons batch (lazy-seq (lazy-find (+ start-index batch-size))))))))]
+                 (let [batch (find-batch start-index)]
+                   (when (seq batch)
+                     (cons batch (lazy-seq (lazy-find (+ start-index batch-size))))))))]
        (lazy-find start-index)))))
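
For readers who want to try the batching idea at a REPL without the in-memory database, here is a self-contained sketch of the same pattern. It is a simplification under stated assumptions, not the function from memory_db.clj: `concept-batches` is a hypothetical name, it takes a plain collection of maps instead of a db and provider, and it omits the search-params filtering and the provider-id annotation. It does keep the deterministic sort on :concept-id, the positive batch-size precondition, and the lazy batch sequence.

```clojure
(defn concept-batches
  "Hypothetical simplified batcher: returns a lazy sequence of vectors of
  at most batch-size concepts, ordered by :concept-id, starting at
  requested-start-index (negative values are clamped to 0)."
  [concepts batch-size requested-start-index]
  {:pre [(pos? batch-size)]}
  (let [sorted-concepts (vec (sort-by :concept-id concepts))
        total (count sorted-concepts)]
    (letfn [(lazy-find [start-index]
              (lazy-seq
               (when (< start-index total)
                 (let [end-index (min (+ start-index batch-size) total)]
                   ;; subvec bounds are valid because batch-size is positive
                   ;; and end-index is capped at total.
                   (cons (subvec sorted-concepts start-index end-index)
                         (lazy-find end-index))))))]
      (lazy-find (max requested-start-index 0)))))
```

A possible REPL check: `(concept-batches [{:concept-id "C3"} {:concept-id "C1"} {:concept-id "C2"}] 2 0)` should yield two batches, the first holding C1 and C2 and the second holding C3.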
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@metadata-db-app/src/cmr/metadata_db/data/memory_db.clj` around lines 284 -
303, The function find-concepts-in-batches should validate batch-size and use
idiomatic emptiness checks: add a precondition {:pre [(pos? batch-size)]} to the
find-concepts-in-batches arity that accepts requested-start-index to prevent
zero/negative sizes, and in the lazy-find helper replace (not (empty? batch))
with (seq batch); this touches the find-batch / lazy-find logic and ensures
batch-size advances and subvec is never called with invalid bounds.

zimzoom merged commit 80dd520 into master Feb 20, 2026
7 checks passed
4 participants