
copy bftrees from the snapshot location to the save location#783

Open
backurs wants to merge 1 commit into main from arturs/copy_bftrees

backurs commented Feb 16, 2026

This is a small PR that makes sure we copy the on-disk bftree index files from the snapshot() location to the location specified when saving the bf-tree-based index.

Copilot AI left a comment


Pull request overview

This PR updates bf-tree index persistence so that when saving an on-disk bf-tree-based index, the generated snapshot files are copied from the bf-tree’s internal snapshot location to the save prefix location provided to save_with().

Changes:

  • Change bf-tree provider snapshot() helpers to return the snapshot PathBuf.
  • Update BfTreeProvider::save_with() to copy vector/neighbor/(quant) .bftree snapshot files to the target prefix paths when they differ.
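
The copy-when-different step described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: the helper name `copy_if_different` is hypothetical, and it returns a plain `io::Result` rather than mapping errors through `ANNError`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: copies `snapshot` to `target` unless both paths
/// compare equal. Returns the number of bytes copied, or `None` when the
/// copy was skipped because the snapshot is already at the target path.
fn copy_if_different(snapshot: &Path, target: &Path) -> io::Result<Option<u64>> {
    if snapshot == target {
        // Same path: the snapshot was written directly to the save location.
        return Ok(None);
    }
    // `fs::copy` overwrites `target` if it already exists.
    fs::copy(snapshot, target).map(Some)
}
```

Note that `Path` equality compares path components, not filesystem identity, so two different spellings of the same file (for example a relative and an absolute path) are treated as different locations.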

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File | Description
diskann-providers/src/model/graph/provider/async_/bf_tree/vector_provider.rs | Make snapshot() return the underlying bf-tree snapshot path (PathBuf).
diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs | Make snapshot() return the underlying bf-tree snapshot path (PathBuf).
diskann-providers/src/model/graph/provider/async_/bf_tree/neighbor_provider.rs | Make snapshot() return the underlying bf-tree snapshot path (PathBuf).
diskann-providers/src/model/graph/provider/async_/bf_tree/provider.rs | Copy .bftree snapshot outputs to the save prefix paths during save_with().


Comment on lines +2065 to +2088
    std::fs::copy(&vectors_snapshot_path, &target_vectors_path).map_err(|e| {
        ANNError::log_index_error(format!(
            "Failed to copy vectors from {:?} to {:?}: {}",
            vectors_snapshot_path, target_vectors_path, e
        ))
    })?;
}
let target_neighbors_path = BfTreePaths::neighbors_bftree(&saved_params.prefix);
if neighbors_snapshot_path != target_neighbors_path {
    std::fs::copy(&neighbors_snapshot_path, &target_neighbors_path).map_err(|e| {
        ANNError::log_index_error(format!(
            "Failed to copy neighbors from {:?} to {:?}: {}",
            neighbors_snapshot_path, target_neighbors_path, e
        ))
    })?;
}
let target_quant_path = BfTreePaths::quant_bftree(&saved_params.prefix);
if quant_snapshot_path != target_quant_path {
    std::fs::copy(&quant_snapshot_path, &target_quant_path).map_err(|e| {
        ANNError::log_index_error(format!(
            "Failed to copy quant from {:?} to {:?}: {}",
            quant_snapshot_path, target_quant_path, e
        ))
    })?;

Copilot AI Feb 16, 2026

Same performance concern in the quantized save_with: these std::fs::copy calls can block the async executor and double the amount of IO for large bf-tree files. Consider offloading to a blocking task or using an async copy primitive.

Suggested change
    tokio::fs::copy(&vectors_snapshot_path, &target_vectors_path)
        .await
        .map_err(|e| {
            ANNError::log_index_error(format!(
                "Failed to copy vectors from {:?} to {:?}: {}",
                vectors_snapshot_path, target_vectors_path, e
            ))
        })?;
}
let target_neighbors_path = BfTreePaths::neighbors_bftree(&saved_params.prefix);
if neighbors_snapshot_path != target_neighbors_path {
    tokio::fs::copy(&neighbors_snapshot_path, &target_neighbors_path)
        .await
        .map_err(|e| {
            ANNError::log_index_error(format!(
                "Failed to copy neighbors from {:?} to {:?}: {}",
                neighbors_snapshot_path, target_neighbors_path, e
            ))
        })?;
}
let target_quant_path = BfTreePaths::quant_bftree(&saved_params.prefix);
if quant_snapshot_path != target_quant_path {
    tokio::fs::copy(&quant_snapshot_path, &target_quant_path)
        .await
        .map_err(|e| {
            ANNError::log_index_error(format!(
                "Failed to copy quant from {:?} to {:?}: {}",
                quant_snapshot_path, target_quant_path, e
            ))
        })?;

Comment on lines +1903 to +1911
// Copy snapshot files to the target prefix location if they differ
let target_vectors_path = BfTreePaths::vectors_bftree(&saved_params.prefix);
if vectors_snapshot_path != target_vectors_path {
    std::fs::copy(&vectors_snapshot_path, &target_vectors_path).map_err(|e| {
        ANNError::log_index_error(format!(
            "Failed to copy vectors from {:?} to {:?}: {}",
            vectors_snapshot_path, target_vectors_path, e
        ))
    })?;

Copilot AI Feb 16, 2026

save_with takes a StorageWriteProvider, but these new .bftree copies bypass it by calling std::fs::copy directly. If a non-filesystem storage provider is used (e.g., virtual/in-memory or remote), the params, delete, and PQ files will be written to storage while the bf-tree files are written to the local filesystem, yielding an incomplete or incorrect save. Consider copying via StorageReadProvider/StorageWriteProvider streams (read the snapshot file, write to storage.create_for_write), or documenting and enforcing that this SaveWith impl only supports filesystem-backed storage.
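
As a rough sketch of the stream-based alternative, the snippet below copies snapshot bytes through a writer obtained from a storage abstraction instead of calling std::fs::copy. The `WriteStorage` trait, its `create_for_write` signature, and `InMemoryStorage` are hypothetical stand-ins defined here for illustration only; the actual StorageWriteProvider interface may look different.

```rust
use std::collections::HashMap;
use std::io::{self, Read, Write};

/// Hypothetical stand-in for a write-capable storage backend.
/// The real StorageWriteProvider trait may have a different shape.
trait WriteStorage {
    fn create_for_write<'a>(&'a mut self, name: &str) -> io::Result<Box<dyn Write + 'a>>;
}

/// Illustrative in-memory backend: each "file" is a byte buffer keyed by name.
#[derive(Default)]
struct InMemoryStorage {
    files: HashMap<String, Vec<u8>>,
}

impl WriteStorage for InMemoryStorage {
    fn create_for_write<'a>(&'a mut self, name: &str) -> io::Result<Box<dyn Write + 'a>> {
        let buf = self.files.entry(name.to_owned()).or_default();
        buf.clear(); // start from an empty file, like create/truncate
        Ok(Box::new(buf))
    }
}

/// Streams the snapshot bytes into the storage backend under `target`
/// and returns the number of bytes written.
fn copy_snapshot_to_storage<R: Read, S: WriteStorage>(
    mut snapshot: R,
    storage: &mut S,
    target: &str,
) -> io::Result<u64> {
    let mut writer = storage.create_for_write(target)?;
    let copied = io::copy(&mut snapshot, &mut writer)?;
    writer.flush()?;
    Ok(copied)
}
```

In this shape the snapshot would be opened as a reader (for example a `std::fs::File`) and passed in, so the destination is decided entirely by the storage backend.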

Comment on lines +1906 to +1920
    std::fs::copy(&vectors_snapshot_path, &target_vectors_path).map_err(|e| {
        ANNError::log_index_error(format!(
            "Failed to copy vectors from {:?} to {:?}: {}",
            vectors_snapshot_path, target_vectors_path, e
        ))
    })?;
}
let target_neighbors_path = BfTreePaths::neighbors_bftree(&saved_params.prefix);
if neighbors_snapshot_path != target_neighbors_path {
    std::fs::copy(&neighbors_snapshot_path, &target_neighbors_path).map_err(|e| {
        ANNError::log_index_error(format!(
            "Failed to copy neighbors from {:?} to {:?}: {}",
            neighbors_snapshot_path, target_neighbors_path, e
        ))
    })?;

Copilot AI Feb 16, 2026

std::fs::copy runs a potentially large, blocking file copy inside an async save_with. This can block the executor thread and adds an extra full read+write of the index file. Consider moving the copy into a blocking section (e.g., spawn_blocking) or using an async file copy implementation if save_with is expected to run on a Tokio runtime.

Suggested change
    let src = vectors_snapshot_path.clone();
    let dst = target_vectors_path.clone();
    tokio::task::spawn_blocking(move || std::fs::copy(&src, &dst))
        .await
        .map_err(|e| {
            ANNError::log_index_error(format!(
                "Failed to execute blocking copy for vectors from {:?} to {:?}: {}",
                vectors_snapshot_path, target_vectors_path, e
            ))
        })?
        .map_err(|e| {
            ANNError::log_index_error(format!(
                "Failed to copy vectors from {:?} to {:?}: {}",
                vectors_snapshot_path, target_vectors_path, e
            ))
        })?;
}
let target_neighbors_path = BfTreePaths::neighbors_bftree(&saved_params.prefix);
if neighbors_snapshot_path != target_neighbors_path {
    let src = neighbors_snapshot_path.clone();
    let dst = target_neighbors_path.clone();
    tokio::task::spawn_blocking(move || std::fs::copy(&src, &dst))
        .await
        .map_err(|e| {
            ANNError::log_index_error(format!(
                "Failed to execute blocking copy for neighbors from {:?} to {:?}: {}",
                neighbors_snapshot_path, target_neighbors_path, e
            ))
        })?
        .map_err(|e| {
            ANNError::log_index_error(format!(
                "Failed to copy neighbors from {:?} to {:?}: {}",
                neighbors_snapshot_path, target_neighbors_path, e
            ))
        })?;

Comment on lines 1899 to +1906
  // Save vectors and neighbors
- self.full_vectors.snapshot();
- self.neighbor_provider.snapshot();
+ let vectors_snapshot_path = self.full_vectors.snapshot();
+ let neighbors_snapshot_path = self.neighbor_provider.snapshot();
+
+ // Copy snapshot files to the target prefix location if they differ
+ let target_vectors_path = BfTreePaths::vectors_bftree(&saved_params.prefix);
+ if vectors_snapshot_path != target_vectors_path {
+     std::fs::copy(&vectors_snapshot_path, &target_vectors_path).map_err(|e| {

Copilot AI Feb 16, 2026

This new behavior is meant to handle the case where a provider was loaded from one prefix (snapshot location) and then saved to a different prefix, but the existing tests appear to only cover save+load using the same prefix path. Adding a regression test that loads from prefix A, saves to prefix B, and then loads from prefix B would validate the copy logic and prevent future regressions.
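
The shape of such a regression test might look like the sketch below. It only models the scenario with plain files: the `<prefix>_<kind>.bftree` naming scheme, `bftree_path`, `save_to_prefix`, and `round_trip_across_prefixes` are all hypothetical, and a real test would go through the provider's load and save_with paths rather than raw file copies.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical naming scheme: `<prefix>_<kind>.bftree`. The real layout is
/// whatever `BfTreePaths` produces and may differ.
fn bftree_path(prefix: &Path, kind: &str) -> PathBuf {
    let mut name = prefix.as_os_str().to_os_string();
    name.push(format!("_{kind}.bftree"));
    PathBuf::from(name)
}

/// Stand-in for "save to a new prefix": copies every snapshot file that is
/// not already at the target location.
fn save_to_prefix(from: &Path, to: &Path, kinds: &[&str]) -> io::Result<()> {
    for &kind in kinds {
        let (src, dst) = (bftree_path(from, kind), bftree_path(to, kind));
        if src != dst {
            fs::copy(&src, &dst)?;
        }
    }
    Ok(())
}

/// Round trip: create files under prefix A, save to prefix B, then read
/// back from B and check the contents match what was written under A.
fn round_trip_across_prefixes(dir: &Path) -> io::Result<bool> {
    let (a, b) = (dir.join("a_index"), dir.join("b_index"));
    let kinds = ["vectors", "neighbors"];
    for kind in kinds {
        fs::write(bftree_path(&a, kind), format!("{kind}-payload"))?;
    }
    save_to_prefix(&a, &b, &kinds)?;
    for kind in kinds {
        if fs::read(bftree_path(&b, kind))? != fs::read(bftree_path(&a, kind))? {
            return Ok(false);
        }
    }
    Ok(true)
}
```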

Comment on lines +2062 to +2089
// Copy snapshot files to the target prefix location if they differ
let target_vectors_path = BfTreePaths::vectors_bftree(&saved_params.prefix);
if vectors_snapshot_path != target_vectors_path {
    std::fs::copy(&vectors_snapshot_path, &target_vectors_path).map_err(|e| {
        ANNError::log_index_error(format!(
            "Failed to copy vectors from {:?} to {:?}: {}",
            vectors_snapshot_path, target_vectors_path, e
        ))
    })?;
}
let target_neighbors_path = BfTreePaths::neighbors_bftree(&saved_params.prefix);
if neighbors_snapshot_path != target_neighbors_path {
    std::fs::copy(&neighbors_snapshot_path, &target_neighbors_path).map_err(|e| {
        ANNError::log_index_error(format!(
            "Failed to copy neighbors from {:?} to {:?}: {}",
            neighbors_snapshot_path, target_neighbors_path, e
        ))
    })?;
}
let target_quant_path = BfTreePaths::quant_bftree(&saved_params.prefix);
if quant_snapshot_path != target_quant_path {
    std::fs::copy(&quant_snapshot_path, &target_quant_path).map_err(|e| {
        ANNError::log_index_error(format!(
            "Failed to copy quant from {:?} to {:?}: {}",
            quant_snapshot_path, target_quant_path, e
        ))
    })?;
}

Copilot AI Feb 16, 2026

Same issue as above in the quantized save_with: the .bftree files are copied via std::fs::copy rather than through the provided StorageWriteProvider. This makes the save output dependent on the local filesystem even though other artifacts (params JSON, delete bitmap, PQ pivots) are written via storage.

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.99%. Comparing base (7cd231a) to head (690c861).

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #783      +/-   ##
==========================================
- Coverage   89.00%   88.99%   -0.01%     
==========================================
  Files         428      428              
  Lines       78417    78417              
==========================================
- Hits        69795    69790       -5     
- Misses       8622     8627       +5     
Flag | Coverage Δ
miri | 88.99% <ø> (-0.01%) ⬇️
unittests | 88.99% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.
see 2 files with indirect coverage changes
