MB-71397: Fix CPU OOM during GPU Train #76
Co-authored-by: Copilot <copilot@github.com>
Pull request overview
This PR addresses CPU out-of-memory crashes during GPU training of IVF+ScalarQuantizer indexes by introducing training-time subsampling (to avoid allocating full-size residual buffers) and by propagating the encoder-training vector limit when cloning a CPU IVF index to GPU.
Changes:
- Add subsampling in `GpuIndexIVFScalarQuantizer::trainResiduals_` using `fvecs_maybe_subsample`.
- Introduce `GpuIndexIVF::train_encoder_num_vectors()` and store an encoder-training vector limit in `GpuIndexIVF`.
- Propagate `IndexIVF::train_encoder_num_vectors()` from CPU to GPU in `GpuIndexIVF::copyFrom`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `faiss/gpu/GpuIndexIVFScalarQuantizer.cu` | Subsamples training vectors before residual computation to reduce CPU memory usage during GPU training. |
| `faiss/gpu/GpuIndexIVF.h` | Adds a virtual encoder-training vector limit accessor and stores the propagated limit. |
| `faiss/gpu/GpuIndexIVF.cu` | Copies the CPU encoder-training vector limit into GPU state and exposes it via a new accessor. |
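The propagation described in the overview can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyCpuIvf`, `ToyCpuIvfSQ`, `ToyGpuIvf` and the member `encoderTrainLimit_` are hypothetical stand-ins rather than the real faiss classes, and the value `100000` is an illustrative placeholder.

```cpp
#include <cassert>
#include <cstdint>

using idx_t = std::int64_t;

// Hypothetical stand-in for a CPU IVF index (not the real faiss class).
struct ToyCpuIvf {
    virtual ~ToyCpuIvf() = default;
    // In this sketch, 0 means "no limit requested".
    virtual idx_t train_encoder_num_vectors() const { return 0; }
};

// Hypothetical subclass whose encoder only needs a bounded training sample.
struct ToyCpuIvfSQ : ToyCpuIvf {
    // Placeholder value chosen for illustration.
    idx_t train_encoder_num_vectors() const override { return 100000; }
};

// Hypothetical stand-in for the GPU index.
struct ToyGpuIvf {
    virtual ~ToyGpuIvf() = default;

    // Copy the limit from the CPU index so GPU training can honour it later.
    void copyFrom(const ToyCpuIvf* cpu) {
        assert(cpu != nullptr);
        encoderTrainLimit_ = cpu->train_encoder_num_vectors();
    }

    // Accessor that the GPU training path would consult before subsampling.
    virtual idx_t train_encoder_num_vectors() const { return encoderTrainLimit_; }

private:
    idx_t encoderTrainLimit_ = 0; // hypothetical member name
};
```

Calling `copyFrom` with a `ToyCpuIvfSQ` leaves the toy GPU index reporting the same limit as its CPU source, which is the behaviour the overview attributes to `GpuIndexIVF::copyFrom`.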
When training on the CPU index we hit this code:

```cpp
void IndexIVF::train(idx_t n, const float* x) {
    if (verbose) {
        printf("Training level-1 quantizer\n");
    }

    // Train Quantizer
    train_q1(n, x, verbose, metric_type);

    if (verbose) {
        printf("Training IVF residual\n");
    }

    // optional subsampling
    idx_t max_nt = train_encoder_num_vectors();
    if (max_nt <= 0) {
        max_nt = (size_t)1 << 35;
    }

    // Train Residuals
    TransformedVectors tv(
            x, fvecs_maybe_subsample(d, (size_t*)&n, max_nt, x, verbose));

    if (by_residual) {
        std::vector<idx_t> assign(n);
        quantizer->assign(n, tv.x, assign.data());

        std::vector<float> residuals(n * d); // <--- OOM LINE
        quantizer->compute_residual_n(n, tv.x, residuals.data(), assign.data());

        train_encoder(n, residuals.data(), assign.data());
    } else {
        train_encoder(n, tv.x, nullptr);
    }

    is_trained = true;
}
```

We basically train the level-1 quantizer, subsample the training vectors down to at most `max_nt`, and only then allocate the residual buffer and train the encoder on the subsampled set.
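To make the cost of the line marked OOM LINE concrete, here is a small sketch of the arithmetic. The helper names `residual_buffer_bytes` and `effective_train_size` are hypothetical, and the dataset sizes in the note below are illustrative assumptions, not figures from the reported crash.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed by a float buffer holding n vectors of d dimensions each,
// i.e. the footprint of `std::vector<float> residuals(n * d)`.
inline std::uint64_t residual_buffer_bytes(std::uint64_t n, std::uint64_t d) {
    assert(d > 0);
    return n * d * sizeof(float);
}

// Number of vectors used for encoder training once a cap is applied.
// A non-positive cap falls back to 2^35, as in the CPU snippet above.
inline std::uint64_t effective_train_size(std::uint64_t n, std::int64_t max_nt) {
    const std::uint64_t cap = max_nt <= 0
            ? (std::uint64_t{1} << 35)
            : static_cast<std::uint64_t>(max_nt);
    return n < cap ? n : cap;
}
```

Assuming 4-byte floats, 100 million vectors of 768 dimensions would need 307.2 GB for the residual buffer, whereas capping the training set at 100,000 vectors brings that down to about 307 MB.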
On the GPU side, we do not subsample the vectors, so the GPU code hits the equivalent of the OOM LINE: the residual vector it allocates is sized by the full training set and is therefore unbounded. This patch fixes this by mimicking the CPU behaviour and subsampling the vectors before the residuals are computed.
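The idea of bounding the training set before allocating residuals can be sketched as follows. This is a simplified stand-in, not faiss's `fvecs_maybe_subsample`: the function name `subsample_rows`, its `seed` parameter, and the choice to always return a copy are assumptions made to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns at most nmax vectors (row-major, d floats each), chosen without
// replacement, and updates *n to the number of vectors returned.
std::vector<float> subsample_rows(
        std::size_t d,
        std::size_t* n,
        std::size_t nmax,
        const float* x,
        unsigned seed = 1234) {
    assert(n != nullptr && x != nullptr);
    if (*n <= nmax) {
        // Small enough already: keep every vector (copied for simplicity).
        return std::vector<float>(x, x + (*n) * d);
    }

    // Shuffle the row indices and keep the first nmax of them.
    std::vector<std::size_t> perm(*n);
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(perm.begin(), perm.end(), rng);

    std::vector<float> out(nmax * d);
    for (std::size_t i = 0; i < nmax; ++i) {
        const float* row = x + perm[i] * d;
        std::copy(row, row + d, out.data() + i * d);
    }
    *n = nmax;
    return out;
}
```

Any buffer sized from the updated `*n`, such as the residual vector, is then bounded by `nmax * d` floats regardless of how many vectors were passed in.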
Thejas-bhat
left a comment
looks good, but can you add the MB associated with this?
crash due to allocating residuals for the full training set.
of vectors, mirroring the existing CPU behaviour.