Add support for FP32 and mixed precision in PDLP #910
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

Adds FP32 PDLP support and an optional mixed-precision SpMV path. Code is templated over floating type `f_t`, new CLI flags control precision and mixed SpMV, FP32 matrix copies and mixed-SpMV helpers are added, and many template-instantiation guards were extended to enable float builds.
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Actionable comments posted: 7
🧹 Nitpick comments (6)
docs/cuopt/source/lp-qp-milp-settings.rst (1)
201-207: Prefer an explicit label for the Mixed Precision SpMV cross-reference.

Using a named anchor avoids fragile section-title references and keeps links stable if headings change.
📚 Suggested doc tweak
```diff
-For an alternative that maintains FP64 accuracy while improving performance, see :ref:`Mixed Precision SpMV`.
+For an alternative that maintains FP64 accuracy while improving performance, see :ref:`mixed-precision-spmv`.
 
+.. _mixed-precision-spmv:
+
 Mixed Precision SpMV
 ^^^^^^^^^^^^^^^^^^^^
```

As per coding guidelines for `docs/**/*`, documentation should prioritize consistency and clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/cuopt/source/lp-qp-milp-settings.rst` around lines 201 - 207, Add a stable named anchor for the "Mixed Precision SpMV" section and use that label in the earlier cross-reference: insert a label line like ".. _mixed-precision-spmv:" immediately before the "Mixed Precision SpMV" heading, and replace the existing inline reference "see :ref:`Mixed Precision SpMV`" with "see :ref:`mixed-precision-spmv`" (or another consistent label name) so the link remains stable if the heading text changes.

cpp/src/mip_heuristics/presolve/third_party_presolve.cpp (1)
33-46: Avoid full extra copies in same-type vector conversion.
`convert_vector` takes `const&`, so when `f_t == double`, the `double -> double` path copies very large vectors (Ax, bounds, objective) unnecessarily before PSLP calls.

Proposed refactor
```diff
-template <typename To, typename From>
-std::vector<To> convert_vector(const std::vector<From>& src)
+template <typename To, typename From>
+std::vector<To> convert_vector(std::vector<From> src)
 {
   if constexpr (std::is_same_v<To, From>) {
-    return src;  // No conversion needed
+    return std::move(src);  // No conversion needed
   } else {
     std::vector<To> dst(src.size());
     for (size_t i = 0; i < src.size(); ++i) {
       dst[i] = static_cast<To>(src[i]);
     }
     return dst;
   }
 }
@@
-std::vector<double> h_coefficients = convert_vector<double>(h_coefficients_ft);
-std::vector<double> h_obj_coeffs   = convert_vector<double>(h_obj_coeffs_ft);
-std::vector<double> h_var_lb       = convert_vector<double>(h_var_lb_ft);
-std::vector<double> h_var_ub       = convert_vector<double>(h_var_ub_ft);
-std::vector<double> h_constr_lb    = convert_vector<double>(h_constr_lb_ft);
-std::vector<double> h_constr_ub    = convert_vector<double>(h_constr_ub_ft);
+std::vector<double> h_coefficients = convert_vector<double>(std::move(h_coefficients_ft));
+std::vector<double> h_obj_coeffs   = convert_vector<double>(std::move(h_obj_coeffs_ft));
+std::vector<double> h_var_lb       = convert_vector<double>(std::move(h_var_lb_ft));
+std::vector<double> h_var_ub       = convert_vector<double>(std::move(h_var_ub_ft));
+std::vector<double> h_constr_lb    = convert_vector<double>(std::move(h_constr_lb_ft));
+std::vector<double> h_constr_ub    = convert_vector<double>(std::move(h_constr_ub_ft));
```

As per coding guidelines "Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems".
Also applies to: 295-300
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp` around lines 33 - 46, The current template convert_vector<To,From>(const std::vector<From>&) unconditionally returns a std::vector<To>, causing a full copy even when To==From; fix this by adding an overload for the identical-type case: implement template<typename T> const std::vector<T>& convert_vector(const std::vector<T>& src) { return src; } and keep the existing two-type template for actual conversions (std::vector<To> convert_vector(const std::vector<From>&) with static_cast loop); update usages (e.g., where convert_vector is called for Ax, bounds, objective and the other occurrence mentioned) to accept/handle a const reference when available to avoid unnecessary large-vector copies.

benchmarks/linear_programming/cuopt/run_pdlp.cu (1)
193-198: Add fail-fast validation for unsupported flag combinations.
`--pdlp-fp32`/`--mixed-precision-spmv` constraints are documented, but incompatible combinations are only surfaced later by solver validation. A small upfront check in `main` would return clearer CLI errors earlier and avoid unnecessary setup/parse work.

💡 Suggested guard in `main`

```diff
 bool use_fp32 = program.get<bool>("--pdlp-fp32");
+const int method             = program.get<int>("--method");
+const bool crossover_enabled = program.get<int>("--crossover") != 0;
+const bool mixed_spmv        = program.get<bool>("--mixed-precision-spmv");
+
+if (use_fp32 && (method != static_cast<int>(cuopt::linear_programming::method_t::PDLP) ||
+                 crossover_enabled)) {
+  std::cerr << "--pdlp-fp32 is only supported for PDLP method without crossover.\n";
+  return 1;
+}
+if (use_fp32 && mixed_spmv) {
+  std::cerr << "--mixed-precision-spmv has no effect in FP32 mode.\n";
+}
 if (use_fp32) {
   return run_solver<float>(program, handle_);
 } else {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/linear_programming/cuopt/run_pdlp.cu` around lines 193 - 198, The CLI currently defers reporting incompatible flag combinations until solver validation; add an early fail-fast check in main: read the boolean for "--pdlp-fp32" (use_fp32) and the boolean for "--mixed-precision-spmv" from program (same API used for program.get<bool>(...)) before calling run_solver<float>/run_solver<double>, and if the combination is unsupported (e.g. use_fp32 && mixed-precision-spmv true) print a clear error to stderr and return a non-zero exit code to abort setup immediately; place this guard just before the existing branch that calls run_solver to avoid unnecessary parsing/setup work.

cpp/tests/linear_programming/pdlp_test.cu (1)
1930-2055: Add one FP32 test with `mixed_precision_spmv=true` to verify no-op behavior.

The PR contract says mixed precision SpMV has no effect in FP32 mode. Adding that explicit case would lock this behavior.
As per coding guidelines,
**/*test*.{cpp,cu,py}: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 1930 - 2055, Add a new FP32 unit test mirroring the existing float32_papilo_presolve_works/run_float32 pattern that sets solver_settings.mixed_precision_spmv = true and verifies numerical correctness (expect CUOPT_TERIMINATION_STATUS_OPTIMAL and that solution.get_additional_termination_information().primal_objective matches afiro_primal_objective_f32); locate and modify the pdlp_test.cu tests around TEST(pdlp_class, run_float32) or TEST(pdlp_class, float32_papilo_presolve_works) to add TEST(pdlp_class, float32_mixed_precision_spmv_noop) which constructs op_problem via cuopt::mps_parser::parse_mps<int,float>, sets solver_settings.method = cuopt::linear_programming::method_t::PDLP and solver_settings.mixed_precision_spmv = true, calls solve_lp(&handle_, op_problem, solver_settings), and asserts both termination status is CUOPT_TERIMINATION_STATUS_OPTIMAL and the primal objective equals afiro_primal_objective_f32 to guarantee the mixed-precision flag is a no-op for FP32.

cpp/src/pdlp/cusparse_view.cu (1)
1091-1091: Consider removing unnecessary synchronization in initialization setup.

Line 1091 synchronizes after mixed precision matrix transforms, but no CPU-side code depends on the transformed data before the solve loop. Downstream initialization functions (compute_initial_step_size, compute_initial_primal_weight) use scaled coefficients, not the float matrices. The mixed precision matrices are consumed only later in GPU kernels on the same stream, where stream ordering ensures correctness without explicit host-blocking synchronization.
Note: This is initialization code, not a hot solver iteration path, so the performance impact is minimal. However, the synchronization may be redundant for correctness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` at line 1091, Remove the blocking host synchronization call handle_ptr_->get_stream().synchronize() after creating the mixed-precision matrices; instead rely on CUDA stream ordering so GPU kernels that consume the mixed-precision data run on the same stream without host-side synchronize. Locate the call in cusparse_view.cu (the initialization path where mixed-precision transforms are performed) and delete the synchronize() invocation; keep the rest of the initialization (including compute_initial_step_size and compute_initial_primal_weight) unchanged since they only use scaled coefficients on the host and do not require the float matrices to be host-visible before the solve kernels execute.

cpp/src/pdlp/pdlp.cu (1)
2974-2996: Consider adding a clarifying comment for the asymmetric instantiation macros.

The `PDLP_INSTANTIATE_FLOAT` vs `MIP_INSTANTIATE_DOUBLE` asymmetry is intentional (FP32 is PDLP-specific, while MIP always uses FP64 and relies on the PDLP double instantiation). A brief comment would help future maintainers understand this design decision.

📝 Suggested documentation
```diff
+// PDLP supports both FP32 and FP64 precision modes.
+// Float instantiation is controlled by PDLP_INSTANTIATE_FLOAT (PDLP-specific).
+// Double instantiation uses MIP_INSTANTIATE_DOUBLE since MIP depends on PDLP<double>.
 #if PDLP_INSTANTIATE_FLOAT
 template class pdlp_solver_t<int, float>;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdlp.cu` around lines 2974 - 2996, Add a short clarifying comment above the asymmetric instantiation blocks explaining why PDLP_INSTANTIATE_FLOAT and MIP_INSTANTIATE_DOUBLE differ: FP32 instantiation is PDLP-specific while MIP always uses FP64 and depends on PDLP double instantiation; place the comment immediately above the two preprocessor blocks that instantiate pdlp_solver_t and the compute_weights_initial_primal_weight_from_squared_norms function templates so future maintainers see the rationale for the asymmetry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/math_optimization/solver_settings.cu`:
- Around line 74-75: The FP32 default tolerances
(CUOPT_PRIMAL_INFEASIBLE_TOLERANCE, CUOPT_DUAL_INFEASIBLE_TOLERANCE and
CUOPT_MIP_ABSOLUTE_GAP) use f_t(1e-10) which is below float epsilon and may
underflow to denorm/zero; update the default construction to clamp to the type's
machine epsilon (e.g., replace raw f_t(1e-10) with std::max(f_t(1e-10),
std::numeric_limits<f_t>::epsilon()) or choose separate defaults for float vs
double) so pdlp_settings.tolerances.primal_infeasible_tolerance and
dual_infeasible_tolerance (and the MIP absolute gap default) are representable
in FP32.
In `@cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu`:
- Around line 182-184: The float instantiation guard in simple_rounding.cu is
using PDLP_INSTANTIATE_FLOAT which is inconsistent with the double guard
(MIP_INSTANTIATE_DOUBLE) and other MIP files; update the preprocessor check
around the INSTANTIATE(float) call to use MIP_INSTANTIATE_FLOAT to match the
file's MIP-specific instantiation pattern (or, only if this code truly needs to
be shared with PDLP, use the combined guard MIP_INSTANTIATE_FLOAT ||
PDLP_INSTANTIATE_FLOAT) so that INSTANTIATE(float) follows the same macro
convention as INSTANTIATE(double) and other MIP files.
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 599-667: The code unconditionally allocates mixed-precision
buffers and creates mixed cuSPARSE descriptors (A_float_, A_T_float_, A_mixed_,
A_T_mixed_, buffer_non_transpose_mixed_, buffer_transpose_mixed_, and calls to
mixed_precision_spmv_buffersize) whenever enable_mixed_precision_spmv is true,
which wastes memory for batch paths that use SpMM on the FP64 descriptors; wrap
the entire mixed-precision allocation block so it only runs when mixed precision
is enabled AND the current execution will use SpMV (e.g., !is_batch_mode or a
flag like use_spmv) — move the allocations, cub::DeviceTransform calls,
cusparseCreateCsr calls, and buffer_size computations inside that guard and skip
them in batch/SpMM execution paths to avoid unnecessary memory use.
- Around line 606-616: The two unwrapped calls to
cub::DeviceTransform::Transform that convert doubles to floats (the calls using
double_to_float_functor with sources op_problem_scaled.coefficients and A_T_.
and destinations A_float_ and A_T_float_ respectively, using
handle_ptr->get_stream().value()) must be wrapped with RAFT_CUDA_TRY to surface
CUDA errors; locate the Transform invocations and wrap each call with
RAFT_CUDA_TRY( ... ) so any returned cudaError_t is checked, and apply the same
change for the identical pair of Transform calls found later in the file.
In `@cpp/src/pdlp/cusparse_view.hpp`:
- Around line 201-207: Replace the raw cusparseSpMatDescr_t handles A_mixed_ and
A_T_mixed_ with the existing RAII wrapper type cusparse_sp_mat_descr_wrapper_t
(i.e., change their declarations to use cusparse_sp_mat_descr_wrapper_t instead
of cusparseSpMatDescr_t), update the code paths that call cusparseCreateCsr to
assign into these wrappers, and remove any manual calls to cusparseDestroySpMat
(relying on the wrapper destructor). Ensure all references to A_mixed_,
A_T_mixed_, cusparseCreateCsr and mixed_precision_enabled_ compile against the
wrapper API so the descriptors are automatically destroyed.
In `@cpp/src/pdlp/pdhg.hpp`:
- Around line 32-33: The constructor signature in pdhg.hpp currently defaults
enable_mixed_precision_spmv to true, which can silently change behavior; change
the default to false at the constructor boundary by updating the parameter
default from enable_mixed_precision_spmv = true to enable_mixed_precision_spmv =
false in the PDHG (pdhg) constructor declaration so callers that omit this
argument keep the documented default-off behavior.
In `@cpp/src/pdlp/solve.cu`:
- Around line 607-612: The FP32+crossover unsupported check currently inside the
template block (f_t) runs too late; move the validation into the beginning of
run_pdlp (or the earlier dispatch function that selects the f_t instantiation)
so invalid configurations fail fast. Specifically, add a guard that mirrors the
existing cuopt_expects(!settings.crossover, ...) check at the top of run_pdlp
(or immediately before selecting/instantiating the PDLP template) to throw a
ValidationError when std::is_same_v<f_t, float> (or when dispatch would pick
f_t=float) and settings.crossover==true; reference run_pdlp, the dispatch call
site for f_t, and the existing cuopt_expects check to locate and duplicate the
logic early in the call path.
---
Nitpick comments:
In `@benchmarks/linear_programming/cuopt/run_pdlp.cu`:
- Around line 193-198: The CLI currently defers reporting incompatible flag
combinations until solver validation; add an early fail-fast check in main: read
the boolean for "--pdlp-fp32" (use_fp32) and the boolean for
"--mixed-precision-spmv" from program (same API used for program.get<bool>(...))
before calling run_solver<float>/run_solver<double>, and if the combination is
unsupported (e.g. use_fp32 && mixed-precision-spmv true) print a clear error to
stderr and return a non-zero exit code to abort setup immediately; place this
guard just before the existing branch that calls run_solver to avoid unnecessary
parsing/setup work.
In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp`:
- Around line 33-46: The current template convert_vector<To,From>(const
std::vector<From>&) unconditionally returns a std::vector<To>, causing a full
copy even when To==From; fix this by adding an overload for the identical-type
case: implement template<typename T> const std::vector<T>& convert_vector(const
std::vector<T>& src) { return src; } and keep the existing two-type template for
actual conversions (std::vector<To> convert_vector(const std::vector<From>&)
with static_cast loop); update usages (e.g., where convert_vector is called for
Ax, bounds, objective and the other occurrence mentioned) to accept/handle a
const reference when available to avoid unnecessary large-vector copies.
In `@cpp/src/pdlp/cusparse_view.cu`:
- Line 1091: Remove the blocking host synchronization call
handle_ptr_->get_stream().synchronize() after creating the mixed-precision
matrices; instead rely on CUDA stream ordering so GPU kernels that consume the
mixed-precision data run on the same stream without host-side synchronize.
Locate the call in cusparse_view.cu (the initialization path where
mixed-precision transforms are performed) and delete the synchronize()
invocation; keep the rest of the initialization (including
compute_initial_step_size and compute_initial_primal_weight) unchanged since
they only use scaled coefficients on the host and do not require the float
matrices to be host-visible before the solve kernels execute.
In `@cpp/src/pdlp/pdlp.cu`:
- Around line 2974-2996: Add a short clarifying comment above the asymmetric
instantiation blocks explaining why PDLP_INSTANTIATE_FLOAT and
MIP_INSTANTIATE_DOUBLE differ: FP32 instantiation is PDLP-specific while MIP
always uses FP64 and depends on PDLP double instantiation; place the comment
immediately above the two preprocessor blocks that instantiate pdlp_solver_t and
the compute_weights_initial_primal_weight_from_squared_norms function templates
so future maintainers see the rationale for the asymmetry.
In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 1930-2055: Add a new FP32 unit test mirroring the existing
float32_papilo_presolve_works/run_float32 pattern that sets
solver_settings.mixed_precision_spmv = true and verifies numerical correctness
(expect CUOPT_TERIMINATION_STATUS_OPTIMAL and that
solution.get_additional_termination_information().primal_objective matches
afiro_primal_objective_f32); locate and modify the pdlp_test.cu tests around
TEST(pdlp_class, run_float32) or TEST(pdlp_class, float32_papilo_presolve_works)
to add TEST(pdlp_class, float32_mixed_precision_spmv_noop) which constructs
op_problem via cuopt::mps_parser::parse_mps<int,float>, sets
solver_settings.method = cuopt::linear_programming::method_t::PDLP and
solver_settings.mixed_precision_spmv = true, calls solve_lp(&handle_,
op_problem, solver_settings), and asserts both termination status is
CUOPT_TERIMINATION_STATUS_OPTIMAL and the primal objective equals
afiro_primal_objective_f32 to guarantee the mixed-precision flag is a no-op for
FP32.
In `@docs/cuopt/source/lp-qp-milp-settings.rst`:
- Around line 201-207: Add a stable named anchor for the "Mixed Precision SpMV"
section and use that label in the earlier cross-reference: insert a label line
like ".. _mixed-precision-spmv:" immediately before the "Mixed Precision SpMV"
heading, and replace the existing inline reference "see :ref:`Mixed Precision
SpMV`" with "see :ref:`mixed-precision-spmv`" (or another consistent label name)
so the link remains stable if the heading text changes.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (42)
- benchmarks/linear_programming/cuopt/run_pdlp.cu
- cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
- cpp/src/dual_simplex/sparse_matrix.cpp
- cpp/src/math_optimization/solution_writer.cu
- cpp/src/math_optimization/solution_writer.hpp
- cpp/src/math_optimization/solver_settings.cu
- cpp/src/mip_heuristics/diversity/lns/rins.cu
- cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
- cpp/src/mip_heuristics/mip_constants.hpp
- cpp/src/mip_heuristics/presolve/gf2_presolve.cpp
- cpp/src/mip_heuristics/presolve/third_party_presolve.cpp
- cpp/src/mip_heuristics/problem/presolve_data.cu
- cpp/src/mip_heuristics/problem/problem.cu
- cpp/src/mip_heuristics/solution/solution.cu
- cpp/src/mip_heuristics/solver_solution.cu
- cpp/src/pdlp/cpu_pdlp_warm_start_data.cu
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/cusparse_view.hpp
- cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
- cpp/src/pdlp/optimal_batch_size_handler/optimal_batch_size_handler.cu
- cpp/src/pdlp/optimization_problem.cu
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/pdhg.hpp
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/pdlp_warm_start_data.cu
- cpp/src/pdlp/restart_strategy/localized_duality_gap_container.cu
- cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
- cpp/src/pdlp/restart_strategy/weighted_average_solution.cu
- cpp/src/pdlp/saddle_point.cu
- cpp/src/pdlp/solution_conversion.cu
- cpp/src/pdlp/solve.cu
- cpp/src/pdlp/solver_settings.cu
- cpp/src/pdlp/solver_solution.cu
- cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
- cpp/src/pdlp/termination_strategy/convergence_information.cu
- cpp/src/pdlp/termination_strategy/infeasibility_information.cu
- cpp/src/pdlp/termination_strategy/termination_strategy.cu
- cpp/src/pdlp/translate.hpp
- cpp/src/pdlp/utilities/problem_checking.cu
- cpp/tests/linear_programming/pdlp_test.cu
- docs/cuopt/source/lp-qp-features.rst
- docs/cuopt/source/lp-qp-milp-settings.rst
cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu (outdated comment, resolved)
/ok to test 366fd6a
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/tests/linear_programming/pdlp_test.cu (1)
54-59: ⚠️ Potential issue | 🟠 Major

Add NaN/Inf guard to `is_incorrect_objective` to prevent masked solver regressions.

The function accepts NaN values silently. When `objective` is NaN, `std::abs(objective)` returns NaN, and the comparisons evaluate to false, causing the helper to incorrectly report the objective as valid. This can mask solver regressions. The codebase already uses `std::isfinite` extensively in solver components (pdlp, feasibility_jump, barrier, etc.) for this exact purpose.

Suggested fix
```diff
 template <typename f_t>
 static bool is_incorrect_objective(f_t reference, f_t objective)
 {
+  if (!std::isfinite(reference) || !std::isfinite(objective)) { return true; }
   if (reference == 0) { return std::abs(objective) > 0.01; }
   if (objective == 0) { return std::abs(reference) > 0.01; }
   return std::abs((reference - objective) / reference) > 0.01;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 54 - 59, The helper is_incorrect_objective must treat NaN/Inf as incorrect: at the start of is_incorrect_objective(f_t reference, f_t objective) check std::isfinite(reference) and std::isfinite(objective) and return true if either is not finite, then proceed with the existing zero checks and relative-difference logic; update includes/usings if necessary to ensure std::isfinite is available and referenced.
🧹 Nitpick comments (4)
cpp/src/pdlp/cpu_pdlp_warm_start_data.cu (1)
16-39: Consider batching synchronization for better performance (optional).

Each helper function synchronizes after its copy operation. When `convert_to_cpu_warmstart` or `convert_to_gpu_warmstart` copies 9 vector fields, this results in 9 separate synchronization points.

For warm start data that isn't in a hot path, this is acceptable. However, if performance becomes a concern, consider refactoring to batch all copies and synchronize once at the end of the conversion functions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cpu_pdlp_warm_start_data.cu` around lines 16 - 39, The helpers device_to_host_vector and host_to_device_vector currently call stream.synchronize() after each raft::copy, causing multiple sync points when convert_to_cpu_warmstart/convert_to_gpu_warmstart copy many fields; change the helpers to avoid per-copy synchronization by either (a) removing the stream.synchronize() calls and documenting that the caller must synchronize the stream once after batching copies, or (b) adding an optional parameter (e.g. bool do_sync = true) so callers can disable per-copy sync and perform a single stream.synchronize() at the end of convert_to_cpu_warmstart/convert_to_gpu_warmstart; update those convert_* functions to batch the copies and call stream.synchronize() once.

cpp/tests/linear_programming/pdlp_test.cu (1)
1931-2054: Add an explicit FP32 “`mixed_precision_spmv` is a no-op” test.

PR behavior states mixed-precision SpMV has no effect in FP32 mode, but this block doesn’t directly assert that contract yet.
Suggested test addition
```diff
+TEST(pdlp_class, float32_mixed_precision_spmv_no_effect)
+{
+  const raft::handle_t handle_{};
+  auto path = make_path_absolute("linear_programming/afiro_original.mps");
+  cuopt::mps_parser::mps_data_model_t<int, float> op_problem =
+    cuopt::mps_parser::parse_mps<int, float>(path, true);
+
+  auto settings_base                 = pdlp_solver_settings_t<int, float>{};
+  settings_base.method               = cuopt::linear_programming::method_t::PDLP;
+  settings_base.mixed_precision_spmv = false;
+
+  auto settings_mixed                 = settings_base;
+  settings_mixed.mixed_precision_spmv = true;
+
+  auto solution_base  = solve_lp(&handle_, op_problem, settings_base);
+  auto solution_mixed = solve_lp(&handle_, op_problem, settings_mixed);
+
+  EXPECT_EQ((int)solution_base.get_termination_status(), CUOPT_TERIMINATION_STATUS_OPTIMAL);
+  EXPECT_EQ((int)solution_mixed.get_termination_status(), CUOPT_TERIMINATION_STATUS_OPTIMAL);
+  EXPECT_NEAR(solution_base.get_additional_termination_information().primal_objective,
+              solution_mixed.get_additional_termination_information().primal_objective,
+              1e-2f);
+}
```

As per coding guidelines:
**/*test*.{cpp,cu,py}: Write tests validating numerical correctness of optimization results (not just 'runs without error').

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 1931 - 2054, Add a test that explicitly verifies mixed-precision SpMV is a no-op in FP32: create a pdlp_solver_settings_t<int,float> (same as in run_float32) set method = cuopt::linear_programming::method_t::PDLP and set solver_settings.mixed_precision_spmv = true, call solve_lp(&handle_, op_problem, solver_settings), then assert termination status equals CUOPT_TERIMINATION_STATUS_OPTIMAL and that solution.get_additional_termination_information().primal_objective matches afiro_primal_objective_f32 (use is_incorrect_objective to check numerical equality), mirroring the checks in the existing run_float32/papilo/pslp tests so the behavior is explicitly tested.

cpp/src/pdlp/pdhg.cu (1)
254-280: Consolidate repeated mixed-vs-standard SpMV dispatch into one helper.

The same precision-dispatch branch is repeated in three methods. A shared helper would reduce drift risk and keep future algorithm/descriptor changes in one place.
♻️ Suggested refactor sketch
```diff
+template <typename i_t, typename f_t>
+template <typename MatStd, typename MatMixed, typename VecX, typename VecY,
+          typename BufStd, typename BufMixed>
+inline void pdhg_solver_t<i_t, f_t>::run_spmv_dispatch(MatStd mat_std,
+                                                       MatMixed mat_mixed,
+                                                       VecX vec_x,
+                                                       VecY vec_y,
+                                                       BufStd buf_std,
+                                                       BufMixed buf_mixed)
+{
+  if constexpr (std::is_same_v<f_t, double>) {
+    if (cusparse_view_.mixed_precision_enabled_) {
+      mixed_precision_spmv(handle_ptr_->get_cusparse_handle(),
+                           CUSPARSE_OPERATION_NON_TRANSPOSE,
+                           reusable_device_scalar_value_1_.data(),
+                           mat_mixed,
+                           vec_x,
+                           reusable_device_scalar_value_0_.data(),
+                           vec_y,
+                           CUSPARSE_SPMV_CSR_ALG2,
+                           buf_mixed,
+                           stream_view_);
+      return;
+    }
+  }
+  RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmv(handle_ptr_->get_cusparse_handle(),
+                                                       CUSPARSE_OPERATION_NON_TRANSPOSE,
+                                                       reusable_device_scalar_value_1_.data(),
+                                                       mat_std,
+                                                       vec_x,
+                                                       reusable_device_scalar_value_0_.data(),
+                                                       vec_y,
+                                                       CUSPARSE_SPMV_CSR_ALG2,
+                                                       (f_t*)buf_std,
+                                                       stream_view_));
+}
```

As per coding guidelines,
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication.

Also applies to: 308-334, 356-382
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdhg.cu` around lines 254 - 280, Duplicate precision-dispatch SpMV logic should be extracted into a single helper: create a templated utility (e.g., spmv_dispatch or dispatch_spmv) that accepts the cusparse handle (handle_ptr_->get_cusparse_handle()), operation flag, alpha/beta device scalars (reusable_device_scalar_value_1_.data(), reusable_device_scalar_value_0_.data()), both mixed and standard matrix pointers (cusparse_view_.A_mixed_, cusparse_view_.A), input/output buffers (cusparse_view_.tmp_primal, cusparse_view_.dual_gradient), algorithm/buffer args (CUSPARSE_SPMV_CSR_ALG2 and both buffer_non_transpose_mixed_.data()/buffer_non_transpose.data()), and the stream (stream_view_), then replace the repeated branches in pdhg.cu (and the other two occurrences) with calls to that helper which checks cusparse_view_.mixed_precision_enabled_ and calls mixed_precision_spmv or raft::sparse::detail::cusparsespmv accordingly.

cpp/src/pdlp/cusparse_view.cu (1)
1069-1082: Remove the blocking stream synchronization after mixed-precision matrix transforms.

Line 1081's `handle_ptr_->get_stream().synchronize()` blocks the host after submitting two matrix transforms (`A_` → `A_float_` and `A_T_` → `A_T_float_`) to the same GPU stream. Since subsequent mixed-precision SpMV operations are queued on the same stream, the hardware will ensure correct ordering without an explicit host stall. Removing this synchronization eliminates an unnecessary pipeline blockage in the hot path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` around lines 1069 - 1082, The explicit host-side stall is caused by handle_ptr_->get_stream().synchronize() after launching two cub::DeviceTransform::Transform calls that convert A_→A_float_ and A_T_→A_T_float_; remove the synchronize call so the transforms run asynchronously on the same stream (GPU will preserve ordering) and let subsequent mixed-precision SpMV kernels queue on that stream without blocking the host; ensure no other code relies on the host wait and delete only the handle_ptr_->get_stream().synchronize() invocation referenced here.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/pdlp/solve.cu`:
- Around line 1151-1156: The current hard validation using cuopt_expects blocks
float-precision users; update solve_lp() to auto-map float precision to
method_t::PDLP when the user has not explicitly selected an incompatible method
(i.e., settings.method is still method_t::Concurrent/default) instead of
failing: detect float precision and if settings.method == method_t::Concurrent,
set settings.method = method_t::PDLP and emit a log/info message about the
override; keep the existing cuopt_expects check only for the case where the user
explicitly set an incompatible method (e.g., DualSimplex/Barrier) so an error is
still raised for explicit invalid combinations. Ensure you reference and modify
the code paths around solve_lp(), settings.method, and the float-precision
branch that currently calls run_pdlp().
In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Line 23: Update the preprocessor guard that gates the PDLP float tests:
replace the overly-broad conditional that uses MIP_INSTANTIATE_FLOAT ||
PDLP_INSTANTIATE_FLOAT with a check that only uses PDLP_INSTANTIATE_FLOAT so
PDLP float tests compile only when PDLP is instantiated for float; locate the
conditional in pdlp_test.cu (the lines surrounding the existing `#if`
MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT) and change it to `#if`
PDLP_INSTANTIATE_FLOAT, leaving the rest of the test code unchanged.
---
Outside diff comments:
In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 54-59: The helper is_incorrect_objective must treat NaN/Inf as
incorrect: at the start of is_incorrect_objective(f_t reference, f_t objective)
check std::isfinite(reference) and std::isfinite(objective) and return true if
either is not finite, then proceed with the existing zero checks and
relative-difference logic; update includes/usings if necessary to ensure
std::isfinite is available and referenced.
---
Nitpick comments:
In `@cpp/src/pdlp/cpu_pdlp_warm_start_data.cu`:
- Around line 16-39: The helpers device_to_host_vector and host_to_device_vector
currently call stream.synchronize() after each raft::copy, causing multiple sync
points when convert_to_cpu_warmstart/convert_to_gpu_warmstart copy many fields;
change the helpers to avoid per-copy synchronization by either (a) removing the
stream.synchronize() calls and documenting that the caller must synchronize the
stream once after batching copies, or (b) adding an optional parameter (e.g.
bool do_sync = true) so callers can disable per-copy sync and perform a single
stream.synchronize() at the end of
convert_to_cpu_warmstart/convert_to_gpu_warmstart; update those convert_*
functions to batch the copies and call stream.synchronize() once.
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 1069-1082: The explicit host-side stall is caused by
handle_ptr_->get_stream().synchronize() after launching two
cub::DeviceTransform::Transform calls that convert A_→A_float_ and
A_T_→A_T_float_; remove the synchronize call so the transforms run
asynchronously on the same stream (GPU will preserve ordering) and let
subsequent mixed-precision SpMV kernels queue on that stream without blocking
the host; ensure no other code relies on the host wait and delete only the
handle_ptr_->get_stream().synchronize() invocation referenced here.
In `@cpp/src/pdlp/pdhg.cu`:
- Around line 254-280: Duplicate precision-dispatch SpMV logic should be
extracted into a single helper: create a templated utility (e.g., spmv_dispatch
or dispatch_spmv) that accepts the cusparse handle
(handle_ptr_->get_cusparse_handle()), operation flag, alpha/beta device scalars
(reusable_device_scalar_value_1_.data(),
reusable_device_scalar_value_0_.data()), both mixed and standard matrix pointers
(cusparse_view_.A_mixed_, cusparse_view_.A), input/output buffers
(cusparse_view_.tmp_primal, cusparse_view_.dual_gradient), algorithm/buffer args
(CUSPARSE_SPMV_CSR_ALG2 and both
buffer_non_transpose_mixed_.data()/buffer_non_transpose.data()), and the stream
(stream_view_), then replace the repeated branches in pdhg.cu (and the other two
occurrences) with calls to that helper which checks
cusparse_view_.mixed_precision_enabled_ and calls mixed_precision_spmv or
raft::sparse::detail::cusparsespmv accordingly.
In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 1931-2054: Add a test that explicitly verifies mixed-precision
SpMV is a no-op in FP32: create a pdlp_solver_settings_t<int,float> (same as in
run_float32) set method = cuopt::linear_programming::method_t::PDLP and set
solver_settings.mixed_precision_spmv = true, call solve_lp(&handle_, op_problem,
solver_settings), then assert termination status equals
CUOPT_TERIMINATION_STATUS_OPTIMAL and that
solution.get_additional_termination_information().primal_objective matches
afiro_primal_objective_f32 (use is_incorrect_objective to check numerical
equality), mirroring the checks in the existing run_float32/papilo/pslp tests so
the behavior is explicitly tested.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
- cpp/src/dual_simplex/sparse_matrix.cpp
- cpp/src/math_optimization/solution_writer.cu
- cpp/src/math_optimization/solver_settings.cu
- cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
- cpp/src/mip_heuristics/problem/problem.cu
- cpp/src/mip_heuristics/solution/solution.cu
- cpp/src/mip_heuristics/solver_solution.cu
- cpp/src/pdlp/cpu_pdlp_warm_start_data.cu
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/cusparse_view.hpp
- cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
- cpp/src/pdlp/optimal_batch_size_handler/optimal_batch_size_handler.cu
- cpp/src/pdlp/optimization_problem.cu
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/pdhg.hpp
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/pdlp_warm_start_data.cu
- cpp/src/pdlp/restart_strategy/localized_duality_gap_container.cu
- cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
- cpp/src/pdlp/restart_strategy/weighted_average_solution.cu
- cpp/src/pdlp/saddle_point.cu
- cpp/src/pdlp/solution_conversion.cu
- cpp/src/pdlp/solve.cu
- cpp/src/pdlp/solver_settings.cu
- cpp/src/pdlp/solver_solution.cu
- cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
- cpp/src/pdlp/termination_strategy/convergence_information.cu
- cpp/src/pdlp/termination_strategy/infeasibility_information.cu
- cpp/src/pdlp/termination_strategy/termination_strategy.cu
- cpp/src/pdlp/utilities/problem_checking.cu
- cpp/tests/linear_programming/pdlp_test.cu
🚧 Files skipped from review as they are similar to previous changes (15)
- cpp/src/mip_heuristics/solution/solution.cu
- cpp/src/pdlp/optimal_batch_size_handler/optimal_batch_size_handler.cu
- cpp/src/pdlp/pdhg.hpp
- cpp/src/pdlp/solution_conversion.cu
- cpp/src/pdlp/pdlp_warm_start_data.cu
- cpp/src/pdlp/solver_solution.cu
- cpp/src/pdlp/saddle_point.cu
- cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
- cpp/src/pdlp/termination_strategy/termination_strategy.cu
- cpp/src/pdlp/termination_strategy/infeasibility_information.cu
- cpp/src/pdlp/solver_settings.cu
- cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
- cpp/src/pdlp/restart_strategy/weighted_average_solution.cu
- cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
- cpp/src/mip_heuristics/problem/problem.cu
/ok to test 2cb9ce5

/ok to test 410ec3c
Actionable comments posted: 1
🧹 Nitpick comments (1)
cpp/src/pdlp/solution_conversion.cu (1)
138-146: Add an explicit `<type_traits>` include for `std::is_same_v`.

`std::is_same_v` is used at line 142, but the file lacks a direct include for `<type_traits>`. Adding this include avoids reliance on transitive headers, which can be fragile if upstream dependencies change.

Suggested patch

```diff
 #include <rmm/device_buffer.hpp>
 #include <rmm/device_uvector.hpp>
+#include <type_traits>
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/solution_conversion.cu` around lines 138 - 146, Add a direct include for <type_traits> at the top of the translation unit so std::is_same_v used in the to_cpu_buffer template is defined without relying on transitive includes; update the includes in cpp/src/pdlp/solution_conversion.cu (before the anonymous-namespace template<typename f_t> cuopt::cython::cpu_buffer to_cpu_buffer(std::vector<f_t>& src)) to add the header for <type_traits>.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/pdlp/optimization_problem.cu`:
- Around line 1508-1978: This file contains a duplicated block of template
member-function definitions (including set_objective_name, set_problem_name,
set_variable_names, set_row_names, get_n_variables, get_n_constraints, get_nnz,
get_n_integers, get_handle_ptr, all get_* accessors, view(), set_maximize,
write_to_mps, print_scaling_information, has_quadratic_objective, and the
conflicting empty()) that re-defines methods already implemented earlier; remove
this duplicate block so each method (e.g.,
optimization_problem_t<i_t,f_t>::set_objective_name, ::write_to_mps,
::print_scaling_information, ::has_quadratic_objective, and ::empty) is defined
only once, keeping the intended canonical implementations (resolve the empty()
conflict by retaining the earlier definition), rebuild to ensure no
ODR/redefinition errors, and run tests.
---
Nitpick comments:
In `@cpp/src/pdlp/solution_conversion.cu`:
- Around line 138-146: Add a direct include for <type_traits> at the top of the
translation unit so std::is_same_v used in the to_cpu_buffer template is defined
without relying on transitive includes; update the includes in
cpp/src/pdlp/solution_conversion.cu (before the anonymous-namespace
template<typename f_t> cuopt::cython::cpu_buffer to_cpu_buffer(std::vector<f_t>&
src)) to add the header for <type_traits>.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- benchmarks/linear_programming/cuopt/run_pdlp.cu
- cpp/src/pdlp/optimization_problem.cu
- cpp/src/pdlp/solution_conversion.cu
- cpp/src/pdlp/solve.cu
- cpp/src/pdlp/translate.hpp
- cpp/tests/linear_programming/pdlp_test.cu
🚧 Files skipped from review as they are similar to previous changes (2)
- cpp/tests/linear_programming/pdlp_test.cu
- cpp/src/pdlp/translate.hpp
/ok to test a3dd383

/ok to test 14f9052
aliceb-nv left a comment
LGTM, thanks for the extensive work Nicolas :)
Just a few minor stylistic nitpicks.
```cpp
std::vector<f_t> h_primal_out(n_cols);
std::vector<f_t> h_dual_out(n_rows);
std::vector<f_t> h_reduced_costs_out(n_cols);
for (int i = 0; i < n_cols; ++i) {
  h_primal_out[i]        = static_cast<f_t>(uncrushed_sol->x[i]);
  h_reduced_costs_out[i] = static_cast<f_t>(uncrushed_sol->z[i]);
}
for (int i = 0; i < n_rows; ++i) {
  h_dual_out[i] = static_cast<f_t>(uncrushed_sol->y[i]);
}

primal_solution.resize(n_cols, stream_view);
dual_solution.resize(n_rows, stream_view);
reduced_costs.resize(n_cols, stream_view);
raft::copy(primal_solution.data(), h_primal_out.data(), n_cols, stream_view);
raft::copy(dual_solution.data(), h_dual_out.data(), n_rows, stream_view);
raft::copy(reduced_costs.data(), h_reduced_costs_out.data(), n_cols, stream_view);
```
nit: it might be possible to unify that part of the function with thrust::transform calls that would automatically perform the implicit conversion through the iterators. It would materialize into a kernel rather than a memcpy, but I don't think it matters much here.

Come to think of it, we can probably do the same for the input also. It might be possible to get rid of the constexpr branches for the most part that way.
```cpp
if constexpr (std::is_same_v<f_t, double>) {
  if (cusparse_view_.mixed_precision_enabled_) {
    mixed_precision_spmv(handle_ptr_->get_cusparse_handle(),
                         CUSPARSE_OPERATION_NON_TRANSPOSE,
                         reusable_device_scalar_value_1_.data(),
                         cusparse_view_.A_mixed_,
                         cusparse_view_.reflected_primal_solution,
                         reusable_device_scalar_value_0_.data(),
                         cusparse_view_.dual_gradient,
                         CUSPARSE_SPMV_CSR_ALG2,
                         cusparse_view_.buffer_non_transpose_mixed_.data(),
                         stream_view_);
  }
}
if (!cusparse_view_.mixed_precision_enabled_) {
  RAFT_CUSPARSE_TRY(
    raft::sparse::detail::cusparsespmv(handle_ptr_->get_cusparse_handle(),
                                       CUSPARSE_OPERATION_NON_TRANSPOSE,
                                       reusable_device_scalar_value_1_.data(),
                                       cusparse_view_.A,
                                       cusparse_view_.reflected_primal_solution,
                                       reusable_device_scalar_value_0_.data(),
                                       cusparse_view_.dual_gradient,
                                       CUSPARSE_SPMV_CSR_ALG2,
                                       (f_t*)cusparse_view_.buffer_non_transpose.data(),
                                       stream_view_));
}
```
Another nit - it might be possible to unify this into a single A_spmv function that takes an option to specify whether to use A or its transpose. Not fully confident if it's worth the code changes though.
/ok to test ab9e8eb
rg20 left a comment
I think we should move the logic of precision inside the solve and not change/add new APIs.
```diff
 }

-#if MIP_INSTANTIATE_FLOAT
+#if MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT
```
If this is done in all the files, we can just have one flag, CUOPT_INSTANTIATE_FLOAT
```diff
-cpu.primal_solution_ = std::move(primal_solution_);
-cpu.dual_solution_ = std::move(dual_solution_);
-cpu.reduced_cost_ = std::move(reduced_cost_);
+cpu.primal_solution_ = to_cpu_buffer(primal_solution_);
```
I don't think we should bring up the mixed precision logic this far. The conversion should have happened much before.
```cpp
if (problem.maximize) {
  adjust_dual_solution_and_reduced_cost(
    final_dual_solution, final_reduced_cost, problem.handle_ptr->get_stream());
if constexpr (std::is_same_v<f_t, double>) {
```
Does this mean crossover is disabled for single precision? I don't think we want to make that decision.
```cpp
// Explicit template instantiations for remote execution stubs
#if MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT
template std::unique_ptr<lp_solution_interface_t<int, float>> solve_lp_remote(
```
The APIs should be still in double precision and only the inner solves should be changed.
```cpp
 * Convergence checking and restarts always use the full FP64 matrix, so this does
 * not reduce overall memory usage. Has no effect in FP32 mode.
 */
bool mixed_precision_spmv{false};
```
is there another flag for fp32?
I think we should just have a --pdlp-precision flag with options: default, single, double, mixed. This allows us to change the definition of mixed in the future.
```rst
Users can submit a set of problems which will be solved in a batch. Problems will be solved at the same time in parallel to fully utilize the GPU. Checkout :ref:`self-hosted client <generic-example-with-normal-and-batch-mode>` example in thin client.

FP32 Precision Mode
```
Let's just have one parameter for precision.
This PR adds support for FP32 and mixed precision in PDLP (not MIP, Dual Simplex or Barrier).
Those two new options are available through:
Below are the details of what each feature allows:
FP32 Precision Mode
By default, PDLP operates in FP64 (double) precision. Users can switch to FP32 (float) precision for the entire solve. FP32 uses half the memory of FP64 and allows PDHG iterations to run, on average, twice as fast, but it may require more iterations to converge due to reduced numerical accuracy. FP32 mode is only supported with the PDLP method (not concurrent) and without crossover.
Note: The default precision is FP64 (double).
Mixed Precision SpMV
When running PDLP in FP64 mode, users can enable mixed precision sparse matrix-vector products (SpMV) during PDHG iterations. In this mode, the constraint matrix and its transpose are stored in FP32 while vectors and the compute type remain in FP64. This allows SpMV operations to be faster thanks to reduced memory bandwidth requirements, while maintaining FP64 accuracy in the accumulation. This will make PDHG iterations faster while limiting the potential negative impact on convergence (compared to running in FP32 mode). Convergence checking and restart logic always use the full FP64 matrix, so this mode does not reduce memory usage since both the FP32 and FP64 copies of the matrix are kept in memory. Mixed precision SpMV only applies in FP64 mode and has no effect when running in FP32.
Note: The default value is false.