
Add support for FP32 and mixed precision in PDLP #910

Open
Kh4ster wants to merge 20 commits into main from pdlp_fp32_and_mixed_precision_support

Conversation

Contributor

@Kh4ster Kh4ster commented Feb 27, 2026

This PR adds support for FP32 and mixed precision in PDLP (not MIP, Dual Simplex or Barrier).

Those two new options are available through:

  • FP32: creating the whole solver object as FP32 or passing "--pdlp-fp32" to the solve_LP binary
  • Mixed precision: toggling the option "mixed_precision_spmv" in solver_settings or passing "--mixed-precision-spmv" to the solve_LP binary

Below are the details of what each feature enables:

FP32 Precision Mode

By default, PDLP operates in FP64 (double) precision. Users can switch to FP32 (float) precision for the entire solve. FP32 halves memory usage relative to FP64 and makes PDHG iterations roughly twice as fast on average, but it may require more iterations to converge due to reduced numerical accuracy. FP32 mode is supported only with the PDLP method (not concurrent) and without crossover.

Note: The default precision is FP64 (double).

Mixed Precision SpMV

When running PDLP in FP64 mode, users can enable mixed precision sparse matrix-vector products (SpMV) during PDHG iterations. In this mode, the constraint matrix and its transpose are stored in FP32 while the vectors and the compute type remain in FP64. This speeds up SpMV thanks to reduced memory bandwidth requirements while maintaining FP64 accuracy in the accumulation, so PDHG iterations get faster with less impact on convergence than full FP32 mode. Convergence checking and restart logic always use the full FP64 matrix, so this mode does not reduce memory usage: both the FP32 and FP64 copies of the matrix are kept in memory. Mixed precision SpMV only applies in FP64 mode and has no effect when running in FP32.

Note: The default value is false.

@Kh4ster Kh4ster requested review from a team as code owners February 27, 2026 14:08

copy-pr-bot bot commented Feb 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Kh4ster Kh4ster self-assigned this Feb 27, 2026
@Kh4ster Kh4ster removed the request for review from kaatish February 27, 2026 14:10
@Kh4ster Kh4ster added feature request New feature or request non-breaking Introduces a non-breaking change pdlp labels Feb 27, 2026

coderabbitai bot commented Feb 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds FP32 PDLP support and an optional mixed-precision SpMV path. Code is templated over floating type f_t, new CLI flags control precision and mixed-SpMV, FP32 matrix copies and mixed-SpMV helpers are added, and many template-instantiation guards were extended to enable float builds.

Changes

  • Benchmark / CLI — benchmarks/linear_programming/cuopt/run_pdlp.cu: Added --pdlp-fp32 and --mixed-precision-spmv flags; templated solver settings and introduced run_solver<f_t> with runtime precision dispatch; moved arg/handle init into the templated run path.
  • Solver Settings & Config — cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp, cpp/src/pdlp/solver_settings.cu, cpp/src/math_optimization/solver_settings.cu: Templated time_limit/tolerances to f_t, added mixed_precision_spmv flag, converted numeric literals/bounds to f_t(...).
  • PDLP Core & PDHG — cpp/src/pdlp/pdlp.cu, cpp/src/pdlp/pdhg.hpp, cpp/src/pdlp/pdhg.cu, cpp/src/pdlp/solve.cu: Added enable_mixed_precision_spmv param and propagation; PDHG now selects mixed-precision SpMV paths when enabled; added templated cublasGeam<f_t> wrapper; FP32 paths restricted (no crossover/other solvers).
  • cusparse_view / Mixed-SpMV Implementation — cpp/src/pdlp/cusparse_view.hpp, cpp/src/pdlp/cusparse_view.cu: Added FP32 copies/descriptors/buffers, mixed-precision SpMV helpers and optional preprocess (CUDA 12.4+), update_mixed_precision_matrices(), and a ctor parameter to enable the feature.
  • Template Instantiation Guards — many files under cpp/src/pdlp/... and cpp/src/mip_heuristics/...: Broadened float instantiation guards from MIP_INSTANTIATE_FLOAT to MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT.
  • Sparse / Matrix Utilities & Conversions — cpp/src/dual_simplex/sparse_matrix.cpp, cpp/src/mip_heuristics/presolve/third_party_presolve.cpp, cpp/src/pdlp/optimization_problem.cu: Added float explicit instantiations; introduced convert_vector<To,From> and f_t↔double conversions for PSLP/Papilo integration; adjusted host/vector handling and presolve/undo flows.
  • Solution I/O & Conversion — cpp/src/math_optimization/solution_writer.hpp, cpp/src/math_optimization/solution_writer.cu, cpp/src/pdlp/solution_conversion.cu, cpp/src/mip_heuristics/solver_solution.cu: Templated write_solution_to_sol_file<f_t> and updated signatures; added to_cpu_buffer helper; added explicit float/double instantiations and file-open safety; precision-aware formatting.
  • Utilities, Strategies & Restart Components — various cpp/src/mip_heuristics/... and cpp/src/pdlp/... files: Replaced raw numeric literals with f_t(...), cast time/fixrate calculations to f_t, and expanded instantiation guards to include PDLP_INSTANTIATE_FLOAT.
  • Tests — cpp/tests/linear_programming/pdlp_test.cu: Added mip_constants.hpp include; templated is_incorrect_objective<f_t>; added FP32 test variants and guarded float tests under MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT.
  • Docs — docs/cuopt/source/lp-qp-features.rst, docs/cuopt/source/lp-qp-milp-settings.rst: Documented FP32 PDLP mode and the Mixed Precision SpMV option, constraints, and defaults.
  • Build Constants — cpp/src/mip_heuristics/mip_constants.hpp: Added PDLP_INSTANTIATE_FLOAT macro and bumped header year.
  • Miscellaneous small edits — several cpp/src/pdlp/... and cpp/src/mip_heuristics/... files: Minor casts/templating fixes (e.g., typed literals f_t(1.0)), added includes, and adjusted a few instantiation/forward-declarations for float specializations.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 1.30%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: The pull request title clearly and concisely summarizes the main changes (adding FP32 and mixed precision support to PDLP) without vague or misleading content.
  • Description check — ✅ Passed: The description is comprehensive and directly related to the changeset; it explains both features, their usage, performance implications, and constraints, all of which align with the changes made throughout the codebase.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (6)
docs/cuopt/source/lp-qp-milp-settings.rst (1)

201-207: Prefer an explicit label for the Mixed Precision SpMV cross-reference.

Using a named anchor avoids fragile section-title references and keeps links stable if headings change.

📚 Suggested doc tweak
-For an alternative that maintains FP64 accuracy while improving performance, see :ref:`Mixed Precision SpMV`.
+For an alternative that maintains FP64 accuracy while improving performance, see :ref:`mixed-precision-spmv`.

+.. _mixed-precision-spmv:
 Mixed Precision SpMV
 ^^^^^^^^^^^^^^^^^^^^

As per coding guidelines for docs/**/*, documentation should prioritize consistency and clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/cuopt/source/lp-qp-milp-settings.rst` around lines 201 - 207, Add a
stable named anchor for the "Mixed Precision SpMV" section and use that label in
the earlier cross-reference: insert a label line like "..
_mixed-precision-spmv:" immediately before the "Mixed Precision SpMV" heading,
and replace the existing inline reference "see :ref:`Mixed Precision SpMV`" with
"see :ref:`mixed-precision-spmv`" (or another consistent label name) so the link
remains stable if the heading text changes.
cpp/src/mip_heuristics/presolve/third_party_presolve.cpp (1)

33-46: Avoid full extra copies in same-type vector conversion.

convert_vector takes const&, so when f_t == double, the double -> double path copies very large vectors (Ax, bounds, objective) unnecessarily before PSLP calls.

Proposed refactor
-template <typename To, typename From>
-std::vector<To> convert_vector(const std::vector<From>& src)
+template <typename To, typename From>
+std::vector<To> convert_vector(std::vector<From> src)
 {
   if constexpr (std::is_same_v<To, From>) {
-    return src;  // No conversion needed
+    return std::move(src);  // No conversion needed
   } else {
     std::vector<To> dst(src.size());
     for (size_t i = 0; i < src.size(); ++i) {
       dst[i] = static_cast<To>(src[i]);
     }
     return dst;
   }
 }
@@
-std::vector<double> h_coefficients = convert_vector<double>(h_coefficients_ft);
-std::vector<double> h_obj_coeffs   = convert_vector<double>(h_obj_coeffs_ft);
-std::vector<double> h_var_lb       = convert_vector<double>(h_var_lb_ft);
-std::vector<double> h_var_ub       = convert_vector<double>(h_var_ub_ft);
-std::vector<double> h_constr_lb    = convert_vector<double>(h_constr_lb_ft);
-std::vector<double> h_constr_ub    = convert_vector<double>(h_constr_ub_ft);
+std::vector<double> h_coefficients = convert_vector<double>(std::move(h_coefficients_ft));
+std::vector<double> h_obj_coeffs   = convert_vector<double>(std::move(h_obj_coeffs_ft));
+std::vector<double> h_var_lb       = convert_vector<double>(std::move(h_var_lb_ft));
+std::vector<double> h_var_ub       = convert_vector<double>(std::move(h_var_ub_ft));
+std::vector<double> h_constr_lb    = convert_vector<double>(std::move(h_constr_lb_ft));
+std::vector<double> h_constr_ub    = convert_vector<double>(std::move(h_constr_ub_ft));

As per coding guidelines "Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems".

Also applies to: 295-300

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp` around lines 33 -
46, The current template convert_vector<To,From>(const std::vector<From>&)
unconditionally returns a std::vector<To>, causing a full copy even when
To==From; fix this by adding an overload for the identical-type case: implement
template<typename T> const std::vector<T>& convert_vector(const std::vector<T>&
src) { return src; } and keep the existing two-type template for actual
conversions (std::vector<To> convert_vector(const std::vector<From>&) with
static_cast loop); update usages (e.g., where convert_vector is called for Ax,
bounds, objective and the other occurrence mentioned) to accept/handle a const
reference when available to avoid unnecessary large-vector copies.
benchmarks/linear_programming/cuopt/run_pdlp.cu (1)

193-198: Add fail-fast validation for unsupported flag combinations.

--pdlp-fp32 / --mixed-precision-spmv constraints are documented, but incompatible combinations are only surfaced later by solver validation. A small upfront check in main would return clearer CLI errors earlier and avoid unnecessary setup/parse work.

💡 Suggested guard in main
   bool use_fp32 = program.get<bool>("--pdlp-fp32");
+  const int method = program.get<int>("--method");
+  const bool crossover_enabled = program.get<int>("--crossover") != 0;
+  const bool mixed_spmv = program.get<bool>("--mixed-precision-spmv");
+
+  if (use_fp32 && (method != static_cast<int>(cuopt::linear_programming::method_t::PDLP) ||
+                   crossover_enabled)) {
+    std::cerr << "--pdlp-fp32 is only supported for PDLP method without crossover.\n";
+    return 1;
+  }
+  if (use_fp32 && mixed_spmv) {
+    std::cerr << "--mixed-precision-spmv has no effect in FP32 mode.\n";
+  }

   if (use_fp32) {
     return run_solver<float>(program, handle_);
   } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/linear_programming/cuopt/run_pdlp.cu` around lines 193 - 198, The
CLI currently defers reporting incompatible flag combinations until solver
validation; add an early fail-fast check in main: read the boolean for
"--pdlp-fp32" (use_fp32) and the boolean for "--mixed-precision-spmv" from
program (same API used for program.get<bool>(...)) before calling
run_solver<float>/run_solver<double>, and if the combination is unsupported
(e.g. use_fp32 && mixed-precision-spmv true) print a clear error to stderr and
return a non-zero exit code to abort setup immediately; place this guard just
before the existing branch that calls run_solver to avoid unnecessary
parsing/setup work.
cpp/tests/linear_programming/pdlp_test.cu (1)

1930-2055: Add one FP32 test with mixed_precision_spmv=true to verify no-op behavior.

The PR contract says mixed precision SpMV has no effect in FP32 mode. Adding that explicit case would lock this behavior.

As per coding guidelines, **/*test*.{cpp,cu,py}: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 1930 - 2055, Add a
new FP32 unit test mirroring the existing
float32_papilo_presolve_works/run_float32 pattern that sets
solver_settings.mixed_precision_spmv = true and verifies numerical correctness
(expect CUOPT_TERMINATION_STATUS_OPTIMAL and that
solution.get_additional_termination_information().primal_objective matches
afiro_primal_objective_f32); locate and modify the pdlp_test.cu tests around
TEST(pdlp_class, run_float32) or TEST(pdlp_class, float32_papilo_presolve_works)
to add TEST(pdlp_class, float32_mixed_precision_spmv_noop) which constructs
op_problem via cuopt::mps_parser::parse_mps<int,float>, sets
solver_settings.method = cuopt::linear_programming::method_t::PDLP and
solver_settings.mixed_precision_spmv = true, calls solve_lp(&handle_,
op_problem, solver_settings), and asserts both termination status is
CUOPT_TERMINATION_STATUS_OPTIMAL and the primal objective equals
afiro_primal_objective_f32 to guarantee the mixed-precision flag is a no-op for
FP32.
cpp/src/pdlp/cusparse_view.cu (1)

1091-1091: Consider removing unnecessary synchronization in initialization setup.

Line 1091 synchronizes after mixed precision matrix transforms, but no CPU-side code depends on the transformed data before the solve loop. Downstream initialization functions (compute_initial_step_size, compute_initial_primal_weight) use scaled coefficients, not the float matrices. The mixed precision matrices are consumed only later in GPU kernels on the same stream, where stream ordering ensures correctness without explicit host-blocking synchronization.

Note: This is initialization code, not a hot solver iteration path, so the performance impact is minimal. However, the synchronization may be redundant for correctness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/cusparse_view.cu` at line 1091, Remove the blocking host
synchronization call handle_ptr_->get_stream().synchronize() after creating the
mixed-precision matrices; instead rely on CUDA stream ordering so GPU kernels
that consume the mixed-precision data run on the same stream without host-side
synchronize. Locate the call in cusparse_view.cu (the initialization path where
mixed-precision transforms are performed) and delete the synchronize()
invocation; keep the rest of the initialization (including
compute_initial_step_size and compute_initial_primal_weight) unchanged since
they only use scaled coefficients on the host and do not require the float
matrices to be host-visible before the solve kernels execute.
cpp/src/pdlp/pdlp.cu (1)

2974-2996: Consider adding a clarifying comment for the asymmetric instantiation macros.

The PDLP_INSTANTIATE_FLOAT vs MIP_INSTANTIATE_DOUBLE asymmetry is intentional (FP32 is PDLP-specific, while MIP always uses FP64 and relies on PDLP double instantiation). A brief comment would help future maintainers understand this design decision.

📝 Suggested documentation
+// PDLP supports both FP32 and FP64 precision modes.
+// Float instantiation is controlled by PDLP_INSTANTIATE_FLOAT (PDLP-specific).
+// Double instantiation uses MIP_INSTANTIATE_DOUBLE since MIP depends on PDLP<double>.
 #if PDLP_INSTANTIATE_FLOAT
 template class pdlp_solver_t<int, float>;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/pdlp.cu` around lines 2974 - 2996, Add a short clarifying
comment above the asymmetric instantiation blocks explaining why
PDLP_INSTANTIATE_FLOAT and MIP_INSTANTIATE_DOUBLE differ: FP32 instantiation is
PDLP-specific while MIP always uses FP64 and depends on PDLP double
instantiation; place the comment immediately above the two preprocessor blocks
that instantiate pdlp_solver_t and the
compute_weights_initial_primal_weight_from_squared_norms function templates so
future maintainers see the rationale for the asymmetry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/math_optimization/solver_settings.cu`:
- Around line 74-75: The FP32 default tolerances
(CUOPT_PRIMAL_INFEASIBLE_TOLERANCE, CUOPT_DUAL_INFEASIBLE_TOLERANCE and
CUOPT_MIP_ABSOLUTE_GAP) use f_t(1e-10), which is far below float machine
epsilon (~1.19e-7) and therefore effectively unattainable as a tolerance in
FP32 arithmetic; update the default construction to clamp to the type's
machine epsilon (e.g., replace raw f_t(1e-10) with std::max(f_t(1e-10),
std::numeric_limits<f_t>::epsilon())) or choose separate defaults for float vs
double, so pdlp_settings.tolerances.primal_infeasible_tolerance and
dual_infeasible_tolerance (and the MIP absolute gap default) are meaningful
in FP32.

In `@cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu`:
- Around line 182-184: The float instantiation guard in simple_rounding.cu is
using PDLP_INSTANTIATE_FLOAT which is inconsistent with the double guard
(MIP_INSTANTIATE_DOUBLE) and other MIP files; update the preprocessor check
around the INSTANTIATE(float) call to use MIP_INSTANTIATE_FLOAT to match the
file's MIP-specific instantiation pattern (or, only if this code truly needs to
be shared with PDLP, use the combined guard MIP_INSTANTIATE_FLOAT ||
PDLP_INSTANTIATE_FLOAT) so that INSTANTIATE(float) follows the same macro
convention as INSTANTIATE(double) and other MIP files.

In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 599-667: The code unconditionally allocates mixed-precision
buffers and creates mixed cuSPARSE descriptors (A_float_, A_T_float_, A_mixed_,
A_T_mixed_, buffer_non_transpose_mixed_, buffer_transpose_mixed_, and calls to
mixed_precision_spmv_buffersize) whenever enable_mixed_precision_spmv is true,
which wastes memory for batch paths that use SpMM on the FP64 descriptors; wrap
the entire mixed-precision allocation block so it only runs when mixed precision
is enabled AND the current execution will use SpMV (e.g., !is_batch_mode or a
flag like use_spmv) — move the allocations, cub::DeviceTransform calls,
cusparseCreateCsr calls, and buffer_size computations inside that guard and skip
them in batch/SpMM execution paths to avoid unnecessary memory use.
- Around line 606-616: The two unwrapped calls to
cub::DeviceTransform::Transform that convert doubles to floats (the calls using
double_to_float_functor with sources op_problem_scaled.coefficients and A_T_.
and destinations A_float_ and A_T_float_ respectively, using
handle_ptr->get_stream().value()) must be wrapped with RAFT_CUDA_TRY to surface
CUDA errors; locate the Transform invocations and wrap each call with
RAFT_CUDA_TRY( ... ) so any returned cudaError_t is checked, and apply the same
change for the identical pair of Transform calls found later in the file.

In `@cpp/src/pdlp/cusparse_view.hpp`:
- Around line 201-207: Replace the raw cusparseSpMatDescr_t handles A_mixed_ and
A_T_mixed_ with the existing RAII wrapper type cusparse_sp_mat_descr_wrapper_t
(i.e., change their declarations to use cusparse_sp_mat_descr_wrapper_t instead
of cusparseSpMatDescr_t), update the code paths that call cusparseCreateCsr to
assign into these wrappers, and remove any manual calls to cusparseDestroySpMat
(relying on the wrapper destructor). Ensure all references to A_mixed_,
A_T_mixed_, cusparseCreateCsr and mixed_precision_enabled_ compile against the
wrapper API so the descriptors are automatically destroyed.

In `@cpp/src/pdlp/pdhg.hpp`:
- Around line 32-33: The constructor signature in pdhg.hpp currently defaults
enable_mixed_precision_spmv to true, which can silently change behavior; change
the default to false at the constructor boundary by updating the parameter
default from enable_mixed_precision_spmv = true to enable_mixed_precision_spmv =
false in the PDHG (pdhg) constructor declaration so callers that omit this
argument keep the documented default-off behavior.

In `@cpp/src/pdlp/solve.cu`:
- Around line 607-612: The FP32+crossover unsupported check currently inside the
template block (f_t) runs too late; move the validation into the beginning of
run_pdlp (or the earlier dispatch function that selects the f_t instantiation)
so invalid configurations fail fast. Specifically, add a guard that mirrors the
existing cuopt_expects(!settings.crossover, ...) check at the top of run_pdlp
(or immediately before selecting/instantiating the PDLP template) to throw a
ValidationError when std::is_same_v<f_t, float> (or when dispatch would pick
f_t=float) and settings.crossover==true; reference run_pdlp, the dispatch call
site for f_t, and the existing cuopt_expects check to locate and duplicate the
logic early in the call path.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b4d2726 and 64c5457.

📒 Files selected for processing (42)
  • benchmarks/linear_programming/cuopt/run_pdlp.cu
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
  • cpp/src/dual_simplex/sparse_matrix.cpp
  • cpp/src/math_optimization/solution_writer.cu
  • cpp/src/math_optimization/solution_writer.hpp
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/src/mip_heuristics/diversity/lns/rins.cu
  • cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
  • cpp/src/mip_heuristics/mip_constants.hpp
  • cpp/src/mip_heuristics/presolve/gf2_presolve.cpp
  • cpp/src/mip_heuristics/presolve/third_party_presolve.cpp
  • cpp/src/mip_heuristics/problem/presolve_data.cu
  • cpp/src/mip_heuristics/problem/problem.cu
  • cpp/src/mip_heuristics/solution/solution.cu
  • cpp/src/mip_heuristics/solver_solution.cu
  • cpp/src/pdlp/cpu_pdlp_warm_start_data.cu
  • cpp/src/pdlp/cusparse_view.cu
  • cpp/src/pdlp/cusparse_view.hpp
  • cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/pdlp/optimal_batch_size_handler/optimal_batch_size_handler.cu
  • cpp/src/pdlp/optimization_problem.cu
  • cpp/src/pdlp/pdhg.cu
  • cpp/src/pdlp/pdhg.hpp
  • cpp/src/pdlp/pdlp.cu
  • cpp/src/pdlp/pdlp_warm_start_data.cu
  • cpp/src/pdlp/restart_strategy/localized_duality_gap_container.cu
  • cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/pdlp/restart_strategy/weighted_average_solution.cu
  • cpp/src/pdlp/saddle_point.cu
  • cpp/src/pdlp/solution_conversion.cu
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/solver_settings.cu
  • cpp/src/pdlp/solver_solution.cu
  • cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
  • cpp/src/pdlp/termination_strategy/convergence_information.cu
  • cpp/src/pdlp/termination_strategy/infeasibility_information.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.cu
  • cpp/src/pdlp/translate.hpp
  • cpp/src/pdlp/utilities/problem_checking.cu
  • cpp/tests/linear_programming/pdlp_test.cu
  • docs/cuopt/source/lp-qp-features.rst
  • docs/cuopt/source/lp-qp-milp-settings.rst

Contributor Author

Kh4ster commented Feb 27, 2026

/ok to test 366fd6a


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/tests/linear_programming/pdlp_test.cu (1)

54-59: ⚠️ Potential issue | 🟠 Major

Add NaN/Inf guard to is_incorrect_objective to prevent masked solver regressions.

The function accepts NaN values silently. When objective is NaN, std::abs(objective) returns NaN, and the comparisons evaluate to false, causing the helper to incorrectly report the objective as valid. This can mask solver regressions. The codebase already uses std::isfinite extensively in solver components (pdlp, feasibility_jump, barrier, etc.) for this exact purpose.

Suggested fix
 template <typename f_t>
 static bool is_incorrect_objective(f_t reference, f_t objective)
 {
+  if (!std::isfinite(reference) || !std::isfinite(objective)) { return true; }
   if (reference == 0) { return std::abs(objective) > 0.01; }
   if (objective == 0) { return std::abs(reference) > 0.01; }
   return std::abs((reference - objective) / reference) > 0.01;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 54 - 59, The helper
is_incorrect_objective must treat NaN/Inf as incorrect: at the start of
is_incorrect_objective(f_t reference, f_t objective) check
std::isfinite(reference) and std::isfinite(objective) and return true if either
is not finite, then proceed with the existing zero checks and
relative-difference logic; update includes/usings if necessary to ensure
std::isfinite is available and referenced.
🧹 Nitpick comments (4)
cpp/src/pdlp/cpu_pdlp_warm_start_data.cu (1)

16-39: Consider batching synchronization for better performance (optional).

Each helper function synchronizes after its copy operation. When convert_to_cpu_warmstart or convert_to_gpu_warmstart copies 9 vector fields, this results in 9 separate synchronization points.

For warm start data that isn't in a hot path, this is acceptable. However, if performance becomes a concern, consider refactoring to batch all copies and synchronize once at the end of the conversion functions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/cpu_pdlp_warm_start_data.cu` around lines 16 - 39, The helpers
device_to_host_vector and host_to_device_vector currently call
stream.synchronize() after each raft::copy, causing multiple sync points when
convert_to_cpu_warmstart/convert_to_gpu_warmstart copy many fields; change the
helpers to avoid per-copy synchronization by either (a) removing the
stream.synchronize() calls and documenting that the caller must synchronize the
stream once after batching copies, or (b) adding an optional parameter (e.g.
bool do_sync = true) so callers can disable per-copy sync and perform a single
stream.synchronize() at the end of
convert_to_cpu_warmstart/convert_to_gpu_warmstart; update those convert_*
functions to batch the copies and call stream.synchronize() once.
cpp/tests/linear_programming/pdlp_test.cu (1)

1931-2054: Add an explicit FP32 “mixed_precision_spmv is a no-op” test.

PR behavior states mixed-precision SpMV has no effect in FP32 mode, but this block doesn’t directly assert that contract yet.

Suggested test addition
+TEST(pdlp_class, float32_mixed_precision_spmv_no_effect)
+{
+  const raft::handle_t handle_{};
+  auto path = make_path_absolute("linear_programming/afiro_original.mps");
+  cuopt::mps_parser::mps_data_model_t<int, float> op_problem =
+    cuopt::mps_parser::parse_mps<int, float>(path, true);
+
+  auto settings_base   = pdlp_solver_settings_t<int, float>{};
+  settings_base.method = cuopt::linear_programming::method_t::PDLP;
+  settings_base.mixed_precision_spmv = false;
+
+  auto settings_mixed = settings_base;
+  settings_mixed.mixed_precision_spmv = true;
+
+  auto solution_base  = solve_lp(&handle_, op_problem, settings_base);
+  auto solution_mixed = solve_lp(&handle_, op_problem, settings_mixed);
+
+  EXPECT_EQ((int)solution_base.get_termination_status(), CUOPT_TERIMINATION_STATUS_OPTIMAL);
+  EXPECT_EQ((int)solution_mixed.get_termination_status(), CUOPT_TERIMINATION_STATUS_OPTIMAL);
+  EXPECT_NEAR(solution_base.get_additional_termination_information().primal_objective,
+              solution_mixed.get_additional_termination_information().primal_objective,
+              1e-2f);
+}

As per coding guidelines: **/*test*.{cpp,cu,py}: Write tests validating numerical correctness of optimization results (not just 'runs without error').

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 1931 - 2054, Add a
test that explicitly verifies mixed-precision SpMV is a no-op in FP32: create a
pdlp_solver_settings_t<int,float> (same as in run_float32) set method =
cuopt::linear_programming::method_t::PDLP and set
solver_settings.mixed_precision_spmv = true, call solve_lp(&handle_, op_problem,
solver_settings), then assert termination status equals
CUOPT_TERIMINATION_STATUS_OPTIMAL and that
solution.get_additional_termination_information().primal_objective matches
afiro_primal_objective_f32 (use is_incorrect_objective to check numerical
equality), mirroring the checks in the existing run_float32/papilo/pslp tests so
the behavior is explicitly tested.
cpp/src/pdlp/pdhg.cu (1)

254-280: Consolidate repeated mixed-vs-standard SpMV dispatch into one helper.

The same precision-dispatch branch is repeated in three methods. A shared helper would reduce drift risk and keep future algorithm/descriptor changes in one place.

♻️ Suggested refactor sketch
+template <typename i_t, typename f_t>
+template <typename MatStd, typename MatMixed, typename VecX, typename VecY, typename BufStd, typename BufMixed>
+inline void pdhg_solver_t<i_t, f_t>::run_spmv_dispatch(MatStd mat_std,
+                                                        MatMixed mat_mixed,
+                                                        VecX vec_x,
+                                                        VecY vec_y,
+                                                        BufStd buf_std,
+                                                        BufMixed buf_mixed)
+{
+  if constexpr (std::is_same_v<f_t, double>) {
+    if (cusparse_view_.mixed_precision_enabled_) {
+      mixed_precision_spmv(handle_ptr_->get_cusparse_handle(),
+                           CUSPARSE_OPERATION_NON_TRANSPOSE,
+                           reusable_device_scalar_value_1_.data(),
+                           mat_mixed,
+                           vec_x,
+                           reusable_device_scalar_value_0_.data(),
+                           vec_y,
+                           CUSPARSE_SPMV_CSR_ALG2,
+                           buf_mixed,
+                           stream_view_);
+      return;
+    }
+  }
+  RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmv(handle_ptr_->get_cusparse_handle(),
+                                                        CUSPARSE_OPERATION_NON_TRANSPOSE,
+                                                        reusable_device_scalar_value_1_.data(),
+                                                        mat_std,
+                                                        vec_x,
+                                                        reusable_device_scalar_value_0_.data(),
+                                                        vec_y,
+                                                        CUSPARSE_SPMV_CSR_ALG2,
+                                                        (f_t*)buf_std,
+                                                        stream_view_));
+}

As per coding guidelines, Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication.

Also applies to: 308-334, 356-382

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/pdhg.cu` around lines 254 - 280, Duplicate precision-dispatch
SpMV logic should be extracted into a single helper: create a templated utility
(e.g., spmv_dispatch or dispatch_spmv) that accepts the cusparse handle
(handle_ptr_->get_cusparse_handle()), operation flag, alpha/beta device scalars
(reusable_device_scalar_value_1_.data(),
reusable_device_scalar_value_0_.data()), both mixed and standard matrix pointers
(cusparse_view_.A_mixed_, cusparse_view_.A), input/output buffers
(cusparse_view_.tmp_primal, cusparse_view_.dual_gradient), algorithm/buffer args
(CUSPARSE_SPMV_CSR_ALG2 and both
buffer_non_transpose_mixed_.data()/buffer_non_transpose.data()), and the stream
(stream_view_), then replace the repeated branches in pdhg.cu (and the other two
occurrences) with calls to that helper which checks
cusparse_view_.mixed_precision_enabled_ and calls mixed_precision_spmv or
raft::sparse::detail::cusparsespmv accordingly.
cpp/src/pdlp/cusparse_view.cu (1)

1069-1082: Remove the blocking stream synchronization after mixed-precision matrix transforms.

Line 1081's handle_ptr_->get_stream().synchronize() blocks the host after submitting two matrix transforms (A_A_float_ and A_T_A_T_float_) to the same GPU stream. Since subsequent mixed-precision SpMV operations are queued on the same stream, the hardware will ensure correct ordering without explicit host stall. Removing this synchronization eliminates an unnecessary pipeline blockage in the hot path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/cusparse_view.cu` around lines 1069 - 1082, The explicit
host-side stall is caused by handle_ptr_->get_stream().synchronize() after
launching two cub::DeviceTransform::Transform calls that convert A_→A_float_ and
A_T_→A_T_float_; remove the synchronize call so the transforms run
asynchronously on the same stream (GPU will preserve ordering) and let
subsequent mixed-precision SpMV kernels queue on that stream without blocking
the host; ensure no other code relies on the host wait and delete only the
handle_ptr_->get_stream().synchronize() invocation referenced here.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/pdlp/solve.cu`:
- Around line 1151-1156: The current hard validation using cuopt_expects blocks
float-precision users; update solve_lp() to auto-map float precision to
method_t::PDLP when the user has not explicitly selected an incompatible method
(i.e., settings.method is still method_t::Concurrent/default) instead of
failing: detect float precision and if settings.method == method_t::Concurrent,
set settings.method = method_t::PDLP and emit a log/info message about the
override; keep the existing cuopt_expects check only for the case where the user
explicitly set an incompatible method (e.g., DualSimplex/Barrier) so an error is
still raised for explicit invalid combinations. Ensure you reference and modify
the code paths around solve_lp(), settings.method, and the float-precision
branch that currently calls run_pdlp().

In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Line 23: Update the preprocessor guard that gates the PDLP float tests:
replace the overly-broad conditional that uses MIP_INSTANTIATE_FLOAT ||
PDLP_INSTANTIATE_FLOAT with a check that only uses PDLP_INSTANTIATE_FLOAT so
PDLP float tests compile only when PDLP is instantiated for float; locate the
conditional in pdlp_test.cu (the lines surrounding the existing
`#if MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT`) and change it to
`#if PDLP_INSTANTIATE_FLOAT`, leaving the rest of the test code unchanged.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64c5457 and 366fd6a.

📒 Files selected for processing (31)
  • cpp/src/dual_simplex/sparse_matrix.cpp
  • cpp/src/math_optimization/solution_writer.cu
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
  • cpp/src/mip_heuristics/problem/problem.cu
  • cpp/src/mip_heuristics/solution/solution.cu
  • cpp/src/mip_heuristics/solver_solution.cu
  • cpp/src/pdlp/cpu_pdlp_warm_start_data.cu
  • cpp/src/pdlp/cusparse_view.cu
  • cpp/src/pdlp/cusparse_view.hpp
  • cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/pdlp/optimal_batch_size_handler/optimal_batch_size_handler.cu
  • cpp/src/pdlp/optimization_problem.cu
  • cpp/src/pdlp/pdhg.cu
  • cpp/src/pdlp/pdhg.hpp
  • cpp/src/pdlp/pdlp.cu
  • cpp/src/pdlp/pdlp_warm_start_data.cu
  • cpp/src/pdlp/restart_strategy/localized_duality_gap_container.cu
  • cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/pdlp/restart_strategy/weighted_average_solution.cu
  • cpp/src/pdlp/saddle_point.cu
  • cpp/src/pdlp/solution_conversion.cu
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/solver_settings.cu
  • cpp/src/pdlp/solver_solution.cu
  • cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
  • cpp/src/pdlp/termination_strategy/convergence_information.cu
  • cpp/src/pdlp/termination_strategy/infeasibility_information.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.cu
  • cpp/src/pdlp/utilities/problem_checking.cu
  • cpp/tests/linear_programming/pdlp_test.cu
🚧 Files skipped from review as they are similar to previous changes (15)
  • cpp/src/mip_heuristics/solution/solution.cu
  • cpp/src/pdlp/optimal_batch_size_handler/optimal_batch_size_handler.cu
  • cpp/src/pdlp/pdhg.hpp
  • cpp/src/pdlp/solution_conversion.cu
  • cpp/src/pdlp/pdlp_warm_start_data.cu
  • cpp/src/pdlp/solver_solution.cu
  • cpp/src/pdlp/saddle_point.cu
  • cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.cu
  • cpp/src/pdlp/termination_strategy/infeasibility_information.cu
  • cpp/src/pdlp/solver_settings.cu
  • cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/pdlp/restart_strategy/weighted_average_solution.cu
  • cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
  • cpp/src/mip_heuristics/problem/problem.cu

@anandhkb anandhkb added this to the 26.04 milestone Feb 27, 2026
@Kh4ster
Contributor Author

Kh4ster commented Feb 27, 2026

/ok to test 2cb9ce5

@Kh4ster
Contributor Author

Kh4ster commented Mar 2, 2026

/ok to test 410ec3c


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
cpp/src/pdlp/solution_conversion.cu (1)

138-146: Add explicit <type_traits> include for std::is_same_v.

std::is_same_v is used at line 142 but the file lacks a direct include for <type_traits>. Adding this include avoids reliance on transitive headers, which can be fragile if upstream dependencies change.

Suggested patch
 #include <rmm/device_buffer.hpp>
 #include <rmm/device_uvector.hpp>
+#include <type_traits>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solution_conversion.cu` around lines 138 - 146, Add a direct
include for <type_traits> at the top of the translation unit so std::is_same_v
used in the to_cpu_buffer template is defined without relying on transitive
includes; update the includes in cpp/src/pdlp/solution_conversion.cu (before the
anonymous-namespace template<typename f_t> cuopt::cython::cpu_buffer
to_cpu_buffer(std::vector<f_t>& src)) to add the header for <type_traits>.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/pdlp/optimization_problem.cu`:
- Around line 1508-1978: This file contains a duplicated block of template
member-function definitions (including set_objective_name, set_problem_name,
set_variable_names, set_row_names, get_n_variables, get_n_constraints, get_nnz,
get_n_integers, get_handle_ptr, all get_* accessors, view(), set_maximize,
write_to_mps, print_scaling_information, has_quadratic_objective, and the
conflicting empty()) that re-defines methods already implemented earlier; remove
this duplicate block so each method (e.g.,
optimization_problem_t<i_t,f_t>::set_objective_name, ::write_to_mps,
::print_scaling_information, ::has_quadratic_objective, and ::empty) is defined
only once, keeping the intended canonical implementations (resolve the empty()
conflict by retaining the earlier definition), rebuild to ensure no
ODR/redefinition errors, and run tests.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abf13f3 and 410ec3c.

📒 Files selected for processing (6)
  • benchmarks/linear_programming/cuopt/run_pdlp.cu
  • cpp/src/pdlp/optimization_problem.cu
  • cpp/src/pdlp/solution_conversion.cu
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/translate.hpp
  • cpp/tests/linear_programming/pdlp_test.cu
🚧 Files skipped from review as they are similar to previous changes (2)
  • cpp/tests/linear_programming/pdlp_test.cu
  • cpp/src/pdlp/translate.hpp

@Kh4ster
Contributor Author

Kh4ster commented Mar 3, 2026

/ok to test a3dd383

@Kh4ster
Contributor Author

Kh4ster commented Mar 3, 2026

/ok to test 14f9052

Contributor

@aliceb-nv aliceb-nv left a comment


LGTM, thanks for the extensive work Nicolas :)
Just a few minor stylistic nitpicks.

Comment on lines +855 to +871
std::vector<f_t> h_primal_out(n_cols);
std::vector<f_t> h_dual_out(n_rows);
std::vector<f_t> h_reduced_costs_out(n_cols);
for (int i = 0; i < n_cols; ++i) {
  h_primal_out[i]        = static_cast<f_t>(uncrushed_sol->x[i]);
  h_reduced_costs_out[i] = static_cast<f_t>(uncrushed_sol->z[i]);
}
for (int i = 0; i < n_rows; ++i) {
  h_dual_out[i] = static_cast<f_t>(uncrushed_sol->y[i]);
}

primal_solution.resize(n_cols, stream_view);
dual_solution.resize(n_rows, stream_view);
reduced_costs.resize(n_cols, stream_view);
raft::copy(primal_solution.data(), h_primal_out.data(), n_cols, stream_view);
raft::copy(dual_solution.data(), h_dual_out.data(), n_rows, stream_view);
raft::copy(reduced_costs.data(), h_reduced_costs_out.data(), n_cols, stream_view);

nit: it might be possible to unify that part of the function with thrust::transform calls, which would perform the implicit conversion automatically through the iterators. It would materialize as a kernel rather than a memcpy, but I don't think that matters much here.


Come to think of it, we can probably do the same for the input as well. That might let us get rid of most of the constexpr branches.

Comment on lines +356 to +382
if constexpr (std::is_same_v<f_t, double>) {
  if (cusparse_view_.mixed_precision_enabled_) {
    mixed_precision_spmv(handle_ptr_->get_cusparse_handle(),
                         CUSPARSE_OPERATION_NON_TRANSPOSE,
                         reusable_device_scalar_value_1_.data(),
                         cusparse_view_.A_mixed_,
                         cusparse_view_.reflected_primal_solution,
                         reusable_device_scalar_value_0_.data(),
                         cusparse_view_.dual_gradient,
                         CUSPARSE_SPMV_CSR_ALG2,
                         cusparse_view_.buffer_non_transpose_mixed_.data(),
                         stream_view_);
  }
}
if (!cusparse_view_.mixed_precision_enabled_) {
  RAFT_CUSPARSE_TRY(
    raft::sparse::detail::cusparsespmv(handle_ptr_->get_cusparse_handle(),
                                       CUSPARSE_OPERATION_NON_TRANSPOSE,
                                       reusable_device_scalar_value_1_.data(),
                                       cusparse_view_.A,
                                       cusparse_view_.reflected_primal_solution,
                                       reusable_device_scalar_value_0_.data(),
                                       cusparse_view_.dual_gradient,
                                       CUSPARSE_SPMV_CSR_ALG2,
                                       (f_t*)cusparse_view_.buffer_non_transpose.data(),
                                       stream_view_));
}

Another nit: it might be possible to unify this into a single A_spmv function that takes an option specifying whether to use A or its transpose. Not fully confident it's worth the code changes, though.

@Kh4ster
Contributor Author

Kh4ster commented Mar 3, 2026

/ok to test ab9e8eb

Contributor

@rg20 rg20 left a comment


I think we should move the precision logic inside the solve and not change or add new APIs.

}

#if MIP_INSTANTIATE_FLOAT
#if MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT

If this is done in all the files, we can just have one flag, CUOPT_INSTANTIATE_FLOAT

cpu.primal_solution_ = std::move(primal_solution_);
cpu.dual_solution_ = std::move(dual_solution_);
cpu.reduced_cost_ = std::move(reduced_cost_);
cpu.primal_solution_ = to_cpu_buffer(primal_solution_);

I don't think we should carry the mixed-precision logic this far up. The conversion should have happened much earlier.

if (problem.maximize) {
adjust_dual_solution_and_reduced_cost(
final_dual_solution, final_reduced_cost, problem.handle_ptr->get_stream());
if constexpr (std::is_same_v<f_t, double>) {

Does this mean crossover is disabled for single precision? I don't think we want to make that decision.


// Explicit template instantiations for remote execution stubs
#if MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT
template std::unique_ptr<lp_solution_interface_t<int, float>> solve_lp_remote(

The APIs should still be in double precision; only the inner solves should change.

* Convergence checking and restarts always use the full FP64 matrix, so this does
* not reduce overall memory usage. Has no effect in FP32 mode.
*/
bool mixed_precision_spmv{false};

Is there another flag for FP32?

I think we should just have a --pdlp-precision flag with the options default, single, double, and mixed. This allows us to change the definition of mixed in the future.


Users can submit a set of problems which will be solved in a batch. Problems will be solved at the same time in parallel to fully utilize the GPU. Checkout :ref:`self-hosted client <generic-example-with-normal-and-batch-mode>` example in thin client.

FP32 Precision Mode

Let's just have one parameter for precision.


Labels

feature request (New feature or request), non-breaking (Introduces a non-breaking change), pdlp
