Add support for FP32 and mixed precision in PDLP #910
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

Adds FP32 PDLP support and an optional mixed-precision SpMV path. Code is templated over floating type `f_t`, new CLI flags control precision and mixed SpMV, FP32 matrix copies and mixed-SpMV helpers are added, and many template-instantiation guards were extended to enable float builds.
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Actionable comments posted: 7
🧹 Nitpick comments (6)
docs/cuopt/source/lp-qp-milp-settings.rst (1)
201-207: Prefer an explicit label for the Mixed Precision SpMV cross-reference.

Using a named anchor avoids fragile section-title references and keeps links stable if headings change.
📚 Suggested doc tweak
```diff
-For an alternative that maintains FP64 accuracy while improving performance, see :ref:`Mixed Precision SpMV`.
+For an alternative that maintains FP64 accuracy while improving performance, see :ref:`mixed-precision-spmv`.
 
+.. _mixed-precision-spmv:
+
 Mixed Precision SpMV
 ^^^^^^^^^^^^^^^^^^^^
```

As per coding guidelines for `docs/**/*`, documentation should prioritize consistency and clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/cuopt/source/lp-qp-milp-settings.rst` around lines 201 - 207, Add a stable named anchor for the "Mixed Precision SpMV" section and use that label in the earlier cross-reference: insert a label line like ".. _mixed-precision-spmv:" immediately before the "Mixed Precision SpMV" heading, and replace the existing inline reference "see :ref:`Mixed Precision SpMV`" with "see :ref:`mixed-precision-spmv`" (or another consistent label name) so the link remains stable if the heading text changes.

cpp/src/mip_heuristics/presolve/third_party_presolve.cpp (1)
33-46: Avoid full extra copies in same-type vector conversion.
`convert_vector` takes `const&`, so when `f_t == double`, the `double -> double` path copies very large vectors (Ax, bounds, objective) unnecessarily before PSLP calls.

Proposed refactor
```diff
-template <typename To, typename From>
-std::vector<To> convert_vector(const std::vector<From>& src)
+template <typename To, typename From>
+std::vector<To> convert_vector(std::vector<From> src)
 {
   if constexpr (std::is_same_v<To, From>) {
-    return src;  // No conversion needed
+    return std::move(src);  // No conversion needed
   } else {
     std::vector<To> dst(src.size());
     for (size_t i = 0; i < src.size(); ++i) {
       dst[i] = static_cast<To>(src[i]);
     }
     return dst;
   }
 }
@@
-std::vector<double> h_coefficients = convert_vector<double>(h_coefficients_ft);
-std::vector<double> h_obj_coeffs   = convert_vector<double>(h_obj_coeffs_ft);
-std::vector<double> h_var_lb       = convert_vector<double>(h_var_lb_ft);
-std::vector<double> h_var_ub       = convert_vector<double>(h_var_ub_ft);
-std::vector<double> h_constr_lb    = convert_vector<double>(h_constr_lb_ft);
-std::vector<double> h_constr_ub    = convert_vector<double>(h_constr_ub_ft);
+std::vector<double> h_coefficients = convert_vector<double>(std::move(h_coefficients_ft));
+std::vector<double> h_obj_coeffs   = convert_vector<double>(std::move(h_obj_coeffs_ft));
+std::vector<double> h_var_lb       = convert_vector<double>(std::move(h_var_lb_ft));
+std::vector<double> h_var_ub       = convert_vector<double>(std::move(h_var_ub_ft));
+std::vector<double> h_constr_lb    = convert_vector<double>(std::move(h_constr_lb_ft));
+std::vector<double> h_constr_ub    = convert_vector<double>(std::move(h_constr_ub_ft));
```

As per coding guidelines "Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems".
Also applies to: 295-300
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp` around lines 33 - 46, The current template convert_vector<To,From>(const std::vector<From>&) unconditionally returns a std::vector<To>, causing a full copy even when To==From; fix this by adding an overload for the identical-type case: implement template<typename T> const std::vector<T>& convert_vector(const std::vector<T>& src) { return src; } and keep the existing two-type template for actual conversions (std::vector<To> convert_vector(const std::vector<From>&) with static_cast loop); update usages (e.g., where convert_vector is called for Ax, bounds, objective and the other occurrence mentioned) to accept/handle a const reference when available to avoid unnecessary large-vector copies.

benchmarks/linear_programming/cuopt/run_pdlp.cu (1)
193-198: Add fail-fast validation for unsupported flag combinations.
`--pdlp-fp32`/`--mixed-precision-spmv` constraints are documented, but incompatible combinations are only surfaced later by solver validation. A small upfront check in `main` would return clearer CLI errors earlier and avoid unnecessary setup/parse work.

💡 Suggested guard in `main`

```diff
 bool use_fp32 = program.get<bool>("--pdlp-fp32");
+const int method             = program.get<int>("--method");
+const bool crossover_enabled = program.get<int>("--crossover") != 0;
+const bool mixed_spmv        = program.get<bool>("--mixed-precision-spmv");
+
+if (use_fp32 && (method != static_cast<int>(cuopt::linear_programming::method_t::PDLP) ||
+                 crossover_enabled)) {
+  std::cerr << "--pdlp-fp32 is only supported for PDLP method without crossover.\n";
+  return 1;
+}
+if (use_fp32 && mixed_spmv) {
+  std::cerr << "--mixed-precision-spmv has no effect in FP32 mode.\n";
+}
 if (use_fp32) {
   return run_solver<float>(program, handle_);
 } else {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/linear_programming/cuopt/run_pdlp.cu` around lines 193 - 198, The CLI currently defers reporting incompatible flag combinations until solver validation; add an early fail-fast check in main: read the boolean for "--pdlp-fp32" (use_fp32) and the boolean for "--mixed-precision-spmv" from program (same API used for program.get<bool>(...)) before calling run_solver<float>/run_solver<double>, and if the combination is unsupported (e.g. use_fp32 && mixed-precision-spmv true) print a clear error to stderr and return a non-zero exit code to abort setup immediately; place this guard just before the existing branch that calls run_solver to avoid unnecessary parsing/setup work.

cpp/tests/linear_programming/pdlp_test.cu (1)
1930-2055: Add one FP32 test with `mixed_precision_spmv=true` to verify no-op behavior.

The PR contract says mixed precision SpMV has no effect in FP32 mode. Adding that explicit case would lock this behavior.
As per coding guidelines,
**/*test*.{cpp,cu,py}: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 1930 - 2055, Add a new FP32 unit test mirroring the existing float32_papilo_presolve_works/run_float32 pattern that sets solver_settings.mixed_precision_spmv = true and verifies numerical correctness (expect CUOPT_TERIMINATION_STATUS_OPTIMAL and that solution.get_additional_termination_information().primal_objective matches afiro_primal_objective_f32); locate and modify the pdlp_test.cu tests around TEST(pdlp_class, run_float32) or TEST(pdlp_class, float32_papilo_presolve_works) to add TEST(pdlp_class, float32_mixed_precision_spmv_noop) which constructs op_problem via cuopt::mps_parser::parse_mps<int,float>, sets solver_settings.method = cuopt::linear_programming::method_t::PDLP and solver_settings.mixed_precision_spmv = true, calls solve_lp(&handle_, op_problem, solver_settings), and asserts both termination status is CUOPT_TERIMINATION_STATUS_OPTIMAL and the primal objective equals afiro_primal_objective_f32 to guarantee the mixed-precision flag is a no-op for FP32.

cpp/src/pdlp/cusparse_view.cu (1)
1091-1091: Consider removing unnecessary synchronization in initialization setup.

Line 1091 synchronizes after mixed precision matrix transforms, but no CPU-side code depends on the transformed data before the solve loop. Downstream initialization functions (compute_initial_step_size, compute_initial_primal_weight) use scaled coefficients, not the float matrices. The mixed precision matrices are consumed only later in GPU kernels on the same stream, where stream ordering ensures correctness without explicit host-blocking synchronization.
Note: This is initialization code, not a hot solver iteration path, so the performance impact is minimal. However, the synchronization may be redundant for correctness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` at line 1091, Remove the blocking host synchronization call handle_ptr_->get_stream().synchronize() after creating the mixed-precision matrices; instead rely on CUDA stream ordering so GPU kernels that consume the mixed-precision data run on the same stream without host-side synchronize. Locate the call in cusparse_view.cu (the initialization path where mixed-precision transforms are performed) and delete the synchronize() invocation; keep the rest of the initialization (including compute_initial_step_size and compute_initial_primal_weight) unchanged since they only use scaled coefficients on the host and do not require the float matrices to be host-visible before the solve kernels execute.

cpp/src/pdlp/pdlp.cu (1)
2974-2996: Consider adding a clarifying comment for the asymmetric instantiation macros.

The `PDLP_INSTANTIATE_FLOAT` vs `MIP_INSTANTIATE_DOUBLE` asymmetry is intentional (FP32 is PDLP-specific, while MIP always uses FP64 and relies on the PDLP double instantiation). A brief comment would help future maintainers understand this design decision.

📝 Suggested documentation
```diff
+// PDLP supports both FP32 and FP64 precision modes.
+// Float instantiation is controlled by PDLP_INSTANTIATE_FLOAT (PDLP-specific).
+// Double instantiation uses MIP_INSTANTIATE_DOUBLE since MIP depends on PDLP<double>.
 #if PDLP_INSTANTIATE_FLOAT
 template class pdlp_solver_t<int, float>;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdlp.cu` around lines 2974 - 2996, Add a short clarifying comment above the asymmetric instantiation blocks explaining why PDLP_INSTANTIATE_FLOAT and MIP_INSTANTIATE_DOUBLE differ: FP32 instantiation is PDLP-specific while MIP always uses FP64 and depends on PDLP double instantiation; place the comment immediately above the two preprocessor blocks that instantiate pdlp_solver_t and the compute_weights_initial_primal_weight_from_squared_norms function templates so future maintainers see the rationale for the asymmetry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/math_optimization/solver_settings.cu`:
- Around line 74-75: The FP32 default tolerances
(CUOPT_PRIMAL_INFEASIBLE_TOLERANCE, CUOPT_DUAL_INFEASIBLE_TOLERANCE and
CUOPT_MIP_ABSOLUTE_GAP) use f_t(1e-10) which is below float epsilon and may
underflow to denorm/zero; update the default construction to clamp to the type's
machine epsilon (e.g., replace raw f_t(1e-10) with std::max(f_t(1e-10),
std::numeric_limits<f_t>::epsilon()) or choose separate defaults for float vs
double) so pdlp_settings.tolerances.primal_infeasible_tolerance and
dual_infeasible_tolerance (and the MIP absolute gap default) are representable
in FP32.
In `@cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu`:
- Around line 182-184: The float instantiation guard in simple_rounding.cu is
using PDLP_INSTANTIATE_FLOAT which is inconsistent with the double guard
(MIP_INSTANTIATE_DOUBLE) and other MIP files; update the preprocessor check
around the INSTANTIATE(float) call to use MIP_INSTANTIATE_FLOAT to match the
file's MIP-specific instantiation pattern (or, only if this code truly needs to
be shared with PDLP, use the combined guard MIP_INSTANTIATE_FLOAT ||
PDLP_INSTANTIATE_FLOAT) so that INSTANTIATE(float) follows the same macro
convention as INSTANTIATE(double) and other MIP files.
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 599-667: The code unconditionally allocates mixed-precision
buffers and creates mixed cuSPARSE descriptors (A_float_, A_T_float_, A_mixed_,
A_T_mixed_, buffer_non_transpose_mixed_, buffer_transpose_mixed_, and calls to
mixed_precision_spmv_buffersize) whenever enable_mixed_precision_spmv is true,
which wastes memory for batch paths that use SpMM on the FP64 descriptors; wrap
the entire mixed-precision allocation block so it only runs when mixed precision
is enabled AND the current execution will use SpMV (e.g., !is_batch_mode or a
flag like use_spmv) — move the allocations, cub::DeviceTransform calls,
cusparseCreateCsr calls, and buffer_size computations inside that guard and skip
them in batch/SpMM execution paths to avoid unnecessary memory use.
- Around line 606-616: The two unwrapped calls to
cub::DeviceTransform::Transform that convert doubles to floats (the calls using
double_to_float_functor with sources op_problem_scaled.coefficients and A_T_.
and destinations A_float_ and A_T_float_ respectively, using
handle_ptr->get_stream().value()) must be wrapped with RAFT_CUDA_TRY to surface
CUDA errors; locate the Transform invocations and wrap each call with
RAFT_CUDA_TRY( ... ) so any returned cudaError_t is checked, and apply the same
change for the identical pair of Transform calls found later in the file.
In `@cpp/src/pdlp/cusparse_view.hpp`:
- Around line 201-207: Replace the raw cusparseSpMatDescr_t handles A_mixed_ and
A_T_mixed_ with the existing RAII wrapper type cusparse_sp_mat_descr_wrapper_t
(i.e., change their declarations to use cusparse_sp_mat_descr_wrapper_t instead
of cusparseSpMatDescr_t), update the code paths that call cusparseCreateCsr to
assign into these wrappers, and remove any manual calls to cusparseDestroySpMat
(relying on the wrapper destructor). Ensure all references to A_mixed_,
A_T_mixed_, cusparseCreateCsr and mixed_precision_enabled_ compile against the
wrapper API so the descriptors are automatically destroyed.
In `@cpp/src/pdlp/pdhg.hpp`:
- Around line 32-33: The constructor signature in pdhg.hpp currently defaults
enable_mixed_precision_spmv to true, which can silently change behavior; change
the default to false at the constructor boundary by updating the parameter
default from enable_mixed_precision_spmv = true to enable_mixed_precision_spmv =
false in the PDHG (pdhg) constructor declaration so callers that omit this
argument keep the documented default-off behavior.
In `@cpp/src/pdlp/solve.cu`:
- Around line 607-612: The FP32+crossover unsupported check currently inside the
template block (f_t) runs too late; move the validation into the beginning of
run_pdlp (or the earlier dispatch function that selects the f_t instantiation)
so invalid configurations fail fast. Specifically, add a guard that mirrors the
existing cuopt_expects(!settings.crossover, ...) check at the top of run_pdlp
(or immediately before selecting/instantiating the PDLP template) to throw a
ValidationError when std::is_same_v<f_t, float> (or when dispatch would pick
f_t=float) and settings.crossover==true; reference run_pdlp, the dispatch call
site for f_t, and the existing cuopt_expects check to locate and duplicate the
logic early in the call path.
---
Nitpick comments:
In `@benchmarks/linear_programming/cuopt/run_pdlp.cu`:
- Around line 193-198: The CLI currently defers reporting incompatible flag
combinations until solver validation; add an early fail-fast check in main: read
the boolean for "--pdlp-fp32" (use_fp32) and the boolean for
"--mixed-precision-spmv" from program (same API used for program.get<bool>(...))
before calling run_solver<float>/run_solver<double>, and if the combination is
unsupported (e.g. use_fp32 && mixed-precision-spmv true) print a clear error to
stderr and return a non-zero exit code to abort setup immediately; place this
guard just before the existing branch that calls run_solver to avoid unnecessary
parsing/setup work.
In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp`:
- Around line 33-46: The current template convert_vector<To,From>(const
std::vector<From>&) unconditionally returns a std::vector<To>, causing a full
copy even when To==From; fix this by adding an overload for the identical-type
case: implement template<typename T> const std::vector<T>& convert_vector(const
std::vector<T>& src) { return src; } and keep the existing two-type template for
actual conversions (std::vector<To> convert_vector(const std::vector<From>&)
with static_cast loop); update usages (e.g., where convert_vector is called for
Ax, bounds, objective and the other occurrence mentioned) to accept/handle a
const reference when available to avoid unnecessary large-vector copies.
In `@cpp/src/pdlp/cusparse_view.cu`:
- Line 1091: Remove the blocking host synchronization call
handle_ptr_->get_stream().synchronize() after creating the mixed-precision
matrices; instead rely on CUDA stream ordering so GPU kernels that consume the
mixed-precision data run on the same stream without host-side synchronize.
Locate the call in cusparse_view.cu (the initialization path where
mixed-precision transforms are performed) and delete the synchronize()
invocation; keep the rest of the initialization (including
compute_initial_step_size and compute_initial_primal_weight) unchanged since
they only use scaled coefficients on the host and do not require the float
matrices to be host-visible before the solve kernels execute.
In `@cpp/src/pdlp/pdlp.cu`:
- Around line 2974-2996: Add a short clarifying comment above the asymmetric
instantiation blocks explaining why PDLP_INSTANTIATE_FLOAT and
MIP_INSTANTIATE_DOUBLE differ: FP32 instantiation is PDLP-specific while MIP
always uses FP64 and depends on PDLP double instantiation; place the comment
immediately above the two preprocessor blocks that instantiate pdlp_solver_t and
the compute_weights_initial_primal_weight_from_squared_norms function templates
so future maintainers see the rationale for the asymmetry.
In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 1930-2055: Add a new FP32 unit test mirroring the existing
float32_papilo_presolve_works/run_float32 pattern that sets
solver_settings.mixed_precision_spmv = true and verifies numerical correctness
(expect CUOPT_TERIMINATION_STATUS_OPTIMAL and that
solution.get_additional_termination_information().primal_objective matches
afiro_primal_objective_f32); locate and modify the pdlp_test.cu tests around
TEST(pdlp_class, run_float32) or TEST(pdlp_class, float32_papilo_presolve_works)
to add TEST(pdlp_class, float32_mixed_precision_spmv_noop) which constructs
op_problem via cuopt::mps_parser::parse_mps<int,float>, sets
solver_settings.method = cuopt::linear_programming::method_t::PDLP and
solver_settings.mixed_precision_spmv = true, calls solve_lp(&handle_,
op_problem, solver_settings), and asserts both termination status is
CUOPT_TERIMINATION_STATUS_OPTIMAL and the primal objective equals
afiro_primal_objective_f32 to guarantee the mixed-precision flag is a no-op for
FP32.
In `@docs/cuopt/source/lp-qp-milp-settings.rst`:
- Around line 201-207: Add a stable named anchor for the "Mixed Precision SpMV"
section and use that label in the earlier cross-reference: insert a label line
like ".. _mixed-precision-spmv:" immediately before the "Mixed Precision SpMV"
heading, and replace the existing inline reference "see :ref:`Mixed Precision
SpMV`" with "see :ref:`mixed-precision-spmv`" (or another consistent label name)
so the link remains stable if the heading text changes.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (42)
- benchmarks/linear_programming/cuopt/run_pdlp.cu
- cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
- cpp/src/dual_simplex/sparse_matrix.cpp
- cpp/src/math_optimization/solution_writer.cu
- cpp/src/math_optimization/solution_writer.hpp
- cpp/src/math_optimization/solver_settings.cu
- cpp/src/mip_heuristics/diversity/lns/rins.cu
- cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
- cpp/src/mip_heuristics/mip_constants.hpp
- cpp/src/mip_heuristics/presolve/gf2_presolve.cpp
- cpp/src/mip_heuristics/presolve/third_party_presolve.cpp
- cpp/src/mip_heuristics/problem/presolve_data.cu
- cpp/src/mip_heuristics/problem/problem.cu
- cpp/src/mip_heuristics/solution/solution.cu
- cpp/src/mip_heuristics/solver_solution.cu
- cpp/src/pdlp/cpu_pdlp_warm_start_data.cu
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/cusparse_view.hpp
- cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
- cpp/src/pdlp/optimal_batch_size_handler/optimal_batch_size_handler.cu
- cpp/src/pdlp/optimization_problem.cu
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/pdhg.hpp
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/pdlp_warm_start_data.cu
- cpp/src/pdlp/restart_strategy/localized_duality_gap_container.cu
- cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
- cpp/src/pdlp/restart_strategy/weighted_average_solution.cu
- cpp/src/pdlp/saddle_point.cu
- cpp/src/pdlp/solution_conversion.cu
- cpp/src/pdlp/solve.cu
- cpp/src/pdlp/solver_settings.cu
- cpp/src/pdlp/solver_solution.cu
- cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
- cpp/src/pdlp/termination_strategy/convergence_information.cu
- cpp/src/pdlp/termination_strategy/infeasibility_information.cu
- cpp/src/pdlp/termination_strategy/termination_strategy.cu
- cpp/src/pdlp/translate.hpp
- cpp/src/pdlp/utilities/problem_checking.cu
- cpp/tests/linear_programming/pdlp_test.cu
- docs/cuopt/source/lp-qp-features.rst
- docs/cuopt/source/lp-qp-milp-settings.rst
cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu (outdated comment, resolved)
/ok to test 366fd6a
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/tests/linear_programming/pdlp_test.cu (1)
54-59: ⚠️ Potential issue | 🟠 Major

Add NaN/Inf guard to `is_incorrect_objective` to prevent masked solver regressions.

The function accepts NaN values silently. When `objective` is NaN, `std::abs(objective)` returns NaN, and the comparisons evaluate to false, causing the helper to incorrectly report the objective as valid. This can mask solver regressions. The codebase already uses `std::isfinite` extensively in solver components (pdlp, feasibility_jump, barrier, etc.) for this exact purpose.

Suggested fix
```diff
 template <typename f_t>
 static bool is_incorrect_objective(f_t reference, f_t objective)
 {
+  if (!std::isfinite(reference) || !std::isfinite(objective)) { return true; }
   if (reference == 0) { return std::abs(objective) > 0.01; }
   if (objective == 0) { return std::abs(reference) > 0.01; }
   return std::abs((reference - objective) / reference) > 0.01;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 54 - 59, The helper is_incorrect_objective must treat NaN/Inf as incorrect: at the start of is_incorrect_objective(f_t reference, f_t objective) check std::isfinite(reference) and std::isfinite(objective) and return true if either is not finite, then proceed with the existing zero checks and relative-difference logic; update includes/usings if necessary to ensure std::isfinite is available and referenced.
🧹 Nitpick comments (4)
cpp/src/pdlp/cpu_pdlp_warm_start_data.cu (1)
16-39: Consider batching synchronization for better performance (optional).

Each helper function synchronizes after its copy operation. When `convert_to_cpu_warmstart` or `convert_to_gpu_warmstart` copies 9 vector fields, this results in 9 separate synchronization points.

For warm start data that isn't in a hot path, this is acceptable. However, if performance becomes a concern, consider refactoring to batch all copies and synchronize once at the end of the conversion functions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cpu_pdlp_warm_start_data.cu` around lines 16 - 39, The helpers device_to_host_vector and host_to_device_vector currently call stream.synchronize() after each raft::copy, causing multiple sync points when convert_to_cpu_warmstart/convert_to_gpu_warmstart copy many fields; change the helpers to avoid per-copy synchronization by either (a) removing the stream.synchronize() calls and documenting that the caller must synchronize the stream once after batching copies, or (b) adding an optional parameter (e.g. bool do_sync = true) so callers can disable per-copy sync and perform a single stream.synchronize() at the end of convert_to_cpu_warmstart/convert_to_gpu_warmstart; update those convert_* functions to batch the copies and call stream.synchronize() once.

cpp/tests/linear_programming/pdlp_test.cu (1)
1931-2054: Add an explicit FP32 “`mixed_precision_spmv` is a no-op” test.

PR behavior states mixed-precision SpMV has no effect in FP32 mode, but this block doesn’t directly assert that contract yet.
Suggested test addition
```diff
+TEST(pdlp_class, float32_mixed_precision_spmv_no_effect)
+{
+  const raft::handle_t handle_{};
+  auto path = make_path_absolute("linear_programming/afiro_original.mps");
+  cuopt::mps_parser::mps_data_model_t<int, float> op_problem =
+    cuopt::mps_parser::parse_mps<int, float>(path, true);
+
+  auto settings_base                 = pdlp_solver_settings_t<int, float>{};
+  settings_base.method               = cuopt::linear_programming::method_t::PDLP;
+  settings_base.mixed_precision_spmv = false;
+
+  auto settings_mixed                 = settings_base;
+  settings_mixed.mixed_precision_spmv = true;
+
+  auto solution_base  = solve_lp(&handle_, op_problem, settings_base);
+  auto solution_mixed = solve_lp(&handle_, op_problem, settings_mixed);
+
+  EXPECT_EQ((int)solution_base.get_termination_status(), CUOPT_TERIMINATION_STATUS_OPTIMAL);
+  EXPECT_EQ((int)solution_mixed.get_termination_status(), CUOPT_TERIMINATION_STATUS_OPTIMAL);
+  EXPECT_NEAR(solution_base.get_additional_termination_information().primal_objective,
+              solution_mixed.get_additional_termination_information().primal_objective,
+              1e-2f);
+}
```

As per coding guidelines:
**/*test*.{cpp,cu,py}: Write tests validating numerical correctness of optimization results (not just 'runs without error').

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 1931 - 2054, Add a test that explicitly verifies mixed-precision SpMV is a no-op in FP32: create a pdlp_solver_settings_t<int,float> (same as in run_float32) set method = cuopt::linear_programming::method_t::PDLP and set solver_settings.mixed_precision_spmv = true, call solve_lp(&handle_, op_problem, solver_settings), then assert termination status equals CUOPT_TERIMINATION_STATUS_OPTIMAL and that solution.get_additional_termination_information().primal_objective matches afiro_primal_objective_f32 (use is_incorrect_objective to check numerical equality), mirroring the checks in the existing run_float32/papilo/pslp tests so the behavior is explicitly tested.

cpp/src/pdlp/pdhg.cu (1)
254-280: Consolidate repeated mixed-vs-standard SpMV dispatch into one helper.

The same precision-dispatch branch is repeated in three methods. A shared helper would reduce drift risk and keep future algorithm/descriptor changes in one place.
♻️ Suggested refactor sketch
```diff
+template <typename i_t, typename f_t>
+template <typename MatStd, typename MatMixed, typename VecX, typename VecY,
+          typename BufStd, typename BufMixed>
+inline void pdhg_solver_t<i_t, f_t>::run_spmv_dispatch(MatStd mat_std,
+                                                       MatMixed mat_mixed,
+                                                       VecX vec_x,
+                                                       VecY vec_y,
+                                                       BufStd buf_std,
+                                                       BufMixed buf_mixed)
+{
+  if constexpr (std::is_same_v<f_t, double>) {
+    if (cusparse_view_.mixed_precision_enabled_) {
+      mixed_precision_spmv(handle_ptr_->get_cusparse_handle(),
+                           CUSPARSE_OPERATION_NON_TRANSPOSE,
+                           reusable_device_scalar_value_1_.data(),
+                           mat_mixed,
+                           vec_x,
+                           reusable_device_scalar_value_0_.data(),
+                           vec_y,
+                           CUSPARSE_SPMV_CSR_ALG2,
+                           buf_mixed,
+                           stream_view_);
+      return;
+    }
+  }
+  RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmv(handle_ptr_->get_cusparse_handle(),
+                                                       CUSPARSE_OPERATION_NON_TRANSPOSE,
+                                                       reusable_device_scalar_value_1_.data(),
+                                                       mat_std,
+                                                       vec_x,
+                                                       reusable_device_scalar_value_0_.data(),
+                                                       vec_y,
+                                                       CUSPARSE_SPMV_CSR_ALG2,
+                                                       (f_t*)buf_std,
+                                                       stream_view_));
+}
```

As per coding guidelines,
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication.

Also applies to: 308-334, 356-382
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdhg.cu` around lines 254 - 280, Duplicate precision-dispatch SpMV logic should be extracted into a single helper: create a templated utility (e.g., spmv_dispatch or dispatch_spmv) that accepts the cusparse handle (handle_ptr_->get_cusparse_handle()), operation flag, alpha/beta device scalars (reusable_device_scalar_value_1_.data(), reusable_device_scalar_value_0_.data()), both mixed and standard matrix pointers (cusparse_view_.A_mixed_, cusparse_view_.A), input/output buffers (cusparse_view_.tmp_primal, cusparse_view_.dual_gradient), algorithm/buffer args (CUSPARSE_SPMV_CSR_ALG2 and both buffer_non_transpose_mixed_.data()/buffer_non_transpose.data()), and the stream (stream_view_), then replace the repeated branches in pdhg.cu (and the other two occurrences) with calls to that helper which checks cusparse_view_.mixed_precision_enabled_ and calls mixed_precision_spmv or raft::sparse::detail::cusparsespmv accordingly.

cpp/src/pdlp/cusparse_view.cu (1)
1069-1082: Remove the blocking stream synchronization after mixed-precision matrix transforms.

Line 1081's `handle_ptr_->get_stream().synchronize()` blocks the host after submitting two matrix transforms (`A_` → `A_float_` and `A_T_` → `A_T_float_`) to the same GPU stream. Since subsequent mixed-precision SpMV operations are queued on the same stream, the hardware will ensure correct ordering without an explicit host stall. Removing this synchronization eliminates an unnecessary pipeline blockage in the hot path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` around lines 1069 - 1082, The explicit host-side stall is caused by handle_ptr_->get_stream().synchronize() after launching two cub::DeviceTransform::Transform calls that convert A_→A_float_ and A_T_→A_T_float_; remove the synchronize call so the transforms run asynchronously on the same stream (GPU will preserve ordering) and let subsequent mixed-precision SpMV kernels queue on that stream without blocking the host; ensure no other code relies on the host wait and delete only the handle_ptr_->get_stream().synchronize() invocation referenced here.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/pdlp/solve.cu`:
- Around line 1151-1156: The current hard validation using cuopt_expects blocks
float-precision users; update solve_lp() to auto-map float precision to
method_t::PDLP when the user has not explicitly selected an incompatible method
(i.e., settings.method is still method_t::Concurrent/default) instead of
failing: detect float precision and if settings.method == method_t::Concurrent,
set settings.method = method_t::PDLP and emit a log/info message about the
override; keep the existing cuopt_expects check only for the case where the user
explicitly set an incompatible method (e.g., DualSimplex/Barrier) so an error is
still raised for explicit invalid combinations. Ensure you reference and modify
the code paths around solve_lp(), settings.method, and the float-precision
branch that currently calls run_pdlp().
In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Line 23: Update the preprocessor guard that gates the PDLP float tests:
replace the overly-broad conditional that uses MIP_INSTANTIATE_FLOAT ||
PDLP_INSTANTIATE_FLOAT with a check that only uses PDLP_INSTANTIATE_FLOAT so
PDLP float tests compile only when PDLP is instantiated for float; locate the
conditional in pdlp_test.cu (the lines surrounding the existing `#if`
MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT) and change it to `#if`
PDLP_INSTANTIATE_FLOAT, leaving the rest of the test code unchanged.
---
Outside diff comments:
In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 54-59: The helper is_incorrect_objective must treat NaN/Inf as
incorrect: at the start of is_incorrect_objective(f_t reference, f_t objective)
check std::isfinite(reference) and std::isfinite(objective) and return true if
either is not finite, then proceed with the existing zero checks and
relative-difference logic; update includes/usings if necessary to ensure
std::isfinite is available and referenced.
---
Nitpick comments:
In `@cpp/src/pdlp/cpu_pdlp_warm_start_data.cu`:
- Around line 16-39: The helpers device_to_host_vector and host_to_device_vector
currently call stream.synchronize() after each raft::copy, causing multiple sync
points when convert_to_cpu_warmstart/convert_to_gpu_warmstart copy many fields;
change the helpers to avoid per-copy synchronization by either (a) removing the
stream.synchronize() calls and documenting that the caller must synchronize the
stream once after batching copies, or (b) adding an optional parameter (e.g.
bool do_sync = true) so callers can disable per-copy sync and perform a single
stream.synchronize() at the end of
convert_to_cpu_warmstart/convert_to_gpu_warmstart; update those convert_*
functions to batch the copies and call stream.synchronize() once.
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 1069-1082: The explicit host-side stall is caused by
handle_ptr_->get_stream().synchronize() after launching two
cub::DeviceTransform::Transform calls that convert A_→A_float_ and
A_T_→A_T_float_; remove the synchronize call so the transforms run
asynchronously on the same stream (GPU will preserve ordering) and let
subsequent mixed-precision SpMV kernels queue on that stream without blocking
the host; ensure no other code relies on the host wait and delete only the
handle_ptr_->get_stream().synchronize() invocation referenced here.
In `@cpp/src/pdlp/pdhg.cu`:
- Around line 254-280: Duplicate precision-dispatch SpMV logic should be
extracted into a single helper: create a templated utility (e.g., spmv_dispatch
or dispatch_spmv) that accepts the cusparse handle
(handle_ptr_->get_cusparse_handle()), operation flag, alpha/beta device scalars
(reusable_device_scalar_value_1_.data(),
reusable_device_scalar_value_0_.data()), both mixed and standard matrix pointers
(cusparse_view_.A_mixed_, cusparse_view_.A), input/output buffers
(cusparse_view_.tmp_primal, cusparse_view_.dual_gradient), algorithm/buffer args
(CUSPARSE_SPMV_CSR_ALG2 and both
buffer_non_transpose_mixed_.data()/buffer_non_transpose.data()), and the stream
(stream_view_), then replace the repeated branches in pdhg.cu (and the other two
occurrences) with calls to that helper which checks
cusparse_view_.mixed_precision_enabled_ and calls mixed_precision_spmv or
raft::sparse::detail::cusparsespmv accordingly.
In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 1931-2054: Add a test that explicitly verifies mixed-precision
SpMV is a no-op in FP32: create a pdlp_solver_settings_t<int,float> (same as in
run_float32) set method = cuopt::linear_programming::method_t::PDLP and set
solver_settings.mixed_precision_spmv = true, call solve_lp(&handle_, op_problem,
solver_settings), then assert termination status equals
CUOPT_TERIMINATION_STATUS_OPTIMAL and that
solution.get_additional_termination_information().primal_objective matches
afiro_primal_objective_f32 (use is_incorrect_objective to check numerical
equality), mirroring the checks in the existing run_float32/papilo/pslp tests so
the behavior is explicitly tested.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
- cpp/src/dual_simplex/sparse_matrix.cpp
- cpp/src/math_optimization/solution_writer.cu
- cpp/src/math_optimization/solver_settings.cu
- cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
- cpp/src/mip_heuristics/problem/problem.cu
- cpp/src/mip_heuristics/solution/solution.cu
- cpp/src/mip_heuristics/solver_solution.cu
- cpp/src/pdlp/cpu_pdlp_warm_start_data.cu
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/cusparse_view.hpp
- cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
- cpp/src/pdlp/optimal_batch_size_handler/optimal_batch_size_handler.cu
- cpp/src/pdlp/optimization_problem.cu
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/pdhg.hpp
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/pdlp_warm_start_data.cu
- cpp/src/pdlp/restart_strategy/localized_duality_gap_container.cu
- cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
- cpp/src/pdlp/restart_strategy/weighted_average_solution.cu
- cpp/src/pdlp/saddle_point.cu
- cpp/src/pdlp/solution_conversion.cu
- cpp/src/pdlp/solve.cu
- cpp/src/pdlp/solver_settings.cu
- cpp/src/pdlp/solver_solution.cu
- cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
- cpp/src/pdlp/termination_strategy/convergence_information.cu
- cpp/src/pdlp/termination_strategy/infeasibility_information.cu
- cpp/src/pdlp/termination_strategy/termination_strategy.cu
- cpp/src/pdlp/utilities/problem_checking.cu
- cpp/tests/linear_programming/pdlp_test.cu
🚧 Files skipped from review as they are similar to previous changes (15)
- cpp/src/mip_heuristics/solution/solution.cu
- cpp/src/pdlp/optimal_batch_size_handler/optimal_batch_size_handler.cu
- cpp/src/pdlp/pdhg.hpp
- cpp/src/pdlp/solution_conversion.cu
- cpp/src/pdlp/pdlp_warm_start_data.cu
- cpp/src/pdlp/solver_solution.cu
- cpp/src/pdlp/saddle_point.cu
- cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
- cpp/src/pdlp/termination_strategy/termination_strategy.cu
- cpp/src/pdlp/termination_strategy/infeasibility_information.cu
- cpp/src/pdlp/solver_settings.cu
- cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
- cpp/src/pdlp/restart_strategy/weighted_average_solution.cu
- cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
- cpp/src/mip_heuristics/problem/problem.cu
/ok to test 2cb9ce5

/ok to test 410ec3c
Actionable comments posted: 1
🧹 Nitpick comments (1)
cpp/src/pdlp/solution_conversion.cu (1)
138-146: Add an explicit `<type_traits>` include for `std::is_same_v`.

`std::is_same_v` is used at line 142, but the file lacks a direct include for `<type_traits>`. Adding this include avoids reliance on transitive headers, which can be fragile if upstream dependencies change.

Suggested patch

```diff
 #include <rmm/device_buffer.hpp>
 #include <rmm/device_uvector.hpp>
+#include <type_traits>
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/solution_conversion.cu` around lines 138 - 146, Add a direct include for <type_traits> at the top of the translation unit so std::is_same_v used in the to_cpu_buffer template is defined without relying on transitive includes; update the includes in cpp/src/pdlp/solution_conversion.cu (before the anonymous-namespace template<typename f_t> cuopt::cython::cpu_buffer to_cpu_buffer(std::vector<f_t>& src)) to add the header for <type_traits>.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/pdlp/optimization_problem.cu`:
- Around line 1508-1978: This file contains a duplicated block of template
member-function definitions (including set_objective_name, set_problem_name,
set_variable_names, set_row_names, get_n_variables, get_n_constraints, get_nnz,
get_n_integers, get_handle_ptr, all get_* accessors, view(), set_maximize,
write_to_mps, print_scaling_information, has_quadratic_objective, and the
conflicting empty()) that re-defines methods already implemented earlier; remove
this duplicate block so each method (e.g.,
optimization_problem_t<i_t,f_t>::set_objective_name, ::write_to_mps,
::print_scaling_information, ::has_quadratic_objective, and ::empty) is defined
only once, keeping the intended canonical implementations (resolve the empty()
conflict by retaining the earlier definition), rebuild to ensure no
ODR/redefinition errors, and run tests.
---
Nitpick comments:
In `@cpp/src/pdlp/solution_conversion.cu`:
- Around line 138-146: Add a direct include for <type_traits> at the top of the
translation unit so std::is_same_v used in the to_cpu_buffer template is defined
without relying on transitive includes; update the includes in
cpp/src/pdlp/solution_conversion.cu (before the anonymous-namespace
template<typename f_t> cuopt::cython::cpu_buffer to_cpu_buffer(std::vector<f_t>&
src)) to add the header for <type_traits>.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- benchmarks/linear_programming/cuopt/run_pdlp.cu
- cpp/src/pdlp/optimization_problem.cu
- cpp/src/pdlp/solution_conversion.cu
- cpp/src/pdlp/solve.cu
- cpp/src/pdlp/translate.hpp
- cpp/tests/linear_programming/pdlp_test.cu
🚧 Files skipped from review as they are similar to previous changes (2)
- cpp/tests/linear_programming/pdlp_test.cu
- cpp/src/pdlp/translate.hpp
/ok to test a3dd383

/ok to test 14f9052
aliceb-nv left a comment
LGTM, thanks for the extensive work Nicolas :)
Just a few minor stylistic nitpicks.
```cpp
std::vector<f_t> h_primal_out(n_cols);
std::vector<f_t> h_dual_out(n_rows);
std::vector<f_t> h_reduced_costs_out(n_cols);
for (int i = 0; i < n_cols; ++i) {
  h_primal_out[i]        = static_cast<f_t>(uncrushed_sol->x[i]);
  h_reduced_costs_out[i] = static_cast<f_t>(uncrushed_sol->z[i]);
}
for (int i = 0; i < n_rows; ++i) {
  h_dual_out[i] = static_cast<f_t>(uncrushed_sol->y[i]);
}

primal_solution.resize(n_cols, stream_view);
dual_solution.resize(n_rows, stream_view);
reduced_costs.resize(n_cols, stream_view);
raft::copy(primal_solution.data(), h_primal_out.data(), n_cols, stream_view);
raft::copy(dual_solution.data(), h_dual_out.data(), n_rows, stream_view);
raft::copy(reduced_costs.data(), h_reduced_costs_out.data(), n_cols, stream_view);
```
nit: it might be possible to unify that part of the function with thrust::transform calls that would automatically perform the implicit conversion through the iterators. It would materialize into a kernel rather than a memcpy, but I don't think it matters much here.

Come to think of it, we can probably do the same for the input also. It might be possible to get rid of the constexpr branches for the most part that way.
```cpp
if constexpr (std::is_same_v<f_t, double>) {
  if (cusparse_view_.mixed_precision_enabled_) {
    mixed_precision_spmv(handle_ptr_->get_cusparse_handle(),
                         CUSPARSE_OPERATION_NON_TRANSPOSE,
                         reusable_device_scalar_value_1_.data(),
                         cusparse_view_.A_mixed_,
                         cusparse_view_.reflected_primal_solution,
                         reusable_device_scalar_value_0_.data(),
                         cusparse_view_.dual_gradient,
                         CUSPARSE_SPMV_CSR_ALG2,
                         cusparse_view_.buffer_non_transpose_mixed_.data(),
                         stream_view_);
  }
}
if (!cusparse_view_.mixed_precision_enabled_) {
  RAFT_CUSPARSE_TRY(
    raft::sparse::detail::cusparsespmv(handle_ptr_->get_cusparse_handle(),
                                       CUSPARSE_OPERATION_NON_TRANSPOSE,
                                       reusable_device_scalar_value_1_.data(),
                                       cusparse_view_.A,
                                       cusparse_view_.reflected_primal_solution,
                                       reusable_device_scalar_value_0_.data(),
                                       cusparse_view_.dual_gradient,
                                       CUSPARSE_SPMV_CSR_ALG2,
                                       (f_t*)cusparse_view_.buffer_non_transpose.data(),
                                       stream_view_));
}
```
Another nit - it might be possible to unify this into a single A_spmv function that takes an option to specify whether to use A or its transpose. Not fully confident if it's worth the code changes though.
/ok to test ab9e8eb
rg20 left a comment
I think we should move the logic of precision inside the solve and not change/add new APIs.
```diff
 }

-#if MIP_INSTANTIATE_FLOAT
+#if MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT
```
If this is done in all the files, we can just have one flag, CUOPT_INSTANTIATE_FLOAT
```diff
-cpu.primal_solution_ = std::move(primal_solution_);
-cpu.dual_solution_ = std::move(dual_solution_);
-cpu.reduced_cost_ = std::move(reduced_cost_);
+cpu.primal_solution_ = to_cpu_buffer(primal_solution_);
```
I don't think we should bring up the mixed precision logic this far. The conversion should have happened much before.
```cpp
if (problem.maximize) {
  adjust_dual_solution_and_reduced_cost(
    final_dual_solution, final_reduced_cost, problem.handle_ptr->get_stream());
if constexpr (std::is_same_v<f_t, double>) {
```
Does this mean crossover is disabled for single precision? I don't think we want to make that decision.
```cpp
// Explicit template instantiations for remote execution stubs
#if MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT
template std::unique_ptr<lp_solution_interface_t<int, float>> solve_lp_remote(
```
The APIs should be still in double precision and only the inner solves should be changed.
```cpp
 * Convergence checking and restarts always use the full FP64 matrix, so this does
 * not reduce overall memory usage. Has no effect in FP32 mode.
 */
bool mixed_precision_spmv{false};
```
is there another flag for fp32?
I think we should just have a --pdlp-precision flag with options: default, single, double, mixed. This allows us to change the definition of mixed in the future.
```rst
Users can submit a set of problems which will be solved in a batch. Problems will be solved at the same time in parallel to fully utilize the GPU. Checkout :ref:`self-hosted client <generic-example-with-normal-and-batch-mode>` example in thin client.

FP32 Precision Mode
```
Let's just have one parameter for precision.
This PR adds support for FP32 and mixed precision in PDLP (not MIP, Dual Simplex or Barrier).
Those two new options are available through:
Below are the details of what each feature allows:
FP32 Precision Mode
By default, PDLP operates in FP64 (double) precision. Users can switch to FP32 (float) precision for the entire solve. FP32 uses half the memory of FP64 and allows PDHG iterations to run, on average, twice as fast, but it may require more iterations to converge due to reduced numerical accuracy. FP32 mode is only supported with the PDLP method (not concurrent) and without crossover.
Note: The default precision is FP64 (double).
Mixed Precision SpMV
When running PDLP in FP64 mode, users can enable mixed precision sparse matrix-vector products (SpMV) during PDHG iterations. In this mode, the constraint matrix and its transpose are stored in FP32 while vectors and the compute type remain in FP64. This allows SpMV operations to be faster thanks to reduced memory bandwidth requirements, while maintaining FP64 accuracy in the accumulation. This will make PDHG iterations faster while limiting the potential negative impact on convergence (compared to running in FP32 mode). Convergence checking and restart logic always use the full FP64 matrix, so this mode does not reduce memory usage since both the FP32 and FP64 copies of the matrix are kept in memory. Mixed precision SpMV only applies in FP64 mode and has no effect when running in FP32.
Note: The default value is false.