Skip to content

Refactor optimization pipeline.#520

Merged
evaleev merged 7 commits into
masterfrom
gaudel/feature/refactor_optimization
May 23, 2026
Merged

Refactor optimization pipeline.#520
evaleev merged 7 commits into
masterfrom
gaudel/feature/refactor_optimization

Conversation

@bimalgaudel
Copy link
Copy Markdown
Member

@bimalgaudel bimalgaudel commented May 20, 2026

No description provided.

@bimalgaudel bimalgaudel changed the title refactor(optimize): generalize OptRes::flops → ops and extract DP hel… Refactor optimization pipeline. May 20, 2026
@bimalgaudel bimalgaudel marked this pull request as ready for review May 21, 2026 10:23
…urable

- Add OptFor { Flops, Memsize } and ReorderSum { Reorder, NoReorder } in
  a shared optimize/flags.hpp; thread them through single_term_opt and
  the top-level optimize() API.
- Add index_to_extent_t alias for the type-erased provider used by the
  public API; templated single_term_opt callers still pass callables
  directly via the has_index_extent concept.
- Parallelize the outermost Sum's per-summand single-term optimization
  with sequant::for_each; recurse sequentially on nested Sums.
- Have opt::reorder reuse pre-binarized eval nodes instead of rebuilding
  them inside clusters().
- Harden opt_mixed_product placeholder labels (non-identifier prefix,
  starts_with match, digit-walk suffix parse, asserted invariants).
- Fix std::forward misuse in flops_counter / memsize_counter that
  prevented compilation under clang.
- Rename optimize/flags.hpp → optimize/options.hpp; move OptFor,
  ReorderSum, and index_to_extent_t into it alongside a new
  OptimizeOptions struct (opt_for / reorder / idx_to_extent with
  sensible defaults).
- Collapse the six optimize() overloads into three (ExprPtr&,
  ResultExpr&, ResultExpr&&), each taking OptimizeOptions{} by
  default. Existing default-argument call sites are source-compatible.
- Stop including <SeQuant/core/optimize/single_term.hpp> from
  optimize.hpp. The public optimize() API now needs only the
  lightweight options header, so single_term.hpp (and its range-v3
  transitives) no longer leak into consumers. This fixes the CI
  unity-build failure in utilities/external-interface, where
  range/v3/view/indirect.hpp's use of meta:: became ambiguous against
  sequant::meta once a sibling .cpp in the same TU had issued
  `using namespace sequant;`.
@bimalgaudel bimalgaudel force-pushed the gaudel/feature/refactor_optimization branch from 2dc2175 to 59f4b57 Compare May 22, 2026 13:33

// Overloads for backwards compatibility

/// \deprecated Use the \c OptimizeOptions overload instead.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can (and maybe should) make use of the [[deprecated]] attribute: https://en.cppreference.com/cpp/language/attributes/deprecated

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that. I might add it.

bimalgaudel and others added 2 commits May 22, 2026 19:45
…ummands

Add minimal tests for the refactored optimize() API:
- both OptFor::Flops and OptFor::Memsize binarize a product correctly
- ReorderSum knob, and bare optimize() == explicit Reorder
- parallel summand optimization (set_num_threads 1 vs 4) yields identical
  results, guarding the optimize_impl(..., parallel_outer=true) path

Also document the thread-safety invariants of the parallel branch (work on
per-task clones; binarize()'s lazy Index::label() cache write is sequenced
after for_each joins) and explain the deliberate IndexSet-vs-vector container
choice and the exact-equality scalar sentinel in the flop/memsize counters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@evaleev evaleev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Refactor optimization pipeline

Solid, well-structured refactor. I did a focused thread-safety audit of the new parallel path and added minimal regression tests + documentation; details below. Pushed as a1d0264 on this branch.

Parallelism (optimize_impl(..., parallel_outer=true)) — SAFE as written, conditional

A four-angle audit (TensorCanonicalizer · TensorNetwork/bliss · Index/Expr/registry · for_each mechanics) found no data race, resting on two invariants that currently hold:

  1. TensorNetwork clones every tensor on insertion, so the unsynchronized mutable lazy caches (Expr::hash_value_, Index::label_/full_label_) are only ever written on thread-local clones. Index::operator==/<=> compare immutable members only, so building index sets over shared indices is safe.
  2. The one step that lazily mutates shared Index/Expr caches — binarize() via Index::label() — runs in the sequential loop after for_each joins (optimize.cpp), never inside do_term.

Canonicalizer/bliss global state is read-only during canonicalization; the swap counter is thread_local; the loop writes disjoint, pre-sized new_smands[i] slots.

I documented invariants (1)/(2) in a comment at the parallel branch so a future edit doesn't silently break them by moving a ->hash_value()/Index::label() call into do_term.

Caveats worth knowing:

  • The default Context and cardinal_tensor_labels must be configured before entering optimize() (their writes are unsynchronized unless SEQUANT_CONTEXT_MANIPULATION_THREADSAFE); optimize() only reads them.
  • On Linux/GCC builds for_each uses std::execution::par_unseq, which ignores num_threads() — so set_num_threads(1) will not serialize this loop there (it does on the Apple-Clang/macOS build, which uses the manual-thread fallback).

Tests added (commit a1d0264)

  • OptimizeOptions API: both OptFor::Flops and OptFor::Memsize binarize a product correctly; ReorderSum knob; bare optimize() == explicit Reorder.
  • Parallel/sequential equivalence: a 3-summand sum optimized under set_num_threads(1) vs (4) must produce identical results — the safety net for the invariants above.

Minor doc comments added

  • flops_counter vs memsize_counter: documented why flops needs <IndexSet> (concatenated operands repeat contracted indices) while memsize can use the default vector (per-operand, no dups).
  • Documented the exact-equality == 1./!= 1. scalar sentinel convention in both counters.

Strengths (unchanged from the PR)

  • Placeholder-label hardening (@__opt_ + digit-by-digit parse with asserts) fixes the old I_-prefix std::stoi-on-raw-data() collision risk.
  • reorder no longer re-binarizes (precomputed nodes threaded through, with SEQUANT_ASSERT(nodes.size() == expr.size())).
  • Clean extraction of init_results/build_subnet_metadata (correctly inline); deprecated bool overloads correctly drop their default args to avoid ambiguity with the OptimizeOptions = {} overload.

LGTM once CI is green.

@evaleev evaleev merged commit fb1e5d1 into master May 23, 2026
16 checks passed
@evaleev evaleev deleted the gaudel/feature/refactor_optimization branch May 23, 2026 00:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants