
[MIOpen] Configurable problem size threshold for direct solver #4212

Merged
johannes-graner merged 3 commits into develop from user/jograner/direct-solver-threshold-ALMIOPEN-1009 on Mar 24, 2026


@johannes-graner
Contributor

Motivation

The direct solver is problematic when the problem size is very large. Tuning becomes extremely slow because the direct solver takes an infeasible amount of time to run (up to several hours for 3D convolutions in video diffusion models). Since the direct solver is a good fallback for small problems, setting `MIOPEN_DEBUG_CONV_DIRECT=0` is not always a good solution. Instead, the solver should be selectively disabled for large convolutions.

This PR introduces a new environment variable, `MIOPEN_CONV_DIRECT_MAX_SIZE`, which disables the direct solver when the problem size exceeds the configured maximum. The problem size is the number of elements in the result tensor (`output` for fwd, `input` for bwd, `weight` for wrw).

The default value of the environment variable is `0`, which disables the limit. This PR introduces the functionality but leaves the out-of-the-box (OOTB) behavior unchanged.
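The selection rule described above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not MIOpen's implementation. `ConvDirection`, `ConvProblem`, `ResultElementCount`, `DirectMaxSizeFromEnv`, and `DirectSolverAllowed` are hypothetical names. The environment variable is read with plain `std::getenv`/`std::strtoull` rather than MIOpen's own env-var helpers. The sketch counts the elements of the result tensor for the given direction and rejects the direct solver only when a nonzero limit is exceeded.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical types for illustration; they do not mirror MIOpen's ProblemDescription.
enum class ConvDirection { Forward, BackwardData, BackwardWeights };

struct ConvProblem {
    ConvDirection direction;
    std::vector<std::size_t> input_lens;   // e.g. {N, C, D, H, W}
    std::vector<std::size_t> weight_lens;  // e.g. {K, C/G, Z, Y, X}
    std::vector<std::size_t> output_lens;  // e.g. {N, K, D', H', W'}
};

// Product of all tensor lengths, i.e. the element count.
std::uint64_t ElementCount(const std::vector<std::size_t>& lens) {
    return std::accumulate(lens.begin(), lens.end(), std::uint64_t{1},
                           std::multiplies<std::uint64_t>());
}

// Size of the tensor the convolution writes: output (fwd), input (bwd), weight (wrw).
std::uint64_t ResultElementCount(const ConvProblem& p) {
    switch (p.direction) {
        case ConvDirection::Forward: return ElementCount(p.output_lens);
        case ConvDirection::BackwardData: return ElementCount(p.input_lens);
        case ConvDirection::BackwardWeights: return ElementCount(p.weight_lens);
    }
    return 0;  // unreachable for valid enum values
}

// Parses MIOPEN_CONV_DIRECT_MAX_SIZE with plain libc calls (an assumption; MIOpen has
// its own env-var helpers). Unset, empty, or unparsable values yield 0, i.e. no limit.
std::uint64_t DirectMaxSizeFromEnv() {
    const char* raw = std::getenv("MIOPEN_CONV_DIRECT_MAX_SIZE");
    if (raw == nullptr || *raw == '\0') return 0;
    char* end = nullptr;
    const unsigned long long value = std::strtoull(raw, &end, 10);
    return (*end == '\0') ? static_cast<std::uint64_t>(value) : 0;
}

// The direct solver is rejected only when a limit is set (non-zero) and the
// result tensor has more elements than the limit.
bool DirectSolverAllowed(const ConvProblem& p, std::uint64_t max_size) {
    return max_size == 0 || ResultElementCount(p) <= max_size;
}
```

With the limit set to 10, as in the last row of the test results, practically any real convolution exceeds it. This matches the "Failed to find solver" outcome when only the direct solver is enabled.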

Technical Details

Run time is not determined solely by the element count of the result tensor. However, this count was chosen as a proxy because it is easier to reason about than the total number of operations performed. For a user, the size of a single tensor is an intuitive way to gauge the overall problem size.

As long as the measure correlates well with the computational load, it is useful.
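As a worked example of the proxy, assume NCDHW layout with batch size $N$, input channels $C$, output channels $K$, groups $G$, filter size $Z \times Y \times X$, and spatial sizes $D, H, W$ (subscript $i$ for input, $o$ for output). The measured size per direction is then:

$$
\text{size} =
\begin{cases}
N \cdot K \cdot D_o \cdot H_o \cdot W_o & \text{fwd (output tensor)} \\
N \cdot C \cdot D_i \cdot H_i \cdot W_i & \text{bwd (input tensor)} \\
K \cdot (C/G) \cdot Z \cdot Y \cdot X & \text{wrw (weight tensor)}
\end{cases}
$$

Consider a hypothetical forward 3D convolution with $N=1$, $K=128$, a $16 \times 64 \times 64$ output, $C=64$, $G=1$, and a $3 \times 3 \times 3$ filter. A direct forward convolution performs $(C/G) \cdot Z \cdot Y \cdot X$ multiply-accumulates (MACs) per output element, which gives:

$$
1 \cdot 128 \cdot 16 \cdot 64 \cdot 64 = 8{,}388{,}608, \qquad
8{,}388{,}608 \cdot 64 \cdot 3 \cdot 3 \cdot 3 = 14{,}495{,}514{,}624 \approx 1.45 \times 10^{10}\ \text{MACs}
$$

With a $1 \times 1 \times 1$ filter, the same element count would need 27 times fewer MACs. This is why the element count works as a proxy for computational load rather than an exact cost measure.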

Test Plan

The tests were carried out with a "normal" convolution shape for which non-direct solvers are available.

Test Result

| Scenario | Result |
| --- | --- |
| Just MIOpenDriver command | Non-direct solver chosen |
| Only direct enabled | Direct solver chosen |
| Only direct enabled + max size 10 | Failed to find solver |

The tests were repeated with both clean and populated databases. Even if a record from manual tuning specifies that the direct solver should be used, that record is ignored when the max size is exceeded.

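Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.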

@JonathanLichtnerAMD
Contributor

Could MIOPEN_SEARCH_CUTOFF be used instead of introducing this new environment variable? See

```cpp
if(using_search_cutoff && sol.solver_id.find("Naive") != std::string::npos &&
   skip_time > 5.0f)
{
    MIOPEN_LOG_I("Skipping Naive Solver: " << algorithm_name.ToString() << ":"
                                           << sol.solver_id);
    continue;
}
```

Or, if that does not cut off the search sufficiently, perhaps we could tweak that existing code? Maybe we could use problem size in addition to time if needed?

@cderb
Contributor

cderb commented Mar 23, 2026

> Could MIOPEN_SEARCH_CUTOFF be used instead of introducing this new environment variable? See

It might be nice if MIOPEN_SEARCH_CUTOFF set a default value for MIOPEN_CONV_DIRECT_MAX_SIZE, though the new env may need to exist to allow an optional level.

johannes-graner merged commit 879dc9e into develop on Mar 24, 2026
41 of 43 checks passed
johannes-graner deleted the user/jograner/direct-solver-threshold-ALMIOPEN-1009 branch on March 24, 2026, 06:34
@johannes-graner
Contributor Author

> > Could MIOPEN_SEARCH_CUTOFF be used instead of introducing this new environment variable? See
>
> It might be nice if MIOPEN_SEARCH_CUTOFF set a default value for MIOPEN_CONV_DIRECT_MAX_SIZE, though the new env may need to exist to allow an optional level.

I think this makes sense. It should be very easy to implement as a follow-up.
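One way the follow-up could resolve the effective limit is sketched below in C++. This is an assumption about a change not yet written, not MIOpen code. `ReadUintEnv`, `EffectiveDirectMaxSize`, and the placeholder default `kHypotheticalCutoffDefault` are hypothetical, and `MIOPEN_SEARCH_CUTOFF` is modeled simply as a boolean. An explicitly set `MIOPEN_CONV_DIRECT_MAX_SIZE`, including an explicit `0`, takes precedence. The search-cutoff default applies only when the variable is unset.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>

// Placeholder default applied when the search cutoff is on and no explicit limit
// is given. Hypothetical value chosen only for illustration, not a tuned threshold.
constexpr std::uint64_t kHypotheticalCutoffDefault = std::uint64_t{1} << 24;

// Reads an unsigned integer env var; returns std::nullopt if it is unset, empty,
// or not fully consumed by std::strtoull, so "unset" stays distinct from "0".
std::optional<std::uint64_t> ReadUintEnv(const char* name) {
    const char* raw = std::getenv(name);
    if (raw == nullptr || *raw == '\0') return std::nullopt;
    char* end = nullptr;
    const unsigned long long value = std::strtoull(raw, &end, 10);
    if (*end != '\0') return std::nullopt;
    return static_cast<std::uint64_t>(value);
}

// An explicitly set MIOPEN_CONV_DIRECT_MAX_SIZE always wins (an explicit 0 keeps the
// limit disabled); otherwise the search cutoff implies the placeholder default.
std::uint64_t EffectiveDirectMaxSize(std::optional<std::uint64_t> explicit_max,
                                     bool search_cutoff_enabled) {
    if (explicit_max.has_value()) return *explicit_max;
    return search_cutoff_enabled ? kHypotheticalCutoffDefault : 0;
}
```

A call site might look like `EffectiveDirectMaxSize(ReadUintEnv("MIOPEN_CONV_DIRECT_MAX_SIZE"), using_search_cutoff)`. Keeping "unset" separate from "set to 0" via `std::optional` lets users opt out of the cutoff-implied default, which is one reading of the "optional level" mentioned in the comment.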

COrruDXC pushed a commit to COrruDXC/rocm-libraries that referenced this pull request Mar 24, 2026
…4212)
