Refactor variant handling and add CUDA fallback #339

Merged: danieldk merged 9 commits into main from variant-parser-resolver (Mar 13, 2026)

danieldk (Member) commented Mar 12, 2026

Build variants were stringly-typed throughout kernels, with custom parsing and serialization sprinkled everywhere. This change makes variants strongly typed by adding a `Variant` class. It also centralizes parsing and serialization in one place and lets code easily query the individual parts of a variant.
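To illustrate the idea of a typed variant with a single parse/serialize entry point, here is a minimal sketch. It is not the `Variant` class from this PR: the class name `SketchVariant`, its fields, and the simplified `<framework>-<backend>[<major><minor>]` string format are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import ClassVar, Optional, Tuple


@dataclass(frozen=True)
class SketchVariant:
    """Hypothetical stand-in for a typed build variant (not the PR's class)."""

    framework: str  # e.g. "torch"
    backend: str  # e.g. "cu" or "cuda"
    backend_version: Optional[Tuple[int, int]] = None  # None marks a noarch variant

    # Assumed format: "<framework>-<backend>[<major><minor>]", minor is one digit.
    _REGEX: ClassVar[re.Pattern] = re.compile(r"([a-z]+)-([a-z]+)(?:(\d+)(\d))?")

    @classmethod
    def parse(cls, variant_str: str) -> "SketchVariant":
        """Parse a variant string, raising ValueError if it does not match."""
        m = cls._REGEX.fullmatch(variant_str)
        if m is None:
            raise ValueError(f"invalid variant string: {variant_str!r}")
        framework, backend, major, minor = m.groups()
        version = (int(major), int(minor)) if major is not None else None
        return cls(framework, backend, version)

    @property
    def is_noarch(self) -> bool:
        return self.backend_version is None

    def __str__(self) -> str:
        """Serialize back to the same string form that parse() accepts."""
        if self.backend_version is None:
            suffix = ""
        else:
            suffix = "{}{}".format(*self.backend_version)
        return f"{self.framework}-{self.backend}{suffix}"
```

With parsing and serialization living on one type, callers can ask questions such as `SketchVariant.parse("torch-cu128").backend_version` instead of slicing strings themselves.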

This also fundamentally changes how we get variants from the Hub. Rather than casting a wide net with all possible variants and using allow patterns based on that, we now query the Hub for the variants of a kernel, parse them, and decide ahead of time whether there is an applicable variant. If there are multiple applicable variants, we can select the best one (e.g. arch before noarch, or a recent CUDA version before older versions).

The filtering and sorting logic is now completely centralized in two functions (`_filter_variants` and `_sort_variants`).
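The filter-then-sort resolution described above can be sketched roughly as follows. This is an illustration only, not the PR's `_filter_variants` / `_sort_variants`: the `Candidate` type, the function names, and the compatibility rule (a CUDA variant applies when its version is not newer than the system's) are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical parsed variant: a name plus an optional CUDA version."""

    name: str
    cuda: Optional[Tuple[int, int]] = None  # None marks a noarch variant


def filter_applicable(
    candidates: List[Candidate], system_cuda: Tuple[int, int]
) -> List[Candidate]:
    """Keep noarch variants and CUDA variants not newer than the system (assumed rule)."""
    return [c for c in candidates if c.cuda is None or c.cuda <= system_cuda]


def sort_by_preference(candidates: List[Candidate]) -> List[Candidate]:
    """Order arch-specific variants before noarch, and newer CUDA before older."""
    return sorted(
        candidates,
        key=lambda c: (c.cuda is not None, c.cuda or (0, 0)),
        reverse=True,
    )


def resolve(
    candidates: List[Candidate], system_cuda: Tuple[int, int]
) -> Optional[Candidate]:
    """Return the most preferred applicable variant, or None if nothing applies."""
    ranked = sort_by_preference(filter_applicable(candidates, system_cuda))
    return ranked[0] if ranked else None
```

Because applicability is decided on parsed values before anything is downloaded, a missing match can be reported up front, and a repository that only ships a noarch variant still resolves to it.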

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

danieldk marked this pull request as ready for review March 12, 2026 14:15
drbh (Collaborator) previously approved these changes Mar 12, 2026

lgtm!

Just added a small nit, but it's not critical and doesn't block merging.

sayakpaul (Member) left a comment

Great refactor! I think it will be much more sustainable long-term.

I think we can close #336 as well in favor of this PR?

    variant_strs = {
        item.path.split("/")[-1] for item in tree if isinstance(item, RepoFolder)
    }
except Exception:

Member commented:

Perhaps a logger.warning() with a descriptive message here?

Member (Author) replied:

I think the error here was catching the exception at all. It should bubble up in this case, because the user may have used an invalid repo name, revision, etc. Changed it to bubble up.

try:
    variants.append(Variant.parse(variant_str))
except ValueError:
    pass

Member commented:

Same as above.

Member (Author) replied:

Logging a warning now.
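A sketch of what "log a warning instead of silently passing" could look like. This is an assumption about the shape of the fix, not the code that was merged; the helper name `parse_known_variants` and the log message are hypothetical.

```python
import logging
from typing import Callable, Iterable, List, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def parse_known_variants(
    variant_strs: Iterable[str], parse: Callable[[str], T]
) -> List[T]:
    """Parse every string that `parse` accepts and warn about the rest."""
    parsed: List[T] = []
    for variant_str in variant_strs:
        try:
            parsed.append(parse(variant_str))
        except ValueError as err:
            # Unparseable names are skipped, but no longer silently.
            logger.warning("Skipping unparseable variant %r: %s", variant_str, err)
    return parsed
```

Only `ValueError` is caught here, so any other failure (for example an error while listing the repository) still propagates to the caller, in line with the reply in the previous thread.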

    tvm_ffi_version=None,
)
assert result is not None
assert result.variant_str == "torch-cuda"

Member commented:

Just out of curiosity: is this a reasonable resolution, or should it have errored out saying that no matching variants were found?

Member (Author) replied:

Yeah, noarch variants are always an option. If we didn't resolve to that, we could never use a noarch kernel, even in a repo that only has noarch.

sayakpaul (Member) left a comment

Let's ship

-@dataclass
+@dataclass(unsafe_hash=True)
 class CANN:
     _VARIANT_REGEX: ClassVar[re.Pattern] = re.compile(r"cann(\d+)(\d+)")

Member commented:

Ah lovely!
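For readers unfamiliar with `unsafe_hash=True`: a plain `@dataclass` defines `__eq__` and therefore leaves instances unhashable, while `unsafe_hash=True` generates a `__hash__` from the fields so instances can go into sets and dict keys. The sketch below reuses the regex from the diff; the class name `CannSketch`, the `major`/`minor` fields, and the `parse` method are assumptions, not the PR's `CANN` class.

```python
import re
from dataclasses import dataclass
from typing import ClassVar


@dataclass(unsafe_hash=True)
class CannSketch:
    """Hypothetical version holder; field names are assumptions."""

    major: int
    minor: int

    # Same pattern as in the diff above; ClassVar keeps it out of the fields.
    _VARIANT_REGEX: ClassVar[re.Pattern] = re.compile(r"cann(\d+)(\d+)")

    @classmethod
    def parse(cls, s: str) -> "CannSketch":
        """Parse strings such as "cann81", raising ValueError otherwise."""
        m = cls._VARIANT_REGEX.fullmatch(s)
        if m is None:
            raise ValueError(f"not a CANN variant part: {s!r}")
        return cls(int(m.group(1)), int(m.group(2)))


# Equal instances hash equally, so duplicates collapse in a set.
unique = {CannSketch.parse("cann81"), CannSketch.parse("cann81")}
```

Note that the first group is greedy, so with this pattern the last digit always becomes the second group: `"cann810"` splits into `81` and `0`.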

danieldk merged commit b7a26de into main Mar 13, 2026
38 of 39 checks passed
danieldk deleted the variant-parser-resolver branch March 13, 2026 13:15