Refactor variant handling and add CUDA fallback#339
Conversation
Build variants were stringly-typed throughout kernels, with custom parsing and serialization sprinkled everywhere. This change adds proper/strong typing to variants adding a `Variant` class. This also centers parsing/serialization in one place and allows code to easily query various parts of of a variant. This also fundamentally changes how we deal with getting variants from the Hub. Rather than casting a wide net with all possible variants and using allow patterns based on that, we query the hub for variants of a kernel, parse them and can decide if there is an applicable variant ahead of time. If there are multiple applicable variants, we can select the best one (e.g. arch before noarch or recent CUDA version before older versions).
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
drbh
left a comment
There was a problem hiding this comment.
lgtm!
just addd a small nit but its not critical and doesn't block merging
kernels/src/kernels/variants.py
Outdated
| variant_strs = { | ||
| item.path.split("/")[-1] for item in tree if isinstance(item, RepoFolder) | ||
| } | ||
| except Exception: |
There was a problem hiding this comment.
Perhaps a logger.warning() with a descriptive message here?
There was a problem hiding this comment.
I think the error here was to catch the exception at all, it should bubble up in this case, because the user may have used an invalid repo name, revision, etc. Changed it to bubble up.
kernels/src/kernels/variants.py
Outdated
| try: | ||
| variants.append(Variant.parse(variant_str)) | ||
| except ValueError: | ||
| pass |
kernels/tests/test_variants.py
Outdated
| tvm_ffi_version=None, | ||
| ) | ||
| assert result is not None | ||
| assert result.variant_str == "torch-cuda" |
There was a problem hiding this comment.
Just for curiosity. Is this a reasonable resolution or should it have errored out saying no matching variants found?
There was a problem hiding this comment.
Yeah, noarch are always an option. If we didn't resolve to that, we could never use a noarch kernel, even in a repo that only has noarch.
Build variants were stringly-typed throughout kernels, with custom parsing and serialization sprinkled everywhere. This change adds proper/strong typing to variants adding a `Variant` class. This also centers parsing/serialization in one place and allows code to easily query various parts of of a variant. This also fundamentally changes how we deal with getting variants from the Hub. Rather than casting a wide net with all possible variants and using allow patterns based on that, we query the hub for variants of a kernel, parse them and can decide if there is an applicable variant ahead of time. If there are multiple applicable variants, we can select the best one (e.g. arch before noarch or recent CUDA version before older versions).
| @dataclass | ||
| @dataclass(unsafe_hash=True) | ||
| class CANN: | ||
| _VARIANT_REGEX: ClassVar[re.Pattern] = re.compile(r"cann(\d+)(\d+)") |
Build variants were stringly-typed throughout kernels, with custom parsing and serialization sprinkled everywhere. This change adds proper/strong typing to variants adding a
Variantclass. This also centers parsing/serialization in one place and allows code to easily query various parts of of a variant.This also fundamentally changes how we deal with getting variants from the Hub. Rather than casting a wide net with all possible variants and using allow patterns based on that, we query the hub for variants of a kernel, parse them and can decide if there is an applicable variant ahead of time. If there are multiple applicable variants, we can select the best one (e.g. arch before noarch or recent CUDA version before older versions).
The filtering and sorting logic is now completely centralized in two functions (
_filter_variantsand_sort_variants).