fix(testing): Fix Parakeet, Evolla, Pi0, and Phi-3 test failures on main CI (#45004)
Conversation
Overall looks good, with one comment!
Force-pushed 22fd2b1 to bac66bf
P.S. @Rocketknight1 thanks for all the reviews over the past months! I've absolutely loved working on Transformers. Please do let me know if you're open to connecting outside of GH (no stress if not!). Looking forward to future PRs and hoping that the model I've been heads-down on gets its final core review soon lol :)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @harshaljanjani, thank you. I will also take a look. BTW, it would be nice to give the (full) test names that fail, like
ydshieh left a comment
Also good from my side, thanks again!
I will push some updates to the expected output values, so we have more tests fixed.
run-slow: phi3
This comment contains models: ["models/phi3"]
Good day @ydshieh, thanks for your time! I'll keep that in mind for future PRs. I usually attach screenshots from local runs, but that makes sense; I'll include the text along with them in forthcoming PRs :)
It's probably not flaky; we just have different environments (hardware etc.) :-)
CI Results / Commit Info
Model CI Report: ❌ 1 new failed test from this PR 😭
[For maintainers] Suggested jobs to run (before merge): run-slow: phi3
Only one, but it's already failing on
…ain CI (huggingface#45004)

* fix: Guard sdpa flash test and fix phi3/pi0 tests
* fix: Narrow scope by adding it to the skip list
* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
What does this PR do?
The following failing tests were identified and fixed in this PR (grouped together since they share related root causes, or the code changes were minimal enough not to warrant separate PRs):
→ Phi-3: I made a similar fix for LongCat-Flash in another PR; it's essentially the same pattern. The PR "[V5] Return a BatchEncoding dict from apply_chat_template by default" again changed `apply_chat_template` to return a `BatchEncoding` dict instead of a tensor. The test was passing this dict directly to `model.generate` and then accessing `.shape`; this fixes that.

→ Pi0 / Parakeet / Evolla: `test_sdpa_can_dispatch_on_flash` forces only the Flash kernel, which rejects any non-null attention mask. Pi0 wraps PaliGemma, which creates a causal mask mapping even when `attention_mask=None` (PaliGemma is already skipped for this very reason, so Pi0 should follow suit); Parakeet always passes a relative-position bias as `attention_mask`, so the mask is never `None` even when the test removes it; and Evolla's protein encoder generates an attention mask internally when none is provided, which then reaches SDPA as a non-null mask. Added the missing three to the skip list.

cc: @Rocketknight1
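To make the Phi-3 failure mode concrete, here is a minimal, dependency-free sketch. The `FakeTensor` class and the `apply_chat_template` stub below are stand-ins I'm introducing for illustration, not the real Transformers API: a dict return value has no `.shape`, so the old test code raises, while indexing `input_ids` first works.

```python
# Sketch of the Phi-3 failure: apply_chat_template now returns a
# BatchEncoding-style dict, not a tensor, so `inputs.shape` breaks.

class FakeTensor:
    """Stand-in for a tensor of token ids (hypothetical helper)."""
    def __init__(self, ids):
        self.ids = ids

    @property
    def shape(self):
        return (1, len(self.ids))

def apply_chat_template(messages):
    # Post-V5 behaviour per the PR description: a dict, not a tensor.
    return {"input_ids": FakeTensor([1, 2, 3, 4])}

inputs = apply_chat_template([{"role": "user", "content": "hi"}])

# Old test code accessed .shape directly -> AttributeError on a dict.
try:
    _ = inputs.shape
    broke = False
except AttributeError:
    broke = True

# Fixed test code: unpack the dict, then read the sequence length.
seq_len = inputs["input_ids"].shape[1]
print(broke, seq_len)  # True 4
```

In the real test the same unpacking applies before `model.generate(**inputs, ...)` and before slicing off the prompt tokens.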
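The skip-list change for Pi0 / Parakeet / Evolla can be illustrated with plain `unittest`. This is a hedged sketch: the actual Transformers suite uses its own skip mechanism and class names, so `Pi0ModelTest` here is hypothetical.

```python
import unittest

class Pi0ModelTest(unittest.TestCase):
    # Pi0 wraps PaliGemma, which builds a causal mask even when
    # attention_mask=None, and flash-only dispatch rejects any
    # non-null mask, so the test is skipped rather than failing.
    @unittest.skip("Pi0 always produces a non-null attention mask")
    def test_sdpa_can_dispatch_on_flash(self):
        raise AssertionError("should never run")

# Run the suite and confirm the test is skipped, not failed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(Pi0ModelTest)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), len(result.failures))  # 1 0
```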
CI Failures
Before the fixes (feel free to cross-check; these errors are reproducible):
After the fixes (feel free to cross-check):
Before submitting

* Did you read the contributor guideline, Pull Request section?
* Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
* Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.