fix(testing): Fix Parakeet, Evolla, Pi0, and Phi-3 test failures on main CI #45004

Merged
ydshieh merged 3 commits into huggingface:main from harshaljanjani:fix/parakeet-glmmoedsa-pi0-phi3-tests
Mar 27, 2026

Conversation

@harshaljanjani
Contributor

@harshaljanjani harshaljanjani commented Mar 25, 2026

What does this PR do?

The following failing tests were identified and fixed in this PR (grouped them together since they share related root causes OR the code changes were extremely minimal and didn't warrant separate PRs):

Phi-3: I made a similar fix for LongCat-Flash in another PR; this is essentially the same pattern. The PR "[V5] Return a BatchEncoding dict from apply_chat_template by default again" changed apply_chat_template to return a BatchEncoding dict instead of a tensor. The test was passing this dict directly to model.generate and then accessing .shape on it; this PR fixes that.
Pi0 / Parakeet / Evolla: test_sdpa_can_dispatch_on_flash forces only the Flash kernel, which rejects any non-null attention mask. Pi0 wraps PaliGemma, which creates a causal mask mapping even when attention_mask=None (PaliGemma is already skipped for this very reason, so Pi0 should follow suit); Parakeet always passes a relative-position bias as attention_mask, so the mask is never None even when the test removes it; and Evolla's protein encoder generates an attention mask internally when none is provided, which then reaches SDPA as a non-null mask. This PR adds the three missing models to the skip list.
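
For readers who want the Phi-3 pattern spelled out, here is a minimal sketch. It deliberately uses no real Transformers objects: FakeBatchEncoding, FakeTensor, and fake_generate are hypothetical stand-ins for the dict-like return value of apply_chat_template, a tensor, and model.generate. The actual diff in this PR may look different; the sketch only illustrates why reading .shape off the dict fails and one way the prompt length can be taken from input_ids instead.

```python
# Illustrative stand-ins only; none of these names come from Transformers.
from collections import UserDict


class FakeBatchEncoding(UserDict):
    """Hypothetical stand-in for the dict-like object apply_chat_template now returns."""


class FakeTensor:
    """Hypothetical stand-in for a 2D tensor; it only exposes .shape."""

    def __init__(self, rows):
        self.rows = rows

    @property
    def shape(self):
        return (len(self.rows), len(self.rows[0]))


def fake_generate(input_ids, max_new_tokens=2, **kwargs):
    """Hypothetical stand-in for model.generate: appends dummy token ids to each row."""
    return FakeTensor([row + [0] * max_new_tokens for row in input_ids.rows])


inputs = FakeBatchEncoding(
    input_ids=FakeTensor([[1, 2, 3]]),
    attention_mask=FakeTensor([[1, 1, 1]]),
)

# Old pattern: treating the dict as if it were the input_ids tensor.
try:
    inputs.shape
    old_pattern_failed = False
except AttributeError:
    old_pattern_failed = True

# Adjusted pattern: unpack the dict into generate and read the shape from input_ids.
outputs = fake_generate(**inputs)
prompt_len = inputs["input_ids"].shape[1]
new_tokens = outputs.shape[1] - prompt_len
```

Unpacking with ** also forwards attention_mask, which the stand-in simply ignores through **kwargs.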

cc: @Rocketknight1

CI Failures

Before the fixes (feel free to cross-check; these errors are reproducible):

[2 screenshots of local test runs]

After the fixes (feel free to cross-check):

[2 screenshots of local test runs]

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you fix any necessary existing tests?

@harshaljanjani harshaljanjani marked this pull request as ready for review March 25, 2026 20:25
@github-actions github-actions bot requested a review from ydshieh March 25, 2026 20:25
Comment thread tests/test_modeling_common.py
@Rocketknight1
Member

Overall looks good, with one comment!

@Rocketknight1 Rocketknight1 enabled auto-merge March 27, 2026 13:21
@Rocketknight1 Rocketknight1 force-pushed the fix/parakeet-glmmoedsa-pi0-phi3-tests branch from 22fd2b1 to bac66bf Compare March 27, 2026 13:21
@harshaljanjani
Contributor Author

P.S. @Rocketknight1 thanks for all the reviews over the past months! I've absolutely loved working on Transformers. Please do let me know if you're open to connecting outside of GH (no stress if not!). Looking forward to future PRs and hoping that the model I've been heads-down on gets its final core review soon lol :)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh
Collaborator

ydshieh commented Mar 27, 2026

Hi @harshaljanjani, thank you. I will also take a look.

BTW, it would be nice to give the (full) names of the failing tests, like tests/models/phi3 ...., as text too, so that we can copy-paste them. It is also good to attach the error log of the failing tests as text.

Collaborator

@ydshieh ydshieh left a comment

Also good from my side, thanks again!

I will push some updates to the expected output values, so we have more tests fixed.

@ydshieh
Collaborator

ydshieh commented Mar 27, 2026

run-slow: phi3

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/phi3"]
quantizations: []

@harshaljanjani
Contributor Author

harshaljanjani commented Mar 27, 2026

Good day @ydshieh, thanks for your time! I'll keep that in mind for future PRs. I usually attach screenshots from local runs, but that makes sense, I'll include the text along with them in forthcoming PRs :)
Just a gentle note regarding the tests: there's a possibly flaky (?) AssertionError that comes up in the test tests/models/phi3/test_modeling_phi3.py::Phi3IntegrationTest::test_phi3_mini_4k_instruct_generation, which I didn't resolve with this PR, but the AttributeError it was crashing with before should now be resolved!

@ydshieh
Collaborator

ydshieh commented Mar 27, 2026

It's probably not flaky; we just have different environments (hardware etc.) :-)

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 69aa42c9 workflow commit (merge commit)
PR bac66bf4 branch commit (from PR)
main 689f52ce base commit (on main)

Model CI Report

1 new failed tests from this PR 😭

  • phi3:
    tests/models/phi3/test_modeling_phi3.py::Phi3IntegrationTest::test_phi3_mini_4k_instruct_generation (❌ ⟹ ❌)

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: phi3

@ydshieh
Collaborator

ydshieh commented Mar 27, 2026

Only one

FAILED tests/models/phi3/test_modeling_phi3.py::Phi3IntegrationTest::test_export_static_cache - torch._dynamo.exc.Unsupported: Data-dependent branching

but it's already failing on main, not the scope of this PR.

@ydshieh ydshieh disabled auto-merge March 27, 2026 15:05
@ydshieh ydshieh merged commit b0bba2d into huggingface:main Mar 27, 2026
28 checks passed
@harshaljanjani harshaljanjani deleted the fix/parakeet-glmmoedsa-pi0-phi3-tests branch March 27, 2026 15:06
NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Mar 30, 2026
…ain CI (huggingface#45004)

* fix: Guard sdpa flash test and fix phi3/pi0 tests

* fix: Narrow scope by adding it to the skip list

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>