test: add tokenizer regression test for granite-4.0-micro #948
kndtran wants to merge 2 commits into generative-computing:main from …
Conversation
…-computing#947) Assert golden token IDs and tokenizer class to catch transformers upgrades that change AutoTokenizer resolution for granite-4.0-micro. Motivated by transformers v5 returning GPT2Tokenizer (constructs BPE from vocab.json/merges.txt) instead of GPT2TokenizerFast (loads tokenizer.json), which produced different token IDs and broke 3/6 RAG intrinsics.

Closes generative-computing#947
Assisted-by: Claude Code
Signed-off-by: Khoi-Nguyen Tran <kndtran@ibm.com>
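For context, a minimal sketch of the kind of regression check this commit describes, assuming pytest, an assumed Hugging Face repo id, and placeholder golden data (the real test pins token IDs captured from a known-good transformers version):

```python
import pytest
import transformers

# Assumed repo id for granite-4.0-micro; the id used by the actual test may differ.
MODEL_ID = "ibm-granite/granite-4.0-micro"

# Placeholder golden data for illustration only; the real entries hold token IDs
# captured from a known-good transformers version.
GOLDEN = {
    "What is the DUNS number for Acme Corp?": [101, 102, 103, 104, 105],
}


@pytest.fixture(scope="module")
def tokenizer():
    return transformers.AutoTokenizer.from_pretrained(MODEL_ID)


def test_tokenizer_class(tokenizer):
    # A tokenizer.json-backed fast tokenizer is expected; a slow BPE tokenizer
    # built from vocab.json/merges.txt indicates AutoTokenizer resolution changed.
    assert tokenizer.is_fast


@pytest.mark.parametrize("text,expected_ids", list(GOLDEN.items()))
def test_golden_token_ids(tokenizer, text, expected_ids):
    assert tokenizer(text, add_special_tokens=False)["input_ids"] == expected_ids
```

The class check is what surfaces the v5 behavior described above: a slow GPT2Tokenizer reports `is_fast == False` even before any golden-ID drift is inspected.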
Original body:
Summary
Closes #947
Test plan
The PR description has been updated. Please fill out the template for your PR to be reviewed.
…enerative-computing#947) Replace full-query golden entries and intrinsic output labels with 20 diverse corrupted snippets extracted from context_relevance and query_rewrite eval data. Coverage now spans pure numbers, DUNS, company suffixes, fiscal year compounds, industry codes, standards, and mixed text+number phrases, all confirmed to diverge under transformers v5.

Assisted-by: Claude Code
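As an illustration of the snippet-level coverage this commit describes, a hypothetical sketch of how such golden entries might be organized; the snippets and token IDs below are invented stand-ins for the listed categories, not the actual eval data:

```python
# Hypothetical golden entries keyed by corrupted snippet; the real entries come
# from context_relevance / query_rewrite eval data, with IDs captured from a
# known-good tokenizer. All values here are placeholders.
GOLDEN_SNIPPETS: dict[str, list[int]] = {
    "804735132": [201, 202],                # pure number
    "DUNS 06-989-4512": [203, 204, 205],    # DUNS identifier
    "Acme Holdings LLC": [206, 207, 208],   # company suffix
    "FY2023 revenue": [209, 210],           # fiscal year compound
    "NAICS 541511": [211, 212],             # industry code
    "ISO 27001": [213, 214],                # standard
    "grew 12% in Q3": [215, 216, 217],      # mixed text + number phrase
}


def diverging_snippets(tokenizer) -> list[str]:
    """Return snippets whose current token IDs no longer match the golden IDs."""
    return [
        text
        for text, expected in GOLDEN_SNIPPETS.items()
        if tokenizer(text, add_special_tokens=False)["input_ids"] != expected
    ]
```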
@psschwei I updated the PR template with the details.
@kndtran I recommend that we come up with an explanation for this problem that makes sense before checking a workaround into Mellea. The current analysis from Claude produces more questions than answers. Specific important questions:
@frreiss I got Claude to refine the details and evidence in the referenced issue #947. This PR is just a simple regression test for the expected input IDs, specifically for our intrinsics use case, so there are no workarounds yet (only suggested), and it's not a general test in … Let me try to address those questions in the referenced issue. The release notes are light on the details of the tokenizer consolidation, so I (with Claude's help) had to discover the diff changes in the HF code.
@pytest.fixture(scope="module")
def tokenizer():
    return transformers.AutoTokenizer.from_pretrained(MODEL_ID)
This line makes no sense to me. Why are we testing the correctness of loading the tokenizer the wrong way? Per the comment above, this line should be calling PreTrainedTokenizerFast.from_pretrained(). Why do these tests not fail?
mellea is still currently on transformers < 5 (line 47 in 7912a1d).
This test will fail on migration to TF5 and will need to be updated with the workaround, if that's the choice, or with an official fix in transformers.
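If the workaround route is taken on a TF5 migration, one minimal sketch (assuming the model still ships a tokenizer.json, that the regression is limited to AutoTokenizer resolution, and using an assumed repo id) would be to load the fast tokenizer directly, along the lines the review comment above suggests:

```python
from transformers import PreTrainedTokenizerFast

# Assumed repo id; adjust to the actual granite-4.0-micro checkpoint in use.
MODEL_ID = "ibm-granite/granite-4.0-micro"


def load_granite_tokenizer(model_id: str = MODEL_ID) -> PreTrainedTokenizerFast:
    # Load the tokenizer.json-backed fast tokenizer directly instead of going
    # through AutoTokenizer, so a change in AutoTokenizer resolution (e.g. to a
    # slow GPT2Tokenizer built from vocab.json/merges.txt) cannot alter the
    # token IDs produced for the intrinsics.
    return PreTrainedTokenizerFast.from_pretrained(model_id)
```

This is only the suggested direction from the thread, not a committed fix.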
Hmm ok, a workaround will have to be implemented if there was a migration to TF5 in …
planetf1 left a comment
Just added a note on test gating -- rest will wait for a v5 upgrade
Type of PR
Misc PR

Description

Testing
uv run pytest test/formatters/granite/test_tokenizer_regression.py -v: 26 passed
uv run ruff check and uv run ruff format --check: clean

Additional

Attribution