
AttributeError: Qwen2Tokenizer has no attribute batch_encode_plus. Did you mean: '_encode_plus'? #870

Open
jiyzhang wants to merge 1 commit into NVIDIA:main from jiyzhang:patch-1

Conversation

jiyzhang commented Feb 9, 2026

What does this PR do?

Type of change: Bug fix

Overview:

The error below occurred when trying to quantize Qwen3 models (Qwen/Qwen3-Code-Next):

  File "/app/TensorRT-Model-Optimizer/examples/llm_ptq/hf_ptq.py", line 146, in make_calib_dataloader
    calib_dataloader = get_dataset_dataloader(
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/modelopt/torch/utils/dataset_utils.py", line 217, in get_dataset_dataloader
    batch_encoded = tokenizer.batch_encode_plus(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 1291, in __getattr__
    raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: Qwen2Tokenizer has no attribute batch_encode_plus. Did you mean: '_encode_plus'?

`batch_encode_plus` was deprecated; it is recommended to use `tokenizer(...)` instead.

File changed: `modelopt/torch/utils/dataset_utils.py`

From:

```python
    batch_encoded = tokenizer.batch_encode_plus(
        all_samples,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_sample_length,
    )
```

To:

```python
    batch_encoded = tokenizer(
        all_samples,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_sample_length,
    )
```
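For context, here is a minimal, self-contained sketch of the same replacement pattern; the checkpoint name and sample strings are illustrative, not taken from the PR:

```python
from transformers import AutoTokenizer

# Illustrative checkpoint; any Hugging Face tokenizer with a pad token
# behaves the same way here.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

samples = ["first calibration sample", "a second, slightly longer calibration sample"]

# Calling the tokenizer object directly returns the same BatchEncoding
# (input_ids, attention_mask) that batch_encode_plus used to return.
batch_encoded = tokenizer(
    samples,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=512,
)
print(batch_encoded["input_ids"].shape)  # torch.Size([2, <padded length>])
```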

Usage

There is no change to the usage.

Testing

After this change, quantizing Qwen3 models completes successfully.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • Refactor
    • Updated the tokenization interface to use a more modern and streamlined approach while preserving all existing functionality and output compatibility.


Signed-off-by: jiyzhang <jiyongzhang@gmail.com>
@jiyzhang jiyzhang requested a review from a team as a code owner February 9, 2026 07:08
@jiyzhang jiyzhang requested a review from realAsma February 9, 2026 07:08
copy-pr-bot bot commented Feb 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai bot (Contributor) commented Feb 9, 2026

📝 Walkthrough

The change replaces tokenizer.batch_encode_plus() with a direct tokenizer() call in the dataset utilities module, passing equivalent parameters including return_tensors="pt", padding=True, truncation=True, and max_length to maintain the same encoding behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Tokenizer API Migration<br>`modelopt/torch/utils/dataset_utils.py` | Replaced deprecated `batch_encode_plus()` method with direct tokenizer call interface while preserving all encoding parameters and output structure. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title describes the specific error message encountered and serves as a bug report title rather than summarizing the solution. While it accurately reflects the problem being fixed, it highlights the error symptom rather than the main change. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |



No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
modelopt/torch/utils/dataset_utils.py (1)

227-227: Stale comment: still references batch_encode_plus.

Now that the explicit batch_encode_plus call is gone, this comment is misleading. Consider updating it to reflect the actual reason for the deep copy.

Suggested fix:

```diff
-    # batch_encode_plus will modify the tokenizer in place, so we need to clone it.
+    # Tokenizer encoding may modify internal state in place, so we need to clone it.
```
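For illustration, a minimal sketch of the clone-before-encode pattern the nitpick refers to; the helper name is hypothetical, and `copy.deepcopy` is assumed to be valid for the tokenizer in use, as it is for standard Hugging Face tokenizers:

```python
import copy

def encode_batch(tokenizer, samples, max_length):
    """Encode a batch without mutating the caller's tokenizer."""
    # Clone first so any in-place state changes during encoding
    # do not leak back to the caller's tokenizer object.
    tokenizer = copy.deepcopy(tokenizer)
    return tokenizer(
        samples,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_length,
    )
```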


cjluo-nv (Collaborator) commented Feb 9, 2026

Do you know why Qwen3-Code-Next uses Qwen2 tokenizer?

jiyzhang (Author) commented:

> Do you know why Qwen3-Code-Next uses Qwen2 tokenizer?

  1. There is no Qwen3 tokenizer released.
  2. The vocabulary didn't change between Qwen2 and Qwen3.
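A quick way to see which tokenizer class a checkpoint resolves to is to load it and inspect the class; this is illustrative, and the repo id is taken from the discussion above, so it may not match an actual Hub id:

```python
from transformers import AutoTokenizer

# Repo id as written in this PR; treat it as illustrative.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-Code-Next", trust_remote_code=True)
print(type(tok).__name__)  # prints e.g. "Qwen2Tokenizer" or "Qwen2TokenizerFast"
```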
