
feat(num-input-tokens): add num input tokens #1306

Merged
Kipok merged 7 commits into main from awarno/num_input_tokens on Mar 17, 2026

Conversation

@AWarno (Collaborator) commented Mar 13, 2026

Add num_input_tokens to text and chat completion responses

Summary by CodeRabbit

  • New Features

    • Inference results now surface input token counts when present, supporting both prompt-token and input-token fields from API responses.
  • Tests

    • Added unit tests to validate extraction of input token counts across different response formats and ensure generation token counts remain correct.

AWarno added 2 commits March 13, 2026 12:43
Signed-off-by: Anna Warno <awarno@nvidia.com>
Signed-off-by: Anna Warno <awarno@nvidia.com>
coderabbitai bot (Contributor) commented Mar 13, 2026

📝 Walkthrough

Walkthrough

Parsers for completion and chat-completion responses now extract and populate num_input_tokens from the API usage object, preferring prompt_tokens and falling back to input_tokens. A new parameterized unit test verifies extraction for different usage shapes.
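
As a minimal illustration of that fallback chain (a hedged sketch, not the actual code in nemo_skills/inference/model/base.py; SimpleNamespace stands in for the OpenAI usage object):

```python
from types import SimpleNamespace


def extract_num_input_tokens(usage):
    """Prefer prompt_tokens, then fall back to input_tokens; None if neither is set."""
    if getattr(usage, "prompt_tokens", None) is not None:
        return usage.prompt_tokens
    if getattr(usage, "input_tokens", None) is not None:
        return usage.input_tokens
    return None


# Completions/chat responses expose prompt_tokens; some APIs expose input_tokens.
print(extract_num_input_tokens(SimpleNamespace(prompt_tokens=5)))  # 5
print(extract_num_input_tokens(SimpleNamespace(input_tokens=7)))   # 7
print(extract_num_input_tokens(SimpleNamespace()))                 # None
```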

Changes

  • Token Count Extraction — nemo_skills/inference/model/base.py
    Populate num_input_tokens in _parse_completion_response and _parse_chat_completion_response from response.usage.prompt_tokens with a fallback to response.usage.input_tokens. No other control-flow changes.
  • Test Coverage — tests/test_generation.py
    Added test_parse_completion_response_token_counts (parameterized) to assert num_generated_tokens and num_input_tokens for various usage shapes; minor test formatting reflows elsewhere in the file.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 20.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title clearly summarizes the main change: adding the num_input_tokens field to text and chat completion responses, matching the PR objectives.


Signed-off-by: Anna Warno <awarno@nvidia.com>
coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
nemo_skills/inference/model/base.py (1)

367-370: Deduplicate num_input_tokens extraction across both parsers.

Lines 367-370 and lines 389-392 duplicate the same fallback chain. Please extract it into one helper so the two parsers stay in sync.

♻️ Proposed refactor
+    def _extract_num_input_tokens(self, usage) -> int | None:
+        if getattr(usage, "prompt_tokens", None) is not None:
+            return usage.prompt_tokens
+        if getattr(usage, "input_tokens", None) is not None:
+            return usage.input_tokens
+        return None
+
     def _parse_completion_response(
         self, response: "openai.types.Completion", include_response: bool = False, **kwargs
     ) -> dict:
@@
         result = {"generation": output, "num_generated_tokens": response.usage.completion_tokens}
-        if getattr(response.usage, "prompt_tokens", None) is not None:
-            result["num_input_tokens"] = response.usage.prompt_tokens
-        elif getattr(response.usage, "input_tokens", None) is not None:
-            result["num_input_tokens"] = response.usage.input_tokens
+        num_input_tokens = self._extract_num_input_tokens(response.usage)
+        if num_input_tokens is not None:
+            result["num_input_tokens"] = num_input_tokens
@@
     def _parse_chat_completion_response(self, response, include_response: bool = False, **kwargs) -> dict:
@@
         result = {"generation": output, "num_generated_tokens": response.usage.completion_tokens}
-        if getattr(response.usage, "prompt_tokens", None) is not None:
-            result["num_input_tokens"] = response.usage.prompt_tokens
-        elif getattr(response.usage, "input_tokens", None) is not None:
-            result["num_input_tokens"] = response.usage.input_tokens
+        num_input_tokens = self._extract_num_input_tokens(response.usage)
+        if num_input_tokens is not None:
+            result["num_input_tokens"] = num_input_tokens

As per coding guidelines, "Keep code simple and elegant; reuse/extend existing functionality when possible, minimize conditional checks..."

Also applies to: 389-392

tests/test_generation.py (1)

287-301: Add matching token-count coverage for chat response parsing.

This new test validates completion parsing only, while the PR also adds the same behavior to _parse_chat_completion_response. Add a parallel chat test to prevent regressions in that path.

🧪 Suggested follow-up test
+@pytest.mark.parametrize(
+    "usage_kwargs,expected_input",
+    [
+        ({"prompt_tokens": 5}, 5),
+        ({"input_tokens": 7}, 7),
+        ({}, None),
+    ],
+)
+def test_parse_chat_completion_response_token_counts(usage_kwargs, expected_input):
+    model = BaseModel.__new__(BaseModel)
+    usage = SimpleNamespace(completion_tokens=10, **usage_kwargs)
+    message = SimpleNamespace(content="hi")
+    choice = SimpleNamespace(message=message, finish_reason="stop", logprobs=None)
+    response = SimpleNamespace(usage=usage, choices=[choice], output=[])
+    result = model._parse_chat_completion_response(response)
+    assert result["num_generated_tokens"] == 10
+    assert result.get("num_input_tokens") == expected_input
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_generation.py`:
- Line 299: The test indexes the return of _parse_completion_response(response) as if it were a list (result = model._parse_completion_response(response)[0]), but _parse_completion_response returns a dict. Remove the [0] and assign the dict directly: result = model._parse_completion_response(response).
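
The dict-vs-list mix-up flagged for line 299 can be reproduced with a plain dict (hypothetical values, mirroring the parser's return shape):

```python
# _parse_completion_response returns a dict, not a list of dicts.
result = {"generation": "hi", "num_generated_tokens": 10}

# Indexing the dict with [0] raises KeyError (0 is not a key),
# so the test would fail before any assertion runs:
try:
    result[0]
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 0

# Correct: use the returned dict directly.
assert result["num_generated_tokens"] == 10
```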

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 404cf91f-ae6c-4028-8ce7-641445c28441

📥 Commits

Reviewing files that changed from the base of the PR and between 86071c1 and 263ea21.

📒 Files selected for processing (2)
  • nemo_skills/inference/model/base.py
  • tests/test_generation.py

Comment thread: tests/test_generation.py (outdated)

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8f9dd9f6-f82b-42f2-b870-81b4ec039b6a

📥 Commits

Reviewing files that changed from the base of the PR and between 15b6812 and 6a8dee4.

📒 Files selected for processing (1)
  • tests/test_generation.py

Comment thread: tests/test_generation.py
Comment on lines +305 to +314

    def test_parse_completion_response_token_counts(usage_kwargs, expected_input):
        model = BaseModel.__new__(BaseModel)
        usage = SimpleNamespace(completion_tokens=10, **usage_kwargs)
        response = SimpleNamespace(
            usage=usage,
            choices=[SimpleNamespace(text="hi", finish_reason="stop", logprobs=None)],
        )
        result = model._parse_completion_response(response)
        assert result["num_generated_tokens"] == 10
        assert result.get("num_input_tokens") == expected_input

⚠️ Potential issue | 🟡 Minor

Make num_input_tokens assertion strict to avoid false positives.

At Line 314, .get() can hide a missing key in cases where presence should be enforced (expected_input is 5 or 7). Split the assertion so present cases use direct indexing and the None case explicitly checks absence.

💡 Suggested change
 def test_parse_completion_response_token_counts(usage_kwargs, expected_input):
     model = BaseModel.__new__(BaseModel)
     usage = SimpleNamespace(completion_tokens=10, **usage_kwargs)
     response = SimpleNamespace(
         usage=usage,
         choices=[SimpleNamespace(text="hi", finish_reason="stop", logprobs=None)],
     )
     result = model._parse_completion_response(response)
     assert result["num_generated_tokens"] == 10
-    assert result.get("num_input_tokens") == expected_input
+    if expected_input is None:
+        assert "num_input_tokens" not in result
+    else:
+        assert result["num_input_tokens"] == expected_input

As per coding guidelines, **/*.py: "Don't use .get() for accessing dictionary keys if the code expects them to be present; use direct access data[key_name] to fail with a clear error instead of silently corrupting data".


@Kipok Kipok merged commit 2949a82 into main Mar 17, 2026
5 checks passed
@Kipok Kipok deleted the awarno/num_input_tokens branch March 17, 2026 18:25
mehrzads pushed a commit to mehrzads/NeMo-Skills that referenced this pull request Apr 8, 2026
Signed-off-by: Anna Warno <awarno@nvidia.com>