
feat(num-input-tokens): add num input tokens #1306

Merged
Kipok merged 7 commits into main from awarno/num_input_tokens on Mar 17, 2026

Conversation

@AWarno (Collaborator) commented Mar 13, 2026

Add num_input_tokens to text and chat completion responses

Summary by CodeRabbit

  • New Features

    • Inference results now surface input token counts when present, supporting both prompt-token and input-token fields from API responses.
  • Tests

    • Added unit tests to validate extraction of input token counts across different response formats and ensure generation token counts remain correct.

AWarno added 2 commits March 13, 2026 12:43
Signed-off-by: Anna Warno <awarno@nvidia.com>
Signed-off-by: Anna Warno <awarno@nvidia.com>
coderabbitai bot (Contributor) commented Mar 13, 2026

📝 Walkthrough

Walkthrough

Parsers for completion and chat-completion responses now extract and populate num_input_tokens from the API usage object, preferring prompt_tokens and falling back to input_tokens. A new parameterized unit test verifies extraction for different usage shapes.
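
As a minimal illustration of that fallback chain (a hedged sketch, not the actual code in nemo_skills/inference/model/base.py; SimpleNamespace stands in for the OpenAI usage object):

```python
from types import SimpleNamespace


def extract_num_input_tokens(usage):
    """Prefer prompt_tokens, then fall back to input_tokens; None if neither is set."""
    if getattr(usage, "prompt_tokens", None) is not None:
        return usage.prompt_tokens
    if getattr(usage, "input_tokens", None) is not None:
        return usage.input_tokens
    return None


# Completions/chat responses expose prompt_tokens; some APIs expose input_tokens.
print(extract_num_input_tokens(SimpleNamespace(prompt_tokens=5)))  # 5
print(extract_num_input_tokens(SimpleNamespace(input_tokens=7)))   # 7
print(extract_num_input_tokens(SimpleNamespace()))                 # None
```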

Changes

  • Token Count Extraction — nemo_skills/inference/model/base.py
    Populate num_input_tokens in _parse_completion_response and _parse_chat_completion_response from response.usage.prompt_tokens with a fallback to response.usage.input_tokens. No other control-flow changes.
  • Test Coverage — tests/test_generation.py
    Added test_parse_completion_response_token_counts (parameterized) to assert num_generated_tokens and num_input_tokens for various usage shapes; minor test formatting reflows elsewhere in the file.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 20.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title clearly summarizes the main change: adding the num_input_tokens field to text and chat completion responses, matching the PR objectives.


Signed-off-by: Anna Warno <awarno@nvidia.com>
coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
nemo_skills/inference/model/base.py (1)

367-370: Deduplicate num_input_tokens extraction across both parsers.

Lines 367-370 and lines 389-392 duplicate the same fallback chain. Please extract it into one helper so the two parsers stay in sync.

♻️ Proposed refactor
+    def _extract_num_input_tokens(self, usage) -> int | None:
+        if getattr(usage, "prompt_tokens", None) is not None:
+            return usage.prompt_tokens
+        if getattr(usage, "input_tokens", None) is not None:
+            return usage.input_tokens
+        return None
+
     def _parse_completion_response(
         self, response: "openai.types.Completion", include_response: bool = False, **kwargs
     ) -> dict:
@@
         result = {"generation": output, "num_generated_tokens": response.usage.completion_tokens}
-        if getattr(response.usage, "prompt_tokens", None) is not None:
-            result["num_input_tokens"] = response.usage.prompt_tokens
-        elif getattr(response.usage, "input_tokens", None) is not None:
-            result["num_input_tokens"] = response.usage.input_tokens
+        num_input_tokens = self._extract_num_input_tokens(response.usage)
+        if num_input_tokens is not None:
+            result["num_input_tokens"] = num_input_tokens
@@
     def _parse_chat_completion_response(self, response, include_response: bool = False, **kwargs) -> dict:
@@
         result = {"generation": output, "num_generated_tokens": response.usage.completion_tokens}
-        if getattr(response.usage, "prompt_tokens", None) is not None:
-            result["num_input_tokens"] = response.usage.prompt_tokens
-        elif getattr(response.usage, "input_tokens", None) is not None:
-            result["num_input_tokens"] = response.usage.input_tokens
+        num_input_tokens = self._extract_num_input_tokens(response.usage)
+        if num_input_tokens is not None:
+            result["num_input_tokens"] = num_input_tokens

As per coding guidelines, "Keep code simple and elegant; reuse/extend existing functionality when possible, minimize conditional checks..."

Also applies to: 389-392

tests/test_generation.py (1)

287-301: Add matching token-count coverage for chat response parsing.

This new test validates completion parsing only, while the PR also adds the same behavior to _parse_chat_completion_response. Add a parallel chat test to prevent regressions in that path.

🧪 Suggested follow-up test
+@pytest.mark.parametrize(
+    "usage_kwargs,expected_input",
+    [
+        ({"prompt_tokens": 5}, 5),
+        ({"input_tokens": 7}, 7),
+        ({}, None),
+    ],
+)
+def test_parse_chat_completion_response_token_counts(usage_kwargs, expected_input):
+    model = BaseModel.__new__(BaseModel)
+    usage = SimpleNamespace(completion_tokens=10, **usage_kwargs)
+    message = SimpleNamespace(content="hi")
+    choice = SimpleNamespace(message=message, finish_reason="stop", logprobs=None)
+    response = SimpleNamespace(usage=usage, choices=[choice], output=[])
+    result = model._parse_chat_completion_response(response)
+    assert result["num_generated_tokens"] == 10
+    assert result.get("num_input_tokens") == expected_input
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_generation.py`:
- Line 299: The test indexes the return of _parse_completion_response(response) as if it were a list (result = model._parse_completion_response(response)[0]), but _parse_completion_response returns a dict. Remove the [0] and assign the dict directly: result = model._parse_completion_response(response).
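
The dict-vs-list mix-up flagged for line 299 can be reproduced with a plain dict (hypothetical values, mirroring the parser's return shape):

```python
# _parse_completion_response returns a dict, not a list of dicts.
result = {"generation": "hi", "num_generated_tokens": 10}

# Indexing the dict with [0] raises KeyError (0 is not a key),
# so the test would fail before any assertion runs:
try:
    result[0]
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 0

# Correct: use the returned dict directly.
assert result["num_generated_tokens"] == 10
```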

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 404cf91f-ae6c-4028-8ce7-641445c28441

📥 Commits

Reviewing files that changed from the base of the PR and between 86071c1 and 263ea21.

📒 Files selected for processing (2)
  • nemo_skills/inference/model/base.py
  • tests/test_generation.py

Comment thread: tests/test_generation.py (outdated)

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8f9dd9f6-f82b-42f2-b870-81b4ec039b6a

📥 Commits

Reviewing files that changed from the base of the PR and between 15b6812 and 6a8dee4.

📒 Files selected for processing (1)
  • tests/test_generation.py

Comment thread: tests/test_generation.py
Comment on lines +305 to +314

    def test_parse_completion_response_token_counts(usage_kwargs, expected_input):
        model = BaseModel.__new__(BaseModel)
        usage = SimpleNamespace(completion_tokens=10, **usage_kwargs)
        response = SimpleNamespace(
            usage=usage,
            choices=[SimpleNamespace(text="hi", finish_reason="stop", logprobs=None)],
        )
        result = model._parse_completion_response(response)
        assert result["num_generated_tokens"] == 10
        assert result.get("num_input_tokens") == expected_input

⚠️ Potential issue | 🟡 Minor

Make num_input_tokens assertion strict to avoid false positives.

At Line 314, .get() can hide a missing key in cases where presence should be enforced (expected_input is 5 or 7). Split the assertion so present cases use direct indexing and the None case explicitly checks absence.

💡 Suggested change
 def test_parse_completion_response_token_counts(usage_kwargs, expected_input):
     model = BaseModel.__new__(BaseModel)
     usage = SimpleNamespace(completion_tokens=10, **usage_kwargs)
     response = SimpleNamespace(
         usage=usage,
         choices=[SimpleNamespace(text="hi", finish_reason="stop", logprobs=None)],
     )
     result = model._parse_completion_response(response)
     assert result["num_generated_tokens"] == 10
-    assert result.get("num_input_tokens") == expected_input
+    if expected_input is None:
+        assert "num_input_tokens" not in result
+    else:
+        assert result["num_input_tokens"] == expected_input

As per coding guidelines, **/*.py: "Don't use .get() for accessing dictionary keys if the code expects them to be present; use direct access data[key_name] to fail with a clear error instead of silently corrupting data".


@Kipok Kipok merged commit 2949a82 into main Mar 17, 2026
5 checks passed
@Kipok Kipok deleted the awarno/num_input_tokens branch March 17, 2026 18:25
mehrzads pushed a commit to mehrzads/NeMo-Skills that referenced this pull request Apr 8, 2026
Signed-off-by: Anna Warno <awarno@nvidia.com>