feat(num-input-tokens): add num input tokens #1306
Signed-off-by: Anna Warno <awarno@nvidia.com>
📝 Walkthrough

Parsers for completion and chat-completion responses now extract and populate `num_input_tokens`.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (2)
nemo_skills/inference/model/base.py (1)
367-370: Deduplicate `num_input_tokens` extraction across both parsers. Lines 367-370 and 389-392 duplicate the same fallback chain. Please extract this into one helper to keep behavior in sync.
♻️ Proposed refactor
```diff
+    def _extract_num_input_tokens(self, usage) -> int | None:
+        if getattr(usage, "prompt_tokens", None) is not None:
+            return usage.prompt_tokens
+        if getattr(usage, "input_tokens", None) is not None:
+            return usage.input_tokens
+        return None
+
     def _parse_completion_response(
         self, response: "openai.types.Completion", include_response: bool = False, **kwargs
     ) -> dict:
@@
         result = {"generation": output, "num_generated_tokens": response.usage.completion_tokens}
-        if getattr(response.usage, "prompt_tokens", None) is not None:
-            result["num_input_tokens"] = response.usage.prompt_tokens
-        elif getattr(response.usage, "input_tokens", None) is not None:
-            result["num_input_tokens"] = response.usage.input_tokens
+        num_input_tokens = self._extract_num_input_tokens(response.usage)
+        if num_input_tokens is not None:
+            result["num_input_tokens"] = num_input_tokens
@@
     def _parse_chat_completion_response(self, response, include_response: bool = False, **kwargs) -> dict:
@@
         result = {"generation": output, "num_generated_tokens": response.usage.completion_tokens}
-        if getattr(response.usage, "prompt_tokens", None) is not None:
-            result["num_input_tokens"] = response.usage.prompt_tokens
-        elif getattr(response.usage, "input_tokens", None) is not None:
-            result["num_input_tokens"] = response.usage.input_tokens
+        num_input_tokens = self._extract_num_input_tokens(response.usage)
+        if num_input_tokens is not None:
+            result["num_input_tokens"] = num_input_tokens
```

As per coding guidelines, "Keep code simple and elegant; reuse/extend existing functionality when possible, minimize conditional checks...".
Also applies to: 389-392
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/inference/model/base.py` around lines 367-370: Extract the repeated fallback logic for reading input tokens into a single helper (e.g., a private function like `_extract_num_input_tokens(response)`) that checks `getattr(response.usage, "prompt_tokens", None)` then `getattr(response.usage, "input_tokens", None)` and returns the found int or None; replace both duplicated blocks (the places that set `result["num_input_tokens"]`) with a single call `result["num_input_tokens"] = _extract_num_input_tokens(response)` so behavior stays in sync across both parsers.

tests/test_generation.py (1)
287-301: Add matching token-count coverage for chat response parsing. This new test validates completion parsing only, while the PR also adds the same behavior to `_parse_chat_completion_response`. Add a parallel chat test to prevent regressions in that path.

🧪 Suggested follow-up test
```diff
+@pytest.mark.parametrize(
+    "usage_kwargs,expected_input",
+    [
+        ({"prompt_tokens": 5}, 5),
+        ({"input_tokens": 7}, 7),
+        ({}, None),
+    ],
+)
+def test_parse_chat_completion_response_token_counts(usage_kwargs, expected_input):
+    model = BaseModel.__new__(BaseModel)
+    usage = SimpleNamespace(completion_tokens=10, **usage_kwargs)
+    message = SimpleNamespace(content="hi")
+    choice = SimpleNamespace(message=message, finish_reason="stop", logprobs=None)
+    response = SimpleNamespace(usage=usage, choices=[choice], output=[])
+    result = model._parse_chat_completion_response(response)
+    assert result["num_generated_tokens"] == 10
+    assert result.get("num_input_tokens") == expected_input
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_generation.py` around lines 287 - 301, Add a parallel test for chat responses mirroring test_parse_completion_response_token_counts to cover _parse_chat_completion_response: instantiate BaseModel similarly, build a chat-style response object whose structure matches what _parse_chat_completion_response expects (e.g., .usage with completion_tokens and prompt/input token keys and .choices with message/text-like payload), parametrize the same usage_kwargs and expected_input cases ({"prompt_tokens":5}, {"input_tokens":7}, {}), call model._parse_chat_completion_response(response)[0], and assert num_generated_tokens == 10 and num_input_tokens == expected_input to ensure the chat parsing path has identical token-count coverage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_generation.py`:
- Line 299: The test is indexing the return of
_parse_completion_response(response) as if it were a list (result =
model._parse_completion_response(response)[0]) but _parse_completion_response
returns a dict; remove the [0] and either assign the dict directly to result
(result = model._parse_completion_response(response)) or access the specific
dict key you expect (e.g., result =
model._parse_completion_response(response)['choices'] or the appropriate key),
ensuring you reference the _parse_completion_response function and the result
variable when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 404cf91f-ae6c-4028-8ce7-641445c28441
📒 Files selected for processing (2)
nemo_skills/inference/model/base.py
tests/test_generation.py
…eMo-Skills into awarno/num_input_tokens
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_generation.py`:
- Around line 305-314: The test test_parse_completion_response_token_counts
should assert num_input_tokens strictly instead of using .get(): call
model._parse_completion_response (as already done) and then use direct indexing
result["num_input_tokens"] == expected_input when expected_input is a concrete
number (5 or 7) so a missing key fails loudly, and for the case where
expected_input is None assert that "num_input_tokens" not in result (or
result.get("num_input_tokens") is None explicitly) to cover the absence case;
update the assertions around result.get("num_input_tokens") accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8f9dd9f6-f82b-42f2-b870-81b4ec039b6a
📒 Files selected for processing (1)
tests/test_generation.py
```python
def test_parse_completion_response_token_counts(usage_kwargs, expected_input):
    model = BaseModel.__new__(BaseModel)
    usage = SimpleNamespace(completion_tokens=10, **usage_kwargs)
    response = SimpleNamespace(
        usage=usage,
        choices=[SimpleNamespace(text="hi", finish_reason="stop", logprobs=None)],
    )
    result = model._parse_completion_response(response)
    assert result["num_generated_tokens"] == 10
    assert result.get("num_input_tokens") == expected_input
```
Make num_input_tokens assertion strict to avoid false positives.
At Line 314, .get() can hide a missing key in cases where presence should be enforced (expected_input is 5 or 7). Split the assertion so present cases use direct indexing and the None case explicitly checks absence.
💡 Suggested change
```diff
 def test_parse_completion_response_token_counts(usage_kwargs, expected_input):
     model = BaseModel.__new__(BaseModel)
     usage = SimpleNamespace(completion_tokens=10, **usage_kwargs)
     response = SimpleNamespace(
         usage=usage,
         choices=[SimpleNamespace(text="hi", finish_reason="stop", logprobs=None)],
     )
     result = model._parse_completion_response(response)
     assert result["num_generated_tokens"] == 10
-    assert result.get("num_input_tokens") == expected_input
+    if expected_input is None:
+        assert "num_input_tokens" not in result
+    else:
+        assert result["num_input_tokens"] == expected_input
```

As per coding guidelines, **/*.py: "Don't use .get() for accessing dictionary keys if the code expects them to be present; use direct access data[key_name] to fail with a clear error instead of silently corrupting data".
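The failure mode this comment describes can be demonstrated with a plain dict (a minimal illustration with hypothetical data, independent of the repository's code): `.get()` on a missing key returns `None` and lets an equality check against `None` pass silently, while direct indexing fails loudly.

```python
# A parsed result that never received num_input_tokens (hypothetical data).
result_missing = {"num_generated_tokens": 10}

# .get() masks the absence: this assertion passes even though the key was
# never set, so a test written this way cannot distinguish "present and
# None" from "missing entirely".
assert result_missing.get("num_input_tokens") is None

# Direct indexing surfaces the missing key immediately with a KeyError.
try:
    _ = result_missing["num_input_tokens"]
    raised = False
except KeyError:
    raised = True
assert raised
print("direct indexing raised KeyError:", raised)  # True
```

This is why the suggested change splits the assertion: presence is enforced with `result["num_input_tokens"]`, and absence is checked explicitly with `not in`.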
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def test_parse_completion_response_token_counts(usage_kwargs, expected_input):
    model = BaseModel.__new__(BaseModel)
    usage = SimpleNamespace(completion_tokens=10, **usage_kwargs)
    response = SimpleNamespace(
        usage=usage,
        choices=[SimpleNamespace(text="hi", finish_reason="stop", logprobs=None)],
    )
    result = model._parse_completion_response(response)
    assert result["num_generated_tokens"] == 10
    if expected_input is None:
        assert "num_input_tokens" not in result
    else:
        assert result["num_input_tokens"] == expected_input
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_generation.py` around lines 305 - 314, The test
test_parse_completion_response_token_counts should assert num_input_tokens
strictly instead of using .get(): call model._parse_completion_response (as
already done) and then use direct indexing result["num_input_tokens"] ==
expected_input when expected_input is a concrete number (5 or 7) so a missing
key fails loudly, and for the case where expected_input is None assert that
"num_input_tokens" not in result (or result.get("num_input_tokens") is None
explicitly) to cover the absence case; update the assertions around
result.get("num_input_tokens") accordingly.
Add num_input_tokens to text and chat completion responses
Summary by CodeRabbit

New Features
- Text and chat completion responses now report `num_input_tokens` alongside `num_generated_tokens`.

Tests
- Added coverage for token-count extraction from parsed completion responses.
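As a sketch of how a downstream consumer might use the new field: the helper below aggregates token counts across parsed results. Only the field names `num_input_tokens` and `num_generated_tokens` come from this PR; `summarize_usage` and its aggregation logic are illustrative assumptions, not part of the change.

```python
def summarize_usage(results: list[dict]) -> dict:
    """Aggregate token counts from parsed response dicts (hypothetical helper).

    num_generated_tokens is always present in a parsed result, so it is read
    with direct indexing; num_input_tokens may be absent when the backend
    reports no prompt/input token usage, so only entries carrying it are summed.
    """
    total_out = sum(r["num_generated_tokens"] for r in results)
    total_in = sum(r["num_input_tokens"] for r in results if "num_input_tokens" in r)
    return {"num_input_tokens": total_in, "num_generated_tokens": total_out}


print(summarize_usage([
    {"generation": "a", "num_generated_tokens": 10, "num_input_tokens": 5},
    {"generation": "b", "num_generated_tokens": 3},  # no usage field reported
]))  # {'num_input_tokens': 5, 'num_generated_tokens': 13}
```

Note the membership check before summing inputs mirrors the review's guidance: the optional field is handled explicitly rather than papered over with `.get()` defaults.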