
[API] feat: add reasoning recovery mechanism#1541

Open
AlpinDale wants to merge 1 commit into main from reasoning_recovery

Conversation

@AlpinDale
Collaborator

This is an implementation attempt at recovering from model uncertainty in the reasoning trace by interrupting the model, inserting a recovery phrase (e.g. "–wait, I need to reconsider this...") and continuing generation. It will attempt this up to max_recovery_attempts times within the reasoning trace. If confidence rises above the threshold (as dictated by DeepConf), generation continues naturally into the final response. If confidence does not recover, a final admission phrase is inserted, and the final output will be the model telling the user it doesn't know the answer.
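The control flow described above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `run_with_recovery`, `RecoveryResult`, and the `generate_segment` callback (which returns a chunk of text plus a confidence score) are hypothetical names, and the real implementation operates on tokens inside the engine's output processing rather than on strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RecoveryResult:
    text: str        # prompt plus everything generated or inserted
    attempts: int    # number of recovery phrases inserted
    admitted: bool   # True if the final admission phrase was used


def run_with_recovery(
    generate_segment: Callable[[str], tuple[str, float]],
    prompt: str,
    recovery_phrases: list[str],
    final_admission: str,
    threshold: float,
    max_recovery_attempts: int = 3,
) -> RecoveryResult:
    """Hypothetical driver: generate, check confidence, retry or admit."""
    if not recovery_phrases:
        raise ValueError("recovery_phrases must not be empty")

    context = prompt
    attempts = 0
    while True:
        segment, confidence = generate_segment(context)
        context += segment
        if confidence >= threshold:
            # Confident enough: continue naturally into the final response.
            return RecoveryResult(context, attempts, admitted=False)
        if attempts >= max_recovery_attempts:
            # Out of attempts: insert the admission and let the model finish.
            context += final_admission
            segment, _ = generate_segment(context)
            return RecoveryResult(context + segment, attempts, admitted=True)
        # Interrupt the trace with the next recovery phrase and try again.
        context += recovery_phrases[attempts % len(recovery_phrases)]
        attempts += 1
```

Cycling through the phrases by attempt index is one possible selection policy; the PR may choose phrases differently.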

For now, testing is difficult because I can't think of a good question that would make the model really uncertain.

To test:

curl -s -X POST http://localhost:2242/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-4B-Thinking-2507",
    "messages": [
      {
        "role": "user",
        "content": "Are you in a simulation? Try to prove it."
      }
    ],
    "temperature": 0.7,
    "enable_deepconf": true,
    "enable_reasoning_recovery": true,
    "max_recovery_attempts": 3,
    "recovery_phrases": [
      "–wait, let me reconsider this...",
      "–actually, let me approach this differently...",
      "–I need to think about this more carefully...",
      "–let me step back and re-examine this..."
    ],
    "final_admission": "–I'\''m not confident in my reasoning here, so let me provide what I can:",
    "stream": false
  }' | jq .
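
For anyone who prefers to test from Python, the same request body could be built with the standard library as sketched below. The helper name `build_recovery_request` is hypothetical; the endpoint, model name, and parameters are copied from the curl command above.

```python
import json
import urllib.request


def build_recovery_request(
    base_url: str = "http://localhost:2242",
) -> urllib.request.Request:
    """Build (but do not send) the chat completion request from the PR."""
    payload = {
        "model": "Qwen/Qwen3-4B-Thinking-2507",
        "messages": [
            {
                "role": "user",
                "content": "Are you in a simulation? Try to prove it.",
            }
        ],
        "temperature": 0.7,
        "enable_deepconf": True,
        "enable_reasoning_recovery": True,
        "max_recovery_attempts": 3,
        "recovery_phrases": [
            "–wait, let me reconsider this...",
            "–actually, let me approach this differently...",
            "–I need to think about this more carefully...",
            "–let me step back and re-examine this...",
        ],
        "final_admission": (
            "–I'm not confident in my reasoning here, "
            "so let me provide what I can:"
        ),
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it requires a running server, for example `urllib.request.urlopen(build_recovery_request())`, after which the response body can be parsed with `json.load`.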

@gemini-code-assist
Contributor

Summary of Changes

Hello @AlpinDale, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a novel reasoning recovery mechanism designed to improve the robustness and user experience of models when they encounter uncertainty during their reasoning process. Instead of abruptly stopping, the system will now attempt to guide the model through self-correction by inserting specific phrases and allowing it to reconsider its output. This feature is highly configurable, enabling fine-tuning of recovery attempts and messaging, ultimately leading to more graceful handling of uncertain scenarios and potentially more helpful responses.

Highlights

  • Reasoning Recovery Mechanism: Introduced a new mechanism that allows the model to attempt to recover from low confidence during reasoning by inserting recovery phrases and continuing generation, rather than stopping immediately.
  • Configurable Sampling Parameters: Added new SamplingParams fields: enable_reasoning_recovery, max_recovery_attempts, recovery_phrases, and final_admission, allowing users to control the behavior of the recovery process.
  • DeepConf Integration: The reasoning recovery mechanism is triggered by DeepConf's confidence detection, ensuring that recovery attempts are made when the model's certainty drops below a specified threshold.
  • Dynamic Phrase Insertion: When a recovery attempt is initiated, a chosen recovery phrase is dynamically inserted into the model's output, and if maximum attempts are reached without confidence improvement, a final admission phrase is used.
  • API Exposure and Validation: The new reasoning recovery parameters are exposed through the OpenAI API endpoints (ChatCompletionRequest and CompletionRequest) and include robust validation checks for their values.
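
The summary does not show how the confidence threshold check is computed. Purely as an illustration, and assuming a DeepConf-style check that averages per-token confidence scores over a sliding window and flags the request when the window mean drops below a threshold, the trigger could look like the sketch below. The class name `SlidingConfidence` and its parameters are hypothetical and not taken from the PR.

```python
from collections import deque


class SlidingConfidence:
    """Hypothetical low-confidence detector over a fixed-size token window."""

    def __init__(self, window_size: int, threshold: float) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # deque(maxlen=...) silently drops the oldest score when full.
        self.window: deque = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, token_confidence: float) -> bool:
        """Record one token's score; return True if recovery should trigger."""
        self.window.append(token_confidence)
        if len(self.window) < self.window.maxlen:
            # Not enough tokens yet to judge the window.
            return False
        mean = sum(self.window) / len(self.window)
        return mean < self.threshold
```

A request-level handler would call `update` once per generated token and start a recovery attempt the first time it returns `True`.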

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a reasoning recovery mechanism to handle model uncertainty. The implementation adds new sampling parameters, a state management class for recovery, and integrates the logic into the output processing pipeline. The changes are well-structured, but there are a few key issues. A critical issue was found where the recovery phrases are not correctly fed back into the model for generation, defeating the purpose of the feature. Additionally, a missing validation check could lead to silent failures, and there are opportunities to improve code clarity and style in the new parameter validation and state management class.

Comment on lines +454 to +458
recovery_tokens = tokenizer.encode(
    recovery_phrase,
    add_special_tokens=False
)
req_state.detokenizer.token_ids.extend(recovery_tokens)

critical

The recovery_tokens are appended to req_state.detokenizer.token_ids, which seems to only affect the final output text. For the reasoning recovery to work, these tokens must be appended to the model's input sequence for the next generation step. As it is, the model will continue generating from its state before the recovery phrase, which defeats the purpose of the recovery mechanism. You'll likely need a mechanism to communicate these tokens back to the scheduler to append them to the Sequence object for this request.
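
The distinction raised here can be shown with a toy model. Everything in this sketch is hypothetical (`ToyRequestState`, `toy_next_token`, and both inject helpers); it only illustrates why extending the display buffer alone does not change what the model conditions on, and says nothing about how the engine's scheduler or Sequence object actually works.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRequestState:
    # Tokens the model conditions on for the next generation step.
    model_input_ids: list[int] = field(default_factory=list)
    # Tokens that get detokenized into the text shown to the user.
    display_ids: list[int] = field(default_factory=list)


def toy_next_token(input_ids: list[int]) -> int:
    """Stand-in for a model: the next token depends only on its input."""
    return (sum(input_ids) + 1) % 100


def inject_display_only(state: ToyRequestState, recovery_ids: list[int]) -> None:
    """Mirrors the reviewed code path: only the output text changes."""
    state.display_ids.extend(recovery_ids)


def inject_into_context(state: ToyRequestState, recovery_ids: list[int]) -> None:
    """What the review asks for: the model also sees the recovery tokens."""
    state.model_input_ids.extend(recovery_ids)
    state.display_ids.extend(recovery_ids)
```

With `inject_display_only`, `toy_next_token(state.model_input_ids)` returns the same value before and after injection, so the continuation is unaffected by the recovery phrase. With `inject_into_context`, the next token changes because the input changed.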

Comment on lines +1072 to +1075
if not self.enable_deepconf:
    raise ValueError(
        "enable_reasoning_recovery requires enable_deepconf to be "
        "True")

high

enable_reasoning_recovery depends on the ability to tokenize the recovery phrases and inject them. This requires detokenize=True. You should add a validation check to ensure this to prevent silent failures.

            if not self.enable_deepconf:
                raise ValueError(
                    "enable_reasoning_recovery requires enable_deepconf to be "
                    "True")
            if not self.detokenize:
                raise ValueError(
                    "enable_reasoning_recovery requires detokenize to be True")

Comment on lines +1080 to +1092
if (
    self.recovery_phrases is not None and not
    isinstance(self.recovery_phrases, list)
):
    raise ValueError(
        "recovery_phrases must be a list of strings, got "
        f"{type(self.recovery_phrases)}.")
if (
    self.recovery_phrases is not None and not all(
        isinstance(p, str) for p in self.recovery_phrases)
):
    raise ValueError(
        "recovery_phrases must contain only strings.")

medium

The validation for recovery_phrases can be made more concise and readable by nesting the checks inside a single if self.recovery_phrases is not None: block. This avoids repeating the is not None check and improves maintainability.

            if self.recovery_phrases is not None:
                if not isinstance(self.recovery_phrases, list):
                    raise ValueError(
                        "recovery_phrases must be a list of strings, got "
                        f"{type(self.recovery_phrases)}.")
                if not all(isinstance(p, str) for p in self.recovery_phrases):
                    raise ValueError(
                        "recovery_phrases must contain only strings.")
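
To see the two validation suggestions working together, here is a self-contained sketch. `ToySamplingParams` is a hypothetical stand-in for the real SamplingParams class, and placing all checks under a single `if self.enable_reasoning_recovery:` guard is an assumption based on the indentation of the snippets above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySamplingParams:
    enable_deepconf: bool = False
    enable_reasoning_recovery: bool = False
    detokenize: bool = True
    recovery_phrases: Optional[list] = None

    def __post_init__(self) -> None:
        if not self.enable_reasoning_recovery:
            return
        # Dependency checks: recovery needs DeepConf and detokenization.
        if not self.enable_deepconf:
            raise ValueError(
                "enable_reasoning_recovery requires enable_deepconf to be "
                "True")
        if not self.detokenize:
            raise ValueError(
                "enable_reasoning_recovery requires detokenize to be True")
        # Type checks, nested under one None test as suggested.
        if self.recovery_phrases is not None:
            if not isinstance(self.recovery_phrases, list):
                raise ValueError(
                    "recovery_phrases must be a list of strings, got "
                    f"{type(self.recovery_phrases)}.")
            if not all(isinstance(p, str) for p in self.recovery_phrases):
                raise ValueError(
                    "recovery_phrases must contain only strings.")
```

For example, `ToySamplingParams(enable_reasoning_recovery=True)` raises `ValueError` because `enable_deepconf` defaults to `False`, while passing a bare string for `recovery_phrases` is rejected by the list check rather than being silently iterated character by character.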

Comment on lines +42 to +53
def __post_init__(self):
    """Initialize default values after dataclass creation."""
    if self.recovery_phrases is None:
        self.recovery_phrases = DEFAULT_RECOVERY_PHRASES.copy()
    if not self.final_admission:
        self.final_admission = DEFAULT_FINAL_ADMISSION
    if self.original_prompt_tokens is None:
        self.original_prompt_tokens = []
    if self.original_output_tokens is None:
        self.original_output_tokens = []
    if self.recovery_point_tokens is None:
        self.recovery_point_tokens = []

medium

You can make this __post_init__ more concise and idiomatic by using dataclasses.field(default_factory=...) for mutable default values like lists. This avoids the is None checks in __post_init__.

For example, you can change the field definitions like this:

from dataclasses import field
...
recovery_phrases: list[str] = field(default_factory=lambda: DEFAULT_RECOVERY_PHRASES.copy())
original_prompt_tokens: list[int] = field(default_factory=list)
original_output_tokens: list[int] = field(default_factory=list)
recovery_point_tokens: list[int] = field(default_factory=list)

Then, __post_init__ can be simplified as shown in the suggestion.

Suggested change
-    def __post_init__(self):
-        """Initialize default values after dataclass creation."""
-        if self.recovery_phrases is None:
-            self.recovery_phrases = DEFAULT_RECOVERY_PHRASES.copy()
-        if not self.final_admission:
-            self.final_admission = DEFAULT_FINAL_ADMISSION
-        if self.original_prompt_tokens is None:
-            self.original_prompt_tokens = []
-        if self.original_output_tokens is None:
-            self.original_output_tokens = []
-        if self.recovery_point_tokens is None:
-            self.recovery_point_tokens = []
+    def __post_init__(self):
+        """Initialize default values after dataclass creation."""
+        if not self.final_admission:
+            self.final_admission = DEFAULT_FINAL_ADMISSION
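
A runnable version of this suggestion might look like the sketch below. `ToyRecoveryState` is a hypothetical stand-in for the PR's state class, and the two default constants hold placeholder values because the PR's actual defaults are not shown in this conversation.

```python
from dataclasses import dataclass, field

# Placeholder values; the real defaults live in the PR and are not shown here.
DEFAULT_RECOVERY_PHRASES = ["-wait, let me reconsider this..."]
DEFAULT_FINAL_ADMISSION = "-I'm not confident in my reasoning here."


@dataclass
class ToyRecoveryState:
    # Each instance gets its own copy, so mutation never leaks across requests.
    recovery_phrases: list = field(
        default_factory=lambda: DEFAULT_RECOVERY_PHRASES.copy())
    final_admission: str = ""
    original_prompt_tokens: list = field(default_factory=list)
    original_output_tokens: list = field(default_factory=list)
    recovery_point_tokens: list = field(default_factory=list)

    def __post_init__(self) -> None:
        """Fill in the one default that default_factory does not cover."""
        if not self.final_admission:
            self.final_admission = DEFAULT_FINAL_ADMISSION
```

One behavioral difference is worth checking before adopting this: with `default_factory`, a caller who explicitly passes `recovery_phrases=None` keeps `None`, whereas the original `__post_init__` replaced it with the defaults. If callers forward an optional API field directly, the `is None` fallback may still be needed for that field.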
