Fix classification fallback bug and enhance all prompts (Issue #93) #191
LinKon12 wants to merge 7 commits into AOSSIE-Org:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
📝 Walkthrough
This PR tightens DevRel activation rules (prefer ignoring non-relevant messages), updates classification defaults to not trigger DevRel, expands multiple DevRel prompts into structured, rule-based templates with iteration and formatting controls, and pins discord-py to ==2.4.0.

Changes
Sequence Diagram(s)

sequenceDiagram
autonumber
participant User
participant Classifier
participant Agent
participant Tools
participant Responder
User->>Classifier: Send message (chat / mention / question)
Classifier->>Classifier: Apply STRICT ACTIVATION RULES
alt Should respond (mention or project-specific)
Classifier-->>Agent: needs_devrel:true, priority
Agent->>Agent: Run REACT supervisor loop (Think → Act → Observe, up to 5 iterations)
Agent->>Tools: Invoke selected tool (web_search / faq_handler / github_toolkit / onboarding)
Tools-->>Agent: Tool Results (TOOL RESULTS FROM PREVIOUS ACTIONS)
Agent->>Responder: Final task result / reasoning
Responder-->>User: Formatted response (per RESPONSE_PROMPT rules)
else Should ignore
Classifier-->>Responder: needs_devrel:false, priority: low
Responder-->>User: No action (ignored)
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 3
🧹 Nitpick comments (3)
pyproject.toml (1)

29-29: Exact version pinning may be overly restrictive. Pinning discord-py to ==2.4.0 prevents automatic pickup of patch releases (e.g., 2.4.1) that may contain security fixes. Consider using a compatible release constraint like ~=2.4.0 (equivalent to >=2.4.0,<2.5.0) unless there's a specific reason for exact pinning.

- "discord-py (==2.4.0)",
+ "discord-py (>=2.4.0,<2.5.0)",

backend/app/classification/classification_router.py (2)
51-53: Consider using logging.exception for better debugging. Per static analysis, logging.exception automatically includes the traceback, which aids debugging classification failures.

  except Exception as e:
-     logger.error(f"Triage error: {str(e)}")
+     logger.exception("Triage error: %s", e)
      return self._fallback_triage(message)
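As a standalone illustration of the difference (not project code; the logger name and error message here are made up):

```python
import io
import logging

# Capture log output in a string so the difference is visible.
stream = io.StringIO()
logger = logging.getLogger("triage_demo")
logger.addHandler(logging.StreamHandler(stream))

def classify(message: str) -> dict:
    raise ValueError("LLM returned non-JSON output")

try:
    classify("hello")
except Exception:
    # Unlike logger.error, logger.exception appends the full traceback,
    # so the failing frame inside classify() survives in the log.
    logger.exception("Triage error")

log_text = stream.getvalue()
```

With logger.error, only the formatted message would appear; logger.exception also records the traceback and the ValueError line.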
36-37: Move import json to the top of the file. The json module import inside the function adds overhead on each call. Standard practice is to import at module level. Add it at the top of the file with the other imports:

import json

Then remove line 36.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- backend/app/agents/devrel/prompts/react_prompt.py (2 hunks)
- backend/app/agents/devrel/prompts/response_prompt.py (2 hunks)
- backend/app/agents/devrel/prompts/search_prompt.py (1 hunk)
- backend/app/agents/devrel/prompts/summarization_prompt.py (2 hunks)
- backend/app/classification/classification_router.py (1 hunk)
- backend/app/classification/prompt.py (2 hunks)
- backend/app/database/falkor/code-graph-backend/api/index.py (0 hunks)
- backend/main.py (2 hunks)
- pyproject.toml (1 hunk)
💤 Files with no reviewable changes (1)
- backend/app/database/falkor/code-graph-backend/api/index.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-08T13:08:48.469Z
Learnt from: smokeyScraper
Repo: AOSSIE-Org/Devr.AI PR: 72
File: backend/app/agents/shared/classification_router.py:0-0
Timestamp: 2025-06-08T13:08:48.469Z
Learning: The user plans to migrate the JSON parsing in backend/app/agents/shared/classification_router.py from manual JSON extraction to using Pydantic parser for better validation and type safety.
Applied to files:
backend/app/classification/classification_router.py
🪛 Ruff (0.14.8)
backend/app/classification/classification_router.py
51-51: Do not catch blind exception: Exception
(BLE001)
52-52: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
52-52: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🔇 Additional comments (9)
backend/app/classification/classification_router.py (2)
39-45: Core bug fix looks good. Changing the default from True to False directly addresses the critical issue in #93. When JSON parsing succeeds but fields are missing, the bot will now safely ignore rather than spam responses.
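A minimal standalone sketch of that safe default (the method name and dictionary keys mirror the excerpts elsewhere in this review; the real method lives on ClassificationRouter and is reached only after an LLM call fails or returns incomplete fields):

```python
def _fallback_triage(message: str) -> dict:
    """Safe default when classification fails: ignore rather than respond."""
    return {
        "needs_devrel": False,  # was True before the fix, causing spam on API errors
        "priority": "low",
        "reasoning": "Fallback: classification unavailable, ignoring message",
    }

result = _fallback_triage("hello everyone!")
```

The key point is the first field: a rate-limited or erroring classifier now yields silence instead of a reply to every message.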
55-64: Fallback behavior fix is correct. The change from needs_devrel: True to needs_devrel: False in the fallback path is the right approach: better to miss a message than to spam users when classification fails.

backend/app/agents/devrel/prompts/search_prompt.py (2)
1-48: Well-structured prompt with comprehensive guidance. The expanded prompt provides clear extraction rules, diverse examples covering common scenarios (setup, errors, contribution, APIs), and explicit edge-case handling. The 2-6 word constraint with max 10 words is reasonable for search queries.
41-45: Consider the "general inquiry" fallback behavior. Returning "general inquiry" as a search query for greeting-only messages may not yield useful search results. Ensure downstream code handles this sentinel value appropriately, or consider returning an empty string to signal no actionable query.
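One hedged sketch of such downstream handling (the "general inquiry" literal comes from the prompt under review; run_search and its return convention are hypothetical, not the project's actual search code):

```python
def run_search(query: str):
    """Skip the web search step when there's nothing actionable to look up."""
    # Treat the prompt's fallback sentinel (or an empty string) as "no query".
    if not query or query.strip().lower() == "general inquiry":
        return None  # caller skips the search entirely
    return f"searching: {query}"

skipped = run_search("general inquiry")
hit = run_search("discord.py setup error")
```

Either convention works; what matters is that exactly one layer owns the decision to skip the search.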
7-39: Excellent stricter activation rules. The two-pronged activation criteria (explicit bot tag OR direct project question) with the "err on the side of NOT responding" decision logic directly addresses the false-positive reduction goal. The KEY example distinguishing general questions from project-specific ones (Lines 28-29) is particularly helpful for the LLM.
48-95: Comprehensive examples covering both positive and negative cases.The 18 examples (9 SHOULD RESPOND, 9 SHOULD IGNORE) provide clear guidance across various scenarios including greetings, statements vs questions, community-answered questions, and reactions. This should significantly improve classification accuracy.
backend/app/agents/devrel/prompts/summarization_prompt.py (1)
1-57: Well-structured summarization prompt with clear guidance. The prompt is well-organized with distinct sections, explicit merging instructions, and helpful good/bad examples. The 250-word limit and prioritization of recent context are sensible choices for maintaining useful conversation summaries.
backend/app/agents/devrel/prompts/react_prompt.py (1)
1-104: Comprehensive ReAct prompt with clear decision logic and guardrails. The prompt effectively implements the ReAct reasoning pattern with well-defined actions, priority-based decision logic, and iteration limits. The examples cover diverse scenarios and the "When NOT to use tools" section helps prevent common agent loops.
backend/app/agents/devrel/prompts/response_prompt.py (1)
24-144: Well-structured response prompt with comprehensive guidelines. The content guidelines, response type templates, and quality checklist provide excellent structure for generating consistent, user-friendly responses. The DevRel tone guidance and handling of incomplete information are particularly valuable additions.
Actionable comments posted: 2
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/main.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-08T13:31:11.572Z
Learnt from: smokeyScraper
Repo: AOSSIE-Org/Devr.AI PR: 72
File: backend/app/agents/devrel/nodes/handle_web_search_node.py:31-42
Timestamp: 2025-06-08T13:31:11.572Z
Learning: In backend/app/agents/devrel/tools/search_tool.py, the TavilySearchTool.search() method has partial error handling for missing API key, AttributeError, ConnectionError, and TimeoutError, but lacks a comprehensive Exception catch-all block, so calling functions may still need additional error handling for other potential exceptions.
Applied to files:
backend/main.py
🪛 Ruff (0.14.8)
backend/main.py
53-53: Redundant exception object included in logging.exception call
(TRY401)
Actionable comments posted: 1
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/agents/devrel/prompts/response_prompt.py (2 hunks)
🔇 Additional comments (1)
backend/app/agents/devrel/prompts/response_prompt.py (1)
1-145: LGTM! Comprehensive prompt expansion with clear guidelines. The expanded prompt template significantly improves response quality guidance with:
- Structured content guidelines covering synthesis, actionability, tone, and length
- Clear Discord-specific formatting rules
- Six specialized response type templates
- Practical quality checklist
The multi-section approach provides the AI model with explicit instructions for generating consistent, high-quality responses while maintaining appropriate tone and formatting for the Discord platform.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/main.py (1)
52-53: Good fix addressing past feedback; consider removing the unused exception variable. The switch to logger.exception() correctly addresses previous review comments and will now log the full stack trace for debugging. The exception handling also properly catches all extension-related failures via commands.ExtensionError. However, the exception variable e on line 52 is no longer used since logger.exception() automatically includes exception details. Apply this diff to remove the unused variable:

- except (ImportError, commands.ExtensionError) as e:
+ except (ImportError, commands.ExtensionError):
      logger.exception("Failed to load Discord cog extension")
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/main.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-08T13:31:11.572Z
Learnt from: smokeyScraper
Repo: AOSSIE-Org/Devr.AI PR: 72
File: backend/app/agents/devrel/nodes/handle_web_search_node.py:31-42
Timestamp: 2025-06-08T13:31:11.572Z
Learning: In backend/app/agents/devrel/tools/search_tool.py, the TavilySearchTool.search() method has partial error handling for missing API key, AttributeError, ConnectionError, and TimeoutError, but lacks a comprehensive Exception catch-all block, so calling functions may still need additional error handling for other potential exceptions.
Applied to files:
backend/main.py
🪛 Ruff (0.14.8)
backend/main.py
52-52: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
Before applying any fix, first verify the finding against the current code and
decide whether a code change is actually needed. If the finding is not valid or
no change is required, do not modify code for that item and briefly explain why
it was skipped.
In `@backend/app/agents/devrel/prompts/response_prompt.py`:
- Around line 24-25: The prompt contains long heavy separators
('═══════════════════════════════════════════════════════════════════') repeated
in the response prompt definition; replace those with a shorter delimiter like
'---' (or a minimal consistent token) wherever they appear in the response
prompt string(s) in backend/app/agents/devrel/prompts/response_prompt.py (e.g.,
the occurrences noted around the prompt construction), ensuring you update all
four occurrences so the visual structure remains but token usage is reduced.
- Around line 60-65: The "7. **Length & Structure**" rule in response_prompt.py
contains conflicting constraints — the "2-4 paragraphs" guideline can exceed the
"Maximum 15 lines" cap — so update that section to resolve the conflict by
either increasing the "Maximum 15 lines" limit (e.g., to a higher sensible cap)
or by adding an explicit exception that allows complex multi-paragraph answers
(the "2-4 paragraphs" case) to exceed the line cap; ensure you modify the text
where "2-4 paragraphs" and "Maximum 15 lines" appear so the policy is
unambiguous.
In `@tests/test_classification_router.py`:
- Line 15: The test instantiates ClassificationRouter() with no args which
triggers creation of a real ChatGoogleGenerativeAI client that depends on
settings.gemini_api_key; update the test to inject a mock LLM client instead of
using the no-arg constructor: create a lightweight mock (e.g., MagicMock or a
small stub implementing the same methods used by ClassificationRouter) and pass
it into ClassificationRouter(mock_llm) when constructing the router (also
replace the second instantiation at the other failing spot), ensuring the mock
returns expected values for calls used by the router so the test does not rely
on settings.gemini_api_key or external APIs.
- Around line 26-37: The test currently calls
ClassificationRouter._fallback_triage directly instead of exercising the LLM
path; update the test to mock the LLM call used by ClassificationRouter (the
router's LLM client or the method that invokes the model) to return a non-JSON
string, then call ClassificationRouter.should_process_message("test message")
and assert the result matches the fallback behavior (result["needs_devrel"] is
False, result["priority"] == "low", and "Fallback" in result["reasoning"]) so
the invalid-JSON path is actually exercised; refer to ClassificationRouter,
should_process_message, and _fallback_triage to locate the relevant code to mock
and assert.
- Around line 8-24: The test currently patches
ClassificationRouter.should_process_message but then calls _fallback_triage
directly, so the mock is unused; update the test to simulate an LLM failure and
exercise the error-handling path by patching the method that invokes the LLM
inside ClassificationRouter (e.g., the internal LLM call method such as whatever
implements LLM calls in ClassificationRouter — locate the method like _call_llm
/ _invoke_llm / llm_client.generate in the router) to raise an exception, then
call ClassificationRouter.should_process_message(...) (not _fallback_triage) and
assert the returned dict matches the fallback (needs_devrel False, priority
"low", "Fallback" in reasoning); remove the current unused patch on
should_process_message.
- Around line 2-3: Remove the fragile hardcoded import hack using
sys.path.insert(0, 'backend') and instead configure imports via test
configuration: delete the sys.path.insert call in the test module (the
occurrence of sys.path.insert(0, 'backend')), add the package root to test
discovery using pyproject.toml (tool.poetry/packages or [tool.setuptools] as
appropriate) or implement a conftest.py that adjusts sys.path relative to the
repository root, or set PYTHONPATH in CI; ensure tests import the package
normally without modifying sys.path at runtime.
🧹 Nitpick comments (3)
🤖 Fix all nitpicks with AI agents
Before applying any fix, first verify the finding against the current code and decide whether a code change is actually needed. If the finding is not valid or no change is required, do not modify code for that item and briefly explain why it was skipped.

backend/app/agents/devrel/prompts/response_prompt.py (2)
24-25: Consider lighter section separators to save prompt tokens. The ═══...═══ lines are 65+ characters each and appear four times. Since this prompt is sent with every response generation call, replacing them with shorter delimiters (e.g., ---) would trim token usage without affecting readability for the model. Also applies to: 67-68, 91-92, 133-134.
60-65: Length constraints may conflict. "2-4 paragraphs" (guideline for complex topics) can easily exceed "Maximum 15 lines" (line 65), especially when combined with numbered lists and line breaks between sections. Consider either raising the max-lines cap or noting that complex multi-paragraph answers are an exception.
tests/test_classification_router.py (1)
2-3: Fragile sys.path manipulation for imports. Hardcoding sys.path.insert(0, 'backend') is brittle and depends on the working directory at test execution time. Prefer configuring the Python path via pyproject.toml, a conftest.py, or a PYTHONPATH env variable in CI so tests don't need this hack.
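One common way to drop the hack is a root-level conftest.py (a sketch; it assumes the repository root contains the 'backend' directory named in the flagged import line):

```python
# conftest.py (repository root): pytest imports this module before collecting
# tests, so the backend package tree becomes importable without per-test
# sys.path hacks.
import sys
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parent
BACKEND_DIR = REPO_ROOT / "backend"

if str(BACKEND_DIR) not in sys.path:
    sys.path.insert(0, str(BACKEND_DIR))
```

Because pytest always loads the rootdir conftest.py first, every test file can then write `from app.classification... import ...` directly.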
tests/test_classification_router.py (outdated)

@pytest.mark.anyio
async def test_classification_fallback_on_error():
    """Test that classification fallback returns needs_devrel: False on error"""

    # Mock ClassificationRouter's should_process_message to force exception path
    with patch('app.classification.classification_router.ClassificationRouter.should_process_message') as mock_method:
        # Create a real router instance
        router = ClassificationRouter()

        # Manually call the internal logic that would fail
        # Simulate what happens when LLM fails
        result = router._fallback_triage("test message")

        # Should fallback to False (not spam on errors)
        assert result["needs_devrel"] is False
        assert result["priority"] == "low"
        assert "Fallback" in result["reasoning"]
Test doesn't exercise the error path it claims to test; mock is unused.
The patch on should_process_message (line 13) is never configured or invoked — _fallback_triage is called directly, so the mock serves no purpose (also flagged by Ruff F841). This test is functionally identical to test_classification_fallback_on_invalid_json below.
To genuinely verify the error-path integration, mock the LLM to raise and call should_process_message:
Proposed fix: actually test the error path through should_process_message
`@pytest.mark.anyio`
async def test_classification_fallback_on_error():
"""Test that classification fallback returns needs_devrel: False on error"""
-
- # Mock ClassificationRouter's should_process_message to force exception path
- with patch('app.classification.classification_router.ClassificationRouter.should_process_message') as mock_method:
- # Create a real router instance
- router = ClassificationRouter()
-
- # Manually call the internal logic that would fail
- # Simulate what happens when LLM fails
- result = router._fallback_triage("test message")
-
- # Should fallback to False (not spam on errors)
- assert result["needs_devrel"] is False
- assert result["priority"] == "low"
- assert "Fallback" in result["reasoning"]
+ router = ClassificationRouter(llm_client=MagicMock())
+ router.llm.ainvoke = AsyncMock(side_effect=Exception("API rate limit"))
+
+ result = await router.should_process_message("test message")
+
+ assert result["needs_devrel"] is False
+ assert result["priority"] == "low"
+ assert "Fallback" in result["reasoning"]🧰 Tools
🪛 Ruff (0.15.0)
[error] 13-13: Local variable mock_method is assigned to but never used
Remove assignment to unused variable mock_method
(F841)
🤖 Prompt for AI Agents
Before applying any fix, first verify the finding against the current code and
decide whether a code change is actually needed. If the finding is not valid or
no change is required, do not modify code for that item and briefly explain why
it was skipped.
In `@tests/test_classification_router.py` around lines 8 - 24, The test currently
patches ClassificationRouter.should_process_message but then calls
_fallback_triage directly, so the mock is unused; update the test to simulate an
LLM failure and exercise the error-handling path by patching the method that
invokes the LLM inside ClassificationRouter (e.g., the internal LLM call method
such as whatever implements LLM calls in ClassificationRouter — locate the
method like _call_llm / _invoke_llm / llm_client.generate in the router) to
raise an exception, then call ClassificationRouter.should_process_message(...)
(not _fallback_triage) and assert the returned dict matches the fallback
(needs_devrel False, priority "low", "Fallback" in reasoning); remove the
current unused patch on should_process_message.
tests/test_classification_router.py (outdated)

# Mock ClassificationRouter's should_process_message to force exception path
with patch('app.classification.classification_router.ClassificationRouter.should_process_message') as mock_method:
    # Create a real router instance
    router = ClassificationRouter()
ClassificationRouter() instantiation without mocking may fail in CI.
The no-arg constructor creates a real ChatGoogleGenerativeAI client, which requires a valid API key from settings.gemini_api_key. If tests run in an environment without that key configured, instantiation could fail. Pass a mock LLM client instead (as shown in the fixes above).
Also applies to: 29-29
🤖 Prompt for AI Agents
Before applying any fix, first verify the finding against the current code and
decide whether a code change is actually needed. If the finding is not valid or
no change is required, do not modify code for that item and briefly explain why
it was skipped.
In `@tests/test_classification_router.py` at line 15, The test instantiates
ClassificationRouter() with no args which triggers creation of a real
ChatGoogleGenerativeAI client that depends on settings.gemini_api_key; update
the test to inject a mock LLM client instead of using the no-arg constructor:
create a lightweight mock (e.g., MagicMock or a small stub implementing the same
methods used by ClassificationRouter) and pass it into
ClassificationRouter(mock_llm) when constructing the router (also replace the
second instantiation at the other failing spot), ensuring the mock returns
expected values for calls used by the router so the test does not rely on
settings.gemini_api_key or external APIs.
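The injection pattern can be exercised end to end with a stand-in router (a sketch: the real ClassificationRouter's constructor and LLM attribute are not shown in this review, so `llm_client`, `self.llm`, and `ainvoke` are assumptions modeled on the proposed fixes):

```python
import asyncio
import json
from unittest.mock import AsyncMock, MagicMock

class Router:
    """Stand-in for ClassificationRouter, illustrating constructor injection."""

    def __init__(self, llm_client):
        self.llm = llm_client  # a mock here means no real API key is needed

    async def should_process_message(self, message: str) -> dict:
        try:
            reply = await self.llm.ainvoke(message)
            return json.loads(reply.content)
        except Exception:
            # Mirrors the fixed fallback: ignore rather than spam on failure.
            return {
                "needs_devrel": False,
                "priority": "low",
                "reasoning": "Fallback: classification failed",
            }

mock_llm = MagicMock()
mock_llm.ainvoke = AsyncMock(return_value=MagicMock(content="not valid JSON"))
result = asyncio.run(Router(mock_llm).should_process_message("test message"))
```

The non-JSON reply forces json.loads to raise, so the call routes through the fallback without ever touching a real API.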
tests/test_classification_router.py (outdated)

@pytest.mark.anyio
async def test_classification_fallback_on_invalid_json():
    """Test that classification fallback handles invalid JSON from LLM"""
    router = ClassificationRouter()

    # Test the fallback method directly
    result = router._fallback_triage("test message")

    # Should return needs_devrel: False
    assert result["needs_devrel"] is False
    assert result["priority"] == "low"
    assert "Fallback" in result["reasoning"]
Test claims to cover "invalid JSON" but doesn't send anything through the LLM path.
Same issue as above — this directly calls _fallback_triage, so it's a duplicate of the first test rather than verifying that should_process_message routes to the fallback when the LLM returns non-JSON. Consider mocking the LLM to return a non-JSON string:
Proposed fix: test the invalid-JSON path through should_process_message
`@pytest.mark.anyio`
async def test_classification_fallback_on_invalid_json():
"""Test that classification fallback handles invalid JSON from LLM"""
- router = ClassificationRouter()
-
- # Test the fallback method directly
- result = router._fallback_triage("test message")
-
- # Should return needs_devrel: False
- assert result["needs_devrel"] is False
- assert result["priority"] == "low"
- assert "Fallback" in result["reasoning"]
+ mock_llm = MagicMock()
+ mock_llm.ainvoke = AsyncMock(
+ return_value=MagicMock(content="This is not valid JSON at all")
+ )
+ router = ClassificationRouter(llm_client=mock_llm)
+
+ result = await router.should_process_message("test message")
+
+ assert result["needs_devrel"] is False
+ assert result["priority"] == "low"
+ assert "Fallback" in result["reasoning"]

🤖 Prompt for AI Agents
Before applying any fix, first verify the finding against the current code and
decide whether a code change is actually needed. If the finding is not valid or
no change is required, do not modify code for that item and briefly explain why
it was skipped.
In `@tests/test_classification_router.py` around lines 26 - 37, The test currently
calls ClassificationRouter._fallback_triage directly instead of exercising the
LLM path; update the test to mock the LLM call used by ClassificationRouter (the
router's LLM client or the method that invokes the model) to return a non-JSON
string, then call ClassificationRouter.should_process_message("test message")
and assert the result matches the fallback behavior (result["needs_devrel"] is
False, result["priority"] == "low", and "Fallback" in result["reasoning"]) so
the invalid-JSON path is actually exercised; refer to ClassificationRouter,
should_process_message, and _fallback_triage to locate the relevant code to mock
and assert.
Closes #93
📝 Description
This PR fixes a critical bug where the bot responded to every message when classification API calls failed (rate limits, errors). The fallback behavior returned needs_devrel: True, causing spam. Now it safely ignores messages when classification fails. Additionally, all prompts were enhanced per issue #93's secondary objective: improved clarity, added examples, and stricter activation rules for the classification prompt.
🔧 Changes Made
- classification_router.py: Changed fallback from needs_devrel: True to False - bot now ignores messages safely when classification fails
- DEVREL_TRIAGE_PROMPT: Added strict activation rules (bot tag OR direct questions), 18 examples (vs 3), explicit ignore cases
- EXTRACT_SEARCH_QUERY_PROMPT: Added 8 examples and comprehensive extraction guidelines (was only 3 lines)
- REACT_SUPERVISOR_PROMPT: Added iteration limits (max 5), decision logic, and 5 reasoning examples
- RESPONSE_PROMPT: Added 6 response type templates and quality checklist
- CONVERSATION_SUMMARY_PROMPT: Added structured format with 4 sections and merge strategy

📷 Screenshots or Visual Changes (if applicable)
N/A - Backend logic changes only
🤝 Collaboration
Solo work
✅ Checklist
Summary by CodeRabbit
Improvements
Bug Fixes
Tests
Chores