feat: Proactive contributor assistance with lightweight pattern detection #283
mahek2016 wants to merge 3 commits into AOSSIE-Org:main
Conversation
📝 Walkthrough
Replaces LLM-driven triage with lightweight rule-based and service-backed flows across the devrel stack: adds private pattern/fallback triage helpers, swaps Discord queue/agent execution for direct GitHubToolkit calls, introduces IssueSuggestionService and new GitHub API endpoints/routes, and updates the main app and requirements accordingly.
Sequence Diagram(s)
sequenceDiagram
participant User as User (Discord)
participant Bot as Discord Bot
participant Toolkit as GitHubToolkit
participant Service as IssueSuggestionService
participant GitHub as GitHub API
User->>Bot: message (question / request)
Bot->>Toolkit: classify_intent / execute(message)
Toolkit->>Toolkit: rule-based intent classification
alt find_good_first_issues
Toolkit->>Service: fetch_global_beginner_issues(query)
Service->>GitHub: GET /search/issues (Bearer token)
GitHub-->>Service: search results (JSON)
Service-->>Toolkit: list of simplified issues
Toolkit-->>Bot: structured result with issues
else other intents
Toolkit->>Toolkit: dispatch to appropriate handler (web_search, contributor_recommendation, etc.)
Toolkit-->>Bot: structured result or placeholder
end
Bot->>User: send formatted response (issues or handler output)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (5)
backend/app/classification/classification_router.py (3)
104-104: Move `import json` to the top of the file. Importing inside a function body on every LLM-response parse is a minor code smell. Standard library imports belong at the module level.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/classification/classification_router.py` at line 104, Move the "import json" statement out of the function body and place it with the other module-level imports at the top of classification_router.py; remove the inline "import json" (currently inside the LLM-response parsing code) so the standard library is imported once at module import time rather than on every call.
28-69: Substring matching is prone to false positives.
`any(k in msg for k in ...)` performs naive substring containment. Examples:
- `"beginner"` matches `"I'm not a beginner"` or `"the beginner issue is too hard"`
- `"thanks"` matches `"no thanks"`, `"thanksgiving"`
- `"new here"` matches `"nothing new here to report"`
Meanwhile, greetings use exact match (`msg in greetings`), so `"hi!"` or `"hello there"` won't trigger: inconsistent sensitivity. Consider adding word-boundary checks (e.g., regex `\b` anchors) or at minimum tightening the keyword lists to reduce false triggers. This is acceptable for an MVP but worth flagging.
💡 Example: use regex word boundaries for more robust matching
+import re
+
 def _simple_pattern_match(self, message: str):
     msg = message.lower().strip()
-    greetings = ["hi", "hello", "hey"]
-    thanks = ["thanks", "thank you"]
-    onboarding_keywords = ["new here", "how to start", "beginner", "first time"]
-    issue_keywords = ["good first issue", "beginner issue", "start contributing"]
+    greeting_pattern = re.compile(r"^(hi|hello|hey)[\s!?.]*$")
+    thanks_pattern = re.compile(r"\b(thanks|thank you)\b")
+    onboarding_pattern = re.compile(r"\b(new here|how to start|beginner|first time)\b")
+    issue_pattern = re.compile(r"\b(good first issue|beginner issue|start contributing)\b")

-    if msg in greetings:
+    if greeting_pattern.match(msg):
         ...
-    if any(k in msg for k in onboarding_keywords):
+    if onboarding_pattern.search(msg):
         ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/classification/classification_router.py` around lines 28 - 69, Current substring checks on msg (using any(k in msg for k in ...)) cause false positives and greetings use an inconsistent exact-match approach; replace these with word-boundary or tokenized matching. Update checks for greetings, onboarding_keywords, issue_keywords, and thanks to use regex searches with \b anchors (e.g., compile patterns for greetings like r'^(hi|hello|hey)\b' and for lists use r'\b(keyword)\b' or join list items into a single alternation), or alternatively tokenize msg into words and check set intersections against the keyword sets; ensure msg is normalized (lower/stripped) before matching and precompile patterns for performance.
76-76: Use explicit `Optional` for the `context` parameter.
Per PEP 484, `Dict[str, Any] = None` should be `Optional[Dict[str, Any]] = None` (or `Dict[str, Any] | None = None`). Ruff RUF013 flags this correctly.
Proposed fix
-from typing import Dict, Any
+from typing import Dict, Any, Optional
 ...
 async def should_process_message(
     self,
     message: str,
-    context: Dict[str, Any] = None
+    context: Optional[Dict[str, Any]] = None
 ) -> Dict[str, Any]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/classification/classification_router.py` at line 76, The parameter annotation "context: Dict[str, Any] = None" is not explicit about optionality; update it to "context: Optional[Dict[str, Any]] = None" (or "context: Dict[str, Any] | None = None" if using 3.10+), and add the corresponding import for Optional from typing (or ensure the union syntax is supported by the project's Python version); apply this change where the "context" parameter is declared so static checkers (Ruff RUF013) no longer flag it.
backend/integrations/discord/bot.py (2)
80-112: Extract proactive response templates into constants or a config map.
Four inline multiline strings couple presentation with control flow. As this feature grows (more proactive types, i18n, A/B testing), maintaining these inline will be painful. Consider a dictionary mapping `proactive_type → template string` at the module or class level, reducing the handler to a simple lookup + send.
💡 Sketch
PROACTIVE_RESPONSES = {
    "greeting": (
        "Hi {mention}! 👋\n"
        "Welcome to the community!\n"
        "If you're new, I can guide you on how to start contributing 🚀"
    ),
    "onboarding": (
        "Awesome {mention}! 🎉\n"
        "Here's how you can start:\n"
        "1️⃣ Look for `good first issue`\n"
        "2️⃣ Set up the project locally\n"
        "3️⃣ Read CONTRIBUTING.md\n\n"
        "Would you like me to suggest beginner-friendly issues?"
    ),
    "issue_suggestion": (
        "{mention} 🔍\n"
        "You can check open issues labeled `good first issue`.\n"
        "Would you like me to fetch some right now?"
    ),
}
Then in the handler:
template = PROACTIVE_RESPONSES.get(proactive_type)
if template:
    await message.channel.send(template.format(mention=message.author.mention))
    return
if proactive_type == "acknowledgment":
    return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/integrations/discord/bot.py` around lines 80 - 112, Extract the inline proactive reply strings into a module- or class-level mapping (e.g., PROACTIVE_RESPONSES) and replace the multiple inline message.channel.send calls in the triage handling block (where triage_result and proactive_type are used) with a single lookup + send flow: lookup template = PROACTIVE_RESPONSES.get(proactive_type), if template then await message.channel.send(template.format(mention=message.author.mention)) and return; keep the explicit early return for the "acknowledgment" proactive_type. This centralizes templates for easier maintenance, i18n and testing while preserving the existing control flow in the handler.
84-109: Proactive responses bypass thread creation, with no rate-limiting or dedup.
Every matching message (e.g., a user saying "hi" repeatedly) will trigger a new channel-level response with no cooldown. Consider adding a lightweight per-user cooldown (e.g., a TTL dict keyed by `(user_id, proactive_type)`) to avoid spamming the channel; a sketch of such a cooldown follows the prompt below.
Verify each finding against the current code and only fix it if needed. In `@backend/integrations/discord/bot.py` around lines 84 - 109, The proactive handlers in backend/integrations/discord/bot.py (the blocks checking proactive_type and calling message.channel.send) currently send channel-level replies every match with no dedup or rate-limit; add a lightweight per-user cooldown before each send by introducing a process-wide TTL cache (e.g., a dict mapping (message.author.id, proactive_type) to expiry timestamp), check the cache at the top of the proactive path and skip sending if the entry exists and hasn't expired, and after sending set/update the cache with a short cooldown (e.g., 60–300s) so subsequent identical triggers are suppressed; ensure the keys reference message.author.id and the proactive_type used in those if-blocks so the check applies to greeting/onboarding/issue_suggestion separately.
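A minimal sketch of such a cooldown, written as a module-level helper (the name `cooldown_active` and the 120-second window are illustrative assumptions, not part of the PR):
import time

PROACTIVE_COOLDOWN_SECONDS = 120  # illustrative window
_proactive_cooldown: dict = {}  # (user_id, proactive_type) -> last-reply timestamp

def cooldown_active(user_id: int, proactive_type: str) -> bool:
    """Return True if this (user, type) pair was answered recently; otherwise record now."""
    now = time.monotonic()
    key = (user_id, proactive_type)
    last = _proactive_cooldown.get(key)
    if last is not None and now - last < PROACTIVE_COOLDOWN_SECONDS:
        return True
    _proactive_cooldown[key] = now
    return False
The handler would then check `if cooldown_active(message.author.id, proactive_type): return` before sending, so each proactive type is throttled separately per user.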
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/classification/classification_router.py`:
- Around line 62-69: The acknowledgment branch in classification_router.py
currently returns needs_devrel: False which makes bot._handle_devrel_message
(and the acknowledgment handler in bot.py) unreachable; either change the
acknowledgment return to needs_devrel: True so the triage gate if
triage_result.get("needs_devrel", False) will invoke _handle_devrel_message (and
let the existing acknowledgment handler run), or remove the dead acknowledgment
handler from bot.py (lines ~111-112) and keep needs_devrel: False to silently
drop "thanks" messages—update the JSON return in the acknowledgment branch (the
dict with "proactive_type": "acknowledgment") or remove the handler in bot.py
accordingly (a minimal sketch of the first option appears after this list).
In `@backend/integrations/discord/bot.py`:
- Around line 111-112: The if-branch checking proactive_type == "acknowledgment"
in bot.py is dead because acknowledgment results have needs_devrel: False and
thus on_message never calls _handle_devrel_message for them; remove this
unreachable branch or adjust the caller logic to pass acknowledgment cases if
you intended to handle them here. Locate the conditional around proactive_type
in _handle_devrel_message (or the surrounding devrel handling block referenced
by on_message) and either delete the acknowledgment branch or update
on_message/_handle_devrel_message to ensure acknowledgment results are routed
here before keeping the branch.
- Around line 92-101: The string passed to message.channel.send inside the
proactive_type == "onboarding" branch contains a Unicode RIGHT SINGLE QUOTATION
MARK in "Here’s"; replace it with the ASCII apostrophe so the literal becomes
"Here's" to satisfy Ruff RUF001 and maintain consistent ASCII punctuation in the
onboarding message; update the message in the message.channel.send call
accordingly and run the linter to verify the warning is cleared.
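A minimal sketch of the first option from the acknowledgment comment above (the dict keys follow the comment; the priority and reasoning values are illustrative):
# In the acknowledgment branch of the triage helper:
return {
    "needs_devrel": True,  # was False, which left the bot-side acknowledgment handler unreachable
    "priority": "low",
    "reasoning": "Acknowledgment message - lightweight handling only",
    "original_message": message,
    "proactive_type": "acknowledgment",
}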
@coderabbitai Thanks for the review!
All review suggestions have been addressed. Please let me know if anything else needs improvement.
…itHub API integration
Actionable comments posted: 19
🧹 Nitpick comments (12)
backend/app/agents/devrel/github/github_toolkit.py (2)
120-125: `IssueSuggestionService` is instantiated on every `execute()` call.
A new service instance (and implicitly a new HTTP client lifecycle) is created for each query. Move the instantiation to `__init__` to reuse the service across calls.
Proposed fix
 class GitHubToolkit:
     def __init__(self):
+        self.issue_service = IssueSuggestionService(settings.github_token)
         self.tools = [ ... ]
Then in `execute`:
-        service = IssueSuggestionService(settings.github_token)
-        issues = await service.fetch_global_beginner_issues(query)
+        issues = await self.issue_service.fetch_global_beginner_issues(query)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/agents/devrel/github/github_toolkit.py` around lines 120 - 125, The code currently instantiates IssueSuggestionService(settings.github_token) inside execute(), creating a new HTTP client per call; move the instantiation into the class __init__ (store it as self.issue_suggestion_service) and update execute() to call self.issue_suggestion_service.fetch_global_beginner_issues(query) so the same IssueSuggestionService instance (and its underlying HTTP client) is reused across calls.
47-76: Rule-based classifier uses overly broad single-keyword matches.
Keywords like `"repo"` and `"search"` are very common words. For example:
- `"I want to report a bug"` contains `"repo"` → classified as `repo_support` (false positive from substring match of "repo" in "report")
- `"I'm searching for help"` → classified as `web_search`
Consider using multi-word phrases, word boundaries (`\b` regex), or at minimum checking for whole-word matches.
Example: tighten keyword matching
-    elif "repo" in query_lower:
+    elif "repository" in query_lower or " repo " in f" {query_lower} ":
         classification = "repo_support"
-    elif "search" in query_lower:
+    elif "web search" in query_lower or query_lower.startswith("search "):
         classification = "web_search"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/agents/devrel/github/github_toolkit.py` around lines 47 - 76, The rule-based classifier in classify_intent uses broad substring checks (e.g., checking "repo" or "search" in query_lower) causing false positives; change those checks to use whole-word or phrase matching (e.g., regex with word boundaries or split/token-based matching) and prefer multi-word phrases for intents like "good first issue" and "github support" to avoid matching substrings (e.g., "report" -> "repo"); update the conditions in classify_intent to perform re.search(r"\brepo\b", ...) or equivalent token checks and add tests for cases like "report" and "searching" to verify correct classification.
backend/routes.py (1)
122-127: Chain the original exception with `raise ... from e`.
Per Ruff B904, re-raising without `from` loses the original traceback context.
Fix
 except Exception as e:
     logging.error(f"Error fetching beginner issues: {e}")
     raise HTTPException(
         status_code=500,
         detail="Failed to fetch beginner issues"
-    )
+    ) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/routes.py` around lines 122 - 127, The except Exception as e block that logs and re-raises an HTTPException when failing to fetch beginner issues should chain the original exception to preserve traceback; update the re-raise of HTTPException(status_code=500, detail="Failed to fetch beginner issues") to use "from e" (i.e., raise HTTPException(...) from e) while keeping the logging call intact so the original exception context is preserved for debugging.
backend/app/classification/classification_router.py (3)
30-35: Exact-match greetings won't catch common variations.
`msg in greetings` requires an exact match after `lower().strip()`, so inputs like "hi there", "hello!", or "hey everyone" will miss. If the goal is proactive onboarding on greetings, consider `msg.startswith(...)` or `any(msg.startswith(g) for g in greetings)`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/classification/classification_router.py` around lines 30 - 35, The current exact-match check (if msg in greetings) in classification_router.py will miss variants like "hi there" or "hello!"; update the condition that uses the greetings list to use a starts-with or substring test (e.g., any(msg.startswith(g) or msg.startswith(g + " ") or g in msg for g in greetings)) after msg = msg.lower().strip(), and consider stripping trailing punctuation before matching; locate the greetings variable and the conditional that checks "if msg in greetings" and replace it with the more permissive any(...) test so greetings like "hi there", "hello!", and "hey everyone" are correctly detected.
73-77: PEP 484: Use explicit `Optional` instead of implicit `None` default.
`context: Dict[str, Any] = None` should be `context: Dict[str, Any] | None = None` (or `Optional[Dict[str, Any]] = None`). This is flagged by Ruff RUF013.
Fix
 async def should_process_message(
     self,
     message: str,
-    context: Dict[str, Any] = None
+    context: Dict[str, Any] | None = None
 ) -> Dict[str, Any]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/classification/classification_router.py` around lines 73 - 77, The type annotation for the method should_process_message uses an implicit None default (context: Dict[str, Any] = None) which flags Ruff RUF013; change the parameter to an explicit optional type such as context: Dict[str, Any] | None = None (or use Optional[Dict[str, Any]] = None if you prefer typing.Optional) so the signature clearly expresses that context may be None while keeping the default None value; update the import if you choose Optional.
81-85: `if False and pattern_result:` permanently disables pattern matching (dead code).
The `if False` condition means `_simple_pattern_match` is called but its result is always discarded. The entire "Step 1" branch is unreachable. If this is a feature flag for future use, consider using a configuration setting or removing the call until it's ready; a bare `if False` is confusing and easy to miss in review.
Suggestion: use a config flag or remove
-    # Step 1: Lightweight proactive pattern check
-    pattern_result = self._simple_pattern_match(message)
-    if False and pattern_result:
-        logger.info("Pattern-based proactive classification triggered")
-        return pattern_result
+    # Step 1: Lightweight proactive pattern check (disabled; enable via config)
+    if getattr(settings, 'enable_pattern_matching', False):
+        pattern_result = self._simple_pattern_match(message)
+        if pattern_result:
+            logger.info("Pattern-based proactive classification triggered")
+            return pattern_result
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/classification/classification_router.py` around lines 81 - 85, The branch using "if False and pattern_result" permanently disables the proactive pattern logic while still calling _simple_pattern_match; replace this dead flag with a real configuration toggle (e.g., self.config.enable_proactive_pattern or a module-level setting) and guard the call so _simple_pattern_match(message) is only invoked when the flag is true; if you prefer to remove the feature, delete the call to _simple_pattern_match and the unreachable if-block entirely (and keep the logger/info and return behavior only when the config flag is enabled).backend/integrations/discord/bot.py (2)
backend/integrations/discord/bot.py (2)
58-60: `GitHubToolkit()` is instantiated on every incoming message.
Each message creates a new `GitHubToolkit` instance (and downstream, a new `IssueSuggestionService`). Move toolkit initialization to `__init__` so it's reused across messages.
Proposed fix
 def __init__(self, **kwargs):
     ...
     self.active_threads: Dict[str, str] = {}
+    self.toolkit = GitHubToolkit()

-    # 🔥 Direct Toolkit Execution
-    toolkit = GitHubToolkit()
-    result = await toolkit.execute(message.content)
+    # 🔥 Direct Toolkit Execution
+    result = await self.toolkit.execute(message.content)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/integrations/discord/bot.py` around lines 58 - 60, Currently GitHubToolkit() (and thus IssueSuggestionService) is instantiated per message; move the creation to the bot class constructor by adding self.toolkit = GitHubToolkit() in __init__ and update the message handler to call await self.toolkit.execute(message.content) instead of creating a new GitHubToolkit; ensure any other places that constructed IssueSuggestionService are updated to reuse self.toolkit so toolkit and downstream services are reused across messages.
32-32: `active_threads` dict grows without bound; no eviction or cleanup.
Every new user gets an entry that's only removed if the thread is found to be archived. There's no TTL, max-size, or periodic cleanup. Over time this will consume memory. A sketch of a bounded TTL map follows the prompt below.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/integrations/discord/bot.py` at line 32, active_threads is an unbounded dict and will leak memory; implement bounded/TTL eviction and cleanup: replace plain Dict[str,str] active_threads with a small helper (e.g., an LRU or TTL cache) or wrap accesses to enforce a max size and timestamps, remove entries when threads are archived/closed, and run a periodic cleanup task. Update any code that writes to active_threads (places that create threads or handle thread events such as thread creation/updates/archival handlers, e.g., the functions that currently add/remove entries) to use the new helper APIs (put/get/remove) so entries are removed on archive and expired entries are purged by the periodic task or on insert when max size is reached. Ensure thread-safety if accessed from async handlers.backend/app/agents/devrel/github/services/issue_suggestion_service.py (3)
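A minimal sketch of a bounded TTL map matching the put/get/remove API the prompt describes (the class name, size limit, and TTL are illustrative assumptions):
import time
from collections import OrderedDict
from typing import Optional, Tuple

class TTLMap:
    """Insertion-ordered str->str map with a max size and per-entry TTL."""

    def __init__(self, max_size: int = 1000, ttl_seconds: float = 86400.0):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._data: "OrderedDict[str, Tuple[str, float]]" = OrderedDict()

    def put(self, key: str, value: str) -> None:
        self._purge_expired()
        self._data[key] = (value, time.monotonic() + self.ttl)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the oldest entry

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._data[key]
            return None
        return value

    def remove(self, key: str) -> None:
        self._data.pop(key, None)

    def _purge_expired(self) -> None:
        now = time.monotonic()
        for k in [k for k, (_, exp) in self._data.items() if now > exp]:
            del self._data[k]
Swapping `self.active_threads: Dict[str, str] = {}` for an instance of such a map keeps call sites close to dict semantics while bounding memory.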
backend/app/agents/devrel/github/services/issue_suggestion_service.py (3)
7-63: Duplicate `IssueSuggestionService` class with a divergent interface.
This class is a second, incompatible definition of `IssueSuggestionService` that already exists at `backend/services/github/issue_suggestion_service.py`. The two copies have different method signatures (`fetch_global_beginner_issues(user_query, limit)` here vs. `fetch_global_beginner_issues(language, limit)` there) and different filtering logic. This split will cause confusion over which one to import and will inevitably diverge further.
Consider consolidating into a single service (e.g. merging the `user_query` parsing approach from this file with the parameterised approach in the other, or moving all GitHub service logic under one canonical path); one possible merged interface is sketched after the prompt below.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/agents/devrel/github/services/issue_suggestion_service.py` around lines 7 - 63, There are two incompatible definitions of IssueSuggestionService; consolidate them by removing the duplicate and merging logic into one canonical class (IssueSuggestionService) so imports are unambiguous: pick the canonical file to keep (prefer backend/services/github/issue_suggestion_service.py) and update its fetch_global_beginner_issues signature to support both a typed parameterized interface (e.g., language: Optional[str], org: Optional[str], limit: int) and the freeform user_query parsing used here; migrate the filtering logic from this file (language detection from user_query and org checks like "django") into the single retained fetch_global_beginner_issues implementation, adjust callers to the unified signature, and delete this duplicate class to avoid divergence.
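A minimal sketch of one possible consolidated interface, assuming the merge direction the prompt suggests (the combined signature, the `GITHUB_API_BASE` constant, and the simple language detection are illustrative, not the PR's code):
import httpx
from typing import Any, Dict, List, Optional

GITHUB_API_BASE = "https://api.github.com"

class IssueSuggestionService:
    """Single canonical service accepting both freeform and structured input."""

    def __init__(self, github_token: str):
        self.github_token = github_token

    async def fetch_global_beginner_issues(
        self,
        user_query: Optional[str] = None,
        language: Optional[str] = None,
        limit: int = 5,
    ) -> List[Dict[str, Any]]:
        # Freeform path: derive the structured filter from the raw query text.
        if user_query and not language and "python" in user_query.lower():
            language = "python"
        q = 'label:"good first issue" is:issue state:open'
        if language:
            q += f" language:{language}"
        headers = {"Authorization": f"Bearer {self.github_token}"}
        async with httpx.AsyncClient(timeout=10.0) as client:
            resp = await client.get(
                f"{GITHUB_API_BASE}/search/issues",
                params={"q": q, "per_page": limit},  # httpx encodes the query
                headers=headers,
            )
        resp.raise_for_status()
        return resp.json().get("items", [])
Both the toolkit (freeform `user_query`) and the API route (explicit `language`) could then share this one method.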
29-34: Hardcoded language/org detection is too narrow and not extensible.
Only "python" and "django" are recognised, meaning any other language or organisation query passes through unfiltered. This makes the service's value-add effectively a no-op for the vast majority of inputs and will be confusing for contributors asking about JavaScript, Rust, etc. At minimum, a configurable mapping (or a parameter-driven design) should replace the hard-coded strings, or the filtering logic should be clearly documented as a stub; a sketch of such a mapping follows the prompt below.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/agents/devrel/github/services/issue_suggestion_service.py` around lines 29 - 34, The current hardcoded detection in issue_suggestion_service.py (the search_query assembly using query_lower) only checks for "python" and "django"; replace this with a configurable mapping approach: add language and org maps (e.g., LANG_KEYWORDS and ORG_KEYWORDS) that map keyword sets to search qualifiers and use a loop to detect any matching keyword in query_lower and append the corresponding "language:..." or "org:..." to search_query; expose these maps as constructor parameters or module-level config so they can be extended without code changes and update the logic in the function that builds search_query to iterate the maps instead of checking only "python" and "django".
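A minimal sketch of the configurable mapping described above (the keyword-to-qualifier tables are illustrative defaults):
# Map detection keywords to GitHub search qualifiers; extend via config.
LANG_KEYWORDS = {
    "python": "language:python",
    "javascript": "language:javascript",
    "rust": "language:rust",
}
ORG_KEYWORDS = {
    "django": "org:django",
}

def build_search_query(user_query: str) -> str:
    query_lower = user_query.lower()
    search_query = 'label:"good first issue" is:issue state:open'
    for keyword, qualifier in {**LANG_KEYWORDS, **ORG_KEYWORDS}.items():
        if keyword in query_lower:
            search_query += f" {qualifier}"
    return search_query
For example, `build_search_query("any beginner rust issues?")` appends `language:rust` without touching the service code.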
38-39: Replace `print()` debug statements with structured logging.
Lines 38–39, 45–46, and 61 use raw `print()` calls with emoji, which are not appropriate for production code: they bypass log level controls, are invisible in log aggregators, and the status/error lines on 45–46 silently swallow failures without any re-raise or propagation.
♻️ Proposed refactor
+import logging
+
+logger = logging.getLogger(__name__)
 ...
-    print("🔍 GitHub Search Query:", search_query)
-    print("🔗 GitHub URL:", url)
+    logger.debug("GitHub search query: %s", search_query)

     async with httpx.AsyncClient() as client:
         response = await client.get(url, headers=headers)

     if response.status_code != 200:
-        print("❌ GitHub API Error:", response.status_code)
-        print("❌ Response Body:", response.text)
+        logger.error("GitHub API error %s: %s", response.status_code, response.text)
         return []
 ...
-    print(f"✅ Found {len(results)} issues")
+    logger.debug("Found %d beginner issues", len(results))
Also applies to: 45-46, 61-61
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/agents/devrel/github/services/issue_suggestion_service.py` around lines 38 - 39, Replace the raw print() debug statements for "search_query" and "url" (and the other print calls at the noted locations) with structured logging: create or use a module logger (logging.getLogger(__name__>) and use appropriate log levels (logger.debug or logger.info for query/URL diagnostics, logger.error for failures). For the prints that currently swallow errors (the status/error prints around lines 45–46 and 61), log the error with logger.error including the exception/info and either re-raise the exception or return/propagate a clear error value instead of silently continuing; update the surrounding code paths that reference search_query and url to use these logger calls (identify the usages of variables search_query and url and the exception-handling block in issue_suggestion_service.py) so logs are captured by aggregators and obey log-level configuration.backend/app/api/v1/github.py (1)
backend/app/api/v1/github.py (1)
30-34: Three Ruff-flagged issues in the exception handler.
- BLE001 – Bare `except Exception` masks unexpected errors; prefer a more specific exception type or at minimum document why the catch-all is intentional.
- B904 – `raise HTTPException(...)` inside an `except` block should use `raise ... from e` to preserve the exception chain.
- RUF010 – Use the `!s` conversion flag instead of `str(e)` in the f-string.
🔧 Proposed fix
-    except Exception as e:
-        raise HTTPException(
+    except Exception as e:
+        raise HTTPException(
             status_code=500,
-            detail=f"Failed to fetch issues: {str(e)}"
-        )
+            detail=f"Failed to fetch issues: {e!s}"
+        ) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/v1/github.py` around lines 30 - 34, The current bare "except Exception as e" should be narrowed to the specific exceptions thrown when fetching GitHub issues (e.g., requests.exceptions.RequestException, PyGithub's GithubException, or whatever client-specific exceptions your fetch function raises) or, if a catch-all is intentional, add a comment explaining why; preserve the exception chain by re-raising the HTTPException with "raise HTTPException(...) from e" and use the f-string conversion flag for the error message (e.g., f"Failed to fetch issues: {e!s}") while keeping the HTTPException symbol and the surrounding except block unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/agents/devrel/github/github_toolkit.py`:
- Around line 152-153: The fallback call passing None to
handle_general_github_help causes an AttributeError because
handle_general_github_help calls _extract_search_query which uses llm. Fix by
supplying a valid LLM instance instead of None: update the fallback to pass the
toolkit's LLM (e.g., self.llm or the LLM field on GitHubToolkit) or ensure
GitHubToolkit initializes a default LLM during construction and then call
handle_general_github_help(query, self.llm); alternatively implement a safe
non-LLM fallback in handle_general_github_help that does not call
_extract_search_query when llm is None.
In `@backend/app/agents/devrel/github/services/issue_suggestion_service.py`:
- Around line 53-59: The loop that builds results from data.get("items", [])
uses direct key access on item["repository_url"], item["number"], item["title"],
and item["html_url"], which can raise KeyError on partial GitHub API responses;
update the loop in issue_suggestion_service.py (where results is appended) to
safely read fields via item.get(...) and validate that required fields
(repository_url, number, title, html_url) are present before appending, skip
(and optionally log) any incomplete entries, and extract the repo name from
repository_url defensively (e.g., handle None or empty string before splitting).
- Around line 36-42: The code builds the GitHub search URL by interpolating
search_query into an f-string (see the url variable and the client.get call in
issue_suggestion_service.py) which sends unencoded spaces/quotes; change the
request to call client.get(GITHUB_API_BASE + "/search/issues", params={"q":
search_query, "per_page": limit}, headers=headers) so httpx handles URL
encoding, remove the manual url f-string, and delete the debug print()
statements present around the search_query/url and response logging.
In `@backend/app/api/router.py`:
- Around line 1-13: The repo exposes the same beginner-issues endpoint twice
(api_router includes github_router at /v1/github exposing
/v1/github/beginner-issues while backend/routes.py separately registers
/github/beginner-issues), so remove the duplicate by consolidating to a single
router: choose the canonical implementation (either the endpoint in .v1.github's
router or the one in backend/routes.py), delete the other duplicate route
registration, and update imports/usages accordingly; ensure the remaining
endpoint still calls fetch_beginner_issues(owner=GITHUB_ORG, repo=repo) on
IssueSuggestionService and returns the same response shape, and remove or adjust
any tests/config that expect the removed path.
In `@backend/app/api/v1/github.py`:
- Around line 7-16: IssueSuggestionService is instantiated at module import with
GITHUB_TOKEN which can be None, so outbound calls may silently use an invalid
token; change to either validate GITHUB_TOKEN at import and raise (fail-fast) or
lazily create the service inside the request handler (e.g., move creation of
IssueSuggestionService into get_beginner_issues and construct it after checking
GITHUB_TOKEN), referencing IssueSuggestionService and get_beginner_issues to
locate the code and ensure the token check precedes any service construction or
API calls.
- Line 2: The handler is calling a non-existent method
issue_service.fetch_beginner_issues which will raise AttributeError; update the
call to use the existing IssueSuggestionService.fetch_global_beginner_issues (or
add a new method fetch_beginner_issues to IssueSuggestionService) so signatures
match: either replace issue_service.fetch_beginner_issues(owner=GITHUB_ORG,
repo=repo) with issue_service.fetch_global_beginner_issues(language, limit)
passing appropriate language/limit values, or implement
fetch_beginner_issues(owner, repo, ...) in IssueSuggestionService that delegates
to fetch_global_beginner_issues and returns the expected shape.
In `@backend/app/classification/classification_router.py`:
- Around line 62-69: The acknowledgment branch currently returns needs_devrel:
True while saying "no processing needed"; change needs_devrel to False and
update the reasoning string to match (e.g., "Acknowledgment message - no devrel
processing needed") in the returned dict that contains keys needs_devrel,
priority, reasoning, original_message, proactive_type so the boolean and message
are consistent.
In `@backend/integrations/discord/bot.py`:
- Around line 67-69: The except block currently only logs errors (except
Exception as e / logger.error(...)) but doesn't notify the user; update the
exception handler to, after logging the error, send a user-facing reply in the
same context (e.g., await message.reply(...) or await message.channel.send(...)
/ post into message.thread if thread-aware) with a short apology like "Sorry,
something went wrong while processing your message." Ensure the send is awaited
and wrapped to avoid raising on failure (catch/send fallback) so the bot won't
crash when attempting to notify the user.
In `@backend/main.py`:
- Around line 61-67: The CORS setup using api.add_middleware(CORSMiddleware)
currently sets allow_origins=["*"] together with allow_credentials=True which is
invalid; update the CORSMiddleware configuration in main.py (the
api.add_middleware call) to either (A) replace allow_origins=["*"] with an
explicit list of allowed origins (e.g., read from an ALLOWED_ORIGINS env var or
config and pass that list) while keeping allow_credentials=True, or (B) if you
truly need wildcard origins, set allow_credentials=False; make the change where
CORSMiddleware is configured so the Access-Control-Allow-Origin header will not
be "*" when credentials are allowed.
- Line 75: Import the router object from routes.py (e.g., from routes import
router as routes_router) and register it on the FastAPI app alongside api_router
by calling api.include_router(routes_router) so the /github/webhook and
/github/beginner-issues endpoints become reachable; place the import near the
other router imports and add the include_router call after the existing
api.include_router(api_router) in main.py.
- Around line 33-35: The fire-and-forget asyncio.create_task call for starting
the bot can be garbage-collected; store the Task on the instance so it has a
strong reference. Replace the bare
asyncio.create_task(self.discord_bot.start(settings.discord_bot_token)) with
assigning the returned Task to an instance attribute (e.g., self.discord_task)
and use that attribute when you need to await, cancel, or check the bot
lifecycle; ensure the attribute is created on the same object where create_task
is invoked so references to self.discord_task prevent silent cancellation (a minimal sketch follows this list).
In `@backend/requirements.txt`:
- Around line 1-19: requirements.txt is missing runtime packages used by the
code; add the following packages to backend/requirements.txt so imports succeed:
discord.py (used in backend/integrations/discord/bot.py), langchain-google-genai
and langchain-core (used in
backend/app/classification/classification_router.py), duckduckgo-search or ddgs
(used in backend/app/agents/devrel/tools/search_tool/ddg.py), and langsmith
(used in backend/app/agents/devrel/tools/search_tool/ddg.py); pin versions if
required by your environment and run pip install -r requirements.txt to verify
imports succeed.
In `@backend/routes.py`:
- Around line 2-3: The file defines imports (IssueSuggestionService,
GITHUB_TOKEN, GITHUB_ORG) and a FastAPI router but the router is never mounted
so its endpoints are unreachable; either mount this router into the app (e.g.,
include the router on the global FastAPI instance or add it to api_router in
main.py) or remove/merge these routes if they are superseded by the /v1/github
router—update main.py (or api_router registration) to include the router symbol
from this module or delete the duplicate endpoints and their imports
(IssueSuggestionService, GITHUB_TOKEN, GITHUB_ORG) to avoid stale/invalid
imports.
- Around line 110-114: The route calls a non-existent
IssueSuggestionService.fetch_beginner_issues with owner/repo kwargs causing
AttributeError; fix by either (A) changing the call in routes.py to the existing
method IssueSuggestionService.fetch_global_beginner_issues(language, limit) and
pass the correct parameters, or (B) add/rename a method on
IssueSuggestionService (e.g., fetch_beginner_issues(self, owner, repo, ...))
that implements repo-scoped behavior and accept owner/repo kwargs; update the
route to call the matching method name and signature so the symbols
IssueSuggestionService, fetch_beginner_issues and fetch_global_beginner_issues
align.
In `@backend/services/github/issue_suggestion_service.py`:
- Around line 1-46: This file defines an
IssueSuggestionService.fetch_global_beginner_issues(language, limit) that
conflicts with the other service variant
fetch_global_beginner_issues(user_query, limit); consolidate into a single
service API (pick one canonical signature — e.g., accept user_query and optional
language) and merge filtering logic so both callers (including github_toolkit
import) use the same class/method, updating import sites as needed; also replace
the print("GitHub search failed:", response.text) in IssueSuggestionService with
the project logger (use the existing logger instance or create one) and log the
response status and body for diagnostics.
- Around line 38-44: The loop building results from GitHub response items uses
direct indexing (issue["number"], issue["title"], issue["html_url"],
issue["repository_url"]) which can raise KeyError; update the block that appends
to results to use issue.get(...) with sensible defaults (e.g.,
issue.get("number"), issue.get("title", ""), issue.get("html_url", "") ) and
derive repo safely by reading repo_url = issue.get("repository_url") and then
splitting only if repo_url is truthy (fallback to an empty string or None).
Ensure you update the same symbols shown (items, issue, results,
repository_url/html_url keys) so missing fields don't crash the service.
- Around line 25-26: The code opens httpx.AsyncClient() and calls
client.get(url, headers=headers) with no timeout; add a request timeout to avoid
hanging the event loop by passing a timeout value (or an httpx.Timeout object)
either when constructing httpx.AsyncClient(timeout=...) or on the client.get
call (timeout=...) in the function in issue_suggestion_service.py that performs
the GitHub request; choose a reasonable timeout (e.g., 5–30s) and ensure the
call is wrapped so timeouts raise and can be handled/logged appropriately.
- Around line 11-15: The method is named fetch_global_beginner_issues but
callers invoke issue_service.fetch_beginner_issues(), causing an AttributeError;
fix by either renaming the service method fetch_global_beginner_issues to
fetch_beginner_issues or updating all call sites that call
issue_service.fetch_beginner_issues() to call fetch_global_beginner_issues()
instead, and ensure the function signature (language: str = "python", limit: int
= 5) and return type List[Dict] remain unchanged so no other call expectations
break.
- Around line 22-23: The constructed query string (variable query) is
interpolating language into the URL (url) without URL-encoding, which breaks the
GitHub API for languages with spaces/special characters; update the code in
issue_suggestion_service.py to either URL-encode the language (via
urllib.parse.quote) before building query or, better, stop manual string
interpolation and pass the search as an httpx params dict (e.g. params={'q':
f'label:"good first issue" language:{language} state:open', 'per_page': limit})
when calling the GitHub API base (GITHUB_API_BASE) so httpx handles encoding for
you and avoid embedding raw unencoded values into url.
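For the fire-and-forget task item above, a minimal sketch of holding a strong reference to the bot task (the class shape and attribute names are illustrative assumptions):
import asyncio
from typing import Optional

class AppLifecycle:
    def __init__(self, discord_bot, token: str):
        self.discord_bot = discord_bot
        self.token = token
        self.discord_task: Optional[asyncio.Task] = None

    async def startup(self) -> None:
        # Keep a strong reference: a bare create_task(...) result can be
        # garbage-collected, silently cancelling the running bot.
        self.discord_task = asyncio.create_task(self.discord_bot.start(self.token))

    async def shutdown(self) -> None:
        if self.discord_task is not None:
            self.discord_task.cancel()
            try:
                await self.discord_task
            except asyncio.CancelledError:
                pass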
 else:
-    result = await handle_general_github_help(query, self.llm)
+    result = await handle_general_github_help(query, None)
🧩 Analysis chain
🏁 Script executed:
# First, locate and inspect the general_github_help.py file
find . -name "general_github_help.py" -type f
Repository: AOSSIE-Org/Devr.AI
Length of output: 191
🏁 Script executed:
# Also look at the github_toolkit.py file around lines 152-153 to see the full context
cat -n backend/app/agents/devrel/github/github_toolkit.py | sed -n '140,160p'
Repository: AOSSIE-Org/Devr.AI
Length of output: 919
🏁 Script executed:
# Search for handle_general_github_help function definition
rg "def handle_general_github_help|async def handle_general_github_help" -A 20Repository: AOSSIE-Org/Devr.AI
Length of output: 2239
🏁 Script executed:
rg "_extract_search_query" -A 10 backend/app/agents/devrel/github/tools/general_github_help.pyRepository: AOSSIE-Org/Devr.AI
Length of output: 1027
🏁 Script executed:
# Also check where llm is injected/initialized in github_toolkit.py
rg "self.llm|self\.llm" backend/app/agents/devrel/github/github_toolkit.pyRepository: AOSSIE-Org/Devr.AI
Length of output: 44
🏁 Script executed:
rg "_extract_search_query" -B 5 -A 15 backend/app/agents/devrel/nodes/handlers/web_search.pyRepository: AOSSIE-Org/Devr.AI
Length of output: 1628
🏁 Script executed:
# Also check all calls to handle_general_github_help to see if None is always passed
rg "handle_general_github_help" backend/app/agents/devrel/github/github_toolkit.pyRepository: AOSSIE-Org/Devr.AI
Length of output: 198
handle_general_github_help(query, None) will crash — llm cannot be None.
The fallback path at line 153 passes None as the llm argument. The function immediately calls _extract_search_query(query, llm), which invokes await llm.ainvoke(...). This will raise AttributeError: 'NoneType' object has no attribute 'ainvoke'. Every unclassified query will hit this path and crash.
Either initialize an LLM instance in GitHubToolkit to pass here, or provide a non-LLM fallback for the general_github_help path; a sketch of the latter follows.
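A minimal sketch of the second option, guarding inside the handler (assumes the module's existing `_extract_search_query` helper; the fallback payload shape is illustrative):
async def handle_general_github_help(query: str, llm=None):
    if llm is None:
        # Non-LLM fallback: never reach _extract_search_query / llm.ainvoke.
        return {
            "status": "fallback",
            "message": (
                "I can't analyze that request in depth right now. "
                "Try asking for good first issues, or rephrase your question."
            ),
        }
    search_query = await _extract_search_query(query, llm)  # existing LLM path
    ...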
url = f"{GITHUB_API_BASE}/search/issues?q={search_query}&per_page={limit}"

print("🔍 GitHub Search Query:", search_query)
print("🔗 GitHub URL:", url)

async with httpx.AsyncClient() as client:
    response = await client.get(url, headers=headers)
🧩 Analysis chain
🏁 Script executed:
cat -n backend/app/agents/devrel/github/services/issue_suggestion_service.py | head -70
Repository: AOSSIE-Org/Devr.AI
Length of output: 2196
🏁 Script executed:
# Check if there are any other similar files that might show patterns
find . -name "*issue_suggestion_service*" -type fRepository: AOSSIE-Org/Devr.AI
Length of output: 187
🌐 Web query:
httpx AsyncClient URL encoding f-string vs params kwarg query string
💡 Result:
In httpx (including AsyncClient), prefer `params=` (or `httpx.URL(..., params=...)`) over building a query string with an f-string:
- `params=` is the supported API for query strings and accepts `dict`, `str`, or `[(k, v), ...]`. httpx will serialize/encode the parameters for you. [1][2]
- `httpx.QueryParams` / `params=` encoding is form-style (e.g., spaces become `+`, `@` becomes `%40`, `!` becomes `%21`). [1]
- Some characters may be treated as "safe" and not percent-encoded (notably `/`), which can surprise people coming from `requests`; this is discussed as an intentional behavior change in httpx. [3]
Recommended
import httpx

async with httpx.AsyncClient() as client:
    r = await client.get(
        "https://api.example.com/search",
        params={"q": "How HTTP works!", "email": "user@example.com"},
    )
If you use an f-string/manual URL
You must encode yourself (and avoid double-encoding):
from urllib.parse import urlencode
q = urlencode({"q": "How HTTP works!", "email": "user@example.com"})
url = f"https://api.example.com/search?{q}"If an API requires / to be encoded inside a parameter value
You may need to pre-quote that specific value before passing it (because httpx may leave / unescaped), e.g. for q="http://example.com"; this exact issue is reported/discussed. [3]
Sources: httpx URL/QueryParams docs [1], AsyncClient params API [2], httpx issue on query param encoding and / handling [3].
Search query is not URL-encoded — spaces and quotes will be sent raw to GitHub API
The search_query string (e.g. label:"good first issue" is:issue state:open language:python) contains spaces and double-quotes. Embedding it directly in an f-string URL bypasses httpx's query parameter encoding, sending unencoded characters to the GitHub API. This will result in malformed requests and unpredictable behavior.
Use the params= keyword argument instead, which httpx will properly URL-encode:
🔧 Proposed fix
- url = f"{GITHUB_API_BASE}/search/issues?q={search_query}&per_page={limit}"
-
- print("🔍 GitHub Search Query:", search_query)
- print("🔗 GitHub URL:", url)
-
async with httpx.AsyncClient() as client:
- response = await client.get(url, headers=headers)
+ response = await client.get(
+ f"{GITHUB_API_BASE}/search/issues",
+ headers=headers,
+ params={"q": search_query, "per_page": limit},
+ )Additionally, remove debug print() statements (lines 38–39, 45–46, 61) from production code.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| url = f"{GITHUB_API_BASE}/search/issues?q={search_query}&per_page={limit}" | |
| print("🔍 GitHub Search Query:", search_query) | |
| print("🔗 GitHub URL:", url) | |
| async with httpx.AsyncClient() as client: | |
| response = await client.get(url, headers=headers) | |
| async with httpx.AsyncClient() as client: | |
| response = await client.get( | |
| f"{GITHUB_API_BASE}/search/issues", | |
| headers=headers, | |
| params={"q": search_query, "per_page": limit}, | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/agents/devrel/github/services/issue_suggestion_service.py` around
lines 36 - 42, The code builds the GitHub search URL by interpolating
search_query into an f-string (see the url variable and the client.get call in
issue_suggestion_service.py) which sends unencoded spaces/quotes; change the
request to call client.get(GITHUB_API_BASE + "/search/issues", params={"q":
search_query, "per_page": limit}, headers=headers) so httpx handles URL
encoding, remove the manual url f-string, and delete the debug print()
statements present around the search_query/url and response logging.
```python
for item in data.get("items", []):
    results.append({
        "repo": item["repository_url"].split("/")[-1],
        "number": item["number"],
        "title": item["title"],
        "url": item["html_url"]
    })
```
Direct key access will raise KeyError on partial or unexpected GitHub API responses
All four field accesses on the item dict (`repository_url`, `number`, `title`, `html_url`) will raise a KeyError if a response item omits any field, aborting the entire result set instead of skipping the malformed entry.
🔧 Proposed fix — use `.get()` and skip incomplete entries

```diff
 for item in data.get("items", []):
-    results.append({
-        "repo": item["repository_url"].split("/")[-1],
-        "number": item["number"],
-        "title": item["title"],
-        "url": item["html_url"]
-    })
+    repo_url = item.get("repository_url", "")
+    number = item.get("number")
+    title = item.get("title")
+    url = item.get("html_url")
+    if not all([repo_url, number, title, url]):
+        continue
+    results.append({
+        "repo": repo_url.split("/")[-1],
+        "number": number,
+        "title": title,
+        "url": url,
+    })
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 for item in data.get("items", []):
-    results.append({
-        "repo": item["repository_url"].split("/")[-1],
-        "number": item["number"],
-        "title": item["title"],
-        "url": item["html_url"]
-    })
+    repo_url = item.get("repository_url", "")
+    number = item.get("number")
+    title = item.get("title")
+    url = item.get("html_url")
+    if not all([repo_url, number, title, url]):
+        continue
+    results.append({
+        "repo": repo_url.split("/")[-1],
+        "number": number,
+        "title": title,
+        "url": url,
+    })
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/agents/devrel/github/services/issue_suggestion_service.py` around
lines 53 - 59, The loop that builds results from data.get("items", []) uses
direct key access on item["repository_url"], item["number"], item["title"], and
item["html_url"], which can raise KeyError on partial GitHub API responses;
update the loop in issue_suggestion_service.py (where results is appended) to
safely read fields via item.get(...) and validate that required fields
(repository_url, number, title, html_url) are present before appending, skip
(and optionally log) any incomplete entries, and extract the repo name from
repository_url defensively (e.g., handle None or empty string before splitting).
```python
from .v1.github import router as github_router
from fastapi import APIRouter
from .v1.auth import router as auth_router
from .v1.health import router as health_router
from .v1.integrations import router as integrations_router

api_router = APIRouter()

api_router.include_router(
    github_router,
    prefix="/v1/github",
    tags=["GitHub"]
)
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check if files exist
echo "=== Checking file existence ==="
ls -la backend/app/api/v1/github.py 2>&1 || echo "File not found"
ls -la backend/routes.py 2>&1 || echo "File not found"
echo ""
echo "=== Routes in backend/app/api/v1/github.py ==="
if [ -f backend/app/api/v1/github.py ]; then
  rg -n 'beginner|@router|def ' backend/app/api/v1/github.py | head -50
fi
echo ""
echo "=== Routes in backend/routes.py ==="
if [ -f backend/routes.py ]; then
  rg -n 'beginner|@router|def ' backend/routes.py | head -50
fi
```

Repository: AOSSIE-Org/Devr.AI
Length of output: 904

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Full content of backend/app/api/v1/github.py ==="
cat -n backend/app/api/v1/github.py
echo ""
echo "=== Full content of backend/routes.py ==="
cat -n backend/routes.py
```

Repository: AOSSIE-Org/Devr.AI
Length of output: 6730
Duplicate beginner-issues endpoints across two routers.
This wires github_router at /v1/github (exposing /v1/github/beginner-issues), while backend/routes.py also defines /github/beginner-issues on a separate router. Both endpoints are functionally identical—they call fetch_beginner_issues(owner=GITHUB_ORG, repo=repo) on separate IssueSuggestionService instances and return the same response structure. Consolidate to a single endpoint to avoid confusion and redundancy.
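As a sketch of the consolidated wiring — assuming the versioned /v1/github path is kept as the canonical one, which is a choice this PR has not made explicit:

```python
# backend/app/api/router.py — single registration point (sketch)
from fastapi import APIRouter

from .v1.github import router as github_router

api_router = APIRouter()

# Register the beginner-issues endpoint exactly once, under the versioned
# prefix; the duplicate registration in backend/routes.py would be removed.
api_router.include_router(
    github_router,
    prefix="/v1/github",
    tags=["GitHub"],
)
```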
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/router.py` around lines 1 - 13, The repo exposes the same
beginner-issues endpoint twice (api_router includes github_router at /v1/github
exposing /v1/github/beginner-issues while backend/routes.py separately registers
/github/beginner-issues), so remove the duplicate by consolidating to a single
router: choose the canonical implementation (either the endpoint in .v1.github's
router or the one in backend/routes.py), delete the other duplicate route
registration, and update imports/usages accordingly; ensure the remaining
endpoint still calls fetch_beginner_issues(owner=GITHUB_ORG, repo=repo) on
IssueSuggestionService and returns the same response shape, and remove or adjust
any tests/config that expect the removed path.
```diff
@@ -0,0 +1,34 @@
+from fastapi import APIRouter, HTTPException
+from services.github.issue_suggestion_service import IssueSuggestionService
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Search for fetch_beginner_issues definition
echo "=== Searching for fetch_beginner_issues definition ==="
rg -n "def fetch_beginner_issues" --type py
echo ""
echo "=== Methods on service classes (issue_suggestion_service) ==="
rg -n "def fetch" --type py -g "*issue_suggestion_service*" -A 2
echo ""
echo "=== Locating issue_suggestion_service.py ==="
fd "issue_suggestion_service.py"
echo ""
echo "=== Checking github.py lines 15-25 for actual method call ==="
fd "github.py" -g "*/api/v1/*" -x cat -n {} | sed -n '15,25p'
```

Repository: AOSSIE-Org/Devr.AI
Length of output: 1087

🏁 Script executed:

```shell
#!/bin/bash
# Find the github.py file in api/v1
find . -path "*/api/v1/github.py" -type f
echo ""
echo "=== Reading backend/app/api/v1/github.py ==="
cat -n backend/app/api/v1/github.py | head -30
```

Repository: AOSSIE-Org/Devr.AI
Length of output: 1084

🏁 Script executed:

```shell
#!/bin/bash
cat -n backend/services/github/issue_suggestion_service.py
```

Repository: AOSSIE-Org/Devr.AI
Length of output: 1619
fetch_beginner_issues does not exist on the imported service — AttributeError at runtime
Line 19 calls issue_service.fetch_beginner_issues(owner=GITHUB_ORG, repo=repo), but the imported IssueSuggestionService only defines fetch_global_beginner_issues(language: str, limit: int). This method does not exist and will raise AttributeError on every request to the /beginner-issues endpoint.
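If renaming every call site is undesirable, one option is a thin compatibility shim on the service. This is a sketch only; accepting and ignoring owner/repo in a global search is an assumption about the intended behavior, not something the PR specifies:

```python
from typing import Dict, List


class IssueSuggestionService:
    async def fetch_global_beginner_issues(
        self, language: str = "python", limit: int = 5
    ) -> List[Dict]:
        ...  # existing implementation elided

    async def fetch_beginner_issues(
        self, owner: str = "", repo: str = "", limit: int = 5
    ) -> List[Dict]:
        # Accept owner/repo so existing call sites keep working unchanged,
        # then delegate to the global search, which does not use them.
        return await self.fetch_global_beginner_issues(limit=limit)
```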
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/v1/github.py` at line 2, The handler is calling a
non-existent method issue_service.fetch_beginner_issues which will raise
AttributeError; update the call to use the existing
IssueSuggestionService.fetch_global_beginner_issues (or add a new method
fetch_beginner_issues to IssueSuggestionService) so signatures match: either
replace issue_service.fetch_beginner_issues(owner=GITHUB_ORG, repo=repo) with
issue_service.fetch_global_beginner_issues(language, limit) passing appropriate
language/limit values, or implement fetch_beginner_issues(owner, repo, ...) in
IssueSuggestionService that delegates to fetch_global_beginner_issues and
returns the expected shape.
```python
import httpx
from typing import List, Dict

GITHUB_API_BASE = "https://api.github.com"


class IssueSuggestionService:
    def __init__(self, token: str):
        self.token = token

    async def fetch_global_beginner_issues(
        self,
        language: str = "python",
        limit: int = 5
    ) -> List[Dict]:

        headers = {
            "Authorization": f"Bearer {self.token}",
            "Accept": "application/vnd.github+json"
        }

        query = f'label:"good first issue" language:{language} state:open'
        url = f"{GITHUB_API_BASE}/search/issues?q={query}&per_page={limit}"

        async with httpx.AsyncClient() as client:
            response = await client.get(url, headers=headers)

        if response.status_code != 200:
            print("GitHub search failed:", response.text)
            return []

        data = response.json()

        items = data.get("items", [])

        results = []

        for issue in items:
            results.append({
                "number": issue["number"],
                "title": issue["title"],
                "url": issue["html_url"],
                "repo": issue["repository_url"].split("/")[-1]
            })

        return results
```
🛠️ Refactor suggestion | 🟠 Major
Duplicate IssueSuggestionService with divergent APIs — consolidate into one.
There are now two IssueSuggestionService classes:
- `backend/services/github/issue_suggestion_service.py` (this file) — `fetch_global_beginner_issues(language, limit)`
- `backend/app/agents/devrel/github/services/issue_suggestion_service.py` — `fetch_global_beginner_issues(user_query, limit)`
They have different signatures and different filtering logic. github_toolkit.py imports from the second path. Consider consolidating into a single service to avoid confusion and drift.
Also, print() on Line 29 should be replaced with logger for consistency with the rest of the codebase.
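A minimal sketch of that logging swap, assuming the codebase's usual `logging.getLogger(__name__)` pattern; the helper name here is hypothetical:

```python
import logging

import httpx

logger = logging.getLogger(__name__)


def log_search_failure(response: httpx.Response) -> None:
    # Replaces: print("GitHub search failed:", response.text)
    logger.error(
        "GitHub search failed: status=%s body=%s",
        response.status_code,
        response.text,
    )
```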
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/services/github/issue_suggestion_service.py` around lines 1 - 46,
This file defines an
IssueSuggestionService.fetch_global_beginner_issues(language, limit) that
conflicts with the other service variant
fetch_global_beginner_issues(user_query, limit); consolidate into a single
service API (pick one canonical signature — e.g., accept user_query and optional
language) and merge filtering logic so both callers (including github_toolkit
import) use the same class/method, updating import sites as needed; also replace
the print("GitHub search failed:", response.text) in IssueSuggestionService with
the project logger (use the existing logger instance or create one) and log the
response status and body for diagnostics.
```python
async def fetch_global_beginner_issues(
    self,
    language: str = "python",
    limit: int = 5
) -> List[Dict]:
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify the method name mismatch between the service and its call site
echo "=== Methods defined in this service ==="
rg -n 'def fetch' backend/services/github/issue_suggestion_service.py
echo ""
echo "=== Call sites referencing issue_service.fetch ==="
rg -n 'issue_service\.fetch' backend/routes.py
echo ""
echo "=== Call sites in v1/github.py ==="
rg -n 'issue_service\.fetch' backend/app/api/v1/github.py
```

Repository: AOSSIE-Org/Devr.AI
Length of output: 363
Method name mismatch causes AttributeError at runtime.
The service defines fetch_global_beginner_issues() but both backend/routes.py (line 111) and backend/app/api/v1/github.py (line 19) call issue_service.fetch_beginner_issues(), which does not exist. This will raise an AttributeError when either call site executes. Rename the method to fetch_beginner_issues() or update all call sites to use fetch_global_beginner_issues().
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/services/github/issue_suggestion_service.py` around lines 11 - 15,
The method is named fetch_global_beginner_issues but callers invoke
issue_service.fetch_beginner_issues(), causing an AttributeError; fix by either
renaming the service method fetch_global_beginner_issues to
fetch_beginner_issues or updating all call sites that call
issue_service.fetch_beginner_issues() to call fetch_global_beginner_issues()
instead, and ensure the function signature (language: str = "python", limit: int
= 5) and return type List[Dict] remain unchanged so no other call expectations
break.
```python
query = f'label:"good first issue" language:{language} state:open'
url = f"{GITHUB_API_BASE}/search/issues?q={query}&per_page={limit}"
```
Query string is not URL-encoded — special characters will break the GitHub API call.
The language parameter is interpolated directly into the URL without encoding. If it contains spaces or special characters, the request will either fail or return unexpected results. Use urllib.parse.quote or pass the query as a params dict to httpx.
Proposed fix using httpx params

```diff
-    query = f'label:"good first issue" language:{language} state:open'
-    url = f"{GITHUB_API_BASE}/search/issues?q={query}&per_page={limit}"
-
-    async with httpx.AsyncClient() as client:
-        response = await client.get(url, headers=headers)
+    query = f'label:"good first issue" language:{language} state:open'
+    url = f"{GITHUB_API_BASE}/search/issues"
+
+    async with httpx.AsyncClient(timeout=15.0) as client:
+        response = await client.get(url, headers=headers, params={"q": query, "per_page": limit})
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-query = f'label:"good first issue" language:{language} state:open'
-url = f"{GITHUB_API_BASE}/search/issues?q={query}&per_page={limit}"
+query = f'label:"good first issue" language:{language} state:open'
+url = f"{GITHUB_API_BASE}/search/issues"
+
+async with httpx.AsyncClient(timeout=15.0) as client:
+    response = await client.get(url, headers=headers, params={"q": query, "per_page": limit})
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/services/github/issue_suggestion_service.py` around lines 22 - 23,
The constructed query string (variable query) is interpolating language into the
URL (url) without URL-encoding, which breaks the GitHub API for languages with
spaces/special characters; update the code in issue_suggestion_service.py to
either URL-encode the language (via urllib.parse.quote) before building query
or, better, stop manual string interpolation and pass the search as an httpx
params dict (e.g. params={'q': f'label:"good first issue" language:{language}
state:open', 'per_page': limit}) when calling the GitHub API base
(GITHUB_API_BASE) so httpx handles encoding for you and avoid embedding raw
unencoded values into url.
```python
async with httpx.AsyncClient() as client:
    response = await client.get(url, headers=headers)
```
No timeout on httpx.AsyncClient — calls to GitHub API can hang indefinitely.
If the GitHub API is slow or unresponsive, this will block the async event loop without bound. Add a timeout.
Proposed fix

```diff
-    async with httpx.AsyncClient() as client:
+    async with httpx.AsyncClient(timeout=15.0) as client:
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-async with httpx.AsyncClient() as client:
-    response = await client.get(url, headers=headers)
+async with httpx.AsyncClient(timeout=15.0) as client:
+    response = await client.get(url, headers=headers)
```
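If a single float is too coarse, httpx also accepts an `httpx.Timeout` object with per-phase limits; the values below are illustrative, not a recommendation from this review:

```python
import httpx

# Fail fast on connect; allow a longer window for reading the search response.
timeout = httpx.Timeout(connect=5.0, read=15.0, write=5.0, pool=5.0)

async def search_issues(url: str, headers: dict) -> httpx.Response:
    async with httpx.AsyncClient(timeout=timeout) as client:
        return await client.get(url, headers=headers)
```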
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/services/github/issue_suggestion_service.py` around lines 25 - 26,
The code opens httpx.AsyncClient() and calls client.get(url, headers=headers)
with no timeout; add a request timeout to avoid hanging the event loop by
passing a timeout value (or an httpx.Timeout object) either when constructing
httpx.AsyncClient(timeout=...) or on the client.get call (timeout=...) in the
function in issue_suggestion_service.py that performs the GitHub request; choose
a reasonable timeout (e.g., 5–30s) and ensure the call is wrapped so timeouts
raise and can be handled/logged appropriately.
```python
for issue in items:
    results.append({
        "number": issue["number"],
        "title": issue["title"],
        "url": issue["html_url"],
        "repo": issue["repository_url"].split("/")[-1]
    })
```
KeyError risk when accessing issue fields without .get().
Lines 40-43 use direct dictionary indexing (issue["number"], issue["title"], etc.) on GitHub API response items. If an item is missing an expected field, this will crash. Use .get() with defaults for resilience, especially for html_url and repository_url.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/services/github/issue_suggestion_service.py` around lines 38 - 44,
The loop building results from GitHub response items uses direct indexing
(issue["number"], issue["title"], issue["html_url"], issue["repository_url"])
which can raise KeyError; update the block that appends to results to use
issue.get(...) with sensible defaults (e.g., issue.get("number"),
issue.get("title", ""), issue.get("html_url", "") ) and derive repo safely by
reading repo_url = issue.get("repository_url") and then splitting only if
repo_url is truthy (fallback to an empty string or None). Ensure you update the
same symbols shown (items, issue, results, repository_url/html_url keys) so
missing fields don't crash the service.
Update: Phase 2 enhancements added. This PR now includes:

Ready for review. 🚀
Closes #258
📝 Description
This PR introduces a proactive contributor assistance layer to Devr.AI.
Before calling the LLM for every message, the system now performs lightweight pattern-based detection to identify common contributor intents such as greetings, onboarding questions, and issue discovery queries.
This improves responsiveness, reduces unnecessary LLM API calls, and provides a smoother onboarding experience for new contributors.
The implementation is modular and can be extended later with caching, fuzzy matching, or advanced intent detection.
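A minimal sketch of the fast-path flow described above — the intent names, canned replies, and `llm.generate` call are illustrative stand-ins, not the project's actual API:

```python
from typing import Optional

# Hypothetical canned replies keyed by detected intent.
CANNED_REPLIES = {
    "greeting": "Hey! How can I help you get started?",
    "issue_discovery": "Let me look up some good first issues for you...",
}

def detect_intent(message: str) -> Optional[str]:
    msg = message.lower().strip()
    if msg in ("hi", "hello", "hey"):
        return "greeting"
    if "good first issue" in msg:
        return "issue_discovery"
    return None  # no cheap match; defer to the LLM

async def respond(message: str, llm) -> str:
    intent = detect_intent(message)
    if intent is not None:
        return CANNED_REPLIES[intent]   # fast path: no LLM call
    return await llm.generate(message)  # fallback: full LLM triage
```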
🔧 Changes Made
- `_simple_pattern_match()` in `classification_router.py`
- `discord/bot.py`

📷 Screenshots or Visual Changes (if applicable)
N/A – Backend and bot behavior enhancement (no UI changes)
🤝 Collaboration
Collaborated with: N/A
✅ Checklist
Summary by CodeRabbit
New Features
Improvements