Add C# language support #107

Merged: gkorland merged 4 commits into main from csharp on Feb 19, 2026
Conversation


@gkorland gkorland commented Feb 18, 2026

  • Add tree-sitter-c-sharp dependency to pyproject.toml
  • Create CSharpAnalyzer in api/analyzers/csharp/ supporting: classes, interfaces, enums, structs, methods, constructors
  • Register .cs extension in source_analyzer.py with LSP config
  • Add C# test source file and unit test
  • Update README to list C# as supported language

Summary by CodeRabbit

  • New Features

    • Added C# language support for comprehensive source analysis (types, members, inheritance, method usage).
  • Documentation

    • Updated displayed supported languages to list Python, Java, and C#.
  • Tests

    • Added unit tests exercising the C# analysis workflow and validating discovered entities and relationships.
  • Refactor

    • Improved symbol resolution to consistently return multiple resolved entities for more accurate cross-references.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@vercel
vercel bot commented Feb 18, 2026

The latest updates on your projects:

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| code-graph-backend | Ready | Preview, Comment | Feb 18, 2026 9:07pm |


coderabbitai bot commented Feb 18, 2026

📝 Walkthrough

Adds comprehensive C# support: new Tree-sitter–based CSharpAnalyzer, integrates .cs into source collection and language-server startup, adds tree-sitter-c-sharp dependency, and includes C# sample code plus tests to validate extracted entities and relationships.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Docs & Project Config: README.md, pyproject.toml | Updated supported-languages text (now "Python, Java, and C#") and added tree-sitter-c-sharp>=0.23.1,<0.24.0. |
| C# Analyzer Implementation: api/analyzers/csharp/analyzer.py | New CSharpAnalyzer class: Tree-sitter integration, query/capture helpers, entity label/name/docstring extraction, dependency handling (dotnet restore guard), symbol extraction (types, interfaces, bases, methods, parameters, calls), and resolve_type/resolve_method/resolve_symbol returning lists of Entity. |
| Analyzer Core API: api/analyzers/analyzer.py | Changed the AbstractAnalyzer.resolve_symbol signature and docstring to return list[Entity] instead of a single Entity. |
| Language Integrations: api/analyzers/source_analyzer.py | Registered the .cs analyzer, included *.cs in source collection, and conditionally initialize/start a C# language server (MultilspyConfig "csharp") alongside the other LS instances. |
| Java/Python Analyzer Adjustments: api/analyzers/java/analyzer.py, api/analyzers/python/analyzer.py | Updated resolve_symbol signatures to return list[Entity] and adjusted implementations to consistently return lists. |
| Tests & Samples: tests/source_files/csharp/Program.cs, tests/test_csharp_analyzer.py | Added sample C# source (ILogger, ConsoleLogger, Task) and unit tests asserting presence of interface/class/method/constructor nodes and file→defines relationships in the analysis graph. |
| Misc (startup/test infra): ... | Second-pass language-server startup and analyzer registry updated to include .cs; test assets added to exercise the new analyzer. |
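The resolve_symbol contract change (a single Entity becomes list[Entity]) can be sketched as follows. Entity's fields and the DemoAnalyzer class are illustrative stand-ins for the project's actual definitions in api/analyzers/ and api/entities/:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Entity:
    """Stand-in for the project's Entity class."""
    name: str
    kind: str

class AbstractAnalyzer(ABC):
    @abstractmethod
    def resolve_symbol(self, symbol: str) -> list[Entity]:
        """Return every entity the symbol may resolve to (possibly empty)."""

class DemoAnalyzer(AbstractAnalyzer):
    """Hypothetical analyzer backed by a simple name index."""
    def __init__(self, index: dict[str, list[Entity]]):
        self.index = index

    def resolve_symbol(self, symbol: str) -> list[Entity]:
        # Returning a list (never None) lets callers handle ambiguous
        # symbols uniformly with a plain for-loop over the results.
        return self.index.get(symbol, [])

index = {"Logger": [Entity("ILogger", "interface"), Entity("ConsoleLogger", "class")]}
analyzer = DemoAnalyzer(index)
print(len(analyzer.resolve_symbol("Logger")))  # 2
print(analyzer.resolve_symbol("Missing"))      # []
```

Always returning a list keeps cross-reference creation code free of None checks and naturally accommodates symbols with several plausible targets.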

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant SourceAnalyzer as SourceAnalyzer
    participant CSharpAnalyzer as CSharpAnalyzer
    participant TreeSitter as TreeSitter
    participant LanguageServer as LanguageServer
    participant Graph as Graph

    User->>SourceAnalyzer: analyze_sources(path)
    SourceAnalyzer->>SourceAnalyzer: collect *.cs files
    SourceAnalyzer->>LanguageServer: init/start C# LS (MultilspyConfig)
    SourceAnalyzer->>CSharpAnalyzer: analyze(file)
    CSharpAnalyzer->>TreeSitter: parse file -> AST
    TreeSitter-->>CSharpAnalyzer: AST nodes
    CSharpAnalyzer->>CSharpAnalyzer: extract entities (types, methods, docs)
    CSharpAnalyzer->>LanguageServer: resolve_type / resolve_method
    LanguageServer-->>CSharpAnalyzer: resolved symbols
    CSharpAnalyzer->>SourceAnalyzer: enriched entities
    SourceAnalyzer->>Graph: create nodes & relationships
    Graph-->>User: analysis results available

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped in code where C# grows,
Trees of syntax, dotted rows.
Servers wake and join the chase,
I nibble lines with joyful pace—
carrots and tests left in my trace. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 15.38%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'Add C# language support' is concise and clearly summarizes the main change: adding support for C# as a new supported language alongside Python and Java. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


        else:
            lsps[".py"] = NullLanguageServer()
        with lsps[".java"].start_server(), lsps[".py"].start_server():
            if any(path.rglob('*.cs')):
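The NullLanguageServer in the snippet above appears to be a null-object stand-in so the with statement works whether or not a real language server is needed. A minimal sketch of that pattern; only the class name comes from the diff, the body is an assumption:

```python
from contextlib import nullcontext

class NullLanguageServer:
    """Null-object stand-in for a language server: satisfies the
    start_server() contract without launching any process."""
    def start_server(self):
        # nullcontext() is a no-op context manager, so callers can write
        # `with lsp.start_server():` for real and null servers alike.
        return nullcontext()

lsp = NullLanguageServer()
with lsp.start_server():
    pass  # analysis proceeds; nothing was started or torn down
```

The null-object keeps the second-pass startup code free of per-extension None checks.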

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression (High)

This path depends on a user-provided value (two data-flow paths flagged).

Copilot Autofix

AI 11 days ago

In general, the fix is to constrain and validate user‑supplied paths before using them to access the filesystem. A standard pattern is to define a trusted base directory (root for all analyses), normalize the user path with Path.resolve() (or os.path.realpath) and then check that the resulting absolute path is inside this base directory. If the check fails, reject the request. This prevents directory traversal and arbitrary path access while preserving existing functionality for legitimate subdirectories.

For this codebase, the best minimal fix (without changing the core analyzer logic) is:

  1. In the HTTP handlers where a path is received (e.g., analyze_folder in tests/index.py and any similar route in api/index.py if they exist), normalize the path to an absolute Path, and ensure it lies within a configured analysis root directory. The root could be taken from an environment variable such as CODE_GRAPH_ROOT or default to the current working directory if not set.

  2. After validation, pass only the resolved, validated path string to SourceAnalyzer.analyze_local_folder. That way, inside SourceAnalyzer, all further use of path ultimately derives from a vetted path.

  3. Optionally, add a small helper function in the entrypoint file (tests/index.py) to perform this validation, re‑using it for any other endpoints that may accept paths.

We do not need to change api/analyzers/source_analyzer.py itself, because it already calls path.resolve() in analyze_sources, and the primary issue is that its initial path argument may refer to an arbitrary location; constraining that at the boundary (the HTTP layer) is sufficient.

Concretely, in tests/index.py around the analyze_folder endpoint, we will:

  • Import Path from pathlib is already present.

  • Import os is already present.

  • Add logic after reading path from the request:

    • Get a base directory from an environment variable, e.g. ANALYSIS_ROOT = os.environ.get("CODE_GRAPH_ROOT", os.getcwd()) near the top of the file.

    • Resolve both the base and the requested path: base_path = Path(ANALYSIS_ROOT).resolve(), requested_path = Path(path).resolve().

    • Check requested_path.is_dir() instead of os.path.isdir(path).

    • Ensure requested_path is under base_path, e.g.:

          try:
              requested_path.relative_to(base_path)
          except ValueError:
              ...  # not under base; reject

    • If any of these checks fail, return a 400 error.

  • Pass str(requested_path) (or requested_path converted to string) into analyzer.analyze_local_folder.

No new third‑party dependencies are required; we only use Python’s standard library (os, pathlib).
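The suggested steps reduce to one containment check. A minimal stdlib-only sketch; validate_path is an illustrative helper name, and CODE_GRAPH_ROOT is the environment variable proposed above:

```python
import os
from pathlib import Path
from typing import Optional

def validate_path(path: str, root: Optional[str] = None) -> Path:
    """Resolve `path` and verify it stays inside the allowed root,
    raising ValueError otherwise (callers map that to an HTTP 400)."""
    base = Path(root or os.environ.get("CODE_GRAPH_ROOT", os.getcwd())).resolve()
    requested = Path(path).resolve()
    try:
        requested.relative_to(base)  # raises ValueError when outside base
    except ValueError:
        raise ValueError(f"{requested} escapes the allowed root {base}")
    return requested
```

Under this sketch, analyze_folder would call validate_path on the request's path before touching the filesystem and pass str(...) of the result on to the analyzer.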


Suggested changeset 1: tests/index.py (outside changed files)

Autofix patch. Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/tests/index.py b/tests/index.py
--- a/tests/index.py
+++ b/tests/index.py
@@ -17,6 +17,9 @@
                     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
 logger = logging.getLogger(__name__)
 
+# Base directory for analyzing local source code; defaults to current working directory.
+ANALYSIS_ROOT = Path(os.environ.get("CODE_GRAPH_ROOT", os.getcwd())).resolve()
+
 # Function to verify the token
 SECRET_TOKEN = os.getenv('SECRET_TOKEN')
 def verify_token(token):
@@ -346,24 +349,37 @@
             logging.error("'path' is missing from the request.")
             return jsonify({"status": "'path' is required."}), 400
 
+        try:
+            requested_path = Path(path).resolve()
+        except (OSError, RuntimeError) as e:
+            logging.error(f"Failed to resolve path '{path}': {e}")
+            return jsonify({"status": "Invalid path: could not be resolved"}), 400
+
         # Validate path exists and is a directory
-        if not os.path.isdir(path):
-            logging.error(f"Path '{path}' does not exist or is not a directory")
+        if not requested_path.is_dir():
+            logging.error(f"Path '{requested_path}' does not exist or is not a directory")
             return jsonify({"status": "Invalid path: must be an existing directory"}), 400
 
+        # Ensure the requested path is within the configured analysis root
+        try:
+            requested_path.relative_to(ANALYSIS_ROOT)
+        except ValueError:
+            logging.error(f"Path '{requested_path}' is outside of the allowed analysis root '{ANALYSIS_ROOT}'")
+            return jsonify({"status": "Invalid path: outside of allowed analysis root"}), 400
+
         # Validate ignore list contains valid paths
         if not isinstance(ignore, list):
             logging.error("'ignore' must be a list of paths")
             return jsonify({"status": "'ignore' must be a list of paths"}), 400
 
-        proj_name = Path(path).name
+        proj_name = requested_path.name
 
         # Initialize the graph with the provided project name
         g = Graph(proj_name)
 
         # Analyze source code within given folder
         analyzer = SourceAnalyzer()
-        analyzer.analyze_local_folder(path, g, ignore)
+        analyzer.analyze_local_folder(str(requested_path), g, ignore)
 
         # Return response
         response = {
EOF
Copilot is powered by AI and may make mistakes. Always verify output.

 def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
-    files = list(path.rglob("*.java")) + list(path.rglob("*.py"))
+    files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))
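For context, the extended collection step amounts to gathering three suffixes under one root. A hedged sketch; collect_sources and the ignore-by-directory-name rule are illustrative, not the project's exact logic:

```python
import tempfile
from pathlib import Path

def collect_sources(root: Path, ignore: list[str]) -> list[Path]:
    """Gather Java, Python, and C# files under root, skipping any
    path that passes through an ignored directory name."""
    exts = {".java", ".py", ".cs"}
    return sorted(
        p for p in root.rglob("*")
        if p.suffix in exts and not any(part in ignore for part in p.parts)
    )

# Tiny demo tree to exercise the helper
root = Path(tempfile.mkdtemp())
(root / "src").mkdir()
(root / "src" / "Program.cs").write_text("class P {}")
(root / "src" / "app.py").write_text("pass")
(root / "bin").mkdir()
(root / "bin" / "Generated.cs").write_text("// skipped via ignore")

found = collect_sources(root, ignore=["bin"])
print([p.name for p in found])
```

A single rglob pass with a suffix set avoids walking the tree once per language as the supported-extension list grows.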

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression (High)

This path depends on a user-provided value (two data-flow paths flagged).

Copilot Autofix

AI 11 days ago

In general, the fix is to constrain user-controlled paths to a trusted root and validate them before using them to access the filesystem. The common pattern: define a base directory (e.g., a workspace root or repositories directory), normalize both the base and the requested path, ensure the requested path is within the base (e.g., via Path.is_relative_to or an equivalent), and only then perform operations such as rglob. If validation fails, the code should raise an error or return an HTTP 400/403.

In this codebase, the minimal, self-contained fix is to harden SourceAnalyzer.analyze_local_folder, which is the point where untrusted path: str enters the analyzer domain. We can normalize the user path with Path(path).resolve(), and then check that it is inside a configured allowed root directory. Since we must not change imports outside the snippet and should avoid assuming additional configuration wiring, a pragmatic approach is:

  • Define an allowed base directory from an environment variable (e.g., CODE_GRAPH_ALLOWED_ROOT), defaulting to the current working directory (Path.cwd()), which is a safe and reasonable default.
  • Resolve both the base directory and the requested directory.
  • Check that resolved_path is under base_dir using Path.is_relative_to when available, or a relative_to try/except fallback for older Python versions.
  • If the check fails, log an error and raise ValueError (which the caller can surface as an error response).
  • Pass the validated resolved_path into analyze_sources instead of the raw path string.

This keeps existing functionality intact for callers that already provide paths within the working directory (or configured root), addresses all CodeQL variants that involve path.rglob, and centralizes path validation in one method.

Concretely, in api/analyzers/source_analyzer.py:

  • Add an import os (a well-known standard library) to read an environment variable.
  • In analyze_local_folder, before logging and calling analyze_sources, compute base_dir = Path(os.environ.get("CODE_GRAPH_ALLOWED_ROOT", Path.cwd())), resolved_base = base_dir.resolve(), resolved_path = Path(path).resolve(), and enforce that resolved_path is within resolved_base. If not, log and raise.
  • Call self.analyze_sources(resolved_path, ignore, g) instead of Path(path).

No changes are needed in analyze_sources itself, since it already calls path.resolve(); the new validation guarantees that path there is safe.


Suggested changeset 1: api/analyzers/source_analyzer.py

Autofix patch. Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/api/analyzers/source_analyzer.py b/api/analyzers/source_analyzer.py
--- a/api/analyzers/source_analyzer.py
+++ b/api/analyzers/source_analyzer.py
@@ -1,6 +1,7 @@
 from contextlib import nullcontext
 from pathlib import Path
 from typing import Optional
+import os
 
 from api.entities.entity import Entity
 from api.entities.file import File
@@ -190,10 +191,38 @@
             ignore (List(str)): List of paths to skip
         """
 
-        logging.info(f"Analyzing local folder {path}")
+        # Determine the allowed root directory for analysis.
+        # Defaults to the current working directory if CODE_GRAPH_ALLOWED_ROOT is not set.
+        allowed_root_env = os.environ.get("CODE_GRAPH_ALLOWED_ROOT")
+        if allowed_root_env:
+            base_dir = Path(allowed_root_env)
+        else:
+            base_dir = Path.cwd()
 
+        resolved_base = base_dir.resolve()
+        resolved_path = Path(path).resolve()
+
+        # Ensure that the requested path is within the allowed root directory.
+        try:
+            # Python 3.9+: use is_relative_to if available
+            is_relative = getattr(resolved_path, "is_relative_to", None)
+            if callable(is_relative):
+                allowed = resolved_path.is_relative_to(resolved_base)
+            else:
+                # Fallback for older Python: use relative_to in a try/except.
+                resolved_path.relative_to(resolved_base)
+                allowed = True
+        except ValueError:
+            allowed = False
+
+        if not allowed:
+            logging.error(f"Requested path '{resolved_path}' is outside the allowed root '{resolved_base}'")
+            raise ValueError("Requested path is outside the allowed analysis root")
+
+        logging.info(f"Analyzing local folder {resolved_path}")
+
         # Analyze source files
-        self.analyze_sources(Path(path), ignore, g)
+        self.analyze_sources(resolved_path, ignore, g)
 
         logging.info("Done analyzing path")
 
EOF

 def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
-    files = list(path.rglob("*.java")) + list(path.rglob("*.py"))
+    files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression (High)

This path depends on a user-provided value (two data-flow paths flagged).

Copilot Autofix

AI 11 days ago

General approach: treat the incoming path as untrusted and restrict directory traversal to a configured “safe root” on the server. Normalize the user-supplied path relative to that root, ensure it stays inside the root after normalization, and then operate only on the safe resolved path. This mitigates directory traversal and arbitrary filesystem access while preserving functionality for intended directories within the allowed root.

Concrete fix in this codebase:

  1. In SourceAnalyzer.analyze_local_folder, convert the incoming path string to a Path, resolve it, and ensure it is within an allowed base directory. To avoid assumptions about the environment, a conservative default is to use the current working directory (Path.cwd()) as the root for allowed analysis (you can later tighten this by making it configurable). Use Path.resolve() and Path.is_relative_to() (Python 3.9+) to perform this check safely.
  2. If the provided directory is outside the allowed root, log an error and either raise an exception or return early. Since this is library‑style code used from the Flask handlers, raising a ValueError is reasonable; the caller can catch it and turn it into an HTTP error if desired.
  3. Use the validated, resolved path when calling analyze_sources instead of the raw string.

These changes are all in api/analyzers/source_analyzer.py, around the analyze_local_folder method, plus a small helper to support Python versions without Path.is_relative_to (if needed). We do not need to modify imports; we already have Path and logging.
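The compatibility check described above fits in a small helper. is_within is an illustrative name; Path.is_relative_to only exists on Python 3.9+, hence the fallback:

```python
from pathlib import Path

def is_within(child: Path, root: Path) -> bool:
    """True when `child` resolves to a location inside `root`."""
    child, root = child.resolve(), root.resolve()
    try:
        return child.is_relative_to(root)   # Python 3.9+
    except AttributeError:
        try:                                # fallback for older Pythons
            child.relative_to(root)
            return True
        except ValueError:
            return False
```

Returning a plain bool keeps the caller's control flow simple: one guard clause instead of scattered try/except blocks.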


Suggested changeset 1: api/analyzers/source_analyzer.py

Autofix patch. Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/api/analyzers/source_analyzer.py b/api/analyzers/source_analyzer.py
--- a/api/analyzers/source_analyzer.py
+++ b/api/analyzers/source_analyzer.py
@@ -190,11 +190,33 @@
             ignore (List(str)): List of paths to skip
         """
 
-        logging.info(f"Analyzing local folder {path}")
+        # Define a safe root directory for analysis (current working directory by default)
+        safe_root = Path.cwd().resolve()
 
-        # Analyze source files
-        self.analyze_sources(Path(path), ignore, g)
+        # Resolve the user-provided path
+        requested_path = Path(path).resolve()
 
+        # Ensure the requested path is within the safe root directory
+        try:
+            # Python 3.9+ provides Path.is_relative_to
+            is_within_root = requested_path.is_relative_to(safe_root)
+        except AttributeError:
+            # Fallback for Python versions without is_relative_to
+            try:
+                requested_path.relative_to(safe_root)
+                is_within_root = True
+            except ValueError:
+                is_within_root = False
+
+        if not is_within_root:
+            logging.error(f"Refusing to analyze path outside of safe root. Requested: {requested_path}, Safe root: {safe_root}")
+            raise ValueError("Invalid path: outside of allowed analysis root")
+
+        logging.info(f"Analyzing local folder {requested_path}")
+
+        # Analyze source files using the validated, resolved path
+        self.analyze_sources(requested_path, ignore, g)
+
         logging.info("Done analyzing path")
 
     def analyze_local_repository(self, path: str, ignore: Optional[list[str]] = None) -> Graph:
EOF

 def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
-    files = list(path.rglob("*.java")) + list(path.rglob("*.py"))
+    files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression (High)

This path depends on a user-provided value (two data-flow paths flagged).

Copilot Autofix

AI 11 days ago

At a high level, the fix is to constrain and validate any user-provided filesystem path before using it with Path.resolve() or Path.rglob, by ensuring it is contained within a configured safe root directory. This typically involves: (1) defining a base directory (e.g., from configuration or environment), (2) resolving both the base and the requested path, and (3) verifying that the resolved requested path is a descendant of the base. If the check fails, the function should abort with an error instead of proceeding.

In this codebase, the main data-flow issue occurs where SourceAnalyzer.analyze_local_folder accepts an arbitrary path: str and passes Path(path) into analyze_sources, which calls path.resolve() and uses rglob. To fix this without changing existing functionality too much, we can normalize and validate the path inside analyze_local_folder before any filesystem operations: compute safe_root = Path.cwd().resolve() (or a more appropriate root if available) and target_path = Path(path).resolve(), then check that safe_root is a parent of target_path (e.g., target_path.is_relative_to(safe_root) in Python 3.9+, or a prefix comparison using relative_to). If the check fails, log an error and raise an exception, causing the caller (Flask endpoint) to return an error response instead of walking that path. We should then pass target_path (already resolved) into analyze_sources, which still calls path.resolve() but now on a value we’ve validated. Similarly, analyze_local_repository should validate its path argument in the same way before calling analyze_local_folder and Repository(path). Since we must not change imports beyond standard library, we’ll only rely on pathlib.Path and logging, both already imported.

Concretely:

  • In SourceAnalyzer.analyze_local_folder, replace the direct use of Path(path) with:
    • Resolve and validate path against a safe root (here we’ll use the current working directory as the base; that’s a safe default without additional project context).
    • Log and raise ValueError if path is outside this root.
    • Pass the validated Path to analyze_sources.
  • In SourceAnalyzer.analyze_local_repository, perform the same resolution/validation before calling analyze_local_folder and before constructing Repository(path). Use the resolved safe path string when opening the repository.
  • No new helper methods or imports are strictly necessary; we can implement the logic inline using Path.resolve and Path.relative_to with a try/except to simulate is_relative_to for broader compatibility.

This confines all subsequent filesystem traversals and repository access to a single safe root, addressing all alert variants that go through analyze_local_folder/analyze_sources and analyze_local_repository.


Suggested changeset 1: api/analyzers/source_analyzer.py

Autofix patch. Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/api/analyzers/source_analyzer.py b/api/analyzers/source_analyzer.py
--- a/api/analyzers/source_analyzer.py
+++ b/api/analyzers/source_analyzer.py
@@ -192,9 +192,19 @@
 
         logging.info(f"Analyzing local folder {path}")
 
-        # Analyze source files
-        self.analyze_sources(Path(path), ignore, g)
+        # Normalize and validate the provided path against a safe root (current working directory)
+        safe_root = Path.cwd().resolve()
+        target_path = Path(path).resolve()
+        try:
+            # This will raise a ValueError if target_path is not within safe_root
+            target_path.relative_to(safe_root)
+        except ValueError:
+            logging.error(f"Refusing to analyze path outside of safe root. safe_root={safe_root}, requested={target_path}")
+            raise ValueError("Invalid path: must be within the server's working directory")
 
+        # Analyze source files using the validated, normalized path
+        self.analyze_sources(target_path, ignore, g)
+
         logging.info("Done analyzing path")
 
     def analyze_local_repository(self, path: str, ignore: Optional[list[str]] = None) -> Graph:
@@ -210,10 +219,19 @@
         """
         from pygit2.repository import Repository
 
-        self.analyze_local_folder(path, ignore)
+        # Normalize and validate repository path against the same safe root
+        safe_root = Path.cwd().resolve()
+        repo_path = Path(path).resolve()
+        try:
+            repo_path.relative_to(safe_root)
+        except ValueError:
+            logging.error(f"Refusing to analyze repository outside of safe root. safe_root={safe_root}, requested={repo_path}")
+            raise ValueError("Invalid repository path: must be within the server's working directory")
 
+        self.analyze_local_folder(str(repo_path), ignore)
+
         # Save processed commit hash to the DB
-        repo = Repository(path)
+        repo = Repository(str(repo_path))
         head = repo.commit("HEAD")
         self.graph.set_graph_commit(head.short_id)
 
EOF
- Remove unused 'import os' from csharp/analyzer.py
- Replace 'from ...entities import *' with explicit imports
- Remove unused 'from pathlib import Path' from test_csharp_analyzer.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (3)
api/analyzers/csharp/analyzer.py (1)

114-122: else branch in resolve_method is a no-op — dead code.

When func_node is neither member_access_expression nor identifier, the else branch reassigns func_node to node.child_by_field_name('function') — the exact value func_node was just set to on line 114. The assignment is redundant and the branch can be removed.

♻️ Proposed fix
         func_node = node.child_by_field_name('function')
         if func_node and func_node.type == 'member_access_expression':
             func_node = func_node.child_by_field_name('name')
         elif func_node and func_node.type == 'identifier':
             pass
-        else:
-            func_node = node.child_by_field_name('function')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/csharp/analyzer.py` around lines 114 - 122, In resolve_method,
the else branch that reassigns func_node to node.child_by_field_name('function')
is redundant because func_node was already set to that; remove that else branch
and its assignment so the logic becomes: get func_node =
node.child_by_field_name('function'), if its type is 'member_access_expression'
replace it with its child 'name', elif it's 'identifier' leave it, otherwise do
nothing; retain the subsequent "if func_node: node = func_node" behavior
(references: resolve_method, func_node, node.child_by_field_name('function'),
member_access_expression, identifier).
tests/test_csharp_analyzer.py (2)

52-60: Consider asserting the IMPLEMENTS relationship between ConsoleLogger and ILogger.

The test covers DEFINES and entity detection but omits verifying the core IMPLEMENTS/EXTENDS relationship resolution — the primary value of add_symbols and resolve_symbol. A CI-green test that doesn't cover this could mask LSP resolution failures silently.

✅ Suggested addition
# Verify ConsoleLogger implements ILogger
q = "MATCH (c:Class {name: 'ConsoleLogger'})-[:IMPLEMENTS]->(i:Interface {name: 'ILogger'}) RETURN c, i LIMIT 1"
res = g._query(q).result_set
self.assertEqual(len(res), 1, "ConsoleLogger should implement ILogger")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_csharp_analyzer.py` around lines 52 - 60, Add an assertion that
verifies the IMPLEMENTS relationship between ConsoleLogger and ILogger: run a
Cypher query similar to the existing checks (e.g., use g._query with "MATCH
(c:Class {name: 'ConsoleLogger'})-[:IMPLEMENTS]->(i:Interface {name: 'ILogger'})
RETURN c, i LIMIT 1") and assert the query returns the expected row (e.g.,
assert len(result_set) == 1 or that result_set[0] exists) so the test ensures
resolve_symbol/add_symbols correctly records the implements relationship for
ConsoleLogger -> ILogger.

8-24: Add teardown to clean up the "csharp" graph after the test.

Graph("csharp") is created but never cleaned up. If the underlying FalkorDB instance persists data across runs, a subsequent test execution will find pre-populated entities and the assertions may pass vacuously even if the analyzer is broken.

♻️ Suggested fix
 class Test_CSharp_Analyzer(unittest.TestCase):
+    def tearDown(self):
+        try:
+            Graph("csharp").delete()
+        except Exception:
+            pass
+
     def test_analyzer(self):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_csharp_analyzer.py` around lines 8 - 24, Add a tearDown to
Test_CSharp_Analyzer that cleans up the "csharp" graph after the test: implement
def tearDown(self): g = Graph("csharp"); call the Graph cleanup method provided
by your Graph API (e.g., g.delete(), g.clear(), or g.delete_graph()) to remove
all entities so the graph is empty between test runs; ensure the cleanup runs
regardless of test outcome so analyzer.analyze_local_folder(path, g) does not
leave persistent state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/analyzers/csharp/analyzer.py`:
- Line 6: Replace the wildcard import in analyzer.py with explicit imports of
the entities you actually use (e.g., Entity, File, Symbol, Reference, Location —
list the exact class/variable names referenced in this module) to resolve Ruff
F403/F405; update the top import line from "from ...entities import *" to "from
...entities import Entity, File, Symbol, ..." matching the symbols used by
functions/methods in this file (refer to usages in analyzer.py such as any
constructor calls or type hints that reference Entity/File/Symbol) so static
analyzers can verify definitions.
- Line 50: Guard the call to node.child_by_field_name('name') before accessing
.text.decode('utf-8'): capture the result (e.g., name_node =
node.child_by_field_name('name')), return a safe default (None or empty string
consistent with the module) if name_node is None, otherwise decode
name_node.text.decode('utf-8'); update the site where the expression appears to
avoid AttributeError and keep behavior consistent with other analyzers.
- Around line 27-30: The current condition uses Path.glob(...) which returns
generators that are always truthy, causing subprocess.run(["dotnet", "restore"],
...) to run unconditionally; change the guard to actually test for matches (for
example use any(Path(path).glob("*.csproj")) or any(Path(path).glob("*.sln")) or
wrap the glob in list(...) and check truthiness) so dotnet restore only runs
when a .csproj or .sln is present; keep the existing temp_deps_cs directory
check and call to subprocess.run unchanged except for using the corrected
condition.
- Line 131: Update the return type annotation of resolve_symbol from "-> Entity"
to "-> list[Entity]" in api/analyzers/csharp/analyzer.py (the resolve_symbol
method) to match resolve_type and resolve_method and the usage in
Entity.resolved_symbol; also update the corresponding abstract declaration
AbstractAnalyzer.resolve_symbol to return list[Entity] so the signatures align
and mypy passes.

In `@api/analyzers/source_analyzer.py`:
- Around line 141-145: The rglob call uses the user-controlled variable path
directly, allowing path traversal; before any use of path.rglob (including the
C# block that creates SyncLanguageServer via SyncLanguageServer.create and
MultilspyConfig.from_dict), resolve and validate the incoming path in the entry
routines (e.g., analyze_local_folder/analyze_sources): convert to a Path and
call .resolve(), then ensure the resolved path is within an allowed base/root
(reject or normalize paths that escape the root, e.g., via ..), and only then
pass the safe resolved path to path.rglob and to SyncLanguageServer.create;
apply the same validation for the Java/Python blocks that also use path.rglob.

---

Duplicate comments:
In `@api/analyzers/source_analyzer.py`:
- Line 176: The rglob calls in analyze_sources (files =
list(path.rglob("*.java")) + list(path.rglob("*.py")) +
list(path.rglob("*.cs"))) use an unvalidated Path variable and trigger path
traversal findings; resolve and validate the incoming path once (e.g., call path
= path.resolve(strict=False) and ensure it is under an allowed root or repo root
/ whitelist) at the upstream site where path is created (the earlier code around
the path derivation) and then use the validated/resolved Path for the rglob
calls (resolved_path.rglob("*.java"), etc.) so all three rglob usages are fixed
together.

---

Nitpick comments:
In `@api/analyzers/csharp/analyzer.py`:
- Around line 114-122: In resolve_method, the else branch that reassigns
func_node to node.child_by_field_name('function') is redundant because func_node
was already set to that; remove that else branch and its assignment so the logic
becomes: get func_node = node.child_by_field_name('function'), if its type is
'member_access_expression' replace it with its child 'name', elif it's
'identifier' leave it, otherwise do nothing; retain the subsequent "if
func_node: node = func_node" behavior (references: resolve_method, func_node,
node.child_by_field_name('function'), member_access_expression, identifier).

In `@tests/test_csharp_analyzer.py`:
- Around line 52-60: Add an assertion that verifies the IMPLEMENTS relationship
between ConsoleLogger and ILogger: run a Cypher query similar to the existing
checks (e.g., use g._query with "MATCH (c:Class {name:
'ConsoleLogger'})-[:IMPLEMENTS]->(i:Interface {name: 'ILogger'}) RETURN c, i
LIMIT 1") and assert the query returns the expected row (e.g., assert
len(result_set) == 1 or that result_set[0] exists) so the test ensures
resolve_symbol/add_symbols correctly records the implements relationship for
ConsoleLogger -> ILogger.
- Around line 8-24: Add a tearDown to Test_CSharp_Analyzer that cleans up the
"csharp" graph after the test: implement def tearDown(self): g =
Graph("csharp"); call the Graph cleanup method provided by your Graph API (e.g.,
g.delete(), g.clear(), or g.delete_graph()) to remove all entities so the graph
is empty between test runs; ensure the cleanup runs regardless of test outcome
so analyzer.analyze_local_folder(path, g) does not leave persistent state.

from pathlib import Path

from multilspy import SyncLanguageServer
from ...entities import *

🛠️ Refactor suggestion | 🟠 Major

Replace star import with explicit imports.

from ...entities import * prevents static analysis from verifying that Entity, File, Symbol, etc. are actually defined — exactly what Ruff F403/F405 is reporting. Other analyzers in the project use the same pattern, but this is the right place to break the habit for a new file.

♻️ Proposed fix
-from ...entities import *
+from ...entities.entity import Entity
+from ...entities.file import File
🧰 Tools
🪛 Ruff (0.15.1)

[error] 6-6: from ...entities import * used; unable to detect undefined names

(F403)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/csharp/analyzer.py` at line 6, Replace the wildcard import in
analyzer.py with explicit imports of the entities you actually use (e.g.,
Entity, File, Symbol, Reference, Location — list the exact class/variable names
referenced in this module) to resolve Ruff F403/F405; update the top import line
from "from ...entities import *" to "from ...entities import Entity, File,
Symbol, ..." matching the symbols used by functions/methods in this file (refer
to usages in analyzer.py such as any constructor calls or type hints that
reference Entity/File/Symbol) so static analyzers can verify definitions.

res.append(file.entities[method_dec])
return res

def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, key: str, symbol: Node) -> Entity:

⚠️ Potential issue | 🟡 Minor

Return type annotation -> Entity is incorrect; should be -> list[Entity].

Both resolve_type and resolve_method return list[Entity], and Entity.resolved_symbol (in entity.py) iterates the result with for resolved_symbol in f(key, symbol.symbol) — confirming the list contract. The annotation mismatch will cause mypy errors and mislead callers. The same fix is needed in the abstract base class AbstractAnalyzer.resolve_symbol.

🐛 Proposed fix
-    def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, key: str, symbol: Node) -> Entity:
+    def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, key: str, symbol: Node) -> list[Entity]:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, key: str, symbol: Node) -> Entity:
def resolve_symbol(self, files: dict[Path, File], lsp: SyncLanguageServer, file_path: Path, path: Path, key: str, symbol: Node) -> list[Entity]:
🧰 Tools
🪛 Ruff (0.15.1)

[error] 131-131: File may be undefined, or defined from star imports

(F405)


[error] 131-131: Entity may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/csharp/analyzer.py` at line 131, Update the return type
annotation of resolve_symbol from "-> Entity" to "-> list[Entity]" in
api/analyzers/csharp/analyzer.py (the resolve_symbol method) to match
resolve_type and resolve_method and the usage in Entity.resolved_symbol; also
update the corresponding abstract declaration AbstractAnalyzer.resolve_symbol to
return list[Entity] so the signatures align and mypy passes.
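A stripped-down reproduction of the annotation mismatch; the class and function bodies below are illustrative stubs, not the project's code:

```python
from typing import List

class Entity:
    """Stand-in for the project's Entity class."""

def resolve_type(key: str) -> List[Entity]:
    # Resolvers return every matching entity, hence the list
    return [Entity(), Entity()]

def resolve_symbol(key: str) -> Entity:
    # Annotated as returning one Entity but actually forwarding a list;
    # mypy reports an incompatible return value here, and callers that
    # iterate the result only work because the annotation is wrong.
    return resolve_type(key)  # type: ignore[return-value]

resolved = resolve_symbol("ILogger")
print(type(resolved).__name__)  # list, despite the -> Entity annotation
```

Python does not enforce annotations at runtime, which is why the bug survives until a caller or a type checker trips over it.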

Comment on lines +141 to +145
if any(path.rglob('*.cs')):
config = MultilspyConfig.from_dict({"code_language": "csharp"})
lsps[".cs"] = SyncLanguageServer.create(config, logger, str(path))
else:
lsps[".cs"] = NullLanguageServer()

⚠️ Potential issue | 🟠 Major

CodeQL: path traversal via user-controlled path in rglob.

path flows from the API request body into path.rglob('*.cs') without sanitization. A crafted path (e.g., containing ../) could cause the scanner to traverse outside the intended directory. CodeQL flagged this as a security failure. The pre-existing Java/Python blocks at lines 131 and 136 carry the same risk, but this new block adds another instance.

Resolve and validate path against an allowed base before using it in any rglob call. For example:

🔒 Suggested mitigation pattern
# In analyze_local_folder or analyze_sources, resolve and validate path early:
resolved = Path(path).resolve()
# Optionally assert it's within an allowed root
🧰 Tools
🪛 GitHub Check: CodeQL

[failure] 141-141: Uncontrolled data used in path expression
This path depends on a user-provided value.
This path depends on a user-provided value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/source_analyzer.py` around lines 141 - 145, The rglob call uses
the user-controlled variable path directly, allowing path traversal; before any
use of path.rglob (including the C# block that creates SyncLanguageServer via
SyncLanguageServer.create and MultilspyConfig.from_dict), resolve and validate
the incoming path in the entry routines (e.g.,
analyze_local_folder/analyze_sources): convert to a Path and call .resolve(),
then ensure the resolved path is within an allowed base/root (reject or
normalize paths that escape the root, e.g., via ..), and only then pass the safe
resolved path to path.rglob and to SyncLanguageServer.create; apply the same
validation for the Java/Python blocks that also use path.rglob.
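A minimal, runnable sketch of that mitigation; the function and variable names are illustrative, not taken from the codebase:

```python
from pathlib import Path

def validate_path(user_path: str, safe_root: Path) -> Path:
    """Resolve a user-supplied path and reject anything outside safe_root."""
    resolved = Path(user_path).resolve()
    try:
        # relative_to raises ValueError when resolved is not under safe_root,
        # which catches traversal attempts like "repos/../../etc"
        resolved.relative_to(safe_root.resolve())
    except ValueError:
        raise ValueError(f"Path {resolved} escapes allowed root {safe_root}")
    return resolved
```

Only the returned, validated Path should ever reach rglob or SyncLanguageServer.create.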


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
api/analyzers/csharp/analyzer.py (1)

113-122: Dead else branch is a no-op — remove it.

The else block on line 119 re-assigns func_node = node.child_by_field_name('function'), which func_node already holds (set on line 114 and left unchanged by the pass on line 118). The branch is unreachable in any meaningful way.

♻️ Proposed fix
         func_node = node.child_by_field_name('function')
         if func_node and func_node.type == 'member_access_expression':
             func_node = func_node.child_by_field_name('name')
-        elif func_node and func_node.type == 'identifier':
-            pass
-        else:
-            func_node = node.child_by_field_name('function')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/csharp/analyzer.py` around lines 113 - 122, The code handling
invocation_expression has a dead else that reassigns func_node to
node.child_by_field_name('function') even though func_node was already set, so
remove the redundant else branch: in the block where node.type ==
'invocation_expression' keep the initial func_node =
node.child_by_field_name('function'), handle the member_access_expression case
by replacing func_node with its 'name' child and keep the identifier branch (or
no-op), then drop the final else clause entirely so func_node is not redundantly
reassigned before the if func_node: node = func_node assignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/analyzers/csharp/analyzer.py`:
- Around line 53-59: get_entity_docstring currently only returns
node.prev_sibling.text which captures only the last line of multi-line "///" XML
doc blocks; change get_entity_docstring to walk backwards from node.prev_sibling
while siblings are comment nodes and contiguous (e.g., loop using prev_sibling),
collect each comment.text.decode('utf-8') into a list, then reverse or prepend
so the original order is preserved and join with newlines to return the full
docstring; keep the existing type checks and the ValueError for unknown types,
and ensure it still returns None when no preceding comment siblings exist.
- Around line 66-77: The current heuristic in the class/struct handling loop in
analyzer.py labels the first base_list entry as "base_class" even when it might
be an interface; update the block around the loop (where base_list_captures is
iterated) to: add a clear inline comment documenting this known false-positive,
and before calling entity.add_symbol("base_class", base_type) call
self.resolve_type(base_type) (or equivalent resolver used in this analyzer) and
check the resolved node's kind (e.g., 'class_declaration' or
'struct_declaration') — only then add "base_class"; otherwise fall back to
add_symbol("implement_interface", base_type). Keep the existing first-flag
behavior but gate the base_class assignment on the resolved type to avoid
spurious base_class edges.
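The backwards walk described in the first inline comment (collecting contiguous `///` comment siblings) can be sketched with a stub node that mimics the tree-sitter `prev_sibling`/`type`/`text` attributes; the stub and helper names are illustrative only:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StubNode:
    """Mimics the tree-sitter Node attributes used by the analyzer."""
    type: str
    text: bytes
    prev_sibling: Optional["StubNode"] = None

def collect_doc_comment(node: StubNode) -> Optional[str]:
    # Walk backwards through contiguous comment siblings,
    # then reverse so the original line order is preserved
    lines = []
    sibling = node.prev_sibling
    while sibling is not None and sibling.type == "comment":
        lines.append(sibling.text.decode("utf-8"))
        sibling = sibling.prev_sibling
    return "\n".join(reversed(lines)) if lines else None

first = StubNode("comment", b"/// Writes log entries")
second = StubNode("comment", b"/// to the console.", prev_sibling=first)
decl = StubNode("class_declaration", b"class ConsoleLogger {}", prev_sibling=second)
print(collect_doc_comment(decl))  # both lines, in source order
```

The same loop works unchanged against real tree-sitter nodes, since only those three attributes are touched.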

---

Duplicate comments:
In `@api/analyzers/csharp/analyzer.py`:
- Around line 47-51: get_entity_name currently assumes
node.child_by_field_name('name') is non-null and calls .text, risking a None
dereference; update get_entity_name to fetch name_node =
node.child_by_field_name('name'), check if name_node is truthy before accessing
name_node.text.decode('utf-8'), and if it's None either raise a clearer
ValueError mentioning the parent node type and maybe node.start_point or return
a safe fallback (e.g., '') so callers of get_entity_name won't dereference None;
ensure the change touches only get_entity_name and preserves the existing node
type checks for 'class_declaration', 'interface_declaration',
'enum_declaration', 'struct_declaration', 'method_declaration', and
'constructor_declaration'.
- Line 131: The return type annotation for resolve_symbol is incorrect: change
the signature of resolve_symbol (in api/analyzers/csharp/analyzer.py) from "->
Entity" to "-> list[Entity]" so it matches resolve_type and resolve_method and
what Entity.resolved_symbol expects; update any import/type aliases if needed to
ensure list[Entity] is recognized by mypy.
- Around line 26-30: The guard in add_dependencies is ineffective and
Path.glob() is misused: change the early-return to check
path.joinpath("temp_deps_cs").is_dir(), only run dotnet restore when project
files exist by using any(path.glob("*.csproj")) or any(path.glob("*.sln"))
(instead of treating the generator as truthy), and after a successful
subprocess.run(..., cwd=str(path)) create the temp_deps_cs directory (e.g.,
path.joinpath("temp_deps_cs").mkdir()) so subsequent calls return early; adjust
references in add_dependencies to use the path parameter's methods (path.glob,
path.joinpath) rather than Path(f"{path}").

---

Nitpick comments:
In `@api/analyzers/csharp/analyzer.py`:
- Around line 113-122: The code handling invocation_expression has a dead else
that reassigns func_node to node.child_by_field_name('function') even though
func_node was already set, so remove the redundant else branch: in the block
where node.type == 'invocation_expression' keep the initial func_node =
node.child_by_field_name('function'), handle the member_access_expression case
by replacing func_node with its 'name' child and keep the identifier branch (or
no-op), then drop the final else clause entirely so func_node is not redundantly
reassigned before the if func_node: node = func_node assignment.

- Guard child_by_field_name('name') against None in get_entity_name
- Fix Path.glob() always-truthy bug in add_dependencies (use any())
- Fix resolve_symbol return type: -> list[Entity] in abstract class and all analyzers
- Remove redundant else branch in resolve_method
- Resolve path in analyze_sources to prevent traversal (path.resolve())
- Add setUp/tearDown to test class for graph cleanup
- Add IMPLEMENTS relationship assertion for ConsoleLogger -> ILogger

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

This pull request adds comprehensive C# language support to the code analysis tool, enabling it to parse and analyze C# source files alongside existing Python and Java support.

Changes:

  • Added tree-sitter-c-sharp dependency (v0.23.1) for C# parsing
  • Implemented CSharpAnalyzer supporting classes, interfaces, enums, structs, methods, and constructors with inheritance/implementation tracking
  • Registered .cs file extension with LSP integration for C# in the source analyzer
  • Added unit tests and test source file demonstrating C# analysis capabilities
  • Updated documentation to reflect C# as a supported language

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 2 comments.

Summary per file:

  • pyproject.toml: Added tree-sitter-c-sharp dependency version constraint
  • uv.lock: Auto-generated lock file updates for C# dependencies (tree-sitter-c-sharp, rdflib, pyparsing)
  • api/analyzers/csharp/__init__.py: Empty __init__.py for the C# analyzer package
  • api/analyzers/csharp/analyzer.py: Complete C# analyzer implementation with tree-sitter query support, entity extraction, and LSP integration
  • api/analyzers/source_analyzer.py: Registered CSharpAnalyzer for the .cs extension and added C# LSP configuration
  • tests/test_csharp_analyzer.py: Unit test validating C# entity detection and relationship tracking
  • tests/source_files/csharp/Program.cs: Test source file demonstrating C# features (interfaces, classes, constructors, methods, inheritance)
  • README.md: Updated the supported languages list to include C#


def add_dependencies(self, path: Path, files: list[Path]):
if Path(f"{path}/temp_deps_cs").is_dir():
return
if Path(f"{path}").glob("*.csproj") or Path(f"{path}").glob("*.sln"):

Copilot AI Feb 18, 2026


The condition Path(f"{path}").glob("*.csproj") or Path(f"{path}").glob("*.sln") always evaluates to a truthy generator object, even when there are no matching files. This should use any() to check if files exist: any(Path(path).glob("*.csproj")) or any(Path(path).glob("*.sln"))

Suggested change
if Path(f"{path}").glob("*.csproj") or Path(f"{path}").glob("*.sln"):
if any(Path(path).glob("*.csproj")) or any(Path(path).glob("*.sln")):
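The always-truthy behavior is easy to demonstrate in isolation; the temporary directory below is just for illustration:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    project = Path(d)
    # Path.glob returns a generator object, which is truthy even when
    # it will yield no matches at all
    assert bool(project.glob("*.csproj")) is True
    # any() actually consumes the generator and checks for a match
    assert any(project.glob("*.csproj")) is False
    # Once a project file exists, the any() check flips to True
    (project / "App.csproj").touch()
    assert any(project.glob("*.csproj")) is True
```

This is why the original guard would run dotnet restore on every folder, project files or not.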

Comment on lines 117 to 120
elif func_node and func_node.type == 'identifier':
pass
else:
func_node = node.child_by_field_name('function')

Copilot AI Feb 18, 2026


In the resolve_method function, when node.type is 'invocation_expression', the code extracts func_node and then has a branch that does nothing (lines 117-118: elif func_node and func_node.type == 'identifier': pass). This empty branch followed by an else clause that reassigns func_node to the same value (line 120) makes the logic confusing. Consider simplifying: if func_node.type == 'member_access_expression', extract the name field; otherwise use the function node as-is.

Suggested change
elif func_node and func_node.type == 'identifier':
pass
else:
func_node = node.child_by_field_name('function')


def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
files = list(path.rglob("*.java")) + list(path.rglob("*.py"))
path = path.resolve()

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression (High)

This path depends on a user-provided value.
This path depends on a user-provided value.

Copilot Autofix

AI 11 days ago

In general, to fix uncontrolled path usage you either (a) fully trust the caller and mark it as such (not applicable here), or (b) restrict paths to a known safe root / allow‑list and normalize before use. Here, the safest backward‑compatible fix is to constrain the analyzed folder to live under a configured root directory (for example, an environment variable like CODE_GRAPH_ROOT or, if unset, the current working directory). We’ll normalize both the root and the requested path using Path.resolve() and then ensure the requested path is inside the root via Path.is_relative_to (or equivalent). If it isn’t, we raise an error before any filesystem traversal.

Concretely:

  • In api/analyzers/source_analyzer.py, modify analyze_local_folder so that:
    • It converts the incoming path: str to a Path instance and resolves it.
    • It also determines a safe root directory (e.g., CODE_GRAPH_ROOT or Path.cwd()), resolves that, and checks that the requested path is within this root.
    • If the check fails or the path is not an existing directory, raise a ValueError (or similar) instead of analyzing.
    • Pass the already‑resolved Path into analyze_sources to avoid re‑resolving a tainted string.
  • Adjust analyze_sources so that it does not re-resolve an already normalized Path (this keeps behavior consistent and avoids double normalization of a string that is now validated earlier).
  • The Flask layer (tests/index.py) already validates os.path.isdir(path); we can keep that, relying on the deeper validation inside SourceAnalyzer. No changes are required there for the security fix.

This requires only standard library functionality (os, pathlib), so no extra project dependencies are needed.

Suggested changeset 1
api/analyzers/source_analyzer.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/api/analyzers/source_analyzer.py b/api/analyzers/source_analyzer.py
--- a/api/analyzers/source_analyzer.py
+++ b/api/analyzers/source_analyzer.py
@@ -17,6 +17,7 @@
 from multilspy.multilspy_logger import MultilspyLogger
 
 import logging
+import os
 # Configure logging
 logging.basicConfig(level=logging.DEBUG, format='%(filename)s - %(asctime)s - %(levelname)s - %(message)s')
 
@@ -173,7 +174,6 @@
         self.second_pass(graph, files, path)
 
     def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
-        path = path.resolve()
         files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))
         # First pass analysis of the source code
         self.first_pass(path, files, ignore, graph)
@@ -192,9 +192,33 @@
 
         logging.info(f"Analyzing local folder {path}")
 
-        # Analyze source files
-        self.analyze_sources(Path(path), ignore, g)
+        # Determine the allowed root directory for analysis.
+        # Prefer the CODE_GRAPH_ROOT environment variable if set; otherwise default to the current working directory.
+        root_env = os.environ.get("CODE_GRAPH_ROOT")
+        if root_env:
+            root = Path(root_env).resolve()
+        else:
+            root = Path.cwd().resolve()
 
+        # Resolve the requested path and ensure it is within the allowed root.
+        target_path = Path(path).resolve()
+
+        try:
+            # Python 3.9+ has Path.is_relative_to; fall back to a prefix check if unavailable.
+            is_within_root = target_path.is_relative_to(root)  # type: ignore[attr-defined]
+        except AttributeError:
+            # Fallback for Python versions without is_relative_to
+            is_within_root = str(target_path).startswith(str(root) + os.sep) or target_path == root
+
+        if not is_within_root:
+            raise ValueError(f"Path '{target_path}' is outside of the allowed root '{root}'")
+
+        if not target_path.is_dir():
+            raise ValueError(f"Path '{target_path}' does not exist or is not a directory")
+
+        # Analyze source files using the validated, resolved path
+        self.analyze_sources(target_path, ignore, g)
+
         logging.info("Done analyzing path")
 
     def analyze_local_repository(self, path: str, ignore: Optional[list[str]] = None) -> Graph:
EOF
@@ -17,6 +17,7 @@
from multilspy.multilspy_logger import MultilspyLogger

import logging
import os
# Configure logging
logging.basicConfig(level=logging.DEBUG, format='%(filename)s - %(asctime)s - %(levelname)s - %(message)s')

@@ -173,7 +174,6 @@
self.second_pass(graph, files, path)

def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None:
path = path.resolve()
files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs"))
# First pass analysis of the source code
self.first_pass(path, files, ignore, graph)
@@ -192,9 +192,33 @@

logging.info(f"Analyzing local folder {path}")

# Analyze source files
self.analyze_sources(Path(path), ignore, g)
# Determine the allowed root directory for analysis.
# Prefer the CODE_GRAPH_ROOT environment variable if set; otherwise default to the current working directory.
root_env = os.environ.get("CODE_GRAPH_ROOT")
if root_env:
root = Path(root_env).resolve()
else:
root = Path.cwd().resolve()

# Resolve the requested path and ensure it is within the allowed root.
target_path = Path(path).resolve()

try:
# Python 3.9+ has Path.is_relative_to; fall back to a prefix check if unavailable.
is_within_root = target_path.is_relative_to(root) # type: ignore[attr-defined]
except AttributeError:
# Fallback for Python versions without is_relative_to
is_within_root = str(target_path).startswith(str(root) + os.sep) or target_path == root

if not is_within_root:
raise ValueError(f"Path '{target_path}' is outside of the allowed root '{root}'")

if not target_path.is_dir():
raise ValueError(f"Path '{target_path}' does not exist or is not a directory")

# Analyze source files using the validated, resolved path
self.analyze_sources(target_path, ignore, g)

logging.info("Done analyzing path")

def analyze_local_repository(self, path: str, ignore: Optional[list[str]] = None) -> Graph:
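The containment check in the suggestion above can be sketched as a standalone helper. This is a minimal illustration of the same `resolve()` + `is_relative_to` pattern; `ensure_within_root` is an illustrative name, not the project's API:

```python
import os
import tempfile
from pathlib import Path

def ensure_within_root(target: Path, root: Path) -> Path:
    """Resolve `target` and reject anything escaping `root` (e.g. via ../ or symlinks)."""
    target = target.resolve()
    root = root.resolve()
    try:
        inside = target.is_relative_to(root)  # Python 3.9+
    except AttributeError:
        # Fallback for older interpreters without Path.is_relative_to
        inside = target == root or str(target).startswith(str(root) + os.sep)
    if not inside:
        raise ValueError(f"Path '{target}' is outside of the allowed root '{root}'")
    return target

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    sub = root / "src"
    sub.mkdir()
    print(ensure_within_root(sub, root) == sub.resolve())  # → True
    try:
        ensure_within_root(root / "..", root)  # parent of root: rejected
    except ValueError:
        print("rejected")
```

Because the comparison happens after `resolve()`, a path like `root/../etc` is normalized before the check, which is the property the suggested fix depends on.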
- Walk back through contiguous comment siblings to collect all XML doc
  comment lines instead of only the last one
- Add clarifying comment about the base_list heuristic limitation
  (cannot distinguish base class from interface without semantic analysis)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
api/analyzers/csharp/analyzer.py (1)

20-24: Simplify _captures to align with Java and Python analyzers.

The CSharp analyzer currently uses QueryCursor(query).captures(node) via the _captures helper, while both JavaAnalyzer and PythonAnalyzer call query.captures(node) directly. These paths are functionally equivalent in current tree-sitter Python bindings. Consolidating to the direct pattern reduces maintenance surface and removes API-version sensitivity.

Proposed simplification
-    def _captures(self, pattern: str, node: Node) -> dict:
-        """Run a tree-sitter query and return captures dict."""
-        query = self.language.query(pattern)
-        cursor = QueryCursor(query)
-        return cursor.captures(node)
+    def _captures(self, pattern: str, node: Node) -> dict:
+        """Run a tree-sitter query and return captures dict."""
+        return self.language.query(pattern).captures(node)

Remove QueryCursor from the import on line 11, as it will no longer be used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/analyzers/csharp/analyzer.py` around lines 20 - 24, The _captures helper
in CSharpAnalyzer should call the query's captures directly like the Java/Python
analyzers: replace the body of _captures in analyzer.py to return
self.language.query(pattern).captures(node) (i.e., remove use of QueryCursor),
and remove the unused QueryCursor import at the top; keep the method signature
def _captures(self, pattern: str, node: Node) -> dict and ensure other callers
to _captures continue to work unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/analyzers/csharp/analyzer.py`:
- Around line 26-30: The add_dependencies function never creates the sentinel
directory and never populates the files list; modify add_dependencies to run
subprocess.run(["dotnet","restore"], cwd=str(path), check=True) only when a
.csproj or .sln exists, and after a successful restore create the sentinel
Path(f"{path}/temp_deps_cs").mkdir(parents=True, exist_ok=True) so the
idempotency guard can short-circuit future calls; also populate the files
argument (e.g., extend files with Path(path).glob("**/*.cs") or other C# source
patterns) so C# source files are queued for analysis and the files parameter is
actually used.

---

Duplicate comments:
In `@api/analyzers/source_analyzer.py`:
- Around line 141-145: analyze_files currently passes the user-supplied Path
directly into first_pass/second_pass causing second_pass's use of
path.rglob('*.cs') to be vulnerable; update analyze_files to resolve the
incoming path with path = path.resolve() before calling first_pass and
second_pass, and additionally validate the resolved path is inside an allowed
analysis root (e.g., compare against a configurable ANALYSIS_ROOT or raise/abort
if not a subpath) so that rglob cannot operate on arbitrary filesystem
locations; ensure references include analyze_files, first_pass, second_pass, and
path.resolve() in your changes.


Comment on lines +26 to +30
def add_dependencies(self, path: Path, files: list[Path]):
if Path(f"{path}/temp_deps_cs").is_dir():
return
if any(Path(f"{path}").glob("*.csproj")) or any(Path(f"{path}").glob("*.sln")):
subprocess.run(["dotnet", "restore"], cwd=str(path))

⚠️ Potential issue | 🟠 Major

add_dependencies idempotency guard never fires — dotnet restore runs on every invocation.

Path(f"{path}/temp_deps_cs").is_dir() is checked on line 27, but nothing in this function (or anywhere else) creates the temp_deps_cs directory. The guard is therefore permanently False and dotnet restore will run on every call. This contrasts with the Python analyzer (checks venv/ which it creates) and the Java analyzer (creates temp_deps/ before scanning it).

Additionally, unlike Java (files.extend(...)) and Python (files.extend(...)), files is never populated here, so no C# dependency source files are queued for analysis. Ruff also flags the files parameter as unused (ARG002).

🐛 Proposed fix — create the sentinel directory after a successful restore
 def add_dependencies(self, path: Path, files: list[Path]):
     if Path(f"{path}/temp_deps_cs").is_dir():
         return
     if any(Path(f"{path}").glob("*.csproj")) or any(Path(f"{path}").glob("*.sln")):
-        subprocess.run(["dotnet", "restore"], cwd=str(path))
+        result = subprocess.run(["dotnet", "restore"], cwd=str(path))
+        if result.returncode == 0:
+            Path(f"{path}/temp_deps_cs").mkdir(exist_ok=True)
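The guard-plus-sentinel pattern the fix relies on can be shown in isolation. Here `restore_once` and the `runs` list are illustrative stand-ins for the analyzer's method and the `dotnet restore` subprocess call:

```python
import tempfile
from pathlib import Path

def restore_once(path: Path, runs: list) -> bool:
    """Run an expensive restore step at most once per directory.

    Appending to `runs` stands in for subprocess.run(["dotnet", "restore"], ...);
    the sentinel directory persists the "already restored" state on disk.
    """
    sentinel = path / "temp_deps_cs"
    if sentinel.is_dir():
        return False  # guard fires: restore already happened
    runs.append(str(path))  # pretend to run the restore
    sentinel.mkdir(parents=True, exist_ok=True)  # create the sentinel only after success
    return True

with tempfile.TemporaryDirectory() as d:
    calls = []
    print(restore_once(Path(d), calls))  # → True  (first call performs the restore)
    print(restore_once(Path(d), calls))  # → False (guard short-circuits)
    print(len(calls))                    # → 1
```

Without the `mkdir` step the guard can never fire, which is exactly the bug the review comment describes: the check reads a sentinel that nothing ever writes.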
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 26-26: Unused method argument: files

(ARG002)


[error] 30-30: Starting a process with a partial executable path

(S607)


@gkorland gkorland merged commit 4fefd9f into main Feb 19, 2026
8 of 9 checks passed
@gkorland gkorland deleted the csharp branch February 19, 2026 08:31