fix(realtime): handle non-code files and filter spurious events #405
bhargavchippada wants to merge 5 commits into vitali87:main
This commit fixes three issues in the real-time file watcher:
1. Filter spurious file system events: Only process MODIFIED, CREATED, and deleted events. Previously, read-only events like "opened" and "closed_no_write" (triggered by IDEs accessing files) would cause files to be deleted from the graph but not recreated, since Step 3 only runs for modification events.
2. Delete File nodes for non-code files: The existing CYPHER_DELETE_MODULE query only deletes Module nodes (for code files). Added a separate query to delete File nodes, ensuring non-code files like .md, .json, etc. are properly removed when deleted from the filesystem.
3. Create File nodes for ALL file types: Added process_generic_file() call for all files during MODIFIED/CREATED events, not just code files with recognized language configs. This ensures non-code files are indexed in real-time.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
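The event filter described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `RELEVANT_EVENTS` and `should_process` are hypothetical names, and the event-type strings are assumed to match the ones quoted in the commit message.

```python
# Hypothetical sketch of the filter from item 1; names are illustrative only.
# Event types that change a file's content or existence.
RELEVANT_EVENTS = frozenset({"modified", "created", "deleted"})


def should_process(event_type: str, is_directory: bool = False) -> bool:
    """Return True only for file events that should touch the graph.

    Read-only events such as "opened" or "closed_no_write" are skipped, so a
    file is never deleted from the graph without being recreated afterwards.
    """
    return not is_directory and event_type in RELEVANT_EVENTS
```

With this guard at the top of the handler, an IDE merely opening a file no longer triggers the delete step at all.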
Summary of Changes (Gemini Code Assist)
This pull request significantly enhances the real-time file watcher's accuracy and completeness. It resolves issues where the graph could become inconsistent due to unhandled file types or irrelevant file system events, ensuring that the knowledge graph accurately reflects the repository's state for all files, not just code.
Code Review
This pull request introduces several valuable fixes to the real-time file watcher. It correctly filters spurious file system events, preventing incorrect deletions from the graph. Additionally, it adds logic to properly handle non-code files during creation and deletion, ensuring they are accurately represented and removed. The changes are logical and effectively address the described issues. I have a couple of suggestions to improve maintainability by moving hardcoded values to constants and using enums for consistency.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Greptile Summary
This PR fixes three related bugs in
Confidence Score: 2/5
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[FileSystemEvent received] --> B{is_directory\nor not relevant?}
B -- Yes --> Z[return early]
B -- No --> C{event_type in\nMODIFIED / CREATED / DELETED?}
C -- No --> Z
C -- Yes --> D[Step 1: Delete Module node\nCYPHER_DELETE_MODULE]
D --> E[Step 1: Delete File node\nnew MATCH f:File DETACH DELETE]
E --> F[Step 2: remove_file_from_state]
F --> G{event_type ==\nMODIFIED or CREATED?}
G -- No / deleted --> I
G -- Yes --> H{lang_config exists\nand language supported?}
H -- Yes --> H2[process_file — build AST\nand Module/code nodes]
H -- No --> H3[skip code parsing]
H2 --> H4[process_generic_file — create File node]
H3 --> H4
H4 --> I[Step 4: CYPHER_DELETE_CALLS\n+ _process_function_calls]
I --> J[Step 5: flush_all]
Last reviewed commit: 53faff4
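The step ordering in the flowchart above can be traced with a small stand-in. This is only a sketch under assumptions: `RecordingIngestor` and `handle_event` are hypothetical, the "language supported" check is reduced to a file-suffix test, and each step is recorded as a label instead of touching a real graph.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingIngestor:
    """Hypothetical stand-in that records the order of operations."""

    ops: list = field(default_factory=list)


def handle_event(event_type, path, ing, code_suffixes=(".py",)):
    """Walk the flowchart's steps for one file event (illustrative only)."""
    if event_type not in {"modified", "created", "deleted"}:
        return  # spurious event such as "opened": return early
    # Step 1: delete both node kinds for this path.
    ing.ops.append("delete_module")
    ing.ops.append("delete_file")
    # Step 2: drop in-memory state for the file.
    ing.ops.append("remove_file_from_state")
    # Step 3: recreate nodes only when the file still exists.
    if event_type in {"modified", "created"}:
        if path.endswith(code_suffixes):
            ing.ops.append("process_file")  # AST and Module/code nodes
        ing.ops.append("process_generic_file")  # File node for every file
    # Step 4: refresh call relationships, then Step 5: flush.
    ing.ops.append("delete_calls")
    ing.ops.append("process_function_calls")
    ing.ops.append("flush_all")
```

Calling `handle_event("created", "README.md", RecordingIngestor())` records `process_generic_file` but not `process_file`, which is the non-code path the PR adds.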
        return

    logger.warning(
        logs.CHANGE_DETECTED.format(event_type=event.event_type, path=path)
    )

    # (H) Step 1: Delete existing nodes for this file path
    # Delete Module node and its children (for code files)
    ingestor.execute_write(CYPHER_DELETE_MODULE, {KEY_PATH: relative_path_str})
    # Delete File node (for all files including non-code like .md, .json)
    ingestor.execute_write(
        CYPHER_DELETE_FILE, {KEY_PATH: relative_path_str}
    )
Tests not updated for new execute_write call
The PR adds a second execute_write call in Step 1 (deleting the File node), but test_realtime_updater.py was not updated. All four test assertions still check execute_write.call_count == 2 (see tests at lines 45, 65, 84, 130), but the flow now has three calls:
1. execute_write(CYPHER_DELETE_MODULE, ...): delete Module node
2. execute_write("MATCH (f:File …) DETACH DELETE f", ...): new File node deletion
3. execute_write(CYPHER_DELETE_CALLS): Step 4
These tests will fail as-is. The expected count in each of those assertions should be updated to 3, and a new assertion verifying process_generic_file is called for MODIFIED/CREATED events would be valuable too.
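A hedged sketch of what the updated assertions might look like, using `unittest.mock`. The `dispatch` function below is a hypothetical, simplified stand-in for the real handler (the actual test file and fixtures are not shown in this thread), and the query names are passed as plain strings purely for illustration.

```python
from unittest.mock import MagicMock


def dispatch(event_type, path, ingestor, updater):
    """Hypothetical, simplified stand-in for the handler under test."""
    if event_type not in {"modified", "created", "deleted"}:
        return
    ingestor.execute_write("CYPHER_DELETE_MODULE", {"path": path})
    ingestor.execute_write("CYPHER_DELETE_FILE", {"path": path})
    if event_type != "deleted":
        updater.process_generic_file(path)
    ingestor.execute_write("CYPHER_DELETE_CALLS")


def test_created_markdown_file():
    ingestor, updater = MagicMock(), MagicMock()
    dispatch("created", "README.md", ingestor, updater)
    # Three writes now: delete Module, delete File, delete CALLS.
    assert ingestor.execute_write.call_count == 3
    updater.process_generic_file.assert_called_once_with("README.md")


def test_opened_event_is_ignored():
    ingestor, updater = MagicMock(), MagicMock()
    dispatch("opened", "README.md", ingestor, updater)
    assert ingestor.execute_write.call_count == 0
    updater.process_generic_file.assert_not_called()
```

Asserting on `process_generic_file` as well as the write count guards against a regression where the File node is deleted but never recreated.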
    ingestor.execute_write(
        CYPHER_DELETE_FILE, {KEY_PATH: relative_path_str}
    )
Inline Cypher query should be a named constant
The project keeps all Cypher queries as named constants in constants.py (e.g. CYPHER_DELETE_MODULE, CYPHER_DELETE_CALLS). The new query is a raw string literal directly in the handler, which violates the "No Hardcoded Strings" rule.
Move it to constants.py:
    CYPHER_DELETE_FILE = "MATCH (f:File {path: $path}) DETACH DELETE f"
Then reference it here:
    ingestor.execute_write(
        CYPHER_DELETE_FILE, {KEY_PATH: relative_path_str}
    )
Context Used: Rule from dashboard - Technical Requirements, Agentic Framework, PydanticAI Only: This project uses PydanticAI...
    relevant_events = {
        EventType.MODIFIED,
        EventType.CREATED,
        EventType.DELETED,  # watchdog deletion event
    }
    if event.event_type not in relevant_events:
"deleted" should be a StrEnum member, not a raw string
EventType already has MODIFIED and CREATED as StrEnum members, but "deleted" is added as a bare string literal. Per the "No Hardcoded Strings" rule, it should be added to the enum in constants.py:
    class EventType(StrEnum):
        MODIFIED = "modified"
        CREATED = "created"
        DELETED = "deleted"
Then reference it here:
    relevant_events = {
        EventType.MODIFIED,
        EventType.CREATED,
        EventType.DELETED,
    }
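A short sketch of why the enum-based set still works against the raw strings a file-system event carries: members of a string enum compare and hash like their values, so a plain `"deleted"` is found in a set of enum members. The `is_relevant` helper is a hypothetical name, and the fallback branch is only there so the sketch also runs on interpreters older than Python 3.11.

```python
try:
    from enum import StrEnum  # available from Python 3.11
except ImportError:  # assumption: emulate with a str mixin on older versions
    from enum import Enum

    class StrEnum(str, Enum):
        pass


class EventType(StrEnum):
    MODIFIED = "modified"
    CREATED = "created"
    DELETED = "deleted"


relevant_events = {EventType.MODIFIED, EventType.CREATED, EventType.DELETED}


def is_relevant(raw_event_type: str) -> bool:
    """Check a raw event-type string against the enum-based set."""
    # Works because string-enum members hash and compare as plain strings.
    return raw_event_type in relevant_events
```

This is why replacing the bare `"deleted"` literal with `EventType.DELETED` should not change runtime behavior, only remove the hardcoded string.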
realtime_updater.py
Outdated
    relative_path_str = str(path.relative_to(self.updater.repo_path))

    # (H) Only process events that actually change file content
    # Skip read-only events like "opened", "closed_no_write" that don't modify the file
Comments missing (H) prefix
Per the project's comment policy, all inline comments must be prefixed with (H). The following new comments added by this PR are missing the prefix:
- realtime_updater.py:77: # Skip read-only events like "opened", "closed_no_write"...
- realtime_updater.py:81: # watchdog deletion event
- realtime_updater.py:91: # Delete Module node and its children (for code files)
- realtime_updater.py:93: # Delete File node (for all files including non-code like .md, .json)
Each should be prefixed with (H), e.g.:
    # (H) Skip read-only events like "opened", "closed_no_write" that don't modify the file
Add missing constants required by the code review suggestions:
- EventType.DELETED = "deleted" for watchdog deletion events
- CYPHER_DELETE_FILE query for deleting File nodes
- Update import in realtime_updater.py
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…prefixes
- Update test assertions to expect 3 execute_write calls (was 2): DELETE_MODULE + DELETE_FILE + DELETE_CALLS
- Rename test_unsupported_file_types_are_ignored to test_non_code_files_create_file_nodes to reflect new behavior
- Add assertion for process_generic_file being called for non-code files
- Add (H) prefix to all new comments per project convention
- Add pytest as dev dependency
All 6 tests pass.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
GraphUpdater._process_files only cleared in-memory state for deleted files but never issued Cypher DELETE to Memgraph. Files/folders deleted before the hash cache existed were invisible to the diff logic entirely.
- Add _prune_orphan_nodes() to GraphUpdater that queries all File, Module, and Folder paths from the graph, checks filesystem existence, and deletes stale nodes via CYPHER_DELETE_* queries
- Fix _process_files to issue CYPHER_DELETE_MODULE + CYPHER_DELETE_FILE for hash-cache-detected deletions (not just in-memory cleanup)
- Add CYPHER_DELETE_FOLDER and CYPHER_ALL_*_PATHS query constants
- Add PRUNE_* log message constants
- Add 10 unit tests covering pruning logic, edge cases, and integration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
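The orphan-pruning idea in the last commit can be sketched as a pure function over paths. This is an assumption-laden illustration, not the PR's `_prune_orphan_nodes()`: `find_orphan_paths` and `demo` are hypothetical names, and the graph query is replaced by a plain list of relative paths.

```python
import tempfile
from pathlib import Path


def find_orphan_paths(graph_paths, repo_root):
    """Return graph paths that no longer exist under repo_root.

    graph_paths stands in for the File/Module/Folder paths a graph query
    would return; the real code would then delete nodes for each orphan.
    """
    return [p for p in graph_paths if not (Path(repo_root) / p).exists()]


def demo():
    """Build a throwaway repo and report which graph paths are stale."""
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "kept.md").write_text("still here")
        (Path(root) / "src").mkdir()
        return find_orphan_paths(["kept.md", "gone.md", "src"], root)
```

Separating detection from deletion like this keeps the filesystem check easy to unit test without a running database.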
Summary
This PR fixes three issues in the real-time file watcher (`realtime_updater.py`):
1. Filter spurious file system events: Only process `MODIFIED`, `CREATED`, and `deleted` events
   - Previously, read-only events like `opened` and `closed_no_write` (triggered by IDEs accessing files) would cause files to be deleted from the graph but not recreated
   - Step 3 only runs for `MODIFIED`/`CREATED` events
2. Delete File nodes for non-code files: Added query to delete `File` nodes
   - The existing `CYPHER_DELETE_MODULE` query only deletes `Module` nodes (for code files)
   - Non-code files like `.md`, `.json`, etc. were never removed from the graph when deleted from the filesystem
3. Create File nodes for ALL file types: Added `process_generic_file()` call for all files
Test plan
- `.md` file → verified it appears in the graph
- `.md` file → verified it's removed from the graph
🤖 Generated with Claude Code