Feat: phase 3 polish sse#7
Conversation
…re findings

- For both run_scan and run_workspace_scan, the 'scans' table row is now inserted BEFORE the background task starts emitting findings into 'scan_results'. This eliminates the foreign key constraint violation.
- Completion logic now uses UPDATE instead of INSERT ... ON CONFLICT, keeping the logic clean and atomic.
- Added a usage store (apps/web/src/lib/stores/usage.ts) for the reactive sidebar scan count that refreshes after each completed scan.
- Removed stale code paths and unused mut warnings.
…oss server and scanner crates
📝 Walkthrough

This PR introduces severity-based filtering for security scan findings across the stack. Users can now select a minimum severity threshold when initiating scans. The backend filters findings accordingly before returning results, while a new usage tracking store monitors scan history. SSE streaming is refactored for cache-then-live consumption, and the rate limit increases from 10 to 100 requests per IP.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User
    participant Frontend as Frontend App
    participant Store as Usage Store
    participant Server as Server/API
    participant Scanner as Scanner Engine
    participant Database as Database
    User->>Frontend: Select min_severity & initiate scan
    Frontend->>Server: POST /scan (min_severity included)
    Server->>Database: INSERT scans row (early init)
    Server->>Scanner: Run scan with config
    Scanner->>Scanner: Process findings
    Scanner->>Server: Emit ScanEvent::Finding
    loop For each Finding
        Server->>Server: Check severity >= min_severity
        alt Severity meets threshold
            Server->>Database: Cache finding in state.results[scan_id]
            Server->>Frontend: SSE stream Finding event
        else Severity below threshold
            Server->>Server: Skip/filter out finding
        end
    end
    Server->>Database: UPDATE scans row (mark complete)
    Server->>Frontend: SSE stream completion event
    Frontend->>Store: Call refreshScanCount()
    Store->>Server: GET /history
    Server->>Database: Fetch scan history
    Database->>Server: Return history
    Server->>Store: Return history entries
    Store->>Store: Update scanCount state
    Frontend->>Frontend: Reactive UI updates (usage bar, count display)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ Passed checks (4 passed) | ❌ Failed checks (1 inconclusive)
Review Summary by Qodo

Phase 3 Polish: Fix FK violations, add severity filtering, and improve SSE handling
Walkthrough

Description

• Resolve foreign key violations by inserting scan records before findings
• Add severity filtering for scan results with min_severity parameter
• Implement reactive usage store for sidebar scan count tracking
• Simplify regex patterns and add manual filtering in AI code engine
• Apply consistent code formatting across scanner and server crates
• Improve SSE subscription logic with cached event replay and live streaming
• Increase rate limit from 10 to 100 scans per minute
• Update setup documentation with clearer step-by-step instructions

Diagram

```mermaid
flowchart LR
    A["Scan Request"] -->|"Phase 1: Insert scan record"| B["scans table"]
    A -->|"Phase 2: Start background task"| C["Scanner Engine"]
    C -->|"Stream findings"| D["scan_results table"]
    D -->|"Update counts"| B
    B -->|"Cache results"| E["results store"]
    E -->|"Replay + Live stream"| F["SSE Subscriber"]
    G["min_severity filter"] -->|"Applied during scan"| C
    H["Usage Store"] -->|"Refresh after complete"| I["Sidebar UI"]
```
File Changes

1. crates/scanner/src/engines/ai_code.rs
Code Review by Qodo
1. Direct fetch in +page.svelte
CI Feedback 🧐

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
```ts
          api_key: aiConfig.apiKey,
          model: aiConfig.model,
          endpoint: aiConfig.endpoint || undefined,
        } : undefined
      } : undefined,
      min_severity: selectedMinSeverity !== "info" ? selectedMinSeverity : undefined
    })
```
1. Direct fetch in +page.svelte 📘 Rule violation ⚙ Maintainability
The scan page modifies a direct fetch request instead of using the typed API client wrapper, reducing consistency and type safety for frontend network calls. This violates the requirement to route frontend network access through apps/web/src/lib/api.ts.
Agent Prompt
## Issue description
The scan page is constructing/modifying request payloads inside an ad-hoc `fetch` flow instead of calling the typed API wrapper.
## Issue Context
Compliance requires frontend network calls to be centralized in `apps/web/src/lib/api.ts` for consistent behavior and typing.
## Fix Focus Areas
- apps/web/src/routes/scan/+page.svelte[62-120]
- apps/web/src/routes/scan/+page.svelte[141-195]
- apps/web/src/lib/api.ts[48-137]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
```ts
let selectedLanguage = $state("python");
let selectedMinSeverity = $state("info");
```
2. No Zod for min_severity 📘 Rule violation ⛨ Security
A new user input (selectedMinSeverity / min_severity) is introduced and sent to the API without Zod-based validation. This weakens input validation guarantees and violates the Zod-only validation requirement for forms/API inputs.
Agent Prompt
## Issue description
A new UI/API input (`min_severity`) is added without Zod validation.
## Issue Context
Compliance requires using Zod schemas for validating all new/changed form inputs and API inputs.
## Fix Focus Areas
- apps/web/src/routes/scan/+page.svelte[23-25]
- apps/web/src/routes/scan/+page.svelte[71-76]
- apps/web/src/routes/scan/+page.svelte[339-352]
- apps/web/src/lib/api.ts[14-20]
```rust
if let Err(e) = sqlx::query(
    "INSERT INTO scans (id, language, target_name, findings_count, severity_counts)
     VALUES ($1, $2, $3, 0, $4)",
)
.bind(scan_id)
.bind(&payload_lang)
.bind("Manual Scan")
.bind(serde_json::to_value(std::collections::HashMap::<String, i32>::new()).unwrap())
.execute(&state.db)
```
3. New serde_json unwrap in server 📘 Rule violation ☼ Reliability
The server introduces new .unwrap() calls when serializing severity_counts, which can panic and crash the API process. This violates the no-unwrap/expect requirement for Rust library/API code.
Agent Prompt
## Issue description
New `.unwrap()` calls were introduced in the API server when serializing JSON for DB writes; these can panic and crash the process.
## Issue Context
This is API/server code (`crates/server`) and must avoid panics by propagating/handling errors.
## Fix Focus Areas
- crates/server/src/main.rs[321-339]
- crates/server/src/main.rs[400-408]
- crates/server/src/main.rs[501-518]
- crates/server/src/main.rs[566-575]
```rust
let (tx, _rx) = broadcast::channel(100);
state.scans.insert(scan_id, tx.clone());
state.results.insert(scan_id, Vec::new());
```
4. Leaked scan state on db fail 🐞 Bug ☼ Reliability
run_scan/run_workspace_scan insert scan_id into in-memory maps before initializing the DB scan row; if the INSERT INTO scans fails, the handler returns early without removing those entries, leaving an orphaned scan channel/results in memory. This can cause memory leaks and SSE subscriptions that never complete for that scan_id.
Agent Prompt
### Issue description
`run_scan` and `run_workspace_scan` insert `scan_id` into `state.scans` and `state.results` before inserting the corresponding row into `scans`. If the DB insert fails, the handler returns an error without removing those map entries, leaving an orphaned in-memory scan state.
### Issue Context
This can leak memory and create “ghost” scans that never progress/complete.
### Fix Focus Areas
- crates/server/src/main.rs[294-339]
- crates/server/src/main.rs[467-518]
### Suggested fix
- Move `state.scans.insert(...)` and `state.results.insert(...)` to *after* the successful `INSERT INTO scans ...`.
- Or, if you keep the current order, ensure you remove the entries on the error return path (and consider adding a small helper to avoid duplicating cleanup between scan types).
```rust
let mut seen_count = 0;
let live_stream = tokio_stream::wrappers::BroadcastStream::new(rx)
    .filter_map(|msg| msg.ok())
    .filter(move |_| {
        // Skip the first N events that were already in our cache clone
        // This is a simple heuristic; a real implementation would use event IDs.
        if seen_count < num_cached {
            seen_count += 1;
            false
        } else {
            true
        }
    })
```
5. SSE drops first live events 🐞 Bug ≡ Correctness
subscribe_to_scan skips the first num_cached messages from the live broadcast stream, but num_cached includes events that occurred before the subscriber connected and therefore will not be present in the broadcast receiver. This can drop the first real-time findings/progress events after a client subscribes.
Agent Prompt
### Issue description
`subscribe_to_scan` computes `num_cached` from the full cached event list and then skips that many events from the live broadcast stream. Because the broadcast receiver does not replay pre-subscription messages, this skip can discard legitimate new events.
### Issue Context
Cache contains *all* events up to the clone time; live stream contains only events *after* subscribe. Skipping `num_cached` assumes live contains duplicates of the entire cache, which is not true.
### Fix Focus Areas
- crates/server/src/main.rs[596-635]
### Suggested fix
- Remove the `seen_count/num_cached` skip filter entirely.
- If you still need de-duplication, add a monotonic event sequence number (or timestamp+nonce) to emitted events, and:
- Send cached events first.
- Then forward only live events with `seq > last_cached_seq`.
- Alternatively, store `last_cached_seq` in the cache clone and filter on that instead of counts.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
crates/scanner/tests/integration.rs (1)
246-254: ⚠️ Potential issue | 🟠 Major

Test robustness: “sorted by severity” assertion is too strict

The multi-engine test constructs severities, sorts a clone, and then uses assert_eq!(severities, sorted, ...) (Lines 246-254). This requires the entire vector order to match, including the relative order of findings that share the same severity. If there are multiple findings with equal severity, the ordering among them may legitimately differ while still being “severity descending”. A more robust check is to assert monotonic non-increasing severities across adjacent elements.

🛠️ Proposed more robust assertion

```diff
-    let severities: Vec<&Severity> = findings.iter().map(|f| &f.severity).collect();
-    let mut sorted = severities.clone();
-    sorted.sort_by(|a, b| b.cmp(a));
-    assert_eq!(
-        severities, sorted,
-        "Findings should be sorted by severity descending"
-    );
+    let severities: Vec<&Severity> = findings.iter().map(|f| &f.severity).collect();
+    assert!(
+        severities
+            .windows(2)
+            .all(|w| w[0] >= w[1]),
+        "Findings should be sorted by severity descending"
+    );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/scanner/tests/integration.rs` around lines 246 - 254, the current test collects severities into severities (Vec<&Severity>), clones it into sorted and asserts exact equality (assert_eq!(severities, sorted)) which is too strict for equal-severity items; change the assertion to verify non-increasing order instead: iterate adjacent pairs of severities (e.g., using windows or a simple for loop over indices) and assert that each earlier severity >= the next, replacing the assert_eq on the whole vector; update the check near the code that builds severities, sorted, and calls sorted.sort_by (remove or keep sort_by only as needed for comparison) and replace the equality assertion with the monotonicity assertion so equal severities may appear in any relative order.

crates/scanner/src/engines/sca.rs (1)
271-321: ⚠️ Potential issue | 🟠 Major

Potential correctness risk: zip() may silently truncate OSV batch results.

In query_osv, you populate results via:

```rust
for (dep, query_result) in chunk.iter().zip(batch_resp.results.iter()) { ... }
```

If OSV ever returns fewer results than queries (or in the unlikely case ordering/length diverges), zip() will stop early and you’ll fail to push (dep.clone(), vec![]) for the remaining deps—leading to missing/partial findings.

Suggested defensive approach (recommended refactor):

- iterate by index over chunk
- pull batch_resp.results.get(i) and default to None/empty vulns if missing

🔧 Proposed fix sketch

```diff
-for (dep, query_result) in chunk.iter().zip(batch_resp.results.iter()) {
-    let vulns = query_result.vulns.clone().unwrap_or_default();
-    results.push((dep.clone(), vulns));
-}
+for (i, dep) in chunk.iter().enumerate() {
+    let query_result = batch_resp.results.get(i);
+    let vulns = query_result
+        .and_then(|qr| qr.vulns.clone())
+        .unwrap_or_default();
+    results.push((dep.clone(), vulns));
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/scanner/src/engines/sca.rs` around lines 271 - 321, the use of zip in query_osv can silently truncate results when OSV returns fewer entries than queries; change the chunk processing to iterate by index (for i in 0..chunk.len()) and for each dep use batch_resp.results.get(i) to obtain the corresponding result (or None) and default to an empty vec when missing, then push (dep.clone(), vulns) for every index so every dependency in the chunk is always accounted for even if the response is shorter or out-of-sync.

apps/web/src/routes/scan/+page.svelte (2)
54-200: 🛠️ Refactor suggestion | 🟠 Major

This route needs to be split before it grows further.

The page is already far past the 200-line Svelte cap, and the new severity controls plus duplicated single/workspace scan flow add even more parallel logic to keep in sync. Please extract the scan form/results pieces or move the shared orchestration into reusable helpers.

As per coding guidelines, apps/web/src/**/*.svelte: Component files max 200 lines — split into smaller components if larger.

Also applies to: 339-393
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/web/src/routes/scan/`+page.svelte around lines 54 - 200, The page exceeds the 200-line Svelte limit and duplicates scan orchestration; extract the scan UI and event handling into smaller components and reusable helpers: move the form and results display into new Svelte components (e.g., ScanForm.svelte and ScanResults.svelte) and factor the SSE/scan orchestration logic into shared functions (e.g., startScan(), subscribeToScanEvents(scan_id) used by runScan and runWorkspaceScan) so runScan and runWorkspaceScan only prepare payloads and call the shared helpers; ensure you preserve symbols scanProgress, scanStatus, findings, isScanning and call refreshScanCount from the shared completion handler.
62-77: 🛠️ Refactor suggestion | 🟠 Major

Keep scan POSTs in the typed API client.

This page still builds both scan requests with raw fetch, which breaks the repo's API-boundary rule and leaves the new min_severity contract duplicated in two places. Move these POSTs into $lib/api.ts and call the typed helper from here.

As per coding guidelines, apps/web/src/**/*.{ts,tsx,svelte}: All fetch calls must go through typed API client functions in apps/web/src/lib/api.ts.

Also applies to: 141-159
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/web/src/routes/scan/`+page.svelte around lines 62 - 77, Replace the inline fetch POSTs with a typed API client function: add a function in apps/web/src/lib/api.ts (e.g., submitScan or createScan) that accepts the same payload fields (code, language, engines, optional ai_config with provider/api_key/model/endpoint, and optional min_severity) and performs the POST to /api/v1/scan returning the typed response; then update apps/web/src/routes/scan/+page.svelte to call that new client function instead of using fetch (do this for both occurrences around lines 62-77 and 141-159) so the min_severity contract and all fetches live in the typed API surface.
🧹 Nitpick comments (4)
crates/scanner/tests/integration.rs (3)
140-148: Suggested: strengthen GitHub token test to validate engine/severity (not just non-empty)

The GitHub token test now only asserts !findings.is_empty() (Line 147). To make this integration test more meaningful (and less dependent on scan ordering/contents), consider asserting at least one finding matches the expected engine and minimum severity (e.g., Engine::Secrets and Severity::High/Critical, whichever is the intended contract).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/scanner/tests/integration.rs` around lines 140 - 148, Update the test secrets_detects_github_token to not only assert findings is non-empty but to assert at least one finding has engine == Engine::Secrets and a severity at or above the intended threshold (e.g., Severity::High or Severity::Critical); locate the test function secrets_detects_github_token and the scan(...) call, iterate/scan the returned findings to check finding.engine == Engine::Secrets and compare finding.severity against Severity::High (or the correct enum variant) and fail the test if no such finding exists.
90-97: Minor: avoid double to_lowercase() in weak-hashing predicate

In sast_detects_weak_hashing, the predicate calls f.title.to_lowercase() twice (Lines 93-95). This allocates twice and is slightly noisy. Cache the lowercase title once inside the closure.

♻️ Suggested tweak

```diff
-    assert!(
-        findings
-            .iter()
-            .any(|f| f.title.to_lowercase().contains("md5")
-                || f.title.to_lowercase().contains("hash")),
-        "Expected a weak-hashing finding"
-    );
+    assert!(
+        findings.iter().any(|f| {
+            let t = f.title.to_lowercase();
+            t.contains("md5") || t.contains("hash")
+        }),
+        "Expected a weak-hashing finding"
+    );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/scanner/tests/integration.rs` around lines 90 - 97, In the sast_detects_weak_hashing test, the predicate currently calls f.title.to_lowercase() twice which causes two allocations; modify the closure to compute let title = f.title.to_lowercase(); once and then use title.contains("md5") || title.contains("hash") so the lowercase string is reused. Update the iterator predicate (the closure over f) to cache the lowercase title before the contains checks to eliminate the duplicate to_lowercase() call.
151-166: Potential brittleness: avoid assuming findings[0] is the RSA private key

In secrets_detects_rsa_private_key, you check non-empty (Line 163) but then assert findings[0].severity == Severity::Critical (Line 164). If the scan can emit multiple findings (even for the same input), [0] could be the “wrong” one and make the test flaky.

Prefer iter().any(...) to assert “there exists a critical RSA-private-key finding” rather than relying on index 0.

♻️ Proposed change

```diff
-    assert!(!findings.is_empty(), "Expected a private key finding");
-    assert_eq!(findings[0].severity, Severity::Critical);
+    assert!(!findings.is_empty(), "Expected a private key finding");
+    assert!(
+        findings.iter().any(|f| f.severity == Severity::Critical),
+        "Expected at least one Critical private key finding"
+    );
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/scanner/tests/integration.rs` around lines 151 - 166, the test secrets_detects_rsa_private_key is brittle because it assumes findings[0] is the RSA key; instead update the assertion to check that at least one finding matches the RSA private key criteria (e.g., Severity::Critical and the finding kind or name indicates an RSA private key) by using findings.iter().any(|f| /* predicate */) so the test asserts existence rather than relying on index 0; locate the test function secrets_detects_rsa_private_key and replace the assert_eq!(findings[0].severity, Severity::Critical) with an assert!(findings.iter().any(...)) using a predicate that checks severity and identifies the RSA-private-key finding.

crates/scanner/src/engines/sca.rs (1)
1-437: Run formatting + clippy as required by repo guidelines.

The PR appears to be formatting/polish-heavy; please ensure the repository requirement is met before merge:

- cargo fmt
- cargo clippy -- -D warnings

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/scanner/src/engines/sca.rs` around lines 1 - 437, Run cargo fmt and cargo clippy -- -D warnings and fix any issues they report; specifically run `cargo fmt` to apply formatting and then `cargo clippy -- -D warnings` and address lint errors/warnings coming from functions and types in this file (e.g. parse_dependencies, parse_package_lock, parse_requirements_txt, parse_go_sum, map_cvss_severity, query_osv, run) by adjusting code to satisfy clippy (remove unused imports/variables, handle Result unwraps/expectations, simplify patterns, add explicit types or docs as required) until clippy exits cleanly with zero warnings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/web/src/lib/stores/usage.ts`:
- Around line 6-9: refreshScanCount() currently calls getHistory() which returns
a capped page (latest 50) so scanCount will max out and not reflect real usage;
change refreshScanCount to call a dedicated usage/count endpoint (e.g. create
and use getUsageCount() or fetch '/api/v1/usage' that returns the total scan
count) and set scanCount from that response instead of history.length; update
any callers that rely on refreshScanCount to expect a numeric total and add
minimal error handling for the usage fetch in the refreshScanCount function.
In `@apps/web/src/routes/`+layout.svelte:
- Around line 67-69: Clamp the progress percent and remaining scans before
rendering: compute a safe percent (e.g. use Math.min(100, ( $scanCount /
maxScans )*100) with a guard for maxScans === 0) and a non-negative remaining
count (e.g. Math.max(0, maxScans - $scanCount)), then use those values in the
progress bar div inline style and in the paragraph text instead of using
($scanCount / maxScans) * 100 and maxScans - $scanCount directly; update
references to $scanCount and maxScans in the progress bar div and the "{maxScans
- $scanCount}/{maxScans} scans remaining" paragraph to use the clamped
variables.
In `@apps/web/src/routes/scan/`+page.svelte:
- Around line 111-119: The onerror handler is closing the EventSource during
EventSource.CONNECTING (transient retry) which aborts automatic reconnection;
update the condition in the onerror handler so it only closes when
eventSource.readyState === EventSource.CLOSED (remove EventSource.CONNECTING
from the OR), and keep isScanning false + eventSource.close() only in that
CLOSED branch; reference the eventSource variable and isScanning flag in your
change.
In `@crates/scanner/src/engines/ai_code.rs`:
- Around line 115-117: The two uses of Regex::new(...).unwrap() (the
decorator-detection Regex instances created in ai_code.rs) introduce panics in
library code; replace the .unwrap() calls by propagating the compile error
instead—change the surrounding code to return Result<..., regex::Error> (or
propagate via ?), remove the unwraps, and map/return the Regex::new(...) error
to the caller so regex construction failures are handled instead of panicking;
update the function/closure signatures that construct these Regexes accordingly
(and adjust callers) so errors flow through.
- Around line 205-213: The suppression check is too broad: l.contains("user")
hides valid routes like "/users/{id}"; update the condition inside the rule.name
== "Unauthenticated Route Handler" block to only suppress when "user" appears as
an auth-related token or standalone identifier (not as part of "users"). Replace
l.contains("user") with a stricter match (e.g., a regex word-boundary check or
explicit tokens) such as testing
r"\b(user|current_user|user_id|get_current_user|authenticated_user)\b" or
equivalent so "/users" no longer matches while "{user}" or "current_user" still
suppress. Reference: the if block checking rule.name == "Unauthenticated Route
Handler" and the variable l.
- Around line 154-155: The current Regex::new(...) in
crates/scanner/src/engines/ai_code.rs is too narrow (only matches `url =
"http://..."` and `fetch("http://..."`); broaden the pattern used when
constructing the Regex in the code (the Regex::new call) to match other common
HTTP usages like function calls and attribute accesses (e.g.,
`requests.get("http://...")`, `axios.get("http://...")`, plain quoted
`"http://..."`, and similar call/assignment patterns) while keeping the existing
localhost/127.0.0.1 suppression logic inside the run function intact; update the
Regex::new invocation to a more general case-insensitive pattern that captures
quoted http:// URLs and typical call forms and leave the filtering in run
unchanged.
In `@crates/server/src/cve_sync/mod.rs`:
- Around line 336-340: The code currently treats the OSV severity string in
variable score as a numeric value (used to compute n), but OSV returns CVSS
vectors like "CVSS:3.1/AV:N/..." so .split('/').next() will not yield a numeric
base score; update the parsing where n is computed to detect CVSS vector strings
and extract the numeric base score by using a CVSS parser (add and use a CVSS v3
crate or helper) instead of naive splitting: if score starts_with "CVSS" parse
the vector (e.g., via a cvss crate API such as cvss::v3::Vector::from_str or
equivalent) and read its base score, otherwise fall back to parsing a numeric
string or default to 5.0; replace the current expression that sets n to call
this parser and return the parsed base score as f32. Ensure you reference the
same variable names (score and n) and update error handling to fall back to 5.0
on parse failure.
In `@crates/server/src/main.rs`:
- Around line 401-408: The code currently swallows the result of the SQL update
(the sqlx::query that binds findings.len(),
serde_json::to_value(&severity_counts) and scan_id and executes against
state_task.db) which can make the scan appear successful even when the DB write
fails; change this to handle the Result: check the .await result, log any error
via the existing logging facility and, on failure, update the scan record to a
failed/incomplete state (e.g., set a status column or failure flag on the scans
row referenced by scan_id) so /history and /results reflect the real state;
apply the same treatment to the other background persistence calls around the
553-574 region.
- Around line 323-330: The INSERT query uses serde_json::to_value(...).unwrap()
which can panic; remove the unwrap and handle the serialization error
explicitly: call serde_json::to_value(...) and propagate or map the Result to an
error path (e.g., return Err or call the existing scan-failure/cleanup routine
used elsewhere in this module) before executing sqlx::query. Update the code
around the sqlx::query/execute block (the INSERT for scans) to match error
handling style used elsewhere (check for Result from to_value, log the error
with scan_id/payload_lang, and fail the scan cleanly), and apply the same unwrap
removal/explicit handling for the other occurrences noted (around the other
serde_json::to_value calls in the file ranges mentioned).
- Around line 596-634: The current count-based deduplication using num_cached
and seen_count is wrong; remove the seen_count logic and instead deduplicate by
stable event IDs or sequence numbers: build a HashSet of IDs (from
cached_events) before creating past_stream, then in the live_stream (the
tokio_stream::wrappers::BroadcastStream created from rx) filter incoming events
by checking their event_id/sequence (extract the ID/seq from the incoming event)
against that set, only forwarding events whose ID/seq is not already present and
insert newly-seen IDs into the set; update the map/filter chain that produces
live_stream (and keep past_stream unchanged) to use this ID-based check rather
than the count-based closure so you don’t drop genuine future events.
- Around line 321-339: The pre-created scan row is left with findings_count=0 on
early exits, making it look like a successful scan; update the early-return
paths that follow the initial INSERT (the block using scan_id and the
surrounding scan workflow) to either set a terminal failure state (e.g., run
sqlx::query("UPDATE scans SET status = $1 WHERE id =
$2").bind("Failed").bind(scan_id).execute(&state.db).await) or delete the
placeholder row (sqlx::query("DELETE FROM scans WHERE id =
$1").bind(scan_id).execute(&state.db).await) before returning; factor this into
a small helper like mark_scan_failed_or_delete(scan_id, &state.db) and call it
on every early-exit path (including the other similar blocks you noted) so the
DB never retains a misleading zero-finding successful scan.
---
Outside diff comments:
In `@apps/web/src/routes/scan/`+page.svelte:
- Around line 54-200: The page exceeds the 200-line Svelte limit and duplicates
scan orchestration; extract the scan UI and event handling into smaller
components and reusable helpers: move the form and results display into new
Svelte components (e.g., ScanForm.svelte and ScanResults.svelte) and factor the
SSE/scan orchestration logic into shared functions (e.g., startScan(),
subscribeToScanEvents(scan_id) used by runScan and runWorkspaceScan) so runScan
and runWorkspaceScan only prepare payloads and call the shared helpers; ensure
you preserve symbols scanProgress, scanStatus, findings, isScanning and call
refreshScanCount from the shared completion handler.
- Around line 62-77: Replace the inline fetch POSTs with a typed API client
function: add a function in apps/web/src/lib/api.ts (e.g., submitScan or
createScan) that accepts the same payload fields (code, language, engines,
optional ai_config with provider/api_key/model/endpoint, and optional
min_severity) and performs the POST to /api/v1/scan returning the typed
response; then update apps/web/src/routes/scan/+page.svelte to call that new
client function instead of using fetch (do this for both occurrences around
lines 62-77 and 141-159) so the min_severity contract and all fetches live in
the typed API surface.
In `@crates/scanner/src/engines/sca.rs`:
- Around line 271-321: The use of zip in query_osv can silently truncate results
when OSV returns fewer entries than queries; change the chunk processing to
iterate by index (for i in 0..chunk.len()) and for each dep use
batch_resp.results.get(i) to obtain the corresponding result (or None) and
default to an empty vec when missing, then push (dep.clone(), vulns) for every
index so every dependency in the chunk is always accounted for even if the
response is shorter or out-of-sync.
In `@crates/scanner/tests/integration.rs`:
- Around line 246-254: The current test collects severities into severities
(Vec<&Severity>), clones it into sorted and asserts exact equality
(assert_eq!(severities, sorted)) which is too strict for equal-severity items;
change the assertion to verify non-increasing order instead: iterate adjacent
pairs of severities (e.g., using windows or a simple for loop over indices) and
assert that each earlier severity >= the next, replacing the assert_eq on the
whole vector; update the check near the code that builds severities, sorted, and
calls sorted.sort_by (remove or keep sort_by only as needed for comparison) and
replace the equality assertion with the monotonicity assertion so equal
severities may appear in any relative order.
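A sketch of the monotonicity check, using a stand-in `Severity` enum whose derived ordering follows declaration order:

```rust
// Stand-in for the scanner's Severity; derived Ord follows declaration order.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
enum Severity { Low, Medium, High, Critical }

fn is_sorted_desc(severities: &[Severity]) -> bool {
    // windows(2) visits adjacent pairs; equal neighbors are allowed, so
    // equal-severity findings may appear in any relative order.
    severities.windows(2).all(|w| w[0] >= w[1])
}
```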
---
Nitpick comments:
In `@crates/scanner/src/engines/sca.rs`:
- Around line 1-437: Run cargo fmt and cargo clippy -- -D warnings and fix any
issues they report; specifically run `cargo fmt` to apply formatting and then
`cargo clippy -- -D warnings` and address lint errors/warnings coming from
functions and types in this file (e.g. parse_dependencies, parse_package_lock,
parse_requirements_txt, parse_go_sum, map_cvss_severity, query_osv, run) by
adjusting code to satisfy clippy (remove unused imports/variables, handle Result
unwraps/expectations, simplify patterns, add explicit types or docs as required)
until clippy exits cleanly with zero warnings.
In `@crates/scanner/tests/integration.rs`:
- Around line 140-148: Update the test secrets_detects_github_token to not only
assert findings is non-empty but to assert at least one finding has engine ==
Engine::Secrets and a severity at or above the intended threshold (e.g.,
Severity::High or Severity::Critical); locate the test function
secrets_detects_github_token and the scan(...) call, iterate/scan the returned
findings to check finding.engine == Engine::Secrets and compare finding.severity
against Severity::High (or the correct enum variant) and fail the test if no
such finding exists.
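The stricter assertion could look like this sketch; `Engine` and `Severity` here are stand-ins for the scanner crate's types:

```rust
// Stand-ins for the scanner crate's Engine/Severity; derived PartialOrd on
// Severity follows declaration order (Low < Medium < High < Critical).
#[derive(PartialEq)]
enum Engine { Secrets, Sast }
#[derive(PartialEq, PartialOrd)]
enum Severity { Low, Medium, High, Critical }
struct Finding { engine: Engine, severity: Severity }

fn has_secret_at_or_above_high(findings: &[Finding]) -> bool {
    findings
        .iter()
        .any(|f| f.engine == Engine::Secrets && f.severity >= Severity::High)
}
```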
- Around line 90-97: In the sast_detects_weak_hashing test, the predicate
currently calls f.title.to_lowercase() twice which causes two allocations;
modify the closure to compute let title = f.title.to_lowercase(); once and then
use title.contains("md5") || title.contains("hash") so the lowercase string is
reused. Update the iterator predicate (the closure over f) to cache the
lowercase title before the contains checks to eliminate the duplicate
to_lowercase() call.
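The cached-lowercase predicate, as a small sketch:

```rust
// Single-allocation version of the predicate: lowercase once, check twice.
fn mentions_weak_hashing(title: &str) -> bool {
    let title = title.to_lowercase();
    title.contains("md5") || title.contains("hash")
}
```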
- Around line 151-166: The test secrets_detects_rsa_private_key is brittle
because it assumes findings[0] is the RSA key; instead update the assertion to
check that at least one finding matches the RSA private key criteria (e.g.,
Severity::Critical and the finding kind or name indicates an RSA private key) by
using findings.iter().any(|f| /* predicate */) so the test asserts existence
rather than relying on index 0; locate the test function
secrets_detects_rsa_private_key and replace the assert_eq!(findings[0].severity,
Severity::Critical) with an assert!(findings.iter().any(...)) using a predicate
that checks severity and identifies the RSA-private-key finding.
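A sketch of the existence-based assertion, with stand-in types (the real predicate would use the scanner's `Finding` struct):

```rust
// Stand-ins for the scanner's types; the predicate checks severity plus a
// title marker instead of relying on findings[0].
#[derive(PartialEq)]
enum Severity { Medium, Critical }
struct Finding { title: String, severity: Severity }

fn has_critical_rsa_key(findings: &[Finding]) -> bool {
    findings.iter().any(|f| {
        f.severity == Severity::Critical && f.title.to_lowercase().contains("rsa")
    })
}
```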
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 39b02cea-e8ef-44f8-a8d6-af64c4048ca2
📒 Files selected for processing (11)
- README.md
- apps/web/src/lib/api.ts
- apps/web/src/lib/stores/usage.ts
- apps/web/src/routes/+layout.svelte
- apps/web/src/routes/history/[id]/+page.svelte
- apps/web/src/routes/scan/+page.svelte
- crates/scanner/src/engines/ai_code.rs
- crates/scanner/src/engines/sca.rs
- crates/scanner/tests/integration.rs
- crates/server/src/cve_sync/mod.rs
- crates/server/src/main.rs
```typescript
export async function refreshScanCount() {
  try {
    const history = await getHistory();
    scanCount.set(history.length);
```
This store is counting a capped history page, not actual usage.
refreshScanCount() uses /api/v1/history, but the server only returns the latest 50 scans. After that point scanCount stops reflecting real usage, so anything deriving quota/billing state from this value will drift. A dedicated usage/count endpoint would be safer here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/web/src/lib/stores/usage.ts` around lines 6 - 9, refreshScanCount()
currently calls getHistory() which returns a capped page (latest 50) so
scanCount will max out and not reflect real usage; change refreshScanCount to
call a dedicated usage/count endpoint (e.g. create and use getUsageCount() or
fetch '/api/v1/usage' that returns the total scan count) and set scanCount from
that response instead of history.length; update any callers that rely on
refreshScanCount to expect a numeric total and add minimal error handling for
the usage fetch in the refreshScanCount function.
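A sketch of the fallback logic; the `/api/v1/usage` endpoint and its `total` field are the hypothetical additions this prompt suggests:

```typescript
// Hypothetical response shape for a dedicated usage endpoint; `total` is an
// assumed field name, not an existing API contract.
interface UsageResponse {
  total?: number;
}

// Prefer the authoritative total; fall back to the (capped) history page
// length so the sidebar still shows something if the usage fetch fails.
function resolveScanCount(usage: UsageResponse | null, historyLength: number): number {
  return usage?.total ?? historyLength;
}
```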
```diff
 <div class="bg-brand-primary h-full rounded-full shadow-[0_0_8px_rgba(236,72,153,0.5)]" style="width: {($scanCount / maxScans) * 100}%"></div>
 </div>
-<p class="text-[10px] text-zinc-500 mt-2">4/10 scans remaining</p>
+<p class="text-[10px] text-zinc-500 mt-2">{maxScans - $scanCount}/{maxScans} scans remaining</p>
```
Clamp the usage math once the free tier is exhausted.
If $scanCount goes past maxScans, the bar width exceeds 100% and the remaining label becomes negative. Clamp both values so the UI stays sane after the limit is reached.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/web/src/routes/`+layout.svelte around lines 67 - 69, Clamp the progress
percent and remaining scans before rendering: compute a safe percent (e.g. use
Math.min(100, ( $scanCount / maxScans )*100) with a guard for maxScans === 0)
and a non-negative remaining count (e.g. Math.max(0, maxScans - $scanCount)),
then use those values in the progress bar div inline style and in the paragraph
text instead of using ($scanCount / maxScans) * 100 and maxScans - $scanCount
directly; update references to $scanCount and maxScans in the progress bar div
and the "{maxScans - $scanCount}/{maxScans} scans remaining" paragraph to use
the clamped variables.
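The clamped expressions can be factored as small helpers, sketched here:

```typescript
// Clamped versions of the two sidebar expressions.
function usageBarPercent(scanCount: number, maxScans: number): number {
  if (maxScans <= 0) return 0; // guard against division by zero
  return Math.min(100, (scanCount / maxScans) * 100);
}

function scansRemaining(scanCount: number, maxScans: number): number {
  return Math.max(0, maxScans - scanCount);
}
```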
```diff
 eventSource.onerror = () => {
   console.error("SSE connection failed");
-  isScanning = false;
-  eventSource.close();
+  scanStatus = "Connection lost. Reconnecting or finished.";
+  // We don't necessarily want to set isScanning to false here because SSE often reconnects automatically.
+  // But for Zenvra's simple model, we'll close it to avoid zombie states if the scan is actually done.
+  if (eventSource.readyState === EventSource.CLOSED || eventSource.readyState === EventSource.CONNECTING) {
+    isScanning = false;
+    eventSource.close();
+  }
```
🧩 Analysis chain
🌐 Web query:
On the EventSource Web API, what does `readyState === EventSource.CONNECTING` mean during `onerror`, and should the client close the connection or allow the browser to retry automatically?
💡 Result:
readyState === EventSource.CONNECTING during the onerror event means the EventSource connection has encountered an error (e.g., network issue, server error) and the browser has set the state to CONNECTING to automatically attempt reconnection, as per the WHATWG HTML spec. This is the normal reconnection process: the browser will retry after a delay (default ~3s, adjustable via retry: field), keeping readyState as CONNECTING until success (OPEN) or permanent failure (CLOSED). The client should NOT close the connection in onerror when readyState is CONNECTING. Closing it (es.close) sets readyState to CLOSED, permanently stopping retries. Official docs (MDN, WHATWG) do not recommend closing; they advise logging/handling the error and letting the browser retry automatically for transient failures. Only close for fatal/permanent errors (when readyState === CLOSED) or per app logic. Many implementations add manual reconnection logic only if readyState === CLOSED to handle cases where browser stops retrying (e.g., certain HTTP errors like 500 in some browsers).
Citations:
- 1: https://developer.mozilla.org/en-US/docs/Web/API/EventSource/readyState
- 2: https://developer.mozilla.org/en-US/docs/Web/API/EventSource
- 3: https://html.spec.whatwg.org/multipage/server-sent-events.html
- 4: https://stackoverflow.com/questions/24564030/is-an-eventsource-sse-supposed-to-try-to-reconnect-indefinitely
- 5: https://whatpr.org/html/3851/server-sent-events.html
Remove EventSource.CONNECTING from the close condition.
EventSource.CONNECTING indicates the browser is actively retrying after a transient failure. Closing the source while in this state cancels automatic reconnection, causing scans to abort on brief network hiccups. Per the WHATWG spec, clients should only close when readyState === EventSource.CLOSED (permanent failure) or allow the browser to retry automatically. Update the condition to exclude CONNECTING:
Current problematic code

```js
if (eventSource.readyState === EventSource.CLOSED || eventSource.readyState === EventSource.CONNECTING) {
  isScanning = false;
  eventSource.close();
}
```

Only close if readyState === EventSource.CLOSED, or handle retries explicitly for fatal errors, not transient ones.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/web/src/routes/scan/`+page.svelte around lines 111 - 119, The onerror
handler is closing the EventSource during EventSource.CONNECTING (transient
retry) which aborts automatic reconnection; update the condition in the onerror
handler so it only closes when eventSource.readyState === EventSource.CLOSED
(remove EventSource.CONNECTING from the OR), and keep isScanning false +
eventSource.close() only in that CLOSED branch; reference the eventSource
variable and isScanning flag in your change.
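The corrected close condition reduces to a pure predicate on `readyState`, sketched here with the spec's constant values:

```typescript
// readyState constants per the WHATWG EventSource spec.
const CONNECTING = 0;
const OPEN = 1;
const CLOSED = 2;

// CONNECTING means the browser is already retrying on its own; only CLOSED is
// a permanent failure the client should react to by tearing down scan state.
function shouldCloseOnError(readyState: number): boolean {
  return readyState === CLOSED;
}
```

The `onerror` handler would then call `eventSource.close()` and set `isScanning = false` only when `shouldCloseOnError(eventSource.readyState)` is true.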
```rust
// Simplified: detect the decorator. Manual filtering below.
Regex::new(r"(?i)@(app|router)\.(delete|put|patch)\s*\(").unwrap()
})
```
🛠️ Refactor suggestion | 🟠 Major
Avoid introducing new .unwrap() in scanner library code.
Line 116 and Line 155 add new panic paths in library code. Please propagate regex construction errors instead of unwrapping.
Proposed direction

```diff
-fn build_rules() -> Vec<AiCodeRule> {
+fn build_rules() -> Result<Vec<AiCodeRule>> {
     vec![
         AiCodeRule {
             name: "Unauthenticated Route Handler",
-            regex: NO_AUTH_ROUTE_REGEX.get_or_init(|| {
-                Regex::new(r"(?i)@(app|router)\.(delete|put|patch)\s*\(").unwrap()
-            }).clone(),
+            regex: Regex::new(r"(?i)@(app|router)\.(delete|put|patch)\s*\(")?,
             ...
         },
         AiCodeRule {
             name: "Plain HTTP Endpoint (No TLS)",
-            regex: PLAIN_HTTP_REGEX.get_or_init(|| {
-                Regex::new(r#"(?i)(url\s*=\s*['"]http://|fetch\s*\(\s*['"]http://)"#).unwrap()
-            }).clone(),
+            regex: Regex::new(r#"(?i)(url\s*=\s*['"]http://|fetch\s*\(\s*['"]http://)"#)?,
             ...
         },
     ]
 }
 ...
-    let rules = build_rules();
+    let rules = build_rules()?;
```

As per coding guidelines: crates/scanner/**/*.rs: "never use .unwrap() in library code" and crates/{scanner,server}/**/*.rs: "Do not use .unwrap() or .expect() in library and API code".
Also applies to: 154-156
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/scanner/src/engines/ai_code.rs` around lines 115 - 117, The two uses
of Regex::new(...).unwrap() (the decorator-detection Regex instances created in
ai_code.rs) introduce panics in library code; replace the .unwrap() calls by
propagating the compile error instead—change the surrounding code to return
Result<..., regex::Error> (or propagate via ?), remove the unwraps, and
map/return the Regex::new(...) error to the caller so regex construction
failures are handled instead of panicking; update the function/closure
signatures that construct these Regexes accordingly (and adjust callers) so
errors flow through.
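The propagation pattern, in a std-only sketch where `compile` stands in for `regex::Regex::new` (which likewise returns a `Result`):

```rust
// `compile` is a stand-in for regex::Regex::new so the propagation pattern
// can be shown without the regex crate; the error type is simplified too.
fn compile(pattern: &str) -> Result<String, String> {
    if pattern.is_empty() {
        Err("empty pattern".to_string())
    } else {
        Ok(pattern.to_string())
    }
}

// The `?` operator surfaces a bad pattern to the caller instead of panicking.
fn build_rules() -> Result<Vec<String>, String> {
    Ok(vec![
        compile(r"(?i)@(app|router)\.(delete|put|patch)\s*\(")?,
        compile(r#"(?i)(url\s*=\s*['"]http://|fetch\s*\(\s*['"]http://)"#)?,
    ])
}
```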
```rust
// Simplified: detect http:// without localhost/127.0.0.1
Regex::new(r#"(?i)(url\s*=\s*['"]http://|fetch\s*\(\s*['"]http://)"#).unwrap()
```
Plain HTTP regex became too narrow and likely regresses detection coverage.
The new pattern only catches url = "http://..." and fetch("http://..."). It misses common forms like requests.get("http://..."), axios.get("http://..."), and other quoted URLs.
Broader pattern while keeping local-host suppression in run

```diff
-Regex::new(r#"(?i)(url\s*=\s*['"]http://|fetch\s*\(\s*['"]http://)"#).unwrap()
+Regex::new(r#"(?i)['"]http://[^'"]+['"]"#).unwrap()
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```rust
// Simplified: detect http:// without localhost/127.0.0.1
Regex::new(r#"(?i)['"]http://[^'"]+['"]"#).unwrap()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/scanner/src/engines/ai_code.rs` around lines 154 - 155, The current
Regex::new(...) in crates/scanner/src/engines/ai_code.rs is too narrow (only
matches `url = "http://..."` and `fetch("http://..."`); broaden the pattern used
when constructing the Regex in the code (the Regex::new call) to match other
common HTTP usages like function calls and attribute accesses (e.g.,
`requests.get("http://...")`, `axios.get("http://...")`, plain quoted
`"http://..."`, and similar call/assignment patterns) while keeping the existing
localhost/127.0.0.1 suppression logic inside the run function intact; update the
Regex::new invocation to a more general case-insensitive pattern that captures
quoted http:// URLs and typical call forms and leave the filtering in run
unchanged.
```rust
let n: f32 = score
    .split('/')
    .next()
    .and_then(|s| s.parse().ok())
    .unwrap_or(5.0);
```
🧩 Analysis chain
🌐 Web query:
In the OSV API schema, what is the format of `severity[].score` for CVSS entries? Is it a numeric score or a CVSS vector string such as `CVSS:3.1/...`?
💡 Result:
In the OSV API schema, the format of severity[].score for CVSS entries is a CVSS vector string such as CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:C/C:H/I:N/A:N, not a numeric score. The OSV schema defines severity as an array of objects with type (e.g., "CVSS_V3") and score as a string containing the vector for CVSS types. API responses follow this schema, returning vector strings in score, as confirmed by schema documentation, examples, and discussions where numeric scores are not provided and must be calculated separately using CVSS libraries.
Citations:
- 1: https://ossf.github.io/osv-schema/
- 2: https://another-rex.github.io/osv-schema/
- 3: How to get an integer severity score from osv Rest API google/osv.dev#2643
- 4: OSV API Response is Inconsistent with Schema google/osv.dev#2337
- 5: https://github.com/ossf/osv-schema/blob/main/proto/vulnerability.proto
Parse CVSS vectors from OSV severity entries to extract numeric base scores.
The OSV API returns CVSS entries as vector strings (e.g., CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:C/C:H/I:N/A:N), not numeric scores. The current parsing logic with .split('/').next() will fail on the vector format, causing all OSV findings to silently default to a score of 5.0 (mapping to "medium" severity). This skews severity filtering and counts. Extract the numeric base score from the vector string using a proper CVSS parsing approach.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/server/src/cve_sync/mod.rs` around lines 336 - 340, The code currently
treats the OSV severity string in variable score as a numeric value (used to
compute n), but OSV returns CVSS vectors like "CVSS:3.1/AV:N/..." so
.split('/').next() will not yield a numeric base score; update the parsing where
n is computed to detect CVSS vector strings and extract the numeric base score
by using a CVSS parser (add and use a CVSS v3 crate or helper) instead of naive
splitting: if score starts_with "CVSS" parse the vector (e.g., via a cvss crate
API such as cvss::v3::Vector::from_str or equivalent) and read its base score,
otherwise fall back to parsing a numeric string or default to 5.0; replace the
current expression that sets n to call this parser and return the parsed base
score as f32. Ensure you reference the same variable names (score and n) and
update error handling to fall back to 5.0 on parse failure.
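A testable sketch of the dispatch: the actual CVSS base-score computation is delegated to an injected parser (in real code, e.g. the `cvss` crate), since the base score must be computed from the vector, not read out of it:

```rust
// Std-only sketch. `cvss_base` stands in for a real CVSS parser (e.g. the
// `cvss` crate's v3 Base score); only the dispatch/fallback logic is shown.
fn parse_osv_score(score: &str, cvss_base: impl Fn(&str) -> Option<f32>) -> f32 {
    if score.to_ascii_uppercase().starts_with("CVSS:") {
        // OSV returns vector strings like "CVSS:3.1/AV:N/..."; a naive
        // split('/') would never yield a number here.
        cvss_base(score).unwrap_or(5.0)
    } else {
        // Some entries do carry a plain numeric score.
        score.parse().unwrap_or(5.0)
    }
}
```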
```rust
// ─── Phase 1: Initialize Scan Record ─────────────────────────────────────
// We MUST insert the scan record before any findings to avoid FK violations.
if let Err(e) = sqlx::query(
    "INSERT INTO scans (id, language, target_name, findings_count, severity_counts)
     VALUES ($1, $2, $3, 0, $4)",
)
.bind(scan_id)
.bind(&payload_lang)
.bind("Manual Scan")
.bind(serde_json::to_value(std::collections::HashMap::<String, i32>::new()).unwrap())
.execute(&state.db)
.await
{
    tracing::error!("Failed to initialize scan record {}: {}", scan_id, e);
    return Err((
        StatusCode::INTERNAL_SERVER_ERROR,
        "Database initialization error".to_string(),
    ));
}
```
Don't leave pre-created scan rows looking like successful zero-finding scans.
These branches insert into scans before the worker starts, but the non-Complete exits just break and clean up memory. If the worker errors or the channel closes early, history keeps a findings_count = 0 row that looks successful, and the new usage counter will count it as real usage too. Persist a terminal failed state or delete the placeholder row on every early-exit path.
Also applies to: 399-416, 501-519, 566-575
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/server/src/main.rs` around lines 321 - 339, The pre-created scan row
is left with findings_count=0 on early exits, making it look like a successful
scan; update the early-return paths that follow the initial INSERT (the block
using scan_id and the surrounding scan workflow) to either set a terminal
failure state (e.g., run sqlx::query("UPDATE scans SET status = $1 WHERE id =
$2").bind("Failed").bind(scan_id).execute(&state.db).await) or delete the
placeholder row (sqlx::query("DELETE FROM scans WHERE id =
$1").bind(scan_id).execute(&state.db).await) before returning; factor this into
a small helper like mark_scan_failed_or_delete(scan_id, &state.db) and call it
on every early-exit path (including the other similar blocks you noted) so the
DB never retains a misleading zero-finding successful scan.
```rust
if let Err(e) = sqlx::query(
    "INSERT INTO scans (id, language, target_name, findings_count, severity_counts)
     VALUES ($1, $2, $3, 0, $4)",
)
.bind(scan_id)
.bind(&payload_lang)
.bind("Manual Scan")
.bind(serde_json::to_value(std::collections::HashMap::<String, i32>::new()).unwrap())
```
🛠️ Refactor suggestion | 🟠 Major
Remove the new unwrap()s from server request paths.
These serialization calls now panic inside API/background-server code. Please handle the error explicitly and fail the scan cleanly instead.
As per coding guidelines, crates/{scanner,server}/**/*.rs: Do not use .unwrap() or .expect() in library and API code.
Also applies to: 401-405, 502-509, 567-571
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/server/src/main.rs` around lines 323 - 330, The INSERT query uses
serde_json::to_value(...).unwrap() which can panic; remove the unwrap and handle
the serialization error explicitly: call serde_json::to_value(...) and propagate
or map the Result to an error path (e.g., return Err or call the existing
scan-failure/cleanup routine used elsewhere in this module) before executing
sqlx::query. Update the code around the sqlx::query/execute block (the INSERT
for scans) to match error handling style used elsewhere (check for Result from
to_value, log the error with scan_id/payload_lang, and fail the scan cleanly),
and apply the same unwrap removal/explicit handling for the other occurrences
noted (around the other serde_json::to_value calls in the file ranges
mentioned).
```rust
let _ = sqlx::query(
    "UPDATE scans SET findings_count = $1, severity_counts = $2 WHERE id = $3",
)
.bind(findings.len() as i32)
.bind(serde_json::to_value(&severity_counts).unwrap())
.bind(scan_id)
.execute(&state_task.db)
.await;
```
Don't swallow persistence failures in the background worker.
Discarding these awaited writes with `let _ =` lets SSE report a successful scan even when final counts or workspace findings never reach Postgres. That leaves /history and /results inconsistent with what the client saw live. At minimum, log these failures and mark the scan as failed/incomplete.
Also applies to: 553-574
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/server/src/main.rs` around lines 401 - 408, The code currently
swallows the result of the SQL update (the sqlx::query that binds
findings.len(), serde_json::to_value(&severity_counts) and scan_id and executes
against state_task.db) which can make the scan appear successful even when the
DB write fails; change this to handle the Result: check the .await result, log
any error via the existing logging facility and, on failure, update the scan
record to a failed/incomplete state (e.g., set a status column or failure flag
on the scans row referenced by scan_id) so /history and /results reflect the
real state; apply the same treatment to the other background persistence calls
around the 553-574 region.
```rust
// Use a multi-stage stream: catch up from cache, then switch to live.
// We subscribe first to ensure we don't miss anything that happens during the catch-up.
let rx = state.scans.get(&id).map(|tx| tx.subscribe());

let cached_events = state
    .results
    .get(&id)
    .map(|c| c.clone())
    .unwrap_or_default();
let num_cached = cached_events.len();

let past_stream = stream::iter(cached_events).map(|event| -> Result<Event, Infallible> {
    Ok(Event::default()
        .json_data(&event)
        .unwrap_or_else(|_| Event::default()))
});

let stream: BoxedStream = if let Some(rx) = rx {
    // Scan is ongoing
    let mut seen_count = 0;
    let live_stream = tokio_stream::wrappers::BroadcastStream::new(rx)
        .filter_map(|msg| msg.ok())
        .filter(move |_| {
            // Skip the first N events that were already in our cache clone
            // This is a simple heuristic; a real implementation would use event IDs.
            if seen_count < num_cached {
                seen_count += 1;
                false
            } else {
                true
            }
        })
        .map(|event| -> Result<Event, Infallible> {
            Ok(Event::default()
                .json_data(event)
                .unwrap_or_else(|_| Event::default()))
        });

    Box::pin(past_stream.chain(live_stream))
```
🧩 Analysis chain
🌐 Web query:
Does tokio::sync::broadcast::Sender::subscribe() receive messages that were sent before the receiver subscribed, or only messages sent afterwards?
💡 Result:
Only messages sent after the receiver subscribes. tokio::sync::broadcast::Sender::subscribe creates a new Receiver handle that “will receive values sent after this call to subscribe”, and each Receiver receives clones of “all values sent after it has subscribed”. Therefore, a receiver does not get messages that were sent before the call to subscribe (though it may get lagged/overwritten behavior for messages sent after it subscribed).
Citations:
- 1: https://docs.rs/tokio/latest/tokio/sync/broadcast/struct.Sender.html
- 2: https://docs.rs/tokio/1.6.0/tokio/sync/broadcast/struct.Receiver.html
- 3: https://docs.rs/tokio/latest/tokio/sync/broadcast/index.html
- 4: https://docs.rs/tokio/latest/src/tokio/sync/broadcast.rs.html
Remove count-based event deduplication; use stable event IDs or sequence numbers instead.
The cached/live replay heuristic incorrectly drops legitimate events. When broadcast::subscribe() is called, it only receives messages sent after the subscription. Skipping the first num_cached events from the live stream therefore discards genuine future events that have no overlap with the cached batch. This needs event-based deduplication (e.g., event IDs or sequence numbers), not count-based skipping.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/server/src/main.rs` around lines 596 - 634, The current count-based
deduplication using num_cached and seen_count is wrong; remove the seen_count
logic and instead deduplicate by stable event IDs or sequence numbers: build a
HashSet of IDs (from cached_events) before creating past_stream, then in the
live_stream (the tokio_stream::wrappers::BroadcastStream created from rx) filter
incoming events by checking their event_id/sequence (extract the ID/seq from the
incoming event) against that set, only forwarding events whose ID/seq is not
already present and insert newly-seen IDs into the set; update the map/filter
chain that produces live_stream (and keep past_stream unchanged) to use this
ID-based check rather than the count-based closure so you don’t drop genuine
future events.
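The ID-based approach can be sketched over plain event IDs:

```rust
use std::collections::HashSet;

// Sketch of ID-based deduplication: seed the set from the cached events,
// then let live events through only when their ID has not been seen yet.
fn dedupe_live(cached_ids: &[u64], live_ids: Vec<u64>) -> Vec<u64> {
    let mut seen: HashSet<u64> = cached_ids.iter().copied().collect();
    // insert() returns false for duplicates, so each ID passes at most once.
    live_ids.into_iter().filter(|id| seen.insert(*id)).collect()
}
```

Unlike the count-based skip, this never drops a genuinely new event, regardless of how the cache and the broadcast subscription interleave.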