Fix 2 by abishekve · Pull Request #194 · MachDatum/ThingConnect.Pulse

abishekve · 2026-05-12T11:36:48Z

PR Title

Fix SQLite lock contention, thread pool starvation, and unbounded raw data queries

PR Description

This PR resolves major SQLite performance and stability issues affecting live monitoring, status page responsiveness, rollup processing, and long-term database growth.

The primary root causes were:

- Concurrent SQLite writes from probe tasks
- Full-table scans against `check_result_raw`
- Sync-over-async calls blocking the thread pool
- Excessive per-probe DB access
- Unbounded historical queries loading millions of rows into memory
- No automated pruning

The changes introduce WAL mode, a centralized async write queue, query optimizations, endpoint caching, scheduled pruning, and multiple background processing improvements.

Changes Included

1. Added `SqliteWalInterceptor`

Introduced a connection interceptor that applies SQLite PRAGMA optimizations on every new DB connection:

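- `journal_mode=WAL`
- `busy_timeout=5000` (wait up to 5 seconds for a lock instead of failing immediately)
- `synchronous=NORMAL`
- `cache_size=-8000` (a negative value is a size in KiB, so roughly 8 MB of page cache)

WAL mode lets readers continue while a write is in progress, which reduces lock contention and improves overall DB responsiveness. SQLite still allows only one writer at a time, so concurrent writers continue to contend with each other.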
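As a rough illustration of how such an interceptor can be written with EF Core's `DbConnectionInterceptor`, here is a minimal sketch. The class name `WalPragmaInterceptorSketch` is hypothetical and this is not the PR's actual code; it assumes the `Microsoft.EntityFrameworkCore` and `Microsoft.EntityFrameworkCore.Sqlite` packages are referenced.

```csharp
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore.Diagnostics;

// Hypothetical sketch: runs the PRAGMAs listed above each time EF Core opens a connection.
public sealed class WalPragmaInterceptorSketch : DbConnectionInterceptor
{
    private const string Pragmas =
        "PRAGMA journal_mode=WAL;" +
        "PRAGMA busy_timeout=5000;" +
        "PRAGMA synchronous=NORMAL;" +
        "PRAGMA cache_size=-8000;";

    // Called after a connection has been opened synchronously.
    public override void ConnectionOpened(DbConnection connection, ConnectionEndEventData eventData)
    {
        using var command = connection.CreateCommand();
        command.CommandText = Pragmas;
        command.ExecuteNonQuery();
    }

    // Called after a connection has been opened asynchronously.
    public override async Task ConnectionOpenedAsync(
        DbConnection connection,
        ConnectionEndEventData eventData,
        CancellationToken cancellationToken = default)
    {
        await using var command = connection.CreateCommand();
        command.CommandText = Pragmas;
        await command.ExecuteNonQueryAsync(cancellationToken);
    }
}
```

An interceptor like this would typically be registered with `options.UseSqlite(connectionString).AddInterceptors(new WalPragmaInterceptorSketch())`. Note that `journal_mode=WAL` persists in the database file once set, while the other three settings apply per connection, which is why they need to run on every open.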

2. Added `CheckResultWriteQueue`

Implemented a centralized background write queue for probe results.

Previous behavior

Each probe executed its own SaveChangesAsync() call, causing massive SQLite write lock contention when many probes completed simultaneously.

New behavior

- Probe results are queued via `Channel<T>`
- A single background writer batches up to 100 items per save
- The queue is bounded at 10,000 entries and drops the oldest item when full
- RTT updates use `ExecuteUpdateAsync`
- Remaining items are flushed during shutdown

Result:

- Eliminates concurrent SQLite writers
- Reduces write amplification
- Improves monitoring stability under load
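The queueing pattern described above can be sketched with `System.Threading.Channels` alone. This is a simplified illustration under stated assumptions, not the PR's `CheckResultWriteQueue`: the `CheckResult` record, the `WriteQueueSketch` class, and the `saveBatch` delegate (standing in for the real `SaveChangesAsync` call) are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical shape of a probe result.
public sealed record CheckResult(Guid EndpointId, bool IsUp, double? RttMs);

public sealed class WriteQueueSketch
{
    private const int Capacity = 10_000;
    private const int BatchSize = 100;

    // Bounded channel: when full, the oldest queued item is dropped to make room.
    private readonly Channel<CheckResult> _channel = Channel.CreateBounded<CheckResult>(
        new BoundedChannelOptions(Capacity)
        {
            FullMode = BoundedChannelFullMode.DropOldest,
            SingleReader = true
        });

    // Non-blocking enqueue, called from probe code.
    public bool Enqueue(CheckResult result) => _channel.Writer.TryWrite(result);

    // Signals shutdown: no more writes are accepted, queued items can still be read.
    public void Complete() => _channel.Writer.Complete();

    // Single consumer loop: drains up to BatchSize items, then saves them together.
    public async Task RunAsync(
        Func<IReadOnlyList<CheckResult>, Task> saveBatch,
        CancellationToken ct = default)
    {
        var batch = new List<CheckResult>(BatchSize);
        while (await _channel.Reader.WaitToReadAsync(ct))
        {
            while (batch.Count < BatchSize && _channel.Reader.TryRead(out var item))
            {
                batch.Add(item);
            }

            if (batch.Count > 0)
            {
                await saveBatch(batch.ToArray());
                batch.Clear();
            }
        }
    }
}
```

In this sketch, calling `Complete()` lets the loop keep reading until the channel is empty, which is one way to get the flush-on-shutdown behavior, provided the cancellation token is not triggered first. The drop-oldest policy trades completeness for bounded memory: under sustained overload the oldest unsaved results are lost silently.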

3. Updated `OutageDetectionService`

Replaced inline async DB writes with fire-and-forget queue enqueueing:

- `SaveCheckResultAsync()` → `SaveCheckResult()`
- Actual persistence is handled asynchronously by `CheckResultWriteQueue`

This removes DB latency from probe execution paths.

4. Updated `MonitoringBackgroundService`

Fixed timer restart thundering herd

Previously all probe timers restarted every 15 seconds, causing probes to execute simultaneously.

Now:

- Timers restart only when intervals actually change
- Uses `ConcurrentDictionary<Guid, int>` to track intervals
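One way to express the "restart only on change" check is shown below. The `IntervalTrackerSketch` class and its method name are hypothetical, and the sketch assumes the refresh loop that calls it runs on a single thread at a time.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class IntervalTrackerSketch
{
    // Last interval (in seconds) that each endpoint's timer was started with.
    private readonly ConcurrentDictionary<Guid, int> _intervals = new();

    // Returns true only when the endpoint is new or its interval differs
    // from the recorded one, and records the new value in that case.
    // The check-then-set is not atomic; this is acceptable only under the
    // single-caller assumption stated above.
    public bool ShouldRestart(Guid endpointId, int intervalSeconds)
    {
        bool changed = !_intervals.TryGetValue(endpointId, out var current)
                       || current != intervalSeconds;
        if (changed)
        {
            _intervals[endpointId] = intervalSeconds;
        }
        return changed;
    }
}
```

With this check, a periodic refresh that sees unchanged intervals leaves existing timers alone, so probes keep their original phase instead of all firing together after each refresh.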

Removed per-probe DB reads

Previously each probe executed FindAsync(endpointId).

Now:

- Endpoints are cached in memory via `ConcurrentDictionary<Guid, Endpoint>`
- The cache is refreshed periodically
- Probe execution performs zero DB reads
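A minimal sketch of such a cache follows. The `Endpoint` record shape and the `EndpointCacheSketch` class are hypothetical; in the real service the refresh input would come from a periodic DB query.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical endpoint shape, for illustration only.
public sealed record Endpoint(Guid Id, string Host, int IntervalSeconds);

public sealed class EndpointCacheSketch
{
    private readonly ConcurrentDictionary<Guid, Endpoint> _byId = new();

    // Called periodically with the current endpoints loaded from the DB:
    // upserts everything seen and removes entries that no longer exist.
    public void Refresh(IEnumerable<Endpoint> current)
    {
        var seen = new HashSet<Guid>();
        foreach (var endpoint in current)
        {
            _byId[endpoint.Id] = endpoint;
            seen.Add(endpoint.Id);
        }

        // Keys returns a snapshot, so removing while iterating is safe.
        foreach (var id in _byId.Keys)
        {
            if (!seen.Contains(id))
            {
                _byId.TryRemove(id, out _);
            }
        }
    }

    // Called from probe execution: a pure in-memory lookup, no DB access.
    public bool TryGet(Guid id, out Endpoint? endpoint) => _byId.TryGetValue(id, out endpoint);
}
```

The trade-off is staleness: a probe may run against endpoint settings that are up to one refresh period old.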

5. Optimized `StatusService`

Resolved multiple severe performance issues.

Removed expensive full-table scans

Old query pattern:

- `GroupBy` + `OrderByDescending` + `FirstOrDefault`
- Triggered scans across millions of `check_result_raw` rows

Status now uses:

- `endpoint.LastStatus`
- `endpoint.LastRttMs`

Fixed sync-over-async thread pool starvation

The previous implementation called `IsFlapping(endpoint.Id).Result` inside a loop over every endpoint, blocking ASP.NET Core thread pool threads.

Replaced with:

- `GetFlappingEndpointIdsAsync(endpointIds)`
- A single batched query
- `HashSet<Guid>.Contains()` lookups
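The in-memory half of that approach might look like the sketch below. The flapping rule used here (at least `threshold` up/down transitions within the supplied rows) is an assumption made for illustration, since the PR does not state the actual rule; the tuple shape and the `FlapDetectionSketch` name are likewise hypothetical. The `recent` rows are assumed to have been fetched for all endpoints in one query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlapDetectionSketch
{
    // Computes the set of flapping endpoints from one pre-fetched batch of rows.
    // ASSUMPTION: "flapping" means at least `threshold` state changes in the batch.
    public static HashSet<Guid> GetFlappingEndpointIds(
        IEnumerable<(Guid EndpointId, DateTime Ts, bool IsUp)> recent,
        int threshold)
    {
        var flapping = new HashSet<Guid>();
        foreach (var group in recent.GroupBy(r => r.EndpointId))
        {
            int changes = 0;
            bool? previous = null;
            foreach (var row in group.OrderBy(r => r.Ts))
            {
                if (previous.HasValue && previous.Value != row.IsUp)
                {
                    changes++;
                }
                previous = row.IsUp;
            }

            if (changes >= threshold)
            {
                flapping.Add(group.Key);
            }
        }
        return flapping;
    }
}
```

The status loop can then test membership with `flapping.Contains(endpoint.Id)`, which is a synchronous O(1) lookup, so no task is ever blocked on with `.Result`.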

Removed unused `CountAsync()`

Deleted a `CountAsync()` query whose result was never used.

6. Optimized `RollupService`

Fixed unbounded memory usage in rollup processing.

Previous behavior

Loaded entire tables into memory with `await _context.CheckResultsRaw.ToListAsync()`, then filtered in C#.

New behavior

- Date filtering moved into the SQL `WHERE` clause
- Added `AsNoTracking()`
- Removed blocking `.ToList()`
- Uses fully async queries

Only rows inside the requested date range are now read and materialized, which reduces memory usage and DB pressure.
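The difference between filtering in C# and filtering in SQL comes down to where the `Where` is applied relative to materialization. Below is a hedged sketch of a query helper; the `CheckResultRaw` entity shape, the `Ts` column name and type, and the `RollupQuerySketch` class are assumptions, not the project's real model.

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity shape; the real column names and types may differ.
public sealed class CheckResultRaw
{
    public long Id { get; set; }
    public Guid EndpointId { get; set; }
    public DateTime Ts { get; set; }
    public bool IsUp { get; set; }
}

public static class RollupQuerySketch
{
    // Builds a query whose date filter is part of the expression tree,
    // so EF Core can translate it into a SQL WHERE clause.
    // AsNoTracking skips change tracking for these read-only rows.
    public static IQueryable<CheckResultRaw> InWindow(
        this IQueryable<CheckResultRaw> source,
        DateTime fromInclusive,
        DateTime toExclusive) =>
        source
            .AsNoTracking()
            .Where(r => r.Ts >= fromInclusive && r.Ts < toExclusive);
}
```

A caller would then materialize asynchronously, for example `await _context.CheckResultsRaw.InWindow(from, to).ToListAsync(ct)`, so only the rows in the window leave the database. The filter only helps if the timestamp column is indexed; otherwise SQLite still scans the table, it just returns fewer rows.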

7. Optimized `HistoryService`

Fixed the historical query methods, all of which loaded an endpoint's entire history before filtering.

Updated methods:

- `GetRawDataAsync`
- `GetRollup15mDataAsync`
- `GetRollupDailyDataAsync`
- `GetOutagesAsync`

Changes:

- Date filtering moved into SQL
- Added `AsNoTracking()`
- Replaced synchronous list materialization with async equivalents

8. Added `PruneBackgroundService`

Previously, pruning logic existed but had to be invoked manually.

Added automated scheduled pruning service:

- Runs daily at 02:00
- Calls `IPruneService.PruneRawDataAsync()`
- Reads config from `Data:Pruning`
- Uses the scoped service pattern for `DbContext` access

This prevents uncontrolled growth of check_result_raw.
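The daily scheduling part can be sketched without any hosting dependencies. The `PruneScheduleSketch` class and the `prune` delegate are hypothetical; the delegate stands in for creating a service scope and calling `IPruneService.PruneRawDataAsync()`. Whether 02:00 means local time or UTC is not stated in the PR, so the use of local time here is an assumption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PruneScheduleSketch
{
    // Time remaining from `now` until the next occurrence of `timeOfDay`.
    // If that time has already passed today (or is exactly now), the next day is used.
    public static TimeSpan DelayUntilNext(DateTime now, TimeSpan timeOfDay)
    {
        var next = now.Date + timeOfDay;
        if (next <= now)
        {
            next = next.AddDays(1);
        }
        return next - now;
    }

    // Loop body for a background service: sleep until 02:00, prune, repeat.
    // Task.Delay throws OperationCanceledException when `ct` is cancelled,
    // which ends the loop on shutdown.
    public static async Task RunDailyAsync(
        Func<CancellationToken, Task> prune,
        CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            // ASSUMPTION: local time; the PR does not say which clock is used.
            var delay = DelayUntilNext(DateTime.Now, TimeSpan.FromHours(2));
            await Task.Delay(delay, ct);
            await prune(ct);
        }
    }
}
```

In a hosted service, the delegate would usually create a scope with `IServiceScopeFactory.CreateScope()` and resolve the prune service from it, because a `DbContext` registered as scoped cannot be injected directly into a singleton background service.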

9. Updated `appsettings.json`

These should ideally be separated into independent PRs.


abishekve added 11 commits January 24, 2026 16:59

fix: resolve cookie authentication and network configuration issues

e61a4cc

removed comment code

133a024

WAL mode implemented.

4c2d60a

CheckResultWriteQueue

2cf5a04

Fixed thundering herd in MonitoringBackgroundService.cs

d54bb58

Fix: GetFlappingEndpointIdsAsync now runs a single query for all endpoints, computes flap status in memory, and returns a HashSet<Guid>. The loop uses that pre-computed set, with zero async calls inside the loop.

415a5b3

Merge branch 'fix-1' of https://github.com/MachDatum/ThingConnect.Pulse into fix-2

8fd9ca1

build fix

ca3cab7

added prune bg serivce

387e974

Merge branch 'fix-1' of https://github.com/MachDatum/ThingConnect.Pulse into fix-2

436df65

github-actions Bot added chore docs fix labels May 12, 2026

abishekve changed the base branch from fix-1 to master May 12, 2026 11:37

abishekve wants to merge 11 commits into master from fix-2
