
Triage and notification reliability: transactional writes, XSS fix, retries #235

Merged
MichielDean merged 10 commits into main from feat/sc-pl9c3 on Apr 15, 2026



@MichielDean
Owner

Closes droplet sc-pl9c3.

Multiple reliability and security fixes across triage, mail, Telegram, and GitHub integrations:

  1. Triage persistOutput atomic — single DB transaction with batch inserts (pgx.Batch) for clusters and classifications
  2. Telegram RunURL HTML-escaped — XSS fix for href attribute injection
  3. SMTP retry — 3 retries with exponential backoff for transient SMTP errors (5xx, timeout)
  4. Telegram retry — retry with backoff for 429/5xx, Retry-After header support
  5. GitHub status retry — retry for 429/5xx with Retry-After header
  6. Webhook bounded worker pool — semaphore-based (default 10), prevents unbounded goroutine growth
  7. Triage jobs respect shutdown — parent context propagated from server, cancellable on graceful shutdown
  8. LLM transient-only retry — only retry on deadline/signal errors, fail fast on client errors
  9. Invitation HTML email — multipart/alternative with crypto/rand MIME boundary
  10. invited_by FK ON DELETE SET NULL — migration 000024 drops old FK, makes column nullable, adds new FK with SET NULL

All 25 packages pass. QA and security reviews completed with no blocking issues.

Cistern Agent added 10 commits April 15, 2026 00:29
… XSS fix, retries, bounded concurrency

- Triage persistOutput: wrap cluster/classification inserts in a single DB transaction (store.PersistOutput) so partial writes cannot leave the record inconsistent
- Telegram: HTML-escape RunURL in FormatMessage to prevent XSS in href attributes
- Telegram: add retry with backoff for 429 (rate limit) and 5xx responses, respect Retry-After header
- GitHub status API: add retry for 429 with Retry-After and 5xx server errors
- SMTP/general mail: add 3 retries with exponential backoff for transient errors (dial, TLS, 5xx)
- Mailer invitation emails: add multipart/alternative HTML body for invitation emails, retry on transient SMTP errors
- Webhook dispatch: replace unbounded goroutines with bounded worker pool (default 10)
- Triage job: accept parent context for cancellation during graceful shutdown (SetParentContext)
- LLM retry: only retry on transient errors (context deadline, killed process), fail fast on client errors (exit code 1)
- Invitation emails: add styled HTML alternative part
- Invitations migration: make invited_by nullable with ON DELETE SET NULL so user deletion doesn't fail
- Extract shared IsTransientSMTPError and defaultSMTPRetries into
  internal/smtptransient package to eliminate duplication between
  internal/mail and internal/mailer
- Replace hardcoded 'boundary123' MIME boundary with crypto/rand-based
  unique boundary in mailer to prevent email corruption
- Both packages now delegate to smtptransient.IsTransient and
  smtptransient.DefaultRetries
- Add tests for boundary uniqueness and multipart content
- All tests pass
- Migration 000024: DROP existing invitations_invited_by_fkey before
  ADD CONSTRAINT fk_invitations_invited_by, preventing deployment failure
  from two FKs on the same column.
- smtptransient: Replace overly broad strings.Contains('55') and ('54')
  with regex matching 3-digit SMTP response codes ([45]\d{2}) preceded
  by word boundary. Add tests for false positive rejection (timestamps,
  port numbers, error IDs).
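
To illustrate why the substring check was too broad, here is a minimal sketch of matching a standalone 3-digit reply code. It is not the regex from `internal/smtptransient`: the trailing `\b` and the name `hasSMTPReplyCode` are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// smtpCode matches a standalone 3-digit code starting with 4 or 5.
// The trailing \b is an assumption of this sketch; it keeps longer
// numbers such as 5432 from matching on their first three digits.
var smtpCode = regexp.MustCompile(`\b[45]\d{2}\b`)

// hasSMTPReplyCode reports whether msg contains something that looks
// like a 4xx or 5xx SMTP reply code. Hypothetical helper.
func hasSMTPReplyCode(msg string) bool {
	return smtpCode.MatchString(msg)
}

func main() {
	for _, msg := range []string{
		"421 Service not available",            // reply code
		"dial tcp 10.0.0.5:5432: i/o timeout",  // port number, no code
		"request id 55 at 2026-04-15T08:12:55Z", // ID and timestamp, no code
	} {
		fmt.Printf("%-45q %v\n", msg, hasSMTPReplyCode(msg))
	}
}
```

A pattern like this still matches a bare 3-digit port such as 587, so where the error is a `*textproto.Error` (which `net/smtp` returns for server replies), reading its `Code` field via `errors.As` is likely more reliable than matching text.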
@MichielDean MichielDean merged commit ac8a47f into main Apr 15, 2026
10 checks passed
@MichielDean MichielDean deleted the feat/sc-pl9c3 branch April 15, 2026 08:12