Skip to content

osctrl-api: security hardening — auth bedrock, env secret containment, TLS-side rate-limit#813

Open
alvarofraguas wants to merge 1 commit into
jmpsec:mainfrom
alvarofraguas:pr/round-1-security-auth
Open

osctrl-api: security hardening — auth bedrock, env secret containment, TLS-side rate-limit#813
alvarofraguas wants to merge 1 commit into
jmpsec:mainfrom
alvarofraguas:pr/round-1-security-auth

Conversation

@alvarofraguas
Copy link
Copy Markdown

Summary

Server-side security hardening across the API and TLS surfaces of osctrl-api / osctrl-tls. Bundled as one PR because the pieces share a coherent threat model (anonymous attack surfaces, cross-tenant exposure, credential-bearing payloads). No API contract changes for existing clients — JSON shapes, route paths, and request bodies are unchanged.

End-to-end tested against a local Kali docker deployment.

Themes

Auth bedrock

  • --auth=jwt is now the default. --auth=none requires the operator to explicitly set OSCTRL_INSECURE_NO_AUTH=1; a 60s warning ticker keeps the deployment from drifting into "auth-off forever".
  • HttpOnly + Secure cookie session for browser/SPA clients (osctrl_token). CLI clients using Authorization: Bearer continue to work unchanged.
  • Double-submit CSRF (osctrl_csrf cookie + X-CSRF-Token header) for mutating cookie-authenticated requests. CLI Bearer flows are exempt.
  • JWT signing algorithm pinned to HMAC (rejects alg:none and RS-vs-HS confusion).
  • JWT secret minimum 32 bytes. Startup fails fast with the openssl rand -base64 48 one-liner if the configured secret is shorter.
  • Strict forwarded-headers trust via new --trusted-proxies flag. Empty default means utils.GetIP ignores X-Forwarded-For / X-Real-IP, so an internet attacker can't spoof IPs to defeat rate limits or poison audit logs.

Env secret containment + cross-env defense

  • New types.TLSEnvironmentView low-privilege projection. Omits Secret, EnrollSecretPath, RemoveSecretPath, Certificate, Flags, and every other field that materially contributes to enrolling a node.
  • EnvironmentHandler branches on access level: AdminLevel (or super-admin) gets the full storage struct, UserLevel gets the safe projection.
  • EnvEnrollHandler / EnvRemoveHandler raised from UserLevel to AdminLevel — both embed the env's enroll/remove secret in their responses.
  • EnvActionsHandler create/delete branches validate caller-supplied UUIDs via EnvUUIDFilter and ExistsByUUID.
  • QueryResultsHandler now precheck-validates the named query belongs to the path env. Without this, a user with QueryLevel on env A could pull results from env B by passing B's query name in A's URL.

TLS-side hardening

  • New pkg/ratelimit token-bucket middleware with per-key (IP, IP+env, etc.) buckets, amortized eviction, and a shared overflow bucket past the per-key cap so memory stays bounded under spray.
  • Per-(remote, env) rate limits applied to enroll + config endpoints (--rate-limit-rps, --rate-limit-burst, --rate-limit-window).
  • Constant-time comparison for env enroll-secret to defeat timing oracles.
  • Two new audit hooks: AuditLog.FailedLogin (api-side) and AuditLog.FailedEnroll (tls-side) so SOC tooling has a clean stream for brute-force / password-spray / enroll-abuse alerts.

SQL hardening + carve path safety

  • New carves.ValidCarvePath regexp gate. Without this a CarveLevel operator could pass '; SELECT 1; -- and pivot "carve a file" into "run any SELECT against your fleet" via GenCarveQuery's string-concat path.

Authz + audit-log hardening

  • bcrypt cost raised from default (10) to 12. CheckLoginCredentials opportunistically re-hashes existing users at next login — no password reset required, rehash failure is non-fatal.
  • New UserManager.ClearToken empties APIToken AND CSRFToken together so an existing JWT+CSRF pair stops validating in lockstep.
  • Settings + env-action handlers tightened (env-scoped settings reads instead of global; env service-name validation).
  • pkg/environments/env-cache.go: amortized cache eviction (dropped a dead field; cache entries TTL out instead of growing unbounded).

Defaults + ops

  • --audit-log default flipped to true across deploy/config/{api,admin,tls}.yml. Operators can opt out with --audit-log=false.

⚠️ Operator-visible behavior changes

Two intentional breaking changes worth a release-note line:

  1. --auth=jwt is now the default. Deployments currently running --auth=none will fail to start unless they explicitly set OSCTRL_INSECURE_NO_AUTH=1. The help text and a 60s warning ticker make this loud. Intended forcing-function so insecure defaults don't survive an upgrade.
  2. bcrypt cost 12 + on-login auto-rehash. Existing cost-10 hashes migrate transparently the next time each user logs in. No password reset needed, no DB migration. Login latency for affected users will tick up by ~tens of ms once, then stay at the new cost.

Test plan

  • go build ./... — clean
  • go vet ./... — clean
  • go test ./... — all packages pass
  • gofmt -l ./... — empty
  • End-to-end smoke against a Kali docker deployment (login, CSRF, enroll, env-secret-containment regression)
  • Pre-existing tests still pass (no behavior changes on the happy path for existing CLI Bearer clients)
  • New tests added:
    • cmd/api/auth_test.go — JSON-vs-redirect on 401
    • cmd/api/handlers/environments_test.go — env-projection strips every secret-bearing field
    • pkg/carves/utils_test.goValidCarvePath accept/reject table + carve-query happy-path shape
    • pkg/ratelimit/ratelimit_test.go — token-bucket + eviction + overflow

What this is part of

This is round 1 of a 3-round contribution. Round 2 will add the new API endpoints the SPA needs (paginated lists, stats, saved-queries CRUD, users/permissions/tokens, env-config PATCH, audit-log filters). Round 3 will land the React frontend under a new frontend/ directory. Splitting reduces review surface per PR.

@javuto javuto self-requested a review May 13, 2026 14:40
@javuto javuto self-assigned this May 13, 2026
@javuto javuto added osctrl-api osctrl-api related changes 🔐 security Security related issues ✨ enhancement New feature or request labels May 13, 2026
@alvarofraguas alvarofraguas force-pushed the pr/round-1-security-auth branch from 027a183 to 1810d3c Compare May 13, 2026 15:38
@javuto javuto added the osctrl-tls osctrl-tls related changes label May 13, 2026
Comment thread cmd/tls/main.go Outdated
}
// Initialize audit log so failed enroll attempts are recorded.
//
auditLog, err := auditlog.CreateAuditLogManager(db.Conn, serviceName, flagParams.Service.AuditLog)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's separate all changes for osctrl-tls in its own PR, so it is easier to isolate if something needs to be tested.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — carved the osctrl-tls changes (cmd/tls/handlers/{handlers,post,utils}.go, cmd/tls/main.go, deploy/config/tls.yml) into their own PR: #816. This PR's diff drops to 29 osctrl-api + shared-infra files (no cmd/tls/ touches).

The shared infra they consume (pkg/auditlog.FailedEnroll, pkg/ratelimit, utils.SetTrustedProxies / updated GetIP) stays here, since osctrl-api also uses it. #816 stacks on this PR — its diff will collapse to just the 5 TLS files once this lands. PRs #814 and #815 rebased onto the new tip.

Comment thread pkg/environments/env-cache.go Outdated
// osctrl-tls restart, and the steady-state load is "one read per
// minute per env" — trivial for osctrl-tls's read traffic.
//
envCacheTTL = 60 * time.Second
Copy link
Copy Markdown
Collaborator

@javuto javuto May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too low, I prefer to leave it at 2 hours.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted to 2 * time.Hour in c9e7397. Kept the constant + docstring so the value is self-documenting and tunable, but the runtime behavior matches the historical cleanup interval. PRs #814 and #815 rebased onto the new round-1 tip.

Comment thread pkg/config/flags.go
Value: AuthNone,
Usage: "Authentication mechanism for the service",
Value: AuthJWT,
Usage: "Authentication mechanism for the service (jwt|none — `none` requires OSCTRL_INSECURE_NO_AUTH=1)",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave usage as Authentication mechanism for the service, the rest is invalid.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick confirmation needed before I revert the Usage text: the same diff also changes Value: AuthNoneValue: AuthJWT on the line above (defaulting the API to JWT auth instead of no-auth, gated by OSCTRL_INSECURE_NO_AUTH=1 for local dev). Your comment is on the Usage string only — do you want me to keep the default change at AuthJWT, or revert the default to AuthNone as well?

Context for the default change: it closes the "API is unauthenticated by default" finding from the recent security audit (the related guardAuthMode helper in cmd/api/main.go is its runtime enforcement). Happy to revert if you'd prefer to land the default change as a separate, narrower PR.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it — you're right, and I missed that. The --auth flag is in the shared initServiceFlags so my default flip applied to all three services, including osctrl-tls. osctrl-tls's runtime never reads the field (osquery agents authenticate via the per-environment enroll secret, not JWT), so the practical impact is zero, but osctrl-tls --help now misleadingly shows auth (default: jwt) which is wrong semantically.

Proposing the cleanest fix: split the --auth flag out of the shared initServiceFlags into the three per-service Init*Flags functions, with per-service defaults:

  • osctrl-apiAuthJWT (closes U-AUTH-1 — silent unauthenticated default was the audit blocker)
  • osctrl-adminAuthDB (its existing default, browser login form)
  • osctrl-tlsAuthNone (correct for it, as you said)

Plus the Usage text reverted to Authentication mechanism for the service per your other comment.

The guardAuthMode helper stays in cmd/api/main.go only — it's specific to osctrl-api's audit finding. Want me to land this as part of #813, or as a small standalone PR you can review independently?

@@ -0,0 +1,144 @@
// Package ratelimit provides a small token-bucket rate-limit middleware
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as mentioned in tls, these better be in a separate PR for osctrl-tls

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick clarification before I move pkg/ratelimit: this PR also wires it into osctrl-api on the /login endpoint (10 req/min strict for credential POSTs, separate 60 req/min for pre-auth GETs like the env list). The package itself has two consumers — osctrl-api here and osctrl-tls in #816.

Two options:

(a) Keep pkg/ratelimit here as shared infrastructure (current state) and only the consumer code for osctrl-tls lives in #816. The package is then symmetric with pkg/auditlog (also shared, with both FailedLogin and FailedEnroll helpers).

(b) Move pkg/ratelimit entirely to #816 and drop the /login rate-limit from osctrl-api in this PR. The brute-force / password-spray protection on /login would then either ship later in a follow-up or not at all.

Which would you prefer? If (b), happy to land the api-side /login rate-limit as a separate small PR after these two settle.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do (a) and since it is limited to osctrl-api, the pkg will be used later by osctrl-tls or whatever. Thanks!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Confirmed — keeping pkg/ratelimit here as shared infra. No code change needed; the osctrl-tls consumer is already split out into #816 as you requested.

Comment thread pkg/carves/utils.go
// platforms: absolute POSIX paths (Linux/macOS), Windows paths with
// backslashes and drive letters, and glob wildcards (* and ?). It
// explicitly excludes single quote, semicolon, and comment markers.
var validCarvePath = regexp.MustCompile(`^[/A-Za-z0-9._\-\\:*?]+$`)
Copy link
Copy Markdown
Collaborator

@javuto javuto May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this regex is a good idea. It will block legitimate paths with spaces, for example:

  • C:\Program Files\...
  • /Library/Application Support/...

Escaping SQL string literals would be a better approach

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — the regex is too restrictive. Spaces, parentheses, accented characters, and a bunch of legitimate path content fail it. Will replace the validation with SQL string-literal escaping in GenCarveQuery (''' so a single quote can never break out of the string literal), drop ValidCarvePath / validCarvePath, and remove the precheck call in CarvesRunHandler.

For the glob (LIKE) path I'll handle the wildcard mapping the existing comment already implies — *%, ?_ — and escape any user-supplied % / _ / \\ so a path containing them isn't accidentally interpreted as a wildcard.

Will batch this with your other comments so we don't ping-pong force-pushes on the PR. Tests will cover at least: paths with spaces (Windows and macOS), an injection attempt ('; SELECT 1; --), and glob escape of literal %.

Copy link
Copy Markdown
Collaborator

@javuto javuto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, several things to change and we can merge!

@alvarofraguas alvarofraguas force-pushed the pr/round-1-security-auth branch from 1810d3c to c9e7397 Compare May 14, 2026 17:16
…, shared rate-limit + audit-log infra

Server-side hardening for osctrl-api, plus shared infrastructure
(rate-limit package, audit-log helpers, trusted-proxies plumbing)
that osctrl-tls also consumes — its consumer-side changes ship in a
companion PR so the TLS-facing surface can be tested in isolation.

== Auth bedrock ==

cmd/api:
  - --auth=jwt is now the default. Refuse to start with --auth=none
    unless OSCTRL_INSECURE_NO_AUTH=1 is set. When opted in, a 60s
    warning ticker keeps the deployment from drifting into
    'auth-off forever'.
  - HttpOnly + Secure cookie session for SPA-style clients
    (osctrl_token). CLI clients with Authorization: Bearer continue
    to work unchanged.
  - Double-submit CSRF (osctrl_csrf cookie + X-CSRF-Token header) for
    mutating cookie-authenticated requests. CLI Bearer flows exempt.
  - JWT signing-algorithm pin (HMAC only) to defeat alg-confusion
    attacks (alg:none / RS256-with-HS256-verify).
  - JWT secret minimum 32 bytes (HS256 needs HMAC key ≥ hash output).
    Startup fails fast with the openssl one-liner if too short.
  - Strict 'forwarded headers' trust via --trusted-proxies. Empty
    default means utils.GetIP ignores X-Forwarded-For / X-Real-IP —
    an internet attacker can't spoof IPs to defeat rate-limits or
    poison audit logs.

== Env secret containment + cross-env defense ==

pkg/types: new TLSEnvironmentView — the low-privilege env projection.
  Omits Secret, EnrollSecretPath, RemoveSecretPath, Certificate, Flags,
  and every other field that materially contributes to enrolling a node.

cmd/api/handlers/environments.go:
  - EnvironmentHandler now branches on access level: AdminLevel (or
    super-admin) gets the full storage struct; UserLevel gets the
    low-priv view.
  - EnvEnrollHandler / EnvRemoveHandler raised from UserLevel to
    AdminLevel — both embed the env's enroll/remove secret.
  - Both handlers log only the target name, not returnData.
  - EnvActionsHandler 'create' branch validates caller-supplied UUID
    via EnvUUIDFilter (rejects malformed) and EnvExists (rejects
    collision). 'delete' branch gets the same validation for symmetry.

cmd/api/handlers/queries.go: QueryResultsHandler now precheck-validates
  the named query belongs to env.ID via h.Queries.Exists(name, env.ID)
  and returns 404 otherwise. logging.GetQueryResults filtered on 'name'
  only, so without this gate a user with QueryLevel on env A could
  pull results from env B by passing B's query name in A's URL.

pkg/environments/environments.go: tighten EnvUUIDFilter regex and add
  axis-pure Exists/UUIDExists helpers so handler checks can match the
  router's expectations exactly.

== Shared rate-limit + audit-log infrastructure ==

pkg/ratelimit (new): per-key token-bucket rate limiter with idle
  eviction. Used by osctrl-api for /login here, and by osctrl-tls for
  /enroll in the companion PR. Tunable burst, window, and key
  function (KeyByIP today; KeyByIPAndEnv available).

pkg/auditlog/audit.go: FailedLogin + FailedEnroll helpers — a clean
  stream of authn/enrol failures for SoC tooling to alert on
  brute-force, password-spray, and enroll abuse.

pkg/utils/http-utils.go: SetTrustedProxies + an updated GetIP that
  honors the trusted-proxies set. Empty (default) ignores
  X-Forwarded-For / X-Real-IP entirely.

== SQL hardening + carve path safety ==

pkg/carves/utils.go: new ValidCarvePath regexp gate. Without this gate
  a CarveLevel operator could pass \`'; SELECT 1; --\` and pivot 'carve
  a file' into 'run any SELECT against your fleet' via GenCarveQuery's
  string concat.

cmd/api/handlers/carves.go (CarvesRunHandler): path validated before
  the SQL splice. Rejected paths return 400.

== Authz + audit-log hardening ==

pkg/users:
  - bcrypt cost raised from default (10) to 12. CheckLoginCredentials
    opportunistically re-hashes existing users at next login (no
    password reset needed). Rehash failure is non-fatal.
  - New ClearToken empties APIToken AND CSRFToken so any existing JWT
    + CSRF cookie pair stops validating. Used by future
    DELETE /api/v1/users/{username}/token in a follow-up PR.

cmd/api/handlers/{users,settings,environments}.go: authz tightenings
  around permission writes, settings PATCH, and env-action service-name
  validation.

pkg/environments/env-cache.go: keep the 2h cleanup interval; introduce
  an envCacheTTL constant so the value is self-documenting and tunable
  locally without changing runtime defaults.

== Defaults + ops ==

deploy/config/{api,admin}.yml: flip --audit-log default to true so
  audit log writes are on by default. Operators can disable with
  --audit-log=false.

Verified: go build ./... clean, go vet ./... clean, go test ./pkg/...
./cmd/api/... ./cmd/tls/... all green.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

✨ enhancement New feature or request osctrl-api osctrl-api related changes 🔐 security Security related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants