osctrl-api: security hardening — auth bedrock, env secret containment, TLS-side rate-limit#813
osctrl-api: security hardening — auth bedrock, env secret containment, TLS-side rate-limit#813alvarofraguas wants to merge 1 commit into
Conversation
027a183 to
1810d3c
Compare
| } | ||
| // Initialize audit log so failed enroll attempts are recorded. | ||
| // | ||
| auditLog, err := auditlog.CreateAuditLogManager(db.Conn, serviceName, flagParams.Service.AuditLog) |
There was a problem hiding this comment.
Let's separate all changes for osctrl-tls in its own PR, so it is easier to isolate if something needs to be tested.
There was a problem hiding this comment.
Done — carved the osctrl-tls changes (cmd/tls/handlers/{handlers,post,utils}.go, cmd/tls/main.go, deploy/config/tls.yml) into their own PR: #816. This PR's diff drops to 29 osctrl-api + shared-infra files (no cmd/tls/ touches).
The shared infra they consume (pkg/auditlog.FailedEnroll, pkg/ratelimit, utils.SetTrustedProxies / updated GetIP) stays here, since osctrl-api also uses it. #816 stacks on this PR — its diff will collapse to just the 5 TLS files once this lands. PRs #814 and #815 rebased onto the new tip.
| // osctrl-tls restart, and the steady-state load is "one read per | ||
| // minute per env" — trivial for osctrl-tls's read traffic. | ||
| // | ||
| envCacheTTL = 60 * time.Second |
There was a problem hiding this comment.
Too low, I prefer to leave it at 2 hours.
| Value: AuthNone, | ||
| Usage: "Authentication mechanism for the service", | ||
| Value: AuthJWT, | ||
| Usage: "Authentication mechanism for the service (jwt|none — `none` requires OSCTRL_INSECURE_NO_AUTH=1)", |
There was a problem hiding this comment.
Leave usage as Authentication mechanism for the service, the rest is invalid.
There was a problem hiding this comment.
Quick confirmation needed before I revert the Usage text: the same diff also changes Value: AuthNone → Value: AuthJWT on the line above (defaulting the API to JWT auth instead of no-auth, gated by OSCTRL_INSECURE_NO_AUTH=1 for local dev). Your comment is on the Usage string only — do you want me to keep the default change at AuthJWT, or revert the default to AuthNone as well?
Context for the default change: it closes the "API is unauthenticated by default" finding from the recent security audit (the related guardAuthMode helper in cmd/api/main.go is its runtime enforcement). Happy to revert if you'd prefer to land the default change as a separate, narrower PR.
There was a problem hiding this comment.
Got it — you're right, and I missed that. The --auth flag is in the shared initServiceFlags so my default flip applied to all three services, including osctrl-tls. osctrl-tls's runtime never reads the field (osquery agents authenticate via the per-environment enroll secret, not JWT), so the practical impact is zero, but osctrl-tls --help now misleadingly shows auth (default: jwt) which is wrong semantically.
Proposing the cleanest fix: split the --auth flag out of the shared initServiceFlags into the three per-service Init*Flags functions, with per-service defaults:
osctrl-api→AuthJWT(closes U-AUTH-1 — silent unauthenticated default was the audit blocker)osctrl-admin→AuthDB(its existing default, browser login form)osctrl-tls→AuthNone(correct for it, as you said)
Plus the Usage text reverted to Authentication mechanism for the service per your other comment.
The guardAuthMode helper stays in cmd/api/main.go only — it's specific to osctrl-api's audit finding. Want me to land this as part of #813, or as a small standalone PR you can review independently?
| @@ -0,0 +1,144 @@ | |||
| // Package ratelimit provides a small token-bucket rate-limit middleware | |||
There was a problem hiding this comment.
Same as mentioned in tls, these better be in a separate PR for osctrl-tls
There was a problem hiding this comment.
Quick clarification before I move pkg/ratelimit: this PR also wires it into osctrl-api on the /login endpoint (10 req/min strict for credential POSTs, separate 60 req/min for pre-auth GETs like the env list). The package itself has two consumers — osctrl-api here and osctrl-tls in #816.
Two options:
(a) Keep pkg/ratelimit here as shared infrastructure (current state) and only the consumer code for osctrl-tls lives in #816. The package is then symmetric with pkg/auditlog (also shared, with both FailedLogin and FailedEnroll helpers).
(b) Move pkg/ratelimit entirely to #816 and drop the /login rate-limit from osctrl-api in this PR. The brute-force / password-spray protection on /login would then either ship later in a follow-up or not at all.
Which would you prefer? If (b), happy to land the api-side /login rate-limit as a separate small PR after these two settle.
There was a problem hiding this comment.
We can do (a) and since it is limited to osctrl-api, the pkg will be used later by osctrl-tls or whatever. Thanks!
There was a problem hiding this comment.
👍 Confirmed — keeping pkg/ratelimit here as shared infra. No code change needed; the osctrl-tls consumer is already split out into #816 as you requested.
| // platforms: absolute POSIX paths (Linux/macOS), Windows paths with | ||
| // backslashes and drive letters, and glob wildcards (* and ?). It | ||
| // explicitly excludes single quote, semicolon, and comment markers. | ||
| var validCarvePath = regexp.MustCompile(`^[/A-Za-z0-9._\-\\:*?]+$`) |
There was a problem hiding this comment.
Not sure this regex is a good idea. It will block legitimate paths with spaces, for example:
C:\Program Files\.../Library/Application Support/...
Escaping SQL string literals would be a better approach
There was a problem hiding this comment.
Agreed — the regex is too restrictive. Spaces, parentheses, accented characters, and a bunch of legitimate path content fail it. Will replace the validation with SQL string-literal escaping in GenCarveQuery (' → '' so a single quote can never break out of the string literal), drop ValidCarvePath / validCarvePath, and remove the precheck call in CarvesRunHandler.
For the glob (LIKE) path I'll handle the wildcard mapping the existing comment already implies — * → %, ? → _ — and escape any user-supplied % / _ / \\ so a path containing them isn't accidentally interpreted as a wildcard.
Will batch this with your other comments so we don't ping-pong force-pushes on the PR. Tests will cover at least: paths with spaces (Windows and macOS), an injection attempt ('; SELECT 1; --), and glob escape of literal %.
javuto
left a comment
There was a problem hiding this comment.
Nice work, several things to change and we can merge!
1810d3c to
c9e7397
Compare
…, shared rate-limit + audit-log infra
Server-side hardening for osctrl-api, plus shared infrastructure
(rate-limit package, audit-log helpers, trusted-proxies plumbing)
that osctrl-tls also consumes — its consumer-side changes ship in a
companion PR so the TLS-facing surface can be tested in isolation.
== Auth bedrock ==
cmd/api:
- --auth=jwt is now the default. Refuse to start with --auth=none
unless OSCTRL_INSECURE_NO_AUTH=1 is set. When opted in, a 60s
warning ticker keeps the deployment from drifting into
'auth-off forever'.
- HttpOnly + Secure cookie session for SPA-style clients
(osctrl_token). CLI clients with Authorization: Bearer continue
to work unchanged.
- Double-submit CSRF (osctrl_csrf cookie + X-CSRF-Token header) for
mutating cookie-authenticated requests. CLI Bearer flows exempt.
- JWT signing-algorithm pin (HMAC only) to defeat alg-confusion
attacks (alg:none / RS256-with-HS256-verify).
- JWT secret minimum 32 bytes (HS256 needs HMAC key ≥ hash output).
Startup fails fast with the openssl one-liner if too short.
- Strict 'forwarded headers' trust via --trusted-proxies. Empty
default means utils.GetIP ignores X-Forwarded-For / X-Real-IP —
an internet attacker can't spoof IPs to defeat rate-limits or
poison audit logs.
== Env secret containment + cross-env defense ==
pkg/types: new TLSEnvironmentView — the low-privilege env projection.
Omits Secret, EnrollSecretPath, RemoveSecretPath, Certificate, Flags,
and every other field that materially contributes to enrolling a node.
cmd/api/handlers/environments.go:
- EnvironmentHandler now branches on access level: AdminLevel (or
super-admin) gets the full storage struct; UserLevel gets the
low-priv view.
- EnvEnrollHandler / EnvRemoveHandler raised from UserLevel to
AdminLevel — both embed the env's enroll/remove secret.
- Both handlers log only the target name, not returnData.
- EnvActionsHandler 'create' branch validates caller-supplied UUID
via EnvUUIDFilter (rejects malformed) and EnvExists (rejects
collision). 'delete' branch gets the same validation for symmetry.
cmd/api/handlers/queries.go: QueryResultsHandler now precheck-validates
the named query belongs to env.ID via h.Queries.Exists(name, env.ID)
and returns 404 otherwise. logging.GetQueryResults filtered on 'name'
only, so without this gate a user with QueryLevel on env A could
pull results from env B by passing B's query name in A's URL.
pkg/environments/environments.go: tighten EnvUUIDFilter regex and add
axis-pure Exists/UUIDExists helpers so handler checks can match the
router's expectations exactly.
== Shared rate-limit + audit-log infrastructure ==
pkg/ratelimit (new): per-key token-bucket rate limiter with idle
eviction. Used by osctrl-api for /login here, and by osctrl-tls for
/enroll in the companion PR. Tunable burst, window, and key
function (KeyByIP today; KeyByIPAndEnv available).
pkg/auditlog/audit.go: FailedLogin + FailedEnroll helpers — a clean
stream of authn/enrol failures for SoC tooling to alert on
brute-force, password-spray, and enroll abuse.
pkg/utils/http-utils.go: SetTrustedProxies + an updated GetIP that
honors the trusted-proxies set. Empty (default) ignores
X-Forwarded-For / X-Real-IP entirely.
== SQL hardening + carve path safety ==
pkg/carves/utils.go: new ValidCarvePath regexp gate. Without this gate
a CarveLevel operator could pass \`'; SELECT 1; --\` and pivot 'carve
a file' into 'run any SELECT against your fleet' via GenCarveQuery's
string concat.
cmd/api/handlers/carves.go (CarvesRunHandler): path validated before
the SQL splice. Rejected paths return 400.
== Authz + audit-log hardening ==
pkg/users:
- bcrypt cost raised from default (10) to 12. CheckLoginCredentials
opportunistically re-hashes existing users at next login (no
password reset needed). Rehash failure is non-fatal.
- New ClearToken empties APIToken AND CSRFToken so any existing JWT
+ CSRF cookie pair stops validating. Used by future
DELETE /api/v1/users/{username}/token in a follow-up PR.
cmd/api/handlers/{users,settings,environments}.go: authz tightenings
around permission writes, settings PATCH, and env-action service-name
validation.
pkg/environments/env-cache.go: keep the 2h cleanup interval; introduce
an envCacheTTL constant so the value is self-documenting and tunable
locally without changing runtime defaults.
== Defaults + ops ==
deploy/config/{api,admin}.yml: flip --audit-log default to true so
audit log writes are on by default. Operators can disable with
--audit-log=false.
Verified: go build ./... clean, go vet ./... clean, go test ./pkg/...
./cmd/api/... ./cmd/tls/... all green.
c9e7397 to
2610744
Compare
Summary
Server-side security hardening across the API and TLS surfaces of
osctrl-api/osctrl-tls. Bundled as one PR because the pieces share a coherent threat model (anonymous attack surfaces, cross-tenant exposure, credential-bearing payloads). No API contract changes for existing clients — JSON shapes, route paths, and request bodies are unchanged.End-to-end tested against a local Kali docker deployment.
Themes
Auth bedrock
--auth=jwtis now the default.--auth=nonerequires the operator to explicitly setOSCTRL_INSECURE_NO_AUTH=1; a 60s warning ticker keeps the deployment from drifting into "auth-off forever".osctrl_token). CLI clients usingAuthorization: Bearercontinue to work unchanged.osctrl_csrfcookie +X-CSRF-Tokenheader) for mutating cookie-authenticated requests. CLI Bearer flows are exempt.alg:noneand RS-vs-HS confusion).openssl rand -base64 48one-liner if the configured secret is shorter.--trusted-proxiesflag. Empty default meansutils.GetIPignoresX-Forwarded-For/X-Real-IP, so an internet attacker can't spoof IPs to defeat rate limits or poison audit logs.Env secret containment + cross-env defense
types.TLSEnvironmentViewlow-privilege projection. OmitsSecret,EnrollSecretPath,RemoveSecretPath,Certificate,Flags, and every other field that materially contributes to enrolling a node.EnvironmentHandlerbranches on access level: AdminLevel (or super-admin) gets the full storage struct, UserLevel gets the safe projection.EnvEnrollHandler/EnvRemoveHandlerraised from UserLevel to AdminLevel — both embed the env's enroll/remove secret in their responses.EnvActionsHandlercreate/deletebranches validate caller-supplied UUIDs viaEnvUUIDFilterandExistsByUUID.QueryResultsHandlernow precheck-validates the named query belongs to the path env. Without this, a user with QueryLevel on env A could pull results from env B by passing B's query name in A's URL.TLS-side hardening
pkg/ratelimittoken-bucket middleware with per-key (IP, IP+env, etc.) buckets, amortized eviction, and a shared overflow bucket past the per-key cap so memory stays bounded under spray.--rate-limit-rps,--rate-limit-burst,--rate-limit-window).AuditLog.FailedLogin(api-side) andAuditLog.FailedEnroll(tls-side) so SOC tooling has a clean stream for brute-force / password-spray / enroll-abuse alerts.SQL hardening + carve path safety
carves.ValidCarvePathregexp gate. Without this aCarveLeveloperator could pass'; SELECT 1; --and pivot "carve a file" into "run any SELECT against your fleet" viaGenCarveQuery's string-concat path.Authz + audit-log hardening
CheckLoginCredentialsopportunistically re-hashes existing users at next login — no password reset required, rehash failure is non-fatal.UserManager.ClearTokenemptiesAPITokenANDCSRFTokentogether so an existing JWT+CSRF pair stops validating in lockstep.pkg/environments/env-cache.go: amortized cache eviction (dropped a dead field; cache entries TTL out instead of growing unbounded).Defaults + ops
--audit-logdefault flipped totrueacrossdeploy/config/{api,admin,tls}.yml. Operators can opt out with--audit-log=false.Two intentional breaking changes worth a release-note line:
--auth=jwtis now the default. Deployments currently running--auth=nonewill fail to start unless they explicitly setOSCTRL_INSECURE_NO_AUTH=1. The help text and a 60s warning ticker make this loud. Intended forcing-function so insecure defaults don't survive an upgrade.Test plan
go build ./...— cleango vet ./...— cleango test ./...— all packages passgofmt -l ./...— emptycmd/api/auth_test.go— JSON-vs-redirect on 401cmd/api/handlers/environments_test.go— env-projection strips every secret-bearing fieldpkg/carves/utils_test.go—ValidCarvePathaccept/reject table + carve-query happy-path shapepkg/ratelimit/ratelimit_test.go— token-bucket + eviction + overflowWhat this is part of
This is round 1 of a 3-round contribution. Round 2 will add the new API endpoints the SPA needs (paginated lists, stats, saved-queries CRUD, users/permissions/tokens, env-config PATCH, audit-log filters). Round 3 will land the React frontend under a new
frontend/directory. Splitting reduces review surface per PR.