feat(fleetnode): discovery scan + ReportDiscoveredDevices agent#256
feat(fleetnode): discovery scan + ReportDiscoveredDevices agent#256ankitgoswami wants to merge 18 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e357c802ad
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| rpc UploadTelemetry(stream UploadTelemetryRequest) returns (UploadTelemetryResponse); | ||
| rpc UploadEvents(stream UploadEventsRequest) returns (UploadEventsResponse); | ||
| rpc UploadHeartbeat(UploadHeartbeatRequest) returns (UploadHeartbeatResponse); | ||
| rpc ReportDiscoveredDevices(ReportDiscoveredDevicesRequest) returns (ReportDiscoveredDevicesResponse); |
There was a problem hiding this comment.
Implement the discovered-device RPC before enabling scans
With this RPC added to the gateway contract, the fleetnode daemon now calls ReportDiscoveredDevices, but the fleetd handler still only embeds UnimplementedFleetNodeGatewayServiceHandler and a repo-wide search found no concrete ReportDiscoveredDevices method outside generated/test code. In any environment running discovery (--scan-cidr with ports/plugins), every scan report will receive CodeUnimplemented from the server and sendDiscoveryChunk logs then drops the batch, so discovered devices are never persisted or pairable.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Implements the on-prem fleet-node agent side of RFC 0001 phase 2: a periodic discovery scan that probes user-supplied CIDRs with plugin-loaded discoverers and batches results into the new ReportDiscoveredDevices gateway RPC, running in parallel with the existing heartbeat loop. Adds the matching protos (also duplicated in #235) and exports ipscanner.GenerateIPsFromCIDR so the agent reuses the upstream CIDR enumerator.
Changes:
- New
fleetnode runflags (--scan-cidr,--scan-port,--discovery-interval) and a discovery goroutine with bounded-concurrency probes, 60s tick timeout, 1024-per-batch chunking, andauto:/mac:/serial:identifier synthesis. - Heartbeat and discovery goroutines share
RunCmd.stateMu;refreshAndSavehandshakes against a shallowStatecopy so token reads don't block on the 2-RPC refresh. ipscanner.generateIPsFromCIDRis exported asGenerateIPsFromCIDRand consumed by the agent'senumerateCIDR, which layers a 4096-host cap on top of the existing 16-host-bit safety cap.
Reviewed changes
Copilot reviewed 10 out of 17 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| proto/fleetnodegateway/v1/fleetnodegateway.proto | Adds ReportDiscoveredDevices RPC and DiscoveredDeviceReport message with validation. |
| proto/fleetnodeadmin/v1/fleetnodeadmin.proto | Adds pairing/unpairing/list admin RPCs (duplicated in #235). |
| server/cmd/fleetnode/run.go | Adds discovery flags, plugin bootstrap, parallel heartbeat/discovery goroutines, mutex-guarded session refresh. |
| server/cmd/fleetnode/discovery.go | New file: probe orchestration, chunked report sending, identifier synthesis, plugin discoverer wiring. |
| server/cmd/fleetnode/discovery_test.go | New table-driven and integration tests covering CIDR enumeration, batching, error paths, and parallel heartbeat+discovery. |
| server/cmd/fleetnode/run_test.go | Extends stubGatewayClient with discovery call tracking. |
| server/cmd/fleetnode/fake_gateway_test.go | Adds ReportDiscoveredDevices handler to the fake gateway. |
| server/internal/domain/ipscanner/network_utils.go | Exports GenerateIPsFromCIDR and clarifies its doc comment. |
| server/internal/domain/ipscanner/scanner.go | Updates internal caller to the exported name. |
| server/internal/domain/ipscanner/network_utils_test.go | Renames test references to the exported function. |
| server/generated/grpc/fleetnodegateway/v1/*.go | Regenerated for the new RPC and messages. |
| server/generated/grpc/fleetnodeadmin/v1/*.go | Regenerated for the new pairing RPCs. |
| client/src/protoFleet/api/generated/**/*.ts | Regenerated TS bindings mirroring the proto additions. |
| server/sdk/v1/python/proto_fleet_sdk/generated/pb/driver_pb2_grpc.py | Regenerated GRPC version constant update. |
🔐 Codex Security Review
Review SummaryOverall Risk: HIGH Findings[HIGH] Fleetnode Executes Discovery Against Unrestricted Targets
[MEDIUM] Windows Plugin Loading Skips ACL and File Validation
NotesNo SQL, frontend rendering, Rust ASIC plugin, pool/wallet, or protobuf wire-format changes were present in the reviewed diff. The changed TLS/h2c client split and redirect rejection look security-positive. Generated by Codex Security Review | |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9216c9d33f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| rpc UploadTelemetry(stream UploadTelemetryRequest) returns (UploadTelemetryResponse); | ||
| rpc UploadEvents(stream UploadEventsRequest) returns (UploadEventsResponse); | ||
| rpc UploadHeartbeat(UploadHeartbeatRequest) returns (UploadHeartbeatResponse); | ||
| rpc ReportDiscoveredDevices(ReportDiscoveredDevicesRequest) returns (ReportDiscoveredDevicesResponse); |
There was a problem hiding this comment.
Add discovery reports to fleet-node auth
This new RPC is called by the fleetnode agent with its session token via NewAuthenticatedGatewayClient, but server/internal/handlers/interceptors/config.go only lists telemetry/events/heartbeat/control in FleetNodeAuthenticatedProcedures. For ReportDiscoveredDevices, FleetNodeAuthInterceptor therefore never resolves the fleet-node session and the normal auth interceptor treats the bearer value as a user API key, so discovery reports are rejected before they can reach the handler; sendDiscoveryChunk then drops Unauthenticated batches and the feature never persists scanned devices.
Useful? React with 👍 / 👎.
Replace the autonomous CIDR-scanning daemon with a control-stream consumer. The agent opens a ControlStream after enrollment is CONFIRMED, sends a Hello, waits for Accepted, then runs each DiscoverRequest the server dispatches as a ControlCommand and streams results back via ReportDiscoveredDevices with the matching command_id. A ControlAck closes each command. - New control.go: runControlLoop with exponential backoff reconnect, runControlSession handshake + receive pump, handleCommand decode + probe + report + ack, runProbes with bounded concurrency, streamReports chunked at server's max, plus the discoverer interface + pluginDiscoverer implementation (moved from the old discovery.go). - Only IPList mode is accepted on the agent. The server expands CIDR/IPRange into IPList before dispatching (see PR B). MDNS and Nmap are rejected with a descriptive ack error. - run.go: drop --scan-cidr/--scan-port/--discovery-interval and the autonomous discovery loop. Add --plugins-dir-driven control loop alongside the heartbeat loop in the same goroutine pattern. Permanent errors (NotFound, ErrBeginAuthRejected) propagate so the daemon exits rather than reconnecting forever. Tests cover happy path (probe + report + ack with correlated command_id), MDNS rejection, and reconnect after stream EOF. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
9216c9d to
977b22d
Compare
Add zero-config defaults so \`fleetnode run\` works out of the box for the shipped layout, with safe fallback behavior when the layout isn't there. --plugins-dir - Default to <dir-of-binary>/plugins. Missing dir means the control loop stays off (heartbeat only) instead of failing — operators can use the agent without bundling plugins. - A present-but-unsafe dir is a hard error rather than a silent downgrade: the plugin manager execs anything in the dir as the agent's user, so any non-owner write capability there is RCE-equivalent. - Perm check on Unix: owner must be root or the running uid; no group- or world-writable bits. Windows relies on the MSI installer placing the dir under an Administrator-only path and only checks existence here. --nmap-path - Adds nmap support to the agent. NmapModeRequest now executes via the Ullaakut nmap wrapper, then the plugin probe runs against each open host:port (mirrors combined-mode pairing.service). - Default path resolution: explicit flag wins; otherwise prefer <dir-of-binary>/nmap (for the bundled-binary install) if executable; otherwise return "nmap" and let exec.LookPath resolve from PATH at scan time. - Flag must be absolute and point at an executable file. handleCommand now switches on the request mode: IPList runs the existing plugin probe sweep, Nmap runs the new nmap path, and MDNS is rejected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the Nmap rejection in DiscoverOnFleetNode. With the agent shipping a bundled nmap binary (#256), NmapModeRequest now passes through unchanged so the agent can execute it. MDNS still rejected; the agent doesn't run an mDNS listener. Server-side validation: require nmap.target to be non-empty so an all-zero NmapModeRequest is caught before round-tripping through the control stream. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply ce-simplify findings:
- Extract fanOutProbes helper so the bounded-concurrency probe loop is
shared between IPList mode (runProbes inline) and the nmap follow-up
loop (runNmapDiscovery). The two had identical sem/wg/mu structure
with different concurrency caps; now both use the same helper with
their own constants.
- Introduce endpoint{ip, port} struct so IPList ports-x-ips and nmap's
open-host list both feed fanOutProbes through one type.
- Drop fleetNodeID parameter from streamReports / handleCommand. The
ID was used only in log lines; runControlLoop now enriches the
logger with fleet_node_id once per session and downstream callers
inherit it.
- Use nmap.Open constant instead of "open" string literal.
- Cache executableDir() once in run() instead of calling it twice
(once for resolvePluginsDir, once for resolveNmapPath).
- Reset backoff after a stable control session. Previously a single
successful reconnect followed by an immediate disconnect kept
growing backoff toward the 30s cap; now a session that lives past
stableSessionThreshold resets backoff to 1s on the next failure.
- Strip narrative doc comments (control loop, nmap, plugins-dir) that
restated the implementation. Keep WHY comments where the rationale
isn't obvious (plugin-manager exec security, nmap PATH-fallback
contract, plugins-dir RCE).
Build, vet, lint, and tests green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`just build-fleetnode` now wires up the layout the agent expects out of the box: server/plugins/ is already adjacent (populated by `just build-plugins`), and this recipe additionally symlinks server/nmap from whatever is on PATH so the agent's NmapMode probe path works without --nmap-path. Both have soft warnings rather than hard failures when the underlying tools aren't present (plugins not yet built, nmap not installed), so developers without the full stack can still build the binary. .gitignore picks up server/nmap (server/fleetnode and server/plugins/* were already ignored). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous build-fleetnode recipe dropped the binary at server/fleetnode and relied on server/plugins/ being adjacent. That breaks the common dev workflow: `just dev` cross-compiles plugins to linux/arm64 inside server/plugins/ for the Docker server container, and the native-host agent can't exec ELF binaries (`exec format error` on every load). Move the agent's full local layout into server/.fleetnode/: - ./.fleetnode/fleetnode (native binary) - ./.fleetnode/plugins/ (native Go plugins via _build-go-plugins-native) - ./.fleetnode/nmap (symlink to system nmap) `server/plugins/` is left untouched for `just dev`'s container, and the agent's binary-adjacent default still Just Works (`./.fleetnode/fleetnode run --state-dir=...`) without any flags. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ControlStream RPC is bidi; HTTP/1.1 can't carry it, so the agent's gateway client now uses an h2c-prior-knowledge HTTP/2 transport that matches the fleetd server's h2c.NewHandler. Without this the agent's control loop hits "HTTP status 505 HTTP Version Not Supported" on every reconnect attempt while unary heartbeats keep working (HTTP/1.1 was fine for unary). Side effects: - http.Client.Timeout is removed (it would cap long-lived streams). Per-call deadlines added: heartbeatTimeout in sendHeartbeat, handshakeStepTimeout in RunHandshake's two RPC calls. Existing per-call timeouts (discoveryReportTimeout, ControlStream ctx) already cover their paths. - Test servers wrapped in h2c.NewHandler so they negotiate HTTP/2 prior knowledge like prod does. - TestGatewayHTTPClient_HasTimeout removed -- the constant it asserted against no longer exists. Production HTTPS support is deferred until there's a target prod deployment; the agent currently only speaks h2c plaintext. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two complementary defenses so the agent doesn't leave go-plugin children running when it dies. 1. Catch SIGHUP alongside SIGINT/SIGTERM. Closing the terminal sends SIGHUP; without it the parent dies uncaught and 'defer cleanup()' never fires, so the plugin manager's Shutdown is skipped and the children are reparented to init. 2. Add a startup reaper. On Unix, before constructing the new plugin manager, scan `ps -eo pid,ppid,command` for processes whose parent is init (ppid == 1) and whose binary lives in the resolved plugins directory; SIGKILL them. Covers the uncatchable cases (SIGKILL, panic, OOM) that the SIGHUP fix can't. Windows is a documented no-op: go-plugin children there inherit a job object that terminates them with the parent. Both run early enough to keep the failure mode the user just hit (stranded proto-plugin blocking the next agent's plugin handshake) from happening again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ppid==1 filter was too narrow. Real-world cases where it missed: - A previous agent's children whose zombie parent hadn't been collected yet, so the kernel had not re-parented them to init at the moment the new agent's reaper ran. - Children adopted by an arbitrary surviving process tree (e.g. a debug wrapper or shell job) instead of init. The reaper runs before newPluginDiscoverer, so by definition the current agent has not yet spawned anything in pluginsDir. Anything we see whose executable lives there is leftover from a prior run, regardless of ppid. Drop the filter and exclude only the agent's own PID for safety. Concurrent agents on the same pluginsDir are already impossible because of state.lock contention, so widening the reaper here cannot affect a legitimate sibling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes to prevent the silent-hang footgun we just hit. 1. Lock acquisition is non-blocking (LOCK_EX|LOCK_NB). On contention, read the owner PID recorded inside the lockfile and either tell the operator who to kill or, if the recorded PID is dead, hint that a subprocess inherited the FD. Previously the lock would block forever with no log output -- the new behavior is a clear error in <100ms. 2. Log "acquiring state lock" at the call site so the agent always has a visible checkpoint between plugin startup and lock acquisition. The user-facing symptom of the earlier hang was zero log output; now there's always a last-line indicating exactly how far startup got. 3. Move the plugin reaper + newPluginDiscoverer INSIDE the state lock. The reaper is broad enough to kill any plugin process in our pluginsDir; if a sibling agent already owns the lock, running the reaper would murder its plugin children. Reaping only after we own the lock guarantees we're the authoritative agent and any leftover plugins are real strays. Verified with a smoke test: two `fleetnode run` invocations against the same state-dir -- second errors in ~60ms with the new message and the first agent's daemon + plugins survive untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bidi http2 response body parks in sync.Cond.Wait, which is not ctx-aware. So stream.Receive() in the control loop would block past the parent ctx's lifetime even after SIGINT triggered cancellation -- Ctrl+C left the daemon running indefinitely. Wrap the session with a tiny watcher goroutine: when ctx fires, force stream.CloseRequest() + CloseResponse(). That broadcasts the cond and Receive returns with "canceled: http2: response body closed". Session cleans up, control loop exits on its outer ctx check, runLocked returns, plugin manager shuts down. Also drop the plugin-manager grace period from 5s to 2s. Shutdown waits the full timer regardless of whether the plugin already exited, so 5s per plugin was 10s of dead time on every clean exit. 2s keeps the safety net for genuinely stuck plugins while bringing clean shutdown to ~4s total. Smoke-tested: SIGINT now returns in ~4s vs hanging indefinitely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Some IDE terminal panels and shell setups silently filter or buffer stderr (FD 2), making the agent appear hung when it was logging the whole time. Slog now writes to stdout, matching the convention used by other CLI daemons (caddy, prometheus, vault). Redirecting with `2>&1` and piping through tee still work; the only behavior change is that the structured-log stream is now on FD 1 instead of FD 2. go-plugin's hclog noise (debug `plugin address` lines, plugin manager Shutdown messages) stays on stderr -- those are secondary and we want the agent's own logs to win the visible-by-default slot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Extract testutil.NewH2CServer / testutil.NewH2CClient so the five
inline httptest.NewUnstartedServer(h2c.NewHandler(...)) + h2c
transport copies across cmd/fleetnode and fleetnodebootstrap test
files share one canonical form.
- Collapse the symmetric handshake.go WithTimeout pair into a
withHandshakeTimeout helper; the previous inline pattern was easy
to forget on a future third step.
- Swap the defer order in runControlSession so close(done) runs
before the stream-close defer. The watcher goroutine now retires
via its quiet path on every normal return, avoiding a redundant
close race when ctx happens to cancel at the same moment.
- Replace the no-op `if err := writeLockOwnerPID(f); err != nil
{ _ = err }` block with `_ = writeLockOwnerPID(f)` plus a single
one-liner explaining the best-effort intent.
- Trim narrative comments on newGatewayHTTPClient, runControlSession,
reapOrphanedPlugins, and the stdout-logging Run shim. Keep only
the non-obvious WHY.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pass through the recent code aggressively removing comments that restated implementation, narrated change history, or padded the WHY into multi-sentence prose. Project preference is default to no comments; non-obvious rationale stays but in one or two lines. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vnish (and the rest of the asic-rs firmware family) live in the Rust asicrs plugin, not the Go antminer plugin. Until now build-fleetnode only staged the Go plugins, so the agent couldn't discover any asic-rs-only firmware on macOS dev. - Parameterize _asicrs-build with outdir (default kept at server/plugins so build-plugins/test-contract are unaffected). - Build natively via local cargo on Darwin instead of docker buildx; the docker path produces a Linux ELF that macOS can't exec. Linux hosts (CI) keep the docker path so they don't need Rust installed. - Add the new dep to build-fleetnode so server/.fleetnode/plugins/ contains asicrs-plugin + asicrs-config.yaml alongside the Go plugins. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…README - HTTPS gateway client is now scheme-aware: TLS-validating http2.Transport for https://, h2c for http://. Prior shared DialTLSContext shim silently bypassed TLS for https://, defeating ValidateServerURL. - validateNmapTarget restricts server-supplied targets to IP/CIDR/range/ hostname; rejects leading dashes and shell metacharacters so a compromised server cannot pivot via nmap flags like -iL/etc/passwd or -oN. - Drop --plugins-dir and --nmap-path CLI flags. Installer always uses <exe-dir>/plugins and <exe-dir>/nmap; operator overrides only widen the attack surface. nmap falls back to PATH for dev machines. - Add server/cmd/fleetnode/README.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Per-file plugin validation: lstat each entry under <exe-dir>/plugins, reject symlinks, require regular file owned by root or current uid with no group/world write. Container-only check missed writable individual binaries and symlinks to elsewhere. - Bound discovery commands: cap IPs at 1024, ports at 10 (matches pairing.MaxPortsPerIP), wall-clock at 10m (matches pairing default). Reject oversized requests with a clear error. - IPv6 nmap: detect IPv6 literal / hostname-only-IPv6 and pass -6 via nmap.WithIPv6Scanning(); reject IPv6 CIDR consistently with pairing service. Extract buildNmapOptions for testability. - Per-RPC timeout on Register: wrap with withHandshakeTimeout so blackholed servers can't hang enroll while it holds the state lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pairing service rejects scoped IPv6, rejects link-local IPv6, canonicalizes IPv6 spelling, and resolves hostnames to IP literals before probing. The fleetnode agent previously passed server-supplied ip_addresses straight through, so hostname inputs probed successfully but then failed ReportDiscoveredDevices buf.validate (ip: true), and scoped or link-local IPv6 inputs reached the dialer with no chance of succeeding. - New helper netutil.NormalizeIPListEntry with an IPListResolver interface so DNS can be stubbed in tests. - Sentinel errors (ErrEmptyTarget, ErrScopedIPv6, ErrLinkLocalIPv6, ErrHostnameUnresolved) so callers can branch on cause when logging skips. - pairing/service.go::DiscoverWithIPList: ~45 lines of inline normalization replaced with a single call to the helper. - cmd/fleetnode/control.go: normalize each ip_address before fan-out, skip failures and log, ack failure if no entry normalizes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous cap on len(ports) didn't prevent a single entry from
expanding to thousands of probes: nmap honors range ("1-65535") and
comma ("80,443,8080,...") syntax inside a single string. Plugin-default
ports were also unchecked.
- resolveAndValidatePorts requires every entry to be a decimal TCP port
in 1..65535. Range syntax, comma syntax, protocol prefixes ("T:80"),
whitespace, and service names are rejected.
- Dedupes the result and re-caps at maxPortsPerIP.
- Applied uniformly across IpList and Nmap dispatch, and to the
plugin-supplied default fallback.
- runNmapDiscovery / buildNmapOptions now take the validated port list
so they never see the raw request.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds the on-prem fleet-node agent's control-stream consumer (RFC 0001 phase 2). After enrollment is
CONFIRMED, the agent opensControlStream, executes thepairing.v1.DiscoverRequestthe server dispatches, streams results viaReportDiscoveredDevicescorrelated bycommand_id, and acks.Pairs with #235 (server-side
ControlStreamhandler,DiscoverOnFleetNodeadmin RPC, in-memory registry). Wire contracts merged in #257. Until #235 lands, the agent's control loop receivesUnimplementedand reconnects with backoff — degrades quietly. PR order between #235 and this one doesn't matter.Agent invocation
fleetnode run— no scan flags needed. Plugins resolved from<exe-dir>/plugins; nmap from<exe-dir>/nmaporPATHat scan time. State, enrollment, and heartbeat configuration via existing flags.Control loop (
control.go)runControlLoopopens the stream and reconnects with exponential backoff (1s → 30s cap). Permanent errors (CodeNotFound,ErrBeginAuthRejected) propagate so the daemon exits rather than spinning.runControlSessionperformsHello→Acceptedhandshake, then receivesControlCommands. A watcher goroutine closes the stream onctx.Donebecausehttp2.pipe.Readblocks on async.Condthat ctx alone cannot unblock.handleCommanddecodes the payload aspairing.v1.DiscoverRequestand dispatches:IPListModeRequest→ bounded-concurrency probes via the plugin manager (32 in-flight, 3s per probe).NmapModeRequest→runNmapDiscoveryshells out viaUllaakut/nmap/v3, then runs plugin probes against every open host:port (16 in-flight). 10-minute overall scan budget.MDNSModeRequest→ rejected via ackerror_message.streamReportschunks at the server'smax_items = 1024limit and setscommand_idso the operator stream wakes up.sync.MutexguardsSessionToken;refreshAndSavehandshakes against a shallowStatecopy so the 2-RPC refresh does not stall token reads.Plugins
Loaded from
<exe-dir>/plugins(fixed; no CLI flag). Missing dir → heartbeat-only mode. Present-but-unsafe → hard error: the plugin manager execs every file there as the agent uid, so non-owner write capability is RCE-equivalent. UnixcheckPluginsDirPermsrefuses non-root/non-self owners or any group/world write bits. Windows production deployments must install the binary under an Administrator-only directory (e.g.,%ProgramFiles%\fleetnode\) soplugins\inherits a safe ACL.Nmap
<exe-dir>/nmapif present, otherwisenmapfrom PATH at scan time. Server-supplied targets pass throughvalidateNmapTarget, which accepts only IP / CIDR /A.B.C.D-Nrange / hostname. Leading dashes, whitespace, and shell metacharacters are rejected — a compromised server cannot pivot via flags like-iL/etc/passwdor-oN /tmp/loot.Transport security
newGatewayHTTPClientis scheme-aware:http://uses h2c (loopback streaming),https://uses a TLS-validatinghttp2.Transport. A sharedAllowHTTP + DialTLSContextshim would silently downgradehttps://to plaintext, defeatingValidateServerURL's non-loopback HTTPS requirement.Identifier synthesis
Production SDK drivers leave
DeviceIdentifierempty. The agent synthesizes:mac:<addr>when MAC is exposedserial:<sn>when only serial is exposedauto:<uuid>otherwise (server reconciles by(fleet_node, ip, port))Docs
server/cmd/fleetnode/README.md covers overview, subcommands, state/lock semantics, plugins, nmap, control-stream flow, build, run, enrollment, security model, development, troubleshooting.
Tests
control_test.go: happy path (probe → report → ack with correlatedcommand_id), MDNS rejection ack, reconnect after stream EOF.nmap_test.go:TestValidateNmapTarget(21 cases covering IP/CIDR/range/hostname acceptance and-iL, whitespace, null-byte, shell-metachar, leading-dash rejection); resolver tests.plugins_dir_test.go: binary-adjacent default discovery, missing-default-returns-empty, present-but-unsafe hard error.client_test.go: TLS round-trip with pinned cert succeeds; untrusted cert fails withx509; non-http(s) scheme rejected; redirects refused.run_test.go:stubGatewayClientsatisfies the extendedgatewayClientinterface (ReportDiscoveredDevices,ControlStream).Test plan
go build ./server/...go vet ./server/...golangci-lint run -c .golangci.yaml ./cmd/fleetnode/... ./internal/fleetnodebootstrap/...go test ./server/cmd/fleetnode/... ./server/internal/fleetnodebootstrap/... -race -count=1fleetnode enroll, confirm, thenfleetnode run. Verifycontrol stream openedlog. With the fake antminer on:4028, hitDiscoverOnFleetNodefrom the operator side with anipListrequest and confirm streamingDiscoverResponsecontaining the device + adiscovered_devicerow withdiscovered_by_fleet_node_idset. Re-run with annmaprequest and confirm the same.🤖 Generated with Claude Code