Epic: Multi-Domain Support (v1) #240
Draft
uzyn wants to merge 7 commits into
Lands the multi-domain config schema in `config.rs` (parse + validate only; no migration, no daemon/runtime changes):
- `Config.domains: Vec<String>` replaces `Config.domain: String`. Canonical shape is `domains = ["a.com", "b.com"]`; legacy `domain = "x.com"` accepted on read and normalized to a one-entry vec. Mixed `domain` + `domains` rejects with the exact wording "specify either 'domain' (singular, legacy) or 'domains' (plural), not both". Entries lowercased, case-insensitively deduplicated, RFC 1035-validated. Order significant: `domains[0]` is default.
- `[mailboxes."info@a.com"]` (FQDN-keyed) parses; key must equal `address` (case-insensitive on the domain). Legacy `[mailboxes.info]` (local-part-keyed) preserves the operator-friendly key in the in-memory map; `address` must reference a configured domain. On-disk re-keying to FQDN is deferred to the later upgrade migration so single-domain runtime paths keep working unchanged this sprint.
- `MailboxConfig::is_catchall(&self, config: &Config)` matches `*@<d>` for any `d` in `config.domains`.
- `Config.per_domain: HashMap<String, DomainOverride>` parses from `[domain."<name>"]` sub-tables (singular `domain` key, since TOML cannot let `domains` be both an array and a table). Each `DomainOverride` carries optional `signature`, `dkim_selector`, `trust`, `trusted_senders`. Dangling sub-tables reject at load. Per-domain trust validates against the same allowlist as the global trust.
- Top-level `dkim_selector` is now `Option<String>`; `Config::default_dkim_selector(&self) -> &str` resolves to `"aimx"` when unset. `Config::default_domain(&self) -> &str` returns `domains[0]`.
- 47 unit tests in `config::tests` cover every legal and rejected shape. 6 fixture configs land at `tests/fixtures/config/*.toml` with structural-invariant load tests for each.
All 1181 unit + 116 integration tests pass. `cargo clippy --all-targets -- -D warnings` and `cargo fmt -- --check` clean.
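The normalization rules in the first bullet can be sketched as follows. This is a minimal illustration, not the code in `config.rs`: `normalize_domains` and `is_valid_domain` are hypothetical names, the hostname check is a simplified stand-in for the RFC 1035 validation, and rejecting an empty list is an assumption (the description only says `domains[0]` is the default).

```rust
use std::collections::HashSet;

/// Error text quoted in the commit message for mixed `domain` + `domains`.
const MIXED_KEYS_ERR: &str =
    "specify either 'domain' (singular, legacy) or 'domains' (plural), not both";

/// Simplified hostname check (assumption, looser than a full RFC 1035 parser):
/// dot-separated labels of 1..=63 letters, digits, or hyphens, with no leading
/// or trailing hyphen, and at most 253 bytes overall.
fn is_valid_domain(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= 253
        && name.split('.').all(|label| {
            (1..=63).contains(&label.len())
                && !label.starts_with('-')
                && !label.ends_with('-')
                && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
        })
}

/// Hypothetical helper: fold the legacy `domain` key and the canonical
/// `domains` key into one ordered, lowercased, deduplicated list.
fn normalize_domains(
    legacy: Option<&str>,
    plural: Option<&[&str]>,
) -> Result<Vec<String>, String> {
    let raw: Vec<&str> = match (legacy, plural) {
        (Some(_), Some(_)) => return Err(MIXED_KEYS_ERR.to_string()),
        (Some(d), None) => vec![d],
        (None, Some(ds)) => ds.to_vec(),
        (None, None) => Vec::new(),
    };
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for entry in raw {
        let lower = entry.to_ascii_lowercase();
        if !is_valid_domain(&lower) {
            return Err(format!("invalid domain: {entry}"));
        }
        // Keep the first occurrence so the original order, and therefore
        // `domains[0]`, is preserved.
        if seen.insert(lower.clone()) {
            out.push(lower);
        }
    }
    if out.is_empty() {
        // Assumption: a config with no domain at all is rejected.
        return Err("no domain configured".to_string());
    }
    Ok(out)
}
```

Deduplicating after lowercasing is what makes `A.com` and `a.COM` collapse into a single entry while the first spelling's position decides the default.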
Atomic on-disk migration that brings legacy single-domain installs onto the canonical multi-domain layout on first `aimx serve` startup under the new binary. Synchronous, gated by the `.layout-version: 2` marker, idempotent across restarts, hard-fail on partial completion.
- Storage + DKIM relocation: rename(2) under `<data_dir>/<default_domain>/` and `<dkim_dir>/<default_domain>/`.
- Config rewrite: structural `domain → domains` promotion via `write_atomic`.
- Marker write: temp-then-rename of `.layout-version: 2` (0644 root:root).
Order is load-bearing so a crash mid-flow prefers "orphaned DKIM key" over "domain in config but DKIM missing"; marker is last so a partial run never claims to be done. Migration runs under the documented lock hierarchy (outer per-mailbox locks in sorted FQDN order → inner CONFIG_WRITE_LOCK) before any listener binds.
Layout-aware path shim: `Config::inbox_dir` / `sent_dir` / `storage_root_for_default_domain` / `storage_roots` consult the marker so same-process callers see v2 paths immediately after the marker lands. `resolve_active_dkim_dir` keeps the doctor probe and the daemon's DKIM load aligned across v1 and v2 installs. The `send_handler`, `state_handler`, `mailbox_handler`, `doctor`, and `mailbox::discover_mailbox_names` data-plane paths now route through these helpers.
Mailbox-key FQDN re-key (`[mailboxes.<local>]` → `[mailboxes."<local>@<domain>"]`) is deferred to the runtime data-plane rewire so every `config.mailboxes.get(<local>)` callsite migrates at the same time as the on-disk shape.
Per-domain storage dir is explicitly chmod'd to 0o755 after creation so the daemon's defensive 0o077 umask doesn't lock out non-root MCP callers; contained inbox/<name>/ and sent/<name>/ subdirs remain 0o700 root-locked. UmaskGuard test helper pins the umask so cargo test's default 0o022 can't mask the regression.
Tests: 26 unit tests in `src/upgrade_migration.rs` cover detection, each rename step, EXDEV handling by code inspection, the config rewrite, the marker write, the orchestration, and the per-domain dir traversal-bit invariant. 4 integration tests in `tests/upgrade.rs` exercise end-to-end migration, idempotency, corrupted-marker hard-fail, and post-migration SMTP RCPT against a realistic v1 fixture. 6 additional unit tests cover the layout-aware doctor DKIM probe and per-domain mailbox storage scans. `tests/uds_authz.rs` paths routed through new per-domain helpers to keep the production-perm smoke suite passing.
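The marker handling described above (temp-then-rename write, idempotent detection, hard-fail on a corrupted marker) might look roughly like this. It is a sketch under stated assumptions: the function names are hypothetical, the marker body is assumed to be the text `2` followed by a newline, the temp file name is made up, and ownership (`root:root`) is not handled. Unix only.

```rust
use std::fs;
use std::io::{self, Write};
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

const MARKER_NAME: &str = ".layout-version";
/// Assumed on-disk encoding of the v2 marker; the real format may differ.
const LAYOUT_V2: &str = "2";

/// Returns Ok(true) once a previous run finished, Ok(false) on a v1 install,
/// and an error when the marker exists but holds unexpected contents.
fn is_migrated(data_dir: &Path) -> io::Result<bool> {
    match fs::read_to_string(data_dir.join(MARKER_NAME)) {
        Ok(body) if body.trim() == LAYOUT_V2 => Ok(true),
        // A corrupted marker is a hard failure rather than a silent re-run.
        Ok(body) => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unexpected layout marker: {body:?}"),
        )),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}

/// Writes the marker via a temp file plus rename, so readers see either no
/// marker or a complete one. Intended to run as the last migration step.
fn write_marker(data_dir: &Path) -> io::Result<()> {
    let tmp = data_dir.join(".layout-version.tmp"); // hypothetical temp name
    let dst = data_dir.join(MARKER_NAME);
    let mut file = fs::File::create(&tmp)?;
    file.write_all(LAYOUT_V2.as_bytes())?;
    file.write_all(b"\n")?;
    file.sync_all()?;
    fs::set_permissions(&tmp, fs::Permissions::from_mode(0o644))?;
    fs::rename(&tmp, &dst)
}
```

Because the marker is written last, a crash before `write_marker` leaves `is_migrated` returning `false`, so the next start retries instead of treating a partial run as complete.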
…re-key (#242)
Wires multi-domain into the runtime data plane: SMTP RCPT TO accepts any configured domain, outbound signs with the per-message domain's DKIM key + selector, sent copies persist under `<data_dir>/<from-domain>/sent/<local>/`, bare-local-part From: rewrites to the default domain daemon-side, and the deferred on-disk mailbox-key FQDN re-key fires on first start under the new binary.
- `recipient_domain_matches_any` replaces the single-domain helper in the SMTP session state machine; `Config::resolve_mailbox_for_rcpt` does exact FQDN lookup with per-domain catchall fallback.
- Per-domain DKIM key map via `Arc<ArcSwap<HashMap<String, DkimKeyEntry>>>` so future domain CRUD verbs can hot-swap without restarting. Selector resolution order: per-domain override → top-level → built-in `"aimx"`. Missing key for non-default domains warns and the daemon still starts; missing default-domain key is fatal. Legacy `<dkim_dir>/private.key` fallback applies only to the default domain.
- `send_handler` extracts the From: domain from the submitted body, validates against `config.domains`, signs with the per-domain key, and rejects per-domain catchall as outbound sender. Bare-local-part From: rewrites both header and body bytes before signing so DMARC alignment stays valid.
- New `src/storage.rs` with `mailbox_storage_path` / `Folder`; `Config::inbox_dir` / `sent_dir` delegate to the helper, and a CI grep job rejects new raw `.join("inbox" / "sent")` outside the storage / upgrade-migration / mailbox modules.
- Carry-over startup re-key rewrites legacy `[mailboxes.<local>]` to `[mailboxes."<local>@<domain>"]` on already-v2 installs. Idempotent.
- MAILBOX-CREATE (daemon + CLI fallback) inserts new mailboxes FQDN-keyed so the in-memory shape is consistent post-create without waiting for the next-restart carry-over.
- 29 new tests across `src/config.rs`, `src/dkim_keys.rs`, `src/smtp/session.rs`, `src/send_handler.rs`, `src/storage.rs`, `tests/multi_domain.rs`, and `tests/upgrade.rs`.
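The "exact FQDN lookup with per-domain catchall fallback" rule from the list above can be illustrated with a reduced model. The `Config` below is a hypothetical stand-in with only two fields, and the return type (a mailbox name as `&str`) is an assumption; the real `Config::resolve_mailbox_for_rcpt` signature is not shown in this PR text.

```rust
use std::collections::HashMap;

/// Hypothetical reduced config: configured domains (lowercase) plus
/// FQDN-keyed mailboxes, mapping address -> mailbox name.
struct Config {
    domains: Vec<String>,
    mailboxes: HashMap<String, String>,
}

impl Config {
    /// Resolve an RCPT TO address: reject unknown domains, try the exact
    /// address, then fall back to that domain's `*@<domain>` catchall.
    fn resolve_mailbox_for_rcpt(&self, rcpt: &str) -> Option<&str> {
        let (local, domain) = rcpt.rsplit_once('@')?;
        // Only the domain part is treated case-insensitively here.
        let domain = domain.to_ascii_lowercase();
        if !self.domains.iter().any(|d| *d == domain) {
            return None;
        }
        let exact = format!("{local}@{domain}");
        self.mailboxes
            .get(&exact)
            .or_else(|| self.mailboxes.get(&format!("*@{domain}")))
            .map(String::as_str)
    }
}
```

The catchall key is scoped to one domain, so a catchall on `b.com` never absorbs mail for an unmatched local part on `a.com`.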
Adds `aimx domains list` / `aimx domains add` (with `aimx domain` clap alias and a scaffolded `remove`), the `AIMX/1 DOMAIN-LIST` and `DOMAIN-ADD` UDS verbs, the root-only `Action::DomainCrud` authz variant, and an `--domain` flag on `aimx dkim-keygen` for targeting a specific per-domain key directory.
The `DOMAIN-ADD` handler hot-swaps the in-memory `Arc<Config>` and the per-domain DKIM `ArcSwap` map atomically (DKIM map first, config second) so a concurrent send observing the new domain in `config.domains` always sees the matching key. SMTP RCPT to a freshly-added domain is accepted by the running daemon without a restart, validated by an end-to-end CI test under sudo.
`aimx dkim-keygen` (no `--domain`) writes to `<dkim_dir>/<default_domain>/` (the v2 per-domain layout the daemon loader reads from), eliminating the rotation footgun where the new key would have silently landed at a path the daemon ignored. The read-side legacy fallback for unmigrated v1 installs is unchanged.
Daemon-stopped fallback: root falls back to a direct `config.toml` edit plus DKIM keygen with a restart hint; non-root hard-errors with the canonical "daemon must be running for non-root domain CRUD" hint. `dkim::generate_keypair` now `chmod 0700`s its parent dir itself, so both the CLI direct path and the daemon `handle_domain_add` path land at identical on-disk permissions.
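The publish order in the `DOMAIN-ADD` hot-swap (DKIM map first, config second) can be illustrated with a dependency-free model. Note the substitution: the PR uses `ArcSwap`, while this sketch uses std's `RwLock<Arc<_>>` as a stand-in, and `Shared`, `add_domain`, and `key_for_sender` are hypothetical names. Key material is represented by a plain string.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Stand-in for the daemon's shared state (the PR uses `ArcSwap` instead).
struct Shared {
    /// domain -> placeholder for DKIM key material
    dkim_keys: RwLock<Arc<HashMap<String, String>>>,
    /// configured domains, standing in for `config.domains`
    domains: RwLock<Arc<Vec<String>>>,
}

impl Shared {
    /// Publish the key before the domain, so any reader that can already see
    /// the domain is guaranteed to find its key.
    fn add_domain(&self, domain: &str, key: &str) {
        {
            let mut slot = self.dkim_keys.write().unwrap();
            let mut next = (**slot).clone();
            next.insert(domain.to_string(), key.to_string());
            *slot = Arc::new(next);
        }
        {
            let mut slot = self.domains.write().unwrap();
            let mut next = (**slot).clone();
            next.push(domain.to_string());
            *slot = Arc::new(next);
        }
    }

    /// What a concurrent send does in this model: check the domain list
    /// first, then fetch the key from the DKIM map.
    fn key_for_sender(&self, domain: &str) -> Option<String> {
        let domains: Arc<Vec<String>> = self.domains.read().unwrap().clone();
        if !domains.iter().any(|d| d == domain) {
            return None;
        }
        let keys: Arc<HashMap<String, String>> = self.dkim_keys.read().unwrap().clone();
        keys.get(domain).cloned()
    }
}
```

If the two publishes were reversed, a send could observe the new domain in the list during the window before its key lands; with this order the worst case is a key that no reader asks for yet.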
Land `aimx domains remove <domain>` with the `AIMX/1 DOMAIN-REMOVE`
UDS verb. Default path refuses with a sorted JSON list of blocking
mailbox FQDNs; `--force` cascades to per-mailbox wipe + per-domain
storage `rmdir` + config rewrite + DKIM-map hot-swap under the daemon
lock hierarchy (outer: per-mailbox locks in sorted FQDN order; inner:
CONFIG_WRITE_LOCK — matches the existing codebase convention so the
cascade cannot deadlock against concurrent MAILBOX-CRUD / HOOK-CRUD /
MARK-* / ingest).
Last-domain remove is hard-blocked regardless of `--force` with a
pointer to `aimx uninstall`. DKIM key files at `<dkim_dir>/<domain>/`
are preserved on disk so the operator can re-add the domain without
regenerating the keypair; the response echoes the path back so the
CLI can print the canonical preservation hint.
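The pre-flight decision described so far (refuse with a sorted list of blocking mailboxes, cascade under `--force`, never remove the last domain) can be sketched as a pure function. `RemoveDecision` and `plan_remove` are hypothetical names, and the `UnknownDomain` case is an assumption; the description does not say how an unconfigured domain is reported.

```rust
#[derive(Debug, PartialEq)]
enum RemoveDecision {
    /// Removal may go ahead; `cascade` lists the mailboxes to wipe, sorted.
    Proceed { cascade: Vec<String> },
    /// Mailboxes still live on the domain and `--force` was not given.
    Blocked { mailboxes: Vec<String> },
    /// The only configured domain is never removable, even with `--force`.
    LastDomain,
    /// Assumption: a domain that is not configured is reported separately.
    UnknownDomain,
}

/// Decide what `domains remove <target>` should do, given the configured
/// domains and the FQDN-shaped mailbox addresses.
fn plan_remove(
    domains: &[&str],
    mailboxes: &[&str],
    target: &str,
    force: bool,
) -> RemoveDecision {
    let target = target.to_ascii_lowercase();
    if !domains.iter().any(|d| d.eq_ignore_ascii_case(&target)) {
        return RemoveDecision::UnknownDomain;
    }
    if domains.len() == 1 {
        return RemoveDecision::LastDomain;
    }
    let suffix = format!("@{target}");
    let mut on_domain: Vec<String> = mailboxes
        .iter()
        .filter(|m| m.to_ascii_lowercase().ends_with(&suffix))
        .map(|m| m.to_string())
        .collect();
    // Sorted FQDN order doubles as the per-mailbox lock acquisition order.
    on_domain.sort();
    if on_domain.is_empty() || force {
        RemoveDecision::Proceed { cascade: on_domain }
    } else {
        RemoveDecision::Blocked { mailboxes: on_domain }
    }
}
```

Checking the last-domain rule before looking at `force` is what makes the block unconditional.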
The cascade is re-runnable, not strict-atomic: on partial IO failure
the in-memory Config and DKIM map are not swapped, external observers
still see the pre-cascade view, and a second invocation completes the
cascade idempotently. The under-lock re-snapshot guards against
mailbox-set drift between the pre-flight scan and the lock acquisition
list with a Conflict refusal.
Daemon-stopped fallback: root falls back to direct config edit +
storage wipe + restart hint; non-root hard-errors with the canonical
"daemon must be running" message. The `storage_tree_removed` field on
the response is true only when an on-disk per-domain tree was actually
removed, so the CLI's "Storage tree removed." line is now accurate.
CI is wired: `tests/domains_remove.rs` runs under sudo on the
`mailbox-dir-perms-isolation` job. Coverage includes a concurrent-
ingest stress test that pins the lock-hierarchy invariant
operationally (cascade completes within 10s while a background thread
hammers SMTP RCPT TO on the surviving domain) and a unit test that
pins the `live_blocker_fqdns != lock_keys` conflict-detection branch
via a release-build-zero-cost test hook.
`src/domain_handler.rs` added to the storage-path enforcement awk
allowlist in CI so the cascade's per-domain `inbox/`/`sent/` walk is
the only sanctioned use of raw `.join("inbox")` outside `storage.rs`.
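For context on what the CI check protects, a centralized path helper along the lines of `mailbox_storage_path` / `Folder` might look like this. The signature and the `dir_name` method are assumptions; only the names `mailbox_storage_path` and `Folder` and the `<data_dir>/<domain>/{inbox,sent}/<local>/` layout come from the commit messages. Inputs are not validated against path traversal in this sketch.

```rust
use std::path::{Path, PathBuf};

/// The two per-mailbox folders in the per-domain layout.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Folder {
    Inbox,
    Sent,
}

impl Folder {
    /// Directory name on disk (hypothetical helper).
    fn dir_name(self) -> &'static str {
        match self {
            Folder::Inbox => "inbox",
            Folder::Sent => "sent",
        }
    }
}

/// Build `<data_dir>/<domain>/<folder>/<local>` in one place, so callers
/// never hand-assemble `.join("inbox")` or `.join("sent")` themselves.
fn mailbox_storage_path(data_dir: &Path, domain: &str, folder: Folder, local: &str) -> PathBuf {
    data_dir.join(domain).join(folder.dir_name()).join(local)
}
```

Routing every caller through one function is what lets a simple grep-style CI job flag any new raw join as a layout violation.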
Per-domain runtime wiring + observability + MCP FQDN sweep for the multi-domain track.
- Trust resolution helpers (`MailboxConfig::effective_trust` / `effective_trusted_senders`) walk per-mailbox → per-domain → global with replace semantics at every layer.
- DKIM selector + signature resolution helpers (`Config::dkim_selector_for_domain` / `signature_for_domain` / `effective_signature_for_domain`) walk per-domain → top-level → built-in default. Per-domain signature is appended to the body before DKIM signing so the recipient verifies the signed-over bytes.
- `aimx doctor` renders per-domain blocks on multi-domain installs with default-domain marker, per-domain DKIM key presence + DNS verification status, mailbox + unread counts. Single-domain installs keep the flat layout (no regression).
- MCP FQDN sweep: every tool returning mailbox identifiers (`mailbox_list`, `email_list`, `email_mark_read`, `email_mark_unread`, `hook_create`, `hook_list`, `mailbox_delete`) returns FQDN-shaped names. Bare local-parts on input continue to resolve against `domains[0]`.
- Datadir README template bumped to describe the per-domain layout + `.layout-version` marker; first `aimx serve` start post-upgrade refreshes via the existing version-gated overwrite.
Tests: 8-combination trust resolution coverage, DKIM selector + signature resolution order, per-domain doctor rendering (flat + multi-domain blocks + per-domain DKIM DNS status), MAILBOX-LIST FQDN regression on single + multi-domain, end-to-end MCP integration suite (two-domain + single-domain) spanning mailbox_list FQDN shape and email_list bare-vs-FQDN input acceptance.
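The two resolution walks named in the list above share one shape: the most specific layer that sets a value wins, and layers are never merged. A minimal sketch, assuming free functions rather than the real methods on `MailboxConfig` / `Config`; `resolve` is a hypothetical helper and the string values used with it are placeholders, not real trust levels.

```rust
/// Replace semantics: per-mailbox beats per-domain, which beats global.
/// Whatever layer is set is returned whole; nothing is merged across layers.
fn resolve<'a, T: ?Sized>(
    per_mailbox: Option<&'a T>,
    per_domain: Option<&'a T>,
    global: &'a T,
) -> &'a T {
    per_mailbox.or(per_domain).unwrap_or(global)
}

/// Selector walk: per-domain override, then top-level, then built-in "aimx".
fn dkim_selector_for_domain<'a>(
    per_domain: Option<&'a str>,
    top_level: Option<&'a str>,
) -> &'a str {
    per_domain.or(top_level).unwrap_or("aimx")
}
```

With three optional-or-set layers for trust there are eight set/unset combinations of the two optional layers and the list-valued field, which may be what the "8-combination" test coverage refers to; the sketch itself only encodes the precedence.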
- book/multi-domain.md: new 9-section operator reference (when to add a
second domain, `aimx domains` CLI, per-domain config, per-domain DKIM,
storage layout, upgrade migration walkthrough, removal semantics, light
scope, rollback procedure). Linked from book/SUMMARY.md and
book/README.md.
- book/{setup,mailboxes,mcp,cli,faq,troubleshooting}.md: multi-domain
content threaded through existing pages — FQDN-keyed mailboxes,
per-domain catchall, `--domain` flag on dkim-keygen, `aimx domains`
command group, DOMAIN-* UDS verbs, three multi-domain FAQs, a new
troubleshooting section.
- agents/common/aimx-primer.md: default-domain resolution and FQDN
disambiguation rules; primer line-count soft cap bumped 500 -> 600.
- agents/common/references/multi-domain.md (new): operator-facing
reference card covering the default-domain rule, FQDN disambiguation
across mailbox-scoped MCP tools, per-domain storage, and the
operator-only boundary on domain CRUD.
- RELEASE_NOTES.md (new): top-level notes calling out the config
rewrite, storage relocation, DKIM relocation, and rollback pointer.
- src/upgrade.rs: `aimx upgrade` now prints a one-screen post-upgrade
reminder; `post_upgrade_reminder_text()` pinned by a unit test so
future edits cannot silently drop a section.
- scripts/check-docs.sh: allow `aimx domain` singular clap alias.
Smoke results documented in docs/multi-domain-smoke-results.md via a
synthetic-via-tests mapping — each step pins to a CI integration test
(tests/upgrade.rs, tests/domains_uds.rs, tests/domains_remove.rs,
tests/multi_domain.rs, tests/mcp_multi_domain.rs). Real-hardware
rollback verification is recommended pre-tag.
Epic Integration PR
Track: multi-domain
PRD: docs/multi-domain-prd.md
Sprint plan: docs/multi-domain-sprint.md
All 7 sprints of the multi-domain track have landed on
`epic/multi-domain`. This PR is ready for human review and merge to
`main`.
Sprints
- `aimx domains list` + `add` + UDS verbs + DKIM keygen flag ([Sprint 4] aimx domains list + add + UDS verbs + DKIM keygen flag #243)
- `aimx domains remove` + `--force` cascade ([Sprint 5] aimx domains remove + --force cascade #244)
Scope (v1, "light" multi-domain)
One operator hosts multiple sending/receiving domains on the same server:
- `domains[0]` is the default; bare local-parts resolve against it
Explicitly out of scope: multi-tenant features (per-domain ACLs, separate
operators, per-domain rate limits, per-domain verifier endpoints, per-domain
TLS certs). See the PRD §1 and §9 for the full scope discipline.
How to review
integration-level concerns (cross-sprint coherence, migration safety,
release notes, operator-facing UX).
`main` here is the full PRD, not just the last sprint.
[Sprint 3] SMTP intake, per-domain DKIM, storage helper, mailbox-key re-key #242) since it irreversibly rewrites
`config.toml` and relocates storage on first start under the new binary.
Pre-tag recommendation
One smoke step is documented as "manual pre-tag verification recommended":
the rollback procedure on real hardware (see
`book/multi-domain.md` rollback section and
`docs/multi-domain-smoke-results.md`). Not a blocker for merging this PR; it leaves room to validate it before tagging the multi-domain release.
Release notes
`RELEASE_NOTES.md` calls out everything operators need to know about the upgrade (config rewrite, storage relocation, DKIM relocation, rollback
pointer).
`aimx upgrade` also prints a one-screen post-upgrade reminder.