Semantic email search. Describe what you're looking for in natural language and find matching emails by meaning, not just keywords. Runs entirely locally, no data leaves your machine.
- Apple Mail: parses .emlx files under ~/Library/Mail/V<N>/
- Thunderbird: reads mbox files under ~/Library/Thunderbird/Profiles/<profile>/ImapMail/<account>/, honors X-Mozilla-Status deletion flags
- Plain mbox archives: point at any directory of .mbox / .txt / extension-less mboxrd dumps (Gmail Takeout, Apple Mail "Export Mailbox" output, mailing-list archives, …)
- Apple Silicon Mac — Metal GPU is what makes the embedder usable
- Rust toolchain for building from source
- For Apple Mail: Full Disk Access granted to your terminal (System Settings → Privacy & Security → Full Disk Access). ~/Library/Mail/ is otherwise unreadable.
git clone https://github.com/futuun/mailwise.git
cd mailwise
cargo install --path .

The binary lands in ~/.cargo/bin/mailwise. On first run, the embedding model jina-embeddings-v5-text-nano-retrieval (~230 MB, CC BY-NC 4.0) is downloaded into ~/.mailwise/.
mailwise config # interactive: pick clients, set poll interval, etc.
mailwise index # scan + embed (foreground)
mailwise search "..." # natural-language query
mailwise install-agent # background indexing via launchd; tail ~/.mailwise/logs/indexer.log

Initial indexing takes some time (~30 minutes for 50k emails on M1 Max). Everything is stored in ~/.mailwise/mailwise.db. If you kill index mid-run it picks up where it left off next time.
mailwise search accepts --open N to open the Nth result in the configured client, and --format json for launcher integration (Alfred, Raycast). See mailwise help for the full list.
- Index: every poll, each enabled client scans messages on disk and emits (Message-ID, locator) pairs. The diff against email_sources tells us what's new, relocated, or gone. New messages get parsed (RFC 2822 + MIME body extraction; HTML through scraper/html5ever, plain text gets format=flowed unflowing + sigdash/footer trimming).
- Embed: concat subject + body, embed via Jina v5 nano on Metal GPU through llama.cpp → 768-dimensional vectors.
- Store: metadata in SQLite emails, vectors in a sqlite-vec virtual table with cosine distance.
- Search: embed the query, KNN over the vector table, rank by cosine similarity (with a length-factor penalty so trivially-close two-word matches don't crowd out richer hits).
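The diff in the Index step can be pictured as a comparison between two maps keyed by Message-ID. The following is a minimal sketch under stated assumptions, not mailwise's actual code: `SourceDiff` and `diff_sources` are hypothetical names, and the stored state is modelled as an in-memory `HashMap` rather than the email_sources table.

```rust
use std::collections::{HashMap, HashSet};

/// Outcome of comparing one scan against what is already known.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Default, PartialEq)]
pub struct SourceDiff {
    /// Message-IDs seen for the first time, with their locator.
    pub new: Vec<(String, String)>,
    /// Known Message-IDs whose locator changed, with the new locator.
    pub relocated: Vec<(String, String)>,
    /// Known Message-IDs that the scan no longer found.
    pub gone: Vec<String>,
}

/// `stored` maps Message-ID -> locator as previously recorded;
/// `scanned` is the list of (Message-ID, locator) pairs from this poll.
pub fn diff_sources(
    stored: &HashMap<String, String>,
    scanned: &[(String, String)],
) -> SourceDiff {
    let mut diff = SourceDiff::default();
    let mut seen: HashSet<&str> = HashSet::new();

    for (id, locator) in scanned {
        // Assumption: if a Message-ID shows up twice, the first copy wins.
        if !seen.insert(id.as_str()) {
            continue;
        }
        match stored.get(id) {
            None => diff.new.push((id.clone(), locator.clone())),
            Some(old) if old != locator => {
                diff.relocated.push((id.clone(), locator.clone()))
            }
            Some(_) => {} // unchanged, nothing to do
        }
    }

    // Anything stored but not seen in this scan is a removal candidate.
    diff.gone = stored
        .keys()
        .filter(|id| !seen.contains(id.as_str()))
        .cloned()
        .collect();
    diff.gone.sort(); // deterministic order for logging and tests
    diff
}
```

Only the `new` bucket needs parsing and embedding; `relocated` entries can keep their existing vectors because the Message-ID is unchanged.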
All data lives in ~/.mailwise/mailwise.db.
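To illustrate the ranking described in the Search step, here is a brute-force sketch of cosine similarity combined with a length penalty. The penalty formula is an assumption made for this example (the README does not specify it), and `Doc`, `length_factor`, `rank`, and the `full_len` parameter are hypothetical names; the real search runs KNN inside sqlite-vec rather than in a Rust loop.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical length factor: rises linearly from 0.5 (empty text) to 1.0
/// once the text has at least `full_len` tokens.
pub fn length_factor(token_count: usize, full_len: usize) -> f32 {
    let ratio = (token_count as f32 / full_len.max(1) as f32).min(1.0);
    0.5 + 0.5 * ratio
}

/// One indexed email as the ranking sees it (illustrative type).
pub struct Doc {
    pub id: String,
    pub vector: Vec<f32>,
    pub token_count: usize,
}

/// Score every document, sort best-first, keep the top `k`.
/// Note: the multiplicative penalty assumes non-negative similarities.
pub fn rank(query: &[f32], docs: &[Doc], k: usize, full_len: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = docs
        .iter()
        .map(|d| {
            let score = cosine_similarity(query, &d.vector)
                * length_factor(d.token_count, full_len);
            (d.id.clone(), score)
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

With this shape of penalty, a two-word email that matches the query almost exactly can still rank below a longer message that is slightly further away in embedding space, which is the behaviour the Search step describes.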
MailClient is the entire contract. Three methods do the actual work:
- list_locators: walk this client's data on disk and return (Message-ID, locator) pairs plus a scan_complete flag (used to refuse destructive deletes when the walk had errors). Locator format is opaque to the framework; existing clients use a .emlx file path (Apple Mail) or <mbox_path>#offset=N (Thunderbird, plain-mbox).
- fetch_email(locator): parse one message at the given locator into an Email. Use parser::build_email for the RFC 2822/MIME work.
- open(conn, message_id): open the message in whatever way fits (URL scheme, native API, rendered preview).
Plus the boilerplate source() and is_available(). Add a variant to the Source enum, register the client in instantiate(...), and sync_one handles everything else: diff against email_sources, parallel-parse only genuinely-new Message-IDs, batched DB inserts, ratio-gated removes, end-of-cycle orphan GC across all clients.
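As a rough picture of that contract, the sketch below defines a trait with the five methods named above and a toy in-memory client. Everything about the signatures is an assumption for illustration: the real `MailClient`, `Source`, and `Email` types live in the mailwise source and may differ, the `Scan` struct and `InMemoryClient` are made up, and the `conn` argument of `open` is left out so the example stays dependency-free.

```rust
use std::collections::BTreeMap;
use std::io;

/// Hypothetical stand-in for the real `Source` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    AppleMail,
    Thunderbird,
    InMemory, // the variant a new client would add
}

/// Hypothetical, heavily reduced stand-in for the real `Email` type.
#[derive(Debug, Clone, PartialEq)]
pub struct Email {
    pub message_id: String,
    pub subject: String,
    pub body: String,
}

/// Result of one walk: (Message-ID, locator) pairs plus the completeness flag.
pub struct Scan {
    pub pairs: Vec<(String, String)>,
    pub scan_complete: bool,
}

/// Assumed shape of the contract; not copied from the mailwise source.
pub trait MailClient {
    fn source(&self) -> Source;
    fn is_available(&self) -> bool;
    fn list_locators(&self) -> io::Result<Scan>;
    fn fetch_email(&self, locator: &str) -> io::Result<Email>;
    fn open(&self, message_id: &str) -> io::Result<()>;
}

/// Toy client that keeps messages in memory, keyed by an opaque locator.
pub struct InMemoryClient {
    messages: BTreeMap<String, Email>,
}

impl InMemoryClient {
    pub fn new(emails: Vec<Email>) -> Self {
        let messages = emails
            .into_iter()
            .enumerate()
            .map(|(i, e)| (format!("mem#{i}"), e))
            .collect();
        Self { messages }
    }
}

impl MailClient for InMemoryClient {
    fn source(&self) -> Source {
        Source::InMemory
    }

    fn is_available(&self) -> bool {
        true // nothing on disk to probe for
    }

    fn list_locators(&self) -> io::Result<Scan> {
        let pairs = self
            .messages
            .iter()
            .map(|(locator, e)| (e.message_id.clone(), locator.clone()))
            .collect();
        // The walk cannot fail here, so the scan is always complete.
        Ok(Scan { pairs, scan_complete: true })
    }

    fn fetch_email(&self, locator: &str) -> io::Result<Email> {
        self.messages
            .get(locator)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "unknown locator"))
    }

    fn open(&self, message_id: &str) -> io::Result<()> {
        // A real client would launch a URL scheme or native API here.
        if self.messages.values().any(|e| e.message_id == message_id) {
            Ok(())
        } else {
            Err(io::Error::new(io::ErrorKind::NotFound, "unknown Message-ID"))
        }
    }
}
```

A client that hits an I/O error partway through its walk would return the pairs it did find with `scan_complete: false`, so the caller can skip removals for that cycle.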
The shared mbox module covers SIMD envelope scanning, mboxrd un-escape, and one-shot message fetch, so a new mbox-flavoured client (Mutt, Postfix Maildir-as-mbox, etc.) is mostly just a filesystem-walk predicate and an optional should_skip callback for client-specific deletion flags.
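For readers unfamiliar with mboxrd, the un-escape step reverses the quoting applied when a message was written: any body line consisting of one or more `>` followed by `From ` loses exactly one leading `>`. The sketch below shows that rule in plain scalar Rust; the function names are hypothetical and this is not the shared module's SIMD implementation.

```rust
/// Undo mboxrd quoting for a single body line.
/// ">From x" becomes "From x", ">>From x" becomes ">From x";
/// every other line is returned unchanged.
pub fn unescape_mboxrd_line(line: &str) -> &str {
    let rest = line.trim_start_matches('>');
    let had_quote = rest.len() < line.len();
    if had_quote && rest.starts_with("From ") {
        &line[1..] // drop exactly one '>' (a single ASCII byte)
    } else {
        line
    }
}

/// Apply the rule to every line of a message body.
/// Splitting on '\n' (rather than using `lines()`) keeps any '\r' and the
/// trailing newline exactly as they were.
pub fn unescape_mboxrd_body(body: &str) -> String {
    body.split('\n')
        .map(unescape_mboxrd_line)
        .collect::<Vec<_>>()
        .join("\n")
}
```

Lines such as `> From ...` (with a space after the `>`) are ordinary quoted text and are left alone, since they do not match the `>`-then-`From ` pattern.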
The mailwise source code is licensed under MIT.