Security measures, validation, and hardening in LLMem. Back to README
LMEM_HOMEis validated against path traversal, system directories, and symlink attacks.- Write paths are validated against system directories and symbolic links.
is_symlink()checks are wrapped intry/except OSErrorto handle inaccessible paths gracefully. - System directory blocking uses a shared
_BLOCKED_PATH_PREFIXEStuple (with prefix +/matching to avoid false positives) across_validate_home_path,_validate_write_path,OpenCodeAdapter.__init__, andCopilotAdapter.__init__to prevent DRY violations. validate_session_id()rejects session IDs containing/,\, or..to prevent path traversal when constructing context file paths.- CLI path validation:
llmem add --file,llmem import, andllmem export --outputblock access to protected system directories (e.g./etc,/bin,/usr/bin,/sbin,/usr/sbin,/dev,/proc,/sys,/var,/boot,/root). These checks use prefix +/matching to avoid false positives (e.g./binary_searchis not blocked as/bin). - Import file size limit:
llmem importrejects files larger than 10 MiB. - URL validation (
is_safe_url) blocks private/reserved IPs and SSRF vectors, including percent-encoded IP hostnames (e.g.%31%32%37%2e%30%2e%30%2e%31is decoded before IP checks). Whenallow_remote=False(the default), only loopback addresses on the Ollama default port are permitted — all other IPs including public addresses are rejected.safe_urlopenenforces URL validation, blocks all redirects (via_NoRedirectHandler), mitigates DNS rebinding by re-resolving the hostname immediately before the request, strips credentials from error messages, and requires an explicitallow_remoteparameter (defaults toFalse) for non-loopback addresses. It accepts both string URLs andurllib.request.Requestobjects, and applies a default 30-second timeout to prevent indefinite hangs. - OpenCode session extraction validates the database path: rejects path traversal (
..), system directories, symlinks, and URI injection (?and#characters).get_opencode_db_path()validates via_validate_home_pathbefore returning. - CopilotAdapter validates
state_dirvia_validate_home_pathto reject path traversal and system directory attacks.share_diris resolved but not validated (it points to user-controlled output). - API keys are masked in
__repr__on provider instances (***masked***). - API keys are refused over plain HTTP to non-loopback hosts.
OpenAIProviderandAnthropicProviderraiseValueErrorifbase_urlishttp://and the hostname is not a loopback address (checked via exact string match andipaddressfor IPv6-mapped addresses like::ffff:127.0.0.1). Substring matches likelocalhost.evil.comare blocked. - A warning is logged when API keys are sent to a non-default base URL to alert the user of potential credential exfiltration risk.
- Validation error messages use
_strip_credentials()to remove userinfo from URLs — never embed user-supplied URL credentials in error messages or logs. _strip_credentials()is used consistently acrossis_safe_url(),safe_urlopen(), provider error messages, and config URL validation to prevent credential leaking.- Embedding and generation inputs are validated against size limits:
MAX_TEXT_LENGTH(100,000 characters per text) andMAX_BATCH_SIZE(2,048 texts per batch) to prevent OOM and resource exhaustion. - Embedding dimension validation in
MemoryStore.add()rejects vectors whose dimension doesn't matchvec_dimensions, preventing dimension mismatch bugs from silently corrupting the vector index. - All SQL queries use parameterized statements (no injection risk).
- Code reference path validation:
validate_code_ref_path()rejects absolute paths (leading/) and directory traversal (..) in code ref target_ids.resolve_code_ref()enforces anallowed_pathsallowlist (defaulting to[Path.cwd()]) and blocks resolved paths targeting system directories (/etc,/var, etc.). Code refs must use the relative formatpath:start_line:end_line. add_relation()validates code reftarget_idpaths at insertion time — unsafe paths are rejected withValueError.- SQLite extension loading is disabled immediately after
sqlite-vecloads, preventing runtime loading of arbitrary shared libraries. - Database files are created with
umask(0o177)before creation, thenchmod(0o600)applied to the DB file and its WAL/SHM sidecars (prevents a race window where sensitive memory content is world-readable on multi-user systems). Parent directories use0o700. config.yamlis written with0o600file permissions (owner-only read/write) to protect API keys and secrets from other users on shared systems.- Server auth token strength:
server.auth_tokeninconfig.yamlmust be at least 16 characters. Short tokens are rejected with a hint to generate a strong token. import_memories()validates entry IDs (string, max 256 chars), embeddings (bytes, max 1 MB), and confidence (numeric) before insertion. Invalid entries are skipped with warnings rather than crashing.export_all()defaults to a limit of 10,000 memories to prevent unbounded memory consumption; passlimit=Noneto export all._search_by_embedding_brute()uses aLIMITclause (10,000 rows max) to prevent OOM on large databases.- Embedding metrics computation capping:
get_embeddings_with_types(limit=)applies a SQLLIMITclause (default: 10,000) andcompute_metrics(max_embeddings=10000)truncates input vectors. These caps prevent O(n²) pairwise metrics computations from causing OOM or CPU hangs on large stores. process_transcript()enforces the same size limit asprocess_file()to prevent OOM from large session transcripts.- Dream diary locking: On platforms with
fcntl(Linux/macOS), dream diary writes use an exclusive file lock to prevent corruption from concurrent dream cycles. - OpenCode tool invocations (
_llmem.ts) prepend--before user arguments to prevent argparse flag injection. - JavaScript hooks use
execFileSync(not shell-basedexecSync) andvalidateSessionId()for path traversal protection, withcanSpawnProcess()rate limiting andMAX_CONCURRENT=3process cap to prevent resource exhaustion. - Prototype pollution protection in
_parseSimpleYaml: keys__proto__,constructor, andprototypeare filtered from parsed YAML to prevent Object prototype mutation. - ProviderDetector.detect() only returns
providerandollama_url— no API key presence is exposed. - Migration from
~/.lobsterdog/skips symlinks (usingfollow_symlinks=False).
Code indexing security:
walk_code_files()skips all symlinks (both file and directory) to prevent path traversal and data exposure.- Default file size limit of 1 MiB (
--max-file-size) prevents memory exhaustion from large files. - Default directory depth limit of 50 (
--max-depth) prevents stack overflow from deeply nested trees. - Credential files are excluded from indexing:
.env,.env.*variants,.pem,.key, SSH private keys (id_rsa,id_dsa,id_ed25519,id_ecdsa),.netrc,.htpasswd,.npmrc,.pypirc. .gitignorepatterns are respected at every directory level, with correct handling of anchored patterns (leading/), negation patterns (!), and directory-only patterns (trailing/).
The Go implementation (internal/store) shares the same security posture as Python with some implementation-specific differences:
- Database file permissions: Parent directory created with
0700(owner-only). On Unix (Linux/macOS),syscall.Umask(0o177)is set before DB file creation so the file is created with mode0600from the start — no TOCTOU window. On non-Unix platforms (Windows), the umask is a no-op and the0700parent directory serves as the primary defense. After creation,chmodDBFiles()applies0600to the DB file and its WAL/SHM sidecars as an additional layer. - SQL injection: All queries use parameterized
?placeholders via Go'sdatabase/sqlpackage. Theplaceholders()function generates comma-separated?sequences forINclauses. FTS5 MATCH queries are sanitized viasanitizeFTSQuery(). LIKE queries useESCAPE '\'withescapeLike(). - Type validation: Memory type names must match
^[a-z][a-z0-9_]*$and be ≤64 characters (enforced byRegisterMemoryType()). Relation types must be in theValidRelationTypes()set (supersedes,related_to,derived_from). - Embedding dimension validation:
Add()rejects embeddings whose byte length doesn't matchvec_dimensions × 4.ImportMemories()skips entries with mismatched embedding dimensions. - Embedding size limit (DoS protection):
Add(),Update(), andImportMemories()enforce a 1 MB per-embedding limit (maxEmbeddingBytes = 1048576).Add()andUpdate()return errors for oversized embeddings.ImportMemories()skips oversized entries with aslog.Warnlog message. - ID length cap:
ImportMemories()rejects IDs longer than 256 characters. - Import validation:
ImportMemories()validates that each entry has non-emptyTypeandContentstring fields,Confidenceis in [0, 1], and embedding dimensions match. Invalid entries are skipped withslog.Warn— no panics. - Traversal depth cap:
TraverseRelations()capsmaxDepthat 5 (hard limit matching Python). - Brute-force limit:
searchByEmbeddingBrute()uses aLIMITclause (default: 10000 rows) to prevent OOM on large databases. - Export limit:
ExportAll()defaults to 10000; pass0for no limit. - Vec dimension mismatch:
initVecTable()verifies that an existingmemories_vectable matches the configuredvec_dimensions. Mismatch returns an error. - FTS fallback: If FTS5 search fails (e.g., malformed query),
Search()silently falls back to LIKE search rather than returning an error. - Context support: All public methods accept
context.Contextfor cancellation and timeout, following Go best practices. - Error wrapping: All errors are wrapped with the
"llmem: store: "prefix usingfmtErr(), providing domain context for debugging.
The internal/urlvalidate package mirrors the Python url_validate module's SSRF protections with Go-specific implementation details:
- Private IP blocking:
IsSafeURLrejects private, link-local, multicast, and unspecified IPs. Loopback is only allowed on the Ollama default port (11434) whenallowRemote=false. - Percent-decode bypass prevention: Hostnames are percent-decoded via
url.PathUnescapebefore IP and DNS checks, preventing SSRF bypass via encoded IP addresses (e.g.%31%32%37%2e%30%2e%30%2e%31→127.0.0.1). - Redirect blocking:
SafeURLOpenuses a customnoRedirectTransportthat blocks all HTTP redirects (3xx responses). Redirect source and target URLs are logged viaslog.Warnwith credentials stripped. - DNS rebinding mitigation:
SafeURLOpenre-resolves the hostname immediately before the HTTP request to detect DNS rebinding TOCTOU attacks. - Credential stripping:
stripCredentials()removes userinfo from URLs in error messages and logs, preserving query strings and fragments. - Fail-closed defaults:
IsSafeURLwithallowRemote=falseonly permits loopback addresses on the Ollama default port.IsRemoteAllowedreturnsfalsefor hostnames that fail DNS resolution. - Local hostname recognition:
IsRemoteAllowedrecognizeslocalhost,localhost.localdomain, andlocalhost6as local hostnames (not remote), consistent with Python's_is_loopback_hostname(). - EmbeddingEngine URL validation:
NewEmbeddingEnginevalidates BaseURL viaValidateBaseURLwhen creating a production HTTP client. Test clients (withHTTPClientprovided) skip URL validation since the caller controls the transport.
- LRU cache safety: Cache access is protected by
sync.RWMutex. Cache hits return a defensive copy to prevent mutation. Cache writes use exclusive lock. - Ollama availability check:
CheckAvailablereturnsfalseon any error (logs at Debug level). Never panics. Matches exact model name or model name with tag suffix (e.g."nomic-embed-text:latest"), not prefix matching. - Timeout protection: Production HTTP clients use a configurable timeout (default 30s). Context cancellation is respected in
EmbedandCheckAvailable.
- Computation capping:
ComputeMetricscaps embedding count atMetricsMaxEmbeddings(10000) to prevent O(n²) CPU hangs. Labels are truncated to match. Logs a warning when capping occurs. - Edge case safety:
AnisotropyandSimilarityRangereturn 0.0 for empty/single-vector input.DiscriminationGapreturns an error on label/embedding length mismatch.