feat(artifacts): define artifact storage types, config, and persistence layer#2801
feat(artifacts): define artifact storage types, config, and persistence layer#2801graycyrus wants to merge 2 commits into
Conversation
…ce layer Add new `artifacts` domain with ArtifactMeta struct, ArtifactKind/ArtifactStatus enums, filesystem-backed persistence under <workspace>/artifacts/<uuid>/, and three RPC endpoints (ai_list_artifacts, ai_get_artifact, ai_delete_artifact). Includes 38 unit tests covering CRUD operations, pagination, path-sandboxing enforcement, and corrupt metadata handling. Closes tinyhumansai#2776
📝 WalkthroughWalkthroughThis PR implements the artifact storage domain for OpenHuman, introducing typed artifact metadata (ArtifactMeta), filesystem-backed persistence with path validation, and three RPC endpoints (ai_list_artifacts, ai_get_artifact, ai_delete_artifact) that enable workspace artifact management with pagination, sorting, corruption resilience, and comprehensive test coverage. ChangesArtifact Storage Domain
Sequence Diagram(s)sequenceDiagram
participant Client
participant ops
participant store
participant filesystem
Client->>ops: ai_list_artifacts(offset, limit)
ops->>store: list_artifacts(workspace_dir, offset, limit)
store->>filesystem: read artifacts/ directory entries
store->>filesystem: read each meta.json
store->>store: sort by created_at descending
store-->>ops: (Vec<ArtifactMeta>, total)
ops-->>Client: JSON with artifacts, total, offset, limit
Client->>ops: ai_get_artifact(artifact_id)
ops->>store: get_artifact(workspace_dir, artifact_id)
store->>store: validate_artifact_id
store->>store: assert_within_root
store->>filesystem: read artifact/meta.json
store-->>ops: ArtifactMeta
ops->>ops: compute absolute_path
ops-->>Client: JSON with meta + absolute_path
Client->>ops: ai_delete_artifact(artifact_id)
ops->>store: delete_artifact(workspace_dir, artifact_id)
store->>store: validate and assert containment
store->>filesystem: remove artifact directory recursively
store-->>ops: success
ops-->>Client: JSON with artifact_id, deleted: true
sequenceDiagram
participant Client
participant schemas
participant ops
Client->>schemas: RPC request with function and params JSON
schemas->>schemas: load config with timeout
schemas->>schemas: parse/validate parameters
schemas->>ops: call ai_* operation
ops-->>schemas: RpcOutcome result
schemas->>schemas: convert to JSON
schemas-->>Client: JSON response
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.claude/memory.md:
- Line 217: Update the note to correct the behavior and point to the canonical
policy: clarify that assert_within_root does not canonicalize but only checks
prefix containment (starts_with), remove the instruction to “replicate this
pattern” for new domains, and instead direct teams to use the centralized file
I/O validation in src/openhuman/security/policy.rs (i.e., call validate_path /
validate_parent_path) while keeping validate_artifact_id's two-layer checks as
an example for artifact IDs only.
In `@src/openhuman/artifacts/ops.rs`:
- Around line 63-69: The current derivation of absolute_path uses meta.path
directly which allows absolute paths or parent-traversal to escape the artifacts
root; update the logic that builds absolute_path (the code touching
absolute_path, config.workspace_dir, and meta.path) to first validate/sanitize
meta.path: reject or canonicalize if Path::new(&meta.path).is_absolute() and
reject or disallow any Path::new(&meta.path).components() containing
Component::ParentDir, or alternatively rebuild a safe relative PathBuf by
filtering components to only Normal components before joining; if validation
fails return an error instead of joining, then join the sanitized relative path
to config.workspace_dir.join("artifacts") and convert to string as before.
In `@src/openhuman/artifacts/schemas.rs`:
- Around line 87-92: The output schema for get_artifact currently uses
TypeSchema::Ref("ArtifactMeta") but the comment claims an extra absolute_path
field; update the contract to match by replacing the output FieldSchema type
with an explicit schema that includes ArtifactMeta plus absolute_path (either by
changing the type to Ref("ArtifactMetaWithPath") and adding that new type
definition, or by using an inline TypeSchema::Object that spreads ArtifactMeta
fields and adds absolute_path: TypeSchema::String required: true); ensure the
FieldSchema name stays "artifact" and adjust any schema registry entries that
reference ArtifactMeta if you introduce ArtifactMetaWithPath.
In `@src/openhuman/artifacts/store.rs`:
- Around line 26-57: The validator validate_artifact_id currently allows the
single-dot "." which leads to delete_artifact calling remove_dir_all on the
artifacts root; update validate_artifact_id to reject "." (and also reject "./"
and ".\\" to be safe) by returning an Err with a clear message (same format as
other errors), so delete_artifact and any other callers cannot use "." as an
artifact_id and accidentally operate on the whole artifacts tree.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 37ca5f18-69fc-4ea8-ad37-543ac0921a5a
📒 Files selected for processing (10)
.claude/memory.mdsrc/core/all.rssrc/openhuman/artifacts/mod.rssrc/openhuman/artifacts/ops.rssrc/openhuman/artifacts/ops_tests.rssrc/openhuman/artifacts/schemas.rssrc/openhuman/artifacts/store.rssrc/openhuman/artifacts/store_tests.rssrc/openhuman/artifacts/types.rssrc/openhuman/mod.rs
|
|
||
| - **Filesystem-backed persistence, no SQLite** — `src/openhuman/artifacts/` stores JSON metadata (`meta.json`) + binary blobs under `<workspace_dir>/artifacts/<uuid>/`. Pattern mirrors `memory/ops/files.rs` but simpler. | ||
| - **`"ai"` namespace in controller registry** — RPC methods are `openhuman.ai_list_artifacts`, `openhuman.ai_get_artifact`, `openhuman.ai_delete_artifact`. Future `ai_*` methods should use this same namespace. | ||
| - **Two-layer path validation required** — (1) `validate_artifact_id` rejects empty strings, `/`, `\`, `..`, absolute Unix paths, Windows `C:` and UNC `\\` paths; (2) `assert_within_root` canonicalizes and checks containment. Replicate this pattern for any new filesystem-backed domain. |
There was a problem hiding this comment.
Correct the security guidance and assert_within_root description.
This line is currently misleading in two ways: assert_within_root here does not canonicalize paths (it checks starts_with), and advising teams to “replicate this pattern” conflicts with the documented policy that file I/O validation must go through src/openhuman/security/policy.rs (validate_path / validate_parent_path). Please update this note to avoid steering future domains toward a non-policy pattern.
Suggested doc fix
-- **Two-layer path validation required** — (1) `validate_artifact_id` rejects empty strings, `/`, `\`, `..`, absolute Unix paths, Windows `C:` and UNC `\\` paths; (2) `assert_within_root` canonicalizes and checks containment. Replicate this pattern for any new filesystem-backed domain.
+- **Artifacts path safety (domain-specific)** — this domain currently uses (1) `validate_artifact_id` input checks and (2) `assert_within_root` containment checks before filesystem operations. Note: `assert_within_root` is a containment check (not canonicalization). For new filesystem-backed domains, follow the security policy entry points in `src/openhuman/security/policy.rs` (`validate_path` / `validate_parent_path`).📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - **Two-layer path validation required** — (1) `validate_artifact_id` rejects empty strings, `/`, `\`, `..`, absolute Unix paths, Windows `C:` and UNC `\\` paths; (2) `assert_within_root` canonicalizes and checks containment. Replicate this pattern for any new filesystem-backed domain. | |
| - **Artifacts path safety (domain-specific)** — this domain currently uses (1) `validate_artifact_id` input checks and (2) `assert_within_root` containment checks before filesystem operations. Note: `assert_within_root` is a containment check (not canonicalization). For new filesystem-backed domains, follow the security policy entry points in `src/openhuman/security/policy.rs` (`validate_path` / `validate_parent_path`). |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/memory.md at line 217, Update the note to correct the behavior and
point to the canonical policy: clarify that assert_within_root does not
canonicalize but only checks prefix containment (starts_with), remove the
instruction to “replicate this pattern” for new domains, and instead direct
teams to use the centralized file I/O validation in
src/openhuman/security/policy.rs (i.e., call validate_path /
validate_parent_path) while keeping validate_artifact_id's two-layer checks as
an example for artifact IDs only.
| // Compute absolute path for the caller's convenience. | ||
| let absolute_path = config | ||
| .workspace_dir | ||
| .join("artifacts") | ||
| .join(&meta.path) | ||
| .to_string_lossy() | ||
| .into_owned(); |
There was a problem hiding this comment.
Harden absolute_path derivation against unsafe meta.path.
meta.path is used directly in join(...). If it is absolute (or contains parent traversal), the computed absolute_path can escape the artifacts root.
Suggested fix
+ use std::path::{Component, Path};
+
+ let rel = Path::new(&meta.path);
+ if rel.is_absolute()
+ || rel
+ .components()
+ .any(|c| matches!(c, Component::ParentDir | Component::RootDir | Component::Prefix(_)))
+ {
+ return Err(format!(
+ "[artifacts] invalid artifact path for id={artifact_id}: {:?}",
+ meta.path
+ ));
+ }
+
+ let artifacts_root = config.workspace_dir.join("artifacts");
let absolute_path = config
- .workspace_dir
- .join("artifacts")
+ .workspace_dir
+ .join("artifacts")
.join(&meta.path)
.to_string_lossy()
.into_owned();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Compute absolute path for the caller's convenience. | |
| let absolute_path = config | |
| .workspace_dir | |
| .join("artifacts") | |
| .join(&meta.path) | |
| .to_string_lossy() | |
| .into_owned(); | |
| // Compute absolute path for the caller's convenience. | |
| use std::path::{Component, Path}; | |
| let rel = Path::new(&meta.path); | |
| if rel.is_absolute() | |
| || rel | |
| .components() | |
| .any(|c| matches!(c, Component::ParentDir | Component::RootDir | Component::Prefix(_))) | |
| { | |
| return Err(format!( | |
| "[artifacts] invalid artifact path for id={artifact_id}: {:?}", | |
| meta.path | |
| )); | |
| } | |
| let artifacts_root = config.workspace_dir.join("artifacts"); | |
| let absolute_path = config | |
| .workspace_dir | |
| .join("artifacts") | |
| .join(&meta.path) | |
| .to_string_lossy() | |
| .into_owned(); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/artifacts/ops.rs` around lines 63 - 69, The current derivation
of absolute_path uses meta.path directly which allows absolute paths or
parent-traversal to escape the artifacts root; update the logic that builds
absolute_path (the code touching absolute_path, config.workspace_dir, and
meta.path) to first validate/sanitize meta.path: reject or canonicalize if
Path::new(&meta.path).is_absolute() and reject or disallow any
Path::new(&meta.path).components() containing Component::ParentDir, or
alternatively rebuild a safe relative PathBuf by filtering components to only
Normal components before joining; if validation fails return an error instead of
joining, then join the sanitized relative path to
config.workspace_dir.join("artifacts") and convert to string as before.
| outputs: vec![FieldSchema { | ||
| name: "artifact", | ||
| ty: TypeSchema::Ref("ArtifactMeta"), | ||
| comment: "Artifact metadata plus absolute_path field.", | ||
| required: true, | ||
| }], |
There was a problem hiding this comment.
get_artifact output schema does not encode the documented absolute_path field.
The schema advertises artifact as ArtifactMeta, but the comment says it includes an extra absolute_path. That makes discovery/docs contract ambiguous for clients. Please model the extra field explicitly (or remove that claim if not returned).
Suggested schema shape
- outputs: vec![FieldSchema {
- name: "artifact",
- ty: TypeSchema::Ref("ArtifactMeta"),
- comment: "Artifact metadata plus absolute_path field.",
- required: true,
- }],
+ outputs: vec![FieldSchema {
+ name: "artifact",
+ ty: TypeSchema::Object {
+ fields: vec![
+ FieldSchema {
+ name: "meta",
+ ty: TypeSchema::Ref("ArtifactMeta"),
+ comment: "Artifact metadata.",
+ required: true,
+ },
+ FieldSchema {
+ name: "absolute_path",
+ ty: TypeSchema::String,
+ comment: "Resolved absolute file path for the artifact.",
+ required: true,
+ },
+ ],
+ },
+ comment: "Artifact metadata plus resolved absolute path.",
+ required: true,
+ }],📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| outputs: vec![FieldSchema { | |
| name: "artifact", | |
| ty: TypeSchema::Ref("ArtifactMeta"), | |
| comment: "Artifact metadata plus absolute_path field.", | |
| required: true, | |
| }], | |
| outputs: vec![FieldSchema { | |
| name: "artifact", | |
| ty: TypeSchema::Object { | |
| fields: vec![ | |
| FieldSchema { | |
| name: "meta", | |
| ty: TypeSchema::Ref("ArtifactMeta"), | |
| comment: "Artifact metadata.", | |
| required: true, | |
| }, | |
| FieldSchema { | |
| name: "absolute_path", | |
| ty: TypeSchema::String, | |
| comment: "Resolved absolute file path for the artifact.", | |
| required: true, | |
| }, | |
| ], | |
| }, | |
| comment: "Artifact metadata plus resolved absolute path.", | |
| required: true, | |
| }], |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/artifacts/schemas.rs` around lines 87 - 92, The output schema
for get_artifact currently uses TypeSchema::Ref("ArtifactMeta") but the comment
claims an extra absolute_path field; update the contract to match by replacing
the output FieldSchema type with an explicit schema that includes ArtifactMeta
plus absolute_path (either by changing the type to Ref("ArtifactMetaWithPath")
and adding that new type definition, or by using an inline TypeSchema::Object
that spreads ArtifactMeta fields and adds absolute_path: TypeSchema::String
required: true); ensure the FieldSchema name stays "artifact" and adjust any
schema registry entries that reference ArtifactMeta if you introduce
ArtifactMetaWithPath.
| fn validate_artifact_id(id: &str) -> Result<(), String> { | ||
| if id.is_empty() { | ||
| return Err("[artifacts] artifact_id must not be empty".to_string()); | ||
| } | ||
| if id.contains('/') { | ||
| return Err(format!( | ||
| "[artifacts] artifact_id must not contain '/': {id:?}" | ||
| )); | ||
| } | ||
| if id.contains('\\') { | ||
| return Err(format!( | ||
| "[artifacts] artifact_id must not contain '\\': {id:?}" | ||
| )); | ||
| } | ||
| if id == ".." || id.starts_with("../") || id.starts_with("..\\") { | ||
| return Err(format!( | ||
| "[artifacts] artifact_id must not be a path traversal: {id:?}" | ||
| )); | ||
| } | ||
| // Reject absolute paths (Unix /foo or Windows C:\foo / \\server\share) | ||
| if id.starts_with('/') || id.starts_with('\\') { | ||
| return Err(format!( | ||
| "[artifacts] artifact_id must not be an absolute path: {id:?}" | ||
| )); | ||
| } | ||
| // Reject Windows drive-letter paths like C: | ||
| if id.len() >= 2 && id.as_bytes()[1] == b':' { | ||
| return Err(format!( | ||
| "[artifacts] artifact_id must not be an absolute path: {id:?}" | ||
| )); | ||
| } | ||
| Ok(()) |
There was a problem hiding this comment.
Reject . as an artifact ID to prevent full-root deletion.
validate_artifact_id allows ".". That value flows into delete_artifact, where remove_dir_all(<workspace>/artifacts/.) can delete the entire artifacts tree.
Suggested fix
fn validate_artifact_id(id: &str) -> Result<(), String> {
if id.is_empty() {
return Err("[artifacts] artifact_id must not be empty".to_string());
}
+ if id == "." || id == ".." {
+ return Err(format!(
+ "[artifacts] artifact_id must not be a reserved path component: {id:?}"
+ ));
+ }
if id.contains('/') {
return Err(format!(
"[artifacts] artifact_id must not contain '/': {id:?}"
));
}
@@
- if id == ".." || id.starts_with("../") || id.starts_with("..\\") {
+ if id.starts_with("../") || id.starts_with("..\\") {
return Err(format!(
"[artifacts] artifact_id must not be a path traversal: {id:?}"
));
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/artifacts/store.rs` around lines 26 - 57, The validator
validate_artifact_id currently allows the single-dot "." which leads to
delete_artifact calling remove_dir_all on the artifacts root; update
validate_artifact_id to reject "." (and also reject "./" and ".\\" to be safe)
by returning an Err with a clear message (same format as other errors), so
delete_artifact and any other callers cannot use "." as an artifact_id and
accidentally operate on the whole artifacts tree.
Summary
artifactsdomain (src/openhuman/artifacts/) withArtifactMetastruct,ArtifactKindenum (Presentation, Document, Image, Other), andArtifactStatusenum (Pending, Ready, Failed)<workspace>/artifacts/<uuid>/meta.json+ binary blobs with lazy directory creationai_list_artifacts(paginated),ai_get_artifact(metadata + absolute path),ai_delete_artifact(removes metadata + blob)Test plan
cargo checkpassescargo test -p openhuman --lib -- "artifacts"— 38/38 tests passcargo fmt --checkcleanChecklist
[artifacts]prefix on all operationsCloses #2776
Summary by CodeRabbit
New Features