
feat(cache): restore output files on cache hit#321

Open
branchseer wants to merge 11 commits into main from feat/output-restoration


branchseer commented Apr 6, 2026

Previously, a cache hit only replayed terminal output (stdout/stderr). Build artifacts like dist/ were not restored, so downstream tasks or users had to re-run even on a cache hit.

Now, output files written during task execution are archived and restored automatically. The output field reuses the same config types and resolution logic as input (ResolvedInputConfig is renamed to ResolvedGlobConfig to reflect this).

Config examples:

// Default: auto-detect written files (same as input default)
{ "command": "tsc --outDir dist" }

// Explicit globs
{ "command": "tsc", "output": ["dist/**"] }

// Auto with exclusions
{ "command": "webpack", "output": [{ "auto": true }, "!dist/cache/**"] }

// Disable output restoration
{ "command": "vitest run", "output": [] }
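
As a rough illustration of how the four shapes above might resolve, here is a minimal sketch. `OutputEntry`, `ResolvedOutput`, and `resolve_output` are hypothetical names invented for this example; the PR itself reuses `UserInputsConfig` and `ResolvedGlobConfig::from_user_config()`, whose exact fields and behavior may differ.

```rust
/// One entry of the `output` array after JSON parsing.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum OutputEntry {
    /// `{ "auto": true }` or `{ "auto": false }`
    Auto(bool),
    /// A glob string; a leading `!` marks an exclusion.
    Glob(String),
}

/// Hypothetical resolved form, loosely modelled on `ResolvedGlobConfig`.
#[derive(Debug, Default, PartialEq)]
struct ResolvedOutput {
    includes_auto: bool,
    positive_globs: Vec<String>,
    negative_globs: Vec<String>,
}

/// Resolve the user-facing `output` field.
/// `None` means the field was omitted, which defaults to auto-detection.
fn resolve_output(user: Option<&[OutputEntry]>) -> ResolvedOutput {
    let Some(entries) = user else {
        return ResolvedOutput { includes_auto: true, ..Default::default() };
    };
    let mut resolved = ResolvedOutput::default();
    for entry in entries {
        match entry {
            OutputEntry::Auto(enabled) => resolved.includes_auto |= *enabled,
            OutputEntry::Glob(glob) => match glob.strip_prefix('!') {
                // Negative glob: stored without the leading `!`.
                Some(negated) => resolved.negative_globs.push(negated.to_string()),
                None => resolved.positive_globs.push(glob.clone()),
            },
        }
    }
    resolved
}
```

Under these assumptions, an empty array resolves to no auto-detection and no globs, which is how the sketch models "disable output restoration".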

Test plan

  • E2E: auto output detection restores files on cache hit
  • E2E: glob output only restores matched files
  • E2E: auto output works with non-auto input
  • E2E: negative output globs exclude files from restoration
  • E2E: changing output config invalidates cache
  • All existing E2E and plan snapshot tests pass
  • cargo clippy -- -D warnings clean

🤖 Generated with Claude Code

branchseer and others added 11 commits March 31, 2026 10:15
…` and add `output` field

Rename `ResolvedInputConfig` to `ResolvedGlobConfig` since the struct is now
shared by both input and output config. Add `output` field to
`EnabledCacheConfig` with the same type as `input` (`Option<UserInputsConfig>`),
defaulting to auto-detection. Add `output_config` to `CacheConfig` and resolve
it via the shared `ResolvedGlobConfig::from_user_config()`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `output_config: ResolvedGlobConfig` to `CacheMetadata` and propagate it
through `plan_spawn_execution` and `resolve_synthetic_cache_config`. Synthetic
tasks merge output globs into the parent's output config, mirroring the
existing input config merging logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `zstd` as a workspace dependency and add `tar`, `uuid` (with v4 feature),
and `zstd` to the `vite_task` crate for output archive creation and extraction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… and archive module

- Add `output_config` to `CacheEntryKey` so changing output config invalidates cache
- Add `output_archive: Option<Str>` to `CacheEntryValue` for the tar.zst filename
- Bump DB version from 10 to 11 (resets old databases)
- Add `FingerprintMismatch::OutputConfig` variant with display support
- Create `archive.rs` module with tar+zstd create/extract operations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
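
To illustrate why adding `output_config` to the cache key invalidates old entries, here is a minimal sketch using an in-memory `HashMap` in place of the real database. All type and function names below (`GlobConfigSketch`, `CacheKeySketch`, `CacheValueSketch`, `auto_config`, `key_for`) are hypothetical stand-ins, not the crate's actual `CacheEntryKey` or `CacheEntryValue` definitions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the resolved glob config.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct GlobConfigSketch {
    includes_auto: bool,
    positive_globs: Vec<String>,
    negative_globs: Vec<String>,
}

/// Hypothetical cache key: the output config is part of the key,
/// so a different output config can never match an old entry.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct CacheKeySketch {
    command: String,
    output_config: GlobConfigSketch,
}

/// Hypothetical cache value carrying the archive filename, if any.
#[derive(Debug, Clone, PartialEq)]
struct CacheValueSketch {
    output_archive: Option<String>,
}

/// Default config in this sketch: auto-detection with no globs.
fn auto_config() -> GlobConfigSketch {
    GlobConfigSketch {
        includes_auto: true,
        positive_globs: Vec::new(),
        negative_globs: Vec::new(),
    }
}

/// Build a key for a command under a given output config.
fn key_for(command: &str, output_config: GlobConfigSketch) -> CacheKeySketch {
    CacheKeySketch { command: command.to_string(), output_config }
}

/// Store an entry in the in-memory cache.
fn put(
    cache: &mut HashMap<CacheKeySketch, CacheValueSketch>,
    key: CacheKeySketch,
    archive: Option<&str>,
) {
    cache.insert(key, CacheValueSketch { output_archive: archive.map(str::to_string) });
}
```

With this shape, looking up the same command after changing `output` produces a different key and therefore a miss, which matches the behavior the commit describes.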
…ration

- Enable fspy tracking when either input or output uses auto-detection
- Add `collect_and_archive_outputs()` to gather output files from fspy writes
  and/or output globs, then create a tar.zst archive on cache update
- Restore output files from archive on cache hit via `extract_output_archive()`
- Add `collect_glob_paths()` to glob_inputs.rs for path-only collection
- Thread `cache_dir` through `ExecutionContext` and `execute_spawn`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Plan snapshots now include the new `output_config` field. E2E input-cache-test
snapshots updated to reflect the output_config in cache entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 5 e2e test cases covering output restoration:
- Auto output detection: files restored on cache hit
- Glob output: only matched files restored (dist/ yes, tmp/ no)
- Auto output with non-auto input: output auto works independently
- Negative output globs: excluded files not restored
- Output config change: invalidates cache with descriptive message

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Windows, `&&`-chained commands become separate cache entries. The
`vtt mkdir -p dist` sub-command's fspy-inferred input is the `dist`
directory itself — when the test deletes `dist`, the next run detects
`dist` was removed and reports a cache miss for that sub-command.

Fix by making `vtt write-file` create parent directories automatically,
then removing the `mkdir` commands from test fixtures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-test

On Windows, `create_dir_all` causes fspy to track parent directories as
inferred inputs. Deleting the directory then triggers a cache miss on the
next run. Fix by deleting only the output files, keeping directories intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 79f428a501


Comment on lines +410 to +411
let needs_fspy = cache_metadata.input_config.includes_auto
|| cache_metadata.output_config.includes_auto;

P1: Separate input/output negative filters for auto tracking

This now enables fspy whenever output_config.includes_auto is true, but the filter passed to path tracking is still derived only from input_config.negative_globs, so writes that match input negatives are dropped before output archiving. In execute_spawn, a config like input: [{auto:true}, "!dist/**"] with default auto output will never record dist/** writes in path_writes, causing cache hits to replay logs but fail to restore expected output files.
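
One possible shape for the fix is to classify tracked accesses with two independent negative filters. The sketch below is an assumption-laden illustration: `Access`, `AccessKind`, `is_excluded`, and `split_accesses` are hypothetical names, and plain path-prefix matching stands in for real glob matching. It does not reflect how fspy or `execute_spawn` actually expose path accesses.

```rust
/// Kind of file access reported by the tracker (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum AccessKind {
    Read,
    Write,
}

/// A single tracked access (hypothetical).
struct Access<'a> {
    path: &'a str,
    kind: AccessKind,
}

/// Prefix matching used here as a simplified stand-in for negative globs.
fn is_excluded(path: &str, negative_prefixes: &[&str]) -> bool {
    negative_prefixes.iter().any(|prefix| path.starts_with(prefix))
}

/// Split accesses into inferred inputs and outputs, applying the input
/// negatives only to reads and the output negatives only to writes.
fn split_accesses<'a>(
    accesses: &[Access<'a>],
    input_negatives: &[&str],
    output_negatives: &[&str],
) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut inputs = Vec::new();
    let mut outputs = Vec::new();
    for access in accesses {
        match access.kind {
            AccessKind::Read if !is_excluded(access.path, input_negatives) => {
                inputs.push(access.path);
            }
            AccessKind::Write if !is_excluded(access.path, output_negatives) => {
                outputs.push(access.path);
            }
            // Excluded by the filter that applies to this access kind.
            _ => {}
        }
    }
    (inputs, outputs)
}
```

With an input negative of `dist/` and no output negatives, a write to `dist/index.js` is still recorded as an output while reads under `dist/` are ignored, which is the case the review comment describes.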


sorted_files.sort();

// Create archive with UUID filename
let archive_name: Str = vite_str::format!("{}.tar.zst", uuid::Uuid::new_v4());

P2: Clean up superseded output archives on cache overwrite

A fresh UUID archive filename is generated on every successful cache write, but the previous archive file is never removed when the same cache key is updated, so old tarballs become orphaned. Re-running tasks over time will continuously accumulate stale *.tar.zst files in the cache directory and can cause unbounded disk growth even though only the latest archive is reachable from cache.db.
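
A minimal sketch of one way to address this: write the new archive first, then delete the file the previous cache entry pointed at. `store_archive` and its parameters are hypothetical, and the caller is assumed to have read the previous archive name from the existing cache entry before overwriting it. The real code would generate the name with `uuid::Uuid::new_v4()` and write a tar.zst stream rather than raw bytes.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write the new archive into `cache_dir`, then remove the archive that the
/// overwritten cache entry referenced (if any). Hypothetical helper.
fn store_archive(
    cache_dir: &Path,
    previous: Option<&str>,
    new_name: &str,
    bytes: &[u8],
) -> io::Result<()> {
    // Write the new file first so a failure never leaves the entry without an archive.
    fs::write(cache_dir.join(new_name), bytes)?;

    if let Some(old_name) = previous {
        if old_name != new_name {
            match fs::remove_file(cache_dir.join(old_name)) {
                Ok(()) => {}
                // Already gone (for example, cleaned up manually): not an error.
                Err(err) if err.kind() == io::ErrorKind::NotFound => {}
                Err(err) => return Err(err),
            }
        }
    }
    Ok(())
}
```

Deleting after the write, and tolerating a missing old file, keeps the operation safe to retry. A periodic sweep of files not referenced from the database would be an alternative design that also covers crashes between the two steps.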

