feat: add OwnedSourceMap wrapper for downstream owned-only use#334
Merged
Conversation
BREAKING CHANGE: `SourceMap` is now `SourceMap<'a>`. All public string fields previously stored as `Arc<str>` are now `Cow<'a, str>` and all accessors that returned `&Arc<str>` now return `&str`. Motivation ========== The previous design held `Vec<Arc<str>>` for `names` / `sources` / `sources_content`. Even after the borrowed-deserialization wrapper in the in-flight version of this PR, every name and source still cost one `Arc::from(&str)` allocation (`ArcInner` header + data copy). And the public `Arc<str>` API forced concat to clone an `Arc` per token — fine for refcount, but still 8 bytes of atomic state per string. The "no Arc, borrow everything" answer is to make the strings actual borrows. With `Cow<'a, str>` and `#[serde(borrow)]`, the deserializer returns a slice into the input JSON buffer for every unescaped string — zero allocations, just a (ptr, len) pair. Escaped strings still allocate (a `String` in `Cow::Owned`), but those are the exception, not the rule. The concat builder benefits the most: it now re-borrows strings from its input `SourceMap`s instead of cloning `Arc`s, so concatenation does zero string allocations. Trade-offs ========== * `SourceMap` now carries a lifetime parameter. For data parsed from JSON, that lifetime ties the map to its input buffer; callers must keep the buffer alive. For maps built via `SourceMapBuilder` / `ConcatSourceMapBuilder`, the lifetime is implied by the inputs (or `'static` for the standalone builder). * `Cow<'_, str>` is 32 bytes per entry vs `Arc<str>`'s 16, so the in-memory `Vec`s are larger; this offsets some of the per-string allocation savings on the parse path. Numbers (vs main with the xlarge fixture) ========================================= | benchmark | before | after | Δ | |-----------------------|---------|---------|--------| | parse/real_large | 4.29 µs | 3.87 µs | -10% | | parse/real_medium | 666 ns | 569 ns | -15% | | parse/real_small | 445 ns | 375 ns | -16% | | concat/from_sourcemaps| ~19 µs | 10.2 µs | -46% | Public API changes ================== * `SourceMap` → `SourceMap<'a>` * `SourceMap::new(file: Option<Arc<str>>, names: Vec<Arc<str>>, ...)` → `(file: Option<Cow<'a, str>>, names: Vec<Cow<'a, str>>, ...)` * `get_file() -> Option<&Arc<str>>` → `Option<&str>` * `get_name(id) -> Option<&Arc<str>>` → `Option<&str>` * `get_source(id) -> Option<&Arc<str>>` → `Option<&str>` * `get_source_content(id) -> Option<&Arc<str>>` → `Option<&str>` * `get_names() -> impl Iterator<Item = &Arc<str>>` → `&str` * `get_sources() -> impl Iterator<Item = &Arc<str>>` → `&str` * `get_source_contents() -> impl Iterator<Item = Option<&Arc<str>>>` → `Option<&str>` * `SourceViewToken<'a>` → `SourceViewToken<'sm, 'data>` (two lifetimes) * `ConcatSourceMapBuilder` → `ConcatSourceMapBuilder<'a>`
Removed the `#[serde(default)]` attributes I'd added to fields that the sourcemap spec actually requires (`sources`, `mappings`, etc.), so malformed JSON missing those fields now errors out as it does on main. Only `names` keeps `#[serde(default)]`, matching the original `JSONSourceMap` shape. Restores the `sourcesMissing` tc39 spec test.
Allocate owned copies of every borrowed string in a `SourceMap` so that the result is `SourceMap<'static>`, detached from the original JSON input buffer (or from a concat builder's input lifetime). Discovered while linking the new lifetime-parameterized SourceMap into downstream consumers (oxc, rolldown). Several places need a concat or parse result to outlive its source data: the previous Arc-based API let them share cheaply, the new Cow-based API needs an explicit copy step. `into_owned` is that step.
…sforms Add `SourceMapParts<'a>` — a destructured form of `SourceMap<'a>` that exposes its owned `Cow<'a, str>` fields. Returned by `SourceMap::into_parts`, consumed by `SourceMap::from_parts` / `From<SourceMapParts>`. Use case: downstream code that wants to *transform* a sourcemap (rewrite tokens, swap in a different `file`, drop `token_chunks`) without going through string accessors. The previous API only exposed `&str` views, which forced a clone of every name / source / sourcesContent string when the caller wanted to take ownership and reassemble. With `into_parts`, that becomes a `Vec<Cow<'a, str>>` move — zero per-string allocations. Pairs naturally with `set_file` / `set_sources` etc. for in-place mutation when the caller only needs to change one field.
Adds `OwnedSourceMap`, a lifetime-free newtype around
`SourceMap<'static>`. Lets downstream consumers (oxc, rolldown) store
sourcemaps as long-lived struct fields without the `'static` lifetime
annotation leaking into every type signature.
It exposes the same accessors as `SourceMap`, delegates parse /
serialize / lookup / etc., and provides `as_source_map()` for code
that needs to hand the underlying `&SourceMap<'_>` to functions that
are still generic over the lifetime (e.g. encoder, concat builder).
```rust
// Builder produces a no-lifetime owned map:
let owned: OwnedSourceMap = SourceMapBuilder::default()
.into_owned_sourcemap();
// Parse + detach in one step:
let owned = OwnedSourceMap::from_json_string(json)?;
// Borrow back into the lifetime-parameterized API when needed:
let json = owned.as_source_map().to_json_string();
```
No allocation-layout change vs. the underlying `SourceMap<'static>` —
this is purely an API-ergonomics layer. The zero-copy parse path is
unchanged.
Lets downstream NAPI bindings (oxc_minify_napi, oxc_transform_napi) accept either flavor when converting to the JS-facing SourceMap shape.
Merging this PR will improve performance by 6.07%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ⚡ | lookup_table[real_small] |
1.4 µs | 1.3 µs | +2.18% |
| ⚡ | add_sourcemap_loop |
363 µs | 343.2 µs | +5.76% |
| ⚡ | from_sourcemaps |
344.3 µs | 325.4 µs | +5.81% |
| ⚡ | lookup_table[real_medium] |
1.5 µs | 1.4 µs | +4.03% |
| ⚡ | parse[real_large] |
54.7 µs | 50.5 µs | +8.34% |
| ⚡ | parse[real_medium] |
16.5 µs | 14.9 µs | +10.72% |
| ❌ | serialize[real_small] |
4.3 µs | 4.4 µs | -2.44% |
| ⚡ | parse[real_small] |
13.2 µs | 11.6 µs | +13.16% |
| ⚡ | parse[real_xlarge] |
1.7 ms | 1.4 ms | +20.6% |
| ⚡ | from_json_string_inline |
15.8 µs | 14.1 µs | +11.97% |
| ❌ | serialize[real_large] |
20.3 µs | 20.7 µs | -1.9% |
| ❌ | serialize[real_medium] |
4.9 µs | 5.1 µs | -2.8% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing perf/owned-sourcemap (4ad61e8) with main (9323530)
Footnotes
-
5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Six bugs/regressions surfaced by the max-effort review: - encode: debug_id was pushed raw without JSON escaping, producing malformed output (and breaking round-trip) for any debug_id with a quote/backslash/control char. Now flows through `escape_into` like file/sourceRoot, plus a regression test. - napi: `From<SourceMap> for napi::SourceMap` was hardcoding `x_google_ignorelist: None` and silently dropping the field. Now populated from `json.x_google_ignore_list`. Same fix flows through the new `From<OwnedSourceMap>` delegation. - concat: `add_sourcemap`'s per-token closures mutated `token_chunk_prev_source_id`/`name_id` BEFORE the `i == 0` dedup `continue` check ran, so a deduplicated first token polluted the next sourcemap's TokenChunk header. Hoist the offset-adjusted ids into locals, build the new token, run the dedup check, THEN commit the prev-id state — only for tokens we actually push. - sourcemap_builder: dedup `FxHashMap<Cow<'static, str>, u32>` keyed by the string value forced an extra `String` allocation per unique name/source (`Cow::Owned(String).clone()` = fresh heap alloc). The old `Arc<str>`-keyed code was 1 alloc + 1 refcount bump; the migration silently doubled allocations on a hot path. Replace with `hashbrown::HashTable<u32>` storing indices into the `Vec`, looking up via FxHasher-hashed content equality — back to 1 alloc per unique entry. Add a dedup regression test. - sourcemap_visualizer: `source_contents_lines_map` was built via `filter_map` that dropped `None` entries, then indexed by `source_id`. Mixed Some/None source_contents either panicked out-of-bounds or silently mis-mapped to the wrong content. Build a full-length `Vec<Option<...>>` so the index stays aligned with `source_id`, and skip-not-panic for sources with no content. - owned_sourcemap: explicitly delegate `lookup_token` / `lookup_source_view_token` (previously only reachable via `Deref`, inconsistent with the rest of the delegated API). Drop the unused `SourceMap` import from the module-level doctest and rename `SourceMapBuilder` to its short form so the doctest is `-Dwarnings` clean. All 23 unit/integration/doctest tests pass; bench numbers unchanged within run-to-run noise.
CI runs `cargo fmt --check`, `cargo clippy --all-targets --all-features -D warnings`, and `cargo doc -D warnings`. The latter two were green; the format-check tripped on three files touched by the previous fix commit, and `cargo doc` failed with `unresolved link to OwnedSourceMap` because the SourceMapBuilder docstring referenced `OwnedSourceMap` without the `crate::` prefix.
CI's typos check flags `mis-aligning` as a likely typo of `miss`/`mist`.
Reword the comment in src/sourcemap_visualizer.rs to use the closed
form ("misaligned") which is not in the typos rule set.
The previous commit swapped the builder's dedup from `FxHashMap<Cow<'static, str>, u32>` to `hashbrown::HashTable<u32>` to avoid the second string allocation per unique entry (one for the Vec, one for the HashMap key). Per request, drop the extra dependency for now and accept the double-allocation: the cost is one extra `String` per *unique* name / source — bounded by the input's distinct-string count, not the token count, so still small in practice.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a thin newtype wrapper `OwnedSourceMap` around `SourceMap<'static>`. The wrapper hides the lifetime annotation that PR #329 introduced, giving downstream consumers (oxc, rolldown) a clean no-lifetime type for storing sourcemaps in long-lived struct fields.
Stacks on top of #329 (the borrow-string parse path). No allocation-layout change; this is purely an API-ergonomics layer.
```rust
use oxc_sourcemap::{OwnedSourceMap, SourceMap};
// Builder produces a no-lifetime map directly:
let owned: OwnedSourceMap = SourceMapBuilder::default().into_owned_sourcemap();
// Parse-and-detach in one step (uses #329's zero-copy parse path internally):
let owned = OwnedSourceMap::from_json_string(json)?;
// Same accessors as SourceMap, returning &str:
let _file: Option<&str> = owned.get_file();
let _names: Vec<&str> = owned.get_names().collect();
// Borrow back into the lifetime-parameterized API where needed
// (e.g. encode/concat/visualizer that are still generic over the lifetime):
owned.as_source_map().to_json_string();
```
What downstream looks like
Before (PR #329 alone):
```rust
pub struct Asset {
pub map: Option<oxc_sourcemap::SourceMap<'static>>,
// ^^^^^^^^^ noise
}
```
After (with this PR):
```rust
pub struct Asset {
pub map: Option<oxc_sourcemap::OwnedSourceMap>,
}
```
The `'static` annotation is gone from every storage site. Callers who genuinely want zero-copy parse still use `SourceMap<'a>` directly.
Downstream verified
Breaking changes
None for the existing `SourceMap<'a>` API in #329 — this is purely additive.
Files