Skip to content

feat: add OwnedSourceMap wrapper for downstream owned-only use#334

Merged
Boshen merged 12 commits into
mainfrom
perf/owned-sourcemap
May 25, 2026
Merged

feat: add OwnedSourceMap wrapper for downstream owned-only use#334
Boshen merged 12 commits into
mainfrom
perf/owned-sourcemap

Conversation

@Boshen
Copy link
Copy Markdown
Member

@Boshen Boshen commented May 25, 2026

Summary

Adds a thin newtype wrapper `OwnedSourceMap` around `SourceMap<'static>`. The wrapper hides the lifetime annotation that PR #329 introduced, giving downstream consumers (oxc, rolldown) a clean no-lifetime type for storing sourcemaps in long-lived struct fields.

Stacks on top of #329 (the borrow-string parse path). No allocation-layout change; this is purely an API-ergonomics layer.

```rust
use oxc_sourcemap::{OwnedSourceMap, SourceMap};

// Builder produces a no-lifetime map directly:
let owned: OwnedSourceMap = SourceMapBuilder::default().into_owned_sourcemap();

// Parse-and-detach in one step (uses #329's zero-copy parse path internally):
let owned = OwnedSourceMap::from_json_string(json)?;

// Same accessors as SourceMap, returning &str:
let _file: Option<&str> = owned.get_file();
let _names: Vec<&str> = owned.get_names().collect();

// Borrow back into the lifetime-parameterized API where needed
// (e.g. encode/concat/visualizer that are still generic over the lifetime):
owned.as_source_map().to_json_string();
```

What downstream looks like

Before (PR #329 alone):

```rust
pub struct Asset {
pub map: Option<oxc_sourcemap::SourceMap<'static>>,
// ^^^^^^^^^ noise
}
```

After (with this PR):

```rust
pub struct Asset {
pub map: Option<oxc_sourcemap::OwnedSourceMap>,
}
```

The `'static` annotation is gone from every storage site. Callers who genuinely want zero-copy parse still use `SourceMap<'a>` directly.

Downstream verified

  • oxc: 2 small `-> OwnedSourceMap` annotations in `oxc_codegen`; the napi shim now accepts both `SourceMap<'_>` and `OwnedSourceMap`. `cargo check --workspace --all-targets` clean.
  • rolldown: `rolldown_sourcemap` re-exports `OwnedSourceMap as SourceMap` so every existing `rolldown_sourcemap::SourceMap` usage upgrades for free. `adjust_sourcemap_dst_lines` and `collapse_sourcemaps` updated to the owned wrapper. `cargo check --workspace` clean, `rolldown_sourcemap` tests pass.

Breaking changes

None for the existing `SourceMap<'a>` API in #329 — this is purely additive.

Files

  • `src/owned_sourcemap.rs` (new) — wrapper type, delegating accessors / parse / serialize / lookup.
  • `src/lib.rs` — re-export `OwnedSourceMap`.
  • `src/sourcemap_builder.rs` — adds `into_owned_sourcemap()` convenience.
  • `src/napi.rs` — `impl From for napi::SourceMap`.

Boshen and others added 7 commits May 25, 2026 14:19
BREAKING CHANGE: `SourceMap` is now `SourceMap<'a>`. All public string
fields previously stored as `Arc<str>` are now `Cow<'a, str>` and all
accessors that returned `&Arc<str>` now return `&str`.

Motivation
==========

The previous design held `Vec<Arc<str>>` for `names` / `sources` /
`sources_content`. Even after the borrowed-deserialization wrapper in
the in-flight version of this PR, every name and source still cost one
`Arc::from(&str)` allocation (`ArcInner` header + data copy). And the
public `Arc<str>` API forced concat to clone an `Arc` per token — fine
for refcount, but still 8 bytes of atomic state per string.

The "no Arc, borrow everything" answer is to make the strings actual
borrows. With `Cow<'a, str>` and `#[serde(borrow)]`, the deserializer
returns a slice into the input JSON buffer for every unescaped string —
zero allocations, just a (ptr, len) pair. Escaped strings still
allocate (a `String` in `Cow::Owned`), but those are the exception, not
the rule.

The concat builder benefits the most: it now re-borrows strings from
its input `SourceMap`s instead of cloning `Arc`s, so concatenation does
zero string allocations.

Trade-offs
==========

* `SourceMap` now carries a lifetime parameter. For data parsed from
  JSON, that lifetime ties the map to its input buffer; callers must
  keep the buffer alive. For maps built via `SourceMapBuilder` /
  `ConcatSourceMapBuilder`, the lifetime is implied by the inputs (or
  `'static` for the standalone builder).
* `Cow<'_, str>` is 32 bytes per entry vs `Arc<str>`'s 16, so the
  in-memory `Vec`s are larger; this offsets some of the per-string
  allocation savings on the parse path.

Numbers (vs main with the xlarge fixture)
=========================================

| benchmark             | before  | after   | Δ      |
|-----------------------|---------|---------|--------|
| parse/real_large      | 4.29 µs | 3.87 µs | -10%   |
| parse/real_medium     | 666 ns  | 569 ns  | -15%   |
| parse/real_small      | 445 ns  | 375 ns  | -16%   |
| concat/from_sourcemaps| ~19 µs  | 10.2 µs | -46%   |

Public API changes
==================

* `SourceMap` → `SourceMap<'a>`
* `SourceMap::new(file: Option<Arc<str>>, names: Vec<Arc<str>>, ...)`
  → `(file: Option<Cow<'a, str>>, names: Vec<Cow<'a, str>>, ...)`
* `get_file() -> Option<&Arc<str>>` → `Option<&str>`
* `get_name(id) -> Option<&Arc<str>>` → `Option<&str>`
* `get_source(id) -> Option<&Arc<str>>` → `Option<&str>`
* `get_source_content(id) -> Option<&Arc<str>>` → `Option<&str>`
* `get_names() -> impl Iterator<Item = &Arc<str>>` → `&str`
* `get_sources() -> impl Iterator<Item = &Arc<str>>` → `&str`
* `get_source_contents() -> impl Iterator<Item = Option<&Arc<str>>>` → `Option<&str>`
* `SourceViewToken<'a>` → `SourceViewToken<'sm, 'data>` (two lifetimes)
* `ConcatSourceMapBuilder` → `ConcatSourceMapBuilder<'a>`
Removed the `#[serde(default)]` attributes I'd added to fields that the
sourcemap spec actually requires (`sources`, `mappings`, etc.), so
malformed JSON missing those fields now errors out as it does on main.
Only `names` keeps `#[serde(default)]`, matching the original
`JSONSourceMap` shape.

Restores the `sourcesMissing` tc39 spec test.
Allocate owned copies of every borrowed string in a `SourceMap` so that
the result is `SourceMap<'static>`, detached from the original JSON
input buffer (or from a concat builder's input lifetime).

Discovered while linking the new lifetime-parameterized SourceMap into
downstream consumers (oxc, rolldown). Several places need a concat or
parse result to outlive its source data: the previous Arc-based API
let them share cheaply, the new Cow-based API needs an explicit copy
step. `into_owned` is that step.
…sforms

Add `SourceMapParts<'a>` — a destructured form of `SourceMap<'a>` that
exposes its owned `Cow<'a, str>` fields. Returned by
`SourceMap::into_parts`, consumed by `SourceMap::from_parts` /
`From<SourceMapParts>`.

Use case: downstream code that wants to *transform* a sourcemap
(rewrite tokens, swap in a different `file`, drop `token_chunks`)
without going through string accessors. The previous API only exposed
`&str` views, which forced a clone of every name / source /
sourcesContent string when the caller wanted to take ownership and
reassemble.

With `into_parts`, that becomes a `Vec<Cow<'a, str>>` move — zero
per-string allocations. Pairs naturally with `set_file` /
`set_sources` etc. for in-place mutation when the caller only needs to
change one field.
Adds `OwnedSourceMap`, a lifetime-free newtype around
`SourceMap<'static>`. Lets downstream consumers (oxc, rolldown) store
sourcemaps as long-lived struct fields without the `'static` lifetime
annotation leaking into every type signature.

It exposes the same accessors as `SourceMap`, delegates parse /
serialize / lookup / etc., and provides `as_source_map()` for code
that needs to hand the underlying `&SourceMap<'_>` to functions that
are still generic over the lifetime (e.g. encoder, concat builder).

```rust
// Builder produces a no-lifetime owned map:
let owned: OwnedSourceMap = SourceMapBuilder::default()
    .into_owned_sourcemap();

// Parse + detach in one step:
let owned = OwnedSourceMap::from_json_string(json)?;

// Borrow back into the lifetime-parameterized API when needed:
let json = owned.as_source_map().to_json_string();
```

No allocation-layout change vs. the underlying `SourceMap<'static>` —
this is purely an API-ergonomics layer. The zero-copy parse path is
unchanged.
Lets downstream NAPI bindings (oxc_minify_napi, oxc_transform_napi)
accept either flavor when converting to the JS-facing SourceMap shape.
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 25, 2026

Merging this PR will improve performance by 6.07%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 9 improved benchmarks
❌ 3 regressed benchmarks
✅ 4 untouched benchmarks
⏩ 5 skipped benchmarks1

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark BASE HEAD Efficiency
lookup_table[real_small] 1.4 µs 1.3 µs +2.18%
add_sourcemap_loop 363 µs 343.2 µs +5.76%
from_sourcemaps 344.3 µs 325.4 µs +5.81%
lookup_table[real_medium] 1.5 µs 1.4 µs +4.03%
parse[real_large] 54.7 µs 50.5 µs +8.34%
parse[real_medium] 16.5 µs 14.9 µs +10.72%
serialize[real_small] 4.3 µs 4.4 µs -2.44%
parse[real_small] 13.2 µs 11.6 µs +13.16%
parse[real_xlarge] 1.7 ms 1.4 ms +20.6%
from_json_string_inline 15.8 µs 14.1 µs +11.97%
serialize[real_large] 20.3 µs 20.7 µs -1.9%
serialize[real_medium] 4.9 µs 5.1 µs -2.8%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing perf/owned-sourcemap (4ad61e8) with main (9323530)

Open in CodSpeed

Footnotes

  1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Boshen and others added 5 commits May 25, 2026 23:42
Six bugs/regressions surfaced by the max-effort review:

- encode: debug_id was pushed raw without JSON escaping, producing
  malformed output (and breaking round-trip) for any debug_id with a
  quote/backslash/control char. Now flows through `escape_into` like
  file/sourceRoot, plus a regression test.

- napi: `From<SourceMap> for napi::SourceMap` was hardcoding
  `x_google_ignorelist: None` and silently dropping the field. Now
  populated from `json.x_google_ignore_list`. Same fix flows through
  the new `From<OwnedSourceMap>` delegation.

- concat: `add_sourcemap`'s per-token closures mutated
  `token_chunk_prev_source_id`/`name_id` BEFORE the `i == 0` dedup
  `continue` check ran, so a deduplicated first token polluted the
  next sourcemap's TokenChunk header. Hoist the offset-adjusted ids
  into locals, build the new token, run the dedup check, THEN commit
  the prev-id state — only for tokens we actually push.

- sourcemap_builder: dedup `FxHashMap<Cow<'static, str>, u32>` keyed
  by the string value forced an extra `String` allocation per unique
  name/source (`Cow::Owned(String).clone()` = fresh heap alloc). The
  old `Arc<str>`-keyed code was 1 alloc + 1 refcount bump; the
  migration silently doubled allocations on a hot path. Replace with
  `hashbrown::HashTable<u32>` storing indices into the `Vec`, looking
  up via FxHasher-hashed content equality — back to 1 alloc per
  unique entry. Add a dedup regression test.

- sourcemap_visualizer: `source_contents_lines_map` was built via
  `filter_map` that dropped `None` entries, then indexed by
  `source_id`. Mixed Some/None source_contents either panicked
  out-of-bounds or silently mis-mapped to the wrong content. Build a
  full-length `Vec<Option<...>>` so the index stays aligned with
  `source_id`, and skip-not-panic for sources with no content.

- owned_sourcemap: explicitly delegate `lookup_token` /
  `lookup_source_view_token` (previously only reachable via `Deref`,
  inconsistent with the rest of the delegated API). Drop the unused
  `SourceMap` import from the module-level doctest and rename
  `SourceMapBuilder` to its short form so the doctest is `-Dwarnings`
  clean.

All 23 unit/integration/doctest tests pass; bench numbers unchanged
within run-to-run noise.
CI runs `cargo fmt --check`, `cargo clippy --all-targets --all-features
-D warnings`, and `cargo doc -D warnings`. The latter two were green;
the format-check tripped on three files touched by the previous fix
commit, and `cargo doc` failed with `unresolved link to OwnedSourceMap`
because the SourceMapBuilder docstring referenced `OwnedSourceMap`
without the `crate::` prefix.
CI's typos check flags `mis-aligning` as a likely typo of `miss`/`mist`.
Reword the comment in src/sourcemap_visualizer.rs to use the closed
form ("misaligned") which is not in the typos rule set.
The previous commit swapped the builder's dedup from
`FxHashMap<Cow<'static, str>, u32>` to `hashbrown::HashTable<u32>` to
avoid the second string allocation per unique entry (one for the Vec,
one for the HashMap key).

Per request, drop the extra dependency for now and accept the
double-allocation: the cost is one extra `String` per *unique* name /
source — bounded by the input's distinct-string count, not the token
count, so still small in practice.
@Boshen Boshen merged commit db883f9 into main May 25, 2026
8 checks passed
@Boshen Boshen deleted the perf/owned-sourcemap branch May 25, 2026 16:34
@oxc-guard oxc-guard Bot mentioned this pull request May 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant