
perf: misc allocation/branch reductions (round 2) #335

Closed
Boshen wants to merge 2 commits into main from perf/round2-misc-allocations

Conversation

Boshen (Member) commented May 25, 2026

Summary

Five small, independent perf wins on top of main. Each removes an allocation, a branch, or a bounds check from a hot path.

  • encode: x_google_ignoreList integers now go through an inline stack-buffer u32 → bytes converter instead of u32::to_string(). Zero allocations on the ignoreList encode path.
  • encode (VLQ): the 64-entry B64_CHARS table is now indexed via get_unchecked. digit & 0b11111 is provably 0..=31, but the optimizer doesn't reliably elide the bounds check across the loop break. Worth ~3-4% on small/medium serialize.
  • decode: tokens are constructed via a new pub(crate) Token::new_raw(...) that takes raw u32 ids with INVALID_ID as the absent-sentinel. The decoder already tracks ids that way, so this skips the previous u32 → Option<u32> → u32 roundtrip through Token::new. Also marks the small Token getters #[inline]. Worth ~1-4% on small/medium/large parse.
  • concat builder: add_sourcemap extends sources / source_contents / names by iterating the input Vec<Cow>s directly. Going through the get_* accessors returned impl Iterator<Item = &str>, which hid the ExactSizeIterator impl from extend and forced geometric growth. Direct field iteration preserves the exact-size hint so each extend pre-reserves once.
  • builder: drop the explicit self.tokens.shrink_to_fit() from into_sourcemap. The Vec::into_boxed_slice call below it already drops excess capacity in one allocation+copy.
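
As a rough illustration of the first bullet, a stack-buffer conversion could look like the sketch below. The names `push_u32_decimal` and `encode_ignore_list` are hypothetical, and the JSON-array framing is an assumption; the helper actually used in the PR may differ.

```rust
/// Appends the decimal form of `n` to `out` without building a temporary String.
/// Hypothetical helper name, not taken from the crate.
fn push_u32_decimal(out: &mut Vec<u8>, mut n: u32) {
    // u32::MAX is 4294967295, so 10 digits always fit.
    let mut buf = [0u8; 10];
    let mut pos = buf.len();
    // Fill the buffer from the right, least significant digit first.
    loop {
        pos -= 1;
        buf[pos] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    out.extend_from_slice(&buf[pos..]);
}

/// Serializes a list of ids as a JSON array, e.g. `[0,7,42]`.
/// Only the output buffer allocates; there is no per-element allocation.
fn encode_ignore_list(ids: &[u32]) -> String {
    let mut out = Vec::new();
    out.push(b'[');
    for (i, &id) in ids.iter().enumerate() {
        if i > 0 {
            out.push(b',');
        }
        push_u32_decimal(&mut out, id);
    }
    out.push(b']');
    String::from_utf8(out).expect("only ASCII digits and punctuation are written")
}
```

The loop always writes at least one digit, so `0` is emitted as `"0"` rather than as an empty string.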

Benchmarks

Wall-clock differences are mostly within the criterion noise floor (±2-3%) on the existing perf fixtures. The wins are most visible on workloads that concatenate many sourcemaps (where extend previously grew the output vecs geometrically instead of reserving once) and on workloads with thousands of tokens (where the per-token bounds check and Option roundtrip add up). Composes additively with #330 and #331.
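
To make the per-token Option roundtrip concrete, here is a minimal sketch of the two constructor shapes. The field names, the getter names, and `INVALID_ID = u32::MAX` are assumptions for illustration; only `Token::new`, `Token::new_raw`, and the `INVALID_ID` sentinel are named in this PR.

```rust
/// Sentinel for "no id". Using u32::MAX is an assumption of this sketch.
const INVALID_ID: u32 = u32::MAX;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    dst_line: u32,
    dst_col: u32,
    source_id: u32,
    name_id: u32,
}

impl Token {
    /// Option-based constructor: a caller that already holds raw ids has to
    /// wrap them in Option, only for them to be unwrapped again here.
    pub fn new(dst_line: u32, dst_col: u32, source_id: Option<u32>, name_id: Option<u32>) -> Self {
        Self {
            dst_line,
            dst_col,
            source_id: source_id.unwrap_or(INVALID_ID),
            name_id: name_id.unwrap_or(INVALID_ID),
        }
    }

    /// Raw constructor: stores the ids as given, with INVALID_ID meaning "absent".
    #[inline]
    pub(crate) fn new_raw(dst_line: u32, dst_col: u32, source_id: u32, name_id: u32) -> Self {
        Self { dst_line, dst_col, source_id, name_id }
    }

    #[inline]
    pub fn get_dst_line(&self) -> u32 {
        self.dst_line
    }

    #[inline]
    pub fn get_dst_col(&self) -> u32 {
        self.dst_col
    }

    /// Public getters still expose Option, so the sentinel stays internal.
    #[inline]
    pub fn get_source_id(&self) -> Option<u32> {
        (self.source_id != INVALID_ID).then_some(self.source_id)
    }

    #[inline]
    pub fn get_name_id(&self) -> Option<u32> {
        (self.name_id != INVALID_ID).then_some(self.name_id)
    }
}
```

Both constructors produce identical tokens. The raw one only skips the wrap-and-unwrap step for callers that already track ids with the sentinel.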

Boshen added 2 commits May 26, 2026 00:46
Three small, independent wins on top of main:

* **encode**: `x_google_ignoreList` integers no longer go through
  `u32::to_string()` per element. Inline a stack-buffer u32 → bytes
  conversion so the rare ignoreList encode path does zero allocations.

* **concat builder**: `add_sourcemap` now extends `sources` /
  `source_contents` / `names` by iterating the input `Vec<Cow>`s
  directly. The previous `get_*()` accessors return
  `impl Iterator<Item = &str>` which hides the `ExactSizeIterator`
  impl from `extend`, forcing geometric growth of the output vecs.
  Going through `.iter().map(...)` preserves the exact-size hint, so
  each `extend` pre-reserves in one shot.

* **builder**: drop the explicit `self.tokens.shrink_to_fit()` from
  `into_sourcemap`. `Vec::into_boxed_slice` already drops any excess
  capacity in a single allocation+copy; the standalone shrink was
  duplicate work on the same Vec.
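
A small sketch of the two builder changes described in this commit message. The free functions `add_sources` and `finish`, and the `Vec<String>` / `Vec<u32>` element types, are hypothetical stand-ins for the builder's real fields. The sketch only shows the shape of the code and does not measure reservation behaviour.

```rust
use std::borrow::Cow;

/// Extends `dst` by iterating the input slice directly.
/// `slice::Iter` and `Map` both report an exact `size_hint`, which `extend`
/// can use to reserve up front.
fn add_sources(dst: &mut Vec<String>, input: &[Cow<'_, str>]) {
    dst.extend(input.iter().map(|s| s.to_string()));
}

/// Converts the token buffer into a boxed slice.
/// `into_boxed_slice` drops excess capacity itself, so a preceding
/// `shrink_to_fit()` would repeat the same work.
fn finish(tokens: Vec<u32>) -> Box<[u32]> {
    tokens.into_boxed_slice()
}
```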

Two more small wins:

* **encode**: lookup into the 64-entry `B64_CHARS` table now uses
  `get_unchecked`. The optimizer doesn't reliably elide the bounds
  check across the loop-break boundary, even though `digit & 0b11111`
  is provably in `0..=31`. Worth ~3-4% on small/medium serialize.

* **decode**: tokens are now constructed via `Token::new_raw`, a
  pub(crate) constructor that takes raw u32 ids using `INVALID_ID`
  as the absent-sentinel. The decoder already tracks ids that way,
  so this skips the previous `u32 → Option<u32> → u32` roundtrip
  through `Token::new`. Also marks Token's small getters `#[inline]`
  so accessor calls in hot loops reliably collapse to direct field
  reads. Worth ~1-4% across the parse sizes.
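
For the encode change, a base64 VLQ encoder with an unchecked table lookup might be sketched as follows. The function name `encode_vlq` and the `i32` input type are assumptions; only `B64_CHARS`, `get_unchecked`, and the `digit & 0b11111` mask come from the commit message.

```rust
/// Standard base64 alphabet; the array type enforces the 64-entry length.
const B64_CHARS: [u8; 64] =
    *b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Appends the base64 VLQ encoding of `value` to `out`.
/// Hypothetical signature, not taken from the crate.
fn encode_vlq(out: &mut Vec<u8>, value: i32) {
    // Widen first so negating i32::MIN cannot overflow.
    let v = value as i64;
    // The sign is stored in the lowest bit, the magnitude above it.
    let mut num: u64 = if v < 0 { (((-v) as u64) << 1) | 1 } else { (v as u64) << 1 };
    loop {
        let mut digit = (num & 0b11111) as usize;
        num >>= 5;
        if num != 0 {
            // Continuation bit: more 5-bit groups follow.
            digit |= 0b100000;
        }
        // SAFETY: `digit` is a 5-bit value optionally OR-ed with bit 5,
        // so it is at most 63, which is within the 64-entry table.
        out.push(unsafe { *B64_CHARS.get_unchecked(digit) });
        if num == 0 {
            break;
        }
    }
}
```

The `SAFETY` comment carries the invariant the optimizer could not prove on its own. If the mask or the table length ever changed, the unchecked access would need to be re-audited.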

codspeed-hq Bot commented May 25, 2026

Merging this PR will degrade performance by 1.35%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

❌ 3 regressed benchmarks
✅ 13 untouched benchmarks
⏩ 5 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| parse[real_medium] | 14.9 µs | 15.1 µs | -1.35% |
| parse[real_small] | 11.6 µs | 11.8 µs | -1.48% |
| from_json_string_inline | 14.1 µs | 14.3 µs | -1.22% |


Comparing perf/round2-misc-allocations (25593e5) with main (db883f9)

Footnotes

  1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

Boshen (Member, Author) commented May 26, 2026

Closing per investigation: no sub-change in this PR touches the parse path algorithmically, yet CodSpeed reports a ~1.35% parse regression and itself warns of "different runtime environments". The effect appears to be binary-layout / i-cache sensitivity from reshuffling unrelated functions, not a real perf issue. Not worth the noise for the tiny wins on the existing fixtures.

Boshen closed this May 26, 2026
Boshen deleted the perf/round2-misc-allocations branch May 26, 2026 03:16