Improve Apple Silicon performance #3

@nikneym

The implementation seems to perform significantly worse on Apple Silicon (and possibly on other AArch64 chips). Although we use the suggested vector length from std.simd.suggestVectorLength (likely 128 bits on this target), something else may be needed to take better advantage of vector instructions on such platforms.
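
For context, here is a minimal sketch of how a scanner might size its vectors from std.simd.suggestVectorLength. It is not taken from hparse: the helper name `indexOfByte` and the fallback of 16 lanes are assumptions made for illustration. On a target with 128-bit vectors, the suggested length for `u8` would be 16 lanes.

```zig
const std = @import("std");

// Lane count suggested for u8 on the build target. The fallback of 16 is an
// assumption for targets where the function returns null.
const lanes = std.simd.suggestVectorLength(u8) orelse 16;
const Block = @Vector(lanes, u8);

// Hypothetical helper (not part of hparse): index of the first byte equal
// to `needle`, or null if it does not occur.
fn indexOfByte(haystack: []const u8, needle: u8) ?usize {
    var i: usize = 0;

    // Vector path: compare `lanes` bytes per iteration.
    while (i + lanes <= haystack.len) : (i += lanes) {
        const chunk: Block = haystack[i..][0..lanes].*;
        const matches = chunk == @as(Block, @splat(needle));
        if (std.simd.firstTrue(matches)) |lane| return i + lane;
    }

    // Scalar tail for the remaining bytes.
    while (i < haystack.len) : (i += 1) {
        if (haystack[i] == needle) return i;
    }
    return null;
}
```

How the per-block match check is lowered depends on the target, so a loop like this can behave differently on AArch64 than on x86-64 even when the vector width is the same. Whether that explains the numbers below has not been verified.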

Hyperfine benchmark results on Apple M4 Pro chip:

Benchmark 1: ./hparse/zig-out/bin/hparse
  Time (mean ± σ):      1.464 s ±  0.011 s    [User: 1.457 s, System: 0.005 s]
  Range (min … max):    1.445 s …  1.481 s    10 runs

Benchmark 2: ./picohttpparser/picohttpparser
  Time (mean ± σ):     964.7 ms ±  13.4 ms    [User: 959.8 ms, System: 3.3 ms]
  Range (min … max):   947.9 ms … 988.6 ms    10 runs

Benchmark 3: ./bench-httparse/target/release/bench-httparse
  Time (mean ± σ):     752.5 ms ± 246.2 ms    [User: 675.9 ms, System: 2.7 ms]
  Range (min … max):   650.8 ms … 1452.6 ms    10 runs

Summary
  ./bench-httparse/target/release/bench-httparse ran
    1.28 ± 0.42 times faster than ./picohttpparser/picohttpparser
    1.95 ± 0.64 times faster than ./hparse/zig-out/bin/hparse

Labels: performance (optimizations that enhance speed and performance of the module)