Improve Apple Silicon performance #3

The implementation seems to perform significantly worse on Apple Silicon chips, and possibly on other AArch64 chips as well. Although we use the suggested vector length from `std.simd.suggestVectorLength` (likely 128 bits on these chips), something else may be required to take better advantage of SIMD on such platforms.
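To check what vector width the compiler actually suggests on a given machine, a small program like the one below could be run natively on it. This is a minimal sketch that assumes a Zig release where `std.simd.suggestVectorLength` takes a type and returns `?comptime_int` (as in recent versions). On an AArch64 target with NEON it would presumably report 16 `u8` lanes (128 bits), but the actual value depends on the CPU features Zig detects for the target.

```zig
const std = @import("std");
const builtin = @import("builtin");

pub fn main() void {
    // Lane count Zig suggests for u8 vectors on the compile target.
    // null means no SIMD width is suggested, so fall back to a single lane.
    const lanes = std.simd.suggestVectorLength(u8) orelse 1;

    // Print the target architecture together with the lane count and
    // the resulting vector width in bits.
    std.debug.print("{s}: {d} u8 lanes ({d} bits)\n", .{
        @tagName(builtin.cpu.arch),
        lanes,
        lanes * 8,
    });
}
```

Running it with `zig run` on the M4 Pro would show whether the 128-bit assumption above actually holds for the native target.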

Hyperfine benchmark results on an Apple M4 Pro chip:
```
Benchmark 1: ./hparse/zig-out/bin/hparse
  Time (mean ± σ):      1.464 s ±  0.011 s    [User: 1.457 s, System: 0.005 s]
  Range (min … max):    1.445 s …  1.481 s    10 runs

Benchmark 2: ./picohttpparser/picohttpparser
  Time (mean ± σ):     964.7 ms ±  13.4 ms    [User: 959.8 ms, System: 3.3 ms]
  Range (min … max):   947.9 ms … 988.6 ms    10 runs

Benchmark 3: ./bench-httparse/target/release/bench-httparse
  Time (mean ± σ):     752.5 ms ± 246.2 ms    [User: 675.9 ms, System: 2.7 ms]
  Range (min … max):   650.8 ms … 1452.6 ms    10 runs

Summary
  ./bench-httparse/target/release/bench-httparse ran
    1.28 ± 0.42 times faster than ./picohttpparser/picohttpparser
    1.95 ± 0.64 times faster than ./hparse/zig-out/bin/hparse
```
