
Arkime vs Rust vs Python #285

@awick


I used Opus to compare Arkime, Rust, and Python and found some issues in Arkime, which I've now fixed. Here are the issues it believes are in the Rust and Python implementations, in case they're helpful.

JA4 Reference Implementation Bugs

Found while comparing the Rust and Python JA4 reference implementations against Arkime's capture implementation across 188 pcap files from the Arkime test suite.


Python (ja4.py) Bugs

1. JA4: Empty extension list hashes to SHA256("") instead of 000000000000

Pcaps: https3-301-get.pcap, socks-https-example.pcap

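When a TLS Client Hello has no extensions left to hash, Python emits the truncated SHA256 of an empty string (e3b0c44298fc) as the 3rd segment instead of 000000000000. (The first segment's extension count of 01, together with the "d" SNI flag, suggests the only extension is SNI, which JA4 leaves out of the extension hash.)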

Tool    JA4
Python  t10d230100_6a57a6f57151_e3b0c44298fc
Rust    t10d230100_6a57a6f57151_000000000000
Arkime  t10d230100_6a57a6f57151_000000000000

Both pcaps show the same issue. Rust and Arkime agree on 000000000000.
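A minimal sketch of the behavior Rust and Arkime agree on, assuming the JA4 convention that an empty list is recorded as twelve zeros rather than hashed. The helper name `truncated_sha256` is hypothetical (not taken from ja4.py), and it is simplified: it hashes a comma-joined list and does not build the full third-segment input (extensions plus signature algorithms).

```python
import hashlib


def truncated_sha256(items: list[str]) -> str:
    """Return the first 12 hex chars of SHA256 over the comma-joined items.

    Hypothetical helper: an empty list yields "000000000000" instead of
    the hash of the empty string.
    """
    if not items:
        # Record an empty list as 12 zeros rather than hashing "".
        return "000000000000"
    joined = ",".join(items)
    return hashlib.sha256(joined.encode()).hexdigest()[:12]


# The behavior reported for Python: hashing an empty string.
print(hashlib.sha256(b"").hexdigest()[:12])  # e3b0c44298fc
# The behavior Rust and Arkime show.
print(truncated_sha256([]))  # 000000000000
```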


2. JA4H: HTTP version misdetected (1.0 reported as 1.1)

Pcaps: http-empty-useragent.pcap, v6-http.pcap

Python reports HTTP/1.1 (11), but the request line in the pcap is HTTP/1.0, as verified with tshark:

$ tshark -r http-empty-useragent.pcap -Y "http.request" -T fields -e http.request.version
HTTP/1.0

Tool    JA4H
Python  ge11nn110000_d295f7cacc7a_...
Rust    (no JA4H output from Rust for this pcap)
Arkime  ge10nn010000_b8bcd45ac095_...

v6-http.pcap has the same version issue:

Tool    JA4H
Python  ge11nn05en00_dff23709e538_...
Arkime  ge10nn05en00_dff23709e538_...

Note the header hashes match for v6-http.pcap (dff23709e538), confirming the only difference is the version field.
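One way to avoid the misdetection is to take the version from the request line of the request being fingerprinted, as tshark does, rather than assuming 1.1. This is a sketch under that assumption, not a description of how ja4.py currently derives the version; `ja4h_version` and `VERSION_CODES` are hypothetical names, and only HTTP/1.x request lines are handled.

```python
# Hypothetical mapping from the request-line version token to the JA4H code.
VERSION_CODES = {"HTTP/1.0": "10", "HTTP/1.1": "11"}


def ja4h_version(request_line: str) -> str:
    """Return the two-digit JA4H version code for an HTTP/1.x request line."""
    # A request line is "METHOD SP TARGET SP HTTP-VERSION"; take the last token.
    tokens = request_line.strip().split()
    if not tokens or tokens[-1] not in VERSION_CODES:
        raise ValueError(f"unrecognized request line: {request_line!r}")
    return VERSION_CODES[tokens[-1]]


print(ja4h_version("GET / HTTP/1.0"))  # 10
```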


3. JA4H: Header count inflated (likely counting across requests)

Pcaps: http-empty-useragent.pcap, https-connect.pcap

Python reports far more headers than are present in individual HTTP requests. This may be caused by accumulating headers across multiple HTTP requests in the same TCP stream instead of counting per-request.

http-empty-useragent.pcap contains a single GET request with one header (User-Agent, with an empty value):

GET / HTTP/1.0
User-Agent:
Tool    Header count in JA4H   Full JA4H
Python  11                     ge11nn110000_d295f7cacc7a_...
Arkime  01                     ge10nn010000_b8bcd45ac095_...

https-connect.pcap:

Tool    Header count in JA4H   Full JA4H
Python  38                     co11nn380000_35ef01bf733f_...
Arkime  01                     co10nn010000_b8bcd45ac095_...
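A per-request count would stop at the blank line that ends one request's header block, so headers from later requests on the same stream cannot leak in. This sketch assumes the raw bytes start at a request line and that, as in the JA4H spec as I understand it, Cookie and Referer are excluded from the count and the value is capped at 99. `ja4h_header_count` is a hypothetical name.

```python
def ja4h_header_count(raw_request: bytes) -> str:
    """Count header fields in ONE HTTP/1.x request as a two-digit string.

    Assumptions: Cookie and Referer are not counted; the count caps at 99.
    """
    # Keep only this request's header block: everything before the first
    # blank line. Later requests on the same stream are ignored.
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")[1:]  # skip the request line
    count = 0
    for line in lines:
        if b":" not in line:
            continue  # not a header field
        name = line.split(b":", 1)[0].strip().lower()
        if name and name not in (b"cookie", b"referer"):
            count += 1
    return f"{min(count, 99):02d}"


# The single request from http-empty-useragent.pcap.
req = b"GET / HTTP/1.0\r\nUser-Agent:\r\n\r\n"
print(ja4h_header_count(req))  # 01
```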

4. JA4H: Cookie value hashes differ from Rust reference

Pcap: single-packets.pcap

For requests with cookies, the 4th JA4H segment (cookie field=value hash) differs between Python and the Rust reference, despite the raw cookie fields/values appearing identical.

Example: a request with cookies pardot, visitor_id413862, and visitor_id413862-hash:

Tool    JA4H
Python  ge11cr06enus_8c2f9ef95269_d23bf79698dc_c1eaa758c543
Rust    ge11cr06enus_8c2f9ef95269_d23bf79698dc_69e42fa741fe
Arkime  ge11cr06enus_8c2f9ef95269_d23bf79698dc_69e42fa741fe

The first three segments match (the method/version/flags prefix, the header-name hash, and the cookie-name hash); only the 4th segment (the cookie name=value hash) differs. Rust and Arkime produce identical results for all 6 unique JA4H values in this pcap.
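Since the cookie-name hash agrees, the mismatch is confined to how the name=value list is built before hashing. The hypothetical helper below (not the ja4.py or Rust code) makes one candidate difference concrete: ordering pairs by cookie name versus ordering the formatted name=value strings. With these cookie names the two orderings disagree, because "-" (0x2D) sorts before "=" (0x3D), so visitor_id413862-hash=... comes first only under string ordering. This is a hypothesis to check, not a confirmed cause; the comma separator, 12-character truncation, and cookie values are all assumptions.

```python
import hashlib


def sha12(text: str) -> str:
    """First 12 hex chars of SHA256, or 12 zeros for empty input."""
    if not text:
        return "000000000000"
    return hashlib.sha256(text.encode()).hexdigest()[:12]


def parse_cookie_header(value: str) -> list[tuple[str, str]]:
    """Split a Cookie header value ("a=1; b=2") into (name, value) pairs."""
    pairs = []
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, val = part.partition("=")
        pairs.append((name.strip(), val.strip()))
    return pairs


def ja4h_cookie_segments(
    cookie_header: str, sort_pairs_by_name: bool = True
) -> tuple[str, str]:
    """Return (cookie-name hash, name=value hash) for one request.

    sort_pairs_by_name selects between two plausible orderings of the
    name=value list; which one the spec intends is not settled here.
    """
    pairs = parse_cookie_header(cookie_header)
    names = sorted(name for name, _ in pairs)
    if sort_pairs_by_name:
        # Order by cookie name, then format each pair as name=value.
        fields = [f"{n}={v}" for n, v in sorted(pairs)]
    else:
        # Order the formatted "name=value" strings themselves.
        fields = sorted(f"{n}={v}" for n, v in pairs)
    return sha12(",".join(names)), sha12(",".join(fields))


# Hypothetical values x/y/z: the real cookie values are not in this issue.
header = "pardot=x; visitor_id413862=y; visitor_id413862-hash=z"
c1, d1 = ja4h_cookie_segments(header, sort_pairs_by_name=True)
c2, d2 = ja4h_cookie_segments(header, sort_pairs_by_name=False)
print(c1 == c2, d1 == d2)  # True False
```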


Rust Reference Notes

No bugs found in the Rust implementation. All JA4/JA4S/JA4H/JA4T values that both Rust and Arkime produce match exactly across all 188 test pcaps.

Minor coverage note: Rust does not produce JA4 fingerprints for DTLS Client Hello packets in arkime_synthetic.pcap (Arkime produces dd2d020400_c1929292aa6b_c91bed236abd). This is a coverage gap, not a correctness issue.


Test Methodology

Compared three implementations across 188 pcap files from arkime/tests/pcap/:

  • Rust: /path/to/ja4 -j <pcap> (reference implementation)
  • Python: python3 ja4.py -J <pcap> (reference implementation)
  • Arkime: ./capture --tests -o plugins=ja4plus.so -r <pcap>

All pcap files mentioned are available in the Arkime repository.
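Once each tool's fingerprints have been collected per pcap (parsing the output of the commands above is left out, since their formats are not described here), a small helper can pinpoint which "_"-separated segment differs, as in the per-segment comparisons above. A sketch; `diff_segments` and `compare_tools` are hypothetical names.

```python
def diff_segments(a: str, b: str) -> list[int]:
    """Return the 0-based indices of '_'-separated segments that differ."""
    sa, sb = a.split("_"), b.split("_")
    if len(sa) != len(sb):
        raise ValueError(f"segment count mismatch: {a!r} vs {b!r}")
    return [i for i, (x, y) in enumerate(zip(sa, sb)) if x != y]


def compare_tools(results: dict[str, str], reference: str) -> dict[str, list[int]]:
    """Diff each tool's fingerprint against the reference tool's output."""
    ref = results[reference]
    return {
        tool: diff_segments(fp, ref)
        for tool, fp in results.items()
        if tool != reference
    }


# JA4H values for the cookie example from single-packets.pcap.
cookie_example = {
    "Python": "ge11cr06enus_8c2f9ef95269_d23bf79698dc_c1eaa758c543",
    "Rust": "ge11cr06enus_8c2f9ef95269_d23bf79698dc_69e42fa741fe",
    "Arkime": "ge11cr06enus_8c2f9ef95269_d23bf79698dc_69e42fa741fe",
}
print(compare_tools(cookie_example, reference="Rust"))
# {'Python': [3], 'Arkime': []}
```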


Labels: python (Python implementation)
