@tomas-zijdemans
Contributor

@tomas-zijdemans tomas-zijdemans commented Jan 7, 2026

New XML parsing and serialization module

What @std/xml has:

  • Streaming parser, DOM-style parser, serialization
  • Browser compatible, position tracking, spec-compliant

What @std/xml doesn't have:

  • Namespace resolution, DTD/Schema validation, HTML entities
  • Custom entities, XPath/selectors, object-to-XML builder
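With HTML and custom entities out of scope, what XML 1.0 itself defines is the five predefined entities plus numeric character references. A minimal sketch of that decoding step, assuming a hypothetical `decodeEntities` helper; this is not the code in `xml/_entities.ts`, and throwing on unknown names is a choice made for this sketch only:

```typescript
// Illustrative sketch only; names and error behavior are assumptions,
// not the @std/xml implementation.
const PREDEFINED = new Map<string, string>([
  ["lt", "<"],
  ["gt", ">"],
  ["amp", "&"],
  ["apos", "'"],
  ["quot", '"'],
]);

function decodeEntities(text: string): string {
  // Matches &#xHEX; &#DEC; and &name; references.
  const reference = /&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z_][\w.-]*);/g;
  return text.replace(reference, (whole: string, body: string) => {
    if (body.startsWith("#x")) {
      return String.fromCodePoint(parseInt(body.slice(2), 16));
    }
    if (body.startsWith("#")) {
      return String.fromCodePoint(parseInt(body.slice(1), 10));
    }
    const value = PREDEFINED.get(body);
    if (value === undefined) {
      // HTML entities such as &nbsp; are not predefined in XML.
      throw new Error(`Unknown entity: ${whole}`);
    }
    return value;
  });
}
```

For example, `decodeEntities("a &lt; b &#65;&#x42;")` yields `a < b AB`, while `&nbsp;` is rejected because it is an HTML entity rather than an XML one.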

Benchmark Results

Performance work never really ends, and you often find yourself comparing apples and oranges. Anyway. Here goes.

The challengers

| Library | XML spec compliant? | Streaming XML parsing? | Error position tracking? |
|---|---|---|---|
| SAX | No | Yes | Yes |
| saxes | Yes | Yes | Yes |
| fast-xml-parser | No | No | Yes |
| txml | No | Yes | No |
| xml2js | No | No | Partial |
| htmlparser2 | No | Yes | Partial |
| deno std | Yes | Yes | Yes (configurable) |

Error position tracking is useful for debugging, but it has a real performance cost (up to 1.6x on the small-file benchmark below). So I made it an option that defaults to true for non-streaming and false for streaming, since streaming is usually used for trusted data sources: multi-GB feeds or logs where throughput is critical. The results below include runs both with and without error position tracking.
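To illustrate where the cost of position tracking comes from, here is a minimal sketch, not the PR's tokenizer. The `scanTags` function, the `trackPosition` flag, and the `TagToken` shape are hypothetical names chosen for this example. The point is that line and column counters need extra work on every character, which can be skipped entirely when tracking is off:

```typescript
// Hypothetical sketch; not the @std/xml API.
interface TagToken {
  name: string;
  // Only set when position tracking is enabled.
  line?: number;
  column?: number;
}

function scanTags(xml: string, trackPosition: boolean): TagToken[] {
  const tokens: TagToken[] = [];
  let line = 1;
  let column = 1;
  for (let i = 0; i < xml.length; i++) {
    const ch = xml.charAt(i);
    if (ch === "<" && /[A-Za-z_]/.test(xml.charAt(i + 1))) {
      // Read the tag name up to whitespace, "/" or ">".
      let end = i + 1;
      while (end < xml.length && !/[\s/>]/.test(xml.charAt(end))) end++;
      const token: TagToken = { name: xml.slice(i + 1, end) };
      if (trackPosition) {
        token.line = line;
        token.column = column;
      }
      tokens.push(token);
    }
    if (trackPosition) {
      // Per-character bookkeeping that the fast path avoids.
      if (ch === "\n") {
        line++;
        column = 1;
      } else {
        column++;
      }
    }
  }
  return tokens;
}
```

With `trackPosition` set, the `<b/>` tag in `"<a>\n  <b/>\n</a>"` is reported at line 2, column 3; without it, only the tag names are returned.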

Test data

For the non-streaming benchmarks I used the test files in testdata. For the streaming benchmark I used a single 597MB file (Google product data), which I did not check into testdata. Other payloads may give different results.

Small Files (<10KB) — Median Results

| Parser | Time (ms) | vs Deno std (+pos) |
|---|---|---|
| txml | 0.010 | 1.6x faster |
| Deno std (no pos) | 0.010 | 1.6x faster |
| Deno std (+pos) | 0.016 | baseline |
| saxes | 0.016 | 1.0x (same) |
| htmlparser2 | 0.021 | 1.3x slower |
| SAX | 0.028 | 1.8x slower |
| fast-xml-parser | 0.037 | 2.3x slower |
| xml2js | 0.047 | 2.9x slower |
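For readers who want to reproduce the relative column, here is a small sketch of how a median and a "faster/slower" label can be derived from raw timings. The helper names `median` and `relativeLabel` are made up for this example; the actual benchmark script is not shown in this PR description.

```typescript
// Hypothetical helpers; the PR's benchmark script may compute these differently.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd count: middle element. Even count: mean of the two middle elements.
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

function relativeLabel(timeMs: number, baselineMs: number): string {
  if (timeMs === baselineMs) return "1.0x (same)";
  return timeMs < baselineMs
    ? `${(baselineMs / timeMs).toFixed(1)}x faster`
    : `${(timeMs / baselineMs).toFixed(1)}x slower`;
}
```

Using the small-file medians with 0.016 ms as the baseline, `relativeLabel(0.010, 0.016)` gives `1.6x faster` and `relativeLabel(0.047, 0.016)` gives `2.9x slower`.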

1 Large File (301KB) — Median Results

| Parser | Time (ms) | vs Deno std (+pos) |
|---|---|---|
| txml | 2.10 | 1.5x faster |
| saxes | 2.40 | 1.3x faster |
| Deno std (no pos) | 2.47 | 1.3x faster |
| Deno std (+pos) | 3.11 | baseline |
| htmlparser2 | 4.59 | 1.5x slower |
| SAX | 7.94 | 2.6x slower |
| fast-xml-parser | 11.60 | 3.7x slower |
| xml2js | 14.67 | 4.7x slower |

Streaming (a 597MB file) — Median Results

| Parser | Time (s) | Throughput | vs Deno std (no pos) |
|---|---|---|---|
| Deno std (no pos) | 4.24 | 179K items/s | baseline |
| Deno std (+pos) | 4.47 | 170K items/s | 1.1x slower |
| saxes | 5.24 | 145K items/s | 1.2x slower |
| htmlparser2 | 6.44 | 118K items/s | 1.5x slower |
| SAX | 16.45 | 46K items/s | 3.9x slower |
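The throughput column is simply item count divided by wall time. The item count of the 597MB file is not stated here; the figure of roughly 760,000 used below is an assumption, back-computed from 179K items/s × 4.24 s. A sketch with a hypothetical `throughputLabel` helper:

```typescript
// ASSUMPTION: the item count is inferred from the table, not taken from the PR.
const ASSUMED_ITEM_COUNT = 760_000;

function throughputLabel(items: number, seconds: number): string {
  const perSecond = items / seconds;
  // Report in thousands of items per second, rounded to the nearest thousand.
  return `${Math.round(perSecond / 1000)}K items/s`;
}
```

With that assumed count, `throughputLabel(ASSUMED_ITEM_COUNT, 4.24)` gives `179K items/s` and `throughputLabel(ASSUMED_ITEM_COUNT, 16.45)` gives `46K items/s`, consistent with the rows above.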

@tomas-zijdemans tomas-zijdemans requested a review from kt3k as a code owner January 7, 2026 22:09
@crowlKats
Member

could we get some benchmarks comparing to other parsers?

@tomas-zijdemans
Contributor Author

could we get some benchmarks comparing to other parsers?

Yes, that's a good idea. I'll look into it.

@codecov

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 96.71339% with 68 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.23%. Comparing base (6b93b78) to head (fdd09f0).
⚠️ Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| xml/_tokenizer.ts | 95.29% | 54 Missing and 3 partials ⚠️ |
| xml/_parse_sync.ts | 98.31% | 4 Missing and 2 partials ⚠️ |
| xml/_entities.ts | 96.07% | 4 Missing ⚠️ |
| xml/parse_stream.ts | 96.77% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6942      +/-   ##
==========================================
- Coverage   94.28%   94.23%   -0.06%     
==========================================
  Files         584      610      +26     
  Lines       43186    45609    +2423     
  Branches     6933     7501     +568     
==========================================
+ Hits        40720    42981    +2261     
- Misses       2413     2568     +155     
- Partials       53       60       +7     

@tomas-zijdemans
Contributor Author

could we get some benchmarks comparing to other parsers?

Updated the description now. Let me know if you would like to benchmark against a specific package

@tomas-zijdemans tomas-zijdemans force-pushed the xml branch 6 times, most recently from 7f792e6 to e20f201 Compare January 8, 2026 21:04
Contributor Author

Sorry, I have no idea why this formatting is happening 😅

@timreichen
Contributor

Ref: denoland/deno#24995
There was no reply on whether DOMParser or something similar will be implemented in Deno, so I like this PR in general.
However, it might be worth checking with the Deno core team what their current stance is before merging anything.

@tomas-zijdemans
Contributor Author

Ref: denoland/deno#24995 There was no reply on whether DOMParser or something similar will be implemented in Deno, so I like this PR in general. However, it might be worth checking with the Deno core team what their current stance is before merging anything.

Thanks, I was not aware of this discussion. Perhaps we could have it as an unstable module for now? Then we can always kick it out, should the core team decide to implement DOMParser

@tomas-zijdemans
Contributor Author

Updated again to increase streaming performance and get test coverage to 100%

@tomas-zijdemans
Contributor Author

More perf work. Will look into using callbacks instead of arrays of objects
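A rough sketch of the idea, assuming hypothetical names (`Handlers`, `scan`, `collectEvents`, `countElements`) and a toy regex-based scanner that ignores comments, CDATA, and attribute values containing `>`. This is not the PR's code. With callbacks at the core, a consumer that only needs a count never allocates an event object, while the array-of-objects form can still be layered on top:

```typescript
// Hypothetical interfaces; not the @std/xml API.
interface Handlers {
  onOpen(name: string): void;
  onClose(name: string): void;
}

type XmlEvent =
  | { type: "open"; name: string }
  | { type: "close"; name: string };

// Callback core: reports tags as they are found, allocating no event objects.
function scan(xml: string, handlers: Handlers): void {
  const tagPattern = /<(\/?)([A-Za-z_][\w.-]*)[^>]*?(\/?)>/g;
  let m: RegExpExecArray | null;
  while ((m = tagPattern.exec(xml)) !== null) {
    const name = m[2];
    if (m[1] === "/") {
      handlers.onClose(name);
    } else {
      handlers.onOpen(name);
      // A self-closing tag is reported as open followed by close.
      if (m[3] === "/") handlers.onClose(name);
    }
  }
}

// Array-of-objects form, built on the callbacks: one allocation per event.
function collectEvents(xml: string): XmlEvent[] {
  const events: XmlEvent[] = [];
  scan(xml, {
    onOpen: (name) => events.push({ type: "open", name }),
    onClose: (name) => events.push({ type: "close", name }),
  });
  return events;
}

// Callback consumer that keeps only a counter.
function countElements(xml: string, target: string): number {
  let count = 0;
  scan(xml, {
    onOpen: (name) => {
      if (name === target) count++;
    },
    onClose: () => {},
  });
  return count;
}
```

Whether this helps in practice depends on how the real tokenizer batches its output, so it would need to be measured against the benchmarks in the description.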
