Releases: jonwiggins/xmloxide
v0.4.1
Added
- Schematron abstract pattern support: `is-a` instantiation for reusable validation patterns
- XPath `replace()`, `tokenize()`, and 10 more XPath 2.0 functions, expanding XPath 2.0 coverage
- Schematron C FFI: `xmloxide_parse_schematron`, `xmloxide_free_schematron`, `xmloxide_validate_schematron`, and `xmloxide_validate_schematron_with_phase` for C/C++ consumers; Schematron was previously only available in Rust, Python, and WASM
- CSS selector C FFI: `xmloxide_css_select`, `xmloxide_css_select_first`, and `xmloxide_free_nodeid_array` for querying elements with CSS selectors from C/C++
- CSS selectors in Python: `css_select()` and `css_select_first()` methods on `Document` in pyxmloxide
- WASM tree mutation APIs: `createElement`, `createText`, `createComment`, `appendChild`, `removeNode`, `setAttribute`, `removeAttribute`, `setTextContent`, `insertBefore`, `cloneNode` on `WasmDocument`
- Validation benchmarks: criterion benchmarks for DTD, RelaxNG, XSD, and Schematron validation
- Expanded XPath benchmarks: count, string function, position predicate, ancestor axis, and union expression benchmarks
- CSS selector benchmarks: class selector and complex combinator benchmarks
- 58 CSS evaluator inline tests covering tag, class, ID, attribute, pseudo-class, combinator, and universal selector matching
- 10 new FFI tests (5 Schematron + 5 CSS), bringing the FFI test total to 138
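For readers unfamiliar with abstract patterns: in ISO Schematron, a pattern marked `abstract="true"` acts as a template, and a pattern with `is-a` instantiates it by supplying `<sch:param>` values that replace `$name` references in the template's expressions. The sketch below illustrates only that substitution step on plain strings. `instantiate` and its parameter map are hypothetical names, not part of xmloxide's API, and the rule for what counts as a parameter name is a simplifying assumption.

```rust
use std::collections::HashMap;

/// Replace every `$name` reference in `template` with the matching parameter
/// value. References to unknown parameters are copied through unchanged.
/// (Hypothetical helper for illustration; not xmloxide's implementation.)
fn instantiate(template: &str, params: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        // Copy everything before the `$` verbatim.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // Assumption: a parameter name is a run of ASCII alphanumerics, `_` or `-`.
        let len = after
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_' || c == '-'))
            .unwrap_or(after.len());
        let name = &after[..len];
        match params.get(name) {
            Some(value) => out.push_str(value),
            None => {
                out.push('$');
                out.push_str(name);
            }
        }
        rest = &after[len..];
    }
    out.push_str(rest);
    out
}
```

With parameters `row = "tr"` and `cell = "td"`, the template expression `count($row/$cell) > 0` expands to `count(tr/td) > 0`. Scanning whole names, rather than calling `str::replace` per parameter, keeps `$row` from matching inside `$rows`.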
Fixed
- README incorrectly listed Schematron as unsupported: the Limitations section claimed "No Schematron" despite Schematron being added in 0.4.0
- README listed XPath as "1.0 only": updated to "XPath 1.0+", reflecting the 17+ XPath 2.0 functions added in prior releases
- Outdated test counts in README: updated from 936 to 1078 unit tests
Improved
- Unit tests expanded from 1010 to 1078
- FFI tests expanded from 128 to 138
- README now documents serde, async, and Schematron features
- MIGRATION.md expanded with HTML5 parsing, HTML5 streaming, Schematron validation, and CSS selector migration examples
- CLAUDE.md module map updated with css/, serde_xml/, async_xml, and full ffi/ listing
- `xmllint --schematron` added to CLI documentation in README
- `xmloxide.h` header updated with Schematron and CSS selector declarations
v0.4.0
Added
- ISO Schematron validation (`validation::schematron` module): rule-based XML validation per ISO/IEC 19757-3, complementing DTD, RelaxNG, and XSD
  - `parse_schematron()` / `validate_schematron()` / `validate_schematron_with_phase()` API
  - Assert/report checks with XPath-driven test expressions
  - Firing rule semantics (first matching rule wins per pattern)
  - Three-level `<sch:let>` variables (schema, pattern, rule scope)
  - Message interpolation via `<sch:value-of select="..."/>`
  - Phase-based selective validation (`<sch:phase>` / `<sch:active>`)
  - Dual namespace support: ISO (`http://purl.oclc.org/dsdl/schematron`) and classic 1.5 (`http://www.ascc.net/xml/schematron`), plus `sch:` prefix
  - 31 unit tests + 11 integration tests with realistic purchase order schema
- `xmllint --schematron`: CLI validation against Schematron schemas, following the existing `--relaxng` and `--schema` patterns
- XPath `matches()` function: regex matching for Schematron pattern validation, with a hand-rolled engine (no `regex` crate dependency) supporting character classes, quantifiers, shorthand (`\d`, `\s`, `\w`), alternation, grouping, counted quantifiers `{n,m}`, and flags (`i`, `s`)
- XPath namespace-aware name matching: `XPathContext::set_namespace()` registers prefix→URI bindings so that prefixed name tests like `//inv:invoice` resolve via namespace URI comparison instead of string matching; Schematron `<sch:ns>` bindings are automatically threaded through
- XSD `elementFormDefault` support: when set to `"qualified"`, child elements in instance documents must carry the schema's target namespace; fixes namespace validation for UBL 2.4 and similar schemas
- XSD `xsd:import` and `xsd:include` support (#3): multi-file XSD schema composition for real-world schemas like UBL 2.4
- XSD `xsd:element ref` support: element references resolve to global element declarations in local or imported schemas
- WASM validation APIs: `validateRelaxng()`, `validateXsd()`, `validateSchematron()` on `WasmDocument`
- Python validation APIs: `validate_relaxng()`, `validate_xsd()`, `validate_schematron()` on `Document`
- `fuzz_schematron` fuzz target for schema parsing and validation (11 total)
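The firing rule semantics listed above ("first matching rule wins per pattern") can be illustrated with a small stand-alone sketch. It assumes rule contexts and assertions are plain Rust functions over a toy `Node` type; real Schematron uses XPath expressions for both, and none of the names below (`Node`, `Rule`, `Pattern`, `validate_node`) come from xmloxide.

```rust
/// A document node, reduced to the two fields this sketch needs.
struct Node {
    name: &'static str,
    amount: i32,
}

/// One rule: a context test selecting nodes, an assertion that must hold for
/// them, and the message reported when it does not.
struct Rule {
    context: fn(&Node) -> bool,
    assertion: fn(&Node) -> bool,
    message: &'static str,
}

/// A pattern is an ordered list of rules.
struct Pattern {
    rules: Vec<Rule>,
}

/// Check one node against every pattern and collect failed-assertion messages.
/// Within a pattern only the first rule whose context matches fires; later
/// rules of the same pattern are skipped for this node. Patterns are
/// independent of each other.
fn validate_node(node: &Node, patterns: &[Pattern]) -> Vec<&'static str> {
    let mut failures = Vec::new();
    for pattern in patterns {
        // `find` stops at the first rule whose context matches.
        if let Some(rule) = pattern.rules.iter().find(|rule| (rule.context)(node)) {
            if !(rule.assertion)(node) {
                failures.push(rule.message);
            }
        }
    }
    failures
}
```

A consequence worth noting: a catch-all rule placed last in a pattern only ever sees nodes that no earlier rule in that pattern claimed, so rule order inside a pattern is significant.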
Fixed
- XPath attribute path returning String instead of NodeSet: multi-step paths ending with an attribute axis (e.g., `item/@amount`) now correctly return a NodeSet, fixing `sum()`, `count()`, and comparison operations on attribute collections
- XPath `prefix:*` tokenization: the lexer now correctly tokenizes namespace wildcard expressions like `inv:*` as a single token instead of failing with a parse error
- Schematron message interpolation for NodeSets: `<sch:value-of>` expressions that return element NodeSets now correctly compute string values using the document context instead of returning empty strings
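To make the `prefix:*` fix concrete, here is a minimal sketch of lexing an XPath name test so that `inv:*` comes out as one token. It is an illustration under simplifying assumptions (name characters are approximated as alphanumerics plus `_`, `-`, and `.`), and `NameTest`, `name_len`, and `lex_name_test` are hypothetical names rather than xmloxide's lexer.

```rust
#[derive(Debug, PartialEq)]
enum NameTest {
    /// `*`
    Any,
    /// `prefix:*`
    PrefixAny(String),
    /// `local` or `prefix:local`
    Name { prefix: Option<String>, local: String },
}

/// Byte length of the leading run of name characters (simplified NCName).
fn name_len(s: &str) -> usize {
    s.find(|c: char| !(c.is_alphanumeric() || matches!(c, '_' | '-' | '.')))
        .unwrap_or(s.len())
}

/// Lex one name test from the start of `input`. Returns the token and the
/// number of bytes consumed, or `None` if no name test starts here.
fn lex_name_test(input: &str) -> Option<(NameTest, usize)> {
    if input.starts_with('*') {
        return Some((NameTest::Any, 1));
    }
    let first_len = name_len(input);
    if first_len == 0 {
        return None;
    }
    let first = &input[..first_len];
    let rest = &input[first_len..];
    // A single ':' introduces a local part; '::' is the axis separator and
    // must be left for the caller.
    if rest.starts_with(':') && !rest.starts_with("::") {
        let after = &rest[1..];
        if after.starts_with('*') {
            // Namespace wildcard: consume `prefix`, ':' and '*' together.
            return Some((NameTest::PrefixAny(first.to_string()), first_len + 2));
        }
        let local_len = name_len(after);
        if local_len > 0 {
            let token = NameTest::Name {
                prefix: Some(first.to_string()),
                local: after[..local_len].to_string(),
            };
            return Some((token, first_len + 1 + local_len));
        }
    }
    Some((NameTest::Name { prefix: None, local: first.to_string() }, first_len))
}
```

Treating `prefix:*` as a single token lets the parser resolve the prefix to a namespace URI once and match any local name in that namespace.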
Improved
- Unit tests expanded from 988 to 1010
- Fuzz targets expanded from 10 to 11
v0.3.1
Fixed
- Pin `tempfile` dev-dependency to `<3.20` and `proptest` to `<1.7` to avoid transitive dependencies requiring Rust 1.84+/1.85+, which broke builds on the MSRV of 1.81
Improved
- Pre-commit hook now includes an MSRV check: runs `cargo check` with the 1.81 toolchain (if installed) or scans `Cargo.lock` for edition2024 dependencies as a fallback heuristic
v0.3.0
What's New
CSS Selector Engine
Query document trees with familiar CSS syntax — tag, class, ID, attribute selectors, all combinators (descendant, child, adjacent sibling, general sibling), pseudo-classes (:first-child, :last-child, :only-child, :empty, :not(), :nth-child(), :nth-last-child()), and selector groups.
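As background on the `:nth-child()` and `:nth-last-child()` pseudo-classes: the CSS `an+b` notation selects a sibling at 1-based position `i` when `i = a*n + b` for some integer `n >= 0`. A minimal sketch of that test follows; `matches_nth` and `selected_positions` are hypothetical helpers written for this note, not functions exported by xmloxide.

```rust
/// Returns true when the 1-based sibling position `index` is selected by the
/// CSS `an+b` formula, i.e. when `index == a*n + b` for some integer `n >= 0`.
fn matches_nth(a: i32, b: i32, index: i32) -> bool {
    let diff = index - b;
    if a == 0 {
        // `0n+b` selects exactly one position.
        diff == 0
    } else {
        // `diff` must be a non-negative multiple of `a`
        // (for negative `a`, this means counting down from `b`).
        diff % a == 0 && diff / a >= 0
    }
}

/// The 1-based positions among `count` siblings that `an+b` selects.
fn selected_positions(a: i32, b: i32, count: i32) -> Vec<i32> {
    (1..=count).filter(|&i| matches_nth(a, b, i)).collect()
}
```

For `:nth-last-child()`, the same test applies with positions counted from the last sibling instead of the first.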
```rust
use xmloxide::css::select;
use xmloxide::Document;

let doc = Document::parse_str(r#"<div><p class="intro">Hello</p></div>"#).unwrap();
let root = doc.root_element().unwrap();
let results = select(&doc, root, "p.intro").unwrap();
```

Streaming HTML5 SAX API
Callback-driven API that wraps the WHATWG HTML5 tokenizer directly without building a DOM tree. Ideal for large HTML documents where you only need to extract specific data.
```rust
use xmloxide::html5::sax::{Html5SaxHandler, parse_html5_sax};

struct Counter { elements: usize }

impl Html5SaxHandler for Counter {
    fn start_element(&mut self, _: &str, _: &[(String, String)], _: bool) {
        self.elements += 1;
    }
}

let mut h = Counter { elements: 0 };
parse_html5_sax("<div><p>Hello</p></div>", &mut h);
```

Auto-populated ID Map
element_by_id() now works out of the box — the parser automatically indexes id attributes during tree construction for XML, HTML 4, and HTML5 documents. Pure #id CSS selectors use O(1) hash lookup.
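The idea of indexing `id` attributes while the tree is being built can be sketched as follows. This is a toy model, not xmloxide's code: `Doc`, `Element`, and `add_element` are hypothetical, and keeping the first element when an `id` value is duplicated is an assumption made for the example.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
struct NodeId(usize);

struct Element {
    name: String,
    attributes: Vec<(String, String)>,
}

/// Toy document: a flat node list plus an id index filled in as nodes are added.
#[derive(Default)]
struct Doc {
    nodes: Vec<Element>,
    id_map: HashMap<String, NodeId>,
}

impl Doc {
    /// Add an element and index its `id` attribute, if it has one.
    fn add_element(&mut self, name: &str, attributes: &[(&str, &str)]) -> NodeId {
        let node_id = NodeId(self.nodes.len());
        for (key, value) in attributes {
            if *key == "id" {
                // Assumption: on duplicate ids, the first element wins.
                self.id_map.entry(value.to_string()).or_insert(node_id);
            }
        }
        self.nodes.push(Element {
            name: name.to_string(),
            attributes: attributes
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
        });
        node_id
    }

    /// Average O(1) lookup through the hash index; no tree walk needed.
    fn element_by_id(&self, id: &str) -> Option<NodeId> {
        self.id_map.get(id).copied()
    }

    /// Element name for a node handle.
    fn name(&self, node: NodeId) -> &str {
        &self.nodes[node.0].name
    }

    /// Number of attributes stored on a node.
    fn attribute_count(&self, node: NodeId) -> usize {
        self.nodes[node.0].attributes.len()
    }
}
```

Because the index is filled during construction, a selector consisting only of `#intro` can be answered with a single map lookup instead of a traversal.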
Performance Improvements
- `#[inline]` on hot-path tree accessors
- Direct node field access in `Descendants`/`Children` iterators
- Arena pre-sizing from estimated input node count
Other Additions
- Tree mutation API (`create_element`, `append_child`, `remove_node`, etc.)
- Serde XML support (`serde` feature)
- Async XML parsing (`async` feature)
- WebAssembly bindings (`xmloxide-wasm` subcrate)
- Python bindings (`pyxmloxide` subcrate)
- Property-based testing (20 proptest properties)
Bug Fixes
- HTML 4 parser infinite loop on bare `<` not followed by a valid tag start
- HTML5 tokenizer panic on multi-byte characters in the ambiguous ampersand state
Full changelog: https://github.com/jonwiggins/xmloxide/blob/main/CHANGELOG.md
v0.1.1
Fixed
- Fix docs.rs build failure caused by `all-features = true` pulling in the `bench-libxml2` feature, which requires system libxml2 headers unavailable in the docs.rs sandbox. Now explicitly lists `cli` and `ffi` features.
Improved
- Expanded doc comments on `Document` navigation, iteration, and mutation methods, `HtmlParseOptions` builder methods, `XmlReader` accessors, and `SerializeOptions` builder methods.
v0.1.0
Initial release of xmloxide — a pure Rust reimplementation of libxml2.
Added
- XML 1.0 parser: hand-rolled recursive descent parser with full W3C XML 1.0 (Fifth Edition) conformance (1727/1727 applicable tests passing)
- Error recovery: parse malformed XML and produce a usable tree, matching libxml2's recovery behavior (119/119 libxml2 compatibility tests passing)
- Arena-based DOM tree: `Document` with `NodeId` indices for O(1) access, cache-friendly layout, and safe bulk deallocation
- HTML parser: error-tolerant HTML 4.01 parsing with auto-closing tags, implicit elements, and void element handling
- SAX2 streaming parser: event-driven API via `SaxHandler` trait
- XmlReader: pull-based parsing API
- Push/incremental parser: feed chunks of data as they arrive
- XPath 1.0: full expression parser and evaluator with all core functions and axes, including `namespace::` axis support
- DTD validation: parse and validate against Document Type Definitions
- RelaxNG validation: parse and validate against RelaxNG schemas
- XML Schema (XSD) validation: parse and validate against XML Schema definitions
- Canonical XML: C14N 1.0 and Exclusive C14N serialization
- XInclude: document inclusion processing
- XML Catalogs: OASIS XML Catalogs for URI resolution
- XML serialization: 1.5-2.4x faster than libxml2
- HTML serialization: void elements, attribute rules
- C/C++ FFI: full C API with header file (`include/xmloxide.h`) covering document parsing, tree navigation and mutation, serialization, XPath, SAX2 streaming, push parser, XmlReader, validation, C14N, XInclude, and catalogs
- `xmllint` CLI: command-line tool for parsing, validating, and querying XML/HTML (behind `cli` feature flag)
- Character encoding: automatic detection and transcoding via `encoding_rs`
- Namespace support: full Namespaces in XML 1.0 implementation
- String interning: dictionary-based interning for fast comparisons
- Fuzz targets: XML, HTML, XPath, and roundtrip fuzz testing
- Benchmark suite: criterion benchmarks for parsing, serialization, SAX, XmlReader, XPath, push parsing, and head-to-head comparison with libxml2
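The arena-based tree design mentioned in the list above can be illustrated with a generic sketch: all nodes live in one `Vec`, links between them are indices, and dropping the `Vec` frees the whole tree at once. The `Arena` and `Node` types and their methods are hypothetical and only show the general technique; xmloxide's actual `Document` layout may differ.

```rust
/// Index into the arena; cheap to copy and compare.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NodeId(usize);

struct Node {
    name: String,
    parent: Option<NodeId>,
    first_child: Option<NodeId>,
    last_child: Option<NodeId>,
    next_sibling: Option<NodeId>,
}

/// All nodes are stored contiguously; tree links are indices, not pointers.
struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    /// Allocate a detached node and return its handle.
    fn new_node(&mut self, name: &str) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node {
            name: name.to_string(),
            parent: None,
            first_child: None,
            last_child: None,
            next_sibling: None,
        });
        id
    }

    /// Append `child` as the last child of `parent` in O(1).
    fn append_child(&mut self, parent: NodeId, child: NodeId) {
        self.nodes[child.0].parent = Some(parent);
        let last = self.nodes[parent.0].last_child;
        match last {
            Some(last) => self.nodes[last.0].next_sibling = Some(child),
            None => self.nodes[parent.0].first_child = Some(child),
        }
        self.nodes[parent.0].last_child = Some(child);
    }

    /// Collect the children of `parent` in document order.
    fn children(&self, parent: NodeId) -> Vec<NodeId> {
        let mut out = Vec::new();
        let mut current = self.nodes[parent.0].first_child;
        while let Some(id) = current {
            out.push(id);
            current = self.nodes[id.0].next_sibling;
        }
        out
    }

    fn parent(&self, node: NodeId) -> Option<NodeId> {
        self.nodes[node.0].parent
    }

    fn name(&self, node: NodeId) -> &str {
        &self.nodes[node.0].name
    }
}
```

Index handles sidestep reference cycles between parents and children, which is why this layout needs no `Rc`/`RefCell` and no `unsafe`.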
Performance
- Parsing within 3-4% of libxml2 on most documents, 12% faster on SVG
- Serialization is 1.5-2.4x faster than libxml2
- XPath is 1.1-2.7x faster than libxml2 across all benchmarks
- Key optimizations: O(1) character peek, bulk text scanning, ASCII fast paths, zero-copy element name splitting, inline entity resolution, XPath `//` step fusion with fused axis expansion, inlined tree accessors, and name-test fast paths for child/descendant axes
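As one reading of "bulk text scanning" (an assumption, since the notes do not describe the implementation), a parser can find the end of a character-data run by searching the raw bytes for the next `<` or `&` in one pass, instead of dispatching on every character. The helpers below, `text_run_len` and `split_text`, are hypothetical names used only for this sketch.

```rust
/// Length in bytes of the leading character-data run: everything before the
/// next `<` (markup) or `&` (reference), or the whole input if neither occurs.
fn text_run_len(input: &[u8]) -> usize {
    input
        .iter()
        .position(|&b| b == b'<' || b == b'&')
        .unwrap_or(input.len())
}

/// Split `input` into the leading text run and the remainder.
fn split_text(input: &str) -> (&str, &str) {
    // `<` and `&` are ASCII, so the index always falls on a UTF-8 char
    // boundary and `split_at` cannot panic here.
    input.split_at(text_run_len(input.as_bytes()))
}
```

Working on bytes is safe for UTF-8 input because the two delimiter bytes never appear inside a multi-byte sequence, so the text run can be sliced out without decoding it.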
Testing
- 785 unit tests across all modules
- 112 FFI integration tests covering the full C API surface
- 1727/1727 W3C XML Conformance Test Suite tests (100%)
- 119/119 libxml2 compatibility tests (100%)
- Real-world XML, security/DoS, and entity resolver integration tests