Releases: jonwiggins/xmloxide

v0.4.1

23 Mar 04:56

Added

  • Schematron abstract pattern support: is-a instantiation for reusable
    validation patterns
  • XPath replace(), tokenize(), and 10 more XPath 2.0 functions,
    expanding XPath 2.0 coverage
  • Schematron C FFI: xmloxide_parse_schematron, xmloxide_free_schematron,
    xmloxide_validate_schematron, and xmloxide_validate_schematron_with_phase for
    C/C++ consumers; Schematron was previously only available in Rust, Python, and WASM
  • CSS selector C FFI: xmloxide_css_select, xmloxide_css_select_first, and
    xmloxide_free_nodeid_array for querying elements with CSS selectors from C/C++
  • CSS selectors in Python: css_select() and css_select_first() methods on
    Document in pyxmloxide
  • WASM tree mutation APIs: createElement, createText, createComment,
    appendChild, removeNode, setAttribute, removeAttribute, setTextContent,
    insertBefore, and cloneNode on WasmDocument
  • Validation benchmarks — criterion benchmarks for DTD, RelaxNG, XSD, and
    Schematron validation
  • Expanded XPath benchmarks — count, string function, position predicate,
    ancestor axis, and union expression benchmarks
  • CSS selector benchmarks — class selector and complex combinator benchmarks
  • 58 CSS evaluator inline tests covering tag, class, ID, attribute, pseudo-class,
    combinator, and universal selector matching
  • 10 new FFI tests (5 Schematron + 5 CSS) bringing FFI test total to 138
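
As background for the abstract pattern entry above: in ISO Schematron, an abstract pattern refers to its parameters as $name placeholders, and a pattern that instantiates it with is-a supplies the values, which are substituted textually. The sketch below illustrates that substitution only. The instantiate helper is hypothetical and is not part of the xmloxide API, and it assumes parameter names are tried longest first and does not check name boundaries.

```rust
/// Hypothetical helper (not xmloxide's API): replace `$name` placeholders
/// in an abstract pattern's expression with the values supplied by an
/// `is-a` pattern.
fn instantiate(template: &str, params: &[(&str, &str)]) -> String {
    // Try longer names first so `$row` never shadows `$rows`.
    let mut sorted: Vec<(&str, &str)> = params.to_vec();
    sorted.sort_by(|a, b| b.0.len().cmp(&a.0.len()));

    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        // Copy everything before the placeholder marker unchanged.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        match sorted.iter().find(|(name, _)| after.starts_with(*name)) {
            Some((name, value)) => {
                out.push_str(value);
                rest = &after[name.len()..];
            }
            None => {
                // Unknown parameter: keep the `$` and move on.
                out.push('$');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

With the parameters table = "table" and row = "tr", the template $table/$row expands to table/tr. Placeholders with no matching parameter are left in place.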

Fixed

  • README incorrectly listed Schematron as unsupported — the Limitations section
    claimed "No Schematron" despite Schematron being added in 0.4.0
  • README listed XPath as "1.0 only" — updated to "XPath 1.0+" reflecting the
    17+ XPath 2.0 functions added in prior releases
  • Outdated test counts in README — updated from 936 to 1078 unit tests

Improved

  • Unit tests expanded from 1010 to 1078
  • FFI tests expanded from 128 to 138
  • README now documents serde, async, and Schematron features
  • MIGRATION.md expanded with HTML5 parsing, HTML5 streaming, Schematron validation,
    and CSS selector migration examples
  • CLAUDE.md module map updated with css/, serde_xml/, async_xml, and full ffi/ listing
  • xmllint --schematron added to CLI documentation in README
  • xmloxide.h header updated with Schematron and CSS selector declarations

v0.4.0

23 Mar 04:56

Added

  • ISO Schematron validation (validation::schematron module) — rule-based XML
    validation per ISO/IEC 19757-3, complementing DTD, RelaxNG, and XSD
    • parse_schematron() / validate_schematron() / validate_schematron_with_phase() API
    • Assert/report checks with XPath-driven test expressions
    • Firing rule semantics (first matching rule wins per pattern)
    • Three-level <sch:let> variables (schema, pattern, rule scope)
    • Message interpolation via <sch:value-of select="..."/>
    • Phase-based selective validation (<sch:phase> / <sch:active>)
    • Dual namespace support: ISO (http://purl.oclc.org/dsdl/schematron) and
      classic 1.5 (http://www.ascc.net/xml/schematron), plus sch: prefix
    • 31 unit tests + 11 integration tests with realistic purchase order schema
  • xmllint --schematron — CLI validation against Schematron schemas, following
    the existing --relaxng and --schema patterns
  • XPath matches() function — regex matching for Schematron pattern validation,
    with a hand-rolled engine (no regex crate dependency) supporting character classes,
    quantifiers, shorthand (\d, \s, \w), alternation, grouping, counted
    quantifiers {n,m}, and flags (i, s)
  • XPath namespace-aware name matching: XPathContext::set_namespace() registers
    prefix→URI bindings so that prefixed name tests like //inv:invoice resolve via
    namespace URI comparison instead of string matching; Schematron <sch:ns> bindings
    are automatically threaded through
  • XSD elementFormDefault support — when set to "qualified", child elements
    in instance documents must carry the schema's target namespace; fixes namespace
    validation for UBL 2.4 and similar schemas
  • XSD xsd:import and xsd:include support (#3) —
    multi-file XSD schema composition for real-world schemas like UBL 2.4
  • XSD xsd:element ref support — element references resolve to global
    element declarations in local or imported schemas
  • WASM validation APIs: validateRelaxng(), validateXsd(),
    validateSchematron() on WasmDocument
  • Python validation APIs: validate_relaxng(), validate_xsd(),
    validate_schematron() on Document
  • fuzz_schematron fuzz target for schema parsing and validation (11 total)
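
The firing rule semantics listed above (first matching rule wins per pattern) can be sketched as follows. This is an illustrative model only, not xmloxide's implementation: Rule, Pattern, and validate are hypothetical names, nodes are modelled as plain element names, and context and assert tests are plain functions rather than XPath expressions.

```rust
/// Hypothetical rule: a context test plus an assertion on the node.
struct Rule {
    context: fn(&str) -> bool,
    assert: fn(&str) -> bool,
    message: &'static str,
}

/// A pattern is an ordered list of rules.
struct Pattern {
    rules: Vec<Rule>,
}

/// Validate nodes (modelled here as element names) against each pattern.
fn validate(nodes: &[&str], patterns: &[Pattern]) -> Vec<String> {
    let mut failures = Vec::new();
    for pattern in patterns {
        for &node in nodes {
            // Within one pattern, only the first rule whose context
            // matches the node fires; later rules are skipped.
            if let Some(rule) = pattern.rules.iter().find(|r| (r.context)(node)) {
                if !(rule.assert)(node) {
                    failures.push(format!("{}: {}", node, rule.message));
                }
            }
        }
    }
    failures
}
```

Because each pattern is evaluated independently, a node can still be checked by one rule in every pattern. A catch-all rule placed last in a pattern only sees nodes that no earlier rule in that pattern claimed.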

Fixed

  • XPath attribute path returning String instead of NodeSet — multi-step paths
    ending with an attribute axis (e.g., item/@amount) now correctly return a
    NodeSet, fixing sum(), count(), and comparison operations on attribute
    collections
  • XPath prefix:* tokenization — the lexer now correctly tokenizes namespace
    wildcard expressions like inv:* as a single token instead of failing with a
    parse error
  • Schematron message interpolation for NodeSets: <sch:value-of> expressions
    that return element NodeSets now correctly compute string values using the document
    context instead of returning empty strings
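
To illustrate why prefixed name tests and prefix:* wildcards are resolved through namespace URIs rather than compared as strings, here is a minimal standalone sketch. It does not use xmloxide's XPathContext; resolve, name_test_matches, and the ExpandedName alias are hypothetical, and an unprefixed name test is assumed to mean "no namespace", as in XPath 1.0.

```rust
use std::collections::HashMap;

/// Expanded name: (namespace URI if any, local name).
type ExpandedName<'a> = (Option<&'a str>, &'a str);

/// Resolve a lexical name such as `inv:invoice` against prefix bindings.
/// Returns None when the prefix is not bound.
fn resolve<'a>(
    qname: &'a str,
    bindings: &'a HashMap<String, String>,
) -> Option<ExpandedName<'a>> {
    match qname.split_once(':') {
        Some((prefix, local)) => {
            let uri = bindings.get(prefix)?;
            Some((Some(uri.as_str()), local))
        }
        // Unprefixed names are treated as being in no namespace.
        None => Some((None, qname)),
    }
}

/// A name test matches when the URIs are equal and the local part is
/// either equal or the `*` wildcard.
fn name_test_matches(test: ExpandedName<'_>, element: ExpandedName<'_>) -> bool {
    test.0 == element.0 && (test.1 == "*" || test.1 == element.1)
}
```

With this model, a query that binds inv to a URI matches a document element that uses a different prefix for the same URI, which plain string comparison of inv:invoice against the element's lexical name would miss.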

Improved

  • Unit tests expanded from 988 to 1010
  • Fuzz targets expanded from 10 to 11

v0.3.1

06 Mar 18:58

Fixed

  • Pin tempfile dev-dependency to <3.20 and proptest to <1.7 to avoid transitive dependencies requiring Rust 1.84+/1.85+, which broke builds on the MSRV of 1.81

Improved

  • Pre-commit hook now includes an MSRV check: runs cargo check with the 1.81 toolchain (if installed) or scans Cargo.lock for edition2024 dependencies as a fallback heuristic

v0.3.0

06 Mar 18:41

What's New

CSS Selector Engine

Query document trees with familiar CSS syntax — tag, class, ID, attribute selectors, all combinators (descendant, child, adjacent sibling, general sibling), pseudo-classes (:first-child, :last-child, :only-child, :empty, :not(), :nth-child(), :nth-last-child()), and selector groups.

use xmloxide::css::select;
use xmloxide::Document;

let doc = Document::parse_str(r#"<div><p class="intro">Hello</p></div>"#).unwrap();
let root = doc.root_element().unwrap();
let results = select(&doc, root, "p.intro").unwrap();
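
For the :nth-child() and :nth-last-child() pseudo-classes, the an+b argument selects sibling positions that can be written as a·n + b for some integer n ≥ 0. The check can be sketched as below; matches_nth is a hypothetical helper for illustration, not a function exported by xmloxide.

```rust
/// Returns true when the 1-based sibling position `index` equals
/// `a * n + b` for some integer n >= 0, as in `:nth-child(an+b)`.
fn matches_nth(a: i64, b: i64, index: i64) -> bool {
    let diff = index - b;
    if a == 0 {
        // `:nth-child(b)` selects exactly one position.
        return diff == 0;
    }
    // n = diff / a must be a whole number and must not be negative.
    diff % a == 0 && diff / a >= 0
}
```

For example, 2n+1 (odd) accepts positions 1, 3, 5 and so on, while -n+3 accepts only the first three siblings. For :nth-last-child() the same test applies with the position counted from the last sibling.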

Streaming HTML5 SAX API

Callback-driven API that wraps the WHATWG HTML5 tokenizer directly without building a DOM tree. Ideal for large HTML documents where you only need to extract specific data.

use xmloxide::html5::sax::{Html5SaxHandler, parse_html5_sax};

// Handler that counts start tags as the tokenizer emits them.
struct Counter { elements: usize }
impl Html5SaxHandler for Counter {
    fn start_element(&mut self, _: &str, _: &[(String, String)], _: bool) {
        self.elements += 1;
    }
}

// No DOM tree is built; the handler receives the events directly.
let mut h = Counter { elements: 0 };
parse_html5_sax("<div><p>Hello</p></div>", &mut h);

Auto-populated ID Map

element_by_id() now works out of the box — the parser automatically indexes id attributes during tree construction for XML, HTML 4, and HTML5 documents. Pure #id CSS selectors use O(1) hash lookup.
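
The idea of indexing id attributes while the tree is being built can be sketched with a plain hash map. This is a simplified model, not xmloxide's internals: Tree, add_element, and the choice to keep the first element that claims a duplicate id are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Index into the node storage (hypothetical stand-in for a node handle).
type NodeId = usize;

struct Tree {
    names: Vec<String>,           // element name per node
    ids: HashMap<String, NodeId>, // id attribute value -> node
}

impl Tree {
    fn new() -> Self {
        Tree { names: Vec::new(), ids: HashMap::new() }
    }

    /// Add an element and index its `id` attribute, if present,
    /// at the moment the node is created.
    fn add_element(&mut self, name: &str, attrs: &[(&str, &str)]) -> NodeId {
        let node = self.names.len();
        self.names.push(name.to_string());
        for (key, value) in attrs {
            if *key == "id" {
                // Assumption: the first element to claim an id keeps it.
                self.ids.entry(value.to_string()).or_insert(node);
            }
        }
        node
    }

    fn name(&self, node: NodeId) -> &str {
        &self.names[node]
    }

    /// Average-case O(1) lookup; no tree walk is needed.
    fn element_by_id(&self, id: &str) -> Option<NodeId> {
        self.ids.get(id).copied()
    }
}
```

Because the map is filled during construction, a lookup never has to traverse the document, which is what makes a hash-based fast path for pure #id selectors possible.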

Performance Improvements

  • #[inline] on hot-path tree accessors
  • Direct node field access in Descendants/Children iterators
  • Arena pre-sizing from estimated input node count

Other Additions

  • Tree mutation API (create_element, append_child, remove_node, etc.)
  • Serde XML support (serde feature)
  • Async XML parsing (async feature)
  • WebAssembly bindings (xmloxide-wasm subcrate)
  • Python bindings (pyxmloxide subcrate)
  • Property-based testing (20 proptest properties)

Bug Fixes

  • HTML 4 parser infinite loop on bare < not followed by a valid tag start
  • HTML5 tokenizer panic on multi-byte characters in the ambiguous ampersand state

Full changelog: https://github.com/jonwiggins/xmloxide/blob/main/CHANGELOG.md

v0.1.1

01 Mar 21:06

Fixed

  • Fix docs.rs build failure caused by all-features = true pulling in the
    bench-libxml2 feature, which requires system libxml2 headers unavailable
    in the docs.rs sandbox. The docs.rs configuration now explicitly lists the
    cli and ffi features.

Improved

  • Expanded doc comments on Document navigation, iteration, and mutation
    methods, HtmlParseOptions builder methods, XmlReader accessors, and
    SerializeOptions builder methods.

v0.1.0

28 Feb 22:54

Initial release of xmloxide — a pure Rust reimplementation of libxml2.

Added

  • XML 1.0 parser — hand-rolled recursive descent parser with full W3C XML
    1.0 (Fifth Edition) conformance (1727/1727 applicable tests passing)
  • Error recovery — parse malformed XML and produce a usable tree, matching
    libxml2's recovery behavior (119/119 libxml2 compatibility tests passing)
  • Arena-based DOM tree: Document with NodeId indices for O(1) access,
    cache-friendly layout, and safe bulk deallocation
  • HTML parser — error-tolerant HTML 4.01 parsing with auto-closing tags,
    implicit elements, and void element handling
  • SAX2 streaming parser — event-driven API via SaxHandler trait
  • XmlReader — pull-based parsing API
  • Push/incremental parser — feed chunks of data as they arrive
  • XPath 1.0 — full expression parser and evaluator with all core functions
    and axes, including namespace:: axis support
  • DTD validation — parse and validate against Document Type Definitions
  • RelaxNG validation — parse and validate against RelaxNG schemas
  • XML Schema (XSD) validation — parse and validate against XML Schema
    definitions
  • Canonical XML — C14N 1.0 and Exclusive C14N serialization
  • XInclude — document inclusion processing
  • XML Catalogs — OASIS XML Catalogs for URI resolution
  • XML serialization — 1.5-2.4x faster than libxml2
  • HTML serialization — void elements, attribute rules
  • C/C++ FFI — full C API with header file (include/xmloxide.h) covering
    document parsing, tree navigation and mutation, serialization, XPath, SAX2
    streaming, push parser, XmlReader, validation, C14N, XInclude, and catalogs
  • xmllint CLI — command-line tool for parsing, validating, and querying
    XML/HTML (behind cli feature flag)
  • Character encoding — automatic detection and transcoding via encoding_rs
  • Namespace support — full Namespaces in XML 1.0 implementation
  • String interning — dictionary-based interning for fast comparisons
  • Fuzz targets — XML, HTML, XPath, and roundtrip fuzz testing
  • Benchmark suite — criterion benchmarks for parsing, serialization, SAX,
    XmlReader, XPath, push parsing, and head-to-head comparison with libxml2
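
For readers unfamiliar with arena-based trees, the general technique behind "NodeId indices" looks roughly like the sketch below. It is a generic illustration under simplifying assumptions, not xmloxide's Document type: the Arena, Node, and field names are hypothetical, and only element names and child links are modelled.

```rust
/// Handle into the arena; copying it is as cheap as copying an integer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

struct Node {
    name: String,
    parent: Option<NodeId>,
    first_child: Option<NodeId>,
    last_child: Option<NodeId>,
    next_sibling: Option<NodeId>,
}

/// All nodes live in one Vec, so dropping the arena frees the whole tree
/// at once and no node owns another node directly.
struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    /// Allocate a detached node and return its handle.
    fn create(&mut self, name: &str) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node {
            name: name.to_string(),
            parent: None,
            first_child: None,
            last_child: None,
            next_sibling: None,
        });
        id
    }

    /// Link `child` as the last child of `parent` in O(1).
    fn append_child(&mut self, parent: NodeId, child: NodeId) {
        self.nodes[child.0].parent = Some(parent);
        let previous_last = self.nodes[parent.0].last_child;
        match previous_last {
            Some(last) => self.nodes[last.0].next_sibling = Some(child),
            None => self.nodes[parent.0].first_child = Some(child),
        }
        self.nodes[parent.0].last_child = Some(child);
    }

    /// Collect the children of `parent` in document order.
    fn children(&self, parent: NodeId) -> Vec<NodeId> {
        let mut out = Vec::new();
        let mut cursor = self.nodes[parent.0].first_child;
        while let Some(id) = cursor {
            out.push(id);
            cursor = self.nodes[id.0].next_sibling;
        }
        out
    }
}
```

Storing links as indices instead of pointers or reference-counted handles sidesteps ownership cycles between parents and children, and keeps nodes contiguous in memory.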

Performance

  • Parsing within 3-4% of libxml2 on most documents, 12% faster on SVG
  • Serialization is 1.5-2.4x faster than libxml2
  • XPath is 1.1-2.7x faster than libxml2 across all benchmarks
  • Key optimizations: O(1) character peek, bulk text scanning, ASCII fast paths,
    zero-copy element name splitting, inline entity resolution, XPath // step
    fusion with fused axis expansion, inlined tree accessors, and name-test fast
    paths for child/descendant axes

Testing

  • 785 unit tests across all modules
  • 112 FFI integration tests covering the full C API surface
  • 1727/1727 W3C XML Conformance Test Suite tests (100%)
  • 119/119 libxml2 compatibility tests (100%)
  • Real-world XML, security/DoS, and entity resolver integration tests