diff --git a/BENCHMARKS.md b/BENCHMARKS.md
index 53e80c3..c3be6de 100644
--- a/BENCHMARKS.md
+++ b/BENCHMARKS.md
@@ -6,7 +6,7 @@ Comprehensive performance comparison between all json2xml implementations.
 
 - **Machine**: Apple Silicon (M-series, aarch64)
 - **OS**: macOS
--- **Date**: January 28, 2026
+- **Date**: March 12, 2026
 
 ### Implementations Tested
 
@@ -22,10 +22,10 @@ Comprehensive performance comparison between all json2xml implementations.
 | Size | Description | Bytes |
 |------|-------------|-------|
 | Small | Simple object `{"name": "John", "age": 30, "city": "New York"}` | 47 |
-| Medium | 10 generated records with nested structures | ~3,208 |
+| Medium | 10 generated records with nested structures | ~3,211 |
 | bigexample.json | Real-world patent data | 2,018 |
-| Large | 100 generated records with nested structures | ~32,205 |
-| Very Large | 1,000 generated records with nested structures | ~323,119 |
+| Large | 100 generated records with nested structures | ~32,220 |
+| Very Large | 1,000 generated records with nested structures | ~323,114 |
 
 ## Results
 
@@ -33,21 +33,21 @@ Comprehensive performance comparison between all json2xml implementations.
 
 | Test Case | Python | Rust | Go | Zig |
 |-----------|--------|------|-----|-----|
-| Small (47B) | 41.88µs | 1.66µs | 4.52ms | 2.80ms |
-| Medium (3.2KB) | 2.19ms | 71.85µs | 4.33ms | 2.18ms |
-| bigexample (2KB) | 854.38µs | 30.89µs | 4.28ms | 2.12ms |
-| Large (32KB) | 21.57ms | 672.96µs | 4.47ms | 2.48ms |
-| Very Large (323KB) | 216.52ms | 6.15ms | 4.44ms | 5.54ms |
+| Small (47B) | 78.39µs | 1.05µs | 4.31ms | 1.96ms |
+| Medium (3.2KB) | 2.15ms | 15.47µs | 5.03ms | 2.34ms |
+| bigexample (2KB) | 862.12µs | 6.44µs | 4.47ms | 2.38ms |
+| Large (32KB) | 22.08ms | 150.91µs | 4.80ms | 2.89ms |
+| Very Large (323KB) | 218.63ms | 1.47ms | 4.75ms | 5.38ms |
 
 ### Speedup vs Pure Python
 
 | Test Case | Rust | Go | Zig |
 |-----------|------|-----|-----|
-| Small (47B) | **25.2x** | 0.0x* | 0.0x* |
-| Medium (3.2KB) | **30.5x** | 0.5x* | 1.0x* |
-| bigexample (2KB) | **27.7x** | 0.2x* | 0.4x* |
-| Large (32KB) | **32.1x** | 4.8x | **8.7x** |
-| Very Large (323KB) | **35.2x** | **48.8x** | **39.1x** |
+| Small (47B) | **74.9x** | 0.0x* | 0.0x* |
+| Medium (3.2KB) | **139.1x** | 0.4x* | 0.9x* |
+| bigexample (2KB) | **133.9x** | 0.2x* | 0.4x* |
+| Large (32KB) | **146.3x** | 4.6x | **7.6x** |
+| Very Large (323KB) | **149.2x** | **46.1x** | **40.6x** |
 
 *CLI tools have process spawn overhead (~2-4ms) which dominates for small inputs
 
@@ -56,7 +56,7 @@ Comprehensive performance comparison between all json2xml implementations.
 ### 1. Rust Extension is the Best Choice for Python Users 🦀
 
 The Rust extension (json2xml-rs) provides:
-- **~25-35x faster** than pure Python consistently across all input sizes
+- **~75-149x faster** than pure Python consistently across all input sizes
 - **Zero process overhead** - called directly from Python
 - **Automatic fallback** - pure Python used if Rust unavailable
 - **Easy install**: `pip install json2xml[fast]`
@@ -64,15 +64,15 @@ The Rust extension (json2xml-rs) provides:
 ### 2. Go Excels for Very Large CLI Workloads 🚀
 
 For very large inputs (323KB+):
-- **48.8x faster** than Python
+- **46.1x faster** than Python
 - But ~4ms startup overhead hurts small file performance
 - Best for batch processing or large file conversions
 
 ### 3. Zig is Now Highly Competitive ⚡
 
 After recent optimizations:
-- **39.1x faster** than Python for very large files
-- **8.7x faster** for large files (32KB)
+- **40.6x faster** than Python for very large files
+- **7.6x faster** for large files (32KB)
 - Faster startup than Go (~2ms vs ~4ms)
 - Best balance of startup time and throughput
 
@@ -89,7 +89,7 @@ CLI tools (Go, Zig) have process spawn overhead:
 
 | Use Case | Recommended | Why |
 |----------|-------------|-----|
-| Python library calls | **Rust** (`pip install json2xml[fast]`) | 25-35x faster, no overhead |
+| Python library calls | **Rust** (`pip install json2xml[fast]`) | 75-149x faster, no overhead |
 | Small files via CLI | **Zig** (json2xml-zig) | Fastest startup (~2ms) |
 | Large files via CLI | **Go** or **Zig** | Both excellent (Go slightly faster) |
 | Batch processing | **Go** or **Rust** | Both excellent |
diff --git a/README.rst b/README.rst
index d851820..f9fcecf 100644
--- a/README.rst
+++ b/README.rst
@@ -43,7 +43,7 @@ Installation
 
     pip install json2xml
 
-**With Native Rust Acceleration (28x faster)**
+**With Native Rust Acceleration (up to 149x faster)**
 
 For maximum performance, install the optional Rust extension:
 
@@ -55,7 +55,7 @@ For maximum performance, install the optional Rust extension:
     # Or install the Rust extension separately
     pip install json2xml-rs
 
-The Rust extension provides **28x faster** conversion compared to pure Python. It's automatically used when available, with seamless fallback to pure Python.
+The Rust extension provides **75-149x faster** conversion compared to pure Python. It's automatically used when available, with seamless fallback to pure Python.
 
 **As a CLI Tool**
 
@@ -301,7 +301,7 @@ Using tools directly:
 
 **Rust Extension Development**
 
-The optional Rust extension (``json2xml-rs``) provides 29x faster performance. To develop or build the Rust extension:
+The optional Rust extension (``json2xml-rs``) provides up to 149x faster performance. To develop or build the Rust extension:
 
 Prerequisites:
 
@@ -428,21 +428,21 @@ For users who need maximum performance within Python, json2xml includes an optio
      - Rust Extension
      - Speedup
    * - **Small JSON** (47 bytes)
-     - 40µs
-     - 1.5µs
-     - **27x**
+     - 78µs
+     - 1.05µs
+     - **75x**
    * - **Medium JSON** (3.2 KB)
-     - 2.1ms
-     - 71µs
-     - **30x**
+     - 2.15ms
+     - 15µs
+     - **139x**
    * - **Large JSON** (32 KB)
-     - 21ms
-     - 740µs
-     - **28x**
+     - 22ms
+     - 151µs
+     - **146x**
    * - **Very Large JSON** (323 KB)
-     - 213ms
-     - 7.5ms
-     - **28x**
+     - 219ms
+     - 1.47ms
+     - **149x**
 
 **Usage with Rust Extension:**
 
diff --git a/rust/Cargo.toml b/rust/Cargo.toml
index 54c1e60..3121889 100644
--- a/rust/Cargo.toml
+++ b/rust/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "json2xml_rs"
-version = "0.1.0"
+version = "0.2.0"
 edition = "2021"
 description = "Fast native JSON to XML conversion for Python"
 license = "Apache-2.0"
@@ -14,7 +14,7 @@ default = ["python"]
 python = ["pyo3/extension-module", "dep:pyo3"]
 
 [dependencies]
-pyo3 = { version = "0.27", optional = true }
+pyo3 = { version = "0.28.2", optional = true }
 
 [profile.release]
 lto = true
diff --git a/rust/pyproject.toml b/rust/pyproject.toml
index 33b1c28..56329e6 100644
--- a/rust/pyproject.toml
+++ b/rust/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "maturin"
 
 [project]
 name = "json2xml_rs"
-version = "0.1.0"
+version = "0.2.0"
 description = "Fast native JSON to XML conversion - Rust extension for json2xml"
 readme = "README.md"
 requires-python = ">=3.9"
diff --git a/rust/src/lib.rs b/rust/src/lib.rs
index 6345de1..30b51da 100644
--- a/rust/src/lib.rs
+++ b/rust/src/lib.rs
@@ -3,35 +3,82 @@
 //! This module provides a high-performance Rust implementation of dicttoxml
 //! that can be used as a drop-in replacement for the pure Python version.
 
+#[cfg(feature = "python")]
+use pyo3::exceptions::PyValueError;
 #[cfg(feature = "python")]
 use pyo3::prelude::*;
 #[cfg(feature = "python")]
 use pyo3::types::{PyBool, PyDict, PyFloat, PyInt, PyList, PyString};
-use std::fmt::Write;
-
-/// Escape special XML characters in a string.
-/// This is one of the hottest paths - optimized for single-pass processing.
+/// Escape special XML characters in a string (allocating convenience wrapper).
 #[inline]
 pub fn escape_xml(s: &str) -> String {
-    let mut result = String::with_capacity(s.len() + s.len() / 10);
-    for c in s.chars() {
-        match c {
-            '&' => result.push_str("&amp;"),
-            '"' => result.push_str("&quot;"),
-            '\'' => result.push_str("&apos;"),
-            '<' => result.push_str("&lt;"),
-            '>' => result.push_str("&gt;"),
-            _ => result.push(c),
-        }
-    }
-    result
+    let mut out = String::with_capacity(s.len() + s.len() / 10);
+    push_escaped_attr(&mut out, s);
+    out
+}
+
+/// Append text content with XML escaping matching the Python implementation.
+/// Scans bytes for speed, copies clean slices in bulk.
+#[inline]
+pub fn push_escaped_text(out: &mut String, s: &str) {
+    let mut last = 0;
+    for (i, b) in s.bytes().enumerate() {
+        let repl = match b {
+            b'&' => "&amp;",
+            b'"' => "&quot;",
+            b'\'' => "&apos;",
+            b'<' => "&lt;",
+            b'>' => "&gt;",
+            _ => continue,
+        };
+        out.push_str(&s[last..i]);
+        out.push_str(repl);
+        last = i + 1;
+    }
+    out.push_str(&s[last..]);
 }
 
-/// Wrap content in CDATA section
+/// Append attribute value with full XML escaping (also escapes quotes).
+#[inline]
+pub fn push_escaped_attr(out: &mut String, s: &str) {
+    let mut last = 0;
+    for (i, b) in s.bytes().enumerate() {
+        let repl = match b {
+            b'&' => "&amp;",
+            b'"' => "&quot;",
+            b'\'' => "&apos;",
+            b'<' => "&lt;",
+            b'>' => "&gt;",
+            _ => continue,
+        };
+        out.push_str(&s[last..i]);
+        out.push_str(repl);
+        last = i + 1;
+    }
+    out.push_str(&s[last..]);
+}
+
+/// Wrap content in CDATA section (allocating convenience wrapper).
 #[inline]
 pub fn wrap_cdata(s: &str) -> String {
-    let escaped = s.replace("]]>", "]]]]><![CDATA[>");
-    format!("<![CDATA[{}]]>", escaped)
+    let mut out = String::with_capacity(s.len() + 12);
+    push_cdata(&mut out, s);
+    out
+}
+
+/// Append a CDATA section directly to the buffer.
+#[inline]
+pub fn push_cdata(out: &mut String, s: &str) {
+    out.push_str("<![CDATA[");
+    let mut start = 0;
+    while let Some(i) = s[start..].find("]]>") {
+        let abs = start + i;
+        out.push_str(&s[start..abs]);
+        out.push_str("]]]]><![CDATA[>");
+        start = abs + 3;
+    }
+    out.push_str(&s[start..]);
+    out.push_str("]]>");
 }
 
 /// Check if a key is a valid XML element name (simplified check)
@@ -57,47 +104,91 @@ pub fn is_valid_xml_name(key: &str) -> bool {
     }
 
     // Names starting with "xml" (case-insensitive) are reserved
-    !key.to_lowercase().starts_with("xml")
+    if key.len() >= 3 && key.as_bytes()[..3].eq_ignore_ascii_case(b"xml") {
+        return false;
+    }
+
+    true
 }
 
-/// Make a valid XML name from a key, returning the key and any attributes
+/// Make a valid XML name from a key, returning the tag name and the raw
+/// (unescaped) original key when a fallback is needed. Escaping of the
+/// attribute value is handled later by `make_attr_string`, so we must NOT
+/// escape here to avoid double-escaping.
 pub fn make_valid_xml_name(key: &str) -> (String, Option<(String, String)>) {
-    let escaped = escape_xml(key);
-
     // Already valid
-    if is_valid_xml_name(&escaped) {
-        return (escaped, None);
+    if is_valid_xml_name(key) {
+        return (key.to_string(), None);
     }
 
     // Numeric key - prepend 'n'
-    if escaped.chars().all(|c| c.is_ascii_digit()) {
-        return (format!("n{}", escaped), None);
+    if key.bytes().all(|b| b.is_ascii_digit()) && !key.is_empty() {
+        return (format!("n{}", key), None);
     }
 
     // Try replacing spaces with underscores
-    let with_underscores = escaped.replace(' ', "_");
+    let with_underscores = key.replace(' ', "_");
     if is_valid_xml_name(&with_underscores) {
         return (with_underscores, None);
     }
 
-    // Fall back to using "key" with name attribute
-    ("key".to_string(), Some(("name".to_string(), escaped)))
+    // Fall back to using "key" with name attribute (raw value, escaped later)
+    (
+        "key".to_string(),
+        Some(("name".to_string(), key.to_string())),
+    )
 }
 
-/// Build an attribute string from key-value pairs
+/// Build an attribute string from key-value pairs (allocating convenience wrapper).
 pub fn make_attr_string(attrs: &[(String, String)]) -> String {
-    if attrs.is_empty() {
-        return String::new();
-    }
-    let mut result = String::new();
+    let mut out = String::new();
+    push_attrs(&mut out, attrs);
+    out
+}
+
+/// Append XML attributes directly to a buffer.
+#[inline]
+fn push_attrs(out: &mut String, attrs: &[(String, String)]) {
     for (k, v) in attrs {
-        write!(result, " {}=\"{}\"", k, escape_xml(v)).unwrap();
+        out.push(' ');
+        out.push_str(k);
+        out.push_str("=\"");
+        push_escaped_attr(out, v);
+        out.push('"');
     }
-    result
+}
+
+/// Write opening tag with optional name and type attributes directly to buffer.
+#[cfg(feature = "python")]
+#[inline]
+fn write_open_tag(out: &mut String, tag: &str, name_attr: Option<&str>, type_attr: Option<&str>) {
+    out.push('<');
+    out.push_str(tag);
+    if let Some(name) = name_attr {
+        out.push_str(" name=\"");
+        push_escaped_attr(out, name);
+        out.push('"');
+    }
+    if let Some(ty) = type_attr {
+        out.push_str(" type=\"");
+        out.push_str(ty);
+        out.push('"');
+    }
+    out.push('>');
+}
+
+/// Write a closing tag directly to buffer.
+#[cfg(feature = "python")]
+#[inline]
+fn write_close_tag(out: &mut String, tag: &str) {
+    out.push_str("</");
+    out.push_str(tag);
+    out.push('>');
 }
 
 /// Configuration for XML conversion
 #[cfg(feature = "python")]
+#[derive(Copy, Clone)]
 struct ConvertConfig {
     attr_type: bool,
     cdata: bool,
@@ -108,496 +199,193 @@ struct ConvertConfig {
 #[cfg(feature = "python")]
 use pyo3::PyResult;
 
-/// Convert a Python value to XML string
+/// Return `Some(type_name)` when `attr_type` is enabled.
+#[cfg(feature = "python")]
+#[inline]
+fn type_attr<'a>(cfg: &ConvertConfig, ty: &'a str) -> Option<&'a str> {
+    if cfg.attr_type {
+        Some(ty)
+    } else {
+        None
+    }
+}
+
+/// Single unified type-dispatch writer. Every Python value goes through here
+/// exactly once, writing directly into the shared output buffer.
 #[cfg(feature = "python")]
-fn convert_value(
+fn write_value(
     py: Python<'_>,
+    out: &mut String,
     obj: &Bound<'_, PyAny>,
-    parent: &str,
-    config: &ConvertConfig,
-    item_name: &str,
-) -> PyResult<String> {
-    // Handle None
+    tag: &str,
+    name_attr: Option<&str>,
+    cfg: &ConvertConfig,
+    wrap_container: bool,
+) -> PyResult<()> {
+    // None
     if obj.is_none() {
-        return convert_none(item_name, config);
+        write_open_tag(out, tag, name_attr, type_attr(cfg, "null"));
+        write_close_tag(out, tag);
+        return Ok(());
     }
 
-    // Handle bool (must check before int since bool is subclass of int in Python)
+    // Bool (must check before int since bool is subclass of int in Python)
     if obj.is_instance_of::<PyBool>() {
-        let val: bool = obj.extract()?;
-        return convert_bool(item_name, val, config);
+        let v: bool = obj.extract()?;
+        write_open_tag(out, tag, name_attr, type_attr(cfg, "bool"));
+        out.push_str(if v { "true" } else { "false" });
+        write_close_tag(out, tag);
+        return Ok(());
     }
 
-    // Handle int - try i64 first, fall back to string for large integers
+    // Int - try i64 first, fall back to string for large integers
     if obj.is_instance_of::<PyInt>() {
-        let val_str = match obj.extract::<i64>() {
-            Ok(val) => val.to_string(),
-            Err(_) => obj.str()?.extract::<String>()?, // Fall back for big ints
-        };
-        return convert_number(item_name, &val_str, "int", config);
+        write_open_tag(out, tag, name_attr, type_attr(cfg, "int"));
+        match obj.extract::<i64>() {
+            Ok(v) => {
+                out.push_str(&v.to_string());
+            }
+            Err(_) => {
+                out.push_str(obj.str()?.to_str()?);
+            }
+        }
+        write_close_tag(out, tag);
+        return Ok(());
     }
 
-    // Handle float
+    // Float - use Python's str() for parity (Rust renders 1.0 as "1")
     if obj.is_instance_of::<PyFloat>() {
-        let val: f64 = obj.extract()?;
-        return convert_number(item_name, &val.to_string(), "float", config);
-    }
-
-    // Handle string
-    if obj.is_instance_of::<PyString>() {
-        let val: String = obj.extract()?;
-        return convert_string(item_name, &val, config);
+        write_open_tag(out, tag, name_attr, type_attr(cfg, "float"));
+        out.push_str(obj.str()?.to_str()?);
+        write_close_tag(out, tag);
+        return Ok(());
+    }
+
+    // String
+    if let Ok(py_str) = obj.cast::<PyString>() {
+        let s = py_str.to_str()?;
+        write_open_tag(out, tag, name_attr, type_attr(cfg, "str"));
+        if cfg.cdata {
+            push_cdata(out, s);
+        } else {
+            push_escaped_text(out, s);
+        }
+        write_close_tag(out, tag);
+        return Ok(());
     }
 
-    // Handle dict
-    if obj.is_instance_of::<PyDict>() {
-        let dict: &Bound<'_, PyDict> = obj.cast()?;
-        return convert_dict(py, dict, parent, config);
+    // Dict
+    if let Ok(dict) = obj.cast::<PyDict>() {
+        if wrap_container {
+            write_open_tag(out, tag, name_attr, type_attr(cfg, "dict"));
+        }
+        write_dict_contents(py, out, dict, cfg)?;
+        if wrap_container {
+            write_close_tag(out, tag);
+        }
+        return Ok(());
     }
 
-    // Handle list
-    if obj.is_instance_of::<PyList>() {
-        let list: &Bound<'_, PyList> = obj.cast()?;
-        return convert_list(py, list, parent, config);
+    // List
+    if let Ok(list) = obj.cast::<PyList>() {
+        if wrap_container {
+            write_open_tag(out, tag, name_attr, type_attr(cfg, "list"));
+        }
+        write_list_contents(py, out, list, tag, cfg)?;
+        if wrap_container {
+            write_close_tag(out, tag);
+        }
+        return Ok(());
     }
 
-    // Handle other sequences (tuples, etc.) - check if iterable via try_iter
+    // Other iterables (tuples, generators, etc.)
     if let Ok(iter) = obj.try_iter() {
-        let items: Vec<Bound<'_, PyAny>> = iter.filter_map(|r| r.ok()).collect();
+        let items: Vec<Bound<'_, PyAny>> = iter.collect::<PyResult<_>>()?;
         let list = PyList::new(py, &items)?;
-        return convert_list(py, &list, parent, config);
-    }
-
-    // Fallback: convert to string
-    let val: String = obj.str()?.extract()?;
-    convert_string(item_name, &val, config)
-}
-
-/// Convert a string value to XML
-#[cfg(feature = "python")]
-fn convert_string(key: &str, val: &str, config: &ConvertConfig) -> PyResult<String> {
-    let (xml_key, name_attr) = make_valid_xml_name(key);
-    let mut attrs = Vec::new();
-
-    if let Some((k, v)) = name_attr {
-        attrs.push((k, v));
-    }
-    if config.attr_type {
-        attrs.push(("type".to_string(), "str".to_string()));
+        if wrap_container {
+            write_open_tag(out, tag, name_attr, type_attr(cfg, "list"));
+        }
+        write_list_contents(py, out, &list, tag, cfg)?;
+        if wrap_container {
+            write_close_tag(out, tag);
+        }
+        return Ok(());
     }
 
-    let attr_string = make_attr_string(&attrs);
-    let content = if config.cdata {
-        wrap_cdata(val)
+    // Fallback: convert to string via Python's str()
+    let py_str = obj.str()?;
+    let s = py_str.to_str()?;
+    write_open_tag(out, tag, name_attr, type_attr(cfg, "str"));
+    if cfg.cdata {
+        push_cdata(out, s);
     } else {
-        escape_xml(val)
-    };
-
-    Ok(format!(
-        "<{}{}>{}</{}>",
-        xml_key, attr_string, content, xml_key
-    ))
-}
-
-/// Convert a number value to XML
-#[cfg(feature = "python")]
-fn convert_number(
-    key: &str,
-    val: &str,
-    type_name: &str,
-    config: &ConvertConfig,
-) -> PyResult<String> {
-    let (xml_key, name_attr) = make_valid_xml_name(key);
-    let mut attrs = Vec::new();
-
-    if let Some((k, v)) = name_attr {
-        attrs.push((k, v));
-    }
-    if config.attr_type {
-        attrs.push(("type".to_string(), type_name.to_string()));
-    }
-
-    let attr_string = make_attr_string(&attrs);
-    Ok(format!("<{}{}>{}</{}>", xml_key, attr_string, val, xml_key))
-}
-
-/// Convert a boolean value to XML
-#[cfg(feature = "python")]
-fn convert_bool(key: &str, val: bool, config: &ConvertConfig) -> PyResult<String> {
-    let (xml_key, name_attr) = make_valid_xml_name(key);
-    let mut attrs = Vec::new();
-
-    if let Some((k, v)) = name_attr {
-        attrs.push((k, v));
-    }
-    if config.attr_type {
-        attrs.push(("type".to_string(), "bool".to_string()));
+        push_escaped_text(out, s);
     }
-
-    let attr_string = make_attr_string(&attrs);
-    let bool_str = if val { "true" } else { "false" };
-    Ok(format!(
-        "<{}{}>{}</{}>",
-        xml_key, attr_string, bool_str, xml_key
-    ))
-}
-
-/// Convert a None value to XML
-#[cfg(feature = "python")]
-fn convert_none(key: &str, config: &ConvertConfig) -> PyResult<String> {
-    let (xml_key, name_attr) = make_valid_xml_name(key);
-    let mut attrs = Vec::new();
-
-    if let Some((k, v)) = name_attr {
-        attrs.push((k, v));
-    }
-    if config.attr_type {
-        attrs.push(("type".to_string(), "null".to_string()));
-    }
-
-    let attr_string = make_attr_string(&attrs);
-    Ok(format!("<{}{}></{}>", xml_key, attr_string, xml_key))
+    write_close_tag(out, tag);
+    Ok(())
 }
 
-/// Convert a dictionary to XML
+/// Write all key-value pairs of a dict into the buffer.
 #[cfg(feature = "python")]
-fn convert_dict(
+fn write_dict_contents(
     py: Python<'_>,
+    out: &mut String,
     dict: &Bound<'_, PyDict>,
-    _parent: &str,
-    config: &ConvertConfig,
-) -> PyResult<String> {
-    let mut output = String::new();
-
+    cfg: &ConvertConfig,
+) -> PyResult<()> {
     for (key, val) in dict.iter() {
         let key_str: String = key.str()?.extract()?;
-        let (xml_key, name_attr) = make_valid_xml_name(&key_str);
-
-        // Handle bool (must check before int)
-        if val.is_instance_of::<PyBool>() {
-            let bool_val: bool = val.extract()?;
-            let mut attrs = Vec::new();
-            if let Some((k, v)) = name_attr {
-                attrs.push((k, v));
-            }
-            if config.attr_type {
-                attrs.push(("type".to_string(), "bool".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            let bool_str = if bool_val { "true" } else { "false" };
-            write!(
-                output,
-                "<{}{}>{}</{}>",
-                xml_key, attr_string, bool_str, xml_key
-            )
-            .unwrap();
-        }
-        // Handle int - try i64 first, fall back to string for large integers
-        else if val.is_instance_of::<PyInt>() {
-            let int_str = match val.extract::<i64>() {
-                Ok(v) => v.to_string(),
-                Err(_) => val.str()?.extract::<String>()?,
-            };
-            let mut attrs = Vec::new();
-            if let Some((k, v)) = name_attr {
-                attrs.push((k, v));
-            }
-            if config.attr_type {
-                attrs.push(("type".to_string(), "int".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            write!(
-                output,
-                "<{}{}>{}</{}>",
-                xml_key, attr_string, int_str, xml_key
-            )
-            .unwrap();
-        }
-        // Handle float
-        else if val.is_instance_of::<PyFloat>() {
-            let float_val: f64 = val.extract()?;
-            let mut attrs = Vec::new();
-            if let Some((k, v)) = name_attr {
-                attrs.push((k, v));
-            }
-            if config.attr_type {
-                attrs.push(("type".to_string(), "float".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            write!(
-                output,
-                "<{}{}>{}</{}>",
-                xml_key, attr_string, float_val, xml_key
-            )
-            .unwrap();
-        }
-        // Handle string
-        else if val.is_instance_of::<PyString>() {
-            let str_val: String = val.extract()?;
-            let mut attrs = Vec::new();
-            if let Some((k, v)) = name_attr {
-                attrs.push((k, v));
-            }
-            if config.attr_type {
-                attrs.push(("type".to_string(), "str".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            let content = if config.cdata {
-                wrap_cdata(&str_val)
-            } else {
-                escape_xml(&str_val)
-            };
-            write!(
-                output,
-                "<{}{}>{}</{}>",
-                xml_key, attr_string, content, xml_key
-            )
-            .unwrap();
-        }
-        // Handle None
-        else if val.is_none() {
-            let mut attrs = Vec::new();
-            if let Some((k, v)) = name_attr {
-                attrs.push((k, v));
-            }
-            if config.attr_type {
-                attrs.push(("type".to_string(), "null".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            write!(output, "<{}{}></{}>", xml_key, attr_string, xml_key).unwrap();
-        }
-        // Handle nested dict
-        else if val.is_instance_of::<PyDict>() {
-            let nested_dict: &Bound<'_, PyDict> = val.cast()?;
-            let mut attrs = Vec::new();
-            if let Some((k, v)) = name_attr {
-                attrs.push((k, v));
-            }
-            if config.attr_type {
-                attrs.push(("type".to_string(), "dict".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            let inner = convert_dict(py, nested_dict, &xml_key, config)?;
-            write!(
-                output,
-                "<{}{}>{}</{}>",
-                xml_key, attr_string, inner, xml_key
-            )
-            .unwrap();
-        }
-        // Handle list
-        else if val.is_instance_of::<PyList>() {
-            let list: &Bound<'_, PyList> = val.cast()?;
-            let list_output = convert_list(py, list, &xml_key, config)?;
-
-            if config.item_wrap {
-                let mut attrs = Vec::new();
-                if let Some((k, v)) = name_attr {
-                    attrs.push((k, v));
-                }
-                if config.attr_type {
-                    attrs.push(("type".to_string(), "list".to_string()));
-                }
-                let attr_string = make_attr_string(&attrs);
-                write!(
-                    output,
-                    "<{}{}>{}</{}>",
-                    xml_key, attr_string, list_output, xml_key
-                )
-                .unwrap();
+        let (xml_key, name_attr_pair) = make_valid_xml_name(&key_str);
+        let name_attr = name_attr_pair.as_ref().map(|(_, v)| v.as_str());
+
+        // Lists in dicts get special wrapping treatment
+        if let Ok(list) = val.cast::<PyList>() {
+            if cfg.item_wrap {
+                write_open_tag(out, &xml_key, name_attr, type_attr(cfg, "list"));
+                write_list_contents(py, out, list, &xml_key, cfg)?;
+                write_close_tag(out, &xml_key);
             } else {
-                output.push_str(&list_output);
-            }
-        }
-        // Fallback: convert to string
-        else {
-            let str_val: String = val.str()?.extract()?;
-            let mut attrs = Vec::new();
-            if let Some((k, v)) = name_attr {
-                attrs.push((k, v));
+                write_list_contents(py, out, list, &xml_key, cfg)?;
             }
-            if config.attr_type {
-                attrs.push(("type".to_string(), "str".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            let content = if config.cdata {
-                wrap_cdata(&str_val)
-            } else {
-                escape_xml(&str_val)
-            };
-            write!(
-                output,
-                "<{}{}>{}</{}>",
-                xml_key, attr_string, content, xml_key
-            )
-            .unwrap();
+        } else {
+            write_value(py, out, &val, &xml_key, name_attr, cfg, true)?;
         }
     }
-
-    Ok(output)
+    Ok(())
 }
 
-/// Convert a list to XML
+/// Write all items of a list into the buffer.
 #[cfg(feature = "python")]
-fn convert_list(
+fn write_list_contents(
     py: Python<'_>,
+    out: &mut String,
     list: &Bound<'_, PyList>,
     parent: &str,
-    config: &ConvertConfig,
-) -> PyResult<String> {
-    let mut output = String::new();
-    let item_name = "item";
+    cfg: &ConvertConfig,
+) -> PyResult<()> {
+    let tag_name = if cfg.list_headers {
+        parent
+    } else if cfg.item_wrap {
+        "item"
+    } else {
+        parent
+    };
 
     for item in list.iter() {
-        let tag_name = if config.item_wrap || config.list_headers {
-            if config.list_headers {
-                parent
+        // Dicts inside lists have special wrapping logic
+        if let Ok(dict) = item.cast::<PyDict>() {
+            if cfg.item_wrap || cfg.list_headers {
+                write_open_tag(out, tag_name, None, type_attr(cfg, "dict"));
+                write_dict_contents(py, out, dict, cfg)?;
+                write_close_tag(out, tag_name);
             } else {
-                item_name
+                write_dict_contents(py, out, dict, cfg)?;
             }
         } else {
-            parent
-        };
-
-        // Handle bool (must check before int)
-        if item.is_instance_of::<PyBool>() {
-            let bool_val: bool = item.extract()?;
-            let mut attrs = Vec::new();
-            if config.attr_type {
-                attrs.push(("type".to_string(), "bool".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            let bool_str = if bool_val { "true" } else { "false" };
-            write!(
-                output,
-                "<{}{}>{}</{}>",
-                tag_name, attr_string, bool_str, tag_name
-            )
-            .unwrap();
-        }
-        // Handle int - try i64 first, fall back to string for large integers
-        else if item.is_instance_of::<PyInt>() {
-            let int_str = match item.extract::<i64>() {
-                Ok(v) => v.to_string(),
-                Err(_) => item.str()?.extract::<String>()?,
-            };
-            let mut attrs = Vec::new();
-            if config.attr_type {
-                attrs.push(("type".to_string(), "int".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            write!(
-                output,
-                "<{}{}>{}</{}>",
-                tag_name, attr_string, int_str, tag_name
-            )
-            .unwrap();
-        }
-        // Handle float
-        else if item.is_instance_of::<PyFloat>() {
-            let float_val: f64 = item.extract()?;
-            let mut attrs = Vec::new();
-            if config.attr_type {
-                attrs.push(("type".to_string(), "float".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            write!(
-                output,
-                "<{}{}>{}</{}>",
-                tag_name, attr_string, float_val, tag_name
-            )
-            .unwrap();
-        }
-        // Handle string
-        else if item.is_instance_of::<PyString>() {
-            let str_val: String = item.extract()?;
-            let mut attrs = Vec::new();
-            if config.attr_type {
-                attrs.push(("type".to_string(), "str".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            let content = if config.cdata {
-                wrap_cdata(&str_val)
-            } else {
-                escape_xml(&str_val)
-            };
-            write!(
-                output,
-                "<{}{}>{}</{}>",
-                tag_name, attr_string, content, tag_name
-            )
-            .unwrap();
-        }
-        // Handle None
-        else if item.is_none() {
-            let mut attrs = Vec::new();
-            if config.attr_type {
-                attrs.push(("type".to_string(), "null".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            write!(output, "<{}{}></{}>", tag_name, attr_string, tag_name).unwrap();
-        }
-        // Handle nested dict
-        else if item.is_instance_of::<PyDict>() {
-            let nested_dict: &Bound<'_, PyDict> = item.cast()?;
-            let inner = convert_dict(py, nested_dict, tag_name, config)?;
-
-            if config.item_wrap || config.list_headers {
-                let mut attrs = Vec::new();
-                if config.attr_type {
-                    attrs.push(("type".to_string(), "dict".to_string()));
-                }
-                let attr_string = make_attr_string(&attrs);
-                write!(
-                    output,
-                    "<{}{}>{}</{}>",
-                    tag_name, attr_string, inner, tag_name
-                )
-                .unwrap();
-            } else {
-                output.push_str(&inner);
-            }
-        }
-        // Handle nested list
-        else if item.is_instance_of::<PyList>() {
-            let nested_list: &Bound<'_, PyList> = item.cast()?;
-            let inner = convert_list(py, nested_list, tag_name, config)?;
-
-            let mut attrs = Vec::new();
-            if config.attr_type {
-                attrs.push(("type".to_string(), "list".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            write!(
-                output,
-                "<{}{}>{}</{}>",
-                tag_name, attr_string, inner, tag_name
-            )
-            .unwrap();
-        }
-        // Fallback
-        else {
-            let str_val: String = item.str()?.extract()?;
-            let mut attrs = Vec::new();
-            if config.attr_type {
-                attrs.push(("type".to_string(), "str".to_string()));
-            }
-            let attr_string = make_attr_string(&attrs);
-            let content = if config.cdata {
-                wrap_cdata(&str_val)
-            } else {
-                escape_xml(&str_val)
-            };
-            write!(
-                output,
-                "<{}{}>{}</{}>",
-                tag_name, attr_string, content, tag_name
-            )
-            .unwrap();
+            write_value(py, out, &item, tag_name, None, cfg, true)?;
         }
     }
-
-    Ok(output)
+    Ok(())
 }
 
 /// Convert a Python dict/list to XML bytes.
@@ -629,6 +417,13 @@ fn dicttoxml(
     cdata: bool,
     list_headers: bool,
 ) -> PyResult<Vec<u8>> {
+    if !is_valid_xml_name(custom_root) {
+        return Err(PyValueError::new_err(format!(
+            "Invalid XML root element name: '{}'",
+            custom_root
+        )));
+    }
+
     let config = ConvertConfig {
         attr_type,
         cdata,
@@ -636,26 +431,30 @@ fn dicttoxml(
         list_headers,
     };
 
-    let content = if obj.is_instance_of::<PyDict>() {
-        let dict: &Bound<'_, PyDict> = obj.cast()?;
-        convert_dict(py, dict, custom_root, &config)?
-    } else if obj.is_instance_of::<PyList>() {
-        let list: &Bound<'_, PyList> = obj.cast()?;
-        convert_list(py, list, custom_root, &config)?
-    } else {
-        convert_value(py, obj, custom_root, &config, custom_root)?
-    };
+    let mut out = String::new();
 
-    let output = if root {
-        format!(
-            "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><{}>{}</{}>",
-            custom_root, content, custom_root
-        )
+    if root {
+        out.push_str("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>");
+        out.push('<');
+        out.push_str(custom_root);
+        out.push('>');
+    }
+
+    if let Ok(dict) = obj.cast::<PyDict>() {
+        write_dict_contents(py, &mut out, dict, &config)?;
+    } else if let Ok(list) = obj.cast::<PyList>() {
+        write_list_contents(py, &mut out, list, custom_root, &config)?;
     } else {
-        content
-    };
+        write_value(py, &mut out, obj, custom_root, None, &config, true)?;
+    }
+
+    if root {
+        out.push_str("</");
+        out.push_str(custom_root);
+        out.push('>');
+    }
 
-    Ok(output.into_bytes())
+    Ok(out.into_bytes())
 }
 
 /// Fast XML string escaping.
@@ -876,10 +675,23 @@ mod tests {
         }
 
         #[test]
-        fn escapes_special_chars_in_name() {
+        fn returns_raw_key_for_invalid_names() {
+            // make_valid_xml_name must return the raw key, not escaped.
+            // Escaping happens later in make_attr_string to avoid double-escaping.
             let (name, attr) = make_valid_xml_name("tag&name");
             assert_eq!(name, "key");
-            assert_eq!(attr, Some(("name".to_string(), "tag&amp;name".to_string())));
+            assert_eq!(attr, Some(("name".to_string(), "tag&name".to_string())));
+        }
+
+        #[test]
+        fn double_escape_does_not_happen() {
+            // End-to-end: make_valid_xml_name + make_attr_string should produce
+            // a single level of escaping, not &amp;amp;
+            let (name, attr) = make_valid_xml_name("tag&name");
+            assert_eq!(name, "key");
+            let attrs = attr.map(|(k, v)| vec![(k, v)]).unwrap_or_default();
+            let attr_string = make_attr_string(&attrs);
+            assert_eq!(attr_string, " name=\"tag&amp;name\"");
         }
     }
 
@@ -912,4 +724,79 @@ mod tests {
             assert_eq!(make_attr_string(&attrs), " name=\"foo &amp; bar\"");
         }
     }
+
+    mod push_escaped_text_tests {
+        use super::*;
+
+        #[test]
+        fn escapes_special_chars_in_text() {
+            let mut out = String::new();
+            push_escaped_text(&mut out, "a < b & c > d");
+            assert_eq!(out, "a &lt; b &amp; c &gt; d");
+        }
+
+        #[test]
+        fn escapes_quotes_in_text() {
+            let mut out = String::new();
+            push_escaped_text(&mut out, "say \"hello\" & 'bye'");
+            assert_eq!(out, "say &quot;hello&quot; &amp; &apos;bye&apos;");
+        }
+
+        #[test]
+        fn handles_empty_string() {
+            let mut out = String::new();
+            push_escaped_text(&mut out, "");
+            assert_eq!(out, "");
+        }
+
+        #[test]
+        fn handles_no_special_chars() {
+            let mut out = String::new();
+            push_escaped_text(&mut out, "plain text 123");
+            assert_eq!(out, "plain text 123");
+        }
+
+        #[test]
+        fn handles_unicode() {
+            let mut out = String::new();
+            push_escaped_text(&mut out, "café & thé");
+            assert_eq!(out, "café &amp; thé");
+        }
+    }
+
+    mod push_escaped_attr_tests {
+        use super::*;
+
+        #[test]
+        fn escapes_quotes_and_special_chars() {
+            let mut out = String::new();
+            push_escaped_attr(&mut out, "a\"b'c&d<e>f");
+            assert_eq!(out, "a&quot;b&apos;c&amp;d&lt;e&gt;f");
+        }
+    }
+
+    mod push_cdata_tests {
+        use super::*;
+
+        #[test]
+        fn wraps_simple_string() {
+            let mut out = String::new();
+            push_cdata(&mut out, "hello");
+            assert_eq!(out, "<![CDATA[hello]]>");
+        }
+
+        #[test]
+        fn escapes_cdata_end_sequence() {
+            let mut out = String::new();
+            push_cdata(&mut out, "foo]]>bar");
+            assert_eq!(out, "<![CDATA[foo]]]]><![CDATA[>bar]]>");
+        }
+
+        #[test]
+        fn handles_multiple_cdata_end_sequences() {
+            let mut out = String::new();
+            push_cdata(&mut out, "a]]>b]]>c");
+            assert_eq!(out, "<![CDATA[a]]]]><![CDATA[>b]]]]><![CDATA[>c]]>");
+        }
+    }
 }
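
Review note: the `make_valid_xml_name` change in this diff stops the fallback `name` attribute from being escaped twice. The following is a minimal sketch of that failure mode, not the crate's code. `escape_once`, `attr_escaped_twice`, and `attr_escaped_once` are hypothetical helpers introduced only for illustration, and `escape_once` is assumed to behave like a simplified version of the escaping in this PR.

```rust
/// Hypothetical stand-in for the crate's escaping (illustration only).
fn escape_once(s: &str) -> String {
    // `&` is replaced first so the entities added below are not re-escaped.
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
        .replace('\'', "&apos;")
}

/// Old flow (as described in the diff comments): the key is escaped when the
/// fallback name is chosen, then escaped again when the attribute is built.
fn attr_escaped_twice(raw_key: &str) -> String {
    let stored = escape_once(raw_key);
    format!(" name=\"{}\"", escape_once(&stored))
}

/// New flow: the raw key is carried through and escaped once at write time.
fn attr_escaped_once(raw_key: &str) -> String {
    format!(" name=\"{}\"", escape_once(raw_key))
}
```

With the key `tag&name`, the first helper yields an attribute containing `&amp;amp;`, while the second yields a single `&amp;`, which is the behavior the new `double_escape_does_not_happen` test asserts.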