Skip to content

Optimization of REXML::XPathParser#step#310

Open
naitoh wants to merge 2 commits intoruby:masterfrom
naitoh:fix_XPathParser_step_descendant_performance_xpath
Open

Optimization of REXML::XPathParser#step#310
naitoh wants to merge 2 commits intoruby:masterfrom
naitoh:fix_XPathParser_step_descendant_performance_xpath

Conversation

@naitoh
Copy link
Copy Markdown
Contributor

@naitoh naitoh commented May 5, 2026

Benchmark

$ benchmark-driver benchmark/xpath.yaml
                                                           before       after  before(YJIT)  after(YJIT) 
  REXML::XPath.match(REXML::Document.new(xml), 'a//a')     4.170k      4.300k        4.091k       4.087k i/s -     100.000 times in 0.023979s 0.023257s 0.024444s 0.024470s
REXML::XPath.match(REXML::Document.new(xml), '//a//a')    196.250      1.378k       293.984       1.701k i/s -     100.000 times in 0.509554s 0.072574s 0.340154s 0.058796s

Comparison:
               REXML::XPath.match(REXML::Document.new(xml), 'a//a')
                                                 after:      4299.8 i/s 
                                                before:      4170.3 i/s - 1.03x  slower
                                          before(YJIT):      4091.0 i/s - 1.05x  slower
                                           after(YJIT):      4086.6 i/s - 1.05x  slower

             REXML::XPath.match(REXML::Document.new(xml), '//a//a')
                                           after(YJIT):      1700.8 i/s 
                                                 after:      1377.9 i/s - 1.23x  slower
                                          before(YJIT):       294.0 i/s - 5.79x  slower
                                                before:       196.3 i/s - 8.67x  slower
  • YJIT=ON : 0.99x - 5.79x faster
  • YJIT=OFF : 1.03x - 7.01x faster

@naitoh naitoh requested a review from kou May 5, 2026 06:47
Comment thread lib/rexml/xpath_parser.rb
new_nodes = {}
nodeset.each do |node|
new_nodeset = []
new_nodes = {}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, the result of this xpath has changed.

REXML::XPath.match(REXML::Document.new('<a id="a1"><b id="b1"><a id="a2"><b id="b2"><b id="b3"></b></b></a></b></a>'), '//a/descendant::b[1]')
# => [<b id='b1'> ... </>, <b id='b2'> ... </>] Expected (in master branch)
# => [<b id='b1'> ... </>] Actual (this pull request)

Reference:

document.body.innerHTML='<a id="a1"><b id="b1"><a id="a2"><b id="b2"><b id="b3"></b></b></a></b></a>'
snapshot = document.evaluate('//a/descendant::b[1]', document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
[...new Array(snapshot.snapshotLength)].map((_, i) => snapshot.snapshotItem(i))
// => [b#b1, b#b2]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry.
Since this change has unintended consequences, I will cancel the change to #descendant.
Thanks!

Comment thread lib/rexml/xpath_parser.rb Outdated
Comment on lines +470 to +471
next if seen.key?(raw_node.object_id)
seen[raw_node.object_id] = true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object_id will lazily assign a new id when it is called. We can avoid this by using seen = {}.compare_by_identity instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out.
I’ll fix it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using compare_by_identity, Hash will compare by its identity, (same as comparing by object_id, but without assigning actual object_id), so seen.key?(raw_node) and seen[raw_node] = true is enough.

These two code gives the same result

h = {}
h[key.object_id] = value
h[key.object_id] #=> value
h[key.dup.object_id] #=> nil
h = {}.compare_by_identity
h[key] = value
h[key] #=> value
h[key.dup] #=> nil

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see.
Fixed it.

@naitoh naitoh force-pushed the fix_XPathParser_step_descendant_performance_xpath branch from 5b3e01a to 2d3346f Compare May 5, 2026 16:20
@naitoh naitoh changed the title Optimization of REXML::XPathParser#step,#descendant Optimization of REXML::XPathParser#step May 5, 2026
@naitoh naitoh requested a review from tompng May 5, 2026 16:23
naitoh added 2 commits May 6, 2026 05:12
## Benchmark
```
$ benchmark-driver benchmark/xpath.yaml
                                                           before       after  before(YJIT)  after(YJIT)
  REXML::XPath.match(REXML::Document.new(xml), 'a//a')     4.170k      4.300k        4.091k       4.087k i/s -     100.000 times in 0.023979s 0.023257s 0.024444s 0.024470s
REXML::XPath.match(REXML::Document.new(xml), '//a//a')    196.250      1.378k       293.984       1.701k i/s -     100.000 times in 0.509554s 0.072574s 0.340154s 0.058796s

Comparison:
               REXML::XPath.match(REXML::Document.new(xml), 'a//a')
                                                 after:      4299.8 i/s
                                                before:      4170.3 i/s - 1.03x  slower
                                          before(YJIT):      4091.0 i/s - 1.05x  slower
                                           after(YJIT):      4086.6 i/s - 1.05x  slower

             REXML::XPath.match(REXML::Document.new(xml), '//a//a')
                                           after(YJIT):      1700.8 i/s
                                                 after:      1377.9 i/s - 1.23x  slower
                                          before(YJIT):       294.0 i/s - 5.79x  slower
                                                before:       196.3 i/s - 8.67x  slower
```
- YJIT=ON : 0.99x - 5.79x faster
- YJIT=OFF : 1.03x - 7.01x faster
@naitoh naitoh force-pushed the fix_XPathParser_step_descendant_performance_xpath branch from 2d3346f to 6456800 Compare May 5, 2026 20:14
@kou kou requested review from Copilot and removed request for tompng May 6, 2026 03:17
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes REXML::XPathParser#step by deduplicating nodes earlier when merging multiple node sets, improving performance (notably for queries like //a//a) while preserving correct XPath node-set semantics (no duplicates, document order) and predicate behavior.

Changes:

  • Deduplicate merged node sets in XPathParser#step using identity-based tracking before sorting and assigning positions.
  • Add regression tests covering descendant-axis ordering, duplicate elimination, and position predicates across overlapping subtrees.
  • Update the XPath benchmark to include a //a//a scenario and reduce XML nesting depth.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
lib/rexml/xpath_parser.rb Deduplicates merged raw nodes (by identity) before sorting and wrapping into XPathNodes.
test/xpath/test_base.rb Adds regression tests for descendant-axis behavior (no duplicates, document order, predicate position semantics).
benchmark/xpath.yaml Adds a new benchmark case for //a//a and reduces generated XML depth.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread benchmark/xpath.yaml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants