Add DOM mutation operations: remove/2, remove_attribute/2, set_attribute/3 by jsmestad · Pull Request #34 · dashbitco/lazy_html

jsmestad · 2026-04-09T22:17:15Z

Summary

Three new functions that mutate the native Lexbor DOM in place, enabling efficient tree transformation without round-tripping through Elixir tuples.

New functions

- `remove/2` - removes all elements matching a CSS selector from the DOM tree. Uses the same CSS selector engine as `query/2`. Collects matching nodes first (can't modify during traversal), then destroys them via `lxb_dom_node_destroy` (unlink + free).
- `remove_attribute/2` - removes a named attribute from all element nodes and their descendants. Walks the subtree in C via `lxb_dom_node_simple_walk`.
- `set_attribute/3` - sets an attribute on all element nodes in the set. Uses the existing `lxb_dom_element_set_attribute` (already called by `from_tree`).

Motivation

For HTML transformation workloads (sanitization, content stripping), the current workflow requires exporting the DOM to Elixir tuples via to_tree, walking them in pure Elixir, then importing back via from_tree. On a corpus of 51 real production pages (8.4MB HTML), this takes ~500ms.

Cost breakdown of the current approach:

| Step | Time | What it does |
| --- | --- | --- |
| `from_fragment` | 28ms | Parse HTML (Lexbor C, fast) |
| `to_tree` | 75ms | Convert native DOM to BEAM tuples |
| Elixir tree walk | 370ms | Remove junk tags, strip attrs, etc. |
| `Tree.to_html` | 30ms | Serialize Elixir tree |
| Total | 503ms | |
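For context, the "Elixir tree walk" step could look roughly like the sketch below. This is an illustration only, not the code that was benchmarked: it assumes the `{tag, attributes, children}` tuple shape produced by `to_tree`, and the module name `TreeWalk` and both junk lists are hypothetical.

```elixir
defmodule TreeWalk do
  # Hypothetical lists for illustration; the benchmarked workload is not shown in this PR.
  @junk_tags ~w(script style nav)
  @junk_attrs ~w(style onclick)

  # Walks a list of nodes and returns a new, cleaned list.
  def clean(nodes) when is_list(nodes), do: Enum.flat_map(nodes, &clean_node/1)

  # Junk element: drop it together with its whole subtree.
  defp clean_node({tag, _attrs, _children}) when tag in @junk_tags, do: []

  defp clean_node({tag, attrs, children}) when is_binary(tag) do
    if List.keymember?(attrs, "hidden", 0) do
      # Counterpart of removing "[hidden]" elements.
      []
    else
      kept = Enum.reject(attrs, fn {name, _value} -> name in @junk_attrs end)
      [{tag, kept, clean(children)}]
    end
  end

  # Text (binaries) and comments ({:comment, text}) pass through untouched.
  defp clean_node(other), do: [other]
end

tree = [{"div", [{"style", "color: red"}], ["hi", {"script", [], ["alert(1)"]}]}]
TreeWalk.clean(tree)
# => [{"div", [], ["hi"]}]
```

Every element visited this way is rebuilt as a new BEAM tuple, which is where most of the cost in the table comes from.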

With native mutation, the entire pipeline stays in C:

html
|> LazyHTML.from_fragment()                      #  28ms
|> LazyHTML.remove("script, style, nav, ...")    #  ~5ms
|> LazyHTML.remove("[hidden]")                   #  ~2ms
|> LazyHTML.to_html()                            # ~12ms
# Total: ~50ms estimated (10x faster)

The to_tree export and Elixir tree walking are eliminated entirely.
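The totals and the speedup figure can be checked from the per-step numbers quoted above. The snippet only redoes the arithmetic; the native-mutation timings are the estimates from this description, not measurements, and the keyword names are labels chosen for this example.

```elixir
# Timings in milliseconds, copied from the cost breakdown and the pipeline comments.
# The native-mutation figures are estimates, not measurements.
current = [from_fragment: 28, to_tree: 75, tree_walk: 370, to_html: 30]
native = [from_fragment: 28, remove_tags: 5, remove_hidden: 2, to_html: 12]

current_ms = current |> Keyword.values() |> Enum.sum()
native_ms = native |> Keyword.values() |> Enum.sum()
speedup = Float.round(current_ms / native_ms, 1)

IO.puts("current: #{current_ms}ms, native: #{native_ms}ms, speedup: #{speedup}x")
# prints "current: 503ms, native: 47ms, speedup: 10.7x"
```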

Implementation

All three NIF functions follow the existing patterns in lazy_html.cpp:

- `dom_remove` reuses `parse_css_selector` and `lxb_selectors_find` from `query`. After collecting matching nodes, it calls `lxb_dom_node_destroy` (which both unlinks and frees). It also scrubs the `LazyHTML.nodes` vector to prevent dangling pointer access.
- `dom_remove_attribute` uses `lxb_dom_node_simple_walk` to traverse descendants and `lxb_dom_element_remove_attribute` on each element.
- `dom_set_attribute` calls `lxb_dom_element_set_attribute`, which is already used in `from_tree`.

Safety

remove/2 mutates the underlying DOM. Any %LazyHTML{} values previously obtained via query/2 that reference removed nodes become invalid. This is documented with a warning admonition in the function docs.

Tests

17 new tests + 3 new doctests covering:

- Simple, compound, and attribute CSS selectors
- Nested element removal
- Root node removal
- No-op when nothing matches
- Subsequent queries reflecting mutations
- Attribute removal from nested elements
- Attribute set/overwrite
- Multi-node attribute set from query results

All 96 tests pass (39 doctests + 57 tests, 0 failures).

…ute/3

Three new functions that mutate the native Lexbor DOM in place, enabling efficient tree transformation without round-tripping through Elixir tuples.

## New functions

- `remove/2` - Removes all elements matching a CSS selector from the DOM. Uses the same selector engine as `query/2`. Collects matching nodes first, then destroys them via `lxb_dom_node_destroy` (unlink + free).
- `remove_attribute/2` - Removes a named attribute from all element nodes and their descendants. Walks the subtree in C via `lxb_dom_node_simple_walk`.
- `set_attribute/3` - Sets an attribute on all element nodes in the set. Uses the existing `lxb_dom_element_set_attribute` (already used by `from_tree`).

## Motivation

For HTML transformation workloads (sanitization, content stripping), the current workflow requires exporting the DOM to Elixir tuples via `to_tree`, walking them in pure Elixir, then importing back. On a corpus of 51 real production pages (8.4MB HTML), this takes ~500ms.

With native mutation, the same work can stay in C throughout:

    html
    |> LazyHTML.from_fragment()
    |> LazyHTML.remove("script, style, nav, footer")
    |> LazyHTML.remove("[hidden]")
    |> LazyHTML.to_html()

Estimated speedup: ~8x (500ms -> ~60ms).

## Safety note

`remove/2` mutates the underlying DOM. Any `%LazyHTML{}` values previously obtained via `query/2` that reference removed nodes become invalid. This is documented in the function's warning admonition.

## Tests

17 new tests covering:

- Simple and compound CSS selectors
- Nested element removal
- Attribute selector removal (`[hidden]`)
- Root node removal
- No-op when nothing matches
- Subsequent queries reflecting mutations
- Attribute removal from nested elements
- Attribute set/overwrite
- Multi-node attribute set from query results

3 new doctests.

josevalim · 2026-04-10T06:52:49Z

Unfortunately, mutations introduce a bunch of side effects into the tree and leave it up to the user to manage state and deal with those side effects. I think it would make more sense to introduce a transform API, where you express query selectors and the operations you want to perform on them, and then apply those to a copy of the tree. So you get the benefits you mentioned, but on top of a pure API.

Something like:

html
|> LazyHTML.from_fragment()
|> LazyHTML.transform([
  LazyHTML.Transform.remove("script, style, nav, ..."),
  LazyHTML.Transform.set_attribute("#omg", "data-foo", "bar")
])
|> LazyHTML.to_html()
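To make the "operations as data, applied to a copy" idea concrete, here is a rough sketch over plain `{tag, attributes, children}` tuples. Everything in it is hypothetical: neither `LazyHTML.transform` nor `LazyHTML.Transform` exists in this PR, the module name `TransformSketch` is made up, and the sketch matches on bare tag names rather than CSS selectors. A real version would presumably run natively on a cloned Lexbor tree.

```elixir
defmodule TransformSketch do
  # Operations are plain data, so building a list of them has no side effects.
  def remove(tag), do: {:remove, tag}
  def set_attribute(tag, name, value), do: {:set_attribute, tag, name, value}

  # Applies each operation in order and returns a new tree; the input is untouched.
  def transform(tree, ops), do: Enum.reduce(ops, tree, &apply_op/2)

  defp apply_op({:remove, tag} = op, nodes) do
    Enum.flat_map(nodes, fn
      {^tag, _attrs, _children} -> []
      {t, attrs, children} -> [{t, attrs, apply_op(op, children)}]
      other -> [other]
    end)
  end

  defp apply_op({:set_attribute, tag, name, value} = op, nodes) do
    Enum.map(nodes, fn
      {^tag, attrs, children} ->
        # Overwrites an existing attribute of the same name, otherwise appends it.
        {tag, List.keystore(attrs, name, 0, {name, value}), apply_op(op, children)}

      {t, attrs, children} ->
        {t, attrs, apply_op(op, children)}

      other ->
        other
    end)
  end
end

tree = [{"div", [], [{"script", [], ["x"]}, {"p", [], ["hi"]}]}]
ops = [TransformSketch.remove("script"), TransformSketch.set_attribute("p", "data-foo", "bar")]
TransformSketch.transform(tree, ops)
# => [{"div", [], [{"p", [{"data-foo", "bar"}], ["hi"]}]}]
```

Because `transform/2` returns a new value, any reference to the original `tree` stays valid, which is the property that in-place mutation gives up.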

jsmestad wants to merge 1 commit into dashbitco:main from jsmestad:dom-mutation-api
