Ensure proper cleanup for <pre> tags containing complex HTML #22
Open
rainux wants to merge 1 commit into kepano:main from
Owner
Thanks. Can you provide an example of a page that didn't work before?
Contributor
Author
I'm sorry, I didn't provide the URL directly since it's a site for pornographic novels. https://hlib.cc/n/15263018 Also, I understand complex HTML tags should not exist in
nareshrajkumar866
approved these changes
May 23, 2025
nareshrajkumar866
approved these changes
May 23, 2025
nareshrajkumar866
approved these changes
May 26, 2025
This PR improves compatibility with pages where <pre> tags contain block-level elements or line breaks (e.g., <p>, <ul>, <br>), which Defuddle did not clean correctly.
Problem: The previous logic treated all <pre>-like elements as potential code blocks. Complex HTML inside them was formatted inappropriately because standard cleanup ignores <pre> internals.
Solution: The rule handling preformatted elements (codeBlockRules in code.ts) now checks the children of <pre> (and similar containers):
- If block-level elements or <br> are found, the <pre> is converted to a <div>. This ensures its content is treated as standard HTML by later cleanup steps.
- Otherwise, the element is standardized to a <pre><code> block (maintaining the original behavior for simple preformatted text and code).
Potential Refactoring:
The transform function now does a couple of different things based on the content. I'm not sure whether it's better to keep it this way for simplicity or to split the logic for "convert complex <pre> to <div>" and "standardize to <pre><code>" into separate helper functions within code.ts. The filename and rule name could also be renamed to something like preformatedXxx.
Happy to discuss or explore this in a follow-up if you think it makes sense!
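For discussion, here is a rough sketch of how the two-helper split could look. It is not the code in this PR: it uses a hypothetical minimal element model (SketchElement) instead of real DOM nodes so it runs standalone, and the tag list in COMPLEX_TAGS and all function names are assumptions made for illustration.

```typescript
// Hypothetical stand-in for a DOM element; the real rule works on DOM nodes.
interface SketchElement {
  tag: string;
  text?: string;
  children: SketchElement[];
}

// Assumed set of tags that mark <pre> content as "complex" (not taken from code.ts).
const COMPLEX_TAGS = new Set(["p", "ul", "ol", "div", "table", "blockquote", "br"]);

// True when any direct child of the element is one of the assumed complex tags.
function hasComplexContent(el: SketchElement): boolean {
  return el.children.some((child) => COMPLEX_TAGS.has(child.tag));
}

// Helper 1: convert a complex <pre> to a <div>, keeping its children untouched
// so that later cleanup steps can treat them as ordinary HTML.
function convertPreToDiv(pre: SketchElement): SketchElement {
  return { ...pre, tag: "div" };
}

// Concatenate the element's own text followed by the text of its descendants.
function collectText(el: SketchElement): string {
  return (el.text ?? "") + el.children.map(collectText).join("");
}

// Helper 2: standardize simple preformatted content to <pre><code>text</code></pre>.
function standardizeToPreCode(pre: SketchElement): SketchElement {
  const code: SketchElement = { tag: "code", text: collectText(pre), children: [] };
  return { tag: "pre", children: [code] };
}

// The transform only dispatches between the two helpers.
function transformPre(pre: SketchElement): SketchElement {
  return hasComplexContent(pre) ? convertPreToDiv(pre) : standardizeToPreCode(pre);
}
```

With this shape, a <pre> whose direct children include a <p> comes back as a <div>, while a <pre> holding only inline spans comes back as <pre><code>. Checking only direct children follows the description above; whether deeper descendants should also be inspected is an open question.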
This PR was co-authored with Gemini 2.5 Pro.