gh-135661: Fix CDATA section parsing in HTMLParser #135665
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -325,7 +325,11 @@ def parse_html_declaration(self, i): | |
| # this case is actually already handled in goahead() | ||
| return self.parse_comment(i) | ||
| elif rawdata[i:i+9] == '<![CDATA[': | ||
| - return self.parse_marked_section(i) | ||
| + j = rawdata.find(']]>') | ||
| + if j < 0: | ||
| + return -1 | ||
| + self.unknown_decl(rawdata[i+3: j]) | ||
| + return j + 3 | ||
|
Member
Author
According to the HTML5 standard (https://html.spec.whatwg.org/multipage/parsing.html#markup-declaration-open-state), it should be either data or a bogus comment (which ends with
Member
I tried copying the content of the tests in the following file:

<!DOCTYPE html>
<html>
<body>
<![CDATA[just some plain text]]><hr>
<![CDATA[<!-- not a comment -->]]><hr>
<![CDATA[&not-an-entity-ref;]]><hr>
<![CDATA[<not a='start tag'>]]><hr>
<![CDATA[]]><hr>
<![CDATA[[[I have many brackets]]]]><hr>
<![CDATA[I have a > in the middle]]><hr>
<![CDATA[I have a ]] in the middle]]><hr>
<![CDATA[] ]>]]><hr>
<![CDATA[]] >]]><hr>
<![CDATA[
if (a < b && a > b) {
printf("[<marquee>How?</marquee>]");
}
]]><hr>
</body>
</html>

and this was the result on Firefox:

<html><head></head><body>
<!--[CDATA[just some plain text]]--><hr>
<!--[CDATA[<!-- not a comment ---->]]><hr>
<!--[CDATA[&not-an-entity-ref;]]--><hr>
<!--[CDATA[<not a='start tag'-->]]><hr>
<!--[CDATA[]]--><hr>
<!--[CDATA[[[I have many brackets]]]]--><hr>
<!--[CDATA[I have a --> in the middle]]><hr>
<!--[CDATA[I have a ]] in the middle]]--><hr>
<!--[CDATA[] ]-->]]><hr>
<!--[CDATA[]] -->]]><hr>
<!--[CDATA[
if (a < b && a --> b) {
printf("[<marquee>How?</marquee>]");
}
]]><hr>
</body></html>
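The Firefox result above is consistent with the "bogus comment" rule: outside of a CDATA-capable context, everything after `<!` up to the first `>` becomes a comment. A minimal sketch of that rule follows; `bogus_comment` is a hypothetical helper written for illustration, not part of `html.parser`.

```python
def bogus_comment(html, start):
    """Hypothetical helper: tokenize the declaration at html[start:] as a bogus comment.

    Returns (comment_text, index_after_comment). The comment body is
    everything after '<!' up to the first '>' (or to the end of input).
    """
    assert html.startswith("<!", start)
    end = html.find(">", start + 2)
    if end < 0:
        # No '>' at all: the rest of the input is the comment.
        return html[start + 2:], len(html)
    return html[start + 2:end], end + 1


sample = "<![CDATA[I have a > in the middle]]><hr>"
text, nxt = bogus_comment(sample, 0)
print(f"<!--{text}-->{sample[nxt:]}")
```

For this sample the comment ends at the first `>`, so the text after it (` in the middle]]>`) is left as ordinary data, which matches the corresponding line of the Firefox output.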
Member
Author
Yes, and if you try This is context dependent. HTMLParser is actually just a tokenizer. To determine the context automatically, it needs to support the stack of open elements and to know which elements are in the HTML namespace. This is all in the specification, and we will implement it in the future, but that is a different level of complexity. So I solved the issue by letting the user determine the context. New method
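To make the context dependence concrete, here is a rough sketch in which the caller supplies the context as a flag. The function name `parse_cdata_like` and the parameter `in_foreign_content` are hypothetical and chosen for illustration; they are not the method this PR adds.

```python
def parse_cdata_like(rawdata, i, in_foreign_content):
    """Hypothetical sketch: tokenize '<![CDATA[' at index i based on caller-supplied context.

    Returns (kind, text, next_index), or None if the terminator has not
    arrived yet (the caller would wait for more data).
    """
    assert rawdata.startswith("<![CDATA[", i)
    if in_foreign_content:
        # CDATA section: only the exact sequence ']]>' terminates it.
        j = rawdata.find("]]>", i + 9)
        if j < 0:
            return None
        return ("cdata", rawdata[i + 9:j], j + 3)
    # Otherwise: bogus comment, terminated by the first '>'.
    j = rawdata.find(">", i + 2)
    if j < 0:
        return None
    return ("comment", rawdata[i + 2:j], j + 1)


data = "<![CDATA[a > b]]><hr>"
print(parse_cdata_like(data, 0, in_foreign_content=True))
print(parse_cdata_like(data, 0, in_foreign_content=False))
```

The same input yields a single CDATA token in one context and a short comment followed by leftover data in the other, which is why a bare tokenizer cannot pick the right behavior on its own.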
||
| elif rawdata[i:i+9].lower() == '<!doctype': | ||
| # find the closing > | ||
| gtpos = rawdata.find('>', i+9) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | | |
| Fix CDATA section parsing in :class:`html.parser.HTMLParser` according to | | |
| the HTML5 standard: ``] ]>`` and ``]] >`` no longer end the CDATA section. | | |
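The behavior described in this entry can be illustrated with a small standalone sketch. `find_cdata_end` is a hypothetical function that mirrors the `find(']]>')` logic from the diff; starting the search at `i + 9` is an assumption of this sketch, not something shown in the diff.

```python
def find_cdata_end(rawdata, i):
    """Hypothetical helper: index just past the ']]>' that closes the CDATA section at i.

    Returns -1 if no terminator is present. Only the exact three-character
    sequence ']]>' counts; '] ]>' and ']] >' do not.
    """
    assert rawdata.startswith("<![CDATA[", i)
    # Assumption: skip the 9-character '<![CDATA[' opener before searching.
    j = rawdata.find("]]>", i + 9)
    return -1 if j < 0 else j + 3


for sample in ("<![CDATA[] ]>]]><hr>", "<![CDATA[]] >]]><hr>"):
    end = find_cdata_end(sample, 0)
    # Content between the opener and the real terminator, then what follows.
    print(repr(sample[9:end - 3]), repr(sample[end:]))
```

In both samples the look-alike sequences stay inside the section content, and parsing resumes at `<hr>`.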