Fix URL percent-encoding using space-padding instead of zero-padding#259

Open
bysiber wants to merge 1 commit into pypa:master from bysiber:fix/url-percent-encoding-padding

bysiber commented Feb 20, 2026

Summary

URL percent-encoding in Page.links uses the %2x format, which pads with a space instead of a zero for characters with ordinal values below 16.

Problem

The format string '%%%2x' produces space-padded hex values:

>>> '%%%2x' % 10
'% a'   # invalid URL encoding

>>> '%%%02x' % 10
'%0a'   # correct URL encoding

Characters such as newline (0x0a) and tab (0x09) that are matched by _clean_re would be encoded as '% a' and '% 9'. These are not valid percent-encoded sequences per RFC 3986, which requires "%" followed by exactly two hex digits, and they would break URL parsing.
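To make the consequence concrete, here is a small sketch that runs both format strings through the standard library decoder. The helper name encode_char is hypothetical and only exists for this illustration; urllib.parse.unquote stands in for whatever consumer later parses the URL.

```python
from urllib.parse import unquote


def encode_char(ch, fmt):
    """Percent-encode one character with a printf-style format (illustrative helper)."""
    return fmt % ord(ch)


# Format currently used (space-padded) versus the one proposed here (zero-padded).
space_padded = encode_char("\n", "%%%2x")
zero_padded = encode_char("\n", "%%%02x")

# The space-padded form is not a percent-escape, so the decoder leaves it untouched.
decoded_space_padded = unquote(space_padded)

# The zero-padded form decodes back to the original newline.
decoded_zero_padded = unquote(zero_padded)

print(repr(space_padded), "->", repr(decoded_space_padded))
print(repr(zero_padded), "->", repr(decoded_zero_padded))
```

With the space-padded format the original character cannot be recovered by decoding, and a literal space ends up inside the URL.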

Fix

Change %2x to %02x to zero-pad the hex value.
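A minimal sketch of what the corrected substitution could look like. The pattern below is an assumed stand-in, since the real _clean_re is not shown in this description, and clean_url is a hypothetical wrapper rather than the actual Page.links code.

```python
import re

# Assumed stand-in for _clean_re: matches any character outside a
# conservative set of URL-safe characters. The real pattern may differ.
_CLEAN_RE_SKETCH = re.compile(r"[^A-Za-z0-9$&+,/:;=?@.#%_|-]")


def clean_url(url):
    """Replace each matched character with a zero-padded percent-escape."""
    # %02x pads to two hex digits with a leading zero, so ordinals
    # below 16 become e.g. '%09' rather than '% 9'.
    return _CLEAN_RE_SKETCH.sub(lambda m: "%%%02x" % ord(m.group(0)), url)


cleaned = clean_url("http://example.com/a\tb\nc d")
print(cleaned)
```

Characters with ordinals of 16 or more, such as space (0x20), are encoded identically by both formats, so the change only affects the low control characters.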
