
Commit 300de1e

gpshead, claude, and hauntsaninja authored
gh-86519: Add prefixmatch APIs to the re module (GH-31137)
Adds `prefixmatch` APIs to the re module as an alternate name for our long-existing `match` APIs. This helps alleviate a common confusion for those coming from other languages' regular expression libraries: Python differs from other popular languages' regex libraries in what it means by "match" as an API name.

The original `match` names are **NOT being deprecated**. Source tooling like linters, IDEs, and LLMs could suggest using `prefixmatch` instead of `match` to improve code health and reduce the cognitive burden of understanding the intent of code when configured for a modern minimum Python version.

See the documentation changes for a better description.

Discussions took place in the PR, in the issue, and finally at https://discuss.python.org/t/add-re-prefixmatch-deprecate-re-match/105927

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 5fe139c commit 300de1e

8 files changed: +251 -123 lines changed

Doc/library/re.rst

Lines changed: 124 additions & 69 deletions
Large diffs are not rendered by default.

Doc/whatsnew/3.15.rst

Lines changed: 14 additions & 1 deletion
@@ -824,6 +824,19 @@ pickle
   (Contributed by Zackery Spytz and Serhiy Storchaka in :gh:`77188`.)
 
 
+re
+--
+
+* :func:`re.prefixmatch` and a corresponding :meth:`~re.Pattern.prefixmatch`
+  have been added as alternate more explicit names for the existing
+  :func:`re.match` and :meth:`~re.Pattern.match` APIs. These are intended
+  to be used to alleviate confusion around what *match* means by following the
+  Zen of Python's *"Explicit is better than implicit"* mantra. Most other
+  language regular expression libraries use an API named *match* to mean what
+  Python has always called *search*.
+  (Contributed by Gregory P. Smith in :gh:`86519`.)
+
+
 resource
 --------
 
@@ -1285,7 +1298,7 @@ Diego Russo in :gh:`140683` and :gh:`142305`.)
 
 
 Removed
-=======
+========
 
 ctypes
 ------
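
The distinction the What's New entry draws between *prefixmatch* and *search* can be seen in a minimal sketch. Because `re.prefixmatch` is introduced by this commit (3.15), the sketch falls back to `re.match` on older interpreters. The local name `prefixmatch` and the sample string are illustrative assumptions, not part of the commit.

```python
import re

# re.prefixmatch is added by this commit (Python 3.15+). On older versions,
# fall back to re.match, which the commit describes as the same operation.
prefixmatch = getattr(re, "prefixmatch", re.match)

text = "error: disk full"

# Anchored at the start of the string: "disk" is not a prefix, so no match.
anchored = prefixmatch(r"disk", text)

# search scans the whole string, which is what many other languages'
# regex libraries call "match".
found = re.search(r"disk", text)

# A pattern that does occur at the start of the string matches.
prefix = prefixmatch(r"error", text)

print(anchored)       # None
print(found.span())   # (7, 11)
print(prefix.span())  # (0, 5)
```

Reading `prefixmatch(...)` at a call site makes the anchoring visible without having to recall which meaning of "match" Python uses.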

Lib/re/__init__.py

Lines changed: 21 additions & 17 deletions
@@ -85,17 +85,18 @@
     \\       Matches a literal backslash.
 
 This module exports the following functions:
-    match     Match a regular expression pattern to the beginning of a string.
-    fullmatch Match a regular expression pattern to all of a string.
-    search    Search a string for the presence of a pattern.
-    sub       Substitute occurrences of a pattern found in a string.
-    subn      Same as sub, but also return the number of substitutions made.
-    split     Split a string by the occurrences of a pattern.
-    findall   Find all occurrences of a pattern in a string.
-    finditer  Return an iterator yielding a Match object for each match.
-    compile   Compile a pattern into a Pattern object.
-    purge     Clear the regular expression cache.
-    escape    Backslash all non-alphanumerics in a string.
+    prefixmatch Match a regular expression pattern to the beginning of a str.
+    match       The original name of prefixmatch prior to 3.15.
+    fullmatch   Match a regular expression pattern to all of a string.
+    search      Search a string for the presence of a pattern.
+    sub         Substitute occurrences of a pattern found in a string.
+    subn        Same as sub, but also return the number of substitutions made.
+    split       Split a string by the occurrences of a pattern.
+    findall     Find all occurrences of a pattern in a string.
+    finditer    Return an iterator yielding a Match object for each match.
+    compile     Compile a pattern into a Pattern object.
+    purge       Clear the regular expression cache.
+    escape      Backslash all non-alphanumerics in a string.
 
 Each function other than purge and escape can take an optional 'flags' argument
 consisting of one or more of the following module constants, joined by "|".
@@ -130,7 +131,7 @@
 
 # public symbols
 __all__ = [
-    "match", "fullmatch", "search", "sub", "subn", "split",
+    "prefixmatch", "match", "fullmatch", "search", "sub", "subn", "split",
     "findall", "finditer", "compile", "purge", "escape",
     "error", "Pattern", "Match", "A", "I", "L", "M", "S", "X", "U",
     "ASCII", "IGNORECASE", "LOCALE", "MULTILINE", "DOTALL", "VERBOSE",
@@ -159,10 +160,13 @@ class RegexFlag:
 # --------------------------------------------------------------------
 # public interface
 
-def match(pattern, string, flags=0):
+def prefixmatch(pattern, string, flags=0):
     """Try to apply the pattern at the start of the string, returning
     a Match object, or None if no match was found."""
-    return _compile(pattern, flags).match(string)
+    return _compile(pattern, flags).prefixmatch(string)
+
+# Our original name which was less explicitly clear about the behavior for prefixmatch.
+match = prefixmatch
 
 def fullmatch(pattern, string, flags=0):
     """Try to apply the pattern to all of the string, returning
@@ -311,7 +315,7 @@ def escape(pattern):
         return pattern.translate(_special_chars_map).encode('latin1')
 
 Pattern = type(_compiler.compile('', 0))
-Match = type(_compiler.compile('', 0).match(''))
+Match = type(_compiler.compile('', 0).prefixmatch(''))
 
 # --------------------------------------------------------------------
 # internals
@@ -410,10 +414,10 @@ def __init__(self, lexicon, flags=0):
     def scan(self, string):
         result = []
         append = result.append
-        match = self.scanner.scanner(string).match
+        _match = self.scanner.scanner(string).prefixmatch
         i = 0
         while True:
-            m = match()
+            m = _match()
             if not m:
                 break
             j = m.end()
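
The aliasing approach used in `Lib/re/__init__.py` defines the function under the new name and binds the old name to the same object. It can be sketched in isolation as follows. The names `prefix_find` and `legacy_find` are hypothetical stand-ins chosen for illustration; this is not the module's actual code.

```python
import re

def prefix_find(pattern, string, flags=0):
    """Hypothetical stand-in for the new, more explicit name."""
    return re.compile(pattern, flags).match(string)

# The old name is bound to the very same function object, so existing
# callers keep working and both names behave identically.
legacy_find = prefix_find

# One consequence of plain aliasing: introspection reports the new name.
print(legacy_find is prefix_find)          # True
print(legacy_find.__name__)                # prefix_find
print(legacy_find("ab", "abc").group())    # ab
```

A plain assignment keeps a single implementation and docstring, at the cost that the old name no longer appears in `__name__` or tracebacks.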

Lib/test/test_inspect/test_inspect.py

Lines changed: 4 additions & 1 deletion
@@ -6277,7 +6277,10 @@ def test_pwd_module_has_signatures(self):
 
     def test_re_module_has_signatures(self):
         import re
-        methods_no_signature = {'Match': {'group'}}
+        methods_no_signature = {
+            'Match': {'group'},
+            'Pattern': {'match'},  # It is now an alias for prefixmatch
+        }
         self._test_module_has_signatures(re,
                                 methods_no_signature=methods_no_signature,
                                 good_exceptions={'error', 'PatternError'})

Lib/test/test_re.py

Lines changed: 17 additions & 5 deletions
@@ -90,10 +90,13 @@ def test_search_star_plus(self):
         self.assertEqual(re.search('x+', 'axx').span(), (1, 3))
         self.assertIsNone(re.search('x', 'aaa'))
         self.assertEqual(re.match('a*', 'xxx').span(0), (0, 0))
+        self.assertEqual(re.prefixmatch('a*', 'xxx').span(0), (0, 0))
         self.assertEqual(re.match('a*', 'xxx').span(), (0, 0))
         self.assertEqual(re.match('x*', 'xxxa').span(0), (0, 3))
+        self.assertEqual(re.prefixmatch('x*', 'xxxa').span(0), (0, 3))
         self.assertEqual(re.match('x*', 'xxxa').span(), (0, 3))
         self.assertIsNone(re.match('a+', 'xxx'))
+        self.assertIsNone(re.prefixmatch('a+', 'xxx'))
 
     def test_branching(self):
         """Test Branching
@@ -180,6 +183,7 @@ def test_bug_449000(self):
     def test_bug_1661(self):
         # Verify that flags do not get silently ignored with compiled patterns
         pattern = re.compile('.')
+        self.assertRaises(ValueError, re.prefixmatch, pattern, 'A', re.I)
         self.assertRaises(ValueError, re.match, pattern, 'A', re.I)
         self.assertRaises(ValueError, re.search, pattern, 'A', re.I)
         self.assertRaises(ValueError, re.findall, pattern, 'A', re.I)
@@ -517,6 +521,8 @@ def test_re_match(self):
             self.assertEqual(re.match(b'(a)', string).group(0), b'a')
             self.assertEqual(re.match(b'(a)', string).group(1), b'a')
             self.assertEqual(re.match(b'(a)', string).group(1, 1), (b'a', b'a'))
+            self.assertEqual(re.prefixmatch(b'(a)', string).group(1, 1),
+                             (b'a', b'a'))
         for a in ("\xe0", "\u0430", "\U0001d49c"):
             self.assertEqual(re.match(a, a).groups(), ())
             self.assertEqual(re.match('(%s)' % a, a).groups(), (a,))
@@ -558,10 +564,8 @@ def __index__(self):
         self.assertEqual(m.group(2, 1), ('b', 'a'))
         self.assertEqual(m.group(Index(2), Index(1)), ('b', 'a'))
 
-    def test_match_getitem(self):
-        pat = re.compile('(?:(?P<a1>a)|(?P<b2>b))(?P<c3>c)?')
-
-        m = pat.match('a')
+    def do_test_match_getitem(self, match_fn):
+        m = match_fn('a')
         self.assertEqual(m['a1'], 'a')
         self.assertEqual(m['b2'], None)
         self.assertEqual(m['c3'], None)
@@ -585,7 +589,7 @@ def test_match_getitem(self):
         with self.assertRaisesRegex(IndexError, 'no such group'):
             'a1={a2}'.format_map(m)
 
-        m = pat.match('ac')
+        m = match_fn('ac')
         self.assertEqual(m['a1'], 'a')
         self.assertEqual(m['b2'], None)
         self.assertEqual(m['c3'], 'c')
@@ -602,6 +606,14 @@ def test_match_getitem(self):
         # No len().
         self.assertRaises(TypeError, len, m)
 
+    def test_match_getitem(self):
+        pat = re.compile('(?:(?P<a1>a)|(?P<b2>b))(?P<c3>c)?')
+        self.do_test_match_getitem(pat.match)
+
+    def test_prefixmatch_getitem(self):
+        pat = re.compile('(?:(?P<a1>a)|(?P<b2>b))(?P<c3>c)?')
+        self.do_test_match_getitem(pat.prefixmatch)
+
     def test_re_fullmatch(self):
         # Issue 16203: Proposal: add re.fullmatch() method.
         self.assertEqual(re.fullmatch(r"a", "a").span(), (0, 1))
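
The `test_re.py` change turns one test into a shared helper that takes the bound matching method as a parameter, so the same assertions run against both names. A minimal standalone sketch of that pattern follows. The class name, helper name, and pattern are illustrative assumptions, and the fallback to `Pattern.match` is only there so the sketch also runs on interpreters older than 3.15.

```python
import re
import unittest

PAT = re.compile(r"(?P<word>\w+)")
# Pattern.prefixmatch is added by this commit; fall back to Pattern.match
# on older interpreters so the sketch still runs.
prefixmatch = getattr(PAT, "prefixmatch", PAT.match)

class GetItemTests(unittest.TestCase):
    def check_getitem(self, match_fn):
        # Shared assertions, parameterized by the bound matching method.
        m = match_fn("abc def")
        self.assertEqual(m["word"], "abc")
        self.assertEqual(m[0], "abc")

    def test_match(self):
        self.check_getitem(PAT.match)

    def test_prefixmatch(self):
        self.check_getitem(prefixmatch)

# Run the two tests explicitly rather than via unittest.main(), which exits.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(GetItemTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

Passing the bound method keeps the helper unaware of which name is under test, which is why the commit's helper no longer builds the pattern itself.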
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+The :mod:`re` module gains a new :func:`re.prefixmatch` function as an
+explicit spelling of what has to date always been known as :func:`re.match`.
+:class:`re.Pattern` similarly gains a :meth:`re.Pattern.prefixmatch` method.
+
+Why? Explicit is better than implicit. Other widely used languages all use
+the term "match" to mean what Python uses the term "search" for. The
+unadorned "match" name in Python has been a frequent case of confusion and
+coding bugs due to the inconsistency with the rest of the software industry.
+
+We do not plan to deprecate and remove the older ``match`` name.

Modules/_sre/clinic/sre.c.h

Lines changed: 19 additions & 19 deletions
Some generated files are not rendered by default.
