Status: Open
Labels: area-System.Text.RegularExpressions, tenet-performance (Performance related issue)
Description
Found in #124842: this test fails. It's only a possible performance issue, not a correctness one: the search uses the leading prefix "htt" rather than "http".
```csharp
[Theory]
// Alternation prefix factoring leaves Alternate(Set[Pp], Concat(Set[Pp],Set[Ss...])); the single-node branch isn't a Concat, so extraction stops at "htt"
[InlineData(@"((http|https)://foo)", (int)RegexOptions.IgnoreCase, (int)FindNextStartingPositionMode.LeadingString_OrdinalIgnoreCase_LeftToRight, "htt")]
public void LeadingPrefix(string pattern, int options, int expectedMode, string expectedPrefix)
{
    RegexFindOptimizations opts = ComputeOptimizations(pattern, (RegexOptions)options);
    Assert.Equal((FindNextStartingPositionMode)expectedMode, opts.FindMode);
    Assert.Equal(expectedPrefix, opts.LeadingPrefix);
}
```
Note from #124842:
It's probably due to how the pattern ends up being represented in the node graph:
```csharp
/// ○ Match a character in the set [Hh].<br/>
/// ○ Match a character in the set [Tt] exactly 2 times.<br/>
/// ○ Match with 2 alternative expressions.<br/>
///     ○ Match a character in the set [Pp].<br/>
///     ○ Match a sequence of expressions.<br/>
///         ○ Match a character in the set [Pp].<br/>
///         ○ Match a character in the set [Ss].<br/>
```
I don't remember the code, but the prefix analyzer is likely stopping when it reaches the two alternation branches, because it doesn't handle comparing a set against a concatenation. Just a guess. If that's accurate, we could improve such cases with more code.
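If the guess above is right, one way to handle this case is to view a branch that is not a concatenation as a one-element concatenation, so the analyzer can keep comparing the branches node by node. The C# sketch below is a minimal illustration under that assumption. The node records (`SetNode`, `RepeatNode`, `ConcatNode`, `AlternateNode`), `PrefixSketch`, `LeadingPrefix`, `ExampleTree`, and the `treatSingleNodeBranchAsConcat` flag are all hypothetical simplifications, not the real `RegexNode` or prefix-analyzer code in System.Text.RegularExpressions. Captures and most of the pattern after the alternation are omitted. With the flag off, extraction stops at "htt"; with it on, extraction consumes the shared [Pp] and stops where the branches diverge, giving "http".

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical tree modeled on the one above: [Hh][Tt]{2}([Pp]|[Pp][Ss]):// (captures omitted).
Node tree = PrefixSketch.ExampleTree();
Console.WriteLine(PrefixSketch.LeadingPrefix(tree, treatSingleNodeBranchAsConcat: false)); // htt
Console.WriteLine(PrefixSketch.LeadingPrefix(tree, treatSingleNodeBranchAsConcat: true));  // http

// Hypothetical, simplified stand-ins for the node kinds in the tree above.
// These are NOT the internal RegexNode types from System.Text.RegularExpressions.
abstract record Node;
sealed record SetNode(string Chars) : Node;             // "Pp" models the set [Pp]
sealed record RepeatNode(Node Child, int Count) : Node; // exact-count loop, e.g. [Tt]{2}
sealed record ConcatNode(Node[] Children) : Node;
sealed record AlternateNode(Node[] Branches) : Node;

static class PrefixSketch
{
    public static Node ExampleTree() => new ConcatNode(new Node[]
    {
        new SetNode("Hh"),
        new RepeatNode(new SetNode("Tt"), 2),
        new AlternateNode(new Node[]
        {
            new SetNode("Pp"), // single-node branch, not a Concat
            new ConcatNode(new Node[] { new SetNode("Pp"), new SetNode("Ss") }),
        }),
        new SetNode(":"), new SetNode("/"), new SetNode("/"),
    });

    // Returns the lowercased case-insensitive literal prefix every match must start with.
    public static string LeadingPrefix(Node root, bool treatSingleNodeBranchAsConcat)
    {
        var sb = new StringBuilder();
        Append(root, sb, treatSingleNodeBranchAsConcat);
        return sb.ToString();
    }

    // Appends what it can; returns true only if the whole node was consumed,
    // meaning extraction may continue with the next sibling.
    static bool Append(Node node, StringBuilder sb, bool fix)
    {
        switch (node)
        {
            case SetNode setNode when TryGetChar(setNode, out char c):
                sb.Append(c);
                return true;
            case RepeatNode { Child: SetNode setNode } loop when TryGetChar(setNode, out char c):
                sb.Append(c, loop.Count);
                return true;
            case ConcatNode concat:
                foreach (Node child in concat.Children)
                {
                    if (!Append(child, sb, fix)) return false;
                }
                return true;
            case AlternateNode alt:
                return AppendCommonPrefix(alt, sb, fix);
            default:
                return false;
        }
    }

    static bool AppendCommonPrefix(AlternateNode alt, StringBuilder sb, bool fix)
    {
        // View each branch as a sequence. Without the fix, a branch that isn't a
        // Concat makes extraction stop here (the behavior described in the issue).
        Node[][] branches = alt.Branches
            .Select(b => b is ConcatNode cn ? cn.Children : fix ? new[] { b } : null)
            .ToArray();
        if (branches.Any(b => b is null)) return false;

        for (int i = 0; ; i++)
        {
            // Once any branch is exhausted, the alternation was consumed in full
            // only if every branch ended at the same position.
            if (branches.Any(b => i >= b.Length))
                return branches.All(b => i >= b.Length);

            // Every branch must have the same single case-insensitive character here.
            if (branches[0][i] is not SetNode first || !TryGetChar(first, out char c))
                return false;
            foreach (Node[] branch in branches)
            {
                if (branch[i] is not SetNode s || !TryGetChar(s, out char other) || other != c)
                    return false;
            }
            sb.Append(c);
        }
    }

    // Simplification: a set qualifies if all its chars lowercase to one char
    // (culture-specific casing is ignored).
    static bool TryGetChar(SetNode setNode, out char c)
    {
        string lowered = setNode.Chars.ToLowerInvariant();
        c = lowered.Length > 0 ? lowered[0] : '\0';
        return lowered.Length > 0 && lowered.All(ch => ch == lowered[0]);
    }
}
```

The same result could presumably come from the other end: if alternation prefix factoring pulled [Pp] out of both branches (giving roughly [Hh][Tt]{2}[Pp](?:|[Ss])), an analyzer that stops at any alternation would still collect "http" before reaching it.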