Patterns with literal string in ASCII alternation of different length do not fast-search for longest string #124871

@danmoseley

Found in #124842, this test fails. It is only a possible perf issue, not a correctness issue: the leading-prefix search uses "htt" rather than the longer "http".

 // Alternation prefix factoring leaves Alternate(Set[Pp], Concat(Set[Pp],Set[Ss...])) — single-node branch isn't a Concat, so extraction stops at "htt"
        [InlineData(@"((http|https)://foo)", (int)RegexOptions.IgnoreCase, (int)FindNextStartingPositionMode.LeadingString_OrdinalIgnoreCase_LeftToRight, "htt")]
        public void LeadingPrefix(string pattern, int options, int expectedMode, string expectedPrefix)
        {
            RegexFindOptimizations opts = ComputeOptimizations(pattern, (RegexOptions)options);
            Assert.Equal((FindNextStartingPositionMode)expectedMode, opts.FindMode);
            Assert.Equal(expectedPrefix, opts.LeadingPrefix);
        }

Note from #124842:

It's probably due to how it ends up being represented in the node graph:

    ///     ○ Match a character in the set [Hh].<br/>
    ///     ○ Match a character in the set [Tt] exactly 2 times.<br/>
    ///     ○ Match with 2 alternative expressions.<br/>
    ///         ○ Match a character in the set [Pp].<br/>
    ///         ○ Match a sequence of expressions.<br/>
    ///             ○ Match a character in the set [Pp].<br/>
    ///             ○ Match a character in the set [Ss].<br/>

I don't remember the code, but the prefix analyzer is likely stopping when it reaches the two alternation branches and realizes it's comparing a set against a concatenation. That's just a guess. If it's accurate, we could improve such cases with more code.
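To illustrate the guess above, here is a minimal sketch of one possible improvement: treat a single-node branch as a one-element sequence, so a lone set can be compared position by position against a concatenation. `Node`, `SetNode`, `ConcatNode`, and `PrefixSketch` are hypothetical stand-ins invented for this example, not the real `RegexNode` or prefix analyzer code, and the sketch assumes sets are already normalized so that equal sets have equal strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for regex nodes; NOT the real RegexNode type.
public abstract class Node { }

public sealed class SetNode : Node
{
    public string Chars { get; }                 // e.g. "Pp" for the set [Pp]
    public SetNode(string chars) { Chars = chars; }
}

public sealed class ConcatNode : Node
{
    public IReadOnlyList<Node> Children { get; }
    public ConcatNode(params Node[] children) { Children = children; }
}

public static class PrefixSketch
{
    // View any branch as a sequence, so a lone set behaves like a
    // one-element concatenation instead of ending the comparison.
    private static IReadOnlyList<Node> AsSequence(Node branch) =>
        branch is ConcatNode c ? c.Children : new[] { branch };

    // Returns the sets shared by every branch at the same leading positions.
    // Assumption: sets are normalized, so equal sets have equal Chars strings.
    public static List<string> CommonLeadingSets(IReadOnlyList<Node> branches)
    {
        var result = new List<string>();
        if (branches.Count == 0) return result;

        List<IReadOnlyList<Node>> sequences = branches.Select(AsSequence).ToList();
        int shortest = sequences.Min(s => s.Count);

        for (int i = 0; i < shortest; i++)
        {
            // Stop at the first position that is not a plain set in the first branch.
            if (!(sequences[0][i] is SetNode first)) break;

            // Stop as soon as any branch disagrees at this position.
            bool allMatch = sequences.All(
                s => s[i] is SetNode set && set.Chars == first.Chars);
            if (!allMatch) break;

            result.Add(first.Chars);
        }
        return result;
    }
}
```

For the alternation in the tree above, the branches `[Pp]` and `Concat([Pp], [Ss])` share the leading set `[Pp]`. Appended to the already-factored `[Hh][Tt][Tt]`, that would give the prefix "http" rather than "htt".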
