esi: replace _findOpeningTag with memchr#13173
Conversation
Replace the hand-rolled two-state-machine loop with memchr + memcmp. memchr delegates scanning to the platform's optimized implementation (e.g. __memchr_avx2 on glibc x86-64) to skip non-'<' bytes, then memcmp verifies each candidate anchor. This eliminates the KMP-failure limitation noted in the original comment: the old implementation could mishandle opening sequences like '<e<esi'; the new one finds the correct anchor unconditionally. Adds four parser_test sections covering boundary cases specific to the new implementation: exact-length prefix at chunk end for both <esi: and <!--esi, <!--esi without required trailing whitespace, and multiple false '<' anchors before a valid tag.
There was a problem hiding this comment.
Pull request overview
Replaces the hand-rolled byte-by-byte state machine in EsiParser::_findOpeningTag with a memchr+memcmp based scan that delegates the common case (skipping non-< bytes) to libc's optimized implementation. This also fixes a known correctness limitation of the previous state machine (mishandling of repeated opening sequences like <e<esi).
Changes:
- Rewrite
_findOpeningTagto usememchrto locate<andmemcmpto verify the<esi:/<!--esianchors, preserving theCOMPLETE_MATCH/PARTIAL_MATCH/NO_MATCHsemantics and partial-match-at-chunk-boundary behavior. - Add
<cstring>include formemchr/memcmp. - Add four new parser test SECTIONs covering exact-length prefixes at chunk boundaries,
<!--esiwithout trailing whitespace, and multiple false<anchors before a valid tag.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| plugins/esi/lib/EsiParser.cc | Replaces the two-state-machine scan loop with a memchr/memcmp-based implementation and updates the function's leading comment. |
| plugins/esi/test/parser_test.cc | Adds boundary-condition test sections for the new implementation. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
| if (++i_esi == ESI_TAG_PREFIX_LEN) { | ||
| is_html_comment_node = false; | ||
| opening_tag_pos = i_data - i_esi + 1; | ||
| const char *const buf = data.data(); |
There was a problem hiding this comment.
One thought here, and I asked Claude to make an example: Did you consider pulling in and using libswoc TextView instead? It excels at things like "string parsing". Not a show stopper, and could be a future consideration.
I asked Claude to write a suggested use with TextView (not tested ...)
swoc::TextView view{data.data() + start_pos, data.size() - start_pos};
const swoc::TextView esi_prefix{ESI_TAG_PREFIX, ESI_TAG_PREFIX_LEN};
const swoc::TextView html_prefix{HTML_COMMENT_NODE_INFO.tag_suffix,
size_t(HTML_COMMENT_NODE_INFO.tag_suffix_len)};
while (!view.empty()) {
size_t pos = view.find('<');
if (pos == swoc::TextView::npos) return NO_MATCH;
view.remove_prefix(pos);
size_t abs_pos = data.size() - view.size();
if (view.starts_with(esi_prefix)) { /* COMPLETE_MATCH esi */ }
if (view.size() > html_prefix.size() && view.starts_with(html_prefix)) {
char ch = view[html_prefix.size()];
if (ch==' '||ch=='\t'||ch=='\r'||ch=='\n') { /* COMPLETE_MATCH html */ }
}
if (view.size() < esi_prefix.size() && esi_prefix.starts_with(view)) { /* PARTIAL */ }
if (view.size() <= html_prefix.size() && html_prefix.starts_with(view)) { /* PARTIAL */ }
view.remove_prefix(1);
}There was a problem hiding this comment.
I was mostly unaware of what was offered in libswoc, so that's def why I didn't use it. I did some quick benchmarks with a synthetic comparison between this vs TextView and libswoc is as fast if not slightly faster sometimes. Its within margin of error, so Its more up to you guys if you want it re-written that way now, or something we should tackle later?
| const char *const buf = data.data(); | ||
| const size_t total = data.size(); | ||
| const size_t esi_len = ESI_TAG_PREFIX_LEN; | ||
| const size_t hlen = HTML_COMMENT_NODE_INFO.tag_suffix_len; |
There was a problem hiding this comment.
Do you really need this hlen local? The compiler will optimize it away, but I honestly would just like to see HTML_COMMENT_NODE_INFO.tag_suffix_len where you need this, specially since you memcmp from HTML_COMMENT_NODE_INFO.tag_suffix already (no shadow variable for the string).
Same with the esi_len I think, neither are blockers.
Remove the hlen and esi_len local aliases so the length argument to each memcmp call is read from the same named constant as the string argument, consistent with how ESI_TAG_PREFIX and HTML_COMMENT_NODE_INFO.tag_suffix were already written in full. Addresses review feedback from zwoop on PR apache#13173. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
[approve ci] |
Problem
EsiParser::_findOpeningTaguses a hand-rolled two-state-machine byte loopto scan response bodies for
<esi:and<!--esiopening tags. On a CDNedge server profiled under production load, this function appeared as a
top-5 CPU leaf, ahead of OpenSSL EC math and zlib.
The root cause is that the loop is unconditionally O(n) at ~2 ns/byte — it
checks every byte even when the response body contains no ESI tags at all,
which is the common case on a CDN edge.
There is also a correctness issue in the original: the KMP-failure comment
in the source notes that the state machine mishandles sequences like
<e<esi:— the first<eadvances a partial-match counter but thenfails, and the recovery doesn't rewind to the second
<. The newimplementation avoids this entirely.
Solution
Replace the loop with
memchr+memcmp:memchrlocates the next<— delegates to the platform's optimizedlibc implementation (e.g.
__memchr_avx2on glibc x86-64), whichprocesses 128 bytes per iteration in an AVX2 unrolled loop.
memcmpverifies the candidate anchor (<esi:or<!--esi).<and repeat.Partial-match (chunk boundary) paths are preserved: when the buffer ends
mid-prefix, the function returns
PARTIAL_MATCHso the caller canaccumulate more data before deciding.
The only interface change is
#include <cstring>— all call sites, returntypes, and semantics are identical.
Performance
Measured on E5-2683 v4 Broadwell, 2.10 GHz, RHEL 8, glibc 2.28, gcc -O2:
<)<, no ESI)<!--esicomment nodes<)The pathological case is the worst case for this approach — every false
<triggers a
memcmp. Real CDN response bodies fall in the typical HTML range.Testing
parser_test. Four newSECTIONblocks cover boundary conditions specific to this implementation:
exact-length prefix at chunk end for both
<esi:and<!--esi,<!--esiwithout required trailing whitespace, and multiple false<anchors before a valid tag.
esi,esi_304, andesi_nested_includeall pass (builtinside
ci.trafficserver.apache.org/ats/fedora:42, CMake presetci-fedora-autest).through both the old and new implementations — 0 mismatches across all
MATCH_TYPEreturns, positions, andis_html_comment_nodeflags.