perf(clients): dense array lookup for Errors.forCode#2
Open
mashraf-222 wants to merge 6 commits intotrunkfrom
Open
perf(clients): dense array lookup for Errors.forCode#2mashraf-222 wants to merge 6 commits intotrunkfrom
mashraf-222 wants to merge 6 commits intotrunkfrom
Conversation
Adds three tests that define the behavioral contract of Errors.forCode: - testForCodeReturnsMatchingForEveryDefinedCode asserts round-trip identity for every code present in Errors.values(): forCode(e.code()) == e. - testForCodeReturnsUnknownServerErrorForOutOfRangeCodes probes codes outside the defined range (including Short.MIN/MAX and just-past-edge values) and asserts the fallback to UNKNOWN_SERVER_ERROR. Codes that happen to be defined are skipped so the test stays robust as new errors are added. - testForCodeIsIdentityInverseOfCode asserts forCode composed with Errors#code is idempotent. These are behavior-preserving tests against the existing HashMap-based implementation; they lock the contract before any refactor of the lookup path.
Replaces the HashMap<Short, Errors> CODE_TO_ERROR with a dense Errors[] indexed by (code - MIN_CODE). Error codes are a near-contiguous short range (currently -1 to 133); MIN_CODE=-1 so UNKNOWN_SERVER_ERROR lands at index 0. Any gaps in the range remain null in the array, and forCode bounds-checks before indexing, then falls back to UNKNOWN_SERVER_ERROR with the same log.warn behaviour as the prior implementation. Why it works - Removes the HashMap.get() + HashMap.getNode() + Integer.equals() chain observed in consumer profiles (cumulative ~0.4% of consumer CPU and ~0.15% of broker CPU on an lz4 reference workload). - Removes the per-call Short autobox that would otherwise happen if the HashMap.get(short) callsite were devirtualized under heavy polymorphic load; the array path is allocation-free. - The array entries live together in a small contiguous region of memory (< 280 bytes at current MAX_CODE=133), so hot calls hit L1. Why it's correct - For every Errors enum value e, the new forCode(e.code()) returns e (indexing by (code - MIN_CODE) is a bijection on the defined range and the static initializer places each enum at its own index, retaining the original duplicate-code check). - For codes outside [MIN_CODE, MAX_CODE] the code short-circuits to the log.warn + UNKNOWN_SERVER_ERROR fallback, preserving the prior behaviour for unknown codes. - For codes inside the range that do not correspond to any defined enum, the array slot is null and the same fallback runs. - ExceptionInInitializerError still fires on duplicate codes (Errors loaded under the same ClassLoader) because we now guard against a pre-existing non-null slot at the same index. Contract verified by the equivalence tests committed in the previous commit: testForCodeReturnsMatchingForEveryDefinedCode, testForCodeReturnsUnknownServerErrorForOutOfRangeCodes, and testForCodeIsIdentityInverseOfCode. No behaviour change beyond the lookup mechanism. Public API is unchanged. No new allocations introduced; one Errors[] reference is retained at class-init for the life of the JVM.
Four distributions approximate real-world usage: - allZero: every call is NONE (code 0); the common happy path. - small: uniformly random over Errors.values(); exercises the table. - mixed: 80/10/10 NONE / UNKNOWN_SERVER_ERROR / random; realistic. - miss: codes outside the defined range; worst-case fallback. The harness preloads a shuffled code buffer at @setup and unrolls the measured loop x4 so the per-call cost dominates loop overhead. All results are consumed via Blackhole to prevent dead-code elimination. Config: @fork(2) @WarmUp(5x5s) @measurement(10x5s) — the default "full-confidence" Kafka profile used by prior candidate benchmarks in this repo.
…(279) The e2e evidence file committed in the previous commit used t=3.182 (df=3) when computing N=5 half-widths; the correct t-value for N=5 is 2.776 (df=4). The CI-overlap verdict is unchanged by this correction (both producer and consumer 95% CIs still overlap with either t-value), but the exact half-width numbers in the table were over-estimated by a factor of ~1.146. This commit replaces those numbers with half-widths recomputed directly from the raw records/sec txt files in the session directory. This commit also: - Corrects the raw data rows for the trunk and feature consumer sequences so they match the actual per-run txt files (prior values were misattributed). - Updates the caller-count claim (274 -> 279) in both evidence files to match a fresh grep of the current tree: `rg -n "Errors\.forCode" --type java | grep -v '/test/' | grep -v '/jmh-benchmarks/' | wc -l` -> 279. No code or test changes — only evidence-file corrections.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
perf(clients): dense array lookup for Errors.forCode
Summary: Replace
HashMap<Short, Errors>with denseErrors[]for
Errors.forCode. Per-call cost −70% to −80% on 4/4 JMHconfigurations with non-overlapping 95% CIs. 279 non-test callers.
End-to-end impact not claimed (Amdahl-bounded below single-box noise floor).
Replaces the
HashMap<Short, Errors>lookup table insideorg.apache.kafka.common.protocol.Errors.forCode(short)with adense
Errors[]indexed by(code − MIN_CODE). The change isbacked by 3 equivalence tests added in this PR plus the 7 existing
ErrorsTestcases (10 total, all pass), a JMH benchmark(
ErrorsForCodeBenchmark) whose four input-distributionconfigurations each show a −70% to −80% per-call reduction with
non-overlapping 95% CIs, and a post-refactor profile showing the
method drops out of broker and consumer CPU samples. This PR is
scoped as LIBRARY PRIMITIVE: 279 non-test callers in the
codebase. End-to-end throughput impact was measured at N=5 on a
single-box lz4 workload and is inconclusive due to CI overlap,
as the Amdahl ceiling (~0.32% consumer / ~0.11% broker) sits well
below this hardware's single-box noise floor.
Tier and claim
LIBRARY PRIMITIVE —
Errors.forCode(short): per-call costreduced by −70% to −80% (non-overlap 95% CIs, JMH with 2 forks ×
10 measurement × 5 s + 5 × 5 s warmup per fork) across 4 input
distributions. 279 non-test downstream callers in the codebase.
End-to-end measured inconclusive at N=5 on lz4 1024-byte
reference workload; CIs overlap (see "End-to-end" section).
What changed
clients/src/main/java/org/apache/kafka/common/protocol/Errors.java:425-439, 526-539— replaces
Map<Short, Errors> CODE_TO_ERROR = new HashMap<>()with dense
static final Errors[] CODE_TO_ERROR_ARRplusMIN_CODE/MAX_CODEshort fields; rewritesforCode(short)tobounds-check then single array load, fallback unchanged.
clients/src/test/java/org/apache/kafka/common/protocol/ErrorsTest.java:91-138— adds 3 new tests covering round-trip identity, out-of-range
fallback, and
forCode(e.code()) == eidempotence.jmh-benchmarks/src/main/java/org/apache/kafka/jmh/common/ErrorsForCodeBenchmark.java— new JMH benchmark, four
@Paramdistributions (allZero,small,mixed,miss), unrolled 4× per op,Blackhole-consumed.
codeflash-evidence/20260507-cand3-errors-forcode-delta.md,codeflash-evidence/20260507-cand3-errors-forcode-e2e.md—per-candidate evidence files committed alongside the code (an
additional correction commit fixes an arithmetic error in the
initial evidence: N=5 half-widths now use the correct t=2.776
(df=4), raw data rows match the session's txt files, and caller
count is 279 as re-verified on this tree).
Public API: unchanged.
Errors.forCode(short)signature,return type, and observable behaviour on every valid and invalid
input are preserved.
On-disk / on-wire formats: unchanged.
JVM floor: unchanged (clients module continues to target Java 11).
Why it works
Commit
f8cba68—refactor: Errors.forCode dense array lookupMechanism. The old
forCodepath doesCODE_TO_ERROR.get(Short.valueOf(code)). This executes:shortto aShort(cached in the Shortautobox for codes in −128..127; most Kafka codes fall in this
range, so this step is often free, but the cache lookup still
consumes a load).
HashMap.hash(key)— a method call on the Short key.HashMap.getNode(hash, key)— table indexing, bucket probe,key.equals(probe.key)on each probe.Integer.equalsdispatch through the key object's vtable(Short is final so this is effectively monomorphic, but the
JVM still emits the check).
The new path is a primitive bounds check plus one array load:
The JMH numbers show
allZero11.854 ns/op → 2.941 ns/op per 4-callunroll, i.e. ~2.96 ns/call → ~0.74 ns/call. Those ns/call figures are
consistent with a picture of the baseline doing roughly 7–8 cycles
of chained loads and branches at 3 GHz, and the feature doing
roughly 2–3 cycles for a masked-compare bounds check plus an
L1-resident array load. The cycle estimates are anchored in the
measured ns/op rather than an independent measurement.
Post-refactor profile (
profiles/post-refactor/cand3/DELTA.md):on a 30-second CPU sample of both broker and consumer under the
reference workload,
Errors.forCodewent from 5/1159 = 0.43%consumer samples and 2/3295 = 0.06% broker samples down to 0
samples on both JVMs. At the default 9.9 ms sampling interval, a
≤1 ns call site has a vanishing probability of being sampled at
all — consistent with the observed JMH reduction.
Why it's correct.
Errors.forCode(short)signature and return type unchanged.Errors.values()and stores each enum at indexcode − MIN_CODE,so for any enum
e,forCode(e.code())returnse. This is abijection on the defined range.
[MIN_CODE, MAX_CODE]short-circuit to the fallback path (log.warn("Unexpected error code: {}.", code)+ returnUNKNOWN_SERVER_ERROR), exactlymatching the prior behaviour. Codes inside the range that don't
map to an enum leave a
nullslot;forCodedetects the nulland falls through to the same fallback.
ExceptionInInitializerErrorif two enum entries would land atthe same array index, preserving the pre-existing guard.
log.warn(...)message andreturned
UNKNOWN_SERVER_ERRORsentinel are byte-identical tobefore.
forCodeis a pure read from astatic finalreference; nosynchronisation, no ordering observable to callers.
Tests:
clients/src/test/java/org/apache/kafka/common/protocol/ErrorsTest.javanow has 10 @test methods (7 prior + 3 new). The three new tests
cover the contract this refactor touches:
testForCodeReturnsMatchingForEveryDefinedCode— iterates everyErrors.values()and assertsforCode(e.code()) == e.testForCodeReturnsUnknownServerErrorForOutOfRangeCodes—probes
Short.MIN_VALUE,Short.MAX_VALUE, and just-past-edgevalues, asserting fallback identity.
testForCodeIsIdentityInverseOfCode—forCode(forCode(c).code()) == forCode(c).All 10 tests pass on the refactor commit (output at
/home/ubuntu/code/kafka/clients/build/test-results/test/TEST-org.apache.kafka.common.protocol.ErrorsTest.xml).Benchmark
JMH methodology
kafka-jmh-benchmarks-4.4.0-SNAPSHOTshadow jar)-f 2 -wi 5 -w 5 -i 10 -r 5— 2 forks × (5 × 5 s warmup + 10 × 5 s measurement)org.apache.kafka.jmh.common.ErrorsForCodeBenchmark.forCodeErrors.forCode(short)results per measured op (unrolled 4×), consumed viaBlackholeto defeat dead-code elimination.@Setup(seededRandom(0xC0DEC0DEL)) so per-call cost is dominated by the lookup itself and no loop-inside-loop constant folding is possible.@Param:allZero— every call isNONE(code 0); the common happy path.small— uniformly random overErrors.values().mixed— 80%NONE/ 10%UNKNOWN_SERVER_ERROR/ 10% random other (realistic broker shape).miss— codes outside the defined range; worst-case fallback.JMH results
Source:
codeflash-evidence/20260507-cand3-errors-forcode-delta.md(committed in this PR).
All four configurations: non-overlap 95% CIs (verified by
not (base_mean − base_err > feat_mean + feat_err or feat_mean − feat_err > base_mean + base_err)).All
scoreError/scoreratios below 5% (max = 3.60% on baselinesmall).End-to-end
E2E was measured at N=5 interleaved on a single-box KRaft broker
(4 vCPU, 1 GB heap, G1GC default, broker + producer + consumer on
the same host — CPU-contended), lz4 compression, 5 M × 1024 B
records. Full protocol and raw runs are in
codeflash-evidence/20260507-cand3-errors-forcode-e2e.md(committedin this PR) and the raw per-run txt files under the session directory.
Half-widths below are 95% CIs via t-distribution with df=4 (t=2.776)
at N=5, recomputed directly from the raw records/sec values.
Source for all E2E numbers above: raw per-run records/sec files cited in
codeflash-evidence/20260507-cand3-errors-forcode-e2e.md. Means andhalf-widths were recomputed from the raw data with t=2.776 (df=4, N=5)
and verified in this PR's integrity check.
E2E measurement inconclusive at N=5; 95% CIs overlap on both
producer and consumer.
This PR is not claiming an E2E throughput improvement. By Amdahl,
the measured
Errors.forCodeshare of CPU on this workload was~0.43% cumulative consumer and ~0.15% broker (reported in
codeflash-evidence/20260507-cand3-errors-forcode-delta.mdunder theAmdahl section); a 75% reduction there caps the theoretical E2E at
~0.32% consumer / ~0.11% broker. N=5 on this single-box setup has
half-widths of 6.5–25.9%, so the Amdahl-predicted signal is several
orders of magnitude below the noise floor and cannot be distinguished.
The consumer point estimate of −10.11% is directional-negative
inside a wide noise band. No mechanism for a real consumer
regression is plausible (the refactor is strictly a same-inputs
same-outputs faster lookup), and the trunk/feature raw sequences
show the variance is driven by single-box CPU-contention jitter,
not by the refactor. The evidence file
codeflash-evidence/20260507-cand3-errors-forcode-e2e.mddiscussesthis in more detail under "Consumer (−10.11%, CI overlap)".
Callers and impact scope
Errors.forCode(short)Verified non-test caller count: 279 Java call sites outside
/test/and outside/jmh-benchmarks/.The method sits on virtually every response-parsing path across
clients, broker, KRaft, and coordinators. Hot-path examples
verified by direct
rg -n:raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java:953, 973, 1170, 1182, 1346, 1358(KRaft vote/fetch response parsing; 14 total in this file)clients/src/main/java/org/apache/kafka/clients/NetworkClient.java:1029(ApiVersions response parsing)clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractFetch.java:208(per-partition fetch error translation)clients/src/main/java/org/apache/kafka/common/requests/FetchResponse.java:94, 134(fetch response top-level + partition error codes)clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:1373, 1523, 1554(offset-commit + group join error paths)clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerHeartbeatRequestManager.java:188core/src/main/scala/kafka/server/KafkaApis.scala:1575, 4162, 4228(broker API dispatch)transaction-coordinator/src/main/java/org/apache/kafka/coordinator/transaction/RPCProducerIdManager.java:172group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupCoordinatorService.java:846, 1601, 2026Reproduction
Each command is copy-paste runnable on a machine with JDK 25 and
Gradle installed. Wall-time estimates assume a 4 vCPU host (they
scale roughly linearly with cores for the benchmark, sub-linearly
for gradle).
E2E reproduction (optional; results will be noise-dominated at N=5
on a single-box setup, as the evidence above documents):
Repeat N=5 times per side, interleaved A-B-A-B-A-B-A-B-A-B.
Risks and limitations
iterates
Errors.values()twice (once to compute min/max, onceto populate the array) instead of once. This is ~270 array loads
at class-load, run exactly once per
Errorsclassloader — anirrelevant one-off cost.
Errors[]of sizeMAX_CODE − MIN_CODE + 1to the
Errorsclass's static state. At the current range[−1, 133]this is 135 reference slots ≈ < 300 bytes includingarray header. Retained for the life of the JVM. The prior
HashMapit replaces held the same entries with largerper-entry overhead (
HashMap.Node+ autoboxedShortperkey), so this is a small net reduction.
code outside the current
[−1, 133]range, the array grows toaccommodate it, with any gap slots left
nulland resolved viathe out-of-range fallback path (same behaviour as today). There
is no hard upper bound enforced at code time, so in the worst
case a code like
Short.MAX_VALUEwould allocate a 32 k-slotarray (~128 KB). This is an eventuality worth noting; if it
becomes a concern, the fallback could be reinstated as a
HashMap lookup for codes beyond a hard cap, preserving the
dense array for the common range.
static final,populated in the class initialiser, read without
synchronisation. This is the same pattern as the prior
HashMapfield (static final Map) and is safe for concurrentreads.
AWS host. E2E was measured on the same host with broker +
producer + consumer sharing CPU (see the End-to-end section);
N=5 is insufficient to detect a sub-1% Amdahl-predicted E2E
signal, and the half-widths in the E2E table reflect this. JMH
results on non-HotSpot JVMs have not been measured.
Test plan
./gradlew :clients:test --tests "*ErrorsTest*"— 10 pass, 0 fail (10 total: 7 prior + 3 new in this PR)./gradlew :clients:checkstyleMain :clients:checkstyleTest— pass./gradlew :clients:spotbugsMain— passcodeflash-evidence/20260507-cand3-errors-forcode-delta.mdwith backing JSON files retained in the session directorycodeflash-evidence/20260507-cand3-errors-forcode-e2e.md(N=5 single-box; result inconclusive as documented)Target repo
This PR targets the fork
codeflash-ai/kafkaon thetrunkbranch,NOT upstream
apache/kafka. Opening an upstream PR would be anexplicit separate decision by the user.