Optimize PendingTrace span registration and time tracking#11078
Optimize PendingTrace span registration and time tracking#11078gh-worker-dd-mergequeue-cf854d[bot] merged 2 commits intomasterfrom
Conversation
tag: no release note tag: ai generated Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 58 metrics, 13 unstable metrics. Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~2e6fda2251, baseline=1.62.0-SNAPSHOT~067d0d2c4b
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.056 s) : 0, 1055960
Total [baseline] (8.843 s) : 0, 8842824
Agent [candidate] (1.057 s) : 0, 1057315
Total [candidate] (8.845 s) : 0, 8845293
section iast
Agent [baseline] (1.22 s) : 0, 1219681
Total [baseline] (9.53 s) : 0, 9529530
Agent [candidate] (1.223 s) : 0, 1222930
Total [candidate] (9.56 s) : 0, 9560114
gantt
title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~2e6fda2251, baseline=1.62.0-SNAPSHOT~067d0d2c4b
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.244 ms) : 0, 1244
crashtracking [candidate] (1.225 ms) : 0, 1225
BytebuddyAgent [baseline] (631.876 ms) : 0, 631876
BytebuddyAgent [candidate] (633.35 ms) : 0, 633350
AgentMeter [baseline] (29.5 ms) : 0, 29500
AgentMeter [candidate] (29.396 ms) : 0, 29396
GlobalTracer [baseline] (248.753 ms) : 0, 248753
GlobalTracer [candidate] (248.594 ms) : 0, 248594
AppSec [baseline] (32.158 ms) : 0, 32158
AppSec [candidate] (31.994 ms) : 0, 31994
Debugger [baseline] (59.197 ms) : 0, 59197
Debugger [candidate] (58.987 ms) : 0, 58987
Remote Config [baseline] (603.013 µs) : 0, 603
Remote Config [candidate] (586.69 µs) : 0, 587
Telemetry [baseline] (8.068 ms) : 0, 8068
Telemetry [candidate] (8.071 ms) : 0, 8071
Flare Poller [baseline] (8.327 ms) : 0, 8327
Flare Poller [candidate] (8.955 ms) : 0, 8955
section iast
crashtracking [baseline] (1.229 ms) : 0, 1229
crashtracking [candidate] (1.228 ms) : 0, 1228
BytebuddyAgent [baseline] (798.738 ms) : 0, 798738
BytebuddyAgent [candidate] (800.079 ms) : 0, 800079
AgentMeter [baseline] (11.35 ms) : 0, 11350
AgentMeter [candidate] (11.38 ms) : 0, 11380
GlobalTracer [baseline] (238.276 ms) : 0, 238276
GlobalTracer [candidate] (239.459 ms) : 0, 239459
IAST [baseline] (25.726 ms) : 0, 25726
IAST [candidate] (25.74 ms) : 0, 25740
AppSec [baseline] (31.673 ms) : 0, 31673
AppSec [candidate] (31.877 ms) : 0, 31877
Debugger [baseline] (60.177 ms) : 0, 60177
Debugger [candidate] (59.489 ms) : 0, 59489
Remote Config [baseline] (545.751 µs) : 0, 546
Remote Config [candidate] (532.508 µs) : 0, 533
Telemetry [baseline] (11.903 ms) : 0, 11903
Telemetry [candidate] (13.16 ms) : 0, 13160
Flare Poller [baseline] (3.437 ms) : 0, 3437
Flare Poller [candidate] (3.613 ms) : 0, 3613
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~2e6fda2251, baseline=1.62.0-SNAPSHOT~067d0d2c4b
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.07 s) : 0, 1070157
Total [baseline] (11.192 s) : 0, 11191836
Agent [candidate] (1.056 s) : 0, 1055775
Total [candidate] (11.124 s) : 0, 11124195
section appsec
Agent [baseline] (1.255 s) : 0, 1255083
Total [baseline] (11.134 s) : 0, 11133619
Agent [candidate] (1.251 s) : 0, 1251266
Total [candidate] (11.097 s) : 0, 11096773
section iast
Agent [baseline] (1.229 s) : 0, 1228663
Total [baseline] (11.398 s) : 0, 11397508
Agent [candidate] (1.234 s) : 0, 1233916
Total [candidate] (11.412 s) : 0, 11411800
section profiling
Agent [baseline] (1.184 s) : 0, 1183985
Total [baseline] (11.154 s) : 0, 11153842
Agent [candidate] (1.183 s) : 0, 1182693
Total [candidate] (11.147 s) : 0, 11146517
gantt
title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~2e6fda2251, baseline=1.62.0-SNAPSHOT~067d0d2c4b
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.231 ms) : 0, 1231
crashtracking [candidate] (1.223 ms) : 0, 1223
BytebuddyAgent [baseline] (637.992 ms) : 0, 637992
BytebuddyAgent [candidate] (632.125 ms) : 0, 632125
AgentMeter [baseline] (29.687 ms) : 0, 29687
AgentMeter [candidate] (29.521 ms) : 0, 29521
GlobalTracer [baseline] (252.356 ms) : 0, 252356
GlobalTracer [candidate] (248.719 ms) : 0, 248719
AppSec [baseline] (32.642 ms) : 0, 32642
AppSec [candidate] (32.005 ms) : 0, 32005
Debugger [baseline] (60.938 ms) : 0, 60938
Debugger [candidate] (59.912 ms) : 0, 59912
Remote Config [baseline] (613.165 µs) : 0, 613
Remote Config [candidate] (599.686 µs) : 0, 600
Telemetry [baseline] (8.263 ms) : 0, 8263
Telemetry [candidate] (8.088 ms) : 0, 8088
Flare Poller [baseline] (9.975 ms) : 0, 9975
Flare Poller [candidate] (7.369 ms) : 0, 7369
section appsec
crashtracking [baseline] (1.234 ms) : 0, 1234
crashtracking [candidate] (1.223 ms) : 0, 1223
BytebuddyAgent [baseline] (665.82 ms) : 0, 665820
BytebuddyAgent [candidate] (664.144 ms) : 0, 664144
AgentMeter [baseline] (12.164 ms) : 0, 12164
AgentMeter [candidate] (12.106 ms) : 0, 12106
GlobalTracer [baseline] (250.104 ms) : 0, 250104
GlobalTracer [candidate] (249.193 ms) : 0, 249193
IAST [baseline] (24.674 ms) : 0, 24674
IAST [candidate] (24.564 ms) : 0, 24564
AppSec [baseline] (185.175 ms) : 0, 185175
AppSec [candidate] (184.65 ms) : 0, 184650
Debugger [baseline] (66.315 ms) : 0, 66315
Debugger [candidate] (66.182 ms) : 0, 66182
Remote Config [baseline] (624.708 µs) : 0, 625
Remote Config [candidate] (602.227 µs) : 0, 602
Telemetry [baseline] (8.783 ms) : 0, 8783
Telemetry [candidate] (8.496 ms) : 0, 8496
Flare Poller [baseline] (3.593 ms) : 0, 3593
Flare Poller [candidate] (3.553 ms) : 0, 3553
section iast
crashtracking [baseline] (1.227 ms) : 0, 1227
crashtracking [candidate] (1.228 ms) : 0, 1228
BytebuddyAgent [baseline] (804.963 ms) : 0, 804963
BytebuddyAgent [candidate] (806.867 ms) : 0, 806867
AgentMeter [baseline] (11.382 ms) : 0, 11382
AgentMeter [candidate] (11.556 ms) : 0, 11556
GlobalTracer [baseline] (239.877 ms) : 0, 239877
GlobalTracer [candidate] (241.705 ms) : 0, 241705
IAST [baseline] (25.752 ms) : 0, 25752
IAST [candidate] (26.182 ms) : 0, 26182
AppSec [baseline] (31.737 ms) : 0, 31737
AppSec [candidate] (32.221 ms) : 0, 32221
Debugger [baseline] (58.073 ms) : 0, 58073
Debugger [candidate] (61.99 ms) : 0, 61990
Remote Config [baseline] (526.351 µs) : 0, 526
Remote Config [candidate] (1.137 ms) : 0, 1137
Telemetry [baseline] (14.941 ms) : 0, 14941
Telemetry [candidate] (11.212 ms) : 0, 11212
Flare Poller [baseline] (3.611 ms) : 0, 3611
Flare Poller [candidate] (3.502 ms) : 0, 3502
section profiling
crashtracking [baseline] (1.189 ms) : 0, 1189
crashtracking [candidate] (1.18 ms) : 0, 1180
BytebuddyAgent [baseline] (691.811 ms) : 0, 691811
BytebuddyAgent [candidate] (690.616 ms) : 0, 690616
AgentMeter [baseline] (9.129 ms) : 0, 9129
AgentMeter [candidate] (9.098 ms) : 0, 9098
GlobalTracer [baseline] (206.93 ms) : 0, 206930
GlobalTracer [candidate] (207.037 ms) : 0, 207037
AppSec [baseline] (32.343 ms) : 0, 32343
AppSec [candidate] (32.429 ms) : 0, 32429
Debugger [baseline] (64.703 ms) : 0, 64703
Debugger [candidate] (65.451 ms) : 0, 65451
Remote Config [baseline] (572.916 µs) : 0, 573
Remote Config [candidate] (561.8 µs) : 0, 562
Telemetry [baseline] (8.552 ms) : 0, 8552
Telemetry [candidate] (7.775 ms) : 0, 7775
Flare Poller [baseline] (3.53 ms) : 0, 3530
Flare Poller [candidate] (3.533 ms) : 0, 3533
ProfilingAgent [baseline] (93.851 ms) : 0, 93851
ProfilingAgent [candidate] (93.724 ms) : 0, 93724
Profiling [baseline] (94.421 ms) : 0, 94421
Profiling [candidate] (94.287 ms) : 0, 94287
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 3 performance regressions! Performance is the same for 17 metrics, 16 unstable metrics.
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~2e6fda2251, baseline=1.62.0-SNAPSHOT~067d0d2c4b
dateFormat X
axisFormat %s
section baseline
no_agent (1.236 ms) : 1224, 1248
. : milestone, 1236,
iast (3.241 ms) : 3195, 3286
. : milestone, 3241,
iast_FULL (5.928 ms) : 5868, 5988
. : milestone, 5928,
iast_GLOBAL (3.685 ms) : 3629, 3741
. : milestone, 3685,
profiling (2.173 ms) : 2154, 2193
. : milestone, 2173,
tracing (1.868 ms) : 1852, 1883
. : milestone, 1868,
section candidate
no_agent (1.228 ms) : 1217, 1240
. : milestone, 1228,
iast (3.367 ms) : 3318, 3416
. : milestone, 3367,
iast_FULL (5.888 ms) : 5829, 5947
. : milestone, 5888,
iast_GLOBAL (3.595 ms) : 3536, 3654
. : milestone, 3595,
profiling (2.146 ms) : 2126, 2166
. : milestone, 2146,
tracing (1.923 ms) : 1906, 1939
. : milestone, 1923,
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~2e6fda2251, baseline=1.62.0-SNAPSHOT~067d0d2c4b
dateFormat X
axisFormat %s
section baseline
no_agent (18.424 ms) : 18235, 18614
. : milestone, 18424,
appsec (18.469 ms) : 18282, 18655
. : milestone, 18469,
code_origins (18.102 ms) : 17921, 18283
. : milestone, 18102,
iast (18.257 ms) : 18071, 18443
. : milestone, 18257,
profiling (18.375 ms) : 18193, 18557
. : milestone, 18375,
tracing (17.759 ms) : 17584, 17934
. : milestone, 17759,
section candidate
no_agent (19.046 ms) : 18852, 19240
. : milestone, 19046,
appsec (18.664 ms) : 18478, 18850
. : milestone, 18664,
code_origins (18.061 ms) : 17882, 18239
. : milestone, 18061,
iast (18.954 ms) : 18764, 19144
. : milestone, 18954,
profiling (20.026 ms) : 19829, 20223
. : milestone, 20026,
tracing (17.844 ms) : 17671, 18017
. : milestone, 17844,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~2e6fda2251, baseline=1.62.0-SNAPSHOT~067d0d2c4b
dateFormat X
axisFormat %s
section baseline
no_agent (15.067 s) : 15067000, 15067000
. : milestone, 15067000,
appsec (14.849 s) : 14849000, 14849000
. : milestone, 14849000,
iast (18.291 s) : 18291000, 18291000
. : milestone, 18291000,
iast_GLOBAL (18.327 s) : 18327000, 18327000
. : milestone, 18327000,
profiling (15.038 s) : 15038000, 15038000
. : milestone, 15038000,
tracing (14.949 s) : 14949000, 14949000
. : milestone, 14949000,
section candidate
no_agent (14.944 s) : 14944000, 14944000
. : milestone, 14944000,
appsec (14.928 s) : 14928000, 14928000
. : milestone, 14928000,
iast (18.697 s) : 18697000, 18697000
. : milestone, 18697000,
iast_GLOBAL (18.013 s) : 18013000, 18013000
. : milestone, 18013000,
profiling (15.318 s) : 15318000, 15318000
. : milestone, 15318000,
tracing (15.088 s) : 15088000, 15088000
. : milestone, 15088000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~2e6fda2251, baseline=1.62.0-SNAPSHOT~067d0d2c4b
dateFormat X
axisFormat %s
section baseline
no_agent (1.489 ms) : 1478, 1501
. : milestone, 1489,
appsec (2.537 ms) : 2482, 2592
. : milestone, 2537,
iast (2.279 ms) : 2209, 2348
. : milestone, 2279,
iast_GLOBAL (2.326 ms) : 2256, 2396
. : milestone, 2326,
profiling (2.093 ms) : 2038, 2148
. : milestone, 2093,
tracing (2.079 ms) : 2025, 2133
. : milestone, 2079,
section candidate
no_agent (1.486 ms) : 1474, 1498
. : milestone, 1486,
appsec (2.546 ms) : 2490, 2601
. : milestone, 2546,
iast (2.277 ms) : 2207, 2346
. : milestone, 2277,
iast_GLOBAL (2.329 ms) : 2259, 2399
. : milestone, 2329,
profiling (2.111 ms) : 2055, 2166
. : milestone, 2111,
tracing (2.077 ms) : 2023, 2131
. : milestone, 2077,
|
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| @Override | ||
| void touch() { | ||
| lastReferenced = timeSource.getNanoTicks(); | ||
| LAST_REFERENCED.lazySet(this, timeSource.getNanoTicks()); |
There was a problem hiding this comment.
Interesting. I haven't really benchmarked lazySet. My guess is this makes a bigger difference on ARM with its more relaxed memory model than x86, but we should measure.
dougqh
left a comment
There was a problem hiding this comment.
This looks good to me.
I'd like to figure out a way to hook in the Spring PetClinic throughput tests that I've been doing, so we can get a better idea of overall impact. Based on the profiling I've done, I suspect the overall gain here is quite small.
While this is in the top 20 CPU time consumers in my stress test, the total time was <1%, so I'm guessing we won't see too much overall difference from this.
All that said, it is a straight-forward well-contained change, so why not?
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in
|
dougqh
left a comment
There was a problem hiding this comment.
In a local throughput test, this appears to be neutral to slightly negative.
Unfortunately, macro tests are subject to run-to-run variance, so I cay conclusively yet.
I ran more complete throughput test at a variety of heap sizes. Overall, the change looks good. That's in line with what I expected from the benchmark results included in the PR and from the profiles from stress tests. |
What Does This Do
Two optimizations to
PendingTrace, which is on the critical path for every span start and finish:Replace volatile write of
lastReferencedwithlazySet: ThelastReferencedfield is written on everygetCurrentTimeNano()call (span start/finish) but only read by a backgroundPendingTraceBufferthread for approximate timeout detection. The full StoreLoad memory fence from a volatile write is unnecessary —lazySetprovides release-store semantics (StoreStore barrier only), which is sufficient since the reader tolerates staleness on the order of seconds.Guard
ROOT_SPANCAS with volatile read:registerSpan()callsROOT_SPAN.compareAndSet(this, null, span)on every span creation, but only the first call (root span) succeeds. For all subsequent spans, the CAS fails after acquiring exclusive cache line ownership. Addingif (rootSpan == null)before the CAS replaces the failed CAS (which requires cache line ownership + write barrier) with a cheap volatile read for non-root spans.Motivation
PendingTrace.getCurrentTimeNano()andregisterSpan()execute on every span start. In high-throughput applications creating millions of spans per second, eliminating a memory fence and a failed CAS per span adds up.Why this is faster
lockprefix ormfence(StoreLoad barrier).lazySetemits only a StoreStore barrier, which is a no-op on x86 (TSO memory model). On ARM, it avoids thedmb ishfull barrier, using onlydmb ishst(store barrier). This eliminates serialization of the store buffer on every span start/finish.compareAndSetrequires exclusive cache line ownership (MESI E/M state) even when it fails. A volatile read only needs shared ownership (S state). For non-root spans (the vast majority), this avoids a cache line transition on therootSpanfield.The buffer worker may see a slightly stale
lastReferenced, which could trigger marginally earlier trace splitting but cannot cause data loss.Benchmark results (8 threads, JDK 21, macOS aarch64, Fork=1, Warmup=2, Measurement=3)
getCurrentTimeNanogetCurrentTimeNano_contendedstartAndFinishSpansystemNanoTime(control)Note: The
lazySetoptimization primarily reduces memory fence cost, which is more impactful on ARM than x86 (where StoreStore is already free under TSO). ThegetCurrentTimeNano_contendedbenchmark shows modest improvement under cross-thread contention. ThestartAndFinishSpanbenchmark includes significant overhead from span creation/finish that dominates the CAS-guard savings.Human readability score: 9.5/10
Both changes use patterns already present in the same file (
AtomicLongFieldUpdaterfor other fields, volatile read guards elsewhere).Additional Notes
Also adds
TimeSourceBenchmark.javaJMH benchmark coveringPendingTrace.getCurrentTimeNano(), contended variant,startAndFinishSpan,System.nanoTime()baseline, andCoreTracer.getTimeWithNanoTicks().tag: no release note
tag: ai generated
Contributor Checklist
type:andcomp:labelssolvesinstead🤖 Generated with Claude Code