
@yuzawa-san
Contributor

I tried out the 4.X branch in a real production environment under real load. This allowed me to profile and eliminate more hotspots. Here are the main changes:

  • Optimize hot loops in AbstractLazilyEncodableSection: we can skip allocating and calling iterators and just use simple indexed for loops.
  • Remove ManagedIntegerSet and ManagedFixedList. Both of these wrapped the values returned by the field getValue() methods so that we could determine whether the returned value was mutated; in that case the parent field would be marked as dirty. However, we found we called getValue() many times on the same decoded field, and it returned a new wrapper on each call. I was never fully satisfied with my original implementation here, so I reworked this into a different architecture. I introduced a Dirtyable interface, which means the values themselves can track mutations, and I made the fields that return a collection return a Dirtyable implementation: IntegerSet, FixedIntegerList, FixedList.
  • This allowed me to clean up the class hierarchy for IntegerSet. It is also now a concrete class, which means it uses virtual dispatch instead of the more expensive interface dispatch.
  • I also converted the method signatures to return FixedIntegerList instead of List<Integer>. FixedIntegerList is much more optimized: it stores values in a byte array and adds methods for unboxed access.
  • Added tests.
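The Dirtyable rework can be pictured with a small sketch. Dirtyable and FixedIntegerList are the PR's names, but the members shown here are illustrative assumptions, not the library's actual code:

```java
// Illustrative sketch: the collection itself records mutations, so fields
// no longer need to wrap every getValue() result in a new tracker object.
// (Member names are assumptions; only the class/interface names are from the PR.)
interface Dirtyable {
    boolean isDirty();
    void markClean();
}

// A fixed-size list of small integers backed by a byte array,
// with unboxed accessors.
class FixedIntegerList implements Dirtyable {
    private final byte[] values;
    private boolean dirty;

    FixedIntegerList(int size) {
        this.values = new byte[size];
    }

    int getInt(int index) {
        return values[index]; // unboxed read, no Integer allocation
    }

    void setInt(int index, int value) {
        values[index] = (byte) value;
        dirty = true; // the collection tracks its own mutation
    }

    @Override
    public boolean isDirty() {
        return dirty;
    }

    @Override
    public void markClean() {
        dirty = false;
    }
}
```

Because the collection records its own mutations, the parent section can simply check isDirty() at encode time instead of wrapping each returned value.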

@Kevin-P-Kerr Kevin-P-Kerr left a comment

My only concern is that FixedIntegerList can only represent (I believe) 8-bit values. Does that work for this use case?

@yuzawa-san
Contributor Author

@Kevin-P-Kerr Yes, the GPP state specs only use 0, 1, and 2 as values (e.g. SensitiveDataProcessing, KnownChildSensitiveDataConsents). Theoretically it could have been packed even smaller, but byte is the smallest primitive with built-in casts to/from int.

public int setInt(int index, int value) {
    // NOTE: int 128 is prevented since it would get turned into byte -128
    if (value < 0 || value >= 128) {
        throw new IllegalArgumentException("FixedIntegerList only supports positive integers less than 128.");
    }
    // (rest of method reconstructed for completeness: store as byte, return prior value)
    int previous = values[index];
    values[index] = (byte) value;
    return previous;
}
This should be fine, as the most typical values are 0, 1, and 2, I believe.
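For context on the 0–127 bound: Java's byte is signed, so the narrowing cast keeps only the low 8 bits and sign-extends on the way back to int. A tiny hypothetical helper (ByteCastDemo is not part of the library) demonstrates the wrap-around the guard prevents:

```java
// Why setInt rejects values >= 128: Java's byte is signed, so the
// narrowing cast wraps 128 to -128 and the round-trip is lossy.
// ByteCastDemo is a hypothetical helper for illustration only.
class ByteCastDemo {
    static int roundTrip(int value) {
        byte stored = (byte) value; // narrowing: keeps the low 8 bits
        return stored;              // widening back to int sign-extends
    }
}
```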

@yuzawa-san
Contributor Author

yuzawa-san commented Jan 28, 2026

I was able to do further optimizations:

  • I was able to fully remove the substring operations, at great savings. This means we no longer need to slice BitSets; instead, I introduced a BitStringReader which progresses forward through the BitString. It provides the primitive readInt, readLong, and readFibonacci methods. I was also able to remove the need to search for the end of a Fibonacci string: we can accumulate the value while we search for the end.
  • I implemented BitSet in a simpler manner than the JDK version; it uses bytes instead of longs. I settled on a block-based base64 decoding algorithm inspired by the JDK's version. This method exploits the fact that 4 base64 characters fit exactly into 3 bytes, which allows for good loop unrolling and fewer bit-shift operations during decode, which had been taking up a good bit of CPU relative to other things.
  • I converted the field keys to enums. This allows for a large number of optimizations, since we can use offsets and lists to store things.
  • I made the class hierarchy more DRY, which reduces the amount of boilerplate required to add new sections.
  • Refactored the lazy encode/decode and dirty logic into a parent abstract class for reuse. The GppModel, the sections, and the segments all now have consistent and reusable logic.
  • I cleaned up GppModel to store the sorted segment IDs in the header, which allows the dirty logic to be unified. I changed the section map to be keyed on Integer, since that is cheaper and its instances (via valueOf) are cached. I kept support for lookups via section names. I modified the per-section setters and getters to use FieldKey instead of raw strings, since we gain a lot of performance by not having to keep a string-to-field map.
  • Replaced interfaces with abstract classes, since they have faster method dispatch.
  • Upgraded SlicedCharSequence.split() to use String.indexOf(), which is significantly faster than using charAt to find the split locations.
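The "accumulate while searching for the end" idea behind readFibonacci can be sketched as follows. The real BitStringReader operates on packed bits; a String of '0'/'1' characters is used here purely for illustration, and the class and field names are assumptions:

```java
// Sketch of single-pass Fibonacci decoding: instead of first scanning for
// the terminating "11" and then re-reading the bits, we add the current
// Fibonacci weight as we go and return as soon as the terminator appears.
class FibonacciBitReader {
    private final String bits; // '0'/'1' chars stand in for packed bits
    private int position;

    FibonacciBitReader(String bits) {
        this.bits = bits;
    }

    int readFibonacci() {
        int result = 0;
        int prev = 1; // F(1)
        int curr = 1; // F(2): the weight of the first code bit
        boolean lastWasOne = false;
        while (position < bits.length()) {
            boolean one = bits.charAt(position++) == '1';
            if (one && lastWasOne) {
                return result; // two consecutive 1s terminate the code
            }
            if (one) {
                result += curr; // accumulate while scanning for the terminator
            }
            lastWasOne = one;
            int next = prev + curr; // advance to the next Fibonacci weight
            prev = curr;
            curr = next;
        }
        throw new IllegalStateException("unterminated Fibonacci code");
    }
}
```

Since the reader only moves forward through the bit string, no substring or BitSet slice is ever allocated.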
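The block-based base64 decode relies on 4 characters packing exactly into one 24-bit word (3 bytes). A minimal sketch, assuming the URL-safe alphabet used by consent strings and an input length that is a multiple of 4; the real implementation must also handle the tail block and reject invalid characters:

```java
// Sketch of block-based base64url decoding: four 6-bit values are packed
// into one 24-bit word, then split into three bytes with a fixed shift
// pattern, avoiding per-bit bookkeeping in the inner loop.
class Base64BlockDecoder {
    private static final int[] LOOKUP = new int[128];
    static {
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
        for (int i = 0; i < alphabet.length(); i++) {
            LOOKUP[alphabet.charAt(i)] = i;
        }
    }

    // Decodes a base64url string whose length is a multiple of 4 (no padding).
    static byte[] decode(String in) {
        byte[] out = new byte[in.length() / 4 * 3];
        int o = 0;
        for (int i = 0; i < in.length(); i += 4) {
            // pack four sextets into one 24-bit word
            int word = (LOOKUP[in.charAt(i)] << 18)
                     | (LOOKUP[in.charAt(i + 1)] << 12)
                     | (LOOKUP[in.charAt(i + 2)] << 6)
                     |  LOOKUP[in.charAt(i + 3)];
            out[o++] = (byte) (word >>> 16);
            out[o++] = (byte) (word >>> 8);
            out[o++] = (byte) word;
        }
        return out;
    }
}
```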

I have a benchmark:

@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@State(org.openjdk.jmh.annotations.Scope.Thread)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(value = 3)
public class Microbenchmark {
  	private static final String in = "DBABMA~CQCDewAQCDewAPoABABGA9EMAP-AAB4AAIAAKVtV_G__bXlv-X736ftkeY1f9_h77sQxBhfJs-4FzLvW_JwX32EzNE36tqYKmRIAu3bBIQNtHJjUTVChaogVrzDsak2coTtKJ-BkiHMRe2dYCF5vmwtj-QKZ5vr_93d52R_t_dr-3dzyz5Vnv3a9_-b1WJidK5-tH_v_bROb-_I-9_5-_4v8_N_rE2_eT1t_tevt739-8tv_9___9____7______3_-ClbVfxv_215b_l-9-n7ZHmNX_f4e-7EMQYXybPuBcy71vycF99hMzRN-ramCpkSALt2wSEDbRyY1E1QoWqIFa8w7GpNnKE7SifgZIhzEXtnWAheb5sLY_kCmeb6__d3edkf7f3a_t3c8s-VZ792vf_m9ViYnSufrR_7_20Tm_vyPvf-fv-L_Pzf6xNv3k9bf7Xr7e9_fvLb__f___f___-______9__gAAAAA.QKVtV_G__bXlv-X736ftkeY1f9_h77sQxBhfJs-4FzLvW_JwX32EzNE36tqYKmRIAu3bBIQNtHJjUTVChaogVrzDsak2coTtKJ-BkiHMRe2dYCF5vmwtj-QKZ5vr_93d52R_t_dr-3dzyz5Vnv3a9_-b1WJidK5-tH_v_bROb-_I-9_5-_4v8_N_rE2_eT1t_tevt739-8tv_9___9____7______3_-.IKVtV_G__bXlv-X736ftkeY1f9_h77sQxBhfJs-4FzLvW_JwX32EzNE36tqYKmRIAu3bBIQNtHJjUTVChaogVrzDsak2coTtKJ-BkiHMRe2dYCF5vmwtj-QKZ5vr_93d52R_t_dr-3dzyz5Vnv3a9_-b1WJidK5-tH_v_bROb-_I-9_5-_4v8_N_rE2_eT1t_tevt739-8tv_9___9____7______3_-";
    

    @Benchmark
    @Threads(Threads.MAX)
    public void run(Blackhole bh) throws Exception {
        TcfEuV2 nu = new GppModel(in).getTcfEuV2Section();
        bh.consume(nu.getPublisherConsents());
        bh.consume(nu.getPurposeConsents());
        bh.consume(nu.getVendorConsents());
        bh.consume(nu.getPurposeLegitimateInterests());
        bh.consume(nu.getVendorLegitimateInterests());
        bh.consume(nu.getSpecialFeatureOptins());
        bh.consume(nu.getCmpId());
        bh.consume(nu.getPublisherRestrictions());
    }
}

Here are the results:

6ac876f6 (4.X):
Benchmark                              Mode  Cnt      Score      Error   Units
Microbenchmark.run                     avgt   15  22775.215 ± 1694.869   ns/op
Microbenchmark.run:gc.alloc.rate       avgt   15   3219.333 ±  226.438  MB/sec
Microbenchmark.run:gc.alloc.rate.norm  avgt   15   6376.014 ±   25.040    B/op
Microbenchmark.run:gc.count            avgt   15    292.000             counts
Microbenchmark.run:gc.time             avgt   15    107.000                 ms

5c1d473 (4.X-perf-optimizations):
Benchmark                              Mode  Cnt      Score     Error   Units
Microbenchmark.run                     avgt   15   1994.077 ±  94.978   ns/op
Microbenchmark.run:gc.alloc.rate       avgt   15  21475.516 ± 991.532  MB/sec
Microbenchmark.run:gc.alloc.rate.norm  avgt   15   3736.001 ±  12.520    B/op
Microbenchmark.run:gc.count            avgt   15    920.000            counts
Microbenchmark.run:gc.time             avgt   15    348.000                ms

That is roughly an 11x improvement in speed and almost a 2x reduction in allocations per operation (6376 B/op down to 3736 B/op; the higher MB/sec allocation rate simply reflects the much higher throughput).
