
@jsedding (Contributor) commented Oct 9, 2025

No description provided.

@jsedding jsedding requested review from nfsantos and smiroslav October 9, 2025 14:29
@jsedding (Contributor, Author) commented Oct 9, 2025

@smiroslav I have experimented with various approaches to the cache preloading mechanism.

First, I dismissed my initial approach in #2519, which preloaded segments into the in-memory SegmentCache: it busts the cache, and there is no good way to know whether a segment is already cached in a PersistentCache (populating the PersistentCache being the main objective).

I also dismissed the approach you took in #2513, which implements preloading in the CachingSegmentArchiveReader, because that reader is bound to a single archive. As a result, the preload mechanism cannot load referenced segments from other archives, which makes it far less useful.

The approach implemented in this PR allows adding a PersistentCache instance to a *FileStore via the FileStoreBuilder. When a PersistentCache is configured, prefetching can additionally be configured: the number of async threads and the "depth", i.e. how many levels of referenced segments to follow. With this approach, the SegmentPreloader has access to the PersistentCache, so it can check whether a segment is already present. It also has access to the TarFiles object, so it can read segments from any archive and retrieve the segment graph directly, which avoids having to parse the references out of each segment's data.
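To make the described strategy concrete (skip segments already in the cache, follow references up to a configurable depth, load the rest on a pool of async threads), here is a minimal, hypothetical sketch. `SegmentPreloaderSketch` and `SegmentReader` are illustrative names, a `ConcurrentMap` stands in for the PersistentCache, and a `Map<UUID, Set<UUID>>` stands in for the segment graph obtained from TarFiles; none of these are Oak's actual classes or signatures.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of depth-limited segment preloading; not Oak's real API. */
public class SegmentPreloaderSketch {

    /** Hypothetical stand-in for reading a segment from whichever archive contains it. */
    public interface SegmentReader {
        byte[] read(UUID segmentId);
    }

    private final ConcurrentMap<UUID, byte[]> persistentCache; // stand-in for a PersistentCache
    private final Map<UUID, Set<UUID>> segmentGraph;            // segment -> directly referenced segments
    private final SegmentReader reader;
    private final ExecutorService executor;                     // "number of async threads"
    private final int depth;                                    // levels of references to follow

    public SegmentPreloaderSketch(ConcurrentMap<UUID, byte[]> persistentCache,
                                  Map<UUID, Set<UUID>> segmentGraph,
                                  SegmentReader reader, int threads, int depth) {
        this.persistentCache = persistentCache;
        this.segmentGraph = segmentGraph;
        this.reader = reader;
        this.executor = Executors.newFixedThreadPool(threads);
        this.depth = depth;
    }

    /** Breadth-first walk: returns segments at most `depth` references away from start. */
    public Set<UUID> collect(UUID start) {
        Set<UUID> seen = new LinkedHashSet<>();
        seen.add(start);
        List<UUID> frontier = List.of(start);
        for (int level = 0; level < depth && !frontier.isEmpty(); level++) {
            List<UUID> next = new ArrayList<>();
            for (UUID id : frontier) {
                for (UUID ref : segmentGraph.getOrDefault(id, Set.of())) {
                    if (seen.add(ref)) { // visit each segment once, even if the graph has cycles
                        next.add(ref);
                    }
                }
            }
            frontier = next;
        }
        seen.remove(start); // the triggering segment itself is loaded by the caller
        return seen;
    }

    /** Submits async loads for referenced segments that are not cached yet. */
    public List<Future<?>> preload(UUID start) {
        List<Future<?>> futures = new ArrayList<>();
        for (UUID id : collect(start)) {
            if (persistentCache.containsKey(id)) {
                continue; // already present in the cache: nothing to do
            }
            // computeIfAbsent ensures concurrent tasks read the same segment only once
            futures.add(executor.submit(() -> persistentCache.computeIfAbsent(id, reader::read)));
        }
        return futures;
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```

Bounding the breadth-first walk by `depth` caps the work triggered by a single segment read, and the cache check up front is what the in-memory SegmentCache approach could not do cheaply.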

- fix unit-test
- address some sonar warnings
- use Awaitility for unit test
- test coverage, consistent naming
@jsedding jsedding force-pushed the jsedding/OAK-11934-segment-preloading branch from 77e6211 to 22ff2a6 October 13, 2025 07:51
@sonarqubecloud

@jsedding jsedding merged commit b34d1e3 into trunk Oct 13, 2025
4 of 6 checks passed
@jsedding jsedding deleted the jsedding/OAK-11934-segment-preloading branch October 13, 2025 20:28
reschke added a commit that referenced this pull request Oct 14, 2025
* OAK-11972: eliminate uses of org.apache.jackrabbit.core.data.RandomInputStream and refactor test code package locations in oak-commons

* OAK-11972: eliminate uses of org.apache.jackrabbit.core.data.RandomInputStream and refactor test code package locations in oak-commons

* OAK-11972: eliminate uses of org.apache.jackrabbit.core.data.RandomInputStream and refactor test code package locations in oak-commons - package info

* OAK-11972: eliminate uses of org.apache.jackrabbit.core.data.RandomInputStream and refactor test code package locations in oak-commons - package info - Sonar nits

* OAK-11931 : fix default for prevNoPropCachePercentage (#2559)

* OAK-11892 - Expose hidden mount for elasticsearch indexes in IndexStats (#2490)

Co-authored-by: chibulcu <chibulcu@adobe.com>

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure

* OAK-11964 - refactor parallel compaction inheritance (#2548)

* OAK-11972: package-info chmod

* OAK-11970 : updated MongoDocker Rule to use Mongo 8 (#2565)

* OAK-11971 : updated MongoProcess to use Mongo 8 (#2566)

* OAK-11972: remove leftover in oak-segment-tar pom (#2567)

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- some test fixes and cleanup

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- some test fixes and cleanup

* OAK-11974: remove usage of jackrabbit-data - NamedThreadFactory (#2568)

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- code review comment fixes

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- code review comment fixes

* OAK-11895 - CheckpointCompactor writes to "after" instead of "onto" NodeState (#2549)

* OAK-11977 Tree store: BufferOverflowException (#2571)

* OAK-11899 - use default value in case config value can not be parsed (#2509)

* OAK-11899 - use default value in case config value can not be parsed

* OAK-11899 - if config parameter can not be converted, use default value

* OAK-11899 - increase org.apache.jackrabbit.oak.spi.security version

* OAK-11969 do not check the existence of the tree twice (#2560)

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- code review fixes

* OAK-11934 - segment preloading for PersistentCache (#2569)

* OAK-11914 : removed usage of Guava's HashBasedTable (#2573)

* OAK-11914 : removed usage of Guava's HashBasedTable

* OAK-11914 : removed unnecessary usage of computeIfAbsent method

* OAK-11911 : exposed DirectExecutor from ExecutorUtils (#2575)

* OAK-11910 : exposed DirectExecutor from ExecutorUtils

* OAK-11910 : marked DirectExecutor as public again

* OAK-11911 : increased minor version

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- code review fixes

---------

Co-authored-by: Julian Reschke <reschke@apache.org>
Co-authored-by: Julian Reschke <julian.reschke@gmx.de>
Co-authored-by: stefan-egli <stefanegli@apache.org>
Co-authored-by: chibulcuteanu <paul.chibulcuteanu@gmail.com>
Co-authored-by: chibulcu <chibulcu@adobe.com>
Co-authored-by: Julian Sedding <jsedding@apache.org>
Co-authored-by: Rishabh Kumar <rishabhdaim1991@gmail.com>
Co-authored-by: Thomas Mueller <thomasm@apache.org>
Co-authored-by: waldoro <waldek.r@gmail.com>
Co-authored-by: Jörg Hoh <joerghoh@users.noreply.github.com>
reschke added a commit that referenced this pull request Nov 7, 2025
* OAK-11972: eliminate uses of org.apache.jackrabbit.core.data.RandomInputStream and refactor test code package locations in oak-commons

* OAK-11972: eliminate uses of org.apache.jackrabbit.core.data.RandomInputStream and refactor test code package locations in oak-commons

* OAK-11972: eliminate uses of org.apache.jackrabbit.core.data.RandomInputStream and refactor test code package locations in oak-commons - package info

* OAK-11972: eliminate uses of org.apache.jackrabbit.core.data.RandomInputStream and refactor test code package locations in oak-commons - package info - Sonar nits

* OAK-11931 : fix default for prevNoPropCachePercentage (#2559)

* OAK-11892 - Expose hidden mount for elasticsearch indexes in IndexStats (#2490)

Co-authored-by: chibulcu <chibulcu@adobe.com>

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure

* OAK-11964 - refactor parallel compaction inheritance (#2548)

* OAK-11972: package-info chmod

* OAK-11970 : updated MongoDocker Rule to use Mongo 8 (#2565)

* OAK-11971 : updated MongoProcess to use Mongo 8 (#2566)

* OAK-11972: remove leftover in oak-segment-tar pom (#2567)

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- some test fixes and cleanup

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- some test fixes and cleanup

* OAK-11974: remove usage of jackrabbit-data - NamedThreadFactory (#2568)

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- code review comment fixes

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- code review comment fixes

* OAK-11895 - CheckpointCompactor writes to "after" instead of "onto" NodeState (#2549)

* OAK-11977 Tree store: BufferOverflowException (#2571)

* OAK-11899 - use default value in case config value can not be parsed (#2509)

* OAK-11899 - use default value in case config value can not be parsed

* OAK-11899 - if config parameter can not be converted, use default value

* OAK-11899 - increase org.apache.jackrabbit.oak.spi.security version

* OAK-11969 do not check the existence of the tree twice (#2560)

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- code review fixes

* OAK-11934 - segment preloading for PersistentCache (#2569)

* OAK-11914 : removed usage of Guava's HashBasedTable (#2573)

* OAK-11914 : removed usage of Guava's HashBasedTable

* OAK-11914 : removed unnecessary usage of computeIfAbsent method

* OAK-11911 : exposed DirectExecutor from ExecutorUtils (#2575)

* OAK-11910 : exposed DirectExecutor from ExecutorUtils

* OAK-11910 : marked DirectExecutor as public again

* OAK-11911 : increased minor version

* OAK-11981: blob-plugins - remove use of TransientFileFactory (#2576)

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- code review fixes

* OAK-11983: benchmarks: remove jackrabbit-core dependency (#2577)

* OAK-11980 - improve parallelization of I/O during segment-azure initialization (#2574)

* OAK-11985 : added MongoServerUnavailableException into transient errors (#2582)

* OAK-11982 move the calculation of the WARN limit out of the constructor (#2585)

* OAK-11913 : created Forwarding executor service (#2580)

* OAK-11935 : updating aws sdk from 1.x to 2.x (#2558)

* OAK-11935 : updating aws sdk from 1.x to 2.x

* OAK-11935 : added more unit cases

* OAK-11935 : added unit cases for S3RequestDecorator

* OAK-11935 : fixed issue when running ITs with different encryption mode

* OAK-11935 : removed unused imports

* OAK-11935 : incorporated review comments

* OAK-11935 : added review comments

* OAK-11935 : added properties for cross Region access

* OAK-11935 : removed additional list for removing keys in case of GCP mode

* OAK-11935 : moved out CRUD operations from Utils to S3CrudHelper

* OAK-11935 : added unit cases for waitForBucket method

* OAK-11935 : make total attempts and delay configurable to use lower values in unit cases

* OAK-11935 : renamed S3CrudHelper to S3BackendHelper

* OAK-11987 - org.apache.jackrabbit.oak.segment.azure.tool.SegmentStoreMigrator.migrateBinaryRef is missing a null check (#2586)

* OAK-11984 Support UserId Change for External Users (#2581)

* OAK-11984 Support UserId Change for External Users

* Removed unused change

* Update oak-auth-external/src/main/java/org/apache/jackrabbit/oak/spi/security/authentication/external/impl/ExternalLoginModule.java

Co-authored-by: Alejandro Moratinos <Amoratinos@users.noreply.github.com>

* Update oak-auth-external/src/main/java/org/apache/jackrabbit/oak/spi/security/authentication/external/impl/ExternalLoginModule.java

Co-authored-by: Alejandro Moratinos <Amoratinos@users.noreply.github.com>

* moving constants

* Added FF

* Added tests for FF

* Added debug log

---------

Co-authored-by: Alejandro Moratinos <Amoratinos@users.noreply.github.com>
Co-authored-by: angela <anchela@adobe.com>

* OAK-11936: Allow updating the inference config via JMX (#2525)

* OAK-11936: Allow updating the inference config via JMX

* OAK-11936: Apache License added to new classes

---------

Co-authored-by: marvinw <marvinw@adobe.com>

* OAK-11949: Sort union queries without "order-by" by score (#2540)

* OAK-11949: Sort Union Queries without order-by by score

* OAK-11936: restructure if-clause, move import and one additional test

* OAK-11949: use double for Null protection

---------

Co-authored-by: marvinw <marvinw@adobe.com>

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- code review fixes

* OAK-11267: Upgrade Azure SDK V8 to V12 for oak-blob-azure
- fix build: osgi package version update for oak api jmx

* OAK-11912 : created DirectExecutorService in oak-commons (#2579)

* OAK-11936: Allow updating the inference config via JMX - fix package version

* OAK-11936: Allow updating the inference config via JMX - line ends in test case

* OAK-11994 remove unused dependency to joda-time (#2593)

Co-authored-by: Joerg Hoh <jhoh@adobe.com>

* OAK-11997: Log slow Mongo queries in DocumentNodeStore (#2596)

* Revert "OAK-11936: Allow updating the inference config via JMX - line ends in test case"

This reverts commit dea0956.

* Revert "OAK-11936: Allow updating the inference config via JMX - fix package version"

This reverts commit 7081c97.

* Revert "OAK-11936: Allow updating the inference config via JMX (#2525)"

This reverts commit 174dce1.

* Reapply "OAK-11936: Allow updating the inference config via JMX (#2525)"

This reverts commit acac0de.

* Reapply "OAK-11936: Allow updating the inference config via JMX - fix package version"

This reverts commit 248d4ec.

* Reapply "OAK-11936: Allow updating the inference config via JMX - line ends in test case"

This reverts commit 09a603b.

---------

Co-authored-by: Julian Reschke <reschke@apache.org>
Co-authored-by: Julian Reschke <julian.reschke@gmx.de>
Co-authored-by: stefan-egli <stefanegli@apache.org>
Co-authored-by: chibulcuteanu <paul.chibulcuteanu@gmail.com>
Co-authored-by: chibulcu <chibulcu@adobe.com>
Co-authored-by: Julian Sedding <jsedding@apache.org>
Co-authored-by: Rishabh Kumar <rishabhdaim1991@gmail.com>
Co-authored-by: Thomas Mueller <thomasm@apache.org>
Co-authored-by: waldoro <waldek.r@gmail.com>
Co-authored-by: Jörg Hoh <joerghoh@users.noreply.github.com>
Co-authored-by: Johnson Ho <johnho@adobe.com>
Co-authored-by: Nicola Scendoni <nscendoni@adobe.com>
Co-authored-by: Alejandro Moratinos <Amoratinos@users.noreply.github.com>
Co-authored-by: angela <anchela@adobe.com>
Co-authored-by: Marvin <95419378+ChlineSaurus@users.noreply.github.com>
Co-authored-by: marvinw <marvinw@adobe.com>
Co-authored-by: Joerg Hoh <jhoh@adobe.com>
Co-authored-by: José Andrés Cordero Benítez <Joscorbe@users.noreply.github.com>