KAFKA-20179: Avoid headers deserialization during changelogging #21676
muralibasani wants to merge 2 commits into apache:trunk from
Conversation
This PR does not really address the problem; it just shifts the deserialization we are trying to avoid to a different place. To really solve it, we need to change the whole call stack so that we can literally pass the raw header bytes through without deserializing them.
@mjsax thank you for taking a look. I made the changes you suggested; they are indeed involved, touching ProcessorContextImpl#logChange and the vector-clock handling. The raw header bytes are now passed through the producer call stack, from the changelog stores all the way down to DefaultRecord.writeTo(), so we never deserialize and re-serialize headers just to write them to the changelog topic. When the vector clock is enabled, the new entries are spliced directly into the raw byte array instead of materializing a Headers object.
Changelog stores were eagerly deserializing header bytes on every put, only for the producer to immediately re-serialize them when writing to the changelog topic.
Added SerializedHeaders, a lazy Headers wrapper that holds the raw bytes and defers parsing until the producer actually needs them via toArray().
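A minimal sketch of the lazy-wrapper idea, as a standalone class rather than the actual SerializedHeaders from this PR: hold the serialized bytes, serve them verbatim on the fast path, and parse only when toArray() is first called. The wire format used here (int count, then per header an int key length, key bytes, int value length, value bytes) is an illustrative assumption, not Kafka's exact record header encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a lazy headers wrapper (names and format assumed).
final class SerializedHeadersSketch {
    record Header(String key, byte[] value) {}

    private final byte[] rawBytes;   // serialized headers, kept as-is
    private Header[] parsed;         // lazily materialized on first toArray()

    SerializedHeadersSketch(byte[] rawBytes) { this.rawBytes = rawBytes; }

    // Fast path: the producer can copy these bytes without any deserialization.
    byte[] rawBytes() { return rawBytes; }

    // Slow path: parse the assumed format only when a caller truly needs objects.
    Header[] toArray() {
        if (parsed == null) {
            ByteBuffer buf = ByteBuffer.wrap(rawBytes);
            int count = buf.getInt();
            List<Header> out = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                byte[] key = new byte[buf.getInt()];
                buf.get(key);
                byte[] value = new byte[buf.getInt()];
                buf.get(value);
                out.add(new Header(new String(key, StandardCharsets.UTF_8), value));
            }
            parsed = out.toArray(new Header[0]);
        }
        return parsed;
    }
}
```

The design point is that the common changelogging path only ever calls rawBytes(), so the parse cost is paid solely by consumers that inspect individual headers.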
The three changelog store wrappers (KV, window, session) now use rawHeaderBytes() and SerializedHeaders instead of the eager headers() call, and vector clock entries are appended without triggering deserialization.
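The append-without-deserializing step could look like the following sketch: bump the serialized header count and copy the existing entries verbatim, then write the new entry after them. HeaderSplicer is a hypothetical name, and the byte layout (int count, then length-prefixed key and value per header) is the same illustrative assumption as above, not Kafka's actual encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: splice one new header entry into already-serialized
// header bytes without parsing the existing entries.
final class HeaderSplicer {
    static byte[] append(byte[] raw, String key, byte[] value) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate(raw.length + 8 + keyBytes.length + value.length);
        int count = ByteBuffer.wrap(raw).getInt(); // only the count is read
        out.putInt(count + 1);                     // bump the header count
        out.put(raw, 4, raw.length - 4);           // copy existing entries verbatim
        out.putInt(keyBytes.length).put(keyBytes); // append the new entry
        out.putInt(value.length).put(value);
        return out.array();
    }
}
```

Under this assumption, appending a vector-clock entry is one allocation and two array copies, regardless of how many headers are already present.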