KAFKA-20173: Metered layer of KV-stores needs to pass Headers #21684
mjsax wants to merge 6 commits into apache:trunk
Conversation
```java
protected V outerValue(final byte[] value) {
    return value != null ? serdes.valueFrom(value, new RecordHeaders()) : null;
}

protected byte[] serializeValue(final V value) {
    return value != null ? serdes.rawValue(value, internalContext.headers()) : null;
}
```
This is the key question -- should we pass `context.headers()` or `new RecordHeaders()` for non-header stores? Both solutions have advantages and disadvantages.
Using `new RecordHeaders()` is strictly more backward compatible; passing in `context.headers()` feels like a "bug fix" though -- we should have always done this IMHO. That's why I opted for this solution.
Of course, we could also "fix" this "bug" by arguing: we stay 100% backward compatible, pass in `new RecordHeaders()`, and users get the fix by enabling the new header stores...
Let me know what you think.
IMHO, `new RecordHeaders` looks more correct (from a backward-compatibility POV), but it's essentially okay to use `context.headers()` because:
- If the user doesn't use a headers-aware serializer, the headers are ignored (either empty or not).
- The idea of this ticket is to actually fix the code base to propagate headers, and if we can propagate the original headers instead of a "mocked" one, we should go for it.
- It doesn't make any difference for users who aren't using header state stores / serializers, but it helps us consistently apply a simple paradigm: if there is access to the original headers, go for it; if not, fall back to `new RecordHeaders`.
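The point that header-unaware serializers ignore whatever headers are passed can be sketched with a simplified model. This is a hypothetical `MiniDeserializer` interface for illustration, not Kafka's actual `Deserializer`, though it mirrors the same default-delegation pattern:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Simplified, hypothetical model of the delegation pattern: the default
// header-aware method delegates to the headers-less one, so a serde that
// only implements deserialize(topic, data) never sees the headers at all.
interface MiniDeserializer<T> {
    T deserialize(String topic, byte[] data);

    default T deserialize(String topic, Map<String, byte[]> headers, byte[] data) {
        return deserialize(topic, data); // headers silently dropped
    }
}

public class HeaderDelegationDemo {
    public static void main(String[] args) {
        MiniDeserializer<String> plain =
            (topic, data) -> new String(data, StandardCharsets.UTF_8);

        byte[] payload = "value".getBytes(StandardCharsets.UTF_8);
        // Whether we pass real headers or none makes no difference
        // for a header-unaware deserializer.
        String withHeaders = plain.deserialize("topic", Map.of("trace-id", new byte[] {1}), payload);
        String withoutHeaders = plain.deserialize("topic", Map.of(), payload);
        System.out.println(withHeaders.equals(withoutHeaders)); // true
    }
}
```

This is why switching from `new RecordHeaders()` to `context.headers()` is invisible to any serde that does not override the header-aware method.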
@mjsax
Regarding backward compatibility, I think this risk is very low because most serdes ignore headers -- the default interface methods delegate to the non-headers versions. Using `internalContext.headers()` is the semantically correct behavior -- serdes should have access to the record context.
1Q: should we check if `internalContext != null`?
> 1Q: should we check if internalContext != null?

I would say no. If `internalContext` were `null`, it would be a bug (I believe), so we should rather expose such an issue directly and fail fast by crashing, instead of "masking" the bug by not failing.
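The fail-fast preference can be sketched as follows. All names here are hypothetical illustrations, not the actual store code:

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the fail-fast argument: crash with a descriptive
// message at the point where the invariant is broken, instead of adding a
// null check that silently masks the bug.
public class FailFastDemo {
    static int headerCount(Map<String, byte[]> internalHeaders) {
        // Fail fast: a null context here would indicate a bug in initialization.
        Objects.requireNonNull(internalHeaders,
            "internal context must be initialized before serialization");
        return internalHeaders.size();
    }

    public static void main(String[] args) {
        System.out.println(headerCount(Map.of("k", new byte[0]))); // 1

        try {
            headerCount(null); // surfaces the broken invariant immediately
        } catch (NullPointerException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

A defensive `if (internalContext != null)` branch would instead return a "safe" default and push the failure to some distant call site, making the real bug much harder to trace.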
```diff
- public RawAndDeserializedValue<V> getWithBinary(final K key) {
+ RawAndDeserializedValue<V> getWithBinary(final K key) {
```
```java
MeteredKeyValueStore(
    final KeyValueStore<Bytes, byte[]> inner,
    final String metricsScope,
    final Time time,
    final Serde<K> keySerde,
    final Serde<V> valueSerde
) {
```
Offtopic
Question about the formatting fix: do we have a consistent code style across the code base? I mean a literal CLI formatter or IDE settings.
I was able to find this: https://kafka.apache.org/community/developer/#streams-api , but I don't think that's enough to keep the code style consistent.
No, we don't have anything that does strict enforcement...
```java
    throw new UnsupportedOperationException("Position is not supported for " + getClass().getSimpleName());
}

protected Bytes keyBytes(final K key, final Headers headers) {
```
Do we still need this method, if we already have it from the parent store?
see https://github.com/apache/kafka/pull/21684/changes#diff-9af87381ff50464fc0979726bd22231c747d941b3bb7a9c152ddbf430c61cf23R444
I was focusing on MeteredKeyValueStore only, as the PR is already large enough. -- Would need to revisit both MeteredTimestampedKeyValueStore and MeteredTimestampedKeyValueStoreWithHeaders in a follow-up PR for further cleanup.
```diff
  Objects.requireNonNull(key, "key cannot be null");
  try {
-     final long validTo = maybeMeasureLatency(() -> inner.put(keyBytes(key), plainValueSerdes.rawValue(value), timestamp), time, putSensor);
+     final long validTo = maybeMeasureLatency(() -> inner.put(serializeKey(key), plainValueSerdes.rawValue(value), timestamp), time, putSensor);
```
`plainValueSerdes.rawValue(value)` -- do we need headers there?
Good catch -- similar to my other comment -- the focus of this PR was MeteredKeyValueStore; we haven't even implemented a "header-versioned-store" yet. Will address in a follow-up PR.
```java
@Test
public void shouldThrowIfIncompatibleSerdeForKey() throws ClassNotFoundException {
    @SuppressWarnings("rawtypes")
```
Out of curiosity: why does `new RecordHeaders()` lead to `@SuppressWarnings("rawtypes")`?
It doesn't -- this is additional side cleanup. We use `Class` below, which is the offender (and we cannot switch to `Class<?>` either to avoid the rawtype, because we want to test the wrong-type condition...)
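A minimal illustration of why the raw `Class` is needed for this kind of test. `TypedStore` is a hypothetical stand-in, not the class under test in the PR:

```java
// Hypothetical sketch: a generic holder that expects Class<K> for its key
// type. To exercise the "wrong type" condition, the test must smuggle in an
// incompatible class, which only compiles via a raw Class reference --
// hence @SuppressWarnings("rawtypes") (plus "unchecked" for the conversion).
class TypedStore<K> {
    private final Class<K> keyClass;

    TypedStore(final Class<K> keyClass) {
        this.keyClass = keyClass;
    }

    boolean accepts(final Object key) {
        return keyClass.isInstance(key);
    }
}

public class RawtypeDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    public static void main(String[] args) {
        Class wrong = Integer.class; // raw type: deliberately NOT Class<String>
        TypedStore<String> store = new TypedStore<String>(wrong); // unchecked conversion
        System.out.println(store.accepts("some-key")); // false: the wrong-type condition
    }
}
```

With `Class<?> wrong`, the constructor call would not compile at all, so the raw type (and its suppression) is the only way to set up the mismatch the test wants to verify.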
Force-pushed from 6f5fc81 to ee99c2c.
```diff
- store.init(context.getStateStoreContext(), store);
+ store.init(
+     new AbstractProcessorContext<>(new TaskId(0, 0), new StreamsConfig(context.appConfigs()), (StreamsMetricsImpl) context.metrics(), null) {
```
I assume MockProcessorContext doesn't have a method to mock headers, and that's the reason why you re-implemented it, right?
Btw, I don't think this comment is still correct:

```java
/**
 * Demonstrate the use of {@link MockProcessorContext} for testing the {@link Processor} in the {@link WordCountProcessorDemo}.
 */
```
Oh. That's a good point. I totally missed that this is example code... Need to do this differently.
Seems this is related to https://issues.apache.org/jira/browse/KAFKA-19983
```java
    .withLoggingDisabled() // Changelog is not supported by MockProcessorContext.
    // Caching is disabled by default, but FYI: caching is also not supported by MockProcessorContext.
    .build();
store.init(context.getStateStoreContext(), store);
```
Updates the metered layer of KV-stores to pass the context headers into the serdes. Simplifies the code with some refactoring.