fix(confluent): Fix KIP-848 consumer assignment and seek handling#525

Draft
evanh wants to merge 3 commits into main from evanh/fix/update-confluent

evanh commented Mar 9, 2026

Fix several bugs when using confluent-kafka 2.13.2+ with Kafka 4+ and the KIP-848 consumer group protocol.

Core fix: KIP-848 assignment and seek handling

With the old rebalancing protocols (eager and cooperative-sticky), incremental_assign or assign is called from within the on_assign callback, and any seek() calls made by the user in that callback take effect immediately via rdkafka.

With KIP-848, calling seek() inside on_assign fails with _UNKNOWN_PARTITION — rdkafka hasn't fully committed the incremental assignment to its internal state by the time user code runs in the callback. The solution is to defer incremental_assign until after on_assign completes (in the finally block), using self.__offsets as the starting positions. Since self.__offsets already reflects any seek() calls the user made in the callback, the desired position is baked directly into the incremental_assign call, and no separate seek() is needed.
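
The deferral described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeClient`, `DeferredAssignConsumer`, and `handle_assignment` are hypothetical names, partitions are modelled as plain `(topic, index)` tuples, and the real consumer keeps its positions in `self.__offsets` rather than `_offsets`.

```python
from typing import Callable, Dict, Optional, Tuple

Partition = Tuple[str, int]  # (topic, partition index); simplified stand-in


class FakeClient:
    """Hypothetical stand-in for the rdkafka-backed client; it only records calls."""

    def __init__(self) -> None:
        self.assigned: Dict[Partition, int] = {}

    def incremental_assign(self, offsets: Dict[Partition, int]) -> None:
        self.assigned.update(offsets)


class DeferredAssignConsumer:
    """Illustrative wrapper showing the KIP-848 path only."""

    def __init__(self, client: FakeClient) -> None:
        self._client = client
        self._offsets: Dict[Partition, int] = {}

    def seek(self, offsets: Dict[Partition, int]) -> None:
        unknown = set(offsets) - set(self._offsets)
        if unknown:
            raise ValueError(f"not assigned: {sorted(unknown)}")
        # Only the tracked position changes; the client is not touched here.
        self._offsets.update(offsets)

    def handle_assignment(
        self,
        starting: Dict[Partition, int],
        on_assign: Optional[Callable[[Dict[Partition, int]], None]] = None,
    ) -> None:
        # Record the starting positions first so seek() accepts these partitions.
        self._offsets.update(starting)
        try:
            if on_assign is not None:
                on_assign(dict(starting))
        finally:
            # Deferred assignment: any seek() made in the callback is already
            # reflected in _offsets, so no separate seek on the client is needed.
            self._client.incremental_assign(
                {partition: self._offsets[partition] for partition in starting}
            )


client = FakeClient()
consumer = DeferredAssignConsumer(client)
consumer.handle_assignment(
    {("events", 0): 5, ("events", 1): 7},
    on_assign=lambda partitions: consumer.seek({("events", 0): 42}),
)
print(client.assigned)  # {('events', 0): 42, ('events', 1): 7}
```

Because the assignment sits in the `finally` block, it still happens if the user's callback raises; the partitions are then assigned at whatever positions had been recorded up to that point.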

This required adding a separate __is_kip848 flag (distinct from __is_cooperative_sticky) so the two protocols can be handled differently in __assign and seek().
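
One plausible way to derive two such flags from the consumer configuration is sketched below. It assumes the standard librdkafka properties `group.protocol` (set to `consumer` for KIP-848) and `partition.assignment.strategy`; the helper name `detect_rebalance_mode` and its return values are hypothetical, and the PR may compute its flags differently.

```python
from typing import Any, Mapping


def detect_rebalance_mode(config: Mapping[str, Any]) -> str:
    """Classify a consumer config as 'kip848', 'cooperative-sticky', or 'eager'."""
    # KIP-848 is selected via group.protocol=consumer; the default is "classic".
    if str(config.get("group.protocol", "classic")).lower() == "consumer":
        return "kip848"
    # The classic protocol accepts a comma-separated list of assignment strategies.
    strategies = [
        name.strip()
        for name in str(config.get("partition.assignment.strategy", "")).split(",")
    ]
    if "cooperative-sticky" in strategies:
        return "cooperative-sticky"
    return "eager"


mode = detect_rebalance_mode({"group.protocol": "consumer"})
is_kip848 = mode == "kip848"
is_cooperative_sticky = mode == "cooperative-sticky"
```

Keeping the two flags mutually exclusive, as here, means code such as `__assign` and `seek()` can branch on the KIP-848 case first without also matching the cooperative-sticky path.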

Other changes

  • Bump confluent-kafka to >=2.13.2; KIP-848 requires Kafka 4+, so update run-kafka.sh to use cp-kafka:8.0.0 with KAFKA_GROUP_COORDINATOR_REBALANCE_PROTOCOLS=classic,consumer
  • Drop Python 3.9 from the CI test matrix
  • Fix test_kip848_e2e to handle a pre-existing topic from unclean test teardown
  • Remove the xfail marker from TestKafkaStreamsKip848::test_pause_resume_rebalancing — it now passes

evanh and others added 3 commits March 9, 2026 16:17
With the KIP-848 consumer group protocol, calling seek() inside the
on_assign callback fails with _UNKNOWN_PARTITION because rdkafka hasn't
fully registered the partition yet. Fix this by deferring incremental_assign
until after the user's on_assign callback, so any seek() calls made in the
callback are incorporated into the starting offsets passed to rdkafka.

Also fix the test_kip848_e2e topic cleanup to handle pre-existing topics
from unclean test teardown, remove the xfail marker from
test_pause_resume_rebalancing (it now passes), and bump confluent-kafka
to >=2.13.2 with Kafka 4+ (cp-kafka:8.0.0) required for KIP-848 support.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The __pending_seeks block in poll() was never reached: seeks deferred
during on_assign are cleared in the assignment callback's finally block
(after incremental_assign), before poll() can execute. Remove it along
with a leftover commented-out STALE_MEMBER_EPOCH line.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After the previous cleanup, __pending_seeks was written to in seek()
but never read — its values were unused since incremental_assign reads
from __offsets instead. Remove the dict and all its callsites.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
for partition, offset in offsets.items()
]
if self.__is_cooperative_sticky:
if self.__is_kip848:

With cooperative-sticky, __assign is called from the assignment_callback before on_assign, and calling incremental_assign there is safe — rdkafka processes it immediately and seek() works fine afterwards inside the callback.

With KIP-848, calling incremental_assign at that point and then calling seek() inside the user's on_assign callback fails with _UNKNOWN_PARTITION. The root cause is that with KIP-848, rdkafka hasn't fully committed the incremental assignment to its internal state by the time the Python callback returns control to user code — so the partition isn't yet seekable.

The fix is to defer incremental_assign until after on_assign completes (in the finally block), at which point we use self.__offsets — which already reflects any seek() calls the user made — as the starting offsets. This way, we never need to call seek() at all; the desired position is baked directly into the incremental_assign call.

So the pass here means: skip incremental_assign for now, because it will be called with the correct final offsets later in the finally block, after on_assign has run.

From Claude
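
The three-way branch discussed in the comment above might look roughly like this. It is a sketch under assumptions: `RecordingClient` and `assign_before_callback` are hypothetical names, offsets are passed as a plain dict rather than a list of `TopicPartition` objects, and the exact branch order in the PR is not visible in the diff excerpt.

```python
from typing import Dict, List, Tuple

Partition = Tuple[str, int]  # (topic, partition index); simplified stand-in


class RecordingClient:
    """Hypothetical stand-in that records which assignment call was made."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[Partition, int]]] = []

    def assign(self, offsets: Dict[Partition, int]) -> None:
        self.calls.append(("assign", dict(offsets)))

    def incremental_assign(self, offsets: Dict[Partition, int]) -> None:
        self.calls.append(("incremental_assign", dict(offsets)))


def assign_before_callback(
    client: RecordingClient,
    offsets: Dict[Partition, int],
    *,
    is_kip848: bool,
    is_cooperative_sticky: bool,
) -> None:
    """Assignment step that runs before the user's on_assign callback."""
    if is_kip848:
        # Skip for now: the assignment is issued later, after on_assign,
        # using the final offsets.
        pass
    elif is_cooperative_sticky:
        client.incremental_assign(offsets)
    else:
        client.assign(offsets)


client = RecordingClient()
assign_before_callback(
    client, {("events", 0): 5}, is_kip848=True, is_cooperative_sticky=False
)
print(client.calls)  # []
```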
