
mbutrovich (Collaborator) commented Jan 13, 2026

Which issue does this PR close?

  • N/A.

What changes are included in this PR?

  • Because of how Comet maps DataFusion SessionContexts, the Tokio runtime, and Spark tasks, the ArrowReader sees frequent waker churn when concurrency is set to 1. This PR adds a fast path for that case that avoids try_flatten_unordered and its internal replace_waker calls.
  • The fast path also processes FileScanTasks in order, so they are no longer reordered at runtime. Several Iceberg Java tests expect specific query results without an ORDER BY, so this keeps those tests passing when concurrency is set to 1.
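The fast-path idea can be illustrated with a synchronous, standard-library-only analogy. This is a hypothetical sketch, not the iceberg-rust code: the real ArrowReader works on async streams and merges them with try_flatten_unordered, whereas here worker threads and an mpsc channel stand in for that unordered merge. ScanTask, Batch, read_task, and read_all are illustrative names only. When concurrency is 1, read_all skips the workers and the channel entirely and reads tasks in order on the calling thread, so the output order always matches the task order.

```rust
use std::collections::VecDeque;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a FileScanTask: an id plus the number of
/// record batches reading it would produce.
#[derive(Clone, Debug)]
pub struct ScanTask {
    pub id: usize,
    pub num_batches: usize,
}

/// Hypothetical stand-in for a RecordBatch: (task id, batch index within the task).
pub type Batch = (usize, usize);

/// Simulates reading one task: yields its batches in order.
fn read_task(task: &ScanTask) -> Vec<Batch> {
    (0..task.num_batches).map(|i| (task.id, i)).collect()
}

/// Reads all tasks. A concurrency of 0 is treated like 1 in this sketch.
pub fn read_all(tasks: Vec<ScanTask>, concurrency: usize) -> Vec<Batch> {
    if concurrency <= 1 {
        // Fast path: no workers, no channel, and batches come out in task order.
        return tasks.iter().flat_map(read_task).collect();
    }

    // General path: workers pull tasks from a shared queue and send batches
    // over a channel, so batches from different tasks may interleave.
    let queue = Arc::new(Mutex::new(VecDeque::from(tasks)));
    let (tx, rx) = mpsc::channel();
    let mut workers = Vec::new();
    for _ in 0..concurrency {
        let queue = Arc::clone(&queue);
        let tx = tx.clone();
        workers.push(thread::spawn(move || loop {
            // Take the next task (the lock is released at the end of this
            // statement); stop once the queue is empty.
            let task = match queue.lock().unwrap().pop_front() {
                Some(t) => t,
                None => break,
            };
            for batch in read_task(&task) {
                tx.send(batch).unwrap();
            }
        }));
    }
    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);
    let out: Vec<Batch> = rx.into_iter().collect();
    for w in workers {
        w.join().unwrap();
    }
    out
}
```

Sorting the concurrent output recovers the same set of batches, but only the concurrency-1 path guarantees the order itself, which is the property the Iceberg Java tests without an ORDER BY rely on.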

See apache/datafusion-comet#3051 and the flamegraph below.

[Figure: flamegraph]

Are these changes tested?

A new test covers determinism. The entire Iceberg Java Spark suite is also being run via Comet in apache/datafusion-comet#3051.

mbutrovich changed the title from "perf(reader): Fast path ArrowReader::read when concurrency is 1 to avoid waker churn" to "perf(reader): Fast path ArrowReader::read when concurrency is 1 to avoid waker churn and add determinism" on Jan 13, 2026
mbutrovich changed the title from "perf(reader): Fast path ArrowReader::read when concurrency is 1 to avoid waker churn and add determinism" to "perf(reader): Fast path ArrowReader::read when concurrency is 1 to avoid waker churn and add determinism to FileScanTask processing" on Jan 13, 2026
mbutrovich (Collaborator, Author)

I think CI needs a kick, since it worked fine on the previous run.
