perf(reader): Fast path ArrowReader::read when concurrency is 1 to avoid waker churn and add determinism to FileScanTask processing #2020

mbutrovich · 2026-01-13T21:07:52Z

Which issue does this PR close?

N/A.

What changes are included in this PR?

Because of the way Comet maps the DataFusion SessionContext, the tokio runtime, and Spark Tasks onto one another, we see frequent waker churn in the ArrowReader when concurrency is set to 1. This PR adds a fast path for that case that bypasses `try_flatten_unordered` and its internal `replace_waker` calls.
The fast path also prevents tasks from being reordered at runtime. Several Iceberg Java tests expect specific query results without an ORDER BY, so this keeps those tests passing when concurrency is set to 1.
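
To illustrate why a sequential fast path gives deterministic ordering, here is a minimal std-only sketch. It is not the code in this PR: the actual change operates on async streams, whereas this analogy swaps in threads and an `mpsc` channel for the unordered path. `Task` and `read_all` are hypothetical names invented for the example.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a file scan task: just the rows it would yield.
#[derive(Clone, Debug)]
struct Task {
    rows: Vec<i32>,
}

/// Hypothetical reader: returns the rows of every task.
fn read_all(tasks: Vec<Task>, concurrency: usize) -> Vec<i32> {
    if concurrency <= 1 {
        // Fast path: handle tasks one after another, in input order.
        // No workers, no channel, so the output order is deterministic.
        return tasks.into_iter().flat_map(|t| t.rows).collect();
    }

    // General path: split the tasks across workers. Rows are collected in
    // arrival order, so rows from different workers may interleave.
    let chunk_size = ((tasks.len() + concurrency - 1) / concurrency).max(1);
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = tasks
        .chunks(chunk_size)
        .map(|chunk| {
            let chunk = chunk.to_vec();
            let tx = tx.clone();
            thread::spawn(move || {
                for task in chunk {
                    for row in task.rows {
                        tx.send(row).expect("receiver is alive");
                    }
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);
    let out: Vec<i32> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    out
}
```

With `concurrency` set to 1, the result always follows the input task order. With a higher value, only the set of rows is guaranteed, which is why a query without an ORDER BY can return rows in a different order from run to run.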

See apache/datafusion-comet#3051 and

Are these changes tested?

Yes. This PR adds a new test for determinism, and the entire Iceberg Java Spark suite is also being run via Comet in apache/datafusion-comet#3051.

mbutrovich · 2026-01-14T02:03:51Z

I think CI needs a kick, since it worked fine on the previous run.

Fast path ArrowReader::read when concurrency is 1 to avoid waker churn.

9876967

mbutrovich mentioned this pull request Jan 13, 2026

perf: [iceberg] Remove IcebergFileStream and use iceberg-rust's parallelization apache/datafusion-comet#3051

Draft

mbutrovich changed the title ~~perf(reader): Fast path ArrowReader::read when concurrency is 1 to avoid waker churn~~ perf(reader): Fast path ArrowReader::read when concurrency is 1 to avoid waker churn and add determinism Jan 13, 2026

mbutrovich changed the title ~~perf(reader): Fast path ArrowReader::read when concurrency is 1 to avoid waker churn and add determinism~~ perf(reader): Fast path ArrowReader::read when concurrency is 1 to avoid waker churn and add determinism to FileScanTask processing Jan 13, 2026

mbutrovich and others added 3 commits January 13, 2026 16:24

Add test.

f8d5d42

Fix clippy.

b85e910

Merge branch 'main' into fast_path_concurrency_1

c638b95
