
EXPORT PART background tasks block executor indefinitely when S3 destination is unreachable #1559

@Selfeer

Description


Version: 26.1.4.20001 (altinity build)
Introduced by: PR #1402
How to run the test:

./regression.py --clickhouse https://altinity-build-artifacts.s3.amazonaws.com/REFs/antalya-26.1/9b978b90baa1fd20e917a17784f8ec8c22265fd3/build_amd_release/clickhouse-common-static_26.1.4.20001.altinityantalya_amd64.deb --clickhouse-version 26.1.4.20001 -l test.log --storage minio --only "/s3/minio/export tests/export part/concurrent alter/during minio interruption/*" --as-binary

Summary

When ALTER TABLE ... EXPORT PART ... TO TABLE targets an S3-backed table and the S3 endpoint becomes unreachable, the background export tasks retry internally for an extremely long time (~50 minutes each), consuming all background executor slots. No new export operations can be scheduled until the stuck tasks complete. Additionally, DROP TABLE on the source table hangs while tasks are in flight.

What we were testing

The test validates that concurrent ALTER TABLE ... EXPORT PART operations behave correctly when the S3 destination (MinIO) is interrupted mid-operation. This is a network resilience scenario — the expectation is that export operations should either fail promptly or recover once the destination comes back, not permanently block the executor.

Test procedure

  1. Create a partitioned MergeTree source table with 5 partitions (10 parts total)
  2. Create an S3-backed destination table pointing to MinIO
  3. Kill the MinIO container (docker kill --signal=KILL)
  4. Export all 10 parts sequentially via ALTER TABLE ... EXPORT PART
  5. Start MinIO back up and verify data

What happened

Phase 1 — Parts accepted, then rejected (18:07:07)

9 parts were accepted into the background executor. The 10th was immediately rejected:

# clickhouse-server.log — 9 parts accepted in rapid succession (~300ms window)

18:07:07.172 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '1_1_1_0' TO TABLE s3_... (stage: Complete)
18:07:07.207 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '1_2_2_0' TO TABLE s3_... (stage: Complete)
18:07:07.248 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '2_3_3_0' TO TABLE s3_... (stage: Complete)
18:07:07.276 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '2_4_4_0' TO TABLE s3_... (stage: Complete)
18:07:07.310 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '3_5_5_0' TO TABLE s3_... (stage: Complete)
18:07:07.349 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '3_6_6_0' TO TABLE s3_... (stage: Complete)
18:07:07.382 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '4_7_7_0' TO TABLE s3_... (stage: Complete)
18:07:07.412 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '4_8_8_0' TO TABLE s3_... (stage: Complete)
  # ^^^ 8 background threads now occupied

# 10th part — rejected immediately (1ms later):
18:07:07.468 <Error> executeQuery: Code: 236. DB::Exception: Failed to schedule export part task
  for data part '5_10_10_0'. Background executor is busy. (ABORTED)

From this point on, every retry of 5_10_10_0 was rejected — 1960+ times over 5 minutes.

Phase 2 — Background threads stuck in S3 retries

All 8 executor threads entered ExportPartTask::executeStep() and began retrying S3 uploads against the dead MinIO. The S3 client is configured for 501 retries, each timing out after ~6 seconds:

# clickhouse-server.log — S3 retries begin immediately

18:07:07.673 [ 2188 ] S3ClientRetryStrategy: Attempt 1/501 failed with retryable error: ...,
  Timeout: connect timed out: 172.21.0.8:9001
18:07:07.708 [ 2180 ] S3ClientRetryStrategy: Attempt 1/501 failed with retryable error: ...,
  Timeout: connect timed out: 172.21.0.8:9001
18:07:07.750 [ 2185 ] S3ClientRetryStrategy: Attempt 1/501 failed with retryable error: ...,
  Timeout: connect timed out: 172.21.0.8:9001

# ~4 minutes later, still retrying:
18:11:28.110 [ 2188 ] S3ClientRetryStrategy: Attempt 50/501 failed ...
18:11:28.160 [ 2180 ] S3ClientRetryStrategy: Attempt 50/501 failed ...

# ~10 minutes later, server is killed while tasks are only at attempt 108/501:
18:17:15.290 [ 2180 ] S3ClientRetryStrategy: Attempt 108/501 failed ...
18:17:15.290 [ 2188 ] S3ClientRetryStrategy: Attempt 108/501 failed ...

8 threads, 8 parts, all stuck in parallel — each would need 501 retries × ~6 s ≈ 50 minutes to drain:

Part     Thread  Stuck in
1_1_1_0  2188    ExportPartTask::executeStep() → S3 PutObject retry loop
1_2_2_0  2180    same
2_3_3_0  2185    same
2_4_4_0  2187    same
3_5_5_0  2183    same
3_6_6_0  2184    same
4_7_7_0  2186    same
4_8_8_0  2182    same

Phase 3 — Server shutdown, DROP TABLE hung

The server was shut down at 18:17:19 while tasks were still at attempt ~108/501. Prior to that, DROP TABLE on the source table hung for 300s because the background tasks held references to it:

# clickhouse-server.log — shutdown sequence while tasks still active

18:17:19.266 <Debug> Context: Shutting down merges executor
18:17:19.266 <Debug> Context: Shutting down fetches executor
18:17:19.266 <Debug> Context: Shutting down moves executor
18:17:19.266 <Debug> Context: Shutting down common executor

Root cause

Three issues combine to produce this failure:

  1. No cancellation mechanism for in-flight export tasks. Once an ExportPartTask is scheduled, it cannot be cancelled — not by the client, not by DROP TABLE, and not by any timeout. PR #1402 ("Improvements to partition export") added an isCancelled() check before exec.execute(), but the S3 retry loop runs inside exec.execute() and does not check the cancellation flag between retries.

  2. Excessive S3 retry budget. The S3ClientRetryStrategy allows 501 retries with ~6-second connect timeouts per retry, meaning a single stuck task blocks a background thread for ~50 minutes.

  3. Hard rejection when executor is full. New export requests get Code: 236 (ABORTED) with no option to queue or wait, so the subsystem is completely unavailable until the stuck tasks drain.

Impact

  • The entire EXPORT PART subsystem becomes unavailable for up to ~50 minutes after a transient S3 outage
  • DROP TABLE on affected tables hangs (observed 300 s in this run) while tasks are in flight
  • No user-facing way to cancel stuck export tasks or reclaim executor slots

Reproducibility

Deterministic: reproduced on two consecutive runs with identical behavior.

PR #1402 — "Improvements to partition export" (merged 2026-03-07)

This PR introduced the bug. The critical change is in MergeTreeData::exportPartToTable(), which rewrote how export part tasks are scheduled:

Before #1402 — export parts used a lazy, trigger-based model. exportPartToTable() added a manifest to export_manifests and called background_moves_assignee.trigger(). The background assignee would later pick up unprocessed manifests in scheduleDataMovingJob(), one at a time, interleaved with regular data-move jobs. This prevented executor saturation.

// OLD: just store manifest + trigger
export_manifests.emplace(std::move(manifest));
background_moves_assignee.trigger();

After #1402 — exportPartToTable() creates the task eagerly and schedules it directly via scheduleMoveTask(). If the executor is full, it throws Code: 236 (ABORTED). Every ALTER TABLE ... EXPORT PART call immediately occupies a background executor slot, and rapid sequential calls saturate all slots before any task completes.

// NEW: create task + schedule immediately, throw if full
manifest.task = std::make_shared<ExportPartTask>(*this, manifest);
if (!background_moves_assignee.scheduleMoveTask(manifest.task))
{
    export_manifests.erase(manifest);
    throw Exception(ErrorCodes::ABORTED,
        "Failed to schedule export part task for data part '{}'. Background executor is busy",
        part_name);
}

PR #1402 also removed the export-part scheduling from scheduleDataMovingJob() entirely — the old fallback loop that iterated export_manifests and scheduled idle ones was deleted. The only path to schedule an export task is now the inline path with no backpressure.
