When ALTER TABLE ... EXPORT PART ... TO TABLE targets an S3-backed table and the S3 endpoint becomes unreachable, the background export tasks retry internally for an extremely long time (~50 minutes each), consuming all background executor slots. No new export operations can be scheduled until the stuck tasks complete. Additionally, DROP TABLE on the source table hangs while tasks are in flight.
What we were testing
The test validates that concurrent ALTER TABLE ... EXPORT PART operations behave correctly when the S3 destination (MinIO) is interrupted mid-operation. This is a network resilience scenario — the expectation is that export operations should either fail promptly or recover once the destination comes back, not permanently block the executor.
Test procedure
1. Create a partitioned MergeTree source table with 5 partitions (10 parts total)
2. Create an S3-backed destination table pointing to MinIO
3. Kill the MinIO container (docker kill --signal=KILL)
4. Export all 10 parts sequentially via ALTER TABLE ... EXPORT PART
5. Start MinIO back up and verify data
What happened
Phase 1 — Parts accepted, then rejected (18:07:07)
9 parts were accepted into the background executor. The 10th was immediately rejected:
# clickhouse-server.log — 9 parts accepted in rapid succession (~300ms window)
18:07:07.172 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '1_1_1_0' TO TABLE s3_... (stage: Complete)
18:07:07.207 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '1_2_2_0' TO TABLE s3_... (stage: Complete)
18:07:07.248 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '2_3_3_0' TO TABLE s3_... (stage: Complete)
18:07:07.276 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '2_4_4_0' TO TABLE s3_... (stage: Complete)
18:07:07.310 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '3_5_5_0' TO TABLE s3_... (stage: Complete)
18:07:07.349 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '3_6_6_0' TO TABLE s3_... (stage: Complete)
18:07:07.382 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '4_7_7_0' TO TABLE s3_... (stage: Complete)
18:07:07.412 <Debug> executeQuery: ALTER TABLE source_... EXPORT PART '4_8_8_0' TO TABLE s3_... (stage: Complete)
# ^^^ 8 background threads now occupied
# 10th part — rejected immediately:
18:07:07.468 <Error> executeQuery: Code: 236. DB::Exception: Failed to schedule export part task
for data part '5_10_10_0'. Background executor is busy. (ABORTED)
From this point on, every retry of 5_10_10_0 was rejected — 1960+ times over 5 minutes.
Phase 2 — Background threads stuck in S3 retries
All 8 executor threads entered ExportPartTask::executeStep() and began retrying S3 uploads against the dead MinIO. The S3 client is configured for 501 retries, each timing out after ~6 seconds. 8 threads, 8 parts, all stuck in parallel: each would need 501 retries * ~6s = ~50 minutes to drain. All 8 parts (1_1_1_0, 1_2_2_0, 2_3_3_0, 2_4_4_0, 3_5_5_0, 3_6_6_0, 4_7_7_0, 4_8_8_0) were stuck in the ExportPartTask::executeStep() → S3 PutObject retry loop.
Phase 3 — Server shutdown, DROP TABLE hung
The server was shut down at 18:17:19 while tasks were still at attempt ~108/501. Prior to that, DROP TABLE on the source table hung for 300s because the background tasks held references to it:
# clickhouse-server.log — shutdown sequence while tasks still active
18:17:19.266 <Debug> Context: Shutting down merges executor
18:17:19.266 <Debug> Context: Shutting down fetches executor
18:17:19.266 <Debug> Context: Shutting down moves executor
18:17:19.266 <Debug> Context: Shutting down common executor
Root cause
Three issues combine to produce this failure:
No cancellation mechanism for in-flight export tasks. Once an ExportPartTask is scheduled, it cannot be cancelled — not by the client, not by DROP TABLE, and not by any timeout. PR #1402 ("Improvements to partition export") added an isCancelled() check before exec.execute(), but the S3 retry loop runs inside exec.execute() and does not check the cancellation flag between retries.
Excessive S3 retry budget. The S3ClientRetryStrategy allows 501 retries with ~6-second connect timeouts per retry, meaning a single stuck task blocks a background thread for ~50 minutes.
Hard rejection when executor is full. New export requests get Code: 236 (ABORTED) with no option to queue or wait, so the subsystem is completely unavailable until the stuck tasks drain.
Impact
The entire EXPORT PART subsystem becomes unavailable for up to ~50 minutes after a transient S3 outage
DROP TABLE on affected tables hangs indefinitely while tasks are in flight
No user-facing way to cancel stuck export tasks or reclaim executor slots
Reproducibility
Deterministic: reproduced on two consecutive runs with identical behavior.
PR #1402 — "Improvements to partition export" (merged 2026-03-07)
This PR introduced the bug. The critical change is in MergeTreeData::exportPartToTable(), which rewrote how export part tasks are scheduled:
Before #1402 — export parts used a lazy, trigger-based model. exportPartToTable() added a manifest to the set and called background_moves_assignee.trigger(). The background assignee would later pick up unprocessed manifests in scheduleDataMovingJob(), one at a time, interleaved with regular data-move jobs. This prevented executor saturation.
// OLD: just store manifest + trigger
export_manifests.emplace(std::move(manifest));
background_moves_assignee.trigger();
After #1402 — exportPartToTable() creates the task eagerly and schedules it directly via scheduleMoveTask(). If the executor is full, it throws Code: 236 (ABORTED). Every ALTER TABLE ... EXPORT PART call immediately occupies a background executor slot, and rapid sequential calls saturate all slots before any task completes.
// NEW: create task + schedule immediately, throw if full
manifest.task = std::make_shared<ExportPartTask>(*this, manifest);
if (!background_moves_assignee.scheduleMoveTask(manifest.task))
{
    export_manifests.erase(manifest);
    throw Exception(ErrorCodes::ABORTED,
        "Failed to schedule export part task for data part '{}'. Background executor is busy",
        part_name);
}
PR #1402 also removed the export-part scheduling from scheduleDataMovingJob() entirely — the old fallback loop that iterated export_manifests and scheduled idle ones was deleted. The only path to schedule an export task is now the inline path with no backpressure.
Version: 26.1.4.20001 (altinity build)
Introduced by: PR #1402
How to run the test:
./regression.py --clickhouse https://altinity-build-artifacts.s3.amazonaws.com/REFs/antalya-26.1/9b978b90baa1fd20e917a17784f8ec8c22265fd3/build_amd_release/clickhouse-common-static_26.1.4.20001.altinityantalya_amd64.deb --clickhouse-version 26.1.4.20001 -l test.log --storage minio --only "/s3/minio/export tests/export part/concurrent alter/during minio interruption/*" --as-binary