160 changes: 160 additions & 0 deletions docs/en/engines/table-engines/mergetree-family/part_export.md
@@ -0,0 +1,160 @@
# ALTER TABLE EXPORT PART

## Overview

The `ALTER TABLE EXPORT PART` command exports individual MergeTree data parts to object storage (S3, Azure Blob Storage, etc.), typically in Parquet format.

**Key Characteristics:**
- **Experimental feature** - must be enabled via `allow_experimental_export_merge_tree_part` setting
- **Asynchronous** - executes in the background, returns immediately
- **Ephemeral** - no automatic retry mechanism; manual retry required on failure
- **Idempotent** - re-exporting the same part is safe; by default the export fails with an exception if the destination file already exists (see `export_merge_tree_part_overwrite_file_if_exists`)
- **Preserves sort order** from the source table

## Syntax

```sql
ALTER TABLE [database.]table_name
EXPORT PART 'part_name'
TO TABLE [destination_database.]destination_table
SETTINGS allow_experimental_export_merge_tree_part = 1
[, setting_name = value, ...]
```

### Parameters

- **`table_name`**: The source MergeTree table containing the part to export
- **`part_name`**: The exact name of the data part to export (e.g., `'2020_1_1_0'`, `'all_1_1_0'`)
- **`destination_table`**: The target table for the export (typically an S3, Azure, or other object storage table)
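
Part names can be looked up in `system.parts`. A minimal sketch (the table name `mt_table` is taken from the example below; adjust the filter for your table):

```sql
-- List active parts of the source table to find part names to export
SELECT name, partition, rows
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'mt_table'
  AND active;
```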

## Requirements

Source and destination tables must be fully compatible:

1. **Identical schemas** - same columns, types, and order
2. **Matching partition keys** - partition expressions must be identical

## Settings

### `allow_experimental_export_merge_tree_part` (Required)

- **Type**: `Bool`
- **Default**: `false`
- **Description**: Must be set to `true` to enable the experimental feature.

### `export_merge_tree_part_overwrite_file_if_exists` (Optional)

- **Type**: `Bool`
- **Default**: `false`
- **Description**: If set to `true`, an existing destination file is overwritten. Otherwise, the export fails with an exception if the file already exists.
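
For example, a failed or stale export can be re-run with overwrite enabled (a sketch reusing the table and part names from the example below):

```sql
-- Re-export a part, replacing the destination file if it already exists
ALTER TABLE mt_table EXPORT PART '2020_1_1_0' TO TABLE s3_table
SETTINGS
    allow_experimental_export_merge_tree_part = 1,
    export_merge_tree_part_overwrite_file_if_exists = 1;
```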

## Examples

### Basic Export to S3

```sql
-- Create source and destination tables
CREATE TABLE mt_table (id UInt64, year UInt16)
ENGINE = MergeTree() PARTITION BY year ORDER BY tuple();

CREATE TABLE s3_table (id UInt64, year UInt16)
ENGINE = S3(s3_conn, filename='data', format=Parquet, partition_strategy='hive')
PARTITION BY year;

-- Insert and export
INSERT INTO mt_table VALUES (1, 2020), (2, 2020), (3, 2021);

ALTER TABLE mt_table EXPORT PART '2020_1_1_0' TO TABLE s3_table
SETTINGS allow_experimental_export_merge_tree_part = 1;

ALTER TABLE mt_table EXPORT PART '2021_2_2_0' TO TABLE s3_table
SETTINGS allow_experimental_export_merge_tree_part = 1;
```
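
Once the exports finish, the data can be read back through the destination table. A sketch, assuming the `_path` virtual column exposed by the S3 engine:

```sql
-- Read back the exported data and the object path it landed in
SELECT _path, id, year
FROM s3_table
ORDER BY id;
```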

## Monitoring

### Active Exports

Active exports can be found in the `system.exports` table. It currently shows only exports that are executing; pending and finished exports are not listed.

```sql
arthur :) select * from system.exports;

SELECT *
FROM system.exports

Query id: 2026718c-d249-4208-891b-a271f1f93407

Row 1:
──────
source_database: default
source_table: source_mt_table
destination_database: default
destination_table: destination_table
create_time: 2025-11-19 09:09:11
part_name: 20251016-365_1_1_0
destination_file_path: table_root/eventDate=2025-10-16/retention=365/20251016-365_1_1_0_17B2F6CD5D3C18E787C07AE3DAF16EB1.parquet
elapsed: 2.04845441
rows_read: 1138688 -- 1.14 million
total_rows_to_read: 550961374 -- 550.96 million
total_size_bytes_compressed: 37619147120 -- 37.62 billion
total_size_bytes_uncompressed: 138166213721 -- 138.17 billion
bytes_read_uncompressed: 316892925 -- 316.89 million
memory_usage: 596006095 -- 596.01 million
peak_memory_usage: 601239033 -- 601.24 million
```

### Export History

You can query succeeded or failed exports in `system.part_log`. For now, it only records completion events (success or failure).

```sql
arthur :) select * from system.part_log where event_type='ExportPart' and table = 'replicated_source' order by event_time desc limit 1;

SELECT *
FROM system.part_log
WHERE (event_type = 'ExportPart') AND (`table` = 'replicated_source')
ORDER BY event_time DESC
LIMIT 1

Query id: ae1c1cd3-c20e-4f20-8b82-ed1f6af0237f

Row 1:
──────
hostname: arthur
query_id:
event_type: ExportPart
merge_reason: NotAMerge
merge_algorithm: Undecided
event_date: 2025-11-19
event_time: 2025-11-19 09:08:31
event_time_microseconds: 2025-11-19 09:08:31.974701
duration_ms: 4
database: default
table: replicated_source
table_uuid: 78471c67-24f4-4398-9df5-ad0a6c3daf41
part_name: 2021_0_0_0
partition_id: 2021
partition: 2021
part_type: Compact
disk_name: default
path_on_disk: year=2021/2021_0_0_0_78C704B133D41CB0EF64DD2A9ED3B6BA.parquet
rows: 1
size_in_bytes: 272
merged_from: ['2021_0_0_0']
bytes_uncompressed: 86
read_rows: 1
read_bytes: 6
peak_memory_usage: 22
error: 0
exception:
ProfileEvents: {}
```

### Profile Events

- `PartsExports` - Number of successful part exports
- `PartsExportFailures` - Number of failed part exports
- `PartsExportDuplicated` - Number of part exports that failed because the target file already exists
- `PartsExportTotalMilliseconds` - Total time spent exporting parts, in milliseconds
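
These counters can be inspected in `system.events`, e.g.:

```sql
-- Export-related counters accumulated since server start
SELECT event, value, description
FROM system.events
WHERE event LIKE 'PartsExport%';
```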

56 changes: 56 additions & 0 deletions docs/en/operations/system-tables/exports.md
@@ -0,0 +1,56 @@
---
description: 'System table containing information about in-progress MergeTree part exports'
keywords: ['system table', 'exports', 'merge tree', 'part']
slug: /operations/system-tables/exports
title: 'system.exports'
---

Contains information about in-progress MergeTree part exports.

Columns:

- `source_database` ([String](/docs/en/sql-reference/data-types/string.md)) — Name of the source database.
- `source_table` ([String](/docs/en/sql-reference/data-types/string.md)) — Name of the source table.
- `destination_database` ([String](/docs/en/sql-reference/data-types/string.md)) — Name of the destination database.
- `destination_table` ([String](/docs/en/sql-reference/data-types/string.md)) — Name of the destination table.
- `create_time` ([DateTime](/docs/en/sql-reference/data-types/datetime.md)) — Date and time when the export command was received by the server.
- `part_name` ([String](/docs/en/sql-reference/data-types/string.md)) — Name of the part.
- `destination_file_path` ([String](/docs/en/sql-reference/data-types/string.md)) — Relative path of the destination file the part is being exported to.
- `elapsed` ([Float64](/docs/en/sql-reference/data-types/float.md)) — The time elapsed (in seconds) since the export started.
- `rows_read` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — The number of rows read from the exported part.
- `total_rows_to_read` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — The total number of rows to read from the exported part.
- `total_size_bytes_compressed` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — The total size of the compressed data in the exported part.
- `total_size_bytes_uncompressed` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — The total size of the uncompressed data in the exported part.
- `bytes_read_uncompressed` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — The number of uncompressed bytes read from the exported part.
- `memory_usage` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — Current memory usage in bytes for the export operation.
- `peak_memory_usage` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — Peak memory usage in bytes during the export operation.

**Example**

```sql
arthur :) select * from system.exports;

SELECT *
FROM system.exports

Query id: 2026718c-d249-4208-891b-a271f1f93407

Row 1:
──────
source_database: default
source_table: source_mt_table
destination_database: default
destination_table: destination_table
create_time: 2025-11-19 09:09:11
part_name: 20251016-365_1_1_0
destination_file_path: table_root/eventDate=2025-10-16/retention=365/20251016-365_1_1_0_17B2F6CD5D3C18E787C07AE3DAF16EB1.parquet
elapsed: 2.04845441
rows_read: 1138688 -- 1.14 million
total_rows_to_read: 550961374 -- 550.96 million
total_size_bytes_compressed: 37619147120 -- 37.62 billion
total_size_bytes_uncompressed: 138166213721 -- 138.17 billion
bytes_read_uncompressed: 316892925 -- 316.89 million
memory_usage: 596006095 -- 596.01 million
peak_memory_usage: 601239033 -- 601.24 million
```
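
A derived view over the same table can show approximate progress per export (a sketch):

```sql
-- Approximate progress of running exports
SELECT
    part_name,
    destination_table,
    round(100 * rows_read / total_rows_to_read, 2) AS progress_pct,
    elapsed
FROM system.exports;
```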

2 changes: 1 addition & 1 deletion src/Storages/System/StorageSystemExports.cpp
@@ -20,7 +20,7 @@ ColumnsDescription StorageSystemExports::getColumnsDescription()
{"source_table", std::make_shared<DataTypeString>(), "Name of the source table."},
{"destination_database", std::make_shared<DataTypeString>(), "Name of the destination database."},
{"destination_table", std::make_shared<DataTypeString>(), "Name of the destination table."},
{"create_time", std::make_shared<DataTypeDateTime>(), "Date and time when the export command was submitted for execution."},
{"create_time", std::make_shared<DataTypeDateTime>(), "Date and time when the export command was received by the server."},
{"part_name", std::make_shared<DataTypeString>(), "Name of the part."},
{"destination_file_path", std::make_shared<DataTypeString>(), "File path where the part is being exported."},
{"elapsed", std::make_shared<DataTypeFloat64>(), "The time elapsed (in seconds) since the export started."},