Open
Labels
enhancement: New feature or request
Description
Is your feature request related to a problem or challenge?
As part of #1382, we need to implement insert_into for IcebergTableProvider to support INSERT INTO queries in DataFusion:
insert into t values (1, 'a');
Physical Plans
Within insert_into, we will need to add several nodes (DataFusion physical plans) to complete the write process. The entire write process is described by the flowchart below:
flowchart TD
    A(["Input Node"]) --> F["Project Node"]
    F --> B["Repartition Node"]
    B --> C["Sort Node"]
    C --> D["Writer Node"]
    D --> E["Commit Node"]
- Input Node: Input physical plan that represents the input data
- Project Node: Calculate the partition value
- Repartition Node: Decide the partitioning mode for the best parallelism
- Sort Node: Sort the input data
- Writer Node: Spawn Iceberg writers and write the input data
- Commit Node: Commit the data written using Iceberg Tx API
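To make the data flow through these nodes concrete, here is a minimal, std-only sketch that models each node as a plain function over in-memory rows. It is a toy model under stated assumptions, not the DataFusion ExecutionPlan API or the Iceberg transaction API: the names Row, ProjectedRow, DataFile, Table, and the functions below are all hypothetical, the partition transform is assumed to be identity on one column, and "repartition" simply groups rows by partition value.

```rust
use std::collections::BTreeMap;

/// Hypothetical input row; stands in for one row of an Arrow record batch.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: i64,
    pub category: String,
}

/// Output of the Project step: the row plus its computed partition value.
#[derive(Debug, Clone, PartialEq)]
pub struct ProjectedRow {
    pub partition: String,
    pub row: Row,
}

/// Hypothetical stand-in for a written data file.
#[derive(Debug, Clone, PartialEq)]
pub struct DataFile {
    pub partition: String,
    pub rows: Vec<Row>,
}

/// Hypothetical stand-in for a table that tracks committed files.
pub struct Table {
    pub files: Vec<DataFile>,
}

/// Project Node: calculate the partition value (assumed: identity on `category`).
pub fn project(input: Vec<Row>) -> Vec<ProjectedRow> {
    input
        .into_iter()
        .map(|row| ProjectedRow { partition: row.category.clone(), row })
        .collect()
}

/// Repartition Node: route all rows with the same partition value to one group.
pub fn repartition(rows: Vec<ProjectedRow>) -> BTreeMap<String, Vec<Row>> {
    let mut groups: BTreeMap<String, Vec<Row>> = BTreeMap::new();
    for p in rows {
        groups.entry(p.partition).or_default().push(p.row);
    }
    groups
}

/// Sort Node: sort rows inside each group (assumed sort key: `id`).
pub fn sort(mut groups: BTreeMap<String, Vec<Row>>) -> BTreeMap<String, Vec<Row>> {
    for rows in groups.values_mut() {
        rows.sort_by_key(|r| r.id);
    }
    groups
}

/// Writer Node: produce one data file per group.
pub fn write(groups: BTreeMap<String, Vec<Row>>) -> Vec<DataFile> {
    groups
        .into_iter()
        .map(|(partition, rows)| DataFile { partition, rows })
        .collect()
}

/// Commit Node: attach all written files to the table in a single step
/// and return the number of rows committed.
pub fn commit(table: &mut Table, files: Vec<DataFile>) -> usize {
    let n: usize = files.iter().map(|f| f.rows.len()).sum();
    table.files.extend(files);
    n
}

/// The whole pipeline, in the order shown in the flowchart.
pub fn insert_into(table: &mut Table, input: Vec<Row>) -> usize {
    commit(table, write(sort(repartition(project(input)))))
}
```

Calling insert_into with rows from two categories yields two data files, one per partition value, each sorted by id. In the real implementation each step would be a separate physical plan operating on record batch streams, which is what allows the Repartition Node to choose a parallelism strategy.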
Writer Extension
Besides the writers mentioned in the writer path of #1382, other writers such as RollingFileWriter can be useful for splitting incoming data into multiple files.
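The rolling idea can be illustrated with a small in-memory sketch. This is not the iceberg-rust RollingFileWriter API: the type RollingWriter, its target_file_size parameter, and the byte-count rollover rule are assumptions made for illustration. Here a "file" is just a byte buffer, and the writer starts a new one whenever the next record would push the current file past the target size.

```rust
/// Hypothetical in-memory rolling writer (illustration only).
pub struct RollingWriter {
    /// Assumed rollover threshold, in bytes.
    target_file_size: usize,
    /// The file currently being written.
    current: Vec<u8>,
    /// Files that have already been rolled over.
    closed: Vec<Vec<u8>>,
}

impl RollingWriter {
    pub fn new(target_file_size: usize) -> Self {
        Self {
            target_file_size,
            current: Vec::new(),
            closed: Vec::new(),
        }
    }

    /// Append one record. If the current file is non-empty and the record
    /// would push it past the target size, roll over to a new file first.
    /// A single record larger than the target still gets a file of its own.
    pub fn write(&mut self, record: &[u8]) {
        let would_overflow = self.current.len() + record.len() > self.target_file_size;
        if !self.current.is_empty() && would_overflow {
            self.roll();
        }
        self.current.extend_from_slice(record);
    }

    /// Finish the current file and start an empty one.
    fn roll(&mut self) {
        let finished = std::mem::take(&mut self.current);
        self.closed.push(finished);
    }

    /// Flush the last file (if any) and return every file written.
    pub fn close(mut self) -> Vec<Vec<u8>> {
        if !self.current.is_empty() {
            self.roll();
        }
        self.closed
    }
}
```

With a target of 10 bytes, writing four 4-byte records produces two 8-byte files, because the third record would overflow the first file. A real writer would more likely check the size reported by the underlying file writer (for example, after each record batch) rather than raw record lengths.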
Tasks List
- Implement RollingFileWriter: Helps split incoming data into multiple files #1541
- Implement Project Node: Calculate the partition value #1542
- Implement Repartition Node: Decide the partitioning mode for the best parallelism #1543
- Implement Sort Node: Sort the input data #1544
- Implement Writer Node: Spawn Iceberg writers and write the input data #1545
- Implement Commit Node: Commit the data written using Iceberg Tx API #1546