Managed I/O supports the following capabilities for Apache Iceberg:
| Capability | Details |
|---|---|
| Catalogs | |
| Read capabilities | Batch read |
| Write capabilities | |
For BigQuery tables for Apache Iceberg, use the `BigQueryIO` connector with the BigQuery Storage API. The table must already exist; dynamic table creation is not supported.
## Requirements
The following SDKs support managed I/O for Apache Iceberg:
- Apache Beam SDK for Java version 2.58.0 or later
- Apache Beam SDK for Python version 2.61.0 or later
## Configuration
Managed I/O for Apache Iceberg supports the following configuration parameters:
### ICEBERG Read
| Configuration | Type | Description |
|---|---|---|
| table | str | Identifier of the Iceberg table. |
| catalog_name | str | Name of the catalog containing the table. |
| catalog_properties | map[str, str] | Properties used to set up the Iceberg catalog. |
| config_properties | map[str, str] | Properties passed to the Hadoop Configuration. |
| drop | list[str] | A subset of column names to exclude from reading. If null or empty, all columns will be read. |
| filter | str | SQL-like predicate to filter data at scan time. Example: "id > 5 AND status = 'ACTIVE'". Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html |
| keep | list[str] | A subset of column names to read exclusively. If null or empty, all columns will be read. |
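To make the `keep` and `drop` read semantics above concrete, here is a minimal pure-Python sketch of how a single record would be projected. The function name and sample record are illustrative only; this is not the Beam implementation or API.

```python
def project_columns(row, keep=None, drop=None):
    """Project one record the way the read parameters describe.

    keep: read only these columns; drop: exclude these columns.
    If both are null/empty, all columns are read.
    """
    if keep:
        return {name: row[name] for name in keep if name in row}
    if drop:
        return {name: value for name, value in row.items() if name not in drop}
    return dict(row)  # null or empty: all columns are read

row = {"id": 7, "status": "ACTIVE", "payload": "..."}
print(project_columns(row, keep=["id", "status"]))  # {'id': 7, 'status': 'ACTIVE'}
print(project_columns(row, drop=["payload"]))       # {'id': 7, 'status': 'ACTIVE'}
```

The `filter` parameter applies a similar reduction on rows rather than columns, using an Apache Calcite predicate evaluated at scan time.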
### ICEBERG Write
| Configuration | Type | Description |
|---|---|---|
| table | str | A fully-qualified table identifier. You may also provide a template to write to multiple dynamic destinations, for example: `dataset.my_{col1}_{col2.nested}_table`. |
| catalog_name | str | Name of the catalog containing the table. |
| catalog_properties | map[str, str] | Properties used to set up the Iceberg catalog. |
| config_properties | map[str, str] | Properties passed to the Hadoop Configuration. |
| drop | list[str] | A list of field names to drop from the input record before writing. Is mutually exclusive with 'keep' and 'only'. |
| keep | list[str] | A list of field names to keep in the input record. All other fields are dropped before writing. Is mutually exclusive with 'drop' and 'only'. |
| only | str | The name of a single record field that should be written. Is mutually exclusive with 'keep' and 'drop'. |
| partition_fields | list[str] | Fields used to create a partition spec that is applied when tables are created. For a field 'foo', the available partition transforms include identity (`foo`), `truncate`, `bucket`, `year`, `month`, `day`, `hour`, and `void`. For more information on partition transforms, see https://iceberg.apache.org/spec/#partition-transforms. |
| table_properties | map[str, str] | Iceberg table properties to be set on the table when it is created. For more information on table properties, please visit https://iceberg.apache.org/docs/latest/configuration/#table-properties. |
| triggering_frequency_seconds | int32 | For a streaming pipeline, sets the frequency, in seconds, at which snapshots are produced. |
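The `table` parameter's dynamic-destination templates route each record to a table derived from its field values. The following pure-Python sketch shows how a template such as `dataset.my_{col1}_{col2.nested}_table` could resolve against one record; the function and the sample record are hypothetical, not Beam's own resolver.

```python
import re

def resolve_table_template(template, record):
    """Resolve {field} and {outer.nested} placeholders against one record.

    Illustrative only: mirrors the documented template form, e.g.
    'dataset.my_{col1}_{col2.nested}_table'.
    """
    def lookup(match):
        value = record
        for part in match.group(1).split("."):  # walk nested fields
            value = value[part]
        return str(value)
    return re.sub(r"\{([^}]+)\}", lookup, template)

record = {"col1": "events", "col2": {"nested": "2024"}}
print(resolve_table_template("dataset.my_{col1}_{col2.nested}_table", record))
# dataset.my_events_2024_table
```

With templates like this, a single write transform can fan records out to many tables, creating them as needed with the configured `partition_fields` and `table_properties`.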
## What's next
For more information and code examples, see the following topics: