# S3 destination plugin

This destination plugin lets you sync data from a CloudQuery source to remote S3 storage in various formats, such as CSV, JSON and Parquet.
Files are written to `s3://bucket_name/path/to/files`, with each table placed in its own directory. Example configuration:

```yaml
kind: destination
spec:
  name: "s3"
  path: "cloudquery/s3"
  registry: "cloudquery"
  version: "v7.9.9"
  write_mode: "append"
  send_sync_summary: true
  # Learn more about the configuration options at https://cql.ink/s3_destination
  spec:
    bucket: "bucket_name"
    region: "region-name" # Example: us-east-1
    path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: "parquet" # options: parquet, json, csv
    format_spec:
      # CSV specific parameters:
      # delimiter: ","
      # skip_header: false
      # Parquet specific parameters:
      # version: "v2Latest"
      # root_repetition: "repeated"
      # max_row_group_length: 134217728 # 128 * 1024 * 1024

    # Optional parameters
    # compression: "" # options: gzip
    # no_rotate: false
    # athena: false # <- set this to true for Athena compatibility
    # write_empty_objects_for_empty_tables: false # <- set this to true if using with the CloudQuery Compliance policies
    # test_write: true # tests the ability to write to the bucket before processing the data
    # endpoint: "" # Endpoint to use for S3 API calls
    # endpoint_skip_tls_verify: false # Disable TLS verification if using an untrusted certificate
    # use_path_style: false
    # batch_size: 10000 # 10K entries
    # batch_size_bytes: 52428800 # 50 MiB
    # batch_timeout: 30s # 30 seconds
    # max_retries: 3 # 3 retries
    # max_backoff: 30 # 30 seconds
    # part_size: 5242880 # 5 MiB
    # aws_debug: true
    # credentials: # <- Use this to specify non-default role assumption parameters
    #   local_profile: "s3-profile" # Use a local profile instead of the default one
    #   role_arn: "arn:aws:iam::123456789012:role/role_name" # Specify the role to assume
    #   external_id: "external_id" # Used when assuming a role
    #   role_session_name: "session_name" # Used when assuming a role
```

You can use `{{YEAR}}`, `{{MONTH}}`, `{{DAY}}` and `{{HOUR}}` in the path to create a directory structure based on the current time.
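As an illustration, the placeholder expansion can be sketched in a few lines of Python. This is a hypothetical helper, not the plugin's actual implementation; the function name `expand_path` is invented for this example:

```python
import uuid
from datetime import datetime, timezone


def expand_path(template: str, table: str, fmt: str) -> str:
    """Illustrative expansion of the S3 destination path placeholders."""
    now = datetime.now(timezone.utc)  # timestamps are UTC, taken at write time
    replacements = {
        "{{TABLE}}": table,
        "{{TABLE_HYPHEN}}": table.replace("_", "-"),
        "{{FORMAT}}": fmt,
        "{{UUID}}": str(uuid.uuid4()),  # unique per file
        "{{YEAR}}": now.strftime("%Y"),
        "{{MONTH}}": now.strftime("%m"),
        "{{DAY}}": now.strftime("%d"),
        "{{HOUR}}": now.strftime("%H"),
        "{{MINUTE}}": now.strftime("%M"),
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    return template


print(expand_path("path/to/files/{{TABLE}}/dt={{YEAR}}-{{MONTH}}-{{DAY}}/{{UUID}}.parquet",
                  "aws_ec2_instances", "parquet"))
```

Because `{{UUID}}` is freshly generated per file, every batch lands in a distinct object unless `no_rotate` is enabled.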
For example: `path: "path/to/files/{{TABLE}}/dt={{YEAR}}-{{MONTH}}-{{DAY}}/{{UUID}}.parquet"`. The same time-based placeholders work for the `json` and `csv` formats. This plugin currently supports only the `append` write_mode. The (top level) spec section is described in the Destination Spec Reference. Writes are batched, controlled by the `batch_size`, `batch_size_bytes` and `batch_timeout` options (see below).

## S3 spec

- `bucket` (string) (required): the name of the S3 bucket to write to.
- `region` (string) (required): the AWS region of the bucket.
- `credentials` (credentials) (optional): see the credentials section below.
- `path` (string) (required): the path where files will be written, for example `path/to/files/{{TABLE}}/{{UUID}}.parquet`. The following placeholders are supported:
  - `{{TABLE}}` will be replaced with the table name
  - `{{TABLE_HYPHEN}}` will be replaced with the table name with hyphens instead of underscores
  - `{{SYNC_ID}}` will be replaced with the unique identifier of the sync. This value is a UUID and is randomly generated for each sync.
  - `{{FORMAT}}` will be replaced with the file format, such as `csv`, `json` or `parquet`. If compression is enabled, the format will be `csv.gz`, `json.gz` etc.
  - `{{UUID}}` will be replaced with a random UUID to uniquely identify each file
  - `{{YEAR}}` will be replaced with the current year in `YYYY` format
  - `{{MONTH}}` will be replaced with the current month in `MM` format
  - `{{DAY}}` will be replaced with the current day in `DD` format
  - `{{HOUR}}` will be replaced with the current hour in `HH` format
  - `{{MINUTE}}` will be replaced with the current minute in `mm` format

  Timestamps are in UTC and will be the current time at the time the file is written, not when the sync started.
- `format` (string) (required): one of `csv`, `json` and `parquet`.
- `format_spec` (format_spec) (optional): see the format_spec sections below.
- `server_side_encryption_configuration` (server_side_encryption_configuration) (optional): see below.
- `compression` (string) (optional) (default: `""`): one of `""` or `gzip`. Not supported for the `parquet` format.
- `no_rotate` (boolean) (optional) (default: `false`): if set to `true`, the plugin will write to one file per table. Otherwise, for every batch a new file will be created with a different `.<UUID>` suffix.
- `athena` (boolean) (optional) (default: `false`): when set to `true`, the S3 plugin will sanitize keys in JSON columns to be compatible with the Hive Metastore / Athena. This allows tables to be created with a Glue Crawler and then queried via Athena, without changes to the table schema.
- `write_empty_objects_for_empty_tables` (boolean) (optional) (default: `false`): if set to `true`, an object is written even for tables with no rows. Set this to `true` if using the CloudQuery Compliance policies.
- `test_write` (boolean) (optional) (default: `true`): tests the ability to write to the bucket before processing the data. Set to `false` to skip the test.
- `endpoint` (string) (optional) (default: `""`): endpoint to use for S3 API calls. If the endpoint uses path-style addressing, such as `https://s3.amazonaws.com/BUCKET/KEY`, `use_path_style` should be enabled, too.
- `acl` (string) (optional) (default: `""`): one of `private`, `public-read`, `public-read-write`, `authenticated-read`, `aws-exec-read`, `bucket-owner-read`, `bucket-owner-full-control`.
- `endpoint_skip_tls_verify` (boolean) (optional) (default: `false`): disable TLS verification for requests to the `endpoint` option.
- `use_path_style` (boolean) (optional) (default: `false`): use path-style addressing with the `endpoint` option, i.e., `https://s3.amazonaws.com/BUCKET/KEY`. By default, the S3 client will use virtual hosted bucket addressing when possible (`https://BUCKET.s3.amazonaws.com/KEY`).
- `batch_size` (integer) (optional) (default: `10000`)
- `batch_size_bytes` (integer) (optional) (default: `52428800` (= 50 MiB))
- `batch_timeout` (duration) (optional) (default: `30s` (30 seconds))

### format_spec (csv)

- `delimiter` (string) (optional) (default: `,`)
- `skip_header` (boolean) (optional) (default: `false`): if set to `true`, the CSV file will not contain a header row as the first row.

### format_spec (parquet)

- `version` (string) (optional) (default: `v2Latest`): one of `v1.0`, `v2.4`, `v2.6` and `v2Latest`.
  `v2Latest` is an alias for the latest version available in the Parquet library, which is currently `v2.6`.
- `root_repetition` (string) (optional) (default: `repeated`): one of `undefined`, `required`, `optional` and `repeated`.
- `max_row_group_length` (integer) (optional) (default: `134217728` (= 128 * 1024 * 1024))

### server_side_encryption_configuration

- `sse_kms_key_id` (string) (required): the KMS key ID to use for `server_side_encryption`.
- `server_side_encryption` (string) (required): one of `AES256`, `aws:kms` and `aws:kms:dsse`.

### credentials

- `local_profile` (string) (default: will use current credentials): the local profile to use to authenticate this account. For example, given the following `~/.aws/credentials` file:

  ```ini
  [default]
  aws_access_key_id=xxxx
  aws_secret_access_key=xxxx

  [user1]
  aws_access_key_id=xxxx
  aws_secret_access_key=xxxx
  ```

  `local_profile` should be set to either `default` or `user1`.
- `role_arn` (string): the ARN of the role to assume.
- `role_session_name` (string): the session name to use when assuming `role_arn`.
- `external_id` (string): the external ID to use when assuming `role_arn`.

## Authentication

The plugin requires only `PutObject` permissions (we will never make any changes to your cloud setup), so, following the principle of least privilege, it's recommended to grant it `PutObject` permissions only.

CloudQuery can authenticate with AWS in several ways:

- The `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN` environment variables (`AWS_SESSION_TOKEN` can be optional for some accounts) - see the AWS guide.
- The `credentials` and `config` files in `~/.aws` (the `credentials` file takes priority).
- `aws sso` - see Configuring IAM Identity Center authentication with the AWS CLI.

For information on obtaining credentials, see the AWS guide.
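Following the least-privilege recommendation, a minimal IAM policy granting only `s3:PutObject` might look like this. This is a sketch, assuming the example `bucket_name` from the configuration above; scope the `Resource` ARN to your own bucket and path:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::bucket_name/path/to/files/*"
    }
  ]
}
```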
To export the environment variables (on Linux/Mac; similar for Windows):

```shell
export AWS_ACCESS_KEY_ID='{Your AWS Access Key ID}'
export AWS_SECRET_ACCESS_KEY='{Your AWS secret access key}'
export AWS_SESSION_TOKEN='{Your AWS session token}'
```

CloudQuery can also use the `credentials` and `config` files in the `.aws` directory in your home folder. The contents of these files are practically interchangeable, but CloudQuery will prioritize credentials in the `credentials` file. For information about obtaining credentials, see the AWS guide. Here are example contents for a `credentials` file:

```ini
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

[myprofile]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```

To use a named profile, set the `AWS_PROFILE` environment variable (on Linux/Mac, similar for Windows):

```shell
export AWS_PROFILE=myprofile
```

Or specify the profile via the `local_profile` field:

```yaml
accounts:
  - id: <account_alias>
    local_profile: myprofile
```

If you use MFA, you can obtain temporary credentials with `aws sts get-session-token` and export them:

```shell
aws sts get-session-token --serial-number <YOUR_MFA_SERIAL_NUMBER> --token-code <YOUR_MFA_TOKEN_CODE> --duration-seconds 3600
export AWS_ACCESS_KEY_ID=<YOUR_ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<YOUR_SECRET_ACCESS_KEY>
export AWS_SESSION_TOKEN=<YOUR_SESSION_TOKEN>
```
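Putting the credential options together, a destination spec that assumes a role from a named local profile might look like this. This is a sketch; the account ID, role, external ID and profile names are placeholders:

```yaml
kind: destination
spec:
  name: "s3"
  path: "cloudquery/s3"
  registry: "cloudquery"
  version: "v7.9.9"
  write_mode: "append"
  spec:
    bucket: "bucket_name"
    region: "us-east-1"
    path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: "parquet"
    credentials:
      local_profile: "myprofile"
      role_arn: "arn:aws:iam::123456789012:role/role_name"
      external_id: "external_id"
      role_session_name: "session_name"
```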