Open
Description
BigQuery has the following quota policy for load files.
So it would be better to split the output file every 4 GB.
| File Type | Compressed | Uncompressed |
|---|---|---|
| CSV | 4 GB | 4 GB with new-lines in strings; 5 TB without new-lines in strings |
| JSON | 4 GB | 5 TB |
Problems
- Files have to be split at an end of line (CRLF/LF/CR), not purely by file size.
- Splitting the records before output is better than splitting the output file afterwards, because Embulk runs multiple tasks on multiple CPU cores.
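
To illustrate the first point, here is a minimal Python sketch of size-based splitting that only cuts at record boundaries. It is not Embulk or plugin code: `split_records` and `max_bytes` are hypothetical names, and it assumes each record is already serialized as bytes and ends with its own line terminator.

```python
from typing import Iterable, Iterator, List


def split_records(records: Iterable[bytes], max_bytes: int) -> Iterator[List[bytes]]:
    """Group serialized records into chunks of at most max_bytes each.

    Assumption: every record already ends with its line terminator
    (CRLF, LF, or CR), so a chunk boundary always falls at an end of
    line and never inside a record.
    """
    chunk: List[bytes] = []
    size = 0
    for record in records:
        # Close the current chunk if this record would push it over the limit.
        if chunk and size + len(record) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(record)
        size += len(record)
    # Emit whatever is left after the last record.
    if chunk:
        yield chunk


records = [b"a,1\r\n", b"b,2\n", b"c,3\n"]
chunks = list(split_records(records, max_bytes=9))
```

A single record larger than `max_bytes` still ends up alone in an oversized chunk, so a real implementation would need to decide how to handle that case. Running this grouping inside each task before writing, rather than on a finished file, would fit the second point above.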
kosukekurimoto