🎉 Source S3: support of Parquet format #5305
Conversation
/test connector=connectors/source-s3
/test connector=connectors/source-s3
/test connector=connectors/source-s3
/test connector=connectors/source-s3
bazarnov left a comment
Please resolve the branch conflicts, including the conflict in airbyte-integrations/connectors/source-hubspot/source_hubspot/api.py on this branch.
/test connector=connectors/source-s3
/test connector=connectors/source-s3
/test connector=connectors/source-s3
…es_abstract/formats/parquet_spec.py Co-authored-by: George Claireaux <george@claireaux.co.uk>
Phlair left a comment
Great, LGTM! One small note on buffer_size to make that clearer.
Due to the way we're iterating through individual files at the abstract level, I anticipate issues with partitioned Parquet datasets. I think we should make clear in the documentation that partitioned Parquet datasets are unsupported for now.
For more context: it should work, but performance could be very bad, and the columns used for partitioning would be missing from the output (I think).
...te-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/parquet_spec.py
/test connector=connectors/source-s3
/publish connector=connectors/source-s3
/test connector=connectors/source-s3
/test connector=connectors/source-s3
/publish connector=connectors/source-s3
How
Uses the same library, pyarrow, that is already used for CSV parsing.
Recommended reading order
1. formats/parquet_spec.py
2. formats/parquet_parser.py

Pre-merge Checklist
Updating a connector

Community member or Airbyter
- Secrets in the connector's spec are annotated with airbyte_secret
- Integration tests pass: ./gradlew :airbyte-integrations:connectors:<name>:integrationTest
- Documentation updated: README.md and docs/integrations/<source or destination>/<name>.md, including changelog. See changelog example

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- The /test connector=connectors/<name> command is passing
- The connector has been published with the /publish command described here