@antixar antixar commented Aug 10, 2021

How

Uses the same library, pyarrow, that is already used for CSV parsing.

Recommended reading order

  1. formats/parquet_spec.py
  2. formats/parquet_parser.py

Pre-merge Checklist

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g. a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors, run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Changelog updated in docs/integrations/<source or destination>/<name>.md. See changelog example
  • PR name follows PR naming conventions
  • Connector version bumped as described here

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here

@antixar antixar linked an issue Aug 10, 2021 that may be closed by this pull request
@antixar antixar self-assigned this Aug 10, 2021
@github-actions github-actions bot added the area/connectors Connector related issues label Aug 10, 2021

antixar commented Aug 12, 2021

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1125652227
❌ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1125652227

@jrhizor jrhizor temporarily deployed to more-secrets August 12, 2021 22:26 Inactive
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Aug 13, 2021

antixar commented Aug 13, 2021

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1126887837
❌ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1126887837

@jrhizor jrhizor temporarily deployed to more-secrets August 13, 2021 07:54 Inactive

antixar commented Aug 13, 2021

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1127075915
❌ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1127075915

@jrhizor jrhizor temporarily deployed to more-secrets August 13, 2021 09:05 Inactive

antixar commented Aug 13, 2021

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1127128527
✅ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1127128527

@jrhizor jrhizor temporarily deployed to more-secrets August 13, 2021 09:21 Inactive
@antixar antixar requested review from bazarnov and midavadim August 13, 2021 09:47

@bazarnov bazarnov left a comment

Please fix branch conflicts + fix the airbyte-integrations/connectors/source-hubspot/source_hubspot/api.py file conflict on this branch.


antixar commented Aug 23, 2021

/test connector=connectors/source-s3

antixar commented Aug 23, 2021

/test connector=connectors/source-s3


antixar commented Aug 23, 2021

/test connector=connectors/source-s3

@antixar antixar requested a review from bazarnov August 23, 2021 12:46
antixar and others added 2 commits August 30, 2021 15:27
…es_abstract/formats/parquet_spec.py Co-authored-by: George Claireaux <george@claireaux.co.uk>
…es_abstract/formats/parquet_spec.py Co-authored-by: George Claireaux <george@claireaux.co.uk>
@airbytehq airbytehq deleted a comment from Phlair Aug 30, 2021
@antixar antixar requested a review from Phlair August 30, 2021 22:23

@Phlair Phlair left a comment

Great, lgtm! One small note on buffer_size to make that more clear.

Due to the way we're iterating through individual files at the abstract level, I anticipate issues with partitioned parquet datasets. I think we should make clear in the documentation that partitioned parquet datasets are unsupported for now.
For more context: it should work, but performance could be very bad and the columns used for partitioning would be missing from the output (I think).
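
To illustrate the missing-columns concern: in a Hive-style partitioned dataset, partition values are typically encoded in directory names (`name=value`) rather than stored inside each data file, so a reader that opens files one by one never sees them. The following is a minimal stdlib-only sketch, not code from this PR; the helper `partition_values` and the object key are hypothetical and only show where those values live.

```python
from pathlib import PurePosixPath


def partition_values(key: str) -> dict:
    """Collect Hive-style `name=value` pairs from the directory part of an object key."""
    values = {}
    # Skip the last path component: it is the file name, not a partition directory.
    for part in PurePosixPath(key).parts[:-1]:
        name, sep, value = part.partition("=")
        if sep and name:
            values[name] = value
    return values


# Hypothetical object key, for illustration only.
key = "sales/year=2021/month=08/part-0000.parquet"

# These values exist only in the path; a per-file read would not return them as columns.
print(partition_values(key))
```

A connector that wanted to support such layouts would need to merge values like these back into every record read from the corresponding file, which per-file iteration as described above does not do.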


antixar commented Aug 31, 2021

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1186397350
❌ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1186397350

@jrhizor jrhizor temporarily deployed to more-secrets August 31, 2021 13:54 Inactive

antixar commented Aug 31, 2021

/publish connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1186480409
❌ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1186480409

@jrhizor jrhizor temporarily deployed to more-secrets August 31, 2021 14:17 Inactive
@sherifnada sherifnada removed their request for review September 1, 2021 00:39

antixar commented Sep 3, 2021

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1197766878
❌ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1197766878

@jrhizor jrhizor temporarily deployed to more-secrets September 3, 2021 11:12 Inactive

antixar commented Sep 4, 2021

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1201855623
✅ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1201855623

@jrhizor jrhizor temporarily deployed to more-secrets September 4, 2021 22:59 Inactive

antixar commented Sep 4, 2021

/publish connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1201885536
✅ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1201885536

@jrhizor jrhizor temporarily deployed to more-secrets September 4, 2021 23:18 Inactive
@antixar antixar merged commit e5c44e6 into master Sep 4, 2021
@antixar antixar deleted the antixar/5102-source-s3-support-parquet branch September 4, 2021 23:40

Labels

  • area/connectors: Connector related issues
  • area/documentation: Improvements or additions to documentation

6 participants