@flash1293
Contributor

@flash1293 flash1293 commented Jul 13, 2023

What

The Apify connector has been broken for a while: it doesn't set the schema properly (the schema is completely empty and doesn't declare any fields), so the sync fails.

This PR fixes the problem by wrapping all fields in a single data object that will be normalized as a single column in the destination. This can be refined later on with more sophisticated detection logic.
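
A minimal sketch of the wrapping idea described above, not the connector's actual code. The names `WRAPPER_SCHEMA`, `wrap_record`, and `wrap_records` are hypothetical, and the exact schema used in the PR may differ. Each dataset item is nested under a single `data` key, and the stream schema declares only that one object-typed property, so the schema no longer depends on the dataset's unknown field names.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical static schema: one declared property, `data`, whose inner
# fields are intentionally left undeclared so that any dataset item fits.
WRAPPER_SCHEMA: Dict[str, Any] = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "data": {"type": "object", "additionalProperties": True},
    },
}


def wrap_record(item: Dict[str, Any]) -> Dict[str, Any]:
    """Nest one raw dataset item under the single `data` key."""
    return {"data": item}


def wrap_records(items: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Lazily wrap every item of a dataset page."""
    for item in items:
        yield wrap_record(item)


if __name__ == "__main__":
    # Two items with completely different fields still match the same schema.
    page = [{"url": "https://example.com", "price": 12.5}, {"title": "no url here"}]
    for record in wrap_records(page):
        print(record)
```

Because the schema is static, items with different or changing fields all map to the same single `data` column.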

@github-actions
Contributor

github-actions bot commented Jul 13, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan and you've followed all steps in the Breaking Changes Checklist
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • The connector tests are passing in CI
  • You've updated the connector's metadata.yaml file (new!)
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@octavia-squidington-iii
Collaborator

source-apify-dataset test report (commit 8639f91075) - ❌

⏲️ Total pipeline duration: 02mn22s

Step Result
Validate airbyte-integrations/connectors/source-apify-dataset/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-apify-dataset docker image for platform linux/x86_64
Unit tests
Acceptance tests

🔗 View the logs here

Please note that tests are only run on PRs that are ready for review. Set your PR to draft mode to avoid flooding the CI engine and upstream service on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool using the following command:

airbyte-ci connectors --name=source-apify-dataset test
@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Jul 13, 2023
@flash1293 flash1293 changed the title from "Fix apify connector" to "🐛 Source apify-dataset: Fix broken sync" Jul 13, 2023
@octavia-squidington-iii
Collaborator

source-apify-dataset test report (commit 1c5e3ffc7f) - ✅

⏲️ Total pipeline duration: 01mn34s

Step Result
Validate airbyte-integrations/connectors/source-apify-dataset/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-apify-dataset docker image for platform linux/x86_64
Unit tests
Acceptance tests

@octavia-squidington-iii
Collaborator

source-apify-dataset test report (commit f32f7facc6) - ❌

⏲️ Total pipeline duration: 01mn42s

Step Result
Validate airbyte-integrations/connectors/source-apify-dataset/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-apify-dataset docker image for platform linux/x86_64
Unit tests
Acceptance tests

@flash1293 flash1293 requested a review from a team July 14, 2023 11:07
@octavia-squidington-iii
Collaborator

source-apify-dataset test report (commit 1ba40c9215) - ✅

⏲️ Total pipeline duration: 01mn55s

Step Result
Validate airbyte-integrations/connectors/source-apify-dataset/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-apify-dataset docker image for platform linux/x86_64
Unit tests
Acceptance tests

"type": "object",
"properties": {
# as datasets are not typed and we only know the field name, each field is defined as oneOf of all possible types
field: {"type": ["array", "object", "boolean", "number", "integer", "string"]}
Contributor

I think this will confuse downstream destinations. Should we just embed all the data in a wrapper object whose fields aren't declared?

Contributor Author

@flash1293 flash1293 Jul 14, 2023

AFAIK it should turn every column into a JSONB column, which seemed better to me than a single wrapper object.
I actually started out that way (see the first commit, 03ceb1d), but I reconsidered because this approach, while not perfect, seemed a little better.

If you have concerns, we can also go with the wrapper object for now. It would be a breaking change, but that's probably not a huge deal since the connector isn't functional at all at the moment.

Contributor

I think I'd recommend a wrapper object, since it gets translated to JSONB, VARIANT, etc. by most downstream destinations.
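
To make the two options discussed in this thread concrete, here is a hedged sketch. `union_schema` and `wrapper_schema` are hypothetical helper names, not functions from the connector. The first mirrors the diff excerpt above (one property per discovered field name, each allowing every listed type); the second declares a single `data` object and ignores field names entirely.

```python
from typing import Any, Dict, List

# The type list from the diff excerpt under review.
ALL_TYPES = ["array", "object", "boolean", "number", "integer", "string"]


def union_schema(field_names: List[str]) -> Dict[str, Any]:
    """Per-field variant: every known field is declared with all types."""
    return {
        "type": "object",
        "properties": {name: {"type": list(ALL_TYPES)} for name in field_names},
    }


def wrapper_schema() -> Dict[str, Any]:
    """Wrapper variant: a single `data` object with undeclared inner fields."""
    return {
        "type": "object",
        "properties": {"data": {"type": "object", "additionalProperties": True}},
    }


if __name__ == "__main__":
    # The per-field schema grows with the dataset; the wrapper schema does not.
    print(sorted(union_schema(["url", "price"])["properties"]))
    print(sorted(wrapper_schema()["properties"]))
```

In this sketch the per-field schema changes whenever the dataset's fields change, while the wrapper schema always declares exactly one property.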

@octavia-squidington-iii
Collaborator

source-apify-dataset test report (commit d147f42ecc) - ❌

⏲️ Total pipeline duration: 01mn44s

Step Result
Validate airbyte-integrations/connectors/source-apify-dataset/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-apify-dataset docker image for platform linux/x86_64
Unit tests
Acceptance tests

@octavia-squidington-iii
Collaborator

source-apify-dataset test report (commit 26bf15af49) - ✅

⏲️ Total pipeline duration: 01mn46s

Step Result
Validate airbyte-integrations/connectors/source-apify-dataset/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-apify-dataset docker image for platform linux/x86_64
Unit tests
Acceptance tests

@flash1293 flash1293 merged commit b6531ac into master Jul 18, 2023
@flash1293 flash1293 deleted the flash1293/apify-fix branch July 18, 2023 09:28
efimmatytsin pushed a commit to scentbird/airbyte that referenced this pull request Jul 27, 2023
* fix connector
* fix more things
* prepare release
* cleanup
* format
* revert back to wrapper object
* format
Labels

area/connectors (Connector related issues)
area/documentation (Improvements or additions to documentation)
connectors/source/apify-dataset

4 participants