🐛 Source apify-dataset: Fix broken sync #28290
| Step | Report 1 | Report 2 | Report 3 | Report 4 |
|---|---|---|---|---|
| Validate airbyte-integrations/connectors/source-apify-dataset/metadata.yaml | ✅ | ✅ | ✅ | ✅ |
| Connector version semver check | ✅ | ✅ | ✅ | ✅ |
| Connector version increment check | ❌ | ✅ | ✅ | ✅ |
| QA checks | ✅ | ✅ | ✅ | ✅ |
| Code format checks | ✅ | ✅ | ❌ | ✅ |
| Connector package install | ✅ | ✅ | ✅ | ✅ |
| Build source-apify-dataset docker image for platform linux/x86_64 | ✅ | ✅ | ✅ | ✅ |
| Unit tests | ✅ | ✅ | ✅ | ✅ |
| Acceptance tests | ✅ | ✅ | ✅ | ✅ |

Please note that tests are only run on PRs that are ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream service with subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool using the following command:

```
airbyte-ci connectors --name=source-apify-dataset test
```

```python
"type": "object",
"properties": {
    # as datasets are not typed and we only know the field name, each field is defined as oneOf of all possible types
    field: {"type": ["array", "object", "boolean", "number", "integer", "string"]}
```
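To make the per-field approach in the snippet above concrete, here is a minimal sketch, assuming the field names are gathered from the top-level keys of a sample of dataset items. `collect_field_names` and `build_per_field_schema` are hypothetical helpers for illustration, not the connector's actual code; the type union is copied from the snippet.

```python
from typing import Any, Dict, Iterable, List

# Same union as the reviewed snippet: dataset items are untyped, so any field
# may hold any JSON type.
ANY_JSON_TYPE: List[str] = ["array", "object", "boolean", "number", "integer", "string"]


def collect_field_names(items: Iterable[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: top-level keys across sample items, in first-seen order."""
    seen: Dict[str, None] = {}
    for item in items:
        for key in item:
            seen.setdefault(key, None)
    return list(seen)


def build_per_field_schema(field_names: Iterable[str]) -> Dict[str, Any]:
    """Hypothetical helper: declare each known field with the full type union."""
    return {
        "type": "object",
        "properties": {name: {"type": list(ANY_JSON_TYPE)} for name in field_names},
    }


sample = [
    {"url": "https://example.com", "title": "Example"},
    {"url": "https://example.org", "price": 9.5},
]
schema = build_per_field_schema(collect_field_names(sample))
print(list(schema["properties"]))  # ['url', 'title', 'price']
```

Under this approach, a key that only appears in items outside the sample would stay undeclared in the schema.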
I think this will confuse downstream destinations. Should we just embed all the data in a wrapper object whose fields aren't declared?
AFAIK, it should turn every column into a `jsonb` column, which seemed better to me than a single wrapper object.
I actually started out that way (see the first commit 03ceb1d), but I reconsidered because, while not perfect, this approach seemed a little better.
If you have concerns, we can also go with the wrapper object for now. It would be a breaking change, but that's probably not a huge deal since the connector isn't functional at all at the moment.
I think I'd recommend a wrapper object, since it gets translated to JSONB, VARIANT, etc. by most downstream destinations.
| Step | Report 5 | Report 6 |
|---|---|---|
| Validate airbyte-integrations/connectors/source-apify-dataset/metadata.yaml | ✅ | ✅ |
| Connector version semver check | ✅ | ✅ |
| Connector version increment check | ✅ | ✅ |
| QA checks | ✅ | ✅ |
| Code format checks | ❌ | ✅ |
| Connector package install | ✅ | ✅ |
| Build source-apify-dataset docker image for platform linux/x86_64 | ✅ | ✅ |
| Unit tests | ✅ | ✅ |
| Acceptance tests | ✅ | ✅ |

* fix connector
* fix more things
* prepare release
* cleanup
* format
* revert back to wrapper object
* format
What
The Apify connector has been broken for a while because it doesn't set the schema properly, so the sync fails (the schema is completely empty and doesn't declare any fields).
This PR fixes the problem by wrapping all fields in a single `data` object, which will be normalized as a single column in the destination. This can be refined later with more sophisticated detection logic.
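A minimal sketch of the wrapper approach described above, assuming each dataset item arrives as a plain dict and the wrapper property is named `data` as in the description. `DATASET_ITEM_SCHEMA` and `to_records` are hypothetical names for illustration, not the connector's actual code.

```python
import json
from typing import Any, Dict, Iterable, Iterator

# Hypothetical stream schema: a single declared property, "data", typed as an
# object whose inner fields are deliberately left undeclared, so any item fits.
DATASET_ITEM_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "data": {"type": "object"},
    },
}


def to_records(items: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: wrap each untyped dataset item in a "data" field."""
    for item in items:
        yield {"data": item}


items = [
    {"url": "https://example.com", "title": "Example"},
    {"url": "https://example.org", "price": 9.5, "tags": ["a", "b"]},
]
for record in to_records(items):
    print(json.dumps(record))
```

Because only `data` is declared, each item's shape can vary freely without invalidating the schema; splitting items into typed columns is left to the later detection logic mentioned above.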