- Notifications
You must be signed in to change notification settings - Fork 4.9k
🎉 GitHub Source: Don't minimize output fields #5898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sherifnada left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is making me wonder if we should just not minimize fields at all. it seems to have been a fairly controversial move. I think I would prefer we do that since it's more in line with ELT. @cjwooo would you mind changing the PR to do that instead?
| Yea I can do that. Do other sources have the same field minimization feature? |
| I don't think so, no |
| @sherifnada Should we get some confirmation from the larger community before unminimizing everything? I'm only concerned it would break existing users' workflows. |
| @cjwooo apologies - I should have been clearer. Totally agreed with you that backwards compatibility is desirable here. I think we should:
and we can get rid of the minimization logic. I believe this would be totally backwards compatible. Does that help? |
| That makes sense, but one issue I see is that part of the current minimization logic is transforming a list of objects into a list of ids: https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-github/source_github/streams.py#L200-L201. And the record key for this case stays the same, e.g. https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-github/source_github/schemas/pull_requests.json#L62-L66. I don't think we can suddenly switch the outputted records for the PullRequests stream to have an array of label objects under |
| @sherifnada ^? |
| @cjwooo hmm that's a good point. I think we have a few options then:
I think I would rather do 2 rather than have a scar in the output schema for the rest of time. We could do a major version release to indicate this and manage the roll out such that consumers can prepare themselves for this. WDYT? |
| I'd also rather go with option 2. I'll go ahead and implement that. |
| @sherifnada Updated PR. |
sherifnada left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjwooo apologies to increase the scope on you here, but would you mind removing all minimization? if we're going to make a breaking change I would rather that we do it exactly once rather than phase out minimization in different stages.
Though if the current set of minimized fields will require no breaking changes in the future feel free to let it be and we'll finish that piece on our side. LMK.
Otherwise this looks great! thanks so much for trooping through this issue. I'll merge it as soon as you let me know the verdict on the above.
| @sherifnada Removed all minimization logic. |
What
Our use case requires the full values in certain streams, such as the author and committer fields in the Commits stream, not just their ids.
How
Add an option to the GitHub Source spec, allowing the user to choose whether or not to minimize fields for all streams. By default, the option is true, meaning fields will still be minimized.
Pre-merge Checklist
Community member or Airbyter
airbyte_secret./gradlew :airbyte-integrations:connectors:<name>:integrationTest.README.mdbootstrap.md. See description and examplesdocs/integrations/<source or destination>/<name>.mdincluding changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>command is passing./publishcommand described here