feat(source-instagram): Migrate user_insights stream to low-code #62844
Conversation
from airbyte_cdk.sources.source import TState
from airbyte_cdk.sources.streams.core import Stream
from source_instagram.api import InstagramAPI
from source_instagram.streams import UserInsights
I left this in for now since it's easier to double-check behavior between the Python and low-code implementations.
This will end up getting deleted in the manifest-only migration anyway.
You're referring to the UserInsights class in streams.py?
 connectorType: source
 definitionId: 6acf6b55-4f1e-4fca-944e-1a3caef8aba8
-dockerImageTag: 4.1.0-rc.1
+dockerImageTag: 4.1.0-rc.2
There was an in-progress rollout for rc1 that was never properly performed, so rather than revert everything or trigger one now (it would prevent me from regression testing), I will release both and analyze both affected streams.
pnilan left a comment
Looks good, one minor comment:
- remove composite error handler (this is more of a nit, but IMO we should lessen our dependency on it)
    yield stream_slice


class RFC3339DatetimeSchemaNormalization(TypeTransformer):
We did something similar in source-amplitude (although it was a RecordTransformation). Maybe we should create a component based on this in the future.
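For reference, a custom normalization like this is typically referenced from the manifest's record selector; a minimal sketch, assuming the class lives at source_instagram.components and that the extractor uses an empty field_path (both assumptions, not taken from this PR):

```yaml
record_selector:
  type: RecordSelector
  extractor:
    type: DpathExtractor
    field_path: []   # assumed; the real selector may extract from a nested path
  schema_normalization:
    type: CustomSchemaNormalization
    # Assumed module path for the class shown in the diff above
    class_name: source_instagram.components.RFC3339DatetimeSchemaNormalization
```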
property_list:
  # Chunk 1: period: day, metrics: follower_count,reach
  - day
  - "follower_count,reach"
This makes sense to me, but it feels weird that we need to annotate the chunks. It feels like each chunk should be a discrete object/list. But I guess this is an unusual implementation of property chunking compared to something like HubSpot?
Good question. What I'm doing is a little bit hacky since I'm structuring things in a very specific way for the groupings.
The original intended use case of grouping for HubSpot/LinkedIn and other connectors was that we had an arbitrary list of properties to request from the API, and we had to specify them under a query parameter like fields_to_request=a,b,c,d. Grouping is additional functionality for when we need to make multiple requests.
So I'm not quite using the grouping as it was intended; in a way, follower_count and reach are the actual grouping. You're right that we probably want property_list to be more flexible, like a key/value or object as you mentioned, but with too small a sample size I didn't want to introduce that just yet. And since I found a way without needing to change the interface, I left it as such. I do agree this is not quite the ideal shape.
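To make the mechanics concrete, here is a rough sketch of how the grouped list and the chunk size of 2 fit together; only the property_list entries and the two request_parameters lines come from the diff, while the PropertyChunking block, its field names, and the exact placement inside the requester are assumptions:

```yaml
query_properties:
  type: QueryProperties
  property_list:
    # Chunk 1: period: day, metrics: follower_count,reach
    - day
    - "follower_count,reach"
    # ...remaining period/metric pairs, two entries per chunk...
  property_chunking:
    type: PropertyChunking
    property_limit_type: property_count
    property_limit: 2   # each stream slice carries exactly one [period, metrics] pair

# Each resulting slice exposes its pair via extra_fields, so the requester can
# split it into two separate query parameters:
request_parameters:
  period: "{{ stream_partition.extra_fields['query_properties'][0] }}"
  metric: "{{ stream_partition.extra_fields['query_properties'][1] }}"
```

This is what lets "follower_count,reach" behave as a single property entry even though it expands to multiple metrics in the outbound request.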
period: "{{ stream_partition.extra_fields['query_properties'][0] }}"
metric: "{{ stream_partition.extra_fields['query_properties'][1] }}"
error_handler:
  type: CompositeErrorHandler
Should remove the CompositeErrorHandler; the following has the same behavior but removes our dependency on it. (A year ago we talked about ripping it out because it doesn't actually provide any added functionality.)
error_handler:
  type: DefaultErrorHandler
  max_retries: 5
  backoff_strategies:
    - type: ExponentialBackoffStrategy
      factor: 5
ah yeah that's a good point, no reason for the composite. Will fix!
cursor_datetime_formats:
  - "%Y-%m-%dT%H:%M:%S+00:00"
step: P1D
cursor_granularity: PT0S
Does this granularity imply exclusive date ranges?
It does. I did this to replicate the exact request behavior of the Python implementation, which was inclusive on the same datetime. As far as I understand, this endpoint just returns one insight record per day, and each range is 1 day.
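For context, a minimal sketch of the cursor being discussed; the step, granularity, and datetime format come from the diff above, while cursor_field and start_datetime are placeholders/assumptions:

```yaml
incremental_sync:
  type: DatetimeBasedCursor
  cursor_field: date                              # assumed cursor field
  datetime_format: "%Y-%m-%dT%H:%M:%S+00:00"
  cursor_datetime_formats:
    - "%Y-%m-%dT%H:%M:%S+00:00"
  start_datetime: "{{ config['start_date'] }}"    # hypothetical config key
  step: P1D
  # With PT0S granularity, slice N ends on exactly the datetime where slice N+1
  # starts (e.g. [2024-01-01T00:00:00, 2024-01-02T00:00:00]), which reproduces
  # the inclusive boundaries of the Python implementation.
  cursor_granularity: PT0S
```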
Regression test results (summary screenshots): user_insights w/ state and user_insights w/o state.
Closes https://github.com/airbytehq/airbyte-internal-issues/issues/12630
What
Migrate the final stream from Instagram, UserInsights, to the low-code format.

How
The main complexity of this connector is that we make four separate requests and then merge the records back together by the date value. We emit only 1 record per day, with a maximum query range of the last 30 days (if there is no incoming state).

The mapping of the 4 periods to metrics in the Python implementation was:

And an example final record looks like:
I tried to include clarifying comments, but the main trick I devised was that by structuring the property_list into groups of two, with a chunk size of 2, we can emit resulting stream slices that can be injected as query parameters.

The last thing to note is that this requires a custom state migration, because the existing implementation only incorporated the business_account_id as the slice key. And similar to source-jira, we load the page_id (which needs to be injected into the outbound API request) as extra_fields so that it isn't used in the partition key.
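As a rough illustration of what that migration amounts to (the class name, example IDs, and exact state field names below are assumptions for illustration, not the PR's actual code):

```yaml
state_migrations:
  # Hypothetical custom migration defined in components.py
  - type: CustomStateMigration
    class_name: source_instagram.components.InstagramInsightsStateMigration

# Legacy state, keyed directly by business_account_id:
#   { "17841400000000000": { "date": "2024-01-01T00:00:00+00:00" } }
#
# Low-code per-partition state, with page_id carried via extra_fields rather
# than being part of the partition key:
#   { "states": [ { "partition": { "business_account_id": "17841400000000000" },
#                   "cursor": { "date": "2024-01-01T00:00:00+00:00" } } ] }
```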
Review guide

- manifest.yaml
- components.py

User Impact
No direct impact; however, the format of the state message will change to the new format used by low-code connectors.
Can this PR be safely reverted and rolled back?
Sort of. The code can be, but the state message migrates to the new format, which makes regression a bit tricky. It is doable to rewrite it back.