Skip to content

Conversation

@brianjlai
Copy link
Contributor

Closes https://github.com/airbytehq/airbyte-internal-issues/issues/12630

What

Migrate the final stream from Instagram UserInsights to low-code format.

How

The main complexity of this connector is that we make four separate requests and then merge the record back together by the date value. And we emit only 1 record per day with a maximum query for the last 30 days (if there no incoming state).

The mapping of 4 periods to metrics in the Python implementation was:

METRICS_BY_PERIOD = { "day": [ "follower_count", "reach", ], "week": ["reach"], "days_28": ["reach"], "lifetime": ["online_followers"], } 

And an example final records looks like:

{ "follower_count": 10, "date": "2025-06-28T07:00:00+00:00", "reach": 3, "page_id": "12345", "business_account_id": "1234567890", "reach_week": 4, "reach_days_28": 5, "online_followers": { "0": 159, "1": 157 } } 

I tried to included clarifying comments, but the main trick I devised was that by structuring the property_list into groups of two, and a chunk size of 2, we can emit resulting stream slices that can be injected as query parameters.

And the last thing to note is the this requires a custom state migration because the existing implementation only incorporated the business_account_id as the slice key. And similar to source-jira we load the page_id (which needs to be injected into the outbound api request) as extra_fields so that it isn't used in the partition key.

Review guide

  1. manifest.yaml
  2. components.py

User Impact

No direct impact, however the format of the state message will change to the new format used by low-code connectors

Can this PR be safely reverted and rolled back?

Sort of. The code can be, but the state message migrates to the new format which makes regression a bit tricky. It is doable to rewrite it bac

  • YES 💚
  • NO ❌
@vercel
Copy link

vercel bot commented Jul 8, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Jul 8, 2025 10:11pm
@github-actions
Copy link
Contributor

github-actions bot commented Jul 8, 2025

👋 Greetings, Contributor!

Here are some helpful tips and reminders for your convenience.

Helpful Resources

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • /format-fix - Fixes most formatting issues.
  • /bump-version - Bumps connector versions.
    • You can specify a custom changelog by passing changelog. Example: /bump-version changelog="My cool update"
    • Leaving the changelog arg blank will auto-populate the changelog from the PR title.
  • /run-cat-tests - Runs legacy CAT tests (Connector Acceptance Tests)
  • /build-connector-images - Builds and publishes a pre-release docker image for the modified connector(s).

📝 Edit this welcome message.

@github-actions
Copy link
Contributor

github-actions bot commented Jul 8, 2025

source-instagram Connector Test Results

79 tests   76 ✅  43s ⏱️
 3 suites   3 💤
 3 files     0 ❌

Results for commit 7d22b3d.

♻️ This comment has been updated with latest results.

from airbyte_cdk.sources.source import TState
from airbyte_cdk.sources.streams.core import Stream
from source_instagram.api import InstagramAPI
from source_instagram.streams import UserInsights
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this in for now since it's easier to double check behavior between the python and low-code implementations.

This will end up getting deleted in the manifest-only migration anyway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're referring to the UserInsights class in streams.py?

connectorType: source
definitionId: 6acf6b55-4f1e-4fca-944e-1a3caef8aba8
dockerImageTag: 4.1.0-rc.1
dockerImageTag: 4.1.0-rc.2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was an in-progress rollout for rc1 that was never properly performed, so rather than revert everything or trigger one now (it'll prevent me from regression testing), I will release both and analyze both affected streams

Copy link
Contributor

@pnilan pnilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one minor comment:

  • remove composite error handler (this is more of a nit, but IMO we should lessen our dependency on it)
yield stream_slice


class RFC3339DatetimeSchemaNormalization(TypeTransformer):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did something similar in source-amplitude (although it was a RecordTransformation). Maybe we should create a component based on this in the future.

property_list:
# Chunk 1: period: day, metrics: follower_count,reach
- day
- "follower_count,reach"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, but it feels weird we need to annotate the chunks. Feels like each chunk should be a discrete object/list. But I guess this is an unusual implementation of property chunking compared to something like HubSpot?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question, what I'm doing is a little bit hacky since I'm structing things in a very specific way for the groupings.

The original intended use case of grouping for Hubspot/LinkedIn and other connectors was that we had an arbitrary list of properties to request from the API and we had to specify them under a query parameter like fields_to_request=a,b,c,d. And grouping is additional functionality where we need to make multiple requests.

And so I'm not quite using the grouping as it was intended, in a way follower_count and reach are the actual grouping. You're right that we probably want property_list to be more flexible like a key/val or object like you mentioned, but w/ too small a sample size I didn't want to introduce this just yet. And since I found a way w/o needing to change the interface I left it as such. I do agree this is not quite the ideal shape

period: "{{ stream_partition.extra_fields['query_properties'][0] }}"
metric: "{{ stream_partition.extra_fields['query_properties'][1] }}"
error_handler:
type: CompositeErrorHandler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should remove the CompositeErrorHandler, the following has the same behavior but removes our dependency on the composite error handler. (A year ago we talked about ripping it out because it actually doesn't provide any added functionality)

error_handler: type: DefaultErrorHandler max_retries: 5 backoff_strategies: - type: ExponentialBackoffStrategy factor: 5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah that's a good point, no reason for the composite. Will fix!

cursor_datetime_formats:
- "%Y-%m-%dT%H:%M:%S+00:00"
step: P1D
cursor_granularity: PT0S
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this granularity imply exclusive date ranges?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does. I did this to replicate the exact request behavior of the python implementation that was inclusive on the same datetime. As far as I understand, this endpoint just returns one insight record for per day and each range is 1 day

from airbyte_cdk.sources.source import TState
from airbyte_cdk.sources.streams.core import Stream
from source_instagram.api import InstagramAPI
from source_instagram.streams import UserInsights
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're referring to the UserInsights class in streams.py?

@brianjlai
Copy link
Contributor Author

regression test results:

user_insights w/ state
https://github.com/airbytehq/airbyte/actions/runs/16135803702

user_insights_with_state
  • The check fails for both control and candidate, not sure why, but might be a Instagram API restriction
  • The catalog has expected mismatches for a additional fields on the user_insights, but these are more cosmetic or is_resumable. There are also two additions of views which I was unable to replicate. Testing locally the catalogs were exactly the same
  • Also interestingly for the control version it is unable to get any records, I can't explain this.
  • On the target there are 33 records. 3 streams were incremental w/ recent state so it is expected to see 1 record for each since we emit one insight metric per day
  • The final 30 records are because the last test candidate is in full refresh mode and therefore has no state.

user_insights w/o state:
https://github.com/airbytehq/airbyte/actions/runs/16156849963

user_insights_no_state
  • Similar to the above run, the control for some reason can't get any records
  • There are two syncs compared for a total of 60 records. And since user_insights can only be run for the last 30 days, this is the expected count.

Summary:

  • There were a lot of weird limitation I ran into while testing this low-code migration.
  • Because of that, I did a bit more manual validation of record accuracy and I confirmed the correct date range is being queried, the record counts matched, and spot checking records which matched on both versions
  • This should be ready for release
@brianjlai brianjlai merged commit c0336af into master Jul 9, 2025
27 of 28 checks passed
@brianjlai brianjlai deleted the brian/instagram_user_insights_to_low_code branch July 9, 2025 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment