feat(source-instagram): Migrate user_insights stream to low-code #62844

brianjlai · 2025-07-08T03:12:53Z

Closes https://github.com/airbytehq/airbyte-internal-issues/issues/12630

What

Migrate the final stream from Instagram UserInsights to low-code format.

How

The main complexity of this connector is that we make four separate requests and then merge the records back together by their date value. We emit only one record per day, and when there is no incoming state we query at most the last 30 days.

The mapping of 4 periods to metrics in the Python implementation was:

METRICS_BY_PERIOD = {
    "day": ["follower_count", "reach"],
    "week": ["reach"],
    "days_28": ["reach"],
    "lifetime": ["online_followers"],
}

An example final record looks like:

{
  "follower_count": 10,
  "date": "2025-06-28T07:00:00+00:00",
  "reach": 3,
  "page_id": "12345",
  "business_account_id": "1234567890",
  "reach_week": 4,
  "reach_days_28": 5,
  "online_followers": { "0": 159, "1": 157 }
}
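To make the merge step concrete, here is a minimal Python sketch of combining partial records by their date value. It is not the connector's actual code: `merge_by_date` is a hypothetical helper, and it assumes each partial record already carries its final field name (for example `reach_week` rather than `reach`) before merging.

```python
from collections import defaultdict


def merge_by_date(partial_records):
    """Merge partial insight records that share a date into one record per day.

    Hypothetical helper: assumes fields were already renamed per period
    (e.g. reach -> reach_week) so that keys do not collide.
    """
    merged = defaultdict(dict)
    for record in partial_records:
        merged[record["date"]].update(record)
    # One record per distinct date, in ascending date order.
    return [merged[date] for date in sorted(merged)]


# One partial record per request (day, week, days_28, lifetime).
partials = [
    {"date": "2025-06-28T07:00:00+00:00", "follower_count": 10, "reach": 3},
    {"date": "2025-06-28T07:00:00+00:00", "reach_week": 4},
    {"date": "2025-06-28T07:00:00+00:00", "reach_days_28": 5},
    {"date": "2025-06-28T07:00:00+00:00", "online_followers": {"0": 159, "1": 157}},
]
records = merge_by_date(partials)
```

Sorting the ISO strings lexicographically only works here because every date uses the same format and offset.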

I tried to include clarifying comments, but the main trick I devised is this: by structuring the property_list into groups of two and using a chunk size of 2, we can emit stream slices whose values can be injected as query parameters.
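The pairing trick can be sketched in plain Python. This only illustrates the idea and is not the CDK's property chunking implementation: `chunk` and `to_query_params` are hypothetical names, and every pair except the first (which appears in the manifest diff) is inferred from METRICS_BY_PERIOD above.

```python
# Flat list laid out as (period, metrics) pairs. Only the first pair is
# taken from the manifest diff; the rest are assumed from METRICS_BY_PERIOD.
PROPERTY_LIST = [
    "day", "follower_count,reach",
    "week", "reach",
    "days_28", "reach",
    "lifetime", "online_followers",
]


def chunk(properties, size):
    """Yield consecutive groups of `size` items from a flat list."""
    for start in range(0, len(properties), size):
        yield properties[start:start + size]


def to_query_params(pair):
    """Map one (period, metrics) chunk onto the two query parameters."""
    period, metric = pair
    return {"period": period, "metric": metric}


requests = [to_query_params(pair) for pair in chunk(PROPERTY_LIST, 2)]
```

With a chunk size of 2, each chunk lines up with exactly one request, so the list yields the four requests described above.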

The last thing to note is that this requires a custom state migration, because the existing implementation only incorporated the business_account_id as the slice key. Similar to source-jira, we load the page_id (which needs to be injected into the outbound API request) as extra_fields so that it isn't used in the partition key.
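As a rough illustration of what such a migration does, the sketch below converts a state keyed by business_account_id into a per-partition list. Both shapes are assumptions made for this example (the real legacy and low-code state layouts are not shown in this PR description), and `migrate_state` is a hypothetical function, not the component in components.py.

```python
def migrate_state(legacy_state):
    """Convert an account-keyed state into a per-partition state.

    Assumed legacy shape: {"<business_account_id>": {"date": "..."}}
    Assumed new shape:    {"states": [{"partition": {...}, "cursor": {...}}]}
    """
    if "states" in legacy_state:
        # Already in the (assumed) new format, nothing to migrate.
        return legacy_state
    return {
        "states": [
            {
                # page_id is deliberately absent: it travels as an extra
                # field and is not part of the partition key.
                "partition": {"business_account_id": account_id},
                "cursor": cursor,
            }
            for account_id, cursor in legacy_state.items()
        ]
    }


legacy = {"1234567890": {"date": "2025-06-28T07:00:00+00:00"}}
migrated = migrate_state(legacy)
```

Returning already-migrated state unchanged makes the function safe to apply more than once.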

Review guide

manifest.yaml
components.py

User Impact

No direct impact; however, the format of the state message will change to the new format used by low-code connectors.

Can this PR be safely reverted and rolled back?

Sort of. The code can be, but the state message migrates to the new format which makes regression a bit tricky. It is doable to rewrite it bac

YES 💚
NO ❌

github-actions · 2025-07-08T03:15:36Z

`source-instagram` Connector Test Results

79 tests, 3 suites, 3 files, 43s ⏱️
76 ✅ passed, 3 💤 skipped, 0 ❌ failed

Results for commit 7d22b3d.

♻️ This comment has been updated with latest results.

brianjlai · 2025-07-08T03:16:14Z

airbyte-integrations/connectors/source-instagram/source_instagram/source.py

 from airbyte_cdk.sources.source import TState
-from airbyte_cdk.sources.streams.core import Stream
 from source_instagram.api import InstagramAPI
-from source_instagram.streams import UserInsights


I left this in for now since it's easier to double check behavior between the python and low-code implementations.

This will end up getting deleted in the manifest-only migration anyway

You're referring to the UserInsights class in streams.py?

brianjlai · 2025-07-08T03:17:22Z

airbyte-integrations/connectors/source-instagram/metadata.yaml

 connectorType: source
 definitionId: 6acf6b55-4f1e-4fca-944e-1a3caef8aba8
- dockerImageTag: 4.1.0-rc.1
+ dockerImageTag: 4.1.0-rc.2


There was an in-progress rollout for rc1 that was never properly performed, so rather than revert everything or trigger one now (which would prevent me from regression testing), I will release both and analyze both affected streams.

pnilan

Looks good, one minor comment:

remove composite error handler (this is more of a nit, but IMO we should lessen our dependency on it)

pnilan · 2025-07-08T15:49:24Z

airbyte-integrations/connectors/source-instagram/source_instagram/components.py

+ yield stream_slice
+
+
+class RFC3339DatetimeSchemaNormalization(TypeTransformer):


We did something similar in source-amplitude (although it was a RecordTransformation). Maybe we should create a component based on this in the future.
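For readers without the diff open, this is roughly the kind of normalization being discussed. It is a standalone sketch, not the body of RFC3339DatetimeSchemaNormalization (only its class line appears above): `to_rfc3339` is a hypothetical helper, and the input format with a `+0000` offset is an assumption.

```python
from datetime import datetime


def to_rfc3339(value: str) -> str:
    """Re-serialize a timestamp so its UTC offset contains a colon.

    Assumes input such as "2025-06-28T07:00:00+0000"; %z also accepts
    offsets that already contain a colon.
    """
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    # isoformat() on a timezone-aware datetime renders the offset as +HH:MM.
    return parsed.isoformat()


normalized = to_rfc3339("2025-06-28T07:00:00+0000")
```

A reusable component along these lines would need to handle more input formats than this single pattern.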

pnilan · 2025-07-08T15:59:20Z

airbyte-integrations/connectors/source-instagram/source_instagram/manifest.yaml

+ property_list:
+ # Chunk 1: period: day, metrics: follower_count,reach
+ - day
+ - "follower_count,reach"


This makes sense to me, but it feels weird we need to annotate the chunks. Feels like each chunk should be a discrete object/list. But I guess this is an unusual implementation of property chunking compared to something like HubSpot?

Good question. What I'm doing is a little bit hacky, since I'm structuring things in a very specific way for the groupings.

The original intended use case of grouping for Hubspot/LinkedIn and other connectors was that we had an arbitrary list of properties to request from the API and we had to specify them under a query parameter like fields_to_request=a,b,c,d. And grouping is additional functionality where we need to make multiple requests.

So I'm not quite using the grouping as it was intended; in a way, follower_count and reach are the actual grouping. You're right that we probably want property_list to be more flexible, like a key/val or object as you mentioned, but w/ too small a sample size I didn't want to introduce this just yet. And since I found a way w/o needing to change the interface, I left it as such. I do agree this is not quite the ideal shape.

pnilan · 2025-07-08T16:01:59Z

airbyte-integrations/connectors/source-instagram/source_instagram/manifest.yaml

+ period: "{{ stream_partition.extra_fields['query_properties'][0] }}"
+ metric: "{{ stream_partition.extra_fields['query_properties'][1] }}"
+ error_handler:
+ type: CompositeErrorHandler


We should remove the CompositeErrorHandler. The following has the same behavior but removes our dependency on the composite error handler. (A year ago we talked about ripping it out because it doesn't actually provide any added functionality.)

error_handler:
  type: DefaultErrorHandler
  max_retries: 5
  backoff_strategies:
    - type: ExponentialBackoffStrategy
      factor: 5

ah yeah that's a good point, no reason for the composite. Will fix!
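To get a feel for what `max_retries: 5` with `factor: 5` could mean in wall-clock terms, here is a small sketch. The formula `factor * 2 ** attempt` and the attempt numbering starting at 1 are assumptions made for illustration; the CDK's ExponentialBackoffStrategy should be checked for its exact behavior.

```python
def backoff_delays(factor, max_retries):
    """Return the wait in seconds before each retry.

    Assumes delay = factor * 2 ** attempt with attempts numbered from 1.
    This is an illustrative formula, not confirmed against the CDK.
    """
    return [factor * 2 ** attempt for attempt in range(1, max_retries + 1)]


delays = backoff_delays(factor=5, max_retries=5)
total_wait = sum(delays)
```

Under these assumptions, a request that fails every time waits 10, 20, 40, 80, and 160 seconds, a little over five minutes in total.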

pnilan · 2025-07-08T16:05:02Z

airbyte-integrations/connectors/source-instagram/source_instagram/manifest.yaml

+ cursor_datetime_formats:
+ - "%Y-%m-%dT%H:%M:%S+00:00"
+ step: P1D
+ cursor_granularity: PT0S


Does this granularity imply exclusive date ranges?

It does. I did this to replicate the exact request behavior of the Python implementation, which was inclusive on the same datetime. As far as I understand, this endpoint just returns one insight record per day and each range is 1 day.
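The effect of a zero granularity can be shown with a simplified model of date windowing. This is a sketch of the general idea (slice end = start + step - granularity, next start = end + granularity), not the CDK's cursor code, and `daily_slices` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def daily_slices(start, end, step=timedelta(days=1), granularity=timedelta(0)):
    """Split [start, end] into windows of length `step`.

    Simplified model: each window ends at start + step - granularity and
    the next one begins at that end + granularity.
    """
    slices = []
    cursor = start
    while cursor < end:
        slice_end = min(cursor + step - granularity, end)
        slices.append((cursor, slice_end))
        cursor = slice_end + granularity
    return slices


start = datetime(2025, 6, 1, tzinfo=timezone.utc)
end = datetime(2025, 6, 4, tzinfo=timezone.utc)

# PT0S: consecutive windows share their boundary datetime.
zero_gap = daily_slices(start, end)
# PT1S: each window stops one second short of the next window's start.
one_second_gap = daily_slices(start, end, granularity=timedelta(seconds=1))
```

With PT0S, the end of one window equals the start of the next, which matches the same-datetime boundary behavior described in the reply above.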

pnilan · 2025-07-08T16:09:04Z

airbyte-integrations/connectors/source-instagram/source_instagram/source.py

 from airbyte_cdk.sources.source import TState
-from airbyte_cdk.sources.streams.core import Stream
 from source_instagram.api import InstagramAPI
-from source_instagram.streams import UserInsights


You're referring to the UserInsights class in streams.py?

brianjlai · 2025-07-09T07:29:51Z

regression test results:

user_insights w/ state
https://github.com/airbytehq/airbyte/actions/runs/16135803702

The check fails for both control and candidate. I'm not sure why, but it might be an Instagram API restriction.
The catalog has expected mismatches for additional fields on user_insights, but these are mostly cosmetic or relate to is_resumable. There are also two additions of views, which I was unable to replicate. Testing locally, the catalogs were exactly the same.
Also, interestingly, the control version is unable to get any records. I can't explain this.
On the target there are 33 records. Three streams were incremental w/ recent state, so it is expected to see 1 record for each, since we emit one insight record per day.
The final 30 records are because the last test candidate is in full refresh mode and therefore has no state.

user_insights w/o state:
https://github.com/airbytehq/airbyte/actions/runs/16156849963

Similar to the above run, the control for some reason can't get any records.
There are two syncs compared, for a total of 60 records. Since user_insights can only be run for the last 30 days, this is the expected count.

Summary:

There were a lot of weird limitations I ran into while testing this low-code migration.
Because of that, I did a bit more manual validation of record accuracy: I confirmed the correct date range is being queried, the record counts matched, and spot-checked records matched on both versions.
This should be ready for release.

Migrate user_insights stream to low-code

21648dd

octavia-squidington-iii added the connectors/source/instagram label Jul 8, 2025

clean up source.py to no longer reference python streams

bbc73e7

brianjlai requested review from ChristoGrab, dbgold17, pnilan and tolik0 July 8, 2025 03:13

brianjlai added 2 commits July 7, 2025 23:33

use inline schema and add back needed method for check

c5a24a5

update pr number in docs

b497ab0

fix integration tests

0350a03

pnilan approved these changes Jul 8, 2025

pr feedback

7d22b3d

brianjlai merged commit c0336af into master Jul 9, 2025
27 of 28 checks passed

brianjlai deleted the brian/instagram_user_insights_to_low_code branch July 9, 2025 17:53
