Skip to content

Conversation

@theyueli
Copy link
Contributor

What

Fixes https://github.com/airbytehq/airbyte-internal-issues/issues/8559
Fixes https://github.com/airbytehq/oncall/issues/5581

This patch disables incremental sync when a stream is found to be a view. This is done only for Postgres sources configured in xmin mode.

How

Adding a check in discovery() to only allow FULL_REFRESH sync mode.

Review guide

User Impact

In the UI, user will only see full refresh sync option for view. See the picture below.
Screenshot 2024-07-18 at 6 09 50 PM

Can this PR be safely reverted and rolled back?

  • [X ] YES 💚
  • NO ❌
@theyueli theyueli self-assigned this Jul 19, 2024
@theyueli theyueli requested a review from a team as a code owner July 19, 2024 01:21
@vercel
Copy link

vercel bot commented Jul 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Jul 19, 2024 1:59am
@theyueli theyueli changed the title disable incremental sync for view in Postgres xmin replication mode [source-postgres] Disable incremental sync for view in Postgres xmin replication mode Jul 19, 2024
}

public static AirbyteStream setFullRefreshToView(final JdbcDatabase database, final AirbyteStream stream) throws SQLException {
if (isStreamAView(database, stream)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this means we will run this query N times, once for all N streams. Could we load up the list of streams once instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! I just pushed a commit that implements this idea. Please take another look, thanks!

*/
public static AirbyteStream overrideSyncModes(final AirbyteStream stream, Map<String, List<String>> viewsBySchema) {
if (isStreamAView(viewsBySchema, stream)) {
return stream.withSupportedSyncModes(Lists.newArrayList(SyncMode.FULL_REFRESH));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but I thought in an xmin sync, full refreshes were still incremental... and the therefore still dependent on a usable cursor?

Copy link
Contributor Author

@theyueli theyueli Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you mentioned is for the cursor-based sync mode;

maybe I'm wrong, but from what I tested, it does not require specifying a cursor in xmin (see picture below)
Screenshot 2024-07-18 at 7 30 15 PM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The full refreshes are still incremental (based on CTID), but I didn't think a cursor needed to be explicitly set? So I think fixing the sync modes might work

cc : @xiaohansong who might know this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - cursor is not needed for full refresh. We should support full refresh anyway, for views they would be non resumable.

@theyueli theyueli merged commit c7a1213 into master Jul 19, 2024
@theyueli theyueli deleted the yue/disable-incremental-for-view-xmin branch July 19, 2024 21:38
xiaohansong pushed a commit that referenced this pull request Jul 23, 2024
…replication mode (#42108) Fixes airbytehq/airbyte-internal-issues#8559 Fixes airbytehq/oncall#5581 This patch disables incremental sync when a stream is found to be a view. This is done only for Postgres sources configured in xmin mode. This is done by adding a check in discovery() to only allow FULL_REFRESH sync mode.
@taherajnaFA
Copy link

taherajnaFA commented Jul 25, 2024

@theyueli @xiaohansong If I understand this change correctly, we can no longer sync a view in an incremental mode, right?
Currently, we have a use case where we want to filter the base table of our Postgres source within a certain time frame, essentially we do not wish to sync old data into our warehouse.
So the way we have achieved this is by creating a view on top of the base table with the time filter applied, also we have selected the xmin column while creating the view.
To Airbyte, up until now this would just behave as a raw table, all we had to do was unselect the xmin column from the schema while setting up the connection and Airbyte would sync everything incrementally to our destination.
After this change, i assume that this would no longer work.

I think instead of removing the option out right, this change should be implemented in a different way.

@evantahler
Copy link
Contributor

If I understand this change correctly, we can no longer sync a view in an incremental mode, right?

Only when using the xmin replication mode

@taherajnaFA
Copy link

taherajnaFA commented Jul 29, 2024

@evantahler precisely, that's what our usecase is. We just have a filter on the start date of the source table in our view, everything else stays as is.

taherajnaFA added a commit to taherajnaFA/airbyte that referenced this pull request Aug 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/postgres

7 participants