Skip to content

Conversation

@rodireich
Copy link
Contributor

@rodireich rodireich commented Sep 4, 2023

What

Postgres of versions before 14 are unable to run TID range scan queries that are running when we query with WHERE ctid > '(x,y)'.
As a result every time we query for a chunk of data postgres needs to run a full scan of the entire table. This leads to a considerably slow performance.
In order to avoid this issue we need to query postgres 12, 13 in a way that will do a TID scan (no "range")
See full discussion in this [doc]

How

Instead of querying for a range of ctid (WHERE ctid > '(x,y)'),
We are going to query for a list of ctid's.
Because there is no range of ctid, we need to know the last possible tuple in a page.

Recommended reading order

  1. InitialSyncCtidIterator.java
  2. PostgresQueryUtils.java
  3. PostgresCtidHandler.java

🚨 User Impact 🚨

User should see quicker initial sync of tables on all 3 sync modes (CDC, cursor, xmin).
Previous syncs that started with the old algorithm will seamlessly work faster upon another attempt.
No breaking changes.

subodh1810 and others added 19 commits August 21, 2023 22:23
# Conflicts: #	airbyte-integrations/connectors/source-postgres/src/main/java/io/airbyte/integrations/source/postgres/PostgresQueryUtils.java #	airbyte-integrations/connectors/source-postgres/src/main/java/io/airbyte/integrations/source/postgres/PostgresSource.java #	airbyte-integrations/connectors/source-postgres/src/main/java/io/airbyte/integrations/source/postgres/cdc/PostgresCdcCtidInitializer.java #	airbyte-integrations/connectors/source-postgres/src/main/java/io/airbyte/integrations/source/postgres/ctid/CtidGlobalStateManager.java #	airbyte-integrations/connectors/source-postgres/src/main/java/io/airbyte/integrations/source/postgres/ctid/CtidPerStreamStateManager.java #	airbyte-integrations/connectors/source-postgres/src/main/java/io/airbyte/integrations/source/postgres/ctid/CtidStateManager.java #	airbyte-integrations/connectors/source-postgres/src/main/java/io/airbyte/integrations/source/postgres/ctid/PostgresCtidHandler.java #	airbyte-integrations/connectors/source-postgres/src/main/java/io/airbyte/integrations/source/postgres/cursor_based/CursorBasedCtidUtils.java #	airbyte-integrations/connectors/source-postgres/src/main/java/io/airbyte/integrations/source/postgres/xmin/XminCtidUtils.java #	airbyte-integrations/connectors/source-postgres/src/test/java/io/airbyte/integrations/source/postgres/ctid/PostgresCtidHandlerTest.java
@github-actions
Copy link
Contributor

github-actions bot commented Sep 4, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@rodireich rodireich changed the base branch from master to postgres-validate-file-node-each-chunk September 4, 2023 05:47
@github-actions
Copy link
Contributor

github-actions bot commented Sep 4, 2023

Coverage report for source-postgres

File Coverage [88.52%] 🍏
PostgresCtidHandler.java 94.78% 🍏
CtidUtils.java 91.74% 🍏
InitialSyncCtidIterator.java 89.83% 🍏
PostgresCdcCtidInitializer.java 89.61% 🍏
PostgresSource.java 88.55% 🍏
PostgresQueryUtils.java 84.63% 🍏
Ctid.java 82.42% 🍏
Total Project Coverage 71.68% 🍏
Copy link
Contributor

@prateekmukhedkar prateekmukhedkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rodireich i realized now you don't have changes to release a new version of source-postgres connector. Were you planning to do it in a different PR?

@rodireich
Copy link
Contributor Author

@prateekmukhedkar did you mean an bumped version?
I always do that right before merging ☺️

@rodireich rodireich dismissed subodh1810’s stale review September 20, 2023 22:23

Tests like PostgresSourceAcceptanceLegacyCtidTest run the same test flows for with this new flow.

@airbyte-oss-build-runner
Copy link
Collaborator

source-postgres test report (commit 201421cffd) - ❌

⏲️ Total pipeline duration: 28mn00s

Step Result
Build connector tar
Build source-postgres docker image for platform linux/x86_64
Java Connector Unit Tests
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-postgres/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-postgres test
@rodireich rodireich changed the title 29779 source postgres slow ctid read seen on customer connection ✨29779 source postgres slow ctid read seen on customer connection Sep 20, 2023
@airbyte-oss-build-runner
Copy link
Collaborator

source-postgres test report (commit ffe61c9921) - ✅

⏲️ Total pipeline duration: 27mn38s

Step Result
Build connector tar
Build source-postgres docker image for platform linux/x86_64
Java Connector Unit Tests
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-postgres/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-postgres test
@airbyte-oss-build-runner
Copy link
Collaborator

source-postgres-strict-encrypt test report (commit ffe61c9921) - ✅

⏲️ Total pipeline duration: 06mn42s

Step Result
Build connector tar
Build source-postgres-strict-encrypt docker image for platform linux/x86_64
Java Connector Unit Tests
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-postgres-strict-encrypt/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-postgres-strict-encrypt test
@airbyte-oss-build-runner
Copy link
Collaborator

source-alloydb test report (commit ffe61c9921) - ❌

⏲️ Total pipeline duration: 02mn15s

Step Result
Build connector tar
Build source-alloydb docker image for platform linux/x86_64
Java Connector Unit Tests
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-alloydb/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-alloydb test
@airbyte-oss-build-runner
Copy link
Collaborator

source-postgres test report (commit f40e7032d8) - ✅

⏲️ Total pipeline duration: 03mn39s

Step Result
Build connector tar
Build source-postgres docker image for platform linux/x86_64
Java Connector Unit Tests
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-postgres/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-postgres test
@airbyte-oss-build-runner
Copy link
Collaborator

source-alloydb test report (commit f40e7032d8) - ❌

⏲️ Total pipeline duration: 01mn38s

Step Result
Build connector tar
Build source-alloydb docker image for platform linux/x86_64
Java Connector Unit Tests
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-alloydb/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-alloydb test
@rodireich
Copy link
Contributor Author

/approve-and-merge reason="AlloyDB acceptance test failing in CI"

@airbyte-oss-build-runner
Copy link
Collaborator

source-postgres-strict-encrypt test report (commit f40e7032d8) - ✅

⏲️ Total pipeline duration: 02mn53s

Step Result
Build connector tar
Build source-postgres-strict-encrypt docker image for platform linux/x86_64
Java Connector Unit Tests
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-postgres-strict-encrypt/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-postgres-strict-encrypt test
@octavia-approvington
Copy link
Contributor

All in!!
all in baby

@octavia-approvington octavia-approvington merged commit a876e60 into master Sep 20, 2023
@octavia-approvington octavia-approvington deleted the 29779-source-postgres-slow-ctid-read-seen-on-customer-connection branch September 20, 2023 23:16
jbfbell pushed a commit that referenced this pull request Sep 22, 2023
…0125) Co-authored-by: subodh <subodh1810@gmail.com> Co-authored-by: subodh1810 <subodh1810@users.noreply.github.com> Co-authored-by: rodireich <rodireich@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment