
Conversation

@evantahler
Contributor

@evantahler evantahler commented Jun 6, 2024

Re: https://github.com/airbytehq/oncall/issues/5379
Closes: https://github.com/airbytehq/oncall/issues/5379
Dev Image: https://github.com/airbytehq/airbyte/actions/runs/9393312995

When we encounter Mongo collections with multiple _id types in the same collection, ordering the records for the initial snapshot gets weird. We can't fully guarantee that _id:1 is always sorted before or after _id:"b". However, failing the sync in all cases because of these issues is also premature until we can confirm data loss more robustly. In the linked OC issue, we do have signal that, at least for some customers, syncing with multiple ID types does not lose data.

This PR changes a throw into a logger.warn for now.
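
As a rough illustration of that change (not the connector's actual code, which is not shown in this conversation), the pattern looks something like the Python sketch below. The names `check_id_types` and `strict` are hypothetical and exist only for this example; `strict=True` stands in for the old throwing behavior and the default stands in for the new warning.

```python
import logging
from typing import Any, Iterable, Mapping, Set

logger = logging.getLogger("source_mongo_sketch")  # hypothetical logger name


def check_id_types(documents: Iterable[Mapping[str, Any]], strict: bool = False) -> Set[str]:
    """Collect the type names of every _id and report when more than one is seen.

    strict=True mimics the old behavior (raise); strict=False mimics the
    behavior described in this PR (log a warning and keep going).
    """
    types = {type(doc.get("_id")).__name__ for doc in documents}
    if len(types) > 1:
        message = f"Collection has multiple _id types: {sorted(types)}"
        if strict:
            raise ValueError(message)
        logger.warning(message)
    return types


# Mixed int and str ids only produce a warning by default.
docs = [{"_id": 1}, {"_id": "b"}]
seen = check_id_types(docs)
```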

@vercel

vercel bot commented Jun 6, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Jul 22, 2024 10:55pm
@rodireich
Contributor

@evantahler I think this is probably the way to go: a single misbehaving row shouldn't cause an entire sync to fail.
In addition to removing the strict check, here is what I think we can also add to formalize it:

  1. We don't want a single row to fail a sync, but we could use some threshold. If ids are all over the place, how do we determine the "correct" type?
  2. Mark rows that have an incorrect id type with a meta event, similar to what we do in other connectors when we fail to convert a value.
  3. Add analytics so we know how common this is.
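
The three suggestions above could be sketched roughly as follows. This is only an illustration in Python, not the connector's code: `summarize_id_types`, `tag_records`, the 5% default threshold, and the `id_type_mismatch` meta key are all assumptions made up for the example, and treating the most common type as the "correct" one is itself an assumption.

```python
from collections import Counter
from typing import Any, Dict, Iterator, List, Mapping


def summarize_id_types(
    documents: List[Mapping[str, Any]], max_minority_fraction: float = 0.05
) -> Dict[str, Any]:
    """Count _id types and check whether non-dominant types exceed a threshold."""
    counts = Counter(type(doc.get("_id")).__name__ for doc in documents)
    total = sum(counts.values())
    if total == 0:
        return {"dominant_type": None, "counts": {}, "minority_fraction": 0.0,
                "exceeds_threshold": False}
    # Assumption: the most frequent type is treated as the "correct" one.
    dominant_type, dominant_count = counts.most_common(1)[0]
    minority_fraction = (total - dominant_count) / total
    return {
        "dominant_type": dominant_type,
        "counts": dict(counts),  # could be emitted as analytics
        "minority_fraction": minority_fraction,
        "exceeds_threshold": minority_fraction > max_minority_fraction,
    }


def tag_records(
    documents: List[Mapping[str, Any]], dominant_type: str
) -> Iterator[Dict[str, Any]]:
    """Wrap each record with a hypothetical meta flag for mismatched _id types."""
    for doc in documents:
        mismatch = type(doc.get("_id")).__name__ != dominant_type
        yield {"data": doc, "meta": {"id_type_mismatch": mismatch}}


docs = [{"_id": 1}, {"_id": 2}, {"_id": 3}, {"_id": "b"}]
summary = summarize_id_types(docs)
tagged = list(tag_records(docs, summary["dominant_type"]))
```

Whether a sync should fail, warn, or only record metrics once the threshold is exceeded is a policy decision this sketch deliberately leaves open.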
@evantahler
Contributor Author

We have learned that in some cases it is intentional that there are many types of _ids in a single collection, so it's more than a single bad object :(

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Jul 22, 2024
@evantahler evantahler changed the title from "[source-mongo] skip _id consistent type check" to "source-mongo: Warn (vs fail) on different _id types in collection" Jul 22, 2024
@evantahler evantahler marked this pull request as ready for review July 22, 2024 20:48
@evantahler evantahler requested a review from a team as a code owner July 22, 2024 20:48
Contributor

@theyueli theyueli left a comment

Thanks for this change!

@evantahler evantahler merged commit d2ae382 into master Jul 22, 2024
@evantahler evantahler deleted the evan/mongo-no-id-check branch July 22, 2024 23:34

Labels

area/connectors: Connector related issues
area/documentation: Improvements or additions to documentation
connectors/source/mongodb-v2

6 participants