🐛 bug(source-klaviyo): fix problem with pods running out of memory when syncing events stream historical data #46741

What
Related to https://github.com/airbytehq/oncall/issues/6684
The last job in the ticket has been running since September 20th, 2024, and is currently on attempt 10. It has been syncing data from 2015 onward, running out of memory or hitting similar problems after ~4 days/145GB of reading. You can find a summary in this comment.
Connection schema is set for 3 streams:
The sync fails because the Events data takes too long; the other two streams generally continue from their last state.
My intended fix is checkpointing, so that even if the sync fails, the next attempt starts from a more advanced state rather than:
Setting state of SourceKlaviyo stream to {}
How
The Events stream will use the new base_incremental_checkpoint_stream with a custom KlaviyoCheckpointDatetimeBasedCursor.
I have set a step of 1 month and a granularity of 1 second. The cursor filters on both the start and end dates of each slice it creates, whereas the original KlaviyoDatetimeBasedCursor used only start_date for this purpose.
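As a rough illustration of the slicing idea (not the connector's actual code), the sketch below splits a date range into 1 month slices with 1 second granularity using only the Python standard library, and shows how a checkpointed state could let a retry skip slices that already completed. The helper names (add_one_month, monthly_slices, resume_point) and the {"datetime": ...} state shape are assumptions made for this example only.

```python
import calendar
from datetime import datetime, timedelta, timezone

# Cursor granularity named in this PR: adjacent slices are 1 second apart.
GRANULARITY = timedelta(seconds=1)


def add_one_month(dt: datetime) -> datetime:
    """Shift dt one calendar month forward, clamping the day when needed."""
    year = dt.year + dt.month // 12
    month = dt.month % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def monthly_slices(start: datetime, end: datetime):
    """Yield (slice_start, slice_end) pairs; both bounds are inclusive."""
    cursor = start
    while cursor <= end:
        next_start = add_one_month(cursor)
        # Each slice ends one granularity unit before the next one starts.
        yield cursor, min(next_start - GRANULARITY, end)
        cursor = next_start


def resume_point(state: dict, default_start: datetime) -> datetime:
    """Pick where the next attempt starts.

    Hypothetical state shape: {"datetime": <ISO end of last completed slice>}.
    An empty state means a full resync from default_start.
    """
    if not state:
        return default_start
    return datetime.fromisoformat(state["datetime"]) + GRANULARITY


start = datetime(2015, 1, 1, tzinfo=timezone.utc)
end = datetime(2015, 3, 15, tzinfo=timezone.utc)
slices = list(monthly_slices(start, end))

# Pretend the first slice finished and was checkpointed before a crash.
checkpoint = {"datetime": slices[0][1].isoformat()}
resumed_from = resume_point(checkpoint, start)
```

Here slices holds three ranges (January, February, and March 1 to March 15), and resumed_from is 2015-02-01T00:00:00+00:00, so a retry would only redo work from the second slice onward instead of starting again from an empty state.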
Review guide
airbyte-integrations/connectors/source-klaviyo/source_klaviyo/manifest.yaml: new stream with a checkpointing cursor, a 1 month step, and 1 second granularity.
airbyte-integrations/connectors/source-klaviyo/source_klaviyo/components/datetime_based_cursor.py: cursor that filters on the start and end dates of each slice created.
User Impact
Users should be able to complete the sync job.
Can this PR be safely reverted and rolled back?