Description
Tell us about the problem you're trying to solve
Now that we have access to the number of records actually committed to the destination, we can be smarter about when to run normalization. Our logic so far has been to always run normalization, because even if replication failed, we should normalize whatever data we can. This is wasteful if there's nothing to normalize.
Describe the solution you’d like
Now we can skip running normalization if no records were committed to the source or destination.
There is one more case that we need to handle before we implement this. If records were synced in the previous replication job but normalization failed, we likely still want to run normalization on the next run, even if no records are synced. I'm not sure a) whether this is the behavior we want or a slightly different one, and b) if it is the right behavior, how we would implement it sensibly.
@pmossman for visibility, since this feature is unlocked by your change.
Acceptance Criteria
- Normalization is skipped when it is known that no records were committed
- Normalization is never skipped when it should have run (edge case around failed normalization from a previous job)
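
To make the two criteria concrete, here is a minimal sketch of one possible decision rule. It is only an illustration, not the existing implementation: `SyncSummary`, `records_committed`, `previous_normalization_failed`, and `should_run_normalization` are hypothetical names, and it assumes we want to re-run normalization after a previous normalization failure, which is still an open question above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncSummary:
    # Hypothetical shape for illustration only.
    # None means the committed record count is unknown for this job.
    records_committed: Optional[int]


def should_run_normalization(
    current: SyncSummary,
    previous_normalization_failed: bool,
) -> bool:
    """Return True if normalization should run for the current job."""
    # Data from the previous job may still be un-normalized, so run
    # even if the current job committed nothing.
    if previous_normalization_failed:
        return True
    # Skip only when it is *known* that nothing was committed;
    # an unknown count falls back to the old "always run" behavior.
    if current.records_committed is None:
        return True
    return current.records_committed > 0
```

How `previous_normalization_failed` would be determined (for example, by looking at the most recent job's attempts) is left open here, since that is the part of the design still under discussion.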