This repository was archived by the owner on May 17, 2024. It is now read-only.

Conversation

nolar
Contributor

@nolar nolar commented Jan 11, 2024

A known flaw: if both tables contain the same duplicated rows, e.g.:

A: [pk=1000, val=hello], [pk=1000, val=hello]
B: [pk=1000, val=hello], [pk=1000, val=hello]

… then we might not notice them, even at the level of checksum scanning of table segments. If the segments are fully equal, their checksums match, so these dupes will never be yielded: neither as -/+, nor with a potentially different informational marker * introduced specially for dupes. They will only be noticed in segments that have some other (unrelated) differences, which makes this dupe detection not fully reliable.
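A minimal sketch of why fully equal segments hide the dupes. It assumes a hypothetical order-independent segment checksum (a sum of per-row MD5 prefixes modulo 2**64) and a hypothetical `*` marker; `segment_checksum` and `diff_segment` are illustrative names, not the project's API or its actual checksum. Once the checksums match, the rows are never compared, so duplicates are reported only when some unrelated row makes the checksum differ:

```python
import hashlib
from collections import Counter
from typing import List, Tuple

Row = Tuple[int, str]  # hypothetical (pk, val) row


def row_hash(row: Row) -> int:
    # Hypothetical per-row hash: first 64 bits of MD5 over a text rendering.
    return int(hashlib.md5(repr(row).encode()).hexdigest()[:16], 16)


def segment_checksum(rows: List[Row]) -> int:
    # Order-independent aggregate: sum of row hashes modulo 2**64.
    return sum(row_hash(r) for r in rows) % (2**64)


def diff_segment(a: List[Row], b: List[Row]) -> List[Tuple[str, Row]]:
    # Equal checksums: the segment is treated as equal and its rows are
    # never inspected, so identical dupes on both sides go unreported.
    if segment_checksum(a) == segment_checksum(b):
        return []
    out: List[Tuple[str, Row]] = []
    ca, cb = Counter(a), Counter(b)
    for row in sorted(set(ca) | set(cb)):
        if ca[row] > cb[row]:
            out.append(("-", row))
        elif cb[row] > ca[row]:
            out.append(("+", row))
        # Hypothetical informational marker for rows duplicated on either side.
        if ca[row] > 1 or cb[row] > 1:
            out.append(("*", row))
    return out
```

For example, with `a = [(1000, "hello"), (1000, "hello")]`, `diff_segment(a, list(a))` returns `[]`, while adding one unrelated row `(1001, "x")` to one side yields `[("*", (1000, "hello")), ("-", (1001, "x"))]`.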

@nolar nolar requested a review from dlawin January 11, 2024 11:21
@nolar nolar force-pushed the detect-duplicates branch from 41d71b0 to 8944e5f January 11, 2024 16:45
@nolar nolar requested a review from vvkh January 11, 2024 16:45
@nolar nolar merged commit f8dd74c into master Jan 11, 2024
@nolar nolar deleted the detect-duplicates branch January 11, 2024 18:13

Labels

None yet

2 participants