Deduplicate matching lines (within a configurable window) from a file or standard input, writing to standard output.
Like uniq, but it works on unsorted input, can be used as a pipe filter, and has constant memory usage.
Sometimes you need to consume a data stream (a Certificate Transparency log, for example) that contains non-consecutive duplicates you don't want to deal with. The usual awk solution (typically awk '!seen[$0]++') remembers every distinct line it has ever seen, so its memory usage is unbounded; swuniq's is not.
swuniq uses a ring buffer of configurable size (the -w option) as a FIFO queue of line hashes, which keeps memory usage constant (64 bits × the -w value).
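To make that concrete, here is a minimal C sketch of the technique, not swuniq's actual source: it keeps the 64-bit hashes of the most recently printed lines in a fixed-size ring buffer and emits a line only if its hash is not currently in that window (this matches the example outputs below). The FNV-1a hash, the positional window argument in place of -w, and the linear scan of the window are simplifications, not the project's implementation choices.

```c
/* Sketch of a sliding-window dedup filter (not swuniq's source).
 * Keep the 64-bit hashes of the last W printed lines in a ring buffer;
 * print a line only if its hash is not currently in that window. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/types.h>

/* FNV-1a, chosen here only for brevity. */
static uint64_t hash64(const char *s, size_t len) {
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL;
    }
    return h;
}

int main(int argc, char **argv) {
    size_t window = 100;                           /* default window size */
    if (argc > 1) window = (size_t)strtoul(argv[1], NULL, 10);
    if (window == 0) window = 1;

    uint64_t *ring = calloc(window, sizeof *ring); /* 64 bits per slot */
    if (!ring) return 1;
    size_t used = 0, next = 0;                     /* FIFO write position */
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;

    while ((len = getline(&line, &cap, stdin)) != -1) {
        size_t n = (size_t)len;
        if (n > 0 && line[n - 1] == '\n') n--;     /* hash without newline */
        uint64_t h = hash64(line, n);

        int seen = 0;
        for (size_t i = 0; i < used; i++)          /* scan current window */
            if (ring[i] == h) { seen = 1; break; }

        if (!seen) {
            fputs(line, stdout);                   /* first sight: emit it */
            ring[next] = h;                        /* overwrite oldest hash */
            next = (next + 1) % window;
            if (used < window) used++;
        }
    }
    free(ring);
    free(line);
    return 0;
}
```

Because only 64-bit hashes are kept, memory stays at roughly 8 bytes per window slot regardless of line length; the trade-off is a small chance that two distinct lines hashing to the same value are treated as duplicates.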
```
# swuniq -h
Usage: swuniq [-w N] [INPUT]

Filter matching lines (within a configurable window) from INPUT (or standard input), writing to standard output.

  -w N    Size of the sliding window to use for deduplication

Note: By default swuniq will use a window of 100 lines.

# cat input.txt
apple
apple
apple
banana
banana
strawberry
blueberry
apple
banana
strawberry
blueberry
kiwifruit
orange
peach
watermelon
orange
watermelon
kiwifruit
banana
banana
banana
apple
kiwifruit

# swuniq < input.txt
apple
banana
strawberry
blueberry
kiwifruit
orange
peach
watermelon

# swuniq -w 4 < input.txt
apple
banana
strawberry
blueberry
kiwifruit
orange
peach
watermelon
banana
apple
kiwifruit

# swuniq -w 2 < input.txt
apple
banana
strawberry
blueberry
apple
banana
strawberry
blueberry
kiwifruit
orange
peach
watermelon
orange
kiwifruit
banana
apple
kiwifruit
```