swuniq

Deduplicate matching lines (within a configurable window) from a file or standard input, writing to standard output.

Like uniq, but it works on unsorted input, so it can be used as a pipe filter with constant memory usage.

Why?

Sometimes you need to consume a data stream (a Certificate Transparency log, for example) that has non-consecutive duplicates you don't want to deal with. The usual awk-based solution (such as awk '!seen[$0]++') remembers every distinct line, so its memory usage is unbounded, which can be a problem on a long-running stream. swuniq's memory usage is bounded by the window size.

Memory Usage

swuniq uses a ring buffer of configurable size (the -w option) as a FIFO queue holding 64-bit hashes of the lines it has output. This keeps memory use constant: 64 bits × the -w value.
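Worked out for the window's hash storage alone (buffers for reading input and the program itself are not counted), the formula gives the following for the default window of 100 lines and for a much larger window:

```latex
M(N) = 64\,\text{bits} \times N = 8N\ \text{bytes}
\qquad\Rightarrow\qquad
M(100) = 800\ \text{bytes}, \quad M(10^{6}) = 8 \times 10^{6}\ \text{bytes} = 8\ \text{MB}
```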

Example

# swuniq -h
Usage: swuniq [-w N] [INPUT]
Filter matching lines (within a configurable window) from INPUT (or standard input), writing to standard output.
	-w N	Size of the sliding window to use for deduplication
Note: By default swuniq will use a window of 100 lines.
# cat input.txt
apple
apple
apple
banana
banana
strawberry
blueberry
apple
banana
strawberry
blueberry
kiwifruit
orange
peach
watermelon
orange
watermelon
kiwifruit
banana
banana
banana
apple
kiwifruit
# swuniq < input.txt
apple
banana
strawberry
blueberry
kiwifruit
orange
peach
watermelon
# swuniq -w 4 < input.txt
apple
banana
strawberry
blueberry
kiwifruit
orange
peach
watermelon
banana
apple
kiwifruit
# swuniq -w 2 < input.txt
apple
banana
strawberry
blueberry
apple
banana
strawberry
blueberry
kiwifruit
orange
peach
watermelon
orange
kiwifruit
banana
apple
kiwifruit
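A minimal sketch of the sliding-window mechanism, reconstructed from the example outputs above rather than taken from swuniq.c. The behaviour it assumes is the one that reproduces the -w 4 and -w 2 runs: a line is dropped when its hash matches one of the last N hashes in the window, only printed lines enter the window, and a suppressed duplicate does not refresh its slot (plain FIFO, not LRU). The names window_t, window_admit and dedup are hypothetical. FNV-1a stands in for the real hash function; the repository vendors xxHash, so the actual tool likely uses that instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL sketch, not swuniq.c. FNV-1a (64-bit) is an assumed
   stand-in for whatever hash the real tool uses. */
static uint64_t hash_line(const char *s, size_t len)
{
    uint64_t h = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL;            /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Fixed-size FIFO ring buffer of 64-bit hashes: 8 bytes * capacity. */
typedef struct {
    uint64_t *slots;
    size_t capacity; /* the -w value */
    size_t count;    /* slots filled so far (never exceeds capacity) */
    size_t next;     /* slot to overwrite next; the oldest entry once full */
} window_t;

static int window_init(window_t *w, size_t capacity)
{
    assert(capacity > 0);
    w->slots = calloc(capacity, sizeof *w->slots);
    w->capacity = capacity;
    w->count = 0;
    w->next = 0;
    return w->slots != NULL;
}

static void window_free(window_t *w)
{
    free(w->slots);
    w->slots = NULL;
}

/* Linear scan over the filled slots: O(capacity) per lookup. */
static int window_contains(const window_t *w, uint64_t h)
{
    for (size_t i = 0; i < w->count; i++)
        if (w->slots[i] == h)
            return 1;
    return 0;
}

/* Append a hash, overwriting the oldest one when the window is full. */
static void window_push(window_t *w, uint64_t h)
{
    w->slots[w->next] = h;
    w->next = (w->next + 1) % w->capacity;
    if (w->count < w->capacity)
        w->count++;
}

/* Returns 1 if the line should be printed, 0 if it duplicates a hash in the
   window. Suppressed lines are not pushed, so they do not refresh the window. */
static int window_admit(window_t *w, const char *line, size_t len)
{
    uint64_t h = hash_line(line, len);
    if (window_contains(w, h))
        return 0;
    window_push(w, h);
    return 1;
}

/* Filters n lines with a window of wsize hashes, stores the indexes of the
   lines that would be printed in out, and returns how many there are
   (0 if the window could not be allocated). */
static size_t dedup(const char **lines, size_t n, size_t wsize, size_t *out)
{
    window_t w;
    size_t kept = 0;
    if (!window_init(&w, wsize))
        return 0;
    for (size_t i = 0; i < n; i++)
        if (window_admit(&w, lines[i], strlen(lines[i])))
            out[kept++] = i;
    window_free(&w);
    return kept;
}
```

To turn this into a stdin filter, a main could read lines with POSIX getline, pass each one to window_admit, and fwrite the ones it accepts. The lookup here is a linear scan, O(N) per line. An index over the window, such as a hash set keyed by the stored hashes, would make it constant-time at the cost of extra memory. Because only hashes are kept, a 64-bit collision could in principle suppress a distinct line.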

mterron/swuniq
