CLI tip 32: text processing between two files with GNU awk
`awk` is handy to compare records and fields between two or more files. The key features used in the solution below:

- For two files as input, `NR==FNR` will be true only when the first file is being processed
- `next` will skip the rest of the script and fetch the next record
- `a[$0]` by itself is a valid statement. It will create an uninitialized element in array `a` with `$0` as the key (assuming the key doesn't exist yet)
- `$0 in a` checks if the given string (`$0` here) exists as a key in the array `a`

    $ cat colors_1.txt
    teal
    light blue
    green
    yellow

    $ cat colors_2.txt
    light blue
    black
    dark green
    yellow

    # common lines
    $ awk 'NR==FNR{a[$0]; next} $0 in a' colors_1.txt colors_2.txt
    light blue
    yellow

    # lines from colors_2.txt not present in colors_1.txt
    $ awk 'NR==FNR{a[$0]; next} !($0 in a)' colors_1.txt colors_2.txt
    black
    dark green

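To see why `NR==FNR` singles out the first file, it can help to print both counters for every record. This is a minimal sketch: it recreates the two sample files in a scratch directory, and the `mktemp`/`printf` setup is only scaffolding for the illustration.

```shell
# scratch directory so the sample files don't clobber anything
tmp=$(mktemp -d) && cd "$tmp"
printf '%s\n' 'teal' 'light blue' 'green' 'yellow' > colors_1.txt
printf '%s\n' 'light blue' 'black' 'dark green' 'yellow' > colors_2.txt

# NR counts records across all the input files
# FNR restarts from 1 whenever a new file is opened
awk '{print FILENAME ": NR=" NR ", FNR=" FNR}' colors_1.txt colors_2.txt
```

The two counters should match for the four lines of `colors_1.txt` and diverge from the first line of `colors_2.txt` onwards (`NR=5, FNR=1`), which is the point where the `NR==FNR{a[$0]; next}` block stops firing.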
Note that the `NR==FNR` logic will fail if the first file is empty, since `NR` wouldn't get a chance to increment. You can set a flag after the first file has been processed to avoid this issue. See this unix.stackexchange thread for more workarounds.

    # no output
    $ awk 'NR==FNR{a[$0]; next} !($0 in a)' /dev/null <(seq 2)

    # gives the expected output
    $ awk '!f{a[$0]; next} !($0 in a)' /dev/null f=1 <(seq 2)
    1
    2

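Another way to sidestep the empty-file problem is to test the file name instead of the record counters. Treat this as a sketch offered under assumptions, not as something taken from the linked thread. `FILENAME` and `ARGV` are standard awk variables, while the file names `empty.txt` and `nums.txt` are made up for this illustration.

```shell
tmp=$(mktemp -d) && cd "$tmp"
: > empty.txt                   # first file has no records
printf '%s\n' 1 2 > nums.txt    # same content as the output of seq 2

# FILENAME==ARGV[1] is true only while records come from the first file,
# so an empty first file doesn't make awk mistake the second file for it
awk 'FILENAME==ARGV[1]{a[$0]; next} !($0 in a)' empty.txt nums.txt
```

This should print `1` and `2`. The comparison breaks down if the same file name is passed twice, so the flag-based approach is the safer choice in that case.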
Here's an example of comparing specific fields instead of whole lines. When you use a `,` separator between strings to construct the array key, the value of `SUBSEP` is inserted. This special variable has a default value of the non-printing character `\034`, which is usually not used as part of text files.

    $ cat marks.txt
    Dept    Name    Marks
    ECE     Raj     53
    ECE     Joel    72
    EEE     Moi     68
    CSE     Surya   81
    EEE     Tia     59
    ECE     Om      92
    CSE     Amy     67

    $ cat dept_name.txt
    EEE Moi
    CSE Amy
    ECE Raj

    $ awk 'NR==FNR{a[$1,$2]; next} ($1,$2) in a' dept_name.txt marks.txt
    ECE     Raj     53
    EEE     Moi     68
    CSE     Amy     67

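To make the `SUBSEP` behavior concrete, here is a small sketch showing that a comma-separated subscript is stored as a single string key. The values `ECE` and `Raj` are borrowed from the sample data, and the variable names `k`, `n` and `p` are arbitrary.

```shell
awk 'BEGIN{
    # the key is stored as the string "ECE" SUBSEP "Raj"
    a["ECE","Raj"]

    # building the key by hand finds the same element
    if (("ECE" SUBSEP "Raj") in a) print "same key"

    # split() on SUBSEP recovers the original parts
    for (k in a) { n = split(k, p, SUBSEP); print n, p[1], p[2] }
}'
```

This should print `same key` followed by `2 ECE Raj`. It also hints at the one caveat: if a field could itself contain the `SUBSEP` character, two different field pairs might produce the same key.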
See also my CLI text processing with GNU awk ebook.