AWK is a text-processing utility on GNU/Linux.
It is very powerful and uses a simple programming language.
It can solve complex text processing tasks with a few lines of code.
Examples of tasks that can be done with AWK:
Text processing,
Producing formatted text reports,
Performing arithmetic operations,
Performing string operations,
Parsing log files, including log files of DBs,
Constructing queries to populate data into DBs
and many more.
AWK follows a simple workflow − Read, Execute, and Repeat.
Read
AWK reads a line from the input stream (file, pipe, or stdin) and stores it in memory.
Execute
All AWK commands are applied sequentially on the input. By default AWK executes commands
on every line. We can restrict this by providing patterns.
Repeat
This process repeats until the end of the input is reached.
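The Read, Execute, Repeat cycle can be observed directly. The following is a minimal sketch: the three input lines and the shell variable name out are made up for illustration. It prints the built-in record counter NR each time the body runs.

```shell
# Feed three sample lines to awk through a pipe (the input text is made up).
# The body runs once per line that is read; NR is the number of that line.
out=$(printf 'alpha\nbeta\ngamma\n' | awk '{ print "pass " NR ": " $0 }')
echo "$out"
```

Each input line produces exactly one line of output, which shows that the body is executed once per record.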
BEGIN Block
The syntax of the BEGIN block is as follows −
Syntax
BEGIN {awk-commands}
The BEGIN block gets executed at program start-up. It executes only once. This is a good place
to initialize variables. BEGIN is an AWK keyword and hence it must be in upper-case. Please
note that this block is optional.
Body Block
The syntax of the body block is as follows −
Syntax
/pattern/ {awk-commands}
The body block applies AWK commands to every input line by default. Providing a pattern
restricts the commands to the lines that match it. Note that there are no
keywords for the Body block.
END Block
The syntax of the END block is as follows −
Syntax
END {awk-commands}
The END block executes at the end of the program. END is an AWK keyword and hence it must
be in upper-case. Please note that this block is optional.
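The three blocks can be combined in one program. The sketch below is only an illustration: it assumes the sample records of the marks.txt file used in this article (passed inline through the hypothetical shell variable data) and computes the average mark.

```shell
# Sample records copied from the marks.txt file used in this article.
data='1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89'

out=$(printf '%s\n' "$data" | awk '
BEGIN { total = 0 }                              # runs once, before any input is read
      { total += $4 }                            # runs for every input line
END   { printf "Average = %.1f\n", total / NR }  # runs once, after the last line
')
echo "$out"
```

The BEGIN block is not strictly needed here, because an unset AWK variable already counts as 0 in arithmetic. It is kept only to show where initialization belongs.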
dmi@dmi-laptop:~/my_awk$ cat marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89

dmi@dmi-laptop:~/my_awk$ awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"}'
Sr No Name Sub Marks

dmi@dmi-laptop:~/my_awk$ awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"} {print}' marks.txt
Sr No Name Sub Marks
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89

dmi@dmi-laptop:~/my_awk$ awk '{print}' marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89

dmi@dmi-laptop:~/my_awk$ cat command.awk
{print}
dmi@dmi-laptop:~/my_awk$ awk -f command.awk marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89

dmi@dmi-laptop:~/my_awk$ awk -v name=Linda 'BEGIN{printf "Name = %s\n", name}'
Name = Linda

dmi@dmi-laptop:~/my_awk$ awk '{print $3 "\t" $4}' marks.txt
Physics 80
Maths 90
Biology 87
English 85
History 89
In the following example we search for the pattern a.
When a pattern match succeeds, AWK executes the commands from the body block.
dmi@dmi-laptop:~/my_awk$ cat marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ awk '/a/ {print $0}' marks.txt
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
When the body block is omitted, the default action is taken, which is to print the record.
dmi@dmi-laptop:~/my_awk$ awk '/a/' marks.txt
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
We can print columns in any order.
dmi@dmi-laptop:~/my_awk$ awk '/a/ {print $4 "\t" $3}' marks.txt
90 Maths
87 Biology
85 English
89 History
We can count and print the number of lines for which a pattern match succeeded.
dmi@dmi-laptop:~/my_awk$ cat marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ awk '/a/{++cnt} END {print "Count = ", cnt}' marks.txt
Count = 4
dmi@dmi-laptop:~/my_awk$ cat my_example.txt
aaa bbb
cccccc dd
eee
fffff fff ffff
ggg hh hhh hhhh
kkk ll
dmi@dmi-laptop:~/my_awk$ awk 'length($0) > 3' my_example.txt
aaa bbb
cccccc dd
fffff fff ffff
ggg hh hhh hhhh
kkk ll
dmi@dmi-laptop:~/my_awk$ awk 'length($0) > 5' my_example.txt
aaa bbb
cccccc dd
fffff fff ffff
ggg hh hhh hhhh
kkk ll
dmi@dmi-laptop:~/my_awk$ awk 'length($0) > 8' my_example.txt
cccccc dd
fffff fff ffff
ggg hh hhh hhhh
The $0 variable stores the entire line.
When the body block is omitted, the default action is taken, i.e., the matching line is printed.
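To make the default action concrete, the sketch below runs the same pattern once without a body block and once with an explicit print $0. It assumes the contents of my_example.txt shown above, passed inline through a hypothetical shell variable data. The variable names implicit and explicit are also made up.

```shell
# Contents of my_example.txt, passed inline so the sketch is self-contained.
data='aaa bbb
cccccc dd
eee
fffff fff ffff
ggg hh hhh hhhh
kkk ll'

# Pattern only: awk falls back to the default action (print the record).
implicit=$(printf '%s\n' "$data" | awk 'length($0) > 8')
# Same pattern with the action written out.
explicit=$(printf '%s\n' "$data" | awk 'length($0) > 8 { print $0 }')

echo "$implicit"
[ "$implicit" = "$explicit" ] && echo "both forms print the same lines"
```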
ARGC is a standard AWK variable.
It holds the number of arguments provided at the command line. The count includes the program name awk itself, so four arguments give ARGC = 5.
dmi@dmi-laptop:~/my_awk$ awk 'BEGIN {print "Arguments =", ARGC}' One Two Three Four
Arguments = 5
ARGV is a standard AWK variable.
It is an array that stores the command-line arguments.
The array's valid index ranges from 0 to ARGC-1.
The loop condition must be i < ARGC. With i < ARGC - 1 the last argument would be skipped.
dmi@dmi-laptop:~/my_awk$ cat command.awk
BEGIN {
   for (i = 0; i < ARGC; ++i) {
      printf "ARGV[%d] = %s\n", i, ARGV[i]
   }
}
dmi@dmi-laptop:~/my_awk$ awk -f command.awk one two three four five six seven eight
ARGV[0] = awk
ARGV[1] = one
ARGV[2] = two
ARGV[3] = three
ARGV[4] = four
ARGV[5] = five
ARGV[6] = six
ARGV[7] = seven
ARGV[8] = eight
dmi@dmi-laptop:~/my_awk$ awk 'BEGIN {
   for (i = 0; i < ARGC; ++i) {
      printf "ARGV[%d] = %s\n", i, ARGV[i]
   }
}' one two three four five six seven eight
ARGV[0] = awk
ARGV[1] = one
ARGV[2] = two
ARGV[3] = three
ARGV[4] = four
ARGV[5] = five
ARGV[6] = six
ARGV[7] = seven
ARGV[8] = eight
Regular expression .
It matches any single character except the end of line character.
Note: echo -e enables interpretation of backslash escapes.
dmi@dmi-laptop:~/my_awk$ echo -e "cat\nbat\nfun\nfin\nfan"
cat
bat
fun
fin
fan
dmi@dmi-laptop:~/my_awk$ echo -e "cat\nbat\nfun\nfin\nfan" | awk '/f.n/'
fun
fin
fan
Regular expression ^ .
It matches the start of the line.
dmi@dmi-laptop:~/my_awk$ echo -e "This\nThat\nThere\nTheir\nthese"
This
That
There
Their
these
dmi@dmi-laptop:~/my_awk$ echo -e "This\nThat\nThere\nTheir\nthese" | awk '/^The/'
There
Their
Regular expression $.
It matches the end of line.
dmi@dmi-laptop:~/my_awk$ echo -e "knife\nknow\nfun\nfin\nfan\nnine"
knife
know
fun
fin
fan
nine
dmi@dmi-laptop:~/my_awk$ echo -e "knife\nknow\nfun\nfin\nfan\nnine" | awk '/n$/'
fun
fin
fan
Regular expression [ ] Match character set
It is used to match only one out of several characters.
dmi@dmi-laptop:~/my_awk$ echo -e "Call\nTall\nBall"
Call
Tall
Ball
dmi@dmi-laptop:~/my_awk$ echo -e "Call\nTall\nBall" | awk '/[CT]all/'
Call
Tall
Regular expression [^ ] Exclusive set
In the exclusive set, the ^ negates the set of characters in the square brackets.
dmi@dmi-laptop:~/my_awk$ echo -e "Call\nTall\nBall"
Call
Tall
Ball
dmi@dmi-laptop:~/my_awk$ echo -e "Call\nTall\nBall" | awk '/[^CT]all/'
Ball
How to find the length of each record in a file?
dmi@dmi-laptop:~/my_awk$ cat my_example.txt
aaa bbb
cccccc dd
eee
fffff fff ffff
ggg hh hhh hhhh
kkk ll
dmi@dmi-laptop:~/my_awk$ awk '{print $0, ".....", length($0)}' my_example.txt
aaa bbb ..... 7
cccccc dd ..... 9
eee ..... 3
fffff fff ffff ..... 14
ggg hh hhh hhhh ..... 15
kkk ll ..... 6
Delimiter
dmi@dmi-laptop:~/my_awk$ cat some_file_with_commas.txt
aaa, bbb, ccc, dddd
eee
ff, gggg, hhhh, kk, llllll, mmmm, nnn
ooooo, pppp,qqq
rrr
sss
ttt, uuu,
vvv
dmi@dmi-laptop:~/my_awk$ awk -F, ' { print $2 } ' some_file_with_commas.txt
 bbb

 gggg
 pppp


 uuu

dmi@dmi-laptop:~/my_awk$ awk -F, ' length($2)>0 { print $2 } ' some_file_with_commas.txt
 bbb
 gggg
 pppp
 uuu
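With -F, the space that follows each comma stays part of the next field. One possible way around this, shown here only as a sketch, is to pass a regular expression as the delimiter. The sample line and the variable names line, plain and trimmed are assumptions made for illustration.

```shell
# One made-up line in the same "comma plus space" layout.
line='ff, gggg, hhhh'

# Delimiter is a single comma: the second field keeps its leading space.
plain=$(printf '%s\n' "$line" | awk -F, '{ print "[" $2 "]" }')

# Delimiter is the regular expression ", *" (a comma followed by any number of spaces).
trimmed=$(printf '%s\n' "$line" | awk -F', *' '{ print "[" $2 "]" }')

echo "$plain"     # [ gggg]
echo "$trimmed"   # [gggg]
```

The square brackets are printed only to make the leading space visible.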
Sum of file sizes with AWK on a list of files
dmi@dmi-laptop:~/my_awk$ ls -l
total 16
-rw-rw-r-- 1 dmi dmi  99 Dec 11 08:45 command.awk
-rw-rw-r-- 1 dmi dmi 120 Dec 11 08:18 marks.txt
-rw-rw-r-- 1 dmi dmi  60 Dec 11 08:37 my_example.txt
-rw-rw-r-- 1 dmi dmi 100 Dec 11 09:11 some_file_with_commas.txt
dmi@dmi-laptop:~/my_awk$ ls -l | awk '{sum += $5} END {print sum}'
379

dmi@dmi-laptop:~/my_awk$ ls -l | awk '$5 < 100 {print $0} '
total 16
-rw-rw-r-- 1 dmi dmi  99 Dec 11 08:45 command.awk
-rw-rw-r-- 1 dmi dmi  60 Dec 11 08:37 my_example.txt
dmi@dmi-laptop:~/my_awk$ ls -l | awk '$5 < 100 {print $9} '

command.awk
my_example.txt
dmi@dmi-laptop:~/my_awk$ ls -l | awk 'length($5)>0 && $5 < 100 {print $9} '
command.awk
my_example.txt
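The "total 16" line has no fifth field, which is why it passes the $5 < 100 test unless it is filtered out. As an alternative to length($5)>0, the sketch below filters on the number of fields (NF). The listing is the one shown above, passed inline through a hypothetical variable listing so that the result does not depend on the current directory. The variable names small and summary are made up too.

```shell
# The ls -l listing from this article, passed inline.
listing='total 16
-rw-rw-r-- 1 dmi dmi  99 Dec 11 08:45 command.awk
-rw-rw-r-- 1 dmi dmi 120 Dec 11 08:18 marks.txt
-rw-rw-r-- 1 dmi dmi  60 Dec 11 08:37 my_example.txt
-rw-rw-r-- 1 dmi dmi 100 Dec 11 09:11 some_file_with_commas.txt'

# NF >= 9 keeps only real file entries; the "total" line has two fields.
small=$(printf '%s\n' "$listing" |
  awk 'NF >= 9 && $5 < 100 { print $9 " (" $5 " bytes)" }')

# Count the files and sum their sizes in one pass.
summary=$(printf '%s\n' "$listing" |
  awk 'NF >= 9 { sum += $5; n++ } END { print n " files, " sum " bytes" }')

echo "$small"
echo "$summary"
```

Note that this simple field test assumes file names without spaces, as in the listing above.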
Skip first line of file
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data
Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$ awk '(NR>1)' some_data_to_populate.data
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
AWK's NR variable holds the number of records read so far, that is, the number of the current record.
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data
Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$ awk -F, '(NR>1) { printf("%s", $2) } ' some_data_to_populate.data
 Green street Apple street Orange streetdmi@dmi-laptop:~/my_awk$

dmi@dmi-laptop:~/my_awk$ awk -F, '(NR>1) { printf("%s\n", $2) } ' some_data_to_populate.data
 Green street
 Apple street
 Orange street
(NR>1) skips the first record of the file, which here is the header line.
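NR grows by one with every record that is read, and it still holds the last record number inside the END block. A minimal sketch, assuming three made-up input lines and a hypothetical variable name out:

```shell
# Three made-up lines; the first one plays the role of a header.
out=$(printf 'header\nfirst\nsecond\n' |
  awk 'NR > 1 { print NR ": " $0 } END { print "records read: " NR }')
echo "$out"
```

The header is record 1 and is skipped, so the numbering in the output starts at 2.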
dmi@dmi-laptop:~/my_awk$ cat some_file_with_commas.txt
aaa, bbb, ccc, dddd
eee
ff, gggg, hhhh, kk, llllll, mmmm, nnn
ooooo, pppp,qqq
rrr
sss
ttt, uuu,
vvv
dmi@dmi-laptop:~/my_awk$ awk ' { printf("\x27") } ' some_file_with_commas.txt
''''''''dmi@dmi-laptop:~/

dmi@dmi-laptop:~/my_awk$ awk ' { printf("\x27\n") } ' some_file_with_commas.txt
'
'
'
'
'
'
'
'

dmi@dmi-laptop:~/my_awk$ awk -F, ' { printf("\x27%s\x27\n", $1) } ' some_file_with_commas.txt
'aaa'
'eee'
'ff'
'ooooo'
'rrr'
'sss'
'ttt'
'vvv'
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data | awk -F, ' NR>1 { printf("insert into some_table values(trim(\x27%s\x27), trim(\x27%s\x27), trim(\x27%s\x27), %s);\n", $1, $2, $3, $4); } '
insert into some_table values(trim('John'), trim(' Green street'), trim(' 2000-01-01'), 100);
insert into some_table values(trim('Ann'), trim(' Apple street'), trim(' 1980-05-22'), 99);
insert into some_table values(trim('Miki'), trim(' Orange street'), trim(' 1985-01-01'), 97);

dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data | awk -F, ' NR>1 { printf("update some_table set the_address=trim(\x27%s\x27), the_birthday=trim(\x27%s\x27), the_mark=%s where the_name=\x27%s\x27;\n", $2, $3, $4, $1); } '
update some_table set the_address=trim(' Green street'), the_birthday=trim(' 2000-01-01'), the_mark= 100 where the_name='John';
update some_table set the_address=trim(' Apple street'), the_birthday=trim(' 1980-05-22'), the_mark= 99 where the_name='Ann';
update some_table set the_address=trim(' Orange street'), the_birthday=trim(' 1985-01-01'), the_mark= 97 where the_name='Miki';

dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data
Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data | awk -F, ' NR>1 { printf("insert into some_table values(trim(\x27%s\x27), trim(\x27%s\x27), trim(\x27%s\x27), %s);\n", $1, $2, $3, $4); } ' > RunMe.sql
dmi@dmi-laptop:~/my_awk$ cat RunMe.sql
insert into some_table values(trim('John'), trim(' Green street'), trim(' 2000-01-01'), 100);
insert into some_table values(trim('Ann'), trim(' Apple street'), trim(' 1980-05-22'), 99);
insert into some_table values(trim('Miki'), trim(' Orange street'), trim(' 1985-01-01'), 97);
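The generated statements rely on the database's trim() to remove the space after each comma, and they would break on a value that itself contains a single quote. The sketch below is one possible variation, not the only way to do it. It splits on the regular expression ", *" so that no trim() is needed, and it doubles any single quote inside a text field. The input with the address O'Neil street is made up for illustration, and the quote character is passed in through -v q instead of \x27.

```shell
# Hypothetical input: same layout as some_data_to_populate.data, plus a made-up
# address that contains an apostrophe.
out=$(printf "%s\n" \
  "Name, Address, Birthday, Mark" \
  "John, Green street, 2000-01-01, 100" \
  "Ann, O'Neil street, 1980-05-22, 99" |
awk -F', *' -v q="'" '
NR > 1 {
    for (i = 1; i <= 3; i++) gsub(q, q q, $i)   # double any quote inside a text field
    printf "insert into some_table values(%s%s%s, %s%s%s, %s%s%s, %s);\n",
           q, $1, q, q, $2, q, q, $3, q, $4
}')
echo "$out"
```

Doubling the quote is the standard SQL way to write a literal single quote inside a string. For untrusted input, a database client with bound parameters is the safer choice.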