Introduction to awk
Arun Vishwanathan
Nevis Networks Pvt. Ltd.

Agenda
• What is awk?
• awk versions
• A few basic things about awk
• Program structure in awk
• A simple example
• Running awk programs
• Advanced awk features
• awk examples
• Advantages of awk
• awk references

What is awk?
• The name awk is derived from the surnames of its inventors: Aho, Weinberger and Kernighan.
• The original awk paper published by Bell Labs describes it as follows: "Awk is a programming language designed to make many common information retrieval and text manipulation tasks easy to state and to perform."
• Simply put, awk is a programming language designed to search files for patterns and to perform actions on the records that match.

awk versions
• awk – Original Bell Labs awk (Version 7 UNIX, around 1978); on current systems the name usually refers to a POSIX awk.
• nawk – New awk (written in 1985; the version shipped with SVR4 dates from around 1989).
• gawk – GNU implementation of the awk standard.
• mawk – Michael's awk.
... and the list goes on.
All of these are basically the same, apart from minor differences in the features provided. This presentation assumes the widely used POSIX awk (also called "awk").

A few basic things about awk
• awk reads from a file or from its standard input, and writes to its standard output.
• awk recognizes the concepts of "file", "record" and "field".
• A file consists of records, which by default are the lines of the file. One line becomes one record.
• awk operates on one record at a time.
• A record consists of fields, which by default are separated by any number of spaces or tabs.
• Field number 1 is accessed with $1, field 2 with $2, and so forth. $0 refers to the whole record.

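As a small illustration of records and fields, the sketch below feeds three made-up lines to awk and reports, for each record, its number, its field count and its first field. The input text is hypothetical; NR (current record number) and NF (number of fields in the current record) are standard awk built-in variables not introduced on the slide.

```shell
# Hypothetical three-line input; each line becomes one record.
out=$(printf 'alpha beta gamma\none two\nsolo\n' |
  awk '{ print "record " NR " has " NF " fields; $1 = " $1 }')
printf '%s\n' "$out"
```

Each output line corresponds to exactly one input line, which is the "one record at a time" behaviour described above.
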
Program structure in awk
• An awk program is a sequence of statements of the form:
    pattern { action }
    pattern { action }
    ...
• The pattern in front of an action acts as a selector that determines whether the action is executed for the current record.
• Either part may be omitted: a missing pattern matches every record, and a missing action prints the matching record.
• Patterns can be regular expressions, arithmetic relational expressions, string-valued expressions, and arbitrary boolean combinations of these.

Program structure in awk (cont.)
• An action is a sequence of action statements terminated by newlines or semicolons.
• These action statements can be used for a variety of bookkeeping and string-manipulation tasks.
• awk programs can be written either in a file or directly on the command line.

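The kinds of pattern listed above can be tried out on a tiny data set. This is only a sketch: the fruit names and numbers are invented for the example, and the variable names (data, re, rel, both) are arbitrary.

```shell
# Hypothetical input: a name and a quantity on each line.
data='apple 3
banana 12
cherry 7
avocado 20'

# Regular-expression pattern: records that contain "an".
re=$(printf '%s\n' "$data" | awk '/an/ { print $1 }')

# Relational pattern: records whose second field is greater than 10.
rel=$(printf '%s\n' "$data" | awk '$2 > 10 { print $1 }')

# Boolean combination: first field starts with "a" AND second field is below 10.
both=$(printf '%s\n' "$data" | awk '$1 ~ /^a/ && $2 < 10 { print $1 }')

printf 'regex:\n%s\nrelational:\n%s\ncombined:\n%s\n' "$re" "$rel" "$both"
```
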
A simple example
• Problem: Get the user ID of user "arun" from the /etc/passwd file.
• Suppose the /etc/passwd file contains the following entries:
    arun:x:504:504::/home/arun:/bin/bash
    try:x:500:500::/home/try:/bin/bash
    optima:x:501:501::/home/optima:/bin/bash
    optimal:x:502:502::/home/optimal:/bin/bash
• awk will see this file as follows:
  – 1 line = 1 record (by default), so in total there are 4 records in the file.
  – 1 record = 7 fields separated by ":" (not the default).
Note: The default field separator is whitespace (any run of spaces or tabs).

A simple example (cont.)
    $ awk -F":" '/arun/ {print $1 " " $3}' /etc/passwd
• awk: the awk executable
• -F":": the field separator
• /arun/: the pattern to search for
• {print $1 " " $3}: the action to perform on a line if the pattern matches
• /etc/passwd: the file to operate upon

A simple example (cont.)
• The output of the above command will be:
    [root@tux root]# awk -F":" '/arun/ {print $1 " " $3}' /etc/passwd
    arun 504
    [root@tux root]#
• Another way to write the command is:
    [root@tux root]# awk 'BEGIN { FS=":" } /arun/ {print $1 " " $3}' /etc/passwd
    arun 504
    [root@tux root]#

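One thing worth noting about the command above: /arun/ matches any record that contains "arun" anywhere on the line. The sketch below contrasts it with an exact comparison on the first field. It uses the four sample entries plus one extra user, "karun", which is a made-up entry added only to show the difference.

```shell
# The four sample entries plus one hypothetical user ("karun").
passwd='arun:x:504:504::/home/arun:/bin/bash
try:x:500:500::/home/try:/bin/bash
optima:x:501:501::/home/optima:/bin/bash
optimal:x:502:502::/home/optimal:/bin/bash
karun:x:503:503::/home/karun:/bin/bash'

# /arun/ matches every record containing "arun" anywhere.
loose=$(printf '%s\n' "$passwd" | awk -F: '/arun/ { print $1 " " $3 }')

# $1 == "arun" matches only when the first field is exactly "arun".
exact=$(printf '%s\n' "$passwd" | awk -F: '$1 == "arun" { print $1 " " $3 }')

printf 'regex match:\n%s\nexact match:\n%s\n' "$loose" "$exact"
```

When the goal is to look up one specific user, the exact comparison is usually the safer choice.
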
Running awk programs
There are four ways in which we can run awk programs:
• One-shot: Running a short throw-away awk program.
    $ awk 'program' input-file1 input-file2 ...
  where program consists of a series of patterns and actions.
• Read terminal: Using no input files (input comes from the terminal instead).
    $ awk 'program' <ENTER>
    <input lines>
    <input lines>
    ctrl-d
• Long: Putting permanent awk programs in files.
    $ awk -f source-file input-file1 input-file2 ...

Running awk programs (cont.)
• Executable scripts: Making self-contained awk programs.
  Example: write a script named hello with the following contents:
    #! /bin/awk -f
    # a sample awk program
    /foo/ { print $1 }
  (The path on the first line must match where awk is installed; on many systems it is /usr/bin/awk.)
  Make the script executable:
    $ chmod +x hello
  To run this script, simply type:
    $ ./hello file.txt

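The "long" form can be tried without touching any permanent files. The following sketch writes the sample program to a temporary directory and runs it with -f; the directory, the file names and the three input lines are assumptions made for the example, and mktemp -d is assumed to be available.

```shell
# Work in a throw-away directory (assumes mktemp -d is available).
dir=$(mktemp -d)

# Hypothetical input file.
printf 'foo one\nbar two\nfoobar three\n' > "$dir/file.txt"

# The sample program from the slide, stored in a file.
printf '%s\n' '# a sample awk program' '/foo/ { print $1 }' > "$dir/hello.awk"

# Run the stored program against the input file.
out=$(awk -f "$dir/hello.awk" "$dir/file.txt")
printf '%s\n' "$out"

rm -rf "$dir"
```
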
Advanced awk features
• awk borrows a lot from the C language.
• The if statement, for loop and while loop have the same syntax as in C.
• awk variables are not declared and have no fixed type; a string that looks like a number is converted when it is used in arithmetic. For example:
    x = "1.01"
    x = x + 1
    print x
  The above will print the value 2.01.
• Comparison operators in awk are: "==", "<", ">", "<=", ">=", "!=", "~" and "!~".
• The "~" and "!~" operators mean "matches" and "does not match".

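The points above can be checked with a program that needs no input at all, since everything runs inside a BEGIN block. This is a minimal sketch; the variable names and test strings are arbitrary.

```shell
out=$(awk 'BEGIN {
    x = "1.01"                      # assigned as a string
    x = x + 1                       # arithmetic converts it to a number
    print x

    s = "foobar"
    if (s ~ /^foo/) print "matches ^foo"
    if (s !~ /baz/) print "does not match baz"

    sum = 0
    for (i = 1; i <= 3; i++)        # C-style for loop
        sum += i
    print "sum:", sum
}')
printf '%s\n' "$out"
```
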
Advanced awk features (cont.)
• Common arithmetic operators in awk are: "+", "-", "/", "*".
• "^" is the exponentiation operator.
• "%" is the modulo operator.
• C-style operators such as "++", "--", "+=", "-=" and "/=" are also valid.
• The awk language has one-dimensional arrays for storing groups of related strings or numbers.
• Arrays in awk are associative. This means that each array is a collection of pairs: an index and its corresponding array element value. For example:
    Element 1        value 2
    Element 2        value "foo"
    Element "cat"    value "chicken"

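A common use of associative arrays is counting. The sketch below counts how many accounts use each login shell, indexing the array by the shell path. The three entries are modelled on the earlier sample data, with one shell changed to /bin/sh purely for illustration. Because awk does not guarantee the order in which "for (key in array)" visits elements, the output is passed through sort.

```shell
# Sample entries; the /bin/sh entry is a hypothetical variation.
passwd='arun:x:504:504::/home/arun:/bin/bash
try:x:500:500::/home/try:/bin/sh
optima:x:501:501::/home/optima:/bin/bash'

# count[] is indexed by the login shell (field 7).
out=$(printf '%s\n' "$passwd" |
  awk -F: '{ count[$7]++ }
           END { for (shell in count) print shell, count[shell] }' |
  sort)
printf '%s\n' "$out"
```
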
awk examples
    $ awk '{ print $0 }' /etc/passwd
Prints all the lines in /etc/passwd.
    $ awk -F":" '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd
Prints the 1st and 3rd fields of each line in /etc/passwd, with labels. The input fields are separated by ":".
    $ awk -f script1.awk /etc/passwd
script1.awk:
    BEGIN { x = 0 }      # The BEGIN block is executed before the file is processed
    /^$/  { x = x + 1 }  # For every empty line, increment the count
    END   { print "I found " x " blank lines. :)" }  # Executed at the end
The above script counts the number of empty lines. Note that BEGIN and END are special patterns.

awk examples (cont.)
    $ awk 'BEGIN { RS = "/" } ; { print $0 }' file1.txt
RS is the record separator (the default is "\n"). In this example RS is changed to "/" before the file is processed, so awk splits the input into records at each "/" character.
    $ awk '$1 ~ /foo/ { print $0 }' file.txt
The pattern selects all records from file.txt whose first field contains the string "foo", and the action prints them.
    $ awk '{ print $(2*2) }' file.txt
In the above example the field number is an expression, so awk will print the 4th field of every record.

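To see the effect of changing RS, the sketch below sends a single line containing two "/" characters through awk and prints each record's number and first field. The input string is made up, and it deliberately has no trailing newline so that the last record contains nothing but its two words.

```shell
# One line of hypothetical input; "/" separates the records.
out=$(printf 'a b/c d/e f' |
  awk 'BEGIN { RS = "/" } { print NR ": " $1 }')
printf '%s\n' "$out"
```
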
awk examples (cont.)
    $ awk '{ $3 = $2 - 10; print $2, $3 }' inventory-shipped
For each record, this example subtracts 10 from the second field, stores the result in the third field, and prints both fields.
    $ awk 'BEGIN { FS = "," } ; { print $2 }' file.txt
FS is the field separator in awk. In the above example we are asking awk to separate the fields by "," instead of the default whitespace.
    $ awk 'BEGIN { OFS = ";"; ORS = "\n\n" } { print $1, $2 }' file1.txt
OFS is the output field separator and ORS is the output record separator. This prints the first and second fields of each input record separated by a semicolon, with a blank line added after each line.

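The OFS/ORS example can be reproduced on two lines of made-up input. Note that the shell's command substitution strips trailing newlines, so the blank line after the last record does not appear in the captured result.

```shell
# Two hypothetical input records with three fields each.
out=$(printf 'a b c\nd e f\n' |
  awk 'BEGIN { OFS = ";"; ORS = "\n\n" } { print $1, $2 }')
printf '%s\n' "$out"
```

The comma in "print $1, $2" is what inserts OFS between the fields; writing "print $1 $2" would concatenate them with nothing in between.
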
awk examples (cont.)
Consider that we have the following input in a file called grades:
    john 85 92 78 94 88
    andrea 89 90 75 90 86
    jasper 84 88 80 92 84
The following awk script, grades.awk, will find the average for each student:
    # average five grades
    {
        total = $2 + $3 + $4 + $5 + $6
        avg = total / 5
        print $1, avg
    }
    $ awk -f grades.awk grades

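A possible variation on grades.awk, shown here only as a sketch, loops over the fields instead of naming them one by one, so it keeps working if the number of grades per line changes. It uses the same three input lines; NF is awk's built-in count of fields in the current record.

```shell
grades='john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84'

out=$(printf '%s\n' "$grades" | awk '{
    total = 0
    for (i = 2; i <= NF; i++)    # fields 2..NF hold the grades
        total += $i
    print $1, total / (NF - 1)   # NF - 1 grades per record
}')
printf '%s\n' "$out"
```
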
awk examples (cont.)
    $ awk 'BEGIN { OFMT = "%d"   # print numbers as integers
                   print 17.23 }'
This will print 17. OFMT is the output format that print uses for numbers.
    $ awk -f mailerr.awk
mailerr.awk:
    {
        report = "mail bug-system"
        print "Awk script failed:", $0 | report
        print "at record number", FNR, "of", FILENAME | report
        close(report)
    }
This script opens a pipe to the mail command and prints output into the pipe. When the pipe is closed, the mail is sent. awk assumes that whatever comes after the "|" symbol is a command and creates a process for it.

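The same pipe mechanism can be tried without sending any mail. In the sketch below the sort command stands in for mail, which is a substitution made only so the example is safe to run; the three input words are made up.

```shell
out=$(printf 'pear\napple\nfig\n' | awk '{
    cmd = "sort"          # stand-in for the mail command
    print $0 | cmd        # every record is written into the same pipe
}
END { close(cmd) }')      # closing the pipe lets sort finish and emit its output
printf '%s\n' "$out"
```

As with the mail example, nothing comes out of the command until the pipe is closed.
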
• awk '{ if (NF > max) max = NF } •
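The fragment above is incomplete as it stands. One plausible way to complete it, assuming the intent is to report the largest number of fields found on any line, is to add an END block that prints max. Both the END block and the sample input are assumptions, not part of the original slide.

```shell
# Hypothetical input with 2, 4 and 1 fields per line.
out=$(printf 'a b\na b c d\na\n' |
  awk '{ if (NF > max) max = NF } END { print max }')
printf '%s\n' "$out"
```
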
Advantages of awk
• awk is an interpreted language, so you can avoid the usually lengthy edit-compile-test-debug cycle of software development.
• It can be used for rapid prototyping.
• The awk language is very useful for producing reports from large amounts of raw data, such as summarizing the output of other utility programs like ls.

awk references
• The GNU Awk manual
• Awk -- A Pattern Scanning and Processing Language (original awk paper)
• http://www-106.ibm.com/developerworks/library/l-awk1.html
• http://www-106.ibm.com/developerworks/library/l-awk2.html
• http://www-106.ibm.com/developerworks/library/l-awk3.html
• sed & awk, 2nd Edition (O'Reilly)
