UNIX - awk: Data Extraction and Formatted Reporting Tool
Presentation by Nihar R Paital
Introduction  Developer : Alfred Aho Peter Weinberger Brian Kernighan  Appears in : Version 7 UNIX onwards  Developed during : 1970 s  Developed at : Bell Labs  Category : UNIX Utility  Supported by : All UNIX flavors Nihar R Paital
Definition
The AWK utility is a data extraction and reporting tool. It uses a data-driven scripting language: a program is a set of actions to be applied to textual data (either files or data streams) in order to produce formatted reports.
awk performs basic text formatting on an input stream (a file, or input from a pipeline). The program is wrapped in single quotes so that the shell does not expand $1 or interpret the braces.
  • Formatting using an input file
      $ awk '{print $n}' filename
    Example:
      $ awk '{print $1}' awk.txt > awk.txt.bak
  • Formatting using a filter in a pipeline
      $ generate_data | awk '{print $1}'
    Example:
      $ cat awk.txt | awk '{print $1}' > awk.txt.bak

Before proceeding to the next slide, please create a file named awk.txt with the following contents:
    07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
    123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"
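The two invocation forms above can be tried with a short sketch like the following. It writes the sample file into a scratch directory created with mktemp; that directory and the variable names (workdir, from_file, from_pipe) are assumptions of this sketch and are not part of the slides.

```shell
# Create the sample file in a scratch directory (an assumption of this
# sketch, so that nothing in the current directory is overwritten).
workdir=$(mktemp -d)
cat > "$workdir/awk.txt" <<'EOF'
07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"
EOF

# Form 1: awk reads the file named on the command line.
from_file=$(awk '{print $1}' "$workdir/awk.txt")

# Form 2: awk reads standard input from a pipeline.
from_pipe=$(cat "$workdir/awk.txt" | awk '{print $1}')

# Both forms yield the first field (the client address) of each line.
printf '%s\n' "$from_file"
```

Both variables should end up holding the same two addresses, since awk treats a named file and standard input in the same way.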
Basic but important for awk
  • Syntax:
      $ awk '{print $n}' filename
      $ generate_data | awk '{print $n}'
  • An awk action is enclosed in braces: it starts with "{" and ends with "}".
  • $0 is the entire line.
  • awk parses each line into fields automatically, using any run of whitespace (spaces, tabs) as the delimiter.
  • The fields of a line are available as $1, $2, $3, and so on.
  • NF: a special variable that contains the number of fields in the current line. The last field can be printed as $NF.
  • NR: a special variable that contains the number of the line (record) currently being processed.
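As a minimal illustration of the variables listed above, the sketch below feeds awk one made-up line whose words are separated by a single space, several spaces, and a tab. The input text and the variable name summary are hypothetical and chosen only for this example.

```shell
# One input line with mixed separators: one space, three spaces, one tab.
# printf expands \t in its format string into a real tab character.
summary=$(printf 'alpha beta   gamma\tdelta\n' |
  awk '{print "record " NR ": " NF " fields, first=" $1 ", last=" $NF}')

# Default field splitting treats every run of blanks and tabs as one
# delimiter, so the line has four fields.
printf '%s\n' "$summary"
# prints: record 1: 4 fields, first=alpha, last=delta
```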
Basic Examples
    $ awk '{print $0}' awk.txt
  Prints every line of the file unchanged.
    $ echo 'this is a test' | awk '{print $3}'
  Prints "a".
    $ echo 'this is a test' | awk '{print $NF}'
  Prints "test".
    $ awk '{print $1, $(NF-2)}' awk.txt
  Prints the first field and the third field from the end of each line of awk.txt.
    $ awk '{print NR ") " $1 " -> " $(NF-2)}' awk.txt
  Output:
    1) 07.46.199.184 -> 200
    2) 123.125.71.19 -> 304
Advanced use of AWK
(The examples below read logs.txt, which is assumed to hold the same two lines as awk.txt.)
    $ awk '{print $2}' logs.txt
  Output:
    [28/Sep/2010:04:08:20]
    [28/Sep/2010:04:20:11]
The date field uses "/" and ":" as internal separators. Suppose we want to print only the date part:
    [28/Sep/2010
    [28/Sep/2010
    $ awk '{print $2}' logs.txt | awk 'BEGIN{FS=":"}{print $1}'
  Output:
    [28/Sep/2010
    [28/Sep/2010
Here FS=":" sets the field separator to a colon (:).
    $ awk '{print $2}' logs.txt | awk 'BEGIN{FS=":"}{print $1}' | sed 's/\[//'
  Output:
    28/Sep/2010
    28/Sep/2010
Here sed substitutes "[" with an empty string. The "[" must be escaped as "\[", because an unescaped "[" opens a bracket expression in a regular expression and sed would reject the command.
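The same date extraction can also be done inside a single awk process. The sketch below is an alternative to the pipeline on the slide, not the presenter's method: it uses the standard awk functions split() and substr(), and it also shows the -F option as a command-line equivalent of setting FS in a BEGIN block. The variable names dates and prefix are assumptions of this sketch.

```shell
# Feed the two sample log lines to one awk process.
dates=$(printf '%s\n' \
  '07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"' \
  '123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"' |
  awk '{
    split($2, parts, ":")      # parts[1] is "[28/Sep/2010"
    print substr(parts[1], 2)  # drop the leading "[" (substr is 1-based)
  }')
printf '%s\n' "$dates"

# -F: on the command line has the same effect as BEGIN{FS=":"}.
prefix=$(printf '[28/Sep/2010:04:08:20]\n' | awk -F: '{print $1}')
printf '%s\n' "$prefix"
```

Keeping the work in one awk program avoids two extra processes and the need to escape "[" for sed.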
Advanced use of AWK
To return only the lines with status 200:
    $ awk '{if ($(NF-2) == "200") {print $0}}' logs.txt
  Output:
    07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
To keep a running total of the status field, printed after every line:
    $ awk '{a+=$(NF-2); print "Total so far:", a}' logs.txt
  Output:
    Total so far: 200
    Total so far: 504
To print only the final total, move the print into an END block, which runs once after the last line has been read:
    $ awk '{a+=$(NF-2)}END{print "Total:", a}' logs.txt
  Output:
    Total: 504
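The status filter above can also be written in awk's pattern form, where a condition placed in front of the braces selects the lines the action applies to, and a condition with no action prints the matching lines. The following is a sketch under the assumption that the log has the two sample lines; the temporary file and the variable names (log, ok_lines, ok_count) are hypothetical.

```shell
# Write the sample log to a temporary file (an assumption of this sketch).
log=$(mktemp)
cat > "$log" <<'EOF'
07.46.199.184 [28/Sep/2010:04:08:20] "GET /robots.txt HTTP/1.1" 200 0 "msnbot"
123.125.71.19 [28/Sep/2010:04:20:11] "GET / HTTP/1.1" 304 - "Baiduspider"
EOF

# A pattern with no action prints each matching line, which gives the
# same result as the if (...) {print $0} form.
ok_lines=$(awk '$(NF-2) == "200"' "$log")

# Count the matching lines and report once, in the END block.
# n+0 forces a numeric 0 if no line matched.
ok_count=$(awk '$(NF-2) == "200" {n++} END {print n+0}' "$log")

printf '%s\n' "$ok_lines"
printf 'lines with status 200: %s\n' "$ok_count"
rm -f "$log"
```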

Unix - Class7 - awk
