LIN 6932 1 Searching for something in a file GREP
• The grep family is a collection of three related programs for finding patterns in files. Their names are grep, fgrep, and egrep.
• The name grep comes from the ed editor command g/re/p: "globally search for a regular expression and print"
• grep is a full-blown regular-expression matcher
• fgrep = "fixed string grep"; it only searches for fixed strings
• egrep = "extended grep"
LIN 6932 2 Searching for something in a file fgrep
fgrep: the easiest (but not fastest) one to use
Syntax:
% fgrep [options] 'search string' filenames
Interpretation: In the name fgrep the f stands for "Fixed string", and not "Fast" (contrary to what the man page may tell you). The fgrep program finds all the lines in a file that contain a certain fixed string. So, for example, I could find all lines containing CA in the files in the current working directory simply by typing this command:
% fgrep CA *
LIN 6932 3 Searching for something in a file fgrep
• Like many UNIX filters, it can take as many file names as you like to supply. And of course it permits various adverbs that specify options; two useful ones are
• -i ignore the difference between upper case and lower case when deciding what is a match
• -v reverse the effect of the search by outputting only the lines that don't match
% fgrep -i CA *
% fgrep -v CA *
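A minimal sketch of the two options, assuming a small made-up file called places (it is not one of the course files). It uses grep -F, the POSIX spelling of fgrep, so that it also runs on systems where the fgrep name is no longer provided.

```shell
# Build a throwaway sample file (hypothetical data, not from the course).
dir=$(mktemp -d)
printf '%s\n' 'Santa Cruz, CA 95060' 'Portland, OR 97201' 'Moncada, Spain' > "$dir/places"

# grep -F treats the pattern as a fixed string, exactly like fgrep.
plain=$(grep -F CA "$dir/places")        # lines containing CA
nocase=$(grep -F -i CA "$dir/places")    # also picks up the "ca" in Moncada
inverse=$(grep -F -v CA "$dir/places")   # lines that do NOT contain CA

printf 'plain:\n%s\n\nnocase:\n%s\n\ninverse:\n%s\n' "$plain" "$nocase" "$inverse"
rm -r "$dir"
```

Note that -v is applied to the case-sensitive search here, so the Moncada line counts as a non-match and is printed.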
LIN 6932 4 Searching for something in a file fgrep
The key limitation of fgrep is that you cannot use it to get approximate matches, or matches of more complicated patterns that cannot be described by just giving a fixed string. Sometimes you are not quite sure what string you are looking for; for example, you might know only that the word you are seeking begins with z, ends with -ic or -ics, and has the sequence gm in it somewhere. What you need, then, is not a program that will find the matching lines for you if you give it the exact string you need to find, but rather a program that can understand a language in which you can say things like "begins with z and ends with -ic or -ics and has gm in it somewhere."
LIN 6932 5 Searching for something in a file grep
grep is called up by giving a command that has this form:
% grep [options] pattern_description files_to_search_in
% grep -i 'pull[aeiou][mn]' shakespeare bad_phone_numbers display
• This means, "without distinguishing between upper and lower case, search the files shakespeare, bad_phone_numbers, and display for lines that contain pull followed by a vowel letter followed by an m or an n".
• The expression pull[aeiou][mn] is a pattern description covering the name Pullum and most common variants of it. Thus it is looking for lines containing Pullum, Pullam, Pullen, PULLUN, pullum@grove.ufl.edu, etc.
• The pattern descriptions used with grep are in a language called the language of regular expressions. This is one of the most important and fruitful developments in modern computer science, and in order to use grep you need to understand regular expressions thoroughly.
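To see which lines the bracketed character sets let through, here is a small sketch. The file names and its contents are invented for illustration; only the pattern comes from the slide.

```shell
# Hypothetical sample file with several near-variants of the name.
dir=$(mktemp -d)
printf '%s\n' 'Geoffrey Pullum' 'PULLEN, J.' 'pullum@grove.ufl.edu' \
              'a pulley' 'Pullman car' > "$dir/names"

# pull, then exactly one vowel, then m or n; -i ignores case.
hits=$(grep -i 'pull[aeiou][mn]' "$dir/names")
printf '%s\n' "$hits"
rm -r "$dir"
```

The lines with pulley and Pullman are skipped: in pulley the vowel is followed by y, and in Pullman the letter after pull is not a vowel.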
LIN 6932 6 Searching for something in a file grep
There are various dialects of the regular expression language that are used by various UNIX programs. Here we will be talking about grep and its extended cousin egrep. (Read the excellent summary with examples in Unix in a Nutshell, particularly chapter 6, and do man grep on a NetBSD machine to check the details of the GNU grep that runs on those machines.)
(GNU is pronounced guh-noo, approximately like canoe. The project was launched in 1984 to develop a complete Unix-like operating system that is free software; combined with the Linux kernel, the resulting system is often referred to simply as Linux.)
Note that the grep that runs on other machines may be a different program, with lots of differences in its behavior from the GNU version.
LIN 6932 8 Searching for something in a file grep
• Example: the description "a z at the beginning of the line, -ic or -ics at the end of the line, and gm somewhere in between" is expressed in the language of regular expressions in this form:
^z.*gm.*ics*$
To be more precise, what this regular expression means is: "beginning of line followed by z followed by optional other material followed by gm followed by optional other material followed by ic followed by zero or more occurrences of s followed by end of line"
• It can therefore be used in a grep command to search a dictionary, where each word is on a separate line, for a word meeting the description:
% grep '^z.*gm.*ics*$' dictionary
Search result: zeugmatic
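The anchored pattern can be tried out on a tiny word list. This is only a sketch: the list stands in for the dictionary file, and zeugmatics is a made-up entry included purely to exercise the s* part.

```shell
# Hypothetical mini-dictionary, one word per line.
dir=$(mktemp -d)
printf '%s\n' zebra zeugma zeugmatic zeugmatics enigmatic zigzag > "$dir/dictionary"

# ^z  : z at the start of the line
# .*  : any material, possibly none
# gm  : the literal sequence gm
# ics*$ : ic, then zero or more s, then end of line
found=$(grep '^z.*gm.*ics*$' "$dir/dictionary")
printf '%s\n' "$found"
rm -r "$dir"
```

Here zeugma fails because it does not end in ic, and enigmatic fails because it does not start with z.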
LIN 6932 9 Searching for something in a file grep
The most trivial case of a regular expression is that of a fixed string of the sort that fgrep recognizes. Fixed strings are regular expressions that are matched only by strings identical to themselves. The regular expression Z is matched by any occurrence of Z. There happens to be only one line in The Great God Pan (/class/lin6932/c6932aab/machen.txt) that matches it, namely the middle line of these three:
remained. These three, however, were 'good lives,' but yet
not proof against the Zulu assegais and typhoid fever, and so
one morning Aubernoun woke up and found himself Lord
Because the middle line matches the expression Z, you can fetch (a copy of) that line out of the file like this:
% grep Z machen.txt
not proof against the Zulu assegais and typhoid fever, and so
LIN 6932 10 Searching for something in a file grep
% fgrep Z machen.txt
fgrep would do the same thing. But what fgrep cannot do is to call for all lines with Au possibly followed by some other lower-case letters and then an n. That is accomplished by the regular expression
Au[a-z]*n
This RE is matched by any sequence of a capital A followed by a lower-case u followed by zero or more letters in the range lower-case a to lower-case z followed by lower-case n. This means it will be matched by any string containing a word like any of these: Aubernoun, Augustine, Austin, etc.
LIN 6932 11 Searching for something in a file grep
% fmt -1 machen.txt | tr -d '[:punct:] ' | grep 'Au[a-z]*n' | sort -u
• The fmt command breaks the text up and puts one word on each line
• The tr -d '[:punct:] ' command erases all punctuation and spaces
• The sort -u command sorts the search result alphabetically and drops duplicate lines
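The following sketch runs the same kind of pipeline on two invented sentences instead of machen.txt. One substitution is made on purpose: the word-splitting step uses tr -s '[:space:]' '\n' rather than fmt -1, on the assumption that this behaves the same way across different tr and fmt implementations. LC_ALL=C is set only to make the sort order predictable.

```shell
# Hypothetical two-line sample text.
dir=$(mktemp -d)
printf '%s\n' 'one morning Aubernoun woke up, and Austin said to Aubernoun:' \
              '"Augustine was here," wrote the Author.' > "$dir/sample.txt"

words=$(tr -s '[:space:]' '\n' < "$dir/sample.txt" |  # one word per line
        tr -d '[:punct:]' |                           # strip punctuation
        grep 'Au[a-z]*n' |                            # keep words matching the RE
        LC_ALL=C sort -u)                             # sort, drop duplicates
printf '%s\n' "$words"
rm -r "$dir"
```

Author is filtered out because no n follows the Au, and the second Aubernoun disappears because of sort -u.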
LIN 6932 12 Searching for something in a file grep
% grep 'Au[a-z]*n' machen.txt
[Slide illustration: in Au[a-z]*n, the part [a-z]* can stand for any run of lower-case letters between Au and n, however long, as in Au[stralabracadabralaliolasia]n.]
LIN 6932 13 Searching for something in a file grep
Example: The zipcodes in the near vicinity of the UC campus are 95060 (Santa Cruz west of the river), 95062 (Live Oak), 95064 (UCSC), 95065 (East Santa Cruz), 95066 (Scotts Valley). Suppose you wanted to extract from a file called addresses, containing one full name and address on each line, just the addresses of people living in these areas. Assume some people type a space after CA and others don't, and some write several spaces. The following regular expression describes the set of zipcodes you want:
CA *9506[024-6]
This grep command will find just the lines in the file addresses that contain zipcodes for people who live near the campus:
% grep 'CA *9506[024-6]' addresses
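A quick way to convince yourself that the set [024-6] and the optional spaces do what the slide says is to run the pattern over a few invented address lines. The names and streets here are hypothetical.

```shell
# Hypothetical address file: zero, one, or several spaces after CA.
dir=$(mktemp -d)
printf '%s\n' 'A. Smith, 1 High St, Santa Cruz, CA 95060' \
              'B. Jones, 2 Oak Ave, Live Oak, CA95062' \
              'C. Wu, 3 Bay Rd, Capitola, CA 95010' \
              'D. Lee, 4 Pine Ln, Scotts Valley, CA   95066' \
              'E. Diaz, 5 Elm St, Santa Cruz, CA 95063' > "$dir/addresses"

# "CA", any number of spaces, 9506, then one of 0, 2, 4, 5, 6.
near=$(grep -c 'CA *9506[024-6]' "$dir/addresses")
echo "lines near campus: $near"
rm -r "$dir"
```

The 95010 and 95063 lines are the ones left out: the first fails at 9506, the second because 3 is not in [024-6].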
LIN 6932 14 Searching for something in a file grep
Example: Suppose you want only the 9-digit zipcodes. That's easy too. In grep's basic regular expressions the repetition braces must be written with backslashes:
% grep 'CA *9506[024-6]-[0-9]\{4\}' addresses
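The sketch below compares three spellings of the repetition count on made-up address lines: the backslashed form for plain grep, the bare form under grep -E (the POSIX spelling of egrep), and the bare form under plain grep, where the braces are taken literally.

```shell
# Hypothetical address lines: a 9-digit zip, a 5-digit zip, a malformed one.
dir=$(mktemp -d)
printf '%s\n' 'Santa Cruz, CA 95060-1234' 'Santa Cruz, CA 95060' \
              'UCSC, CA 95064-107' > "$dir/addresses"

bre=$(grep 'CA *9506[024-6]-[0-9]\{4\}' "$dir/addresses")     # basic RE: \{4\}
ere=$(grep -E 'CA *9506[024-6]-[0-9]{4}' "$dir/addresses")    # extended RE: {4}
literal=$(grep -c 'CA *9506[024-6]-[0-9]{4}' "$dir/addresses") # basic RE, bare braces

printf 'basic:    %s\nextended: %s\nbare braces in basic RE match %s lines\n' \
       "$bre" "$ere" "$literal"
rm -r "$dir"
```

Because the pattern is not anchored at the end, it only checks that at least four digits follow the hyphen.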
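LIN 6932 15 Searching for something in a file grep
Example: Suppose you were looking to see whether there were any words beginning with a in a file called shakespeare. You might type
% grep a* shakespeare
but this does not do what you want, for two reasons. First, the pattern is not quoted, so the shell may expand a* into the names of files in the current directory before grep ever sees it. Second, even when quoted, the regular expression a* means "zero or more occurrences of a", and every line contains zero or more occurrences of a, so every line matches.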
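The second problem is easy to observe on a few sample lines. The sketch also shows one possible way to look for words that begin with lower-case a, using two -e patterns: an a at the start of the line, or an a right after a non-letter. This is an illustrative choice, not the approach prescribed by the course, and the sample file is invented.

```shell
# Hypothetical sample lines.
dir=$(mktemp -d)
printf '%s\n' 'To be, or not to be' 'all the world is a stage' \
              'the play is the thing' 'what a piece of work' > "$dir/lines"

# Quoted a* matches "zero or more a", which every line satisfies.
every=$(grep -c 'a*' "$dir/lines")

# Words beginning with a: a at line start, or a preceded by a non-letter.
starts=$(grep -e '^a' -e '[^A-Za-z]a' "$dir/lines")

echo "lines matched by a*: $every"
printf '%s\n' "$starts"
rm -r "$dir"
```

The line with play is correctly rejected by the second search, since its a sits in the middle of a word.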
LIN 6932 16 Searching for something in a file egrep
Some simple tasks would be a bit of a chore just using grep. Suppose we wanted to add Ben Lomond (CA 95005), Davenport (CA 95017), and Felton (CA 95018). What we need here is disjunction: for the 5-digit zipcodes, the strings we want will match either CA *9506[024-6] or CA *95005 or CA *9501[78]. Now, we can certainly do that: we can simply call grep three separate times, and amalgamate all the results. We cannot amalgamate all the searches into something like CA *950[016][024-8], because that defines a set that is too big; it lets in 95004, for example, and that's Aromas, way the other side of Watsonville.
LIN 6932 17 Searching for something in a file egrep
The way to do it is to use the extended regular expressions provided by the egrep program. In egrep, you can use parentheses to group parts of the expression and the pipe symbol to mean or. So (AB)|C means "either AB or C", while A(B|C) means "A followed by either B or C", and so on.
LIN 6932 18 Searching for something in a file egrep
So we can use:
% egrep 'CA *950((05)|(6[024-6])|(1[78]))' addresses
There are a few other things that egrep allows but grep does not. For example, in egrep regular expressions you can say a+ to mean "a sequence of one or more as", or [a-z]+ to mean "a sequence of one or more lower-case letters". In grep regular expressions you would have to say aa* and [a-z][a-z]* respectively to get these effects.
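Here is a small sketch contrasting the over-broad bracket pattern with the grouped alternation, run as grep -E (the POSIX spelling of egrep). The address lines are invented, and the 95016 entry is a made-up zipcode included only as a second false positive.

```shell
# Hypothetical address file: four wanted zipcodes, two unwanted ones.
dir=$(mktemp -d)
printf '%s\n' 'Ben Lomond, CA 95005' 'Davenport, CA 95017' 'Felton, CA 95018' \
              'Santa Cruz, CA 95060' 'Aromas, CA 95004' 'Elsewhere, CA 95016' \
              > "$dir/addresses"

loose=$(grep -c 'CA *950[016][024-8]' "$dir/addresses")                     # too big a set
strict=$(grep -E -c 'CA *950((05)|(6[024-6])|(1[78]))' "$dir/addresses")    # exact set

# a+ in extended REs selects the same lines as aa* in basic REs.
plus=$(printf '%s\n' aaa b | grep -E 'a+')
star=$(printf '%s\n' aaa b | grep 'aa*')

echo "loose: $loose  strict: $strict  plus: $plus  star: $star"
rm -r "$dir"
```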
LIN 6932 19 File Management with Shell Commands
Changing to another directory
% cd .. [RETURN]           go up one level in the directory tree
% cd [DIRECTORY] [RETURN]  change to a subdirectory
% cd /tmp                  to change to some other directory on the system, type its full path name
LIN 6932 20 File Management with Shell Commands
• Create a directory
% mkdir [DIRECTORY.NAME] [RETURN]
• Remove a directory (it must be empty)
% rmdir [DIRECTORY.NAME] [RETURN]
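A short sketch of cd, mkdir, and rmdir working together. Everything happens inside a scratch directory created with mktemp, and the names project and notes are invented for the example.

```shell
start=$PWD
base=$(mktemp -d)        # scratch area so nothing real is touched
cd "$base" || exit 1

mkdir project            # create a directory
cd project || exit 1     # change to the subdirectory
touch notes              # put a file in it
cd ..                    # go back up one level

# rmdir refuses to remove a directory that still has something in it.
if rmdir project 2>/dev/null; then first_try=removed; else first_try=refused; fi

rm project/notes         # empty the directory first
if rmdir project; then second_try=removed; else second_try=refused; fi

echo "first try: $first_try, second try: $second_try"
cd "$start" && rmdir "$base"
```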
LIN 6932 21 Searching for something in a file
> cd ..
> cd c6932aab
> ls
display shakespeare
> cp shakespeare ~c6932aad
> cd
> ls
shakespeare
LIN 6932 22 Searching for something in a file
% grep [options] pattern filenames
% fgrep [options] string filenames
fgrep (or "fixed-string grep") only searches for strings
grep is a full-blown regular-expression matcher
Some of the valid options are:
-i case-insensitive search
-n show the line number along with the matched line
-v invert match, i.e. find all lines that do NOT match
-w match entire words, rather than substrings
LIN 6932 23 Searching for something in a file with GREP
% grep -inw "thou" shakespeare
find all instances of the word "thou" in the file "shakespeare", case-insensitive but whole words only, and display the line numbers
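To see the three options combine, here is a sketch over four invented lines standing in for the shakespeare file.

```shell
# Hypothetical sample lines; only lines 1 and 3 contain thou as a whole word.
dir=$(mktemp -d)
printf '%s\n' 'Thou art more lovely' 'Although thousands came' \
              'I know thou wilt' 'though this be madness' > "$dir/shakespeare"

# -i ignore case, -n prefix each hit with its line number, -w whole words only.
hits=$(grep -inw "thou" "$dir/shakespeare")
printf '%s\n' "$hits"
rm -r "$dir"
```

Without -w, the lines with thousands and though would be reported as well.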
LIN 6932 24 Grep
grep '^smug' files         {'smug' at the start of a line}
grep 'smug$' files         {'smug' at the end of a line}
grep '^smug$' files        {lines containing only 'smug'}
grep '^s' files            {lines starting with 's'}
grep '[Ss]mug' files       {search for 'Smug' or 'smug'}
grep 'B[oO][bB]' files     {search for BOB, Bob, BOb or BoB}
grep '^$' files            {search for blank lines}
grep '[0-9][0-9]' file     {search for pairs of numeric digits}
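A few of these patterns can be checked by counting matches with grep -c in a small invented file; the contents are hypothetical and include one blank line.

```shell
# Hypothetical sample: five lines, the last one blank.
dir=$(mktemp -d)
printf '%s\n' 'smug' 'smugly said' 'so smug' 'Smug' '' > "$dir/files"

at_start=$(grep -c '^smug' "$dir/files")    # smug, smugly said
at_end=$(grep -c 'smug$' "$dir/files")      # smug, so smug
only=$(grep -c '^smug$' "$dir/files")       # smug
either=$(grep -c '[Ss]mug' "$dir/files")    # every non-blank line
blank=$(grep -c '^$' "$dir/files")          # the blank line

echo "start=$at_start end=$at_end only=$only either=$either blank=$blank"
rm -r "$dir"
```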
LIN 6932 25 Grep
grep '[^a-zA-Z0-9]'            {anything not a letter or number}
grep '[0-9]\{3\}-[0-9]\{4\}'   {999-9999, like phone numbers}
grep '^.$'                     {lines with exactly one character}
grep '"smug"'                  {'smug' within double quotes}
grep '"*smug"*'                {'smug', with or without quotes}
grep '^\.'                     {any line that starts with "."}
grep '^\.[a-z][a-z]'           {line starts with "." and 2 lc letters}
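The difference between an unescaped dot (any single character) and an escaped one (a literal period) is easy to get wrong, so here is a sketch that counts matches for both on a few made-up lines.

```shell
# Hypothetical sample lines.
dir=$(mktemp -d)
printf '%s\n' '.profile' 'x' '.ab' 'smug' '"smug"' > "$dir/sample"

any_first=$(grep -c '^.' "$dir/sample")         # any first character: all 5 lines
dot_first=$(grep -c '^\.' "$dir/sample")        # literal leading period
one_char=$(grep -c '^.$' "$dir/sample")         # exactly one character
quoted=$(grep -c '"smug"' "$dir/sample")        # smug inside double quotes
maybe_quoted=$(grep -c '"*smug"*' "$dir/sample") # quotes optional

echo "any=$any_first dot=$dot_first one=$one_char quoted=$quoted maybe=$maybe_quoted"
rm -r "$dir"
```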
LIN 6932 26 Egrep
The version of grep that supports the full set of operators mentioned above is generally called egrep (for extended grep)
% egrep '(mine|my)' shakespeare

grep and egrep linux presentation for lecture


Editor's Notes

  • #21 % vi /class/lin6932/c6932aab/shakespeare