3

Hey, I am on an HP-UX server here. When recursively grepping a directory tree, I have problems when the tree also contains binary files: grep treats them as text files and displays very long lines containing a lot of non-printable characters. This not only makes the output hard to scan, but often also makes my terminal unusable (and writes funny strings to its title).

GNU grep has an option --binary-files= which would help (and for binary files it does not print the matching line anyway), but I do not have the GNU tools available.

Is there a way to simulate the behavior of GNU grep, or to ignore files that look like they are binary?

By the way, if there is an easy way to do this in Perl, that would be fine, too.

2 Answers

3

Building on the previous answer, you can use the "file" command to identify text files, and then limit your grep to only those files. For example:

 find dir -type f -print | xargs file | grep text | cut -f1 -d: | xargs grep "expression" 

That's:

  • Find all files in directory "dir"
  • Pass these as arguments to "file"
  • Look for output from "file" containing the word "text"
  • Chop out the first colon-delimited field and use it as a filename
  • Search these files using grep.

This will fail in the case of filenames containing whitespace or colons, but will otherwise do what you want.
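If filenames with spaces or colons matter, one possible variation is to test each file in a loop instead of parsing one combined "file" listing. The following is only a sketch: the function name grep_text_files is made up for illustration, it assumes a POSIX-style shell, and it still breaks on filenames containing newlines.

```shell
# Sketch: grep only the files that file(1) describes as text.
# grep_text_files is a hypothetical helper name, not a standard command.
grep_text_files() {
    pattern=$1
    dir=$2
    find "$dir" -type f -print | while IFS= read -r f; do
        desc=$(file "$f")
        # Drop the leading "name:" so that a file merely *named* "text"
        # (or containing a colon) cannot influence the match.
        desc=${desc#"$f":}
        case $desc in
            # /dev/null forces grep to print the filename with each match.
            *text*) grep -e "$pattern" "$f" /dev/null ;;
        esac
    done
}
```

Usage would be something like: grep_text_files "expression" dir. It runs "file" once per file, so it is slower than the xargs pipeline on large trees.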

1
  • That one works a bit better, but the file command on HP-UX seems to be as bad as the grep command: 'file | grep text' is not enough to weed out the binary files. Commented Dec 21, 2009 at 14:05
1

There might be a better way, but maybe pass all the files to a shell loop, and do something like the following with the file command:

if file "$i" | grep text >/dev/null; then ... fi 

...?
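If the output of "file" turns out not to be reliable enough on a given system, the test inside such a loop could be swapped for a different heuristic. The sketch below treats any file containing a NUL byte as binary. That rule is an assumption, not something from the answers here: it is similar in spirit to what GNU grep does, and it will misclassify some files (UTF-16 text, for example). The names is_binary and grep_non_binary are made up for illustration.

```shell
# Sketch: treat a file as binary if it contains at least one NUL byte.
# is_binary and grep_non_binary are hypothetical helper names.
is_binary() {
    # $(( ... )) normalizes wc output, which may carry leading blanks.
    total=$(( $(wc -c < "$1") ))
    without_nul=$(( $(tr -d '\000' < "$1" | wc -c) ))
    # The byte counts differ only if tr removed at least one NUL.
    [ "$total" -ne "$without_nul" ]
}

grep_non_binary() {
    pattern=$1
    dir=$2
    find "$dir" -type f -print | while IFS= read -r f; do
        # /dev/null forces grep to print the filename with each match.
        is_binary "$f" || grep -e "$pattern" "$f" /dev/null
    done
}
```

Usage would be something like: grep_non_binary "expression" dir. Each file is read twice before it is searched, so this trades speed for not depending on "file" at all.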
