Posts: 4 Threads: 1 Joined: Aug 2021
Aug-18-2021, 11:07 AM (This post was last modified: Aug-18-2021, 11:20 AM by Larz60+.)

Hi,

I would appreciate your help with a Python script that searches for given strings (numbers, text, etc.) in multiple ".gz" text files.

The directory contains multiple ".gz" files, organized by date:

Output:
    /bkup/TC/XYZ/20210818

File names:

    A_7235818.csv.gz
    A_7235819.csv.gz
    .
    .

Content of a sample file:

Output:
    38486,22625,XYZ_06_0_20210817204446-3997
    88279,77617,XYZ_06_0_20210817204846-3998

I am getting an error while running the code below.

    import glob
    import gzip

    matched_lines = []
    ZIPFILES='/bkup/TC/XYZ/20210818/*.gz'
    grep = raw_input('Enter Search: ')
    filelist = glob.glob(ZIPFILES)
    for gzfile in filelist:
        #print("#Starting " + gzfile) #if you want to know which file is being processed
        with gzip.open( gzfile, 'rb') as f:
        # grep = raw_input('Enter Search: ')
        for line in f: # read file line by line
            if grep in line: # search for string in each line
                matched_lines.append(line) # keep a list of matched lines
    file_content = ''.join(matched_lines) # join the matched lines
    print(file_content)

Error:
    $ ./srch6.py
      File "./srch6.py", line 17
        for line in f: # read file line by line
        ^
    IndentationError: expected an indented block

Larz60+ wrote Aug-18-2021, 11:20 AM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Fixed for you this time; please use bbcode tags on future posts.

Posts: 12,117 Threads: 494 Joined: Sep 2016

The error is clear. You need to fix the indentation. (I did not try to run the following, so there may be additional errors):

    import glob
    import gzip

    matched_lines = []
    ZIPFILES='/bkup/TC/XYZ/20210818/*.gz'
    grep = raw_input('Enter Search: ')
    filelist = glob.glob(ZIPFILES)
    for gzfile in filelist:
        # print("#Starting " + gzfile) #if you want to know which file is being processed
        with gzip.open( gzfile, 'rb') as f:
            # grep = raw_input('Enter Search: ')
            for line in f: # read file line by line
                if grep in line: # search for string in each line
                    matched_lines.append(line) # keep a list of matched lines
    file_content = ''.join(matched_lines) # join the matched lines
    print(file_content)

Posts: 4 Threads: 1 Joined: Aug 2021
Aug-18-2021, 01:37 PM (This post was last modified: Aug-18-2021, 06:18 PM by Larz60+.)

(Aug-18-2021, 11:26 AM)Larz60+ Wrote: The error is clear. You need to fix indentation.

Thanks, the code now runs without errors, but I am not getting any search results.

    import glob
    import gzip

    matched_lines = []
    ZIPFILES='/bkup/TC/XYZ/20210818/*.gz'
    grep = raw_input('Enter Search: ')
    filelist = glob.glob(ZIPFILES)
    for gzfile in filelist:
        print("#Starting " + gzfile) #if you want to know which file is being processed
        with gzip.open( gzfile, 'rb') as f:
            # grep = raw_input('Enter Search: ')
            for line in f: # read file line by line
                if grep in line: # search for string in each line
                    matched_lines.append(line) # keep a list of matched lines
    file_content = ''.join(matched_lines) # join the matched lines
    print(file_content)

Larz60+ wrote Aug-18-2021, 06:18 PM:
Please, as requested previously, use bbcode tags on posts; it's a forum requirement.

Posts: 2,171 Threads: 12 Joined: May 2017
Aug-18-2021, 02:35 PM (This post was last modified: Aug-18-2021, 02:35 PM by DeaD_EyE.)

It looks like an ancient example for Python 2, which is really out of date.
Here is a working example with some Python magic:

    #!/usr/bin/env python3
    # You should use Python 3 and not touch Python 2.
    # The development of Python 2 has stopped and
    # it won't get any security updates.
    import gzip
    import sys
    from collections import defaultdict
    from pathlib import Path


    def get_matching_files(root, contains):
        """
        Generator to iterate over gz-files in root
        and search each file line by line for a matching text.

        If a result was found, the generator yields:
        >>> gzfile, (line_number, line)
        """
        for gzfile in root.glob("*.gz"):
            # open in text mode
            # this may raise a UnicodeDecodeError
            # if the encoding is messed up
            with gzip.open(gzfile, "rt") as gz:
                for line_number, line in enumerate(gz, start=1):
                    if contains in line:
                        yield gzfile, (line_number, line)


    if __name__ == "__main__":
        if len(sys.argv) != 3:
            raise SystemExit(f"python3 {sys.argv[0]} path_to_directory matching_text")
        zipfiles = Path(sys.argv[1])
        search = sys.argv[2]
        results = defaultdict(list)
        for gzfile, line in get_matching_files(zipfiles, search):
            # line is tuple of (line_number, line)
            results[gzfile].append(line)
        print(results)

The part that gets the arguments should be done with argparse, click or typer.

Posts: 4 Threads: 1 Joined: Aug 2021

(Aug-18-2021, 02:35 PM)DeaD_EyE Wrote: It looks like an ancient example for Python 2, which is really out of date.

Apologies for the delayed response; we have upgraded to Python 3.6.8. I executed the above script after putting the exact file path in the line below:

    for gzfile in root.glob("/bkup/TC/XYZ/20210818/*.gz"):

Output:
    python3 ./srch.py path_to_directory matching_text

Posts: 1,835 Threads: 2 Joined: Apr 2017

What's the reason for reimplementing zgrep?

Posts: 2,171 Threads: 12 Joined: May 2017

(Aug-25-2021, 04:18 AM)ndc85430 Wrote: What's the reason for reimplementing zgrep?

Reimplementing it in Python is better than something like this:

    import subprocess


    def zgrep(file, pattern):
        proc = subprocess.Popen(["zgrep", pattern, file], stdout=subprocess.PIPE)
        for line in proc.stdout:
            yield line.decode(errors="replace")

Code that uses zgrep does not run on Windows. In addition, it adds an external dependency to the Python program, and it's not Python. The next question could be: why implement cat, sort, awk, sed, ... if we already have them on our machines?

The increase of non-pythonic solutions: https://github.com/arunsivaramanneo/GPU-...wer.py#L24

Output:
    python3 ./srch.py path_to_directory matching_text

Yes, what could this mean? Have you tried

    python3 ./srch.py --help

This is the normal way command-line tools are controlled. They take options, arguments and parameters. If you want to list a directory on Linux, you could type:

    ls -l /

The -l is an option and the / is an argument that points to the target directory which ls should show.

Posts: 4,874 Threads: 78 Joined: Jan 2018

It seems that zgrep is just a shell script invoking gzip and grep. It could easily be rewritten in Python.

Posts: 1,835 Threads: 2 Joined: Apr 2017

But the point is, it exists, so why bother reimplementing it?

Posts: 4,874 Threads: 78 Joined: Jan 2018

As DeaD_EyE said above, to increase portability and to reduce dependencies. Python has a built-in gzip library, and grep-like behavior can be obtained with re.search(). This means that similar functionality can be obtained from the standard library. Of course someone has to make the effort (sorry, I don't currently have time to do that).
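
For anyone landing here later, the standard-library idea from the last post (gzip plus re.search()) could be sketched roughly as below. This is only a minimal sketch, not a full zgrep replacement. The names grep_gz and main are hypothetical. Treating the search text as a regular expression, decoding the files as UTF-8 with errors="replace", and the file:line_number:line output format are assumptions, not something stated in the thread. Argument handling uses argparse, as suggested earlier.

```python
#!/usr/bin/env python3
# Sketch only: a zgrep-like search using just the standard library.
# grep_gz and main are hypothetical names chosen for this example.
import argparse
import gzip
import re
from pathlib import Path


def grep_gz(path, pattern):
    """Yield (line_number, line) for every line of a gzip file matching pattern."""
    regex = re.compile(pattern)
    # Assumption: the files are UTF-8 text. errors="replace" keeps the
    # search from stopping with a UnicodeDecodeError on damaged bytes.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as gz:
        for line_number, line in enumerate(gz, start=1):
            if regex.search(line):
                yield line_number, line.rstrip("\n")


def main(argv=None):
    """Parse the command line and print matches as file:line_number:line."""
    parser = argparse.ArgumentParser(description="Search *.gz files in a directory")
    parser.add_argument("directory", type=Path, help="directory containing *.gz files")
    parser.add_argument("pattern", help="regular expression to search for")
    args = parser.parse_args(argv)

    # Pass only the directory here; the "*.gz" pattern is added by glob below.
    for gzfile in sorted(args.directory.glob("*.gz")):
        for line_number, line in grep_gz(gzfile, args.pattern):
            print(f"{gzfile}:{line_number}:{line}")
```

To use it as a command-line tool, call main() from an `if __name__ == "__main__":` guard and run it with the directory and the search text as two separate arguments, for example `python3 ./srch.py /bkup/TC/XYZ/20210818 77617`. From other Python code, `main(["/bkup/TC/XYZ/20210818", "77617"])` does the same. Use re.escape() on the search text if it should be matched literally rather than as a regular expression.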