Created on 2021-02-24 15:06 by rhpvorderman, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL       Status  Linked
PR 24645  merged  rhpvorderman, 2021-02-25 08:01
Messages (2)
msg387624 - Author: Ruben Vorderman (rhpvorderman) * Date: 2021-02-24 15:05
python -m gzip reads in chunks of 1024 bytes: https://github.com/python/cpython/blob/1f433406bd46fbd00b88223ad64daea6bc9eaadc/Lib/gzip.py#L599

This hurts performance somewhat. Using io.DEFAULT_BUFFER_SIZE will improve it. Also, 'io.DEFAULT_BUFFER_SIZE' is better than 'ARBITRARY_NUMBER_WITH_NO_COMMENT_EXPLAINING_WHY'.

With 1024-byte blocks

Decompression:
$ hyperfine -r 10 -w 3 'cat ~/test/500000reads.fastq.gz | ./prefix/bin/python3 -m gzip -d > /dev/null'
Benchmark #1: cat ~/test/500000reads.fastq.gz | ./prefix/bin/python3 -m gzip -d > /dev/null
  Time (mean ± σ):     926.9 ms ± 7.7 ms    [User: 901.2 ms, System: 59.1 ms]
  Range (min … max):   913.3 ms … 939.4 ms    10 runs

Compression:
$ hyperfine -r 10 -w 3 'cat ~/test/500000reads.fastq | ./prefix/bin/python3 -m gzip --fast > /dev/null'
Benchmark #1: cat ~/test/500000reads.fastq | ./prefix/bin/python3 -m gzip --fast > /dev/null
  Time (mean ± σ):     2.514 s ± 0.030 s    [User: 2.469 s, System: 0.125 s]
  Range (min … max):   2.472 s … 2.563 s    10 runs

With io.DEFAULT_BUFFER_SIZE

Decompression:
$ hyperfine -r 10 -w 3 'cat ~/test/500000reads.fastq.gz | ./prefix/bin/python3 -m gzip -d > /dev/null'
Benchmark #1: cat ~/test/500000reads.fastq.gz | ./prefix/bin/python3 -m gzip -d > /dev/null
  Time (mean ± σ):     839.9 ms ± 7.3 ms    [User: 816.0 ms, System: 57.3 ms]
  Range (min … max):   830.1 ms … 851.3 ms    10 runs

Compression:
$ hyperfine -r 10 -w 3 'cat ~/test/500000reads.fastq | ./prefix/bin/python3 -m gzip --fast > /dev/null'
Benchmark #1: cat ~/test/500000reads.fastq | ./prefix/bin/python3 -m gzip --fast > /dev/null
  Time (mean ± σ):     2.275 s ± 0.024 s    [User: 2.247 s, System: 0.096 s]
  Range (min … max):   2.254 s … 2.322 s    10 runs

Speedups:
- Decompression: 840 / 927 = 0.906 ~= 9% reduction in runtime
- Compression: 2.275 / 2.514 = 0.905 ~= 9% reduction in runtime

It is not stellar, but it is quite a nice improvement for such a tiny change.
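
For readers who want to see the pattern under discussion, here is a minimal sketch of a chunked copy loop whose chunk size defaults to io.DEFAULT_BUFFER_SIZE rather than a hard-coded 1024. It is an illustration, not the code from Lib/gzip.py: the helper names copy_in_chunks and gzip_roundtrip are hypothetical, and compresslevel=1 is only assumed to approximate what --fast does.

```python
import gzip
import io


def copy_in_chunks(src, dst, chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Copy src to dst one chunk at a time and return the bytes copied.

    Hypothetical helper: it mirrors the read/write loop described above,
    with the chunk size taken from io.DEFAULT_BUFFER_SIZE by default.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def gzip_roundtrip(data, chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Compress and then decompress data in memory, both via chunked copies."""
    compressed = io.BytesIO()
    # compresslevel=1 is an assumption standing in for the CLI's --fast flag.
    with gzip.GzipFile(fileobj=compressed, mode="wb", compresslevel=1) as gz:
        copy_in_chunks(io.BytesIO(data), gz, chunk_size)

    compressed.seek(0)
    restored = io.BytesIO()
    with gzip.GzipFile(fileobj=compressed, mode="rb") as gz:
        copy_in_chunks(gz, restored, chunk_size)
    return restored.getvalue()
```

Calling gzip_roundtrip(payload, chunk_size=1024) and gzip_roundtrip(payload) returns the same bytes; only the number of read and write calls differs, which is where the measured time difference would be expected to come from.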
msg387722 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-26 12:18
New changeset 7956ef884965ac6f9f7f2a27b835ea80e471c886 by Ruben Vorderman in branch 'master':
bpo-43317: Use io.DEFAULT_BUFFER_SIZE instead of 1024 in gzip CLI (#24645)
https://github.com/python/cpython/commit/7956ef884965ac6f9f7f2a27b835ea80e471c886
History
Date                 User          Action  Args
2022-04-11 14:59:41  admin         set     github: 87483
2021-02-26 12:18:14  methane       set     status: open -> closed
                                           versions: - Python 3.6, Python 3.7, Python 3.8, Python 3.9
                                           nosy: - methane
                                           resolution: fixed
                                           stage: patch review -> resolved
2021-02-26 12:18:00  methane       set     nosy: + methane
                                           messages: + msg387722
2021-02-25 08:01:24  rhpvorderman  set     keywords: + patch
                                           stage: patch review
                                           pull_requests: + pull_request23430
2021-02-25 07:53:46  rhpvorderman  set     type: performance
2021-02-24 16:41:00  xtreak        set     nosy: + xtreak
2021-02-24 15:06:00  rhpvorderman  create