This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Created on 2020-01-19 20:23 by wchargin, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name | Uploaded | Description
repro.py | wchargin, 2020-01-19 20:23 | repro script, as given in initial bug comment
Pull Requests
URL | Status | Linked
PR 18077 | merged | wchargin, 2020-01-20 08:49
PR 18100 | merged | miss-islington, 2020-01-21 11:25
PR 18101 | merged | miss-islington, 2020-01-21 11:25
Messages (12)
msg360268 - Author: Willow Chargin (wchargin) * Date: 2020-01-19 20:23
The `gzip` module correctly uses the user-specified compression level to control the underlying zlib stream's compression level, but always writes header metadata indicating that the maximum compression level was used.

Repro:

```
import gzip

blob = b"The quick brown fox jumps over the lazy dog." * 32

with gzip.GzipFile("fast.gz", mode="wb", compresslevel=1) as outfile:
    outfile.write(blob)

with gzip.GzipFile("best.gz", mode="wb", compresslevel=9) as outfile:
    outfile.write(blob)
```

Run this script, then run `wc -c *.gz` and `file *.gz`:

```
$ wc -c *.gz
 82 best.gz
 84 fast.gz
166 total
$ file *.gz
best.gz: gzip compressed data, was "best", last modified: Sun Jan 19 20:15:23 2020, max compression
fast.gz: gzip compressed data, was "fast", last modified: Sun Jan 19 20:15:23 2020, max compression
```

The file sizes correctly reflect the difference, but `file` reports both archives as written at maximum compression.

The cause is that the ninth byte of the header in the output stream (XFL, the "extra flags" byte) is hard-coded to `\002` at Lib/gzip.py:260 (as of 558f07891170), which indicates maximum compression. The correct value to indicate maximum speed is `\004`. See RFC 1952, section 2.3.1: <https://tools.ietf.org/html/rfc1952>

Using GNU `gzip(1)` with `--fast` produces the same output file as the one emitted by the `gzip` module, except for two bytes: the XFL metadata byte and the OS byte (the ninth and tenth bytes).
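A minimal sketch of how to inspect the XFL byte directly instead of relying on `file(1)`: it unpacks the fixed 10-byte member header from RFC 1952, section 2.3 (ID1, ID2, CM, FLG, MTIME, XFL, OS) with `struct`. The helper names `gzip_header_fields` and `compress_with_level` are hypothetical, and writing to an unnamed in-memory `io.BytesIO` with `mtime=0` is an assumption made so the header carries no file name and stays deterministic.

```python
import gzip
import io
import struct

# Fixed part of a gzip member header (RFC 1952, section 2.3), little-endian:
# ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS -> 10 bytes total.
HEADER = struct.Struct("<BBBBIBB")

# XFL meanings defined for CM = 8 (deflate) in RFC 1952, section 2.3.1.
XFL_LABELS = {2: "maximum compression", 4: "fastest algorithm"}


def gzip_header_fields(blob: bytes) -> dict:
    """Decode the fixed header fields of a gzip stream (hypothetical helper)."""
    id1, id2, cm, flg, mtime, xfl, os_byte = HEADER.unpack_from(blob)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream")
    return {"cm": cm, "flg": flg, "mtime": mtime, "xfl": xfl, "os": os_byte}


def compress_with_level(data: bytes, level: int) -> bytes:
    """Compress with GzipFile into memory (hypothetical helper)."""
    # An unnamed BytesIO means no FNAME field; mtime=0 fixes the MTIME field.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=level, mtime=0) as f:
        f.write(data)
    # GzipFile does not close a caller-supplied fileobj, so this is still readable.
    return buf.getvalue()


blob = b"The quick brown fox jumps over the lazy dog." * 32
for level in (1, 9):
    fields = gzip_header_fields(compress_with_level(blob, level))
    label = XFL_LABELS.get(fields["xfl"], "unspecified")
    print(f"compresslevel={level}: XFL={fields['xfl']} ({label})")
```

On an interpreter that includes the fix, this prints `XFL=4 (fastest algorithm)` for level 1 and `XFL=2 (maximum compression)` for level 9; before the fix, both lines reported `XFL=2`.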
msg360269 - Author: Willow Chargin (wchargin) * Date: 2020-01-19 20:27
(The commit reference above was meant to be git558f07891170, not a Mercurial reference. Pardon the churn; I'm new here. :-) )
msg360299 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-20 08:29
Looks reasonable. gzip should write b'\002' for compresslevel == _COMPRESS_LEVEL_BEST, b'\004' for compresslevel == _COMPRESS_LEVEL_FAST, and b'\000' otherwise. Would you mind creating a PR, William?
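The rule proposed above can be sketched as a small hypothetical helper, `xfl_for_level`. The constants mirror the values of the `gzip` module's private `_COMPRESS_LEVEL_FAST` (1) and `_COMPRESS_LEVEL_BEST` (9); they are redefined locally here so the sketch does not depend on private names.

```python
# Local copies of the values used by gzip's private level constants.
COMPRESS_LEVEL_FAST = 1
COMPRESS_LEVEL_BEST = 9


def xfl_for_level(compresslevel: int) -> bytes:
    """Return the one-byte XFL header field for a deflate compresslevel."""
    if compresslevel == COMPRESS_LEVEL_BEST:
        return b"\002"  # RFC 1952: maximum compression, slowest algorithm
    if compresslevel == COMPRESS_LEVEL_FAST:
        return b"\004"  # RFC 1952: fastest algorithm
    return b"\000"      # no claim about the level used


for level in range(10):
    print(level, xfl_for_level(level))
```

Level 0 and the intermediate levels 2 through 8 all map to `b'\000'`, because RFC 1952 defines only the values 2 and 4 for the deflate method.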
msg360301 - Author: Willow Chargin (wchargin) * Date: 2020-01-20 08:58
Sure, PR sent (pull_request17470).
msg360302 - Author: Willow Chargin (wchargin) * Date: 2020-01-20 08:59
PR URL, for reference: <https://github.com/python/cpython/pull/18077>
msg360390 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-21 11:25
 New changeset eab3b3f1c60afecfb4db3c3619109684cb04bd60 by Serhiy Storchaka (William Chargin) in branch 'master': bpo-39389: gzip: fix compression level metadata (GH-18077) https://github.com/python/cpython/commit/eab3b3f1c60afecfb4db3c3619109684cb04bd60 
msg360391 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-21 11:27
Thank you for your contribution William!
msg360392 - Author: miss-islington (miss-islington) Date: 2020-01-21 11:42
 New changeset ab0d8e356ecd351d55f89519a6a97a1e69c0dfab by Miss Islington (bot) in branch '3.8': bpo-39389: gzip: fix compression level metadata (GH-18077) https://github.com/python/cpython/commit/ab0d8e356ecd351d55f89519a6a97a1e69c0dfab 
msg360524 - Author: Willow Chargin (wchargin) * Date: 2020-01-23 00:18
My pleasure; thanks for the triage and review!
msg363272 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-03-03 16:16
Ping. The 3.7.x backport (PR 18101) for this issue is still open and neither needs to be fixed or closed.
msg363273 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-03-03 16:16
"either"
msg363331 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-03-04 07:06
 New changeset 12c45efe828a90a2f2f58a1f95c85d792a0d9c0a by Miss Islington (bot) in branch '3.7': [3.7] bpo-39389: gzip: fix compression level metadata (GH-18077) (GH-18101) https://github.com/python/cpython/commit/12c45efe828a90a2f2f58a1f95c85d792a0d9c0a 
History
Date | User | Action | Args
2022-04-11 14:59:25 | admin | set | github: 83570
2021-04-24 10:50:25 | iritkatriel | link | issue27521 superseder
2020-03-04 07:06:55 | ned.deily | set | status: open -> closed; resolution: fixed; stage: backport needed -> resolved
2020-03-04 07:06:23 | ned.deily | set | messages: + msg363331
2020-03-03 16:16:32 | ned.deily | set | messages: + msg363273
2020-03-03 16:16:10 | ned.deily | set | status: closed -> open; nosy: + ned.deily; messages: + msg363272; resolution: fixed -> (no value); stage: resolved -> backport needed
2020-01-23 00:18:03 | wchargin | set | messages: + msg360524
2020-01-21 11:42:52 | miss-islington | set | nosy: + miss-islington; messages: + msg360392
2020-01-21 11:27:14 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; messages: + msg360391; stage: patch review -> resolved
2020-01-21 11:25:47 | miss-islington | set | pull_requests: + pull_request17490
2020-01-21 11:25:40 | miss-islington | set | pull_requests: + pull_request17489
2020-01-21 11:25:31 | serhiy.storchaka | set | messages: + msg360390
2020-01-20 08:59:07 | wchargin | set | messages: + msg360302
2020-01-20 08:58:17 | wchargin | set | messages: + msg360301
2020-01-20 08:49:36 | wchargin | set | keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request17470
2020-01-20 08:29:13 | serhiy.storchaka | set | versions: - Python 2.7, Python 3.5, Python 3.6; nosy: + serhiy.storchaka; messages: + msg360299; keywords: + easy; stage: needs patch
2020-01-19 20:27:07 | wchargin | set | messages: + msg360269
2020-01-19 20:24:33 | wchargin | set | type: behavior
2020-01-19 20:23:58 | wchargin | create |