GzipFile.seek makes invalid write if buffer is not flushed in Python 3.12rc1 #108111

Closed
@effigies

Bug report

Checklist

  • I am confident this is a bug in CPython, not a bug in a third-party project
  • I have searched the CPython issue tracker,
    and am confident this bug has not been reported before

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.12.0rc1 (main, Aug 16 2023, 05:03:59) [GCC 12.2.0]

A clear and concise description of the bug:

I have code that writes out sections of a data file in chunks, and uses seeks to ensure that the position is correct before writing.

In the following example, I write five bytes, seek to position 5, and write five more bytes. If I flush the buffer before seeking, the result is as expected. If I do not, five null bytes are written between the two groups of intended bytes.

    #!/usr/bin/env python
    import io
    import gzip

    for flush in (True, False):
        data = io.BytesIO()
        gzip_writer = gzip.GzipFile(fileobj=data, mode='wb')
        gzip_writer.write(b'abcde')
        # If the buffer isn't flushed, seek works from unchanged offset
        if flush and hasattr(gzip_writer, '_buffer'):
            gzip_writer._buffer.flush()
        gzip_writer.seek(5)
        gzip_writer.write(b'fghij')
        gzip_writer.close()

        # Recover result
        data.seek(0)
        gzip_reader = gzip.GzipFile(fileobj=data, mode='rb')
        result = gzip_reader.read()
        print(f'{flush=}: {result}')

In Python 3.12.0rc1, in the case where I seek without flushing first, I get spurious \x00 bytes:

    flush=True: b'abcdefghij'
    flush=False: b'abcde\x00\x00\x00\x00\x00fghij'
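The five null bytes look like ordinary forward-seek padding. In write mode, `GzipFile.seek` fills the gap between the writer's current offset and the target with zeros, so a seek measured from an offset that has not advanced past the buffered bytes would pad by the full distance. The sketch below shows only the padding behavior, on a fresh writer where the offset is 0 and nothing is buffered. The helper name `seek_then_write` is hypothetical and not part of the original report.

```python
import gzip
import io


def seek_then_write(seek_to: int, payload: bytes) -> bytes:
    """Seek forward on a fresh gzip writer, write payload, return the decompressed stream."""
    raw = io.BytesIO()
    with gzip.GzipFile(fileobj=raw, mode='wb') as writer:
        # Nothing has been written yet, so the writer's offset is 0 and
        # the forward seek is filled with `seek_to` zero bytes.
        writer.seek(seek_to)
        writer.write(payload)
    return gzip.decompress(raw.getvalue())


print(seek_then_write(3, b'abc'))  # b'\x00\x00\x00abc'
```

If the offset in the failing case is still 0 when `seek(5)` runs, the same mechanism would account for exactly five padding bytes.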

Here is the output in Python 3.10.10:

    flush=True: b'abcdefghij'
    flush=False: b'abcdefghij'
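The difference between the two interpreters can be turned into a small self-check, sketched here as one possible regression test rather than anything taken from CPython's test suite. It uses only public `gzip` and `io` calls and avoids the private `_buffer` attribute. The function name `seek_pads_unflushed_write` is hypothetical.

```python
import gzip
import io


def seek_pads_unflushed_write() -> bool:
    """Return True if seeking to the current position after a write inserts padding."""
    raw = io.BytesIO()
    with gzip.GzipFile(fileobj=raw, mode='wb') as writer:
        writer.write(b'abcde')
        # Five bytes have been written, so seeking to 5 should be a no-op.
        writer.seek(5)
        writer.write(b'fghij')
    # Any difference from the concatenated payload means padding was inserted.
    return gzip.decompress(raw.getvalue()) != b'abcdefghij'


print(f'affected: {seek_pads_unflushed_write()}')
```

Based on the outputs above, this should print `affected: False` on Python 3.10.10 and `affected: True` on 3.12.0rc1.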
