Commit 8d13091

bpo-34043: Optimize tarfile uncompress performance (GH-8089)
tarfile._Stream has two buffers, one for compressed and one for uncompressed data. The buffers are not aligned, so unnecessary bytes slicing happens on every chunk read. This commit bypasses the compressed buffer. In this benchmark [1], user time dropped from 300 ms to 250 ms.
[1]: https://bugs.python.org/msg320763
1 parent f120288 commit 8d13091

2 files changed: +13 -18 lines changed

Lib/tarfile.py

Lines changed: 12 additions & 18 deletions
@@ -513,21 +513,10 @@ def seek(self, pos=0):
             raise StreamError("seeking backwards is not allowed")
         return self.pos
 
-    def read(self, size=None):
-        """Return the next size number of bytes from the stream.
-           If size is not defined, return all bytes of the stream
-           up to EOF.
-        """
-        if size is None:
-            t = []
-            while True:
-                buf = self._read(self.bufsize)
-                if not buf:
-                    break
-                t.append(buf)
-            buf = b"".join(t)
-        else:
-            buf = self._read(size)
+    def read(self, size):
+        """Return the next size number of bytes from the stream."""
+        assert size is not None
+        buf = self._read(size)
         self.pos += len(buf)
         return buf
 
@@ -540,9 +529,14 @@ def _read(self, size):
         c = len(self.dbuf)
         t = [self.dbuf]
         while c < size:
-            buf = self.__read(self.bufsize)
-            if not buf:
-                break
+            # Skip underlying buffer to avoid unaligned double buffering.
+            if self.buf:
+                buf = self.buf
+                self.buf = b""
+            else:
+                buf = self.fileobj.read(self.bufsize)
+                if not buf:
+                    break
             try:
                 buf = self.cmp.decompress(buf)
             except self.exception:
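
For readers who want to try the new read loop outside of tarfile, here is a minimal standalone sketch of the same idea. It is not the real tarfile._Stream: the class name SketchStream, the use of a raw zlib stream instead of gzip framing, and the missing error handling are all simplifications assumed for illustration. Only the attribute names (buf, dbuf, bufsize, fileobj, cmp, pos) are borrowed from the diff.

```python
import io
import zlib


class SketchStream:
    """Hypothetical, simplified stand-in for tarfile._Stream (illustration only)."""

    def __init__(self, fileobj, bufsize=16 * 1024):
        self.fileobj = fileobj
        self.bufsize = bufsize
        self.cmp = zlib.decompressobj()
        self.buf = b""   # compressed bytes that were already read but not consumed
        self.dbuf = b""  # decompressed bytes not yet handed to the caller
        self.pos = 0

    def _read(self, size):
        c = len(self.dbuf)
        t = [self.dbuf]
        while c < size:
            # Drain any pending compressed bytes once, then read whole
            # chunks straight from the file object, with no re-slicing
            # through a second compressed-side buffer.
            if self.buf:
                buf = self.buf
                self.buf = b""
            else:
                buf = self.fileobj.read(self.bufsize)
                if not buf:
                    break
            buf = self.cmp.decompress(buf)
            t.append(buf)
            c += len(buf)
        data = b"".join(t)
        self.dbuf = data[size:]
        return data[:size]

    def read(self, size):
        assert size is not None
        buf = self._read(size)
        self.pos += len(buf)
        return buf


payload = bytes(range(256)) * 1000
stream = SketchStream(io.BytesIO(zlib.compress(payload)), bufsize=512)
# Assumption: pretend some compressed bytes were already pulled into
# self.buf before the first read, so the "if self.buf" branch is exercised.
stream.buf = stream.fileobj.read(100)

chunks = []
while True:
    chunk = stream.read(4096)
    if not chunk:
        break
    chunks.append(chunk)
result = b"".join(chunks)
```

The sketch keeps a single slicing step, on the decompressed side, which is the point of the change: compressed chunks go to the decompressor in exactly the size the file object returned them.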
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Optimize tarfile uncompress performance about 15% when gzip is used.
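
To get a rough feel for the code path this entry refers to, something like the following can be run on interpreters with and without the change. This is not the benchmark linked in the commit message; the archive layout, member count, and sizes are arbitrary assumptions, and the helper names build_gzip_tar and read_all_streaming are made up for this sketch. It uses stream mode ("r|gz"), which is the mode that goes through tarfile._Stream.

```python
import io
import tarfile
import time


def build_gzip_tar(n_members=50, member_size=64 * 1024):
    """Build an in-memory .tar.gz with n_members files of member_size bytes each."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w:gz") as tar:
        for i in range(n_members):
            data = bytes([i % 256]) * member_size
            info = tarfile.TarInfo(name=f"member_{i}.bin")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return raw.getvalue()


def read_all_streaming(blob):
    """Read every member sequentially in stream mode; return (count, total bytes)."""
    count = 0
    total = 0
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r|gz") as tar:
        for member in tar:
            f = tar.extractfile(member)
            if f is not None:
                total += len(f.read())
                count += 1
    return count, total


blob = build_gzip_tar()
start = time.process_time()  # CPU time, closer to "user time" than wall clock
count, total = read_all_streaming(blob)
elapsed = time.process_time() - start
print(f"{count} members, {total} bytes, {elapsed:.3f}s CPU")
```

Absolute numbers will vary by machine and data; repeating the run several times and comparing medians between interpreter builds gives a fairer picture than a single measurement.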
