-
- Notifications
You must be signed in to change notification settings - Fork 33.4k
bpo-34010: Fix tarfile read performance regression #8020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| | @@ -525,7 +525,7 @@ def read(self, size=None): | |||||
| if not buf: | ||||||
| break | ||||||
| t.append(buf) | ||||||
| buf = "".join(t) | ||||||
| buf = b"".join(t) | ||||||
| else: | ||||||
| buf = self._read(size) | ||||||
| self.pos += len(buf) | ||||||
| | @@ -538,6 +538,7 @@ def _read(self, size): | |||||
| return self.__read(size) | ||||||
| | ||||||
| c = len(self.dbuf) | ||||||
| t = [self.dbuf] | ||||||
| while c < size: | ||||||
| buf = self.__read(self.bufsize) | ||||||
| if not buf: | ||||||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about bypassing while c < size: if self.buf: buf = self.buf self.buf = b"" else: buf = self.fileobj.read(self.bufsize) if not buf: break Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For compressed streams your suggestion works. However, there is one case where Lines 537 to 538 in 3c45240
Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Regardless optimization Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I agree. It is a bit twisted to handle the different cases of decompressing versus direct read. So far, this is just a minimal fix of the performance regression. Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're right. | ||||||
| | @@ -546,26 +547,27 @@ def _read(self, size): | |||||
| buf = self.cmp.decompress(buf) | ||||||
| except self.exception: | ||||||
| raise ReadError("invalid compressed data") | ||||||
| self.dbuf += buf | ||||||
| t.append(buf) | ||||||
| c += len(buf) | ||||||
| buf = self.dbuf[:size] | ||||||
| self.dbuf = self.dbuf[size:] | ||||||
| return buf | ||||||
| t = b"".join(t) | ||||||
| self.dbuf = t[size:] | ||||||
| return t[:size] | ||||||
| | ||||||
| def __read(self, size): | ||||||
| """Return size bytes from stream. If internal buffer is empty, | ||||||
| read another block from the stream. | ||||||
| """ | ||||||
| c = len(self.buf) | ||||||
| t = [self.buf] | ||||||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this optimization is needed. Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For large rem = size - len(self.buf) if rem > 0: self.buf += self.fileobj.read(max(self.bufsize, rem)) t = self.buf[:size] self.buf = self.buf[size:] return t | ||||||
| while c < size: | ||||||
| buf = self.fileobj.read(self.bufsize) | ||||||
| if not buf: | ||||||
| break | ||||||
| self.buf += buf | ||||||
| t.append(buf) | ||||||
| c += len(buf) | ||||||
| buf = self.buf[:size] | ||||||
| self.buf = self.buf[size:] | ||||||
| return buf | ||||||
| t = b"".join(t) | ||||||
| self.buf = t[size:] | ||||||
| return t[:size] | ||||||
| # class _Stream | ||||||
| | ||||||
| class _StreamProxy(object): | ||||||
| | ||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It never caused a problem, since this line is never called;
sizeis never None in the function call. But still, should be fixed, I guess.