-
- Notifications
You must be signed in to change notification settings - Fork 33.4k
bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior #7891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| line = b'' | ||
| while True: # loop over lines in stream | ||
| try: | ||
| last_line = line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't last_line only be set after StopIteration is caught? ISTM that in other cases we wouldn't want to be adding the newline at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readline is one of the ancient APIs that existed before generators. There's two ways of stopping iteration, either raising StopIteration or returning the empty string, the latter gets caught all the way down here https://github.com/python/cpython/blob/master/Lib/tokenize.py#L528
How you're describing it is what I had initially but there's a few places in the loop where iteration can stop so I thought this would be simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then perhaps add a comment to that end? It seems rather crucial to understanding that particular piece of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, added.
Lib/tokenize.py Outdated
| try: | ||
| # This loop has multiple points where it can break out so we | ||
| # pick up the value for the last_line here, at one unified | ||
| # point to keep things simple. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"at one unified point to keep things simple" should be removed IMO (unnecessary).
I suggest mentioning briefly why last_line is kept (to the best of my understanding: the last line is needed for the newline check after the loop, but line will be overridden at the last loop iteration).
| A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
| You're right, on a second read the comment didn't really explain much. Reworded. I have made the requested changes; please review again |
| Thanks for making the requested changes! @taleinat: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a whitespace issue in one of the files. For details, run make patchcheck or take a look at the output on Travis.
| A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
| Whoops, that's what I get for being lazy and using the github web editor. I have made the requested changes; please review again |
| Thanks for making the requested changes! @taleinat: please review the changes made to this pull request. |
taleinat left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed the whole PR again and a have a few final remarks. After these are fixed it would be good to go IMO.
Lib/tokenize.py Outdated
| pos += 1 | ||
| | ||
| # Add an implicit NEWLINE if the input doesn't end in one | ||
| if len(last_line) > 0 and last_line[-1] not in '\r\n': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should begin if last_line and ....
Lib/test/test_tokenize.py Outdated
| for type, token, start, end, line in tokenize(f.readline): | ||
| if type == ENDMARKER: | ||
| break | ||
| if s[-1] not in '\r\n' and type == NEWLINE and end[0] == num_lines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is quite unreadable. Not simple to make more readable though. A short comment could help.
Lib/test/test_tokenize.py Outdated
| if type == ENDMARKER: | ||
| break | ||
| if s[-1] not in '\r\n' and type == NEWLINE and end[0] == num_lines: | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check_tokenize() logic is repeated exactly, but it is now becoming rather involved. This should be a common function used by both classes.
| A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
| Thanks for the reviews Tal, I really appreciate it! I refactored out the common logic, does this seem fine? I also added a comment and added a variable in the check to convey the meaning better. I have made the requested changes; please review again |
| Thanks @ammaraskar for the PR, and @taleinat for merging it 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6, 3.7. |
| Sorry, @ammaraskar and @taleinat, I could not cleanly backport this to |
| Sorry, @ammaraskar and @taleinat, I could not cleanly backport this to |
| Sorry, @ammaraskar and @taleinat, I could not cleanly backport this to |
| Will backport by hand in a bit |
| @ammaraskar, ask me (via GitHub's "Reviewers" feature) to review the backport PRs when they're ready. |
…ne behavior (pythonGH-7891) Most of the change involves fixing up the test suite, which previously made the assumption that there wouldn't be a new line if the input didn't end in one. Contributed by Ammar Askar.. (cherry picked from commit c4ef489) Co-authored-by: Ammar Askar <ammar_askar@hotmail.com>
…ne behavior (pythonGH-7891) Most of the change involves fixing up the test suite, which previously made the assumption that there wouldn't be a new line if the input didn't end in one. Contributed by Ammar Askar.. (cherry picked from commit c4ef489) Co-authored-by: Ammar Askar <ammar_askar@hotmail.com>
…ne behavior (pythonGH-7891) Most of the change involves fixing up the test suite, which previously made the assumption that there wouldn't be a new line if the input didn't end in one. Contributed by Ammar Askar.. (cherry picked from commit c4ef489) Co-authored-by: Ammar Askar <ammar_askar@hotmail.com>
| Thanks, @ammaraskar. |
Most of the change involves fixing up the test suite, which previously made the assumption that there wouldn't be a new line if the input didn't end in one.
https://bugs.python.org/issue33899