Skip to content

Conversation

@pablogsal
Copy link
Member

@lysnikolaou Hummm, can you quickly cover why moving this chunk solves the problem? I think I am missing some key insight here

@lysnikolaou
Copy link
Member Author

lysnikolaou commented Jun 15, 2023

Sure. There's two different cases to consider here:

  1. When we're calling the tokenizer from the tokenize module, it uses the readline tokenizer, which you added in 9216e69. In that branch, update_fstring_expr wasn't being called at all from tok_underflow_readline, which resulted in the last_expr_size not being updated. That's why we were passing a negative count to PyUnicode_DecodeUTF8.
cpython onmain via C v14.0.3-clang via 🐍 pyenv 3.11.3cat tmp/t.py print(f'''{ 3 =}''')% cpython onmain via C v14.0.3-clang via 🐍 pyenv 3.11.3 ❯ ./python.exe -m tokenize tmp/t.py unexpected error: Negative size passed to PyUnicode_New Traceback (most recent call last): File "/Users/lysnikolaou/repos/python/cpython/Lib/runpy.py", line 198, in _run_module_as_main return _run_code(code, main_globals, None, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/lysnikolaou/repos/python/cpython/Lib/runpy.py", line 88, in _run_code exec(code, run_globals) File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 547, in <module> main() File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 489, in main tokens = list(tokenize(f.readline)) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 447, in tokenize yield from _generate_tokens_from_c_tokenizer(rl_gen.__next__, encoding, extra_tokens=True) File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 537, in _generate_tokens_from_c_tokenizer for info in it: SystemError: Negative size passed to PyUnicode_New
  1. There was a second bug with an f-string being tokenized from a file, only when the closing brace of the f-string expression part is in the last line of the file and there's no newline at the end. In that branch we use tok->cur to calculate the size of the buffer before adding an implicit newline and then tok->start after adding the newline. That resulted in the end buffer being one character too short.
cpython onmain via C v14.0.3-clang via 🐍 pyenv 3.11.3 took 15scat tmp/t.py print(f'''{ 3 =}''')% cpython onmain via C v14.0.3-clang via 🐍 pyenv 3.11.3 ❯ ./python.exe tmp/t.py 3 3
@lysnikolaou
Copy link
Member Author

Do you think I should split this PR in two and open a second issue for the second case above?

@lysnikolaou lysnikolaou force-pushed the fix-expr-buffer-file-readline branch from 2efb0a4 to a83de94 Compare June 15, 2023 13:46
@pablogsal
Copy link
Member

Do you think I should split this PR in two and open a second issue for the second case above?

Nah, I think is fine, but we may want to add two NEWS entries

Copy link
Member

@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for the explanation @lysnikolaou. Fantastic work as always! 💪

@pablogsal
Copy link
Member

@lysnikolaou If you want add the second NEWS entry and then land it.

@lysnikolaou lysnikolaou enabled auto-merge (squash) June 15, 2023 15:56
@lysnikolaou lysnikolaou merged commit d382ad4 into python:main Jun 15, 2023
@miss-islington
Copy link
Contributor

Thanks @lysnikolaou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 15, 2023
…kenizer (pythonGH-105828) (cherry picked from commit d382ad4) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
@bedevere-bot
Copy link

GH-105832 is a backport of this pull request to the 3.12 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.12 only security fixes label Jun 15, 2023
lysnikolaou added a commit to lysnikolaou/cpython that referenced this pull request Jun 15, 2023
lysnikolaou added a commit that referenced this pull request Jun 15, 2023
…okenizer (GH-105828) (#105832) (cherry picked from commit d382ad4) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
carljm added a commit to carljm/cpython that referenced this pull request Jun 15, 2023
* main: (57 commits) pythongh-105831: Fix NEWS blurb from pythongh-105828 (python#105833) pythongh-105820: Fix tok_mode expression buffer in file & readline tokenizer (python#105828) pythongh-105751, test_ctypes: Remove disabled tests (python#105826) pythongh-105821: Use a raw f-string in test_httpservers.py (python#105822) pythongh-105751: Remove platform usage in test_ctypes (python#105819) pythongh-105751: Reenable disable test_ctypes tests (python#105818) pythongh-105751: Remove dead code in test_ctypes (python#105817) More reorganisation of the typing docs (python#105787) Improve docs for `typing.dataclass_transform` (python#105792) pythonGH-89812: Churn `pathlib.Path` test methods (python#105807) pythongh-105800: Issue SyntaxWarning in f-strings for invalid escape sequences (python#105801) pythongh-105751: Cleanup test_ctypes imports (python#105803) pythongh-105481: add HAS_JUMP flag to opcode metadata (python#105791) pythongh-105751: test_ctypes avoids the operator module (pythonGH-105797) pythongh-105751: test_ctypes: Remove @need_symbol decorator (pythonGH-105798) pythongh-104909: Implement conditional stack effects for macros (python#105748) pythongh-75905: Remove test_xmlrpc_net: skipped since 2017 (python#105796) pythongh-105481: Fix types and a bug for pseudos (python#105788) Update DSL docs for cases generator (python#105753) pythonGH-77273: Better bytecodes for f-strings (pythonGH-6132) ...
gvanrossum pushed a commit to gvanrossum/cpython that referenced this pull request Jun 18, 2023
gvanrossum pushed a commit to gvanrossum/cpython that referenced this pull request Jun 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

4 participants