This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Created on 2021-06-11 14:39 by elmanto, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description
crashes.tgz elmanto, 2021-06-11 14:39 Inputs that result in the asan violation
Pull Requests
URL Status Linked
PR 26676 merged pablogsal, 2021-06-11 17:18
PR 26695 merged miss-islington, 2021-06-12 17:53
Messages (10)
msg395637 - (view) Author: alessandro mantovani (elmanto) Date: 2021-06-11 14:39
Use After Free in python3.11 (commit 2ab27c4af4ddf752)

Steps to reproduce:
1) ./configure --with-address-sanitizer
2) make
3) ./python <input>

I attach some of the inputs that lead to the undefined behavior.

For the complete description, here is the asan report:

==1082579==ERROR: AddressSanitizer: heap-use-after-free on address 0x626000045a40 at pc 0x000000735155 bp 0x7fffffffbed0 sp 0x7fffffffbec8
READ of size 8 at 0x626000045a40 thread T0
    #0 0x735154 in ascii_decode /home/elmanto/ddg/other_targets/cpython/Objects/unicodeobject.c:5091:28
    #1 0x735154 in unicode_decode_utf8 /home/elmanto/ddg/other_targets/cpython/Objects/unicodeobject.c:5158:10
    #2 0xc98381 in syntaxerror /home/elmanto/ddg/other_targets/cpython/Parser/tokenizer.c:1087:15
    #3 0xc8d616 in tok_get /home/elmanto/ddg/other_targets/cpython/Parser/tokenizer.c
    #4 0xc8696b in PyTokenizer_Get /home/elmanto/ddg/other_targets/cpython/Parser/tokenizer.c:1884:18
    #5 0xead74c in _PyPegen_check_tokenizer_errors /home/elmanto/ddg/other_targets/cpython/Parser/pegen.c:1260:17
    #6 0xead74c in _PyPegen_run_parser /home/elmanto/ddg/other_targets/cpython/Parser/pegen.c:1292:17
    #7 0xeaebca in _PyPegen_run_parser_from_file_pointer /home/elmanto/ddg/other_targets/cpython/Parser/pegen.c:1377:14
    #8 0xc83a91 in _PyParser_ASTFromFile /home/elmanto/ddg/other_targets/cpython/Parser/peg_api.c:26:12
    #9 0xa0abf1 in pyrun_file /home/elmanto/ddg/other_targets/cpython/Python/pythonrun.c:1197:11
    #10 0xa0abf1 in _PyRun_SimpleFileObject /home/elmanto/ddg/other_targets/cpython/Python/pythonrun.c:455:13
    #11 0xa09b19 in _PyRun_AnyFileObject /home/elmanto/ddg/other_targets/cpython/Python/pythonrun.c:89:15
    #12 0x4dfe94 in pymain_run_file_obj /home/elmanto/ddg/other_targets/cpython/Modules/main.c:353:15
    #13 0x4dfe94 in pymain_run_file /home/elmanto/ddg/other_targets/cpython/Modules/main.c:372:15
    #14 0x4dfe94 in pymain_run_python /home/elmanto/ddg/other_targets/cpython/Modules/main.c:587:21
    #15 0x4dfe94 in Py_RunMain /home/elmanto/ddg/other_targets/cpython/Modules/main.c:666:5
    #16 0x4e154c in pymain_main /home/elmanto/ddg/other_targets/cpython/Modules/main.c:696:12
    #17 0x4e1874 in Py_BytesMain /home/elmanto/ddg/other_targets/cpython/Modules/main.c:720:12
    #18 0x7ffff7a2e0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16
    #19 0x43501d in _start (/home/elmanto/ddg/other_targets/cpython/python+0x43501d)

0x626000045a40 is located 2368 bytes inside of 10560-byte region [0x626000045100,0x626000047a40)
freed by thread T0 here:
    #0 0x4ada79 in realloc (/home/elmanto/ddg/other_targets/cpython/python+0x4ada79)
    #1 0x638e61 in PyMem_RawRealloc /home/elmanto/ddg/other_targets/cpython/Objects/obmalloc.c:602:12
    #2 0x638e61 in _PyObject_Realloc /home/elmanto/ddg/other_targets/cpython/Objects/obmalloc.c:2339:12

previously allocated by thread T0 here:
    #0 0x4ada79 in realloc (/home/elmanto/ddg/other_targets/cpython/python+0x4ada79)
    #1 0x638e61 in PyMem_RawRealloc /home/elmanto/ddg/other_targets/cpython/Objects/obmalloc.c:602:12
    #2 0x638e61 in _PyObject_Realloc /home/elmanto/ddg/other_targets/cpython/Objects/obmalloc.c:2339:12

SUMMARY: AddressSanitizer: heap-use-after-free /home/elmanto/ddg/other_targets/cpython/Objects/unicodeobject.c:5091:28 in ascii_decode
Shadow bytes around the buggy address:
  0x0c4c80000af0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b10: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c4c80000b40: fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd
  0x0c4c80000b50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1082579==ABORTING
msg395641 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-06-11 15:44
Lysandros and Pablo, this *only* occurs when the lexer is reading directly from a file, not when it's reading the same source code from a (bytes) string. All examples are syntax errors (some raise ValueError in the parser).
msg395646 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-11 16:46
Here is a smaller reproducer:

x = "ijosdfsd\
def blech():
    pass

This seems to be an error with:

commit a698d52c3975c80b45b139b2f08402ec514dce75
Author: Batuhan Taskaya <isidentical@gmail.com>
Date:   Thu Jan 21 00:38:47 2021 +0300

    bpo-40176: Improve error messages for unclosed string literals (GH-19346)

    Automerge-Triggered-By: GH:isidentical

Batuhan, can you take a look?
msg395647 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-11 16:54
I think this should fix the issue, but someone should validate this:

diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c
index 6002f3e05a..1c28737183 100644
--- a/Parser/tokenizer.c
+++ b/Parser/tokenizer.c
@@ -1084,17 +1084,16 @@ syntaxerror(struct tok_state *tok, const char *format, ...)
         goto error;
     }
 
-    errtext = PyUnicode_DecodeUTF8(tok->line_start, tok->cur - tok->line_start,
+    errtext = PyUnicode_DecodeUTF8(tok->buf, tok->inp - tok->buf,
                                    "replace");
     if (!errtext) {
         goto error;
     }
     int offset = (int)PyUnicode_GET_LENGTH(errtext);
-    Py_ssize_t line_len = strcspn(tok->line_start, "\n");
-    if (line_len != tok->cur - tok->line_start) {
+    Py_ssize_t line_len = strcspn(tok->buf, "\n");
+    if (line_len != tok->buf - tok->inp) {
         Py_DECREF(errtext);
-        errtext = PyUnicode_DecodeUTF8(tok->line_start, line_len,
-                                       "replace");
+        errtext = PyUnicode_DecodeUTF8(tok->buf, line_len, "replace");
     }
     if (!errtext) {
         goto error;
msg395648 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-11 16:58
This affects 3.10 as well
msg395651 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-11 17:18
OK, found the problem: we are not resetting the multi-line-start pointer when we reallocate the tokenizer buffers.
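To illustrate the class of bug described here, the following is a minimal sketch in C, not the actual CPython patch. The struct `mini_tok` and the functions `mini_tok_init` and `mini_tok_grow` are hypothetical names invented for this example. The assumption is only that the tokenizer keeps several raw pointers into one heap buffer: if the buffer is moved by `realloc`, every such pointer has to be recomputed from an offset saved beforehand, otherwise it keeps pointing into freed memory, which is what the asan report shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the tokenizer state.
   This is NOT CPython's struct tok_state. */
struct mini_tok {
    char *buf;              /* start of the heap-allocated input buffer */
    char *cur;              /* next byte to be read */
    char *multi_line_start; /* first line of a multi-line token, or NULL */
    size_t size;            /* allocated size of buf in bytes */
};

/* Copy src into a fresh buffer. Returns 0 on success, -1 on failure. */
static int mini_tok_init(struct mini_tok *tok, const char *src)
{
    size_t n = strlen(src) + 1;
    tok->buf = malloc(n);
    if (tok->buf == NULL) {
        return -1;
    }
    memcpy(tok->buf, src, n);
    tok->size = n;
    tok->cur = tok->buf + (n - 1);  /* all copied input counts as consumed */
    tok->multi_line_start = NULL;
    return 0;
}

/* Grow the buffer and rebase every pointer that points into it.
   Returns 0 on success, -1 if realloc fails. */
static int mini_tok_grow(struct mini_tok *tok, size_t new_size)
{
    assert(tok->buf != NULL && tok->cur >= tok->buf);
    assert(new_size >= tok->size);

    /* Save positions as offsets BEFORE realloc: the old block may be freed,
       and doing arithmetic on pointers into it afterwards is invalid. */
    size_t cur_off = (size_t)(tok->cur - tok->buf);
    int has_start = (tok->multi_line_start != NULL);
    size_t start_off =
        has_start ? (size_t)(tok->multi_line_start - tok->buf) : 0;

    char *newbuf = realloc(tok->buf, new_size);
    if (newbuf == NULL) {
        return -1;  /* old buffer and old pointers are still valid */
    }

    tok->buf = newbuf;
    tok->size = new_size;
    tok->cur = newbuf + cur_off;
    /* Forgetting this line leaves a dangling pointer of the kind
       reported in this issue. */
    tok->multi_line_start = has_start ? newbuf + start_off : NULL;
    return 0;
}
```

A caller would set `multi_line_start` when a multi-line token begins, call `mini_tok_grow` whenever more input has to fit, and can then keep using the pointer safely. Storing offsets instead of pointers in the struct would avoid the rebasing step altogether, at the cost of an extra addition on every access.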
msg395694 - (view) Author: miss-islington (miss-islington) Date: 2021-06-12 17:53
 New changeset a342cc5891dbd8a08d40e9444f2e2c9e93258721 by Pablo Galindo in branch 'main': bpo-44396: Update multi-line-start location when reallocating tokenizer buffers (GH-26676) https://github.com/python/cpython/commit/a342cc5891dbd8a08d40e9444f2e2c9e93258721 
msg395695 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-12 17:57
alessandro mantovani, one question, how did you generate the crash scripts?
msg395705 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-12 20:27
 New changeset d03f342a8389f1ea9100efb0d1a205601e607254 by Miss Islington (bot) in branch '3.10': bpo-44396: Update multi-line-start location when reallocating tokenizer buffers (GH-26676) (GH-26695) https://github.com/python/cpython/commit/d03f342a8389f1ea9100efb0d1a205601e607254 
msg395726 - (view) Author: alessandro mantovani (elmanto) Date: 2021-06-13 03:43
I first used experimental fuzzing techniques, but then I observed that the same behavior occurs with vanilla afl++. As a starting queue I used the *.py files that I found in the repo under 'test' or so.

Best,
Alessandro Mantovani
History
Date User Action Args
2022-04-11 14:59:46 admin set github: 88562
2021-06-13 03:43:40 elmanto set messages: + msg395726
2021-06-12 20:27:49 pablogsal set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2021-06-12 20:27:10 pablogsal set messages: + msg395705
2021-06-12 17:57:28 pablogsal set messages: + msg395695
2021-06-12 17:53:58 miss-islington set stage: patch review; pull_requests: + pull_request25280
2021-06-12 17:53:57 miss-islington set nosy: + miss-islington; messages: + msg395694
2021-06-11 17:18:30 pablogsal set messages: + msg395651; stage: patch review -> (no value)
2021-06-11 17:18:12 pablogsal set keywords: + patch; stage: patch review; pull_requests: + pull_request25262
2021-06-11 16:58:59 pablogsal set priority: normal -> release blocker
2021-06-11 16:58:32 pablogsal set messages: + msg395648; versions: + Python 3.10
2021-06-11 16:54:50 pablogsal set messages: + msg395647
2021-06-11 16:46:26 pablogsal set nosy: + BTaskaya; messages: + msg395646
2021-06-11 15:44:02 gvanrossum set messages: + msg395641
2021-06-11 15:10:17 vstinner set nosy: + gvanrossum, lys.nikolaou, pablogsal; title: Use-After-Free -> pegen _PyParser_ASTFromFile(): Use-After-Free in syntaxerror()
2021-06-11 14:39:45 elmanto create