Description
The size check in _Py_DecodeUTF8Ex can be improved so that it always compares the size against a constant value, with no arithmetic performed on the size itself. This is already done in other places within the file, e.g. here.
I was curious whether this could actually be triggered with a proof of concept that overflows the check and eventually performs an out-of-bounds heap access. In fact, with a very artificial setup, it is possible on a 32-bit system that tries to convert a 2 GB string:
```c
#include "Python.h"

#include <sys/mman.h>
#include <err.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char *argv[])
{
    char *str;
    size_t wlen;
    wchar_t *program;

    // force UTF-8 mode
    Py_UTF8Mode = 1;

    if ((program = Py_DecodeLocale(argv[0], NULL)) == NULL)
        errx(1, "Py_DecodeLocale");
    Py_SetProgramName(program);
    Py_Initialize();

    // try to convert a 2 GB long string
    if ((str = mmap(NULL, (size_t)INT_MAX + 1, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)) == MAP_FAILED)
        err(1, "mmap");
    memset(str, 'a', INT_MAX);
    str[INT_MAX] = '\0';
    Py_DecodeLocale(str, &wlen);

    PyMem_RawFree(program);
    return 0;
}
```
I doubt that this is reachable from real-world code, but it is at least a good showcase that the if-check still performs arithmetic that can overflow. Let's remove it and save ourselves this possible headache.
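To illustrate the difference between the two styles of check, here is a minimal sketch. It is not the CPython source: `ssize32_t`, `SSIZE32_MAX`, `WCHAR_SIZE` and both function names are hypothetical stand-ins, assuming a 32-bit `Py_ssize_t` and a 4-byte `wchar_t`. The wraparound of `size + 1` is emulated explicitly, because signed overflow is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a 32-bit build (assumptions, not CPython names). */
typedef int32_t ssize32_t;
#define SSIZE32_MAX INT32_MAX
#define WCHAR_SIZE 4

/* Check that does arithmetic on the input: limit < size + 1.
   The two's-complement wrap of size + 1 is emulated by hand so that
   this sketch itself stays free of undefined behavior. */
static int
too_large_with_arithmetic(ssize32_t size)
{
    ssize32_t size_plus_one = (size == INT32_MAX) ? INT32_MIN : size + 1;
    return SSIZE32_MAX / WCHAR_SIZE < size_plus_one;
}

/* Check against a constant only: limit - 1 < size.
   The subtraction happens on the constant side and cannot overflow. */
static int
too_large_constant(ssize32_t size)
{
    return SSIZE32_MAX / WCHAR_SIZE - 1 < size;
}

/* Small self-check comparing both variants. */
static void
self_check(void)
{
    /* Both agree for sizes where size + 1 does not wrap. */
    assert(too_large_with_arithmetic(100) == too_large_constant(100));
    assert(too_large_with_arithmetic(SSIZE32_MAX / WCHAR_SIZE) == 1);
    assert(too_large_constant(SSIZE32_MAX / WCHAR_SIZE) == 1);

    /* At the maximum size, the arithmetic variant wraps and lets it through. */
    assert(too_large_with_arithmetic(INT32_MAX) == 0);
    assert(too_large_constant(INT32_MAX) == 1);
}
```

Both variants give the same answer for every size where `size + 1` does not wrap. They only differ at the maximum value, which is exactly the case the proof of concept above provokes.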
PS: Not sure if this is the correct way to create python issues with GitHub now. Let me know if something's missing or wrong!