@bulk88 bulk88 commented Aug 25, 2025

Most OSes/libcs have an optimization where calloc() sometimes, or most of the time, does not call memset() in userland, which would waste CPU zeroing brand new memory blocks/pages obtained fresh from the kernel. The larger the calloc() allocation, the higher the chance that the memory blocks will be obtained fresh from the kernel. MS CRT's calloc() is a wrapper function that is either thin or heavy (personal opinions), and ultimately forwards to HeapAlloc(hSecretUBHandle,HEAP_ZERO_MEMORY,size).

Whether HeapAlloc@Kernel32.dll has or doesn't have the "don't memset(,0,) fresh kernel pages" optimization, this author doesn't know, and it is irrelevant. WinPerl now does its part to take advantage of the optimization if it exists inside Microsoft's closed source OS.
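
To make the difference concrete, here is a minimal portable sketch of the two strategies. The names emulated_calloc(), native_calloc() and all_zero() are hypothetical and are not taken from perlhost.h or vmem.h. The emulated version always pays for a userland memset(), while the native version leaves the zeroing decision to the allocator.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Emulated calloc (hypothetical name): always touches every byte in
// userland, even if the allocator just received pages that the kernel
// had already zeroed.
inline void* emulated_calloc(std::size_t count, std::size_t size) {
    if (size != 0 && count > static_cast<std::size_t>(-1) / size)
        return nullptr;                    // count * size would overflow
    std::size_t total = count * size;
    void* p = std::malloc(total ? total : 1);
    if (p)
        std::memset(p, 0, total);
    return p;
}

// Native calloc (hypothetical name): the allocator decides whether any
// zeroing is needed. It may skip it for fresh pages, or it may not.
inline void* native_calloc(std::size_t count, std::size_t size) {
    return std::calloc(count, size);
}

// Helper for checking results: true if every byte in [p, p + n) is zero.
inline bool all_zero(const void* p, std::size_t n) {
    const unsigned char* b = static_cast<const unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i)
        if (b[i] != 0)
            return false;
    return true;
}
```

Both functions return zero-filled memory that is released with free(); only the amount of work done in userland can differ.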

Historically, perlhost.h/vmem.h had perl5xx.dll emulating calloc() because this area of the interp is "unfinished business" from the late 1990s, when compatibility with Win95 and the "Win32s Runtime" on Win 3.11 WFW was critical for WinPerl. WinNT kernel Win OSes have always been POSIX-like, or actually Unix SVR1 1983 compatible, from the start (and remained compatible with POSIX/SVR1 1983 until WSL 1).

The alternate, never used memory allocator in vmem.h doesn't have a Calloc() method, so the nextgen and current "native kernel32.dll malloc()" code couldn't implement a Calloc() method either. The DIY malloc() impl doesn't have a Calloc() because in 1993-1997-ish, VirtualAlloc, VirtualProtect, and VirtualFree couldn't be used in WinPerl for some reason lost to time.

This author's Win95 Kernel32.dll file exports all 3 functions and they are not stubs that only do "return STATUS_NOT_IMPLEMENTED;".

do_crt_invalid_parameter() was added so the DIY allocator behaves the way the native MS CRT calloc() behaves. perlhost.h's design concept is that the library can be copy pasted without modification into the PHP and Python interps, or something like that. Therefore perlhost.h and vmem.h aren't allowed to be aware of the Perl C API, so no croak()/die()/die_noperl().
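
As an illustration of error reporting that stays unaware of the host interpreter's API, here is a rough sketch. It does not show what do_crt_invalid_parameter() actually does. DemoAllocator, invalid_param_hook and counting_hook are hypothetical names, and the overflow check is only an assumed example of a bad request.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical hook type: the embedding host decides how to report a bad
// request, so the allocator never has to call croak()/die() itself.
typedef void (*invalid_param_hook)(const char* what);

// Illustration only: a hook that just counts how often it was called.
static int g_invalid_calls = 0;
static void counting_hook(const char*) { ++g_invalid_calls; }

class DemoAllocator {
public:
    explicit DemoAllocator(invalid_param_hook hook) : m_hook(hook) {}

    // Calloc-style entry point: reject a count * size overflow through
    // the hook, otherwise forward to the C runtime's calloc().
    void* Calloc(std::size_t count, std::size_t size) {
        if (size != 0 && count > static_cast<std::size_t>(-1) / size) {
            if (m_hook)
                m_hook("Calloc: count * size overflows");
            return nullptr;
        }
        return std::calloc(count, size);
    }

private:
    invalid_param_hook m_hook;   // supplied by whichever host embeds us
};
```

The allocator only knows about a plain function pointer, so the same source could be dropped into another host without changes.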

-split off the very cold "Free to wrong pool" panic branch into its own
function. Less "dead" machine code for the CPU to skip around in the
perf critical VMemNL::Free() call. VC 2022 -O1 LTO inlined the
DispatchWrongPool() method against our wishes, so override VC 2022's and
GCC's inline criteria. We do not want inlining here.
-move 2 void* writes out of the CS lock inside PerlMemSharedMalloc(),
PerlMemMalloc(), PerlMemParseMalloc() and the Calloc()s. They are writes
of constants to a new mem block, not reads/writes to the head (VMem*)
object or to the first block hanging off the VMem* LL, so it's not
necessary to mutex lock those 2 writes
-m_lRefCount assignment in VMem::VMem changed so the CC doesn't need to
save var this around the InitializeCriticalSection fn call in this function
-change return NULL; to return ptr; for better codegen on MSVC 2022, since
the optimizer doesn't realize var ptr is a free 0x0 value after the false
test and instead emits xor RAX, RAX;
-reorder the VMem struct so VMemNL m_VMem (the per-my_perl pool) is at the
front
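
The first two items in the list above can be sketched roughly as follows. This is not the vmem.h code: DemoPool, BlockHeader, DEMO_NOINLINE and ReportWrongPool() are hypothetical names, std::mutex stands in for the CRITICAL_SECTION, and the cold path only counts instead of panicking. Which two header fields the real code writes outside the lock is an assumption here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Compiler-specific "never inline" marker for the cold path.
#if defined(_MSC_VER)
#  define DEMO_NOINLINE __declspec(noinline)
#else
#  define DEMO_NOINLINE __attribute__((noinline))
#endif

// Header stored in front of every user block (doubly linked list node).
struct alignas(std::max_align_t) BlockHeader {
    BlockHeader* prev;
    BlockHeader* next;
    void*        owner;   // pool that allocated the block
};

class DemoPool {
public:
    DemoPool() : m_wrongPoolFrees(0) {
        m_head.prev = m_head.next = &m_head;
        m_head.owner = this;
    }

    void* Malloc(std::size_t size) {
        BlockHeader* h = static_cast<BlockHeader*>(
            std::malloc(sizeof(BlockHeader) + size));
        if (!h)
            return nullptr;
        // Constant writes to a block no other thread can see yet:
        // they do not touch the list head, so no lock is needed.
        h->owner = this;
        h->prev  = &m_head;
        {
            // Only the part that reads or writes shared list state is locked.
            std::lock_guard<std::mutex> guard(m_lock);
            h->next = m_head.next;
            m_head.next->prev = h;
            m_head.next = h;
        }
        return h + 1;
    }

    void Free(void* p) {
        if (!p)
            return;
        BlockHeader* h = static_cast<BlockHeader*>(p) - 1;
        if (h->owner != this) {   // very rare: keep the hot path short
            ReportWrongPool();
            return;
        }
        {
            std::lock_guard<std::mutex> guard(m_lock);
            h->prev->next = h->next;
            h->next->prev = h->prev;
        }
        std::free(h);
    }

    int  WrongPoolFrees() const { return m_wrongPoolFrees; }
    bool Empty() const { return m_head.next == &m_head; }

private:
    // Cold branch kept out of line so its code is not embedded in Free().
    DEMO_NOINLINE void ReportWrongPool() { ++m_wrongPoolFrees; }

    BlockHeader m_head;
    std::mutex  m_lock;
    int         m_wrongPoolFrees;
};
```

Freeing a block through a different DemoPool than the one that allocated it takes the out-of-line branch and leaves the block linked in its original pool.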


  • This set of changes does not require a perldelta entry.