Note: Get the full running example here
While beeing able to use inline assembler within a 32 bit project, you can not use that within a 64bit build. Microsoft just does not support this. You have to use an external .asm
file instead which can be processed by MASM
during the build process.
Assembler in VS
To activate MASM
support in a C++
project in VS2019
, please follow the guidelines in the official documentation. Additionally I suggest you to install the VS extension AsmDude to get syntax highlighting.
Switching between both implementations
To access the procedures that are defined in the assembler file, you must declare them with extern "C"
in the header file. Otherwise the compiler rewrites the function names and thus the linker won't be able to match the CPP and ASM output together.
To see that effect, add a new function in a header file and call it somewhere in your CPP code. Then open the obj
file and search the method name.
extern bool DevToTest(int a, int b);
Without the "C"
addition, the name does not match the original one:
Using "C"
fixes this:
extern "C" bool DevToTest(int a, int b);
To differentiate between 32 and 64 bit code, you can use preprocessor directives. Adjust Naked32Bit.h
as following:
#pragma once #include "pch.h" #ifdef _WIN64 EXTERN_C void InitEnterLeaveCallbacks(bool* activate, int* hashMap, int size); EXTERN_C void FnEnterCallback(FunctionID funcId, UINT_PTR clientData, COR_PRF_FRAME_INFO func, COR_PRF_FUNCTION_ARGUMENT_INFO* argumentInfo); EXTERN_C void FnLeaveCallback(FunctionID funcId, UINT_PTR clientData, COR_PRF_FRAME_INFO func, COR_PRF_FUNCTION_ARGUMENT_INFO* argumentInfo); EXTERN_C void FnTailcallCallback(FunctionID funcId, UINT_PTR clientData, COR_PRF_FRAME_INFO func); #else void InitEnterLeaveCallbacks(bool* activate, int* hashMap, int size); //.... #endif
In case of a 64 bit build, the functions refer to external symbols. I also adjusted the signature of the Init
function. This was necessary because I wanted to show you how you can build the same logic as in the inline assembler. But this requires a hashmap. To avoid allocating memory in assembler, I just pass the variables from CPP
into the assembler code. This saves me some time and makes the whole thing more readable.
Note: Of course the naming of the header file is not correct anymore, but this does not matter 😄
Now adjust Naked32bit.cpp
:
extern "C" void _stdcall StackOverflowDetected(FunctionID funcId, int count) { std::cout << "stackoverflow: " << funcId << ", count: " << count; } extern "C" void _stdcall EnterCpp( FunctionID funcId, int identifier) { std::cout << "enter funcion id: " << funcId << ", Arguments in correct order: " << (identifier == 12345) << "\r\n"; } #ifdef _WIN64 #else bool* activateCallbacks; int* pHashMap; int mapSize; void InitEnterLeaveCallbacks(bool* activate, int* hashMap, int size) { activateCallbacks = activate; pHashMap = hashMap; mapSize = size; }
Both functions, EnterCpp
and SODetected
must be marked with extern "C"
. The Init
function and the variables must be moved into the 32bit code block. You can leave the 64bit code block empty because everything will be in the assembler file.
Now add the initialize in ProfilerCOncreteImpl.cpp
:
this->PHashMap = new int[mapSize]; memset(this->PHashMap, 0, mapSize); InitEnterLeaveCallbacks(&this->ActivateCallbacks, this->PHashMap, mapSize);
The ASM Code
What you will see now is no magic. There is only one thing you have to pay attention for: In 64Bt builds there is only one calling convention: fastcall
. See the links at the end of the post to get an insight into it. The most important points (at least these are the points I came across a few times):
- parameters are passed from left to right in the register:
RCX, RDX, R8, R9
- The caller must reserve 4*8 bytes in case of the callee wants to store the parameters onto the stack
- The caller has to clean up the stack afterwards
I stumbled a few times over the last two points which led to unwanted behavior.
_DATA SEGMENT pActivateEnterLeaveCallback qword 0 pHashMap qword 0 mapSize dword 0 _DATA ENDS extern EnterCpp:proc extern StackOverflowDetected:proc _TEXT SEGMENT PUBLIC InitEnterLeaveCallbacks InitEnterLeaveCallbacks PROC mov pActivateEnterLeaveCallback, RCX mov pHashMap, RDX mov mapSize, R8D ret InitEnterLeaveCallbacks ENDP PUBLIC FnEnterCallback FnEnterCallback PROC mov RAX, pActivateEnterLeaveCallback cmp byte ptr [RAX], 1 JNE skipCallback mov R8, pHashMap MOV RAX, RCX XOR RDX, RDX DIV DWORD PTR [mapSize] ADD R8, RDX INC DWORD PTR [R8] CMP DWORD PTR [R8], 30 JB skipStackOverflow xor rdx, rdx MOV EDX, [R8] SUB RSP, 20h CALL StackOverflowDetected ADD RSP, 20h skipStackOverflow: sub RSP, 20h mov rdx, 12345 CALL EnterCpp add RSP, 20h skipCallback: ret FnEnterCallback ENDP PUBLIC FnLeaveCallback FnLeaveCallback PROC MOV RAX, pActivateEnterLeaveCallback CMP BYTE PTR [RAX], 1 JNE skipCallback MOV R8, pHashMap MOV RAX, RCX XOR RDX, RDX DIV DWORD PTR [mapSize] ADD R8, RDX DEC DWORD PTR [R8] skipCallback: ret FnLeaveCallback ENDP PUBLIC FnTailcallCallback FnTailcallCallback PROC ret FnTailcallCallback ENDP _TEXT ENDS END
You see, nothing new here. sub RSP, 20h
and add RSP, 20h
are used to reserve memory on the stack and clean it up afterwards.
Using CPP implementations
As it seems that the CLR uses fastcall
convention for calling the callbacks, you may assume that you can use CPP implementations instead of writing assembler code. Indeed I was able to do this:
#ifdef _WIN64 bool* activateCallbacks; int* pHashMap; int mapSize; void InitEnterLeaveCallbacks(bool* activate, int* hashMap, int size) { activateCallbacks = activate; pHashMap = hashMap; mapSize = size; } void __fastcall FnEnterCallback( FunctionID funcId, UINT_PTR clientData, COR_PRF_FRAME_INFO func, COR_PRF_FUNCTION_ARGUMENT_INFO* argumentInfo) { if (activateCallbacks) { int amount = pHashMap[funcId % mapSize]; amount++; pHashMap[funcId % mapSize] = amount; if (amount >= 30) { StackOverflowDetected(funcId, amount); } EnterCpp(funcId, 12345); } } void __fastcall FnLeaveCallback( FunctionID funcId, UINT_PTR clientData, COR_PRF_FRAME_INFO func, COR_PRF_FUNCTION_ARGUMENT_INFO* argumentInfo) { if (activateCallbacks) { pHashMap[funcId % mapSize] = pHashMap[funcId % mapSize] - 1; } } void __fastcall FnTailcallCallback(FunctionID funcId, UINT_PTR clientData, COR_PRF_FRAME_INFO func) { } #else
During testing the code I don't see any errors but I don't know if this approach is intended by Microsoft.
Conclusion
The differences between 32 and 63 bit is not so big. I think the most relevant thing is the calling convention.
Additional Links
Configure project in VS to enable MASM
Use correct #define for x86/x64
Impact of fastcall to stack consumption
Unwind code macros
Stack usage on x64
Another link about stack frames
X64 ASM code for the profiler
Example about unwind information
Explanation of fast call asm code
Found a typo?
As I am not a native English speaker, it is very likely that you will find an error. In this case, feel free to create a pull request here: https://github.com/gabbersepp/dev.to-posts . Also please open a PR for all other kind of errors.
Do not worry about merge conflicts. I will resolve them on my own.
Top comments (0)