In a related article, I described how I set up Visual Studio to work with assembly language files using Yasm. In this article, I discuss some of the basics of writing functions callable from Visual C++. Honestly, I'm still wrapping my head around assembly language, but I'm pretty hooked. Sources that have been helpful to me are The Old New Thing, Microsoft's documentation, and Yasm's own documentation. The FreePascal team also has some useful information on this topic: Free Pascal Info on Win64 calling conventions.
In assembly language, the contract between a function and the code that calls it is known as a calling convention. A calling convention maps function parameters to specific places on the stack and in registers. For example, in Windows for x86-64, the first integer or pointer parameter of a function is placed in the rcx register. Wikipedia's article on Calling Conventions is actually rather good. To summarize, registers are used for passing parameters as follows:
| Register | Windows | Unix (Linux, Mac OS/X) |
|---|---|---|
| rax | int/pointer return value | int/pointer return value |
| rdi | not used for parameters | 1st int / pointer |
| rsi | not used for parameters | 2nd int / pointer |
| rcx | 1st int / pointer | 4th int / pointer |
| rdx | 2nd int / pointer | 3rd int / pointer |
| r8 | 3rd int / pointer | 5th int / pointer |
| r9 | 4th int / pointer | 6th int / pointer |
| xmm0 - xmm3 | First four floating point | First four floating point and small structures |
| xmm4 - xmm7 | Not used | Next four floating point and small structures |
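To make the integer column of the table concrete, here is a small C++17 sketch that looks up the register for a given parameter position. The names (`Abi`, `int_param_register`) are my own, made up for illustration, and the sketch assumes every parameter is an integer or pointer; it ignores floating point parameters entirely.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Integer/pointer parameter registers, in order, for each convention.
constexpr std::array<std::string_view, 4> kWin64IntRegs = {"rcx", "rdx", "r8", "r9"};
constexpr std::array<std::string_view, 6> kSysVIntRegs  = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

enum class Abi { Win64, SysV };  // hypothetical names, for illustration only

// Returns the register that carries the index-th (0-based) integer or
// pointer parameter, or "stack" once the registers run out.
constexpr std::string_view int_param_register(Abi abi, std::size_t index) {
    if (abi == Abi::Win64) {
        return index < kWin64IntRegs.size() ? kWin64IntRegs[index] : "stack";
    }
    return index < kSysVIntRegs.size() ? kSysVIntRegs[index] : "stack";
}

// Compile-time spot checks of the first parameter in each convention.
static_assert(int_param_register(Abi::Win64, 0) == "rcx");
static_assert(int_param_register(Abi::SysV, 0) == "rdi");
```

Note how rcx and rdx appear in both conventions but in different positions, which is exactly what makes sharing assembly source between Windows and Unix awkward.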
Windows divides functions into two types. The first are non-leaf functions: those that call other functions, allocate stack space, or change non-volatile registers. The second are leaf functions, which do none of these things.
In both cases, the Windows convention requires the stack pointer to be 16-byte aligned at the point where any call is made.
Non-leaf functions have a fairly strict set of rules. You use the register allocations for parameters, as mentioned previously; you have to emit unwind data into a special section so that Windows can include your function in its exception handling mechanism; and you have to have special entrance and exit code for your function, called a prologue and an epilogue. Basically, in the prologue you establish a frame and set aside some space once, so that the functions you call can save the contents of their register parameters, rather than adjusting the stack around each of the calls you make. This also provides a consistent way to reuse the registers used for parameters.
Yasm has some extra keywords that do this extra work for you. Let's have a look at our own assembly function that calls MessageBox:
extern MessageBoxA
mytitle: db 'hello windows',0
mymessage: db 'this is a big call',0
proc_frame callwin
global callwin
rex_push_reg rbp
alloc_stack 0x20
set_frame rbp,0x0
end_prolog
mov rcx, 0 ; hWnd = HWND_DESKTOP
mov rdx, mymessage ; LPCSTR lpText
mov r8, mytitle ; LPCSTR lpCaption
mov r9, 0 ; uType = MB_OK
call MessageBoxA
lea rsp,[rbp+0x20] ; This is the official epilog: one lea undoes the alloc_stack
pop rbp
ret
endproc_frame
proc_frame gets us hooked into the exception handling mechanism. At assembly time, Yasm emits a record for the function into the pdata segment, along with unwind information describing the prologue. As far as I can tell, nothing has to be registered when the exe is loaded: when an exception occurs, Windows looks up the address in that table to work out how to unwind our frame.
rex_push_reg does two things. First, it saves rbp, which we are going to use as our frame pointer. It also generates a special rex prefix at the beginning of our function, which lets the world know that this function can be hotpatched. Hotpatching is a way Windows has of applying patches to objects in a server while they are loaded.
alloc_stack just subtracts 0x20 bytes from rsp and records that fact in the unwind information. The idea is to follow a pattern where we allocate space for our own calls just once and give the functions we call some space to save their register parameters, rather than pushing and popping around each function call we make. If I had needed local variables, or had called a function with more than four parameters, I would have allocated space for them in the alloc_stack call and adjusted the frame pointer. A later example does this.
Finally we call our function, MessageBoxA.
Having finished, we restore the stack pointer, and exit.
It appears that we can sometimes get away with a less detailed call sequence when we invoke a CRT function. This is likely not entirely safe, because without the proc_frame unwind data Windows cannot unwind through our function if an exception occurs. The interesting thing here is that we again allocate the 0x20 bytes for spill, whether we need it or not, and then we also allocate 0x8 bytes to keep the stack 16-byte aligned (the return address pushed by the call to us would otherwise throw it off).
extern printf
amessage: db 'hello windows',10,0 ; 10 = newline; \n is not expanded inside quotes
callcrt: global callcrt
sub rsp, 0x28
mov rcx, qword amessage
call printf
add rsp, 0x28
ret
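The 0x20 and 0x28 figures in these examples follow from simple arithmetic, which can be sketched in C++ as below. The helper name `win64_stack_alloc` and its parameters are hypothetical, chosen only for this illustration. It assumes rsp sits 8 bytes past a 16-byte boundary on entry (because of the return address) and that each push moves it another 8 bytes.

```cpp
#include <cstdint>

// Bytes to subtract from rsp in a Win64 prologue so that rsp is 16-byte
// aligned at every call the function makes.
//   local_bytes : space wanted for local variables
//   max_args    : largest parameter count of any function we call
//   pushes      : number of registers pushed before the subtraction
constexpr std::uint64_t win64_stack_alloc(std::uint64_t local_bytes,
                                          std::uint64_t max_args,
                                          std::uint64_t pushes) {
    // The callee may spill rcx, rdx, r8 and r9 into space we provide,
    // so reserve at least four 8-byte slots even for shorter signatures.
    const std::uint64_t arg_slots = max_args < 4 ? 4 : max_args;
    const std::uint64_t wanted = local_bytes + arg_slots * 8;

    // Bytes already on the stack below the last 16-byte boundary:
    // the return address plus every pushed register.
    const std::uint64_t already = 8 + pushes * 8;

    // Round the total up to a multiple of 16, then subtract what is
    // already there to get the amount for "sub rsp, N".
    const std::uint64_t total = (already + wanted + 15) & ~std::uint64_t{15};
    return total - already;
}

static_assert(win64_stack_alloc(0, 1, 0) == 0x28, "callcrt: no pushes, one argument");
static_assert(win64_stack_alloc(0, 4, 1) == 0x20, "callwin: rbp pushed, four arguments");
```

The two static_assert lines reproduce the constants used by callcrt and callwin above.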
You can also write a leaf function. A leaf function makes no calls of its own, allocates no stack space, and leaves the non-volatile registers alone, so it needs no prologue, epilogue, or unwind data. If a function is only ever called from your own assembly code, the contract is private to you, and you could theoretically give it whatever stack and register contract you want. Here, however, it must still be callable from C, so we stick to the standard registers.
I miss the old Watcom compiler's ability to specify the register usage of functions. I'll probably steal this idea when I write my own language. That would let us write portable assembler code and share it between Mac, Windows, and Linux.
callleaf: global callleaf
mov rax,rcx
shl rax,4
inc rax
ret
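When testing from the C side, it helps to know what callleaf should return. Here is a C++ model of the same arithmetic; `callleaf_model` is a name I made up for the sketch, and it assumes the function is declared as taking and returning a 32-bit int, so only the low 32 bits of rax matter.

```cpp
#include <cstdint>

// C++ model of callleaf: rax = (rcx << 4) + 1, with the result
// truncated to 32 bits when the caller reads it back as an int.
constexpr std::int32_t callleaf_model(std::int32_t a) {
    // Work in unsigned 32-bit so the shift wraps the way the hardware does.
    const std::uint32_t shifted = static_cast<std::uint32_t>(a) << 4;
    return static_cast<std::int32_t>(shifted + 1u);
}

// Compile-time spot check: 1 * 16 + 1.
static_assert(callleaf_model(1) == 17);
```

In other words, the function multiplies its argument by 16 and adds one.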
The following example shows a non-leaf function that saves its parameter in the spill space. We allocate a little local string, fill it in, and pass it to MessageBox.
Notice the pair of instructions: mov [rsp+8], rcx at the beginning, and add rcx, [rbp + 18h] later on. The mov stores rcx into the spill space the caller allocated for us, and the add reads the saved value back from there and adds it to the letter.
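The 18h displacement is not arbitrary. Here is the bookkeeping written out as C++ constants, a sketch in which every offset is measured from R, my name for the value of rsp on entry to the function. The constant names are made up for illustration.

```cpp
#include <cstdint>

// All offsets are relative to R, the value of rsp on entry:
// [R] holds the return address and [R+8] is the home slot for rcx.
constexpr std::int64_t kHomeRcx   = 8;                   // mov [rsp+8], rcx
constexpr std::int64_t kAfterPush = -8 - 8;              // push rbp, push rsi
constexpr std::int64_t kRsp       = kAfterPush - 0x48;   // sub rsp, 48h
constexpr std::int64_t kRbp       = kRsp + 0x48;         // lea rbp, [rsp + 48h]

// Displacement needed to reach the saved parameter from rbp.
constexpr std::int64_t kHomeFromRbp = kHomeRcx - kRbp;

static_assert(kHomeFromRbp == 0x18, "matches add rcx, [rbp + 18h]");

// R itself is 8 past a 16-byte boundary, so rsp after the prologue
// must land on a multiple of 16 for the MessageBoxA call.
static_assert((8 + kRsp) % 16 == 0, "rsp is aligned at the call");
```

The second check also confirms that 48h is the right allocation size for two pushes.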
The little loop around 'fillit' just kicks out the alphabet, with the value of rcx added to each letter: a Caesar cipher, I guess.
I use a frame pointer via rbp, and it's pointing at the top of the local data area. In the example below, the stack allocation holds the spill space for the MessageBox call at the bottom, with the local variables above it.
extern MessageBoxA
mytitle: db 'the alphabet',0
proc_frame callwinwithlocals
global callwinwithlocals
mov [rsp+8], rcx
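push_reg rbp ; push rbp and record it in the unwind data
push_reg rsi ; push rsi and record it in the unwind data
alloc_stack 48h ; sub rsp, 48h and record it in the unwind data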
lea rbp, [ rsp + 48h ]
end_prolog
mov rsi, 0
fillit:
mov rcx, 'A'
add rcx, [rbp + 18h]
add rcx, rsi
mov byte [rbp + rsi - 28h], cl ; locals start at rbp-28h, below the saved rsi and rbp
inc rsi
cmp rsi, 0xf
jb fillit
mov byte [rbp + rsi - 28h], 0 ; terminate the string
mov rcx, 0 ; hWnd = HWND_DESKTOP
lea rdx, [rbp - 28h] ; LPCSTR lpText
mov r8, mytitle ; LPCSTR lpCaption
mov r9, 0 ; uType = MB_OK
call MessageBoxA
add rsp, 48h
pop rsi
pop rbp
ret
endproc_frame
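For comparison, here is what the fillit loop computes, modeled in C++. This is only a sketch of the loop's output, not of the stack handling, and `fill_alphabet` is a hypothetical name for it.

```cpp
#include <string>

// C++ model of the fillit loop: 15 letters starting at 'A' + shift.
// Like the assembly, it makes no attempt to wrap around past 'Z'.
std::string fill_alphabet(int shift) {
    std::string text;
    for (int i = 0; i < 0xf; ++i) {
        // Mirrors: mov rcx, 'A' / add rcx, shift / add rcx, rsi / store cl
        text.push_back(static_cast<char>('A' + shift + i));
    }
    return text;  // std::string takes care of the terminating zero
}
```

Calling the assembly version with 0 should show the letters A through O in the message box, and with 1 the letters B through P.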
On the C side, one specifies that these assembly functions can be called as follows:
extern "C" {
int callleaf( int a );
int callcrt( );
int callwin( );
int referencecall( const char *src );
}
This tells the compiler to generate references to these functions using unmangled names. In order to work its object-oriented magic, C++ does a lot of funky stuff, like mangling the names of functions, for starters. These behind-the-scenes machinations of C++ are compiler-specific and not standardized, so you have to use C as a gateway into C++ from other systems.
You can download the zipped up VS2008 solution containing the above code: Download Solution. If you find problems or have questions about the above code, let us know at Email the writer(s).