In some situations, we may want to integrate some hand-crafted object code into our project. This is most productively accomplished by writing Intel x86 assembly and assembling it into object code via an assembler like NASM. In other situations, we may be looking at disassembly (that may have come from C, assembly, or both). Further, on x86 architectures all of our C/C++ code gets compiled into x86 object code, so if we are reverse engineering unmanaged code, it is critical to understand how functions call each other at the lowest level. In all of these cases, we must know exactly how functions are called; otherwise it would be difficult to make heads or tails of a program’s flow. In this post, I’ll go over the three most common calling conventions. You’ll need to know only the very basics of assembly to follow along.
Function calls on x86 architectures
The x86 architecture doesn’t have any concrete notion of a function call in the same sense that high-level languages do. Instead, we work with a series of jumps back and forth between blocks of code. Consider the following example:
add eax, ebx
jmp MultiplyEaxByTwo
Continue:
; ...
MultiplyEaxByTwo:
shl eax,1
jmp Continue
After adding the ebx register to eax, we multiply eax by two, then jump back to Continue and carry on executing ; ... from there. This is a simple analogy to a function call in assembly.
The preferred way to perform function calls is a little different. We typically use call and ret:
add eax, ebx
call MultiplyEaxByTwo
; ...
MultiplyEaxByTwo:
shl eax,1
ret
call differs from jmp in one important way: it pushes the address of the next instruction to be executed after returning (here ; ...) onto the stack. Why? Because ret pops this value off the stack and loads it into our instruction pointer, eip. This is why we didn’t have to label ; ... as Continue in the above example. ret does all of the work of returning execution to the caller for us.
In the above examples, we actually created our own calling convention. The function MultiplyEaxByTwo takes in one parameter, eax, which it modifies in place. When calling assembly-to-assembly, we are free to come up with whatever calling conventions we think are convenient. It is once we want to interact with higher-level languages like C that we must pay careful attention to protocols! If we wanted to call MultiplyEaxByTwo from C, how could we do it? We’d want to make a call like:
extern void MultiplyEaxByTwo(unsigned int *ptr);
// ...
unsigned int x = 10;
MultiplyEaxByTwo(&x);
printf("This value should be twenty: %u\n", x);
C is going to set up the stack in a specific way before call MultiplyEaxByTwo, it is going to expect the stack to look a specific way after MultiplyEaxByTwo executes a ret, and (for functions with a return value) it is going to look in a very specific place for the result.
These specifics define the calling convention.
The Big Three
There are lots and lots of calling conventions. Fortunately, there are only three that predominate in C (there is one other, the thiscall convention, which is used extensively in C++ but is outside of the scope of this tutorial). All of the code from this section is arranged in a VS2013 solution available on GitHub.
Calling conventions answer three important questions:
- What do the stack and registers look like when the function is called?
- What do the stack and registers look like when the function returns?
- Where is the result stored?
Investigating cdecl
First up is cdecl, which is the default calling convention for C and C++. Consider a function taking three ints and returning the sum:
__declspec(dllexport) int __cdecl MyCdecl(int a, int b, int c);
The __cdecl keyword specifies that MyCdecl adheres to the cdecl calling convention. The __declspec(dllexport) keyword allows us to export the function in a DLL, so that we can test our function:
TEST_CLASS(MyCdeclTest)
{
public:
    TEST_METHOD(MyCdeclAddsCorrectly)
    {
        auto result = MyCdecl(30, 8, 4);
        Assert::AreEqual(42, result);
    }
};
We implement MyCdecl in assembly to illustrate the cdecl calling convention:
GLOBAL _MyCdecl
EXPORT _MyCdecl
_MyCdecl:
xor eax,eax
add eax,[esp+4]
add eax,[esp+8]
add eax,[esp+12]
ret
Let’s break down _MyCdecl line by line:
1. All cdecl functions start with an underscore. GLOBAL tells NASM that _MyCdecl is a global symbol.
2. EXPORT tells NASM we are writing a DLL, and this function is to be exported. We still need to mark it GLOBAL!
3. _MyCdecl: is a label that tells NASM where our function starts.
4. xor eax,eax sets eax to zero.
5. add eax,[esp+4] adds the value on the stack 4 bytes above the stack pointer esp.
6. Same as 5, but with an 8-byte offset.
7. Same as 5, but with a 12-byte offset.
8. ret pops the return pointer off the stack into eip, i.e. returns to the caller.
According to cdecl, the stack will look like this when execution reaches the top of _MyCdecl during our test method MyCdeclAddsCorrectly. Each block [ ] represents four bytes.
Low memory
[ RP ] <-- ESP
[ 30 ]
[ 8 ]
[ 4 ]
[ ... ]
High Memory
RP denotes the return pointer, i.e. the value that was pushed onto the stack when call was executed. This is the address where our function will return execution to.
The arguments have been pushed onto the stack from right to left, so that the first argument is at the lowest memory address (recall that the stack grows towards lower memory addresses). ESP holds the address of RP, so [esp+4] corresponds with our first argument, [esp+8] with our second, and so on.
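To make the offsets concrete, here is a minimal C sketch that models the stack as a byte array. The ToyStack type and its helper functions are hypothetical, and 0xDEADBEEF is only a placeholder for whatever return pointer call would really push.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64 };

typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;  /* byte offset of the top of the stack */
} ToyStack;

void stack_init(ToyStack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_BYTES;  /* empty: the stack grows toward offset 0 */
}

void push32(ToyStack *s, uint32_t value) {
    assert(s->esp >= 4);
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* Models a read of [esp+offset]. */
uint32_t load32(const ToyStack *s, uint32_t offset) {
    uint32_t value;
    assert(s->esp + offset + 4 <= STACK_BYTES);
    memcpy(&value, &s->mem[s->esp + offset], sizeof value);
    return value;
}

/* What a cdecl caller does for f(a, b, c): push right to left, then call. */
void cdecl_enter(ToyStack *s, uint32_t a, uint32_t b, uint32_t c,
                 uint32_t return_pointer) {
    push32(s, c);               /* last argument first */
    push32(s, b);
    push32(s, a);               /* first argument ends up lowest in memory */
    push32(s, return_pointer);  /* pushed by call itself */
}

/* Mirrors the body of _MyCdecl: sum [esp+4], [esp+8], [esp+12]. */
uint32_t my_cdecl_model(const ToyStack *s) {
    return load32(s, 4) + load32(s, 8) + load32(s, 12);
}
```

After cdecl_enter(&s, 30, 8, 4, ...), load32(&s, 0) yields the return pointer and load32(&s, 4) yields the first argument, matching the diagram above.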
OK, so we are zeroing out eax, then adding each of the three arguments. But how do we return this value? It turns out that in cdecl all we had to do was store the return value into eax. That’s it!
One subtle feature of cdecl to note is that the caller will have to clean up the stack, i.e. return esp to the correct value, after ret. In fact, this is the major difference between cdecl and stdcall, which we examine next.
Investigating stdcall
To investigate stdcall, let’s create a similar function:
__declspec(dllexport) int __stdcall MyStdcall(int a, int b, int c);
The big difference here is in the __stdcall keyword. As you might expect, this tells the compiler that our function implements the stdcall convention. Our test will be nearly identical to the previous section’s:
TEST_CLASS(MyStdCallTest)
{
public:
    TEST_METHOD(MyStdCallAddsCorrectly)
    {
        auto result = MyStdcall(30, 8, 4);
        Assert::AreEqual(42, result);
    }
};
We mentioned that the major difference between cdecl and stdcall is that in stdcall the callee (our function) must clean up the stack. How is this done? Well, it turns out that ret can take an argument corresponding with the number of bytes to add to esp after popping off the return pointer:
GLOBAL _MyStdcall@12
EXPORT _MyStdcall@12
_MyStdcall@12:
xor eax,eax
add eax,[esp+4]
add eax,[esp+8]
add eax,[esp+12]
ret 12
Why add? Examine what we would like the stack to look like after returning:
Low memory
[ RP ]
[ 30 ]
[ 8 ]
[ 4 ]
[ ... ] <-- ESP
High Memory
ESP needs to increment a total of 16 bytes–4 for the return pointer, then 12 for the arguments. That’s it!
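The arithmetic can be checked with a small C sketch. The function names and the starting ESP value used with them are hypothetical; the only facts modeled are that call pushes a 4-byte return pointer and that ret pops it and then adds its optional operand to esp.

```c
#include <assert.h>
#include <stdint.h>

/* ESP on entry to the callee: the caller has pushed arg_bytes of
 * arguments and call has pushed a 4-byte return pointer. */
uint32_t esp_at_entry(uint32_t esp_before, uint32_t arg_bytes) {
    assert(arg_bytes % 4 == 0);  /* this sketch assumes 4-byte stack slots */
    return esp_before - arg_bytes - 4;
}

/* ESP after `ret operand`: pop the return pointer, then add the operand.
 * A plain `ret` behaves like operand == 0. */
uint32_t esp_after_ret(uint32_t esp_entry, uint16_t operand) {
    return esp_entry + 4 + operand;
}
```

With three int arguments, esp_after_ret(entry, 12) lands back on the caller's original esp (stdcall), while esp_after_ret(entry, 0) stops 12 bytes short, which is the gap a cdecl caller has to close itself.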
The naming convention for stdcall functions is to prepend an underscore and append an @ followed by the number of bytes that must be cleaned off the stack, i.e. the total size of the arguments (the number of arguments times four when each one is a 4-byte value, as here).
Investigating fastcall
The final x86 calling convention you’re likely to run into when looking at C programs is the fastcall convention. Like stdcall, the callee must clean the stack. The major difference between the two is that the first two arguments will not be present on the stack. Instead, they are shoved into the general purpose registers ecx and edx. Keeping data in registers rather than in memory can be much faster (hence the name fastcall).
Our function declaration appears as follows:
__declspec(dllexport) int __fastcall MyFastcall(int a, int b, int c);
The test follows from the above two sections:
TEST_CLASS(MyFastcallTest)
{
public:
    TEST_METHOD(MyFastcallAddsCorrectly)
    {
        auto result = MyFastcall(30, 8, 4);
        Assert::AreEqual(42, result);
    }
};
When MyFastcall executes, the stack will look like this:
Low memory
[ RP ] <-- ESP
[ 4 ]
[ ... ]
High Memory
ecx will contain 30 and edx will contain 8. Our implementation is then:
GLOBAL @MyFastcall@12
EXPORT @MyFastcall@12
@MyFastcall@12:
xor eax,eax
add eax,ecx
add eax,edx
add eax,[esp+4]
ret 4
Notice that we need only clean 4 bytes from the stack, since two of our arguments are in registers rather than on the stack. The naming convention for fastcall functions is to prepend an @, append an @, then append the total size of the arguments in bytes, including the ones passed in registers. That is why the name ends in @12 even though ret only cleans 4 bytes.
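As a rough sketch of how the arguments split between registers and the stack, the C function below computes the split for a given argument count. It assumes every argument is a 4-byte integer, as in MyFastcall; the FastcallLayout type and function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t register_args;  /* how many arguments travel in ecx/edx */
    uint32_t stack_bytes;    /* argument bytes left on the stack, which is
                                also the operand the callee gives to ret */
} FastcallLayout;

/* Assumes every argument is a 4-byte integer. */
FastcallLayout fastcall_layout(uint32_t arg_count) {
    FastcallLayout layout;
    assert(arg_count <= UINT32_MAX / 4);  /* keep the multiplication in range */
    layout.register_args = arg_count < 2 ? arg_count : 2;
    layout.stack_bytes = (arg_count - layout.register_args) * 4;
    return layout;
}
```

For three arguments this gives two register arguments and 4 stack bytes, matching the ret 4 in the implementation above.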
Wrapping up
Notice that each of our test cases looks exactly the same–from C, the calling convention really doesn’t matter.
Where it matters is when we dive underneath the surface and get into the x86 object code. Our three conventions have these main features:
- cdecl: caller cleans the stack, all arguments are pushed onto the stack from right to left. Names formatted like _foo.
- stdcall: callee cleans the stack, arguments the same as cdecl. Names formatted like _foo@20.
- fastcall: callee cleans the stack, first two arguments in ecx, edx, rest onto the stack from right to left. Names formatted like @foo@12.
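The three name formats can be sketched as a small C helper. The Convention enum and decorate_name function are hypothetical; arg_bytes is taken to be the total size of the arguments in bytes and is ignored for cdecl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { CONV_CDECL, CONV_STDCALL, CONV_FASTCALL } Convention;

/* Writes the decorated form of `name` into `out` and returns snprintf's
 * result (the length the full name needs, or a negative value on error). */
int decorate_name(char *out, size_t out_size, Convention conv,
                  const char *name, unsigned arg_bytes) {
    assert(out != NULL && name != NULL && strlen(name) > 0);
    switch (conv) {
    case CONV_CDECL:
        return snprintf(out, out_size, "_%s", name);
    case CONV_STDCALL:
        return snprintf(out, out_size, "_%s@%u", name, arg_bytes);
    case CONV_FASTCALL:
        return snprintf(out, out_size, "@%s@%u", name, arg_bytes);
    }
    return -1;  /* unknown convention */
}
```

Passing "MyStdcall" with 12 argument bytes produces _MyStdcall@12, the same label used in the NASM source earlier.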
It is worth getting very familiar with these three calling conventions! If you do any assembly programming, reverse engineering, or vulnerability research, it is an absolutely critical skill to have.
Some gotchas
There is some ceremony you must go through to get NASM assembling as part of your MSBuild process. I recommend starting from https://github.com/JLospinoso/x86CallingConventions, which contains the following bit of magic within the DLL’s project file:
<ItemGroup>
<CustomBuild Include="CallingConventions.nasm">
<FileType>Document</FileType>
<Message Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Compiling %(Identity)</Message>
<Message Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Compiling %(Identity)</Message>
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">nasm -f win$(PlatformArchitecture) -Xvc -o $(IntermediateOutputPath)\%(Filename).obj %(Identity)</Command>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">$(IntermediateOutputPath)\%(Filename).obj;%(Outputs)</Outputs>
<Command Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">nasm -f win$(PlatformArchitecture) -Xvc -o $(IntermediateOutputPath)\%(Filename).obj %(Identity)</Command>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">$(IntermediateOutputPath)\%(Filename).obj;%(Outputs)</Outputs>
</CustomBuild>
</ItemGroup>
As you’re building out your DLL, dumpbin is an indispensable tool for determining if you are having export-related issues:
dumpbin /EXPORTS foo.dll
will tell you whether you’ve successfully exported your function, and with what calling convention (information you can glean from the exported name).
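Reading the convention off an exported name can be sketched in C as follows. This assumes the export keeps the decorated name formats described in this post; the NameKind enum and classify_decorated_name function are hypothetical, and anything that does not match those formats is reported as unknown.

```c
#include <ctype.h>
#include <string.h>

typedef enum { NAME_UNKNOWN, NAME_CDECL, NAME_STDCALL, NAME_FASTCALL } NameKind;

/* Returns 1 if s is a non-empty run of decimal digits. */
static int all_digits(const char *s) {
    if (*s == '\0') return 0;
    for (; *s != '\0'; ++s) {
        if (!isdigit((unsigned char)*s)) return 0;
    }
    return 1;
}

NameKind classify_decorated_name(const char *name) {
    const char *at = strrchr(name, '@');  /* last @ in the name, if any */

    if (name[0] == '@') {
        /* @foo@12: a leading @ plus a later @ followed by a byte count */
        if (at != name && all_digits(at + 1)) return NAME_FASTCALL;
        return NAME_UNKNOWN;
    }
    if (name[0] == '_') {
        if (at == NULL) return NAME_CDECL;             /* _foo    */
        if (all_digits(at + 1)) return NAME_STDCALL;   /* _foo@12 */
    }
    return NAME_UNKNOWN;
}
```

For example, classify_decorated_name("@MyFastcall@12") returns NAME_FASTCALL, while a name with no decoration at all comes back as NAME_UNKNOWN.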