Runtime Code Patching - Not for the Faint of Heart

Artikel
05/14/2008

I have been involved in several conversations recently that have revolved around the joys of runtime code patching. I am always shocked to hear people say that they are ok with this idea of code patching at runtime. Moreover – it shocks me that they think it is easy to get right! I do think that code patching can have its place in a system if it is implemented correctly and if its intent and semantics are fully disclosed to any potential users.

So what exactly is code patching? There are tons of examples of it on the web (mostly hacker sites! :D ) and many books (again – mostly hacker stuff) that describe it in detail – just “Live Search” “runtime code patching” , but basically patching is the process of replacing a set of op codes in memory with a different set of op codes – at runtime. This seems simple enough but the devil is in the details. Let’s dig in!

Consider these op codes from a simple and mostly useless function:

01001175 8bff mov edi,edi

01001177 53 push ebx

01001178 50 push eax

01001179 53 push ebx

0100117a 51 push ecx

0100117b 52 push edx

0100117c 5a pop edx

0100117d 59 pop ecx

0100117e 5b pop ebx

0100117f 58 pop eax

01001180 5b pop ebx

01001181 c3 ret

This code is kind of silly – but it has nice properties that expose the problems with patching, more on that in a bit. It is also made up of very commonly generated op codes. The format of the assembly listing is:

[address] [op codes] [mnemonics for op codes]

The typical reason for installing a patch is to either bypass or modify the behavior of an existing function. Therefore the canonical patch is “op codes for a jmp to an absolute address”, written over the beginning of an existing routine like so:

01001175 ea871100011b00 jmp 001b:01001187

0100117c 5a pop edx

0100117d 59 pop ecx

0100117e 5b pop ebx

0100117f 58 pop eax

01001180 5b pop ebx

01001181 c3 ret

Here we replaced the first few instructions of our function with an absolute jump to some other code, presumably code that we wrote and loaded into the system in some way.

So now that we know what a patch is – what is the problem? This seems like it will work? What exactly is it that we need to worry about? Well, the most obvious thing we have to worry about is another thread running the code that we are patching, while we are patching it. The reason we need to worry about this is because if we modify the code that another thread is running, while it is running it – it will crash, or at the very least do very strange things. We need to make sure that any thread running the code we are patching either executes all of the old op codes or just the new one – but never a combination of the two. You can imagine a scenario like this:

1. Thread T1 executes the first instruction from the old op codes at address 0x01001175

2. The old op codes are overwritten with the new "jump" op code, by thread T2

3. Thread T1 executes the op code at 0x01001177, which is now an address in the middle of the jump instruction that T2 wrote over old op codes

This would be really bad! J

So what to do? The next logical progression in analyzing the issue goes like this:

“Well, the problem is that we have a race with other threads running the code we are patching, so I’ll just make sure that no other threads are in that code when I do the patch!”.

Perfect! Well, not quite. Although we can corral all the other running processors (using DPCs, IPIs, etc.) and make sure that none of the running threads are in the code we are going to patch. But believe it or not this still isn’t enough to make our patch solid! To clarify, let’s restate the problem with our patching approach: we need to synchronize with all of the other threads currently running the code we are tying to patch. However, the latest incarnation of methodology, will only cause us to synchronize with all the threads currently running on other processors. What about the threads that aren’t running? Is there any synchronization required there? Actually and unfortunately – yes. The problem is – we can have threads that aren’t currently running but that have the address of the old op codes, where our new instruction is now residing, saved away for future use. This can occur if a thread was context swapped while executing the code we want to patch (when a thread is context swapped, the current instruction pointer is saved away so that it can be restored when the thread runs again). Other things that can cause the instruction pointer to be saved away are: exceptions, interrupts, etc..

So what is the moral of the story here? Don’t patch code? Well that may be a little extreme, but the moral is at least to never patch multiple instructions. Working in the Windows Online Crash Analysis (OCA ) data for a long time now, I can tell you that I have seen many failed attempts at doing this correctly that have ended in a BSOD. J

In fact, if you look at post Server 2003 Windows binaries, you will notice that the generated code follows this format almost exclusively:

7731a321 90 nop

7731a322 90 nop

7731a323 90 nop

7731a324 90 nop

7731a325 90 nop

7731a326 8bff mov edi,edi

7731a328 55 push ebp

7731a329 8bec mov ebp,esp

… <more op codes here>

This type of code generation allows for a patch to be installed safely at runtime. The 2 bytes at the start of the function (mov edi, edi) is enough space to hold the op code for a “relative short jump”, which be crafted to jump to the address of the 5 bytes of nop op codes. These bytes would have been previously overwritten with a “jmp ADDRESS” op code. This was a conscious change that was made to allow servicing of existing binaries on long running machines, without having to reboot to replace the binaries. The same rules apply though – the other processors must be corralled in order to make sure no thread is active in the code we want to patch at the time the patch is applied. Again – the reason this methodology works is because the op code boundaries match. In other words, either code will execute the “mov edi, edi” or the “short relative jump”. If it executes the former, then the routine runs normally through the code. If it executes the latter, then it will jump to the patch installed over the nop op codes. No problem here. There are issues with the instruction caches on processors as well, but those issues can be managed within the corralling mechanism. One last point is, we don’t have to worry about the “normal path” code running the patch that was written over the nops because there is no path to that code except via the “short relative jump” that we wrote over the first instruction. Nice huh?!

Well, I hope this has been somewhat interesting or useful in some way. Please let me know if there are any other things that would be interesting to talk about.

Comments

Anonymous
May 14, 2008
Nice post, Jonathan - thanks for explaining in a clear, concise fashion the issues revolving around this.
Anonymous
May 14, 2008
Very interesting post. Can you further elaborate on the caching issues involved? Also, why do we need to corral all running threads? Do we care if a thread is now running inside the old function? Thanks, Eran.
Anonymous
May 14, 2008
Thanks for the comments molotov and Eran! Eran - here is one example of a cache issue with the instruction cache (from the Intel IA32 Processors Manuals): "For Intel486 processors, a write to an instruction in the cache will modify it in both the cache and memory, but if the instruction was prefetched before the write, the old version of the instruction could be the one executed. To prevent the old instruction from being executed, flush the instruction prefetch unit by coding a jump instruction immediately after any write that modifies an instruction." There are other processor dependent issues as well, but they all revolve around the fact that entry points into the new instruction op code can exist when the caches and memory get out of sync. Thanks!
Anonymous
May 14, 2008
Eran - also about your second point - right - we don't care if someone was already passed that patch area - we just need to manage the accesses to the area around the patch. Thanks.
Anonymous
May 17, 2008
Daniel Pearson pointed out an error in my post that I want to share with everyone. I led you to believe (from my example disassembly at the end of the post) that the 5 bytes of nop opcode that get overwritten are at the end of the function. They are not - they are actually the 5 bytes preceding the function. It makes totally sense when you think about it - because the 2 byte jump instruction can only jmp 127/128 bytes in either direction respectively. Why is this important? Well if you had a function that was larger than 127 bytes you couldn't reach your nop patch bytes if they were at the end! :) It is a subtle but very important point. Thanks Daniel!
Anonymous
May 22, 2008
The mov edi, edi is used by Microsoft Hotpatching.
Anonymous
July 29, 2008
>> So what is the moral of the story here? Don’t patch code? Well that may be a little extreme, but the moral is at least to never patch multiple instructions. I think that's not enough. I saw the detour library by M$ used CopyMemory to copy op bytes, where CopyMeomory was finally spread to rep movsd; So even you just want to copy one single instruction, there is also possiblity that another thread gets CPU before all bytes are copied. (Also consider multi-core processor)
Anonymous
July 30, 2008
Sure but that could be easily fixed using instructions like cmpxchg8b.
Anonymous
August 28, 2008
How does the detours library handle the caching issue?
Anonymous
September 10, 2008
The comment has been removed
Anonymous
October 05, 2013
Hi, The problem with such technique is that we need to make such as technique atomic as if it fails in between it can lead to opcode corruption. The only security concern is that such a technique can be abused by Malware Developers to create Metamorphicpolymorphic malware, which can evade AV. Nevertheless it is a good piece of blog

Freigeben über

Runtime Code Patching - Not for the Faint of Heart

Comments

Zusätzliche Ressourcen