[Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)

From: Hulin, Patrick - 0559 - MITLL @ 2014-08-13 18:36 UTC (permalink / raw)
  To: qemu-devel

Hi QEMU devs,

QEMU does not currently run Windows 7 64-bit without KVM. There have been a few threads about this over the past few years (such as https://bugs.launchpad.net/qemu/+bug/921208 and http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg02603.html), but the problem was never resolved. I think I've identified the cause, but I am not sure what the correct fix is. I'm working on PANDA, a set of analysis extensions to QEMU (github.com/moyix/panda), and I'd really like to be able to use our analyses on Windows 7 64-bit.

There are two issues right now. The first is that QEMU is missing a CPUID bit (for debug extensions, CPUID_DE) because the feature isn't implemented in QEMU. This can easily be hacked around by just enabling the bit, but I imagine you all aren't excited about advertising features that don't exist. The second issue is that both the installer and the OS itself fail with blue screens of DRIVER_IRQL_NOT_LESS_OR_EQUAL or KMODE_EXCEPTION_NOT_HANDLED (due to illegal instruction). This is a little trickier.

One of the major differences between Windows 7 x86 and x64 is that the 64-bit version has Microsoft's Kernel Patch Protection, aka PatchGuard. In order to protect itself, PatchGuard lives encrypted in memory and follows a two-stage decryption process. The process begins with a series of xor's which successively decrypt the PatchGuard code. This is self-modifying code (in particular, the first xor overwrites itself and the next instruction).

For the uninitiated, as I understand it, QEMU's self-modifying code support works in the following way. Before executing a translation block (TB), QEMU write-protects (using host MMU features) the _host_ page that contains the section of guest memory on which the guest TB code lives. When self-modifying code attempts to write to that page, it triggers a host segmentation fault. QEMU catches this segmentation fault using standard POSIX signal infrastructure and then walks into the software MMU code. If the write intersects the current TB, QEMU splits the TB into two: the single instruction that is being executed, and the rest of the block, which is invalidated so that it will be retranslated as soon as QEMU tries to run it. QEMU then restores the pre-write CPU state (cpu_restore_state) and longjmp's out (cpu_resume_from_signal). The instruction then executes again, and this time it actually makes the write to QEMU's memory state. QEMU translates the new code, which is now in its own TB, and continues from there.
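To make the mechanism described above concrete, here is a minimal host-side sketch of the "write-protect, fault, unprotect, restart" cycle using plain POSIX calls. It is not QEMU code: guarded_page, on_segv and guarded_write are hypothetical names, the page is an anonymous mapping standing in for a page of guest code, and calling mprotect from a signal handler is a common practice on Linux rather than something POSIX guarantees.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf restart_point;       /* where the "instruction" restarts */
static unsigned char *guarded_page;    /* stands in for a page of guest code */
static size_t guarded_len;
static volatile sig_atomic_t fault_count;

static void on_segv(int sig, siginfo_t *info, void *ucontext)
{
    unsigned char *addr = (unsigned char *)info->si_addr;
    (void)sig;
    (void)ucontext;

    if (guarded_page == NULL || addr < guarded_page ||
        addr >= guarded_page + guarded_len) {
        signal(SIGSEGV, SIG_DFL);      /* not our page: let the crash happen */
        return;
    }
    fault_count++;
    /* A real emulator would invalidate the affected translations here. */
    mprotect(guarded_page, guarded_len, PROT_READ | PROT_WRITE);
    siglongjmp(restart_point, 1);      /* abandon the write and restart it */
}

/* Returns the number of faults taken (expected: 1), or -1 on failure. */
int guarded_write(size_t offset, unsigned char value)
{
    struct sigaction sa, old_sa;
    long page = sysconf(_SC_PAGESIZE);
    int result;

    if (page <= 0)
        return -1;
    guarded_len = (size_t)page;
    assert(offset < guarded_len);

    guarded_page = mmap(NULL, guarded_len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED) {
        guarded_page = NULL;
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old_sa) != 0)
        return -1;

    fault_count = 0;
    mprotect(guarded_page, guarded_len, PROT_READ);  /* page is now read-only */

    sigsetjmp(restart_point, 1);       /* the handler jumps back to here */
    guarded_page[offset] = value;      /* faults once, then succeeds */

    result = (guarded_page[offset] == value) ? (int)fault_count : -1;
    sigaction(SIGSEGV, &old_sa, NULL);
    munmap(guarded_page, guarded_len);
    guarded_page = NULL;
    return result;
}
```

Calling guarded_write(16, 0x90) takes exactly one fault and then completes the store on the second attempt, which is the same "fault, fix up, re-execute" shape as the sequence described above.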

In this case, the write is 8 bytes and unaligned, so it gets split into 8 single-byte writes. In stock QEMU, these writes are done in reverse order (see the loop in softmmu_template.h, line 402). In linear order, the third decryption xor from Kernel Patch Protection should hit 4 bytes that are in the current TB and then 4 bytes in the TB after it. Because the byte writes happen in reverse order, and the last 4 bytes of the write do not intersect the current TB, those writes succeed and QEMU's memory is modified. The 4th byte in linear order (the 5th in temporal order) then triggers the current_tb_modified flag and cpu_restore_state, longjmp'ing out. However, cpu_restore_state only goes back to right before that byte is written, so the last 4 bytes (the ones off the current TB) have already been modified. QEMU then invalidates, retranslates, and runs the xor again. This successfully decrypts the 4 bytes inside the current TB, but because the write to the last 4 bytes was not reversed as it should have been, those bytes get xor'd a second time. Effectively, QEMU mistakenly re-encrypts those bytes. Once the code is incorrect, inaccuracies build up until something blue screen-able happens (in this case, an illegal instruction or various kinds of bad accesses).
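The double xor can be reproduced in a toy model. The sketch below is not QEMU code: ToyGuest, store_byte, xor_attempt and wrong_bytes_after_xor are hypothetical names, the plaintext and key bytes are arbitrary, and "returning false" stands in for the longjmp that abandons the instruction. The only assumptions taken from the description above are that the first 4 target bytes lie in the current TB, that a byte store hitting the TB invalidates it without writing, and that the xor is then re-executed from scratch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { STORE_LEN = 8, TB_BYTES = 4 };

typedef struct {
    uint8_t mem[STORE_LEN]; /* the 8 guest bytes the xor targets */
    bool tb_live;           /* translated code still covers mem[0..TB_BYTES-1] */
} ToyGuest;

/* One byte store. Returns false if it hit live translated code: the TB is
 * invalidated, the byte is NOT written, and the instruction must restart. */
static bool store_byte(ToyGuest *g, int i, uint8_t v)
{
    assert(i >= 0 && i < STORE_LEN);
    if (g->tb_live && i < TB_BYTES) {
        g->tb_live = false;
        return false;
    }
    g->mem[i] = v;
    return true;
}

/* One attempt at `xor [mem], key`: load, xor, then 8 single-byte stores. */
static bool xor_attempt(ToyGuest *g, const uint8_t *key, bool reverse)
{
    uint8_t val[STORE_LEN];
    int n;

    for (n = 0; n < STORE_LEN; n++)
        val[n] = g->mem[n] ^ key[n];
    for (n = 0; n < STORE_LEN; n++) {
        int i = reverse ? STORE_LEN - 1 - n : n;
        if (!store_byte(g, i, val[i]))
            return false;   /* earlier byte stores are NOT rolled back */
    }
    return true;
}

/* Decrypts an encrypted buffer with one xor instruction and returns how
 * many bytes end up different from the intended plaintext. */
int wrong_bytes_after_xor(bool reverse)
{
    static const uint8_t plain[STORE_LEN] =
        { 0x48, 0x31, 0x11, 0x48, 0x31, 0x51, 0x08, 0x48 };
    static const uint8_t key[STORE_LEN] =
        { 0xA5, 0x5A, 0xC3, 0x3C, 0x96, 0x69, 0xF0, 0x0F };
    ToyGuest g = { .tb_live = true };
    int i, wrong = 0;

    for (i = 0; i < STORE_LEN; i++)
        g.mem[i] = plain[i] ^ key[i];
    while (!xor_attempt(&g, key, reverse))
        ;                   /* restart the instruction until it completes */
    for (i = 0; i < STORE_LEN; i++)
        wrong += (g.mem[i] != plain[i]);
    return wrong;
}
```

In this model, wrong_bytes_after_xor(true) leaves the last 4 bytes re-encrypted, while wrong_bytes_after_xor(false) decrypts all 8 correctly because the very first byte store triggers the restart before anything has been written.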

I am not sure how to fix this issue. For now, in our tool, PANDA, we have just reversed the order of the loop so that the bytes are written in forward order. This modification enables Windows 7 x64 to run successfully without KVM, which is all we really need for our purposes. But that change will fail in any situation in which the write happens off the front end of the TB and then the self-modifying code loops back to the previous TB.

I looked back in the commit history for this area of the code. It looks like the order of the loop was changed from forwards to backwards back in 2007 by the following two commits:

commit 6c41b2723f5cac6e62e68925e7a73f30b11a7a06
Author: balrog <balrog@c046a42c-6fe2-441c-8c8c-71466251a162>
Date:   Sat Nov 17 12:12:29 2007 +0000
    Don't compare '\0' against pointers.
    Add a note from Fabrice in slow_st template.

    git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3669 c046a42c-6fe2-441c-8c8c-71466251a162

commit 7221fa98d381a19b8809979934554644381fb88c
Author: balrog <balrog@c046a42c-6fe2-441c-8c8c-71466251a162>
Date:   Sat Nov 17 09:53:42 2007 +0000
    Check permissions for the last byte first in unaligned slow_st accesses (patch from TeLeMan).

    git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3665 c046a42c-6fe2-441c-8c8c-71466251a162

The relevant qemu-devel thread is here: https://lists.gnu.org/archive/html/qemu-devel/2007-10/msg00646.html. It looks like the author was trying to fix a page boundary bug where the write was off the front of the write-protected page and would happen twice, just as in this case. Unfortunately, the "fix" just moved the problem to a different case. Fabrice commented on that patch in this thread: https://lists.gnu.org/archive/html/qemu-devel/2007-11/msg00538.html, saying that the reverse-order code would work across forward page boundaries, essentially by chance. Unfortunately, it caused the code to fail on forward TB boundaries.

If it's not too complicated, I'd like to contribute an actual fix back upstream. I don't understand the MMU code completely, so if I've gotten anything wrong, please correct me. As I see it, there are two options, neither of which seems easy under the current control flow:

- Make sure cpu_restore_state goes all the way back to the beginning of the stq, and not just the most recent stb.
- Specifically check to see if an stq intersects the current TB before splitting it into the 8 stb's. 
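As a rough illustration of the second option, the toy model below checks the whole 8-byte store against the live TB before any byte is written. This is only a sketch under stated assumptions, not a patch: RangeGuest, hits_tb, xor_try and corrupted_bytes are hypothetical names, the TB is modelled as a byte range [tb_lo, tb_hi) over the store target, and "returning false" stands in for the longjmp and restart. The first option (rolling state back to the start of the stq) is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { STQ_LEN = 8 };

typedef struct {
    uint8_t mem[STQ_LEN];
    int tb_lo, tb_hi;   /* live translated code covers mem[tb_lo..tb_hi-1] */
    bool tb_live;
} RangeGuest;

static bool hits_tb(const RangeGuest *g, int lo, int hi)
{
    return g->tb_live && lo < g->tb_hi && g->tb_lo < hi;
}

/* One attempt at `xor [mem], key`. With precheck set, the full store range
 * is tested first, so a restart can never follow a partial write. */
static bool xor_try(RangeGuest *g, const uint8_t *key, bool forward,
                    bool precheck)
{
    uint8_t val[STQ_LEN];
    int n;

    if (precheck && hits_tb(g, 0, STQ_LEN)) {
        g->tb_live = false;     /* invalidate, nothing written yet */
        return false;
    }
    for (n = 0; n < STQ_LEN; n++)
        val[n] = g->mem[n] ^ key[n];
    for (n = 0; n < STQ_LEN; n++) {
        int i = forward ? n : STQ_LEN - 1 - n;
        if (hits_tb(g, i, i + 1)) {
            g->tb_live = false; /* per-byte check: may be mid-store */
            return false;
        }
        g->mem[i] = val[i];
    }
    return true;
}

/* Returns how many bytes differ from the intended plaintext afterwards. */
int corrupted_bytes(int tb_lo, int tb_hi, bool forward, bool precheck)
{
    uint8_t plain[STQ_LEN], key[STQ_LEN];
    RangeGuest g = { .tb_lo = tb_lo, .tb_hi = tb_hi, .tb_live = true };
    int i, bad = 0;

    assert(0 <= tb_lo && tb_lo <= tb_hi && tb_hi <= STQ_LEN);
    for (i = 0; i < STQ_LEN; i++) {
        plain[i] = (uint8_t)(0x10 + i);   /* arbitrary plaintext */
        key[i] = (uint8_t)(0xA5 ^ i);     /* arbitrary nonzero key */
        g.mem[i] = plain[i] ^ key[i];
    }
    while (!xor_try(&g, key, forward, precheck))
        ;
    for (i = 0; i < STQ_LEN; i++)
        bad += (g.mem[i] != plain[i]);
    return bad;
}
```

Without the pre-check, the model corrupts 4 bytes for reverse order when the TB covers bytes 0..3 and for forward order when the TB covers bytes 4..7, which matches the observation that changing the loop direction only moves the failure. With the pre-check, both orders and both layouts come out clean.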

There are probably others though. Thoughts? Questions? It would be really awesome to get a real fix for this bug.

P.S. Windows 8 x64 still fails, even after my forward-loop patch. I'm working on debugging that too.
