All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
       [not found] <CAG5rQryFDdrYZKPWYm8k_5EPGOP9RgvUqamSkjWiO3UikieeAw@mail.gmail.com>
@ 2014-08-13 18:36 ` Hulin, Patrick - 0559 - MITLL
  2014-08-14 13:53   ` Hulin, Patrick - 0559 - MITLL
  2014-08-15 20:48   ` Paolo Bonzini
  0 siblings, 2 replies; 14+ messages in thread
From: Hulin, Patrick - 0559 - MITLL @ 2014-08-13 18:36 UTC (permalink / raw)
  To: qemu-devel

Hi QEMU devs,

QEMU 2.10 does not currently run Windows 7 64-bit without KVM. There have been a few threads about this over the past few years (such as https://bugs.launchpad.net/qemu/+bug/921208 and http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg02603.html), but the problem was never resolved. I think I've identified the cause, but I am not sure what the correct way to fix it is. I'm working on PANDA, a set of analysis extensions to QEMU (github.com/moyix/panda) and I'd really like to be able to use our analyses on Windows 7 64-bit.

There are two issues right now. The first is that QEMU is missing a CPUID bit (for debug extensions, CPUID_DE) because the feature isn't implemented in QEMU. This can easily be hacked around by just enabling the bit, but I imagine you all aren't excited about advertising features that don't exist. The second issue is that both the installer and the OS itself fail with blue screens of DRIVER_IRQL_NOT_LESS_OR_EQUAL or KMODE_EXCEPTION_NOT_HANDLED (due to illegal instruction). This is a little trickier.

One of the major differences between Windows 7 x86 and x64 is that the 64-bit version has Microsoft's Kernel Patch Protection, aka PatchGuard. In order to protect itself, PatchGuard lives encrypted in memory and follows a two-stage decryption process. The process begins with a series of xor's which successively decrypt the PatchGuard code. This is self-modifying code (in particular, the first xor overwrites itself and the next instruction).

For the uninitiated, as I understand it, QEMU's self-modifying code support works in the following way. Before executing a translation block, QEMU write-protects (using host MMU features) the _host_ page that contains the section of guest memory on which the guest TB code lives. When self-modifying code attempts to write to that page, it triggers a host segmentation fault. QEMU then catches this segmentation fault using standard POSIX signal infrastructure. Once caught it walks into the software MMU code. If the write intersects the current TB, QEMU splits the TB into two: the single instruction that is being executed and the rest of the block, which is invalidated so it will be retranslated as soon as QEMU tries to run it. QEMU then restores the pre-write CPU state (cpu_restore_state) and longjmp's out (cpu_resume_from_signal). The instruction then executes again, and this time it actually makes the write to QEMU's memory state. QEMU translates the new code, which is now in its own TB, and continues from there.

In this case, the write is 8 bytes and unaligned, so it gets split into 8 single-byte writes. In stock QEMU, these writes are done in reverse order (see the loop in softmmu_template.h, line 402). The third decryption xor from Kernel Patch Protection should hit 4 bytes that are in the current TB and 4 bytes in the TB afterwards in linear order. Since this happens in reverse order, and the last 4 bytes of the write do not intersect the current TB, those writes happen successfully and QEMU's memory is modified. The 4th byte in linear order (the 5th in temporal order) then triggers the current_tb_modified flag and cpu_restore_state, longjmp'ing out. However, cpu_restore_state only goes back to right before that byte is written, so the last 4 bytes—the ones off the current TB—have been modified. QEMU then invalidates, retranslates, and runs the xor again. This successfully decrypts the 4 bytes inside the current TB, but because the write to the last 4 bytes was not reversed as it should have been, those bytes get xor'd a second time. Effectively, QEMU mistakenly re-encrypts those bytes. Once the code is incorrect, inaccuracies build up until something blue screen-able happens (in this case, an illegal instruction or various kinds of bad accesses).

I am not sure how to fix this issue. For now, in our tool, PANDA, we have just reversed the order of the loop. But that change will fail in any situation in which the write happens off the front end of the TB and then the self-modifying code loops back to the previous TB. This modification enables Windows 7 x64 to run successfully without KVM, which is all we really need for our purposes.

I looked back in the commit history for this area of the code. It looks like the order of the loop was changed from forwards to backwards back in 2007 by the following two commits:

commit 6c41b2723f5cac6e62e68925e7a73f30b11a7a06
Author: balrog <balrog@c046a42c-6fe2-441c-8c8c-71466251a162>
Date:   Sat Nov 17 12:12:29 2007 +0000
    Don't compare '\0' against pointers.
    Add a note from Fabrice in slow_st template.
        
    git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3669 c046a42c-6fe2-441c-8c8c-71466251a162
 
commit 7221fa98d381a19b8809979934554644381fb88c
Author: balrog <balrog@c046a42c-6fe2-441c-8c8c-71466251a162>
Date:   Sat Nov 17 09:53:42 2007 +0000
    Check permissions for the last byte first in unaligned slow_st accesses (patch from TeLeMan).
    
    git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3665 c046a42c-6fe2-441c-8c8c-71466251a162

The relevant qemu-devel thread is here: https://lists.gnu.org/archive/html/qemu-devel/2007-10/msg00646.html. It looks like the author was trying to fix a page boundary bug where the write was off the front of the write-protected page and would happen twice, just as in this case. Unfortunately, the "fix" just moved the problem to a different case. Fabrice commented on that patch in this thread: https://lists.gnu.org/archive/html/qemu-devel/2007-11/msg00538.html, saying that the reverse-order code would work across forward page boundaries, essentially by chance. Unfortunately, it caused the code to fail on forward TB boundaries.

If it's not too complicated, I'd like to contribute an actual fix back upstream. I don't understand the MMU code completely, so if I've gotten anything wrong please correct me. As I see it, there are two options, neither of which seem too easy under the current control flow:

- Make sure cpu_restore_state goes all the way back to the beginning of the stq, and not just the most recent stb.
- Specifically check to see if an stq intersects the current TB before splitting it into the 8 stb's. 

There are probably others though. Thoughts? Questions? It would be really awesome to get a real fix for this bug.

P.S. Windows 8 x64 still fails, even after my forward-loop patch. I'm working on debugging that too.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-13 18:36 ` [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM) Hulin, Patrick - 0559 - MITLL
@ 2014-08-14 13:53   ` Hulin, Patrick - 0559 - MITLL
  2014-08-15 20:48   ` Paolo Bonzini
  1 sibling, 0 replies; 14+ messages in thread
From: Hulin, Patrick - 0559 - MITLL @ 2014-08-14 13:53 UTC (permalink / raw)
  To: qemu-devel

I suppose I should probably add a tl;dr.

I have a diagnosis of the reason Windows 7 64-bit won’t run without KVM, as well as a hack to fix it, but I’d like input on a real fix.

Details below.

On Aug 13, 2014, at 2:36 PM, Hulin, Patrick - 0559 - MITLL <patrick.hulin@ll.mit.edu> wrote:

> Hi QEMU devs,
> 
> QEMU 2.10 does not currently run Windows 7 64-bit without KVM. There have been a few threads about this over the past few years (such as https://bugs.launchpad.net/qemu/+bug/921208 and http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg02603.html), but the problem was never resolved. I think I've identified the cause, but I am not sure what the correct way to fix it is. I'm working on PANDA, a set of analysis extensions to QEMU (github.com/moyix/panda) and I'd really like to be able to use our analyses on Windows 7 64-bit.
> 
> There are two issues right now. The first is that QEMU is missing a CPUID bit (for debug extensions, CPUID_DE) because the feature isn't implemented in QEMU. This can easily be hacked around by just enabling the bit, but I imagine you all aren't excited about advertising features that don't exist. The second issue is that both the installer and the OS itself fail with blue screens of DRIVER_IRQL_NOT_LESS_OR_EQUAL or KMODE_EXCEPTION_NOT_HANDLED (due to illegal instruction). This is a little trickier.
> 
> One of the major differences between Windows 7 x86 and x64 is that the 64-bit version has Microsoft's Kernel Patch Protection, aka PatchGuard. In order to protect itself, PatchGuard lives encrypted in memory and follows a two-stage decryption process. The process begins with a series of xor's which successively decrypt the PatchGuard code. This is self-modifying code (in particular, the first xor overwrites itself and the next instruction).
> 
> For the uninitiated, as I understand it, QEMU's self-modifying code support works in the following way. Before executing a translation block, QEMU write-protects (using host MMU features) the _host_ page that contains the section of guest memory on which the guest TB code lives. When self-modifying code attempts to write to that page, it triggers a host segmentation fault. QEMU then catches this segmentation fault using standard POSIX signal infrastructure. Once caught it walks into the software MMU code. If the write intersects the current TB, QEMU splits the TB into two: the single instruction that is being executed and the rest of the block, which is invalidated so it will be retranslated as soon as QEMU tries to run it. QEMU then restores the pre-write CPU state (cpu_restore_state) and longjmp's out (cpu_resume_from_signal). The instruction then executes again, and this time it actually makes the write to QEMU's memory state. QEMU translates the new code, which is now in its own TB, and continues from there.
> 
> In this case, the write is 8 bytes and unaligned, so it gets split into 8 single-byte writes. In stock QEMU, these writes are done in reverse order (see the loop in softmmu_template.h, line 402). The third decryption xor from Kernel Patch Protection should hit 4 bytes that are in the current TB and 4 bytes in the TB afterwards in linear order. Since this happens in reverse order, and the last 4 bytes of the write do not intersect the current TB, those writes happen successfully and QEMU's memory is modified. The 4th byte in linear order (the 5th in temporal order) then triggers the current_tb_modified flag and cpu_restore_state, longjmp'ing out. However, cpu_restore_state only goes back to right before that byte is written, so the last 4 bytes—the ones off the current TB—have been modified. QEMU then invalidates, retranslates, and runs the xor again. This successfully decrypts the 4 bytes inside the current TB, but because the write to the last 4 bytes was not reversed as it should have been, those bytes get xor'd a second time. Effectively, QEMU mistakenly re-encrypts those bytes. Once the code is incorrect, inaccuracies build up until something blue screen-able happens (in this case, an illegal instruction or various kinds of bad accesses).
> 
> I am not sure how to fix this issue. For now, in our tool, PANDA, we have just reversed the order of the loop. But that change will fail in any situation in which the write happens off the front end of the TB and then the self-modifying code loops back to the previous TB. This modification enables Windows 7 x64 to run successfully without KVM, which is all we really need for our purposes.
> 
> I looked back in the commit history for this area of the code. It looks like the order of the loop was changed from forwards to backwards back in 2007 by the following two commits:
> 
> commit 6c41b2723f5cac6e62e68925e7a73f30b11a7a06
> Author: balrog <balrog@c046a42c-6fe2-441c-8c8c-71466251a162>
> Date:   Sat Nov 17 12:12:29 2007 +0000
>     Don't compare '\0' against pointers.
>     Add a note from Fabrice in slow_st template.
>         
>     git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3669 c046a42c-6fe2-441c-8c8c-71466251a162
>  
> commit 7221fa98d381a19b8809979934554644381fb88c
> Author: balrog <balrog@c046a42c-6fe2-441c-8c8c-71466251a162>
> Date:   Sat Nov 17 09:53:42 2007 +0000
>     Check permissions for the last byte first in unaligned slow_st accesses (patch from TeLeMan).
>     
>     git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@3665 c046a42c-6fe2-441c-8c8c-71466251a162
> 
> The relevant qemu-devel thread is here: https://lists.gnu.org/archive/html/qemu-devel/2007-10/msg00646.html. It looks like the author was trying to fix a page boundary bug where the write was off the front of the write-protected page and would happen twice, just as in this case. Unfortunately, the "fix" just moved the problem to a different case. Fabrice commented on that patch in this thread: https://lists.gnu.org/archive/html/qemu-devel/2007-11/msg00538.html, saying that the reverse-order code would work across forward page boundaries, essentially by chance. Unfortunately, it caused the code to fail on forward TB boundaries.
> 
> If it's not too complicated, I'd like to contribute an actual fix back upstream. I don't understand the MMU code completely, so if I've gotten anything wrong please correct me. As I see it, there are two options, neither of which seem too easy under the current control flow:
> 
> - Make sure cpu_restore_state goes all the way back to the beginning of the stq, and not just the most recent stb.
> - Specifically check to see if an stq intersects the current TB before splitting it into the 8 stb's. 
> 
> There are probably others though. Thoughts? Questions? It would be really awesome to get a real fix for this bug.
> 
> P.S. Windows 8 x64 still fails, even after my forward-loop patch. I'm working on debugging that too.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-13 18:36 ` [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM) Hulin, Patrick - 0559 - MITLL
  2014-08-14 13:53   ` Hulin, Patrick - 0559 - MITLL
@ 2014-08-15 20:48   ` Paolo Bonzini
  2014-08-15 21:49     ` Hulin, Patrick - 0559 - MITLL
  1 sibling, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2014-08-15 20:48 UTC (permalink / raw)
  To: Hulin, Patrick - 0559 - MITLL, qemu-devel

Il 13/08/2014 20:36, Hulin, Patrick - 0559 - MITLL ha scritto:
> Hi QEMU devs,
> 
> QEMU 2.10 does not currently run Windows 7 64-bit without KVM. There
> have been a few threads about this over the past few years (such as
> https://bugs.launchpad.net/qemu/+bug/921208 and
> http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg02603.html),
> but the problem was never resolved. I think I've identified the
> cause, but I am not sure what the correct way to fix it is. I'm
> working on PANDA, a set of analysis extensions to QEMU
> (github.com/moyix/panda) and I'd really like to be able to use our
> analyses on Windows 7 64-bit.
> 
> There are two issues right now. The first is that QEMU is missing a
> CPUID bit (for debug extensions, CPUID_DE) because the feature isn't
> implemented in QEMU. This can easily be hacked around by just
> enabling the bit, but I imagine you all aren't excited about
> advertising features that don't exist.

Not too worried about it either.

The two aspects of CPUID.DE are: 1) support for CR4.DE and raising an
exception for DR4 and DR5 access; 2) I/O breakpoints.  Now, QEMU always
raises the exception even if CR4.DE=0, and doesn't complain if you set
bits of CR4 that ought to be reserved according to CPUID.  And I/O
breakpoints aren't that hard to implement.  Having a full implementation
of CPUID.DE wouldn't be hard, but I wouldn't have much problems with
just setting the bit in TCG_FEATURES and noting that it is only
partially implemented.

> The second issue is that both
> the installer and the OS itself fail with blue screens of
> DRIVER_IRQL_NOT_LESS_OR_EQUAL or KMODE_EXCEPTION_NOT_HANDLED (due to
> illegal instruction). This is a little trickier.

Indeed.  Thanks for debugging it.

> Before executing a translation block, QEMU write-protects (using
> host MMU features) the _host_ page that contains the section of guest 
> memory on which the guest TB code lives.

Are you sure?  My knowledge of this code is only cursory, but I think
that only applies to user-mode emulation.  System emulation uses softmmu
exclusively to trap such writes (TLB_NOTDIRTY or something like that).

> In this case, the write is 8 bytes and unaligned, so it gets split
> into 8 single-byte writes. In stock QEMU, these writes are done in
> reverse order (see the loop in softmmu_template.h, line 402). The
> third decryption xor from Kernel Patch Protection should hit 4 bytes
> that are in the current TB and 4 bytes in the TB afterwards in linear
> order. Since this happens in reverse order, and the last 4 bytes of
> the write do not intersect the current TB, those writes happen
> successfully and QEMU's memory is modified. The 4th byte in linear
> order (the 5th in temporal order) then triggers the
> current_tb_modified flag and cpu_restore_state, longjmp'ing out.
>
> I am not sure how to fix this issue. For now, in our tool, PANDA, we
> have just reversed the order of the loop. But that change will fail
> in any situation in which the write happens off the front end of the
> TB and then the self-modifying code loops back to the previous TB.
> This modification enables Windows 7 x64 to run successfully without
> KVM, which is all we really need for our purposes.

Would it work to just call tb_invalidate_phys_page_range before the
helper_ret_stb loop?

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-15 20:48   ` Paolo Bonzini
@ 2014-08-15 21:49     ` Hulin, Patrick - 0559 - MITLL
  2014-08-17  5:21       ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Hulin, Patrick - 0559 - MITLL @ 2014-08-15 21:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Aug 15, 2014, at 4:48 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 13/08/2014 20:36, Hulin, Patrick - 0559 - MITLL ha scritto:
>> Hi QEMU devs,
>> 
>> QEMU 2.10 does not currently run Windows 7 64-bit without KVM. There
>> have been a few threads about this over the past few years (such as
>> https://bugs.launchpad.net/qemu/+bug/921208 and
>> http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg02603.html),
>> but the problem was never resolved. I think I've identified the
>> cause, but I am not sure what the correct way to fix it is. I'm
>> working on PANDA, a set of analysis extensions to QEMU
>> (github.com/moyix/panda) and I'd really like to be able to use our
>> analyses on Windows 7 64-bit.
>> 
>> There are two issues right now. The first is that QEMU is missing a
>> CPUID bit (for debug extensions, CPUID_DE) because the feature isn't
>> implemented in QEMU. This can easily be hacked around by just
>> enabling the bit, but I imagine you all aren't excited about
>> advertising features that don't exist.
> 
> Not too worried about it either.
> 
> The two aspects of CPUID.DE are: 1) support for CR4.DE and raising an
> exception for DR4 and DR5 access; 2) I/O breakpoints.  Now, QEMU always
> raises the exception even if CR4.DE=0, and doesn't complain if you set
> bits of CR4 that ought to be reserved according to CPUID.  And I/O
> breakpoints aren't that hard to implement.  Having a full implementation
> of CPUID.DE wouldn't be hard, but I wouldn't have much problems with
> just setting the bit in TCG_FEATURES and noting that it is only
> partially implemented.
> 
>> The second issue is that both
>> the installer and the OS itself fail with blue screens of
>> DRIVER_IRQL_NOT_LESS_OR_EQUAL or KMODE_EXCEPTION_NOT_HANDLED (due to
>> illegal instruction). This is a little trickier.
> 
> Indeed.  Thanks for debugging it.
> 
>> Before executing a translation block, QEMU write-protects (using
>> host MMU features) the _host_ page that contains the section of guest 
>> memory on which the guest TB code lives.
> 
> Are you sure?  My knowledge of this code is only cursory, but I think
> that only applies to user-mode emulation.  System emulation uses softmmu
> exclusively to trap such writes (TLB_NOTDIRTY or something like that).

Certainly possible. I don’t really understand this code either.

> 
>> In this case, the write is 8 bytes and unaligned, so it gets split
>> into 8 single-byte writes. In stock QEMU, these writes are done in
>> reverse order (see the loop in softmmu_template.h, line 402). The
>> third decryption xor from Kernel Patch Protection should hit 4 bytes
>> that are in the current TB and 4 bytes in the TB afterwards in linear
>> order. Since this happens in reverse order, and the last 4 bytes of
>> the write do not intersect the current TB, those writes happen
>> successfully and QEMU's memory is modified. The 4th byte in linear
>> order (the 5th in temporal order) then triggers the
>> current_tb_modified flag and cpu_restore_state, longjmp'ing out.
>> 
>> I am not sure how to fix this issue. For now, in our tool, PANDA, we
>> have just reversed the order of the loop. But that change will fail
>> in any situation in which the write happens off the front end of the
>> TB and then the self-modifying code loops back to the previous TB.
>> This modification enables Windows 7 x64 to run successfully without
>> KVM, which is all we really need for our purposes.
> 
> Would it work to just call tb_invalidate_phys_page_range before the
> helper_ret_stb loop?
> 
> Paolo

Maybe. I think there’s another issue, which is that QEMU’s ending up in the I/O read/write code instead of the normal memory RW. This could be QEMU messing up, it could be PatchGuard doing something weird, or it could be me misunderstanding what’s going on. I never really figured out how the control flow works here.

I’m going to look at this more next week—I’ll see if I have more info then.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-15 21:49     ` Hulin, Patrick - 0559 - MITLL
@ 2014-08-17  5:21       ` Paolo Bonzini
  2014-08-18 17:37         ` Richard Henderson
  2014-08-18 17:47         ` Hulin, Patrick - 0559 - MITLL
  0 siblings, 2 replies; 14+ messages in thread
From: Paolo Bonzini @ 2014-08-17  5:21 UTC (permalink / raw)
  To: Hulin, Patrick - 0559 - MITLL; +Cc: qemu-devel

Il 15/08/2014 23:49, Hulin, Patrick - 0559 - MITLL ha scritto:
>>> In this case, the write is 8 bytes and unaligned, so it gets split
>>> into 8 single-byte writes. In stock QEMU, these writes are done in
>>> reverse order (see the loop in softmmu_template.h, line 402). The
>>> third decryption xor from Kernel Patch Protection should hit 4 bytes
>>> that are in the current TB and 4 bytes in the TB afterwards in linear
>>> order. Since this happens in reverse order, and the last 4 bytes of
>>> the write do not intersect the current TB, those writes happen
>>> successfully and QEMU's memory is modified. The 4th byte in linear
>>> order (the 5th in temporal order) then triggers the
>>> current_tb_modified flag and cpu_restore_state, longjmp'ing out.
>>>
>> Would it work to just call tb_invalidate_phys_page_range before the
>> helper_ret_stb loop?
> 
> Maybe. I think there’s another issue, which is that QEMU’s ending up
> in the I/O read/write code instead of the normal memory RW. This could
> be QEMU messing up, it could be PatchGuard doing something weird, or it
> could be me misunderstanding what’s going on. I never really figured out
> how the control flow works here.

That's okay.  Everything that's in the slow path goes down
io_mem_read/write (in this case TLB_NOTDIRTY is set for dirty-page
tracking and causes QEMU to choose that path).

Try making a self-contained test case using the kvm-unit-tests harness
(git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git).

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-17  5:21       ` Paolo Bonzini
@ 2014-08-18 17:37         ` Richard Henderson
  2014-08-18 17:47           ` Hulin, Patrick - 0559 - MITLL
  2014-08-18 17:47         ` Hulin, Patrick - 0559 - MITLL
  1 sibling, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2014-08-18 17:37 UTC (permalink / raw)
  To: Paolo Bonzini, Hulin, Patrick - 0559 - MITLL; +Cc: qemu-devel

On 08/16/2014 10:21 PM, Paolo Bonzini wrote:
>>> Would it work to just call tb_invalidate_phys_page_range before the
>>> helper_ret_stb loop?

I doubt it.

>> Maybe. I think there’s another issue, which is that QEMU’s ending up
>> in the I/O read/write code instead of the normal memory RW. This could
>> be QEMU messing up, it could be PatchGuard doing something weird, or it
>> could be me misunderstanding what’s going on. I never really figured out
>> how the control flow works here.
> 
> That's okay.  Everything that's in the slow path goes down
> io_mem_read/write (in this case TLB_NOTDIRTY is set for dirty-page
> tracking and causes QEMU to choose that path).
> 
> Try making a self-contained test case using the kvm-unit-tests harness
> (git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git).

I believe that the proper solution is to force *both* page table entries into
the TLB before performing any actual memory operations.

We'll have done the page for the first byte at the top of
helper_{le,be}_{ld,st}_name.  When we discover it's an unaligned access, we
should load and check the pte for the second page.  We might have to shuffle
those two tests around, since in theory the second page could be I/O mapped and
we'd want to pass off the whole access to io_mem_*.

Since two adjacent pages can't conflict in our direct-mapped TLB, we can then
safely pass off the work to secondary helpers without fear the first TLB entry
will be flushed.


r~

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-18 17:37         ` Richard Henderson
@ 2014-08-18 17:47           ` Hulin, Patrick - 0559 - MITLL
  2014-08-18 20:50             ` Hulin, Patrick - 0559 - MITLL
  2014-08-18 21:12             ` Paolo Bonzini
  0 siblings, 2 replies; 14+ messages in thread
From: Hulin, Patrick - 0559 - MITLL @ 2014-08-18 17:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Paolo Bonzini, qemu-devel


On Aug 18, 2014, at 1:37 PM, Richard Henderson <rth@twiddle.net> wrote:

> On 08/16/2014 10:21 PM, Paolo Bonzini wrote:
>>>> Would it work to just call tb_invalidate_phys_page_range before the
>>>> helper_ret_stb loop?
> 
> I doubt it.

Correct. Doesn’t work. Haven’t fully diagnosed why, but it doesn’t seem to ever hit the current_tb_modified passage if you invalidate beforehand.

> I believe that the proper solution is to force *both* page table entries into
> the TLB before performing any actual memory operations.
> 
> We'll have done the page for the first byte at the top of
> helper_{le,be}_{ld,st}_name.  When we discover it's an unaligned access, we
> should load and check the pte for the second page.  We might have to shuffle
> those two tests around, since in theory the second page could be I/O mapped and
> we'd want to pass off the whole access to io_mem_*.
> 
> Since two adjacent pages can't conflict in our direct-mapped TLB, we can then
> safely pass off the work to secondary helpers without fear the first TLB entry
> will be flushed.

This isn’t about cross-page writes, although that might make fixing the problem a little tricky. The issue occurs with two adjacent TBs on the same page: because the individual writes are split up and done in reverse order, writes (and reads) off the back of the current TB happen twice. In the case of an xor this means the original xor gets undone, which is what breaks in Windows 7.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-17  5:21       ` Paolo Bonzini
  2014-08-18 17:37         ` Richard Henderson
@ 2014-08-18 17:47         ` Hulin, Patrick - 0559 - MITLL
  2014-08-18 18:08           ` Hulin, Patrick - 0559 - MITLL
  1 sibling, 1 reply; 14+ messages in thread
From: Hulin, Patrick - 0559 - MITLL @ 2014-08-18 17:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1764 bytes --]

On Aug 17, 2014, at 1:21 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 15/08/2014 23:49, Hulin, Patrick - 0559 - MITLL ha scritto:
>>>> In this case, the write is 8 bytes and unaligned, so it gets split
>>>> into 8 single-byte writes. In stock QEMU, these writes are done in
>>>> reverse order (see the loop in softmmu_template.h, line 402). The
>>>> third decryption xor from Kernel Patch Protection should hit 4 bytes
>>>> that are in the current TB and 4 bytes in the TB afterwards in linear
>>>> order. Since this happens in reverse order, and the last 4 bytes of
>>>> the write do not intersect the current TB, those writes happen
>>>> successfully and QEMU's memory is modified. The 4th byte in linear
>>>> order (the 5th in temporal order) then triggers the
>>>> current_tb_modified flag and cpu_restore_state, longjmp'ing out.
>>>> 
>>> Would it work to just call tb_invalidate_phys_page_range before the
>>> helper_ret_stb loop?
>> 
>> Maybe. I think there’s another issue, which is that QEMU’s ending up
>> in the I/O read/write code instead of the normal memory RW. This could
>> be QEMU messing up, it could be PatchGuard doing something weird, or it
>> could be me misunderstanding what’s going on. I never really figured out
>> how the control flow works here.
> 
> That's okay.  Everything that's in the slow path goes down
> io_mem_read/write (in this case TLB_NOTDIRTY is set for dirty-page
> tracking and causes QEMU to choose that path).
> 
> Try making a self-contained test case using the kvm-unit-tests harness
> (git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git).
> 
> Paolo

Okay. I’ve attached a test case. Note that since this has to run without KVM it messes around with the run script.


[-- Attachment #2: selfmodify.patch --]
[-- Type: application/octet-stream, Size: 1784 bytes --]

diff --git a/config/config-x86-common.mak b/config/config-x86-common.mak
index 0b0da85..0ddf5ad 100644
--- a/config/config-x86-common.mak
+++ b/config/config-x86-common.mak
@@ -105,6 +105,8 @@ $(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o $(TEST_DIR)/vmx_tests.o
 
 $(TEST_DIR)/debug.elf: $(cstart.o) $(TEST_DIR)/debug.o
 
+$(TEST_DIR)/selfmodify.elf: $(cstart.o) $(TEST_DIR)/selfmodify.o
+
 arch_clean:
 	$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
 	$(TEST_DIR)/.*.d lib/x86/.*.d
diff --git a/config/config-x86_64.mak b/config/config-x86_64.mak
index 06b2581..edb22ca 100644
--- a/config/config-x86_64.mak
+++ b/config/config-x86_64.mak
@@ -9,5 +9,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
 	  $(TEST_DIR)/pcid.flat $(TEST_DIR)/debug.flat
 tests += $(TEST_DIR)/svm.flat
 tests += $(TEST_DIR)/vmx.flat
+tests += $(TEST_DIR)/selfmodify.flat
 
 include config/config-x86-common.mak
diff --git a/x86/run b/x86/run
index 646c577..a35678f 100755
--- a/x86/run
+++ b/x86/run
@@ -33,7 +33,7 @@ else
 	pc_testdev="-device testdev,chardev=testlog -chardev file,id=testlog,path=msr.out"
 fi
 
-command="${qemu} -enable-kvm $pc_testdev -display none -serial stdio $pci_testdev -kernel"
+command="${qemu} $pc_testdev -display none -serial stdio $pci_testdev -kernel"
 echo ${command} "$@"
 ${command} "$@"
 ret=$?
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 6d3e23a..646130b 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -6,6 +6,10 @@
 # arch = i386/x86_64 # Only if the test case works only on one of them
 # groups = group1 group2 # Used to identify test cases with run_tests -g ...
 
+[selfmodify]
+file = selfmodify.flat
+arch = x86_64
+
 [apic]
 file = apic.flat
 smp = 2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-18 17:47         ` Hulin, Patrick - 0559 - MITLL
@ 2014-08-18 18:08           ` Hulin, Patrick - 0559 - MITLL
  0 siblings, 0 replies; 14+ messages in thread
From: Hulin, Patrick - 0559 - MITLL @ 2014-08-18 18:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1984 bytes --]

On 8/18/14, 1:47 PM, "Hulin, Patrick - 0559 - MITLL"
<Patrick.Hulin@ll.mit.edu> wrote:

>On Aug 17, 2014, at 1:21 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>> Il 15/08/2014 23:49, Hulin, Patrick - 0559 - MITLL ha scritto:
>>>>> In this case, the write is 8 bytes and unaligned, so it gets split
>>>>> into 8 single-byte writes. In stock QEMU, these writes are done in
>>>>> reverse order (see the loop in softmmu_template.h, line 402). The
>>>>> third decryption xor from Kernel Patch Protection should hit 4 bytes
>>>>> that are in the current TB and 4 bytes in the TB afterwards in linear
>>>>> order. Since this happens in reverse order, and the last 4 bytes of
>>>>> the write do not intersect the current TB, those writes happen
>>>>> successfully and QEMU's memory is modified. The 4th byte in linear
>>>>> order (the 5th in temporal order) then triggers the
>>>>> current_tb_modified flag and cpu_restore_state, longjmp'ing out.
>>>>> 
>>>> Would it work to just call tb_invalidate_phys_page_range before the
>>>> helper_ret_stb loop?
>>> 
>>> Maybe. I think there¹s another issue, which is that QEMU¹s ending up
>>> in the I/O read/write code instead of the normal memory RW. This could
>>> be QEMU messing up, it could be PatchGuard doing something weird, or it
>>> could be me misunderstanding what¹s going on. I never really figured
>>>out
>>> how the control flow works here.
>> 
>> That's okay.  Everything that's in the slow path goes down
>> io_mem_read/write (in this case TLB_NOTDIRTY is set for dirty-page
>> tracking and causes QEMU to choose that path).
>> 
>> Try making a self-contained test case using the kvm-unit-tests harness
>> (git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git).
>> 
>> Paolo
>
>Okay. I¹ve attached a test case. Note that since this has to run without
>KVM it messes around with the run script.

The friendly folks on IRC point out that I should probably include the
actual binary.


[-- Attachment #2: selfmodify.flat --]
[-- Type: application/octet-stream, Size: 85963 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-18 17:47           ` Hulin, Patrick - 0559 - MITLL
@ 2014-08-18 20:50             ` Hulin, Patrick - 0559 - MITLL
  2014-08-19  6:16               ` Paolo Bonzini
  2014-08-18 21:12             ` Paolo Bonzini
  1 sibling, 1 reply; 14+ messages in thread
From: Hulin, Patrick - 0559 - MITLL @ 2014-08-18 20:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Paolo Bonzini, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 765 bytes --]


On 8/18/14, 1:47 PM, "Hulin, Patrick - 0559 - MITLL"
<Patrick.Hulin@ll.mit.edu> wrote:

>On Aug 18, 2014, at 1:37 PM, Richard Henderson <rth@twiddle.net> wrote:
>
>>On 08/16/2014 10:21 PM, Paolo Bonzini wrote:
>>>>>Would it work to just call tb_invalidate_phys_page_range before the
>>>>>helper_ret_stb loop?
>>I doubt it.
>
>Correct. Doesn¹t work. Haven¹t fully diagnosed why, but it doesn¹t seem
>to ever hit the current_tb_modified passage if you invalidate beforehand.

Yeah - mem_io_pc doesn¹t get updated until we¹re inside io_write, so
tb_invalidate_phys_page_range thinks we¹re inside a different TB. As a
result, it¹s ³is this TB modified² check still returns false.

I¹ve attached the correct source patch for the test case as well.


[-- Attachment #2: selfmodify.patch --]
[-- Type: application/octet-stream, Size: 2113 bytes --]

diff --git a/config/config-x86-common.mak b/config/config-x86-common.mak
index 0b0da85..0ddf5ad 100644
--- a/config/config-x86-common.mak
+++ b/config/config-x86-common.mak
@@ -105,6 +105,8 @@ $(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o $(TEST_DIR)/vmx_tests.o
 
 $(TEST_DIR)/debug.elf: $(cstart.o) $(TEST_DIR)/debug.o
 
+$(TEST_DIR)/selfmodify.elf: $(cstart.o) $(TEST_DIR)/selfmodify.o
+
 arch_clean:
 	$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
 	$(TEST_DIR)/.*.d lib/x86/.*.d
diff --git a/config/config-x86_64.mak b/config/config-x86_64.mak
index 06b2581..edb22ca 100644
--- a/config/config-x86_64.mak
+++ b/config/config-x86_64.mak
@@ -9,5 +9,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
 	  $(TEST_DIR)/pcid.flat $(TEST_DIR)/debug.flat
 tests += $(TEST_DIR)/svm.flat
 tests += $(TEST_DIR)/vmx.flat
+tests += $(TEST_DIR)/selfmodify.flat
 
 include config/config-x86-common.mak
diff --git a/x86/run b/x86/run
index 646c577..a35678f 100755
--- a/x86/run
+++ b/x86/run
@@ -33,7 +33,7 @@ else
 	pc_testdev="-device testdev,chardev=testlog -chardev file,id=testlog,path=msr.out"
 fi
 
-command="${qemu} -enable-kvm $pc_testdev -display none -serial stdio $pci_testdev -kernel"
+command="${qemu} $pc_testdev -display none -serial stdio $pci_testdev -kernel"
 echo ${command} "$@"
 ${command} "$@"
 ret=$?
diff --git a/x86/selfmodify.S b/x86/selfmodify.S
new file mode 100644
index 0000000..8d985cc
--- /dev/null
+++ b/x86/selfmodify.S
@@ -0,0 +1,20 @@
+geteip:
+	mov (%rsp), %rax
+	ret
+
+.global main
+.type main, @function
+
+main:
+	movq $0x947B967B00000000, %rbx
+	call geteip
+	xorq %rbx, (%rax)
+	nop
+	jmp fail
+	jmp fail
+	xor %rax, %rax
+	retq
+
+fail:
+	movq $2, %rax
+	retq 
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 6d3e23a..646130b 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -6,6 +6,10 @@
 # arch = i386/x86_64 # Only if the test case works only on one of them
 # groups = group1 group2 # Used to identify test cases with run_tests -g ...
 
+[selfmodify]
+file = selfmodify.flat
+arch = x86_64
+
 [apic]
 file = apic.flat
 smp = 2

[-- Attachment #3: selfmodify.flat --]
[-- Type: application/octet-stream, Size: 85963 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-18 17:47           ` Hulin, Patrick - 0559 - MITLL
  2014-08-18 20:50             ` Hulin, Patrick - 0559 - MITLL
@ 2014-08-18 21:12             ` Paolo Bonzini
  1 sibling, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2014-08-18 21:12 UTC (permalink / raw)
  To: Hulin, Patrick - 0559 - MITLL, Richard Henderson; +Cc: qemu-devel

Il 18/08/2014 19:47, Hulin, Patrick - 0559 - MITLL ha scritto:
>> We'll have done the page for the first byte at the top of 
>> helper_{le,be}_{ld,st}_name.  When we discover it's an unaligned
>> access, we should load and check the pte for the second page.  We
>> might have to shuffle those two tests around, since in theory the
>> second page could be I/O mapped and we'd want to pass off the
>> whole access to io_mem_*.
>> 
>> Since two adjacent pages can't conflict in our direct-mapped TLB,
>> we can then safely pass off the work to secondary helpers without
>> fear the first TLB entry will be flushed.
> 
> This isn’t about cross-page writes, although that might make fixing
> the problem a little tricky. The issue occurs with two adjacent TBs
> on the same page: because the individual writes are split up and done
> in reverse order, writes (and reads) off the back of the current TB
> happen twice. In the case of an xor this means the original xor gets
> undone, which is what breaks in Windows 7.

If you fill the TLB beforehand as suggested by Richard, you can reverse
again the direction of the "for" loop.  This should hopefully fix your bug.

Of course if the write is not cross-page, there's no need to do the TLB
pre-fill.

Thanks for the test case!

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-18 20:50             ` Hulin, Patrick - 0559 - MITLL
@ 2014-08-19  6:16               ` Paolo Bonzini
  2014-08-20 14:03                 ` Hulin, Patrick - 0559 - MITLL
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2014-08-19  6:16 UTC (permalink / raw)
  To: Hulin, Patrick - 0559 - MITLL, Richard Henderson; +Cc: qemu-devel

Il 18/08/2014 22:50, Hulin, Patrick - 0559 - MITLL ha scritto:
>> >Correct. Doesn¹t work. Haven¹t fully diagnosed why, but it doesn¹t seem
>> >to ever hit the current_tb_modified passage if you invalidate beforehand.
> Yeah - mem_io_pc doesn¹t get updated until we¹re inside io_write, so
> tb_invalidate_phys_page_range thinks we¹re inside a different TB. As a
> result, it¹s ³is this TB modified² check still returns false.

We can set that (and probably mem_io_vaddr) before the for loop, too.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-19  6:16               ` Paolo Bonzini
@ 2014-08-20 14:03                 ` Hulin, Patrick - 0559 - MITLL
  2014-08-20 15:12                   ` Richard Henderson
  0 siblings, 1 reply; 14+ messages in thread
From: Hulin, Patrick - 0559 - MITLL @ 2014-08-20 14:03 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Richard Henderson

On Aug 19, 2014, at 2:16 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 18/08/2014 22:50, Hulin, Patrick - 0559 - MITLL ha scritto:
>>>> Correct. Doesn¹t work. Haven¹t fully diagnosed why, but it doesn¹t seem
>>>> to ever hit the current_tb_modified passage if you invalidate beforehand.
>> Yeah - mem_io_pc doesn¹t get updated until we¹re inside io_write, so
>> tb_invalidate_phys_page_range thinks we¹re inside a different TB. As a
>> result, it¹s ³is this TB modified² check still returns false.
> 
> We can set that (and probably mem_io_vaddr) before the for loop, too.
> 
> Paolo

Strange - keeping the loop reversed and calling tb_invalidate_phys_range, with mem_io_pc set, works on the test case but fails on real Windows 7. It must be breaking something else.

I think making the loop forwards again makes the most sense. I’m not sure if the bug from 2007 will even still occur. That was caused by QEMU’s longjmp’ing only after the page boundary, similar to how it now only longjmp’s after the TB boundary. QEMU currently decides whether to longjmp by checking whether any of the TBs that intersect the page (i.e. p->first_tb etc.) contain the write address, and every code page should be TLB-protected. So every unaligned (and therefore out-of-line) write that touches already-translated code should get caught. As far as I can tell, the only problem is when a write splits into partially touching translated and untranslated code.

Can you guys explain why calling tlb_fill is necessary on the page boundary case?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)
  2014-08-20 14:03                 ` Hulin, Patrick - 0559 - MITLL
@ 2014-08-20 15:12                   ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2014-08-20 15:12 UTC (permalink / raw)
  To: Hulin, Patrick - 0559 - MITLL, Paolo Bonzini; +Cc: qemu-devel

On 08/20/2014 07:03 AM, Hulin, Patrick - 0559 - MITLL wrote:
> Can you guys explain why calling tlb_fill is necessary on the page boundary case?

So that you trap on an unaligned store where the second page is unmapped.

That's what we get "for free" with running to store loop backward.


r~

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-08-20 15:12 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAG5rQryFDdrYZKPWYm8k_5EPGOP9RgvUqamSkjWiO3UikieeAw@mail.gmail.com>
2014-08-13 18:36 ` [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM) Hulin, Patrick - 0559 - MITLL
2014-08-14 13:53   ` Hulin, Patrick - 0559 - MITLL
2014-08-15 20:48   ` Paolo Bonzini
2014-08-15 21:49     ` Hulin, Patrick - 0559 - MITLL
2014-08-17  5:21       ` Paolo Bonzini
2014-08-18 17:37         ` Richard Henderson
2014-08-18 17:47           ` Hulin, Patrick - 0559 - MITLL
2014-08-18 20:50             ` Hulin, Patrick - 0559 - MITLL
2014-08-19  6:16               ` Paolo Bonzini
2014-08-20 14:03                 ` Hulin, Patrick - 0559 - MITLL
2014-08-20 15:12                   ` Richard Henderson
2014-08-18 21:12             ` Paolo Bonzini
2014-08-18 17:47         ` Hulin, Patrick - 0559 - MITLL
2014-08-18 18:08           ` Hulin, Patrick - 0559 - MITLL

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.