All of lore.kernel.org
 help / color / mirror / Atom feed
* Fwd: [RFC] Keeping host value of CR4.MCE (and other bits) live during guest execution
       [not found] <CAN+hb0VA6Xrff+YOpKSSqjkdLeZO8S3EGdt6nCLnBaU9Ag-E0w@mail.gmail.com>
@ 2015-04-15 23:52 ` Benjamin Serebrin
  2015-04-16  0:05   ` Andy Lutomirski
  2015-04-16  9:23   ` Fwd: " Paolo Bonzini
  0 siblings, 2 replies; 4+ messages in thread
From: Benjamin Serebrin @ 2015-04-15 23:52 UTC (permalink / raw)
  To: kvm, pbonzini, luto

There's a bug in kvm/vmx.c: if the host enabled machine check (CR4.MCE==1),
the value gets zeroed while the CPU is running in guest context.
If a machine check event arrives while the CPU is in guest context and
effective CR4.MCE is zero, the machine raises CATERR and crashes.

We should make sure the host value of CR4.MCE is always active.  There
are read and write shadows for the guest to think it wrote its own value.

For discussion: there's new complexity with CR4 shadowing
(1e02ce4cccdcb9688386e5b8d2c9fa4660b45389).  I measure CR4 reads at 24
cycles on haswell and 36 on sandybridge, which compares well with
L2 miss costs.  Is the shadowing worth the complexity?  CR4 is also
cached (with no real consistency mechanism) in the VMCS at the time
of guest VCPU creation.  If there is ever a change in CR4 value
over time, or if CR4 is different on different CPUs in the system, all this
logic gets broken.

Thanks,
Ben


---

The host's decision to enable machine check exceptions should remain
in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
and passed a slightly-modified 0 to the vmcs.guest_cr4 value.

Tested: Inject machine check while a guest is spinning.
Before the change, if guest CR4.MCE==0, then the machine check is
escalated to Catastrophic Error (CATERR) and the machine dies.
If guest CR4.MCE==1, then the machine check causes VMEXIT and is
handled normally by host Linux. After the change, injecting a machine
check causes normal Linux machine check handling.


---
 arch/x86/kvm/vmx.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a214104..44c8d24 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3456,8 +3456,16 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu,
unsigned long cr3)

 static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
-       unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ?
-                   KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
+       /*
+        * Pass through host's Machine Check Enable value to hw_cr4, which
+        * is in force while we are in guest mode.  Do not let guests control
+        * this bit, even if host CR4.MCE == 0.
+        */
+       unsigned long hw_cr4 =
+               (read_cr4() & X86_CR4_MCE) |
+               (cr4 & ~X86_CR4_MCE) |
+               (to_vmx(vcpu)->rmode.vm86_active ?
+                KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);

        if (cr4 & X86_CR4_VMXE) {
                /*
--

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC] Keeping host value of CR4.MCE (and other bits) live during guest execution
  2015-04-15 23:52 ` Fwd: [RFC] Keeping host value of CR4.MCE (and other bits) live during guest execution Benjamin Serebrin
@ 2015-04-16  0:05   ` Andy Lutomirski
  2015-04-16  9:23   ` Fwd: " Paolo Bonzini
  1 sibling, 0 replies; 4+ messages in thread
From: Andy Lutomirski @ 2015-04-16  0:05 UTC (permalink / raw)
  To: Benjamin Serebrin; +Cc: kvm list, Paolo Bonzini

On Wed, Apr 15, 2015 at 4:52 PM, Benjamin Serebrin <serebrin@google.com> wrote:
> There's a bug in kvm/vmx.c: if the host enabled machine check (CR4.MCE==1),
> the value gets zeroed while the CPU is running in guest context.
> If a machine check event arrives while the CPU is in guest context and
> effective CR4.MCE is zero, the machine raises CATERR and crashes.
>
> We should make sure the host value of CR4.MCE is always active.  There
> are read and write shadows for the guest to think it wrote its own value.
>
> For discussion: there's new complexity with CR4 shadowing
> (1e02ce4cccdcb9688386e5b8d2c9fa4660b45389).  I measure CR4 reads at 24
> cycles on haswell and 36 on sandybridge, which compares well with
> L2 miss costs.  Is the shadowing worth the complexity?  CR4 is also
> cached (with no real consistency mechanism) in the VMCS at the time
> of guest VCPU creation.  If there is ever a change in CR4 value
> over time, or if CR4 is different on different CPUs in the system, all this
> logic gets broken.

The cache line I chose for the shadow is hot during context switches,
so we shouldn't ever have a meaningful cache miss there (except maybe
in KVM).

Also, we do have a consistency mechanism -- see:

commit d974baa398f34393db76be45f7d4d04fbdbb4a0a
Author: Andy Lutomirski <luto@amacapital.net>
Date:   Wed Oct 8 09:02:13 2014 -0700

    x86,kvm,vmx: Preserve CR4 across VM entry

>
>
> ---
>
> The host's decision to enable machine check exceptions should remain
> in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
> and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
>
> Tested: Inject machine check while a guest is spinning.
> Before the change, if guest CR4.MCE==0, then the machine check is
> escalated to Catastrophic Error (CATERR) and the machine dies.
> If guest CR4.MCE==1, then the machine check causes VMEXIT and is
> handled normally by host Linux. After the change, injecting a machine
> check causes normal Linux machine check handling.
>
>
> ---
>  arch/x86/kvm/vmx.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a214104..44c8d24 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3456,8 +3456,16 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu,
> unsigned long cr3)
>
>  static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  {
> -       unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ?
> -                   KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
> +       /*
> +        * Pass through host's Machine Check Enable value to hw_cr4, which
> +        * is in force while we are in guest mode.  Do not let guests control
> +        * this bit, even if host CR4.MCE == 0.
> +        */
> +       unsigned long hw_cr4 =
> +               (read_cr4() & X86_CR4_MCE) |

cr4_read_shadow?  There is no read_cr4 any more.

> +               (cr4 & ~X86_CR4_MCE) |
> +               (to_vmx(vcpu)->rmode.vm86_active ?
> +                KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
>

Otherwise looks sensible.

--Andy

>         if (cr4 & X86_CR4_VMXE) {
>                 /*
> --



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Fwd: [RFC] Keeping host value of CR4.MCE (and other bits) live during guest execution
  2015-04-15 23:52 ` Fwd: [RFC] Keeping host value of CR4.MCE (and other bits) live during guest execution Benjamin Serebrin
  2015-04-16  0:05   ` Andy Lutomirski
@ 2015-04-16  9:23   ` Paolo Bonzini
  2015-04-16 16:41     ` Benjamin Serebrin
  1 sibling, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2015-04-16  9:23 UTC (permalink / raw)
  To: Benjamin Serebrin, kvm, luto



On 16/04/2015 01:52, Benjamin Serebrin wrote:
> There's a bug in kvm/vmx.c: if the host enabled machine check (CR4.MCE==1),
> the value gets zeroed while the CPU is running in guest context.
> If a machine check event arrives while the CPU is in guest context and
> effective CR4.MCE is zero, the machine raises CATERR and crashes.
> 
> We should make sure the host value of CR4.MCE is always active.  There
> are read and write shadows for the guest to think it wrote its own value.
> 
> For discussion: there's new complexity with CR4 shadowing
> (1e02ce4cccdcb9688386e5b8d2c9fa4660b45389).  I measure CR4 reads at 24
> cycles on haswell and 36 on sandybridge, which compares well with
> L2 miss costs.  Is the shadowing worth the complexity?  CR4 is also
> cached (with no real consistency mechanism) in the VMCS at the time
> of guest VCPU creation.  If there is ever a change in CR4 value
> over time, or if CR4 is different on different CPUs in the system, all this
> logic gets broken.

Good catch.  Please resubmit with your Signed-off-by, and rebased to
Linus's tree.

Paolo

> Thanks,
> Ben
> 
> 
> ---
> 
> The host's decision to enable machine check exceptions should remain
> in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
> and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
> 
> Tested: Inject machine check while a guest is spinning.
> Before the change, if guest CR4.MCE==0, then the machine check is
> escalated to Catastrophic Error (CATERR) and the machine dies.
> If guest CR4.MCE==1, then the machine check causes VMEXIT and is
> handled normally by host Linux. After the change, injecting a machine
> check causes normal Linux machine check handling.
> 
> 
> ---
>  arch/x86/kvm/vmx.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a214104..44c8d24 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3456,8 +3456,16 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu,
> unsigned long cr3)
> 
>  static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  {
> -       unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ?
> -                   KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
> +       /*
> +        * Pass through host's Machine Check Enable value to hw_cr4, which
> +        * is in force while we are in guest mode.  Do not let guests control
> +        * this bit, even if host CR4.MCE == 0.
> +        */
> +       unsigned long hw_cr4 =
> +               (read_cr4() & X86_CR4_MCE) |
> +               (cr4 & ~X86_CR4_MCE) |
> +               (to_vmx(vcpu)->rmode.vm86_active ?
> +                KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
> 
>         if (cr4 & X86_CR4_VMXE) {
>                 /*
> --
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Fwd: [RFC] Keeping host value of CR4.MCE (and other bits) live during guest execution
  2015-04-16  9:23   ` Fwd: " Paolo Bonzini
@ 2015-04-16 16:41     ` Benjamin Serebrin
  0 siblings, 0 replies; 4+ messages in thread
From: Benjamin Serebrin @ 2015-04-16 16:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, luto

Thanks very much, Andy and Paolo.  I've brought my patch out of the dark age
and just re-sent it under a new subject line.


On Thu, Apr 16, 2015 at 2:23 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 16/04/2015 01:52, Benjamin Serebrin wrote:
>> There's a bug in kvm/vmx.c: if the host enabled machine check (CR4.MCE==1),
>> the value gets zeroed while the CPU is running in guest context.
>> If a machine check event arrives while the CPU is in guest context and
>> effective CR4.MCE is zero, the machine raises CATERR and crashes.
>>
>> We should make sure the host value of CR4.MCE is always active.  There
>> are read and write shadows for the guest to think it wrote its own value.
>>
>> For discussion: there's new complexity with CR4 shadowing
>> (1e02ce4cccdcb9688386e5b8d2c9fa4660b45389).  I measure CR4 reads at 24
>> cycles on haswell and 36 on sandybridge, which compares well with
>> L2 miss costs.  Is the shadowing worth the complexity?  CR4 is also
>> cached (with no real consistency mechanism) in the VMCS at the time
>> of guest VCPU creation.  If there is ever a change in CR4 value
>> over time, or if CR4 is different on different CPUs in the system, all this
>> logic gets broken.
>
> Good catch.  Please resubmit with your Signed-off-by, and rebased to
> Linus's tree.
>
> Paolo
>
>> Thanks,
>> Ben
>>
>>
>> ---
>>
>> The host's decision to enable machine check exceptions should remain
>> in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
>> and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
>>
>> Tested: Inject machine check while a guest is spinning.
>> Before the change, if guest CR4.MCE==0, then the machine check is
>> escalated to Catastrophic Error (CATERR) and the machine dies.
>> If guest CR4.MCE==1, then the machine check causes VMEXIT and is
>> handled normally by host Linux. After the change, injecting a machine
>> check causes normal Linux machine check handling.
>>
>>
>> ---
>>  arch/x86/kvm/vmx.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index a214104..44c8d24 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -3456,8 +3456,16 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu,
>> unsigned long cr3)
>>
>>  static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>>  {
>> -       unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ?
>> -                   KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
>> +       /*
>> +        * Pass through host's Machine Check Enable value to hw_cr4, which
>> +        * is in force while we are in guest mode.  Do not let guests control
>> +        * this bit, even if host CR4.MCE == 0.
>> +        */
>> +       unsigned long hw_cr4 =
>> +               (read_cr4() & X86_CR4_MCE) |
>> +               (cr4 & ~X86_CR4_MCE) |
>> +               (to_vmx(vcpu)->rmode.vm86_active ?
>> +                KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
>>
>>         if (cr4 & X86_CR4_VMXE) {
>>                 /*
>> --
>>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-04-16 16:41 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAN+hb0VA6Xrff+YOpKSSqjkdLeZO8S3EGdt6nCLnBaU9Ag-E0w@mail.gmail.com>
2015-04-15 23:52 ` Fwd: [RFC] Keeping host value of CR4.MCE (and other bits) live during guest execution Benjamin Serebrin
2015-04-16  0:05   ` Andy Lutomirski
2015-04-16  9:23   ` Fwd: " Paolo Bonzini
2015-04-16 16:41     ` Benjamin Serebrin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.