linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
@ 2020-06-05 19:26 Sean Christopherson
  2020-06-06  2:51 ` Xiaoyao Li
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Sean Christopherson @ 2020-06-05 19:26 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86
  Cc: H. Peter Anvin, linux-kernel, Xiaoyao Li, Paolo Bonzini, kvm,
	Sean Christopherson

Choo! Choo!  All aboard the Split Lock Express, with direct service to
Wreckage!

Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible
SLD-enabled CPU model to avoid writing MSR_TEST_CTRL.  MSR_TEST_CTRL
exists, and is writable, on many generations of CPUs.  Writing the MSR,
even with '0', can result in bizarre, undocumented behavior.

This fixes a crash on Haswell when resuming from suspend with a live KVM
guest.  Because APs use the standard SMP boot flow for resume, they will
go through split_lock_init() and the subsequent RDMSR/WRMSR sequence,
which runs even when sld_state==sld_off to ensure SLD is disabled.  On
Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will
succeed and _may_ take the SMT _sibling_ out of VMX root mode.

When KVM has an active guest, KVM performs VMXON as part of CPU onlining
(see kvm_starting_cpu()).  Because SMP boot is serialized, the resulting
flow is effectively:

  on_each_ap_cpu() {
     WRMSR(MSR_TEST_CTRL, 0)
     VMXON
  }

As a result, the WRMSR can disable VMX on a different CPU that has
already done VMXON.  This ultimately results in a #UD on VMPTRLD when
KVM regains control and attempt run its vCPUs.

The above voodoo was confirmed by reworking KVM's VMXON flow to write
MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above.
Further verification of the insanity was done by redoing VMXON on all
APs after the initial WRMSR->VMXON sequence.  The additional VMXON,
which should VM-Fail, occasionally succeeded, and also eliminated the
unexpected #UD on VMPTRLD.

The damage done by writing MSR_TEST_CTRL doesn't appear to be limited
to VMX, e.g. after suspend with an active KVM guest, subsequent reboots
almost always hang (even when fudging VMXON), a #UD on a random Jcc was
observed, suspend/resume stability is qualitatively poor, and so on and
so forth.

  kernel BUG at arch/x86/kvm/x86.c:386!
  invalid opcode: 0000 [#7] SMP
  CPU: 1 PID: 2592 Comm: CPU 6/KVM Tainted: G      D
  Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
  RIP: 0010:kvm_spurious_fault+0xf/0x20
  Code: <0f> 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
  RSP: 0018:ffffc0bcc1677b78 EFLAGS: 00010246
  RAX: 0000617640000000 RBX: ffff9e8d01d80000 RCX: ffff9e8d4fa40000
  RDX: ffff9e8d03360000 RSI: 00000003c3360000 RDI: ffff9e8d03360000
  RBP: 0000000000000001 R08: ffff9e8d046d9d40 R09: 0000000000000018
  R10: ffffc0bcc1677b80 R11: 0000000000000008 R12: 0000000000000006
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  FS:  00007fe16c9f9700(0000) GS:ffff9e8d4fa40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000d7a418 CR3: 00000003c47b1006 CR4: 00000000001626e0
  Call Trace:
   vmx_vcpu_load_vmcs+0x1fb/0x2b0
   vmx_vcpu_load+0x3e/0x160
   kvm_arch_vcpu_load+0x48/0x260
   finish_task_switch+0x140/0x260
   __schedule+0x460/0x720
   _cond_resched+0x2d/0x40
   kvm_arch_vcpu_ioctl_run+0x82e/0x1ca0
   kvm_vcpu_ioctl+0x363/0x5c0
   ksys_ioctl+0x88/0xa0
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x4c/0x170
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Fixes: dbaba47085b0c ("x86/split_lock: Rework the initialization flow of split lock detection")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/intel.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index a19a680542ce..19b6c42739fc 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -48,6 +48,13 @@ enum split_lock_detect_state {
 static enum split_lock_detect_state sld_state __ro_after_init = sld_off;
 static u64 msr_test_ctrl_cache __ro_after_init;
 
+/*
+ * With a name like MSR_TEST_CTL it should go without saying, but don't touch
+ * MSR_TEST_CTL unless the CPU is one of the whitelisted models.  Writing it
+ * on CPUs that do not support SLD can cause fireworks, even when writing '0'.
+ */
+static bool cpu_model_supports_sld __ro_after_init;
+
 /*
  * Processors which have self-snooping capability can handle conflicting
  * memory type across CPUs by snooping its own cache. However, there exists
@@ -1064,7 +1071,8 @@ static void sld_update_msr(bool on)
 
 static void split_lock_init(void)
 {
-	split_lock_verify_msr(sld_state != sld_off);
+	if (cpu_model_supports_sld)
+		split_lock_verify_msr(sld_state != sld_off);
 }
 
 static void split_lock_warn(unsigned long ip)
@@ -1167,5 +1175,6 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
 		return;
 	}
 
+	cpu_model_supports_sld = true;
 	split_lock_setup();
 }
-- 
2.26.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
  2020-06-05 19:26 [PATCH] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted Sean Christopherson
@ 2020-06-06  2:51 ` Xiaoyao Li
  2020-06-07 17:06   ` Sean Christopherson
  2020-06-30  5:25 ` Sean Christopherson
  2020-06-30 12:12 ` [tip: x86/urgent] " tip-bot2 for Sean Christopherson
  2 siblings, 1 reply; 6+ messages in thread
From: Xiaoyao Li @ 2020-06-06  2:51 UTC (permalink / raw)
  To: Sean Christopherson, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86
  Cc: H. Peter Anvin, linux-kernel, Paolo Bonzini, kvm

On 6/6/2020 3:26 AM, Sean Christopherson wrote:
> Choo! Choo!  All aboard the Split Lock Express, with direct service to
> Wreckage!
> 
> Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible
> SLD-enabled CPU model to avoid writing MSR_TEST_CTRL.  MSR_TEST_CTRL
> exists, and is writable, on many generations of CPUs.  Writing the MSR,
> even with '0', can result in bizarre, undocumented behavior.
> 
> This fixes a crash on Haswell when resuming from suspend with a live KVM
> guest.  Because APs use the standard SMP boot flow for resume, they will
> go through split_lock_init() and the subsequent RDMSR/WRMSR sequence,
> which runs even when sld_state==sld_off to ensure SLD is disabled.  On
> Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will
> succeed and _may_ take the SMT _sibling_ out of VMX root mode.
> 
> When KVM has an active guest, KVM performs VMXON as part of CPU onlining
> (see kvm_starting_cpu()).  Because SMP boot is serialized, the resulting
> flow is effectively:
> 
>    on_each_ap_cpu() {
>       WRMSR(MSR_TEST_CTRL, 0)
>       VMXON
>    }
> 
> As a result, the WRMSR can disable VMX on a different CPU that has
> already done VMXON.  This ultimately results in a #UD on VMPTRLD when
> KVM regains control and attempt run its vCPUs.
> 
> The above voodoo was confirmed by reworking KVM's VMXON flow to write
> MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above.
> Further verification of the insanity was done by redoing VMXON on all
> APs after the initial WRMSR->VMXON sequence.  The additional VMXON,
> which should VM-Fail, occasionally succeeded, and also eliminated the
> unexpected #UD on VMPTRLD.
> 
> The damage done by writing MSR_TEST_CTRL doesn't appear to be limited
> to VMX, e.g. after suspend with an active KVM guest, subsequent reboots
> almost always hang (even when fudging VMXON), a #UD on a random Jcc was
> observed, suspend/resume stability is qualitatively poor, and so on and
> so forth.
> 

I'm wondering if all those side-effects of MSR_TEST_CTRL exist on CPUs 
have SLD feature, have you ever tested on a SLD capable CPU?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
  2020-06-06  2:51 ` Xiaoyao Li
@ 2020-06-07 17:06   ` Sean Christopherson
  2020-06-08 19:01     ` Sean Christopherson
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2020-06-07 17:06 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, Paolo Bonzini, kvm

On Sat, Jun 06, 2020 at 10:51:06AM +0800, Xiaoyao Li wrote:
> On 6/6/2020 3:26 AM, Sean Christopherson wrote:
> >Choo! Choo!  All aboard the Split Lock Express, with direct service to
> >Wreckage!
> >
> >Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible
> >SLD-enabled CPU model to avoid writing MSR_TEST_CTRL.  MSR_TEST_CTRL
> >exists, and is writable, on many generations of CPUs.  Writing the MSR,
> >even with '0', can result in bizarre, undocumented behavior.
> >
> >This fixes a crash on Haswell when resuming from suspend with a live KVM
> >guest.  Because APs use the standard SMP boot flow for resume, they will
> >go through split_lock_init() and the subsequent RDMSR/WRMSR sequence,
> >which runs even when sld_state==sld_off to ensure SLD is disabled.  On
> >Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will
> >succeed and _may_ take the SMT _sibling_ out of VMX root mode.
> >
> >When KVM has an active guest, KVM performs VMXON as part of CPU onlining
> >(see kvm_starting_cpu()).  Because SMP boot is serialized, the resulting
> >flow is effectively:
> >
> >   on_each_ap_cpu() {
> >      WRMSR(MSR_TEST_CTRL, 0)
> >      VMXON
> >   }
> >
> >As a result, the WRMSR can disable VMX on a different CPU that has
> >already done VMXON.  This ultimately results in a #UD on VMPTRLD when
> >KVM regains control and attempt run its vCPUs.
> >
> >The above voodoo was confirmed by reworking KVM's VMXON flow to write
> >MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above.
> >Further verification of the insanity was done by redoing VMXON on all
> >APs after the initial WRMSR->VMXON sequence.  The additional VMXON,
> >which should VM-Fail, occasionally succeeded, and also eliminated the
> >unexpected #UD on VMPTRLD.
> >
> >The damage done by writing MSR_TEST_CTRL doesn't appear to be limited
> >to VMX, e.g. after suspend with an active KVM guest, subsequent reboots
> >almost always hang (even when fudging VMXON), a #UD on a random Jcc was
> >observed, suspend/resume stability is qualitatively poor, and so on and
> >so forth.
> >
> 
> I'm wondering if all those side-effects of MSR_TEST_CTRL exist on CPUs have
> SLD feature, have you ever tested on a SLD capable CPU?

No, I'll poke at it on ICX tomorrow.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
  2020-06-07 17:06   ` Sean Christopherson
@ 2020-06-08 19:01     ` Sean Christopherson
  0 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2020-06-08 19:01 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, Paolo Bonzini, kvm

On Sun, Jun 07, 2020 at 10:06:46AM -0700, Sean Christopherson wrote:
> On Sat, Jun 06, 2020 at 10:51:06AM +0800, Xiaoyao Li wrote:
> > On 6/6/2020 3:26 AM, Sean Christopherson wrote:
> > >Choo! Choo!  All aboard the Split Lock Express, with direct service to
> > >Wreckage!
> > >
> > >Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible
> > >SLD-enabled CPU model to avoid writing MSR_TEST_CTRL.  MSR_TEST_CTRL
> > >exists, and is writable, on many generations of CPUs.  Writing the MSR,
> > >even with '0', can result in bizarre, undocumented behavior.
> > >
> > >This fixes a crash on Haswell when resuming from suspend with a live KVM
> > >guest.  Because APs use the standard SMP boot flow for resume, they will
> > >go through split_lock_init() and the subsequent RDMSR/WRMSR sequence,
> > >which runs even when sld_state==sld_off to ensure SLD is disabled.  On
> > >Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will
> > >succeed and _may_ take the SMT _sibling_ out of VMX root mode.
> > >
> > >When KVM has an active guest, KVM performs VMXON as part of CPU onlining
> > >(see kvm_starting_cpu()).  Because SMP boot is serialized, the resulting
> > >flow is effectively:
> > >
> > >   on_each_ap_cpu() {
> > >      WRMSR(MSR_TEST_CTRL, 0)
> > >      VMXON
> > >   }
> > >
> > >As a result, the WRMSR can disable VMX on a different CPU that has
> > >already done VMXON.  This ultimately results in a #UD on VMPTRLD when
> > >KVM regains control and attempt run its vCPUs.
> > >
> > >The above voodoo was confirmed by reworking KVM's VMXON flow to write
> > >MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above.
> > >Further verification of the insanity was done by redoing VMXON on all
> > >APs after the initial WRMSR->VMXON sequence.  The additional VMXON,
> > >which should VM-Fail, occasionally succeeded, and also eliminated the
> > >unexpected #UD on VMPTRLD.
> > >
> > >The damage done by writing MSR_TEST_CTRL doesn't appear to be limited
> > >to VMX, e.g. after suspend with an active KVM guest, subsequent reboots
> > >almost always hang (even when fudging VMXON), a #UD on a random Jcc was
> > >observed, suspend/resume stability is qualitatively poor, and so on and
> > >so forth.
> > >
> > 
> > I'm wondering if all those side-effects of MSR_TEST_CTRL exist on CPUs have
> > SLD feature, have you ever tested on a SLD capable CPU?
> 
> No, I'll poke at it on ICX tomorrow.

Tested on ICX with SLD both enabled and disabled by writing MSR_TEST_CTRL
on every logical CPU during vCPU creation, no weirdness observed.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
  2020-06-05 19:26 [PATCH] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted Sean Christopherson
  2020-06-06  2:51 ` Xiaoyao Li
@ 2020-06-30  5:25 ` Sean Christopherson
  2020-06-30 12:12 ` [tip: x86/urgent] " tip-bot2 for Sean Christopherson
  2 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2020-06-30  5:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86
  Cc: H. Peter Anvin, linux-kernel, Xiaoyao Li, Paolo Bonzini, kvm

Ping.  This would ideally get into 5.8, the bad behavior is quite nasty.

On Fri, Jun 05, 2020 at 12:26:05PM -0700, Sean Christopherson wrote:
> Choo! Choo!  All aboard the Split Lock Express, with direct service to
> Wreckage!
> 
> Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible
> SLD-enabled CPU model to avoid writing MSR_TEST_CTRL.  MSR_TEST_CTRL
> exists, and is writable, on many generations of CPUs.  Writing the MSR,
> even with '0', can result in bizarre, undocumented behavior.
> 
> This fixes a crash on Haswell when resuming from suspend with a live KVM
> guest.  Because APs use the standard SMP boot flow for resume, they will
> go through split_lock_init() and the subsequent RDMSR/WRMSR sequence,
> which runs even when sld_state==sld_off to ensure SLD is disabled.  On
> Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will
> succeed and _may_ take the SMT _sibling_ out of VMX root mode.
> 
> When KVM has an active guest, KVM performs VMXON as part of CPU onlining
> (see kvm_starting_cpu()).  Because SMP boot is serialized, the resulting
> flow is effectively:
> 
>   on_each_ap_cpu() {
>      WRMSR(MSR_TEST_CTRL, 0)
>      VMXON
>   }
> 
> As a result, the WRMSR can disable VMX on a different CPU that has
> already done VMXON.  This ultimately results in a #UD on VMPTRLD when
> KVM regains control and attempt run its vCPUs.
> 
> The above voodoo was confirmed by reworking KVM's VMXON flow to write
> MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above.
> Further verification of the insanity was done by redoing VMXON on all
> APs after the initial WRMSR->VMXON sequence.  The additional VMXON,
> which should VM-Fail, occasionally succeeded, and also eliminated the
> unexpected #UD on VMPTRLD.
> 
> The damage done by writing MSR_TEST_CTRL doesn't appear to be limited
> to VMX, e.g. after suspend with an active KVM guest, subsequent reboots
> almost always hang (even when fudging VMXON), a #UD on a random Jcc was
> observed, suspend/resume stability is qualitatively poor, and so on and
> so forth.
> 
>   kernel BUG at arch/x86/kvm/x86.c:386!
>   invalid opcode: 0000 [#7] SMP
>   CPU: 1 PID: 2592 Comm: CPU 6/KVM Tainted: G      D
>   Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
>   RIP: 0010:kvm_spurious_fault+0xf/0x20
>   Code: <0f> 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
>   RSP: 0018:ffffc0bcc1677b78 EFLAGS: 00010246
>   RAX: 0000617640000000 RBX: ffff9e8d01d80000 RCX: ffff9e8d4fa40000
>   RDX: ffff9e8d03360000 RSI: 00000003c3360000 RDI: ffff9e8d03360000
>   RBP: 0000000000000001 R08: ffff9e8d046d9d40 R09: 0000000000000018
>   R10: ffffc0bcc1677b80 R11: 0000000000000008 R12: 0000000000000006
>   R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>   FS:  00007fe16c9f9700(0000) GS:ffff9e8d4fa40000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000d7a418 CR3: 00000003c47b1006 CR4: 00000000001626e0
>   Call Trace:
>    vmx_vcpu_load_vmcs+0x1fb/0x2b0
>    vmx_vcpu_load+0x3e/0x160
>    kvm_arch_vcpu_load+0x48/0x260
>    finish_task_switch+0x140/0x260
>    __schedule+0x460/0x720
>    _cond_resched+0x2d/0x40
>    kvm_arch_vcpu_ioctl_run+0x82e/0x1ca0
>    kvm_vcpu_ioctl+0x363/0x5c0
>    ksys_ioctl+0x88/0xa0
>    __x64_sys_ioctl+0x16/0x20
>    do_syscall_64+0x4c/0x170
>    entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Xiaoyao Li <xiaoyao.li@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: kvm@vger.kernel.org
> Fixes: dbaba47085b0c ("x86/split_lock: Rework the initialization flow of split lock detection")
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kernel/cpu/intel.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index a19a680542ce..19b6c42739fc 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -48,6 +48,13 @@ enum split_lock_detect_state {
>  static enum split_lock_detect_state sld_state __ro_after_init = sld_off;
>  static u64 msr_test_ctrl_cache __ro_after_init;
>  
> +/*
> + * With a name like MSR_TEST_CTL it should go without saying, but don't touch
> + * MSR_TEST_CTL unless the CPU is one of the whitelisted models.  Writing it
> + * on CPUs that do not support SLD can cause fireworks, even when writing '0'.
> + */
> +static bool cpu_model_supports_sld __ro_after_init;
> +
>  /*
>   * Processors which have self-snooping capability can handle conflicting
>   * memory type across CPUs by snooping its own cache. However, there exists
> @@ -1064,7 +1071,8 @@ static void sld_update_msr(bool on)
>  
>  static void split_lock_init(void)
>  {
> -	split_lock_verify_msr(sld_state != sld_off);
> +	if (cpu_model_supports_sld)
> +		split_lock_verify_msr(sld_state != sld_off);
>  }
>  
>  static void split_lock_warn(unsigned long ip)
> @@ -1167,5 +1175,6 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
>  		return;
>  	}
>  
> +	cpu_model_supports_sld = true;
>  	split_lock_setup();
>  }
> -- 
> 2.26.0
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tip: x86/urgent] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
  2020-06-05 19:26 [PATCH] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted Sean Christopherson
  2020-06-06  2:51 ` Xiaoyao Li
  2020-06-30  5:25 ` Sean Christopherson
@ 2020-06-30 12:12 ` tip-bot2 for Sean Christopherson
  2 siblings, 0 replies; 6+ messages in thread
From: tip-bot2 for Sean Christopherson @ 2020-06-30 12:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Sean Christopherson, Thomas Gleixner, stable, x86, LKML

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     009bce1df0bb5eb970b9eb98d963861f7fe353c7
Gitweb:        https://git.kernel.org/tip/009bce1df0bb5eb970b9eb98d963861f7fe353c7
Author:        Sean Christopherson <sean.j.christopherson@intel.com>
AuthorDate:    Fri, 05 Jun 2020 12:26:05 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 30 Jun 2020 14:09:31 +02:00

x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted

Choo! Choo!  All aboard the Split Lock Express, with direct service to
Wreckage!

Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible
SLD-enabled CPU model to avoid writing MSR_TEST_CTRL.  MSR_TEST_CTRL
exists, and is writable, on many generations of CPUs.  Writing the MSR,
even with '0', can result in bizarre, undocumented behavior.

This fixes a crash on Haswell when resuming from suspend with a live KVM
guest.  Because APs use the standard SMP boot flow for resume, they will
go through split_lock_init() and the subsequent RDMSR/WRMSR sequence,
which runs even when sld_state==sld_off to ensure SLD is disabled.  On
Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will
succeed and _may_ take the SMT _sibling_ out of VMX root mode.

When KVM has an active guest, KVM performs VMXON as part of CPU onlining
(see kvm_starting_cpu()).  Because SMP boot is serialized, the resulting
flow is effectively:

  on_each_ap_cpu() {
     WRMSR(MSR_TEST_CTRL, 0)
     VMXON
  }

As a result, the WRMSR can disable VMX on a different CPU that has
already done VMXON.  This ultimately results in a #UD on VMPTRLD when
KVM regains control and attempt run its vCPUs.

The above voodoo was confirmed by reworking KVM's VMXON flow to write
MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above.
Further verification of the insanity was done by redoing VMXON on all
APs after the initial WRMSR->VMXON sequence.  The additional VMXON,
which should VM-Fail, occasionally succeeded, and also eliminated the
unexpected #UD on VMPTRLD.

The damage done by writing MSR_TEST_CTRL doesn't appear to be limited
to VMX, e.g. after suspend with an active KVM guest, subsequent reboots
almost always hang (even when fudging VMXON), a #UD on a random Jcc was
observed, suspend/resume stability is qualitatively poor, and so on and
so forth.

  kernel BUG at arch/x86/kvm/x86.c:386!
  CPU: 1 PID: 2592 Comm: CPU 6/KVM Tainted: G      D
  Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
  RIP: 0010:kvm_spurious_fault+0xf/0x20
  Call Trace:
   vmx_vcpu_load_vmcs+0x1fb/0x2b0
   vmx_vcpu_load+0x3e/0x160
   kvm_arch_vcpu_load+0x48/0x260
   finish_task_switch+0x140/0x260
   __schedule+0x460/0x720
   _cond_resched+0x2d/0x40
   kvm_arch_vcpu_ioctl_run+0x82e/0x1ca0
   kvm_vcpu_ioctl+0x363/0x5c0
   ksys_ioctl+0x88/0xa0
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x4c/0x170
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: dbaba47085b0c ("x86/split_lock: Rework the initialization flow of split lock detection")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200605192605.7439-1-sean.j.christopherson@intel.com

---
 arch/x86/kernel/cpu/intel.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index c25a67a..0ab48f1 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -50,6 +50,13 @@ static enum split_lock_detect_state sld_state __ro_after_init = sld_off;
 static u64 msr_test_ctrl_cache __ro_after_init;
 
 /*
+ * With a name like MSR_TEST_CTL it should go without saying, but don't touch
+ * MSR_TEST_CTL unless the CPU is one of the whitelisted models.  Writing it
+ * on CPUs that do not support SLD can cause fireworks, even when writing '0'.
+ */
+static bool cpu_model_supports_sld __ro_after_init;
+
+/*
  * Processors which have self-snooping capability can handle conflicting
  * memory type across CPUs by snooping its own cache. However, there exists
  * CPU models in which having conflicting memory types still leads to
@@ -1071,7 +1078,8 @@ static void sld_update_msr(bool on)
 
 static void split_lock_init(void)
 {
-	split_lock_verify_msr(sld_state != sld_off);
+	if (cpu_model_supports_sld)
+		split_lock_verify_msr(sld_state != sld_off);
 }
 
 static void split_lock_warn(unsigned long ip)
@@ -1177,5 +1185,6 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
 		return;
 	}
 
+	cpu_model_supports_sld = true;
 	split_lock_setup();
 }

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-06-30 12:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-05 19:26 [PATCH] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted Sean Christopherson
2020-06-06  2:51 ` Xiaoyao Li
2020-06-07 17:06   ` Sean Christopherson
2020-06-08 19:01     ` Sean Christopherson
2020-06-30  5:25 ` Sean Christopherson
2020-06-30 12:12 ` [tip: x86/urgent] " tip-bot2 for Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).