linux-kernel.vger.kernel.org archive mirror

* [PATCH 0/3] KVM: VMX: Fix for kexec VMCLEAR and VMXON cleanup
@ 2020-02-27 22:30 Sean Christopherson
  2020-02-27 22:30 ` [PATCH 1/3] KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support Sean Christopherson
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Sean Christopherson @ 2020-02-27 22:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Patch 1 fixes a theoretical bug where a crashdump NMI that arrives
while KVM is messing with the percpu VMCS list would result in one or more
VMCSes not being cleared, potentially causing memory corruption in the new
kexec'd kernel.

Patch 2 isn't directly related, but it conflicts with the crash cleanup
changes, both from a code and a semantics perspective.  Without the crash
cleanup, IMO hardware_enable() should do crash_disable_local_vmclear()
if VMXON fails, i.e. clean up after itself.  But hardware_disable()
doesn't even do crash_disable_local_vmclear() (which is what got me
looking at that code in the first place).  Basing the VMXON change on top
of the crash cleanup avoids the debate entirely.

Patch 3 is a simple cleanup (no functional change intended ;-) ).

I verified my analysis of the NMI bug by simulating what would happen if
an NMI arrived in the middle of list_add() and list_del().  The below
output matches expectations, e.g. nothing hangs, the entry being added
doesn't show up, and the entry being deleted _does_ show up.

[    8.205898] KVM: testing NMI in list_add()
[    8.205898] KVM: testing NMI in list_del()
[    8.205899] KVM: found e3
[    8.205899] KVM: found e2
[    8.205899] KVM: found e1
[    8.205900] KVM: found e3
[    8.205900] KVM: found e1

/* Walk @list forward and log each entry; any unexpected entry is corruption. */
static void vmx_test_list(struct list_head *list, struct list_head *e1,
			  struct list_head *e2, struct list_head *e3)
{
	struct list_head *tmp;

	list_for_each(tmp, list) {
		if (tmp == e1)
			pr_warn("KVM: found e1\n");
		else if (tmp == e2)
			pr_warn("KVM: found e2\n");
		else if (tmp == e3)
			pr_warn("KVM: found e3\n");
		else
			pr_warn("KVM: kaboom\n");
	}
}

static int __init vmx_init(void)
{
	LIST_HEAD(list);
	LIST_HEAD(e1);
	LIST_HEAD(e2);
	LIST_HEAD(e3);

	pr_warn("KVM: testing NMI in list_add()\n");

	list.next->prev = &e1;
	vmx_test_list(&list, &e1, &e2, &e3);

	e1.next = list.next;
	vmx_test_list(&list, &e1, &e2, &e3);

	e1.prev = &list;
	vmx_test_list(&list, &e1, &e2, &e3);

	INIT_LIST_HEAD(&list);
	INIT_LIST_HEAD(&e1);

	list_add(&e1, &list);
	list_add(&e2, &list);
	list_add(&e3, &list);

	pr_warn("KVM: testing NMI in list_del()\n");

	e3.prev = &e1;
	vmx_test_list(&list, &e1, &e2, &e3);

	list_del(&e2);
	list.prev = &e1;
	vmx_test_list(&list, &e1, &e2, &e3);
}

Sean Christopherson (3):
  KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
  KVM: VMX: Gracefully handle faults on VMXON
  KVM: VMX: Make loaded_vmcs_init() a static function

 arch/x86/kvm/vmx/vmx.c | 101 +++++++++++++++++------------------------
 arch/x86/kvm/vmx/vmx.h |   1 -
 2 files changed, 41 insertions(+), 61 deletions(-)

-- 
2.24.1



* [PATCH 1/3] KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
  2020-02-27 22:30 [PATCH 0/3] KVM: VMX: Fix for kexec VMCLEAR and VMXON cleanup Sean Christopherson
@ 2020-02-27 22:30 ` Sean Christopherson
  2020-02-28 10:16   ` Paolo Bonzini
  2020-02-27 22:30 ` [PATCH 2/3] KVM: VMX: Gracefully handle faults on VMXON Sean Christopherson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2020-02-27 22:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown
interrupted a KVM update of the percpu in-use VMCS list.

Because NMIs are not blocked by disabling IRQs, it's possible that
crash_vmclear_local_loaded_vmcss() could be called while the percpu list
of VMCSes is being modified, e.g. in the middle of list_add() in
vmx_vcpu_load_vmcs().  This potential corner case was called out in the
original commit[*], but the analysis of its impact was wrong.

Skipping the VMCLEARs is wrong because it all but guarantees that a
loaded, and therefore cached, VMCS will live across kexec and corrupt
memory in the new kernel.  Corruption will occur because the CPU's VMCS
cache is non-coherent, i.e. not snooped, and so the writeback of VMCS
memory on its eviction will overwrite random memory in the new kernel.
The VMCS will live because the NMI shootdown also disables VMX, i.e. the
in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the
VMCS cache on VMXOFF.

Furthermore, interrupting list_add() and list_del() is safe due to
crash_vmclear_local_loaded_vmcss() using forward iteration.  list_add()
ensures the new entry is not visible to forward iteration unless the
entire add completes, via WRITE_ONCE(prev->next, new).  A bad "prev"
pointer could be observed if the NMI shootdown interrupted list_del() or
list_add(), but list_for_each_entry() does not consume ->prev.
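
For reference, the relevant store ordering, simplified from __list_add()
in include/linux/list.h (debug checks omitted); the new entry only
becomes reachable via forward iteration at the final WRITE_ONCE():

	static inline void __list_add(struct list_head *new,
				      struct list_head *prev,
				      struct list_head *next)
	{
		next->prev = new;
		new->next = next;
		new->prev = prev;
		WRITE_ONCE(prev->next, new);	/* publish to forward iterators */
	}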

In addition to removing the temporary disabling of VMCLEAR, move the
list_del() from __loaded_vmcs_clear() to loaded_vmcs_init() so that the
VMCS is deleted from the list only after it's been VMCLEAR'd.  Deleting
the VMCS before VMCLEAR would allow a race where the NMI shootdown could
arrive between list_del() and vmcs_clear() and thus neither flow would
execute a successful VMCLEAR.

Update the smp_*() comments related to the list manipulation, and
opportunistically reword them to improve clarity.

[*] https://patchwork.kernel.org/patch/1675731/#3720461

Fixes: 8f536b7697a0 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 79 +++++++++++-------------------------------
 arch/x86/kvm/vmx/vmx.h |  2 +-
 2 files changed, 22 insertions(+), 59 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a04017bdae05..2648b2214407 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -654,53 +654,39 @@ static int vmx_set_guest_msr(struct vcpu_vmx *vmx, struct shared_msr_entry *msr,
 	return ret;
 }
 
-void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
+void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs, bool in_use)
 {
 	vmcs_clear(loaded_vmcs->vmcs);
 	if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
 		vmcs_clear(loaded_vmcs->shadow_vmcs);
+
+	if (in_use) {
+		list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
+
+		/*
+		 * Ensure deleting loaded_vmcs from its current percpu list
+		 * completes before setting loaded_vmcs->cpu to -1, otherwise
+		 * a different cpu can see cpu == -1 first and add loaded_vmcs
+		 * to its percpu list before it's deleted from this cpu's list.
+		 * Pairs with the smp_rmb() in vmx_vcpu_load_vmcs().
+		 */
+		smp_wmb();
+	}
+
 	loaded_vmcs->cpu = -1;
 	loaded_vmcs->launched = 0;
 }
 
 #ifdef CONFIG_KEXEC_CORE
-/*
- * This bitmap is used to indicate whether the vmclear
- * operation is enabled on all cpus. All disabled by
- * default.
- */
-static cpumask_t crash_vmclear_enabled_bitmap = CPU_MASK_NONE;
-
-static inline void crash_enable_local_vmclear(int cpu)
-{
-	cpumask_set_cpu(cpu, &crash_vmclear_enabled_bitmap);
-}
-
-static inline void crash_disable_local_vmclear(int cpu)
-{
-	cpumask_clear_cpu(cpu, &crash_vmclear_enabled_bitmap);
-}
-
-static inline int crash_local_vmclear_enabled(int cpu)
-{
-	return cpumask_test_cpu(cpu, &crash_vmclear_enabled_bitmap);
-}
-
 static void crash_vmclear_local_loaded_vmcss(void)
 {
 	int cpu = raw_smp_processor_id();
 	struct loaded_vmcs *v;
 
-	if (!crash_local_vmclear_enabled(cpu))
-		return;
-
 	list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu),
 			    loaded_vmcss_on_cpu_link)
 		vmcs_clear(v->vmcs);
 }
-#else
-static inline void crash_enable_local_vmclear(int cpu) { }
-static inline void crash_disable_local_vmclear(int cpu) { }
 #endif /* CONFIG_KEXEC_CORE */
 
 static void __loaded_vmcs_clear(void *arg)
@@ -712,19 +698,8 @@ static void __loaded_vmcs_clear(void *arg)
 		return; /* vcpu migration can race with cpu offline */
 	if (per_cpu(current_vmcs, cpu) == loaded_vmcs->vmcs)
 		per_cpu(current_vmcs, cpu) = NULL;
-	crash_disable_local_vmclear(cpu);
-	list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
 
-	/*
-	 * we should ensure updating loaded_vmcs->loaded_vmcss_on_cpu_link
-	 * is before setting loaded_vmcs->vcpu to -1 which is done in
-	 * loaded_vmcs_init. Otherwise, other cpu can see vcpu = -1 fist
-	 * then adds the vmcs into percpu list before it is deleted.
-	 */
-	smp_wmb();
-
-	loaded_vmcs_init(loaded_vmcs);
-	crash_enable_local_vmclear(cpu);
+	loaded_vmcs_init(loaded_vmcs, true);
 }
 
 void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
@@ -1343,18 +1318,17 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu)
 	if (!already_loaded) {
 		loaded_vmcs_clear(vmx->loaded_vmcs);
 		local_irq_disable();
-		crash_disable_local_vmclear(cpu);
 
 		/*
-		 * Read loaded_vmcs->cpu should be before fetching
-		 * loaded_vmcs->loaded_vmcss_on_cpu_link.
-		 * See the comments in __loaded_vmcs_clear().
+		 * Ensure loaded_vmcs->cpu is read before adding loaded_vmcs to
+		 * this cpu's percpu list, otherwise it may not yet be deleted
+		 * from its previous cpu's percpu list.  Pairs with the
+		 * smp_wmb() in loaded_vmcs_init().
 		 */
 		smp_rmb();
 
 		list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
 			 &per_cpu(loaded_vmcss_on_cpu, cpu));
-		crash_enable_local_vmclear(cpu);
 		local_irq_enable();
 	}
 
@@ -2290,17 +2264,6 @@ static int hardware_enable(void)
 	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
 	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
 
-	/*
-	 * Now we can enable the vmclear operation in kdump
-	 * since the loaded_vmcss_on_cpu list on this cpu
-	 * has been initialized.
-	 *
-	 * Though the cpu is not in VMX operation now, there
-	 * is no problem to enable the vmclear operation
-	 * for the loaded_vmcss_on_cpu list is empty!
-	 */
-	crash_enable_local_vmclear(cpu);
-
 	kvm_cpu_vmxon(phys_addr);
 	if (enable_ept)
 		ept_sync_global();
@@ -2603,7 +2566,7 @@ int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
 
 	loaded_vmcs->shadow_vmcs = NULL;
 	loaded_vmcs->hv_timer_soft_disabled = false;
-	loaded_vmcs_init(loaded_vmcs);
+	loaded_vmcs_init(loaded_vmcs, false);
 
 	if (cpu_has_vmx_msr_bitmap()) {
 		loaded_vmcs->msr_bitmap = (unsigned long *)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index e64da06c7009..a0a93b2a238e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -493,7 +493,7 @@ struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags);
 void free_vmcs(struct vmcs *vmcs);
 int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs);
 void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs);
-void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs);
+void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs, bool in_use);
 void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs);
 
 static inline struct vmcs *alloc_vmcs(bool shadow)
-- 
2.24.1



* [PATCH 2/3] KVM: VMX: Gracefully handle faults on VMXON
  2020-02-27 22:30 [PATCH 0/3] KVM: VMX: Fix for kexec VMCLEAR and VMXON cleanup Sean Christopherson
  2020-02-27 22:30 ` [PATCH 1/3] KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support Sean Christopherson
@ 2020-02-27 22:30 ` Sean Christopherson
  2020-02-27 22:30 ` [PATCH 3/3] KVM: VMX: Make loaded_vmcs_init() a static function Sean Christopherson
  2020-02-28 11:02 ` [PATCH 0/3] KVM: VMX: Fix for kexec VMCLEAR and VMXON cleanup Paolo Bonzini
  3 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2020-02-27 22:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Gracefully handle faults on VMXON, e.g. #GP due to VMX being disabled by
BIOS, instead of letting the fault crash the system.  Now that KVM uses
cpufeatures to query support instead of reading MSR_IA32_FEAT_CTL
directly, it's possible for a bug in a different subsystem to cause KVM
to incorrectly attempt VMXON[*].  Crashing the system is especially
annoying if the system is configured such that hardware_enable() will
be triggered during boot.

Opportunistically rename @addr to @vmxon_pointer and use a named param
to reference it in the inline assembly.

Print 0xdeadbeef in the ultra-"rare" case that reading MSR_IA32_FEAT_CTL
also faults.

[*] https://lkml.kernel.org/r/20200226231615.13664-1-sean.j.christopherson@intel.com
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2648b2214407..a5109804abee 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2236,18 +2236,33 @@ static __init int vmx_disabled_by_bios(void)
 	       !boot_cpu_has(X86_FEATURE_VMX);
 }
 
-static void kvm_cpu_vmxon(u64 addr)
+static int kvm_cpu_vmxon(u64 vmxon_pointer)
 {
+	u64 msr;
+
 	cr4_set_bits(X86_CR4_VMXE);
 	intel_pt_handle_vmx(1);
 
-	asm volatile ("vmxon %0" : : "m"(addr));
+	asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
+			  _ASM_EXTABLE(1b, %l[fault])
+			  : : [vmxon_pointer] "m"(vmxon_pointer)
+			  : : fault);
+	return 0;
+
+fault:
+	WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n",
+		  rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr) ? 0xdeadbeef : msr);
+	intel_pt_handle_vmx(0);
+	cr4_clear_bits(X86_CR4_VMXE);
+
+	return -EFAULT;
 }
 
 static int hardware_enable(void)
 {
 	int cpu = raw_smp_processor_id();
 	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
+	int r;
 
 	if (cr4_read_shadow() & X86_CR4_VMXE)
 		return -EBUSY;
@@ -2264,7 +2279,10 @@ static int hardware_enable(void)
 	INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
 	spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
 
-	kvm_cpu_vmxon(phys_addr);
+	r = kvm_cpu_vmxon(phys_addr);
+	if (r)
+		return r;
+
 	if (enable_ept)
 		ept_sync_global();
 
-- 
2.24.1



* [PATCH 3/3] KVM: VMX: Make loaded_vmcs_init() a static function
  2020-02-27 22:30 [PATCH 0/3] KVM: VMX: Fix for kexec VMCLEAR and VMXON cleanup Sean Christopherson
  2020-02-27 22:30 ` [PATCH 1/3] KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support Sean Christopherson
  2020-02-27 22:30 ` [PATCH 2/3] KVM: VMX: Gracefully handle faults on VMXON Sean Christopherson
@ 2020-02-27 22:30 ` Sean Christopherson
  2020-02-28 11:02 ` [PATCH 0/3] KVM: VMX: Fix for kexec VMCLEAR and VMXON cleanup Paolo Bonzini
  3 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2020-02-27 22:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Make loaded_vmcs_init() static as it's only used in vmx.c.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 2 +-
 arch/x86/kvm/vmx/vmx.h | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a5109804abee..1aeb3359c4d1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -654,7 +654,7 @@ static int vmx_set_guest_msr(struct vcpu_vmx *vmx, struct shared_msr_entry *msr,
 	return ret;
 }
 
-void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs, bool in_use)
+static void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs, bool in_use)
 {
 	vmcs_clear(loaded_vmcs->vmcs);
 	if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a0a93b2a238e..8df23af2cb58 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -493,7 +493,6 @@ struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags);
 void free_vmcs(struct vmcs *vmcs);
 int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs);
 void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs);
-void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs, bool in_use);
 void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs);
 
 static inline struct vmcs *alloc_vmcs(bool shadow)
-- 
2.24.1



* Re: [PATCH 1/3] KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
  2020-02-27 22:30 ` [PATCH 1/3] KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support Sean Christopherson
@ 2020-02-28 10:16   ` Paolo Bonzini
  2020-02-28 14:59     ` Sean Christopherson
  0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2020-02-28 10:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On 27/02/20 23:30, Sean Christopherson wrote:
> -void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
> +void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs, bool in_use)
>  {
>  	vmcs_clear(loaded_vmcs->vmcs);
>  	if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
>  		vmcs_clear(loaded_vmcs->shadow_vmcs);
> +
> +	if (in_use) {
> +		list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
> +
> +		/*
> +		 * Ensure deleting loaded_vmcs from its current percpu list
> +		 * completes before setting loaded_vmcs->cpu to -1, otherwise
> +		 * a different cpu can see cpu == -1 first and add loaded_vmcs
> +		 * to its percpu list before it's deleted from this cpu's list.
> +		 * Pairs with the smp_rmb() in vmx_vcpu_load_vmcs().
> +		 */
> +		smp_wmb();
> +	}
> +

I'd like to avoid the new in_use argument and, also, I think it's a 
little bit nicer to always invoke the memory barrier.  Even though we 
use "asm volatile" for vmclear and therefore the compiler is already 
taken care of, in principle it's more correct to order the ->cpu write 
against vmclear's.

This gives the following patch on top:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c9d6152e7a4d..77a64110577b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -656,25 +656,24 @@ static int vmx_set_guest_msr(struct vcpu_vmx *vmx, struct shared_msr_entry *msr,
 	return ret;
 }
 
-void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs, bool in_use)
+void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
 {
 	vmcs_clear(loaded_vmcs->vmcs);
 	if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
 		vmcs_clear(loaded_vmcs->shadow_vmcs);
 
-	if (in_use) {
+	if (!list_empty(&loaded_vmcs->loaded_vmcss_on_cpu_link))
 		list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
 
-		/*
-		 * Ensure deleting loaded_vmcs from its current percpu list
-		 * completes before setting loaded_vmcs->vcpu to -1, otherwise
-		 * a different cpu can see vcpu == -1 first and add loaded_vmcs
-		 * to its percpu list before it's deleted from this cpu's list.
-		 * Pairs with the smp_rmb() in vmx_vcpu_load_vmcs().
-		 */
-		smp_wmb();
-	}
-
+	/*
+	 * Ensure all writes to loaded_vmcs, including deleting it
+	 * from its current percpu list, complete before setting
+	 * loaded_vmcs->cpu to -1; otherwise, a different cpu can
+	 * see cpu == -1 first and add loaded_vmcs to its percpu
+	 * list before it's deleted from this cpu's list.  Pairs
+	 * with the smp_rmb() in vmx_vcpu_load_vmcs().
+	 */
+	smp_wmb();
 	loaded_vmcs->cpu = -1;
 	loaded_vmcs->launched = 0;
 }
@@ -701,7 +700,7 @@ static void __loaded_vmcs_clear(void *arg)
 	if (per_cpu(current_vmcs, cpu) == loaded_vmcs->vmcs)
 		per_cpu(current_vmcs, cpu) = NULL;
 
-	loaded_vmcs_init(loaded_vmcs, true);
+	loaded_vmcs_init(loaded_vmcs);
 }
 
 void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
@@ -2568,7 +2567,8 @@ int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
 
 	loaded_vmcs->shadow_vmcs = NULL;
 	loaded_vmcs->hv_timer_soft_disabled = false;
-	loaded_vmcs_init(loaded_vmcs, false);
+	INIT_LIST_HEAD(&loaded_vmcs->loaded_vmcss_on_cpu_link);
+	loaded_vmcs_init(loaded_vmcs);
 
 	if (cpu_has_vmx_msr_bitmap()) {
 		loaded_vmcs->msr_bitmap = (unsigned long *)

Paolo



* Re: [PATCH 0/3] KVM: VMX: Fix for kexec VMCLEAR and VMXON cleanup
  2020-02-27 22:30 [PATCH 0/3] KVM: VMX: Fix for kexec VMCLEAR and VMXON cleanup Sean Christopherson
                   ` (2 preceding siblings ...)
  2020-02-27 22:30 ` [PATCH 3/3] KVM: VMX: Make loaded_vmcs_init() a static function Sean Christopherson
@ 2020-02-28 11:02 ` Paolo Bonzini
  3 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2020-02-28 11:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On 27/02/20 23:30, Sean Christopherson wrote:
> Patch 1 fixes a theoretical bug where a crashdump NMI that arrives
> while KVM is messing with the percpu VMCS list would result in one or more
> VMCSes not being cleared, potentially causing memory corruption in the new
> kexec'd kernel.
> 
> [...]
> 
> Sean Christopherson (3):
>   KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
>   KVM: VMX: Gracefully handle faults on VMXON
>   KVM: VMX: Make loaded_vmcs_init() a static function
> 

Queued, thanks (but still awaiting "counter-review" on the patch 1
suggestions).

Paolo



* Re: [PATCH 1/3] KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
  2020-02-28 10:16   ` Paolo Bonzini
@ 2020-02-28 14:59     ` Sean Christopherson
  2020-03-21 14:55       ` Sean Christopherson
  0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2020-02-28 14:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On Fri, Feb 28, 2020 at 11:16:10AM +0100, Paolo Bonzini wrote:
> On 27/02/20 23:30, Sean Christopherson wrote:
> > -void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
> > +void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs, bool in_use)
> >  {
> >  	vmcs_clear(loaded_vmcs->vmcs);
> >  	if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
> >  		vmcs_clear(loaded_vmcs->shadow_vmcs);
> > +
> > +	if (in_use) {
> > +		list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
> > +
> > +		/*
> > +		 * Ensure deleting loaded_vmcs from its current percpu list
> > +		 * completes before setting loaded_vmcs->vcpu to -1, otherwise
> > +		 * a different cpu can see vcpu == -1 first and add loaded_vmcs
> > +		 * to its percpu list before it's deleted from this cpu's list.
> > +		 * Pairs with the smp_rmb() in vmx_vcpu_load_vmcs().
> > +		 */
> > +		smp_wmb();
> > +	}
> > +
> 
> I'd like to avoid the new in_use argument and, also, I think it's a
> little bit nicer to always invoke the memory barrier.  Even though we 
> use "asm volatile" for vmclear and therefore the compiler is already 
> taken care of, in principle it's more correct to order the ->cpu write 
> against vmclear's.

Completely agree on all points.  I wanted to avoid in_use as well, but it
didn't occur to me to use list_empty()...

> This gives the following patch on top:

Looks good.


* Re: [PATCH 1/3] KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
  2020-02-28 14:59     ` Sean Christopherson
@ 2020-03-21 14:55       ` Sean Christopherson
  0 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2020-03-21 14:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On Fri, Feb 28, 2020 at 06:59:43AM -0800, Sean Christopherson wrote:
> On Fri, Feb 28, 2020 at 11:16:10AM +0100, Paolo Bonzini wrote:
> > On 27/02/20 23:30, Sean Christopherson wrote:
> > > -void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
> > > +void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs, bool in_use)
> > >  {
> > >  	vmcs_clear(loaded_vmcs->vmcs);
> > >  	if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
> > >  		vmcs_clear(loaded_vmcs->shadow_vmcs);
> > > +
> > > +	if (in_use) {
> > > +		list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
> > > +
> > > +		/*
> > > +		 * Ensure deleting loaded_vmcs from its current percpu list
> > > +		 * completes before setting loaded_vmcs->cpu to -1, otherwise
> > > +		 * a different cpu can see cpu == -1 first and add loaded_vmcs
> > > +		 * to its percpu list before it's deleted from this cpu's list.
> > > +		 * Pairs with the smp_rmb() in vmx_vcpu_load_vmcs().
> > > +		 */
> > > +		smp_wmb();
> > > +	}
> > > +
> > 
> > I'd like to avoid the new in_use argument and, also, I think it's a
> > little bit nicer to always invoke the memory barrier.  Even though we 
> > use "asm volatile" for vmclear and therefore the compiler is already 
> > taken care of, in principle it's more correct to order the ->cpu write 
> > against vmclear's.
> 
> Completely agree on all points.  I wanted to avoid in_use as well, but it
> didn't occur to me to use list_empty()...
> 
> > This gives the following patch on top:
> 
> Looks good.
> 
> > [...]
> >  
> > -void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs, bool in_use)
> > +void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
> >  {
> >  	vmcs_clear(loaded_vmcs->vmcs);
> >  	if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
> >  		vmcs_clear(loaded_vmcs->shadow_vmcs);
> >  
> > -	if (in_use) {
> > +	if (!list_empty(&loaded_vmcs->loaded_vmcss_on_cpu_link))

Circling back to this, an even better option is to drop loaded_vmcs_init()
and open code the required pieces in __loaded_vmcs_clear() and
alloc_loaded_vmcs().  Those are the only two callers, and the latter
doesn't need to VMCLEAR the shadow VMCS (guaranteed to be NULL) and
obviously doesn't need the list manipulation or the smp_wmb().
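
E.g. a rough sketch (untested) of the open-coded __loaded_vmcs_clear(),
assuming the list/barrier logic is otherwise unchanged:

	static void __loaded_vmcs_clear(void *arg)
	{
		struct loaded_vmcs *loaded_vmcs = arg;
		int cpu = raw_smp_processor_id();

		if (loaded_vmcs->cpu != cpu)
			return; /* vcpu migration can race with cpu offline */
		if (per_cpu(current_vmcs, cpu) == loaded_vmcs->vmcs)
			per_cpu(current_vmcs, cpu) = NULL;

		vmcs_clear(loaded_vmcs->vmcs);
		if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
			vmcs_clear(loaded_vmcs->shadow_vmcs);

		list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);

		/*
		 * Order the list_del() before writing cpu = -1; pairs with
		 * the smp_rmb() in vmx_vcpu_load_vmcs().
		 */
		smp_wmb();

		loaded_vmcs->cpu = -1;
		loaded_vmcs->launched = 0;
	}

alloc_loaded_vmcs() would then only need vmcs_clear(loaded_vmcs->vmcs),
plus the INIT_LIST_HEAD() added above, since its shadow VMCS is
guaranteed to be NULL and it isn't on any percpu list.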
