kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] KVM: Register cpuhp/syscore callbacks when enabling virt
@ 2024-04-25 23:39 Sean Christopherson
  2024-04-25 23:39 ` [PATCH 1/4] x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef Sean Christopherson
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-04-25 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel

Register KVM's cpuhp and syscore callbacks when enabling virtualization in
hardware, as the sole purpose of said callbacks is to disable and re-enable
virtualization as needed.

The primary motivation for this series is to simplify dealing with enabling
virtualization for Intel's TDX, which needs to temporarily enable virtualization
when kvm-intek.ko is loaded, i.e. long before the first VM is created.

That said, this is a nice cleanup on its own, assuming I haven't broken
something.  By registering the callbacks on-demand, the callbacks themselves
don't need to check kvm_usage_count, because their very existence implies a
non-zero count.

The meat is in patch 3, patches 1 and 2 are tangentially related x86
cleanups, patch 4 renames helpers to further pave the way for TDX.

Moderately well tested on x86, though I haven't (yet) done due dilegence on
the suspend/resume and cphup paths.  I ran selftests on arm64 and nothing
exploded.
 
Sean Christopherson (4):
  x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef
  KVM: x86: Register emergency virt callback in common code, via
    kvm_x86_ops
  KVM: Register cpuhp and syscore callbacks when enabling hardware
  KVM: Rename functions related to enabling virtualization hardware

 arch/x86/include/asm/kvm_host.h |   3 +
 arch/x86/include/asm/reboot.h   |   2 +-
 arch/x86/kvm/svm/svm.c          |   5 +-
 arch/x86/kvm/vmx/main.c         |   2 +
 arch/x86/kvm/vmx/vmx.c          |   6 +-
 arch/x86/kvm/vmx/x86_ops.h      |   1 +
 arch/x86/kvm/x86.c              |   5 +
 virt/kvm/kvm_main.c             | 186 +++++++++++---------------------
 8 files changed, 78 insertions(+), 132 deletions(-)


base-commit: 7b076c6a308ec5bce9fc96e2935443ed228b9148
-- 
2.44.0.769.g3c40516874-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef
  2024-04-25 23:39 [PATCH 0/4] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
@ 2024-04-25 23:39 ` Sean Christopherson
  2024-04-25 23:39 ` [PATCH 2/4] KVM: x86: Register emergency virt callback in common code, via kvm_x86_ops Sean Christopherson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-04-25 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel

Define cpu_emergency_virt_cb even if the kernel is being built without KVM
support so that KVM can reference the typedef in asm/kvm_host.h without
needing yet more #ifdefs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/reboot.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 6536873f8fc0..d0ef2a678d66 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,8 +25,8 @@ void __noreturn machine_real_restart(unsigned int type);
 #define MRR_BIOS	0
 #define MRR_APM		1
 
-#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
 typedef void (cpu_emergency_virt_cb)(void);
+#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
 void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
 void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback);
 void cpu_emergency_disable_virtualization(void);
-- 
2.44.0.769.g3c40516874-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] KVM: x86: Register emergency virt callback in common code, via kvm_x86_ops
  2024-04-25 23:39 [PATCH 0/4] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
  2024-04-25 23:39 ` [PATCH 1/4] x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef Sean Christopherson
@ 2024-04-25 23:39 ` Sean Christopherson
  2024-04-26  8:52   ` Chao Gao
  2024-04-25 23:39 ` [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware Sean Christopherson
  2024-04-25 23:39 ` [PATCH 4/4] KVM: Rename functions related to enabling virtualization hardware Sean Christopherson
  3 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2024-04-25 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel

Move the registration of KVM's "disable virtualization in an emergency"
callbacks into common KVM code, using a pointer in kvm_x86_ops to provide
each vendor's callback.

There is no reason to force vendor code to do the registration, and the
callback should be installed when kvm_x86_ops themselves are ready, i.e.
when it's possible for .hardware_enabled() to be invoked.  E.g. TDX needs
to do VMXON during module initialization, so registering the callback as
part of early setup means one less thing to mess up.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 3 +++
 arch/x86/kvm/svm/svm.c          | 5 +----
 arch/x86/kvm/vmx/main.c         | 2 ++
 arch/x86/kvm/vmx/vmx.c          | 6 +-----
 arch/x86/kvm/vmx/x86_ops.h      | 1 +
 arch/x86/kvm/x86.c              | 5 +++++
 6 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1d13e3cd1dc5..d64a51da150c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -36,6 +36,7 @@
 #include <asm/kvm_page_track.h>
 #include <asm/kvm_vcpu_regs.h>
 #include <asm/hyperv-tlfs.h>
+#include <asm/reboot.h>
 
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
@@ -1606,6 +1607,8 @@ struct kvm_x86_ops {
 
 	int (*hardware_enable)(void);
 	void (*hardware_disable)(void);
+	cpu_emergency_virt_cb *emergency_disable;
+
 	void (*hardware_unsetup)(void);
 	bool (*has_emulated_msr)(struct kvm *kvm, u32 index);
 	void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0f3b59da0d4a..3b54243d0c22 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4919,6 +4919,7 @@ static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
 static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
+	.emergency_disable = svm_emergency_disable,
 	.check_processor_compatibility = svm_check_processor_compat,
 
 	.hardware_unsetup = svm_hardware_unsetup,
@@ -5352,8 +5353,6 @@ static struct kvm_x86_init_ops svm_init_ops __initdata = {
 static void __svm_exit(void)
 {
 	kvm_x86_vendor_exit();
-
-	cpu_emergency_unregister_virt_callback(svm_emergency_disable);
 }
 
 static int __init svm_init(void)
@@ -5369,8 +5368,6 @@ static int __init svm_init(void)
 	if (r)
 		return r;
 
-	cpu_emergency_register_virt_callback(svm_emergency_disable);
-
 	/*
 	 * Common KVM initialization _must_ come last, after this, /dev/kvm is
 	 * exposed to userspace!
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 7c546ad3e4c9..3f423afc263b 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -24,6 +24,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.hardware_enable = vmx_hardware_enable,
 	.hardware_disable = vmx_hardware_disable,
+	.emergency_disable = vmx_emergency_disable,
+
 	.has_emulated_msr = vmx_has_emulated_msr,
 
 	.vm_size = sizeof(struct kvm_vmx),
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f10b5f8f364b..19bc62b60fac 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -753,7 +753,7 @@ static int kvm_cpu_vmxoff(void)
 	return -EIO;
 }
 
-static void vmx_emergency_disable(void)
+void vmx_emergency_disable(void)
 {
 	int cpu = raw_smp_processor_id();
 	struct loaded_vmcs *v;
@@ -8562,8 +8562,6 @@ static void __vmx_exit(void)
 {
 	allow_smaller_maxphyaddr = false;
 
-	cpu_emergency_unregister_virt_callback(vmx_emergency_disable);
-
 	vmx_cleanup_l1d_flush();
 }
 
@@ -8610,8 +8608,6 @@ static int __init vmx_init(void)
 		pi_init_cpu(cpu);
 	}
 
-	cpu_emergency_register_virt_callback(vmx_emergency_disable);
-
 	vmx_check_vmcs12_offsets();
 
 	/*
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 502704596c83..afddfe3747dd 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -15,6 +15,7 @@ void vmx_hardware_unsetup(void);
 int vmx_check_processor_compat(void);
 int vmx_hardware_enable(void);
 void vmx_hardware_disable(void);
+void vmx_emergency_disable(void);
 int vmx_vm_init(struct kvm *kvm);
 void vmx_vm_destroy(struct kvm *kvm);
 int vmx_vcpu_precreate(struct kvm *kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9ef1fa4b90b..12e88aa2cca2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9797,6 +9797,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	kvm_ops_update(ops);
 
+	cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable);
+
 	for_each_online_cpu(cpu) {
 		smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &r, 1);
 		if (r < 0)
@@ -9847,6 +9849,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	return 0;
 
 out_unwind_ops:
+	cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable);
 	kvm_x86_ops.hardware_enable = NULL;
 	static_call(kvm_x86_hardware_unsetup)();
 out_mmu_exit:
@@ -9887,6 +9890,8 @@ void kvm_x86_vendor_exit(void)
 	static_key_deferred_flush(&kvm_xen_enabled);
 	WARN_ON(static_branch_unlikely(&kvm_xen_enabled.key));
 #endif
+	cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable);
+
 	mutex_lock(&vendor_module_lock);
 	kvm_x86_ops.hardware_enable = NULL;
 	mutex_unlock(&vendor_module_lock);
-- 
2.44.0.769.g3c40516874-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware
  2024-04-25 23:39 [PATCH 0/4] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
  2024-04-25 23:39 ` [PATCH 1/4] x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef Sean Christopherson
  2024-04-25 23:39 ` [PATCH 2/4] KVM: x86: Register emergency virt callback in common code, via kvm_x86_ops Sean Christopherson
@ 2024-04-25 23:39 ` Sean Christopherson
  2024-04-26  8:32   ` Chao Gao
  2024-04-25 23:39 ` [PATCH 4/4] KVM: Rename functions related to enabling virtualization hardware Sean Christopherson
  3 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2024-04-25 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel

Register KVM's cpuhp and syscore callback when enabling virtualization
in hardware instead of registering the callbacks during initialization,
and let the CPU up/down framework invoke the inner enable/disable
functions.  Registering the callbacks during initialization makes things
more complex than they need to be, as KVM needs to be very careful about
handling races between enabling CPUs being onlined/offlined and hardware
being enabled/disabled.

Intel TDX support will require KVM to enable virtualization during KVM
initialization, i.e. will add another wrinkle to things, at which point
sorting out the potential races with kvm_usage_count would become even
more complex.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/kvm_main.c | 170 +++++++++++++++-----------------------------
 1 file changed, 56 insertions(+), 114 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 10dad093b7fb..ad1b5a9e86d4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5490,7 +5490,7 @@ EXPORT_SYMBOL_GPL(kvm_rebooting);
 static DEFINE_PER_CPU(bool, hardware_enabled);
 static int kvm_usage_count;
 
-static int __hardware_enable_nolock(void)
+static int hardware_enable_nolock(void)
 {
 	if (__this_cpu_read(hardware_enabled))
 		return 0;
@@ -5505,34 +5505,18 @@ static int __hardware_enable_nolock(void)
 	return 0;
 }
 
-static void hardware_enable_nolock(void *failed)
-{
-	if (__hardware_enable_nolock())
-		atomic_inc(failed);
-}
-
 static int kvm_online_cpu(unsigned int cpu)
 {
-	int ret = 0;
-
 	/*
 	 * Abort the CPU online process if hardware virtualization cannot
 	 * be enabled. Otherwise running VMs would encounter unrecoverable
 	 * errors when scheduled to this CPU.
 	 */
-	mutex_lock(&kvm_lock);
-	if (kvm_usage_count)
-		ret = __hardware_enable_nolock();
-	mutex_unlock(&kvm_lock);
-	return ret;
+	return hardware_enable_nolock();
 }
 
-static void hardware_disable_nolock(void *junk)
+static void hardware_disable_nolock(void *ign)
 {
-	/*
-	 * Note, hardware_disable_all_nolock() tells all online CPUs to disable
-	 * hardware, not just CPUs that successfully enabled hardware!
-	 */
 	if (!__this_cpu_read(hardware_enabled))
 		return;
 
@@ -5543,78 +5527,10 @@ static void hardware_disable_nolock(void *junk)
 
 static int kvm_offline_cpu(unsigned int cpu)
 {
-	mutex_lock(&kvm_lock);
-	if (kvm_usage_count)
-		hardware_disable_nolock(NULL);
-	mutex_unlock(&kvm_lock);
+	hardware_disable_nolock(NULL);
 	return 0;
 }
 
-static void hardware_disable_all_nolock(void)
-{
-	BUG_ON(!kvm_usage_count);
-
-	kvm_usage_count--;
-	if (!kvm_usage_count)
-		on_each_cpu(hardware_disable_nolock, NULL, 1);
-}
-
-static void hardware_disable_all(void)
-{
-	cpus_read_lock();
-	mutex_lock(&kvm_lock);
-	hardware_disable_all_nolock();
-	mutex_unlock(&kvm_lock);
-	cpus_read_unlock();
-}
-
-static int hardware_enable_all(void)
-{
-	atomic_t failed = ATOMIC_INIT(0);
-	int r;
-
-	/*
-	 * Do not enable hardware virtualization if the system is going down.
-	 * If userspace initiated a forced reboot, e.g. reboot -f, then it's
-	 * possible for an in-flight KVM_CREATE_VM to trigger hardware enabling
-	 * after kvm_reboot() is called.  Note, this relies on system_state
-	 * being set _before_ kvm_reboot(), which is why KVM uses a syscore ops
-	 * hook instead of registering a dedicated reboot notifier (the latter
-	 * runs before system_state is updated).
-	 */
-	if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
-	    system_state == SYSTEM_RESTART)
-		return -EBUSY;
-
-	/*
-	 * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
-	 * is called, and so on_each_cpu() between them includes the CPU that
-	 * is being onlined.  As a result, hardware_enable_nolock() may get
-	 * invoked before kvm_online_cpu(), which also enables hardware if the
-	 * usage count is non-zero.  Disable CPU hotplug to avoid attempting to
-	 * enable hardware multiple times.
-	 */
-	cpus_read_lock();
-	mutex_lock(&kvm_lock);
-
-	r = 0;
-
-	kvm_usage_count++;
-	if (kvm_usage_count == 1) {
-		on_each_cpu(hardware_enable_nolock, &failed, 1);
-
-		if (atomic_read(&failed)) {
-			hardware_disable_all_nolock();
-			r = -EBUSY;
-		}
-	}
-
-	mutex_unlock(&kvm_lock);
-	cpus_read_unlock();
-
-	return r;
-}
-
 static void kvm_shutdown(void)
 {
 	/*
@@ -5646,8 +5562,7 @@ static int kvm_suspend(void)
 	lockdep_assert_not_held(&kvm_lock);
 	lockdep_assert_irqs_disabled();
 
-	if (kvm_usage_count)
-		hardware_disable_nolock(NULL);
+	hardware_disable_nolock(NULL);
 	return 0;
 }
 
@@ -5656,8 +5571,7 @@ static void kvm_resume(void)
 	lockdep_assert_not_held(&kvm_lock);
 	lockdep_assert_irqs_disabled();
 
-	if (kvm_usage_count)
-		WARN_ON_ONCE(__hardware_enable_nolock());
+	WARN_ON_ONCE(hardware_enable_nolock());
 }
 
 static struct syscore_ops kvm_syscore_ops = {
@@ -5665,6 +5579,54 @@ static struct syscore_ops kvm_syscore_ops = {
 	.resume = kvm_resume,
 	.shutdown = kvm_shutdown,
 };
+
+static int hardware_enable_all(void)
+{
+	int r;
+
+	guard(mutex)(&kvm_lock);
+
+	if (kvm_usage_count++)
+		return 0;
+
+	r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
+			      kvm_online_cpu, kvm_offline_cpu);
+	if (r)
+		return r;
+
+	register_syscore_ops(&kvm_syscore_ops);
+
+	/*
+	 * Undo virtualization enabling and bail if the system is going down.
+	 * If userspace initiated a forced reboot, e.g. reboot -f, then it's
+	 * possible for an in-flight operation to enable virtualization after
+	 * syscore_shutdown() is called, i.e. without kvm_shutdown() being
+	 * invoked.  Note, this relies on system_state being set _before_
+	 * kvm_shutdown(), e.g. to ensure either kvm_shutdown() is invoked
+	 * or this CPU observes the impending shutdown.  Which is why KVM uses
+	 * a syscore ops hook instead of registering a dedicated reboot
+	 * notifier (the latter runs before system_state is updated).
+	 */
+	if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
+	    system_state == SYSTEM_RESTART) {
+		unregister_syscore_ops(&kvm_syscore_ops);
+		cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+static void hardware_disable_all(void)
+{
+	guard(mutex)(&kvm_lock);
+
+	if (--kvm_usage_count)
+		return;
+
+	unregister_syscore_ops(&kvm_syscore_ops);
+	cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
+}
 #else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
 static int hardware_enable_all(void)
 {
@@ -6370,15 +6332,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	int r;
 	int cpu;
 
-#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
-	r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
-				      kvm_online_cpu, kvm_offline_cpu);
-	if (r)
-		return r;
-
-	register_syscore_ops(&kvm_syscore_ops);
-#endif
-
 	/* A kmem cache lets us meet the alignment requirements of fx_save. */
 	if (!vcpu_align)
 		vcpu_align = __alignof__(struct kvm_vcpu);
@@ -6389,10 +6342,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 					   offsetofend(struct kvm_vcpu, stats_id)
 					   - offsetof(struct kvm_vcpu, arch),
 					   NULL);
-	if (!kvm_vcpu_cache) {
-		r = -ENOMEM;
-		goto err_vcpu_cache;
-	}
+	if (!kvm_vcpu_cache)
+		return -ENOMEM;
 
 	for_each_possible_cpu(cpu) {
 		if (!alloc_cpumask_var_node(&per_cpu(cpu_kick_mask, cpu),
@@ -6449,11 +6400,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	for_each_possible_cpu(cpu)
 		free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
 	kmem_cache_destroy(kvm_vcpu_cache);
-err_vcpu_cache:
-#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
-	unregister_syscore_ops(&kvm_syscore_ops);
-	cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
-#endif
 	return r;
 }
 EXPORT_SYMBOL_GPL(kvm_init);
@@ -6475,10 +6421,6 @@ void kvm_exit(void)
 	kmem_cache_destroy(kvm_vcpu_cache);
 	kvm_vfio_ops_exit();
 	kvm_async_pf_deinit();
-#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
-	unregister_syscore_ops(&kvm_syscore_ops);
-	cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
-#endif
 	kvm_irqfd_exit();
 }
 EXPORT_SYMBOL_GPL(kvm_exit);
-- 
2.44.0.769.g3c40516874-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] KVM: Rename functions related to enabling virtualization hardware
  2024-04-25 23:39 [PATCH 0/4] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
                   ` (2 preceding siblings ...)
  2024-04-25 23:39 ` [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware Sean Christopherson
@ 2024-04-25 23:39 ` Sean Christopherson
  3 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-04-25 23:39 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini; +Cc: kvm, linux-kernel

Rename the various functions that enable virtualization to prepare for
upcoming changes, and to clean up artifacts of KVM's previous behavior,
which required manually juggling locks around kvm_usage_count.

Drop the "nolock" qualifier from per-CPU functions now that there are no
"nolock" implementations of the "all" variants, i.e. now that calling a
non-nolock function from a nolock function isn't confusing (unlike this
sentence).

Drop "all" from the outer helpers as they no longer manually iterate
over all CPUs, and because it might not be obvious what "all" refers to.
Instead, use double-underscores to communicate that the per-CPU functions
are helpers to the outer APIs.

Prepend "kvm" to all functions to prepare for exposing the outermost
enable/disable APIs to external users (Intel's TDX needs to enable
virtualization during KVM initialization).

Lastly, use "virtualization" instead of "hardware", because while the
functions do enable virtualization in hardware, there are a _lot_ of
things that KVM enables in hardware.

E.g. calling kvm_hardware_enable() or kvm_hardware_enable_all() from
TDX code is less intuitive than kvm_enable_virtualization().

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/kvm_main.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ad1b5a9e86d4..7579bda0e310 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -139,8 +139,8 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
 #define KVM_COMPAT(c)	.compat_ioctl	= kvm_no_compat_ioctl,	\
 			.open		= kvm_no_compat_open
 #endif
-static int hardware_enable_all(void);
-static void hardware_disable_all(void);
+static int kvm_enable_virtualization(void);
+static void kvm_disable_virtualization(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 
@@ -1213,7 +1213,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	if (r)
 		goto out_err_no_arch_destroy_vm;
 
-	r = hardware_enable_all();
+	r = kvm_enable_virtualization();
 	if (r)
 		goto out_err_no_disable;
 
@@ -1256,7 +1256,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 		mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
 #endif
 out_err_no_mmu_notifier:
-	hardware_disable_all();
+	kvm_disable_virtualization();
 out_err_no_disable:
 	kvm_arch_destroy_vm(kvm);
 out_err_no_arch_destroy_vm:
@@ -1345,7 +1345,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 #endif
 	kvm_arch_free_vm(kvm);
 	preempt_notifier_dec();
-	hardware_disable_all();
+	kvm_disable_virtualization();
 	mmdrop(mm);
 }
 
@@ -5490,7 +5490,7 @@ EXPORT_SYMBOL_GPL(kvm_rebooting);
 static DEFINE_PER_CPU(bool, hardware_enabled);
 static int kvm_usage_count;
 
-static int hardware_enable_nolock(void)
+static int __kvm_enable_virtualization(void)
 {
 	if (__this_cpu_read(hardware_enabled))
 		return 0;
@@ -5512,10 +5512,10 @@ static int kvm_online_cpu(unsigned int cpu)
 	 * be enabled. Otherwise running VMs would encounter unrecoverable
 	 * errors when scheduled to this CPU.
 	 */
-	return hardware_enable_nolock();
+	return __kvm_enable_virtualization();
 }
 
-static void hardware_disable_nolock(void *ign)
+static void __kvm_disable_virtualization(void *ign)
 {
 	if (!__this_cpu_read(hardware_enabled))
 		return;
@@ -5527,7 +5527,7 @@ static void hardware_disable_nolock(void *ign)
 
 static int kvm_offline_cpu(unsigned int cpu)
 {
-	hardware_disable_nolock(NULL);
+	__kvm_disable_virtualization(NULL);
 	return 0;
 }
 
@@ -5546,7 +5546,7 @@ static void kvm_shutdown(void)
 	 */
 	pr_info("kvm: exiting hardware virtualization\n");
 	kvm_rebooting = true;
-	on_each_cpu(hardware_disable_nolock, NULL, 1);
+	on_each_cpu(__kvm_disable_virtualization, NULL, 1);
 }
 
 static int kvm_suspend(void)
@@ -5562,7 +5562,7 @@ static int kvm_suspend(void)
 	lockdep_assert_not_held(&kvm_lock);
 	lockdep_assert_irqs_disabled();
 
-	hardware_disable_nolock(NULL);
+	__kvm_disable_virtualization(NULL);
 	return 0;
 }
 
@@ -5571,7 +5571,7 @@ static void kvm_resume(void)
 	lockdep_assert_not_held(&kvm_lock);
 	lockdep_assert_irqs_disabled();
 
-	WARN_ON_ONCE(hardware_enable_nolock());
+	WARN_ON_ONCE(__kvm_enable_virtualization());
 }
 
 static struct syscore_ops kvm_syscore_ops = {
@@ -5580,7 +5580,7 @@ static struct syscore_ops kvm_syscore_ops = {
 	.shutdown = kvm_shutdown,
 };
 
-static int hardware_enable_all(void)
+static int kvm_enable_virtualization(void)
 {
 	int r;
 
@@ -5617,7 +5617,7 @@ static int hardware_enable_all(void)
 	return 0;
 }
 
-static void hardware_disable_all(void)
+static void kvm_disable_virtualization(void)
 {
 	guard(mutex)(&kvm_lock);
 
@@ -5628,12 +5628,12 @@ static void hardware_disable_all(void)
 	cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
 }
 #else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
-static int hardware_enable_all(void)
+static int kvm_enable_virtualization(void)
 {
 	return 0;
 }
 
-static void hardware_disable_all(void)
+static void kvm_disable_virtualization(void)
 {
 
 }
-- 
2.44.0.769.g3c40516874-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware
  2024-04-25 23:39 ` [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware Sean Christopherson
@ 2024-04-26  8:32   ` Chao Gao
  2024-04-26 17:07     ` Sean Christopherson
  0 siblings, 1 reply; 9+ messages in thread
From: Chao Gao @ 2024-04-26  8:32 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

>+static int hardware_enable_all(void)
>+{
>+	int r;
>+
>+	guard(mutex)(&kvm_lock);
>+
>+	if (kvm_usage_count++)
>+		return 0;
>+
>+	r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
>+			      kvm_online_cpu, kvm_offline_cpu);

A subtle change is: cpuhp_setup_state() calls kvm_online_cpu() serially
on all CPUs. Previously, hardware enabling is done with on_each_cpu().
I assume performance isn't a concern here. Right?

>+	if (r)
>+		return r;

decrease kvm_usage_count on error?

>+
>+	register_syscore_ops(&kvm_syscore_ops);
>+
>+	/*
>+	 * Undo virtualization enabling and bail if the system is going down.
>+	 * If userspace initiated a forced reboot, e.g. reboot -f, then it's
>+	 * possible for an in-flight operation to enable virtualization after
>+	 * syscore_shutdown() is called, i.e. without kvm_shutdown() being
>+	 * invoked.  Note, this relies on system_state being set _before_
>+	 * kvm_shutdown(), e.g. to ensure either kvm_shutdown() is invoked
>+	 * or this CPU observes the impending shutdown.  Which is why KVM uses
>+	 * a syscore ops hook instead of registering a dedicated reboot
>+	 * notifier (the latter runs before system_state is updated).
>+	 */
>+	if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
>+	    system_state == SYSTEM_RESTART) {
>+		unregister_syscore_ops(&kvm_syscore_ops);
>+		cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
>+		return -EBUSY;

ditto

>+	}
>+
>+	return 0;
>+}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Register emergency virt callback in common code, via kvm_x86_ops
  2024-04-25 23:39 ` [PATCH 2/4] KVM: x86: Register emergency virt callback in common code, via kvm_x86_ops Sean Christopherson
@ 2024-04-26  8:52   ` Chao Gao
  2024-04-26 17:08     ` Sean Christopherson
  0 siblings, 1 reply; 9+ messages in thread
From: Chao Gao @ 2024-04-26  8:52 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel

>diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
>index 502704596c83..afddfe3747dd 100644
>--- a/arch/x86/kvm/vmx/x86_ops.h
>+++ b/arch/x86/kvm/vmx/x86_ops.h
>@@ -15,6 +15,7 @@ void vmx_hardware_unsetup(void);
> int vmx_check_processor_compat(void);
> int vmx_hardware_enable(void);
> void vmx_hardware_disable(void);
>+void vmx_emergency_disable(void);
> int vmx_vm_init(struct kvm *kvm);
> void vmx_vm_destroy(struct kvm *kvm);
> int vmx_vcpu_precreate(struct kvm *kvm);
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index e9ef1fa4b90b..12e88aa2cca2 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -9797,6 +9797,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> 
> 	kvm_ops_update(ops);
> 
>+	cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable);
>+

vmx_emergency_disable() accesses loaded_vmcss_on_cpu but now it may be called
before loaded_vmcss_on_cpu is initialized. This may be not a problem for now
given the check for X86_CR4_VMXE  in vmx_emergency_disable(). But relying on
that check is fragile. I think it is better to apply the patch below from Isaku
before this patch.

https://lore.kernel.org/kvm/c1b7f0e5c2476f9f565acda5c1e746b8d181499b.1708933498.git.isaku.yamahata@intel.com/

> 	for_each_online_cpu(cpu) {
> 		smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &r, 1);
> 		if (r < 0)
>@@ -9847,6 +9849,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> 	return 0;
> 
> out_unwind_ops:
>+	cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable);
> 	kvm_x86_ops.hardware_enable = NULL;
> 	static_call(kvm_x86_hardware_unsetup)();
> out_mmu_exit:
>@@ -9887,6 +9890,8 @@ void kvm_x86_vendor_exit(void)
> 	static_key_deferred_flush(&kvm_xen_enabled);
> 	WARN_ON(static_branch_unlikely(&kvm_xen_enabled.key));
> #endif
>+	cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable);
>+
> 	mutex_lock(&vendor_module_lock);
> 	kvm_x86_ops.hardware_enable = NULL;
> 	mutex_unlock(&vendor_module_lock);
>-- 
>2.44.0.769.g3c40516874-goog
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware
  2024-04-26  8:32   ` Chao Gao
@ 2024-04-26 17:07     ` Sean Christopherson
  0 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-04-26 17:07 UTC (permalink / raw)
  To: Chao Gao; +Cc: Paolo Bonzini, kvm, linux-kernel

On Fri, Apr 26, 2024, Chao Gao wrote:
> >+static int hardware_enable_all(void)
> >+{
> >+	int r;
> >+
> >+	guard(mutex)(&kvm_lock);
> >+
> >+	if (kvm_usage_count++)
> >+		return 0;
> >+
> >+	r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
> >+			      kvm_online_cpu, kvm_offline_cpu);
> 
> A subtle change is: cpuhp_setup_state() calls kvm_online_cpu() serially
> on all CPUs. Previously, hardware enabling is done with on_each_cpu().

Ooh, I hadn't considered that side effect.

> I assume performance isn't a concern here. Right?

Hmm, performance isn't critical, but I wouldn't be at all surprised if it's
noticeable and problematic for large systems.  Assuming we do end up going this
route, maybe we can "solve" this by documenting KVM's behavior, e.g. so that if
userspace cares about the latency of VM creation, it can that fudge around the
potential latency problem by creating a dummy VM.  Hrm, that's pretty gross though.

Oh!  Hah!  What if we add a module param to let userspace force virtualization to
be enabled when KVM is loaded?  And then kvm_enable_virtualization() doesn't even
need to be exported/public, because KVM can simply require enable_virt_at_load
(or whatever is a good name) to be %true in order to enable TDX.

I don't think there's any real danger for abuse, e.g. *unprivileged* userspace
can effectively do this already by creating a dummy VM.

Am I missing something?  This seems too easy.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7579bda0e310..1bfbb612a98f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -96,6 +96,9 @@ unsigned int halt_poll_ns_shrink;
 module_param(halt_poll_ns_shrink, uint, 0644);
 EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
 
+static bool enable_virt_at_load;
+module_param(enable_virt_at_load, uint, 0444);
+
 /*
  * Ordering of locks:
  *
@@ -6377,6 +6380,12 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 
        kvm_gmem_init(module);
 
+       if (enable_virt_at_load) {
+               r = kvm_enable_virtualization();
+               if (r)
+                       goto err_virt;
+       }
+
        /*
         * Registration _must_ be the very last thing done, as this exposes
         * /dev/kvm to userspace, i.e. all infrastructure must be setup!
@@ -6390,6 +6399,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
        return 0;
 
 err_register:
+       kvm_disable_virtualization();
+err_virt:
        kvm_vfio_ops_exit();
 err_vfio:
        kvm_async_pf_deinit();
@@ -6415,6 +6426,11 @@ void kvm_exit(void)
         */
        misc_deregister(&kvm_dev);
 
+       if (enable_virt_at_load) {
+               kvm_disable_virtualization();
+               BUG_ON(kvm_usage_count);
+       }
+
        debugfs_remove_recursive(kvm_debugfs_dir);
        for_each_possible_cpu(cpu)
                free_cpumask_var(per_cpu(cpu_kick_mask, cpu));

> >+	if (r)
> >+		return r;
> 
> decrease kvm_usage_count on error?

/facepalm, yes.

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Register emergency virt callback in common code, via kvm_x86_ops
  2024-04-26  8:52   ` Chao Gao
@ 2024-04-26 17:08     ` Sean Christopherson
  0 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-04-26 17:08 UTC (permalink / raw)
  To: Chao Gao; +Cc: Paolo Bonzini, kvm, linux-kernel

On Fri, Apr 26, 2024, Chao Gao wrote:
> >diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> >index 502704596c83..afddfe3747dd 100644
> >--- a/arch/x86/kvm/vmx/x86_ops.h
> >+++ b/arch/x86/kvm/vmx/x86_ops.h
> >@@ -15,6 +15,7 @@ void vmx_hardware_unsetup(void);
> > int vmx_check_processor_compat(void);
> > int vmx_hardware_enable(void);
> > void vmx_hardware_disable(void);
> >+void vmx_emergency_disable(void);
> > int vmx_vm_init(struct kvm *kvm);
> > void vmx_vm_destroy(struct kvm *kvm);
> > int vmx_vcpu_precreate(struct kvm *kvm);
> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >index e9ef1fa4b90b..12e88aa2cca2 100644
> >--- a/arch/x86/kvm/x86.c
> >+++ b/arch/x86/kvm/x86.c
> >@@ -9797,6 +9797,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> > 
> > 	kvm_ops_update(ops);
> > 
> >+	cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable);
> >+
> 
> vmx_emergency_disable() accesses loaded_vmcss_on_cpu but now it may be called
> before loaded_vmcss_on_cpu is initialized. This may be not a problem for now
> given the check for X86_CR4_VMXE  in vmx_emergency_disable(). But relying on
> that check is fragile. I think it is better to apply the patch below from Isaku
> before this patch.
> 
> https://lore.kernel.org/kvm/c1b7f0e5c2476f9f565acda5c1e746b8d181499b.1708933498.git.isaku.yamahata@intel.com/

Agreed, good eyeballs, and thanks for the reviews!

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-04-26 17:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-25 23:39 [PATCH 0/4] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
2024-04-25 23:39 ` [PATCH 1/4] x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef Sean Christopherson
2024-04-25 23:39 ` [PATCH 2/4] KVM: x86: Register emergency virt callback in common code, via kvm_x86_ops Sean Christopherson
2024-04-26  8:52   ` Chao Gao
2024-04-26 17:08     ` Sean Christopherson
2024-04-25 23:39 ` [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware Sean Christopherson
2024-04-26  8:32   ` Chao Gao
2024-04-26 17:07     ` Sean Christopherson
2024-04-25 23:39 ` [PATCH 4/4] KVM: Rename functions related to enabling virtualization hardware Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).