* [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features
@ 2024-04-04 12:13 Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 01/17] KVM: SVM: Invert handling of SEV and SEV_ES feature flags Paolo Bonzini
                   ` (16 more replies)
  0 siblings, 17 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata, Dave Hansen

This is the same as v4, except for the following minor changes:

- moving the KVM_X86_SEV_VMSA_FEATURES attribute to a
  separate group, KVM_X86_GRP_SEV [Isaku]

- as part of the previous change, retroactively define group 0
  as "KVM_X86_GRP_SYSTEM"

- squashing in the "fixup! KVM: SEV: sync FPU and AVX state at
  LAUNCH_UPDATE_VMSA time" patch

- disabling FPU and AVX sync for the old-style KVM_SEV_ES_INIT
  ioctl [Michael]

- adding an fstp instruction to the new test case, in order to
  keep the x87 stack balanced (just for cleanliness/paranoia;
  see the sketch below)
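
For reference, a minimal sketch (not the selftest's actual code) of
the stack-balancing concern: each x87 load pushes a slot onto the
eight-entry register stack, so it is paired with a store-and-pop.

    double in = 1.0, out;

    asm volatile("fldl %1\n\t"    /* push 'in' onto the x87 stack */
                 "fstpl %0"       /* store ST(0) to 'out' and pop */
                 : "=m"(out)
                 : "m"(in));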

Paolo Bonzini (16):
  KVM: SVM: Compile sev.c if and only if CONFIG_KVM_AMD_SEV=y
  KVM: x86: use u64_to_user_ptr()
  KVM: introduce new vendor op for KVM_GET_DEVICE_ATTR
  KVM: SEV: publish supported VMSA features
  KVM: SEV: store VMSA features in kvm_sev_info
  KVM: x86: add fields to struct kvm_arch for CoCo features
  KVM: x86: Add supported_vm_types to kvm_caps
  KVM: SEV: introduce to_kvm_sev_info
  KVM: SEV: define VM types for SEV and SEV-ES
  KVM: SEV: sync FPU and AVX state at LAUNCH_UPDATE_VMSA time
  KVM: SEV: introduce KVM_SEV_INIT2 operation
  KVM: SEV: allow SEV-ES DebugSwap again
  selftests: kvm: add tests for KVM_SEV_INIT2
  selftests: kvm: switch to using KVM_X86_*_VM
  selftests: kvm: split "launch" phase of SEV VM creation
  selftests: kvm: add test for transferring FPU state into VMSA

Sean Christopherson (1):
  KVM: SVM: Invert handling of SEV and SEV_ES feature flags

 Documentation/virt/kvm/api.rst                |   2 +
 .../virt/kvm/x86/amd-memory-encryption.rst    |  52 ++++-
 arch/x86/include/asm/fpu/api.h                |   3 +
 arch/x86/include/asm/kvm-x86-ops.h            |   1 +
 arch/x86/include/asm/kvm_host.h               |   8 +-
 arch/x86/include/uapi/asm/kvm.h               |  20 +-
 arch/x86/kernel/fpu/xstate.c                  |   1 +
 arch/x86/kernel/fpu/xstate.h                  |   2 -
 arch/x86/kvm/Makefile                         |   7 +-
 arch/x86/kvm/cpuid.c                          |   2 +-
 arch/x86/kvm/svm/sev.c                        | 190 ++++++++++++++----
 arch/x86/kvm/svm/svm.c                        |  27 ++-
 arch/x86/kvm/svm/svm.h                        |  54 +++--
 arch/x86/kvm/x86.c                            | 165 +++++++++------
 arch/x86/kvm/x86.h                            |   2 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/kvm_util_base.h     |  11 +-
 .../selftests/kvm/include/x86_64/processor.h  |   6 -
 .../selftests/kvm/include/x86_64/sev.h        |  19 +-
 tools/testing/selftests/kvm/lib/kvm_util.c    |   1 -
 .../selftests/kvm/lib/x86_64/processor.c      |  14 +-
 tools/testing/selftests/kvm/lib/x86_64/sev.c  |  44 +++-
 .../selftests/kvm/set_memory_region_test.c    |   8 +-
 .../selftests/kvm/x86_64/sev_init2_tests.c    | 152 ++++++++++++++
 .../selftests/kvm/x86_64/sev_smoke_test.c     |  96 ++++++++-
 25 files changed, 703 insertions(+), 185 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/sev_init2_tests.c

-- 
2.43.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v5 01/17] KVM: SVM: Invert handling of SEV and SEV_ES feature flags
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 02/17] KVM: SVM: Compile sev.c if and only if CONFIG_KVM_AMD_SEV=y Paolo Bonzini
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata, Sean Christopherson

From: Sean Christopherson <seanjc@google.com>

Leave SEV and SEV_ES '0' in kvm_cpu_caps by default, and instead set them
in sev_set_cpu_caps() if SEV and SEV-ES support are fully enabled.  Aside
from the fact that sev_set_cpu_caps() is wildly misleading when it *clears*
capabilities, this will allow compiling out sev.c without falsely
advertising SEV/SEV-ES support in KVM_GET_SUPPORTED_CPUID.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/cpuid.c   | 2 +-
 arch/x86/kvm/svm/sev.c | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index bfc0bfcb2bc6..51bd2197feed 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -771,7 +771,7 @@ void kvm_set_cpu_caps(void)
 	kvm_cpu_cap_mask(CPUID_8000_000A_EDX, 0);
 
 	kvm_cpu_cap_mask(CPUID_8000_001F_EAX,
-		0 /* SME */ | F(SEV) | 0 /* VM_PAGE_FLUSH */ | F(SEV_ES) |
+		0 /* SME */ | 0 /* SEV */ | 0 /* VM_PAGE_FLUSH */ | 0 /* SEV_ES */ |
 		F(SME_COHERENT));
 
 	kvm_cpu_cap_mask(CPUID_8000_0021_EAX,
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e5a4d9b0e79f..382c745b8ba9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2186,10 +2186,10 @@ void sev_vm_destroy(struct kvm *kvm)
 
 void __init sev_set_cpu_caps(void)
 {
-	if (!sev_enabled)
-		kvm_cpu_cap_clear(X86_FEATURE_SEV);
-	if (!sev_es_enabled)
-		kvm_cpu_cap_clear(X86_FEATURE_SEV_ES);
+	if (sev_enabled)
+		kvm_cpu_cap_set(X86_FEATURE_SEV);
+	if (sev_es_enabled)
+		kvm_cpu_cap_set(X86_FEATURE_SEV_ES);
 }
 
 void __init sev_hardware_setup(void)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 02/17] KVM: SVM: Compile sev.c if and only if CONFIG_KVM_AMD_SEV=y
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 01/17] KVM: SVM: Invert handling of SEV and SEV_ES feature flags Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 03/17] KVM: x86: use u64_to_user_ptr() Paolo Bonzini
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata, Sean Christopherson

Stop compiling sev.c when CONFIG_KVM_AMD_SEV=n, as the number of #ifdefs
in sev.c is getting ridiculous, and having #ifdefs inside of SEV helpers
is quite confusing.

To minimize #ifdefs in code flows, #ifdef away only the kvm_x86_ops hooks
and the #VMGEXIT handler.  Stubs are also restricted to functions that
check sev_enabled and to the destruction functions sev_free_vcpu() and
sev_vm_destroy(), whose callers do not check whether the VM is an SEV
guest and instead leave that to the callee.  Most call sites instead
rely on dead code elimination to take care of functions that are
guarded with sev_guest() or sev_es_guest().
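
As an illustration (a sketch, not code from this patch), a call site
such as the following needs no stub: with CONFIG_KVM_AMD_SEV=n,
sev_es_guest() is compile-time false, so the compiler eliminates the
call and no definition of sev_es_unmap_ghcb() is needed at link time.

    /* sev_es_guest() folds to false when CONFIG_KVM_AMD_SEV=n */
    if (sev_es_guest(vcpu->kvm))
            sev_es_unmap_ghcb(to_svm(vcpu));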

Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/Makefile  |  7 ++++---
 arch/x86/kvm/svm/sev.c | 24 ++--------------------
 arch/x86/kvm/svm/svm.c |  5 ++++-
 arch/x86/kvm/svm/svm.h | 45 ++++++++++++++++++++++++++----------------
 4 files changed, 38 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index a88bb14266b6..a358bf5e3a65 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -26,9 +26,10 @@ kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
 kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
 kvm-intel-$(CONFIG_KVM_HYPERV)	+= vmx/hyperv.o vmx/hyperv_evmcs.o
 
-kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o \
-			   svm/sev.o
-kvm-amd-$(CONFIG_KVM_HYPERV) += svm/hyperv.o
+kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o
+
+kvm-amd-$(CONFIG_KVM_AMD_SEV)	+= svm/sev.o
+kvm-amd-$(CONFIG_KVM_HYPERV)	+= svm/hyperv.o
 
 ifdef CONFIG_HYPERV
 kvm-y			+= kvm_onhyperv.o
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 382c745b8ba9..5d41f27a8af5 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -32,22 +32,9 @@
 #include "cpuid.h"
 #include "trace.h"
 
-#ifndef CONFIG_KVM_AMD_SEV
-/*
- * When this config is not defined, SEV feature is not supported and APIs in
- * this file are not used but this file still gets compiled into the KVM AMD
- * module.
- *
- * We will not have MISC_CG_RES_SEV and MISC_CG_RES_SEV_ES entries in the enum
- * misc_res_type {} defined in linux/misc_cgroup.h.
- *
- * Below macros allow compilation to succeed.
- */
-#define MISC_CG_RES_SEV MISC_CG_RES_TYPES
-#define MISC_CG_RES_SEV_ES MISC_CG_RES_TYPES
-#endif
+#define GHCB_VERSION_MAX	1ULL
+#define GHCB_VERSION_MIN	1ULL
 
-#ifdef CONFIG_KVM_AMD_SEV
 /* enable/disable SEV support */
 static bool sev_enabled = true;
 module_param_named(sev, sev_enabled, bool, 0444);
@@ -59,11 +46,6 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 /* enable/disable SEV-ES DebugSwap support */
 static bool sev_es_debug_swap_enabled = false;
 module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
-#else
-#define sev_enabled false
-#define sev_es_enabled false
-#define sev_es_debug_swap_enabled false
-#endif /* CONFIG_KVM_AMD_SEV */
 
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
@@ -2194,7 +2176,6 @@ void __init sev_set_cpu_caps(void)
 
 void __init sev_hardware_setup(void)
 {
-#ifdef CONFIG_KVM_AMD_SEV
 	unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
 	bool sev_es_supported = false;
 	bool sev_supported = false;
@@ -2294,7 +2275,6 @@ void __init sev_hardware_setup(void)
 	if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
 	    !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
 		sev_es_debug_swap_enabled = false;
-#endif
 }
 
 void sev_hardware_unsetup(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d1a9f9951635..e7f47a1f3eb1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3303,7 +3303,9 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
 	[SVM_EXIT_AVIC_UNACCELERATED_ACCESS]	= avic_unaccelerated_access_interception,
+#ifdef CONFIG_KVM_AMD_SEV
 	[SVM_EXIT_VMGEXIT]			= sev_handle_vmgexit,
+#endif
 };
 
 static void dump_vmcb(struct kvm_vcpu *vcpu)
@@ -5023,6 +5025,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.enable_smi_window = svm_enable_smi_window,
 #endif
 
+#ifdef CONFIG_KVM_AMD_SEV
 	.mem_enc_ioctl = sev_mem_enc_ioctl,
 	.mem_enc_register_region = sev_mem_enc_register_region,
 	.mem_enc_unregister_region = sev_mem_enc_unregister_region,
@@ -5030,7 +5033,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.vm_copy_enc_context_from = sev_vm_copy_enc_context_from,
 	.vm_move_enc_context_from = sev_vm_move_enc_context_from,
-
+#endif
 	.check_emulate_instruction = svm_check_emulate_instruction,
 
 	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7f1fbd874c45..ec8ca7d92cf1 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -664,13 +664,16 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
 
 /* sev.c */
 
-#define GHCB_VERSION_MAX	1ULL
-#define GHCB_VERSION_MIN	1ULL
+void pre_sev_run(struct vcpu_svm *svm, int cpu);
+void sev_init_vmcb(struct vcpu_svm *svm);
+void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
+int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
+void sev_es_vcpu_reset(struct vcpu_svm *svm);
+void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
+void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
+void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 
-
-extern unsigned int max_sev_asid;
-
-void sev_vm_destroy(struct kvm *kvm);
+#ifdef CONFIG_KVM_AMD_SEV
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp);
 int sev_mem_enc_register_region(struct kvm *kvm,
 				struct kvm_enc_region *range);
@@ -679,22 +682,30 @@ int sev_mem_enc_unregister_region(struct kvm *kvm,
 int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd);
 int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd);
 void sev_guest_memory_reclaimed(struct kvm *kvm);
+int sev_handle_vmgexit(struct kvm_vcpu *vcpu);
 
-void pre_sev_run(struct vcpu_svm *svm, int cpu);
+/* These symbols are used in common code and are stubbed below.  */
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+void sev_free_vcpu(struct kvm_vcpu *vcpu);
+void sev_vm_destroy(struct kvm *kvm);
 void __init sev_set_cpu_caps(void);
 void __init sev_hardware_setup(void);
 void sev_hardware_unsetup(void);
 int sev_cpu_init(struct svm_cpu_data *sd);
-void sev_init_vmcb(struct vcpu_svm *svm);
-void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
-void sev_free_vcpu(struct kvm_vcpu *vcpu);
-int sev_handle_vmgexit(struct kvm_vcpu *vcpu);
-int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
-void sev_es_vcpu_reset(struct vcpu_svm *svm);
-void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
-void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
-void sev_es_unmap_ghcb(struct vcpu_svm *svm);
-struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+extern unsigned int max_sev_asid;
+#else
+static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
+	return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+}
+
+static inline void sev_free_vcpu(struct kvm_vcpu *vcpu) {}
+static inline void sev_vm_destroy(struct kvm *kvm) {}
+static inline void __init sev_set_cpu_caps(void) {}
+static inline void __init sev_hardware_setup(void) {}
+static inline void sev_hardware_unsetup(void) {}
+static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
+#define max_sev_asid 0
+#endif
 
 /* vmenter.S */
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 03/17] KVM: x86: use u64_to_user_ptr()
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 01/17] KVM: SVM: Invert handling of SEV and SEV_ES feature flags Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 02/17] KVM: SVM: Compile sev.c if and only if CONFIG_KVM_AMD_SEV=y Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 04/17] KVM: introduce new vendor op for KVM_GET_DEVICE_ATTR Paolo Bonzini
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata, Sean Christopherson

Replace the open-coded kvm_get_attr_addr(), which rejected addresses
that do not fit in an unsigned long, with u64_to_user_ptr().  There is
no danger to the kernel if 32-bit userspace provides a 64-bit value
that has the high bits set but, for whatever reason, happens to resolve
to an address that has something mapped there.  KVM uses the checked
versions of get_user() and put_user(), so any faults are caught
properly.
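
A hedged sketch of the case the dropped check used to guard against:

    /*
     * On a 32-bit kernel, u64_to_user_ptr() simply truncates: e.g.
     * attr->addr == 0x100001000ULL yields uaddr == 0x1000.  A bogus
     * but mapped address is read or written harmlessly, while an
     * unmapped one makes get_user()/put_user() return -EFAULT.
     */
    u64 __user *uaddr = u64_to_user_ptr(attr->addr);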

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 24 +++---------------------
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47d9f03b7778..3d2029402513 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4842,25 +4842,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	return r;
 }
 
-static inline void __user *kvm_get_attr_addr(struct kvm_device_attr *attr)
-{
-	void __user *uaddr = (void __user*)(unsigned long)attr->addr;
-
-	if ((u64)(unsigned long)uaddr != attr->addr)
-		return ERR_PTR_USR(-EFAULT);
-	return uaddr;
-}
-
 static int kvm_x86_dev_get_attr(struct kvm_device_attr *attr)
 {
-	u64 __user *uaddr = kvm_get_attr_addr(attr);
+	u64 __user *uaddr = u64_to_user_ptr(attr->addr);
 
 	if (attr->group)
 		return -ENXIO;
 
-	if (IS_ERR(uaddr))
-		return PTR_ERR(uaddr);
-
 	switch (attr->attr) {
 	case KVM_X86_XCOMP_GUEST_SUPP:
 		if (put_user(kvm_caps.supported_xcr0, uaddr))
@@ -5712,12 +5700,9 @@ static int kvm_arch_tsc_has_attr(struct kvm_vcpu *vcpu,
 static int kvm_arch_tsc_get_attr(struct kvm_vcpu *vcpu,
 				 struct kvm_device_attr *attr)
 {
-	u64 __user *uaddr = kvm_get_attr_addr(attr);
+	u64 __user *uaddr = u64_to_user_ptr(attr->addr);
 	int r;
 
-	if (IS_ERR(uaddr))
-		return PTR_ERR(uaddr);
-
 	switch (attr->attr) {
 	case KVM_VCPU_TSC_OFFSET:
 		r = -EFAULT;
@@ -5735,13 +5720,10 @@ static int kvm_arch_tsc_get_attr(struct kvm_vcpu *vcpu,
 static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu,
 				 struct kvm_device_attr *attr)
 {
-	u64 __user *uaddr = kvm_get_attr_addr(attr);
+	u64 __user *uaddr = u64_to_user_ptr(attr->addr);
 	struct kvm *kvm = vcpu->kvm;
 	int r;
 
-	if (IS_ERR(uaddr))
-		return PTR_ERR(uaddr);
-
 	switch (attr->attr) {
 	case KVM_VCPU_TSC_OFFSET: {
 		u64 offset, tsc, ns;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 04/17] KVM: introduce new vendor op for KVM_GET_DEVICE_ATTR
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (2 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 03/17] KVM: x86: use u64_to_user_ptr() Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 21:30   ` Isaku Yamahata
  2024-04-04 12:13 ` [PATCH v5 05/17] KVM: SEV: publish supported VMSA features Paolo Bonzini
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata

Allow vendor modules to provide their own attributes on /dev/kvm.
To avoid a proliferation of vendor ops, implement KVM_HAS_DEVICE_ATTR
and KVM_GET_DEVICE_ATTR in terms of the same function: retrieving an
attribute's value is a cheap way to check for its existence, since
KVM_GET_DEVICE_ATTR is not supposed to do complicated computations,
especially on /dev/kvm.
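
For context, a userspace sketch (illustrative only; kvm_fd is assumed
to be an open /dev/kvm file descriptor) of probing and reading an
attribute through the two ioctls:

    __u64 val;
    struct kvm_device_attr attr = {
            .group = 0,             /* system attributes */
            .attr  = KVM_X86_XCOMP_GUEST_SUPP,
            .addr  = (__u64)(uintptr_t)&val,
    };

    if (ioctl(kvm_fd, KVM_HAS_DEVICE_ATTR, &attr) == 0 &&
        ioctl(kvm_fd, KVM_GET_DEVICE_ATTR, &attr) == 0)
            printf("supported XCR0 bits: 0x%llx\n",
                   (unsigned long long)val);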

Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/x86.c                 | 38 +++++++++++++++++++-----------
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 110d7f29ca9a..5187fcf4b610 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -121,6 +121,7 @@ KVM_X86_OP(enter_smm)
 KVM_X86_OP(leave_smm)
 KVM_X86_OP(enable_smi_window)
 #endif
+KVM_X86_OP_OPTIONAL(dev_get_attr)
 KVM_X86_OP_OPTIONAL(mem_enc_ioctl)
 KVM_X86_OP_OPTIONAL(mem_enc_register_region)
 KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 16e07a2eee19..04c430eb25cf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1778,6 +1778,7 @@ struct kvm_x86_ops {
 	void (*enable_smi_window)(struct kvm_vcpu *vcpu);
 #endif
 
+	int (*dev_get_attr)(u32 group, u64 attr, u64 *val);
 	int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp);
 	int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3d2029402513..3934e7682734 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4842,34 +4842,44 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	return r;
 }
 
-static int kvm_x86_dev_get_attr(struct kvm_device_attr *attr)
+static int __kvm_x86_dev_get_attr(struct kvm_device_attr *attr, u64 *val)
 {
-	u64 __user *uaddr = u64_to_user_ptr(attr->addr);
-
-	if (attr->group)
+	if (attr->group) {
+		if (kvm_x86_ops.dev_get_attr)
+			return static_call(kvm_x86_dev_get_attr)(attr->group, attr->attr, val);
 		return -ENXIO;
+	}
 
 	switch (attr->attr) {
 	case KVM_X86_XCOMP_GUEST_SUPP:
-		if (put_user(kvm_caps.supported_xcr0, uaddr))
-			return -EFAULT;
+		*val = kvm_caps.supported_xcr0;
 		return 0;
 	default:
 		return -ENXIO;
 	}
 }
 
+static int kvm_x86_dev_get_attr(struct kvm_device_attr *attr)
+{
+	u64 __user *uaddr = u64_to_user_ptr(attr->addr);
+	int r;
+	u64 val;
+
+	r = __kvm_x86_dev_get_attr(attr, &val);
+	if (r < 0)
+		return r;
+
+	if (put_user(val, uaddr))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int kvm_x86_dev_has_attr(struct kvm_device_attr *attr)
 {
-	if (attr->group)
-		return -ENXIO;
+	u64 val;
 
-	switch (attr->attr) {
-	case KVM_X86_XCOMP_GUEST_SUPP:
-		return 0;
-	default:
-		return -ENXIO;
-	}
+	return __kvm_x86_dev_get_attr(attr, &val);
 }
 
 long kvm_arch_dev_ioctl(struct file *filp,
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 05/17] KVM: SEV: publish supported VMSA features
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (3 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 04/17] KVM: introduce new vendor op for KVM_GET_DEVICE_ATTR Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 21:32   ` Isaku Yamahata
  2024-04-04 12:13 ` [PATCH v5 06/17] KVM: SEV: store VMSA features in kvm_sev_info Paolo Bonzini
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata

Compute the set of features to be stored in the VMSA when the kvm-amd
module is initialized; copy it into kvm_sev_info when the SEV guest is
initialized, and from there into the initial VMSA.

The new variable can then be used to return the set of supported features
to userspace, via the KVM_GET_DEVICE_ATTR ioctl.
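
For context, a userspace sketch (illustrative only; kvm_fd is assumed
to be an open /dev/kvm file descriptor) of querying the new attribute:

    __u64 vmsa_features;
    struct kvm_device_attr attr = {
            .group = KVM_X86_GRP_SEV,
            .attr  = KVM_X86_SEV_VMSA_FEATURES,
            .addr  = (__u64)(uintptr_t)&vmsa_features,
    };

    if (ioctl(kvm_fd, KVM_GET_DEVICE_ATTR, &attr) == 0)
            printf("accepted vmsa_features: 0x%llx\n",
                   (unsigned long long)vmsa_features);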

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    | 12 ++++++++++
 arch/x86/include/uapi/asm/kvm.h               |  9 +++++--
 arch/x86/kvm/svm/sev.c                        | 24 +++++++++++++++++--
 arch/x86/kvm/svm/svm.c                        |  1 +
 arch/x86/kvm/svm/svm.h                        |  2 ++
 5 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 84335d119ff1..2ea648e4c97a 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -425,6 +425,18 @@ issued by the hypervisor to make the guest ready for execution.
 
 Returns: 0 on success, -negative on error
 
+Device attribute API
+====================
+
+Attributes of the SEV implementation can be retrieved through the
+``KVM_HAS_DEVICE_ATTR`` and ``KVM_GET_DEVICE_ATTR`` ioctls on the ``/dev/kvm``
+device node, using group ``KVM_X86_GRP_SEV``.
+
+Currently only one attribute is implemented:
+
+* ``KVM_X86_SEV_VMSA_FEATURES``: return the set of all bits that
+  are accepted in the ``vmsa_features`` of ``KVM_SEV_INIT2``.
+
 Firmware Management
 ===================
 
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index ef11aa4cab42..b7dc515f4c27 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -457,8 +457,13 @@ struct kvm_sync_regs {
 
 #define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE	0x00000001
 
-/* attributes for system fd (group 0) */
-#define KVM_X86_XCOMP_GUEST_SUPP	0
+/* vendor-independent attributes for system fd (group 0) */
+#define KVM_X86_GRP_SYSTEM		0
+#  define KVM_X86_XCOMP_GUEST_SUPP	0
+
+/* vendor-specific groups and attributes for system fd */
+#define KVM_X86_GRP_SEV			1
+#  define KVM_X86_SEV_VMSA_FEATURES	0
 
 struct kvm_vmx_nested_state_data {
 	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5d41f27a8af5..5055935dfd1d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -46,6 +46,7 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 /* enable/disable SEV-ES DebugSwap support */
 static bool sev_es_debug_swap_enabled = false;
 module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
+static u64 sev_supported_vmsa_features;
 
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
@@ -603,8 +604,8 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	save->xss  = svm->vcpu.arch.ia32_xss;
 	save->dr6  = svm->vcpu.arch.dr6;
 
-	if (sev_es_debug_swap_enabled) {
-		save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+	if (sev_supported_vmsa_features) {
+		save->sev_features = sev_supported_vmsa_features;
 		pr_warn_once("Enabling DebugSwap with KVM_SEV_ES_INIT. "
 			     "This will not work starting with Linux 6.10\n");
 	}
@@ -1843,6 +1844,21 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	return ret;
 }
 
+int sev_dev_get_attr(u32 group, u64 attr, u64 *val)
+{
+	if (group != KVM_X86_GRP_SEV)
+		return -ENXIO;
+
+	switch (attr) {
+	case KVM_X86_SEV_VMSA_FEATURES:
+		*val = sev_supported_vmsa_features;
+		return 0;
+
+	default:
+		return -ENXIO;
+	}
+}
+
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -2275,6 +2291,10 @@ void __init sev_hardware_setup(void)
 	if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
 	    !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
 		sev_es_debug_swap_enabled = false;
+
+	sev_supported_vmsa_features = 0;
+	if (sev_es_debug_swap_enabled)
+		sev_supported_vmsa_features |= SVM_SEV_FEAT_DEBUG_SWAP;
 }
 
 void sev_hardware_unsetup(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e7f47a1f3eb1..450535d6757f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5026,6 +5026,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 #endif
 
 #ifdef CONFIG_KVM_AMD_SEV
+	.dev_get_attr = sev_dev_get_attr,
 	.mem_enc_ioctl = sev_mem_enc_ioctl,
 	.mem_enc_register_region = sev_mem_enc_register_region,
 	.mem_enc_unregister_region = sev_mem_enc_unregister_region,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ec8ca7d92cf1..1c6601a9cbbf 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -692,6 +692,7 @@ void __init sev_set_cpu_caps(void);
 void __init sev_hardware_setup(void);
 void sev_hardware_unsetup(void);
 int sev_cpu_init(struct svm_cpu_data *sd);
+int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
 extern unsigned int max_sev_asid;
 #else
 static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
@@ -704,6 +705,7 @@ static inline void __init sev_set_cpu_caps(void) {}
 static inline void __init sev_hardware_setup(void) {}
 static inline void sev_hardware_unsetup(void) {}
 static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
+static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
 #define max_sev_asid 0
 #endif
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 06/17] KVM: SEV: store VMSA features in kvm_sev_info
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (4 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 05/17] KVM: SEV: publish supported VMSA features Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features Paolo Bonzini
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata

Right now, the set of features that are stored in the VMSA upon
initialization is fixed and depends on the module parameters for
kvm-amd.ko.  However, the hypervisor cannot really change it at will
because the feature word has to match between the hypervisor and whatever
computes a measurement of the VMSA for attestation purposes.

Add a field to kvm_sev_info that holds the set of features to be stored
in the VMSA; and query it instead of referring to the module parameters.

Because KVM_SEV_INIT and KVM_SEV_ES_INIT accept no parameters, this
does not yet introduce any functional change, but it paves the way for
an API that allows customization of the features per-VM.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20240209183743.22030-6-pbonzini@redhat.com>
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/sev.c | 29 +++++++++++++++++++++--------
 arch/x86/kvm/svm/svm.c |  2 +-
 arch/x86/kvm/svm/svm.h |  3 ++-
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5055935dfd1d..e24f7d243a0a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -99,6 +99,14 @@ static inline bool is_mirroring_enc_context(struct kvm *kvm)
 	return !!to_kvm_svm(kvm)->sev_info.enc_context_owner;
 }
 
+static bool sev_vcpu_has_debug_swap(struct vcpu_svm *svm)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
+
+	return sev->vmsa_features & SVM_SEV_FEAT_DEBUG_SWAP;
+}
+
 /* Must be called with the sev_bitmap_lock held */
 static bool __sev_recycle_asids(unsigned int min_asid, unsigned int max_asid)
 {
@@ -248,6 +256,11 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 	sev->active = true;
 	sev->es_active = argp->id == KVM_SEV_ES_INIT;
+	sev->vmsa_features = sev_supported_vmsa_features;
+	if (sev_supported_vmsa_features)
+		pr_warn_once("Enabling DebugSwap with KVM_SEV_ES_INIT. "
+			     "This will not work starting with Linux 6.10\n");
+
 	ret = sev_asid_new(sev);
 	if (ret)
 		goto e_no_asid;
@@ -269,6 +282,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	sev_asid_free(sev);
 	sev->asid = 0;
 e_no_asid:
+	sev->vmsa_features = 0;
 	sev->es_active = false;
 	sev->active = false;
 	return ret;
@@ -563,6 +577,8 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 {
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
 	struct sev_es_save_area *save = svm->sev_es.vmsa;
 
 	/* Check some debug related fields before encrypting the VMSA */
@@ -604,11 +620,7 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	save->xss  = svm->vcpu.arch.ia32_xss;
 	save->dr6  = svm->vcpu.arch.dr6;
 
-	if (sev_supported_vmsa_features) {
-		save->sev_features = sev_supported_vmsa_features;
-		pr_warn_once("Enabling DebugSwap with KVM_SEV_ES_INIT. "
-			     "This will not work starting with Linux 6.10\n");
-	}
+	save->sev_features = sev->vmsa_features;
 
 	pr_debug("Virtual Machine Save Area (VMSA):\n");
 	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
@@ -1688,6 +1700,7 @@ static void sev_migrate_from(struct kvm *dst_kvm, struct kvm *src_kvm)
 	dst->pages_locked = src->pages_locked;
 	dst->enc_context_owner = src->enc_context_owner;
 	dst->es_active = src->es_active;
+	dst->vmsa_features = src->vmsa_features;
 
 	src->asid = 0;
 	src->active = false;
@@ -3063,7 +3076,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 	svm_set_intercept(svm, TRAP_CR8_WRITE);
 
 	vmcb->control.intercepts[INTERCEPT_DR] = 0;
-	if (!sev_es_debug_swap_enabled) {
+	if (!sev_vcpu_has_debug_swap(svm)) {
 		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
 		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
 		recalc_intercepts(svm);
@@ -3118,7 +3131,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
 					    sev_enc_bit));
 }
 
-void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
+void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa)
 {
 	/*
 	 * All host state for SEV-ES guests is categorized into three swap types
@@ -3146,7 +3159,7 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
 	 * the CPU (Type-B). If DebugSwap is disabled/unsupported, the CPU both
 	 * saves and loads debug registers (Type-A).
 	 */
-	if (sev_es_debug_swap_enabled) {
+	if (sev_vcpu_has_debug_swap(svm)) {
 		hostsa->dr0 = native_get_debugreg(0);
 		hostsa->dr1 = native_get_debugreg(1);
 		hostsa->dr2 = native_get_debugreg(2);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 450535d6757f..c22e87ebf0de 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1523,7 +1523,7 @@ static void svm_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 		struct sev_es_save_area *hostsa;
 		hostsa = (struct sev_es_save_area *)(page_address(sd->save_area) + 0x400);
 
-		sev_es_prepare_switch_to_guest(hostsa);
+		sev_es_prepare_switch_to_guest(svm, hostsa);
 	}
 
 	if (tsc_scaling)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1c6601a9cbbf..4a1623cacbae 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -85,6 +85,7 @@ struct kvm_sev_info {
 	unsigned long pages_locked; /* Number of pages locked */
 	struct list_head regions_list;  /* List of registered regions */
 	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
+	u64 vmsa_features;
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct list_head mirror_vms; /* List of VMs mirroring */
 	struct list_head mirror_entry; /* Use as a list entry of mirrors */
@@ -670,7 +671,7 @@ void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
 void sev_es_vcpu_reset(struct vcpu_svm *svm);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
-void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
+void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 
 #ifdef CONFIG_KVM_AMD_SEV
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (5 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 06/17] KVM: SEV: store VMSA features in kvm_sev_info Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 21:39   ` Isaku Yamahata
  2024-04-05 23:01   ` Edgecombe, Rick P
  2024-04-04 12:13 ` [PATCH v5 08/17] KVM: x86: Add supported_vm_types to kvm_caps Paolo Bonzini
                   ` (9 subsequent siblings)
  16 siblings, 2 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata

Some VM types have characteristics in common; in fact, the only use
of VM types right now is kvm_arch_has_private_mem and it assumes that
_all_ nonzero VM types have private memory.

We will soon introduce a VM type for SEV and SEV-ES VMs, and at that
point we will have two special characteristics of confidential VMs
that depend on the VM type: not just whether memory is private, but
also whether guest state is protected.  For the latter we have
kvm->arch.guest_state_protected, which is only set on a fully
initialized VM.

For VM types with protected guest state, we can actually fix a problem
in the SEV-ES implementation, where ioctls to get or set registers do
not cause an error even if the VM has been initialized and the guest
state encrypted.  Make sure that such ioctls return an error when the
new VM types are used.
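
A hedged sketch (illustrative; vcpu_fd is assumed to be a vCPU file
descriptor) of the userspace-visible change:

    struct kvm_regs regs;
    int ret = ioctl(vcpu_fd, KVM_GET_REGS, &regs);

    /*
     * Previously this returned 0 with stale values once the guest
     * state was encrypted; with the new VM types, ret is -1 and
     * errno is EINVAL instead.
     */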

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20240209183743.22030-7-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  7 ++-
 arch/x86/kvm/x86.c              | 93 ++++++++++++++++++++++++++-------
 2 files changed, 79 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 04c430eb25cf..3d56b5bb10e9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1279,12 +1279,14 @@ enum kvm_apicv_inhibit {
 };
 
 struct kvm_arch {
-	unsigned long vm_type;
 	unsigned long n_used_mmu_pages;
 	unsigned long n_requested_mmu_pages;
 	unsigned long n_max_mmu_pages;
 	unsigned int indirect_shadow_pages;
 	u8 mmu_valid_gen;
+	u8 vm_type;
+	bool has_private_mem;
+	bool has_protected_state;
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 	struct list_head active_mmu_pages;
 	struct list_head zapped_obsolete_pages;
@@ -2153,8 +2155,9 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
 void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 		       int tdp_max_root_level, int tdp_huge_page_level);
 
+
 #ifdef CONFIG_KVM_PRIVATE_MEM
-#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.vm_type != KVM_X86_DEFAULT_VM)
+#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
 #else
 #define kvm_arch_has_private_mem(kvm) false
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3934e7682734..d4a8d896798f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5555,11 +5555,15 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
-					     struct kvm_debugregs *dbgregs)
+static int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
+					    struct kvm_debugregs *dbgregs)
 {
 	unsigned int i;
 
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	memset(dbgregs, 0, sizeof(*dbgregs));
 
 	BUILD_BUG_ON(ARRAY_SIZE(vcpu->arch.db) != ARRAY_SIZE(dbgregs->db));
@@ -5568,6 +5572,7 @@ static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
 
 	dbgregs->dr6 = vcpu->arch.dr6;
 	dbgregs->dr7 = vcpu->arch.dr7;
+	return 0;
 }
 
 static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
@@ -5575,6 +5580,10 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 {
 	unsigned int i;
 
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	if (dbgregs->flags)
 		return -EINVAL;
 
@@ -5595,8 +5604,8 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
 }
 
 
-static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
-					  u8 *state, unsigned int size)
+static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
+					 u8 *state, unsigned int size)
 {
 	/*
 	 * Only copy state for features that are enabled for the guest.  The
@@ -5614,24 +5623,25 @@ static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
 			     XFEATURE_MASK_FPSSE;
 
 	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
-		return;
+		return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
 
 	fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, state, size,
 				       supported_xcr0, vcpu->arch.pkru);
+	return 0;
 }
 
-static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
-					 struct kvm_xsave *guest_xsave)
+static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
+					struct kvm_xsave *guest_xsave)
 {
-	kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region,
-				      sizeof(guest_xsave->region));
+	return kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region,
+					     sizeof(guest_xsave->region));
 }
 
 static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
 					struct kvm_xsave *guest_xsave)
 {
 	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
-		return 0;
+		return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
 
 	return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu,
 					      guest_xsave->region,
@@ -5639,18 +5649,23 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
 					      &vcpu->arch.pkru);
 }
 
-static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu,
-					struct kvm_xcrs *guest_xcrs)
+static int kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu,
+				       struct kvm_xcrs *guest_xcrs)
 {
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	if (!boot_cpu_has(X86_FEATURE_XSAVE)) {
 		guest_xcrs->nr_xcrs = 0;
-		return;
+		return 0;
 	}
 
 	guest_xcrs->nr_xcrs = 1;
 	guest_xcrs->flags = 0;
 	guest_xcrs->xcrs[0].xcr = XCR_XFEATURE_ENABLED_MASK;
 	guest_xcrs->xcrs[0].value = vcpu->arch.xcr0;
+	return 0;
 }
 
 static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
@@ -5658,6 +5673,10 @@ static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
 {
 	int i, r = 0;
 
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	if (!boot_cpu_has(X86_FEATURE_XSAVE))
 		return -EINVAL;
 
@@ -6040,7 +6059,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	case KVM_GET_DEBUGREGS: {
 		struct kvm_debugregs dbgregs;
 
-		kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
+		r = kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
+		if (r < 0)
+			break;
 
 		r = -EFAULT;
 		if (copy_to_user(argp, &dbgregs,
@@ -6070,7 +6091,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		if (!u.xsave)
 			break;
 
-		kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave);
+		r = kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave);
+		if (r < 0)
+			break;
 
 		r = -EFAULT;
 		if (copy_to_user(argp, u.xsave, sizeof(struct kvm_xsave)))
@@ -6099,7 +6122,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		if (!u.xsave)
 			break;
 
-		kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.buffer, size);
+		r = kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.buffer, size);
+		if (r < 0)
+			break;
 
 		r = -EFAULT;
 		if (copy_to_user(argp, u.xsave, size))
@@ -6115,7 +6140,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		if (!u.xcrs)
 			break;
 
-		kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs);
+		r = kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs);
+		if (r < 0)
+			break;
 
 		r = -EFAULT;
 		if (copy_to_user(argp, u.xcrs,
@@ -6259,6 +6286,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	}
 #endif
 	case KVM_GET_SREGS2: {
+		r = -EINVAL;
+		if (vcpu->kvm->arch.has_protected_state &&
+		    vcpu->arch.guest_state_protected)
+			goto out;
+
 		u.sregs2 = kzalloc(sizeof(struct kvm_sregs2), GFP_KERNEL);
 		r = -ENOMEM;
 		if (!u.sregs2)
@@ -6271,6 +6303,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		break;
 	}
 	case KVM_SET_SREGS2: {
+		r = -EINVAL;
+		if (vcpu->kvm->arch.has_protected_state &&
+		    vcpu->arch.guest_state_protected)
+			goto out;
+
 		u.sregs2 = memdup_user(argp, sizeof(struct kvm_sregs2));
 		if (IS_ERR(u.sregs2)) {
 			r = PTR_ERR(u.sregs2);
@@ -11478,6 +11515,10 @@ static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 
 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	vcpu_load(vcpu);
 	__get_regs(vcpu, regs);
 	vcpu_put(vcpu);
@@ -11519,6 +11560,10 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 
 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	vcpu_load(vcpu);
 	__set_regs(vcpu, regs);
 	vcpu_put(vcpu);
@@ -11591,6 +11636,10 @@ static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
 int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
 				  struct kvm_sregs *sregs)
 {
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	vcpu_load(vcpu);
 	__get_sregs(vcpu, sregs);
 	vcpu_put(vcpu);
@@ -11858,6 +11907,10 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 {
 	int ret;
 
+	if (vcpu->kvm->arch.has_protected_state &&
+	    vcpu->arch.guest_state_protected)
+		return -EINVAL;
+
 	vcpu_load(vcpu);
 	ret = __set_sregs(vcpu, sregs);
 	vcpu_put(vcpu);
@@ -11975,7 +12028,7 @@ int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 	struct fxregs_state *fxsave;
 
 	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
-		return 0;
+		return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
 
 	vcpu_load(vcpu);
 
@@ -11998,7 +12051,7 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 	struct fxregs_state *fxsave;
 
 	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
-		return 0;
+		return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
 
 	vcpu_load(vcpu);
 
@@ -12524,6 +12577,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		return -EINVAL;
 
 	kvm->arch.vm_type = type;
+	kvm->arch.has_private_mem =
+		(type == KVM_X86_SW_PROTECTED_VM);
 
 	ret = kvm_page_track_init(kvm);
 	if (ret)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 08/17] KVM: x86: Add supported_vm_types to kvm_caps
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (6 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 09/17] KVM: SEV: introduce to_kvm_sev_info Paolo Bonzini
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata, Sean Christopherson

This simplifies the implementation of KVM_CHECK_EXTENSION(KVM_CAP_VM_TYPES),
and also allows the vendor module to specify which VM types are supported.
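
For context, a userspace sketch (illustrative; kvm_fd is assumed to be
an open /dev/kvm file descriptor) of consuming the bitmask:

    int types = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_VM_TYPES);

    if (types & (1 << KVM_X86_SW_PROTECTED_VM))
            ; /* KVM_CREATE_VM with this type is expected to succeed */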

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 12 ++++++------
 arch/x86/kvm/x86.h |  2 ++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d4a8d896798f..d584f5739402 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -94,6 +94,7 @@
 
 struct kvm_caps kvm_caps __read_mostly = {
 	.supported_mce_cap = MCG_CTL_P | MCG_SER_P,
+	.supported_vm_types = BIT(KVM_X86_DEFAULT_VM),
 };
 EXPORT_SYMBOL_GPL(kvm_caps);
 
@@ -4629,9 +4630,7 @@ static int kvm_ioctl_get_supported_hv_cpuid(struct kvm_vcpu *vcpu,
 
 static bool kvm_is_vm_type_supported(unsigned long type)
 {
-	return type == KVM_X86_DEFAULT_VM ||
-	       (type == KVM_X86_SW_PROTECTED_VM &&
-		IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_mmu_enabled);
+	return type < 32 && (kvm_caps.supported_vm_types & BIT(type));
 }
 
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
@@ -4832,9 +4831,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = kvm_caps.has_notify_vmexit;
 		break;
 	case KVM_CAP_VM_TYPES:
-		r = BIT(KVM_X86_DEFAULT_VM);
-		if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
-			r |= BIT(KVM_X86_SW_PROTECTED_VM);
+		r = kvm_caps.supported_vm_types;
 		break;
 	default:
 		break;
@@ -9824,6 +9821,9 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	kvm_register_perf_callbacks(ops->handle_intel_pt_intr);
 
+	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_mmu_enabled)
+		kvm_caps.supported_vm_types |= BIT(KVM_X86_SW_PROTECTED_VM);
+
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		kvm_caps.supported_xss = 0;
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a8b71803777b..d80a4c6b5a38 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -24,6 +24,8 @@ struct kvm_caps {
 	bool has_bus_lock_exit;
 	/* notify VM exit supported? */
 	bool has_notify_vmexit;
+	/* bit mask of VM types */
+	u32 supported_vm_types;
 
 	u64 supported_mce_cap;
 	u64 supported_xcr0;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 09/17] KVM: SEV: introduce to_kvm_sev_info
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (7 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 08/17] KVM: x86: Add supported_vm_types to kvm_caps Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 10/17] KVM: SEV: define VM types for SEV and SEV-ES Paolo Bonzini
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata, Sean Christopherson

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/sev.c | 4 ++--
 arch/x86/kvm/svm/svm.h | 5 +++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e24f7d243a0a..f98448dc8be8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -96,7 +96,7 @@ static int sev_flush_asids(unsigned int min_asid, unsigned int max_asid)
 
 static inline bool is_mirroring_enc_context(struct kvm *kvm)
 {
-	return !!to_kvm_svm(kvm)->sev_info.enc_context_owner;
+	return !!to_kvm_sev_info(kvm)->enc_context_owner;
 }
 
 static bool sev_vcpu_has_debug_swap(struct vcpu_svm *svm)
@@ -653,7 +653,7 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
 	clflush_cache_range(svm->sev_es.vmsa, PAGE_SIZE);
 
 	vmsa.reserved = 0;
-	vmsa.handle = to_kvm_svm(kvm)->sev_info.handle;
+	vmsa.handle = to_kvm_sev_info(kvm)->handle;
 	vmsa.address = __sme_pa(svm->sev_es.vmsa);
 	vmsa.len = PAGE_SIZE;
 	ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_VMSA, &vmsa, error);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4a1623cacbae..5d5b8ed43db8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -319,6 +319,11 @@ static __always_inline struct kvm_svm *to_kvm_svm(struct kvm *kvm)
 	return container_of(kvm, struct kvm_svm, kvm);
 }
 
+static __always_inline struct kvm_sev_info *to_kvm_sev_info(struct kvm *kvm)
+{
+	return &to_kvm_svm(kvm)->sev_info;
+}
+
 static __always_inline bool sev_guest(struct kvm *kvm)
 {
 #ifdef CONFIG_KVM_AMD_SEV
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 10/17] KVM: SEV: define VM types for SEV and SEV-ES
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (8 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 09/17] KVM: SEV: introduce to_kvm_sev_info Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 11/17] KVM: SEV: sync FPU and AVX state at LAUNCH_UPDATE_VMSA time Paolo Bonzini
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
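For context, a hedged userspace sketch of creating a VM with one of the
new types (illustrative; kvm_fd is assumed to be an open /dev/kvm file
descriptor).  Such a VM has need_init set and must go through
KVM_SEV_INIT2, introduced later in this series, before vCPUs can run:

    int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_SEV_ES_VM);
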
 Documentation/virt/kvm/api.rst  |  2 ++
 arch/x86/include/uapi/asm/kvm.h |  2 ++
 arch/x86/kvm/svm/sev.c          | 16 +++++++++++++---
 arch/x86/kvm/svm/svm.c          | 11 +++++++++++
 arch/x86/kvm/svm/svm.h          |  1 +
 5 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0b5a33ee71ee..f0b76ff5030d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8819,6 +8819,8 @@ means the VM type with value @n is supported.  Possible values of @n are::
 
   #define KVM_X86_DEFAULT_VM	0
   #define KVM_X86_SW_PROTECTED_VM	1
+  #define KVM_X86_SEV_VM	2
+  #define KVM_X86_SEV_ES_VM	3
 
 Note, KVM_X86_SW_PROTECTED_VM is currently only for development and testing.
 Do not use KVM_X86_SW_PROTECTED_VM for "real" VMs, and especially not in
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index b7dc515f4c27..ab609adacb11 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -861,5 +861,7 @@ struct kvm_hyperv_eventfd {
 
 #define KVM_X86_DEFAULT_VM	0
 #define KVM_X86_SW_PROTECTED_VM	1
+#define KVM_X86_SEV_VM		2
+#define KVM_X86_SEV_ES_VM	3
 
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f98448dc8be8..1512bacd74a9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -251,6 +251,9 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (kvm->created_vcpus)
 		return -EINVAL;
 
+	if (kvm->arch.vm_type != KVM_X86_DEFAULT_VM)
+		return -EINVAL;
+
 	if (unlikely(sev->active))
 		return -EINVAL;
 
@@ -272,6 +275,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 	INIT_LIST_HEAD(&sev->regions_list);
 	INIT_LIST_HEAD(&sev->mirror_vms);
+	sev->need_init = false;
 
 	kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV);
 
@@ -1808,7 +1812,8 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	if (ret)
 		goto out_fput;
 
-	if (sev_guest(kvm) || !sev_guest(source_kvm)) {
+	if (kvm->arch.vm_type != source_kvm->arch.vm_type ||
+	    sev_guest(kvm) || !sev_guest(source_kvm)) {
 		ret = -EINVAL;
 		goto out_unlock;
 	}
@@ -2132,6 +2137,7 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	mirror_sev->asid = source_sev->asid;
 	mirror_sev->fd = source_sev->fd;
 	mirror_sev->es_active = source_sev->es_active;
+	mirror_sev->need_init = false;
 	mirror_sev->handle = source_sev->handle;
 	INIT_LIST_HEAD(&mirror_sev->regions_list);
 	INIT_LIST_HEAD(&mirror_sev->mirror_vms);
@@ -2197,10 +2203,14 @@ void sev_vm_destroy(struct kvm *kvm)
 
 void __init sev_set_cpu_caps(void)
 {
-	if (sev_enabled)
+	if (sev_enabled) {
 		kvm_cpu_cap_set(X86_FEATURE_SEV);
-	if (sev_es_enabled)
+		kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_VM);
+	}
+	if (sev_es_enabled) {
 		kvm_cpu_cap_set(X86_FEATURE_SEV_ES);
+		kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_ES_VM);
+	}
 }
 
 void __init sev_hardware_setup(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c22e87ebf0de..b0038ece55cb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4086,6 +4086,9 @@ static void svm_cancel_injection(struct kvm_vcpu *vcpu)
 
 static int svm_vcpu_pre_run(struct kvm_vcpu *vcpu)
 {
+	if (to_kvm_sev_info(vcpu->kvm)->need_init)
+		return -EINVAL;
+
 	return 1;
 }
 
@@ -4891,6 +4894,14 @@ static void svm_vm_destroy(struct kvm *kvm)
 
 static int svm_vm_init(struct kvm *kvm)
 {
+	int type = kvm->arch.vm_type;
+
+	if (type != KVM_X86_DEFAULT_VM &&
+	    type != KVM_X86_SW_PROTECTED_VM) {
+		kvm->arch.has_protected_state = (type == KVM_X86_SEV_ES_VM);
+		to_kvm_sev_info(kvm)->need_init = true;
+	}
+
 	if (!pause_filter_count || !pause_filter_thresh)
 		kvm->arch.pause_in_guest = true;
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5d5b8ed43db8..323901782547 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -79,6 +79,7 @@ enum {
 struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
 	bool es_active;		/* SEV-ES enabled guest */
+	bool need_init;		/* waiting for SEV_INIT2 */
 	unsigned int asid;	/* ASID used for this guest */
 	unsigned int handle;	/* SEV firmware handle */
 	int fd;			/* SEV device fd */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 11/17] KVM: SEV: sync FPU and AVX state at LAUNCH_UPDATE_VMSA time
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (9 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 10/17] KVM: SEV: define VM types for SEV and SEV-ES Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 12/17] KVM: SEV: introduce KVM_SEV_INIT2 operation Paolo Bonzini
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata, Dave Hansen

SEV-ES allows passing custom contents for x87, SSE and AVX state into the VMSA.
Allow userspace to do that with the usual KVM_SET_XSAVE API and only mark
FPU contents as confidential after it has been copied and encrypted into
the VMSA.

Since the XSAVE state for AVX is the first extended state, it does not need
the compacted-state handling of get_xsave_addr().  However, there are other
parts of the XSAVE state in the VMSA that currently are not handled, and
the validation logic of get_xsave_addr() is pointless to duplicate
in KVM, so move get_xsave_addr() to the public FPU API; it is really just
a facility to operate on XSAVE state and does not expose any internal
details of arch/x86/kernel/fpu.
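
Purely as an illustration (this is not part of the patch), the userspace
flow that this enables looks roughly like the sketch below; error handling
is omitted, and vm_fd/vcpu_fd as well as a filled-in
KVM_SEV_LAUNCH_UPDATE_VMSA command are assumed to exist:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void seed_vmsa_fpu(int vm_fd, int vcpu_fd,
			  struct kvm_sev_cmd *update_vmsa)
{
	struct kvm_xsave xsave;

	memset(&xsave, 0, sizeof(xsave));
	/* ... fill xsave.region[] with the desired x87/SSE/AVX contents ... */

	/* Still allowed here: the guest FPU is not yet confidential. */
	ioctl(vcpu_fd, KVM_SET_XSAVE, &xsave);

	/* LAUNCH_UPDATE_VMSA copies and encrypts the state into the VMSA;
	 * only afterwards is the FPU state marked confidential. */
	ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, update_vmsa);
}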

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/fpu/api.h |  3 ++
 arch/x86/kernel/fpu/xstate.c   |  1 +
 arch/x86/kernel/fpu/xstate.h   |  2 --
 arch/x86/kvm/svm/sev.c         | 50 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c         |  8 ------
 5 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index a2be3aefff9f..f86ad3335529 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -143,6 +143,9 @@ extern void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfe
 
 extern u64 xstate_get_guest_group_perm(void);
 
+extern void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
+
+
 /* KVM specific functions */
 extern bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu);
 extern void fpu_free_guest_fpstate(struct fpu_guest *gfpu);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 33a214b1a4ce..6d32e415b01e 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -991,6 +991,7 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 
 	return __raw_xsave_addr(xsave, xfeature_nr);
 }
+EXPORT_SYMBOL_GPL(get_xsave_addr);
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
 
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 19ca623ffa2a..05df04f39628 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -54,8 +54,6 @@ extern int copy_sigframe_from_user_to_xstate(struct task_struct *tsk, const void
 extern void fpu__init_cpu_xstate(void);
 extern void fpu__init_system_xstate(unsigned int legacy_size);
 
-extern void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
-
 static inline u64 xfeatures_mask_supervisor(void)
 {
 	return fpu_kernel_cfg.max_features & XFEATURE_MASK_SUPERVISOR_SUPPORTED;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1512bacd74a9..3517d6736c93 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -23,6 +23,7 @@
 #include <asm/pkru.h>
 #include <asm/trapnr.h>
 #include <asm/fpu/xcr.h>
+#include <asm/fpu/xstate.h>
 #include <asm/debugreg.h>
 
 #include "mmu.h"
@@ -584,6 +585,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 	struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
 	struct sev_es_save_area *save = svm->sev_es.vmsa;
+	struct xregs_state *xsave;
+	const u8 *s;
+	u8 *d;
+	int i;
 
 	/* Check some debug related fields before encrypting the VMSA */
 	if (svm->vcpu.guest_debug || (svm->vmcb->save.dr7 & ~DR7_FIXED_1))
@@ -626,6 +631,44 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 
 	save->sev_features = sev->vmsa_features;
 
+	/*
+	 * Skip FPU and AVX setup with KVM_SEV_ES_INIT to avoid
+	 * breaking older measurements.
+	 */
+	if (vcpu->kvm->arch.vm_type != KVM_X86_DEFAULT_VM) {
+		xsave = &vcpu->arch.guest_fpu.fpstate->regs.xsave;
+		save->x87_dp = xsave->i387.rdp;
+		save->mxcsr = xsave->i387.mxcsr;
+		save->x87_ftw = xsave->i387.twd;
+		save->x87_fsw = xsave->i387.swd;
+		save->x87_fcw = xsave->i387.cwd;
+		save->x87_fop = xsave->i387.fop;
+		save->x87_ds = 0;
+		save->x87_cs = 0;
+		save->x87_rip = xsave->i387.rip;
+
+		for (i = 0; i < 8; i++) {
+			/*
+			 * The format of the x87 save area is undocumented and
+			 * definitely not what you would expect.  It consists of
+			 * an 8*8 bytes area with bytes 0-7, and an 8*2 bytes
+			 * area with bytes 8-9 of each register.
+			 */
+			d = save->fpreg_x87 + i * 8;
+			s = ((u8 *)xsave->i387.st_space) + i * 16;
+			memcpy(d, s, 8);
+			save->fpreg_x87[64 + i * 2] = s[8];
+			save->fpreg_x87[64 + i * 2 + 1] = s[9];
+		}
+		memcpy(save->fpreg_xmm, xsave->i387.xmm_space, 256);
+
+		s = get_xsave_addr(xsave, XFEATURE_YMM);
+		if (s)
+			memcpy(save->fpreg_ymm, s, 256);
+		else
+			memset(save->fpreg_ymm, 0, 256);
+	}
+
 	pr_debug("Virtual Machine Save Area (VMSA):\n");
 	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
 
@@ -664,6 +707,13 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
 	if (ret)
 	  return ret;
 
+	/*
+	 * SEV-ES guests maintain an encrypted version of their FPU
+	 * state which is restored and saved on VMRUN and VMEXIT.
+	 * Mark vcpu->arch.guest_fpu->fpstate as scratch so it won't
+	 * do xsave/xrstor on it.
+	 */
+	fpstate_set_confidential(&vcpu->arch.guest_fpu);
 	vcpu->arch.guest_state_protected = true;
 	return 0;
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b0038ece55cb..0f3b59da0d4a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1433,14 +1433,6 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 		vmsa_page = snp_safe_alloc_page(vcpu);
 		if (!vmsa_page)
 			goto error_free_vmcb_page;
-
-		/*
-		 * SEV-ES guests maintain an encrypted version of their FPU
-		 * state which is restored and saved on VMRUN and VMEXIT.
-		 * Mark vcpu->arch.guest_fpu->fpstate as scratch so it won't
-		 * do xsave/xrstor on it.
-		 */
-		fpstate_set_confidential(&vcpu->arch.guest_fpu);
 	}
 
 	err = avic_init_vcpu(svm);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 12/17] KVM: SEV: introduce KVM_SEV_INIT2 operation
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (10 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 11/17] KVM: SEV: sync FPU and AVX state at LAUNCH_UPDATE_VMSA time Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 13/17] KVM: SEV: allow SEV-ES DebugSwap again Paolo Bonzini
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata

The idea that no parameter would ever be necessary when enabling SEV or
SEV-ES for a VM was decidedly optimistic.  In fact, in some sense it's
already a parameter whether SEV or SEV-ES is desired.  Another possible
source of variability is the desired set of VMSA features, as that affects
the measurement of the VM's initial state and cannot be changed
arbitrarily by the hypervisor.

Create a new sub-operation for KVM_MEMORY_ENCRYPT_OP that can take a struct,
and put the new op to work by including the VMSA features as a field of the
struct.  The existing KVM_SEV_INIT and KVM_SEV_ES_INIT use the full set of
supported VMSA features for backwards compatibility.

The struct also includes the usual bells and whistles for future
extensibility: a flags field that must be zero for now, and some padding
at the end.
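
For illustration only, a minimal sketch of invoking the new sub-operation
(error handling omitted; vm_fd must come from KVM_CREATE_VM with type
KVM_X86_SEV_VM or KVM_X86_SEV_ES_VM, and sev_fd from opening /dev/sev):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int sev_init2(int vm_fd, int sev_fd, __u64 vmsa_features)
{
	struct kvm_sev_init init = {
		.vmsa_features = vmsa_features,	/* must be 0 for plain SEV */
		.flags = 0,			/* no flags defined yet */
	};
	struct kvm_sev_cmd cmd = {
		.id = KVM_SEV_INIT2,
		.data = (__u64)(unsigned long)&init,
		.sev_fd = sev_fd,
	};

	return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}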

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 .../virt/kvm/x86/amd-memory-encryption.rst    | 40 ++++++++++++--
 arch/x86/include/uapi/asm/kvm.h               |  9 ++++
 arch/x86/kvm/svm/sev.c                        | 53 ++++++++++++++++---
 3 files changed, 92 insertions(+), 10 deletions(-)

diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 2ea648e4c97a..3381556d596d 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -76,15 +76,49 @@ are defined in ``<linux/psp-dev.h>``.
 KVM implements the following commands to support common lifecycle events of SEV
 guests, such as launching, running, snapshotting, migrating and decommissioning.
 
-1. KVM_SEV_INIT
----------------
+1. KVM_SEV_INIT2
+----------------
 
-The KVM_SEV_INIT command is used by the hypervisor to initialize the SEV platform
+The KVM_SEV_INIT2 command is used by the hypervisor to initialize the SEV platform
 context. In a typical workflow, this command should be the first command issued.
 
+For this command to be accepted, either KVM_X86_SEV_VM or KVM_X86_SEV_ES_VM
+must have been passed to the KVM_CREATE_VM ioctl.  A virtual machine created
+with those machine types in turn cannot be run until KVM_SEV_INIT2 is invoked.
+
+Parameters: struct kvm_sev_init (in)
 
 Returns: 0 on success, -negative on error
 
+::
+
+        struct kvm_sev_init {
+                __u64 vmsa_features;  /* initial value of features field in VMSA */
+                __u32 flags;          /* must be 0 */
+                __u32 pad[9];
+        };
+
+It is an error if the hypervisor does not support any of the bits that
+are set in ``flags`` or ``vmsa_features``.  ``vmsa_features`` must be
+0 for SEV virtual machines, as they do not have a VMSA.
+
+This command replaces the deprecated KVM_SEV_INIT and KVM_SEV_ES_INIT commands.
+The commands did not have any parameters (the ``data`` field was unused) and
+only worked for the KVM_X86_DEFAULT_VM machine type (0).
+
+They behave as if:
+
+* the VM type is KVM_X86_SEV_VM for KVM_SEV_INIT, or KVM_X86_SEV_ES_VM for
+  KVM_SEV_ES_INIT
+
+* the ``flags`` and ``vmsa_features`` fields of ``struct kvm_sev_init`` are
+  set to zero
+
+If the ``KVM_X86_SEV_VMSA_FEATURES`` attribute does not exist, the hypervisor only
+supports KVM_SEV_INIT and KVM_SEV_ES_INIT.  In that case, note that KVM_SEV_ES_INIT
+might set the debug swap VMSA feature (bit 5) depending on the value of the
+``debug_swap`` parameter of ``kvm-amd.ko``.
+
 2. KVM_SEV_LAUNCH_START
 -----------------------
 
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index ab609adacb11..72ad5ace118d 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -694,6 +694,9 @@ enum sev_cmd_id {
 	/* Guest Migration Extension */
 	KVM_SEV_SEND_CANCEL,
 
+	/* Second time is the charm; improved versions of the above ioctls.  */
+	KVM_SEV_INIT2,
+
 	KVM_SEV_NR_MAX,
 };
 
@@ -705,6 +708,12 @@ struct kvm_sev_cmd {
 	__u32 sev_fd;
 };
 
+struct kvm_sev_init {
+	__u64 vmsa_features;
+	__u32 flags;
+	__u32 pad[9];
+};
+
 struct kvm_sev_launch_start {
 	__u32 handle;
 	__u32 policy;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3517d6736c93..2f20270be93b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -243,27 +243,31 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 	sev_decommission(handle);
 }
 
-static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
+static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
+			    struct kvm_sev_init *data,
+			    unsigned long vm_type)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 	struct sev_platform_init_args init_args = {0};
+	bool es_active = vm_type != KVM_X86_SEV_VM;
+	u64 valid_vmsa_features = es_active ? sev_supported_vmsa_features : 0;
 	int ret;
 
 	if (kvm->created_vcpus)
 		return -EINVAL;
 
-	if (kvm->arch.vm_type != KVM_X86_DEFAULT_VM)
+	if (data->flags)
+		return -EINVAL;
+
+	if (data->vmsa_features & ~valid_vmsa_features)
 		return -EINVAL;
 
 	if (unlikely(sev->active))
 		return -EINVAL;
 
 	sev->active = true;
-	sev->es_active = argp->id == KVM_SEV_ES_INIT;
-	sev->vmsa_features = sev_supported_vmsa_features;
-	if (sev_supported_vmsa_features)
-		pr_warn_once("Enabling DebugSwap with KVM_SEV_ES_INIT. "
-			     "This will not work starting with Linux 6.10\n");
+	sev->es_active = es_active;
+	sev->vmsa_features = data->vmsa_features;
 
 	ret = sev_asid_new(sev);
 	if (ret)
@@ -293,6 +297,38 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_init data = {
+		.vmsa_features = 0,
+	};
+	unsigned long vm_type;
+
+	if (kvm->arch.vm_type != KVM_X86_DEFAULT_VM)
+		return -EINVAL;
+
+	vm_type = (argp->id == KVM_SEV_INIT ? KVM_X86_SEV_VM : KVM_X86_SEV_ES_VM);
+	return __sev_guest_init(kvm, argp, &data, vm_type);
+}
+
+static int sev_guest_init2(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_init data;
+
+	if (!sev->need_init)
+		return -EINVAL;
+
+	if (kvm->arch.vm_type != KVM_X86_SEV_VM &&
+	    kvm->arch.vm_type != KVM_X86_SEV_ES_VM)
+		return -EINVAL;
+
+	if (copy_from_user(&data, u64_to_user_ptr(argp->data), sizeof(data)))
+		return -EFAULT;
+
+	return __sev_guest_init(kvm, argp, &data, kvm->arch.vm_type);
+}
+
 static int sev_bind_asid(struct kvm *kvm, unsigned int handle, int *error)
 {
 	unsigned int asid = sev_get_asid(kvm);
@@ -1960,6 +1996,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_INIT:
 		r = sev_guest_init(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_INIT2:
+		r = sev_guest_init2(kvm, &sev_cmd);
+		break;
 	case KVM_SEV_LAUNCH_START:
 		r = sev_launch_start(kvm, &sev_cmd);
 		break;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 13/17] KVM: SEV: allow SEV-ES DebugSwap again
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (11 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 12/17] KVM: SEV: introduce KVM_SEV_INIT2 operation Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 14/17] selftests: kvm: add tests for KVM_SEV_INIT2 Paolo Bonzini
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata

The DebugSwap feature of SEV-ES provides a way for confidential guests
to use data breakpoints.  Its status is recorded in the VMSA, and therefore
attestation signatures depend on whether it is enabled or not.  In order
to avoid invalidating the signatures depending on the host machine, it
was disabled by default (see commit 5abf6dceb066, "SEV: disable SEV-ES
DebugSwap by default", 2024-03-09).

However, we now have a new API to create SEV VMs that allows enabling
DebugSwap based on what the user tells KVM to do, and we also changed the
legacy KVM_SEV_ES_INIT API to never enable DebugSwap.  It is therefore
possible to re-enable the feature without breaking compatibility with
kernels that pre-date the introduction of DebugSwap, so go ahead.
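
As a sketch (not part of the patch), userspace now opts into DebugSwap
explicitly instead of inheriting it from the module parameter.  This assumes
the uapi definitions from this series and the hypothetical sev_init2()
helper sketched in the previous patch; the feature is bit 5 of the VMSA
features:

#define SVM_SEV_FEAT_DEBUG_SWAP	(1ULL << 5)

static void enable_debug_swap_if_supported(int kvm_fd, int vm_fd, int sev_fd)
{
	__u64 supported = 0;
	struct kvm_device_attr attr = {
		.group = KVM_X86_GRP_SEV,
		.attr = KVM_X86_SEV_VMSA_FEATURES,
		.addr = (__u64)(unsigned long)&supported,
	};

	/* Query the supported VMSA features on the /dev/kvm fd. */
	if (ioctl(kvm_fd, KVM_GET_DEVICE_ATTR, &attr) == 0 &&
	    (supported & SVM_SEV_FEAT_DEBUG_SWAP))
		sev_init2(vm_fd, sev_fd, SVM_SEV_FEAT_DEBUG_SWAP);
}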

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm/sev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2f20270be93b..022d92fb4b85 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -45,7 +45,7 @@ static bool sev_es_enabled = true;
 module_param_named(sev_es, sev_es_enabled, bool, 0444);
 
 /* enable/disable SEV-ES DebugSwap support */
-static bool sev_es_debug_swap_enabled = false;
+static bool sev_es_debug_swap_enabled = true;
 module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
 static u64 sev_supported_vmsa_features;
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 14/17] selftests: kvm: add tests for KVM_SEV_INIT2
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (12 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 13/17] KVM: SEV: allow SEV-ES DebugSwap again Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 15/17] selftests: kvm: switch to using KVM_X86_*_VM Paolo Bonzini
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/kvm_util_base.h     |   6 +-
 .../selftests/kvm/set_memory_region_test.c    |   8 +-
 .../selftests/kvm/x86_64/sev_init2_tests.c    | 152 ++++++++++++++++++
 4 files changed, 159 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/sev_init2_tests.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 741c7dc16afc..871e2de3eb05 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -120,6 +120,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_pmu_caps_test
 TEST_GEN_PROGS_x86_64 += x86_64/xen_shinfo_test
 TEST_GEN_PROGS_x86_64 += x86_64/xen_vmcall_test
+TEST_GEN_PROGS_x86_64 += x86_64/sev_init2_tests
 TEST_GEN_PROGS_x86_64 += x86_64/sev_migrate_tests
 TEST_GEN_PROGS_x86_64 += x86_64/sev_smoke_test
 TEST_GEN_PROGS_x86_64 += x86_64/amx_test
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 3e0db283a46a..7c06ceb36643 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -890,17 +890,15 @@ static inline struct kvm_vm *vm_create_barebones(void)
 	return ____vm_create(VM_SHAPE_DEFAULT);
 }
 
-#ifdef __x86_64__
-static inline struct kvm_vm *vm_create_barebones_protected_vm(void)
+static inline struct kvm_vm *vm_create_barebones_type(unsigned long type)
 {
 	const struct vm_shape shape = {
 		.mode = VM_MODE_DEFAULT,
-		.type = KVM_X86_SW_PROTECTED_VM,
+		.type = type,
 	};
 
 	return ____vm_create(shape);
 }
-#endif
 
 static inline struct kvm_vm *vm_create(uint32_t nr_runnable_vcpus)
 {
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index 06b43ed23580..904d58793fc6 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -339,7 +339,7 @@ static void test_invalid_memory_region_flags(void)
 
 #ifdef __x86_64__
 	if (kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM))
-		vm = vm_create_barebones_protected_vm();
+		vm = vm_create_barebones_type(KVM_X86_SW_PROTECTED_VM);
 	else
 #endif
 		vm = vm_create_barebones();
@@ -462,7 +462,7 @@ static void test_add_private_memory_region(void)
 
 	pr_info("Testing ADD of KVM_MEM_GUEST_MEMFD memory regions\n");
 
-	vm = vm_create_barebones_protected_vm();
+	vm = vm_create_barebones_type(KVM_X86_SW_PROTECTED_VM);
 
 	test_invalid_guest_memfd(vm, vm->kvm_fd, 0, "KVM fd should fail");
 	test_invalid_guest_memfd(vm, vm->fd, 0, "VM's fd should fail");
@@ -471,7 +471,7 @@ static void test_add_private_memory_region(void)
 	test_invalid_guest_memfd(vm, memfd, 0, "Regular memfd() should fail");
 	close(memfd);
 
-	vm2 = vm_create_barebones_protected_vm();
+	vm2 = vm_create_barebones_type(KVM_X86_SW_PROTECTED_VM);
 	memfd = vm_create_guest_memfd(vm2, MEM_REGION_SIZE, 0);
 	test_invalid_guest_memfd(vm, memfd, 0, "Other VM's guest_memfd() should fail");
 
@@ -499,7 +499,7 @@ static void test_add_overlapping_private_memory_regions(void)
 
 	pr_info("Testing ADD of overlapping KVM_MEM_GUEST_MEMFD memory regions\n");
 
-	vm = vm_create_barebones_protected_vm();
+	vm = vm_create_barebones_type(KVM_X86_SW_PROTECTED_VM);
 
 	memfd = vm_create_guest_memfd(vm, MEM_REGION_SIZE * 4, 0);
 
diff --git a/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c b/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c
new file mode 100644
index 000000000000..7a4a61be119b
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/sev_init2_tests.c
@@ -0,0 +1,152 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kvm.h>
+#include <linux/psp-sev.h>
+#include <stdio.h>
+#include <sys/ioctl.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <pthread.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+#include "kselftest.h"
+
+#define SVM_SEV_FEAT_DEBUG_SWAP 32u
+
+/*
+ * Some features may have hidden dependencies, or may only work
+ * for certain VM types.  Err on the side of safety and don't
+ * expect that all supported features can be passed one by one
+ * to KVM_SEV_INIT2.
+ *
+ * (Well, right now there's only one...)
+ */
+#define KNOWN_FEATURES SVM_SEV_FEAT_DEBUG_SWAP
+
+int kvm_fd;
+u64 supported_vmsa_features;
+bool have_sev_es;
+
+static int __sev_ioctl(int vm_fd, int cmd_id, void *data)
+{
+	struct kvm_sev_cmd cmd = {
+		.id = cmd_id,
+		.data = (uint64_t)data,
+		.sev_fd = open_sev_dev_path_or_exit(),
+	};
+	int ret;
+
+	ret = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
+	TEST_ASSERT(ret < 0 || cmd.error == SEV_RET_SUCCESS,
+		    "%d failed: fw error: %d\n",
+		    cmd_id, cmd.error);
+
+	return ret;
+}
+
+static void test_init2(unsigned long vm_type, struct kvm_sev_init *init)
+{
+	struct kvm_vm *vm;
+	int ret;
+
+	vm = vm_create_barebones_type(vm_type);
+	ret = __sev_ioctl(vm->fd, KVM_SEV_INIT2, init);
+	TEST_ASSERT(ret == 0,
+		    "KVM_SEV_INIT2 return code is %d (expected 0), errno: %d",
+		    ret, errno);
+	kvm_vm_free(vm);
+}
+
+static void test_init2_invalid(unsigned long vm_type, struct kvm_sev_init *init, const char *msg)
+{
+	struct kvm_vm *vm;
+	int ret;
+
+	vm = vm_create_barebones_type(vm_type);
+	ret = __sev_ioctl(vm->fd, KVM_SEV_INIT2, init);
+	TEST_ASSERT(ret == -1 && errno == EINVAL,
+		    "KVM_SEV_INIT2 should fail, %s.",
+		    msg);
+	kvm_vm_free(vm);
+}
+
+void test_vm_types(void)
+{
+	test_init2(KVM_X86_SEV_VM, &(struct kvm_sev_init){});
+
+	/*
+	 * TODO: check that unsupported types cannot be created.  Probably
+	 * a separate selftest.
+	 */
+	if (have_sev_es)
+		test_init2(KVM_X86_SEV_ES_VM, &(struct kvm_sev_init){});
+
+	test_init2_invalid(0, &(struct kvm_sev_init){},
+			   "VM type is KVM_X86_DEFAULT_VM");
+	if (kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM))
+		test_init2_invalid(KVM_X86_SW_PROTECTED_VM, &(struct kvm_sev_init){},
+				   "VM type is KVM_X86_SW_PROTECTED_VM");
+}
+
+void test_flags(uint32_t vm_type)
+{
+	int i;
+
+	for (i = 0; i < 32; i++)
+		test_init2_invalid(vm_type,
+			&(struct kvm_sev_init){ .flags = BIT(i) },
+			"invalid flag");
+}
+
+void test_features(uint32_t vm_type, uint64_t supported_features)
+{
+	int i;
+
+	for (i = 0; i < 64; i++) {
+		if (!(supported_features & BIT_ULL(i)))
+			test_init2_invalid(vm_type,
+				&(struct kvm_sev_init){ .vmsa_features = BIT_ULL(i) },
+				"unknown feature");
+		else if (KNOWN_FEATURES & BIT_ULL(i))
+			test_init2(vm_type,
+				&(struct kvm_sev_init){ .vmsa_features = BIT_ULL(i) });
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	int kvm_fd = open_kvm_dev_path_or_exit();
+	bool have_sev;
+
+	TEST_REQUIRE(__kvm_has_device_attr(kvm_fd, KVM_X86_GRP_SEV,
+					   KVM_X86_SEV_VMSA_FEATURES) == 0);
+	kvm_device_attr_get(kvm_fd, KVM_X86_GRP_SEV,
+			    KVM_X86_SEV_VMSA_FEATURES,
+			    &supported_vmsa_features);
+
+	have_sev = kvm_cpu_has(X86_FEATURE_SEV);
+	TEST_ASSERT(have_sev == !!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SEV_VM)),
+		    "sev: KVM_CAP_VM_TYPES (%x) does not match cpuid (checking %x)",
+		    kvm_check_cap(KVM_CAP_VM_TYPES), 1 << KVM_X86_SEV_VM);
+
+	TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SEV_VM));
+	have_sev_es = kvm_cpu_has(X86_FEATURE_SEV_ES);
+
+	TEST_ASSERT(have_sev_es == !!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SEV_ES_VM)),
+		    "sev-es: KVM_CAP_VM_TYPES (%x) does not match cpuid (checking %x)",
+		    kvm_check_cap(KVM_CAP_VM_TYPES), 1 << KVM_X86_SEV_ES_VM);
+
+	test_vm_types();
+
+	test_flags(KVM_X86_SEV_VM);
+	if (have_sev_es)
+		test_flags(KVM_X86_SEV_ES_VM);
+
+	test_features(KVM_X86_SEV_VM, 0);
+	if (have_sev_es)
+		test_features(KVM_X86_SEV_ES_VM, supported_vmsa_features);
+
+	return 0;
+}
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 15/17] selftests: kvm: switch to using KVM_X86_*_VM
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (13 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 14/17] selftests: kvm: add tests for KVM_SEV_INIT2 Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 16/17] selftests: kvm: split "launch" phase of SEV VM creation Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 17/17] selftests: kvm: add test for transferring FPU state into VMSA Paolo Bonzini
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata

This removes the concept of "subtypes", instead letting the tests use the
proper VM types that were recently added.  While sev_vm_init() and
sev_es_vm_init() are still able to operate with the legacy KVM_SEV_INIT and
KVM_SEV_ES_INIT ioctls, this is limited to VMs that are created manually
with vm_create_barebones().
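
After this change, a test picks the SEV flavor purely through the VM type.
A sketch using the selftest helpers touched by this patch (only guest_code
is assumed to be defined elsewhere):

	struct kvm_vcpu *cpus[1];
	struct vm_shape shape = {
		.mode = VM_MODE_DEFAULT,
		.type = KVM_X86_SEV_ES_VM,	/* or KVM_X86_SEV_VM */
	};
	struct kvm_vm *vm = __vm_create_with_vcpus(shape, 1, 0, guest_code, cpus);

	/* kvm_arch_vm_post_create() now issues KVM_SEV_INIT2 based on
	 * vm->type; no subtype bookkeeping is needed. */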

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 .../selftests/kvm/include/kvm_util_base.h     |  5 ++--
 .../selftests/kvm/include/x86_64/processor.h  |  6 ----
 .../selftests/kvm/include/x86_64/sev.h        | 16 ++--------
 tools/testing/selftests/kvm/lib/kvm_util.c    |  1 -
 .../selftests/kvm/lib/x86_64/processor.c      | 14 +++++----
 tools/testing/selftests/kvm/lib/x86_64/sev.c  | 30 +++++++++++++++++--
 6 files changed, 40 insertions(+), 32 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 7c06ceb36643..8acca8237687 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -93,7 +93,6 @@ enum kvm_mem_region_type {
 struct kvm_vm {
 	int mode;
 	unsigned long type;
-	uint8_t subtype;
 	int kvm_fd;
 	int fd;
 	unsigned int pgtable_levels;
@@ -200,8 +199,8 @@ enum vm_guest_mode {
 struct vm_shape {
 	uint32_t type;
 	uint8_t  mode;
-	uint8_t  subtype;
-	uint16_t padding;
+	uint8_t  pad0;
+	uint16_t pad1;
 };
 
 kvm_static_assert(sizeof(struct vm_shape) == sizeof(uint64_t));
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 81ce37ec407d..74a59c7ce7ed 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -23,12 +23,6 @@
 extern bool host_cpu_is_intel;
 extern bool host_cpu_is_amd;
 
-enum vm_guest_x86_subtype {
-	VM_SUBTYPE_NONE = 0,
-	VM_SUBTYPE_SEV,
-	VM_SUBTYPE_SEV_ES,
-};
-
 /* Forced emulation prefix, used to invoke the emulator unconditionally. */
 #define KVM_FEP "ud2; .byte 'k', 'v', 'm';"
 
diff --git a/tools/testing/selftests/kvm/include/x86_64/sev.h b/tools/testing/selftests/kvm/include/x86_64/sev.h
index 8a1bf88474c9..0719f083351a 100644
--- a/tools/testing/selftests/kvm/include/x86_64/sev.h
+++ b/tools/testing/selftests/kvm/include/x86_64/sev.h
@@ -67,20 +67,8 @@ kvm_static_assert(SEV_RET_SUCCESS == 0);
 	__TEST_ASSERT_VM_VCPU_IOCTL(!ret, #cmd,	ret, vm);		\
 })
 
-static inline void sev_vm_init(struct kvm_vm *vm)
-{
-	vm->arch.sev_fd = open_sev_dev_path_or_exit();
-
-	vm_sev_ioctl(vm, KVM_SEV_INIT, NULL);
-}
-
-
-static inline void sev_es_vm_init(struct kvm_vm *vm)
-{
-	vm->arch.sev_fd = open_sev_dev_path_or_exit();
-
-	vm_sev_ioctl(vm, KVM_SEV_ES_INIT, NULL);
-}
+void sev_vm_init(struct kvm_vm *vm);
+void sev_es_vm_init(struct kvm_vm *vm);
 
 static inline void sev_register_encrypted_memory(struct kvm_vm *vm,
 						 struct userspace_mem_region *region)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index b2262b5fad9e..9da388100f3a 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -276,7 +276,6 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
 
 	vm->mode = shape.mode;
 	vm->type = shape.type;
-	vm->subtype = shape.subtype;
 
 	vm->pa_bits = vm_guest_mode_params[vm->mode].pa_bits;
 	vm->va_bits = vm_guest_mode_params[vm->mode].va_bits;
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 74a4c736c9ae..9f87ca8b7ab6 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -578,10 +578,11 @@ void kvm_arch_vm_post_create(struct kvm_vm *vm)
 	sync_global_to_guest(vm, host_cpu_is_intel);
 	sync_global_to_guest(vm, host_cpu_is_amd);
 
-	if (vm->subtype == VM_SUBTYPE_SEV)
-		sev_vm_init(vm);
-	else if (vm->subtype == VM_SUBTYPE_SEV_ES)
-		sev_es_vm_init(vm);
+	if (vm->type == KVM_X86_SEV_VM || vm->type == KVM_X86_SEV_ES_VM) {
+		struct kvm_sev_init init = { 0 };
+
+		vm_sev_ioctl(vm, KVM_SEV_INIT2, &init);
+	}
 }
 
 void vcpu_arch_set_entry_point(struct kvm_vcpu *vcpu, void *guest_code)
@@ -1081,9 +1082,12 @@ void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits)
 
 void kvm_init_vm_address_properties(struct kvm_vm *vm)
 {
-	if (vm->subtype == VM_SUBTYPE_SEV || vm->subtype == VM_SUBTYPE_SEV_ES) {
+	if (vm->type == KVM_X86_SEV_VM || vm->type == KVM_X86_SEV_ES_VM) {
+		vm->arch.sev_fd = open_sev_dev_path_or_exit();
 		vm->arch.c_bit = BIT_ULL(this_cpu_property(X86_PROPERTY_SEV_C_BIT));
 		vm->gpa_tag_mask = vm->arch.c_bit;
+	} else {
+		vm->arch.sev_fd = -1;
 	}
 }
 
diff --git a/tools/testing/selftests/kvm/lib/x86_64/sev.c b/tools/testing/selftests/kvm/lib/x86_64/sev.c
index e248d3364b9c..597994fa4f41 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/sev.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/sev.c
@@ -35,6 +35,32 @@ static void encrypt_region(struct kvm_vm *vm, struct userspace_mem_region *regio
 	}
 }
 
+void sev_vm_init(struct kvm_vm *vm)
+{
+	if (vm->type == KVM_X86_DEFAULT_VM) {
+		assert(vm->arch.sev_fd == -1);
+		vm->arch.sev_fd = open_sev_dev_path_or_exit();
+		vm_sev_ioctl(vm, KVM_SEV_INIT, NULL);
+	} else {
+		struct kvm_sev_init init = { 0 };
+		assert(vm->type == KVM_X86_SEV_VM);
+		vm_sev_ioctl(vm, KVM_SEV_INIT2, &init);
+	}
+}
+
+void sev_es_vm_init(struct kvm_vm *vm)
+{
+	if (vm->type == KVM_X86_DEFAULT_VM) {
+		assert(vm->arch.sev_fd == -1);
+		vm->arch.sev_fd = open_sev_dev_path_or_exit();
+		vm_sev_ioctl(vm, KVM_SEV_ES_INIT, NULL);
+	} else {
+		struct kvm_sev_init init = { 0 };
+		assert(vm->type == KVM_X86_SEV_ES_VM);
+		vm_sev_ioctl(vm, KVM_SEV_INIT2, &init);
+	}
+}
+
 void sev_vm_launch(struct kvm_vm *vm, uint32_t policy)
 {
 	struct kvm_sev_launch_start launch_start = {
@@ -91,10 +117,8 @@ struct kvm_vm *vm_sev_create_with_one_vcpu(uint32_t policy, void *guest_code,
 					   struct kvm_vcpu **cpu)
 {
 	struct vm_shape shape = {
-		.type = VM_TYPE_DEFAULT,
 		.mode = VM_MODE_DEFAULT,
-		.subtype = policy & SEV_POLICY_ES ? VM_SUBTYPE_SEV_ES :
-						    VM_SUBTYPE_SEV,
+		.type = policy & SEV_POLICY_ES ? KVM_X86_SEV_ES_VM : KVM_X86_SEV_VM,
 	};
 	struct kvm_vm *vm;
 	struct kvm_vcpu *cpus[1];
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 16/17] selftests: kvm: split "launch" phase of SEV VM creation
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (14 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 15/17] selftests: kvm: switch to using KVM_X86_*_VM Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  2024-04-04 12:13 ` [PATCH v5 17/17] selftests: kvm: add test for transferring FPU state into VMSA Paolo Bonzini
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata

Allow the caller to set the initial state of the VM.  Doing this
before sev_vm_launch() matters for SEV-ES, since that is the
place where the VMSA is updated and after which the guest state
becomes sealed.
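
A sketch of the resulting two-step flow, using the helpers as modified by
this patch (guest_code and xsave are assumed to be set up elsewhere):

	struct kvm_vcpu *vcpu;
	struct kvm_vm *vm;

	vm = vm_sev_create_with_one_vcpu(KVM_X86_SEV_ES_VM, guest_code, &vcpu);

	/* Any state set here, e.g. via vcpu_xsave_set(), is measured into
	 * the VMSA when the VM is launched below. */
	vcpu_xsave_set(vcpu, &xsave);

	vm_sev_launch(vm, SEV_POLICY_ES, NULL);	/* guest state is now sealed */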

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tools/testing/selftests/kvm/include/x86_64/sev.h |  3 ++-
 tools/testing/selftests/kvm/lib/x86_64/sev.c     | 16 ++++++++++------
 .../selftests/kvm/x86_64/sev_smoke_test.c        |  7 ++++++-
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/sev.h b/tools/testing/selftests/kvm/include/x86_64/sev.h
index 0719f083351a..82c11c81a956 100644
--- a/tools/testing/selftests/kvm/include/x86_64/sev.h
+++ b/tools/testing/selftests/kvm/include/x86_64/sev.h
@@ -31,8 +31,9 @@ void sev_vm_launch(struct kvm_vm *vm, uint32_t policy);
 void sev_vm_launch_measure(struct kvm_vm *vm, uint8_t *measurement);
 void sev_vm_launch_finish(struct kvm_vm *vm);
 
-struct kvm_vm *vm_sev_create_with_one_vcpu(uint32_t policy, void *guest_code,
+struct kvm_vm *vm_sev_create_with_one_vcpu(uint32_t type, void *guest_code,
 					   struct kvm_vcpu **cpu);
+void vm_sev_launch(struct kvm_vm *vm, uint32_t policy, uint8_t *measurement);
 
 kvm_static_assert(SEV_RET_SUCCESS == 0);
 
diff --git a/tools/testing/selftests/kvm/lib/x86_64/sev.c b/tools/testing/selftests/kvm/lib/x86_64/sev.c
index 597994fa4f41..d482029b6004 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/sev.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/sev.c
@@ -113,26 +113,30 @@ void sev_vm_launch_finish(struct kvm_vm *vm)
 	TEST_ASSERT_EQ(status.state, SEV_GUEST_STATE_RUNNING);
 }
 
-struct kvm_vm *vm_sev_create_with_one_vcpu(uint32_t policy, void *guest_code,
+struct kvm_vm *vm_sev_create_with_one_vcpu(uint32_t type, void *guest_code,
 					   struct kvm_vcpu **cpu)
 {
 	struct vm_shape shape = {
 		.mode = VM_MODE_DEFAULT,
-		.type = policy & SEV_POLICY_ES ? KVM_X86_SEV_ES_VM : KVM_X86_SEV_VM,
+		.type = type,
 	};
 	struct kvm_vm *vm;
 	struct kvm_vcpu *cpus[1];
-	uint8_t measurement[512];
 
 	vm = __vm_create_with_vcpus(shape, 1, 0, guest_code, cpus);
 	*cpu = cpus[0];
 
+	return vm;
+}
+
+void vm_sev_launch(struct kvm_vm *vm, uint32_t policy, uint8_t *measurement)
+{
 	sev_vm_launch(vm, policy);
 
-	/* TODO: Validate the measurement is as expected. */
+	if (!measurement)
+		measurement = alloca(256);
+
 	sev_vm_launch_measure(vm, measurement);
 
 	sev_vm_launch_finish(vm);
-
-	return vm;
 }
diff --git a/tools/testing/selftests/kvm/x86_64/sev_smoke_test.c b/tools/testing/selftests/kvm/x86_64/sev_smoke_test.c
index 026779f3ed06..234c80dd344d 100644
--- a/tools/testing/selftests/kvm/x86_64/sev_smoke_test.c
+++ b/tools/testing/selftests/kvm/x86_64/sev_smoke_test.c
@@ -41,7 +41,12 @@ static void test_sev(void *guest_code, uint64_t policy)
 	struct kvm_vm *vm;
 	struct ucall uc;
 
-	vm = vm_sev_create_with_one_vcpu(policy, guest_code, &vcpu);
+	uint32_t type = policy & SEV_POLICY_ES ? KVM_X86_SEV_ES_VM : KVM_X86_SEV_VM;
+
+	vm = vm_sev_create_with_one_vcpu(type, guest_code, &vcpu);
+
+	/* TODO: Validate the measurement is as expected. */
+	vm_sev_launch(vm, policy, NULL);
 
 	for (;;) {
 		vcpu_run(vcpu);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v5 17/17] selftests: kvm: add test for transferring FPU state into VMSA
  2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
                   ` (15 preceding siblings ...)
  2024-04-04 12:13 ` [PATCH v5 16/17] selftests: kvm: split "launch" phase of SEV VM creation Paolo Bonzini
@ 2024-04-04 12:13 ` Paolo Bonzini
  16 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-04 12:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: michael.roth, isaku.yamahata

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 .../selftests/kvm/x86_64/sev_smoke_test.c     | 89 +++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/sev_smoke_test.c b/tools/testing/selftests/kvm/x86_64/sev_smoke_test.c
index 234c80dd344d..7c70c0da4fb7 100644
--- a/tools/testing/selftests/kvm/x86_64/sev_smoke_test.c
+++ b/tools/testing/selftests/kvm/x86_64/sev_smoke_test.c
@@ -4,6 +4,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <sys/ioctl.h>
+#include <math.h>
 
 #include "test_util.h"
 #include "kvm_util.h"
@@ -13,6 +14,8 @@
 #include "sev.h"
 
 
+#define XFEATURE_MASK_X87_AVX (XFEATURE_MASK_FP | XFEATURE_MASK_SSE | XFEATURE_MASK_YMM)
+
 static void guest_sev_es_code(void)
 {
 	/* TODO: Check CPUID after GHCB-based hypercall support is added. */
@@ -35,6 +38,86 @@ static void guest_sev_code(void)
 	GUEST_DONE();
 }
 
+/* Stash state passed via VMSA before any compiled code runs.  */
+extern void guest_code_xsave(void);
+asm("guest_code_xsave:\n"
+    "mov $-1, %eax\n"
+    "mov $-1, %edx\n"
+    "xsave (%rdi)\n"
+    "jmp guest_sev_es_code");
+
+static void compare_xsave(u8 *from_host, u8 *from_guest)
+{
+	int i;
+	bool bad = false;
+	for (i = 0; i < 4095; i++) {
+		if (from_host[i] != from_guest[i]) {
+			printf("mismatch at %02hhx | %02hhx %02hhx\n", i, from_host[i], from_guest[i]);
+			bad = true;
+		}
+	}
+
+	if (bad)
+		abort();
+}
+
+static void test_sync_vmsa(uint32_t policy)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	vm_vaddr_t gva;
+	void *hva;
+
+	double x87val = M_PI;
+	struct kvm_xsave __attribute__((aligned(64))) xsave = { 0 };
+	struct kvm_sregs sregs;
+	struct kvm_xcrs xcrs = {
+		.nr_xcrs = 1,
+		.xcrs[0].xcr = 0,
+		.xcrs[0].value = XFEATURE_MASK_X87_AVX,
+	};
+
+	vm = vm_sev_create_with_one_vcpu(KVM_X86_SEV_ES_VM, guest_code_xsave, &vcpu);
+	gva = vm_vaddr_alloc_shared(vm, PAGE_SIZE, KVM_UTIL_MIN_VADDR,
+				    MEM_REGION_TEST_DATA);
+	hva = addr_gva2hva(vm, gva);
+
+	vcpu_args_set(vcpu, 1, gva);
+
+	vcpu_sregs_get(vcpu, &sregs);
+	sregs.cr4 |= X86_CR4_OSFXSR | X86_CR4_OSXSAVE;
+	vcpu_sregs_set(vcpu, &sregs);
+
+	vcpu_xcrs_set(vcpu, &xcrs);
+	asm("fninit\n"
+	    "vpcmpeqb %%ymm4, %%ymm4, %%ymm4\n"
+	    "fldl %3\n"
+	    "xsave (%2)\n"
+	    "fstp %%st\n"
+	    : "=m"(xsave)
+	    : "A"(XFEATURE_MASK_X87_AVX), "r"(&xsave), "m" (x87val)
+	    : "ymm4", "st", "st(1)", "st(2)", "st(3)", "st(4)", "st(5)", "st(6)", "st(7)");
+	vcpu_xsave_set(vcpu, &xsave);
+
+	vm_sev_launch(vm, SEV_POLICY_ES | policy, NULL);
+
+	/* This page is shared, so make it decrypted.  */
+	memset(hva, 0, 4096);
+
+	vcpu_run(vcpu);
+
+	TEST_ASSERT(vcpu->run->exit_reason == KVM_EXIT_SYSTEM_EVENT,
+		    "Wanted SYSTEM_EVENT, got %s",
+		    exit_reason_str(vcpu->run->exit_reason));
+	TEST_ASSERT_EQ(vcpu->run->system_event.type, KVM_SYSTEM_EVENT_SEV_TERM);
+	TEST_ASSERT_EQ(vcpu->run->system_event.ndata, 1);
+	TEST_ASSERT_EQ(vcpu->run->system_event.data[0], GHCB_MSR_TERM_REQ);
+
+	compare_xsave((u8 *)&xsave, (u8 *)hva);
+
+	kvm_vm_free(vm);
+}
+
 static void test_sev(void *guest_code, uint64_t policy)
 {
 	struct kvm_vcpu *vcpu;
@@ -87,6 +170,12 @@ int main(int argc, char *argv[])
 	if (kvm_cpu_has(X86_FEATURE_SEV_ES)) {
 		test_sev(guest_sev_es_code, SEV_POLICY_ES | SEV_POLICY_NO_DBG);
 		test_sev(guest_sev_es_code, SEV_POLICY_ES);
+
+		if (kvm_has_cap(KVM_CAP_XCRS) &&
+		    (xgetbv(0) & XFEATURE_MASK_X87_AVX) == XFEATURE_MASK_X87_AVX) {
+			test_sync_vmsa(0);
+			test_sync_vmsa(SEV_POLICY_NO_DBG);
+		}
 	}
 
 	return 0;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 04/17] KVM: introduce new vendor op for KVM_GET_DEVICE_ATTR
  2024-04-04 12:13 ` [PATCH v5 04/17] KVM: introduce new vendor op for KVM_GET_DEVICE_ATTR Paolo Bonzini
@ 2024-04-04 21:30   ` Isaku Yamahata
  0 siblings, 0 replies; 31+ messages in thread
From: Isaku Yamahata @ 2024-04-04 21:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, michael.roth, isaku.yamahata, isaku.yamahata

On Thu, Apr 04, 2024 at 08:13:14AM -0400,
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Allow vendor modules to provide their own attributes on /dev/kvm.
> To avoid proliferation of vendor ops, implement KVM_HAS_DEVICE_ATTR
> and KVM_GET_DEVICE_ATTR in terms of the same function.  You're not
> supposed to use KVM_GET_DEVICE_ATTR to do complicated computations,
> especially on /dev/kvm.
> 
> Reviewed-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  1 +
>  arch/x86/kvm/x86.c                 | 38 +++++++++++++++++++-----------
>  3 files changed, 26 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 110d7f29ca9a..5187fcf4b610 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -121,6 +121,7 @@ KVM_X86_OP(enter_smm)
>  KVM_X86_OP(leave_smm)
>  KVM_X86_OP(enable_smi_window)
>  #endif
> +KVM_X86_OP_OPTIONAL(dev_get_attr)
>  KVM_X86_OP_OPTIONAL(mem_enc_ioctl)
>  KVM_X86_OP_OPTIONAL(mem_enc_register_region)
>  KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 16e07a2eee19..04c430eb25cf 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1778,6 +1778,7 @@ struct kvm_x86_ops {
>  	void (*enable_smi_window)(struct kvm_vcpu *vcpu);
>  #endif
>  
> +	int (*dev_get_attr)(u32 group, u64 attr, u64 *val);
>  	int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp);
>  	int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *argp);
>  	int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3d2029402513..3934e7682734 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4842,34 +4842,44 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	return r;
>  }
>  
> -static int kvm_x86_dev_get_attr(struct kvm_device_attr *attr)
> +static int __kvm_x86_dev_get_attr(struct kvm_device_attr *attr, u64 *val)
>  {
> -	u64 __user *uaddr = u64_to_user_ptr(attr->addr);
> -
> -	if (attr->group)
> +	if (attr->group) {
> +		if (kvm_x86_ops.dev_get_attr)
> +			return static_call(kvm_x86_dev_get_attr)(attr->group, attr->attr, val);
>  		return -ENXIO;
> +	}
>  
>  	switch (attr->attr) {
>  	case KVM_X86_XCOMP_GUEST_SUPP:
> -		if (put_user(kvm_caps.supported_xcr0, uaddr))
> -			return -EFAULT;
> +		*val = kvm_caps.supported_xcr0;
>  		return 0;
>  	default:
>  		return -ENXIO;
>  	}
>  }
>  
> +static int kvm_x86_dev_get_attr(struct kvm_device_attr *attr)
> +{
> +	u64 __user *uaddr = u64_to_user_ptr(attr->addr);
> +	int r;
> +	u64 val;
> +
> +	r = __kvm_x86_dev_get_attr(attr, &val);
> +	if (r < 0)
> +		return r;
> +
> +	if (put_user(val, uaddr))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>  static int kvm_x86_dev_has_attr(struct kvm_device_attr *attr)
>  {
> -	if (attr->group)
> -		return -ENXIO;
> +	u64 val;
>  
> -	switch (attr->attr) {
> -	case KVM_X86_XCOMP_GUEST_SUPP:
> -		return 0;
> -	default:
> -		return -ENXIO;
> -	}
> +	return __kvm_x86_dev_get_attr(attr, &val);
>  }
>  
>  long kvm_arch_dev_ioctl(struct file *filp,
> -- 
> 2.43.0
> 
> 
> 

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@intel.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 05/17] KVM: SEV: publish supported VMSA features
  2024-04-04 12:13 ` [PATCH v5 05/17] KVM: SEV: publish supported VMSA features Paolo Bonzini
@ 2024-04-04 21:32   ` Isaku Yamahata
  0 siblings, 0 replies; 31+ messages in thread
From: Isaku Yamahata @ 2024-04-04 21:32 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, michael.roth, isaku.yamahata, isaku.yamahata

On Thu, Apr 04, 2024 at 08:13:15AM -0400,
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Compute the set of features to be stored in the VMSA when KVM is
> initialized; move it from there into kvm_sev_info when SEV is initialized,
> and then into the initial VMSA.
> 
> The new variable can then be used to return the set of supported features
> to userspace, via the KVM_GET_DEVICE_ATTR ioctl.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  .../virt/kvm/x86/amd-memory-encryption.rst    | 12 ++++++++++
>  arch/x86/include/uapi/asm/kvm.h               |  9 +++++--
>  arch/x86/kvm/svm/sev.c                        | 24 +++++++++++++++++--
>  arch/x86/kvm/svm/svm.c                        |  1 +
>  arch/x86/kvm/svm/svm.h                        |  2 ++
>  5 files changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> index 84335d119ff1..2ea648e4c97a 100644
> --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
> @@ -425,6 +425,18 @@ issued by the hypervisor to make the guest ready for execution.
>  
>  Returns: 0 on success, -negative on error
>  
> +Device attribute API
> +====================
> +
> +Attributes of the SEV implementation can be retrieved through the
> +``KVM_HAS_DEVICE_ATTR`` and ``KVM_GET_DEVICE_ATTR`` ioctls on the ``/dev/kvm``
> +device node, using group ``KVM_X86_GRP_SEV``.
> +
> +Currently only one attribute is implemented:
> +
> +* ``KVM_X86_SEV_VMSA_FEATURES``: return the set of all bits that
> +  are accepted in the ``vmsa_features`` of ``KVM_SEV_INIT2``.
> +
>  Firmware Management
>  ===================
>  
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index ef11aa4cab42..b7dc515f4c27 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -457,8 +457,13 @@ struct kvm_sync_regs {
>  
>  #define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE	0x00000001
>  
> -/* attributes for system fd (group 0) */
> -#define KVM_X86_XCOMP_GUEST_SUPP	0
> +/* vendor-independent attributes for system fd (group 0) */
> +#define KVM_X86_GRP_SYSTEM		0
> +#  define KVM_X86_XCOMP_GUEST_SUPP	0
> +
> +/* vendor-specific groups and attributes for system fd */
> +#define KVM_X86_GRP_SEV			1
> +#  define KVM_X86_SEV_VMSA_FEATURES	0
>  
>  struct kvm_vmx_nested_state_data {
>  	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];

Thank you for updating those.  Only for the constant and documentation parts.
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@intel.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
  2024-04-04 12:13 ` [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features Paolo Bonzini
@ 2024-04-04 21:39   ` Isaku Yamahata
  2024-04-05 23:01   ` Edgecombe, Rick P
  1 sibling, 0 replies; 31+ messages in thread
From: Isaku Yamahata @ 2024-04-04 21:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, michael.roth, isaku.yamahata, isaku.yamahata

On Thu, Apr 04, 2024 at 08:13:17AM -0400,
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Some VM types have characteristics in common; in fact, the only use
> of VM types right now is kvm_arch_has_private_mem and it assumes that
> _all_ nonzero VM types have private memory.
> 
> We will soon introduce a VM type for SEV and SEV-ES VMs, and at that
> point we will have two special characteristics of confidential VMs
> that depend on the VM type: not just if memory is private, but
> also whether guest state is protected.  For the latter we have
> kvm->arch.guest_state_protected, which is only set on a fully initialized
> VM.
> 
> For VM types with protected guest state, we can actually fix a problem in
> the SEV-ES implementation, where ioctls to set registers do not cause an
> error even if the VM has been initialized and the guest state encrypted.
> Make sure that this becomes an error when using the new VM types.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Message-Id: <20240209183743.22030-7-pbonzini@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  7 ++-
>  arch/x86/kvm/x86.c              | 93 ++++++++++++++++++++++++++-------
>  2 files changed, 79 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 04c430eb25cf..3d56b5bb10e9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1279,12 +1279,14 @@ enum kvm_apicv_inhibit {
>  };
>  
>  struct kvm_arch {
> -	unsigned long vm_type;
>  	unsigned long n_used_mmu_pages;
>  	unsigned long n_requested_mmu_pages;
>  	unsigned long n_max_mmu_pages;
>  	unsigned int indirect_shadow_pages;
>  	u8 mmu_valid_gen;
> +	u8 vm_type;
> +	bool has_private_mem;
> +	bool has_protected_state;
>  	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
>  	struct list_head active_mmu_pages;
>  	struct list_head zapped_obsolete_pages;
> @@ -2153,8 +2155,9 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
>  void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>  		       int tdp_max_root_level, int tdp_huge_page_level);
>  
> +
>  #ifdef CONFIG_KVM_PRIVATE_MEM
> -#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.vm_type != KVM_X86_DEFAULT_VM)
> +#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
>  #else
>  #define kvm_arch_has_private_mem(kvm) false
>  #endif
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3934e7682734..d4a8d896798f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5555,11 +5555,15 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>  	return 0;
>  }
>  
> -static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
> -					     struct kvm_debugregs *dbgregs)
> +static int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
> +					    struct kvm_debugregs *dbgregs)
>  {
>  	unsigned int i;
>  
> +	if (vcpu->kvm->arch.has_protected_state &&
> +	    vcpu->arch.guest_state_protected)
> +		return -EINVAL;
> +
>  	memset(dbgregs, 0, sizeof(*dbgregs));
>  
>  	BUILD_BUG_ON(ARRAY_SIZE(vcpu->arch.db) != ARRAY_SIZE(dbgregs->db));
> @@ -5568,6 +5572,7 @@ static void kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
>  
>  	dbgregs->dr6 = vcpu->arch.dr6;
>  	dbgregs->dr7 = vcpu->arch.dr7;
> +	return 0;
>  }
>  
>  static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
> @@ -5575,6 +5580,10 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
>  {
>  	unsigned int i;
>  
> +	if (vcpu->kvm->arch.has_protected_state &&
> +	    vcpu->arch.guest_state_protected)
> +		return -EINVAL;
> +
>  	if (dbgregs->flags)
>  		return -EINVAL;
>  
> @@ -5595,8 +5604,8 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
>  }
>  
>  
> -static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
> -					  u8 *state, unsigned int size)
> +static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
> +					 u8 *state, unsigned int size)
>  {
>  	/*
>  	 * Only copy state for features that are enabled for the guest.  The
> @@ -5614,24 +5623,25 @@ static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
>  			     XFEATURE_MASK_FPSSE;
>  
>  	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
> -		return;
> +		return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
>  
>  	fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, state, size,
>  				       supported_xcr0, vcpu->arch.pkru);
> +	return 0;
>  }
>  
> -static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
> -					 struct kvm_xsave *guest_xsave)
> +static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
> +					struct kvm_xsave *guest_xsave)
>  {
> -	kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region,
> -				      sizeof(guest_xsave->region));
> +	return kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region,
> +					     sizeof(guest_xsave->region));
>  }
>  
>  static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
>  					struct kvm_xsave *guest_xsave)
>  {
>  	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
> -		return 0;
> +		return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
>  
>  	return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu,
>  					      guest_xsave->region,
> @@ -5639,18 +5649,23 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
>  					      &vcpu->arch.pkru);
>  }
>  
> -static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu,
> -					struct kvm_xcrs *guest_xcrs)
> +static int kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu,
> +				       struct kvm_xcrs *guest_xcrs)
>  {
> +	if (vcpu->kvm->arch.has_protected_state &&
> +	    vcpu->arch.guest_state_protected)
> +		return -EINVAL;
> +
>  	if (!boot_cpu_has(X86_FEATURE_XSAVE)) {
>  		guest_xcrs->nr_xcrs = 0;
> -		return;
> +		return 0;
>  	}
>  
>  	guest_xcrs->nr_xcrs = 1;
>  	guest_xcrs->flags = 0;
>  	guest_xcrs->xcrs[0].xcr = XCR_XFEATURE_ENABLED_MASK;
>  	guest_xcrs->xcrs[0].value = vcpu->arch.xcr0;
> +	return 0;
>  }
>  
>  static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
> @@ -5658,6 +5673,10 @@ static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
>  {
>  	int i, r = 0;
>  
> +	if (vcpu->kvm->arch.has_protected_state &&
> +	    vcpu->arch.guest_state_protected)
> +		return -EINVAL;
> +
>  	if (!boot_cpu_has(X86_FEATURE_XSAVE))
>  		return -EINVAL;
>  
> @@ -6040,7 +6059,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  	case KVM_GET_DEBUGREGS: {
>  		struct kvm_debugregs dbgregs;
>  
> -		kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
> +		r = kvm_vcpu_ioctl_x86_get_debugregs(vcpu, &dbgregs);
> +		if (r < 0)
> +			break;
>  
>  		r = -EFAULT;
>  		if (copy_to_user(argp, &dbgregs,
> @@ -6070,7 +6091,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  		if (!u.xsave)
>  			break;
>  
> -		kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave);
> +		r = kvm_vcpu_ioctl_x86_get_xsave(vcpu, u.xsave);
> +		if (r < 0)
> +			break;
>  
>  		r = -EFAULT;
>  		if (copy_to_user(argp, u.xsave, sizeof(struct kvm_xsave)))
> @@ -6099,7 +6122,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  		if (!u.xsave)
>  			break;
>  
> -		kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.buffer, size);
> +		r = kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.buffer, size);
> +		if (r < 0)
> +			break;
>  
>  		r = -EFAULT;
>  		if (copy_to_user(argp, u.xsave, size))
> @@ -6115,7 +6140,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  		if (!u.xcrs)
>  			break;
>  
> -		kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs);
> +		r = kvm_vcpu_ioctl_x86_get_xcrs(vcpu, u.xcrs);
> +		if (r < 0)
> +			break;
>  
>  		r = -EFAULT;
>  		if (copy_to_user(argp, u.xcrs,
> @@ -6259,6 +6286,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  	}
>  #endif
>  	case KVM_GET_SREGS2: {
> +		r = -EINVAL;
> +		if (vcpu->kvm->arch.has_protected_state &&
> +		    vcpu->arch.guest_state_protected)
> +			goto out;
> +
>  		u.sregs2 = kzalloc(sizeof(struct kvm_sregs2), GFP_KERNEL);
>  		r = -ENOMEM;
>  		if (!u.sregs2)
> @@ -6271,6 +6303,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  		break;
>  	}
>  	case KVM_SET_SREGS2: {
> +		r = -EINVAL;
> +		if (vcpu->kvm->arch.has_protected_state &&
> +		    vcpu->arch.guest_state_protected)
> +			goto out;
> +
>  		u.sregs2 = memdup_user(argp, sizeof(struct kvm_sregs2));
>  		if (IS_ERR(u.sregs2)) {
>  			r = PTR_ERR(u.sregs2);
> @@ -11478,6 +11515,10 @@ static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>  
>  int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>  {
> +	if (vcpu->kvm->arch.has_protected_state &&
> +	    vcpu->arch.guest_state_protected)
> +		return -EINVAL;
> +
>  	vcpu_load(vcpu);
>  	__get_regs(vcpu, regs);
>  	vcpu_put(vcpu);
> @@ -11519,6 +11560,10 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>  
>  int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
>  {
> +	if (vcpu->kvm->arch.has_protected_state &&
> +	    vcpu->arch.guest_state_protected)
> +		return -EINVAL;
> +
>  	vcpu_load(vcpu);
>  	__set_regs(vcpu, regs);
>  	vcpu_put(vcpu);
> @@ -11591,6 +11636,10 @@ static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
>  int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
>  				  struct kvm_sregs *sregs)
>  {
> +	if (vcpu->kvm->arch.has_protected_state &&
> +	    vcpu->arch.guest_state_protected)
> +		return -EINVAL;
> +
>  	vcpu_load(vcpu);
>  	__get_sregs(vcpu, sregs);
>  	vcpu_put(vcpu);
> @@ -11858,6 +11907,10 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>  {
>  	int ret;
>  
> +	if (vcpu->kvm->arch.has_protected_state &&
> +	    vcpu->arch.guest_state_protected)
> +		return -EINVAL;
> +
>  	vcpu_load(vcpu);
>  	ret = __set_sregs(vcpu, sregs);
>  	vcpu_put(vcpu);
> @@ -11975,7 +12028,7 @@ int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
>  	struct fxregs_state *fxsave;
>  
>  	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
> -		return 0;
> +		return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
>  
>  	vcpu_load(vcpu);
>  
> @@ -11998,7 +12051,7 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
>  	struct fxregs_state *fxsave;
>  
>  	if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
> -		return 0;
> +		return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
>  
>  	vcpu_load(vcpu);
>  
> @@ -12524,6 +12577,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  		return -EINVAL;
>  
>  	kvm->arch.vm_type = type;
> +	kvm->arch.has_private_mem =
> +		(type == KVM_X86_SW_PROTECTED_VM);
>  
>  	ret = kvm_page_track_init(kvm);
>  	if (ret)
> -- 
> 2.43.0

This works well with the TDX KVM patch series.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@intel.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
  2024-04-04 12:13 ` [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features Paolo Bonzini
  2024-04-04 21:39   ` Isaku Yamahata
@ 2024-04-05 23:01   ` Edgecombe, Rick P
  2024-04-09  1:21     ` Sean Christopherson
  1 sibling, 1 reply; 31+ messages in thread
From: Edgecombe, Rick P @ 2024-04-05 23:01 UTC (permalink / raw)
  To: kvm, pbonzini, linux-kernel; +Cc: seanjc, michael.roth, Yamahata, Isaku

On Thu, 2024-04-04 at 08:13 -0400, Paolo Bonzini wrote:
>  
>  struct kvm_arch {
> -       unsigned long vm_type;
>         unsigned long n_used_mmu_pages;
>         unsigned long n_requested_mmu_pages;
>         unsigned long n_max_mmu_pages;
>         unsigned int indirect_shadow_pages;
>         u8 mmu_valid_gen;
> +       u8 vm_type;
> +       bool has_private_mem;
> +       bool has_protected_state;

I'm a little late to this conversation, so hopefully not just complicating things. But why not
deduce has_private_mem and has_protected_state from the vm_type at runtime? Like if
kvm.arch.vm_type was instead a bit mask with the bit position of the KVM_X86_*_VM set,
kvm_arch_has_private_mem() could bitwise-and with a compile time mask of vm_types that have private
memory. This also prevents it from ever transitioning through nonsensical states like vm_type ==
KVM_X86_TDX_VM but !has_private_memory, so it would be a little more robust.
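
For illustration only, a minimal sketch of that idea (not from the patch);
KVM_X86_PRIVATE_MEM_VM_TYPES is an invented name and the mask contents are just an example:

	kvm->arch.vm_type = BIT(type);

	/* Invented name; compile-time mask of the vm_types with private memory. */
	#define KVM_X86_PRIVATE_MEM_VM_TYPES \
		(BIT(KVM_X86_SW_PROTECTED_VM) | BIT(KVM_X86_TDX_VM))

	static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
	{
		return kvm->arch.vm_type & KVM_X86_PRIVATE_MEM_VM_TYPES;
	}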

Partly why I ask is that there is logic in the x86 MMU TDX changes that tries to be generic but
still needs special handling for it. The current solution is to look at kvm_gfn_shared_mask(), as
TDX is the only vm type that sets it, but Isaku and I were discussing whether we should check
something else that doesn't tie together unrelated concepts:
https://lore.kernel.org/kvm/20240319235654.GC1994522@ls.amr.corp.intel.com/

Since it's a ways down in the mail, the relevant snippet:
"
> >  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> >                                    struct kvm_memory_slot *slot)
> >  {
> > -       kvm_mmu_zap_all_fast(kvm);
> > +       if (kvm_gfn_shared_mask(kvm))
> 
> There seems to be an attempt to abstract away the existence of Secure-
> EPT in mmu.c, that is not fully successful. In this case the code
> checks kvm_gfn_shared_mask() to see if it needs to handle the zapping
> in a way specifically needed by S-EPT. It ends up being a little confusing
> because the actual check is about whether there is a shared bit. It
> only works because S-EPT is the only thing that has a
> kvm_gfn_shared_mask().
> 
> Doing something like (kvm->arch.vm_type == KVM_X86_TDX_VM) looks wrong,
> but is more honest about what we are getting up to here. I'm not sure
> though, what do you think?

Right, I attempted this and failed in the zapping case.  This is due to the restriction
that the Secure-EPT pages must be removed from the leaves.  The VMX case (also
NPT, even SNP) heavily depends on zapping the root entry as an optimization.

I can think of:
- Add a TDX check. Looks wrong.
- Use kvm_gfn_shared_mask(kvm). Confusing.
- Give another name for this check, like zap_from_leafs (or a better name?)
  The implementation is the same as kvm_gfn_shared_mask(), with a comment.
  - Or we can add a boolean variable to struct kvm.
"

With this patch, it seems like the convention would be to add and check a "zap_leafs_only" bool.
But that starts to become a lot of bools. If instead we added an arch_zap_leafs_only(struct kvm
*kvm) that checked whether the vm_type is KVM_X86_TDX_VM, it could make the calling code clearer.
But then I wonder why not do the same for has_private_mem and has_protected_state?
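
A rough sketch of that helper (KVM_X86_TDX_VM being the not-yet-merged TDX VM type):

	static inline bool arch_zap_leafs_only(struct kvm *kvm)
	{
		/* Only TDX's Secure-EPT requires zapping leaf SPTEs only. */
		return kvm->arch.vm_type == KVM_X86_TDX_VM;
	}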

Of course TDX can adjust to whatever form the state takes. It just seems cleaner to me.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
  2024-04-05 23:01   ` Edgecombe, Rick P
@ 2024-04-09  1:21     ` Sean Christopherson
  2024-04-09 14:01       ` Edgecombe, Rick P
                         ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Sean Christopherson @ 2024-04-09  1:21 UTC (permalink / raw)
  To: Rick P Edgecombe
  Cc: kvm, pbonzini, linux-kernel, michael.roth, Isaku Yamahata

On Fri, Apr 05, 2024, Rick P Edgecombe wrote:
> On Thu, 2024-04-04 at 08:13 -0400, Paolo Bonzini wrote:
> >  
> >  struct kvm_arch {
> > -       unsigned long vm_type;
> >         unsigned long n_used_mmu_pages;
> >         unsigned long n_requested_mmu_pages;
> >         unsigned long n_max_mmu_pages;
> >         unsigned int indirect_shadow_pages;
> >         u8 mmu_valid_gen;
> > +       u8 vm_type;
> > +       bool has_private_mem;
> > +       bool has_protected_state;
> 
> I'm a little late to this conversation, so hopefully not just complicating
> things. But why not deduce has_private_mem and has_protected_state from the
> vm_type at runtime? Like if kvm.arch.vm_type was instead a bit mask with
> the bit position of the KVM_X86_*_VM set, kvm_arch_has_private_mem() could
> bitwise-and with a compile time mask of vm_types that have private memory.
> This also prevents it from ever transitioning through nonsensical states
> like vm_type == KVM_X86_TDX_VM but !has_private_memory, so it would be a
> little more robust.

LOL, time is a circle, or something like that.  Paolo actually did this in v2[*],
and I objected, vociferously.

KVM advertises VM types to userspace via a 32-bit field, one bit per type.  So
without more uAPI changes, the VM type needs to be <=31.  KVM could embed the
"has private memory" information into the type, but then we cut down on the number
of possible VM types *and* bleed has_private_memory into KVM's ABI.
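
For reference, a sketch of how userspace would consume that bitmask; KVM_CAP_VM_TYPES is the
capability this series adds, and error handling is omitted:

	/* Query which KVM_X86_*_VM types this kernel supports. */
	int supported = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_VM_TYPES);
	int vm_fd = -1;

	if (supported & (1 << KVM_X86_SEV_ES_VM))
		vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_SEV_ES_VM);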

While it's unlikely KVM will ever support TDX without has_private_memory, it's
entirely possible that KVM could add support for an existing VM "base" type that
doesn't currently support private memory.  E.g. with some massaging, KVM could
support private memory for SEV and SEV-ES.  And then we need to add an entirely
new VM type just so that KVM can let it use private memory.

Obviously KVM could shove in bits after the fact, e.g. store vm_type as a u64
instead of u32 (or u8 as in this patch), but then what's the point?  Burning a
byte instead of a bit for a per-VM flag is a complete non-issue, and booleans tend
to yield code that's easier to read and easier to maintain.

[*] https://lore.kernel.org/all/ZdjL783FazB6V6Cy@google.com

> Partly why I ask is that there is logic in the x86 MMU TDX changes that tries
> to be generic but still needs special handling for it. The current solution is
> to look at kvm_gfn_shared_mask(), as TDX is the only vm type that sets it, but
> Isaku and I were discussing whether we should check something else that
> doesn't tie together unrelated concepts:
> https://lore.kernel.org/kvm/20240319235654.GC1994522@ls.amr.corp.intel.com/
> 
> Since it's a ways down in the mail, the relevant snippet:
> "
> > >  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> > >                                    struct kvm_memory_slot *slot)
> > >  {
> > > -       kvm_mmu_zap_all_fast(kvm);
> > > +       if (kvm_gfn_shared_mask(kvm))

Whatever you do that is TDX specific and an internal KVM thing is likely the wrong
thing :-)

The main reason KVM doesn't do a targeted zap on memslot removal is because of ABI
baggage that we _think_ is limited to interaction with VFIO.  Since KVM doesn't
have any ABI for TDX *or* SNP, I want to at least entertain the option of doing
a targeted zap for SNP as well as TDX, even though it's only truly "necessary" for
TDX, in quotes because it's not strictly necessary, e.g. KVM could BLOCK the S-EPT
entries without fully removing the mappings.

Whether or not targeted zapping is optimal for SNP (or any VM type) is very much
TBD, and likely highly dependent on use case, but at the same time it would be
nice to not rule it out completely.

E.g. ChromeOS currently has a use case where they frequently delete and recreate
a 2GiB (give or take) memslot.  For that use case, zapping _just_ that memslot is
likely far superior to blasting and rebuilding the entire VM.  But if userspace
deletes a 1TiB memslot for some reason, e.g. for memory unplug, then the fast zap
is probably better, even though it requires rebuilding all SPTEs.

> > There seems to be an attempt to abstract away the existence of Secure-
> > EPT in mmu.c, that is not fully successful. In this case the code
> > checks kvm_gfn_shared_mask() to see if it needs to handle the zapping
> > in a way specifically needed by S-EPT. It ends up being a little confusing
> > because the actual check is about whether there is a shared bit. It
> > only works because S-EPT is the only thing that has a
> > kvm_gfn_shared_mask().
> > 
> > Doing something like (kvm->arch.vm_type == KVM_X86_TDX_VM) looks wrong,
> > but is more honest about what we are getting up to here. I'm not sure
> > though, what do you think?
> 
> Right, I attempted this and failed in the zapping case.  This is due to the
> restriction that the Secure-EPT pages must be removed from the leaves.  The
> VMX case (also NPT, even SNP) heavily depends on zapping the root entry as an
> optimization.

As above, it's more nuanced than that.  KVM has come to depend on the fast zap,
but it got that way *because* KVM has historically zapped everything, and
userspace has (unknowingly) relied on that behavior.

> I can think of:
> - Add a TDX check. Looks wrong.
> - Use kvm_gfn_shared_mask(kvm). Confusing.

Ya, even if we end up making it a hardcoded TDX thing, dress it up a bit.  E.g.
even if KVM checks for a shared mask under the hood, add a helper to capture the
logic, e.g. kvm_zap_all_sptes_on_memslot_deletion(kvm).
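
Something along these lines, as a sketch only; it assumes kvm_gfn_shared_mask() (from the TDX
patches) stays the underlying signal, and kvm_mmu_zap_memslot_leafs() is a placeholder name:

	static bool kvm_zap_all_sptes_on_memslot_deletion(struct kvm *kvm)
	{
		/*
		 * Only TDX's S-EPT sets a shared GFN mask, and S-EPT is also
		 * the only case that must zap leaf SPTEs instead of zapping
		 * the roots, so (ab)use the mask as the signal.
		 */
		return !kvm_gfn_shared_mask(kvm);
	}

	void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
					   struct kvm_memory_slot *slot)
	{
		if (kvm_zap_all_sptes_on_memslot_deletion(kvm))
			kvm_mmu_zap_all_fast(kvm);
		else
			kvm_mmu_zap_memslot_leafs(kvm, slot);
	}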

> - Give another name for this check, like zap_from_leafs (or a better name?)
>   The implementation is the same as kvm_gfn_shared_mask(), with a comment.
>   - Or we can add a boolean variable to struct kvm.

If we _don't_ hardcode the behavior, a per-memslot flag or a per-VM capability
(and thus boolean) is likely the way to go.  My off-the-cuff vote is probably for
a per-memslot flag.
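
A hypothetical uAPI sketch of the per-memslot variant; KVM_MEM_ZAP_LEAFS_ONLY and its value are
invented here, the rest is the existing KVM_SET_USER_MEMORY_REGION2 uAPI, and backing/gmem_fd
are assumed to already exist:

	#define KVM_MEM_ZAP_LEAFS_ONLY	(1UL << 3)	/* hypothetical flag */

	struct kvm_userspace_memory_region2 region = {
		.slot            = 0,
		.flags           = KVM_MEM_GUEST_MEMFD | KVM_MEM_ZAP_LEAFS_ONLY,
		.guest_phys_addr = 0,
		.memory_size     = 2ULL << 30,
		.userspace_addr  = (__u64)backing,	/* hypothetical host mapping */
		.guest_memfd     = gmem_fd,		/* hypothetical guest_memfd fd */
	};

	ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);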

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
  2024-04-09  1:21     ` Sean Christopherson
@ 2024-04-09 14:01       ` Edgecombe, Rick P
  2024-04-09 14:43       ` Paolo Bonzini
  2024-05-07 23:01       ` Edgecombe, Rick P
  2 siblings, 0 replies; 31+ messages in thread
From: Edgecombe, Rick P @ 2024-04-09 14:01 UTC (permalink / raw)
  To: seanjc; +Cc: kvm, pbonzini, linux-kernel, michael.roth, Yamahata, Isaku

On Mon, 2024-04-08 at 18:21 -0700, Sean Christopherson wrote:
> 
> Whatever you do that is TDX specific and an internal KVM thing is likely the
> wrong thing :-)
> 
> The main reason KVM doesn't do a targeted zap on memslot removal is because
> of ABI baggage that we _think_ is limited to interaction with VFIO.  Since
> KVM doesn't have any ABI for TDX *or* SNP, I want to at least entertain the
> option of doing a targeted zap for SNP as well as TDX, even though it's only
> truly "necessary" for TDX, in quotes because it's not strictly necessary,
> e.g. KVM could BLOCK the S-EPT entries without fully removing the mappings.
> 
> Whether or not targeted zapping is optimal for SNP (or any VM type) is very
> much TBD, and likely highly dependent on use case, but at the same time it
> would be nice to not rule it out completely.
> 
> E.g. ChromeOS currently has a use case where they frequently delete and
> recreate a 2GiB (give or take) memslot.  For that use case, zapping _just_
> that memslot is likely far superior to blasting and rebuilding the entire
> VM.  But if userspace deletes a 1TiB memslot for some reason, e.g. for
> memory unplug, then the fast zap is probably better, even though it requires
> rebuilding all SPTEs.

Interesting, thanks for the history.

> 
> > > There seems to be an attempt to abstract away the existence of Secure-
> > > EPT in mmu.c, that is not fully successful. In this case the code
> > > checks kvm_gfn_shared_mask() to see if it needs to handle the zapping
> > > in a way specifically needed by S-EPT. It ends up being a little confusing
> > > because the actual check is about whether there is a shared bit. It
> > > only works because S-EPT is the only thing that has a
> > > kvm_gfn_shared_mask().
> > > 
> > > Doing something like (kvm->arch.vm_type == KVM_X86_TDX_VM) looks wrong,
> > > but is more honest about what we are getting up to here. I'm not sure
> > > though, what do you think?
> > 
> > Right, I attempted this and failed in the zapping case.  This is due to the
> > restriction that the Secure-EPT pages must be removed from the leaves.  The
> > VMX case (also NPT, even SNP) heavily depends on zapping the root entry as
> > an optimization.
> 
> As above, it's more nuanced than that.  KVM has come to depend on the fast
> zap, but it got that way *because* KVM has historically zapped everything,
> and userspace has (unknowingly) relied on that behavior.
> 
> > I can think of:
> > - Add a TDX check. Looks wrong.
> > - Use kvm_gfn_shared_mask(kvm). Confusing.
> 
> Ya, even if we end up making it a hardcoded TDX thing, dress it up a bit.
> E.g. even if KVM checks for a shared mask under the hood, add a helper to
> capture the logic, e.g. kvm_zap_all_sptes_on_memslot_deletion(kvm).
> 
> > - Give another name for this check, like zap_from_leafs (or a better name?)
> >   The implementation is the same as kvm_gfn_shared_mask(), with a comment.
> >   - Or we can add a boolean variable to struct kvm.
> 
> If we _don't_ hardcode the behavior, a per-memslot flag or a per-VM capability
> (and thus boolean) is likely the way to go.  My off-the-cuff vote is probably
> for a per-memslot flag.

The per-memslot flag is interesting. If we had a per-memslot flag it might be
nice for that 2GiB memslot. For TDX, making userspace have to know about zapping
requirements is not ideal. If TDX somehow loses the restriction someday, then
userspace would have to manage that as well. I think the decision belongs inside
KVM, for TDX at least.

We'll have to take a look at how they would come together in the code.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
  2024-04-09  1:21     ` Sean Christopherson
  2024-04-09 14:01       ` Edgecombe, Rick P
@ 2024-04-09 14:43       ` Paolo Bonzini
  2024-04-09 15:26         ` Sean Christopherson
  2024-05-07 23:01       ` Edgecombe, Rick P
  2 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2024-04-09 14:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Rick P Edgecombe, kvm, linux-kernel, michael.roth, Isaku Yamahata

On Tue, Apr 9, 2024 at 3:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > I'm a little late to this conversation, so hopefully not just complicating
> > things. But why not deduce has_private_mem and has_protected_state from the
> > vm_type at runtime? Like if kvm.arch.vm_type was instead a bit mask with
> > the bit position of the KVM_X86_*_VM set, kvm_arch_has_private_mem() could
> > bitwise-and with a compile time mask of vm_types that have private memory.
> > This also prevents it from ever transitioning through nonsensical states
> > like vm_type == KVM_X86_TDX_VM but !has_private_memory, so it would be a
> > little more robust.
>
> LOL, time is a circle, or something like that.  Paolo actually did this in v2[*],
> and I objected, vociferously.

To be fair, Rick is asking for something much less hideous - just set

 kvm->arch.vm_type = (1 << vm_type);

and then define kvm_has_*(kvm) as !!(kvm->arch.vm_type & SOME_BIT_MASK).
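
(For instance, with a purely illustrative mask:)

 #define kvm_has_protected_state(kvm) \
	(!!((kvm)->arch.vm_type & (BIT(KVM_X86_SEV_ES_VM) | BIT(KVM_X86_TDX_VM))))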

And indeed it makes sense as an alternative. It also feels a little
bit more restrictive and the benefit is small, so I think I'm going to
go with this version.

Paolo


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
  2024-04-09 14:43       ` Paolo Bonzini
@ 2024-04-09 15:26         ` Sean Christopherson
  0 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2024-04-09 15:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Rick P Edgecombe, kvm, linux-kernel, michael.roth, Isaku Yamahata

On Tue, Apr 09, 2024, Paolo Bonzini wrote:
> On Tue, Apr 9, 2024 at 3:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > > I'm a little late to this conversation, so hopefully not just complicating
> > > things. But why not deduce has_private_mem and has_protected_state from the
> > > vm_type at runtime? Like if kvm.arch.vm_type was instead a bit mask with
> > > the bit position of the KVM_X86_*_VM set, kvm_arch_has_private_mem() could
> > > bitwise-and with a compile time mask of vm_types that have private memory.
> > > This also prevents it from ever transitioning through nonsensical states
> > > like vm_type == KVM_X86_TDX_VM but !has_private_memory, so it would be a
> > > little more robust.
> >
> > LOL, time is a circle, or something like that.  Paolo actually did this in v2[*],
> > and I objected, vociferously.
> 
> To be fair, Rick is asking for something much less hideous - just set
> 
>  kvm->arch.vm_type = (1 << vm_type);
> 
> and then define kvm_has_*(kvm) as !!(kvm->arch.vm_type & SOME_BIT_MASK).
> 
> And indeed it makes sense as an alternative.

Ah, yeah, I'd be fine with that. 

> It also feels a little bit more restrictive and the benefit is small, so I
> think I'm going to go with this version.

+1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
  2024-04-09  1:21     ` Sean Christopherson
  2024-04-09 14:01       ` Edgecombe, Rick P
  2024-04-09 14:43       ` Paolo Bonzini
@ 2024-05-07 23:01       ` Edgecombe, Rick P
  2024-05-08  0:21         ` Sean Christopherson
  2 siblings, 1 reply; 31+ messages in thread
From: Edgecombe, Rick P @ 2024-05-07 23:01 UTC (permalink / raw)
  To: seanjc
  Cc: kvm, pbonzini, linux-kernel, Zhao, Yan Y, michael.roth, Yamahata, Isaku

On Mon, 2024-04-08 at 18:21 -0700, Sean Christopherson wrote:
> > - Give another name for this check, like zap_from_leafs (or a better name?)
> >   The implementation is the same as kvm_gfn_shared_mask(), with a comment.
> >   - Or we can add a boolean variable to struct kvm.
> 
> If we _don't_ hardcode the behavior, a per-memslot flag or a per-VM capability
> (and thus boolean) is likely the way to go.  My off-the-cuff vote is probably
> for a per-memslot flag.

Hi Sean,

Can you elaborate on the reason for a per-memslot flag? We are discussing this
design point internally, and also the intersection with the previous attempts to
do something similar with a per-vm flag[0].

I'm wondering if the intention is to try to make it a memslot flag so it can be
expanded for normal VM usage. Because of the discussion on the original
attempts, it seems safer to keep this behavior more limited (TDX only) for now.
And for TDX's usage a struct kvm bool fits best, because all memslots need to be
set to zap_leafs_only = true anyway. It's simpler for userspace, and there are
fewer possible situations to worry about for KVM.

[0]
https://lore.kernel.org/kvm/20200703025047.13987-1-sean.j.christopherson@intel.com/


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
  2024-05-07 23:01       ` Edgecombe, Rick P
@ 2024-05-08  0:21         ` Sean Christopherson
  2024-05-08  1:19           ` Edgecombe, Rick P
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2024-05-08  0:21 UTC (permalink / raw)
  To: Rick P Edgecombe
  Cc: kvm, pbonzini, linux-kernel, Yan Y Zhao, michael.roth, Isaku Yamahata

On Tue, May 07, 2024, Rick P Edgecombe wrote:
> On Mon, 2024-04-08 at 18:21 -0700, Sean Christopherson wrote:
> > > - Give another name for this check, like zap_from_leafs (or a better name?)
> > >   The implementation is the same as kvm_gfn_shared_mask(), with a comment.
> > >   - Or we can add a boolean variable to struct kvm.
> > 
> > If we _don't_ hardcode the behavior, a per-memslot flag or a per-VM
> > capability (and thus boolean) is likely the way to go.  My off-the-cuff
> > vote is probably for a per-memslot flag.
> 
> Hi Sean,
> 
> Can you elaborate on the reason for a per-memslot flag? We are discussing this
> design point internally, and also the intersection with the previous attempts to
> do something similar with a per-vm flag[0].
> 
> I'm wondering if the intention is to try to make it a memslot flag so it can
> be expanded for normal VM usage.

Sure, I'll go with that answer.  Like I said, off-the-cuff.

There's no concrete motivation, it's more that _if_ we're going to expose a knob
to userspace, then I'd prefer to make it as precise as possible to minimize the
chances of KVM ending up back in ABI hell again.

> Because of the discussion on the original attempts, it seems safer to keep this
> behavior more limited (TDX only) for now.  And for TDX's usage a struct kvm
> bool fits best, because all memslots need to be set to zap_leafs_only = true
> anyway.

No they don't.  They might be set that way in practice for QEMU, but it's not
strictly required.  E.g. nothing would prevent a VMM from exposing a shared-only
memslot to a guest.  The memslots that burned KVM the first time around were
related to VFIO devices, and I wouldn't put it past someone to be crazy enough
to pass through an untrusted device to a TDX guest.

> It's simpler for userspace, and there are fewer possible situations to worry about for KVM.
> 
> [0] https://lore.kernel.org/kvm/20200703025047.13987-1-sean.j.christopherson@intel.com/


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
  2024-05-08  0:21         ` Sean Christopherson
@ 2024-05-08  1:19           ` Edgecombe, Rick P
  2024-05-08 14:38             ` Sean Christopherson
  0 siblings, 1 reply; 31+ messages in thread
From: Edgecombe, Rick P @ 2024-05-08  1:19 UTC (permalink / raw)
  To: seanjc
  Cc: kvm, pbonzini, linux-kernel, Zhao, Yan Y, michael.roth, Yamahata, Isaku

On Tue, 2024-05-07 at 17:21 -0700, Sean Christopherson wrote:
> > Can you elaborate on the reason for a per-memslot flag? We are discussing
> > this design point internally, and also the intersection with the previous
> > attempts to do something similar with a per-vm flag[0].
> > 
> > I'm wondering if the intention is to try to make it a memslot flag so it
> > can be expanded for normal VM usage.
> 
> Sure, I'll go with that answer.  Like I said, off-the-cuff.
> 
> There's no concrete motivation, it's more that _if_ we're going to expose a
> knob to userspace, then I'd prefer to make it as precise as possible to
> minimize the chances of KVM ending up back in ABI hell again.
> 
> > Because of the discussion on the original attempts, it seems safer to keep
> > this behavior more limited (TDX only) for now.  And for TDX's usage a
> > struct kvm bool fits best, because all memslots need to be set to
> > zap_leafs_only = true anyway.
> 
> No they don't.  They might be set that way in practice for QEMU, but it's not
> strictly required.  E.g. nothing would prevent a VMM from exposing a
> shared-only memslot to a guest.  The memslots that burned KVM the first time
> around were related to VFIO devices, and I wouldn't put it past someone to be
> crazy enough to pass through an untrusted device to a TDX guest.

Ok, thanks for the clarification. So it's more of a strategic thing to move more
zapping logic into userspace so the logic can change without introducing kernel
regressions.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
  2024-05-08  1:19           ` Edgecombe, Rick P
@ 2024-05-08 14:38             ` Sean Christopherson
  2024-05-08 15:04               ` Edgecombe, Rick P
  0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2024-05-08 14:38 UTC (permalink / raw)
  To: Rick P Edgecombe
  Cc: kvm, pbonzini, linux-kernel, Yan Y Zhao, michael.roth, Isaku Yamahata

On Wed, May 08, 2024, Rick P Edgecombe wrote:
> On Tue, 2024-05-07 at 17:21 -0700, Sean Christopherson wrote:
> > > Can you elaborate on the reason for a per-memslot flag? We are discussing
> > > this design point internally, and also the intersection with the previous
> > > attempts to do something similar with a per-vm flag[0].
> > > 
> > > I'm wondering if the intention is to try to make it a memslot flag so it
> > > can be expanded for normal VM usage.
> > 
> > Sure, I'll go with that answer.  Like I said, off-the-cuff.
> > 
> > There's no concrete motivation, it's more that _if_ we're going to expose
> > a knob to userspace, then I'd prefer to make it as precise as possible to
> > minimize the chances of KVM ending up back in ABI hell again.
> > 
> > > Because of the discussion on the original attempts, it seems safer to
> > > keep this behavior more limited (TDX only) for now.  And for TDX's usage
> > > a struct kvm bool fits best, because all memslots need to be set to
> > > zap_leafs_only = true anyway.
> > 
> > No they don't.  They might be set that way in practice for QEMU, but it's
> > not strictly required.  E.g. nothing would prevent a VMM from exposing a
> > shared-only memslot to a guest.  The memslots that burned KVM the first
> > time around were related to VFIO devices, and I wouldn't put it past
> > someone to be crazy enough to pass through an untrusted device to a TDX
> > guest.
> 
> Ok, thanks for the clarification. So it's more of a strategic thing to move more
> zapping logic into userspace so the logic can change without introducing kernel
> regressions.

You're _really_ reading too much into my suggestion.  As above, my suggestion
was very spur of the moment.  I haven't put much thought into the tradeoffs and
side effects.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features
  2024-05-08 14:38             ` Sean Christopherson
@ 2024-05-08 15:04               ` Edgecombe, Rick P
  0 siblings, 0 replies; 31+ messages in thread
From: Edgecombe, Rick P @ 2024-05-08 15:04 UTC (permalink / raw)
  To: seanjc
  Cc: kvm, pbonzini, linux-kernel, Zhao, Yan Y, michael.roth, Yamahata, Isaku

On Wed, 2024-05-08 at 07:38 -0700, Sean Christopherson wrote:
> > Ok, thanks for the clarification. So it's more of a strategic thing to move
> > more zapping logic into userspace so the logic can change without
> > introducing kernel regressions.
> 
> You're _really_ reading too much into my suggestion.  As above, my suggestion
> was very spur of the moment.  I haven't put much thought into the tradeoffs
> and side effects.

I'm not taking it as a mandate. Just trying to glean your insights. That said,
I'm really on the fence and so leaning on your intuition as the tie breaker.

For TDX's usage a struct kvm bool seems simpler code-wise in KVM, and for
userspace. But the zapping-logic-as-ABI problem seems like a reasonable thing to
think about while we are designing new ABI. Of course, it also means KVM has to
be responsible now for safely zapping memory from a variety of userspace
algorithms. So it somewhat makes KVM's job easier, and somewhat makes it harder.

The real issue might be that that problem was never debugged. While there is no
evidence it will affect TDX, it remains a possibility. But we can't do the zap
roots thing for TDX, so in the end the ABI design will not affect TDX exposure
either way. But making it a normal feature will affect exposure for normal VMs.
So we are also balancing ABI flexibility with exposure to that specific bug.


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2024-05-08 15:04 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-04 12:13 [PATCH v5 00/17] KVM: SEV: allow customizing VMSA features Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 01/17] KVM: SVM: Invert handling of SEV and SEV_ES feature flags Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 02/17] KVM: SVM: Compile sev.c if and only if CONFIG_KVM_AMD_SEV=y Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 03/17] KVM: x86: use u64_to_user_ptr() Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 04/17] KVM: introduce new vendor op for KVM_GET_DEVICE_ATTR Paolo Bonzini
2024-04-04 21:30   ` Isaku Yamahata
2024-04-04 12:13 ` [PATCH v5 05/17] KVM: SEV: publish supported VMSA features Paolo Bonzini
2024-04-04 21:32   ` Isaku Yamahata
2024-04-04 12:13 ` [PATCH v5 06/17] KVM: SEV: store VMSA features in kvm_sev_info Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo features Paolo Bonzini
2024-04-04 21:39   ` Isaku Yamahata
2024-04-05 23:01   ` Edgecombe, Rick P
2024-04-09  1:21     ` Sean Christopherson
2024-04-09 14:01       ` Edgecombe, Rick P
2024-04-09 14:43       ` Paolo Bonzini
2024-04-09 15:26         ` Sean Christopherson
2024-05-07 23:01       ` Edgecombe, Rick P
2024-05-08  0:21         ` Sean Christopherson
2024-05-08  1:19           ` Edgecombe, Rick P
2024-05-08 14:38             ` Sean Christopherson
2024-05-08 15:04               ` Edgecombe, Rick P
2024-04-04 12:13 ` [PATCH v5 08/17] KVM: x86: Add supported_vm_types to kvm_caps Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 09/17] KVM: SEV: introduce to_kvm_sev_info Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 10/17] KVM: SEV: define VM types for SEV and SEV-ES Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 11/17] KVM: SEV: sync FPU and AVX state at LAUNCH_UPDATE_VMSA time Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 12/17] KVM: SEV: introduce KVM_SEV_INIT2 operation Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 13/17] KVM: SEV: allow SEV-ES DebugSwap again Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 14/17] selftests: kvm: add tests for KVM_SEV_INIT2 Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 15/17] selftests: kvm: switch to using KVM_X86_*_VM Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 16/17] selftests: kvm: split "launch" phase of SEV VM creation Paolo Bonzini
2024-04-04 12:13 ` [PATCH v5 17/17] selftests: kvm: add test for transferring FPU state into VMSA Paolo Bonzini
