linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 00/35] SEV-ES hypervisor support
@ 2020-09-14 20:15 Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 01/35] KVM: SVM: Remove the call to sev_platform_status() during setup Tom Lendacky
                   ` (35 more replies)
  0 siblings, 36 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

This patch series provides support for running SEV-ES guests under KVM.

Secure Encrypted Virtualization - Encrypted State (SEV-ES) expands on the
SEV support to protect the guest register state from the hypervisor. See
"AMD64 Architecture Programmer's Manual Volume 2: System Programming",
section "15.35 Encrypted State (SEV-ES)" [1].

In order to allow a hypervisor to perform functions on behalf of a guest,
there is architectural support for notifying a guest's operating system
when certain types of VMEXITs are about to occur. This allows the guest to
selectively share information with the hypervisor to satisfy the requested
function. The notification is performed using a new exception, the VMM
Communication exception (#VC). The information is shared through the
Guest-Hypervisor Communication Block (GHCB) using the VMGEXIT instruction.
The GHCB format and the protocol for using it are documented in "SEV-ES
Guest-Hypervisor Communication Block Standardization" [2].
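
As a rough sketch of that flow (the names below are illustrative; the
actual guest-side handling is part of the separate SEV-ES guest support
series, not this one), a #VC handler for a CPUID intercept would
conceptually do:

	/* Guest #VC handler sketch: share only what CPUID emulation needs */
	static void vc_handle_cpuid(struct ghcb *ghcb, struct pt_regs *regs)
	{
		/* Expose the CPUID inputs to the hypervisor via the GHCB */
		ghcb_set_rax(ghcb, regs->ax);
		ghcb_set_rcx(ghcb, regs->cx);
		ghcb_set_sw_exit_code(ghcb, SVM_EXIT_CPUID);

		/* Exit to the hypervisor (VMGEXIT is REP; VMMCALL) */
		VMGEXIT();

		/* Copy back the results the hypervisor placed in the GHCB */
		regs->ax = ghcb_get_rax(ghcb);
		regs->bx = ghcb_get_rbx(ghcb);
		regs->cx = ghcb_get_rcx(ghcb);
		regs->dx = ghcb_get_rdx(ghcb);
	}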

Under SEV-ES, the vCPU save area (VMSA) must be encrypted. SVM is updated
to build the initial VMSA and then encrypt it before running the guest.
Once encrypted, the VMSA must not be modified by the hypervisor; any
modification will result in the VMRUN instruction failing with a SHUTDOWN
exit code. KVM must support the VMGEXIT exit code in order to perform the
functions the guest requires of it. The GHCB is used to exchange the
information needed by both the hypervisor and the guest.

To simplify access to the VMSA and the GHCB, SVM uses an accessor function
to obtain the address of either the VMSA or the GHCB, depending on the
stage of execution of the guest.
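
Conceptually, the accessor added in patch 3 and extended in patch 5 looks
something like this (a sketch only; the field names are illustrative):

	static inline struct vmcb_save_area *get_vmsa(struct vcpu_svm *svm)
	{
		/*
		 * Once the VMSA has been encrypted, guest register state can
		 * only be exchanged through the GHCB.
		 */
		if (sev_es_guest(svm->vcpu.kvm) && svm->vcpu.arch.vmsa_encrypted)
			return &svm->ghcb->save;

		return &svm->vmcb->save;
	}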

Some of the intercepts require changes under SEV-ES.
For example, CR0 writes cannot be intercepted, so the code needs to ensure
that the intercept is not enabled during execution or that the hypervisor
does not try to read the register as part of exit processing. Another
example is shutdown processing, where the vCPU cannot be directly reset.
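
As a sketch of the kind of change involved (illustrative only; the actual
changes are made in the intercept-related patches later in this series),
the existing CR0 intercept update could simply be skipped for SEV-ES
guests, since CR0 lives in the encrypted VMSA:

	static void update_cr0_intercept(struct vcpu_svm *svm)
	{
		ulong gcr0 = svm->vcpu.arch.cr0;
		u64 hcr0;

		/*
		 * Under SEV-ES, CR0 is part of the encrypted register state:
		 * the write cannot be intercepted and the save area must not
		 * be read during exit processing.
		 */
		if (sev_es_guest(svm->vcpu.kvm))
			return;

		hcr0 = (svm_cr0_read(svm) & ~SVM_CR0_SELECTIVE_MASK)
			| (gcr0 & SVM_CR0_SELECTIVE_MASK);

		svm_cr0_write(svm, hcr0);
		vmcb_mark_dirty(svm->vmcb, VMCB_CR);

		if (gcr0 == hcr0) {
			clr_cr_intercept(svm, INTERCEPT_CR0_READ);
			clr_cr_intercept(svm, INTERCEPT_CR0_WRITE);
		} else {
			set_cr_intercept(svm, INTERCEPT_CR0_READ);
			set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
		}
	}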

Support is added to handle VMGEXIT events and implement the GHCB protocol.
This ranges from standard exit events, like a CPUID instruction intercept,
to new support for things like application processor (AP) booting. Much of
the existing SVM intercept support can be re-used by setting the exit
code information from the VMGEXIT and calling the appropriate intercept
handlers.
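
A minimal sketch of that dispatch (illustrative only; the real VMGEXIT
handler added later in the series validates the GHCB contents before
using them):

	static int sev_handle_vmgexit(struct vcpu_svm *svm)
	{
		struct ghcb *ghcb = svm->ghcb;
		u64 exit_code = ghcb_get_sw_exit_code(ghcb);

		/* Propagate the guest-provided exit information ... */
		svm->vmcb->control.exit_code   = exit_code;
		svm->vmcb->control.exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
		svm->vmcb->control.exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);

		/* ... and re-use the existing SVM intercept handler for it */
		return svm_exit_handlers[exit_code](svm);
	}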

Finally, launching and running an SEV-ES guest requires changes to vCPU
initialization, loading and execution.

[1] https://www.amd.com/system/files/TechDocs/24593.pdf
[2] https://developer.amd.com/wp-content/resources/56421.pdf

---

These patches are based on a commit of the KVM next branch. However, I had
to backport recent SEV-ES guest patches (an earlier version of the series
whose final form is now in the tip tree) into my development branch, since
this series needs some of those patches as prerequisites. As a result,
this patch series will not build against, or apply to, the KVM next branch
as is.

A version of the tree can be found at:
https://github.com/AMDESE/linux/tree/sev-es-5.8-v3

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>

Tom Lendacky (35):
  KVM: SVM: Remove the call to sev_platform_status() during setup
  KVM: SVM: Add support for SEV-ES capability in KVM
  KVM: SVM: Add indirect access to the VM save area
  KVM: SVM: Make GHCB accessor functions available to the hypervisor
  KVM: SVM: Add initial support for SEV-ES GHCB access to KVM
  KVM: SVM: Add required changes to support intercepts under SEV-ES
  KVM: SVM: Modify DRx register intercepts for an SEV-ES guest
  KVM: SVM: Prevent debugging under SEV-ES
  KVM: SVM: Do not emulate MMIO under SEV-ES
  KVM: SVM: Cannot re-initialize the VMCB after shutdown with SEV-ES
  KVM: SVM: Prepare for SEV-ES exit handling in the sev.c file
  KVM: SVM: Add initial support for a VMGEXIT VMEXIT
  KVM: SVM: Create trace events for VMGEXIT processing
  KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002
  KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004
  KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100
  KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
  KVM: SVM: Support MMIO for an SEV-ES guest
  KVM: SVM: Support port IO operations for an SEV-ES guest
  KVM: SVM: Add SEV/SEV-ES support for intercepting INVD
  KVM: SVM: Add support for EFER write traps for an SEV-ES guest
  KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
  KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
  KVM: SVM: Add support for booting APs for an SEV-ES guest
  KVM: X86: Update kvm_skip_emulated_instruction() for an SEV-ES guest
  KVM: SVM: Add NMI support for an SEV-ES guest
  KVM: SVM: Set the encryption mask for the SVM host save area
  KVM: SVM: Update ASID allocation to support SEV-ES guests
  KVM: SVM: Provide support for SEV-ES vCPU creation/loading
  KVM: SVM: Provide support for SEV-ES vCPU loading
  KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
  KVM: SVM: Provide support to launch and run an SEV-ES guest

 arch/x86/include/asm/kvm_host.h  |  16 +
 arch/x86/include/asm/msr-index.h |   1 +
 arch/x86/include/asm/svm.h       |  35 +-
 arch/x86/include/uapi/asm/svm.h  |  28 ++
 arch/x86/kernel/cpu/vmware.c     |  12 +-
 arch/x86/kvm/Kconfig             |   3 +-
 arch/x86/kvm/cpuid.c             |   1 +
 arch/x86/kvm/kvm_cache_regs.h    |  30 +-
 arch/x86/kvm/mmu/mmu.c           |   7 +
 arch/x86/kvm/svm/nested.c        | 125 +++---
 arch/x86/kvm/svm/sev.c           | 590 ++++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.c           | 704 ++++++++++++++++++++++++-------
 arch/x86/kvm/svm/svm.h           | 357 ++++++++++++++--
 arch/x86/kvm/svm/vmenter.S       |  50 +++
 arch/x86/kvm/trace.h             |  99 +++++
 arch/x86/kvm/vmx/vmx.c           |   7 +
 arch/x86/kvm/x86.c               | 357 ++++++++++++++--
 arch/x86/kvm/x86.h               |   8 +
 18 files changed, 2070 insertions(+), 360 deletions(-)

-- 
2.28.0



* [RFC PATCH 01/35] KVM: SVM: Remove the call to sev_platform_status() during setup
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 02/35] KVM: SVM: Add support for SEV-ES capability in KVM Tom Lendacky
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

When both KVM support and the CCP driver are built into the kernel instead
of as modules, KVM initialization happens before CCP initialization. As a
result, sev_platform_status() will return a failure when it is called from
sev_hardware_setup(), even though this isn't really an error condition.

Since sev_platform_status() doesn't need to be called at this time anyway,
remove the invocation from sev_hardware_setup().

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c | 22 +---------------------
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 402dc4234e39..fab382e2dad2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1117,9 +1117,6 @@ void sev_vm_destroy(struct kvm *kvm)
 
 int __init sev_hardware_setup(void)
 {
-	struct sev_user_data_status *status;
-	int rc;
-
 	/* Maximum number of encrypted guests supported simultaneously */
 	max_sev_asid = cpuid_ecx(0x8000001F);
 
@@ -1138,26 +1135,9 @@ int __init sev_hardware_setup(void)
 	if (!sev_reclaim_asid_bitmap)
 		return 1;
 
-	status = kmalloc(sizeof(*status), GFP_KERNEL);
-	if (!status)
-		return 1;
-
-	/*
-	 * Check SEV platform status.
-	 *
-	 * PLATFORM_STATUS can be called in any state, if we failed to query
-	 * the PLATFORM status then either PSP firmware does not support SEV
-	 * feature or SEV firmware is dead.
-	 */
-	rc = sev_platform_status(status, NULL);
-	if (rc)
-		goto err;
-
 	pr_info("SEV supported\n");
 
-err:
-	kfree(status);
-	return rc;
+	return 0;
 }
 
 void sev_hardware_teardown(void)
-- 
2.28.0



* [RFC PATCH 02/35] KVM: SVM: Add support for SEV-ES capability in KVM
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 01/35] KVM: SVM: Remove the call to sev_platform_status() during setup Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 03/35] KVM: SVM: Add indirect access to the VM save area Tom Lendacky
                   ` (33 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support to KVM for determining whether the system is capable of
supporting SEV-ES, as well as for determining whether a guest is an
SEV-ES guest.
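
With the module parameters added below, SEV-ES support is requested at
kvm_amd load time (a usage sketch; both parameters are read-only at
runtime, per their 0444 permissions):

  # Enable SEV and SEV-ES support when loading kvm_amd (both default on
  # when CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is set)
  modprobe kvm_amd sev=1 sev_es=1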

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/Kconfig   |  3 ++-
 arch/x86/kvm/svm/sev.c | 47 ++++++++++++++++++++++++++++++++++--------
 arch/x86/kvm/svm/svm.c | 20 +++++++++---------
 arch/x86/kvm/svm/svm.h | 17 ++++++++++++++-
 4 files changed, 66 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index fbd5bd7a945a..4e8924aab05e 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -99,7 +99,8 @@ config KVM_AMD_SEV
 	depends on KVM_AMD && X86_64
 	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
 	help
-	Provides support for launching Encrypted VMs on AMD processors.
+	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
+	  with Encrypted State (SEV-ES) on AMD processors.
 
 config KVM_MMU_AUDIT
 	bool "Audit KVM MMU"
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index fab382e2dad2..48379e21ed43 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -923,7 +923,7 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	struct kvm_sev_cmd sev_cmd;
 	int r;
 
-	if (!svm_sev_enabled())
+	if (!svm_sev_enabled() || !sev)
 		return -ENOTTY;
 
 	if (!argp)
@@ -1115,29 +1115,58 @@ void sev_vm_destroy(struct kvm *kvm)
 	sev_asid_free(sev->asid);
 }
 
-int __init sev_hardware_setup(void)
+void __init sev_hardware_setup(void)
 {
+	unsigned int eax, ebx, ecx, edx;
+	bool sev_es_supported = false;
+	bool sev_supported = false;
+
+	/* Does the CPU support SEV? */
+	if (!boot_cpu_has(X86_FEATURE_SEV))
+		goto out;
+
+	/* Retrieve SEV CPUID information */
+	cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
+
 	/* Maximum number of encrypted guests supported simultaneously */
-	max_sev_asid = cpuid_ecx(0x8000001F);
+	max_sev_asid = ecx;
 
 	if (!svm_sev_enabled())
-		return 1;
+		goto out;
 
 	/* Minimum ASID value that should be used for SEV guest */
-	min_sev_asid = cpuid_edx(0x8000001F);
+	min_sev_asid = edx;
 
 	/* Initialize SEV ASID bitmaps */
 	sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
 	if (!sev_asid_bitmap)
-		return 1;
+		goto out;
 
 	sev_reclaim_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
 	if (!sev_reclaim_asid_bitmap)
-		return 1;
+		goto out;
 
-	pr_info("SEV supported\n");
+	pr_info("SEV supported: %u ASIDs\n", max_sev_asid - min_sev_asid + 1);
+	sev_supported = true;
 
-	return 0;
+	/* SEV-ES support requested? */
+	if (!sev_es)
+		goto out;
+
+	/* Does the CPU support SEV-ES? */
+	if (!boot_cpu_has(X86_FEATURE_SEV_ES))
+		goto out;
+
+	/* Has the system been allocated ASIDs for SEV-ES? */
+	if (min_sev_asid == 1)
+		goto out;
+
+	pr_info("SEV-ES supported: %u ASIDs\n", min_sev_asid - 1);
+	sev_es_supported = true;
+
+out:
+	sev = sev_supported;
+	sev_es = sev_es_supported;
 }
 
 void sev_hardware_teardown(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4368b66615c1..83292fc44b4e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -187,9 +187,13 @@ static int vgif = true;
 module_param(vgif, int, 0444);
 
 /* enable/disable SEV support */
-static int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
 module_param(sev, int, 0444);
 
+/* enable/disable SEV-ES support */
+int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+module_param(sev_es, int, 0444);
+
 static bool __read_mostly dump_invalid_vmcb = 0;
 module_param(dump_invalid_vmcb, bool, 0644);
 
@@ -860,15 +864,11 @@ static __init int svm_hardware_setup(void)
 		kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
 	}
 
-	if (sev) {
-		if (boot_cpu_has(X86_FEATURE_SEV) &&
-		    IS_ENABLED(CONFIG_KVM_AMD_SEV)) {
-			r = sev_hardware_setup();
-			if (r)
-				sev = false;
-		} else {
-			sev = false;
-		}
+	if (IS_ENABLED(CONFIG_KVM_AMD_SEV) && sev) {
+		sev_hardware_setup();
+	} else {
+		sev = false;
+		sev_es = false;
 	}
 
 	svm_adjust_mmio_mask();
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a798e1731709..2692ddf30c8d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -60,6 +60,7 @@ enum {
 
 struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
+	bool es_active;		/* SEV-ES enabled guest */
 	unsigned int asid;	/* ASID used for this guest */
 	unsigned int handle;	/* SEV firmware handle */
 	int fd;			/* SEV device fd */
@@ -348,6 +349,9 @@ static inline bool gif_set(struct vcpu_svm *svm)
 #define MSR_CR3_LONG_RESERVED_MASK		0xfff0000000000fe7U
 #define MSR_INVALID				0xffffffffU
 
+extern int sev;
+extern int sev_es;
+
 u32 svm_msrpm_offset(u32 msr);
 void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
 void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
@@ -474,6 +478,17 @@ static inline bool sev_guest(struct kvm *kvm)
 #endif
 }
 
+static inline bool sev_es_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	return sev_guest(kvm) && sev->es_active;
+#else
+	return false;
+#endif
+}
+
 static inline bool svm_sev_enabled(void)
 {
 	return IS_ENABLED(CONFIG_KVM_AMD_SEV) ? max_sev_asid : 0;
@@ -486,7 +501,7 @@ int svm_register_enc_region(struct kvm *kvm,
 int svm_unregister_enc_region(struct kvm *kvm,
 			      struct kvm_enc_region *range);
 void pre_sev_run(struct vcpu_svm *svm, int cpu);
-int __init sev_hardware_setup(void);
+void __init sev_hardware_setup(void);
 void sev_hardware_teardown(void);
 
 #endif
-- 
2.28.0



* [RFC PATCH 03/35] KVM: SVM: Add indirect access to the VM save area
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 01/35] KVM: SVM: Remove the call to sev_platform_status() during setup Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 02/35] KVM: SVM: Add support for SEV-ES capability in KVM Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 04/35] KVM: SVM: Make GHCB accessor functions available to the hypervisor Tom Lendacky
                   ` (32 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

In order to later support accessing the GHCB structure in the same way as
the VM save area (VMSA) structure, change all accesses to the VMSA into
function calls. This will later allow the hypervisor to choose between
accessing the VMSA or the GHCB in a central location. Accesses to a
nested VMCB structure save area remain as direct save area accesses.

The functions are created using VMSA accessor macros.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/nested.c | 125 +++++++++++++++--------------
 arch/x86/kvm/svm/svm.c    | 165 +++++++++++++++++++-------------------
 arch/x86/kvm/svm/svm.h    | 129 ++++++++++++++++++++++++++++-
 3 files changed, 273 insertions(+), 146 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d1ae94f40907..c5d18c859ded 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -367,28 +367,29 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 static void nested_prepare_vmcb_save(struct vcpu_svm *svm, struct vmcb *nested_vmcb)
 {
 	/* Load the nested guest state */
-	svm->vmcb->save.es = nested_vmcb->save.es;
-	svm->vmcb->save.cs = nested_vmcb->save.cs;
-	svm->vmcb->save.ss = nested_vmcb->save.ss;
-	svm->vmcb->save.ds = nested_vmcb->save.ds;
-	svm->vmcb->save.gdtr = nested_vmcb->save.gdtr;
-	svm->vmcb->save.idtr = nested_vmcb->save.idtr;
+	svm_es_write(svm, &nested_vmcb->save.es);
+	svm_cs_write(svm, &nested_vmcb->save.cs);
+	svm_ss_write(svm, &nested_vmcb->save.ss);
+	svm_ds_write(svm, &nested_vmcb->save.ds);
+	svm_gdtr_write(svm, &nested_vmcb->save.gdtr);
+	svm_idtr_write(svm, &nested_vmcb->save.idtr);
 	kvm_set_rflags(&svm->vcpu, nested_vmcb->save.rflags);
 	svm_set_efer(&svm->vcpu, nested_vmcb->save.efer);
 	svm_set_cr0(&svm->vcpu, nested_vmcb->save.cr0);
 	svm_set_cr4(&svm->vcpu, nested_vmcb->save.cr4);
-	svm->vmcb->save.cr2 = svm->vcpu.arch.cr2 = nested_vmcb->save.cr2;
+	svm_cr2_write(svm, nested_vmcb->save.cr2);
+	svm->vcpu.arch.cr2 = nested_vmcb->save.cr2;
 	kvm_rax_write(&svm->vcpu, nested_vmcb->save.rax);
 	kvm_rsp_write(&svm->vcpu, nested_vmcb->save.rsp);
 	kvm_rip_write(&svm->vcpu, nested_vmcb->save.rip);
 
 	/* In case we don't even reach vcpu_run, the fields are not updated */
-	svm->vmcb->save.rax = nested_vmcb->save.rax;
-	svm->vmcb->save.rsp = nested_vmcb->save.rsp;
-	svm->vmcb->save.rip = nested_vmcb->save.rip;
-	svm->vmcb->save.dr7 = nested_vmcb->save.dr7;
+	svm_rax_write(svm, nested_vmcb->save.rax);
+	svm_rsp_write(svm, nested_vmcb->save.rsp);
+	svm_rip_write(svm, nested_vmcb->save.rip);
+	svm_dr7_write(svm, nested_vmcb->save.dr7);
 	svm->vcpu.arch.dr6  = nested_vmcb->save.dr6;
-	svm->vmcb->save.cpl = nested_vmcb->save.cpl;
+	svm_cpl_write(svm, nested_vmcb->save.cpl);
 }
 
 static void nested_prepare_vmcb_control(struct vcpu_svm *svm)
@@ -451,7 +452,6 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 	int ret;
 	struct vmcb *nested_vmcb;
 	struct vmcb *hsave = svm->nested.hsave;
-	struct vmcb *vmcb = svm->vmcb;
 	struct kvm_host_map map;
 	u64 vmcb_gpa;
 
@@ -460,7 +460,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 		return 1;
 	}
 
-	vmcb_gpa = svm->vmcb->save.rax;
+	vmcb_gpa = svm_rax_read(svm);
 	ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(vmcb_gpa), &map);
 	if (ret == -EINVAL) {
 		kvm_inject_gp(&svm->vcpu, 0);
@@ -481,7 +481,7 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 		goto out;
 	}
 
-	trace_kvm_nested_vmrun(svm->vmcb->save.rip, vmcb_gpa,
+	trace_kvm_nested_vmrun(svm_rip_read(svm), vmcb_gpa,
 			       nested_vmcb->save.rip,
 			       nested_vmcb->control.int_ctl,
 			       nested_vmcb->control.event_inj,
@@ -500,25 +500,25 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 	 * Save the old vmcb, so we don't need to pick what we save, but can
 	 * restore everything when a VMEXIT occurs
 	 */
-	hsave->save.es     = vmcb->save.es;
-	hsave->save.cs     = vmcb->save.cs;
-	hsave->save.ss     = vmcb->save.ss;
-	hsave->save.ds     = vmcb->save.ds;
-	hsave->save.gdtr   = vmcb->save.gdtr;
-	hsave->save.idtr   = vmcb->save.idtr;
+	hsave->save.es     = *svm_es_read(svm);
+	hsave->save.cs     = *svm_cs_read(svm);
+	hsave->save.ss     = *svm_ss_read(svm);
+	hsave->save.ds     = *svm_ds_read(svm);
+	hsave->save.gdtr   = *svm_gdtr_read(svm);
+	hsave->save.idtr   = *svm_idtr_read(svm);
 	hsave->save.efer   = svm->vcpu.arch.efer;
 	hsave->save.cr0    = kvm_read_cr0(&svm->vcpu);
 	hsave->save.cr4    = svm->vcpu.arch.cr4;
 	hsave->save.rflags = kvm_get_rflags(&svm->vcpu);
 	hsave->save.rip    = kvm_rip_read(&svm->vcpu);
-	hsave->save.rsp    = vmcb->save.rsp;
-	hsave->save.rax    = vmcb->save.rax;
+	hsave->save.rsp    = svm_rsp_read(svm);
+	hsave->save.rax    = svm_rax_read(svm);
 	if (npt_enabled)
-		hsave->save.cr3    = vmcb->save.cr3;
+		hsave->save.cr3    = svm_cr3_read(svm);
 	else
 		hsave->save.cr3    = kvm_read_cr3(&svm->vcpu);
 
-	copy_vmcb_control_area(&hsave->control, &vmcb->control);
+	copy_vmcb_control_area(&hsave->control, &svm->vmcb->control);
 
 	svm->nested.nested_run_pending = 1;
 
@@ -544,20 +544,21 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 	return ret;
 }
 
-void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
+void nested_svm_vmloadsave(struct vmcb_save_area *from_vmsa,
+			   struct vmcb_save_area *to_vmsa)
 {
-	to_vmcb->save.fs = from_vmcb->save.fs;
-	to_vmcb->save.gs = from_vmcb->save.gs;
-	to_vmcb->save.tr = from_vmcb->save.tr;
-	to_vmcb->save.ldtr = from_vmcb->save.ldtr;
-	to_vmcb->save.kernel_gs_base = from_vmcb->save.kernel_gs_base;
-	to_vmcb->save.star = from_vmcb->save.star;
-	to_vmcb->save.lstar = from_vmcb->save.lstar;
-	to_vmcb->save.cstar = from_vmcb->save.cstar;
-	to_vmcb->save.sfmask = from_vmcb->save.sfmask;
-	to_vmcb->save.sysenter_cs = from_vmcb->save.sysenter_cs;
-	to_vmcb->save.sysenter_esp = from_vmcb->save.sysenter_esp;
-	to_vmcb->save.sysenter_eip = from_vmcb->save.sysenter_eip;
+	to_vmsa->fs = from_vmsa->fs;
+	to_vmsa->gs = from_vmsa->gs;
+	to_vmsa->tr = from_vmsa->tr;
+	to_vmsa->ldtr = from_vmsa->ldtr;
+	to_vmsa->kernel_gs_base = from_vmsa->kernel_gs_base;
+	to_vmsa->star = from_vmsa->star;
+	to_vmsa->lstar = from_vmsa->lstar;
+	to_vmsa->cstar = from_vmsa->cstar;
+	to_vmsa->sfmask = from_vmsa->sfmask;
+	to_vmsa->sysenter_cs = from_vmsa->sysenter_cs;
+	to_vmsa->sysenter_esp = from_vmsa->sysenter_esp;
+	to_vmsa->sysenter_eip = from_vmsa->sysenter_eip;
 }
 
 int nested_svm_vmexit(struct vcpu_svm *svm)
@@ -588,24 +589,24 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	/* Give the current vmcb to the guest */
 	svm_set_gif(svm, false);
 
-	nested_vmcb->save.es     = vmcb->save.es;
-	nested_vmcb->save.cs     = vmcb->save.cs;
-	nested_vmcb->save.ss     = vmcb->save.ss;
-	nested_vmcb->save.ds     = vmcb->save.ds;
-	nested_vmcb->save.gdtr   = vmcb->save.gdtr;
-	nested_vmcb->save.idtr   = vmcb->save.idtr;
+	nested_vmcb->save.es     = *svm_es_read(svm);
+	nested_vmcb->save.cs     = *svm_cs_read(svm);
+	nested_vmcb->save.ss     = *svm_ss_read(svm);
+	nested_vmcb->save.ds     = *svm_ds_read(svm);
+	nested_vmcb->save.gdtr   = *svm_gdtr_read(svm);
+	nested_vmcb->save.idtr   = *svm_idtr_read(svm);
 	nested_vmcb->save.efer   = svm->vcpu.arch.efer;
 	nested_vmcb->save.cr0    = kvm_read_cr0(&svm->vcpu);
 	nested_vmcb->save.cr3    = kvm_read_cr3(&svm->vcpu);
-	nested_vmcb->save.cr2    = vmcb->save.cr2;
+	nested_vmcb->save.cr2    = svm_cr2_read(svm);
 	nested_vmcb->save.cr4    = svm->vcpu.arch.cr4;
 	nested_vmcb->save.rflags = kvm_get_rflags(&svm->vcpu);
 	nested_vmcb->save.rip    = kvm_rip_read(&svm->vcpu);
 	nested_vmcb->save.rsp    = kvm_rsp_read(&svm->vcpu);
 	nested_vmcb->save.rax    = kvm_rax_read(&svm->vcpu);
-	nested_vmcb->save.dr7    = vmcb->save.dr7;
+	nested_vmcb->save.dr7    = svm_dr7_read(svm);
 	nested_vmcb->save.dr6    = svm->vcpu.arch.dr6;
-	nested_vmcb->save.cpl    = vmcb->save.cpl;
+	nested_vmcb->save.cpl    = svm_cpl_read(svm);
 
 	nested_vmcb->control.int_state         = vmcb->control.int_state;
 	nested_vmcb->control.exit_code         = vmcb->control.exit_code;
@@ -625,9 +626,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	nested_vmcb->control.event_inj_err     = svm->nested.ctl.event_inj_err;
 
 	nested_vmcb->control.pause_filter_count =
-		svm->vmcb->control.pause_filter_count;
+		vmcb->control.pause_filter_count;
 	nested_vmcb->control.pause_filter_thresh =
-		svm->vmcb->control.pause_filter_thresh;
+		vmcb->control.pause_filter_thresh;
 
 	/* Restore the original control entries */
 	copy_vmcb_control_area(&vmcb->control, &hsave->control);
@@ -638,12 +639,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	svm->nested.ctl.nested_cr3 = 0;
 
 	/* Restore selected save entries */
-	svm->vmcb->save.es = hsave->save.es;
-	svm->vmcb->save.cs = hsave->save.cs;
-	svm->vmcb->save.ss = hsave->save.ss;
-	svm->vmcb->save.ds = hsave->save.ds;
-	svm->vmcb->save.gdtr = hsave->save.gdtr;
-	svm->vmcb->save.idtr = hsave->save.idtr;
+	svm_es_write(svm, &hsave->save.es);
+	svm_cs_write(svm, &hsave->save.cs);
+	svm_ss_write(svm, &hsave->save.ss);
+	svm_ds_write(svm, &hsave->save.ds);
+	svm_gdtr_write(svm, &hsave->save.gdtr);
+	svm_idtr_write(svm, &hsave->save.idtr);
 	kvm_set_rflags(&svm->vcpu, hsave->save.rflags);
 	svm_set_efer(&svm->vcpu, hsave->save.efer);
 	svm_set_cr0(&svm->vcpu, hsave->save.cr0 | X86_CR0_PE);
@@ -651,11 +652,11 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	kvm_rax_write(&svm->vcpu, hsave->save.rax);
 	kvm_rsp_write(&svm->vcpu, hsave->save.rsp);
 	kvm_rip_write(&svm->vcpu, hsave->save.rip);
-	svm->vmcb->save.dr7 = 0;
-	svm->vmcb->save.cpl = 0;
-	svm->vmcb->control.exit_int_info = 0;
+	svm_dr7_write(svm, 0);
+	svm_cpl_write(svm, 0);
+	vmcb->control.exit_int_info = 0;
 
-	vmcb_mark_all_dirty(svm->vmcb);
+	vmcb_mark_all_dirty(vmcb);
 
 	trace_kvm_nested_vmexit_inject(nested_vmcb->control.exit_code,
 				       nested_vmcb->control.exit_info_1,
@@ -673,7 +674,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 		return 1;
 
 	if (npt_enabled)
-		svm->vmcb->save.cr3 = hsave->save.cr3;
+		svm_cr3_write(svm, hsave->save.cr3);
 
 	/*
 	 * Drop what we picked up for L2 via svm_complete_interrupts() so it
@@ -819,7 +820,7 @@ int nested_svm_check_permissions(struct vcpu_svm *svm)
 		return 1;
 	}
 
-	if (svm->vmcb->save.cpl) {
+	if (svm_cpl_read(svm)) {
 		kvm_inject_gp(&svm->vcpu, 0);
 		return 1;
 	}
@@ -888,7 +889,7 @@ static void nested_svm_nmi(struct vcpu_svm *svm)
 
 static void nested_svm_intr(struct vcpu_svm *svm)
 {
-	trace_kvm_nested_intr_vmexit(svm->vmcb->save.rip);
+	trace_kvm_nested_intr_vmexit(svm_rip_read(svm));
 
 	svm->vmcb->control.exit_code   = SVM_EXIT_INTR;
 	svm->vmcb->control.exit_info_1 = 0;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 83292fc44b4e..779c167e42cc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -285,7 +285,7 @@ void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 		svm_set_gif(svm, true);
 	}
 
-	svm->vmcb->save.efer = efer | EFER_SVME;
+	svm_efer_write(svm, efer | EFER_SVME);
 	vmcb_mark_dirty(svm->vmcb, VMCB_CR);
 }
 
@@ -357,7 +357,7 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu)
 		 */
 		(void)skip_emulated_instruction(&svm->vcpu);
 		rip = kvm_rip_read(&svm->vcpu);
-		svm->int3_rip = rip + svm->vmcb->save.cs.base;
+		svm->int3_rip = rip + svm_cs_read_base(svm);
 		svm->int3_injected = rip - old_rip;
 	}
 
@@ -699,9 +699,9 @@ void disable_nmi_singlestep(struct vcpu_svm *svm)
 	if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
 		/* Clear our flags if they were not set by the guest */
 		if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
-			svm->vmcb->save.rflags &= ~X86_EFLAGS_TF;
+			svm_rflags_and(svm, ~X86_EFLAGS_TF);
 		if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
-			svm->vmcb->save.rflags &= ~X86_EFLAGS_RF;
+			svm_rflags_and(svm, ~X86_EFLAGS_RF);
 	}
 }
 
@@ -988,7 +988,7 @@ static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 static void init_vmcb(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
-	struct vmcb_save_area *save = &svm->vmcb->save;
+	struct vmcb_save_area *save = get_vmsa(svm);
 
 	svm->vcpu.arch.hflags = 0;
 
@@ -1328,7 +1328,7 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-	unsigned long rflags = svm->vmcb->save.rflags;
+	unsigned long rflags = svm_rflags_read(svm);
 
 	if (svm->nmi_singlestep) {
 		/* Hide our flags if they were not set by the guest */
@@ -1350,7 +1350,7 @@ static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
         * (caused by either a task switch or an inter-privilege IRET),
         * so we do not need to update the CPL here.
         */
-	to_svm(vcpu)->vmcb->save.rflags = rflags;
+	svm_rflags_write(to_svm(vcpu), rflags);
 }
 
 static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
@@ -1405,7 +1405,7 @@ static void svm_clear_vintr(struct vcpu_svm *svm)
 
 static struct vmcb_seg *svm_seg(struct kvm_vcpu *vcpu, int seg)
 {
-	struct vmcb_save_area *save = &to_svm(vcpu)->vmcb->save;
+	struct vmcb_save_area *save = get_vmsa(to_svm(vcpu));
 
 	switch (seg) {
 	case VCPU_SREG_CS: return &save->cs;
@@ -1492,32 +1492,30 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
 		if (var->unusable)
 			var->db = 0;
 		/* This is symmetric with svm_set_segment() */
-		var->dpl = to_svm(vcpu)->vmcb->save.cpl;
+		var->dpl = svm_cpl_read(to_svm(vcpu));
 		break;
 	}
 }
 
 static int svm_get_cpl(struct kvm_vcpu *vcpu)
 {
-	struct vmcb_save_area *save = &to_svm(vcpu)->vmcb->save;
-
-	return save->cpl;
+	return svm_cpl_read(to_svm(vcpu));
 }
 
 static void svm_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	dt->size = svm->vmcb->save.idtr.limit;
-	dt->address = svm->vmcb->save.idtr.base;
+	dt->size = svm_idtr_read_limit(svm);
+	dt->address = svm_idtr_read_base(svm);
 }
 
 static void svm_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm->vmcb->save.idtr.limit = dt->size;
-	svm->vmcb->save.idtr.base = dt->address ;
+	svm_idtr_write_limit(svm, dt->size);
+	svm_idtr_write_base(svm, dt->address);
 	vmcb_mark_dirty(svm->vmcb, VMCB_DT);
 }
 
@@ -1525,30 +1523,31 @@ static void svm_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	dt->size = svm->vmcb->save.gdtr.limit;
-	dt->address = svm->vmcb->save.gdtr.base;
+	dt->size = svm_gdtr_read_limit(svm);
+	dt->address = svm_gdtr_read_base(svm);
 }
 
 static void svm_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm->vmcb->save.gdtr.limit = dt->size;
-	svm->vmcb->save.gdtr.base = dt->address ;
+	svm_gdtr_write_limit(svm, dt->size);
+	svm_gdtr_write_base(svm, dt->address);
 	vmcb_mark_dirty(svm->vmcb, VMCB_DT);
 }
 
 static void update_cr0_intercept(struct vcpu_svm *svm)
 {
 	ulong gcr0 = svm->vcpu.arch.cr0;
-	u64 *hcr0 = &svm->vmcb->save.cr0;
+	u64 hcr0;
 
-	*hcr0 = (*hcr0 & ~SVM_CR0_SELECTIVE_MASK)
+	hcr0 = (svm_cr0_read(svm) & ~SVM_CR0_SELECTIVE_MASK)
 		| (gcr0 & SVM_CR0_SELECTIVE_MASK);
 
+	svm_cr0_write(svm, hcr0);
 	vmcb_mark_dirty(svm->vmcb, VMCB_CR);
 
-	if (gcr0 == *hcr0) {
+	if (gcr0 == hcr0) {
 		clr_cr_intercept(svm, INTERCEPT_CR0_READ);
 		clr_cr_intercept(svm, INTERCEPT_CR0_WRITE);
 	} else {
@@ -1565,12 +1564,12 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	if (vcpu->arch.efer & EFER_LME) {
 		if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
 			vcpu->arch.efer |= EFER_LMA;
-			svm->vmcb->save.efer |= EFER_LMA | EFER_LME;
+			svm_efer_or(svm, EFER_LMA | EFER_LME);
 		}
 
 		if (is_paging(vcpu) && !(cr0 & X86_CR0_PG)) {
 			vcpu->arch.efer &= ~EFER_LMA;
-			svm->vmcb->save.efer &= ~(EFER_LMA | EFER_LME);
+			svm_efer_and(svm, ~(EFER_LMA | EFER_LME));
 		}
 	}
 #endif
@@ -1586,7 +1585,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	 */
 	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
 		cr0 &= ~(X86_CR0_CD | X86_CR0_NW);
-	svm->vmcb->save.cr0 = cr0;
+	svm_cr0_write(svm, cr0);
 	vmcb_mark_dirty(svm->vmcb, VMCB_CR);
 	update_cr0_intercept(svm);
 }
@@ -1594,7 +1593,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
 	unsigned long host_cr4_mce = cr4_read_shadow() & X86_CR4_MCE;
-	unsigned long old_cr4 = to_svm(vcpu)->vmcb->save.cr4;
+	unsigned long old_cr4 = svm_cr4_read(to_svm(vcpu));
 
 	if (cr4 & X86_CR4_VMXE)
 		return 1;
@@ -1606,7 +1605,7 @@ int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 	if (!npt_enabled)
 		cr4 |= X86_CR4_PAE;
 	cr4 |= host_cr4_mce;
-	to_svm(vcpu)->vmcb->save.cr4 = cr4;
+	svm_cr4_write(to_svm(vcpu), cr4);
 	vmcb_mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);
 	return 0;
 }
@@ -1637,7 +1636,7 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
 	 */
 	if (seg == VCPU_SREG_SS)
 		/* This is symmetric with svm_get_segment() */
-		svm->vmcb->save.cpl = (var->dpl & 3);
+		svm_cpl_write(svm, (var->dpl & 3));
 
 	vmcb_mark_dirty(svm->vmcb, VMCB_SEG);
 }
@@ -1672,8 +1671,8 @@ static void svm_set_dr6(struct vcpu_svm *svm, unsigned long value)
 {
 	struct vmcb *vmcb = svm->vmcb;
 
-	if (unlikely(value != vmcb->save.dr6)) {
-		vmcb->save.dr6 = value;
+	if (unlikely(value != svm_dr6_read(svm))) {
+		svm_dr6_write(svm, value);
 		vmcb_mark_dirty(vmcb, VMCB_DR);
 	}
 }
@@ -1690,8 +1689,8 @@ static void svm_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
 	 * We cannot reset svm->vmcb->save.dr6 to DR6_FIXED_1|DR6_RTM here,
 	 * because db_interception might need it.  We can do it before vmentry.
 	 */
-	vcpu->arch.dr6 = svm->vmcb->save.dr6;
-	vcpu->arch.dr7 = svm->vmcb->save.dr7;
+	vcpu->arch.dr6 = svm_dr6_read(svm);
+	vcpu->arch.dr7 = svm_dr7_read(svm);
 	vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_WONT_EXIT;
 	set_dr_intercepts(svm);
 }
@@ -1700,7 +1699,7 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm->vmcb->save.dr7 = value;
+	svm_dr7_write(svm, value);
 	vmcb_mark_dirty(svm->vmcb, VMCB_DR);
 }
 
@@ -1735,7 +1734,7 @@ static int db_interception(struct vcpu_svm *svm)
 	if (!(svm->vcpu.guest_debug &
 	      (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) &&
 		!svm->nmi_singlestep) {
-		u32 payload = (svm->vmcb->save.dr6 ^ DR6_RTM) & ~DR6_FIXED_1;
+		u32 payload = (svm_dr6_read(svm) ^ DR6_RTM) & ~DR6_FIXED_1;
 		kvm_queue_exception_p(&svm->vcpu, DB_VECTOR, payload);
 		return 1;
 	}
@@ -1749,10 +1748,10 @@ static int db_interception(struct vcpu_svm *svm)
 	if (svm->vcpu.guest_debug &
 	    (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) {
 		kvm_run->exit_reason = KVM_EXIT_DEBUG;
-		kvm_run->debug.arch.dr6 = svm->vmcb->save.dr6;
-		kvm_run->debug.arch.dr7 = svm->vmcb->save.dr7;
+		kvm_run->debug.arch.dr6 = svm_dr6_read(svm);
+		kvm_run->debug.arch.dr7 = svm_dr7_read(svm);
 		kvm_run->debug.arch.pc =
-			svm->vmcb->save.cs.base + svm->vmcb->save.rip;
+			svm_cs_read_base(svm) + svm_rip_read(svm);
 		kvm_run->debug.arch.exception = DB_VECTOR;
 		return 0;
 	}
@@ -1765,7 +1764,7 @@ static int bp_interception(struct vcpu_svm *svm)
 	struct kvm_run *kvm_run = svm->vcpu.run;
 
 	kvm_run->exit_reason = KVM_EXIT_DEBUG;
-	kvm_run->debug.arch.pc = svm->vmcb->save.cs.base + svm->vmcb->save.rip;
+	kvm_run->debug.arch.pc = svm_cs_read_base(svm) + svm_rip_read(svm);
 	kvm_run->debug.arch.exception = BP_VECTOR;
 	return 0;
 }
@@ -1953,7 +1952,7 @@ static int vmload_interception(struct vcpu_svm *svm)
 	if (nested_svm_check_permissions(svm))
 		return 1;
 
-	ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
+	ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm_rax_read(svm)), &map);
 	if (ret) {
 		if (ret == -EINVAL)
 			kvm_inject_gp(&svm->vcpu, 0);
@@ -1964,7 +1963,7 @@ static int vmload_interception(struct vcpu_svm *svm)
 
 	ret = kvm_skip_emulated_instruction(&svm->vcpu);
 
-	nested_svm_vmloadsave(nested_vmcb, svm->vmcb);
+	nested_svm_vmloadsave(&nested_vmcb->save, get_vmsa(svm));
 	kvm_vcpu_unmap(&svm->vcpu, &map, true);
 
 	return ret;
@@ -1979,7 +1978,7 @@ static int vmsave_interception(struct vcpu_svm *svm)
 	if (nested_svm_check_permissions(svm))
 		return 1;
 
-	ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
+	ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm_rax_read(svm)), &map);
 	if (ret) {
 		if (ret == -EINVAL)
 			kvm_inject_gp(&svm->vcpu, 0);
@@ -1990,7 +1989,7 @@ static int vmsave_interception(struct vcpu_svm *svm)
 
 	ret = kvm_skip_emulated_instruction(&svm->vcpu);
 
-	nested_svm_vmloadsave(svm->vmcb, nested_vmcb);
+	nested_svm_vmloadsave(get_vmsa(svm), &nested_vmcb->save);
 	kvm_vcpu_unmap(&svm->vcpu, &map, true);
 
 	return ret;
@@ -2064,7 +2063,7 @@ static int invlpga_interception(struct vcpu_svm *svm)
 {
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 
-	trace_kvm_invlpga(svm->vmcb->save.rip, kvm_rcx_read(&svm->vcpu),
+	trace_kvm_invlpga(svm_rip_read(svm), kvm_rcx_read(&svm->vcpu),
 			  kvm_rax_read(&svm->vcpu));
 
 	/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
@@ -2075,7 +2074,7 @@ static int invlpga_interception(struct vcpu_svm *svm)
 
 static int skinit_interception(struct vcpu_svm *svm)
 {
-	trace_kvm_skinit(svm->vmcb->save.rip, kvm_rax_read(&svm->vcpu));
+	trace_kvm_skinit(svm_rip_read(svm), kvm_rax_read(&svm->vcpu));
 
 	kvm_queue_exception(&svm->vcpu, UD_VECTOR);
 	return 1;
@@ -2387,24 +2386,24 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 	switch (msr_info->index) {
 	case MSR_STAR:
-		msr_info->data = svm->vmcb->save.star;
+		msr_info->data = svm_star_read(svm);
 		break;
 #ifdef CONFIG_X86_64
 	case MSR_LSTAR:
-		msr_info->data = svm->vmcb->save.lstar;
+		msr_info->data = svm_lstar_read(svm);
 		break;
 	case MSR_CSTAR:
-		msr_info->data = svm->vmcb->save.cstar;
+		msr_info->data = svm_cstar_read(svm);
 		break;
 	case MSR_KERNEL_GS_BASE:
-		msr_info->data = svm->vmcb->save.kernel_gs_base;
+		msr_info->data = svm_kernel_gs_base_read(svm);
 		break;
 	case MSR_SYSCALL_MASK:
-		msr_info->data = svm->vmcb->save.sfmask;
+		msr_info->data = svm_sfmask_read(svm);
 		break;
 #endif
 	case MSR_IA32_SYSENTER_CS:
-		msr_info->data = svm->vmcb->save.sysenter_cs;
+		msr_info->data = svm_sysenter_cs_read(svm);
 		break;
 	case MSR_IA32_SYSENTER_EIP:
 		msr_info->data = svm->sysenter_eip;
@@ -2423,19 +2422,19 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	 * implemented.
 	 */
 	case MSR_IA32_DEBUGCTLMSR:
-		msr_info->data = svm->vmcb->save.dbgctl;
+		msr_info->data = svm_dbgctl_read(svm);
 		break;
 	case MSR_IA32_LASTBRANCHFROMIP:
-		msr_info->data = svm->vmcb->save.br_from;
+		msr_info->data = svm_br_from_read(svm);
 		break;
 	case MSR_IA32_LASTBRANCHTOIP:
-		msr_info->data = svm->vmcb->save.br_to;
+		msr_info->data = svm_br_to_read(svm);
 		break;
 	case MSR_IA32_LASTINTFROMIP:
-		msr_info->data = svm->vmcb->save.last_excp_from;
+		msr_info->data = svm_last_excp_from_read(svm);
 		break;
 	case MSR_IA32_LASTINTTOIP:
-		msr_info->data = svm->vmcb->save.last_excp_to;
+		msr_info->data = svm_last_excp_to_read(svm);
 		break;
 	case MSR_VM_HSAVE_PA:
 		msr_info->data = svm->nested.hsave_msr;
@@ -2527,7 +2526,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
 			return 1;
 		vcpu->arch.pat = data;
-		svm->vmcb->save.g_pat = data;
+		svm_g_pat_write(svm, data);
 		vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
 		break;
 	case MSR_IA32_SPEC_CTRL:
@@ -2584,32 +2583,32 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		svm->virt_spec_ctrl = data;
 		break;
 	case MSR_STAR:
-		svm->vmcb->save.star = data;
+		svm_star_write(svm, data);
 		break;
 #ifdef CONFIG_X86_64
 	case MSR_LSTAR:
-		svm->vmcb->save.lstar = data;
+		svm_lstar_write(svm, data);
 		break;
 	case MSR_CSTAR:
-		svm->vmcb->save.cstar = data;
+		svm_cstar_write(svm, data);
 		break;
 	case MSR_KERNEL_GS_BASE:
-		svm->vmcb->save.kernel_gs_base = data;
+		svm_kernel_gs_base_write(svm, data);
 		break;
 	case MSR_SYSCALL_MASK:
-		svm->vmcb->save.sfmask = data;
+		svm_sfmask_write(svm, data);
 		break;
 #endif
 	case MSR_IA32_SYSENTER_CS:
-		svm->vmcb->save.sysenter_cs = data;
+		svm_sysenter_cs_write(svm, data);
 		break;
 	case MSR_IA32_SYSENTER_EIP:
 		svm->sysenter_eip = data;
-		svm->vmcb->save.sysenter_eip = data;
+		svm_sysenter_eip_write(svm, data);
 		break;
 	case MSR_IA32_SYSENTER_ESP:
 		svm->sysenter_esp = data;
-		svm->vmcb->save.sysenter_esp = data;
+		svm_sysenter_esp_write(svm, data);
 		break;
 	case MSR_TSC_AUX:
 		if (!boot_cpu_has(X86_FEATURE_RDTSCP))
@@ -2632,7 +2631,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		if (data & DEBUGCTL_RESERVED_BITS)
 			return 1;
 
-		svm->vmcb->save.dbgctl = data;
+		svm_dbgctl_write(svm, data);
 		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
 		if (data & (1ULL<<0))
 			svm_enable_lbrv(svm);
@@ -2805,7 +2804,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb_control_area *control = &svm->vmcb->control;
-	struct vmcb_save_area *save = &svm->vmcb->save;
+	struct vmcb_save_area *save = get_vmsa(svm);
 
 	if (!dump_invalid_vmcb) {
 		pr_warn_ratelimited("set kvm_amd.dump_invalid_vmcb=1 to dump internal KVM state.\n");
@@ -2934,16 +2933,16 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
 
 	if (!is_cr_intercept(svm, INTERCEPT_CR0_WRITE))
-		vcpu->arch.cr0 = svm->vmcb->save.cr0;
+		vcpu->arch.cr0 = svm_cr0_read(svm);
 	if (npt_enabled)
-		vcpu->arch.cr3 = svm->vmcb->save.cr3;
+		vcpu->arch.cr3 = svm_cr3_read(svm);
 
 	svm_complete_interrupts(svm);
 
 	if (is_guest_mode(vcpu)) {
 		int vmexit;
 
-		trace_kvm_nested_vmexit(svm->vmcb->save.rip, exit_code,
+		trace_kvm_nested_vmexit(svm_rip_read(svm), exit_code,
 					svm->vmcb->control.exit_info_1,
 					svm->vmcb->control.exit_info_2,
 					svm->vmcb->control.exit_int_info,
@@ -3204,7 +3203,7 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
 	 */
 	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
 	svm->nmi_singlestep = true;
-	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
+	svm_rflags_or(svm, (X86_EFLAGS_TF | X86_EFLAGS_RF));
 }
 
 static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
@@ -3418,9 +3417,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 	fastpath_t exit_fastpath;
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
-	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
-	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
+	svm_rax_write(svm, vcpu->arch.regs[VCPU_REGS_RAX]);
+	svm_rsp_write(svm, vcpu->arch.regs[VCPU_REGS_RSP]);
+	svm_rip_write(svm, vcpu->arch.regs[VCPU_REGS_RIP]);
 
 	/*
 	 * Disable singlestep if we're injecting an interrupt/exception.
@@ -3442,7 +3441,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	sync_lapic_to_cr8(vcpu);
 
-	svm->vmcb->save.cr2 = vcpu->arch.cr2;
+	svm_cr2_write(svm, vcpu->arch.cr2);
 
 	/*
 	 * Run with all-zero DR6 unless needed, so that we can get the exact cause
@@ -3492,10 +3491,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl);
 
-	vcpu->arch.cr2 = svm->vmcb->save.cr2;
-	vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
-	vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
-	vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
+	vcpu->arch.cr2 = svm_cr2_read(svm);
+	vcpu->arch.regs[VCPU_REGS_RAX] = svm_rax_read(svm);
+	vcpu->arch.regs[VCPU_REGS_RSP] = svm_rsp_read(svm);
+	vcpu->arch.regs[VCPU_REGS_RIP] = svm_rip_read(svm);
 
 	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
 		kvm_before_interrupt(&svm->vcpu);
@@ -3558,7 +3557,7 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long root,
 		cr3 = vcpu->arch.cr3;
 	}
 
-	svm->vmcb->save.cr3 = cr3;
+	svm_cr3_write(svm, cr3);
 	vmcb_mark_dirty(svm->vmcb, VMCB_CR);
 }
 
@@ -3886,9 +3885,9 @@ static int svm_pre_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 		/* FEE0h - SVM Guest VMCB Physical Address */
 		put_smstate(u64, smstate, 0x7ee0, svm->nested.vmcb);
 
-		svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
-		svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
-		svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
+		svm_rax_write(svm, vcpu->arch.regs[VCPU_REGS_RAX]);
+		svm_rsp_write(svm, vcpu->arch.regs[VCPU_REGS_RSP]);
+		svm_rip_write(svm, vcpu->arch.regs[VCPU_REGS_RIP]);
 
 		ret = nested_svm_vmexit(svm);
 		if (ret)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 2692ddf30c8d..f42ba9d158df 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -395,7 +395,8 @@ int enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 			 struct vmcb *nested_vmcb);
 void svm_leave_nested(struct vcpu_svm *svm);
 int nested_svm_vmrun(struct vcpu_svm *svm);
-void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb);
+void nested_svm_vmloadsave(struct vmcb_save_area *from_vmsa,
+			   struct vmcb_save_area *to_vmsa);
 int nested_svm_vmexit(struct vcpu_svm *svm);
 int nested_svm_exit_handled(struct vcpu_svm *svm);
 int nested_svm_check_permissions(struct vcpu_svm *svm);
@@ -504,4 +505,130 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu);
 void __init sev_hardware_setup(void);
 void sev_hardware_teardown(void);
 
+/* VMSA Accessor functions */
+
+static inline struct vmcb_save_area *get_vmsa(struct vcpu_svm *svm)
+{
+	return &svm->vmcb->save;
+}
+
+#define DEFINE_VMSA_SEGMENT_ENTRY(_field, _entry, _size)		\
+	static inline _size						\
+	svm_##_field##_read_##_entry(struct vcpu_svm *svm)		\
+	{								\
+		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
+									\
+		return vmsa->_field._entry;				\
+	}								\
+									\
+	static inline void						\
+	svm_##_field##_write_##_entry(struct vcpu_svm *svm,		\
+				      _size value)			\
+	{								\
+		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
+									\
+		vmsa->_field._entry = value;				\
+	}								\
+
+#define DEFINE_VMSA_SEGMENT_ACCESSOR(_field)				\
+	DEFINE_VMSA_SEGMENT_ENTRY(_field, selector, u16)		\
+	DEFINE_VMSA_SEGMENT_ENTRY(_field, attrib, u16)			\
+	DEFINE_VMSA_SEGMENT_ENTRY(_field, limit, u32)			\
+	DEFINE_VMSA_SEGMENT_ENTRY(_field, base, u64)			\
+									\
+	static inline struct vmcb_seg *					\
+	svm_##_field##_read(struct vcpu_svm *svm)			\
+	{								\
+		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
+									\
+		return &vmsa->_field;					\
+	}								\
+									\
+	static inline void						\
+	svm_##_field##_write(struct vcpu_svm *svm,			\
+			    struct vmcb_seg *seg)			\
+	{								\
+		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
+									\
+		vmsa->_field = *seg;					\
+	}
+
+DEFINE_VMSA_SEGMENT_ACCESSOR(cs)
+DEFINE_VMSA_SEGMENT_ACCESSOR(ds)
+DEFINE_VMSA_SEGMENT_ACCESSOR(es)
+DEFINE_VMSA_SEGMENT_ACCESSOR(fs)
+DEFINE_VMSA_SEGMENT_ACCESSOR(gs)
+DEFINE_VMSA_SEGMENT_ACCESSOR(ss)
+DEFINE_VMSA_SEGMENT_ACCESSOR(gdtr)
+DEFINE_VMSA_SEGMENT_ACCESSOR(idtr)
+DEFINE_VMSA_SEGMENT_ACCESSOR(ldtr)
+DEFINE_VMSA_SEGMENT_ACCESSOR(tr)
+
+#define DEFINE_VMSA_SIZE_ACCESSOR(_field, _size)			\
+	static inline _size						\
+	svm_##_field##_read(struct vcpu_svm *svm)			\
+	{								\
+		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
+									\
+		return vmsa->_field;					\
+	}								\
+									\
+	static inline void						\
+	svm_##_field##_write(struct vcpu_svm *svm, _size value)		\
+	{								\
+		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
+									\
+		vmsa->_field = value;					\
+	}								\
+									\
+	static inline void						\
+	svm_##_field##_and(struct vcpu_svm *svm, _size value)		\
+	{								\
+		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
+									\
+		vmsa->_field &= value;					\
+	}								\
+									\
+	static inline void						\
+	svm_##_field##_or(struct vcpu_svm *svm, _size value)		\
+	{								\
+		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
+									\
+		vmsa->_field |= value;					\
+	}
+
+#define DEFINE_VMSA_ACCESSOR(_field)					\
+	DEFINE_VMSA_SIZE_ACCESSOR(_field, u64)
+
+#define DEFINE_VMSA_U8_ACCESSOR(_field)					\
+	DEFINE_VMSA_SIZE_ACCESSOR(_field, u8)
+
+DEFINE_VMSA_ACCESSOR(efer)
+DEFINE_VMSA_ACCESSOR(cr0)
+DEFINE_VMSA_ACCESSOR(cr2)
+DEFINE_VMSA_ACCESSOR(cr3)
+DEFINE_VMSA_ACCESSOR(cr4)
+DEFINE_VMSA_ACCESSOR(dr6)
+DEFINE_VMSA_ACCESSOR(dr7)
+DEFINE_VMSA_ACCESSOR(rflags)
+DEFINE_VMSA_ACCESSOR(star)
+DEFINE_VMSA_ACCESSOR(lstar)
+DEFINE_VMSA_ACCESSOR(cstar)
+DEFINE_VMSA_ACCESSOR(sfmask)
+DEFINE_VMSA_ACCESSOR(kernel_gs_base)
+DEFINE_VMSA_ACCESSOR(sysenter_cs)
+DEFINE_VMSA_ACCESSOR(sysenter_esp)
+DEFINE_VMSA_ACCESSOR(sysenter_eip)
+DEFINE_VMSA_ACCESSOR(g_pat)
+DEFINE_VMSA_ACCESSOR(dbgctl)
+DEFINE_VMSA_ACCESSOR(br_from)
+DEFINE_VMSA_ACCESSOR(br_to)
+DEFINE_VMSA_ACCESSOR(last_excp_from)
+DEFINE_VMSA_ACCESSOR(last_excp_to)
+
+DEFINE_VMSA_U8_ACCESSOR(cpl)
+DEFINE_VMSA_ACCESSOR(rip)
+DEFINE_VMSA_ACCESSOR(rax)
+DEFINE_VMSA_ACCESSOR(rsp)
+
 #endif
-- 
2.28.0



* [RFC PATCH 04/35] KVM: SVM: Make GHCB accessor functions available to the hypervisor
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (2 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 03/35] KVM: SVM: Add indirect access to the VM save area Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 05/35] KVM: SVM: Add initial support for SEV-ES GHCB access to KVM Tom Lendacky
                   ` (31 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Update the GHCB accessor functions so that some of the macros can be used
by KVM when accessing the GHCB via the VMSA accessors. This will avoid
duplicating code and make access to the GHCB somewhat transparent.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/svm.h   | 15 +++++++++++++--
 arch/x86/kernel/cpu/vmware.c | 12 ++++++------
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index da38eb195355..c112207c201b 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -349,15 +349,26 @@ struct vmcb {
 #define DEFINE_GHCB_ACCESSORS(field)						\
 	static inline bool ghcb_##field##_is_valid(const struct ghcb *ghcb)	\
 	{									\
+		const struct vmcb_save_area *vmsa = &ghcb->save;		\
+										\
 		return test_bit(GHCB_BITMAP_IDX(field),				\
-				(unsigned long *)&ghcb->save.valid_bitmap);	\
+				(unsigned long *)vmsa->valid_bitmap);		\
+	}									\
+										\
+	static inline u64 ghcb_get_##field(struct ghcb *ghcb)			\
+	{									\
+		const struct vmcb_save_area *vmsa = &ghcb->save;		\
+										\
+		return vmsa->field;						\
 	}									\
 										\
 	static inline void ghcb_set_##field(struct ghcb *ghcb, u64 value)	\
 	{									\
+		struct vmcb_save_area *vmsa = &ghcb->save;			\
+										\
 		__set_bit(GHCB_BITMAP_IDX(field),				\
 			  (unsigned long *)&ghcb->save.valid_bitmap);		\
-		ghcb->save.field = value;					\
+		vmsa->field = value;						\
 	}
 
 DEFINE_GHCB_ACCESSORS(cpl)
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 924571fe5864..c6ede3b3d302 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -501,12 +501,12 @@ static bool vmware_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
 	      ghcb_rbp_is_valid(ghcb)))
 		return false;
 
-	regs->bx = ghcb->save.rbx;
-	regs->cx = ghcb->save.rcx;
-	regs->dx = ghcb->save.rdx;
-	regs->si = ghcb->save.rsi;
-	regs->di = ghcb->save.rdi;
-	regs->bp = ghcb->save.rbp;
+	regs->bx = ghcb_get_rbx(ghcb);
+	regs->cx = ghcb_get_rcx(ghcb);
+	regs->dx = ghcb_get_rdx(ghcb);
+	regs->si = ghcb_get_rsi(ghcb);
+	regs->di = ghcb_get_rdi(ghcb);
+	regs->bp = ghcb_get_rbp(ghcb);
 
 	return true;
 }
-- 
2.28.0



* [RFC PATCH 05/35] KVM: SVM: Add initial support for SEV-ES GHCB access to KVM
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (3 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 04/35] KVM: SVM: Make GHCB accessor functions available to the hypervisor Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
       [not found]   ` <20200914205801.GA7084@sjchrist-ice>
  2020-09-14 20:15 ` [RFC PATCH 06/35] KVM: SVM: Add required changes to support intercepts under SEV-ES Tom Lendacky
                   ` (30 subsequent siblings)
  35 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Provide initial support for accessing the GHCB when register state for an
SEV-ES guest needs to be accessed. The support consists of:

  - Accessing the GHCB instead of the VMSA when reading and writing
    guest registers (after the VMSA has been encrypted).
  - Creating register access override functions for reading and writing
    guest registers from the common KVM support.
  - Allocating pages for the VMSA and GHCB when creating each vCPU
    - The VMSA page holds the encrypted VMSA for the vCPU
    - The GHCB page is used to hold a copy of the guest GHCB during
      VMGEXIT processing.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/kvm_host.h  |   7 ++
 arch/x86/include/asm/msr-index.h |   1 +
 arch/x86/kvm/kvm_cache_regs.h    |  30 +++++--
 arch/x86/kvm/svm/svm.c           | 138 ++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h           |  65 ++++++++++++++-
 5 files changed, 230 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5303dbc5c9bc..c900992701d6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -788,6 +788,9 @@ struct kvm_vcpu_arch {
 
 	/* AMD MSRC001_0015 Hardware Configuration */
 	u64 msr_hwcr;
+
+	/* SEV-ES support */
+	bool vmsa_encrypted;
 };
 
 struct kvm_lpage_info {
@@ -1227,6 +1230,10 @@ struct kvm_x86_ops {
 	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
 
 	void (*migrate_timers)(struct kvm_vcpu *vcpu);
+
+	void (*reg_read_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
+	void (*reg_write_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg,
+				   unsigned long val);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 249a4147c4b2..16f5b20bb099 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -466,6 +466,7 @@
 #define MSR_AMD64_IBSBRTARGET		0xc001103b
 #define MSR_AMD64_IBSOPDATA4		0xc001103d
 #define MSR_AMD64_IBS_REG_COUNT_MAX	8 /* includes MSR_AMD64_IBSBRTARGET */
+#define MSR_AMD64_VM_PAGE_FLUSH		0xc001011e
 #define MSR_AMD64_SEV_ES_GHCB		0xc0010130
 #define MSR_AMD64_SEV			0xc0010131
 #define MSR_AMD64_SEV_ENABLED_BIT	0
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index cfe83d4ae625..e87eb90999d5 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -9,15 +9,21 @@
 	(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR  \
 	 | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_PGE | X86_CR4_TSD)
 
-#define BUILD_KVM_GPR_ACCESSORS(lname, uname)				      \
-static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)\
-{									      \
-	return vcpu->arch.regs[VCPU_REGS_##uname];			      \
-}									      \
-static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu,	      \
-						unsigned long val)	      \
-{									      \
-	vcpu->arch.regs[VCPU_REGS_##uname] = val;			      \
+#define BUILD_KVM_GPR_ACCESSORS(lname, uname)					\
+static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)	\
+{										\
+	if (kvm_x86_ops.reg_read_override)					\
+		kvm_x86_ops.reg_read_override(vcpu, VCPU_REGS_##uname);		\
+										\
+	return vcpu->arch.regs[VCPU_REGS_##uname];				\
+}										\
+static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu,		\
+						unsigned long val)		\
+{										\
+	if (kvm_x86_ops.reg_write_override)					\
+		kvm_x86_ops.reg_write_override(vcpu, VCPU_REGS_##uname, val);	\
+										\
+	vcpu->arch.regs[VCPU_REGS_##uname] = val;				\
 }
 BUILD_KVM_GPR_ACCESSORS(rax, RAX)
 BUILD_KVM_GPR_ACCESSORS(rbx, RBX)
@@ -67,6 +73,9 @@ static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
 	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
 		return 0;
 
+	if (kvm_x86_ops.reg_read_override)
+		kvm_x86_ops.reg_read_override(vcpu, reg);
+
 	if (!kvm_register_is_available(vcpu, reg))
 		kvm_x86_ops.cache_reg(vcpu, reg);
 
@@ -79,6 +88,9 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu, int reg,
 	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
 		return;
 
+	if (kvm_x86_ops.reg_write_override)
+		kvm_x86_ops.reg_write_override(vcpu, reg, val);
+
 	vcpu->arch.regs[reg] = val;
 	kvm_register_mark_dirty(vcpu, reg);
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 779c167e42cc..d1f52211627a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1175,6 +1175,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	struct page *msrpm_pages;
 	struct page *hsave_page;
 	struct page *nested_msrpm_pages;
+	struct page *vmsa_page = NULL;
 	int err;
 
 	BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
@@ -1197,9 +1198,19 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	if (!hsave_page)
 		goto free_page3;
 
+	if (sev_es_guest(svm->vcpu.kvm)) {
+		/*
+		 * SEV-ES guests require a separate VMSA page used to contain
+		 * the encrypted register state of the guest.
+		 */
+		vmsa_page = alloc_page(GFP_KERNEL);
+		if (!vmsa_page)
+			goto free_page4;
+	}
+
 	err = avic_init_vcpu(svm);
 	if (err)
-		goto free_page4;
+		goto free_page5;
 
 	/* We initialize this flag to true to make sure that the is_running
 	 * bit would be set the first time the vcpu is loaded.
@@ -1219,6 +1230,12 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	svm->vmcb = page_address(page);
 	clear_page(svm->vmcb);
 	svm->vmcb_pa = __sme_set(page_to_pfn(page) << PAGE_SHIFT);
+
+	if (vmsa_page) {
+		svm->vmsa = page_address(vmsa_page);
+		clear_page(svm->vmsa);
+	}
+
 	svm->asid_generation = 0;
 	init_vmcb(svm);
 
@@ -1227,6 +1244,9 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 
 	return 0;
 
+free_page5:
+	if (vmsa_page)
+		__free_page(vmsa_page);
 free_page4:
 	__free_page(hsave_page);
 free_page3:
@@ -1258,6 +1278,26 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	 */
 	svm_clear_current_vmcb(svm->vmcb);
 
+	if (sev_es_guest(vcpu->kvm)) {
+		struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
+
+		if (vcpu->arch.vmsa_encrypted) {
+			u64 page_to_flush;
+
+			/*
+			 * The VMSA page was used by hardware to hold guest
+			 * encrypted state, be sure to flush it before returning
+			 * encrypted state, so be sure to flush it before returning
+			 * Flush MSR (which takes the page virtual address and
+			 * guest ASID).
+			 */
+			page_to_flush = (u64)svm->vmsa | sev->asid;
+			wrmsrl(MSR_AMD64_VM_PAGE_FLUSH, page_to_flush);
+		}
+
+		__free_page(virt_to_page(svm->vmsa));
+	}
+
 	__free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT));
 	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
 	__free_page(virt_to_page(svm->nested.hsave));
@@ -4012,6 +4052,99 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
 		   (svm->vmcb->control.intercept & (1ULL << INTERCEPT_INIT));
 }
 
+/*
+ * These return values represent the offset in quad words within the VM save
+ * area. This allows them to be accessed by casting the save area to a u64
+ * array.
+ */
+#define VMSA_REG_ENTRY(_field)	 (offsetof(struct vmcb_save_area, _field) / sizeof(u64))
+#define VMSA_REG_UNDEF		 VMSA_REG_ENTRY(valid_bitmap)
+static inline unsigned int vcpu_to_vmsa_entry(enum kvm_reg reg)
+{
+	switch (reg) {
+	case VCPU_REGS_RAX:	return VMSA_REG_ENTRY(rax);
+	case VCPU_REGS_RBX:	return VMSA_REG_ENTRY(rbx);
+	case VCPU_REGS_RCX:	return VMSA_REG_ENTRY(rcx);
+	case VCPU_REGS_RDX:	return VMSA_REG_ENTRY(rdx);
+	case VCPU_REGS_RSP:	return VMSA_REG_ENTRY(rsp);
+	case VCPU_REGS_RBP:	return VMSA_REG_ENTRY(rbp);
+	case VCPU_REGS_RSI:	return VMSA_REG_ENTRY(rsi);
+	case VCPU_REGS_RDI:	return VMSA_REG_ENTRY(rdi);
+#ifdef CONFIG_X86_64
+	case VCPU_REGS_R8:	return VMSA_REG_ENTRY(r8);
+	case VCPU_REGS_R9:	return VMSA_REG_ENTRY(r9);
+	case VCPU_REGS_R10:	return VMSA_REG_ENTRY(r10);
+	case VCPU_REGS_R11:	return VMSA_REG_ENTRY(r11);
+	case VCPU_REGS_R12:	return VMSA_REG_ENTRY(r12);
+	case VCPU_REGS_R13:	return VMSA_REG_ENTRY(r13);
+	case VCPU_REGS_R14:	return VMSA_REG_ENTRY(r14);
+	case VCPU_REGS_R15:	return VMSA_REG_ENTRY(r15);
+#endif
+	case VCPU_REGS_RIP:	return VMSA_REG_ENTRY(rip);
+	default:
+		WARN_ONCE(1, "unsupported VCPU to VMSA register conversion\n");
+		return VMSA_REG_UNDEF;
+	}
+}
+
+/* For SEV-ES guests, populate the vCPU register from the appropriate VMSA/GHCB */
+static void svm_reg_read_override(struct kvm_vcpu *vcpu, enum kvm_reg reg)
+{
+	struct vmcb_save_area *vmsa;
+	struct vcpu_svm *svm;
+	unsigned int entry;
+	unsigned long val;
+	u64 *vmsa_reg;
+
+	if (!sev_es_guest(vcpu->kvm))
+		return;
+
+	entry = vcpu_to_vmsa_entry(reg);
+	if (entry == VMSA_REG_UNDEF)
+		return;
+
+	svm = to_svm(vcpu);
+	vmsa = get_vmsa(svm);
+	vmsa_reg = (u64 *)vmsa;
+	val = (unsigned long)vmsa_reg[entry];
+
+	/* If a GHCB is mapped, check the bitmap of valid entries */
+	if (svm->ghcb) {
+		if (!test_bit(entry, (unsigned long *)vmsa->valid_bitmap))
+			val = 0;
+	}
+
+	vcpu->arch.regs[reg] = val;
+}
+
+/* For SEV-ES guests, set the vCPU register in the appropriate VMSA */
+static void svm_reg_write_override(struct kvm_vcpu *vcpu, enum kvm_reg reg,
+				   unsigned long val)
+{
+	struct vmcb_save_area *vmsa;
+	struct vcpu_svm *svm;
+	unsigned int entry;
+	u64 *vmsa_reg;
+
+	entry = vcpu_to_vmsa_entry(reg);
+	if (entry == VMSA_REG_UNDEF)
+		return;
+
+	svm = to_svm(vcpu);
+	vmsa = get_vmsa(svm);
+	vmsa_reg = (u64 *)vmsa;
+
+	/* If a GHCB is mapped, set the bit to indicate a valid entry */
+	if (svm->ghcb) {
+		unsigned int index = entry / 8;
+		unsigned int shift = entry % 8;
+
+		vmsa->valid_bitmap[index] |= BIT(shift);
+	}
+
+	vmsa_reg[entry] = val;
+}
+
 static void svm_vm_destroy(struct kvm *kvm)
 {
 	avic_vm_destroy(kvm);
@@ -4150,6 +4283,9 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
 
 	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
+
+	.reg_read_override = svm_reg_read_override,
+	.reg_write_override = svm_reg_write_override,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f42ba9d158df..ff587536f571 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -159,6 +159,10 @@ struct vcpu_svm {
 	 */
 	struct list_head ir_list;
 	spinlock_t ir_list_lock;
+
+	/* SEV-ES support */
+	struct vmcb_save_area *vmsa;
+	struct ghcb *ghcb;
 };
 
 struct svm_cpu_data {
@@ -509,9 +513,34 @@ void sev_hardware_teardown(void);
 
 static inline struct vmcb_save_area *get_vmsa(struct vcpu_svm *svm)
 {
-	return &svm->vmcb->save;
+	struct vmcb_save_area *vmsa;
+
+	if (sev_es_guest(svm->vcpu.kvm)) {
+		/*
+		 * Before LAUNCH_UPDATE_VMSA, use the actual SEV-ES save area
+		 * to construct the initial state.  Afterwards, use the mapped
+		 * GHCB in a VMGEXIT or the traditional save area as a scratch
+		 * area when outside of a VMGEXIT.
+		 */
+		if (svm->vcpu.arch.vmsa_encrypted) {
+			if (svm->ghcb)
+				vmsa = &svm->ghcb->save;
+			else
+				vmsa = &svm->vmcb->save;
+		} else {
+			vmsa = svm->vmsa;
+		}
+	} else {
+		vmsa = &svm->vmcb->save;
+	}
+
+	return vmsa;
 }
 
+#define SEV_ES_SET_VALID(_vmsa, _field)					\
+	__set_bit(GHCB_BITMAP_IDX(_field),				\
+		  (unsigned long *)(_vmsa)->valid_bitmap)
+
 #define DEFINE_VMSA_SEGMENT_ENTRY(_field, _entry, _size)		\
 	static inline _size						\
 	svm_##_field##_read_##_entry(struct vcpu_svm *svm)		\
@@ -528,6 +557,9 @@ static inline struct vmcb_save_area *get_vmsa(struct vcpu_svm *svm)
 		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
 									\
 		vmsa->_field._entry = value;				\
+		if (svm->vcpu.arch.vmsa_encrypted) {			\
+			SEV_ES_SET_VALID(vmsa, _field);			\
+		}							\
 	}								\
 
 #define DEFINE_VMSA_SEGMENT_ACCESSOR(_field)				\
@@ -551,6 +583,9 @@ static inline struct vmcb_save_area *get_vmsa(struct vcpu_svm *svm)
 		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
 									\
 		vmsa->_field = *seg;					\
+		if (svm->vcpu.arch.vmsa_encrypted) {			\
+			SEV_ES_SET_VALID(vmsa, _field);			\
+		}							\
 	}
 
 DEFINE_VMSA_SEGMENT_ACCESSOR(cs)
@@ -579,6 +614,9 @@ DEFINE_VMSA_SEGMENT_ACCESSOR(tr)
 		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
 									\
 		vmsa->_field = value;					\
+		if (svm->vcpu.arch.vmsa_encrypted) {			\
+			SEV_ES_SET_VALID(vmsa, _field);			\
+		}							\
 	}								\
 									\
 	static inline void						\
@@ -587,6 +625,9 @@ DEFINE_VMSA_SEGMENT_ACCESSOR(tr)
 		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
 									\
 		vmsa->_field &= value;					\
+		if (svm->vcpu.arch.vmsa_encrypted) {			\
+			SEV_ES_SET_VALID(vmsa, _field);			\
+		}							\
 	}								\
 									\
 	static inline void						\
@@ -595,6 +636,9 @@ DEFINE_VMSA_SEGMENT_ACCESSOR(tr)
 		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
 									\
 		vmsa->_field |= value;					\
+		if (svm->vcpu.arch.vmsa_encrypted) {			\
+			SEV_ES_SET_VALID(vmsa, _field);			\
+		}							\
 	}
 
 #define DEFINE_VMSA_ACCESSOR(_field)					\
@@ -629,6 +673,25 @@ DEFINE_VMSA_ACCESSOR(last_excp_to)
 DEFINE_VMSA_U8_ACCESSOR(cpl)
 DEFINE_VMSA_ACCESSOR(rip)
 DEFINE_VMSA_ACCESSOR(rax)
+DEFINE_VMSA_ACCESSOR(rbx)
+DEFINE_VMSA_ACCESSOR(rcx)
+DEFINE_VMSA_ACCESSOR(rdx)
 DEFINE_VMSA_ACCESSOR(rsp)
+DEFINE_VMSA_ACCESSOR(rbp)
+DEFINE_VMSA_ACCESSOR(rsi)
+DEFINE_VMSA_ACCESSOR(rdi)
+DEFINE_VMSA_ACCESSOR(r8)
+DEFINE_VMSA_ACCESSOR(r9)
+DEFINE_VMSA_ACCESSOR(r10)
+DEFINE_VMSA_ACCESSOR(r11)
+DEFINE_VMSA_ACCESSOR(r12)
+DEFINE_VMSA_ACCESSOR(r13)
+DEFINE_VMSA_ACCESSOR(r14)
+DEFINE_VMSA_ACCESSOR(r15)
+DEFINE_VMSA_ACCESSOR(sw_exit_code)
+DEFINE_VMSA_ACCESSOR(sw_exit_info_1)
+DEFINE_VMSA_ACCESSOR(sw_exit_info_2)
+DEFINE_VMSA_ACCESSOR(sw_scratch)
+DEFINE_VMSA_ACCESSOR(xcr0)
 
 #endif
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 06/35] KVM: SVM: Add required changes to support intercepts under SEV-ES
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (4 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 05/35] KVM: SVM: Add initial support for SEV-ES GHCB access to KVM Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 07/35] KVM: SVM: Modify DRx register intercepts for an SEV-ES guest Tom Lendacky
                   ` (29 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

When a guest is running under SEV-ES, the hypervisor cannot access the
guest register state. There are numerous places in the KVM code where
registers that must not be accessed under SEV-ES (e.g. RIP, CR0, etc.)
are read or written. Add checks at various points within the KVM code to
prevent these register accesses and intercept updates.
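
Most of these checks follow the same guard pattern; as a rough sketch of
the approach (not a complete function):

  /* Recurring guard used throughout this patch */
  if (sev_es_guest(vcpu->kvm))
          return; /* the register state lives in the encrypted VMSA */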

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/svm.h |   3 +-
 arch/x86/kvm/cpuid.c       |   1 +
 arch/x86/kvm/svm/svm.c     | 114 ++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/x86.c         |   6 +-
 4 files changed, 113 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index c112207c201b..ed03d23f56fe 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -130,7 +130,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define LBR_CTL_ENABLE_MASK BIT_ULL(0)
 #define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
 
-#define SVM_INTERRUPT_SHADOW_MASK 1
+#define SVM_INTERRUPT_SHADOW_MASK	BIT_ULL(0)
+#define SVM_GUEST_INTERRUPT_MASK	BIT_ULL(1)
 
 #define SVM_IOIO_STR_SHIFT 2
 #define SVM_IOIO_REP_SHIFT 3
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 3fd6eec202d7..15f2b2365936 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -115,6 +115,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 					   MSR_IA32_MISC_ENABLE_MWAIT);
 	}
 }
+EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
 
 static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d1f52211627a..f8a5b7164008 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -36,6 +36,7 @@
 #include <asm/mce.h>
 #include <asm/spec-ctrl.h>
 #include <asm/cpu_device_id.h>
+#include <asm/traps.h>
 
 #include <asm/virtext.h>
 #include "trace.h"
@@ -320,6 +321,13 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	/*
+	 * SEV-ES does not expose the next RIP. The RIP update is controlled by
+	 * the type of exit and the #VC handler in the guest.
+	 */
+	if (sev_es_guest(vcpu->kvm))
+		goto done;
+
 	if (nrips && svm->vmcb->control.next_rip != 0) {
 		WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
 		svm->next_rip = svm->vmcb->control.next_rip;
@@ -331,6 +339,8 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	} else {
 		kvm_rip_write(vcpu, svm->next_rip);
 	}
+
+done:
 	svm_set_interrupt_shadow(vcpu, 0);
 
 	return 1;
@@ -1578,9 +1588,17 @@ static void svm_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
 
 static void update_cr0_intercept(struct vcpu_svm *svm)
 {
-	ulong gcr0 = svm->vcpu.arch.cr0;
+	ulong gcr0;
 	u64 hcr0;
 
+	/*
+	 * SEV-ES guests must always keep the CR intercepts cleared. CR
+	 * tracking is done using the CR write traps.
+	 */
+	if (sev_es_guest(svm->vcpu.kvm))
+		return;
+
+	gcr0 = svm->vcpu.arch.cr0;
 	hcr0 = (svm_cr0_read(svm) & ~SVM_CR0_SELECTIVE_MASK)
 		| (gcr0 & SVM_CR0_SELECTIVE_MASK);
 
@@ -2209,6 +2227,17 @@ static int task_switch_interception(struct vcpu_svm *svm)
 
 static int cpuid_interception(struct vcpu_svm *svm)
 {
+	/*
+	 * SEV-ES guests require the vCPU arch registers to be populated via
+	 * the GHCB.
+	 */
+	if (sev_es_guest(svm->vcpu.kvm)) {
+		if (kvm_register_read(&svm->vcpu, VCPU_REGS_RAX) == 0x0d) {
+			svm->vcpu.arch.xcr0 = svm_xcr0_read(svm);
+			kvm_update_cpuid_runtime(&svm->vcpu);
+		}
+	}
+
 	return kvm_emulate_cpuid(&svm->vcpu);
 }
 
@@ -2527,7 +2556,28 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 static int rdmsr_interception(struct vcpu_svm *svm)
 {
-	return kvm_emulate_rdmsr(&svm->vcpu);
+	u32 ecx = kvm_rcx_read(&svm->vcpu);
+	u64 data;
+
+	if (kvm_get_msr(&svm->vcpu, ecx, &data)) {
+		trace_kvm_msr_read_ex(ecx);
+		if (sev_es_guest(svm->vcpu.kvm)) {
+			ghcb_set_sw_exit_info_1(svm->ghcb, 1);
+			ghcb_set_sw_exit_info_2(svm->ghcb,
+						X86_TRAP_GP |
+						SVM_EVTINJ_TYPE_EXEPT |
+						SVM_EVTINJ_VALID);
+		} else {
+			kvm_inject_gp(&svm->vcpu, 0);
+		}
+		return 1;
+	}
+
+	trace_kvm_msr_read(ecx, data);
+
+	kvm_rax_write(&svm->vcpu, data & 0xffffffff);
+	kvm_rdx_write(&svm->vcpu, data >> 32);
+	return kvm_skip_emulated_instruction(&svm->vcpu);
 }
 
 static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 data)
@@ -2716,7 +2766,25 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
 static int wrmsr_interception(struct vcpu_svm *svm)
 {
-	return kvm_emulate_wrmsr(&svm->vcpu);
+	u32 ecx = kvm_rcx_read(&svm->vcpu);
+	u64 data = kvm_read_edx_eax(&svm->vcpu);
+
+	if (kvm_set_msr(&svm->vcpu, ecx, data)) {
+		trace_kvm_msr_write_ex(ecx, data);
+		if (sev_es_guest(svm->vcpu.kvm)) {
+			ghcb_set_sw_exit_info_1(svm->ghcb, 1);
+			ghcb_set_sw_exit_info_2(svm->ghcb,
+						X86_TRAP_GP |
+						SVM_EVTINJ_TYPE_EXEPT |
+						SVM_EVTINJ_VALID);
+		} else {
+			kvm_inject_gp(&svm->vcpu, 0);
+		}
+		return 1;
+	}
+
+	trace_kvm_msr_write(ecx, data);
+	return kvm_skip_emulated_instruction(&svm->vcpu);
 }
 
 static int msr_interception(struct vcpu_svm *svm)
@@ -2746,7 +2814,14 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
 static int pause_interception(struct vcpu_svm *svm)
 {
 	struct kvm_vcpu *vcpu = &svm->vcpu;
-	bool in_kernel = (svm_get_cpl(vcpu) == 0);
+	bool in_kernel;
+
+	/*
+	 * CPL is not made available for an SEV-ES guest, so just set in_kernel
+	 * to true.
+	 */
+	in_kernel = (sev_es_guest(svm->vcpu.kvm)) ? true
+						  : (svm_get_cpl(vcpu) == 0);
 
 	if (!kvm_pause_in_guest(vcpu->kvm))
 		grow_ple_window(vcpu);
@@ -2972,10 +3047,13 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 
 	trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
 
-	if (!is_cr_intercept(svm, INTERCEPT_CR0_WRITE))
-		vcpu->arch.cr0 = svm_cr0_read(svm);
-	if (npt_enabled)
-		vcpu->arch.cr3 = svm_cr3_read(svm);
+	/* SEV-ES guests must use the CR write traps to track CR registers. */
+	if (!sev_es_guest(vcpu->kvm)) {
+		if (!is_cr_intercept(svm, INTERCEPT_CR0_WRITE))
+			vcpu->arch.cr0 = svm_cr0_read(svm);
+		if (npt_enabled)
+			vcpu->arch.cr3 = svm_cr3_read(svm);
+	}
 
 	svm_complete_interrupts(svm);
 
@@ -3094,6 +3172,13 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	/*
+	 * SEV-ES guests must always keep the CR intercepts cleared. CR
+	 * tracking is done using the CR write traps.
+	 */
+	if (sev_es_guest(vcpu->kvm))
+		return;
+
 	if (nested_svm_virtualize_tpr(vcpu))
 		return;
 
@@ -3162,6 +3247,13 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb = svm->vmcb;
 
+	/*
+	 * SEV-ES guests do not expose RFLAGS. Use the VMCB interrupt mask
+	 * bit to determine the state of the IF flag.
+	 */
+	if (sev_es_guest(svm->vcpu.kvm))
+		return !(vmcb->control.int_state & SVM_GUEST_INTERRUPT_MASK);
+
 	if (!gif_set(svm))
 		return true;
 
@@ -3347,6 +3439,12 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
 		svm->vcpu.arch.nmi_injected = true;
 		break;
 	case SVM_EXITINTINFO_TYPE_EXEPT:
+		/*
+		 * Never re-inject a #VC exception.
+		 */
+		if (vector == X86_TRAP_VC)
+			break;
+
 		/*
 		 * In case of software exceptions, do not reinject the vector,
 		 * but re-execute the instruction instead. Rewind RIP first
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 539ea1cd6020..a5afdccb6c17 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3771,7 +3771,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	int idx;
 
-	if (vcpu->preempted)
+	if (vcpu->preempted && !vcpu->arch.vmsa_encrypted)
 		vcpu->arch.preempted_in_kernel = !kvm_x86_ops.get_cpl(vcpu);
 
 	/*
@@ -7774,7 +7774,9 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
 {
 	struct kvm_run *kvm_run = vcpu->run;
 
-	kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
+	kvm_run->if_flag = (vcpu->arch.vmsa_encrypted)
+		? kvm_arch_interrupt_allowed(vcpu)
+		: (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
 	kvm_run->flags = is_smm(vcpu) ? KVM_RUN_X86_SMM : 0;
 	kvm_run->cr8 = kvm_get_cr8(vcpu);
 	kvm_run->apic_base = kvm_get_apic_base(vcpu);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 07/35] KVM: SVM: Modify DRx register intercepts for an SEV-ES guest
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (5 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 06/35] KVM: SVM: Add required changes to support intercepts under SEV-ES Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES Tom Lendacky
                   ` (28 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

For an SEV-ES guest, DR7 reads and writes must always be intercepted, and
no other DRx accesses may be intercepted.
Update set_dr_intercepts() and clr_dr_intercepts() to account for this.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/svm.h | 89 ++++++++++++++++++++++++------------------
 1 file changed, 50 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ff587536f571..9953ee7f54cd 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -190,6 +190,28 @@ static inline struct kvm_svm *to_kvm_svm(struct kvm *kvm)
 	return container_of(kvm, struct kvm_svm, kvm);
 }
 
+static inline bool sev_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	return sev->active;
+#else
+	return false;
+#endif
+}
+
+static inline bool sev_es_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	return sev_guest(kvm) && sev->es_active;
+#else
+	return false;
+#endif
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
@@ -244,26 +266,35 @@ static inline bool is_cr_intercept(struct vcpu_svm *svm, int bit)
 	return vmcb->control.intercept_cr & (1U << bit);
 }
 
+#define SVM_DR_INTERCEPTS		\
+	((1 << INTERCEPT_DR0_READ)	\
+	| (1 << INTERCEPT_DR1_READ)	\
+	| (1 << INTERCEPT_DR2_READ)	\
+	| (1 << INTERCEPT_DR3_READ)	\
+	| (1 << INTERCEPT_DR4_READ)	\
+	| (1 << INTERCEPT_DR5_READ)	\
+	| (1 << INTERCEPT_DR6_READ)	\
+	| (1 << INTERCEPT_DR7_READ)	\
+	| (1 << INTERCEPT_DR0_WRITE)	\
+	| (1 << INTERCEPT_DR1_WRITE)	\
+	| (1 << INTERCEPT_DR2_WRITE)	\
+	| (1 << INTERCEPT_DR3_WRITE)	\
+	| (1 << INTERCEPT_DR4_WRITE)	\
+	| (1 << INTERCEPT_DR5_WRITE)	\
+	| (1 << INTERCEPT_DR6_WRITE)	\
+	| (1 << INTERCEPT_DR7_WRITE))
+
+#define SVM_SEV_ES_DR_INTERCEPTS	\
+	((1 << INTERCEPT_DR7_READ)	\
+	| (1 << INTERCEPT_DR7_WRITE))
+
 static inline void set_dr_intercepts(struct vcpu_svm *svm)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);
 
-	vmcb->control.intercept_dr = (1 << INTERCEPT_DR0_READ)
-		| (1 << INTERCEPT_DR1_READ)
-		| (1 << INTERCEPT_DR2_READ)
-		| (1 << INTERCEPT_DR3_READ)
-		| (1 << INTERCEPT_DR4_READ)
-		| (1 << INTERCEPT_DR5_READ)
-		| (1 << INTERCEPT_DR6_READ)
-		| (1 << INTERCEPT_DR7_READ)
-		| (1 << INTERCEPT_DR0_WRITE)
-		| (1 << INTERCEPT_DR1_WRITE)
-		| (1 << INTERCEPT_DR2_WRITE)
-		| (1 << INTERCEPT_DR3_WRITE)
-		| (1 << INTERCEPT_DR4_WRITE)
-		| (1 << INTERCEPT_DR5_WRITE)
-		| (1 << INTERCEPT_DR6_WRITE)
-		| (1 << INTERCEPT_DR7_WRITE);
+	vmcb->control.intercept_dr =
+		(sev_es_guest(svm->vcpu.kvm)) ? SVM_SEV_ES_DR_INTERCEPTS
+					      : SVM_DR_INTERCEPTS;
 
 	recalc_intercepts(svm);
 }
@@ -272,7 +303,9 @@ static inline void clr_dr_intercepts(struct vcpu_svm *svm)
 {
 	struct vmcb *vmcb = get_host_vmcb(svm);
 
-	vmcb->control.intercept_dr = 0;
+	vmcb->control.intercept_dr =
+		(sev_es_guest(svm->vcpu.kvm)) ? SVM_SEV_ES_DR_INTERCEPTS
+					      : 0;
 
 	recalc_intercepts(svm);
 }
@@ -472,28 +505,6 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 
 extern unsigned int max_sev_asid;
 
-static inline bool sev_guest(struct kvm *kvm)
-{
-#ifdef CONFIG_KVM_AMD_SEV
-	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
-
-	return sev->active;
-#else
-	return false;
-#endif
-}
-
-static inline bool sev_es_guest(struct kvm *kvm)
-{
-#ifdef CONFIG_KVM_AMD_SEV
-	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
-
-	return sev_guest(kvm) && sev->es_active;
-#else
-	return false;
-#endif
-}
-
 static inline bool svm_sev_enabled(void)
 {
 	return IS_ENABLED(CONFIG_KVM_AMD_SEV) ? max_sev_asid : 0;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (6 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 07/35] KVM: SVM: Modify DRx register intercepts for an SEV-ES guest Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 21:26   ` Sean Christopherson
  2020-09-14 20:15 ` [RFC PATCH 09/35] KVM: SVM: Do not emulate MMIO " Tom Lendacky
                   ` (27 subsequent siblings)
  35 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Since the guest register state of an SEV-ES guest is encrypted, debugging
is not supported. Update the code to prevent guest debugging when the
guest is an SEV-ES guest. This includes adding a callable function that
is used to determine if the guest supports being debugged.
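
With this in place, a VMM that attempts to enable guest debugging for an
SEV-ES guest will see the ioctl fail. A rough userspace sketch, assuming
an open vCPU fd and the usual <linux/kvm.h>, <sys/ioctl.h>, <errno.h> and
<stdio.h> includes:

  struct kvm_guest_debug dbg = { .control = KVM_GUESTDBG_ENABLE };

  if (ioctl(vcpu_fd, KVM_SET_GUEST_DEBUG, &dbg) < 0 && errno == EINVAL)
          fprintf(stderr, "guest debugging not supported for this guest\n");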

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/svm/svm.c          | 16 ++++++++++++++++
 arch/x86/kvm/vmx/vmx.c          |  7 +++++++
 arch/x86/kvm/x86.c              |  3 +++
 4 files changed, 28 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c900992701d6..3e2a3d2a8ba8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1234,6 +1234,8 @@ struct kvm_x86_ops {
 	void (*reg_read_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
 	void (*reg_write_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg,
 				   unsigned long val);
+
+	bool (*allow_debug)(struct kvm *kvm);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f8a5b7164008..47fa2067609a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1729,6 +1729,9 @@ static void svm_set_dr6(struct vcpu_svm *svm, unsigned long value)
 {
 	struct vmcb *vmcb = svm->vmcb;
 
+	if (svm->vcpu.arch.vmsa_encrypted)
+		return;
+
 	if (unlikely(value != svm_dr6_read(svm))) {
 		svm_dr6_write(svm, value);
 		vmcb_mark_dirty(vmcb, VMCB_DR);
@@ -1739,6 +1742,9 @@ static void svm_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	if (vcpu->arch.vmsa_encrypted)
+		return;
+
 	get_debugreg(vcpu->arch.db[0], 0);
 	get_debugreg(vcpu->arch.db[1], 1);
 	get_debugreg(vcpu->arch.db[2], 2);
@@ -1757,6 +1763,9 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	if (vcpu->arch.vmsa_encrypted)
+		return;
+
 	svm_dr7_write(svm, value);
 	vmcb_mark_dirty(svm->vmcb, VMCB_DR);
 }
@@ -4243,6 +4252,11 @@ static void svm_reg_write_override(struct kvm_vcpu *vcpu, enum kvm_reg reg,
 	vmsa_reg[entry] = val;
 }
 
+static bool svm_allow_debug(struct kvm *kvm)
+{
+	return !sev_es_guest(kvm);
+}
+
 static void svm_vm_destroy(struct kvm *kvm)
 {
 	avic_vm_destroy(kvm);
@@ -4384,6 +4398,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.reg_read_override = svm_reg_read_override,
 	.reg_write_override = svm_reg_write_override,
+
+	.allow_debug = svm_allow_debug,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 46ba2e03a892..fb8591bba96f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7879,6 +7879,11 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
 	return supported & BIT(bit);
 }
 
+static bool vmx_allow_debug(struct kvm *kvm)
+{
+	return true;
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.hardware_unsetup = hardware_unsetup,
 
@@ -8005,6 +8010,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
 	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
 	.migrate_timers = vmx_migrate_timers,
+
+	.allow_debug = vmx_allow_debug,
 };
 
 static __init int hardware_setup(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a5afdccb6c17..9970c0b7854f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9279,6 +9279,9 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 	unsigned long rflags;
 	int i, r;
 
+	if (!kvm_x86_ops.allow_debug(vcpu->kvm))
+		return -EINVAL;
+
 	vcpu_load(vcpu);
 
 	if (dbg->control & (KVM_GUESTDBG_INJECT_DB | KVM_GUESTDBG_INJECT_BP)) {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 09/35] KVM: SVM: Do not emulate MMIO under SEV-ES
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (7 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 21:33   ` Sean Christopherson
  2020-09-14 20:15 ` [RFC PATCH 10/35] KVM: SVM: Cannot re-initialize the VMCB after shutdown with SEV-ES Tom Lendacky
                   ` (26 subsequent siblings)
  35 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

When a guest is running as an SEV-ES guest, it is not possible to emulate
MMIO because the hypervisor has no access to the guest register state
needed for emulation. Add a check to prevent MMIO emulation from being
attempted; the guest handles MMIO through the VMGEXIT MMIO read/write
events instead.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/mmu/mmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a5d0207e7189..2e1b8b876286 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5485,6 +5485,13 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 	if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu))
 		emulation_type |= EMULTYPE_ALLOW_RETRY_PF;
 emulate:
+	/*
+	 * When the guest is an SEV-ES guest, emulation is not possible.  Allow
+	 * the guest to handle the MMIO emulation.
+	 */
+	if (vcpu->arch.vmsa_encrypted)
+		return 1;
+
 	/*
 	 * On AMD platforms, under certain conditions insn_len may be zero on #NPF.
 	 * This can happen if a guest gets a page-fault on data access but the HW
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 10/35] KVM: SVM: Cannot re-initialize the VMCB after shutdown with SEV-ES
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (8 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 09/35] KVM: SVM: Do not emulate MMIO " Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 11/35] KVM: SVM: Prepare for SEV-ES exit handling in the sev.c file Tom Lendacky
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

When a SHUTDOWN VMEXIT is encountered, normally the VMCB is re-initialized
so that the guest can be re-launched. But when a guest is running as an
SEV-ES guest, the VMSA cannot be re-initialized because it has been
encrypted. For now, just return -EINVAL to prevent a possible attempt at
a guest reset.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/svm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 47fa2067609a..f9daa40b3cfc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1953,6 +1953,13 @@ static int shutdown_interception(struct vcpu_svm *svm)
 {
 	struct kvm_run *kvm_run = svm->vcpu.run;
 
+	/*
+	 * The VM save area has already been encrypted so it
+	 * cannot be reinitialized - just terminate.
+	 */
+	if (sev_es_guest(svm->vcpu.kvm))
+		return -EINVAL;
+
 	/*
 	 * VMCB is undefined after a SHUTDOWN intercept
 	 * so reinitialize it.
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 11/35] KVM: SVM: Prepare for SEV-ES exit handling in the sev.c file
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (9 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 10/35] KVM: SVM: Cannot re-initialize the VMCB after shutdown with SEV-ES Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
       [not found]   ` <20200915172148.GE8420@sjchrist-ice>
  2020-09-14 20:15 ` [RFC PATCH 12/35] KVM: SVM: Add initial support for a VMGEXIT VMEXIT Tom Lendacky
                   ` (24 subsequent siblings)
  35 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

This is a preparatory patch that consolidates some exit handling code
into callable functions. Follow-on patches for SEV-ES exit handling will
then be able to use them from the sev.c file.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/svm.c | 64 +++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f9daa40b3cfc..6a4cc535ba77 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3047,6 +3047,43 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	       "excp_to:", save->last_excp_to);
 }
 
+static bool svm_is_supported_exit(struct kvm_vcpu *vcpu, u64 exit_code)
+{
+	if (exit_code < ARRAY_SIZE(svm_exit_handlers) &&
+	    svm_exit_handlers[exit_code])
+		return true;
+
+	vcpu_unimpl(vcpu, "svm: unexpected exit reason 0x%llx\n", exit_code);
+	dump_vmcb(vcpu);
+	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
+	vcpu->run->internal.ndata = 2;
+	vcpu->run->internal.data[0] = exit_code;
+	vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
+
+	return false;
+}
+
+static int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code)
+{
+	if (!svm_is_supported_exit(&svm->vcpu, exit_code))
+		return 0;
+
+#ifdef CONFIG_RETPOLINE
+	if (exit_code == SVM_EXIT_MSR)
+		return msr_interception(svm);
+	else if (exit_code == SVM_EXIT_VINTR)
+		return interrupt_window_interception(svm);
+	else if (exit_code == SVM_EXIT_INTR)
+		return intr_interception(svm);
+	else if (exit_code == SVM_EXIT_HLT)
+		return halt_interception(svm);
+	else if (exit_code == SVM_EXIT_NPF)
+		return npf_interception(svm);
+#endif
+	return svm_exit_handlers[exit_code](svm);
+}
+
 static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 {
 	struct vmcb_control_area *control = &to_svm(vcpu)->vmcb->control;
@@ -3113,32 +3150,7 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	if (exit_fastpath != EXIT_FASTPATH_NONE)
 		return 1;
 
-	if (exit_code >= ARRAY_SIZE(svm_exit_handlers)
-	    || !svm_exit_handlers[exit_code]) {
-		vcpu_unimpl(vcpu, "svm: unexpected exit reason 0x%x\n", exit_code);
-		dump_vmcb(vcpu);
-		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-		vcpu->run->internal.suberror =
-			KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
-		vcpu->run->internal.ndata = 2;
-		vcpu->run->internal.data[0] = exit_code;
-		vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
-		return 0;
-	}
-
-#ifdef CONFIG_RETPOLINE
-	if (exit_code == SVM_EXIT_MSR)
-		return msr_interception(svm);
-	else if (exit_code == SVM_EXIT_VINTR)
-		return interrupt_window_interception(svm);
-	else if (exit_code == SVM_EXIT_INTR)
-		return intr_interception(svm);
-	else if (exit_code == SVM_EXIT_HLT)
-		return halt_interception(svm);
-	else if (exit_code == SVM_EXIT_NPF)
-		return npf_interception(svm);
-#endif
-	return svm_exit_handlers[exit_code](svm);
+	return svm_invoke_exit_handler(svm, exit_code);
 }
 
 static void reload_tss(struct kvm_vcpu *vcpu)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 12/35] KVM: SVM: Add initial support for a VMGEXIT VMEXIT
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (10 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 11/35] KVM: SVM: Prepare for SEV-ES exit handling in the sev.c file Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 13/35] KVM: SVM: Create trace events for VMGEXIT processing Tom Lendacky
                   ` (23 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

SEV-ES adds a new VMEXIT reason code, VMGEXIT. Initial support for a
VMGEXIT includes reading the guest GHCB and performing the requested
action. Since many of the VMGEXIT exit reasons correspond to existing
VMEXIT reasons, the information from the GHCB is copied into the VMCB
control exit code areas and then the standard exit handlers are invoked,
similar to standard VMEXIT processing. Before restarting the vCPU, the
now updated SVM GHCB is copied back to the guest GHCB.
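
For context, the guest side (handled by the separate SEV-ES guest
patches) fills in the GHCB and executes VMGEXIT. A rough guest-side
sketch of a CPUID request, assuming the standard GHCB accessor helpers:

  /* Guest-side sketch (not part of this patch): request CPUID leaf 1 */
  ghcb_set_rax(ghcb, 0x1);
  ghcb_set_rcx(ghcb, 0);
  ghcb_set_sw_exit_code(ghcb, SVM_EXIT_CPUID);
  ghcb_set_sw_exit_info_1(ghcb, 0);
  ghcb_set_sw_exit_info_2(ghcb, 0);

  /* VMGEXIT is encoded as "rep; vmmcall" */
  asm volatile("rep; vmmcall" ::: "memory");

  /* The hypervisor returns the CPUID results in the GHCB register fields */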

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/svm.h      |  2 +-
 arch/x86/include/uapi/asm/svm.h |  7 ++++
 arch/x86/kvm/svm/sev.c          | 65 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |  6 ++-
 arch/x86/kvm/svm/svm.h          |  7 ++++
 5 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ed03d23f56fe..07b4ac1e7179 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -82,7 +82,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u32 exit_int_info_err;
 	u64 nested_ctl;
 	u64 avic_vapic_bar;
-	u8 reserved_4[8];
+	u64 ghcb_gpa;
 	u32 event_inj;
 	u32 event_inj_err;
 	u64 nested_cr3;
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 0f837339db66..0bc3942ffdd3 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -80,6 +80,7 @@
 #define SVM_EXIT_NPF           0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
+#define SVM_EXIT_VMGEXIT       0x403
 
 /* SEV-ES software-defined VMGEXIT events */
 #define SVM_VMGEXIT_MMIO_READ			0x80000001
@@ -185,6 +186,12 @@
 	{ SVM_EXIT_NPF,         "npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
+	{ SVM_EXIT_VMGEXIT,		"vmgexit" }, \
+	{ SVM_VMGEXIT_MMIO_READ,	"vmgexit_mmio_read" }, \
+	{ SVM_VMGEXIT_MMIO_WRITE,	"vmgexit_mmio_write" }, \
+	{ SVM_VMGEXIT_NMI_COMPLETE,	"vmgexit_nmi_complete" }, \
+	{ SVM_VMGEXIT_AP_HLT_LOOP,	"vmgexit_ap_hlt_loop" }, \
+	{ SVM_VMGEXIT_AP_JUMP_TABLE,	"vmgexit_ap_jump_table" }, \
 	{ SVM_EXIT_ERR,         "invalid_guest_state" }
 
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 48379e21ed43..e085d8b83a32 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1180,11 +1180,23 @@ void sev_hardware_teardown(void)
 	sev_flush_asids();
 }
 
+static void pre_sev_es_run(struct vcpu_svm *svm)
+{
+	if (!svm->ghcb)
+		return;
+
+	kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
+	svm->ghcb = NULL;
+}
+
 void pre_sev_run(struct vcpu_svm *svm, int cpu)
 {
 	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 	int asid = sev_get_asid(svm->vcpu.kvm);
 
+	/* Perform any SEV-ES pre-run actions */
+	pre_sev_es_run(svm);
+
 	/* Assign the asid allocated with this SEV guest */
 	svm->vmcb->control.asid = asid;
 
@@ -1202,3 +1214,56 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
 	svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
 	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 }
+
+static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
+{
+	return -EINVAL;
+}
+
+int sev_handle_vmgexit(struct vcpu_svm *svm)
+{
+	struct vmcb_control_area *control = &svm->vmcb->control;
+	struct ghcb *ghcb;
+	u64 ghcb_gpa;
+	int ret;
+
+	/* Validate the GHCB */
+	ghcb_gpa = control->ghcb_gpa;
+	if (ghcb_gpa & GHCB_MSR_INFO_MASK)
+		return sev_handle_vmgexit_msr_protocol(svm);
+
+	if (!ghcb_gpa) {
+		pr_err("vmgexit: GHCB gpa is not set\n");
+		return -EINVAL;
+	}
+
+	if (kvm_vcpu_map(&svm->vcpu, ghcb_gpa >> PAGE_SHIFT, &svm->ghcb_map)) {
+		/* Unable to map GHCB from guest */
+		pr_err("vmgexit: error mapping GHCB from guest\n");
+		return -EINVAL;
+	}
+
+	svm->ghcb = svm->ghcb_map.hva;
+	ghcb = svm->ghcb_map.hva;
+
+	control->exit_code = lower_32_bits(ghcb_get_sw_exit_code(ghcb));
+	control->exit_code_hi = upper_32_bits(ghcb_get_sw_exit_code(ghcb));
+	control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
+	control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);
+
+	ghcb_set_sw_exit_info_1(ghcb, 0);
+	ghcb_set_sw_exit_info_2(ghcb, 0);
+
+	ret = -EINVAL;
+	switch (ghcb_get_sw_exit_code(ghcb)) {
+	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+		pr_err("vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
+		       control->exit_info_1,
+		       control->exit_info_2);
+		break;
+	default:
+		ret = svm_invoke_exit_handler(svm, ghcb_get_sw_exit_code(ghcb));
+	}
+
+	return ret;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6a4cc535ba77..89ee9d533e9a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2929,6 +2929,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
 	[SVM_EXIT_AVIC_UNACCELERATED_ACCESS]	= avic_unaccelerated_access_interception,
+	[SVM_EXIT_VMGEXIT]			= sev_handle_vmgexit,
 };
 
 static void dump_vmcb(struct kvm_vcpu *vcpu)
@@ -2968,6 +2969,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	pr_err("%-20s%lld\n", "nested_ctl:", control->nested_ctl);
 	pr_err("%-20s%016llx\n", "nested_cr3:", control->nested_cr3);
 	pr_err("%-20s%016llx\n", "avic_vapic_bar:", control->avic_vapic_bar);
+	pr_err("%-20s%016llx\n", "ghcb:", control->ghcb_gpa);
 	pr_err("%-20s%08x\n", "event_inj:", control->event_inj);
 	pr_err("%-20s%08x\n", "event_inj_err:", control->event_inj_err);
 	pr_err("%-20s%lld\n", "virt_ext:", control->virt_ext);
@@ -3064,7 +3066,7 @@ static bool svm_is_supported_exit(struct kvm_vcpu *vcpu, u64 exit_code)
 	return false;
 }
 
-static int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code)
+int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code)
 {
 	if (!svm_is_supported_exit(&svm->vcpu, exit_code))
 		return 0;
@@ -3080,6 +3082,8 @@ static int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code)
 		return halt_interception(svm);
 	else if (exit_code == SVM_EXIT_NPF)
 		return npf_interception(svm);
+	else if (exit_code == SVM_EXIT_VMGEXIT)
+		return sev_handle_vmgexit(svm);
 #endif
 	return svm_exit_handlers[exit_code](svm);
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9953ee7f54cd..1690e52d5265 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -17,6 +17,7 @@
 
 #include <linux/kvm_types.h>
 #include <linux/kvm_host.h>
+#include <linux/bits.h>
 
 #include <asm/svm.h>
 
@@ -163,6 +164,7 @@ struct vcpu_svm {
 	/* SEV-ES support */
 	struct vmcb_save_area *vmsa;
 	struct ghcb *ghcb;
+	struct kvm_host_map ghcb_map;
 };
 
 struct svm_cpu_data {
@@ -399,6 +401,7 @@ bool svm_smi_blocked(struct kvm_vcpu *vcpu);
 bool svm_nmi_blocked(struct kvm_vcpu *vcpu);
 bool svm_interrupt_blocked(struct kvm_vcpu *vcpu);
 void svm_set_gif(struct vcpu_svm *svm, bool value);
+int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code);
 
 /* nested.c */
 
@@ -503,6 +506,9 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 
 /* sev.c */
 
+#define GHCB_MSR_INFO_POS		0
+#define GHCB_MSR_INFO_MASK		(BIT_ULL(12) - 1)
+
 extern unsigned int max_sev_asid;
 
 static inline bool svm_sev_enabled(void)
@@ -519,6 +525,7 @@ int svm_unregister_enc_region(struct kvm *kvm,
 void pre_sev_run(struct vcpu_svm *svm, int cpu);
 void __init sev_hardware_setup(void);
 void sev_hardware_teardown(void);
+int sev_handle_vmgexit(struct vcpu_svm *svm);
 
 /* VMSA Accessor functions */
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 13/35] KVM: SVM: Create trace events for VMGEXIT processing
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (11 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 12/35] KVM: SVM: Add initial support for a VMGEXIT VMEXIT Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 14/35] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002 Tom Lendacky
                   ` (22 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add trace events for entry to and exit from VMGEXIT processing. The vCPU
id and the exit reason are common to both trace events. The exit info and
valid bitmap fields record the input values at entry to, and the output
values at exit from, VMGEXIT processing.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c |  6 +++++
 arch/x86/kvm/trace.h   | 55 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c     |  2 ++
 3 files changed, 63 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e085d8b83a32..f0fd89788de7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -14,9 +14,11 @@
 #include <linux/psp-sev.h>
 #include <linux/pagemap.h>
 #include <linux/swap.h>
+#include <linux/trace_events.h>
 
 #include "x86.h"
 #include "svm.h"
+#include "trace.h"
 
 static int sev_flush_asids(void);
 static DECLARE_RWSEM(sev_deactivate_lock);
@@ -1185,6 +1187,8 @@ static void pre_sev_es_run(struct vcpu_svm *svm)
 	if (!svm->ghcb)
 		return;
 
+	trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->ghcb);
+
 	kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
 	svm->ghcb = NULL;
 }
@@ -1246,6 +1250,8 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
 	svm->ghcb = svm->ghcb_map.hva;
 	ghcb = svm->ghcb_map.hva;
 
+	trace_kvm_vmgexit_enter(svm->vcpu.vcpu_id, ghcb);
+
 	control->exit_code = lower_32_bits(ghcb_get_sw_exit_code(ghcb));
 	control->exit_code_hi = upper_32_bits(ghcb_get_sw_exit_code(ghcb));
 	control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b66432b015d2..06e5c15d0508 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1592,6 +1592,61 @@ TRACE_EVENT(kvm_hv_syndbg_get_msr,
 		  __entry->vcpu_id, __entry->vp_index, __entry->msr,
 		  __entry->data)
 );
+
+/*
+ * Tracepoint for the start of VMGEXIT processing
+ */
+TRACE_EVENT(kvm_vmgexit_enter,
+	TP_PROTO(unsigned int vcpu_id, struct ghcb *ghcb),
+	TP_ARGS(vcpu_id, ghcb),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, vcpu_id)
+		__field(u64, exit_reason)
+		__field(u64, info1)
+		__field(u64, info2)
+		__field(u8 *, bitmap)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id     = vcpu_id;
+		__entry->exit_reason = ghcb->save.sw_exit_code;
+		__entry->info1       = ghcb->save.sw_exit_info_1;
+		__entry->info2       = ghcb->save.sw_exit_info_2;
+		__entry->bitmap      = ghcb->save.valid_bitmap;
+	),
+
+	TP_printk("vcpu %u, exit_reason %llx, exit_info1 %llx, exit_info2 %llx, valid_bitmap",
+		  __entry->vcpu_id, __entry->exit_reason,
+		  __entry->info1, __entry->info2)
+);
+
+/*
+ * Tracepoint for the end of VMGEXIT processing
+ */
+TRACE_EVENT(kvm_vmgexit_exit,
+	TP_PROTO(unsigned int vcpu_id, struct ghcb *ghcb),
+	TP_ARGS(vcpu_id, ghcb),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, vcpu_id)
+		__field(u64, exit_reason)
+		__field(u64, info1)
+		__field(u64, info2)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id     = vcpu_id;
+		__entry->exit_reason = ghcb->save.sw_exit_code;
+		__entry->info1       = ghcb->save.sw_exit_info_1;
+		__entry->info2       = ghcb->save.sw_exit_info_2;
+	),
+
+	TP_printk("vcpu %u, exit_reason %llx, exit_info1 %llx, exit_info2 %llx",
+		  __entry->vcpu_id, __entry->exit_reason,
+		  __entry->info1, __entry->info2)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9970c0b7854f..ef85340e05ea 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10790,3 +10790,5 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 14/35] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (12 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 13/35] KVM: SVM: Create trace events for VMGEXIT processing Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 15/35] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004 Tom Lendacky
                   ` (21 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

The GHCB specification defines a GHCB MSR protocol using the lower 12
bits of the GHCB MSR (in the hypervisor this corresponds to the GHCB GPA
field in the VMCB).

Function 0x002 is a request for the hypervisor to set the GHCB MSR value
to the SEV information response, returned via the VMCB GHCB GPA field as
defined in the specification.
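
As a worked example of the response layout (the encryption bit position
of 51 below is purely illustrative; the real value comes from CPUID
0x8000001F EBX[5:0]):

  /* Max/min protocol version 1, C-bit position 51 */
  u64 resp = GHCB_MSR_SEV_INFO(1, 1, 51);
  /*
   *  = (1ULL << 48) | (1ULL << 32) | (51ULL << 24) | 0x001
   *  = 0x0001000133000001
   */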

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c | 26 +++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h | 17 +++++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f0fd89788de7..07082c752c76 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -20,6 +20,7 @@
 #include "svm.h"
 #include "trace.h"
 
+static u8 sev_enc_bit;
 static int sev_flush_asids(void);
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -1130,6 +1131,9 @@ void __init sev_hardware_setup(void)
 	/* Retrieve SEV CPUID information */
 	cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
 
+	/* Set encryption bit location for SEV-ES guests */
+	sev_enc_bit = ebx & 0x3f;
+
 	/* Maximum number of encrypted guests supported simultaneously */
 	max_sev_asid = ecx;
 
@@ -1219,9 +1223,29 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
 	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 }
 
+static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
+{
+	svm->vmcb->control.ghcb_gpa = value;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
-	return -EINVAL;
+	struct vmcb_control_area *control = &svm->vmcb->control;
+	u64 ghcb_info;
+
+	ghcb_info = control->ghcb_gpa & GHCB_MSR_INFO_MASK;
+
+	switch (ghcb_info) {
+	case GHCB_MSR_SEV_INFO_REQ:
+		set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
+						    GHCB_VERSION_MIN,
+						    sev_enc_bit));
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 1;
 }
 
 int sev_handle_vmgexit(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1690e52d5265..b1a5d90a860c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -506,9 +506,26 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 
 /* sev.c */
 
+#define GHCB_VERSION_MAX		1ULL
+#define GHCB_VERSION_MIN		1ULL
+
 #define GHCB_MSR_INFO_POS		0
 #define GHCB_MSR_INFO_MASK		(BIT_ULL(12) - 1)
 
+#define GHCB_MSR_SEV_INFO_RESP		0x001
+#define GHCB_MSR_SEV_INFO_REQ		0x002
+#define GHCB_MSR_VER_MAX_POS		48
+#define GHCB_MSR_VER_MAX_MASK		0xffff
+#define GHCB_MSR_VER_MIN_POS		32
+#define GHCB_MSR_VER_MIN_MASK		0xffff
+#define GHCB_MSR_CBIT_POS		24
+#define GHCB_MSR_CBIT_MASK		0xff
+#define GHCB_MSR_SEV_INFO(_max, _min, _cbit)				\
+	((((_max) & GHCB_MSR_VER_MAX_MASK) << GHCB_MSR_VER_MAX_POS) |	\
+	 (((_min) & GHCB_MSR_VER_MIN_MASK) << GHCB_MSR_VER_MIN_POS) |	\
+	 (((_cbit) & GHCB_MSR_CBIT_MASK) << GHCB_MSR_CBIT_POS) |	\
+	 GHCB_MSR_SEV_INFO_RESP)
+
 extern unsigned int max_sev_asid;
 
 static inline bool svm_sev_enabled(void)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 15/35] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (13 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 14/35] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002 Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 16/35] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100 Tom Lendacky
                   ` (20 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

The GHCB specification defines a GHCB MSR protocol using the lower 12
bits of the GHCB MSR (in the hypervisor this corresponds to the GHCB GPA
field in the VMCB).

Function 0x004 is a request for CPUID information. Only a single CPUID
result register can be sent per invocation, so the protocol defines the
register that is requested. The GHCB MSR value is set to the requested
CPUID register value, returned via the VMCB GHCB GPA field as defined in
the specification.
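
As a worked example of the request/response layout (the CPUID leaf below
is illustrative):

  /* Guest request: CPUID function 0x8000001f, register selector 1 (EBX) */
  u64 req = (0x8000001fULL << GHCB_MSR_CPUID_FUNC_POS) |
            (1ULL << GHCB_MSR_CPUID_REG_POS) |
            GHCB_MSR_CPUID_REQ;     /* 0x8000001f40000004 */

  /*
   * The response carries the EBX value in bits 63:32 and
   * GHCB_MSR_CPUID_RESP (0x005) in the low 12 bits.
   */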

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c | 55 ++++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.h |  9 +++++++
 2 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 07082c752c76..5cf823e1ce01 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1223,6 +1223,18 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
 	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 }
 
+static void set_ghcb_msr_bits(struct vcpu_svm *svm, u64 value, u64 mask,
+			      unsigned int pos)
+{
+	svm->vmcb->control.ghcb_gpa &= ~(mask << pos);
+	svm->vmcb->control.ghcb_gpa |= (value & mask) << pos;
+}
+
+static u64 get_ghcb_msr_bits(struct vcpu_svm *svm, u64 mask, unsigned int pos)
+{
+	return (svm->vmcb->control.ghcb_gpa >> pos) & mask;
+}
+
 static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
 {
 	svm->vmcb->control.ghcb_gpa = value;
@@ -1232,6 +1244,7 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
 	u64 ghcb_info;
+	int ret = 1;
 
 	ghcb_info = control->ghcb_gpa & GHCB_MSR_INFO_MASK;
 
@@ -1241,11 +1254,49 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 						    GHCB_VERSION_MIN,
 						    sev_enc_bit));
 		break;
+	case GHCB_MSR_CPUID_REQ: {
+		u64 cpuid_fn, cpuid_reg, cpuid_value;
+
+		cpuid_fn = get_ghcb_msr_bits(svm,
+					     GHCB_MSR_CPUID_FUNC_MASK,
+					     GHCB_MSR_CPUID_FUNC_POS);
+
+		/* Initialize the registers needed by the CPUID intercept */
+		svm_rax_write(svm, cpuid_fn);
+		svm_rcx_write(svm, 0);
+
+		ret = svm_invoke_exit_handler(svm, SVM_EXIT_CPUID);
+		if (!ret) {
+			ret = -EINVAL;
+			break;
+		}
+
+		cpuid_reg = get_ghcb_msr_bits(svm,
+					      GHCB_MSR_CPUID_REG_MASK,
+					      GHCB_MSR_CPUID_REG_POS);
+		if (cpuid_reg == 0)
+			cpuid_value = svm_rax_read(svm);
+		else if (cpuid_reg == 1)
+			cpuid_value = svm_rbx_read(svm);
+		else if (cpuid_reg == 2)
+			cpuid_value = svm_rcx_read(svm);
+		else
+			cpuid_value = svm_rdx_read(svm);
+
+		set_ghcb_msr_bits(svm, cpuid_value,
+				  GHCB_MSR_CPUID_VALUE_MASK,
+				  GHCB_MSR_CPUID_VALUE_POS);
+
+		set_ghcb_msr_bits(svm, GHCB_MSR_CPUID_RESP,
+				  GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	}
 	default:
-		return -EINVAL;
+		ret = -EINVAL;
 	}
 
-	return 1;
+	return ret;
 }
 
 int sev_handle_vmgexit(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index b1a5d90a860c..0a84fae34629 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -526,6 +526,15 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 	 (((_cbit) & GHCB_MSR_CBIT_MASK) << GHCB_MSR_CBIT_POS) |	\
 	 GHCB_MSR_SEV_INFO_RESP)
 
+#define GHCB_MSR_CPUID_REQ		0x004
+#define GHCB_MSR_CPUID_RESP		0x005
+#define GHCB_MSR_CPUID_FUNC_POS		32
+#define GHCB_MSR_CPUID_FUNC_MASK	0xffffffff
+#define GHCB_MSR_CPUID_VALUE_POS	32
+#define GHCB_MSR_CPUID_VALUE_MASK	0xffffffff
+#define GHCB_MSR_CPUID_REG_POS		30
+#define GHCB_MSR_CPUID_REG_MASK		0x3
+
 extern unsigned int max_sev_asid;
 
 static inline bool svm_sev_enabled(void)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 16/35] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (14 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 15/35] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004 Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 17/35] KVM: SVM: Create trace events for VMGEXIT MSR protocol processing Tom Lendacky
                   ` (19 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

The GHCB specification defines a GHCB MSR protocol using the lower 12 bits
of the GHCB MSR (in the hypervisor this corresponds to the GHCB GPA field
in the VMCB).

Function 0x100 is a request for termination of the guest. The guest has
encountered some situation for which it has requested to be terminated.
The GHCB MSR value contains the reason for the request.
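
For reference, a termination request value would be packed as in the
hedged sketch below (not part of this patch; the helper name is an
assumption, and the reason-set/reason-code split follows the bit positions
added by this patch):

	/* Hypothetical guest-side sketch of a 0x100 termination request. */
	static u64 ghcb_msr_term_req(u64 reason_set, u64 reason_code)
	{
		return 0x100ULL |			/* termination request */
		       ((reason_set & 0xf) << 12) |	/* reason code set     */
		       ((reason_code & 0xff) << 16);	/* reason code         */
	}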

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c | 13 +++++++++++++
 arch/x86/kvm/svm/svm.h |  6 ++++++
 2 files changed, 19 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5cf823e1ce01..8300f3846580 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1292,6 +1292,19 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_TERM_REQ: {
+		u64 reason_set, reason_code;
+
+		reason_set = get_ghcb_msr_bits(svm,
+					       GHCB_MSR_TERM_REASON_SET_MASK,
+					       GHCB_MSR_TERM_REASON_SET_POS);
+		reason_code = get_ghcb_msr_bits(svm,
+						GHCB_MSR_TERM_REASON_MASK,
+						GHCB_MSR_TERM_REASON_POS);
+		pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
+			reason_set, reason_code);
+		fallthrough;
+	}
 	default:
 		ret = -EINVAL;
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0a84fae34629..3574f52f8a1c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -535,6 +535,12 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 #define GHCB_MSR_CPUID_REG_POS		30
 #define GHCB_MSR_CPUID_REG_MASK		0x3
 
+#define GHCB_MSR_TERM_REQ		0x100
+#define GHCB_MSR_TERM_REASON_SET_POS	12
+#define GHCB_MSR_TERM_REASON_SET_MASK	0xf
+#define GHCB_MSR_TERM_REASON_POS	16
+#define GHCB_MSR_TERM_REASON_MASK	0xff
+
 extern unsigned int max_sev_asid;
 
 static inline bool svm_sev_enabled(void)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 17/35] KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (15 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 16/35] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100 Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 18/35] KVM: SVM: Support MMIO for an SEV-ES guest Tom Lendacky
                   ` (18 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add trace events for entry to and exit from VMGEXIT MSR protocol
processing. The vCPU ID is common to both trace events. The MSR protocol
processing is driven by the GHCB GPA in the VMCB, so the GHCB GPA
represents the input value in the entry event and the output value in the
exit event. Additionally, the exit event contains the return code of the
processing.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c |  6 ++++++
 arch/x86/kvm/trace.h   | 44 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c     |  2 ++
 3 files changed, 52 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 8300f3846580..92a4df26057a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1248,6 +1248,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 
 	ghcb_info = control->ghcb_gpa & GHCB_MSR_INFO_MASK;
 
+	trace_kvm_vmgexit_msr_protocol_enter(svm->vcpu.vcpu_id,
+					     control->ghcb_gpa);
+
 	switch (ghcb_info) {
 	case GHCB_MSR_SEV_INFO_REQ:
 		set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
@@ -1309,6 +1312,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 		ret = -EINVAL;
 	}
 
+	trace_kvm_vmgexit_msr_protocol_exit(svm->vcpu.vcpu_id,
+					    control->ghcb_gpa, ret);
+
 	return ret;
 }
 
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 06e5c15d0508..117dc4a89c0a 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1647,6 +1647,50 @@ TRACE_EVENT(kvm_vmgexit_exit,
 		  __entry->info1, __entry->info2)
 );
 
+/*
+ * Tracepoint for the start of VMGEXIT MSR protocol processing
+ */
+TRACE_EVENT(kvm_vmgexit_msr_protocol_enter,
+	TP_PROTO(unsigned int vcpu_id, u64 ghcb_gpa),
+	TP_ARGS(vcpu_id, ghcb_gpa),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, vcpu_id)
+		__field(u64, ghcb_gpa)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id  = vcpu_id;
+		__entry->ghcb_gpa = ghcb_gpa;
+	),
+
+	TP_printk("vcpu %u, ghcb_gpa %016llx",
+		  __entry->vcpu_id, __entry->ghcb_gpa)
+);
+
+/*
+ * Tracepoint for the end of VMGEXIT MSR protocol processing
+ */
+TRACE_EVENT(kvm_vmgexit_msr_protocol_exit,
+	TP_PROTO(unsigned int vcpu_id, u64 ghcb_gpa, int result),
+	TP_ARGS(vcpu_id, ghcb_gpa, result),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, vcpu_id)
+		__field(u64, ghcb_gpa)
+		__field(int, result)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id  = vcpu_id;
+		__entry->ghcb_gpa = ghcb_gpa;
+		__entry->result   = result;
+	),
+
+	TP_printk("vcpu %u, ghcb_gpa %016llx, result %d",
+		  __entry->vcpu_id, __entry->ghcb_gpa, __entry->result)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ef85340e05ea..2a2a394126a2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10792,3 +10792,5 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 18/35] KVM: SVM: Support MMIO for an SEV-ES guest
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (16 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 17/35] KVM: SVM: Create trace events for VMGEXIT MSR protocol processing Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 19/35] KVM: SVM: Support port IO operations " Tom Lendacky
                   ` (17 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

For an SEV-ES guest, MMIO is performed to a shared (un-encrypted) page
so that both the hypervisor and guest can read or write to it and each
see the contents.

The GHCB specification provides software-defined VMGEXIT exit codes to
indicate a request for an MMIO read or an MMIO write. Add support to
recognize these MMIO requests and invoke SEV-ES specific routines that
complete the MMIO operation using the common KVM MMIO support.
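
For context, the guest side of an MMIO read request would roughly fill the
GHCB as in the sketch below (not part of this patch; the ghcb_set_*()
helpers and the VMGEXIT step stand in for the guest's GHCB plumbing and
are assumptions here):

	/*
	 * Hypothetical guest-side sketch: ask the hypervisor to read "bytes"
	 * bytes from MMIO address "mmio_gpa" into the shared scratch buffer
	 * at "scratch_gpa", then issue VMGEXIT.
	 */
	static void ghcb_mmio_read_req(struct ghcb *ghcb, u64 mmio_gpa,
				       u64 scratch_gpa, u64 bytes)
	{
		ghcb_set_sw_scratch(ghcb, scratch_gpa);		/* where KVM writes the data */
		ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_MMIO_READ);
		ghcb_set_sw_exit_info_1(ghcb, mmio_gpa);	/* MMIO address */
		ghcb_set_sw_exit_info_2(ghcb, bytes);		/* length       */
		/* VMGEXIT() here; sev_handle_vmgexit() services the request. */
	}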

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c | 116 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c |   3 +
 arch/x86/kvm/svm/svm.h |   6 ++
 arch/x86/kvm/x86.c     | 123 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h     |   5 ++
 5 files changed, 253 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 92a4df26057a..740b44485f36 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1191,6 +1191,24 @@ static void pre_sev_es_run(struct vcpu_svm *svm)
 	if (!svm->ghcb)
 		return;
 
+	if (svm->ghcb_sa_free) {
+		/*
+		 * The scratch area lives outside the GHCB, so there is a
+		 * buffer that, depending on the operation performed, may
+		 * need to be synced, then freed.
+		 */
+		if (svm->ghcb_sa_sync) {
+			kvm_write_guest(svm->vcpu.kvm,
+					ghcb_get_sw_scratch(svm->ghcb),
+					svm->ghcb_sa, svm->ghcb_sa_len);
+			svm->ghcb_sa_sync = false;
+		}
+
+		kfree(svm->ghcb_sa);
+		svm->ghcb_sa = NULL;
+		svm->ghcb_sa_free = false;
+	}
+
 	trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->ghcb);
 
 	kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
@@ -1223,6 +1241,86 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
 	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 }
 
+#define GHCB_SCRATCH_AREA_LIMIT		(16ULL * PAGE_SIZE)
+static bool setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
+{
+	struct vmcb_control_area *control = &svm->vmcb->control;
+	struct ghcb *ghcb = svm->ghcb;
+	u64 ghcb_scratch_beg, ghcb_scratch_end;
+	u64 scratch_gpa_beg, scratch_gpa_end;
+	void *scratch_va;
+
+	scratch_gpa_beg = ghcb_get_sw_scratch(ghcb);
+	if (!scratch_gpa_beg) {
+		pr_err("vmgexit: scratch gpa not provided\n");
+		return false;
+	}
+
+	scratch_gpa_end = scratch_gpa_beg + len;
+	if (scratch_gpa_end < scratch_gpa_beg) {
+		pr_err("vmgexit: scratch length (%#llx) not valid for scratch address (%#llx)\n",
+		       len, scratch_gpa_beg);
+		return false;
+	}
+
+	if ((scratch_gpa_beg & PAGE_MASK) == control->ghcb_gpa) {
+		/* Scratch area begins within GHCB */
+		ghcb_scratch_beg = control->ghcb_gpa +
+				   offsetof(struct ghcb, shared_buffer);
+		ghcb_scratch_end = control->ghcb_gpa +
+				   offsetof(struct ghcb, reserved_1);
+
+		/*
+		 * If the scratch area begins within the GHCB, it must be
+		 * completely contained in the GHCB shared buffer area.
+		 */
+		if (scratch_gpa_beg < ghcb_scratch_beg ||
+		    scratch_gpa_end > ghcb_scratch_end) {
+			pr_err("vmgexit: scratch area is outside of GHCB shared buffer area (%#llx - %#llx)\n",
+			       scratch_gpa_beg, scratch_gpa_end);
+			return false;
+		}
+
+		scratch_va = (void *)svm->ghcb;
+		scratch_va += (scratch_gpa_beg - control->ghcb_gpa);
+	} else {
+		/*
+		 * The guest memory must be read into a kernel buffer, so
+		 * limit the size
+		 */
+		if (len > GHCB_SCRATCH_AREA_LIMIT) {
+			pr_err("vmgexit: scratch area exceeds KVM limits (%#llx requested, %#llx limit)\n",
+			       len, GHCB_SCRATCH_AREA_LIMIT);
+			return false;
+		}
+		scratch_va = kzalloc(len, GFP_KERNEL);
+		if (!scratch_va)
+			return false;
+
+		if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, scratch_va, len)) {
+			/* Unable to copy scratch area from guest */
+			pr_err("vmgexit: kvm_read_guest for scratch area failed\n");
+
+			kfree(scratch_va);
+			return false;
+		}
+
+		/*
+		 * The scratch area is outside the GHCB. The operation will
+		 * dictate whether the buffer needs to be synced before running
+		 * the vCPU next time (i.e. a read was requested so the data
+		 * must be written back to the guest memory).
+		 */
+		svm->ghcb_sa_sync = sync;
+		svm->ghcb_sa_free = true;
+	}
+
+	svm->ghcb_sa = scratch_va;
+	svm->ghcb_sa_len = len;
+
+	return true;
+}
+
 static void set_ghcb_msr_bits(struct vcpu_svm *svm, u64 value, u64 mask,
 			      unsigned int pos)
 {
@@ -1356,6 +1454,24 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
 
 	ret = -EINVAL;
 	switch (ghcb_get_sw_exit_code(ghcb)) {
+	case SVM_VMGEXIT_MMIO_READ:
+		if (!setup_vmgexit_scratch(svm, true, control->exit_info_2))
+			break;
+
+		ret = kvm_sev_es_mmio_read(&svm->vcpu,
+					   control->exit_info_1,
+					   control->exit_info_2,
+					   svm->ghcb_sa);
+		break;
+	case SVM_VMGEXIT_MMIO_WRITE:
+		if (!setup_vmgexit_scratch(svm, false, control->exit_info_2))
+			break;
+
+		ret = kvm_sev_es_mmio_write(&svm->vcpu,
+					    control->exit_info_1,
+					    control->exit_info_2,
+					    svm->ghcb_sa);
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		pr_err("vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
 		       control->exit_info_1,
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 89ee9d533e9a..439b0d0e53eb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1306,6 +1306,9 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 		}
 
 		__free_page(virt_to_page(svm->vmsa));
+
+		if (svm->ghcb_sa_free)
+			kfree(svm->ghcb_sa);
 	}
 
 	__free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT));
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 3574f52f8a1c..8de45462ff4a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -165,6 +165,12 @@ struct vcpu_svm {
 	struct vmcb_save_area *vmsa;
 	struct ghcb *ghcb;
 	struct kvm_host_map ghcb_map;
+
+	/* SEV-ES scratch area support */
+	void *ghcb_sa;
+	u64 ghcb_sa_len;
+	bool ghcb_sa_sync;
+	bool ghcb_sa_free;
 };
 
 struct svm_cpu_data {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2a2a394126a2..a0070eeeb139 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10768,6 +10768,129 @@ void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_c
 }
 EXPORT_SYMBOL_GPL(kvm_fixup_and_inject_pf_error);
 
+static int complete_sev_es_emulated_mmio(struct kvm_vcpu *vcpu)
+{
+	struct kvm_run *run = vcpu->run;
+	struct kvm_mmio_fragment *frag;
+	unsigned int len;
+
+	BUG_ON(!vcpu->mmio_needed);
+
+	/* Complete previous fragment */
+	frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment];
+	len = min(8u, frag->len);
+	if (!vcpu->mmio_is_write)
+		memcpy(frag->data, run->mmio.data, len);
+
+	if (frag->len <= 8) {
+		/* Switch to the next fragment. */
+		frag++;
+		vcpu->mmio_cur_fragment++;
+	} else {
+		/* Go forward to the next mmio piece. */
+		frag->data += len;
+		frag->gpa += len;
+		frag->len -= len;
+	}
+
+	if (vcpu->mmio_cur_fragment >= vcpu->mmio_nr_fragments) {
+		vcpu->mmio_needed = 0;
+
+		// VMG change, at this point, we're always done
+		// RIP has already been advanced
+		return 1;
+	}
+
+	// More MMIO is needed
+	run->mmio.phys_addr = frag->gpa;
+	run->mmio.len = min(8u, frag->len);
+	run->mmio.is_write = vcpu->mmio_is_write;
+	if (run->mmio.is_write)
+		memcpy(run->mmio.data, frag->data, min(8u, frag->len));
+	run->exit_reason = KVM_EXIT_MMIO;
+
+	vcpu->arch.complete_userspace_io = complete_sev_es_emulated_mmio;
+
+	return 0;
+}
+
+int kvm_sev_es_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned int bytes,
+			  void *data)
+{
+	int handled;
+	struct kvm_mmio_fragment *frag;
+
+	if (!data)
+		return -EINVAL;
+
+	handled = write_emultor.read_write_mmio(vcpu, gpa, bytes, data);
+	if (handled == bytes)
+		return 1;
+
+	bytes -= handled;
+	gpa += handled;
+	data += handled;
+
+	/*TODO: Check if need to increment number of frags */
+	frag = vcpu->mmio_fragments;
+	vcpu->mmio_nr_fragments = 1;
+	frag->len = bytes;
+	frag->gpa = gpa;
+	frag->data = data;
+
+	vcpu->mmio_needed = 1;
+	vcpu->mmio_cur_fragment = 0;
+
+	vcpu->run->mmio.phys_addr = gpa;
+	vcpu->run->mmio.len = min(8u, frag->len);
+	vcpu->run->mmio.is_write = 1;
+	memcpy(vcpu->run->mmio.data, frag->data, min(8u, frag->len));
+	vcpu->run->exit_reason = KVM_EXIT_MMIO;
+
+	vcpu->arch.complete_userspace_io = complete_sev_es_emulated_mmio;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_sev_es_mmio_write);
+
+int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned int bytes,
+			 void *data)
+{
+	int handled;
+	struct kvm_mmio_fragment *frag;
+
+	if (!data)
+		return -EINVAL;
+
+	handled = read_emultor.read_write_mmio(vcpu, gpa, bytes, data);
+	if (handled == bytes)
+		return 1;
+
+	bytes -= handled;
+	gpa += handled;
+	data += handled;
+
+	/*TODO: Check if need to increment number of frags */
+	frag = vcpu->mmio_fragments;
+	vcpu->mmio_nr_fragments = 1;
+	frag->len = bytes;
+	frag->gpa = gpa;
+	frag->data = data;
+
+	vcpu->mmio_needed = 1;
+	vcpu->mmio_cur_fragment = 0;
+
+	vcpu->run->mmio.phys_addr = gpa;
+	vcpu->run->mmio.len = min(8u, frag->len);
+	vcpu->run->mmio.is_write = 0;
+	vcpu->run->exit_reason = KVM_EXIT_MMIO;
+
+	vcpu->arch.complete_userspace_io = complete_sev_es_emulated_mmio;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_sev_es_mmio_read);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 995ab696dcf0..ce3b7d3d8631 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -398,4 +398,9 @@ bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu);
 	__reserved_bits;                                \
 })
 
+int kvm_sev_es_mmio_write(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
+			  void *dst);
+int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
+			 void *dst);
+
 #endif
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 19/35] KVM: SVM: Support port IO operations for an SEV-ES guest
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (17 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 18/35] KVM: SVM: Support MMIO for an SEV-ES guest Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 20/35] KVM: SVM: Add SEV/SEV-ES support for intercepting INVD Tom Lendacky
                   ` (16 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

For an SEV-ES guest, port IO is performed to a shared (un-encrypted) page
so that both the hypervisor and guest can read or write to it and each
see the contents.

For port IO operations, invoke SEV-ES specific routines that can complete
the operation using common KVM port IO support.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm/sev.c          |  9 ++++++
 arch/x86/kvm/svm/svm.c          | 11 +++++--
 arch/x86/kvm/svm/svm.h          |  1 +
 arch/x86/kvm/x86.c              | 51 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h              |  3 ++
 6 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3e2a3d2a8ba8..7320a9c68a5a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -613,6 +613,7 @@ struct kvm_vcpu_arch {
 
 	struct kvm_pio_request pio;
 	void *pio_data;
+	void *guest_ins_data;
 
 	u8 event_exit_inst_len;
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 740b44485f36..da1736d228a6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1483,3 +1483,12 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
 
 	return ret;
 }
+
+int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
+{
+	if (!setup_vmgexit_scratch(svm, in, svm->vmcb->control.exit_info_2))
+		return -EINVAL;
+
+	return kvm_sev_es_string_io(&svm->vcpu, size, port,
+				    svm->ghcb_sa, svm->ghcb_sa_len, in);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 439b0d0e53eb..37c98e85aa62 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1984,11 +1984,16 @@ static int io_interception(struct vcpu_svm *svm)
 	++svm->vcpu.stat.io_exits;
 	string = (io_info & SVM_IOIO_STR_MASK) != 0;
 	in = (io_info & SVM_IOIO_TYPE_MASK) != 0;
-	if (string)
-		return kvm_emulate_instruction(vcpu, 0);
-
 	port = io_info >> 16;
 	size = (io_info & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT;
+
+	if (string) {
+		if (sev_es_guest(vcpu->kvm))
+			return sev_es_string_io(svm, size, port, in);
+		else
+			return kvm_emulate_instruction(vcpu, 0);
+	}
+
 	svm->next_rip = svm->vmcb->control.exit_info_2;
 
 	return kvm_fast_pio(&svm->vcpu, size, port, in);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8de45462ff4a..9f1c8ed88c79 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -564,6 +564,7 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu);
 void __init sev_hardware_setup(void);
 void sev_hardware_teardown(void);
 int sev_handle_vmgexit(struct vcpu_svm *svm);
+int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
 
 /* VMSA Accessor functions */
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a0070eeeb139..674719d801d2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10372,6 +10372,10 @@ int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
 
 unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu)
 {
+	/* Can't read RIP of an SEV-ES guest, just return 0 */
+	if (vcpu->arch.vmsa_encrypted)
+		return 0;
+
 	if (is_64_bit_mode(vcpu))
 		return kvm_rip_read(vcpu);
 	return (u32)(get_segment_base(vcpu, VCPU_SREG_CS) +
@@ -10768,6 +10772,53 @@ void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_c
 }
 EXPORT_SYMBOL_GPL(kvm_fixup_and_inject_pf_error);
 
+static int complete_sev_es_emulated_ins(struct kvm_vcpu *vcpu)
+{
+	memcpy(vcpu->arch.guest_ins_data, vcpu->arch.pio_data,
+	       vcpu->arch.pio.count * vcpu->arch.pio.size);
+	vcpu->arch.pio.count = 0;
+
+	return 1;
+}
+
+static int kvm_sev_es_outs(struct kvm_vcpu *vcpu, unsigned int size,
+			   unsigned int port, void *data,  unsigned int count)
+{
+	int ret;
+
+	ret = emulator_pio_out_emulated(vcpu->arch.emulate_ctxt, size, port,
+					data, count);
+	vcpu->arch.pio.count = 0;
+
+	return 0;
+}
+
+static int kvm_sev_es_ins(struct kvm_vcpu *vcpu, unsigned int size,
+			  unsigned int port, void *data, unsigned int count)
+{
+	int ret;
+
+	ret = emulator_pio_in_emulated(vcpu->arch.emulate_ctxt, size, port,
+				       data, count);
+	if (ret) {
+		vcpu->arch.pio.count = 0;
+	} else {
+		vcpu->arch.guest_ins_data = data;
+		vcpu->arch.complete_userspace_io = complete_sev_es_emulated_ins;
+	}
+
+	return 0;
+}
+
+int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
+			 unsigned int port, void *data,  unsigned int count,
+			 int in)
+{
+	return in ? kvm_sev_es_ins(vcpu, size, port, data, count)
+		  : kvm_sev_es_outs(vcpu, size, port, data, count);
+}
+EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
+
 static int complete_sev_es_emulated_mmio(struct kvm_vcpu *vcpu)
 {
 	struct kvm_run *run = vcpu->run;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index ce3b7d3d8631..ae68670f5289 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -402,5 +402,8 @@ int kvm_sev_es_mmio_write(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
 			  void *dst);
 int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
 			 void *dst);
+int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
+			 unsigned int port, void *data,  unsigned int count,
+			 int in);
 
 #endif
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 20/35] KVM: SVM: Add SEV/SEV-ES support for intercepting INVD
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (18 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 19/35] KVM: SVM: Support port IO operations " Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 22:00   ` Sean Christopherson
  2020-09-14 20:15 ` [RFC PATCH 21/35] KVM: SVM: Add support for EFER write traps for an SEV-ES guest Tom Lendacky
                   ` (15 subsequent siblings)
  35 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

The INVD instruction intercept currently performs instruction emulation.
Emulation can't be done for an SEV or SEV-ES guest because the guest
memory is encrypted.

Provide a specific intercept routine for the INVD intercept. Within this
intercept routine, skip the instruction for an SEV or SEV-ES guest since
it is emulated as a NOP anyway.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/svm.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 37c98e85aa62..ac64a5b128b2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2275,6 +2275,17 @@ static int iret_interception(struct vcpu_svm *svm)
 	return 1;
 }
 
+static int invd_interception(struct vcpu_svm *svm)
+{
+	/*
+	 * Can't do emulation on any type of SEV guest and INVD is emulated
+	 * as a NOP, so just skip it.
+	 */
+	return (sev_guest(svm->vcpu.kvm))
+		? kvm_skip_emulated_instruction(&svm->vcpu)
+		: kvm_emulate_instruction(&svm->vcpu, 0);
+}
+
 static int invlpg_interception(struct vcpu_svm *svm)
 {
 	if (!static_cpu_has(X86_FEATURE_DECODEASSISTS))
@@ -2912,7 +2923,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_RDPMC]			= rdpmc_interception,
 	[SVM_EXIT_CPUID]			= cpuid_interception,
 	[SVM_EXIT_IRET]                         = iret_interception,
-	[SVM_EXIT_INVD]                         = emulate_on_interception,
+	[SVM_EXIT_INVD]                         = invd_interception,
 	[SVM_EXIT_PAUSE]			= pause_interception,
 	[SVM_EXIT_HLT]				= halt_interception,
 	[SVM_EXIT_INVLPG]			= invlpg_interception,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 21/35] KVM: SVM: Add support for EFER write traps for an SEV-ES guest
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (19 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 20/35] KVM: SVM: Add SEV/SEV-ES support for intercepting INVD Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
       [not found]   ` <20200914220800.GI7192@sjchrist-ice>
  2020-09-14 20:15 ` [RFC PATCH 22/35] KVM: SVM: Add support for CR0 " Tom Lendacky
                   ` (14 subsequent siblings)
  35 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

For SEV-ES guests, the interception of EFER write access is not
recommended. EFER interception occurs prior to EFER being modified and
the hypervisor is unable to modify EFER itself because the register is
located in the encrypted register state.

SEV-ES guests introduce a new EFER write trap. This trap provides
intercept support of an EFER write after EFER has been modified. The new
EFER value is provided in the VMCB EXITINFO1 field, allowing the
hypervisor to track the setting of the guest EFER.

Add support to track the value of the guest EFER value using the EFER
write trap so that the hypervisor understands the guest operating mode.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/include/uapi/asm/svm.h |  2 ++
 arch/x86/kvm/svm/svm.c          | 12 ++++++++++++
 arch/x86/kvm/x86.c              | 12 ++++++++++++
 4 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7320a9c68a5a..b535b690eb66 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1427,6 +1427,7 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
 		    int reason, bool has_error_code, u32 error_code);
 
+int kvm_track_efer(struct kvm_vcpu *vcpu, u64 efer);
 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
 int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 0bc3942ffdd3..ce937a242995 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -77,6 +77,7 @@
 #define SVM_EXIT_MWAIT_COND    0x08c
 #define SVM_EXIT_XSETBV        0x08d
 #define SVM_EXIT_RDPRU         0x08e
+#define SVM_EXIT_EFER_WRITE_TRAP		0x08f
 #define SVM_EXIT_NPF           0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
@@ -183,6 +184,7 @@
 	{ SVM_EXIT_MONITOR,     "monitor" }, \
 	{ SVM_EXIT_MWAIT,       "mwait" }, \
 	{ SVM_EXIT_XSETBV,      "xsetbv" }, \
+	{ SVM_EXIT_EFER_WRITE_TRAP,	"write_efer_trap" }, \
 	{ SVM_EXIT_NPF,         "npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ac64a5b128b2..ac467225a51d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2466,6 +2466,17 @@ static int cr8_write_interception(struct vcpu_svm *svm)
 	return 0;
 }
 
+static int efer_trap(struct vcpu_svm *svm)
+{
+	int ret;
+
+	ret = kvm_track_efer(&svm->vcpu, svm->vmcb->control.exit_info_1);
+	if (ret)
+		return ret;
+
+	return kvm_complete_insn_gp(&svm->vcpu, 0);
+}
+
 static int svm_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	msr->data = 0;
@@ -2944,6 +2955,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_MWAIT]			= mwait_interception,
 	[SVM_EXIT_XSETBV]			= xsetbv_interception,
 	[SVM_EXIT_RDPRU]			= rdpru_interception,
+	[SVM_EXIT_EFER_WRITE_TRAP]		= efer_trap,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 674719d801d2..b65bd0c986d4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1480,6 +1480,18 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	return 0;
 }
 
+int kvm_track_efer(struct kvm_vcpu *vcpu, u64 efer)
+{
+	struct msr_data msr_info;
+
+	msr_info.host_initiated = false;
+	msr_info.index = MSR_EFER;
+	msr_info.data = efer;
+
+	return set_efer(vcpu, &msr_info);
+}
+EXPORT_SYMBOL_GPL(kvm_track_efer);
+
 void kvm_enable_efer_bits(u64 mask)
 {
        efer_reserved_bits &= ~mask;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 22/35] KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (20 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 21/35] KVM: SVM: Add support for EFER write traps for an SEV-ES guest Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 22:13   ` Sean Christopherson
  2020-09-14 20:15 ` [RFC PATCH 23/35] KVM: SVM: Add support for CR4 " Tom Lendacky
                   ` (13 subsequent siblings)
  35 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

For SEV-ES guests, the interception of control register write access
is not recommended. Control register interception occurs prior to the
control register being modified and the hypervisor is unable to modify
the control register itself because the register is located in the
encrypted register state.

SEV-ES guests introduce new control register write traps. These traps
provide intercept support of a control register write after the control
register has been modified. The new control register value is provided in
the VMCB EXITINFO1 field, allowing the hypervisor to track the setting
of the guest control registers.

Add support to track the value of the guest CR0 register using the control
register write trap so that the hypervisor understands the guest operating
mode.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/include/uapi/asm/svm.h | 17 +++++++++++++
 arch/x86/kvm/svm/svm.c          | 20 +++++++++++++++
 arch/x86/kvm/x86.c              | 43 ++++++++++++++++++++++++---------
 4 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b535b690eb66..9cc9b65bea7e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1432,6 +1432,7 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
 int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8);
+int kvm_track_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
 int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val);
 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index ce937a242995..cc45d7996e9c 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -78,6 +78,22 @@
 #define SVM_EXIT_XSETBV        0x08d
 #define SVM_EXIT_RDPRU         0x08e
 #define SVM_EXIT_EFER_WRITE_TRAP		0x08f
+#define SVM_EXIT_CR0_WRITE_TRAP			0x090
+#define SVM_EXIT_CR1_WRITE_TRAP			0x091
+#define SVM_EXIT_CR2_WRITE_TRAP			0x092
+#define SVM_EXIT_CR3_WRITE_TRAP			0x093
+#define SVM_EXIT_CR4_WRITE_TRAP			0x094
+#define SVM_EXIT_CR5_WRITE_TRAP			0x095
+#define SVM_EXIT_CR6_WRITE_TRAP			0x096
+#define SVM_EXIT_CR7_WRITE_TRAP			0x097
+#define SVM_EXIT_CR8_WRITE_TRAP			0x098
+#define SVM_EXIT_CR9_WRITE_TRAP			0x099
+#define SVM_EXIT_CR10_WRITE_TRAP		0x09a
+#define SVM_EXIT_CR11_WRITE_TRAP		0x09b
+#define SVM_EXIT_CR12_WRITE_TRAP		0x09c
+#define SVM_EXIT_CR13_WRITE_TRAP		0x09d
+#define SVM_EXIT_CR14_WRITE_TRAP		0x09e
+#define SVM_EXIT_CR15_WRITE_TRAP		0x09f
 #define SVM_EXIT_NPF           0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
@@ -185,6 +201,7 @@
 	{ SVM_EXIT_MWAIT,       "mwait" }, \
 	{ SVM_EXIT_XSETBV,      "xsetbv" }, \
 	{ SVM_EXIT_EFER_WRITE_TRAP,	"write_efer_trap" }, \
+	{ SVM_EXIT_CR0_WRITE_TRAP,	"write_cr0_trap" }, \
 	{ SVM_EXIT_NPF,         "npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ac467225a51d..506656988559 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2413,6 +2413,25 @@ static int cr_interception(struct vcpu_svm *svm)
 	return kvm_complete_insn_gp(&svm->vcpu, err);
 }
 
+static int cr_trap(struct vcpu_svm *svm)
+{
+	unsigned int cr;
+
+	cr = svm->vmcb->control.exit_code - SVM_EXIT_CR0_WRITE_TRAP;
+
+	switch (cr) {
+	case 0:
+		kvm_track_cr0(&svm->vcpu, svm->vmcb->control.exit_info_1);
+		break;
+	default:
+		WARN(1, "unhandled CR%d write trap", cr);
+		kvm_queue_exception(&svm->vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	return kvm_complete_insn_gp(&svm->vcpu, 0);
+}
+
 static int dr_interception(struct vcpu_svm *svm)
 {
 	int reg, dr;
@@ -2956,6 +2975,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_XSETBV]			= xsetbv_interception,
 	[SVM_EXIT_RDPRU]			= rdpru_interception,
 	[SVM_EXIT_EFER_WRITE_TRAP]		= efer_trap,
+	[SVM_EXIT_CR0_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b65bd0c986d4..6f5988c305e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -799,11 +799,29 @@ bool pdptrs_changed(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(pdptrs_changed);
 
+static void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0,
+			     unsigned long cr0)
+{
+	unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;
+
+	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
+		kvm_clear_async_pf_completion_queue(vcpu);
+		kvm_async_pf_hash_reset(vcpu);
+	}
+
+	if ((cr0 ^ old_cr0) & update_bits)
+		kvm_mmu_reset_context(vcpu);
+
+	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
+	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
+	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
+		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
+}
+
 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
 	unsigned long old_cr0 = kvm_read_cr0(vcpu);
 	unsigned long pdptr_bits = X86_CR0_CD | X86_CR0_NW | X86_CR0_PG;
-	unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;
 
 	cr0 |= X86_CR0_ET;
 
@@ -842,22 +860,23 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 
 	kvm_x86_ops.set_cr0(vcpu, cr0);
 
-	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
-		kvm_clear_async_pf_completion_queue(vcpu);
-		kvm_async_pf_hash_reset(vcpu);
-	}
+	kvm_post_set_cr0(vcpu, old_cr0, cr0);
 
-	if ((cr0 ^ old_cr0) & update_bits)
-		kvm_mmu_reset_context(vcpu);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_set_cr0);
 
-	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
-	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
-	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
-		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
+int kvm_track_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+{
+	unsigned long old_cr0 = kvm_read_cr0(vcpu);
+
+	vcpu->arch.cr0 = cr0;
+
+	kvm_post_set_cr0(vcpu, old_cr0, cr0);
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_cr0);
+EXPORT_SYMBOL_GPL(kvm_track_cr0);
 
 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
 {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 23/35] KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (21 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 22/35] KVM: SVM: Add support for CR0 " Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 22:16   ` Sean Christopherson
  2020-09-14 20:15 ` [RFC PATCH 24/35] KVM: SVM: Add support for CR8 " Tom Lendacky
                   ` (12 subsequent siblings)
  35 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

For SEV-ES guests, the interception of control register write access
is not recommended. Control register interception occurs prior to the
control register being modified and the hypervisor is unable to modify
the control register itself because the register is located in the
encrypted register state.

SEV-ES guests introduce new control register write traps. These traps
provide intercept support of a control register write after the control
register has been modified. The new control register value is provided in
the VMCB EXITINFO1 field, allowing the hypervisor to track the setting
of the guest control registers.

Add support to track the value of the guest CR4 register using the control
register write trap so that the hypervisor understands the guest operating
mode.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/include/uapi/asm/svm.h |  1 +
 arch/x86/kvm/svm/svm.c          |  4 ++++
 arch/x86/kvm/x86.c              | 20 ++++++++++++++++++++
 4 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9cc9b65bea7e..e4fd2600ecf6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1433,6 +1433,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
 int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8);
 int kvm_track_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
+int kvm_track_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
 int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val);
 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index cc45d7996e9c..ea88789d71f2 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -202,6 +202,7 @@
 	{ SVM_EXIT_XSETBV,      "xsetbv" }, \
 	{ SVM_EXIT_EFER_WRITE_TRAP,	"write_efer_trap" }, \
 	{ SVM_EXIT_CR0_WRITE_TRAP,	"write_cr0_trap" }, \
+	{ SVM_EXIT_CR4_WRITE_TRAP,	"write_cr4_trap" }, \
 	{ SVM_EXIT_NPF,         "npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 506656988559..ec5efa1d4344 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2423,6 +2423,9 @@ static int cr_trap(struct vcpu_svm *svm)
 	case 0:
 		kvm_track_cr0(&svm->vcpu, svm->vmcb->control.exit_info_1);
 		break;
+	case 4:
+		kvm_track_cr4(&svm->vcpu, svm->vmcb->control.exit_info_1);
+		break;
 	default:
 		WARN(1, "unhandled CR%d write trap", cr);
 		kvm_queue_exception(&svm->vcpu, UD_VECTOR);
@@ -2976,6 +2979,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_RDPRU]			= rdpru_interception,
 	[SVM_EXIT_EFER_WRITE_TRAP]		= efer_trap,
 	[SVM_EXIT_CR0_WRITE_TRAP]		= cr_trap,
+	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6f5988c305e1..5e5f1e8fed3a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1033,6 +1033,26 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 }
 EXPORT_SYMBOL_GPL(kvm_set_cr4);
 
+int kvm_track_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+	unsigned long old_cr4 = kvm_read_cr4(vcpu);
+	unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
+				   X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
+
+	if (kvm_x86_ops.set_cr4(vcpu, cr4))
+		return 1;
+
+	if (((cr4 ^ old_cr4) & pdptr_bits) ||
+	    (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
+		kvm_mmu_reset_context(vcpu);
+
+	if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE))
+		kvm_update_cpuid_runtime(vcpu);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_track_cr4);
+
 int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	bool skip_tlb_flush = false;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 24/35] KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (22 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 23/35] KVM: SVM: Add support for CR4 " Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 22:19   ` Sean Christopherson
  2020-09-14 20:15 ` [RFC PATCH 25/35] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES Tom Lendacky
                   ` (11 subsequent siblings)
  35 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

For SEV-ES guests, the interception of control register write access
is not recommended. Control register interception occurs prior to the
control register being modified and the hypervisor is unable to modify
the control register itself because the register is located in the
encrypted register state.

SEV-ES guests introduce new control register write traps. These traps
provide intercept support of a control register write after the control
register has been modified. The new control register value is provided in
the VMCB EXITINFO1 field, allowing the hypervisor to track the setting
of the guest control registers.

Add support to track the value of the guest CR8 register using the control
register write trap so that the hypervisor understands the guest operating
mode.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/include/uapi/asm/svm.h | 1 +
 arch/x86/kvm/svm/svm.c          | 4 ++++
 arch/x86/kvm/x86.c              | 6 ++++++
 4 files changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e4fd2600ecf6..790659494aae 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1434,6 +1434,7 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8);
 int kvm_track_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int kvm_track_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
+int kvm_track_cr8(struct kvm_vcpu *vcpu, unsigned long cr8);
 int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
 int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val);
 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index ea88789d71f2..60830088e8e3 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -203,6 +203,7 @@
 	{ SVM_EXIT_EFER_WRITE_TRAP,	"write_efer_trap" }, \
 	{ SVM_EXIT_CR0_WRITE_TRAP,	"write_cr0_trap" }, \
 	{ SVM_EXIT_CR4_WRITE_TRAP,	"write_cr4_trap" }, \
+	{ SVM_EXIT_CR8_WRITE_TRAP,	"write_cr8_trap" }, \
 	{ SVM_EXIT_NPF,         "npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ec5efa1d4344..b35c2de1130c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2426,6 +2426,9 @@ static int cr_trap(struct vcpu_svm *svm)
 	case 4:
 		kvm_track_cr4(&svm->vcpu, svm->vmcb->control.exit_info_1);
 		break;
+	case 8:
+		kvm_track_cr8(&svm->vcpu, svm->vmcb->control.exit_info_1);
+		break;
 	default:
 		WARN(1, "unhandled CR%d write trap", cr);
 		kvm_queue_exception(&svm->vcpu, UD_VECTOR);
@@ -2980,6 +2983,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_EFER_WRITE_TRAP]		= efer_trap,
 	[SVM_EXIT_CR0_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
+	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5e5f1e8fed3a..6e445a76b691 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1109,6 +1109,12 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_cr8);
 
+int kvm_track_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
+{
+	return kvm_set_cr8(vcpu, cr8);
+}
+EXPORT_SYMBOL_GPL(kvm_track_cr8);
+
 static void kvm_update_dr0123(struct kvm_vcpu *vcpu)
 {
 	int i;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 25/35] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (23 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 24/35] KVM: SVM: Add support for CR8 " Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 21:37   ` Sean Christopherson
  2020-09-14 20:15 ` [RFC PATCH 26/35] KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest Tom Lendacky
                   ` (10 subsequent siblings)
  35 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Since many of the registers used by an SEV-ES guest are encrypted and
cannot be read or written, adjust __get_sregs() / __set_sregs() to only
get or set the registers being tracked (EFER, CR0, CR4 and CR8) once the
VMSA is encrypted.

For __get_sregs(), return the actual value that is in use by the guest
as determined by the write trap support of the registers.

For __set_sregs(), set the arch-specific value that KVM believes the guest
is using. Note that this does not change the guest's actual (encrypted)
register state, so it is likely only useful for things such as live
migration.
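
As a usage sketch from the VMM side (not part of this patch; vcpu_fd is
assumed to be an already-open KVM vCPU file descriptor), only the tracked
fields are meaningful once the VMSA is encrypted:

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	static int get_tracked_sregs(int vcpu_fd, struct kvm_sregs *sregs)
	{
		if (ioctl(vcpu_fd, KVM_GET_SREGS, sregs) < 0)
			return -1;

		/*
		 * For an SEV-ES guest, only efer, cr0, cr4 and cr8 reflect
		 * the guest state; the segment, descriptor table and other
		 * fields are not available from the encrypted VMSA.
		 */
		return 0;
	}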

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/x86.c | 56 +++++++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6e445a76b691..76efe70cd635 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9090,6 +9090,9 @@ static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 {
 	struct desc_ptr dt;
 
+	if (vcpu->arch.vmsa_encrypted)
+		goto tracking_regs;
+
 	kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
 	kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
 	kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES);
@@ -9107,12 +9110,15 @@ static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	sregs->gdt.limit = dt.size;
 	sregs->gdt.base = dt.address;
 
-	sregs->cr0 = kvm_read_cr0(vcpu);
 	sregs->cr2 = vcpu->arch.cr2;
 	sregs->cr3 = kvm_read_cr3(vcpu);
+
+tracking_regs:
+	sregs->cr0 = kvm_read_cr0(vcpu);
 	sregs->cr4 = kvm_read_cr4(vcpu);
 	sregs->cr8 = kvm_get_cr8(vcpu);
 	sregs->efer = vcpu->arch.efer;
+
 	sregs->apic_base = kvm_get_apic_base(vcpu);
 
 	memset(sregs->interrupt_bitmap, 0, sizeof(sregs->interrupt_bitmap));
@@ -9248,18 +9254,6 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	if (kvm_set_apic_base(vcpu, &apic_base_msr))
 		goto out;
 
-	dt.size = sregs->idt.limit;
-	dt.address = sregs->idt.base;
-	kvm_x86_ops.set_idt(vcpu, &dt);
-	dt.size = sregs->gdt.limit;
-	dt.address = sregs->gdt.base;
-	kvm_x86_ops.set_gdt(vcpu, &dt);
-
-	vcpu->arch.cr2 = sregs->cr2;
-	mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
-	vcpu->arch.cr3 = sregs->cr3;
-	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
-
 	kvm_set_cr8(vcpu, sregs->cr8);
 
 	mmu_reset_needed |= vcpu->arch.efer != sregs->efer;
@@ -9276,6 +9270,14 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	if (cpuid_update_needed)
 		kvm_update_cpuid_runtime(vcpu);
 
+	if (vcpu->arch.vmsa_encrypted)
+		goto tracking_regs;
+
+	vcpu->arch.cr2 = sregs->cr2;
+	mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
+	vcpu->arch.cr3 = sregs->cr3;
+	kvm_register_mark_available(vcpu, VCPU_EXREG_CR3);
+
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 	if (is_pae_paging(vcpu)) {
 		load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu));
@@ -9283,16 +9285,12 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	}
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 
-	if (mmu_reset_needed)
-		kvm_mmu_reset_context(vcpu);
-
-	max_bits = KVM_NR_INTERRUPTS;
-	pending_vec = find_first_bit(
-		(const unsigned long *)sregs->interrupt_bitmap, max_bits);
-	if (pending_vec < max_bits) {
-		kvm_queue_interrupt(vcpu, pending_vec, false);
-		pr_debug("Set back pending irq %d\n", pending_vec);
-	}
+	dt.size = sregs->idt.limit;
+	dt.address = sregs->idt.base;
+	kvm_x86_ops.set_idt(vcpu, &dt);
+	dt.size = sregs->gdt.limit;
+	dt.address = sregs->gdt.base;
+	kvm_x86_ops.set_gdt(vcpu, &dt);
 
 	kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
 	kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
@@ -9312,6 +9310,18 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	    !is_protmode(vcpu))
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
+tracking_regs:
+	if (mmu_reset_needed)
+		kvm_mmu_reset_context(vcpu);
+
+	max_bits = KVM_NR_INTERRUPTS;
+	pending_vec = find_first_bit(
+		(const unsigned long *)sregs->interrupt_bitmap, max_bits);
+	if (pending_vec < max_bits) {
+		kvm_queue_interrupt(vcpu, pending_vec, false);
+		pr_debug("Set back pending irq %d\n", pending_vec);
+	}
+
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 
 	ret = 0;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 26/35] KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (24 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 25/35] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
       [not found]   ` <20200914213917.GD7192@sjchrist-ice>
  2020-09-14 20:15 ` [RFC PATCH 27/35] KVM: SVM: Add support for booting APs for an " Tom Lendacky
                   ` (9 subsequent siblings)
  35 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

For an SEV-ES guest, the guest FPU state is automatically restored on VMRUN
and saved on VMEXIT by the hardware, so there is no reason for KVM to do this.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/svm.c |  8 ++++++--
 arch/x86/kvm/x86.c     | 18 ++++++++++++++----
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b35c2de1130c..48699c41b62a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3682,7 +3682,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 		svm_set_dr6(svm, DR6_FIXED_1 | DR6_RTM);
 
 	clgi();
-	kvm_load_guest_xsave_state(vcpu);
+
+	if (!sev_es_guest(svm->vcpu.kvm))
+		kvm_load_guest_xsave_state(vcpu);
 
 	if (lapic_in_kernel(vcpu) &&
 		vcpu->arch.apic->lapic_timer.timer_advance_ns)
@@ -3728,7 +3730,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
 		kvm_before_interrupt(&svm->vcpu);
 
-	kvm_load_host_xsave_state(vcpu);
+	if (!sev_es_guest(svm->vcpu.kvm))
+		kvm_load_host_xsave_state(vcpu);
+
 	stgi();
 
 	/* Any pending NMI will happen here */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76efe70cd635..a53e24c1c5d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8896,9 +8896,14 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 
 	kvm_save_current_fpu(vcpu->arch.user_fpu);
 
-	/* PKRU is separately restored in kvm_x86_ops.run.  */
-	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
-				~XFEATURE_MASK_PKRU);
+	/*
+	 * An encrypted save area means that the guest state can't be
+	 * set by the hypervisor, so skip trying to set it.
+	 */
+	if (!vcpu->arch.vmsa_encrypted)
+		/* PKRU is separately restored in kvm_x86_ops.run. */
+		__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
+					~XFEATURE_MASK_PKRU);
 
 	fpregs_mark_activate();
 	fpregs_unlock();
@@ -8911,7 +8916,12 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
 	fpregs_lock();
 
-	kvm_save_current_fpu(vcpu->arch.guest_fpu);
+	/*
+	 * An encrypted save area means that the guest state can't be
+	 * read/saved by the hypervisor, so skip trying to save it.
+	 */
+	if (!vcpu->arch.vmsa_encrypted)
+		kvm_save_current_fpu(vcpu->arch.guest_fpu);
 
 	copy_kernel_to_fpregs(&vcpu->arch.user_fpu->state);
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 27/35] KVM: SVM: Add support for booting APs for an SEV-ES guest
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (25 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 26/35] KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 28/35] KVM: X86: Update kvm_skip_emulated_instruction() " Tom Lendacky
                   ` (8 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
  Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
  support. It is up to the guest to transfer control of the AP to the
  proper location.

Subsequent AP boot:
  KVM will expect to receive an AP Reset Hold exit event indicating that
  the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
  awaken it. When the AP Reset Hold exit event is received, KVM will place
  the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
  sequence, KVM will make the vCPU runnable. It is again up to the guest
  to then transfer control of the AP to the proper location.

The GHCB specification also requires the hypervisor to save the address of
an AP Jump Table so that, for example, vCPUs that have been parked by UEFI
can be started by the OS. Provide support for the AP Jump Table set/get
exit code.
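
Putting the two boot cases above together, the KVM-side handling can be
sketched roughly as follows. This is illustrative only; the real code is in
the sev_handle_vmgexit() and sev_vcpu_deliver_sipi_vector() hunks below, and
the combined helper here is just shorthand for that logic.

  /* Illustrative sketch, not the actual patch code */
  static void sev_es_ap_boot_sketch(struct vcpu_svm *svm, bool is_sipi)
  {
  	if (!is_sipi) {
  		/* AP Reset Hold VMGEXIT: park the vCPU in simulated HLT */
  		svm->ap_hlt_loop = true;
  		kvm_emulate_halt(&svm->vcpu);
  		return;
  	}

  	/* First INIT-SIPI-SIPI: run the AP as initialized and measured */
  	if (!svm->ap_hlt_loop)
  		return;

  	/*
  	 * Subsequent INIT-SIPI-SIPI: wake the parked AP; the guest #VC
  	 * handler sets CS/RIP itself, KVM only signals the wakeup.
  	 */
  	ghcb_set_sw_exit_info_2(svm->ghcb, 1);
  	svm->ap_hlt_loop = false;
  }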

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/svm/sev.c          | 48 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |  7 +++++
 arch/x86/kvm/svm/svm.h          |  3 +++
 arch/x86/kvm/x86.c              |  9 +++++++
 5 files changed, 69 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 790659494aae..003f257d2155 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1237,6 +1237,8 @@ struct kvm_x86_ops {
 				   unsigned long val);
 
 	bool (*allow_debug)(struct kvm *kvm);
+
+	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index da1736d228a6..cbb5f1b191bb 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -16,6 +16,8 @@
 #include <linux/swap.h>
 #include <linux/trace_events.h>
 
+#include <asm/trapnr.h>
+
 #include "x86.h"
 #include "svm.h"
 #include "trace.h"
@@ -1472,6 +1474,35 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
 					    control->exit_info_2,
 					    svm->ghcb_sa);
 		break;
+	case SVM_VMGEXIT_AP_HLT_LOOP:
+		svm->ap_hlt_loop = true;
+		ret = kvm_emulate_halt(&svm->vcpu);
+		break;
+	case SVM_VMGEXIT_AP_JUMP_TABLE: {
+		struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+
+		switch (control->exit_info_1) {
+		case 0:
+			/* Set AP jump table address */
+			sev->ap_jump_table = control->exit_info_2;
+			break;
+		case 1:
+			/* Get AP jump table address */
+			ghcb_set_sw_exit_info_2(ghcb, sev->ap_jump_table);
+			break;
+		default:
+			pr_err("svm: vmgexit: unsupported AP jump table request - exit_info_1=%#llx\n",
+			       control->exit_info_1);
+			ghcb_set_sw_exit_info_1(ghcb, 1);
+			ghcb_set_sw_exit_info_2(ghcb,
+						X86_TRAP_UD |
+						SVM_EVTINJ_TYPE_EXEPT |
+						SVM_EVTINJ_VALID);
+		}
+
+		ret = 1;
+		break;
+	}
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		pr_err("vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
 		       control->exit_info_1,
@@ -1492,3 +1523,20 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
 	return kvm_sev_es_string_io(&svm->vcpu, size, port,
 				    svm->ghcb_sa, svm->ghcb_sa_len, in);
 }
+
+void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	/* First SIPI: Use the values as initially set by the VMM */
+	if (!svm->ap_hlt_loop)
+		return;
+
+	/*
+	 * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
+	 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
+	 * non-zero value.
+	 */
+	ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+	svm->ap_hlt_loop = false;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 48699c41b62a..ce1707dc9464 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4343,6 +4343,11 @@ static bool svm_allow_debug(struct kvm *kvm)
 	return !sev_es_guest(kvm);
 }
 
+static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
+{
+	sev_vcpu_deliver_sipi_vector(vcpu, vector);
+}
+
 static void svm_vm_destroy(struct kvm *kvm)
 {
 	avic_vm_destroy(kvm);
@@ -4486,6 +4491,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.reg_write_override = svm_reg_write_override,
 
 	.allow_debug = svm_allow_debug,
+
+	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9f1c8ed88c79..a0b226c90feb 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -67,6 +67,7 @@ struct kvm_sev_info {
 	int fd;			/* SEV device fd */
 	unsigned long pages_locked; /* Number of pages locked */
 	struct list_head regions_list;  /* List of registered regions */
+	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
 };
 
 struct kvm_svm {
@@ -165,6 +166,7 @@ struct vcpu_svm {
 	struct vmcb_save_area *vmsa;
 	struct ghcb *ghcb;
 	struct kvm_host_map ghcb_map;
+	bool ap_hlt_loop;
 
 	/* SEV-ES scratch area support */
 	void *ghcb_sa;
@@ -565,6 +567,7 @@ void __init sev_hardware_setup(void);
 void sev_hardware_teardown(void);
 int sev_handle_vmgexit(struct vcpu_svm *svm);
 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
+void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 
 /* VMSA Accessor functions */
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a53e24c1c5d1..23564d02d158 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9772,6 +9772,15 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 {
 	struct kvm_segment cs;
 
+	/*
+	 * For SEV-ES, the register state can't be altered by KVM. If the VMSA
+	 * is encrypted, call the vcpu_deliver_sipi_vector() x86 op.
+	 */
+	if (vcpu->arch.vmsa_encrypted) {
+		kvm_x86_ops.vcpu_deliver_sipi_vector(vcpu, vector);
+		return;
+	}
+
 	kvm_get_segment(vcpu, &cs, VCPU_SREG_CS);
 	cs.selector = vector << 8;
 	cs.base = vector << 12;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 28/35] KVM: X86: Update kvm_skip_emulated_instruction() for an SEV-ES guest
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (26 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 27/35] KVM: SVM: Add support for booting APs for an " Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 21:51   ` Sean Christopherson
  2020-09-14 20:15 ` [RFC PATCH 29/35] KVM: SVM: Add NMI support " Tom Lendacky
                   ` (7 subsequent siblings)
  35 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

The register state for an SEV-ES guest is encrypted so the value of the
RIP cannot be updated. For an automatic exit, the RIP will be advanced
as necessary. For a non-automatic exit, it is up to the #VC handler in
the guest to advance the RIP.

Add support to skip any RIP updates in kvm_skip_emulated_instruction()
for an SEV-ES guest.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/x86.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 23564d02d158..1dbdca607511 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6874,13 +6874,17 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu)
 
 int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
-	unsigned long rflags = kvm_x86_ops.get_rflags(vcpu);
+	unsigned long rflags;
 	int r;
 
 	r = kvm_x86_ops.skip_emulated_instruction(vcpu);
 	if (unlikely(!r))
 		return 0;
 
+	if (vcpu->arch.vmsa_encrypted)
+		return 1;
+
+	rflags = kvm_x86_ops.get_rflags(vcpu);
 	/*
 	 * rflags is the old, "raw" value of the flags.  The new value has
 	 * not been saved yet.
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 29/35] KVM: SVM: Add NMI support for an SEV-ES guest
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (27 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 28/35] KVM: X86: Update kvm_skip_emulated_instruction() " Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 30/35] KVM: SVM: Set the encryption mask for the SVM host save area Tom Lendacky
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

The GHCB specification defines how NMIs are to be handled for an SEV-ES
guest. To detect the completion of an NMI, the hypervisor must not intercept
the IRET instruction, because a #VC exception taken while the NMI handler is
running will itself return with an IRET and would falsely signal completion.
Instead, the hypervisor must receive an NMI Complete exit event from the
guest.

Update the KVM support for detecting the completion of NMIs in the guest
to follow the GHCB specification. When an SEV-ES guest is active, the
IRET instruction will no longer be intercepted. Now, when the NMI Complete
exit event is received, the iret_interception() function will be called
to simulate the completion of the NMI.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c |  3 +++
 arch/x86/kvm/svm/svm.c | 20 +++++++++++++-------
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index cbb5f1b191bb..9bf7411a4b5d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1474,6 +1474,9 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
 					    control->exit_info_2,
 					    svm->ghcb_sa);
 		break;
+	case SVM_VMGEXIT_NMI_COMPLETE:
+		ret = svm_invoke_exit_handler(svm, SVM_EXIT_IRET);
+		break;
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 		svm->ap_hlt_loop = true;
 		ret = kvm_emulate_halt(&svm->vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ce1707dc9464..fcd4f0d983e9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2268,9 +2268,11 @@ static int cpuid_interception(struct vcpu_svm *svm)
 static int iret_interception(struct vcpu_svm *svm)
 {
 	++svm->vcpu.stat.nmi_window_exits;
-	svm_clr_intercept(svm, INTERCEPT_IRET);
 	svm->vcpu.arch.hflags |= HF_IRET_MASK;
-	svm->nmi_iret_rip = kvm_rip_read(&svm->vcpu);
+	if (!sev_es_guest(svm->vcpu.kvm)) {
+		svm_clr_intercept(svm, INTERCEPT_IRET);
+		svm->nmi_iret_rip = kvm_rip_read(&svm->vcpu);
+	}
 	kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 	return 1;
 }
@@ -3242,7 +3244,8 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
 
 	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
 	vcpu->arch.hflags |= HF_NMI_MASK;
-	svm_set_intercept(svm, INTERCEPT_IRET);
+	if (!sev_es_guest(svm->vcpu.kvm))
+		svm_set_intercept(svm, INTERCEPT_IRET);
 	++vcpu->stat.nmi_injections;
 }
 
@@ -3326,10 +3329,12 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 
 	if (masked) {
 		svm->vcpu.arch.hflags |= HF_NMI_MASK;
-		svm_set_intercept(svm, INTERCEPT_IRET);
+		if (!sev_es_guest(svm->vcpu.kvm))
+			svm_set_intercept(svm, INTERCEPT_IRET);
 	} else {
 		svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
-		svm_clr_intercept(svm, INTERCEPT_IRET);
+		if (!sev_es_guest(svm->vcpu.kvm))
+			svm_clr_intercept(svm, INTERCEPT_IRET);
 	}
 }
 
@@ -3507,8 +3512,9 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
 	 * If we've made progress since setting HF_IRET_MASK, we've
 	 * executed an IRET and can allow NMI injection.
 	 */
-	if ((svm->vcpu.arch.hflags & HF_IRET_MASK)
-	    && kvm_rip_read(&svm->vcpu) != svm->nmi_iret_rip) {
+	if ((svm->vcpu.arch.hflags & HF_IRET_MASK) &&
+	    (sev_es_guest(svm->vcpu.kvm) ||
+	     kvm_rip_read(&svm->vcpu) != svm->nmi_iret_rip)) {
 		svm->vcpu.arch.hflags &= ~(HF_NMI_MASK | HF_IRET_MASK);
 		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 30/35] KVM: SVM: Set the encryption mask for the SVM host save area
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (28 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 29/35] KVM: SVM: Add NMI support " Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 31/35] KVM: SVM: Update ASID allocation to support SEV-ES guests Tom Lendacky
                   ` (5 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

The SVM host save area is used to restore some host state on VMEXIT of an
SEV-ES guest. After allocating the save area, clear it and add the
encryption mask to the SVM host save area physical address that is
programmed into the VM_HSAVE_PA MSR.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c | 1 -
 arch/x86/kvm/svm/svm.c | 3 ++-
 arch/x86/kvm/svm/svm.h | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9bf7411a4b5d..15be71b30e2a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -30,7 +30,6 @@ unsigned int max_sev_asid;
 static unsigned int min_sev_asid;
 static unsigned long *sev_asid_bitmap;
 static unsigned long *sev_reclaim_asid_bitmap;
-#define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
 
 struct enc_region {
 	struct list_head list;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fcd4f0d983e9..fcb59d0b3c52 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -478,7 +478,7 @@ static int svm_hardware_enable(void)
 
 	wrmsrl(MSR_EFER, efer | EFER_SVME);
 
-	wrmsrl(MSR_VM_HSAVE_PA, page_to_pfn(sd->save_area) << PAGE_SHIFT);
+	wrmsrl(MSR_VM_HSAVE_PA, __sme_page_pa(sd->save_area));
 
 	if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
 		wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
@@ -546,6 +546,7 @@ static int svm_cpu_init(int cpu)
 	sd->save_area = alloc_page(GFP_KERNEL);
 	if (!sd->save_area)
 		goto free_cpu_data;
+	clear_page(page_address(sd->save_area));
 
 	if (svm_sev_enabled()) {
 		sd->sev_vmcbs = kmalloc_array(max_sev_asid + 1,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a0b226c90feb..e3b4b0368bd8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -21,6 +21,8 @@
 
 #include <asm/svm.h>
 
+#define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
+
 static const u32 host_save_user_msrs[] = {
 #ifdef CONFIG_X86_64
 	MSR_STAR, MSR_LSTAR, MSR_CSTAR, MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 31/35] KVM: SVM: Update ASID allocation to support SEV-ES guests
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (29 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 30/35] KVM: SVM: Set the encryption mask for the SVM host save area Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 32/35] KVM: SVM: Provide support for SEV-ES vCPU creation/loading Tom Lendacky
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

SEV and SEV-ES guests each have dedicated ASID ranges. Update the ASID
allocation routine to return an ASID in the respective range.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 15be71b30e2a..73d2a3f6c83c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -61,19 +61,19 @@ static int sev_flush_asids(void)
 }
 
 /* Must be called with the sev_bitmap_lock held */
-static bool __sev_recycle_asids(void)
+static bool __sev_recycle_asids(int min_asid, int max_asid)
 {
 	int pos;
 
 	/* Check if there are any ASIDs to reclaim before performing a flush */
-	pos = find_next_bit(sev_reclaim_asid_bitmap,
-			    max_sev_asid, min_sev_asid - 1);
-	if (pos >= max_sev_asid)
+	pos = find_next_bit(sev_reclaim_asid_bitmap, max_sev_asid, min_asid);
+	if (pos >= max_asid)
 		return false;
 
 	if (sev_flush_asids())
 		return false;
 
+	/* The flush process will flush all reclaimable SEV and SEV-ES ASIDs */
 	bitmap_xor(sev_asid_bitmap, sev_asid_bitmap, sev_reclaim_asid_bitmap,
 		   max_sev_asid);
 	bitmap_zero(sev_reclaim_asid_bitmap, max_sev_asid);
@@ -81,20 +81,23 @@ static bool __sev_recycle_asids(void)
 	return true;
 }
 
-static int sev_asid_new(void)
+static int sev_asid_new(struct kvm_sev_info *sev)
 {
+	int pos, min_asid, max_asid;
 	bool retry = true;
-	int pos;
 
 	mutex_lock(&sev_bitmap_lock);
 
 	/*
-	 * SEV-enabled guest must use asid from min_sev_asid to max_sev_asid.
+	 * SEV-enabled guests must use asid from min_sev_asid to max_sev_asid.
+	 * SEV-ES-enabled guests can use ASIDs from 1 to min_sev_asid - 1.
 	 */
+	min_asid = sev->es_active ? 0 : min_sev_asid - 1;
+	max_asid = sev->es_active ? min_sev_asid - 1 : max_sev_asid;
 again:
-	pos = find_next_zero_bit(sev_asid_bitmap, max_sev_asid, min_sev_asid - 1);
-	if (pos >= max_sev_asid) {
-		if (retry && __sev_recycle_asids()) {
+	pos = find_next_zero_bit(sev_asid_bitmap, max_sev_asid, min_asid);
+	if (pos >= max_asid) {
+		if (retry && __sev_recycle_asids(min_asid, max_asid)) {
 			retry = false;
 			goto again;
 		}
@@ -176,7 +179,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (unlikely(sev->active))
 		return ret;
 
-	asid = sev_asid_new();
+	asid = sev_asid_new(sev);
 	if (asid < 0)
 		return ret;
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 32/35] KVM: SVM: Provide support for SEV-ES vCPU creation/loading
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (30 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 31/35] KVM: SVM: Update ASID allocation to support SEV-ES guests Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 33/35] KVM: SVM: Provide support for SEV-ES vCPU loading Tom Lendacky
                   ` (3 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

An SEV-ES vCPU has additional VMCB initialization requirements for vCPU
creation, along with vCPU load/put requirements. This includes:

General VMCB initialization changes:
  - Set a VMCB control bit to enable SEV-ES support on the vCPU.
  - Set the VMCB encrypted VM save area address.
  - CRx registers are part of the encrypted register state and cannot be
    updated. Remove the CRx register read and write intercepts and replace
    them with CRx register write traps to track the CRx register values.
  - Certain MSR values are part of the encrypted register state and cannot
    be updated. Remove certain MSR intercepts (EFER, CR_PAT, etc.).
  - Remove the #GP intercept (no support for "enable_vmware_backdoor").
  - Remove the XSETBV intercept since the hypervisor cannot modify XCR0.

General vCPU creation changes:
  - Set the initial GHCB gpa value as per the GHCB specification.

General vCPU load changes:
  - SEV-ES hardware will restore certain registers on VMEXIT, but not save
    them on VMRUN (see Table B-3 and Table B-4 of the AMD64 APM Volume 2).
    During vCPU loading, perform a VMSAVE to the per-CPU SVM save area and
    save the current value of XCR0 to the per-CPU SVM save area.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/svm.h | 15 ++++++++++-
 arch/x86/kvm/svm/sev.c     | 54 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c     | 19 +++++++++++---
 arch/x86/kvm/svm/svm.h     |  3 +++
 4 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 07b4ac1e7179..06bb3a83edce 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -53,6 +53,16 @@ enum {
 	INTERCEPT_MWAIT_COND,
 	INTERCEPT_XSETBV,
 	INTERCEPT_RDPRU,
+	TRAP_EFER_WRITE,
+	TRAP_CR0_WRITE,
+	TRAP_CR1_WRITE,
+	TRAP_CR2_WRITE,
+	TRAP_CR3_WRITE,
+	TRAP_CR4_WRITE,
+	TRAP_CR5_WRITE,
+	TRAP_CR6_WRITE,
+	TRAP_CR7_WRITE,
+	TRAP_CR8_WRITE,
 };
 
 
@@ -96,6 +106,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u8 reserved_6[8];	/* Offset 0xe8 */
 	u64 avic_logical_id;	/* Offset 0xf0 */
 	u64 avic_physical_id;	/* Offset 0xf8 */
+	u8 reserved_7[8];
+	u64 vmsa_pa;		/* Used for an SEV-ES guest */
 };
 
 
@@ -150,6 +162,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 
 #define SVM_NESTED_CTL_NP_ENABLE	BIT(0)
 #define SVM_NESTED_CTL_SEV_ENABLE	BIT(1)
+#define SVM_NESTED_CTL_SEV_ES_ENABLE	BIT(2)
 
 struct vmcb_seg {
 	u16 selector;
@@ -249,7 +262,7 @@ struct ghcb {
 static inline void __unused_size_checks(void)
 {
 	BUILD_BUG_ON(sizeof(struct vmcb_save_area) != 1032);
-	BUILD_BUG_ON(sizeof(struct vmcb_control_area) != 256);
+	BUILD_BUG_ON(sizeof(struct vmcb_control_area) != 272);
 	BUILD_BUG_ON(sizeof(struct ghcb) != 4096);
 }
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 73d2a3f6c83c..7ed88f2e8d93 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1545,3 +1545,57 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 	ghcb_set_sw_exit_info_2(svm->ghcb, 1);
 	svm->ap_hlt_loop = false;
 }
+
+void sev_es_init_vmcb(struct vcpu_svm *svm)
+{
+	svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
+	svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
+
+	/*
+	 * An SEV-ES guest requires a VMSA area that is separate from the
+	 * VMCB page. Do not include the encryption mask on the VMSA physical
+	 * address since hardware will access it using the guest key.
+	 */
+	svm->vmcb->control.vmsa_pa = __pa(svm->vmsa);
+
+	/* Can't intercept CR register access, HV can't modify CR registers */
+	clr_cr_intercept(svm, INTERCEPT_CR0_READ);
+	clr_cr_intercept(svm, INTERCEPT_CR4_READ);
+	clr_cr_intercept(svm, INTERCEPT_CR8_READ);
+	clr_cr_intercept(svm, INTERCEPT_CR0_WRITE);
+	clr_cr_intercept(svm, INTERCEPT_CR4_WRITE);
+	clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+
+	svm_clr_intercept(svm, INTERCEPT_SELECTIVE_CR0);
+
+	/* Track EFER/CR register changes */
+	svm_set_intercept(svm, TRAP_EFER_WRITE);
+	svm_set_intercept(svm, TRAP_CR0_WRITE);
+	svm_set_intercept(svm, TRAP_CR4_WRITE);
+	svm_set_intercept(svm, TRAP_CR8_WRITE);
+
+	/* No support for enable_vmware_backdoor */
+	clr_exception_intercept(svm, GP_VECTOR);
+
+	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
+	svm_clr_intercept(svm, INTERCEPT_XSETBV);
+
+	/* Clear intercepts on selected MSRs */
+	set_msr_interception(svm->msrpm, MSR_EFER, 1, 1);
+	set_msr_interception(svm->msrpm, MSR_IA32_CR_PAT, 1, 1);
+	set_msr_interception(svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 1, 1);
+	set_msr_interception(svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
+	set_msr_interception(svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
+	set_msr_interception(svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
+}
+
+void sev_es_create_vcpu(struct vcpu_svm *svm)
+{
+	/*
+	 * Set the GHCB MSR value as per the GHCB specification when creating
+	 * a vCPU for an SEV-ES guest.
+	 */
+	set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
+					    GHCB_VERSION_MIN,
+					    sev_enc_bit));
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fcb59d0b3c52..cb9b1d281adb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -91,7 +91,7 @@ static DEFINE_PER_CPU(u64, current_tsc_ratio);
 
 static const struct svm_direct_access_msrs {
 	u32 index;   /* Index of the MSR */
-	bool always; /* True if intercept is always on */
+	bool always; /* True if intercept is initially cleared */
 } direct_access_msrs[] = {
 	{ .index = MSR_STAR,				.always = true  },
 	{ .index = MSR_IA32_SYSENTER_CS,		.always = true  },
@@ -109,6 +109,9 @@ static const struct svm_direct_access_msrs {
 	{ .index = MSR_IA32_LASTBRANCHTOIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTFROMIP,		.always = false },
 	{ .index = MSR_IA32_LASTINTTOIP,		.always = false },
+	{ .index = MSR_EFER,				.always = false },
+	{ .index = MSR_IA32_CR_PAT,			.always = false },
+	{ .index = MSR_AMD64_SEV_ES_GHCB,		.always = true  },
 	{ .index = MSR_INVALID,				.always = false },
 };
 
@@ -598,8 +601,7 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, unsigned msr)
 	return !!test_bit(bit_write,  &tmp);
 }
 
-static void set_msr_interception(u32 *msrpm, unsigned msr,
-				 int read, int write)
+void set_msr_interception(u32 *msrpm, unsigned int msr, int read, int write)
 {
 	u8 bit_read, bit_write;
 	unsigned long tmp;
@@ -1147,6 +1149,11 @@ static void init_vmcb(struct vcpu_svm *svm)
 	if (sev_guest(svm->vcpu.kvm)) {
 		svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
 		clr_exception_intercept(svm, UD_VECTOR);
+
+		if (sev_es_guest(svm->vcpu.kvm)) {
+			/* Perform SEV-ES specific VMCB updates */
+			sev_es_init_vmcb(svm);
+		}
 	}
 
 	vmcb_mark_all_dirty(svm->vmcb);
@@ -1253,6 +1260,10 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	svm_init_osvw(vcpu);
 	vcpu->arch.microcode_version = 0x01000065;
 
+	if (sev_es_guest(svm->vcpu.kvm))
+		/* Perform SEV-ES specific VMCB creation updates */
+		sev_es_create_vcpu(svm);
+
 	return 0;
 
 free_page5:
@@ -1375,6 +1386,7 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 	loadsegment(gs, svm->host.gs);
 #endif
 #endif
+
 	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
 		wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 }
@@ -3039,6 +3051,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	pr_err("%-20s%016llx\n", "avic_backing_page:", control->avic_backing_page);
 	pr_err("%-20s%016llx\n", "avic_logical_id:", control->avic_logical_id);
 	pr_err("%-20s%016llx\n", "avic_physical_id:", control->avic_physical_id);
+	pr_err("%-20s%016llx\n", "vmsa_pa:", control->vmsa_pa);
 	pr_err("VMCB State Save Area:\n");
 	pr_err("%-5s s: %04x a: %04x l: %08x b: %016llx\n",
 	       "es:",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e3b4b0368bd8..465e14a7146f 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -412,6 +412,7 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu);
 bool svm_interrupt_blocked(struct kvm_vcpu *vcpu);
 void svm_set_gif(struct vcpu_svm *svm, bool value);
 int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code);
+void set_msr_interception(u32 *msrpm, unsigned int msr, int read, int write);
 
 /* nested.c */
 
@@ -570,6 +571,8 @@ void sev_hardware_teardown(void);
 int sev_handle_vmgexit(struct vcpu_svm *svm);
 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
+void sev_es_init_vmcb(struct vcpu_svm *svm);
+void sev_es_create_vcpu(struct vcpu_svm *svm);
 
 /* VMSA Accessor functions */
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 33/35] KVM: SVM: Provide support for SEV-ES vCPU loading
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (31 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 32/35] KVM: SVM: Provide support for SEV-ES vCPU creation/loading Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 34/35] KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests Tom Lendacky
                   ` (2 subsequent siblings)
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

An SEV-ES vCPU has additional VMCB vCPU load/put requirements. SEV-ES
hardware will restore certain registers on VMEXIT, but not save them on
VMRUN (see Table B-3 and Table B-4 of the AMD64 APM Volume 2), so make the
following changes:

General vCPU load changes:
  - During vCPU loading, perform a VMSAVE to the per-CPU SVM save area and
    save the current value of XCR0 to the per-CPU SVM save area as these
    registers will be restored on VMEXIT.

General vCPU put changes:
  - Do not attempt to restore registers that SEV-ES hardware has already
    restored on VMEXIT.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c | 48 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c | 36 +++++++++++++++++++------------
 arch/x86/kvm/svm/svm.h | 22 +++++++++++++------
 3 files changed, 87 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7ed88f2e8d93..50018436863b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -17,11 +17,14 @@
 #include <linux/trace_events.h>
 
 #include <asm/trapnr.h>
+#include <asm/fpu/internal.h>
 
 #include "x86.h"
 #include "svm.h"
 #include "trace.h"
 
+#define __ex(x) __kvm_handle_fault_on_reboot(x)
+
 static u8 sev_enc_bit;
 static int sev_flush_asids(void);
 static DECLARE_RWSEM(sev_deactivate_lock);
@@ -1599,3 +1602,48 @@ void sev_es_create_vcpu(struct vcpu_svm *svm)
 					    GHCB_VERSION_MIN,
 					    sev_enc_bit));
 }
+
+void sev_es_vcpu_load(struct vcpu_svm *svm, int cpu)
+{
+	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
+	struct vmcb_save_area *hostsa;
+	unsigned int i;
+
+	/*
+	 * For an SEV-ES guest, hardware will restore the host state on VMEXIT,
+	 * of which one step is to perform a VMLOAD. Since hardware does not
+	 * perform a VMSAVE on VMRUN, the host save area must be updated.
+	 */
+	asm volatile(__ex("vmsave") : : "a" (__sme_page_pa(sd->save_area)) : "memory");
+
+	/*
+	 * Certain MSRs are restored on VMEXIT, only save ones that aren't
+	 * saved via the vmsave above.
+	 */
+	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++) {
+		if (host_save_user_msrs[i].sev_es_restored)
+			continue;
+
+		rdmsrl(host_save_user_msrs[i].index, svm->host_user_msrs[i]);
+	}
+
+	/* XCR0 is restored on VMEXIT, save the current host value */
+	hostsa = (struct vmcb_save_area *)(page_address(sd->save_area) + 0x400);
+	hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
+}
+
+void sev_es_vcpu_put(struct vcpu_svm *svm)
+{
+	unsigned int i;
+
+	/*
+	 * Certain MSRs are restored on VMEXIT and were saved with vmsave in
+	 * sev_es_vcpu_load() above. Only restore ones that weren't.
+	 */
+	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++) {
+		if (host_save_user_msrs[i].sev_es_restored)
+			continue;
+
+		wrmsrl(host_save_user_msrs[i].index, svm->host_user_msrs[i]);
+	}
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cb9b1d281adb..efefe8ba9759 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1340,15 +1340,20 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		vmcb_mark_all_dirty(svm->vmcb);
 	}
 
+	if (sev_es_guest(svm->vcpu.kvm)) {
+		sev_es_vcpu_load(svm, cpu);
+	} else {
 #ifdef CONFIG_X86_64
-	rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
+		rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
 #endif
-	savesegment(fs, svm->host.fs);
-	savesegment(gs, svm->host.gs);
-	svm->host.ldt = kvm_read_ldt();
+		savesegment(fs, svm->host.fs);
+		savesegment(gs, svm->host.gs);
+		svm->host.ldt = kvm_read_ldt();
 
-	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
-		rdmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
+		for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
+			rdmsrl(host_save_user_msrs[i].index,
+			       svm->host_user_msrs[i]);
+	}
 
 	if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
 		u64 tsc_ratio = vcpu->arch.tsc_scaling_ratio;
@@ -1376,19 +1381,24 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 	avic_vcpu_put(vcpu);
 
 	++vcpu->stat.host_state_reload;
-	kvm_load_ldt(svm->host.ldt);
+	if (sev_es_guest(svm->vcpu.kvm)) {
+		sev_es_vcpu_put(svm);
+	} else {
+		kvm_load_ldt(svm->host.ldt);
 #ifdef CONFIG_X86_64
-	loadsegment(fs, svm->host.fs);
-	wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
-	load_gs_index(svm->host.gs);
+		loadsegment(fs, svm->host.fs);
+		wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
+		load_gs_index(svm->host.gs);
 #else
 #ifdef CONFIG_X86_32_LAZY_GS
-	loadsegment(gs, svm->host.gs);
+		loadsegment(gs, svm->host.gs);
 #endif
 #endif
 
-	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
-		wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
+		for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
+			wrmsrl(host_save_user_msrs[i].index,
+			       svm->host_user_msrs[i]);
+	}
 }
 
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 465e14a7146f..0812d70085d7 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -23,15 +23,23 @@
 
 #define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
 
-static const u32 host_save_user_msrs[] = {
+static const struct svm_host_save_msrs {
+	u32 index;		/* Index of the MSR */
+	bool sev_es_restored;	/* True if MSR is restored on SEV-ES VMEXIT */
+} host_save_user_msrs[] = {
 #ifdef CONFIG_X86_64
-	MSR_STAR, MSR_LSTAR, MSR_CSTAR, MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
-	MSR_FS_BASE,
+	{ .index = MSR_STAR,			.sev_es_restored = true },
+	{ .index = MSR_LSTAR,			.sev_es_restored = true },
+	{ .index = MSR_CSTAR,			.sev_es_restored = true },
+	{ .index = MSR_SYSCALL_MASK,		.sev_es_restored = true },
+	{ .index = MSR_KERNEL_GS_BASE,		.sev_es_restored = true },
+	{ .index = MSR_FS_BASE,			.sev_es_restored = true },
 #endif
-	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
-	MSR_TSC_AUX,
+	{ .index = MSR_IA32_SYSENTER_CS,	.sev_es_restored = true },
+	{ .index = MSR_IA32_SYSENTER_ESP,	.sev_es_restored = true },
+	{ .index = MSR_IA32_SYSENTER_EIP,	.sev_es_restored = true },
+	{ .index = MSR_TSC_AUX,			.sev_es_restored = false },
 };
-
 #define NR_HOST_SAVE_USER_MSRS ARRAY_SIZE(host_save_user_msrs)
 
 #define MSRPM_OFFSETS	16
@@ -573,6 +581,8 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_init_vmcb(struct vcpu_svm *svm);
 void sev_es_create_vcpu(struct vcpu_svm *svm);
+void sev_es_vcpu_load(struct vcpu_svm *svm, int cpu);
+void sev_es_vcpu_put(struct vcpu_svm *svm);
 
 /* VMSA Accessor functions */
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 34/35] KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (32 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 33/35] KVM: SVM: Provide support for SEV-ES vCPU loading Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 20:15 ` [RFC PATCH 35/35] KVM: SVM: Provide support to launch and run an SEV-ES guest Tom Lendacky
  2020-09-14 22:59 ` [RFC PATCH 00/35] SEV-ES hypervisor support Sean Christopherson
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

The guest vCPU register state of an SEV-ES guest will be restored on VMRUN
and saved on VMEXIT. Therefore, there is no need to restore the guest
registers directly and through VMLOAD before VMRUN and no need to save the
guest registers directly and through VMSAVE on VMEXIT.

Update the svm_vcpu_run() function to skip register state saving and
restoring, and provide an alternative function for running an SEV-ES guest
in vmenter.S.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/svm.c     | 36 +++++++++++++++++----------
 arch/x86/kvm/svm/svm.h     |  5 ++++
 arch/x86/kvm/svm/vmenter.S | 50 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index efefe8ba9759..5e5f67dd293a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3640,16 +3640,20 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	guest_enter_irqoff();
 	lockdep_hardirqs_on(CALLER_ADDR0);
 
-	__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
+	if (sev_es_guest(svm->vcpu.kvm)) {
+		__svm_sev_es_vcpu_run(svm->vmcb_pa);
+	} else {
+		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
 
 #ifdef CONFIG_X86_64
-	native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
+		native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
 #else
-	loadsegment(fs, svm->host.fs);
+		loadsegment(fs, svm->host.fs);
 #ifndef CONFIG_X86_32_LAZY_GS
-	loadsegment(gs, svm->host.gs);
+		loadsegment(gs, svm->host.gs);
 #endif
 #endif
+	}
 
 	/*
 	 * VMEXIT disables interrupts (host state), but tracing and lockdep
@@ -3676,9 +3680,11 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 	fastpath_t exit_fastpath;
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm_rax_write(svm, vcpu->arch.regs[VCPU_REGS_RAX]);
-	svm_rsp_write(svm, vcpu->arch.regs[VCPU_REGS_RSP]);
-	svm_rip_write(svm, vcpu->arch.regs[VCPU_REGS_RIP]);
+	if (!sev_es_guest(svm->vcpu.kvm)) {
+		svm_rax_write(svm, vcpu->arch.regs[VCPU_REGS_RAX]);
+		svm_rsp_write(svm, vcpu->arch.regs[VCPU_REGS_RSP]);
+		svm_rip_write(svm, vcpu->arch.regs[VCPU_REGS_RIP]);
+	}
 
 	/*
 	 * Disable singlestep if we're injecting an interrupt/exception.
@@ -3700,7 +3706,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	sync_lapic_to_cr8(vcpu);
 
-	svm_cr2_write(svm, vcpu->arch.cr2);
+	if (!sev_es_guest(svm->vcpu.kvm))
+		svm_cr2_write(svm, vcpu->arch.cr2);
 
 	/*
 	 * Run with all-zero DR6 unless needed, so that we can get the exact cause
@@ -3748,14 +3755,17 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 	if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
 		svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
 
-	reload_tss(vcpu);
+	if (!sev_es_guest(svm->vcpu.kvm))
+		reload_tss(vcpu);
 
 	x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl);
 
-	vcpu->arch.cr2 = svm_cr2_read(svm);
-	vcpu->arch.regs[VCPU_REGS_RAX] = svm_rax_read(svm);
-	vcpu->arch.regs[VCPU_REGS_RSP] = svm_rsp_read(svm);
-	vcpu->arch.regs[VCPU_REGS_RIP] = svm_rip_read(svm);
+	if (!sev_es_guest(svm->vcpu.kvm)) {
+		vcpu->arch.cr2 = svm_cr2_read(svm);
+		vcpu->arch.regs[VCPU_REGS_RAX] = svm_rax_read(svm);
+		vcpu->arch.regs[VCPU_REGS_RSP] = svm_rsp_read(svm);
+		vcpu->arch.regs[VCPU_REGS_RIP] = svm_rip_read(svm);
+	}
 
 	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
 		kvm_before_interrupt(&svm->vcpu);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0812d70085d7..1405ea3549b8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -584,6 +584,11 @@ void sev_es_create_vcpu(struct vcpu_svm *svm);
 void sev_es_vcpu_load(struct vcpu_svm *svm, int cpu);
 void sev_es_vcpu_put(struct vcpu_svm *svm);
 
+/* vmenter.S */
+
+void __svm_sev_es_vcpu_run(unsigned long vmcb_pa);
+void __svm_vcpu_run(unsigned long vmcb_pa, unsigned long *regs);
+
 /* VMSA Accessor functions */
 
 static inline struct vmcb_save_area *get_vmsa(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 1ec1ac40e328..6feb8c08f45a 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -168,3 +168,53 @@ SYM_FUNC_START(__svm_vcpu_run)
 	pop %_ASM_BP
 	ret
 SYM_FUNC_END(__svm_vcpu_run)
+
+/**
+ * __svm_sev_es_vcpu_run - Run a SEV-ES vCPU via a transition to SVM guest mode
+ * @vmcb_pa:	unsigned long
+ */
+SYM_FUNC_START(__svm_sev_es_vcpu_run)
+	push %_ASM_BP
+#ifdef CONFIG_X86_64
+	push %r15
+	push %r14
+	push %r13
+	push %r12
+#else
+	push %edi
+	push %esi
+#endif
+	push %_ASM_BX
+
+	/* Enter guest mode */
+	mov %_ASM_ARG1, %_ASM_AX
+	sti
+
+1:	vmrun %_ASM_AX
+	jmp 3f
+2:	cmpb $0, kvm_rebooting
+	jne 3f
+	ud2
+	_ASM_EXTABLE(1b, 2b)
+
+3:	cli
+
+#ifdef CONFIG_RETPOLINE
+	/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
+	FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
+#endif
+
+	pop %_ASM_BX
+
+#ifdef CONFIG_X86_64
+	pop %r12
+	pop %r13
+	pop %r14
+	pop %r15
+#else
+	pop %esi
+	pop %edi
+#endif
+	pop %_ASM_BP
+	ret
+SYM_FUNC_END(__svm_sev_es_vcpu_run)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [RFC PATCH 35/35] KVM: SVM: Provide support to launch and run an SEV-ES guest
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (33 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 34/35] KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests Tom Lendacky
@ 2020-09-14 20:15 ` Tom Lendacky
  2020-09-14 22:59 ` [RFC PATCH 00/35] SEV-ES hypervisor support Sean Christopherson
  35 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-14 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, x86
  Cc: Paolo Bonzini, Jim Mattson, Joerg Roedel, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

An SEV-ES guest requires some additional steps to be launched as compared
to an SEV guest:
  - Implement additional VMCB initialization requirements for SEV-ES.
  - Update MSR_VM_HSAVE_PA to include the encryption bit if SME is active.
  - Add additional MSRs to the list of direct access MSRs so that the
    intercepts can be disabled.
  - Measure all vCPUs using the LAUNCH_UPDATE_VMSA SEV command after all
    calls to LAUNCH_UPDATE_DATA have been performed but before the call
    to LAUNCH_MEASURE has been performed.
  - Use VMSAVE to save host information that is not saved on VMRUN but is
    restored on VMEXIT.
  - Modify the VMRUN path to eliminate guest register state restoring and
    saving.

At this point the guest can be run. However, the run sequence is different
for an SEV-ES guest compared to a normal or even an SEV guest. Because the
guest register state is encrypted, it is all saved as part of VMRUN/VMEXIT
and does not require restoring before or saving after a VMRUN instruction.
As a result, all that is required to perform a VMRUN is to save the RBP
and RAX registers, issue the VMRUN and then restore RAX and RBP.

Additionally, certain state is automatically saved and restored with an
SEV-ES VMRUN. As a result, certain register state is not required to be
restored upon VMEXIT (e.g. FS, GS, etc.), so only do that if the guest is
not an SEV-ES guest.
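
For orientation only (not part of this patch): the userspace launch sequence
that this enables ends up looking roughly like the sketch below. The ioctl
plumbing and command names are the existing SEV uapi; vm_fd/sev_fd and the
launch parameter structs are placeholders and error handling is elided.

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int sev_cmd(int vm_fd, int sev_fd, __u32 id, void *data)
  {
  	struct kvm_sev_cmd cmd = {
  		.id = id,
  		.data = (__u64)(unsigned long)data,
  		.sev_fd = sev_fd,
  	};

  	return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }

  static void launch_sev_es_guest(int vm_fd, int sev_fd,
  				  struct kvm_sev_launch_start *start,
  				  struct kvm_sev_launch_update_data *update,
  				  struct kvm_sev_launch_measure *measure)
  {
  	sev_cmd(vm_fd, sev_fd, KVM_SEV_ES_INIT, NULL);
  	sev_cmd(vm_fd, sev_fd, KVM_SEV_LAUNCH_START, start);
  	/* one LAUNCH_UPDATE_DATA call per region of initial guest memory */
  	sev_cmd(vm_fd, sev_fd, KVM_SEV_LAUNCH_UPDATE_DATA, update);
  	/* all vCPUs must exist; this measures and encrypts every VMSA */
  	sev_cmd(vm_fd, sev_fd, KVM_SEV_LAUNCH_UPDATE_VMSA, NULL);
  	sev_cmd(vm_fd, sev_fd, KVM_SEV_LAUNCH_MEASURE, measure);
  	sev_cmd(vm_fd, sev_fd, KVM_SEV_LAUNCH_FINISH, NULL);
  }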

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c | 60 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 50018436863b..eaa669c16345 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -201,6 +201,16 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_es_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	if (!sev_es)
+		return -ENOTTY;
+
+	to_kvm_svm(kvm)->sev_info.es_active = true;
+
+	return sev_guest_init(kvm, argp);
+}
+
 static int sev_bind_asid(struct kvm *kvm, unsigned int handle, int *error)
 {
 	struct sev_data_activate *data;
@@ -501,6 +511,50 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_launch_update_vmsa *vmsa;
+	int i, ret;
+
+	if (!sev_es_guest(kvm))
+		return -ENOTTY;
+
+	vmsa = kzalloc(sizeof(*vmsa), GFP_KERNEL);
+	if (!vmsa)
+		return -ENOMEM;
+
+	for (i = 0; i < kvm->created_vcpus; i++) {
+		struct vcpu_svm *svm = to_svm(kvm->vcpus[i]);
+		struct vmcb_save_area *save = get_vmsa(svm);
+
+		/* Set XCR0 before encrypting */
+		save->xcr0 = svm->vcpu.arch.xcr0;
+
+		/*
+		 * The LAUNCH_UPDATE_VMSA command will perform in-place
+		 * encryption of the VMSA memory content (i.e. it will write
+		 * the same memory region with the guest's key), so invalidate
+		 * it first.
+		 */
+		clflush_cache_range(svm->vmsa, PAGE_SIZE);
+
+		vmsa->handle = sev->handle;
+		vmsa->address = __sme_pa(svm->vmsa);
+		vmsa->len = PAGE_SIZE;
+		ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_VMSA, vmsa,
+				    &argp->error);
+		if (ret)
+			goto e_free;
+
+		svm->vcpu.arch.vmsa_encrypted = true;
+	}
+
+e_free:
+	kfree(vmsa);
+	return ret;
+}
+
 static int sev_launch_measure(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
 	void __user *measure = (void __user *)(uintptr_t)argp->data;
@@ -948,12 +1002,18 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_INIT:
 		r = sev_guest_init(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_ES_INIT:
+		r = sev_es_guest_init(kvm, &sev_cmd);
+		break;
 	case KVM_SEV_LAUNCH_START:
 		r = sev_launch_start(kvm, &sev_cmd);
 		break;
 	case KVM_SEV_LAUNCH_UPDATE_DATA:
 		r = sev_launch_update_data(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_LAUNCH_UPDATE_VMSA:
+		r = sev_launch_update_vmsa(kvm, &sev_cmd);
+		break;
 	case KVM_SEV_LAUNCH_MEASURE:
 		r = sev_launch_measure(kvm, &sev_cmd);
 		break;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES
  2020-09-14 20:15 ` [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES Tom Lendacky
@ 2020-09-14 21:26   ` Sean Christopherson
  2020-09-15 13:37     ` Tom Lendacky
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-14 21:26 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Mon, Sep 14, 2020 at 03:15:22PM -0500, Tom Lendacky wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Since the guest register state of an SEV-ES guest is encrypted, debugging
> is not supported. Update the code to prevent guest debugging when the
> guest is an SEV-ES guest. This includes adding a callable function that
> is used to determine if the guest supports being debugged.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/svm/svm.c          | 16 ++++++++++++++++
>  arch/x86/kvm/vmx/vmx.c          |  7 +++++++
>  arch/x86/kvm/x86.c              |  3 +++
>  4 files changed, 28 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c900992701d6..3e2a3d2a8ba8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1234,6 +1234,8 @@ struct kvm_x86_ops {
>  	void (*reg_read_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
>  	void (*reg_write_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg,
>  				   unsigned long val);
> +
> +	bool (*allow_debug)(struct kvm *kvm);

Why add both allow_debug() and vmsa_encrypted?  I assume there are scenarios
where allow_debug() != vmsa_encrypted?  E.g. is there a debug mode for SEV-ES
where the VMSA is not encrypted, but KVM (ironically) can't intercept #DBs or
something?

Alternatively, have you explored using a new VM_TYPE for SEV-ES guests?  With
a genericized vmsa_encrypted, that would allow something like the following
for scenarios where the VMSA is not (yet?) encrypted for an SEV-ES guest.  I
don't love bleeding the VM type into x86.c, but for one-off quirks like this
I think it'd be preferable to adding a kvm_x86_ops hook.

int kvm_arch_vcpu_ioctl_set_guest_debug(...)
{
	if (vcpu->arch.guest_state_protected ||
	    kvm->arch.vm_type == KVM_X86_SEV_ES_VM)
		return -EINVAL;
}

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 09/35] KVM: SVM: Do not emulate MMIO under SEV-ES
  2020-09-14 20:15 ` [RFC PATCH 09/35] KVM: SVM: Do not emulate MMIO " Tom Lendacky
@ 2020-09-14 21:33   ` Sean Christopherson
  2020-09-15 13:38     ` Tom Lendacky
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-14 21:33 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Mon, Sep 14, 2020 at 03:15:23PM -0500, Tom Lendacky wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> When a guest is running as an SEV-ES guest, it is not possible to emulate
> MMIO. Add support to prevent trying to perform MMIO emulation.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a5d0207e7189..2e1b8b876286 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5485,6 +5485,13 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
>  	if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu))
>  		emulation_type |= EMULTYPE_ALLOW_RETRY_PF;
>  emulate:
> +	/*
> +	 * When the guest is an SEV-ES guest, emulation is not possible.  Allow
> +	 * the guest to handle the MMIO emulation.
> +	 */
> +	if (vcpu->arch.vmsa_encrypted)
> +		return 1;

A better approach is to refactor need_emulation_on_page_fault() (the hook
that's just out of sight in this patch) into a more generic
kvm_x86_ops.is_emulatable() so that the latter can be used to kill emulation
everywhere, and for other reasons.  E.g. TDX obviously shares very similar
logic, but SGX also adds a case where KVM can theoretically end up in an
emulator path without the ability to access the necessary guest state.

I have exactly such a prep patch (because SGX and TDX...), I'll get it posted
in the next day or two.
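
Something along these lines, purely to sketch the direction (the hook name
and signature are illustrative, not the actual prep patch):

  static bool svm_is_emulatable(struct kvm_vcpu *vcpu, void *insn, int insn_len)
  {
  	/* Emulation needs readable guest state; an encrypted VMSA has none. */
  	return !vcpu->arch.vmsa_encrypted;
  }

and then kvm_mmu_page_fault() and friends would simply do:

  	if (!kvm_x86_ops.is_emulatable(vcpu, NULL, 0))
  		return 1;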

> +
>  	/*
>  	 * On AMD platforms, under certain conditions insn_len may be zero on #NPF.
>  	 * This can happen if a guest gets a page-fault on data access but the HW
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 25/35] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  2020-09-14 20:15 ` [RFC PATCH 25/35] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES Tom Lendacky
@ 2020-09-14 21:37   ` Sean Christopherson
  2020-09-15 14:19     ` Tom Lendacky
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-14 21:37 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Mon, Sep 14, 2020 at 03:15:39PM -0500, Tom Lendacky wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Since many of the registers used by the SEV-ES are encrypted and cannot
> be read or written, adjust the __get_sregs() / __set_sregs() to only get
> or set the registers being tracked (efer, cr0, cr4 and cr8) once the VMSA
> is encrypted.

Is there an actual use case for writing said registers after the VMSA is
encrypted?  Assuming there's a separate "debug mode" and live migration has
special logic, can KVM simply reject the ioctl() if guest state is protected?
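
E.g. something like this (sketch only, reusing the vmsa_encrypted flag from
this series):

int kvm_arch_vcpu_ioctl_set_sregs(...)
{
	if (vcpu->arch.vmsa_encrypted)
		return -EINVAL;
	...
}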

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 28/35] KVM: X86: Update kvm_skip_emulated_instruction() for an SEV-ES guest
  2020-09-14 20:15 ` [RFC PATCH 28/35] KVM: X86: Update kvm_skip_emulated_instruction() " Tom Lendacky
@ 2020-09-14 21:51   ` Sean Christopherson
  2020-09-15 14:57     ` Tom Lendacky
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-14 21:51 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Mon, Sep 14, 2020 at 03:15:42PM -0500, Tom Lendacky wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> The register state for an SEV-ES guest is encrypted so the value of the
> RIP cannot be updated. For an automatic exit, the RIP will be advanced
> as necessary. For a non-automatic exit, it is up to the #VC handler in
> the guest to advance the RIP.
> 
> Add support to skip any RIP updates in kvm_skip_emulated_instruction()
> for an SEV-ES guest.

Is there a reason this can't be handled in svm?  E.g. can KVM be reworked
to effectively split the emulation logic so that it's a bug for KVM to end
up trying to modify RIP?

Also, patch 06 modifies SVM's skip_emulated_instruction() to skip the RIP
update, but keeps the "svm_set_interrupt_shadow(vcpu, 0)" logic.  Seems like
either that change or this one is wrong.

> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kvm/x86.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 23564d02d158..1dbdca607511 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6874,13 +6874,17 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu)
>  
>  int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
>  {
> -	unsigned long rflags = kvm_x86_ops.get_rflags(vcpu);
> +	unsigned long rflags;
>  	int r;
>  
>  	r = kvm_x86_ops.skip_emulated_instruction(vcpu);
>  	if (unlikely(!r))
>  		return 0;
>  
> +	if (vcpu->arch.vmsa_encrypted)
> +		return 1;
> +
> +	rflags = kvm_x86_ops.get_rflags(vcpu);
>  	/*
>  	 * rflags is the old, "raw" value of the flags.  The new value has
>  	 * not been saved yet.
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 20/35] KVM: SVM: Add SEV/SEV-ES support for intercepting INVD
  2020-09-14 20:15 ` [RFC PATCH 20/35] KVM: SVM: Add SEV/SEV-ES support for intercepting INVD Tom Lendacky
@ 2020-09-14 22:00   ` Sean Christopherson
  2020-09-15 15:08     ` Tom Lendacky
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-14 22:00 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Mon, Sep 14, 2020 at 03:15:34PM -0500, Tom Lendacky wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> The INVD instruction intercept performs emulation. Emulation can't be done
> on an SEV or SEV-ES guest because the guest memory is encrypted.
> 
> Provide a specific intercept routine for the INVD intercept. Within this
> intercept routine, skip the instruction for an SEV or SEV-ES guest since
> it is emulated as a NOP anyway.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kvm/svm/svm.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 37c98e85aa62..ac64a5b128b2 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2275,6 +2275,17 @@ static int iret_interception(struct vcpu_svm *svm)
>  	return 1;
>  }
>  
> +static int invd_interception(struct vcpu_svm *svm)
> +{
> +	/*
> +	 * Can't do emulation on any type of SEV guest and INVD is emulated
> +	 * as a NOP, so just skip it.
> +	 */
> +	return (sev_guest(svm->vcpu.kvm))

Should this be a standalone/backported fix for SEV?

> +		? kvm_skip_emulated_instruction(&svm->vcpu)
> +		: kvm_emulate_instruction(&svm->vcpu, 0);
> +}
> +
>  static int invlpg_interception(struct vcpu_svm *svm)
>  {
>  	if (!static_cpu_has(X86_FEATURE_DECODEASSISTS))
> @@ -2912,7 +2923,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
>  	[SVM_EXIT_RDPMC]			= rdpmc_interception,
>  	[SVM_EXIT_CPUID]			= cpuid_interception,
>  	[SVM_EXIT_IRET]                         = iret_interception,
> -	[SVM_EXIT_INVD]                         = emulate_on_interception,
> +	[SVM_EXIT_INVD]                         = invd_interception,
>  	[SVM_EXIT_PAUSE]			= pause_interception,
>  	[SVM_EXIT_HLT]				= halt_interception,
>  	[SVM_EXIT_INVLPG]			= invlpg_interception,
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 22/35] KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
  2020-09-14 20:15 ` [RFC PATCH 22/35] KVM: SVM: Add support for CR0 " Tom Lendacky
@ 2020-09-14 22:13   ` Sean Christopherson
  2020-09-15 15:56     ` Tom Lendacky
  2020-11-30 18:15     ` Paolo Bonzini
  0 siblings, 2 replies; 86+ messages in thread
From: Sean Christopherson @ 2020-09-14 22:13 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Mon, Sep 14, 2020 at 03:15:36PM -0500, Tom Lendacky wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b65bd0c986d4..6f5988c305e1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -799,11 +799,29 @@ bool pdptrs_changed(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(pdptrs_changed);
>  
> +static void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0,
> +			     unsigned long cr0)

What about using __kvm_set_cr*() instead of kvm_post_set_cr*()?  That would
show that __kvm_set_cr*() is a subordinate of kvm_set_cr*(), and from the
SVM side would provide the hint that the code is skipping the front end of
kvm_set_cr*().

> +{
> +	unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;
> +
> +	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
> +		kvm_clear_async_pf_completion_queue(vcpu);
> +		kvm_async_pf_hash_reset(vcpu);
> +	}
> +
> +	if ((cr0 ^ old_cr0) & update_bits)
> +		kvm_mmu_reset_context(vcpu);
> +
> +	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
> +	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
> +	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
> +		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
> +}
> +
>  int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
>  {
>  	unsigned long old_cr0 = kvm_read_cr0(vcpu);
>  	unsigned long pdptr_bits = X86_CR0_CD | X86_CR0_NW | X86_CR0_PG;
> -	unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;
>  
>  	cr0 |= X86_CR0_ET;
>  
> @@ -842,22 +860,23 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
>  
>  	kvm_x86_ops.set_cr0(vcpu, cr0);
>  
> -	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
> -		kvm_clear_async_pf_completion_queue(vcpu);
> -		kvm_async_pf_hash_reset(vcpu);
> -	}
> +	kvm_post_set_cr0(vcpu, old_cr0, cr0);
>  
> -	if ((cr0 ^ old_cr0) & update_bits)
> -		kvm_mmu_reset_context(vcpu);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_set_cr0);
>  
> -	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
> -	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
> -	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
> -		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
> +int kvm_track_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)

I really dislike the "track" terminology.  For me, using "track" as the verb
in a function implies the function activates tracking.  But it's probably a
moot point, because similar to EFER, I don't see any reason to put the front
end of the emulation into x86.c.  Both getting old_cr0 and setting
vcpu->arch.cr0 can be done in svm.c
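
I.e. the SVM trap handler would own the front end and only call back into
x86.c for the post-set side effects, something like (sketch only, hypothetical
handler name; __kvm_set_cr0 here is the renamed kvm_post_set_cr0):

	static int cr0_write_trap(struct vcpu_svm *svm)
	{
		struct kvm_vcpu *vcpu = &svm->vcpu;
		unsigned long old_cr0 = kvm_read_cr0(vcpu);
		/* The CR0 write trap supplies the new value in EXITINFO1. */
		unsigned long cr0 = svm->vmcb->control.exit_info_1;

		vcpu->arch.cr0 = cr0;
		__kvm_set_cr0(vcpu, old_cr0, cr0);

		return kvm_complete_insn_gp(vcpu, 0);
	}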

> +{
> +	unsigned long old_cr0 = kvm_read_cr0(vcpu);
> +
> +	vcpu->arch.cr0 = cr0;
> +
> +	kvm_post_set_cr0(vcpu, old_cr0, cr0);
>  
>  	return 0;
>  }
> -EXPORT_SYMBOL_GPL(kvm_set_cr0);
> +EXPORT_SYMBOL_GPL(kvm_track_cr0);
>  
>  void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
>  {
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 23/35] KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
  2020-09-14 20:15 ` [RFC PATCH 23/35] KVM: SVM: Add support for CR4 " Tom Lendacky
@ 2020-09-14 22:16   ` Sean Christopherson
  2020-11-30 18:16     ` Paolo Bonzini
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-14 22:16 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Mon, Sep 14, 2020 at 03:15:37PM -0500, Tom Lendacky wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6f5988c305e1..5e5f1e8fed3a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1033,6 +1033,26 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_cr4);
>  
> +int kvm_track_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> +{
> +	unsigned long old_cr4 = kvm_read_cr4(vcpu);
> +	unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
> +				   X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
> +
> +	if (kvm_x86_ops.set_cr4(vcpu, cr4))
> +		return 1;

Pretty much all the same comments as EFER and CR0, e.g. call svm_set_cr4()
directly instead of bouncing through kvm_x86_ops.  And with that, this can
be called __kvm_set_cr4() to be consistent with __kvm_set_cr0().

> +
> +	if (((cr4 ^ old_cr4) & pdptr_bits) ||
> +	    (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
> +		kvm_mmu_reset_context(vcpu);
> +
> +	if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE))
> +		kvm_update_cpuid_runtime(vcpu);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_track_cr4);
> +
>  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  {
>  	bool skip_tlb_flush = false;
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 24/35] KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
  2020-09-14 20:15 ` [RFC PATCH 24/35] KVM: SVM: Add support for CR8 " Tom Lendacky
@ 2020-09-14 22:19   ` Sean Christopherson
  2020-09-15 15:57     ` Tom Lendacky
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-14 22:19 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Mon, Sep 14, 2020 at 03:15:38PM -0500, Tom Lendacky wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5e5f1e8fed3a..6e445a76b691 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1109,6 +1109,12 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_get_cr8);
>  
> +int kvm_track_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
> +{
> +	return kvm_set_cr8(vcpu, cr8);

I'm guessing this was added to achieve consistency at the SVM call sites.
With the previously suggested changes, kvm_track_cr8() can simply be
dropped.

> +}
> +EXPORT_SYMBOL_GPL(kvm_track_cr8);
> +
>  static void kvm_update_dr0123(struct kvm_vcpu *vcpu)
>  {
>  	int i;
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
                   ` (34 preceding siblings ...)
  2020-09-14 20:15 ` [RFC PATCH 35/35] KVM: SVM: Provide support to launch and run an SEV-ES guest Tom Lendacky
@ 2020-09-14 22:59 ` Sean Christopherson
  2020-09-15 17:22   ` Tom Lendacky
  35 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-14 22:59 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Mon, Sep 14, 2020 at 03:15:14PM -0500, Tom Lendacky wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> This patch series provides support for running SEV-ES guests under KVM.

From the x86/VMX side of things, the GPR hooks are the only changes that I
strongly dislike.

For the vmsa_encrypted flag and related things like allow_debug(), I'd
really like to aim for a common implementation between SEV-ES and TDX[*] from
the get go, within reason obviously.  From a code perspective, I don't think
it will be too onerous as the basic tenets are quite similar, e.g. guest
state is off limits, FPU state is autoswitched, etc..., but I suspect (or
maybe worry?) that there are enough minor differences that we'll want a more
generic way of marking ioctls() as disallowed to avoid having one-off checks
all over the place.

That being said, it may also be that there are some ioctls() that should be
disallowed under SEV-ES, but aren't in this series.  E.g. I assume
kvm_vcpu_ioctl_smi() should be rejected as KVM can't do the necessary
emulation (I assume this applies to vanilla SEV as well?).

One thought to try and reconcile the differences between SEV-ES and TDX would
be to explicitly list which ioctls() are and aren't supported and go from there.
E.g. if there is 95% overlap then we probably don't need to get fancy with
generic allow/deny.

Given that we don't yet have publicly available KVM code for TDX, what if I
generate and post a list of ioctls() that are denied by either SEV-ES or TDX,
organized by the denier(s)?  Then for the ioctls() that are denied by one and
not the other, we add a brief explanation of why it's denied?

If that sounds ok, I'll get the list and the TDX side of things posted
tomorrow.

Thanks!


[*] https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 05/35] KVM: SVM: Add initial support for SEV-ES GHCB access to KVM
       [not found]   ` <20200914205801.GA7084@sjchrist-ice>
@ 2020-09-15 13:24     ` Tom Lendacky
  2020-09-15 16:28       ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 13:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/14/20 3:58 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:19PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> Provide initial support for accessing the GHCB when needing to access
>> registers for an SEV-ES guest. The support consists of:
>>
>>   - Accessing the GHCB instead of the VMSA when reading and writing
>>     guest registers (after the VMSA has been encrypted).
>>   - Creating register access override functions for reading and writing
>>     guest registers from the common KVM support.
>>   - Allocating pages for the VMSA and GHCB when creating each vCPU
>>     - The VMSA page holds the encrypted VMSA for the vCPU
>>     - The GHCB page is used to hold a copy of the guest GHCB during
>>       VMGEXIT processing.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h  |   7 ++
>>  arch/x86/include/asm/msr-index.h |   1 +
>>  arch/x86/kvm/kvm_cache_regs.h    |  30 +++++--
>>  arch/x86/kvm/svm/svm.c           | 138 ++++++++++++++++++++++++++++++-
>>  arch/x86/kvm/svm/svm.h           |  65 ++++++++++++++-
>>  5 files changed, 230 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 5303dbc5c9bc..c900992701d6 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -788,6 +788,9 @@ struct kvm_vcpu_arch {
>>  
>>  	/* AMD MSRC001_0015 Hardware Configuration */
>>  	u64 msr_hwcr;
>> +
>> +	/* SEV-ES support */
>> +	bool vmsa_encrypted;
> 
> 
> Peeking a little into the future, Intel needs a very similar flag for TDX[*].
> At a glance throughout the series, I don't see anything that is super SEV-ES
> specific, so I think we could do s/vmsa_encrypted/guest_state_protected (or
> something along those lines).

Yup, I can do that.

> 
> [*] https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
> 
>>  };
>>  
>>  struct kvm_lpage_info {
>> @@ -1227,6 +1230,10 @@ struct kvm_x86_ops {
>>  	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>>  
>>  	void (*migrate_timers)(struct kvm_vcpu *vcpu);
>> +
>> +	void (*reg_read_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
>> +	void (*reg_write_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg,
>> +				   unsigned long val);
>>  };
>>  
>>  struct kvm_x86_nested_ops {
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index 249a4147c4b2..16f5b20bb099 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -466,6 +466,7 @@
>>  #define MSR_AMD64_IBSBRTARGET		0xc001103b
>>  #define MSR_AMD64_IBSOPDATA4		0xc001103d
>>  #define MSR_AMD64_IBS_REG_COUNT_MAX	8 /* includes MSR_AMD64_IBSBRTARGET */
>> +#define MSR_AMD64_VM_PAGE_FLUSH		0xc001011e
>>  #define MSR_AMD64_SEV_ES_GHCB		0xc0010130
>>  #define MSR_AMD64_SEV			0xc0010131
>>  #define MSR_AMD64_SEV_ENABLED_BIT	0
>> diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
>> index cfe83d4ae625..e87eb90999d5 100644
>> --- a/arch/x86/kvm/kvm_cache_regs.h
>> +++ b/arch/x86/kvm/kvm_cache_regs.h
>> @@ -9,15 +9,21 @@
>>  	(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR  \
>>  	 | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_PGE | X86_CR4_TSD)
>>  
>> -#define BUILD_KVM_GPR_ACCESSORS(lname, uname)				      \
>> -static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)\
>> -{									      \
>> -	return vcpu->arch.regs[VCPU_REGS_##uname];			      \
>> -}									      \
>> -static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu,	      \
>> -						unsigned long val)	      \
>> -{									      \
>> -	vcpu->arch.regs[VCPU_REGS_##uname] = val;			      \
>> +#define BUILD_KVM_GPR_ACCESSORS(lname, uname)					\
>> +static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)	\
>> +{										\
>> +	if (kvm_x86_ops.reg_read_override)					\
>> +		kvm_x86_ops.reg_read_override(vcpu, VCPU_REGS_##uname);		\
>> +										\
>> +	return vcpu->arch.regs[VCPU_REGS_##uname];				\
>> +}										\
>> +static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu,		\
>> +						unsigned long val)		\
>> +{										\
>> +	if (kvm_x86_ops.reg_write_override)					\
>> +		kvm_x86_ops.reg_write_override(vcpu, VCPU_REGS_##uname, val);	\
>> +										\
>> +	vcpu->arch.regs[VCPU_REGS_##uname] = val;				\
>>  }
>>  BUILD_KVM_GPR_ACCESSORS(rax, RAX)
>>  BUILD_KVM_GPR_ACCESSORS(rbx, RBX)
>> @@ -67,6 +73,9 @@ static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
>>  	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
>>  		return 0;
>>  
>> +	if (kvm_x86_ops.reg_read_override)
>> +		kvm_x86_ops.reg_read_override(vcpu, reg);
>> +
>>  	if (!kvm_register_is_available(vcpu, reg))
>>  		kvm_x86_ops.cache_reg(vcpu, reg);
>>  
>> @@ -79,6 +88,9 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu, int reg,
>>  	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
>>  		return;
>>  
>> +	if (kvm_x86_ops.reg_write_override)
>> +		kvm_x86_ops.reg_write_override(vcpu, reg, val);
> 
> 
> There has to be a more optimal approach for propagating registers between
> vcpu->arch.regs and the VMSA than adding a per-GPR hook.  Why not simply
> copy the entire set of registers to/from the VMSA on every exit and entry?
> AFAICT, valid_bits is only used in the read path, and KVM doesn't do anything
> sophisticated when it hits a !valid_bits read.

That would probably be ok. And actually, the code might be able to just
check the GHCB valid bitmap for valid regs on exit, copy them and then
clear the bitmap. The write code could check if vmsa_encrypted is set and
then set a "valid" bit for the reg that could be used to set regs on entry.

I'm not sure if turning kvm_vcpu_arch.regs into a struct and adding a
valid bit would be overkill or not.
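
As a rough illustration of the bitmap-driven copy on exit (the function name
is made up; the fields and GHCB_BITMAP_IDX() follow this series):

	static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
	{
		struct vmcb_save_area *ghcb_save = &svm->ghcb->save;
		struct kvm_vcpu *vcpu = &svm->vcpu;
		unsigned long *bitmap = (unsigned long *)ghcb_save->valid_bitmap;

		/* Pull in only the GPRs the guest marked valid in the GHCB. */
		if (test_bit(GHCB_BITMAP_IDX(rax), bitmap))
			vcpu->arch.regs[VCPU_REGS_RAX] = ghcb_save->rax;
		if (test_bit(GHCB_BITMAP_IDX(rcx), bitmap))
			vcpu->arch.regs[VCPU_REGS_RCX] = ghcb_save->rcx;
		/* ...and so on for the handful of GPRs the GHCB protocol defines. */

		/* Clear the bitmap so stale values aren't consumed on a later exit. */
		memset(ghcb_save->valid_bitmap, 0, sizeof(ghcb_save->valid_bitmap));
	}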

> 
> If unconditional copying has a noticeable impact on e.g. IRQ handling
> latency, the save/restore could be limited to exits that may access guest
> state, which is presumably a well-defined, limited list.  Such exits are
> basically a slow path anyways, especially if the guest kernel is taking a
> #VC on the front end.  Adding hooks to KVM_{GET,SET}_REGS to ensure userspace
> accesses are handled correctly would be trivial.
> 
> Adding per-GPR hooks will bloat common KVM for both VMX and SVM, and will
> likely have a noticeable performance impact on SVM due to adding 60-70 cycles
> to every GPR access for the retpoline.  Static calls will take the sting out
> of that, but it's still a lot of code bytes that, IMO, are completely
> unnecessary.

Yes, definitely something I need to look into more.

> 
>> +
>>  	vcpu->arch.regs[reg] = val;
>>  	kvm_register_mark_dirty(vcpu, reg);
>>  }
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 779c167e42cc..d1f52211627a 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -1175,6 +1175,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
>>  	struct page *msrpm_pages;
>>  	struct page *hsave_page;
>>  	struct page *nested_msrpm_pages;
>> +	struct page *vmsa_page = NULL;
>>  	int err;
>>  
>>  	BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
>> @@ -1197,9 +1198,19 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
>>  	if (!hsave_page)
>>  		goto free_page3;
>>  
>> +	if (sev_es_guest(svm->vcpu.kvm)) {
>> +		/*
>> +		 * SEV-ES guests require a separate VMSA page used to contain
>> +		 * the encrypted register state of the guest.
>> +		 */
>> +		vmsa_page = alloc_page(GFP_KERNEL);
>> +		if (!vmsa_page)
>> +			goto free_page4;
>> +	}
>> +
>>  	err = avic_init_vcpu(svm);
>>  	if (err)
>> -		goto free_page4;
>> +		goto free_page5;
>>  
>>  	/* We initialize this flag to true to make sure that the is_running
>>  	 * bit would be set the first time the vcpu is loaded.
>> @@ -1219,6 +1230,12 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
>>  	svm->vmcb = page_address(page);
>>  	clear_page(svm->vmcb);
>>  	svm->vmcb_pa = __sme_set(page_to_pfn(page) << PAGE_SHIFT);
>> +
>> +	if (vmsa_page) {
>> +		svm->vmsa = page_address(vmsa_page);
>> +		clear_page(svm->vmsa);
>> +	}
>> +
>>  	svm->asid_generation = 0;
>>  	init_vmcb(svm);
>>  
>> @@ -1227,6 +1244,9 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
>>  
>>  	return 0;
>>  
>> +free_page5:
>> +	if (vmsa_page)
>> +		__free_page(vmsa_page);
>>  free_page4:
>>  	__free_page(hsave_page);
>>  free_page3:
>> @@ -1258,6 +1278,26 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>>  	 */
>>  	svm_clear_current_vmcb(svm->vmcb);
>>  
>> +	if (sev_es_guest(vcpu->kvm)) {
>> +		struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
>> +
>> +		if (vcpu->arch.vmsa_encrypted) {
>> +			u64 page_to_flush;
>> +
>> +			/*
>> +			 * The VMSA page was used by hardware to hold guest
>> +			 * encrypted state, be sure to flush it before returning
>> +			 * it to the system. This is done using the VM Page
>> +			 * Flush MSR (which takes the page virtual address and
>> +			 * guest ASID).
>> +			 */
>> +			page_to_flush = (u64)svm->vmsa | sev->asid;
>> +			wrmsrl(MSR_AMD64_VM_PAGE_FLUSH, page_to_flush);
>> +		}
>> +
>> +		__free_page(virt_to_page(svm->vmsa));
>> +	}
>> +
>>  	__free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT));
>>  	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
>>  	__free_page(virt_to_page(svm->nested.hsave));
>> @@ -4012,6 +4052,99 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
>>  		   (svm->vmcb->control.intercept & (1ULL << INTERCEPT_INIT));
>>  }
>>  
>> +/*
>> + * These return values represent the offset in quad words within the VM save
>> + * area. This allows them to be accessed by casting the save area to a u64
>> + * array.
>> + */
>> +#define VMSA_REG_ENTRY(_field)	 (offsetof(struct vmcb_save_area, _field) / sizeof(u64))
>> +#define VMSA_REG_UNDEF		 VMSA_REG_ENTRY(valid_bitmap)
>> +static inline unsigned int vcpu_to_vmsa_entry(enum kvm_reg reg)
>> +{
>> +	switch (reg) {
>> +	case VCPU_REGS_RAX:	return VMSA_REG_ENTRY(rax);
>> +	case VCPU_REGS_RBX:	return VMSA_REG_ENTRY(rbx);
>> +	case VCPU_REGS_RCX:	return VMSA_REG_ENTRY(rcx);
>> +	case VCPU_REGS_RDX:	return VMSA_REG_ENTRY(rdx);
>> +	case VCPU_REGS_RSP:	return VMSA_REG_ENTRY(rsp);
>> +	case VCPU_REGS_RBP:	return VMSA_REG_ENTRY(rbp);
>> +	case VCPU_REGS_RSI:	return VMSA_REG_ENTRY(rsi);
>> +	case VCPU_REGS_RDI:	return VMSA_REG_ENTRY(rdi);
>> +#ifdef CONFIG_X86_64
>> +	case VCPU_REGS_R8:	return VMSA_REG_ENTRY(r8);
>> +	case VCPU_REGS_R9:	return VMSA_REG_ENTRY(r9);
>> +	case VCPU_REGS_R10:	return VMSA_REG_ENTRY(r10);
>> +	case VCPU_REGS_R11:	return VMSA_REG_ENTRY(r11);
>> +	case VCPU_REGS_R12:	return VMSA_REG_ENTRY(r12);
>> +	case VCPU_REGS_R13:	return VMSA_REG_ENTRY(r13);
>> +	case VCPU_REGS_R14:	return VMSA_REG_ENTRY(r14);
>> +	case VCPU_REGS_R15:	return VMSA_REG_ENTRY(r15);
>> +#endif
>> +	case VCPU_REGS_RIP:	return VMSA_REG_ENTRY(rip);
>> +	default:
>> +		WARN_ONCE(1, "unsupported VCPU to VMSA register conversion\n");
>> +		return VMSA_REG_UNDEF;
>> +	}
>> +}
>> +
>> +/* For SEV-ES guests, populate the vCPU register from the appropriate VMSA/GHCB */
>> +static void svm_reg_read_override(struct kvm_vcpu *vcpu, enum kvm_reg reg)
>> +{
>> +	struct vmcb_save_area *vmsa;
>> +	struct vcpu_svm *svm;
>> +	unsigned int entry;
>> +	unsigned long val;
>> +	u64 *vmsa_reg;
>> +
>> +	if (!sev_es_guest(vcpu->kvm))
>> +		return;
>> +
>> +	entry = vcpu_to_vmsa_entry(reg);
>> +	if (entry == VMSA_REG_UNDEF)
>> +		return;
>> +
>> +	svm = to_svm(vcpu);
>> +	vmsa = get_vmsa(svm);
>> +	vmsa_reg = (u64 *)vmsa;
>> +	val = (unsigned long)vmsa_reg[entry];
>> +
>> +	/* If a GHCB is mapped, check the bitmap of valid entries */
>> +	if (svm->ghcb) {
>> +		if (!test_bit(entry, (unsigned long *)vmsa->valid_bitmap))
>> +			val = 0;
> 
> Is KVM relying on this being 0?  Would it make sense to stuff something like
> 0xaaaa... or 0xdeadbeefdeadbeef so that consumption of bogus data is more
> noticeable?

No, KVM isn't relying on this being 0. I thought about using something
other than 0 here, but settled on just using 0. I'm open to changing that,
though. I'm not sure if there's an easy way to short-circuit the intercept
and respond back with an error at this point; that would be optimal.

> 
>> +	}
>> +
>> +	vcpu->arch.regs[reg] = val;
>> +}
>> +
>> +/* For SEV-ES guests, set the vCPU register in the appropriate VMSA */
>> +static void svm_reg_write_override(struct kvm_vcpu *vcpu, enum kvm_reg reg,
>> +				   unsigned long val)
>> +{
>> +	struct vmcb_save_area *vmsa;
>> +	struct vcpu_svm *svm;
>> +	unsigned int entry;
>> +	u64 *vmsa_reg;
>> +
>> +	entry = vcpu_to_vmsa_entry(reg);
>> +	if (entry == VMSA_REG_UNDEF)
>> +		return;
>> +
>> +	svm = to_svm(vcpu);
>> +	vmsa = get_vmsa(svm);
>> +	vmsa_reg = (u64 *)vmsa;
>> +
>> +	/* If a GHCB is mapped, set the bit to indicate a valid entry */
>> +	if (svm->ghcb) {
>> +		unsigned int index = entry / 8;
>> +		unsigned int shift = entry % 8;
>> +
>> +		vmsa->valid_bitmap[index] |= BIT(shift);
>> +	}
>> +
>> +	vmsa_reg[entry] = val;
>> +}
>> +
>>  static void svm_vm_destroy(struct kvm *kvm)
>>  {
>>  	avic_vm_destroy(kvm);
>> @@ -4150,6 +4283,9 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>>  	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>>  
>>  	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
>> +
>> +	.reg_read_override = svm_reg_read_override,
>> +	.reg_write_override = svm_reg_write_override,
>>  };
>>  
>>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index f42ba9d158df..ff587536f571 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -159,6 +159,10 @@ struct vcpu_svm {
>>  	 */
>>  	struct list_head ir_list;
>>  	spinlock_t ir_list_lock;
>> +
>> +	/* SEV-ES support */
>> +	struct vmcb_save_area *vmsa;
>> +	struct ghcb *ghcb;
>>  };
>>  
>>  struct svm_cpu_data {
>> @@ -509,9 +513,34 @@ void sev_hardware_teardown(void);
>>  
>>  static inline struct vmcb_save_area *get_vmsa(struct vcpu_svm *svm)
>>  {
>> -	return &svm->vmcb->save;
>> +	struct vmcb_save_area *vmsa;
>> +
>> +	if (sev_es_guest(svm->vcpu.kvm)) {
>> +		/*
>> +		 * Before LAUNCH_UPDATE_VMSA, use the actual SEV-ES save area
>> +		 * to construct the initial state.  Afterwards, use the mapped
>> +		 * GHCB in a VMGEXIT or the traditional save area as a scratch
>> +		 * area when outside of a VMGEXIT.
>> +		 */
>> +		if (svm->vcpu.arch.vmsa_encrypted) {
>> +			if (svm->ghcb)
>> +				vmsa = &svm->ghcb->save;
>> +			else
>> +				vmsa = &svm->vmcb->save;
>> +		} else {
>> +			vmsa = svm->vmsa;
>> +		}
> 
> Not sure if it's actually better, but this whole thing could be:
> 
> 	if (!sev_es_guest(svm->vcpu.kvm))
> 		return &svm->vmcb->save;
> 
> 	if (!svm->vcpu.arch.vmsa_encrypted)
> 		return svm->vmsa;
> 
> 	return svm->ghcb ? &svm->ghcb->save : &svm->vmcb->save;
> 

It does look cleaner.

> 
>> +	} else {
>> +		vmsa = &svm->vmcb->save;
>> +	}
>> +
>> +	return vmsa;
>>  }
>>  
>> +#define SEV_ES_SET_VALID(_vmsa, _field)					\
>> +	__set_bit(GHCB_BITMAP_IDX(_field),				\
>> +		  (unsigned long *)(_vmsa)->valid_bitmap)
>> +
>>  #define DEFINE_VMSA_SEGMENT_ENTRY(_field, _entry, _size)		\
>>  	static inline _size						\
>>  	svm_##_field##_read_##_entry(struct vcpu_svm *svm)		\
>> @@ -528,6 +557,9 @@ static inline struct vmcb_save_area *get_vmsa(struct vcpu_svm *svm)
>>  		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
>>  									\
>>  		vmsa->_field._entry = value;				\
>> +		if (svm->vcpu.arch.vmsa_encrypted) {			\
> 
> Pretty sure braces are unnecessary on all these.

Yup, I can get rid of them.

Thanks,
Tom

> 
>> +			SEV_ES_SET_VALID(vmsa, _field);			\
>> +		}							\
>>  	}								\
>>  
>>  #define DEFINE_VMSA_SEGMENT_ACCESSOR(_field)				\
>> @@ -551,6 +583,9 @@ static inline struct vmcb_save_area *get_vmsa(struct vcpu_svm *svm)
>>  		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
>>  									\
>>  		vmsa->_field = *seg;					\
>> +		if (svm->vcpu.arch.vmsa_encrypted) {			\
>> +			SEV_ES_SET_VALID(vmsa, _field);			\
>> +		}							\
>>  	}
>>  
>>  DEFINE_VMSA_SEGMENT_ACCESSOR(cs)
>> @@ -579,6 +614,9 @@ DEFINE_VMSA_SEGMENT_ACCESSOR(tr)
>>  		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
>>  									\
>>  		vmsa->_field = value;					\
>> +		if (svm->vcpu.arch.vmsa_encrypted) {			\
>> +			SEV_ES_SET_VALID(vmsa, _field);			\
>> +		}							\
>>  	}								\
>>  									\
>>  	static inline void						\
>> @@ -587,6 +625,9 @@ DEFINE_VMSA_SEGMENT_ACCESSOR(tr)
>>  		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
>>  									\
>>  		vmsa->_field &= value;					\
>> +		if (svm->vcpu.arch.vmsa_encrypted) {			\
>> +			SEV_ES_SET_VALID(vmsa, _field);			\
>> +		}							\
>>  	}								\
>>  									\
>>  	static inline void						\
>> @@ -595,6 +636,9 @@ DEFINE_VMSA_SEGMENT_ACCESSOR(tr)
>>  		struct vmcb_save_area *vmsa = get_vmsa(svm);		\
>>  									\
>>  		vmsa->_field |= value;					\
>> +		if (svm->vcpu.arch.vmsa_encrypted) {			\
>> +			SEV_ES_SET_VALID(vmsa, _field);			\
>> +		}							\

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES
  2020-09-14 21:26   ` Sean Christopherson
@ 2020-09-15 13:37     ` Tom Lendacky
  2020-09-15 16:30       ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 13:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/14/20 4:26 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:22PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> Since the guest register state of an SEV-ES guest is encrypted, debugging
>> is not supported. Update the code to prevent guest debugging when the
>> guest is an SEV-ES guest. This includes adding a callable function that
>> is used to determine if the guest supports being debugged.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |  2 ++
>>  arch/x86/kvm/svm/svm.c          | 16 ++++++++++++++++
>>  arch/x86/kvm/vmx/vmx.c          |  7 +++++++
>>  arch/x86/kvm/x86.c              |  3 +++
>>  4 files changed, 28 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index c900992701d6..3e2a3d2a8ba8 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1234,6 +1234,8 @@ struct kvm_x86_ops {
>>  	void (*reg_read_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
>>  	void (*reg_write_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg,
>>  				   unsigned long val);
>> +
>> +	bool (*allow_debug)(struct kvm *kvm);
> 
> Why add both allow_debug() and vmsa_encrypted?  I assume there are scenarios
> where allow_debug() != vmsa_encrypted?  E.g. is there a debug mode for SEV-ES
> where the VMSA is not encrypted, but KVM (ironically) can't intercept #DBs or
> something?

No, once the guest has had LAUNCH_UPDATE_VMSA run against the vCPUs, the
vCPU states are all encrypted. But that doesn't mean that debugging can't
be supported in the future.

> 
> Alternatively, have you explored using a new VM_TYPE for SEV-ES guests?  With
> a genericized vmsa_encrypted, that would allow something like the following
> for scenarios where the VMSA is not (yet?) encrypted for an SEV-ES guest.  I
> don't love bleeding the VM type into x86.c, but for one-off quirks like this
> I think it'd be preferable to adding a kvm_x86_ops hook.
> 
> int kvm_arch_vcpu_ioctl_set_guest_debug(...)
> {
> 	if (vcpu->arch.guest_state_protected ||
> 	    kvm->arch.vm_type == KVM_X86_SEV_ES_VM)
> 		return -EINVAL;
> }
> 

I haven't explored that, I'll look into it.

Thanks,
Tom

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 09/35] KVM: SVM: Do not emulate MMIO under SEV-ES
  2020-09-14 21:33   ` Sean Christopherson
@ 2020-09-15 13:38     ` Tom Lendacky
  0 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 13:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/14/20 4:33 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:23PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> When a guest is running as an SEV-ES guest, it is not possible to emulate
>> MMIO. Add support to prevent trying to perform MMIO emulation.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/kvm/mmu/mmu.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index a5d0207e7189..2e1b8b876286 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -5485,6 +5485,13 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
>>  	if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu))
>>  		emulation_type |= EMULTYPE_ALLOW_RETRY_PF;
>>  emulate:
>> +	/*
>> +	 * When the guest is an SEV-ES guest, emulation is not possible.  Allow
>> +	 * the guest to handle the MMIO emulation.
>> +	 */
>> +	if (vcpu->arch.vmsa_encrypted)
>> +		return 1;
> 
> A better approach is to refactor need_emulation_on_page_fault() (the hook
> that's just out of sight in this patch) into a more generic
> kvm_x86_ops.is_emulatable() so that the latter can be used to kill emulation
> everywhere, and for other reasons.  E.g. TDX obviously shares very similar
> logic, but SGX also adds a case where KVM can theoretically end up in an
> emulator path without the ability to access the necessary guest state.
> 
> I have exactly such a prep patch (because SGX and TDX...), I'll get it posted
> in the next day or two.

Sounds good. I'll check it out when it's posted.

Thanks,
Tom

> 
>> +
>>  	/*
>>  	 * On AMD platforms, under certain conditions insn_len may be zero on #NPF.
>>  	 * This can happen if a guest gets a page-fault on data access but the HW
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 25/35] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  2020-09-14 21:37   ` Sean Christopherson
@ 2020-09-15 14:19     ` Tom Lendacky
       [not found]       ` <20200915163342.GC8420@sjchrist-ice>
  0 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 14:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/14/20 4:37 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:39PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> Since many of the registers used by the SEV-ES are encrypted and cannot
>> be read or written, adjust the __get_sregs() / __set_sregs() to only get
>> or set the registers being tracked (efer, cr0, cr4 and cr8) once the VMSA
>> is encrypted.
> 
> Is there an actual use case for writing said registers after the VMSA is
> encrypted?  Assuming there's a separate "debug mode" and live migration has
> special logic, can KVM simply reject the ioctl() if guest state is protected?

Yeah, I originally had it that way, but one of the folks looking at live
migration for SEV-ES thought allowing it would be easier given the way Qemu
does things. But I think it's easy enough to batch the tracked registers into
the VMSA state that is transferred during live migration. Let me check that
out; the SET ioctl() could then likely just skip all the regs.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 26/35] KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
       [not found]   ` <20200914213917.GD7192@sjchrist-ice>
@ 2020-09-15 14:25     ` Tom Lendacky
  2020-09-15 16:34       ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 14:25 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/14/20 4:39 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:40PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> The guest FPU is automatically restored on VMRUN and saved on VMEXIT by
>> the hardware, so there is no reason to do this in KVM.
> 
> I assume hardware has its own buffer?  If so, a better approach would be to
> not allocate arch.guest_fpu in the first place, and then rework KVM to key
> off !guest_fpu.

Yup, let me look into that.
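
A quick sketch of what keying off !guest_fpu could look like (assuming the
allocation is simply skipped for SEV-ES vCPUs):

	static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
	{
		fpregs_lock();

		kvm_save_current_fpu(vcpu->arch.user_fpu);

		/*
		 * No guest_fpu means the guest FPU state lives in the
		 * (encrypted) VMSA and is saved/restored by hardware, so
		 * there is nothing to load here.  PKRU is separately
		 * restored in kvm_x86_ops.run.
		 */
		if (vcpu->arch.guest_fpu)
			__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
						~XFEATURE_MASK_PKRU);

		fpregs_mark_activate();
		fpregs_unlock();
	}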

Thanks,
Tom

> 
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/kvm/svm/svm.c |  8 ++++++--
>>  arch/x86/kvm/x86.c     | 18 ++++++++++++++----
>>  2 files changed, 20 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index b35c2de1130c..48699c41b62a 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3682,7 +3682,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>  		svm_set_dr6(svm, DR6_FIXED_1 | DR6_RTM);
>>  
>>  	clgi();
>> -	kvm_load_guest_xsave_state(vcpu);
>> +
>> +	if (!sev_es_guest(svm->vcpu.kvm))
>> +		kvm_load_guest_xsave_state(vcpu);
>>  
>>  	if (lapic_in_kernel(vcpu) &&
>>  		vcpu->arch.apic->lapic_timer.timer_advance_ns)
>> @@ -3728,7 +3730,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>  	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
>>  		kvm_before_interrupt(&svm->vcpu);
>>  
>> -	kvm_load_host_xsave_state(vcpu);
>> +	if (!sev_es_guest(svm->vcpu.kvm))
>> +		kvm_load_host_xsave_state(vcpu);
>> +
>>  	stgi();
>>  
>>  	/* Any pending NMI will happen here */
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 76efe70cd635..a53e24c1c5d1 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -8896,9 +8896,14 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>>  
>>  	kvm_save_current_fpu(vcpu->arch.user_fpu);
>>  
>> -	/* PKRU is separately restored in kvm_x86_ops.run.  */
>> -	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
>> -				~XFEATURE_MASK_PKRU);
>> +	/*
>> +	 * An encrypted save area means that the guest state can't be
>> +	 * set by the hypervisor, so skip trying to set it.
>> +	 */
>> +	if (!vcpu->arch.vmsa_encrypted)
>> +		/* PKRU is separately restored in kvm_x86_ops.run. */
>> +		__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
>> +					~XFEATURE_MASK_PKRU);
>>  
>>  	fpregs_mark_activate();
>>  	fpregs_unlock();
>> @@ -8911,7 +8916,12 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>>  {
>>  	fpregs_lock();
>>  
>> -	kvm_save_current_fpu(vcpu->arch.guest_fpu);
>> +	/*
>> +	 * An encrypted save area means that the guest state can't be
>> +	 * read/saved by the hypervisor, so skip trying to save it.
>> +	 */
>> +	if (!vcpu->arch.vmsa_encrypted)
>> +		kvm_save_current_fpu(vcpu->arch.guest_fpu);
>>  
>>  	copy_kernel_to_fpregs(&vcpu->arch.user_fpu->state);
>>  
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 28/35] KVM: X86: Update kvm_skip_emulated_instruction() for an SEV-ES guest
  2020-09-14 21:51   ` Sean Christopherson
@ 2020-09-15 14:57     ` Tom Lendacky
  0 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 14:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/14/20 4:51 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:42PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> The register state for an SEV-ES guest is encrypted so the value of the
>> RIP cannot be updated. For an automatic exit, the RIP will be advanced
>> as necessary. For a non-automatic exit, it is up to the #VC handler in
>> the guest to advance the RIP.
>>
>> Add support to skip any RIP updates in kvm_skip_emulated_instruction()
>> for an SEV-ES guest.
> 
> Is there a reason this can't be handled in svm?  E.g. can KVM be reworked
> to effectively split the emulation logic so that it's a bug for KVM to end
> up trying to modify RIP?
> 
> Also, patch 06 modifies SVM's skip_emulated_instruction() to skip the RIP
> update, but keeps the "svm_set_interrupt_shadow(vcpu, 0)" logic.  Seems like
> either that change or this one is wrong.

I added this because of the get_rflags() call. But let me look into
changing get_rflags() in svm.c and see if this patch can go away.

Thanks,
Tom

> 
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/kvm/x86.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 23564d02d158..1dbdca607511 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -6874,13 +6874,17 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu)
>>  
>>  int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
>>  {
>> -	unsigned long rflags = kvm_x86_ops.get_rflags(vcpu);
>> +	unsigned long rflags;
>>  	int r;
>>  
>>  	r = kvm_x86_ops.skip_emulated_instruction(vcpu);
>>  	if (unlikely(!r))
>>  		return 0;
>>  
>> +	if (vcpu->arch.vmsa_encrypted)
>> +		return 1;
>> +
>> +	rflags = kvm_x86_ops.get_rflags(vcpu);
>>  	/*
>>  	 * rflags is the old, "raw" value of the flags.  The new value has
>>  	 * not been saved yet.
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 20/35] KVM: SVM: Add SEV/SEV-ES support for intercepting INVD
  2020-09-14 22:00   ` Sean Christopherson
@ 2020-09-15 15:08     ` Tom Lendacky
  0 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 15:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/14/20 5:00 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:34PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> The INVD instruction intercept performs emulation. Emulation can't be done
>> on an SEV or SEV-ES guest because the guest memory is encrypted.
>>
>> Provide a specific intercept routine for the INVD intercept. Within this
>> intercept routine, skip the instruction for an SEV or SEV-ES guest since
>> it is emulated as a NOP anyway.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/kvm/svm/svm.c | 13 ++++++++++++-
>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 37c98e85aa62..ac64a5b128b2 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -2275,6 +2275,17 @@ static int iret_interception(struct vcpu_svm *svm)
>>  	return 1;
>>  }
>>  
>> +static int invd_interception(struct vcpu_svm *svm)
>> +{
>> +	/*
>> +	 * Can't do emulation on any type of SEV guest and INVD is emulated
>> +	 * as a NOP, so just skip it.
>> +	 */
>> +	return (sev_guest(svm->vcpu.kvm))
> 
> Should this be a standalone/backported fix for SEV?

Yes. Let me split it out and send it separately.

Thanks,
Tom

> 
>> +		? kvm_skip_emulated_instruction(&svm->vcpu)
>> +		: kvm_emulate_instruction(&svm->vcpu, 0);
>> +}
>> +
>>  static int invlpg_interception(struct vcpu_svm *svm)
>>  {
>>  	if (!static_cpu_has(X86_FEATURE_DECODEASSISTS))
>> @@ -2912,7 +2923,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
>>  	[SVM_EXIT_RDPMC]			= rdpmc_interception,
>>  	[SVM_EXIT_CPUID]			= cpuid_interception,
>>  	[SVM_EXIT_IRET]                         = iret_interception,
>> -	[SVM_EXIT_INVD]                         = emulate_on_interception,
>> +	[SVM_EXIT_INVD]                         = invd_interception,
>>  	[SVM_EXIT_PAUSE]			= pause_interception,
>>  	[SVM_EXIT_HLT]				= halt_interception,
>>  	[SVM_EXIT_INVLPG]			= invlpg_interception,
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 21/35] KVM: SVM: Add support for EFER write traps for an SEV-ES guest
       [not found]   ` <20200914220800.GI7192@sjchrist-ice>
@ 2020-09-15 15:45     ` Tom Lendacky
  0 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 15:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/14/20 5:08 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:35PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> For SEV-ES guests, the interception of EFER write access is not
>> recommended. EFER interception occurs prior to EFER being modified and
>> the hypervisor is unable to modify EFER itself because the register is
>> located in the encrypted register state.
>>
>> SEV-ES guests introduce a new EFER write trap. This trap provides
>> intercept support of an EFER write after it has been modified. The new
>> EFER value is provided in the VMCB EXITINFO1 field, allowing the
>> hypervisor to track the setting of the guest EFER.
>>
>> Add support to track the value of the guest EFER value using the EFER
>> write trap so that the hypervisor understands the guest operating mode.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |  1 +
>>  arch/x86/include/uapi/asm/svm.h |  2 ++
>>  arch/x86/kvm/svm/svm.c          | 12 ++++++++++++
>>  arch/x86/kvm/x86.c              | 12 ++++++++++++
>>  4 files changed, 27 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 7320a9c68a5a..b535b690eb66 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1427,6 +1427,7 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
>>  int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
>>  		    int reason, bool has_error_code, u32 error_code);
>>  
>> +int kvm_track_efer(struct kvm_vcpu *vcpu, u64 efer);
>>  int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
>>  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
>>  int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
>> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
>> index 0bc3942ffdd3..ce937a242995 100644
>> --- a/arch/x86/include/uapi/asm/svm.h
>> +++ b/arch/x86/include/uapi/asm/svm.h
>> @@ -77,6 +77,7 @@
>>  #define SVM_EXIT_MWAIT_COND    0x08c
>>  #define SVM_EXIT_XSETBV        0x08d
>>  #define SVM_EXIT_RDPRU         0x08e
>> +#define SVM_EXIT_EFER_WRITE_TRAP		0x08f
>>  #define SVM_EXIT_NPF           0x400
>>  #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
>>  #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
>> @@ -183,6 +184,7 @@
>>  	{ SVM_EXIT_MONITOR,     "monitor" }, \
>>  	{ SVM_EXIT_MWAIT,       "mwait" }, \
>>  	{ SVM_EXIT_XSETBV,      "xsetbv" }, \
>> +	{ SVM_EXIT_EFER_WRITE_TRAP,	"write_efer_trap" }, \
>>  	{ SVM_EXIT_NPF,         "npf" }, \
>>  	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
>>  	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index ac64a5b128b2..ac467225a51d 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -2466,6 +2466,17 @@ static int cr8_write_interception(struct vcpu_svm *svm)
>>  	return 0;
>>  }
>>  
>> +static int efer_trap(struct vcpu_svm *svm)
>> +{
>> +	int ret;
>> +
>> +	ret = kvm_track_efer(&svm->vcpu, svm->vmcb->control.exit_info_1);
>> +	if (ret)
> 
> Shouldn't this be a WARN or something?  E.g. KVM thinks the WRMSR has faulted,
> while it obviously hasn't, which means KVM's internal model is now out of sync.

Makes sense, I can add something here.

> 
>> +		return ret;
>> +
>> +	return kvm_complete_insn_gp(&svm->vcpu, 0);
>> +}
>> +
>>  static int svm_get_msr_feature(struct kvm_msr_entry *msr)
>>  {
>>  	msr->data = 0;
>> @@ -2944,6 +2955,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
>>  	[SVM_EXIT_MWAIT]			= mwait_interception,
>>  	[SVM_EXIT_XSETBV]			= xsetbv_interception,
>>  	[SVM_EXIT_RDPRU]			= rdpru_interception,
>> +	[SVM_EXIT_EFER_WRITE_TRAP]		= efer_trap,
>>  	[SVM_EXIT_NPF]				= npf_interception,
>>  	[SVM_EXIT_RSM]                          = rsm_interception,
>>  	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 674719d801d2..b65bd0c986d4 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1480,6 +1480,18 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>  	return 0;
>>  }
>>  
>> +int kvm_track_efer(struct kvm_vcpu *vcpu, u64 efer)
>> +{
>> +	struct msr_data msr_info;
>> +
>> +	msr_info.host_initiated = false;
>> +	msr_info.index = MSR_EFER;
>> +	msr_info.data = efer;
>> +
>> +	return set_efer(vcpu, &msr_info);
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_track_efer);
> 
> I don't see any reason to put this in x86.c; just copy-paste the guts into
> efer_trap() and s/set_efer/kvm_set_msr_common.

Ok, I can do that. I'll add a comment noting that set_efer() is ultimately
invoked through that path.
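
Something along these lines, folding in the earlier WARN suggestion as well
(a sketch, not the final code):

	static int efer_trap(struct vcpu_svm *svm)
	{
		struct msr_data msr_info;
		int ret;

		/*
		 * The guest has already changed EFER; the trap only exists so
		 * KVM can update its model.  set_efer() is ultimately invoked
		 * via kvm_set_msr_common(), and a failure here means KVM's
		 * model is out of sync with the guest, so warn on it.
		 */
		msr_info.host_initiated = false;
		msr_info.index = MSR_EFER;
		msr_info.data = svm->vmcb->control.exit_info_1;

		ret = kvm_set_msr_common(&svm->vcpu, &msr_info);
		WARN_ON_ONCE(ret);

		return kvm_complete_insn_gp(&svm->vcpu, 0);
	}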

Thanks,
Tom

> 
>> +
>>  void kvm_enable_efer_bits(u64 mask)
>>  {
>>         efer_reserved_bits &= ~mask;
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 22/35] KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
  2020-09-14 22:13   ` Sean Christopherson
@ 2020-09-15 15:56     ` Tom Lendacky
  2020-11-30 18:15     ` Paolo Bonzini
  1 sibling, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 15:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/14/20 5:13 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:36PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index b65bd0c986d4..6f5988c305e1 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -799,11 +799,29 @@ bool pdptrs_changed(struct kvm_vcpu *vcpu)
>>  }
>>  EXPORT_SYMBOL_GPL(pdptrs_changed);
>>  
>> +static void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0,
>> +			     unsigned long cr0)
> 
> What about using __kvm_set_cr*() instead of kvm_post_set_cr*()?  That would
> show that __kvm_set_cr*() is a subordinate of kvm_set_cr*(), and from the
> SVM side would provide the hint that the code is skipping the front end of
> kvm_set_cr*().

Ok, I'll change this (and the others) to __kvm_set_cr* and export them.

> 
>> +{
>> +	unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;
>> +
>> +	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
>> +		kvm_clear_async_pf_completion_queue(vcpu);
>> +		kvm_async_pf_hash_reset(vcpu);
>> +	}
>> +
>> +	if ((cr0 ^ old_cr0) & update_bits)
>> +		kvm_mmu_reset_context(vcpu);
>> +
>> +	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
>> +	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
>> +	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
>> +		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
>> +}
>> +
>>  int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
>>  {
>>  	unsigned long old_cr0 = kvm_read_cr0(vcpu);
>>  	unsigned long pdptr_bits = X86_CR0_CD | X86_CR0_NW | X86_CR0_PG;
>> -	unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;
>>  
>>  	cr0 |= X86_CR0_ET;
>>  
>> @@ -842,22 +860,23 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
>>  
>>  	kvm_x86_ops.set_cr0(vcpu, cr0);
>>  
>> -	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
>> -		kvm_clear_async_pf_completion_queue(vcpu);
>> -		kvm_async_pf_hash_reset(vcpu);
>> -	}
>> +	kvm_post_set_cr0(vcpu, old_cr0, cr0);
>>  
>> -	if ((cr0 ^ old_cr0) & update_bits)
>> -		kvm_mmu_reset_context(vcpu);
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_set_cr0);
>>  
>> -	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
>> -	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
>> -	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
>> -		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
>> +int kvm_track_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
> 
> I really dislike the "track" terminology.  For me, using "track" as the verb
> in a function implies the function activates tracking.  But it's probably a
> moot point, because similar to EFER, I don't see any reason to put the front
> end of the emulation into x86.c.  Both getting old_cr0 and setting
> vcpu->arch.cr0 can be done in svm.c

Yup, I can move that to svm.c.

Thanks,
Tom

> 
>> +{
>> +	unsigned long old_cr0 = kvm_read_cr0(vcpu);
>> +
>> +	vcpu->arch.cr0 = cr0;
>> +
>> +	kvm_post_set_cr0(vcpu, old_cr0, cr0);
>>  
>>  	return 0;
>>  }
>> -EXPORT_SYMBOL_GPL(kvm_set_cr0);
>> +EXPORT_SYMBOL_GPL(kvm_track_cr0);
>>  
>>  void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
>>  {
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 24/35] KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
  2020-09-14 22:19   ` Sean Christopherson
@ 2020-09-15 15:57     ` Tom Lendacky
  0 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 15:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/14/20 5:19 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:38PM -0500, Tom Lendacky wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 5e5f1e8fed3a..6e445a76b691 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1109,6 +1109,12 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
>>  }
>>  EXPORT_SYMBOL_GPL(kvm_get_cr8);
>>  
>> +int kvm_track_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
>> +{
>> +	return kvm_set_cr8(vcpu, cr8);
> 
> I'm guessing this was added to achieve consistency at the SVM call sites.
> With the previously suggested changes, kvm_track_cr8() can simply be
> dropped.

Yup.

Thanks,
Tom

> 
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_track_cr8);
>> +
>>  static void kvm_update_dr0123(struct kvm_vcpu *vcpu)
>>  {
>>  	int i;
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 05/35] KVM: SVM: Add initial support for SEV-ES GHCB access to KVM
  2020-09-15 13:24     ` Tom Lendacky
@ 2020-09-15 16:28       ` Sean Christopherson
  2020-09-16 14:54         ` Tom Lendacky
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-15 16:28 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Tue, Sep 15, 2020 at 08:24:22AM -0500, Tom Lendacky wrote:
> On 9/14/20 3:58 PM, Sean Christopherson wrote:
> >> @@ -79,6 +88,9 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu, int reg,
> >>  	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
> >>  		return;
> >>  
> >> +	if (kvm_x86_ops.reg_write_override)
> >> +		kvm_x86_ops.reg_write_override(vcpu, reg, val);
> > 
> > 
> > There has to be a more optimal approach for propagating registers between
> > vcpu->arch.regs and the VMSA than adding a per-GPR hook.  Why not simply
> > copy the entire set of registers to/from the VMSA on every exit and entry?
> > AFAICT, valid_bits is only used in the read path, and KVM doesn't do anything
> > sophisticated when it hits a !valid_bits read.
> 
> That would probably be ok. And actually, the code might be able to just
> check the GHCB valid bitmap for valid regs on exit, copy them and then
> clear the bitmap. The write code could check if vmsa_encrypted is set and
> then set a "valid" bit for the reg that could be used to set regs on entry.
> 
> I'm not sure if turning kvm_vcpu_arch.regs into a struct and adding a
> valid bit would be overkill or not.

KVM already has space in regs_avail and regs_dirty for GPRs, they're just not
used by the get/set helpers because they're always loaded/stored for both SVM
and VMX.

I assume nothing will break if KVM "writes" random GPRs in the VMSA?  I can't
see how the guest would achieve any level of security if it wantonly consumes
GPRs, i.e. it's the guest's responsibility to consume only the relevant GPRs.

> If that holds true, then avoiding the copying isn't functionally necessary, and
> is really just a performance optimization.  One potentially crazy idea would be
> to change vcpu->arch.regs to be a pointer (defaulting to a __regs array), and then
have SEV-ES switch it to point directly at the VMSA array (I think the layout
is identical for x86-64?).

> >> @@ -4012,6 +4052,99 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
> >>  		   (svm->vmcb->control.intercept & (1ULL << INTERCEPT_INIT));
> >>  }
> >>  
> >> +/*
> >> + * These return values represent the offset in quad words within the VM save
> >> + * area. This allows them to be accessed by casting the save area to a u64
> >> + * array.
> >> + */
> >> +#define VMSA_REG_ENTRY(_field)	 (offsetof(struct vmcb_save_area, _field) / sizeof(u64))
> >> +#define VMSA_REG_UNDEF		 VMSA_REG_ENTRY(valid_bitmap)
> >> +static inline unsigned int vcpu_to_vmsa_entry(enum kvm_reg reg)
> >> +{
> >> +	switch (reg) {
> >> +	case VCPU_REGS_RAX:	return VMSA_REG_ENTRY(rax);
> >> +	case VCPU_REGS_RBX:	return VMSA_REG_ENTRY(rbx);
> >> +	case VCPU_REGS_RCX:	return VMSA_REG_ENTRY(rcx);
> >> +	case VCPU_REGS_RDX:	return VMSA_REG_ENTRY(rdx);
> >> +	case VCPU_REGS_RSP:	return VMSA_REG_ENTRY(rsp);
> >> +	case VCPU_REGS_RBP:	return VMSA_REG_ENTRY(rbp);
> >> +	case VCPU_REGS_RSI:	return VMSA_REG_ENTRY(rsi);
> >> +	case VCPU_REGS_RDI:	return VMSA_REG_ENTRY(rdi);
> >> +#ifdef CONFIG_X86_64

Is KVM SEV-ES going to support 32-bit builds?

> >> +	case VCPU_REGS_R8:	return VMSA_REG_ENTRY(r8);
> >> +	case VCPU_REGS_R9:	return VMSA_REG_ENTRY(r9);
> >> +	case VCPU_REGS_R10:	return VMSA_REG_ENTRY(r10);
> >> +	case VCPU_REGS_R11:	return VMSA_REG_ENTRY(r11);
> >> +	case VCPU_REGS_R12:	return VMSA_REG_ENTRY(r12);
> >> +	case VCPU_REGS_R13:	return VMSA_REG_ENTRY(r13);
> >> +	case VCPU_REGS_R14:	return VMSA_REG_ENTRY(r14);
> >> +	case VCPU_REGS_R15:	return VMSA_REG_ENTRY(r15);
> >> +#endif
> >> +	case VCPU_REGS_RIP:	return VMSA_REG_ENTRY(rip);
> >> +	default:
> >> +		WARN_ONCE(1, "unsupported VCPU to VMSA register conversion\n");
> >> +		return VMSA_REG_UNDEF;
> >> +	}
> >> +}
> >> +
> >> +/* For SEV-ES guests, populate the vCPU register from the appropriate VMSA/GHCB */
> >> +static void svm_reg_read_override(struct kvm_vcpu *vcpu, enum kvm_reg reg)
> >> +{
> >> +	struct vmcb_save_area *vmsa;
> >> +	struct vcpu_svm *svm;
> >> +	unsigned int entry;
> >> +	unsigned long val;
> >> +	u64 *vmsa_reg;
> >> +
> >> +	if (!sev_es_guest(vcpu->kvm))
> >> +		return;
> >> +
> >> +	entry = vcpu_to_vmsa_entry(reg);
> >> +	if (entry == VMSA_REG_UNDEF)
> >> +		return;
> >> +
> >> +	svm = to_svm(vcpu);
> >> +	vmsa = get_vmsa(svm);
> >> +	vmsa_reg = (u64 *)vmsa;
> >> +	val = (unsigned long)vmsa_reg[entry];
> >> +
> >> +	/* If a GHCB is mapped, check the bitmap of valid entries */
> >> +	if (svm->ghcb) {
> >> +		if (!test_bit(entry, (unsigned long *)vmsa->valid_bitmap))
> >> +			val = 0;
> > 
> > Is KVM relying on this being 0?  Would it make sense to stuff something like
> > 0xaaaa... or 0xdeadbeefdeadbeef so that consumption of bogus data is more
> > noticeable?
> 
> No, KVM isn't relying on this being 0. I thought about using something
> other than 0 here, but settled on just using 0. I'm open to changing that,
> though. I'm not sure if there's an easy way to short-circuit the intercept
> and respond back with an error at this point, that would be optimal.

Ya, responding with an error would be ideal.  At this point, we're taking the
same lazy approach for TDX and effectively consuming garbage if the guest
requests emulation but doesn't expose the necessary GPRs.  That being said,
TDX's guest/host ABI is quite rigid, so all the "is this register valid"
checks could be hardcoded into the higher level "emulation" flows.

Would that also be an option for SEV-ES?
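For concreteness, a minimal sketch of what such exit-time validation might look like on the SEV-ES side (the function name, the ghcb_*_is_valid()/ghcb_get_*() accessors from the SEV-ES guest series, and the per-event checks are assumptions here, not part of the posted patches):

/* Reject a VMGEXIT whose GHCB doesn't expose the GPRs the NAE event needs. */
static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
{
	struct ghcb *ghcb = svm->ghcb;

	/* Every NAE event must supply the exit code and exit info fields. */
	if (!ghcb_sw_exit_code_is_valid(ghcb) ||
	    !ghcb_sw_exit_info_1_is_valid(ghcb) ||
	    !ghcb_sw_exit_info_2_is_valid(ghcb))
		return -EINVAL;

	switch (ghcb_get_sw_exit_code(ghcb)) {
	case SVM_EXIT_MSR:
		/* RDMSR needs RCX; WRMSR additionally needs RAX and RDX. */
		if (!ghcb_rcx_is_valid(ghcb))
			return -EINVAL;
		if (ghcb_get_sw_exit_info_1(ghcb) &&
		    (!ghcb_rax_is_valid(ghcb) || !ghcb_rdx_is_valid(ghcb)))
			return -EINVAL;
		break;
	default:
		break;
	}

	return 0;
}

The caller could then fail the VMGEXIT with an error in the GHCB instead of letting the intercept handlers consume zeroed or stale register values.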

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES
  2020-09-15 13:37     ` Tom Lendacky
@ 2020-09-15 16:30       ` Sean Christopherson
  2020-09-15 20:13         ` Tom Lendacky
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-15 16:30 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Tue, Sep 15, 2020 at 08:37:12AM -0500, Tom Lendacky wrote:
> On 9/14/20 4:26 PM, Sean Christopherson wrote:
> > On Mon, Sep 14, 2020 at 03:15:22PM -0500, Tom Lendacky wrote:
> >> From: Tom Lendacky <thomas.lendacky@amd.com>
> >>
> >> Since the guest register state of an SEV-ES guest is encrypted, debugging
> >> is not supported. Update the code to prevent guest debugging when the
> >> guest is an SEV-ES guest. This includes adding a callable function that
> >> is used to determine if the guest supports being debugged.
> >>
> >> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> >> ---
> >>  arch/x86/include/asm/kvm_host.h |  2 ++
> >>  arch/x86/kvm/svm/svm.c          | 16 ++++++++++++++++
> >>  arch/x86/kvm/vmx/vmx.c          |  7 +++++++
> >>  arch/x86/kvm/x86.c              |  3 +++
> >>  4 files changed, 28 insertions(+)
> >>
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index c900992701d6..3e2a3d2a8ba8 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -1234,6 +1234,8 @@ struct kvm_x86_ops {
> >>  	void (*reg_read_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
> >>  	void (*reg_write_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg,
> >>  				   unsigned long val);
> >> +
> >> +	bool (*allow_debug)(struct kvm *kvm);
> > 
> > Why add both allow_debug() and vmsa_encrypted?  I assume there are scenarios
> > where allow_debug() != vmsa_encrypted?  E.g. is there a debug mode for SEV-ES
> > where the VMSA is not encrypted, but KVM (ironically) can't intercept #DBs or
> > something?
> 
> No, once the guest has had LAUNCH_UPDATE_VMSA run against the vCPUs, then
> the vCPU states are all encrypted. But that doesn't mean that debugging
> can't be done in the future.

I don't quite follow the "doesn't mean debugging can't be done in the future".
Does that imply that debugging could be supported for SEV-ES guests, even if
they have an encrypted VMSA?

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 26/35] KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
  2020-09-15 14:25     ` Tom Lendacky
@ 2020-09-15 16:34       ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2020-09-15 16:34 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Tue, Sep 15, 2020 at 09:25:18AM -0500, Tom Lendacky wrote:
> On 9/14/20 4:39 PM, Sean Christopherson wrote:
> > On Mon, Sep 14, 2020 at 03:15:40PM -0500, Tom Lendacky wrote:
> >> From: Tom Lendacky <thomas.lendacky@amd.com>
> >>
> >> The guest FPU is automatically restored on VMRUN and saved on VMEXIT by
> >> the hardware, so there is no reason to do this in KVM.
> > 
> > I assume hardware has its own buffer?  If so, a better approach would be to
> > not allocate arch.guest_fpu in the first place, and then rework KVM to key
> > off !guest_fpu.
> 
> Yup, let me look into that.

Heh, it's on our todo list as well :-)

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-09-14 22:59 ` [RFC PATCH 00/35] SEV-ES hypervisor support Sean Christopherson
@ 2020-09-15 17:22   ` Tom Lendacky
  2020-09-15 17:32     ` Sean Christopherson
  2020-09-16  0:19     ` Sean Christopherson
  0 siblings, 2 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 17:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/14/20 5:59 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:14PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> This patch series provides support for running SEV-ES guests under KVM.
> 
> From the x86/VMX side of things, the GPR hooks are the only changes that I
> strongly dislike.
> 
> For the vmsa_encrypted flag and related things like allow_debug(), I'd
> really like to aim for a common implementation between SEV-ES and TDX[*] from
> the get go, within reason obviously.  From a code perspective, I don't think
> it will be too onerous as the basic tenets are quite similar, e.g. guest
> state is off limits, FPU state is autoswitched, etc..., but I suspect (or
> maybe worry?) that there are enough minor differences that we'll want a more
> generic way of marking ioctls() as disallowed to avoid having one-off checks
> all over the place.
> 
> That being said, it may also be that there are some ioctls() that should be
> disallowed under SEV-ES, but aren't in this series.  E.g. I assume
> kvm_vcpu_ioctl_smi() should be rejected as KVM can't do the necessary
> emulation (I assume this applies to vanilla SEV as well?).

Right, SMM isn't currently supported under SEV-ES. SEV does support SMM,
though, since the register state can be altered to change over to the SMM
register state. So the SMI ioctl() is ok for SEV.

> 
> One thought to try and reconcile the differences between SEV-ES and TDX would
> be expicitly list which ioctls() are and aren't supported and go from there?
> E.g. if there is 95% overlap than we probably don't need to get fancy with
> generic allow/deny.
> 
> Given that we don't yet have publicly available KVM code for TDX, what if I
> generate and post a list of ioctls() that are denied by either SEV-ES or TDX,
> organized by the denier(s)?  Then for the ioctls() that are denied by one and
> not the other, we add a brief explanation of why it's denied?
> 
> If that sounds ok, I'll get the list and the TDX side of things posted
> tomorrow.

That sounds good.

Thanks,
Tom

> 
> Thanks!
> 
> 
> [*] https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-09-15 17:22   ` Tom Lendacky
@ 2020-09-15 17:32     ` Sean Christopherson
  2020-09-15 20:05       ` Brijesh Singh
  2020-09-16  0:19     ` Sean Christopherson
  1 sibling, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-15 17:32 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Tue, Sep 15, 2020 at 12:22:05PM -0500, Tom Lendacky wrote:
> On 9/14/20 5:59 PM, Sean Christopherson wrote:
> > On Mon, Sep 14, 2020 at 03:15:14PM -0500, Tom Lendacky wrote:
> >> From: Tom Lendacky <thomas.lendacky@amd.com>
> >>
> >> This patch series provides support for running SEV-ES guests under KVM.
> > 
> > From the x86/VMX side of things, the GPR hooks are the only changes that I
> > strongly dislike.
> > 
> > For the vmsa_encrypted flag and related things like allow_debug(), I'd
> > really like to aim for a common implementation between SEV-ES and TDX[*] from
> > the get go, within reason obviously.  From a code perspective, I don't think
> > it will be too onerous as the basic tenets are quite similar, e.g. guest
> > state is off limits, FPU state is autoswitched, etc..., but I suspect (or
> > maybe worry?) that there are enough minor differences that we'll want a more
> > generic way of marking ioctls() as disallowed to avoid having one-off checks
> > all over the place.
> > 
> > That being said, it may also be that there are some ioctls() that should be
> > disallowed under SEV-ES, but aren't in this series.  E.g. I assume
> > kvm_vcpu_ioctl_smi() should be rejected as KVM can't do the necessary
> > emulation (I assume this applies to vanilla SEV as well?).
> 
> Right, SMM isn't currently supported under SEV-ES. SEV does support SMM,
> though, since the register state can be altered to change over to the SMM
> register state. So the SMI ioctl() is ok for SEV.

But isn't guest memory inaccessible for SEV?  E.g. how does KVM emulate the
save/restore to/from SMRAM?

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-09-15 17:32     ` Sean Christopherson
@ 2020-09-15 20:05       ` Brijesh Singh
  0 siblings, 0 replies; 86+ messages in thread
From: Brijesh Singh @ 2020-09-15 20:05 UTC (permalink / raw)
  To: Sean Christopherson, Tom Lendacky
  Cc: brijesh.singh, kvm, linux-kernel, x86, Paolo Bonzini,
	Jim Mattson, Joerg Roedel, Vitaly Kuznetsov, Wanpeng Li,
	Borislav Petkov, Ingo Molnar, Thomas Gleixner


On 9/15/20 12:32 PM, Sean Christopherson wrote:
> On Tue, Sep 15, 2020 at 12:22:05PM -0500, Tom Lendacky wrote:
>> On 9/14/20 5:59 PM, Sean Christopherson wrote:
>>> On Mon, Sep 14, 2020 at 03:15:14PM -0500, Tom Lendacky wrote:
>>>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>>>
>>>> This patch series provides support for running SEV-ES guests under KVM.
>>> From the x86/VMX side of things, the GPR hooks are the only changes that I
>>> strongly dislike.
>>>
>>> For the vmsa_encrypted flag and related things like allow_debug(), I'd
>>> really like to aim for a common implementation between SEV-ES and TDX[*] from
>>> the get go, within reason obviously.  From a code perspective, I don't think
>>> it will be too onerous as the basic tenets are quite similar, e.g. guest
>>> state is off limits, FPU state is autoswitched, etc..., but I suspect (or
>>> maybe worry?) that there are enough minor differences that we'll want a more
>>> generic way of marking ioctls() as disallowed to avoid having one-off checks
>>> all over the place.
>>>
>>> That being said, it may also be that there are some ioctls() that should be
>>> disallowed under SEV-ES, but aren't in this series.  E.g. I assume
>>> kvm_vcpu_ioctl_smi() should be rejected as KVM can't do the necessary
>>> emulation (I assume this applies to vanilla SEV as well?).
>> Right, SMM isn't currently supported under SEV-ES. SEV does support SMM,
>> though, since the register state can be altered to change over to the SMM
>> register state. So the SMI ioctl() is ok for SEV.
> But isn't guest memory inaccessible for SEV?  E.g. how does KVM emulate the
> save/restore to/from SMRAM?


In SEV, to support SMM, the guest BIOS (OVMF) maps the SMM state save
area as unencrypted. This allows KVM to access the SMM state save area
directly. SVM also provides an intercept for the RSM instruction, so KVM
does not need to fetch and decode the instruction bytes to know whether
the VMEXIT was due to exiting SMM.



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES
  2020-09-15 16:30       ` Sean Christopherson
@ 2020-09-15 20:13         ` Tom Lendacky
  2020-09-16 15:11           ` Tom Lendacky
  0 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 20:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/15/20 11:30 AM, Sean Christopherson wrote:
> On Tue, Sep 15, 2020 at 08:37:12AM -0500, Tom Lendacky wrote:
>> On 9/14/20 4:26 PM, Sean Christopherson wrote:
>>> On Mon, Sep 14, 2020 at 03:15:22PM -0500, Tom Lendacky wrote:
>>>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>>>
>>>> Since the guest register state of an SEV-ES guest is encrypted, debugging
>>>> is not supported. Update the code to prevent guest debugging when the
>>>> guest is an SEV-ES guest. This includes adding a callable function that
>>>> is used to determine if the guest supports being debugged.
>>>>
>>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>>> ---
>>>>  arch/x86/include/asm/kvm_host.h |  2 ++
>>>>  arch/x86/kvm/svm/svm.c          | 16 ++++++++++++++++
>>>>  arch/x86/kvm/vmx/vmx.c          |  7 +++++++
>>>>  arch/x86/kvm/x86.c              |  3 +++
>>>>  4 files changed, 28 insertions(+)
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index c900992701d6..3e2a3d2a8ba8 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -1234,6 +1234,8 @@ struct kvm_x86_ops {
>>>>  	void (*reg_read_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
>>>>  	void (*reg_write_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg,
>>>>  				   unsigned long val);
>>>> +
>>>> +	bool (*allow_debug)(struct kvm *kvm);
>>>
>>> Why add both allow_debug() and vmsa_encrypted?  I assume there are scenarios
>>> where allow_debug() != vmsa_encrypted?  E.g. is there a debug mode for SEV-ES
>>> where the VMSA is not encrypted, but KVM (ironically) can't intercept #DBs or
>>> something?
>>
>> No, once the guest has had LAUNCH_UPDATE_VMSA run against the vCPUs, then
>> the vCPU states are all encrypted. But that doesn't mean that debugging
>> can't be done in the future.
> 
> I don't quite follow the "doesn't mean debugging can't be done in the future".
> Does that imply that debugging could be supported for SEV-ES guests, even if
> they have an encrypted VMSA?

Almost anything can be done with software. It would require a lot of
hypervisor and guest code and changes to the GHCB spec, etc. So given
that, probably just the check for arch.guest_state_protected is enough for
now. I'll just need to be sure none of the debugging paths can be taken
before the VMSA is encrypted.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 25/35] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
       [not found]       ` <20200915163342.GC8420@sjchrist-ice>
@ 2020-09-15 20:37         ` Tom Lendacky
  2020-09-15 22:44           ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-15 20:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/15/20 11:33 AM, Sean Christopherson wrote:
> On Tue, Sep 15, 2020 at 09:19:46AM -0500, Tom Lendacky wrote:
>> On 9/14/20 4:37 PM, Sean Christopherson wrote:
>>> On Mon, Sep 14, 2020 at 03:15:39PM -0500, Tom Lendacky wrote:
>>>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>>>
>>>> Since many of the registers used by the SEV-ES are encrypted and cannot
>>>> be read or written, adjust the __get_sregs() / __set_sregs() to only get
>>>> or set the registers being tracked (efer, cr0, cr4 and cr8) once the VMSA
>>>> is encrypted.
>>>
>>> Is there an actual use case for writing said registers after the VMSA is
>>> encrypted?  Assuming there's a separate "debug mode" and live migration has
>>> special logic, can KVM simply reject the ioctl() if guest state is protected?
>>
>> Yeah, I originally had it that way but one of the folks looking at live
>> migration for SEV-ES thought it would be easier given the way Qemu does
>> things. But I think it's easy enough to batch the tracking registers into
>> the VMSA state that is being transferred during live migration. Let me
>> check that out and likely the SET ioctl() could just skip all the regs.
> 
> Hmm, that would be ideal.  How are the tracked registers validated when they're
> loaded at the destination?  It seems odd/dangerous that KVM would have full
> control over efer/cr0/cr4/cr8.  I.e. why is KVM even responsibile for migrating
> that information, e.g. as opposed to migrating an opaque blob that contains
> encrypted versions of those registers?
> 

KVM doesn't have control of them. They are part of the guest's encrypted
state and that is what the guest uses. KVM can't alter the value that the
guest is using for them once the VMSA is encrypted. However, KVM makes
some decisions based on the values it thinks it knows.  For example, early
on I remember the async PF support failing because the CR0 that KVM
thought the guest had didn't have the PE bit set, even though the guest
was in protected mode. So KVM didn't include the error code in the
exception it injected (is_protmode() was false) and things failed. Without
syncing these values after live migration, things also fail (probably for
the same reason). So the idea is to just keep KVM apprised of the values
that the guest has.

Thanks,
Tom

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 25/35] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  2020-09-15 20:37         ` Tom Lendacky
@ 2020-09-15 22:44           ` Sean Christopherson
  2020-11-30 18:28             ` Paolo Bonzini
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-15 22:44 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Tue, Sep 15, 2020 at 03:37:21PM -0500, Tom Lendacky wrote:
> On 9/15/20 11:33 AM, Sean Christopherson wrote:
> > On Tue, Sep 15, 2020 at 09:19:46AM -0500, Tom Lendacky wrote:
> >> On 9/14/20 4:37 PM, Sean Christopherson wrote:
> >>> On Mon, Sep 14, 2020 at 03:15:39PM -0500, Tom Lendacky wrote:
> >>>> From: Tom Lendacky <thomas.lendacky@amd.com>
> >>>>
> >>>> Since many of the registers used by the SEV-ES are encrypted and cannot
> >>>> be read or written, adjust the __get_sregs() / __set_sregs() to only get
> >>>> or set the registers being tracked (efer, cr0, cr4 and cr8) once the VMSA
> >>>> is encrypted.
> >>>
> >>> Is there an actual use case for writing said registers after the VMSA is
> >>> encrypted?  Assuming there's a separate "debug mode" and live migration has
> >>> special logic, can KVM simply reject the ioctl() if guest state is protected?
> >>
> >> Yeah, I originally had it that way but one of the folks looking at live
> >> migration for SEV-ES thought it would be easier given the way Qemu does
> >> things. But I think it's easy enough to batch the tracking registers into
> >> the VMSA state that is being transferred during live migration. Let me
> >> check that out and likely the SET ioctl() could just skip all the regs.
> > 
> > Hmm, that would be ideal.  How are the tracked registers validated when they're
> > loaded at the destination?  It seems odd/dangerous that KVM would have full
> > control over efer/cr0/cr4/cr8.  I.e. why is KVM even responsibile for migrating
> > that information, e.g. as opposed to migrating an opaque blob that contains
> > encrypted versions of those registers?
> > 
> 
> KVM doesn't have control of them. They are part of the guest's encrypted
> state and that is what the guest uses. KVM can't alter the value that the
> guest is using for them once the VMSA is encrypted. However, KVM makes
> some decisions based on the values it thinks it knows.  For example, early
> on I remember the async PF support failing because the CR0 that KVM
> thought the guest had didn't have the PE bit set, even though the guest
> was in protected mode. So KVM didn't include the error code in the
> exception it injected (is_protmode() was false) and things failed. Without
> syncing these values after live migration, things also fail (probably for
> the same reason). So the idea is to just keep KVM apprised of the values
> that the guest has.

Ah, gotcha.  Migrating tracked state through the VMSA would probably be ideal.
The semantics of __set_sregs() kinda setting state but not reaaaally setting
state would be weird.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-09-15 17:22   ` Tom Lendacky
  2020-09-15 17:32     ` Sean Christopherson
@ 2020-09-16  0:19     ` Sean Christopherson
  2020-10-13 20:26       ` Tom Lendacky
  2020-11-30 15:31       ` Paolo Bonzini
  1 sibling, 2 replies; 86+ messages in thread
From: Sean Christopherson @ 2020-09-16  0:19 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Tue, Sep 15, 2020 at 12:22:05PM -0500, Tom Lendacky wrote:
> On 9/14/20 5:59 PM, Sean Christopherson wrote:
> > Given that we don't yet have publicly available KVM code for TDX, what if I
> > generate and post a list of ioctls() that are denied by either SEV-ES or TDX,
> > organized by the denier(s)?  Then for the ioctls() that are denied by one and
> > not the other, we add a brief explanation of why it's denied?
> > 
> > If that sounds ok, I'll get the list and the TDX side of things posted
> > tomorrow.
> 
> That sounds good.

TDX completely blocks the following ioctl()s:

  kvm_vcpu_ioctl_interrupt
  kvm_vcpu_ioctl_smi
  kvm_vcpu_ioctl_x86_setup_mce
  kvm_vcpu_ioctl_x86_set_mce
  kvm_vcpu_ioctl_x86_get_debugregs
  kvm_vcpu_ioctl_x86_set_debugregs
  kvm_vcpu_ioctl_x86_get_xsave
  kvm_vcpu_ioctl_x86_set_xsave
  kvm_vcpu_ioctl_x86_get_xcrs
  kvm_vcpu_ioctl_x86_set_xcrs
  kvm_arch_vcpu_ioctl_get_regs
  kvm_arch_vcpu_ioctl_set_regs
  kvm_arch_vcpu_ioctl_get_sregs
  kvm_arch_vcpu_ioctl_set_sregs
  kvm_arch_vcpu_ioctl_set_guest_debug
  kvm_arch_vcpu_ioctl_get_fpu
  kvm_arch_vcpu_ioctl_set_fpu

Looking through the code, I think kvm_arch_vcpu_ioctl_get_mpstate() and
kvm_arch_vcpu_ioctl_set_mpstate() should also be disallowed; we just haven't
actually done so.

There are also two helper functions that are "blocked".
dm_request_for_irq_injection() returns false if guest_state_protected, and
post_kvm_run_save() shoves dummy state.

TDX also selectively blocks/skips portions of other ioctl()s so that the
TDX code itself can yell loudly if e.g. .get_cpl() is invoked.  The event
injection restrictions are due to direct injection not being allowed (except
for NMIs); all IRQs have to be routed through APICv (posted interrupts) and
exception injection is completely disallowed.

  kvm_vcpu_ioctl_x86_get_vcpu_events:
	if (!vcpu->kvm->arch.guest_state_protected)
        	events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);

  kvm_arch_vcpu_put:
        if (vcpu->preempted && !vcpu->kvm->arch.guest_state_protected)
                vcpu->arch.preempted_in_kernel = !kvm_x86_ops.get_cpl(vcpu);

  kvm_vcpu_ioctl_x86_set_vcpu_events:
	u32 allowed_flags = KVM_VCPUEVENT_VALID_NMI_PENDING |
			    KVM_VCPUEVENT_VALID_SIPI_VECTOR |
			    KVM_VCPUEVENT_VALID_SHADOW |
			    KVM_VCPUEVENT_VALID_SMM |
			    KVM_VCPUEVENT_VALID_PAYLOAD;

	if (vcpu->kvm->arch.guest_state_protected)
		allowed_flags = KVM_VCPUEVENT_VALID_NMI_PENDING;


  kvm_arch_vcpu_ioctl_run:
	if (vcpu->kvm->arch.guest_state_protected)
		kvm_sync_valid_fields = KVM_SYNC_X86_EVENTS;
	else
		kvm_sync_valid_fields = KVM_SYNC_X86_VALID_FIELDS;


In addition to the more generic guest_state_protected, we also (obviously
tentatively) have a few other flags to deal with aspects of TDX that I'm
fairly certain don't apply to SEV-ES:

  tsc_immutable - KVM doesn't have write access to the TSC offset of the
                  guest.

  eoi_intercept_unsupported - KVM can't intercept EOIs (doesn't have access
                              to EOI bitmaps) and so can't support level
                              triggered interrupts, at least not without
                              extra pain.

  readonly_mem_unsupported - Secure EPT (analogous to SNP) requires RWX
                             permissions for all private/encrypted memory.
                             S-EPT isn't optional, so we get the joy of
                             adding this right off the bat...
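For reference, the per-ioctl "reject if guest state is protected" check being discussed might look like the following in x86.c (illustrative only; the exact placement and error code are assumptions rather than code from either series):

int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
{
	/* GPRs live in encrypted/inaccessible guest state; refuse the ioctl. */
	if (vcpu->kvm->arch.guest_state_protected)
		return -EINVAL;

	vcpu_load(vcpu);
	__get_regs(vcpu, regs);
	vcpu_put(vcpu);
	return 0;
}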

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 05/35] KVM: SVM: Add initial support for SEV-ES GHCB access to KVM
  2020-09-15 16:28       ` Sean Christopherson
@ 2020-09-16 14:54         ` Tom Lendacky
  0 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-16 14:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/15/20 11:28 AM, Sean Christopherson wrote:
> On Tue, Sep 15, 2020 at 08:24:22AM -0500, Tom Lendacky wrote:
>> On 9/14/20 3:58 PM, Sean Christopherson wrote:
>>>> @@ -79,6 +88,9 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu, int reg,
>>>>  	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
>>>>  		return;
>>>>  
>>>> +	if (kvm_x86_ops.reg_write_override)
>>>> +		kvm_x86_ops.reg_write_override(vcpu, reg, val);
>>>
>>>
>>> There has to be a more optimal approach for propagating registers between
>>> vcpu->arch.regs and the VMSA than adding a per-GPR hook.  Why not simply
>>> copy the entire set of registers to/from the VMSA on every exit and entry?
>>> AFAICT, valid_bits is only used in the read path, and KVM doesn't do anything
>>> sophisticated when it hits a !valid_bits read.
>>
>> That would probably be ok. And actually, the code might be able to just
>> check the GHCB valid bitmap for valid regs on exit, copy them and then
>> clear the bitmap. The write code could check if vmsa_encrypted is set and
>> then set a "valid" bit for the reg that could be used to set regs on entry.
>>
>> I'm not sure if turning kvm_vcpu_arch.regs into a struct and adding a
>> valid bit would be overkill or not.
> 
> KVM already has space in regs_avail and regs_dirty for GPRs, they're just not
> used by the get/set helpers because they're always loaded/stored for both SVM
> and VMX.
> 
> I assume nothing will break if KVM "writes" random GPRs in the VMSA?  I can't
> see how the guest would achieve any level of security if it wantonly consumes
> GPRs, i.e. it's the guest's responsibility to consume only the relevant GPRs.

Right, the guest should only read the registers that it is expecting to be
provided by the hypervisor as set forth in the GHCB spec. It shouldn't
load any other registers that the hypervisor provides. The Linux SEV-ES
guest support follows this model and will only load the registers that are
specified via the GHCB spec for a particular NAE event, ignoring anything
else provided.

> 
> If that holds true, then avoiding the copying isn't functionally necessary, and
> is really just a performance optimization.  One potentially crazy idea would be
> to change vcpu->arch.regs to be a pointer (defaulting to a __regs array), and then
> have SEV-ES switch it to point directly at the VMSA array (I think the layout
> is identical for x86-64?).

That would be nice, but it isn't quite laid out like that. Before SEV-ES
support, RAX and RSP were the only GPRs saved. With the arrival of SEV-ES,
the remaining registers were added to the VMSA, but a number of bytes
after RAX and RSP. So right now, there are reserved areas where RAX and
RSP would have been at the new register block in the VMSA (see offset
0x300 in the VMSA layout of the APM volume 2,
https://www.amd.com/system/files/TechDocs/24593.pdf).

I might be able to move the RAX and RSP values before the VMSA is
encrypted (or the GHCB returned), assuming those fields would stay
reserved, but I don't think that can be guaranteed.

Let me see if I can put something together using regs_avail and regs_dirty.
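A minimal sketch of that direction, syncing only the GPRs the guest marked valid in the GHCB on a VMGEXIT and then clearing the bitmap (the function name and the ghcb_*_is_valid()/ghcb_get_*() accessors from the SEV-ES guest series are assumptions, not the code in this RFC):

static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
{
	struct kvm_vcpu *vcpu = &svm->vcpu;
	struct ghcb *ghcb = svm->ghcb;

	/* Only consume the registers the guest explicitly exposed. */
	if (ghcb_rax_is_valid(ghcb))
		vcpu->arch.regs[VCPU_REGS_RAX] = ghcb_get_rax(ghcb);
	if (ghcb_rbx_is_valid(ghcb))
		vcpu->arch.regs[VCPU_REGS_RBX] = ghcb_get_rbx(ghcb);
	if (ghcb_rcx_is_valid(ghcb))
		vcpu->arch.regs[VCPU_REGS_RCX] = ghcb_get_rcx(ghcb);
	if (ghcb_rdx_is_valid(ghcb))
		vcpu->arch.regs[VCPU_REGS_RDX] = ghcb_get_rdx(ghcb);
	/* ...and likewise for any other GPRs a given NAE event uses. */

	/* Don't let stale values leak into the next exit. */
	memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
}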

> 
>>>> @@ -4012,6 +4052,99 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
>>>>  		   (svm->vmcb->control.intercept & (1ULL << INTERCEPT_INIT));
>>>>  }
>>>>  
>>>> +/*
>>>> + * These return values represent the offset in quad words within the VM save
>>>> + * area. This allows them to be accessed by casting the save area to a u64
>>>> + * array.
>>>> + */
>>>> +#define VMSA_REG_ENTRY(_field)	 (offsetof(struct vmcb_save_area, _field) / sizeof(u64))
>>>> +#define VMSA_REG_UNDEF		 VMSA_REG_ENTRY(valid_bitmap)
>>>> +static inline unsigned int vcpu_to_vmsa_entry(enum kvm_reg reg)
>>>> +{
>>>> +	switch (reg) {
>>>> +	case VCPU_REGS_RAX:	return VMSA_REG_ENTRY(rax);
>>>> +	case VCPU_REGS_RBX:	return VMSA_REG_ENTRY(rbx);
>>>> +	case VCPU_REGS_RCX:	return VMSA_REG_ENTRY(rcx);
>>>> +	case VCPU_REGS_RDX:	return VMSA_REG_ENTRY(rdx);
>>>> +	case VCPU_REGS_RSP:	return VMSA_REG_ENTRY(rsp);
>>>> +	case VCPU_REGS_RBP:	return VMSA_REG_ENTRY(rbp);
>>>> +	case VCPU_REGS_RSI:	return VMSA_REG_ENTRY(rsi);
>>>> +	case VCPU_REGS_RDI:	return VMSA_REG_ENTRY(rdi);
>>>> +#ifdef CONFIG_X86_64
> 
> Is KVM SEV-ES going to support 32-bit builds?

No, SEV-ES won't support 32-bit builds and since those fields are always
defined, I can just remove this #ifdef.

> 
>>>> +	case VCPU_REGS_R8:	return VMSA_REG_ENTRY(r8);
>>>> +	case VCPU_REGS_R9:	return VMSA_REG_ENTRY(r9);
>>>> +	case VCPU_REGS_R10:	return VMSA_REG_ENTRY(r10);
>>>> +	case VCPU_REGS_R11:	return VMSA_REG_ENTRY(r11);
>>>> +	case VCPU_REGS_R12:	return VMSA_REG_ENTRY(r12);
>>>> +	case VCPU_REGS_R13:	return VMSA_REG_ENTRY(r13);
>>>> +	case VCPU_REGS_R14:	return VMSA_REG_ENTRY(r14);
>>>> +	case VCPU_REGS_R15:	return VMSA_REG_ENTRY(r15);
>>>> +#endif
>>>> +	case VCPU_REGS_RIP:	return VMSA_REG_ENTRY(rip);
>>>> +	default:
>>>> +		WARN_ONCE(1, "unsupported VCPU to VMSA register conversion\n");
>>>> +		return VMSA_REG_UNDEF;
>>>> +	}
>>>> +}
>>>> +
>>>> +/* For SEV-ES guests, populate the vCPU register from the appropriate VMSA/GHCB */
>>>> +static void svm_reg_read_override(struct kvm_vcpu *vcpu, enum kvm_reg reg)
>>>> +{
>>>> +	struct vmcb_save_area *vmsa;
>>>> +	struct vcpu_svm *svm;
>>>> +	unsigned int entry;
>>>> +	unsigned long val;
>>>> +	u64 *vmsa_reg;
>>>> +
>>>> +	if (!sev_es_guest(vcpu->kvm))
>>>> +		return;
>>>> +
>>>> +	entry = vcpu_to_vmsa_entry(reg);
>>>> +	if (entry == VMSA_REG_UNDEF)
>>>> +		return;
>>>> +
>>>> +	svm = to_svm(vcpu);
>>>> +	vmsa = get_vmsa(svm);
>>>> +	vmsa_reg = (u64 *)vmsa;
>>>> +	val = (unsigned long)vmsa_reg[entry];
>>>> +
>>>> +	/* If a GHCB is mapped, check the bitmap of valid entries */
>>>> +	if (svm->ghcb) {
>>>> +		if (!test_bit(entry, (unsigned long *)vmsa->valid_bitmap))
>>>> +			val = 0;
>>>
>>> Is KVM relying on this being 0?  Would it make sense to stuff something like
>>> 0xaaaa... or 0xdeadbeefdeadbeef so that consumption of bogus data is more
>>> noticeable?
>>
>> No, KVM isn't relying on this being 0. I thought about using something
>> other than 0 here, but settled on just using 0. I'm open to changing that,
>> though. I'm not sure if there's an easy way to short-circuit the intercept
>> and respond back with an error at this point, that would be optimal.
> 
> Ya, responding with an error would be ideal.  At this point, we're taking the
> same lazy approach for TDX and effectively consuming garbage if the guest
> requests emulation but doesn't expose the necessary GPRs.  That being said,
> TDX's guest/host ABI is quite rigid, so all the "is this register valid"
> checks could be hardcoded into the higher level "emulation" flows.
> 
> Would that also be an option for SEV-ES?

Meaning adding the expected input checks at VMEXIT time in the VMGEXIT
handler, so that accesses later are guaranteed to be good? That is an
option and might also address one of the other points you brought up about
receiving exits that are not supported/expected.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES
  2020-09-15 20:13         ` Tom Lendacky
@ 2020-09-16 15:11           ` Tom Lendacky
  2020-09-16 16:02             ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-16 15:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/15/20 3:13 PM, Tom Lendacky wrote:
> On 9/15/20 11:30 AM, Sean Christopherson wrote:
>> On Tue, Sep 15, 2020 at 08:37:12AM -0500, Tom Lendacky wrote:
>>> On 9/14/20 4:26 PM, Sean Christopherson wrote:
>>>> On Mon, Sep 14, 2020 at 03:15:22PM -0500, Tom Lendacky wrote:
>>>>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>>>>
>>>>> Since the guest register state of an SEV-ES guest is encrypted, debugging
>>>>> is not supported. Update the code to prevent guest debugging when the
>>>>> guest is an SEV-ES guest. This includes adding a callable function that
>>>>> is used to determine if the guest supports being debugged.
>>>>>
>>>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>>>> ---
>>>>>  arch/x86/include/asm/kvm_host.h |  2 ++
>>>>>  arch/x86/kvm/svm/svm.c          | 16 ++++++++++++++++
>>>>>  arch/x86/kvm/vmx/vmx.c          |  7 +++++++
>>>>>  arch/x86/kvm/x86.c              |  3 +++
>>>>>  4 files changed, 28 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>>> index c900992701d6..3e2a3d2a8ba8 100644
>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>> @@ -1234,6 +1234,8 @@ struct kvm_x86_ops {
>>>>>  	void (*reg_read_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
>>>>>  	void (*reg_write_override)(struct kvm_vcpu *vcpu, enum kvm_reg reg,
>>>>>  				   unsigned long val);
>>>>> +
>>>>> +	bool (*allow_debug)(struct kvm *kvm);
>>>>
>>>> Why add both allow_debug() and vmsa_encrypted?  I assume there are scenarios
>>>> where allow_debug() != vmsa_encrypted?  E.g. is there a debug mode for SEV-ES
>>>> where the VMSA is not encrypted, but KVM (ironically) can't intercept #DBs or
>>>> something?
>>>
>>> No, once the guest has had LAUNCH_UPDATE_VMSA run against the vCPUs, then
>>> the vCPU states are all encrypted. But that doesn't mean that debugging
>>> can't be done in the future.
>>
>> I don't quite follow the "doesn't mean debugging can't be done in the future".
>> Does that imply that debugging could be supported for SEV-ES guests, even if
>> they have an encrypted VMSA?
> 
> Almost anything can be done with software. It would require a lot of
> hypervisor and guest code and changes to the GHCB spec, etc. So given
> that, probably just the check for arch.guest_state_protected is enough for
> now. I'll just need to be sure none of the debugging paths can be taken
> before the VMSA is encrypted.

So I don't think there's any guarantee that the KVM_SET_GUEST_DEBUG ioctl
couldn't be called before the VMSA is encrypted, meaning I can't check the
arch.guest_state_protected bit for that call. So if we really want to get
rid of the allow_debug() op, I'd need some other way to indicate that this
is an SEV-ES / protected state guest.

How are you planning on blocking this ioctl for TDX? Would the
>> arch.guest_state_protected bit be set earlier than is done for SEV-ES?

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES
  2020-09-16 15:11           ` Tom Lendacky
@ 2020-09-16 16:02             ` Sean Christopherson
  2020-09-16 16:38               ` Tom Lendacky
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-16 16:02 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Wed, Sep 16, 2020 at 10:11:10AM -0500, Tom Lendacky wrote:
> On 9/15/20 3:13 PM, Tom Lendacky wrote:
> > On 9/15/20 11:30 AM, Sean Christopherson wrote:
> >> I don't quite follow the "doesn't mean debugging can't be done in the future".
> >> Does that imply that debugging could be supported for SEV-ES guests, even if
> >> they have an encrypted VMSA?
> > 
> > Almost anything can be done with software. It would require a lot of
> > hypervisor and guest code and changes to the GHCB spec, etc. So given
> > that, probably just the check for arch.guest_state_protected is enough for
> > now. I'll just need to be sure none of the debugging paths can be taken
> > before the VMSA is encrypted.
> 
> So I don't think there's any guarantee that the KVM_SET_GUEST_DEBUG ioctl
> couldn't be called before the VMSA is encrypted, meaning I can't check the
> arch.guest_state_protected bit for that call. So if we really want to get
> rid of the allow_debug() op, I'd need some other way to indicate that this
> is an SEV-ES / protected state guest.

Would anything break if KVM "speculatively" set guest_state_protected before
LAUNCH_UPDATE_VMSA?  E.g. does KVM need to emulate before LAUNCH_UPDATE_VMSA?

> How are you planning on blocking this ioctl for TDX? Would the
> arch.guest_state_protected bit be set earlier than is done for SEV-ES?

Yep, guest_state_protected is set from time zero (kvm_x86_ops.vm_init) as
guest state is encrypted/inaccessible from the get go.  The flag actually
gets turned off for debuggable TDX guests, but that's also forced to happen
before the KVM_RUN can be invoked (TDX architecture) and is a one-time
configuration, i.e. userspace can flip the switch exactly once, and only at
a very specific point in time.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 11/35] KVM: SVM: Prepare for SEV-ES exit handling in the sev.c file
       [not found]   ` <20200915172148.GE8420@sjchrist-ice>
@ 2020-09-16 16:22     ` Tom Lendacky
  0 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-16 16:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/15/20 12:21 PM, Sean Christopherson wrote:
> On Mon, Sep 14, 2020 at 03:15:25PM -0500, Tom Lendacky wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> This is a pre-patch to consolidate some exit handling code into callable
>> functions. Follow-on patches for SEV-ES exit handling will then be able
>> to use them from the sev.c file.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/kvm/svm/svm.c | 64 +++++++++++++++++++++++++-----------------
>>  1 file changed, 38 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index f9daa40b3cfc..6a4cc535ba77 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3047,6 +3047,43 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
>>  	       "excp_to:", save->last_excp_to);
>>  }
>>  
>> +static bool svm_is_supported_exit(struct kvm_vcpu *vcpu, u64 exit_code)
>> +{
>> +	if (exit_code < ARRAY_SIZE(svm_exit_handlers) &&
>> +	    svm_exit_handlers[exit_code])
>> +		return true;
>> +
>> +	vcpu_unimpl(vcpu, "svm: unexpected exit reason 0x%llx\n", exit_code);
>> +	dump_vmcb(vcpu);
>> +	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>> +	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
>> +	vcpu->run->internal.ndata = 2;
>> +	vcpu->run->internal.data[0] = exit_code;
>> +	vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
> 
> Based on the name "is_supported_exit", I would prefer that vcpu->run be filled
> in by the caller.  Looking at the below code where svm_is_supported_exit() is
> checked, without diving into the implementation of the helper it's not at all
> clear that vcpu->run is filled.
> 
> Assuming svm_invoke_exit_handler() is the only user, it probably makes sense to
> fill vcpu->run in the caller.  If there will be multiple callers, then it'd be
> nice to rename svm_is_supported_exit() to e.g. svm_handle_invalid_exit() or so.

Will change.
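A rough sketch of that rename, filling vcpu->run in one clearly named place (based on the code quoted above; the final naming is an assumption):

static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code)
{
	vcpu_unimpl(vcpu, "svm: unexpected exit reason 0x%llx\n", exit_code);
	dump_vmcb(vcpu);
	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
	vcpu->run->internal.ndata = 2;
	vcpu->run->internal.data[0] = exit_code;
	vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
	return 0;
}

static int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code)
{
	if (exit_code >= ARRAY_SIZE(svm_exit_handlers) ||
	    !svm_exit_handlers[exit_code])
		return svm_handle_invalid_exit(&svm->vcpu, exit_code);

	/* CONFIG_RETPOLINE fast paths elided; dispatch as in the quoted patch. */
	return svm_exit_handlers[exit_code](svm);
}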

> 
>> +
>> +	return false;
>> +}
>> +
>> +static int svm_invoke_exit_handler(struct vcpu_svm *svm, u64 exit_code)
>> +{
>> +	if (!svm_is_supported_exit(&svm->vcpu, exit_code))
>> +		return 0;
>> +
>> +#ifdef CONFIG_RETPOLINE
>> +	if (exit_code == SVM_EXIT_MSR)
>> +		return msr_interception(svm);
>> +	else if (exit_code == SVM_EXIT_VINTR)
>> +		return interrupt_window_interception(svm);
>> +	else if (exit_code == SVM_EXIT_INTR)
>> +		return intr_interception(svm);
>> +	else if (exit_code == SVM_EXIT_HLT)
>> +		return halt_interception(svm);
>> +	else if (exit_code == SVM_EXIT_NPF)
>> +		return npf_interception(svm);
>> +#endif
>> +	return svm_exit_handlers[exit_code](svm);
> 
> Now I see why kvm_skip_emulated_instruction() is bailing on SEV-ES guests,
> #VMGEXIT simply routes through the legacy exit handlers.  Which totally makes
> sense from a code reuse perspective, but the lack of sanity checking with that
> approach is undesirable, e.g. I assume there are a big pile of exit codes that
> are flat out unsupported for SEV-ES, and ideally KVM would yell loudly if it
> tries to do skip_emulated_instruction() for a protected guest.
> 
> Rather than route through the legacy handlers, I suspect it will be more
> desirable in the long run to have a separate path for #VMGEXIT, i.e. a path
> that does the back half of emulation (the front half being the "fetch" phase).

Except there are some automatic exits (AE events) that don't go through
VMGEXIT, where KVM would need to be sure the RIP isn't updated. I can audit the
AE events and see what's possible.

Additionally, maybe just ensuring that kvm_x86_ops.get_rflags() doesn't
return something with the TF flag set eliminates the need for the change
to kvm_skip_emulated_instruction().

> 
> The biggest downsides would be code duplication and ongoing maintenance.  Our
> current approach for TDX is to eat that overhead, because it's not _that_ much
> code.  But, maybe there's a middle ground, e.g. using the existing flows but
> having them skip (heh) kvm_skip_emulated_instruction() for protected guests.
> 
> There are a few flows, e.g. MMIO emulation, that will need dedicated
> implementations, but I'm 99% certain we can put those in x86.c and share them
> between SEV-ES and TDX.
>  
> One question that will impact KVM's options: can KVM inject exceptions to
> SEV-ES guests?  E.g. if the guest request emulation of a bogus WRMSR, is the
> #GP delivered as an actual #GP, or is the error "returned" via the GHCB?

Yes, for an SEV-ES guest, you can inject exceptions. But when using VMGEXIT
for, e.g. WRMSR, you would pass an exception error code back to the #VC
handler that will propagate that exception in the guest with the registers
associated with the #VC.

Thanks,
Tom

> 
> The most annoying hiccup is that TDX doesn't use the "standard" GPRs, e.g. MSR
> index isn't passed via ECX.  I'll play around with a common x86.c
> implementation to see how painful it will be to use for TDX.  Given that SEV-ES
> is more closely aligned with legacy behavior (in terms of registers usage),
> getting SEV-ES working on a common base should be relatively easy, at least in
> theory :-).
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES
  2020-09-16 16:02             ` Sean Christopherson
@ 2020-09-16 16:38               ` Tom Lendacky
  2020-09-16 16:49                 ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-16 16:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh



On 9/16/20 11:02 AM, Sean Christopherson wrote:
> On Wed, Sep 16, 2020 at 10:11:10AM -0500, Tom Lendacky wrote:
>> On 9/15/20 3:13 PM, Tom Lendacky wrote:
>>> On 9/15/20 11:30 AM, Sean Christopherson wrote:
>>>> I don't quite follow the "doesn't mean debugging can't be done in the future".
>>>> Does that imply that debugging could be supported for SEV-ES guests, even if
>>>> they have an encrypted VMSA?
>>>
>>> Almost anything can be done with software. It would require a lot of
>>> hypervisor and guest code and changes to the GHCB spec, etc. So given
>>> that, probably just the check for arch.guest_state_protected is enough for
>>> now. I'll just need to be sure none of the debugging paths can be taken
>>> before the VMSA is encrypted.
>>
>> So I don't think there's any guarantee that the KVM_SET_GUEST_DEBUG ioctl
>> couldn't be called before the VMSA is encrypted, meaning I can't check the
>> arch.guest_state_protected bit for that call. So if we really want to get
>> rid of the allow_debug() op, I'd need some other way to indicate that this
>> is an SEV-ES / protected state guest.
> 
> Would anything break if KVM "speculatively" set guest_state_protected before
> LAUNCH_UPDATE_VMSA?  E.g. does KVM need to emulate before LAUNCH_UPDATE_VMSA?

Yes, the way the code is set up, the guest state (VMSA) is initialized in
the same way it is today (mostly) and that state is encrypted by the
LAUNCH_UPDATE_VMSA call. I check the guest_state_protected bit to decide
on whether to direct the updates to the real VMSA (before it's encrypted)
or the GHCB (that's the get_vmsa() function from patch #5).

Thanks,
Tom

> 
>> How are you planning on blocking this ioctl for TDX? Would the
>> arch.guest_state_protected bit be set earlier than is done for SEV-ES?
> 
> Yep, guest_state_protected is set from time zero (kvm_x86_ops.vm_init) as
> guest state is encrypted/inaccessible from the get go.  The flag actually
> gets turned off for debuggable TDX guests, but that's also forced to happen
> before the KVM_RUN can be invoked (TDX architecture) and is a one-time
> configuration, i.e. userspace can flip the switch exactly once, and only at
> a very specific point in time.
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES
  2020-09-16 16:38               ` Tom Lendacky
@ 2020-09-16 16:49                 ` Sean Christopherson
  2020-09-16 20:27                   ` Tom Lendacky
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-16 16:49 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Wed, Sep 16, 2020 at 11:38:38AM -0500, Tom Lendacky wrote:
> 
> 
> On 9/16/20 11:02 AM, Sean Christopherson wrote:
> > On Wed, Sep 16, 2020 at 10:11:10AM -0500, Tom Lendacky wrote:
> >> On 9/15/20 3:13 PM, Tom Lendacky wrote:
> >>> On 9/15/20 11:30 AM, Sean Christopherson wrote:
> >>>> I don't quite follow the "doesn't mean debugging can't be done in the future".
> >>>> Does that imply that debugging could be supported for SEV-ES guests, even if
> >>>> they have an encrypted VMSA?
> >>>
> >>> Almost anything can be done with software. It would require a lot of
> >>> hypervisor and guest code and changes to the GHCB spec, etc. So given
> >>> that, probably just the check for arch.guest_state_protected is enough for
> >>> now. I'll just need to be sure none of the debugging paths can be taken
> >>> before the VMSA is encrypted.
> >>
> >> So I don't think there's any guarantee that the KVM_SET_GUEST_DEBUG ioctl
> >> couldn't be called before the VMSA is encrypted, meaning I can't check the
> >> arch.guest_state_protected bit for that call. So if we really want to get
> >> rid of the allow_debug() op, I'd need some other way to indicate that this
> >> is an SEV-ES / protected state guest.
> > 
> > Would anything break if KVM "speculatively" set guest_state_protected before
> > LAUNCH_UPDATE_VMSA?  E.g. does KVM need to emulate before LAUNCH_UPDATE_VMSA?
> 
> Yes, the way the code is set up, the guest state (VMSA) is initialized in
> the same way it is today (mostly) and that state is encrypted by the
> LAUNCH_UPDATE_VMSA call. I check the guest_state_protected bit to decide
> on whether to direct the updates to the real VMSA (before it's encrypted)
> or the GHCB (that's the get_vmsa() function from patch #5).

Ah, gotcha.  Would it work to set guest_state_protected[*] from time zero,
and move vmsa_encrypted to struct vcpu_svm?  I.e. keep vmsa_encrypted, but
use it only for guiding get_vmsa() and related behavior.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES
  2020-09-16 16:49                 ` Sean Christopherson
@ 2020-09-16 20:27                   ` Tom Lendacky
  2020-09-16 22:50                     ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-09-16 20:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/16/20 11:49 AM, Sean Christopherson wrote:
> On Wed, Sep 16, 2020 at 11:38:38AM -0500, Tom Lendacky wrote:
>>
>>
>> On 9/16/20 11:02 AM, Sean Christopherson wrote:
>>> On Wed, Sep 16, 2020 at 10:11:10AM -0500, Tom Lendacky wrote:
>>>> On 9/15/20 3:13 PM, Tom Lendacky wrote:
>>>>> On 9/15/20 11:30 AM, Sean Christopherson wrote:
>>>>>> I don't quite follow the "doesn't mean debugging can't be done in the future".
>>>>>> Does that imply that debugging could be supported for SEV-ES guests, even if
>>>>>> they have an encrypted VMSA?
>>>>>
>>>>> Almost anything can be done with software. It would require a lot of
>>>>> hypervisor and guest code and changes to the GHCB spec, etc. So given
>>>>> that, probably just the check for arch.guest_state_protected is enough for
>>>>> now. I'll just need to be sure none of the debugging paths can be taken
>>>>> before the VMSA is encrypted.
>>>>
>>>> So I don't think there's any guarantee that the KVM_SET_GUEST_DEBUG ioctl
>>>> couldn't be called before the VMSA is encrypted, meaning I can't check the
>>>> arch.guest_state_protected bit for that call. So if we really want to get
>>>> rid of the allow_debug() op, I'd need some other way to indicate that this
>>>> is an SEV-ES / protected state guest.
>>>
>>> Would anything break if KVM "speculatively" set guest_state_protected before
>>> LAUNCH_UPDATE_VMSA?  E.g. does KVM need to emulate before LAUNCH_UPDATE_VMSA?
>>
>> Yes, the way the code is set up, the guest state (VMSA) is initialized in
>> the same way it is today (mostly) and that state is encrypted by the
>> LAUNCH_UPDATE_VMSA call. I check the guest_state_protected bit to decide
>> on whether to direct the updates to the real VMSA (before it's encrypted)
>> or the GHCB (that's the get_vmsa() function from patch #5).
> 
> Ah, gotcha.  Would it work to set guest_state_protected[*] from time zero,
> and move vmsa_encrypted to struct vcpu_svm?  I.e. keep vmsa_encrypted, but
> use it only for guiding get_vmsa() and related behavior.

It is mainly __set_sregs() that needs to know when to allow the register
writes and when not to. During guest initialization, __set_sregs is how
some of the VMSA is initialized by Qemu.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES
  2020-09-16 20:27                   ` Tom Lendacky
@ 2020-09-16 22:50                     ` Sean Christopherson
  2020-09-17 16:27                       ` Tom Lendacky
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-09-16 22:50 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Wed, Sep 16, 2020 at 03:27:13PM -0500, Tom Lendacky wrote:
> On 9/16/20 11:49 AM, Sean Christopherson wrote:
> > On Wed, Sep 16, 2020 at 11:38:38AM -0500, Tom Lendacky wrote:
> >>
> >>
> >> On 9/16/20 11:02 AM, Sean Christopherson wrote:
> >>> On Wed, Sep 16, 2020 at 10:11:10AM -0500, Tom Lendacky wrote:
> >>>> On 9/15/20 3:13 PM, Tom Lendacky wrote:
> >>>>> On 9/15/20 11:30 AM, Sean Christopherson wrote:
> >>>>>> I don't quite follow the "doesn't mean debugging can't be done in the future".
> >>>>>> Does that imply that debugging could be supported for SEV-ES guests, even if
> >>>>>> they have an encrypted VMSA?
> >>>>>
> >>>>> Almost anything can be done with software. It would require a lot of
> >>>>> hypervisor and guest code and changes to the GHCB spec, etc. So given
> >>>>> that, probably just the check for arch.guest_state_protected is enough for
> >>>>> now. I'll just need to be sure none of the debugging paths can be taken
> >>>>> before the VMSA is encrypted.
> >>>>
> >>>> So I don't think there's any guarantee that the KVM_SET_GUEST_DEBUG ioctl
> >>>> couldn't be called before the VMSA is encrypted, meaning I can't check the
> >>>> arch.guest_state_protected bit for that call. So if we really want to get
> >>>> rid of the allow_debug() op, I'd need some other way to indicate that this
> >>>> is an SEV-ES / protected state guest.
> >>>
> >>> Would anything break if KVM "speculatively" set guest_state_protected before
> >>> LAUNCH_UPDATE_VMSA?  E.g. does KVM need to emulate before LAUNCH_UPDATE_VMSA?
> >>
> >> Yes, the way the code is set up, the guest state (VMSA) is initialized in
> >> the same way it is today (mostly) and that state is encrypted by the
> >> LAUNCH_UPDATE_VMSA call. I check the guest_state_protected bit to decide
> >> on whether to direct the updates to the real VMSA (before it's encrypted)
> >> or the GHCB (that's the get_vmsa() function from patch #5).
> > 
> > Ah, gotcha.  Would it work to set guest_state_protected[*] from time zero,
> > and move vmsa_encrypted to struct vcpu_svm?  I.e. keep vmsa_encrypted, but
> > use it only for guiding get_vmsa() and related behavior.
> 
> It is mainly __set_sregs() that needs to know when to allow the register
> writes and when not to. During guest initialization, __set_sregs is how
> some of the VMSA is initialized by Qemu.

Hmm.  I assume that also means KVM_SET_REGS and KVM_GET_XCRS are also legal
before the VMSA is encrypted?  If so, then the current behavior of setting
vmsa_encrypted "late" make sense.  KVM_SET_FPU/XSAVE can be handled by not
allocating guest_fpu, i.e. they can be disallowed from time zero without
adding an SEV-ES specific check.

Which brings us back to KVM_SET_GUEST_DEBUG.  What would happen if that were
allowed prior to VMSA encryption?  If LAUNCH_UPDATE_VMSA acts as a sort of
reset, one thought would be to allow KVM_SET_GUEST_DEBUG and then sanitize
KVM's state during LAUNCH_UPDATE_VMSA.  Or perhaps even better, disallow
LAUNCH_UPDATE_VMSA if vcpu->guest_debug!=0.  That would allow using debug
capabilities up until LAUNCH_UPDATE_VMSA without adding much burden to KVM.
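
A minimal sketch of that last idea, assuming the LAUNCH_UPDATE_VMSA handler
in svm/sev.c is a function along the lines of sev_launch_update_vmsa() that
walks the VM's vCPUs; the function name, error code and surrounding locking
here are assumptions rather than what the posted series actually does:

static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
	struct kvm_vcpu *vcpu;
	int i;

	/*
	 * Debugging must be torn down before the VMSA is encrypted; once
	 * any vCPU still has guest_debug set, refuse to encrypt guest state.
	 */
	kvm_for_each_vcpu(i, vcpu, kvm) {
		if (vcpu->guest_debug)
			return -EINVAL;
	}

	/* ... existing per-vCPU SEV_CMD_LAUNCH_UPDATE_VMSA processing ... */
	return 0;
}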

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES
  2020-09-16 22:50                     ` Sean Christopherson
@ 2020-09-17 16:27                       ` Tom Lendacky
  0 siblings, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-09-17 16:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 9/16/20 5:50 PM, Sean Christopherson wrote:
> On Wed, Sep 16, 2020 at 03:27:13PM -0500, Tom Lendacky wrote:
>> On 9/16/20 11:49 AM, Sean Christopherson wrote:
>>> On Wed, Sep 16, 2020 at 11:38:38AM -0500, Tom Lendacky wrote:
>>>>
>>>>
>>>> On 9/16/20 11:02 AM, Sean Christopherson wrote:
>>>>> On Wed, Sep 16, 2020 at 10:11:10AM -0500, Tom Lendacky wrote:
>>>>>> On 9/15/20 3:13 PM, Tom Lendacky wrote:
>>>>>>> On 9/15/20 11:30 AM, Sean Christopherson wrote:
>>>>>>>> I don't quite follow the "doesn't mean debugging can't be done in the future".
>>>>>>>> Does that imply that debugging could be supported for SEV-ES guests, even if
>>>>>>>> they have an encrypted VMSA?
>>>>>>>
>>>>>>> Almost anything can be done with software. It would require a lot of
>>>>>>> hypervisor and guest code and changes to the GHCB spec, etc. So given
>>>>>>> that, probably just the check for arch.guest_state_protected is enough for
>>>>>>> now. I'll just need to be sure none of the debugging paths can be taken
>>>>>>> before the VMSA is encrypted.
>>>>>>
>>>>>> So I don't think there's any guarantee that the KVM_SET_GUEST_DEBUG ioctl
>>>>>> couldn't be called before the VMSA is encrypted, meaning I can't check the
>>>>>> arch.guest_state_protected bit for that call. So if we really want to get
>>>>>> rid of the allow_debug() op, I'd need some other way to indicate that this
>>>>>> is an SEV-ES / protected state guest.
>>>>>
>>>>> Would anything break if KVM "speculatively" set guest_state_protected before
>>>>> LAUNCH_UPDATE_VMSA?  E.g. does KVM need to emulate before LAUNCH_UPDATE_VMSA?
>>>>
>>>> Yes, the way the code is set up, the guest state (VMSA) is initialized in
>>>> the same way it is today (mostly) and that state is encrypted by the
>>>> LAUNCH_UPDATE_VMSA call. I check the guest_state_protected bit to decide
>>>> on whether to direct the updates to the real VMSA (before it's encrypted)
>>>> or the GHCB (that's the get_vmsa() function from patch #5).
>>>
>>> Ah, gotcha.  Would it work to set guest_state_protected[*] from time zero,
>>> and move vmsa_encrypted to struct vcpu_svm?  I.e. keep vmsa_encrypted, but
>>> use it only for guiding get_vmsa() and related behavior.
>>
>> It is mainly __set_sregs() that needs to know when to allow the register
>> writes and when not to. During guest initialization, __set_sregs is how
>> some of the VMSA is initialized by Qemu.
> 
> Hmm.  I assume that also means KVM_SET_REGS and KVM_GET_XCRS are also legal
> before the VMSA is encrypted?  If so, then the current behavior of setting
> vmsa_encrypted "late" make sense.  KVM_SET_FPU/XSAVE can be handled by not
> allocating guest_fpu, i.e. they can be disallowed from time zero without
> adding an SEV-ES specific check.
> 
> Which brings us back to KVM_SET_GUEST_DEBUG.  What would happen if that were
> allowed prior to VMSA encryption?  If LAUNCH_UPDATE_VMSA acts as a sort of
> reset, one thought would be to allow KVM_SET_GUEST_DEBUG and then sanitize
> KVM's state during LAUNCH_UPDATE_VMSA.  Or perhaps even better, disallow
> LAUNCH_UPDATE_VMSA if vcpu->guest_debug!=0.  That would allow using debug
> capabilities up until LAUNCH_UPDATE_VMSA without adding much burden to KVM.

I think the vcpu->guest_debug check before the LAUNCH_UPDATE_VMSA would be 
good. I'll remove the allow_debug() op and use the guest_state_protected 
check in its place.
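
For reference, a rough sketch of what that replacement might look like in
kvm_arch_vcpu_ioctl_set_guest_debug(); whether the flag ends up per-VM (as
in the TDX snippets earlier in the thread) or per-vCPU is still open, so
the field placement below is an assumption:

int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
					struct kvm_guest_debug *dbg)
{
	int r;

	vcpu_load(vcpu);

	/* Debugging is not supported once the guest state is protected. */
	r = -EINVAL;
	if (vcpu->kvm->arch.guest_state_protected)
		goto out;

	/* ... existing KVM_SET_GUEST_DEBUG handling ... */
	r = 0;
out:
	vcpu_put(vcpu);
	return r;
}

Together with a vcpu->guest_debug check at LAUNCH_UPDATE_VMSA time, this
would let debugging be used freely up to the point the VMSA is encrypted
and be cleanly rejected afterwards.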

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-09-16  0:19     ` Sean Christopherson
@ 2020-10-13 20:26       ` Tom Lendacky
  2020-11-30 15:31       ` Paolo Bonzini
  1 sibling, 0 replies; 86+ messages in thread
From: Tom Lendacky @ 2020-10-13 20:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, x86, Paolo Bonzini, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

Apologies, Sean.

I thought I had replied to this but found it instead in my drafts folder...

I've taken much of your feedback and incorporated it into the next version
of the patches that I submitted, and I've updated this response based on
that as well.

On 9/15/20 7:19 PM, Sean Christopherson wrote:
> On Tue, Sep 15, 2020 at 12:22:05PM -0500, Tom Lendacky wrote:
>> On 9/14/20 5:59 PM, Sean Christopherson wrote:
>>> Given that we don't yet have publicly available KVM code for TDX, what if I
>>> generate and post a list of ioctls() that are denied by either SEV-ES or TDX,
>>> organized by the denier(s)?  Then for the ioctls() that are denied by one and
>>> not the other, we add a brief explanation of why it's denied?
>>>
>>> If that sounds ok, I'll get the list and the TDX side of things posted
>>> tomorrow.
>>
>> That sounds good.
> 
> TDX completely blocks the following ioctl()s:

SEV-ES doesn't need to completely block these ioctls; SEV-SNP is likely to
do more of that. SEV-ES will still allow interrupts to be injected, or
registers to be retrieved (though they will only contain what was provided
in the GHCB exchange), etc.

> 
>   kvm_vcpu_ioctl_interrupt
>   kvm_vcpu_ioctl_smi
>   kvm_vcpu_ioctl_x86_setup_mce
>   kvm_vcpu_ioctl_x86_set_mce
>   kvm_vcpu_ioctl_x86_get_debugregs
>   kvm_vcpu_ioctl_x86_set_debugregs
>   kvm_vcpu_ioctl_x86_get_xsave
>   kvm_vcpu_ioctl_x86_set_xsave
>   kvm_vcpu_ioctl_x86_get_xcrs
>   kvm_vcpu_ioctl_x86_set_xcrs
>   kvm_arch_vcpu_ioctl_get_regs
>   kvm_arch_vcpu_ioctl_set_regs
>   kvm_arch_vcpu_ioctl_get_sregs
>   kvm_arch_vcpu_ioctl_set_sregs
>   kvm_arch_vcpu_ioctl_set_guest_debug
>   kvm_arch_vcpu_ioctl_get_fpu
>   kvm_arch_vcpu_ioctl_set_fpu

Of the listed ioctls, really the only ones I've updated are:

  kvm_vcpu_ioctl_x86_get_xsave
  kvm_vcpu_ioctl_x86_set_xsave

  kvm_arch_vcpu_ioctl_get_sregs
    This allows reading of the tracking value registers
  kvm_arch_vcpu_ioctl_set_sregs
    This prevents setting of register values

  kvm_arch_vcpu_ioctl_set_guest_debug

  kvm_arch_vcpu_ioctl_get_fpu
  kvm_arch_vcpu_ioctl_set_fpu
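
For the sregs pair, the guard amounts to checking guest_state_protected in
the common __get_sregs()/__set_sregs() paths. A hedged sketch of the set
side only (the label name, which fields get skipped, and whether the flag
is per-VM or per-vCPU are assumptions, not the exact code in the series):

static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
{
	/*
	 * Before LAUNCH_UPDATE_VMSA this behaves as it does today and is how
	 * QEMU seeds some of the initial VMSA state.  Once the VMSA is
	 * encrypted, the registers it contains can no longer be written, so
	 * skip them and only touch state that KVM still owns.
	 */
	if (vcpu->kvm->arch.guest_state_protected)
		goto skip_protected_regs;

	/* ... existing segment, descriptor table and control register writes ... */

skip_protected_regs:
	/* ... updates to state KVM still controls, e.g. pending interrupts ... */
	return 0;
}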

> 
> Looking through the code, I think kvm_arch_vcpu_ioctl_get_mpstate() and
> kvm_arch_vcpu_ioctl_set_mpstate() should also be disallowed, we just haven't
> actually done so.

I haven't done anything with these either.

> 
> There are also two helper functions that are "blocked".
> dm_request_for_irq_injection() returns false if guest_state_protected, and
> post_kvm_run_save() shoves dummy state.

... and these.

> 
> TDX also selectively blocks/skips portions of other ioctl()s so that the
> TDX code itself can yell loudly if e.g. .get_cpl() is invoked.  The event
> injection restrictions are due to direct injection not being allowed (except
> for NMIs); all IRQs have to be routed through APICv (posted interrupts) and
> exception injection is completely disallowed.

For SEV-ES, we don't have those restrictions.

> 
>   kvm_vcpu_ioctl_x86_get_vcpu_events:
> 	if (!vcpu->kvm->arch.guest_state_protected)
>         	events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);
> 
>   kvm_arch_vcpu_put:
>         if (vcpu->preempted && !vcpu->kvm->arch.guest_state_protected)
>                 vcpu->arch.preempted_in_kernel = !kvm_x86_ops.get_cpl(vcpu);
> 
>   kvm_vcpu_ioctl_x86_set_vcpu_events:
> 	u32 allowed_flags = KVM_VCPUEVENT_VALID_NMI_PENDING |
> 			    KVM_VCPUEVENT_VALID_SIPI_VECTOR |
> 			    KVM_VCPUEVENT_VALID_SHADOW |
> 			    KVM_VCPUEVENT_VALID_SMM |
> 			    KVM_VCPUEVENT_VALID_PAYLOAD;
> 
> 	if (vcpu->kvm->arch.guest_state_protected)
> 		allowed_flags = KVM_VCPUEVENT_VALID_NMI_PENDING;
> 
> 
>   kvm_arch_vcpu_ioctl_run:
> 	if (vcpu->kvm->arch.guest_state_protected)
> 		kvm_sync_valid_fields = KVM_SYNC_X86_EVENTS;
> 	else
> 		kvm_sync_valid_fields = KVM_SYNC_X86_VALID_FIELDS;
> 
> 
> In addition to the more generic guest_state_protected, we also (obviously
> tentatively) have a few other flags to deal with aspects of TDX that I'm
> fairly certain don't apply to SEV-ES:
> 
>   tsc_immutable - KVM doesn't have write access to the TSC offset of the
>                   guest.
> 
>   eoi_intercept_unsupported - KVM can't intercept EOIs (doesn't have access
>                               to EOI bitmaps) and so can't support level
>                               triggered interrupts, at least not without
>                               extra pain.
> 
>   readonly_mem_unsupported - Secure EPT (analogous to SNP) requires RWX
>                              permissions for all private/encrypted memory.
>                              S-EPT isn't optional, so we get the joy of
>                              adding this right off the bat...

Yes, most of the above stuff doesn't apply to SEV-ES.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-09-16  0:19     ` Sean Christopherson
  2020-10-13 20:26       ` Tom Lendacky
@ 2020-11-30 15:31       ` Paolo Bonzini
  2020-11-30 16:06         ` Tom Lendacky
  2020-11-30 18:14         ` Sean Christopherson
  1 sibling, 2 replies; 86+ messages in thread
From: Paolo Bonzini @ 2020-11-30 15:31 UTC (permalink / raw)
  To: Tom Lendacky, Sean Christopherson
  Cc: kvm, linux-kernel, x86, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 16/09/20 02:19, Sean Christopherson wrote:
> 
> TDX also selectively blocks/skips portions of other ioctl()s so that the
> TDX code itself can yell loudly if e.g. .get_cpl() is invoked.  The event
> injection restrictions are due to direct injection not being allowed (except
> for NMIs); all IRQs have to be routed through APICv (posted interrupts) and
> exception injection is completely disallowed.
> 
>    kvm_vcpu_ioctl_x86_get_vcpu_events:
> 	if (!vcpu->kvm->arch.guest_state_protected)
>          	events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);

Perhaps an alternative implementation can enter the vCPU with immediate 
exit until no events are pending, and then return all zeroes?

Paolo


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-11-30 15:31       ` Paolo Bonzini
@ 2020-11-30 16:06         ` Tom Lendacky
  2020-11-30 18:18           ` Sean Christopherson
  2020-11-30 18:14         ` Sean Christopherson
  1 sibling, 1 reply; 86+ messages in thread
From: Tom Lendacky @ 2020-11-30 16:06 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, linux-kernel, x86, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 11/30/20 9:31 AM, Paolo Bonzini wrote:
> On 16/09/20 02:19, Sean Christopherson wrote:
>>
>> TDX also selectively blocks/skips portions of other ioctl()s so that the
>> TDX code itself can yell loudly if e.g. .get_cpl() is invoked.  The event
>> injection restrictions are due to direct injection not being allowed
>> (except
>> for NMIs); all IRQs have to be routed through APICv (posted interrupts) and
>> exception injection is completely disallowed.
>>
>>    kvm_vcpu_ioctl_x86_get_vcpu_events:
>>     if (!vcpu->kvm->arch.guest_state_protected)
>>              events->interrupt.shadow =
>> kvm_x86_ops.get_interrupt_shadow(vcpu);
> 
> Perhaps an alternative implementation can enter the vCPU with immediate
> exit until no events are pending, and then return all zeroes?

SEV-SNP has support for restricting injections, but SEV-ES does not.
Perhaps a new boolean, guest_restricted_injection, can be used instead of
basing it on guest_state_protected.
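
A hedged sketch of how such a split could look, reusing the
kvm_vcpu_ioctl_x86_get_vcpu_events() example quoted above; the flag name
comes from this message, but where it lives and how it gets set are
assumptions:

	/* In struct kvm_arch, alongside guest_state_protected: */
	bool guest_state_protected;		/* register state is encrypted   */
	bool guest_restricted_injection;	/* event injection is restricted */

	/*
	 * kvm_vcpu_ioctl_x86_get_vcpu_events() would then key off the new
	 * flag, so an SEV-ES guest (protected state, unrestricted injection)
	 * keeps the current behavior while SEV-SNP/TDX guests do not:
	 */
	if (!vcpu->kvm->arch.guest_restricted_injection)
		events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);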

Thanks,
Tom

> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-11-30 15:31       ` Paolo Bonzini
  2020-11-30 16:06         ` Tom Lendacky
@ 2020-11-30 18:14         ` Sean Christopherson
  2020-11-30 18:35           ` Paolo Bonzini
  1 sibling, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-11-30 18:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Tom Lendacky, kvm, linux-kernel, x86, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Mon, Nov 30, 2020, Paolo Bonzini wrote:
> On 16/09/20 02:19, Sean Christopherson wrote:
> > 
> > TDX also selectively blocks/skips portions of other ioctl()s so that the
> > TDX code itself can yell loudly if e.g. .get_cpl() is invoked.  The event
> > injection restrictions are due to direct injection not being allowed (except
> > for NMIs); all IRQs have to be routed through APICv (posted interrupts) and
> > exception injection is completely disallowed.
> > 
> >    kvm_vcpu_ioctl_x86_get_vcpu_events:
> > 	if (!vcpu->kvm->arch.guest_state_protected)
> >          	events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);
> 
> Perhaps an alternative implementation can enter the vCPU with immediate exit
> until no events are pending, and then return all zeroes?

This can't work.  If the guest has STI blocking, e.g. it did STI->TDVMCALL with
a valid vIRQ in GUEST_RVI, then events->interrupt.shadow should technically be
non-zero to reflect the STI blocking.  But, the immediate exit (a hardware IRQ
for TDX guests) will cause VM-Exit before the guest can execute any instructions
and thus the guest will never clear STI blocking and never consume the pending
event.  Or there could be a valid vIRQ, but GUEST_RFLAGS.IF=0, in which case KVM
would need to run the guest for an indeterminate amount of time to wait for the
vIRQ to be consumed.

Tangentially related, I haven't looked through the official external TDX docs,
but I suspect that vmcs.GUEST_RVI is listed as inaccessible for production TDs.
This will be changed as the VMM needs access to GUEST_RVI to handle
STI->TDVMCALL(HLT), otherwise the VMM may incorrectly put the vCPU into a
blocked (not runnable) state even though it has a pending wake event.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 22/35] KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
  2020-09-14 22:13   ` Sean Christopherson
  2020-09-15 15:56     ` Tom Lendacky
@ 2020-11-30 18:15     ` Paolo Bonzini
  1 sibling, 0 replies; 86+ messages in thread
From: Paolo Bonzini @ 2020-11-30 18:15 UTC (permalink / raw)
  To: Sean Christopherson, Tom Lendacky
  Cc: kvm, linux-kernel, x86, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 15/09/20 00:13, Sean Christopherson wrote:
>> +static void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0,
>> +			     unsigned long cr0)
> What about using __kvm_set_cr*() instead of kvm_post_set_cr*()?  That would
> show that __kvm_set_cr*() is a subordinate of kvm_set_cr*(), and from the
> SVM side would provide the hint that the code is skipping the front end of
> kvm_set_cr*().

No, kvm_post_set_cr0 is exactly the right name because it doesn't set 
any state.  __kvm_set_cr0 tells me that it is a (rarely used) way to set 
CR0, which this function isn't.

Sorry Tom for not catching this earlier.

Paolo

>> +{
>> +	unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;
>> +
>> +	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
>> +		kvm_clear_async_pf_completion_queue(vcpu);
>> +		kvm_async_pf_hash_reset(vcpu);
>> +	}
>> +
>> +	if ((cr0 ^ old_cr0) & update_bits)
>> +		kvm_mmu_reset_context(vcpu);
>> +
>> +	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
>> +	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
>> +	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
>> +		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
>> +}
>> +
>>   int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
>>   {
>>   	unsigned long old_cr0 = kvm_read_cr0(vcpu);
>>   	unsigned long pdptr_bits = X86_CR0_CD | X86_CR0_NW | X86_CR0_PG;
>> -	unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;
>>   
>>   	cr0 |= X86_CR0_ET;
>>   
>> @@ -842,22 +860,23 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
>>   
>>   	kvm_x86_ops.set_cr0(vcpu, cr0);
>>   
>> -	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
>> -		kvm_clear_async_pf_completion_queue(vcpu);
>> -		kvm_async_pf_hash_reset(vcpu);
>> -	}
>> +	kvm_post_set_cr0(vcpu, old_cr0, cr0);


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 23/35] KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
  2020-09-14 22:16   ` Sean Christopherson
@ 2020-11-30 18:16     ` Paolo Bonzini
  0 siblings, 0 replies; 86+ messages in thread
From: Paolo Bonzini @ 2020-11-30 18:16 UTC (permalink / raw)
  To: Sean Christopherson, Tom Lendacky
  Cc: kvm, linux-kernel, x86, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 15/09/20 00:16, Sean Christopherson wrote:
>> +int kvm_track_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>> +{
>> +	unsigned long old_cr4 = kvm_read_cr4(vcpu);
>> +	unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
>> +				   X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
>> +
>> +	if (kvm_x86_ops.set_cr4(vcpu, cr4))
>> +		return 1;
> Pretty much all the same comments as EFER and CR0, e.g. call svm_set_cr4()
> directly instead of bouncing through kvm_x86_ops.  And with that, this can
> be called __kvm_set_cr4() to be consistent with __kvm_set_cr0().

I agree with calling svm_set_cr4 directly, but then this should be 
kvm_post_set_cr4.

Paolo

>> +
>> +	if (((cr4 ^ old_cr4) & pdptr_bits) ||
>> +	    (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
>> +		kvm_mmu_reset_context(vcpu);
>> +
>> +	if ((cr4 ^ old_cr4) & (X86_CR4_OSXSAVE | X86_CR4_PKE))
>> +		kvm_update_cpuid_runtime(vcpu);
>> +
>> +	return 0;
>> +}


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-11-30 16:06         ` Tom Lendacky
@ 2020-11-30 18:18           ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2020-11-30 18:18 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Paolo Bonzini, kvm, linux-kernel, x86, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Mon, Nov 30, 2020, Tom Lendacky wrote:
> On 11/30/20 9:31 AM, Paolo Bonzini wrote:
> > On 16/09/20 02:19, Sean Christopherson wrote:
> >>
> >> TDX also selectively blocks/skips portions of other ioctl()s so that the
> >> TDX code itself can yell loudly if e.g. .get_cpl() is invoked.  The event
> >> injection restrictions are due to direct injection not being allowed
> >> (except
> >> for NMIs); all IRQs have to be routed through APICv (posted interrupts) and
> >> exception injection is completely disallowed.
> >>
> >>    kvm_vcpu_ioctl_x86_get_vcpu_events:
> >>     if (!vcpu->kvm->arch.guest_state_protected)
> >>              events->interrupt.shadow =
> >> kvm_x86_ops.get_interrupt_shadow(vcpu);
> > 
> > Perhaps an alternative implementation can enter the vCPU with immediate
> > exit until no events are pending, and then return all zeroes?
> 
> SEV-SNP has support for restricting injections, but SEV-ES does not.
> Perhaps a new boolean, guest_restricted_injection, can be used instead of
> basing it on guest_state_protected.

Ya, that probably makes sense.  I suspect the easiest way to resolve these
conflicts will be to land the SEV-ES series and then tweak things as needed for
TDX.  Easiest in the sense that it should be fairly obvious what can be covered
by guest_state_protected and what needs a dedicated flag.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 25/35] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  2020-09-15 22:44           ` Sean Christopherson
@ 2020-11-30 18:28             ` Paolo Bonzini
  2020-11-30 19:39               ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Paolo Bonzini @ 2020-11-30 18:28 UTC (permalink / raw)
  To: Tom Lendacky, Sean Christopherson
  Cc: kvm, linux-kernel, x86, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 16/09/20 00:44, Sean Christopherson wrote:
>> KVM doesn't have control of them. They are part of the guest's encrypted
>> state and that is what the guest uses. KVM can't alter the value that the
>> guest is using for them once the VMSA is encrypted. However, KVM makes
>> some decisions based on the values it thinks it knows.  For example, early
>> on I remember the async PF support failing because the CR0 that KVM
>> thought the guest had didn't have the PE bit set, even though the guest
>> was in protected mode. So KVM didn't include the error code in the
>> exception it injected (is_protmode() was false) and things failed. Without
>> syncing these values after live migration, things also fail (probably for
>> the same reason). So the idea is to just keep KVM apprised of the values
>> that the guest has.
> 
> Ah, gotcha.  Migrating tracked state through the VMSA would probably be ideal.
> The semantics of __set_sregs() kinda setting state but not reaaaally setting
> state would be weird.

How would that work with TDX?

Paolo


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-11-30 18:14         ` Sean Christopherson
@ 2020-11-30 18:35           ` Paolo Bonzini
  2020-11-30 19:35             ` Sean Christopherson
  0 siblings, 1 reply; 86+ messages in thread
From: Paolo Bonzini @ 2020-11-30 18:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Tom Lendacky, kvm, linux-kernel, x86, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On 30/11/20 19:14, Sean Christopherson wrote:
>>> TDX also selectively blocks/skips portions of other ioctl()s so that the
>>> TDX code itself can yell loudly if e.g. .get_cpl() is invoked.  The event
>>> injection restrictions are due to direct injection not being allowed (except
>>> for NMIs); all IRQs have to be routed through APICv (posted interrupts) and
>>> exception injection is completely disallowed.
>>>
>>>     kvm_vcpu_ioctl_x86_get_vcpu_events:
>>> 	if (!vcpu->kvm->arch.guest_state_protected)
>>>           	events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);
>> Perhaps an alternative implementation can enter the vCPU with immediate exit
>> until no events are pending, and then return all zeroes?
>
> This can't work.  If the guest has STI blocking, e.g. it did STI->TDVMCALL with
> a valid vIRQ in GUEST_RVI, then events->interrupt.shadow should technically be
> non-zero to reflect the STI blocking.  But, the immediate exit (a hardware IRQ
> for TDX guests) will cause VM-Exit before the guest can execute any instructions
> and thus the guest will never clear STI blocking and never consume the pending
> event.  Or there could be a valid vIRQ, but GUEST_RFLAGS.IF=0, in which case KVM
> would need to run the guest for an indeterminate amount of time to wait for the
> vIRQ to be consumed.

Delayed interrupts are fine, since they are injected according to RVI 
and the posted interrupt descriptor.  I'm thinking more of events 
(exceptions and interrupts) that caused an EPT violation exit and were 
recorded in the IDT-vectoring info field.

Paolo

> Tangentially related, I haven't looked through the official external TDX docs,
> but I suspect that vmcs.GUEST_RVI is listed as inaccessible for production TDs.
> This will be changed as the VMM needs access to GUEST_RVI to handle
> STI->TDVMCALL(HLT), otherwise the VMM may incorrectly put the vCPU into a
> blocked (not runnable) state even though it has a pending wake event.
> 


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-11-30 18:35           ` Paolo Bonzini
@ 2020-11-30 19:35             ` Sean Christopherson
  2020-11-30 20:24               ` Paolo Bonzini
  0 siblings, 1 reply; 86+ messages in thread
From: Sean Christopherson @ 2020-11-30 19:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Tom Lendacky, kvm, linux-kernel, x86, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh, Xiaoyao Li, Isaku Yamahata

+Isaku and Xiaoyao

On Mon, Nov 30, 2020, Paolo Bonzini wrote:
> On 30/11/20 19:14, Sean Christopherson wrote:
> > > > TDX also selectively blocks/skips portions of other ioctl()s so that the
> > > > TDX code itself can yell loudly if e.g. .get_cpl() is invoked.  The event
> > > > injection restrictions are due to direct injection not being allowed (except
> > > > for NMIs); all IRQs have to be routed through APICv (posted interrupts) and
> > > > exception injection is completely disallowed.
> > > > 
> > > >     kvm_vcpu_ioctl_x86_get_vcpu_events:
> > > > 	if (!vcpu->kvm->arch.guest_state_protected)
> > > >           	events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);
> > > Perhaps an alternative implementation can enter the vCPU with immediate exit
> > > until no events are pending, and then return all zeroes?
> > 
> > This can't work.  If the guest has STI blocking, e.g. it did STI->TDVMCALL with
> > a valid vIRQ in GUEST_RVI, then events->interrupt.shadow should technically be
> > non-zero to reflect the STI blocking.  But, the immediate exit (a hardware IRQ
> > for TDX guests) will cause VM-Exit before the guest can execute any instructions
> > and thus the guest will never clear STI blocking and never consume the pending
> > event.  Or there could be a valid vIRQ, but GUEST_RFLAGS.IF=0, in which case KVM
> > would need to run the guest for an indeterminate amount of time to wait for the
> > vIRQ to be consumed.
> 
> Delayed interrupts are fine, since they are injected according to RVI and
> the posted interrupt descriptor.  I'm thinking more of events (exceptions
> and interrupts) that caused an EPT violation exit and were recorded in the
> IDT-vectoring info field.

Ah.  As is, I don't believe KVM has access to this information.  TDX-Module
handles the actual EPT violation, as well as event reinjection.  The EPT
violation reported by SEAMRET is synthesized, and IIRC the IDT-vectoring field
is not readable.

Regardless, is there an actual problem with having a "pending" exception that
isn't reported to userspace?  Obviously the info needs to be migrated, but that
will be taken care of by virtue of migrating the VMCS.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 25/35] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  2020-11-30 18:28             ` Paolo Bonzini
@ 2020-11-30 19:39               ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2020-11-30 19:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Tom Lendacky, kvm, linux-kernel, x86, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh

On Mon, Nov 30, 2020, Paolo Bonzini wrote:
> On 16/09/20 00:44, Sean Christopherson wrote:
> > > KVM doesn't have control of them. They are part of the guest's encrypted
> > > state and that is what the guest uses. KVM can't alter the value that the
> > > guest is using for them once the VMSA is encrypted. However, KVM makes
> > > some decisions based on the values it thinks it knows.  For example, early
> > > on I remember the async PF support failing because the CR0 that KVM
> > > thought the guest had didn't have the PE bit set, even though the guest
> > > was in protected mode. So KVM didn't include the error code in the
> > > exception it injected (is_protmode() was false) and things failed. Without
> > > syncing these values after live migration, things also fail (probably for
> > > the same reason). So the idea is to just keep KVM apprised of the values
> > > that the guest has.
> > 
> > Ah, gotcha.  Migrating tracked state through the VMSA would probably be ideal.
> > The semantics of __set_sregs() kinda setting state but not reaaaally setting
> > state would be weird.
> 
> How would that work with TDX?

Can you elaborate?  I.e. how would what work with TDX?

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [RFC PATCH 00/35] SEV-ES hypervisor support
  2020-11-30 19:35             ` Sean Christopherson
@ 2020-11-30 20:24               ` Paolo Bonzini
  0 siblings, 0 replies; 86+ messages in thread
From: Paolo Bonzini @ 2020-11-30 20:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Tom Lendacky, kvm, linux-kernel, x86, Jim Mattson, Joerg Roedel,
	Vitaly Kuznetsov, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Brijesh Singh, Xiaoyao Li, Isaku Yamahata

On 30/11/20 20:35, Sean Christopherson wrote:
>> Delayed interrupts are fine, since they are injected according to RVI and
>> the posted interrupt descriptor.  I'm thinking more of events (exceptions
>> and interrupts) that caused an EPT violation exit and were recorded in the
>> IDT-vectored info field.
> Ah.  As is, I don't believe KVM has access to this information.  TDX-Module
> handles the actual EPT violation, as well as event reinjection.  The EPT
> violation reported by SEAMRET is synthesized, and IIRC the IDT-vectoring field
> is not readable.
> 
> Regardless, is there an actual a problem with having a "pending" exception that
> isn't reported to userspace?  Obviously the info needs to be migrated, but that
> will be taken care of by virtue of migrating the VMCS.

No problem, I suppose we would just have to get used to not being able 
to look into the state of migrated VMs.

Paolo


^ permalink raw reply	[flat|nested] 86+ messages in thread

end of thread, other threads:[~2020-11-30 20:26 UTC | newest]

Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-14 20:15 [RFC PATCH 00/35] SEV-ES hypervisor support Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 01/35] KVM: SVM: Remove the call to sev_platform_status() during setup Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 02/35] KVM: SVM: Add support for SEV-ES capability in KVM Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 03/35] KVM: SVM: Add indirect access to the VM save area Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 04/35] KVM: SVM: Make GHCB accessor functions available to the hypervisor Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 05/35] KVM: SVM: Add initial support for SEV-ES GHCB access to KVM Tom Lendacky
     [not found]   ` <20200914205801.GA7084@sjchrist-ice>
2020-09-15 13:24     ` Tom Lendacky
2020-09-15 16:28       ` Sean Christopherson
2020-09-16 14:54         ` Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 06/35] KVM: SVM: Add required changes to support intercepts under SEV-ES Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 07/35] KVM: SVM: Modify DRx register intercepts for an SEV-ES guest Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 08/35] KVM: SVM: Prevent debugging under SEV-ES Tom Lendacky
2020-09-14 21:26   ` Sean Christopherson
2020-09-15 13:37     ` Tom Lendacky
2020-09-15 16:30       ` Sean Christopherson
2020-09-15 20:13         ` Tom Lendacky
2020-09-16 15:11           ` Tom Lendacky
2020-09-16 16:02             ` Sean Christopherson
2020-09-16 16:38               ` Tom Lendacky
2020-09-16 16:49                 ` Sean Christopherson
2020-09-16 20:27                   ` Tom Lendacky
2020-09-16 22:50                     ` Sean Christopherson
2020-09-17 16:27                       ` Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 09/35] KVM: SVM: Do not emulate MMIO " Tom Lendacky
2020-09-14 21:33   ` Sean Christopherson
2020-09-15 13:38     ` Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 10/35] KVM: SVM: Cannot re-initialize the VMCB after shutdown with SEV-ES Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 11/35] KVM: SVM: Prepare for SEV-ES exit handling in the sev.c file Tom Lendacky
     [not found]   ` <20200915172148.GE8420@sjchrist-ice>
2020-09-16 16:22     ` Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 12/35] KVM: SVM: Add initial support for a VMGEXIT VMEXIT Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 13/35] KVM: SVM: Create trace events for VMGEXIT processing Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 14/35] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002 Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 15/35] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004 Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 16/35] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100 Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 17/35] KVM: SVM: Create trace events for VMGEXIT MSR protocol processing Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 18/35] KVM: SVM: Support MMIO for an SEV-ES guest Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 19/35] KVM: SVM: Support port IO operations " Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 20/35] KVM: SVM: Add SEV/SEV-ES support for intercepting INVD Tom Lendacky
2020-09-14 22:00   ` Sean Christopherson
2020-09-15 15:08     ` Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 21/35] KVM: SVM: Add support for EFER write traps for an SEV-ES guest Tom Lendacky
     [not found]   ` <20200914220800.GI7192@sjchrist-ice>
2020-09-15 15:45     ` Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 22/35] KVM: SVM: Add support for CR0 " Tom Lendacky
2020-09-14 22:13   ` Sean Christopherson
2020-09-15 15:56     ` Tom Lendacky
2020-11-30 18:15     ` Paolo Bonzini
2020-09-14 20:15 ` [RFC PATCH 23/35] KVM: SVM: Add support for CR4 " Tom Lendacky
2020-09-14 22:16   ` Sean Christopherson
2020-11-30 18:16     ` Paolo Bonzini
2020-09-14 20:15 ` [RFC PATCH 24/35] KVM: SVM: Add support for CR8 " Tom Lendacky
2020-09-14 22:19   ` Sean Christopherson
2020-09-15 15:57     ` Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 25/35] KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES Tom Lendacky
2020-09-14 21:37   ` Sean Christopherson
2020-09-15 14:19     ` Tom Lendacky
     [not found]       ` <20200915163342.GC8420@sjchrist-ice>
2020-09-15 20:37         ` Tom Lendacky
2020-09-15 22:44           ` Sean Christopherson
2020-11-30 18:28             ` Paolo Bonzini
2020-11-30 19:39               ` Sean Christopherson
2020-09-14 20:15 ` [RFC PATCH 26/35] KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest Tom Lendacky
     [not found]   ` <20200914213917.GD7192@sjchrist-ice>
2020-09-15 14:25     ` Tom Lendacky
2020-09-15 16:34       ` Sean Christopherson
2020-09-14 20:15 ` [RFC PATCH 27/35] KVM: SVM: Add support for booting APs for an " Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 28/35] KVM: X86: Update kvm_skip_emulated_instruction() " Tom Lendacky
2020-09-14 21:51   ` Sean Christopherson
2020-09-15 14:57     ` Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 29/35] KVM: SVM: Add NMI support " Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 30/35] KVM: SVM: Set the encryption mask for the SVM host save area Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 31/35] KVM: SVM: Update ASID allocation to support SEV-ES guests Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 32/35] KVM: SVM: Provide support for SEV-ES vCPU creation/loading Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 33/35] KVM: SVM: Provide support for SEV-ES vCPU loading Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 34/35] KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests Tom Lendacky
2020-09-14 20:15 ` [RFC PATCH 35/35] KVM: SVM: Provide support to launch and run an SEV-ES guest Tom Lendacky
2020-09-14 22:59 ` [RFC PATCH 00/35] SEV-ES hypervisor support Sean Christopherson
2020-09-15 17:22   ` Tom Lendacky
2020-09-15 17:32     ` Sean Christopherson
2020-09-15 20:05       ` Brijesh Singh
2020-09-16  0:19     ` Sean Christopherson
2020-10-13 20:26       ` Tom Lendacky
2020-11-30 15:31       ` Paolo Bonzini
2020-11-30 16:06         ` Tom Lendacky
2020-11-30 18:18           ` Sean Christopherson
2020-11-30 18:14         ` Sean Christopherson
2020-11-30 18:35           ` Paolo Bonzini
2020-11-30 19:35             ` Sean Christopherson
2020-11-30 20:24               ` Paolo Bonzini
