linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap
@ 2023-04-11 12:57 Alexey Kardashevskiy
  2023-04-11 12:57 ` [PATCH kernel v5 1/6] KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header Alexey Kardashevskiy
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-04-11 12:57 UTC (permalink / raw)
  To: kvm
  Cc: x86, linux-kernel, Tom Lendacky, Sean Christopherson,
	Pankaj Gupta, Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao,
	Alexey Kardashevskiy

This is to use another AMD SEV-ES hardware assisted register swap,
more detail in 5/6. In the process it's been suggested to fix other
things, here is the attempt, with the great help of amders.

The previous conversation is here:
https://lore.kernel.org/r/20230203051459.1354589-1-aik@amd.com

This is based on sha1
f91f9332d782 Ingo Molnar "Merge branch into tip/master: 'x86/tdx'".

Please comment. Thanks.



Alexey Kardashevskiy (6):
  KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header
  KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV
  KVM: SEV-ES: explicitly disable debug
  KVM: SVM/SEV/SEV-ES: Rework intercepts
  KVM: SEV: Enable data breakpoints in SEV-ES
  x86/sev: Do not handle #VC for DR7 read/write

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/svm.h         |  1 +
 arch/x86/kvm/svm/svm.h             | 42 ---------------
 arch/x86/boot/compressed/sev.c     |  2 +-
 arch/x86/kernel/sev.c              |  6 +++
 arch/x86/kvm/svm/sev.c             | 54 +++++++++++++++++++-
 arch/x86/kvm/svm/svm.c             | 48 +++++++++++++++--
 7 files changed, 105 insertions(+), 49 deletions(-)

-- 
2.39.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH kernel v5 1/6] KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header
  2023-04-11 12:57 [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap Alexey Kardashevskiy
@ 2023-04-11 12:57 ` Alexey Kardashevskiy
  2023-04-11 12:57 ` [PATCH kernel v5 2/6] KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV Alexey Kardashevskiy
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-04-11 12:57 UTC (permalink / raw)
  To: kvm
  Cc: x86, linux-kernel, Tom Lendacky, Sean Christopherson,
	Pankaj Gupta, Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao,
	Alexey Kardashevskiy

Static functions set_dr_intercepts() and clr_dr_intercepts() are only
called from SVM so move them to .c.

No functional change intended.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Santosh Shukla <santosh.shukla@amd.com>
---
Changes:
v5:
* new in the series
---
 arch/x86/kvm/svm/svm.h | 42 --------------------
 arch/x86/kvm/svm/svm.c | 42 ++++++++++++++++++++
 2 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 839809972da1..4deec59be71b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -403,48 +403,6 @@ static inline bool vmcb12_is_intercept(struct vmcb_ctrl_area_cached *control, u3
 	return test_bit(bit, (unsigned long *)&control->intercepts);
 }
 
-static inline void set_dr_intercepts(struct vcpu_svm *svm)
-{
-	struct vmcb *vmcb = svm->vmcb01.ptr;
-
-	if (!sev_es_guest(svm->vcpu.kvm)) {
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
-	}
-
-	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
-	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
-
-	recalc_intercepts(svm);
-}
-
-static inline void clr_dr_intercepts(struct vcpu_svm *svm)
-{
-	struct vmcb *vmcb = svm->vmcb01.ptr;
-
-	vmcb->control.intercepts[INTERCEPT_DR] = 0;
-
-	/* DR7 access must remain intercepted for an SEV-ES guest */
-	if (sev_es_guest(svm->vcpu.kvm)) {
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
-	}
-
-	recalc_intercepts(svm);
-}
-
 static inline void set_exception_intercept(struct vcpu_svm *svm, u32 bit)
 {
 	struct vmcb *vmcb = svm->vmcb01.ptr;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a1b08359769b..1e1c1eb13392 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -687,6 +687,48 @@ static int svm_cpu_init(int cpu)
 
 }
 
+static void set_dr_intercepts(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = svm->vmcb01.ptr;
+
+	if (!sev_es_guest(svm->vcpu.kvm)) {
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
+	}
+
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
+
+	recalc_intercepts(svm);
+}
+
+static void clr_dr_intercepts(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = svm->vmcb01.ptr;
+
+	vmcb->control.intercepts[INTERCEPT_DR] = 0;
+
+	/* DR7 access must remain intercepted for an SEV-ES guest */
+	if (sev_es_guest(svm->vcpu.kvm)) {
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
+	}
+
+	recalc_intercepts(svm);
+}
+
 static int direct_access_msr_slot(u32 msr)
 {
 	u32 i;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH kernel v5 2/6] KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV
  2023-04-11 12:57 [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap Alexey Kardashevskiy
  2023-04-11 12:57 ` [PATCH kernel v5 1/6] KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header Alexey Kardashevskiy
@ 2023-04-11 12:57 ` Alexey Kardashevskiy
  2023-04-11 12:57 ` [PATCH kernel v5 3/6] KVM: SEV-ES: explicitly disable debug Alexey Kardashevskiy
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-04-11 12:57 UTC (permalink / raw)
  To: kvm
  Cc: x86, linux-kernel, Tom Lendacky, Sean Christopherson,
	Pankaj Gupta, Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao,
	Alexey Kardashevskiy

Currently SVM setup is done sequentially in
init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb() and tries
keeping SVM/SEV/SEV-ES bits separated. One of the exceptions
is #GP intercept which init_vmcb() skips setting for SEV guests and
then sev_es_init_vmcb() needlessly clears it.

Remove the SEV check from init_vmcb(). Clear the #GP intercept in
sev_init_vmcb(). SEV-ES will use the SEV setting.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Santosh Shukla <santosh.shukla@amd.com>
---
Changes:
v5:
* new in the series
---
 arch/x86/kvm/svm/sev.c | 9 ++++++---
 arch/x86/kvm/svm/svm.c | 5 ++---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c25aeb550cd9..0f4761a57d86 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2968,9 +2968,6 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 	svm_set_intercept(svm, TRAP_CR4_WRITE);
 	svm_set_intercept(svm, TRAP_CR8_WRITE);
 
-	/* No support for enable_vmware_backdoor */
-	clr_exception_intercept(svm, GP_VECTOR);
-
 	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
 	svm_clr_intercept(svm, INTERCEPT_XSETBV);
 
@@ -2996,6 +2993,12 @@ void sev_init_vmcb(struct vcpu_svm *svm)
 	svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
 	clr_exception_intercept(svm, UD_VECTOR);
 
+	/*
+	 * Don't intercept #GP for SEV guests, e.g. for the VMware backdoor, as
+	 * KVM can't decrypt guest memory to decode the faulting instruction.
+	 */
+	clr_exception_intercept(svm, GP_VECTOR);
+
 	if (sev_es_guest(svm->vcpu.kvm))
 		sev_es_init_vmcb(svm);
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1e1c1eb13392..dc12de325cca 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1253,10 +1253,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	 * Guest access to VMware backdoor ports could legitimately
 	 * trigger #GP because of TSS I/O permission bitmap.
 	 * We intercept those #GP and allow access to them anyway
-	 * as VMware does.  Don't intercept #GP for SEV guests as KVM can't
-	 * decrypt guest memory to decode the faulting instruction.
+	 * as VMware does.
 	 */
-	if (enable_vmware_backdoor && !sev_guest(vcpu->kvm))
+	if (enable_vmware_backdoor)
 		set_exception_intercept(svm, GP_VECTOR);
 
 	svm_set_intercept(svm, INTERCEPT_INTR);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH kernel v5 3/6] KVM: SEV-ES: explicitly disable debug
  2023-04-11 12:57 [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap Alexey Kardashevskiy
  2023-04-11 12:57 ` [PATCH kernel v5 1/6] KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header Alexey Kardashevskiy
  2023-04-11 12:57 ` [PATCH kernel v5 2/6] KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV Alexey Kardashevskiy
@ 2023-04-11 12:57 ` Alexey Kardashevskiy
  2023-05-22 22:50   ` Sean Christopherson
  2023-04-11 12:57 ` [PATCH kernel v5 4/6] KVM: SVM/SEV/SEV-ES: Rework intercepts Alexey Kardashevskiy
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-04-11 12:57 UTC (permalink / raw)
  To: kvm
  Cc: x86, linux-kernel, Tom Lendacky, Sean Christopherson,
	Pankaj Gupta, Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao,
	Alexey Kardashevskiy

SVM/SEV enable debug registers intercepts to skip swapping DRs
on entering/exiting the guest. When the guest is in control of
debug registers (vcpu->guest_debug == 0), there is an optimisation to
reduce the number of context switches: intercepts are cleared and
the KVM_DEBUGREG_WONT_EXIT flag is set to tell KVM to do swapping
on guest enter/exit.

The same code also executes for SEV-ES, however it has no effect as
- it always takes (vcpu->guest_debug == 0) branch;
- KVM_DEBUGREG_WONT_EXIT is set but DR7 intercept is not cleared;
- vcpu_enter_guest() writes DRs but VMRUN for SEV-ES swaps them
with the values from _encrypted_ VMSA.

Be explicit about SEV-ES not supporting debug:
- return right away from dr_interception() and skip unnecessary processing;
- clear vcpu->guest_debug at SEV-ES' LAUNCH_UPDATE_VMSA if debugging
was already enabled; after that point the generic x86's
KVM_SET_GUEST_DEBUG ioctl disallows enabling debug.

Add WARN_ON to kvm_x86::sync_dirty_debug_regs() (saves guest DRs on
guest exit) to signify that SEV-ES won't hit that path.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
---
Changes:
v5:
* new in the series
---
 arch/x86/kvm/svm/sev.c |  6 ++++++
 arch/x86/kvm/svm/svm.c | 10 +++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0f4761a57d86..b4365622222b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -639,6 +639,12 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
 	  return ret;
 
 	vcpu->arch.guest_state_protected = true;
+
+	if (vcpu->guest_debug)
+		pr_warn_ratelimited("guest_debug (%lx) not supported for SEV-ES",
+				    vcpu->guest_debug);
+	vcpu->guest_debug = 0;
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dc12de325cca..179952a31d3b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1980,7 +1980,7 @@ static void svm_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if (vcpu->arch.guest_state_protected)
+	if (WARN_ON_ONCE(sev_es_guest(vcpu->kvm)))
 		return;
 
 	get_debugreg(vcpu->arch.db[0], 0);
@@ -2698,6 +2698,14 @@ static int dr_interception(struct kvm_vcpu *vcpu)
 	unsigned long val;
 	int err = 0;
 
+	/*
+	 * SEV-ES intercepts DR7 only to disable guest debugging
+	 * and the guest issues a VMGEXIT for DR7 write only. KVM cannot
+	 * change DR7 (always swapped as type 'A') so return early.
+	 */
+	if (sev_es_guest(vcpu->kvm))
+		return 1;
+
 	if (vcpu->guest_debug == 0) {
 		/*
 		 * No more DR vmexits; force a reload of the debug registers
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH kernel v5 4/6] KVM: SVM/SEV/SEV-ES: Rework intercepts
  2023-04-11 12:57 [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2023-04-11 12:57 ` [PATCH kernel v5 3/6] KVM: SEV-ES: explicitly disable debug Alexey Kardashevskiy
@ 2023-04-11 12:57 ` Alexey Kardashevskiy
  2023-05-22 22:53   ` Sean Christopherson
  2023-04-11 12:57 ` [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-04-11 12:57 UTC (permalink / raw)
  To: kvm
  Cc: x86, linux-kernel, Tom Lendacky, Sean Christopherson,
	Pankaj Gupta, Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao,
	Alexey Kardashevskiy

Currently SVM setup is done sequentially in
init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb()
and tries keeping SVM/SEV/SEV-ES bits separated. One of the exceptions
is DR intercepts which is for SEV-ES before sev_es_init_vmcb() runs.

Move the SEV-ES intercept setup to sev_es_init_vmcb(). From now on
set_dr_intercepts()/clr_dr_intercepts() handle SVM/SEV only.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Santosh Shukla <santosh.shukla@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
---
Changes:
v5:
* updated the comments
* removed sev_es_guest() checks from set_dr_intercepts()/clr_dr_intercepts()
* removed remaining intercepts from clr_dr_intercepts()
---
 arch/x86/kvm/svm/sev.c | 11 ++++++
 arch/x86/kvm/svm/svm.c | 37 ++++++++------------
 2 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b4365622222b..f0885250252d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2946,6 +2946,7 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
 
 static void sev_es_init_vmcb(struct vcpu_svm *svm)
 {
+	struct vmcb *vmcb = svm->vmcb01.ptr;
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 
 	svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
@@ -2974,6 +2975,16 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 	svm_set_intercept(svm, TRAP_CR4_WRITE);
 	svm_set_intercept(svm, TRAP_CR8_WRITE);
 
+	/*
+	 * DR7 access must remain intercepted for an SEV-ES guest to disallow
+	 * the guest kernel enable debugging as otherwise a VM writing to DR7
+	 * from the #DB handler may trigger infinite loop of #DB's.
+	 */
+	vmcb->control.intercepts[INTERCEPT_DR] = 0;
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
+	recalc_intercepts(svm);
+
 	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
 	svm_clr_intercept(svm, INTERCEPT_XSETBV);
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 179952a31d3b..0271360e8fde 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -691,23 +691,20 @@ static void set_dr_intercepts(struct vcpu_svm *svm)
 {
 	struct vmcb *vmcb = svm->vmcb01.ptr;
 
-	if (!sev_es_guest(svm->vcpu.kvm)) {
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
-	}
-
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
+	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
 	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
 	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
 
@@ -720,12 +717,6 @@ static void clr_dr_intercepts(struct vcpu_svm *svm)
 
 	vmcb->control.intercepts[INTERCEPT_DR] = 0;
 
-	/* DR7 access must remain intercepted for an SEV-ES guest */
-	if (sev_es_guest(svm->vcpu.kvm)) {
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
-		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
-	}
-
 	recalc_intercepts(svm);
 }
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-04-11 12:57 [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap Alexey Kardashevskiy
                   ` (3 preceding siblings ...)
  2023-04-11 12:57 ` [PATCH kernel v5 4/6] KVM: SVM/SEV/SEV-ES: Rework intercepts Alexey Kardashevskiy
@ 2023-04-11 12:57 ` Alexey Kardashevskiy
  2023-05-09 10:58   ` Gupta, Pankaj
  2023-05-22 23:39   ` Sean Christopherson
  2023-04-11 12:57 ` [PATCH kernel v5 6/6] x86/sev: Do not handle #VC for DR7 read/write Alexey Kardashevskiy
  2023-04-20  1:49 ` [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap Alexey Kardashevskiy
  6 siblings, 2 replies; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-04-11 12:57 UTC (permalink / raw)
  To: kvm
  Cc: x86, linux-kernel, Tom Lendacky, Sean Christopherson,
	Pankaj Gupta, Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao,
	Alexey Kardashevskiy

Prior to SEV-ES, KVM saved/restored host debug registers upon switching
to/from a VM. Changing those registers inside a running SEV VM
triggered #VMEXIT to KVM.

SEV-ES added encrypted state (ES) which uses an encrypted page
for the VM state (VMSA). The hardware saves/restores certain registers
on VMRUN/VMEXIT according to a swap type (A, B, C), see
"Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
volume 2.

The DR6 and DR7 registers have always been swapped as Type A for SEV-ES
guests, but a new feature is available, identified via
CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
support for swapping additional debug registers. DR[0-3] and
DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
is set.

Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
("NoNestedDataBp", "Processor ignores nested data breakpoints") is
supported by the SOC as otherwise a malicious SEV-ES guest can set up
data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
Set the features bit in sev_es_sync_vmsa() which is the last point
when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
init happens) is called not only when VCPU is initialized but also on
intrahost migration when VMSA is encrypted.

Eliminate DR7 and #DB intercepts as:
- they are not needed when DebugSwap is supported;
- #VC for these intercepts is most likely not supported anyway and
kills the VM.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
---
Changes:
v8:
* added CPUID's DebugSwap feature
* commit log, comments updated
* redid the whole thing

v4:
* removed sev_es_is_debug_swap_enabled() helper
* made sev_es_debug_swap_enabled (module param) static
* set sev_feature early in sev_es_init_vmcb() and made intercepts
  dependend on it vs. module param
* move set_/clr_dr_intercepts to .c

v3:
* rewrote the commit log again
* rebased on tip/master to use recently defined X86_FEATURE_NO_NESTED_DATA_BP
* s/boot_cpu_has/cpu_feature_enabled/

v2:
* debug_swap moved from vcpu to module_param
* rewrote commit log

---
Tested with:
===
int x;
int main(int argc, char *argv[])
{
        x = 1;
        return 0;
}
===
gcc -g a.c
rsync a.out ruby-954vm:~/
ssh -t ruby-954vm 'gdb -ex "file a.out" -ex "watch x" -ex r'

where ruby-954vm is a VM.

With "/sys/module/kvm_amd/parameters/debug_swap = 0", gdb does not stop
on the watchpoint, with "= 1" - gdb does.
---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/svm.h         |  1 +
 arch/x86/kvm/svm/sev.c             | 36 ++++++++++++++++++--
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d9c190cdefa9..3a5eeb178778 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -435,6 +435,7 @@
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
 #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
+#define X86_FEATURE_DEBUG_SWAP		(19*32+14) /* AMD SEV-ES full debug state swap support */
 
 /* AMD-defined Extended Feature 2 EAX, CPUID level 0x80000021 (EAX), word 20 */
 #define X86_FEATURE_NO_NESTED_DATA_BP	(20*32+ 0) /* "" No Nested Data Breakpoints */
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 770dcf75eaa9..3a422d213010 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -280,6 +280,7 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
 #define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
 #define VMCB_AVIC_APIC_BAR_MASK		0xFFFFFFFFFF000ULL
 
+#define SVM_SEV_FEAT_DEBUG_SWAP                        BIT(5)
 
 struct vmcb_seg {
 	u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f0885250252d..ba12e7962e94 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -22,6 +22,7 @@
 #include <asm/pkru.h>
 #include <asm/trapnr.h>
 #include <asm/fpu/xcr.h>
+#include <asm/debugreg.h>
 
 #include "mmu.h"
 #include "x86.h"
@@ -53,9 +54,14 @@ module_param_named(sev, sev_enabled, bool, 0444);
 /* enable/disable SEV-ES support */
 static bool sev_es_enabled = true;
 module_param_named(sev_es, sev_es_enabled, bool, 0444);
+
+/* enable/disable SEV-ES DebugSwap support */
+static bool sev_es_debug_swap_enabled = true;
+module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
 #else
 #define sev_enabled false
 #define sev_es_enabled false
+#define sev_es_debug_swap_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
 static u8 sev_enc_bit;
@@ -605,6 +611,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	save->xss  = svm->vcpu.arch.ia32_xss;
 	save->dr6  = svm->vcpu.arch.dr6;
 
+	if (sev_es_debug_swap_enabled)
+		save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+
 	pr_debug("Virtual Machine Save Area (VMSA):\n");
 	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
 
@@ -2256,6 +2265,9 @@ void __init sev_hardware_setup(void)
 out:
 	sev_enabled = sev_supported;
 	sev_es_enabled = sev_es_supported;
+	if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
+	    !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
+		sev_es_debug_swap_enabled = false;
 #endif
 }
 
@@ -2976,14 +2988,20 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 	svm_set_intercept(svm, TRAP_CR8_WRITE);
 
 	/*
+	 * Unless DebugSwap (depends on X86_FEATURE_NO_NESTED_DATA_BP) is enabled,
 	 * DR7 access must remain intercepted for an SEV-ES guest to disallow
 	 * the guest kernel enable debugging as otherwise a VM writing to DR7
 	 * from the #DB handler may trigger infinite loop of #DB's.
 	 */
 	vmcb->control.intercepts[INTERCEPT_DR] = 0;
-	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
-	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
-	recalc_intercepts(svm);
+	if (sev_es_debug_swap_enabled) {
+		clr_exception_intercept(svm, DB_VECTOR);
+		/* clr_exception_intercept() called recalc_intercepts() */
+	} else {
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
+		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
+		recalc_intercepts(svm);
+	}
 
 	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
 	svm_clr_intercept(svm, INTERCEPT_XSETBV);
@@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
 
 	/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
 	hostsa->xss = host_xss;
+
+	/* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
+	if (sev_es_debug_swap_enabled) {
+		hostsa->dr0 = native_get_debugreg(0);
+		hostsa->dr1 = native_get_debugreg(1);
+		hostsa->dr2 = native_get_debugreg(2);
+		hostsa->dr3 = native_get_debugreg(3);
+		hostsa->dr0_addr_mask = amd_get_dr_addr_mask(0);
+		hostsa->dr1_addr_mask = amd_get_dr_addr_mask(1);
+		hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
+		hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
+	}
 }
 
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH kernel v5 6/6] x86/sev: Do not handle #VC for DR7 read/write
  2023-04-11 12:57 [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap Alexey Kardashevskiy
                   ` (4 preceding siblings ...)
  2023-04-11 12:57 ` [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES Alexey Kardashevskiy
@ 2023-04-11 12:57 ` Alexey Kardashevskiy
  2023-05-22 23:44   ` Sean Christopherson
  2023-04-20  1:49 ` [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap Alexey Kardashevskiy
  6 siblings, 1 reply; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-04-11 12:57 UTC (permalink / raw)
  To: kvm
  Cc: x86, linux-kernel, Tom Lendacky, Sean Christopherson,
	Pankaj Gupta, Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao,
	Alexey Kardashevskiy

With MSR_AMD64_SEV_DEBUG_SWAP enabled, the guest is not expected to
receive a #VC for reads or writes of DR7.

Update the SNP_FEATURES_PRESENT mask with MSR_AMD64_SNP_DEBUG_SWAP so
an SNP guest doesn't gracefully terminate during SNP feature negotiation
if MSR_AMD64_SEV_DEBUG_SWAP is enabled.

Since a guest is not expected to receive a #VC on DR7 accesses when
MSR_AMD64_SEV_DEBUG_SWAP is enabled, return an error from the #VC
handler in this situation.

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
---
Changes:
v4:
* rebased on top of SNP feature negotiation

v2:
* use new bit definition
---
 arch/x86/boot/compressed/sev.c | 2 +-
 arch/x86/kernel/sev.c          | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 014b89c89088..f6123808be42 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -313,7 +313,7 @@ static void enforce_vmpl0(void)
  * by the guest kernel. As and when a new feature is implemented in the
  * guest kernel, a corresponding bit should be added to the mask.
  */
-#define SNP_FEATURES_PRESENT (0)
+#define SNP_FEATURES_PRESENT	MSR_AMD64_SNP_DEBUG_SWAP
 
 void snp_check_features(void)
 {
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index b031244d6d2d..a515eb880970 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1620,6 +1620,9 @@ static enum es_result vc_handle_dr7_write(struct ghcb *ghcb,
 	long val, *reg = vc_insn_get_rm(ctxt);
 	enum es_result ret;
 
+	if (sev_status & MSR_AMD64_SNP_DEBUG_SWAP)
+		return ES_VMM_ERROR;
+
 	if (!reg)
 		return ES_DECODE_FAILED;
 
@@ -1657,6 +1660,9 @@ static enum es_result vc_handle_dr7_read(struct ghcb *ghcb,
 	struct sev_es_runtime_data *data = this_cpu_read(runtime_data);
 	long *reg = vc_insn_get_rm(ctxt);
 
+	if (sev_status & MSR_AMD64_SNP_DEBUG_SWAP)
+		return ES_VMM_ERROR;
+
 	if (!reg)
 		return ES_DECODE_FAILED;
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap
  2023-04-11 12:57 [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap Alexey Kardashevskiy
                   ` (5 preceding siblings ...)
  2023-04-11 12:57 ` [PATCH kernel v5 6/6] x86/sev: Do not handle #VC for DR7 read/write Alexey Kardashevskiy
@ 2023-04-20  1:49 ` Alexey Kardashevskiy
  2023-04-20 14:32   ` Sean Christopherson
  6 siblings, 1 reply; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-04-20  1:49 UTC (permalink / raw)
  To: kvm
  Cc: x86, linux-kernel, Tom Lendacky, Sean Christopherson,
	Pankaj Gupta, Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao,
	Borislav Petkov

On 11/4/23 22:57, Alexey Kardashevskiy wrote:
> This is to use another AMD SEV-ES hardware assisted register swap,
> more detail in 5/6. In the process it's been suggested to fix other
> things, here is the attempt, with the great help of amders.
> 
> The previous conversation is here:
> https://lore.kernel.org/r/20230203051459.1354589-1-aik@amd.com
> 
> This is based on sha1
> f91f9332d782 Ingo Molnar "Merge branch into tip/master: 'x86/tdx'".
> 
> Please comment. Thanks.

Ping?
Or should I relax until the end of the nearest merge window (May 
6th-ish)? :) Thanks,


> 
> 
> Alexey Kardashevskiy (6):
>    KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header
>    KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV
>    KVM: SEV-ES: explicitly disable debug
>    KVM: SVM/SEV/SEV-ES: Rework intercepts
>    KVM: SEV: Enable data breakpoints in SEV-ES
>    x86/sev: Do not handle #VC for DR7 read/write
> 
>   arch/x86/include/asm/cpufeatures.h |  1 +
>   arch/x86/include/asm/svm.h         |  1 +
>   arch/x86/kvm/svm/svm.h             | 42 ---------------
>   arch/x86/boot/compressed/sev.c     |  2 +-
>   arch/x86/kernel/sev.c              |  6 +++
>   arch/x86/kvm/svm/sev.c             | 54 +++++++++++++++++++-
>   arch/x86/kvm/svm/svm.c             | 48 +++++++++++++++--
>   7 files changed, 105 insertions(+), 49 deletions(-)
> 

-- 
Alexey


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap
  2023-04-20  1:49 ` [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap Alexey Kardashevskiy
@ 2023-04-20 14:32   ` Sean Christopherson
  2023-05-19  0:19     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2023-04-20 14:32 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao,
	Borislav Petkov

On Thu, Apr 20, 2023, Alexey Kardashevskiy wrote:
> On 11/4/23 22:57, Alexey Kardashevskiy wrote:
> > This is to use another AMD SEV-ES hardware assisted register swap,
> > more detail in 5/6. In the process it's been suggested to fix other
> > things, here is the attempt, with the great help of amders.
> > 
> > The previous conversation is here:
> > https://lore.kernel.org/r/20230203051459.1354589-1-aik@amd.com
> > 
> > This is based on sha1
> > f91f9332d782 Ingo Molnar "Merge branch into tip/master: 'x86/tdx'".
> > 
> > Please comment. Thanks.
> 
> Ping?
> Or should I relax until the end of the nearest merge window (May 6th-ish)?
> :) Thanks,

Sorry, the answer is "relax".  I'm likely going to be offline for a few days in
early May, so it might be more like May 15th until you hear from me, but this is
on my todo list.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-04-11 12:57 ` [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES Alexey Kardashevskiy
@ 2023-05-09 10:58   ` Gupta, Pankaj
  2023-05-10  9:35     ` Gupta, Pankaj
  2023-05-22 23:39   ` Sean Christopherson
  1 sibling, 1 reply; 27+ messages in thread
From: Gupta, Pankaj @ 2023-05-09 10:58 UTC (permalink / raw)
  To: Alexey Kardashevskiy, kvm
  Cc: x86, linux-kernel, Tom Lendacky, Sean Christopherson,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao

> Prior to SEV-ES, KVM saved/restored host debug registers upon switching
> to/from a VM. Changing those registers inside a running SEV VM
> triggered #VMEXIT to KVM.
> 
> SEV-ES added encrypted state (ES) which uses an encrypted page
> for the VM state (VMSA). The hardware saves/restores certain registers
> on VMRUN/VMEXIT according to a swap type (A, B, C), see
> "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
> volume 2.
> 
> The DR6 and DR7 registers have always been swapped as Type A for SEV-ES
> guests, but a new feature is available, identified via
> CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
> support for swapping additional debug registers. DR[0-3] and
> DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
> is set.
> 
> Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
> ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
> supported by the SOC as otherwise a malicious SEV-ES guest can set up
> data breakpoints on the #DB IDT entry/stack and cause an infinite loop.

You mean #DB => #BP here?

> Set the features bit in sev_es_sync_vmsa() which is the last point
> when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
> init happens) is called not only when VCPU is initialized but also on
> intrahost migration when VMSA is encrypted.
> 
> Eliminate DR7 and #DB intercepts as:
> - they are not needed when DebugSwap is supported;
> - #VC for these intercepts is most likely not supported anyway and
> kills the VM.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> Changes:
> v8:
> * added CPUID's DebugSwap feature
> * commit log, comments updated
> * redid the whole thing
> 
> v4:
> * removed sev_es_is_debug_swap_enabled() helper
> * made sev_es_debug_swap_enabled (module param) static
> * set sev_feature early in sev_es_init_vmcb() and made intercepts
>    dependend on it vs. module param
> * move set_/clr_dr_intercepts to .c
> 
> v3:
> * rewrote the commit log again
> * rebased on tip/master to use recently defined X86_FEATURE_NO_NESTED_DATA_BP
> * s/boot_cpu_has/cpu_feature_enabled/
> 
> v2:
> * debug_swap moved from vcpu to module_param
> * rewrote commit log
> 
> ---
> Tested with:
> ===
> int x;
> int main(int argc, char *argv[])
> {
>          x = 1;
>          return 0;
> }
> ===
> gcc -g a.c
> rsync a.out ruby-954vm:~/
> ssh -t ruby-954vm 'gdb -ex "file a.out" -ex "watch x" -ex r'
> 
> where ruby-954vm is a VM.
> 
> With "/sys/module/kvm_amd/parameters/debug_swap = 0", gdb does not stop
> on the watchpoint, with "= 1" - gdb does.
> ---
>   arch/x86/include/asm/cpufeatures.h |  1 +
>   arch/x86/include/asm/svm.h         |  1 +
>   arch/x86/kvm/svm/sev.c             | 36 ++++++++++++++++++--
>   3 files changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index d9c190cdefa9..3a5eeb178778 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -435,6 +435,7 @@
>   #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
>   #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
>   #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
> +#define X86_FEATURE_DEBUG_SWAP		(19*32+14) /* AMD SEV-ES full debug state swap support */
>   
>   /* AMD-defined Extended Feature 2 EAX, CPUID level 0x80000021 (EAX), word 20 */
>   #define X86_FEATURE_NO_NESTED_DATA_BP	(20*32+ 0) /* "" No Nested Data Breakpoints */
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 770dcf75eaa9..3a422d213010 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -280,6 +280,7 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
>   #define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
>   #define VMCB_AVIC_APIC_BAR_MASK		0xFFFFFFFFFF000ULL
>   
> +#define SVM_SEV_FEAT_DEBUG_SWAP                        BIT(5)
>   
>   struct vmcb_seg {
>   	u16 selector;
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index f0885250252d..ba12e7962e94 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -22,6 +22,7 @@
>   #include <asm/pkru.h>
>   #include <asm/trapnr.h>
>   #include <asm/fpu/xcr.h>
> +#include <asm/debugreg.h>
>   
>   #include "mmu.h"
>   #include "x86.h"
> @@ -53,9 +54,14 @@ module_param_named(sev, sev_enabled, bool, 0444);
>   /* enable/disable SEV-ES support */
>   static bool sev_es_enabled = true;
>   module_param_named(sev_es, sev_es_enabled, bool, 0444);
> +
> +/* enable/disable SEV-ES DebugSwap support */
> +static bool sev_es_debug_swap_enabled = true;
> +module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
>   #else
>   #define sev_enabled false
>   #define sev_es_enabled false
> +#define sev_es_debug_swap_enabled false
>   #endif /* CONFIG_KVM_AMD_SEV */
>   
>   static u8 sev_enc_bit;
> @@ -605,6 +611,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>   	save->xss  = svm->vcpu.arch.ia32_xss;
>   	save->dr6  = svm->vcpu.arch.dr6;
>   
> +	if (sev_es_debug_swap_enabled)
> +		save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
> +
>   	pr_debug("Virtual Machine Save Area (VMSA):\n");
>   	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
>   
> @@ -2256,6 +2265,9 @@ void __init sev_hardware_setup(void)
>   out:
>   	sev_enabled = sev_supported;
>   	sev_es_enabled = sev_es_supported;
> +	if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
> +	    !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
> +		sev_es_debug_swap_enabled = false;
>   #endif
>   }
>   
> @@ -2976,14 +2988,20 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
>   	svm_set_intercept(svm, TRAP_CR8_WRITE);
>   
>   	/*
> +	 * Unless DebugSwap (depends on X86_FEATURE_NO_NESTED_DATA_BP) is enabled,
>   	 * DR7 access must remain intercepted for an SEV-ES guest to disallow
>   	 * the guest kernel enable debugging as otherwise a VM writing to DR7
>   	 * from the #DB handler may trigger infinite loop of #DB's.
>   	 */
>   	vmcb->control.intercepts[INTERCEPT_DR] = 0;
> -	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
> -	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
> -	recalc_intercepts(svm);
> +	if (sev_es_debug_swap_enabled) {
> +		clr_exception_intercept(svm, DB_VECTOR);
> +		/* clr_exception_intercept() called recalc_intercepts() */
> +	} else {
> +		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
> +		vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
> +		recalc_intercepts(svm);
> +	}
>   
>   	/* Can't intercept XSETBV, HV can't modify XCR0 directly */
>   	svm_clr_intercept(svm, INTERCEPT_XSETBV);
> @@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
>   
>   	/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
>   	hostsa->xss = host_xss;
> +
> +	/* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
> +	if (sev_es_debug_swap_enabled) {
> +		hostsa->dr0 = native_get_debugreg(0);
> +		hostsa->dr1 = native_get_debugreg(1);
> +		hostsa->dr2 = native_get_debugreg(2);
> +		hostsa->dr3 = native_get_debugreg(3);
> +		hostsa->dr0_addr_mask = amd_get_dr_addr_mask(0);
> +		hostsa->dr1_addr_mask = amd_get_dr_addr_mask(1);
> +		hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
> +		hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
> +	}
>   }
>   
>   void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-05-09 10:58   ` Gupta, Pankaj
@ 2023-05-10  9:35     ` Gupta, Pankaj
  0 siblings, 0 replies; 27+ messages in thread
From: Gupta, Pankaj @ 2023-05-10  9:35 UTC (permalink / raw)
  To: Alexey Kardashevskiy, kvm
  Cc: x86, linux-kernel, Tom Lendacky, Sean Christopherson,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao


>> Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
>> ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
>> supported by the SOC as otherwise a malicious SEV-ES guest can set up
>> data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
> 
> You mean #DB => #BP here
Indeed its #DB. Was thinking something else.

Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap
  2023-04-20 14:32   ` Sean Christopherson
@ 2023-05-19  0:19     ` Alexey Kardashevskiy
  2023-05-19 15:28       ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-05-19  0:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao,
	Borislav Petkov

Hi Sean,

is that still on the list? Just checking :) Thanks,


On 21/4/23 00:32, Sean Christopherson wrote:
> On Thu, Apr 20, 2023, Alexey Kardashevskiy wrote:
>> On 11/4/23 22:57, Alexey Kardashevskiy wrote:
>>> This is to use another AMD SEV-ES hardware assisted register swap,
>>> more detail in 5/6. In the process it's been suggested to fix other
>>> things, here is the attempt, with the great help of amders.
>>>
>>> The previous conversation is here:
>>> https://lore.kernel.org/r/20230203051459.1354589-1-aik@amd.com
>>>
>>> This is based on sha1
>>> f91f9332d782 Ingo Molnar "Merge branch into tip/master: 'x86/tdx'".
>>>
>>> Please comment. Thanks.
>>
>> Ping?
>> Or should I relax until the end of the nearest merge window (May 6th-ish)?
>> :) Thanks,
> 
> Sorry, the answer is "relax".  I'm likely going to be offline for a few days in
> early May, so it might be more like May 15th until you hear from me, but this is
> on my todo list.

-- 
Alexey

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap
  2023-05-19  0:19     ` Alexey Kardashevskiy
@ 2023-05-19 15:28       ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2023-05-19 15:28 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao,
	Borislav Petkov

On Fri, May 19, 2023, Alexey Kardashevskiy wrote:
> Hi Sean,
> 
> is that still on the list? Just checking :) Thanks,

Yes, sorry for the long delay, I'm getting a late start on reviews this cycle for
a variety of reasons.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 3/6] KVM: SEV-ES: explicitly disable debug
  2023-04-11 12:57 ` [PATCH kernel v5 3/6] KVM: SEV-ES: explicitly disable debug Alexey Kardashevskiy
@ 2023-05-22 22:50   ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2023-05-22 22:50 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao

On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
> SVM/SEV enable debug registers intercepts to skip swapping DRs
> on entering/exiting the guest. When the guest is in control of
> debug registers (vcpu->guest_debug == 0), there is an optimisation to
> reduce the number of context switches: intercepts are cleared and
> the KVM_DEBUGREG_WONT_EXIT flag is set to tell KVM to do swapping
> on guest enter/exit.
> 
> The same code also executes for SEV-ES, however it has no effect as
> - it always takes (vcpu->guest_debug == 0) branch;
> - KVM_DEBUGREG_WONT_EXIT is set but DR7 intercept is not cleared;
> - vcpu_enter_guest() writes DRs but VMRUN for SEV-ES swaps them
> with the values from _encrypted_ VMSA.
> 
> Be explicit about SEV-ES not supporting debug:
> - return right away from dr_interception() and skip unnecessary processing;
> - clear vcpu->guest_debug at SEV-ES' LAUNCH_UPDATE_VMSA if debugging
> was already enabled; after that point the generic x86's
> KVM_SET_GUEST_DEBUG ioctl disallows enabling debug.
> 
> Add WARN_ON to kvm_x86::sync_dirty_debug_regs() (saves guest DRs on
> guest exit) to signify that SEV-ES won't hit that path.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> Changes:
> v5:
> * new in the series
> ---
>  arch/x86/kvm/svm/sev.c |  6 ++++++
>  arch/x86/kvm/svm/svm.c | 10 +++++++++-
>  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0f4761a57d86..b4365622222b 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -639,6 +639,12 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  	  return ret;
>  
>  	vcpu->arch.guest_state_protected = true;
> +
> +	if (vcpu->guest_debug)
> +		pr_warn_ratelimited("guest_debug (%lx) not supported for SEV-ES",

Note, this needs a newline in the printk, otherwise it'll get buffered until
the next non-cont printk comes along (guess how many times I've been burned by
this).

> +				    vcpu->guest_debug);
> +	vcpu->guest_debug = 0;

Argh, KVM's APIs can be quite frustrating.  IIUC, guest_debug can never actually
be consumed because, per Tom[*], "A guest can't run before the LAUNCH_UPDATE process
is complete".  But because the fact that the VM is an SEV-ES is communicated to
KVM after KVM_CREATE_VM, userspace can do KVM_SET_GUEST_DEBUG before KVM_SEV_ES_INIT
and before KVM_SEV_LAUNCH_UPDATE_VMSA, and thus get KVM into a state where
guest_debug is non-zero for an SEV-ES guest.  Blech.

Instead of a ratelimited warn, can KVM get away with simply rejecting
KVM_SEV_LAUNCH_UPDATE_VMSA if guest_debug is non-zero?  That combo can't work,
so it's seems unlikely userspace is relying on being able to do KVM_SET_GUEST_DEBUG.

If we do "have" to keep this approach, I'm generally opposed to any kind of printk
in KVM, but this one does seem to be justified since the most likely scenario is
that there's a human interactively debugging the guest (or at least, trying to
debug the guest).  But I would say explicitly call out the ioctl(), "guest_debug"
probably won't mean anything to a random user.  And I vote to not print the value,
that implies that the specific value is unsupported, not that debug in general is
disallowed.

Something like this (if we have to)?

		pr_warn_ratelimited("Suppressing KVM_SET_GUEST_DEBUG for SEV-ES guest\n"

[*] https://lore.kernel.org/all/7edcf2c3-005f-04bd-7ec6-80baee236f40@amd.com

> +
>  	return 0;
>  }
>  
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index dc12de325cca..179952a31d3b 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1980,7 +1980,7 @@ static void svm_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> -	if (vcpu->arch.guest_state_protected)
> +	if (WARN_ON_ONCE(sev_es_guest(vcpu->kvm)))
>  		return;
>  
>  	get_debugreg(vcpu->arch.db[0], 0);
> @@ -2698,6 +2698,14 @@ static int dr_interception(struct kvm_vcpu *vcpu)
>  	unsigned long val;
>  	int err = 0;
>  
> +	/*
> +	 * SEV-ES intercepts DR7 only to disable guest debugging
> +	 * and the guest issues a VMGEXIT for DR7 write only. KVM cannot

Wrapping is a bit aggressive (wrap at 80, not earlier).

> +	 * change DR7 (always swapped as type 'A') so return early.
> +	 */
> +	if (sev_es_guest(vcpu->kvm))
> +		return 1;
> +
>  	if (vcpu->guest_debug == 0) {
>  		/*
>  		 * No more DR vmexits; force a reload of the debug registers
> -- 
> 2.39.1
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 4/6] KVM: SVM/SEV/SEV-ES: Rework intercepts
  2023-04-11 12:57 ` [PATCH kernel v5 4/6] KVM: SVM/SEV/SEV-ES: Rework intercepts Alexey Kardashevskiy
@ 2023-05-22 22:53   ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2023-05-22 22:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao

On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
> Currently SVM setup is done sequentially in
> init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb()
> and tries keeping SVM/SEV/SEV-ES bits separated. One of the exceptions
> is DR intercepts which is for SEV-ES before sev_es_init_vmcb() runs.
> 
> Move the SEV-ES intercept setup to sev_es_init_vmcb(). From now on
> set_dr_intercepts()/clr_dr_intercepts() handle SVM/SEV only.
> 
> No functional change intended.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
> Reviewed-by: Santosh Shukla <santosh.shukla@amd.com>
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> Changes:
> v5:
> * updated the comments
> * removed sev_es_guest() checks from set_dr_intercepts()/clr_dr_intercepts()
> * removed remaining intercepts from clr_dr_intercepts()
> ---
>  arch/x86/kvm/svm/sev.c | 11 ++++++
>  arch/x86/kvm/svm/svm.c | 37 ++++++++------------
>  2 files changed, 25 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index b4365622222b..f0885250252d 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2946,6 +2946,7 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
>  
>  static void sev_es_init_vmcb(struct vcpu_svm *svm)
>  {
> +	struct vmcb *vmcb = svm->vmcb01.ptr;
>  	struct kvm_vcpu *vcpu = &svm->vcpu;
>  
>  	svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
> @@ -2974,6 +2975,16 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
>  	svm_set_intercept(svm, TRAP_CR4_WRITE);
>  	svm_set_intercept(svm, TRAP_CR8_WRITE);
>  
> +	/*
> +	 * DR7 access must remain intercepted for an SEV-ES guest to disallow
> +	 * the guest kernel enable debugging as otherwise a VM writing to DR7
> +	 * from the #DB handler may trigger infinite loop of #DB's.

This is wrong.  The attack isn't writing DR7 in the #DB handler, it's setting up
a #DB on memory that's needed to vector a #DB, e.g. the stack, so that the _CPU_
itself gets stuck in an infinite #DB loop[*].  The guest software handler putting
itself into an infinite loop is a non-issue because it can be interrupted.

[*] https://bugzilla.redhat.com/show_bug.cgi?id=1278496

> +	 */
> +	vmcb->control.intercepts[INTERCEPT_DR] = 0;
> +	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
> +	vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
> +	recalc_intercepts(svm);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-04-11 12:57 ` [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES Alexey Kardashevskiy
  2023-05-09 10:58   ` Gupta, Pankaj
@ 2023-05-22 23:39   ` Sean Christopherson
  2023-05-23 11:33     ` Alexey Kardashevskiy
  1 sibling, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2023-05-22 23:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao

On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
> Prior to SEV-ES, KVM saved/restored host debug registers upon switching
> to/from a VM. Changing those registers inside a running SEV VM
> triggered #VMEXIT to KVM.

Please, please don't make it sound like some behavior is *the* one and only behavior.
*KVM* *chooses* to intercept debug register accesses.  Though I would omit this
paragraph (and largely the next) entirely, IMO it's safe to assume the reader has
a basic understanding of how KVM deals with DRs and #DBs. 

> SEV-ES added encrypted state (ES) which uses an encrypted page
> for the VM state (VMSA). The hardware saves/restores certain registers
> on VMRUN/VMEXIT according to a swap type (A, B, C), see
> "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
> volume 2.
> 
> The DR6 and DR7 registers have always been swapped as Type A for SEV-ES

Please rewrite this to just state what the behavior is instead of "Type A" vs.
"Type B".  Outside of AMD, the "type a/b/c" stuff isn't anywhere near ubiquitous
enough to justify obfuscating super simple concepts.

Actually, this feature really has nothing to do with Type A vs. Type B, since
that's purely about host state.  I assume the switch from Type A to Type B is a
side effect, or an opportunistic optimization?

Regardless, something like this is a lot easier for a human to read.  It's easy
enough to find DebugSwap in the APM (literally took me longer to find my copy of
the PDF).

  Add support for "DebugSwap for SEV-ES guests", which provides support
  for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
  allows KVM to expose debug capabilities to SEV-ES guests.  Without
  DebugSwap support, the CPU doesn't save/load _guest_ debug registers,
  and KVM cannot manually context switch guest DRs due the VMSA being
  encrypted.

  Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
  which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
  vectoring a #DB.  Without NoNestedDataBp, a malicious guest can DoS
  the host by putting the CPU into an infinite loop of vectoring #DBs
  (see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)

  Set the features bit in sev_es_sync_vmsa() which is the last point
  when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
  init happens) is called not only when VCPU is initialized but also on
  intrahost migration when VMSA is encrypted.

> guests, but a new feature is available, identified via
> CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
> support for swapping additional debug registers. DR[0-3] and
> DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
> is set.
> 
> Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
> ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
> supported by the SOC as otherwise a malicious SEV-ES guest can set up
> data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
> Set the features bit in sev_es_sync_vmsa() which is the last point
> when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
> init happens) is called not only when VCPU is initialized but also on
> intrahost migration when VMSA is encrypted.
> 
> Eliminate DR7 and #DB intercepts as:
> - they are not needed when DebugSwap is supported;

"not needed" isn't sufficient justification.  KVM doesn't strictly need to do a
lot of things, but does them anyways for a variety of reasons.  E.g. #DB intercept
is also not needed when NoNestedDataBp is supported and vcpu->guest_debug==0, i.e.
this should clarify why KVM doesn't simply disable #DB intercepts for _all_ VMs
when NoNestedDataBp is support.  Presumably the answer is "because it's simpler
than toggling #DB interception for guest_debug.

Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
separate patch?  KVM can still inject #DBs for SEV-ES guests, no?

As for DR7, the real justification is that, as above, KVM can't modify guest DR7,
and intercepting DR7 would completely defeat the purpose of enabling DebugSwap.

> - #VC for these intercepts is most likely not supported anyway and
> kills the VM.

I don't see how this is relevant or helpful.  What the guest may or may not do in
response to a #VC on a DR7 write has no bearing on KVM's behavior. 

> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---

...

> @@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
>  
>  	/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
>  	hostsa->xss = host_xss;
> +
> +	/* The DebugSwap SEV feature does Type B swaps of DR[0-3] */

Since dangling a carrot didn't work[*], I'm going to resort to extortion :-)
Can you fold the below somewhere before this patch, and then tweak this comment
to say something like:

	/*
	 * If DebugSwap is enabled, debug registers are loaded but NOT saved by
	 * the CPU (Type-B).  If DebugSwap is disabled/unsupported, the CPU both
	 * saves and loads debug registers (Type-A).
	 */

[*] https://lore.kernel.org/all/YzOTOzUzKPQSqURo@google.com/

--
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 22 May 2023 16:29:52 -0700
Subject: [PATCH] KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment
 about swap types

Rewrite the comment(s) in sev_es_prepare_switch_to_guest() to explain the
swap types employed by the CPU for SEV-ES guests, i.e. to explain why KVM
needs to save a seemingly random subset of host state, and to provide a
decoder for the APM's Type-A/B/C terminology.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 69ae5e1b3120..1e0e9b08e491 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3017,19 +3017,24 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
 void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
 {
 	/*
-	 * As an SEV-ES guest, hardware will restore the host state on VMEXIT,
-	 * of which one step is to perform a VMLOAD.  KVM performs the
-	 * corresponding VMSAVE in svm_prepare_guest_switch for both
-	 * traditional and SEV-ES guests.
+	 * All host state for SEV-ES guests is categorized into three swap types
+	 * based on how it is handled by hardware during a world switch:
+	 *
+	 * A: VMRUN:   Host state saved in host save area
+	 *    VMEXIT:  Host state loaded from host save area
+	 *
+	 * B: VMRUN:   Host state _NOT_ saved in host save area
+	 *    VMEXIT:  Host state loaded from host save area
+	 *
+	 * C: VMRUN:   Host state _NOT_ saved in host save area
+	 *    VMEXIT:  Host state initialized to default(reset) values
+	 *
+	 * Manually save type-B state, i.e. state that is loaded by VMEXIT but
+	 * isn't saved by VMRUN, that isn't already saved by VMSAVE (performed
+	 * by common SVM code).
 	 */
-
-	/* XCR0 is restored on VMEXIT, save the current host value */
 	hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
-
-	/* PKRU is restored on VMEXIT, save the current host value */
 	hostsa->pkru = read_pkru();
-
-	/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
 	hostsa->xss = host_xss;
 }
 

base-commit: 39428f6ea9eace95011681628717062ff7f5eb5f
-- 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 6/6] x86/sev: Do not handle #VC for DR7 read/write
  2023-04-11 12:57 ` [PATCH kernel v5 6/6] x86/sev: Do not handle #VC for DR7 read/write Alexey Kardashevskiy
@ 2023-05-22 23:44   ` Sean Christopherson
  2023-05-24  6:36     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2023-05-22 23:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao

On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
> With MSR_AMD64_SEV_DEBUG_SWAP enabled, the guest is not expected to
> receive a #VC for reads or writes of DR7.
> 
> Update the SNP_FEATURES_PRESENT mask with MSR_AMD64_SNP_DEBUG_SWAP so
> an SNP guest doesn't gracefully terminate during SNP feature negotiation
> if MSR_AMD64_SEV_DEBUG_SWAP is enabled.
> 
> Since a guest is not expected to receive a #VC on DR7 accesses when
> MSR_AMD64_SEV_DEBUG_SWAP is enabled, return an error from the #VC
> handler in this situation.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
> Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
> ---
> Changes:
> v4:
> * rebased on top of SNP feature negotiation
> 
> v2:
> * use new bit definition
> ---
>  arch/x86/boot/compressed/sev.c | 2 +-
>  arch/x86/kernel/sev.c          | 6 ++++++
>  2 files changed, 7 insertions(+), 1 deletion(-)

Can you post this separately (or bribe Boris to grab it)?  IIUC, this has no
dependency on the KVM enabling, i.e. can/should go through the tip tree without
waiting for the KVM patches to be applied.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-05-22 23:39   ` Sean Christopherson
@ 2023-05-23 11:33     ` Alexey Kardashevskiy
  2023-05-23 15:44       ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-05-23 11:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao



On 23/5/23 09:39, Sean Christopherson wrote:
> On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
>> Prior to SEV-ES, KVM saved/restored host debug registers upon switching
>> to/from a VM. Changing those registers inside a running SEV VM
>> triggered #VMEXIT to KVM.
> 
> Please, please don't make it sound like some behavior is *the* one and only behavior.
> *KVM* *chooses* to intercept debug register accesses.  Though I would omit this
> paragraph (and largely the next) entirely, IMO it's safe to assume the reader has
> a basic understanding of how KVM deals with DRs and #DBs.

Out of curiosity - is ARM similar in this regard, interceptions and stuff?

>> SEV-ES added encrypted state (ES) which uses an encrypted page
>> for the VM state (VMSA). The hardware saves/restores certain registers
>> on VMRUN/VMEXIT according to a swap type (A, B, C), see
>> "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
>> volume 2.
>>
>> The DR6 and DR7 registers have always been swapped as Type A for SEV-ES
> 
> Please rewrite this to just state what the behavior is instead of "Type A" vs.
> "Type B".  Outside of AMD, the "type a/b/c" stuff isn't anywhere near ubiquitous
> enough to justify obfuscating super simple concepts.
> 
> Actually, this feature really has nothing to do with Type A vs. Type B, since
> that's purely about host state. 

Mmm. Nothing? If the feature is enabled and DR[0-3] are not saved in 
HOSTSA, then the host looses debug state because of the working feature.

> I assume the switch from Type A to Type B is a
> side effect, or an opportunistic optimization?

There is no switch. DR[67] were and are one type, and the other DRs were 
not swapped and now are, but with a different Swap Type.

And the patch saves DR[0-3] in HOSTSA but not DR[67] and this deserves 
some explaining why is that.

> Regardless, something like this is a lot easier for a human to read.  It's easy
> enough to find DebugSwap in the APM (literally took me longer to find my copy of
> the PDF).

It is also easy to find "Swap Types" in the APM...

>    Add support for "DebugSwap for SEV-ES guests", which provides support
>    for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
>    allows KVM to expose debug capabilities to SEV-ES guests.  Without
>    DebugSwap support, the CPU doesn't save/load _guest_ debug registers,

But it does save/load DR6 and DR7.

>    and KVM cannot manually context switch guest DRs due the VMSA being
>    encrypted.
> 
>    Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
>    which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
>    vectoring a #DB. 

(english question) What does "vectoring" mean here precisely? Handling? 
Processing?

> Without NoNestedDataBp, a malicious guest can DoS
>    the host by putting the CPU into an infinite loop of vectoring #DBs
>    (see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)

Good one, thanks for the link.

> 
>    Set the features bit in sev_es_sync_vmsa() which is the last point
>    when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
>    init happens) is called not only when VCPU is initialized but also on
>    intrahost migration when VMSA is encrypted.
> 
>> guests, but a new feature is available, identified via
>> CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
>> support for swapping additional debug registers. DR[0-3] and
>> DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
>> is set.
>>
>> Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
>> ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
>> supported by the SOC as otherwise a malicious SEV-ES guest can set up
>> data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
>> Set the features bit in sev_es_sync_vmsa() which is the last point
>> when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
>> init happens) is called not only when VCPU is initialized but also on
>> intrahost migration when VMSA is encrypted.
>>
>> Eliminate DR7 and #DB intercepts as:
>> - they are not needed when DebugSwap is supported;
> 
> "not needed" isn't sufficient justification.  KVM doesn't strictly need to do a
> lot of things, but does them anyways for a variety of reasons.  E.g. #DB intercept
> is also not needed when NoNestedDataBp is supported and vcpu->guest_debug==0, i.e.
> this should clarify why KVM doesn't simply disable #DB intercepts for _all_ VMs
> when NoNestedDataBp is support.  Presumably the answer is "because it's simpler
> than toggling #DB interception for guest_debug.

TBH I did not have a good answer but from the above I'd say we want to 
disable #DB intercepts in a separate patch, just as you say below.

> Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
> separate patch?  KVM can still inject #DBs for SEV-ES guests, no?

Sorry for my ignorance but what is the point of injecting #DB if there 
is no way of changing the guest's DR7?


> As for DR7, the real justification is that, as above, KVM can't modify guest DR7,
> and intercepting DR7 would completely defeat the purpose of enabling DebugSwap.
> 
>> - #VC for these intercepts is most likely not supported anyway and
>> kills the VM.
> 
> I don't see how this is relevant or helpful.  What the guest may or may not do in
> response to a #VC on a DR7 write has no bearing on KVM's behavior.

Readers of the patch may wonder of the chances of breaks guests. 
Definitely helps me to realize there is likely to be some sort of 
panic() in the guest if such intercept happens.


>> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
>> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
> 
> ...
> 
>> @@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
>>   
>>   	/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
>>   	hostsa->xss = host_xss;
>> +
>> +	/* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
> 
> Since dangling a carrot didn't work[*], I'm going to resort to extortion :-)
> Can you fold the below somewhere before this patch, and then tweak this comment
> to say something like:
> 
> 	/*
> 	 * If DebugSwap is enabled, debug registers are loaded but NOT saved by
> 	 * the CPU (Type-B).  If DebugSwap is disabled/unsupported, the CPU both
> 	 * saves and loads debug registers (Type-A).
> 	 */

Sure but...

> 
> [*] https://lore.kernel.org/all/YzOTOzUzKPQSqURo@google.com/
> 
> 
> From: Sean Christopherson <seanjc@google.com>
> Date: Mon, 22 May 2023 16:29:52 -0700
> Subject: [PATCH] KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment
>   about swap types


... I am missing the point here. You already wrote the patch below which 
1) you are happy about 2) you can push out right away to the upstream. 
Are you suggesting that I repost it?


> 
> Rewrite the comment(s) in sev_es_prepare_switch_to_guest() to explain the
> swap types employed by the CPU for SEV-ES guests, i.e. to explain why KVM
> needs to save a seemingly random subset of host state, and to provide a
> decoder for the APM's Type-A/B/C terminology.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/svm/sev.c | 25 +++++++++++++++----------
>   1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 69ae5e1b3120..1e0e9b08e491 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3017,19 +3017,24 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
>   void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
>   {
>   	/*
> -	 * As an SEV-ES guest, hardware will restore the host state on VMEXIT,
> -	 * of which one step is to perform a VMLOAD. KVM performs the
> -	 * corresponding VMSAVE in svm_prepare_guest_switch for both
> -	 * traditional and SEV-ES guests.


When I see "traditional", I assume there was one single x86 KVM before 
say 2010 which was slow, emulated a lot but worked exactly the same on 
Intel and AMD. Which does not seem to ever be the case. May be say "SVM" 
here?


> +	 * All host state for SEV-ES guests is categorized into three swap types
> +	 * based on how it is handled by hardware during a world switch:
> +	 *
> +	 * A: VMRUN:   Host state saved in host save area
> +	 *    VMEXIT:  Host state loaded from host save area
> +	 *
> +	 * B: VMRUN:   Host state _NOT_ saved in host save area
> +	 *    VMEXIT:  Host state loaded from host save area
> +	 *
> +	 * C: VMRUN:   Host state _NOT_ saved in host save area
> +	 *    VMEXIT:  Host state initialized to default(reset) values
> +	 *
> +	 * Manually save type-B state, i.e. state that is loaded by VMEXIT but
> +	 * isn't saved by VMRUN, that isn't already saved by VMSAVE (performed
> +	 * by common SVM code).

Like here - "common SVM"?

Thanks for the comments!


>   	 */
> -
> -	/* XCR0 is restored on VMEXIT, save the current host value */
>   	hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
> -
> -	/* PKRU is restored on VMEXIT, save the current host value */
>   	hostsa->pkru = read_pkru();
> -
> -	/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
>   	hostsa->xss = host_xss;
>   }
>   
> 
> base-commit: 39428f6ea9eace95011681628717062ff7f5eb5f

-- 
Alexey

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-05-23 11:33     ` Alexey Kardashevskiy
@ 2023-05-23 15:44       ` Sean Christopherson
  2023-05-26  3:16         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2023-05-23 15:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao

On Tue, May 23, 2023, Alexey Kardashevskiy wrote:
> 
> 
> On 23/5/23 09:39, Sean Christopherson wrote:
> > On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
> > > Prior to SEV-ES, KVM saved/restored host debug registers upon switching
> > > to/from a VM. Changing those registers inside a running SEV VM
> > > triggered #VMEXIT to KVM.
> > 
> > Please, please don't make it sound like some behavior is *the* one and only behavior.
> > *KVM* *chooses* to intercept debug register accesses.  Though I would omit this
> > paragraph (and largely the next) entirely, IMO it's safe to assume the reader has
> > a basic understanding of how KVM deals with DRs and #DBs.
> 
> Out of curiosity - is ARM similar in this regard, interceptions and stuff?

Yes.  The granularity of interceptions varies based on the architectural revision,
and presumably there are things that always trap.  But the core concept of letting
software decide what to intercept is the same.

> > > SEV-ES added encrypted state (ES) which uses an encrypted page
> > > for the VM state (VMSA). The hardware saves/restores certain registers
> > > on VMRUN/VMEXIT according to a swap type (A, B, C), see
> > > "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
> > > volume 2.
> > > 
> > > The DR6 and DR7 registers have always been swapped as Type A for SEV-ES
> > 
> > Please rewrite this to just state what the behavior is instead of "Type A" vs.
> > "Type B".  Outside of AMD, the "type a/b/c" stuff isn't anywhere near ubiquitous
> > enough to justify obfuscating super simple concepts.
> > 
> > Actually, this feature really has nothing to do with Type A vs. Type B, since
> > that's purely about host state.
> 
> Mmm. Nothing? If the feature is enabled and DR[0-3] are not saved in HOSTSA,
> then the host looses debug state because of the working feature.
> 
> > I assume the switch from Type A to Type B is a
> > side effect, or an opportunistic optimization?
> 
> There is no switch. DR[67] were and are one type, and the other DRs were not
> swapped and now are, but with a different Swap Type.

Ah, this is what I missed.

> And the patch saves DR[0-3] in HOSTSA but not DR[67] and this deserves some
> explaining why is that.
> 
> > Regardless, something like this is a lot easier for a human to read.  It's easy
> > enough to find DebugSwap in the APM (literally took me longer to find my copy of
> > the PDF).
> 
> It is also easy to find "Swap Types" in the APM...

Redirecting readers to specs for gory details is fine.  Redirecting for basic,
simple functionality is not.  Maybe the swap types will someday be common knowledge
in KVM, and an explanation will no longer be necessary, but we are nowhere near
that point.

> >    Add support for "DebugSwap for SEV-ES guests", which provides support
> >    for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
> >    allows KVM to expose debug capabilities to SEV-ES guests.  Without
> >    DebugSwap support, the CPU doesn't save/load _guest_ debug registers,
> 
> But it does save/load DR6 and DR7.
> 
> >    and KVM cannot manually context switch guest DRs due the VMSA being
> >    encrypted.
> > 
> >    Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
> >    which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
> >    vectoring a #DB.
> 
> (english question) What does "vectoring" mean here precisely? Handling?
> Processing?

It's not really an English thing.  "Vectoring" is, or at least was, Intel terminology
for describing the flow from the initial detection of an exception to the exception's
delivery to software, e.g. covers the IDT lookup, any GDT/LDT lookups, pushing
information on the stack, fetching the software exception handler, etc.  Importantly,
it describes the period where there are no instructions retired and thus ucode doesn't
open event windows, i.e. where triggering another, non-contributory exception will
effectively "hang" the CPU (at least on CPUs without Intel's "notify" VM-Exit support).

> >    the host by putting the CPU into an infinite loop of vectoring #DBs
> >    (see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)
> 
> Good one, thanks for the link.
> 
> > 
> >    Set the features bit in sev_es_sync_vmsa() which is the last point
> >    when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
> >    init happens) is called not only when VCPU is initialized but also on
> >    intrahost migration when VMSA is encrypted.
> > 
> > > guests, but a new feature is available, identified via
> > > CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
> > > support for swapping additional debug registers. DR[0-3] and
> > > DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
> > > is set.
> > > 
> > > Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
> > > ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
> > > supported by the SOC as otherwise a malicious SEV-ES guest can set up
> > > data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
> > > Set the features bit in sev_es_sync_vmsa() which is the last point
> > > when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
> > > init happens) is called not only when VCPU is initialized but also on
> > > intrahost migration when VMSA is encrypted.
> > > 
> > > Eliminate DR7 and #DB intercepts as:
> > > - they are not needed when DebugSwap is supported;
> > 
> > "not needed" isn't sufficient justification.  KVM doesn't strictly need to do a
> > lot of things, but does them anyways for a variety of reasons.  E.g. #DB intercept
> > is also not needed when NoNestedDataBp is supported and vcpu->guest_debug==0, i.e.
> > this should clarify why KVM doesn't simply disable #DB intercepts for _all_ VMs
> > when NoNestedDataBp is support.  Presumably the answer is "because it's simpler
> > than toggling #DB interception for guest_debug.
> 
> TBH I did not have a good answer but from the above I'd say we want to
> disable #DB intercepts in a separate patch, just as you say below.
> 
> > Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
> > separate patch?  KVM can still inject #DBs for SEV-ES guests, no?
> 
> Sorry for my ignorance but what is the point of injecting #DB if there is no
> way of changing the guest's DR7?

Well, _injecting_ the #DB is necessary for correctness from the guest's perspective.
"What's the point of _intercepting_ #DB" is the real question.  And for SEV-ES guests
with DebugSwap, there is no point, which is why I agree that KVM should disable
interception in that case.  What I'm calling out is that disabling #Db interception
isn't _necessary_ for correctness (unless I'm missing something), which means that
it can and should go in a separate patch.

> > As for DR7, the real justification is that, as above, KVM can't modify guest DR7,
> > and intercepting DR7 would completely defeat the purpose of enabling DebugSwap.
> > 
> > > - #VC for these intercepts is most likely not supported anyway and
> > > kills the VM.
> > 
> > I don't see how this is relevant or helpful.  What the guest may or may not do in
> > response to a #VC on a DR7 write has no bearing on KVM's behavior.
> 
> Readers of the patch may wonder of the chances of breaks guests. Definitely
> helps me to realize there is likely to be some sort of panic() in the guest
> if such intercept happens.

But that's irrelevant.  Intercepting DR7 writes will break the guest regardless
of how the guest deals with the #VC.  If the guest eats the #VC and continues on,
the debug capabilities expected by the guest will still be missing, i.e. KVM has
broken the functionality of the guest.  I am total ok if the changelog describes
the _possible_ scenarios (within reason), e.g. that the guest will either panic
on an unexpected #VC or lose debug capabilities that were promised.  What I'm
objecting to is speculating on what the guest is _likely_ to do, and especially
using that speculation as justification for doing something in KVM.

> > > Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
> > > Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> > > ---
> > 
> > ...
> > 
> > > @@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
> > >   	/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
> > >   	hostsa->xss = host_xss;
> > > +
> > > +	/* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
> > 
> > Since dangling a carrot didn't work[*], I'm going to resort to extortion :-)
> > Can you fold the below somewhere before this patch, and then tweak this comment
> > to say something like:
> > 
> > 	/*
> > 	 * If DebugSwap is enabled, debug registers are loaded but NOT saved by
> > 	 * the CPU (Type-B).  If DebugSwap is disabled/unsupported, the CPU both
> > 	 * saves and loads debug registers (Type-A).
> > 	 */
> 
> Sure but...
> 
> > 
> > [*] https://lore.kernel.org/all/YzOTOzUzKPQSqURo@google.com/
> > 
> > 
> > From: Sean Christopherson <seanjc@google.com>
> > Date: Mon, 22 May 2023 16:29:52 -0700
> > Subject: [PATCH] KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment
> >   about swap types
> 
> 
> ... I am missing the point here. You already wrote the patch below which 1)
> you are happy about 2) you can push out right away to the upstream. Are you
> suggesting that I repost it?

I am asking you to include it in your series because the comment I suggested above
(for DebugSwap) loosely depends on the revamped comment for sev_es_prepare_switch_to_guest()
as a whole.  I would like to settle on the exact wording for all of the comments
in sev_es_prepare_switch_to_guest() in a single series instead of having disjoint
threads.

> > Rewrite the comment(s) in sev_es_prepare_switch_to_guest() to explain the
> > swap types employed by the CPU for SEV-ES guests, i.e. to explain why KVM
> > needs to save a seemingly random subset of host state, and to provide a
> > decoder for the APM's Type-A/B/C terminology.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/svm/sev.c | 25 +++++++++++++++----------
> >   1 file changed, 15 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 69ae5e1b3120..1e0e9b08e491 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3017,19 +3017,24 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
> >   void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
> >   {
> >   	/*
> > -	 * As an SEV-ES guest, hardware will restore the host state on VMEXIT,
> > -	 * of which one step is to perform a VMLOAD. KVM performs the
> > -	 * corresponding VMSAVE in svm_prepare_guest_switch for both
> > -	 * traditional and SEV-ES guests.
> 
> 
> When I see "traditional", I assume there was one single x86 KVM before say
> 2010 which was slow, emulated a lot but worked exactly the same on Intel and
> AMD. Which does not seem to ever be the case. May be say "SVM" here?

This is the code being removed.  I agree that "traditional" is ambiguous, which
is why I want to delete it :-)

> > +	 * All host state for SEV-ES guests is categorized into three swap types
> > +	 * based on how it is handled by hardware during a world switch:
> > +	 *
> > +	 * A: VMRUN:   Host state saved in host save area
> > +	 *    VMEXIT:  Host state loaded from host save area
> > +	 *
> > +	 * B: VMRUN:   Host state _NOT_ saved in host save area
> > +	 *    VMEXIT:  Host state loaded from host save area
> > +	 *
> > +	 * C: VMRUN:   Host state _NOT_ saved in host save area
> > +	 *    VMEXIT:  Host state initialized to default(reset) values
> > +	 *
> > +	 * Manually save type-B state, i.e. state that is loaded by VMEXIT but
> > +	 * isn't saved by VMRUN, that isn't already saved by VMSAVE (performed
> > +	 * by common SVM code).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 6/6] x86/sev: Do not handle #VC for DR7 read/write
  2023-05-22 23:44   ` Sean Christopherson
@ 2023-05-24  6:36     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-05-24  6:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao



On 23/5/23 09:44, Sean Christopherson wrote:
> On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
>> With MSR_AMD64_SEV_DEBUG_SWAP enabled, the guest is not expected to
>> receive a #VC for reads or writes of DR7.
>>
>> Update the SNP_FEATURES_PRESENT mask with MSR_AMD64_SNP_DEBUG_SWAP so
>> an SNP guest doesn't gracefully terminate during SNP feature negotiation
>> if MSR_AMD64_SEV_DEBUG_SWAP is enabled.
>>
>> Since a guest is not expected to receive a #VC on DR7 accesses when
>> MSR_AMD64_SEV_DEBUG_SWAP is enabled, return an error from the #VC
>> handler in this situation.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
>> Reviewed-by: Carlos Bilbao <carlos.bilbao@amd.com>
>> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
>> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
>> ---
>> Changes:
>> v4:
>> * rebased on top of SNP feature negotiation
>>
>> v2:
>> * use new bit definition
>> ---
>>   arch/x86/boot/compressed/sev.c | 2 +-
>>   arch/x86/kernel/sev.c          | 6 ++++++
>>   2 files changed, 7 insertions(+), 1 deletion(-)
> 
> Can you post this separately (or bribe Boris to grab it)?  IIUC, this has no
> dependency on the KVM enabling, i.e. can/should go through the tip tree without
> waiting for the KVM patches to be applied.

I definitely can, do you mind adding yours "rb"/"ab"? Thanks!


-- 
Alexey

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-05-23 15:44       ` Sean Christopherson
@ 2023-05-26  3:16         ` Alexey Kardashevskiy
  2023-05-26 14:39           ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-05-26  3:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao



On 24/5/23 01:44, Sean Christopherson wrote:
> On Tue, May 23, 2023, Alexey Kardashevskiy wrote:
>>
>>
>> On 23/5/23 09:39, Sean Christopherson wrote:
>>> On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
>>>> Prior to SEV-ES, KVM saved/restored host debug registers upon switching
>>>> to/from a VM. Changing those registers inside a running SEV VM
>>>> triggered #VMEXIT to KVM.
>>>
>>> Please, please don't make it sound like some behavior is *the* one and only behavior.
>>> *KVM* *chooses* to intercept debug register accesses.  Though I would omit this
>>> paragraph (and largely the next) entirely, IMO it's safe to assume the reader has
>>> a basic understanding of how KVM deals with DRs and #DBs.
>>
>> Out of curiosity - is ARM similar in this regard, interceptions and stuff?
> 
> Yes.  The granularity of interceptions varies based on the architectural revision,
> and presumably there are things that always trap.  But the core concept of letting
> software decide what to intercept is the same.
> 
>>>> SEV-ES added encrypted state (ES) which uses an encrypted page
>>>> for the VM state (VMSA). The hardware saves/restores certain registers
>>>> on VMRUN/VMEXIT according to a swap type (A, B, C), see
>>>> "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
>>>> volume 2.
>>>>
>>>> The DR6 and DR7 registers have always been swapped as Type A for SEV-ES
>>>
>>> Please rewrite this to just state what the behavior is instead of "Type A" vs.
>>> "Type B".  Outside of AMD, the "type a/b/c" stuff isn't anywhere near ubiquitous
>>> enough to justify obfuscating super simple concepts.
>>>
>>> Actually, this feature really has nothing to do with Type A vs. Type B, since
>>> that's purely about host state.
>>
>> Mmm. Nothing? If the feature is enabled and DR[0-3] are not saved in HOSTSA,
>> then the host looses debug state because of the working feature.
>>
>>> I assume the switch from Type A to Type B is a
>>> side effect, or an opportunistic optimization?
>>
>> There is no switch. DR[67] were and are one type, and the other DRs were not
>> swapped and now are, but with a different Swap Type.
> 
> Ah, this is what I missed.
> 
>> And the patch saves DR[0-3] in HOSTSA but not DR[67] and this deserves some
>> explaining why is that.
>>
>>> Regardless, something like this is a lot easier for a human to read.  It's easy
>>> enough to find DebugSwap in the APM (literally took me longer to find my copy of
>>> the PDF).
>>
>> It is also easy to find "Swap Types" in the APM...
> 
> Redirecting readers to specs for gory details is fine.  Redirecting for basic,
> simple functionality is not.  Maybe the swap types will someday be common knowledge
> in KVM, and an explanation will no longer be necessary, but we are nowhere near
> that point.
> 
>>>     Add support for "DebugSwap for SEV-ES guests", which provides support
>>>     for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
>>>     allows KVM to expose debug capabilities to SEV-ES guests.  Without
>>>     DebugSwap support, the CPU doesn't save/load _guest_ debug registers,
>>
>> But it does save/load DR6 and DR7.
>>
>>>     and KVM cannot manually context switch guest DRs due the VMSA being
>>>     encrypted.
>>>
>>>     Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
>>>     which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
>>>     vectoring a #DB.
>>
>> (english question) What does "vectoring" mean here precisely? Handling?
>> Processing?
> 
> It's not really an English thing.  "Vectoring" is, or at least was, Intel terminology
> for describing the flow from the initial detection of an exception to the exception's
> delivery to software, e.g. covers the IDT lookup, any GDT/LDT lookups, pushing
> information on the stack, fetching the software exception handler, etc.  Importantly,
> it describes the period where there are no instructions retired and thus ucode doesn't
> open event windows, i.e. where triggering another, non-contributory exception will
> effectively "hang" the CPU (at least on CPUs without Intel's "notify" VM-Exit support).
> 
>>>     the host by putting the CPU into an infinite loop of vectoring #DBs
>>>     (see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)
>>
>> Good one, thanks for the link.
>>
>>>
>>>     Set the features bit in sev_es_sync_vmsa() which is the last point
>>>     when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
>>>     init happens) is called not only when VCPU is initialized but also on
>>>     intrahost migration when VMSA is encrypted.
>>>
>>>> guests, but a new feature is available, identified via
>>>> CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
>>>> support for swapping additional debug registers. DR[0-3] and
>>>> DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
>>>> is set.
>>>>
>>>> Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
>>>> ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
>>>> supported by the SOC as otherwise a malicious SEV-ES guest can set up
>>>> data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
>>>> Set the features bit in sev_es_sync_vmsa() which is the last point
>>>> when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
>>>> init happens) is called not only when VCPU is initialized but also on
>>>> intrahost migration when VMSA is encrypted.
>>>>
>>>> Eliminate DR7 and #DB intercepts as:
>>>> - they are not needed when DebugSwap is supported;
>>>
>>> "not needed" isn't sufficient justification.  KVM doesn't strictly need to do a
>>> lot of things, but does them anyways for a variety of reasons.  E.g. #DB intercept
>>> is also not needed when NoNestedDataBp is supported and vcpu->guest_debug==0, i.e.
>>> this should clarify why KVM doesn't simply disable #DB intercepts for _all_ VMs
>>> when NoNestedDataBp is support.  Presumably the answer is "because it's simpler
>>> than toggling #DB interception for guest_debug.
>>
>> TBH I did not have a good answer but from the above I'd say we want to
>> disable #DB intercepts in a separate patch, just as you say below.
>>
>>> Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
>>> separate patch?  KVM can still inject #DBs for SEV-ES guests, no?
>>
>> Sorry for my ignorance but what is the point of injecting #DB if there is no
>> way of changing the guest's DR7?
> 
> Well, _injecting_ the #DB is necessary for correctness from the guest's perspective.
> "What's the point of _intercepting_ #DB" is the real question.  And for SEV-ES guests
> with DebugSwap, there is no point, which is why I agree that KVM should disable
> interception in that case.  What I'm calling out is that disabling #Db interception
> isn't _necessary_ for correctness (unless I'm missing something), which means that
> it can and should go in a separate patch.


About this. Instead of sev_es_init_vmcb(), I can toggle the #DB 
intercept when toggling guest_debug, see below. This 
kvm_x86_ops::update_exception_bitmap hook is called on vcpu reset and 
kvm_arch_vcpu_ioctl_set_guest_debug (which skips this call if 
guest_state_protected = true). Is there any downside?


diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index da28ed69bf4a..a7df5eb4ac00 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1951,9 +1951,15 @@ static void svm_update_exception_bitmap(struct 
kvm_vcpu *vcpu)

         clr_exception_intercept(svm, BP_VECTOR);

+       if (cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
+               clr_exception_intercept(svm, DB_VECTOR);
+
         if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
                 if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
                         set_exception_intercept(svm, BP_VECTOR);
+
+               if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
+                       set_exception_intercept(svm, DB_VECTOR);
         }
  }




-- 
Alexey

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-05-26  3:16         ` Alexey Kardashevskiy
@ 2023-05-26 14:39           ` Sean Christopherson
  2023-05-30  8:57             ` Alexey Kardashevskiy
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2023-05-26 14:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao

On Fri, May 26, 2023, Alexey Kardashevskiy wrote:
> 
> On 24/5/23 01:44, Sean Christopherson wrote:
> > On Tue, May 23, 2023, Alexey Kardashevskiy wrote:
> > > > Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
> > > > separate patch?  KVM can still inject #DBs for SEV-ES guests, no?
> > > 
> > > Sorry for my ignorance but what is the point of injecting #DB if there is no
> > > way of changing the guest's DR7?
> > 
> > Well, _injecting_ the #DB is necessary for correctness from the guest's perspective.
> > "What's the point of _intercepting_ #DB" is the real question.  And for SEV-ES guests
> > with DebugSwap, there is no point, which is why I agree that KVM should disable
> > interception in that case.  What I'm calling out is that disabling #Db interception
> > isn't _necessary_ for correctness (unless I'm missing something), which means that
> > it can and should go in a separate patch.
> 
> 
> About this. Instead of sev_es_init_vmcb(), I can toggle the #DB intercept
> when toggling guest_debug, see below. This
> kvm_x86_ops::update_exception_bitmap hook is called on vcpu reset and
> kvm_arch_vcpu_ioctl_set_guest_debug (which skips this call if
> guest_state_protected = true).

KVM also intercepts #DB when single-stepping over IRET to find an NMI window, so
you'd also have to factor in nmi_singlestep, and update svm_enable_nmi_window()
and disable_nmi_singlestep() to call svm_update_exception_bitmap().

> Is there any downside?

Complexity is the main one.  The complexity is quite low, but the benefit to the
guest is likely even lower.  A #DB in the guest isn't likely to be performance
sensitive.  And on the flip side, opening an NMI window would be a tiny bit more
expensive, though I doubt that would be meaningful either.

All in all, I think it makes sense to just keep intercepting #DB for non-SEV-ES
guests.

Side topic, isn't there an existing bug regarding SEV-ES NMI windows?  KVM can't
actually single-step an SEV-ES guest, but tries to set RFLAGS.TF anyways.  Blech,
and suppressing EFER.SVME in efer_trap() is a bit gross, but I suppose since the
GHCB doesn't allow for CLGI or STGI it's "fine".

E.g. shouldn't KVM do this?

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ca32389f3c36..4e4a49031efe 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3784,6 +3784,16 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
        if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
                return; /* IRET will cause a vm exit */
 
+       /*
+        * KVM can't single-step SEV-ES guests and instead assumes that IRET
+        * in the guest will always succeed, i.e. clears NMI masking on the
+        * next VM-Exit.  Note, GIF is guaranteed to be '1' for SEV-ES guests
+        * as the GHCB doesn't allow for CLGI or STGI (and KVM suppresses
+        * EFER.SVME for good measure, see efer_trap()).
+        */
+       if (sev_es_guest(vcpu->kvm))
+               return;
+
        if (!gif_set(svm)) {
                if (vgif)
                        svm_set_intercept(svm, INTERCEPT_STGI);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-05-26 14:39           ` Sean Christopherson
@ 2023-05-30  8:57             ` Alexey Kardashevskiy
  2023-06-01 23:31               ` Alexey Kardashevskiy
  0 siblings, 1 reply; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-05-30  8:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao



On 27/5/23 00:39, Sean Christopherson wrote:
> On Fri, May 26, 2023, Alexey Kardashevskiy wrote:
>>
>> On 24/5/23 01:44, Sean Christopherson wrote:
>>> On Tue, May 23, 2023, Alexey Kardashevskiy wrote:
>>>>> Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
>>>>> separate patch?  KVM can still inject #DBs for SEV-ES guests, no?
>>>>
>>>> Sorry for my ignorance but what is the point of injecting #DB if there is no
>>>> way of changing the guest's DR7?
>>>
>>> Well, _injecting_ the #DB is necessary for correctness from the guest's perspective.
>>> "What's the point of _intercepting_ #DB" is the real question.  And for SEV-ES guests
>>> with DebugSwap, there is no point, which is why I agree that KVM should disable
>>> interception in that case.  What I'm calling out is that disabling #Db interception
>>> isn't _necessary_ for correctness (unless I'm missing something), which means that
>>> it can and should go in a separate patch.
>>
>>
>> About this. Instead of sev_es_init_vmcb(), I can toggle the #DB intercept
>> when toggling guest_debug, see below. This
>> kvm_x86_ops::update_exception_bitmap hook is called on vcpu reset and
>> kvm_arch_vcpu_ioctl_set_guest_debug (which skips this call if
>> guest_state_protected = true).
> 
> KVM also intercepts #DB when single-stepping over IRET to find an NMI window, so
> you'd also have to factor in nmi_singlestep, and update svm_enable_nmi_window()
> and disable_nmi_singlestep() to call svm_update_exception_bitmap().

Uff. New can of worms for me :-/


>> Is there any downside?
> 
> Complexity is the main one.  The complexity is quite low, but the benefit to the
> guest is likely even lower.  A #DB in the guest isn't likely to be performance
> sensitive.  And on the flip side, opening an NMI window would be a tiny bit more
> expensive, though I doubt that would be meaningful either.
> 
> All in all, I think it makes sense to just keep intercepting #DB for non-SEV-ES
> guests.
> 
> Side topic, isn't there an existing bug regarding SEV-ES NMI windows?  KVM can't
> actually single-step an SEV-ES guest, but tries to set RFLAGS.TF anyways. 

Why is it a "bug" and what does the patch fix? Sound to me as it is 
pointless and the guest won't do single stepping and instead will run 
till it exits somehow, what do I miss?

> Blech,
> and suppressing EFER.SVME in efer_trap() is a bit gross,

Why suppressed? svm_set_efer() sets it eventually anyway.

> but I suppose since the
> GHCB doesn't allow for CLGI or STGI it's "fine".

GHCB does not mention this, instead these are always intercepted in 
init_vmcb().

> E.g. shouldn't KVM do this?

It sure can and I am happy to include this into the series, the commit 
log is what I am struggling with :)

> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ca32389f3c36..4e4a49031efe 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3784,6 +3784,16 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>          if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
>                  return; /* IRET will cause a vm exit */
>   
> +       /*
> +        * KVM can't single-step SEV-ES guests and instead assumes that IRET
> +        * in the guest will always succeed, 

It relies on GHCB's NMI_COMPLETE (which SVM than handles is it was IRET):

         case SVM_VMGEXIT_NMI_COMPLETE:
                 ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
                 break;


> i.e. clears NMI masking on the
> +        * next VM-Exit.  Note, GIF is guaranteed to be '1' for SEV-ES guests
> +        * as the GHCB doesn't allow for CLGI or STGI (and KVM suppresses
> +        * EFER.SVME for good measure, see efer_trap()).

SVM KVM seems to not enforce EFER.SVME, the guest does what it wants and 
KVM is only told the new value via EFER_WRITE_TRAP. And "writes by 
SEV-ES guests to EFER.SVME are always ignored by hardware" says the APM. 
I must be missing the point here...


> +        */
> +       if (sev_es_guest(vcpu->kvm))
> +               return;
> +
>          if (!gif_set(svm)) {
>                  if (vgif)
>                          svm_set_intercept(svm, INTERCEPT_STGI);

-- 
Alexey

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-05-30  8:57             ` Alexey Kardashevskiy
@ 2023-06-01 23:31               ` Alexey Kardashevskiy
  2023-06-13 23:19                 ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-06-01 23:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao

Sean, ping?

I wonder if this sev-es-not-singlestepping is a showstopper or it is 
alright to repost this patchset without it? Thanks,


On 30/5/23 18:57, Alexey Kardashevskiy wrote:
> 
> 
> On 27/5/23 00:39, Sean Christopherson wrote:
>> On Fri, May 26, 2023, Alexey Kardashevskiy wrote:
>>>
>>> On 24/5/23 01:44, Sean Christopherson wrote:
>>>> On Tue, May 23, 2023, Alexey Kardashevskiy wrote:
>>>>>> Actually, can't disabling #DB interception for DebugSwap SEV-ES 
>>>>>> guests be a
>>>>>> separate patch?  KVM can still inject #DBs for SEV-ES guests, no?
>>>>>
>>>>> Sorry for my ignorance but what is the point of injecting #DB if 
>>>>> there is no
>>>>> way of changing the guest's DR7?
>>>>
>>>> Well, _injecting_ the #DB is necessary for correctness from the 
>>>> guest's perspective.
>>>> "What's the point of _intercepting_ #DB" is the real question.  And 
>>>> for SEV-ES guests
>>>> with DebugSwap, there is no point, which is why I agree that KVM 
>>>> should disable
>>>> interception in that case.  What I'm calling out is that disabling 
>>>> #Db interception
>>>> isn't _necessary_ for correctness (unless I'm missing something), 
>>>> which means that
>>>> it can and should go in a separate patch.
>>>
>>>
>>> About this. Instead of sev_es_init_vmcb(), I can toggle the #DB 
>>> intercept
>>> when toggling guest_debug, see below. This
>>> kvm_x86_ops::update_exception_bitmap hook is called on vcpu reset and
>>> kvm_arch_vcpu_ioctl_set_guest_debug (which skips this call if
>>> guest_state_protected = true).
>>
>> KVM also intercepts #DB when single-stepping over IRET to find an NMI 
>> window, so
>> you'd also have to factor in nmi_singlestep, and update 
>> svm_enable_nmi_window()
>> and disable_nmi_singlestep() to call svm_update_exception_bitmap().
> 
> Uff. New can of worms for me :-/
> 
> 
>>> Is there any downside?
>>
>> Complexity is the main one.  The complexity is quite low, but the 
>> benefit to the
>> guest is likely even lower.  A #DB in the guest isn't likely to be 
>> performance
>> sensitive.  And on the flip side, opening an NMI window would be a 
>> tiny bit more
>> expensive, though I doubt that would be meaningful either.
>>
>> All in all, I think it makes sense to just keep intercepting #DB for 
>> non-SEV-ES
>> guests.
>>
>> Side topic, isn't there an existing bug regarding SEV-ES NMI windows?  
>> KVM can't
>> actually single-step an SEV-ES guest, but tries to set RFLAGS.TF anyways. 
> 
> Why is it a "bug" and what does the patch fix? Sound to me as it is 
> pointless and the guest won't do single stepping and instead will run 
> till it exits somehow, what do I miss?
> 
>> Blech,
>> and suppressing EFER.SVME in efer_trap() is a bit gross,
> 
> Why suppressed? svm_set_efer() sets it eventually anyway.
> 
>> but I suppose since the
>> GHCB doesn't allow for CLGI or STGI it's "fine".
> 
> GHCB does not mention this, instead these are always intercepted in 
> init_vmcb().
> 
>> E.g. shouldn't KVM do this?
> 
> It sure can and I am happy to include this into the series, the commit 
> log is what I am struggling with :)
> 
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index ca32389f3c36..4e4a49031efe 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3784,6 +3784,16 @@ static void svm_enable_nmi_window(struct 
>> kvm_vcpu *vcpu)
>>          if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
>>                  return; /* IRET will cause a vm exit */
>> +       /*
>> +        * KVM can't single-step SEV-ES guests and instead assumes 
>> that IRET
>> +        * in the guest will always succeed, 
> 
> It relies on GHCB's NMI_COMPLETE (which SVM than handles is it was IRET):
> 
>          case SVM_VMGEXIT_NMI_COMPLETE:
>                  ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
>                  break;
> 
> 
>> i.e. clears NMI masking on the
>> +        * next VM-Exit.  Note, GIF is guaranteed to be '1' for SEV-ES 
>> guests
>> +        * as the GHCB doesn't allow for CLGI or STGI (and KVM suppresses
>> +        * EFER.SVME for good measure, see efer_trap()).
> 
> SVM KVM seems to not enforce EFER.SVME, the guest does what it wants and 
> KVM is only told the new value via EFER_WRITE_TRAP. And "writes by 
> SEV-ES guests to EFER.SVME are always ignored by hardware" says the APM. 
> I must be missing the point here...
> 
> 
>> +        */
>> +       if (sev_es_guest(vcpu->kvm))
>> +               return;
>> +
>>          if (!gif_set(svm)) {
>>                  if (vgif)
>>                          svm_set_intercept(svm, INTERCEPT_STGI);
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-06-01 23:31               ` Alexey Kardashevskiy
@ 2023-06-13 23:19                 ` Sean Christopherson
  2023-06-14  3:58                   ` Alexey Kardashevskiy
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2023-06-13 23:19 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=3Diso-8859-1, Size: 4882 bytes --]

On Fri, Jun 02, 2023, Alexey Kardashevskiy wrote:
> Sean, ping?
>=20
> I wonder if this sev-es-not-singlestepping is a showstopper or it is alri=
ght
> to repost this patchset without it? Thanks,

Ah, shoot, I completely lost this in my inbox.  Sorry :-/

> > > Side topic, isn't there an existing bug regarding SEV-ES NMI windows?
> > > KVM can't actually single-step an SEV-ES guest, but tries to set
> > > RFLAGS.TF anyways.
> >=20
> > Why is it a "bug" and what does the patch fix? Sound to me as it is
> > pointless and the guest won't do single stepping and instead will run
> > till it exits somehow, what do I miss?

The bug is benign in the end, but it's still a bug.  I'm not worried about =
fixing
any behavior, but I dislike having dead, misleading code, especially for so=
mething
like this where both NMI virtualization and SEV-ES are already crazy comple=
x and
subtle.  I think it's safe to say that I've spent more time digging through=
 SEV-ES
and NMI virtualization than most KVM developers, and as evidenced by the nu=
mber of
things I got wrong below, I'm still struggling to keep track of the bigger =
picture.
Developers that are new to all of this need as much help as they can get.

> > > Blech, and suppressing EFER.SVME in efer_trap() is a bit gross,
> >=20
> > Why suppressed? svm_set_efer() sets it eventually anyway.

svm_set_efer() sets SVME in hardware, but KVM's view of the guest's value t=
hat's
stored in vcpu->arch.efer doesn't have SVME set.  E.g. from the guest's per=
spective,
EFER.SVME will have "Reserved Read As Zero" semantics.

> > > but I suppose since the GHCB doesn't allow for CLGI or STGI it's "fin=
e".
> >=20
> > GHCB does not mention this, instead these are always intercepted in
> > init_vmcb().

Right, I'm calling out that the absense of protocol support for requesting =
CLGI
or STGI emulation means dropping the guest's EFER.SVME is ok (though gross =
:-) ).

> > > E.g. shouldn't KVM do this?
> >=20
> > It sure can and I am happy to include this into the series, the commit
> > log is what I am struggling with :)
> >=20
> > >=20
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index ca32389f3c36..4e4a49031efe 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -3784,6 +3784,16 @@ static void svm_enable_nmi_window(struct
> > > kvm_vcpu *vcpu)
> > > =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
=BD if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
> > > =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=
 return; /* IRET will cause a vm exit */
> > > +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD /*
> > > +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * KV=
M can't single-step SEV-ES guests and instead assumes
> > > that IRET
> > > +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * in=
 the guest will always succeed,
> >=20
> > It relies on GHCB's NMI_COMPLETE (which SVM than handles is it was IRET=
):
> >=20
> >  =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD case S=
VM_VMGEXIT_NMI_COMPLETE:
> >  =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD ret =3D =
svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
> >  =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD break;

Ah, right, better to say that the guest is responsible for signaling that i=
t's
ready to accept NMIs, which KVM handles by "emulating" IRET.

> > > i.e. clears NMI masking on the
> > > +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * ne=
xt VM-Exit.=EF=BF=BD Note, GIF is guaranteed to be '1' for
> > > SEV-ES guests
> > > +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * as=
 the GHCB doesn't allow for CLGI or STGI (and KVM suppresses
> > > +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * EF=
ER.SVME for good measure, see efer_trap()).
> >=20
> > SVM KVM seems to not enforce EFER.SVME, the guest does what it wants an=
d
> > KVM is only told the new value via EFER_WRITE_TRAP. And "writes by
> > SEV-ES guests to EFER.SVME are always ignored by hardware" says the APM=
.

Ahhh, that blurb in the APM is what I'm missing.

Actually, there's a real bug here.  KVM doesn't immediately unmask NMIs in =
response
to NMI_COMPLETE, and instead goes through the whole awaiting_iret_completio=
n =3D>
svm_complete_interrupts(), which means that KVM doesn't unmask NMIs until t=
he
*next* VM-Exit.  Theoretically, that could be never, e.g. if the host is ti=
ckless
and the guest is configured to busy wait idle CPUs.

Attached patches are compile tested only.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 3D"0001-KVM-SVM-Don-t-defer-NMI-unblocking-until-next-exit-f.patc= --]
[-- Type: text/x-diff; charset=3Dus-ascii, Size: 3010 bytes --]

From eb126f1c02b418df0b5dce9e3cdbd984fc4b0611 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Tue, 13 Jun 2023 16:08:18 -0700
Subject: [PATCH 1/2] KVM: SVM: Don't defer NMI unblocking until next exit f=
or
 SEV-ES guests

Immediately mark NMIs as unmasked in response to #VMGEXIT(NMI complete)
instead of setting awaiting_iret_completion and waiting until the *next*
VM-Exit to unmask NMIs.  The whole point of "NMI complete" is that the
guest is responsible for telling the hypervisor when it's safe to inject
an NMI, i.e. there's no need to wait.  And because there's no IRET to
single-step, the next VM-Exit could be a long time coming, i.e. KVM could
incorrectly hold an NMI pending for far longer than what is required and
expected.

Opportunistically fix a stale reference to HF_IRET_MASK.

Fixes: 4444dfe4050b ("KVM: SVM: Add NMI support for an SEV-ES guest")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c |  5 ++++-
 arch/x86/kvm/svm/svm.c | 10 +++++-----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d65578d8784d..9a0e74cb6cb9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2887,7 +2887,10 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 					    svm->sev_es.ghcb_sa);
 		break;
 	case SVM_VMGEXIT_NMI_COMPLETE:
-		ret =3D svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
+		++vcpu->stat.nmi_window_exits;
+		svm->nmi_masked =3D false;
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		ret =3D 1;
 		break;
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 		ret =3D kvm_emulate_ap_reset_hold(vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b29d0650582e..b284706edde2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2508,12 +2508,13 @@ static int iret_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm =3D to_svm(vcpu);
=20
+	WARN_ON_ONCE(sev_es_guest(vcpu->kvm));
+
 	++vcpu->stat.nmi_window_exits;
 	svm->awaiting_iret_completion =3D true;
=20
 	svm_clr_iret_intercept(svm);
-	if (!sev_es_guest(vcpu->kvm))
-		svm->nmi_iret_rip =3D kvm_rip_read(vcpu);
+	svm->nmi_iret_rip =3D kvm_rip_read(vcpu);
=20
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 	return 1;
@@ -3916,12 +3917,11 @@ static void svm_complete_interrupts(struct kvm_vcpu=
 *vcpu)
 	svm->soft_int_injected =3D false;
=20
 	/*
-	 * If we've made progress since setting HF_IRET_MASK, we've
+	 * If we've made progress since setting awaiting_iret_completion, we've
 	 * executed an IRET and can allow NMI injection.
 	 */
 	if (svm->awaiting_iret_completion &&
-	    (sev_es_guest(vcpu->kvm) ||
-	     kvm_rip_read(vcpu) !=3D svm->nmi_iret_rip)) {
+	    kvm_rip_read(vcpu) !=3D svm->nmi_iret_rip) {
 		svm->awaiting_iret_completion =3D false;
 		svm->nmi_masked =3D false;
 		kvm_make_request(KVM_REQ_EVENT, vcpu);

base-commit: 5e74470e279654d9fa8742184c8c89837b899078
--=20
2.41.0.162.gfafddb0af9-goog


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 3D"0002-KVM-SVM-Don-t-try-to-pointlessly-single-step-SEV-ES-.patc= --]
[-- Type: text/x-diff; charset=3Dus-ascii, Size: 1831 bytes --]

From fe7634942b49a243ec42ca1aaa8b9354c126b2a3 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Tue, 13 Jun 2023 15:50:44 -0700
Subject: [PATCH 2/2] KVM: SVM: Don't try to pointlessly single-step SEV-ES
 guests for NMI window

Bail early from svm_enable_nmi_window() for SEV-ES guests without trying
to enable single-step of the guest, as single-stepping an SEV-ES guest is
impossible and the guest is responsible for *telling* KVM when it is ready
for an new NMI to be injected.

Functionally, setting TF and RF in svm->vmcb->save.rflags is benign as the
field is ignored by hardware, but it's all kinds of confusing.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b284706edde2..06d50c9c1e48 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3768,6 +3768,20 @@ static void svm_enable_nmi_window(struct kvm_vcpu *v=
cpu)
 	if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
 		return; /* IRET will cause a vm exit */
=20
+	/*
+	 * SEV-ES guests are responsible for signaling when a vCPU is ready to
+	 * receive a new NMI, as SEV-ES guests can't be single-stepped, i.e.
+	 * KVM can't intercept and single-step IRET to detect when NMIs are
+	 * unblocked (architecturally speaking).  See SVM_VMGEXIT_NMI_COMPLETE.
+	 *
+	 * Note, GIF is guaranteed to be '1' for SEV-ES guests as hardware
+	 * ignores SEV-ES guest writes to EFER.SVME, KVM suppresses EFER.SVME
+	 * (see efer_trap()), *and* CLGI/STGI are not supported NAEs in the
+	 * GHCB protocol.
+	 */
+	if (sev_es_guest(vcpu->kvm))
+		return;
+
 	if (!gif_set(svm)) {
 		if (vgif)
 			svm_set_intercept(svm, INTERCEPT_STGI);
--=20
2.41.0.162.gfafddb0af9-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-06-13 23:19                 ` Sean Christopherson
@ 2023-06-14  3:58                   ` Alexey Kardashevskiy
  2023-06-14 21:27                     ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Alexey Kardashevskiy @ 2023-06-14  3:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao

On 14/6/23 09:19, Sean Christopherson wrote:
> On Fri, Jun 02, 2023, Alexey Kardashevskiy wrote:
>> Sean, ping?
>> =20
>> I wonder if this sev-es-not-singlestepping is a showstopper or it is alri=
> ght
>> to repost this patchset without it? Thanks,
> 
> Ah, shoot, I completely lost this in my inbox.  Sorry :-/

I saw the "OOO" message the other day and relaxed :)


>>>> Side topic, isn't there an existing bug regarding SEV-ES NMI windows?
>>>> KVM can't actually single-step an SEV-ES guest, but tries to set
>>>> RFLAGS.TF anyways.
>>> =20
>>> Why is it a "bug" and what does the patch fix? Sound to me as it is
>>> pointless and the guest won't do single stepping and instead will run
>>> till it exits somehow, what do I miss?
> 
> The bug is benign in the end, but it's still a bug.  I'm not worried about =


(unrelated) Your response's encoding broke somehow and I wonder if this 
is something I did or you did. Lore got it too:

https://lore.kernel.org/all/ZIj5ms+DohcLyXHE@google.com/


> fixing
> any behavior, but I dislike having dead, misleading code, especially for so=
> mething
> like this where both NMI virtualization and SEV-ES are already crazy comple=
> x and
> subtle.  I think it's safe to say that I've spent more time digging through=
>   SEV-ES
> and NMI virtualization than most KVM developers, and as evidenced by the nu=
> mber of
> things I got wrong below, I'm still struggling to keep track of the bigger =
> picture.
> Developers that are new to all of this need as much help as they can get.
> 
>>>> Blech, and suppressing EFER.SVME in efer_trap() is a bit gross,
>>> =20
>>> Why suppressed? svm_set_efer() sets it eventually anyway.
> 
> svm_set_efer() sets SVME in hardware, but KVM's view of the guest's value t=
> hat's
> stored in vcpu->arch.efer doesn't have SVME set.  E.g. from the guest's per=
> spective,
> EFER.SVME will have "Reserved Read As Zero" semantics.

It is not zero, why? From inside the guest, rdmsrl(MSR_EFER, efer) reads 
0x1d01 from that msr where 0x1000==(1<<_EFER_SVME),  _EFER_SVME==12.


> 
>>>> but I suppose since the GHCB doesn't allow for CLGI or STGI it's "fin=
> e".
>>> =20
>>> GHCB does not mention this, instead these are always intercepted in
>>> init_vmcb().
> 
> Right, I'm calling out that the absense of protocol support for requesting =
> CLGI
> or STGI emulation means dropping the guest's EFER.SVME is ok (though gross =
> :-) ).
> 
>>>> E.g. shouldn't KVM do this?
>>> =20
>>> It sure can and I am happy to include this into the series, the commit
>>> log is what I am struggling with :)
>>> =20
>>>> =20
>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>> index ca32389f3c36..4e4a49031efe 100644
>>>> --- a/arch/x86/kvm/svm/svm.c
>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>> @@ -3784,6 +3784,16 @@ static void svm_enable_nmi_window(struct
>>>> kvm_vcpu *vcpu)
>>>> =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
> =BD if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
>>>> =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
> =BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=
>   return; /* IRET will cause a vm exit */
>>>> +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD /*
>>>> +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * KV=
> M can't single-step SEV-ES guests and instead assumes
>>>> that IRET
>>>> +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * in=
>   the guest will always succeed,
>>> =20
>>> It relies on GHCB's NMI_COMPLETE (which SVM than handles is it was IRET=
> ):
>>> =20
>>>   =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD case S=
> VM_VMGEXIT_NMI_COMPLETE:
>>>   =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
> =BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD ret =3D =
> svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
>>>   =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
> =BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD break;
> 
> Ah, right, better to say that the guest is responsible for signaling that i=
> t's
> ready to accept NMIs, which KVM handles by "emulating" IRET.
> 
>>>> i.e. clears NMI masking on the
>>>> +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * ne=
> xt VM-Exit.=EF=BF=BD Note, GIF is guaranteed to be '1' for
>>>> SEV-ES guests
>>>> +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * as=
>   the GHCB doesn't allow for CLGI or STGI (and KVM suppresses
>>>> +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * EF=
> ER.SVME for good measure, see efer_trap()).
>>> =20
>>> SVM KVM seems to not enforce EFER.SVME, the guest does what it wants an=
> d
>>> KVM is only told the new value via EFER_WRITE_TRAP. And "writes by
>>> SEV-ES guests to EFER.SVME are always ignored by hardware" says the APM=
> .
> 
> Ahhh, that blurb in the APM is what I'm missing.
> 
> Actually, there's a real bug here.  KVM doesn't immediately unmask NMIs in =
> response
> to NMI_COMPLETE, and instead goes through the whole awaiting_iret_completio=
> n =3D>
> svm_complete_interrupts(), which means that KVM doesn't unmask NMIs until t=
> he
> *next* VM-Exit.  Theoretically, that could be never, e.g. if the host is ti=
> ckless
> and the guest is configured to busy wait idle CPUs.
> 
> Attached patches are compile tested only.

Well, NMIs still get injected from QEMU so I guess it is a pass? Thanks,

-- 
Alexey


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES
  2023-06-14  3:58                   ` Alexey Kardashevskiy
@ 2023-06-14 21:27                     ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2023-06-14 21:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, x86, linux-kernel, Tom Lendacky, Pankaj Gupta,
	Nikunj A Dadhania, Santosh Shukla, Carlos Bilbao

On Wed, Jun 14, 2023, Alexey Kardashevskiy wrote:
> On 14/6/23 09:19, Sean Christopherson wrote:
> > On Fri, Jun 02, 2023, Alexey Kardashevskiy wrote:
> > > > > Side topic, isn't there an existing bug regarding SEV-ES NMI windows?
> > > > > KVM can't actually single-step an SEV-ES guest, but tries to set
> > > > > RFLAGS.TF anyways.
> > > > =20
> > > > Why is it a "bug" and what does the patch fix? Sound to me as it is
> > > > pointless and the guest won't do single stepping and instead will run
> > > > till it exits somehow, what do I miss?
> > 
> > The bug is benign in the end, but it's still a bug.  I'm not worried about =
> 
> 
> (unrelated) Your response's encoding broke somehow and I wonder if this is
> something I did or you did. Lore got it too:
> 
> https://lore.kernel.org/all/ZIj5ms+DohcLyXHE@google.com/

Huh.  Guessing something I did, but I've no idea what caused it.

> > fixing
> > any behavior, but I dislike having dead, misleading code, especially for so=
> > mething
> > like this where both NMI virtualization and SEV-ES are already crazy comple=
> > x and
> > subtle.  I think it's safe to say that I've spent more time digging through=
> >   SEV-ES
> > and NMI virtualization than most KVM developers, and as evidenced by the nu=
> > mber of
> > things I got wrong below, I'm still struggling to keep track of the bigger =
> > picture.
> > Developers that are new to all of this need as much help as they can get.
> > 
> > > > > Blech, and suppressing EFER.SVME in efer_trap() is a bit gross,
> > > > =20
> > > > Why suppressed? svm_set_efer() sets it eventually anyway.
> > 
> > svm_set_efer() sets SVME in hardware, but KVM's view of the guest's value t=
> > hat's
> > stored in vcpu->arch.efer doesn't have SVME set.  E.g. from the guest's per=
> > spective,
> > EFER.SVME will have "Reserved Read As Zero" semantics.
> 
> It is not zero, why? From inside the guest, rdmsrl(MSR_EFER, efer) reads
> 0x1d01 from that msr where 0x1000==(1<<_EFER_SVME),  _EFER_SVME==12.

Oh, lame.  So the guest gets to see the raw value in the VMSA.  So it really comes
down to the GHCB not providing support for STGI/CLGI.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-06-14 21:28 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-11 12:57 [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap Alexey Kardashevskiy
2023-04-11 12:57 ` [PATCH kernel v5 1/6] KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header Alexey Kardashevskiy
2023-04-11 12:57 ` [PATCH kernel v5 2/6] KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV Alexey Kardashevskiy
2023-04-11 12:57 ` [PATCH kernel v5 3/6] KVM: SEV-ES: explicitly disable debug Alexey Kardashevskiy
2023-05-22 22:50   ` Sean Christopherson
2023-04-11 12:57 ` [PATCH kernel v5 4/6] KVM: SVM/SEV/SEV-ES: Rework intercepts Alexey Kardashevskiy
2023-05-22 22:53   ` Sean Christopherson
2023-04-11 12:57 ` [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES Alexey Kardashevskiy
2023-05-09 10:58   ` Gupta, Pankaj
2023-05-10  9:35     ` Gupta, Pankaj
2023-05-22 23:39   ` Sean Christopherson
2023-05-23 11:33     ` Alexey Kardashevskiy
2023-05-23 15:44       ` Sean Christopherson
2023-05-26  3:16         ` Alexey Kardashevskiy
2023-05-26 14:39           ` Sean Christopherson
2023-05-30  8:57             ` Alexey Kardashevskiy
2023-06-01 23:31               ` Alexey Kardashevskiy
2023-06-13 23:19                 ` Sean Christopherson
2023-06-14  3:58                   ` Alexey Kardashevskiy
2023-06-14 21:27                     ` Sean Christopherson
2023-04-11 12:57 ` [PATCH kernel v5 6/6] x86/sev: Do not handle #VC for DR7 read/write Alexey Kardashevskiy
2023-05-22 23:44   ` Sean Christopherson
2023-05-24  6:36     ` Alexey Kardashevskiy
2023-04-20  1:49 ` [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap Alexey Kardashevskiy
2023-04-20 14:32   ` Sean Christopherson
2023-05-19  0:19     ` Alexey Kardashevskiy
2023-05-19 15:28       ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).