linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/6] KVM: SVM: Add initial GHCB protocol version 2 support
@ 2021-10-20 12:44 Joerg Roedel
  2021-10-20 12:44 ` [PATCH v5 1/6] KVM: SVM: Get rid of set_ghcb_msr() and *ghcb_msr_bits() functions Joerg Roedel
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Joerg Roedel @ 2021-10-20 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, x86, Brijesh Singh, Tom Lendacky, kvm,
	linux-kernel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Hi,

here is a small set of patches which I took from the pending SEV-SNP
patch-sets to enable basic support for GHCB protocol version 2.

When SEV-SNP is not supported, only two new MSR protocol VMGEXIT calls
are required:

	- MSR-based AP-reset-hold
	- MSR-based HV-feature-request

These calls are implemented here and then the protocol is lifted to
version 2.

This is submitted separately because the MSR-based AP-reset-hold call
is required to support kexec/kdump in SEV-ES guests.

The previous version can be found here:

	https://lore.kernel.org/lkml/20210929155330.5597-1-joro@8bytes.org/

Regards,

	Joerg

Changes v4->v5:

	- Removed stable SoB from patch 1
	- Moved kvm_emulate_ap_reset_hold() and all related code in
	  later patches to SVM specific code, as suggested by Sean.

Brijesh Singh (2):
  KVM: SVM: Add support for Hypervisor Feature support MSR protocol
  KVM: SVM: Increase supported GHCB protocol version

Joerg Roedel (2):
  KVM: SVM: Get rid of set_ghcb_msr() and *ghcb_msr_bits() functions
  KVM: SVM: Move kvm_emulate_ap_reset_hold() to AMD specific code

Sean Christopherson (1):
  KVM: SVM: Add helper to generate GHCB MSR verson info, and drop macro

Tom Lendacky (1):
  KVM: SVM: Add support to handle AP reset MSR protocol

 arch/x86/include/asm/kvm_host.h   |   3 +-
 arch/x86/include/asm/sev-common.h |  14 +--
 arch/x86/include/uapi/asm/svm.h   |   1 +
 arch/x86/kvm/svm/sev.c            | 141 +++++++++++++++++++++---------
 arch/x86/kvm/svm/svm.h            |  13 +--
 arch/x86/kvm/x86.c                |  11 +--
 6 files changed, 120 insertions(+), 63 deletions(-)


base-commit: 73f122c4f06f650ddf7f7410d8510db1a92a16a0
-- 
2.33.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 1/6] KVM: SVM: Get rid of set_ghcb_msr() and *ghcb_msr_bits() functions
  2021-10-20 12:44 [PATCH v5 0/6] KVM: SVM: Add initial GHCB protocol version 2 support Joerg Roedel
@ 2021-10-20 12:44 ` Joerg Roedel
  2021-10-20 12:44 ` [PATCH v5 2/6] KVM: SVM: Add helper to generate GHCB MSR verson info, and drop macro Joerg Roedel
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2021-10-20 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, x86, Brijesh Singh, Tom Lendacky, kvm,
	linux-kernel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Replace the get_ghcb_msr_bits() function with macros and open code the
GHCB MSR setters with hypercall specific helper macros and functions.
This will avoid preserving any previous bits in the GHCB-MSR and
improves code readability.

Also get rid of the set_ghcb_msr() function and open-code it at its
call-sites for better code readability.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/sev-common.h |  9 +++++
 arch/x86/kvm/svm/sev.c            | 56 +++++++++++--------------------
 2 files changed, 29 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 2cef6c5a52c2..8540972cad04 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -50,6 +50,10 @@
 		(GHCB_MSR_CPUID_REQ | \
 		(((unsigned long)reg & GHCB_MSR_CPUID_REG_MASK) << GHCB_MSR_CPUID_REG_POS) | \
 		(((unsigned long)fn) << GHCB_MSR_CPUID_FUNC_POS))
+#define GHCB_MSR_CPUID_FN(msr)		\
+	(((msr) >> GHCB_MSR_CPUID_FUNC_POS) & GHCB_MSR_CPUID_FUNC_MASK)
+#define GHCB_MSR_CPUID_REG(msr)		\
+	(((msr) >> GHCB_MSR_CPUID_REG_POS) & GHCB_MSR_CPUID_REG_MASK)
 
 /* AP Reset Hold */
 #define GHCB_MSR_AP_RESET_HOLD_REQ		0x006
@@ -67,6 +71,11 @@
 #define GHCB_SEV_TERM_REASON(reason_set, reason_val)						  \
 	(((((u64)reason_set) &  GHCB_MSR_TERM_REASON_SET_MASK) << GHCB_MSR_TERM_REASON_SET_POS) | \
 	((((u64)reason_val) & GHCB_MSR_TERM_REASON_MASK) << GHCB_MSR_TERM_REASON_POS))
+#define GHCB_MSR_TERM_REASON_SET(msr)	\
+	(((msr) >> GHCB_MSR_TERM_REASON_SET_POS) & GHCB_MSR_TERM_REASON_SET_MASK)
+#define GHCB_MSR_TERM_REASON(msr)	\
+	(((msr) >> GHCB_MSR_TERM_REASON_POS) & GHCB_MSR_TERM_REASON_MASK)
+
 
 #define GHCB_SEV_ES_REASON_GENERAL_REQUEST	0
 #define GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED	1
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3e2769855e51..2e3ecab6f1c8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2378,21 +2378,15 @@ static bool setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
 	return true;
 }
 
-static void set_ghcb_msr_bits(struct vcpu_svm *svm, u64 value, u64 mask,
-			      unsigned int pos)
+static u64 ghcb_msr_cpuid_resp(u64 reg, u64 value)
 {
-	svm->vmcb->control.ghcb_gpa &= ~(mask << pos);
-	svm->vmcb->control.ghcb_gpa |= (value & mask) << pos;
-}
+	u64 msr;
 
-static u64 get_ghcb_msr_bits(struct vcpu_svm *svm, u64 mask, unsigned int pos)
-{
-	return (svm->vmcb->control.ghcb_gpa >> pos) & mask;
-}
+	msr  = GHCB_MSR_CPUID_RESP;
+	msr |= (reg & GHCB_MSR_CPUID_REG_MASK) << GHCB_MSR_CPUID_REG_POS;
+	msr |= (value & GHCB_MSR_CPUID_VALUE_MASK) << GHCB_MSR_CPUID_VALUE_POS;
 
-static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
-{
-	svm->vmcb->control.ghcb_gpa = value;
+	return msr;
 }
 
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
@@ -2409,16 +2403,14 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 
 	switch (ghcb_info) {
 	case GHCB_MSR_SEV_INFO_REQ:
-		set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
-						    GHCB_VERSION_MIN,
-						    sev_enc_bit));
+		svm->vmcb->control.ghcb_gpa = GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
+								GHCB_VERSION_MIN,
+								sev_enc_bit);
 		break;
 	case GHCB_MSR_CPUID_REQ: {
 		u64 cpuid_fn, cpuid_reg, cpuid_value;
 
-		cpuid_fn = get_ghcb_msr_bits(svm,
-					     GHCB_MSR_CPUID_FUNC_MASK,
-					     GHCB_MSR_CPUID_FUNC_POS);
+		cpuid_fn = GHCB_MSR_CPUID_FN(control->ghcb_gpa);
 
 		/* Initialize the registers needed by the CPUID intercept */
 		vcpu->arch.regs[VCPU_REGS_RAX] = cpuid_fn;
@@ -2430,9 +2422,8 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 			break;
 		}
 
-		cpuid_reg = get_ghcb_msr_bits(svm,
-					      GHCB_MSR_CPUID_REG_MASK,
-					      GHCB_MSR_CPUID_REG_POS);
+		cpuid_reg = GHCB_MSR_CPUID_REG(control->ghcb_gpa);
+
 		if (cpuid_reg == 0)
 			cpuid_value = vcpu->arch.regs[VCPU_REGS_RAX];
 		else if (cpuid_reg == 1)
@@ -2442,26 +2433,19 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 		else
 			cpuid_value = vcpu->arch.regs[VCPU_REGS_RDX];
 
-		set_ghcb_msr_bits(svm, cpuid_value,
-				  GHCB_MSR_CPUID_VALUE_MASK,
-				  GHCB_MSR_CPUID_VALUE_POS);
+		svm->vmcb->control.ghcb_gpa = ghcb_msr_cpuid_resp(cpuid_reg, cpuid_value);
 
-		set_ghcb_msr_bits(svm, GHCB_MSR_CPUID_RESP,
-				  GHCB_MSR_INFO_MASK,
-				  GHCB_MSR_INFO_POS);
 		break;
 	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
-		reason_set = get_ghcb_msr_bits(svm,
-					       GHCB_MSR_TERM_REASON_SET_MASK,
-					       GHCB_MSR_TERM_REASON_SET_POS);
-		reason_code = get_ghcb_msr_bits(svm,
-						GHCB_MSR_TERM_REASON_MASK,
-						GHCB_MSR_TERM_REASON_POS);
+		reason_set  = GHCB_MSR_TERM_REASON_SET(control->ghcb_gpa);
+		reason_code = GHCB_MSR_TERM_REASON(control->ghcb_gpa);
+
 		pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
 			reason_set, reason_code);
+
 		fallthrough;
 	}
 	default:
@@ -2637,9 +2621,9 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
 	 * Set the GHCB MSR value as per the GHCB specification when emulating
 	 * vCPU RESET for an SEV-ES guest.
 	 */
-	set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
-					    GHCB_VERSION_MIN,
-					    sev_enc_bit));
+	svm->vmcb->control.ghcb_gpa = GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
+							GHCB_VERSION_MIN,
+							sev_enc_bit);
 }
 
 void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 2/6] KVM: SVM: Add helper to generate GHCB MSR verson info, and drop macro
  2021-10-20 12:44 [PATCH v5 0/6] KVM: SVM: Add initial GHCB protocol version 2 support Joerg Roedel
  2021-10-20 12:44 ` [PATCH v5 1/6] KVM: SVM: Get rid of set_ghcb_msr() and *ghcb_msr_bits() functions Joerg Roedel
@ 2021-10-20 12:44 ` Joerg Roedel
  2021-10-20 12:44 ` [PATCH v5 3/6] KVM: SVM: Move kvm_emulate_ap_reset_hold() to AMD specific code Joerg Roedel
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2021-10-20 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, x86, Brijesh Singh, Tom Lendacky, kvm,
	linux-kernel, Joerg Roedel

From: Sean Christopherson <seanjc@google.com>

Convert the GHCB_MSR_SEV_INFO macro into a helper function, and have the
helper hardcode the min/max versions instead of relying on the caller to
do the same.  Under no circumstance should different pieces of KVM define
different min/max versions.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/sev-common.h |  5 -----
 arch/x86/kvm/svm/sev.c            | 24 ++++++++++++++++++------
 arch/x86/kvm/svm/svm.h            |  5 -----
 3 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 8540972cad04..886c36f0cb16 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -24,11 +24,6 @@
 #define GHCB_MSR_VER_MIN_MASK		0xffff
 #define GHCB_MSR_CBIT_POS		24
 #define GHCB_MSR_CBIT_MASK		0xff
-#define GHCB_MSR_SEV_INFO(_max, _min, _cbit)				\
-	((((_max) & GHCB_MSR_VER_MAX_MASK) << GHCB_MSR_VER_MAX_POS) |	\
-	 (((_min) & GHCB_MSR_VER_MIN_MASK) << GHCB_MSR_VER_MIN_POS) |	\
-	 (((_cbit) & GHCB_MSR_CBIT_MASK) << GHCB_MSR_CBIT_POS) |	\
-	 GHCB_MSR_SEV_INFO_RESP)
 #define GHCB_MSR_INFO(v)		((v) & 0xfffUL)
 #define GHCB_MSR_PROTO_MAX(v)		(((v) >> GHCB_MSR_VER_MAX_POS) & GHCB_MSR_VER_MAX_MASK)
 #define GHCB_MSR_PROTO_MIN(v)		(((v) >> GHCB_MSR_VER_MIN_POS) & GHCB_MSR_VER_MIN_MASK)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2e3ecab6f1c8..9fb4d8fad1f4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2389,6 +2389,22 @@ static u64 ghcb_msr_cpuid_resp(u64 reg, u64 value)
 	return msr;
 }
 
+/* The min/max GHCB version supported by KVM. */
+#define GHCB_VERSION_MAX	1ULL
+#define GHCB_VERSION_MIN	1ULL
+
+static u64 ghcb_msr_version_info(void)
+{
+	u64 msr;
+
+	msr  = GHCB_MSR_SEV_INFO_RESP;
+	msr |= GHCB_VERSION_MAX << GHCB_MSR_VER_MAX_POS;
+	msr |= GHCB_VERSION_MIN << GHCB_MSR_VER_MIN_POS;
+	msr |= sev_enc_bit << GHCB_MSR_CBIT_POS;
+
+	return msr;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -2403,9 +2419,7 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 
 	switch (ghcb_info) {
 	case GHCB_MSR_SEV_INFO_REQ:
-		svm->vmcb->control.ghcb_gpa = GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
-								GHCB_VERSION_MIN,
-								sev_enc_bit);
+		control->ghcb_gpa = ghcb_msr_version_info();
 		break;
 	case GHCB_MSR_CPUID_REQ: {
 		u64 cpuid_fn, cpuid_reg, cpuid_value;
@@ -2621,9 +2635,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
 	 * Set the GHCB MSR value as per the GHCB specification when emulating
 	 * vCPU RESET for an SEV-ES guest.
 	 */
-	svm->vmcb->control.ghcb_gpa = GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
-							GHCB_VERSION_MIN,
-							sev_enc_bit);
+	svm->vmcb->control.ghcb_gpa = ghcb_msr_version_info();
 }
 
 void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0d7bbe548ac3..68e5f16a0554 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -544,11 +544,6 @@ void svm_vcpu_blocking(struct kvm_vcpu *vcpu);
 void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 
 /* sev.c */
-
-#define GHCB_VERSION_MAX	1ULL
-#define GHCB_VERSION_MIN	1ULL
-
-
 extern unsigned int max_sev_asid;
 
 void sev_vm_destroy(struct kvm *kvm);
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 3/6] KVM: SVM: Move kvm_emulate_ap_reset_hold() to AMD specific code
  2021-10-20 12:44 [PATCH v5 0/6] KVM: SVM: Add initial GHCB protocol version 2 support Joerg Roedel
  2021-10-20 12:44 ` [PATCH v5 1/6] KVM: SVM: Get rid of set_ghcb_msr() and *ghcb_msr_bits() functions Joerg Roedel
  2021-10-20 12:44 ` [PATCH v5 2/6] KVM: SVM: Add helper to generate GHCB MSR verson info, and drop macro Joerg Roedel
@ 2021-10-20 12:44 ` Joerg Roedel
  2021-10-20 14:29   ` Sean Christopherson
  2021-10-20 12:44 ` [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol Joerg Roedel
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Joerg Roedel @ 2021-10-20 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, x86, Brijesh Singh, Tom Lendacky, kvm,
	linux-kernel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

The function is only used by the kvm-amd module. Move it to the AMD
specific part of the code and name it sev_emulate_ap_reset_hold().

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm/sev.c          | 10 +++++++++-
 arch/x86/kvm/x86.c              | 11 ++---------
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 80f4b8a9233c..b67f550616cf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1682,8 +1682,8 @@ int kvm_emulate_monitor(struct kvm_vcpu *vcpu);
 int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, unsigned short port, int in);
 int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
+int __kvm_vcpu_halt(struct kvm_vcpu *vcpu, int state, int reason);
 int kvm_vcpu_halt(struct kvm_vcpu *vcpu);
-int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
 int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9fb4d8fad1f4..9afa71cb36e6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2405,6 +2405,14 @@ static u64 ghcb_msr_version_info(void)
 	return msr;
 }
 
+static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm)
+{
+	int ret = kvm_skip_emulated_instruction(&svm->vcpu);
+
+	return __kvm_vcpu_halt(&svm->vcpu,
+			       KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -2536,7 +2544,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
 		break;
 	case SVM_VMGEXIT_AP_HLT_LOOP:
-		ret = kvm_emulate_ap_reset_hold(vcpu);
+		ret = sev_emulate_ap_reset_hold(svm);
 		break;
 	case SVM_VMGEXIT_AP_JUMP_TABLE: {
 		struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c59b63c56af9..cc3a65d6821d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8651,7 +8651,7 @@ void kvm_arch_exit(void)
 #endif
 }
 
-static int __kvm_vcpu_halt(struct kvm_vcpu *vcpu, int state, int reason)
+int __kvm_vcpu_halt(struct kvm_vcpu *vcpu, int state, int reason)
 {
 	++vcpu->stat.halt_exits;
 	if (lapic_in_kernel(vcpu)) {
@@ -8662,6 +8662,7 @@ static int __kvm_vcpu_halt(struct kvm_vcpu *vcpu, int state, int reason)
 		return 0;
 	}
 }
+EXPORT_SYMBOL_GPL(__kvm_vcpu_halt);
 
 int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 {
@@ -8680,14 +8681,6 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_halt);
 
-int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu)
-{
-	int ret = kvm_skip_emulated_instruction(vcpu);
-
-	return __kvm_vcpu_halt(vcpu, KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
-}
-EXPORT_SYMBOL_GPL(kvm_emulate_ap_reset_hold);
-
 #ifdef CONFIG_X86_64
 static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr,
 			        unsigned long clock_type)
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol
  2021-10-20 12:44 [PATCH v5 0/6] KVM: SVM: Add initial GHCB protocol version 2 support Joerg Roedel
                   ` (2 preceding siblings ...)
  2021-10-20 12:44 ` [PATCH v5 3/6] KVM: SVM: Move kvm_emulate_ap_reset_hold() to AMD specific code Joerg Roedel
@ 2021-10-20 12:44 ` Joerg Roedel
  2021-10-20 17:40   ` Sean Christopherson
  2021-10-20 12:44 ` [PATCH v5 5/6] KVM: SVM: Add support for Hypervisor Feature support " Joerg Roedel
  2021-10-20 12:44 ` [PATCH v5 6/6] KVM: SVM: Increase supported GHCB protocol version Joerg Roedel
  5 siblings, 1 reply; 12+ messages in thread
From: Joerg Roedel @ 2021-10-20 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, x86, Brijesh Singh, Tom Lendacky, kvm,
	linux-kernel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/svm/sev.c          | 52 ++++++++++++++++++++++++++-------
 arch/x86/kvm/svm/svm.h          |  8 +++++
 3 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b67f550616cf..5c6b1469cc3b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -237,7 +237,6 @@ enum x86_intercept_stage;
 	KVM_GUESTDBG_INJECT_DB | \
 	KVM_GUESTDBG_BLOCKIRQ)
 
-
 #define PFERR_PRESENT_BIT 0
 #define PFERR_WRITE_BIT 1
 #define PFERR_USER_BIT 2
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9afa71cb36e6..10af4ac83971 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2246,6 +2246,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 
 void sev_es_unmap_ghcb(struct vcpu_svm *svm)
 {
+	/* Clear any indication that the vCPU is in a type of AP Reset Hold */
+	svm->reset_hold_type = AP_RESET_HOLD_NONE;
+
 	if (!svm->ghcb)
 		return;
 
@@ -2405,14 +2408,21 @@ static u64 ghcb_msr_version_info(void)
 	return msr;
 }
 
-static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm)
+static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm, enum ap_reset_hold_type type)
 {
 	int ret = kvm_skip_emulated_instruction(&svm->vcpu);
 
+	svm->reset_hold_type = type;
+
 	return __kvm_vcpu_halt(&svm->vcpu,
 			       KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
 }
 
+static u64 ghcb_msr_ap_rst_resp(u64 value)
+{
+	return (u64)GHCB_MSR_AP_RESET_HOLD_RESP | (value << GHCB_DATA_LOW);
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -2459,6 +2469,16 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 
 		break;
 	}
+	case GHCB_MSR_AP_RESET_HOLD_REQ:
+		ret = sev_emulate_ap_reset_hold(svm, AP_RESET_HOLD_MSR_PROTO);
+
+		/*
+		 * Preset the result to a non-SIPI return and then only set
+		 * the result to non-zero when delivering a SIPI.
+		 */
+		svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(0);
+
+		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -2544,7 +2564,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
 		break;
 	case SVM_VMGEXIT_AP_HLT_LOOP:
-		ret = sev_emulate_ap_reset_hold(svm);
+		ret = sev_emulate_ap_reset_hold(svm, AP_RESET_HOLD_NAE_EVENT);
 		break;
 	case SVM_VMGEXIT_AP_JUMP_TABLE: {
 		struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
@@ -2679,13 +2699,23 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 		return;
 	}
 
-	/*
-	 * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
-	 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
-	 * non-zero value.
-	 */
-	if (!svm->ghcb)
-		return;
-
-	ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+	/* Subsequent SIPI */
+	switch (svm->reset_hold_type) {
+	case AP_RESET_HOLD_NAE_EVENT:
+		/*
+		 * Return from an AP Reset Hold VMGEXIT, where the guest will
+		 * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+		 */
+		ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+		break;
+	case AP_RESET_HOLD_MSR_PROTO:
+		/*
+		 * Return from an AP Reset Hold VMGEXIT, where the guest will
+		 * set the CS and RIP. Set GHCB data field to a non-zero value.
+		 */
+		svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(1);
+		break;
+	default:
+		break;
+	}
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 68e5f16a0554..bf9379f1cfb8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -69,6 +69,12 @@ enum {
 /* TPR and CR2 are always written before VMRUN */
 #define VMCB_ALWAYS_DIRTY_MASK	((1U << VMCB_INTR) | (1U << VMCB_CR2))
 
+enum ap_reset_hold_type {
+	AP_RESET_HOLD_NONE,
+	AP_RESET_HOLD_NAE_EVENT,
+	AP_RESET_HOLD_MSR_PROTO,
+};
+
 struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
 	bool es_active;		/* SEV-ES enabled guest */
@@ -199,6 +205,8 @@ struct vcpu_svm {
 	bool ghcb_sa_free;
 
 	bool guest_state_loaded;
+
+	enum ap_reset_hold_type reset_hold_type;
 };
 
 struct svm_cpu_data {
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 5/6] KVM: SVM: Add support for Hypervisor Feature support MSR protocol
  2021-10-20 12:44 [PATCH v5 0/6] KVM: SVM: Add initial GHCB protocol version 2 support Joerg Roedel
                   ` (3 preceding siblings ...)
  2021-10-20 12:44 ` [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol Joerg Roedel
@ 2021-10-20 12:44 ` Joerg Roedel
  2021-10-20 12:44 ` [PATCH v5 6/6] KVM: SVM: Increase supported GHCB protocol version Joerg Roedel
  5 siblings, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2021-10-20 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, x86, Brijesh Singh, Tom Lendacky, kvm,
	linux-kernel, Joerg Roedel

From: Brijesh Singh <brijesh.singh@amd.com>

Version 2 of the GHCB specification introduced advertisement of
supported Hypervisor SEV features. This request is required to support
a the GHCB version 2 protocol.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/uapi/asm/svm.h |  1 +
 arch/x86/kvm/svm/sev.c          | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index efa969325ede..67cf153fe580 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -108,6 +108,7 @@
 #define SVM_VMGEXIT_AP_JUMP_TABLE		0x80000005
 #define SVM_VMGEXIT_SET_AP_JUMP_TABLE		0
 #define SVM_VMGEXIT_GET_AP_JUMP_TABLE		1
+#define SVM_VMGEXIT_HYPERVISOR_FEATURES		0x8000fffd
 #define SVM_VMGEXIT_UNSUPPORTED_EVENT		0x8000ffff
 
 /* Exit code reserved for hypervisor/software use */
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 10af4ac83971..0c673291e905 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2216,6 +2216,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+	case SVM_VMGEXIT_HYPERVISOR_FEATURES:
 		break;
 	default:
 		goto vmgexit_err;
@@ -2423,6 +2424,20 @@ static u64 ghcb_msr_ap_rst_resp(u64 value)
 	return (u64)GHCB_MSR_AP_RESET_HOLD_RESP | (value << GHCB_DATA_LOW);
 }
 
+/* Hypervisor GHCB Features supported by KVM */
+#define KVM_SUPPORTED_GHCB_HV_FEATURES		0UL
+
+static u64 ghcb_msr_hv_feat_resp(void)
+{
+	u64 msr;
+
+	msr  = GHCB_MSR_HV_FT_RESP;
+	msr |= (KVM_SUPPORTED_GHCB_HV_FEATURES << GHCB_DATA_LOW);
+
+	return msr;
+}
+
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -2478,6 +2493,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 		 */
 		svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(0);
 
+		break;
+	case GHCB_MSR_HV_FT_REQ:
+		svm->vmcb->control.ghcb_gpa = ghcb_msr_hv_feat_resp();
 		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
@@ -2591,6 +2609,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_HYPERVISOR_FEATURES:
+		ghcb_set_sw_exit_info_2(ghcb, KVM_SUPPORTED_GHCB_HV_FEATURES);
+
+		ret = 1;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 6/6] KVM: SVM: Increase supported GHCB protocol version
  2021-10-20 12:44 [PATCH v5 0/6] KVM: SVM: Add initial GHCB protocol version 2 support Joerg Roedel
                   ` (4 preceding siblings ...)
  2021-10-20 12:44 ` [PATCH v5 5/6] KVM: SVM: Add support for Hypervisor Feature support " Joerg Roedel
@ 2021-10-20 12:44 ` Joerg Roedel
  5 siblings, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2021-10-20 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, x86, Brijesh Singh, Tom Lendacky, kvm,
	linux-kernel, Joerg Roedel

From: Brijesh Singh <brijesh.singh@amd.com>

Now that KVM has basic support for version 2 of the GHCB specification,
bump the maximum supported protocol version. The SNP specific functions
are still missing, but those are only required when the Hypervisor
supports running SNP guests.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kvm/svm/sev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0c673291e905..3ad69f4f0ed6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2394,7 +2394,7 @@ static u64 ghcb_msr_cpuid_resp(u64 reg, u64 value)
 }
 
 /* The min/max GHCB version supported by KVM. */
-#define GHCB_VERSION_MAX	1ULL
+#define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
 static u64 ghcb_msr_version_info(void)
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 3/6] KVM: SVM: Move kvm_emulate_ap_reset_hold() to AMD specific code
  2021-10-20 12:44 ` [PATCH v5 3/6] KVM: SVM: Move kvm_emulate_ap_reset_hold() to AMD specific code Joerg Roedel
@ 2021-10-20 14:29   ` Sean Christopherson
  0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2021-10-20 14:29 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, x86,
	Brijesh Singh, Tom Lendacky, kvm, linux-kernel, Joerg Roedel

On Wed, Oct 20, 2021, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> The function is only used by the kvm-amd module. Move it to the AMD
> specific part of the code and name it sev_emulate_ap_reset_hold().
> 
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol
  2021-10-20 12:44 ` [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol Joerg Roedel
@ 2021-10-20 17:40   ` Sean Christopherson
  2021-10-20 18:13     ` Tom Lendacky
  2021-10-20 18:36     ` Tom Lendacky
  0 siblings, 2 replies; 12+ messages in thread
From: Sean Christopherson @ 2021-10-20 17:40 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, x86,
	Brijesh Singh, Tom Lendacky, kvm, linux-kernel, Joerg Roedel

[-- Attachment #1: Type: text/plain, Size: 9843 bytes --]

On Wed, Oct 20, 2021, Joerg Roedel wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
> available in version 2 of the GHCB specification.

The changelog needs to explain how this is actually supposed to work.  Doesn't
need gory details, just a basic explanation of the sequence of events to wake a
vCPU that requested a reset hold.

I apologize in advance for the following rant(s).  There's some actionable feedback,
but a lot of it is just me complaining about the reset hold nonsense.

For the actual feedback, attached are two patches: patch 1 eliminates the
"first_sipi_received" hack, patch 2 is a (hopefully) fixed version of this patch
(but doesn't have an updated changelog).  Both are compile tested only.  There
will be a benign conflict with patch 05 of this series.

> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 -
>  arch/x86/kvm/svm/sev.c          | 52 ++++++++++++++++++++++++++-------
>  arch/x86/kvm/svm/svm.h          |  8 +++++
>  3 files changed, 49 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b67f550616cf..5c6b1469cc3b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -237,7 +237,6 @@ enum x86_intercept_stage;
>  	KVM_GUESTDBG_INJECT_DB | \
>  	KVM_GUESTDBG_BLOCKIRQ)
>  
> -

Spurious whitespace change.

>  #define PFERR_PRESENT_BIT 0
>  #define PFERR_WRITE_BIT 1
>  #define PFERR_USER_BIT 2
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 9afa71cb36e6..10af4ac83971 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2246,6 +2246,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>  
>  void sev_es_unmap_ghcb(struct vcpu_svm *svm)
>  {
> +	/* Clear any indication that the vCPU is in a type of AP Reset Hold */
> +	svm->reset_hold_type = AP_RESET_HOLD_NONE;
> +
>  	if (!svm->ghcb)
>  		return;
>  
> @@ -2405,14 +2408,21 @@ static u64 ghcb_msr_version_info(void)
>  	return msr;
>  }
>  
> -static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm)
> +static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm, enum ap_reset_hold_type type)
>  {
>  	int ret = kvm_skip_emulated_instruction(&svm->vcpu);
>  
> +	svm->reset_hold_type = type;
> +
>  	return __kvm_vcpu_halt(&svm->vcpu,
>  			       KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
>  }
>  
> +static u64 ghcb_msr_ap_rst_resp(u64 value)
> +{
> +	return (u64)GHCB_MSR_AP_RESET_HOLD_RESP | (value << GHCB_DATA_LOW);
> +}
> +
>  static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  {
>  	struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -2459,6 +2469,16 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  
>  		break;
>  	}
> +	case GHCB_MSR_AP_RESET_HOLD_REQ:
> +		ret = sev_emulate_ap_reset_hold(svm, AP_RESET_HOLD_MSR_PROTO);
> +
> +		/*
> +		 * Preset the result to a non-SIPI return and then only set
> +		 * the result to non-zero when delivering a SIPI.
> +		 */
> +		svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(0);

This can race with the SIPI and effectively corrupt svm->vmcb->control.ghcb_gpa.

        vCPU0                           vCPU1
                                        #VMGEXIT(RESET_HOLD)
                                        __kvm_vcpu_halt()
        INIT
        SIPI
        sev_vcpu_deliver_sipi_vector()
        ghcb_msr_ap_rst_resp(1);
                                        ghcb_msr_ap_rst_resp(0);

Note, the "INIT" above is mostly a guess.  I'm pretty sure it's necessary because
I don't see how KVM can possibly be correct otherwise.  The SIPI handler (below)
quite clearly expects the vCPU to have been in an AP reset hold, but the invocation
of sev_vcpu_deliver_sipi_vector is gated by the vCPU being in
KVM_MP_STATE_INIT_RECEIVED, not KVM_MP_STATE_AP_RESET_HOLD.  That implies the BSP
must INIT the AP.

                if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
                        /* evaluate pending_events before reading the vector */ 
                        smp_rmb();
                        sipi_vector = apic->sipi_vector;
                        kvm_x86_ops.vcpu_deliver_sipi_vector(vcpu, sipi_vector);
                        vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
                }

But the GHCB 2.0 spec doesn't actually say that; it instead implies that SIPI is
supposed to be able to directly wake the AP.

  * When a guest AP reaches its HLT loop (or similar method for parking the AP),
    it instead can either:
      1. Issue an AP Reset Hold NAE event.
      2. Issue the AP Reset Hold Request MSR Protocol
  * The hypervisor treats treats either request like the guest issued a HLT
                   ^^^^^^^^^^^^^                                        ^^^
                   spec typo                                            ???
    instruction and marks the vCPU as halted.
  * When the hypervisor receives a SIPI request for the vCPU, it will not update
                                   ^^^^^^^^^^^^
    any register values and, instead, it will either complete the AP Reset Hold
    NAE event or complete the AP Reset Hold MSR protocol
  * Mark the vCPU as active, allowing the VMGEXIT to complete.
  * Upon return from the VMGEXIT, the AP must transition from its current execution
    mode into real mode and begin executing at the reset vector supplied in the SIPI
    request. 

Piecing things together, my understanding is that the "hold request" really is
intended to be a HLT, but with extra semantics where the host sets sw_exit_info_2
to indicate that the vCPU was woken by INIT-SIPI. 

It's probably too late to change the spec, but I'm going to complain anyways. 
This is all ridiculously convoluted and completely unnecessary.  As pointed out
in the SNP series regarding AP "creation", this can be fully handled in the guest
via a simple mailbox between firmware and kernel.  What's really fubar is that
the guest firmware and kernel already have a mailbox!  But it's defined by the
GHCB spec instead of e.g. ACPI, and so instead of handling this fully within the
guest, the hypervisor (and PSP to some extent on SNP because of the secrets page!!!)
gets involved.

The complications to support this in the guest firmware are hilarious, e.g. the
guest hasto manually switch from 64-bit mode to Real Mode just so that the kernel
can continue to use a horribly antiquated method for gaining control of APs.

> +
> +		break;
>  	case GHCB_MSR_TERM_REQ: {
>  		u64 reason_set, reason_code;
>  
> @@ -2544,7 +2564,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>  		ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
>  		break;
>  	case SVM_VMGEXIT_AP_HLT_LOOP:
> -		ret = sev_emulate_ap_reset_hold(svm);
> +		ret = sev_emulate_ap_reset_hold(svm, AP_RESET_HOLD_NAE_EVENT);
>  		break;
>  	case SVM_VMGEXIT_AP_JUMP_TABLE: {
>  		struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
> @@ -2679,13 +2699,23 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)

Tying into above, handling this in SIPI is flawed.  For example, if the guest
does INIT-SIPI-SIPI without a reset hold, KVM would incorrect set sw_exit_info_2
on the SIPI.  Because this mess requires an INIT, KVM has lost track of whether
the guest was in KVM_MP_STATE_AP_RESET_HOLD and thus can't know if the SIPI
arrived after a reset hold.  Looking at KVM, IIUC, this bug is why the hack
"received_first_sipi" exists.

Of course this all begs the question of why there's a "reset hold" concept in
the first place.  It's literally HLT with a flag to say "you got INIT-SIPI".
But the guest has to supply and fill a jump table!  Just put the flag in the
jump table!!!!

>  		return;
>  	}
>  
> -	/*
> -	 * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
> -	 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
> -	 * non-zero value.
> -	 */
> -	if (!svm->ghcb)
> -		return;
> -
> -	ghcb_set_sw_exit_info_2(svm->ghcb, 1);
> +	/* Subsequent SIPI */
> +	switch (svm->reset_hold_type) {
> +	case AP_RESET_HOLD_NAE_EVENT:
> +		/*
> +		 * Return from an AP Reset Hold VMGEXIT, where the guest will
> +		 * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
> +		 */
> +		ghcb_set_sw_exit_info_2(svm->ghcb, 1);

Doesn't this need to check for a null svm->ghcb?

I also suspect a boolean might make it easier to understand the implications
and also make the whole thing less error prone, e.g.

        if (svm->reset_hold_msr_protocol) {

        } else if (svm->ghcb) {

        }

> +		break;
> +	case AP_RESET_HOLD_MSR_PROTO:
> +		/*
> +		 * Return from an AP Reset Hold VMGEXIT, where the guest will
> +		 * set the CS and RIP. Set GHCB data field to a non-zero value.
> +		 */
> +		svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(1);
> +		break;
> +	default:
> +		break;
> +	}
>  }
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 68e5f16a0554..bf9379f1cfb8 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -69,6 +69,12 @@ enum {
>  /* TPR and CR2 are always written before VMRUN */
>  #define VMCB_ALWAYS_DIRTY_MASK	((1U << VMCB_INTR) | (1U << VMCB_CR2))
>  
> +enum ap_reset_hold_type {
> +	AP_RESET_HOLD_NONE,
> +	AP_RESET_HOLD_NAE_EVENT,
> +	AP_RESET_HOLD_MSR_PROTO,
> +};
> +
>  struct kvm_sev_info {
>  	bool active;		/* SEV enabled guest */
>  	bool es_active;		/* SEV-ES enabled guest */
> @@ -199,6 +205,8 @@ struct vcpu_svm {
>  	bool ghcb_sa_free;
>  
>  	bool guest_state_loaded;
> +
> +	enum ap_reset_hold_type reset_hold_type;
>  };
>  
>  struct svm_cpu_data {
> -- 
> 2.33.1
> 

[-- Attachment #2: 0001-KVM-SVM-Set-released-on-INIT-SIPI-iff-SEV-ES-vCPU-wa.patch --]
[-- Type: text/x-diff, Size: 4820 bytes --]

From deffc8d46fed51f9ec3e77e4a9f4c61a727eb174 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Wed, 20 Oct 2021 09:46:16 -0700
Subject: [PATCH 1/2] KVM: SVM: Set "released" on INIT-SIPI iff SEV-ES vCPU was
 in AP reset hold

Set ghcb->sw_exit_info_2 when releasing a vCPU from an AP reset hold if
and only if the vCPU is actually in a reset hold.  Move the handling to
INIT (was SIPI) so that KVM can check the current MP state; when SIPI is
received, the vCPU will be in INIT_RECEIVED and will have lost track of
whether or not the vCPU was in a reset hold.

Drop the received_first_sipi flag, which was a hack to workaround the
fact that KVM lost track of whether or not the vCPU was in a reset hold.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 34 ++++++++++++----------------------
 arch/x86/kvm/svm/svm.c | 13 ++++++++-----
 arch/x86/kvm/svm/svm.h |  4 +---
 3 files changed, 21 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9afa71cb36e6..f8dfa88993b8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2637,8 +2637,19 @@ void sev_es_init_vmcb(struct vcpu_svm *svm)
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
 }
 
-void sev_es_vcpu_reset(struct vcpu_svm *svm)
+void sev_es_vcpu_reset(struct vcpu_svm *svm, bool init_event)
 {
+	if (init_event) {
+		/*
+		 * If the vCPU is in a "reset" hold, signal via SW_EXIT_INFO_2
+		 * that, assuming it receives a SIPI, the vCPU was "released".
+		 */
+		if (svm->vcpu.arch.mp_state == KVM_MP_STATE_AP_RESET_HOLD &&
+		    svm->ghcb)
+			ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+		return;
+	}
+
 	/*
 	 * Set the GHCB MSR value as per the GHCB specification when emulating
 	 * vCPU RESET for an SEV-ES guest.
@@ -2668,24 +2679,3 @@ void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
 	/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
 	hostsa->xss = host_xss;
 }
-
-void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	/* First SIPI: Use the values as initially set by the VMM */
-	if (!svm->received_first_sipi) {
-		svm->received_first_sipi = true;
-		return;
-	}
-
-	/*
-	 * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
-	 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
-	 * non-zero value.
-	 */
-	if (!svm->ghcb)
-		return;
-
-	ghcb_set_sw_exit_info_2(svm->ghcb, 1);
-}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 89077160d463..0497066a91fb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1372,9 +1372,6 @@ static void __svm_vcpu_reset(struct kvm_vcpu *vcpu)
 	svm_init_osvw(vcpu);
 	vcpu->arch.microcode_version = 0x01000065;
 	svm->tsc_ratio_msr = kvm_default_tsc_scaling_ratio;
-
-	if (sev_es_guest(vcpu->kvm))
-		sev_es_vcpu_reset(svm);
 }
 
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -1388,6 +1385,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	if (!init_event)
 		__svm_vcpu_reset(vcpu);
+
+	if (sev_es_guest(vcpu->kvm))
+		sev_es_vcpu_reset(svm, init_event);
 }
 
 void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
@@ -4553,10 +4553,13 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
 
 static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 {
+	/*
+	 * SEV-ES (and later derivatives) use INIT-SIPI to bring up APs, but
+	 * the guest is responsible for transitioning to Real Mode and setting
+	 * CS:RIP, GPRs, etc...  KVM just needs to make the vCPU runnable.
+	 */
 	if (!sev_es_guest(vcpu->kvm))
 		return kvm_vcpu_deliver_sipi_vector(vcpu, vector);
-
-	sev_vcpu_deliver_sipi_vector(vcpu, vector);
 }
 
 static void svm_vm_destroy(struct kvm *kvm)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 68e5f16a0554..c1f3685db2e1 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -190,7 +190,6 @@ struct vcpu_svm {
 	struct vmcb_save_area *vmsa;
 	struct ghcb *ghcb;
 	struct kvm_host_map ghcb_map;
-	bool received_first_sipi;
 
 	/* SEV-ES scratch area support */
 	void *ghcb_sa;
@@ -562,8 +561,7 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu);
 int sev_handle_vmgexit(struct kvm_vcpu *vcpu);
 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
 void sev_es_init_vmcb(struct vcpu_svm *svm);
-void sev_es_vcpu_reset(struct vcpu_svm *svm);
-void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
+void sev_es_vcpu_reset(struct vcpu_svm *svm, bool init_event);
 void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 
-- 
2.33.0.1079.g6e70778dc9-goog


[-- Attachment #3: 0002-KVM-SVM-Add-support-to-handle-AP-reset-MSR-protocol.patch --]
[-- Type: text/x-diff, Size: 4195 bytes --]

From 7e41f58f1ce591bf5a40a94993957879a60103d6 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Wed, 20 Oct 2021 14:44:14 +0200
Subject: [PATCH 2/2] KVM: SVM: Add support to handle AP reset MSR protocol

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 53 +++++++++++++++++++++++++++++++++++++-----
 arch/x86/kvm/svm/svm.h |  1 +
 2 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f8dfa88993b8..1174270d18ee 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2405,10 +2405,36 @@ static u64 ghcb_msr_version_info(void)
 	return msr;
 }
 
-static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm)
+
+static u64 ghcb_msr_ap_rst_resp(u64 value)
+{
+	return (u64)GHCB_MSR_AP_RESET_HOLD_RESP | (value << GHCB_DATA_LOW);
+}
+
+static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm, u64 hold_type)
 {
 	int ret = kvm_skip_emulated_instruction(&svm->vcpu);
 
+	if (hold_type == GHCB_MSR_AP_RESET_HOLD_REQ) {
+		/*
+		 * Preset the result to a non-SIPI return and then only set
+		 * the result to non-zero when delivering a SIPI.
+		 */
+		svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(0);
+		svm->reset_hold_msr_protocol = true;
+	} else {
+		WARN_ON_ONCE(hold_type != SVM_VMGEXIT_AP_HLT_LOOP);
+		svm->reset_hold_msr_protocol = false;
+	}
+
+	/*
+	 * Ensure the writes to ghcb_gpa and reset_hold_msr_protocol are visible
+	 * before the MP state change so that the INIT-SIPI doesn't misread
+	 * reset_hold_msr_protocol or write ghcb_gpa before this.  Pairs with
+	 * the smp_rmb() in sev_vcpu_reset().
+	 */
+	smp_wmb();
+
 	return __kvm_vcpu_halt(&svm->vcpu,
 			       KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
 }
@@ -2459,6 +2485,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 
 		break;
 	}
+	case GHCB_MSR_AP_RESET_HOLD_REQ:
+		ret = sev_emulate_ap_reset_hold(svm, GHCB_MSR_AP_RESET_HOLD_REQ);
+		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -2544,7 +2573,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
 		break;
 	case SVM_VMGEXIT_AP_HLT_LOOP:
-		ret = sev_emulate_ap_reset_hold(svm);
+		ret = sev_emulate_ap_reset_hold(svm, SVM_VMGEXIT_AP_HLT_LOOP);
 		break;
 	case SVM_VMGEXIT_AP_JUMP_TABLE: {
 		struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
@@ -2642,11 +2671,23 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm, bool init_event)
 	if (init_event) {
 		/*
 		 * If the vCPU is in a "reset" hold, signal via SW_EXIT_INFO_2
-		 * that, assuming it receives a SIPI, the vCPU was "released".
+		 * (or the GHCB_GPA for the MSR protocol) that, assuming it
+		 * receives a SIPI, the vCPU was "released".
 		 */
-		if (svm->vcpu.arch.mp_state == KVM_MP_STATE_AP_RESET_HOLD &&
-		    svm->ghcb)
-			ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+		if (svm->vcpu.arch.mp_state == KVM_MP_STATE_AP_RESET_HOLD) {
+			/*
+			 * Ensure mp_state is read before reset_hold_msr_protocol
+			 * and before writing ghcb_gpa to ensure KVM conumes the
+			 * correct protocol.  Pairs with the smp_wmb() in
+			 * sev_emulate_ap_reset_hold().
+			 */
+			smp_rmb();
+			if (svm->reset_hold_msr_protocol)
+				svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(1);
+			else if (svm->ghcb)
+				ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+			svm->reset_hold_msr_protocol = false;
+		}
 		return;
 	}
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c1f3685db2e1..531d3258df58 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -198,6 +198,7 @@ struct vcpu_svm {
 	bool ghcb_sa_free;
 
 	bool guest_state_loaded;
+	bool reset_hold_msr_protocol;
 };
 
 struct svm_cpu_data {
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol
  2021-10-20 17:40   ` Sean Christopherson
@ 2021-10-20 18:13     ` Tom Lendacky
  2021-10-20 18:38       ` Sean Christopherson
  2021-10-20 18:36     ` Tom Lendacky
  1 sibling, 1 reply; 12+ messages in thread
From: Tom Lendacky @ 2021-10-20 18:13 UTC (permalink / raw)
  To: Sean Christopherson, Joerg Roedel
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, x86,
	Brijesh Singh, kvm, linux-kernel, Joerg Roedel

On 10/20/21 12:40 PM, Sean Christopherson wrote:
> On Wed, Oct 20, 2021, Joerg Roedel wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
>> available in version 2 of the GHCB specification.
> 
> The changelog needs to explain how this is actually supposed to work.  Doesn't
> need gory details, just a basic explanation of the sequence of events to wake a
> vCPU that requested a reset hold.
> 
> I apologize in advance for the following rant(s).  There's some actionable feedback,
> but a lot of it is just me complaining about the reset hold nonsense.
> 
> For the actual feedback, attached are two patches: patch 1 eliminates the
> "first_sipi_received" hack, patch 2 is a (hopefully) fixed version of this patch
> (but doesn't have an updated changelog).  Both are compile tested only.  There
> will be a benign conflict with patch 05 of this series.
> 
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> Signed-off-by: Joerg Roedel <jroedel@suse.de>
>> ---
>>   arch/x86/include/asm/kvm_host.h |  1 -
>>   arch/x86/kvm/svm/sev.c          | 52 ++++++++++++++++++++++++++-------
>>   arch/x86/kvm/svm/svm.h          |  8 +++++
>>   3 files changed, 49 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index b67f550616cf..5c6b1469cc3b 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -237,7 +237,6 @@ enum x86_intercept_stage;
>>   	KVM_GUESTDBG_INJECT_DB | \
>>   	KVM_GUESTDBG_BLOCKIRQ)
>>   
>> -
> 
> Spurious whitespace change.
> 
>>   #define PFERR_PRESENT_BIT 0
>>   #define PFERR_WRITE_BIT 1
>>   #define PFERR_USER_BIT 2
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 9afa71cb36e6..10af4ac83971 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -2246,6 +2246,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>>   
>>   void sev_es_unmap_ghcb(struct vcpu_svm *svm)
>>   {
>> +	/* Clear any indication that the vCPU is in a type of AP Reset Hold */
>> +	svm->reset_hold_type = AP_RESET_HOLD_NONE;
>> +
>>   	if (!svm->ghcb)
>>   		return;
>>   
>> @@ -2405,14 +2408,21 @@ static u64 ghcb_msr_version_info(void)
>>   	return msr;
>>   }
>>   
>> -static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm)
>> +static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm, enum ap_reset_hold_type type)
>>   {
>>   	int ret = kvm_skip_emulated_instruction(&svm->vcpu);
>>   
>> +	svm->reset_hold_type = type;
>> +
>>   	return __kvm_vcpu_halt(&svm->vcpu,
>>   			       KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
>>   }
>>   
>> +static u64 ghcb_msr_ap_rst_resp(u64 value)
>> +{
>> +	return (u64)GHCB_MSR_AP_RESET_HOLD_RESP | (value << GHCB_DATA_LOW);
>> +}
>> +
>>   static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>>   {
>>   	struct vmcb_control_area *control = &svm->vmcb->control;
>> @@ -2459,6 +2469,16 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>>   
>>   		break;
>>   	}
>> +	case GHCB_MSR_AP_RESET_HOLD_REQ:
>> +		ret = sev_emulate_ap_reset_hold(svm, AP_RESET_HOLD_MSR_PROTO);
>> +
>> +		/*
>> +		 * Preset the result to a non-SIPI return and then only set
>> +		 * the result to non-zero when delivering a SIPI.
>> +		 */
>> +		svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(0);
> 
> This can race with the SIPI and effectively corrupt svm->vmcb->control.ghcb_gpa.
> 
>          vCPU0                           vCPU1
>                                          #VMGEXIT(RESET_HOLD)
>                                          __kvm_vcpu_halt()
>          INIT
>          SIPI
>          sev_vcpu_deliver_sipi_vector()
>          ghcb_msr_ap_rst_resp(1);

This isn't possible. vCPU0 doesn't set vCPU1's GHCB value. vCPU1's GHCB 
value is set when vCPU1 handles events in vcpu_enter_guest().

Thanks,
Tom

>                                          ghcb_msr_ap_rst_resp(0);
> 
> Note, the "INIT" above is mostly a guess.  I'm pretty sure it's necessary because
> I don't see how KVM can possibly be correct otherwise.  The SIPI handler (below)
> quite clearly expects the vCPU to have been in an AP reset hold, but the invocation
> of sev_vcpu_deliver_sipi_vector is gated by the vCPU being in
> KVM_MP_STATE_INIT_RECEIVED, not KVM_MP_STATE_AP_RESET_HOLD.  That implies the BSP
> must INIT the AP.
> 
>                  if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>                          /* evaluate pending_events before reading the vector */
>                          smp_rmb();
>                          sipi_vector = apic->sipi_vector;
>                          kvm_x86_ops.vcpu_deliver_sipi_vector(vcpu, sipi_vector);
>                          vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>                  }
> 
> But the GHCB 2.0 spec doesn't actually say that; it instead implies that SIPI is
> supposed to be able to directly wake the AP.
> 
>    * When a guest AP reaches its HLT loop (or similar method for parking the AP),
>      it instead can either:
>        1. Issue an AP Reset Hold NAE event.
>        2. Issue the AP Reset Hold Request MSR Protocol
>    * The hypervisor treats treats either request like the guest issued a HLT
>                     ^^^^^^^^^^^^^                                        ^^^
>                     spec typo                                            ???
>      instruction and marks the vCPU as halted.
>    * When the hypervisor receives a SIPI request for the vCPU, it will not update
>                                     ^^^^^^^^^^^^
>      any register values and, instead, it will either complete the AP Reset Hold
>      NAE event or complete the AP Reset Hold MSR protocol
>    * Mark the vCPU as active, allowing the VMGEXIT to complete.
>    * Upon return from the VMGEXIT, the AP must transition from its current execution
>      mode into real mode and begin executing at the reset vector supplied in the SIPI
>      request.
> 
> Piecing things together, my understanding is that the "hold request" really is
> intended to be a HLT, but with extra semantics where the host sets sw_exit_info_2
> to indicate that the vCPU was woken by INIT-SIPI.
> 
> It's probably too late to change the spec, but I'm going to complain anyways.
> This is all ridiculously convoluted and completely unnecessary.  As pointed out
> in the SNP series regarding AP "creation", this can be fully handled in the guest
> via a simple mailbox between firmware and kernel.  What's really fubar is that
> the guest firmware and kernel already have a mailbox!  But it's defined by the
> GHCB spec instead of e.g. ACPI, and so instead of handling this fully within the
> guest, the hypervisor (and PSP to some extent on SNP because of the secrets page!!!)
> gets involved.
> 
> The complications to support this in the guest firmware are hilarious, e.g. the
> guest hasto manually switch from 64-bit mode to Real Mode just so that the kernel
> can continue to use a horribly antiquated method for gaining control of APs.
> 
>> +
>> +		break;
>>   	case GHCB_MSR_TERM_REQ: {
>>   		u64 reason_set, reason_code;
>>   
>> @@ -2544,7 +2564,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>>   		ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
>>   		break;
>>   	case SVM_VMGEXIT_AP_HLT_LOOP:
>> -		ret = sev_emulate_ap_reset_hold(svm);
>> +		ret = sev_emulate_ap_reset_hold(svm, AP_RESET_HOLD_NAE_EVENT);
>>   		break;
>>   	case SVM_VMGEXIT_AP_JUMP_TABLE: {
>>   		struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
>> @@ -2679,13 +2699,23 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> 
> Tying into above, handling this in SIPI is flawed.  For example, if the guest
> does INIT-SIPI-SIPI without a reset hold, KVM would incorrect set sw_exit_info_2
> on the SIPI.  Because this mess requires an INIT, KVM has lost track of whether
> the guest was in KVM_MP_STATE_AP_RESET_HOLD and thus can't know if the SIPI
> arrived after a reset hold.  Looking at KVM, IIUC, this bug is why the hack
> "received_first_sipi" exists.
> 
> Of course this all begs the question of why there's a "reset hold" concept in
> the first place.  It's literally HLT with a flag to say "you got INIT-SIPI".
> But the guest has to supply and fill a jump table!  Just put the flag in the
> jump table!!!!
> 
>>   		return;
>>   	}
>>   
>> -	/*
>> -	 * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
>> -	 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
>> -	 * non-zero value.
>> -	 */
>> -	if (!svm->ghcb)
>> -		return;
>> -
>> -	ghcb_set_sw_exit_info_2(svm->ghcb, 1);
>> +	/* Subsequent SIPI */
>> +	switch (svm->reset_hold_type) {
>> +	case AP_RESET_HOLD_NAE_EVENT:
>> +		/*
>> +		 * Return from an AP Reset Hold VMGEXIT, where the guest will
>> +		 * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
>> +		 */
>> +		ghcb_set_sw_exit_info_2(svm->ghcb, 1);
> 
> Doesn't this need to check for a null svm->ghcb?
> 
> I also suspect a boolean might make it easier to understand the implications
> and also make the whole thing less error prone, e.g.
> 
>          if (svm->reset_hold_msr_protocol) {
> 
>          } else if (svm->ghcb) {
> 
>          }
> 
>> +		break;
>> +	case AP_RESET_HOLD_MSR_PROTO:
>> +		/*
>> +		 * Return from an AP Reset Hold VMGEXIT, where the guest will
>> +		 * set the CS and RIP. Set GHCB data field to a non-zero value.
>> +		 */
>> +		svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(1);
>> +		break;
>> +	default:
>> +		break;
>> +	}
>>   }
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index 68e5f16a0554..bf9379f1cfb8 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -69,6 +69,12 @@ enum {
>>   /* TPR and CR2 are always written before VMRUN */
>>   #define VMCB_ALWAYS_DIRTY_MASK	((1U << VMCB_INTR) | (1U << VMCB_CR2))
>>   
>> +enum ap_reset_hold_type {
>> +	AP_RESET_HOLD_NONE,
>> +	AP_RESET_HOLD_NAE_EVENT,
>> +	AP_RESET_HOLD_MSR_PROTO,
>> +};
>> +
>>   struct kvm_sev_info {
>>   	bool active;		/* SEV enabled guest */
>>   	bool es_active;		/* SEV-ES enabled guest */
>> @@ -199,6 +205,8 @@ struct vcpu_svm {
>>   	bool ghcb_sa_free;
>>   
>>   	bool guest_state_loaded;
>> +
>> +	enum ap_reset_hold_type reset_hold_type;
>>   };
>>   
>>   struct svm_cpu_data {
>> -- 
>> 2.33.1
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol
  2021-10-20 17:40   ` Sean Christopherson
  2021-10-20 18:13     ` Tom Lendacky
@ 2021-10-20 18:36     ` Tom Lendacky
  1 sibling, 0 replies; 12+ messages in thread
From: Tom Lendacky @ 2021-10-20 18:36 UTC (permalink / raw)
  To: Sean Christopherson, Joerg Roedel
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, x86,
	Brijesh Singh, kvm, linux-kernel, Joerg Roedel

On 10/20/21 12:40 PM, Sean Christopherson wrote:
> On Wed, Oct 20, 2021, Joerg Roedel wrote:
> 
> Tying into above, handling this in SIPI is flawed.  For example, if the guest
> does INIT-SIPI-SIPI without a reset hold, KVM would incorrect set sw_exit_info_2
> on the SIPI.  Because this mess requires an INIT, KVM has lost track of whether
> the guest was in KVM_MP_STATE_AP_RESET_HOLD and thus can't know if the SIPI
> arrived after a reset hold.  Looking at KVM, IIUC, this bug is why the hack
> "received_first_sipi" exists.

The received_first_sipi is because the APs have to be started the very 
first time in the traditional way. The AP can't issue an AP reset hold if 
it hasn't started to begin with in which case there would be no GHCB mapped.

After the check to see if the  GHCB is mapped was added to 
sev_vcpu_deliver_sipi_vector(), the "received_first_sipi" could probably 
have been deleted at that point.

Thanks,
Tom


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol
  2021-10-20 18:13     ` Tom Lendacky
@ 2021-10-20 18:38       ` Sean Christopherson
  0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2021-10-20 18:38 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Joerg Roedel, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, x86, Brijesh Singh, kvm, linux-kernel, Joerg Roedel

On Wed, Oct 20, 2021, Tom Lendacky wrote:
> On 10/20/21 12:40 PM, Sean Christopherson wrote:
> > On Wed, Oct 20, 2021, Joerg Roedel wrote:
> > This can race with the SIPI and effectively corrupt svm->vmcb->control.ghcb_gpa.
> > 
> >          vCPU0                           vCPU1
> >                                          #VMGEXIT(RESET_HOLD)
> >                                          __kvm_vcpu_halt()
> >          INIT
> >          SIPI
> >          sev_vcpu_deliver_sipi_vector()
> >          ghcb_msr_ap_rst_resp(1);
> 
> This isn't possible. vCPU0 doesn't set vCPU1's GHCB value. vCPU1's GHCB
> value is set when vCPU1 handles events in vcpu_enter_guest().

Argh, I was thinking of injecting regular IPIs across vCPUs.  In hindsight it
makes sense that INIT and SIPI are handled on the current vCPU, stuffing all that
state from a different vCPU would be needlessly complex.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-10-20 18:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-20 12:44 [PATCH v5 0/6] KVM: SVM: Add initial GHCB protocol version 2 support Joerg Roedel
2021-10-20 12:44 ` [PATCH v5 1/6] KVM: SVM: Get rid of set_ghcb_msr() and *ghcb_msr_bits() functions Joerg Roedel
2021-10-20 12:44 ` [PATCH v5 2/6] KVM: SVM: Add helper to generate GHCB MSR verson info, and drop macro Joerg Roedel
2021-10-20 12:44 ` [PATCH v5 3/6] KVM: SVM: Move kvm_emulate_ap_reset_hold() to AMD specific code Joerg Roedel
2021-10-20 14:29   ` Sean Christopherson
2021-10-20 12:44 ` [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol Joerg Roedel
2021-10-20 17:40   ` Sean Christopherson
2021-10-20 18:13     ` Tom Lendacky
2021-10-20 18:38       ` Sean Christopherson
2021-10-20 18:36     ` Tom Lendacky
2021-10-20 12:44 ` [PATCH v5 5/6] KVM: SVM: Add support for Hypervisor Feature support " Joerg Roedel
2021-10-20 12:44 ` [PATCH v5 6/6] KVM: SVM: Increase supported GHCB protocol version Joerg Roedel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).