* [PATCH v2 00/11] SVM: vNMI (with my fixes)
@ 2022-11-29 19:37 Maxim Levitsky
  2022-11-29 19:37 ` [PATCH v2 01/11] KVM: nSVM: don't sync back tlb_ctl on nested VM exit Maxim Levitsky
                   ` (15 more replies)
  0 siblings, 16 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-11-29 19:37 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Maxim Levitsky

Hi!

This is the vNMI patch series, based on Santosh Shukla's vNMI patch series.

In this version I addressed most of the review feedback, added some more
refactoring, and I believe fixed the issue with migration.

I only tested this on a machine which doesn't have vNMI, so the series still
needs testing on vNMI-capable hardware to ensure that nothing is broken.

Best regards,
       Maxim Levitsky

Maxim Levitsky (9):
  KVM: nSVM: don't sync back tlb_ctl on nested VM exit
  KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
  KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1
    doesn't intercept interrupts
  KVM: SVM: drop the SVM specific H_FLAGS
  KVM: x86: emulator: stop using raw host flags
  KVM: SVM: add wrappers to enable/disable IRET interception
  KVM: x86: add a delayed hardware NMI injection interface
  KVM: SVM: implement support for vNMI
  KVM: nSVM: implement support for nested VNMI

Santosh Shukla (2):
  x86/cpu: Add CPUID feature bit for VNMI
  KVM: SVM: Add VNMI bit definition

 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/include/asm/kvm-x86-ops.h |   2 +
 arch/x86/include/asm/kvm_host.h    |  24 +++--
 arch/x86/include/asm/svm.h         |   7 ++
 arch/x86/kvm/emulate.c             |  11 +--
 arch/x86/kvm/kvm_emulate.h         |   7 +-
 arch/x86/kvm/smm.c                 |   2 -
 arch/x86/kvm/svm/nested.c          | 102 ++++++++++++++++---
 arch/x86/kvm/svm/svm.c             | 154 ++++++++++++++++++++++-------
 arch/x86/kvm/svm/svm.h             |  41 +++++++-
 arch/x86/kvm/x86.c                 |  50 ++++++++--
 11 files changed, 318 insertions(+), 83 deletions(-)

-- 
2.26.3



^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH v2 01/11] KVM: nSVM: don't sync back tlb_ctl on nested VM exit
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
@ 2022-11-29 19:37 ` Maxim Levitsky
  2022-12-05 14:05   ` Santosh Shukla
  2022-11-29 19:37 ` [PATCH v2 02/11] KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12 Maxim Levitsky
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-11-29 19:37 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Maxim Levitsky

The CPU doesn't change the TLB_CTL value, as stated in the APM (15.16.2):

  "The VMRUN instruction reads, but does not change, the
  value of the TLB_CONTROL field"

Therefore KVM shouldn't do that either.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index bc9cd7086fa972..37af0338da7c32 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1010,7 +1010,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 		vmcb12->control.next_rip  = vmcb02->control.next_rip;
 
 	vmcb12->control.int_ctl           = svm->nested.ctl.int_ctl;
-	vmcb12->control.tlb_ctl           = svm->nested.ctl.tlb_ctl;
 	vmcb12->control.event_inj         = svm->nested.ctl.event_inj;
 	vmcb12->control.event_inj_err     = svm->nested.ctl.event_inj_err;
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 02/11] KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
  2022-11-29 19:37 ` [PATCH v2 01/11] KVM: nSVM: don't sync back tlb_ctl on nested VM exit Maxim Levitsky
@ 2022-11-29 19:37 ` Maxim Levitsky
  2023-01-28  0:37   ` Sean Christopherson
  2022-11-29 19:37 ` [PATCH v2 03/11] KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1 doesn't intercept interrupts Maxim Levitsky
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-11-29 19:37 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Maxim Levitsky

The V_IRQ and V_TPR bits don't exist when virtual interrupt
masking is not enabled, therefore KVM should not copy these
bits back to vmcb12, regardless of the V_IRQ intercept.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 37af0338da7c32..aad3145b2f62fe 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -412,24 +412,17 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
  */
 void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
 {
-	u32 mask;
+	u32 mask = 0;
 	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
 	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
 
-	/* Only a few fields of int_ctl are written by the processor.  */
-	mask = V_IRQ_MASK | V_TPR_MASK;
-	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
-	    svm_is_intercept(svm, INTERCEPT_VINTR)) {
-		/*
-		 * In order to request an interrupt window, L0 is usurping
-		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
-		 * even if it was clear in L1's VMCB.  Restoring it would be
-		 * wrong.  However, in this case V_IRQ will remain true until
-		 * interrupt_window_interception calls svm_clear_vintr and
-		 * restores int_ctl.  We can just leave it aside.
-		 */
-		mask &= ~V_IRQ_MASK;
-	}
+	/*
+	 * Only a few fields of int_ctl are written by the processor.
+	 * Copy back only the bits that are passed through to L2.
+	 */
+
+	if (svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
+		mask = V_IRQ_MASK | V_TPR_MASK;
 
 	if (nested_vgif_enabled(svm))
 		mask |= V_GIF_MASK;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 03/11] KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1 doesn't intercept interrupts
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
  2022-11-29 19:37 ` [PATCH v2 01/11] KVM: nSVM: don't sync back tlb_ctl on nested VM exit Maxim Levitsky
  2022-11-29 19:37 ` [PATCH v2 02/11] KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12 Maxim Levitsky
@ 2022-11-29 19:37 ` Maxim Levitsky
  2023-01-28  0:56   ` Sean Christopherson
  2022-11-29 19:37 ` [PATCH v2 04/11] KVM: SVM: drop the SVM specific H_FLAGS Maxim Levitsky
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-11-29 19:37 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Maxim Levitsky

If L2 doesn't intercept interrupts, then KVM will use vmcb02's
V_IRQ for L1 (to detect an interrupt window).

In this case, on nested VM exit KVM might need to copy the V_IRQ bit
from vmcb02 to vmcb01, to continue waiting for the
interrupt window.

To keep it simple, just raise the KVM_REQ_EVENT request; processing it
will re-enable the interrupt window if needed.

Note that this is a theoretical bug, because KVM already raises
KVM_REQ_EVENT on each nested VM exit: the nested VM exit resets
RFLAGS and kvm_set_rflags() raises KVM_REQ_EVENT in response.

However, raising this request explicitly, together with
documenting why this is needed, is still preferred.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index aad3145b2f62fe..e891318595113e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1016,6 +1016,31 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	svm_switch_vmcb(svm, &svm->vmcb01);
 
+	/* Note about synchronizing some of the int_ctl bits from vmcb02 to vmcb01:
+	 *
+	 * - V_IRQ, V_IRQ_VECTOR, V_INTR_PRIO_MASK, V_IGN_TPR:
+	 * If L2 doesn't intercept interrupts, then
+	 * (even if L2 does use virtual interrupt masking)
+	 * KVM will use vmcb02's V_INTR to detect an interrupt window.
+	 *
+	 * In this case, KVM raises KVM_REQ_EVENT to ensure that the interrupt
+	 * window is not lost, and this implicitly copies these bits from vmcb02 to vmcb01.
+	 *
+	 * - V_TPR:
+	 * If L2 doesn't use virtual interrupt masking, then L1's vTPR
+	 * is stored in vmcb02, but its value doesn't need to be copied from/to
+	 * vmcb01 because it is copied from/to the APIC's TPR register on
+	 * each VM entry/exit.
+	 *
+	 * - V_GIF:
+	 * If nested vGIF is not used, KVM uses vmcb02's V_GIF for L1's V_GIF;
+	 * however, L1's GIF is reset to false on each VM exit, thus
+	 * there is no need to copy it from vmcb02 to vmcb01.
+	 */
+
+	if (!nested_exit_on_intr(svm))
+		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
+
 	if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
 		svm_copy_lbrs(vmcb12, vmcb02);
 		svm_update_lbrv(vcpu);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 04/11] KVM: SVM: drop the SVM specific H_FLAGS
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (2 preceding siblings ...)
  2022-11-29 19:37 ` [PATCH v2 03/11] KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1 doesn't intercept interrupts Maxim Levitsky
@ 2022-11-29 19:37 ` Maxim Levitsky
  2022-12-05 15:31   ` Santosh Shukla
  2023-01-28  0:56   ` Sean Christopherson
  2022-11-29 19:37 ` [PATCH v2 05/11] KVM: x86: emulator: stop using raw host flags Maxim Levitsky
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-11-29 19:37 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Maxim Levitsky

GIF and 'waiting for IRET' are used only by SVM and thus should
not be in H_FLAGS.

The NMI mask is not SVM specific, but its software tracking is only
needed by SVM, and only when vNMI is not available.

VMX has a similar concept of an NMI mask (soft_vnmi_blocked),
which is used when its 'virtual NMIs' feature is not enabled,
but because VMX can't intercept IRET, it is more of a hack,
and thus it should not use the common host flags either.

No functional change is intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ---
 arch/x86/kvm/svm/svm.c          | 22 +++++++++++++---------
 arch/x86/kvm/svm/svm.h          | 25 ++++++++++++++++++++++---
 3 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 70af7240a1d5af..9208ad7a6bd004 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2052,9 +2052,6 @@ enum {
 	TASK_SWITCH_GATE = 3,
 };
 
-#define HF_GIF_MASK		(1 << 0)
-#define HF_NMI_MASK		(1 << 3)
-#define HF_IRET_MASK		(1 << 4)
 #define HF_GUEST_MASK		(1 << 5) /* VCPU is in guest-mode */
 
 #ifdef CONFIG_KVM_SMM
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 91352d69284524..512b2aa21137e2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1326,6 +1326,9 @@ static void __svm_vcpu_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.microcode_version = 0x01000065;
 	svm->tsc_ratio_msr = kvm_caps.default_tsc_scaling_ratio;
 
+	svm->nmi_masked = false;
+	svm->awaiting_iret_completion = false;
+
 	if (sev_es_guest(vcpu->kvm))
 		sev_es_vcpu_reset(svm);
 }
@@ -2470,7 +2473,7 @@ static int iret_interception(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	++vcpu->stat.nmi_window_exits;
-	vcpu->arch.hflags |= HF_IRET_MASK;
+	svm->awaiting_iret_completion = true;
 	if (!sev_es_guest(vcpu->kvm)) {
 		svm_clr_intercept(svm, INTERCEPT_IRET);
 		svm->nmi_iret_rip = kvm_rip_read(vcpu);
@@ -3466,7 +3469,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
 	if (svm->nmi_l1_to_l2)
 		return;
 
-	vcpu->arch.hflags |= HF_NMI_MASK;
+	svm->nmi_masked = true;
 	if (!sev_es_guest(vcpu->kvm))
 		svm_set_intercept(svm, INTERCEPT_IRET);
 	++vcpu->stat.nmi_injections;
@@ -3580,7 +3583,7 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
 		return false;
 
 	ret = (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
-	      (vcpu->arch.hflags & HF_NMI_MASK);
+	      (svm->nmi_masked);
 
 	return ret;
 }
@@ -3602,7 +3605,7 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 
 static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
 {
-	return !!(vcpu->arch.hflags & HF_NMI_MASK);
+	return to_svm(vcpu)->nmi_masked;
 }
 
 static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
@@ -3610,11 +3613,11 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	if (masked) {
-		vcpu->arch.hflags |= HF_NMI_MASK;
+		svm->nmi_masked = true;
 		if (!sev_es_guest(vcpu->kvm))
 			svm_set_intercept(svm, INTERCEPT_IRET);
 	} else {
-		vcpu->arch.hflags &= ~HF_NMI_MASK;
+		svm->nmi_masked = false;
 		if (!sev_es_guest(vcpu->kvm))
 			svm_clr_intercept(svm, INTERCEPT_IRET);
 	}
@@ -3700,7 +3703,7 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
+	if (svm->nmi_masked && !svm->awaiting_iret_completion)
 		return; /* IRET will cause a vm exit */
 
 	if (!gif_set(svm)) {
@@ -3824,10 +3827,11 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
 	 * If we've made progress since setting HF_IRET_MASK, we've
 	 * executed an IRET and can allow NMI injection.
 	 */
-	if ((vcpu->arch.hflags & HF_IRET_MASK) &&
+	if (svm->awaiting_iret_completion &&
 	    (sev_es_guest(vcpu->kvm) ||
 	     kvm_rip_read(vcpu) != svm->nmi_iret_rip)) {
-		vcpu->arch.hflags &= ~(HF_NMI_MASK | HF_IRET_MASK);
+		svm->awaiting_iret_completion = false;
+		svm->nmi_masked = false;
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
 	}
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4826e6cc611bf1..587ddc150f9f34 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -237,8 +237,24 @@ struct vcpu_svm {
 
 	struct svm_nested_state nested;
 
+	/* NMI mask value, used when vNMI is not enabled */
+	bool nmi_masked;
+
+	/*
+	 * True when the NMI is still masked, but the guest's IRET was just
+	 * intercepted and KVM is waiting for the RIP to change, which will
+	 * signal that the IRET was retired and thus the NMI can be unmasked.
+	 */
+	bool awaiting_iret_completion;
+
+	/*
+	 * Set when KVM waits for IRET completion and needs to inject NMIs as
+	 * soon as it completes (e.g. an NMI is pending injection).
+	 * KVM takes over EFLAGS.TF for this.
+	 */
 	bool nmi_singlestep;
 	u64 nmi_singlestep_guest_rflags;
+
 	bool nmi_l1_to_l2;
 
 	unsigned long soft_int_csbase;
@@ -280,6 +296,9 @@ struct vcpu_svm {
 	bool guest_state_loaded;
 
 	bool x2avic_msrs_intercepted;
+
+	/* Guest GIF value which is used when vGIF is not enabled */
+	bool gif_value;
 };
 
 struct svm_cpu_data {
@@ -497,7 +516,7 @@ static inline void enable_gif(struct vcpu_svm *svm)
 	if (vmcb)
 		vmcb->control.int_ctl |= V_GIF_MASK;
 	else
-		svm->vcpu.arch.hflags |= HF_GIF_MASK;
+		svm->gif_value = true;
 }
 
 static inline void disable_gif(struct vcpu_svm *svm)
@@ -507,7 +526,7 @@ static inline void disable_gif(struct vcpu_svm *svm)
 	if (vmcb)
 		vmcb->control.int_ctl &= ~V_GIF_MASK;
 	else
-		svm->vcpu.arch.hflags &= ~HF_GIF_MASK;
+		svm->gif_value = false;
 }
 
 static inline bool gif_set(struct vcpu_svm *svm)
@@ -517,7 +536,7 @@ static inline bool gif_set(struct vcpu_svm *svm)
 	if (vmcb)
 		return !!(vmcb->control.int_ctl & V_GIF_MASK);
 	else
-		return !!(svm->vcpu.arch.hflags & HF_GIF_MASK);
+		return svm->gif_value;
 }
 
 static inline bool nested_npt_enabled(struct vcpu_svm *svm)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 05/11] KVM: x86: emulator: stop using raw host flags
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (3 preceding siblings ...)
  2022-11-29 19:37 ` [PATCH v2 04/11] KVM: SVM: drop the SVM specific H_FLAGS Maxim Levitsky
@ 2022-11-29 19:37 ` Maxim Levitsky
  2023-01-28  0:58   ` Sean Christopherson
  2022-11-29 19:37 ` [PATCH v2 06/11] KVM: SVM: add wrappers to enable/disable IRET interception Maxim Levitsky
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-11-29 19:37 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Maxim Levitsky

Instead of re-defining the H_FLAGS bits, just expose the 'in_smm'
and the 'in_guest_mode' host flags via emulator callbacks.

Also, while at it, compact the remaining H_FLAGS bit values to fill
the holes left by the recently removed flags.

No functional change is intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  6 +++---
 arch/x86/kvm/emulate.c          | 11 +++++------
 arch/x86/kvm/kvm_emulate.h      |  7 ++-----
 arch/x86/kvm/smm.c              |  2 --
 arch/x86/kvm/x86.c              | 14 +++++++++-----
 5 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9208ad7a6bd004..684a5519812fb2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2052,11 +2052,11 @@ enum {
 	TASK_SWITCH_GATE = 3,
 };
 
-#define HF_GUEST_MASK		(1 << 5) /* VCPU is in guest-mode */
+#define HF_GUEST_MASK		(1 << 0) /* VCPU is in guest-mode */
 
 #ifdef CONFIG_KVM_SMM
-#define HF_SMM_MASK		(1 << 6)
-#define HF_SMM_INSIDE_NMI_MASK	(1 << 7)
+#define HF_SMM_MASK		(1 << 1)
+#define HF_SMM_INSIDE_NMI_MASK	(1 << 2)
 
 # define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
 # define KVM_ADDRESS_SPACE_NUM 2
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5cc3efa0e21c17..d869131f84ffb3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2309,7 +2309,7 @@ static int em_lseg(struct x86_emulate_ctxt *ctxt)
 
 static int em_rsm(struct x86_emulate_ctxt *ctxt)
 {
-	if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_MASK) == 0)
+	if (!ctxt->ops->in_smm(ctxt))
 		return emulate_ud(ctxt);
 
 	if (ctxt->ops->leave_smm(ctxt))
@@ -5132,7 +5132,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 	const struct x86_emulate_ops *ops = ctxt->ops;
 	int rc = X86EMUL_CONTINUE;
 	int saved_dst_type = ctxt->dst.type;
-	unsigned emul_flags;
+	bool in_guest_mode = ctxt->ops->in_guest_mode(ctxt);
 
 	ctxt->mem_read.pos = 0;
 
@@ -5147,7 +5147,6 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 		goto done;
 	}
 
-	emul_flags = ctxt->ops->get_hflags(ctxt);
 	if (unlikely(ctxt->d &
 		     (No64|Undefined|Sse|Mmx|Intercept|CheckPerm|Priv|Prot|String))) {
 		if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
@@ -5181,7 +5180,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 				fetch_possible_mmx_operand(&ctxt->dst);
 		}
 
-		if (unlikely(emul_flags & X86EMUL_GUEST_MASK) && ctxt->intercept) {
+		if (unlikely(in_guest_mode) && ctxt->intercept) {
 			rc = emulator_check_intercept(ctxt, ctxt->intercept,
 						      X86_ICPT_PRE_EXCEPT);
 			if (rc != X86EMUL_CONTINUE)
@@ -5210,7 +5209,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 				goto done;
 		}
 
-		if (unlikely(emul_flags & X86EMUL_GUEST_MASK) && (ctxt->d & Intercept)) {
+		if (unlikely(in_guest_mode) && (ctxt->d & Intercept)) {
 			rc = emulator_check_intercept(ctxt, ctxt->intercept,
 						      X86_ICPT_POST_EXCEPT);
 			if (rc != X86EMUL_CONTINUE)
@@ -5264,7 +5263,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 
 special_insn:
 
-	if (unlikely(emul_flags & X86EMUL_GUEST_MASK) && (ctxt->d & Intercept)) {
+	if (unlikely(in_guest_mode) && (ctxt->d & Intercept)) {
 		rc = emulator_check_intercept(ctxt, ctxt->intercept,
 					      X86_ICPT_POST_MEMACCESS);
 		if (rc != X86EMUL_CONTINUE)
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 2d9662be833378..dd0203fbb27543 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -220,7 +220,8 @@ struct x86_emulate_ops {
 
 	void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);
 
-	unsigned (*get_hflags)(struct x86_emulate_ctxt *ctxt);
+	bool (*in_smm)(struct x86_emulate_ctxt *ctxt);
+	bool (*in_guest_mode)(struct x86_emulate_ctxt *ctxt);
 	int (*leave_smm)(struct x86_emulate_ctxt *ctxt);
 	void (*triple_fault)(struct x86_emulate_ctxt *ctxt);
 	int (*set_xcr)(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr);
@@ -275,10 +276,6 @@ enum x86emul_mode {
 	X86EMUL_MODE_PROT64,	/* 64-bit (long) mode.    */
 };
 
-/* These match some of the HF_* flags defined in kvm_host.h  */
-#define X86EMUL_GUEST_MASK           (1 << 5) /* VCPU is in guest-mode */
-#define X86EMUL_SMM_MASK             (1 << 6)
-
 /*
  * fastop functions are declared as taking a never-defined fastop parameter,
  * so they can't be called from C directly.
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index a9c1c2af8d94c2..a3a94edd2f0bc9 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -110,8 +110,6 @@ static void check_smram_offsets(void)
 
 void kvm_smm_changed(struct kvm_vcpu *vcpu, bool entering_smm)
 {
-	BUILD_BUG_ON(HF_SMM_MASK != X86EMUL_SMM_MASK);
-
 	trace_kvm_smm_transition(vcpu->vcpu_id, vcpu->arch.smbase, entering_smm);
 
 	if (entering_smm) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f18f579ebde81c..85d2a12c214dda 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8138,9 +8138,14 @@ static void emulator_set_nmi_mask(struct x86_emulate_ctxt *ctxt, bool masked)
 	static_call(kvm_x86_set_nmi_mask)(emul_to_vcpu(ctxt), masked);
 }
 
-static unsigned emulator_get_hflags(struct x86_emulate_ctxt *ctxt)
+static bool emulator_in_smm(struct x86_emulate_ctxt *ctxt)
 {
-	return emul_to_vcpu(ctxt)->arch.hflags;
+	return emul_to_vcpu(ctxt)->arch.hflags & HF_SMM_MASK;
+}
+
+static bool emulator_in_guest_mode(struct x86_emulate_ctxt *ctxt)
+{
+	return emul_to_vcpu(ctxt)->arch.hflags & HF_GUEST_MASK;
 }
 
 #ifndef CONFIG_KVM_SMM
@@ -8209,7 +8214,8 @@ static const struct x86_emulate_ops emulate_ops = {
 	.guest_has_fxsr      = emulator_guest_has_fxsr,
 	.guest_has_rdpid     = emulator_guest_has_rdpid,
 	.set_nmi_mask        = emulator_set_nmi_mask,
-	.get_hflags          = emulator_get_hflags,
+	.in_smm              = emulator_in_smm,
+	.in_guest_mode       = emulator_in_guest_mode,
 	.leave_smm           = emulator_leave_smm,
 	.triple_fault        = emulator_triple_fault,
 	.set_xcr             = emulator_set_xcr,
@@ -8281,8 +8287,6 @@ static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
 		     (cs_l && is_long_mode(vcpu))	? X86EMUL_MODE_PROT64 :
 		     cs_db				? X86EMUL_MODE_PROT32 :
 							  X86EMUL_MODE_PROT16;
-	BUILD_BUG_ON(HF_GUEST_MASK != X86EMUL_GUEST_MASK);
-
 	ctxt->interruptibility = 0;
 	ctxt->have_exception = false;
 	ctxt->exception.vector = -1;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 06/11] KVM: SVM: add wrappers to enable/disable IRET interception
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (4 preceding siblings ...)
  2022-11-29 19:37 ` [PATCH v2 05/11] KVM: x86: emulator: stop using raw host flags Maxim Levitsky
@ 2022-11-29 19:37 ` Maxim Levitsky
  2022-12-05 15:41   ` Santosh Shukla
  2022-11-29 19:37 ` [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface Maxim Levitsky
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-11-29 19:37 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Maxim Levitsky

SEV-ES guests don't use IRET interception to detect the end of
an NMI.

Therefore it makes sense to add wrappers that avoid repeating
the SEV-ES check at each call site.

No functional change is intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 512b2aa21137e2..cfed6ab29c839a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2468,16 +2468,29 @@ static int task_switch_interception(struct kvm_vcpu *vcpu)
 			       has_error_code, error_code);
 }
 
+static void svm_disable_iret_interception(struct vcpu_svm *svm)
+{
+	if (!sev_es_guest(svm->vcpu.kvm))
+		svm_clr_intercept(svm, INTERCEPT_IRET);
+}
+
+static void svm_enable_iret_interception(struct vcpu_svm *svm)
+{
+	if (!sev_es_guest(svm->vcpu.kvm))
+		svm_set_intercept(svm, INTERCEPT_IRET);
+}
+
 static int iret_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	++vcpu->stat.nmi_window_exits;
 	svm->awaiting_iret_completion = true;
-	if (!sev_es_guest(vcpu->kvm)) {
-		svm_clr_intercept(svm, INTERCEPT_IRET);
+
+	svm_disable_iret_interception(svm);
+	if (!sev_es_guest(vcpu->kvm))
 		svm->nmi_iret_rip = kvm_rip_read(vcpu);
-	}
+
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 	return 1;
 }
@@ -3470,8 +3483,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
 		return;
 
 	svm->nmi_masked = true;
-	if (!sev_es_guest(vcpu->kvm))
-		svm_set_intercept(svm, INTERCEPT_IRET);
+	svm_enable_iret_interception(svm);
 	++vcpu->stat.nmi_injections;
 }
 
@@ -3614,12 +3626,10 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 
 	if (masked) {
 		svm->nmi_masked = true;
-		if (!sev_es_guest(vcpu->kvm))
-			svm_set_intercept(svm, INTERCEPT_IRET);
+		svm_enable_iret_interception(svm);
 	} else {
 		svm->nmi_masked = false;
-		if (!sev_es_guest(vcpu->kvm))
-			svm_clr_intercept(svm, INTERCEPT_IRET);
+		svm_disable_iret_interception(svm);
 	}
 }
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (5 preceding siblings ...)
  2022-11-29 19:37 ` [PATCH v2 06/11] KVM: SVM: add wrappers to enable/disable IRET interception Maxim Levitsky
@ 2022-11-29 19:37 ` Maxim Levitsky
  2023-01-28  1:09   ` Sean Christopherson
  2023-01-31 22:28   ` Sean Christopherson
  2022-11-29 19:37 ` [PATCH v2 08/11] x86/cpu: Add CPUID feature bit for VNMI Maxim Levitsky
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-11-29 19:37 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Maxim Levitsky

This patch adds two new vendor callbacks:

- kvm_x86_get_hw_nmi_pending()
- kvm_x86_set_hw_nmi_pending()

Using these callbacks, KVM can take advantage of the hardware's
accelerated delayed NMI delivery (currently vNMI on SVM).

Once an NMI is set pending via this interface, it is assumed that
the hardware will deliver the NMI to the guest on its own once
all the x86 conditions for NMI delivery are met.

Note that the 'kvm_x86_set_hw_nmi_pending()' callback is allowed
to fail, in which case a normal NMI injection will be attempted
when the NMI can be delivered (possibly by using an NMI window).

With vNMI this can happen either if a vNMI is already pending or
if a nested guest is running.

When vNMI injection fails due to the 'vNMI is already pending'
condition, the new NMI will be dropped unless it can be injected
immediately, so no NMI window will be requested.
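
To make the intended contract concrete, here is a minimal sketch of the
fallback logic (a simplification of the process_nmi() change below, not
a literal excerpt):

	/*
	 * Illustrative sketch only: prefer the hardware interface and fall
	 * back to the software nmi_pending tracking if the callback refuses.
	 */
	if (!static_call(kvm_x86_set_hw_nmi_pending)(vcpu))
		vcpu->arch.nmi_pending++;
	kvm_make_request(KVM_REQ_EVENT, vcpu);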

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 ++
 arch/x86/include/asm/kvm_host.h    | 15 ++++++++++++-
 arch/x86/kvm/x86.c                 | 36 ++++++++++++++++++++++++++----
 3 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index abccd51dcfca1b..9e2db6cf7cc041 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -67,6 +67,8 @@ KVM_X86_OP(get_interrupt_shadow)
 KVM_X86_OP(patch_hypercall)
 KVM_X86_OP(inject_irq)
 KVM_X86_OP(inject_nmi)
+KVM_X86_OP_OPTIONAL_RET0(get_hw_nmi_pending)
+KVM_X86_OP_OPTIONAL_RET0(set_hw_nmi_pending)
 KVM_X86_OP(inject_exception)
 KVM_X86_OP(cancel_injection)
 KVM_X86_OP(interrupt_allowed)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 684a5519812fb2..46993ce61c92db 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -871,8 +871,13 @@ struct kvm_vcpu_arch {
 	u64 tsc_scaling_ratio; /* current scaling ratio */
 
 	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
-	unsigned nmi_pending; /* NMI queued after currently running handler */
+
+	unsigned int nmi_pending; /*
+				   * NMI queued after currently running handler
+				   * (not including a hardware pending NMI (e.g vNMI))
+				   */
 	bool nmi_injected;    /* Trying to inject an NMI this entry */
+
 	bool smi_pending;    /* SMI queued after currently running handler */
 	u8 handling_intr_from_guest;
 
@@ -1602,6 +1607,13 @@ struct kvm_x86_ops {
 	int (*nmi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
 	bool (*get_nmi_mask)(struct kvm_vcpu *vcpu);
 	void (*set_nmi_mask)(struct kvm_vcpu *vcpu, bool masked);
+
+	/* returns true if an NMI is pending injection at the hardware level (e.g. vNMI) */
+	bool (*get_hw_nmi_pending)(struct kvm_vcpu *vcpu);
+
+	/* attempts to make an NMI pending via the hardware interface (e.g. vNMI) */
+	bool (*set_hw_nmi_pending)(struct kvm_vcpu *vcpu);
+
 	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
 	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
 	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
@@ -1964,6 +1976,7 @@ int kvm_pic_set_irq(struct kvm_pic *pic, int irq, int irq_source_id, int level);
 void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
+int kvm_get_total_nmi_pending(struct kvm_vcpu *vcpu);
 
 void kvm_update_dr7(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 85d2a12c214dda..3c30e3f1106f79 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5103,7 +5103,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 	events->interrupt.shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
 
 	events->nmi.injected = vcpu->arch.nmi_injected;
-	events->nmi.pending = vcpu->arch.nmi_pending != 0;
+	events->nmi.pending = kvm_get_total_nmi_pending(vcpu) != 0;
 	events->nmi.masked = static_call(kvm_x86_get_nmi_mask)(vcpu);
 
 	/* events->sipi_vector is never valid when reporting to user space */
@@ -5191,9 +5191,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 
 	vcpu->arch.nmi_injected = events->nmi.injected;
 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
-		vcpu->arch.nmi_pending = events->nmi.pending;
+		atomic_add(events->nmi.pending, &vcpu->arch.nmi_queued);
+
 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
 
+	process_nmi(vcpu);
+
 	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
 	    lapic_in_kernel(vcpu))
 		vcpu->arch.apic->sipi_vector = events->sipi_vector;
@@ -10008,6 +10011,10 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
 static void process_nmi(struct kvm_vcpu *vcpu)
 {
 	unsigned limit = 2;
+	int nmi_to_queue = atomic_xchg(&vcpu->arch.nmi_queued, 0);
+
+	if (!nmi_to_queue)
+		return;
 
 	/*
 	 * x86 is limited to one NMI running, and one NMI pending after it.
@@ -10015,13 +10022,34 @@ static void process_nmi(struct kvm_vcpu *vcpu)
 	 * Otherwise, allow two (and we'll inject the first one immediately).
 	 */
 	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
-		limit = 1;
+		limit--;
+
+	/*
+	 * Also decrease the limit if an NMI is already queued in hardware.
+	 */
+	if (static_call(kvm_x86_get_hw_nmi_pending)(vcpu))
+		limit--;
 
-	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
+	if (limit <= 0)
+		return;
+
+	/* Attempt to use hardware NMI queueing */
+	if (static_call(kvm_x86_set_hw_nmi_pending)(vcpu)) {
+		limit--;
+		nmi_to_queue--;
+	}
+
+	vcpu->arch.nmi_pending += nmi_to_queue;
 	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 
+/* Return total number of NMIs pending injection to the VM */
+int kvm_get_total_nmi_pending(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.nmi_pending + static_call(kvm_x86_get_hw_nmi_pending)(vcpu);
+}
+
 void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
 				       unsigned long *vcpu_bitmap)
 {
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 08/11] x86/cpu: Add CPUID feature bit for VNMI
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (6 preceding siblings ...)
  2022-11-29 19:37 ` [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface Maxim Levitsky
@ 2022-11-29 19:37 ` Maxim Levitsky
  2022-11-29 19:37 ` [PATCH v2 09/11] KVM: SVM: Add VNMI bit definition Maxim Levitsky
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-11-29 19:37 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Maxim Levitsky,
	Santosh Shukla

From: Santosh Shukla <santosh.shukla@amd.com>

The VNMI feature allows the hypervisor to inject an NMI into the guest
without using the event injection mechanism. The benefit of VNMI over
event injection is that it does not require tracking the guest's NMI
state or intercepting IRET to detect NMI completion. VNMI achieves that
by exposing 3 capability bits in the VMCB intr_ctrl field which help
with virtualizing NMI injection and NMI masking.

The presence of this feature is indicated via CPUID function
0x8000000A_EDX[25].
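
As a hedged usage sketch (mirroring how a later patch in this series
gates the feature), the host side can simply check the new bit:

	/* Illustration only; the real check lives in svm_hardware_setup() */
	if (boot_cpu_has(X86_FEATURE_AMD_VNMI))
		pr_info("Virtual NMI supported\n");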

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1419c4e04d45f3..ed50f28bdf235b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -359,6 +359,7 @@
 #define X86_FEATURE_VGIF		(15*32+16) /* Virtual GIF */
 #define X86_FEATURE_X2AVIC		(15*32+18) /* Virtual x2apic */
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* Virtual SPEC_CTRL */
+#define X86_FEATURE_AMD_VNMI		(15*32+25) /* Virtual NMI */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* "" SVME addr check */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 09/11] KVM: SVM: Add VNMI bit definition
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (7 preceding siblings ...)
  2022-11-29 19:37 ` [PATCH v2 08/11] x86/cpu: Add CPUID feature bit for VNMI Maxim Levitsky
@ 2022-11-29 19:37 ` Maxim Levitsky
  2023-01-31 22:42   ` Sean Christopherson
  2022-11-29 19:37 ` [PATCH v2 10/11] KVM: SVM: implement support for vNMI Maxim Levitsky
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-11-29 19:37 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Maxim Levitsky,
	Santosh Shukla

From: Santosh Shukla <santosh.shukla@amd.com>

VNMI exposes 3 capability bits (V_NMI_PENDING, V_NMI_MASK, and
V_NMI_ENABLE) to virtualize NMI delivery and NMI masking. These
capability bits are part of VMCB::intr_ctrl:

V_NMI_PENDING(11) - Indicates that a virtual NMI is pending in the guest.
V_NMI_MASK(12)    - Indicates that virtual NMIs are masked in the guest.
V_NMI_ENABLE(26)  - Enables the NMI virtualization feature for the guest.

When the hypervisor wants to inject an NMI, it sets the V_NMI_PENDING
bit. The processor then clears V_NMI_PENDING and sets V_NMI_MASK, which
means the guest is handling the NMI. After the guest has handled the
NMI, the processor clears V_NMI_MASK on successful completion of the
IRET instruction, or when a VMEXIT occurs while delivering the virtual
NMI.

To enable the VNMI capability, the hypervisor needs to set the
V_NMI_ENABLE bit to 1.
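
For illustration, a minimal sketch of how a hypervisor could use these
bits (the real KVM usage comes in the following patches):

	/*
	 * Request a virtual NMI; the CPU clears V_NMI_PENDING and sets
	 * V_NMI_MASK when it actually delivers the NMI to the guest.
	 */
	if (!(vmcb->control.int_ctl & V_NMI_PENDING))
		vmcb->control.int_ctl |= V_NMI_PENDING;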

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
---
 arch/x86/include/asm/svm.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index cb1ee53ad3b189..26d6f549ce2b46 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -203,6 +203,13 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define X2APIC_MODE_SHIFT 30
 #define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
 
+#define V_NMI_PENDING_SHIFT 11
+#define V_NMI_PENDING (1 << V_NMI_PENDING_SHIFT)
+#define V_NMI_MASK_SHIFT 12
+#define V_NMI_MASK (1 << V_NMI_MASK_SHIFT)
+#define V_NMI_ENABLE_SHIFT 26
+#define V_NMI_ENABLE (1 << V_NMI_ENABLE_SHIFT)
+
 #define LBR_CTL_ENABLE_MASK BIT_ULL(0)
 #define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 10/11] KVM: SVM: implement support for vNMI
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (8 preceding siblings ...)
  2022-11-29 19:37 ` [PATCH v2 09/11] KVM: SVM: Add VNMI bit definition Maxim Levitsky
@ 2022-11-29 19:37 ` Maxim Levitsky
  2022-12-04 17:18   ` Maxim Levitsky
                     ` (3 more replies)
  2022-11-29 19:37 ` [PATCH v2 11/11] KVM: nSVM: implement support for nested VNMI Maxim Levitsky
                   ` (5 subsequent siblings)
  15 siblings, 4 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-11-29 19:37 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Maxim Levitsky,
	Santosh Shukla

This patch implements support for injecting pending NMIs via the
.set_hw_nmi_pending callback, using AMD's new vNMI feature.

Note that a vNMI can't cause a VM exit, which is needed when a
nested guest intercepts NMIs.

Therefore, to avoid breaking nesting, vNMI is inhibited while a
nested guest is running, and the legacy NMI window detection and
delivery method is used instead.

While it is possible to pass vNMI through when a nested guest
doesn't intercept NMIs, such usage is very uncommon, and it's not
worth optimizing for.
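
The inhibit rule described above boils down to the is_vnmi_enabled()
helper added in this patch; condensed from the diff below:

	/* L1's vNMI is inhibited while a nested guest is running */
	if (is_guest_mode(&svm->vcpu))
		return false;
	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE);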

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c |  42 +++++++++++++++
 arch/x86/kvm/svm/svm.c    | 111 ++++++++++++++++++++++++++++++--------
 arch/x86/kvm/svm/svm.h    |  10 ++++
 3 files changed, 140 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index e891318595113e..5bea672bf8b12d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -623,6 +623,42 @@ static bool is_evtinj_nmi(u32 evtinj)
 	return type == SVM_EVTINJ_TYPE_NMI;
 }
 
+static void nested_svm_save_vnmi(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb01 = svm->vmcb01.ptr;
+
+	/*
+	 * Copy the vNMI state back to software NMI tracking state
+	 * for the duration of the nested run
+	 */
+
+	svm->nmi_masked = vmcb01->control.int_ctl & V_NMI_MASK;
+	svm->vcpu.arch.nmi_pending += vmcb01->control.int_ctl & V_NMI_PENDING;
+}
+
+static void nested_svm_restore_vnmi(struct vcpu_svm *svm)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct vmcb *vmcb01 = svm->vmcb01.ptr;
+
+	/*
+	 * Restore the vNMI state from the software NMI tracking state
+	 * after a nested run
+	 */
+
+	if (svm->nmi_masked)
+		vmcb01->control.int_ctl |= V_NMI_MASK;
+	else
+		vmcb01->control.int_ctl &= ~V_NMI_MASK;
+
+	if (vcpu->arch.nmi_pending) {
+		vcpu->arch.nmi_pending--;
+		vmcb01->control.int_ctl |= V_NMI_PENDING;
+	} else
+		vmcb01->control.int_ctl &= ~V_NMI_PENDING;
+}
+
+
 static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 					  unsigned long vmcb12_rip,
 					  unsigned long vmcb12_csbase)
@@ -646,6 +682,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	else
 		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
 
+	if (vnmi)
+		nested_svm_save_vnmi(svm);
+
 	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
 	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
 	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
@@ -1049,6 +1088,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 		svm_update_lbrv(vcpu);
 	}
 
+	if (vnmi)
+		nested_svm_restore_vnmi(svm);
+
 	/*
 	 * On vmexit the  GIF is set to false and
 	 * no event can be injected in L1.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cfed6ab29c839a..bf10adcf3170a8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -230,6 +230,8 @@ module_param(dump_invalid_vmcb, bool, 0644);
 bool intercept_smi = true;
 module_param(intercept_smi, bool, 0444);
 
+bool vnmi = true;
+module_param(vnmi, bool, 0444);
 
 static bool svm_gp_erratum_intercept = true;
 
@@ -1299,6 +1301,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
 
+	if (vnmi)
+		svm->vmcb->control.int_ctl |= V_NMI_ENABLE;
+
 	if (vgif) {
 		svm_clr_intercept(svm, INTERCEPT_STGI);
 		svm_clr_intercept(svm, INTERCEPT_CLGI);
@@ -3487,6 +3492,39 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
 	++vcpu->stat.nmi_injections;
 }
 
+
+static bool svm_get_hw_nmi_pending(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (!is_vnmi_enabled(svm))
+		return false;
+
+	return !!(svm->vmcb->control.int_ctl & V_NMI_MASK);
+}
+
+static bool svm_set_hw_nmi_pending(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (!is_vnmi_enabled(svm))
+		return false;
+
+	if (svm->vmcb->control.int_ctl & V_NMI_PENDING)
+		return false;
+
+	svm->vmcb->control.int_ctl |= V_NMI_PENDING;
+	vmcb_mark_dirty(svm->vmcb, VMCB_INTR);
+
+	/*
+	 * NMI isn't yet technically injected but
+	 * this rough estimation should be good enough
+	 */
+	++vcpu->stat.nmi_injections;
+
+	return true;
+}
+
 static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3582,11 +3620,38 @@ static void svm_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 		svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
 }
 
+static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (is_vnmi_enabled(svm))
+		return svm->vmcb->control.int_ctl & V_NMI_MASK;
+	else
+		return svm->nmi_masked;
+}
+
+static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (is_vnmi_enabled(svm)) {
+		if (masked)
+			svm->vmcb->control.int_ctl |= V_NMI_MASK;
+		else
+			svm->vmcb->control.int_ctl &= ~V_NMI_MASK;
+	} else {
+		svm->nmi_masked = masked;
+		if (masked)
+			svm_enable_iret_interception(svm);
+		else
+			svm_disable_iret_interception(svm);
+	}
+}
+
 bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb = svm->vmcb;
-	bool ret;
 
 	if (!gif_set(svm))
 		return true;
@@ -3594,10 +3659,10 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
 	if (is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
 		return false;
 
-	ret = (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
-	      (svm->nmi_masked);
+	if (svm_get_nmi_mask(vcpu))
+		return true;
 
-	return ret;
+	return vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK;
 }
 
 static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
@@ -3615,24 +3680,6 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 	return 1;
 }
 
-static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
-{
-	return to_svm(vcpu)->nmi_masked;
-}
-
-static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	if (masked) {
-		svm->nmi_masked = true;
-		svm_enable_iret_interception(svm);
-	} else {
-		svm->nmi_masked = false;
-		svm_disable_iret_interception(svm);
-	}
-}
-
 bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3725,10 +3772,16 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 	/*
 	 * Something prevents NMI from been injected. Single step over possible
 	 * problem (IRET or exception injection or interrupt shadow)
+	 *
+	 * With vNMI we should never need an NMI window
+	 * (we can always inject a vNMI either by setting V_NMI_PENDING or via EVENTINJ)
 	 */
+	if (WARN_ON_ONCE(is_vnmi_enabled(svm)))
+		return;
+
 	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
-	svm->nmi_singlestep = true;
 	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
+	svm->nmi_singlestep = true;
 }
 
 static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
@@ -4770,6 +4823,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.patch_hypercall = svm_patch_hypercall,
 	.inject_irq = svm_inject_irq,
 	.inject_nmi = svm_inject_nmi,
+	.get_hw_nmi_pending = svm_get_hw_nmi_pending,
+	.set_hw_nmi_pending = svm_set_hw_nmi_pending,
 	.inject_exception = svm_inject_exception,
 	.cancel_injection = svm_cancel_injection,
 	.interrupt_allowed = svm_interrupt_allowed,
@@ -5058,6 +5113,16 @@ static __init int svm_hardware_setup(void)
 			pr_info("Virtual GIF supported\n");
 	}
 
+
+	vnmi = vgif && vnmi && boot_cpu_has(X86_FEATURE_AMD_VNMI);
+	if (vnmi)
+		pr_info("Virtual NMI enabled\n");
+
+	if (!vnmi) {
+		svm_x86_ops.get_hw_nmi_pending = NULL;
+		svm_x86_ops.set_hw_nmi_pending = NULL;
+	}
+
 	if (lbrv) {
 		if (!boot_cpu_has(X86_FEATURE_LBRV))
 			lbrv = false;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 587ddc150f9f34..0b7e1790fadde1 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
 extern int vgif;
 extern bool intercept_smi;
+extern bool vnmi;
 
 enum avic_modes {
 	AVIC_MODE_NONE = 0,
@@ -553,6 +554,15 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
 	       (msr < (APIC_BASE_MSR + 0x100));
 }
 
+static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
+{
+	/* L1's vNMI is inhibited while nested guest is running */
+	if (is_guest_mode(&svm->vcpu))
+		return false;
+
+	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE);
+}
+
 /* svm.c */
 #define MSR_INVALID				0xffffffffU
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v2 11/11] KVM: nSVM: implement support for nested VNMI
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (9 preceding siblings ...)
  2022-11-29 19:37 ` [PATCH v2 10/11] KVM: SVM: implement support for vNMI Maxim Levitsky
@ 2022-11-29 19:37 ` Maxim Levitsky
  2022-12-05 17:14   ` Santosh Shukla
  2023-02-01  0:44   ` Sean Christopherson
  2022-12-06  9:58 ` [PATCH v2 00/11] SVM: vNMI (with my fixes) Santosh Shukla
                   ` (4 subsequent siblings)
  15 siblings, 2 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-11-29 19:37 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Maxim Levitsky,
	Santosh Shukla

This patch allows L1 to use vNMI to accelerate its injection of NMIs
into L2, by passing the vNMI int_ctl bits through between vmcb12 and
vmcb02.

While L2 runs, L1's vNMI is inhibited, and L1's NMIs are injected
normally.

Supporting nested VNMI requires saving and restoring the VNMI bits on
nested entry and exit:
- If both L1 and L2 use VNMI, copy the VNMI bits from vmcb12 to vmcb02
  on entry and back on exit.
- If L1 uses VNMI and L2 doesn't, copy the VNMI bits from vmcb01 to
  vmcb02 on entry and back on exit.

Tested with KVM-unit-tests and a nested guest scenario.
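
Condensed, the two directions of the copy look roughly like this (both
hunks appear in the diff below):

	/* nested entry: pass the vNMI bits from vmcb12 through to vmcb02 */
	if (nested_vnmi_enabled(svm))
		int_ctl_vmcb12_bits |= (V_NMI_PENDING | V_NMI_ENABLE | V_NMI_MASK);

	/* nested exit: sync the vNMI bits from vmcb02 back to vmcb12 */
	if (nested_vnmi_enabled(svm))
		mask |= V_NMI_MASK | V_NMI_PENDING;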


Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 13 ++++++++++++-
 arch/x86/kvm/svm/svm.c    |  5 +++++
 arch/x86/kvm/svm/svm.h    |  6 ++++++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5bea672bf8b12d..81346665058e26 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -278,6 +278,11 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
 	if (CC(!nested_svm_check_tlb_ctl(vcpu, control->tlb_ctl)))
 		return false;
 
+	if (CC((control->int_ctl & V_NMI_ENABLE) &&
+		!vmcb12_is_intercept(control, INTERCEPT_NMI))) {
+		return false;
+	}
+
 	return true;
 }
 
@@ -427,6 +432,9 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
 	if (nested_vgif_enabled(svm))
 		mask |= V_GIF_MASK;
 
+	if (nested_vnmi_enabled(svm))
+		mask |= V_NMI_MASK | V_NMI_PENDING;
+
 	svm->nested.ctl.int_ctl        &= ~mask;
 	svm->nested.ctl.int_ctl        |= svm->vmcb->control.int_ctl & mask;
 }
@@ -682,8 +690,11 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	else
 		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
 
-	if (vnmi)
+	if (vnmi) {
 		nested_svm_save_vnmi(svm);
+		if (nested_vnmi_enabled(svm))
+			int_ctl_vmcb12_bits |= (V_NMI_PENDING | V_NMI_ENABLE | V_NMI_MASK);
+	}
 
 	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
 	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index bf10adcf3170a8..fb203f536d2f9b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4214,6 +4214,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	svm->vgif_enabled = vgif && guest_cpuid_has(vcpu, X86_FEATURE_VGIF);
 
+	svm->vnmi_enabled = vnmi && guest_cpuid_has(vcpu, X86_FEATURE_AMD_VNMI);
+
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
@@ -4967,6 +4969,9 @@ static __init void svm_set_cpu_caps(void)
 		if (vgif)
 			kvm_cpu_cap_set(X86_FEATURE_VGIF);
 
+		if (vnmi)
+			kvm_cpu_cap_set(X86_FEATURE_AMD_VNMI);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0b7e1790fadde1..8fb2085188c5ac 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -271,6 +271,7 @@ struct vcpu_svm {
 	bool pause_filter_enabled         : 1;
 	bool pause_threshold_enabled      : 1;
 	bool vgif_enabled                 : 1;
+	bool vnmi_enabled                 : 1;
 
 	u32 ldr_reg;
 	u32 dfr_reg;
@@ -545,6 +546,11 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
 	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
 }
 
+static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
+{
+	return svm->vnmi_enabled && (svm->nested.ctl.int_ctl & V_NMI_ENABLE);
+}
+
 static inline bool is_x2apic_msrpm_offset(u32 offset)
 {
 	/* 4 msrs per u8, and 4 u8 in u32 */
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 10/11] KVM: SVM: implement support for vNMI
  2022-11-29 19:37 ` [PATCH v2 10/11] KVM: SVM: implement support for vNMI Maxim Levitsky
@ 2022-12-04 17:18   ` Maxim Levitsky
  2022-12-05 17:07   ` Santosh Shukla
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-12-04 17:18 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Santosh Shukla

On Tue, 2022-11-29 at 21:37 +0200, Maxim Levitsky wrote:
> This patch implements support for injecting pending NMIs via the
> .set_hw_nmi_pending callback, using AMD's new vNMI feature.
> 
> Note that a vNMI can't cause a VM exit, which is needed when a
> nested guest intercepts NMIs.
> 
> Therefore, to avoid breaking nesting, vNMI is inhibited while a
> nested guest is running, and the legacy NMI window detection and
> delivery method is used instead.
> 
> While it is possible to pass vNMI through when a nested guest
> doesn't intercept NMIs, such usage is very uncommon, and it's not
> worth optimizing for.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c |  42 +++++++++++++++
>  arch/x86/kvm/svm/svm.c    | 111 ++++++++++++++++++++++++++++++--------
>  arch/x86/kvm/svm/svm.h    |  10 ++++
>  3 files changed, 140 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index e891318595113e..5bea672bf8b12d 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -623,6 +623,42 @@ static bool is_evtinj_nmi(u32 evtinj)
>  	return type == SVM_EVTINJ_TYPE_NMI;
>  }


A few bits of self review - I noticed some mistakes:

>  
> +static void nested_svm_save_vnmi(struct vcpu_svm *svm)
> +{
> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
> +
> +	/*
> +	 * Copy the vNMI state back to software NMI tracking state
> +	 * for the duration of the nested run
> +	 */
> +
> +	svm->nmi_masked = vmcb01->control.int_ctl & V_NMI_MASK;
> +	svm->vcpu.arch.nmi_pending += vmcb01->control.int_ctl & V_NMI_PENDING;
Sorry for the mistake here - this should be converted to a boolean first, something like this:


if (vmcb01->control.int_ctl & V_NMI_PENDING)
	svm->vcpu.arch.nmi_pending++;
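
(For context, a sketch of how the whole helper might look with that fix folded
in - purely illustrative, not the posted patch:)

static void nested_svm_save_vnmi(struct vcpu_svm *svm)
{
	struct vmcb *vmcb01 = svm->vmcb01.ptr;

	/*
	 * Copy the vNMI state back to software NMI tracking state
	 * for the duration of the nested run.
	 */
	svm->nmi_masked = vmcb01->control.int_ctl & V_NMI_MASK;

	/* V_NMI_PENDING is a mask bit, not a count - convert it to a bool first */
	if (vmcb01->control.int_ctl & V_NMI_PENDING)
		svm->vcpu.arch.nmi_pending++;
}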



> +}
> +
> +static void nested_svm_restore_vnmi(struct vcpu_svm *svm)
> +{
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
> +
> +	/*
> +	 * Restore the vNMI state from the software NMI tracking state
> +	 * after a nested run
> +	 */
> +
> +	if (svm->nmi_masked)
> +		vmcb01->control.int_ctl |= V_NMI_MASK;
> +	else
> +		vmcb01->control.int_ctl &= ~V_NMI_MASK;
> +
> +	if (vcpu->arch.nmi_pending) {
> +		vcpu->arch.nmi_pending--;
> +		vmcb01->control.int_ctl |= V_NMI_PENDING;
> +	} else
> +		vmcb01->control.int_ctl &= ~V_NMI_PENDING;
> +}
> +
> +
>  static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  					  unsigned long vmcb12_rip,
>  					  unsigned long vmcb12_csbase)
> @@ -646,6 +682,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  	else
>  		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
>  
> +	if (vnmi)
> +		nested_svm_save_vnmi(svm);
> +
>  	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
>  	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
>  	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
> @@ -1049,6 +1088,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>  		svm_update_lbrv(vcpu);
>  	}
>  
> +	if (vnmi)
> +		nested_svm_restore_vnmi(svm);
> +
>  	/*
>  	 * On vmexit the  GIF is set to false and
>  	 * no event can be injected in L1.
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index cfed6ab29c839a..bf10adcf3170a8 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -230,6 +230,8 @@ module_param(dump_invalid_vmcb, bool, 0644);
>  bool intercept_smi = true;
>  module_param(intercept_smi, bool, 0444);
>  
> +bool vnmi = true;
> +module_param(vnmi, bool, 0444);
>  
>  static bool svm_gp_erratum_intercept = true;
>  
> @@ -1299,6 +1301,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>  	if (kvm_vcpu_apicv_active(vcpu))
>  		avic_init_vmcb(svm, vmcb);
>  
> +	if (vnmi)
> +		svm->vmcb->control.int_ctl |= V_NMI_ENABLE;
> +
>  	if (vgif) {
>  		svm_clr_intercept(svm, INTERCEPT_STGI);
>  		svm_clr_intercept(svm, INTERCEPT_CLGI);
> @@ -3487,6 +3492,39 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>  	++vcpu->stat.nmi_injections;
>  }
>  
> +
> +static bool svm_get_hw_nmi_pending(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (!is_vnmi_enabled(svm))
> +		return false;
> +
> +	return !!(svm->vmcb->control.int_ctl & V_NMI_MASK);
> +}
> +
> +static bool svm_set_hw_nmi_pending(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (!is_vnmi_enabled(svm))
> +		return false;
> +
> +	if (svm->vmcb->control.int_ctl & V_NMI_PENDING)
> +		return false;
> +
> +	svm->vmcb->control.int_ctl |= V_NMI_PENDING;
> +	vmcb_mark_dirty(svm->vmcb, VMCB_INTR);

I think vmcb_mark_dirty() is not needed here anymore, as KVM always leaves the
VMCB_INTR dirty bit set due to the vTPR update.
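
(Purely as an illustration, and assuming that observation holds, the hunk above
would then reduce to something like this:)

static bool svm_set_hw_nmi_pending(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	if (!is_vnmi_enabled(svm))
		return false;

	if (svm->vmcb->control.int_ctl & V_NMI_PENDING)
		return false;

	/* no vmcb_mark_dirty(): VMCB_INTR is assumed to be left dirty anyway */
	svm->vmcb->control.int_ctl |= V_NMI_PENDING;

	/*
	 * NMI isn't yet technically injected but
	 * this rough estimation should be good enough
	 */
	++vcpu->stat.nmi_injections;

	return true;
}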


Best regards,
	Maxim Levitsky

> +
> +	/*
> +	 * NMI isn't yet technically injected but
> +	 * this rough estimation should be good enough
> +	 */
> +	++vcpu->stat.nmi_injections;
> +
> +	return true;
> +}
> +
>  static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -3582,11 +3620,38 @@ static void svm_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>  		svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
>  }
>  
> +static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (is_vnmi_enabled(svm))
> +		return svm->vmcb->control.int_ctl & V_NMI_MASK;
> +	else
> +		return svm->nmi_masked;
> +}
> +
> +static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (is_vnmi_enabled(svm)) {
> +		if (masked)
> +			svm->vmcb->control.int_ctl |= V_NMI_MASK;
> +		else
> +			svm->vmcb->control.int_ctl &= ~V_NMI_MASK;
> +	} else {
> +		svm->nmi_masked = masked;
> +		if (masked)
> +			svm_enable_iret_interception(svm);
> +		else
> +			svm_disable_iret_interception(svm);
> +	}
> +}
> +
>  bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  	struct vmcb *vmcb = svm->vmcb;
> -	bool ret;
>  
>  	if (!gif_set(svm))
>  		return true;
> @@ -3594,10 +3659,10 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
>  	if (is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
>  		return false;
>  
> -	ret = (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
> -	      (svm->nmi_masked);
> +	if (svm_get_nmi_mask(vcpu))
> +		return true;
>  
> -	return ret;
> +	return vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK;
>  }
>  
>  static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> @@ -3615,24 +3680,6 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>  	return 1;
>  }
>  
> -static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> -{
> -	return to_svm(vcpu)->nmi_masked;
> -}
> -
> -static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> -{
> -	struct vcpu_svm *svm = to_svm(vcpu);
> -
> -	if (masked) {
> -		svm->nmi_masked = true;
> -		svm_enable_iret_interception(svm);
> -	} else {
> -		svm->nmi_masked = false;
> -		svm_disable_iret_interception(svm);
> -	}
> -}
> -
>  bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -3725,10 +3772,16 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  	/*
>  	 * Something prevents NMI from been injected. Single step over possible
>  	 * problem (IRET or exception injection or interrupt shadow)
> +	 *
> +	 * With vNMI we should never need an NMI window
> +	 * (we can always inject vNMI either by setting VNMI_PENDING or by EVENTINJ)
>  	 */
> +	if (WARN_ON_ONCE(is_vnmi_enabled(svm)))
> +		return;
> +
>  	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> -	svm->nmi_singlestep = true;
>  	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> +	svm->nmi_singlestep = true;
>  }
>  
>  static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> @@ -4770,6 +4823,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.patch_hypercall = svm_patch_hypercall,
>  	.inject_irq = svm_inject_irq,
>  	.inject_nmi = svm_inject_nmi,
> +	.get_hw_nmi_pending = svm_get_hw_nmi_pending,
> +	.set_hw_nmi_pending = svm_set_hw_nmi_pending,
>  	.inject_exception = svm_inject_exception,
>  	.cancel_injection = svm_cancel_injection,
>  	.interrupt_allowed = svm_interrupt_allowed,
> @@ -5058,6 +5113,16 @@ static __init int svm_hardware_setup(void)
>  			pr_info("Virtual GIF supported\n");
>  	}
>  
> +
> +	vnmi = vgif && vnmi && boot_cpu_has(X86_FEATURE_AMD_VNMI);
> +	if (vnmi)
> +		pr_info("Virtual NMI enabled\n");
> +
> +	if (!vnmi) {
> +		svm_x86_ops.get_hw_nmi_pending = NULL;
> +		svm_x86_ops.set_hw_nmi_pending = NULL;
> +	}
> +
>  	if (lbrv) {
>  		if (!boot_cpu_has(X86_FEATURE_LBRV))
>  			lbrv = false;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 587ddc150f9f34..0b7e1790fadde1 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
>  extern bool npt_enabled;
>  extern int vgif;
>  extern bool intercept_smi;
> +extern bool vnmi;
>  
>  enum avic_modes {
>  	AVIC_MODE_NONE = 0,
> @@ -553,6 +554,15 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
>  	       (msr < (APIC_BASE_MSR + 0x100));
>  }
>  
> +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> +{
> +	/* L1's vNMI is inhibited while nested guest is running */
> +	if (is_guest_mode(&svm->vcpu))
> +		return false;
> +
> +	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE);
> +}
> +
>  /* svm.c */
>  #define MSR_INVALID				0xffffffffU
>  



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 01/11] KVM: nSVM: don't sync back tlb_ctl on nested VM exit
  2022-11-29 19:37 ` [PATCH v2 01/11] KVM: nSVM: don't sync back tlb_ctl on nested VM exit Maxim Levitsky
@ 2022-12-05 14:05   ` Santosh Shukla
  2022-12-06 12:13     ` Maxim Levitsky
  0 siblings, 1 reply; 66+ messages in thread
From: Santosh Shukla @ 2022-12-05 14:05 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, santosh.shukla

Hi Maxim,

On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> The CPU doesn't change TLB_CTL value as stated in the PRM (15.16.2):
> 
nits:
s / PRM (15.16.2) / APM (15.16.1 - TLB Flush)

>   "The VMRUN instruction reads, but does not change, the
>   value of the TLB_CONTROL field"
> 
> Therefore the KVM shouldn't do that either.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index bc9cd7086fa972..37af0338da7c32 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1010,7 +1010,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>  		vmcb12->control.next_rip  = vmcb02->control.next_rip;
>  
>  	vmcb12->control.int_ctl           = svm->nested.ctl.int_ctl;
> -	vmcb12->control.tlb_ctl           = svm->nested.ctl.tlb_ctl;
>  	vmcb12->control.event_inj         = svm->nested.ctl.event_inj;
>  	vmcb12->control.event_inj_err     = svm->nested.ctl.event_inj_err;
>  


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 04/11] KVM: SVM: drop the SVM specific H_FLAGS
  2022-11-29 19:37 ` [PATCH v2 04/11] KVM: SVM: drop the SVM specific H_FLAGS Maxim Levitsky
@ 2022-12-05 15:31   ` Santosh Shukla
  2023-01-28  0:56   ` Sean Christopherson
  1 sibling, 0 replies; 66+ messages in thread
From: Santosh Shukla @ 2022-12-05 15:31 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, santosh.shukla

On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> GIF and 'waiting for IRET' are used only for the SVM and thus should
> not be in H_FLAGS.
> 
> NMI mask is not x86 specific but it is only used for SVM without vNMI.
> 
> The VMX have similar concept of NMI mask (soft_vnmi_blocked),
> and it is used when its 'vNMI' feature is not enabled,
> but because the VMX can't intercept IRET, it is more of a hack,
> and thus should not use common host flags either.
> 
> No functional change is intended.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  3 ---
>  arch/x86/kvm/svm/svm.c          | 22 +++++++++++++---------
>  arch/x86/kvm/svm/svm.h          | 25 ++++++++++++++++++++++---
>  3 files changed, 35 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 70af7240a1d5af..9208ad7a6bd004 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2052,9 +2052,6 @@ enum {
>  	TASK_SWITCH_GATE = 3,
>  };
>  
> -#define HF_GIF_MASK		(1 << 0)
> -#define HF_NMI_MASK		(1 << 3)
> -#define HF_IRET_MASK		(1 << 4)
>  #define HF_GUEST_MASK		(1 << 5) /* VCPU is in guest-mode */
>  
>  #ifdef CONFIG_KVM_SMM
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 91352d69284524..512b2aa21137e2 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1326,6 +1326,9 @@ static void __svm_vcpu_reset(struct kvm_vcpu *vcpu)
>  	vcpu->arch.microcode_version = 0x01000065;
>  	svm->tsc_ratio_msr = kvm_caps.default_tsc_scaling_ratio;
>  
> +	svm->nmi_masked = false;
> +	svm->awaiting_iret_completion = false;
> +
>  	if (sev_es_guest(vcpu->kvm))
>  		sev_es_vcpu_reset(svm);
>  }
> @@ -2470,7 +2473,7 @@ static int iret_interception(struct kvm_vcpu *vcpu)
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
>  	++vcpu->stat.nmi_window_exits;
> -	vcpu->arch.hflags |= HF_IRET_MASK;
> +	svm->awaiting_iret_completion = true;
>  	if (!sev_es_guest(vcpu->kvm)) {
>  		svm_clr_intercept(svm, INTERCEPT_IRET);
>  		svm->nmi_iret_rip = kvm_rip_read(vcpu);
> @@ -3466,7 +3469,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>  	if (svm->nmi_l1_to_l2)
>  		return;
>  
> -	vcpu->arch.hflags |= HF_NMI_MASK;
> +	svm->nmi_masked = true;
>  	if (!sev_es_guest(vcpu->kvm))
>  		svm_set_intercept(svm, INTERCEPT_IRET);
>  	++vcpu->stat.nmi_injections;
> @@ -3580,7 +3583,7 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
>  		return false;
>  
>  	ret = (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
> -	      (vcpu->arch.hflags & HF_NMI_MASK);
> +	      (svm->nmi_masked);
>  
>  	return ret;
>  }
> @@ -3602,7 +3605,7 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>  
>  static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
>  {
> -	return !!(vcpu->arch.hflags & HF_NMI_MASK);
> +	return to_svm(vcpu)->nmi_masked;
>  }
>  
>  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> @@ -3610,11 +3613,11 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
>  	if (masked) {
> -		vcpu->arch.hflags |= HF_NMI_MASK;
> +		svm->nmi_masked = true;
>  		if (!sev_es_guest(vcpu->kvm))
>  			svm_set_intercept(svm, INTERCEPT_IRET);
>  	} else {
> -		vcpu->arch.hflags &= ~HF_NMI_MASK;
> +		svm->nmi_masked = false;
>  		if (!sev_es_guest(vcpu->kvm))
>  			svm_clr_intercept(svm, INTERCEPT_IRET);
>  	}
> @@ -3700,7 +3703,7 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> -	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
> +	if (svm->nmi_masked && !svm->awaiting_iret_completion)
>  		return; /* IRET will cause a vm exit */
>  
>  	if (!gif_set(svm)) {
> @@ -3824,10 +3827,11 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
>  	 * If we've made progress since setting HF_IRET_MASK, we've
>  	 * executed an IRET and can allow NMI injection.
>  	 */
> -	if ((vcpu->arch.hflags & HF_IRET_MASK) &&
> +	if (svm->awaiting_iret_completion &&
>  	    (sev_es_guest(vcpu->kvm) ||
>  	     kvm_rip_read(vcpu) != svm->nmi_iret_rip)) {
> -		vcpu->arch.hflags &= ~(HF_NMI_MASK | HF_IRET_MASK);
> +		svm->awaiting_iret_completion = false;
> +		svm->nmi_masked = false;
>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
>  	}
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 4826e6cc611bf1..587ddc150f9f34 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -237,8 +237,24 @@ struct vcpu_svm {
>  
>  	struct svm_nested_state nested;
>  
> +	/* NMI mask value, used when vNMI is not enabled */
> +	bool nmi_masked;
> +
> +	/*
> +	 * True when the NMI still masked but guest IRET was just intercepted
> +	 * and KVM is waiting for RIP change which will signal that this IRET was
> +	 * retired and thus NMI can be unmasked.
> +	 */
> +	bool awaiting_iret_completion;
> +
> +	/*
> +	 * Set when KVM waits for IRET completion and needs to
> +	 * inject NMIs as soon as it completes (e.g NMI is pending injection).
> +	 * The KVM takes over EFLAGS.TF for this.
> +	 */
>  	bool nmi_singlestep;
>  	u64 nmi_singlestep_guest_rflags;
> +
^^^ blank line.

Thanks,
Santosh
>  	bool nmi_l1_to_l2;
>  
>  	unsigned long soft_int_csbase;
> @@ -280,6 +296,9 @@ struct vcpu_svm {
>  	bool guest_state_loaded;
>  
>  	bool x2avic_msrs_intercepted;
> +
> +	/* Guest GIF value which is used when vGIF is not enabled */
> +	bool gif_value;
>  };
>  
>  struct svm_cpu_data {
> @@ -497,7 +516,7 @@ static inline void enable_gif(struct vcpu_svm *svm)
>  	if (vmcb)
>  		vmcb->control.int_ctl |= V_GIF_MASK;
>  	else
> -		svm->vcpu.arch.hflags |= HF_GIF_MASK;
> +		svm->gif_value = true;
>  }
>  
>  static inline void disable_gif(struct vcpu_svm *svm)
> @@ -507,7 +526,7 @@ static inline void disable_gif(struct vcpu_svm *svm)
>  	if (vmcb)
>  		vmcb->control.int_ctl &= ~V_GIF_MASK;
>  	else
> -		svm->vcpu.arch.hflags &= ~HF_GIF_MASK;
> +		svm->gif_value = false;
>  }
>  
>  static inline bool gif_set(struct vcpu_svm *svm)
> @@ -517,7 +536,7 @@ static inline bool gif_set(struct vcpu_svm *svm)
>  	if (vmcb)
>  		return !!(vmcb->control.int_ctl & V_GIF_MASK);
>  	else
> -		return !!(svm->vcpu.arch.hflags & HF_GIF_MASK);
> +		return svm->gif_value;
>  }
>  
>  static inline bool nested_npt_enabled(struct vcpu_svm *svm)


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 06/11] KVM: SVM: add wrappers to enable/disable IRET interception
  2022-11-29 19:37 ` [PATCH v2 06/11] KVM: SVM: add wrappers to enable/disable IRET interception Maxim Levitsky
@ 2022-12-05 15:41   ` Santosh Shukla
  2022-12-06 12:14     ` Maxim Levitsky
  0 siblings, 1 reply; 66+ messages in thread
From: Santosh Shukla @ 2022-12-05 15:41 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, santosh.shukla

On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> SEV-ES guests don't use IRET interception for the detection of
> an end of a NMI.
> 
> Therefore it makes sense to create a wrapper to avoid repeating
> the check for the SEV-ES.
> 
> No functional change is intended.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/svm.c | 28 +++++++++++++++++++---------
>  1 file changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 512b2aa21137e2..cfed6ab29c839a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2468,16 +2468,29 @@ static int task_switch_interception(struct kvm_vcpu *vcpu)
>  			       has_error_code, error_code);
>  }
>  
> +static void svm_disable_iret_interception(struct vcpu_svm *svm)
> +{
> +	if (!sev_es_guest(svm->vcpu.kvm))
> +		svm_clr_intercept(svm, INTERCEPT_IRET);
> +}
> +
> +static void svm_enable_iret_interception(struct vcpu_svm *svm)
> +{
> +	if (!sev_es_guest(svm->vcpu.kvm))
> +		svm_set_intercept(svm, INTERCEPT_IRET);
> +}
> +

nits:
s/_iret_interception/_iret_intercept/
Does that make sense?

Thanks,
Santosh

>  static int iret_interception(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
>  	++vcpu->stat.nmi_window_exits;
>  	svm->awaiting_iret_completion = true;
> -	if (!sev_es_guest(vcpu->kvm)) {
> -		svm_clr_intercept(svm, INTERCEPT_IRET);
> +
> +	svm_disable_iret_interception(svm);
> +	if (!sev_es_guest(vcpu->kvm))
>  		svm->nmi_iret_rip = kvm_rip_read(vcpu);
> -	}
> +
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  	return 1;
>  }
> @@ -3470,8 +3483,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>  		return;
>  
>  	svm->nmi_masked = true;
> -	if (!sev_es_guest(vcpu->kvm))
> -		svm_set_intercept(svm, INTERCEPT_IRET);
> +	svm_enable_iret_interception(svm);
>  	++vcpu->stat.nmi_injections;
>  }
>  
> @@ -3614,12 +3626,10 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
>  
>  	if (masked) {
>  		svm->nmi_masked = true;
> -		if (!sev_es_guest(vcpu->kvm))
> -			svm_set_intercept(svm, INTERCEPT_IRET);
> +		svm_enable_iret_interception(svm);
>  	} else {
>  		svm->nmi_masked = false;
> -		if (!sev_es_guest(vcpu->kvm))
> -			svm_clr_intercept(svm, INTERCEPT_IRET);
> +		svm_disable_iret_interception(svm);
>  	}
>  }
>  


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 10/11] KVM: SVM: implement support for vNMI
  2022-11-29 19:37 ` [PATCH v2 10/11] KVM: SVM: implement support for vNMI Maxim Levitsky
  2022-12-04 17:18   ` Maxim Levitsky
@ 2022-12-05 17:07   ` Santosh Shukla
  2023-01-28  1:10   ` Sean Christopherson
  2023-02-01  0:22   ` Sean Christopherson
  3 siblings, 0 replies; 66+ messages in thread
From: Santosh Shukla @ 2022-12-05 17:07 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Santosh Shukla

On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> This patch implements support for injecting pending
> NMIs via the .kvm_x86_set_hw_nmi_pending callback, using AMD's new
> vNMI feature.
> 
> Note that the vNMI can't cause a VM exit, which is needed
> when a nested guest intercepts NMIs.
> 
> Therefore to avoid breaking nesting, the vNMI is inhibited while
> a nested guest is running and instead, the legacy NMI window
> detection and delivery method is used.
> 
> While it is possible to pass through the vNMI if a nested guest
> doesn't intercept NMIs, such usage is very uncommon, and it's
> not worth optimizing for.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c |  42 +++++++++++++++
>  arch/x86/kvm/svm/svm.c    | 111 ++++++++++++++++++++++++++++++--------
>  arch/x86/kvm/svm/svm.h    |  10 ++++
>  3 files changed, 140 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index e891318595113e..5bea672bf8b12d 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -623,6 +623,42 @@ static bool is_evtinj_nmi(u32 evtinj)
>  	return type == SVM_EVTINJ_TYPE_NMI;
>  }
>  
> +static void nested_svm_save_vnmi(struct vcpu_svm *svm)
> +{
> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
> +
> +	/*
> +	 * Copy the vNMI state back to software NMI tracking state
> +	 * for the duration of the nested run
> +	 */
> +
nits - Extra line.

> +	svm->nmi_masked = vmcb01->control.int_ctl & V_NMI_MASK;
> +	svm->vcpu.arch.nmi_pending += vmcb01->control.int_ctl & V_NMI_PENDING;
> +}
> +
> +static void nested_svm_restore_vnmi(struct vcpu_svm *svm)
> +{
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
> +
> +	/*
> +	 * Restore the vNMI state from the software NMI tracking state
> +	 * after a nested run
> +	 */
> +
Ditto...

Thanks,
Santosh
> +	if (svm->nmi_masked)
> +		vmcb01->control.int_ctl |= V_NMI_MASK;
> +	else
> +		vmcb01->control.int_ctl &= ~V_NMI_MASK;
> +
> +	if (vcpu->arch.nmi_pending) {
> +		vcpu->arch.nmi_pending--;
> +		vmcb01->control.int_ctl |= V_NMI_PENDING;
> +	} else
> +		vmcb01->control.int_ctl &= ~V_NMI_PENDING;
> +}
> +
> +
>  static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  					  unsigned long vmcb12_rip,
>  					  unsigned long vmcb12_csbase)
> @@ -646,6 +682,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  	else
>  		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
>  
> +	if (vnmi)
> +		nested_svm_save_vnmi(svm);
> +
>  	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
>  	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
>  	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
> @@ -1049,6 +1088,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>  		svm_update_lbrv(vcpu);
>  	}
>  
> +	if (vnmi)
> +		nested_svm_restore_vnmi(svm);
> +
>  	/*
>  	 * On vmexit the  GIF is set to false and
>  	 * no event can be injected in L1.
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index cfed6ab29c839a..bf10adcf3170a8 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -230,6 +230,8 @@ module_param(dump_invalid_vmcb, bool, 0644);
>  bool intercept_smi = true;
>  module_param(intercept_smi, bool, 0444);
>  
> +bool vnmi = true;
> +module_param(vnmi, bool, 0444);
>  
>  static bool svm_gp_erratum_intercept = true;
>  
> @@ -1299,6 +1301,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>  	if (kvm_vcpu_apicv_active(vcpu))
>  		avic_init_vmcb(svm, vmcb);
>  
> +	if (vnmi)
> +		svm->vmcb->control.int_ctl |= V_NMI_ENABLE;
> +
>  	if (vgif) {
>  		svm_clr_intercept(svm, INTERCEPT_STGI);
>  		svm_clr_intercept(svm, INTERCEPT_CLGI);
> @@ -3487,6 +3492,39 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>  	++vcpu->stat.nmi_injections;
>  }
>  
> +
> +static bool svm_get_hw_nmi_pending(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (!is_vnmi_enabled(svm))
> +		return false;
> +
> +	return !!(svm->vmcb->control.int_ctl & V_NMI_MASK);
> +}
> +
> +static bool svm_set_hw_nmi_pending(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (!is_vnmi_enabled(svm))
> +		return false;
> +
> +	if (svm->vmcb->control.int_ctl & V_NMI_PENDING)
> +		return false;
> +
> +	svm->vmcb->control.int_ctl |= V_NMI_PENDING;
> +	vmcb_mark_dirty(svm->vmcb, VMCB_INTR);
> +
> +	/*
> +	 * NMI isn't yet technically injected but
> +	 * this rough estimation should be good enough
> +	 */
> +	++vcpu->stat.nmi_injections;
> +
> +	return true;
> +}
> +
>  static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -3582,11 +3620,38 @@ static void svm_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>  		svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
>  }
>  
> +static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (is_vnmi_enabled(svm))
> +		return svm->vmcb->control.int_ctl & V_NMI_MASK;
> +	else
> +		return svm->nmi_masked;
> +}
> +
> +static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (is_vnmi_enabled(svm)) {
> +		if (masked)
> +			svm->vmcb->control.int_ctl |= V_NMI_MASK;
> +		else
> +			svm->vmcb->control.int_ctl &= ~V_NMI_MASK;
> +	} else {
> +		svm->nmi_masked = masked;
> +		if (masked)
> +			svm_enable_iret_interception(svm);
> +		else
> +			svm_disable_iret_interception(svm);
> +	}
> +}
> +
>  bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  	struct vmcb *vmcb = svm->vmcb;
> -	bool ret;
>  
>  	if (!gif_set(svm))
>  		return true;
> @@ -3594,10 +3659,10 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
>  	if (is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
>  		return false;
>  
> -	ret = (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
> -	      (svm->nmi_masked);
> +	if (svm_get_nmi_mask(vcpu))
> +		return true;
>  
> -	return ret;
> +	return vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK;
>  }
>  
>  static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> @@ -3615,24 +3680,6 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>  	return 1;
>  }
>  
> -static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> -{
> -	return to_svm(vcpu)->nmi_masked;
> -}
> -
> -static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> -{
> -	struct vcpu_svm *svm = to_svm(vcpu);
> -
> -	if (masked) {
> -		svm->nmi_masked = true;
> -		svm_enable_iret_interception(svm);
> -	} else {
> -		svm->nmi_masked = false;
> -		svm_disable_iret_interception(svm);
> -	}
> -}
> -
>  bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -3725,10 +3772,16 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  	/*
>  	 * Something prevents NMI from been injected. Single step over possible
>  	 * problem (IRET or exception injection or interrupt shadow)
> +	 *
> +	 * With vNMI we should never need an NMI window
> +	 * (we can always inject vNMI either by setting VNMI_PENDING or by EVENTINJ)
>  	 */
> +	if (WARN_ON_ONCE(is_vnmi_enabled(svm)))
> +		return;
> +
>  	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> -	svm->nmi_singlestep = true;
>  	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> +	svm->nmi_singlestep = true;
>  }
>  
>  static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> @@ -4770,6 +4823,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.patch_hypercall = svm_patch_hypercall,
>  	.inject_irq = svm_inject_irq,
>  	.inject_nmi = svm_inject_nmi,
> +	.get_hw_nmi_pending = svm_get_hw_nmi_pending,
> +	.set_hw_nmi_pending = svm_set_hw_nmi_pending,
>  	.inject_exception = svm_inject_exception,
>  	.cancel_injection = svm_cancel_injection,
>  	.interrupt_allowed = svm_interrupt_allowed,
> @@ -5058,6 +5113,16 @@ static __init int svm_hardware_setup(void)
>  			pr_info("Virtual GIF supported\n");
>  	}
>  
> +
> +	vnmi = vgif && vnmi && boot_cpu_has(X86_FEATURE_AMD_VNMI);
> +	if (vnmi)
> +		pr_info("Virtual NMI enabled\n");
> +
> +	if (!vnmi) {
> +		svm_x86_ops.get_hw_nmi_pending = NULL;
> +		svm_x86_ops.set_hw_nmi_pending = NULL;
> +	}
> +
>  	if (lbrv) {
>  		if (!boot_cpu_has(X86_FEATURE_LBRV))
>  			lbrv = false;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 587ddc150f9f34..0b7e1790fadde1 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
>  extern bool npt_enabled;
>  extern int vgif;
>  extern bool intercept_smi;
> +extern bool vnmi;
>  
>  enum avic_modes {
>  	AVIC_MODE_NONE = 0,
> @@ -553,6 +554,15 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
>  	       (msr < (APIC_BASE_MSR + 0x100));
>  }
>  
> +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> +{
> +	/* L1's vNMI is inhibited while nested guest is running */
> +	if (is_guest_mode(&svm->vcpu))
> +		return false;
> +
> +	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE);
> +}
> +
>  /* svm.c */
>  #define MSR_INVALID				0xffffffffU
>  


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 11/11] KVM: nSVM: implement support for nested VNMI
  2022-11-29 19:37 ` [PATCH v2 11/11] KVM: nSVM: implement support for nested VNMI Maxim Levitsky
@ 2022-12-05 17:14   ` Santosh Shukla
  2022-12-06 12:19     ` Maxim Levitsky
  2023-02-01  0:44   ` Sean Christopherson
  1 sibling, 1 reply; 66+ messages in thread
From: Santosh Shukla @ 2022-12-05 17:14 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Santosh Shukla

On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> This patch allows L1 to use vNMI to accelerate its injection
> of NMIs to L2 by passing through vNMI int_ctl bits from vmcb12
> to/from vmcb02.
> 
> While L2 runs, L1's vNMI is inhibited, and L1's NMIs are injected
> normally.
> 
> Supporting nested VNMI requires saving and restoring the VNMI
> bits during nested entry and exit.
> In case L1 and L2 both use VNMI, copy the VNMI bits from vmcb12 to
> vmcb02 during entry and vice-versa during exit.
> And in case L1 uses VNMI and L2 doesn't, copy the VNMI bits from vmcb01 to
> vmcb02 during entry and vice-versa during exit.
> 
> Tested with the KVM-unit-test and Nested Guest scenario.
> 
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c | 13 ++++++++++++-
>  arch/x86/kvm/svm/svm.c    |  5 +++++
>  arch/x86/kvm/svm/svm.h    |  6 ++++++
>  3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 5bea672bf8b12d..81346665058e26 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -278,6 +278,11 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
>  	if (CC(!nested_svm_check_tlb_ctl(vcpu, control->tlb_ctl)))
>  		return false;
>  
> +	if (CC((control->int_ctl & V_NMI_ENABLE) &&
> +		!vmcb12_is_intercept(control, INTERCEPT_NMI))) {
> +		return false;
> +	}
> +
>  	return true;
>  }
>  
> @@ -427,6 +432,9 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
>  	if (nested_vgif_enabled(svm))
>  		mask |= V_GIF_MASK;
>  
> +	if (nested_vnmi_enabled(svm))
> +		mask |= V_NMI_MASK | V_NMI_PENDING;
> +
>  	svm->nested.ctl.int_ctl        &= ~mask;
>  	svm->nested.ctl.int_ctl        |= svm->vmcb->control.int_ctl & mask;
>  }
> @@ -682,8 +690,11 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  	else
>  		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
>  
> -	if (vnmi)
> +	if (vnmi) {

To avoid the above change, I think we should move the nested bits from 10/11 to this patch,
i.e. move the functions (nested_svm_save_vnmi and nested_svm_restore_vnmi) here.

Does that make sense?

Thanks,
Santosh

>  		nested_svm_save_vnmi(svm);
> +		if (nested_vnmi_enabled(svm))
> +			int_ctl_vmcb12_bits |= (V_NMI_PENDING | V_NMI_ENABLE | V_NMI_MASK);
> +	}
>  
>  	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
>  	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index bf10adcf3170a8..fb203f536d2f9b 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4214,6 +4214,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  
>  	svm->vgif_enabled = vgif && guest_cpuid_has(vcpu, X86_FEATURE_VGIF);
>  
> +	svm->vnmi_enabled = vnmi && guest_cpuid_has(vcpu, X86_FEATURE_AMD_VNMI);
> +
>  	svm_recalc_instruction_intercepts(vcpu, svm);
>  
>  	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
> @@ -4967,6 +4969,9 @@ static __init void svm_set_cpu_caps(void)
>  		if (vgif)
>  			kvm_cpu_cap_set(X86_FEATURE_VGIF);
>  
> +		if (vnmi)
> +			kvm_cpu_cap_set(X86_FEATURE_AMD_VNMI);
> +
>  		/* Nested VM can receive #VMEXIT instead of triggering #GP */
>  		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
>  	}
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 0b7e1790fadde1..8fb2085188c5ac 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -271,6 +271,7 @@ struct vcpu_svm {
>  	bool pause_filter_enabled         : 1;
>  	bool pause_threshold_enabled      : 1;
>  	bool vgif_enabled                 : 1;
> +	bool vnmi_enabled                 : 1;
>  
>  	u32 ldr_reg;
>  	u32 dfr_reg;
> @@ -545,6 +546,11 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
>  	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
>  }
>  
> +static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
> +{
> +	return svm->vnmi_enabled && (svm->nested.ctl.int_ctl & V_NMI_ENABLE);
> +}
> +
>  static inline bool is_x2apic_msrpm_offset(u32 offset)
>  {
>  	/* 4 msrs per u8, and 4 u8 in u32 */


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 00/11] SVM: vNMI (with my fixes)
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (10 preceding siblings ...)
  2022-11-29 19:37 ` [PATCH v2 11/11] KVM: nSVM: implement support for nested VNMI Maxim Levitsky
@ 2022-12-06  9:58 ` Santosh Shukla
  2023-02-01  0:24   ` Sean Christopherson
  2022-12-20 10:27 ` Maxim Levitsky
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 66+ messages in thread
From: Santosh Shukla @ 2022-12-06  9:58 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Santosh Shukla

On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> Hi!
> 
> This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
> 
> In this version of this patch series I addressed most of the review feedback
> added some more refactoring and also I think fixed the issue with migration.
> 
> I only tested this on a machine which doesn't have vNMI, so this does need
> some testing to ensure that nothing is broken.
> 
> Best regards,
>        Maxim Levitsky
> 
Series tested on EPYC-v4.
Tested-By: Santosh Shukla <Santosh.Shukla@amd.com>

> 
> Maxim Levitsky (9):
>   KVM: nSVM: don't sync back tlb_ctl on nested VM exit
>   KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
>   KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1
>     doesn't intercept interrupts
>   KVM: SVM: drop the SVM specific H_FLAGS
>   KVM: x86: emulator: stop using raw host flags
>   KVM: SVM: add wrappers to enable/disable IRET interception
>   KVM: x86: add a delayed hardware NMI injection interface
>   KVM: SVM: implement support for vNMI
>   KVM: nSVM: implement support for nested VNMI
> 
> Santosh Shukla (2):
>   x86/cpu: Add CPUID feature bit for VNMI
>   KVM: SVM: Add VNMI bit definition
> 
>  arch/x86/include/asm/cpufeatures.h |   1 +
>  arch/x86/include/asm/kvm-x86-ops.h |   2 +
>  arch/x86/include/asm/kvm_host.h    |  24 +++--
>  arch/x86/include/asm/svm.h         |   7 ++
>  arch/x86/kvm/emulate.c             |  11 +--
>  arch/x86/kvm/kvm_emulate.h         |   7 +-
>  arch/x86/kvm/smm.c                 |   2 -
>  arch/x86/kvm/svm/nested.c          | 102 ++++++++++++++++---
>  arch/x86/kvm/svm/svm.c             | 154 ++++++++++++++++++++++-------
>  arch/x86/kvm/svm/svm.h             |  41 +++++++-
>  arch/x86/kvm/x86.c                 |  50 ++++++++--
>  11 files changed, 318 insertions(+), 83 deletions(-)
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 01/11] KVM: nSVM: don't sync back tlb_ctl on nested VM exit
  2022-12-05 14:05   ` Santosh Shukla
@ 2022-12-06 12:13     ` Maxim Levitsky
  0 siblings, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2022-12-06 12:13 UTC (permalink / raw)
  To: Santosh Shukla, kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson

On Mon, 2022-12-05 at 19:35 +0530, Santosh Shukla wrote:
> Hi Maxim,
> 
> On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> > The CPU doesn't change TLB_CTL value as stated in the PRM (15.16.2):
> > 
> nits:
> s / PRM (15.16.2) / APM (15.16.1 - TLB Flush)

True for both changes, thanks!

Best regards,
	Maxim Levitsky

> 
> >   "The VMRUN instruction reads, but does not change, the
> >   value of the TLB_CONTROL field"
> > 
> > Therefore the KVM shouldn't do that either.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/nested.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index bc9cd7086fa972..37af0338da7c32 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -1010,7 +1010,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
> >  		vmcb12->control.next_rip  = vmcb02->control.next_rip;
> >  
> >  	vmcb12->control.int_ctl           = svm->nested.ctl.int_ctl;
> > -	vmcb12->control.tlb_ctl           = svm->nested.ctl.tlb_ctl;
> >  	vmcb12->control.event_inj         = svm->nested.ctl.event_inj;
> >  	vmcb12->control.event_inj_err     = svm->nested.ctl.event_inj_err;
> >  



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 06/11] KVM: SVM: add wrappers to enable/disable IRET interception
  2022-12-05 15:41   ` Santosh Shukla
@ 2022-12-06 12:14     ` Maxim Levitsky
  2022-12-08 12:09       ` Santosh Shukla
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-12-06 12:14 UTC (permalink / raw)
  To: Santosh Shukla, kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson

On Mon, 2022-12-05 at 21:11 +0530, Santosh Shukla wrote:
> On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> > SEV-ES guests don't use IRET interception for the detection of
> > an end of a NMI.
> > 
> > Therefore it makes sense to create a wrapper to avoid repeating
> > the check for the SEV-ES.
> > 
> > No functional change is intended.
> > 
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/svm.c | 28 +++++++++++++++++++---------
> >  1 file changed, 19 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 512b2aa21137e2..cfed6ab29c839a 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -2468,16 +2468,29 @@ static int task_switch_interception(struct kvm_vcpu *vcpu)
> >  			       has_error_code, error_code);
> >  }
> >  
> > +static void svm_disable_iret_interception(struct vcpu_svm *svm)
> > +{
> > +	if (!sev_es_guest(svm->vcpu.kvm))
> > +		svm_clr_intercept(svm, INTERCEPT_IRET);
> > +}
> > +
> > +static void svm_enable_iret_interception(struct vcpu_svm *svm)
> > +{
> > +	if (!sev_es_guest(svm->vcpu.kvm))
> > +		svm_set_intercept(svm, INTERCEPT_IRET);
> > +}
> > +
> 
> nits:
> s/_iret_interception / _iret_intercept
> does that make sense?

Makes sense. I can also move these to svm.h near svm_set_intercept(); I think
that's a better place for these functions, if there are no objections.
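
For illustration, with Santosh's suggested rename folded in, the wrappers as
svm.h inlines might look roughly like this (just a sketch, names not final):

/* placed next to svm_set_intercept()/svm_clr_intercept() in svm.h */
static inline void svm_enable_iret_intercept(struct vcpu_svm *svm)
{
	if (!sev_es_guest(svm->vcpu.kvm))
		svm_set_intercept(svm, INTERCEPT_IRET);
}

static inline void svm_disable_iret_intercept(struct vcpu_svm *svm)
{
	if (!sev_es_guest(svm->vcpu.kvm))
		svm_clr_intercept(svm, INTERCEPT_IRET);
}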

Best regards,
	Maxim Levitsky
> 
> Thanks,
> Santosh
> 
> >  static int iret_interception(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> >  
> >  	++vcpu->stat.nmi_window_exits;
> >  	svm->awaiting_iret_completion = true;
> > -	if (!sev_es_guest(vcpu->kvm)) {
> > -		svm_clr_intercept(svm, INTERCEPT_IRET);
> > +
> > +	svm_disable_iret_interception(svm);
> > +	if (!sev_es_guest(vcpu->kvm))
> >  		svm->nmi_iret_rip = kvm_rip_read(vcpu);
> > -	}
> > +
> >  	kvm_make_request(KVM_REQ_EVENT, vcpu);
> >  	return 1;
> >  }
> > @@ -3470,8 +3483,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
> >  		return;
> >  
> >  	svm->nmi_masked = true;
> > -	if (!sev_es_guest(vcpu->kvm))
> > -		svm_set_intercept(svm, INTERCEPT_IRET);
> > +	svm_enable_iret_interception(svm);
> >  	++vcpu->stat.nmi_injections;
> >  }
> >  
> > @@ -3614,12 +3626,10 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> >  
> >  	if (masked) {
> >  		svm->nmi_masked = true;
> > -		if (!sev_es_guest(vcpu->kvm))
> > -			svm_set_intercept(svm, INTERCEPT_IRET);
> > +		svm_enable_iret_interception(svm);
> >  	} else {
> >  		svm->nmi_masked = false;
> > -		if (!sev_es_guest(vcpu->kvm))
> > -			svm_clr_intercept(svm, INTERCEPT_IRET);
> > +		svm_disable_iret_interception(svm);
> >  	}
> >  }
> >  



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 11/11] KVM: nSVM: implement support for nested VNMI
  2022-12-05 17:14   ` Santosh Shukla
@ 2022-12-06 12:19     ` Maxim Levitsky
  2022-12-08 12:11       ` Santosh Shukla
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-12-06 12:19 UTC (permalink / raw)
  To: Santosh Shukla, kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson

On Mon, 2022-12-05 at 22:44 +0530, Santosh Shukla wrote:
> On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> > This patch allows L1 to use vNMI to accelerate its injection
> > of NMIs to L2 by passing through vNMI int_ctl bits from vmcb12
> > to/from vmcb02.
> > 
> > While L2 runs, L1's vNMI is inhibited, and L1's NMIs are injected
> > normally.
> > 
> > In order to support nested VNMI requires saving and restoring the VNMI
> > bits during nested entry and exit.
> > In case of L1 and L2 both using VNMI- Copy VNMI bits from vmcb12 to
> > vmcb02 during entry and vice-versa during exit.
> > And in case of L1 uses VNMI and L2 doesn't- Copy VNMI bits from vmcb01 to
> > vmcb02 during entry and vice-versa during exit.
> > 
> > Tested with the KVM-unit-test and Nested Guest scenario.
> > 
> > 
> > Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/nested.c | 13 ++++++++++++-
> >  arch/x86/kvm/svm/svm.c    |  5 +++++
> >  arch/x86/kvm/svm/svm.h    |  6 ++++++
> >  3 files changed, 23 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 5bea672bf8b12d..81346665058e26 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -278,6 +278,11 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
> >  	if (CC(!nested_svm_check_tlb_ctl(vcpu, control->tlb_ctl)))
> >  		return false;
> >  
> > +	if (CC((control->int_ctl & V_NMI_ENABLE) &&
> > +		!vmcb12_is_intercept(control, INTERCEPT_NMI))) {
> > +		return false;
> > +	}
> > +
> >  	return true;
> >  }
> >  
> > @@ -427,6 +432,9 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
> >  	if (nested_vgif_enabled(svm))
> >  		mask |= V_GIF_MASK;
> >  
> > +	if (nested_vnmi_enabled(svm))
> > +		mask |= V_NMI_MASK | V_NMI_PENDING;
> > +
> >  	svm->nested.ctl.int_ctl        &= ~mask;
> >  	svm->nested.ctl.int_ctl        |= svm->vmcb->control.int_ctl & mask;
> >  }
> > @@ -682,8 +690,11 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> >  	else
> >  		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
> >  
> > -	if (vnmi)
> > +	if (vnmi) {
> 
> To avoid the above change, I think we should move the nested bits from 10/11 to this patch,
> i.e. move the functions (nested_svm_save_vnmi and nested_svm_restore_vnmi) here.
> 
> Does that make sense?


This is done on purpose:

For each optional SVM feature there are two parts in regard to nesting.

The first part is the nesting co-existence, meaning that KVM should still work
while a nested guest runs, and the second part is letting the nested hypervisor
use the feature.

The first part is mandatory, as otherwise KVM will be broken while a nested
guest runs.

I squashed all of the vNMI support, including the nested co-existence, into patch 10,
and that includes 'nested_svm_save_vnmi' and 'nested_svm_restore_vnmi'.

Now this patch adds the actual nested vNMI support, meaning that the nested hypervisor
can use it to speed up the delivery of the NMIs it would like to inject into L2.
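
For reference, the two parts map roughly to these two helpers from patches 10
and 11 (copied here for clarity):

/* part 1: co-existence - L1's vNMI is not used while L2 runs (patch 10) */
static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
{
	/* L1's vNMI is inhibited while nested guest is running */
	if (is_guest_mode(&svm->vcpu))
		return false;

	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE);
}

/* part 2: exposure - the nested hypervisor may itself use vNMI (patch 11) */
static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
{
	return svm->vnmi_enabled && (svm->nested.ctl.int_ctl & V_NMI_ENABLE);
}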

Best regards,
	Maxim Levitsky

> 
> Thanks,
> Santosh
> 
> >  		nested_svm_save_vnmi(svm);
> > +		if (nested_vnmi_enabled(svm))
> > +			int_ctl_vmcb12_bits |= (V_NMI_PENDING | V_NMI_ENABLE | V_NMI_MASK);
> > +	}
> >  
> >  	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
> >  	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index bf10adcf3170a8..fb203f536d2f9b 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4214,6 +4214,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >  
> >  	svm->vgif_enabled = vgif && guest_cpuid_has(vcpu, X86_FEATURE_VGIF);
> >  
> > +	svm->vnmi_enabled = vnmi && guest_cpuid_has(vcpu, X86_FEATURE_AMD_VNMI);
> > +
> >  	svm_recalc_instruction_intercepts(vcpu, svm);
> >  
> >  	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
> > @@ -4967,6 +4969,9 @@ static __init void svm_set_cpu_caps(void)
> >  		if (vgif)
> >  			kvm_cpu_cap_set(X86_FEATURE_VGIF);
> >  
> > +		if (vnmi)
> > +			kvm_cpu_cap_set(X86_FEATURE_AMD_VNMI);
> > +
> >  		/* Nested VM can receive #VMEXIT instead of triggering #GP */
> >  		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
> >  	}
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 0b7e1790fadde1..8fb2085188c5ac 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -271,6 +271,7 @@ struct vcpu_svm {
> >  	bool pause_filter_enabled         : 1;
> >  	bool pause_threshold_enabled      : 1;
> >  	bool vgif_enabled                 : 1;
> > +	bool vnmi_enabled                 : 1;
> >  
> >  	u32 ldr_reg;
> >  	u32 dfr_reg;
> > @@ -545,6 +546,11 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
> >  	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> >  }
> >  
> > +static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
> > +{
> > +	return svm->vnmi_enabled && (svm->nested.ctl.int_ctl & V_NMI_ENABLE);
> > +}
> > +
> >  static inline bool is_x2apic_msrpm_offset(u32 offset)
> >  {
> >  	/* 4 msrs per u8, and 4 u8 in u32 */



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 06/11] KVM: SVM: add wrappers to enable/disable IRET interception
  2022-12-06 12:14     ` Maxim Levitsky
@ 2022-12-08 12:09       ` Santosh Shukla
  2022-12-08 13:44         ` Maxim Levitsky
  0 siblings, 1 reply; 66+ messages in thread
From: Santosh Shukla @ 2022-12-08 12:09 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson



On 12/6/2022 5:44 PM, Maxim Levitsky wrote:
> On Mon, 2022-12-05 at 21:11 +0530, Santosh Shukla wrote:
>> On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
>>> SEV-ES guests don't use IRET interception for the detection of
>>> an end of a NMI.
>>>
>>> Therefore it makes sense to create a wrapper to avoid repeating
>>> the check for the SEV-ES.
>>>
>>> No functional change is intended.
>>>
>>> Suggested-by: Sean Christopherson <seanjc@google.com>
>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>> ---
>>>  arch/x86/kvm/svm/svm.c | 28 +++++++++++++++++++---------
>>>  1 file changed, 19 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>> index 512b2aa21137e2..cfed6ab29c839a 100644
>>> --- a/arch/x86/kvm/svm/svm.c
>>> +++ b/arch/x86/kvm/svm/svm.c
>>> @@ -2468,16 +2468,29 @@ static int task_switch_interception(struct kvm_vcpu *vcpu)
>>>  			       has_error_code, error_code);
>>>  }
>>>  
>>> +static void svm_disable_iret_interception(struct vcpu_svm *svm)
>>> +{
>>> +	if (!sev_es_guest(svm->vcpu.kvm))
>>> +		svm_clr_intercept(svm, INTERCEPT_IRET);
>>> +}
>>> +
>>> +static void svm_enable_iret_interception(struct vcpu_svm *svm)
>>> +{
>>> +	if (!sev_es_guest(svm->vcpu.kvm))
>>> +		svm_set_intercept(svm, INTERCEPT_IRET);
>>> +}
>>> +
>>
>> nits:
>> s/_iret_interception / _iret_intercept
>> does that make sense?
> 
> Makes sense. I can also move these to svm.h near svm_set_intercept(); I think
> that's a better place for these functions, if there are no objections.
> 
I think the current approach is fine since the functions are used in svm.c only, but I have
no strong opinion on moving them to svm.h either way.

Thanks,
Santosh

> Best regards,
> 	Maxim Levitsky
>>
>> Thanks,
>> Santosh
>>
>>>  static int iret_interception(struct kvm_vcpu *vcpu)
>>>  {
>>>  	struct vcpu_svm *svm = to_svm(vcpu);
>>>  
>>>  	++vcpu->stat.nmi_window_exits;
>>>  	svm->awaiting_iret_completion = true;
>>> -	if (!sev_es_guest(vcpu->kvm)) {
>>> -		svm_clr_intercept(svm, INTERCEPT_IRET);
>>> +
>>> +	svm_disable_iret_interception(svm);
>>> +	if (!sev_es_guest(vcpu->kvm))
>>>  		svm->nmi_iret_rip = kvm_rip_read(vcpu);
>>> -	}
>>> +
>>>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>>>  	return 1;
>>>  }
>>> @@ -3470,8 +3483,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>>>  		return;
>>>  
>>>  	svm->nmi_masked = true;
>>> -	if (!sev_es_guest(vcpu->kvm))
>>> -		svm_set_intercept(svm, INTERCEPT_IRET);
>>> +	svm_enable_iret_interception(svm);
>>>  	++vcpu->stat.nmi_injections;
>>>  }
>>>  
>>> @@ -3614,12 +3626,10 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
>>>  
>>>  	if (masked) {
>>>  		svm->nmi_masked = true;
>>> -		if (!sev_es_guest(vcpu->kvm))
>>> -			svm_set_intercept(svm, INTERCEPT_IRET);
>>> +		svm_enable_iret_interception(svm);
>>>  	} else {
>>>  		svm->nmi_masked = false;
>>> -		if (!sev_es_guest(vcpu->kvm))
>>> -			svm_clr_intercept(svm, INTERCEPT_IRET);
>>> +		svm_disable_iret_interception(svm);
>>>  	}
>>>  }
>>>  
> 
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 11/11] KVM: nSVM: implement support for nested VNMI
  2022-12-06 12:19     ` Maxim Levitsky
@ 2022-12-08 12:11       ` Santosh Shukla
  0 siblings, 0 replies; 66+ messages in thread
From: Santosh Shukla @ 2022-12-08 12:11 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson, Tom Lendacky



On 12/6/2022 5:49 PM, Maxim Levitsky wrote:
> On Mon, 2022-12-05 at 22:44 +0530, Santosh Shukla wrote:
>> On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
>>> This patch allows L1 to use vNMI to accelerate its injection
>>> of NMIs to L2 by passing through vNMI int_ctl bits from vmcb12
>>> to/from vmcb02.
>>>
>>> While L2 runs, L1's vNMI is inhibited, and L1's NMIs are injected
>>> normally.
>>>
>>> In order to support nested VNMI requires saving and restoring the VNMI
>>> bits during nested entry and exit.
>>> In case of L1 and L2 both using VNMI- Copy VNMI bits from vmcb12 to
>>> vmcb02 during entry and vice-versa during exit.
>>> And in case of L1 uses VNMI and L2 doesn't- Copy VNMI bits from vmcb01 to
>>> vmcb02 during entry and vice-versa during exit.
>>>
>>> Tested with the KVM-unit-test and Nested Guest scenario.
>>>
>>>
>>> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>>> ---
>>>  arch/x86/kvm/svm/nested.c | 13 ++++++++++++-
>>>  arch/x86/kvm/svm/svm.c    |  5 +++++
>>>  arch/x86/kvm/svm/svm.h    |  6 ++++++
>>>  3 files changed, 23 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
>>> index 5bea672bf8b12d..81346665058e26 100644
>>> --- a/arch/x86/kvm/svm/nested.c
>>> +++ b/arch/x86/kvm/svm/nested.c
>>> @@ -278,6 +278,11 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
>>>  	if (CC(!nested_svm_check_tlb_ctl(vcpu, control->tlb_ctl)))
>>>  		return false;
>>>  
>>> +	if (CC((control->int_ctl & V_NMI_ENABLE) &&
>>> +		!vmcb12_is_intercept(control, INTERCEPT_NMI))) {
>>> +		return false;
>>> +	}
>>> +
>>>  	return true;
>>>  }
>>>  
>>> @@ -427,6 +432,9 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
>>>  	if (nested_vgif_enabled(svm))
>>>  		mask |= V_GIF_MASK;
>>>  
>>> +	if (nested_vnmi_enabled(svm))
>>> +		mask |= V_NMI_MASK | V_NMI_PENDING;
>>> +
>>>  	svm->nested.ctl.int_ctl        &= ~mask;
>>>  	svm->nested.ctl.int_ctl        |= svm->vmcb->control.int_ctl & mask;
>>>  }
>>> @@ -682,8 +690,11 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>>>  	else
>>>  		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
>>>  
>>> -	if (vnmi)
>>> +	if (vnmi) {
>>
>> To avoid above change, I think we should move nested bits from 10/11 to this i.e.. move function
>> (nested_svm_save_vnmi and nested_svm_restore_vnmi) to this patch.
>>
>> make sense?
> 
> 
> This is done on purpose:
> 
> For each optional SVM feature there are two parts in regard to nesting.
> 
> The first part is the nesting co-existence, meaning that KVM should still work
> while a nested guest runs, and the second part is letting the nested hypervisor
> use the feature.
> 
> The first part is mandatory, as otherwise KVM will be broken while a nested
> guest runs.
> 
Ok!

> I squashed all of the vNMI support, including the nested co-existence, into patch 10,
> and that includes 'nested_svm_save_vnmi' and 'nested_svm_restore_vnmi'.
> 
> Now this patch adds the actual nested vNMI support, meaning that the nested hypervisor
> can use it to speed up the delivery of the NMIs it would like to inject into L2.
>
Ok, makes sense to me.
Thank you for the explanation.

Regards,
Santosh
 
> Best regards,
> 	Maxim Levitsky
> 
>>
>> Thanks,
>> Santosh
>>
>>>  		nested_svm_save_vnmi(svm);
>>> +		if (nested_vnmi_enabled(svm))
>>> +			int_ctl_vmcb12_bits |= (V_NMI_PENDING | V_NMI_ENABLE | V_NMI_MASK);
>>> +	}
>>>  
>>>  	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
>>>  	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>> index bf10adcf3170a8..fb203f536d2f9b 100644
>>> --- a/arch/x86/kvm/svm/svm.c
>>> +++ b/arch/x86/kvm/svm/svm.c
>>> @@ -4214,6 +4214,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>>  
>>>  	svm->vgif_enabled = vgif && guest_cpuid_has(vcpu, X86_FEATURE_VGIF);
>>>  
>>> +	svm->vnmi_enabled = vnmi && guest_cpuid_has(vcpu, X86_FEATURE_AMD_VNMI);
>>> +
>>>  	svm_recalc_instruction_intercepts(vcpu, svm);
>>>  
>>>  	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
>>> @@ -4967,6 +4969,9 @@ static __init void svm_set_cpu_caps(void)
>>>  		if (vgif)
>>>  			kvm_cpu_cap_set(X86_FEATURE_VGIF);
>>>  
>>> +		if (vnmi)
>>> +			kvm_cpu_cap_set(X86_FEATURE_AMD_VNMI);
>>> +
>>>  		/* Nested VM can receive #VMEXIT instead of triggering #GP */
>>>  		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
>>>  	}
>>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>>> index 0b7e1790fadde1..8fb2085188c5ac 100644
>>> --- a/arch/x86/kvm/svm/svm.h
>>> +++ b/arch/x86/kvm/svm/svm.h
>>> @@ -271,6 +271,7 @@ struct vcpu_svm {
>>>  	bool pause_filter_enabled         : 1;
>>>  	bool pause_threshold_enabled      : 1;
>>>  	bool vgif_enabled                 : 1;
>>> +	bool vnmi_enabled                 : 1;
>>>  
>>>  	u32 ldr_reg;
>>>  	u32 dfr_reg;
>>> @@ -545,6 +546,11 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
>>>  	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
>>>  }
>>>  
>>> +static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
>>> +{
>>> +	return svm->vnmi_enabled && (svm->nested.ctl.int_ctl & V_NMI_ENABLE);
>>> +}
>>> +
>>>  static inline bool is_x2apic_msrpm_offset(u32 offset)
>>>  {
>>>  	/* 4 msrs per u8, and 4 u8 in u32 */
> 
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 06/11] KVM: SVM: add wrappers to enable/disable IRET interception
  2022-12-08 12:09       ` Santosh Shukla
@ 2022-12-08 13:44         ` Maxim Levitsky
  2023-01-31 21:07           ` Sean Christopherson
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-12-08 13:44 UTC (permalink / raw)
  To: Santosh Shukla, kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson

On Thu, 2022-12-08 at 17:39 +0530, Santosh Shukla wrote:
> 
> On 12/6/2022 5:44 PM, Maxim Levitsky wrote:
> > On Mon, 2022-12-05 at 21:11 +0530, Santosh Shukla wrote:
> > > On 11/30/2022 1:07 AM, Maxim Levitsky wrote:
> > > > SEV-ES guests don't use IRET interception for the detection of
> > > > the end of an NMI.
> > > > 
> > > > Therefore it makes sense to create a wrapper to avoid repeating
> > > > the check for SEV-ES.
> > > > 
> > > > No functional change is intended.
> > > > 
> > > > Suggested-by: Sean Christopherson <seanjc@google.com>
> > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > ---
> > > >  arch/x86/kvm/svm/svm.c | 28 +++++++++++++++++++---------
> > > >  1 file changed, 19 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > index 512b2aa21137e2..cfed6ab29c839a 100644
> > > > --- a/arch/x86/kvm/svm/svm.c
> > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > @@ -2468,16 +2468,29 @@ static int task_switch_interception(struct kvm_vcpu *vcpu)
> > > >  			       has_error_code, error_code);
> > > >  }
> > > >  
> > > > +static void svm_disable_iret_interception(struct vcpu_svm *svm)
> > > > +{
> > > > +	if (!sev_es_guest(svm->vcpu.kvm))
> > > > +		svm_clr_intercept(svm, INTERCEPT_IRET);
> > > > +}
> > > > +
> > > > +static void svm_enable_iret_interception(struct vcpu_svm *svm)
> > > > +{
> > > > +	if (!sev_es_guest(svm->vcpu.kvm))
> > > > +		svm_set_intercept(svm, INTERCEPT_IRET);
> > > > +}
> > > > +
> > > 
> > > nits:
> > > s/_iret_interception / _iret_intercept
> > > does that make sense?
> > 
> > > Makes sense. I can also move this to svm.h near svm_set_intercept(); I think
> > > that's a better place for these functions, if there are no objections.
> > 
> > I think the current approach is fine since the functions are used in svm.c only, but I have
> > no strong opinion on moving them to svm.h either way.

I also think so, just noticed something in case there are any objections.

Best regards,
	Maxim Levitsky

> 
> Thanks,
> Santosh
> 
> > Best regards,
> > 	Maxim Levitsky
> > > Thanks,
> > > Santosh
> > > 
> > > >  static int iret_interception(struct kvm_vcpu *vcpu)
> > > >  {
> > > >  	struct vcpu_svm *svm = to_svm(vcpu);
> > > >  
> > > >  	++vcpu->stat.nmi_window_exits;
> > > >  	svm->awaiting_iret_completion = true;
> > > > -	if (!sev_es_guest(vcpu->kvm)) {
> > > > -		svm_clr_intercept(svm, INTERCEPT_IRET);
> > > > +
> > > > +	svm_disable_iret_interception(svm);
> > > > +	if (!sev_es_guest(vcpu->kvm))
> > > >  		svm->nmi_iret_rip = kvm_rip_read(vcpu);
> > > > -	}
> > > > +
> > > >  	kvm_make_request(KVM_REQ_EVENT, vcpu);
> > > >  	return 1;
> > > >  }
> > > > @@ -3470,8 +3483,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
> > > >  		return;
> > > >  
> > > >  	svm->nmi_masked = true;
> > > > -	if (!sev_es_guest(vcpu->kvm))
> > > > -		svm_set_intercept(svm, INTERCEPT_IRET);
> > > > +	svm_enable_iret_interception(svm);
> > > >  	++vcpu->stat.nmi_injections;
> > > >  }
> > > >  
> > > > @@ -3614,12 +3626,10 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> > > >  
> > > >  	if (masked) {
> > > >  		svm->nmi_masked = true;
> > > > -		if (!sev_es_guest(vcpu->kvm))
> > > > -			svm_set_intercept(svm, INTERCEPT_IRET);
> > > > +		svm_enable_iret_interception(svm);
> > > >  	} else {
> > > >  		svm->nmi_masked = false;
> > > > -		if (!sev_es_guest(vcpu->kvm))
> > > > -			svm_clr_intercept(svm, INTERCEPT_IRET);
> > > > +		svm_disable_iret_interception(svm);
> > > >  	}
> > > >  }
> > > >  



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 00/11] SVM: vNMI (with my fixes)
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (11 preceding siblings ...)
  2022-12-06  9:58 ` [PATCH v2 00/11] SVM: vNMI (with my fixes) Santosh Shukla
@ 2022-12-20 10:27 ` Maxim Levitsky
  2022-12-21 18:44   ` Sean Christopherson
  2023-01-15  9:05 ` Maxim Levitsky
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2022-12-20 10:27 UTC (permalink / raw)
  To: kvm
  Cc: Santosh Shukla, Sandipan Das, Paolo Bonzini, Jim Mattson,
	Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Daniel Sneddon,
	Jiaxi Chen, Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson

On Tue, 2022-11-29 at 21:37 +0200, Maxim Levitsky wrote:
> Hi!
> 
> This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
> 
> In this version of this patch series I addressed most of the review feedback
> added some more refactoring and also I think fixed the issue with migration.
> 
> I only tested this on a machine which doesn't have vNMI, so this does need
> some testing to ensure that nothing is broken.
> 
> Best regards,
>        Maxim Levitsky
> 
> Maxim Levitsky (9):
>   KVM: nSVM: don't sync back tlb_ctl on nested VM exit
>   KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
>   KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1
>     doesn't intercept interrupts
>   KVM: SVM: drop the SVM specific H_FLAGS
>   KVM: x86: emulator: stop using raw host flags
>   KVM: SVM: add wrappers to enable/disable IRET interception
>   KVM: x86: add a delayed hardware NMI injection interface
>   KVM: SVM: implement support for vNMI
>   KVM: nSVM: implement support for nested VNMI
> 
> Santosh Shukla (2):
>   x86/cpu: Add CPUID feature bit for VNMI
>   KVM: SVM: Add VNMI bit definition
> 
>  arch/x86/include/asm/cpufeatures.h |   1 +
>  arch/x86/include/asm/kvm-x86-ops.h |   2 +
>  arch/x86/include/asm/kvm_host.h    |  24 +++--
>  arch/x86/include/asm/svm.h         |   7 ++
>  arch/x86/kvm/emulate.c             |  11 +--
>  arch/x86/kvm/kvm_emulate.h         |   7 +-
>  arch/x86/kvm/smm.c                 |   2 -
>  arch/x86/kvm/svm/nested.c          | 102 ++++++++++++++++---
>  arch/x86/kvm/svm/svm.c             | 154 ++++++++++++++++++++++-------
>  arch/x86/kvm/svm/svm.h             |  41 +++++++-
>  arch/x86/kvm/x86.c                 |  50 ++++++++--
>  11 files changed, 318 insertions(+), 83 deletions(-)
> 
> -- 
> 2.26.3
> 
> 
A very kind ping on these patches.


Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 00/11] SVM: vNMI (with my fixes)
  2022-12-20 10:27 ` Maxim Levitsky
@ 2022-12-21 18:44   ` Sean Christopherson
  0 siblings, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-12-21 18:44 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Santosh Shukla, Sandipan Das, Paolo Bonzini, Jim Mattson,
	Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Daniel Sneddon,
	Jiaxi Chen, Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Tue, Dec 20, 2022, Maxim Levitsky wrote:
> On Tue, 2022-11-29 at 21:37 +0200, Maxim Levitsky wrote:
> > Hi!
> > 
> > This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
> > 
> > In this version of this patch series I addressed most of the review feedback
> > added some more refactoring and also I think fixed the issue with migration.
> > 
> > I only tested this on a machine which doesn't have vNMI, so this does need
> > some testing to ensure that nothing is broken.
> > 
> > Best regards,
> >        Maxim Levitsky
> > 
> > Maxim Levitsky (9):
> >   KVM: nSVM: don't sync back tlb_ctl on nested VM exit
> >   KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
> >   KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1
> >     doesn't intercept interrupts
> >   KVM: SVM: drop the SVM specific H_FLAGS
> >   KVM: x86: emulator: stop using raw host flags
> >   KVM: SVM: add wrappers to enable/disable IRET interception
> >   KVM: x86: add a delayed hardware NMI injection interface
> >   KVM: SVM: implement support for vNMI
> >   KVM: nSVM: implement support for nested VNMI
> > 
> > Santosh Shukla (2):
> >   x86/cpu: Add CPUID feature bit for VNMI
> >   KVM: SVM: Add VNMI bit definition
> > 
> >  arch/x86/include/asm/cpufeatures.h |   1 +
> >  arch/x86/include/asm/kvm-x86-ops.h |   2 +
> >  arch/x86/include/asm/kvm_host.h    |  24 +++--
> >  arch/x86/include/asm/svm.h         |   7 ++
> >  arch/x86/kvm/emulate.c             |  11 +--
> >  arch/x86/kvm/kvm_emulate.h         |   7 +-
> >  arch/x86/kvm/smm.c                 |   2 -
> >  arch/x86/kvm/svm/nested.c          | 102 ++++++++++++++++---
> >  arch/x86/kvm/svm/svm.c             | 154 ++++++++++++++++++++++-------
> >  arch/x86/kvm/svm/svm.h             |  41 +++++++-
> >  arch/x86/kvm/x86.c                 |  50 ++++++++--
> >  11 files changed, 318 insertions(+), 83 deletions(-)
> > 
> > -- 
> > 2.26.3
> > 
> > 
> A very kind ping on these patches.

Sorry, I won't get to this (or anything else) until the new year.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 00/11] SVM: vNMI (with my fixes)
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (12 preceding siblings ...)
  2022-12-20 10:27 ` Maxim Levitsky
@ 2023-01-15  9:05 ` Maxim Levitsky
  2023-01-28  1:13 ` Sean Christopherson
  2023-02-01 19:13 ` Sean Christopherson
  15 siblings, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2023-01-15  9:05 UTC (permalink / raw)
  To: kvm
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Sean Christopherson

On Tue, 2022-11-29 at 21:37 +0200, Maxim Levitsky wrote:
> Hi!
> 
> This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
> 
> In this version of this patch series I addressed most of the review feedback
> added some more refactoring and also I think fixed the issue with migration.
> 
> I only tested this on a machine which doesn't have vNMI, so this does need
> some testing to ensure that nothing is broken.
> 
> Best regards,
>        Maxim Levitsky
> 
> Maxim Levitsky (9):
>   KVM: nSVM: don't sync back tlb_ctl on nested VM exit
>   KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
>   KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1
>     doesn't intercept interrupts
>   KVM: SVM: drop the SVM specific H_FLAGS
>   KVM: x86: emulator: stop using raw host flags
>   KVM: SVM: add wrappers to enable/disable IRET interception
>   KVM: x86: add a delayed hardware NMI injection interface
>   KVM: SVM: implement support for vNMI
>   KVM: nSVM: implement support for nested VNMI
> 
> Santosh Shukla (2):
>   x86/cpu: Add CPUID feature bit for VNMI
>   KVM: SVM: Add VNMI bit definition
> 
>  arch/x86/include/asm/cpufeatures.h |   1 +
>  arch/x86/include/asm/kvm-x86-ops.h |   2 +
>  arch/x86/include/asm/kvm_host.h    |  24 +++--
>  arch/x86/include/asm/svm.h         |   7 ++
>  arch/x86/kvm/emulate.c             |  11 +--
>  arch/x86/kvm/kvm_emulate.h         |   7 +-
>  arch/x86/kvm/smm.c                 |   2 -
>  arch/x86/kvm/svm/nested.c          | 102 ++++++++++++++++---
>  arch/x86/kvm/svm/svm.c             | 154 ++++++++++++++++++++++-------
>  arch/x86/kvm/svm/svm.h             |  41 +++++++-
>  arch/x86/kvm/x86.c                 |  50 ++++++++--
>  11 files changed, 318 insertions(+), 83 deletions(-)
> 
> -- 
> 2.26.3
> 
> 
Another kind ping on this patch series.

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 02/11] KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
  2022-11-29 19:37 ` [PATCH v2 02/11] KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12 Maxim Levitsky
@ 2023-01-28  0:37   ` Sean Christopherson
  2023-01-31  1:44     ` Sean Christopherson
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2023-01-28  0:37 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> the V_IRQ and v_TPR bits don't exist when virtual interrupt
> masking is not enabled, therefore the KVM should not copy these
> bits regardless of V_IRQ intercept.

Hmm, the APM disagrees:

 The APIC's TPR always controls the task priority for physical interrupts, and the
 V_TPR always controls virtual interrupts.

   While running a guest with V_INTR_MASKING cleared to 0:
     • Writes to CR8 affect both the APIC's TPR and the V_TPR register.


 ...

 The three VMCB fields V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR indicate whether there
 is a virtual interrupt pending, and, if so, what its vector number and priority are.

IIUC, V_INTR_MASKING_MASK is mostly about EFLAGS.IF, with a small side effect on
TPR.  E.g. a VMM could pend a V_IRQ but clear V_INTR_MASKING and expect the guest
to take the V_IRQ.  At least, that's my reading of things.

> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c | 23 ++++++++---------------
>  1 file changed, 8 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 37af0338da7c32..aad3145b2f62fe 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -412,24 +412,17 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
>   */
>  void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
>  {
> -	u32 mask;
> +	u32 mask = 0;
>  	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
>  	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
>  
> -	/* Only a few fields of int_ctl are written by the processor.  */
> -	mask = V_IRQ_MASK | V_TPR_MASK;
> -	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
> -	    svm_is_intercept(svm, INTERCEPT_VINTR)) {
> -		/*
> -		 * In order to request an interrupt window, L0 is usurping
> -		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
> -		 * even if it was clear in L1's VMCB.  Restoring it would be
> -		 * wrong.  However, in this case V_IRQ will remain true until
> -		 * interrupt_window_interception calls svm_clear_vintr and
> -		 * restores int_ctl.  We can just leave it aside.
> -		 */
> -		mask &= ~V_IRQ_MASK;
> -	}
> +	/*
> +	 * Only a few fields of int_ctl are written by the processor.
> +	 * Copy back only the bits that are passed through to the L2.

Just "L2", not "the L2".

> +	 */
> +

Unnecessary newline.

> +	if (svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
> +		mask = V_IRQ_MASK | V_TPR_MASK;
>  
>  	if (nested_vgif_enabled(svm))
>  		mask |= V_GIF_MASK;
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 03/11] KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1 doesn't intercept interrupts
  2022-11-29 19:37 ` [PATCH v2 03/11] KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1 doesn't intercept interrupts Maxim Levitsky
@ 2023-01-28  0:56   ` Sean Christopherson
  2023-01-30 18:41     ` Sean Christopherson
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2023-01-28  0:56 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

Shortlog is too long, maybe this?

  KVM: nSVM: Raise event on nested VM exit if L1 doesn't intercept IRQs

On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> If the L2 doesn't intercept interrupts, then the KVM will use vmcb02's

s/the L2/L2, though don't you mean L1?

> V_IRQ for L1 (to detect the interrupt window)

"an interrupt window", i.e. there's not just one window.

> In this case on the nested VM exit KVM might need to copy the V_IRQ bit

s/the nested/nested

> from the vmcb02 to the vmcb01, to continue waiting for the
> interrupt window.
> 
> To make it simple, just raise the KVM_REQ_EVENT request, which
> execution will lead to the reenabling of the interrupt
> window if needed.
> 
> Note that this is a theoretical bug because the KVM already does raise

s/the KVM/KVM

> the KVM_REQ_EVENT request one each nested VM exit because the nested

s/the KVM_REQ_EVENT/KVM_REQ_EVENT, and s/one/on

> VM exit resets RFLAGS and the kvm_set_rflags() raises the
> KVM_REQ_EVENT request in the response.
> 
> However raising this request explicitly, together with
> documenting why this is needed, is still preferred.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index aad3145b2f62fe..e891318595113e 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1016,6 +1016,31 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>  
>  	svm_switch_vmcb(svm, &svm->vmcb01);
>  
> +	/* Note about synchronizing some of int_ctl bits from vmcb02 to vmcb01:

	/*
	 * Preferred block comment style...

> +	 *
> +	 * - V_IRQ, V_IRQ_VECTOR, V_INTR_PRIO_MASK, V_IGN_TPR:

Drop the "-" to be consistent with the rest of the paragraphs.

> +	 * If the L2 doesn't intercept interrupts, then
> +	 * (even if the L2 does use virtual interrupt masking),

KVM uses "L2" to refer to the thing running at L2.  I think what you are referring
to here is vmcb12?  And that's controlled by L1.

> +	 * KVM will use the vmcb02's V_INTR to detect interrupt window.

s/the vmcb02/vmcb02

Which of the V_INTR fields does this refer to?  Oooh, you're saying the KVM injects
a virtual interrupt into L2 using vmcb02 in order to determine when L2 has IRQs
enabled.

Why does KVM do that?  Why not pend the actual IRQ directly?  

> +	 *
> +	 * In this case, the KVM raises the KVM_REQ_EVENT to ensure that interrupt window

s/the KVM_REQ_EVENT/KVM_REQ_EVENT

> +	 * is not lost and this implicitly copies these bits from vmcb02 to vmcb01

Too many pronouns.  What do "this" and "these bits" refer to?

> +	 *
> +	 * V_TPR:
> +	 * If the L2 doesn't use virtual interrupt masking, then the L1's vTPR
> +	 * is stored in the vmcb02 but its value doesn't need to be copied from/to
> +	 * vmcb01 because it is copied from/to the TPR APIC's register on
> +	 * each VM entry/exit.
> +	 *
> +	 * V_GIF:
> +	 * - If the nested vGIF is not used, KVM uses vmcb02's V_GIF for L1's V_GIF,

Drop this "-" too.

> +	 * however, the L1 vGIF is reset to false on each VM exit, thus
> +	 * there is no need to copy it from vmcb02 to vmcb01.
> +	 */
> +
> +	if (!nested_exit_on_intr(svm))
> +		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
> +
>  	if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
>  		svm_copy_lbrs(vmcb12, vmcb02);
>  		svm_update_lbrv(vcpu);
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 04/11] KVM: SVM: drop the SVM specific H_FLAGS
  2022-11-29 19:37 ` [PATCH v2 04/11] KVM: SVM: drop the SVM specific H_FLAGS Maxim Levitsky
  2022-12-05 15:31   ` Santosh Shukla
@ 2023-01-28  0:56   ` Sean Christopherson
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-01-28  0:56 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> @@ -3580,7 +3583,7 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
>  		return false;
>  
>  	ret = (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
> -	      (vcpu->arch.hflags & HF_NMI_MASK);
> +	      (svm->nmi_masked);

Unnecessary parentheses.

>  
>  	return ret;
>  }

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 05/11] KVM: x86: emulator: stop using raw host flags
  2022-11-29 19:37 ` [PATCH v2 05/11] KVM: x86: emulator: stop using raw host flags Maxim Levitsky
@ 2023-01-28  0:58   ` Sean Christopherson
  2023-02-24 14:38     ` Maxim Levitsky
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2023-01-28  0:58 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f18f579ebde81c..85d2a12c214dda 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8138,9 +8138,14 @@ static void emulator_set_nmi_mask(struct x86_emulate_ctxt *ctxt, bool masked)
>  	static_call(kvm_x86_set_nmi_mask)(emul_to_vcpu(ctxt), masked);
>  }
>  
> -static unsigned emulator_get_hflags(struct x86_emulate_ctxt *ctxt)
> +static bool emulator_in_smm(struct x86_emulate_ctxt *ctxt)
>  {
> -	return emul_to_vcpu(ctxt)->arch.hflags;
> +	return emul_to_vcpu(ctxt)->arch.hflags & HF_SMM_MASK;

This needs to be is_smm() as HF_SMM_MASK is undefined if CONFIG_KVM_SMM=n.

> +}
> +
> +static bool emulator_in_guest_mode(struct x86_emulate_ctxt *ctxt)
> +{
> +	return emul_to_vcpu(ctxt)->arch.hflags & HF_GUEST_MASK;

And just use is_guest_mode() here.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2022-11-29 19:37 ` [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface Maxim Levitsky
@ 2023-01-28  1:09   ` Sean Christopherson
  2023-01-31 21:12     ` Sean Christopherson
                       ` (2 more replies)
  2023-01-31 22:28   ` Sean Christopherson
  1 sibling, 3 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-01-28  1:09 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> This patch adds two new vendor callbacks:

No "this patch" please, just say what it does.

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 684a5519812fb2..46993ce61c92db 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -871,8 +871,13 @@ struct kvm_vcpu_arch {
>  	u64 tsc_scaling_ratio; /* current scaling ratio */
>  
>  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
> -	unsigned nmi_pending; /* NMI queued after currently running handler */
> +
> +	unsigned int nmi_pending; /*
> +				   * NMI queued after currently running handler
> +				   * (not including a hardware pending NMI (e.g vNMI))
> +				   */

Put the block comment above.  I'd say collapse all of the comments about NMIs into
a single big block comment.

>  	bool nmi_injected;    /* Trying to inject an NMI this entry */
> +
>  	bool smi_pending;    /* SMI queued after currently running handler */
>  	u8 handling_intr_from_guest;
>  
> @@ -10015,13 +10022,34 @@ static void process_nmi(struct kvm_vcpu *vcpu)
>  	 * Otherwise, allow two (and we'll inject the first one immediately).
>  	 */
>  	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
> -		limit = 1;
> +		limit--;
> +
> +	/* Also if there is already a NMI hardware queued to be injected,
> +	 * decrease the limit again
> +	 */

	/*
	 * Block comment ...
	 */

> +	if (static_call(kvm_x86_get_hw_nmi_pending)(vcpu))

I'd prefer "is_hw_nmi_pending()" over "get", even if it means not pairing with
"set".  Though I think that's a good thing since they aren't perfect pairs.

> +		limit--;
>  
> -	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
> +	if (limit <= 0)
> +		return;
> +
> +	/* Attempt to use hardware NMI queueing */
> +	if (static_call(kvm_x86_set_hw_nmi_pending)(vcpu)) {
> +		limit--;
> +		nmi_to_queue--;
> +	}
> +
> +	vcpu->arch.nmi_pending += nmi_to_queue;
>  	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>  }
>  
> +/* Return total number of NMIs pending injection to the VM */
> +int kvm_get_total_nmi_pending(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.nmi_pending + static_call(kvm_x86_get_hw_nmi_pending)(vcpu);

Nothing cares about the total count, so this can just be:


	bool kvm_is_nmi_pending(struct kvm_vcpu *vcpu)
	{
		return vcpu->arch.nmi_pending ||
		       static_call(kvm_x86_is_hw_nmi_pending)(vcpu);
	}


> +}
> +
>  void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
>  				       unsigned long *vcpu_bitmap)
>  {
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 10/11] KVM: SVM: implement support for vNMI
  2022-11-29 19:37 ` [PATCH v2 10/11] KVM: SVM: implement support for vNMI Maxim Levitsky
  2022-12-04 17:18   ` Maxim Levitsky
  2022-12-05 17:07   ` Santosh Shukla
@ 2023-01-28  1:10   ` Sean Christopherson
  2023-02-10 12:02     ` Santosh Shukla
  2023-02-01  0:22   ` Sean Christopherson
  3 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2023-01-28  1:10 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Santosh Shukla

On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> This patch implements support for injecting pending
> NMIs via the .kvm_x86_set_hw_nmi_pending callback, using AMD's new
> vNMI feature.
> 
> Note that the vNMI can't cause a VM exit, which is needed
> when a nested guest intercepts NMIs.
> 
> Therefore to avoid breaking nesting, the vNMI is inhibited while
> a nested guest is running and instead, the legacy NMI window
> detection and delivery method is used.
> 
> While it is possible to pass through the vNMI if a nested guest
> doesn't intercept NMIs, such usage is very uncommon, and it's
> not worth optimizing for.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c |  42 +++++++++++++++
>  arch/x86/kvm/svm/svm.c    | 111 ++++++++++++++++++++++++++++++--------
>  arch/x86/kvm/svm/svm.h    |  10 ++++
>  3 files changed, 140 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index e891318595113e..5bea672bf8b12d 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -623,6 +623,42 @@ static bool is_evtinj_nmi(u32 evtinj)
>  	return type == SVM_EVTINJ_TYPE_NMI;
>  }
>  
> +static void nested_svm_save_vnmi(struct vcpu_svm *svm)
> +{
> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
> +
> +	/*
> +	 * Copy the vNMI state back to software NMI tracking state
> +	 * for the duration of the nested run
> +	 */
> +

Unnecessary newline.

> +	svm->nmi_masked = vmcb01->control.int_ctl & V_NMI_MASK;
> +	svm->vcpu.arch.nmi_pending += vmcb01->control.int_ctl & V_NMI_PENDING;
> +}
> +
> +static void nested_svm_restore_vnmi(struct vcpu_svm *svm)
> +{
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
> +
> +	/*
> +	 * Restore the vNMI state from the software NMI tracking state
> +	 * after a nested run
> +	 */
> +

Unnecessary newline.

> +	if (svm->nmi_masked)
> +		vmcb01->control.int_ctl |= V_NMI_MASK;
> +	else
> +		vmcb01->control.int_ctl &= ~V_NMI_MASK;
> +
> +	if (vcpu->arch.nmi_pending) {
> +		vcpu->arch.nmi_pending--;
> +		vmcb01->control.int_ctl |= V_NMI_PENDING;
> +	} else
> +		vmcb01->control.int_ctl &= ~V_NMI_PENDING;

Needs curly braces.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 00/11] SVM: vNMI (with my fixes)
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (13 preceding siblings ...)
  2023-01-15  9:05 ` Maxim Levitsky
@ 2023-01-28  1:13 ` Sean Christopherson
  2023-02-01 19:13 ` Sean Christopherson
  15 siblings, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-01-28  1:13 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> Hi!
> 
> This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
> 
> In this version of this patch series I addressed most of the review feedback
> added some more refactoring and also I think fixed the issue with migration.
> 
> I only tested this on a machine which doesn't have vNMI, so this does need
> some testing to ensure that nothing is broken.

Apologies for the slow review.

Did a fast run through, mostly have questions to address my lack of knowledge.
I'll give this a much more thorough review first thing next week (my brain is
fried), and am planning on queueing it unless I see something truly busted (I'll
fixup my nits when applying).

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 03/11] KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1 doesn't intercept interrupts
  2023-01-28  0:56   ` Sean Christopherson
@ 2023-01-30 18:41     ` Sean Christopherson
  0 siblings, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-01-30 18:41 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Sat, Jan 28, 2023, Sean Christopherson wrote:
> > +	 * If the L2 doesn't intercept interrupts, then
> > +	 * (even if the L2 does use virtual interrupt masking),
> 
> KVM uses "L2" to refer to the thing running at L2.  I think what you are referring
> to here is vmcb12?  And that's controlled by L1.
> 
> > +	 * KVM will use the vmcb02's V_INTR to detect interrupt window.
> 
> s/the vmcb02/vmcb02
> 
> Which of the V_INTR fields does this refer to?  Oooh, you're saying the KVM injects
> a virtual interrupt into L2 using vmcb02 in order to determine when L2 has IRQs
> enabled.
> 
> Why does KVM do that?  Why not pend the actual IRQ directly?

Duh, because KVM needs to gain control if there are multiple pending events.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 02/11] KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
  2023-01-28  0:37   ` Sean Christopherson
@ 2023-01-31  1:44     ` Sean Christopherson
  2023-02-24 14:38       ` Maxim Levitsky
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2023-01-31  1:44 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Sat, Jan 28, 2023, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> > the V_IRQ and v_TPR bits don't exist when virtual interrupt
> > masking is not enabled, therefore the KVM should not copy these
> > bits regardless of V_IRQ intercept.
> 
> Hmm, the APM disagrees:
> 
>  The APIC's TPR always controls the task priority for physical interrupts, and the
>  V_TPR always controls virtual interrupts.
> 
>    While running a guest with V_INTR_MASKING cleared to 0:
>      • Writes to CR8 affect both the APIC's TPR and the V_TPR register.
> 
> 
>  ...
> 
>  The three VMCB fields V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR indicate whether there
>  is a virtual interrupt pending, and, if so, what its vector number and priority are.
> 
> IIUC, V_INTR_MASKING_MASK is mostly about EFLAGS.IF, with a small side effect on
> TPR.  E.g. a VMM could pend a V_IRQ but clear V_INTR_MASKING and expect the guest
> to take the V_IRQ.  At least, that's my reading of things.
>
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/nested.c | 23 ++++++++---------------
> >  1 file changed, 8 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 37af0338da7c32..aad3145b2f62fe 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -412,24 +412,17 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
> >   */
> >  void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
> >  {
> > -	u32 mask;
> > +	u32 mask = 0;
> >  	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
> >  	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
> >  
> > -	/* Only a few fields of int_ctl are written by the processor.  */
> > -	mask = V_IRQ_MASK | V_TPR_MASK;
> > -	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
> > -	    svm_is_intercept(svm, INTERCEPT_VINTR)) {
> > -		/*
> > -		 * In order to request an interrupt window, L0 is usurping
> > -		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
> > -		 * even if it was clear in L1's VMCB.  Restoring it would be
> > -		 * wrong.  However, in this case V_IRQ will remain true until
> > -		 * interrupt_window_interception calls svm_clear_vintr and
> > -		 * restores int_ctl.  We can just leave it aside.
> > -		 */
> > -		mask &= ~V_IRQ_MASK;

Argh! *shakes fist at KVM and SVM*

This is ridiculously convoluted, and I'm pretty sure there are existing bugs.  If
L1 runs L2 with V_IRQ=1 and V_INTR_MASKING=1, and KVM requests an interrupt window,
then KVM will overwrite vmcb02's int_vector and int_ctl, i.e. clobber L1's V_IRQ,
but then silently clear INTERCEPT_VINTR in recalc_intercepts() and thus prevent
svm_clear_vintr() from being reached, i.e. prevent restoring L1's V_IRQ.

Bug #1 is that KVM shouldn't clobber the V_IRQ fields if KVM ultimately decides
not to open an interrupt window.  Bug #2 is that KVM needs to open an interrupt
window if save.RFLAGS.IF=1, as interrupts may become unblocked in that case,
e.g. if L2 is in an interrupt shadow.

So I'm thinking something like this, split over two patches?

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 05d38944a6c0..ad1e70ac8669 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -139,13 +139,18 @@ void recalc_intercepts(struct vcpu_svm *svm)
 
        if (g->int_ctl & V_INTR_MASKING_MASK) {
                /*
-                * Once running L2 with HF_VINTR_MASK, EFLAGS.IF and CR8
-                * does not affect any interrupt we may want to inject;
-                * therefore, writes to CR8 are irrelevant to L0, as are
-                * interrupt window vmexits.
+                * If L2 is active and V_INTR_MASKING is enabled in vmcb12,
+                * disable intercept of CR8 writes as L2's CR8 does not affect
+                * any interrupt KVM may want to inject.
+                *
+                * Similarly, disable intercept of virtual interrupts (used to
+                * detect interrupt windows) if the saved RFLAGS.IF is '0', as
+                * the effective RFLAGS.IF for L1 interrupts will never be set
+                * while L2 is running (L2's RFLAGS.IF doesn't affect L1 IRQs).
                 */
                vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
-               vmcb_clr_intercept(c, INTERCEPT_VINTR);
+               if (!(svm->vmcb01.ptr->save.rflags & X86_EFLAGS_IF))
+                       vmcb_clr_intercept(c, INTERCEPT_VINTR);
        }
 
        /*
@@ -416,18 +421,18 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
 
        /* Only a few fields of int_ctl are written by the processor.  */
        mask = V_IRQ_MASK | V_TPR_MASK;
-       if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
-           svm_is_intercept(svm, INTERCEPT_VINTR)) {
-               /*
-                * In order to request an interrupt window, L0 is usurping
-                * svm->vmcb->control.int_ctl and possibly setting V_IRQ
-                * even if it was clear in L1's VMCB.  Restoring it would be
-                * wrong.  However, in this case V_IRQ will remain true until
-                * interrupt_window_interception calls svm_clear_vintr and
-                * restores int_ctl.  We can just leave it aside.
-                */
+
+       /*
+        * Don't sync vmcb02 V_IRQ back to vmcb12 if KVM (L0) is intercepting
+        * virtual interrupts in order to request an interrupt window, as KVM
+        * has usurped vmcb02's int_ctl.  If an interrupt window opens before
+        * the next VM-Exit, svm_clear_vintr() will restore vmcb12's int_ctl.
+        * If no window opens, V_IRQ will be correctly preserved in vmcb12's
+        * int_ctl (because it was never recognized while L2 was running).
+        */
+       if (svm_is_intercept(svm, INTERCEPT_VINTR) &&
+           !test_bit(INTERCEPT_VINTR, (unsigned long *)svm->nested.ctl.intercepts))
                mask &= ~V_IRQ_MASK;
-       }
 
        if (nested_vgif_enabled(svm))
                mask |= V_GIF_MASK;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b103fe7cbc82..59d2891662ef 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1580,6 +1580,16 @@ static void svm_set_vintr(struct vcpu_svm *svm)
 
        svm_set_intercept(svm, INTERCEPT_VINTR);
 
+       /*
+        * Recalculating intercepts may have cleared the VINTR intercept.  If
+        * V_INTR_MASKING is enabled in vmcb12, then the effective RFLAGS.IF
+        * for L1 physical interrupts is L1's RFLAGS.IF at the time of VMRUN.
+        * Requesting an interrupt window if save.RFLAGS.IF=0 is pointless as
+        * interrupts will never be unblocked while L2 is running.
+        */
+       if (!svm_is_intercept(svm, INTERCEPT_VINTR))
+               return;
+
        /*
         * This is just a dummy VINTR to actually cause a vmexit to happen.
         * Actual injection of virtual interrupts happens through EVENTINJ.

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 06/11] KVM: SVM: add wrappers to enable/disable IRET interception
  2022-12-08 13:44         ` Maxim Levitsky
@ 2023-01-31 21:07           ` Sean Christopherson
  2023-02-13 14:50             ` Santosh Shukla
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2023-01-31 21:07 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Santosh Shukla, kvm, Sandipan Das, Paolo Bonzini, Jim Mattson,
	Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Daniel Sneddon,
	Jiaxi Chen, Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Thu, Dec 08, 2022, Maxim Levitsky wrote:
> On Thu, 2022-12-08 at 17:39 +0530, Santosh Shukla wrote:
> > 
> > On 12/6/2022 5:44 PM, Maxim Levitsky wrote:
> > > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > > index 512b2aa21137e2..cfed6ab29c839a 100644
> > > > > --- a/arch/x86/kvm/svm/svm.c
> > > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > > @@ -2468,16 +2468,29 @@ static int task_switch_interception(struct kvm_vcpu *vcpu)
> > > > >  			       has_error_code, error_code);
> > > > >  }
> > > > >  
> > > > > +static void svm_disable_iret_interception(struct vcpu_svm *svm)
> > > > > +{
> > > > > +	if (!sev_es_guest(svm->vcpu.kvm))
> > > > > +		svm_clr_intercept(svm, INTERCEPT_IRET);
> > > > > +}
> > > > > +
> > > > > +static void svm_enable_iret_interception(struct vcpu_svm *svm)
> > > > > +{
> > > > > +	if (!sev_es_guest(svm->vcpu.kvm))
> > > > > +		svm_set_intercept(svm, INTERCEPT_IRET);
> > > > > +}
> > > > > +
> > > > 
> > > > nits:
> > > > s/_iret_interception / _iret_intercept
> > > > does that make sense?
> > > 
> > > Makes sense.

I would rather go with svm_{clr,set}_iret_intercept().  I don't particularly like
the SVM naming scheme, but I really dislike inconsistent naming.  If we want to
clean up naming, I would love to unify VMX and SVM nomenclature for things like this.

> > >  I can also move this to svm.h near svm_set_intercept(); I think
> > > that's a better place for these functions, if there are no objections.
> > > 
> > I think the current approach is fine since the functions are used in svm.c only, but I have
> > no strong opinion on moving them to svm.h either way.
> 
> I also think so, just noticed something in case there are any objections.

My vote is to keep it in svm.c unless we anticipate usage outside of svm.h.  Keeping
the implementation close to the usage makes it easier to understand what's going on,
especially for something like this where there's a bit of "hidden" logic for SEV-ES.
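
FWIW, a rough (untested) sketch of the renamed wrappers, with the SEV-ES
check kept hidden in svm.c:

	static void svm_set_iret_intercept(struct vcpu_svm *svm)
	{
		/* SEV-ES guests never intercept IRET for NMI tracking. */
		if (!sev_es_guest(svm->vcpu.kvm))
			svm_set_intercept(svm, INTERCEPT_IRET);
	}

	static void svm_clr_iret_intercept(struct vcpu_svm *svm)
	{
		if (!sev_es_guest(svm->vcpu.kvm))
			svm_clr_intercept(svm, INTERCEPT_IRET);
	}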

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-01-28  1:09   ` Sean Christopherson
@ 2023-01-31 21:12     ` Sean Christopherson
  2023-02-08  9:35       ` Santosh Shukla
  2023-02-08  9:32     ` Santosh Shukla
  2023-02-24 14:39     ` Maxim Levitsky
  2 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2023-01-31 21:12 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Sat, Jan 28, 2023, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> > @@ -10015,13 +10022,34 @@ static void process_nmi(struct kvm_vcpu *vcpu)
> >  	 * Otherwise, allow two (and we'll inject the first one immediately).
> >  	 */
> >  	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
> > -		limit = 1;
> > +		limit--;
> > +
> > +	/* Also if there is already a NMI hardware queued to be injected,
> > +	 * decrease the limit again
> > +	 */
> 
> 	/*
> 	 * Block comment ...
> 	 */
> 
> > +	if (static_call(kvm_x86_get_hw_nmi_pending)(vcpu))
> 
> I'd prefer "is_hw_nmi_pending()" over "get", even if it means not pairing with
> "set".  Though I think that's a good thing since they aren't perfect pairs.

Thinking more, I vote for s/hw_nmi/vnmi.  "hardware" usually means actual hardware,
i.e. a pending NMI for the host.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2022-11-29 19:37 ` [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface Maxim Levitsky
  2023-01-28  1:09   ` Sean Christopherson
@ 2023-01-31 22:28   ` Sean Christopherson
  2023-02-01  0:06     ` Sean Christopherson
  2023-02-08  9:43     ` Santosh Shukla
  1 sibling, 2 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-01-31 22:28 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> @@ -5191,9 +5191,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>  
>  	vcpu->arch.nmi_injected = events->nmi.injected;
>  	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
> -		vcpu->arch.nmi_pending = events->nmi.pending;
> +		atomic_add(events->nmi.pending, &vcpu->arch.nmi_queued);
> +
>  	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>  
> +	process_nmi(vcpu);

Argh, having two process_nmi() calls is ugly (not blaming your code, it's KVM's
ABI that's ugly).  E.g. if we collapse this down, it becomes:

	process_nmi(vcpu);

	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
		<blah blah blah>
	}
	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);

	process_nmi(vcpu);

And the second mess is that V_NMI needs to be cleared.

The first process_nmi() effectively exists to (a) purge nmi_queued and (b) keep
nmi_pending if KVM_VCPUEVENT_VALID_NMI_PENDING is not set.  I think we can just
replace that with a set of nmi_queued, i.e.

	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
		vcpu->arch.nmi_pending = 0;
		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
		process_nmi(vcpu);
	}

because if nmi_queued is non-zero in the !KVM_VCPUEVENT_VALID_NMI_PENDING case, then
there should already be a pending KVM_REQ_NMI.  Alternatively, replace process_nmi()
with a KVM_REQ_NMI request (that probably has my vote?).

If that works, can you do that in a separate patch?  Then this patch can tack on
a call to clear V_NMI.
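
E.g. a minimal (untested) sketch of the KVM_REQ_NMI variant, assuming the
request gets serviced by process_nmi() on the next trip through
vcpu_enter_guest():

	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
		vcpu->arch.nmi_pending = 0;
		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
		kvm_make_request(KVM_REQ_NMI, vcpu);
	}
	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);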

>  	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
>  	    lapic_in_kernel(vcpu))
>  		vcpu->arch.apic->sipi_vector = events->sipi_vector;
> @@ -10008,6 +10011,10 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
>  static void process_nmi(struct kvm_vcpu *vcpu)
>  {
>  	unsigned limit = 2;
> +	int nmi_to_queue = atomic_xchg(&vcpu->arch.nmi_queued, 0);
> +
> +	if (!nmi_to_queue)
> +		return;
>  
>  	/*
>  	 * x86 is limited to one NMI running, and one NMI pending after it.
> @@ -10015,13 +10022,34 @@ static void process_nmi(struct kvm_vcpu *vcpu)
>  	 * Otherwise, allow two (and we'll inject the first one immediately).
>  	 */
>  	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
> -		limit = 1;
> +		limit--;
> +
> +	/* Also if there is already a NMI hardware queued to be injected,
> +	 * decrease the limit again
> +	 */
> +	if (static_call(kvm_x86_get_hw_nmi_pending)(vcpu))
> +		limit--;

I don't think this is correct.  If a vNMI is pending and NMIs are blocked, then
limit will end up '0' and KVM will fail to pend the additional NMI in software.
After much fiddling, and factoring in the above, how about this?

	unsigned limit = 2;

	/*
	 * x86 is limited to one NMI running, and one NMI pending after it.
	 * If an NMI is already in progress, limit further NMIs to just one.
	 * Otherwise, allow two (and we'll inject the first one immediately).
	 */
	if (vcpu->arch.nmi_injected) {
		/* vNMI counts as the "one pending NMI". */
		if (static_call(kvm_x86_is_vnmi_pending)(vcpu))
			limit = 0;
		else
			limit = 1;
	} else if (static_call(kvm_x86_get_nmi_mask)(vcpu) ||
		   static_call(kvm_x86_is_vnmi_pending)(vcpu)) {
		limit = 1;
	}

	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);

	if (vcpu->arch.nmi_pending &&
	    static_call(kvm_x86_set_vnmi_pending)(vcpu))
		vcpu->arch.nmi_pending--;

	if (vcpu->arch.nmi_pending)
		kvm_make_request(KVM_REQ_EVENT, vcpu);

With the KVM_REQ_EVENT change in a separate prep patch:

--
From: Sean Christopherson <seanjc@google.com>
Date: Tue, 31 Jan 2023 13:33:02 -0800
Subject: [PATCH] KVM: x86: Raise an event request when processing NMIs iff an
 NMI is pending

Don't raise KVM_REQ_EVENT if no NMIs are pending at the end of
process_nmi().  Finishing process_nmi() without a pending NMI will become
much more likely when KVM gains support for AMD's vNMI, which allows
pending vNMIs in hardware, i.e. doesn't require explicit injection.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 508074e47bc0..030136b6ebbd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10134,7 +10134,9 @@ static void process_nmi(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
 	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
-	kvm_make_request(KVM_REQ_EVENT, vcpu);
+
+	if (vcpu->arch.nmi_pending)
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 
 void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,

base-commit: 916b54a7688b0b9a1c48c708b848e4348c3ae2ab
-- 

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 09/11] KVM: SVM: Add VNMI bit definition
  2022-11-29 19:37 ` [PATCH v2 09/11] KVM: SVM: Add VNMI bit definition Maxim Levitsky
@ 2023-01-31 22:42   ` Sean Christopherson
  2023-02-02  9:42     ` Santosh Shukla
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2023-01-31 22:42 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Santosh Shukla

On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> From: Santosh Shukla <santosh.shukla@amd.com>
> 
> VNMI exposes 3 capability bits (V_NMI, V_NMI_MASK, and V_NMI_ENABLE) to
> virtualize NMI and NMI_MASK. These capability bits are part of
> VMCB::intr_ctrl:
> V_NMI(11) - Indicates whether a virtual NMI is pending in the guest.
> V_NMI_MASK(12) - Indicates whether virtual NMI is masked in the guest.
> V_NMI_ENABLE(26) - Enables the NMI virtualization feature for the guest.
> 
> When the hypervisor wants to inject an NMI, it sets the V_NMI bit. The
> processor then clears V_NMI and sets V_NMI_MASK, which means the guest is
> handling the NMI. After the guest has handled the NMI, the processor clears
> V_NMI_MASK on successful completion of the IRET instruction, or if a
> VMEXIT occurs while delivering the virtual NMI.
> 
> To enable the VNMI capability, the hypervisor needs to set the
> V_NMI_ENABLE bit to 1.
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> ---
>  arch/x86/include/asm/svm.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index cb1ee53ad3b189..26d6f549ce2b46 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -203,6 +203,13 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  #define X2APIC_MODE_SHIFT 30
>  #define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
>  
> +#define V_NMI_PENDING_SHIFT 11
> +#define V_NMI_PENDING (1 << V_NMI_PENDING_SHIFT)
> +#define V_NMI_MASK_SHIFT 12
> +#define V_NMI_MASK (1 << V_NMI_MASK_SHIFT)

Argh, more KVM warts.  The existing INT_CTL defines all use "mask" in the name,
so looking at V_NMI_MASK in the context of other code reads "vNMI is pending",
not "vNMIs are blocked".

IMO, the existing _MASK terminology is the one that's wrong, but there's an absurd
amount of prior art in svm.h :-(

And the really annoying one is V_INTR_MASKING_MASK, which IIRC says "virtual INTR
masking is enabled", not "virtual INTRs are blocked".

So maybe call this V_NMI_BLOCKING_MASK?  And tack on _MASK too the others (even
though I agree it's ugly).

> +#define V_NMI_ENABLE_SHIFT 26
> +#define V_NMI_ENABLE (1 << V_NMI_ENABLE_SHIFT)

Hrm.  I think I would prefer to keep the defines ordered by bit position.  Knowing
that there's an enable bit isn't all that critical for understanding vNMI pending
and blocked.
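
I.e. maybe something like this (just a sketch), keeping the _MASK suffix for
the bit masks, using BLOCKING for bit 12, and ordering by bit position:

	#define V_NMI_PENDING_SHIFT	11
	#define V_NMI_PENDING_MASK	(1 << V_NMI_PENDING_SHIFT)

	#define V_NMI_BLOCKING_SHIFT	12
	#define V_NMI_BLOCKING_MASK	(1 << V_NMI_BLOCKING_SHIFT)

	#define V_NMI_ENABLE_SHIFT	26
	#define V_NMI_ENABLE_MASK	(1 << V_NMI_ENABLE_SHIFT)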

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-01-31 22:28   ` Sean Christopherson
@ 2023-02-01  0:06     ` Sean Christopherson
  2023-02-08  9:51       ` Santosh Shukla
  2023-02-08  9:43     ` Santosh Shukla
  1 sibling, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2023-02-01  0:06 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Tue, Jan 31, 2023, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> > @@ -10015,13 +10022,34 @@ static void process_nmi(struct kvm_vcpu *vcpu)
> >  	 * Otherwise, allow two (and we'll inject the first one immediately).
> >  	 */
> >  	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
> > -		limit = 1;
> > +		limit--;
> > +
> > +	/* Also if there is already a NMI hardware queued to be injected,
> > +	 * decrease the limit again
> > +	 */
> > +	if (static_call(kvm_x86_get_hw_nmi_pending)(vcpu))
> > +		limit--;
> 
> I don't think this is correct.  If a vNMI is pending and NMIs are blocked, then
> limit will end up '0' and KVM will fail to pend the additional NMI in software.

Scratch that, dropping the second NMI in this case is correct.  The "running" part
of the existing "x86 is limited to one NMI running, and one NMI pending after it"
confused me.  The "running" thing is really just a variant on NMIs being blocked.

I'd also like to avoid the double decrement logic.  Accounting the virtual NMI is
a very different thing than dealing with concurrent NMIs, so I'd prefer to reflect
that in the code.

Any objection to folding in the below to end up with:

	unsigned limit;

	/*
	 * x86 is limited to one NMI pending, but because KVM can't react to
	 * incoming NMIs as quickly as bare metal, e.g. if the vCPU is
	 * scheduled out, KVM needs to play nice with two queued NMIs showing
	 * up at the same time.  To handle this scenario, allow two NMIs to be
	 * (temporarily) pending so long as NMIs are not blocked and KVM is not
	 * waiting for a previous NMI injection to complete (which effectively
	 * blocks NMIs).  KVM will immediately inject one of the two NMIs, and
	 * will request an NMI window to handle the second NMI.
	 */
	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
		limit = 1;
	else
		limit = 2;

	/*
	 * Adjust the limit to account for pending virtual NMIs, which aren't
	 * tracked in in vcpu->arch.nmi_pending.
	 */
	if (static_call(kvm_x86_is_vnmi_pending)(vcpu))
		limit--;

	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);

	if (vcpu->arch.nmi_pending)
		kvm_make_request(KVM_REQ_EVENT, vcpu);

--
From: Sean Christopherson <seanjc@google.com>
Date: Tue, 31 Jan 2023 16:02:21 -0800
Subject: [PATCH] KVM: x86: Tweak the code and comment related to handling
 concurrent NMIs

Tweak the code and comment that deals with concurrent NMIs to explicitly
call out that x86 allows exactly one pending NMI, but that KVM needs to
temporarily allow two pending NMIs in order to workaround the fact that
the target vCPU cannot immediately recognize an incoming NMI, unlike bare
metal.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 030136b6ebbd..fda09ba48b6b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10122,15 +10122,22 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
 
 static void process_nmi(struct kvm_vcpu *vcpu)
 {
-	unsigned limit = 2;
+	unsigned limit;
 
 	/*
-	 * x86 is limited to one NMI running, and one NMI pending after it.
-	 * If an NMI is already in progress, limit further NMIs to just one.
-	 * Otherwise, allow two (and we'll inject the first one immediately).
+	 * x86 is limited to one NMI pending, but because KVM can't react to
+	 * incoming NMIs as quickly as bare metal, e.g. if the vCPU is
+	 * scheduled out, KVM needs to play nice with two queued NMIs showing
+	 * up at the same time.  To handle this scenario, allow two NMIs to be
+	 * (temporarily) pending so long as NMIs are not blocked and KVM is not
+	 * waiting for a previous NMI injection to complete (which effectively
+	 * blocks NMIs).  KVM will immediately inject one of the two NMIs, and
+	 * will request an NMI window to handle the second NMI.
 	 */
 	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
 		limit = 1;
+	else
+		limit = 2;
 
 	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
 	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);

base-commit: 07222b33fd1af78dca77c7c66db31477f1b87f0f
-- 


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 10/11] KVM: SVM: implement support for vNMI
  2022-11-29 19:37 ` [PATCH v2 10/11] KVM: SVM: implement support for vNMI Maxim Levitsky
                     ` (2 preceding siblings ...)
  2023-01-28  1:10   ` Sean Christopherson
@ 2023-02-01  0:22   ` Sean Christopherson
  2023-02-01  0:39     ` Sean Christopherson
  2023-02-10 12:24     ` Santosh Shukla
  3 siblings, 2 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-02-01  0:22 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Santosh Shukla

On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> This patch implements support for injecting pending
> NMIs via the .kvm_x86_set_hw_nmi_pending using new AMD's vNMI

Wrap closer to 75 chars, and please try to be consistent in how you format
things like changelogs and comments.  It's jarring as a reader when the wrap
column is constantly changing.

> feature.

Please combine the introduction, usage, and implementation of the hew kvm_x86_ops,
i.e. introduce and use the ops in this patch.  It's extremely difficult to review
the common x86 code that uses the ops without seeing how they're implemented in
SVM.  I believe the overall size/scope of the patch can be kept reasonable by
introducing some of the common changes in advance of the new ops, e.g. tweaking
the KVM_SET_VCPU_EVENTS flow.

> Note that the vNMI can't cause a VM exit, which is needed
> when a nested guest intercepts NMIs.

I can't tell if this is saying "SVM doesn't allow intercepting virtual NMIs", or
"KVM never enables virtual NMI interception".

> Therefore to avoid breaking nesting, the vNMI is inhibited while
> a nested guest is running and instead, the legacy NMI window
> detection and delivery method is used.

State what KVM does, don't describe the effects.  E.g. "Disable vNMI while running
L2".  When a changelog describes the effects, it's unclear whether the effects are
new behavior introduced by the patch, hardware behavior, etc...

> While it is possible to passthrough the vNMI if a nested guest
> doesn't intercept NMIs, such usage is very uncommon, and it's
> not worth to optimize for.

Can you elaborate on why not?  It's not obvious to me that the code will end up
more complex, and oftentimes omitting code increases the cognitive load on readers,
i.e. makes things more complex in a way.  vNMI is mutually exclusive with NMI
passthrough, i.e. KVM doesn't need to handle NMI passthrough and vNMI simultaneously.

> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>

SoB chain is wrong.  Maxim is credited as the sole Author, i.e. Santosh shouldn't
have a SoB.  Assuming the intent is to attribute both of ya'll this needs to be

 Co-developed-by: Santosh Shukla <santosh.shukla@amd.com>
 Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>

if Maxim is the primary author, or this if Santosh is the primary author

 From: Santosh Shukla <santosh.shukla@amd.com>

 <blah blah blah>

 Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
 Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com>
 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>

> ---
>  arch/x86/kvm/svm/nested.c |  42 +++++++++++++++
>  arch/x86/kvm/svm/svm.c    | 111 ++++++++++++++++++++++++++++++--------
>  arch/x86/kvm/svm/svm.h    |  10 ++++
>  3 files changed, 140 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index e891318595113e..5bea672bf8b12d 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -623,6 +623,42 @@ static bool is_evtinj_nmi(u32 evtinj)
>  	return type == SVM_EVTINJ_TYPE_NMI;
>  }
>  
> +static void nested_svm_save_vnmi(struct vcpu_svm *svm)

Please avoid save/restore names.  KVM (selftests in particular) uses save/restore
to refer to a saving and restoring state across a migration.  "sync" is probably
the best option, or just open code the flows.

> +{
> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
> +
> +	/*
> +	 * Copy the vNMI state back to software NMI tracking state
> +	 * for the duration of the nested run
> +	 */
> +
> +	svm->nmi_masked = vmcb01->control.int_ctl & V_NMI_MASK;
> +	svm->vcpu.arch.nmi_pending += vmcb01->control.int_ctl & V_NMI_PENDING;

This is wrong.  V_NMI_PENDING is bit 11, i.e. the bitwise-AND does not yield a
boolean value and will increment nmi_pending by 2048 instead of by 1.

	if (vmcb01->control.int_ctl & V_NMI_PENDING)
		svm->vcpu.arch.nmi_pending++;

And this needs a KVM_REQ_EVENT to ensure KVM processes the newly pending NMI.
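
For illustration, a minimal sketch of the combined fix (field and define names
follow the quoted patch, so treat this as a sketch rather than final code):

	/* Reflect a pending L1 vNMI back into the software tracking state. */
	if (vmcb01->control.int_ctl & V_NMI_PENDING) {
		svm->vcpu.arch.nmi_pending++;
		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
	}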

> +}
> +
> +static void nested_svm_restore_vnmi(struct vcpu_svm *svm)
> +{
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
> +
> +	/*
> +	 * Restore the vNMI state from the software NMI tracking state
> +	 * after a nested run
> +	 */
> +
> +	if (svm->nmi_masked)
> +		vmcb01->control.int_ctl |= V_NMI_MASK;
> +	else
> +		vmcb01->control.int_ctl &= ~V_NMI_MASK;

As proposed, this needs to clear nmi_masked to avoid false positives.  The better
approach is to not have any open coded accesses to svm->nmi_masked outside of
flows that specifically need to deal with vNMI logic.

E.g. svm_enable_nmi_window() reads the raw nmi_masked.
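
One way to get there, sketched against the V_NMI_MASK define from the patch and
the existing is_vnmi_enabled() helper, is to funnel all masking queries through
the .get_nmi_mask callback so nothing outside the vNMI-aware paths reads
nmi_masked directly:

static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/* Hardware tracks NMI blocking when vNMI is in use. */
	if (is_vnmi_enabled(svm))
		return svm->vmcb->control.int_ctl & V_NMI_MASK;

	return svm->nmi_masked;
}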

> +
> +	if (vcpu->arch.nmi_pending) {
> +		vcpu->arch.nmi_pending--;
> +		vmcb01->control.int_ctl |= V_NMI_PENDING;
> +	} else

Curly braces on all paths if any path needs 'em.

> +		vmcb01->control.int_ctl &= ~V_NMI_PENDING;
> +}

...

> + static bool svm_set_hw_nmi_pending(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (!is_vnmi_enabled(svm))
> +		return false;
> +
> +	if (svm->vmcb->control.int_ctl & V_NMI_PENDING)
> +		return false;
> +
> +	svm->vmcb->control.int_ctl |= V_NMI_PENDING;
> +	vmcb_mark_dirty(svm->vmcb, VMCB_INTR);
> +
> +	/*
> +	 * NMI isn't yet technically injected but
> +	 * this rough estimation should be good enough

Again, wrap at 80 chars, not at random points.

> +	 */
> +	++vcpu->stat.nmi_injections;
> +
> +	return true;
> +}
> +

...

>  bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -3725,10 +3772,16 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  	/*
>  	 * Something prevents NMI from been injected. Single step over possible
>  	 * problem (IRET or exception injection or interrupt shadow)
> +	 *
> +	 * With vNMI we should never need an NMI window
> +	 * (we can always inject vNMI either by setting VNMI_PENDING or by EVENTINJ)

Please honor the soft limit and avoid pronouns.  There's also no need to put the
blurb in parentheses on its own line.

As for the code, I believe there are bugs.  Pulling in the context...

	if (svm->nmi_masked && !svm->awaiting_iret_completion)
		return; /* IRET will cause a vm exit */

Checking nmi_masked is wrong, this should use the helper.  Even if this code can
only be reached on error, it should still try its best to not make things worse.

	if (!gif_set(svm)) {
		if (vgif)
			svm_set_intercept(svm, INTERCEPT_STGI);
		return; /* STGI will cause a vm exit */
	}

	/*
	 * Something prevents NMI from been injected. Single step over possible
	 * problem (IRET or exception injection or interrupt shadow)
	 *
	 * With vNMI we should never need an NMI window
	 * (we can always inject vNMI either by setting VNMI_PENDING or by EVENTINJ)
	 */
	if (WARN_ON_ONCE(is_vnmi_enabled(svm)))
		return;

This is flawed, where "this" means handling of NMI windows when vNMI is enabled.

IIUC, if there are back-to-back NMIs, the intent is to set V_NMI for one and
inject the other.  I believe there are multiple bugs svm_inject_nmi().  The one
that's definitely a bug is setting svm->nmi_masked.  The other suspected bug,
which is related to the above WARN, is setting the IRET intercept.  The resulting
IRET interception will set awaiting_iret_completion, and then the above WARN is
reachable (even if the masking check is fixed).

I don't think KVM needs to ever intercept IRET.  One NMI gets injected, and the
other is sitting in INT_CTL.V_NMI_PENDING, i.e. there's no need for KVM to regain
control.  If another NMI comes along before V_NMI_PENDING is handled, it will
either get injected or dropped.

So I _think_ KVM can skip the intercept code when injecting an NMI, and this WARN
can be hoisted to the top of svm_enable_nmi_window(), because as stated above, KVM
should never need to request an NMI window.

Last thought, unless there's something that will obviously break, it's probably
better to WARN and continue than to bail.  I.e. do the single-step and hope for
the best.  Bailing at this point doesn't seem like it would help.
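
For illustration, the injection side could end up looking roughly like the below
(a sketch; svm_set_iret_intercept() stands in for whatever the IRET-intercept
wrapper from earlier in the series ends up being called):

static void svm_inject_nmi(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;

	/*
	 * With vNMI, hardware tracks NMI blocking and a second NMI sits in
	 * V_NMI_PENDING, so KVM never needs to intercept IRET or manually
	 * track masking.
	 */
	if (!is_vnmi_enabled(svm)) {
		svm->nmi_masked = true;
		svm_set_iret_intercept(svm);
	}

	++vcpu->stat.nmi_injections;
}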

>  	 */
> +	if (WARN_ON_ONCE(is_vnmi_enabled(svm)))
> +		return;
> +
>  	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> -	svm->nmi_singlestep = true;
>  	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> +	svm->nmi_singlestep = true;
>  }

...

> @@ -553,6 +554,15 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
>  	       (msr < (APIC_BASE_MSR + 0x100));
>  }
>  
> +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> +{
> +	/* L1's vNMI is inhibited while nested guest is running */
> +	if (is_guest_mode(&svm->vcpu))

I would rather check the current VMCB.  I don't see any value in hardcoding the
"KVM doesn't support vNMI in L2" in multiple places.  And I find the above comment
about "L1's vNMI is inhibited" confusing.  vNMI isn't inhibited/blocked, KVM just
doesn't utilize vNMI while L2 is active (IIUC, as proposed).

> +		return false;
> +
> +	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE);
> +}
> +
>  /* svm.c */
>  #define MSR_INVALID				0xffffffffU
>  
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 00/11] SVM: vNMI (with my fixes)
  2022-12-06  9:58 ` [PATCH v2 00/11] SVM: vNMI (with my fixes) Santosh Shukla
@ 2023-02-01  0:24   ` Sean Christopherson
  0 siblings, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-02-01  0:24 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: Maxim Levitsky, kvm, Sandipan Das, Paolo Bonzini, Jim Mattson,
	Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Daniel Sneddon,
	Jiaxi Chen, Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Tue, Dec 06, 2022, Santosh Shukla wrote:
> Series tested on EPYC-v4.
> Tested-By: Santosh Shukla <Santosh.Shukla@amd.com>

In the future, please use Tested-by, not Tested-By.  For whatever reason, the
preferred kernel style for tags is to capitalize only the first word, e.g.
Co-developed-by, Tested-by, Reviewed-by, etc...

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 10/11] KVM: SVM: implement support for vNMI
  2023-02-01  0:22   ` Sean Christopherson
@ 2023-02-01  0:39     ` Sean Christopherson
  2023-02-10 12:24     ` Santosh Shukla
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-02-01  0:39 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Santosh Shukla

On Wed, Feb 01, 2023, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> > @@ -553,6 +554,15 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
> >  	       (msr < (APIC_BASE_MSR + 0x100));
> >  }
> >  
> > +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> > +{
> > +	/* L1's vNMI is inhibited while nested guest is running */
> > +	if (is_guest_mode(&svm->vcpu))
> 
> I would rather check the current VMCB.  I don't see any value in hardcoding the
> "KVM doesn't support vNMI in L2" in multiple places.  And I find the above comment
> about "L1's vNMI is inhibited" confusing.  vNMI isn't inhibited/blocked, KVM just
> doesn't utilize vNMI while L2 is active (IIUC, as proposed).

Oof.  Scratch that, code is correct as proposed, but the comment and function name
are confusing.

After looking at the nested support, is_vnmi_enabled() actually means "is vNMI
currently enabled _for L1_".  And it has less to do with vNMI being "inhibited" and
everything to do with KVM choosing not to utilize vNMI for L1 while running L2.

"inhibited" in quotes because "inhibited" is a synonym of "blocked", i.e. I read
that as L1 NMIs are blocked.

So regardless of whether or not KVM decides to utilize vNMI for L2 if L1's NMIs
are passed through, the function name needs to clarify that it's referring to
L1.  E.g. is_vnmi_enabled_for_l1() or so.

And if KVM decides not to use vNMI for L1 while running L2, state that more
explicitly instead of saying it's inhibited.

And if KVM does decide to use vNMI while running L2 if NMIs are passed through,
then the comment goes away and KVM toggles the flag in vmcb01 on nested enter
and exit.
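
For illustration, the rename amounts to something like this (a sketch reusing the
code from the quoted patch):

/* KVM does not utilize vNMI for L1 while L2 is active. */
static inline bool is_vnmi_enabled_for_l1(struct vcpu_svm *svm)
{
	if (is_guest_mode(&svm->vcpu))
		return false;

	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE);
}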

> > +		return false;
> > +
> > +	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE);
> > +}
> > +
> >  /* svm.c */
> >  #define MSR_INVALID				0xffffffffU
> >  
> > -- 
> > 2.26.3
> > 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 11/11] KVM: nSVM: implement support for nested VNMI
  2022-11-29 19:37 ` [PATCH v2 11/11] KVM: nSVM: implement support for nested VNMI Maxim Levitsky
  2022-12-05 17:14   ` Santosh Shukla
@ 2023-02-01  0:44   ` Sean Christopherson
  1 sibling, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-02-01  0:44 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Santosh Shukla

On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> This patch allows L1 to use vNMI to accelerate its injection
> of NMIs to L2 by passing through vNMI int_ctl bits from vmcb12
> to/from vmcb02.
> 
> While L2 runs, L1's vNMI is inhibited, and L1's NMIs are injected
> normally.

Same feedback on stating the change as a command instead of describing the net
effects.

> In order to support nested VNMI requires saving and restoring the VNMI
> bits during nested entry and exit.

Again, avoid saving+restoring.  And it's not just for terminology, it's not a
true save/restore, e.g. a pending vNMI for L1 needs to be recognized and trigger
a nested VM-Exit.  I.e. KVM can't simply stash the state and restore it later,
KVM needs to actively process the pending NMI.

> In case of L1 and L2 both using VNMI- Copy VNMI bits from vmcb12 to
> vmcb02 during entry and vice-versa during exit.
> And in case of L1 uses VNMI and L2 doesn't- Copy VNMI bits from vmcb01 to
> vmcb02 during entry and vice-versa during exit.
> 
> Tested with the KVM-unit-test and Nested Guest scenario.
> 
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>

Same SoB issues.

> ---
>  arch/x86/kvm/svm/nested.c | 13 ++++++++++++-
>  arch/x86/kvm/svm/svm.c    |  5 +++++
>  arch/x86/kvm/svm/svm.h    |  6 ++++++
>  3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 5bea672bf8b12d..81346665058e26 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -278,6 +278,11 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
>  	if (CC(!nested_svm_check_tlb_ctl(vcpu, control->tlb_ctl)))
>  		return false;
>  
> +	if (CC((control->int_ctl & V_NMI_ENABLE) &&
> +		!vmcb12_is_intercept(control, INTERCEPT_NMI))) {

Align indentation.

	if (CC((control->int_ctl & V_NMI_ENABLE) &&
	       !vmcb12_is_intercept(control, INTERCEPT_NMI))) {
		return false;
	}

> +		return false;
> +	}
> +
>  	return true;
>  }
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 0b7e1790fadde1..8fb2085188c5ac 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -271,6 +271,7 @@ struct vcpu_svm {
>  	bool pause_filter_enabled         : 1;
>  	bool pause_threshold_enabled      : 1;
>  	bool vgif_enabled                 : 1;
> +	bool vnmi_enabled                 : 1;
>  
>  	u32 ldr_reg;
>  	u32 dfr_reg;
> @@ -545,6 +546,11 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
>  	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
>  }
>  
> +static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
> +{
> +	return svm->vnmi_enabled && (svm->nested.ctl.int_ctl & V_NMI_ENABLE);

Gah, the "nested" flags in vcpu_svm are super confusing.  I initially read this
as "if vNMI is enabled in L1 and vmcb12".  

I have a series that I originally prepped for the architectural LBRs series that
will allow turning this into

	return guest_can_use(vcpu, X86_FEATURE_VNMI) &&
	       (svm->nested.ctl.int_ctl & V_NMI_ENABLE);

I'll get that series posted.

Nothing to do on your end, just an FYI.  I'll sort out conflicts if/when they happen.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 00/11] SVM: vNMI (with my fixes)
  2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (14 preceding siblings ...)
  2023-01-28  1:13 ` Sean Christopherson
@ 2023-02-01 19:13 ` Sean Christopherson
  15 siblings, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-02-01 19:13 UTC (permalink / raw)
  To: Sean Christopherson, kvm, Maxim Levitsky
  Cc: Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Tue, 29 Nov 2022 21:37:06 +0200, Maxim Levitsky wrote:
> This is the vNMI patch series based on Santosh Shukla's vNMI patch series.
> 
> In this version of this patch series I addressed most of the review feedback
> added some more refactoring and also I think fixed the issue with migration.
> 
> I only tested this on a machine which doesn't have vNMI, so this does need
> some testing to ensure that nothing is broken.
> 
> [...]

Applied 1, 4, and 5 to kvm-x86 svm.  I split patch 4 as doing so made the
HF_GIF_MASK change super trivial.

vNMI support will get pushed beyond v6.3, but I will do my best to promptly
review future versions, while I still have all of this paged in...

[01/11] KVM: nSVM: Don't sync tlb_ctl back to vmcb12 on nested VM-Exit
        https://github.com/kvm-x86/linux/commit/8957cbcfed0a
[04/11] KVM: x86: Move HF_GIF_MASK into "struct vcpu_svm" as "guest_gif"
        https://github.com/kvm-x86/linux/commit/c760e86f27fe
[04/11] KVM: x86: Move HF_NMI_MASK and HF_IRET_MASK into "struct vcpu_svm"
        https://github.com/kvm-x86/linux/commit/916b54a7688b
[05/11] KVM: x86: Use emulator callbacks instead of duplicating "host flags"
        https://github.com/kvm-x86/linux/commit/32e69f232db4

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 09/11] KVM: SVM: Add VNMI bit definition
  2023-01-31 22:42   ` Sean Christopherson
@ 2023-02-02  9:42     ` Santosh Shukla
  0 siblings, 0 replies; 66+ messages in thread
From: Santosh Shukla @ 2023-02-02  9:42 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin



On 2/1/2023 4:12 AM, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>> From: Santosh Shukla <santosh.shukla@amd.com>
>>
>> VNMI exposes 3 capability bits (V_NMI, V_NMI_MASK, and V_NMI_ENABLE) to
>> virtualize NMI and NMI_MASK, Those capability bits are part of
>> VMCB::intr_ctrl -
>> V_NMI(11) - Indicates whether a virtual NMI is pending in the guest.
>> V_NMI_MASK(12) - Indicates whether virtual NMI is masked in the guest.
>> V_NMI_ENABLE(26) - Enables the NMI virtualization feature for the guest.
>>
>> When Hypervisor wants to inject NMI, it will set V_NMI bit, Processor
>> will clear the V_NMI bit and Set the V_NMI_MASK which means the Guest is
>> handling NMI, After the guest handled the NMI, The processor will clear
>> the V_NMI_MASK on the successful completion of IRET instruction Or if
>> VMEXIT occurs while delivering the virtual NMI.
>>
>> To enable the VNMI capability, Hypervisor need to program
>> V_NMI_ENABLE bit 1.
>>
>> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
>> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>> ---
>>  arch/x86/include/asm/svm.h | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
>> index cb1ee53ad3b189..26d6f549ce2b46 100644
>> --- a/arch/x86/include/asm/svm.h
>> +++ b/arch/x86/include/asm/svm.h
>> @@ -203,6 +203,13 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>>  #define X2APIC_MODE_SHIFT 30
>>  #define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
>>  
>> +#define V_NMI_PENDING_SHIFT 11
>> +#define V_NMI_PENDING (1 << V_NMI_PENDING_SHIFT)
>> +#define V_NMI_MASK_SHIFT 12
>> +#define V_NMI_MASK (1 << V_NMI_MASK_SHIFT)
> 
> Argh, more KVM warts.  The existing INT_CTL defines all use "mask" in the name,
> so looking at V_NMI_MASK in the context of other code reads "vNMI is pending",
> not "vNMIs are blocked".
> 
> IMO, the existing _MASK terminology is the one that's wrong, but there's an absurd
> amount of prior art in svm.h :-(
> 
> And the really annoying one is V_INTR_MASKING_MASK, which IIRC says "virtual INTR
> masking is enabled", not "virtual INTRs are blocked".
> 
> So maybe call this V_NMI_BLOCKING_MASK?  And tack on _MASK to the others (even
> though I agree it's ugly).
> 

Sure.

>> +#define V_NMI_ENABLE_SHIFT 26
>> +#define V_NMI_ENABLE (1 << V_NMI_ENABLE_SHIFT)
> 
> Hrm.  I think I would prefer to keep the defines ordered by bit position.  Knowing
> that there's an enable bit isn't all that critical for understanding vNMI pending
> and blocked.

Sure Sean, will include in V3.

Thanks,
Santosh


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-01-28  1:09   ` Sean Christopherson
  2023-01-31 21:12     ` Sean Christopherson
@ 2023-02-08  9:32     ` Santosh Shukla
  2023-02-24 14:39     ` Maxim Levitsky
  2 siblings, 0 replies; 66+ messages in thread
From: Santosh Shukla @ 2023-02-08  9:32 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On 1/28/2023 6:39 AM, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>> This patch adds two new vendor callbacks:
> 
> No "this patch" please, just say what it does.
> 
Sure.

>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 684a5519812fb2..46993ce61c92db 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -871,8 +871,13 @@ struct kvm_vcpu_arch {
>>  	u64 tsc_scaling_ratio; /* current scaling ratio */
>>  
>>  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
>> -	unsigned nmi_pending; /* NMI queued after currently running handler */
>> +
>> +	unsigned int nmi_pending; /*
>> +				   * NMI queued after currently running handler
>> +				   * (not including a hardware pending NMI (e.g vNMI))
>> +				   */
> 
> Put the block comment above.  I'd say collapse all of the comments about NMIs into
> a single big block comment.
>

ok.
 
>>  	bool nmi_injected;    /* Trying to inject an NMI this entry */
>> +
>>  	bool smi_pending;    /* SMI queued after currently running handler */
>>  	u8 handling_intr_from_guest;
>>  
>> @@ -10015,13 +10022,34 @@ static void process_nmi(struct kvm_vcpu *vcpu)
>>  	 * Otherwise, allow two (and we'll inject the first one immediately).
>>  	 */
>>  	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
>> -		limit = 1;
>> +		limit--;
>> +
>> +	/* Also if there is already a NMI hardware queued to be injected,
>> +	 * decrease the limit again
>> +	 */
> 
> 	/*
> 	 * Block comment ...
> 	 */
>

ok.
 
>> +	if (static_call(kvm_x86_get_hw_nmi_pending)(vcpu))
> 
> I'd prefer "is_hw_nmi_pending()" over "get", even if it means not pairing with
> "set".  Though I think that's a good thing since they aren't perfect pairs.
>

Sure thing, will spin in v3.
 
>> +		limit--;
>>  
>> -	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
>> +	if (limit <= 0)
>> +		return;
>> +
>> +	/* Attempt to use hardware NMI queueing */
>> +	if (static_call(kvm_x86_set_hw_nmi_pending)(vcpu)) {
>> +		limit--;
>> +		nmi_to_queue--;
>> +	}
>> +
>> +	vcpu->arch.nmi_pending += nmi_to_queue;
>>  	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
>>  	kvm_make_request(KVM_REQ_EVENT, vcpu);
>>  }
>>  
>> +/* Return total number of NMIs pending injection to the VM */
>> +int kvm_get_total_nmi_pending(struct kvm_vcpu *vcpu)
>> +{
>> +	return vcpu->arch.nmi_pending + static_call(kvm_x86_get_hw_nmi_pending)(vcpu);
> 
> Nothing cares about the total count, this can just be;
> 
> 
> 	bool kvm_is_nmi_pending(struct kvm_vcpu *vcpu)
> 	{
> 		return vcpu->arch.nmi_pending ||
> 		       static_call(kvm_x86_is_hw_nmi_pending)(vcpu);
> 	}
>

Yes, this simplifies things.

Thanks,
Santosh
 
> 
>> +}
>> +
>>  void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
>>  				       unsigned long *vcpu_bitmap)
>>  {
>> -- 
>> 2.26.3
>>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-01-31 21:12     ` Sean Christopherson
@ 2023-02-08  9:35       ` Santosh Shukla
  0 siblings, 0 replies; 66+ messages in thread
From: Santosh Shukla @ 2023-02-08  9:35 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On 2/1/2023 2:42 AM, Sean Christopherson wrote:
> On Sat, Jan 28, 2023, Sean Christopherson wrote:
>> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>>> @@ -10015,13 +10022,34 @@ static void process_nmi(struct kvm_vcpu *vcpu)
>>>  	 * Otherwise, allow two (and we'll inject the first one immediately).
>>>  	 */
>>>  	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
>>> -		limit = 1;
>>> +		limit--;
>>> +
>>> +	/* Also if there is already a NMI hardware queued to be injected,
>>> +	 * decrease the limit again
>>> +	 */
>>
>> 	/*
>> 	 * Block comment ...
>> 	 */
>>
>>> +	if (static_call(kvm_x86_get_hw_nmi_pending)(vcpu))
>>
>> I'd prefer "is_hw_nmi_pending()" over "get", even if it means not pairing with
>> "set".  Though I think that's a good thing since they aren't perfect pairs.
> 
> Thinking more, I vote for s/hw_nmi/vnmi.  "hardware" usually means actual hardware,
> i.e. a pending NMI for the host.

Ah, better.

Thanks,
Santosh

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-01-31 22:28   ` Sean Christopherson
  2023-02-01  0:06     ` Sean Christopherson
@ 2023-02-08  9:43     ` Santosh Shukla
  2023-02-08 16:06       ` Sean Christopherson
  1 sibling, 1 reply; 66+ messages in thread
From: Santosh Shukla @ 2023-02-08  9:43 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On 2/1/2023 3:58 AM, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>> @@ -5191,9 +5191,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>>  
>>  	vcpu->arch.nmi_injected = events->nmi.injected;
>>  	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
>> -		vcpu->arch.nmi_pending = events->nmi.pending;
>> +		atomic_add(events->nmi.pending, &vcpu->arch.nmi_queued);
>> +
>>  	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>>  
>> +	process_nmi(vcpu);
> 
> Argh, having two process_nmi() calls is ugly (not blaming your code, it's KVM's
> ABI that's ugly).  E.g. if we collapse this down, it becomes:
> 
> 	process_nmi(vcpu);
> 
> 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> 		<blah blah blah>
> 	}
> 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> 
> 	process_nmi(vcpu);
> 
> And the second mess is that V_NMI needs to be cleared.
> 

Can you please elaborate on the "V_NMI cleared" scenario? Are you referring to V_NMI_MASK or svm->nmi_masked?

> The first process_nmi() effectively exists to (a) purge nmi_queued and (b) keep
> nmi_pending if KVM_VCPUEVENT_VALID_NMI_PENDING is not set.  I think we can just
> replace that with an set of nmi_queued, i.e.
> 
> 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> 		vcpu->arch.nmi_pending = 0;
> 		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> 		process_nmi(vcpu);
> 
You mean replace above process_nmi() with kvm_make_request(KVM_REQ_NMI, vcpu), right?
I'll try with above proposal.

>	}
> 
> because if nmi_queued is non-zero in the !KVM_VCPUEVENT_VALID_NMI_PENDING, then
> there should already be a pending KVM_REQ_NMI.  Alternatively, replace process_nmi()
> with a KVM_REQ_NMI request (that probably has my vote?).
> 
> If that works, can you do that in a separate patch?  Then this patch can tack on
> a call to clear V_NMI.
> 
>>  	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
>>  	    lapic_in_kernel(vcpu))
>>  		vcpu->arch.apic->sipi_vector = events->sipi_vector;
>> @@ -10008,6 +10011,10 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
>>  static void process_nmi(struct kvm_vcpu *vcpu)
>>  {
>>  	unsigned limit = 2;
>> +	int nmi_to_queue = atomic_xchg(&vcpu->arch.nmi_queued, 0);
>> +
>> +	if (!nmi_to_queue)
>> +		return;
>>  
>>  	/*
>>  	 * x86 is limited to one NMI running, and one NMI pending after it.
>> @@ -10015,13 +10022,34 @@ static void process_nmi(struct kvm_vcpu *vcpu)
>>  	 * Otherwise, allow two (and we'll inject the first one immediately).
>>  	 */
>>  	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
>> -		limit = 1;
>> +		limit--;
>> +
>> +	/* Also if there is already a NMI hardware queued to be injected,
>> +	 * decrease the limit again
>> +	 */
>> +	if (static_call(kvm_x86_get_hw_nmi_pending)(vcpu))
>> +		limit--;
> 
> I don't think this is correct.  If a vNMI is pending and NMIs are blocked, then
> limit will end up '0' and KVM will fail to pend the additional NMI in software.
> After much fiddling, and factoring in the above, how about this?
> 
> 	unsigned limit = 2;
> 
> 	/*
> 	 * x86 is limited to one NMI running, and one NMI pending after it.
> 	 * If an NMI is already in progress, limit further NMIs to just one.
> 	 * Otherwise, allow two (and we'll inject the first one immediately).
> 	 */
> 	if (vcpu->arch.nmi_injected) {
> 		/* vNMI counts as the "one pending NMI". */
> 		if (static_call(kvm_x86_is_vnmi_pending)(vcpu))
> 			limit = 0;
> 		else
> 			limit = 1;
> 	} else if (static_call(kvm_x86_get_nmi_mask)(vcpu) ||
> 		   static_call(kvm_x86_is_vnmi_pending)(vcpu)) {
> 		limit = 1;
> 	}
> 
> 	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
> 	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
> 
> 	if (vcpu->arch.nmi_pending &&
> 	    static_call(kvm_x86_set_vnmi_pending)(vcpu))
> 		vcpu->arch.nmi_pending--;
> 
> 	if (vcpu->arch.nmi_pending)
> 		kvm_make_request(KVM_REQ_EVENT, vcpu);
> 
> With the KVM_REQ_EVENT change in a separate prep patch:
> 
> --
> From: Sean Christopherson <seanjc@google.com>
> Date: Tue, 31 Jan 2023 13:33:02 -0800
> Subject: [PATCH] KVM: x86: Raise an event request when processing NMIs iff an
>  NMI is pending
> 
> Don't raise KVM_REQ_EVENT if no NMIs are pending at the end of
> process_nmi().  Finishing process_nmi() without a pending NMI will become
> much more likely when KVM gains support for AMD's vNMI, which allows
> pending vNMIs in hardware, i.e. doesn't require explicit injection.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/x86.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 508074e47bc0..030136b6ebbd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10134,7 +10134,9 @@ static void process_nmi(struct kvm_vcpu *vcpu)
>  
>  	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
>  	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
> -	kvm_make_request(KVM_REQ_EVENT, vcpu);
> +
> +	if (vcpu->arch.nmi_pending)
> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
>  }
>  
>  void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
> 

Looks good to me, will include as separate patch.

Thanks,
Santosh

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-02-01  0:06     ` Sean Christopherson
@ 2023-02-08  9:51       ` Santosh Shukla
  2023-02-08 16:09         ` Sean Christopherson
  0 siblings, 1 reply; 66+ messages in thread
From: Santosh Shukla @ 2023-02-08  9:51 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On 2/1/2023 5:36 AM, Sean Christopherson wrote:
> On Tue, Jan 31, 2023, Sean Christopherson wrote:
>> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>>> @@ -10015,13 +10022,34 @@ static void process_nmi(struct kvm_vcpu *vcpu)
>>>  	 * Otherwise, allow two (and we'll inject the first one immediately).
>>>  	 */
>>>  	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
>>> -		limit = 1;
>>> +		limit--;
>>> +
>>> +	/* Also if there is already a NMI hardware queued to be injected,
>>> +	 * decrease the limit again
>>> +	 */
>>> +	if (static_call(kvm_x86_get_hw_nmi_pending)(vcpu))
>>> +		limit--;
>>
>> I don't think this is correct.  If a vNMI is pending and NMIs are blocked, then
>> limit will end up '0' and KVM will fail to pend the additional NMI in software.
> 
> Scratch that, dropping the second NMI in this case is correct.  The "running" part
> of the existing "x86 is limited to one NMI running, and one NMI pending after it"
> confused me.  The "running" thing is really just a variant on NMIs being blocked.
> 
> I'd also like to avoid the double decrement logic.  Accounting the virtual NMI is
> a very different thing than dealing with concurrent NMIs, I'd prefer to reflect
> that in the code.
> 
> Any objection to folding in the below to end up with:
> 
> 	unsigned limit;
> 
> 	/*
> 	 * x86 is limited to one NMI pending, but because KVM can't react to
> 	 * incoming NMIs as quickly as bare metal, e.g. if the vCPU is
> 	 * scheduled out, KVM needs to play nice with two queued NMIs showing
> 	 * up at the same time.  To handle this scenario, allow two NMIs to be
> 	 * (temporarily) pending so long as NMIs are not blocked and KVM is not
> 	 * waiting for a previous NMI injection to complete (which effectively
> 	 * blocks NMIs).  KVM will immediately inject one of the two NMIs, and
> 	 * will request an NMI window to handle the second NMI.
> 	 */
> 	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
> 		limit = 1;
> 	else
> 		limit = 2;
> 
> 	/*
> 	 * Adjust the limit to account for pending virtual NMIs, which aren't
> 	 * tracked in vcpu->arch.nmi_pending.
> 	 */
> 	if (static_call(kvm_x86_is_vnmi_pending)(vcpu))
> 		limit--;
> 
> 	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
> 	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
> 

I believe you missed the below hunk from the function -

	if (vcpu->arch.nmi_pending &&
	    static_call(kvm_x86_set_vnmi_pending)(vcpu))
		vcpu->arch.nmi_pending--;

Or am I missing something? Please suggest.

> 	if (vcpu->arch.nmi_pending)
> 		kvm_make_request(KVM_REQ_EVENT, vcpu);
> 
> --
> From: Sean Christopherson <seanjc@google.com>
> Date: Tue, 31 Jan 2023 16:02:21 -0800
> Subject: [PATCH] KVM: x86: Tweak the code and comment related to handling
>  concurrent NMIs
> 
> Tweak the code and comment that deals with concurrent NMIs to explicitly
> call out that x86 allows exactly one pending NMI, but that KVM needs to
> temporarily allow two pending NMIs in order to workaround the fact that
> the target vCPU cannot immediately recognize an incoming NMI, unlike bare
> metal.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/x86.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 030136b6ebbd..fda09ba48b6b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10122,15 +10122,22 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
>  
>  static void process_nmi(struct kvm_vcpu *vcpu)
>  {
> -	unsigned limit = 2;
> +	unsigned limit;
>  
>  	/*
> -	 * x86 is limited to one NMI running, and one NMI pending after it.
> -	 * If an NMI is already in progress, limit further NMIs to just one.
> -	 * Otherwise, allow two (and we'll inject the first one immediately).
> +	 * x86 is limited to one NMI pending, but because KVM can't react to
> +	 * incoming NMIs as quickly as bare metal, e.g. if the vCPU is
> +	 * scheduled out, KVM needs to play nice with two queued NMIs showing
> +	 * up at the same time.  To handle this scenario, allow two NMIs to be
> +	 * (temporarily) pending so long as NMIs are not blocked and KVM is not
> +	 * waiting for a previous NMI injection to complete (which effectively
> +	 * blocks NMIs).  KVM will immediately inject one of the two NMIs, and
> +	 * will request an NMI window to handle the second NMI.
>  	 */
>  	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
>  		limit = 1;
> +	else
> +		limit = 2;
>  
>  	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
>  	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
> 

Looks good to me, will include in v3.

Thanks,
Santosh


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-02-08  9:43     ` Santosh Shukla
@ 2023-02-08 16:06       ` Sean Christopherson
  2023-02-14 10:22         ` Santosh Shukla
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2023-02-08 16:06 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: Maxim Levitsky, kvm, Sandipan Das, Paolo Bonzini, Jim Mattson,
	Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Daniel Sneddon,
	Jiaxi Chen, Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Wed, Feb 08, 2023, Santosh Shukla wrote:
> On 2/1/2023 3:58 AM, Sean Christopherson wrote:
> > On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> >> @@ -5191,9 +5191,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> >>  
> >>  	vcpu->arch.nmi_injected = events->nmi.injected;
> >>  	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
> >> -		vcpu->arch.nmi_pending = events->nmi.pending;
> >> +		atomic_add(events->nmi.pending, &vcpu->arch.nmi_queued);
> >> +
> >>  	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> >>  
> >> +	process_nmi(vcpu);
> > 
> > Argh, having two process_nmi() calls is ugly (not blaming your code, it's KVM's
> > ABI that's ugly).  E.g. if we collapse this down, it becomes:
> > 
> > 	process_nmi(vcpu);
> > 
> > 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> > 		<blah blah blah>
> > 	}
> > 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> > 
> > 	process_nmi(vcpu);
> > 
> > And the second mess is that V_NMI needs to be cleared.
> > 
> 
> Can you please elaborate on the "V_NMI cleared" scenario? Are you referring
> to V_NMI_MASK or svm->nmi_masked?

V_NMI_MASK.  KVM needs to purge any pending virtual NMIs when userspace sets
vCPU event state and KVM_VCPUEVENT_VALID_NMI_PENDING is set.

> > The first process_nmi() effectively exists to (a) purge nmi_queued and (b) keep
> > nmi_pending if KVM_VCPUEVENT_VALID_NMI_PENDING is not set.  I think we can just
> > replace that with an set of nmi_queued, i.e.
> > 
> > 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> > 		vcpu->arch.nmi_pending = 0;
> > 		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> > 		process_nmi(vcpu);
> > 
> You mean replace above process_nmi() with kvm_make_request(KVM_REQ_NMI, vcpu), right?
> I'll try with above proposal.

Yep, if that works.  Actually, that might be a requirement.  There's a

  static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);

lurking below this.  Invoking process_nmi() before NMI blocking is updated could
result in KVM incorrectly dropping/keeping NMIs.  I don't think it would be a
problem in practice since KVM saves only one NMI, but userspace could stuff NMIs.

Huh.  The existing code is buggy.  events->nmi.pending is a u8, and
arch.nmi_pending is an unsigned int.  KVM doesn't cap the incoming value, so
userspace could set up to 255 pending NMIs.  The extra weird part is that the extra
NMIs will get dropped the next time KVM stumbles through process_nmi().

Amusingly, KVM only saves one pending NMI, i.e. in a true migration scenario KVM
may drop an NMI.

  events->nmi.pending = vcpu->arch.nmi_pending != 0;

The really amusing part is that that code was added by 7460fb4a3400 ("KVM: Fix
simultaneous NMIs").  The only thing I can figure is that KVM_GET_VCPU_EVENTS was
somewhat blindly updated without much thought about what should actually happen.

So, can you slide the below in early in the series?  Then in this series, convert
to the above suggested flow (zero nmi_pending, stuff nmi_queued) in another patch?

From: Sean Christopherson <seanjc@google.com>
Date: Wed, 8 Feb 2023 07:44:16 -0800
Subject: [PATCH] KVM: x86: Save/restore all NMIs when multiple NMIs are
 pending

Save all pending NMIs in KVM_GET_VCPU_EVENTS, and queue KVM_REQ_NMI if one
or more NMIs are pending after KVM_SET_VCPU_EVENTS in order to re-evaluate
pending NMIs with respect to NMI blocking.

KVM allows multiple NMIs to be pending in order to faithfully emulate bare
metal handling of simultaneous NMIs (on bare metal, truly simultaneous
NMIs are impossible, i.e. one will always arrive first and be consumed).
Support for simultaneous NMIs botched the save/restore though.  KVM only
saves one pending NMI, but allows userspace to restore 255 pending NMIs
as kvm_vcpu_events.nmi.pending is a u8, and KVM's internal state is stored
in an unsigned int.

Fixes: 7460fb4a3400 ("KVM: Fix simultaneous NMIs")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 508074e47bc0..e9339acbf82a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5115,7 +5115,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 	events->interrupt.shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
 
 	events->nmi.injected = vcpu->arch.nmi_injected;
-	events->nmi.pending = vcpu->arch.nmi_pending != 0;
+	events->nmi.pending = vcpu->arch.nmi_pending;
 	events->nmi.masked = static_call(kvm_x86_get_nmi_mask)(vcpu);
 
 	/* events->sipi_vector is never valid when reporting to user space */
@@ -5202,8 +5202,11 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 						events->interrupt.shadow);
 
 	vcpu->arch.nmi_injected = events->nmi.injected;
-	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
+	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
 		vcpu->arch.nmi_pending = events->nmi.pending;
+		if (vcpu->arch.nmi_pending)
+			kvm_make_request(KVM_REQ_NMI, vcpu);
+	}
 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
 
 	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&

base-commit: 6c77ae716d546d71b21f0c9ee7d405314a3f3f9e
-- 


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-02-08  9:51       ` Santosh Shukla
@ 2023-02-08 16:09         ` Sean Christopherson
  0 siblings, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-02-08 16:09 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: Maxim Levitsky, kvm, Sandipan Das, Paolo Bonzini, Jim Mattson,
	Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Daniel Sneddon,
	Jiaxi Chen, Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Wed, Feb 08, 2023, Santosh Shukla wrote:
> On 2/1/2023 5:36 AM, Sean Christopherson wrote:
> > On Tue, Jan 31, 2023, Sean Christopherson wrote:
> >> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> >>> @@ -10015,13 +10022,34 @@ static void process_nmi(struct kvm_vcpu *vcpu)
> >>>  	 * Otherwise, allow two (and we'll inject the first one immediately).
> >>>  	 */
> >>>  	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
> >>> -		limit = 1;
> >>> +		limit--;
> >>> +
> >>> +	/* Also if there is already a NMI hardware queued to be injected,
> >>> +	 * decrease the limit again
> >>> +	 */
> >>> +	if (static_call(kvm_x86_get_hw_nmi_pending)(vcpu))
> >>> +		limit--;
> >>
> >> I don't think this is correct.  If a vNMI is pending and NMIs are blocked, then
> >> limit will end up '0' and KVM will fail to pend the additional NMI in software.
> > 
> > Scratch that, dropping the second NMI in this case is correct.  The "running" part
> > of the existing "x86 is limited to one NMI running, and one NMI pending after it"
> > confused me.  The "running" thing is really just a variant on NMIs being blocked.
> > 
> > I'd also like to avoid the double decrement logic.  Accounting the virtual NMI is
> > a very different thing than dealing with concurrent NMIs, I'd prefer to reflect
> > that in the code.
> > 
> > Any objection to folding in the below to end up with:
> > 
> > 	unsigned limit;
> > 
> > 	/*
> > 	 * x86 is limited to one NMI pending, but because KVM can't react to
> > 	 * incoming NMIs as quickly as bare metal, e.g. if the vCPU is
> > 	 * scheduled out, KVM needs to play nice with two queued NMIs showing
> > 	 * up at the same time.  To handle this scenario, allow two NMIs to be
> > 	 * (temporarily) pending so long as NMIs are not blocked and KVM is not
> > 	 * waiting for a previous NMI injection to complete (which effectively
> > 	 * blocks NMIs).  KVM will immediately inject one of the two NMIs, and
> > 	 * will request an NMI window to handle the second NMI.
> > 	 */
> > 	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
> > 		limit = 1;
> > 	else
> > 		limit = 2;
> > 
> > 	/*
> > 	 * Adjust the limit to account for pending virtual NMIs, which aren't
> > 	 * tracked in in vcpu->arch.nmi_pending.
> > 	 */
> > 	if (static_call(kvm_x86_is_vnmi_pending)(vcpu))
> > 		limit--;
> > 
> > 	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
> > 	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
> > 
> 
> I believe you missed the below hunk from the function -
> 
> 	if (vcpu->arch.nmi_pending &&
> 	    static_call(kvm_x86_set_vnmi_pending)(vcpu))
> 		vcpu->arch.nmi_pending--;
> 
> Or am I missing something? Please suggest.

You're not missing anything, I'm pretty sure I just lost track of things.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 10/11] KVM: SVM: implement support for vNMI
  2023-01-28  1:10   ` Sean Christopherson
@ 2023-02-10 12:02     ` Santosh Shukla
  0 siblings, 0 replies; 66+ messages in thread
From: Santosh Shukla @ 2023-02-10 12:02 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Santosh Shukla

On 1/28/2023 6:40 AM, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>> This patch implements support for injecting pending
>> NMIs via the .kvm_x86_set_hw_nmi_pending using new AMD's vNMI
>> feature.
>>
>> Note that the vNMI can't cause a VM exit, which is needed
>> when a nested guest intercepts NMIs.
>>
>> Therefore to avoid breaking nesting, the vNMI is inhibited while
>> a nested guest is running and instead, the legacy NMI window
>> detection and delivery method is used.
>>
>> While it is possible to passthrough the vNMI if a nested guest
>> doesn't intercept NMIs, such usage is very uncommon, and it's
>> not worth to optimize for.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
>> ---
>>  arch/x86/kvm/svm/nested.c |  42 +++++++++++++++
>>  arch/x86/kvm/svm/svm.c    | 111 ++++++++++++++++++++++++++++++--------
>>  arch/x86/kvm/svm/svm.h    |  10 ++++
>>  3 files changed, 140 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
>> index e891318595113e..5bea672bf8b12d 100644
>> --- a/arch/x86/kvm/svm/nested.c
>> +++ b/arch/x86/kvm/svm/nested.c
>> @@ -623,6 +623,42 @@ static bool is_evtinj_nmi(u32 evtinj)
>>  	return type == SVM_EVTINJ_TYPE_NMI;
>>  }
>>  
>> +static void nested_svm_save_vnmi(struct vcpu_svm *svm)
>> +{
>> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
>> +
>> +	/*
>> +	 * Copy the vNMI state back to software NMI tracking state
>> +	 * for the duration of the nested run
>> +	 */
>> +
> 
> Unecessary newline.
> 
ok.

>> +	svm->nmi_masked = vmcb01->control.int_ctl & V_NMI_MASK;
>> +	svm->vcpu.arch.nmi_pending += vmcb01->control.int_ctl & V_NMI_PENDING;
>> +}
>> +
>> +static void nested_svm_restore_vnmi(struct vcpu_svm *svm)
>> +{
>> +	struct kvm_vcpu *vcpu = &svm->vcpu;
>> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
>> +
>> +	/*
>> +	 * Restore the vNMI state from the software NMI tracking state
>> +	 * after a nested run
>> +	 */
>> +
> 
> Unnecessary newline.
>

ok.
 
>> +	if (svm->nmi_masked)
>> +		vmcb01->control.int_ctl |= V_NMI_MASK;
>> +	else
>> +		vmcb01->control.int_ctl &= ~V_NMI_MASK;
>> +
>> +	if (vcpu->arch.nmi_pending) {
>> +		vcpu->arch.nmi_pending--;
>> +		vmcb01->control.int_ctl |= V_NMI_PENDING;
>> +	} else
>> +		vmcb01->control.int_ctl &= ~V_NMI_PENDING;
> 
> Needs curly braces.
> 
Yes.

Thanks,
Santosh

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 10/11] KVM: SVM: implement support for vNMI
  2023-02-01  0:22   ` Sean Christopherson
  2023-02-01  0:39     ` Sean Christopherson
@ 2023-02-10 12:24     ` Santosh Shukla
  2023-02-10 16:44       ` Sean Christopherson
  1 sibling, 1 reply; 66+ messages in thread
From: Santosh Shukla @ 2023-02-10 12:24 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin, Santosh Shukla

On 2/1/2023 5:52 AM, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>> This patch implements support for injecting pending
>> NMIs via the .kvm_x86_set_hw_nmi_pending using new AMD's vNMI
> 
> Wrap closer to 75 chars, and please try to be consistent in how you format
> things like changelogs and comments.  It's jarring as a reader when the wrap
> column is constantly changing.
> 
>> feature.
> 
> Please combine the introduction, usage, and implementation of the hew kvm_x86_ops,
> i.e. introduce and use the ops in this patch.  It's extremely difficult to review
> the common x86 code that uses the ops without seeing how they're implemented in
> SVM.  I believe the overall size/scope of the patch can be kept reasonable by
> introducing some of the common changes in advance of the new ops, e.g. tweaking
> the KVM_SET_VCPU_EVENTS flow.
> 
>> Note that the vNMI can't cause a VM exit, which is needed
>> when a nested guest intercepts NMIs.
> 
> I can't tell if this is saying "SVM doesn't allow intercepting virtual NMIs", or
> "KVM never enables virtual NMI interception".
> 

I think it meant to say that vNMI doesn't need the nmi_window_exiting feature to
pend the new virtual NMI. Will reword.

>> Therefore to avoid breaking nesting, the vNMI is inhibited while
>> a nested guest is running and instead, the legacy NMI window
>> detection and delivery method is used.
> 
> State what KVM does, don't describe the effects.  E.g. "Disable vNMI while running
> L2".  When a changelog describes the effects, it's unclear whether the effects are
> new behavior introduced by the patch, hardware behavior, etc...
> 
>> While it is possible to passthrough the vNMI if a nested guest
>> doesn't intercept NMIs, such usage is very uncommon, and it's
>> not worth to optimize for.
> 
> Can you elaborate on why not?  It's not obvious to me that the code will end up
> more complex, and oftentimes omitting code increases the cognitive load on readers,
> i.e. makes things more complex in a way.  vNMI is mutually exclusive with NMI
> passthrough, i.e. KVM doesn't need to handle NMI passthrough and vNMI simultaneously.
> 
>> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> SoB chain is wrong.  Maxim is credited as the sole Author, i.e. Santosh shouldn't
> have a SoB.  Assuming the intent is to attribute both of ya'll this needs to be
> 
>  Co-developed-by: Santosh Shukla <santosh.shukla@amd.com>
>  Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>  Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> if Maxim is the primary author, or this if Santosh is the primary author
> 
>  From: Santosh Shukla <santosh.shukla@amd.com>
> 
>  <blah blah blah>
> 
>  Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>  Developed-by: Maxim Levitsky <mlevitsk@redhat.com>
>  Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> 

Will sort those in v3.

>> ---
>>  arch/x86/kvm/svm/nested.c |  42 +++++++++++++++
>>  arch/x86/kvm/svm/svm.c    | 111 ++++++++++++++++++++++++++++++--------
>>  arch/x86/kvm/svm/svm.h    |  10 ++++
>>  3 files changed, 140 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
>> index e891318595113e..5bea672bf8b12d 100644
>> --- a/arch/x86/kvm/svm/nested.c
>> +++ b/arch/x86/kvm/svm/nested.c
>> @@ -623,6 +623,42 @@ static bool is_evtinj_nmi(u32 evtinj)
>>  	return type == SVM_EVTINJ_TYPE_NMI;
>>  }
>>  
>> +static void nested_svm_save_vnmi(struct vcpu_svm *svm)
> 
> Please avoid save/restore names.  KVM (selftests in particular) uses save/restore
> to refer to saving and restoring state across a migration.  "sync" is probably
> the best option, or just open code the flows.
> 

ok.

I chose to open code it so that I don't need to consider using svm->nmi_masked, which
should only be used for the non-vNMI case.

>> +{
>> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
>> +
>> +	/*
>> +	 * Copy the vNMI state back to software NMI tracking state
>> +	 * for the duration of the nested run
>> +	 */
>> +
>> +	svm->nmi_masked = vmcb01->control.int_ctl & V_NMI_MASK;
>> +	svm->vcpu.arch.nmi_pending += vmcb01->control.int_ctl & V_NMI_PENDING;
> 
> This is wrong.  V_NMI_PENDING is bit 11, i.e. the bitwise-AND does not yield a
> boolean value and will increment nmi_pending by 2048 instead of by 1.
>

Right.
 
> 	if (vmcb01->control.int_ctl & V_NMI_PENDING)
> 		svm->vcpu.arch.nmi_pending++;
> 
> And this needs a KVM_REQ_EVENT to ensure KVM processes the newly pending NMI.
> 

Ok.
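
Something like this then, as a rough sketch of the open-coded sync at nested VMRUN
(names as in the snippet above; untested):

        if (vmcb01->control.int_ctl & V_NMI_PENDING) {
                svm->vcpu.arch.nmi_pending++;
                kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
        }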

>> +}
>> +
>> +static void nested_svm_restore_vnmi(struct vcpu_svm *svm)
>> +{
>> +	struct kvm_vcpu *vcpu = &svm->vcpu;
>> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
>> +
>> +	/*
>> +	 * Restore the vNMI state from the software NMI tracking state
>> +	 * after a nested run
>> +	 */
>> +
>> +	if (svm->nmi_masked)
>> +		vmcb01->control.int_ctl |= V_NMI_MASK;
>> +	else
>> +		vmcb01->control.int_ctl &= ~V_NMI_MASK;
> 
> As proposed, this needs to clear nmi_masked to avoid false positives.  The better
> approach is to not have any open coded accesses to svm->nmi_masked outside of
> flows that specifically need to deal with vNMI logic.
>
ok.
 
> E.g. svm_enable_nmi_window() reads the raw nmi_masked.
> 
>> +
>> +	if (vcpu->arch.nmi_pending) {
>> +		vcpu->arch.nmi_pending--;
>> +		vmcb01->control.int_ctl |= V_NMI_PENDING;
>> +	} else
> 
> Curly braces on all paths if any path needs 'em.
>

ok. 

>> +		vmcb01->control.int_ctl &= ~V_NMI_PENDING;
>> +}
> 
> ...
> 
>> + static bool svm_set_hw_nmi_pending(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vcpu_svm *svm = to_svm(vcpu);
>> +
>> +	if (!is_vnmi_enabled(svm))
>> +		return false;
>> +
>> +	if (svm->vmcb->control.int_ctl & V_NMI_PENDING)
>> +		return false;
>> +
>> +	svm->vmcb->control.int_ctl |= V_NMI_PENDING;
>> +	vmcb_mark_dirty(svm->vmcb, VMCB_INTR);
>> +
>> +	/*
>> +	 * NMI isn't yet technically injected but
>> +	 * this rough estimation should be good enough
> 
> Again, wrap at 80 chars, not at random points.
>

ok.
 
>> +	 */
>> +	++vcpu->stat.nmi_injections;
>> +
>> +	return true;
>> +}
>> +
> 
> ...
> 
>>  bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vcpu_svm *svm = to_svm(vcpu);
>> @@ -3725,10 +3772,16 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>>  	/*
>>  	 * Something prevents NMI from been injected. Single step over possible
>>  	 * problem (IRET or exception injection or interrupt shadow)
>> +	 *
>> +	 * With vNMI we should never need an NMI window
>> +	 * (we can always inject vNMI either by setting VNMI_PENDING or by EVENTINJ)
> 
> Please honor the soft limit and avoid pronouns.  There's also no need to put the
> blurb in parantheses on its own line.
> 
> As for the code, I believe there are bugs.  Pulling in the context...
> 
> 	if (svm->nmi_masked && !svm->awaiting_iret_completion)
> 		return; /* IRET will cause a vm exit */
> 
> Checking nmi_masked is wrong, this should use the helper.  Even if this code can

Right,.

> only be reached on error, it should still try its best to not make things worse.
> 
> 	if (!gif_set(svm)) {
> 		if (vgif)
> 			svm_set_intercept(svm, INTERCEPT_STGI);
> 		return; /* STGI will cause a vm exit */
> 	}
> 
> 	/*
> 	 * Something prevents NMI from been injected. Single step over possible
> 	 * problem (IRET or exception injection or interrupt shadow)
> 	 *
> 	 * With vNMI we should never need an NMI window
> 	 * (we can always inject vNMI either by setting VNMI_PENDING or by EVENTINJ)
> 	 */
> 	if (WARN_ON_ONCE(is_vnmi_enabled(svm)))
> 		return;
> 
> This is flawed, where "this" means handling of NMI windows when vNMI is enabled.
> 
> IIUC, if there are back-to-back NMIs, the intent is to set V_NMI for one and
> inject the other.  I believe there are multiple bugs in svm_inject_nmi().  The one
> that's definitely a bug is setting svm->nmi_masked.  The other suspected bug,
> which is related to the above WARN, is setting the IRET intercept.  The resulting
> IRET interception will set awaiting_iret_completion, and then the above WARN is
> reachable (even if the masking check is fixed).
> 
> I don't think KVM needs to ever intercept IRET.  One NMI gets injected, and the
> other is sitting in INT_CTL.V_NMI_PENDING, i.e. there's no need for KVM to regain
> control.  If another NMI comes along before V_NMI_PENDING is handled, it will
> either get injected or dropped.
> 
> So I _think_ KVM can skip the intercept code when injecting an NMI, and this WARN
> can be hoisted to the top of svm_enable_nmi_window(), because as stated above, KVM
> should never need to request an NMI window.
> 
> Last thought, unless there's something that will obviously break, it's probably
> better to WARN and continue than to bail.  I.e. do the single-step and hope for
> the best.  Bailing at this point doesn't seem like it would help.
> 
>>  	 */
>> +	if (WARN_ON_ONCE(is_vnmi_enabled(svm)))
>> +		return;
>> +
>>  	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
>> -	svm->nmi_singlestep = true;
>>  	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
>> +	svm->nmi_singlestep = true;
>>  }
> 

Ok.

So you mean: in vNMI mode, KVM should never need to request an NMI window, and if it
reaches the NMI window code anyway, it should WARN_ON and continue to single step.
So the modified code may look something like below:

static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
{
        struct vcpu_svm *svm = to_svm(vcpu);

        /*
         * With vNMI we should never need an NMI window.
         * and if we reach here then better WARN and continue to single step.
         */
        WARN_ON_ONCE(is_vnmi_enabled(svm));

        if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
                return; /* IRET will cause a vm exit */

        if (!gif_set(svm)) {
                if (vgif)
                        svm_set_intercept(svm, INTERCEPT_STGI);
                return; /* STGI will cause a vm exit */
        }

        /*
         * Something prevents NMI from been injected. Single step over possible
         * problem (IRET or exception injection or interrupt shadow)
         */

        svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
        svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
        svm->nmi_singlestep = true;
}

Does that make sense?

Thanks,
Santosh

> ...
> 
>> @@ -553,6 +554,15 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
>>  	       (msr < (APIC_BASE_MSR + 0x100));
>>  }
>>  
>> +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
>> +{
>> +	/* L1's vNMI is inhibited while nested guest is running */
>> +	if (is_guest_mode(&svm->vcpu))
> 
> I would rather check the current VMCB.  I don't see any value in hardcoding the
> "KVM doesn't support vNMI in L2" in multiple places.  And I find the above comment
> about "L1's vNMI is inhibited" confusing.  vNMI isn't inhibited/blocked, KVM just
> doesn't utilize vNMI while L2 is active (IIUC, as proposed).
> 
>> +		return false;
>> +
>> +	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE);
>> +}
>> +
>>  /* svm.c */
>>  #define MSR_INVALID				0xffffffffU
>>  
>> -- 
>> 2.26.3
>>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 10/11] KVM: SVM: implement support for vNMI
  2023-02-10 12:24     ` Santosh Shukla
@ 2023-02-10 16:44       ` Sean Christopherson
  0 siblings, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-02-10 16:44 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: Maxim Levitsky, kvm, Sandipan Das, Paolo Bonzini, Jim Mattson,
	Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Daniel Sneddon,
	Jiaxi Chen, Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Fri, Feb 10, 2023, Santosh Shukla wrote:
> On 2/1/2023 5:52 AM, Sean Christopherson wrote:
> So you mean.. In vNMI mode, KVM should never need to request NMI window and eventually
> it reaches to NMI window then WARN_ON and cont.. to single step... so modified code change
> may look something like below:
> 
> static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> {
>         struct vcpu_svm *svm = to_svm(vcpu);
> 
>         /*
>          * With vNMI we should never need an NMI window.
>          * and if we reach here then better WARN and continue to single step.
>          */
>         WARN_ON_ONCE(is_vnmi_enabled(svm));
> 
>         if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
>                 return; /* IRET will cause a vm exit */
> 
>         if (!gif_set(svm)) {
>                 if (vgif)
>                         svm_set_intercept(svm, INTERCEPT_STGI);
>                 return; /* STGI will cause a vm exit */
>         }
> 
>         /*
>          * Something prevents NMI from been injected. Single step over possible
>          * problem (IRET or exception injection or interrupt shadow)
>          */
> 
>         svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
>         svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
>         svm->nmi_singlestep = true;
> }
> 
> Does that make sense?

Yep.  Though please avoid "we" and other pronouns in changelogs and comments,
and wrap as close to the boundary as possible.
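
For example, the comment from the snippet above could be reworded along these
lines (illustrative wording only):

        /*
         * KVM should never need to request an NMI window when vNMI is
         * enabled.  If this path is reached anyway, WARN and fall through
         * to the single-step logic as a best effort.
         */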

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 06/11] KVM: SVM: add wrappers to enable/disable IRET interception
  2023-01-31 21:07           ` Sean Christopherson
@ 2023-02-13 14:50             ` Santosh Shukla
  0 siblings, 0 replies; 66+ messages in thread
From: Santosh Shukla @ 2023-02-13 14:50 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: Santosh Shukla, kvm, Sandipan Das, Paolo Bonzini, Jim Mattson,
	Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Daniel Sneddon,
	Jiaxi Chen, Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On 2/1/2023 2:37 AM, Sean Christopherson wrote:
> On Thu, Dec 08, 2022, Maxim Levitsky wrote:
>> On Thu, 2022-12-08 at 17:39 +0530, Santosh Shukla wrote:
>>>
>>> On 12/6/2022 5:44 PM, Maxim Levitsky wrote:
>>>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>>>> index 512b2aa21137e2..cfed6ab29c839a 100644
>>>>>> --- a/arch/x86/kvm/svm/svm.c
>>>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>>>> @@ -2468,16 +2468,29 @@ static int task_switch_interception(struct kvm_vcpu *vcpu)
>>>>>>  			       has_error_code, error_code);
>>>>>>  }
>>>>>>  
>>>>>> +static void svm_disable_iret_interception(struct vcpu_svm *svm)
>>>>>> +{
>>>>>> +	if (!sev_es_guest(svm->vcpu.kvm))
>>>>>> +		svm_clr_intercept(svm, INTERCEPT_IRET);
>>>>>> +}
>>>>>> +
>>>>>> +static void svm_enable_iret_interception(struct vcpu_svm *svm)
>>>>>> +{
>>>>>> +	if (!sev_es_guest(svm->vcpu.kvm))
>>>>>> +		svm_set_intercept(svm, INTERCEPT_IRET);
>>>>>> +}
>>>>>> +
>>>>>
>>>>> nits:
>>>>> s/_iret_interception / _iret_intercept
>>>>> does that make sense?
>>>>
>>>> Makes sense.
> 
> I would rather go with svm_{clr,set}_iret_intercept().  I don't particularly like

ok.

> the SVM naming scheme, but I really dislike inconsistent naming.  If we want to
> clean up naming, I would love to unify VMX and SVM nomenclature for things like this.
> 
>>>>  I can also move this to svm.h near the svm_set_intercept(); I think
>>>> that's a better place for this function, if no objections.
>>>>
>>> I think the current approach is fine since the function is used in svm.c only, but I have
>>> no strong opinion on moving it to svm.h either way.
>>
>> I also think so, just noticed something in case there are any objections.
> 
> My vote is to keep it in svm.c unless we anticipate usage outside of svm.c.  Keeping

ok.

Thanks,
Santosh
> the implementation close to the usage makes it easier to understand what's going on,
> especially for something like this where there's a bit of "hidden" logic for SEV-ES.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-02-08 16:06       ` Sean Christopherson
@ 2023-02-14 10:22         ` Santosh Shukla
  2023-02-15 22:43           ` Sean Christopherson
  0 siblings, 1 reply; 66+ messages in thread
From: Santosh Shukla @ 2023-02-14 10:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, kvm, Sandipan Das, Paolo Bonzini, Jim Mattson,
	Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Daniel Sneddon,
	Jiaxi Chen, Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin



On 2/8/2023 9:36 PM, Sean Christopherson wrote:
> On Wed, Feb 08, 2023, Santosh Shukla wrote:
>> On 2/1/2023 3:58 AM, Sean Christopherson wrote:
>>> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>>>> @@ -5191,9 +5191,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>>>>  
>>>>  	vcpu->arch.nmi_injected = events->nmi.injected;
>>>>  	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
>>>> -		vcpu->arch.nmi_pending = events->nmi.pending;
>>>> +		atomic_add(events->nmi.pending, &vcpu->arch.nmi_queued);
>>>> +
>>>>  	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>>>>  
>>>> +	process_nmi(vcpu);
>>>
>>> Argh, having two process_nmi() calls is ugly (not blaming your code, it's KVM's
>>> ABI that's ugly).  E.g. if we collapse this down, it becomes:
>>>
>>> 	process_nmi(vcpu);
>>>
>>> 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
>>> 		<blah blah blah>
>>> 	}
>>> 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>>>
>>> 	process_nmi(vcpu);
>>>
>>> And the second mess is that V_NMI needs to be cleared.
>>>
>>
>> Can you please elaborate on "V_NMI cleared" scenario? Are you mentioning
>> about V_NMI_MASK or svm->nmi_masked?
> 
> V_NMI_MASK.  KVM needs to purge any pending virtual NMIs when userspace sets
> vCPU event state and KVM_VCPUEVENT_VALID_NMI_PENDING is set.
> 

As per the APM: V_NMI_MASK is managed by the processor
"
V_NMI_MASK: Indicates whether virtual NMIs are masked. The processor will set V_NMI_MASK
once it takes the virtual NMI. V_NMI_MASK is cleared when the guest successfully completes an
IRET instruction or #VMEXIT occurs while delivering the virtual NMI
"

In my initial implementation I had changed V_NMI_MASK for the SMM scenario [1].
This is also not required, as HW will save V_NMI/V_NMI_MASK on
SMM entry and restore them on RSM.

That said, svm_{get,set}_nmi_mask will look something like:

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a9e9bfbffd72..08911a33cf1e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3635,13 +3635,21 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)

 static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
 {
-       return to_svm(vcpu)->nmi_masked;
+       struct vcpu_svm *svm = to_svm(vcpu);
+
+       if (is_vnmi_enabled(svm))
+               return svm->vmcb->control.int_ctl & V_NMI_MASK;
+       else
+               return svm->nmi_masked;
 }

 static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 {
        struct vcpu_svm *svm = to_svm(vcpu);

+       if (is_vnmi_enabled(svm))
+               return;
+
        if (masked) {
                svm->nmi_masked = true;
                svm_set_iret_intercept(svm);

Are there any inputs on the above approach?

[1] https://lore.kernel.org/all/20220810061226.1286-4-santosh.shukla@amd.com/

>>> The first process_nmi() effectively exists to (a) purge nmi_queued and (b) keep
>>> nmi_pending if KVM_VCPUEVENT_VALID_NMI_PENDING is not set.  I think we can just
>>> replace that with an set of nmi_queued, i.e.
>>>
>>> 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
>>> 		vcpu->arch-nmi_pending = 0;	
>>> 		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
>>> 		process_nmi();
>>>
>> You mean replace above process_nmi() with kvm_make_request(KVM_REQ_NMI, vcpu), right?
>> I'll try with above proposal.
> 
> Yep, if that works.  Actually, that might be a requirement.  There's a
> 
>   static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> 
> lurking below this.  Invoking process_nmi() before NMI blocking is updated could
> result in KVM incorrectly dropping/keeping NMIs.  I don't think it would be a
> problem in practice since KVM saves only one NMI, but userspace could stuff NMIs.
> 
> Huh.  The existing code is buggy.  events->nmi.pending is a u8, and
> arch.nmi_pending is an unsigned int.  KVM doesn't cap the incoming value, so
> userspace could set up to 255 pending NMIs.  The extra weird part is that the extra
> NMIs will get dropped the next time KVM stumbles through process_nmi().
> 
> Amusingly, KVM only saves one pending NMI, i.e. in a true migration scenario KVM
> may drop an NMI.
> 
>   events->nmi.pending = vcpu->arch.nmi_pending != 0;
> 
> The really amusing part is that that code was added by 7460fb4a3400 ("KVM: Fix
> simultaneous NMIs").  The only thing I can figure is that KVM_GET_VCPU_EVENTS was
> somewhat blindly updated without much thought about what should actually happen.
> 
> So, can you slide the below in early in the series?  Then in this series, convert
> to the above suggested flow (zero nmi_pending, stuff nmi_queued) in another patch?
> 
> From: Sean Christopherson <seanjc@google.com>
> Date: Wed, 8 Feb 2023 07:44:16 -0800
> Subject: [PATCH] KVM: x86: Save/restore all NMIs when multiple NMIs are
>  pending
> 
> Save all pending NMIs in KVM_GET_VCPU_EVENTS, and queue KVM_REQ_NMI if one
> or more NMIs are pending after KVM_SET_VCPU_EVENTS in order to re-evaluate
> pending NMIs with respect to NMI blocking.
> 
> KVM allows multiple NMIs to be pending in order to faithfully emulate bare
> metal handling of simultaneous NMIs (on bare metal, truly simultaneous
> NMIs are impossible, i.e. one will always arrive first and be consumed).
> Support for simultaneous NMIs botched the save/restore though.  KVM only
> saves one pending NMI, but allows userspace to restore 255 pending NMIs
> as kvm_vcpu_events.nmi.pending is a u8, and KVM's internal state is stored
> in an unsigned int.
> 
> Fixes: 7460fb4a3400 ("KVM: Fix simultaneous NMIs")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/x86.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 508074e47bc0..e9339acbf82a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5115,7 +5115,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>  	events->interrupt.shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
>  
>  	events->nmi.injected = vcpu->arch.nmi_injected;
> -	events->nmi.pending = vcpu->arch.nmi_pending != 0;
> +	events->nmi.pending = vcpu->arch.nmi_pending;
>  	events->nmi.masked = static_call(kvm_x86_get_nmi_mask)(vcpu);
>  
>  	/* events->sipi_vector is never valid when reporting to user space */
> @@ -5202,8 +5202,11 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>  						events->interrupt.shadow);
>  
>  	vcpu->arch.nmi_injected = events->nmi.injected;
> -	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
> +	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
>  		vcpu->arch.nmi_pending = events->nmi.pending;
> +		if (vcpu->arch.nmi_pending)
> +			kvm_make_request(KVM_REQ_NMI, vcpu);
> +	}
>  	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>  
>  	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
> 

Ok.

On top of the above, I am including your suggested change as below...

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e0855599df65..437a6cea3bc7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5201,9 +5201,9 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,

        vcpu->arch.nmi_injected = events->nmi.injected;
        if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
-               vcpu->arch.nmi_pending = events->nmi.pending;
-               if (vcpu->arch.nmi_pending)
-                       kvm_make_request(KVM_REQ_NMI, vcpu);
+               vcpu->arch.nmi_pending = 0;
+               atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
+               kvm_make_request(KVM_REQ_NMI, vcpu);
        }
        static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);

does that make sense?

> base-commit: 6c77ae716d546d71b21f0c9ee7d405314a3f3f9e


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-02-14 10:22         ` Santosh Shukla
@ 2023-02-15 22:43           ` Sean Christopherson
  2023-02-16  0:22             ` Sean Christopherson
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2023-02-15 22:43 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: Maxim Levitsky, kvm, Sandipan Das, Paolo Bonzini, Jim Mattson,
	Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Daniel Sneddon,
	Jiaxi Chen, Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Tue, Feb 14, 2023, Santosh Shukla wrote:
> On 2/8/2023 9:36 PM, Sean Christopherson wrote:
> > On Wed, Feb 08, 2023, Santosh Shukla wrote:
> >> On 2/1/2023 3:58 AM, Sean Christopherson wrote:
> >>> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> >>>> @@ -5191,9 +5191,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> >>>>  
> >>>>  	vcpu->arch.nmi_injected = events->nmi.injected;
> >>>>  	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
> >>>> -		vcpu->arch.nmi_pending = events->nmi.pending;
> >>>> +		atomic_add(events->nmi.pending, &vcpu->arch.nmi_queued);
> >>>> +
> >>>>  	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> >>>>  
> >>>> +	process_nmi(vcpu);
> >>>
> >>> Argh, having two process_nmi() calls is ugly (not blaming your code, it's KVM's
> >>> ABI that's ugly).  E.g. if we collapse this down, it becomes:
> >>>
> >>> 	process_nmi(vcpu);
> >>>
> >>> 	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> >>> 		<blah blah blah>
> >>> 	}
> >>> 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> >>>
> >>> 	process_nmi(vcpu);
> >>>
> >>> And the second mess is that V_NMI needs to be cleared.
> >>>
> >>
> >> Can you please elaborate on "V_NMI cleared" scenario? Are you mentioning
> >> about V_NMI_MASK or svm->nmi_masked?
> > 
> > V_NMI_MASK.  KVM needs to purge any pending virtual NMIs when userspace sets
> > vCPU event state and KVM_VCPUEVENT_VALID_NMI_PENDING is set.
> > 
> 
> As per the APM: V_NMI_MASK is managed by the processor

Heh, we're running afoul of KVM's bad terminology conflicting with the APM's
terminology.  By V_NMI_MASK, I meant "KVM's V_NMI_MASK", a.k.a. the flag that says
whether or not there's a pending NMI.


However...

> "
> V_NMI_MASK: Indicates whether virtual NMIs are masked. The processor will set V_NMI_MASK
> once it takes the virtual NMI. V_NMI_MASK is cleared when the guest successfully completes an
> IRET instruction or #VMEXIT occurs while delivering the virtual NMI
> "
>
> In my initial implementation I had changed V_NMI_MASK for the SMM scenario [1],
> This is also not required as HW will save the V_NMI/V_NMI_MASK on 
> SMM entry and restore them on RSM.
> 
> That said the svm_{get,set}_nmi_mask will look something like:
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index a9e9bfbffd72..08911a33cf1e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3635,13 +3635,21 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> 
>  static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
>  {
> -       return to_svm(vcpu)->nmi_masked;
> +       struct vcpu_svm *svm = to_svm(vcpu);
> +
> +       if (is_vnmi_enabled(svm))
> +               return svm->vmcb->control.int_ctl & V_NMI_MASK;
> +       else
> +               return svm->nmi_masked;
>  }
> 
>  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
>  {
>         struct vcpu_svm *svm = to_svm(vcpu);
> 
> +       if (is_vnmi_enabled(svm))
> +               return;
> +
>         if (masked) {
>                 svm->nmi_masked = true;
>                 svm_set_iret_intercept(svm);
> 
> is there any inputs on above approach?

What happens if software clears the "NMIs are blocked" flag?  If KVM can't clear
the flag, then we've got problems.  E.g. if KVM emulates IRET or SMI+RSM.  And
I believe there are use cases that use KVM to snapshot and reload vCPU state,
e.g. record+replay?, in which case KVM_SET_VCPU_EVENTS needs to be able to adjust
NMI blocking too.
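
E.g. a rough sketch of the setter, using this series' V_NMI_MASK name for the
blocking bit and eliding the unchanged !vNMI tail:

        static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
        {
                struct vcpu_svm *svm = to_svm(vcpu);

                if (is_vnmi_enabled(svm)) {
                        /* Toggle the blocking bit directly instead of bailing. */
                        if (masked)
                                svm->vmcb->control.int_ctl |= V_NMI_MASK;
                        else
                                svm->vmcb->control.int_ctl &= ~V_NMI_MASK;
                        return;
                }

                /* ... existing !vNMI handling (nmi_masked + IRET intercept) ... */
        }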

> On top of the above, I am including your suggested change as below...
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e0855599df65..437a6cea3bc7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5201,9 +5201,9 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> 
>         vcpu->arch.nmi_injected = events->nmi.injected;
>         if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> -               vcpu->arch.nmi_pending = events->nmi.pending;
> -               if (vcpu->arch.nmi_pending)
> -                       kvm_make_request(KVM_REQ_NMI, vcpu);
> +               vcpu->arch.nmi_pending = 0;
> +               atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> +               kvm_make_request(KVM_REQ_NMI, vcpu);
>         }
>         static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> 
> does that make sense?

Yep!

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-02-15 22:43           ` Sean Christopherson
@ 2023-02-16  0:22             ` Sean Christopherson
  2023-02-17  7:56               ` Santosh Shukla
  0 siblings, 1 reply; 66+ messages in thread
From: Sean Christopherson @ 2023-02-16  0:22 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: Maxim Levitsky, kvm, Sandipan Das, Paolo Bonzini, Jim Mattson,
	Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Daniel Sneddon,
	Jiaxi Chen, Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Wed, Feb 15, 2023, Sean Christopherson wrote:
> On Tue, Feb 14, 2023, Santosh Shukla wrote:
> > "
> > V_NMI_MASK: Indicates whether virtual NMIs are masked. The processor will set V_NMI_MASK
> > once it takes the virtual NMI. V_NMI_MASK is cleared when the guest successfully completes an
> > IRET instruction or #VMEXIT occurs while delivering the virtual NMI
> > "
> >
> > In my initial implementation I had changed V_NMI_MASK for the SMM scenario [1],
> > This is also not required as HW will save the V_NMI/V_NMI_MASK on 
> > SMM entry and restore them on RSM.
> > 
> > That said the svm_{get,set}_nmi_mask will look something like:

...

> >  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> >  {
> >         struct vcpu_svm *svm = to_svm(vcpu);
> > 
> > +       if (is_vnmi_enabled(svm))
> > +               return;
> > +
> >         if (masked) {
> >                 svm->nmi_masked = true;
> >                 svm_set_iret_intercept(svm);
> > 
> > is there any inputs on above approach?
> 
> What happens if software clears the "NMIs are blocked" flag?  If KVM can't clear
> the flag, then we've got problems.  E.g. if KVM emulates IRET or SMI+RSM.  And I
> I believe there are use cases that use KVM to snapshot and reload vCPU state,
> e.g. record+replay?, in which case KVM_SET_VCPU_EVENTS needs to be able to adjust
> NMI blocking too.

Actually, what am I thinking.  Any type of state save/restore will need to stuff
NMI blocking.  E.g. live migration of a VM that is handling an NMI (V_NMI_MASK=1)
_and_ has a pending NMI (V_NMI=1) absolutely needs to set V_NMI_MASK=1 on the dest,
otherwise the pending NMI will get serviced when the guest expects NMIs to be blocked.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-02-16  0:22             ` Sean Christopherson
@ 2023-02-17  7:56               ` Santosh Shukla
  0 siblings, 0 replies; 66+ messages in thread
From: Santosh Shukla @ 2023-02-17  7:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, kvm, Sandipan Das, Paolo Bonzini, Jim Mattson,
	Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Daniel Sneddon,
	Jiaxi Chen, Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin



On 2/16/2023 5:52 AM, Sean Christopherson wrote:
> On Wed, Feb 15, 2023, Sean Christopherson wrote:
>> On Tue, Feb 14, 2023, Santosh Shukla wrote:
>>> "
>>> V_NMI_MASK: Indicates whether virtual NMIs are masked. The processor will set V_NMI_MASK
>>> once it takes the virtual NMI. V_NMI_MASK is cleared when the guest successfully completes an
>>> IRET instruction or #VMEXIT occurs while delivering the virtual NMI
>>> "
>>>
>>> In my initial implementation I had changed V_NMI_MASK for the SMM scenario [1],
>>> This is also not required as HW will save the V_NMI/V_NMI_MASK on 
>>> SMM entry and restore them on RSM.
>>>
>>> That said the svm_{get,set}_nmi_mask will look something like:
> 
> ...
> 
>>>  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
>>>  {
>>>         struct vcpu_svm *svm = to_svm(vcpu);
>>>
>>> +       if (is_vnmi_enabled(svm))
>>> +               return;
>>> +
>>>         if (masked) {
>>>                 svm->nmi_masked = true;
>>>                 svm_set_iret_intercept(svm);
>>>
>>> is there any inputs on above approach?
>>
>> What happens if software clears the "NMIs are blocked" flag?  If KVM can't clear
>> the flag, then we've got problems.  E.g. if KVM emulates IRET or SMI+RSM.  And I
>> I believe there are use cases that use KVM to snapshot and reload vCPU state,
>> e.g. record+replay?, in which case KVM_SET_VCPU_EVENTS needs to be able to adjust
>> NMI blocking too.
> 
> Actually, what am I thinking.  Any type of state save/restore will need to stuff
> NMI blocking.  E.g. live migration of a VM that is handling an NMI (V_NMI_MASK=1)
> _and_ has a pending NMI (V_NMI=1) absolutely needs to set V_NMI_MASK=1 on the dest,
> otherwise the pending NMI will get serviced when the guest expects NMIs to be blocked.

Sure, makes sense. Will include V_NMI_BLOCKING_MASK set/clear in svm_set_nmi_mask() in v3
and will soon share patches for review.

Thanks.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 02/11] KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
  2023-01-31  1:44     ` Sean Christopherson
@ 2023-02-24 14:38       ` Maxim Levitsky
  2023-02-24 16:48         ` Sean Christopherson
  0 siblings, 1 reply; 66+ messages in thread
From: Maxim Levitsky @ 2023-02-24 14:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Tue, 2023-01-31 at 01:44 +0000, Sean Christopherson wrote:
> On Sat, Jan 28, 2023, Sean Christopherson wrote:
> > On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> > > the V_IRQ and v_TPR bits don't exist when virtual interrupt
> > > masking is not enabled, therefore the KVM should not copy these
> > > bits regardless of V_IRQ intercept.
> > 
> > Hmm, the APM disagrees:

Yes, my apologies, after re-reading the APM I agree with you.


> > 
> >  The APIC's TPR always controls the task priority for physical interrupts, and the
> >  V_TPR always controls virtual interrupts.
> > 
> >    While running a guest with V_INTR_MASKING cleared to 0:
> >      • Writes to CR8 affect both the APIC's TPR and the V_TPR register.
> > 
> > 
> >  ...
> > 
> >  The three VMCB fields V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR indicate whether there
> >  is a virtual interrupt pending, and, if so, what its vector number and priority are.
> > 
> > IIUC, V_INTR_MASKING_MASK is mostly about EFLAGS.IF, with a small side effect on
> > TPR.  E.g. a VMM could pend a V_IRQ but clear V_INTR_MASKING and expect the guest
> > to take the V_IRQ.  At least, that's my reading of things.

Yes, this is how I understand it as well.


> > 
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >  arch/x86/kvm/svm/nested.c | 23 ++++++++---------------
> > >  1 file changed, 8 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > > index 37af0338da7c32..aad3145b2f62fe 100644
> > > --- a/arch/x86/kvm/svm/nested.c
> > > +++ b/arch/x86/kvm/svm/nested.c
> > > @@ -412,24 +412,17 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
> > >   */
> > >  void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
> > >  {
> > > -	u32 mask;
> > > +	u32 mask = 0;
> > >  	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
> > >  	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
> > >  
> > > -	/* Only a few fields of int_ctl are written by the processor.  */
> > > -	mask = V_IRQ_MASK | V_TPR_MASK;
> > > -	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
> > > -	    svm_is_intercept(svm, INTERCEPT_VINTR)) {
> > > -		/*
> > > -		 * In order to request an interrupt window, L0 is usurping
> > > -		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
> > > -		 * even if it was clear in L1's VMCB.  Restoring it would be
> > > -		 * wrong.  However, in this case V_IRQ will remain true until
> > > -		 * interrupt_window_interception calls svm_clear_vintr and
> > > -		 * restores int_ctl.  We can just leave it aside.
> > > -		 */
> > > -		mask &= ~V_IRQ_MASK;
> 
> Argh! *shakes fist at KVM and SVM*
> 
> This is ridiculously convoluted, and I'm pretty sure there are existing bugs.  If
> L1 runs L2 with V_IRQ=1 and V_INTR_MASKING=1


Note that there are two cases when an interrupt window is needed in the nested case:

- If L1 doesn't intercept interrupts, which is what we are talking about here.
- If L1 does intercept interrupts, but lets L2 control L1's EFLAGS.IF and/or
  L1's GIF
  (that is, V_INTR_MASKING_MASK is not set, and/or L1 doesn't intercept STGI/CLGI).

  In this case a 'real' interrupt will be converted to a VM exit but only
  when both L1's EFLAGS.IF is true and L1's GIF is true.



> , and KVM requests an interrupt window,
> then KVM will overwrite vmcb02's int_vector and int_ctl, i.e. clobber L1's V_IRQ,
> but then silently clear INTERCEPT_VINTR in recalc_intercepts() and thus prevent
> svm_clear_vintr() from being reached, i.e. prevent restoring L1's V_IRQ.


> 
> Bug #1 is that KVM shouldn't clobber the V_IRQ fields if KVM ultimately decides
> not to open an interrupt window.  Bug #2 is that KVM needs to open an interrupt
> window if save.RFLAGS.IF=1, as interrupts may become unblocked in that case,
> e.g. if L2 is in an interrupt shadow.


> 
> So I think this over two patches?
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 05d38944a6c0..ad1e70ac8669 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -139,13 +139,18 @@ void recalc_intercepts(struct vcpu_svm *svm)
>  
>         if (g->int_ctl & V_INTR_MASKING_MASK) {
>                 /*
> -                * Once running L2 with HF_VINTR_MASK, EFLAGS.IF and CR8
> -                * does not affect any interrupt we may want to inject;
> -                * therefore, writes to CR8 are irrelevant to L0, as are
> -                * interrupt window vmexits.
> +                * If L2 is active and V_INTR_MASKING is enabled in vmcb12,
> +                * disable intercept of CR8 writes as L2's CR8 does not affect
> +                * any interrupt KVM may want to inject.
> +                *
> +                * Similarly, disable intercept of virtual interrupts (used to
> +                * detect interrupt windows) if the saved RFLAGS.IF is '0', as
> +                * the effective RFLAGS.IF for L1 interrupts will never be set
> +                * while L2 is running (L2's RFLAGS.IF doesn't affect L1 IRQs).
>                  */
>                 vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
> -               vmcb_clr_intercept(c, INTERCEPT_VINTR);
> +               if (!(svm->vmcb01.ptr->save.rflags & X86_EFLAGS_IF))
> +                       vmcb_clr_intercept(c, INTERCEPT_VINTR);

How about instead moving this code to svm_set_vintr?

That is, in guest mode, if the guest has V_INTR_MASKING_MASK set, then a nested
VM exit is the next point at which the interrupt window could open, thus there is
no need to set VINTR.

Or even better, put the logic in svm_enable_irq_window (that is, avoid
calling svm_set_vintr in the first place).

I also think it's worth adding a warning when 'svm_set_intercept'
didn't work, that is, didn't really set an intercept.
In theory that can result in nasty CVEs in addition to logic bugs like the one you found.
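
Something like the below, perhaps (sketch only; assumes the merged intercepts end
up in the active VMCB once recalc_intercepts() returns):

        static inline void svm_set_intercept(struct vcpu_svm *svm, int bit)
        {
                struct vmcb *vmcb = svm->vmcb01.ptr;

                vmcb_set_intercept(&vmcb->control, bit);
                recalc_intercepts(svm);

                /* Catch intercepts that recalc_intercepts() silently dropped. */
                WARN_ON_ONCE(is_guest_mode(&svm->vcpu) &&
                             !vmcb_is_intercept(&svm->vmcb->control, bit));
        }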


>         }
>  
>         /*
> @@ -416,18 +421,18 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
>  
>         /* Only a few fields of int_ctl are written by the processor.  */
>         mask = V_IRQ_MASK | V_TPR_MASK;
> -       if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
> -           svm_is_intercept(svm, INTERCEPT_VINTR)) {
> -               /*
> -                * In order to request an interrupt window, L0 is usurping
> -                * svm->vmcb->control.int_ctl and possibly setting V_IRQ
> -                * even if it was clear in L1's VMCB.  Restoring it would be
> -                * wrong.  However, in this case V_IRQ will remain true until
> -                * interrupt_window_interception calls svm_clear_vintr and
> -                * restores int_ctl.  We can just leave it aside.
> -                */
> +
> +       /*
> +        * Don't sync vmcb02 V_IRQ back to vmcb12 if KVM (L0) is intercepting
> +        * virtual interrupts in order to request an interrupt window, as KVM
> +        * has usurped vmcb02's int_ctl.  If an interrupt window opens before
> +        * the next VM-Exit, svm_clear_vintr() will restore vmcb12's int_ctl.
> +        * If no window opens, V_IRQ will be correctly preserved in vmcb12's
> +        * int_ctl (because it was never recognized while L2 was running).
> +        */
> +       if (svm_is_intercept(svm, INTERCEPT_VINTR) &&
> +           !test_bit(INTERCEPT_VINTR, (unsigned long *)svm->nested.ctl.intercepts))
>                 mask &= ~V_IRQ_MASK;

This makes sense.



> -       }
>  
>         if (nested_vgif_enabled(svm))
>                 mask |= V_GIF_MASK;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b103fe7cbc82..59d2891662ef 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1580,6 +1580,16 @@ static void svm_set_vintr(struct vcpu_svm *svm)
>  
>         svm_set_intercept(svm, INTERCEPT_VINTR);
>  
> +       /*
> +        * Recalculating intercepts may have cleared the VINTR intercept.  If
> +        * V_INTR_MASKING is enabled in vmcb12, then the effective RFLAGS.IF
> +        * for L1 physical interrupts is L1's RFLAGS.IF at the time of VMRUN.
> +        * Requesting an interrupt window if save.RFLAGS.IF=0 is pointless as
> +        * interrupts will never be unblocked while L2 is running.
> +        */
> +       if (!svm_is_intercept(svm, INTERCEPT_VINTR))
> +               return;

This won't be needed if we don't call the svm_set_vintr in the first place.

> +
>         /*
>          * This is just a dummy VINTR to actually cause a vmexit to happen.
>          * Actual injection of virtual interrupts happens through EVENTINJ.
> 



With all this said, I also want to note that this patch has *nothing* to do with vNMI;
I only added it due to some refactoring, so feel free to drop it from the vNMI queue
and deal with those bugs separately.

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 05/11] KVM: x86: emulator: stop using raw host flags
  2023-01-28  0:58   ` Sean Christopherson
@ 2023-02-24 14:38     ` Maxim Levitsky
  0 siblings, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2023-02-24 14:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Sat, 2023-01-28 at 00:58 +0000, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index f18f579ebde81c..85d2a12c214dda 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -8138,9 +8138,14 @@ static void emulator_set_nmi_mask(struct x86_emulate_ctxt *ctxt, bool masked)
> >  	static_call(kvm_x86_set_nmi_mask)(emul_to_vcpu(ctxt), masked);
> >  }
> >  
> > -static unsigned emulator_get_hflags(struct x86_emulate_ctxt *ctxt)
> > +static bool emulator_in_smm(struct x86_emulate_ctxt *ctxt)
> >  {
> > -	return emul_to_vcpu(ctxt)->arch.hflags;
> > +	return emul_to_vcpu(ctxt)->arch.hflags & HF_SMM_MASK;
> 
> This needs to be is_smm() as HF_SMM_MASK is undefined if CONFIG_KVM_SMM=n.
> 
> > +}
> > +
> > +static bool emulator_in_guest_mode(struct x86_emulate_ctxt *ctxt)
> > +{
> > +	return emul_to_vcpu(ctxt)->arch.hflags & HF_GUEST_MASK;
> 
> And just use is_guest_mode() here.
> 

Makes sense.


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface
  2023-01-28  1:09   ` Sean Christopherson
  2023-01-31 21:12     ` Sean Christopherson
  2023-02-08  9:32     ` Santosh Shukla
@ 2023-02-24 14:39     ` Maxim Levitsky
  2 siblings, 0 replies; 66+ messages in thread
From: Maxim Levitsky @ 2023-02-24 14:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Sat, 2023-01-28 at 01:09 +0000, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> > This patch adds two new vendor callbacks:
> 
> No "this patch" please, just say what it does.
> 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 684a5519812fb2..46993ce61c92db 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -871,8 +871,13 @@ struct kvm_vcpu_arch {
> >  	u64 tsc_scaling_ratio; /* current scaling ratio */
> >  
> >  	atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
> > -	unsigned nmi_pending; /* NMI queued after currently running handler */
> > +
> > +	unsigned int nmi_pending; /*
> > +				   * NMI queued after currently running handler
> > +				   * (not including a hardware pending NMI (e.g vNMI))
> > +				   */
> 
> Put the block comment above.  I'd say collapse all of the comments about NMIs into
> a single big block comment.
> 
> >  	bool nmi_injected;    /* Trying to inject an NMI this entry */
> > +
> >  	bool smi_pending;    /* SMI queued after currently running handler */
> >  	u8 handling_intr_from_guest;
> >  
> > @@ -10015,13 +10022,34 @@ static void process_nmi(struct kvm_vcpu *vcpu)
> >  	 * Otherwise, allow two (and we'll inject the first one immediately).
> >  	 */
> >  	if (static_call(kvm_x86_get_nmi_mask)(vcpu) || vcpu->arch.nmi_injected)
> > -		limit = 1;
> > +		limit--;
> > +
> > +	/* Also if there is already a NMI hardware queued to be injected,
> > +	 * decrease the limit again
> > +	 */
> 
> 	/*
> 	 * Block comment ...
> 	 */
> 
> > +	if (static_call(kvm_x86_get_hw_nmi_pending)(vcpu))
> 
> I'd prefer "is_hw_nmi_pending()" over "get", even if it means not pairing with
> "set".  Though I think that's a good thing since they aren't perfect pairs.
> 
> > +		limit--;
> >  
> > -	vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);
> > +	if (limit <= 0)
> > +		return;
> > +
> > +	/* Attempt to use hardware NMI queueing */
> > +	if (static_call(kvm_x86_set_hw_nmi_pending)(vcpu)) {
> > +		limit--;
> > +		nmi_to_queue--;
> > +	}
> > +
> > +	vcpu->arch.nmi_pending += nmi_to_queue;
> >  	vcpu->arch.nmi_pending = min(vcpu->arch.nmi_pending, limit);
> >  	kvm_make_request(KVM_REQ_EVENT, vcpu);
> >  }
> >  
> > +/* Return total number of NMIs pending injection to the VM */
> > +int kvm_get_total_nmi_pending(struct kvm_vcpu *vcpu)
> > +{
> > +	return vcpu->arch.nmi_pending + static_call(kvm_x86_get_hw_nmi_pending)(vcpu);
> 
> Nothing cares about the total count, this can just be;

I wanted the interface to be a bit more generic so that in theory you could have
more than one hardware NMI pending. I don't care much about it though.


Best regards,
	Maxim Levitsky

> 
> 
> 	bool kvm_is_nmi_pending(struct kvm_vcpu *vcpu)
> 	{
> 		return vcpu->arch.nmi_pending ||
> 		       static_call(kvm_x86_is_hw_nmi_pending)(vcpu);
> 	}
> 
> 
> > +}
> > +
> >  void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
> >  				       unsigned long *vcpu_bitmap)
> >  {
> > -- 
> > 2.26.3
> > 



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v2 02/11] KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
  2023-02-24 14:38       ` Maxim Levitsky
@ 2023-02-24 16:48         ` Sean Christopherson
  0 siblings, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2023-02-24 16:48 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Sandipan Das, Paolo Bonzini, Jim Mattson, Peter Zijlstra,
	Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen,
	Babu Moger, linux-kernel, Jing Liu, Wyes Karny, x86,
	H. Peter Anvin

On Fri, Feb 24, 2023, Maxim Levitsky wrote:
> On Tue, 2023-01-31 at 01:44 +0000, Sean Christopherson wrote:
> > On Sat, Jan 28, 2023, Sean Christopherson wrote:
> > So I think this over two patches?
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 05d38944a6c0..ad1e70ac8669 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -139,13 +139,18 @@ void recalc_intercepts(struct vcpu_svm *svm)
> >  
> >         if (g->int_ctl & V_INTR_MASKING_MASK) {
> >                 /*
> > -                * Once running L2 with HF_VINTR_MASK, EFLAGS.IF and CR8
> > -                * does not affect any interrupt we may want to inject;
> > -                * therefore, writes to CR8 are irrelevant to L0, as are
> > -                * interrupt window vmexits.
> > +                * If L2 is active and V_INTR_MASKING is enabled in vmcb12,
> > +                * disable intercept of CR8 writes as L2's CR8 does not affect
> > +                * any interrupt KVM may want to inject.
> > +                *
> > +                * Similarly, disable intercept of virtual interrupts (used to
> > +                * detect interrupt windows) if the saved RFLAGS.IF is '0', as
> > +                * the effective RFLAGS.IF for L1 interrupts will never be set
> > +                * while L2 is running (L2's RFLAGS.IF doesn't affect L1 IRQs).
> >                  */
> >                 vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
> > -               vmcb_clr_intercept(c, INTERCEPT_VINTR);
> > +               if (!(svm->vmcb01.ptr->save.rflags & X86_EFLAGS_IF))
> > +                       vmcb_clr_intercept(c, INTERCEPT_VINTR);
> 
> How about instead moving this code to svm_set_vintr?

I considered that, but it doesn't handle the case where INTERCEPT_VINTR was set
in vmcb01 before nested VMRUN, i.e. KVM is already waiting for an interrupt window
for L1, and L1 doesn't set RFLAGS.IF=1 prior to VMRUN.

> That is, in the guest mode, if the guest has V_INTR_MASKING_MASK, then
> then a nested VM exit is the next point the interrupt window could open,
> thus we don't set VINTR)
> 
> Or even better put the logic in svm_enable_irq_window (that is avoid
> calling svm_set_vintr in the first place).
> 
> I also think that it worth it to add a warning that 'svm_set_intercept'
> didn't work, that is didn't really set an intercept.

Heh, I had coded that up too, but switched to bailing from svm_set_vintr() if the
intercept was disabled because of the aforementioned scenario.

> In theory that can result in nasty CVEs in addition to logic bugs as you found.

I don't think this can result in a CVE, at least not without even worse bugs in
L1.  KVM uses INTERCEPT_VINTR purely to detect interrupt windows, and failure to
configure an IRQ window would at worst cause KVM to delay an IRQ.  If a missing
or late IRQ lets L2 extract data from L1, then L1 has problems of its own.

> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index b103fe7cbc82..59d2891662ef 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1580,6 +1580,16 @@ static void svm_set_vintr(struct vcpu_svm *svm)
> >  
> >         svm_set_intercept(svm, INTERCEPT_VINTR);
> >  
> > +       /*
> > +        * Recalculating intercepts may have cleared the VINTR intercept.  If
> > +        * V_INTR_MASKING is enabled in vmcb12, then the effective RFLAGS.IF
> > +        * for L1 physical interrupts is L1's RFLAGS.IF at the time of VMRUN.
> > +        * Requesting an interrupt window if save.RFLAGS.IF=0 is pointless as
> > +        * interrupts will never be unblocked while L2 is running.
> > +        */
> > +       if (!svm_is_intercept(svm, INTERCEPT_VINTR))
> > +               return;
> 
> This won't be needed if we don't call the svm_set_vintr in the first place.
> 
> > +
> >         /*
> >          * This is just a dummy VINTR to actually cause a vmexit to happen.
> >          * Actual injection of virtual interrupts happens through EVENTINJ.
> > 
> 
> 
> 
> With all this said, I also want to note that this patch has *nothing* to do with VNMI,
> I only added it due to some refactoring, so feel free to drop it from vNMI queue,
> and deal with those bugs separately.

Ya, let's tackle this in a separate series.  I'll circle back to this after rc1
(I'm OOO next week).

Santosh, in the next version of the vNMI series, can you drop any patches that
aren't strictly necessary to enable vNMI?

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2023-02-24 16:48 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
2022-11-29 19:37 ` [PATCH v2 01/11] KVM: nSVM: don't sync back tlb_ctl on nested VM exit Maxim Levitsky
2022-12-05 14:05   ` Santosh Shukla
2022-12-06 12:13     ` Maxim Levitsky
2022-11-29 19:37 ` [PATCH v2 02/11] KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12 Maxim Levitsky
2023-01-28  0:37   ` Sean Christopherson
2023-01-31  1:44     ` Sean Christopherson
2023-02-24 14:38       ` Maxim Levitsky
2023-02-24 16:48         ` Sean Christopherson
2022-11-29 19:37 ` [PATCH v2 03/11] KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1 doesn't intercept interrupts Maxim Levitsky
2023-01-28  0:56   ` Sean Christopherson
2023-01-30 18:41     ` Sean Christopherson
2022-11-29 19:37 ` [PATCH v2 04/11] KVM: SVM: drop the SVM specific H_FLAGS Maxim Levitsky
2022-12-05 15:31   ` Santosh Shukla
2023-01-28  0:56   ` Sean Christopherson
2022-11-29 19:37 ` [PATCH v2 05/11] KVM: x86: emulator: stop using raw host flags Maxim Levitsky
2023-01-28  0:58   ` Sean Christopherson
2023-02-24 14:38     ` Maxim Levitsky
2022-11-29 19:37 ` [PATCH v2 06/11] KVM: SVM: add wrappers to enable/disable IRET interception Maxim Levitsky
2022-12-05 15:41   ` Santosh Shukla
2022-12-06 12:14     ` Maxim Levitsky
2022-12-08 12:09       ` Santosh Shukla
2022-12-08 13:44         ` Maxim Levitsky
2023-01-31 21:07           ` Sean Christopherson
2023-02-13 14:50             ` Santosh Shukla
2022-11-29 19:37 ` [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface Maxim Levitsky
2023-01-28  1:09   ` Sean Christopherson
2023-01-31 21:12     ` Sean Christopherson
2023-02-08  9:35       ` Santosh Shukla
2023-02-08  9:32     ` Santosh Shukla
2023-02-24 14:39     ` Maxim Levitsky
2023-01-31 22:28   ` Sean Christopherson
2023-02-01  0:06     ` Sean Christopherson
2023-02-08  9:51       ` Santosh Shukla
2023-02-08 16:09         ` Sean Christopherson
2023-02-08  9:43     ` Santosh Shukla
2023-02-08 16:06       ` Sean Christopherson
2023-02-14 10:22         ` Santosh Shukla
2023-02-15 22:43           ` Sean Christopherson
2023-02-16  0:22             ` Sean Christopherson
2023-02-17  7:56               ` Santosh Shukla
2022-11-29 19:37 ` [PATCH v2 08/11] x86/cpu: Add CPUID feature bit for VNMI Maxim Levitsky
2022-11-29 19:37 ` [PATCH v2 09/11] KVM: SVM: Add VNMI bit definition Maxim Levitsky
2023-01-31 22:42   ` Sean Christopherson
2023-02-02  9:42     ` Santosh Shukla
2022-11-29 19:37 ` [PATCH v2 10/11] KVM: SVM: implement support for vNMI Maxim Levitsky
2022-12-04 17:18   ` Maxim Levitsky
2022-12-05 17:07   ` Santosh Shukla
2023-01-28  1:10   ` Sean Christopherson
2023-02-10 12:02     ` Santosh Shukla
2023-02-01  0:22   ` Sean Christopherson
2023-02-01  0:39     ` Sean Christopherson
2023-02-10 12:24     ` Santosh Shukla
2023-02-10 16:44       ` Sean Christopherson
2022-11-29 19:37 ` [PATCH v2 11/11] KVM: nSVM: implement support for nested VNMI Maxim Levitsky
2022-12-05 17:14   ` Santosh Shukla
2022-12-06 12:19     ` Maxim Levitsky
2022-12-08 12:11       ` Santosh Shukla
2023-02-01  0:44   ` Sean Christopherson
2022-12-06  9:58 ` [PATCH v2 00/11] SVM: vNMI (with my fixes) Santosh Shukla
2023-02-01  0:24   ` Sean Christopherson
2022-12-20 10:27 ` Maxim Levitsky
2022-12-21 18:44   ` Sean Christopherson
2023-01-15  9:05 ` Maxim Levitsky
2023-01-28  1:13 ` Sean Christopherson
2023-02-01 19:13 ` Sean Christopherson
