* [PATCH 00/13] SVM: vNMI (with my fixes)
@ 2022-11-17 14:32 Maxim Levitsky
  2022-11-17 14:32 ` [PATCH 01/13] KVM: nSVM: don't sync back tlb_ctl on nested VM exit Maxim Levitsky
                   ` (12 more replies)
  0 siblings, 13 replies; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky

Hi!

This is the vNMI patch series from Santosh Shukla with a few
small fixes from me:

1. When a vNMI injection is pending, another NMI could arrive before
  the first injection is complete. To avoid delaying that second NMI's
  injection for an unbounded time, I added code to intercept
  IRET/RSM/STGI and then retry the injection.

2. I slightly modified the 'KVM: SVM: Add VNMI support in get/set_nmi_mask'
   to have a WARN_ON in the vNMI functions when called without vNMI enabled.
   Also, NMI mask/unmask should be allowed regardless of whether SMM is
   active, to support migration.

3. I did some refactoring in the code which updates the int_ctl in vmcb12
   on nested VM exit, and updated the patch 'KVM: nSVM: implement nested VNMI'
   to use this.

4. I added my reviewed-by to all the patches which I didn't change.

I only tested this on a machine which doesn't have vNMI, so this does need
some testing to ensure that nothing is broken.

Another thing I haven't looked at in depth yet is migration. I think there is
a bug, because with vNMI we can now in practice have 2 NMIs injected into the
guest: one in service, and one 'pending injection' but no longer pending from
KVM's point of view. KVM doesn't take this into account in
kvm_vcpu_ioctl_x86_get_vcpu_events, and maybe elsewhere.

Best regards,
       Maxim Levitsky

Maxim Levitsky (5):
  KVM: nSVM: don't sync back tlb_ctl on nested VM exit
  KVM: nSVM: don't call nested_sync_control_from_vmcb02 on each VM exit
  KVM: nSVM: rename nested_sync_control_from_vmcb02 to
    nested_sync_int_ctl_from_vmcb02
  KVM: nSVM: clean up copying of int_ctl fields back to vmcb01/vmcb12
  KVM: SVM: allow NMI window with vNMI

Santosh Shukla (8):
  x86/cpu: Add CPUID feature bit for VNMI
  KVM: SVM: Add VNMI bit definition
  KVM: SVM: Add VNMI support in get/set_nmi_mask
  KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  KVM: SVM: Add VNMI support in inject_nmi
  KVM: nSVM: implement nested VNMI
  KVM: nSVM: emulate VMEXIT_INVALID case for nested VNMI
  KVM: SVM: Enable VNMI feature

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/svm.h         |  7 +++
 arch/x86/kvm/svm/nested.c          | 84 +++++++++++++++++++++---------
 arch/x86/kvm/svm/svm.c             | 60 ++++++++++++++++++---
 arch/x86/kvm/svm/svm.h             | 70 ++++++++++++++++++++++++-
 5 files changed, 189 insertions(+), 33 deletions(-)

-- 
2.34.3




* [PATCH 01/13] KVM: nSVM: don't sync back tlb_ctl on nested VM exit
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  2022-11-17 14:32 ` [PATCH 02/13] KVM: nSVM: don't call nested_sync_control_from_vmcb02 on each " Maxim Levitsky
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky

The CPU doesn't change the TLB_CTL value, as stated in the APM (15.16.2):

  "The VMRUN instruction reads, but does not change, the
  value of the TLB_CONTROL field"

Therefore KVM shouldn't do that either.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b258d6988f5dde..43cc4a5d22e012 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -989,7 +989,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 		vmcb12->control.next_rip  = vmcb02->control.next_rip;
 
 	vmcb12->control.int_ctl           = svm->nested.ctl.int_ctl;
-	vmcb12->control.tlb_ctl           = svm->nested.ctl.tlb_ctl;
 	vmcb12->control.event_inj         = svm->nested.ctl.event_inj;
 	vmcb12->control.event_inj_err     = svm->nested.ctl.event_inj_err;
 
-- 
2.34.3



* [PATCH 02/13] KVM: nSVM: don't call nested_sync_control_from_vmcb02 on each VM exit
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
  2022-11-17 14:32 ` [PATCH 01/13] KVM: nSVM: don't sync back tlb_ctl on nested VM exit Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  2022-11-17 20:04   ` Sean Christopherson
  2022-11-17 14:32 ` [PATCH 03/13] KVM: nSVM: rename nested_sync_control_from_vmcb02 to nested_sync_int_ctl_from_vmcb02 Maxim Levitsky
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky

Calling nested_sync_control_from_vmcb02 on each VM exit (nested or not)
was an attempt to keep the int_ctl field in the vmcb12 cache
up to date on each VM exit.

However, all the other fields in the vmcb12 cache are not kept up to date,
so for consistency it is better to do this on a nested VM exit only.

No functional change intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 17 ++++++++---------
 arch/x86/kvm/svm/svm.c    |  2 --
 arch/x86/kvm/svm/svm.h    |  1 -
 3 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 43cc4a5d22e012..91a51e75717dca 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -407,11 +407,12 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
  * Synchronize fields that are written by the processor, so that
  * they can be copied back into the vmcb12.
  */
-void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
+static void nested_sync_control_from_vmcb02(struct vcpu_svm *svm,
+					    struct vmcb *vmcb12)
 {
 	u32 mask;
-	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
-	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
+	vmcb12->control.event_inj      = svm->vmcb->control.event_inj;
+	vmcb12->control.event_inj_err  = svm->vmcb->control.event_inj_err;
 
 	/* Only a few fields of int_ctl are written by the processor.  */
 	mask = V_IRQ_MASK | V_TPR_MASK;
@@ -431,8 +432,8 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
 	if (nested_vgif_enabled(svm))
 		mask |= V_GIF_MASK;
 
-	svm->nested.ctl.int_ctl        &= ~mask;
-	svm->nested.ctl.int_ctl        |= svm->vmcb->control.int_ctl & mask;
+	vmcb12->control.int_ctl        &= ~mask;
+	vmcb12->control.int_ctl        |= svm->vmcb->control.int_ctl & mask;
 }
 
 /*
@@ -985,13 +986,11 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (vmcb12->control.exit_code != SVM_EXIT_ERR)
 		nested_save_pending_event_to_vmcb12(svm, vmcb12);
 
+	nested_sync_control_from_vmcb02(svm, vmcb12);
+
 	if (svm->nrips_enabled)
 		vmcb12->control.next_rip  = vmcb02->control.next_rip;
 
-	vmcb12->control.int_ctl           = svm->nested.ctl.int_ctl;
-	vmcb12->control.event_inj         = svm->nested.ctl.event_inj;
-	vmcb12->control.event_inj_err     = svm->nested.ctl.event_inj_err;
-
 	if (!kvm_pause_in_guest(vcpu->kvm)) {
 		vmcb01->control.pause_filter_count = vmcb02->control.pause_filter_count;
 		vmcb_mark_dirty(vmcb01, VMCB_INTERCEPTS);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 527f18d8cc4489..03acbe8ff34edb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4016,8 +4016,6 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	svm->next_rip = 0;
 	if (is_guest_mode(vcpu)) {
-		nested_sync_control_from_vmcb02(svm);
-
 		/* Track VMRUNs that have made past consistency checking */
 		if (svm->nested.nested_run_pending &&
 		    svm->vmcb->control.exit_code != SVM_EXIT_ERR)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 199a2ecef1cec6..f5383104d00580 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -618,7 +618,6 @@ void nested_copy_vmcb_control_to_cache(struct vcpu_svm *svm,
 				       struct vmcb_control_area *control);
 void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
 				    struct vmcb_save_area *save);
-void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
 void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
 void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
 
-- 
2.34.3



* [PATCH 03/13] KVM: nSVM: rename nested_sync_control_from_vmcb02 to nested_sync_int_ctl_from_vmcb02
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
  2022-11-17 14:32 ` [PATCH 01/13] KVM: nSVM: don't sync back tlb_ctl on nested VM exit Maxim Levitsky
  2022-11-17 14:32 ` [PATCH 02/13] KVM: nSVM: don't call nested_sync_control_from_vmcb02 on each " Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  2022-11-17 14:32 ` [PATCH 04/13] KVM: nSVM: clean up copying of int_ctl fields back to vmcb01/vmcb12 Maxim Levitsky
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky

The nested_sync_control_from_vmcb02 name is misleading, as there
are many fields which are modified by the CPU and need to be written back
to vmcb12, while this function only copies some int_ctl bits and the
event_inj* fields.

Make it copy only these int_ctl bits and rename the function.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 91a51e75717dca..54eb152e2b60b6 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -404,15 +404,13 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
 }
 
 /*
- * Synchronize fields that are written by the processor, so that
+ * Synchronize int_ctl fields that are written by the processor, so that
  * they can be copied back into the vmcb12.
  */
-static void nested_sync_control_from_vmcb02(struct vcpu_svm *svm,
+static void nested_sync_int_ctl_from_vmcb02(struct vcpu_svm *svm,
 					    struct vmcb *vmcb12)
 {
 	u32 mask;
-	vmcb12->control.event_inj      = svm->vmcb->control.event_inj;
-	vmcb12->control.event_inj_err  = svm->vmcb->control.event_inj_err;
 
 	/* Only a few fields of int_ctl are written by the processor.  */
 	mask = V_IRQ_MASK | V_TPR_MASK;
@@ -986,7 +984,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (vmcb12->control.exit_code != SVM_EXIT_ERR)
 		nested_save_pending_event_to_vmcb12(svm, vmcb12);
 
-	nested_sync_control_from_vmcb02(svm, vmcb12);
+	nested_sync_int_ctl_from_vmcb02(svm, vmcb12);
+
+	vmcb12->control.event_inj      = svm->vmcb->control.event_inj;
+	vmcb12->control.event_inj_err  = svm->vmcb->control.event_inj_err;
 
 	if (svm->nrips_enabled)
 		vmcb12->control.next_rip  = vmcb02->control.next_rip;
-- 
2.34.3



* [PATCH 04/13] KVM: nSVM: clean up copying of int_ctl fields back to vmcb01/vmcb12
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (2 preceding siblings ...)
  2022-11-17 14:32 ` [PATCH 03/13] KVM: nSVM: rename nested_sync_control_from_vmcb02 to nested_sync_int_ctl_from_vmcb02 Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  2022-11-17 20:15   ` Sean Christopherson
  2022-11-17 14:32 ` [PATCH 05/13] x86/cpu: Add CPUID feature bit for VNMI Maxim Levitsky
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky

Clean up nested_sync_int_ctl_from_vmcb02:

1. The comment about preservation of V_IRQ is wrong: when L2 doesn't
   use virtual interrupt masking, the field just doesn't exist in
   vmcb12 and thus should not be touched at all.
   Since it is unused in this case, touching it doesn't matter much,
   so the bug is theoretical.

2. When L2 doesn't use virtual interrupt masking, then in *theory*,
   if KVM uses the feature, it should copy the changes to the V_IRQ* bits
   from vmcb02 to vmcb01.

   In practice, KVM only uses it for detection of the interrupt window,
   and it happens to re-open the window on each nested VM exit because
   kvm_set_rflags happens to raise KVM_REQ_EVENT.
   Do this explicitly instead.

3. Add a comment on why we don't need to copy V_GIF from vmcb02 to vmcb01
   when the nested guest doesn't use nested V_GIF (and thus L1's GIF is in
   vmcb02 while nested), even though it can in theory affect L1's GIF.

4. Add support code to also copy some bits of int_ctl from
   vmcb02 to vmcb01.
   Currently there are no such bits.

No (visible) functional change is intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 47 ++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 54eb152e2b60b6..1f2b8492c8782f 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -410,28 +410,45 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
 static void nested_sync_int_ctl_from_vmcb02(struct vcpu_svm *svm,
 					    struct vmcb *vmcb12)
 {
-	u32 mask;
+	struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
+	struct vmcb *vmcb01 = svm->vmcb01.ptr;
+
+	/* bitmask of bits of int_ctl that we copy from vmcb02 to vmcb12 */
+	u32 l2_to_l1_mask = 0;
+	/* bitmask of bits of int_ctl that we copy from vmcb02 to vmcb01 */
+	u32 l2_to_l0_mask = 0;
 
-	/* Only a few fields of int_ctl are written by the processor.  */
-	mask = V_IRQ_MASK | V_TPR_MASK;
-	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
-	    svm_is_intercept(svm, INTERCEPT_VINTR)) {
+	if (svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
+		l2_to_l1_mask |= V_IRQ_MASK | V_TPR_MASK;
+	else {
 		/*
-		 * In order to request an interrupt window, L0 is usurping
-		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
-		 * even if it was clear in L1's VMCB.  Restoring it would be
-		 * wrong.  However, in this case V_IRQ will remain true until
-		 * interrupt_window_interception calls svm_clear_vintr and
-		 * restores int_ctl.  We can just leave it aside.
+		 * If IRQ window was opened while in L2, it must be reopened
+		 * after the VM exit
+		 *
+		 * vTPR value doesn't need to be copied from vmcb02 to vmcb01
+		 * because it is synced from/to apic registers on each VM exit
 		 */
-		mask &= ~V_IRQ_MASK;
+		if (vmcb02->control.int_ctl & V_IRQ_MASK)
+			kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 	}
 
 	if (nested_vgif_enabled(svm))
-		mask |= V_GIF_MASK;
+		l2_to_l1_mask |= V_GIF_MASK;
+	else
+		/* There is no need to sync V_GIF from vmcb02 to vmcb01
+		 * because GIF is cleared on VMexit, thus even though
+		 * nested guest can control host's GIF, on VM exit
+		 * its set value is lost
+		 */
+		;
+
+	vmcb12->control.int_ctl =
+		(svm->nested.ctl.int_ctl & ~l2_to_l1_mask) |
+		(vmcb02->control.int_ctl & l2_to_l1_mask);
 
-	vmcb12->control.int_ctl        &= ~mask;
-	vmcb12->control.int_ctl        |= svm->vmcb->control.int_ctl & mask;
+	vmcb01->control.int_ctl =
+		(vmcb01->control.int_ctl & ~l2_to_l0_mask) |
+		(vmcb02->control.int_ctl & l2_to_l0_mask);
 }
 
 /*
-- 
2.34.3



* [PATCH 05/13] x86/cpu: Add CPUID feature bit for VNMI
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (3 preceding siblings ...)
  2022-11-17 14:32 ` [PATCH 04/13] KVM: nSVM: clean up copying of int_ctl fields back to vmcb01/vmcb12 Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  2022-11-17 14:32 ` [PATCH 06/13] KVM: SVM: Add VNMI bit definition Maxim Levitsky
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky, Santosh Shukla

From: Santosh Shukla <santosh.shukla@amd.com>

The VNMI feature allows the hypervisor to inject an NMI into the guest
without using the event injection mechanism. The benefit of VNMI over
event injection is that it does not require tracking the guest's NMI
state or intercepting IRET to detect NMI completion. VNMI achieves that
by exposing 3 capability bits in the VMCB intr_ctrl field, which help
with virtualizing NMI injection and NMI masking.

The presence of this feature is indicated via CPUID function
0x8000000A_EDX[25].
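
For illustration only (not part of this patch), a minimal userspace sketch
of the detection, assuming a GCC-style inline-asm cpuid helper:

#include <stdbool.h>
#include <stdint.h>

static void cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
		  uint32_t *ecx, uint32_t *edx)
{
	asm volatile("cpuid"
		     : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
		     : "0"(leaf));
}

static bool cpu_has_amd_vnmi(void)
{
	uint32_t eax, ebx, ecx, edx;

	cpuid(0x8000000A, &eax, &ebx, &ecx, &edx);
	return edx & (1u << 25);	/* CPUID Fn8000_000A_EDX[25] */
}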

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index b71f4f2ecdd571..23bd69848d271e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -356,6 +356,7 @@
 #define X86_FEATURE_VGIF		(15*32+16) /* Virtual GIF */
 #define X86_FEATURE_X2AVIC		(15*32+18) /* Virtual x2apic */
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* Virtual SPEC_CTRL */
+#define X86_FEATURE_AMD_VNMI		(15*32+25) /* Virtual NMI */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* "" SVME addr check */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
-- 
2.34.3



* [PATCH 06/13] KVM: SVM: Add VNMI bit definition
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (4 preceding siblings ...)
  2022-11-17 14:32 ` [PATCH 05/13] x86/cpu: Add CPUID feature bit for VNMI Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  2022-11-17 14:37   ` Borislav Petkov
  2022-11-17 20:27   ` Sean Christopherson
  2022-11-17 14:32 ` [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask Maxim Levitsky
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky, Santosh Shukla

From: Santosh Shukla <santosh.shukla@amd.com>

VNMI exposes 3 capability bits (V_NMI, V_NMI_MASK, and V_NMI_ENABLE) to
virtualize NMI and NMI_MASK. These bits are part of VMCB::intr_ctrl:

V_NMI(11)        - Indicates whether a virtual NMI is pending in the guest.
V_NMI_MASK(12)   - Indicates whether virtual NMIs are masked in the guest.
V_NMI_ENABLE(26) - Enables the NMI virtualization feature for the guest.

When the hypervisor wants to inject an NMI, it sets the V_NMI bit. The
processor clears V_NMI to acknowledge that delivery has started and sets
V_NMI_MASK, which means the guest is handling the NMI. After the guest
has handled the NMI, the processor clears V_NMI_MASK on successful
completion of the IRET instruction, or if a VMEXIT occurs while
delivering the virtual NMI.

To enable the VNMI capability, the hypervisor needs to set the
V_NMI_ENABLE bit to 1.
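
For illustration, a self-contained model (not code from this patch) of how
the three bits interact, using the macro names added to svm.h below:

#include <stdint.h>

#define V_NMI_PENDING	(1u << 11)
#define V_NMI_MASK	(1u << 12)
#define V_NMI_ENABLE	(1u << 26)

/* Hypervisor: enable NMI virtualization and post one virtual NMI. */
static void hv_inject_vnmi(uint32_t *int_ctl)
{
	*int_ctl |= V_NMI_ENABLE | V_NMI_PENDING;
}

/* Processor: deliver the pending virtual NMI to the guest. */
static void hw_take_vnmi(uint32_t *int_ctl)
{
	*int_ctl &= ~V_NMI_PENDING;	/* delivery acknowledged */
	*int_ctl |= V_NMI_MASK;		/* NMIs masked while in the handler */
}

/* Processor: the guest's IRET completes (or delivery causes a VMEXIT). */
static void hw_complete_vnmi(uint32_t *int_ctl)
{
	*int_ctl &= ~V_NMI_MASK;	/* NMIs unmasked again */
}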

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
---
 arch/x86/include/asm/svm.h | 7 +++++++
 arch/x86/kvm/svm/svm.c     | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 4352b46dd20c90..d8474e4b04ac05 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -198,6 +198,13 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define X2APIC_MODE_SHIFT 30
 #define X2APIC_MODE_MASK (1 << X2APIC_MODE_SHIFT)
 
+#define V_NMI_PENDING_SHIFT 11
+#define V_NMI_PENDING (1 << V_NMI_PENDING_SHIFT)
+#define V_NMI_MASK_SHIFT 12
+#define V_NMI_MASK (1 << V_NMI_MASK_SHIFT)
+#define V_NMI_ENABLE_SHIFT 26
+#define V_NMI_ENABLE (1 << V_NMI_ENABLE_SHIFT)
+
 #define LBR_CTL_ENABLE_MASK BIT_ULL(0)
 #define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 03acbe8ff34edb..08a7b2a0a29f3a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -230,6 +230,8 @@ module_param(dump_invalid_vmcb, bool, 0644);
 bool intercept_smi = true;
 module_param(intercept_smi, bool, 0444);
 
+bool vnmi = true;
+module_param(vnmi, bool, 0444);
 
 static bool svm_gp_erratum_intercept = true;
 
@@ -5029,6 +5031,10 @@ static __init int svm_hardware_setup(void)
 		svm_x86_ops.vcpu_get_apicv_inhibit_reasons = NULL;
 	}
 
+	vnmi = vnmi && boot_cpu_has(X86_FEATURE_AMD_VNMI);
+	if (vnmi)
+		pr_info("Virtual NMI enabled\n");
+
 	if (vls) {
 		if (!npt_enabled ||
 		    !boot_cpu_has(X86_FEATURE_V_VMSAVE_VMLOAD) ||
-- 
2.34.3



* [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (5 preceding siblings ...)
  2022-11-17 14:32 ` [PATCH 06/13] KVM: SVM: Add VNMI bit definition Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  2022-11-17 18:54   ` Sean Christopherson
  2022-11-17 14:32 ` [PATCH 08/13] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI Maxim Levitsky
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky, Santosh Shukla

From: Santosh Shukla <santosh.shukla@amd.com>

VMCB intr_ctrl bit 12 (V_NMI_MASK) is set by the processor when handling an
NMI in the guest and is cleared after the NMI is handled. Treat V_NMI_MASK
as read-only in the hypervisor, except for the SMM case, where the
hypervisor needs to set V_NMI_MASK before entering SMM and clear it after
leaving SMM.

Add an API (get_vnmi_vmcb) to return the correct vmcb for L1 or L2.

Maxim:
   - made set_vnmi_mask/clear_vnmi_mask/is_vnmi_mask warn if called
     without vNMI enabled
   - clear IRET intercept in svm_set_nmi_mask even with vNMI

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 18 ++++++++++++++-
 arch/x86/kvm/svm/svm.h | 52 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 08a7b2a0a29f3a..c16f68f6c4f7d7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3618,13 +3618,29 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 
 static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
 {
-	return !!(vcpu->arch.hflags & HF_NMI_MASK);
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (is_vnmi_enabled(svm))
+		return is_vnmi_mask_set(svm);
+	else
+		return !!(vcpu->arch.hflags & HF_NMI_MASK);
 }
 
 static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	if (is_vnmi_enabled(svm)) {
+		if (masked)
+			set_vnmi_mask(svm);
+		else {
+			clear_vnmi_mask(svm);
+			if (!sev_es_guest(vcpu->kvm))
+				svm_clr_intercept(svm, INTERCEPT_IRET);
+		}
+		return;
+	}
+
 	if (masked) {
 		vcpu->arch.hflags |= HF_NMI_MASK;
 		if (!sev_es_guest(vcpu->kvm))
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f5383104d00580..bf7f4851dee204 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
 extern int vgif;
 extern bool intercept_smi;
+extern bool vnmi;
 
 enum avic_modes {
 	AVIC_MODE_NONE = 0,
@@ -531,6 +532,57 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
 	       (msr < (APIC_BASE_MSR + 0x100));
 }
 
+static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
+{
+	if (!vnmi)
+		return NULL;
+
+	if (is_guest_mode(&svm->vcpu))
+		return svm->nested.vmcb02.ptr;
+	else
+		return svm->vmcb01.ptr;
+}
+
+static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = get_vnmi_vmcb(svm);
+
+	if (vmcb)
+		return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
+	else
+		return false;
+}
+
+static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = get_vnmi_vmcb(svm);
+
+	if (WARN_ON_ONCE(!vmcb))
+		return false;
+
+	return !!(vmcb->control.int_ctl & V_NMI_MASK);
+}
+
+static inline void set_vnmi_mask(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = get_vnmi_vmcb(svm);
+
+	if (WARN_ON_ONCE(!vmcb))
+		return;
+
+	vmcb->control.int_ctl |= V_NMI_MASK;
+}
+
+static inline void clear_vnmi_mask(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = get_vnmi_vmcb(svm);
+
+	if (WARN_ON_ONCE(!vmcb))
+		return;
+
+	vmcb->control.int_ctl &= ~V_NMI_MASK;
+}
+
 /* svm.c */
 #define MSR_INVALID				0xffffffffU
 
-- 
2.34.3



* [PATCH 08/13] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (6 preceding siblings ...)
  2022-11-17 14:32 ` [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  2022-11-17 14:32 ` [PATCH 09/13] KVM: SVM: allow NMI window with vNMI Maxim Levitsky
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky, Santosh Shukla

From: Santosh Shukla <santosh.shukla@amd.com>

In the VNMI case, report that an NMI is not allowed for injection when
V_NMI_PENDING is set, which means a virtual NMI is already pending for
the guest to process while the guest is busy handling the current
virtual NMI. The guest will first finish handling the current virtual
NMI and then take the pending one without a VM exit.

Maxim:
  - disable NMI window unconditionally for now.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c |  6 ++++++
 arch/x86/kvm/svm/svm.h | 11 +++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c16f68f6c4f7d7..cfec4c98bb589b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3595,6 +3595,9 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
 	if (is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
 		return false;
 
+	if (is_vnmi_enabled(svm) && is_vnmi_pending_set(svm))
+		return true;
+
 	ret = (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
 	      (vcpu->arch.hflags & HF_NMI_MASK);
 
@@ -3732,6 +3735,9 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	if (is_vnmi_enabled(svm))
+		return;
+
 	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
 		return; /* IRET will cause a vm exit */
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bf7f4851dee204..5f2ee72c6e3125 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -583,6 +583,17 @@ static inline void clear_vnmi_mask(struct vcpu_svm *svm)
 	vmcb->control.int_ctl &= ~V_NMI_MASK;
 }
 
+
+static inline bool is_vnmi_pending_set(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = get_vnmi_vmcb(svm);
+
+	if (vmcb)
+		return !!(vmcb->control.int_ctl & V_NMI_PENDING);
+	else
+		return false;
+}
+
 /* svm.c */
 #define MSR_INVALID				0xffffffffU
 
-- 
2.34.3



* [PATCH 09/13] KVM: SVM: allow NMI window with vNMI
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (7 preceding siblings ...)
  2022-11-17 14:32 ` [PATCH 08/13] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  2022-11-17 18:21   ` Sean Christopherson
  2022-11-17 14:32 ` [PATCH 10/13] KVM: SVM: Add VNMI support in inject_nmi Maxim Levitsky
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky

When vNMI is enabled, the only case in which KVM will use an NMI
window is when a vNMI injection is pending.

In this case, on the next IRET/RSM/STGI the injection will be complete
and a new NMI can be injected.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cfec4c98bb589b..eaa30f8ace518d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2477,7 +2477,10 @@ static int iret_interception(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	++vcpu->stat.nmi_window_exits;
-	vcpu->arch.hflags |= HF_IRET_MASK;
+
+	if (!is_vnmi_enabled(svm))
+		vcpu->arch.hflags |= HF_IRET_MASK;
+
 	if (!sev_es_guest(vcpu->kvm)) {
 		svm_clr_intercept(svm, INTERCEPT_IRET);
 		svm->nmi_iret_rip = kvm_rip_read(vcpu);
@@ -3735,9 +3738,6 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if (is_vnmi_enabled(svm))
-		return;
-
 	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
 		return; /* IRET will cause a vm exit */
 
@@ -3751,9 +3751,14 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 	 * Something prevents NMI from been injected. Single step over possible
 	 * problem (IRET or exception injection or interrupt shadow)
 	 */
-	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
-	svm->nmi_singlestep = true;
-	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
+
+	if (is_vnmi_enabled(svm)) {
+		svm_set_intercept(svm, INTERCEPT_IRET);
+	} else {
+		svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
+		svm->nmi_singlestep = true;
+		svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
+	}
 }
 
 static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
-- 
2.34.3



* [PATCH 10/13] KVM: SVM: Add VNMI support in inject_nmi
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (8 preceding siblings ...)
  2022-11-17 14:32 ` [PATCH 09/13] KVM: SVM: allow NMI window with vNMI Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  2022-11-21 17:12   ` Sean Christopherson
  2022-11-17 14:32 ` [PATCH 11/13] KVM: nSVM: implement nested VNMI Maxim Levitsky
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky, Santosh Shukla

From: Santosh Shukla <santosh.shukla@amd.com>

Inject the NMI by setting V_NMI in the VMCB interrupt control. The
processor will clear V_NMI to acknowledge that processing has started
and will keep V_NMI_MASK set until it is done processing the NMI event.

Also handle the nmi_l1_to_l2 case: when it is true, the NMI to be
injected originally comes from L1's VMCB12 EVENTINJ field, so add a
check for that case.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index eaa30f8ace518d..9ebfbd0d4b467e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3479,7 +3479,14 @@ static void pre_svm_run(struct kvm_vcpu *vcpu)
 static void svm_inject_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *vmcb = NULL;
 
+	if (is_vnmi_enabled(svm) && !svm->nmi_l1_to_l2) {
+		vmcb = get_vnmi_vmcb(svm);
+		vmcb->control.int_ctl |= V_NMI_PENDING;
+		++vcpu->stat.nmi_injections;
+		return;
+	}
 	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
 
 	if (svm->nmi_l1_to_l2)
-- 
2.34.3



* [PATCH 11/13] KVM: nSVM: implement nested VNMI
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (9 preceding siblings ...)
  2022-11-17 14:32 ` [PATCH 10/13] KVM: SVM: Add VNMI support in inject_nmi Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  2022-11-17 14:32 ` [PATCH 12/13] KVM: nSVM: emulate VMEXIT_INVALID case for " Maxim Levitsky
  2022-11-17 14:32 ` [PATCH 13/13] KVM: SVM: Enable VNMI feature Maxim Levitsky
  12 siblings, 0 replies; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky, Santosh Shukla

From: Santosh Shukla <santosh.shukla@amd.com>

Supporting nested VNMI requires saving and restoring the VNMI bits
during nested entry and exit.

If both L1 and L2 use VNMI, copy the VNMI bits from vmcb12 to vmcb02
during entry and vice versa during exit. If L1 uses VNMI and L2 doesn't,
copy the VNMI bits from vmcb01 to vmcb02 during entry and vice versa
during exit.

Tested with KVM-unit-tests and a nested guest scenario.
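
Put differently, a self-contained sketch of the routing rule (hypothetical
helper names, mirroring the masks used in the diff below):

#include <stdbool.h>
#include <stdint.h>

/* V_NMI_PENDING(11) | V_NMI_MASK(12) | V_NMI_ENABLE(26) */
#define V_NMI_BITS	((1u << 11) | (1u << 12) | (1u << 26))

/* Copy only the vNMI bits of int_ctl from src to dst. */
static void copy_vnmi_bits(uint32_t *dst, uint32_t src)
{
	*dst = (*dst & ~V_NMI_BITS) | (src & V_NMI_BITS);
}

/*
 * On nested VM entry: if L1 exposes vNMI to L2, vmcb02 inherits the vNMI
 * bits from vmcb12; otherwise KVM keeps using vNMI on behalf of L1, so
 * vmcb02 inherits them from vmcb01.  Nested VM exit runs the same routing
 * in reverse.
 */
static void route_vnmi_on_entry(bool l2_uses_vnmi, uint32_t *vmcb02_int_ctl,
				uint32_t vmcb01_int_ctl, uint32_t vmcb12_int_ctl)
{
	copy_vnmi_bits(vmcb02_int_ctl,
		       l2_uses_vnmi ? vmcb12_int_ctl : vmcb01_int_ctl);
}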

Maxim:
   - moved the vNMI bits copying to nested_sync_int_ctl_from_vmcb02

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 13 +++++++++++++
 arch/x86/kvm/svm/svm.c    |  5 +++++
 arch/x86/kvm/svm/svm.h    |  6 ++++++
 3 files changed, 24 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1f2b8492c8782f..c9fcdd691bb5a1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -442,6 +442,14 @@ static void nested_sync_int_ctl_from_vmcb02(struct vcpu_svm *svm,
 		 */
 		;
 
+	if (vnmi) {
+		/* copy back the vNMI fields which can be modified by the CPU */
+		if (nested_vnmi_enabled(svm))
+			l2_to_l1_mask |= V_NMI_MASK | V_NMI_PENDING;
+		else
+			l2_to_l0_mask |= V_NMI_MASK | V_NMI_PENDING;
+	}
+
 	vmcb12->control.int_ctl =
 		(svm->nested.ctl.int_ctl & ~l2_to_l1_mask) |
 		(vmcb02->control.int_ctl & l2_to_l1_mask);
@@ -657,6 +665,11 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	else
 		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
 
+	if (nested_vnmi_enabled(svm))
+		int_ctl_vmcb12_bits |= (V_NMI_PENDING | V_NMI_ENABLE | V_NMI_MASK);
+	else
+		int_ctl_vmcb01_bits |= (V_NMI_PENDING | V_NMI_ENABLE | V_NMI_MASK);
+
 	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
 	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
 	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9ebfbd0d4b467e..c9190a8ee03273 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4188,6 +4188,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	svm->vgif_enabled = vgif && guest_cpuid_has(vcpu, X86_FEATURE_VGIF);
 
+	svm->vnmi_enabled = vnmi && guest_cpuid_has(vcpu, X86_FEATURE_AMD_VNMI);
+
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
@@ -4939,6 +4941,9 @@ static __init void svm_set_cpu_caps(void)
 		if (vgif)
 			kvm_cpu_cap_set(X86_FEATURE_VGIF);
 
+		if (vnmi)
+			kvm_cpu_cap_set(X86_FEATURE_AMD_VNMI);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5f2ee72c6e3125..d39e937a2c8391 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -252,6 +252,7 @@ struct vcpu_svm {
 	bool pause_filter_enabled         : 1;
 	bool pause_threshold_enabled      : 1;
 	bool vgif_enabled                 : 1;
+	bool vnmi_enabled                 : 1;
 
 	u32 ldr_reg;
 	u32 dfr_reg;
@@ -532,6 +533,11 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
 	       (msr < (APIC_BASE_MSR + 0x100));
 }
 
+static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
+{
+	return svm->vnmi_enabled && (svm->nested.ctl.int_ctl & V_NMI_ENABLE);
+}
+
 static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
 {
 	if (!vnmi)
-- 
2.34.3



* [PATCH 12/13] KVM: nSVM: emulate VMEXIT_INVALID case for nested VNMI
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (10 preceding siblings ...)
  2022-11-17 14:32 ` [PATCH 11/13] KVM: nSVM: implement nested VNMI Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  2022-11-17 20:18   ` Sean Christopherson
  2022-11-17 14:32 ` [PATCH 13/13] KVM: SVM: Enable VNMI feature Maxim Levitsky
  12 siblings, 1 reply; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky, Santosh Shukla

From: Santosh Shukla <santosh.shukla@amd.com>

If NMI virtualization is enabled and the NMI intercept is unset, the
next VM entry will fail with an #INVALID exit reason.

Emulate this (VMEXIT(#INVALID)) scenario for the nested environment by
extending the checks in __nested_vmcb_check_controls to cover the
V_NMI_ENABLE and NMI_INTERCEPT bits.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c9fcdd691bb5a1..3ef7e1971a4709 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -275,6 +275,11 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
 	if (CC(!nested_svm_check_tlb_ctl(vcpu, control->tlb_ctl)))
 		return false;
 
+	if (CC((control->int_ctl & V_NMI_ENABLE) &&
+		!vmcb12_is_intercept(control, INTERCEPT_NMI))) {
+		return false;
+	}
+
 	return true;
 }
 
-- 
2.34.3



* [PATCH 13/13] KVM: SVM: Enable VNMI feature
  2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
                   ` (11 preceding siblings ...)
  2022-11-17 14:32 ` [PATCH 12/13] KVM: nSVM: emulate VMEXIT_INVALID case for " Maxim Levitsky
@ 2022-11-17 14:32 ` Maxim Levitsky
  12 siblings, 0 replies; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-17 14:32 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Sean Christopherson,
	Jim Mattson, x86, Maxim Levitsky, Santosh Shukla

From: Santosh Shukla <santosh.shukla@amd.com>

Enable NMI virtualization (V_NMI_ENABLE) in the VMCB interrupt
control when the vnmi module parameter is set.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c9190a8ee03273..5b61d89c644da6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1307,6 +1307,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
 
+	if (vnmi)
+		svm->vmcb->control.int_ctl |= V_NMI_ENABLE;
+
 	if (vgif) {
 		svm_clr_intercept(svm, INTERCEPT_STGI);
 		svm_clr_intercept(svm, INTERCEPT_CLGI);
-- 
2.34.3



* Re: [PATCH 06/13] KVM: SVM: Add VNMI bit definition
  2022-11-17 14:32 ` [PATCH 06/13] KVM: SVM: Add VNMI bit definition Maxim Levitsky
@ 2022-11-17 14:37   ` Borislav Petkov
  2022-11-17 16:42     ` Sean Christopherson
  2022-11-17 20:27   ` Sean Christopherson
  1 sibling, 1 reply; 33+ messages in thread
From: Borislav Petkov @ 2022-11-17 14:37 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny, Babu Moger,
	Pawan Gupta, Sean Christopherson, Jim Mattson, x86,
	Santosh Shukla

On Thu, Nov 17, 2022 at 04:32:35PM +0200, Maxim Levitsky wrote:
> @@ -5029,6 +5031,10 @@ static __init int svm_hardware_setup(void)
>  		svm_x86_ops.vcpu_get_apicv_inhibit_reasons = NULL;
>  	}
>  
> +	vnmi = vnmi && boot_cpu_has(X86_FEATURE_AMD_VNMI);

s/boot_cpu_has/cpu_feature_enabled/
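
IOW, something like:

	vnmi = vnmi && cpu_feature_enabled(X86_FEATURE_AMD_VNMI);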

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 06/13] KVM: SVM: Add VNMI bit definition
  2022-11-17 14:37   ` Borislav Petkov
@ 2022-11-17 16:42     ` Sean Christopherson
  2022-11-17 17:07       ` Borislav Petkov
  0 siblings, 1 reply; 33+ messages in thread
From: Sean Christopherson @ 2022-11-17 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Maxim Levitsky, kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin,
	Dave Hansen, linux-kernel, Peter Zijlstra, Thomas Gleixner,
	Sandipan Das, Daniel Sneddon, Jing Liu, Josh Poimboeuf,
	Wyes Karny, Babu Moger, Pawan Gupta, Jim Mattson, x86,
	Santosh Shukla

On Thu, Nov 17, 2022, Borislav Petkov wrote:
> On Thu, Nov 17, 2022 at 04:32:35PM +0200, Maxim Levitsky wrote:
> > @@ -5029,6 +5031,10 @@ static __init int svm_hardware_setup(void)
> >  		svm_x86_ops.vcpu_get_apicv_inhibit_reasons = NULL;
> >  	}
> >  
> > +	vnmi = vnmi && boot_cpu_has(X86_FEATURE_AMD_VNMI);
> 
> s/boot_cpu_has/cpu_feature_enabled/

Why?  This is rarely run code, won't cpu_feature_enabled() unnecessarily require
patching?

And while we're on the topic... https://lore.kernel.org/all/Y22IzA9DN%2FxYWgWN@google.com


* Re: [PATCH 06/13] KVM: SVM: Add VNMI bit definition
  2022-11-17 16:42     ` Sean Christopherson
@ 2022-11-17 17:07       ` Borislav Petkov
  2022-11-17 20:33         ` Sean Christopherson
  0 siblings, 1 reply; 33+ messages in thread
From: Borislav Petkov @ 2022-11-17 17:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin,
	Dave Hansen, linux-kernel, Peter Zijlstra, Thomas Gleixner,
	Sandipan Das, Daniel Sneddon, Jing Liu, Josh Poimboeuf,
	Wyes Karny, Babu Moger, Pawan Gupta, Jim Mattson, x86,
	Santosh Shukla

On Thu, Nov 17, 2022 at 04:42:57PM +0000, Sean Christopherson wrote:
> Why? This is rarely run code, won't cpu_feature_enabled()
> unnecessarily require patching?

Because we want one single interface to test X86_FEATURE flags. And
there's no need for the users to know whether it wants patching or not -
we simply patch *everywhere* and that's it.

> And while we're on the topic... https://lore.kernel.org/all/Y22IzA9DN%2FxYWgWN@google.com

Because static_ or boot_ is not relevant to the user - all she
wants to know is whether a cpu feature has been enabled. Thus
cpu_feature_enabled().

And yes, at the time I protested a little about unnecessary patching.
And tglx said "Why not?". And I had no good answer to that. So we can
just as well patch *everywhere*.

And patching is soo not a big deal anymore considering all the other
things we do to kernel code at build time and runtime. objdump output
compared to what's actually running has in some cases no resemblance
whatsoever.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 09/13] KVM: SVM: allow NMI window with vNMI
  2022-11-17 14:32 ` [PATCH 09/13] KVM: SVM: allow NMI window with vNMI Maxim Levitsky
@ 2022-11-17 18:21   ` Sean Christopherson
  2022-11-21 13:40     ` Maxim Levitsky
  0 siblings, 1 reply; 33+ messages in thread
From: Sean Christopherson @ 2022-11-17 18:21 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86

On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> When the vNMI is enabled, the only case when the KVM will use an NMI
> window is when the vNMI injection is pending.
> 
> In this case on next IRET/RSM/STGI, the injection has to be complete
> and a new NMI can be injected.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/svm.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index cfec4c98bb589b..eaa30f8ace518d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2477,7 +2477,10 @@ static int iret_interception(struct kvm_vcpu *vcpu)
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
>  	++vcpu->stat.nmi_window_exits;
> -	vcpu->arch.hflags |= HF_IRET_MASK;
> +
> +	if (!is_vnmi_enabled(svm))
> +		vcpu->arch.hflags |= HF_IRET_MASK;

Ugh, HF_IRET_MASK is such a terrible name/flag.  Given that it lives with GIF
and NMI, one would naturally think that it means "IRET is intercepted", but it
really means "KVM just intercepted an IRET and is waiting for NMIs to become
unblocked".

And on a related topic, why on earth are GIF, NMI, and IRET tracked in hflags?
They are 100% SVM concepts.  IMO, this code would be much easier to follow
by making them bools in vcpu_svm with more descriptive names.

> +
>  	if (!sev_es_guest(vcpu->kvm)) {
>  		svm_clr_intercept(svm, INTERCEPT_IRET);
>  		svm->nmi_iret_rip = kvm_rip_read(vcpu);

The vNMI interaction with this logic is confusing, as nmi_iret_rip doesn't need
to be captured for the vNMI case.  SEV-ES actually has unrelated reasons for not
reading RIP vs. not intercepting IRET, they just got bundled together here for
convenience.

This is also an opportunity to clean up the SEV-ES interaction with IRET interception,
which is splattered all over the place and isn't documented anywhere.

E.g. (with an HF_IRET_MASK => awaiting_iret_completion change)

/*
 * For SEV-ES guests, KVM must not rely on IRET to detect NMI unblocking as
 * #VC->IRET in the guest will result in KVM thinking NMIs are unblocked before
 * the guest is ready for a new NMI.  Architecturally, KVM is 100% correct to
 * treat NMIs as unblocked on IRET, but the guest-host ABI for SEV-ES guests is
 * that KVM must wait for an explicit "NMI Complete" from the guest.
 */
static void svm_disable_iret_interception(struct vcpu_svm *svm)
{
	if (!sev_es_guest(svm->vcpu.kvm))
		svm_clr_intercept(svm, INTERCEPT_IRET);
}

static void svm_enable_iret_interception(struct vcpu_svm *svm)
{
	if (!sev_es_guest(svm->vcpu.kvm))
		svm_set_intercept(svm, INTERCEPT_IRET);
}

static int iret_interception(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	++vcpu->stat.nmi_window_exits;

	/*
	 * No need to wait for the IRET to complete if vNMIs are enabled as
	 * hardware will automatically process the pending NMI when NMIs are
	 * unblocked from the guest's perspective.
	 */
	if (!is_vnmi_enabled(svm)) {
		svm->awaiting_iret_completion = true;

		/*
		 * The guest's RIP is inaccessible for SEV-ES guests, just
		 * assume forward progress was made on the next VM-Exit.
		 */
		if (!sev_es_guest(vcpu->kvm))
			svm->nmi_iret_rip = kvm_rip_read(vcpu);
	}

	svm_disable_iret_interception(svm);

	kvm_make_request(KVM_REQ_EVENT, vcpu);
	return 1;
}

> @@ -3735,9 +3738,6 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> -	if (is_vnmi_enabled(svm))
> -		return;
> -
>  	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
>  		return; /* IRET will cause a vm exit */

As much as I like incremental patches, in this case I'm having a hell of a time
reviewing the code as the vNMI logic ends up being split across four patches.
E.g. in this particular case, the above requires knowing that svm_inject_nmi()
never sets HF_NMI_MASK when vNMI is enabled.

In the next version, any objection to squashing patches 7-10 into a single "Add
non-nested vNMI support" patch?

As for this code, IMO some pre-work to change the flow would help with the vNMI
case.  The GIF=0 logic overrides legacy NMI blocking, and so can be handled first.
And I vote to explicitly set INTERCEPT_IRET in the above case instead of relying
on INTERCEPT_IRET to already be set by svm_inject_nmi().

That would yield this as a final result:

static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/*
	 * GIF=0 blocks NMIs irrespective of legacy NMI blocking.  No need to
	 * intercept or single-step IRET if GIF=0, just intercept STGI.
	 */
	if (!gif_set(svm)) {
		if (vgif)
			svm_set_intercept(svm, INTERCEPT_STGI);
		return;
	}

	/*
	 * NMI is blocked, either because an NMI is in service or because KVM
	 * just injected an NMI.  If KVM is waiting for an intercepted IRET to
	 * complete, single-step the IRET to wait for NMIs to become unblocked.
	 * Otherwise, intercept the guest's next IRET.
	 */
	if (svm->awaiting_iret_completion) {
		svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
		svm->nmi_singlestep = true;
		svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
	} else {
		svm_set_intercept(svm, INTERCEPT_IRET);
	}
}

>  
> @@ -3751,9 +3751,14 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  	 * Something prevents NMI from been injected. Single step over possible
>  	 * problem (IRET or exception injection or interrupt shadow)
>  	 */
> -	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> -	svm->nmi_singlestep = true;
> -	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> +
> +	if (is_vnmi_enabled(svm)) {
> +		svm_set_intercept(svm, INTERCEPT_IRET);

This will break SEV-ES.  Per commit 4444dfe4050b ("KVM: SVM: Add NMI support for
an SEV-ES guest"), the hypervisor must not rely on IRET interception to detect
NMI unblocking for SEV-ES guests.  As above, I think we should provide helpers to
toggle NMI interception to reduce the probability of breaking SEV-ES.

> +	} else {
> +		svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> +		svm->nmi_singlestep = true;
> +		svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> +	}
>  }
>  
>  static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> -- 
> 2.34.3
> 


* Re: [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask
  2022-11-17 14:32 ` [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask Maxim Levitsky
@ 2022-11-17 18:54   ` Sean Christopherson
  2022-11-21 12:36     ` Maxim Levitsky
  2022-12-04 18:42     ` Maxim Levitsky
  0 siblings, 2 replies; 33+ messages in thread
From: Sean Christopherson @ 2022-11-17 18:54 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86,
	Santosh Shukla

On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> From: Santosh Shukla <santosh.shukla@amd.com>
> 
> VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling
> NMI in guest and is cleared after the NMI is handled. Treat V_NMI_MASK
> as read-only in the hypervisor except for the SMM case where hypervisor
> before entring and after leaving SMM mode requires to set and unset
> V_NMI_MASK.
> 
> Adding API(get_vnmi_vmcb) in order to return the correct vmcb for L1 or
> L2.
> 
> Maxim:
>    - made set_vnmi_mask/clear_vnmi_mask/is_vnmi_mask warn if called
>      without vNMI enabled
>    - clear IRET intercept in svm_set_nmi_mask even with vNMI
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/svm.c | 18 ++++++++++++++-
>  arch/x86/kvm/svm/svm.h | 52 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 08a7b2a0a29f3a..c16f68f6c4f7d7 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3618,13 +3618,29 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>  
>  static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
>  {
> -	return !!(vcpu->arch.hflags & HF_NMI_MASK);
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (is_vnmi_enabled(svm))
> +		return is_vnmi_mask_set(svm);
> +	else
> +		return !!(vcpu->arch.hflags & HF_NMI_MASK);
>  }
>  
>  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> +	if (is_vnmi_enabled(svm)) {
> +		if (masked)
> +			set_vnmi_mask(svm);

I believe not setting INTERCEPT_IRET is correct, but only because the existing
code is unnecessary.  And this all very subtly relies on KVM_REQ_EVENT being set
and/or KVM already being in kvm_check_and_inject_events().

When NMIs become unblocked, INTERCEPT_IRET can be cleared, but KVM should also
pend KVM_REQ_EVENT.  AFAICT, that doesn't happen when this is called via the
emulator.  Ah, because em_iret() only handles RM for Intel's restricted guest
crap.  I.e. it "works" only because it never happens.  All other flows set
KVM_REQ_EVENT when toggling NMI blocking, e.g. the RSM path of kvm_smm_changed().

And when NMIs become blocked, there's no need to force INTERCEPT_IRET in this
code because kvm_check_and_inject_events() will request an NMI window and set the
intercept if necessary, and all paths that set NMI blocking are guaranteed to
reach kvm_check_and_inject_events() before entering the guest.

  1. RSM => kvm_smm_changed() sets KVM_REQ_EVENT
  2. enter_smm() is only called from within kvm_check_and_inject_events(),
     before pending NMIs are processed (yay priority)
  3. emulator_set_nmi_mask() never blocks NMIs, only does the half-baked IRET emulation
  4. kvm_vcpu_ioctl_x86_set_vcpu_event() sets KVM_REQ_EVENT

So, can you add a prep patch to drop the forced INTERCEPT_IRET?  That way the
logic for vNMI and !vNMI is the same.
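I.e. something like this for the !vNMI path (untested sketch, just to
illustrate the idea; kvm_check_and_inject_events() takes care of setting the
intercept when an NMI window is actually needed):

static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	if (masked) {
		vcpu->arch.hflags |= HF_NMI_MASK;
		/*
		 * No forced INTERCEPT_IRET; svm_enable_nmi_window() will set
		 * the intercept if an NMI window is requested.
		 */
	} else {
		vcpu->arch.hflags &= ~HF_NMI_MASK;
		if (!sev_es_guest(vcpu->kvm))
			svm_clr_intercept(svm, INTERCEPT_IRET);
	}
}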

> +		else {
> +			clear_vnmi_mask(svm);

This is the only code that sets/clears the vNMI mask, so rather than have set/clear
helpers, what about a single helper to do the dirty work? 

> +			if (!sev_es_guest(vcpu->kvm))
> +				svm_clr_intercept(svm, INTERCEPT_IRET);
> +		}
> +		return;
> +	}
> +
>  	if (masked) {
>  		vcpu->arch.hflags |= HF_NMI_MASK;
>  		if (!sev_es_guest(vcpu->kvm))
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index f5383104d00580..bf7f4851dee204 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
>  extern bool npt_enabled;
>  extern int vgif;
>  extern bool intercept_smi;
> +extern bool vnmi;
>  
>  enum avic_modes {
>  	AVIC_MODE_NONE = 0,
> @@ -531,6 +532,57 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
>  	       (msr < (APIC_BASE_MSR + 0x100));
>  }
>  
> +static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
> +{
> +	if (!vnmi)
> +		return NULL;
> +
> +	if (is_guest_mode(&svm->vcpu))
> +		return svm->nested.vmcb02.ptr;
> +	else
> +		return svm->vmcb01.ptr;
> +}
> +
> +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> +{
> +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> +
> +	if (vmcb)
> +		return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> +	else
> +		return false;

Maybe just this?

	return vmcb && (vmcb->control.int_ctl & V_NMI_ENABLE);

Or if an inner helper is added:

	return vmcb && __is_vnmi_enabled(vmcb);

> +}
> +
> +static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
> +{
> +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> +
> +	if (!WARN_ON_ONCE(!vmcb))

Rather than WARN, add an inner __is_vnmi_enabled() that takes the vnmi_vmcb.
Actually, if you do that, the test/set/clear helpers can go away entirely.

> +		return false;
> +
> +	return !!(vmcb->control.int_ctl & V_NMI_MASK);
> +}
> +
> +static inline void set_vnmi_mask(struct vcpu_svm *svm)
> +{
> +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> +
> +	if (!WARN_ON_ONCE(!vmcb))
> +		return;
> +
> +	vmcb->control.int_ctl |= V_NMI_MASK;
> +}
> +
> +static inline void clear_vnmi_mask(struct vcpu_svm *svm)
> +{
> +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> +
> +	if (!WARN_ON_ONCE(!vmcb))
> +		return;
> +
> +	vmcb->control.int_ctl &= ~V_NMI_MASK;
> +}

These helpers can all go in svm.c.  There are no users outside of svm.c, and
unless I'm misunderstanding how nested works, there should never be outside users.

E.g. with HF_NMI_MASK => svm->nmi_masked, the end result can be something like:

static bool __is_vnmi_enabled(struct vmcb *vmcb)
{
	return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
}

static bool is_vnmi_enabled(struct vcpu_svm *svm)
{
	struct vmcb *vmcb = get_vnmi_vmcb(svm);

	return vmcb && __is_vnmi_enabled(vmcb);
}

static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);
	struct vmcb *vmcb = get_vnmi_vmcb(svm);

	if (vmcb && __is_vnmi_enabled(vmcb))
		return !!(vmcb->control.int_ctl & V_NMI_MASK);
	else
		return !!(vcpu->arch.hflags & HF_NMI_MASK);
}

static void svm_set_or_clear_vnmi_mask(struct vmcb *vmcb, bool set)
{
	if (set)
		vmcb->control.int_ctl |= V_NMI_MASK;
	else
		vmcb->control.int_ctl &= ~V_NMI_MASK;
}

static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
{
	struct vcpu_svm *svm = to_svm(vcpu);
	struct vmcb *vmcb = get_vnmi_vmcb(svm);

	if (vmcb && __is_vnmi_enabled(vmcb)) {
		if (masked)
			vmcb->control.int_ctl |= V_NMI_MASK;
		else
			vmcb->control.int_ctl &= ~V_NMI_MASK;
	} else {
		svm->nmi_masked = masked;
	}

	if (!masked)
		svm_disable_iret_interception(svm);
}

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/13] KVM: nSVM: don't call nested_sync_control_from_vmcb02 on each VM exit
  2022-11-17 14:32 ` [PATCH 02/13] KVM: nSVM: don't call nested_sync_control_from_vmcb02 on each " Maxim Levitsky
@ 2022-11-17 20:04   ` Sean Christopherson
  2022-11-21 11:07     ` Maxim Levitsky
  0 siblings, 1 reply; 33+ messages in thread
From: Sean Christopherson @ 2022-11-17 20:04 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86

On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> Calling nested_sync_control_from_vmcb02 on each VM exit (nested or not)
> was an attempt to keep the int_ctl field in the vmcb12 cache
> up to date on each VM exit.

This doesn't mesh with the reasoning in commit 2d8a42be0e2b ("KVM: nSVM: synchronize
VMCB controls updated by the processor on every vmexit"), which states that the
goal is to keep svm->nested.ctl.* synchronized, not vmcb12.  Or is nested.ctl the
cache you are referring to?

> However all other fields in the vmcb12 cache are not kept up to date,

IIUC, this isn't technically true.  They are up-to-date because they're never
modified by hardware.

> therefore for consistency it is better to do this on a nested VM exit only.

Again, IIUC, this actually introduces an inconsistency because it leaves stale
state in svm->nested.ctl, whereas the existing code ensures all state in
svm->nested.ctl is fresh immediately after non-nested VM-Exit.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/13] KVM: nSVM: clean up copying of int_ctl fields back to vmcb01/vmcb12
  2022-11-17 14:32 ` [PATCH 04/13] KVM: nSVM: clean up copying of int_ctl fields back to vmcb01/vmcb12 Maxim Levitsky
@ 2022-11-17 20:15   ` Sean Christopherson
  2022-11-21 11:10     ` Maxim Levitsky
  0 siblings, 1 reply; 33+ messages in thread
From: Sean Christopherson @ 2022-11-17 20:15 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86

On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> Clean up the nested_sync_int_ctl_from_vmcb02:
> 
> 1. The comment about preservation of V_IRQ is wrong: when the L2 doesn't
>    use virtual interrupt masking, then the field just doesn't exist in
>    vmcb12, and thus it should not be touched at all.
>    Since it is unused in this case, touching it doesn't matter that much,
>    so the bug is theoretical.
> 
> 2. When the L2 doesn't use virtual interrupt masking, then in *theory*,
>    if KVM uses the feature itself, it should copy the changes to the
>    V_IRQ* bits from vmcb02 to vmcb01.
> 
>    In practice, KVM only uses it to detect the interrupt window,
>    and it happens to re-open it on each nested VM exit because
>    kvm_set_rflags happens to raise KVM_REQ_EVENT.
>    Do this explicitly.
> 
> 3. Add comment on why we don't need to copy V_GIF from vmcb02 to vmcb01
>    when nested guest doesn't use nested V_GIF (and thus L1's GIF is in
>    vmcb02 while nested), even though it can in theory affect L1's GIF.
> 
> 4. Add support code to also copy some bits of int_ctl from
>    vmcb02 to vmcb01.
>    Currently there are none.

Unless it's impossible for whatever reason, this patch should be split into
multiple patches.  IIUC, there are at least 2 different functional changes being
made, they just happen to not have any actual impact on things.

> No (visible) functional change is intended.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c | 47 ++++++++++++++++++++++++++-------------
>  1 file changed, 32 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 54eb152e2b60b6..1f2b8492c8782f 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -410,28 +410,45 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
>  static void nested_sync_int_ctl_from_vmcb02(struct vcpu_svm *svm,
>  					    struct vmcb *vmcb12)
>  {
> -	u32 mask;
> +	struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
> +
> +	/* bitmask of bits of int_ctl that we copy from vmcb02 to vmcb12*/
> +	u32 l2_to_l1_mask = 0;
> +	/* bitmask of bits of int_ctl that we copy from vmcb02 to vmcb01*/
> +	u32 l2_to_l0_mask = 0;
>  
> -	/* Only a few fields of int_ctl are written by the processor.  */

Can this comment be kept in some form?  I found it super useful when reading this
code just now.

> -	mask = V_IRQ_MASK | V_TPR_MASK;
> -	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
> -	    svm_is_intercept(svm, INTERCEPT_VINTR)) {
> +	if (svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
> +		l2_to_l1_mask |= V_IRQ_MASK | V_TPR_MASK;
> +	else {
>  		/*
> -		 * In order to request an interrupt window, L0 is usurping
> -		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
> -		 * even if it was clear in L1's VMCB.  Restoring it would be
> -		 * wrong.  However, in this case V_IRQ will remain true until
> -		 * interrupt_window_interception calls svm_clear_vintr and
> -		 * restores int_ctl.  We can just leave it aside.
> +		 * If IRQ window was opened while in L2, it must be reopened
> +		 * after the VM exit
> +		 *
> +		 * vTPR value doesn't need to be copied from vmcb02 to vmcb01
> +		 * because it is synced from/to apic registers on each VM exit
>  		 */
> -		mask &= ~V_IRQ_MASK;
> +		if (vmcb02->control.int_ctl & V_IRQ_MASK)
> +			kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
>  	}
>  
>  	if (nested_vgif_enabled(svm))
> -		mask |= V_GIF_MASK;
> +		l2_to_l1_mask |= V_GIF_MASK;
> +	else
> +		/* There is no need to sync V_GIF from vmcb02 to vmcb01
> +		 * because GIF is cleared on VMexit, thus even though
> +		 * nested guest can control host's GIF, on VM exit
> +		 * its set value is lost
> +		 */
> +		;

The "else ... ;" is unnecessary, just throw the block comment above the nested
vGIF if-statment, e.g. if I'm understanding everything, this?

	/*
	 * If nested vGIF is not enabled, L2 has access to L1's "real" GIF.  In
	 * this case, there's no need to sync V_GIF from vmcb02 to vmcb01
	 * because GIF is cleared on VM-Exit, thus any changes made by L2 are
	 * overwritten on VM-Exit to L1.
	 */
	if (nested_vgif_enabled(svm))
		l2_to_l1_mask |= V_GIF_MASK;

> +
> +	vmcb12->control.int_ctl =
> +		(svm->nested.ctl.int_ctl & ~l2_to_l1_mask) |
> +		(vmcb02->control.int_ctl & l2_to_l1_mask);
>  
> -	vmcb12->control.int_ctl        &= ~mask;
> -	vmcb12->control.int_ctl        |= svm->vmcb->control.int_ctl & mask;
> +	vmcb01->control.int_ctl =
> +		(vmcb01->control.int_ctl & ~l2_to_l0_mask) |
> +		(vmcb02->control.int_ctl & l2_to_l0_mask);

No need for wrapping immediately after the "=", these all fit under the soft limit:

	vmcb12->control.int_ctl = (svm->nested.ctl.int_ctl & ~l2_to_l1_mask) |
				  (vmcb02->control.int_ctl & l2_to_l1_mask);

	vmcb01->control.int_ctl = (vmcb01->control.int_ctl & ~l2_to_l0_mask) |
				  (vmcb02->control.int_ctl & l2_to_l0_mask);

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 12/13] KVM: nSVM: emulate VMEXIT_INVALID case for nested VNMI
  2022-11-17 14:32 ` [PATCH 12/13] KVM: nSVM: emulate VMEXIT_INVALID case for " Maxim Levitsky
@ 2022-11-17 20:18   ` Sean Christopherson
  0 siblings, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2022-11-17 20:18 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86,
	Santosh Shukla

On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> From: Santosh Shukla <santosh.shukla@amd.com>
> 
> If NMI virtualization is enabled and NMI_INTERCEPT is unset, then the next
> VM entry will exit with the #INVALID exit reason.
> 
> In order to emulate the above (VMEXIT(#INVALID)) scenario for the nested
> environment, extend the check in __nested_vmcb_check_controls to cover the
> V_NMI_ENABLE and NMI_INTERCEPT bits.

This belongs in the previous patch, no?  I don't see how this isn't just a
natural part of supporting nested vNMI.

> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/nested.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index c9fcdd691bb5a1..3ef7e1971a4709 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -275,6 +275,11 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
>  	if (CC(!nested_svm_check_tlb_ctl(vcpu, control->tlb_ctl)))
>  		return false;
>  
> +	if (CC((control->int_ctl & V_NMI_ENABLE) &&
> +		!vmcb12_is_intercept(control, INTERCEPT_NMI))) {

Alignment is off by one:

	if (CC((control->int_ctl & V_NMI_ENABLE) &&
	       !vmcb12_is_intercept(control, INTERCEPT_NMI))) {
		return false;
	}

> +		return false;
> +	}
> +
>  	return true;
>  }
>  
> -- 
> 2.34.3
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/13] KVM: SVM: Add VNMI bit definition
  2022-11-17 14:32 ` [PATCH 06/13] KVM: SVM: Add VNMI bit definition Maxim Levitsky
  2022-11-17 14:37   ` Borislav Petkov
@ 2022-11-17 20:27   ` Sean Christopherson
  1 sibling, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2022-11-17 20:27 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86,
	Santosh Shukla

On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 03acbe8ff34edb..08a7b2a0a29f3a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -230,6 +230,8 @@ module_param(dump_invalid_vmcb, bool, 0644);
>  bool intercept_smi = true;
>  module_param(intercept_smi, bool, 0444);
>  
> +bool vnmi = true;

This can be __ro_after_init, e.g. see
https://lore.kernel.org/all/20221110013003.1421895-4-seanjc@google.com

> +module_param(vnmi, bool, 0444);

The exposure of "vnmi" to userspace belongs in the last patch.  E.g. this patch
should only stub in vnmi:

  bool __ro_after_init vnmi;

and then the last patch that actually enables it can default it to true and expose
the param to users.

It obviously doesn't matter in the end, but it technically makes this series
broken, e.g. nested_sync_int_ctl_from_vmcb02() pivots on "vnmi" without the extra
V_NMI_ENABLE check, and advertising that vnmi is enabled when it actually isn't
is wrong/misleading.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/13] KVM: SVM: Add VNMI bit definition
  2022-11-17 17:07       ` Borislav Petkov
@ 2022-11-17 20:33         ` Sean Christopherson
  0 siblings, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2022-11-17 20:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Maxim Levitsky, kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin,
	Dave Hansen, linux-kernel, Peter Zijlstra, Thomas Gleixner,
	Sandipan Das, Daniel Sneddon, Jing Liu, Josh Poimboeuf,
	Wyes Karny, Babu Moger, Pawan Gupta, Jim Mattson, x86,
	Santosh Shukla

On Thu, Nov 17, 2022, Borislav Petkov wrote:
> On Thu, Nov 17, 2022 at 04:42:57PM +0000, Sean Christopherson wrote:
> > Why? This is rarely run code, won't cpu_feature_enabled()
> > unnecessarily require patching?
> 
> Because we want one single interface to test X86_FEATURE flags. And
> there's no need for the users to know whether it wants patching or not -
> we simply patch *everywhere* and that's it.
> 
> > And while we're on the topic... https://lore.kernel.org/all/Y22IzA9DN%2FxYWgWN@google.com
> 
> Because static_ or boot_ is not relevant to the user - all she
> wants to know is whether a cpu feature has been enabled. Thus
> cpu_feature_enabled().
> 
> And yes, at the time I protested a little about unnecessary patching.
> And tglx said "Why not?". And I had no good answer to that. So we can
> just as well patch *everywhere*.

Ah, I missed that memo.


Paolo,

Since it sounds like static_cpu_has() is going the way of the dodo, and ditto for
boot_cpu_has() except for flows that don't play nice with patching (none of which
are in KVM), should we do a KVM-wide conversion to cpu_feature_enabled() at some
point in the near future?
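
(The conversion itself would be mechanical, e.g. for the flag this series
adds -- flag name illustrative, whatever patch 05 ends up calling it:

	vnmi = vnmi && cpu_feature_enabled(X86_FEATURE_VNMI);

instead of the boot_cpu_has()/static_cpu_has() variants.)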

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/13] KVM: nSVM: don't call nested_sync_control_from_vmcb02 on each VM exit
  2022-11-17 20:04   ` Sean Christopherson
@ 2022-11-21 11:07     ` Maxim Levitsky
  2022-11-21 17:51       ` Sean Christopherson
  0 siblings, 1 reply; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-21 11:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86

On Thu, 2022-11-17 at 20:04 +0000, Sean Christopherson wrote:
> On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > Calling nested_sync_control_from_vmcb02 on each VM exit (nested or not)
> > was an attempt to keep the int_ctl field in the vmcb12 cache
> > up to date on each VM exit.
> 
> This doesn't mesh with the reasoning in commit 2d8a42be0e2b ("KVM: nSVM: synchronize
> VMCB controls updated by the processor on every vmexit"), which states that the
> goal is to keep svm->nested.ctl.* synchronized, not vmcb12.  Or is nested.ctl the
> cache you are referring to?

Thanks for digging that commit out.

My reasoning was that the cache contains both control and 'save' fields, and
we don't update the 'save' fields on each VM exit.

For control, it indeed looks like int_ctl and eventinj are the only fields
that are updated by the CPU, although IMHO they don't *need* to be updated
until we do a nested VM exit, because the VM isn't supposed to look at the
vmcb while it is in use by the CPU; its state is undefined.

For migration though, this does look like a problem. It can be fixed while
reading the nested state, but that is a hack as well.

My idea, as you have seen in the patches, was to unify int_ctl handling,
since some bits might need to be copied to vmcb12 and some to vmcb01;
we happened to have none of the latter so far, and it "happened" to work.

Do you have an idea on how to do this cleanly? I can just leave this as is
and only sync the bits of int_ctl from vmcb02 to vmcb01 on nested VM exit.
Ugly, but it would work.




> 
> > However all other fields in the vmcb12 cache are not kept up to date,
> 
> IIUC, this isn't technically true.  They are up-to-date because they're never
> modified by hardware.

I meant in both the save and control caches. In the control cache, it indeed
looks like the fields are kept up to date.

Best regards,
	Maxim Levitsky

> 
> > therefore for consistency it is better to do this on a nested VM exit only.
> 
> Again, IIUC, this actually introduces an inconsistency because it leaves stale
> state in svm->nested.ctl, whereas the existing code ensures all state in
> svm->nested.ctl is fresh immediately after non-nested VM-Exit.
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/13] KVM: nSVM: clean up copying of int_ctl fields back to vmcb01/vmcb12
  2022-11-17 20:15   ` Sean Christopherson
@ 2022-11-21 11:10     ` Maxim Levitsky
  0 siblings, 0 replies; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-21 11:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86

On Thu, 2022-11-17 at 20:15 +0000, Sean Christopherson wrote:
> On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > Clean up the nested_sync_int_ctl_from_vmcb02:
> > 
> > 1. The comment about preservation of V_IRQ is wrong: when the L2 doesn't
> >    use virtual interrupt masking, then the field just doesn't exist in
> >    vmcb12, and thus it should not be touched at all.
> >    Since it is unused in this case, touching it doesn't matter that much,
> >    so the bug is theoretical.
> > 
> > 2. When the L2 doesn't use virtual interrupt masking, then in *theory*,
> >    if KVM uses the feature itself, it should copy the changes to the
> >    V_IRQ* bits from vmcb02 to vmcb01.
> > 
> >    In practice, KVM only uses it to detect the interrupt window,
> >    and it happens to re-open it on each nested VM exit because
> >    kvm_set_rflags happens to raise KVM_REQ_EVENT.
> >    Do this explicitly.
> > 
> > 3. Add comment on why we don't need to copy V_GIF from vmcb02 to vmcb01
> >    when nested guest doesn't use nested V_GIF (and thus L1's GIF is in
> >    vmcb02 while nested), even though it can in theory affect L1's GIF.
> > 
> > 4. Add support code to also copy some bits of int_ctl from
> >    vmcb02 to vmcb01.
> >    Currently there are none.
> 
> Unless it's impossible for whatever reason, this patch should be split into
> multiple patches.  IIUC, there are at least 2 different functional changes being
> made, they just happen to not have any actual impact on things.

No objection to this.

> 
> > No (visible) functional change is intended.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/nested.c | 47 ++++++++++++++++++++++++++-------------
> >  1 file changed, 32 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 54eb152e2b60b6..1f2b8492c8782f 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -410,28 +410,45 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
> >  static void nested_sync_int_ctl_from_vmcb02(struct vcpu_svm *svm,
> >                                             struct vmcb *vmcb12)
> >  {
> > -       u32 mask;
> > +       struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
> > +       struct vmcb *vmcb01 = svm->vmcb01.ptr;
> > +
> > +       /* bitmask of bits of int_ctl that we copy from vmcb02 to vmcb12*/
> > +       u32 l2_to_l1_mask = 0;
> > +       /* bitmask of bits of int_ctl that we copy from vmcb02 to vmcb01*/
> > +       u32 l2_to_l0_mask = 0;
> >  
> > -       /* Only a few fields of int_ctl are written by the processor.  */
> 
> Can this comment be kept in some form?  I found it super useful when reading this
> code just now.

No problem.

> 
> > -       mask = V_IRQ_MASK | V_TPR_MASK;
> > -       if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
> > -           svm_is_intercept(svm, INTERCEPT_VINTR)) {
> > +       if (svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
> > +               l2_to_l1_mask |= V_IRQ_MASK | V_TPR_MASK;
> > +       else {
> >                 /*
> > -                * In order to request an interrupt window, L0 is usurping
> > -                * svm->vmcb->control.int_ctl and possibly setting V_IRQ
> > -                * even if it was clear in L1's VMCB.  Restoring it would be
> > -                * wrong.  However, in this case V_IRQ will remain true until
> > -                * interrupt_window_interception calls svm_clear_vintr and
> > -                * restores int_ctl.  We can just leave it aside.
> > +                * If IRQ window was opened while in L2, it must be reopened
> > +                * after the VM exit
> > +                *
> > +                * vTPR value doesn't need to be copied from vmcb02 to vmcb01
> > +                * because it is synced from/to apic registers on each VM exit
> >                  */
> > -               mask &= ~V_IRQ_MASK;
> > +               if (vmcb02->control.int_ctl & V_IRQ_MASK)
> > +                       kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
> >         }
> >  
> >         if (nested_vgif_enabled(svm))
> > -               mask |= V_GIF_MASK;
> > +               l2_to_l1_mask |= V_GIF_MASK;
> > +       else
> > +               /* There is no need to sync V_GIF from vmcb02 to vmcb01
> > +                * because GIF is cleared on VMexit, thus even though
> > +                * nested guest can control host's GIF, on VM exit
> > +                * its set value is lost
> > +                */
> > +               ;
> 
> The "else ... ;" is unnecessary, just throw the block comment above the nested
> vGIF if-statment, e.g. if I'm understanding everything, this?
Yes.

> 
>         /*
>          * If nested vGIF is not enabled, L2 has access to L1's "real" GIF.  In
>          * this case, there's no need to sync V_GIF from vmcb02 to vmcb01
>          * because GIF is cleared on VM-Exit, thus any changes made by L2 are
>          * overwritten on VM-Exit to L1.
>          */
>         if (nested_vgif_enabled(svm))
>                 l2_to_l1_mask |= V_GIF_MASK;
> 
> > +
> > +       vmcb12->control.int_ctl =
> > +               (svm->nested.ctl.int_ctl & ~l2_to_l1_mask) |
> > +               (vmcb02->control.int_ctl & l2_to_l1_mask);
> >  
> > -       vmcb12->control.int_ctl        &= ~mask;
> > -       vmcb12->control.int_ctl        |= svm->vmcb->control.int_ctl & mask;
> > +       vmcb01->control.int_ctl =
> > +               (vmcb01->control.int_ctl & ~l2_to_l0_mask) |
> > +               (vmcb02->control.int_ctl & l2_to_l0_mask);
> 
> No need for wrapping immediately after the "=", these all fit under the soft limit:
> 
>         vmcb12->control.int_ctl = (svm->nested.ctl.int_ctl & ~l2_to_l1_mask) |
>                                   (vmcb02->control.int_ctl & l2_to_l1_mask);
> 
>         vmcb01->control.int_ctl = (vmcb01->control.int_ctl & ~l2_to_l0_mask) |
>                                   (vmcb02->control.int_ctl & l2_to_l0_mask);

OK.


Best regards,
	Maxim Levitsky

> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask
  2022-11-17 18:54   ` Sean Christopherson
@ 2022-11-21 12:36     ` Maxim Levitsky
  2022-11-21 17:18       ` Sean Christopherson
  2022-12-04 18:42     ` Maxim Levitsky
  1 sibling, 1 reply; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-21 12:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86,
	Santosh Shukla

On Thu, 2022-11-17 at 18:54 +0000, Sean Christopherson wrote:
> On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > From: Santosh Shukla <santosh.shukla@amd.com>
> > 
> > VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling an
> > NMI in the guest and is cleared after the NMI is handled. Treat V_NMI_MASK
> > as read-only in the hypervisor, except for the SMM case, where the
> > hypervisor needs to set V_NMI_MASK before entering and clear it after
> > leaving SMM mode.
> > 
> > Add an API (get_vnmi_vmcb) to return the correct vmcb for L1 or L2.
> > 
> > Maxim:
> >    - made set_vnmi_mask/clear_vnmi_mask/is_vnmi_mask warn if called
> >      without vNMI enabled
> >    - clear IRET intercept in svm_set_nmi_mask even with vNMI
> > 
> > Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/svm.c | 18 ++++++++++++++-
> >  arch/x86/kvm/svm/svm.h | 52 ++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 69 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 08a7b2a0a29f3a..c16f68f6c4f7d7 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3618,13 +3618,29 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> >  
> >  static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> >  {
> > -       return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > +       struct vcpu_svm *svm = to_svm(vcpu);
> > +
> > +       if (is_vnmi_enabled(svm))
> > +               return is_vnmi_mask_set(svm);
> > +       else
> > +               return !!(vcpu->arch.hflags & HF_NMI_MASK);
> >  }
> >  
> >  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> >  {
> >         struct vcpu_svm *svm = to_svm(vcpu);
> >  
> > +       if (is_vnmi_enabled(svm)) {
> > +               if (masked)
> > +                       set_vnmi_mask(svm);
> 
> I believe not setting INTERCEPT_IRET is correct, but only because the existing
> code is unnecessary.  And this all very subtly relies on KVM_REQ_EVENT being set
> and/or KVM already being in kvm_check_and_inject_events().
> 
> When NMIs become unblocked, INTERCEPT_IRET can be cleared, but KVM should also
> make KVM_REQ_EVENT pending.  AFAICT, that doesn't happen when this is called via the
> emulator.  Ah, because em_iret() only handles RM for Intel's restricted guest
> crap.  I.e. it "works" only because it never happens.  All other flows set
> KVM_REQ_EVENT when toggling NMI blocking, e.g. the RSM path of kvm_smm_changed().
Makes sense


> 
> And when NMIs become blocked, there's no need to force INTERCEPT_IRET in this
> code because kvm_check_and_inject_events() will request an NMI window and set the
> intercept if necessary, and all paths that set NMI blocking are guaranteed to
> reach kvm_check_and_inject_events() before entering the guest.

I think I understand what you mean.


When a normal NMI is injected, we do have to intercept IRET because this is how
we know that the NMI is done executing.

But if NMIs become masked, there are two cases:

1. We are already in an NMI, so masking it is essentially a NOP; there is no
need to intercept IRET since it is already intercepted.

2. We are not in an NMI, in which case we don't need to intercept IRET either:
its interception is not needed to track that an NMI is over, only to detect an
NMI window (IRET can be used without a paired NMI to unblock NMIs, which is
IMHO a very wrong x86 design decision, but the ship sailed long ago), so we
can intercept IRET only when we request an actual NMI window.

Makes sense and I'll send a patch to do it.


> 
>   1. RSM => kvm_smm_changed() sets KVM_REQ_EVENT
>   2. enter_smm() is only called from within kvm_check_and_inject_events(),
>      before pending NMIs are processed (yay priority)
>   3. emulator_set_nmi_mask() never blocks NMIs, only does the half-baked IRET emulation
>   4. kvm_vcpu_ioctl_x86_set_vcpu_event() sets KVM_REQ_EVENT
> 
> So, can you add a prep patch to drop the forced INTERCEPT_IRET?  That way the
> logic for vNMI and !vNMI is the same.
> 
> > +               else {
> > +                       clear_vnmi_mask(svm);
> 
> This is the only code that sets/clears the vNMI mask, so rather than have set/clear
> helpers, what about a single helper to do the dirty work? 

Or maybe not have any helper at all, since it is only done once? I don't know;
I just wanted to avoid changing the original patches too much, so I only
changed the minimum to make it work.

> 
> > +                       if (!sev_es_guest(vcpu->kvm))
> > +                               svm_clr_intercept(svm, INTERCEPT_IRET);
> > +               }
> > +               return;
> > +       }
> > +
> >         if (masked) {
> >                 vcpu->arch.hflags |= HF_NMI_MASK;
> >                 if (!sev_es_guest(vcpu->kvm))
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index f5383104d00580..bf7f4851dee204 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> >  extern bool npt_enabled;
> >  extern int vgif;
> >  extern bool intercept_smi;
> > +extern bool vnmi;
> >  
> >  enum avic_modes {
> >         AVIC_MODE_NONE = 0,
> > @@ -531,6 +532,57 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
> >                (msr < (APIC_BASE_MSR + 0x100));
> >  }
> >  
> > +static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
> > +{
> > +       if (!vnmi)
> > +               return NULL;
> > +
> > +       if (is_guest_mode(&svm->vcpu))
> > +               return svm->nested.vmcb02.ptr;
> > +       else
> > +               return svm->vmcb01.ptr;
> > +}
> > +
> > +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> > +{
> > +       struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > +       if (vmcb)
> > +               return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> > +       else
> > +               return false;
> 
> Maybe just this?
> 
>         return vmcb && (vmcb->control.int_ctl & V_NMI_ENABLE);
> 
> Or if an inner helper is added:
> 
>         return vmcb && __is_vnmi_enabled(vmcb);
> 
> > +}
> > +
> > +static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
> > +{
> > +       struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > +       if (!WARN_ON_ONCE(!vmcb))
> 
> Rather than WARN, add an inner __is_vnmi_enabled() that takes the vnmi_vmcb.
> Actually, if you do that, the test/set/clear helpers can go away entirely.
> 
> > +               return false;
> > +
> > +       return !!(vmcb->control.int_ctl & V_NMI_MASK);
> > +}
> > +
> > +static inline void set_vnmi_mask(struct vcpu_svm *svm)
> > +{
> > +       struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > +       if (!WARN_ON_ONCE(!vmcb))
> > +               return;
> > +
> > +       vmcb->control.int_ctl |= V_NMI_MASK;
> > +}
> > +
> > +static inline void clear_vnmi_mask(struct vcpu_svm *svm)
> > +{
> > +       struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > +       if (!WARN_ON_ONCE(!vmcb))
> > +               return;
> > +
> > +       vmcb->control.int_ctl &= ~V_NMI_MASK;
> > +}
> 
> These helpers can all go in svm.c.  There are no users outside of svm.c, and
> unless I'm misunderstanding how nested works, there should never be outside users.
> 
> E.g. with HF_NMI_MASK => svm->nmi_masked, the end result can be something like:
> 
> static bool __is_vnmi_enabled(struct vmcb *vmcb)
> {
>         return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> }
> 
> static bool is_vnmi_enabled(struct vcpu_svm *svm)
> {
>         struct vmcb *vmcb = get_vnmi_vmcb(svm);
> 
>         return vmcb && __is_vnmi_enabled(vmcb);
> }
> 
> static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> {
>         struct vcpu_svm *svm = to_svm(vcpu);
>         struct vmcb *vmcb = get_vnmi_vmcb(svm);
> 
>         if (vmcb && __is_vnmi_enabled(vmcb))
>                 return !!(vmcb->control.int_ctl & V_NMI_MASK);
>         else
>                 return !!(vcpu->arch.hflags & HF_NMI_MASK);
> }
> 
> static void svm_set_or_clear_vnmi_mask(struct vmcb *vmcb, bool set)
> {
>         if (set)
>                 vmcb->control.int_ctl |= V_NMI_MASK;
>         else
>                 vmcb->control.int_ctl &= ~V_NMI_MASK;
> }
> 
> static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> {
>         struct vcpu_svm *svm = to_svm(vcpu);
>         struct vmcb *vmcb = get_vnmi_vmcb(svm);
> 
>         if (vmcb && __is_vnmi_enabled(vmcb)) {
>                 if (masked)
>                         vmcb->control.int_ctl |= V_NMI_MASK;
>                 else
>                         vmcb->control.int_ctl &= ~V_NMI_MASK;
>         } else {
>                 svm->nmi_masked = masked;
>         }
> 
>         if (!masked)
>                 svm_disable_iret_interception(svm);
> }

OK, this is one of the ways to do it; makes sense overall.
I actually wanted to do something like that, but opted not to touch
the original code beyond what I needed. I can do this
in the next version.

Best regards,
	Maxim Levitsky

> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 09/13] KVM: SVM: allow NMI window with vNMI
  2022-11-17 18:21   ` Sean Christopherson
@ 2022-11-21 13:40     ` Maxim Levitsky
  0 siblings, 0 replies; 33+ messages in thread
From: Maxim Levitsky @ 2022-11-21 13:40 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86

On Thu, 2022-11-17 at 18:21 +0000, Sean Christopherson wrote:
> On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > When vNMI is enabled, the only case in which KVM will use an NMI
> > window is when a vNMI injection is pending.
> > 
> > In this case, on the next IRET/RSM/STGI, the injection has to be complete
> > and a new NMI can be injected.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/svm.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index cfec4c98bb589b..eaa30f8ace518d 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -2477,7 +2477,10 @@ static int iret_interception(struct kvm_vcpu *vcpu)
> >         struct vcpu_svm *svm = to_svm(vcpu);
> >  
> >         ++vcpu->stat.nmi_window_exits;
> > -       vcpu->arch.hflags |= HF_IRET_MASK;
> > +
> > +       if (!is_vnmi_enabled(svm))
> > +               vcpu->arch.hflags |= HF_IRET_MASK;
> 
> Ugh, HF_IRET_MASK is such a terrible name/flag.  Given that it lives with GIF
> and NMI, one would naturally think that it means "IRET is intercepted", but it
> really means "KVM just intercepted an IRET and is waiting for NMIs to become
> unblocked".
> 
> And on a related topic, why on earth are GIF, NMI, and IRET tracked in hflags?
> They are 100% SVM concepts.  IMO, this code would be much easier to follow
> if they were bools in vcpu_svm with more descriptive names.
> 
> > +
> >         if (!sev_es_guest(vcpu->kvm)) {
> >                 svm_clr_intercept(svm, INTERCEPT_IRET);
> >                 svm->nmi_iret_rip = kvm_rip_read(vcpu);
> 
> The vNMI interaction with this logic is confusing, as nmi_iret_rip doesn't need
> to be captured for the vNMI case.  SEV-ES actually has unrelated reasons for not
> reading RIP vs. not intercepting IRET; they just got bundled together here for
> convenience.
Yes, this can be cleaned up; again, I didn't want to change too much of the code.
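
(And on the hflags point, agreed as well -- a rough sketch of what that could
look like, using the names you suggest (nmi_masked, awaiting_iret_completion):

struct vcpu_svm {
	/* ... existing fields ... */

	/* NMI is masked from the guest's point of view (was HF_NMI_MASK) */
	bool nmi_masked;

	/*
	 * KVM intercepted an IRET and is waiting for it to complete
	 * (was HF_IRET_MASK)
	 */
	bool awaiting_iret_completion;
};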


> 
> This is also an opportunity to clean up the SEV-ES interaction with IRET interception,
> which is splattered all over the place and isn't documented anywhere.
> 
> E.g. (with an HF_IRET_MASK => awaiting_iret_completion change)
> 
> /*
>  * For SEV-ES guests, KVM must not rely on IRET to detect NMI unblocking as
>  * #VC->IRET in the guest will result in KVM thinking NMIs are unblocked before
>  * the guest is ready for a new NMI.  Architecturally, KVM is 100% correct to
>  * treat NMIs as unblocked on IRET, but the guest-host ABI for SEV-ES guests is
>  * that KVM must wait for an explicit "NMI Complete" from the guest.
>  */
> static void svm_disable_iret_interception(struct vcpu_svm *svm)
> {
>         if (!sev_es_guest(svm->vcpu.kvm))
>                 svm_clr_intercept(svm, INTERCEPT_IRET);
> }
> 
> static void svm_enable_iret_interception(struct vcpu_svm *svm)
> {
>         if (!sev_es_guest(svm->vcpu.kvm))
>                 svm_set_intercept(svm, INTERCEPT_IRET);
> }

This makes sense, but doesn't have to be done in this patch series IMHO.
> 
> static int iret_interception(struct kvm_vcpu *vcpu)
> {
>         struct vcpu_svm *svm = to_svm(vcpu);
> 
>         ++vcpu->stat.nmi_window_exits;
> 
>         /*
>          * No need to wait for the IRET to complete if vNMIs are enabled as
>          * hardware will automatically process the pending NMI when NMIs are
>          * unblocked from the guest's perspective.
>          */
>         if (!is_vnmi_enabled(svm)) {
>                 svm->awaiting_iret_completion = true;
> 
>                 /*
>                  * The guest's RIP is inaccessible for SEV-ES guests, just
>                  * assume forward progress was made on the next VM-Exit.
>                  */
>                 if (!sev_es_guest(vcpu->kvm))
>                         svm->nmi_iret_rip = kvm_rip_read(vcpu);
>         }
> 
>         svm_disable_iret_interception(svm);
> 
>         kvm_make_request(KVM_REQ_EVENT, vcpu);
>         return 1;
> }

> > @@ -3735,9 +3738,6 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> >  {
> >         struct vcpu_svm *svm = to_svm(vcpu);
> >  
> > -       if (is_vnmi_enabled(svm))
> > -               return;
> > -
> >         if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
> >                 return; /* IRET will cause a vm exit */
> 
> As much as I like incremental patches, in this case I'm having a hell of a time
> reviewing the code as the vNMI logic ends up being split across four patches.
> E.g. in this particular case, the above requires knowing that svm_inject_nmi()
> never sets HF_NMI_MASK when vNMI is enabled.
> 
> In the next version, any objection to squashing patches 7-10 into a single "Add
> non-nested vNMI support" patch?

No objection at all - again, since this is not my patch series, I didn't want
to make too many invasive changes to it.


> 
> As for this code, IMO some pre-work to change the flow would help with the vNMI
> case.  The GIF=0 logic overrides legacy NMI blocking, and so can be handled first.
> And I vote to explicitly set INTERCEPT_IRET in the above case instead of relying
> on INTERCEPT_IRET to already be set by svm_inject_nmi().
> 
> That would yield this as a final result:
> 
> static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> {
>         struct vcpu_svm *svm = to_svm(vcpu);
> 
>         /*
>          * GIF=0 blocks NMIs irrespective of legacy NMI blocking.  No need to
>          * intercept or single-step IRET if GIF=0, just intercept STGI.
>          */
>         if (!gif_set(svm)) {
>                 if (vgif)
>                         svm_set_intercept(svm, INTERCEPT_STGI);
>                 return;
>         }
> 
>         /*
>          * NMI is blocked, either because an NMI is in service or because KVM
>          * just injected an NMI.  If KVM is waiting for an intercepted IRET to
>          * complete, single-step the IRET to wait for NMIs to become unblocked.
>          * Otherwise, intercept the guest's next IRET.
>          */
>         if (svm->awaiting_iret_completion) {
>                 svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
>                 svm->nmi_singlestep = true;
>                 svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
>         } else {
>                 svm_set_intercept(svm, INTERCEPT_IRET);
>         }
> }
> 
> >  
> > @@ -3751,9 +3751,14 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> >          * Something prevents NMI from been injected. Single step over possible
> >          * problem (IRET or exception injection or interrupt shadow)
> >          */
> > -       svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> > -       svm->nmi_singlestep = true;
> > -       svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> > +
> > +       if (is_vnmi_enabled(svm)) {
> > +               svm_set_intercept(svm, INTERCEPT_IRET);
> 
> This will break SEV-ES.  Per commit 4444dfe4050b ("KVM: SVM: Add NMI support for
> an SEV-ES guest"), the hypervisor must not rely on IRET interception to detect
> NMI unblocking for SEV-ES guests.  As above, I think we should provide helpers to
> toggle NMI interception to reduce the probability of breaking SEV-ES.
Yes, one more reason for the helpers; I didn't notice that I missed that 'if'.


> 
> > +       } else {
> > +               svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
> > +               svm->nmi_singlestep = true;
> > +               svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> > +       }
> >  }
> >  
> >  static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> > -- 
> > 2.34.3
> > 
> 
Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/13] KVM: SVM: Add VNMI support in inject_nmi
  2022-11-17 14:32 ` [PATCH 10/13] KVM: SVM: Add VNMI support in inject_nmi Maxim Levitsky
@ 2022-11-21 17:12   ` Sean Christopherson
  0 siblings, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2022-11-21 17:12 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86,
	Santosh Shukla

On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> From: Santosh Shukla <santosh.shukla@amd.com>
> 
> Inject the NMI by setting V_NMI in the VMCB interrupt control. The processor
> will clear V_NMI to acknowledge that processing has started and will keep
> V_NMI_MASK set until it is done processing the NMI event.
> 
> Also, handle the nmi_l1_to_l2 case: when it is true, the NMI to be injected
> originally came from L1's VMCB12 EVENTINJ field, so add a check for that
> case.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/svm.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index eaa30f8ace518d..9ebfbd0d4b467e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3479,7 +3479,14 @@ static void pre_svm_run(struct kvm_vcpu *vcpu)
>  static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct vmcb *vmcb = NULL;

As written, no need to initialize vmcb.  Might be a moot point depending on the
final form of the code.
  
> +	if (is_vnmi_enabled(svm) && !svm->nmi_l1_to_l2) {

Checking nmi_l1_to_l2 is wrong.  KVM should directly re-inject any NMI that was
already recognized by hardware, not just those that were originally injected by
L1.

If another event comes along, e.g. SMI, because an event (NMI) is already injected,
KVM will send a hardware IRQ to interrupt the guest and force a VM-Exit so that
the SMI can be injected.  If hardware does the (IMO) sane thing and prioritizes
"real" IRQs over virtual NMIs, the IRQ VM-Exit will occur before the virtual NMI
is processed and KVM will incorrectly service the SMI before the NMI.

I believe the correct way to handle this is to add a @reinjected param to
->inject_nmi(), a la ->inject_irq().  That would also allow adding a sanity check
that KVM never attempts to inject an NMI into L2 if NMIs are supposed to trigger
VM-Exit.

This is the least ugly code I could come up with.  Note, if vNMI is enabled,
hardware sets V_NMI_MASKED if an NMI is injected through event_inj.

static void svm_inject_nmi(struct kvm_vcpu *vcpu, bool reinjected)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/*
	 * Except for re-injection, KVM should never inject an NMI into L2 if
	 * NMIs are supposed to exit from L2 to L1.
	 */
	WARN_ON_ONCE(!reinjected && is_guest_mode(vcpu) && nested_exit_on_nmi(svm));

	if (is_vnmi_enabled(svm)) {
		if (!reinjected)
			svm->vmcb->control.int_ctl |= V_NMI_PENDING;
		else
			svm->vmcb->control.event_inj = SVM_EVTINJ_VALID |
						       SVM_EVTINJ_TYPE_NMI;
		++vcpu->stat.nmi_injections;
		return;
	}

	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;

	if (svm->nmi_l1_to_l2)
		return;

	vcpu->arch.hflags |= HF_NMI_MASK;
	if (!sev_es_guest(vcpu->kvm))
		svm_set_intercept(svm, INTERCEPT_IRET);
	++vcpu->stat.nmi_injections;
}

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask
  2022-11-21 12:36     ` Maxim Levitsky
@ 2022-11-21 17:18       ` Sean Christopherson
  0 siblings, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2022-11-21 17:18 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86,
	Santosh Shukla

On Mon, Nov 21, 2022, Maxim Levitsky wrote:
> On Thu, 2022-11-17 at 18:54 +0000, Sean Christopherson wrote:
> > E.g. with HF_NMI_MASK => svm->nmi_masked, the end result can be something like:
> > 
> > static bool __is_vnmi_enabled(struct vmcb *vmcb)
> > {
> >         return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> > }
> > 
> > static bool is_vnmi_enabled(struct vcpu_svm *svm)
> > {
> >         struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > 
> >         return vmcb && __is_vnmi_enabled(vmcb);
> > }
> > 
> > static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> > {
> >         struct vcpu_svm *svm = to_svm(vcpu);
> >         struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > 
> >         if (vmcb && __is_vnmi_enabled(vmcb))
> >                 return !!(vmcb->control.int_ctl & V_NMI_MASK);
> >         else
> >                 return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > }
> > 
> > static void svm_set_or_clear_vnmi_mask(struct vmcb *vmcb, bool set)
> > {
> >         if (set)
> >                 vmcb->control.int_ctl |= V_NMI_MASK;
> >         else
> >                 vmcb->control.int_ctl &= ~V_NMI_MASK;
> > }
> > 
> > static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> > {
> >         struct vcpu_svm *svm = to_svm(vcpu);
> >         struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > 
> >         if (vmcb && __is_vnmi_enabled(vmcb)) {
> >                 if (masked)
> >                         vmcb->control.int_ctl |= V_NMI_MASK;
> >                 else
> >                         vmcb->control.int_ctl &= ~V_NMI_MASK;
> >         } else {
> >                 svm->nmi_masked = masked;
> >         }
> > 
> >         if (!masked)
> >                 svm_disable_iret_interception(svm);
> > }
> 
> OK, this is one of the ways to do it; makes sense overall.
> I actually wanted to do something like that, but opted not to touch
> the original code beyond what I needed. I can do this
> in the next version.

After looking at more of this code, I think having get_vnmi_vmcb() is a mistake.
It just ends up being a funky wrapper to the current svm->vmcb.  And the manual
check on the "vnmi" global is pointless.  If KVM sets V_NMI_ENABLE in any VMCB
when vnmi=false, then that's a KVM bug.

Dropping the wrapper eliminates the possibility of a NULL VMCB pointer, and IMO
yields far more readable code.


static bool is_vnmi_enabled(struct vcpu_svm *svm)
{
	return !!(svm->vmcb->control.int_ctl & V_NMI_ENABLE);
}

static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	if (is_vnmi_enabled(svm))
		return !!(svm->vmcb->control.int_ctl & V_NMI_MASK);
	else
		return !!(vcpu->arch.hflags & HF_NMI_MASK);
}

static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	if (is_vnmi_enabled(svm)) {
		if (masked)
			svm->vmcb->control.int_ctl |= V_NMI_MASK;
		else
			svm->vmcb->control.int_ctl &= ~V_NMI_MASK;
	} else {
		svm->nmi_masked = masked;
	}

	if (!masked)
		svm_disable_iret_interception(svm);
}

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/13] KVM: nSVM: don't call nested_sync_control_from_vmcb02 on each VM exit
  2022-11-21 11:07     ` Maxim Levitsky
@ 2022-11-21 17:51       ` Sean Christopherson
  0 siblings, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2022-11-21 17:51 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86

On Mon, Nov 21, 2022, Maxim Levitsky wrote:
> On Thu, 2022-11-17 at 20:04 +0000, Sean Christopherson wrote:
> > On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > > Calling nested_sync_control_from_vmcb02 on each VM exit (nested or not)
> > > was an attempt to keep the int_ctl field in the vmcb12 cache
> > > up to date on each VM exit.
> > 
> > This doesn't mesh with the reasoning in commit 2d8a42be0e2b ("KVM: nSVM: synchronize
> > VMCB controls updated by the processor on every vmexit"), which states that the
> > goal is to keep svm->nested.ctl.* synchronized, not vmcb12.  Or is nested.ctl the
> > cache you are referring to?
> 
> Thanks for digging that commit out.
> 
> My reasoning was that the cache contains both control and 'save' fields, and
> we don't update the 'save' fields on each VM exit.
>
> For control, it indeed looks like int_ctl and eventinj are the only fields
> that are updated by the CPU, although IMHO they don't *need* to be updated
> until we do a nested VM exit, because the VM isn't supposed to look at the
> vmcb while it is in use by the CPU; its state is undefined.

The point of the cache isn't to forward info to L2 though, it's so that KVM can
query the effective VMCB state without having to read guest memory and/or track
where the current state lives.
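
E.g. nested_exit_on_nmi() reads the cache, not vmcb12 in guest memory
(quoting svm.h from memory, so roughly):

static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
{
	return vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_NMI);
}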

> For migration though, this does look like a problem. It can be fixed while
> reading the nested state, but that is a hack as well.
>
> My idea, as you have seen in the patches, was to unify int_ctl handling,
> since some bits might need to be copied to vmcb12 and some to vmcb01;
> we happened to have none of the latter so far, and it "happened" to work.
> 
> Do you have an idea on how to do this cleanly? I can just leave this as is
> and only sync the bits of int_ctl from vmcb02 to vmcb01 on nested VM exit.
> Ugly, but it would work.

That honestly seems like the best option to me.  The ugly part isn't so much KVM's
caching as it is the mixed, conditional behavior of int_ctl.  E.g. VMX has even
more caching and synchronization (eVMCS, shadow VMCS, etc...), but off the top of
my head I can't think of any scenarios where KVM needs to splice/split VMCS fields.
KVM needs to conditionally sync fields, but not split like this.

In other words, I think this particular code is going to be rather ugly no matter
what KVM does.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask
  2022-11-17 18:54   ` Sean Christopherson
  2022-11-21 12:36     ` Maxim Levitsky
@ 2022-12-04 18:42     ` Maxim Levitsky
  2022-12-06 18:27       ` Sean Christopherson
  1 sibling, 1 reply; 33+ messages in thread
From: Maxim Levitsky @ 2022-12-04 18:42 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86,
	Santosh Shukla

On Thu, 2022-11-17 at 18:54 +0000, Sean Christopherson wrote:
> On Thu, Nov 17, 2022, Maxim Levitsky wrote:
> > From: Santosh Shukla <santosh.shukla@amd.com>
> > 
> > VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling an
> > NMI in the guest and is cleared after the NMI is handled. Treat V_NMI_MASK
> > as read-only in the hypervisor, except for the SMM case, where the
> > hypervisor needs to set V_NMI_MASK before entering and clear it after
> > leaving SMM mode.
> > 
> > Add an API (get_vnmi_vmcb) to return the correct vmcb for L1 or L2.
> > 
> > Maxim:
> >    - made set_vnmi_mask/clear_vnmi_mask/is_vnmi_mask warn if called
> >      without vNMI enabled
> >    - clear IRET intercept in svm_set_nmi_mask even with vNMI
> > 
> > Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/svm.c | 18 ++++++++++++++-
> >  arch/x86/kvm/svm/svm.h | 52 ++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 69 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 08a7b2a0a29f3a..c16f68f6c4f7d7 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3618,13 +3618,29 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> >  
> >  static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> >  {
> > -	return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > +	struct vcpu_svm *svm = to_svm(vcpu);
> > +
> > +	if (is_vnmi_enabled(svm))
> > +		return is_vnmi_mask_set(svm);
> > +	else
> > +		return !!(vcpu->arch.hflags & HF_NMI_MASK);
> >  }
> >  
> >  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> >  {
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> >  
> > +	if (is_vnmi_enabled(svm)) {
> > +		if (masked)
> > +			set_vnmi_mask(svm);
> 
> I believe not setting INTERCEPT_IRET is correct, but only because the existing
> code is unnecessary.  And this all very subtly relies on KVM_REQ_EVENT being set
> and/or KVM already being in kvm_check_and_inject_events().

Actually, after thinking about this again, I am almost sure that you are wrong about this:

KVM sets INTERCEPT_IRET not only to detect an NMI window but also to know
when the NMI handler has finished executing.

Even if no additional NMI arrives, if you don't intercept IRET, KVM will think that
NMIs are never unmasked, and in particular, when another NMI arrives, KVM will
try to open an NMI window while one is already open.
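
For reference, a simplified sketch of the !vNMI bookkeeping this relies on (the
real handler defers the unmask until the IRET actually retires, via single-step;
this just shows the idea):

static int iret_interception(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/* the guest's NMI handler is returning: NMIs become unmasked */
	vcpu->arch.hflags &= ~HF_NMI_MASK;
	if (!sev_es_guest(vcpu->kvm))
		svm_clr_intercept(svm, INTERCEPT_IRET);

	/* re-evaluate whether another pending NMI can now be injected */
	kvm_make_request(KVM_REQ_EVENT, vcpu);
	return 1;
}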

Now, svm_set_nmi_mask masks NMIs either when it is called from the SMM entry
point, or on migration.

On the SMM entry path, most of the time NMIs will be unmasked by RSM, but an
unsolicited IRET is also an (official) way to unmask NMIs, which we will miss
if we don't intercept IRET.

On migration, likewise, if the snapshot was taken during an NMI, we have to
intercept IRET after loading the migration stream so that we can tell, in the
same way, when the NMI ended.

All of this is of course only true for the !vNMI case.

For the vNMI case it turns out that we don't need to intercept IRET at all:

It turns out that when a vNMI is pending, you can still EVENTINJ another NMI: the
pending vNMI is kept pending, NMIs become masked due to the EVENTINJ, and on IRET
the pending vNMI is serviced as well, so in total both NMIs are serviced.

I confirmed this with Santosh Shukla, and in my patch series I take advantage of
this by failing the delayed NMI injection, making KVM fall back to EVENTINJ.

If, on the other hand, NMIs are blocked (the V_NMI_MASK bit is set) and a vNMI is
pending, then we can't use EVENTINJ to inject the second NMI, but in that case
KVM drops the second NMI anyway.
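
To make that concrete, a rough sketch of the injection decision (names like
V_NMI_PENDING and svm_vnmi_try_set_pending are placeholders of mine; the actual
series may spell them differently):

static bool svm_vnmi_try_set_pending(struct vcpu_svm *svm)
{
	struct vmcb *vmcb = get_vnmi_vmcb(svm);

	/*
	 * A vNMI is already pending: refuse, so that the caller falls
	 * back to EVENTINJ (or drops the NMI if NMIs are masked).  The
	 * pending vNMI stays pending and is serviced on IRET, so both
	 * NMIs are eventually delivered.
	 */
	if (vmcb->control.int_ctl & V_NMI_PENDING)
		return false;

	vmcb->control.int_ctl |= V_NMI_PENDING;
	return true;
}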

It all seems to add up very nicely; please take a look at the patch series
([PATCH v2 00/11] SVM: vNMI (with my fixes)).


Best regards,
	Maxim Levitsky 

> 
> When NMIs become unblocked, INTERCEPT_IRET can be cleared, but KVM should also
> set KVM_REQ_EVENT.  AFAICT, that doesn't happen when this is called via the
> emulator.  Ah, because em_iret() only handles RM for Intel's restricted guest
> crap.  I.e. it "works" only because it never happens.  All other flows set
> KVM_REQ_EVENT when toggling NMI blocking, e.g. the RSM path of kvm_smm_changed().
> 
> And when NMIs become blocked, there's no need to force INTERCEPT_IRET in this
> code because kvm_check_and_inject_events() will request an NMI window and set the
> intercept if necessary, and all paths that set NMI blocking are guaranteed to
> reach kvm_check_and_inject_events() before entering the guest.
> 
>   1. RSM => kvm_smm_changed() sets KVM_REQ_EVENT
>   2. enter_smm() is only called from within kvm_check_and_inject_events(),
>      before pending NMIs are processed (yay priority)
>   3. emulator_set_nmi_mask() never blocks NMIs, only does the half-baked IRET emulation
>   4. kvm_vcpu_ioctl_x86_set_vcpu_event() sets KVM_REQ_EVENT
> 
> So, can you add a prep patch to drop the forced INTERCEPT_IRET?  That way the
> logic for vNMI and !vNMI is the same.
> 
> > +		else {
> > +			clear_vnmi_mask(svm);
> 
> This is the only code that sets/clears the vNMI mask, so rather than having
> set/clear helpers, what about a single helper to do the dirty work?
> 
> > +			if (!sev_es_guest(vcpu->kvm))
> > +				svm_clr_intercept(svm, INTERCEPT_IRET);
> > +		}
> > +		return;
> > +	}
> > +
> >  	if (masked) {
> >  		vcpu->arch.hflags |= HF_NMI_MASK;
> >  		if (!sev_es_guest(vcpu->kvm))
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index f5383104d00580..bf7f4851dee204 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> >  extern bool npt_enabled;
> >  extern int vgif;
> >  extern bool intercept_smi;
> > +extern bool vnmi;
> >  
> >  enum avic_modes {
> >  	AVIC_MODE_NONE = 0,
> > @@ -531,6 +532,57 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
> >  	       (msr < (APIC_BASE_MSR + 0x100));
> >  }
> >  
> > +static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
> > +{
> > +	if (!vnmi)
> > +		return NULL;
> > +
> > +	if (is_guest_mode(&svm->vcpu))
> > +		return svm->nested.vmcb02.ptr;
> > +	else
> > +		return svm->vmcb01.ptr;
> > +}
> > +
> > +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> > +{
> > +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > +	if (vmcb)
> > +		return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> > +	else
> > +		return false;
> 
> Maybe just this?
> 
> 	return vmcb && (vmcb->control.int_ctl & V_NMI_ENABLE);
> 
> Or if an inner helper is added:
> 
> 	return vmcb && __is_vnmi_enabled(vmcb);
> 
> > +}
> > +
> > +static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
> > +{
> > +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > +	if (!WARN_ON_ONCE(!vmcb))
> 
> Rather than WARN, add an inner __is_vnmi_enabled() that takes the vnmi_vmcb.
> Actually, if you do that, the test/set/clear helpers can go away entirely.
> 
> > +		return false;
> > +
> > +	return !!(vmcb->control.int_ctl & V_NMI_MASK);
> > +}
> > +
> > +static inline void set_vnmi_mask(struct vcpu_svm *svm)
> > +{
> > +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > +	if (!WARN_ON_ONCE(!vmcb))
> > +		return;
> > +
> > +	vmcb->control.int_ctl |= V_NMI_MASK;
> > +}
> > +
> > +static inline void clear_vnmi_mask(struct vcpu_svm *svm)
> > +{
> > +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > +
> > +	if (!WARN_ON_ONCE(!vmcb))
> > +		return;
> > +
> > +	vmcb->control.int_ctl &= ~V_NMI_MASK;
> > +}
> 
> These helpers can all go in svm.c.  There are no users outside of svm.c, and
> unless I'm misunderstanding how nested works, there should never be outside users.
> 
> E.g. with HF_NMI_MASK => svm->nmi_masked, the end result can be something like:
> 
> static bool __is_vnmi_enabled(struct vmcb *vmcb)
> {
> 	return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> }
> 
> static bool is_vnmi_enabled(struct vcpu_svm *svm)
> {
> 	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> 
> 	return vmcb && __is_vnmi_enabled(vmcb);
> }
> 
> static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> {
> 	struct vcpu_svm *svm = to_svm(vcpu);
> 	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> 
> 	if (vmcb && __is_vnmi_enabled(vmcb))
> 		return !!(vmcb->control.int_ctl & V_NMI_MASK);
> 	else
> 		return !!(vcpu->arch.hflags & HF_NMI_MASK);
> }
> 
> static void svm_set_or_clear_vnmi_mask(struct vmcb *vmcb, bool set)
> {
> 	if (set)
> 		vmcb->control.int_ctl |= V_NMI_MASK;
> 	else
> 		vmcb->control.int_ctl &= ~V_NMI_MASK;
> }
> 
> static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> {
> 	struct vcpu_svm *svm = to_svm(vcpu);
> 	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> 
> 	if (vmcb && __is_vnmi_enabled(vmcb)) {
> 		/* reuse the helper defined above instead of open-coding it */
> 		svm_set_or_clear_vnmi_mask(vmcb, masked);
> 	} else {
> 		svm->nmi_masked = masked;
> 	}
> 
> 	if (!masked)
> 		svm_disable_iret_interception(svm);
> }
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask
  2022-12-04 18:42     ` Maxim Levitsky
@ 2022-12-06 18:27       ` Sean Christopherson
  0 siblings, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2022-12-06 18:27 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Paolo Bonzini, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	linux-kernel, Peter Zijlstra, Thomas Gleixner, Sandipan Das,
	Daniel Sneddon, Jing Liu, Josh Poimboeuf, Wyes Karny,
	Borislav Petkov, Babu Moger, Pawan Gupta, Jim Mattson, x86,
	Santosh Shukla

On Sun, Dec 04, 2022, Maxim Levitsky wrote:
> For vNMI case it turns out that we don't need to intercept IRET at all after all:
> 
> Turns out that when vNMI is pending, you can still EVENTINJ another NMI, and
> the pending vNMI will be kept pending, vNMI will became masked due to
> EVENTINJ, and on IRET the pending vNMI will be serviced as well, so in total
> both NMIs will be serviced.

I believe past me was thinking that the "merging" of pending NMIs happened in the
context of the sender, but it always happens in the context of the target vCPU.
Senders always do KVM_REQ_NMI, i.e. always kick if the vCPU is running, which gives
KVM the hook it needs to update the VMCB.
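
(For reference, the sender-side path being described is roughly the following,
from memory of arch/x86/kvm/x86.c:)

void kvm_inject_nmi(struct kvm_vcpu *vcpu)
{
	atomic_inc(&vcpu->arch.nmi_queued);
	kvm_make_request(KVM_REQ_NMI, vcpu);
	/* the kick forces a VM exit, so vcpu_enter_guest() on the target
	 * re-runs and can update the target's VMCB */
	kvm_vcpu_kick(vcpu);
}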

So yeah, as long as KVM can stuff two NMIs into the VMCB, I think we're good.
I'll give the series a proper review in the next week or so.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-12-06 18:32 UTC | newest]

Thread overview: 33+ messages
2022-11-17 14:32 [PATCH 00/13] SVM: vNMI (with my fixes) Maxim Levitsky
2022-11-17 14:32 ` [PATCH 01/13] KVM: nSVM: don't sync back tlb_ctl on nested VM exit Maxim Levitsky
2022-11-17 14:32 ` [PATCH 02/13] KVM: nSVM: don't call nested_sync_control_from_vmcb02 on each " Maxim Levitsky
2022-11-17 20:04   ` Sean Christopherson
2022-11-21 11:07     ` Maxim Levitsky
2022-11-21 17:51       ` Sean Christopherson
2022-11-17 14:32 ` [PATCH 03/13] KVM: nSVM: rename nested_sync_control_from_vmcb02 to nested_sync_int_ctl_from_vmcb02 Maxim Levitsky
2022-11-17 14:32 ` [PATCH 04/13] KVM: nSVM: clean up copying of int_ctl fields back to vmcb01/vmcb12 Maxim Levitsky
2022-11-17 20:15   ` Sean Christopherson
2022-11-21 11:10     ` Maxim Levitsky
2022-11-17 14:32 ` [PATCH 05/13] x86/cpu: Add CPUID feature bit for VNMI Maxim Levitsky
2022-11-17 14:32 ` [PATCH 06/13] KVM: SVM: Add VNMI bit definition Maxim Levitsky
2022-11-17 14:37   ` Borislav Petkov
2022-11-17 16:42     ` Sean Christopherson
2022-11-17 17:07       ` Borislav Petkov
2022-11-17 20:33         ` Sean Christopherson
2022-11-17 20:27   ` Sean Christopherson
2022-11-17 14:32 ` [PATCH 07/13] KVM: SVM: Add VNMI support in get/set_nmi_mask Maxim Levitsky
2022-11-17 18:54   ` Sean Christopherson
2022-11-21 12:36     ` Maxim Levitsky
2022-11-21 17:18       ` Sean Christopherson
2022-12-04 18:42     ` Maxim Levitsky
2022-12-06 18:27       ` Sean Christopherson
2022-11-17 14:32 ` [PATCH 08/13] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI Maxim Levitsky
2022-11-17 14:32 ` [PATCH 09/13] KVM: SVM: allow NMI window with vNMI Maxim Levitsky
2022-11-17 18:21   ` Sean Christopherson
2022-11-21 13:40     ` Maxim Levitsky
2022-11-17 14:32 ` [PATCH 10/13] KVM: SVM: Add VNMI support in inject_nmi Maxim Levitsky
2022-11-21 17:12   ` Sean Christopherson
2022-11-17 14:32 ` [PATCH 11/13] KVM: nSVM: implement nested VNMI Maxim Levitsky
2022-11-17 14:32 ` [PATCH 12/13] KVM: nSVM: emulate VMEXIT_INVALID case for " Maxim Levitsky
2022-11-17 20:18   ` Sean Christopherson
2022-11-17 14:32 ` [PATCH 13/13] KVM: SVM: Enable VNMI feature Maxim Levitsky
