linux-kernel.vger.kernel.org archive mirror
* [PATCHv2 0/7] Virtual NMI feature
@ 2022-07-09 13:42 Santosh Shukla
  2022-07-09 13:42 ` [PATCHv2 1/7] x86/cpu: Add CPUID feature bit for VNMI Santosh Shukla
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Santosh Shukla @ 2022-07-09 13:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel, Santosh.Shukla

Currently, NMIs are delivered to the guest using the Event Injection
mechanism [1]. The Event Injection mechanism does not block the delivery
of subsequent NMIs, so the hypervisor needs to track NMI delivery
and its completion (by intercepting IRET) before sending a new NMI.

Virtual NMI (VNMI) allows the hypervisor to inject an NMI into the guest
without using the Event Injection mechanism, so the hypervisor no longer
needs to track guest NMI state or intercept IRET. To achieve that, the
VNMI feature provides virtualized NMI and NMI_MASK capability bits in
VMCB intr_control -
V_NMI(11) - Indicates whether a virtual NMI is pending in the guest.
V_NMI_MASK(12) - Indicates whether the virtual NMI is masked in the guest.
V_NMI_ENABLE(26) - Enables the NMI virtualization feature for the guest.

When the hypervisor wants to inject an NMI, it sets the V_NMI bit. The
processor then clears V_NMI and sets V_NMI_MASK, which indicates that the
guest is handling the NMI. After the guest has handled the NMI, the
processor clears V_NMI_MASK on successful completion of the IRET
instruction, or when a VMEXIT occurs while the virtual NMI is being
delivered.

To enable the VNMI capability, the hypervisor needs to set the
V_NMI_ENABLE bit to 1.

The presence of this feature is indicated via the CPUID function
0x8000000A_EDX[25].

Testing -
* Used qemu's `inject_nmi` for testing.
* Tested with and without AVIC.
* Tested with kvm-unit-test.
* Tested injecting an NMI event from an L1 to an L2 guest (nested env).

v2:
01, 02 - Added Maxim's Reviewed-by.
03 - Added get_vnmi_vmcb API to return vmcb for l1 and l2.
04 - Moved vnmi check after is_guest_mode() in func svm_nmi_blocked().
05 - Added WARN_ON check for vnmi pending.
06 - Save the V_NMI_PENDING/MASK state in vmcb12 on vmexit.
07 - No change.


Thanks,
Santosh
[1] https://www.amd.com/system/files/TechDocs/40332.pdf - APM Vol2,
ch-15.20 - "Event Injection".

Santosh Shukla (7):
  x86/cpu: Add CPUID feature bit for VNMI
  KVM: SVM: Add VNMI bit definition
  KVM: SVM: Add VNMI support in get/set_nmi_mask
  KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  KVM: SVM: Add VNMI support in inject_nmi
  KVM: nSVM: implement nested VNMI
  KVM: SVM: Enable VNMI feature

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/svm.h         |  7 ++++++
 arch/x86/kvm/svm/nested.c          | 16 +++++++++++-
 arch/x86/kvm/svm/svm.c             | 40 ++++++++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.h             | 38 ++++++++++++++++++++++++++++
 5 files changed, 99 insertions(+), 3 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCHv2 1/7] x86/cpu: Add CPUID feature bit for VNMI
  2022-07-09 13:42 [PATCHv2 0/7] Virtual NMI feature Santosh Shukla
@ 2022-07-09 13:42 ` Santosh Shukla
  2022-07-09 13:42 ` [PATCHv2 2/7] KVM: SVM: Add VNMI bit definition Santosh Shukla
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Santosh Shukla @ 2022-07-09 13:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel, Santosh.Shukla

The VNMI feature allows the hypervisor to inject an NMI into the guest
without using the Event Injection mechanism. The benefit of VNMI over
Event Injection is that it does not require tracking the guest's NMI
state or intercepting IRET for NMI completion. VNMI achieves that by
exposing 3 capability bits in VMCB intr_cntrl which help with
virtualizing NMI injection and NMI masking.

The presence of this feature is indicated via the CPUID function
0x8000000A_EDX[25].

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
---
v2:
- Added Maxim reviewed-by.

 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 03acc823838a..9a760236948c 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -346,6 +346,7 @@
 #define X86_FEATURE_V_VMSAVE_VMLOAD	(15*32+15) /* Virtual VMSAVE VMLOAD */
 #define X86_FEATURE_VGIF		(15*32+16) /* Virtual GIF */
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* Virtual SPEC_CTRL */
+#define X86_FEATURE_V_NMI		(15*32+25) /* Virtual NMI */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* "" SVME addr check */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
-- 
2.25.1



* [PATCHv2 2/7] KVM: SVM: Add VNMI bit definition
  2022-07-09 13:42 [PATCHv2 0/7] Virtual NMI feature Santosh Shukla
  2022-07-09 13:42 ` [PATCHv2 1/7] x86/cpu: Add CPUID feature bit for VNMI Santosh Shukla
@ 2022-07-09 13:42 ` Santosh Shukla
  2022-07-09 13:42 ` [PATCHv2 3/7] KVM: SVM: Add VNMI support in get/set_nmi_mask Santosh Shukla
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Santosh Shukla @ 2022-07-09 13:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel, Santosh.Shukla

VNMI exposes 3 capability bits (V_NMI, V_NMI_MASK, and V_NMI_ENABLE) to
virtualize NMI and NMI_MASK. Those capability bits are part of
VMCB::intr_ctrl -
V_NMI(11) - Indicates whether a virtual NMI is pending in the guest.
V_NMI_MASK(12) - Indicates whether the virtual NMI is masked in the guest.
V_NMI_ENABLE(26) - Enables the NMI virtualization feature for the guest.

When the hypervisor wants to inject an NMI, it sets the V_NMI bit. The
processor then clears V_NMI and sets V_NMI_MASK, which indicates that
the guest is handling the NMI. After the guest has handled the NMI, the
processor clears V_NMI_MASK on successful completion of the IRET
instruction, or when a VMEXIT occurs while the virtual NMI is being
delivered.

To enable the VNMI capability, the hypervisor needs to set the
V_NMI_ENABLE bit to 1.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
---
v2:
- Added Maxim's Reviewed-by.

 arch/x86/include/asm/svm.h | 7 +++++++
 arch/x86/kvm/svm/svm.c     | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 1b07fba11704..22d918555df0 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -195,6 +195,13 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define AVIC_ENABLE_SHIFT 31
 #define AVIC_ENABLE_MASK (1 << AVIC_ENABLE_SHIFT)
 
+#define V_NMI_PENDING_SHIFT 11
+#define V_NMI_PENDING (1 << V_NMI_PENDING_SHIFT)
+#define V_NMI_MASK_SHIFT 12
+#define V_NMI_MASK (1 << V_NMI_MASK_SHIFT)
+#define V_NMI_ENABLE_SHIFT 26
+#define V_NMI_ENABLE (1 << V_NMI_ENABLE_SHIFT)
+
 #define LBR_CTL_ENABLE_MASK BIT_ULL(0)
 #define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 44bbf25dfeb9..baaf35be36e5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -198,6 +198,8 @@ module_param(dump_invalid_vmcb, bool, 0644);
 bool intercept_smi = true;
 module_param(intercept_smi, bool, 0444);
 
+static bool vnmi;
+module_param(vnmi, bool, 0444);
 
 static bool svm_gp_erratum_intercept = true;
 
@@ -4933,6 +4935,10 @@ static __init int svm_hardware_setup(void)
 		svm_x86_ops.vcpu_get_apicv_inhibit_reasons = NULL;
 	}
 
+	vnmi = vnmi && boot_cpu_has(X86_FEATURE_V_NMI);
+	if (vnmi)
+		pr_info("V_NMI enabled\n");
+
 	if (vls) {
 		if (!npt_enabled ||
 		    !boot_cpu_has(X86_FEATURE_V_VMSAVE_VMLOAD) ||
-- 
2.25.1



* [PATCHv2 3/7] KVM: SVM: Add VNMI support in get/set_nmi_mask
  2022-07-09 13:42 [PATCHv2 0/7] Virtual NMI feature Santosh Shukla
  2022-07-09 13:42 ` [PATCHv2 1/7] x86/cpu: Add CPUID feature bit for VNMI Santosh Shukla
  2022-07-09 13:42 ` [PATCHv2 2/7] KVM: SVM: Add VNMI bit definition Santosh Shukla
@ 2022-07-09 13:42 ` Santosh Shukla
  2022-07-10 16:15   ` Maxim Levitsky
  2022-07-09 13:42 ` [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI Santosh Shukla
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Santosh Shukla @ 2022-07-09 13:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel, Santosh.Shukla

VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling
NMI in guest and is cleared after the NMI is handled. Treat V_NMI_MASK as
read-only in the hypervisor and do not populate set accessors.

Add an API (get_vnmi_vmcb) to return the correct vmcb for L1 or L2.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
---
v2:
- Added get_vnmi_vmcb API to return vmcb for l1 and l2.
- Use get_vnmi_vmcb to get correct vmcb in func -
  is_vnmi_enabled/_mask_set()
- removed vnmi check from is_vnmi_enabled() func.

 arch/x86/kvm/svm/svm.c | 12 ++++++++++--
 arch/x86/kvm/svm/svm.h | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index baaf35be36e5..3574e804d757 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -198,7 +198,7 @@ module_param(dump_invalid_vmcb, bool, 0644);
 bool intercept_smi = true;
 module_param(intercept_smi, bool, 0444);
 
-static bool vnmi;
+bool vnmi = true;
 module_param(vnmi, bool, 0444);
 
 static bool svm_gp_erratum_intercept = true;
@@ -3503,13 +3503,21 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 
 static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
 {
-	return !!(vcpu->arch.hflags & HF_NMI_MASK);
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (is_vnmi_enabled(svm))
+		return is_vnmi_mask_set(svm);
+	else
+		return !!(vcpu->arch.hflags & HF_NMI_MASK);
 }
 
 static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	if (is_vnmi_enabled(svm))
+		return;
+
 	if (masked) {
 		vcpu->arch.hflags |= HF_NMI_MASK;
 		if (!sev_es_guest(vcpu->kvm))
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9223ac100ef5..f36e30df6202 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
 extern int vgif;
 extern bool intercept_smi;
+extern bool vnmi;
 
 /*
  * Clean bits in VMCB.
@@ -509,6 +510,37 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
 	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
 }
 
+static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
+{
+	if (!vnmi)
+		return NULL;
+
+	if (is_guest_mode(&svm->vcpu))
+		return svm->nested.vmcb02.ptr;
+	else
+		return svm->vmcb01.ptr;
+}
+
+static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = get_vnmi_vmcb(svm);
+
+	if (vmcb)
+		return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
+	else
+		return false;
+}
+
+static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb = get_vnmi_vmcb(svm);
+
+	if (vmcb)
+		return !!(vmcb->control.int_ctl & V_NMI_MASK);
+	else
+		return false;
+}
+
 /* svm.c */
 #define MSR_INVALID				0xffffffffU
 
-- 
2.25.1



* [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  2022-07-09 13:42 [PATCHv2 0/7] Virtual NMI feature Santosh Shukla
                   ` (2 preceding siblings ...)
  2022-07-09 13:42 ` [PATCHv2 3/7] KVM: SVM: Add VNMI support in get/set_nmi_mask Santosh Shukla
@ 2022-07-09 13:42 ` Santosh Shukla
  2022-07-20 21:54   ` Sean Christopherson
  2022-07-09 13:42 ` [PATCHv2 5/7] KVM: SVM: Add VNMI support in inject_nmi Santosh Shukla
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Santosh Shukla @ 2022-07-09 13:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel, Santosh.Shukla

In the VNMI case, report that NMI injection is not allowed when the
processor has set V_NMI_MASK to 1, which means the guest is busy handling
a VNMI.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
---
v2:
- Moved the vnmi check after is_guest_mode() in svm_nmi_blocked().
- Removed the is_vnmi_mask_set check from svm_enable_nmi_window()
  as it was a redundant check.

 arch/x86/kvm/svm/svm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3574e804d757..44c1f2317b45 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3480,6 +3480,9 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
 	if (is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
 		return false;
 
+	if (is_vnmi_enabled(svm) && is_vnmi_mask_set(svm))
+		return true;
+
 	ret = (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
 	      (vcpu->arch.hflags & HF_NMI_MASK);
 
@@ -3609,6 +3612,9 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	if (is_vnmi_enabled(svm))
+		return;
+
 	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
 		return; /* IRET will cause a vm exit */
 
-- 
2.25.1



* [PATCHv2 5/7] KVM: SVM: Add VNMI support in inject_nmi
  2022-07-09 13:42 [PATCHv2 0/7] Virtual NMI feature Santosh Shukla
                   ` (3 preceding siblings ...)
  2022-07-09 13:42 ` [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI Santosh Shukla
@ 2022-07-09 13:42 ` Santosh Shukla
  2022-07-20 21:41   ` Sean Christopherson
  2022-07-09 13:42 ` [PATCHv2 6/7] KVM: nSVM: implement nested VNMI Santosh Shukla
  2022-07-09 13:42 ` [PATCHv2 7/7] KVM: SVM: Enable VNMI feature Santosh Shukla
  6 siblings, 1 reply; 29+ messages in thread
From: Santosh Shukla @ 2022-07-09 13:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel, Santosh.Shukla

Inject the NMI by setting V_NMI in the VMCB interrupt control. The
processor will clear V_NMI to acknowledge that processing has started and
will keep V_NMI_MASK set until it is done processing the NMI event.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
---
v2:
- Added WARN_ON check for vnmi pending.
- Use `get_vnmi_vmcb` to get the correct vmcb for injecting the vnmi.

 arch/x86/kvm/svm/svm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 44c1f2317b45..c73a1809a7c7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3375,12 +3375,20 @@ static void pre_svm_run(struct kvm_vcpu *vcpu)
 static void svm_inject_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *vmcb = NULL;
+
+	++vcpu->stat.nmi_injections;
+	if (is_vnmi_enabled(svm)) {
+		vmcb = get_vnmi_vmcb(svm);
+		WARN_ON(vmcb->control.int_ctl & V_NMI_PENDING);
+		vmcb->control.int_ctl |= V_NMI_PENDING;
+		return;
+	}
 
 	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
 	vcpu->arch.hflags |= HF_NMI_MASK;
 	if (!sev_es_guest(vcpu->kvm))
 		svm_set_intercept(svm, INTERCEPT_IRET);
-	++vcpu->stat.nmi_injections;
 }
 
 static void svm_inject_irq(struct kvm_vcpu *vcpu)
-- 
2.25.1



* [PATCHv2 6/7] KVM: nSVM: implement nested VNMI
  2022-07-09 13:42 [PATCHv2 0/7] Virtual NMI feature Santosh Shukla
                   ` (4 preceding siblings ...)
  2022-07-09 13:42 ` [PATCHv2 5/7] KVM: SVM: Add VNMI support in inject_nmi Santosh Shukla
@ 2022-07-09 13:42 ` Santosh Shukla
  2022-07-09 13:42 ` [PATCHv2 7/7] KVM: SVM: Enable VNMI feature Santosh Shukla
  6 siblings, 0 replies; 29+ messages in thread
From: Santosh Shukla @ 2022-07-09 13:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel, Santosh.Shukla

Currently, nested_vmcb02_prepare_control() checks and programs the
(V_TPR, V_INTR, V_IRQ) bits in nested mode. To support nested VNMI,
extend the check to the VNMI bits when VNMI is enabled, and on vmexit
save the V_NMI_MASK/PENDING state to vmcb12.

Tested with kvm-unit-tests and a nested guest scenario.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
---
v2:
- Save the V_NMI_PENDING/MASK state in vmcb12 on vmexit.
- Tested nested environment - L1 injecting vnmi to L2.
- Tested vnmi in kvm-unit-test.

 arch/x86/kvm/svm/nested.c | 16 +++++++++++++++-
 arch/x86/kvm/svm/svm.c    |  5 +++++
 arch/x86/kvm/svm/svm.h    |  6 ++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ba7cd26f438f..f97651f317a0 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -629,6 +629,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	else
 		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
 
+	if (nested_vnmi_enabled(svm))
+		int_ctl_vmcb12_bits |= (V_NMI_PENDING | V_NMI_ENABLE);
+
 	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
 	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
 	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
@@ -951,10 +954,21 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	vmcb12->control.event_inj         = svm->nested.ctl.event_inj;
 	vmcb12->control.event_inj_err     = svm->nested.ctl.event_inj_err;
 
+	if (nested_vnmi_enabled(svm)) {
+		if (vmcb02->control.int_ctl & V_NMI_MASK)
+			vmcb12->control.int_ctl |= V_NMI_MASK;
+		else
+			vmcb12->control.int_ctl &= ~V_NMI_MASK;
+
+		if (vmcb02->control.int_ctl & V_NMI_PENDING)
+			vmcb12->control.int_ctl |= V_NMI_PENDING;
+		else
+			vmcb12->control.int_ctl &= ~V_NMI_PENDING;
+	}
+
 	if (!kvm_pause_in_guest(vcpu->kvm)) {
 		vmcb01->control.pause_filter_count = vmcb02->control.pause_filter_count;
 		vmcb_mark_dirty(vmcb01, VMCB_INTERCEPTS);
-
 	}
 
 	nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c73a1809a7c7..d81ad913542d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4069,6 +4069,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	svm->vgif_enabled = vgif && guest_cpuid_has(vcpu, X86_FEATURE_VGIF);
 
+	svm->vnmi_enabled = vnmi && guest_cpuid_has(vcpu, X86_FEATURE_V_NMI);
+
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
@@ -4827,6 +4829,9 @@ static __init void svm_set_cpu_caps(void)
 		if (vgif)
 			kvm_cpu_cap_set(X86_FEATURE_VGIF);
 
+		if (vnmi)
+			kvm_cpu_cap_set(X86_FEATURE_V_NMI);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f36e30df6202..0094ca2698fa 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -241,6 +241,7 @@ struct vcpu_svm {
 	bool pause_filter_enabled         : 1;
 	bool pause_threshold_enabled      : 1;
 	bool vgif_enabled                 : 1;
+	bool vnmi_enabled                 : 1;
 
 	u32 ldr_reg;
 	u32 dfr_reg;
@@ -510,6 +511,11 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
 	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
 }
 
+static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
+{
+	return svm->vnmi_enabled && (svm->nested.ctl.int_ctl & V_NMI_ENABLE);
+}
+
 static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
 {
 	if (!vnmi)
-- 
2.25.1



* [PATCHv2 7/7] KVM: SVM: Enable VNMI feature
  2022-07-09 13:42 [PATCHv2 0/7] Virtual NMI feature Santosh Shukla
                   ` (5 preceding siblings ...)
  2022-07-09 13:42 ` [PATCHv2 6/7] KVM: nSVM: implement nested VNMI Santosh Shukla
@ 2022-07-09 13:42 ` Santosh Shukla
  6 siblings, 0 replies; 29+ messages in thread
From: Santosh Shukla @ 2022-07-09 13:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel, Santosh.Shukla

Enable the NMI virtualization (V_NMI_ENABLE) in the VMCB interrupt
control when the vnmi module parameter is set.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
---
 arch/x86/kvm/svm/svm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d81ad913542d..536aa392252b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1208,6 +1208,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
 
+	if (vnmi)
+		svm->vmcb->control.int_ctl |= V_NMI_ENABLE;
+
 	if (vgif) {
 		svm_clr_intercept(svm, INTERCEPT_STGI);
 		svm_clr_intercept(svm, INTERCEPT_CLGI);
-- 
2.25.1



* Re: [PATCHv2 3/7] KVM: SVM: Add VNMI support in get/set_nmi_mask
  2022-07-09 13:42 ` [PATCHv2 3/7] KVM: SVM: Add VNMI support in get/set_nmi_mask Santosh Shukla
@ 2022-07-10 16:15   ` Maxim Levitsky
  2022-07-21  9:34     ` Shukla, Santosh
  0 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2022-07-10 16:15 UTC (permalink / raw)
  To: Santosh Shukla, Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Tom Lendacky, kvm, linux-kernel

On Sat, 2022-07-09 at 19:12 +0530, Santosh Shukla wrote:
> VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling
> NMI in guest and is cleared after the NMI is handled. Treat V_NMI_MASK as
> read-only in the hypervisor and do not populate set accessors.
> 
> Adding API(get_vnmi_vmcb) in order to return the correct vmcb for L1 or
> L2.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> ---
> v2:
> - Added get_vnmi_vmcb API to return vmcb for l1 and l2.
> - Use get_vnmi_vmcb to get correct vmcb in func -
>   is_vnmi_enabled/_mask_set()
> - removed vnmi check from is_vnmi_enabled() func.
> 
>  arch/x86/kvm/svm/svm.c | 12 ++++++++++--
>  arch/x86/kvm/svm/svm.h | 32 ++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index baaf35be36e5..3574e804d757 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -198,7 +198,7 @@ module_param(dump_invalid_vmcb, bool, 0644);
>  bool intercept_smi = true;
>  module_param(intercept_smi, bool, 0444);
>  
> -static bool vnmi;
> +bool vnmi = true;
>  module_param(vnmi, bool, 0444);
>  
>  static bool svm_gp_erratum_intercept = true;
> @@ -3503,13 +3503,21 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>  
>  static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
>  {
> -	return !!(vcpu->arch.hflags & HF_NMI_MASK);
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (is_vnmi_enabled(svm))
> +		return is_vnmi_mask_set(svm);
> +	else
> +		return !!(vcpu->arch.hflags & HF_NMI_MASK);
>  }
>  
>  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> +	if (is_vnmi_enabled(svm))
> +		return;
> +
>  	if (masked) {
>  		vcpu->arch.hflags |= HF_NMI_MASK;
>  		if (!sev_es_guest(vcpu->kvm))
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 9223ac100ef5..f36e30df6202 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
>  extern bool npt_enabled;
>  extern int vgif;
>  extern bool intercept_smi;
> +extern bool vnmi;
>  
>  /*
>   * Clean bits in VMCB.
> @@ -509,6 +510,37 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
>  	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
>  }
>  
> +static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
> +{
> +	if (!vnmi)
> +		return NULL;
> +
> +	if (is_guest_mode(&svm->vcpu))
> +		return svm->nested.vmcb02.ptr;
> +	else
> +		return svm->vmcb01.ptr;
> +}

This is better but still not enough to support nesting:


Let me explain the cases that we need to cover:


1. non nested case, vmcb01 has all the VNMI settings,
and I think it should work, but need to review the patches again.



2. L1 uses vNMI, L2 doesn't use vNMI (nested_vnmi_enabled() == false).

  In this case, vNMI settings just need to be copied from vmcb01 to vmcb02
  and vice versa during nested entry and exit.


  This means that nested_vmcb02_prepare_control in this case should copy
  all 3 bits from vmcb01 to vmcb02, and vice versa nested_svm_vmexit
  should copy them back.

  Currently I see no indication of this being done in this patch series.

  vmcb02 should indeed be used to read vnmi bits (like done above).


3. L1 uses vNMI, L2 uses vNMI:

  - First of all in this case all 3 vNMI bits should be copied from vmcb12
    to vmcb02 on nested entry and back on nested VM exit.

    I *think* this is done correctly in the patch 6, but I need to check again.

 
  - Second issue, depends on vNMI spec which we still don't have, and it
    relates to the fact on what to do if NMIs are not intercepted by
    the (nested) hypervisor, and L0 wants to inject an NMI

    (from L1 point of view it means that a 'real' NMI is about to be
    received while L2 is running).


    - If VNMI is not allowed to be enabled when NMIs are not intercepted,
      (vast majority of nested hypervisors will want to intercept real NMIs)
      then everything is fine -

      this means that if nested vNMI is enabled, then L1 will have
      to intercept 'real' NMIs, and thus L0 would be able to always
      inject 'real' NMIs while L2 is running by doing a VM exit to L1 without
      touching any vNMI state.

    - If the vNMI spec states that if vNMI is enabled, real NMIs
      are not intercepted and a real NMI is arriving, then the CPU
      will use vNMI state to handle it (that is it will set the 'pending'
      bit, then check if 'masked' bit is set, and if not, move pending to masked
      and deliver NMI to L2, in this case, it is indeed right to use vmcb02
      and keep on using VNMI for NMIs that are directed to L1,
      but I highly doubt that this is the case.


    - Most likely case - vNMI is allowed without NMI intercept,
      and a real NMI doesn't consult the vNMI bits, but rather uses the
      host's NMI masking. IRET doesn't affect the host's NMI masking either.


      In this case, when L0 wants to inject NMI to a nested guest
      that has vNMI enabled, and doesn't intercept NMIs, it
      has to:

      - still consult the vNMI pending/masked bits of *vmcb01*,
        to know if it can inject a NMI

      - if it can inject it, it should update *manually* the pending/masked bits
        of vmcb01 as well, so that L1's vNMI state remains consistent.

      - inject the NMI to L2, in the old fashioned way with EVENTINJ,
	or open NMI window by intercepting IRET if NMI is masked.


Best regards,
	Maxim Levitsky




> +
> +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> +{
> +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> +
> +	if (vmcb)
> +		return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> +	else
> +		return false;
> +}
> +
> +static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
> +{
> +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> +
> +	if (vmcb)
> +		return !!(vmcb->control.int_ctl & V_NMI_MASK);
> +	else
> +		return false;
> +}
> +
>  /* svm.c */
>  #define MSR_INVALID				0xffffffffU
>  




* Re: [PATCHv2 5/7] KVM: SVM: Add VNMI support in inject_nmi
  2022-07-09 13:42 ` [PATCHv2 5/7] KVM: SVM: Add VNMI support in inject_nmi Santosh Shukla
@ 2022-07-20 21:41   ` Sean Christopherson
  2022-07-20 22:46     ` Jim Mattson
  2022-07-29  6:06     ` Shukla, Santosh
  0 siblings, 2 replies; 29+ messages in thread
From: Sean Christopherson @ 2022-07-20 21:41 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel

On Sat, Jul 09, 2022, Santosh Shukla wrote:
> Inject the NMI by setting V_NMI in the VMCB interrupt control. processor
> will clear V_NMI to acknowledge processing has started and will keep the
> V_NMI_MASK set until the processor is done with processing the NMI event.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> ---
> v2:
> - Added WARN_ON check for vnmi pending.
> - use `get_vnmi_vmcb` to get correct vmcb so to inject vnmi.
> 
>  arch/x86/kvm/svm/svm.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 44c1f2317b45..c73a1809a7c7 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3375,12 +3375,20 @@ static void pre_svm_run(struct kvm_vcpu *vcpu)
>  static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct vmcb *vmcb = NULL;
> +
> +	++vcpu->stat.nmi_injections;
> +	if (is_vnmi_enabled(svm)) {
> +		vmcb = get_vnmi_vmcb(svm);
> +		WARN_ON(vmcb->control.int_ctl & V_NMI_PENDING);

Haven't read the spec, but based on the changelog I assume the flag doesn't get
cleared until the NMI is fully delivered.  That means this WARN will fire if a
VM-Exit occurs during delivery as KVM will re-inject the event, e.g. if KVM is
using shadow paging and a #PF handled by KVM occurs during delivery.

> +		vmcb->control.int_ctl |= V_NMI_PENDING;
> +		return;
> +	}
>  
>  	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
>  	vcpu->arch.hflags |= HF_NMI_MASK;
>  	if (!sev_es_guest(vcpu->kvm))
>  		svm_set_intercept(svm, INTERCEPT_IRET);
> -	++vcpu->stat.nmi_injections;
>  }
>  
>  static void svm_inject_irq(struct kvm_vcpu *vcpu)
> -- 
> 2.25.1
> 


* Re: [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  2022-07-09 13:42 ` [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI Santosh Shukla
@ 2022-07-20 21:54   ` Sean Christopherson
  2022-07-21 12:05     ` Maxim Levitsky
  2022-07-29  5:51     ` Shukla, Santosh
  0 siblings, 2 replies; 29+ messages in thread
From: Sean Christopherson @ 2022-07-20 21:54 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel

On Sat, Jul 09, 2022, Santosh Shukla wrote:
> In the VNMI case, Report NMI is not allowed when the processor set the
> V_NMI_MASK to 1 which means the Guest is busy handling VNMI.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> ---
> v2:
> - Moved vnmi check after is_guest_mode() in func _nmi_blocked().
> - Removed is_vnmi_mask_set check from _enable_nmi_window().
> as it was a redundent check.
> 
>  arch/x86/kvm/svm/svm.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3574e804d757..44c1f2317b45 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3480,6 +3480,9 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
>  	if (is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
>  		return false;
>  
> +	if (is_vnmi_enabled(svm) && is_vnmi_mask_set(svm))
> +		return true;
> +
>  	ret = (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
>  	      (vcpu->arch.hflags & HF_NMI_MASK);
>  
> @@ -3609,6 +3612,9 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> +	if (is_vnmi_enabled(svm))
> +		return;

Ugh, is there really no way to trigger an exit when NMIs become unmasked?  Because
if there isn't, this is broken for KVM.

On bare metal, if two NMIs arrive "simultaneously", so long as NMIs aren't blocked,
the first NMI will be delivered and the second will be pended, i.e. software will
see both NMIs.  And if that doesn't hold true, the window for a true collision is
really, really tiny.

But in KVM, because a vCPU may not be run for a long duration, that window becomes
very large.  To not drop NMIs and more faithfully emulate hardware, KVM allows two
NMIs to be _pending_.  And when that happens, KVM needs to trigger an exit when
NMIs become unmasked _after_ the first NMI is injected.
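Sean's dual-pending rule can be sketched as a standalone C model (the names are illustrative and this is not the actual arch/x86/kvm/x86.c code; it only models the accounting described above):

```c
#include <assert.h>

/*
 * Toy model of the dual-pending NMI accounting described above: KVM lets up
 * to two NMIs be pending so that simultaneous NMIs are not dropped while the
 * vCPU is descheduled.  Only one may be pending while another is masked or
 * being injected, and in that case KVM needs an exit when NMIs are unmasked.
 */
struct nmi_model {
	unsigned int nmi_queued;	/* NMIs posted to the vCPU */
	unsigned int nmi_pending;	/* NMIs accepted for injection */
	int nmi_injected;		/* an NMI is currently being delivered */
	int nmi_masked;			/* guest is inside an NMI handler */
};

static void model_process_nmi(struct nmi_model *m)
{
	/* At most two pending NMIs; only one if delivery is in flight. */
	unsigned int limit = (m->nmi_masked || m->nmi_injected) ? 1 : 2;

	m->nmi_pending += m->nmi_queued;
	m->nmi_queued = 0;
	if (m->nmi_pending > limit)
		m->nmi_pending = limit;
}

/* True when an unmasking exit is required to deliver the surplus NMI. */
static int model_needs_unmask_exit(const struct nmi_model *m)
{
	return m->nmi_masked && m->nmi_pending > 0;
}
```

With vNMI's single V_NMI_PENDING bit and no unmasking intercept, the `model_needs_unmask_exit()` condition has no hardware trigger, which is the gap being pointed out.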

> +
>  	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
>  		return; /* IRET will cause a vm exit */
>  
> -- 
> 2.25.1
> 


* Re: [PATCHv2 5/7] KVM: SVM: Add VNMI support in inject_nmi
  2022-07-20 21:41   ` Sean Christopherson
@ 2022-07-20 22:46     ` Jim Mattson
  2022-07-20 23:04       ` Sean Christopherson
  2022-07-29  6:06     ` Shukla, Santosh
  1 sibling, 1 reply; 29+ messages in thread
From: Jim Mattson @ 2022-07-20 22:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Santosh Shukla, Paolo Bonzini, Vitaly Kuznetsov, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel

On Wed, Jul 20, 2022 at 2:41 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Sat, Jul 09, 2022, Santosh Shukla wrote:
> > Inject the NMI by setting V_NMI in the VMCB interrupt control. The
> > processor will clear V_NMI to acknowledge that processing has started and
> > will keep V_NMI_MASK set until it is done processing the NMI event.
> >
> > Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> > ---
> > v2:
> > - Added WARN_ON check for vnmi pending.
> > - use `get_vnmi_vmcb` to get correct vmcb so to inject vnmi.
> >
> >  arch/x86/kvm/svm/svm.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 44c1f2317b45..c73a1809a7c7 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3375,12 +3375,20 @@ static void pre_svm_run(struct kvm_vcpu *vcpu)
> >  static void svm_inject_nmi(struct kvm_vcpu *vcpu)
> >  {
> >       struct vcpu_svm *svm = to_svm(vcpu);
> > +     struct vmcb *vmcb = NULL;
> > +
> > +     ++vcpu->stat.nmi_injections;
> > +     if (is_vnmi_enabled(svm)) {
> > +             vmcb = get_vnmi_vmcb(svm);
> > +             WARN_ON(vmcb->control.int_ctl & V_NMI_PENDING);
>
> Haven't read the spec, but based on the changelog I assume the flag doesn't get
> cleared until the NMI is fully delivered.

Ooh! Is there a spec to read now?!?


* Re: [PATCHv2 5/7] KVM: SVM: Add VNMI support in inject_nmi
  2022-07-20 22:46     ` Jim Mattson
@ 2022-07-20 23:04       ` Sean Christopherson
  0 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2022-07-20 23:04 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Santosh Shukla, Paolo Bonzini, Vitaly Kuznetsov, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel

On Wed, Jul 20, 2022, Jim Mattson wrote:
> On Wed, Jul 20, 2022 at 2:41 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Sat, Jul 09, 2022, Santosh Shukla wrote:
> > > Inject the NMI by setting V_NMI in the VMCB interrupt control. The
> > > processor will clear V_NMI to acknowledge that processing has started and
> > > will keep V_NMI_MASK set until it is done processing the NMI event.
> > >
> > > Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> > > ---
> > > v2:
> > > - Added WARN_ON check for vnmi pending.
> > > - use `get_vnmi_vmcb` to get correct vmcb so to inject vnmi.
> > >
> > >  arch/x86/kvm/svm/svm.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 44c1f2317b45..c73a1809a7c7 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -3375,12 +3375,20 @@ static void pre_svm_run(struct kvm_vcpu *vcpu)
> > >  static void svm_inject_nmi(struct kvm_vcpu *vcpu)
> > >  {
> > >       struct vcpu_svm *svm = to_svm(vcpu);
> > > +     struct vmcb *vmcb = NULL;
> > > +
> > > +     ++vcpu->stat.nmi_injections;
> > > +     if (is_vnmi_enabled(svm)) {
> > > +             vmcb = get_vnmi_vmcb(svm);
> > > +             WARN_ON(vmcb->control.int_ctl & V_NMI_PENDING);
> >
> > Haven't read the spec, but based on the changelog I assume the flag doesn't get
> > cleared until the NMI is fully delivered.
> 
> Ooh! Is there a spec to read now?!?

Apparently not.  I just assumed there's a spec, but I haven't found one.


* Re: [PATCHv2 3/7] KVM: SVM: Add VNMI support in get/set_nmi_mask
  2022-07-10 16:15   ` Maxim Levitsky
@ 2022-07-21  9:34     ` Shukla, Santosh
  2022-07-21 12:01       ` Maxim Levitsky
  0 siblings, 1 reply; 29+ messages in thread
From: Shukla, Santosh @ 2022-07-21  9:34 UTC (permalink / raw)
  To: Maxim Levitsky, Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Tom Lendacky, kvm, linux-kernel



On 7/10/2022 9:45 PM, Maxim Levitsky wrote:
> On Sat, 2022-07-09 at 19:12 +0530, Santosh Shukla wrote:
>> VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling
>> NMI in guest and is cleared after the NMI is handled. Treat V_NMI_MASK as
>> read-only in the hypervisor and do not populate set accessors.
>>
>> Add an API (get_vnmi_vmcb) to return the correct vmcb for L1 or
>> L2.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>> ---
>> v2:
>> - Added get_vnmi_vmcb API to return vmcb for l1 and l2.
>> - Use get_vnmi_vmcb to get correct vmcb in func -
>>   is_vnmi_enabled/_mask_set()
>> - removed vnmi check from is_vnmi_enabled() func.
>>
>>  arch/x86/kvm/svm/svm.c | 12 ++++++++++--
>>  arch/x86/kvm/svm/svm.h | 32 ++++++++++++++++++++++++++++++++
>>  2 files changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index baaf35be36e5..3574e804d757 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -198,7 +198,7 @@ module_param(dump_invalid_vmcb, bool, 0644);
>>  bool intercept_smi = true;
>>  module_param(intercept_smi, bool, 0444);
>>  
>> -static bool vnmi;
>> +bool vnmi = true;
>>  module_param(vnmi, bool, 0444);
>>  
>>  static bool svm_gp_erratum_intercept = true;
>> @@ -3503,13 +3503,21 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>>  
>>  static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
>>  {
>> -	return !!(vcpu->arch.hflags & HF_NMI_MASK);
>> +	struct vcpu_svm *svm = to_svm(vcpu);
>> +
>> +	if (is_vnmi_enabled(svm))
>> +		return is_vnmi_mask_set(svm);
>> +	else
>> +		return !!(vcpu->arch.hflags & HF_NMI_MASK);
>>  }
>>  
>>  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
>>  {
>>  	struct vcpu_svm *svm = to_svm(vcpu);
>>  
>> +	if (is_vnmi_enabled(svm))
>> +		return;
>> +
>>  	if (masked) {
>>  		vcpu->arch.hflags |= HF_NMI_MASK;
>>  		if (!sev_es_guest(vcpu->kvm))
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index 9223ac100ef5..f36e30df6202 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
>>  extern bool npt_enabled;
>>  extern int vgif;
>>  extern bool intercept_smi;
>> +extern bool vnmi;
>>  
>>  /*
>>   * Clean bits in VMCB.
>> @@ -509,6 +510,37 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
>>  	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
>>  }
>>  
>> +static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
>> +{
>> +	if (!vnmi)
>> +		return NULL;
>> +
>> +	if (is_guest_mode(&svm->vcpu))
>> +		return svm->nested.vmcb02.ptr;
>> +	else
>> +		return svm->vmcb01.ptr;
>> +}
> 
> This is better but still not enough to support nesting:
> 
> 
> Let me explain the cases that we need to cover:
> 
> 
> 1. non nested case, vmcb01 has all the VNMI settings,
> and I think it should work, but need to review the patches again.
> 
> 
> 
> 2. L1 uses vNMI, L2 doesn't use vNMI (nested_vnmi_enabled() == false).
> 
>   In this case, vNMI settings just need to be copied from vmcb01 to vmcb02
>   and vice versa during nested entry and exit.
> 
> 
>   This means that nested_vmcb02_prepare_control in this case should copy
>   all 3 bits from vmcb01 to vmcb02, and vice versa nested_svm_vmexit
>   should copy them back.
> 
>   Currently I see no indication of this being done in this patch series.
> 

Yes, thanks for pointing that out; addressed in the v3 series.
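The vmcb01 <-> vmcb02 propagation being requested could look roughly like this (bit positions are taken from the cover letter; the helper name and standalone form are illustrative, not the actual series code):

```c
#include <assert.h>
#include <stdint.h>

/* vNMI bits in VMCB int_ctl per the cover letter: V_NMI(11), V_NMI_MASK(12),
 * V_NMI_ENABLE(26). */
#define V_NMI_PENDING_BIT	(1u << 11)
#define V_NMI_MASK_BIT		(1u << 12)
#define V_NMI_ENABLE_BIT	(1u << 26)
#define V_NMI_BITS	(V_NMI_PENDING_BIT | V_NMI_MASK_BIT | V_NMI_ENABLE_BIT)

/*
 * Copy the three vNMI bits from one int_ctl value to another, leaving all
 * other int_ctl bits of the destination untouched.  On nested entry this
 * would run vmcb01 -> vmcb02, and on nested VM exit vmcb02 -> vmcb01.
 */
static uint32_t copy_vnmi_int_ctl(uint32_t src, uint32_t dst)
{
	return (dst & ~V_NMI_BITS) | (src & V_NMI_BITS);
}
```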

>   vmcb02 should indeed be used to read vnmi bits (like done above).
> 
> 
> 3. L1 uses vNMI, L2 uses vNMI:
> 
>   - First of all in this case all 3 vNMI bits should be copied from vmcb12
>     to vmcb02 on nested entry and back on nested VM exit.
> 
>     I *think* this is done correctly in the patch 6, but I need to check again.
> 
>  
>   - Second issue, depends on vNMI spec which we still don't have, and it
>     relates to the fact on what to do if NMIs are not intercepted by
>     the (nested) hypervisor, and L0 wants to inject an NMI
> 
>     (from L1 point of view it means that a 'real' NMI is about to be
>     received while L2 is running).
> 
> 
>     - If VNMI is not allowed to be enabled when NMIs are not intercepted,
>       (vast majority of nested hypervisors will want to intercept real NMIs)
>       then everything is fine -
> 
>       this means that if nested vNMI is enabled, then L1 will have
>       to intercept 'real' NMIs, and thus L0 would be able to always
>       inject 'real' NMIs while L2 is running by doing a VM exit to L1 without
>       touching any vNMI state.
> 
Yes. Enabling NMI virtualization requires the NMI intercept bit to be set.

>     - If the vNMI spec states that if vNMI is enabled, real NMIs
>       are not intercepted and a real NMI is arriving, then the CPU
>       will use vNMI state to handle it (that is it will set the 'pending'
>       bit, then check if 'masked' bit is set, and if not, move pending to masked
>       and deliver NMI to L2, in this case, it is indeed right to use vmcb02
>       and keep on using VNMI for NMIs that are directed to L1,
>       but I highly doubt that this is the case.
> 
> 
No.

>     - Most likely case - vNMI is allowed without NMI intercept,
>       and a real NMI doesn't consult the vNMI bits, but rather uses the host's
>       NMI masking. IRET doesn't affect the host's NMI masking either.
> 
>

No.

Thanks,
Santosh
 
>       In this case, when L0 wants to inject NMI to a nested guest
>       that has vNMI enabled, and doesn't intercept NMIs, it
>       has to:
> 
>       - still consult the vNMI pending/masked bits of *vmcb01*,
>         to know if it can inject a NMI
> 
>       - if it can inject it, it should update *manually* the pending/masked bits
>         of vmcb01 as well, so that L1's vNMI state remains consistent.
> 
>       - inject the NMI to L2, in the old fashioned way with EVENTINJ,
> 	or open NMI window by intercepting IRET if NMI is masked.
> 
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> 
> 
>> +
>> +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
>> +{
>> +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
>> +
>> +	if (vmcb)
>> +		return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
>> +	else
>> +		return false;
>> +}
>> +
>> +static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
>> +{
>> +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
>> +
>> +	if (vmcb)
>> +		return !!(vmcb->control.int_ctl & V_NMI_MASK);
>> +	else
>> +		return false;
>> +}
>> +
>>  /* svm.c */
>>  #define MSR_INVALID				0xffffffffU
>>  
> 
> 


* Re: [PATCHv2 3/7] KVM: SVM: Add VNMI support in get/set_nmi_mask
  2022-07-21  9:34     ` Shukla, Santosh
@ 2022-07-21 12:01       ` Maxim Levitsky
  2022-07-21 13:12         ` Shukla, Santosh
  0 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2022-07-21 12:01 UTC (permalink / raw)
  To: Shukla, Santosh, Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Tom Lendacky, kvm, linux-kernel

On Thu, 2022-07-21 at 15:04 +0530, Shukla, Santosh wrote:
> 
> On 7/10/2022 9:45 PM, Maxim Levitsky wrote:
> > On Sat, 2022-07-09 at 19:12 +0530, Santosh Shukla wrote:
> > > VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling
> > > NMI in guest and is cleared after the NMI is handled. Treat V_NMI_MASK as
> > > read-only in the hypervisor and do not populate set accessors.
> > > 
> > > Adding API(get_vnmi_vmcb) in order to return the correct vmcb for L1 or
> > > L2.
> > > 
> > > Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> > > ---
> > > v2:
> > > - Added get_vnmi_vmcb API to return vmcb for l1 and l2.
> > > - Use get_vnmi_vmcb to get correct vmcb in func -
> > >   is_vnmi_enabled/_mask_set()
> > > - removed vnmi check from is_vnmi_enabled() func.
> > > 
> > >  arch/x86/kvm/svm/svm.c | 12 ++++++++++--
> > >  arch/x86/kvm/svm/svm.h | 32 ++++++++++++++++++++++++++++++++
> > >  2 files changed, 42 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index baaf35be36e5..3574e804d757 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -198,7 +198,7 @@ module_param(dump_invalid_vmcb, bool, 0644);
> > >  bool intercept_smi = true;
> > >  module_param(intercept_smi, bool, 0444);
> > >  
> > > -static bool vnmi;
> > > +bool vnmi = true;
> > >  module_param(vnmi, bool, 0444);
> > >  
> > >  static bool svm_gp_erratum_intercept = true;
> > > @@ -3503,13 +3503,21 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> > >  
> > >  static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> > >  {
> > > -	return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > > +	struct vcpu_svm *svm = to_svm(vcpu);
> > > +
> > > +	if (is_vnmi_enabled(svm))
> > > +		return is_vnmi_mask_set(svm);
> > > +	else
> > > +		return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > >  }
> > >  
> > >  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> > >  {
> > >  	struct vcpu_svm *svm = to_svm(vcpu);
> > >  
> > > +	if (is_vnmi_enabled(svm))
> > > +		return;
> > > +
> > >  	if (masked) {
> > >  		vcpu->arch.hflags |= HF_NMI_MASK;
> > >  		if (!sev_es_guest(vcpu->kvm))
> > > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > > index 9223ac100ef5..f36e30df6202 100644
> > > --- a/arch/x86/kvm/svm/svm.h
> > > +++ b/arch/x86/kvm/svm/svm.h
> > > @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> > >  extern bool npt_enabled;
> > >  extern int vgif;
> > >  extern bool intercept_smi;
> > > +extern bool vnmi;
> > >  
> > >  /*
> > >   * Clean bits in VMCB.
> > > @@ -509,6 +510,37 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
> > >  	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > >  }
> > >  
> > > +static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
> > > +{
> > > +	if (!vnmi)
> > > +		return NULL;
> > > +
> > > +	if (is_guest_mode(&svm->vcpu))
> > > +		return svm->nested.vmcb02.ptr;
> > > +	else
> > > +		return svm->vmcb01.ptr;
> > > +}
> > 
> > This is better but still not enough to support nesting:
> > 
> > 
> > Let me explain the cases that we need to cover:
> > 
> > 
> > 1. non nested case, vmcb01 has all the VNMI settings,
> > and I think it should work, but need to review the patches again.
> > 
> > 
> > 
> > 2. L1 uses vNMI, L2 doesn't use vNMI (nested_vnmi_enabled() == false).
> > 
> >   In this case, vNMI settings just need to be copied from vmcb01 to vmcb02
> >   and vice versa during nested entry and exit.
> > 
> > 
> >   This means that nested_vmcb02_prepare_control in this case should copy
> >   all 3 bits from vmcb01 to vmcb02, and vice versa nested_svm_vmexit
> >   should copy them back.
> > 
> >   Currently I see no indication of this being done in this patch series.
> > 
> 
> Yes, Thanks for pointing out, in v3 series.
> 
> >   vmcb02 should indeed be used to read vnmi bits (like done above).
> > 
> > 
> > 3. L1 uses vNMI, L2 uses vNMI:
> > 
> >   - First of all in this case all 3 vNMI bits should be copied from vmcb12
> >     to vmcb02 on nested entry and back on nested VM exit.
> > 
> >     I *think* this is done correctly in the patch 6, but I need to check again.
> > 
> >  
> >   - Second issue, depends on vNMI spec which we still don't have, and it
> >     relates to the fact on what to do if NMIs are not intercepted by
> >     the (nested) hypervisor, and L0 wants to inject an NMI
> > 
> >     (from L1 point of view it means that a 'real' NMI is about to be
> >     received while L2 is running).
> > 
> > 
> >     - If VNMI is not allowed to be enabled when NMIs are not intercepted,
> >       (vast majority of nested hypervisors will want to intercept real NMIs)
> >       then everything is fine -
> > 
> >       this means that if nested vNMI is enabled, then L1 will have
> >       to intercept 'real' NMIs, and thus L0 would be able to always
> >       inject 'real' NMIs while L2 is running by doing a VM exit to L1 without
> >       touching any vNMI state.
> > 
> Yes. Enabling NMI virtualization requires the NMI intercept bit to be set.

That is very good news.

What would happen though if the guest doesn't intercept NMI,
and still tries to enable vNMI? 

Failed VM entry or vNMI ignored?

This matters for nested because nested must work the same as real hardware.

In either of the cases some code is needed to emulate this correctly in the nested
virtualization code in KVM, but the patches have none.

Best regards,
	Maxim Levitsky


> 
> >     - If the vNMI spec states that if vNMI is enabled, real NMIs
> >       are not intercepted and a real NMI is arriving, then the CPU
> >       will use vNMI state to handle it (that is it will set the 'pending'
> >       bit, then check if 'masked' bit is set, and if not, move pending to masked
> >       and deliver NMI to L2, in this case, it is indeed right to use vmcb02
> >       and keep on using VNMI for NMIs that are directed to L1,
> >       but I highly doubt that this is the case.
> > 
> > 
> No.
> 
> >     - Most likely case - vNMI is allowed without NMI intercept,
> >       and a real NMI doesn't consult the vNMI bits, but rather uses the host's
> >       NMI masking. IRET doesn't affect the host's NMI masking either.
> > 
> > 
> 
> No.
> 
> Thanks,
> Santosh
>  
> >       In this case, when L0 wants to inject NMI to a nested guest
> >       that has vNMI enabled, and doesn't intercept NMIs, it
> >       has to:
> > 
> >       - still consult the vNMI pending/masked bits of *vmcb01*,
> >         to know if it can inject a NMI
> > 
> >       - if it can inject it, it should update *manually* the pending/masked bits
> >         of vmcb01 as well, so that L1's vNMI state remains consistent.
> > 
> >       - inject the NMI to L2, in the old fashioned way with EVENTINJ,
> > 	or open NMI window by intercepting IRET if NMI is masked.
> > 
> > 
> > Best regards,
> > 	Maxim Levitsky
> > 
> > 
> > 
> > 
> > > +
> > > +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> > > +{
> > > +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > > +
> > > +	if (vmcb)
> > > +		return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> > > +	else
> > > +		return false;
> > > +}
> > > +
> > > +static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
> > > +{
> > > +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > > +
> > > +	if (vmcb)
> > > +		return !!(vmcb->control.int_ctl & V_NMI_MASK);
> > > +	else
> > > +		return false;
> > > +}
> > > +
> > >  /* svm.c */
> > >  #define MSR_INVALID				0xffffffffU
> > >  




* Re: [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  2022-07-20 21:54   ` Sean Christopherson
@ 2022-07-21 12:05     ` Maxim Levitsky
  2022-07-21 14:59       ` Sean Christopherson
  2022-07-29  5:51     ` Shukla, Santosh
  1 sibling, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2022-07-21 12:05 UTC (permalink / raw)
  To: Sean Christopherson, Santosh Shukla
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Tom Lendacky, kvm, linux-kernel

On Wed, 2022-07-20 at 21:54 +0000, Sean Christopherson wrote:
> On Sat, Jul 09, 2022, Santosh Shukla wrote:
> > In the VNMI case, report that NMI injection is not allowed when the processor
> > sets V_NMI_MASK to 1, which means the guest is busy handling a VNMI.
> > 
> > Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> > ---
> > v2:
> > - Moved vnmi check after is_guest_mode() in func _nmi_blocked().
> > - Removed is_vnmi_mask_set check from _enable_nmi_window().
> > as it was a redundant check.
> > 
> >  arch/x86/kvm/svm/svm.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 3574e804d757..44c1f2317b45 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3480,6 +3480,9 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
> >  	if (is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
> >  		return false;
> >  
> > +	if (is_vnmi_enabled(svm) && is_vnmi_mask_set(svm))
> > +		return true;
> > +
> >  	ret = (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
> >  	      (vcpu->arch.hflags & HF_NMI_MASK);
> >  
> > @@ -3609,6 +3612,9 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> >  
> > +	if (is_vnmi_enabled(svm))
> > +		return;
> 
> Ugh, is there really no way to trigger an exit when NMIs become unmasked?  Because
> if there isn't, this is broken for KVM.
> 
> On bare metal, if two NMIs arrive "simultaneously", so long as NMIs aren't blocked,
> the first NMI will be delivered and the second will be pended, i.e. software will
> see both NMIs.  And if that doesn't hold true, the window for a true collision is
> really, really tiny.
> 
> But in KVM, because a vCPU may not be run a long duration, that window becomes
> very large.  To not drop NMIs and more faithfully emulate hardware, KVM allows two
> NMIs to be _pending_.  And when that happens, KVM needs to trigger an exit when
> NMIs become unmasked _after_ the first NMI is injected.


This is how I see this:

- When an NMI arrives and no NMI is injected (V_NMI_PENDING) or in service (V_NMI_MASK),
  then all that is needed to inject the NMI is to set the V_NMI_PENDING bit and do a VM entry.

- If V_NMI_PENDING is set but not V_NMI_MASK, and another NMI arrives, we can make
  svm_nmi_allowed return -EBUSY, which will cause an immediate VM exit,

  and if, hopefully, vNMI takes priority over the fake interrupt we raise, it will be injected,
  and upon the immediate VM exit we can inject another NMI by setting V_NMI_PENDING again,
  and later, when the guest is done with the first NMI, it will take the second.

  Of course if we get a nested exception, then it will be fun....

  (the patches don't do this (cause an immediate VM exit),
  but I think we should make svm_nmi_allowed check for the
  V_NMI_PENDING && !V_NMI_MASK case and make it return -EBUSY).


- If both V_NMI_PENDING and V_NMI_MASK are set, then I guess we lose an NMI.
 (It means that the guest is handling an NMI, there is a pending NMI, and now
 another NMI arrived)

 Sean, this is the problem you mention, right?
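The flow above, reduced to a standalone decision function (return values and names are illustrative; this is a model, not the actual svm_nmi_allowed):

```c
#include <assert.h>

enum nmi_decision {
	NMI_ALLOWED,	/* set V_NMI_PENDING and enter the guest */
	NMI_BLOCKED,	/* guest is in an NMI handler; the NMI stays pending */
	NMI_EBUSY	/* request an immediate VM exit, inject afterwards */
};

/*
 * Model of the injection decision described above: a second NMI arriving
 * while one is pending but not yet in service should force an immediate
 * exit so it can be injected right after the first one is taken.
 */
static enum nmi_decision model_vnmi_allowed(int v_nmi_pending, int v_nmi_masked)
{
	if (v_nmi_masked)
		return NMI_BLOCKED;
	if (v_nmi_pending)
		return NMI_EBUSY;
	return NMI_ALLOWED;
}
```

The pending-and-masked case returns NMI_BLOCKED here; as noted above, with only one pending slot in hardware that is exactly where an NMI can be lost.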

Best regards,
	Maxim Levitsky

> 
> > +
> >  	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
> >  		return; /* IRET will cause a vm exit */
> >  
> > -- 
> > 2.25.1
> > 




* Re: [PATCHv2 3/7] KVM: SVM: Add VNMI support in get/set_nmi_mask
  2022-07-21 12:01       ` Maxim Levitsky
@ 2022-07-21 13:12         ` Shukla, Santosh
  2022-07-21 15:48           ` Maxim Levitsky
  0 siblings, 1 reply; 29+ messages in thread
From: Shukla, Santosh @ 2022-07-21 13:12 UTC (permalink / raw)
  To: Maxim Levitsky, Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Tom Lendacky, kvm, linux-kernel



On 7/21/2022 5:31 PM, Maxim Levitsky wrote:
> On Thu, 2022-07-21 at 15:04 +0530, Shukla, Santosh wrote:
>>
>> On 7/10/2022 9:45 PM, Maxim Levitsky wrote:
>>> On Sat, 2022-07-09 at 19:12 +0530, Santosh Shukla wrote:
>>>> VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling
>>>> NMI in guest and is cleared after the NMI is handled. Treat V_NMI_MASK as
>>>> read-only in the hypervisor and do not populate set accessors.
>>>>
>>>> Adding API(get_vnmi_vmcb) in order to return the correct vmcb for L1 or
>>>> L2.
>>>>
>>>> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>>>> ---
>>>> v2:
>>>> - Added get_vnmi_vmcb API to return vmcb for l1 and l2.
>>>> - Use get_vnmi_vmcb to get correct vmcb in func -
>>>>   is_vnmi_enabled/_mask_set()
>>>> - removed vnmi check from is_vnmi_enabled() func.
>>>>
>>>>  arch/x86/kvm/svm/svm.c | 12 ++++++++++--
>>>>  arch/x86/kvm/svm/svm.h | 32 ++++++++++++++++++++++++++++++++
>>>>  2 files changed, 42 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>> index baaf35be36e5..3574e804d757 100644
>>>> --- a/arch/x86/kvm/svm/svm.c
>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>> @@ -198,7 +198,7 @@ module_param(dump_invalid_vmcb, bool, 0644);
>>>>  bool intercept_smi = true;
>>>>  module_param(intercept_smi, bool, 0444);
>>>>  
>>>> -static bool vnmi;
>>>> +bool vnmi = true;
>>>>  module_param(vnmi, bool, 0444);
>>>>  
>>>>  static bool svm_gp_erratum_intercept = true;
>>>> @@ -3503,13 +3503,21 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>>>>  
>>>>  static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
>>>>  {
>>>> -	return !!(vcpu->arch.hflags & HF_NMI_MASK);
>>>> +	struct vcpu_svm *svm = to_svm(vcpu);
>>>> +
>>>> +	if (is_vnmi_enabled(svm))
>>>> +		return is_vnmi_mask_set(svm);
>>>> +	else
>>>> +		return !!(vcpu->arch.hflags & HF_NMI_MASK);
>>>>  }
>>>>  
>>>>  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
>>>>  {
>>>>  	struct vcpu_svm *svm = to_svm(vcpu);
>>>>  
>>>> +	if (is_vnmi_enabled(svm))
>>>> +		return;
>>>> +
>>>>  	if (masked) {
>>>>  		vcpu->arch.hflags |= HF_NMI_MASK;
>>>>  		if (!sev_es_guest(vcpu->kvm))
>>>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>>>> index 9223ac100ef5..f36e30df6202 100644
>>>> --- a/arch/x86/kvm/svm/svm.h
>>>> +++ b/arch/x86/kvm/svm/svm.h
>>>> @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
>>>>  extern bool npt_enabled;
>>>>  extern int vgif;
>>>>  extern bool intercept_smi;
>>>> +extern bool vnmi;
>>>>  
>>>>  /*
>>>>   * Clean bits in VMCB.
>>>> @@ -509,6 +510,37 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
>>>>  	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
>>>>  }
>>>>  
>>>> +static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
>>>> +{
>>>> +	if (!vnmi)
>>>> +		return NULL;
>>>> +
>>>> +	if (is_guest_mode(&svm->vcpu))
>>>> +		return svm->nested.vmcb02.ptr;
>>>> +	else
>>>> +		return svm->vmcb01.ptr;
>>>> +}
>>>
>>> This is better but still not enough to support nesting:
>>>
>>>
>>> Let me explain the cases that we need to cover:
>>>
>>>
>>> 1. non nested case, vmcb01 has all the VNMI settings,
>>> and I think it should work, but need to review the patches again.
>>>
>>>
>>>
>>> 2. L1 uses vNMI, L2 doesn't use vNMI (nested_vnmi_enabled() == false).
>>>
>>>   In this case, vNMI settings just need to be copied from vmcb01 to vmcb02
>>>   and vice versa during nested entry and exit.
>>>
>>>
>>>   This means that nested_vmcb02_prepare_control in this case should copy
>>>   all 3 bits from vmcb01 to vmcb02, and vice versa nested_svm_vmexit
>>>   should copy them back.
>>>
>>>   Currently I see no indication of this being done in this patch series.
>>>
>>
>> Yes, Thanks for pointing out, in v3 series.
>>
>>>   vmcb02 should indeed be used to read vnmi bits (like done above).
>>>
>>>
>>> 3. L1 uses vNMI, L2 uses vNMI:
>>>
>>>   - First of all in this case all 3 vNMI bits should be copied from vmcb12
>>>     to vmcb02 on nested entry and back on nested VM exit.
>>>
>>>     I *think* this is done correctly in the patch 6, but I need to check again.
>>>
>>>  
>>>   - Second issue, depends on vNMI spec which we still don't have, and it
>>>     relates to the fact on what to do if NMIs are not intercepted by
>>>     the (nested) hypervisor, and L0 wants to inject an NMI
>>>
>>>     (from L1 point of view it means that a 'real' NMI is about to be
>>>     received while L2 is running).
>>>
>>>
>>>     - If VNMI is not allowed to be enabled when NMIs are not intercepted,
>>>       (vast majority of nested hypervisors will want to intercept real NMIs)
>>>       then everything is fine -
>>>
>>>       this means that if nested vNMI is enabled, then L1 will have
>>>       to intercept 'real' NMIs, and thus L0 would be able to always
>>>       inject 'real' NMIs while L2 is running by doing a VM exit to L1 without
>>>       touching any vNMI state.
>>>
>> Yes. Enabling NMI virtualization requires the NMI intercept bit to be set.
> 
> That is very good news.
> 
> What would happen though if the guest doesn't intercept NMI,
> and still tries to enable vNMI? 
> 
> Failed VM entry or vNMI ignored?
> 

VMEXIT_INVALID.
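That answer implies a consistency check on nested VMRUN along these lines (the intercept-bit position and function name are hypothetical; only the V_NMI_ENABLE position comes from the cover letter):

```c
#include <assert.h>
#include <stdint.h>

#define V_NMI_ENABLE_BIT	(1u << 26)	/* int_ctl, per the cover letter */
#define NMI_INTERCEPT_BIT	(1u << 1)	/* hypothetical bit position */

/*
 * Sketch: enabling NMI virtualization without intercepting real NMIs is an
 * invalid guest state, so VMRUN of such a VMCB must fail with VMEXIT_INVALID.
 */
static int vnmi_ctl_valid(uint32_t int_ctl, uint32_t intercepts)
{
	if ((int_ctl & V_NMI_ENABLE_BIT) && !(intercepts & NMI_INTERCEPT_BIT))
		return 0;	/* VMEXIT_INVALID */
	return 1;
}
```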

> This matters for nested because nested must work the same as real hardware.
> 
> In either of the cases some code is needed to emulate this correctly in the nested
> virtualization code in KVM, but the patches have none.
>

Yes, in v3.

Thanks,
Santosh
 
> Best regards,
> 	Maxim Levitsky
> 
> 
>>
>>>     - If the vNMI spec states that if vNMI is enabled, real NMIs
>>>       are not intercepted and a real NMI is arriving, then the CPU
>>>       will use vNMI state to handle it (that is it will set the 'pending'
>>>       bit, then check if 'masked' bit is set, and if not, move pending to masked
>>>       and deliver NMI to L2, in this case, it is indeed right to use vmcb02
>>>       and keep on using VNMI for NMIs that are directed to L1,
>>>       but I highly doubt that this is the case.
>>>
>>>
>> No.
>>
>>>     - Most likely case - vNMI is allowed without NMI intercept,
>>>       and a real NMI doesn't consult the vNMI bits, but rather uses the host's
>>>       NMI masking. IRET doesn't affect the host's NMI masking either.
>>>
>>>
>>
>> No.
>>
>> Thanks,
>> Santosh
>>  
>>>       In this case, when L0 wants to inject NMI to a nested guest
>>>       that has vNMI enabled, and doesn't intercept NMIs, it
>>>       has to:
>>>
>>>       - still consult the vNMI pending/masked bits of *vmcb01*,
>>>         to know if it can inject a NMI
>>>
>>>       - if it can inject it, it should update *manually* the pending/masked bits
>>>         of vmcb01 as well, so that L1's vNMI state remains consistent.
>>>
>>>       - inject the NMI to L2, in the old fashioned way with EVENTINJ,
>>> 	or open NMI window by intercepting IRET if NMI is masked.
>>>
>>>
>>> Best regards,
>>> 	Maxim Levitsky
>>>
>>>
>>>
>>>
>>>> +
>>>> +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
>>>> +{
>>>> +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
>>>> +
>>>> +	if (vmcb)
>>>> +		return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
>>>> +	else
>>>> +		return false;
>>>> +}
>>>> +
>>>> +static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
>>>> +{
>>>> +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
>>>> +
>>>> +	if (vmcb)
>>>> +		return !!(vmcb->control.int_ctl & V_NMI_MASK);
>>>> +	else
>>>> +		return false;
>>>> +}
>>>> +
>>>>  /* svm.c */
>>>>  #define MSR_INVALID				0xffffffffU
>>>>  
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  2022-07-21 12:05     ` Maxim Levitsky
@ 2022-07-21 14:59       ` Sean Christopherson
  2022-07-21 15:31         ` Maxim Levitsky
  0 siblings, 1 reply; 29+ messages in thread
From: Sean Christopherson @ 2022-07-21 14:59 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Santosh Shukla, Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, Tom Lendacky, kvm, linux-kernel

On Thu, Jul 21, 2022, Maxim Levitsky wrote:
> On Wed, 2022-07-20 at 21:54 +0000, Sean Christopherson wrote:
> > On Sat, Jul 09, 2022, Santosh Shukla wrote:
> > > @@ -3609,6 +3612,9 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> > >  {
> > >  	struct vcpu_svm *svm = to_svm(vcpu);
> > >  
> > > +	if (is_vnmi_enabled(svm))
> > > +		return;
> > 
> > Ugh, is there really no way to trigger an exit when NMIs become unmasked?  Because
> > if there isn't, this is broken for KVM.
> > 
> > On bare metal, if two NMIs arrive "simultaneously", so long as NMIs aren't blocked,
> > the first NMI will be delivered and the second will be pended, i.e. software will
> > see both NMIs.  And if that doesn't hold true, the window for a true collision is
> > really, really tiny.
> > 
> > But in KVM, because a vCPU may not be run for a long duration, that window becomes
> > very large.  To not drop NMIs and more faithfully emulate hardware, KVM allows two
> > NMIs to be _pending_.  And when that happens, KVM needs to trigger an exit when
> > NMIs become unmasked _after_ the first NMI is injected.
> 
> 
> This is how I see this:
> 
> - When an NMI arrives and an NMI is neither injected (V_NMI_PENDING) nor in service (V_NMI_MASK),
>   then all that is needed to inject the NMI is to set the V_NMI_PENDING bit and do a VM entry.
> 
> - If V_NMI_PENDING is set but not V_NMI_MASK, and another NMI arrives, we can make
>   svm_nmi_allowed return -EBUSY, which will cause an immediate VM exit,
> 
>   and if hopefully vNMI takes priority over the fake interrupt we raise, it will be injected,

Nit (for other readers following along), it's not a fake interrupt, I would describe
it as spurious or ignored.  It's very much a real IRQ, which matters because it
factors into event priority.

>   and upon immediate VM exit we can inject another NMI by setting the V_NMI_PENDING again,
>   and later when the guest is done with first NMI, it will take the second.

Yeaaaah.  This depends heavily on the vNMI being prioritized over the IRQ.

>   Of course if we get a nested exception, then it will be fun....
> 
>   (the patches don't do it (causing an immediate VM exit),
>   but I think we should make svm_nmi_allowed check for the case of
>   V_NMI_PENDING && !V_NMI_MASK and make it return -EBUSY).

Yep, though I think there's a wrinkle (see below).

> - If both V_NMI_PENDING and V_NMI_MASK are set, then I guess we lose an NMI.
>  (It means that the guest is handling an NMI, there is a pending NMI, and now
>  another NMI arrived)
> 
>  Sean, this is the problem you mention, right?

Yep.  Dropping an NMI in the last case is ok, AFAIK no CPU will pend multiple NMIs
while another is in-flight.  But triggering an immediate exit in svm_nmi_allowed()
will hang the vCPU as the second pending NMI will never go away since the vCPU
won't make forward progress to unmask NMIs.  This can also happen if there are
two pending NMIs and GIF=0, i.e. any time there are multiple pending NMIs and NMIs
are blocked.

One other question: what happens if software atomically sets V_NMI_PENDING while
the VMCB is in use?  I assume bad things?  I.e. I assume KVM can't "post" NMIs :-)


* Re: [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  2022-07-21 14:59       ` Sean Christopherson
@ 2022-07-21 15:31         ` Maxim Levitsky
  2022-07-21 16:08           ` Sean Christopherson
  0 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2022-07-21 15:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Santosh Shukla, Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, Tom Lendacky, kvm, linux-kernel

On Thu, 2022-07-21 at 14:59 +0000, Sean Christopherson wrote:
> On Thu, Jul 21, 2022, Maxim Levitsky wrote:
> > On Wed, 2022-07-20 at 21:54 +0000, Sean Christopherson wrote:
> > > On Sat, Jul 09, 2022, Santosh Shukla wrote:
> > > > @@ -3609,6 +3612,9 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> > > >  {
> > > >  	struct vcpu_svm *svm = to_svm(vcpu);
> > > >  
> > > > +	if (is_vnmi_enabled(svm))
> > > > +		return;
> > > 
> > > Ugh, is there really no way to trigger an exit when NMIs become unmasked?  Because
> > > if there isn't, this is broken for KVM.
> > > 
> > > On bare metal, if two NMIs arrive "simultaneously", so long as NMIs aren't blocked,
> > > the first NMI will be delivered and the second will be pended, i.e. software will
> > > see both NMIs.  And if that doesn't hold true, the window for a true collision is
> > > really, really tiny.
> > > 
> > > But in KVM, because a vCPU may not be run for a long duration, that window becomes
> > > very large.  To not drop NMIs and more faithfully emulate hardware, KVM allows two
> > > NMIs to be _pending_.  And when that happens, KVM needs to trigger an exit when
> > > NMIs become unmasked _after_ the first NMI is injected.
> > 
> > This is how I see this:
> > 
> > - When an NMI arrives and an NMI is neither injected (V_NMI_PENDING) nor in service (V_NMI_MASK),
> >   then all that is needed to inject the NMI is to set the V_NMI_PENDING bit and do a VM entry.
> > 
> > - If V_NMI_PENDING is set but not V_NMI_MASK, and another NMI arrives, we can make
> >   svm_nmi_allowed return -EBUSY, which will cause an immediate VM exit,
> > 
> >   and if hopefully vNMI takes priority over the fake interrupt we raise, it will be injected,
> 
> Nit (for other readers following along), it's not a fake interrupt, I would describe
> it as spurious or ignored.  It's very much a real IRQ, which matters because it
> factors into event priority.

Yep, 100% agree.


> 
> >   and upon immediate VM exit we can inject another NMI by setting the V_NMI_PENDING again,
> >   and later when the guest is done with first NMI, it will take the second.
> 
> Yeaaaah.  This depends heavily on the vNMI being prioritized over the IRQ.
> 
> >   Of course if we get a nested exception, then it will be fun....
> > 
> >   (the patches don't do it (causing an immediate VM exit),
> >   but I think we should make svm_nmi_allowed check for the case of
> >   V_NMI_PENDING && !V_NMI_MASK and make it return -EBUSY).
> 
> Yep, though I think there's a wrinkle (see below).
> 
> > - If both V_NMI_PENDING and V_NMI_MASK are set, then I guess we lose an NMI.
> >  (It means that the guest is handling an NMI, there is a pending NMI, and now
> >  another NMI arrived)
> > 
> >  Sean, this is the problem you mention, right?
> 
> Yep.  Dropping an NMI in the last case is ok, AFAIK no CPU will pend multiple NMIs
> while another is in-flight.  But triggering an immediate exit in svm_nmi_allowed()
> will hang the vCPU as the second pending NMI will never go away since the vCPU

The idea is to trigger the immediate exit only when a NMI was just injected (V_NMI_PENDING=1)
but not masked (that is currently in service, that is V_NMI_MASK=0).

In case both bits are set, the NMI is dropped, that is no immediate exit is requested.

In this case, next VM entry should have no reason to not inject the NMI and then VM exit
on the interrupt we raised, so there should not be a problem with forward progress.

There is an issue still, the NMI could also be masked if we are in SMM (I suggested
setting the V_NMI_MASK manually in this case), thus in this case we won't have more
than one pending NMI, but I guess this is not that big a problem.

We can btw also in this case "open" the NMI window by waiting for RSM intercept.
(that is just not inject the NMI, and on RSM inject it, I think that KVM already does this)

I think it should overall work, but no doubt I do expect issues and corner cases.


> won't make forward progress to unmask NMIs.  This can also happen if there are
> two pending NMIs and GIF=0, i.e. any time there are multiple pending NMIs and NMIs
> are blocked.

GIF=0 can be dealt with though, if GIF is 0 when 2nd pending NMI arrives, we can
delay its injection to the moment the STGI is executed and intercept STGI.

I think we already do something like that as well.

> 
> One other question: what happens if software atomically sets V_NMI_PENDING while
> the VMCB is in use?  I assume bad things?  I.e. I assume KVM can't "post" NMIs :-)

I can't answer that but I bet that this bit is only checked on VM entry.


Another question about the spec (sorry if I already asked this):

if vGIF is used and the guest does CLGI, does the CPU set V_NMI_MASK
itself, or does the CPU consider both V_GIF and V_NMI_MASK in int_ctl
as conditions for delivering the NMI? I think that the latter should be true,
and thus V_NMI_MASK is more like V_NMI_IN_SERVICE.

Best regards,
	Maxim Levitsky

> 




* Re: [PATCHv2 3/7] KVM: SVM: Add VNMI support in get/set_nmi_mask
  2022-07-21 13:12         ` Shukla, Santosh
@ 2022-07-21 15:48           ` Maxim Levitsky
  0 siblings, 0 replies; 29+ messages in thread
From: Maxim Levitsky @ 2022-07-21 15:48 UTC (permalink / raw)
  To: Shukla, Santosh, Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Tom Lendacky, kvm, linux-kernel

On Thu, 2022-07-21 at 18:42 +0530, Shukla, Santosh wrote:
> 
> On 7/21/2022 5:31 PM, Maxim Levitsky wrote:
> > On Thu, 2022-07-21 at 15:04 +0530, Shukla, Santosh wrote:
> > > On 7/10/2022 9:45 PM, Maxim Levitsky wrote:
> > > > On Sat, 2022-07-09 at 19:12 +0530, Santosh Shukla wrote:
> > > > > VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling an
> > > > > NMI in the guest and is cleared after the NMI is handled. Treat V_NMI_MASK as
> > > > > read-only in the hypervisor and do not populate set accessors.
> > > > > 
> > > > > Add an API (get_vnmi_vmcb) to return the correct vmcb for L1 or
> > > > > L2.
> > > > > 
> > > > > Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> > > > > ---
> > > > > v2:
> > > > > - Added get_vnmi_vmcb API to return vmcb for l1 and l2.
> > > > > - Use get_vnmi_vmcb to get correct vmcb in func -
> > > > >   is_vnmi_enabled/_mask_set()
> > > > > - removed vnmi check from is_vnmi_enabled() func.
> > > > > 
> > > > >  arch/x86/kvm/svm/svm.c | 12 ++++++++++--
> > > > >  arch/x86/kvm/svm/svm.h | 32 ++++++++++++++++++++++++++++++++
> > > > >  2 files changed, 42 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > > index baaf35be36e5..3574e804d757 100644
> > > > > --- a/arch/x86/kvm/svm/svm.c
> > > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > > @@ -198,7 +198,7 @@ module_param(dump_invalid_vmcb, bool, 0644);
> > > > >  bool intercept_smi = true;
> > > > >  module_param(intercept_smi, bool, 0444);
> > > > >  
> > > > > -static bool vnmi;
> > > > > +bool vnmi = true;
> > > > >  module_param(vnmi, bool, 0444);
> > > > >  
> > > > >  static bool svm_gp_erratum_intercept = true;
> > > > > @@ -3503,13 +3503,21 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> > > > >  
> > > > >  static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> > > > >  {
> > > > > -	return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > > > > +	struct vcpu_svm *svm = to_svm(vcpu);
> > > > > +
> > > > > +	if (is_vnmi_enabled(svm))
> > > > > +		return is_vnmi_mask_set(svm);
> > > > > +	else
> > > > > +		return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > > > >  }
> > > > >  
> > > > >  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> > > > >  {
> > > > >  	struct vcpu_svm *svm = to_svm(vcpu);
> > > > >  
> > > > > +	if (is_vnmi_enabled(svm))
> > > > > +		return;
> > > > > +
> > > > >  	if (masked) {
> > > > >  		vcpu->arch.hflags |= HF_NMI_MASK;
> > > > >  		if (!sev_es_guest(vcpu->kvm))
> > > > > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > > > > index 9223ac100ef5..f36e30df6202 100644
> > > > > --- a/arch/x86/kvm/svm/svm.h
> > > > > +++ b/arch/x86/kvm/svm/svm.h
> > > > > @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> > > > >  extern bool npt_enabled;
> > > > >  extern int vgif;
> > > > >  extern bool intercept_smi;
> > > > > +extern bool vnmi;
> > > > >  
> > > > >  /*
> > > > >   * Clean bits in VMCB.
> > > > > @@ -509,6 +510,37 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
> > > > >  	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > > > >  }
> > > > >  
> > > > > +static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
> > > > > +{
> > > > > +	if (!vnmi)
> > > > > +		return NULL;
> > > > > +
> > > > > +	if (is_guest_mode(&svm->vcpu))
> > > > > +		return svm->nested.vmcb02.ptr;
> > > > > +	else
> > > > > +		return svm->vmcb01.ptr;
> > > > > +}
> > > > 
> > > > This is better but still not enough to support nesting:
> > > > 
> > > > 
> > > > Let me explain the cases that we need to cover:
> > > > 
> > > > 
> > > > 1. non nested case, vmcb01 has all the VNMI settings,
> > > > and I think it should work, but need to review the patches again.
> > > > 
> > > > 
> > > > 
> > > > 2. L1 uses vNMI, L2 doesn't use vNMI (nested_vnmi_enabled() == false).
> > > > 
> > > >   In this case, vNMI settings just need to be copied from vmcb01 to vmcb02
> > > >   and vice versa during nested entry and exit.
> > > > 
> > > > 
> > > >   This means that nested_vmcb02_prepare_control in this case should copy
> > > >   all 3 bits from vmcb01 to vmcb02, and vice versa nested_svm_vmexit
> > > >   should copy them back.
> > > > 
> > > >   Currently I see no indication of this being done in this patch series.
> > > > 
> > > 
> > > Yes, thanks for pointing that out; will address in the v3 series.
> > > 
> > > >   vmcb02 should indeed be used to read vnmi bits (like done above).
> > > > 
> > > > 
> > > > 3. L1 uses vNMI, L2 uses vNMI:
> > > > 
> > > >   - First of all in this case all 3 vNMI bits should be copied from vmcb12
> > > >     to vmcb02 on nested entry and back on nested VM exit.
> > > > 
> > > >     I *think* this is done correctly in the patch 6, but I need to check again.
> > > > 
> > > >  
> > > >   - Second issue, depends on vNMI spec which we still don't have, and it
> > > >     relates to the fact on what to do if NMIs are not intercepted by
> > > >     the (nested) hypervisor, and L0 wants to inject an NMI
> > > > 
> > > >     (from L1 point of view it means that a 'real' NMI is about to be
> > > >     received while L2 is running).
> > > > 
> > > > 
> > > >     - If VNMI is not allowed to be enabled when NMIs are not intercepted,
> > > >       (vast majority of nested hypervisors will want to intercept real NMIs)
> > > >       then everything is fine -
> > > > 
> > > >       this means that if nested vNMI is enabled, then L1 will have
> > > >       to intercept 'real' NMIs, and thus L0 would be able to always
> > > >       inject 'real' NMIs while L2 is running by doing a VM exit to L1 without
> > > >       touching any vNMI state.
> > > > 
> > > Yes. Enabling NMI virtualization requires the NMI intercept bit to be set.
> > 
> > Those are very good news. 
> > 
> > What would happen though if the guest doesn't intercept NMI,
> > and still tries to enable vNMI? 
> > 
> > Failed VM entry or vNMI ignored?
> > 
> 
> VMEXIT_INVALID.

Perfect!
> 
> > This matters for nested because nested must work the same as real hardware.
> > 
> > In either of the cases some code is needed to emulate this correctly in the nested
> > virtualization code in KVM, but the patches have none.
> > 
> 
> Yes, in v3.

Perfect!


Best regards,
	Maxim Levitsky

> 
> Thanks,
> Santosh
>  
> > Best regards,
> > 	Maxim Levitsky
> > 
> > 
> > > >     - If the vNMI spec states that if vNMI is enabled, real NMIs
> > > >       are not intercepted and a real NMI is arriving, then the CPU
> > > >       will use vNMI state to handle it (that is it will set the 'pending'
> > > >       bit, then check if 'masked' bit is set, and if not, move pending to masked
> > > >       and deliver NMI to L2, in this case, it is indeed right to use vmcb02
> > > >       and keep on using VNMI for NMIs that are directed to L1,
> > > >       but I highly doubt that this is the case.
> > > > 
> > > > 
> > > No.
> > > 
> > > >     - Most likely case - vNMI is allowed without NMI intercept,
> > > >       and real NMI doesn't consult the vNMI bits, but rather uses the host's
> > > >       NMI masking. IRET doesn't affect the host's NMI masking as well.
> > > > 
> > > > 
> > > 
> > > No.
> > > 
> > > Thanks,
> > > Santosh
> > >  
> > > >       In this case, when L0 wants to inject NMI to a nested guest
> > > >       that has vNMI enabled, and doesn't intercept NMIs, it
> > > >       has to:
> > > > 
> > > >       - still consult the vNMI pending/masked bits of *vmcb01*,
> > > >         to know if it can inject a NMI
> > > > 
> > > >       - if it can inject it, it should update *manually* the pending/masked bits
> > > >         of vmcb01 as well, so that L1's vNMI state remains consistent.
> > > > 
> > > >       - inject the NMI to L2, in the old fashioned way with EVENTINJ,
> > > > 	or open NMI window by intercepting IRET if NMI is masked.
> > > > 
> > > > 
> > > > Best regards,
> > > > 	Maxim Levitsky
> > > > 
> > > > 
> > > > 
> > > > 
> > > > > +
> > > > > +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> > > > > +{
> > > > > +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > > > > +
> > > > > +	if (vmcb)
> > > > > +		return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> > > > > +	else
> > > > > +		return false;
> > > > > +}
> > > > > +
> > > > > +static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
> > > > > +{
> > > > > +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > > > > +
> > > > > +	if (vmcb)
> > > > > +		return !!(vmcb->control.int_ctl & V_NMI_MASK);
> > > > > +	else
> > > > > +		return false;
> > > > > +}
> > > > > +
> > > > >  /* svm.c */
> > > > >  #define MSR_INVALID				0xffffffffU
> > > > >  




* Re: [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  2022-07-21 15:31         ` Maxim Levitsky
@ 2022-07-21 16:08           ` Sean Christopherson
  2022-07-21 16:17             ` Maxim Levitsky
  0 siblings, 1 reply; 29+ messages in thread
From: Sean Christopherson @ 2022-07-21 16:08 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Santosh Shukla, Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, Tom Lendacky, kvm, linux-kernel

On Thu, Jul 21, 2022, Maxim Levitsky wrote:
> On Thu, 2022-07-21 at 14:59 +0000, Sean Christopherson wrote:
> > Yep.  Dropping an NMI in the last case is ok, AFAIK no CPU will pend multiple NMIs
> > while another is in-flight.  But triggering an immediate exit in svm_nmi_allowed()
> > will hang the vCPU as the second pending NMI will never go away since the vCPU
> 
> The idea is to trigger the immediate exit only when a NMI was just injected (V_NMI_PENDING=1)
> but not masked (that is currently in service, that is V_NMI_MASK=0).

I assume you mean "and an NMI is currently NOT in service"?

Anyways, we're on the same page, trigger an exit if and only if there's an NMI pending
and the vCPU isn't already handling a vNMI.  We may need to explicitly drop one of
the pending NMIs in that case though, otherwise the NMI that _KVM_ holds pending could
get "injected" well after NMIs are unmasked, which could surprise the guest.  E.g.
guest IRETs from the second (of three) NMIs, KVM doesn't "inject" that third NMI
until the next VM-Exit, which could be a long time in the future.

> In case both bits are set, the NMI is dropped, that is no immediate exit is requested.
> 
> In this case, next VM entry should have no reason to not inject the NMI and then VM exit
> on the interrupt we raised, so there should not be a problem with forward progress.
> 
> There is an issue still, the NMI could also be masked if we are in SMM (I suggested
> setting the V_NMI_MASK manually in this case), thus in this case we won't have more
> that one pending NMI, but I guess this is not that big problem.
> 
> We can btw also in this case "open" the NMI window by waiting for RSM intercept.
> (that is just not inject the NMI, and on RSM inject it, I think that KVM already does this)
> 
> I think it should overal work, but no doubt I do expect issues and corner cases,
> 
> 
> > won't make forward progress to unmask NMIs.  This can also happen if there are
> > two pending NMIs and GIF=0, i.e. any time there are multiple pending NMIs and NMIs
> > are blocked.
> 
> GIF=0 can be dealt with though, if GIF is 0 when 2nd pending NMI arrives, we can
> delay its injection to the moment the STGI is executed and intercept STGI.
> 
> We I think already do something like that as well.

Yep, you're right, svm_enable_nmi_window() sets INTERCEPT_STGI if VGIF is enabled
and GIF=0 (and STGI exits are unconditional if VGIF=0?).

So we have a poor man's NMI-window exiting.


* Re: [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  2022-07-21 16:08           ` Sean Christopherson
@ 2022-07-21 16:17             ` Maxim Levitsky
  2022-07-21 16:25               ` Sean Christopherson
  0 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2022-07-21 16:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Santosh Shukla, Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, Tom Lendacky, kvm, linux-kernel

On Thu, 2022-07-21 at 16:08 +0000, Sean Christopherson wrote:
> On Thu, Jul 21, 2022, Maxim Levitsky wrote:
> > On Thu, 2022-07-21 at 14:59 +0000, Sean Christopherson wrote:
> > > Yep.  Dropping an NMI in the last case is ok, AFAIK no CPU will pend multiple NMIs
> > > while another is in-flight.  But triggering an immediate exit in svm_nmi_allowed()
> > > will hang the vCPU as the second pending NMI will never go away since the vCPU
> > 
> > The idea is to trigger the immediate exit only when a NMI was just injected (V_NMI_PENDING=1)
> > but not masked (that is currently in service, that is V_NMI_MASK=0).
> 
> I assume you mean "and an NMI is currently NOT in service"?

Yes
> 
> Anyways, we're on the same page, trigger an exit if and only if there's an NMI pending
> and the vCPU isn't already handling a vNMI.  We may need to explicitly drop one of
> the pending NMIs in that case though, otherwise the NMI that _KVM_ holds pending could
> get "injected" well after NMIs are unmasked, which could surprise the guest.  E.g.
> guest IRETs from the second (of three) NMIs, KVM doesn't "inject" that third NMI
> until the next VM-Exit, which could be a long time in the future.
> 
> > In case both bits are set, the NMI is dropped, that is no immediate exit is requested.
> > 
> > In this case, next VM entry should have no reason to not inject the NMI and then VM exit
> > on the interrupt we raised, so there should not be a problem with forward progress.
> > 
> > There is an issue still, the NMI could also be masked if we are in SMM (I suggested
> > setting the V_NMI_MASK manually in this case), thus in this case we won't have more
> > than one pending NMI, but I guess this is not that big a problem.
> > 
> > We can btw also in this case "open" the NMI window by waiting for RSM intercept.
> > (that is just not inject the NMI, and on RSM inject it, I think that KVM already does this)
> > 
> > I think it should overall work, but no doubt I do expect issues and corner cases.
> > 
> > 
> > > won't make forward progress to unmask NMIs.  This can also happen if there are
> > > two pending NMIs and GIF=0, i.e. any time there are multiple pending NMIs and NMIs
> > > are blocked.
> > 
> > GIF=0 can be dealt with though, if GIF is 0 when 2nd pending NMI arrives, we can
> > delay its injection to the moment the STGI is executed and intercept STGI.
> > 
> > I think we already do something like that as well.
> 
> Yep, you're right, svm_enable_nmi_window() sets INTERCEPT_STGI if VGIF is enabled
> and GIF=0 (and STGI exits are unconditional if VGIF=0?

It's not unconditional, but KVM has to set the intercept, otherwise the guest
will control the host's GIF.

Best regards,
	Maxim Levitsky


> ).
> 
> So we have a poor man's NMI-window exiting.

Yep, we also intercept IRET for the same purpose, and RSM interception
is also a place where NMIs are evaluated.

We only single step over the IRET, because NMIs are unmasked _after_ the IRET
retires.

Best regards,
	Maxim Levitsky
> 




* Re: [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  2022-07-21 16:17             ` Maxim Levitsky
@ 2022-07-21 16:25               ` Sean Christopherson
  0 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2022-07-21 16:25 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Santosh Shukla, Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, Tom Lendacky, kvm, linux-kernel

On Thu, Jul 21, 2022, Maxim Levitsky wrote:
> On Thu, 2022-07-21 at 16:08 +0000, Sean Christopherson wrote:
> > So we have a poor man's NMI-window exiting.
> 
> Yep, we also intercept IRET for the same purpose, and RSM interception
> is also a place where NMIs are evaluated.
> 
> We only single step over the IRET, because NMIs are unmasked _after_ the IRET
> retires.

Heh, check out this blurb from Intel's SDM:

  An execution of the IRET instruction unblocks NMIs even if the instruction
  causes a fault. For example, if the IRET instruction executes with EFLAGS.VM = 1
  and IOPL of less than 3, a general-protection exception is generated (see
  Section 20.2.7, “Sensitive Instructions”). In such a case, NMIs are unmasked
  before the exception handler is invoked.

Not that I want to try and handle that in KVM if AMD follows suit, I simply find
it amusing how messy this all is.  A true NMI-window exit would have been nice...


* Re: [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  2022-07-20 21:54   ` Sean Christopherson
  2022-07-21 12:05     ` Maxim Levitsky
@ 2022-07-29  5:51     ` Shukla, Santosh
  2022-07-29 14:41       ` Sean Christopherson
  1 sibling, 1 reply; 29+ messages in thread
From: Shukla, Santosh @ 2022-07-29  5:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel

Hello Sean,

On 7/21/2022 3:24 AM, Sean Christopherson wrote:
> On Sat, Jul 09, 2022, Santosh Shukla wrote:
>> In the vNMI case, report NMI as not allowed when the processor sets
>> V_NMI_MASK to 1, which means the guest is busy handling a vNMI.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>> ---
>> v2:
>> - Moved vnmi check after is_guest_mode() in func _nmi_blocked().
>> - Removed is_vnmi_mask_set check from _enable_nmi_window().
>> as it was a redundant check.
>>
>>  arch/x86/kvm/svm/svm.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 3574e804d757..44c1f2317b45 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3480,6 +3480,9 @@ bool svm_nmi_blocked(struct kvm_vcpu *vcpu)
>>  	if (is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
>>  		return false;
>>  
>> +	if (is_vnmi_enabled(svm) && is_vnmi_mask_set(svm))
>> +		return true;
>> +
>>  	ret = (vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) ||
>>  	      (vcpu->arch.hflags & HF_NMI_MASK);
>>  
>> @@ -3609,6 +3612,9 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vcpu_svm *svm = to_svm(vcpu);
>>  
>> +	if (is_vnmi_enabled(svm))
>> +		return;
> 
> Ugh, is there really no way to trigger an exit when NMIs become unmasked?  Because
> if there isn't, this is broken for KVM.
> 

Yes, there is.

NMI_INTERCEPT will trigger a VMEXIT when a second NMI arrives while the guest is busy handling the first NMI.
In that scenario, the guest will exit with V_NMI_MASK set to 1, and KVM can inject the pending (second)
NMI (V_NMI=1). The guest will resume handling the first NMI, then HW will
clear V_NMI_MASK and later HW will take the pending V_NMI inside the guest.

I'll handle the above case in v3.

Thanks,
Santosh

> On bare metal, if two NMIs arrive "simultaneously", so long as NMIs aren't blocked,
> the first NMI will be delivered and the second will be pended, i.e. software will
> see both NMIs.  And if that doesn't hold true, the window for a true collision is
> really, really tiny.
> 
> But in KVM, because a vCPU may not be run for a long duration, that window becomes
> very large.  To not drop NMIs and more faithfully emulate hardware, KVM allows two
> NMIs to be _pending_.  And when that happens, KVM needs to trigger an exit when
> NMIs become unmasked _after_ the first NMI is injected.
> 
>> +
>>  	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
>>  		return; /* IRET will cause a vm exit */
>>  
>> -- 
>> 2.25.1
>>


* Re: [PATCHv2 5/7] KVM: SVM: Add VNMI support in inject_nmi
  2022-07-20 21:41   ` Sean Christopherson
  2022-07-20 22:46     ` Jim Mattson
@ 2022-07-29  6:06     ` Shukla, Santosh
  2022-07-29 13:53       ` Sean Christopherson
  1 sibling, 1 reply; 29+ messages in thread
From: Shukla, Santosh @ 2022-07-29  6:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel

Hi Sean,

On 7/21/2022 3:11 AM, Sean Christopherson wrote:
> On Sat, Jul 09, 2022, Santosh Shukla wrote:
>> Inject the NMI by setting V_NMI in the VMCB interrupt control. The processor
>> will clear V_NMI to acknowledge that processing has started and will keep
>> V_NMI_MASK set until the processor is done processing the NMI event.
>>
>> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>> ---
>> v2:
>> - Added WARN_ON check for vnmi pending.
>> - use `get_vnmi_vmcb` to get correct vmcb so to inject vnmi.
>>
>>  arch/x86/kvm/svm/svm.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 44c1f2317b45..c73a1809a7c7 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3375,12 +3375,20 @@ static void pre_svm_run(struct kvm_vcpu *vcpu)
>>  static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vcpu_svm *svm = to_svm(vcpu);
>> +	struct vmcb *vmcb = NULL;
>> +
>> +	++vcpu->stat.nmi_injections;
>> +	if (is_vnmi_enabled(svm)) {
>> +		vmcb = get_vnmi_vmcb(svm);
>> +		WARN_ON(vmcb->control.int_ctl & V_NMI_PENDING);
> 
> Haven't read the spec, but based on the changelog I assume the flag doesn't get
> cleared until the NMI is fully delivered.  That means this WARN will fire if a
> VM-Exit occurs during delivery as KVM will re-inject the event, e.g. if KVM is
>> using shadow paging and a #PF handled by KVM occurs during delivery.
> 

Right.


For the above scenario, i.e. if a VMEXIT happens during delivery of a virtual NMI,
then EXITINTINFO will be set accordingly and V_NMI_MASK is saved as 0.
The hypervisor will re-inject the event on the next VMRUN.

I just wanted to track above scenario,. I will replace it with pr_debug().

Thanks,
Santosh


>> +		vmcb->control.int_ctl |= V_NMI_PENDING;
>> +		return;
>> +	}
>>  
>>  	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
>>  	vcpu->arch.hflags |= HF_NMI_MASK;
>>  	if (!sev_es_guest(vcpu->kvm))
>>  		svm_set_intercept(svm, INTERCEPT_IRET);
>> -	++vcpu->stat.nmi_injections;
>>  }
>>  
>>  static void svm_inject_irq(struct kvm_vcpu *vcpu)
>> -- 
>> 2.25.1
>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv2 5/7] KVM: SVM: Add VNMI support in inject_nmi
  2022-07-29  6:06     ` Shukla, Santosh
@ 2022-07-29 13:53       ` Sean Christopherson
  2022-07-29 13:55         ` Shukla, Santosh
  0 siblings, 1 reply; 29+ messages in thread
From: Sean Christopherson @ 2022-07-29 13:53 UTC (permalink / raw)
  To: Shukla, Santosh
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel

On Fri, Jul 29, 2022, Shukla, Santosh wrote:
> Hi Sean,
> 
> On 7/21/2022 3:11 AM, Sean Christopherson wrote:
> > On Sat, Jul 09, 2022, Santosh Shukla wrote:
> >> +	++vcpu->stat.nmi_injections;
> >> +	if (is_vnmi_enabled(svm)) {
> >> +		vmcb = get_vnmi_vmcb(svm);
> >> +		WARN_ON(vmcb->control.int_ctl & V_NMI_PENDING);
> > 
> > Haven't read the spec, but based on the changelog I assume the flag doesn't get
> > cleared until the NMI is fully delivered.  That means this WARN will fire if a
> > VM-Exit occurs during delivery as KVM will re-inject the event, e.g. if KVM is
> > using shadow paging and a #PF handled by KVM occurs during delivery.
> > 
> 
> Right.
> 
> 
> For the above scenario, i.e. if a VMEXIT happens during delivery of a virtual NMI,
> EXITINTINFO will be set accordingly and V_NMI_MASK is saved as 0. The
> hypervisor will re-inject the event on the next VMRUN.
> 
> I just wanted to track the above scenario; I will replace the WARN_ON with pr_debug().

No, this is normal (albeit uncommon) behavior, don't print anything.  Even if it
weren't normal behavior, pr_debug() is usually the wrong choice in KVM.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv2 5/7] KVM: SVM: Add VNMI support in inject_nmi
  2022-07-29 13:53       ` Sean Christopherson
@ 2022-07-29 13:55         ` Shukla, Santosh
  0 siblings, 0 replies; 29+ messages in thread
From: Shukla, Santosh @ 2022-07-29 13:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel



On 7/29/2022 7:23 PM, Sean Christopherson wrote:
> On Fri, Jul 29, 2022, Shukla, Santosh wrote:
>> Hi Sean,
>>
>> On 7/21/2022 3:11 AM, Sean Christopherson wrote:
>>> On Sat, Jul 09, 2022, Santosh Shukla wrote:
>>>> +	++vcpu->stat.nmi_injections;
>>>> +	if (is_vnmi_enabled(svm)) {
>>>> +		vmcb = get_vnmi_vmcb(svm);
>>>> +		WARN_ON(vmcb->control.int_ctl & V_NMI_PENDING);
>>>
>>> Haven't read the spec, but based on the changelog I assume the flag doesn't get
>>> cleared until the NMI is fully delivered.  That means this WARN will fire if a
>>> VM-Exit occurs during delivery as KVM will re-inject the event, e.g. if KVM is
>>> using shadow paging and a #PF handled by KVM occurs during delivery.
>>>
>>
>> Right.
>>
>>
>> For the above scenario, i.e. if a VMEXIT happens during delivery of a virtual NMI,
>> EXITINTINFO will be set accordingly and V_NMI_MASK is saved as 0. The
>> hypervisor will re-inject the event on the next VMRUN.
>>
>> I just wanted to track the above scenario; I will replace the WARN_ON with pr_debug().
> 
> No, this is normal (albeit uncommon) behavior, don't print anything.  Even if it

Ok.

Thanks,
Santosh

> weren't normal behavior, pr_debug() is usually the wrong choice in KVM.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  2022-07-29  5:51     ` Shukla, Santosh
@ 2022-07-29 14:41       ` Sean Christopherson
  2022-08-04  9:51         ` Shukla, Santosh
  0 siblings, 1 reply; 29+ messages in thread
From: Sean Christopherson @ 2022-07-29 14:41 UTC (permalink / raw)
  To: Shukla, Santosh
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel

On Fri, Jul 29, 2022, Shukla, Santosh wrote:
> Hello Sean,
> 
> On 7/21/2022 3:24 AM, Sean Christopherson wrote:
> > On Sat, Jul 09, 2022, Santosh Shukla wrote:

...

> >> @@ -3609,6 +3612,9 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> >>  {
> >>  	struct vcpu_svm *svm = to_svm(vcpu);
> >>  
> >> +	if (is_vnmi_enabled(svm))
> >> +		return;
> > 
> > Ugh, is there really no way to trigger an exit when NMIs become unmasked?  Because
> > if there isn't, this is broken for KVM.
> > 
> 
> Yes, there is.
> 
> NMI_INTERCEPT will trigger a VMEXIT when a second NMI arrives while the guest is busy handling the first NMI.

But NMI_INTERCEPT only applies to "real" NMIs.  The scenario laid out below is where
KVM wants to inject a virtual NMI without an associated hardware/real NMI, e.g. if
multiple vCPUs send NMI IPIs to the target.

> And in that scenario, the guest will exit with V_NMI_MASK set to 1, and KVM can inject the pending (second)
> NMI (V_NMI=1). The guest will resume handling the first NMI, then HW will
> clear V_NMI_MASK and later HW will take the pending V_NMI inside the guest.
> 
> I'll handle the above case in v3.
> 
> Thanks,
> Santosh
> 
> > On bare metal, if two NMIs arrive "simultaneously", so long as NMIs aren't blocked,
> > the first NMI will be delivered and the second will be pended, i.e. software will
> > see both NMIs.  And if that doesn't hold true, the window for a true collision is
> > really, really tiny.
> > 
> > But in KVM, because a vCPU may not be run a long duration, that window becomes
> > very large.  To not drop NMIs and more faithfully emulate hardware, KVM allows two
> > NMIs to be _pending_.  And when that happens, KVM needs to trigger an exit when
> > NMIs become unmasked _after_ the first NMI is injected.
> > 
> >> +
> >>  	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
> >>  		return; /* IRET will cause a vm exit */
> >>  
> >> -- 
> >> 2.25.1
> >>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI
  2022-07-29 14:41       ` Sean Christopherson
@ 2022-08-04  9:51         ` Shukla, Santosh
  0 siblings, 0 replies; 29+ messages in thread
From: Shukla, Santosh @ 2022-08-04  9:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Jim Mattson, Joerg Roedel,
	Maxim Levitsky, Tom Lendacky, kvm, linux-kernel



On 7/29/2022 8:11 PM, Sean Christopherson wrote:
> On Fri, Jul 29, 2022, Shukla, Santosh wrote:
>> Hello Sean,
>>
>> On 7/21/2022 3:24 AM, Sean Christopherson wrote:
>>> On Sat, Jul 09, 2022, Santosh Shukla wrote:
> 
> ...
> 
>>>> @@ -3609,6 +3612,9 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>>>>  {
>>>>  	struct vcpu_svm *svm = to_svm(vcpu);
>>>>  
>>>> +	if (is_vnmi_enabled(svm))
>>>> +		return;
>>>
>>> Ugh, is there really no way to trigger an exit when NMIs become unmasked?  Because
>>> if there isn't, this is broken for KVM.
>>>
>>
>> Yes, there is.
>>
>> NMI_INTERCEPT will trigger a VMEXIT when a second NMI arrives while the guest is busy handling the first NMI.
> 
> But NMI_INTERCEPT only applies to "real" NMIs.  The scenario laid out below is where
> KVM wants to inject a virtual NMI without an associated hardware/real NMI, e.g. if
> multiple vCPUs send NMI IPIs to the target.
> 

Yes, HW supports the above case. KVM can inject the pending virtual NMI while the guest is busy handling
the current (virtual) NMI; after the guest finishes handling it, HW will take
the pending virtual NMI on the IRET instruction (w/o a VMEXIT).

Thanks,
Santosh

>> And in that scenario, the guest will exit with V_NMI_MASK set to 1, and KVM can inject the pending (second)
>> NMI (V_NMI=1). The guest will resume handling the first NMI, then HW will
>> clear V_NMI_MASK and later HW will take the pending V_NMI inside the guest.
>>
>> I'll handle the above case in v3.
>>
>> Thanks,
>> Santosh
>>
>>> On bare metal, if two NMIs arrive "simultaneously", so long as NMIs aren't blocked,
>>> the first NMI will be delivered and the second will be pended, i.e. software will
>>> see both NMIs.  And if that doesn't hold true, the window for a true collision is
>>> really, really tiny.
>>>
>>> But in KVM, because a vCPU may not be run a long duration, that window becomes
>>> very large.  To not drop NMIs and more faithfully emulate hardware, KVM allows two
>>> NMIs to be _pending_.  And when that happens, KVM needs to trigger an exit when
>>> NMIs become unmasked _after_ the first NMI is injected.
>>>
>>>> +
>>>>  	if ((vcpu->arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
>>>>  		return; /* IRET will cause a vm exit */
>>>>  
>>>> -- 
>>>> 2.25.1
>>>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2022-08-04  9:51 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-09 13:42 [PATCHv2 0/7] Virtual NMI feature Santosh Shukla
2022-07-09 13:42 ` [PATCHv2 1/7] x86/cpu: Add CPUID feature bit for VNMI Santosh Shukla
2022-07-09 13:42 ` [PATCHv2 2/7] KVM: SVM: Add VNMI bit definition Santosh Shukla
2022-07-09 13:42 ` [PATCHv2 3/7] KVM: SVM: Add VNMI support in get/set_nmi_mask Santosh Shukla
2022-07-10 16:15   ` Maxim Levitsky
2022-07-21  9:34     ` Shukla, Santosh
2022-07-21 12:01       ` Maxim Levitsky
2022-07-21 13:12         ` Shukla, Santosh
2022-07-21 15:48           ` Maxim Levitsky
2022-07-09 13:42 ` [PATCHv2 4/7] KVM: SVM: Report NMI not allowed when Guest busy handling VNMI Santosh Shukla
2022-07-20 21:54   ` Sean Christopherson
2022-07-21 12:05     ` Maxim Levitsky
2022-07-21 14:59       ` Sean Christopherson
2022-07-21 15:31         ` Maxim Levitsky
2022-07-21 16:08           ` Sean Christopherson
2022-07-21 16:17             ` Maxim Levitsky
2022-07-21 16:25               ` Sean Christopherson
2022-07-29  5:51     ` Shukla, Santosh
2022-07-29 14:41       ` Sean Christopherson
2022-08-04  9:51         ` Shukla, Santosh
2022-07-09 13:42 ` [PATCHv2 5/7] KVM: SVM: Add VNMI support in inject_nmi Santosh Shukla
2022-07-20 21:41   ` Sean Christopherson
2022-07-20 22:46     ` Jim Mattson
2022-07-20 23:04       ` Sean Christopherson
2022-07-29  6:06     ` Shukla, Santosh
2022-07-29 13:53       ` Sean Christopherson
2022-07-29 13:55         ` Shukla, Santosh
2022-07-09 13:42 ` [PATCHv2 6/7] KVM: nSVM: implement nested VNMI Santosh Shukla
2022-07-09 13:42 ` [PATCHv2 7/7] KVM: SVM: Enable VNMI feature Santosh Shukla
