* [PATCH RESEND 00/30] My patch queue
@ 2022-02-07 15:54 Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 01/30] KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case Maxim Levitsky
                   ` (30 more replies)
  0 siblings, 31 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

This is a set of various patches that are stuck in my patch queue.

The KVM_REQ_GET_NESTED_STATE_PAGES patch is mostly RFC, but it does seem
to work for me.

The read-only APIC ID patches are also somewhat RFC.

Some of these patches are preparation for nested AVIC support,
which I have almost finished developing and will start testing very soon.

Resend with cleaned up CCs.

Best regards,
	Maxim Levitsky

Maxim Levitsky (30):
  KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT &&
    !gCR0.PG case
  KVM: x86: nSVM: fix potential NULL dereference on nested migration
  KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state
  KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a
    result of RSM
  KVM: x86: nSVM: expose clean bit support to the guest
  KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW
  KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but
    lets L2 control them
  KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when
    inhibiting it
  KVM: x86: SVM: move avic definitions from AMD's spec to svm.h
  KVM: x86: SVM: fix race between interrupt delivery and AVIC inhibition
  KVM: x86: SVM: use vmcb01 in avic_init_vmcb
  KVM: x86: SVM: allow AVIC to co-exist with a nested guest running
  KVM: x86: lapic: don't allow to change APIC ID when apic acceleration
    is enabled
  KVM: x86: lapic: don't allow to change local apic id when using older
    x2apic api
  KVM: x86: SVM: remove avic's broken code that updated APIC ID
  KVM: x86: SVM: allow to force AVIC to be enabled
  KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set
  KVM: x86: mmu: add strict mmu mode
  KVM: x86: mmu: add gfn_in_memslot helper
  KVM: x86: mmu: allow to enable write tracking externally
  x86: KVMGT: use kvm_page_track_write_tracking_enable
  KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running
  KVM: x86: nSVM: implement nested LBR virtualization
  KVM: x86: nSVM: implement nested VMLOAD/VMSAVE
  KVM: x86: nSVM: support PAUSE filter threshold and count when
    cpu_pm=on
  KVM: x86: nSVM: implement nested vGIF
  KVM: x86: add force_intercept_exceptions_mask
  KVM: SVM: implement force_intercept_exceptions_mask
  KVM: VMX: implement force_intercept_exceptions_mask
  KVM: x86: get rid of KVM_REQ_GET_NESTED_STATE_PAGES

 arch/x86/include/asm/kvm-x86-ops.h    |   1 +
 arch/x86/include/asm/kvm_host.h       |  24 +-
 arch/x86/include/asm/kvm_page_track.h |   1 +
 arch/x86/include/asm/msr-index.h      |   1 +
 arch/x86/include/asm/svm.h            |  36 +++
 arch/x86/include/uapi/asm/kvm.h       |   1 +
 arch/x86/kvm/Kconfig                  |   3 -
 arch/x86/kvm/hyperv.c                 |   4 +
 arch/x86/kvm/lapic.c                  |  53 ++--
 arch/x86/kvm/mmu.h                    |   8 +-
 arch/x86/kvm/mmu/mmu.c                |  31 ++-
 arch/x86/kvm/mmu/page_track.c         |  10 +-
 arch/x86/kvm/svm/avic.c               | 135 +++-------
 arch/x86/kvm/svm/nested.c             | 167 +++++++-----
 arch/x86/kvm/svm/svm.c                | 375 ++++++++++++++++++++++----
 arch/x86/kvm/svm/svm.h                |  60 +++--
 arch/x86/kvm/svm/svm_onhyperv.c       |   1 +
 arch/x86/kvm/vmx/nested.c             | 107 +++-----
 arch/x86/kvm/vmx/vmcs.h               |   6 +
 arch/x86/kvm/vmx/vmx.c                |  48 +++-
 arch/x86/kvm/x86.c                    |  42 ++-
 arch/x86/kvm/x86.h                    |   5 +
 drivers/gpu/drm/i915/Kconfig          |   1 -
 drivers/gpu/drm/i915/gvt/kvmgt.c      |   5 +
 include/linux/kvm_host.h              |  10 +-
 25 files changed, 764 insertions(+), 371 deletions(-)

-- 
2.26.3




* [PATCH RESEND 01/30] KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 02/30] KVM: x86: nSVM: fix potential NULL dereference on nested migration Maxim Levitsky
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky, stable

When the guest doesn't enable paging and NPT/EPT is disabled, we
use the guest's paging CR3s as KVM's shadow paging pointer and
we are technically in direct mode, as if we were using NPT/EPT.

In direct mode we create SPTEs with user mode permissions
because usually in direct mode the NPT/EPT doesn't
need to restrict access based on guest CPL
(there are MBE/GMET extensions for that, but KVM doesn't use them).

In this special "use guest paging as direct" mode, however,
if CR4.SMAP/CR4.SMEP are enabled, that will make the CPU
fault on each access and KVM will enter an endless loop of page faults.

Since page protection doesn't have any meaning in the !PG case,
just don't pass through these bits.

The fix is the same as was done for VMX in commit:
commit 656ec4a4928a ("KVM: VMX: fix SMEP and SMAP without EPT")

This fixes the boot of Windows 10 without NPT for good.
(Without this patch, the BSP boots, but the APs were stuck in an endless
loop of page faults, causing the VM to boot with only 1 CPU.)
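
A minimal standalone sketch (userspace C, not taken from the patch; the
helper name is made up) of the resulting CR4 handling for the
shadow-paging case; the X86_CR0_*/X86_CR4_* values match the
architectural bit positions:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define X86_CR0_PG	(1ULL << 31)
#define X86_CR4_PAE	(1ULL << 5)
#define X86_CR4_SMEP	(1ULL << 20)
#define X86_CR4_SMAP	(1ULL << 21)
#define X86_CR4_PKE	(1ULL << 22)

/*
 * CR4 value programmed into the VMCB when shadow paging is used
 * (!npt_enabled): PAE is forced on for the shadow page tables, and
 * SMEP/SMAP/PKE are hidden from hardware while the guest has paging
 * disabled, because the shadow SPTEs are user-mode and would
 * otherwise fault on every access.
 */
static uint64_t shadow_hcr4(uint64_t guest_cr4, uint64_t guest_cr0)
{
	uint64_t hcr4 = guest_cr4 | X86_CR4_PAE;

	if (!(guest_cr0 & X86_CR0_PG))
		hcr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);

	return hcr4;
}

int main(void)
{
	/* AP early boot: SMEP/SMAP already set in CR4, paging still off. */
	uint64_t cr4 = X86_CR4_SMEP | X86_CR4_SMAP;

	printf("effective CR4 = %#llx\n",
	       (unsigned long long)shadow_hcr4(cr4, 0));
	return 0;
}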

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kvm/svm/svm.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 975be872cd1a3..995c203a62fd9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1596,6 +1596,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	u64 hcr0 = cr0;
+	bool old_paging = is_paging(vcpu);
 
 #ifdef CONFIG_X86_64
 	if (vcpu->arch.efer & EFER_LME && !vcpu->arch.guest_state_protected) {
@@ -1612,8 +1613,11 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 #endif
 	vcpu->arch.cr0 = cr0;
 
-	if (!npt_enabled)
+	if (!npt_enabled) {
 		hcr0 |= X86_CR0_PG | X86_CR0_WP;
+		if (old_paging != is_paging(vcpu))
+			svm_set_cr4(vcpu, kvm_read_cr4(vcpu));
+	}
 
 	/*
 	 * re-enable caching here because the QEMU bios
@@ -1657,8 +1661,12 @@ void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 		svm_flush_tlb_current(vcpu);
 
 	vcpu->arch.cr4 = cr4;
-	if (!npt_enabled)
+	if (!npt_enabled) {
 		cr4 |= X86_CR4_PAE;
+
+		if (!is_paging(vcpu))
+			cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
+	}
 	cr4 |= host_cr4_mce;
 	to_svm(vcpu)->vmcb->save.cr4 = cr4;
 	vmcb_mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);
-- 
2.26.3



* [PATCH RESEND 02/30] KVM: x86: nSVM: fix potential NULL dereference on nested migration
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 01/30] KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 03/30] KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state Maxim Levitsky
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky, stable

It turns out that due to review feedback and/or rebases
I accidentally moved the call to nested_svm_load_cr3 too early,
before the NPT is enabled, which is very wrong to do.

KVM can't even access guest memory at that point as nested NPT
is needed for that, and of course it won't initialize the walk_mmu,
which is the main issue the patch was addressing.

Fix this for real.

Fixes: 232f75d3b4b5 ("KVM: nSVM: call nested_svm_load_cr3 on nested state load")
Cc: stable@vger.kernel.org

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1218b5a342fc8..39d280e7e80ef 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1457,18 +1457,6 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 	    !__nested_vmcb_check_save(vcpu, &save_cached))
 		goto out_free;
 
-	/*
-	 * While the nested guest CR3 is already checked and set by
-	 * KVM_SET_SREGS, it was set when nested state was yet loaded,
-	 * thus MMU might not be initialized correctly.
-	 * Set it again to fix this.
-	 */
-
-	ret = nested_svm_load_cr3(&svm->vcpu, vcpu->arch.cr3,
-				  nested_npt_enabled(svm), false);
-	if (WARN_ON_ONCE(ret))
-		goto out_free;
-
 
 	/*
 	 * All checks done, we can enter guest mode. Userspace provides
@@ -1494,6 +1482,20 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 
 	svm_switch_vmcb(svm, &svm->nested.vmcb02);
 	nested_vmcb02_prepare_control(svm);
+
+	/*
+	 * While the nested guest CR3 is already checked and set by
+	 * KVM_SET_SREGS, it was set when nested state was yet loaded,
+	 * thus MMU might not be initialized correctly.
+	 * Set it again to fix this.
+	 */
+
+	ret = nested_svm_load_cr3(&svm->vcpu, vcpu->arch.cr3,
+				  nested_npt_enabled(svm), false);
+	if (WARN_ON_ONCE(ret))
+		goto out_free;
+
+
 	kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
 	ret = 0;
 out_free:
-- 
2.26.3



* [PATCH RESEND 03/30] KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 01/30] KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 02/30] KVM: x86: nSVM: fix potential NULL dereference on nested migration Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 04/30] KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM Maxim Levitsky
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky, stable

While usually restoring the SMM state makes KVM enter
the nested guest, and thus a different vmcb (vmcb02 vs vmcb01),
KVM should still mark vmcb01 as dirty, since hardware
can in theory cache multiple vmcbs.

Failure to do so, combined with the lack of setting
nested_run_pending (which is fixed in the next patch),
might make KVM re-enter vmcb01, which was just exited from,
with a completely different set of guest state registers
(SMM vs non-SMM) and without the proper dirty bits set,
which results in the CPU reusing a stale IDTR pointer,
which leads to a guest shutdown on any interrupt.

On real hardware this usually doesn't happen,
but when running nested, L0's KVM does check and
honour a few dirty bits, causing this issue to happen.

This patch fixes the boot of Hyper-V and SMM-enabled
Windows VMs running nested on KVM.
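
For reference, a minimal standalone sketch of the clean-bits idea
(userspace C; the field layout is simplified and not the real VMCB
layout):

#include <stdint.h>
#include <stdio.h>

struct vmcb {
	/* one bit per VMCB area; bit set => hardware may reuse cached copy */
	uint32_t clean;
};

/*
 * Clearing every clean bit forces the CPU to reload all guest state
 * (IDTR, segments, control registers, ...) from this VMCB on the next
 * VMRUN instead of reusing a stale cached copy.
 */
static void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
	vmcb->clean = 0;
}

int main(void)
{
	struct vmcb vmcb01 = { .clean = ~0u };	/* everything assumed cached */

	/* RSM just restored a completely different guest state into vmcb01. */
	vmcb_mark_all_dirty(&vmcb01);
	printf("clean bits after RSM restore: %#x\n", (unsigned)vmcb01.clean);
	return 0;
}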

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kvm/svm/svm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 995c203a62fd9..3f1d11e652123 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4267,6 +4267,8 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 	 * Enter the nested guest now
 	 */
 
+	vmcb_mark_all_dirty(svm->vmcb01.ptr);
+
 	vmcb12 = map.hva;
 	nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
 	nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
-- 
2.26.3



* [PATCH RESEND 04/30] KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (2 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 03/30] KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 05/30] KVM: x86: nSVM: expose clean bit support to the guest Maxim Levitsky
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky, stable

While RSM-induced VM entries are not full VM entries,
they still need to be followed by an actual VM entry to complete them,
unlike setting the nested state.

This patch fixes the boot of Hyper-V and SMM-enabled
Windows VMs running nested on KVM, which failed due
to this issue combined with the lack of dirty bit setting.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kvm/svm/svm.c | 5 +++++
 arch/x86/kvm/vmx/vmx.c | 1 +
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3f1d11e652123..71bfa52121622 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4274,6 +4274,11 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 	nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
 	ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, false);
 
+	if (ret)
+		goto unmap_save;
+
+	svm->nested.nested_run_pending = 1;
+
 unmap_save:
 	kvm_vcpu_unmap(vcpu, &map_save, true);
 unmap_map:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8ac5a6fa77203..fc9c4eca90a78 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7659,6 +7659,7 @@ static int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 		if (ret)
 			return ret;
 
+		vmx->nested.nested_run_pending = 1;
 		vmx->nested.smm.guest_mode = false;
 	}
 	return 0;
-- 
2.26.3



* [PATCH RESEND 05/30] KVM: x86: nSVM: expose clean bit support to the guest
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (3 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 04/30] KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 06/30] KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW Maxim Levitsky
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

KVM already honours a few clean bits, thus it makes sense
to let the nested guest know about it.

Note that KVM also doesn't check whether the hardware supports
clean bits, and therefore nested KVM was
already setting clean bits and L0 KVM
was already honouring them.


Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 71bfa52121622..8013be9edf27c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4663,6 +4663,7 @@ static __init void svm_set_cpu_caps(void)
 	/* CPUID 0x80000001 and 0x8000000A (SVM features) */
 	if (nested) {
 		kvm_cpu_cap_set(X86_FEATURE_SVM);
+		kvm_cpu_cap_set(X86_FEATURE_VMCBCLEAN);
 
 		if (nrips)
 			kvm_cpu_cap_set(X86_FEATURE_NRIPS);
-- 
2.26.3



* [PATCH RESEND 06/30] KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (4 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 05/30] KVM: x86: nSVM: expose clean bit support to the guest Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 07/30] KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them Maxim Levitsky
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

Use a dummy unused vmexit reason to mark the 'VM exit'
that happens when we exit to handle SMM,
which is not a real VM exit.

This makes it a bit easier to read the KVM trace,
and avoids other potential problems.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8013be9edf27c..9a4e299ed5673 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4194,7 +4194,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
 	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
 
-	ret = nested_svm_vmexit(svm);
+	ret = nested_svm_simple_vmexit(svm, SVM_EXIT_SW);
 	if (ret)
 		return ret;
 
-- 
2.26.3



* [PATCH RESEND 07/30] KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (5 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 06/30] KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-08 11:33   ` Paolo Bonzini
  2022-02-07 15:54 ` [PATCH RESEND 08/30] KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it Maxim Levitsky
                   ` (23 subsequent siblings)
  30 siblings, 1 reply; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

Fix a corner case in which the L1 hypervisor intercepts
interrupts (INTERCEPT_INTR) and either doesn't set
virtual interrupt masking (V_INTR_MASKING) or enters a
nested guest with EFLAGS.IF disabled prior to the entry.

In this case, despite the fact that L1 intercepts the interrupts,
KVM still needs to set up an interrupt window to wait before
injecting the INTR vmexit.

Currently KVM instead enters an endless loop of 'req_immediate_exit'.

Exactly the same issue also happens for SMIs and NMIs.
Fix those as well.

Note that on VMX this case is impossible, as there is only the
'vmexit on external interrupts' execution control, which is either set,
in which case both host and guest EFLAGS.IF
are ignored, or not set, in which case no VM exits are delivered.
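
A minimal standalone sketch of the new check ordering (userspace C with
hypothetical struct and helper names), showing that a blocked interrupt
now yields 0, i.e. a request for an interrupt window, instead of -EBUSY:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct vcpu_state {
	bool nested_run_pending;
	bool in_guest_mode;		/* L2 is running */
	bool l1_intercepts_intr;	/* L1 asked for INTR vmexits */
	bool intr_blocked;		/* e.g. EFLAGS.IF=0, no V_INTR_MASKING */
};

/*
 * 1 = can inject now, 0 = blocked (open an interrupt window),
 * -EBUSY = retry after the pending VM entry / nested VM exit.
 */
static int interrupt_allowed(const struct vcpu_state *v, bool for_injection)
{
	if (v->nested_run_pending)
		return -EBUSY;

	/*
	 * Checked first now: if interrupts are blocked, request an
	 * interrupt window instead of spinning on an immediate exit.
	 */
	if (v->intr_blocked)
		return 0;

	/* An IRQ that must become a nested VM exit can't be injected directly. */
	if (for_injection && v->in_guest_mode && v->l1_intercepts_intr)
		return -EBUSY;

	return 1;
}

int main(void)
{
	struct vcpu_state v = {
		.in_guest_mode = true,
		.l1_intercepts_intr = true,
		.intr_blocked = true,	/* L2 entered with EFLAGS.IF=0 */
	};

	printf("%d\n", interrupt_allowed(&v, true));	/* 0, not -EBUSY */
	return 0;
}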


Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9a4e299ed5673..22e614008cf59 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3372,11 +3372,13 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 	if (svm->nested.nested_run_pending)
 		return -EBUSY;
 
+	if (svm_nmi_blocked(vcpu))
+		return 0;
+
 	/* An NMI must not be injected into L2 if it's supposed to VM-Exit.  */
 	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
 		return -EBUSY;
-
-	return !svm_nmi_blocked(vcpu);
+	return 1;
 }
 
 static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
@@ -3428,9 +3430,13 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
 static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+
 	if (svm->nested.nested_run_pending)
 		return -EBUSY;
 
+	if (svm_interrupt_blocked(vcpu))
+		return 0;
+
 	/*
 	 * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
 	 * e.g. if the IRQ arrived asynchronously after checking nested events.
@@ -3438,7 +3444,7 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(svm))
 		return -EBUSY;
 
-	return !svm_interrupt_blocked(vcpu);
+	return 1;
 }
 
 static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
@@ -4169,11 +4175,14 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
 	if (svm->nested.nested_run_pending)
 		return -EBUSY;
 
+	if (svm_smi_blocked(vcpu))
+		return 0;
+
 	/* An SMI must not be injected into L2 if it's supposed to VM-Exit.  */
 	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_smi(svm))
 		return -EBUSY;
 
-	return !svm_smi_blocked(vcpu);
+	return 1;
 }
 
 static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
-- 
2.26.3



* [PATCH RESEND 08/30] KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (6 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 07/30] KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 09/30] KVM: x86: SVM: move avic definitions from AMD's spec to svm.h Maxim Levitsky
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

kvm_apic_update_apicv is called while AVIC is still active, thus IRR bits
can be set by the CPU after it is called, without causing irr_pending
to be set to true.

Also, the logic in avic_kick_target_vcpu doesn't expect a race with this
function, so to keep it simple, just keep irr_pending set to true and
let the next interrupt injection into the guest clear it.


Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/lapic.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0da7d0960fcb5..dd4e2888c244b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2307,7 +2307,12 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
 		apic->irr_pending = true;
 		apic->isr_count = 1;
 	} else {
-		apic->irr_pending = (apic_search_irr(apic) != -1);
+		/*
+		 * Don't clear irr_pending, searching the IRR can race with
+		 * updates from the CPU as APICv is still active from hardware's
+		 * perspective.  The flag will be cleared as appropriate when
+		 * KVM injects the interrupt.
+		 */
 		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
 	}
 }
-- 
2.26.3



* [PATCH RESEND 09/30] KVM: x86: SVM: move avic definitions from AMD's spec to svm.h
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (7 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 08/30] KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 10/30] KVM: x86: SVM: fix race between interrupt delivery and AVIC inhibition Maxim Levitsky
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

asm/svm.h is the correct place for all values that are defined in
the SVM spec, and that includes AVIC.

Also add some values from the spec that were not defined before
and will soon be useful.
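
As a quick illustration of how the physical APIC ID table masks added
below are meant to be used, a standalone sketch (userspace C; the
example entry value is made up):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Same values as in the asm/svm.h hunk below. */
#define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK	(0xFFULL)
#define AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK	(0xFFFFFFFFFFULL << 12)
#define AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK		(1ULL << 62)
#define AVIC_PHYSICAL_ID_ENTRY_VALID_MASK		(1ULL << 63)

/* Decode one 64-bit physical APIC ID table entry. */
static void decode_entry(uint64_t e)
{
	bool valid   = e & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK;
	bool running = e & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
	uint64_t backing_page_pa = e & AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK;
	unsigned host_apic_id = e & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;

	printf("valid=%d running=%d backing_page=%#llx host_apic_id=%u\n",
	       valid, running, (unsigned long long)backing_page_pa,
	       host_apic_id);
}

int main(void)
{
	/*
	 * Hypothetical entry: valid, running on host APIC ID 3,
	 * backing page at physical address 0x12345000.
	 */
	uint64_t entry = AVIC_PHYSICAL_ID_ENTRY_VALID_MASK |
			 AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK |
			 0x12345000ULL | 3;

	decode_entry(entry);
	return 0;
}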

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/msr-index.h |  1 +
 arch/x86/include/asm/svm.h       | 36 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/avic.c          | 22 +------------------
 arch/x86/kvm/svm/svm.h           | 11 ----------
 4 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 01e2650b95859..552ff8a5ea023 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -476,6 +476,7 @@
 #define MSR_AMD64_ICIBSEXTDCTL		0xc001103c
 #define MSR_AMD64_IBSOPDATA4		0xc001103d
 #define MSR_AMD64_IBS_REG_COUNT_MAX	8 /* includes MSR_AMD64_IBSBRTARGET */
+#define MSR_AMD64_SVM_AVIC_DOORBELL	0xc001011b
 #define MSR_AMD64_VM_PAGE_FLUSH		0xc001011e
 #define MSR_AMD64_SEV_ES_GHCB		0xc0010130
 #define MSR_AMD64_SEV			0xc0010131
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index b00dbc5fac2b2..bb2fb78523cee 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -220,6 +220,42 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define SVM_NESTED_CTL_SEV_ENABLE	BIT(1)
 #define SVM_NESTED_CTL_SEV_ES_ENABLE	BIT(2)
 
+
+/* AVIC */
+#define AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK	(0xFF)
+#define AVIC_LOGICAL_ID_ENTRY_VALID_BIT			31
+#define AVIC_LOGICAL_ID_ENTRY_VALID_MASK		(1 << 31)
+
+#define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK	(0xFFULL)
+#define AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK	(0xFFFFFFFFFFULL << 12)
+#define AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK		(1ULL << 62)
+#define AVIC_PHYSICAL_ID_ENTRY_VALID_MASK		(1ULL << 63)
+#define AVIC_PHYSICAL_ID_TABLE_SIZE_MASK		(0xFF)
+
+#define AVIC_DOORBELL_PHYSICAL_ID_MASK			(0xFF)
+
+#define AVIC_UNACCEL_ACCESS_WRITE_MASK		1
+#define AVIC_UNACCEL_ACCESS_OFFSET_MASK		0xFF0
+#define AVIC_UNACCEL_ACCESS_VECTOR_MASK		0xFFFFFFFF
+
+enum avic_ipi_failure_cause {
+	AVIC_IPI_FAILURE_INVALID_INT_TYPE,
+	AVIC_IPI_FAILURE_TARGET_NOT_RUNNING,
+	AVIC_IPI_FAILURE_INVALID_TARGET,
+	AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
+};
+
+
+/*
+ * 0xff is broadcast, so the max index allowed for physical APIC ID
+ * table is 0xfe.  APIC IDs above 0xff are reserved.
+ */
+#define AVIC_MAX_PHYSICAL_ID_COUNT	0xff
+
+#define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
+#define VMCB_AVIC_APIC_BAR_MASK		0xFFFFFFFFFF000ULL
+
+
 struct vmcb_seg {
 	u16 selector;
 	u16 attrib;
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 99f907ec5aa8f..fabfc337e1c35 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -27,20 +27,6 @@
 #include "irq.h"
 #include "svm.h"
 
-#define SVM_AVIC_DOORBELL	0xc001011b
-
-#define AVIC_HPA_MASK	~((0xFFFULL << 52) | 0xFFF)
-
-/*
- * 0xff is broadcast, so the max index allowed for physical APIC ID
- * table is 0xfe.  APIC IDs above 0xff are reserved.
- */
-#define AVIC_MAX_PHYSICAL_ID_COUNT	255
-
-#define AVIC_UNACCEL_ACCESS_WRITE_MASK		1
-#define AVIC_UNACCEL_ACCESS_OFFSET_MASK		0xFF0
-#define AVIC_UNACCEL_ACCESS_VECTOR_MASK		0xFFFFFFFF
-
 /* AVIC GATAG is encoded using VM and VCPU IDs */
 #define AVIC_VCPU_ID_BITS		8
 #define AVIC_VCPU_ID_MASK		((1 << AVIC_VCPU_ID_BITS) - 1)
@@ -73,12 +59,6 @@ struct amd_svm_iommu_ir {
 	void *data;		/* Storing pointer to struct amd_ir_data */
 };
 
-enum avic_ipi_failure_cause {
-	AVIC_IPI_FAILURE_INVALID_INT_TYPE,
-	AVIC_IPI_FAILURE_TARGET_NOT_RUNNING,
-	AVIC_IPI_FAILURE_INVALID_TARGET,
-	AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
-};
 
 /* Note:
  * This function is called from IOMMU driver to notify
@@ -702,7 +682,7 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
 		 * one is harmless).
 		 */
 		if (cpu != get_cpu())
-			wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
+			wrmsrl(MSR_AMD64_SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
 		put_cpu();
 	} else {
 		/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 852b12aee03d7..6343558982c73 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -555,17 +555,6 @@ extern struct kvm_x86_nested_ops svm_nested_ops;
 
 /* avic.c */
 
-#define AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK	(0xFF)
-#define AVIC_LOGICAL_ID_ENTRY_VALID_BIT			31
-#define AVIC_LOGICAL_ID_ENTRY_VALID_MASK		(1 << 31)
-
-#define AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK	(0xFFULL)
-#define AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK	(0xFFFFFFFFFFULL << 12)
-#define AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK		(1ULL << 62)
-#define AVIC_PHYSICAL_ID_ENTRY_VALID_MASK		(1ULL << 63)
-
-#define VMCB_AVIC_APIC_BAR_MASK		0xFFFFFFFFFF000ULL
-
 int avic_ga_log_notifier(u32 ga_tag);
 void avic_vm_destroy(struct kvm *kvm);
 int avic_vm_init(struct kvm *kvm);
-- 
2.26.3



* [PATCH RESEND 10/30] KVM: x86: SVM: fix race between interrupt delivery and AVIC inhibition
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (8 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 09/30] KVM: x86: SVM: move avic definitions from AMD's spec to svm.h Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 11/30] KVM: x86: SVM: use vmcb01 in avic_init_vmcb Maxim Levitsky
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

If svm_deliver_avic_intr is called just after the target vCPU's AVIC got
inhibited, it might read a stale value of vcpu->arch.apicv_active,
which can lead to the target vCPU not noticing the interrupt.

To fix this use load-acquire/store-release so that, if the target vCPU
is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the
AVIC.  If AVIC has been disabled in the meanwhile, proceed with the
KVM_REQ_EVENT-based delivery.

All this complicated logic is actually exactly how we can handle an
incomplete IPI vmexit; the only difference lies in who sets IRR, whether
KVM or the processor.

The incomplete IPI vmexit also has the same races as
svm_deliver_avic_intr, therefore use avic_kick_target_vcpu
there as well.
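
The acquire/release pairing described above can be demonstrated in
isolation with C11 atomics; a minimal sketch (userspace C, not KVM
code; the names only loosely mirror the kernel ones):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum { OUTSIDE_GUEST_MODE, IN_GUEST_MODE };

struct vcpu {
	_Atomic int mode;
	_Atomic bool apicv_active;
};

/*
 * vcpu_enter_guest() side: any update of apicv_active (e.g. an AVIC
 * inhibition) happens before the release store of mode, so it is
 * published together with IN_GUEST_MODE.
 */
static void enter_guest(struct vcpu *v, bool apicv)
{
	atomic_store_explicit(&v->apicv_active, apicv, memory_order_relaxed);
	atomic_store_explicit(&v->mode, IN_GUEST_MODE, memory_order_release);
}

/*
 * Interrupt-delivery side: read mode with acquire semantics *before*
 * apicv_active.  If we observe IN_GUEST_MODE we are guaranteed to also
 * observe the apicv_active value written before it, so a vCPU whose
 * AVIC was just inhibited falls back to the KVM_REQ_EVENT path.
 */
static void deliver(struct vcpu *v)
{
	bool in_guest = atomic_load_explicit(&v->mode,
					     memory_order_acquire) == IN_GUEST_MODE;

	if (!atomic_load_explicit(&v->apicv_active, memory_order_relaxed)) {
		puts("AVIC inhibited: KVM_REQ_EVENT + kick");
		return;
	}

	if (in_guest)
		puts("ring the AVIC doorbell");
	else
		puts("wake the blocking vCPU");
}

int main(void)
{
	struct vcpu v = { .mode = OUTSIDE_GUEST_MODE, .apicv_active = false };

	deliver(&v);			/* inhibited: falls back */
	enter_guest(&v, true);
	deliver(&v);			/* active and in guest: doorbell */
	return 0;
}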

Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 73 ++++++++++++++---------------------------
 arch/x86/kvm/svm/svm.c  | 65 ++++++++++++++++++++++++++++--------
 arch/x86/kvm/svm/svm.h  |  3 ++
 arch/x86/kvm/x86.c      |  4 ++-
 4 files changed, 82 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index fabfc337e1c35..4c2d622b3b9f0 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -269,6 +269,24 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+
+void avic_ring_doorbell(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Note, the vCPU could get migrated to a different pCPU at any
+	 * point, which could result in signalling the wrong/previous
+	 * pCPU.  But if that happens the vCPU is guaranteed to do a
+	 * VMRUN (after being migrated) and thus will process pending
+	 * interrupts, i.e. a doorbell is not needed (and the spurious
+	 * one is harmless).
+	 */
+	int cpu = READ_ONCE(vcpu->cpu);
+
+	if (cpu != get_cpu())
+		wrmsrl(MSR_AMD64_SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
+	put_cpu();
+}
+
 static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
 				   u32 icrl, u32 icrh)
 {
@@ -284,8 +302,13 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
 					GET_APIC_DEST_FIELD(icrh),
-					icrl & APIC_DEST_MASK))
-			kvm_vcpu_wake_up(vcpu);
+					icrl & APIC_DEST_MASK)) {
+			vcpu->arch.apic->irr_pending = true;
+			svm_complete_interrupt_delivery(vcpu,
+							icrl & APIC_MODE_MASK,
+							icrl & APIC_INT_LEVELTRIG,
+							icrl & APIC_VECTOR_MASK);
+		}
 	}
 }
 
@@ -649,52 +672,6 @@ void avic_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
 	return;
 }
 
-int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
-{
-	if (!vcpu->arch.apicv_active)
-		return -1;
-
-	kvm_lapic_set_irr(vec, vcpu->arch.apic);
-
-	/*
-	 * Pairs with the smp_mb_*() after setting vcpu->guest_mode in
-	 * vcpu_enter_guest() to ensure the write to the vIRR is ordered before
-	 * the read of guest_mode, which guarantees that either VMRUN will see
-	 * and process the new vIRR entry, or that the below code will signal
-	 * the doorbell if the vCPU is already running in the guest.
-	 */
-	smp_mb__after_atomic();
-
-	/*
-	 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
-	 * is in the guest.  If the vCPU is not in the guest, hardware will
-	 * automatically process AVIC interrupts at VMRUN.
-	 */
-	if (vcpu->mode == IN_GUEST_MODE) {
-		int cpu = READ_ONCE(vcpu->cpu);
-
-		/*
-		 * Note, the vCPU could get migrated to a different pCPU at any
-		 * point, which could result in signalling the wrong/previous
-		 * pCPU.  But if that happens the vCPU is guaranteed to do a
-		 * VMRUN (after being migrated) and thus will process pending
-		 * interrupts, i.e. a doorbell is not needed (and the spurious
-		 * one is harmless).
-		 */
-		if (cpu != get_cpu())
-			wrmsrl(MSR_AMD64_SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
-		put_cpu();
-	} else {
-		/*
-		 * Wake the vCPU if it was blocking.  KVM will then detect the
-		 * pending IRQ when checking if the vCPU has a wake event.
-		 */
-		kvm_vcpu_wake_up(vcpu);
-	}
-
-	return 0;
-}
-
 bool avic_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
 {
 	return false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 22e614008cf59..18d4d87e12e15 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3310,20 +3310,6 @@ static void svm_inject_irq(struct kvm_vcpu *vcpu)
 		SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
 }
 
-static void svm_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
-				  int trig_mode, int vector)
-{
-	struct kvm_vcpu *vcpu = apic->vcpu;
-
-	if (svm_deliver_avic_intr(vcpu, vector)) {
-		kvm_lapic_set_irr(vector, apic);
-		kvm_make_request(KVM_REQ_EVENT, vcpu);
-		kvm_vcpu_kick(vcpu);
-	} else {
-		trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode,
-					   trig_mode, vector);
-	}
-}
 
 static void svm_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
@@ -4142,6 +4128,57 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
 	return ret;
 }
 
+void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
+		  int trig_mode, int vec)
+{
+	/*
+	 * vcpu->arch.apicv_active must be read after vcpu->mode.
+	 * Pairs with smp_store_release in vcpu_enter_guest.
+	 */
+	bool in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);
+
+	if (!READ_ONCE(vcpu->arch.apicv_active)) {
+		/*
+		 * Manually signal the event
+		 */
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		kvm_vcpu_kick(vcpu);
+		return;
+	}
+
+	trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode, vec);
+
+	if (in_guest_mode)
+		/*
+		 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
+		 * is in the guest.  If the vCPU is not in the guest, hardware will
+		 * automatically process AVIC interrupts at VMRUN.
+		 */
+		avic_ring_doorbell(vcpu);
+	else
+		/*
+		 * Wake the vCPU if it was blocking.  KVM will then detect the
+		 * pending IRQ when checking if the vCPU has a wake event.
+		 */
+		kvm_vcpu_wake_up(vcpu);
+}
+
+static void svm_deliver_interrupt(struct kvm_lapic *apic,  int delivery_mode,
+		  int trig_mode, int vec)
+{
+	kvm_lapic_set_irr(vec, apic);
+
+	/*
+	 * Pairs with the smp_mb_*() after setting vcpu->guest_mode in
+	 * vcpu_enter_guest() to ensure the write to the vIRR is ordered before
+	 * the read of guest_mode, which guarantees that either VMRUN will see
+	 * and process the new vIRR entry, or that the below code will signal
+	 * the doorbell if the vCPU is already running in the guest.
+	 */
+	smp_mb__after_atomic();
+	svm_complete_interrupt_delivery(apic->vcpu, delivery_mode, trig_mode, vec);
+}
+
 static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 {
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6343558982c73..83f9f95eced3e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -488,6 +488,8 @@ void svm_set_gif(struct vcpu_svm *svm, bool value);
 int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code);
 void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
 			  int read, int write);
+void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
+		  int trig_mode, int vec);
 
 /* nested.c */
 
@@ -577,6 +579,7 @@ int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
 			uint32_t guest_irq, bool set);
 void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
 void avic_vcpu_unblocking(struct kvm_vcpu *vcpu);
+void avic_ring_doorbell(struct kvm_vcpu *vcpu);
 
 /* sev.c */
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6f69f3e3635e2..8cb5390f75efe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10039,7 +10039,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * result in virtual interrupt delivery.
 	 */
 	local_irq_disable();
-	vcpu->mode = IN_GUEST_MODE;
+
+	/* Store vcpu->apicv_active before vcpu->mode.  */
+	smp_store_release(&vcpu->mode, IN_GUEST_MODE);
 
 	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 
-- 
2.26.3



* [PATCH RESEND 11/30] KVM: x86: SVM: use vmcb01 in avic_init_vmcb
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (9 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 10/30] KVM: x86: SVM: fix race between interrupt delivery and AVIC inhibition Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 12/30] KVM: x86: SVM: allow AVIC to co-exist with a nested guest running Maxim Levitsky
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

Out of precaution, use vmcb01 when enabling host AVIC.

No functional change intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 4c2d622b3b9f0..c6072245f7fbb 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -167,7 +167,7 @@ int avic_vm_init(struct kvm *kvm)
 
 void avic_init_vmcb(struct vcpu_svm *svm)
 {
-	struct vmcb *vmcb = svm->vmcb;
+	struct vmcb *vmcb = svm->vmcb01.ptr;
 	struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
 	phys_addr_t bpa = __sme_set(page_to_phys(svm->avic_backing_page));
 	phys_addr_t lpa = __sme_set(page_to_phys(kvm_svm->avic_logical_id_table_page));
-- 
2.26.3



* [PATCH RESEND 12/30] KVM: x86: SVM: allow AVIC to co-exist with a nested guest running
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (10 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 11/30] KVM: x86: SVM: use vmcb01 in avic_init_vmcb Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 13/30] KVM: x86: lapic: don't allow to change APIC ID when apic acceleration is enabled Maxim Levitsky
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

Inhibit the AVIC of the vCPU that is running nested for the duration of the
nested run, so that all interrupts arriving from both its vCPU siblings
and from KVM are delivered using normal IPIs and cause that vCPU to vmexit.

Note that unlike normal AVIC inhibition, there is no need to
update the AVIC mmio memslot, because the nested guest uses its
own set of paging tables.
That also means that AVIC doesn't need to be inhibited VM wide.

Note that, in theory, when a nested guest doesn't intercept
physical interrupts, we could continue using AVIC to deliver them
to it, but we don't bother doing so for now. Besides, once nested AVIC
is implemented, the nested guest will likely use it, which will
not allow this optimization to be used anyway
(the real AVIC can't support both L1 and L2 at the same time).
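
A minimal sketch of the optional-hook pattern this introduces
(userspace C, hypothetical struct layout; it only mirrors the idea,
not the real kvm_x86_ops):

#include <stdbool.h>
#include <stdio.h>

struct kvm_vcpu {
	bool is_guest_mode;	/* stand-in for is_guest_mode(vcpu) */
	bool vm_level_apicv;	/* stand-in for kvm_apicv_activated(kvm) */
};

struct x86_ops {
	/* may be left NULL by a vendor module (e.g. when AVIC is disabled) */
	bool (*vcpu_has_apicv_inhibit_condition)(struct kvm_vcpu *vcpu);
};

/* SVM's implementation: the per-vCPU condition is simply "L2 is running". */
static bool avic_has_vcpu_inhibit_condition(struct kvm_vcpu *vcpu)
{
	return vcpu->is_guest_mode;
}

static bool vcpu_has_apicv_inhibit_condition(const struct x86_ops *ops,
					     struct kvm_vcpu *vcpu)
{
	if (ops->vcpu_has_apicv_inhibit_condition)
		return ops->vcpu_has_apicv_inhibit_condition(vcpu);
	return false;
}

/* Per-vCPU activation decision, as in the kvm_vcpu_update_apicv() hunk. */
static bool apicv_should_activate(const struct x86_ops *ops,
				  struct kvm_vcpu *vcpu)
{
	return vcpu->vm_level_apicv &&
	       !vcpu_has_apicv_inhibit_condition(ops, vcpu);
}

int main(void)
{
	struct x86_ops svm_ops = {
		.vcpu_has_apicv_inhibit_condition = avic_has_vcpu_inhibit_condition,
	};
	struct kvm_vcpu vcpu = { .is_guest_mode = true, .vm_level_apicv = true };

	printf("activate APICv: %d\n", apicv_should_activate(&svm_ops, &vcpu));
	return 0;
}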

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  8 +++++++-
 arch/x86/kvm/svm/avic.c            |  7 ++++++-
 arch/x86/kvm/svm/nested.c          | 15 ++++++++++-----
 arch/x86/kvm/svm/svm.c             | 31 +++++++++++++++++++-----------
 arch/x86/kvm/svm/svm.h             |  1 +
 arch/x86/kvm/x86.c                 | 18 +++++++++++++++--
 7 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 9e37dc3d88636..c0d8f351dcbc0 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -125,6 +125,7 @@ KVM_X86_OP_NULL(migrate_timers)
 KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP_NULL(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
+KVM_X86_OP_NULL(vcpu_has_apicv_inhibit_condition);
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_NULL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c371ee7e45f78..256539c0481c5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1039,7 +1039,6 @@ struct kvm_x86_msr_filter {
 
 #define APICV_INHIBIT_REASON_DISABLE    0
 #define APICV_INHIBIT_REASON_HYPERV     1
-#define APICV_INHIBIT_REASON_NESTED     2
 #define APICV_INHIBIT_REASON_IRQWIN     3
 #define APICV_INHIBIT_REASON_PIT_REINJ  4
 #define APICV_INHIBIT_REASON_X2APIC	5
@@ -1494,6 +1493,12 @@ struct kvm_x86_ops {
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
+
+	/*
+	 * Returns true if for some reason APICv (e.g guest mode)
+	 * must be inhibited on this vCPU
+	 */
+	bool (*vcpu_has_apicv_inhibit_condition)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
@@ -1784,6 +1789,7 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
 
 bool kvm_apicv_activated(struct kvm *kvm);
 void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
+bool vcpu_has_apicv_inhibit_condition(struct kvm_vcpu *vcpu);
 void kvm_request_apicv_update(struct kvm *kvm, bool activate,
 			      unsigned long bit);
 
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index c6072245f7fbb..8f23e7d239097 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -677,6 +677,12 @@ bool avic_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
 	return false;
 }
 
+bool avic_has_vcpu_inhibit_condition(struct kvm_vcpu *vcpu)
+{
+	return is_guest_mode(vcpu);
+}
+
+
 static void svm_ir_list_del(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
 {
 	unsigned long flags;
@@ -888,7 +894,6 @@ bool avic_check_apicv_inhibit_reasons(ulong bit)
 	ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
 			  BIT(APICV_INHIBIT_REASON_ABSENT) |
 			  BIT(APICV_INHIBIT_REASON_HYPERV) |
-			  BIT(APICV_INHIBIT_REASON_NESTED) |
 			  BIT(APICV_INHIBIT_REASON_IRQWIN) |
 			  BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
 			  BIT(APICV_INHIBIT_REASON_X2APIC) |
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 39d280e7e80ef..ac9159b0618c7 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -551,11 +551,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	 * exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
 	 */
 
-	/*
-	 * Also covers avic_vapic_bar, avic_backing_page, avic_logical_id,
-	 * avic_physical_id.
-	 */
-	WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
 
 	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
 	svm->vmcb->control.nested_ctl = svm->vmcb01.ptr->control.nested_ctl;
@@ -659,6 +654,9 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
 
 	svm_set_gif(svm, true);
 
+	if (kvm_vcpu_apicv_active(vcpu))
+		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
+
 	return 0;
 }
 
@@ -923,6 +921,13 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (unlikely(svm->vmcb->save.rflags & X86_EFLAGS_TF))
 		kvm_queue_exception(&(svm->vcpu), DB_VECTOR);
 
+	/*
+	 * Un-inhibit the AVIC right away, so that other vCPUs can start
+	 * to benefit from VM-exit less IPI right away
+	 */
+	if (kvm_apicv_activated(vcpu->kvm))
+		kvm_vcpu_update_apicv(vcpu);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 18d4d87e12e15..85035324ed762 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1392,7 +1392,8 @@ static void svm_set_vintr(struct vcpu_svm *svm)
 	/*
 	 * The following fields are ignored when AVIC is enabled
 	 */
-	WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
+	if (!is_guest_mode(&svm->vcpu))
+		WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
 
 	svm_set_intercept(svm, INTERCEPT_VINTR);
 
@@ -2898,10 +2899,16 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
 	svm_clear_vintr(to_svm(vcpu));
 
 	/*
-	 * For AVIC, the only reason to end up here is ExtINTs.
+	 * If not running nested, for AVIC, the only reason to end up here is ExtINTs.
 	 * In this case AVIC was temporarily disabled for
 	 * requesting the IRQ window and we have to re-enable it.
+	 *
+	 * If running nested, still uninhibit the AVIC in case irq window
+	 * was requested when it was not running nested.
+	 * All vCPUs which run nested will have their AVIC still
+	 * inhibited due to AVIC inhibition override for that.
 	 */
+
 	kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_IRQWIN);
 
 	++vcpu->stat.irq_window_exits;
@@ -3451,8 +3458,16 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
 		 * unless we have pending ExtINT since it cannot be injected
 		 * via AVIC. In such case, we need to temporarily disable AVIC,
 		 * and fallback to injecting IRQ via V_IRQ.
+		 *
+		 * If running nested, this vCPU will use separate page tables
+		 * which don't have L1's AVIC mapped, and the AVIC is
+		 * already inhibited thus there is no need for global
+		 * AVIC inhibition.
 		 */
-		kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
+
+		if (!is_guest_mode(vcpu))
+			kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
+
 		svm_set_vintr(svm);
 	}
 }
@@ -3927,14 +3942,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
 			kvm_request_apicv_update(vcpu->kvm, false,
 						 APICV_INHIBIT_REASON_X2APIC);
-
-		/*
-		 * Currently, AVIC does not work with nested virtualization.
-		 * So, we disable AVIC when cpuid for SVM is set in the L1 guest.
-		 */
-		if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
-			kvm_request_apicv_update(vcpu->kvm, false,
-						 APICV_INHIBIT_REASON_NESTED);
 	}
 	init_vmcb_after_set_cpuid(vcpu);
 }
@@ -4657,6 +4664,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
+	.vcpu_has_apicv_inhibit_condition = avic_has_vcpu_inhibit_condition,
 };
 
 /*
@@ -4840,6 +4848,7 @@ static __init int svm_hardware_setup(void)
 	} else {
 		svm_x86_ops.vcpu_blocking = NULL;
 		svm_x86_ops.vcpu_unblocking = NULL;
+		svm_x86_ops.vcpu_has_apicv_inhibit_condition = NULL;
 	}
 
 	if (vls) {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 83f9f95eced3e..c02903641d13d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -580,6 +580,7 @@ int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
 void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
 void avic_vcpu_unblocking(struct kvm_vcpu *vcpu);
 void avic_ring_doorbell(struct kvm_vcpu *vcpu);
+bool avic_has_vcpu_inhibit_condition(struct kvm_vcpu *vcpu);
 
 /* sev.c */
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8cb5390f75efe..63d84c373e465 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9697,6 +9697,14 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
 }
 
+bool vcpu_has_apicv_inhibit_condition(struct kvm_vcpu *vcpu)
+{
+	if (kvm_x86_ops.vcpu_has_apicv_inhibit_condition)
+		return static_call(kvm_x86_vcpu_has_apicv_inhibit_condition)(vcpu);
+	else
+		return false;
+}
+
 void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 {
 	bool activate;
@@ -9706,7 +9714,9 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 
 	down_read(&vcpu->kvm->arch.apicv_update_lock);
 
-	activate = kvm_apicv_activated(vcpu->kvm);
+	activate = kvm_apicv_activated(vcpu->kvm) &&
+		   !vcpu_has_apicv_inhibit_condition(vcpu);
+
 	if (vcpu->arch.apicv_active == activate)
 		goto out;
 
@@ -10110,7 +10120,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		 * per-VM state, and responsing vCPUs must wait for the update
 		 * to complete before servicing KVM_REQ_APICV_UPDATE.
 		 */
-		WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
+		if (vcpu_has_apicv_inhibit_condition(vcpu))
+			WARN_ON(kvm_vcpu_apicv_active(vcpu));
+		else
+			WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
+
 
 		exit_fastpath = static_call(kvm_x86_vcpu_run)(vcpu);
 		if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
-- 
2.26.3



* [PATCH RESEND 13/30] KVM: x86: lapic: don't allow to change APIC ID when apic acceleration is enabled
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (11 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 12/30] KVM: x86: SVM: allow AVIC to co-exist with a nested guest running Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 14/30] KVM: x86: lapic: don't allow to change local apic id when using older x2apic api Maxim Levitsky
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

No normal guest has any reason to change physical APIC IDs, and
allowing this introduces bugs into APIC acceleration code.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/lapic.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dd4e2888c244b..7ff695cab27b2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2002,10 +2002,20 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 	switch (reg) {
 	case APIC_ID:		/* Local APIC ID */
-		if (!apic_x2apic_mode(apic))
-			kvm_apic_set_xapic_id(apic, val >> 24);
-		else
+		if (apic_x2apic_mode(apic)) {
 			ret = 1;
+			break;
+		}
+		/*
+		 * Don't allow setting APIC ID with any APIC acceleration
+		 * enabled to avoid unexpected issues
+		 */
+		if (enable_apicv && ((val >> 24) != apic->vcpu->vcpu_id)) {
+			kvm_vm_bugged(apic->vcpu->kvm);
+			break;
+		}
+
+		kvm_apic_set_xapic_id(apic, val >> 24);
 		break;
 
 	case APIC_TASKPRI:
@@ -2572,10 +2582,16 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 		struct kvm_lapic_state *s, bool set)
 {
-	if (apic_x2apic_mode(vcpu->arch.apic)) {
-		u32 *id = (u32 *)(s->regs + APIC_ID);
-		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
+	u32 *id = (u32 *)(s->regs + APIC_ID);
+	u32 *ldr = (u32 *)(s->regs + APIC_LDR);
 
+	if (!apic_x2apic_mode(vcpu->arch.apic)) {
+		/* Don't allow setting APIC ID with any APIC acceleration
+		 * enabled to avoid unexpected issues
+		 */
+		if (enable_apicv && (*id >> 24) != vcpu->vcpu_id)
+			return -EINVAL;
+	} else {
 		if (vcpu->kvm->arch.x2apic_format) {
 			if (*id != vcpu->vcpu_id)
 				return -EINVAL;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 14/30] KVM: x86: lapic: don't allow to change local apic id when using older x2apic api
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (12 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 13/30] KVM: x86: lapic: don't allow to change APIC ID when apic acceleration is enabled Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 15/30] KVM: x86: SVM: remove avic's broken code that updated APIC ID Maxim Levitsky
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

KVM allowed userspace to set a local APIC ID that differs from the
vCPU ID by loading APIC state, when the older, non-x2APIC 32-bit
APIC ID userspace API is in use.  Disallow this.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/lapic.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 7ff695cab27b2..aeddd68d31181 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2592,15 +2592,15 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 		if (enable_apicv && (*id >> 24) != vcpu->vcpu_id)
 			return -EINVAL;
 	} else {
-		if (vcpu->kvm->arch.x2apic_format) {
-			if (*id != vcpu->vcpu_id)
-				return -EINVAL;
-		} else {
-			if (set)
-				*id >>= 24;
-			else
-				*id <<= 24;
-		}
+
+		if (!vcpu->kvm->arch.x2apic_format && set)
+			*id >>= 24;
+
+		if (*id != vcpu->vcpu_id)
+			return -EINVAL;
+
+		if (!vcpu->kvm->arch.x2apic_format && !set)
+			*id <<= 24;
 
 		/* In x2APIC mode, the LDR is fixed and based on the id */
 		if (set)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 15/30] KVM: x86: SVM: remove avic's broken code that updated APIC ID
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (13 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 14/30] KVM: x86: lapic: don't allow to change local apic id when using older x2apic api Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 16/30] KVM: x86: SVM: allow to force AVIC to be enabled Maxim Levitsky
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

Now that KVM no longer allows changing the APIC ID while AVIC is
enabled, remove the buggy AVIC code that tried to handle such changes.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 35 -----------------------------------
 1 file changed, 35 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8f23e7d239097..768252b3dfee6 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -440,35 +440,6 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
-static int avic_handle_apic_id_update(struct kvm_vcpu *vcpu)
-{
-	u64 *old, *new;
-	struct vcpu_svm *svm = to_svm(vcpu);
-	u32 id = kvm_xapic_id(vcpu->arch.apic);
-
-	if (vcpu->vcpu_id == id)
-		return 0;
-
-	old = avic_get_physical_id_entry(vcpu, vcpu->vcpu_id);
-	new = avic_get_physical_id_entry(vcpu, id);
-	if (!new || !old)
-		return 1;
-
-	/* We need to move physical_id_entry to new offset */
-	*new = *old;
-	*old = 0ULL;
-	to_svm(vcpu)->avic_physical_id_cache = new;
-
-	/*
-	 * Also update the guest physical APIC ID in the logical
-	 * APIC ID table entry if already setup the LDR.
-	 */
-	if (svm->ldr_reg)
-		avic_handle_ldr_update(vcpu);
-
-	return 0;
-}
-
 static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -488,10 +459,6 @@ static int avic_unaccel_trap_write(struct vcpu_svm *svm)
 				AVIC_UNACCEL_ACCESS_OFFSET_MASK;
 
 	switch (offset) {
-	case APIC_ID:
-		if (avic_handle_apic_id_update(&svm->vcpu))
-			return 0;
-		break;
 	case APIC_LDR:
 		if (avic_handle_ldr_update(&svm->vcpu))
 			return 0;
@@ -584,8 +551,6 @@ int avic_init_vcpu(struct vcpu_svm *svm)
 
 void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 {
-	if (avic_handle_apic_id_update(vcpu) != 0)
-		return;
 	avic_handle_dfr_update(vcpu);
 	avic_handle_ldr_update(vcpu);
 }
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 16/30] KVM: x86: SVM: allow to force AVIC to be enabled
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (14 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 15/30] KVM: x86: SVM: remove avic's broken code that updated APIC ID Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 17/30] KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set Maxim Levitsky
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

Apparently, on some systems AVIC is reported as unsupported in CPUID
but is still usable.

Allow the user to override the CPUID check if they are willing to
take the risk.
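
As a usage sketch (assuming the parameter names introduced below),
forcing AVIC on such a system amounts to:

	modprobe kvm_amd avic=1 force_avic=1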

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 85035324ed762..b88ca7f07a0fc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -202,6 +202,9 @@ module_param(tsc_scaling, int, 0444);
 static bool avic;
 module_param(avic, bool, 0444);
 
+static bool force_avic;
+module_param_unsafe(force_avic, bool, 0444);
+
 bool __read_mostly dump_invalid_vmcb;
 module_param(dump_invalid_vmcb, bool, 0644);
 
@@ -4839,10 +4842,14 @@ static __init int svm_hardware_setup(void)
 			nrips = false;
 	}
 
-	enable_apicv = avic = avic && npt_enabled && boot_cpu_has(X86_FEATURE_AVIC);
+	enable_apicv = avic = avic && npt_enabled && (boot_cpu_has(X86_FEATURE_AVIC) || force_avic);
 
 	if (enable_apicv) {
-		pr_info("AVIC enabled\n");
+		if (!boot_cpu_has(X86_FEATURE_AVIC)) {
+			pr_warn("AVIC is not supported in CPUID but force enabled");
+			pr_warn("Your system might crash and burn");
+		} else
+			pr_info("AVIC enabled\n");
 
 		amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
 	} else {
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 17/30] KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (15 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 16/30] KVM: x86: SVM: allow to force AVIC to be enabled Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 18/30] KVM: x86: mmu: add strict mmu mode Maxim Levitsky
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

It makes more sense to print the new SPTE value than the
old value.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 296f8723f9ae9..43c7abdd6b70f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2708,8 +2708,8 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 	if (*sptep == spte) {
 		ret = RET_PF_SPURIOUS;
 	} else {
-		trace_kvm_mmu_set_spte(level, gfn, sptep);
 		flush |= mmu_spte_update(sptep, spte);
+		trace_kvm_mmu_set_spte(level, gfn, sptep);
 	}
 
 	if (wrprot) {
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 18/30] KVM: x86: mmu: add strict mmu mode
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (16 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 17/30] KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 19/30] KVM: x86: mmu: add gfn_in_memslot helper Maxim Levitsky
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

Add a (mostly debug) option that forces KVM's shadow MMU to never
have unsync pages.

This is useful in some cases for debugging the shadow MMU.

It is also useful for some legacy guest OSes which don't flush TLBs
correctly, and thus don't work on modern CPUs, which have speculative
MMUs.

Using this option together with legacy paging (npt/ept=0) allows
correctly simulating such an old MMU while still getting most of the
benefits of virtualization.
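
As a usage sketch (assuming the parameter name introduced below and an
AMD host; on Intel the equivalent of npt=0 would be kvm_intel ept=0),
such a legacy guest could be run with:

	modprobe kvm strict_mmu=1
	modprobe kvm_amd npt=0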

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 43c7abdd6b70f..fa2da6990703f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -91,6 +91,10 @@ __MODULE_PARM_TYPE(nx_huge_pages_recovery_period_ms, "uint");
 static bool __read_mostly force_flush_and_sync_on_reuse;
 module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
 
+
+bool strict_mmu;
+module_param(strict_mmu, bool, 0644);
+
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
  * where the hardware walks 2 page tables:
@@ -2703,7 +2707,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 	}
 
 	wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, *sptep, prefetch,
-			   true, host_writable, &spte);
+			   !strict_mmu, host_writable, &spte);
 
 	if (*sptep == spte) {
 		ret = RET_PF_SPURIOUS;
@@ -5139,6 +5143,11 @@ static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa,
  */
 static bool detect_write_flooding(struct kvm_mmu_page *sp)
 {
+	/*
+	 * When using non speculating MMU, use a bit higher threshold
+	 * for write flood detection
+	 */
+	int threshold = strict_mmu ? 10 :  3;
 	/*
 	 * Skip write-flooding detected for the sp whose level is 1, because
 	 * it can become unsync, then the guest page is not write-protected.
@@ -5147,7 +5156,7 @@ static bool detect_write_flooding(struct kvm_mmu_page *sp)
 		return false;
 
 	atomic_inc(&sp->write_flooding_count);
-	return atomic_read(&sp->write_flooding_count) >= 3;
+	return atomic_read(&sp->write_flooding_count) >= threshold;
 }
 
 /*
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 19/30] KVM: x86: mmu: add gfn_in_memslot helper
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (17 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 18/30] KVM: x86: mmu: add strict mmu mode Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 20/30] KVM: x86: mmu: allow to enable write tracking externally Maxim Levitsky
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

This is a tiny refactoring that makes it a bit cleaner to check
whether a GPA/GFN falls within a memslot.
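
As an illustrative usage sketch (the wrapper below is hypothetical;
only gfn_in_memslot() and gpa_to_gfn() are existing helpers):

/* Check whether a guest physical address is backed by the given memslot */
static inline bool gpa_in_memslot(struct kvm_memory_slot *slot, gpa_t gpa)
{
	return slot && gfn_in_memslot(slot, gpa_to_gfn(gpa));
}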

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 include/linux/kvm_host.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b3810976a27f8..483681c6e322e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1564,6 +1564,13 @@ int kvm_request_irq_source_id(struct kvm *kvm);
 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
 bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args);
 
+
+static inline bool gfn_in_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	return (gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages);
+}
+
+
 /*
  * Returns a pointer to the memslot if it contains gfn.
  * Otherwise returns NULL.
@@ -1574,12 +1581,13 @@ try_get_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
 	if (!slot)
 		return NULL;
 
-	if (gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages)
+	if (gfn_in_memslot(slot, gfn))
 		return slot;
 	else
 		return NULL;
 }
 
+
 /*
  * Returns a pointer to the memslot that contains gfn. Otherwise returns NULL.
  *
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 20/30] KVM: x86: mmu: allow to enable write tracking externally
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (18 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 19/30] KVM: x86: mmu: add gfn_in_memslot helper Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 21/30] x86: KVMGT: use kvm_page_track_write_tracking_enable Maxim Levitsky
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

This will be used to enable write tracking from the nested AVIC code,
and can also be used by the GVT-g module to enable write tracking only
when it actually needs it, as opposed to always enabling it whenever
the module is compiled into the kernel.

No functional change intended.
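
As a sketch of the intended external usage (the caller below is
hypothetical; only kvm_page_track_write_tracking_enable() is added by
this patch):

/* Called when a feature that needs write tracking is first enabled */
static int my_feature_enable(struct kvm *kvm)
{
	/* Allocates write-tracking metadata for all memslots if needed */
	int ret = kvm_page_track_write_tracking_enable(kvm);

	if (ret)
		return ret;

	/* ... register a kvm_page_track_notifier_node, write-protect GFNs ... */
	return 0;
}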

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h       |  2 +-
 arch/x86/include/asm/kvm_page_track.h |  1 +
 arch/x86/kvm/mmu.h                    |  8 +++++---
 arch/x86/kvm/mmu/mmu.c                | 16 +++++++++-------
 arch/x86/kvm/mmu/page_track.c         | 10 ++++++++--
 5 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 256539c0481c5..428ab1cc7dd34 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1225,7 +1225,7 @@ struct kvm_arch {
 	 * is used as one input when determining whether certain memslot
 	 * related allocations are necessary.
 	 */
-	bool shadow_root_allocated;
+	bool mmu_page_tracking_enabled;
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	hpa_t	hv_root_tdp;
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a9..955a5ae07b10e 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -50,6 +50,7 @@ int kvm_page_track_init(struct kvm *kvm);
 void kvm_page_track_cleanup(struct kvm *kvm);
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
+int kvm_page_track_write_tracking_enable(struct kvm *kvm);
 int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
 
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 51faa2c76ca5f..48cc042f17466 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -267,7 +267,7 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 int kvm_mmu_post_init_vm(struct kvm *kvm);
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
 
-static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
+static inline bool mmu_page_tracking_enabled(struct kvm *kvm)
 {
 	/*
 	 * Read shadow_root_allocated before related pointers. Hence, threads
@@ -275,9 +275,11 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
 	 * see the pointers. Pairs with smp_store_release in
 	 * mmu_first_shadow_root_alloc.
 	 */
-	return smp_load_acquire(&kvm->arch.shadow_root_allocated);
+	return smp_load_acquire(&kvm->arch.mmu_page_tracking_enabled);
 }
 
+int mmu_enable_write_tracking(struct kvm *kvm);
+
 #ifdef CONFIG_X86_64
 static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
 #else
@@ -286,7 +288,7 @@ static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }
 
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
-	return !is_tdp_mmu_enabled(kvm) || kvm_shadow_root_allocated(kvm);
+	return !is_tdp_mmu_enabled(kvm) || mmu_page_tracking_enabled(kvm);
 }
 
 static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fa2da6990703f..431e02ba73690 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3384,7 +3384,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	return r;
 }
 
-static int mmu_first_shadow_root_alloc(struct kvm *kvm)
+int mmu_enable_write_tracking(struct kvm *kvm)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *slot;
@@ -3394,21 +3394,20 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
 	 * Check if this is the first shadow root being allocated before
 	 * taking the lock.
 	 */
-	if (kvm_shadow_root_allocated(kvm))
+	if (mmu_page_tracking_enabled(kvm))
 		return 0;
 
 	mutex_lock(&kvm->slots_arch_lock);
 
 	/* Recheck, under the lock, whether this is the first shadow root. */
-	if (kvm_shadow_root_allocated(kvm))
+	if (mmu_page_tracking_enabled(kvm))
 		goto out_unlock;
 
 	/*
 	 * Check if anything actually needs to be allocated, e.g. all metadata
 	 * will be allocated upfront if TDP is disabled.
 	 */
-	if (kvm_memslots_have_rmaps(kvm) &&
-	    kvm_page_track_write_tracking_enabled(kvm))
+	if (kvm_memslots_have_rmaps(kvm) && mmu_page_tracking_enabled(kvm))
 		goto out_success;
 
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
@@ -3438,7 +3437,7 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
 	 * all the related pointers are set.
 	 */
 out_success:
-	smp_store_release(&kvm->arch.shadow_root_allocated, true);
+	smp_store_release(&kvm->arch.mmu_page_tracking_enabled, true);
 
 out_unlock:
 	mutex_unlock(&kvm->slots_arch_lock);
@@ -3475,7 +3474,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	r = mmu_first_shadow_root_alloc(vcpu->kvm);
+	r = mmu_enable_write_tracking(vcpu->kvm);
 	if (r)
 		return r;
 
@@ -5712,6 +5711,9 @@ void kvm_mmu_init_vm(struct kvm *kvm)
 	node->track_write = kvm_mmu_pte_write;
 	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
 	kvm_page_track_register_notifier(kvm, node);
+
+	if (IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) || !tdp_enabled)
+		mmu_enable_write_tracking(kvm);
 }
 
 void kvm_mmu_uninit_vm(struct kvm *kvm)
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 68eb1fb548b61..ce5735909e74c 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -21,10 +21,16 @@
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 {
-	return IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) ||
-	       !tdp_enabled || kvm_shadow_root_allocated(kvm);
+	return mmu_page_tracking_enabled(kvm);
 }
 
+int kvm_page_track_write_tracking_enable(struct kvm *kvm)
+{
+	return mmu_enable_write_tracking(kvm);
+}
+EXPORT_SYMBOL_GPL(kvm_page_track_write_tracking_enable);
+
+
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
 {
 	int i;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 21/30] x86: KVMGT: use kvm_page_track_write_tracking_enable
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (19 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 20/30] KVM: x86: mmu: allow to enable write tracking externally Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 22/30] KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running Maxim Levitsky
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

This allows enabling write tracking only when KVMGT is actually used,
so it carries no penalty otherwise.

Tested by booting a VM with a kvmgt mdev device.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/Kconfig             | 3 ---
 arch/x86/kvm/mmu/mmu.c           | 2 +-
 drivers/gpu/drm/i915/Kconfig     | 1 -
 drivers/gpu/drm/i915/gvt/kvmgt.c | 5 +++++
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index ebc8ce9ec9173..169f8833cd0d1 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -132,7 +132,4 @@ config KVM_MMU_AUDIT
 	 This option adds a R/W kVM module parameter 'mmu_audit', which allows
 	 auditing of KVM MMU events at runtime.
 
-config KVM_EXTERNAL_WRITE_TRACKING
-	bool
-
 endif # VIRTUALIZATION
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 431e02ba73690..e4e2fc8e7d7a5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5712,7 +5712,7 @@ void kvm_mmu_init_vm(struct kvm *kvm)
 	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
 	kvm_page_track_register_notifier(kvm, node);
 
-	if (IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) || !tdp_enabled)
+	if (!tdp_enabled)
 		mmu_enable_write_tracking(kvm);
 }
 
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 84b6fc70cbf52..bf041b26ffec3 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -126,7 +126,6 @@ config DRM_I915_GVT_KVMGT
 	depends on DRM_I915_GVT
 	depends on KVM
 	depends on VFIO_MDEV
-	select KVM_EXTERNAL_WRITE_TRACKING
 	default n
 	help
 	  Choose this option if you want to enable KVMGT support for
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 20b82fb036f8c..64ced3c2bc550 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1916,6 +1916,7 @@ static int kvmgt_guest_init(struct mdev_device *mdev)
 	struct intel_vgpu *vgpu;
 	struct kvmgt_vdev *vdev;
 	struct kvm *kvm;
+	int ret;
 
 	vgpu = mdev_get_drvdata(mdev);
 	if (handle_valid(vgpu->handle))
@@ -1931,6 +1932,10 @@ static int kvmgt_guest_init(struct mdev_device *mdev)
 	if (__kvmgt_vgpu_exist(vgpu, kvm))
 		return -EEXIST;
 
+	ret = kvm_page_track_write_tracking_enable(kvm);
+	if (ret)
+		return ret;
+
 	info = vzalloc(sizeof(struct kvmgt_guest_info));
 	if (!info)
 		return -ENOMEM;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 22/30] KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (20 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 21/30] x86: KVMGT: use kvm_page_track_write_tracking_enable Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 23/30] KVM: x86: nSVM: implement nested LBR virtualization Maxim Levitsky
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

When L2 is running without LBR virtualization, we should ensure
that L1's LBR msrs continue to update as usual.
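
The rule this patch implements can be summarized by the sketch below
(illustrative only, mirroring svm_get_lbr_msr() in the diff): the LBR
MSRs live in the current VMCB while LBR virtualization is enabled
there, and in vmcb01 otherwise, so L1's values keep updating while L2
runs without LBRV.

/* Illustrative only: which VMCB currently holds the LBR MSRs */
static struct vmcb *lbr_msrs_vmcb(struct vcpu_svm *svm)
{
	return (svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) ?
		svm->vmcb : svm->vmcb01.ptr;
}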

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 11 +++++
 arch/x86/kvm/svm/svm.c    | 98 +++++++++++++++++++++++++++++++--------
 arch/x86/kvm/svm/svm.h    |  2 +
 3 files changed, 92 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ac9159b0618c7..9f7bc7db08dd3 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -535,6 +535,9 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 		svm->vcpu.arch.dr6  = svm->nested.save.dr6 | DR6_ACTIVE_LOW;
 		vmcb_mark_dirty(svm->vmcb, VMCB_DR);
 	}
+
+	if (unlikely(svm->vmcb01.ptr->control.virt_ext & LBR_CTL_ENABLE_MASK))
+		svm_copy_lbrs(svm->vmcb01.ptr, svm->vmcb);
 }
 
 static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
@@ -587,6 +590,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	svm->vmcb->control.event_inj           = svm->nested.ctl.event_inj;
 	svm->vmcb->control.event_inj_err       = svm->nested.ctl.event_inj_err;
 
+	svm->vmcb->control.virt_ext            = svm->vmcb01.ptr->control.virt_ext &
+						 LBR_CTL_ENABLE_MASK;
+
 	nested_svm_transition_tlb_flush(vcpu);
 
 	/* Enter Guest-Mode */
@@ -852,6 +858,11 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	svm_switch_vmcb(svm, &svm->vmcb01);
 
+	if (unlikely(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
+		svm_copy_lbrs(svm->nested.vmcb02.ptr, svm->vmcb);
+		svm_update_lbrv(vcpu);
+	}
+
 	/*
 	 * On vmexit the  GIF is set to false and
 	 * no event can be injected in L1.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b88ca7f07a0fc..294e016f575a8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -805,6 +805,17 @@ static void init_msrpm_offsets(void)
 	}
 }
 
+void svm_copy_lbrs(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
+{
+	to_vmcb->save.dbgctl		= from_vmcb->save.dbgctl;
+	to_vmcb->save.br_from		= from_vmcb->save.br_from;
+	to_vmcb->save.br_to		= from_vmcb->save.br_to;
+	to_vmcb->save.last_excp_from	= from_vmcb->save.last_excp_from;
+	to_vmcb->save.last_excp_to	= from_vmcb->save.last_excp_to;
+
+	vmcb_mark_dirty(to_vmcb, VMCB_LBR);
+}
+
 static void svm_enable_lbrv(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -814,6 +825,10 @@ static void svm_enable_lbrv(struct kvm_vcpu *vcpu)
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
+
+	/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
+	if (is_guest_mode(vcpu))
+		svm_copy_lbrs(svm->vmcb01.ptr, svm->vmcb);
 }
 
 static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
@@ -825,6 +840,63 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 0, 0);
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 0, 0);
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
+
+	/*
+	 * Move the LBR msrs back to the vmcb01 to avoid copying them
+	 * on nested guest entries.
+	 */
+	if (is_guest_mode(vcpu))
+		svm_copy_lbrs(svm->vmcb, svm->vmcb01.ptr);
+}
+
+static int svm_get_lbr_msr(struct vcpu_svm *svm, u32 index)
+{
+	/*
+	 * If the LBR virtualization is disabled, the LBR msrs are always
+	 * kept in the vmcb01 to avoid copying them on nested guest entries.
+	 *
+	 * If nested, and the LBR virtualization is enabled/disabled, the msrs
+	 * are moved between the vmcb01 and vmcb02 as needed.
+	 */
+	struct vmcb *vmcb =
+		(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) ?
+			svm->vmcb : svm->vmcb01.ptr;
+
+	switch (index) {
+	case MSR_IA32_DEBUGCTLMSR:
+		return vmcb->save.dbgctl;
+	case MSR_IA32_LASTBRANCHFROMIP:
+		return vmcb->save.br_from;
+	case MSR_IA32_LASTBRANCHTOIP:
+		return vmcb->save.br_to;
+	case MSR_IA32_LASTINTFROMIP:
+		return vmcb->save.last_excp_from;
+	case MSR_IA32_LASTINTTOIP:
+		return vmcb->save.last_excp_to;
+	default:
+		KVM_BUG(false, svm->vcpu.kvm,
+			"%s: Unknown MSR 0x%x", __func__, index);
+		return 0;
+	}
+}
+
+void svm_update_lbrv(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	bool enable_lbrv = svm_get_lbr_msr(svm, MSR_IA32_DEBUGCTLMSR) &
+					   DEBUGCTLMSR_LBR;
+
+	bool current_enable_lbrv = !!(svm->vmcb->control.virt_ext &
+				      LBR_CTL_ENABLE_MASK);
+
+	if (enable_lbrv == current_enable_lbrv)
+		return;
+
+	if (enable_lbrv)
+		svm_enable_lbrv(vcpu);
+	else
+		svm_disable_lbrv(vcpu);
 }
 
 void disable_nmi_singlestep(struct vcpu_svm *svm)
@@ -2591,25 +2663,12 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_TSC_AUX:
 		msr_info->data = svm->tsc_aux;
 		break;
-	/*
-	 * Nobody will change the following 5 values in the VMCB so we can
-	 * safely return them on rdmsr. They will always be 0 until LBRV is
-	 * implemented.
-	 */
 	case MSR_IA32_DEBUGCTLMSR:
-		msr_info->data = svm->vmcb->save.dbgctl;
-		break;
 	case MSR_IA32_LASTBRANCHFROMIP:
-		msr_info->data = svm->vmcb->save.br_from;
-		break;
 	case MSR_IA32_LASTBRANCHTOIP:
-		msr_info->data = svm->vmcb->save.br_to;
-		break;
 	case MSR_IA32_LASTINTFROMIP:
-		msr_info->data = svm->vmcb->save.last_excp_from;
-		break;
 	case MSR_IA32_LASTINTTOIP:
-		msr_info->data = svm->vmcb->save.last_excp_to;
+		msr_info->data = svm_get_lbr_msr(svm, msr_info->index);
 		break;
 	case MSR_VM_HSAVE_PA:
 		msr_info->data = svm->nested.hsave_msr;
@@ -2840,12 +2899,13 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		if (data & DEBUGCTL_RESERVED_BITS)
 			return 1;
 
-		svm->vmcb->save.dbgctl = data;
-		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
-		if (data & (1ULL<<0))
-			svm_enable_lbrv(vcpu);
+		if (svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK)
+			svm->vmcb->save.dbgctl = data;
 		else
-			svm_disable_lbrv(vcpu);
+			svm->vmcb01.ptr->save.dbgctl = data;
+
+		svm_update_lbrv(vcpu);
+
 		break;
 	case MSR_VM_HSAVE_PA:
 		/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c02903641d13d..b83e06d5d942a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -476,6 +476,8 @@ u32 svm_msrpm_offset(u32 msr);
 u32 *svm_vcpu_alloc_msrpm(void);
 void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);
 void svm_vcpu_free_msrpm(u32 *msrpm);
+void svm_copy_lbrs(struct vmcb *from_vmcb, struct vmcb *to_vmcb);
+void svm_update_lbrv(struct kvm_vcpu *vcpu);
 
 int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
 void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 23/30] KVM: x86: nSVM: implement nested LBR virtualization
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (21 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 22/30] KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 24/30] KVM: x86: nSVM: implement nested VMLOAD/VMSAVE Maxim Levitsky
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

This was tested with a kvm-unit-test that was developed for this
purpose.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 21 +++++++++++++++++++--
 arch/x86/kvm/svm/svm.c    |  8 ++++++++
 arch/x86/kvm/svm/svm.h    |  1 +
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9f7bc7db08dd3..4a228a76b27d7 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -536,8 +536,19 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 		vmcb_mark_dirty(svm->vmcb, VMCB_DR);
 	}
 
-	if (unlikely(svm->vmcb01.ptr->control.virt_ext & LBR_CTL_ENABLE_MASK))
+	if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+
+		/* Copy LBR related registers from vmcb12,
+		 * but make sure that we only pick LBR enable bit from the guest.
+		 */
+
+		svm_copy_lbrs(vmcb12, svm->vmcb);
+		svm->vmcb->save.dbgctl &= LBR_CTL_ENABLE_MASK;
+		svm_update_lbrv(&svm->vcpu);
+
+	} else if (unlikely(svm->vmcb01.ptr->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
 		svm_copy_lbrs(svm->vmcb01.ptr, svm->vmcb);
+	}
 }
 
 static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
@@ -592,6 +603,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 
 	svm->vmcb->control.virt_ext            = svm->vmcb01.ptr->control.virt_ext &
 						 LBR_CTL_ENABLE_MASK;
+	if (svm->lbrv_enabled)
+		svm->vmcb->control.virt_ext  |=
+			(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
 
 	nested_svm_transition_tlb_flush(vcpu);
 
@@ -858,7 +872,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	svm_switch_vmcb(svm, &svm->vmcb01);
 
-	if (unlikely(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
+	if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+		svm_copy_lbrs(svm->nested.vmcb02.ptr, vmcb12);
+		svm_update_lbrv(vcpu);
+	} else if (unlikely(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
 		svm_copy_lbrs(svm->nested.vmcb02.ptr, svm->vmcb);
 		svm_update_lbrv(vcpu);
 	}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 294e016f575a8..76aa6054d9db2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -890,6 +890,10 @@ void svm_update_lbrv(struct kvm_vcpu *vcpu)
 	bool current_enable_lbrv = !!(svm->vmcb->control.virt_ext &
 				      LBR_CTL_ENABLE_MASK);
 
+	if (unlikely(is_guest_mode(vcpu) && svm->lbrv_enabled))
+		if (unlikely(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))
+			enable_lbrv = true;
+
 	if (enable_lbrv == current_enable_lbrv)
 		return;
 
@@ -3987,6 +3991,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 			     guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
 
 	svm->tsc_scaling_enabled = tsc_scaling && guest_cpuid_has(vcpu, X86_FEATURE_TSCRATEMSR);
+	svm->lbrv_enabled = lbrv && guest_cpuid_has(vcpu, X86_FEATURE_LBRV);
 
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
@@ -4791,6 +4796,9 @@ static __init void svm_set_cpu_caps(void)
 		if (tsc_scaling)
 			kvm_cpu_cap_set(X86_FEATURE_TSCRATEMSR);
 
+		if (lbrv)
+			kvm_cpu_cap_set(X86_FEATURE_LBRV);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index b83e06d5d942a..0012ba5affcba 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -220,6 +220,7 @@ struct vcpu_svm {
 	/* cached guest cpuid flags for faster access */
 	bool nrips_enabled                : 1;
 	bool tsc_scaling_enabled          : 1;
+	bool lbrv_enabled                 : 1;
 
 	u32 ldr_reg;
 	u32 dfr_reg;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 24/30] KVM: x86: nSVM: implement nested VMLOAD/VMSAVE
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (22 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 23/30] KVM: x86: nSVM: implement nested LBR virtualization Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 25/30] KVM: x86: nSVM: support PAUSE filter threshold and count when cpu_pm=on Maxim Levitsky
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

This was tested by booting L1, L2 and L3 guests (all Linux) and
checking that no VMLOAD/VMSAVE vmexits happened.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 35 +++++++++++++++++++++++++++++------
 arch/x86/kvm/svm/svm.c    |  7 +++++++
 arch/x86/kvm/svm/svm.h    |  8 +++++++-
 3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 4a228a76b27d7..bdcb23c76e89e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -120,6 +120,20 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
 }
 
+static bool nested_vmcb_needs_vls_intercept(struct vcpu_svm *svm)
+{
+	if (!svm->v_vmload_vmsave_enabled)
+		return true;
+
+	if (!nested_npt_enabled(svm))
+		return true;
+
+	if (!(svm->nested.ctl.virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK))
+		return true;
+
+	return false;
+}
+
 void recalc_intercepts(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *c, *h;
@@ -161,8 +175,17 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	if (!intercept_smi)
 		vmcb_clr_intercept(c, INTERCEPT_SMI);
 
-	vmcb_set_intercept(c, INTERCEPT_VMLOAD);
-	vmcb_set_intercept(c, INTERCEPT_VMSAVE);
+	if (nested_vmcb_needs_vls_intercept(svm)) {
+		/*
+		 * If the virtual VMLOAD/VMSAVE is not enabled for the L2,
+		 * we must intercept these instructions to correctly
+		 * emulate them in case L1 doesn't intercept them.
+		 */
+		vmcb_set_intercept(c, INTERCEPT_VMLOAD);
+		vmcb_set_intercept(c, INTERCEPT_VMSAVE);
+	} else {
+		WARN_ON(!(c->virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK));
+	}
 }
 
 static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
@@ -426,10 +449,7 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
 	vmcb12->control.exit_int_info = exit_int_info;
 }
 
-static inline bool nested_npt_enabled(struct vcpu_svm *svm)
-{
-	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
-}
+
 
 static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
 {
@@ -607,6 +627,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 		svm->vmcb->control.virt_ext  |=
 			(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
 
+	if (!nested_vmcb_needs_vls_intercept(svm))
+		svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+
 	nested_svm_transition_tlb_flush(vcpu);
 
 	/* Enter Guest-Mode */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 76aa6054d9db2..0f068da098d9f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1051,6 +1051,8 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 0, 0);
 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 0, 0);
+
+		svm->v_vmload_vmsave_enabled = false;
 	} else {
 		/*
 		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
@@ -3993,6 +3995,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	svm->tsc_scaling_enabled = tsc_scaling && guest_cpuid_has(vcpu, X86_FEATURE_TSCRATEMSR);
 	svm->lbrv_enabled = lbrv && guest_cpuid_has(vcpu, X86_FEATURE_LBRV);
 
+	svm->v_vmload_vmsave_enabled = vls && guest_cpuid_has(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
+
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
@@ -4799,6 +4803,9 @@ static __init void svm_set_cpu_caps(void)
 		if (lbrv)
 			kvm_cpu_cap_set(X86_FEATURE_LBRV);
 
+		if (vls)
+			kvm_cpu_cap_set(X86_FEATURE_V_VMSAVE_VMLOAD);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0012ba5affcba..e8ffd458a5575 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -217,10 +217,11 @@ struct vcpu_svm {
 	unsigned int3_injected;
 	unsigned long int3_rip;
 
-	/* cached guest cpuid flags for faster access */
+	/* optional nested SVM features that are enabled for this guest  */
 	bool nrips_enabled                : 1;
 	bool tsc_scaling_enabled          : 1;
 	bool lbrv_enabled                 : 1;
+	bool v_vmload_vmsave_enabled      : 1;
 
 	u32 ldr_reg;
 	u32 dfr_reg;
@@ -468,6 +469,11 @@ static inline bool gif_set(struct vcpu_svm *svm)
 		return !!(svm->vcpu.arch.hflags & HF_GIF_MASK);
 }
 
+static inline bool nested_npt_enabled(struct vcpu_svm *svm)
+{
+	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
+}
+
 /* svm.c */
 #define MSR_INVALID				0xffffffffU
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 25/30] KVM: x86: nSVM: support PAUSE filter threshold and count when cpu_pm=on
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (23 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 24/30] KVM: x86: nSVM: implement nested VMLOAD/VMSAVE Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 26/30] KVM: x86: nSVM: implement nested vGIF Maxim Levitsky
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

Allow L1 to use these settings if L0 disables PAUSE interception
(i.e. when running with cpu_pm=on).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c |  6 ++++++
 arch/x86/kvm/svm/svm.c    | 17 +++++++++++++++++
 arch/x86/kvm/svm/svm.h    |  2 ++
 3 files changed, 25 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index bdcb23c76e89e..601d38ae05cc6 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -630,6 +630,12 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	if (!nested_vmcb_needs_vls_intercept(svm))
 		svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 
+	if (svm->pause_filter_enabled)
+		svm->vmcb->control.pause_filter_count = svm->nested.ctl.pause_filter_count;
+
+	if (svm->pause_threshold_enabled)
+		svm->vmcb->control.pause_filter_thresh = svm->nested.ctl.pause_filter_thresh;
+
 	nested_svm_transition_tlb_flush(vcpu);
 
 	/* Enter Guest-Mode */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0f068da098d9f..e49043807ec44 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3997,6 +3997,17 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	svm->v_vmload_vmsave_enabled = vls && guest_cpuid_has(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
 
+	if (kvm_pause_in_guest(vcpu->kvm)) {
+		svm->pause_filter_enabled = pause_filter_count > 0 &&
+					    guest_cpuid_has(vcpu, X86_FEATURE_PAUSEFILTER);
+
+		svm->pause_threshold_enabled = pause_filter_thresh > 0 &&
+					    guest_cpuid_has(vcpu, X86_FEATURE_PFTHRESHOLD);
+	} else {
+		svm->pause_filter_enabled = false;
+		svm->pause_threshold_enabled = false;
+	}
+
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
@@ -4806,6 +4817,12 @@ static __init void svm_set_cpu_caps(void)
 		if (vls)
 			kvm_cpu_cap_set(X86_FEATURE_V_VMSAVE_VMLOAD);
 
+		if (pause_filter_count)
+			kvm_cpu_cap_set(X86_FEATURE_PAUSEFILTER);
+
+		if (pause_filter_thresh)
+			kvm_cpu_cap_set(X86_FEATURE_PFTHRESHOLD);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e8ffd458a5575..297ec57f9941c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -222,6 +222,8 @@ struct vcpu_svm {
 	bool tsc_scaling_enabled          : 1;
 	bool lbrv_enabled                 : 1;
 	bool v_vmload_vmsave_enabled      : 1;
+	bool pause_filter_enabled         : 1;
+	bool pause_threshold_enabled      : 1;
 
 	u32 ldr_reg;
 	u32 dfr_reg;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 26/30] KVM: x86: nSVM: implement nested vGIF
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (24 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 25/30] KVM: x86: nSVM: support PAUSE filter threshold and count when cpu_pm=on Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 27/30] KVM: x86: add force_intercept_exceptions_mask Maxim Levitsky
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

When L1 enables vGIF for L2, L2 cannot affect L1's GIF regardless of
the STGI/CLGI intercepts, and since VM entry sets GIF, this means that
L1's GIF is always 1 while L2 is running.

Thus, in this case, keep L1's vGIF in vmcb01 while letting L2 control
its own vGIF via vmcb02, thereby implementing nested vGIF.

Also allow KVM to toggle L1's GIF during nested entry/exit by always
operating on vmcb01.
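
In other words, the VMCB that holds L1's GIF is selected as in the
sketch below (illustrative only, mirroring the gif_set()/enable_gif()/
disable_gif() changes in the diff):

/* Illustrative only: L1's GIF lives in vmcb01 when L2 owns vGIF in vmcb02 */
static struct vmcb *l1_gif_vmcb(struct vcpu_svm *svm)
{
	return nested_vgif_enabled(svm) ? svm->vmcb01.ptr : svm->vmcb;
}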

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 17 +++++++++++++----
 arch/x86/kvm/svm/svm.c    |  5 +++++
 arch/x86/kvm/svm/svm.h    | 25 +++++++++++++++++++++----
 3 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 601d38ae05cc6..a426d4d3dcd82 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -408,6 +408,10 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
 		 */
 		mask &= ~V_IRQ_MASK;
 	}
+
+	if (nested_vgif_enabled(svm))
+		mask |= V_GIF_MASK;
+
 	svm->nested.ctl.int_ctl        &= ~mask;
 	svm->nested.ctl.int_ctl        |= svm->vmcb->control.int_ctl & mask;
 }
@@ -573,10 +577,8 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 
 static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 {
-	const u32 int_ctl_vmcb01_bits =
-		V_INTR_MASKING_MASK | V_GIF_MASK | V_GIF_ENABLE_MASK;
-
-	const u32 int_ctl_vmcb12_bits = V_TPR_MASK | V_IRQ_INJECTION_BITS_MASK;
+	u32 int_ctl_vmcb01_bits = V_INTR_MASKING_MASK;
+	u32 int_ctl_vmcb12_bits = V_TPR_MASK | V_IRQ_INJECTION_BITS_MASK;
 
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 
@@ -586,6 +588,13 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	 */
 
 
+
+
+	if (svm->vgif_enabled && (svm->nested.ctl.int_ctl & V_GIF_ENABLE_MASK))
+		int_ctl_vmcb12_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
+	else
+		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
+
 	/* Copied from vmcb01.  msrpm_base can be overwritten later.  */
 	svm->vmcb->control.nested_ctl = svm->vmcb01.ptr->control.nested_ctl;
 	svm->vmcb->control.iopm_base_pa = svm->vmcb01.ptr->control.iopm_base_pa;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e49043807ec44..1cf682d1553cc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4008,6 +4008,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		svm->pause_threshold_enabled = false;
 	}
 
+	svm->vgif_enabled = vgif && guest_cpuid_has(vcpu, X86_FEATURE_VGIF);
+
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
@@ -4823,6 +4825,9 @@ static __init void svm_set_cpu_caps(void)
 		if (pause_filter_thresh)
 			kvm_cpu_cap_set(X86_FEATURE_PFTHRESHOLD);
 
+		if (vgif)
+			kvm_cpu_cap_set(X86_FEATURE_VGIF);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 297ec57f9941c..73cc9d3e784bd 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -224,6 +224,7 @@ struct vcpu_svm {
 	bool v_vmload_vmsave_enabled      : 1;
 	bool pause_filter_enabled         : 1;
 	bool pause_threshold_enabled      : 1;
+	bool vgif_enabled                 : 1;
 
 	u32 ldr_reg;
 	u32 dfr_reg;
@@ -442,31 +443,47 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit)
 	return vmcb_is_intercept(&svm->vmcb->control, bit);
 }
 
+static bool nested_vgif_enabled(struct vcpu_svm *svm)
+{
+	if (!is_guest_mode(&svm->vcpu) || !svm->vgif_enabled)
+		return false;
+	return svm->nested.ctl.int_ctl & V_GIF_ENABLE_MASK;
+}
+
 static inline bool vgif_enabled(struct vcpu_svm *svm)
 {
-	return !!(svm->vmcb->control.int_ctl & V_GIF_ENABLE_MASK);
+	struct vmcb *vmcb = nested_vgif_enabled(svm) ? svm->vmcb01.ptr : svm->vmcb;
+
+	return !!(vmcb->control.int_ctl & V_GIF_ENABLE_MASK);
 }
 
 static inline void enable_gif(struct vcpu_svm *svm)
 {
+	struct vmcb *vmcb = nested_vgif_enabled(svm) ? svm->vmcb01.ptr : svm->vmcb;
+
 	if (vgif_enabled(svm))
-		svm->vmcb->control.int_ctl |= V_GIF_MASK;
+		vmcb->control.int_ctl |= V_GIF_MASK;
 	else
 		svm->vcpu.arch.hflags |= HF_GIF_MASK;
 }
 
 static inline void disable_gif(struct vcpu_svm *svm)
 {
+	struct vmcb *vmcb = nested_vgif_enabled(svm) ? svm->vmcb01.ptr : svm->vmcb;
+
 	if (vgif_enabled(svm))
-		svm->vmcb->control.int_ctl &= ~V_GIF_MASK;
+		vmcb->control.int_ctl &= ~V_GIF_MASK;
 	else
 		svm->vcpu.arch.hflags &= ~HF_GIF_MASK;
+
 }
 
 static inline bool gif_set(struct vcpu_svm *svm)
 {
+	struct vmcb *vmcb = nested_vgif_enabled(svm) ? svm->vmcb01.ptr : svm->vmcb;
+
 	if (vgif_enabled(svm))
-		return !!(svm->vmcb->control.int_ctl & V_GIF_MASK);
+		return !!(vmcb->control.int_ctl & V_GIF_MASK);
 	else
 		return !!(svm->vcpu.arch.hflags & HF_GIF_MASK);
 }
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 27/30] KVM: x86: add force_intercept_exceptions_mask
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (25 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 26/30] KVM: x86: nSVM: implement nested vGIF Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 28/30] KVM: SVM: implement force_intercept_exceptions_mask Maxim Levitsky
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky, Borislav Petkov

This parameter will be used by the VMX and SVM code to force
interception of a set of exceptions, given as a bitmask of exception
vectors, for guest debugging and/or KVM debugging.

This is based on an idea first shown here:
https://patchwork.kernel.org/project/kvm/patch/20160301192822.GD22677@pd.tnic/
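
As an illustrative sketch (assuming the exception vector defines from
arch/x86/include/uapi/asm/kvm.h; the helper below is hypothetical), a
mask that forces interception of #UD and #GP, and the matching
per-vector check, would look like:

/* force_intercept_exceptions_mask = BIT(UD_VECTOR) | BIT(GP_VECTOR) == 0x2040 */
static bool gp_force_intercepted(struct kvm *kvm)
{
	return kvm_is_exception_force_intercepted(kvm, GP_VECTOR);
}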

CC: Borislav Petkov <bp@suse.de>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 7 +++++++
 arch/x86/kvm/x86.c              | 9 +++++++++
 arch/x86/kvm/x86.h              | 5 +++++
 3 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 428ab1cc7dd34..fa498612839a0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1168,6 +1168,13 @@ struct kvm_arch {
 	struct kvm_pmu_event_filter __rcu *pmu_event_filter;
 	struct task_struct *nx_lpage_recovery_thread;
 
+	/*
+	 * Bitmask of exceptions that KVM will intercept
+	 * and forward to the guest, even if that is not needed
+	 * for normal operation. Debug feature.
+	 */
+	u32 force_intercept_exceptions_bitmask;
+
 #ifdef CONFIG_X86_64
 	/*
 	 * Whether the TDP MMU is enabled for this VM. This contains a
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 63d84c373e465..202c34697852f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -193,6 +193,13 @@ module_param(enable_pmu, bool, 0444);
 bool __read_mostly eager_page_split = true;
 module_param(eager_page_split, bool, 0644);
 
+/*
+ * force_intercept_exceptions_mask is a writable param and its value
+ * is snapshotted when a VM is created
+ */
+static uint force_intercept_exceptions_mask;
+module_param(force_intercept_exceptions_mask, uint, S_IRUGO | S_IWUSR);
+
 /*
  * Restoring the host value for MSRs that are only consumed when running in
  * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
@@ -11646,6 +11653,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
 	kvm->arch.guest_can_read_msr_platform_info = true;
+	kvm->arch.force_intercept_exceptions_bitmask = force_intercept_exceptions_mask;
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
@@ -12886,6 +12894,7 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
 }
 EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
 
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e9b303b21f173..34f96f483c7e5 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -91,6 +91,11 @@ static inline bool kvm_exception_is_soft(unsigned int nr)
 	return (nr == BP_VECTOR) || (nr == OF_VECTOR);
 }
 
+static inline bool kvm_is_exception_force_intercepted(struct kvm *kvm, int exception)
+{
+	return kvm->arch.force_intercept_exceptions_bitmask & BIT(exception);
+}
+
 static inline bool is_protmode(struct kvm_vcpu *vcpu)
 {
 	return kvm_read_cr0_bits(vcpu, X86_CR0_PE);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 28/30] KVM: SVM: implement force_intercept_exceptions_mask
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (26 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 27/30] KVM: x86: add force_intercept_exceptions_mask Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 29/30] KVM: VMX: " Maxim Levitsky
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

Currently, a forced #TS is only intercepted once: SVM doesn't provide
the error code needed to re-inject it, so its intercept is cleared after
the first hit and the guest re-executes the instruction.
Also, exception interception is not enabled for SEV guests.
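
For reference, a minimal standalone sketch (not part of this patch) of how
the SVM side maps an exception vector to and from its intercept exit code,
which is what the generic handler below relies on; this assumes that
SVM_EXIT_EXCP_BASE is 0x040, as in the SVM exit-code definitions:

#include <stdio.h>

#define SVM_EXIT_EXCP_BASE 0x040	/* assumed value from the SVM exit-code layout */

int main(void)
{
	unsigned int vector = 13;				/* e.g. #GP */
	unsigned int exit_code = SVM_EXIT_EXCP_BASE + vector;	/* 0x04d */

	/* the generic handler recovers the vector by subtracting the base */
	printf("exit_code=0x%03x vector=%u\n", exit_code, exit_code - SVM_EXIT_EXCP_BASE);
	return 0;
}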

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/svm/svm.c          | 92 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h          |  5 +-
 arch/x86/kvm/svm/svm_onhyperv.c |  1 +
 arch/x86/kvm/x86.c              |  5 +-
 6 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fa498612839a0..446ee29e6cc99 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1750,6 +1750,8 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
+void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
+			     u32 error_code, unsigned long payload);
 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr);
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index bf6e96011dfed..d462b4808e893 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -32,6 +32,7 @@
 #define MC_VECTOR 18
 #define XM_VECTOR 19
 #define VE_VECTOR 20
+#define CP_VECTOR 21
 
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1cf682d1553cc..afa4116ea938c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -245,6 +245,8 @@ static const u32 msrpm_ranges[] = {0, 0xc0000000, 0xc0010000};
 #define MSRS_RANGE_SIZE 2048
 #define MSRS_IN_RANGE (MSRS_RANGE_SIZE * 8 / 2)
 
+static int svm_handle_invalid_exit(struct kvm_vcpu *vcpu, u64 exit_code);
+
 u32 svm_msrpm_offset(u32 msr)
 {
 	u32 offset;
@@ -1035,6 +1037,16 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
 	}
 }
 
+static void svm_init_force_exceptions_intercepts(struct kvm_vcpu *vcpu)
+{
+	int exc;
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	for (exc = 0 ; exc < 32 ; ++exc)
+		if (kvm_is_exception_force_intercepted(vcpu->kvm, exc))
+			set_exception_intercept(svm, exc);
+}
+
 static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1235,6 +1247,8 @@ static void __svm_vcpu_reset(struct kvm_vcpu *vcpu)
 
 	if (sev_es_guest(vcpu->kvm))
 		sev_es_vcpu_reset(svm);
+	else
+		svm_init_force_exceptions_intercepts(vcpu);
 }
 
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -1865,6 +1879,19 @@ static int pf_interception(struct kvm_vcpu *vcpu)
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
+
+	if (kvm_is_exception_force_intercepted(vcpu->kvm, PF_VECTOR)) {
+		if (npt_enabled && !vcpu->arch.apf.host_apf_flags) {
+			/* If the #PF was only intercepted for debug, inject
+			 * it directly into the guest, since KVM's MMU code
+			 * is not ready to deal with such page faults.
+			 */
+			kvm_queue_exception_e_p(vcpu, PF_VECTOR,
+						error_code, fault_address);
+			return 1;
+		}
+	}
+
 	return kvm_handle_page_fault(vcpu, error_code, fault_address,
 			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
 			svm->vmcb->control.insn_bytes : NULL,
@@ -1940,6 +1967,46 @@ static int ac_interception(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int gen_exc_interception(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Generic exception intercept handler, which re-injects an intercepted
+	 * exception back into the guest as-is, for exceptions that don't have
+	 * a dedicated intercept handler.
+	 *
+	 * Used only for the 'force_intercept_exceptions_mask' KVM debug feature.
+	 */
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int exc = svm->vmcb->control.exit_code - SVM_EXIT_EXCP_BASE;
+
+	if (!kvm_is_exception_force_intercepted(vcpu->kvm, exc))
+		return svm_handle_invalid_exit(vcpu, svm->vmcb->control.exit_code);
+
+	if (x86_exception_has_error_code(exc)) {
+
+		if (exc == TS_VECTOR) {
+			/*
+			 * SVM doesn't provide us with an error code to be able to
+			 * re-inject the #TS exception, so just disable its
+			 * intercept, and let the guest re-execute the instruction.
+			 */
+			vmcb_clr_intercept(&svm->vmcb01.ptr->control,
+					   INTERCEPT_EXCEPTION_OFFSET + TS_VECTOR);
+			recalc_intercepts(svm);
+			return 1;
+		} else if (exc == DF_VECTOR) {
+			/*
+			 * SVM doesn't provide the error code on #DF either,
+			 * but it is always zero anyway.
+			 */
+			svm->vmcb->control.exit_info_1 = 0;
+		}
+		kvm_queue_exception_e(vcpu, exc, svm->vmcb->control.exit_info_1);
+	} else
+		kvm_queue_exception(vcpu, exc);
+	return 1;
+}
+
 static bool is_erratum_383(void)
 {
 	int err, i;
@@ -3050,13 +3117,34 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_WRITE_DR5]			= dr_interception,
 	[SVM_EXIT_WRITE_DR6]			= dr_interception,
 	[SVM_EXIT_WRITE_DR7]			= dr_interception,
+
+	/* 0 */
+	[SVM_EXIT_EXCP_BASE + DE_VECTOR]	= gen_exc_interception,
 	[SVM_EXIT_EXCP_BASE + DB_VECTOR]	= db_interception,
+						  /* NMI */
 	[SVM_EXIT_EXCP_BASE + BP_VECTOR]	= bp_interception,
+	[SVM_EXIT_EXCP_BASE + OF_VECTOR]	= gen_exc_interception,
+	[SVM_EXIT_EXCP_BASE + BR_VECTOR]	= gen_exc_interception,
 	[SVM_EXIT_EXCP_BASE + UD_VECTOR]	= ud_interception,
+	[SVM_EXIT_EXCP_BASE + NM_VECTOR]	= gen_exc_interception,
+	/* 8 */
+	[SVM_EXIT_EXCP_BASE + DF_VECTOR]	= gen_exc_interception,
+						  /* 9 is reserved */
+	[SVM_EXIT_EXCP_BASE + TS_VECTOR]	= gen_exc_interception,
+	[SVM_EXIT_EXCP_BASE + NP_VECTOR]	= gen_exc_interception,
+	[SVM_EXIT_EXCP_BASE + SS_VECTOR]	= gen_exc_interception,
+	[SVM_EXIT_EXCP_BASE + GP_VECTOR]	= gp_interception,
 	[SVM_EXIT_EXCP_BASE + PF_VECTOR]	= pf_interception,
-	[SVM_EXIT_EXCP_BASE + MC_VECTOR]	= mc_interception,
+						  /* 15 is reserved */
+	/* 16 */
+	[SVM_EXIT_EXCP_BASE + MF_VECTOR]	= gen_exc_interception,
 	[SVM_EXIT_EXCP_BASE + AC_VECTOR]	= ac_interception,
-	[SVM_EXIT_EXCP_BASE + GP_VECTOR]	= gp_interception,
+	[SVM_EXIT_EXCP_BASE + MC_VECTOR]	= mc_interception,
+	[SVM_EXIT_EXCP_BASE + XM_VECTOR]	= gen_exc_interception,
+						  /* 20 - #VE - reserved on AMD */
+	[SVM_EXIT_EXCP_BASE + CP_VECTOR]	= gen_exc_interception,
+	/* TODO: exceptions 22-31 */
+
 	[SVM_EXIT_INTR]				= intr_interception,
 	[SVM_EXIT_NMI]				= nmi_interception,
 	[SVM_EXIT_SMI]				= smi_interception,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 73cc9d3e784bd..dd3671d77258b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -415,8 +415,11 @@ static inline void clr_exception_intercept(struct vcpu_svm *svm, u32 bit)
 	struct vmcb *vmcb = svm->vmcb01.ptr;
 
 	WARN_ON_ONCE(bit >= 32);
-	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
 
+	if (kvm_is_exception_force_intercepted(svm->vcpu.kvm, bit))
+		return;
+
+	vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
 	recalc_intercepts(svm);
 }
 
diff --git a/arch/x86/kvm/svm/svm_onhyperv.c b/arch/x86/kvm/svm/svm_onhyperv.c
index 98aa981c04ec5..81be254e757b2 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.c
+++ b/arch/x86/kvm/svm/svm_onhyperv.c
@@ -8,6 +8,7 @@
 
 #include <asm/mshyperv.h>
 
+#include "x86.h"
 #include "svm.h"
 #include "svm_ops.h"
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 202c34697852f..0ee2fbb068b17 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -706,12 +706,13 @@ void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr,
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception_p);
 
-static void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
-				    u32 error_code, unsigned long payload)
+void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
+			     u32 error_code, unsigned long payload)
 {
 	kvm_multiple_exception(vcpu, nr, true, error_code,
 			       true, payload, false);
 }
+EXPORT_SYMBOL_GPL(kvm_queue_exception_e_p);
 
 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
 {
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 29/30] KVM: VMX: implement force_intercept_exceptions_mask
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (27 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 28/30] KVM: SVM: implement force_intercept_exceptions_mask Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-07 15:54 ` [PATCH RESEND 30/30] KVM: x86: get rid of KVM_REQ_GET_NESTED_STATE_PAGES Maxim Levitsky
  2022-02-08 12:02 ` [PATCH RESEND 00/30] My patch queue Paolo Bonzini
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

All exceptions are supported. Some bugs might remain with regard to KVM's
own interception of #PF, but since this is strictly a debug feature, this
should be OK.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/nested.c |  8 +++++++
 arch/x86/kvm/vmx/vmcs.h   |  6 +++++
 arch/x86/kvm/vmx/vmx.c    | 47 +++++++++++++++++++++++++++++++++------
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c73e4d938ddc3..e89b32b1d9efb 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5902,6 +5902,14 @@ static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu,
 	switch ((u16)exit_reason.basic) {
 	case EXIT_REASON_EXCEPTION_NMI:
 		intr_info = vmx_get_intr_info(vcpu);
+
+		if (is_exception(intr_info)) {
+			int ex_no = intr_info & INTR_INFO_VECTOR_MASK;
+
+			if (kvm_is_exception_force_intercepted(vcpu->kvm, ex_no))
+				return true;
+		}
+
 		if (is_nmi(intr_info))
 			return true;
 		else if (is_page_fault(intr_info))
diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
index e325c290a8162..d5aac5abe5cdd 100644
--- a/arch/x86/kvm/vmx/vmcs.h
+++ b/arch/x86/kvm/vmx/vmcs.h
@@ -94,6 +94,12 @@ static inline bool is_exception_n(u32 intr_info, u8 vector)
 	return is_intr_type_n(intr_info, INTR_TYPE_HARD_EXCEPTION, vector);
 }
 
+static inline bool is_exception(u32 intr_info)
+{
+	return is_intr_type(intr_info, INTR_TYPE_HARD_EXCEPTION) ||
+	       is_intr_type(intr_info, INTR_TYPE_SOFT_EXCEPTION);
+}
+
 static inline bool is_debug(u32 intr_info)
 {
 	return is_exception_n(intr_info, DB_VECTOR);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fc9c4eca90a78..aec2b962707a0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -719,6 +719,7 @@ static u32 vmx_read_guest_seg_ar(struct vcpu_vmx *vmx, unsigned seg)
 void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
 {
 	u32 eb;
+	int exc;
 
 	eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR) |
 	     (1u << DB_VECTOR) | (1u << AC_VECTOR);
@@ -749,7 +750,8 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
         else {
 		int mask = 0, match = 0;
 
-		if (enable_ept && (eb & (1u << PF_VECTOR))) {
+		if (enable_ept && (eb & (1u << PF_VECTOR)) &&
+		    !kvm_is_exception_force_intercepted(vcpu->kvm, PF_VECTOR)) {
 			/*
 			 * If EPT is enabled, #PF is currently only intercepted
 			 * if MAXPHYADDR is smaller on the guest than on the
@@ -772,6 +774,10 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.xfd_no_write_intercept)
 		eb |= (1u << NM_VECTOR);
 
+	for (exc = 0 ; exc < 32 ; ++exc)
+		if (kvm_is_exception_force_intercepted(vcpu->kvm, exc) && exc != NMI_VECTOR)
+			eb |= (1u << exc);
+
 	vmcs_write32(EXCEPTION_BITMAP, eb);
 }
 
@@ -4867,18 +4873,23 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
 
 	if (!vmx->rmode.vm86_active && is_gp_fault(intr_info)) {
-		WARN_ON_ONCE(!enable_vmware_backdoor);
-
 		/*
 		 * VMware backdoor emulation on #GP interception only handles
 		 * IN{S}, OUT{S}, and RDPMC, none of which generate a non-zero
 		 * error code on #GP.
 		 */
-		if (error_code) {
+
+		if (enable_vmware_backdoor && !error_code)
+			return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
+
+		if (!kvm_is_exception_force_intercepted(vcpu->kvm, GP_VECTOR))
+			WARN_ON_ONCE(!enable_vmware_backdoor);
+
+		if (intr_info & INTR_INFO_DELIVER_CODE_MASK)
 			kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
-			return 1;
-		}
-		return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
+		else
+			kvm_queue_exception(vcpu, GP_VECTOR);
+		return 1;
 	}
 
 	/*
@@ -4887,6 +4898,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 	 * See the comments in vmx_handle_exit.
 	 */
 	if ((vect_info & VECTORING_INFO_VALID_MASK) &&
+	    !kvm_is_exception_force_intercepted(vcpu->kvm, PF_VECTOR) &&
 	    !(is_page_fault(intr_info) && !(error_code & PFERR_RSVD_MASK))) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_SIMUL_EX;
@@ -4901,10 +4913,23 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 	if (is_page_fault(intr_info)) {
 		cr2 = vmx_get_exit_qual(vcpu);
 		if (enable_ept && !vcpu->arch.apf.host_apf_flags) {
+			/*
+			 * If we force-intercept #PF and the page fault
+			 * is due to a reason that we don't normally intercept,
+			 * reflect it back to the guest.
+			 */
+			if (kvm_is_exception_force_intercepted(vcpu->kvm, PF_VECTOR) &&
+			    (!allow_smaller_maxphyaddr ||
+			     !(error_code & PFERR_PRESENT_MASK) ||
+			     (error_code & PFERR_RSVD_MASK))) {
+				kvm_queue_exception_e_p(vcpu, PF_VECTOR, error_code, cr2);
+				return 1;
+			}
 			/*
 			 * EPT will cause page fault only if we need to
 			 * detect illegal GPAs.
 			 */
+
 			WARN_ON_ONCE(!allow_smaller_maxphyaddr);
 			kvm_fixup_and_inject_pf_error(vcpu, cr2, error_code);
 			return 1;
@@ -4983,6 +5008,14 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 			return 1;
 		fallthrough;
 	default:
+		if (kvm_is_exception_force_intercepted(vcpu->kvm, ex_no)) {
+			if (intr_info & INTR_INFO_DELIVER_CODE_MASK)
+				kvm_queue_exception_e(vcpu, ex_no, error_code);
+			else
+				kvm_queue_exception(vcpu, ex_no);
+			break;
+		}
+
 		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
 		kvm_run->ex.exception = ex_no;
 		kvm_run->ex.error_code = error_code;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH RESEND 30/30] KVM: x86: get rid of KVM_REQ_GET_NESTED_STATE_PAGES
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (28 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 29/30] KVM: VMX: " Maxim Levitsky
@ 2022-02-07 15:54 ` Maxim Levitsky
  2022-02-08 12:02 ` [PATCH RESEND 00/30] My patch queue Paolo Bonzini
  30 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-07 15:54 UTC (permalink / raw)
  To: kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	Paolo Bonzini, linux-kernel, Rodrigo Vivi, H. Peter Anvin,
	intel-gvt-dev, Joonas Lahtinen, Joerg Roedel,
	Sean Christopherson, David Airlie, Zhi Wang, Brijesh Singh,
	Jim Mattson, x86, Daniel Vetter, Borislav Petkov, Zhenyu Wang,
	Kan Liang, Jani Nikula, Maxim Levitsky

As it turned out, this request isn't really needed,
and it complicates nested migration.

In theory this patch can break userspace if userspace relies on
updating KVM's memslots after setting the nested state, but there
is little reason for it to rely on this.

Since this behavior is undocumented and there is a good chance
that no userspace relies on it, just try to remove this code.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  5 +-
 arch/x86/kvm/hyperv.c           |  4 ++
 arch/x86/kvm/svm/nested.c       | 50 ++++-------------
 arch/x86/kvm/svm/svm.c          |  2 +-
 arch/x86/kvm/svm/svm.h          |  2 +-
 arch/x86/kvm/vmx/nested.c       | 99 +++++++++------------------------
 arch/x86/kvm/x86.c              |  6 --
 7 files changed, 45 insertions(+), 123 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 446ee29e6cc99..fc2d5628ad930 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -92,7 +92,6 @@
 #define KVM_REQ_HV_EXIT			KVM_ARCH_REQ(21)
 #define KVM_REQ_HV_STIMER		KVM_ARCH_REQ(22)
 #define KVM_REQ_LOAD_EOI_EXITMAP	KVM_ARCH_REQ(23)
-#define KVM_REQ_GET_NESTED_STATE_PAGES	KVM_ARCH_REQ(24)
 #define KVM_REQ_APICV_UPDATE \
 	KVM_ARCH_REQ_FLAGS(25, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_TLB_FLUSH_CURRENT	KVM_ARCH_REQ(26)
@@ -1519,12 +1518,14 @@ struct kvm_x86_nested_ops {
 	int (*set_state)(struct kvm_vcpu *vcpu,
 			 struct kvm_nested_state __user *user_kvm_nested_state,
 			 struct kvm_nested_state *kvm_state);
-	bool (*get_nested_state_pages)(struct kvm_vcpu *vcpu);
 	int (*write_log_dirty)(struct kvm_vcpu *vcpu, gpa_t l2_gpa);
 
 	int (*enable_evmcs)(struct kvm_vcpu *vcpu,
 			    uint16_t *vmcs_version);
 	uint16_t (*get_evmcs_version)(struct kvm_vcpu *vcpu);
+
+	bool (*get_evmcs_page)(struct kvm_vcpu *vcpu);
+
 };
 
 struct kvm_x86_init_ops {
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index dac41784f2b87..d297d102c0910 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1497,6 +1497,10 @@ static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
 					    gfn_to_gpa(gfn) | KVM_MSR_ENABLED,
 					    sizeof(struct hv_vp_assist_page)))
 			return 1;
+
+		if (host && kvm_x86_ops.nested_ops->get_evmcs_page)
+			if (!kvm_x86_ops.nested_ops->get_evmcs_page(vcpu))
+				return 1;
 		break;
 	}
 	case HV_X64_MSR_EOI:
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a426d4d3dcd82..ac813ad83d784 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -670,7 +670,7 @@ static void nested_svm_copy_common_state(struct vmcb *from_vmcb, struct vmcb *to
 }
 
 int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
-			 struct vmcb *vmcb12, bool from_vmrun)
+			 struct vmcb *vmcb12)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	int ret;
@@ -700,15 +700,13 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
 	nested_vmcb02_prepare_save(svm, vmcb12);
 
 	ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3,
-				  nested_npt_enabled(svm), from_vmrun);
+				  nested_npt_enabled(svm), true);
 	if (ret)
 		return ret;
 
 	if (!npt_enabled)
 		vcpu->arch.mmu->inject_page_fault = svm_inject_page_fault_nested;
 
-	if (!from_vmrun)
-		kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
 
 	svm_set_gif(svm, true);
 
@@ -779,7 +777,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 
 	svm->nested.nested_run_pending = 1;
 
-	if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
+	if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12))
 		goto out_exit_err;
 
 	if (nested_svm_vmrun_msrpm(svm))
@@ -863,8 +861,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	svm->nested.vmcb12_gpa = 0;
 	WARN_ON_ONCE(svm->nested.nested_run_pending);
 
-	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
-
 	/* in case we halted in L2 */
 	svm->vcpu.arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
@@ -1069,8 +1065,6 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
 		nested_svm_uninit_mmu_context(vcpu);
 		vmcb_mark_all_dirty(svm->vmcb);
 	}
-
-	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
 }
 
 static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
@@ -1562,53 +1556,31 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 	 */
 
 	ret = nested_svm_load_cr3(&svm->vcpu, vcpu->arch.cr3,
-				  nested_npt_enabled(svm), false);
+				  nested_npt_enabled(svm), !vcpu->arch.pdptrs_from_userspace);
 	if (WARN_ON_ONCE(ret))
 		goto out_free;
 
 
-	kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
-	ret = 0;
-out_free:
-	kfree(save);
-	kfree(ctl);
-
-	return ret;
-}
-
-static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	if (WARN_ON(!is_guest_mode(vcpu)))
-		return true;
-
-	if (!vcpu->arch.pdptrs_from_userspace &&
-	    !nested_npt_enabled(svm) && is_pae_paging(vcpu))
-		/*
-		 * Reload the guest's PDPTRs since after a migration
-		 * the guest CR3 might be restored prior to setting the nested
-		 * state which can lead to a load of wrong PDPTRs.
-		 */
-		if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
-			return false;
-
 	if (!nested_svm_vmrun_msrpm(svm)) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 		vcpu->run->internal.suberror =
 			KVM_INTERNAL_ERROR_EMULATION;
 		vcpu->run->internal.ndata = 0;
-		return false;
+		goto out_free;
 	}
 
-	return true;
+	ret = 0;
+out_free:
+	kfree(save);
+	kfree(ctl);
+	return ret;
 }
 
+
 struct kvm_x86_nested_ops svm_nested_ops = {
 	.leave_nested = svm_leave_nested,
 	.check_events = svm_check_nested_events,
 	.triple_fault = nested_svm_triple_fault,
-	.get_nested_state_pages = svm_get_nested_state_pages,
 	.get_state = svm_get_nested_state,
 	.set_state = svm_set_nested_state,
 };
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index afa4116ea938c..6d6421e0cadcd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4498,7 +4498,7 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 	vmcb12 = map.hva;
 	nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
 	nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
-	ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, false);
+	ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12);
 
 	if (ret)
 		goto unmap_save;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index dd3671d77258b..e2eb91851e922 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -551,7 +551,7 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
 }
 
 int enter_svm_guest_mode(struct kvm_vcpu *vcpu,
-			 u64 vmcb_gpa, struct vmcb *vmcb12, bool from_vmrun);
+			 u64 vmcb_gpa, struct vmcb *vmcb12);
 void svm_leave_nested(struct kvm_vcpu *vcpu);
 void svm_free_nested(struct vcpu_svm *svm);
 int svm_allocate_nested(struct vcpu_svm *svm);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e89b32b1d9efb..19331f742662d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -294,8 +294,6 @@ static void free_nested(struct kvm_vcpu *vcpu)
 	if (!vmx->nested.vmxon && !vmx->nested.smm.vmxon)
 		return;
 
-	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
-
 	vmx->nested.vmxon = false;
 	vmx->nested.smm.vmxon = false;
 	vmx->nested.vmxon_ptr = INVALID_GPA;
@@ -2593,7 +2591,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 
 	/* Shadow page tables on either EPT or shadow page tables. */
 	if (nested_vmx_load_cr3(vcpu, vmcs12->guest_cr3, nested_cpu_has_ept(vmcs12),
-				from_vmentry, entry_failure_code))
+				from_vmentry || !vcpu->arch.pdptrs_from_userspace,
+				entry_failure_code))
 		return -EINVAL;
 
 	/*
@@ -3125,7 +3124,7 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static bool nested_get_evmcs_page(struct kvm_vcpu *vcpu)
+bool nested_get_evmcs_page(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
@@ -3161,18 +3160,6 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 	struct page *page;
 	u64 hpa;
 
-	if (!vcpu->arch.pdptrs_from_userspace &&
-	    !nested_cpu_has_ept(vmcs12) && is_pae_paging(vcpu)) {
-		/*
-		 * Reload the guest's PDPTRs since after a migration
-		 * the guest CR3 might be restored prior to setting the nested
-		 * state which can lead to a load of wrong PDPTRs.
-		 */
-		if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
-			return false;
-	}
-
-
 	if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
 		/*
 		 * Translate L1 physical address to host physical
@@ -3254,25 +3241,6 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 	return true;
 }
 
-static bool vmx_get_nested_state_pages(struct kvm_vcpu *vcpu)
-{
-	if (!nested_get_evmcs_page(vcpu)) {
-		pr_debug_ratelimited("%s: enlightened vmptrld failed\n",
-				     __func__);
-		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-		vcpu->run->internal.suberror =
-			KVM_INTERNAL_ERROR_EMULATION;
-		vcpu->run->internal.ndata = 0;
-
-		return false;
-	}
-
-	if (is_guest_mode(vcpu) && !nested_get_vmcs12_pages(vcpu))
-		return false;
-
-	return true;
-}
-
 static int nested_vmx_write_pml_buffer(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
 	struct vmcs12 *vmcs12;
@@ -3402,12 +3370,12 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 
 	prepare_vmcs02_early(vmx, &vmx->vmcs01, vmcs12);
 
-	if (from_vmentry) {
-		if (unlikely(!nested_get_vmcs12_pages(vcpu))) {
-			vmx_switch_vmcs(vcpu, &vmx->vmcs01);
-			return NVMX_VMENTRY_KVM_INTERNAL_ERROR;
-		}
+	if (unlikely(!nested_get_vmcs12_pages(vcpu))) {
+		vmx_switch_vmcs(vcpu, &vmx->vmcs01);
+		return NVMX_VMENTRY_KVM_INTERNAL_ERROR;
+	}
 
+	if (from_vmentry) {
 		if (nested_vmx_check_vmentry_hw(vcpu)) {
 			vmx_switch_vmcs(vcpu, &vmx->vmcs01);
 			return NVMX_VMENTRY_VMFAIL;
@@ -3429,24 +3397,14 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 		goto vmentry_fail_vmexit_guest_mode;
 	}
 
-	if (from_vmentry) {
-		failed_index = nested_vmx_load_msr(vcpu,
-						   vmcs12->vm_entry_msr_load_addr,
-						   vmcs12->vm_entry_msr_load_count);
-		if (failed_index) {
-			exit_reason.basic = EXIT_REASON_MSR_LOAD_FAIL;
-			vmcs12->exit_qualification = failed_index;
-			goto vmentry_fail_vmexit_guest_mode;
-		}
-	} else {
-		/*
-		 * The MMU is not initialized to point at the right entities yet and
-		 * "get pages" would need to read data from the guest (i.e. we will
-		 * need to perform gpa to hpa translation). Request a call
-		 * to nested_get_vmcs12_pages before the next VM-entry.  The MSRs
-		 * have already been set at vmentry time and should not be reset.
-		 */
-		kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
+
+	failed_index = nested_vmx_load_msr(vcpu,
+					   vmcs12->vm_entry_msr_load_addr,
+					   vmcs12->vm_entry_msr_load_count);
+	if (failed_index) {
+		exit_reason.basic = EXIT_REASON_MSR_LOAD_FAIL;
+		vmcs12->exit_qualification = failed_index;
+		goto vmentry_fail_vmexit_guest_mode;
 	}
 
 	/*
@@ -4516,16 +4474,6 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	/* Similarly, triple faults in L2 should never escape. */
 	WARN_ON_ONCE(kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu));
 
-	if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
-		/*
-		 * KVM_REQ_GET_NESTED_STATE_PAGES is also used to map
-		 * Enlightened VMCS after migration and we still need to
-		 * do that when something is forcing L2->L1 exit prior to
-		 * the first L2 run.
-		 */
-		(void)nested_get_evmcs_page(vcpu);
-	}
-
 	/* Service pending TLB flush requests for L2 before switching to L1. */
 	kvm_service_local_tlb_flush_requests(vcpu);
 
@@ -6382,14 +6330,17 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 
 		set_current_vmptr(vmx, kvm_state->hdr.vmx.vmcs12_pa);
 	} else if (kvm_state->flags & KVM_STATE_NESTED_EVMCS) {
+
+		vmx->nested.hv_evmcs_vmptr = EVMPTR_MAP_PENDING;
+
 		/*
 		 * nested_vmx_handle_enlightened_vmptrld() cannot be called
-		 * directly from here as HV_X64_MSR_VP_ASSIST_PAGE may not be
-		 * restored yet. EVMCS will be mapped from
-		 * nested_get_vmcs12_pages().
+		 * directly from here if HV_X64_MSR_VP_ASSIST_PAGE is not
+		 * restored yet. EVMCS will be mapped when it is.
 		 */
-		vmx->nested.hv_evmcs_vmptr = EVMPTR_MAP_PENDING;
-		kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
+		if (kvm_hv_assist_page_enabled(vcpu))
+			nested_get_evmcs_page(vcpu);
+
 	} else {
 		return -EINVAL;
 	}
@@ -6811,8 +6762,8 @@ struct kvm_x86_nested_ops vmx_nested_ops = {
 	.triple_fault = nested_vmx_triple_fault,
 	.get_state = vmx_get_nested_state,
 	.set_state = vmx_set_nested_state,
-	.get_nested_state_pages = vmx_get_nested_state_pages,
 	.write_log_dirty = nested_vmx_write_pml_buffer,
 	.enable_evmcs = nested_enable_evmcs,
 	.get_evmcs_version = nested_get_evmcs_version,
+	.get_evmcs_page = nested_get_evmcs_page,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0ee2fbb068b17..48dd01fd7a1ec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9897,12 +9897,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			r = -EIO;
 			goto out;
 		}
-		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
-			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
-				r = 0;
-				goto out;
-			}
-		}
 		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
 			kvm_mmu_unload(vcpu);
 		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH RESEND 07/30] KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them
  2022-02-07 15:54 ` [PATCH RESEND 07/30] KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them Maxim Levitsky
@ 2022-02-08 11:33   ` Paolo Bonzini
  2022-02-08 11:55     ` Maxim Levitsky
  0 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2022-02-08 11:33 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	linux-kernel, Rodrigo Vivi, H. Peter Anvin, intel-gvt-dev,
	Joonas Lahtinen, Joerg Roedel, Sean Christopherson, David Airlie,
	Zhi Wang, Brijesh Singh, Jim Mattson, x86, Daniel Vetter,
	Borislav Petkov, Zhenyu Wang, Kan Liang, Jani Nikula

On 2/7/22 16:54, Maxim Levitsky wrote:
> Fix a corner case in which the L1 hypervisor intercepts
> interrupts (INTERCEPT_INTR) and either doesn't set
> virtual interrupt masking (V_INTR_MASKING) or enters a
> nested guest with EFLAGS.IF disabled prior to the entry.
> 
> In this case, despite the fact that L1 intercepts the interrupts,
> KVM still needs to set up an interrupt window to wait before
> injecting the INTR vmexit.
> 
> Currently the KVM instead enters an endless loop of 'req_immediate_exit'.
> 
> Exactly the same issue also happens for SMIs and NMI.
> Fix this as well.
> 
> Note that on VMX, this case is impossible as there is only
> 'vmexit on external interrupts' execution control which either set,
> in which case both host and guest's EFLAGS.IF
> are ignored, or not set, in which case no VMexits are delivered.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/svm/svm.c | 17 +++++++++++++----
>   1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 9a4e299ed5673..22e614008cf59 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3372,11 +3372,13 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>   	if (svm->nested.nested_run_pending)
>   		return -EBUSY;
>   
> +	if (svm_nmi_blocked(vcpu))
> +		return 0;
> +
>   	/* An NMI must not be injected into L2 if it's supposed to VM-Exit.  */
>   	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
>   		return -EBUSY;
> -
> -	return !svm_nmi_blocked(vcpu);
> +	return 1;
>   }
>   
>   static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> @@ -3428,9 +3430,13 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>   static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>   {
>   	struct vcpu_svm *svm = to_svm(vcpu);
> +
>   	if (svm->nested.nested_run_pending)
>   		return -EBUSY;
>   
> +	if (svm_interrupt_blocked(vcpu))
> +		return 0;
> +
>   	/*
>   	 * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
>   	 * e.g. if the IRQ arrived asynchronously after checking nested events.
> @@ -3438,7 +3444,7 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>   	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(svm))
>   		return -EBUSY;
>   
> -	return !svm_interrupt_blocked(vcpu);
> +	return 1;
>   }
>   
>   static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
> @@ -4169,11 +4175,14 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>   	if (svm->nested.nested_run_pending)
>   		return -EBUSY;
>   
> +	if (svm_smi_blocked(vcpu))
> +		return 0;
> +
>   	/* An SMI must not be injected into L2 if it's supposed to VM-Exit.  */
>   	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_smi(svm))
>   		return -EBUSY;
>   
> -	return !svm_smi_blocked(vcpu);
> +	return 1;
>   }
>   
>   static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)

Can you prepare a testcase for at least the interrupt case?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH RESEND 07/30] KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them
  2022-02-08 11:33   ` Paolo Bonzini
@ 2022-02-08 11:55     ` Maxim Levitsky
  2022-02-08 12:24       ` Maxim Levitsky
  0 siblings, 1 reply; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-08 11:55 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	linux-kernel, Rodrigo Vivi, H. Peter Anvin, intel-gvt-dev,
	Joonas Lahtinen, Joerg Roedel, Sean Christopherson, David Airlie,
	Zhi Wang, Brijesh Singh, Jim Mattson, x86, Daniel Vetter,
	Borislav Petkov, Zhenyu Wang, Kan Liang, Jani Nikula

On Tue, 2022-02-08 at 12:33 +0100, Paolo Bonzini wrote:
> On 2/7/22 16:54, Maxim Levitsky wrote:
> > Fix a corner case in which the L1 hypervisor intercepts
> > interrupts (INTERCEPT_INTR) and either doesn't set
> > virtual interrupt masking (V_INTR_MASKING) or enters a
> > nested guest with EFLAGS.IF disabled prior to the entry.
> > 
> > In this case, despite the fact that L1 intercepts the interrupts,
> > KVM still needs to set up an interrupt window to wait before
> > injecting the INTR vmexit.
> > 
> > Currently the KVM instead enters an endless loop of 'req_immediate_exit'.
> > 
> > Exactly the same issue also happens for SMIs and NMI.
> > Fix this as well.
> > 
> > Note that on VMX, this case is impossible as there is only
> > 'vmexit on external interrupts' execution control which either set,
> > in which case both host and guest's EFLAGS.IF
> > are ignored, or not set, in which case no VMexits are delivered.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >   arch/x86/kvm/svm/svm.c | 17 +++++++++++++----
> >   1 file changed, 13 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 9a4e299ed5673..22e614008cf59 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3372,11 +3372,13 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> >   	if (svm->nested.nested_run_pending)
> >   		return -EBUSY;
> >   
> > +	if (svm_nmi_blocked(vcpu))
> > +		return 0;
> > +
> >   	/* An NMI must not be injected into L2 if it's supposed to VM-Exit.  */
> >   	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
> >   		return -EBUSY;
> > -
> > -	return !svm_nmi_blocked(vcpu);
> > +	return 1;
> >   }
> >   
> >   static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> > @@ -3428,9 +3430,13 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
> >   static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> >   {
> >   	struct vcpu_svm *svm = to_svm(vcpu);
> > +
> >   	if (svm->nested.nested_run_pending)
> >   		return -EBUSY;
> >   
> > +	if (svm_interrupt_blocked(vcpu))
> > +		return 0;
> > +
> >   	/*
> >   	 * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
> >   	 * e.g. if the IRQ arrived asynchronously after checking nested events.
> > @@ -3438,7 +3444,7 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> >   	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(svm))
> >   		return -EBUSY;
> >   
> > -	return !svm_interrupt_blocked(vcpu);
> > +	return 1;
> >   }
> >   
> >   static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
> > @@ -4169,11 +4175,14 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> >   	if (svm->nested.nested_run_pending)
> >   		return -EBUSY;
> >   
> > +	if (svm_smi_blocked(vcpu))
> > +		return 0;
> > +
> >   	/* An SMI must not be injected into L2 if it's supposed to VM-Exit.  */
> >   	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_smi(svm))
> >   		return -EBUSY;
> >   
> > -	return !svm_smi_blocked(vcpu);
> > +	return 1;
> >   }
> >   
> >   static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
> 
> Can you prepare a testcase for at least the interrupt case?


Yep, I already wrote kvm-unit-tests for all the cases, and I will send them very soon.

Best regards,
	Maxim Levitsky
> 
> Thanks,
> 
> Paolo
> 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH RESEND 00/30] My patch queue
  2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
                   ` (29 preceding siblings ...)
  2022-02-07 15:54 ` [PATCH RESEND 30/30] KVM: x86: get rid of KVM_REQ_GET_NESTED_STATE_PAGES Maxim Levitsky
@ 2022-02-08 12:02 ` Paolo Bonzini
  2022-02-08 12:45   ` Maxim Levitsky
  30 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2022-02-08 12:02 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	linux-kernel, Rodrigo Vivi, H. Peter Anvin, intel-gvt-dev,
	Joonas Lahtinen, Joerg Roedel, Sean Christopherson, David Airlie,
	Zhi Wang, Brijesh Singh, Jim Mattson, x86, Daniel Vetter,
	Borislav Petkov, Zhenyu Wang, Kan Liang, Jani Nikula

On 2/7/22 16:54, Maxim Levitsky wrote:
> This is set of various patches that are stuck in my patch queue.
> 
> KVM_REQ_GET_NESTED_STATE_PAGES patch is mostly RFC, but it does seem
> to work for me.
> 
> Read-only APIC ID is also somewhat RFC.
> 
> Some of these patches are preparation for support for nested AVIC
> which I almost done developing, and will start testing very soon.
> 
> Resend with cleaned up CCs.

1-9 are all bugfixes and pretty small, so I queued them.

10 is also a bugfix but I think it should be split up further, so I'll 
resend it.

For 11-30 I'll start reviewing them, but most of them are independent 
series.

Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH RESEND 07/30] KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them
  2022-02-08 11:55     ` Maxim Levitsky
@ 2022-02-08 12:24       ` Maxim Levitsky
  0 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-08 12:24 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	linux-kernel, Rodrigo Vivi, H. Peter Anvin, intel-gvt-dev,
	Joonas Lahtinen, Joerg Roedel, Sean Christopherson, David Airlie,
	Zhi Wang, Brijesh Singh, Jim Mattson, x86, Daniel Vetter,
	Borislav Petkov, Zhenyu Wang, Kan Liang, Jani Nikula

On Tue, 2022-02-08 at 13:55 +0200, Maxim Levitsky wrote:
> On Tue, 2022-02-08 at 12:33 +0100, Paolo Bonzini wrote:
> > On 2/7/22 16:54, Maxim Levitsky wrote:
> > > Fix a corner case in which the L1 hypervisor intercepts
> > > interrupts (INTERCEPT_INTR) and either doesn't set
> > > virtual interrupt masking (V_INTR_MASKING) or enters a
> > > nested guest with EFLAGS.IF disabled prior to the entry.
> > > 
> > > In this case, despite the fact that L1 intercepts the interrupts,
> > > KVM still needs to set up an interrupt window to wait before
> > > injecting the INTR vmexit.
> > > 
> > > Currently the KVM instead enters an endless loop of 'req_immediate_exit'.
> > > 
> > > Exactly the same issue also happens for SMIs and NMI.
> > > Fix this as well.
> > > 
> > > Note that on VMX, this case is impossible as there is only
> > > 'vmexit on external interrupts' execution control which either set,
> > > in which case both host and guest's EFLAGS.IF
> > > are ignored, or not set, in which case no VMexits are delivered.
> > > 
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >   arch/x86/kvm/svm/svm.c | 17 +++++++++++++----
> > >   1 file changed, 13 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 9a4e299ed5673..22e614008cf59 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -3372,11 +3372,13 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> > >   	if (svm->nested.nested_run_pending)
> > >   		return -EBUSY;
> > >   
> > > +	if (svm_nmi_blocked(vcpu))
> > > +		return 0;
> > > +
> > >   	/* An NMI must not be injected into L2 if it's supposed to VM-Exit.  */
> > >   	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
> > >   		return -EBUSY;
> > > -
> > > -	return !svm_nmi_blocked(vcpu);
> > > +	return 1;
> > >   }
> > >   
> > >   static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> > > @@ -3428,9 +3430,13 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
> > >   static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> > >   {
> > >   	struct vcpu_svm *svm = to_svm(vcpu);
> > > +
> > >   	if (svm->nested.nested_run_pending)
> > >   		return -EBUSY;
> > >   
> > > +	if (svm_interrupt_blocked(vcpu))
> > > +		return 0;
> > > +
> > >   	/*
> > >   	 * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
> > >   	 * e.g. if the IRQ arrived asynchronously after checking nested events.
> > > @@ -3438,7 +3444,7 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> > >   	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(svm))
> > >   		return -EBUSY;
> > >   
> > > -	return !svm_interrupt_blocked(vcpu);
> > > +	return 1;
> > >   }
> > >   
> > >   static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
> > > @@ -4169,11 +4175,14 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> > >   	if (svm->nested.nested_run_pending)
> > >   		return -EBUSY;
> > >   
> > > +	if (svm_smi_blocked(vcpu))
> > > +		return 0;
> > > +
> > >   	/* An SMI must not be injected into L2 if it's supposed to VM-Exit.  */
> > >   	if (for_injection && is_guest_mode(vcpu) && nested_exit_on_smi(svm))
> > >   		return -EBUSY;
> > >   
> > > -	return !svm_smi_blocked(vcpu);
> > > +	return 1;
> > >   }
> > >   
> > >   static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
> > 
> > Can you prepare a testcase for at least the interrupt case?
> 
> Yep, I already wrote kvm-unit-tests for all the cases, and I will send them very soon.

Done.

I also included tests for LBR virtualization, which I think I already posted, but I am not sure.

Best regards,
	Maxim Levitsky
> 
> Best regards,
> 	Maxim Levitsky
> > Thanks,
> > 
> > Paolo
> > 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH RESEND 00/30] My patch queue
  2022-02-08 12:02 ` [PATCH RESEND 00/30] My patch queue Paolo Bonzini
@ 2022-02-08 12:45   ` Maxim Levitsky
  0 siblings, 0 replies; 36+ messages in thread
From: Maxim Levitsky @ 2022-02-08 12:45 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: Tony Luck, Chang S. Bae, Thomas Gleixner, Wanpeng Li,
	Ingo Molnar, Vitaly Kuznetsov, Pawan Gupta, Dave Hansen,
	linux-kernel, Rodrigo Vivi, H. Peter Anvin, intel-gvt-dev,
	Joonas Lahtinen, Joerg Roedel, Sean Christopherson, David Airlie,
	Zhi Wang, Brijesh Singh, Jim Mattson, x86, Daniel Vetter,
	Borislav Petkov, Zhenyu Wang, Kan Liang, Jani Nikula

On Tue, 2022-02-08 at 13:02 +0100, Paolo Bonzini wrote:
> On 2/7/22 16:54, Maxim Levitsky wrote:
> > This is set of various patches that are stuck in my patch queue.
> > 
> > KVM_REQ_GET_NESTED_STATE_PAGES patch is mostly RFC, but it does seem
> > to work for me.
> > 
> > Read-only APIC ID is also somewhat RFC.
> > 
> > Some of these patches are preparation for support for nested AVIC
> > which I almost done developing, and will start testing very soon.
> > 
> > Resend with cleaned up CCs.
> 
> 1-9 are all bugfixes and pretty small, so I queued them.
> 
> 10 is also a bugfix but I think it should be split up further, so I'll 
> resend it.

> 
> For 11-30 I'll start reviewing them, but most of them are independent 
> series.

Thank you very much!
 
I must again say sorry that I posted the whole thing as one patch series;
next time I'll post each series separately, and I'll also try to post
the patches as soon as I write them.

I didn't post them earlier because I felt that the whole thing needed good
testing, and I only recently got around to somewhat automating my nested
migration tests, which I usually use to test this kind of work.
 
 

Best regards,
	Maxim Levitsky
 
PS: the strict_mmu option does have quite an effect on nested migration with npt=0/ept=0.
In my testing such a migration crashes with a page fault in the L2 kernel after around 50-100
iterations, while with this option on, it survived ~1000 iterations (and about the same on Intel),
and on both machines L1 eventually crashed with a page fault instead.

Could be that it just throws the timing off, or maybe we still do have some form of bug
in shadow paging after all, maybe even two bugs.
Hmmm....
 
I automated these tests so I can run them for days until I have more confidence
in what is going on.



Best regards,
	Maxim Levitsky

> 
> Paolo
> 



^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2022-02-08 13:22 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-07 15:54 [PATCH RESEND 00/30] My patch queue Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 01/30] KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 02/30] KVM: x86: nSVM: fix potential NULL derefernce on nested migration Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 03/30] KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 04/30] KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 05/30] KVM: x86: nSVM: expose clean bit support to the guest Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 06/30] KVM: x86: mark syntethic SMM vmexit as SVM_EXIT_SW Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 07/30] KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them Maxim Levitsky
2022-02-08 11:33   ` Paolo Bonzini
2022-02-08 11:55     ` Maxim Levitsky
2022-02-08 12:24       ` Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 08/30] KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 09/30] KVM: x86: SVM: move avic definitions from AMD's spec to svm.h Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 10/30] KVM: x86: SVM: fix race between interrupt delivery and AVIC inhibition Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 11/30] KVM: x86: SVM: use vmcb01 in avic_init_vmcb Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 12/30] KVM: x86: SVM: allow AVIC to co-exist with a nested guest running Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 13/30] KVM: x86: lapic: don't allow to change APIC ID when apic acceleration is enabled Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 14/30] KVM: x86: lapic: don't allow to change local apic id when using older x2apic api Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 15/30] KVM: x86: SVM: remove avic's broken code that updated APIC ID Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 16/30] KVM: x86: SVM: allow to force AVIC to be enabled Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 17/30] KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 18/30] KVM: x86: mmu: add strict mmu mode Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 19/30] KVM: x86: mmu: add gfn_in_memslot helper Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 20/30] KVM: x86: mmu: allow to enable write tracking externally Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 21/30] x86: KVMGT: use kvm_page_track_write_tracking_enable Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 22/30] KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 23/30] KVM: x86: nSVM: implement nested LBR virtualization Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 24/30] KVM: x86: nSVM: implement nested VMLOAD/VMSAVE Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 25/30] KVM: x86: nSVM: support PAUSE filter threshold and count when cpu_pm=on Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 26/30] KVM: x86: nSVM: implement nested vGIF Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 27/30] KVM: x86: add force_intercept_exceptions_mask Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 28/30] KVM: SVM: implement force_intercept_exceptions_mask Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 29/30] KVM: VMX: " Maxim Levitsky
2022-02-07 15:54 ` [PATCH RESEND 30/30] KVM: x86: get rid of KVM_REQ_GET_NESTED_STATE_PAGES Maxim Levitsky
2022-02-08 12:02 ` [PATCH RESEND 00/30] My patch queue Paolo Bonzini
2022-02-08 12:45   ` Maxim Levitsky
