linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/12] My AVIC patch queue
@ 2021-08-02 18:33 Maxim Levitsky
  2021-08-02 18:33 ` [PATCH v3 01/12] Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock" Maxim Levitsky
                   ` (11 more replies)
  0 siblings, 12 replies; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

Hi!\r
\r
This is a series of bugfixes to the AVIC dynamic inhibition, which was\r
made while trying to fix bugs as much as possible in this area and trying\r
to make the AVIC+SYNIC conditional enablement work.\r
\r
* Patches 1-6 are code from Sean Christopherson which\r
  implement an alternative approach of inhibiting AVIC without\r
  disabling its memslot.\r
  Those patches are new in this version of the patchset.\r
\r
* Patches 7-8 in this series fix a race condition which can cause\r
  a lost write from a guest to APIC when the APIC write races\r
  the AVIC un-inhibition, and add a warning to catch this problem\r
  if it re-emerges again.\r
\r
  V3: moved the mutex to kvm.arch and added comment\r
\r
* Patch 9 is the patch from Vitaly about allowing AVIC with SYNC\r
  as long as the guest doesn’t use the AutoEOI feature. I only slightly\r
  changed it to expose the AutoEOI cpuid bit regardless of AVIC enablement.\r
\r
* Patch 10 is a new patch in this version, which is a refactoring that is now\r
  possible in SVM AVIC inhibition code, because the RCU lock is not dropped anymore.\r
\r
* Patch 11 fixes another issue I found in AVIC inhibit code:\r
\r
  Currently avic_vcpu_load/avic_vcpu_put are called on userspace entry/exit\r
  from KVM (aka kvm_vcpu_get/kvm_vcpu_put), and these functions update the\r
  "is running" bit in the AVIC physical ID remap table and update the\r
  target vCPU in iommu code.\r
\r
  However both of these functions don't do anything when AVIC is inhibited\r
  thus the "is running" bit will be kept enabled during the exit to userspace.\r
  This shouldn't be a big issue as the caller\r
  doesn't use the AVIC when inhibited but still inconsistent and can trigger\r
  a warning about this in avic_vcpu_load.\r
\r
  To be on the safe side I think it makes sense to call\r
  avic_vcpu_put/avic_vcpu_load when inhibiting/uninhibiting the AVIC.\r
  This will ensure that the work these functions do is matched.\r
\r
* Patch 12 (also new in this series) removes the pointless APIC base\r
  relocation from AVIC to make it consistent with the rest of KVM.\r
\r
  (both AVIC and APICv only support default base, while regular KVM,\r
  sort of support any APIC base as long as it is not RAM.\r
  If guest attempts to relocate APIC base to non RAM area,\r
  while APICv/AVIC are active, the new base will be non accelerated,\r
  while the default base will continue to be AVIC/APICv backed).\r
\r
Best regards,\r
	Maxim Levitsky\r
\r
Maxim Levitsky (10):\r
  KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range\r
  KVM: x86/mmu: rename try_async_pf to kvm_faultin_pfn\r
  KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code\r
  KVM: x86/mmu: allow APICv memslot to be partially enabled\r
  KVM: x86: don't disable APICv memslot when inhibited\r
  KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM\r
  KVM: SVM: add warning for mistmatch between AVIC state and AVIC access\r
    page state\r
  KVM: SVM: remove svm_toggle_avic_for_irq_window\r
  KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling\r
    AVIC\r
  KVM: SVM: AVIC: drop unsupported AVIC base relocation code\r
\r
Sean Christopherson (1):\r
  Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu\r
    read lock"\r
\r
Vitaly Kuznetsov (1):\r
  KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in\r
    use\r
\r
 arch/x86/include/asm/kvm-x86-ops.h |  1 -\r
 arch/x86/include/asm/kvm_host.h    |  7 ++-\r
 arch/x86/kvm/hyperv.c              | 27 ++++++++---\r
 arch/x86/kvm/mmu/mmu.c             | 69 ++++++++++++++++----------\r
 arch/x86/kvm/mmu/paging_tmpl.h     |  6 +--\r
 arch/x86/kvm/mmu/tdp_mmu.c         | 15 ++----\r
 arch/x86/kvm/mmu/tdp_mmu.h         | 11 ++---\r
 arch/x86/kvm/svm/avic.c            | 77 ++++++++++++------------------\r
 arch/x86/kvm/svm/svm.c             | 23 +++++----\r
 arch/x86/kvm/svm/svm.h             |  8 ----\r
 arch/x86/kvm/x86.c                 | 56 +++++++++++++---------\r
 include/linux/kvm_host.h           |  5 ++\r
 virt/kvm/kvm_main.c                |  7 ++-\r
 13 files changed, 166 insertions(+), 146 deletions(-)\r
\r
-- \r
2.26.3\r
\r


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 01/12] Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"
  2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
@ 2021-08-02 18:33 ` Maxim Levitsky
  2021-08-03  8:05   ` Paolo Bonzini
  2021-08-02 18:33 ` [PATCH v3 02/12] KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range Maxim Levitsky
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

From: Sean Christopherson <seanjc@google.com>

This together with the next patch will fix a future race between
kvm_zap_gfn_range and the page fault handler, which will happen
when AVIC memslot is going to be only partially disabled.

This is based on a patch suggested by Sean Christopherson:
https://lkml.org/lkml/2021/7/22/1025

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c     | 19 ++++++++-----------
 arch/x86/kvm/mmu/tdp_mmu.c | 15 ++++-----------
 arch/x86/kvm/mmu/tdp_mmu.h | 11 ++++-------
 3 files changed, 16 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a8cdfd8d45c4..9d78cb1c0f35 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5638,8 +5638,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 	int i;
 	bool flush = false;
 
+	write_lock(&kvm->mmu_lock);
+
 	if (kvm_memslots_have_rmaps(kvm)) {
-		write_lock(&kvm->mmu_lock);
 		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
 			slots = __kvm_memslots(kvm, i);
 			kvm_for_each_memslot(memslot, slots) {
@@ -5659,22 +5660,18 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 		}
 		if (flush)
 			kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
-		write_unlock(&kvm->mmu_lock);
 	}
 
 	if (is_tdp_mmu_enabled(kvm)) {
-		flush = false;
-
-		read_lock(&kvm->mmu_lock);
 		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
 			flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start,
-							  gfn_end, flush, true);
-		if (flush)
-			kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
-							   gfn_end);
-
-		read_unlock(&kvm->mmu_lock);
+							  gfn_end, flush);
 	}
+
+	if (flush)
+		kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
+
+	write_unlock(&kvm->mmu_lock);
 }
 
 static bool slot_rmap_write_protect(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 43f12f5d12c0..3e0222ce3f4e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -777,21 +777,15 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
  * non-root pages mapping GFNs strictly within that range. Returns true if
  * SPTEs have been cleared and a TLB flush is needed before releasing the
  * MMU lock.
- *
- * If shared is true, this thread holds the MMU lock in read mode and must
- * account for the possibility that other threads are modifying the paging
- * structures concurrently. If shared is false, this thread should hold the
- * MMU in write mode.
  */
 bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
-				 gfn_t end, bool can_yield, bool flush,
-				 bool shared)
+				 gfn_t end, bool can_yield, bool flush)
 {
 	struct kvm_mmu_page *root;
 
-	for_each_tdp_mmu_root_yield_safe(kvm, root, as_id, shared)
+	for_each_tdp_mmu_root_yield_safe(kvm, root, as_id, false)
 		flush = zap_gfn_range(kvm, root, start, end, can_yield, flush,
-				      shared);
+				      false);
 
 	return flush;
 }
@@ -803,8 +797,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
 	int i;
 
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
-		flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, 0, max_gfn,
-						  flush, false);
+		flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, 0, max_gfn, flush);
 
 	if (flush)
 		kvm_flush_remote_tlbs(kvm);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index b224d126adf9..358f447d4012 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -20,14 +20,11 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 			  bool shared);
 
 bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
-				 gfn_t end, bool can_yield, bool flush,
-				 bool shared);
+				 gfn_t end, bool can_yield, bool flush);
 static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id,
-					     gfn_t start, gfn_t end, bool flush,
-					     bool shared)
+					     gfn_t start, gfn_t end, bool flush)
 {
-	return __kvm_tdp_mmu_zap_gfn_range(kvm, as_id, start, end, true, flush,
-					   shared);
+	return __kvm_tdp_mmu_zap_gfn_range(kvm, as_id, start, end, true, flush);
 }
 static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
@@ -44,7 +41,7 @@ static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 	 */
 	lockdep_assert_held_write(&kvm->mmu_lock);
 	return __kvm_tdp_mmu_zap_gfn_range(kvm, kvm_mmu_page_as_id(sp),
-					   sp->gfn, end, false, false, false);
+					   sp->gfn, end, false, false);
 }
 
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);
-- 
2.26.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 02/12] KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range
  2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
  2021-08-02 18:33 ` [PATCH v3 01/12] Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock" Maxim Levitsky
@ 2021-08-02 18:33 ` Maxim Levitsky
  2021-08-03  9:00   ` Paolo Bonzini
  2021-08-02 18:33 ` [PATCH v3 03/12] KVM: x86/mmu: rename try_async_pf to kvm_faultin_pfn Maxim Levitsky
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

This together with previous patch, ensures that
kvm_zap_gfn_range doesn't race with page fault
running on another vcpu, and will make this page fault code
retry instead.

This is based on a patch suggested by Sean Christopherson:
https://lkml.org/lkml/2021/7/22/1025

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c   | 4 ++++
 include/linux/kvm_host.h | 5 +++++
 virt/kvm/kvm_main.c      | 7 +++++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9d78cb1c0f35..9da635e383c2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5640,6 +5640,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 
 	write_lock(&kvm->mmu_lock);
 
+	kvm_inc_notifier_count(kvm, gfn_start, gfn_end);
+
 	if (kvm_memslots_have_rmaps(kvm)) {
 		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
 			slots = __kvm_memslots(kvm, i);
@@ -5671,6 +5673,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 	if (flush)
 		kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
 
+	kvm_dec_notifier_count(kvm, gfn_start, gfn_end);
+
 	write_unlock(&kvm->mmu_lock);
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9d6b4ad407b8..962e11a73e8e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -985,6 +985,11 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
 void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 #endif
 
+void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start,
+				   unsigned long end);
+void kvm_dec_notifier_count(struct kvm *kvm, unsigned long start,
+				   unsigned long end);
+
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg);
 long kvm_arch_vcpu_ioctl(struct file *filp,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a96cbe24c688..71042cd807b3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -608,7 +608,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
 	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
 }
 
-static void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start,
+void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start,
 				   unsigned long end)
 {
 	/*
@@ -636,6 +636,7 @@ static void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start,
 			max(kvm->mmu_notifier_range_end, end);
 	}
 }
+EXPORT_SYMBOL_GPL(kvm_inc_notifier_count);
 
 static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 					const struct mmu_notifier_range *range)
@@ -670,7 +671,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	return 0;
 }
 
-static void kvm_dec_notifier_count(struct kvm *kvm, unsigned long start,
+void kvm_dec_notifier_count(struct kvm *kvm, unsigned long start,
 				   unsigned long end)
 {
 	/*
@@ -687,6 +688,8 @@ static void kvm_dec_notifier_count(struct kvm *kvm, unsigned long start,
 	 */
 	kvm->mmu_notifier_count--;
 }
+EXPORT_SYMBOL_GPL(kvm_dec_notifier_count);
+
 
 static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
 					const struct mmu_notifier_range *range)
-- 
2.26.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 03/12] KVM: x86/mmu: rename try_async_pf to kvm_faultin_pfn
  2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
  2021-08-02 18:33 ` [PATCH v3 01/12] Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock" Maxim Levitsky
  2021-08-02 18:33 ` [PATCH v3 02/12] KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range Maxim Levitsky
@ 2021-08-02 18:33 ` Maxim Levitsky
  2021-08-03  9:00   ` Paolo Bonzini
  2021-08-02 18:33 ` [PATCH v3 04/12] KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code Maxim Levitsky
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

try_async_pf is a wrong name for this function, since this code
is used when asynchronous page fault is not enabled as well.

This code is based on a patch from Sean Christopherson:
https://lkml.org/lkml/2021/7/19/2970

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c         | 4 ++--
 arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9da635e383c2..c5e0ecf5f758 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3842,7 +3842,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
 }
 
-static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
+static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 			 gpa_t cr2_or_gpa, kvm_pfn_t *pfn, hva_t *hva,
 			 bool write, bool *writable)
 {
@@ -3912,7 +3912,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, &hva,
+	if (kvm_faultin_pfn(vcpu, prefault, gfn, gpa, &pfn, &hva,
 			 write, &map_writable))
 		return RET_PF_RETRY;
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index ee044d357b5f..f349eae69bf3 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -881,7 +881,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (try_async_pf(vcpu, prefault, walker.gfn, addr, &pfn, &hva,
+	if (kvm_faultin_pfn(vcpu, prefault, walker.gfn, addr, &pfn, &hva,
 			 write_fault, &map_writable))
 		return RET_PF_RETRY;
 
-- 
2.26.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 04/12] KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code
  2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
                   ` (2 preceding siblings ...)
  2021-08-02 18:33 ` [PATCH v3 03/12] KVM: x86/mmu: rename try_async_pf to kvm_faultin_pfn Maxim Levitsky
@ 2021-08-02 18:33 ` Maxim Levitsky
  2021-08-03  9:00   ` Paolo Bonzini
  2021-08-02 18:33 ` [PATCH v3 05/12] KVM: x86/mmu: allow APICv memslot to be partially enabled Maxim Levitsky
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

This will allow it to return RET_PF_EMULATE for APIC mmio
emulation.

This code is based on a patch from Sean Christopherson:
https://lkml.org/lkml/2021/7/19/2970

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c         | 17 ++++++++++-------
 arch/x86/kvm/mmu/paging_tmpl.h |  4 ++--
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c5e0ecf5f758..6f77f6efd43c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3844,7 +3844,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 
 static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 			 gpa_t cr2_or_gpa, kvm_pfn_t *pfn, hva_t *hva,
-			 bool write, bool *writable)
+			 bool write, bool *writable, int *r)
 {
 	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 	bool async;
@@ -3855,7 +3855,7 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 	 * be zapped before KVM inserts a new MMIO SPTE for the gfn.
 	 */
 	if (slot && (slot->flags & KVM_MEMSLOT_INVALID))
-		return true;
+		goto out_retry;
 
 	/* Don't expose private memslots to L2. */
 	if (is_guest_mode(vcpu) && !kvm_is_visible_memslot(slot)) {
@@ -3875,14 +3875,17 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 		if (kvm_find_async_pf_gfn(vcpu, gfn)) {
 			trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
 			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
-			return true;
+			goto out_retry;
 		} else if (kvm_arch_setup_async_pf(vcpu, cr2_or_gpa, gfn))
-			return true;
+			goto out_retry;
 	}
 
 	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
 				    write, writable, hva);
-	return false;
+
+out_retry:
+	*r = RET_PF_RETRY;
+	return true;
 }
 
 static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
@@ -3913,8 +3916,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	smp_rmb();
 
 	if (kvm_faultin_pfn(vcpu, prefault, gfn, gpa, &pfn, &hva,
-			 write, &map_writable))
-		return RET_PF_RETRY;
+			 write, &map_writable, &r))
+		return r;
 
 	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
 		return r;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index f349eae69bf3..7d03e9b7ccfa 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -882,8 +882,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	smp_rmb();
 
 	if (kvm_faultin_pfn(vcpu, prefault, walker.gfn, addr, &pfn, &hva,
-			 write_fault, &map_writable))
-		return RET_PF_RETRY;
+			 write_fault, &map_writable, &r))
+		return r;
 
 	if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r))
 		return r;
-- 
2.26.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 05/12] KVM: x86/mmu: allow APICv memslot to be partially enabled
  2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
                   ` (3 preceding siblings ...)
  2021-08-02 18:33 ` [PATCH v3 04/12] KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code Maxim Levitsky
@ 2021-08-02 18:33 ` Maxim Levitsky
  2021-08-03  9:12   ` Paolo Bonzini
  2021-08-02 18:33 ` [PATCH v3 06/12] KVM: x86: don't disable APICv memslot when inhibited Maxim Levitsky
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

on AMD, APIC virtualization needs to dynamicaly inhibit the AVIC in a
response to some events, and this is problematic and not efficient to do by
enabling/disabling the memslot that covers APIC's mmio range.
Plus due to SRCU locking, it makes it more complex to request AVIC inhibition.

Instead, the APIC memslot will be always enabled, but the MMU code
will not install a SPTE for it, when arch.apic_access_memslot_enabled == false
and instead jump straight to emulating the access.

When inhibiting the AVIC, this SPTE will be zapped.

This code is based on a suggestion from Sean Christopherson:
https://lkml.org/lkml/2021/7/19/2970

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6f77f6efd43c..965b562da893 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3857,11 +3857,24 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 	if (slot && (slot->flags & KVM_MEMSLOT_INVALID))
 		goto out_retry;
 
-	/* Don't expose private memslots to L2. */
-	if (is_guest_mode(vcpu) && !kvm_is_visible_memslot(slot)) {
-		*pfn = KVM_PFN_NOSLOT;
-		*writable = false;
-		return false;
+	if (!kvm_is_visible_memslot(slot)) {
+		/* Don't expose private memslots to L2. */
+		if (is_guest_mode(vcpu)) {
+			*pfn = KVM_PFN_NOSLOT;
+			*writable = false;
+			return false;
+		}
+		/*
+		 * If the APIC access page exists but is disabled, go directly
+		 * to emulation without caching the MMIO access or creating a
+		 * MMIO SPTE.  That way the cache doesn't need to be purged
+		 * when the AVIC is re-enabled.
+		 */
+		if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
+		    !vcpu->kvm->arch.apic_access_memslot_enabled) {
+			*r = RET_PF_EMULATE;
+			return true;
+		}
 	}
 
 	async = false;
-- 
2.26.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 06/12] KVM: x86: don't disable APICv memslot when inhibited
  2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
                   ` (4 preceding siblings ...)
  2021-08-02 18:33 ` [PATCH v3 05/12] KVM: x86/mmu: allow APICv memslot to be partially enabled Maxim Levitsky
@ 2021-08-02 18:33 ` Maxim Levitsky
  2021-08-03  8:44   ` Paolo Bonzini
  2021-08-02 18:33 ` [PATCH v3 07/12] KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM Maxim Levitsky
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

Thanks to the former patches, it is now possible to keep the APICv
memslot always enabled, and only zap the GFN range when we inhibit it,

Special code in MMU now takes care to avoid re-populating the SPTE
but rather do emulation instead.

This code is based on a suggestion from Sean Christopherson:
https://lkml.org/lkml/2021/7/19/2970

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 -
 arch/x86/include/asm/kvm_host.h    |  1 -
 arch/x86/kvm/svm/avic.c            | 21 ++++++---------------
 arch/x86/kvm/svm/svm.c             |  1 -
 arch/x86/kvm/svm/svm.h             |  1 -
 arch/x86/kvm/x86.c                 | 15 ++++++---------
 6 files changed, 12 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index a12a4987154e..cefe1d81e2e8 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -72,7 +72,6 @@ KVM_X86_OP(enable_nmi_window)
 KVM_X86_OP(enable_irq_window)
 KVM_X86_OP(update_cr8_intercept)
 KVM_X86_OP(check_apicv_inhibit_reasons)
-KVM_X86_OP_NULL(pre_update_apicv_exec_ctrl)
 KVM_X86_OP(refresh_apicv_exec_ctrl)
 KVM_X86_OP(hwapic_irr_update)
 KVM_X86_OP(hwapic_isr_update)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 99f37781a6fc..430447ce19c0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1353,7 +1353,6 @@ struct kvm_x86_ops {
 	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
 	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
 	bool (*check_apicv_inhibit_reasons)(ulong bit);
-	void (*pre_update_apicv_exec_ctrl)(struct kvm *kvm, bool activate);
 	void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
 	void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
 	void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index a8ad78a2faa1..d0acbeeab3d6 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -225,31 +225,26 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
  * field of the VMCB. Therefore, we set up the
  * APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (4KB) here.
  */
-static int avic_update_access_page(struct kvm *kvm, bool activate)
+static int avic_alloc_access_page(struct kvm *kvm)
 {
 	void __user *ret;
 	int r = 0;
 
 	mutex_lock(&kvm->slots_lock);
-	/*
-	 * During kvm_destroy_vm(), kvm_pit_set_reinject() could trigger
-	 * APICv mode change, which update APIC_ACCESS_PAGE_PRIVATE_MEMSLOT
-	 * memory region. So, we need to ensure that kvm->mm == current->mm.
-	 */
-	if ((kvm->arch.apic_access_memslot_enabled == activate) ||
-	    (kvm->mm != current->mm))
+
+	if (kvm->arch.apic_access_memslot_enabled)
 		goto out;
 
 	ret = __x86_set_memory_region(kvm,
 				      APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
 				      APIC_DEFAULT_PHYS_BASE,
-				      activate ? PAGE_SIZE : 0);
+				      PAGE_SIZE);
 	if (IS_ERR(ret)) {
 		r = PTR_ERR(ret);
 		goto out;
 	}
 
-	kvm->arch.apic_access_memslot_enabled = activate;
+	kvm->arch.apic_access_memslot_enabled = true;
 out:
 	mutex_unlock(&kvm->slots_lock);
 	return r;
@@ -270,7 +265,7 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
 	if (kvm_apicv_activated(vcpu->kvm)) {
 		int ret;
 
-		ret = avic_update_access_page(vcpu->kvm, true);
+		ret = avic_alloc_access_page(vcpu->kvm);
 		if (ret)
 			return ret;
 	}
@@ -918,10 +913,6 @@ bool svm_check_apicv_inhibit_reasons(ulong bit)
 	return supported & BIT(bit);
 }
 
-void svm_pre_update_apicv_exec_ctrl(struct kvm *kvm, bool activate)
-{
-	avic_update_access_page(kvm, activate);
-}
 
 static inline int
 avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9d72b1df426e..4feff53dd1d3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4583,7 +4583,6 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.set_virtual_apic_mode = svm_set_virtual_apic_mode,
 	.refresh_apicv_exec_ctrl = svm_refresh_apicv_exec_ctrl,
 	.check_apicv_inhibit_reasons = svm_check_apicv_inhibit_reasons,
-	.pre_update_apicv_exec_ctrl = svm_pre_update_apicv_exec_ctrl,
 	.load_eoi_exitmap = svm_load_eoi_exitmap,
 	.hwapic_irr_update = svm_hwapic_irr_update,
 	.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bd0fe94c2920..bd41f2a32838 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -534,7 +534,6 @@ void avic_post_state_restore(struct kvm_vcpu *vcpu);
 void svm_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
 void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu);
 bool svm_check_apicv_inhibit_reasons(ulong bit);
-void svm_pre_update_apicv_exec_ctrl(struct kvm *kvm, bool activate);
 void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr);
 void svm_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 916c976e99ab..1de7d97341dd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9255,13 +9255,6 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
 
-/*
- * NOTE: Do not hold any lock prior to calling this.
- *
- * In particular, kvm_request_apicv_update() expects kvm->srcu not to be
- * locked, because it calls __x86_set_memory_region() which does
- * synchronize_srcu(&kvm->srcu).
- */
 void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
 {
 	unsigned long old, new, expected;
@@ -9286,8 +9279,12 @@ void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
 		return;
 
 	trace_kvm_apicv_update_request(activate, bit);
-	if (kvm_x86_ops.pre_update_apicv_exec_ctrl)
-		static_call(kvm_x86_pre_update_apicv_exec_ctrl)(kvm, activate);
+
+	if (!activate)
+		kvm_zap_gfn_range(kvm, gpa_to_gfn(APIC_DEFAULT_PHYS_BASE),
+				  gpa_to_gfn(APIC_DEFAULT_PHYS_BASE + PAGE_SIZE));
+
+	kvm->arch.apic_access_memslot_enabled = activate;
 
 	kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
 }
-- 
2.26.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 07/12] KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM
  2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
                   ` (5 preceding siblings ...)
  2021-08-02 18:33 ` [PATCH v3 06/12] KVM: x86: don't disable APICv memslot when inhibited Maxim Levitsky
@ 2021-08-02 18:33 ` Maxim Levitsky
  2021-08-02 18:33 ` [PATCH v3 08/12] KVM: SVM: add warning for mistmatch between AVIC state and AVIC access page state Maxim Levitsky
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

Currently on SVM, the kvm_request_apicv_update toggles the APICv
memslot without doing any synchronization.

If there is a mismatch between that memslot state and the AVIC state,
on one of the vCPUs, an APIC mmio access can be lost:

For example:

VCPU0: enable the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT
VCPU1: access an APIC mmio register.

Since AVIC is still disabled on VCPU1, the access will not be intercepted
by it, and neither will it cause MMIO fault, but rather it will just be
read/written from/to the dummy page mapped into the
APIC_ACCESS_PAGE_PRIVATE_MEMSLOT.

Fix that by adding a lock guarding the AVIC state changes, and carefully
order the operations of kvm_request_apicv_update to avoid this race:

1. Take the lock
2. Send KVM_REQ_APICV_UPDATE
3. Update the apic inhibit reason
4. Release the lock

This ensures that at (2) all vCPUs are kicked out of the guest mode,
but don't yet see the new avic state.
Then only after (4) all other vCPUs can update their AVIC state and resume.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/x86.c              | 36 +++++++++++++++++++--------------
 2 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 430447ce19c0..80323b5fb20a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1054,6 +1054,9 @@ struct kvm_arch {
 	struct kvm_apic_map __rcu *apic_map;
 	atomic_t apic_map_dirty;
 
+	/* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
+	struct mutex apicv_update_lock;
+
 	bool apic_access_memslot_enabled;
 	unsigned long apicv_inhibit_reasons;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1de7d97341dd..9d2e4594c4eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8579,6 +8579,8 @@ EXPORT_SYMBOL_GPL(kvm_apicv_activated);
 
 static void kvm_apicv_init(struct kvm *kvm)
 {
+	mutex_init(&kvm->arch.apicv_update_lock);
+
 	if (enable_apicv)
 		clear_bit(APICV_INHIBIT_REASON_DISABLE,
 			  &kvm->arch.apicv_inhibit_reasons);
@@ -9240,6 +9242,8 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	if (!lapic_in_kernel(vcpu))
 		return;
 
+	mutex_lock(&vcpu->kvm->arch.apicv_update_lock);
+
 	vcpu->arch.apicv_active = kvm_apicv_activated(vcpu->kvm);
 	kvm_apic_update_apicv(vcpu);
 	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
@@ -9252,41 +9256,43 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	 */
 	if (!vcpu->arch.apicv_active)
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
+
+	mutex_unlock(&vcpu->kvm->arch.apicv_update_lock);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
 
 void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
 {
-	unsigned long old, new, expected;
+	unsigned long old, new;
 
 	if (!kvm_x86_ops.check_apicv_inhibit_reasons ||
 	    !static_call(kvm_x86_check_apicv_inhibit_reasons)(bit))
 		return;
 
-	old = READ_ONCE(kvm->arch.apicv_inhibit_reasons);
-	do {
-		expected = new = old;
-		if (activate)
-			__clear_bit(bit, &new);
-		else
-			__set_bit(bit, &new);
-		if (new == old)
-			break;
-		old = cmpxchg(&kvm->arch.apicv_inhibit_reasons, expected, new);
-	} while (old != expected);
+	mutex_lock(&kvm->arch.apicv_update_lock);
+
+	old = new = kvm->arch.apicv_inhibit_reasons;
+	if (activate)
+		__clear_bit(bit, &new);
+	else
+		__set_bit(bit, &new);
+
+	kvm->arch.apicv_inhibit_reasons = new;
 
 	if (!!old == !!new)
-		return;
+		goto out;
 
 	trace_kvm_apicv_update_request(activate, bit);
 
+	kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
+
 	if (!activate)
 		kvm_zap_gfn_range(kvm, gpa_to_gfn(APIC_DEFAULT_PHYS_BASE),
 				  gpa_to_gfn(APIC_DEFAULT_PHYS_BASE + PAGE_SIZE));
 
 	kvm->arch.apic_access_memslot_enabled = activate;
-
-	kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
+out:
+	mutex_unlock(&kvm->arch.apicv_update_lock);
 }
 EXPORT_SYMBOL_GPL(kvm_request_apicv_update);
 
-- 
2.26.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 08/12] KVM: SVM: add warning for mistmatch between AVIC state and AVIC access page state
  2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
                   ` (6 preceding siblings ...)
  2021-08-02 18:33 ` [PATCH v3 07/12] KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM Maxim Levitsky
@ 2021-08-02 18:33 ` Maxim Levitsky
  2021-08-03  8:45   ` Paolo Bonzini
  2021-08-02 18:33 ` [PATCH v3 09/12] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use Maxim Levitsky
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

It is never a good idea to enter a guest when AVIC state doesn't match
the state of the AVIC MMIO page state.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4feff53dd1d3..3923d383face 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3781,6 +3781,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	pre_svm_run(vcpu);
 
+	WARN_ON_ONCE(vcpu->kvm->arch.apic_access_memslot_enabled !=
+		     kvm_vcpu_apicv_active(vcpu));
+
 	sync_lapic_to_cr8(vcpu);
 
 	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
-- 
2.26.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 09/12] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use
  2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
                   ` (7 preceding siblings ...)
  2021-08-02 18:33 ` [PATCH v3 08/12] KVM: SVM: add warning for mistmatch between AVIC state and AVIC access page state Maxim Levitsky
@ 2021-08-02 18:33 ` Maxim Levitsky
  2021-08-03  8:47   ` Paolo Bonzini
                     ` (2 more replies)
  2021-08-02 18:33 ` [PATCH v3 10/12] KVM: SVM: remove svm_toggle_avic_for_irq_window Maxim Levitsky
                   ` (2 subsequent siblings)
  11 siblings, 3 replies; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

From: Vitaly Kuznetsov <vkuznets@redhat.com>

APICV_INHIBIT_REASON_HYPERV is currently unconditionally forced upon
SynIC activation as SynIC's AutoEOI is incompatible with APICv/AVIC. It is,
however, possible to track whether the feature was actually used by the
guest and only inhibit APICv/AVIC when needed.

TLFS suggests a dedicated 'HV_DEPRECATING_AEOI_RECOMMENDED' flag to let
Windows know that AutoEOI feature should be avoided. While it's up to
KVM userspace to set the flag, KVM can help a bit by exposing global
APICv/AVIC enablement.

Maxim:
   - always set HV_DEPRECATING_AEOI_RECOMMENDED in kvm_get_hv_cpuid,
     since this feature can be used regardless of AVIC

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/hyperv.c           | 27 +++++++++++++++++++++------
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 80323b5fb20a..55b1f79d9c43 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -988,6 +988,9 @@ struct kvm_hv {
 	/* How many vCPUs have VP index != vCPU index */
 	atomic_t num_mismatched_vp_indexes;
 
+	/* How many SynICs use 'AutoEOI' feature */
+	atomic_t synic_auto_eoi_used;
+
 	struct hv_partition_assist_pg *hv_pa_pg;
 	struct kvm_hv_syndbg hv_syndbg;
 };
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index b07592ca92f0..638f3c559623 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -88,6 +88,10 @@ static bool synic_has_vector_auto_eoi(struct kvm_vcpu_hv_synic *synic,
 static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
 				int vector)
 {
+	struct kvm_vcpu *vcpu = hv_synic_to_vcpu(synic);
+	struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
+	int auto_eoi_old, auto_eoi_new;
+
 	if (vector < HV_SYNIC_FIRST_VALID_VECTOR)
 		return;
 
@@ -96,10 +100,25 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
 	else
 		__clear_bit(vector, synic->vec_bitmap);
 
+	auto_eoi_old = bitmap_weight(synic->auto_eoi_bitmap, 256);
+
 	if (synic_has_vector_auto_eoi(synic, vector))
 		__set_bit(vector, synic->auto_eoi_bitmap);
 	else
 		__clear_bit(vector, synic->auto_eoi_bitmap);
+
+	auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
+
+	/* Hyper-V SynIC auto EOI SINTs are not compatible with APICV */
+	if (!auto_eoi_old && auto_eoi_new) {
+		if (atomic_inc_return(&hv->synic_auto_eoi_used) == 1)
+			kvm_request_apicv_update(vcpu->kvm, false,
+						APICV_INHIBIT_REASON_HYPERV);
+	} else if (!auto_eoi_new && auto_eoi_old) {
+		if (atomic_dec_return(&hv->synic_auto_eoi_used) == 0)
+			kvm_request_apicv_update(vcpu->kvm, true,
+						APICV_INHIBIT_REASON_HYPERV);
+	}
 }
 
 static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
@@ -933,12 +952,6 @@ int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages)
 
 	synic = to_hv_synic(vcpu);
 
-	/*
-	 * Hyper-V SynIC auto EOI SINT's are
-	 * not compatible with APICV, so request
-	 * to deactivate APICV permanently.
-	 */
-	kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_HYPERV);
 	synic->active = true;
 	synic->dont_zero_synic_pages = dont_zero_synic_pages;
 	synic->control = HV_SYNIC_CONTROL_ENABLE;
@@ -2466,6 +2479,8 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 				ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
 			if (!cpu_smt_possible())
 				ent->eax |= HV_X64_NO_NONARCH_CORESHARING;
+
+			ent->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
 			/*
 			 * Default number of spinlock retry attempts, matches
 			 * HyperV 2016.
-- 
2.26.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 10/12] KVM: SVM: remove svm_toggle_avic_for_irq_window
  2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
                   ` (8 preceding siblings ...)
  2021-08-02 18:33 ` [PATCH v3 09/12] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use Maxim Levitsky
@ 2021-08-02 18:33 ` Maxim Levitsky
  2021-08-03  9:11   ` Paolo Bonzini
  2021-08-02 18:33 ` [PATCH v3 11/12] KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC Maxim Levitsky
  2021-08-02 18:33 ` [PATCH v3 12/12] KVM: SVM: AVIC: drop unsupported AVIC base relocation code Maxim Levitsky
  11 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

Now that kvm_request_apicv_update doesn't need to drop the kvm->srcu lock,
we can call kvm_request_apicv_update directly.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 11 -----------
 arch/x86/kvm/svm/svm.c  |  4 ++--
 arch/x86/kvm/svm/svm.h  |  1 -
 3 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index d0acbeeab3d6..1def54c26259 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -582,17 +582,6 @@ void avic_post_state_restore(struct kvm_vcpu *vcpu)
 	avic_handle_ldr_update(vcpu);
 }
 
-void svm_toggle_avic_for_irq_window(struct kvm_vcpu *vcpu, bool activate)
-{
-	if (!enable_apicv || !lapic_in_kernel(vcpu))
-		return;
-
-	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-	kvm_request_apicv_update(vcpu->kvm, activate,
-				 APICV_INHIBIT_REASON_IRQWIN);
-	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-}
-
 void svm_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
 {
 	return;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3923d383face..c8827de49c75 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2994,7 +2994,7 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
 	 * In this case AVIC was temporarily disabled for
 	 * requesting the IRQ window and we have to re-enable it.
 	 */
-	svm_toggle_avic_for_irq_window(vcpu, true);
+	kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_IRQWIN);
 
 	++vcpu->stat.irq_window_exits;
 	return 1;
@@ -3546,7 +3546,7 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
 		 * via AVIC. In such case, we need to temporarily disable AVIC,
 		 * and fallback to injecting IRQ via V_IRQ.
 		 */
-		svm_toggle_avic_for_irq_window(vcpu, false);
+		kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
 		svm_set_vintr(svm);
 	}
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bd41f2a32838..aae851762b59 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -524,7 +524,6 @@ int avic_ga_log_notifier(u32 ga_tag);
 void avic_vm_destroy(struct kvm *kvm);
 int avic_vm_init(struct kvm *kvm);
 void avic_init_vmcb(struct vcpu_svm *svm);
-void svm_toggle_avic_for_irq_window(struct kvm_vcpu *vcpu, bool activate);
 int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu);
 int avic_unaccelerated_access_interception(struct kvm_vcpu *vcpu);
 int avic_init_vcpu(struct vcpu_svm *svm);
-- 
2.26.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 11/12] KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC
  2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
                   ` (9 preceding siblings ...)
  2021-08-02 18:33 ` [PATCH v3 10/12] KVM: SVM: remove svm_toggle_avic_for_irq_window Maxim Levitsky
@ 2021-08-02 18:33 ` Maxim Levitsky
  2021-08-03  9:00   ` Paolo Bonzini
  2021-08-02 18:33 ` [PATCH v3 12/12] KVM: SVM: AVIC: drop unsupported AVIC base relocation code Maxim Levitsky
  11 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

Currently it is possible to have the following scenario:

1. AVIC is disabled by svm_refresh_apicv_exec_ctrl
2. svm_vcpu_blocking calls avic_vcpu_put which does nothing
3. svm_vcpu_unblocking enables the AVIC (due to KVM_REQ_APICV_UPDATE)
   and then calls avic_vcpu_load
4. warning is triggered in avic_vcpu_load since
   AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK was never cleared

While it is possible to just remove the warning, it seems to be more robust
to fully disable/enable AVIC in svm_refresh_apicv_exec_ctrl by calling the
avic_vcpu_load/avic_vcpu_put

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 43 ++++++++++++++++++++++-------------------
 arch/x86/kvm/svm/svm.c  |  8 ++++++--
 arch/x86/kvm/x86.c      |  9 ++++++++-
 3 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 1def54c26259..a39f7888b587 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -80,6 +80,28 @@ enum avic_ipi_failure_cause {
 	AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
 };
 
+
+static void __avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
+{
+	if (is_run)
+		avic_vcpu_load(vcpu, vcpu->cpu);
+	else
+		avic_vcpu_put(vcpu);
+}
+
+/*
+ * This function is called during VCPU halt/unhalt.
+ */
+static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->avic_is_running = is_run;
+
+	if (kvm_vcpu_apicv_active(vcpu))
+		__avic_set_running(vcpu, is_run);
+}
+
 /* Note:
  * This function is called from IOMMU driver to notify
  * SVM to schedule in a particular vCPU of a particular VM.
@@ -651,6 +673,7 @@ void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 	}
 	vmcb_mark_dirty(vmcb, VMCB_AVIC);
 
+	__avic_set_running(vcpu, activated);
 	svm_set_pi_irte_mode(vcpu, activated);
 }
 
@@ -940,9 +963,6 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	int h_physical_id = kvm_cpu_get_apicid(cpu);
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if (!kvm_vcpu_apicv_active(vcpu))
-		return;
-
 	/*
 	 * Since the host physical APIC id is 8 bits,
 	 * we can support host APIC ID upto 255.
@@ -970,9 +990,6 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
 	u64 entry;
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if (!kvm_vcpu_apicv_active(vcpu))
-		return;
-
 	entry = READ_ONCE(*(svm->avic_physical_id_cache));
 	if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
 		avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
@@ -981,20 +998,6 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
 	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
 }
 
-/*
- * This function is called during VCPU halt/unhalt.
- */
-static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	svm->avic_is_running = is_run;
-	if (is_run)
-		avic_vcpu_load(vcpu, vcpu->cpu);
-	else
-		avic_vcpu_put(vcpu);
-}
-
 void svm_vcpu_blocking(struct kvm_vcpu *vcpu)
 {
 	avic_set_running(vcpu, false);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c8827de49c75..61b5faba36cf 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1483,12 +1483,16 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		sd->current_vmcb = svm->vmcb;
 		indirect_branch_prediction_barrier();
 	}
-	avic_vcpu_load(vcpu, cpu);
+
+	if (kvm_vcpu_apicv_active(vcpu))
+		avic_vcpu_load(vcpu, cpu);
 }
 
 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 {
-	avic_vcpu_put(vcpu);
+	if (kvm_vcpu_apicv_active(vcpu))
+		avic_vcpu_put(vcpu);
+
 	svm_prepare_host_switch(vcpu);
 
 	++vcpu->stat.host_state_reload;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9d2e4594c4eb..67bea0809636 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9239,12 +9239,18 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
 
 void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 {
+	bool activate;
+
 	if (!lapic_in_kernel(vcpu))
 		return;
 
 	mutex_lock(&vcpu->kvm->arch.apicv_update_lock);
 
-	vcpu->arch.apicv_active = kvm_apicv_activated(vcpu->kvm);
+	activate = kvm_apicv_activated(vcpu->kvm);
+	if (vcpu->arch.apicv_active == activate)
+		goto out;
+
+	vcpu->arch.apicv_active = activate;
 	kvm_apic_update_apicv(vcpu);
 	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
 
@@ -9257,6 +9263,7 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	if (!vcpu->arch.apicv_active)
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
 
+out:
 	mutex_unlock(&vcpu->kvm->arch.apicv_update_lock);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
-- 
2.26.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 12/12] KVM: SVM: AVIC: drop unsupported AVIC base relocation code
  2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
                   ` (10 preceding siblings ...)
  2021-08-02 18:33 ` [PATCH v3 11/12] KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC Maxim Levitsky
@ 2021-08-02 18:33 ` Maxim Levitsky
  11 siblings, 0 replies; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-02 18:33 UTC (permalink / raw)
  To: kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin, Paolo Bonzini, Maxim Levitsky

APIC base relocation is not supported anyway and won't work
correctly so just drop the code that handles it and keep AVIC
MMIO bar at the default APIC base.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 2 ++
 arch/x86/kvm/svm/svm.c  | 7 -------
 arch/x86/kvm/svm/svm.h  | 6 ------
 3 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index a39f7888b587..d3e3b0bbcbab 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -219,6 +219,8 @@ void avic_init_vmcb(struct vcpu_svm *svm)
 	vmcb->control.avic_logical_id = lpa & AVIC_HPA_MASK;
 	vmcb->control.avic_physical_id = ppa & AVIC_HPA_MASK;
 	vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID_COUNT;
+	vmcb->control.avic_vapic_bar = APIC_DEFAULT_PHYS_BASE & VMCB_AVIC_APIC_BAR_MASK;
+
 	if (kvm_apicv_activated(svm->vcpu.kvm))
 		vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
 	else
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 61b5faba36cf..fab55f7d59ca 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1316,9 +1316,6 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	svm->virt_spec_ctrl = 0;
 
 	init_vmcb(vcpu);
-
-	if (kvm_vcpu_apicv_active(vcpu) && !init_event)
-		avic_update_vapic_bar(svm, APIC_DEFAULT_PHYS_BASE);
 }
 
 void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
@@ -2970,10 +2967,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		svm->msr_decfg = data;
 		break;
 	}
-	case MSR_IA32_APICBASE:
-		if (kvm_vcpu_apicv_active(vcpu))
-			avic_update_vapic_bar(to_svm(vcpu), data);
-		fallthrough;
 	default:
 		return kvm_set_msr_common(vcpu, msr);
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index aae851762b59..524d943f3efc 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -503,12 +503,6 @@ extern struct kvm_x86_nested_ops svm_nested_ops;
 
 #define VMCB_AVIC_APIC_BAR_MASK		0xFFFFFFFFFF000ULL
 
-static inline void avic_update_vapic_bar(struct vcpu_svm *svm, u64 data)
-{
-	svm->vmcb->control.avic_vapic_bar = data & VMCB_AVIC_APIC_BAR_MASK;
-	vmcb_mark_dirty(svm->vmcb, VMCB_AVIC);
-}
-
 static inline bool avic_vcpu_is_running(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-- 
2.26.3


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 01/12] Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"
  2021-08-02 18:33 ` [PATCH v3 01/12] Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock" Maxim Levitsky
@ 2021-08-03  8:05   ` Paolo Bonzini
  2021-08-03 15:11     ` Sean Christopherson
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03  8:05 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On 02/08/21 20:33, Maxim Levitsky wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> This together with the next patch will fix a future race between
> kvm_zap_gfn_range and the page fault handler, which will happen
> when AVIC memslot is going to be only partially disabled.
> 
> This is based on a patch suggested by Sean Christopherson:
> https://lkml.org/lkml/2021/7/22/1025

I'll also add a small note from the original message:

     The performance impact is minimal since kvm_zap_gfn_range is only called by
     users, update_mtrr() and kvm_post_set_cr0().  Both only use it if the guest
     has non-coherent DMA, in order to honor the guest's UC memtype.  MTRR and CD
     setup only happens at boot, and generally in an area where the page tables
     should be small (for CD) or should not include the affected GFNs at all
     (for MTRRs).

On top of this, I think the CD case (kvm_post_set_cr0) can be changed to use
kvm_mmu_zap_all_fast.

Paolo

> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/mmu/mmu.c     | 19 ++++++++-----------
>   arch/x86/kvm/mmu/tdp_mmu.c | 15 ++++-----------
>   arch/x86/kvm/mmu/tdp_mmu.h | 11 ++++-------
>   3 files changed, 16 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a8cdfd8d45c4..9d78cb1c0f35 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5638,8 +5638,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>   	int i;
>   	bool flush = false;
>   
> +	write_lock(&kvm->mmu_lock);
> +
>   	if (kvm_memslots_have_rmaps(kvm)) {
> -		write_lock(&kvm->mmu_lock);
>   		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>   			slots = __kvm_memslots(kvm, i);
>   			kvm_for_each_memslot(memslot, slots) {
> @@ -5659,22 +5660,18 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>   		}
>   		if (flush)
>   			kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
> -		write_unlock(&kvm->mmu_lock);
>   	}
>   
>   	if (is_tdp_mmu_enabled(kvm)) {
> -		flush = false;
> -
> -		read_lock(&kvm->mmu_lock);
>   		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
>   			flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start,
> -							  gfn_end, flush, true);
> -		if (flush)
> -			kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
> -							   gfn_end);
> -
> -		read_unlock(&kvm->mmu_lock);
> +							  gfn_end, flush);
>   	}
> +
> +	if (flush)
> +		kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
> +
> +	write_unlock(&kvm->mmu_lock);
>   }
>   
>   static bool slot_rmap_write_protect(struct kvm *kvm,
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 43f12f5d12c0..3e0222ce3f4e 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -777,21 +777,15 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>    * non-root pages mapping GFNs strictly within that range. Returns true if
>    * SPTEs have been cleared and a TLB flush is needed before releasing the
>    * MMU lock.
> - *
> - * If shared is true, this thread holds the MMU lock in read mode and must
> - * account for the possibility that other threads are modifying the paging
> - * structures concurrently. If shared is false, this thread should hold the
> - * MMU in write mode.
>    */
>   bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
> -				 gfn_t end, bool can_yield, bool flush,
> -				 bool shared)
> +				 gfn_t end, bool can_yield, bool flush)
>   {
>   	struct kvm_mmu_page *root;
>   
> -	for_each_tdp_mmu_root_yield_safe(kvm, root, as_id, shared)
> +	for_each_tdp_mmu_root_yield_safe(kvm, root, as_id, false)
>   		flush = zap_gfn_range(kvm, root, start, end, can_yield, flush,
> -				      shared);
> +				      false);
>   
>   	return flush;
>   }
> @@ -803,8 +797,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
>   	int i;
>   
>   	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
> -		flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, 0, max_gfn,
> -						  flush, false);
> +		flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, 0, max_gfn, flush);
>   
>   	if (flush)
>   		kvm_flush_remote_tlbs(kvm);
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index b224d126adf9..358f447d4012 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -20,14 +20,11 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
>   			  bool shared);
>   
>   bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
> -				 gfn_t end, bool can_yield, bool flush,
> -				 bool shared);
> +				 gfn_t end, bool can_yield, bool flush);
>   static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id,
> -					     gfn_t start, gfn_t end, bool flush,
> -					     bool shared)
> +					     gfn_t start, gfn_t end, bool flush)
>   {
> -	return __kvm_tdp_mmu_zap_gfn_range(kvm, as_id, start, end, true, flush,
> -					   shared);
> +	return __kvm_tdp_mmu_zap_gfn_range(kvm, as_id, start, end, true, flush);
>   }
>   static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>   {
> @@ -44,7 +41,7 @@ static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>   	 */
>   	lockdep_assert_held_write(&kvm->mmu_lock);
>   	return __kvm_tdp_mmu_zap_gfn_range(kvm, kvm_mmu_page_as_id(sp),
> -					   sp->gfn, end, false, false, false);
> +					   sp->gfn, end, false, false);
>   }
>   
>   void kvm_tdp_mmu_zap_all(struct kvm *kvm);
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 06/12] KVM: x86: don't disable APICv memslot when inhibited
  2021-08-02 18:33 ` [PATCH v3 06/12] KVM: x86: don't disable APICv memslot when inhibited Maxim Levitsky
@ 2021-08-03  8:44   ` Paolo Bonzini
  2021-08-09 18:51     ` Maxim Levitsky
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03  8:44 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

Reviewing this patch and the next one together.

On 02/08/21 20:33, Maxim Levitsky wrote:
> +static int avic_alloc_access_page(struct kvm *kvm)
>  {
>  	void __user *ret;
>  	int r = 0;
>  
>  	mutex_lock(&kvm->slots_lock);
> +
> +	if (kvm->arch.apic_access_memslot_enabled)
>  		goto out;

This variable is overloaded between "is access enabled" and "is the 
memslot allocated".  I think you should check 
kvm->arch.apicv_inhibit_reasons instead in kvm_faultin_pfn.


> +	if (!activate)
> +		kvm_zap_gfn_range(kvm, gpa_to_gfn(APIC_DEFAULT_PHYS_BASE),
> +				  gpa_to_gfn(APIC_DEFAULT_PHYS_BASE + PAGE_SIZE));
> +

Off by one, the last argument of kvm_zap_gfn_range is inclusive:
Also, checking "activate" is a bit ugly when we have "new" available as 
well.  Yes, they are the same if !!old != !!new, but we care about the 
global state, not the single bit.

Putting everything together, this could become something like

         trace_kvm_apicv_update_request(activate, bit);
         if (!!old != !!new) {
		/*
		 * Kick all CPUs out of guest mode.  When
		 * kvm_vcpu_update_apicv succeeds in taking
		 * apicv_update_lock, it will see the
		 * new apicv_inhibit_reasons that we set below.
		 */
	        kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);

	        if (new) {
	                unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
	                kvm_zap_gfn_range(kvm, gfn, gfn);
	        }
	}
         kvm->arch.apicv_inhibit_reasons = new;
         mutex_unlock(&kvm->arch.apicv_update_lock);

Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 08/12] KVM: SVM: add warning for mistmatch between AVIC state and AVIC access page state
  2021-08-02 18:33 ` [PATCH v3 08/12] KVM: SVM: add warning for mistmatch between AVIC state and AVIC access page state Maxim Levitsky
@ 2021-08-03  8:45   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03  8:45 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On 02/08/21 20:33, Maxim Levitsky wrote:
>   
> +	WARN_ON_ONCE(vcpu->kvm->arch.apic_access_memslot_enabled !=
> +		     kvm_vcpu_apicv_active(vcpu));

This should also check !vcpu->kvm->arch.apicv_inhibit_reasons instead of 
apic_access_memslot_enabled.

Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 09/12] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use
  2021-08-02 18:33 ` [PATCH v3 09/12] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use Maxim Levitsky
@ 2021-08-03  8:47   ` Paolo Bonzini
  2021-08-03  9:01   ` Paolo Bonzini
  2021-08-03  9:11   ` Paolo Bonzini
  2 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03  8:47 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On 02/08/21 20:33, Maxim Levitsky wrote:
> Maxim:
>     - always set HV_DEPRECATING_AEOI_RECOMMENDED in kvm_get_hv_cpuid,
>       since this feature can be used regardless of AVIC

I tend to agree here.  While AutoEOI does have a minor performance 
improvement, I suspect that APICv will be available in most cases where 
Windows guest performance matters to that point.

Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 11/12] KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC
  2021-08-02 18:33 ` [PATCH v3 11/12] KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC Maxim Levitsky
@ 2021-08-03  9:00   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03  9:00 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On 02/08/21 20:33, Maxim Levitsky wrote:
> @@ -651,6 +673,7 @@ void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>   	}
>   	vmcb_mark_dirty(vmcb, VMCB_AVIC);
>   
> +	__avic_set_running(vcpu, activated);
>   	svm_set_pi_irte_mode(vcpu, activated);
>   }
>   

I'd rather have calls to avic_vcpu_load/avic_vcpu_put directly inside 
the "if (activated)", and leaving avic_set_running to its current 
implementation.  That way you don't need __avic_set_running (which is a 
confusing name, because it does more than just setting the running bit).

>  void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
>  {
> +	bool activate;
> +
>  	if (!lapic_in_kernel(vcpu))
>  		return;
>  
>  	mutex_lock(&vcpu->kvm->arch.apicv_update_lock);
>  
> -	vcpu->arch.apicv_active = kvm_apicv_activated(vcpu->kvm);
> +	activate = kvm_apicv_activated(vcpu->kvm);
> +	if (vcpu->arch.apicv_active == activate)
> +		goto out;
> +
> +	vcpu->arch.apicv_active = activate;
>  	kvm_apic_update_apicv(vcpu);
>  	static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
>  
> @@ -9257,6 +9263,7 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
>  	if (!vcpu->arch.apicv_active)
>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
>  
> +out:
>  	mutex_unlock(&vcpu->kvm->arch.apicv_update_lock);
>  }
>  EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);

Should this be a separate patch?

As an aside, we have

static inline bool kvm_vcpu_apicv_active(struct kvm_vcpu *vcpu)
{
         return vcpu->arch.apic && vcpu->arch.apicv_active;
}

but really vcpu->arch.apicv_active should never be true if 
vcpu->arch.apic is.  So it should be possible to change this to "return 
vcpu->arch.apicv_active" with a comment that the serialization between 
apicv_inhibit_reasons and apicv_active happens via apicv_update_lock.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 02/12] KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range
  2021-08-02 18:33 ` [PATCH v3 02/12] KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range Maxim Levitsky
@ 2021-08-03  9:00   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03  9:00 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On 02/08/21 20:33, Maxim Levitsky wrote:
> This together with previous patch, ensures that
> kvm_zap_gfn_range doesn't race with page fault
> running on another vcpu, and will make this page fault code
> retry instead.
> 
> This is based on a patch suggested by Sean Christopherson:
> https://lkml.org/lkml/2021/7/22/1025
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/mmu/mmu.c   | 4 ++++
>   include/linux/kvm_host.h | 5 +++++
>   virt/kvm/kvm_main.c      | 7 +++++--
>   3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 9d78cb1c0f35..9da635e383c2 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5640,6 +5640,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>   
>   	write_lock(&kvm->mmu_lock);
>   
> +	kvm_inc_notifier_count(kvm, gfn_start, gfn_end);
> +
>   	if (kvm_memslots_have_rmaps(kvm)) {
>   		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>   			slots = __kvm_memslots(kvm, i);
> @@ -5671,6 +5673,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>   	if (flush)
>   		kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
>   
> +	kvm_dec_notifier_count(kvm, gfn_start, gfn_end);
> +
>   	write_unlock(&kvm->mmu_lock);
>   }
>   
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9d6b4ad407b8..962e11a73e8e 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -985,6 +985,11 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
>   void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
>   #endif
>   
> +void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start,
> +				   unsigned long end);
> +void kvm_dec_notifier_count(struct kvm *kvm, unsigned long start,
> +				   unsigned long end);
> +
>   long kvm_arch_dev_ioctl(struct file *filp,
>   			unsigned int ioctl, unsigned long arg);
>   long kvm_arch_vcpu_ioctl(struct file *filp,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a96cbe24c688..71042cd807b3 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -608,7 +608,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
>   	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
>   }
>   
> -static void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start,
> +void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start,
>   				   unsigned long end)
>   {
>   	/*
> @@ -636,6 +636,7 @@ static void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start,
>   			max(kvm->mmu_notifier_range_end, end);
>   	}
>   }
> +EXPORT_SYMBOL_GPL(kvm_inc_notifier_count);
>   
>   static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>   					const struct mmu_notifier_range *range)
> @@ -670,7 +671,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>   	return 0;
>   }
>   
> -static void kvm_dec_notifier_count(struct kvm *kvm, unsigned long start,
> +void kvm_dec_notifier_count(struct kvm *kvm, unsigned long start,
>   				   unsigned long end)
>   {
>   	/*
> @@ -687,6 +688,8 @@ static void kvm_dec_notifier_count(struct kvm *kvm, unsigned long start,
>   	 */
>   	kvm->mmu_notifier_count--;
>   }
> +EXPORT_SYMBOL_GPL(kvm_dec_notifier_count);
> +
>   
>   static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
>   					const struct mmu_notifier_range *range)
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 03/12] KVM: x86/mmu: rename try_async_pf to kvm_faultin_pfn
  2021-08-02 18:33 ` [PATCH v3 03/12] KVM: x86/mmu: rename try_async_pf to kvm_faultin_pfn Maxim Levitsky
@ 2021-08-03  9:00   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03  9:00 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On 02/08/21 20:33, Maxim Levitsky wrote:
> try_async_pf is a wrong name for this function, since this code
> is used when asynchronous page fault is not enabled as well.
> 
> This code is based on a patch from Sean Christopherson:
> https://lkml.org/lkml/2021/7/19/2970
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/mmu/mmu.c         | 4 ++--
>   arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
>   2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 9da635e383c2..c5e0ecf5f758 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3842,7 +3842,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>   				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
>   }
>   
> -static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
> +static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
>   			 gpa_t cr2_or_gpa, kvm_pfn_t *pfn, hva_t *hva,
>   			 bool write, bool *writable)
>   {
> @@ -3912,7 +3912,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
>   	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>   	smp_rmb();
>   
> -	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, &hva,
> +	if (kvm_faultin_pfn(vcpu, prefault, gfn, gpa, &pfn, &hva,
>   			 write, &map_writable))
>   		return RET_PF_RETRY;
>   
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index ee044d357b5f..f349eae69bf3 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -881,7 +881,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
>   	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>   	smp_rmb();
>   
> -	if (try_async_pf(vcpu, prefault, walker.gfn, addr, &pfn, &hva,
> +	if (kvm_faultin_pfn(vcpu, prefault, walker.gfn, addr, &pfn, &hva,
>   			 write_fault, &map_writable))
>   		return RET_PF_RETRY;
>   
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 04/12] KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code
  2021-08-02 18:33 ` [PATCH v3 04/12] KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code Maxim Levitsky
@ 2021-08-03  9:00   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03  9:00 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On 02/08/21 20:33, Maxim Levitsky wrote:
> This will allow it to return RET_PF_EMULATE for APIC mmio
> emulation.
> 
> This code is based on a patch from Sean Christopherson:
> https://lkml.org/lkml/2021/7/19/2970
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/mmu/mmu.c         | 17 ++++++++++-------
>   arch/x86/kvm/mmu/paging_tmpl.h |  4 ++--
>   2 files changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c5e0ecf5f758..6f77f6efd43c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3844,7 +3844,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>   
>   static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
>   			 gpa_t cr2_or_gpa, kvm_pfn_t *pfn, hva_t *hva,
> -			 bool write, bool *writable)
> +			 bool write, bool *writable, int *r)
>   {
>   	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
>   	bool async;
> @@ -3855,7 +3855,7 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
>   	 * be zapped before KVM inserts a new MMIO SPTE for the gfn.
>   	 */
>   	if (slot && (slot->flags & KVM_MEMSLOT_INVALID))
> -		return true;
> +		goto out_retry;
>   
>   	/* Don't expose private memslots to L2. */
>   	if (is_guest_mode(vcpu) && !kvm_is_visible_memslot(slot)) {
> @@ -3875,14 +3875,17 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
>   		if (kvm_find_async_pf_gfn(vcpu, gfn)) {
>   			trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
>   			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
> -			return true;
> +			goto out_retry;
>   		} else if (kvm_arch_setup_async_pf(vcpu, cr2_or_gpa, gfn))
> -			return true;
> +			goto out_retry;
>   	}
>   
>   	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
>   				    write, writable, hva);
> -	return false;
> +
> +out_retry:
> +	*r = RET_PF_RETRY;
> +	return true;
>   }
>   
>   static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> @@ -3913,8 +3916,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
>   	smp_rmb();
>   
>   	if (kvm_faultin_pfn(vcpu, prefault, gfn, gpa, &pfn, &hva,
> -			 write, &map_writable))
> -		return RET_PF_RETRY;
> +			 write, &map_writable, &r))
> +		return r;
>   
>   	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
>   		return r;
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index f349eae69bf3..7d03e9b7ccfa 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -882,8 +882,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
>   	smp_rmb();
>   
>   	if (kvm_faultin_pfn(vcpu, prefault, walker.gfn, addr, &pfn, &hva,
> -			 write_fault, &map_writable))
> -		return RET_PF_RETRY;
> +			 write_fault, &map_writable, &r))
> +		return r;
>   
>   	if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r))
>   		return r;
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 09/12] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use
  2021-08-02 18:33 ` [PATCH v3 09/12] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use Maxim Levitsky
  2021-08-03  8:47   ` Paolo Bonzini
@ 2021-08-03  9:01   ` Paolo Bonzini
  2021-08-03  9:11   ` Paolo Bonzini
  2 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03  9:01 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On 02/08/21 20:33, Maxim Levitsky wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> APICV_INHIBIT_REASON_HYPERV is currently unconditionally forced upon
> SynIC activation as SynIC's AutoEOI is incompatible with APICv/AVIC. It is,
> however, possible to track whether the feature was actually used by the
> guest and only inhibit APICv/AVIC when needed.
> 
> TLFS suggests a dedicated 'HV_DEPRECATING_AEOI_RECOMMENDED' flag to let
> Windows know that AutoEOI feature should be avoided. While it's up to
> KVM userspace to set the flag, KVM can help a bit by exposing global
> APICv/AVIC enablement.
> 
> Maxim:
>     - always set HV_DEPRECATING_AEOI_RECOMMENDED in kvm_get_hv_cpuid,
>       since this feature can be used regardless of AVIC
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/include/asm/kvm_host.h |  3 +++
>   arch/x86/kvm/hyperv.c           | 27 +++++++++++++++++++++------
>   2 files changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 80323b5fb20a..55b1f79d9c43 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -988,6 +988,9 @@ struct kvm_hv {
>   	/* How many vCPUs have VP index != vCPU index */
>   	atomic_t num_mismatched_vp_indexes;
>   
> +	/* How many SynICs use 'AutoEOI' feature */
> +	atomic_t synic_auto_eoi_used;
> +
>   	struct hv_partition_assist_pg *hv_pa_pg;
>   	struct kvm_hv_syndbg hv_syndbg;
>   };
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index b07592ca92f0..638f3c559623 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -88,6 +88,10 @@ static bool synic_has_vector_auto_eoi(struct kvm_vcpu_hv_synic *synic,
>   static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
>   				int vector)
>   {
> +	struct kvm_vcpu *vcpu = hv_synic_to_vcpu(synic);
> +	struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
> +	int auto_eoi_old, auto_eoi_new;
> +
>   	if (vector < HV_SYNIC_FIRST_VALID_VECTOR)
>   		return;
>   
> @@ -96,10 +100,25 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
>   	else
>   		__clear_bit(vector, synic->vec_bitmap);
>   
> +	auto_eoi_old = bitmap_weight(synic->auto_eoi_bitmap, 256);
> +
>   	if (synic_has_vector_auto_eoi(synic, vector))
>   		__set_bit(vector, synic->auto_eoi_bitmap);
>   	else
>   		__clear_bit(vector, synic->auto_eoi_bitmap);
> +
> +	auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
> +
> +	/* Hyper-V SynIC auto EOI SINTs are not compatible with APICV */
> +	if (!auto_eoi_old && auto_eoi_new) {
> +		if (atomic_inc_return(&hv->synic_auto_eoi_used) == 1)
> +			kvm_request_apicv_update(vcpu->kvm, false,
> +						APICV_INHIBIT_REASON_HYPERV);
> +	} else if (!auto_eoi_new && auto_eoi_old) {
> +		if (atomic_dec_return(&hv->synic_auto_eoi_used) == 0)
> +			kvm_request_apicv_update(vcpu->kvm, true,
> +						APICV_INHIBIT_REASON_HYPERV);
> +	}
>   }
>   
>   static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
> @@ -933,12 +952,6 @@ int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages)
>   
>   	synic = to_hv_synic(vcpu);
>   
> -	/*
> -	 * Hyper-V SynIC auto EOI SINT's are
> -	 * not compatible with APICV, so request
> -	 * to deactivate APICV permanently.
> -	 */
> -	kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_HYPERV);
>   	synic->active = true;
>   	synic->dont_zero_synic_pages = dont_zero_synic_pages;
>   	synic->control = HV_SYNIC_CONTROL_ENABLE;
> @@ -2466,6 +2479,8 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>   				ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
>   			if (!cpu_smt_possible())
>   				ent->eax |= HV_X64_NO_NONARCH_CORESHARING;
> +
> +			ent->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
>   			/*
>   			 * Default number of spinlock retry attempts, matches
>   			 * HyperV 2016.
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 09/12] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use
  2021-08-02 18:33 ` [PATCH v3 09/12] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use Maxim Levitsky
  2021-08-03  8:47   ` Paolo Bonzini
  2021-08-03  9:01   ` Paolo Bonzini
@ 2021-08-03  9:11   ` Paolo Bonzini
  2 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03  9:11 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On 02/08/21 20:33, Maxim Levitsky wrote:
> +	if (!auto_eoi_old && auto_eoi_new) {
> +		if (atomic_inc_return(&hv->synic_auto_eoi_used) == 1)
> +			kvm_request_apicv_update(vcpu->kvm, false,
> +						APICV_INHIBIT_REASON_HYPERV);
> +	} else if (!auto_eoi_new && auto_eoi_old) {
> +		if (atomic_dec_return(&hv->synic_auto_eoi_used) == 0)
> +			kvm_request_apicv_update(vcpu->kvm, true,
> +						APICV_INHIBIT_REASON_HYPERV);

Hmm no, Reviewed-by rescinded.  This is racy, you cannot use atomics. 
The check for zero needs to happen within the lock.

The easiest way is to have a __kvm_request_apicv_update function that 
leaves it to the caller to take the lock.  Then synic_update_vector can do

	if (!!auto_eoi_old == !!auto_eoi_new)
		return;

	mutex_lock(&kvm->apicv_update_lock);
	bool was_active = hv->synic_auto_eoi_used;
	if (auto_eoi_new)
		hv->synic_auto_eoi_used++;
	else
		hv->synic_auto_eoi_used--;

	if (!!hv->synic_auto_eoi_used != !!was_active)
		__kvm_request_apicv_update(vcpu->kvm,
					   !!hv->synic_auto_eoi_used,
					   APICV_INHIBIT_REASON_HYPERV);
	mutex_unlock(&kvm->apicv_update_lock);

Please add a note to synic_auto_eoi_used that it is protected by 
apicv_update_lock.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 10/12] KVM: SVM: remove svm_toggle_avic_for_irq_window
  2021-08-02 18:33 ` [PATCH v3 10/12] KVM: SVM: remove svm_toggle_avic_for_irq_window Maxim Levitsky
@ 2021-08-03  9:11   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03  9:11 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On 02/08/21 20:33, Maxim Levitsky wrote:
> Now that kvm_request_apicv_update doesn't need to drop the kvm->srcu lock,
> we can call kvm_request_apicv_update directly.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/svm/avic.c | 11 -----------
>   arch/x86/kvm/svm/svm.c  |  4 ++--
>   arch/x86/kvm/svm/svm.h  |  1 -
>   3 files changed, 2 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index d0acbeeab3d6..1def54c26259 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -582,17 +582,6 @@ void avic_post_state_restore(struct kvm_vcpu *vcpu)
>   	avic_handle_ldr_update(vcpu);
>   }
>   
> -void svm_toggle_avic_for_irq_window(struct kvm_vcpu *vcpu, bool activate)
> -{
> -	if (!enable_apicv || !lapic_in_kernel(vcpu))
> -		return;
> -
> -	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> -	kvm_request_apicv_update(vcpu->kvm, activate,
> -				 APICV_INHIBIT_REASON_IRQWIN);
> -	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> -}
> -
>   void svm_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
>   {
>   	return;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3923d383face..c8827de49c75 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2994,7 +2994,7 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
>   	 * In this case AVIC was temporarily disabled for
>   	 * requesting the IRQ window and we have to re-enable it.
>   	 */
> -	svm_toggle_avic_for_irq_window(vcpu, true);
> +	kvm_request_apicv_update(vcpu->kvm, true, APICV_INHIBIT_REASON_IRQWIN);
>   
>   	++vcpu->stat.irq_window_exits;
>   	return 1;
> @@ -3546,7 +3546,7 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
>   		 * via AVIC. In such case, we need to temporarily disable AVIC,
>   		 * and fallback to injecting IRQ via V_IRQ.
>   		 */
> -		svm_toggle_avic_for_irq_window(vcpu, false);
> +		kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_IRQWIN);
>   		svm_set_vintr(svm);
>   	}
>   }
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index bd41f2a32838..aae851762b59 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -524,7 +524,6 @@ int avic_ga_log_notifier(u32 ga_tag);
>   void avic_vm_destroy(struct kvm *kvm);
>   int avic_vm_init(struct kvm *kvm);
>   void avic_init_vmcb(struct vcpu_svm *svm);
> -void svm_toggle_avic_for_irq_window(struct kvm_vcpu *vcpu, bool activate);
>   int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu);
>   int avic_unaccelerated_access_interception(struct kvm_vcpu *vcpu);
>   int avic_init_vcpu(struct vcpu_svm *svm);
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 05/12] KVM: x86/mmu: allow APICv memslot to be partially enabled
  2021-08-02 18:33 ` [PATCH v3 05/12] KVM: x86/mmu: allow APICv memslot to be partially enabled Maxim Levitsky
@ 2021-08-03  9:12   ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03  9:12 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On 02/08/21 20:33, Maxim Levitsky wrote:
> on AMD, APIC virtualization needs to dynamicaly inhibit the AVIC in a
> response to some events, and this is problematic and not efficient to do by
> enabling/disabling the memslot that covers APIC's mmio range.
> Plus due to SRCU locking, it makes it more complex to request AVIC inhibition.
> 
> Instead, the APIC memslot will be always enabled, but the MMU code
> will not install a SPTE for it, when arch.apic_access_memslot_enabled == false
> and instead jump straight to emulating the access.
> 
> When inhibiting the AVIC, this SPTE will be zapped.
> 
> This code is based on a suggestion from Sean Christopherson:
> https://lkml.org/lkml/2021/7/19/2970
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/mmu/mmu.c | 23 ++++++++++++++++++-----
>   1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 6f77f6efd43c..965b562da893 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3857,11 +3857,24 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
>   	if (slot && (slot->flags & KVM_MEMSLOT_INVALID))
>   		goto out_retry;
>   
> -	/* Don't expose private memslots to L2. */
> -	if (is_guest_mode(vcpu) && !kvm_is_visible_memslot(slot)) {
> -		*pfn = KVM_PFN_NOSLOT;
> -		*writable = false;
> -		return false;
> +	if (!kvm_is_visible_memslot(slot)) {
> +		/* Don't expose private memslots to L2. */
> +		if (is_guest_mode(vcpu)) {
> +			*pfn = KVM_PFN_NOSLOT;
> +			*writable = false;
> +			return false;
> +		}
> +		/*
> +		 * If the APIC access page exists but is disabled, go directly
> +		 * to emulation without caching the MMIO access or creating a
> +		 * MMIO SPTE.  That way the cache doesn't need to be purged
> +		 * when the AVIC is re-enabled.
> +		 */
> +		if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
> +		    !vcpu->kvm->arch.apic_access_memslot_enabled) {

In addition to using apicv_inhibit_reasons, I would change the subject 
to "allow APICv memslot to be enabled but invisible".  Otherwise looks good.

Paolo

> +			*r = RET_PF_EMULATE;
> +			return true;
> +		}
>   	}
>   
>   	async = false;
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 01/12] Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"
  2021-08-03  8:05   ` Paolo Bonzini
@ 2021-08-03 15:11     ` Sean Christopherson
  2021-08-03 17:29       ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Sean Christopherson @ 2021-08-03 15:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Maxim Levitsky, kvm, Wanpeng Li, Thomas Gleixner, Joerg Roedel,
	Borislav Petkov, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On Tue, Aug 03, 2021, Paolo Bonzini wrote:
> On 02/08/21 20:33, Maxim Levitsky wrote:
> > From: Sean Christopherson <seanjc@google.com>
> > 
> > This together with the next patch will fix a future race between
> > kvm_zap_gfn_range and the page fault handler, which will happen
> > when AVIC memslot is going to be only partially disabled.
> > 
> > This is based on a patch suggested by Sean Christopherson:
> > https://lkml.org/lkml/2021/7/22/1025
> 
> I'll also add a small note from the original message:
> 
>     The performance impact is minimal since kvm_zap_gfn_range is only called by
>     users, update_mtrr() and kvm_post_set_cr0().  Both only use it if the guest
>     has non-coherent DMA, in order to honor the guest's UC memtype.  MTRR and CD
>     setup only happens at boot, and generally in an area where the page tables
>     should be small (for CD) or should not include the affected GFNs at all
>     (for MTRRs).
> 
> On top of this, I think the CD case (kvm_post_set_cr0) can be changed to use
> kvm_mmu_zap_all_fast.

No, because fast zap requires kvm->slots_lock be held.  That could be relaxed by
reverting ca333add6933 ("KVM: x86/mmu: Explicitly track only a single invalid mmu
generation") and converting mmu_valid_gen to a u64 (to prevent wrap on 32-bit KVM).
IMO the extra memory cost, even though it's meager savings when using TDP without
nested, isn't worth relaxing the rules for fast zap.  Non-coherent DMA isn't very
common these days, and toggling CR0.CD is a rare guest operation (it'd probably
never happen if the darn architcture didn't set it on RESET).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 01/12] Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"
  2021-08-03 15:11     ` Sean Christopherson
@ 2021-08-03 17:29       ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2021-08-03 17:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, kvm, Wanpeng Li, Thomas Gleixner, Joerg Roedel,
	Borislav Petkov, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On 03/08/21 17:11, Sean Christopherson wrote:
>> On top of this, I think the CD case (kvm_post_set_cr0) can be changed to use
>> kvm_mmu_zap_all_fast.
> No, because fast zap requires kvm->slots_lock be held.

Yeah, that much I knew. :)  The issue is the usual ordering between 
slots_lock and SRCU.

Paolo

   That could be relaxed by
> reverting ca333add6933 ("KVM: x86/mmu: Explicitly track only a single invalid mmu
> generation") and converting mmu_valid_gen to a u64 (to prevent wrap on 32-bit KVM).
> IMO the extra memory cost, even though it's meager savings when using TDP without
> nested, isn't worth relaxing the rules for fast zap.  Non-coherent DMA isn't very
> common these days, and toggling CR0.CD is a rare guest operation (it'd probably
> never happen if the darn architcture didn't set it on RESET).
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 06/12] KVM: x86: don't disable APICv memslot when inhibited
  2021-08-03  8:44   ` Paolo Bonzini
@ 2021-08-09 18:51     ` Maxim Levitsky
  2021-08-09 19:14       ` Sean Christopherson
  0 siblings, 1 reply; 29+ messages in thread
From: Maxim Levitsky @ 2021-08-09 18:51 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: Wanpeng Li, Thomas Gleixner, Joerg Roedel, Borislav Petkov,
	Sean Christopherson, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On Tue, 2021-08-03 at 10:44 +0200, Paolo Bonzini wrote:
> Reviewing this patch and the next one together.
> 
> On 02/08/21 20:33, Maxim Levitsky wrote:
> > +static int avic_alloc_access_page(struct kvm *kvm)
> >  {
> >  	void __user *ret;
> >  	int r = 0;
> >  
> >  	mutex_lock(&kvm->slots_lock);
> > +
> > +	if (kvm->arch.apic_access_memslot_enabled)
> >  		goto out;
> 
> This variable is overloaded between "is access enabled" and "is the 
> memslot allocated".  I think you should check 
> kvm->arch.apicv_inhibit_reasons instead in kvm_faultin_pfn.
> 
> 
> > +	if (!activate)
> > +		kvm_zap_gfn_range(kvm, gpa_to_gfn(APIC_DEFAULT_PHYS_BASE),
> > +				  gpa_to_gfn(APIC_DEFAULT_PHYS_BASE + PAGE_SIZE));
> > +
> 
> Off by one, the last argument of kvm_zap_gfn_range is inclusive:

Actually is it? 

There are 3 uses of this function.
Two of them (kvm_post_set_cr0 and one case in update_mtrr) use 0,~0ULL which is indeed inclusive,
but for variable mtrrs I see that in var_mtrr_range this code:

*end = (*start | ~mask) + 1;

and the *end is passed to kvm_zap_gfn_range.


Another thing I noticed that I added calls to kvm_inc_notifier_count/kvm_dec_notifier_count
in the kvm_zap_gfn_range but these do seem to have non inclusive ends, thus 
I need to fix them sadly if this is the case.
This depends on mmu_notifier_ops and it is not documented well.

However at least mmu_notifier_retry_hva, does assume a non inclusive range since it checks


hva >= kvm->mmu_notifier_range_start &&
	    hva < kvm->mmu_notifier_range_end


Also looking at the algorithm of the kvm_zap_gfn_range.
Suppose that gfn_start == gfn_end and we have a memslot with one page at gfn_start

Then:


start = max(gfn_start, memslot->base_gfn); // start = memslot->base_gfn
end = min(gfn_end, memslot->base_gfn + memslot->npages); // end = memslot->base_gfn

if (start >= end)
	continue;

In this case it seems that it will do nothing. So I suspect that kvm_zap_gfn_range
actually needs non inclusive range but due to the facts that it was used much
it didn't cause trouble.


Another thing I found in kvm_zap_gfn_range:

kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);

But kvm_flush_remote_tlbs_with_address expects (struct kvm *kvm, u64 start_gfn, u64 pages)

kvm_flush_remote_tlbs_with_address is also for some reason called twice with the same parameters.

Could you help with that? Am I missing something?

Thanks in advance,
Best regards,
	Maxim Levitsky




> Also, checking "activate" is a bit ugly when we have "new" available as 
> well.  Yes, they are the same if !!old != !!new, but we care about the 
> global state, not the single bit.
> 
> Putting everything together, this could become something like
> 
>          trace_kvm_apicv_update_request(activate, bit);
>          if (!!old != !!new) {
> 		/*
> 		 * Kick all CPUs out of guest mode.  When
> 		 * kvm_vcpu_update_apicv succeeds in taking
> 		 * apicv_update_lock, it will see the
> 		 * new apicv_inhibit_reasons that we set below.
> 		 */
> 	        kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
> 
> 	        if (new) {
> 	                unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
> 	                kvm_zap_gfn_range(kvm, gfn, gfn);
> 	        }
> 	}
>          kvm->arch.apicv_inhibit_reasons = new;
>          mutex_unlock(&kvm->arch.apicv_update_lock);
> 
> Paolo
> 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 06/12] KVM: x86: don't disable APICv memslot when inhibited
  2021-08-09 18:51     ` Maxim Levitsky
@ 2021-08-09 19:14       ` Sean Christopherson
  0 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2021-08-09 19:14 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, kvm, Wanpeng Li, Thomas Gleixner, Joerg Roedel,
	Borislav Petkov, Jim Mattson,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Suravee Suthikulpanit, Vitaly Kuznetsov, Ingo Molnar,
	H. Peter Anvin

On Mon, Aug 09, 2021, Maxim Levitsky wrote:
> On Tue, 2021-08-03 at 10:44 +0200, Paolo Bonzini wrote:
> > Reviewing this patch and the next one together.
> > 
> > On 02/08/21 20:33, Maxim Levitsky wrote:
> > > +static int avic_alloc_access_page(struct kvm *kvm)
> > >  {
> > >  	void __user *ret;
> > >  	int r = 0;
> > >  
> > >  	mutex_lock(&kvm->slots_lock);
> > > +
> > > +	if (kvm->arch.apic_access_memslot_enabled)
> > >  		goto out;
> > 
> > This variable is overloaded between "is access enabled" and "is the 
> > memslot allocated".  I think you should check 
> > kvm->arch.apicv_inhibit_reasons instead in kvm_faultin_pfn.
> > 
> > 
> > > +	if (!activate)
> > > +		kvm_zap_gfn_range(kvm, gpa_to_gfn(APIC_DEFAULT_PHYS_BASE),
> > > +				  gpa_to_gfn(APIC_DEFAULT_PHYS_BASE + PAGE_SIZE));
> > > +
> > 
> > Off by one, the last argument of kvm_zap_gfn_range is inclusive:
> 
> Actually is it? 

Nope.  The actual implementation is exclusive for both legacy and TDP MMU.  And
as you covered below, the fixed and variable MTRR helpers provide exclusive
start+end, so there's no functional bug.  The "0 - ~0" use case is irrevelant
because there can't be physical memory at -4096.

The ~0ull case can be fixed by adding a helper to get the max GFN possible, e.g.
steal this code from kvm_tdp_mmu_put_root()

	gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);

and maybe add a comment saying it intentionally ignores guest.MAXPHYADDR (from
CPUID) so that the helper can be used even when CPUID is being modified.

> There are 3 uses of this function.
> Two of them (kvm_post_set_cr0 and one case in update_mtrr) use 0,~0ULL which is indeed inclusive,
> but for variable mtrrs I see that in var_mtrr_range this code:
> 
> *end = (*start | ~mask) + 1;
> 
> and the *end is passed to kvm_zap_gfn_range.
> 
> 
> Another thing I noticed that I added calls to kvm_inc_notifier_count/kvm_dec_notifier_count
> in the kvm_zap_gfn_range but these do seem to have non inclusive ends, thus 
> I need to fix them sadly if this is the case.
> This depends on mmu_notifier_ops and it is not documented well.
> 
> However at least mmu_notifier_retry_hva, does assume a non inclusive range since it checks
> 
> 
> hva >= kvm->mmu_notifier_range_start &&
> 	    hva < kvm->mmu_notifier_range_end
> 
> 
> Also looking at the algorithm of the kvm_zap_gfn_range.
> Suppose that gfn_start == gfn_end and we have a memslot with one page at gfn_start
> 
> Then:
> 
> 
> start = max(gfn_start, memslot->base_gfn); // start = memslot->base_gfn
> end = min(gfn_end, memslot->base_gfn + memslot->npages); // end = memslot->base_gfn
> 
> if (start >= end)
> 	continue;
> 
> In this case it seems that it will do nothing. So I suspect that kvm_zap_gfn_range
> actually needs non inclusive range but due to the facts that it was used much
> it didn't cause trouble.
>
> Another thing I found in kvm_zap_gfn_range:
> 
> kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
> 
> But kvm_flush_remote_tlbs_with_address expects (struct kvm *kvm, u64 start_gfn, u64 pages)

Heh, surpise, surprise, a rare path with no architecturally visible effects is
busted :-)

> kvm_flush_remote_tlbs_with_address is also for some reason called twice with
> the same parameters.

It's called twice in the current code because mmu_lock is dropped between handling
the current MMU and the legacy mmu.

> Could you help with that? Am I missing something?


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2021-08-09 19:14 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-02 18:33 [PATCH v3 00/12] My AVIC patch queue Maxim Levitsky
2021-08-02 18:33 ` [PATCH v3 01/12] Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock" Maxim Levitsky
2021-08-03  8:05   ` Paolo Bonzini
2021-08-03 15:11     ` Sean Christopherson
2021-08-03 17:29       ` Paolo Bonzini
2021-08-02 18:33 ` [PATCH v3 02/12] KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range Maxim Levitsky
2021-08-03  9:00   ` Paolo Bonzini
2021-08-02 18:33 ` [PATCH v3 03/12] KVM: x86/mmu: rename try_async_pf to kvm_faultin_pfn Maxim Levitsky
2021-08-03  9:00   ` Paolo Bonzini
2021-08-02 18:33 ` [PATCH v3 04/12] KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code Maxim Levitsky
2021-08-03  9:00   ` Paolo Bonzini
2021-08-02 18:33 ` [PATCH v3 05/12] KVM: x86/mmu: allow APICv memslot to be partially enabled Maxim Levitsky
2021-08-03  9:12   ` Paolo Bonzini
2021-08-02 18:33 ` [PATCH v3 06/12] KVM: x86: don't disable APICv memslot when inhibited Maxim Levitsky
2021-08-03  8:44   ` Paolo Bonzini
2021-08-09 18:51     ` Maxim Levitsky
2021-08-09 19:14       ` Sean Christopherson
2021-08-02 18:33 ` [PATCH v3 07/12] KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM Maxim Levitsky
2021-08-02 18:33 ` [PATCH v3 08/12] KVM: SVM: add warning for mistmatch between AVIC state and AVIC access page state Maxim Levitsky
2021-08-03  8:45   ` Paolo Bonzini
2021-08-02 18:33 ` [PATCH v3 09/12] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use Maxim Levitsky
2021-08-03  8:47   ` Paolo Bonzini
2021-08-03  9:01   ` Paolo Bonzini
2021-08-03  9:11   ` Paolo Bonzini
2021-08-02 18:33 ` [PATCH v3 10/12] KVM: SVM: remove svm_toggle_avic_for_irq_window Maxim Levitsky
2021-08-03  9:11   ` Paolo Bonzini
2021-08-02 18:33 ` [PATCH v3 11/12] KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC Maxim Levitsky
2021-08-03  9:00   ` Paolo Bonzini
2021-08-02 18:33 ` [PATCH v3 12/12] KVM: SVM: AVIC: drop unsupported AVIC base relocation code Maxim Levitsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).