* [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap
@ 2023-07-14  6:46 Yan Zhao
  2023-07-14  6:50 ` [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs Yan Zhao
                   ` (13 more replies)
  0 siblings, 14 replies; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:46 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

This series refines the mmu zaps caused by EPT memory type updates when
guest MTRRs are honored.

Patches 1-5 introduce and use helper functions to check whether KVM TDP
honors guest MTRRs, so that TDP zaps and page fault max_level reduction
are only applied to TDPs that honor guest MTRRs.

- The 5th patch triggers zapping of TDP leaf entries when the non-coherent
  DMA device count goes from 0 to 1 or from 1 to 0.

Patches 6-7 are fixes, and patches 9-12 are optimizations, for the mmu zaps
performed when guest MTRRs are honored.
Those mmu zaps are intended to remove stale memtypes of TDP entries
caused by changes to guest MTRRs and CR0.CD, and they are usually triggered
from all vCPUs in bursts.

- The 6th patch moves the TDP zap to CR0.CD toggles and to guest MTRR
  updates that occur while CR0.CD=0.

- The 7th-8th patches refine KVM_X86_QUIRK_CD_NW_CLEARED by removing the
  IPAT bit from the EPT memtype when CR0.CD=1 and guest MTRRs are honored.

- The 9th-11th patches optimize the mmu zap when guest MTRRs are honored
  by serializing vCPUs' gfn zap requests and by calculating precise,
  fine-grained ranges to zap (a condensed sketch of this flow follows this
  list).
  They are placed in mtrr.c because the optimizations only apply when
  guest MTRRs are honored and because computing fine-grained ranges
  requires reading guest MTRRs.
  Calls to kvm_unmap_gfn_range() are not included in the optimization,
  because they are not triggered from all vCPUs in bursts and not all of
  them are blockable. They usually happen at memslot removal and thus do
  not affect the mmu zaps when guest MTRRs are honored. Also, current
  performance data shows no observable difference in mmu zap cost when
  kvm_unmap_gfn_range() calls triggered by auto NUMA balancing are turned
  on or off.

- The 12th patch further converts kvm_zap_gfn_range() to use the shared
  mmu_lock in the TDP MMU. It visibly helps reduce contention cost as the
  number of vCPUs increases.
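
A condensed, illustrative sketch of the zap flow implemented by patches
9-12 (not the literal patch code; the actual implementation queues
struct mtrr_zap_range objects and keeps the list sorted for
de-duplication, so the helper signatures below are simplified):

	static void vcpu_request_mtrr_zap(struct kvm_vcpu *vcpu, gfn_t start, gfn_t end)
	{
		struct kvm *kvm = vcpu->kvm;

		/*
		 * Queue the range; duplicated pending ranges are dropped
		 * inside kvm_add_mtrr_zap_list() so repeated requests from
		 * other vCPUs do not cause repeated zaps.
		 */
		kvm_add_mtrr_zap_list(kvm, start, end);

		/* Only one vCPU performs the actual zap at a time. */
		if (atomic_cmpxchg_acquire(&kvm->arch.mtrr_zapping, 0, 1) == 0) {
			kvm_zap_mtrr_zap_list(kvm);
			atomic_set_release(&kvm->arch.mtrr_zapping, 0);
			return;
		}

		/* Losers wait until the winner drains the zap list. */
		while (atomic_read(&kvm->arch.mtrr_zapping))
			cpu_relax();
	}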

Reference performance data for the last 7 patches is shown below:

Base1: base code before patch 6
Base2: Base 1 + patches 6 + 7 + 8
       patch 6: move TDP zaps from guest MTRRs update to CR0.CD toggling
       patch 7: drop IPAT in memtype when CD=1 for
                KVM_X86_QUIRK_CD_NW_CLEARED
       patch 8: centralize code to get CD=1 memtype when guest MTRRs are
                honored 

patch 9:  serialize gfn zap
patch 10: fine-grained gfn zap 
patch 11: split and zap in-slot gfn ranges only **
patch 12: convert gfn zap to use shared mmu_lock
_________________________________________________________________________
   guest bios: OVMF        | 8 vCPUs  memory=16G  | 16 vCPUs memory=16G  |
 CPU frequency: 3100 MHz   | zap cycles | zap cnt | zap cycles | zap cnt |
---------------------------|----------------------|----------------------|
Base1                      |  3506.37M  |    84   | 17683.24M  |   164   |
Base2                      |  4241.46M  |    74   | 25944.80M  |   146   |*
Base2 + Patch 9            |   319.23M  |    74   |  6183.92M  |   146   |
Base2 + Patches 9+10       |   128.34M  |    74   |  1735.13M  |   146   |
Base2 + Patches 9+10+11    |    37.17M  |    74   |   357.68M  |   146   |
Base2 + Patches 9+10+11+12 |    17.21M  |    74   |    39.85M  |   146   |
---------------------------|----------------------|----------------------|
Base2 + Patch 12           |    32.66M  |    74   |    74.77M  |   146   |
Base2 + Patches 9+10+12    |    15.01M  |    74   |    35.46M  |   146   |
___________________________|______________________|______________________|

_________________________________________________________________________
   guest bios: Seabios     | 8 vCPUs  memory=16G  | 16 vCPUs memory=16G  |
 CPU frequency: 3100 MHz   | zap cycles | zap cnt | zap cycles | zap cnt |
---------------------------|----------------------|----------------------|
Base1                      |    44.55M  |    50   |   532.71M  |    98   |
Base2                      |   526.65M  |    42   |  2138.80M  |    82   |*
Base2 + Patch 9            |   116.59M  |    42   |   922.68M  |    82   |
Base2 + Patches 9+10       |    62.09M  |    42   |   377.15M  |    82   |
Base2 + Patches 9+10+11    |    17.86M  |    42   |    49.88M  |    82   |
Base2 + Patches 9+10+11+12 |    16.98M  |    42   |    44.64M  |    82   |
---------------------------|----------------------|----------------------|
Base2 + Patch 12           |    24.65M  |    42   |    62.04M  |    82   |
Base2 + Patches 9+10+12    |    18.44M  |    42   |    41.88M  |    82   |
___________________________|______________________|______________________|


* With Base2, the EPT zap count is reduced because some MTRR updates happen
  under CR0.CD=1.
  EPT zap cycles increase a bit (especially with Seabios) because
  concurrency is more intense when CR0.CD toggles than when guest MTRRs
  are updated.
  (Patches 7/8 have negligible performance impact.)

** Patch 11 splits a single gfn range that may cover out-of-slot ranges
   into several in-slot ranges and zaps only those in-slot ranges.
   It essentially reduces the number of contention checks and yields
   while mmu_lock is held for write.
   However, if mmu_lock is held for read (i.e. with patch 12), the
   benefit of reducing contention and yields from patch 11 is not that
   obvious, whereas patch 11 may introduce more kmalloc() calls for
   splitting.
   So, I intend to drop patch 11 if patch 12 is acceptable.


Changelog:
v3 --> v4:
1. Added patch 12 of converting gfn zap to use shared mmu_lock.
2. Updated commit messages of patch 3 and patch 4 to better describe
   patch intention. (Sean)
3. Updated commit message of patch 7 to explain the problem better.
   (Chao Gao, Xiaoyao Li, Yuan Yao)
4. Renamed kvm_mtrr_get_cd_memory_type() to
   kvm_honors_guest_mtrrs_get_cd_memtype() in patch 8.
5. Renamed kvm_zap_gfn_range_on_cd_toggle() to
   kvm_honors_guest_mtrrs_zap_on_cd_toggle() in patch 9.
6. Moved initialization of mtrr_zap_list_lock and mtrr_zap_list from
   kvm_vcpu_mtrr_init() to kvm_arch_init_vm(). (Sean)
7. Removed unnecessary kvm_clear_mtrr_zap_list() in patch 9 and moved it
   to patch 10. (Yuan Yao)
8. Added a table in comment for fine-grained zapping for
   kvm_honors_guest_mtrrs_zap_on_cd_toggle(). (Yuan Yao)

v2 --> v3:
1. Updated patch 1 about definition of honor guest MTRRs helper. (Sean)
2. Added patch 2 to use honor guest MTRRs helper in kvm_tdp_page_fault().
   (Sean)
3. Removed unnecessary calculation of MTRR ranges.
   (Chao Gao, Kai Huang, Sean)
4. Updated patches 3-5 to use the helper. (Chao Gao, Kai Huang, Sean)
5. Added patches 6,7 to reposition TDP zap and drop IPAT bit. (Sean)
6. Added patch 8 to prepare for patch 10's memtype calculation when
   CR0.CD=1.
7. Added patches 9-11 to speed up MTRR updates / CR0.CD toggles when guest
   MTRRs are honored. (Sean)
8. Dropped per-VM based MTRRs in v2 (Sean)

v1 --> v2:
1. Added a helper to skip the non-EPT case in patch 1
2. Added patch 2 to skip mmu zap when guest CR0.CD changes if EPT is not
   enabled. (Chao Gao)
3. Added patch 3 to skip mmu zap when guest MTRR changes if EPT is not
   enabled.
4. Do not mention TDX in patch 4 as the code is not merged yet (Chao Gao)
5. Added patches 5-6 to reduce EPT zap during guest bootup.

v3:
https://lore.kernel.org/all/20230616023101.7019-1-yan.y.zhao@intel.com/

v2:
https://lore.kernel.org/all/20230509134825.1523-1-yan.y.zhao@intel.com/

v1:
https://lore.kernel.org/all/20230508034700.7686-1-yan.y.zhao@intel.com/

Yan Zhao (12):
  KVM: x86/mmu: helpers to return if KVM honors guest MTRRs
  KVM: x86/mmu: Use KVM honors guest MTRRs helper in
    kvm_tdp_page_fault()
  KVM: x86/mmu: Use KVM honors guest MTRRs helper when CR0.CD toggles
  KVM: x86/mmu: Use KVM honors guest MTRRs helper when update mtrr
  KVM: x86/mmu: zap KVM TDP when noncoherent DMA assignment starts/stops
  KVM: x86/mmu: move TDP zaps from guest MTRRs update to CR0.CD toggling
  KVM: VMX: drop IPAT in memtype when CD=1 for
    KVM_X86_QUIRK_CD_NW_CLEARED
  KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are
    honored
  KVM: x86/mmu: serialize vCPUs to zap gfn when guest MTRRs are honored
  KVM: x86/mmu: fine-grained gfn zap when guest MTRRs are honored
  KVM: x86/mmu: split a single gfn zap range when guest MTRRs are
    honored
  KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in
    TDP MMU

 arch/x86/include/asm/kvm_host.h |   4 +
 arch/x86/kvm/mmu.h              |   7 +
 arch/x86/kvm/mmu/mmu.c          |  32 +++-
 arch/x86/kvm/mmu/tdp_mmu.c      |  50 +++++
 arch/x86/kvm/mmu/tdp_mmu.h      |   1 +
 arch/x86/kvm/mtrr.c             | 328 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.c          |  11 +-
 arch/x86/kvm/x86.c              |  28 ++-
 arch/x86/kvm/x86.h              |   3 +
 9 files changed, 439 insertions(+), 25 deletions(-)

-- 
2.17.1



* [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
@ 2023-07-14  6:50 ` Yan Zhao
  2023-10-07  7:00   ` Like Xu
  2023-07-14  6:50 ` [PATCH v4 02/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper in kvm_tdp_page_fault() Yan Zhao
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:50 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

Add helpers to check if KVM honors guest MTRRs.
The inner helper, __kvm_mmu_honors_guest_mtrrs(), is also provided to
outside callers so they can check whether guest MTRRs were honored
before non-coherent DMA is stopped.
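
For illustration only (this snippet is not part of the patch), a caller
that is about to change the VM's non-coherent DMA state can pass the
to-be/previous state explicitly instead of relying on the current count,
e.g.:

	/* Illustrative sketch only, mirroring how patch 5 uses the helper. */
	static void example_noncoherent_dma_transition(struct kvm *kvm)
	{
		/*
		 * "true" indicates non-coherent DMA is/was involved around
		 * the 0 <-> 1 count transition, so a TDP zap may be needed.
		 */
		if (__kvm_mmu_honors_guest_mtrrs(kvm, true))
			kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
	}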

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mmu.h     |  7 +++++++
 arch/x86/kvm/mmu/mmu.c | 15 +++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..38bd449226f6 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -235,6 +235,13 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return -(u32)fault & errcode;
 }
 
+bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma);
+
+static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
+{
+	return __kvm_mmu_honors_guest_mtrrs(kvm, kvm_arch_has_noncoherent_dma(kvm));
+}
+
 void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1e5db621241f..b4f89f015c37 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4516,6 +4516,21 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 }
 #endif
 
+bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma)
+{
+	/*
+	 * If the TDP is enabled, the host MTRRs are ignored by TDP
+	 * (shadow_memtype_mask is non-zero), and the VM has non-coherent DMA
+	 * (DMA doesn't snoop CPU caches), KVM's ABI is to honor the memtype
+	 * from the guest's MTRRs so that guest accesses to memory that is
+	 * DMA'd aren't cached against the guest's wishes.
+	 *
+	 * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
+	 * e.g. KVM will force UC memtype for host MMIO.
+	 */
+	return vm_has_noncoherent_dma && tdp_enabled && shadow_memtype_mask;
+}
+
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	/*
-- 
2.17.1



* [PATCH v4 02/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper in kvm_tdp_page_fault()
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
  2023-07-14  6:50 ` [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs Yan Zhao
@ 2023-07-14  6:50 ` Yan Zhao
  2023-07-14  6:51 ` [PATCH v4 03/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when CR0.CD toggles Yan Zhao
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:50 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

Let kvm_tdp_page_fault() use the helper kvm_mmu_honors_guest_mtrrs() to
decide whether it needs to consult guest MTRRs to check GFN range
consistency.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b4f89f015c37..7f52bbe013b3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4536,16 +4536,9 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	/*
 	 * If the guest's MTRRs may be used to compute the "real" memtype,
 	 * restrict the mapping level to ensure KVM uses a consistent memtype
-	 * across the entire mapping.  If the host MTRRs are ignored by TDP
-	 * (shadow_memtype_mask is non-zero), and the VM has non-coherent DMA
-	 * (DMA doesn't snoop CPU caches), KVM's ABI is to honor the memtype
-	 * from the guest's MTRRs so that guest accesses to memory that is
-	 * DMA'd aren't cached against the guest's wishes.
-	 *
-	 * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
-	 * e.g. KVM will force UC memtype for host MMIO.
+	 * across the entire mapping.
 	 */
-	if (shadow_memtype_mask && kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
+	if (kvm_mmu_honors_guest_mtrrs(vcpu->kvm)) {
 		for ( ; fault->max_level > PG_LEVEL_4K; --fault->max_level) {
 			int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
 			gfn_t base = gfn_round_for_level(fault->gfn,
-- 
2.17.1



* [PATCH v4 03/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when CR0.CD toggles
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
  2023-07-14  6:50 ` [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs Yan Zhao
  2023-07-14  6:50 ` [PATCH v4 02/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper in kvm_tdp_page_fault() Yan Zhao
@ 2023-07-14  6:51 ` Yan Zhao
  2023-07-14  6:51 ` [PATCH v4 04/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when update mtrr Yan Zhao
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:51 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

Zap SPTEs when CR0.CD is toggled if and only if KVM's MMU is honoring
guest MTRRs, which is the only time that KVM incorporates the guest's
CR0.CD into the final memtype.

Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9e7186864542..6693daeb5686 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -942,7 +942,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
 		kvm_mmu_reset_context(vcpu);
 
 	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
-	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
+	    kvm_mmu_honors_guest_mtrrs(vcpu->kvm) &&
 	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
 		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
 }
-- 
2.17.1



* [PATCH v4 04/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when update mtrr
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
                   ` (2 preceding siblings ...)
  2023-07-14  6:51 ` [PATCH v4 03/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when CR0.CD toggles Yan Zhao
@ 2023-07-14  6:51 ` Yan Zhao
  2023-07-14  6:52 ` [PATCH v4 05/12] KVM: x86/mmu: zap KVM TDP when noncoherent DMA assignment starts/stops Yan Zhao
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:51 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

When guest MTRRs are updated, zap SPTEs and do the zap range calculation
if and only if KVM's MMU is honoring guest MTRRs, which is the only time
that KVM incorporates the guest's MTRR type into the final memtype.

Suggested-by: Chao Gao <chao.gao@intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Cc: Kai Huang <kai.huang@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mtrr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 3eb6e7f47e96..a67c28a56417 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -320,7 +320,7 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
 	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
 	gfn_t start, end;
 
-	if (!tdp_enabled || !kvm_arch_has_noncoherent_dma(vcpu->kvm))
+	if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
 		return;
 
 	if (!mtrr_is_enabled(mtrr_state) && msr != MSR_MTRRdefType)
-- 
2.17.1



* [PATCH v4 05/12] KVM: x86/mmu: zap KVM TDP when noncoherent DMA assignment starts/stops
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
                   ` (3 preceding siblings ...)
  2023-07-14  6:51 ` [PATCH v4 04/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when update mtrr Yan Zhao
@ 2023-07-14  6:52 ` Yan Zhao
  2023-07-14  6:52 ` [PATCH v4 06/12] KVM: x86/mmu: move TDP zaps from guest MTRRs update to CR0.CD toggling Yan Zhao
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:52 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

Zap KVM TDP when noncoherent DMA assignment starts (noncoherent DMA count
transitions from 0 to 1) or stops (noncoherent DMA count transitions
from 1 to 0). Before the zap, test if guest MTRRs are to be honored after
the assignment starts or were honored before the assignment stops.

When there's no noncoherent DMA device, EPT memory type is
((MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT)

When there are noncoherent DMA devices, the EPT memory type needs to honor
guest CR0.CD and MTRR settings.

So, if noncoherent DMA count transitions between 0 and 1, EPT leaf entries
need to be zapped to clear stale memory type.

This issue might be hidden when the device is statically assigned, since
VFIO adds/removes the MMIO regions of the noncoherent DMA device several
times during guest boot, and the current KVM MMU calls
kvm_mmu_zap_all_fast() on each memslot removal.

But if the device is hot-plugged, or if the guest has mmio_always_on for
the device, the MMIO regions may be added only once, and then there is no
path that zaps the EPT entries to clear the stale memory type.

Therefore, do the EPT zapping when noncoherent DMA assignment starts/stops
to ensure stale entries are cleaned away.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/x86.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6693daeb5686..ac9548efa76f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13164,15 +13164,31 @@ bool noinstr kvm_arch_has_assigned_device(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);
 
+static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm)
+{
+	/*
+	 * Non-coherent DMA assignment and de-assignment will affect
+	 * whether KVM honors guest MTRRs and cause changes in memtypes
+	 * in TDP.
+	 * So, specify the second parameter as true here to indicate
+	 * non-coherent DMAs are/were involved and TDP zap might be
+	 * necessary.
+	 */
+	if (__kvm_mmu_honors_guest_mtrrs(kvm, true))
+		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
+}
+
 void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
 {
-	atomic_inc(&kvm->arch.noncoherent_dma_count);
+	if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1)
+		kvm_noncoherent_dma_assignment_start_or_stop(kvm);
 }
 EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);
 
 void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
 {
-	atomic_dec(&kvm->arch.noncoherent_dma_count);
+	if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count))
+		kvm_noncoherent_dma_assignment_start_or_stop(kvm);
 }
 EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);
 
-- 
2.17.1



* [PATCH v4 06/12] KVM: x86/mmu: move TDP zaps from guest MTRRs update to CR0.CD toggling
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
                   ` (4 preceding siblings ...)
  2023-07-14  6:52 ` [PATCH v4 05/12] KVM: x86/mmu: zap KVM TDP when noncoherent DMA assignment starts/stops Yan Zhao
@ 2023-07-14  6:52 ` Yan Zhao
  2023-07-14  6:53 ` [PATCH v4 07/12] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED Yan Zhao
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:52 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

If guest MTRRs are honored, always zap TDP when CR0.CD toggles, and don't
zap when guest MTRRs are updated under CR0.CD=1.

This is because CR0.CD=1 takes precedence over guest MTRRs in deciding TDP
memory types, so TDP memtypes do not change when guest MTRRs are updated
under CR0.CD=1.

Conversely, always do the TDP zapping when CR0.CD toggles, because even
with the quirk KVM_X86_QUIRK_CD_NW_CLEARED, TDP memory types may change
after guest CR0.CD toggles.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mtrr.c | 3 +++
 arch/x86/kvm/x86.c  | 3 +--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index a67c28a56417..3ce58734ad22 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -323,6 +323,9 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
 	if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
 		return;
 
+	if (kvm_is_cr0_bit_set(vcpu, X86_CR0_CD))
+		return;
+
 	if (!mtrr_is_enabled(mtrr_state) && msr != MSR_MTRRdefType)
 		return;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac9548efa76f..32cc8bfaa5f1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -942,8 +942,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
 		kvm_mmu_reset_context(vcpu);
 
 	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
-	    kvm_mmu_honors_guest_mtrrs(vcpu->kvm) &&
-	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
+	    kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
 		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
 }
 EXPORT_SYMBOL_GPL(kvm_post_set_cr0);
-- 
2.17.1



* [PATCH v4 07/12] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
                   ` (5 preceding siblings ...)
  2023-07-14  6:52 ` [PATCH v4 06/12] KVM: x86/mmu: move TDP zaps from guest MTRRs update to CR0.CD toggling Yan Zhao
@ 2023-07-14  6:53 ` Yan Zhao
  2023-08-25 21:43   ` Sean Christopherson
  2023-07-14  6:53 ` [PATCH v4 08/12] KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are honored Yan Zhao
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:53 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

When KVM_X86_QUIRK_CD_NW_CLEARED is on, remove the IPAT (ignore PAT) bit
from the EPT memory type when the cache is disabled and non-coherent DMA
is present.

To correctly emulate CR0.CD=1, UC + IPAT is required as the memtype in EPT.
However, as with commit
fb279950ba02 ("KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED"), WB + IPAT is
now returned to work around a BIOS issue where guest MTRRs are enabled too
late. Without this workaround, an extremely slow guest boot-up is expected
during the pre-guest-MTRR-enabled period, due to UC being the effective
memory type for all guest memory.

Absent emulating CR0.CD=1 with UC, it makes no sense to set IPAT when KVM
is honoring the guest memtype.
Removing the IPAT bit in this patch allows the effective memory type to
honor guest PAT values as well, since WB is the weakest memtype. This
means that if a guest explicitly claims UC as the memtype in PAT, the
effective memory type is UC instead of the previous WB. If, for some
unknown reason, a guest hits a slow boot-up issue after the removal of
IPAT, the offending PAT in the guest should be fixed.
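
A minimal illustration (not part of the patch; it only assumes the
standard VMX EPT memtype definitions) of the CR0.CD=1 encoding under the
quirk before and after this change:

	/* Illustrative only. */
	static u8 cd1_quirk_memtype_sketch(bool after_this_patch)
	{
		if (after_this_patch)
			/* WB without IPAT: guest PAT is honored, e.g. PAT=UC -> effective UC. */
			return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;

		/* WB + IPAT: guest PAT is ignored, effective type is always WB. */
		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
	}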

Besides, this patch is also a preparation patch for later fine-grained gfn
zap when guest MTRRs are honored, because it allows zapping only non-WB
ranges when CR0.CD toggles.

BTW, returning the guest MTRR type as if CR0.CD=0 is also not preferred,
because it would still have to hardcode the MTRR type to WB during the
pre-guest-MTRR-enabled period to work around the slow guest boot-up issue
(the guest MTRR type when guest MTRRs are disabled is UC).
In addition, it would make the quirk unnecessarily more complex.

The removal of IPAT has also been verified to preserve normal boot-up time
on an old OVMF (commit c9e5618f84b0cb54a9ac2d7604f7b7e7859b45a7, dated
Apr 14 2015).

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ecf4be2c6af..c1e93678cea4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7548,8 +7548,6 @@ static int vmx_vm_init(struct kvm *kvm)
 
 static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
-	u8 cache;
-
 	/* We wanted to honor guest CD/MTRR/PAT, but doing so could result in
 	 * memory aliases with conflicting memory types and sometimes MCEs.
 	 * We have to be careful as to what are honored and when.
@@ -7576,11 +7574,10 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 
 	if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
 		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
-			cache = MTRR_TYPE_WRBACK;
+			return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
 		else
-			cache = MTRR_TYPE_UNCACHABLE;
-
-		return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
+			return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) |
+				VMX_EPT_IPAT_BIT;
 	}
 
 	return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
-- 
2.17.1



* [PATCH v4 08/12] KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are honored
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
                   ` (6 preceding siblings ...)
  2023-07-14  6:53 ` [PATCH v4 07/12] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED Yan Zhao
@ 2023-07-14  6:53 ` Yan Zhao
  2023-08-25 21:46   ` Sean Christopherson
  2023-07-14  6:54 ` [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn " Yan Zhao
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:53 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

Centralize the code that gets the cache-disabled memtype when guest MTRRs
are honored. If a TDP honors guest MTRRs, it is required to call the
provided API to get the memtype for CR0.CD=1.

This is the preparation patch for later implementation of fine-grained gfn
zap for CR0.CD toggles when guest MTRRs are honored.

No functional change intended.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mtrr.c    | 16 ++++++++++++++++
 arch/x86/kvm/vmx/vmx.c | 10 +++++-----
 arch/x86/kvm/x86.h     |  2 ++
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 3ce58734ad22..64c6daa659c8 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -721,3 +721,19 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
 
 	return type == mtrr_default_type(mtrr_state);
 }
+
+/*
+ * This routine is supposed to be called only when guest MTRRs are honored.
+ */
+void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
+					   u8 *type, bool *ipat)
+{
+	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)) {
+		*type = MTRR_TYPE_WRBACK;
+		*ipat = false;
+	} else {
+		*type = MTRR_TYPE_UNCACHABLE;
+		*ipat = true;
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_honors_guest_mtrrs_get_cd_memtype);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c1e93678cea4..7fec1ee23b54 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7573,11 +7573,11 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
 
 	if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
-		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
-			return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
-		else
-			return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) |
-				VMX_EPT_IPAT_BIT;
+		bool ipat;
+		u8 cache;
+
+		kvm_honors_guest_mtrrs_get_cd_memtype(vcpu, &cache, &ipat);
+		return cache << VMX_EPT_MT_EPTE_SHIFT | (ipat ? VMX_EPT_IPAT_BIT : 0);
 	}
 
 	return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 82e3dafc5453..e7733dc4dccc 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -313,6 +313,8 @@ int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
 bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
 					  int page_num);
+void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
+					   u8 *type, bool *ipat);
 bool kvm_vector_hashing_enabled(void);
 void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
 int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
-- 
2.17.1



* [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn when guest MTRRs are honored
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
                   ` (7 preceding siblings ...)
  2023-07-14  6:53 ` [PATCH v4 08/12] KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are honored Yan Zhao
@ 2023-07-14  6:54 ` Yan Zhao
  2023-08-25 22:47   ` Sean Christopherson
  2023-07-14  6:55 ` [PATCH v4 10/12] KVM: x86/mmu: fine-grained gfn zap " Yan Zhao
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:54 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

Serialize concurrent and repeated calls of kvm_zap_gfn_range() from every
vCPU for CR0.CD toggles and MTRR updates when guest MTRRs are honored.

During guest boot-up, if guest MTRRs are honored by TDP, TDP zaps are
triggered several times by each vCPU for CR0.CD toggles and MTRR updates.
This takes unexpectedly long CPU cycles because of contention on
kvm->mmu_lock.

Therefore, introduce a mtrr_zap_list to remove duplicated zaps and an
atomic mtrr_zapping to allow only one vCPU to do the real zap work at a
time.

Cc: Yuan Yao <yuan.yao@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/include/asm/kvm_host.h |   4 ++
 arch/x86/kvm/mtrr.c             | 122 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |   5 +-
 arch/x86/kvm/x86.h              |   1 +
 4 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 28bd38303d70..8da1517a1513 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1444,6 +1444,10 @@ struct kvm_arch {
 	 */
 #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
 	struct kvm_mmu_memory_cache split_desc_cache;
+
+	struct list_head mtrr_zap_list;
+	spinlock_t mtrr_zap_list_lock;
+	atomic_t mtrr_zapping;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 64c6daa659c8..996a274cee40 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -25,6 +25,8 @@
 #define IA32_MTRR_DEF_TYPE_FE		(1ULL << 10)
 #define IA32_MTRR_DEF_TYPE_TYPE_MASK	(0xff)
 
+static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu,
+				   gfn_t gfn_start, gfn_t gfn_end);
 static bool is_mtrr_base_msr(unsigned int msr)
 {
 	/* MTRR base MSRs use even numbers, masks use odd numbers. */
@@ -341,7 +343,7 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
 		var_mtrr_range(var_mtrr_msr_to_range(vcpu, msr), &start, &end);
 	}
 
-	kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
+	kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(start), gpa_to_gfn(end));
 }
 
 static bool var_mtrr_range_is_valid(struct kvm_mtrr_range *range)
@@ -737,3 +739,121 @@ void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
 	}
 }
 EXPORT_SYMBOL_GPL(kvm_honors_guest_mtrrs_get_cd_memtype);
+
+struct mtrr_zap_range {
+	gfn_t start;
+	/* end is exclusive */
+	gfn_t end;
+	struct list_head node;
+};
+
+/*
+ * Add @range into kvm->arch.mtrr_zap_list and sort the list in
+ * "length" ascending + "start" descending order, so that
+ * ranges consuming more zap cycles can be dequeued later and their
+ * chances of being found duplicated are increased.
+ */
+static void kvm_add_mtrr_zap_list(struct kvm *kvm, struct mtrr_zap_range *range)
+{
+	struct list_head *head = &kvm->arch.mtrr_zap_list;
+	u64 len = range->end - range->start;
+	struct mtrr_zap_range *cur, *n;
+	bool added = false;
+
+	spin_lock(&kvm->arch.mtrr_zap_list_lock);
+
+	if (list_empty(head)) {
+		list_add(&range->node, head);
+		spin_unlock(&kvm->arch.mtrr_zap_list_lock);
+		return;
+	}
+
+	list_for_each_entry_safe(cur, n, head, node) {
+		u64 cur_len = cur->end - cur->start;
+
+		if (len < cur_len)
+			break;
+
+		if (len > cur_len)
+			continue;
+
+		if (range->start > cur->start)
+			break;
+
+		if (range->start < cur->start)
+			continue;
+
+		/* equal len & start, no need to add */
+		added = true;
+		kfree(range);
+		break;
+	}
+
+	if (!added)
+		list_add_tail(&range->node, &cur->node);
+
+	spin_unlock(&kvm->arch.mtrr_zap_list_lock);
+}
+
+static void kvm_zap_mtrr_zap_list(struct kvm *kvm)
+{
+	struct list_head *head = &kvm->arch.mtrr_zap_list;
+	struct mtrr_zap_range *cur = NULL;
+
+	spin_lock(&kvm->arch.mtrr_zap_list_lock);
+
+	while (!list_empty(head)) {
+		u64 start, end;
+
+		cur = list_first_entry(head, typeof(*cur), node);
+		start = cur->start;
+		end = cur->end;
+		list_del(&cur->node);
+		kfree(cur);
+		spin_unlock(&kvm->arch.mtrr_zap_list_lock);
+
+		kvm_zap_gfn_range(kvm, start, end);
+
+		spin_lock(&kvm->arch.mtrr_zap_list_lock);
+	}
+
+	spin_unlock(&kvm->arch.mtrr_zap_list_lock);
+}
+
+static void kvm_zap_or_wait_mtrr_zap_list(struct kvm *kvm)
+{
+	if (atomic_cmpxchg_acquire(&kvm->arch.mtrr_zapping, 0, 1) == 0) {
+		kvm_zap_mtrr_zap_list(kvm);
+		atomic_set_release(&kvm->arch.mtrr_zapping, 0);
+		return;
+	}
+
+	while (atomic_read(&kvm->arch.mtrr_zapping))
+		cpu_relax();
+}
+
+static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu,
+				   gfn_t gfn_start, gfn_t gfn_end)
+{
+	struct mtrr_zap_range *range;
+
+	range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT);
+	if (!range)
+		goto fail;
+
+	range->start = gfn_start;
+	range->end = gfn_end;
+
+	kvm_add_mtrr_zap_list(vcpu->kvm, range);
+
+	kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm);
+	return;
+
+fail:
+	kvm_zap_gfn_range(vcpu->kvm, gfn_start, gfn_end);
+}
+
+void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu)
+{
+	return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 32cc8bfaa5f1..bb79154cf465 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -943,7 +943,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
 
 	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
 	    kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
-		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
+		kvm_honors_guest_mtrrs_zap_on_cd_toggle(vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_post_set_cr0);
 
@@ -12310,6 +12310,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.guest_can_read_msr_platform_info = true;
 	kvm->arch.enable_pmu = enable_pmu;
 
+	spin_lock_init(&kvm->arch.mtrr_zap_list_lock);
+	INIT_LIST_HEAD(&kvm->arch.mtrr_zap_list);
+
 #if IS_ENABLED(CONFIG_HYPERV)
 	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
 	kvm->arch.hv_root_tdp = INVALID_PAGE;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e7733dc4dccc..56d8755b2560 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -315,6 +315,7 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
 					  int page_num);
 void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
 					   u8 *type, bool *ipat);
+void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu);
 bool kvm_vector_hashing_enabled(void);
 void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
 int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
-- 
2.17.1



* [PATCH v4 10/12] KVM: x86/mmu: fine-grained gfn zap when guest MTRRs are honored
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
                   ` (8 preceding siblings ...)
  2023-07-14  6:54 ` [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn " Yan Zhao
@ 2023-07-14  6:55 ` Yan Zhao
  2023-08-25 23:13   ` Sean Christopherson
  2023-07-14  6:56 ` [PATCH v4 11/12] KVM: x86/mmu: split a single gfn zap range " Yan Zhao
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:55 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

When guest MTRRs are honored and CR0.CD toggles, rather than blindly
zapping everything, compute fine-grained ranges to zap according to guest
MTRRs.

Fine-grained, precise zap ranges reduce the traversal footprint during the
zap and increase the chances for concurrent vCPUs to find and skip
duplicated ranges to zap.

Opportunistically fix a typo in a nearby comment.

Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mtrr.c | 164 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 162 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 996a274cee40..9fdbdbf874a8 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -179,7 +179,7 @@ static struct fixed_mtrr_segment fixed_seg_table[] = {
 	{
 		.start = 0xc0000,
 		.end = 0x100000,
-		.range_shift = 12, /* 12K */
+		.range_shift = 12, /* 4K */
 		.range_start = 24,
 	}
 };
@@ -747,6 +747,19 @@ struct mtrr_zap_range {
 	struct list_head node;
 };
 
+static void kvm_clear_mtrr_zap_list(struct kvm *kvm)
+{
+	struct list_head *head = &kvm->arch.mtrr_zap_list;
+	struct mtrr_zap_range *tmp, *n;
+
+	spin_lock(&kvm->arch.mtrr_zap_list_lock);
+	list_for_each_entry_safe(tmp, n, head, node) {
+		list_del(&tmp->node);
+		kfree(tmp);
+	}
+	spin_unlock(&kvm->arch.mtrr_zap_list_lock);
+}
+
 /*
  * Add @range into kvm->arch.mtrr_zap_list and sort the list in
  * "length" ascending + "start" descending order, so that
@@ -795,6 +808,67 @@ static void kvm_add_mtrr_zap_list(struct kvm *kvm, struct mtrr_zap_range *range)
 	spin_unlock(&kvm->arch.mtrr_zap_list_lock);
 }
 
+/*
+ * Fixed ranges are only 256 pages in total.
+ * To balance the overhead of zapping multiple ranges against the chance
+ * of finding duplicated ranges, just add the fixed MTRR ranges as a
+ * whole to the mtrr zap list if the memory type of any of them is not
+ * the specified type.
+ */
+static int prepare_zaplist_fixed_mtrr_of_non_type(struct kvm_vcpu *vcpu, u8 type)
+{
+	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
+	struct mtrr_zap_range *range;
+	int index, seg_end;
+	u8 mem_type;
+
+	for (index = 0; index < KVM_NR_FIXED_MTRR_REGION; index++) {
+		mem_type = mtrr_state->fixed_ranges[index];
+
+		if (mem_type == type)
+			continue;
+
+		range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT);
+		if (!range)
+			return -ENOMEM;
+
+		seg_end = ARRAY_SIZE(fixed_seg_table) - 1;
+		range->start = gpa_to_gfn(fixed_seg_table[0].start);
+		range->end = gpa_to_gfn(fixed_seg_table[seg_end].end);
+		kvm_add_mtrr_zap_list(vcpu->kvm, range);
+		break;
+	}
+	return 0;
+}
+
+/*
+ * Add variable MTRR ranges to the mtrr zap list
+ * if their memory type does not equal @type.
+ */
+static int prepare_zaplist_var_mtrr_of_non_type(struct kvm_vcpu *vcpu, u8 type)
+{
+	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
+	struct mtrr_zap_range *range;
+	struct kvm_mtrr_range *tmp;
+	u8 mem_type;
+
+	list_for_each_entry(tmp, &mtrr_state->head, node) {
+		mem_type = tmp->base & 0xff;
+		if (mem_type == type)
+			continue;
+
+		range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT);
+		if (!range)
+			return -ENOMEM;
+
+		var_mtrr_range(tmp, &range->start, &range->end);
+		range->start = gpa_to_gfn(range->start);
+		range->end = gpa_to_gfn(range->end);
+		kvm_add_mtrr_zap_list(vcpu->kvm, range);
+	}
+	return 0;
+}
+
 static void kvm_zap_mtrr_zap_list(struct kvm *kvm)
 {
 	struct list_head *head = &kvm->arch.mtrr_zap_list;
@@ -853,7 +927,93 @@ static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu,
 	kvm_zap_gfn_range(vcpu->kvm, gfn_start, gfn_end);
 }
 
+/*
+ * Zap SPTEs when guest MTRRs are honored and CR0.CD toggles
+ * in a fine-grained way according to guest MTRRs.
+ * As guest MTRRs are per-vCPU, they are unchanged across this function.
+ *
+ * when CR0.CD=1, TDP memtype is WB or UC + IPAT;
+ * when CR0.CD=0, TDP memtype is determined by guest MTRRs.
+ *
+ * On CR0.CD toggles, as guest MTRRs remain unchanged,
+ * - if the old and new memtypes are equal, nothing needs to be done;
+ * - if the guest default MTRR type equals the memtype for CR0.CD=1,
+ *   only MTRR ranges of non-default memtype need to be zapped;
+ * - if the guest default MTRR type does not equal the memtype for CR0.CD=1,
+ *   everything is zapped because the memtypes for almost all guest memory
+ *   are outdated.
+ * _____________________________________________________________________
+ *| quirk on             | CD=1 to CD=0         | CD=0 to CD=1          |
+ *|                      | old memtype = WB     | new memtype = WB      |
+ *|----------------------|----------------------|-----------------------|
+ *| MTRR enabled         | new memtype =        | old memtype =         |
+ *|                      | guest MTRR type      | guest MTRR type       |
+ *|    ------------------|----------------------|-----------------------|
+ *|    | if default MTRR | zap non-WB guest     | zap non-WB guest      |
+ *|    | type == WB      | MTRR ranges          | MTRR ranges           |
+ *|    |-----------------|----------------------|-----------------------|
+ *|    | if default MTRR | zap all              | zap all               |
+ *|    | type != WB      | as almost all guest MTRR ranges are non-WB   |
+ *|----------------------|----------------------------------------------|
+ *| MTRR disabled        | new memtype = UC     | old memtype = UC      |
+ *| (w/ FEATURE_MTRR)    | zap all              | zap all               |
+ *|----------------------|----------------------|-----------------------|
+ *| MTRR disabled        | new memtype = WB     | old memtype = WB      |
+ *| (w/o FEATURE_MTRR)   | do nothing           | do nothing            |
+ *|______________________|______________________|_______________________|
+ *
+ * _____________________________________________________________________
+ *| quirk off     | CD=1 to CD=0             | CD=0 to CD=1             |
+ *|               | old memtype = UC + IPAT  | new memtype = UC + IPAT  |
+ *|---------------|--------------------------|--------------------------|
+ *| MTRR enabled  | new memtype = guest MTRR | old memtype = guest MTRR |
+ *|               | type (!= UC + IPAT)      | type (!= UC + IPAT)      |
+ *|               | zap all                  | zap all                  |
+ *|---------------|------------------------- |--------------------------|
+ *| MTRR disabled | new memtype = UC         | old memtype = UC         |
+ *| (w/           | (!= UC + IPAT)           | (!= UC + IPAT)           |
+ *| FEATURE_MTRR) | zap all                  | zap all                  |
+ *|---------------|--------------------------|--------------------------|
+ *| MTRR disabled | new memtype = WB         | old memtype = WB         |
+ *| (w/o          | (!= UC + IPAT)           | (!= UC + IPAT)           |
+ *| FEATURE_MTRR) | zap all                  | zap all                  |
+ *|_______________|__________________________|__________________________|
+ *
+ */
 void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu)
 {
-	return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
+	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
+	bool mtrr_enabled = mtrr_is_enabled(mtrr_state);
+	u8 default_mtrr_type;
+	bool cd_ipat;
+	u8 cd_type;
+
+	kvm_honors_guest_mtrrs_get_cd_memtype(vcpu, &cd_type, &cd_ipat);
+
+	default_mtrr_type = mtrr_enabled ? mtrr_default_type(mtrr_state) :
+			    mtrr_disabled_type(vcpu);
+
+	if (cd_type != default_mtrr_type || cd_ipat)
+		return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
+
+	/*
+	 * If MTRRs are not enabled, the code above already zaps everything
+	 * when the default type does not equal cd_type, and no zap is
+	 * needed when the default type equals cd_type.
+	 */
+	if (mtrr_enabled) {
+		if (prepare_zaplist_fixed_mtrr_of_non_type(vcpu, default_mtrr_type))
+			goto fail;
+
+		if (prepare_zaplist_var_mtrr_of_non_type(vcpu, default_mtrr_type))
+			goto fail;
+
+		kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm);
+	}
+	return;
+fail:
+	kvm_clear_mtrr_zap_list(vcpu->kvm);
+	/* Resort to zapping everything on failure. */
+	kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
+	return;
 }
-- 
2.17.1



* [PATCH v4 11/12] KVM: x86/mmu: split a single gfn zap range when guest MTRRs are honored
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
                   ` (9 preceding siblings ...)
  2023-07-14  6:55 ` [PATCH v4 10/12] KVM: x86/mmu: fine-grained gfn zap " Yan Zhao
@ 2023-07-14  6:56 ` Yan Zhao
  2023-08-25 23:15   ` Sean Christopherson
  2023-07-14  6:56 ` [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU Yan Zhao
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:56 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

Split a single gfn zap range (specifically the range [0, ~0UL)) into
smaller ranges according to the current memslot layout when guest MTRRs
are honored.

Though vCPUs have been serialized to perform kvm_zap_gfn_range() for MTRR
updates and CR0.CD toggles, the rescheduling cost caused by contention is
still huge when there are concurrent page faults holding mmu_lock for
read.

Splitting a single huge zap range according to the actual memslot layout
reduces unnecessary traversal and yielding cost in the TDP MMU.
It also increases the chances for larger ranges to find existing ranges to
zap in the zap list.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mtrr.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 9fdbdbf874a8..00e98dfc4b0d 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -909,21 +909,44 @@ static void kvm_zap_or_wait_mtrr_zap_list(struct kvm *kvm)
 static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu,
 				   gfn_t gfn_start, gfn_t gfn_end)
 {
+	int idx = srcu_read_lock(&vcpu->kvm->srcu);
+	const struct kvm_memory_slot *memslot;
 	struct mtrr_zap_range *range;
+	struct kvm_memslot_iter iter;
+	struct kvm_memslots *slots;
+	gfn_t start, end;
+	int i;
 
-	range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT);
-	if (!range)
-		goto fail;
-
-	range->start = gfn_start;
-	range->end = gfn_end;
-
-	kvm_add_mtrr_zap_list(vcpu->kvm, range);
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(vcpu->kvm, i);
+		kvm_for_each_memslot_in_gfn_range(&iter, slots, gfn_start, gfn_end) {
+			memslot = iter.slot;
+			start = max(gfn_start, memslot->base_gfn);
+			end = min(gfn_end, memslot->base_gfn + memslot->npages);
+			if (WARN_ON_ONCE(start >= end))
+				continue;
+
+			range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT);
+			if (!range)
+				goto fail;
+
+			range->start = start;
+			range->end = end;
+
+			/*
+			 * Redundant ranges in different address spaces will be
+			 * removed in kvm_add_mtrr_zap_list().
+			 */
+			kvm_add_mtrr_zap_list(vcpu->kvm, range);
+		}
+	}
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 
 	kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm);
 	return;
 
 fail:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	kvm_zap_gfn_range(vcpu->kvm, gfn_start, gfn_end);
 }
 
-- 
2.17.1



* [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
                   ` (10 preceding siblings ...)
  2023-07-14  6:56 ` [PATCH v4 11/12] KVM: x86/mmu: split a single gfn zap range " Yan Zhao
@ 2023-07-14  6:56 ` Yan Zhao
  2023-08-25 21:34   ` Sean Christopherson
  2023-08-25 23:17 ` [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Sean Christopherson
  2023-10-05  1:29 ` Sean Christopherson
  13 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2023-07-14  6:56 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: pbonzini, seanjc, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, Yan Zhao

Convert kvm_zap_gfn_range() from holding mmu_lock for write to holding it
for read in the TDP MMU, and allow zapping of non-leaf SPTEs of level <= 1G.
TLB flushes are executed/requested within tdp_mmu_zap_spte_atomic(),
guarded by the RCU lock.

GFN zap can be super slow if mmu_lock is held for write when there is
contention. In the worst cases, huge numbers of CPU cycles are spent
yielding GFN by GFN, i.e. the loop of "check and flush TLB -> drop RCU
lock -> drop mmu_lock -> cpu_relax() -> take mmu_lock -> take RCU lock"
is entered for every GFN.
Contention can come either from concurrent zaps holding mmu_lock for write
or from tdp_mmu_map() holding mmu_lock for read.

After converting to hold mmu_lock for read, less contention is detected
and retaking mmu_lock for read is also faster. There's no need to flush
the TLB before dropping mmu_lock on contention, as SPTEs have been zapped
atomically and TLB flushes are performed/requested immediately within the
RCU lock.
To reduce the TLB flush count, non-leaf SPTEs of up to 1G level are
allowed to be zapped if their ranges are fully covered by the gfn zap
range.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/kvm/mmu/mmu.c     | 14 +++++++----
 arch/x86/kvm/mmu/tdp_mmu.c | 50 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu/tdp_mmu.h |  1 +
 3 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7f52bbe013b3..1fa2a0a3fc9b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6310,15 +6310,19 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 
 	flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
 
+	if (flush)
+		kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start);
+
 	if (tdp_mmu_enabled) {
+		write_unlock(&kvm->mmu_lock);
+		read_lock(&kvm->mmu_lock);
+
 		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
-			flush = kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start,
-						      gfn_end, true, flush);
+			kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start, gfn_end);
+		read_unlock(&kvm->mmu_lock);
+		write_lock(&kvm->mmu_lock);
 	}
 
-	if (flush)
-		kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start);
-
 	kvm_mmu_invalidate_end(kvm, 0, -1ul);
 
 	write_unlock(&kvm->mmu_lock);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 512163d52194..2ad18275b643 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -888,6 +888,56 @@ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
 	return flush;
 }
 
+static void zap_gfn_range_atomic(struct kvm *kvm, struct kvm_mmu_page *root,
+				 gfn_t start, gfn_t end)
+{
+	struct tdp_iter iter;
+
+	end = min(end, tdp_mmu_max_gfn_exclusive());
+
+	lockdep_assert_held_read(&kvm->mmu_lock);
+
+	rcu_read_lock();
+
+	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) {
+retry:
+		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
+			continue;
+
+		if (!is_shadow_present_pte(iter.old_spte))
+			continue;
+
+		/*
+		 * As also documented in tdp_mmu_zap_root(),
+		 * KVM must be able to zap a 1gb shadow page without
+		 * inducing a stall to allow in-place replacement with a 1gb hugepage.
+		 */
+		if (iter.gfn < start ||
+		    iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end ||
+		    iter.level > KVM_MAX_HUGEPAGE_LEVEL)
+			continue;
+
+		/* Note, a successful atomic zap also does a remote TLB flush. */
+		if (tdp_mmu_zap_spte_atomic(kvm, &iter))
+			goto retry;
+	}
+
+	rcu_read_unlock();
+}
+
+/*
+ * Zap all SPTEs for the range of gfns, [start, end), for all roots,
+ * holding mmu_lock for read and zapping atomically.
+ * TLB flushes are performed within the RCU read lock.
+ */
+void kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start, gfn_t end)
+{
+	struct kvm_mmu_page *root;
+
+	for_each_valid_tdp_mmu_root_yield_safe(kvm, root, as_id, true)
+		zap_gfn_range_atomic(kvm, root, start, end);
+}
+
 void kvm_tdp_mmu_zap_all(struct kvm *kvm)
 {
 	struct kvm_mmu_page *root;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 0a63b1afabd3..90856bd7a2fd 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -22,6 +22,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
 
 bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start,
 				 gfn_t end, bool can_yield, bool flush);
+void kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start, gfn_t end);
 bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp);
 void kvm_tdp_mmu_zap_all(struct kvm *kvm);
 void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
-- 
2.17.1



* Re: [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU
  2023-07-14  6:56 ` [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU Yan Zhao
@ 2023-08-25 21:34   ` Sean Christopherson
  2023-09-04  7:31     ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2023-08-25 21:34 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Jul 14, 2023, Yan Zhao wrote:
> Convert kvm_zap_gfn_range() from holding mmu_lock for write to holding for
> read in TDP MMU and allow zapping of non-leaf SPTEs of level <= 1G.
> TLB flushes are executed/requested within tdp_mmu_zap_spte_atomic() guarded
> by RCU lock.
> 
> GFN zap can be super slow if mmu_lock is held for write when there are
> contentions. In worst cases, huge cpu cycles are spent on yielding GFN by
> GFN, i.e. the loop of "check and flush tlb -> drop rcu lock ->
> drop mmu_lock -> cpu_relax() -> take mmu_lock -> take rcu lock" are entered
> for every GFN.
> Contentions can either from concurrent zaps holding mmu_lock for write or
> from tdp_mmu_map() holding mmu_lock for read.

The lock contention should go away with a pre-check[*], correct?  That's a more
complete solution too, in that it also avoids lock contention for the shadow MMU,
which presumably suffers the same problem (I don't see anything that would prevent
it from yielding).

If we do want to zap with mmu_lock held for read, I think we should convert
kvm_tdp_mmu_zap_leafs() and all its callers to run under read, because unless I'm
missing something, the rules are the same regardless of _why_ KVM is zapping, e.g.
the zap needs to be protected by mmu_invalidate_in_progress, which ensures no other
tasks will race to install SPTEs that are supposed to be zapped.
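
For reference, a stripped-down sketch of the flow kvm_zap_gfn_range() uses in
this series (not new code, just the relevant ordering, with details omitted):

	write_lock(&kvm->mmu_lock);
	kvm_mmu_invalidate_begin(kvm, 0, -1ul);

	/*
	 * mmu_invalidate_in_progress is now elevated, so page faults that
	 * race with the zap will retry instead of installing stale SPTEs.
	 */
	write_unlock(&kvm->mmu_lock);
	read_lock(&kvm->mmu_lock);

	/* ... zap leaf SPTEs atomically under mmu_lock held for read ... */

	read_unlock(&kvm->mmu_lock);
	write_lock(&kvm->mmu_lock);

	kvm_mmu_invalidate_end(kvm, 0, -1ul);
	write_unlock(&kvm->mmu_lock);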

If you post a version of this patch that converts kvm_tdp_mmu_zap_leafs(), please
post it as a standalone patch.  At a glance it doesn't have any dependencies on the
MTRR changes, and I don't want this type of change buried at the end of a series
that is for a fairly niche setup.  This needs a lot of scrutiny to make sure zapping
under read really is safe.

[*] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com

> After converting to hold mmu_lock for read, there will be less contentions
> detected and retaking mmu_lock for read is also faster. There's no need to
> flush TLB before dropping mmu_lock when there're contentions as SPTEs have
> been zapped atomically and TLBs are flushed/flush requested immediately
> within RCU lock.
> In order to reduce TLB flush count, non-leaf SPTEs not greater than 1G
> level are allowed to be zapped if their ranges are fully covered in the
> gfn zap range.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 07/12] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED
  2023-07-14  6:53 ` [PATCH v4 07/12] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED Yan Zhao
@ 2023-08-25 21:43   ` Sean Christopherson
  2023-09-04  7:41     ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2023-08-25 21:43 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Jul 14, 2023, Yan Zhao wrote:
> For KVM_X86_QUIRK_CD_NW_CLEARED is on, remove the IPAT (ignore PAT) bit in
> EPT memory types when cache is disabled and non-coherent DMA are present.
> 
> To correctly emulate CR0.CD=1, UC + IPAT are required as memtype in EPT.
> However, as with commit
> fb279950ba02 ("KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED"), WB + IPAT are
> now returned to workaround a BIOS issue that guest MTRRs are enabled too
> late. Without this workaround, a super slow guest boot-up is expected
> during the pre-guest-MTRR-enabled period due to UC as the effective memory
> type for all guest memory.
> 
> Absent emulating CR0.CD=1 with UC, it makes no sense to set IPAT when KVM
> is honoring the guest memtype.
> Removing the IPAT bit in this patch allows effective memory type to honor
> PAT values as well, as WB is the weakest memtype. It means if a guest
> explicitly claims UC as the memtype in PAT, the effective memory is UC
> instead of previous WB. If, for some unknown reason, a guest meets a slow
> boot-up issue with the removal of IPAT, it's desired to fix the blamed PAT
> in the guest.
> 
> Besides, this patch is also a preparation patch for later fine-grained gfn
> zap when guest MTRRs are honored, because it allows zapping only non-WB
> ranges when CR0.CD toggles.
> 
> BTW, returning guest MTRR type as if CR0.CD=0 is also not preferred because
> it still has to hardcode the MTRR type to WB during the

Please use full names instead of pronouns, I found the "it still has to hardcode"
part really hard to grok.  I think this is what you're saying?

  BTW, returning guest MTRR type as if CR0.CD=0 is also not preferred because
  KVM's ABI for the quirk also requires KVM to force WB memtype regardless of
  guest MTRRs to workaround the slow guest boot-up issue.

> pre-guest-MTRR-enabled period to workaround the slow guest boot-up issue
> (guest MTRR type when guest MTRRs are disabled is UC).
> In addition, it will make the quirk unnecessarily complexer .

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 08/12] KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are honored
  2023-07-14  6:53 ` [PATCH v4 08/12] KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are honored Yan Zhao
@ 2023-08-25 21:46   ` Sean Christopherson
  2023-09-04  7:46     ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2023-08-25 21:46 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Jul 14, 2023, Yan Zhao wrote:
> Centralize the code to get cache disabled memtype when guest MTRRs are
> honored. If a TDP honors guest MTRRs, it is required to call the provided
> API to get the memtype for CR0.CD=1.
> 
> This is the preparation patch for later implementation of fine-grained gfn
> zap for CR0.CD toggles when guest MTRRs are honored.
> 
> No functional change intended.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  arch/x86/kvm/mtrr.c    | 16 ++++++++++++++++
>  arch/x86/kvm/vmx/vmx.c | 10 +++++-----
>  arch/x86/kvm/x86.h     |  2 ++
>  3 files changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> index 3ce58734ad22..64c6daa659c8 100644
> --- a/arch/x86/kvm/mtrr.c
> +++ b/arch/x86/kvm/mtrr.c
> @@ -721,3 +721,19 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
>  
>  	return type == mtrr_default_type(mtrr_state);
>  }
> +
> +/*
> + * this routine is supposed to be called when guest mtrrs are honored
> + */
> +void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
> +					   u8 *type, bool *ipat)

I really don't like this helper.  IMO it's a big net negative for the readability
of vmx_get_mt_mask().  As I said in the previous version, I agree that splitting
logic is generally undesirable, but in this case I strongly believe it's the
lesser evil.

> +{
> +	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)) {
> +		*type = MTRR_TYPE_WRBACK;
> +		*ipat = false;
> +	} else {
> +		*type = MTRR_TYPE_UNCACHABLE;
> +		*ipat = true;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(kvm_honors_guest_mtrrs_get_cd_memtype);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c1e93678cea4..7fec1ee23b54 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7573,11 +7573,11 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>  		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
>  
>  	if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
> -		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
> -			return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> -		else
> -			return (MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT) |
> -				VMX_EPT_IPAT_BIT;
> +		bool ipat;
> +		u8 cache;
> +
> +		kvm_honors_guest_mtrrs_get_cd_memtype(vcpu, &cache, &ipat);
> +		return cache << VMX_EPT_MT_EPTE_SHIFT | (ipat ? VMX_EPT_IPAT_BIT : 0);
>  	}
>  
>  	return kvm_mtrr_get_guest_memory_type(vcpu, gfn) << VMX_EPT_MT_EPTE_SHIFT;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 82e3dafc5453..e7733dc4dccc 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -313,6 +313,8 @@ int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
>  int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
>  bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
>  					  int page_num);
> +void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
> +					   u8 *type, bool *ipat);
>  bool kvm_vector_hashing_enabled(void);
>  void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
>  int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn when guest MTRRs are honored
  2023-07-14  6:54 ` [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn " Yan Zhao
@ 2023-08-25 22:47   ` Sean Christopherson
  2023-09-04  8:24     ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2023-08-25 22:47 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Jul 14, 2023, Yan Zhao wrote:
> +/*
> + * Add @range into kvm->arch.mtrr_zap_list and sort the list in
> + * "length" ascending + "start" descending order, so that
> + * ranges consuming more zap cycles can be dequeued later and their
> + * chances of being found duplicated are increased.

Wrap comments as close to 80 chars as possible.

> + */
> +static void kvm_add_mtrr_zap_list(struct kvm *kvm, struct mtrr_zap_range *range)
> +{
> +	struct list_head *head = &kvm->arch.mtrr_zap_list;
> +	u64 len = range->end - range->start;
> +	struct mtrr_zap_range *cur, *n;
> +	bool added = false;
> +
> +	spin_lock(&kvm->arch.mtrr_zap_list_lock);
> +
> +	if (list_empty(head)) {
> +		list_add(&range->node, head);
> +		spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> +		return;

Make this

		goto out;

or
		goto out_unlock;

and then do the same instead of the break; in the loop.  Then "added" goes away
and there's a single unlock.

> +	}
> +
> +	list_for_each_entry_safe(cur, n, head, node) {

This shouldn't need to use the _safe() variant, it's not deleting anything.

> +		u64 cur_len = cur->end - cur->start;
> +
> +		if (len < cur_len)
> +			break;
> +
> +		if (len > cur_len)
> +			continue;
> +
> +		if (range->start > cur->start)
> +			break;
> +
> +		if (range->start < cur->start)
> +			continue;

Looking at kvm_zap_mtrr_zap_list(), wouldn't we be better off sorting by start,
and then batching in kvm_zap_mtrr_zap_list()?  And maybe make the batching "fuzzy"
for fixed MTRRs?  I.e. if KVM is zapping any fixed MTRRs, zap all fixed MTRR ranges
even if there's a gap.
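
E.g. a rough sketch of the "sort by start, batch at zap time" idea (purely
illustrative; kvm_zap_mtrr_zap_list_batched() is a hypothetical name, locking
is omitted, and the list is assumed to already be sorted by ->start):

  static void kvm_zap_mtrr_zap_list_batched(struct kvm *kvm)
  {
	struct mtrr_zap_range *cur, *n;
	u64 start = 0, end = 0;
	bool pending = false;

	list_for_each_entry_safe(cur, n, &kvm->arch.mtrr_zap_list, node) {
		if (pending && cur->start <= end) {
			/* Overlapping or adjacent, extend the current batch. */
			end = max(end, cur->end);
		} else {
			if (pending)
				kvm_zap_gfn_range(kvm, start, end);
			start = cur->start;
			end = cur->end;
			pending = true;
		}
		list_del(&cur->node);
		kfree(cur);
	}

	if (pending)
		kvm_zap_gfn_range(kvm, start, end);
  }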

> +
> +		/* equal len & start, no need to add */
> +		added = true;
> +		kfree(range);


Hmm, the memory allocations are a bit of complexity that I'd prefer to avoid.
At a minimum, I think kvm_add_mtrr_zap_list() should do the allocation.  That'll
dedup a decent amount of code.

At the risk of rehashing the old memslots implementation, I think we should simply
have a statically sized array in struct kvm to hold "range to zap".  E.g. use 16
entries, bin all fixed MTRRs into a single range, and if the remaining 15 fill up,
purge and fall back to a full zap.

128 bytes per VM is totally acceptable, especially since we're burning waaay
more than that to deal with per-vCPU MTRRs.  And a well-behaved guest should have
identical MTRRs across all vCPUs, or maybe at worst one config for the BSP and
one for APs.
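
As a very rough sketch of the array idea (hypothetical names, locking omitted,
and fixed MTRRs assumed to be pre-binned into a single range by the caller):

  #define KVM_MAX_MTRR_ZAP_RANGES	16

  struct kvm_mtrr_zap_range {
	gfn_t start;
	gfn_t end;
  };

  /* Hypothetical additions to struct kvm_arch: */
  struct kvm_mtrr_zap_range mtrr_zap_ranges[KVM_MAX_MTRR_ZAP_RANGES];
  int nr_mtrr_zap_ranges;

  static void kvm_add_mtrr_zap_range(struct kvm *kvm, gfn_t start, gfn_t end)
  {
	struct kvm_arch *arch = &kvm->arch;
	int i = arch->nr_mtrr_zap_ranges;

	/* Entry 0 covering everything means a full zap is already queued. */
	if (i && !arch->mtrr_zap_ranges[0].start &&
	    arch->mtrr_zap_ranges[0].end == ~0ULL)
		return;

	/* Array is full: purge it and fall back to a single full-range entry. */
	if (i == KVM_MAX_MTRR_ZAP_RANGES) {
		arch->mtrr_zap_ranges[0].start = 0;
		arch->mtrr_zap_ranges[0].end = ~0ULL;
		arch->nr_mtrr_zap_ranges = 1;
		return;
	}

	arch->mtrr_zap_ranges[i].start = start;
	arch->mtrr_zap_ranges[i].end = end;
	arch->nr_mtrr_zap_ranges = i + 1;
  }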

> +		break;
> +	}
> +
> +	if (!added)
> +		list_add_tail(&range->node, &cur->node);
> +
> +	spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> +}
> +
> +static void kvm_zap_mtrr_zap_list(struct kvm *kvm)
> +{
> +	struct list_head *head = &kvm->arch.mtrr_zap_list;
> +	struct mtrr_zap_range *cur = NULL;
> +
> +	spin_lock(&kvm->arch.mtrr_zap_list_lock);
> +
> +	while (!list_empty(head)) {
> +		u64 start, end;
> +
> +		cur = list_first_entry(head, typeof(*cur), node);
> +		start = cur->start;
> +		end = cur->end;
> +		list_del(&cur->node);
> +		kfree(cur);

Hmm, the memory allocations are a bit of complexity that I'd prefer to avoid.

> +		spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> +
> +		kvm_zap_gfn_range(kvm, start, end);
> +
> +		spin_lock(&kvm->arch.mtrr_zap_list_lock);
> +	}
> +
> +	spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> +}
> +
> +static void kvm_zap_or_wait_mtrr_zap_list(struct kvm *kvm)
> +{
> +	if (atomic_cmpxchg_acquire(&kvm->arch.mtrr_zapping, 0, 1) == 0) {
> +		kvm_zap_mtrr_zap_list(kvm);
> +		atomic_set_release(&kvm->arch.mtrr_zapping, 0);
> +		return;
> +	}
> +
> +	while (atomic_read(&kvm->arch.mtrr_zapping))
> +		cpu_relax();
> +}
> +
> +static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu,
> +				   gfn_t gfn_start, gfn_t gfn_end)
> +{
> +	struct mtrr_zap_range *range;
> +
> +	range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT);
> +	if (!range)
> +		goto fail;
> +
> +	range->start = gfn_start;
> +	range->end = gfn_end;
> +
> +	kvm_add_mtrr_zap_list(vcpu->kvm, range);
> +
> +	kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm);
> +	return;
> +
> +fail:
> +	kvm_zap_gfn_range(vcpu->kvm, gfn_start, gfn_end);
> +}
> +
> +void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu)

Rather than provide a one-liner, add something like

  void kvm_mtrr_cr0_cd_changed(struct kvm_vcpu *vcpu)
  {
	if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
		return;

	kvm_zap_gfn_range(vcpu->kvm, 0, -1ull);
  }

that avoids the comically long function name, and keeps the MTRR logic more
contained in the MTRR code.

> +{
> +	return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));

Meh, just zap 0 => ~0ull.  The fact that 51:0 happens to be the theoretical max
gfn on x86 is a coincidence (AFAIK).  And if guest.MAXPHYADDR < 52, shifting ~0ull
still doesn't yield a "legal" gfn.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 32cc8bfaa5f1..bb79154cf465 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -943,7 +943,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
>  
>  	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
>  	    kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
> -		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
> +		kvm_honors_guest_mtrrs_zap_on_cd_toggle(vcpu);
>  }
>  EXPORT_SYMBOL_GPL(kvm_post_set_cr0);
>  
> @@ -12310,6 +12310,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	kvm->arch.guest_can_read_msr_platform_info = true;
>  	kvm->arch.enable_pmu = enable_pmu;
>  
> +	spin_lock_init(&kvm->arch.mtrr_zap_list_lock);
> +	INIT_LIST_HEAD(&kvm->arch.mtrr_zap_list);
> +
>  #if IS_ENABLED(CONFIG_HYPERV)
>  	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
>  	kvm->arch.hv_root_tdp = INVALID_PAGE;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index e7733dc4dccc..56d8755b2560 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -315,6 +315,7 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
>  					  int page_num);
>  void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
>  					   u8 *type, bool *ipat);
> +void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu);
>  bool kvm_vector_hashing_enabled(void);
>  void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
>  int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 10/12] KVM: x86/mmu: fine-grained gfn zap when guest MTRRs are honored
  2023-07-14  6:55 ` [PATCH v4 10/12] KVM: x86/mmu: fine-grained gfn zap " Yan Zhao
@ 2023-08-25 23:13   ` Sean Christopherson
  2023-09-04  8:37     ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2023-08-25 23:13 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Jul 14, 2023, Yan Zhao wrote:
>  void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu)
>  {
> -	return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> +	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
> +	bool mtrr_enabled = mtrr_is_enabled(mtrr_state);
> +	u8 default_mtrr_type;
> +	bool cd_ipat;
> +	u8 cd_type;
> +
> +	kvm_honors_guest_mtrrs_get_cd_memtype(vcpu, &cd_type, &cd_ipat);
> +
> +	default_mtrr_type = mtrr_enabled ? mtrr_default_type(mtrr_state) :
> +			    mtrr_disabled_type(vcpu);
> +
> +	if (cd_type != default_mtrr_type || cd_ipat)
> +		return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));

Why does this use the MTRR version but the failure path does not?  Ah, because
trying to allocate in the failure path will likely fail to allocate memory.  With
a statically sized array, we can just special case the 0 => -1 case.  Actually,
we can do that regardless, it just doesn't need a dedicated flag if we use an
array.

Using the MTRR version on failure (array is full) means that other vCPUs can see
that everything is being zapped and go straight to waiting.

> +
> +	/*
> +	 * If mtrr is not enabled, it will go to zap all above if the default

Pronouns again.  Maybe this?

	/*
	 * The default MTRR type has already been checked above, if MTRRs are
	 * disabled there are no other MTRR types to consider.
	 */

> +	 * type does not equal to cd_type;
> +	 * Or it has no need to zap if the default type equals to cd_type.
> +	 */
> +	if (mtrr_enabled) {

To save some indentation:

	if (!mtrr_enabled)
		return;


> +		if (prepare_zaplist_fixed_mtrr_of_non_type(vcpu, default_mtrr_type))
> +			goto fail;
> +
> +		if (prepare_zaplist_var_mtrr_of_non_type(vcpu, default_mtrr_type))
> +			goto fail;
> +
> +		kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm);
> +	}
> +	return;
> +fail:
> +	kvm_clear_mtrr_zap_list(vcpu->kvm);
> +	/* resort to zapping all on failure*/
> +	kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> +	return;
>  }
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 11/12] KVM: x86/mmu: split a single gfn zap range when guest MTRRs are honored
  2023-07-14  6:56 ` [PATCH v4 11/12] KVM: x86/mmu: split a single gfn zap range " Yan Zhao
@ 2023-08-25 23:15   ` Sean Christopherson
  2023-09-04  8:39     ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2023-08-25 23:15 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Jul 14, 2023, Yan Zhao wrote:
> Split a single gfn zap range (specifially range [0, ~0UL)) to smaller
> ranges according to current memslot layout when guest MTRRs are honored.
> 
> Though vCPUs have been serialized to perform kvm_zap_gfn_range() for MTRRs
> updates and CR0.CD toggles, contention caused rescheduling cost is still
> huge when there're concurrent page fault holding mmu_lock for read.

Unless the pre-check doesn't work for some reason, I definitely want to avoid
this patch.  This is a lot of complexity that, IIUC, is just working around a
problem elsewhere in KVM.

> Split a single huge zap range according to the actual memslot layout can
> reduce unnecessary transversal and yielding cost in tdp mmu.
> Also, it can increase the chances for larger ranges to find existing ranges
> to zap in zap list.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
                   ` (11 preceding siblings ...)
  2023-07-14  6:56 ` [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU Yan Zhao
@ 2023-08-25 23:17 ` Sean Christopherson
  2023-09-04  8:48   ` Yan Zhao
  2023-10-05  1:29 ` Sean Christopherson
  13 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2023-08-25 23:17 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Jul 14, 2023, Yan Zhao wrote:
> This series refines mmu zap caused by EPT memory type update when guest
> MTRRs are honored.
> 
> Patches 1-5 revolve around utilizing helper functions to check if
> KVM TDP honors guest MTRRs, TDP zaps and page fault max_level reduction
> are now only targeted to TDPs that honor guest MTRRs.
> 
> -The 5th patch will trigger zapping of TDP leaf entries if non-coherent
>  DMA devices count goes from 0 to 1 or from 1 to 0.
> 
> Patches 6-7 are fixes and patches 9-12 are optimizations for mmu zaps
> when guest MTRRs are honored.
> Those mmu zaps are intended to remove stale memtypes of TDP entries
> caused by changes of guest MTRRs and CR0.CD and are usually triggered from
> all vCPUs in bursts.

Sorry for the delayed review, especially with respect to patches 1-5.  I completely
forgot there were cleanups at the beginning of this series.  I'll make sure to grab
1-5 early in the 6.7 cycle, even if you haven't sent a new version before then.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU
  2023-08-25 21:34   ` Sean Christopherson
@ 2023-09-04  7:31     ` Yan Zhao
  2023-09-05 22:31       ` Sean Christopherson
  0 siblings, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2023-09-04  7:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Aug 25, 2023 at 02:34:30PM -0700, Sean Christopherson wrote:
> On Fri, Jul 14, 2023, Yan Zhao wrote:
> > Convert kvm_zap_gfn_range() from holding mmu_lock for write to holding for
> > read in TDP MMU and allow zapping of non-leaf SPTEs of level <= 1G.
> > TLB flushes are executed/requested within tdp_mmu_zap_spte_atomic() guarded
> > by RCU lock.
> > 
> > GFN zap can be super slow if mmu_lock is held for write when there are
> > contentions. In worst cases, huge cpu cycles are spent on yielding GFN by
> > GFN, i.e. the loop of "check and flush tlb -> drop rcu lock ->
> > drop mmu_lock -> cpu_relax() -> take mmu_lock -> take rcu lock" are entered
> > for every GFN.
> > Contentions can either from concurrent zaps holding mmu_lock for write or
> > from tdp_mmu_map() holding mmu_lock for read.
> 
> The lock contention should go away with a pre-check[*], correct?  That's a more
Yes, I think so, though I haven't had time to verify it yet.

> complete solution too, in that it also avoids lock contention for the shadow MMU,
> which presumably suffers the same problem (I don't see anything that would prevent
> it from yielding).
> 
> If we do want to zap with mmu_lock held for read, I think we should convert
> kvm_tdp_mmu_zap_leafs() and all its callers to run under read, because unless I'm
> missing something, the rules are the same regardless of _why_ KVM is zapping, e.g.
> the zap needs to be protected by mmu_invalidate_in_progress, which ensures no other
> tasks will race to install SPTEs that are supposed to be zapped.
Yes. I didn't do that for the unmap path only because I didn't want to make a
big code change.
The write lock in the kvm_unmap_gfn_range() path is taken in arch-agnostic code,
which is not easy to change, right?

> 
> If you post a version of this patch that converts kvm_tdp_mmu_zap_leafs(), please
> post it as a standalone patch.  At a glance it doesn't have any dependencies on the
> MTRR changes, and I don't want this type of changed buried at the end of a series
> that is for a fairly niche setup.  This needs a lot of scrutiny to make sure zapping
> under read really is safe
Given that the pre-check patch should work, do you think it's still worthwhile to
do this conversion?

> 
> [*] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com
> 
> > After converting to hold mmu_lock for read, there will be less contentions
> > detected and retaking mmu_lock for read is also faster. There's no need to
> > flush TLB before dropping mmu_lock when there're contentions as SPTEs have
> > been zapped atomically and TLBs are flushed/flush requested immediately
> > within RCU lock.
> > In order to reduce TLB flush count, non-leaf SPTEs not greater than 1G
> > level are allowed to be zapped if their ranges are fully covered in the
> > gfn zap range.
> > 
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 07/12] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED
  2023-08-25 21:43   ` Sean Christopherson
@ 2023-09-04  7:41     ` Yan Zhao
  0 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-09-04  7:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

...
> > BTW, returning guest MTRR type as if CR0.CD=0 is also not preferred because
> > it still has to hardcode the MTRR type to WB during the
> 
> Please use full names instead of prononous, I found the "it still has to hardcode"
Thanks! Yes, will avoid this kind of pronoun in the future.

> part really hard to grok.  I think this is what you're saying?
> 
>   BTW, returning guest MTRR type as if CR0.CD=0 is also not preferred because
>   KVMs ABI for the quirk also requires KVM to force WB memtype regardless of
>   guest MTRRs to workaround the slow guest boot-up issue.

Yes, exactly :)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 08/12] KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are honored
  2023-08-25 21:46   ` Sean Christopherson
@ 2023-09-04  7:46     ` Yan Zhao
  0 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-09-04  7:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Aug 25, 2023 at 02:46:07PM -0700, Sean Christopherson wrote:
> On Fri, Jul 14, 2023, Yan Zhao wrote:
> > Centralize the code to get cache disabled memtype when guest MTRRs are
> > honored. If a TDP honors guest MTRRs, it is required to call the provided
> > API to get the memtype for CR0.CD=1.
> > 
> > This is the preparation patch for later implementation of fine-grained gfn
> > zap for CR0.CD toggles when guest MTRRs are honored.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  arch/x86/kvm/mtrr.c    | 16 ++++++++++++++++
> >  arch/x86/kvm/vmx/vmx.c | 10 +++++-----
> >  arch/x86/kvm/x86.h     |  2 ++
> >  3 files changed, 23 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> > index 3ce58734ad22..64c6daa659c8 100644
> > --- a/arch/x86/kvm/mtrr.c
> > +++ b/arch/x86/kvm/mtrr.c
> > @@ -721,3 +721,19 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
> >  
> >  	return type == mtrr_default_type(mtrr_state);
> >  }
> > +
> > +/*
> > + * this routine is supposed to be called when guest mtrrs are honored
> > + */
> > +void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
> > +					   u8 *type, bool *ipat)
> 
> I really don't like this helper.  IMO it's a big net negative for the readability
> of vmx_get_mt_mask().  As I said in the previous version, I agree that splitting
> logic is generally undesirable, but in this case I strongly believe it's the
> lesser evil.
> 
Ok, will drop this helper in the next version :)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn when guest MTRRs are honored
  2023-08-25 22:47   ` Sean Christopherson
@ 2023-09-04  8:24     ` Yan Zhao
  0 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-09-04  8:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Aug 25, 2023 at 03:47:11PM -0700, Sean Christopherson wrote:
> On Fri, Jul 14, 2023, Yan Zhao wrote:
> > +/*
> > + * Add @range into kvm->arch.mtrr_zap_list and sort the list in
> > + * "length" ascending + "start" descending order, so that
> > + * ranges consuming more zap cycles can be dequeued later and their
> > + * chances of being found duplicated are increased.
> 
> Wrap comments as close to 80 chars as possible.
Got it!
I thought it would be easier to read if a group of related words stayed on one line :)


> > + */
> > +static void kvm_add_mtrr_zap_list(struct kvm *kvm, struct mtrr_zap_range *range)
> > +{
> > +	struct list_head *head = &kvm->arch.mtrr_zap_list;
> > +	u64 len = range->end - range->start;
> > +	struct mtrr_zap_range *cur, *n;
> > +	bool added = false;
> > +
> > +	spin_lock(&kvm->arch.mtrr_zap_list_lock);
> > +
> > +	if (list_empty(head)) {
> > +		list_add(&range->node, head);
> > +		spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> > +		return;
> 
> Make this
> 
> 		goto out;
> 
> or
> 		goto out_unlock;
> 
> and then do the same instead of the break; in the loop.  Then "added" goes away
> and there's a single unlock.
>
Ok.

> > +	}
> > +
> > +	list_for_each_entry_safe(cur, n, head, node) {
> 
> This shouldn't need to use the _safe() variant, it's not deleting anything.
Right. Will remove it.
The _safe() version was a leftover from my initial test versions, in which items
were merged and deleted; I later found that brought no performance benefit.

> > +		u64 cur_len = cur->end - cur->start;
> > +
> > +		if (len < cur_len)
> > +			break;
> > +
> > +		if (len > cur_len)
> > +			continue;
> > +
> > +		if (range->start > cur->start)
> > +			break;
> > +
> > +		if (range->start < cur->start)
> > +			continue;
> 
> Looking at kvm_zap_mtrr_zap_list(), wouldn't we be better off sorting by start,
> and then batching in kvm_zap_mtrr_zap_list()?  And maybe make the batching "fuzzy"
> for fixed MTRRs?  I.e. if KVM is zapping any fixed MTRRs, zap all fixed MTRR ranges
> even if there's a gap.
Yes, this "fuzzy" is done in the next patch.
In prepare_zaplist_fixed_mtrr_of_non_type(),
	range->start = gpa_to_gfn(fixed_seg_table[0].start);
	range->end = gpa_to_gfn(fixed_seg_table[seg_end].end);
the range start is set to the start of the first fixed range, and the end to the
end of the last fixed range.

> 
> > +
> > +		/* equal len & start, no need to add */
> > +		added = true;
> > +		kfree(range);
> 
> 
> Hmm, the memory allocations are a bit of complexity that'd I'd prefer to avoid.
> At a minimum, I think kvm_add_mtrr_zap_list() should do the allocation.  That'll
> dedup a decount amount of code.
> 
> At the risk of rehashing the old memslots implementation, I think we should simply
> have a statically sized array in struct kvm to hold "range to zap".  E.g. use 16
> entries, bin all fixed MTRRs into a single range, and if the remaining 15 fill up,
> purge and fall back to a full zap.
> 
> 128 bytes per VM is totally acceptable, especially since we're burning waaay
> more than that to deal with per-vCPU MTRRs.  And a well-behaved guest should have
> identical MTRRs across all vCPUs, or maybe at worst one config for the BSP and
> one for APs.

Ok, will do it in the next version.

> 
> > +		break;
> > +	}
> > +
> > +	if (!added)
> > +		list_add_tail(&range->node, &cur->node);
> > +
> > +	spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> > +}
> > +
> > +static void kvm_zap_mtrr_zap_list(struct kvm *kvm)
> > +{
> > +	struct list_head *head = &kvm->arch.mtrr_zap_list;
> > +	struct mtrr_zap_range *cur = NULL;
> > +
> > +	spin_lock(&kvm->arch.mtrr_zap_list_lock);
> > +
> > +	while (!list_empty(head)) {
> > +		u64 start, end;
> > +
> > +		cur = list_first_entry(head, typeof(*cur), node);
> > +		start = cur->start;
> > +		end = cur->end;
> > +		list_del(&cur->node);
> > +		kfree(cur);
> 
> Hmm, the memory allocations are a bit of complexity that'd I'd prefer to avoid.
yes.

> 
> > +		spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> > +
> > +		kvm_zap_gfn_range(kvm, start, end);
> > +
> > +		spin_lock(&kvm->arch.mtrr_zap_list_lock);
> > +	}
> > +
> > +	spin_unlock(&kvm->arch.mtrr_zap_list_lock);
> > +}
> > +
> > +static void kvm_zap_or_wait_mtrr_zap_list(struct kvm *kvm)
> > +{
> > +	if (atomic_cmpxchg_acquire(&kvm->arch.mtrr_zapping, 0, 1) == 0) {
> > +		kvm_zap_mtrr_zap_list(kvm);
> > +		atomic_set_release(&kvm->arch.mtrr_zapping, 0);
> > +		return;
> > +	}
> > +
> > +	while (atomic_read(&kvm->arch.mtrr_zapping))
> > +		cpu_relax();
> > +}
> > +
> > +static void kvm_mtrr_zap_gfn_range(struct kvm_vcpu *vcpu,
> > +				   gfn_t gfn_start, gfn_t gfn_end)
> > +{
> > +	struct mtrr_zap_range *range;
> > +
> > +	range = kmalloc(sizeof(*range), GFP_KERNEL_ACCOUNT);
> > +	if (!range)
> > +		goto fail;
> > +
> > +	range->start = gfn_start;
> > +	range->end = gfn_end;
> > +
> > +	kvm_add_mtrr_zap_list(vcpu->kvm, range);
> > +
> > +	kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm);
> > +	return;
> > +
> > +fail:
> > +	kvm_zap_gfn_range(vcpu->kvm, gfn_start, gfn_end);
> > +}
> > +
> > +void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu)
> 
> Rather than provide a one-liner, add something like
> 
>   void kvm_mtrr_cr0_cd_changed(struct kvm_vcpu *vcpu)
>   {
> 	if (!kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
> 		return;
> 
> 	return kvm_zap_gfn_range(vcpu, 0, -1ull);
>   }
> 
> that avoids the comically long function name, and keeps the MTRR logic more
> contained in the MTRR code.
Yes, it's better!
Thanks for your guidance :)

> 
> > +{
> > +	return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> 
> Meh, just zap 0 => ~0ull.  That 51:0 happens to be the theoretical max gfn on
> x86 is coincidence (AFAIK).  And if the guest.MAXPHYADDR < 52, shifting ~0ull
> still doesn't yield a "legal" gfn.
Yes. I think I just wanted to make the page count smaller in kvm_zap_gfn_range(), i.e. in the call below:

kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start);

> 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 32cc8bfaa5f1..bb79154cf465 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -943,7 +943,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
> >  
> >  	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
> >  	    kvm_mmu_honors_guest_mtrrs(vcpu->kvm))
> > -		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
> > +		kvm_honors_guest_mtrrs_zap_on_cd_toggle(vcpu);
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_post_set_cr0);
> >  
> > @@ -12310,6 +12310,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >  	kvm->arch.guest_can_read_msr_platform_info = true;
> >  	kvm->arch.enable_pmu = enable_pmu;
> >  
> > +	spin_lock_init(&kvm->arch.mtrr_zap_list_lock);
> > +	INIT_LIST_HEAD(&kvm->arch.mtrr_zap_list);
> > +
> >  #if IS_ENABLED(CONFIG_HYPERV)
> >  	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
> >  	kvm->arch.hv_root_tdp = INVALID_PAGE;
> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > index e7733dc4dccc..56d8755b2560 100644
> > --- a/arch/x86/kvm/x86.h
> > +++ b/arch/x86/kvm/x86.h
> > @@ -315,6 +315,7 @@ bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
> >  					  int page_num);
> >  void kvm_honors_guest_mtrrs_get_cd_memtype(struct kvm_vcpu *vcpu,
> >  					   u8 *type, bool *ipat);
> > +void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu);
> >  bool kvm_vector_hashing_enabled(void);
> >  void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
> >  int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 10/12] KVM: x86/mmu: fine-grained gfn zap when guest MTRRs are honored
  2023-08-25 23:13   ` Sean Christopherson
@ 2023-09-04  8:37     ` Yan Zhao
  0 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-09-04  8:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Aug 25, 2023 at 04:13:35PM -0700, Sean Christopherson wrote:
> On Fri, Jul 14, 2023, Yan Zhao wrote:
> >  void kvm_honors_guest_mtrrs_zap_on_cd_toggle(struct kvm_vcpu *vcpu)
> >  {
> > -	return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> > +	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
> > +	bool mtrr_enabled = mtrr_is_enabled(mtrr_state);
> > +	u8 default_mtrr_type;
> > +	bool cd_ipat;
> > +	u8 cd_type;
> > +
> > +	kvm_honors_guest_mtrrs_get_cd_memtype(vcpu, &cd_type, &cd_ipat);
> > +
> > +	default_mtrr_type = mtrr_enabled ? mtrr_default_type(mtrr_state) :
> > +			    mtrr_disabled_type(vcpu);
> > +
> > +	if (cd_type != default_mtrr_type || cd_ipat)
> > +		return kvm_mtrr_zap_gfn_range(vcpu, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> 
> Why does this use use the MTRR version but the failure path does not?  Ah, because
> trying to allocate in the failure path will likely fail to allocate memory.  With
> a statically sized array, we can just special case the 0 => -1 case.  Actually,
> we can do that regardless, it just doesn't need a dedicated flag if we use an
> array.
> 
> Using the MTRR version on failure (array is full) means that other vCPUs can see
> that everything is being zapped and go straight to waitin.
Yes, will convert to using a statically sized array in the next version.

> 
> > +
> > +	/*
> > +	 * If mtrr is not enabled, it will go to zap all above if the default
> 
> Pronouns again.  Maybe this?
> 
> 	/*
> 	 * The default MTRR type has already been checked above, if MTRRs are
> 	 * disabled there are no other MTRR types to consider.
> 	 */
Yes, better :)

> > +	 * type does not equal to cd_type;
> > +	 * Or it has no need to zap if the default type equals to cd_type.
> > +	 */
> > +	if (mtrr_enabled) {
> 
> To save some indentation:
> 
> 	if (!mtrr_enabled)
> 		return;
> 
Got it! 

> > +		if (prepare_zaplist_fixed_mtrr_of_non_type(vcpu, default_mtrr_type))
> > +			goto fail;
> > +
> > +		if (prepare_zaplist_var_mtrr_of_non_type(vcpu, default_mtrr_type))
> > +			goto fail;
> > +
> > +		kvm_zap_or_wait_mtrr_zap_list(vcpu->kvm);
> > +	}
> > +	return;
> > +fail:
> > +	kvm_clear_mtrr_zap_list(vcpu->kvm);
> > +	/* resort to zapping all on failure*/
> > +	kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
> > +	return;
> >  }
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 11/12] KVM: x86/mmu: split a single gfn zap range when guest MTRRs are honored
  2023-08-25 23:15   ` Sean Christopherson
@ 2023-09-04  8:39     ` Yan Zhao
  0 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-09-04  8:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Aug 25, 2023 at 04:15:41PM -0700, Sean Christopherson wrote:
> On Fri, Jul 14, 2023, Yan Zhao wrote:
> > Split a single gfn zap range (specifially range [0, ~0UL)) to smaller
> > ranges according to current memslot layout when guest MTRRs are honored.
> > 
> > Though vCPUs have been serialized to perform kvm_zap_gfn_range() for MTRRs
> > updates and CR0.CD toggles, contention caused rescheduling cost is still
> > huge when there're concurrent page fault holding mmu_lock for read.
> 
> Unless the pre-check doesn't work for some reason, I definitely want to avoid
> this patch.  This is a lot of complexity that, IIUC, is just working around a
> problem elsewhere in KVM.
>
I think so too.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap
  2023-08-25 23:17 ` [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Sean Christopherson
@ 2023-09-04  8:48   ` Yan Zhao
  0 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-09-04  8:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Fri, Aug 25, 2023 at 04:17:09PM -0700, Sean Christopherson wrote:
> On Fri, Jul 14, 2023, Yan Zhao wrote:
> > This series refines mmu zap caused by EPT memory type update when guest
> > MTRRs are honored.
> > 
> > Patches 1-5 revolve around utilizing helper functions to check if
> > KVM TDP honors guest MTRRs, TDP zaps and page fault max_level reduction
> > are now only targeted to TDPs that honor guest MTRRs.
> > 
> > -The 5th patch will trigger zapping of TDP leaf entries if non-coherent
> >  DMA devices count goes from 0 to 1 or from 1 to 0.
> > 
> > Patches 6-7 are fixes and patches 9-12 are optimizations for mmu zaps
> > when guest MTRRs are honored.
> > Those mmu zaps are intended to remove stale memtypes of TDP entries
> > caused by changes of guest MTRRs and CR0.CD and are usually triggered from
> > all vCPUs in bursts.
> 
> Sorry for the delayed review, especially with respect to patches 1-5.  I completely
> forgot there were cleanups at the beginning of this series.  I'll make to grab
> 1-5 early in the 6.7 cycle, even if you haven't sent a new version before then.
Never mind, and thanks a lot regarding patches 1-5!
I may not be able to spin the next version soon as I got a high priority task that
I need to finish first (I hope I can complete it in 1-1.5 months).
Sorry and thanks again!


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU
  2023-09-04  7:31     ` Yan Zhao
@ 2023-09-05 22:31       ` Sean Christopherson
  2023-09-06  0:50         ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2023-09-05 22:31 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Mon, Sep 04, 2023, Yan Zhao wrote:
> On Fri, Aug 25, 2023 at 02:34:30PM -0700, Sean Christopherson wrote:
> > On Fri, Jul 14, 2023, Yan Zhao wrote:
> > > Convert kvm_zap_gfn_range() from holding mmu_lock for write to holding for
> > > read in TDP MMU and allow zapping of non-leaf SPTEs of level <= 1G.
> > > TLB flushes are executed/requested within tdp_mmu_zap_spte_atomic() guarded
> > > by RCU lock.
> > > 
> > > GFN zap can be super slow if mmu_lock is held for write when there are
> > > contentions. In worst cases, huge cpu cycles are spent on yielding GFN by
> > > GFN, i.e. the loop of "check and flush tlb -> drop rcu lock ->
> > > drop mmu_lock -> cpu_relax() -> take mmu_lock -> take rcu lock" are entered
> > > for every GFN.
> > > Contentions can either from concurrent zaps holding mmu_lock for write or
> > > from tdp_mmu_map() holding mmu_lock for read.
> > 
> > The lock contention should go away with a pre-check[*], correct?  That's a more
> Yes, I think so, though I don't have time to verify it yet.
> 
> > complete solution too, in that it also avoids lock contention for the shadow MMU,
> > which presumably suffers the same problem (I don't see anything that would prevent
> > it from yielding).
> > 
> > If we do want to zap with mmu_lock held for read, I think we should convert
> > kvm_tdp_mmu_zap_leafs() and all its callers to run under read, because unless I'm
> > missing something, the rules are the same regardless of _why_ KVM is zapping, e.g.
> > the zap needs to be protected by mmu_invalidate_in_progress, which ensures no other
> > tasks will race to install SPTEs that are supposed to be zapped.
> Yes. I did't do that to the unmap path was only because I don't want to make a
> big code change.
> The write lock in kvm_unmap_gfn_range() path is taken in arch-agnostic code,
> which is not easy to change, right?

Yeah.  The lock itself isn't bad, especially if we can convert all mmu_notifier
hooks, e.g. we already have KVM_MMU_LOCK(), adding a variant for mmu_notifiers
would be quite easy.
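
Purely as a sketch, assuming the existing KVM_HAVE_MMU_RWLOCK plumbing and
hypothetical macro names:

  #ifdef KVM_HAVE_MMU_RWLOCK
  #define KVM_MMU_NOTIFIER_LOCK(kvm)	read_lock(&(kvm)->mmu_lock)
  #define KVM_MMU_NOTIFIER_UNLOCK(kvm)	read_unlock(&(kvm)->mmu_lock)
  #else
  #define KVM_MMU_NOTIFIER_LOCK(kvm)	spin_lock(&(kvm)->mmu_lock)
  #define KVM_MMU_NOTIFIER_UNLOCK(kvm)	spin_unlock(&(kvm)->mmu_lock)
  #endif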

The bigger problem would be kvm_mmu_invalidate_{begin,end}() and getting the
memory ordering right, especially if there are multiple mmu_notifier events in
flight.

But I was actually thinking of a cheesier approach: drop and reacquire mmu_lock
when zapping, e.g. (not showing the changes that would also be needed in tdp_mmu_zap_leafs()):

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 735c976913c2..c89a2511789b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -882,9 +882,15 @@ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
 {
        struct kvm_mmu_page *root;
 
+       write_unlock(&kvm->mmu_lock);
+       read_lock(&kvm->mmu_lock);
+
        for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
                flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush);
 
+       read_unlock(&kvm->mmu_lock);
+       write_lock(&kvm->mmu_lock);
+
        return flush;
 }

vCPUs would still get blocked, but for a smaller duration, and the lock contention
between vCPUs and the zapping task would mostly go away.

> > If you post a version of this patch that converts kvm_tdp_mmu_zap_leafs(), please
> > post it as a standalone patch.  At a glance it doesn't have any dependencies on the
> > MTRR changes, and I don't want this type of changed buried at the end of a series
> > that is for a fairly niche setup.  This needs a lot of scrutiny to make sure zapping
> > under read really is safe
> Given the pre-check patch should work, do you think it's still worthwhile to do
> this convertion?

I do think it would be a net positive, though I don't know that it's worth your
time without a concrete use case.  My gut instinct could be wrong, so I wouldn't
want to take on the risk of running with mmu_lock held for read without hard
performance numbers to justify the change.

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU
  2023-09-05 22:31       ` Sean Christopherson
@ 2023-09-06  0:50         ` Yan Zhao
  0 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-09-06  0:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, pbonzini, chao.gao, kai.huang,
	robert.hoo.linux, yuan.yao

On Tue, Sep 05, 2023 at 03:31:59PM -0700, Sean Christopherson wrote:
> On Mon, Sep 04, 2023, Yan Zhao wrote:
> > On Fri, Aug 25, 2023 at 02:34:30PM -0700, Sean Christopherson wrote:
> > > On Fri, Jul 14, 2023, Yan Zhao wrote:
> > > > Convert kvm_zap_gfn_range() from holding mmu_lock for write to holding for
> > > > read in TDP MMU and allow zapping of non-leaf SPTEs of level <= 1G.
> > > > TLB flushes are executed/requested within tdp_mmu_zap_spte_atomic() guarded
> > > > by RCU lock.
> > > > 
> > > > GFN zap can be super slow if mmu_lock is held for write when there are
> > > > contentions. In worst cases, huge cpu cycles are spent on yielding GFN by
> > > > GFN, i.e. the loop of "check and flush tlb -> drop rcu lock ->
> > > > drop mmu_lock -> cpu_relax() -> take mmu_lock -> take rcu lock" are entered
> > > > for every GFN.
> > > > Contentions can either from concurrent zaps holding mmu_lock for write or
> > > > from tdp_mmu_map() holding mmu_lock for read.
> > > 
> > > The lock contention should go away with a pre-check[*], correct?  That's a more
> > Yes, I think so, though I don't have time to verify it yet.
> > 
> > > complete solution too, in that it also avoids lock contention for the shadow MMU,
> > > which presumably suffers the same problem (I don't see anything that would prevent
> > > it from yielding).
> > > 
> > > If we do want to zap with mmu_lock held for read, I think we should convert
> > > kvm_tdp_mmu_zap_leafs() and all its callers to run under read, because unless I'm
> > > missing something, the rules are the same regardless of _why_ KVM is zapping, e.g.
> > > the zap needs to be protected by mmu_invalidate_in_progress, which ensures no other
> > > tasks will race to install SPTEs that are supposed to be zapped.
> > Yes. I did't do that to the unmap path was only because I don't want to make a
> > big code change.
> > The write lock in kvm_unmap_gfn_range() path is taken in arch-agnostic code,
> > which is not easy to change, right?
> 
> Yeah.  The lock itself isn't bad, especially if we can convert all mmu_nofitier
> hooks, e.g. we already have KVM_MMU_LOCK(), adding a variant for mmu_notifiers
> would be quite easy.
>
> The bigger problem would be kvm_mmu_invalidate_{begin,end}() and getting the
> memory ordering right, especially if there are multiple mmu_notifier events in
> flight.
> 
> But I was actually thinking of a cheesier approach: drop and reacquire mmu_lock
> when zapping, e.g. without the necessary changes in tdp_mmu_zap_leafs():
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 735c976913c2..c89a2511789b 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -882,9 +882,15 @@ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
>  {
>         struct kvm_mmu_page *root;
>  
> +       write_unlock(&kvm->mmu_lock);
> +       read_lock(&kvm->mmu_lock);
> +
>         for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
>                 flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush);
>  
> +       read_unlock(&kvm->mmu_lock);
> +       write_lock(&kvm->mmu_lock);
> +
>         return flush;
>  }
> 
> vCPUs would still get blocked, but for a smaller duration, and the lock contention
> between vCPUs and the zapping task would mostly go away.
>
Yes, I actually did a similar thing locally, i.e. releasing the write lock and taking
the read lock before zapping.
But yes, I also think it's cheesier, as the caller that takes the write lock knows nothing
about its write lock being replaced with a read lock.


> > > If you post a version of this patch that converts kvm_tdp_mmu_zap_leafs(), please
> > > post it as a standalone patch.  At a glance it doesn't have any dependencies on the
> > > MTRR changes, and I don't want this type of changed buried at the end of a series
> > > that is for a fairly niche setup.  This needs a lot of scrutiny to make sure zapping
> > > under read really is safe
> > Given the pre-check patch should work, do you think it's still worthwhile to do
> > this convertion?
> 
> I do think it would be a net positive, though I don't know that it's worth your
> time without a concrete use cases.  My gut instinct could be wrong, so I wouldn't
> want to take on the risk of running with mmu_lock held for read without hard
> performance numbers to justify the change.
Ok, I see. I may try the conversion later if I find a performance justification.

Thanks!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap
  2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
                   ` (12 preceding siblings ...)
  2023-08-25 23:17 ` [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Sean Christopherson
@ 2023-10-05  1:29 ` Sean Christopherson
  2023-10-05  2:19   ` Huang, Kai
  13 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2023-10-05  1:29 UTC (permalink / raw)
  To: Sean Christopherson, kvm, linux-kernel, Yan Zhao
  Cc: pbonzini, chao.gao, kai.huang, robert.hoo.linux, yuan.yao

On Fri, 14 Jul 2023 14:46:56 +0800, Yan Zhao wrote:
> This series refines mmu zap caused by EPT memory type update when guest
> MTRRs are honored.
> 
> Patches 1-5 revolve around utilizing helper functions to check if
> KVM TDP honors guest MTRRs, TDP zaps and page fault max_level reduction
> are now only targeted to TDPs that honor guest MTRRs.
> 
> [...]

Applied 1-5 and 7 to kvm-x86 mmu.  I squashed 1 and 2 as introducing helpers to
consolidate existing code without converting the existing code is weird and
unnecessarily makes it impossible to properly test the helpers when they're
added.

I skipped 6, "move TDP zaps from guest MTRRs update to CR0.CD toggling", for
now as your performance numbers showed that it slowed down the guest even
though the number of zaps went down.  I'm definitely not against the patch, I
just don't want to risk regressing guest performance, i.e. I don't want to
take it without the rest of the series that takes advantage of the change.

I massaged a few shortlogs and changelogs, but didn't touch any code.  Holler
if anything looks funky.

Thanks much!

[1/5] KVM: x86/mmu: Add helpers to return if KVM honors guest MTRRs
      https://github.com/kvm-x86/linux/commit/6590a37e7ec6
[2/5] KVM: x86/mmu: Zap SPTEs when CR0.CD is toggled iff guest MTRRs are honored
      https://github.com/kvm-x86/linux/commit/c0ad4a14c5af
[3/5] KVM: x86/mmu: Zap SPTEs on MTRR update iff guest MTRRs are honored
      https://github.com/kvm-x86/linux/commit/a1596812cce1
[4/5] KVM: x86/mmu: Xap KVM TDP when noncoherent DMA assignment starts/stops
      https://github.com/kvm-x86/linux/commit/3c4955c04b95
[5/5] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED
      https://github.com/kvm-x86/linux/commit/f7b4bcd501ef

--
https://github.com/kvm-x86/linux/tree/next

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap
  2023-10-05  1:29 ` Sean Christopherson
@ 2023-10-05  2:19   ` Huang, Kai
  2023-10-05  2:28     ` Sean Christopherson
  0 siblings, 1 reply; 40+ messages in thread
From: Huang, Kai @ 2023-10-05  2:19 UTC (permalink / raw)
  To: kvm, Christopherson,, Sean, linux-kernel, Zhao, Yan Y
  Cc: robert.hoo.linux, pbonzini, Gao, Chao, yuan.yao

On Wed, 2023-10-04 at 18:29 -0700, Sean Christopherson wrote:
> [4/5] KVM: x86/mmu: Xap KVM TDP when noncoherent DMA assignment starts/stops
>       https://github.com/kvm-x86/linux/commit/3c4955c04b95

Xap -> Zap? :-)

Apologies if I missed something.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap
  2023-10-05  2:19   ` Huang, Kai
@ 2023-10-05  2:28     ` Sean Christopherson
  2023-10-06  0:50       ` Sean Christopherson
  0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2023-10-05  2:28 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, linux-kernel, Yan Y Zhao, robert.hoo.linux, pbonzini,
	Chao Gao, yuan.yao

On Thu, Oct 05, 2023, Kai Huang wrote:
> On Wed, 2023-10-04 at 18:29 -0700, Sean Christopherson wrote:
> > [4/5] KVM: x86/mmu: Xap KVM TDP when noncoherent DMA assignment starts/stops
> >       https://github.com/kvm-x86/linux/commit/3c4955c04b95
> 
> Xap -> Zap? :-)

Dagnabbit, I tried to capitalize z => Z and hit the wrong key.  I'll fixup.

Thanks Kai!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap
  2023-10-05  2:28     ` Sean Christopherson
@ 2023-10-06  0:50       ` Sean Christopherson
  0 siblings, 0 replies; 40+ messages in thread
From: Sean Christopherson @ 2023-10-06  0:50 UTC (permalink / raw)
  To: Kai Huang
  Cc: kvm, linux-kernel, Yan Y Zhao, robert.hoo.linux, pbonzini,
	Chao Gao, yuan.yao

On Wed, Oct 04, 2023, Sean Christopherson wrote:
> On Thu, Oct 05, 2023, Kai Huang wrote:
> > On Wed, 2023-10-04 at 18:29 -0700, Sean Christopherson wrote:
> > > [4/5] KVM: x86/mmu: Xap KVM TDP when noncoherent DMA assignment starts/stops
> > >       https://github.com/kvm-x86/linux/commit/3c4955c04b95
> > 
> > Xap -> Zap? :-)
> 
> Dagnabbit, I tried to capitalize z => Z and hit the wrong key.  I'll fixup.

LOL, the real irony is that this particular patch also has this in the changelog:

  [sean: fix misspelled words in comment and changelog]

Anyways, fixed.  New hashes are:

[4/5] KVM: x86/mmu: Zap KVM TDP when noncoherent DMA assignment starts/stops
      https://github.com/kvm-x86/linux/commit/539c103e2a13
[5/5] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED
      https://github.com/kvm-x86/linux/commit/10ed442fefdd

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs
  2023-07-14  6:50 ` [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs Yan Zhao
@ 2023-10-07  7:00   ` Like Xu
  2023-10-09 19:52     ` Sean Christopherson
  0 siblings, 1 reply; 40+ messages in thread
From: Like Xu @ 2023-10-07  7:00 UTC (permalink / raw)
  To: Yan Zhao, Sean Christopherson
  Cc: pbonzini, chao.gao, kai.huang, robert.hoo.linux, yuan.yao,
	kvm list, linux-kernel

On 14/7/2023 2:50 pm, Yan Zhao wrote:
> Added helpers to check if KVM honors guest MTRRs.
> The inner helper __kvm_mmu_honors_guest_mtrrs() is also provided to
> outside callers for the purpose of checking if guest MTRRs were honored
> before stopping non-coherent DMA.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>   arch/x86/kvm/mmu.h     |  7 +++++++
>   arch/x86/kvm/mmu/mmu.c | 15 +++++++++++++++
>   2 files changed, 22 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 92d5a1924fc1..38bd449226f6 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -235,6 +235,13 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>   	return -(u32)fault & errcode;
>   }
>   
> +bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma);
> +
> +static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
> +{
> +	return __kvm_mmu_honors_guest_mtrrs(kvm, kvm_arch_has_noncoherent_dma(kvm));
> +}
> +
>   void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
>   
>   int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 1e5db621241f..b4f89f015c37 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4516,6 +4516,21 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
>   }
>   #endif
>   
> +bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma)

According to the motivation provided in the comment, the function will no
longer need to be passed the parameter "struct kvm *kvm" but will rely on
the global parameters (plus vm_has_noncoherent_dma), removing "*kvm" ?

> +{
> +	/*
> +	 * If the TDP is enabled, the host MTRRs are ignored by TDP
> +	 * (shadow_memtype_mask is non-zero), and the VM has non-coherent DMA
> +	 * (DMA doesn't snoop CPU caches), KVM's ABI is to honor the memtype
> +	 * from the guest's MTRRs so that guest accesses to memory that is
> +	 * DMA'd aren't cached against the guest's wishes.
> +	 *
> +	 * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
> +	 * e.g. KVM will force UC memtype for host MMIO.
> +	 */
> +	return vm_has_noncoherent_dma && tdp_enabled && shadow_memtype_mask;
> +}
> +
>   int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>   {
>   	/*


* Re: [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs
  2023-10-07  7:00   ` Like Xu
@ 2023-10-09 19:52     ` Sean Christopherson
  2023-10-09 21:27       ` Sean Christopherson
  0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2023-10-09 19:52 UTC (permalink / raw)
  To: Like Xu
  Cc: Yan Zhao, pbonzini, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, kvm list, linux-kernel

On Sat, Oct 07, 2023, Like Xu wrote:
> On 14/7/2023 2:50 pm, Yan Zhao wrote:
> > Added helpers to check if KVM honors guest MTRRs.
> > The inner helper __kvm_mmu_honors_guest_mtrrs() is also provided to
> > outside callers for the purpose of checking if guest MTRRs were honored
> > before stopping non-coherent DMA.
> > 
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >   arch/x86/kvm/mmu.h     |  7 +++++++
> >   arch/x86/kvm/mmu/mmu.c | 15 +++++++++++++++
> >   2 files changed, 22 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index 92d5a1924fc1..38bd449226f6 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -235,6 +235,13 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> >   	return -(u32)fault & errcode;
> >   }
> > +bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma);
> > +
> > +static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
> > +{
> > +	return __kvm_mmu_honors_guest_mtrrs(kvm, kvm_arch_has_noncoherent_dma(kvm));
> > +}
> > +
> >   void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
> >   int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 1e5db621241f..b4f89f015c37 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4516,6 +4516,21 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
> >   }
> >   #endif
> > +bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma)
> 
> According to the motivation provided in the comment, the function will no
> longer need to be passed the parameter "struct kvm *kvm" but will rely on
> the global parameters (plus vm_has_noncoherent_dma), removing "*kvm" ?

Yeah, I'll fixup the commit to drop @kvm from the inner helper.  Thanks!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs
  2023-10-09 19:52     ` Sean Christopherson
@ 2023-10-09 21:27       ` Sean Christopherson
  2023-10-09 21:36         ` Sean Christopherson
  2023-10-10  3:46         ` Yan Zhao
  0 siblings, 2 replies; 40+ messages in thread
From: Sean Christopherson @ 2023-10-09 21:27 UTC (permalink / raw)
  To: Like Xu
  Cc: Yan Zhao, pbonzini, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, kvm list, linux-kernel

On Mon, Oct 09, 2023, Sean Christopherson wrote:
> On Sat, Oct 07, 2023, Like Xu wrote:
> > On 14/7/2023 2:50 pm, Yan Zhao wrote:
> > > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > > index 92d5a1924fc1..38bd449226f6 100644
> > > --- a/arch/x86/kvm/mmu.h
> > > +++ b/arch/x86/kvm/mmu.h
> > > @@ -235,6 +235,13 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> > >   	return -(u32)fault & errcode;
> > >   }
> > > +bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma);
> > > +
> > > +static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
> > > +{
> > > +	return __kvm_mmu_honors_guest_mtrrs(kvm, kvm_arch_has_noncoherent_dma(kvm));
> > > +}
> > > +
> > >   void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
> > >   int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 1e5db621241f..b4f89f015c37 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -4516,6 +4516,21 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
> > >   }
> > >   #endif
> > > +bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma)
> > 
> > According to the motivation provided in the comment, the function will no
> > longer need to be passed the parameter "struct kvm *kvm" but will rely on
> > the global parameters (plus vm_has_noncoherent_dma), removing "*kvm" ?
> 
> Yeah, I'll fixup the commit to drop @kvm from the inner helper.  Thanks!

Gah, and I gave more bad advice when I suggested this idea.  There's no need to
explicitly check tdp_enabled, as shadow_memtype_mask is set to zero if TDP is
disabled.  And that must be the case, e.g. make_spte() would generate a corrupt
SPTE if shadow_memtype_mask were non-zero on Intel with shadow paging.

Yan, can you take a look at what I ended up with (see below) to make sure it
looks sane/acceptable to you?

New hashes (assuming I didn't botch things and need even more fixup).

[1/5] KVM: x86/mmu: Add helpers to return if KVM honors guest MTRRs
      https://github.com/kvm-x86/linux/commit/ec1d8217d59b
[2/5] KVM: x86/mmu: Zap SPTEs when CR0.CD is toggled iff guest MTRRs are honored
      https://github.com/kvm-x86/linux/commit/40de16c10b9d
[3/5] KVM: x86/mmu: Zap SPTEs on MTRR update iff guest MTRRs are honored
      https://github.com/kvm-x86/linux/commit/defc3fae8d0f
[4/5] KVM: x86/mmu: Zap KVM TDP when noncoherent DMA assignment starts/stops
      https://github.com/kvm-x86/linux/commit/b344d331adeb
[5/5] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED
      https://github.com/kvm-x86/linux/commit/a4d14445c47d

commit ec1d8217d59bd7cb03ae4e80551fee987be98a4e
Author: Yan Zhao <yan.y.zhao@intel.com>
Date:   Fri Jul 14 14:50:06 2023 +0800

    KVM: x86/mmu: Add helpers to return if KVM honors guest MTRRs
    
    Add helpers to check if KVM honors guest MTRRs instead of open coding the
    logic in kvm_tdp_page_fault().  Future fixes and cleanups will also need
    to determine if KVM should honor guest MTRRs, e.g. for CR0.CD toggling and
    non-coherent DMA transitions.
    
    Provide an inner helper, __kvm_mmu_honors_guest_mtrrs(), so that KVM can
    if guest MTRRs were honored when stopping non-coherent DMA.
    
    Note, there is no need to explicitly check that TDP is enabled, KVM clears
    shadow_memtype_mask when TDP is disabled, i.e. it's non-zero if and only
    if EPT is enabled.
    
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
    Link: https://lore.kernel.org/r/20230714065006.20201-1-yan.y.zhao@intel.com
    Link: https://lore.kernel.org/r/20230714065043.20258-1-yan.y.zhao@intel.com
    [sean: squash into one patch, drop explicit TDP check, massage changelog]
    Signed-off-by: Sean Christopherson <seanjc@google.com>

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 253fb2093d5d..bb8c86eefac0 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -237,6 +237,13 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
        return -(u32)fault & errcode;
 }
 
+bool __kvm_mmu_honors_guest_mtrrs(bool vm_has_noncoherent_dma);
+
+static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
+{
+       return __kvm_mmu_honors_guest_mtrrs(kvm_arch_has_noncoherent_dma(kvm));
+}
+
 void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f7901cb4d2fa..5d3dc7119e57 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4479,21 +4479,28 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 }
 #endif
 
+bool __kvm_mmu_honors_guest_mtrrs(bool vm_has_noncoherent_dma)
+{
+       /*
+        * If host MTRRs are ignored (shadow_memtype_mask is non-zero), and the
+        * VM has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is
+        * to honor the memtype from the guest's MTRRs so that guest accesses
+        * to memory that is DMA'd aren't cached against the guest's wishes.
+        *
+        * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
+        * e.g. KVM will force UC memtype for host MMIO.
+        */
+       return vm_has_noncoherent_dma && shadow_memtype_mask;
+}
+
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
        /*
         * If the guest's MTRRs may be used to compute the "real" memtype,
         * restrict the mapping level to ensure KVM uses a consistent memtype
-        * across the entire mapping.  If the host MTRRs are ignored by TDP
-        * (shadow_memtype_mask is non-zero), and the VM has non-coherent DMA
-        * (DMA doesn't snoop CPU caches), KVM's ABI is to honor the memtype
-        * from the guest's MTRRs so that guest accesses to memory that is
-        * DMA'd aren't cached against the guest's wishes.
-        *
-        * Note, KVM may still ultimately ignore guest MTRRs for certain PFNs,
-        * e.g. KVM will force UC memtype for host MMIO.
+        * across the entire mapping.
         */
-       if (shadow_memtype_mask && kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
+       if (kvm_mmu_honors_guest_mtrrs(vcpu->kvm)) {
                for ( ; fault->max_level > PG_LEVEL_4K; --fault->max_level) {
                        int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
                        gfn_t base = gfn_round_for_level(fault->gfn,



* Re: [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs
  2023-10-09 21:27       ` Sean Christopherson
@ 2023-10-09 21:36         ` Sean Christopherson
  2023-10-10  3:46         ` Yan Zhao
  1 sibling, 0 replies; 40+ messages in thread
From: Sean Christopherson @ 2023-10-09 21:36 UTC (permalink / raw)
  To: Like Xu
  Cc: Yan Zhao, pbonzini, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, kvm list, linux-kernel

On Mon, Oct 09, 2023, Sean Christopherson wrote:
> On Mon, Oct 09, 2023, Sean Christopherson wrote:
> > On Sat, Oct 07, 2023, Like Xu wrote:
> > > On 14/7/2023 2:50 pm, Yan Zhao wrote:
> > > > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > > > index 92d5a1924fc1..38bd449226f6 100644
> > > > --- a/arch/x86/kvm/mmu.h
> > > > +++ b/arch/x86/kvm/mmu.h
> > > > @@ -235,6 +235,13 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> > > >   	return -(u32)fault & errcode;
> > > >   }
> > > > +bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma);
> > > > +
> > > > +static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
> > > > +{
> > > > +	return __kvm_mmu_honors_guest_mtrrs(kvm, kvm_arch_has_noncoherent_dma(kvm));
> > > > +}
> > > > +
> > > >   void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
> > > >   int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index 1e5db621241f..b4f89f015c37 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -4516,6 +4516,21 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
> > > >   }
> > > >   #endif
> > > > +bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma)
> > > 
> > > According to the motivation provided in the comment, the function will no
> > > longer need to be passed the parameter "struct kvm *kvm" but will rely on
> > > the global parameters (plus vm_has_noncoherent_dma), removing "*kvm" ?
> > 
> > Yeah, I'll fixup the commit to drop @kvm from the inner helper.  Thanks!
> 
> Gah, and I gave more bad advice when I suggested this idea.  There's no need to
> explicitly check tdp_enabled, as shadow_memtype_mask is set to zero if TDP is
> disabled.  And that must be the case, e.g. make_spte() would generate a corrupt
> SPTE if shadow_memtype_mask were non-zero on Intel with shadow paging.
> 
> Yan, can you take a look at what I ended up with (see below) to make sure it
> looks sane/acceptable to you?
> 
> New hashes (assuming I didn't botch things and need even more fixup).

Oof, today is not my day.  I forgot to fix the missing "check" in the changelog
that Yan reported.  So *these* are the new hashes, barring yet another goof on
my end.

[1/5] KVM: x86/mmu: Add helpers to return if KVM honors guest MTRRs
      https://github.com/kvm-x86/linux/commit/1affe455d66d
[2/5] KVM: x86/mmu: Zap SPTEs when CR0.CD is toggled iff guest MTRRs are honored
      https://github.com/kvm-x86/linux/commit/7a18c7c2b69a
[3/5] KVM: x86/mmu: Zap SPTEs on MTRR update iff guest MTRRs are honored
      https://github.com/kvm-x86/linux/commit/9a3768191d95
[4/5] KVM: x86/mmu: Zap KVM TDP when noncoherent DMA assignment starts/stops
      https://github.com/kvm-x86/linux/commit/68c320298404
[5/5] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED
      https://github.com/kvm-x86/linux/commit/8925b3194512


* Re: [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs
  2023-10-09 21:27       ` Sean Christopherson
  2023-10-09 21:36         ` Sean Christopherson
@ 2023-10-10  3:46         ` Yan Zhao
  2023-10-11  0:08           ` Sean Christopherson
  1 sibling, 1 reply; 40+ messages in thread
From: Yan Zhao @ 2023-10-10  3:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Like Xu, pbonzini, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, kvm list, linux-kernel

On Mon, Oct 09, 2023 at 02:27:16PM -0700, Sean Christopherson wrote:
> On Mon, Oct 09, 2023, Sean Christopherson wrote:
> > On Sat, Oct 07, 2023, Like Xu wrote:
> > > On 14/7/2023 2:50 pm, Yan Zhao wrote:
> > > > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > > > index 92d5a1924fc1..38bd449226f6 100644
> > > > --- a/arch/x86/kvm/mmu.h
> > > > +++ b/arch/x86/kvm/mmu.h
> > > > @@ -235,6 +235,13 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> > > >   	return -(u32)fault & errcode;
> > > >   }
> > > > +bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma);
> > > > +
> > > > +static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
> > > > +{
> > > > +	return __kvm_mmu_honors_guest_mtrrs(kvm, kvm_arch_has_noncoherent_dma(kvm));
> > > > +}
> > > > +
> > > >   void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
> > > >   int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index 1e5db621241f..b4f89f015c37 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -4516,6 +4516,21 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
> > > >   }
> > > >   #endif
> > > > +bool __kvm_mmu_honors_guest_mtrrs(struct kvm *kvm, bool vm_has_noncoherent_dma)
> > > 
> > > According to the motivation provided in the comment, the function will no
> > > longer need to be passed the parameter "struct kvm *kvm" but will rely on
> > > the global parameters (plus vm_has_noncoherent_dma), removing "*kvm" ?
> > 
> > Yeah, I'll fixup the commit to drop @kvm from the inner helper.  Thanks!
> 
> Gah, and I gave more bad advice when I suggested this idea.  There's no need to
> explicitly check tdp_enabled, as shadow_memtype_mask is set to zero if TDP is
> disabled.  And that must be the case, e.g. make_spte() would generate a corrupt
> SPTE if shadow_memtype_mask were non-zero on Intel with shadow paging.
> 
> Yan, can you take a look at what I ended up with (see below) to make sure it
> looks sane/acceptable to you?
Yes, tested and working on my side.
I think the reason we added the tdp_enabled check was the existing check in
patch 3. As the non-coherent DMA check is not on a hot path, the previous
double check is also fine :)

BTW, as param "kvm" is now removed from the helper, better to remove the word
"second" in comment in patch 4, i.e.

-        * So, specify the second parameter as true here to indicate
-        * non-coherent DMAs are/were involved and TDP zap might be
-        * necessary.
+        * So, specify the parameter as true here to indicate non-coherent
+        * DMAs are/were involved and TDP zap might be necessary.

Sorry and thanks a lot for the help on this series!


* Re: [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs
  2023-10-10  3:46         ` Yan Zhao
@ 2023-10-11  0:08           ` Sean Christopherson
  2023-10-11  1:47             ` Yan Zhao
  0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2023-10-11  0:08 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Like Xu, pbonzini, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, kvm list, linux-kernel

On Tue, Oct 10, 2023, Yan Zhao wrote:
> BTW, as param "kvm" is now removed from the helper, better to remove the word
> "second" in comment in patch 4, i.e.
> 
> -        * So, specify the second parameter as true here to indicate
> -        * non-coherent DMAs are/were involved and TDP zap might be
> -        * necessary.
> +        * So, specify the parameter as true here to indicate non-coherent
> +        * DMAs are/were involved and TDP zap might be necessary.
> 
> Sorry and thanks a lot for the help on this series!

Heh, don't be sorry, it's not your fault I can't get this quite right.  Fixed
up yet again, hopefully for the last time.  This is what I ended up with for the
comment:

	/*
	 * Non-coherent DMA assignment and de-assignment will affect
	 * whether KVM honors guest MTRRs and cause changes in memtypes
	 * in TDP.
	 * So, pass %true unconditionally to indicate non-coherent DMA was,
	 * or will be involved, and that zapping SPTEs might be necessary.
	 */

and the hashes:

[1/5] KVM: x86/mmu: Add helpers to return if KVM honors guest MTRRs
      https://github.com/kvm-x86/linux/commit/1affe455d66d
[2/5] KVM: x86/mmu: Zap SPTEs when CR0.CD is toggled iff guest MTRRs are honored
      https://github.com/kvm-x86/linux/commit/7a18c7c2b69a
[3/5] KVM: x86/mmu: Zap SPTEs on MTRR update iff guest MTRRs are honored
      https://github.com/kvm-x86/linux/commit/9a3768191d95
[4/5] KVM: x86/mmu: Zap KVM TDP when noncoherent DMA assignment starts/stops
      https://github.com/kvm-x86/linux/commit/362ff6dca541
[5/5] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED
      https://github.com/kvm-x86/linux/commit/c9f65a3f2d92
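
[Sketch, not from the thread: a reconstruction of where the finalized comment
above lands in patch 4, based on the changelogs.  The static helper's name and
the exact zap call are assumptions, not a quote of the applied commit.]

static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm)
{
	/*
	 * Non-coherent DMA assignment and de-assignment will affect
	 * whether KVM honors guest MTRRs and cause changes in memtypes
	 * in TDP.
	 * So, pass %true unconditionally to indicate non-coherent DMA was,
	 * or will be involved, and that zapping SPTEs might be necessary.
	 */
	if (__kvm_mmu_honors_guest_mtrrs(true))
		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
}

void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
{
	/* Zap only on the 0 -> 1 transition, i.e. when assignment starts. */
	if (atomic_inc_return(&kvm->arch.noncoherent_dma_count) == 1)
		kvm_noncoherent_dma_assignment_start_or_stop(kvm);
}

void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
{
	/* Zap only on the 1 -> 0 transition, i.e. when assignment stops. */
	if (!atomic_dec_return(&kvm->arch.noncoherent_dma_count))
		kvm_noncoherent_dma_assignment_start_or_stop(kvm);
}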


* Re: [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs
  2023-10-11  0:08           ` Sean Christopherson
@ 2023-10-11  1:47             ` Yan Zhao
  0 siblings, 0 replies; 40+ messages in thread
From: Yan Zhao @ 2023-10-11  1:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Like Xu, pbonzini, chao.gao, kai.huang, robert.hoo.linux,
	yuan.yao, kvm list, linux-kernel

On Tue, Oct 10, 2023 at 05:08:08PM -0700, Sean Christopherson wrote:
> On Tue, Oct 10, 2023, Yan Zhao wrote:
> > BTW, as param "kvm" is now removed from the helper, better to remove the word
> > "second" in comment in patch 4, i.e.
> > 
> > -        * So, specify the second parameter as true here to indicate
> > -        * non-coherent DMAs are/were involved and TDP zap might be
> > -        * necessary.
> > +        * So, specify the parameter as true here to indicate non-coherent
> > +        * DMAs are/were involved and TDP zap might be necessary.
> > 
> > Sorry and thanks a lot for the help on this series!
> 
> Heh, don't be sorry, it's not your fault I can't get this quite right.  Fixed
> up yet again, hopefully for the last time.  This is what I ended up with for the
> comment:
> 
> 	/*
> 	 * Non-coherent DMA assignment and de-assignment will affect
> 	 * whether KVM honors guest MTRRs and cause changes in memtypes
> 	 * in TDP.
> 	 * So, pass %true unconditionally to indicate non-coherent DMA was,
> 	 * or will be involved, and that zapping SPTEs might be necessary.
> 	 */
> 
> and the hashes:
> 
> [1/5] KVM: x86/mmu: Add helpers to return if KVM honors guest MTRRs
>       https://github.com/kvm-x86/linux/commit/1affe455d66d
> [2/5] KVM: x86/mmu: Zap SPTEs when CR0.CD is toggled iff guest MTRRs are honored
>       https://github.com/kvm-x86/linux/commit/7a18c7c2b69a
> [3/5] KVM: x86/mmu: Zap SPTEs on MTRR update iff guest MTRRs are honored
>       https://github.com/kvm-x86/linux/commit/9a3768191d95
> [4/5] KVM: x86/mmu: Zap KVM TDP when noncoherent DMA assignment starts/stops
>       https://github.com/kvm-x86/linux/commit/362ff6dca541
> [5/5] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED
>       https://github.com/kvm-x86/linux/commit/c9f65a3f2d92

Looks good to me, thanks!


end of thread

Thread overview: 40+ messages
2023-07-14  6:46 [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Yan Zhao
2023-07-14  6:50 ` [PATCH v4 01/12] KVM: x86/mmu: helpers to return if KVM honors guest MTRRs Yan Zhao
2023-10-07  7:00   ` Like Xu
2023-10-09 19:52     ` Sean Christopherson
2023-10-09 21:27       ` Sean Christopherson
2023-10-09 21:36         ` Sean Christopherson
2023-10-10  3:46         ` Yan Zhao
2023-10-11  0:08           ` Sean Christopherson
2023-10-11  1:47             ` Yan Zhao
2023-07-14  6:50 ` [PATCH v4 02/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper in kvm_tdp_page_fault() Yan Zhao
2023-07-14  6:51 ` [PATCH v4 03/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when CR0.CD toggles Yan Zhao
2023-07-14  6:51 ` [PATCH v4 04/12] KVM: x86/mmu: Use KVM honors guest MTRRs helper when update mtrr Yan Zhao
2023-07-14  6:52 ` [PATCH v4 05/12] KVM: x86/mmu: zap KVM TDP when noncoherent DMA assignment starts/stops Yan Zhao
2023-07-14  6:52 ` [PATCH v4 06/12] KVM: x86/mmu: move TDP zaps from guest MTRRs update to CR0.CD toggling Yan Zhao
2023-07-14  6:53 ` [PATCH v4 07/12] KVM: VMX: drop IPAT in memtype when CD=1 for KVM_X86_QUIRK_CD_NW_CLEARED Yan Zhao
2023-08-25 21:43   ` Sean Christopherson
2023-09-04  7:41     ` Yan Zhao
2023-07-14  6:53 ` [PATCH v4 08/12] KVM: x86: centralize code to get CD=1 memtype when guest MTRRs are honored Yan Zhao
2023-08-25 21:46   ` Sean Christopherson
2023-09-04  7:46     ` Yan Zhao
2023-07-14  6:54 ` [PATCH v4 09/12] KVM: x86/mmu: serialize vCPUs to zap gfn " Yan Zhao
2023-08-25 22:47   ` Sean Christopherson
2023-09-04  8:24     ` Yan Zhao
2023-07-14  6:55 ` [PATCH v4 10/12] KVM: x86/mmu: fine-grained gfn zap " Yan Zhao
2023-08-25 23:13   ` Sean Christopherson
2023-09-04  8:37     ` Yan Zhao
2023-07-14  6:56 ` [PATCH v4 11/12] KVM: x86/mmu: split a single gfn zap range " Yan Zhao
2023-08-25 23:15   ` Sean Christopherson
2023-09-04  8:39     ` Yan Zhao
2023-07-14  6:56 ` [PATCH v4 12/12] KVM: x86/mmu: convert kvm_zap_gfn_range() to use shared mmu_lock in TDP MMU Yan Zhao
2023-08-25 21:34   ` Sean Christopherson
2023-09-04  7:31     ` Yan Zhao
2023-09-05 22:31       ` Sean Christopherson
2023-09-06  0:50         ` Yan Zhao
2023-08-25 23:17 ` [PATCH v4 00/12] KVM: x86/mmu: refine memtype related mmu zap Sean Christopherson
2023-09-04  8:48   ` Yan Zhao
2023-10-05  1:29 ` Sean Christopherson
2023-10-05  2:19   ` Huang, Kai
2023-10-05  2:28     ` Sean Christopherson
2023-10-06  0:50       ` Sean Christopherson
