kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table
@ 2021-04-15 11:50 Yanan Wang
  2021-04-15 11:50 ` [PATCH v5 1/6] KVM: arm64: Introduce KVM_PGTABLE_S2_GUEST stage-2 flag Yanan Wang
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Yanan Wang @ 2021-04-15 11:50 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Quentin Perret, Alexandru Elisei,
	kvmarm, linux-arm-kernel, kvm, linux-kernel
  Cc: Catalin Marinas, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, wanghaibin.wang, zhukeqian1, yuzenghui, Yanan Wang

Hi,

This series makes some efficiency improvement of guest stage-2 page
table code, and there are some test results to quantify the benefit.
The code has been re-arranged based on the latest kvmarm/next tree.

Descriptions:
We currently uniformly permorm CMOs of D-cache and I-cache in function
user_mem_abort before calling the fault handlers. If we get concurrent
guest faults(e.g. translation faults, permission faults) or some really
unnecessary guest faults caused by BBM, CMOs for the first vcpu are
necessary while the others later are not.

By moving CMOs to the fault handlers, we can easily identify conditions
where they are really needed and avoid the unnecessary ones. As it's a
time consuming process to perform CMOs especially when flushing a block
range, so this solution reduces much load of kvm and improve efficiency
of the stage-2 page table code.

In this series, patch #1, #3, #4 make preparation for place movement
of CMOs (adapt to the latest stage-2 page table framework). And patch
#2, #5 move CMOs of D-cache and I-cache to the fault handlers. Patch
#6 introduces a new way to distinguish cases of memcache allocations.

The following are results in v3 to represent the benefit introduced
by movement of CMOs, and they were tested by [1] (kvm/selftest) that
I have posted recently.
[1] https://lore.kernel.org/lkml/20210302125751.19080-1-wangyanan55@huawei.com/

When there are muitiple vcpus concurrently accessing the same memory
region, we can test the execution time of KVM creating new mappings,
updating the permissions of old mappings from RO to RW, and the time
of re-creating the blocks after they have been split.

hardware platform: HiSilicon Kunpeng920 Server
host kernel: Linux mainline v5.12-rc2

cmdline: ./kvm_page_table_test -m 4 -s anonymous -b 1G -v 80
           (80 vcpus, 1G memory, page mappings(normal 4K))
KVM_CREATE_MAPPINGS: before 104.35s -> after  90.42s  +13.35%
KVM_UPDATE_MAPPINGS: before  78.64s -> after  75.45s  + 4.06%

cmdline: ./kvm_page_table_test -m 4 -s anonymous_thp -b 20G -v 40
           (40 vcpus, 20G memory, block mappings(THP 2M))
KVM_CREATE_MAPPINGS: before  15.66s -> after   6.92s  +55.80%
KVM_UPDATE_MAPPINGS: before 178.80s -> after 123.35s  +31.00%
KVM_REBUILD_BLOCKS:  before 187.34s -> after 131.76s  +30.65%

cmdline: ./kvm_page_table_test -m 4 -s anonymous_hugetlb_1gb -b 20G -v 40
           (40 vcpus, 20G memory, block mappings(HUGETLB 1G))
KVM_CREATE_MAPPINGS: before 104.54s -> after   3.70s  +96.46%
KVM_UPDATE_MAPPINGS: before 174.20s -> after 115.94s  +33.44%
KVM_REBUILD_BLOCKS:  before 103.95s -> after   2.96s  +97.15%

---

Changelogs:
v4->v5:
- rebased on the latest kvmarm/tree to adapt to the new stage-2 page-table code
- v4: https://lore.kernel.org/lkml/20210409033652.28316-1-wangyanan55@huawei.com/

v3->v4:
- perform D-cache flush if we are not mapping device memory
- rebased on top of mainline v5.12-rc6
- v3: https://lore.kernel.org/lkml/20210326031654.3716-1-wangyanan55@huawei.com/

v2->v3:
- drop patch #3 in v2
- retest v3 based on v5.12-rc2
- v2: https://lore.kernel.org/lkml/20210310094319.18760-1-wangyanan55@huawei.com/

v1->v2:
- rebased on top of mainline v5.12-rc2
- also move CMOs of I-cache to the fault handlers
- retest v2 based on v5.12-rc2
- v1: https://lore.kernel.org/lkml/20210208112250.163568-1-wangyanan55@huawei.com/

---

Yanan Wang (6):
  KVM: arm64: Introduce KVM_PGTABLE_S2_GUEST stage-2 flag
  KVM: arm64: Move D-cache flush to the fault handlers
  KVM: arm64: Add mm_ops member for structure stage2_attr_data
  KVM: arm64: Provide invalidate_icache_range at non-VHE EL2
  KVM: arm64: Move I-cache flush to the fault handlers
  KVM: arm64: Distinguish cases of memcache allocations completely

 arch/arm64/include/asm/kvm_mmu.h     | 31 -------------
 arch/arm64/include/asm/kvm_pgtable.h | 38 ++++++++++------
 arch/arm64/kvm/hyp/nvhe/cache.S      | 11 +++++
 arch/arm64/kvm/hyp/pgtable.c         | 65 +++++++++++++++++++++++-----
 arch/arm64/kvm/mmu.c                 | 51 ++++++++--------------
 5 files changed, 107 insertions(+), 89 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 1/6] KVM: arm64: Introduce KVM_PGTABLE_S2_GUEST stage-2 flag
  2021-04-15 11:50 [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table Yanan Wang
@ 2021-04-15 11:50 ` Yanan Wang
  2021-06-02 10:43   ` Quentin Perret
  2021-04-15 11:50 ` [PATCH v5 2/6] KVM: arm64: Move D-cache flush to the fault handlers Yanan Wang
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Yanan Wang @ 2021-04-15 11:50 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Quentin Perret, Alexandru Elisei,
	kvmarm, linux-arm-kernel, kvm, linux-kernel
  Cc: Catalin Marinas, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, wanghaibin.wang, zhukeqian1, yuzenghui, Yanan Wang

The stage-2 page table code in pgtable.c now is generally used for
guest stage-2 and host stage-2. There may be some different issues
between guest S2 page-table and host S2 page-table that we should
consider, e.g., whether CMOs are needed when creating a new mapping.

So introduce the KVM_PGTABLE_S2_GUEST flag to determine if we are
doing something about guest stage-2. This flag will be used in a
coming patch, in which we will move CMOs for guest to pgtable.c.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 38 ++++++++++++++++++----------
 arch/arm64/kvm/mmu.c                 |  3 ++-
 2 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index c3674c47d48c..a43cbe697b37 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -61,10 +61,12 @@ struct kvm_pgtable_mm_ops {
  * @KVM_PGTABLE_S2_NOFWB:	Don't enforce Normal-WB even if the CPUs have
  *				ARM64_HAS_STAGE2_FWB.
  * @KVM_PGTABLE_S2_IDMAP:	Only use identity mappings.
+ * @KVM_PGTABLE_S2_GUEST:	Whether the page-tables are guest stage-2.
  */
 enum kvm_pgtable_stage2_flags {
 	KVM_PGTABLE_S2_NOFWB			= BIT(0),
 	KVM_PGTABLE_S2_IDMAP			= BIT(1),
+	KVM_PGTABLE_S2_GUEST			= BIT(2),
 };
 
 /**
@@ -221,12 +223,10 @@ int kvm_pgtable_stage2_init_flags(struct kvm_pgtable *pgt, struct kvm_arch *arch
 				  struct kvm_pgtable_mm_ops *mm_ops,
 				  enum kvm_pgtable_stage2_flags flags);
 
-#define kvm_pgtable_stage2_init(pgt, arch, mm_ops) \
-	kvm_pgtable_stage2_init_flags(pgt, arch, mm_ops, 0)
-
 /**
  * kvm_pgtable_stage2_destroy() - Destroy an unused guest stage-2 page-table.
- * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @pgt:	Page-table structure initialised by function
+ *		kvm_pgtable_stage2_init_flags().
  *
  * The page-table is assumed to be unreachable by any hardware walkers prior
  * to freeing and therefore no TLB invalidation is performed.
@@ -235,7 +235,8 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
 
 /**
  * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
- * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @pgt:	Page-table structure initialised by function
+ *		kvm_pgtable_stage2_init_flags().
  * @addr:	Intermediate physical address at which to place the mapping.
  * @size:	Size of the mapping.
  * @phys:	Physical address of the memory to map.
@@ -268,7 +269,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 /**
  * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
  *				    track ownership.
- * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @pgt:	Page-table structure initialised by function
+ *		kvm_pgtable_stage2_init_flags().
  * @addr:	Base intermediate physical address to annotate.
  * @size:	Size of the annotated range.
  * @mc:		Cache of pre-allocated and zeroed memory from which to allocate
@@ -287,7 +289,8 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 
 /**
  * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
- * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @pgt:	Page-table structure initialised by function
+ *		kvm_pgtable_stage2_init_flags().
  * @addr:	Intermediate physical address from which to remove the mapping.
  * @size:	Size of the mapping.
  *
@@ -307,7 +310,8 @@ int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
 /**
  * kvm_pgtable_stage2_wrprotect() - Write-protect guest stage-2 address range
  *                                  without TLB invalidation.
- * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @pgt:	Page-table structure initialised by function
+ *		kvm_pgtable_stage2_init_flags().
  * @addr:	Intermediate physical address from which to write-protect,
  * @size:	Size of the range.
  *
@@ -324,7 +328,8 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
 
 /**
  * kvm_pgtable_stage2_mkyoung() - Set the access flag in a page-table entry.
- * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @pgt:	Page-table structure initialised by function
+ *		kvm_pgtable_stage2_init_flags().
  * @addr:	Intermediate physical address to identify the page-table entry.
  *
  * The offset of @addr within a page is ignored.
@@ -338,7 +343,8 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
 
 /**
  * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
- * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @pgt:	Page-table structure initialised by function
+ *		kvm_pgtable_stage2_init_flags().
  * @addr:	Intermediate physical address to identify the page-table entry.
  *
  * The offset of @addr within a page is ignored.
@@ -357,7 +363,8 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
 /**
  * kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
  *				      page-table entry.
- * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @pgt:	Page-table structure initialised by function
+ *		kvm_pgtable_stage2_init_flags().
  * @addr:	Intermediate physical address to identify the page-table entry.
  * @prot:	Additional permissions to grant for the mapping.
  *
@@ -376,7 +383,8 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 /**
  * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
  *				   access flag set.
- * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @pgt:	Page-table structure initialised by function
+ *		kvm_pgtable_stage2_init_flags().
  * @addr:	Intermediate physical address to identify the page-table entry.
  *
  * The offset of @addr within a page is ignored.
@@ -389,7 +397,8 @@ bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
  * kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
  * 				      of Coherency for guest stage-2 address
  *				      range.
- * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @pgt:	Page-table structure initialised by function
+ *		kvm_pgtable_stage2_init_flags().
  * @addr:	Intermediate physical address from which to flush.
  * @size:	Size of the range.
  *
@@ -428,7 +437,8 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
  * kvm_pgtable_stage2_find_range() - Find a range of Intermediate Physical
  *				     Addresses with compatible permission
  *				     attributes.
- * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @pgt:	Page-table structure initialised by function
+ *		kvm_pgtable_stage2_init_flags().
  * @addr:	Address that must be covered by the range.
  * @prot:	Protection attributes that the range must be compatible with.
  * @range:	Range structure used to limit the search space at call time and
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index cd4d51ae3d4a..2cfcfc5f4e4e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -457,7 +457,8 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	if (!pgt)
 		return -ENOMEM;
 
-	err = kvm_pgtable_stage2_init(pgt, &kvm->arch, &kvm_s2_mm_ops);
+	err = kvm_pgtable_stage2_init_flags(pgt, &kvm->arch, &kvm_s2_mm_ops,
+					    KVM_PGTABLE_S2_GUEST);
 	if (err)
 		goto out_free_pgtable;
 
-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 2/6] KVM: arm64: Move D-cache flush to the fault handlers
  2021-04-15 11:50 [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table Yanan Wang
  2021-04-15 11:50 ` [PATCH v5 1/6] KVM: arm64: Introduce KVM_PGTABLE_S2_GUEST stage-2 flag Yanan Wang
@ 2021-04-15 11:50 ` Yanan Wang
       [not found]   ` <877djc1sca.wl-maz@kernel.org>
  2021-04-15 11:50 ` [PATCH v5 3/6] KVM: arm64: Add mm_ops member for structure stage2_attr_data Yanan Wang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Yanan Wang @ 2021-04-15 11:50 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Quentin Perret, Alexandru Elisei,
	kvmarm, linux-arm-kernel, kvm, linux-kernel
  Cc: Catalin Marinas, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, wanghaibin.wang, zhukeqian1, yuzenghui, Yanan Wang

We currently uniformly permorm CMOs of D-cache and I-cache in function
user_mem_abort before calling the fault handlers. If we get concurrent
guest faults(e.g. translation faults, permission faults) or some really
unnecessary guest faults caused by BBM, CMOs for the first vcpu are
necessary while the others later are not.

By moving CMOs to the fault handlers, we can easily identify conditions
where they are really needed and avoid the unnecessary ones. As it's a
time consuming process to perform CMOs especially when flushing a block
range, so this solution reduces much load of kvm and improve efficiency
of the page table code.

This patch only moves clean of D-cache to the map path, and drop the
original APIs in mmu.c/mmu.h for D-cache maintenance by using what we
already have in pgtable.c. Change about the I-side will come from a
later patch.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 arch/arm64/include/asm/kvm_mmu.h | 16 ----------------
 arch/arm64/kvm/hyp/pgtable.c     | 20 ++++++++++++++------
 arch/arm64/kvm/mmu.c             | 14 +++-----------
 3 files changed, 17 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 25ed956f9af1..e9b163c5f023 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -187,22 +187,6 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
-static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
-	void *va = page_address(pfn_to_page(pfn));
-
-	/*
-	 * With FWB, we ensure that the guest always accesses memory using
-	 * cacheable attributes, and we don't have to clean to PoC when
-	 * faulting in pages. Furthermore, FWB implies IDC, so cleaning to
-	 * PoU is not required either in this case.
-	 */
-	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
-		return;
-
-	kvm_flush_dcache_to_poc(va, size);
-}
-
 static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
 						  unsigned long size)
 {
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index c37c1dc4feaf..e3606c9dcec7 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -562,6 +562,12 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
 	return !!pte;
 }
 
+static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
+{
+	u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
+	return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
+}
+
 static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
 			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
 {
@@ -583,6 +589,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 {
 	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
+	struct kvm_pgtable *pgt = data->mmu->pgt;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
 	if (!kvm_block_mapping_supported(addr, end, phys, level))
@@ -606,6 +613,13 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
 	}
 
+	/* Perform CMOs before installation of the guest stage-2 PTE */
+	if (pgt->flags & KVM_PGTABLE_S2_GUEST) {
+		if (stage2_pte_cacheable(pgt, new) && !stage2_has_fwb(pgt))
+			__flush_dcache_area(mm_ops->phys_to_virt(phys),
+					    granule);
+	}
+
 	smp_store_release(ptep, new);
 	if (stage2_pte_is_counted(new))
 		mm_ops->get_page(ptep);
@@ -798,12 +812,6 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	return ret;
 }
 
-static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
-{
-	u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
-	return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
-}
-
 static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 2cfcfc5f4e4e..86f7dd1c234f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -694,11 +694,6 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static void clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
-	__clean_dcache_guest_page(pfn, size);
-}
-
 static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
 {
 	__invalidate_icache_guest_page(pfn, size);
@@ -972,9 +967,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (writable)
 		prot |= KVM_PGTABLE_PROT_W;
 
-	if (fault_status != FSC_PERM && !device)
-		clean_dcache_guest_page(pfn, vma_pagesize);
-
 	if (exec_fault) {
 		prot |= KVM_PGTABLE_PROT_X;
 		invalidate_icache_guest_page(pfn, vma_pagesize);
@@ -1234,10 +1226,10 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	trace_kvm_set_spte_hva(hva);
 
 	/*
-	 * We've moved a page around, probably through CoW, so let's treat it
-	 * just like a translation fault and clean the cache to the PoC.
+	 * We've moved a page around, probably through CoW, so let's treat
+	 * it just like a translation fault and the map handler will clean
+	 * the cache to the PoC.
 	 */
-	clean_dcache_guest_page(pfn, PAGE_SIZE);
 	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
 	return 0;
 }
-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 3/6] KVM: arm64: Add mm_ops member for structure stage2_attr_data
  2021-04-15 11:50 [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table Yanan Wang
  2021-04-15 11:50 ` [PATCH v5 1/6] KVM: arm64: Introduce KVM_PGTABLE_S2_GUEST stage-2 flag Yanan Wang
  2021-04-15 11:50 ` [PATCH v5 2/6] KVM: arm64: Move D-cache flush to the fault handlers Yanan Wang
@ 2021-04-15 11:50 ` Yanan Wang
  2021-04-15 11:50 ` [PATCH v5 4/6] KVM: arm64: Provide invalidate_icache_range at non-VHE EL2 Yanan Wang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Yanan Wang @ 2021-04-15 11:50 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Quentin Perret, Alexandru Elisei,
	kvmarm, linux-arm-kernel, kvm, linux-kernel
  Cc: Catalin Marinas, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, wanghaibin.wang, zhukeqian1, yuzenghui, Yanan Wang

Also add a mm_ops member for structure stage2_attr_data, since we
will move I-cache maintenance for guest stage-2 to the permission
path and as a result will need mm_ops for address transformation.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index e3606c9dcec7..b480f6d1171e 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -869,10 +869,11 @@ int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 }
 
 struct stage2_attr_data {
-	kvm_pte_t	attr_set;
-	kvm_pte_t	attr_clr;
-	kvm_pte_t	pte;
-	u32		level;
+	kvm_pte_t			attr_set;
+	kvm_pte_t			attr_clr;
+	kvm_pte_t			pte;
+	u32				level;
+	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
 static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
@@ -911,6 +912,7 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 	struct stage2_attr_data data = {
 		.attr_set	= attr_set & attr_mask,
 		.attr_clr	= attr_clr & attr_mask,
+		.mm_ops		= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_attr_walker,
-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 4/6] KVM: arm64: Provide invalidate_icache_range at non-VHE EL2
  2021-04-15 11:50 [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table Yanan Wang
                   ` (2 preceding siblings ...)
  2021-04-15 11:50 ` [PATCH v5 3/6] KVM: arm64: Add mm_ops member for structure stage2_attr_data Yanan Wang
@ 2021-04-15 11:50 ` Yanan Wang
       [not found]   ` <875yyw1s73.wl-maz@kernel.org>
  2021-04-15 11:50 ` [PATCH v5 5/6] KVM: arm64: Move I-cache flush to the fault handlers Yanan Wang
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Yanan Wang @ 2021-04-15 11:50 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Quentin Perret, Alexandru Elisei,
	kvmarm, linux-arm-kernel, kvm, linux-kernel
  Cc: Catalin Marinas, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, wanghaibin.wang, zhukeqian1, yuzenghui, Yanan Wang

We want to move I-cache maintenance for the guest to the stage-2
page table code for performance improvement. Before it can work,
we should first make function invalidate_icache_range available
to non-VHE EL2 to avoid compiling or program running error, as
pgtable.c is now linked into the non-VHE EL2 code for pKVM mode.

In this patch, we only introduce symbol of invalidate_icache_range
with no real functionality in nvhe/cache.S, because there haven't
been situations found currently where I-cache maintenance is also
needed in non-VHE EL2 for pKVM mode.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 arch/arm64/kvm/hyp/nvhe/cache.S | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/cache.S b/arch/arm64/kvm/hyp/nvhe/cache.S
index 36cef6915428..a125ec9aeed2 100644
--- a/arch/arm64/kvm/hyp/nvhe/cache.S
+++ b/arch/arm64/kvm/hyp/nvhe/cache.S
@@ -11,3 +11,14 @@ SYM_FUNC_START_PI(__flush_dcache_area)
 	dcache_by_line_op civac, sy, x0, x1, x2, x3
 	ret
 SYM_FUNC_END_PI(__flush_dcache_area)
+
+/*
+ *	invalidate_icache_range(start,end)
+ *
+ *	Ensure that the I cache is invalid within specified region.
+ *
+ *	- start   - virtual start address of region
+ *	- end     - virtual end address of region
+ */
+SYM_FUNC_START(invalidate_icache_range)
+SYM_FUNC_END(invalidate_icache_range)
-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 5/6] KVM: arm64: Move I-cache flush to the fault handlers
  2021-04-15 11:50 [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table Yanan Wang
                   ` (3 preceding siblings ...)
  2021-04-15 11:50 ` [PATCH v5 4/6] KVM: arm64: Provide invalidate_icache_range at non-VHE EL2 Yanan Wang
@ 2021-04-15 11:50 ` Yanan Wang
  2021-06-02 10:58   ` Quentin Perret
  2021-04-15 11:50 ` [PATCH v5 6/6] KVM: arm64: Distinguish cases of memcache allocations completely Yanan Wang
  2021-05-12 12:54 ` [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table wangyanan (Y)
  6 siblings, 1 reply; 17+ messages in thread
From: Yanan Wang @ 2021-04-15 11:50 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Quentin Perret, Alexandru Elisei,
	kvmarm, linux-arm-kernel, kvm, linux-kernel
  Cc: Catalin Marinas, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, wanghaibin.wang, zhukeqian1, yuzenghui, Yanan Wang

In this patch, we move invalidation of I-cache to the fault handlers to
avoid unnecessary I-cache maintenances. On the map path, invalidate the
I-cache if we are going to create an executable stage-2 mapping for guest.
And on the permission path, invalidate the I-cache if we are going to add
an executable permission to the existing guest stage-2 mapping.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 arch/arm64/include/asm/kvm_mmu.h | 15 --------------
 arch/arm64/kvm/hyp/pgtable.c     | 35 +++++++++++++++++++++++++++++++-
 arch/arm64/kvm/mmu.c             |  9 +-------
 3 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index e9b163c5f023..155492fe5b15 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -187,21 +187,6 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
-static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
-						  unsigned long size)
-{
-	if (icache_is_aliasing()) {
-		/* any kind of VIPT cache */
-		__flush_icache_all();
-	} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
-		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
-		void *va = page_address(pfn_to_page(pfn));
-
-		invalidate_icache_range((unsigned long)va,
-					(unsigned long)va + size);
-	}
-}
-
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b480f6d1171e..9f4429d80df0 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -568,6 +568,26 @@ static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
 	return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
 }
 
+static bool stage2_pte_executable(kvm_pte_t pte)
+{
+	return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
+}
+
+static void stage2_invalidate_icache(void *addr, u64 size)
+{
+	if (icache_is_aliasing()) {
+		/* Any kind of VIPT cache */
+		__flush_icache_all();
+	} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
+		/*
+		 * See comment in __kvm_tlb_flush_vmid_ipa().
+		 * Invalidate PIPT, or VPIPT at EL2.
+		 */
+		invalidate_icache_range((unsigned long)addr,
+					(unsigned long)addr + size);
+	}
+}
+
 static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
 			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
 {
@@ -618,6 +638,10 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 		if (stage2_pte_cacheable(pgt, new) && !stage2_has_fwb(pgt))
 			__flush_dcache_area(mm_ops->phys_to_virt(phys),
 					    granule);
+
+		if (stage2_pte_executable(new))
+			stage2_invalidate_icache(mm_ops->phys_to_virt(phys),
+						 granule);
 	}
 
 	smp_store_release(ptep, new);
@@ -896,8 +920,17 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 * but worst-case the access flag update gets lost and will be
 	 * set on the next access instead.
 	 */
-	if (data->pte != pte)
+	if (data->pte != pte) {
+		/*
+		 * Invalidate the instruction cache before updating
+		 * if we are going to add the executable permission
+		 * for the guest stage-2 PTE.
+		 */
+		if (!stage2_pte_executable(*ptep) && stage2_pte_executable(pte))
+			stage2_invalidate_icache(kvm_pte_follow(pte, data->mm_ops),
+						 kvm_granule_size(level));
 		WRITE_ONCE(*ptep, pte);
+	}
 
 	return 0;
 }
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 86f7dd1c234f..aa536392b308 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -694,11 +694,6 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
-	__invalidate_icache_guest_page(pfn, size);
-}
-
 static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
 {
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
@@ -967,10 +962,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (writable)
 		prot |= KVM_PGTABLE_PROT_W;
 
-	if (exec_fault) {
+	if (exec_fault)
 		prot |= KVM_PGTABLE_PROT_X;
-		invalidate_icache_guest_page(pfn, vma_pagesize);
-	}
 
 	if (device)
 		prot |= KVM_PGTABLE_PROT_DEVICE;
-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 6/6] KVM: arm64: Distinguish cases of memcache allocations completely
  2021-04-15 11:50 [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table Yanan Wang
                   ` (4 preceding siblings ...)
  2021-04-15 11:50 ` [PATCH v5 5/6] KVM: arm64: Move I-cache flush to the fault handlers Yanan Wang
@ 2021-04-15 11:50 ` Yanan Wang
  2021-06-02 11:07   ` Quentin Perret
  2021-05-12 12:54 ` [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table wangyanan (Y)
  6 siblings, 1 reply; 17+ messages in thread
From: Yanan Wang @ 2021-04-15 11:50 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Quentin Perret, Alexandru Elisei,
	kvmarm, linux-arm-kernel, kvm, linux-kernel
  Cc: Catalin Marinas, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, wanghaibin.wang, zhukeqian1, yuzenghui, Yanan Wang

With a guest translation fault, the memcache pages are not needed if KVM
is only about to install a new leaf entry into the existing page table.
And with a guest permission fault, the memcache pages are also not needed
for a write_fault in dirty-logging time if KVM is only about to update
the existing leaf entry instead of collapsing a block entry into a table.

By comparing fault_granule and vma_pagesize, cases that require allocations
from memcache and cases that don't can be distinguished completely.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 arch/arm64/kvm/mmu.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index aa536392b308..9e35aa5d29f2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -895,19 +895,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn = fault_ipa >> PAGE_SHIFT;
 	mmap_read_unlock(current->mm);
 
-	/*
-	 * Permission faults just need to update the existing leaf entry,
-	 * and so normally don't require allocations from the memcache. The
-	 * only exception to this is when dirty logging is enabled at runtime
-	 * and a write fault needs to collapse a block entry into a table.
-	 */
-	if (fault_status != FSC_PERM || (logging_active && write_fault)) {
-		ret = kvm_mmu_topup_memory_cache(memcache,
-						 kvm_mmu_cache_min_pages(kvm));
-		if (ret)
-			return ret;
-	}
-
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	/*
 	 * Ensure the read of mmu_notifier_seq happens before we call
@@ -970,6 +957,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
 		prot |= KVM_PGTABLE_PROT_X;
 
+	/*
+	 * Allocations from the memcache are required only when granule of the
+	 * lookup level where the guest fault happened exceeds vma_pagesize,
+	 * which means new page tables will be created in the fault handlers.
+	 */
+	if (fault_granule > vma_pagesize) {
+		ret = kvm_mmu_topup_memory_cache(memcache,
+						 kvm_mmu_cache_min_pages(kvm));
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * Under the premise of getting a FSC_PERM fault, we just need to relax
 	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
-- 
2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table
  2021-04-15 11:50 [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table Yanan Wang
                   ` (5 preceding siblings ...)
  2021-04-15 11:50 ` [PATCH v5 6/6] KVM: arm64: Distinguish cases of memcache allocations completely Yanan Wang
@ 2021-05-12 12:54 ` wangyanan (Y)
  6 siblings, 0 replies; 17+ messages in thread
From: wangyanan (Y) @ 2021-05-12 12:54 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Quentin Perret, Alexandru Elisei,
	kvmarm, linux-arm-kernel, kvm, linux-kernel
  Cc: Catalin Marinas, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, wanghaibin.wang, zhukeqian1, yuzenghui

A really gentle ping ...

Sincerely,
Yanan


On 2021/4/15 19:50, Yanan Wang wrote:
> Hi,
>
> This series makes some efficiency improvement of guest stage-2 page
> table code, and there are some test results to quantify the benefit.
> The code has been re-arranged based on the latest kvmarm/next tree.
>
> Descriptions:
> We currently uniformly permorm CMOs of D-cache and I-cache in function
> user_mem_abort before calling the fault handlers. If we get concurrent
> guest faults(e.g. translation faults, permission faults) or some really
> unnecessary guest faults caused by BBM, CMOs for the first vcpu are
> necessary while the others later are not.
>
> By moving CMOs to the fault handlers, we can easily identify conditions
> where they are really needed and avoid the unnecessary ones. As it's a
> time consuming process to perform CMOs especially when flushing a block
> range, so this solution reduces much load of kvm and improve efficiency
> of the stage-2 page table code.
>
> In this series, patch #1, #3, #4 make preparation for place movement
> of CMOs (adapt to the latest stage-2 page table framework). And patch
> #2, #5 move CMOs of D-cache and I-cache to the fault handlers. Patch
> #6 introduces a new way to distinguish cases of memcache allocations.
>
> The following are results in v3 to represent the benefit introduced
> by movement of CMOs, and they were tested by [1] (kvm/selftest) that
> I have posted recently.
> [1] https://lore.kernel.org/lkml/20210302125751.19080-1-wangyanan55@huawei.com/
>
> When there are muitiple vcpus concurrently accessing the same memory
> region, we can test the execution time of KVM creating new mappings,
> updating the permissions of old mappings from RO to RW, and the time
> of re-creating the blocks after they have been split.
>
> hardware platform: HiSilicon Kunpeng920 Server
> host kernel: Linux mainline v5.12-rc2
>
> cmdline: ./kvm_page_table_test -m 4 -s anonymous -b 1G -v 80
>             (80 vcpus, 1G memory, page mappings(normal 4K))
> KVM_CREATE_MAPPINGS: before 104.35s -> after  90.42s  +13.35%
> KVM_UPDATE_MAPPINGS: before  78.64s -> after  75.45s  + 4.06%
>
> cmdline: ./kvm_page_table_test -m 4 -s anonymous_thp -b 20G -v 40
>             (40 vcpus, 20G memory, block mappings(THP 2M))
> KVM_CREATE_MAPPINGS: before  15.66s -> after   6.92s  +55.80%
> KVM_UPDATE_MAPPINGS: before 178.80s -> after 123.35s  +31.00%
> KVM_REBUILD_BLOCKS:  before 187.34s -> after 131.76s  +30.65%
>
> cmdline: ./kvm_page_table_test -m 4 -s anonymous_hugetlb_1gb -b 20G -v 40
>             (40 vcpus, 20G memory, block mappings(HUGETLB 1G))
> KVM_CREATE_MAPPINGS: before 104.54s -> after   3.70s  +96.46%
> KVM_UPDATE_MAPPINGS: before 174.20s -> after 115.94s  +33.44%
> KVM_REBUILD_BLOCKS:  before 103.95s -> after   2.96s  +97.15%
>
> ---
>
> Changelogs:
> v4->v5:
> - rebased on the latest kvmarm/tree to adapt to the new stage-2 page-table code
> - v4: https://lore.kernel.org/lkml/20210409033652.28316-1-wangyanan55@huawei.com/
>
> v3->v4:
> - perform D-cache flush if we are not mapping device memory
> - rebased on top of mainline v5.12-rc6
> - v3: https://lore.kernel.org/lkml/20210326031654.3716-1-wangyanan55@huawei.com/
>
> v2->v3:
> - drop patch #3 in v2
> - retest v3 based on v5.12-rc2
> - v2: https://lore.kernel.org/lkml/20210310094319.18760-1-wangyanan55@huawei.com/
>
> v1->v2:
> - rebased on top of mainline v5.12-rc2
> - also move CMOs of I-cache to the fault handlers
> - retest v2 based on v5.12-rc2
> - v1: https://lore.kernel.org/lkml/20210208112250.163568-1-wangyanan55@huawei.com/
>
> ---
>
> Yanan Wang (6):
>    KVM: arm64: Introduce KVM_PGTABLE_S2_GUEST stage-2 flag
>    KVM: arm64: Move D-cache flush to the fault handlers
>    KVM: arm64: Add mm_ops member for structure stage2_attr_data
>    KVM: arm64: Provide invalidate_icache_range at non-VHE EL2
>    KVM: arm64: Move I-cache flush to the fault handlers
>    KVM: arm64: Distinguish cases of memcache allocations completely
>
>   arch/arm64/include/asm/kvm_mmu.h     | 31 -------------
>   arch/arm64/include/asm/kvm_pgtable.h | 38 ++++++++++------
>   arch/arm64/kvm/hyp/nvhe/cache.S      | 11 +++++
>   arch/arm64/kvm/hyp/pgtable.c         | 65 +++++++++++++++++++++++-----
>   arch/arm64/kvm/mmu.c                 | 51 ++++++++--------------
>   5 files changed, 107 insertions(+), 89 deletions(-)
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/6] KVM: arm64: Introduce KVM_PGTABLE_S2_GUEST stage-2 flag
  2021-04-15 11:50 ` [PATCH v5 1/6] KVM: arm64: Introduce KVM_PGTABLE_S2_GUEST stage-2 flag Yanan Wang
@ 2021-06-02 10:43   ` Quentin Perret
  2021-06-03 12:36     ` wangyanan (Y)
  0 siblings, 1 reply; 17+ messages in thread
From: Quentin Perret @ 2021-06-02 10:43 UTC (permalink / raw)
  To: Yanan Wang
  Cc: Marc Zyngier, Will Deacon, Alexandru Elisei, kvmarm,
	linux-arm-kernel, kvm, linux-kernel, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	wanghaibin.wang, zhukeqian1, yuzenghui

Hi Yanan,

On Thursday 15 Apr 2021 at 19:50:27 (+0800), Yanan Wang wrote:
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index c3674c47d48c..a43cbe697b37 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -61,10 +61,12 @@ struct kvm_pgtable_mm_ops {
>   * @KVM_PGTABLE_S2_NOFWB:	Don't enforce Normal-WB even if the CPUs have
>   *				ARM64_HAS_STAGE2_FWB.
>   * @KVM_PGTABLE_S2_IDMAP:	Only use identity mappings.
> + * @KVM_PGTABLE_S2_GUEST:	Whether the page-tables are guest stage-2.
>   */
>  enum kvm_pgtable_stage2_flags {
>  	KVM_PGTABLE_S2_NOFWB			= BIT(0),
>  	KVM_PGTABLE_S2_IDMAP			= BIT(1),
> +	KVM_PGTABLE_S2_GUEST			= BIT(2),

Assuming that we need this flag (maybe not given Marc's suggestion on
another patch), I'd recommend re-naming it to explain _what_ it does,
rather than _who_ is using it.

That's the principle we followed for e.g. KVM_PGTABLE_S2_IDMAP, so we
should be consistent here as well.

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 2/6] KVM: arm64: Move D-cache flush to the fault handlers
       [not found]   ` <877djc1sca.wl-maz@kernel.org>
@ 2021-06-02 10:49     ` Quentin Perret
  2021-06-03 12:33     ` wangyanan (Y)
  1 sibling, 0 replies; 17+ messages in thread
From: Quentin Perret @ 2021-06-02 10:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Yanan Wang, Will Deacon, Alexandru Elisei, kvmarm,
	linux-arm-kernel, kvm, linux-kernel, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	wanghaibin.wang, zhukeqian1, yuzenghui

On Wednesday 02 Jun 2021 at 11:19:49 (+0100), Marc Zyngier wrote:
> On Thu, 15 Apr 2021 12:50:28 +0100,
> > @@ -583,6 +589,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> >  {
> >  	kvm_pte_t new, old = *ptep;
> >  	u64 granule = kvm_granule_size(level), phys = data->phys;
> > +	struct kvm_pgtable *pgt = data->mmu->pgt;
> >  	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> >  
> >  	if (!kvm_block_mapping_supported(addr, end, phys, level))
> > @@ -606,6 +613,13 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> >  		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> >  	}
> >  
> > +	/* Perform CMOs before installation of the guest stage-2 PTE */
> > +	if (pgt->flags & KVM_PGTABLE_S2_GUEST) {
> > +		if (stage2_pte_cacheable(pgt, new) && !stage2_has_fwb(pgt))
> > +			__flush_dcache_area(mm_ops->phys_to_virt(phys),
> > +					    granule);
> > +	}
> 
> Rather than this, why not provide new callbacks in mm_ops, even if we
> have to provide one that is specific to guests (and let the protected
> stuff do its own thing)?

Ack, an optional callback in the mm_ops sounds much nicer.

> One thing I really dislike though is that the page-table code is
> starting to be littered with things that are not directly related to
> page tables. We are re-creating the user_mem_abort() mess in a
> different place.

+1, we should probably keep the page-table code as close as possible
to a standalone and architecturally-compliant library as opposed to a
mess of unrelated logic, simply because that will lead to a cleaner and
more maintainable design in the long run, and because that will ease the
interoperability with EL2 in protected mode.

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/6] KVM: arm64: Move I-cache flush to the fault handlers
  2021-04-15 11:50 ` [PATCH v5 5/6] KVM: arm64: Move I-cache flush to the fault handlers Yanan Wang
@ 2021-06-02 10:58   ` Quentin Perret
  2021-06-03 12:35     ` wangyanan (Y)
  0 siblings, 1 reply; 17+ messages in thread
From: Quentin Perret @ 2021-06-02 10:58 UTC (permalink / raw)
  To: Yanan Wang
  Cc: Marc Zyngier, Will Deacon, Alexandru Elisei, kvmarm,
	linux-arm-kernel, kvm, linux-kernel, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	wanghaibin.wang, zhukeqian1, yuzenghui

On Thursday 15 Apr 2021 at 19:50:31 (+0800), Yanan Wang wrote:
> In this patch, we move invalidation of I-cache to the fault handlers to

Nit: please avoid using 'This patch' in commit messages, see
Documentation/process/submitting-patches.rst.

> avoid unnecessary I-cache maintenances. On the map path, invalidate the
> I-cache if we are going to create an executable stage-2 mapping for guest.
> And on the permission path, invalidate the I-cache if we are going to add
> an executable permission to the existing guest stage-2 mapping.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  arch/arm64/include/asm/kvm_mmu.h | 15 --------------
>  arch/arm64/kvm/hyp/pgtable.c     | 35 +++++++++++++++++++++++++++++++-
>  arch/arm64/kvm/mmu.c             |  9 +-------
>  3 files changed, 35 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index e9b163c5f023..155492fe5b15 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -187,21 +187,6 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>  	return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
>  }
>  
> -static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
> -						  unsigned long size)
> -{
> -	if (icache_is_aliasing()) {
> -		/* any kind of VIPT cache */
> -		__flush_icache_all();
> -	} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
> -		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
> -		void *va = page_address(pfn_to_page(pfn));
> -
> -		invalidate_icache_range((unsigned long)va,
> -					(unsigned long)va + size);
> -	}
> -}
> -
>  void kvm_set_way_flush(struct kvm_vcpu *vcpu);
>  void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>  
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index b480f6d1171e..9f4429d80df0 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -568,6 +568,26 @@ static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
>  	return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
>  }
>  
> +static bool stage2_pte_executable(kvm_pte_t pte)
> +{
> +	return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
> +}
> +
> +static void stage2_invalidate_icache(void *addr, u64 size)
> +{
> +	if (icache_is_aliasing()) {
> +		/* Any kind of VIPT cache */
> +		__flush_icache_all();
> +	} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {


> +		/*
> +		 * See comment in __kvm_tlb_flush_vmid_ipa().
> +		 * Invalidate PIPT, or VPIPT at EL2.
> +		 */
> +		invalidate_icache_range((unsigned long)addr,
> +					(unsigned long)addr + size);
> +	}
> +}
> +
>  static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
>  			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
>  {
> @@ -618,6 +638,10 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>  		if (stage2_pte_cacheable(pgt, new) && !stage2_has_fwb(pgt))
>  			__flush_dcache_area(mm_ops->phys_to_virt(phys),
>  					    granule);
> +
> +		if (stage2_pte_executable(new))
> +			stage2_invalidate_icache(mm_ops->phys_to_virt(phys),
> +						 granule);
>  	}
>  
>  	smp_store_release(ptep, new);
> @@ -896,8 +920,17 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  	 * but worst-case the access flag update gets lost and will be
>  	 * set on the next access instead.
>  	 */
> -	if (data->pte != pte)
> +	if (data->pte != pte) {
> +		/*
> +		 * Invalidate the instruction cache before updating
> +		 * if we are going to add the executable permission
> +		 * for the guest stage-2 PTE.
> +		 */
> +		if (!stage2_pte_executable(*ptep) && stage2_pte_executable(pte))
> +			stage2_invalidate_icache(kvm_pte_follow(pte, data->mm_ops),
> +						 kvm_granule_size(level));
>  		WRITE_ONCE(*ptep, pte);
> +	}

As for the dcache stuff, it seems like this would be best placed in an
optional mm_ops callback, and have the kernel implement it.

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 6/6] KVM: arm64: Distinguish cases of memcache allocations completely
  2021-04-15 11:50 ` [PATCH v5 6/6] KVM: arm64: Distinguish cases of memcache allocations completely Yanan Wang
@ 2021-06-02 11:07   ` Quentin Perret
  2021-06-03 12:52     ` wangyanan (Y)
  0 siblings, 1 reply; 17+ messages in thread
From: Quentin Perret @ 2021-06-02 11:07 UTC (permalink / raw)
  To: Yanan Wang
  Cc: Marc Zyngier, Will Deacon, Alexandru Elisei, kvmarm,
	linux-arm-kernel, kvm, linux-kernel, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	wanghaibin.wang, zhukeqian1, yuzenghui

On Thursday 15 Apr 2021 at 19:50:32 (+0800), Yanan Wang wrote:
> With a guest translation fault, the memcache pages are not needed if KVM
> is only about to install a new leaf entry into the existing page table.
> And with a guest permission fault, the memcache pages are also not needed
> for a write_fault in dirty-logging time if KVM is only about to update
> the existing leaf entry instead of collapsing a block entry into a table.
> 
> By comparing fault_granule and vma_pagesize, cases that require allocations
> from memcache and cases that don't can be distinguished completely.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  arch/arm64/kvm/mmu.c | 25 ++++++++++++-------------
>  1 file changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index aa536392b308..9e35aa5d29f2 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -895,19 +895,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	gfn = fault_ipa >> PAGE_SHIFT;
>  	mmap_read_unlock(current->mm);
>  
> -	/*
> -	 * Permission faults just need to update the existing leaf entry,
> -	 * and so normally don't require allocations from the memcache. The
> -	 * only exception to this is when dirty logging is enabled at runtime
> -	 * and a write fault needs to collapse a block entry into a table.
> -	 */
> -	if (fault_status != FSC_PERM || (logging_active && write_fault)) {
> -		ret = kvm_mmu_topup_memory_cache(memcache,
> -						 kvm_mmu_cache_min_pages(kvm));
> -		if (ret)
> -			return ret;
> -	}
> -
>  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	/*
>  	 * Ensure the read of mmu_notifier_seq happens before we call
> @@ -970,6 +957,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
>  		prot |= KVM_PGTABLE_PROT_X;
>  
> +	/*
> +	 * Allocations from the memcache are required only when granule of the
> +	 * lookup level where the guest fault happened exceeds vma_pagesize,
> +	 * which means new page tables will be created in the fault handlers.
> +	 */
> +	if (fault_granule > vma_pagesize) {
> +		ret = kvm_mmu_topup_memory_cache(memcache,
> +						 kvm_mmu_cache_min_pages(kvm));
> +		if (ret)
> +			return ret;
> +	}

You're now doing the top-up in the kvm->mmu_lock critical section. Isn't
this more or less what we try to avoid by using a memory cache?

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 2/6] KVM: arm64: Move D-cache flush to the fault handlers
       [not found]   ` <877djc1sca.wl-maz@kernel.org>
  2021-06-02 10:49     ` Quentin Perret
@ 2021-06-03 12:33     ` wangyanan (Y)
  1 sibling, 0 replies; 17+ messages in thread
From: wangyanan (Y) @ 2021-06-03 12:33 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Quentin Perret, Alexandru Elisei, kvmarm,
	linux-arm-kernel, kvm, linux-kernel, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	wanghaibin.wang, zhukeqian1, yuzenghui

Hi Marc,

On 2021/6/2 18:19, Marc Zyngier wrote:
> On Thu, 15 Apr 2021 12:50:28 +0100,
> Yanan Wang <wangyanan55@huawei.com> wrote:
>> We currently uniformly permorm CMOs of D-cache and I-cache in function
>> user_mem_abort before calling the fault handlers. If we get concurrent
>> guest faults(e.g. translation faults, permission faults) or some really
>> unnecessary guest faults caused by BBM, CMOs for the first vcpu are
>> necessary while the others later are not.
>>
>> By moving CMOs to the fault handlers, we can easily identify conditions
>> where they are really needed and avoid the unnecessary ones. As it's a
>> time consuming process to perform CMOs especially when flushing a block
>> range, so this solution reduces much load of kvm and improve efficiency
>> of the page table code.
>>
>> This patch only moves clean of D-cache to the map path, and drop the
>> original APIs in mmu.c/mmu.h for D-cache maintenance by using what we
>> already have in pgtable.c. Change about the I-side will come from a
>> later patch.
> But this means that until patch #5, this is broken (invalidation on
> the i-side will happen before the clean to PoU on the d-side). You
> need to keep the series bisectable and not break things in the middle.
> It would be OK to have two set of D-cache CMOs in the interval, for
> example.
Indeed. The D-side change and the I-side changebe put together
into one single patch.
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   arch/arm64/include/asm/kvm_mmu.h | 16 ----------------
>>   arch/arm64/kvm/hyp/pgtable.c     | 20 ++++++++++++++------
>>   arch/arm64/kvm/mmu.c             | 14 +++-----------
>>   3 files changed, 17 insertions(+), 33 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 25ed956f9af1..e9b163c5f023 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -187,22 +187,6 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>>   	return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
>>   }
>>   
>> -static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
>> -{
>> -	void *va = page_address(pfn_to_page(pfn));
>> -
>> -	/*
>> -	 * With FWB, we ensure that the guest always accesses memory using
>> -	 * cacheable attributes, and we don't have to clean to PoC when
>> -	 * faulting in pages. Furthermore, FWB implies IDC, so cleaning to
>> -	 * PoU is not required either in this case.
>> -	 */
>> -	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
>> -		return;
>> -
>> -	kvm_flush_dcache_to_poc(va, size);
>> -}
>> -
>>   static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
>>   						  unsigned long size)
>>   {
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index c37c1dc4feaf..e3606c9dcec7 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -562,6 +562,12 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
>>   	return !!pte;
>>   }
>>   
>> +static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
>> +{
>> +	u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
>> +	return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
>> +}
>> +
>>   static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
>>   			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
>>   {
>> @@ -583,6 +589,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>   {
>>   	kvm_pte_t new, old = *ptep;
>>   	u64 granule = kvm_granule_size(level), phys = data->phys;
>> +	struct kvm_pgtable *pgt = data->mmu->pgt;
>>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>>   
>>   	if (!kvm_block_mapping_supported(addr, end, phys, level))
>> @@ -606,6 +613,13 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>   		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
>>   	}
>>   
>> +	/* Perform CMOs before installation of the guest stage-2 PTE */
>> +	if (pgt->flags & KVM_PGTABLE_S2_GUEST) {
>> +		if (stage2_pte_cacheable(pgt, new) && !stage2_has_fwb(pgt))
>> +			__flush_dcache_area(mm_ops->phys_to_virt(phys),
>> +					    granule);
>> +	}
> Rather than this, why not provide new callbacks in mm_ops, even if we
> have to provide one that is specific to guests (and let the protected
> stuff do its own thing)?
Thanks!
It's obviously a better idea! By introducing two new optional callbacks
(D-cache handler and I-cache handler), we can avoid code duplication
and make the generic pgtable code much cleaner.
> One thing I really dislike though is that the page-table code is
> starting to be littered with things that are not directly related to
> page tables. We are re-creating the user_mem_abort() mess in a
> different place.
I agree with this too. For the long run, we should try to avoid adding
more and more unrelated things into the page-table code, given that
it is developing to be generic (also supports host S2 now).

As for this series, it does introduce some performance improvement,
although the cost is moving the guest specific cache maintenance to
the page-table handlers. But I think the page-table code will be well
isolated if we put the CMOs as optional callbacks in mm_ops as you
have suggested above.

Thanks,
Yanan
>> +
>>   	smp_store_release(ptep, new);
>>   	if (stage2_pte_is_counted(new))
>>   		mm_ops->get_page(ptep);
>> @@ -798,12 +812,6 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>>   	return ret;
>>   }
>>   
>> -static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
>> -{
>> -	u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
>> -	return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
>> -}
>> -
>>   static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>   			       enum kvm_pgtable_walk_flags flag,
>>   			       void * const arg)
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 2cfcfc5f4e4e..86f7dd1c234f 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -694,11 +694,6 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>>   	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>>   }
>>   
>> -static void clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
>> -{
>> -	__clean_dcache_guest_page(pfn, size);
>> -}
>> -
>>   static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
>>   {
>>   	__invalidate_icache_guest_page(pfn, size);
>> @@ -972,9 +967,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	if (writable)
>>   		prot |= KVM_PGTABLE_PROT_W;
>>   
>> -	if (fault_status != FSC_PERM && !device)
>> -		clean_dcache_guest_page(pfn, vma_pagesize);
>> -
>>   	if (exec_fault) {
>>   		prot |= KVM_PGTABLE_PROT_X;
>>   		invalidate_icache_guest_page(pfn, vma_pagesize);
>> @@ -1234,10 +1226,10 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>>   	trace_kvm_set_spte_hva(hva);
>>   
>>   	/*
>> -	 * We've moved a page around, probably through CoW, so let's treat it
>> -	 * just like a translation fault and clean the cache to the PoC.
>> +	 * We've moved a page around, probably through CoW, so let's treat
>> +	 * it just like a translation fault and the map handler will clean
>> +	 * the cache to the PoC.
>>   	 */
>> -	clean_dcache_guest_page(pfn, PAGE_SIZE);
>>   	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
>>   	return 0;
>>   }
> Thanks,
>
> 	M.
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 4/6] KVM: arm64: Provide invalidate_icache_range at non-VHE EL2
       [not found]   ` <875yyw1s73.wl-maz@kernel.org>
@ 2021-06-03 12:34     ` wangyanan (Y)
  0 siblings, 0 replies; 17+ messages in thread
From: wangyanan (Y) @ 2021-06-03 12:34 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Quentin Perret, Alexandru Elisei, kvmarm,
	linux-arm-kernel, kvm, linux-kernel, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	wanghaibin.wang, zhukeqian1, yuzenghui

Hi Marc,

On 2021/6/2 18:22, Marc Zyngier wrote:
> On Thu, 15 Apr 2021 12:50:30 +0100,
> Yanan Wang <wangyanan55@huawei.com> wrote:
>> We want to move I-cache maintenance for the guest to the stage-2
>> page table code for performance improvement. Before it can work,
>> we should first make function invalidate_icache_range available
>> to non-VHE EL2 to avoid compiling or program running error, as
>> pgtable.c is now linked into the non-VHE EL2 code for pKVM mode.
>>
>> In this patch, we only introduce symbol of invalidate_icache_range
>> with no real functionality in nvhe/cache.S, because there haven't
>> been situations found currently where I-cache maintenance is also
>> needed in non-VHE EL2 for pKVM mode.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   arch/arm64/kvm/hyp/nvhe/cache.S | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/hyp/nvhe/cache.S b/arch/arm64/kvm/hyp/nvhe/cache.S
>> index 36cef6915428..a125ec9aeed2 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/cache.S
>> +++ b/arch/arm64/kvm/hyp/nvhe/cache.S
>> @@ -11,3 +11,14 @@ SYM_FUNC_START_PI(__flush_dcache_area)
>>   	dcache_by_line_op civac, sy, x0, x1, x2, x3
>>   	ret
>>   SYM_FUNC_END_PI(__flush_dcache_area)
>> +
>> +/*
>> + *	invalidate_icache_range(start,end)
>> + *
>> + *	Ensure that the I cache is invalid within specified region.
>> + *
>> + *	- start   - virtual start address of region
>> + *	- end     - virtual end address of region
>> + */
>> +SYM_FUNC_START(invalidate_icache_range)
>> +SYM_FUNC_END(invalidate_icache_range)
> This is a good indication that something is really wrong.
>
> If you were to provide cache management callbacks as part of the
> mm_ops themselves (or a similar abstraction), you wouldn't have to do
> these things.
Yes, we definitely don't need this work if we do that.

Thanks,
Yanan
>
> 	M.
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/6] KVM: arm64: Move I-cache flush to the fault handlers
  2021-06-02 10:58   ` Quentin Perret
@ 2021-06-03 12:35     ` wangyanan (Y)
  0 siblings, 0 replies; 17+ messages in thread
From: wangyanan (Y) @ 2021-06-03 12:35 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Will Deacon, Alexandru Elisei, kvmarm,
	linux-arm-kernel, kvm, linux-kernel, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	wanghaibin.wang, zhukeqian1, yuzenghui

Hi Quentin,

On 2021/6/2 18:58, Quentin Perret wrote:
> On Thursday 15 Apr 2021 at 19:50:31 (+0800), Yanan Wang wrote:
>> In this patch, we move invalidation of I-cache to the fault handlers to
> Nit: please avoid using 'This patch' in commit messages, see
> Documentation/process/submitting-patches.rst.
Thanks!
I will get rid of this.
>> avoid unnecessary I-cache maintenances. On the map path, invalidate the
>> I-cache if we are going to create an executable stage-2 mapping for guest.
>> And on the permission path, invalidate the I-cache if we are going to add
>> an executable permission to the existing guest stage-2 mapping.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   arch/arm64/include/asm/kvm_mmu.h | 15 --------------
>>   arch/arm64/kvm/hyp/pgtable.c     | 35 +++++++++++++++++++++++++++++++-
>>   arch/arm64/kvm/mmu.c             |  9 +-------
>>   3 files changed, 35 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index e9b163c5f023..155492fe5b15 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -187,21 +187,6 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>>   	return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
>>   }
>>   
>> -static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
>> -						  unsigned long size)
>> -{
>> -	if (icache_is_aliasing()) {
>> -		/* any kind of VIPT cache */
>> -		__flush_icache_all();
>> -	} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
>> -		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
>> -		void *va = page_address(pfn_to_page(pfn));
>> -
>> -		invalidate_icache_range((unsigned long)va,
>> -					(unsigned long)va + size);
>> -	}
>> -}
>> -
>>   void kvm_set_way_flush(struct kvm_vcpu *vcpu);
>>   void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>>   
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index b480f6d1171e..9f4429d80df0 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -568,6 +568,26 @@ static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
>>   	return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
>>   }
>>   
>> +static bool stage2_pte_executable(kvm_pte_t pte)
>> +{
>> +	return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
>> +}
>> +
>> +static void stage2_invalidate_icache(void *addr, u64 size)
>> +{
>> +	if (icache_is_aliasing()) {
>> +		/* Any kind of VIPT cache */
>> +		__flush_icache_all();
>> +	} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
>
>> +		/*
>> +		 * See comment in __kvm_tlb_flush_vmid_ipa().
>> +		 * Invalidate PIPT, or VPIPT at EL2.
>> +		 */
>> +		invalidate_icache_range((unsigned long)addr,
>> +					(unsigned long)addr + size);
>> +	}
>> +}
>> +
>>   static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
>>   			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
>>   {
>> @@ -618,6 +638,10 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>   		if (stage2_pte_cacheable(pgt, new) && !stage2_has_fwb(pgt))
>>   			__flush_dcache_area(mm_ops->phys_to_virt(phys),
>>   					    granule);
>> +
>> +		if (stage2_pte_executable(new))
>> +			stage2_invalidate_icache(mm_ops->phys_to_virt(phys),
>> +						 granule);
>>   	}
>>   
>>   	smp_store_release(ptep, new);
>> @@ -896,8 +920,17 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>   	 * but worst-case the access flag update gets lost and will be
>>   	 * set on the next access instead.
>>   	 */
>> -	if (data->pte != pte)
>> +	if (data->pte != pte) {
>> +		/*
>> +		 * Invalidate the instruction cache before updating
>> +		 * if we are going to add the executable permission
>> +		 * for the guest stage-2 PTE.
>> +		 */
>> +		if (!stage2_pte_executable(*ptep) && stage2_pte_executable(pte))
>> +			stage2_invalidate_icache(kvm_pte_follow(pte, data->mm_ops),
>> +						 kvm_granule_size(level));
>>   		WRITE_ONCE(*ptep, pte);
>> +	}
> As for the dcache stuff, it seems like this would be best placed in an
> optional mm_ops callback, and have the kernel implement it.
I think so, that is the preferred way.

Thanks,
Yanan
> Thanks,
> Quentin
> .


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/6] KVM: arm64: Introduce KVM_PGTABLE_S2_GUEST stage-2 flag
  2021-06-02 10:43   ` Quentin Perret
@ 2021-06-03 12:36     ` wangyanan (Y)
  0 siblings, 0 replies; 17+ messages in thread
From: wangyanan (Y) @ 2021-06-03 12:36 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Will Deacon, Alexandru Elisei, kvmarm,
	linux-arm-kernel, kvm, linux-kernel, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	wanghaibin.wang, zhukeqian1, yuzenghui

Hi Quentin,

On 2021/6/2 18:43, Quentin Perret wrote:
> Hi Yanan,
>
> On Thursday 15 Apr 2021 at 19:50:27 (+0800), Yanan Wang wrote:
>> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
>> index c3674c47d48c..a43cbe697b37 100644
>> --- a/arch/arm64/include/asm/kvm_pgtable.h
>> +++ b/arch/arm64/include/asm/kvm_pgtable.h
>> @@ -61,10 +61,12 @@ struct kvm_pgtable_mm_ops {
>>    * @KVM_PGTABLE_S2_NOFWB:	Don't enforce Normal-WB even if the CPUs have
>>    *				ARM64_HAS_STAGE2_FWB.
>>    * @KVM_PGTABLE_S2_IDMAP:	Only use identity mappings.
>> + * @KVM_PGTABLE_S2_GUEST:	Whether the page-tables are guest stage-2.
>>    */
>>   enum kvm_pgtable_stage2_flags {
>>   	KVM_PGTABLE_S2_NOFWB			= BIT(0),
>>   	KVM_PGTABLE_S2_IDMAP			= BIT(1),
>> +	KVM_PGTABLE_S2_GUEST			= BIT(2),
> Assuming that we need this flag (maybe not given Marc's suggestion on
> another patch), I'd recommend re-naming it to explain _what_ it does,
> rather than _who_ is using it.
I agree with this.
> That's the principle we followed for e.g. KVM_PGTABLE_S2_IDMAP, so we
> should be consistent here as well.
But I think maybe we don't need the new flag anymore with Marc's suggestion.
Currently the CMOs right before installation or update of a PTE are 
guest-specific.
So if we also take the new optional callbacks as guest specific, then a 
new flag is
not necessary because we can check whether the callbacks have been 
initialized
to determine if we are managing a guest S2 PTE.

Thanks,
Yanan
> Thanks,
> Quentin
> .


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 6/6] KVM: arm64: Distinguish cases of memcache allocations completely
  2021-06-02 11:07   ` Quentin Perret
@ 2021-06-03 12:52     ` wangyanan (Y)
  0 siblings, 0 replies; 17+ messages in thread
From: wangyanan (Y) @ 2021-06-03 12:52 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, Will Deacon, Alexandru Elisei, kvmarm,
	linux-arm-kernel, kvm, linux-kernel, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	wanghaibin.wang, zhukeqian1, yuzenghui

Hi Quentin,

On 2021/6/2 19:07, Quentin Perret wrote:
> On Thursday 15 Apr 2021 at 19:50:32 (+0800), Yanan Wang wrote:
>> With a guest translation fault, the memcache pages are not needed if KVM
>> is only about to install a new leaf entry into the existing page table.
>> And with a guest permission fault, the memcache pages are also not needed
>> for a write_fault in dirty-logging time if KVM is only about to update
>> the existing leaf entry instead of collapsing a block entry into a table.
>>
>> By comparing fault_granule and vma_pagesize, cases that require allocations
>> from memcache and cases that don't can be distinguished completely.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   arch/arm64/kvm/mmu.c | 25 ++++++++++++-------------
>>   1 file changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index aa536392b308..9e35aa5d29f2 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -895,19 +895,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	gfn = fault_ipa >> PAGE_SHIFT;
>>   	mmap_read_unlock(current->mm);
>>   
>> -	/*
>> -	 * Permission faults just need to update the existing leaf entry,
>> -	 * and so normally don't require allocations from the memcache. The
>> -	 * only exception to this is when dirty logging is enabled at runtime
>> -	 * and a write fault needs to collapse a block entry into a table.
>> -	 */
>> -	if (fault_status != FSC_PERM || (logging_active && write_fault)) {
>> -		ret = kvm_mmu_topup_memory_cache(memcache,
>> -						 kvm_mmu_cache_min_pages(kvm));
>> -		if (ret)
>> -			return ret;
>> -	}
>> -
>>   	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>>   	/*
>>   	 * Ensure the read of mmu_notifier_seq happens before we call
>> @@ -970,6 +957,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
>>   		prot |= KVM_PGTABLE_PROT_X;
>>   
>> +	/*
>> +	 * Allocations from the memcache are required only when granule of the
>> +	 * lookup level where the guest fault happened exceeds vma_pagesize,
>> +	 * which means new page tables will be created in the fault handlers.
>> +	 */
>> +	if (fault_granule > vma_pagesize) {
>> +		ret = kvm_mmu_topup_memory_cache(memcache,
>> +						 kvm_mmu_cache_min_pages(kvm));
>> +		if (ret)
>> +			return ret;
>> +	}
> You're now doing the top-up in the kvm->mmu_lock critical section. Isn't
> this more or less what we try to avoid by using a memory cache?
Oh, right!

This patch intended to clean up the code and avoid the unnecessary top-ups,
but it's a bad idea to do the top-up when holding mmu_lock. I will rearrange
this part and keep it where it should be.

Thanks,
Yanan
> Thanks,
> Quentin
> .


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-06-03 12:52 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-15 11:50 [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table Yanan Wang
2021-04-15 11:50 ` [PATCH v5 1/6] KVM: arm64: Introduce KVM_PGTABLE_S2_GUEST stage-2 flag Yanan Wang
2021-06-02 10:43   ` Quentin Perret
2021-06-03 12:36     ` wangyanan (Y)
2021-04-15 11:50 ` [PATCH v5 2/6] KVM: arm64: Move D-cache flush to the fault handlers Yanan Wang
     [not found]   ` <877djc1sca.wl-maz@kernel.org>
2021-06-02 10:49     ` Quentin Perret
2021-06-03 12:33     ` wangyanan (Y)
2021-04-15 11:50 ` [PATCH v5 3/6] KVM: arm64: Add mm_ops member for structure stage2_attr_data Yanan Wang
2021-04-15 11:50 ` [PATCH v5 4/6] KVM: arm64: Provide invalidate_icache_range at non-VHE EL2 Yanan Wang
     [not found]   ` <875yyw1s73.wl-maz@kernel.org>
2021-06-03 12:34     ` wangyanan (Y)
2021-04-15 11:50 ` [PATCH v5 5/6] KVM: arm64: Move I-cache flush to the fault handlers Yanan Wang
2021-06-02 10:58   ` Quentin Perret
2021-06-03 12:35     ` wangyanan (Y)
2021-04-15 11:50 ` [PATCH v5 6/6] KVM: arm64: Distinguish cases of memcache allocations completely Yanan Wang
2021-06-02 11:07   ` Quentin Perret
2021-06-03 12:52     ` wangyanan (Y)
2021-05-12 12:54 ` [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table wangyanan (Y)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).