kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] Some optimization for stage-2 translation
@ 2021-01-14 12:13 Yanan Wang
  2021-01-14 12:13 ` [PATCH v3 1/3] KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2 map Yanan Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Yanan Wang @ 2021-01-14 12:13 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Catalin Marinas, kvmarm,
	linux-arm-kernel, kvm, linux-kernel
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, Yanan Wang

Hi,
This patch series(v3) make some optimization for stage-2 translation.

About patch-1:
Procedures of hyp stage-1 map and guest stage-2 map are quite different,
but they are now tied closely by function kvm_set_valid_leaf_pte().
So adjust the relative code for ease of code maintenance in the future.

About patch-2:
There have been the separate map handler and perms handler used independently
for mapping and relaxing permissions in the new page-table infrastructure for
stage-2, yet there is still a specific case where we end up changing the access
permissions in the map path, and something unsatisfactory could happen because
of current handling for this case.

To solve above problem, we can filter out this case from the map path and abort
the PTE update. Instead, let the vCPU enter back the guest and it will exit next
time to go through the relax_perms path if still needed.

About patch-3:
We now set the pfn dirty and mark the page dirty before calling fault
handlers in user_mem_abort(), so we might end up having spurious dirty
pages if update of permissions or mapping has failed. Let's move these
two operations after the fault handlers, and they will be done only if
the fault has been handled successfully.

When an -EAGAIN errno is returned from the map handler, we hope to the
vcpu to enter guest directly instead of exiting back to userspace, so
adjust the return value at the end of function.

---

Changelogs

v2->v3:
- Rebased on top of v5.11-rc3
- Refine the commit messages
- Make some adjustment about return value in patch-2 and patch-3
- v2: https://lore.kernel.org/lkml/20201216122844.25092-1-wangyanan55@huawei.com/

v1->v2:
- Make part of the diff a seperate patch (patch-1)
- Add Will's Signed-off-by for patch-1
- Return an errno when meeting changing permissions case in map path
- Add a new patch (patch-3)
- v1: https://lore.kernel.org/lkml/20201211080115.21460-1-wangyanan55@huawei.com/

---

Yanan Wang (3):
  KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2
    map
  KVM: arm64: Filter out the case of only changing permissions from
    stage-2 map path
  KVM: arm64: Mark the page dirty only if the fault is handled
    successfully

 arch/arm64/include/asm/kvm_pgtable.h |  5 ++
 arch/arm64/kvm/hyp/pgtable.c         | 83 ++++++++++++++++------------
 arch/arm64/kvm/mmu.c                 | 13 +++--
 3 files changed, 60 insertions(+), 41 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v3 1/3] KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2 map
  2021-01-14 12:13 [PATCH v3 0/3] Some optimization for stage-2 translation Yanan Wang
@ 2021-01-14 12:13 ` Yanan Wang
  2021-01-14 12:13 ` [PATCH v3 2/3] KVM: arm64: Filter out the case of only changing permissions from stage-2 map path Yanan Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Yanan Wang @ 2021-01-14 12:13 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Catalin Marinas, kvmarm,
	linux-arm-kernel, kvm, linux-kernel
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, Yanan Wang

Procedures of hyp stage-1 map and guest stage-2 map are quite different,
but they are tied closely by function kvm_set_valid_leaf_pte().
So adjust the relative code for ease of code maintenance in the future.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 55 ++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index bdf8e55ed308..a11ac874bc2a 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -170,10 +170,9 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
 	smp_store_release(ptep, pte);
 }
 
-static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
-				   u32 level)
+static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
 {
-	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
+	kvm_pte_t pte = kvm_phys_to_pte(pa);
 	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
 							   KVM_PTE_TYPE_BLOCK;
 
@@ -181,12 +180,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
 	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
 	pte |= KVM_PTE_VALID;
 
-	/* Tolerate KVM recreating the exact same mapping. */
-	if (kvm_pte_valid(old))
-		return old == pte;
-
-	smp_store_release(ptep, pte);
-	return true;
+	return pte;
 }
 
 static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
@@ -341,12 +335,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
 static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 				    kvm_pte_t *ptep, struct hyp_map_data *data)
 {
+	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
 
 	if (!kvm_block_mapping_supported(addr, end, phys, level))
 		return false;
 
-	WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
+	/* Tolerate KVM recreating the exact same mapping */
+	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	if (old != new && !WARN_ON(kvm_pte_valid(old)))
+		smp_store_release(ptep, new);
+
 	data->phys += granule;
 	return true;
 }
@@ -465,27 +464,30 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 				       kvm_pte_t *ptep,
 				       struct stage2_map_data *data)
 {
+	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
+	struct page *page = virt_to_page(ptep);
 
 	if (!kvm_block_mapping_supported(addr, end, phys, level))
 		return false;
 
-	/*
-	 * If the PTE was already valid, drop the refcount on the table
-	 * early, as it will be bumped-up again in stage2_map_walk_leaf().
-	 * This ensures that the refcount stays constant across a valid to
-	 * valid PTE update.
-	 */
-	if (kvm_pte_valid(*ptep))
-		put_page(virt_to_page(ptep));
+	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	if (kvm_pte_valid(old)) {
+		/* Tolerate KVM recreating the exact same mapping */
+		if (old == new)
+			goto out;
 
-	if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
-		goto out;
+		/*
+		 * There's an existing different valid leaf entry, so perform
+		 * break-before-make.
+		 */
+		kvm_set_invalid_pte(ptep);
+		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+		put_page(page);
+	}
 
-	/* There's an existing valid leaf entry, so perform break-before-make */
-	kvm_set_invalid_pte(ptep);
-	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
-	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
+	smp_store_release(ptep, new);
+	get_page(page);
 out:
 	data->phys += granule;
 	return true;
@@ -527,7 +529,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	}
 
 	if (stage2_map_walker_try_leaf(addr, end, level, ptep, data))
-		goto out_get_page;
+		return 0;
 
 	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
@@ -551,9 +553,8 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	}
 
 	kvm_set_table_pte(ptep, childp);
-
-out_get_page:
 	get_page(page);
+
 	return 0;
 }
 
-- 
2.19.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v3 2/3] KVM: arm64: Filter out the case of only changing permissions from stage-2 map path
  2021-01-14 12:13 [PATCH v3 0/3] Some optimization for stage-2 translation Yanan Wang
  2021-01-14 12:13 ` [PATCH v3 1/3] KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2 map Yanan Wang
@ 2021-01-14 12:13 ` Yanan Wang
  2021-01-14 12:13 ` [PATCH v3 3/3] KVM: arm64: Mark the page dirty only if the fault is handled successfully Yanan Wang
  2021-01-25 16:33 ` [PATCH v3 0/3] Some optimization for stage-2 translation Marc Zyngier
  3 siblings, 0 replies; 5+ messages in thread
From: Yanan Wang @ 2021-01-14 12:13 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Catalin Marinas, kvmarm,
	linux-arm-kernel, kvm, linux-kernel
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, Yanan Wang

(1) During running time of a a VM with numbers of vCPUs, if some vCPUs
access the same GPA almost at the same time and the stage-2 mapping of
the GPA has not been built yet, as a result they will all cause
translation faults. The first vCPU builds the mapping, and the followed
ones end up updating the valid leaf PTE. Note that these vCPUs might
want different access permissions (RO, RW, RX, RWX, etc.).

(2) It's inevitable that we sometimes will update an existing valid leaf
PTE in the map path, and we perform break-before-make in this case.
Then more unnecessary translation faults could be caused if the
*break stage* of BBM is just catched by other vCPUS.

With (1) and (2), something unsatisfactory could happen: vCPU A causes
a translation fault and builds the mapping with RW permissions, vCPU B
then update the valid leaf PTE with break-before-make and permissions
are updated back to RO. Besides, *break stage* of BBM may trigger more
translation faults. Finally, some useless small loops could occur.

We can make some optimization to solve above problems: When we need to
update a valid leaf PTE in the map path, let's filter out the case where
this update only change access permissions, and don't update the valid
leaf PTE here in this case. Instead, let the vCPU enter back the guest
and it will exit next time to go through the relax_perms path without
break-before-make if it still wants more permissions.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 arch/arm64/include/asm/kvm_pgtable.h |  5 +++++
 arch/arm64/kvm/hyp/pgtable.c         | 32 ++++++++++++++++++----------
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 52ab38db04c7..8886d43cfb11 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -157,6 +157,11 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
  * If device attributes are not explicitly requested in @prot, then the
  * mapping will be normal, cacheable.
  *
+ * Note that the update of a valid leaf PTE in this function will be aborted,
+ * if it's trying to recreate the exact same mapping or only change the access
+ * permissions. Instead, the vCPU will exit one more time from guest if still
+ * needed and then go through the path of relaxing permissions.
+ *
  * Note that this function will both coalesce existing table entries and split
  * existing block mappings, relying on page-faults to fault back areas outside
  * of the new mapping lazily.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index a11ac874bc2a..4d177ce1d536 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -45,6 +45,10 @@
 
 #define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)
 
+#define KVM_PTE_LEAF_ATTR_S2_PERMS	(KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | \
+					 KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
+					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
+
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable		*pgt;
 	struct kvm_pgtable_walker	*walker;
@@ -460,22 +464,27 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
 	return 0;
 }
 
-static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
-				       kvm_pte_t *ptep,
-				       struct stage2_map_data *data)
+static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
+				      kvm_pte_t *ptep,
+				      struct stage2_map_data *data)
 {
 	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
 	struct page *page = virt_to_page(ptep);
 
 	if (!kvm_block_mapping_supported(addr, end, phys, level))
-		return false;
+		return -E2BIG;
 
 	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
 	if (kvm_pte_valid(old)) {
-		/* Tolerate KVM recreating the exact same mapping */
-		if (old == new)
-			goto out;
+		/*
+		 * Skip updating the PTE if we are trying to recreate the exact
+		 * same mapping or only change the access permissions. Instead,
+		 * the vCPU will exit one more time from guest if still needed
+		 * and then go through the path of relaxing permissions.
+		 */
+		if (!((old ^ new) & (~KVM_PTE_LEAF_ATTR_S2_PERMS)))
+			return -EAGAIN;
 
 		/*
 		 * There's an existing different valid leaf entry, so perform
@@ -488,9 +497,8 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 
 	smp_store_release(ptep, new);
 	get_page(page);
-out:
 	data->phys += granule;
-	return true;
+	return 0;
 }
 
 static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
@@ -518,6 +526,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 				struct stage2_map_data *data)
 {
+	int ret;
 	kvm_pte_t *childp, pte = *ptep;
 	struct page *page = virt_to_page(ptep);
 
@@ -528,8 +537,9 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		return 0;
 	}
 
-	if (stage2_map_walker_try_leaf(addr, end, level, ptep, data))
-		return 0;
+	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
+	if (ret != -E2BIG)
+		return ret;
 
 	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
-- 
2.19.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v3 3/3] KVM: arm64: Mark the page dirty only if the fault is handled successfully
  2021-01-14 12:13 [PATCH v3 0/3] Some optimization for stage-2 translation Yanan Wang
  2021-01-14 12:13 ` [PATCH v3 1/3] KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2 map Yanan Wang
  2021-01-14 12:13 ` [PATCH v3 2/3] KVM: arm64: Filter out the case of only changing permissions from stage-2 map path Yanan Wang
@ 2021-01-14 12:13 ` Yanan Wang
  2021-01-25 16:33 ` [PATCH v3 0/3] Some optimization for stage-2 translation Marc Zyngier
  3 siblings, 0 replies; 5+ messages in thread
From: Yanan Wang @ 2021-01-14 12:13 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon, Catalin Marinas, kvmarm,
	linux-arm-kernel, kvm, linux-kernel
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, Yanan Wang

We now set the pfn dirty and mark the page dirty before calling fault
handlers in user_mem_abort(), so we might end up having spurious dirty
pages if update of permissions or mapping has failed. Let's move these
two operations after the fault handlers, and they will be done only if
the fault has been handled successfully.

When an -EAGAIN errno is returned from the map handler, we hope to the
vcpu to enter guest directly instead of exiting back to userspace, so
adjust the return value at the end of function.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 arch/arm64/kvm/mmu.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 7d2257cc5438..77cb2d28f2a4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -879,11 +879,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (vma_pagesize == PAGE_SIZE && !force_pte)
 		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
 							   &pfn, &fault_ipa);
-	if (writable) {
+	if (writable)
 		prot |= KVM_PGTABLE_PROT_W;
-		kvm_set_pfn_dirty(pfn);
-		mark_page_dirty(kvm, gfn);
-	}
 
 	if (fault_status != FSC_PERM && !device)
 		clean_dcache_guest_page(pfn, vma_pagesize);
@@ -911,11 +908,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 					     memcache);
 	}
 
+	/* Mark the page dirty only if the fault is handled successfully */
+	if (writable && !ret) {
+		kvm_set_pfn_dirty(pfn);
+		mark_page_dirty(kvm, gfn);
+	}
+
 out_unlock:
 	spin_unlock(&kvm->mmu_lock);
 	kvm_set_pfn_accessed(pfn);
 	kvm_release_pfn_clean(pfn);
-	return ret;
+	return ret != -EAGAIN ? ret : 0;
 }
 
 /* Resolve the access fault by making the page young again. */
-- 
2.19.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 0/3] Some optimization for stage-2 translation
  2021-01-14 12:13 [PATCH v3 0/3] Some optimization for stage-2 translation Yanan Wang
                   ` (2 preceding siblings ...)
  2021-01-14 12:13 ` [PATCH v3 3/3] KVM: arm64: Mark the page dirty only if the fault is handled successfully Yanan Wang
@ 2021-01-25 16:33 ` Marc Zyngier
  3 siblings, 0 replies; 5+ messages in thread
From: Marc Zyngier @ 2021-01-25 16:33 UTC (permalink / raw)
  To: Catalin Marinas, Yanan Wang, linux-arm-kernel, Will Deacon, kvm,
	kvmarm, linux-kernel
  Cc: zhukeqian1, James Morse, yuzenghui, Suzuki K Poulose,
	Julien Thierry, yezengruan, Gavin Shan, wanghaibin.wang,
	Quentin Perret

On Thu, 14 Jan 2021 20:13:47 +0800, Yanan Wang wrote:
> This patch series(v3) make some optimization for stage-2 translation.
> 
> About patch-1:
> Procedures of hyp stage-1 map and guest stage-2 map are quite different,
> but they are now tied closely by function kvm_set_valid_leaf_pte().
> So adjust the relative code for ease of code maintenance in the future.
> 
> [...]

Applied to kvm-arm64/concurrent-translation-fault, thanks!

[1/3] KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2 map
      commit: 8ed80051c8c31d1587722fdb3af16677eba9d693
[2/3] KVM: arm64: Filter out the case of only changing permissions from stage-2 map path
      commit: 694d071f8d85d504055540a27f0dbe9dbf44584e
[3/3] KVM: arm64: Mark the page dirty only if the fault is handled successfully
      commit: 509552e65ae8287178a5cdea2d734dcd2d6380ab

Cheers,

	M.
-- 
Without deviation from the norm, progress is not possible.



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-01-26  5:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-14 12:13 [PATCH v3 0/3] Some optimization for stage-2 translation Yanan Wang
2021-01-14 12:13 ` [PATCH v3 1/3] KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2 map Yanan Wang
2021-01-14 12:13 ` [PATCH v3 2/3] KVM: arm64: Filter out the case of only changing permissions from stage-2 map path Yanan Wang
2021-01-14 12:13 ` [PATCH v3 3/3] KVM: arm64: Mark the page dirty only if the fault is handled successfully Yanan Wang
2021-01-25 16:33 ` [PATCH v3 0/3] Some optimization for stage-2 translation Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).