* [RFC PATCH 0/3] Fix several bugs in KVM stage 2 translation
@ 2020-11-30 12:18 ` Yanan Wang
  0 siblings, 0 replies; 49+ messages in thread
From: Yanan Wang @ 2020-11-30 12:18 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	Will Deacon, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, Quentin Perret
  Cc: wanghaibin.wang, yezengruan, zhukeqian1, yuzenghui, jiangkunkun,
	wangjingyi11, lushenming, Yanan Wang

Several problems with KVM stage 2 translation were found when testing against
the mainline code. The following is a description of the problems and of the
corresponding patches.

When installing a new pte entry or updating an old valid entry in stage 2
translation, we use get_page()/put_page() to track the page_count of the
page-table pages. PATCH 1/3 fixes the incorrect use of get_page()/put_page()
in stage 2, which might leave page-table pages unable to be freed when
unmapping a range.
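
As a rough sketch of the intended refcounting convention (hypothetical helper
names, not the actual pgtable.c code): a page-table page should gain a
reference only when one of its entries goes from invalid to valid, and drop
that reference when the entry is invalidated again, so that the unmap path
can free a table page once its count falls back to the initial allocation
reference.

	/* Hypothetical sketch of the refcounting rule, not the real code. */
	static void install_leaf(kvm_pte_t *ptep, kvm_pte_t new)
	{
		if (!kvm_pte_valid(*ptep))
			get_page(virt_to_page(ptep));	/* new valid entry -> take a ref */
		WRITE_ONCE(*ptep, new);			/* updating a valid entry takes no ref */
	}

	static void clear_leaf(kvm_pte_t *ptep)
	{
		if (kvm_pte_valid(*ptep)) {
			kvm_set_invalid_pte(ptep);
			put_page(virt_to_page(ptep));	/* entry invalidated -> drop the ref */
		}
	}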

When dirty logging of a guest with hugepages is finished, we should merge the
tables back into a block entry if readjustment of the huge mapping is found
necessary. In addition to installing the block entry, the lower-level mappings
for the block should also be unmapped to avoid multiple TLB entries for the
same IPA range. PATCH 2/3 adds the missing unmap operation when merging tables
into a block entry.
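
As a minimal sketch, the added unmap amounts to roughly the following for each
valid lower-level entry found under the block being reinstalled (names taken
from the PATCH 2/3 hunk below; data is the walker's private stage2_map_data):

	kvm_set_invalid_pte(ptep);		/* clear the stale lower-level entry */
	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
	put_page(virt_to_page(ptep));		/* drop the table page reference */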

The rewrite of the page-table code and fault handling added two different
handlers for "just relaxing permissions" and "mapping by stage 2 page-table
walk", which is a great improvement. Yet, in user_mem_abort(), the conditions
under which we choose between these two fault handlers are not strictly
distinguished. Calling the inappropriate fault handler can cause guest errors
such as an infinite loop (which results in a soft lockup). So a solution that
strictly distinguishes the two conditions is introduced in PATCH 3/3.
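
As a minimal sketch, the handler selection introduced by PATCH 3/3 boils down
to the following (the remaining arguments of kvm_pgtable_stage2_map() are
elided here):

	if (fault_status == FSC_PERM && vma_pagesize == fault_granule)
		/* the mapping already has the right size: relax permissions only */
		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
	else
		/* rebuild the mapping, e.g. merge tables back into a block */
		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize, ...);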

Yanan Wang (3):
  KVM: arm64: Fix possible memory leak in kvm stage2
  KVM: arm64: Fix handling of merging tables into a block entry
  KVM: arm64: Add usage of stage 2 fault lookup level in
    user_mem_abort()

 arch/arm64/include/asm/esr.h         |  1 +
 arch/arm64/include/asm/kvm_emulate.h |  5 +++++
 arch/arm64/kvm/hyp/pgtable.c         | 22 +++++++++++++++++-----
 arch/arm64/kvm/mmu.c                 | 11 +++++++++--
 4 files changed, 32 insertions(+), 7 deletions(-)

-- 
2.19.1



* [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2
  2020-11-30 12:18 ` Yanan Wang
@ 2020-11-30 12:18   ` Yanan Wang
  -1 siblings, 0 replies; 49+ messages in thread
From: Yanan Wang @ 2020-11-30 12:18 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	Will Deacon, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, Quentin Perret
  Cc: wanghaibin.wang, yezengruan, zhukeqian1, yuzenghui, jiangkunkun,
	wangjingyi11, lushenming, Yanan Wang

When installing a new leaf pte onto an invalid ptep, we need to get_page(ptep).
When just updating a valid leaf ptep, we shouldn't get_page(ptep).
An incorrect page_count of the translation table pages might lead to a memory
leak when unmapping a stage 2 memory range.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 0271b4a3b9fe..696b6aa83faf 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -186,6 +186,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
 		return old == pte;
 
 	smp_store_release(ptep, pte);
+	get_page(virt_to_page(ptep));
 	return true;
 }
 
@@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	/* There's an existing valid leaf entry, so perform break-before-make */
 	kvm_set_invalid_pte(ptep);
 	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+	put_page(virt_to_page(ptep));
 	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
 out:
 	data->phys += granule;
@@ -512,7 +514,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	}
 
 	if (stage2_map_walker_try_leaf(addr, end, level, ptep, data))
-		goto out_get_page;
+		return 0;
 
 	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
@@ -536,9 +538,8 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	}
 
 	kvm_set_table_pte(ptep, childp);
-
-out_get_page:
 	get_page(page);
+
 	return 0;
 }
 
-- 
2.19.1



* [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-11-30 12:18 ` Yanan Wang
@ 2020-11-30 12:18   ` Yanan Wang
  -1 siblings, 0 replies; 49+ messages in thread
From: Yanan Wang @ 2020-11-30 12:18 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	Will Deacon, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, Quentin Perret
  Cc: wanghaibin.wang, yezengruan, zhukeqian1, yuzenghui, jiangkunkun,
	wangjingyi11, lushenming, Yanan Wang

In the dirty logging case (logging_active == true), we need to collapse a block
entry into a table if necessary. After dirty logging is canceled, when merging
tables back into a block entry, we should not only free the non-huge page-table
pages but also unmap the non-huge mappings for the block. Without the unmap,
inconsistent TLB entries for the pages in the block will be created.

We could also use the unmap_stage2_range API to unmap the non-huge mappings,
but this could potentially free the upper-level page-table page which
will be useful later.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 696b6aa83faf..fec8dc9f2baa 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 	return 0;
 }
 
+static void stage2_flush_dcache(void *addr, u64 size);
+static bool stage2_pte_cacheable(kvm_pte_t pte);
+
 static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 				struct stage2_map_data *data)
 {
@@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	struct page *page = virt_to_page(ptep);
 
 	if (data->anchor) {
-		if (kvm_pte_valid(pte))
+		if (kvm_pte_valid(pte)) {
+			kvm_set_invalid_pte(ptep);
+			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
+				     addr, level);
 			put_page(page);
 
+			if (stage2_pte_cacheable(pte))
+				stage2_flush_dcache(kvm_pte_follow(pte),
+						    kvm_granule_size(level));
+		}
+
 		return 0;
 	}
 
@@ -574,7 +585,7 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
  * The behaviour of the LEAF callback then depends on whether or not the
  * anchor has been set. If not, then we're not using a block mapping higher
  * up the table and we perform the mapping at the existing leaves instead.
- * If, on the other hand, the anchor _is_ set, then we drop references to
+ * If, on the other hand, the anchor _is_ set, then we unmap the mapping of
  * all valid leaves so that the pages beneath the anchor can be freed.
  *
  * Finally, the TABLE_POST callback does nothing if the anchor has not
-- 
2.19.1



* [RFC PATCH 3/3] KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()
  2020-11-30 12:18 ` Yanan Wang
@ 2020-11-30 12:18   ` Yanan Wang
  -1 siblings, 0 replies; 49+ messages in thread
From: Yanan Wang @ 2020-11-30 12:18 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	Will Deacon, James Morse, Julien Thierry, Suzuki K Poulose,
	Gavin Shan, Quentin Perret
  Cc: wanghaibin.wang, yezengruan, zhukeqian1, yuzenghui, jiangkunkun,
	wangjingyi11, lushenming, Yanan Wang

If we get an FSC_PERM fault, we currently use only (logging_active && writable)
to decide whether to call kvm_pgtable_stage2_map(). There are two more cases we
should consider.

(1) After logging_active is configured back to false from true: when we get an
FSC_PERM fault with write_fault and adjustment of the hugepage mapping is
needed, we should merge the tables back into a block entry. This case is
currently missed by still calling kvm_pgtable_stage2_relax_perms(), which leads
to an endless loop and a guest panic due to soft lockup.

(2) We use (FSC_PERM && logging_active && writable) to decide on collapsing
a block entry into a table by calling kvm_pgtable_stage2_map(). But sometimes
we may only need to relax permissions when trying to write to a page rather
than a block. In this condition, using kvm_pgtable_stage2_relax_perms() is fine.

The ISS field bits[1:0] in the ESR_EL2 register indicate the stage 2 lookup
level at which a D-abort or I-abort occurred. By comparing the granule of the
fault lookup level with vma_pagesize, we can strictly distinguish whether to
call kvm_pgtable_stage2_relax_perms() or kvm_pgtable_stage2_map(), so the above
two cases are handled correctly.
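
For illustration, assuming 4KB pages (PAGE_SHIFT == 12) and the existing
ARM64_HW_PGTABLE_LEVEL_SHIFT() definition, the comparison looks roughly like:

	fault_level   = kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL; /* DFSC/IFSC bits[1:0] */
	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
	/*
	 * With 4KB pages: level 3 -> 4KB, level 2 -> 2MB, level 1 -> 1GB.
	 * If vma_pagesize == fault_granule, the faulting mapping already has the
	 * intended size and only its permissions need to change; otherwise the
	 * mapping has to be rebuilt via kvm_pgtable_stage2_map().
	 */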

Suggested-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 arch/arm64/include/asm/esr.h         |  1 +
 arch/arm64/include/asm/kvm_emulate.h |  5 +++++
 arch/arm64/kvm/mmu.c                 | 11 +++++++++--
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 22c81f1edda2..85a3e49f92f4 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -104,6 +104,7 @@
 /* Shared ISS fault status code(IFSC/DFSC) for Data/Instruction aborts */
 #define ESR_ELx_FSC		(0x3F)
 #define ESR_ELx_FSC_TYPE	(0x3C)
+#define ESR_ELx_FSC_LEVEL	(0x03)
 #define ESR_ELx_FSC_EXTABT	(0x10)
 #define ESR_ELx_FSC_SERROR	(0x11)
 #define ESR_ELx_FSC_ACCESS	(0x08)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5ef2669ccd6c..2e0e8edf6306 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -350,6 +350,11 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
 	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
 }
 
+static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
+{
+	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
+}
+
 static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
 {
 	switch (kvm_vcpu_trap_get_fault(vcpu)) {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1a01da9fdc99..75814a02d189 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -754,10 +754,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
-	unsigned long vma_pagesize;
+	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
+	unsigned long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 
+	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
 	write_fault = kvm_is_write_fault(vcpu);
 	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
 	VM_BUG_ON(write_fault && exec_fault);
@@ -896,7 +898,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
 		prot |= KVM_PGTABLE_PROT_X;
 
-	if (fault_status == FSC_PERM && !(logging_active && writable)) {
+	/*
+	 * If we have got an FSC_PERM fault, we only need to relax permissions
+	 * when vma_pagesize equals fault_granule. Otherwise, kvm_pgtable_stage2_map()
+	 * should be called to change the block size.
+	 */
+	if (fault_status == FSC_PERM && vma_pagesize == fault_granule) {
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
-- 
2.19.1



* Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2
  2020-11-30 12:18   ` Yanan Wang
@ 2020-11-30 13:21     ` Will Deacon
  -1 siblings, 0 replies; 49+ messages in thread
From: Will Deacon @ 2020-11-30 13:21 UTC (permalink / raw)
  To: Yanan Wang
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
> When installing a new leaf pte onto an invalid ptep, we need to get_page(ptep).
> When just updating a valid leaf ptep, we shouldn't get_page(ptep).
> An incorrect page_count of the translation table pages might lead to a memory
> leak when unmapping a stage 2 memory range.

Did you find this by inspection, or did you hit this in practice? I'd be
interested to see the backtrace for mapping over an existing mapping.

> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..696b6aa83faf 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -186,6 +186,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
>  		return old == pte;
>  
>  	smp_store_release(ptep, pte);
> +	get_page(virt_to_page(ptep));

This is also used for the hypervisor stage-1 page-table, so I'd prefer to
leave this function as-is.

>  	return true;
>  }
>  
> @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>  	/* There's an existing valid leaf entry, so perform break-before-make */
>  	kvm_set_invalid_pte(ptep);
>  	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> +	put_page(virt_to_page(ptep));
>  	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
>  out:
>  	data->phys += granule;

Isn't this hunk alone sufficient to solve the problem?

Will


* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-11-30 12:18   ` Yanan Wang
@ 2020-11-30 13:34     ` Will Deacon
  -1 siblings, 0 replies; 49+ messages in thread
From: Will Deacon @ 2020-11-30 13:34 UTC (permalink / raw)
  To: Yanan Wang
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
> In the dirty logging case (logging_active == true), we need to collapse a block
> entry into a table if necessary. After dirty logging is canceled, when merging
> tables back into a block entry, we should not only free the non-huge page-table
> pages but also unmap the non-huge mappings for the block. Without the unmap,
> inconsistent TLB entries for the pages in the block will be created.
> 
> We could also use the unmap_stage2_range API to unmap the non-huge mappings,
> but this could potentially free the upper-level page-table page which
> will be useful later.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 696b6aa83faf..fec8dc9f2baa 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>  	return 0;
>  }
>  
> +static void stage2_flush_dcache(void *addr, u64 size);
> +static bool stage2_pte_cacheable(kvm_pte_t pte);
> +
>  static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  				struct stage2_map_data *data)
>  {
> @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  	struct page *page = virt_to_page(ptep);
>  
>  	if (data->anchor) {
> -		if (kvm_pte_valid(pte))
> +		if (kvm_pte_valid(pte)) {
> +			kvm_set_invalid_pte(ptep);
> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
> +				     addr, level);
>  			put_page(page);

This doesn't make sense to me: the page-table pages we're walking when the
anchor is set are not accessible to the hardware walker because we unhooked
the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
TLB invalidation.

Are you seeing a problem in practice here?

> +			if (stage2_pte_cacheable(pte))
> +				stage2_flush_dcache(kvm_pte_follow(pte),
> +						    kvm_granule_size(level));

I don't understand the need for the flush either, as we're just coalescing
existing entries into a larger block mapping.

Will


* Re: [RFC PATCH 3/3] KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()
  2020-11-30 12:18   ` Yanan Wang
@ 2020-11-30 13:49     ` Will Deacon
  -1 siblings, 0 replies; 49+ messages in thread
From: Will Deacon @ 2020-11-30 13:49 UTC (permalink / raw)
  To: Yanan Wang
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

On Mon, Nov 30, 2020 at 08:18:47PM +0800, Yanan Wang wrote:
> If we get an FSC_PERM fault, we currently use only (logging_active && writable)
> to decide whether to call kvm_pgtable_stage2_map(). There are two more cases we
> should consider.
> 
> (1) After logging_active is configured back to false from true: when we get an
> FSC_PERM fault with write_fault and adjustment of the hugepage mapping is
> needed, we should merge the tables back into a block entry. This case is
> currently missed by still calling kvm_pgtable_stage2_relax_perms(), which leads
> to an endless loop and a guest panic due to soft lockup.
> 
> (2) We use (FSC_PERM && logging_active && writable) to decide on collapsing
> a block entry into a table by calling kvm_pgtable_stage2_map(). But sometimes
> we may only need to relax permissions when trying to write to a page rather
> than a block. In this condition, using kvm_pgtable_stage2_relax_perms() is fine.
> 
> The ISS field bits[1:0] in the ESR_EL2 register indicate the stage 2 lookup
> level at which a D-abort or I-abort occurred. By comparing the granule of the
> fault lookup level with vma_pagesize, we can strictly distinguish whether to
> call kvm_pgtable_stage2_relax_perms() or kvm_pgtable_stage2_map(), so the above
> two cases are handled correctly.
> 
> Suggested-by: Keqian Zhu <zhukeqian1@huawei.com>
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  arch/arm64/include/asm/esr.h         |  1 +
>  arch/arm64/include/asm/kvm_emulate.h |  5 +++++
>  arch/arm64/kvm/mmu.c                 | 11 +++++++++--
>  3 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index 22c81f1edda2..85a3e49f92f4 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -104,6 +104,7 @@
>  /* Shared ISS fault status code(IFSC/DFSC) for Data/Instruction aborts */
>  #define ESR_ELx_FSC		(0x3F)
>  #define ESR_ELx_FSC_TYPE	(0x3C)
> +#define ESR_ELx_FSC_LEVEL	(0x03)
>  #define ESR_ELx_FSC_EXTABT	(0x10)
>  #define ESR_ELx_FSC_SERROR	(0x11)
>  #define ESR_ELx_FSC_ACCESS	(0x08)
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 5ef2669ccd6c..2e0e8edf6306 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -350,6 +350,11 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
>  	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
>  }
>  
> +static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
> +{
> +	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
> +}
> +
>  static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
>  {
>  	switch (kvm_vcpu_trap_get_fault(vcpu)) {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 1a01da9fdc99..75814a02d189 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -754,10 +754,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	gfn_t gfn;
>  	kvm_pfn_t pfn;
>  	bool logging_active = memslot_is_logging(memslot);
> -	unsigned long vma_pagesize;
> +	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
> +	unsigned long vma_pagesize, fault_granule;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>  	struct kvm_pgtable *pgt;
>  
> +	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);

I like the idea, but is this macro reliable for stage-2 page-tables, given
that we could have a concatenated pgd?

Will


* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-11-30 13:34     ` Will Deacon
@ 2020-11-30 15:24       ` wangyanan (Y)
  -1 siblings, 0 replies; 49+ messages in thread
From: wangyanan (Y) @ 2020-11-30 15:24 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming


On 2020/11/30 21:34, Will Deacon wrote:
> On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
>> In the dirty logging case (logging_active == true), we need to collapse a block
>> entry into a table if necessary. After dirty logging is canceled, when merging
>> tables back into a block entry, we should not only free the non-huge page-table
>> pages but also unmap the non-huge mappings for the block. Without the unmap,
>> inconsistent TLB entries for the pages in the block will be created.
>>
>> We could also use the unmap_stage2_range API to unmap the non-huge mappings,
>> but this could potentially free the upper-level page-table page which
>> will be useful later.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   arch/arm64/kvm/hyp/pgtable.c | 15 +++++++++++++--
>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index 696b6aa83faf..fec8dc9f2baa 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>>   	return 0;
>>   }
>>   
>> +static void stage2_flush_dcache(void *addr, u64 size);
>> +static bool stage2_pte_cacheable(kvm_pte_t pte);
>> +
>>   static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>   				struct stage2_map_data *data)
>>   {
>> @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>   	struct page *page = virt_to_page(ptep);
>>   
>>   	if (data->anchor) {
>> -		if (kvm_pte_valid(pte))
>> +		if (kvm_pte_valid(pte)) {
>> +			kvm_set_invalid_pte(ptep);
>> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
>> +				     addr, level);
>>   			put_page(page);
> This doesn't make sense to me: the page-table pages we're walking when the
> anchor is set are not accessible to the hardware walker because we unhooked
> the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
> TLB invalidation.
>
> Are you seeing a problem in practice here?

Yes, I did indeed find a problem in practice.

When the migration was cancelled, a TLB conflict abort was found in the guest.

This problem was fixed before the rework of the page-table code; you can have
a look at the following two links:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c3736cd32bf5197aed1410ae826d2d254a5b277

https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035031.html

>> +			if (stage2_pte_cacheable(pte))
>> +				stage2_flush_dcache(kvm_pte_follow(pte),
>> +						    kvm_granule_size(level));
> I don't understand the need for the flush either, as we're just coalescing
> existing entries into a larger block mapping.

In my opinion, after unmapping, it is necessary to ensure cache coherency,
because theoretically it is unknown whether the memory attributes of a future
mapping will change (e.g. cacheable -> non-cacheable).

> Will
> .


* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-11-30 15:24       ` wangyanan (Y)
@ 2020-11-30 16:01         ` Will Deacon
  -1 siblings, 0 replies; 49+ messages in thread
From: Will Deacon @ 2020-11-30 16:01 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

Hi,

Cheers for the quick reply. See below for more questions...

On Mon, Nov 30, 2020 at 11:24:19PM +0800, wangyanan (Y) wrote:
> On 2020/11/30 21:34, Will Deacon wrote:
> > On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
> > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > index 696b6aa83faf..fec8dc9f2baa 100644
> > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> > >   	return 0;
> > >   }
> > > +static void stage2_flush_dcache(void *addr, u64 size);
> > > +static bool stage2_pte_cacheable(kvm_pte_t pte);
> > > +
> > >   static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > >   				struct stage2_map_data *data)
> > >   {
> > > @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > >   	struct page *page = virt_to_page(ptep);
> > >   	if (data->anchor) {
> > > -		if (kvm_pte_valid(pte))
> > > +		if (kvm_pte_valid(pte)) {
> > > +			kvm_set_invalid_pte(ptep);
> > > +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
> > > +				     addr, level);
> > >   			put_page(page);
> > This doesn't make sense to me: the page-table pages we're walking when the
> > anchor is set are not accessible to the hardware walker because we unhooked
> > the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
> > TLB invalidation.
> > 
> > Are you seeing a problem in practice here?
> 
> Yes, I did indeed find a problem in practice.
> 
> When the migration was cancelled, a TLB conflict abort was found in the guest.
> 
> This problem was fixed before the rework of the page-table code; you can have
> a look at the following two links:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c3736cd32bf5197aed1410ae826d2d254a5b277
> 
> https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035031.html

Ok, let's go through this, because I still don't see the bug. Please correct
me if you spot any mistakes:

  1. We have a block mapping for X => Y
  2. Dirty logging is enabled, so the block mapping is write-protected and
     ends up being split into page mappings
  3. Dirty logging is disabled due to a failed migration.

--- At this point, I think we agree that the state of the MMU is alright ---

  4. We take a stage-2 fault and want to reinstall the block mapping:

     a. kvm_pgtable_stage2_map() is invoked to install the block mapping
     b. stage2_map_walk_table_pre() finds a table where we would like to
        install the block:

	i.   The anchor is set to point at this entry
	ii.  The entry is made invalid
	iii. We invalidate the TLB for the input address. This is
	     TLBI IPAS2SE1IS without level hint and then TLBI VMALLE1IS.

	*** At this point, the page-table pointed to by the old table entry
	    is not reachable to the hardware walker ***

     c. stage2_map_walk_leaf() is called for each leaf entry in the
        now-unreachable subtree, dropping page-references for each valid
	entry it finds.
     d. stage2_map_walk_table_post() is eventually called for the entry
        which we cleared back in b.ii, so we install the new block mapping.

You are proposing to add additional TLB invalidation to (c), but I don't
think that is necessary, thanks to the invalidation already performed in
b.iii. What am I missing here?

> > > +			if (stage2_pte_cacheable(pte))
> > > +				stage2_flush_dcache(kvm_pte_follow(pte),
> > > +						    kvm_granule_size(level));
> > I don't understand the need for the flush either, as we're just coalescing
> > existing entries into a larger block mapping.
> 
> In my opinion, after unmapping, it is necessary to ensure cache coherency,
> because theoretically it is unknown whether the memory attributes of a future
> mapping will change (e.g. cacheable -> non-cacheable).

But in this case we're just changing the structure of the page-tables,
not the pages which are mapped, are we?

Will

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
@ 2020-11-30 16:01         ` Will Deacon
  0 siblings, 0 replies; 49+ messages in thread
From: Will Deacon @ 2020-11-30 16:01 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: jiangkunkun, Gavin Shan, Suzuki K Poulose, Marc Zyngier,
	wangjingyi11, Quentin Perret, lushenming, linux-kernel,
	yezengruan, James Morse, linux-arm-kernel, Catalin Marinas,
	yuzenghui, wanghaibin.wang, zhukeqian1, Julien Thierry

Hi,

Cheers for the quick reply. See below for more questions...

On Mon, Nov 30, 2020 at 11:24:19PM +0800, wangyanan (Y) wrote:
> On 2020/11/30 21:34, Will Deacon wrote:
> > On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
> > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > index 696b6aa83faf..fec8dc9f2baa 100644
> > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> > >   	return 0;
> > >   }
> > > +static void stage2_flush_dcache(void *addr, u64 size);
> > > +static bool stage2_pte_cacheable(kvm_pte_t pte);
> > > +
> > >   static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > >   				struct stage2_map_data *data)
> > >   {
> > > @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > >   	struct page *page = virt_to_page(ptep);
> > >   	if (data->anchor) {
> > > -		if (kvm_pte_valid(pte))
> > > +		if (kvm_pte_valid(pte)) {
> > > +			kvm_set_invalid_pte(ptep);
> > > +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
> > > +				     addr, level);
> > >   			put_page(page);
> > This doesn't make sense to me: the page-table pages we're walking when the
> > anchor is set are not accessible to the hardware walker because we unhooked
> > the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
> > TLB invalidation.
> > 
> > Are you seeing a problem in practice here?
> 
> Yes, I did indeed find a problem in practice.
> 
> When the migration was cancelled, a TLB conflict abort was found in the guest.
> 
> This problem was fixed before the rework of the page-table code; you can have a
> look at the following two links:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c3736cd32bf5197aed1410ae826d2d254a5b277
> 
> https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035031.html

Ok, let's go through this, because I still don't see the bug. Please correct
me if you spot any mistakes:

  1. We have a block mapping for X => Y
  2. Dirty logging is enabled, so the block mapping is write-protected and
     ends up being split into page mappings
  3. Dirty logging is disabled due to a failed migration.

--- At this point, I think we agree that the state of the MMU is alright ---

  4. We take a stage-2 fault and want to reinstall the block mapping:

     a. kvm_pgtable_stage2_map() is invoked to install the block mapping
     b. stage2_map_walk_table_pre() finds a table where we would like to
        install the block:

	i.   The anchor is set to point at this entry
	ii.  The entry is made invalid
	iii. We invalidate the TLB for the input address. This is
	     TLBI IPAS2SE1IS without level hint and then TLBI VMALLE1IS.

	*** At this point, the page-table pointed to by the old table entry
	    is not reachable to the hardware walker ***

     c. stage2_map_walk_leaf() is called for each leaf entry in the
        now-unreachable subtree, dropping page-references for each valid
	entry it finds.
     d. stage2_map_walk_table_post() is eventually called for the entry
        which we cleared back in b.ii, so we install the new block mapping.

You are proposing to add additional TLB invalidation to (c), but I don't
think that is necessary, thanks to the invalidation already performed in
b.iii. What am I missing here?

> > > +			if (stage2_pte_cacheable(pte))
> > > +				stage2_flush_dcache(kvm_pte_follow(pte),
> > > +						    kvm_granule_size(level));
> > I don't understand the need for the flush either, as we're just coalescing
> > existing entries into a larger block mapping.
> 
> In my opinion, after unmapping, it is necessary to ensure the cache
> coherency, because theoretically it is unknown whether the memory
> attributes of the future mapping will change (cacheable -> non_cacheable).

But in this case we're just changing the structure of the page-tables,
not the pages which are mapped, are we?

Will


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 3/3] KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()
  2020-11-30 12:18   ` Yanan Wang
  (?)
  (?)
@ 2020-11-30 19:40   ` kernel test robot
  -1 siblings, 0 replies; 49+ messages in thread
From: kernel test robot @ 2020-11-30 19:40 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 45527 bytes --]

Hi Yanan,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on kvmarm/next]
[also build test ERROR on arm64/for-next/core linux/master linus/master v5.10-rc6 next-20201130]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Yanan-Wang/Fix-several-bugs-in-KVM-stage-2-translation/20201130-202556
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/3f926cff86d61ffd0145d3fadf339230b41d8994
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Yanan-Wang/Fix-several-bugs-in-KVM-stage-2-translation/20201130-202556
        git checkout 3f926cff86d61ffd0145d3fadf339230b41d8994
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from arch/arm64/kvm/arm.c:37:
   arch/arm64/include/asm/kvm_emulate.h: In function 'kvm_vcpu_trap_get_fault_level':
>> arch/arm64/include/asm/kvm_emulate.h:339:29: error: invalid storage class for function 'kvm_vcpu_abt_issea'
     339 | static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
         |                             ^~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:358:28: error: invalid storage class for function 'kvm_vcpu_sys_get_rt'
     358 | static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
         |                            ^~~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:364:20: error: invalid storage class for function 'kvm_is_write_fault'
     364 | static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
         |                    ^~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:375:29: error: invalid storage class for function 'kvm_vcpu_get_mpidr_aff'
     375 | static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
         |                             ^~~~~~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:380:20: error: invalid storage class for function 'kvm_vcpu_set_be'
     380 | static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
         |                    ^~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:391:20: error: invalid storage class for function 'kvm_vcpu_is_be'
     391 | static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
         |                    ^~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:399:29: error: invalid storage class for function 'vcpu_data_guest_to_host'
     399 | static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
         |                             ^~~~~~~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:430:29: error: invalid storage class for function 'vcpu_data_host_to_guest'
     430 | static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
         |                             ^~~~~~~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:461:29: error: invalid storage class for function 'kvm_incr_pc'
     461 | static __always_inline void kvm_incr_pc(struct kvm_vcpu *vcpu)
         |                             ^~~~~~~~~~~
   In file included from arch/arm64/kvm/arm.c:40:
>> include/kvm/arm_hypercalls.h:11:19: error: invalid storage class for function 'smccc_get_function'
      11 | static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
         |                   ^~~~~~~~~~~~~~~~~~
>> include/kvm/arm_hypercalls.h:16:29: error: invalid storage class for function 'smccc_get_arg1'
      16 | static inline unsigned long smccc_get_arg1(struct kvm_vcpu *vcpu)
         |                             ^~~~~~~~~~~~~~
>> include/kvm/arm_hypercalls.h:21:29: error: invalid storage class for function 'smccc_get_arg2'
      21 | static inline unsigned long smccc_get_arg2(struct kvm_vcpu *vcpu)
         |                             ^~~~~~~~~~~~~~
>> include/kvm/arm_hypercalls.h:26:29: error: invalid storage class for function 'smccc_get_arg3'
      26 | static inline unsigned long smccc_get_arg3(struct kvm_vcpu *vcpu)
         |                             ^~~~~~~~~~~~~~
>> include/kvm/arm_hypercalls.h:31:20: error: invalid storage class for function 'smccc_set_retval'
      31 | static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
         |                    ^~~~~~~~~~~~~~~~
   In file included from arch/arm64/kvm/arm.c:42:
>> include/kvm/arm_psci.h:23:19: error: invalid storage class for function 'kvm_psci_version'
      23 | static inline int kvm_psci_version(struct kvm_vcpu *vcpu, struct kvm *kvm)
         |                   ^~~~~~~~~~~~~~~~
   In file included from include/asm-generic/percpu.h:7,
                    from arch/arm64/include/asm/percpu.h:242,
                    from arch/arm64/include/asm/smp.h:28,
                    from include/linux/smp.h:85,
                    from include/linux/lockdep.h:14,
                    from include/linux/mutex.h:17,
                    from include/linux/notifier.h:14,
                    from include/linux/cpu_pm.h:13,
                    from arch/arm64/kvm/arm.c:8:
>> include/linux/percpu-defs.h:87:33: error: section attribute cannot be specified for local variables
      87 |  extern __PCPU_DUMMY_ATTRS char __pcpu_scope_##name;  \
         |                                 ^~~~~~~~~~~~~
   include/linux/percpu-defs.h:112:2: note: in expansion of macro 'DECLARE_PER_CPU_SECTION'
     112 |  DECLARE_PER_CPU_SECTION(type, name, "")
         |  ^~~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/include/asm/kvm_asm.h:77:2: note: in expansion of macro 'DECLARE_PER_CPU'
      77 |  DECLARE_PER_CPU(type, sym)
         |  ^~~~~~~~~~~~~~~
   arch/arm64/include/asm/kvm_asm.h:82:2: note: in expansion of macro 'DECLARE_KVM_VHE_PER_CPU'
      82 |  DECLARE_KVM_VHE_PER_CPU(type, sym); \
         |  ^~~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:48:1: note: in expansion of macro 'DECLARE_KVM_HYP_PER_CPU'
      48 | DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
         | ^~~~~~~~~~~~~~~~~~~~~~~
>> arch/arm64/kvm/arm.c:48:40: error: section attribute cannot be specified for local variables
      48 | DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
         |                                        ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:88:44: note: in definition of macro 'DECLARE_PER_CPU_SECTION'
      88 |  extern __PCPU_ATTRS(sec) __typeof__(type) name
         |                                            ^~~~
   arch/arm64/include/asm/kvm_asm.h:77:2: note: in expansion of macro 'DECLARE_PER_CPU'
      77 |  DECLARE_PER_CPU(type, sym)
         |  ^~~~~~~~~~~~~~~
   arch/arm64/include/asm/kvm_asm.h:82:2: note: in expansion of macro 'DECLARE_KVM_VHE_PER_CPU'
      82 |  DECLARE_KVM_VHE_PER_CPU(type, sym); \
         |  ^~~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:48:1: note: in expansion of macro 'DECLARE_KVM_HYP_PER_CPU'
      48 | DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
         | ^~~~~~~~~~~~~~~~~~~~~~~
>> include/linux/percpu-defs.h:87:33: error: section attribute cannot be specified for local variables
      87 |  extern __PCPU_DUMMY_ATTRS char __pcpu_scope_##name;  \
         |                                 ^~~~~~~~~~~~~
   include/linux/percpu-defs.h:112:2: note: in expansion of macro 'DECLARE_PER_CPU_SECTION'
     112 |  DECLARE_PER_CPU_SECTION(type, name, "")
         |  ^~~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/include/asm/kvm_asm.h:79:2: note: in expansion of macro 'DECLARE_PER_CPU'
      79 |  DECLARE_PER_CPU(type, kvm_nvhe_sym(sym))
         |  ^~~~~~~~~~~~~~~
   arch/arm64/include/asm/kvm_asm.h:83:2: note: in expansion of macro 'DECLARE_KVM_NVHE_PER_CPU'
      83 |  DECLARE_KVM_NVHE_PER_CPU(type, sym)
         |  ^~~~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:48:1: note: in expansion of macro 'DECLARE_KVM_HYP_PER_CPU'
      48 | DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
         | ^~~~~~~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/hyp_image.h:14:27: error: section attribute cannot be specified for local variables
      14 | #define kvm_nvhe_sym(sym) __kvm_nvhe_##sym
         |                           ^~~~~~~~~~~
   include/linux/percpu-defs.h:88:44: note: in definition of macro 'DECLARE_PER_CPU_SECTION'
      88 |  extern __PCPU_ATTRS(sec) __typeof__(type) name
         |                                            ^~~~
   arch/arm64/include/asm/kvm_asm.h:79:2: note: in expansion of macro 'DECLARE_PER_CPU'
      79 |  DECLARE_PER_CPU(type, kvm_nvhe_sym(sym))
         |  ^~~~~~~~~~~~~~~
   arch/arm64/include/asm/kvm_asm.h:79:24: note: in expansion of macro 'kvm_nvhe_sym'
      79 |  DECLARE_PER_CPU(type, kvm_nvhe_sym(sym))
         |                        ^~~~~~~~~~~~
   arch/arm64/include/asm/kvm_asm.h:83:2: note: in expansion of macro 'DECLARE_KVM_NVHE_PER_CPU'
      83 |  DECLARE_KVM_NVHE_PER_CPU(type, sym)
         |  ^~~~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:48:1: note: in expansion of macro 'DECLARE_KVM_HYP_PER_CPU'
      48 | DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
         | ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:92:33: error: section attribute cannot be specified for local variables
      92 |  extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;  \
         |                                 ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:115:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
     115 |  DEFINE_PER_CPU_SECTION(type, name, "")
         |  ^~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:50:8: note: in expansion of macro 'DEFINE_PER_CPU'
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |        ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:93:26: error: section attribute cannot be specified for local variables
      93 |  __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;   \
         |                          ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:115:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
     115 |  DEFINE_PER_CPU_SECTION(type, name, "")
         |  ^~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:50:8: note: in expansion of macro 'DEFINE_PER_CPU'
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |        ^~~~~~~~~~~~~~
>> include/linux/percpu-defs.h:93:26: error: declaration of '__pcpu_unique_kvm_arm_hyp_stack_page' with no linkage follows extern declaration
      93 |  __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;   \
         |                          ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:115:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
     115 |  DEFINE_PER_CPU_SECTION(type, name, "")
         |  ^~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:50:8: note: in expansion of macro 'DEFINE_PER_CPU'
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |        ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:92:33: note: previous declaration of '__pcpu_unique_kvm_arm_hyp_stack_page' was here
      92 |  extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;  \
         |                                 ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:115:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
     115 |  DEFINE_PER_CPU_SECTION(type, name, "")
         |  ^~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:50:8: note: in expansion of macro 'DEFINE_PER_CPU'
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |        ^~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:50:38: error: section attribute cannot be specified for local variables
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |                                      ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:94:44: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
      94 |  extern __PCPU_ATTRS(sec) __typeof__(type) name;   \
         |                                            ^~~~
   arch/arm64/kvm/arm.c:50:8: note: in expansion of macro 'DEFINE_PER_CPU'
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |        ^~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:50:38: error: section attribute cannot be specified for local variables
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |                                      ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:95:44: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
      95 |  __PCPU_ATTRS(sec) __weak __typeof__(type) name
         |                                            ^~~~
   arch/arm64/kvm/arm.c:50:8: note: in expansion of macro 'DEFINE_PER_CPU'
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |        ^~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:50:38: error: weak declaration of 'kvm_arm_hyp_stack_page' must be public
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |                                      ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:95:44: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
      95 |  __PCPU_ATTRS(sec) __weak __typeof__(type) name
         |                                            ^~~~
   arch/arm64/kvm/arm.c:50:8: note: in expansion of macro 'DEFINE_PER_CPU'
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |        ^~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:50:38: error: declaration of 'kvm_arm_hyp_stack_page' with no linkage follows extern declaration
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |                                      ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:95:44: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
      95 |  __PCPU_ATTRS(sec) __weak __typeof__(type) name
         |                                            ^~~~
   arch/arm64/kvm/arm.c:50:8: note: in expansion of macro 'DEFINE_PER_CPU'
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |        ^~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:50:38: note: previous declaration of 'kvm_arm_hyp_stack_page' was here
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |                                      ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:94:44: note: in definition of macro 'DEFINE_PER_CPU_SECTION'
      94 |  extern __PCPU_ATTRS(sec) __typeof__(type) name;   \
         |                                            ^~~~
   arch/arm64/kvm/arm.c:50:8: note: in expansion of macro 'DEFINE_PER_CPU'
      50 | static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
         |        ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:92:33: error: section attribute cannot be specified for local variables
      92 |  extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;  \
         |                                 ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:115:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
     115 |  DEFINE_PER_CPU_SECTION(type, name, "")
         |  ^~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:60:8: note: in expansion of macro 'DEFINE_PER_CPU'
      60 | static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
         |        ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:93:26: error: section attribute cannot be specified for local variables
      93 |  __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;   \
         |                          ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:115:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
     115 |  DEFINE_PER_CPU_SECTION(type, name, "")
         |  ^~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:60:8: note: in expansion of macro 'DEFINE_PER_CPU'
      60 | static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
         |        ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:93:26: error: declaration of '__pcpu_unique_kvm_arm_hardware_enabled' with no linkage follows extern declaration
      93 |  __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;   \
         |                          ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:115:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
     115 |  DEFINE_PER_CPU_SECTION(type, name, "")
         |  ^~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:60:8: note: in expansion of macro 'DEFINE_PER_CPU'
      60 | static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
         |        ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:92:33: note: previous declaration of '__pcpu_unique_kvm_arm_hardware_enabled' was here
      92 |  extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;  \
         |                                 ^~~~~~~~~~~~~~
   include/linux/percpu-defs.h:115:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
     115 |  DEFINE_PER_CPU_SECTION(type, name, "")
         |  ^~~~~~~~~~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:60:8: note: in expansion of macro 'DEFINE_PER_CPU'
      60 | static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
         |        ^~~~~~~~~~~~~~
   arch/arm64/kvm/arm.c:60:38: error: section attribute cannot be specified for local variables
      60 | static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
--
   In file included from arch/arm64/kvm/mmu.c:20:
   arch/arm64/include/asm/kvm_emulate.h: In function 'kvm_vcpu_trap_get_fault_level':
>> arch/arm64/include/asm/kvm_emulate.h:339:29: error: invalid storage class for function 'kvm_vcpu_abt_issea'
     339 | static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
         |                             ^~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:358:28: error: invalid storage class for function 'kvm_vcpu_sys_get_rt'
     358 | static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
         |                            ^~~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:364:20: error: invalid storage class for function 'kvm_is_write_fault'
     364 | static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
         |                    ^~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:375:29: error: invalid storage class for function 'kvm_vcpu_get_mpidr_aff'
     375 | static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
         |                             ^~~~~~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:380:20: error: invalid storage class for function 'kvm_vcpu_set_be'
     380 | static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
         |                    ^~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:391:20: error: invalid storage class for function 'kvm_vcpu_is_be'
     391 | static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
         |                    ^~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:399:29: error: invalid storage class for function 'vcpu_data_guest_to_host'
     399 | static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
         |                             ^~~~~~~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:430:29: error: invalid storage class for function 'vcpu_data_host_to_guest'
     430 | static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
         |                             ^~~~~~~~~~~~~~~~~~~~~~~
>> arch/arm64/include/asm/kvm_emulate.h:461:29: error: invalid storage class for function 'kvm_incr_pc'
     461 | static __always_inline void kvm_incr_pc(struct kvm_vcpu *vcpu)
         |                             ^~~~~~~~~~~
   In file included from include/trace/events/kvm.h:5,
                    from arch/arm64/kvm/mmu.c:12:
   include/linux/tracepoint.h:237:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     237 |  extern struct tracepoint __tracepoint_##name;   \
         |  ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:14:1: note: in expansion of macro 'TRACE_EVENT'
      14 | TRACE_EVENT(kvm_entry,
         | ^~~~~~~~~~~
>> include/linux/tracepoint.h:238:21: error: invalid storage class for function 'trace_kvm_entry'
     238 |  static inline void trace_##name(proto)    \
         |                     ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:14:1: note: in expansion of macro 'TRACE_EVENT'
      14 | TRACE_EVENT(kvm_entry,
         | ^~~~~~~~~~~
>> include/linux/tracepoint.h:210:21: error: invalid storage class for function 'trace_kvm_entry_rcuidle'
     210 |  static inline void trace_##name##_rcuidle(proto)  \
         |                     ^~~~~~
   include/linux/tracepoint.h:251:2: note: in expansion of macro '__DECLARE_TRACE_RCU'
     251 |  __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:14:1: note: in expansion of macro 'TRACE_EVENT'
      14 | TRACE_EVENT(kvm_entry,
         | ^~~~~~~~~~~
>> include/linux/tracepoint.h:254:2: error: invalid storage class for function 'register_trace_kvm_entry'
     254 |  register_trace_##name(void (*probe)(data_proto), void *data) \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:14:1: note: in expansion of macro 'TRACE_EVENT'
      14 | TRACE_EVENT(kvm_entry,
         | ^~~~~~~~~~~
>> include/linux/tracepoint.h:260:2: error: invalid storage class for function 'register_trace_prio_kvm_entry'
     260 |  register_trace_prio_##name(void (*probe)(data_proto), void *data,\
         |  ^~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:14:1: note: in expansion of macro 'TRACE_EVENT'
      14 | TRACE_EVENT(kvm_entry,
         | ^~~~~~~~~~~
>> include/linux/tracepoint.h:267:2: error: invalid storage class for function 'unregister_trace_kvm_entry'
     267 |  unregister_trace_##name(void (*probe)(data_proto), void *data) \
         |  ^~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:14:1: note: in expansion of macro 'TRACE_EVENT'
      14 | TRACE_EVENT(kvm_entry,
         | ^~~~~~~~~~~
>> include/linux/tracepoint.h:273:2: error: invalid storage class for function 'check_trace_callback_type_kvm_entry'
     273 |  check_trace_callback_type_##name(void (*cb)(data_proto)) \
         |  ^~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:14:1: note: in expansion of macro 'TRACE_EVENT'
      14 | TRACE_EVENT(kvm_entry,
         | ^~~~~~~~~~~
>> include/linux/tracepoint.h:277:2: error: invalid storage class for function 'trace_kvm_entry_enabled'
     277 |  trace_##name##_enabled(void)     \
         |  ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:14:1: note: in expansion of macro 'TRACE_EVENT'
      14 | TRACE_EVENT(kvm_entry,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:235:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     235 |  extern int __traceiter_##name(data_proto);   \
         |  ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:29:1: note: in expansion of macro 'TRACE_EVENT'
      29 | TRACE_EVENT(kvm_exit,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:237:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     237 |  extern struct tracepoint __tracepoint_##name;   \
         |  ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:29:1: note: in expansion of macro 'TRACE_EVENT'
      29 | TRACE_EVENT(kvm_exit,
         | ^~~~~~~~~~~
>> include/linux/tracepoint.h:238:21: error: invalid storage class for function 'trace_kvm_exit'
     238 |  static inline void trace_##name(proto)    \
         |                     ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:29:1: note: in expansion of macro 'TRACE_EVENT'
      29 | TRACE_EVENT(kvm_exit,
         | ^~~~~~~~~~~
>> include/linux/tracepoint.h:210:21: error: invalid storage class for function 'trace_kvm_exit_rcuidle'
     210 |  static inline void trace_##name##_rcuidle(proto)  \
         |                     ^~~~~~
   include/linux/tracepoint.h:251:2: note: in expansion of macro '__DECLARE_TRACE_RCU'
     251 |  __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:29:1: note: in expansion of macro 'TRACE_EVENT'
      29 | TRACE_EVENT(kvm_exit,
         | ^~~~~~~~~~~
>> include/linux/tracepoint.h:254:2: error: invalid storage class for function 'register_trace_kvm_exit'
     254 |  register_trace_##name(void (*probe)(data_proto), void *data) \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:29:1: note: in expansion of macro 'TRACE_EVENT'
      29 | TRACE_EVENT(kvm_exit,
         | ^~~~~~~~~~~
>> include/linux/tracepoint.h:260:2: error: invalid storage class for function 'register_trace_prio_kvm_exit'
     260 |  register_trace_prio_##name(void (*probe)(data_proto), void *data,\
         |  ^~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:29:1: note: in expansion of macro 'TRACE_EVENT'
      29 | TRACE_EVENT(kvm_exit,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:267:2: error: invalid storage class for function 'unregister_trace_kvm_exit'
     267 |  unregister_trace_##name(void (*probe)(data_proto), void *data) \
         |  ^~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:29:1: note: in expansion of macro 'TRACE_EVENT'
      29 | TRACE_EVENT(kvm_exit,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:273:2: error: invalid storage class for function 'check_trace_callback_type_kvm_exit'
     273 |  check_trace_callback_type_##name(void (*cb)(data_proto)) \
         |  ^~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:29:1: note: in expansion of macro 'TRACE_EVENT'
      29 | TRACE_EVENT(kvm_exit,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:277:2: error: invalid storage class for function 'trace_kvm_exit_enabled'
     277 |  trace_##name##_enabled(void)     \
         |  ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:29:1: note: in expansion of macro 'TRACE_EVENT'
      29 | TRACE_EVENT(kvm_exit,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:235:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     235 |  extern int __traceiter_##name(data_proto);   \
         |  ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:52:1: note: in expansion of macro 'TRACE_EVENT'
      52 | TRACE_EVENT(kvm_guest_fault,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:237:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     237 |  extern struct tracepoint __tracepoint_##name;   \
         |  ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:52:1: note: in expansion of macro 'TRACE_EVENT'
      52 | TRACE_EVENT(kvm_guest_fault,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:238:21: error: invalid storage class for function 'trace_kvm_guest_fault'
     238 |  static inline void trace_##name(proto)    \
         |                     ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:52:1: note: in expansion of macro 'TRACE_EVENT'
      52 | TRACE_EVENT(kvm_guest_fault,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:210:21: error: invalid storage class for function 'trace_kvm_guest_fault_rcuidle'
     210 |  static inline void trace_##name##_rcuidle(proto)  \
         |                     ^~~~~~
   include/linux/tracepoint.h:251:2: note: in expansion of macro '__DECLARE_TRACE_RCU'
     251 |  __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:547:2: note: in expansion of macro 'DECLARE_TRACE'
     547 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   arch/arm64/kvm/trace_arm.h:52:1: note: in expansion of macro 'TRACE_EVENT'
      52 | TRACE_EVENT(kvm_guest_fault,
         | ^~~~~~~~~~~
   include/linux/tracepoint.h:254:2: error: invalid storage class for function 'register_trace_kvm_guest_fault'
     254 |  register_trace_##name(void (*probe)(data_proto), void *data) \
..

vim +/kvm_vcpu_abt_issea +339 arch/arm64/include/asm/kvm_emulate.h

3f926cff86d61ff Yanan Wang        2020-11-30  338  
c9a636f29b5f236 Will Deacon       2020-07-29 @339  static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
bb428921b777a5e James Morse       2017-07-18  340  {
a2b83133339067c Dongjiu Geng      2017-10-30  341  	switch (kvm_vcpu_trap_get_fault(vcpu)) {
bb428921b777a5e James Morse       2017-07-18  342  	case FSC_SEA:
bb428921b777a5e James Morse       2017-07-18  343  	case FSC_SEA_TTW0:
bb428921b777a5e James Morse       2017-07-18  344  	case FSC_SEA_TTW1:
bb428921b777a5e James Morse       2017-07-18  345  	case FSC_SEA_TTW2:
bb428921b777a5e James Morse       2017-07-18  346  	case FSC_SEA_TTW3:
bb428921b777a5e James Morse       2017-07-18  347  	case FSC_SECC:
bb428921b777a5e James Morse       2017-07-18  348  	case FSC_SECC_TTW0:
bb428921b777a5e James Morse       2017-07-18  349  	case FSC_SECC_TTW1:
bb428921b777a5e James Morse       2017-07-18  350  	case FSC_SECC_TTW2:
bb428921b777a5e James Morse       2017-07-18  351  	case FSC_SECC_TTW3:
bb428921b777a5e James Morse       2017-07-18  352  		return true;
bb428921b777a5e James Morse       2017-07-18  353  	default:
bb428921b777a5e James Morse       2017-07-18  354  		return false;
bb428921b777a5e James Morse       2017-07-18  355  	}
bb428921b777a5e James Morse       2017-07-18  356  }
bb428921b777a5e James Morse       2017-07-18  357  
5c37f1ae1c33580 James Morse       2020-02-20 @358  static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
c667186f1c01ca8 Marc Zyngier      2017-04-27  359  {
3a949f4c9354875 Gavin Shan        2020-06-30  360  	u32 esr = kvm_vcpu_get_esr(vcpu);
1c8391412d7794e Anshuman Khandual 2018-09-20  361  	return ESR_ELx_SYS64_ISS_RT(esr);
c667186f1c01ca8 Marc Zyngier      2017-04-27  362  }
c667186f1c01ca8 Marc Zyngier      2017-04-27  363  
64cf98fa5544aee Christoffer Dall  2016-05-01 @364  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
64cf98fa5544aee Christoffer Dall  2016-05-01  365  {
c4ad98e4b72cb5b Marc Zyngier      2020-09-15  366  	if (kvm_vcpu_abt_iss1tw(vcpu))
c4ad98e4b72cb5b Marc Zyngier      2020-09-15  367  		return true;
c4ad98e4b72cb5b Marc Zyngier      2020-09-15  368  
64cf98fa5544aee Christoffer Dall  2016-05-01  369  	if (kvm_vcpu_trap_is_iabt(vcpu))
64cf98fa5544aee Christoffer Dall  2016-05-01  370  		return false;
64cf98fa5544aee Christoffer Dall  2016-05-01  371  
64cf98fa5544aee Christoffer Dall  2016-05-01  372  	return kvm_vcpu_dabt_iswrite(vcpu);
64cf98fa5544aee Christoffer Dall  2016-05-01  373  }
64cf98fa5544aee Christoffer Dall  2016-05-01  374  
4429fc64b90368e Andre Przywara    2014-06-02 @375  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
79c648806f9034a Marc Zyngier      2013-10-18  376  {
8d404c4c2461375 Christoffer Dall  2016-03-16  377  	return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
79c648806f9034a Marc Zyngier      2013-10-18  378  }
79c648806f9034a Marc Zyngier      2013-10-18  379  
ce94fe93d566bf3 Marc Zyngier      2013-11-05 @380  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
ce94fe93d566bf3 Marc Zyngier      2013-11-05  381  {
8d404c4c2461375 Christoffer Dall  2016-03-16  382  	if (vcpu_mode_is_32bit(vcpu)) {
256c0960b7b6453 Mark Rutland      2018-07-05  383  		*vcpu_cpsr(vcpu) |= PSR_AA32_E_BIT;
8d404c4c2461375 Christoffer Dall  2016-03-16  384  	} else {
8d404c4c2461375 Christoffer Dall  2016-03-16  385  		u64 sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
8d404c4c2461375 Christoffer Dall  2016-03-16  386  		sctlr |= (1 << 25);
1975fa56f1c85f5 James Morse       2018-05-02  387  		vcpu_write_sys_reg(vcpu, sctlr, SCTLR_EL1);
8d404c4c2461375 Christoffer Dall  2016-03-16  388  	}
ce94fe93d566bf3 Marc Zyngier      2013-11-05  389  }
ce94fe93d566bf3 Marc Zyngier      2013-11-05  390  
6d89d2d9b5bac9d Marc Zyngier      2013-02-12 @391  static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  392  {
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  393  	if (vcpu_mode_is_32bit(vcpu))
256c0960b7b6453 Mark Rutland      2018-07-05  394  		return !!(*vcpu_cpsr(vcpu) & PSR_AA32_E_BIT);
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  395  
8d404c4c2461375 Christoffer Dall  2016-03-16  396  	return !!(vcpu_read_sys_reg(vcpu, SCTLR_EL1) & (1 << 25));
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  397  }
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  398  
6d89d2d9b5bac9d Marc Zyngier      2013-02-12 @399  static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  400  						    unsigned long data,
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  401  						    unsigned int len)
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  402  {
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  403  	if (kvm_vcpu_is_be(vcpu)) {
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  404  		switch (len) {
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  405  		case 1:
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  406  			return data & 0xff;
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  407  		case 2:
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  408  			return be16_to_cpu(data & 0xffff);
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  409  		case 4:
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  410  			return be32_to_cpu(data & 0xffffffff);
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  411  		default:
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  412  			return be64_to_cpu(data);
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  413  		}
b30070862edbdb2 Victor Kamensky   2014-06-12  414  	} else {
b30070862edbdb2 Victor Kamensky   2014-06-12  415  		switch (len) {
b30070862edbdb2 Victor Kamensky   2014-06-12  416  		case 1:
b30070862edbdb2 Victor Kamensky   2014-06-12  417  			return data & 0xff;
b30070862edbdb2 Victor Kamensky   2014-06-12  418  		case 2:
b30070862edbdb2 Victor Kamensky   2014-06-12  419  			return le16_to_cpu(data & 0xffff);
b30070862edbdb2 Victor Kamensky   2014-06-12  420  		case 4:
b30070862edbdb2 Victor Kamensky   2014-06-12  421  			return le32_to_cpu(data & 0xffffffff);
b30070862edbdb2 Victor Kamensky   2014-06-12  422  		default:
b30070862edbdb2 Victor Kamensky   2014-06-12  423  			return le64_to_cpu(data);
b30070862edbdb2 Victor Kamensky   2014-06-12  424  		}
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  425  	}
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  426  
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  427  	return data;		/* Leave LE untouched */
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  428  }
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  429  
6d89d2d9b5bac9d Marc Zyngier      2013-02-12 @430  static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  431  						    unsigned long data,
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  432  						    unsigned int len)
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  433  {
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  434  	if (kvm_vcpu_is_be(vcpu)) {
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  435  		switch (len) {
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  436  		case 1:
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  437  			return data & 0xff;
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  438  		case 2:
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  439  			return cpu_to_be16(data & 0xffff);
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  440  		case 4:
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  441  			return cpu_to_be32(data & 0xffffffff);
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  442  		default:
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  443  			return cpu_to_be64(data);
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  444  		}
b30070862edbdb2 Victor Kamensky   2014-06-12  445  	} else {
b30070862edbdb2 Victor Kamensky   2014-06-12  446  		switch (len) {
b30070862edbdb2 Victor Kamensky   2014-06-12  447  		case 1:
b30070862edbdb2 Victor Kamensky   2014-06-12  448  			return data & 0xff;
b30070862edbdb2 Victor Kamensky   2014-06-12  449  		case 2:
b30070862edbdb2 Victor Kamensky   2014-06-12  450  			return cpu_to_le16(data & 0xffff);
b30070862edbdb2 Victor Kamensky   2014-06-12  451  		case 4:
b30070862edbdb2 Victor Kamensky   2014-06-12  452  			return cpu_to_le32(data & 0xffffffff);
b30070862edbdb2 Victor Kamensky   2014-06-12  453  		default:
b30070862edbdb2 Victor Kamensky   2014-06-12  454  			return cpu_to_le64(data);
b30070862edbdb2 Victor Kamensky   2014-06-12  455  		}
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  456  	}
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  457  
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  458  	return data;		/* Leave LE untouched */
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  459  }
6d89d2d9b5bac9d Marc Zyngier      2013-02-12  460  
cdb5e02ed133731 Marc Zyngier      2020-10-14 @461  static __always_inline void kvm_incr_pc(struct kvm_vcpu *vcpu)
bd7d95cafb499e2 Mark Rutland      2018-11-09  462  {
cdb5e02ed133731 Marc Zyngier      2020-10-14  463  	vcpu->arch.flags |= KVM_ARM64_INCREMENT_PC;
bd7d95cafb499e2 Mark Rutland      2018-11-09  464  }
bd7d95cafb499e2 Mark Rutland      2018-11-09  465  
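
As an aside, this error pattern -- "invalid storage class for function"
reported for every subsequent static inline function, while the compiler
still believes it is "In function 'kvm_vcpu_trap_get_fault_level'" --
typically points at an unbalanced brace introduced earlier in the header,
which makes all later definitions look like nested functions. A minimal,
intentionally broken reproduction (unrelated to the actual patch contents)
that produces the same kind of diagnostic:

    /* repro.c -- "gcc -c repro.c" fails with the same kind of error */
    static inline int first(void)
    {
            return 0;
    /* }   <-- closing brace accidentally dropped */

    static inline int second(void)
    {
            /* error: invalid storage class for function 'second' */
            return 1;
    }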

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 75479 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-11-30 16:01         ` Will Deacon
@ 2020-12-01  2:30           ` wangyanan (Y)
  -1 siblings, 0 replies; 49+ messages in thread
From: wangyanan (Y) @ 2020-12-01  2:30 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

Hi Will,

On 2020/12/1 0:01, Will Deacon wrote:
> Hi,
>
> Cheers for the quick reply. See below for more questions...
>
> On Mon, Nov 30, 2020 at 11:24:19PM +0800, wangyanan (Y) wrote:
>> On 2020/11/30 21:34, Will Deacon wrote:
>>> On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
>>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>>> index 696b6aa83faf..fec8dc9f2baa 100644
>>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>>> @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>>>>    	return 0;
>>>>    }
>>>> +static void stage2_flush_dcache(void *addr, u64 size);
>>>> +static bool stage2_pte_cacheable(kvm_pte_t pte);
>>>> +
>>>>    static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>>>    				struct stage2_map_data *data)
>>>>    {
>>>> @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>>>    	struct page *page = virt_to_page(ptep);
>>>>    	if (data->anchor) {
>>>> -		if (kvm_pte_valid(pte))
>>>> +		if (kvm_pte_valid(pte)) {
>>>> +			kvm_set_invalid_pte(ptep);
>>>> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
>>>> +				     addr, level);
>>>>    			put_page(page);
>>> This doesn't make sense to me: the page-table pages we're walking when the
>>> anchor is set are not accessible to the hardware walker because we unhooked
>>> the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
>>> TLB invalidation.
>>>
>>> Are you seeing a problem in practice here?
>> Yes, I did indeed find a problem in practice.
>>
>> When the migration was cancelled, a TLB conflict abort was found in the guest.
>>
>> This problem was fixed before the rework of the page-table code; you can have a
>> look at the following two links:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c3736cd32bf5197aed1410ae826d2d254a5b277
>>
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035031.html
> Ok, let's go through this, because I still don't see the bug. Please correct
> me if you spot any mistakes:
>
>    1. We have a block mapping for X => Y
>    2. Dirty logging is enabled, so the block mapping is write-protected and
>       ends up being split into page mappings
>    3. Dirty logging is disabled due to a failed migration.
>
> --- At this point, I think we agree that the state of the MMU is alright ---
>
>    4. We take a stage-2 fault and want to reinstall the block mapping:
>
>       a. kvm_pgtable_stage2_map() is invoked to install the block mapping
>       b. stage2_map_walk_table_pre() finds a table where we would like to
>          install the block:
>
> 	i.   The anchor is set to point at this entry
> 	ii.  The entry is made invalid
> 	iii. We invalidate the TLB for the input address. This is
> 	     TLBI IPAS2SE1IS without level hint and then TLBI VMALLE1IS.
>
> 	*** At this point, the page-table pointed to by the old table entry
> 	    is not reachable to the hardware walker ***
>
>       c. stage2_map_walk_leaf() is called for each leaf entry in the
>          now-unreachable subtree, dropping page-references for each valid
> 	entry it finds.
>       d. stage2_map_walk_table_post() is eventually called for the entry
>          which we cleared back in b.ii, so we install the new block mapping.
>
> You are proposing to add additional TLB invalidation to (c), but I don't
> think that is necessary, thanks to the invalidation already performed in
> b.iii. What am I missing here?

The point is that the TLBI at b.iii is not enough. There are many page
mappings that we need to merge into a block mapping.

At b.iii we invalidate the TLB for the input address without a level hint,
but this operation only flushes the TLB entry for one page mapping; TLB
entries for the other page mappings are still in the cache, and the MMU
hardware walker can still hit them next time.

Maybe we can imagine a concrete example here. Suppose we need to merge page
mappings into a 1G block mapping, and the fault_ipa passed to
user_mem_abort() is 0x225043000. After alignment to 1G, the input address
will be 0x200000000, so the TLBI operation at b.iii will invalidate the TLB
entry for address 0x200000000. But what about addresses 0x200001000,
0x200002000, ...?

After the installation of the 1G block mapping is finished, when the
fault_ipa is 0x200007000 the MMU can still hit the TLB entry for the old
page mapping in the cache, and it can also access memory through the new
block entry.

So the TLBI operation added in stage2_map_walk_leaf() aims to invalidate
the TLB entries for all the page mappings that will be merged. In this way,
after the block mapping is installed, the MMU can only access memory
through the new block entry.
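
A rough, stand-alone illustration of the arithmetic in the example above
(the constants are just the values from this mail; nothing here comes from
the kernel sources):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            const uint64_t SZ_1G = 1ULL << 30;
            const uint64_t SZ_4K = 1ULL << 12;
            uint64_t fault_ipa = 0x225043000ULL;
            uint64_t block_ipa = fault_ipa & ~(SZ_1G - 1);

            printf("fault_ipa       = %#llx\n", (unsigned long long)fault_ipa);
            printf("TLBI issued for = %#llx\n", (unsigned long long)block_ipa);

            /* The single TLBI only matches the first page; the other
             * per-page entries in the 1G range may still be cached. */
            for (uint64_t ipa = block_ipa + SZ_4K;
                 ipa < block_ipa + 8 * SZ_4K; ipa += SZ_4K)
                    printf("possibly stale TLB entry for %#llx\n",
                           (unsigned long long)ipa);
            return 0;
    }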

>>>> +			if (stage2_pte_cacheable(pte))
>>>> +				stage2_flush_dcache(kvm_pte_follow(pte),
>>>> +						    kvm_granule_size(level));
>>> I don't understand the need for the flush either, as we're just coalescing
>>> existing entries into a larger block mapping.
>> In my opinion, after unmapping, it is necessary to ensure the cache
>> coherency, because theoretically it is unknown whether the memory
>> attributes of the future mapping will change (cacheable -> non_cacheable).
> But in this case we're just changing the structure of the page-tables,
> not the pages which are mapped, are we?
>
> Will
> .

Yes, for now there is no case here where the cache attributes actually
change.

Maybe this part of the dcache flush can be dropped.


Yanan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 3/3] KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()
  2020-11-30 13:49     ` Will Deacon
@ 2020-12-01  6:04       ` wangyanan (Y)
  -1 siblings, 0 replies; 49+ messages in thread
From: wangyanan (Y) @ 2020-12-01  6:04 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

Hi Will,

On 2020/11/30 21:49, Will Deacon wrote:
> On Mon, Nov 30, 2020 at 08:18:47PM +0800, Yanan Wang wrote:
>> If we get a FSC_PERM fault, just using (logging_active && writable) to determine
>> calling kvm_pgtable_stage2_map(). There will be two more cases we should consider.
>>
>> (1) After logging_active is configged back to false from true. When we get a
>> FSC_PERM fault with write_fault and adjustment of hugepage is needed, we should
>> merge tables back to a block entry. This case is ignored by still calling
>> kvm_pgtable_stage2_relax_perms(), which will lead to an endless loop and guest
>> panic due to soft lockup.
>>
>> (2) We use (FSC_PERM && logging_active && writable) to determine collapsing
>> a block entry into a table by calling kvm_pgtable_stage2_map(). But sometimes
>> we may only need to relax permissions when trying to write to a page other than
>> a block. In this condition, using kvm_pgtable_stage2_relax_perms() will be fine.
>>
>> The ISS filed bit[1:0] in ESR_EL2 regesiter indicates the stage2 lookup level
>> at which a D-abort or I-abort occured. By comparing granule of the fault lookup
>> level with vma_pagesize, we can strictly distinguish conditions of calling
>> kvm_pgtable_stage2_relax_perms() or kvm_pgtable_stage2_map(), and the above
>> two cases will be well considered.
>>
>> Suggested-by: Keqian Zhu <zhukeqian1@huawei.com>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   arch/arm64/include/asm/esr.h         |  1 +
>>   arch/arm64/include/asm/kvm_emulate.h |  5 +++++
>>   arch/arm64/kvm/mmu.c                 | 11 +++++++++--
>>   3 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index 22c81f1edda2..85a3e49f92f4 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -104,6 +104,7 @@
>>   /* Shared ISS fault status code(IFSC/DFSC) for Data/Instruction aborts */
>>   #define ESR_ELx_FSC		(0x3F)
>>   #define ESR_ELx_FSC_TYPE	(0x3C)
>> +#define ESR_ELx_FSC_LEVEL	(0x03)
>>   #define ESR_ELx_FSC_EXTABT	(0x10)
>>   #define ESR_ELx_FSC_SERROR	(0x11)
>>   #define ESR_ELx_FSC_ACCESS	(0x08)
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index 5ef2669ccd6c..2e0e8edf6306 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -350,6 +350,11 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
>>   	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
>>   }
>>   
>> +static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
>> +{
>> +	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
>> +}
>> +
>>   static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
>>   {
>>   	switch (kvm_vcpu_trap_get_fault(vcpu)) {
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 1a01da9fdc99..75814a02d189 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -754,10 +754,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	gfn_t gfn;
>>   	kvm_pfn_t pfn;
>>   	bool logging_active = memslot_is_logging(memslot);
>> -	unsigned long vma_pagesize;
>> +	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
>> +	unsigned long vma_pagesize, fault_granule;
>>   	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>>   	struct kvm_pgtable *pgt;
>>   
>> +	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
> I like the idea, but is this macro reliable for stage-2 page-tables, given
> that we could have a concatenated pgd?
>
> Will
> .

Yes, it's fine even when we have a concatenated pgd table.

Whether or not a concatenated pgd is made, the initial lookup level
(start_level) is set in the VTCR_EL2 register, and the MMU hardware walker
knows the start_level from the information in VTCR_EL2, so the lookup
level reported in ESR_EL2 is the real level at which the fault occurred.

This idea runs well in practice on a host where ia_bits is 40, PAGE_SIZE
is 4K, and a concatenated pgd is made for guest stage 2. According to the
printed kernel info, the start_level is 1, and stage 2 translation runs as
expected.


Yanan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2
  2020-11-30 13:21     ` Will Deacon
@ 2020-12-01  7:21       ` wangyanan (Y)
  -1 siblings, 0 replies; 49+ messages in thread
From: wangyanan (Y) @ 2020-12-01  7:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

Hi Will,

On 2020/11/30 21:21, Will Deacon wrote:
> On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
>> When installing a new leaf pte onto an invalid ptep, we need to get_page(ptep).
>> When just updating a valid leaf ptep, we shouldn't get_page(ptep).
>> Incorrect page_count of translation tables might lead to memory leak,
>> when unmapping a stage 2 memory range.
> Did you find this by inspection, or did you hit this in practice? I'd be
> interested to see the backtrace for mapping over an existing mapping.

Actually this was found by inspection.

In the current code, get_page() is uniformly called at "out_get_page" in
stage2_map_walk_leaf(), no matter whether the old ptep is valid or not.

When using the stage2_unmap_walker() API to unmap a memory range, some
page-table pages might then not be freed because their page_count is not
correct.
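
To state the rule being aimed at (a simplified sketch of the stage 2 map
path, ignoring the "old pte == new pte" fast path discussed further down):
a table page's refcount should only go up when an invalid slot becomes
valid, and the old reference should be dropped when a valid entry is
removed by break-before-make.

	kvm_pte_t old = *ptep;

	if (kvm_pte_valid(old)) {
		/* Break-before-make: the old valid entry goes away, so drop
		 * the reference it held on this table page. */
		kvm_set_invalid_pte(ptep);
		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
		put_page(virt_to_page(ptep));
	}

	/* Exactly one new valid entry is installed, so take one reference. */
	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
	get_page(virt_to_page(ptep));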

>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   arch/arm64/kvm/hyp/pgtable.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index 0271b4a3b9fe..696b6aa83faf 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -186,6 +186,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
>>   		return old == pte;
>>   
>>   	smp_store_release(ptep, pte);
>> +	get_page(virt_to_page(ptep));
> This is also used for the hypervisor stage-1 page-table, so I'd prefer to
> leave this function as-is.
I agree at this point.
>>   	return true;
>>   }
>>   
>> @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>   	/* There's an existing valid leaf entry, so perform break-before-make */
>>   	kvm_set_invalid_pte(ptep);
>>   	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
>> +	put_page(virt_to_page(ptep));
>>   	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
>>   out:
>>   	data->phys += granule;
> Isn't this hunk alone sufficient to solve the problem?
>
> Will
> .

Not quite sufficient. When the old ptep is valid and the old pte equals
the new pte, "true" is also returned by kvm_set_valid_leaf_pte() and
get_page() will still be called.


Yanan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-12-01  2:30           ` wangyanan (Y)
@ 2020-12-01 13:46             ` Will Deacon
  -1 siblings, 0 replies; 49+ messages in thread
From: Will Deacon @ 2020-12-01 13:46 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

On Tue, Dec 01, 2020 at 10:30:41AM +0800, wangyanan (Y) wrote:
> On 2020/12/1 0:01, Will Deacon wrote:
> > On Mon, Nov 30, 2020 at 11:24:19PM +0800, wangyanan (Y) wrote:
> > > On 2020/11/30 21:34, Will Deacon wrote:
> > > > On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
> > > > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > > > index 696b6aa83faf..fec8dc9f2baa 100644
> > > > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > > > @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> > > > >    	return 0;
> > > > >    }
> > > > > +static void stage2_flush_dcache(void *addr, u64 size);
> > > > > +static bool stage2_pte_cacheable(kvm_pte_t pte);
> > > > > +
> > > > >    static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > > > >    				struct stage2_map_data *data)
> > > > >    {
> > > > > @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > > > >    	struct page *page = virt_to_page(ptep);
> > > > >    	if (data->anchor) {
> > > > > -		if (kvm_pte_valid(pte))
> > > > > +		if (kvm_pte_valid(pte)) {
> > > > > +			kvm_set_invalid_pte(ptep);
> > > > > +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
> > > > > +				     addr, level);
> > > > >    			put_page(page);
> > > > This doesn't make sense to me: the page-table pages we're walking when the
> > > > anchor is set are not accessible to the hardware walker because we unhooked
> > > > the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
> > > > TLB invalidation.
> > > > 
> > > > Are you seeing a problem in practice here?
> > > Yes, I indeed find a problem in practice.
> > > 
> > > When the migration was cancelled, a TLB conflic abort  was found in guest.
> > > 
> > > This problem is fixed before rework of the page table code, you can have a
> > > look in the following two links:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c3736cd32bf5197aed1410ae826d2d254a5b277
> > > 
> > > https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035031.html
> > Ok, let's go through this, because I still don't see the bug. Please correct
> > me if you spot any mistakes:
> > 
> >    1. We have a block mapping for X => Y
> >    2. Dirty logging is enabled, so the block mapping is write-protected and
> >       ends up being split into page mappings
> >    3. Dirty logging is disabled due to a failed migration.
> > 
> > --- At this point, I think we agree that the state of the MMU is alright ---
> > 
> >    4. We take a stage-2 fault and want to reinstall the block mapping:
> > 
> >       a. kvm_pgtable_stage2_map() is invoked to install the block mapping
> >       b. stage2_map_walk_table_pre() finds a table where we would like to
> >          install the block:
> > 
> > 	i.   The anchor is set to point at this entry
> > 	ii.  The entry is made invalid
> > 	iii. We invalidate the TLB for the input address. This is
> > 	     TLBI IPAS2SE1IS without level hint and then TLBI VMALLE1IS.
> > 
> > 	*** At this point, the page-table pointed to by the old table entry
> > 	    is not reachable to the hardware walker ***
> > 
> >       c. stage2_map_walk_leaf() is called for each leaf entry in the
> >          now-unreachable subtree, dropping page-references for each valid
> > 	entry it finds.
> >       d. stage2_map_walk_table_post() is eventually called for the entry
> >          which we cleared back in b.ii, so we install the new block mapping.
> > 
> > You are proposing to add additional TLB invalidation to (c), but I don't
> > think that is necessary, thanks to the invalidation already performed in
> > b.iii. What am I missing here?
> 
> The point is at b.iii where the TLBI is not enough. There are many page
> mappings that we need to merge into a block mapping.
> 
> We invalidate the TLB for the input address without level hint at b.iii, but
> this operation just flush TLB for one page mapping, there
> 
> are still some TLB entries for the other page mappings in the cache, the MMU
> hardware walker can still hit these entries next time.

Ah, yes, I see. Thanks. I hadn't considered the case where there are table
entries beneath the anchor. So how about the diff below?

Will

--->8

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 0271b4a3b9fe..12526d8c7ae4 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -493,7 +493,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 		return 0;
 
 	kvm_set_invalid_pte(ptep);
-	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
+	/* TLB invalidation is deferred until the _post handler */
 	data->anchor = ptep;
 	return 0;
 }
@@ -547,11 +547,21 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
 				      struct stage2_map_data *data)
 {
 	int ret = 0;
+	kvm_pte_t pte = *ptep;
 
 	if (!data->anchor)
 		return 0;
 
-	free_page((unsigned long)kvm_pte_follow(*ptep));
+	kvm_set_invalid_pte(ptep);
+
+	/*
+	 * Invalidate the whole stage-2, as we may have numerous leaf
+	 * entries below us which would otherwise need invalidating
+	 * individually.
+	 */
+	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
+
+	free_page((unsigned long)kvm_pte_follow(pte));
 	put_page(virt_to_page(ptep));
 
 	if (data->anchor == ptep) {

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-12-01 13:46             ` Will Deacon
@ 2020-12-01 14:05               ` Marc Zyngier
  -1 siblings, 0 replies; 49+ messages in thread
From: Marc Zyngier @ 2020-12-01 14:05 UTC (permalink / raw)
  To: Will Deacon
  Cc: wangyanan (Y),
	linux-kernel, linux-arm-kernel, Catalin Marinas, James Morse,
	Julien Thierry, Suzuki K Poulose, Gavin Shan, Quentin Perret,
	wanghaibin.wang, yezengruan, zhukeqian1, yuzenghui, jiangkunkun,
	wangjingyi11, lushenming

On 2020-12-01 13:46, Will Deacon wrote:
> On Tue, Dec 01, 2020 at 10:30:41AM +0800, wangyanan (Y) wrote:
>> On 2020/12/1 0:01, Will Deacon wrote:
>> > On Mon, Nov 30, 2020 at 11:24:19PM +0800, wangyanan (Y) wrote:
>> > > On 2020/11/30 21:34, Will Deacon wrote:
>> > > > On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
>> > > > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> > > > > index 696b6aa83faf..fec8dc9f2baa 100644
>> > > > > --- a/arch/arm64/kvm/hyp/pgtable.c
>> > > > > +++ b/arch/arm64/kvm/hyp/pgtable.c
>> > > > > @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>> > > > >    	return 0;
>> > > > >    }
>> > > > > +static void stage2_flush_dcache(void *addr, u64 size);
>> > > > > +static bool stage2_pte_cacheable(kvm_pte_t pte);
>> > > > > +
>> > > > >    static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>> > > > >    				struct stage2_map_data *data)
>> > > > >    {
>> > > > > @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>> > > > >    	struct page *page = virt_to_page(ptep);
>> > > > >    	if (data->anchor) {
>> > > > > -		if (kvm_pte_valid(pte))
>> > > > > +		if (kvm_pte_valid(pte)) {
>> > > > > +			kvm_set_invalid_pte(ptep);
>> > > > > +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
>> > > > > +				     addr, level);
>> > > > >    			put_page(page);
>> > > > This doesn't make sense to me: the page-table pages we're walking when the
>> > > > anchor is set are not accessible to the hardware walker because we unhooked
>> > > > the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
>> > > > TLB invalidation.
>> > > >
>> > > > Are you seeing a problem in practice here?
>> > > Yes, I indeed find a problem in practice.
>> > >
>> > > When the migration was cancelled, a TLB conflic abort  was found in guest.
>> > >
>> > > This problem is fixed before rework of the page table code, you can have a
>> > > look in the following two links:
>> > >
>> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c3736cd32bf5197aed1410ae826d2d254a5b277
>> > >
>> > > https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035031.html
>> > Ok, let's go through this, because I still don't see the bug. Please correct
>> > me if you spot any mistakes:
>> >
>> >    1. We have a block mapping for X => Y
>> >    2. Dirty logging is enabled, so the block mapping is write-protected and
>> >       ends up being split into page mappings
>> >    3. Dirty logging is disabled due to a failed migration.
>> >
>> > --- At this point, I think we agree that the state of the MMU is alright ---
>> >
>> >    4. We take a stage-2 fault and want to reinstall the block mapping:
>> >
>> >       a. kvm_pgtable_stage2_map() is invoked to install the block mapping
>> >       b. stage2_map_walk_table_pre() finds a table where we would like to
>> >          install the block:
>> >
>> > 	i.   The anchor is set to point at this entry
>> > 	ii.  The entry is made invalid
>> > 	iii. We invalidate the TLB for the input address. This is
>> > 	     TLBI IPAS2SE1IS without level hint and then TLBI VMALLE1IS.
>> >
>> > 	*** At this point, the page-table pointed to by the old table entry
>> > 	    is not reachable to the hardware walker ***
>> >
>> >       c. stage2_map_walk_leaf() is called for each leaf entry in the
>> >          now-unreachable subtree, dropping page-references for each valid
>> > 	entry it finds.
>> >       d. stage2_map_walk_table_post() is eventually called for the entry
>> >          which we cleared back in b.ii, so we install the new block mapping.
>> >
>> > You are proposing to add additional TLB invalidation to (c), but I don't
>> > think that is necessary, thanks to the invalidation already performed in
>> > b.iii. What am I missing here?
>> 
>> The point is at b.iii where the TLBI is not enough. There are many 
>> page
>> mappings that we need to merge into a block mapping.
>> 
>> We invalidate the TLB for the input address without level hint at 
>> b.iii, but
>> this operation just flush TLB for one page mapping, there
>> 
>> are still some TLB entries for the other page mappings in the cache, 
>> the MMU
>> hardware walker can still hit these entries next time.
> 
> Ah, yes, I see. Thanks. I hadn't considered the case where there are 
> table
> entries beneath the anchor. So how about the diff below?
> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c 
> b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..12526d8c7ae4 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -493,7 +493,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64
> end, u32 level,
>  		return 0;
> 
>  	kvm_set_invalid_pte(ptep);
> -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
> +	/* TLB invalidation is deferred until the _post handler */
>  	data->anchor = ptep;
>  	return 0;
>  }
> @@ -547,11 +547,21 @@ static int stage2_map_walk_table_post(u64 addr,
> u64 end, u32 level,
>  				      struct stage2_map_data *data)
>  {
>  	int ret = 0;
> +	kvm_pte_t pte = *ptep;
> 
>  	if (!data->anchor)
>  		return 0;
> 
> -	free_page((unsigned long)kvm_pte_follow(*ptep));
> +	kvm_set_invalid_pte(ptep);
> +
> +	/*
> +	 * Invalidate the whole stage-2, as we may have numerous leaf
> +	 * entries below us which would otherwise need invalidating
> +	 * individually.
> +	 */
> +	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);

That's a big hammer, and we so far have been pretty careful not to
over-invalidate. Is the block-replacing-table *without* an unmap
in between the only case where this triggers?

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-12-01 13:46             ` Will Deacon
@ 2020-12-01 14:11               ` wangyanan (Y)
  -1 siblings, 0 replies; 49+ messages in thread
From: wangyanan (Y) @ 2020-12-01 14:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming


On 2020/12/1 21:46, Will Deacon wrote:
> On Tue, Dec 01, 2020 at 10:30:41AM +0800, wangyanan (Y) wrote:
>> On 2020/12/1 0:01, Will Deacon wrote:
>>> On Mon, Nov 30, 2020 at 11:24:19PM +0800, wangyanan (Y) wrote:
>>>> On 2020/11/30 21:34, Will Deacon wrote:
>>>>> On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
>>>>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>>>>> index 696b6aa83faf..fec8dc9f2baa 100644
>>>>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>>>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>>>>> @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>>>>>>     	return 0;
>>>>>>     }
>>>>>> +static void stage2_flush_dcache(void *addr, u64 size);
>>>>>> +static bool stage2_pte_cacheable(kvm_pte_t pte);
>>>>>> +
>>>>>>     static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>>>>>     				struct stage2_map_data *data)
>>>>>>     {
>>>>>> @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>>>>>     	struct page *page = virt_to_page(ptep);
>>>>>>     	if (data->anchor) {
>>>>>> -		if (kvm_pte_valid(pte))
>>>>>> +		if (kvm_pte_valid(pte)) {
>>>>>> +			kvm_set_invalid_pte(ptep);
>>>>>> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
>>>>>> +				     addr, level);
>>>>>>     			put_page(page);
>>>>> This doesn't make sense to me: the page-table pages we're walking when the
>>>>> anchor is set are not accessible to the hardware walker because we unhooked
>>>>> the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
>>>>> TLB invalidation.
>>>>>
>>>>> Are you seeing a problem in practice here?
>>>> Yes, I indeed find a problem in practice.
>>>>
>>>> When the migration was cancelled, a TLB conflic abort  was found in guest.
>>>>
>>>> This problem is fixed before rework of the page table code, you can have a
>>>> look in the following two links:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c3736cd32bf5197aed1410ae826d2d254a5b277
>>>>
>>>> https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035031.html
>>> Ok, let's go through this, because I still don't see the bug. Please correct
>>> me if you spot any mistakes:
>>>
>>>     1. We have a block mapping for X => Y
>>>     2. Dirty logging is enabled, so the block mapping is write-protected and
>>>        ends up being split into page mappings
>>>     3. Dirty logging is disabled due to a failed migration.
>>>
>>> --- At this point, I think we agree that the state of the MMU is alright ---
>>>
>>>     4. We take a stage-2 fault and want to reinstall the block mapping:
>>>
>>>        a. kvm_pgtable_stage2_map() is invoked to install the block mapping
>>>        b. stage2_map_walk_table_pre() finds a table where we would like to
>>>           install the block:
>>>
>>> 	i.   The anchor is set to point at this entry
>>> 	ii.  The entry is made invalid
>>> 	iii. We invalidate the TLB for the input address. This is
>>> 	     TLBI IPAS2SE1IS without level hint and then TLBI VMALLE1IS.
>>>
>>> 	*** At this point, the page-table pointed to by the old table entry
>>> 	    is not reachable to the hardware walker ***
>>>
>>>        c. stage2_map_walk_leaf() is called for each leaf entry in the
>>>           now-unreachable subtree, dropping page-references for each valid
>>> 	entry it finds.
>>>        d. stage2_map_walk_table_post() is eventually called for the entry
>>>           which we cleared back in b.ii, so we install the new block mapping.
>>>
>>> You are proposing to add additional TLB invalidation to (c), but I don't
>>> think that is necessary, thanks to the invalidation already performed in
>>> b.iii. What am I missing here?
>> The point is at b.iii where the TLBI is not enough. There are many page
>> mappings that we need to merge into a block mapping.
>>
>> We invalidate the TLB for the input address without level hint at b.iii, but
>> this operation just flush TLB for one page mapping, there
>>
>> are still some TLB entries for the other page mappings in the cache, the MMU
>> hardware walker can still hit these entries next time.
> Ah, yes, I see. Thanks. I hadn't considered the case where there are table
> entries beneath the anchor. So how about the diff below?
>
> Will
>
> --->8

Hi, I think it's inappropriate to put the TLBI for all the leaf entries
in stage2_map_walk_table_post(), because *ptep must be an upper table
entry when we enter stage2_map_walk_table_post().

We should issue the TLBI for every leaf entry (not for the table entry) at
the last lookup level, just as I am proposing with the additional TLBI in
stage2_map_walk_leaf(); see the sketch below.
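
That is, keep the invalidation at the point where each valid leaf is
visited while the anchor is set, as in the hunk quoted earlier in this
thread (sketch):

	if (data->anchor) {
		if (kvm_pte_valid(pte)) {
			kvm_set_invalid_pte(ptep);
			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
				     addr, level);
			put_page(page);
		}

		return 0;
	}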

Thanks.


Yanan

>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..12526d8c7ae4 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -493,7 +493,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>   		return 0;
>   
>   	kvm_set_invalid_pte(ptep);
> -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
> +	/* TLB invalidation is deferred until the _post handler */
>   	data->anchor = ptep;
>   	return 0;
>   }
> @@ -547,11 +547,21 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>   				      struct stage2_map_data *data)
>   {
>   	int ret = 0;
> +	kvm_pte_t pte = *ptep;
>   
>   	if (!data->anchor)
>   		return 0;
>   
> -	free_page((unsigned long)kvm_pte_follow(*ptep));
> +	kvm_set_invalid_pte(ptep);
> +
> +	/*
> +	 * Invalidate the whole stage-2, as we may have numerous leaf
> +	 * entries below us which would otherwise need invalidating
> +	 * individually.
> +	 */
> +	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> +
> +	free_page((unsigned long)kvm_pte_follow(pte));
>   	put_page(virt_to_page(ptep));
>   
>   	if (data->anchor == ptep) {
> .

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
@ 2020-12-01 14:11               ` wangyanan (Y)
  0 siblings, 0 replies; 49+ messages in thread
From: wangyanan (Y) @ 2020-12-01 14:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: jiangkunkun, Gavin Shan, Suzuki K Poulose, Marc Zyngier,
	wangjingyi11, Quentin Perret, lushenming, linux-kernel,
	yezengruan, James Morse, linux-arm-kernel, Catalin Marinas,
	yuzenghui, wanghaibin.wang, zhukeqian1, Julien Thierry


On 2020/12/1 21:46, Will Deacon wrote:
> On Tue, Dec 01, 2020 at 10:30:41AM +0800, wangyanan (Y) wrote:
>> On 2020/12/1 0:01, Will Deacon wrote:
>>> On Mon, Nov 30, 2020 at 11:24:19PM +0800, wangyanan (Y) wrote:
>>>> On 2020/11/30 21:34, Will Deacon wrote:
>>>>> On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
>>>>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>>>>> index 696b6aa83faf..fec8dc9f2baa 100644
>>>>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>>>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>>>>> @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>>>>>>     	return 0;
>>>>>>     }
>>>>>> +static void stage2_flush_dcache(void *addr, u64 size);
>>>>>> +static bool stage2_pte_cacheable(kvm_pte_t pte);
>>>>>> +
>>>>>>     static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>>>>>     				struct stage2_map_data *data)
>>>>>>     {
>>>>>> @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>>>>>     	struct page *page = virt_to_page(ptep);
>>>>>>     	if (data->anchor) {
>>>>>> -		if (kvm_pte_valid(pte))
>>>>>> +		if (kvm_pte_valid(pte)) {
>>>>>> +			kvm_set_invalid_pte(ptep);
>>>>>> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
>>>>>> +				     addr, level);
>>>>>>     			put_page(page);
>>>>> This doesn't make sense to me: the page-table pages we're walking when the
>>>>> anchor is set are not accessible to the hardware walker because we unhooked
>>>>> the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
>>>>> TLB invalidation.
>>>>>
>>>>> Are you seeing a problem in practice here?
>>>> Yes, I indeed find a problem in practice.
>>>>
>>>> When the migration was cancelled, a TLB conflic abort  was found in guest.
>>>>
>>>> This problem is fixed before rework of the page table code, you can have a
>>>> look in the following two links:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c3736cd32bf5197aed1410ae826d2d254a5b277
>>>>
>>>> https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035031.html
>>> Ok, let's go through this, because I still don't see the bug. Please correct
>>> me if you spot any mistakes:
>>>
>>>     1. We have a block mapping for X => Y
>>>     2. Dirty logging is enabled, so the block mapping is write-protected and
>>>        ends up being split into page mappings
>>>     3. Dirty logging is disabled due to a failed migration.
>>>
>>> --- At this point, I think we agree that the state of the MMU is alright ---
>>>
>>>     4. We take a stage-2 fault and want to reinstall the block mapping:
>>>
>>>        a. kvm_pgtable_stage2_map() is invoked to install the block mapping
>>>        b. stage2_map_walk_table_pre() finds a table where we would like to
>>>           install the block:
>>>
>>> 	i.   The anchor is set to point at this entry
>>> 	ii.  The entry is made invalid
>>> 	iii. We invalidate the TLB for the input address. This is
>>> 	     TLBI IPAS2SE1IS without level hint and then TLBI VMALLE1IS.
>>>
>>> 	*** At this point, the page-table pointed to by the old table entry
>>> 	    is not reachable to the hardware walker ***
>>>
>>>        c. stage2_map_walk_leaf() is called for each leaf entry in the
>>>           now-unreachable subtree, dropping page-references for each valid
>>> 	entry it finds.
>>>        d. stage2_map_walk_table_post() is eventually called for the entry
>>>           which we cleared back in b.ii, so we install the new block mapping.
>>>
>>> You are proposing to add additional TLB invalidation to (c), but I don't
>>> think that is necessary, thanks to the invalidation already performed in
>>> b.iii. What am I missing here?
>> The point is at b.iii where the TLBI is not enough. There are many page
>> mappings that we need to merge into a block mapping.
>>
>> We invalidate the TLB for the input address without level hint at b.iii, but
>> this operation just flush TLB for one page mapping, there
>>
>> are still some TLB entries for the other page mappings in the cache, the MMU
>> hardware walker can still hit these entries next time.
> Ah, yes, I see. Thanks. I hadn't considered the case where there are table
> entries beneath the anchor. So how about the diff below?
>
> Will
>
> --->8

Hi, I think it's inappropriate to put the TLBI of all the leaf entries 
in function stage2_map_walk_table_post(),

because the *ptep must be an upper table entry when we enter 
stage2_map_walk_table_post().

We should make the TLBI for every leaf entry not table entry in the last 
lookup level,  just as I am proposing

to add the additional TLBI in function stage2_map_walk_leaf().

Thanks.


Yanan

>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..12526d8c7ae4 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -493,7 +493,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>   		return 0;
>   
>   	kvm_set_invalid_pte(ptep);
> -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
> +	/* TLB invalidation is deferred until the _post handler */
>   	data->anchor = ptep;
>   	return 0;
>   }
> @@ -547,11 +547,21 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>   				      struct stage2_map_data *data)
>   {
>   	int ret = 0;
> +	kvm_pte_t pte = *ptep;
>   
>   	if (!data->anchor)
>   		return 0;
>   
> -	free_page((unsigned long)kvm_pte_follow(*ptep));
> +	kvm_set_invalid_pte(ptep);
> +
> +	/*
> +	 * Invalidate the whole stage-2, as we may have numerous leaf
> +	 * entries below us which would otherwise need invalidating
> +	 * individually.
> +	 */
> +	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> +
> +	free_page((unsigned long)kvm_pte_follow(pte));
>   	put_page(virt_to_page(ptep));
>   
>   	if (data->anchor == ptep) {
> .

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2
  2020-12-01  7:21       ` wangyanan (Y)
@ 2020-12-01 14:16         ` Will Deacon
  -1 siblings, 0 replies; 49+ messages in thread
From: Will Deacon @ 2020-12-01 14:16 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

On Tue, Dec 01, 2020 at 03:21:23PM +0800, wangyanan (Y) wrote:
> On 2020/11/30 21:21, Will Deacon wrote:
> > On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
> > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > index 0271b4a3b9fe..696b6aa83faf 100644
> > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > @@ -186,6 +186,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> > >   		return old == pte;
> > >   	smp_store_release(ptep, pte);
> > > +	get_page(virt_to_page(ptep));
> > This is also used for the hypervisor stage-1 page-table, so I'd prefer to
> > leave this function as-is.
> I agree at this point.
> > >   	return true;
> > >   }
> > > @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > >   	/* There's an existing valid leaf entry, so perform break-before-make */
> > >   	kvm_set_invalid_pte(ptep);
> > >   	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> > > +	put_page(virt_to_page(ptep));
> > >   	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
> > >   out:
> > >   	data->phys += granule;
> > Isn't this hunk alone sufficient to solve the problem?
> > 
> > Will
> > .
> 
> Not quite sufficient. When the old ptep is valid and the old pte equals the
> new pte, "True" is also returned by kvm_set_valid_leaf_pte()
> 
> and get_page() will still be called.

I had a go at fixing this without introducing refcounting to the hyp stage-1
case, and ended up with the diff below. What do you think?

Will

--->8

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 0271b4a3b9fe..78e2c0dc47ae 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -170,23 +170,16 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
 	smp_store_release(ptep, pte);
 }
 
-static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
-				   u32 level)
+static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
 {
-	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
+	kvm_pte_t pte = kvm_phys_to_pte(pa);
 	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
 							   KVM_PTE_TYPE_BLOCK;
 
 	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
 	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
 	pte |= KVM_PTE_VALID;
-
-	/* Tolerate KVM recreating the exact same mapping. */
-	if (kvm_pte_valid(old))
-		return old == pte;
-
-	smp_store_release(ptep, pte);
-	return true;
+	return pte;
 }
 
 static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
@@ -341,12 +334,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
 static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 				    kvm_pte_t *ptep, struct hyp_map_data *data)
 {
+	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
 
 	if (!kvm_block_mapping_supported(addr, end, phys, level))
 		return false;
 
-	WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
+	/* Tolerate KVM recreating the exact same mapping. */
+	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	if (old != new && !WARN_ON(kvm_pte_valid(old)))
+		smp_store_release(ptep, new);
+
 	data->phys += granule;
 	return true;
 }
@@ -465,19 +463,24 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 				       kvm_pte_t *ptep,
 				       struct stage2_map_data *data)
 {
+	kvm_pte_t new, old = *ptep;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
 
 	if (!kvm_block_mapping_supported(addr, end, phys, level))
 		return false;
 
-	if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
-		goto out;
+	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	if (kvm_pte_valid(old)) {
+		/*
+		 * There's an existing valid leaf entry, so perform
+		 * break-before-make.
+		 */
+		kvm_set_invalid_pte(ptep);
+		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+		put_page(virt_to_page(ptep));
+	}
 
-	/* There's an existing valid leaf entry, so perform break-before-make */
-	kvm_set_invalid_pte(ptep);
-	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
-	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
-out:
+	smp_store_release(ptep, new);
 	data->phys += granule;
 	return true;
 }

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-12-01 14:05               ` Marc Zyngier
@ 2020-12-01 14:23                 ` Will Deacon
  -1 siblings, 0 replies; 49+ messages in thread
From: Will Deacon @ 2020-12-01 14:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: wangyanan (Y),
	linux-kernel, linux-arm-kernel, Catalin Marinas, James Morse,
	Julien Thierry, Suzuki K Poulose, Gavin Shan, Quentin Perret,
	wanghaibin.wang, yezengruan, zhukeqian1, yuzenghui, jiangkunkun,
	wangjingyi11, lushenming

On Tue, Dec 01, 2020 at 02:05:03PM +0000, Marc Zyngier wrote:
> On 2020-12-01 13:46, Will Deacon wrote:
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 0271b4a3b9fe..12526d8c7ae4 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -493,7 +493,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> >  		return 0;
> > 
> >  	kvm_set_invalid_pte(ptep);
> > -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
> > +	/* TLB invalidation is deferred until the _post handler */
> >  	data->anchor = ptep;
> >  	return 0;
> >  }
> > @@ -547,11 +547,21 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> >  				      struct stage2_map_data *data)
> >  {
> >  	int ret = 0;
> > +	kvm_pte_t pte = *ptep;
> > 
> >  	if (!data->anchor)
> >  		return 0;
> > 
> > -	free_page((unsigned long)kvm_pte_follow(*ptep));
> > +	kvm_set_invalid_pte(ptep);
> > +
> > +	/*
> > +	 * Invalidate the whole stage-2, as we may have numerous leaf
> > +	 * entries below us which would otherwise need invalidating
> > +	 * individually.
> > +	 */
> > +	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> 
> That's a big hammer, and we so far have been pretty careful not to
> over-invalidate. Is the block-replacing-table *without* an unmap
> in between the only case where this triggers?

Yes, this only happens in that case. The alternative would be to issue
invalidations for every single entry we unmap, which I can implement if
you prefer, but it felt worse to me given that by-IPA invalidation
isn't really great either.

Will

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-12-01 14:23                 ` Will Deacon
@ 2020-12-01 14:32                   ` Marc Zyngier
  -1 siblings, 0 replies; 49+ messages in thread
From: Marc Zyngier @ 2020-12-01 14:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: wangyanan (Y),
	linux-kernel, linux-arm-kernel, Catalin Marinas, James Morse,
	Julien Thierry, Suzuki K Poulose, Gavin Shan, Quentin Perret,
	wanghaibin.wang, yezengruan, zhukeqian1, yuzenghui, jiangkunkun,
	wangjingyi11, lushenming

On 2020-12-01 14:23, Will Deacon wrote:
> On Tue, Dec 01, 2020 at 02:05:03PM +0000, Marc Zyngier wrote:
>> On 2020-12-01 13:46, Will Deacon wrote:
>> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> > index 0271b4a3b9fe..12526d8c7ae4 100644
>> > --- a/arch/arm64/kvm/hyp/pgtable.c
>> > +++ b/arch/arm64/kvm/hyp/pgtable.c
>> > @@ -493,7 +493,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>> >  		return 0;
>> >
>> >  	kvm_set_invalid_pte(ptep);
>> > -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
>> > +	/* TLB invalidation is deferred until the _post handler */
>> >  	data->anchor = ptep;
>> >  	return 0;
>> >  }
>> > @@ -547,11 +547,21 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>> >  				      struct stage2_map_data *data)
>> >  {
>> >  	int ret = 0;
>> > +	kvm_pte_t pte = *ptep;
>> >
>> >  	if (!data->anchor)
>> >  		return 0;
>> >
>> > -	free_page((unsigned long)kvm_pte_follow(*ptep));
>> > +	kvm_set_invalid_pte(ptep);
>> > +
>> > +	/*
>> > +	 * Invalidate the whole stage-2, as we may have numerous leaf
>> > +	 * entries below us which would otherwise need invalidating
>> > +	 * individually.
>> > +	 */
>> > +	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
>> 
>> That's a big hammer, and we so far have been pretty careful not to
>> over-invalidate. Is the block-replacing-table *without* an unmap
>> in between the only case where this triggers?
> 
> Yes, this only happens in that case. The alternative would be to issue
> invalidations for every single entry we unmap, which I can implement if
> you prefer, but it felt worse to me given that by-IPA invalidation
> isn't really great either.

Right. If that's the only case where this happens, I'm not too bothered.
What I want to make sure is that the normal table->block upgrade path
(which goes via a MMU notifier that unmaps the table) stays relatively
quick and doesn't suffer from the above.

It looks like Yanan still has some concerns though, and I'd like to
understand what they are.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-12-01 14:11               ` wangyanan (Y)
@ 2020-12-01 14:35                 ` Marc Zyngier
  -1 siblings, 0 replies; 49+ messages in thread
From: Marc Zyngier @ 2020-12-01 14:35 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: Will Deacon, linux-kernel, linux-arm-kernel, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

Hi Yanan,

On 2020-12-01 14:11, wangyanan (Y) wrote:
> On 2020/12/1 21:46, Will Deacon wrote:
>> On Tue, Dec 01, 2020 at 10:30:41AM +0800, wangyanan (Y) wrote:

[...]

>>> The point is at b.iii where the TLBI is not enough. There are many 
>>> page
>>> mappings that we need to merge into a block mapping.
>>> 
>>> We invalidate the TLB for the input address without level hint at 
>>> b.iii, but
>>> this operation just flushes the TLB for one page mapping, there
>>> 
>>> are still some TLB entries for the other page mappings in the cache, 
>>> the MMU
>>> hardware walker can still hit these entries next time.
>> Ah, yes, I see. Thanks. I hadn't considered the case where there are 
>> table
>> entries beneath the anchor. So how about the diff below?
>> 
>> Will
>> 
>> --->8
> 
> Hi, I think it's inappropriate to put the TLBI of all the leaf entries
> in function stage2_map_walk_table_post(),
> 
> because the *ptep must be an upper table entry when we enter
> stage2_map_walk_table_post().
> 
> We should make the TLBI for every leaf entry not table entry in the
> last lookup level,  just as I am proposing
> 
> to add the additional TLBI in function stage2_map_walk_leaf().

Could you make your concerns explicit? As far as I can tell, this should
address the bug you found, at least from a correctness perspective.

Are you worried about the impact of the full S2 invalidation? Or do you
see another correctness issue?

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2
  2020-12-01 14:16         ` Will Deacon
@ 2020-12-01 17:19           ` wangyanan (Y)
  -1 siblings, 0 replies; 49+ messages in thread
From: wangyanan (Y) @ 2020-12-01 17:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

On 2020/12/1 22:16, Will Deacon wrote:

> On Tue, Dec 01, 2020 at 03:21:23PM +0800, wangyanan (Y) wrote:
>> On 2020/11/30 21:21, Will Deacon wrote:
>>> On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
>>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>>> index 0271b4a3b9fe..696b6aa83faf 100644
>>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>>> @@ -186,6 +186,7 @@ static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
>>>>    		return old == pte;
>>>>    	smp_store_release(ptep, pte);
>>>> +	get_page(virt_to_page(ptep));
>>> This is also used for the hypervisor stage-1 page-table, so I'd prefer to
>>> leave this function as-is.
>> I agree at this point.
>>>>    	return true;
>>>>    }
>>>> @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>>>    	/* There's an existing valid leaf entry, so perform break-before-make */
>>>>    	kvm_set_invalid_pte(ptep);
>>>>    	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
>>>> +	put_page(virt_to_page(ptep));
>>>>    	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
>>>>    out:
>>>>    	data->phys += granule;
>>> Isn't this hunk alone sufficient to solve the problem?
>>>
>>> Will
>>> .
>> Not quite sufficient. When the old ptep is valid and the old pte equals the
>> new pte, "True" is also returned by kvm_set_valid_leaf_pte()
>>
>> and get_page() will still be called.
> I had a go at fixing this without introducing refcounting to the hyp stage-1
> case, and ended up with the diff below. What do you think?

Hi Will,

Functionally this diff looks fine to me. A small comment inline, please 
see below.

I have made an alternative fix (after sending v1) that looks much more
concise.

If you're OK with it, I can send it as v2 (together with patch #2 and #3)
after some tests.


Thanks,

Yanan


diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 0271b4a3b9fe..b232bdd142a6 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -470,6 +470,9 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
         if (!kvm_block_mapping_supported(addr, end, phys, level))
                 return false;

+       if (kvm_pte_valid(*ptep))
+               put_page(virt_to_page(ptep));
+
         if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
                 goto out;

>
> --->8
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..78e2c0dc47ae 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -170,23 +170,16 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
>   	smp_store_release(ptep, pte);
>   }
>   
> -static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> -				   u32 level)
> +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
>   {
> -	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
> +	kvm_pte_t pte = kvm_phys_to_pte(pa);
>   	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
>   							   KVM_PTE_TYPE_BLOCK;
>   
>   	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
>   	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
>   	pte |= KVM_PTE_VALID;
> -
> -	/* Tolerate KVM recreating the exact same mapping. */
> -	if (kvm_pte_valid(old))
> -		return old == pte;
> -
> -	smp_store_release(ptep, pte);
> -	return true;
> +	return pte;
>   }
>   
>   static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> @@ -341,12 +334,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
>   static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>   				    kvm_pte_t *ptep, struct hyp_map_data *data)
>   {
> +	kvm_pte_t new, old = *ptep;
>   	u64 granule = kvm_granule_size(level), phys = data->phys;
>   
>   	if (!kvm_block_mapping_supported(addr, end, phys, level))
>   		return false;
>   
> -	WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
> +	/* Tolerate KVM recreating the exact same mapping. */
> +	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +	if (old != new && !WARN_ON(kvm_pte_valid(old)))
> +		smp_store_release(ptep, new);
> +
>   	data->phys += granule;
>   	return true;
>   }
> @@ -465,19 +463,24 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>   				       kvm_pte_t *ptep,
>   				       struct stage2_map_data *data)
>   {
> +	kvm_pte_t new, old = *ptep;
>   	u64 granule = kvm_granule_size(level), phys = data->phys;
>   
>   	if (!kvm_block_mapping_supported(addr, end, phys, level))
>   		return false;
>   
> -	if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
> -		goto out;
> +	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +	if (kvm_pte_valid(old)) {
> +		/*
> +		 * There's an existing valid leaf entry, so perform
> +		 * break-before-make.
> +		 */
> +		kvm_set_invalid_pte(ptep);
> +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> +		put_page(virt_to_page(ptep));

When the old PTE is valid and equal to the new one, we will still perform
break-before-make and install the new PTE, but neither is actually
necessary in this case.

> +	}
>   
> -	/* There's an existing valid leaf entry, so perform break-before-make */
> -	kvm_set_invalid_pte(ptep);
> -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> -	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
> -out:
> +	smp_store_release(ptep, new);
>   	data->phys += granule;
>   	return true;
>   }
> .

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-12-01 14:35                 ` Marc Zyngier
@ 2020-12-01 17:20                   ` wangyanan (Y)
  -1 siblings, 0 replies; 49+ messages in thread
From: wangyanan (Y) @ 2020-12-01 17:20 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, linux-kernel, linux-arm-kernel, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

On 2020/12/1 22:35, Marc Zyngier wrote:

> Hi Yanan,
>
> On 2020-12-01 14:11, wangyanan (Y) wrote:
>> On 2020/12/1 21:46, Will Deacon wrote:
>>> On Tue, Dec 01, 2020 at 10:30:41AM +0800, wangyanan (Y) wrote:
>
> [...]
>
>>>> The point is at b.iii where the TLBI is not enough. There are many 
>>>> page
>>>> mappings that we need to merge into a block mapping.
>>>>
>>>> We invalidate the TLB for the input address without level hint at 
>>>> b.iii, but
>>>> this operation just flushes the TLB for one page mapping, there
>>>>
>>>> are still some TLB entries for the other page mappings in the 
>>>> cache, the MMU
>>>> hardware walker can still hit these entries next time.
>>> Ah, yes, I see. Thanks. I hadn't considered the case where there are 
>>> table
>>> entries beneath the anchor. So how about the diff below?
>>>
>>> Will
>>>
>>> --->8
>>
>> Hi, I think it's inappropriate to put the TLBI of all the leaf entries
>> in function stage2_map_walk_table_post(),
>>
>> because the *ptep must be an upper table entry when we enter
>> stage2_map_walk_table_post().
>>
>> We should make the TLBI for every leaf entry not table entry in the
>> last lookup level,  just as I am proposing
>>
>> to add the additional TLBI in function stage2_map_walk_leaf().
>
> Could you make your concerns explicit? As far as I can tell, this should
> address the bug you found, at least from a correctness perspective.
>
> Are you worried about the impact of the full S2 invalidation? Or do you
> see another correctness issue?


Hi Will, Marc,


After rechecking the diff, the full S2 invalidation in
stage2_map_walk_table_post() should indeed be enough to solve this problem.

But I was wondering if we could do the full S2 invalidation in
stage2_map_walk_table_pre() instead, where __kvm_tlb_flush_vmid() would be
called only once.

If we put the full TLBI in stage2_map_walk_table_post(),
__kvm_tlb_flush_vmid() might be called many times in the loop and lots of
(unnecessary) CPU instructions would be wasted.

What I'm suggesting is something like below; please let me know what you
think.

If this is OK, I can update the diff in v2 and send it with your SOB (is
that appropriate?) after some tests.


diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b232bdd142a6..f11fb2996080 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -496,7 +496,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
                 return 0;

         kvm_set_invalid_pte(ptep);
-       kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
+       kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
         data->anchor = ptep;
         return 0;
  }


Thanks,

Yanan


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2
  2020-12-01 17:19           ` wangyanan (Y)
@ 2020-12-01 18:15             ` Will Deacon
  -1 siblings, 0 replies; 49+ messages in thread
From: Will Deacon @ 2020-12-01 18:15 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

On Wed, Dec 02, 2020 at 01:19:35AM +0800, wangyanan (Y) wrote:
> On 2020/12/1 22:16, Will Deacon wrote:
> > On Tue, Dec 01, 2020 at 03:21:23PM +0800, wangyanan (Y) wrote:
> > > On 2020/11/30 21:21, Will Deacon wrote:
> > > > On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
> > > > > @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > > > >    	/* There's an existing valid leaf entry, so perform break-before-make */
> > > > >    	kvm_set_invalid_pte(ptep);
> > > > >    	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> > > > > +	put_page(virt_to_page(ptep));
> > > > >    	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
> > > > >    out:
> > > > >    	data->phys += granule;
> > > > Isn't this hunk alone sufficient to solve the problem?
> > > > 
> > > Not quite sufficient. When the old ptep is valid and the old pte equals the
> > > new pte, "True" is also returned by kvm_set_valid_leaf_pte()
> > > 
> > > and get_page() will still be called.
> > I had a go at fixing this without introducing refcounting to the hyp stage-1
> > case, and ended up with the diff below. What do you think?
> 
> Functionally this diff looks fine to me. A small comment inline, please see
> below.
> 
> I had made an alternative fix (after sending v1) and it looks much more
> concise.
> 
> If you're ok with it, I can send it as v2 (together with patch#2 and #3)
> after some tests.

Thanks.

> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..b232bdd142a6 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -470,6 +470,9 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>         if (!kvm_block_mapping_supported(addr, end, phys, level))
>                 return false;
> 
> +       if (kvm_pte_valid(*ptep))
> +               put_page(virt_to_page(ptep));
> +
>         if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
>                 goto out;

This is certainly smaller, but see below.

> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 0271b4a3b9fe..78e2c0dc47ae 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -170,23 +170,16 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
> >   	smp_store_release(ptep, pte);
> >   }
> > -static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> > -				   u32 level)
> > +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
> >   {
> > -	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
> > +	kvm_pte_t pte = kvm_phys_to_pte(pa);
> >   	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
> >   							   KVM_PTE_TYPE_BLOCK;
> >   	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
> >   	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
> >   	pte |= KVM_PTE_VALID;
> > -
> > -	/* Tolerate KVM recreating the exact same mapping. */
> > -	if (kvm_pte_valid(old))
> > -		return old == pte;
> > -
> > -	smp_store_release(ptep, pte);
> > -	return true;
> > +	return pte;
> >   }
> >   static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> > @@ -341,12 +334,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
> >   static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> >   				    kvm_pte_t *ptep, struct hyp_map_data *data)
> >   {
> > +	kvm_pte_t new, old = *ptep;
> >   	u64 granule = kvm_granule_size(level), phys = data->phys;
> >   	if (!kvm_block_mapping_supported(addr, end, phys, level))
> >   		return false;
> > -	WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
> > +	/* Tolerate KVM recreating the exact same mapping. */
> > +	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> > +	if (old != new && !WARN_ON(kvm_pte_valid(old)))
> > +		smp_store_release(ptep, new);
> > +
> >   	data->phys += granule;
> >   	return true;
> >   }
> > @@ -465,19 +463,24 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> >   				       kvm_pte_t *ptep,
> >   				       struct stage2_map_data *data)
> >   {
> > +	kvm_pte_t new, old = *ptep;
> >   	u64 granule = kvm_granule_size(level), phys = data->phys;
> >   	if (!kvm_block_mapping_supported(addr, end, phys, level))
> >   		return false;
> > -	if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
> > -		goto out;
> > +	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> > +	if (kvm_pte_valid(old)) {
> > +		/*
> > +		 * There's an existing valid leaf entry, so perform
> > +		 * break-before-make.
> > +		 */
> > +		kvm_set_invalid_pte(ptep);
> > +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> > +		put_page(virt_to_page(ptep));
> 
> When the old PTE is valid and equal to the new one, we will still perform
> break-before-make and install the new PTE, but neither is actually
> necessary in this case.

Agreed, but I don't think that case should happen for stage-2 mappings.
That's why I prefer my diff here, as it makes that 'old == new' logic
specific to the hyp mappings.
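
To spell the distinction out (purely a restatement of the two hunks above
for clarity, not a further change):

        /*
         * hyp stage-1: recreating the identical mapping is tolerated as a
         * no-op; a *different* valid old entry would be a bug, hence the
         * WARN_ON.
         */
        new = kvm_init_valid_leaf_pte(phys, data->attr, level);
        if (old != new && !WARN_ON(kvm_pte_valid(old)))
                smp_store_release(ptep, new);

        /*
         * stage-2: any valid old entry goes through break-before-make
         * (invalidate, TLBI, drop the reference) before the new entry is
         * installed.
         */
        new = kvm_init_valid_leaf_pte(phys, data->attr, level);
        if (kvm_pte_valid(old)) {
                kvm_set_invalid_pte(ptep);
                kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
                put_page(virt_to_page(ptep));
        }
        smp_store_release(ptep, new);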

But I'll leave it all up to you (these diffs are only intended to be
helpful).

Will

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
  2020-12-01 17:20                   ` wangyanan (Y)
@ 2020-12-01 18:17                     ` Will Deacon
  -1 siblings, 0 replies; 49+ messages in thread
From: Will Deacon @ 2020-12-01 18:17 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: Marc Zyngier, linux-kernel, linux-arm-kernel, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming

On Wed, Dec 02, 2020 at 01:20:33AM +0800, wangyanan (Y) wrote:
> On 2020/12/1 22:35, Marc Zyngier wrote:
> 
> > Hi Yanan,
> > 
> > On 2020-12-01 14:11, wangyanan (Y) wrote:
> > > On 2020/12/1 21:46, Will Deacon wrote:
> > > > On Tue, Dec 01, 2020 at 10:30:41AM +0800, wangyanan (Y) wrote:
> > 
> > [...]
> > 
> > > > > The point is at b.iii, where the TLBI is not enough. There are many
> > > > > page mappings that we need to merge into a block mapping.
> > > > > 
> > > > > We invalidate the TLB for the input address without a level hint at
> > > > > b.iii, but this operation only flushes the TLB for one page mapping.
> > > > > There are still TLB entries for the other page mappings in the cache,
> > > > > and the MMU hardware walker can still hit those entries next time.
> > > > Ah, yes, I see. Thanks. I hadn't considered the case where there are
> > > > table entries beneath the anchor. So how about the diff below?
> > > > 
> > > > Will
> > > > 
> > > > --->8
> > > 
> > > Hi, I think it's inappropriate to put the TLBI of all the leaf entries
> > > in stage2_map_walk_table_post(), because the *ptep must be an upper
> > > table entry when we enter stage2_map_walk_table_post().
> > > 
> > > We should issue the TLBI for every leaf entry (not the table entry) at
> > > the last lookup level, just as I am proposing by adding the additional
> > > TLBI in stage2_map_walk_leaf().
> > 
> > Could you make your concerns explicit? As far as I can tell, this should
> > address the bug you found, at least from a correctness perspective.
> > 
> > Are you worried about the impact of the full S2 invalidation? Or do you
> > see another correctness issue?
> 
> 
> Hi Will, Marc,
> 
> 
> After rechecking the diff, the full S2 invalidation in
> stage2_map_walk_table_post() should be enough to solve this problem.
> 
> But I was wondering if we could do the full S2 invalidation in
> stage2_map_walk_table_pre() instead, where __kvm_tlb_flush_vmid() would be
> called only once.
> 
> If we add the full TLBI in stage2_map_walk_table_post(),
> __kvm_tlb_flush_vmid() might be called many times in the loop, and lots of
> (unnecessary) CPU instructions will be wasted.
> 
> What I'm suggesting is something like the diff below; please let me know
> what you think.
> 
> If this is OK, I can update the diff in v2 and send it with your
> Signed-off-by (is that appropriate?) after some tests.
> 
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index b232bdd142a6..f11fb2996080 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -496,7 +496,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end,
> u32 level,
>                 return 0;
> 
>         kvm_set_invalid_pte(ptep);
> -       kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
> +       kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
>         data->anchor = ptep;
>         return 0;

Yes, I think that's much better, but please add a comment! (you can
probably more-or-less copy the one I had in the post handler)
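
Something along these lines, perhaps (just a sketch reusing the context from
the diff above; the wording of the comment is only a suggestion):

	kvm_set_invalid_pte(ptep);
	/*
	 * Invalidate the whole stage-2, as we may have numerous leaf entries
	 * below this table entry which would otherwise need invalidating
	 * individually.
	 */
	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
	data->anchor = ptep;
	return 0;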

Will

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2
  2020-12-01 18:15             ` Will Deacon
@ 2020-12-01 20:08               ` wangyanan (Y)
  -1 siblings, 0 replies; 49+ messages in thread
From: wangyanan (Y) @ 2020-12-01 20:08 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arm-kernel, Marc Zyngier, Catalin Marinas,
	James Morse, Julien Thierry, Suzuki K Poulose, Gavin Shan,
	Quentin Perret, wanghaibin.wang, yezengruan, zhukeqian1,
	yuzenghui, jiangkunkun, wangjingyi11, lushenming


On 2020/12/2 2:15, Will Deacon wrote:
> On Wed, Dec 02, 2020 at 01:19:35AM +0800, wangyanan (Y) wrote:
>> On 2020/12/1 22:16, Will Deacon wrote:
>>> On Tue, Dec 01, 2020 at 03:21:23PM +0800, wangyanan (Y) wrote:
>>>> On 2020/11/30 21:21, Will Deacon wrote:
>>>>> On Mon, Nov 30, 2020 at 08:18:45PM +0800, Yanan Wang wrote:
>>>>>> @@ -476,6 +477,7 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>>>>>     	/* There's an existing valid leaf entry, so perform break-before-make */
>>>>>>     	kvm_set_invalid_pte(ptep);
>>>>>>     	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
>>>>>> +	put_page(virt_to_page(ptep));
>>>>>>     	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
>>>>>>     out:
>>>>>>     	data->phys += granule;
>>>>> Isn't this hunk alone sufficient to solve the problem?
>>>>>
>>>> Not quite sufficient. When the old ptep is valid and the old pte equals
>>>> the new pte, "true" is still returned by kvm_set_valid_leaf_pte() and
>>>> get_page() will still be called.
>>> I had a go at fixing this without introducing refcounting to the hyp stage-1
>>> case, and ended up with the diff below. What do you think?
>> Functionally this diff looks fine to me. A small comment inline, please see
>> below.
>>
>> I had made an alternative fix (after sending v1) and it looks much more
>> concise.
>>
>> If you're ok with it, I can send it as v2 (together with patch#2 and #3)
>> after some tests.
> Thanks.
>
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index 0271b4a3b9fe..b232bdd142a6 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -470,6 +470,9 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64
>> end, u32 level,
>>          if (!kvm_block_mapping_supported(addr, end, phys, level))
>>                  return false;
>>
>> +       if (kvm_pte_valid(*ptep))
>> +               put_page(virt_to_page(ptep));
>> +
>>          if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
>>                  goto out;
> This is certainly smaller, but see below.
>
>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>> index 0271b4a3b9fe..78e2c0dc47ae 100644
>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>> @@ -170,23 +170,16 @@ static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
>>>    	smp_store_release(ptep, pte);
>>>    }
>>> -static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
>>> -				   u32 level)
>>> +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
>>>    {
>>> -	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
>>> +	kvm_pte_t pte = kvm_phys_to_pte(pa);
>>>    	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
>>>    							   KVM_PTE_TYPE_BLOCK;
>>>    	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
>>>    	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
>>>    	pte |= KVM_PTE_VALID;
>>> -
>>> -	/* Tolerate KVM recreating the exact same mapping. */
>>> -	if (kvm_pte_valid(old))
>>> -		return old == pte;
>>> -
>>> -	smp_store_release(ptep, pte);
>>> -	return true;
>>> +	return pte;
>>>    }
>>>    static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
>>> @@ -341,12 +334,17 @@ static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
>>>    static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>>    				    kvm_pte_t *ptep, struct hyp_map_data *data)
>>>    {
>>> +	kvm_pte_t new, old = *ptep;
>>>    	u64 granule = kvm_granule_size(level), phys = data->phys;
>>>    	if (!kvm_block_mapping_supported(addr, end, phys, level))
>>>    		return false;
>>> -	WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
>>> +	/* Tolerate KVM recreating the exact same mapping. */
>>> +	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
>>> +	if (old != new && !WARN_ON(kvm_pte_valid(old)))
>>> +		smp_store_release(ptep, new);
>>> +
>>>    	data->phys += granule;
>>>    	return true;
>>>    }
>>> @@ -465,19 +463,24 @@ static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>>>    				       kvm_pte_t *ptep,
>>>    				       struct stage2_map_data *data)
>>>    {
>>> +	kvm_pte_t new, old = *ptep;
>>>    	u64 granule = kvm_granule_size(level), phys = data->phys;
>>>    	if (!kvm_block_mapping_supported(addr, end, phys, level))
>>>    		return false;
>>> -	if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
>>> -		goto out;
>>> +	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
>>> +	if (kvm_pte_valid(old)) {
>>> +		/*
>>> +		 * There's an existing valid leaf entry, so perform
>>> +		 * break-before-make.
>>> +		 */
>>> +		kvm_set_invalid_pte(ptep);
>>> +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
>>> +		put_page(virt_to_page(ptep));
>> When the old PTE is valid and equals the new one, we will also perform
>> break-before-make and the new PTE installation, but they're actually not
>> necessary in this case.
> Agreed, but I don't think that case should happen for stage-2 mappings.
> That's why I prefer my diff here, as it makes that 'old == new' logic
> specific to the hyp mappings.
>
> But I'll leave it all up to you (these diffs are only intended to be
> helpful).
>
> Will
> .

Hi Will,

In my opinion, the 'old == new' case can happen in stage-2 translation too,
especially under concurrent accesses from multiple vCPUs.

For example, when merging tables into a block, we perform break-before-make
on a valid old PTE in stage2_map_walk_table_pre().

If other vCPUs are accessing the same GPA range, their MMU walks will hit
the same PTE as above, and they might just catch the moment when the target
PTE has been set invalid as part of BBM by the vCPU holding the mmu_lock,
even though the PTE will be made valid again later.

As a result, translation faults will be triggered on these vCPUs, and they
will then wait for the mmu_lock only to install exactly the *same* mapping
(old == new).
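
To make the scenario concrete, here is a minimal userspace model (not kernel
code; the names are invented) of two vCPUs faulting on the same GPA, where
the second one ends up asked to install exactly the mapping that is already
there:

#include <stdint.h>
#include <stdio.h>

typedef uint64_t pte_t;

static pte_t pte;			/* the shared stage-2 leaf entry */

static pte_t make_pte(uint64_t pa)	/* stand-in for building a valid PTE */
{
	return pa | 1;			/* bit 0 models the valid bit */
}

/* Called with the (modelled) mmu_lock held. */
static void handle_fault(int vcpu, uint64_t pa)
{
	pte_t new = make_pte(pa);

	if (pte == new) {
		printf("vCPU%d: old == new, skip break-before-make\n", vcpu);
		return;
	}
	if (pte & 1)
		printf("vCPU%d: break-before-make on existing entry\n", vcpu);
	pte = new;
	printf("vCPU%d: installed new mapping\n", vcpu);
}

int main(void)
{
	/*
	 * Both vCPUs faulted on the same GPA before either mapping was
	 * visible; they take the lock one after the other.
	 */
	handle_fault(0, 0x40000000);
	handle_fault(1, 0x40000000);	/* ends up mapping the *same* thing */
	return 0;
}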


Thanks,

Yanan


^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2020-12-01 20:10 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-30 12:18 [RFC PATCH 0/3] Fix several bugs in KVM stage 2 translation Yanan Wang
2020-11-30 12:18 ` Yanan Wang
2020-11-30 12:18 ` [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2 Yanan Wang
2020-11-30 12:18   ` Yanan Wang
2020-11-30 13:21   ` Will Deacon
2020-11-30 13:21     ` Will Deacon
2020-12-01  7:21     ` wangyanan (Y)
2020-12-01  7:21       ` wangyanan (Y)
2020-12-01 14:16       ` Will Deacon
2020-12-01 14:16         ` Will Deacon
2020-12-01 17:19         ` wangyanan (Y)
2020-12-01 17:19           ` wangyanan (Y)
2020-12-01 18:15           ` Will Deacon
2020-12-01 18:15             ` Will Deacon
2020-12-01 20:08             ` wangyanan (Y)
2020-12-01 20:08               ` wangyanan (Y)
2020-11-30 12:18 ` [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry Yanan Wang
2020-11-30 12:18   ` Yanan Wang
2020-11-30 13:34   ` Will Deacon
2020-11-30 13:34     ` Will Deacon
2020-11-30 15:24     ` wangyanan (Y)
2020-11-30 15:24       ` wangyanan (Y)
2020-11-30 16:01       ` Will Deacon
2020-11-30 16:01         ` Will Deacon
2020-12-01  2:30         ` wangyanan (Y)
2020-12-01  2:30           ` wangyanan (Y)
2020-12-01 13:46           ` Will Deacon
2020-12-01 13:46             ` Will Deacon
2020-12-01 14:05             ` Marc Zyngier
2020-12-01 14:05               ` Marc Zyngier
2020-12-01 14:23               ` Will Deacon
2020-12-01 14:23                 ` Will Deacon
2020-12-01 14:32                 ` Marc Zyngier
2020-12-01 14:32                   ` Marc Zyngier
2020-12-01 14:11             ` wangyanan (Y)
2020-12-01 14:11               ` wangyanan (Y)
2020-12-01 14:35               ` Marc Zyngier
2020-12-01 14:35                 ` Marc Zyngier
2020-12-01 17:20                 ` wangyanan (Y)
2020-12-01 17:20                   ` wangyanan (Y)
2020-12-01 18:17                   ` Will Deacon
2020-12-01 18:17                     ` Will Deacon
2020-11-30 12:18 ` [RFC PATCH 3/3] KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort() Yanan Wang
2020-11-30 12:18   ` Yanan Wang
2020-11-30 13:49   ` Will Deacon
2020-11-30 13:49     ` Will Deacon
2020-12-01  6:04     ` wangyanan (Y)
2020-12-01  6:04       ` wangyanan (Y)
2020-11-30 19:40   ` kernel test robot
