linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling
@ 2022-08-30 19:41 Oliver Upton
  2022-08-30 19:41 ` [PATCH 01/14] KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees Oliver Upton
                   ` (14 more replies)
  0 siblings, 15 replies; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:41 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson
  Cc: linux-arm-kernel, kvmarm, kvm, Oliver Upton

Presently KVM only takes a read lock for stage 2 faults if it believes
the fault can be fixed by relaxing permissions on a PTE (write unprotect
for dirty logging). Otherwise, stage 2 faults grab the write lock, which
predictably can pile up all the vCPUs in a sufficiently large VM.

Like the TDP MMU for x86, this series loosens the locking around
manipulations of the stage 2 page tables to allow parallel faults. RCU
and atomics are exploited to safely build/destroy the stage 2 page
tables in light of multiple software observers.

Patches 1-2 are a cleanup to the way we collapse page tables, with the
added benefit of narrowing the window of time a range of memory is
unmapped.

Patches 3-7 are minor cleanups and refactorings to the way KVM reads
PTEs and traverses the stage 2 page tables to make it amenable to
concurrent modification.

Patches 8-9 use RCU to punt page table cleanup out of the vCPU fault
path, which should also improve fault latency a bit.

Patches 10-13 implement the meat of this series, extending the
'break-before-make' sequence with atomics to realize locking on PTEs.
Effectively a cmpxchg() is used to 'break' a PTE, thereby serializing
changes to a given PTE.

Finally, patch 14 flips the switch on all the new code and starts
grabbing the read side of the MMU lock for stage 2 faults.
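
As a conceptual sketch only (not the literal code from the series), the
'break' step boils down to an atomic compare-exchange of the PTE against
the locked sentinel that patch 11 introduces, so exactly one walker wins
ownership of the PTE and the losers back off and retry:

  /*
   * Sketch: only the walker whose cmpxchg() succeeds may proceed with the
   * TLB invalidation and the 'make' step for this PTE.
   */
  if (cmpxchg(ptep, old, KVM_INVALID_PTE_LOCKED) != old)
          return -EAGAIN; /* lost the race to another vCPU's fault */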

Applies to 6.0-rc3. Tested with KVM selftests and benchmarked with
dirty_log_perf_test, scaling from 1 to 48 vCPUs with 4GB of memory per
vCPU backed by THP.

  ./dirty_log_perf_test -s anonymous_thp -m 2 -b 4G -v ${NR_VCPUS}

Time to dirty memory:

        +-------+---------+------------------+
        | vCPUs | 6.0-rc3 | 6.0-rc3 + series |
        +-------+---------+------------------+
        |     1 | 0.89s   | 0.92s            |
        |     2 | 1.13s   | 1.18s            |
        |     4 | 2.42s   | 1.25s            |
        |     8 | 5.03s   | 1.36s            |
        |    16 | 8.84s   | 2.09s            |
        |    32 | 19.60s  | 4.47s            |
        |    48 | 31.39s  | 6.22s            |
        +-------+---------+------------------+

It is also worth mentioning that the time to populate memory has
improved:

        +-------+---------+------------------+
        | vCPUs | 6.0-rc3 | 6.0-rc3 + series |
        +-------+---------+------------------+
        |     1 | 0.19s   | 0.18s            |
        |     2 | 0.25s   | 0.21s            |
        |     4 | 0.38s   | 0.32s            |
        |     8 | 0.64s   | 0.40s            |
        |    16 | 1.22s   | 0.54s            |
        |    32 | 2.50s   | 1.03s            |
        |    48 | 3.88s   | 1.52s            |
        +-------+---------+------------------+

RFC: https://lore.kernel.org/kvmarm/20220415215901.1737897-1-oupton@google.com/

RFC -> v1:
 - Factored out page table teardown from kvm_pgtable_stage2_map()
 - Use the RCU callback to tear down a subtree, instead of scheduling a
   callback for every individual table page.
 - Reorganized series to (hopefully) avoid intermediate breakage.
 - Dropped the use of page headers, instead stuffing KVM metadata into
   page::private directly

Oliver Upton (14):
  KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
  KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
  KVM: arm64: Directly read owner id field in stage2_pte_is_counted()
  KVM: arm64: Read the PTE once per visit
  KVM: arm64: Split init and set for table PTE
  KVM: arm64: Return next table from map callbacks
  KVM: arm64: Document behavior of pgtable visitor callback
  KVM: arm64: Protect page table traversal with RCU
  KVM: arm64: Free removed stage-2 tables in RCU callback
  KVM: arm64: Atomically update stage 2 leaf attributes in parallel
    walks
  KVM: arm64: Make block->table PTE changes parallel-aware
  KVM: arm64: Make leaf->leaf PTE changes parallel-aware
  KVM: arm64: Make table->block changes parallel-aware
  KVM: arm64: Handle stage-2 faults in parallel

 arch/arm64/include/asm/kvm_pgtable.h  |  59 ++++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |   7 +-
 arch/arm64/kvm/hyp/nvhe/setup.c       |   4 +-
 arch/arm64/kvm/hyp/pgtable.c          | 360 ++++++++++++++++----------
 arch/arm64/kvm/mmu.c                  |  65 +++--
 5 files changed, 325 insertions(+), 170 deletions(-)


base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
-- 
2.37.2.672.g94769d06f0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/14] KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
@ 2022-08-30 19:41 ` Oliver Upton
  2022-08-30 19:41 ` [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make Oliver Upton
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:41 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel

A subsequent change to KVM will move the teardown of an unlinked
stage-2 subtree out of the critical path of the break-before-make
sequence.

Introduce a new helper for tearing down unlinked stage-2 subtrees.
Leverage the existing stage-2 free walkers to do so, with a deep call
into __kvm_pgtable_walk() as the subtree is no longer reachable from the
root.
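
As a rough illustration of the intended use (the actual wiring lands in
the next patch), the caller detaches the table entry, invalidates the TLB
and only then hands the orphaned subtree to the new helper:

  /* Illustrative fragment; 'childp', 'pgt', etc. come from the map walker. */
  kvm_clear_pte(ptep);
  kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
  kvm_pgtable_stage2_free_removed(childp, level + 1, pgt);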

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h | 11 +++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 26 ++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 9f339dffbc1a..d71fb92dc913 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -316,6 +316,17 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
  */
 void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
 
+/**
+ * kvm_pgtable_stage2_free_removed() - Free a removed stage-2 paging structure.
+ * @pgtable:	Unlinked stage-2 paging structure to be freed.
+ * @level:	Level of the stage-2 paging structure to be freed.
+ * @arg:	Page-table structure initialised by kvm_pgtable_stage2_init*()
+ *
+ * The page-table is assumed to be unreachable by any hardware walkers prior to
+ * freeing and therefore no TLB invalidation is performed.
+ */
+void kvm_pgtable_stage2_free_removed(void *pgtable, u32 level, void *arg);
+
 /**
  * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
  * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 2cb3867eb7c2..d8127c25424c 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1233,3 +1233,29 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 	pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
 	pgt->pgd = NULL;
 }
+
+void kvm_pgtable_stage2_free_removed(void *pgtable, u32 level, void *arg)
+{
+	struct kvm_pgtable *pgt = (struct kvm_pgtable *)arg;
+	kvm_pte_t *ptep = (kvm_pte_t *)pgtable;
+	struct kvm_pgtable_walker walker = {
+		.cb	= stage2_free_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF |
+			  KVM_PGTABLE_WALK_TABLE_POST,
+		.arg	= pgt->mm_ops,
+	};
+	struct kvm_pgtable_walk_data data = {
+		.pgt	= pgt,
+		.walker	= &walker,
+
+		/*
+		 * At this point the IPA really doesn't matter, as the page
+		 * table being traversed has already been removed from the stage
+		 * 2. Set an appropriate range to cover the entire page table.
+		 */
+		.addr	= 0,
+		.end	= kvm_granule_size(level),
+	};
+
+	WARN_ON(__kvm_pgtable_walk(&data, ptep, level));
+}
-- 
2.37.2.672.g94769d06f0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
  2022-08-30 19:41 ` [PATCH 01/14] KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees Oliver Upton
@ 2022-08-30 19:41 ` Oliver Upton
  2022-09-06 14:35   ` Quentin Perret
                     ` (2 more replies)
  2022-08-30 19:41 ` [PATCH 03/14] KVM: arm64: Directly read owner id field in stage2_pte_is_counted() Oliver Upton
                   ` (12 subsequent siblings)
  14 siblings, 3 replies; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:41 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel

The break-before-make sequence is a bit annoying as it opens a window
wherein memory is unmapped from the guest. KVM should replace the PTE
as quickly as possible and avoid unnecessary work in between.

Presently, the stage-2 map walker tears down a removed table before
installing a block mapping when coalescing a table into a block. As the
removed table is no longer visible to hardware walkers after the
DSB+TLBI, it is possible to move the remaining cleanup to happen after
installing the new PTE.

Reshuffle the stage-2 map walker to install the new block entry in
the pre-order callback. Unwire all of the teardown logic and replace
it with a call to kvm_pgtable_stage2_free_removed() after fixing
the PTE. The post-order visitor is now completely unnecessary, so drop
it. Finally, touch up the comments to better represent the now
simplified map walker.

Note that the call to tear down the unlinked stage-2 table is indirected
through mm_ops, as a subsequent change will use an RCU callback to defer
the teardown. RCU is not available to pKVM, so pKVM and non-pKVM VMs need
different implementations.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h  |  3 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  1 +
 arch/arm64/kvm/hyp/pgtable.c          | 83 ++++++++-------------------
 arch/arm64/kvm/mmu.c                  |  1 +
 4 files changed, 28 insertions(+), 60 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index d71fb92dc913..c25633f53b2b 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -77,6 +77,8 @@ static inline bool kvm_level_supports_block_mapping(u32 level)
  *				allocation is physically contiguous.
  * @free_pages_exact:		Free an exact number of memory pages previously
  *				allocated by zalloc_pages_exact.
+ * @free_removed_table:		Free a removed paging structure by unlinking and
+ *				dropping references.
  * @get_page:			Increment the refcount on a page.
  * @put_page:			Decrement the refcount on a page. When the
  *				refcount reaches 0 the page is automatically
@@ -95,6 +97,7 @@ struct kvm_pgtable_mm_ops {
 	void*		(*zalloc_page)(void *arg);
 	void*		(*zalloc_pages_exact)(size_t size);
 	void		(*free_pages_exact)(void *addr, size_t size);
+	void		(*free_removed_table)(void *addr, u32 level, void *arg);
 	void		(*get_page)(void *addr);
 	void		(*put_page)(void *addr);
 	int		(*page_count)(void *addr);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1e78acf9662e..a930fdee6fce 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -93,6 +93,7 @@ static int prepare_s2_pool(void *pgt_pool_base)
 	host_kvm.mm_ops = (struct kvm_pgtable_mm_ops) {
 		.zalloc_pages_exact = host_s2_zalloc_pages_exact,
 		.zalloc_page = host_s2_zalloc_page,
+		.free_removed_table = kvm_pgtable_stage2_free_removed,
 		.phys_to_virt = hyp_phys_to_virt,
 		.virt_to_phys = hyp_virt_to_phys,
 		.page_count = hyp_page_count,
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index d8127c25424c..5c0c8028d71c 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -763,17 +763,21 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	return 0;
 }
 
+static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+				struct stage2_map_data *data);
+
 static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 				     kvm_pte_t *ptep,
 				     struct stage2_map_data *data)
 {
-	if (data->anchor)
-		return 0;
+	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	kvm_pte_t *childp = kvm_pte_follow(*ptep, mm_ops);
+	struct kvm_pgtable *pgt = data->mmu->pgt;
+	int ret;
 
 	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
 		return 0;
 
-	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
 	kvm_clear_pte(ptep);
 
 	/*
@@ -782,8 +786,13 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 	 * individually.
 	 */
 	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
-	data->anchor = ptep;
-	return 0;
+
+	ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
+
+	mm_ops->put_page(ptep);
+	mm_ops->free_removed_table(childp, level + 1, pgt);
+
+	return ret;
 }
 
 static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
@@ -793,13 +802,6 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	kvm_pte_t *childp, pte = *ptep;
 	int ret;
 
-	if (data->anchor) {
-		if (stage2_pte_is_counted(pte))
-			mm_ops->put_page(ptep);
-
-		return 0;
-	}
-
 	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
 	if (ret != -E2BIG)
 		return ret;
@@ -828,50 +830,14 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	return 0;
 }
 
-static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
-				      kvm_pte_t *ptep,
-				      struct stage2_map_data *data)
-{
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
-	kvm_pte_t *childp;
-	int ret = 0;
-
-	if (!data->anchor)
-		return 0;
-
-	if (data->anchor == ptep) {
-		childp = data->childp;
-		data->anchor = NULL;
-		data->childp = NULL;
-		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
-	} else {
-		childp = kvm_pte_follow(*ptep, mm_ops);
-	}
-
-	mm_ops->put_page(childp);
-	mm_ops->put_page(ptep);
-
-	return ret;
-}
-
 /*
- * This is a little fiddly, as we use all three of the walk flags. The idea
- * is that the TABLE_PRE callback runs for table entries on the way down,
- * looking for table entries which we could conceivably replace with a
- * block entry for this mapping. If it finds one, then it sets the 'anchor'
- * field in 'struct stage2_map_data' to point at the table entry, before
- * clearing the entry to zero and descending into the now detached table.
- *
- * The behaviour of the LEAF callback then depends on whether or not the
- * anchor has been set. If not, then we're not using a block mapping higher
- * up the table and we perform the mapping at the existing leaves instead.
- * If, on the other hand, the anchor _is_ set, then we drop references to
- * all valid leaves so that the pages beneath the anchor can be freed.
+ * The TABLE_PRE callback runs for table entries on the way down, looking
+ * for table entries which we could conceivably replace with a block entry
+ * for this mapping. If it finds one it replaces the entry and calls
+ * kvm_pgtable_mm_ops::free_removed_table() to tear down the detached table.
  *
- * Finally, the TABLE_POST callback does nothing if the anchor has not
- * been set, but otherwise frees the page-table pages while walking back up
- * the page-table, installing the block entry when it revisits the anchor
- * pointer and clearing the anchor to NULL.
+ * Otherwise, the LEAF callback performs the mapping at the existing leaves
+ * instead.
  */
 static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			     enum kvm_pgtable_walk_flags flag, void * const arg)
@@ -883,11 +849,9 @@ static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
 	case KVM_PGTABLE_WALK_LEAF:
 		return stage2_map_walk_leaf(addr, end, level, ptep, data);
-	case KVM_PGTABLE_WALK_TABLE_POST:
-		return stage2_map_walk_table_post(addr, end, level, ptep, data);
+	default:
+		return -EINVAL;
 	}
-
-	return -EINVAL;
 }
 
 int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
@@ -905,8 +869,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_map_walker,
 		.flags		= KVM_PGTABLE_WALK_TABLE_PRE |
-				  KVM_PGTABLE_WALK_LEAF |
-				  KVM_PGTABLE_WALK_TABLE_POST,
+				  KVM_PGTABLE_WALK_LEAF,
 		.arg		= &map_data,
 	};
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c9a13e487187..91521f4aab97 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -627,6 +627,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
 	.zalloc_page		= stage2_memcache_zalloc_page,
 	.zalloc_pages_exact	= kvm_host_zalloc_pages_exact,
 	.free_pages_exact	= free_pages_exact,
+	.free_removed_table	= kvm_pgtable_stage2_free_removed,
 	.get_page		= kvm_host_get_page,
 	.put_page		= kvm_host_put_page,
 	.page_count		= kvm_host_page_count,
-- 
2.37.2.672.g94769d06f0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 03/14] KVM: arm64: Directly read owner id field in stage2_pte_is_counted()
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
  2022-08-30 19:41 ` [PATCH 01/14] KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees Oliver Upton
  2022-08-30 19:41 ` [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make Oliver Upton
@ 2022-08-30 19:41 ` Oliver Upton
  2022-08-30 19:41 ` [PATCH 04/14] KVM: arm64: Read the PTE once per visit Oliver Upton
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:41 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel

A subsequent change to KVM will make use of additional bits in invalid
PTEs. Prepare for said change by explicitly checking the valid bit and
owner field in stage2_pte_is_counted().

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 5c0c8028d71c..b6ce786ae570 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -172,6 +172,11 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
 	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
 }
 
+static u8 kvm_invalid_pte_owner(kvm_pte_t pte)
+{
+	return FIELD_GET(KVM_INVALID_PTE_OWNER_MASK, pte);
+}
+
 static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
 				  u32 level, kvm_pte_t *ptep,
 				  enum kvm_pgtable_walk_flags flag)
@@ -679,7 +684,7 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
 	 * encode ownership of a page to another entity than the page-table
 	 * owner, whose id is 0.
 	 */
-	return !!pte;
+	return kvm_pte_valid(pte) || kvm_invalid_pte_owner(pte);
 }
 
 static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
-- 
2.37.2.672.g94769d06f0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 04/14] KVM: arm64: Read the PTE once per visit
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
                   ` (2 preceding siblings ...)
  2022-08-30 19:41 ` [PATCH 03/14] KVM: arm64: Directly read owner id field in stage2_pte_is_counted() Oliver Upton
@ 2022-08-30 19:41 ` Oliver Upton
  2022-08-30 19:41 ` [PATCH 05/14] KVM: arm64: Split init and set for table PTE Oliver Upton
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:41 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel

The page table walkers read the PTE multiple times per visit. Presently,
that is safe as changes to the non-leaf PTEs are serialized. A
subsequent change to KVM will enable parallel modifications to the stage
2 page tables. Prepare by ensuring a PTE is read only once per visit.

Promote the PTE read in __kvm_pgtable_visit() to READ_ONCE() and pass
the observed value through to callbacks. Note that the PTE is passed as
a pointer to the callbacks; visitors that install new tables need to aim
traversal at the new table.
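
For illustration, a visitor written against the new signature consumes the
snapshot passed via @old rather than dereferencing @ptep again (the walker
below is hypothetical, purely to show the convention):

  static int count_valid_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
                                kvm_pte_t *old, enum kvm_pgtable_walk_flags flag,
                                void * const arg)
  {
          unsigned long *nr_valid = arg;

          /* Use the single snapshot taken in __kvm_pgtable_visit(). */
          if (kvm_pte_valid(*old))
                  (*nr_valid)++;

          return 0;
  }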

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h  |  8 ++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  4 +-
 arch/arm64/kvm/hyp/nvhe/setup.c       |  4 +-
 arch/arm64/kvm/hyp/pgtable.c          | 73 ++++++++++++++-------------
 4 files changed, 48 insertions(+), 41 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index c25633f53b2b..47920ae3f7e7 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -195,7 +195,7 @@ enum kvm_pgtable_walk_flags {
 };
 
 typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
-					kvm_pte_t *ptep,
+					kvm_pte_t *ptep, kvm_pte_t *old,
 					enum kvm_pgtable_walk_flags flag,
 					void * const arg);
 
@@ -561,4 +561,10 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
  *	   kvm_pgtable_prot format.
  */
 enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
+
+static inline kvm_pte_t kvm_pte_read(kvm_pte_t *ptep)
+{
+	return READ_ONCE(*ptep);
+}
+
 #endif	/* __ARM64_KVM_PGTABLE_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index a930fdee6fce..61cf223e0796 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -419,12 +419,12 @@ struct check_walk_data {
 };
 
 static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
-				      kvm_pte_t *ptep,
+				      kvm_pte_t *ptep, kvm_pte_t *old,
 				      enum kvm_pgtable_walk_flags flag,
 				      void * const arg)
 {
 	struct check_walk_data *d = arg;
-	kvm_pte_t pte = *ptep;
+	kvm_pte_t pte = *old;
 
 	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
 		return -EINVAL;
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index e8d4ea2fcfa0..2b62ca58ebd4 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -187,14 +187,14 @@ static void hpool_put_page(void *addr)
 }
 
 static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
-					 kvm_pte_t *ptep,
+					 kvm_pte_t *ptep, kvm_pte_t *old,
 					 enum kvm_pgtable_walk_flags flag,
 					 void * const arg)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = arg;
 	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
-	kvm_pte_t pte = *ptep;
+	kvm_pte_t pte = *old;
 	phys_addr_t phys;
 
 	if (!kvm_pte_valid(pte))
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b6ce786ae570..430753fbb727 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -178,11 +178,11 @@ static u8 kvm_invalid_pte_owner(kvm_pte_t pte)
 }
 
 static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
-				  u32 level, kvm_pte_t *ptep,
+				  u32 level, kvm_pte_t *ptep, kvm_pte_t *old,
 				  enum kvm_pgtable_walk_flags flag)
 {
 	struct kvm_pgtable_walker *walker = data->walker;
-	return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
+	return walker->cb(addr, data->end, level, ptep, old, flag, walker->arg);
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
@@ -193,17 +193,17 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 {
 	int ret = 0;
 	u64 addr = data->addr;
-	kvm_pte_t *childp, pte = *ptep;
+	kvm_pte_t *childp, pte = kvm_pte_read(ptep);
 	bool table = kvm_pte_table(pte, level);
 	enum kvm_pgtable_walk_flags flags = data->walker->flags;
 
 	if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
-		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
+		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, &pte,
 					     KVM_PGTABLE_WALK_TABLE_PRE);
 	}
 
 	if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
-		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
+		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, &pte,
 					     KVM_PGTABLE_WALK_LEAF);
 		pte = *ptep;
 		table = kvm_pte_table(pte, level);
@@ -224,7 +224,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 		goto out;
 
 	if (flags & KVM_PGTABLE_WALK_TABLE_POST) {
-		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
+		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, &pte,
 					     KVM_PGTABLE_WALK_TABLE_POST);
 	}
 
@@ -297,12 +297,12 @@ struct leaf_walk_data {
 	u32		level;
 };
 
-static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old,
 		       enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct leaf_walk_data *data = arg;
 
-	data->pte   = *ptep;
+	data->pte   = *old;
 	data->level = level;
 
 	return 0;
@@ -388,10 +388,10 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
 	return prot;
 }
 
-static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
-				    kvm_pte_t *ptep, struct hyp_map_data *data)
+static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+				    kvm_pte_t old, struct hyp_map_data *data)
 {
-	kvm_pte_t new, old = *ptep;
+	kvm_pte_t new;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
 
 	if (!kvm_block_mapping_supported(addr, end, phys, level))
@@ -410,14 +410,14 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	return true;
 }
 
-static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old,
 			  enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	kvm_pte_t *childp;
 	struct hyp_map_data *data = arg;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
-	if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
+	if (hyp_map_walker_try_leaf(addr, end, level, ptep, *old, arg))
 		return 0;
 
 	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
@@ -461,10 +461,10 @@ struct hyp_unmap_data {
 	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old,
 			    enum kvm_pgtable_walk_flags flag, void * const arg)
 {
-	kvm_pte_t pte = *ptep, *childp = NULL;
+	kvm_pte_t pte = *old, *childp = NULL;
 	u64 granule = kvm_granule_size(level);
 	struct hyp_unmap_data *data = arg;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
@@ -537,11 +537,11 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 	return 0;
 }
 
-static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old,
 			   enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = arg;
-	kvm_pte_t pte = *ptep;
+	kvm_pte_t pte = *old;
 
 	if (!kvm_pte_valid(pte))
 		return 0;
@@ -723,10 +723,10 @@ static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
 }
 
 static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
-				      kvm_pte_t *ptep,
+				      kvm_pte_t *ptep, kvm_pte_t old,
 				      struct stage2_map_data *data)
 {
-	kvm_pte_t new, old = *ptep;
+	kvm_pte_t new;
 	u64 granule = kvm_granule_size(level), phys = data->phys;
 	struct kvm_pgtable *pgt = data->mmu->pgt;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
@@ -772,11 +772,11 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 				struct stage2_map_data *data);
 
 static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
-				     kvm_pte_t *ptep,
+				     kvm_pte_t *ptep, kvm_pte_t *old,
 				     struct stage2_map_data *data)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
-	kvm_pte_t *childp = kvm_pte_follow(*ptep, mm_ops);
+	kvm_pte_t *childp = kvm_pte_follow(*old, mm_ops);
 	struct kvm_pgtable *pgt = data->mmu->pgt;
 	int ret;
 
@@ -801,13 +801,14 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 }
 
 static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-				struct stage2_map_data *data)
+				kvm_pte_t *old, struct stage2_map_data *data)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
-	kvm_pte_t *childp, pte = *ptep;
+	kvm_pte_t *childp, pte = *old;
 	int ret;
 
-	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
+	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, pte, data);
+
 	if (ret != -E2BIG)
 		return ret;
 
@@ -844,16 +845,16 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
  * Otherwise, the LEAF callback performs the mapping at the existing leaves
  * instead.
  */
-static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old,
 			     enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct stage2_map_data *data = arg;
 
 	switch (flag) {
 	case KVM_PGTABLE_WALK_TABLE_PRE:
-		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
+		return stage2_map_walk_table_pre(addr, end, level, ptep, old, data);
 	case KVM_PGTABLE_WALK_LEAF:
-		return stage2_map_walk_leaf(addr, end, level, ptep, data);
+		return stage2_map_walk_leaf(addr, end, level, ptep, old, data);
 	default:
 		return -EINVAL;
 	}
@@ -918,13 +919,13 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 }
 
 static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			       enum kvm_pgtable_walk_flags flag,
+			       kvm_pte_t *old, enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
 	struct kvm_pgtable *pgt = arg;
 	struct kvm_s2_mmu *mmu = pgt->mmu;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
-	kvm_pte_t pte = *ptep, *childp = NULL;
+	kvm_pte_t pte = *old, *childp = NULL;
 	bool need_flush = false;
 
 	if (!kvm_pte_valid(pte)) {
@@ -981,10 +982,10 @@ struct stage2_attr_data {
 };
 
 static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			      enum kvm_pgtable_walk_flags flag,
+			      kvm_pte_t *old, enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
-	kvm_pte_t pte = *ptep;
+	kvm_pte_t pte = *old;
 	struct stage2_attr_data *data = arg;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
@@ -1007,7 +1008,7 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		 * stage-2 PTE if we are going to add executable permission.
 		 */
 		if (mm_ops->icache_inval_pou &&
-		    stage2_pte_executable(pte) && !stage2_pte_executable(*ptep))
+		    stage2_pte_executable(pte) && !stage2_pte_executable(data->pte))
 			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
 						  kvm_granule_size(level));
 		WRITE_ONCE(*ptep, pte);
@@ -1109,12 +1110,12 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 }
 
 static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			       enum kvm_pgtable_walk_flags flag,
+			       kvm_pte_t *old, enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
 {
 	struct kvm_pgtable *pgt = arg;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
-	kvm_pte_t pte = *ptep;
+	kvm_pte_t pte = *old;
 
 	if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
 		return 0;
@@ -1169,11 +1170,11 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 }
 
 static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			      enum kvm_pgtable_walk_flags flag,
+			      kvm_pte_t *old, enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = arg;
-	kvm_pte_t pte = *ptep;
+	kvm_pte_t pte = *old;
 
 	if (!stage2_pte_is_counted(pte))
 		return 0;
-- 
2.37.2.672.g94769d06f0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 05/14] KVM: arm64: Split init and set for table PTE
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
                   ` (3 preceding siblings ...)
  2022-08-30 19:41 ` [PATCH 04/14] KVM: arm64: Read the PTE once per visit Oliver Upton
@ 2022-08-30 19:41 ` Oliver Upton
  2022-08-30 19:41 ` [PATCH 06/14] KVM: arm64: Return next table from map callbacks Oliver Upton
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:41 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel

Create a helper to initialize a table PTE and have the caller install it
directly with smp_store_release().

A subsequent change to KVM will tweak the way we traverse the page
tables, requiring that the visitor callbacks steer the walker down a
newly installed table. Furthermore, when stage-2 faults are serviced
in parallel the PTE must be considered volatile, so walkers will need
to stash a pointer to the new table.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 430753fbb727..331f6e3b2c20 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -142,16 +142,13 @@ static void kvm_clear_pte(kvm_pte_t *ptep)
 	WRITE_ONCE(*ptep, 0);
 }
 
-static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
-			      struct kvm_pgtable_mm_ops *mm_ops)
+static kvm_pte_t kvm_init_table_pte(kvm_pte_t *childp, struct kvm_pgtable_mm_ops *mm_ops)
 {
-	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
+	kvm_pte_t pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
 
 	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
 	pte |= KVM_PTE_VALID;
-
-	WARN_ON(kvm_pte_valid(old));
-	smp_store_release(ptep, pte);
+	return pte;
 }
 
 static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
@@ -413,7 +410,7 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *pte
 static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte_t *old,
 			  enum kvm_pgtable_walk_flags flag, void * const arg)
 {
-	kvm_pte_t *childp;
+	kvm_pte_t *childp, new;
 	struct hyp_map_data *data = arg;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
@@ -427,8 +424,10 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte
 	if (!childp)
 		return -ENOMEM;
 
-	kvm_set_table_pte(ptep, childp, mm_ops);
+	new = kvm_init_table_pte(childp, mm_ops);
 	mm_ops->get_page(ptep);
+	smp_store_release(ptep, new);
+
 	return 0;
 }
 
@@ -804,7 +803,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 				kvm_pte_t *old, struct stage2_map_data *data)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
-	kvm_pte_t *childp, pte = *old;
+	kvm_pte_t *childp, pte = *old, new;
 	int ret;
 
 	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, pte, data);
@@ -830,8 +829,9 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (stage2_pte_is_counted(pte))
 		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
 
-	kvm_set_table_pte(ptep, childp, mm_ops);
+	new = kvm_init_table_pte(childp, mm_ops);
 	mm_ops->get_page(ptep);
+	smp_store_release(ptep, new);
 
 	return 0;
 }
-- 
2.37.2.672.g94769d06f0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 06/14] KVM: arm64: Return next table from map callbacks
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
                   ` (4 preceding siblings ...)
  2022-08-30 19:41 ` [PATCH 05/14] KVM: arm64: Split init and set for table PTE Oliver Upton
@ 2022-08-30 19:41 ` Oliver Upton
  2022-09-07 21:32   ` David Matlack
  2022-08-30 19:41 ` [PATCH 07/14] KVM: arm64: Document behavior of pgtable visitor callback Oliver Upton
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:41 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel

The map walkers install new page tables during their traversal. Return
the newly-installed table PTE from the map callbacks to point the walker
at the new table without re-reading the ptep.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 331f6e3b2c20..f911509e6512 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -202,13 +202,12 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 	if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
 		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, &pte,
 					     KVM_PGTABLE_WALK_LEAF);
-		pte = *ptep;
-		table = kvm_pte_table(pte, level);
 	}
 
 	if (ret)
 		goto out;
 
+	table = kvm_pte_table(pte, level);
 	if (!table) {
 		data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
 		data->addr += kvm_granule_size(level);
@@ -427,6 +426,7 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte
 	new = kvm_init_table_pte(childp, mm_ops);
 	mm_ops->get_page(ptep);
 	smp_store_release(ptep, new);
+	*old = new;
 
 	return 0;
 }
@@ -768,7 +768,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 }
 
 static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-				struct stage2_map_data *data);
+				kvm_pte_t *old, struct stage2_map_data *data);
 
 static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 				     kvm_pte_t *ptep, kvm_pte_t *old,
@@ -791,7 +791,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 	 */
 	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
 
-	ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
+	ret = stage2_map_walk_leaf(addr, end, level, ptep, old, data);
 
 	mm_ops->put_page(ptep);
 	mm_ops->free_removed_table(childp, level + 1, pgt);
@@ -832,6 +832,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	new = kvm_init_table_pte(childp, mm_ops);
 	mm_ops->get_page(ptep);
 	smp_store_release(ptep, new);
+	*old = new;
 
 	return 0;
 }
-- 
2.37.2.672.g94769d06f0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 07/14] KVM: arm64: Document behavior of pgtable visitor callback
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
                   ` (5 preceding siblings ...)
  2022-08-30 19:41 ` [PATCH 06/14] KVM: arm64: Return next table from map callbacks Oliver Upton
@ 2022-08-30 19:41 ` Oliver Upton
  2022-08-30 19:41 ` [PATCH 08/14] KVM: arm64: Protect page table traversal with RCU Oliver Upton
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:41 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel

The argument list to kvm_pgtable_visitor_fn_t has gotten rather long.
Additionally, @old serves as both an input and output parameter, which
isn't easily discerned from the declaration alone.

Document the meaning of the visitor callback arguments and the
conditions under which @old is written to.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 47920ae3f7e7..78fbb7be1af6 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -194,6 +194,22 @@ enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
 };
 
+/**
+ * kvm_pgtable_visitor_fn_t - Page table traversal callback for visiting a PTE.
+ * @addr:	Input address (IA) mapped by the PTE.
+ * @end:	IA corresponding to the end of the page table traversal range.
+ * @ptep:	Pointer to the PTE.
+ * @old:	Value of the PTE observed by the visitor. Also used as an output
+ *		parameter for returning the new PTE value.
+ * @flag:	Flag identifying the entry type visited.
+ * @arg:	Argument passed to the callback function.
+ *
+ * Callback function signature invoked during page table traversal. Optionally
+ * returns the new value of the PTE via @old if the new value requires further
+ * traversal (i.e. installing a new table).
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
 typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
 					kvm_pte_t *ptep, kvm_pte_t *old,
 					enum kvm_pgtable_walk_flags flag,
-- 
2.37.2.672.g94769d06f0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 08/14] KVM: arm64: Protect page table traversal with RCU
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
                   ` (6 preceding siblings ...)
  2022-08-30 19:41 ` [PATCH 07/14] KVM: arm64: Document behavior of pgtable visitor callback Oliver Upton
@ 2022-08-30 19:41 ` Oliver Upton
  2022-09-07 21:47   ` David Matlack
  2022-08-30 19:41 ` [PATCH 09/14] KVM: arm64: Free removed stage-2 tables in RCU callback Oliver Upton
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:41 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel

The use of RCU is necessary to change the paging structures in parallel.
Acquire and release an RCU read lock when traversing the page tables.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h | 19 ++++++++++++++++++-
 arch/arm64/kvm/hyp/pgtable.c         |  7 ++++++-
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 78fbb7be1af6..7d2de0a98ccb 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -578,9 +578,26 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
  */
 enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
 
+#if defined(__KVM_NVHE_HYPERVISOR__)
+
+static inline void kvm_pgtable_walk_begin(void) {}
+static inline void kvm_pgtable_walk_end(void) {}
+
+#define kvm_dereference_ptep rcu_dereference_raw
+
+#else	/* !defined(__KVM_NVHE_HYPERVISOR__) */
+
+#define kvm_pgtable_walk_begin	rcu_read_lock
+#define kvm_pgtable_walk_end	rcu_read_unlock
+#define kvm_dereference_ptep	rcu_dereference
+
+#endif	/* defined(__KVM_NVHE_HYPERVISOR__) */
+
 static inline kvm_pte_t kvm_pte_read(kvm_pte_t *ptep)
 {
-	return READ_ONCE(*ptep);
+	kvm_pte_t __rcu *p = (kvm_pte_t __rcu *)ptep;
+
+	return READ_ONCE(*kvm_dereference_ptep(p));
 }
 
 #endif	/* __ARM64_KVM_PGTABLE_H__ */
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index f911509e6512..215a14c434ed 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -284,8 +284,13 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.end	= PAGE_ALIGN(walk_data.addr + size),
 		.walker	= walker,
 	};
+	int r;
 
-	return _kvm_pgtable_walk(&walk_data);
+	kvm_pgtable_walk_begin();
+	r = _kvm_pgtable_walk(&walk_data);
+	kvm_pgtable_walk_end();
+
+	return r;
 }
 
 struct leaf_walk_data {
-- 
2.37.2.672.g94769d06f0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 09/14] KVM: arm64: Free removed stage-2 tables in RCU callback
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
                   ` (7 preceding siblings ...)
  2022-08-30 19:41 ` [PATCH 08/14] KVM: arm64: Protect page table traversal with RCU Oliver Upton
@ 2022-08-30 19:41 ` Oliver Upton
  2022-09-07 22:00   ` David Matlack
  2022-09-14  0:49   ` Ricardo Koller
  2022-08-30 19:50 ` [PATCH 10/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks Oliver Upton
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:41 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel

There is no real urgency to free a stage-2 subtree that was pruned.
Nonetheless, KVM does the teardown in the stage-2 fault path while
holding the MMU lock.

Free removed stage-2 subtrees after an RCU grace period. To guarantee
all stage-2 table pages are freed before killing a VM, add an
rcu_barrier() to the flush path.
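
For reference, a minimal sketch of why stuffing the metadata into
page::private is safe (mirroring the helpers in the diff below): a struct
kvm_pgtable pointer is at least 8-byte aligned, so its low three bits are
always clear and can carry the table level, which is bounded by
KVM_PGTABLE_MAX_LEVELS:

  /* Pack on the teardown side ... */
  unsigned long pvt = (unsigned long)pgt | level;

  /* ... and unpack again in the RCU callback. */
  u32 unpacked_level = pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK;
  void *unpacked_arg = (void *)(pvt & ~STAGE2_PAGE_PRIVATE_LEVEL_MASK);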

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/mmu.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 91521f4aab97..265951c05879 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -97,6 +97,38 @@ static void *stage2_memcache_zalloc_page(void *arg)
 	return kvm_mmu_memory_cache_alloc(mc);
 }
 
+#define STAGE2_PAGE_PRIVATE_LEVEL_MASK	GENMASK_ULL(2, 0)
+
+static inline unsigned long stage2_page_private(u32 level, void *arg)
+{
+	unsigned long pvt = (unsigned long)arg;
+
+	BUILD_BUG_ON(KVM_PGTABLE_MAX_LEVELS > STAGE2_PAGE_PRIVATE_LEVEL_MASK);
+	WARN_ON_ONCE(pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK);
+
+	return pvt | level;
+}
+
+static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
+{
+	struct page *page = container_of(head, struct page, rcu_head);
+	unsigned long pvt = page_private(page);
+	void *arg = (void *)(pvt & ~STAGE2_PAGE_PRIVATE_LEVEL_MASK);
+	u32 level = (u32)(pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK);
+	void *pgtable = page_to_virt(page);
+
+	kvm_pgtable_stage2_free_removed(pgtable, level, arg);
+}
+
+static void stage2_free_removed_table(void *pgtable, u32 level, void *arg)
+{
+	unsigned long pvt = stage2_page_private(level, arg);
+	struct page *page = virt_to_page(pgtable);
+
+	set_page_private(page, (unsigned long)pvt);
+	call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
+}
+
 static void *kvm_host_zalloc_pages_exact(size_t size)
 {
 	return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -627,7 +659,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
 	.zalloc_page		= stage2_memcache_zalloc_page,
 	.zalloc_pages_exact	= kvm_host_zalloc_pages_exact,
 	.free_pages_exact	= free_pages_exact,
-	.free_removed_table	= kvm_pgtable_stage2_free_removed,
+	.free_removed_table	= stage2_free_removed_table,
 	.get_page		= kvm_host_get_page,
 	.put_page		= kvm_host_put_page,
 	.page_count		= kvm_host_page_count,
@@ -770,6 +802,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	if (pgt) {
 		kvm_pgtable_stage2_destroy(pgt);
 		kfree(pgt);
+		rcu_barrier();
 	}
 }
 
-- 
2.37.2.672.g94769d06f0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
                   ` (8 preceding siblings ...)
  2022-08-30 19:41 ` [PATCH 09/14] KVM: arm64: Free removed stage-2 tables in RCU callback Oliver Upton
@ 2022-08-30 19:50 ` Oliver Upton
  2022-08-30 19:51 ` [PATCH 11/14] KVM: arm64: Make block->table PTE changes parallel-aware Oliver Upton
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:50 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, linux-kernel

The stage2 attr walker is already used for parallel walks. Since commit
f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission relaxation
during dirty logging"), KVM acquires the read lock when
write-unprotecting a PTE. However, the walker only uses a simple store
to update the PTE. This is safe as the only possible race is with
hardware updates to the access flag, which is benign.

However, a subsequent change to KVM will allow more changes to the stage
2 page tables to be done in parallel. Prepare the stage 2 attribute
walker by performing atomic updates to the PTE when walking in parallel.
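
To spell out the race the cmpxchg() closes (a hypothetical interleaving,
not taken from the patch itself): two walkers can snapshot the same old
value and both attempt an update, and with a plain store the second write
silently discards the first:

  /*
   *   walker A: old = READ_ONCE(*ptep);    walker B: old = READ_ONCE(*ptep);
   *   walker A: WRITE_ONCE(*ptep, a);
   *                                        walker B: WRITE_ONCE(*ptep, b);
   *
   * B's store clobbers A's update. With stage2_try_set_pte(), B's cmpxchg()
   * fails instead and the walker returns -EAGAIN so the caller can retry
   * against the current value of the PTE.
   */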

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 215a14c434ed..61a4437c8c16 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -691,6 +691,16 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
 	return kvm_pte_valid(pte) || kvm_invalid_pte_owner(pte);
 }
 
+static bool stage2_try_set_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new, bool shared)
+{
+	if (!shared) {
+		WRITE_ONCE(*ptep, new);
+		return true;
+	}
+
+	return cmpxchg(ptep, old, new) == old;
+}
+
 static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
 			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
 {
@@ -985,6 +995,7 @@ struct stage2_attr_data {
 	kvm_pte_t			pte;
 	u32				level;
 	struct kvm_pgtable_mm_ops	*mm_ops;
+	bool				shared;
 };
 
 static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
@@ -1017,7 +1028,9 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		    stage2_pte_executable(pte) && !stage2_pte_executable(data->pte))
 			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
 						  kvm_granule_size(level));
-		WRITE_ONCE(*ptep, pte);
+
+		if (!stage2_try_set_pte(ptep, data->pte, pte, data->shared))
+			return -EAGAIN;
 	}
 
 	return 0;
@@ -1026,7 +1039,7 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 				    u64 size, kvm_pte_t attr_set,
 				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte,
-				    u32 *level)
+				    u32 *level, bool shared)
 {
 	int ret;
 	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
@@ -1034,6 +1047,7 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 		.attr_set	= attr_set & attr_mask,
 		.attr_clr	= attr_clr & attr_mask,
 		.mm_ops		= pgt->mm_ops,
+		.shared		= shared,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_attr_walker,
@@ -1057,14 +1071,14 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	return stage2_update_leaf_attrs(pgt, addr, size, 0,
 					KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
-					NULL, NULL);
+					NULL, NULL, false);
 }
 
 kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
 	stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
-				 &pte, NULL);
+				 &pte, NULL, false);
 	dsb(ishst);
 	return pte;
 }
@@ -1073,7 +1087,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
 	stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
-				 &pte, NULL);
+				 &pte, NULL, false);
 	/*
 	 * "But where's the TLBI?!", you scream.
 	 * "Over in the core code", I sigh.
@@ -1086,7 +1100,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
 bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
-	stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL);
+	stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, false);
 	return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
 }
 
@@ -1109,7 +1123,7 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 	if (prot & KVM_PGTABLE_PROT_X)
 		clr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
 
-	ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level);
+	ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level, true);
 	if (!ret)
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, pgt->mmu, addr, level);
 	return ret;
-- 
2.37.2.672.g94769d06f0-goog



* [PATCH 11/14] KVM: arm64: Make block->table PTE changes parallel-aware
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
                   ` (9 preceding siblings ...)
  2022-08-30 19:50 ` [PATCH 10/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks Oliver Upton
@ 2022-08-30 19:51 ` Oliver Upton
  2022-09-14  0:51   ` Ricardo Koller
  2022-08-30 19:51 ` [PATCH 12/14] KVM: arm64: Make leaf->leaf PTE changes parallel-aware Oliver Upton
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:51 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, linux-kernel

In order to service stage-2 faults in parallel, stage-2 table walkers
must take exclusive ownership of the PTE being worked on. An additional
requirement of the architecture is that software must perform a
'break-before-make' operation when changing the block size used for
mapping memory.

Roll these two concepts together into helpers for performing a
'break-before-make' sequence. Use a special PTE value to indicate a PTE
has been locked by a software walker. Additionally, use an atomic
compare-exchange to 'break' the PTE when the stage-2 page tables are
possibly shared with another software walker. Elide the DSB + TLBI if
the evicted PTE was invalid (and thus not subject to break-before-make).

All of the atomics do nothing for now, as the stage-2 walker isn't fully
ready to perform parallel walks.
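
For illustration, a map walker is expected to use the new helpers
roughly like so (the actual conversions follow in the next patches):

	if (!stage2_try_break_pte(ptep, old, addr, level, data))
		return -EAGAIN;

	/* ... CMOs and construction of the new PTE ... */

	stage2_make_pte(ptep, old, new, data);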

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 87 +++++++++++++++++++++++++++++++++---
 1 file changed, 82 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 61a4437c8c16..71ae96608752 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -49,6 +49,12 @@
 #define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
 #define KVM_MAX_OWNER_ID		1
 
+/*
+ * Used to indicate a pte for which a 'break-before-make' sequence is in
+ * progress.
+ */
+#define KVM_INVALID_PTE_LOCKED		BIT(10)
+
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable		*pgt;
 	struct kvm_pgtable_walker	*walker;
@@ -586,6 +592,8 @@ struct stage2_map_data {
 
 	/* Force mappings to page granularity */
 	bool				force_pte;
+
+	bool				shared;
 };
 
 u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
@@ -691,6 +699,11 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
 	return kvm_pte_valid(pte) || kvm_invalid_pte_owner(pte);
 }
 
+static bool stage2_pte_is_locked(kvm_pte_t pte)
+{
+	return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
+}
+
 static bool stage2_try_set_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new, bool shared)
 {
 	if (!shared) {
@@ -701,6 +714,69 @@ static bool stage2_try_set_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new, bo
 	return cmpxchg(ptep, old, new) == old;
 }
 
+/**
+ * stage2_try_break_pte() - Invalidates a pte according to the
+ *			    'break-before-make' requirements of the
+ *			    architecture.
+ *
+ * @ptep: Pointer to the pte to break
+ * @old: The previously observed value of the pte
+ * @addr: IPA corresponding to the pte
+ * @level: Table level of the pte
+ * @data: Map walker data; @data->shared is true if the stage-2 page tables
+ *	    could be shared by multiple software walkers
+ *
+ * Returns: true if the pte was successfully broken.
+ *
+ * If the removed pte was valid, performs the necessary serialization and TLB
+ * invalidation for the old value. For counted ptes, drops the reference count
+ * on the containing table page.
+ */
+static bool stage2_try_break_pte(kvm_pte_t *ptep, kvm_pte_t old, u64 addr, u32 level,
+				 struct stage2_map_data *data)
+{
+	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+
+	if (stage2_pte_is_locked(old)) {
+		/*
+		 * Should never occur if this walker has exclusive access to the
+		 * page tables.
+		 */
+		WARN_ON(!data->shared);
+		return false;
+	}
+
+	if (!stage2_try_set_pte(ptep, old, KVM_INVALID_PTE_LOCKED, data->shared))
+		return false;
+
+	/*
+	 * Perform the appropriate TLB invalidation based on the evicted pte
+	 * value (if any).
+	 */
+	if (kvm_pte_table(old, level))
+		kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
+	else if (kvm_pte_valid(old))
+		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+
+	if (stage2_pte_is_counted(old))
+		mm_ops->put_page(ptep);
+
+	return true;
+}
+
+static void stage2_make_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new,
+			    struct stage2_map_data *data)
+{
+	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+
+	WARN_ON(!stage2_pte_is_locked(*ptep));
+
+	if (stage2_pte_is_counted(new))
+		mm_ops->get_page(ptep);
+
+	smp_store_release(ptep, new);
+}
+
 static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
 			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
 {
@@ -836,17 +912,18 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (!childp)
 		return -ENOMEM;
 
+	if (!stage2_try_break_pte(ptep, *old, addr, level, data)) {
+		mm_ops->put_page(childp);
+		return -EAGAIN;
+	}
+
 	/*
 	 * If we've run into an existing block mapping then replace it with
 	 * a table. Accesses beyond 'end' that fall within the new table
 	 * will be mapped lazily.
 	 */
-	if (stage2_pte_is_counted(pte))
-		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
-
 	new = kvm_init_table_pte(childp, mm_ops);
-	mm_ops->get_page(ptep);
-	smp_store_release(ptep, new);
+	stage2_make_pte(ptep, *old, new, data);
 	*old = new;
 
 	return 0;
-- 
2.37.2.672.g94769d06f0-goog



* [PATCH 12/14] KVM: arm64: Make leaf->leaf PTE changes parallel-aware
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
                   ` (10 preceding siblings ...)
  2022-08-30 19:51 ` [PATCH 11/14] KVM: arm64: Make block->table PTE changes parallel-aware Oliver Upton
@ 2022-08-30 19:51 ` Oliver Upton
  2022-08-30 19:51 ` [PATCH 13/14] KVM: arm64: Make table->block " Oliver Upton
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:51 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, linux-kernel

Convert stage2_map_walker_try_leaf() to use the new break-before-make
helpers, thereby making the handler parallel-aware. As before, avoid the
break-before-make if recreating the existing mapping. Additionally,
retry execution if another vCPU thread is modifying the same PTE.
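
Note that the retry leans on the existing behaviour of user_mem_abort(),
which already treats -EAGAIN as "return to the guest and take the fault
again", i.e. roughly:

	/* in user_mem_abort() */
	return ret != -EAGAIN ? ret : 0;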

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 71ae96608752..de1d352657d0 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -829,18 +829,17 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	else
 		new = kvm_init_invalid_leaf_owner(data->owner_id);
 
-	if (stage2_pte_is_counted(old)) {
-		/*
-		 * Skip updating the PTE if we are trying to recreate the exact
-		 * same mapping or only change the access permissions. Instead,
-		 * the vCPU will exit one more time from guest if still needed
-		 * and then go through the path of relaxing permissions.
-		 */
-		if (!stage2_pte_needs_update(old, new))
-			return -EAGAIN;
+	/*
+	 * Skip updating the PTE if we are trying to recreate the exact
+	 * same mapping or only change the access permissions. Instead,
+	 * the vCPU will exit one more time from guest if still needed
+	 * and then go through the path of relaxing permissions.
+	 */
+	if (!stage2_pte_needs_update(old, new))
+		return -EAGAIN;
 
-		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
-	}
+	if (!stage2_try_break_pte(ptep, old, addr, level, data))
+		return -EAGAIN;
 
 	/* Perform CMOs before installation of the guest stage-2 PTE */
 	if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
@@ -850,9 +849,8 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
 		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
 
-	smp_store_release(ptep, new);
-	if (stage2_pte_is_counted(new))
-		mm_ops->get_page(ptep);
+	stage2_make_pte(ptep, old, new, data);
+
 	if (kvm_phys_is_valid(phys))
 		data->phys += granule;
 	return 0;
-- 
2.37.2.672.g94769d06f0-goog



* [PATCH 13/14] KVM: arm64: Make table->block changes parallel-aware
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
                   ` (11 preceding siblings ...)
  2022-08-30 19:51 ` [PATCH 12/14] KVM: arm64: Make leaf->leaf PTE changes parallel-aware Oliver Upton
@ 2022-08-30 19:51 ` Oliver Upton
  2022-08-30 19:52 ` [PATCH 14/14] KVM: arm64: Handle stage-2 faults in parallel Oliver Upton
  2022-09-06 10:00 ` [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Marc Zyngier
  14 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:51 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, linux-kernel

stage2_map_walk_leaf() and friends now handle stage-2 PTEs generically,
and perform the correct flush when a table PTE is removed. Additionally,
they've been made parallel-aware, using an atomic break to take
ownership of the PTE.

Stop clearing the PTE in the pre-order callback and instead let
stage2_map_walk_leaf() deal with it.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index de1d352657d0..92e230e7bf3a 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -871,21 +871,12 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
 		return 0;
 
-	kvm_clear_pte(ptep);
-
-	/*
-	 * Invalidate the whole stage-2, as we may have numerous leaf
-	 * entries below us which would otherwise need invalidating
-	 * individually.
-	 */
-	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
-
 	ret = stage2_map_walk_leaf(addr, end, level, ptep, old, data);
+	if (ret)
+		return ret;
 
-	mm_ops->put_page(ptep);
 	mm_ops->free_removed_table(childp, level + 1, pgt);
-
-	return ret;
+	return 0;
 }
 
 static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-- 
2.37.2.672.g94769d06f0-goog



* [PATCH 14/14] KVM: arm64: Handle stage-2 faults in parallel
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
                   ` (12 preceding siblings ...)
  2022-08-30 19:51 ` [PATCH 13/14] KVM: arm64: Make table->block " Oliver Upton
@ 2022-08-30 19:52 ` Oliver Upton
  2022-09-06 10:00 ` [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Marc Zyngier
  14 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-08-30 19:52 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Oliver Upton, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, kvm, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, David Matlack, Ben Gardon, Paolo Bonzini,
	Gavin Shan, Peter Xu, Sean Christopherson, linux-kernel

The stage-2 map walker has been made parallel-aware, and as such can be
called while only holding the read side of the MMU lock. Rip out the
conditional locking in user_mem_abort() and instead grab the read lock.
Continue to take the write lock from other callsites to
kvm_pgtable_stage2_map().

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h  |  4 +++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  2 +-
 arch/arm64/kvm/hyp/pgtable.c          |  3 ++-
 arch/arm64/kvm/mmu.c                  | 31 ++++++---------------------
 4 files changed, 13 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 7d2de0a98ccb..dc839db86a1a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -355,6 +355,8 @@ void kvm_pgtable_stage2_free_removed(void *pgtable, u32 level, void *arg);
  * @prot:	Permissions and attributes for the mapping.
  * @mc:		Cache of pre-allocated and zeroed memory from which to allocate
  *		page-table pages.
+ * @shared:	true if multiple software walkers could be traversing the tables
+ *		in parallel
  *
  * The offset of @addr within a page is ignored, @size is rounded-up to
  * the next page boundary and @phys is rounded-down to the previous page
@@ -376,7 +378,7 @@ void kvm_pgtable_stage2_free_removed(void *pgtable, u32 level, void *arg);
  */
 int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   u64 phys, enum kvm_pgtable_prot prot,
-			   void *mc);
+			   void *mc, bool shared);
 
 /**
  * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 61cf223e0796..924d028af447 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -252,7 +252,7 @@ static inline int __host_stage2_idmap(u64 start, u64 end,
 				      enum kvm_pgtable_prot prot)
 {
 	return kvm_pgtable_stage2_map(&host_kvm.pgt, start, end - start, start,
-				      prot, &host_s2_pool);
+				      prot, &host_s2_pool, false);
 }
 
 /*
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 92e230e7bf3a..52ecaaa84b22 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -944,7 +944,7 @@ static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_
 
 int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   u64 phys, enum kvm_pgtable_prot prot,
-			   void *mc)
+			   void *mc, bool shared)
 {
 	int ret;
 	struct stage2_map_data map_data = {
@@ -953,6 +953,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.memcache	= mc,
 		.mm_ops		= pgt->mm_ops,
 		.force_pte	= pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
+		.shared		= shared,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_map_walker,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 265951c05879..a73adc35cf41 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -840,7 +840,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 
 		write_lock(&kvm->mmu_lock);
 		ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot,
-					     &cache);
+					     &cache, false);
 		write_unlock(&kvm->mmu_lock);
 		if (ret)
 			break;
@@ -1135,7 +1135,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
-	bool use_read_lock = false;
 	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	unsigned long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
@@ -1170,8 +1169,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (logging_active) {
 		force_pte = true;
 		vma_shift = PAGE_SHIFT;
-		use_read_lock = (fault_status == FSC_PERM && write_fault &&
-				 fault_granule == PAGE_SIZE);
 	} else {
 		vma_shift = get_vma_page_shift(vma, hva);
 	}
@@ -1270,15 +1267,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (exec_fault && device)
 		return -ENOEXEC;
 
-	/*
-	 * To reduce MMU contentions and enhance concurrency during dirty
-	 * logging dirty logging, only acquire read lock for permission
-	 * relaxation.
-	 */
-	if (use_read_lock)
-		read_lock(&kvm->mmu_lock);
-	else
-		write_lock(&kvm->mmu_lock);
+	read_lock(&kvm->mmu_lock);
 	pgt = vcpu->arch.hw_mmu->pgt;
 	if (mmu_invalidate_retry(kvm, mmu_seq))
 		goto out_unlock;
@@ -1322,15 +1311,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault_status == FSC_PERM && vma_pagesize == fault_granule) {
+	if (fault_status == FSC_PERM && vma_pagesize == fault_granule)
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
-	} else {
-		WARN_ONCE(use_read_lock, "Attempted stage-2 map outside of write lock\n");
-
+	else
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
 					     __pfn_to_phys(pfn), prot,
-					     memcache);
-	}
+					     memcache, true);
 
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
@@ -1339,10 +1325,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 out_unlock:
-	if (use_read_lock)
-		read_unlock(&kvm->mmu_lock);
-	else
-		write_unlock(&kvm->mmu_lock);
+	read_unlock(&kvm->mmu_lock);
 	kvm_set_pfn_accessed(pfn);
 	kvm_release_pfn_clean(pfn);
 	return ret != -EAGAIN ? ret : 0;
@@ -1548,7 +1531,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	 */
 	kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
 			       PAGE_SIZE, __pfn_to_phys(pfn),
-			       KVM_PGTABLE_PROT_R, NULL);
+			       KVM_PGTABLE_PROT_R, NULL, false);
 
 	return false;
 }
-- 
2.37.2.672.g94769d06f0-goog



* Re: [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling
  2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
                   ` (13 preceding siblings ...)
  2022-08-30 19:52 ` [PATCH 14/14] KVM: arm64: Handle stage-2 faults in parallel Oliver Upton
@ 2022-09-06 10:00 ` Marc Zyngier
  2022-09-09 10:01   ` Oliver Upton
  14 siblings, 1 reply; 32+ messages in thread
From: Marc Zyngier @ 2022-09-06 10:00 UTC (permalink / raw)
  To: Oliver Upton
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Will Deacon, Quentin Perret, Ricardo Koller, Reiji Watanabe,
	David Matlack, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm

On Tue, 30 Aug 2022 20:41:18 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Presently KVM only takes a read lock for stage 2 faults if it believes
> the fault can be fixed by relaxing permissions on a PTE (write unprotect
> for dirty logging). Otherwise, stage 2 faults grab the write lock, which
> predictably can pile up all the vCPUs in a sufficiently large VM.
> 
> Like the TDP MMU for x86, this series loosens the locking around
> manipulations of the stage 2 page tables to allow parallel faults. RCU
> and atomics are exploited to safely build/destroy the stage 2 page
> tables in light of multiple software observers.
> 
> Patches 1-2 are a cleanup to the way we collapse page tables, with the
> added benefit of narrowing the window of time a range of memory is
> unmapped.
> 
> Patches 3-7 are minor cleanups and refactorings to the way KVM reads
> PTEs and traverses the stage 2 page tables to make it amenable to
> concurrent modification.
> 
> Patches 8-9 use RCU to punt page table cleanup out of the vCPU fault
> path, which should also improve fault latency a bit.
> 
> Patches 10-14 implement the meat of this series, extending the
> 'break-before-make' sequence with atomics to realize locking on PTEs.
> Effectively a cmpxchg() is used to 'break' a PTE, thereby serializing
> changes to a given PTE.
> 
> Finally, patch 15 flips the switch on all the new code and starts
> grabbing the read side of the MMU lock for stage 2 faults.
> 
> Applies to 6.0-rc3. Tested with KVM selftests and benchmarked with
> dirty_log_perf_test, scaling from 1 to 48 vCPUs with 4GB of memory per
> vCPU backed by THP.
> 
>   ./dirty_log_perf_test -s anonymous_thp -m 2 -b 4G -v ${NR_VCPUS}
> 
> Time to dirty memory:
> 
>         +-------+---------+------------------+
>         | vCPUs | 6.0-rc3 | 6.0-rc3 + series |
>         +-------+---------+------------------+
>         |     1 | 0.89s   | 0.92s            |
>         |     2 | 1.13s   | 1.18s            |
>         |     4 | 2.42s   | 1.25s            |
>         |     8 | 5.03s   | 1.36s            |
>         |    16 | 8.84s   | 2.09s            |
>         |    32 | 19.60s  | 4.47s            |
>         |    48 | 31.39s  | 6.22s            |
>         +-------+---------+------------------+
> 
> It is also worth mentioning that the time to populate memory has
> improved:
> 
>         +-------+---------+------------------+
>         | vCPUs | 6.0-rc3 | 6.0-rc3 + series |
>         +-------+---------+------------------+
>         |     1 | 0.19s   | 0.18s            |
>         |     2 | 0.25s   | 0.21s            |
>         |     4 | 0.38s   | 0.32s            |
>         |     8 | 0.64s   | 0.40s            |
>         |    16 | 1.22s   | 0.54s            |
>         |    32 | 2.50s   | 1.03s            |
>         |    48 | 3.88s   | 1.52s            |
>         +-------+---------+------------------+
> 
> RFC: https://lore.kernel.org/kvmarm/20220415215901.1737897-1-oupton@google.com/
> 
> RFC -> v1:
>  - Factored out page table teardown from kvm_pgtable_stage2_map()
>  - Use the RCU callback to tear down a subtree, instead of scheduling a
>    callback for every individual table page.
>  - Reorganized series to (hopefully) avoid intermediate breakage.
>  - Dropped the use of page headers, instead stuffing KVM metadata into
>    page::private directly
> 
> Oliver Upton (14):
>   KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
>   KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
>   KVM: arm64: Directly read owner id field in stage2_pte_is_counted()
>   KVM: arm64: Read the PTE once per visit
>   KVM: arm64: Split init and set for table PTE
>   KVM: arm64: Return next table from map callbacks
>   KVM: arm64: Document behavior of pgtable visitor callback
>   KVM: arm64: Protect page table traversal with RCU
>   KVM: arm64: Free removed stage-2 tables in RCU callback
>   KVM: arm64: Atomically update stage 2 leaf attributes in parallel
>     walks
>   KVM: arm64: Make block->table PTE changes parallel-aware
>   KVM: arm64: Make leaf->leaf PTE changes parallel-aware
>   KVM: arm64: Make table->block changes parallel-aware
>   KVM: arm64: Handle stage-2 faults in parallel
> 
>  arch/arm64/include/asm/kvm_pgtable.h  |  59 ++++-
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |   7 +-
>  arch/arm64/kvm/hyp/nvhe/setup.c       |   4 +-
>  arch/arm64/kvm/hyp/pgtable.c          | 360 ++++++++++++++++----------
>  arch/arm64/kvm/mmu.c                  |  65 +++--
>  5 files changed, 325 insertions(+), 170 deletions(-)

This fails to build on -rc4:

  MODPOST vmlinux.symvers
  MODINFO modules.builtin.modinfo
  GEN     modules.builtin
  CC      .vmlinux.export.o
  LD      .tmp_vmlinux.kallsyms1
ld: Unexpected GOT/PLT entries detected!
ld: Unexpected run-time procedure linkages detected!
ld: ID map text too big or misaligned
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_walk':
(.hyp.text+0xdc0c): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xdc1c): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_get_leaf':
(.hyp.text+0xdc80): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xdc90): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_map':
(.hyp.text+0xddb0): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xddc0): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_unmap':
(.hyp.text+0xde44): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xde50): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_destroy':
(.hyp.text+0xdf40): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xdf50): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_map':
(.hyp.text+0xe16c): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe17c): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_set_owner':
(.hyp.text+0xe264): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe274): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_unmap':
(.hyp.text+0xe2d4): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe2e4): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_flush':
(.hyp.text+0xe5b4): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe5c4): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_destroy':
(.hyp.text+0xe6f0): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe700): undefined reference to `__kvm_nvhe___rcu_read_unlock'
make[3]: *** [Makefile:1169: vmlinux] Error 1
make[2]: *** [debian/rules:7: build-arch] Error 2

as this drags the RCU read-lock into EL2, and that's not going to
work... The following fixes it, but I wonder how you tested it.

Thanks,

	M.

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index dc839db86a1a..adf170122daf 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -580,7 +580,7 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
  */
 enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
 
-#if defined(__KVM_NVHE_HYPERVISOR___)
+#if defined(__KVM_NVHE_HYPERVISOR__)
 
 static inline void kvm_pgtable_walk_begin(void) {}
 static inline void kvm_pgtable_walk_end(void) {}

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
  2022-08-30 19:41 ` [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make Oliver Upton
@ 2022-09-06 14:35   ` Quentin Perret
  2022-09-09 10:04     ` Oliver Upton
  2022-09-07 20:57   ` David Matlack
  2022-09-14  0:20   ` Ricardo Koller
  2 siblings, 1 reply; 32+ messages in thread
From: Quentin Perret @ 2022-09-06 14:35 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Ricardo Koller, Reiji Watanabe,
	David Matlack, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

Hi Oliver,

On Tuesday 30 Aug 2022 at 19:41:20 (+0000), Oliver Upton wrote:
>  static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>  				     kvm_pte_t *ptep,
>  				     struct stage2_map_data *data)
>  {
> -	if (data->anchor)
> -		return 0;
> +	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	kvm_pte_t *childp = kvm_pte_follow(*ptep, mm_ops);
> +	struct kvm_pgtable *pgt = data->mmu->pgt;
> +	int ret;
>  
>  	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
>  		return 0;
>  
> -	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
>  	kvm_clear_pte(ptep);
>  
>  	/*
> @@ -782,8 +786,13 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>  	 * individually.
>  	 */
>  	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> -	data->anchor = ptep;
> -	return 0;
> +
> +	ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> +
> +	mm_ops->put_page(ptep);
> +	mm_ops->free_removed_table(childp, level + 1, pgt);

By the look of it, __kvm_pgtable_visit() has saved the table PTE on the
stack prior to calling the TABLE_PRE callback, and it then uses the PTE
from its stack and does kvm_pte_follow() to find the childp, and walks
from there. Would that be a UAF now?
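
Roughly this sequence is what I have in mind (paraphrasing the walker
from memory, so the exact signatures may be off):

	kvm_pte_t pte = *ptep;		/* table PTE saved on the stack */

	/* the TABLE_PRE callback now frees the table that 'pte' points to... */
	ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
				     KVM_PGTABLE_WALK_TABLE_PRE);

	/* ...yet the walker still follows the stale copy into the subtree */
	childp = kvm_pte_follow(pte, data->pgt->mm_ops);
	ret = __kvm_pgtable_walk(data, childp, level + 1);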

> +	return ret;
>  }


* Re: [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
  2022-08-30 19:41 ` [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make Oliver Upton
  2022-09-06 14:35   ` Quentin Perret
@ 2022-09-07 20:57   ` David Matlack
  2022-09-09 10:07     ` Oliver Upton
  2022-09-14  0:20   ` Ricardo Koller
  2 siblings, 1 reply; 32+ messages in thread
From: David Matlack @ 2022-09-07 20:57 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Tue, Aug 30, 2022 at 07:41:20PM +0000, Oliver Upton wrote:
[...]
>  
> +static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +				struct stage2_map_data *data);
> +
>  static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>  				     kvm_pte_t *ptep,
>  				     struct stage2_map_data *data)
>  {
> -	if (data->anchor)

Should @anchor and @childp be removed from struct stage2_map_data? This
commit removes the only remaining references to them.

> -		return 0;
> +	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	kvm_pte_t *childp = kvm_pte_follow(*ptep, mm_ops);
> +	struct kvm_pgtable *pgt = data->mmu->pgt;
> +	int ret;
>  
>  	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
>  		return 0;
>  
> -	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
>  	kvm_clear_pte(ptep);
>  
>  	/*
[...]
>  static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  			     enum kvm_pgtable_walk_flags flag, void * const arg)
> @@ -883,11 +849,9 @@ static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
>  	case KVM_PGTABLE_WALK_LEAF:
>  		return stage2_map_walk_leaf(addr, end, level, ptep, data);
> -	case KVM_PGTABLE_WALK_TABLE_POST:
> -		return stage2_map_walk_table_post(addr, end, level, ptep, data);

kvm_pgtable_stage2_set_owner() still uses stage2_map_walker() with
KVM_PGTABLE_WALK_TABLE_POST.


* Re: [PATCH 06/14] KVM: arm64: Return next table from map callbacks
  2022-08-30 19:41 ` [PATCH 06/14] KVM: arm64: Return next table from map callbacks Oliver Upton
@ 2022-09-07 21:32   ` David Matlack
  2022-09-09  9:38     ` Oliver Upton
  0 siblings, 1 reply; 32+ messages in thread
From: David Matlack @ 2022-09-07 21:32 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Tue, Aug 30, 2022 at 07:41:24PM +0000, Oliver Upton wrote:
> The map walkers install new page tables during their traversal. Return
> the newly-installed table PTE from the map callbacks to point the walker
> at the new table w/o rereading the ptep.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 331f6e3b2c20..f911509e6512 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -202,13 +202,12 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>  	if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
>  		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, &pte,
>  					     KVM_PGTABLE_WALK_LEAF);
> -		pte = *ptep;
> -		table = kvm_pte_table(pte, level);
>  	}
>  
>  	if (ret)
>  		goto out;

Rather than passing a pointer to the local variable pte and requiring
all downstream code to update it (and deal with dereferencing to read
the old pte), wouldn't it be simpler to just re-read the PTE here? e.g.

        /*
         * Explicitly re-read the PTE since it may have been modified
         * during the TABLE_PRE or LEAF callback.
         */
        pte = kvm_pte_read(ptep);

This should also result in better behavior once parallelization is
introduced, because it will prevent the walker from traversing down and
doing a bunch of work on page tables that are in the process of being
freed by some other thread.

>  
> +	table = kvm_pte_table(pte, level);
>  	if (!table) {

nit: Technically there's no reason to set @table again. e.g. This could
just be:

        if (!kvm_pte_table(pte, level)) {

>  		data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
>  		data->addr += kvm_granule_size(level);
> @@ -427,6 +426,7 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, kvm_pte
>  	new = kvm_init_table_pte(childp, mm_ops);
>  	mm_ops->get_page(ptep);
>  	smp_store_release(ptep, new);
> +	*old = new;
>  
>  	return 0;
>  }
> @@ -768,7 +768,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>  }
>  
>  static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -				struct stage2_map_data *data);
> +				kvm_pte_t *old, struct stage2_map_data *data);
>  
>  static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>  				     kvm_pte_t *ptep, kvm_pte_t *old,
> @@ -791,7 +791,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>  	 */
>  	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
>  
> -	ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> +	ret = stage2_map_walk_leaf(addr, end, level, ptep, old, data);
>  
>  	mm_ops->put_page(ptep);
>  	mm_ops->free_removed_table(childp, level + 1, pgt);
> @@ -832,6 +832,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  	new = kvm_init_table_pte(childp, mm_ops);
>  	mm_ops->get_page(ptep);
>  	smp_store_release(ptep, new);
> +	*old = new;
>  
>  	return 0;
>  }
> -- 
> 2.37.2.672.g94769d06f0-goog
> 


* Re: [PATCH 08/14] KVM: arm64: Protect page table traversal with RCU
  2022-08-30 19:41 ` [PATCH 08/14] KVM: arm64: Protect page table traversal with RCU Oliver Upton
@ 2022-09-07 21:47   ` David Matlack
  2022-09-09  9:55     ` Oliver Upton
  0 siblings, 1 reply; 32+ messages in thread
From: David Matlack @ 2022-09-07 21:47 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Tue, Aug 30, 2022 at 07:41:26PM +0000, Oliver Upton wrote:
> The use of RCU is necessary to change the paging structures in parallel.
> Acquire and release an RCU read lock when traversing the page tables.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 19 ++++++++++++++++++-
>  arch/arm64/kvm/hyp/pgtable.c         |  7 ++++++-
>  2 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 78fbb7be1af6..7d2de0a98ccb 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -578,9 +578,26 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
>   */
>  enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
>  
> +#if defined(__KVM_NVHE_HYPERVISOR___)
> +

Future readers will wonder why NVHE stubs out RCU support and how that
is even correct. Some comments here would be useful to explain it.

> +static inline void kvm_pgtable_walk_begin(void) {}
> +static inline void kvm_pgtable_walk_end(void) {}
> +
> +#define kvm_dereference_ptep rcu_dereference_raw

How does NVHE have access to rcu_dereference_raw()?

> +
> +#else	/* !defined(__KVM_NVHE_HYPERVISOR__) */
> +
> +#define kvm_pgtable_walk_begin	rcu_read_lock
> +#define kvm_pgtable_walk_end	rcu_read_unlock
> +#define kvm_dereference_ptep	rcu_dereference
> +
> +#endif	/* defined(__KVM_NVHE_HYPERVISOR__) */
> +
>  static inline kvm_pte_t kvm_pte_read(kvm_pte_t *ptep)
>  {
> -	return READ_ONCE(*ptep);
> +	kvm_pte_t __rcu *p = (kvm_pte_t __rcu *)ptep;
> +
> +	return READ_ONCE(*kvm_dereference_ptep(p));

What about all the other places where page table memory is accessed?

If RCU is going to be used to protect page table memory, then all
accesses have to go under an RCU critical section. This means that page
table memory should only be accessed through __rcu annotated pointers
and dereferenced with rcu_dereference().
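
Something like the below, for example (rough sketch, the names are made
up):

	/* page table memory is only reachable through __rcu pointers... */
	typedef kvm_pte_t __rcu *kvm_ptep_t;

	static inline kvm_pte_t kvm_pte_read(kvm_ptep_t ptep)
	{
		/* ...and every access goes through rcu_dereference() */
		return READ_ONCE(*rcu_dereference(ptep));
	}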

>  }
>  
>  #endif	/* __ARM64_KVM_PGTABLE_H__ */
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index f911509e6512..215a14c434ed 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -284,8 +284,13 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>  		.end	= PAGE_ALIGN(walk_data.addr + size),
>  		.walker	= walker,
>  	};
> +	int r;
>  
> -	return _kvm_pgtable_walk(&walk_data);
> +	kvm_pgtable_walk_begin();
> +	r = _kvm_pgtable_walk(&walk_data);
> +	kvm_pgtable_walk_end();
> +
> +	return r;
>  }
>  
>  struct leaf_walk_data {
> -- 
> 2.37.2.672.g94769d06f0-goog
> 


* Re: [PATCH 09/14] KVM: arm64: Free removed stage-2 tables in RCU callback
  2022-08-30 19:41 ` [PATCH 09/14] KVM: arm64: Free removed stage-2 tables in RCU callback Oliver Upton
@ 2022-09-07 22:00   ` David Matlack
  2022-09-08 16:40     ` David Matlack
  2022-09-14  0:49   ` Ricardo Koller
  1 sibling, 1 reply; 32+ messages in thread
From: David Matlack @ 2022-09-07 22:00 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Tue, Aug 30, 2022 at 07:41:27PM +0000, Oliver Upton wrote:
> There is no real urgency to free a stage-2 subtree that was pruned.
> Nonetheless, KVM does the tear down in the stage-2 fault path while
> holding the MMU lock.
> 
> Free removed stage-2 subtrees after an RCU grace period. To guarantee
> all stage-2 table pages are freed before killing a VM, add an
> rcu_barrier() to the flush path.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/kvm/mmu.c | 35 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 91521f4aab97..265951c05879 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -97,6 +97,38 @@ static void *stage2_memcache_zalloc_page(void *arg)
>  	return kvm_mmu_memory_cache_alloc(mc);
>  }
>  
> +#define STAGE2_PAGE_PRIVATE_LEVEL_MASK	GENMASK_ULL(2, 0)
> +
> +static inline unsigned long stage2_page_private(u32 level, void *arg)
> +{
> +	unsigned long pvt = (unsigned long)arg;
> +
> +	BUILD_BUG_ON(KVM_PGTABLE_MAX_LEVELS > STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> +	WARN_ON_ONCE(pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> +
> +	return pvt | level;
> +}
> +
> +static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
> +{
> +	struct page *page = container_of(head, struct page, rcu_head);
> +	unsigned long pvt = page_private(page);
> +	void *arg = (void *)(pvt & ~STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> +	u32 level = (u32)(pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> +	void *pgtable = page_to_virt(page);
> +
> +	kvm_pgtable_stage2_free_removed(pgtable, level, arg);
> +}
> +
> +static void stage2_free_removed_table(void *pgtable, u32 level, void *arg)
> +{
> +	unsigned long pvt = stage2_page_private(level, arg);
> +	struct page *page = virt_to_page(pgtable);
> +
> +	set_page_private(page, (unsigned long)pvt);
> +	call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
> +}
> +
>  static void *kvm_host_zalloc_pages_exact(size_t size)
>  {
>  	return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> @@ -627,7 +659,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
>  	.zalloc_page		= stage2_memcache_zalloc_page,
>  	.zalloc_pages_exact	= kvm_host_zalloc_pages_exact,
>  	.free_pages_exact	= free_pages_exact,
> -	.free_removed_table	= kvm_pgtable_stage2_free_removed,
> +	.free_removed_table	= stage2_free_removed_table,
>  	.get_page		= kvm_host_get_page,
>  	.put_page		= kvm_host_put_page,
>  	.page_count		= kvm_host_page_count,
> @@ -770,6 +802,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>  	if (pgt) {
>  		kvm_pgtable_stage2_destroy(pgt);
>  		kfree(pgt);
> +		rcu_barrier();

A comment here would be useful to document the behavior. e.g.

        /*
         * Wait for all stage-2 page tables that are being freed
         * asynchronously via RCU callback because ...
         */

Speaking of, what's the reason for this rcu_barrier()? Is there any
reason why KVM can't let in-flight stage-2 freeing RCU callbacks run at
the end of the next grace period?

>  	}
>  }
>  
> -- 
> 2.37.2.672.g94769d06f0-goog
> 


* Re: [PATCH 09/14] KVM: arm64: Free removed stage-2 tables in RCU callback
  2022-09-07 22:00   ` David Matlack
@ 2022-09-08 16:40     ` David Matlack
  0 siblings, 0 replies; 32+ messages in thread
From: David Matlack @ 2022-09-08 16:40 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Wed, Sep 07, 2022 at 03:00:18PM -0700, David Matlack wrote:
> On Tue, Aug 30, 2022 at 07:41:27PM +0000, Oliver Upton wrote:
> > There is no real urgency to free a stage-2 subtree that was pruned.
> > Nonetheless, KVM does the tear down in the stage-2 fault path while
> > holding the MMU lock.
> > 
> > Free removed stage-2 subtrees after an RCU grace period. To guarantee
> > all stage-2 table pages are freed before killing a VM, add an
> > rcu_barrier() to the flush path.
> > 
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >  arch/arm64/kvm/mmu.c | 35 ++++++++++++++++++++++++++++++++++-
> >  1 file changed, 34 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 91521f4aab97..265951c05879 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -97,6 +97,38 @@ static void *stage2_memcache_zalloc_page(void *arg)
> >  	return kvm_mmu_memory_cache_alloc(mc);
> >  }
> >  
> > +#define STAGE2_PAGE_PRIVATE_LEVEL_MASK	GENMASK_ULL(2, 0)
> > +
> > +static inline unsigned long stage2_page_private(u32 level, void *arg)
> > +{
> > +	unsigned long pvt = (unsigned long)arg;
> > +
> > +	BUILD_BUG_ON(KVM_PGTABLE_MAX_LEVELS > STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> > +	WARN_ON_ONCE(pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> > +
> > +	return pvt | level;
> > +}
> > +
> > +static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
> > +{
> > +	struct page *page = container_of(head, struct page, rcu_head);
> > +	unsigned long pvt = page_private(page);
> > +	void *arg = (void *)(pvt & ~STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> > +	u32 level = (u32)(pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> > +	void *pgtable = page_to_virt(page);
> > +
> > +	kvm_pgtable_stage2_free_removed(pgtable, level, arg);
> > +}
> > +
> > +static void stage2_free_removed_table(void *pgtable, u32 level, void *arg)
> > +{
> > +	unsigned long pvt = stage2_page_private(level, arg);
> > +	struct page *page = virt_to_page(pgtable);
> > +
> > +	set_page_private(page, (unsigned long)pvt);
> > +	call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
> > +}
> > +
> >  static void *kvm_host_zalloc_pages_exact(size_t size)
> >  {
> >  	return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> > @@ -627,7 +659,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
> >  	.zalloc_page		= stage2_memcache_zalloc_page,
> >  	.zalloc_pages_exact	= kvm_host_zalloc_pages_exact,
> >  	.free_pages_exact	= free_pages_exact,
> > -	.free_removed_table	= kvm_pgtable_stage2_free_removed,
> > +	.free_removed_table	= stage2_free_removed_table,
> >  	.get_page		= kvm_host_get_page,
> >  	.put_page		= kvm_host_put_page,
> >  	.page_count		= kvm_host_page_count,
> > @@ -770,6 +802,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
> >  	if (pgt) {
> >  		kvm_pgtable_stage2_destroy(pgt);
> >  		kfree(pgt);
> > +		rcu_barrier();
> 
> A comment here would be useful to document the behavior. e.g.
> 
>         /*
>          * Wait for all stage-2 page tables that are being freed
>          * asynchronously via RCU callback because ...
>          */
> 
> Speaking of, what's the reason for this rcu_barrier()? Is there any
> reason why KVM can't let in-flight stage-2 freeing RCU callbacks run at
> the end of the next grace period?

After thinking about this more I have 2 follow-up questions:

1. Should the RCU barrier come before kvm_pgtable_stage2_destroy() and
   kfree(pgt)? Otherwise an RCU callback running
   kvm_pgtable_stage2_free_removed() could access the pgt after it has
   been freed? (See the sketch below.)

2. In general, is it safe for kvm_pgtable_stage2_free_removed() to run
   outside of the MMU lock? Yes, the page tables have already been
   disconnected from the tree, but kvm_pgtable_stage2_free_removed()
   also accesses shared data structures like struct kvm_pgtable. I *think*
   it might be safe after you fix (1.) but it would be more robust to
   avoid accessing shared data structures at all outside of the MMU lock
   and just do the page table freeing in the RCU callback.
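
For (1.), the ordering I have in mind is something like this (sketch
only):

	if (pgt) {
		/* wait for outstanding free_removed_table() callbacks... */
		rcu_barrier();

		/* ...before freeing the pgt they may still dereference */
		kvm_pgtable_stage2_destroy(pgt);
		kfree(pgt);
	}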

> 
> >  	}
> >  }
> >  
> > -- 
> > 2.37.2.672.g94769d06f0-goog
> > 


* Re: [PATCH 06/14] KVM: arm64: Return next table from map callbacks
  2022-09-07 21:32   ` David Matlack
@ 2022-09-09  9:38     ` Oliver Upton
  0 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-09-09  9:38 UTC (permalink / raw)
  To: David Matlack
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

Hi David,

On Wed, Sep 07, 2022 at 02:32:29PM -0700, David Matlack wrote:
> On Tue, Aug 30, 2022 at 07:41:24PM +0000, Oliver Upton wrote:
> > The map walkers install new page tables during their traversal. Return
> > the newly-installed table PTE from the map callbacks to point the walker
> > at the new table w/o rereading the ptep.
> > 
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 331f6e3b2c20..f911509e6512 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -202,13 +202,12 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> >  	if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
> >  		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, &pte,
> >  					     KVM_PGTABLE_WALK_LEAF);
> > -		pte = *ptep;
> > -		table = kvm_pte_table(pte, level);
> >  	}
> >  
> >  	if (ret)
> >  		goto out;
> 
> Rather than passing a pointer to the local variable pte and requiring
> all downstream code to update it (and deal with dereferencing to read
> the old pte), wouldn't it be simpler to just re-read the PTE here?

Yeah, you're right. I had some odd rationalization about this, but
there's no need to force a walker to descend into the new table level as
it is wasted work if another thread unlinks it.

[...]

> >  
> > +	table = kvm_pte_table(pte, level);
> >  	if (!table) {
> 
> nit: Technically there's no reason to set @table again. e.g. This could
> just be:
> 
>         if (!kvm_pte_table(pte, level)) {

Sure, I'll squish these lines together.

--
Thanks,
Oliver


* Re: [PATCH 08/14] KVM: arm64: Protect page table traversal with RCU
  2022-09-07 21:47   ` David Matlack
@ 2022-09-09  9:55     ` Oliver Upton
  0 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-09-09  9:55 UTC (permalink / raw)
  To: David Matlack
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Wed, Sep 07, 2022 at 02:47:08PM -0700, David Matlack wrote:
> On Tue, Aug 30, 2022 at 07:41:26PM +0000, Oliver Upton wrote:
> > The use of RCU is necessary to change the paging structures in parallel.
> > Acquire and release an RCU read lock when traversing the page tables.
> > 
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h | 19 ++++++++++++++++++-
> >  arch/arm64/kvm/hyp/pgtable.c         |  7 ++++++-
> >  2 files changed, 24 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index 78fbb7be1af6..7d2de0a98ccb 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -578,9 +578,26 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
> >   */
> >  enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
> >  
> > +#if defined(__KVM_NVHE_HYPERVISOR___)
> > +
> 
> Future readers will wonder why NVHE stubs out RCU support and how that
> is even correct. Some comments here would be useful to explain it.

Good point.

> > +static inline void kvm_pgtable_walk_begin(void) {}
> > +static inline void kvm_pgtable_walk_end(void) {}
> > +
> > +#define kvm_dereference_ptep rcu_dereference_raw
> 
> How does NVHE have access to rcu_dereference_raw()?

rcu_dereference_raw() is inlined and simply recasts the pointer into the
kernel address space.

Perhaps it is less confusing to template this on kvm_pte_read() to avoid
polluting nVHE with an otherwise benign reference to RCU.
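
i.e. keep the nVHE flavor of the accessor as a plain read, something
along the lines of:

	static inline kvm_pte_t kvm_pte_read(kvm_pte_t *ptep)
	{
		/* no RCU at EL2, so just read the PTE directly */
		return READ_ONCE(*ptep);
	}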

> > +
> > +#else	/* !defined(__KVM_NVHE_HYPERVISOR__) */
> > +
> > +#define kvm_pgtable_walk_begin	rcu_read_lock
> > +#define kvm_pgtable_walk_end	rcu_read_unlock
> > +#define kvm_dereference_ptep	rcu_dereference
> > +
> > +#endif	/* defined(__KVM_NVHE_HYPERVISOR__) */
> > +
> >  static inline kvm_pte_t kvm_pte_read(kvm_pte_t *ptep)
> >  {
> > -	return READ_ONCE(*ptep);
> > +	kvm_pte_t __rcu *p = (kvm_pte_t __rcu *)ptep;
> > +
> > +	return READ_ONCE(*kvm_dereference_ptep(p));
> 
> What about all the other places where page table memory is accessed?
> 
> If RCU is going to be used to protect page table memory, then all
> accesses have to go under an RCU critical section. This means that page
> table memory should only be accessed through __rcu annotated pointers
> and dereferenced with rcu_dereference().

Let me play around with this a bit, as the annoying part is trying to
sprinkle in RCU annotations w/o messing with nVHE. 

--
Thanks,
Oliver


* Re: [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling
  2022-09-06 10:00 ` [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Marc Zyngier
@ 2022-09-09 10:01   ` Oliver Upton
  0 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-09-09 10:01 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Catalin Marinas,
	Will Deacon, Quentin Perret, Ricardo Koller, Reiji Watanabe,
	David Matlack, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm

Hey Marc,

On Tue, Sep 06, 2022 at 11:00:09AM +0100, Marc Zyngier wrote:

[...]

> This fails to build on -rc4:
> 
>   MODPOST vmlinux.symvers
>   MODINFO modules.builtin.modinfo
>   GEN     modules.builtin
>   CC      .vmlinux.export.o
>   LD      .tmp_vmlinux.kallsyms1
> ld: Unexpected GOT/PLT entries detected!
> ld: Unexpected run-time procedure linkages detected!
> ld: ID map text too big or misaligned
> ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_walk':
> (.hyp.text+0xdc0c): undefined reference to `__kvm_nvhe___rcu_read_lock'
> ld: (.hyp.text+0xdc1c): undefined reference to `__kvm_nvhe___rcu_read_unlock'
> ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_get_leaf':
> (.hyp.text+0xdc80): undefined reference to `__kvm_nvhe___rcu_read_lock'
> ld: (.hyp.text+0xdc90): undefined reference to `__kvm_nvhe___rcu_read_unlock'
> ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_map':
> (.hyp.text+0xddb0): undefined reference to `__kvm_nvhe___rcu_read_lock'
> ld: (.hyp.text+0xddc0): undefined reference to `__kvm_nvhe___rcu_read_unlock'
> ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_unmap':
> (.hyp.text+0xde44): undefined reference to `__kvm_nvhe___rcu_read_lock'
> ld: (.hyp.text+0xde50): undefined reference to `__kvm_nvhe___rcu_read_unlock'
> ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_destroy':
> (.hyp.text+0xdf40): undefined reference to `__kvm_nvhe___rcu_read_lock'
> ld: (.hyp.text+0xdf50): undefined reference to `__kvm_nvhe___rcu_read_unlock'
> ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_map':
> (.hyp.text+0xe16c): undefined reference to `__kvm_nvhe___rcu_read_lock'
> ld: (.hyp.text+0xe17c): undefined reference to `__kvm_nvhe___rcu_read_unlock'
> ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_set_owner':
> (.hyp.text+0xe264): undefined reference to `__kvm_nvhe___rcu_read_lock'
> ld: (.hyp.text+0xe274): undefined reference to `__kvm_nvhe___rcu_read_unlock'
> ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_unmap':
> (.hyp.text+0xe2d4): undefined reference to `__kvm_nvhe___rcu_read_lock'
> ld: (.hyp.text+0xe2e4): undefined reference to `__kvm_nvhe___rcu_read_unlock'
> ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_flush':
> (.hyp.text+0xe5b4): undefined reference to `__kvm_nvhe___rcu_read_lock'
> ld: (.hyp.text+0xe5c4): undefined reference to `__kvm_nvhe___rcu_read_unlock'
> ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_destroy':
> (.hyp.text+0xe6f0): undefined reference to `__kvm_nvhe___rcu_read_lock'
> ld: (.hyp.text+0xe700): undefined reference to `__kvm_nvhe___rcu_read_unlock'
> make[3]: *** [Makefile:1169: vmlinux] Error 1
> make[2]: *** [debian/rules:7: build-arch] Error 2
> 
> as this drags the RCU read-lock into EL2, and that's not going to
> work... The following fixes it, but I wonder how you tested it.

Ugh. I was carrying a patch on top of my series to handle compilation
issues with rseq_test, and I managed to squash the equivalent of the
below fix into that patch.

Nonetheless, I *did* actually test it to get the numbers above :)

--
Thanks,
Oliver

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index dc839db86a1a..adf170122daf 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -580,7 +580,7 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
>   */
>  enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
>  
> -#if defined(__KVM_NVHE_HYPERVISOR___)
> +#if defined(__KVM_NVHE_HYPERVISOR__)
>  
>  static inline void kvm_pgtable_walk_begin(void) {}
>  static inline void kvm_pgtable_walk_end(void) {}
> 
> -- 
> Without deviation from the norm, progress is not possible.


* Re: [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
  2022-09-06 14:35   ` Quentin Perret
@ 2022-09-09 10:04     ` Oliver Upton
  0 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-09-09 10:04 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Ricardo Koller, Reiji Watanabe,
	David Matlack, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Tue, Sep 06, 2022 at 02:35:47PM +0000, Quentin Perret wrote:
> Hi Oliver,
> 
> On Tuesday 30 Aug 2022 at 19:41:20 (+0000), Oliver Upton wrote:
> >  static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> >  				     kvm_pte_t *ptep,
> >  				     struct stage2_map_data *data)
> >  {
> > -	if (data->anchor)
> > -		return 0;
> > +	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> > +	kvm_pte_t *childp = kvm_pte_follow(*ptep, mm_ops);
> > +	struct kvm_pgtable *pgt = data->mmu->pgt;
> > +	int ret;
> >  
> >  	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> >  		return 0;
> >  
> > -	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
> >  	kvm_clear_pte(ptep);
> >  
> >  	/*
> > @@ -782,8 +786,13 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> >  	 * individually.
> >  	 */
> >  	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> > -	data->anchor = ptep;
> > -	return 0;
> > +
> > +	ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> > +
> > +	mm_ops->put_page(ptep);
> > +	mm_ops->free_removed_table(childp, level + 1, pgt);
> 
> By the look of it, __kvm_pgtable_visit() has saved the table PTE on the
> stack prior to calling the TABLE_PRE callback, and it then uses the PTE
> from its stack and does kvm_pte_follow() to find the childp, and walks
> from there. Would that be a UAF now?

Sure would; I suppose the actual UAF is hidden by the use of RCU later
in the series. Nonetheless, I'm going to adopt David's suggestion of
just rereading the PTE, which should tidy this up.
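
Roughly the idea (a sketch of the shape of the fix, from memory, not the
actual change):

	/* in __kvm_pgtable_visit(), right after the TABLE_PRE callback: */
	pte = *ptep;		/* reread; the callback may have replaced the table */
	table = kvm_pte_table(pte, level);

so the walker's existing "not a table" path bails out before it can
kvm_pte_follow() into the page that was just freed.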

Thanks for catching this.

--
Best,
Oliver


* Re: [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
  2022-09-07 20:57   ` David Matlack
@ 2022-09-09 10:07     ` Oliver Upton
  0 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-09-09 10:07 UTC (permalink / raw)
  To: David Matlack
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Ricardo Koller,
	Reiji Watanabe, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Wed, Sep 07, 2022 at 01:57:17PM -0700, David Matlack wrote:
> On Tue, Aug 30, 2022 at 07:41:20PM +0000, Oliver Upton wrote:
> [...]
> >  
> > +static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > +				struct stage2_map_data *data);
> > +
> >  static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> >  				     kvm_pte_t *ptep,
> >  				     struct stage2_map_data *data)
> >  {
> > -	if (data->anchor)
> 
> Should @anchor and @childp be removed from struct stage2_map_data? This
> commit removes the only remaining references to them.

Yup, I'll toss those in the next spin.

> > -		return 0;
> > +	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> > +	kvm_pte_t *childp = kvm_pte_follow(*ptep, mm_ops);
> > +	struct kvm_pgtable *pgt = data->mmu->pgt;
> > +	int ret;
> >  
> >  	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> >  		return 0;
> >  
> > -	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
> >  	kvm_clear_pte(ptep);
> >  
> >  	/*
> [...]
> >  static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> >  			     enum kvm_pgtable_walk_flags flag, void * const arg)
> > @@ -883,11 +849,9 @@ static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> >  		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
> >  	case KVM_PGTABLE_WALK_LEAF:
> >  		return stage2_map_walk_leaf(addr, end, level, ptep, data);
> > -	case KVM_PGTABLE_WALK_TABLE_POST:
> > -		return stage2_map_walk_table_post(addr, end, level, ptep, data);
> 
> kvm_pgtable_stage2_set_owner() still uses stage2_map_walker() with
> KVM_PGTABLE_WALK_TABLE_POST.

Good catch; I'll drop the TABLE_POST flag there as well.

Appreciate the reviews on the series.

--
Thanks,
Oliver


* Re: [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
  2022-08-30 19:41 ` [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make Oliver Upton
  2022-09-06 14:35   ` Quentin Perret
  2022-09-07 20:57   ` David Matlack
@ 2022-09-14  0:20   ` Ricardo Koller
  2022-10-10  3:58     ` Oliver Upton
  2 siblings, 1 reply; 32+ messages in thread
From: Ricardo Koller @ 2022-09-14  0:20 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Reiji Watanabe,
	David Matlack, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

Hi Oliver,

On Tue, Aug 30, 2022 at 07:41:20PM +0000, Oliver Upton wrote:
> The break-before-make sequence is a bit annoying as it opens a window
> wherein memory is unmapped from the guest. KVM should replace the PTE
> as quickly as possible and avoid unnecessary work in between.
> 
> Presently, the stage-2 map walker tears down a removed table before
> installing a block mapping when coalescing a table into a block. As the
> removed table is no longer visible to hardware walkers after the
> DSB+TLBI, it is possible to move the remaining cleanup to happen after
> installing the new PTE.
> 
> Reshuffle the stage-2 map walker to install the new block entry in
> the pre-order callback. Unwire all of the teardown logic and replace
> it with a call to kvm_pgtable_stage2_free_removed() after fixing
> the PTE. The post-order visitor is now completely unnecessary, so drop
> it. Finally, touch up the comments to better represent the now
> simplified map walker.
> 
> Note that the call to tear down the unlinked stage-2 is indirected
> as a subsequent change will use an RCU callback to trigger tear down.
> RCU is not available to pKVM, so there is a need to use different
> implementations on pKVM and non-pKVM VMs.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h  |  3 +
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |  1 +
>  arch/arm64/kvm/hyp/pgtable.c          | 83 ++++++++-------------------
>  arch/arm64/kvm/mmu.c                  |  1 +
>  4 files changed, 28 insertions(+), 60 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index d71fb92dc913..c25633f53b2b 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -77,6 +77,8 @@ static inline bool kvm_level_supports_block_mapping(u32 level)
>   *				allocation is physically contiguous.
>   * @free_pages_exact:		Free an exact number of memory pages previously
>   *				allocated by zalloc_pages_exact.
> + * @free_removed_table:		Free a removed paging structure by unlinking and
> + *				dropping references.
>   * @get_page:			Increment the refcount on a page.
>   * @put_page:			Decrement the refcount on a page. When the
>   *				refcount reaches 0 the page is automatically
> @@ -95,6 +97,7 @@ struct kvm_pgtable_mm_ops {
>  	void*		(*zalloc_page)(void *arg);
>  	void*		(*zalloc_pages_exact)(size_t size);
>  	void		(*free_pages_exact)(void *addr, size_t size);
> +	void		(*free_removed_table)(void *addr, u32 level, void *arg);
>  	void		(*get_page)(void *addr);
>  	void		(*put_page)(void *addr);
>  	int		(*page_count)(void *addr);
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 1e78acf9662e..a930fdee6fce 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -93,6 +93,7 @@ static int prepare_s2_pool(void *pgt_pool_base)
>  	host_kvm.mm_ops = (struct kvm_pgtable_mm_ops) {
>  		.zalloc_pages_exact = host_s2_zalloc_pages_exact,
>  		.zalloc_page = host_s2_zalloc_page,
> +		.free_removed_table = kvm_pgtable_stage2_free_removed,
>  		.phys_to_virt = hyp_phys_to_virt,
>  		.virt_to_phys = hyp_virt_to_phys,
>  		.page_count = hyp_page_count,
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index d8127c25424c..5c0c8028d71c 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -763,17 +763,21 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>  	return 0;
>  }
>  
> +static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +				struct stage2_map_data *data);
> +
>  static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>  				     kvm_pte_t *ptep,
>  				     struct stage2_map_data *data)
>  {
> -	if (data->anchor)
> -		return 0;
> +	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	kvm_pte_t *childp = kvm_pte_follow(*ptep, mm_ops);
> +	struct kvm_pgtable *pgt = data->mmu->pgt;
> +	int ret;
>  
>  	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
>  		return 0;
>  
> -	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
>  	kvm_clear_pte(ptep);
>  
>  	/*
> @@ -782,8 +786,13 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>  	 * individually.
>  	 */
>  	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> -	data->anchor = ptep;
> -	return 0;
> +
> +	ret = stage2_map_walk_leaf(addr, end, level, ptep, data);

I think this always ends up calling stage2_map_walker_try_leaf() (at
least it should). In that case, I think it might be clearer to call it
directly, as the intention is just to install a block.
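
i.e., something like (sketch only):

	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);

which makes it obvious that the pre-order visitor is only ever meant to
install a block here, never another table.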

> +
> +	mm_ops->put_page(ptep);
> +	mm_ops->free_removed_table(childp, level + 1, pgt);

*old = *ptep;

> +
> +	return ret;
>  }
>  
>  static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> @@ -793,13 +802,6 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  	kvm_pte_t *childp, pte = *ptep;
>  	int ret;
>  
> -	if (data->anchor) {
> -		if (stage2_pte_is_counted(pte))
> -			mm_ops->put_page(ptep);
> -
> -		return 0;
> -	}
> -
>  	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
>  	if (ret != -E2BIG)
>  		return ret;
> @@ -828,50 +830,14 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  	return 0;
>  }
>  
> -static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> -				      kvm_pte_t *ptep,
> -				      struct stage2_map_data *data)
> -{
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> -	kvm_pte_t *childp;
> -	int ret = 0;
> -
> -	if (!data->anchor)
> -		return 0;
> -
> -	if (data->anchor == ptep) {
> -		childp = data->childp;
> -		data->anchor = NULL;
> -		data->childp = NULL;
> -		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> -	} else {
> -		childp = kvm_pte_follow(*ptep, mm_ops);
> -	}
> -
> -	mm_ops->put_page(childp);
> -	mm_ops->put_page(ptep);
> -
> -	return ret;
> -}
> -
>  /*
> - * This is a little fiddly, as we use all three of the walk flags. The idea
> - * is that the TABLE_PRE callback runs for table entries on the way down,
> - * looking for table entries which we could conceivably replace with a
> - * block entry for this mapping. If it finds one, then it sets the 'anchor'
> - * field in 'struct stage2_map_data' to point at the table entry, before
> - * clearing the entry to zero and descending into the now detached table.
> - *
> - * The behaviour of the LEAF callback then depends on whether or not the
> - * anchor has been set. If not, then we're not using a block mapping higher
> - * up the table and we perform the mapping at the existing leaves instead.
> - * If, on the other hand, the anchor _is_ set, then we drop references to
> - * all valid leaves so that the pages beneath the anchor can be freed.
> + * The TABLE_PRE callback runs for table entries on the way down, looking
> + * for table entries which we could conceivably replace with a block entry
> + * for this mapping. If it finds one it replaces the entry and calls
> + * kvm_pgtable_mm_ops::free_removed_table() to tear down the detached table.
>   *
> - * Finally, the TABLE_POST callback does nothing if the anchor has not
> - * been set, but otherwise frees the page-table pages while walking back up
> - * the page-table, installing the block entry when it revisits the anchor
> - * pointer and clearing the anchor to NULL.
> + * Otherwise, the LEAF callback performs the mapping at the existing leaves
> + * instead.
>   */
>  static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  			     enum kvm_pgtable_walk_flags flag, void * const arg)
> @@ -883,11 +849,9 @@ static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
>  	case KVM_PGTABLE_WALK_LEAF:
>  		return stage2_map_walk_leaf(addr, end, level, ptep, data);
> -	case KVM_PGTABLE_WALK_TABLE_POST:
> -		return stage2_map_walk_table_post(addr, end, level, ptep, data);
> +	default:
> +		return -EINVAL;

nice!

>  	}
> -
> -	return -EINVAL;
>  }
>  
>  int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> @@ -905,8 +869,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>  	struct kvm_pgtable_walker walker = {
>  		.cb		= stage2_map_walker,
>  		.flags		= KVM_PGTABLE_WALK_TABLE_PRE |
> -				  KVM_PGTABLE_WALK_LEAF |
> -				  KVM_PGTABLE_WALK_TABLE_POST,
> +				  KVM_PGTABLE_WALK_LEAF,
>  		.arg		= &map_data,
>  	};
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c9a13e487187..91521f4aab97 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -627,6 +627,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
>  	.zalloc_page		= stage2_memcache_zalloc_page,
>  	.zalloc_pages_exact	= kvm_host_zalloc_pages_exact,
>  	.free_pages_exact	= free_pages_exact,
> +	.free_removed_table	= kvm_pgtable_stage2_free_removed,
>  	.get_page		= kvm_host_get_page,
>  	.put_page		= kvm_host_put_page,
>  	.page_count		= kvm_host_page_count,
> -- 
> 2.37.2.672.g94769d06f0-goog
> 


* Re: [PATCH 09/14] KVM: arm64: Free removed stage-2 tables in RCU callback
  2022-08-30 19:41 ` [PATCH 09/14] KVM: arm64: Free removed stage-2 tables in RCU callback Oliver Upton
  2022-09-07 22:00   ` David Matlack
@ 2022-09-14  0:49   ` Ricardo Koller
  1 sibling, 0 replies; 32+ messages in thread
From: Ricardo Koller @ 2022-09-14  0:49 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Reiji Watanabe,
	David Matlack, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

Hi Oliver,

On Tue, Aug 30, 2022 at 07:41:27PM +0000, Oliver Upton wrote:
> There is no real urgency to free a stage-2 subtree that was pruned.
> Nonetheless, KVM does the tear down in the stage-2 fault path while
> holding the MMU lock.
> 
> Free removed stage-2 subtrees after an RCU grace period. To guarantee
> all stage-2 table pages are freed before killing a VM, add an
> rcu_barrier() to the flush path.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/kvm/mmu.c | 35 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 91521f4aab97..265951c05879 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -97,6 +97,38 @@ static void *stage2_memcache_zalloc_page(void *arg)
>  	return kvm_mmu_memory_cache_alloc(mc);
>  }
>  
> +#define STAGE2_PAGE_PRIVATE_LEVEL_MASK	GENMASK_ULL(2, 0)
> +
> +static inline unsigned long stage2_page_private(u32 level, void *arg)
> +{
> +	unsigned long pvt = (unsigned long)arg;
> +
> +	BUILD_BUG_ON(KVM_PGTABLE_MAX_LEVELS > STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> +	WARN_ON_ONCE(pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK);

If the pgt pointer (arg) is not aligned for some reason, I think it
might be better to BUG_ON(). Alternatively, why not try passing a new
struct (with level and arg) that's freed by the RCU callback?
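
For illustration, something along these lines (hypothetical names, just a
sketch of the alternative):

	struct stage2_free_rcu {
		struct rcu_head rcu;
		void *pgtable;
		u32 level;
		void *arg;
	};

	static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
	{
		struct stage2_free_rcu *ctx =
			container_of(head, struct stage2_free_rcu, rcu);

		kvm_pgtable_stage2_free_removed(ctx->pgtable, ctx->level, ctx->arg);
		kfree(ctx);
	}

The downside is an extra allocation (and an allocation-failure path) on
the free side, which the page->private packing avoids.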

> +
> +	return pvt | level;
> +}
> +
> +static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
> +{
> +	struct page *page = container_of(head, struct page, rcu_head);
> +	unsigned long pvt = page_private(page);
> +	void *arg = (void *)(pvt & ~STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> +	u32 level = (u32)(pvt & STAGE2_PAGE_PRIVATE_LEVEL_MASK);
> +	void *pgtable = page_to_virt(page);
> +
> +	kvm_pgtable_stage2_free_removed(pgtable, level, arg);
> +}
> +
> +static void stage2_free_removed_table(void *pgtable, u32 level, void *arg)
> +{
> +	unsigned long pvt = stage2_page_private(level, arg);
> +	struct page *page = virt_to_page(pgtable);
> +
> +	set_page_private(page, (unsigned long)pvt);
> +	call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
> +}
> +
>  static void *kvm_host_zalloc_pages_exact(size_t size)
>  {
>  	return alloc_pages_exact(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> @@ -627,7 +659,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
>  	.zalloc_page		= stage2_memcache_zalloc_page,
>  	.zalloc_pages_exact	= kvm_host_zalloc_pages_exact,
>  	.free_pages_exact	= free_pages_exact,
> -	.free_removed_table	= kvm_pgtable_stage2_free_removed,
> +	.free_removed_table	= stage2_free_removed_table,
>  	.get_page		= kvm_host_get_page,
>  	.put_page		= kvm_host_put_page,
>  	.page_count		= kvm_host_page_count,
> @@ -770,6 +802,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>  	if (pgt) {
>  		kvm_pgtable_stage2_destroy(pgt);
>  		kfree(pgt);
> +		rcu_barrier();
>  	}
>  }
>  
> -- 
> 2.37.2.672.g94769d06f0-goog
> 


* Re: [PATCH 11/14] KVM: arm64: Make changes block->table to leaf PTEs parallel-aware
  2022-08-30 19:51 ` [PATCH 11/14] KVM: arm64: Make changes block->table to leaf PTEs parallel-aware Oliver Upton
@ 2022-09-14  0:51   ` Ricardo Koller
  2022-09-14  0:53     ` Ricardo Koller
  0 siblings, 1 reply; 32+ messages in thread
From: Ricardo Koller @ 2022-09-14  0:51 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm, kvm,
	Quentin Perret, Reiji Watanabe, David Matlack, Ben Gardon,
	Paolo Bonzini, Gavin Shan, Peter Xu, Sean Christopherson,
	linux-kernel

On Tue, Aug 30, 2022 at 07:51:01PM +0000, Oliver Upton wrote:
> In order to service stage-2 faults in parallel, stage-2 table walkers
> must take exclusive ownership of the PTE being worked on. An additional
> requirement of the architecture is that software must perform a
> 'break-before-make' operation when changing the block size used for
> mapping memory.
> 
> Roll these two concepts together into helpers for performing a
> 'break-before-make' sequence. Use a special PTE value to indicate a PTE
> has been locked by a software walker. Additionally, use an atomic
> compare-exchange to 'break' the PTE when the stage-2 page tables are
> possibly shared with another software walker. Elide the DSB + TLBI if
> the evicted PTE was invalid (and thus not subject to break-before-make).
> 
> All of the atomics do nothing for now, as the stage-2 walker isn't fully
> ready to perform parallel walks.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 87 +++++++++++++++++++++++++++++++++---
>  1 file changed, 82 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 61a4437c8c16..71ae96608752 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -49,6 +49,12 @@
>  #define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
>  #define KVM_MAX_OWNER_ID		1
>  
> +/*
> + * Used to indicate a pte for which a 'break-before-make' sequence is in
> + * progress.
> + */
> +#define KVM_INVALID_PTE_LOCKED		BIT(10)
> +
>  struct kvm_pgtable_walk_data {
>  	struct kvm_pgtable		*pgt;
>  	struct kvm_pgtable_walker	*walker;
> @@ -586,6 +592,8 @@ struct stage2_map_data {
>  
>  	/* Force mappings to page granularity */
>  	bool				force_pte;
> +
> +	bool				shared;
>  };
>  
>  u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
> @@ -691,6 +699,11 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
>  	return kvm_pte_valid(pte) || kvm_invalid_pte_owner(pte);
>  }
>  
> +static bool stage2_pte_is_locked(kvm_pte_t pte)
> +{
> +	return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
> +}
> +
>  static bool stage2_try_set_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new, bool shared)
>  {
>  	if (!shared) {
> @@ -701,6 +714,69 @@ static bool stage2_try_set_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new, bo
>  	return cmpxchg(ptep, old, new) == old;
>  }
>  
> +/**
> + * stage2_try_break_pte() - Invalidates a pte according to the
> + *			    'break-before-make' requirements of the
> + *			    architecture.
> + *
> + * @ptep: Pointer to the pte to break
> + * @old: The previously observed value of the pte
> + * @addr: IPA corresponding to the pte
> + * @level: Table level of the pte
> + * @shared: true if the stage-2 page tables could be shared by multiple software
> + *	    walkers
> + *
> + * Returns: true if the pte was successfully broken.
> + *
> + * If the removed pte was valid, performs the necessary serialization and TLB
> + * invalidation for the old value. For counted ptes, drops the reference count
> + * on the containing table page.
> + */
> +static bool stage2_try_break_pte(kvm_pte_t *ptep, kvm_pte_t old, u64 addr, u32 level,
> +				 struct stage2_map_data *data)
> +{
> +	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +
> +	if (stage2_pte_is_locked(old)) {
> +		/*
> +		 * Should never occur if this walker has exclusive access to the
> +		 * page tables.
> +		 */
> +		WARN_ON(!data->shared);
> +		return false;
> +	}

The above check is not needed as the cmpxchg() will return false if the
old pte is equal to "new" (KVM_INVALID_PTE_LOCKED).

> +
> +	if (!stage2_try_set_pte(ptep, old, KVM_INVALID_PTE_LOCKED, data->shared))
> +		return false;
> +
> +	/*
> +	 * Perform the appropriate TLB invalidation based on the evicted pte
> +	 * value (if any).
> +	 */
> +	if (kvm_pte_table(old, level))
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> +	else if (kvm_pte_valid(old))
> +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> +
> +	if (stage2_pte_is_counted(old))
> +		mm_ops->put_page(ptep);
> +
> +	return true;
> +}
> +
> +static void stage2_make_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new,
> +			    struct stage2_map_data *data)
> +{
> +	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +
> +	WARN_ON(!stage2_pte_is_locked(*ptep));
> +
> +	if (stage2_pte_is_counted(new))
> +		mm_ops->get_page(ptep);
> +
> +	smp_store_release(ptep, new);
> +}
> +
>  static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
>  			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
>  {
> @@ -836,17 +912,18 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  	if (!childp)
>  		return -ENOMEM;
>  
> +	if (!stage2_try_break_pte(ptep, *old, addr, level, data)) {
> +		mm_ops->put_page(childp);
> +		return -EAGAIN;
> +	}
> +
>  	/*
>  	 * If we've run into an existing block mapping then replace it with
>  	 * a table. Accesses beyond 'end' that fall within the new table
>  	 * will be mapped lazily.
>  	 */
> -	if (stage2_pte_is_counted(pte))
> -		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> -
>  	new = kvm_init_table_pte(childp, mm_ops);
> -	mm_ops->get_page(ptep);
> -	smp_store_release(ptep, new);
> +	stage2_make_pte(ptep, *old, new, data);
>  	*old = new;
>  
>  	return 0;
> -- 
> 2.37.2.672.g94769d06f0-goog
> 


* Re: [PATCH 11/14] KVM: arm64: Make changes block->table to leaf PTEs parallel-aware
  2022-09-14  0:51   ` Ricardo Koller
@ 2022-09-14  0:53     ` Ricardo Koller
  0 siblings, 0 replies; 32+ messages in thread
From: Ricardo Koller @ 2022-09-14  0:53 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm, kvm,
	Quentin Perret, Reiji Watanabe, David Matlack, Ben Gardon,
	Paolo Bonzini, Gavin Shan, Peter Xu, Sean Christopherson,
	linux-kernel

On Tue, Sep 13, 2022 at 05:51:55PM -0700, Ricardo Koller wrote:
> On Tue, Aug 30, 2022 at 07:51:01PM +0000, Oliver Upton wrote:
> > In order to service stage-2 faults in parallel, stage-2 table walkers
> > must take exclusive ownership of the PTE being worked on. An additional
> > requirement of the architecture is that software must perform a
> > 'break-before-make' operation when changing the block size used for
> > mapping memory.
> > 
> > Roll these two concepts together into helpers for performing a
> > 'break-before-make' sequence. Use a special PTE value to indicate a PTE
> > has been locked by a software walker. Additionally, use an atomic
> > compare-exchange to 'break' the PTE when the stage-2 page tables are
> > possibly shared with another software walker. Elide the DSB + TLBI if
> > the evicted PTE was invalid (and thus not subject to break-before-make).
> > 
> > All of the atomics do nothing for now, as the stage-2 walker isn't fully
> > ready to perform parallel walks.
> > 
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 87 +++++++++++++++++++++++++++++++++---
> >  1 file changed, 82 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 61a4437c8c16..71ae96608752 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -49,6 +49,12 @@
> >  #define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
> >  #define KVM_MAX_OWNER_ID		1
> >  
> > +/*
> > + * Used to indicate a pte for which a 'break-before-make' sequence is in
> > + * progress.
> > + */
> > +#define KVM_INVALID_PTE_LOCKED		BIT(10)
> > +
> >  struct kvm_pgtable_walk_data {
> >  	struct kvm_pgtable		*pgt;
> >  	struct kvm_pgtable_walker	*walker;
> > @@ -586,6 +592,8 @@ struct stage2_map_data {
> >  
> >  	/* Force mappings to page granularity */
> >  	bool				force_pte;
> > +
> > +	bool				shared;
> >  };
> >  
> >  u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
> > @@ -691,6 +699,11 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
> >  	return kvm_pte_valid(pte) || kvm_invalid_pte_owner(pte);
> >  }
> >  
> > +static bool stage2_pte_is_locked(kvm_pte_t pte)
> > +{
> > +	return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
> > +}
> > +
> >  static bool stage2_try_set_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new, bool shared)
> >  {
> >  	if (!shared) {
> > @@ -701,6 +714,69 @@ static bool stage2_try_set_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new, bo
> >  	return cmpxchg(ptep, old, new) == old;
> >  }
> >  
> > +/**
> > + * stage2_try_break_pte() - Invalidates a pte according to the
> > + *			    'break-before-make' requirements of the
> > + *			    architecture.
> > + *
> > + * @ptep: Pointer to the pte to break
> > + * @old: The previously observed value of the pte
> > + * @addr: IPA corresponding to the pte
> > + * @level: Table level of the pte
> > + * @shared: true if the stage-2 page tables could be shared by multiple software
> > + *	    walkers
> > + *
> > + * Returns: true if the pte was successfully broken.
> > + *
> > + * If the removed pte was valid, performs the necessary serialization and TLB
> > + * invalidation for the old value. For counted ptes, drops the reference count
> > + * on the containing table page.
> > + */
> > +static bool stage2_try_break_pte(kvm_pte_t *ptep, kvm_pte_t old, u64 addr, u32 level,
> > +				 struct stage2_map_data *data)
> > +{
> > +	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> > +
> > +	if (stage2_pte_is_locked(old)) {
> > +		/*
> > +		 * Should never occur if this walker has exclusive access to the
> > +		 * page tables.
> > +		 */
> > +		WARN_ON(!data->shared);
> > +		return false;
> > +	}
> 
> The above check is not needed as the cmpxchg() will return false if the
> old pte is equal to "new" (KVM_INVALID_PTE_LOCKED).
> 
> > +
> > +	if (!stage2_try_set_pte(ptep, old, KVM_INVALID_PTE_LOCKED, data->shared))
> > +		return false;
> > +
> > +	/*
> > +	 * Perform the appropriate TLB invalidation based on the evicted pte
> > +	 * value (if any).
> > +	 */
> > +	if (kvm_pte_table(old, level))
> > +		kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> > +	else if (kvm_pte_valid(old))
> > +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> > +
> > +	if (stage2_pte_is_counted(old))
> > +		mm_ops->put_page(ptep);
> > +
> > +	return true;
> > +}
> > +
> > +static void stage2_make_pte(kvm_pte_t *ptep, kvm_pte_t old, kvm_pte_t new,
> > +			    struct stage2_map_data *data)
> > +{

nit: old is not used

> > +	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> > +
> > +	WARN_ON(!stage2_pte_is_locked(*ptep));
> > +
> > +	if (stage2_pte_is_counted(new))
> > +		mm_ops->get_page(ptep);
> > +
> > +	smp_store_release(ptep, new);
> > +}
> > +
> >  static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
> >  			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
> >  {
> > @@ -836,17 +912,18 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> >  	if (!childp)
> >  		return -ENOMEM;
> >  
> > +	if (!stage2_try_break_pte(ptep, *old, addr, level, data)) {
> > +		mm_ops->put_page(childp);
> > +		return -EAGAIN;
> > +	}
> > +
> >  	/*
> >  	 * If we've run into an existing block mapping then replace it with
> >  	 * a table. Accesses beyond 'end' that fall within the new table
> >  	 * will be mapped lazily.
> >  	 */
> > -	if (stage2_pte_is_counted(pte))
> > -		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> > -
> >  	new = kvm_init_table_pte(childp, mm_ops);
> > -	mm_ops->get_page(ptep);
> > -	smp_store_release(ptep, new);
> > +	stage2_make_pte(ptep, *old, new, data);
> >  	*old = new;
> >  
> >  	return 0;
> > -- 
> > 2.37.2.672.g94769d06f0-goog
> > 


* Re: [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
  2022-09-14  0:20   ` Ricardo Koller
@ 2022-10-10  3:58     ` Oliver Upton
  0 siblings, 0 replies; 32+ messages in thread
From: Oliver Upton @ 2022-10-10  3:58 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Quentin Perret, Reiji Watanabe,
	David Matlack, Ben Gardon, Paolo Bonzini, Gavin Shan, Peter Xu,
	Sean Christopherson, linux-arm-kernel, kvmarm, kvm, linux-kernel

Hey Ricardo,

On Tue, Sep 13, 2022 at 05:20:11PM -0700, Ricardo Koller wrote:

[...]

> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index d8127c25424c..5c0c8028d71c 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -763,17 +763,21 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> >  	return 0;
> >  }
> >  
> > +static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > +				struct stage2_map_data *data);
> > +
> >  static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> >  				     kvm_pte_t *ptep,
> >  				     struct stage2_map_data *data)
> >  {
> > -	if (data->anchor)
> > -		return 0;
> > +	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> > +	kvm_pte_t *childp = kvm_pte_follow(*ptep, mm_ops);
> > +	struct kvm_pgtable *pgt = data->mmu->pgt;
> > +	int ret;
> >  
> >  	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> >  		return 0;
> >  
> > -	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
> >  	kvm_clear_pte(ptep);
> >  
> >  	/*
> > @@ -782,8 +786,13 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> >  	 * individually.
> >  	 */
> >  	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> > -	data->anchor = ptep;
> > -	return 0;
> > +
> > +	ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> 
> I think this always ends up calling stage2_map_walker_try_leaf() (at
> least it should). In that case, I think it might be clearer to call it
> directly, as the intention is just to install a block.

Yikes, I missed this in v2. I do agree with your point; it reads a bit
odd to call something that could reinstall a table.

Picked up the fix for v3. Thanks!

--
Best,
Oliver


end of thread, other threads:[~2022-10-10  4:00 UTC | newest]

Thread overview: 32+ messages
2022-08-30 19:41 [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
2022-08-30 19:41 ` [PATCH 01/14] KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees Oliver Upton
2022-08-30 19:41 ` [PATCH 02/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make Oliver Upton
2022-09-06 14:35   ` Quentin Perret
2022-09-09 10:04     ` Oliver Upton
2022-09-07 20:57   ` David Matlack
2022-09-09 10:07     ` Oliver Upton
2022-09-14  0:20   ` Ricardo Koller
2022-10-10  3:58     ` Oliver Upton
2022-08-30 19:41 ` [PATCH 03/14] KVM: arm64: Directly read owner id field in stage2_pte_is_counted() Oliver Upton
2022-08-30 19:41 ` [PATCH 04/14] KVM: arm64: Read the PTE once per visit Oliver Upton
2022-08-30 19:41 ` [PATCH 05/14] KVM: arm64: Split init and set for table PTE Oliver Upton
2022-08-30 19:41 ` [PATCH 06/14] KVM: arm64: Return next table from map callbacks Oliver Upton
2022-09-07 21:32   ` David Matlack
2022-09-09  9:38     ` Oliver Upton
2022-08-30 19:41 ` [PATCH 07/14] KVM: arm64: Document behavior of pgtable visitor callback Oliver Upton
2022-08-30 19:41 ` [PATCH 08/14] KVM: arm64: Protect page table traversal with RCU Oliver Upton
2022-09-07 21:47   ` David Matlack
2022-09-09  9:55     ` Oliver Upton
2022-08-30 19:41 ` [PATCH 09/14] KVM: arm64: Free removed stage-2 tables in RCU callback Oliver Upton
2022-09-07 22:00   ` David Matlack
2022-09-08 16:40     ` David Matlack
2022-09-14  0:49   ` Ricardo Koller
2022-08-30 19:50 ` [PATCH 10/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks Oliver Upton
2022-08-30 19:51 ` [PATCH 11/14] KVM: arm64: Make changes block->table to leaf PTEs parallel-aware Oliver Upton
2022-09-14  0:51   ` Ricardo Koller
2022-09-14  0:53     ` Ricardo Koller
2022-08-30 19:51 ` [PATCH 12/14] KVM: arm64: Make leaf->leaf PTE changes parallel-aware Oliver Upton
2022-08-30 19:51 ` [PATCH 13/14] KVM: arm64: Make table->block " Oliver Upton
2022-08-30 19:52 ` [PATCH 14/14] KVM: arm64: Handle stage-2 faults in parallel Oliver Upton
2022-09-06 10:00 ` [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling Marc Zyngier
2022-09-09 10:01   ` Oliver Upton
