* [PATCH v5 00/14] KVM: arm64: Parallel stage-2 fault handling
@ 2022-11-07 21:56 ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

Presently, KVM takes a read lock for stage-2 faults only if it believes
the fault can be fixed by relaxing permissions on a PTE (write unprotect
for dirty logging). Otherwise, stage-2 faults grab the write lock, which
predictably piles up all the vCPUs in a sufficiently large VM.

Like the TDP MMU for x86, this series loosens the locking around
manipulations of the stage-2 page tables to allow parallel faults. RCU
and atomics are used to safely build and destroy the stage-2 page
tables in the presence of multiple software observers.

Patches 1-4 clean up the context associated with a page table walk / PTE
visit. This is helpful for:
 - Extending the context passed through for a visit
 - Building page table walkers that operate outside of a kvm_pgtable
   context (e.g. RCU callback)

Patches 5-7 clean up the stage-2 map walkers by calling a helper to tear
down removed tables. There is a small improvement here in that a broken
PTE is replaced more quickly, as page table teardown happens afterwards.

Patch 8 sprinkles RCU into the page table walkers, punting the
teardown of removed tables to an RCU callback.
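
As a rough, illustration-only sketch of punting teardown to an RCU
callback (the struct and helper names below are invented for the
example, not the series' actual stage-2 teardown path):

  #include <linux/gfp.h>
  #include <linux/kernel.h>
  #include <linux/rcupdate.h>
  #include <linux/slab.h>

  /* Illustrative only: a page-table page unlinked from the stage-2 tree. */
  struct removed_table {
          struct rcu_head rcu;
          void *pgtable;
  };

  static void free_removed_table_rcu(struct rcu_head *head)
  {
          struct removed_table *rt;

          rt = container_of(head, struct removed_table, rcu);

          /* No walker can still observe the table once the grace period ends. */
          free_page((unsigned long)rt->pgtable);
          kfree(rt);
  }

  /* Called after the table PTE has been broken and replaced. */
  static void defer_table_teardown(struct removed_table *rt)
  {
          call_rcu(&rt->rcu, free_removed_table_rcu);
  }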

Patches 9-13 implement the meat of this series, extending the
'break-before-make' sequence with atomics to provide locking on
individual PTEs. Effectively, a cmpxchg() is used to 'break' a PTE,
thereby serializing changes to it.
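
Roughly, the 'break' side looks like the sketch below; the PTE_LOCKED
value and try_break_pte() name are placeholders for the example, not
the series' exact helpers:

  #include <linux/atomic.h>
  #include <linux/bits.h>
  #include <linux/types.h>

  typedef u64 kvm_pte_t;

  /* An invalid encoding reserved to mark a PTE that is being modified. */
  #define PTE_LOCKED      ((kvm_pte_t)BIT(10))

  static bool try_break_pte(kvm_pte_t *ptep, kvm_pte_t old)
  {
          /*
           * Only the walker that still observes 'old' wins the cmpxchg
           * and may proceed to the 'make' step; losers retry against
           * whatever the winner installs.
           */
          return cmpxchg(ptep, old, PTE_LOCKED) == old;
  }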

Finally, patch 14 flips the switch on all the new code and starts
grabbing the read side of the MMU lock for stage 2 faults.

Applies to 6.1-rc3. Tested with KVM selftests, kvm-unit-tests, and by
live-migrating a 24 vCPU, 96GB VM running a Debian install. Confirmed
all stage-2 table memory was freed by checking the SecPageTables stat
in /proc/meminfo.

Branch available at:

  https://github.com/oupton/linux kvm-arm64/parallel_mmu

Benchmarked with dirty_log_perf_test, scaling from 1 to 48 vCPUs with
4GB of memory per vCPU, backed by THP.

  ./dirty_log_perf_test -s anonymous_thp -m 2 -b 4G -v ${NR_VCPUS}

Time to dirty memory:

        +-------+----------+-------------------+
        | vCPUs | 6.1-rc3  | 6.1-rc3 + series  |
        +-------+----------+-------------------+
        |     1 | 0.87s    | 0.93s             |
        |     2 | 1.11s    | 1.16s             |
        |     4 | 2.39s    | 1.27s             |
        |     8 | 5.01s    | 1.39s             |
        |    16 | 8.89s    | 2.07s             |
        |    32 | 19.90s   | 4.45s             |
        |    48 | 32.10s   | 6.23s             |
        +-------+----------+-------------------+

The time to populate memory has also improved:

        +-------+----------+-------------------+
        | vCPUs | 6.1-rc3  | 6.1-rc3 + series  |
        +-------+----------+-------------------+
        |     1 | 0.21s    | 0.17s             |
        |     2 | 0.26s    | 0.23s             |
        |     4 | 0.39s    | 0.31s             |
        |     8 | 0.68s    | 0.39s             |
        |    16 | 1.26s    | 0.53s             |
        |    32 | 2.51s    | 1.04s             |
        |    48 | 3.94s    | 1.55s             |
        +-------+----------+-------------------+

v4 -> v5:
 - Fix an obvious leak of table memory (Ricardo)

v3 -> v4:
 - Fix some type conversion misses caught by sparse (test robot)
 - Squash RCU locking and RCU callback patches together into one (Sean)
 - Commit message nits (Sean)
 - Take a pointer to kvm_s2_mmu in stage2_try_break_pte(), in
   anticipation of eager page splitting (Ricardo)

v3: https://lore.kernel.org/kvmarm/20221027221752.1683510-1-oliver.upton@linux.dev/
v4: https://lore.kernel.org/kvmarm/20221103091140.1040433-1-oliver.upton@linux.dev/

Oliver Upton (14):
  KVM: arm64: Combine visitor arguments into a context structure
  KVM: arm64: Stash observed pte value in visitor context
  KVM: arm64: Pass mm_ops through the visitor context
  KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data
  KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
  KVM: arm64: Use an opaque type for pteps
  KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
  KVM: arm64: Protect stage-2 traversal with RCU
  KVM: arm64: Atomically update stage 2 leaf attributes in parallel
    walks
  KVM: arm64: Split init and set for table PTE
  KVM: arm64: Make block->table PTE changes parallel-aware
  KVM: arm64: Make leaf->leaf PTE changes parallel-aware
  KVM: arm64: Make table->block changes parallel-aware
  KVM: arm64: Handle stage-2 faults in parallel

 arch/arm64/include/asm/kvm_pgtable.h  |  92 +++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  21 +-
 arch/arm64/kvm/hyp/nvhe/setup.c       |  22 +-
 arch/arm64/kvm/hyp/pgtable.c          | 628 ++++++++++++++------------
 arch/arm64/kvm/mmu.c                  |  53 ++-
 5 files changed, 466 insertions(+), 350 deletions(-)


base-commit: 30a0b95b1335e12efef89dd78518ed3e4a71a763
-- 
2.38.1.431.g37b22c650d-goog


* [PATCH v5 01/14] KVM: arm64: Combine visitor arguments into a context structure
  2022-11-07 21:56 ` Oliver Upton
@ 2022-11-07 21:56   ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

Passing new arguments by value to the visitor callbacks is extremely
inflexible when only some of the visitors need a new parameter. Use a
context structure instead and pass a pointer to it through to the
visitor callback.

While at it, redefine the 'flags' parameter to the visitor to contain
the bit indicating the phase of the walk. Pass the entire set of flags
through the context structure such that the walker can communicate
additional state to the visitor callback.

No functional change intended.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h  |  15 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  10 +-
 arch/arm64/kvm/hyp/nvhe/setup.c       |  16 +-
 arch/arm64/kvm/hyp/pgtable.c          | 269 +++++++++++++-------------
 4 files changed, 154 insertions(+), 156 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3252eb50ecfe..607f9bb8aab4 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -199,10 +199,17 @@ enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
 };
 
-typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
-					kvm_pte_t *ptep,
-					enum kvm_pgtable_walk_flags flag,
-					void * const arg);
+struct kvm_pgtable_visit_ctx {
+	kvm_pte_t				*ptep;
+	void					*arg;
+	u64					addr;
+	u64					end;
+	u32					level;
+	enum kvm_pgtable_walk_flags		flags;
+};
+
+typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
+					enum kvm_pgtable_walk_flags visit);
 
 /**
  * struct kvm_pgtable_walker - Hook into a page-table walk.
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1e78acf9662e..8f5b6a36a039 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -417,13 +417,11 @@ struct check_walk_data {
 	enum pkvm_page_state	(*get_page_state)(kvm_pte_t pte);
 };
 
-static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
-				      kvm_pte_t *ptep,
-				      enum kvm_pgtable_walk_flags flag,
-				      void * const arg)
+static int __check_page_state_visitor(const struct kvm_pgtable_visit_ctx *ctx,
+				      enum kvm_pgtable_walk_flags visit)
 {
-	struct check_walk_data *d = arg;
-	kvm_pte_t pte = *ptep;
+	struct check_walk_data *d = ctx->arg;
+	kvm_pte_t pte = *ctx->ptep;
 
 	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
 		return -EINVAL;
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index e8d4ea2fcfa0..a293cf5eba1b 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -186,15 +186,13 @@ static void hpool_put_page(void *addr)
 	hyp_put_page(&hpool, addr);
 }
 
-static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
-					 kvm_pte_t *ptep,
-					 enum kvm_pgtable_walk_flags flag,
-					 void * const arg)
+static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx,
+					 enum kvm_pgtable_walk_flags visit)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
 	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
-	kvm_pte_t pte = *ptep;
+	kvm_pte_t pte = *ctx->ptep;
 	phys_addr_t phys;
 
 	if (!kvm_pte_valid(pte))
@@ -205,11 +203,11 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 	 * was unable to access the hyp_vmemmap and so the buddy allocator has
 	 * initialised the refcount to '1'.
 	 */
-	mm_ops->get_page(ptep);
-	if (flag != KVM_PGTABLE_WALK_LEAF)
+	mm_ops->get_page(ctx->ptep);
+	if (visit != KVM_PGTABLE_WALK_LEAF)
 		return 0;
 
-	if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
+	if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
 	phys = kvm_pte_to_phys(pte);
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index cdf8e76b0be1..900c8b9c0cfc 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -64,20 +64,20 @@ static bool kvm_phys_is_valid(u64 phys)
 	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
 }
 
-static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
+static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx, u64 phys)
 {
-	u64 granule = kvm_granule_size(level);
+	u64 granule = kvm_granule_size(ctx->level);
 
-	if (!kvm_level_supports_block_mapping(level))
+	if (!kvm_level_supports_block_mapping(ctx->level))
 		return false;
 
-	if (granule > (end - addr))
+	if (granule > (ctx->end - ctx->addr))
 		return false;
 
 	if (kvm_phys_is_valid(phys) && !IS_ALIGNED(phys, granule))
 		return false;
 
-	return IS_ALIGNED(addr, granule);
+	return IS_ALIGNED(ctx->addr, granule);
 }
 
 static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
@@ -172,12 +172,12 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
 	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
 }
 
-static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
-				  u32 level, kvm_pte_t *ptep,
-				  enum kvm_pgtable_walk_flags flag)
+static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
+				  const struct kvm_pgtable_visit_ctx *ctx,
+				  enum kvm_pgtable_walk_flags visit)
 {
 	struct kvm_pgtable_walker *walker = data->walker;
-	return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
+	return walker->cb(ctx, visit);
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
@@ -186,20 +186,24 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
 static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 				      kvm_pte_t *ptep, u32 level)
 {
+	enum kvm_pgtable_walk_flags flags = data->walker->flags;
+	struct kvm_pgtable_visit_ctx ctx = {
+		.ptep	= ptep,
+		.arg	= data->walker->arg,
+		.addr	= data->addr,
+		.end	= data->end,
+		.level	= level,
+		.flags	= flags,
+	};
 	int ret = 0;
-	u64 addr = data->addr;
 	kvm_pte_t *childp, pte = *ptep;
 	bool table = kvm_pte_table(pte, level);
-	enum kvm_pgtable_walk_flags flags = data->walker->flags;
 
-	if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
-		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
-					     KVM_PGTABLE_WALK_TABLE_PRE);
-	}
+	if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
+		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);
 
-	if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
-		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
-					     KVM_PGTABLE_WALK_LEAF);
+	if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
+		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
 		pte = *ptep;
 		table = kvm_pte_table(pte, level);
 	}
@@ -218,10 +222,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 	if (ret)
 		goto out;
 
-	if (flags & KVM_PGTABLE_WALK_TABLE_POST) {
-		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
-					     KVM_PGTABLE_WALK_TABLE_POST);
-	}
+	if (ctx.flags & KVM_PGTABLE_WALK_TABLE_POST)
+		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_POST);
 
 out:
 	return ret;
@@ -292,13 +294,13 @@ struct leaf_walk_data {
 	u32		level;
 };
 
-static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-		       enum kvm_pgtable_walk_flags flag, void * const arg)
+static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
+		       enum kvm_pgtable_walk_flags visit)
 {
-	struct leaf_walk_data *data = arg;
+	struct leaf_walk_data *data = ctx->arg;
 
-	data->pte   = *ptep;
-	data->level = level;
+	data->pte   = *ctx->ptep;
+	data->level = ctx->level;
 
 	return 0;
 }
@@ -383,47 +385,47 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
 	return prot;
 }
 
-static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
-				    kvm_pte_t *ptep, struct hyp_map_data *data)
+static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
+				    struct hyp_map_data *data)
 {
-	kvm_pte_t new, old = *ptep;
-	u64 granule = kvm_granule_size(level), phys = data->phys;
+	kvm_pte_t new, old = *ctx->ptep;
+	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
 
-	if (!kvm_block_mapping_supported(addr, end, phys, level))
+	if (!kvm_block_mapping_supported(ctx, phys))
 		return false;
 
 	data->phys += granule;
-	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
 	if (old == new)
 		return true;
 	if (!kvm_pte_valid(old))
-		data->mm_ops->get_page(ptep);
+		data->mm_ops->get_page(ctx->ptep);
 	else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
 		return false;
 
-	smp_store_release(ptep, new);
+	smp_store_release(ctx->ptep, new);
 	return true;
 }
 
-static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			  enum kvm_pgtable_walk_flags flag, void * const arg)
+static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			  enum kvm_pgtable_walk_flags visit)
 {
 	kvm_pte_t *childp;
-	struct hyp_map_data *data = arg;
+	struct hyp_map_data *data = ctx->arg;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
-	if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
+	if (hyp_map_walker_try_leaf(ctx, data))
 		return 0;
 
-	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
+	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
 	childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
 	if (!childp)
 		return -ENOMEM;
 
-	kvm_set_table_pte(ptep, childp, mm_ops);
-	mm_ops->get_page(ptep);
+	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
+	mm_ops->get_page(ctx->ptep);
 	return 0;
 }
 
@@ -456,39 +458,39 @@ struct hyp_unmap_data {
 	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			    enum kvm_pgtable_walk_flags flag, void * const arg)
+static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			    enum kvm_pgtable_walk_flags visit)
 {
-	kvm_pte_t pte = *ptep, *childp = NULL;
-	u64 granule = kvm_granule_size(level);
-	struct hyp_unmap_data *data = arg;
+	kvm_pte_t pte = *ctx->ptep, *childp = NULL;
+	u64 granule = kvm_granule_size(ctx->level);
+	struct hyp_unmap_data *data = ctx->arg;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
 	if (!kvm_pte_valid(pte))
 		return -EINVAL;
 
-	if (kvm_pte_table(pte, level)) {
+	if (kvm_pte_table(pte, ctx->level)) {
 		childp = kvm_pte_follow(pte, mm_ops);
 
 		if (mm_ops->page_count(childp) != 1)
 			return 0;
 
-		kvm_clear_pte(ptep);
+		kvm_clear_pte(ctx->ptep);
 		dsb(ishst);
-		__tlbi_level(vae2is, __TLBI_VADDR(addr, 0), level);
+		__tlbi_level(vae2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
 	} else {
-		if (end - addr < granule)
+		if (ctx->end - ctx->addr < granule)
 			return -EINVAL;
 
-		kvm_clear_pte(ptep);
+		kvm_clear_pte(ctx->ptep);
 		dsb(ishst);
-		__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
+		__tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
 		data->unmapped += granule;
 	}
 
 	dsb(ish);
 	isb();
-	mm_ops->put_page(ptep);
+	mm_ops->put_page(ctx->ptep);
 
 	if (childp)
 		mm_ops->put_page(childp);
@@ -532,18 +534,18 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 	return 0;
 }
 
-static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			   enum kvm_pgtable_walk_flags flag, void * const arg)
+static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			   enum kvm_pgtable_walk_flags visit)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
-	kvm_pte_t pte = *ptep;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
+	kvm_pte_t pte = *ctx->ptep;
 
 	if (!kvm_pte_valid(pte))
 		return 0;
 
-	mm_ops->put_page(ptep);
+	mm_ops->put_page(ctx->ptep);
 
-	if (kvm_pte_table(pte, level))
+	if (kvm_pte_table(pte, ctx->level))
 		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
 
 	return 0;
@@ -682,19 +684,19 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
 	return !!pte;
 }
 
-static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
-			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
+static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
+			   struct kvm_pgtable_mm_ops *mm_ops)
 {
 	/*
 	 * Clear the existing PTE, and perform break-before-make with
 	 * TLB maintenance if it was valid.
 	 */
-	if (kvm_pte_valid(*ptep)) {
-		kvm_clear_pte(ptep);
-		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
+	if (kvm_pte_valid(*ctx->ptep)) {
+		kvm_clear_pte(ctx->ptep);
+		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
 	}
 
-	mm_ops->put_page(ptep);
+	mm_ops->put_page(ctx->ptep);
 }
 
 static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
@@ -708,29 +710,28 @@ static bool stage2_pte_executable(kvm_pte_t pte)
 	return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
 }
 
-static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
+static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
 					struct stage2_map_data *data)
 {
-	if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
+	if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
 		return false;
 
-	return kvm_block_mapping_supported(addr, end, data->phys, level);
+	return kvm_block_mapping_supported(ctx, data->phys);
 }
 
-static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
-				      kvm_pte_t *ptep,
+static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 				      struct stage2_map_data *data)
 {
-	kvm_pte_t new, old = *ptep;
-	u64 granule = kvm_granule_size(level), phys = data->phys;
+	kvm_pte_t new, old = *ctx->ptep;
+	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
 	struct kvm_pgtable *pgt = data->mmu->pgt;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
-	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
+	if (!stage2_leaf_mapping_allowed(ctx, data))
 		return -E2BIG;
 
 	if (kvm_phys_is_valid(phys))
-		new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+		new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
 	else
 		new = kvm_init_invalid_leaf_owner(data->owner_id);
 
@@ -744,7 +745,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 		if (!stage2_pte_needs_update(old, new))
 			return -EAGAIN;
 
-		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
+		stage2_put_pte(ctx, data->mmu, mm_ops);
 	}
 
 	/* Perform CMOs before installation of the guest stage-2 PTE */
@@ -755,26 +756,25 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
 		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
 
-	smp_store_release(ptep, new);
+	smp_store_release(ctx->ptep, new);
 	if (stage2_pte_is_counted(new))
-		mm_ops->get_page(ptep);
+		mm_ops->get_page(ctx->ptep);
 	if (kvm_phys_is_valid(phys))
 		data->phys += granule;
 	return 0;
 }
 
-static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
-				     kvm_pte_t *ptep,
+static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
 				     struct stage2_map_data *data)
 {
 	if (data->anchor)
 		return 0;
 
-	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
+	if (!stage2_leaf_mapping_allowed(ctx, data))
 		return 0;
 
-	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
-	kvm_clear_pte(ptep);
+	data->childp = kvm_pte_follow(*ctx->ptep, data->mm_ops);
+	kvm_clear_pte(ctx->ptep);
 
 	/*
 	 * Invalidate the whole stage-2, as we may have numerous leaf
@@ -782,29 +782,29 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
 	 * individually.
 	 */
 	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
-	data->anchor = ptep;
+	data->anchor = ctx->ptep;
 	return 0;
 }
 
-static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 				struct stage2_map_data *data)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
-	kvm_pte_t *childp, pte = *ptep;
+	kvm_pte_t *childp, pte = *ctx->ptep;
 	int ret;
 
 	if (data->anchor) {
 		if (stage2_pte_is_counted(pte))
-			mm_ops->put_page(ptep);
+			mm_ops->put_page(ctx->ptep);
 
 		return 0;
 	}
 
-	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
+	ret = stage2_map_walker_try_leaf(ctx, data);
 	if (ret != -E2BIG)
 		return ret;
 
-	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
+	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
 	if (!data->memcache)
@@ -820,16 +820,15 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 * will be mapped lazily.
 	 */
 	if (stage2_pte_is_counted(pte))
-		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
+		stage2_put_pte(ctx, data->mmu, mm_ops);
 
-	kvm_set_table_pte(ptep, childp, mm_ops);
-	mm_ops->get_page(ptep);
+	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
+	mm_ops->get_page(ctx->ptep);
 
 	return 0;
 }
 
-static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
-				      kvm_pte_t *ptep,
+static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
 				      struct stage2_map_data *data)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
@@ -839,17 +838,17 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
 	if (!data->anchor)
 		return 0;
 
-	if (data->anchor == ptep) {
+	if (data->anchor == ctx->ptep) {
 		childp = data->childp;
 		data->anchor = NULL;
 		data->childp = NULL;
-		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
+		ret = stage2_map_walk_leaf(ctx, data);
 	} else {
-		childp = kvm_pte_follow(*ptep, mm_ops);
+		childp = kvm_pte_follow(*ctx->ptep, mm_ops);
 	}
 
 	mm_ops->put_page(childp);
-	mm_ops->put_page(ptep);
+	mm_ops->put_page(ctx->ptep);
 
 	return ret;
 }
@@ -873,18 +872,18 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
  * the page-table, installing the block entry when it revisits the anchor
  * pointer and clearing the anchor to NULL.
  */
-static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			     enum kvm_pgtable_walk_flags flag, void * const arg)
+static int stage2_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			     enum kvm_pgtable_walk_flags visit)
 {
-	struct stage2_map_data *data = arg;
+	struct stage2_map_data *data = ctx->arg;
 
-	switch (flag) {
+	switch (visit) {
 	case KVM_PGTABLE_WALK_TABLE_PRE:
-		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
+		return stage2_map_walk_table_pre(ctx, data);
 	case KVM_PGTABLE_WALK_LEAF:
-		return stage2_map_walk_leaf(addr, end, level, ptep, data);
+		return stage2_map_walk_leaf(ctx, data);
 	case KVM_PGTABLE_WALK_TABLE_POST:
-		return stage2_map_walk_table_post(addr, end, level, ptep, data);
+		return stage2_map_walk_table_post(ctx, data);
 	}
 
 	return -EINVAL;
@@ -949,25 +948,24 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	return ret;
 }
 
-static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			       enum kvm_pgtable_walk_flags flag,
-			       void * const arg)
+static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			       enum kvm_pgtable_walk_flags visit)
 {
-	struct kvm_pgtable *pgt = arg;
+	struct kvm_pgtable *pgt = ctx->arg;
 	struct kvm_s2_mmu *mmu = pgt->mmu;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
-	kvm_pte_t pte = *ptep, *childp = NULL;
+	kvm_pte_t pte = *ctx->ptep, *childp = NULL;
 	bool need_flush = false;
 
 	if (!kvm_pte_valid(pte)) {
 		if (stage2_pte_is_counted(pte)) {
-			kvm_clear_pte(ptep);
-			mm_ops->put_page(ptep);
+			kvm_clear_pte(ctx->ptep);
+			mm_ops->put_page(ctx->ptep);
 		}
 		return 0;
 	}
 
-	if (kvm_pte_table(pte, level)) {
+	if (kvm_pte_table(pte, ctx->level)) {
 		childp = kvm_pte_follow(pte, mm_ops);
 
 		if (mm_ops->page_count(childp) != 1)
@@ -981,11 +979,11 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 * block entry and rely on the remaining portions being faulted
 	 * back lazily.
 	 */
-	stage2_put_pte(ptep, mmu, addr, level, mm_ops);
+	stage2_put_pte(ctx, mmu, mm_ops);
 
 	if (need_flush && mm_ops->dcache_clean_inval_poc)
 		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
-					       kvm_granule_size(level));
+					       kvm_granule_size(ctx->level));
 
 	if (childp)
 		mm_ops->put_page(childp);
@@ -1012,18 +1010,17 @@ struct stage2_attr_data {
 	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			      enum kvm_pgtable_walk_flags flag,
-			      void * const arg)
+static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			      enum kvm_pgtable_walk_flags visit)
 {
-	kvm_pte_t pte = *ptep;
-	struct stage2_attr_data *data = arg;
+	kvm_pte_t pte = *ctx->ptep;
+	struct stage2_attr_data *data = ctx->arg;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
 	if (!kvm_pte_valid(pte))
 		return 0;
 
-	data->level = level;
+	data->level = ctx->level;
 	data->pte = pte;
 	pte &= ~data->attr_clr;
 	pte |= data->attr_set;
@@ -1039,10 +1036,10 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		 * stage-2 PTE if we are going to add executable permission.
 		 */
 		if (mm_ops->icache_inval_pou &&
-		    stage2_pte_executable(pte) && !stage2_pte_executable(*ptep))
+		    stage2_pte_executable(pte) && !stage2_pte_executable(*ctx->ptep))
 			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
-						  kvm_granule_size(level));
-		WRITE_ONCE(*ptep, pte);
+						  kvm_granule_size(ctx->level));
+		WRITE_ONCE(*ctx->ptep, pte);
 	}
 
 	return 0;
@@ -1140,20 +1137,19 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 	return ret;
 }
 
-static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			       enum kvm_pgtable_walk_flags flag,
-			       void * const arg)
+static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			       enum kvm_pgtable_walk_flags visit)
 {
-	struct kvm_pgtable *pgt = arg;
+	struct kvm_pgtable *pgt = ctx->arg;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
-	kvm_pte_t pte = *ptep;
+	kvm_pte_t pte = *ctx->ptep;
 
 	if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
 		return 0;
 
 	if (mm_ops->dcache_clean_inval_poc)
 		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
-					       kvm_granule_size(level));
+					       kvm_granule_size(ctx->level));
 	return 0;
 }
 
@@ -1200,19 +1196,18 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	return 0;
 }
 
-static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			      enum kvm_pgtable_walk_flags flag,
-			      void * const arg)
+static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			      enum kvm_pgtable_walk_flags visit)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
-	kvm_pte_t pte = *ptep;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
+	kvm_pte_t pte = *ctx->ptep;
 
 	if (!stage2_pte_is_counted(pte))
 		return 0;
 
-	mm_ops->put_page(ptep);
+	mm_ops->put_page(ctx->ptep);
 
-	if (kvm_pte_table(pte, level))
+	if (kvm_pte_table(pte, ctx->level))
 		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
 
 	return 0;
-- 
2.38.1.431.g37b22c650d-goog


+	struct kvm_pgtable *pgt = ctx->arg;
 	struct kvm_s2_mmu *mmu = pgt->mmu;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
-	kvm_pte_t pte = *ptep, *childp = NULL;
+	kvm_pte_t pte = *ctx->ptep, *childp = NULL;
 	bool need_flush = false;
 
 	if (!kvm_pte_valid(pte)) {
 		if (stage2_pte_is_counted(pte)) {
-			kvm_clear_pte(ptep);
-			mm_ops->put_page(ptep);
+			kvm_clear_pte(ctx->ptep);
+			mm_ops->put_page(ctx->ptep);
 		}
 		return 0;
 	}
 
-	if (kvm_pte_table(pte, level)) {
+	if (kvm_pte_table(pte, ctx->level)) {
 		childp = kvm_pte_follow(pte, mm_ops);
 
 		if (mm_ops->page_count(childp) != 1)
@@ -981,11 +979,11 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 * block entry and rely on the remaining portions being faulted
 	 * back lazily.
 	 */
-	stage2_put_pte(ptep, mmu, addr, level, mm_ops);
+	stage2_put_pte(ctx, mmu, mm_ops);
 
 	if (need_flush && mm_ops->dcache_clean_inval_poc)
 		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
-					       kvm_granule_size(level));
+					       kvm_granule_size(ctx->level));
 
 	if (childp)
 		mm_ops->put_page(childp);
@@ -1012,18 +1010,17 @@ struct stage2_attr_data {
 	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
-static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			      enum kvm_pgtable_walk_flags flag,
-			      void * const arg)
+static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			      enum kvm_pgtable_walk_flags visit)
 {
-	kvm_pte_t pte = *ptep;
-	struct stage2_attr_data *data = arg;
+	kvm_pte_t pte = *ctx->ptep;
+	struct stage2_attr_data *data = ctx->arg;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
 	if (!kvm_pte_valid(pte))
 		return 0;
 
-	data->level = level;
+	data->level = ctx->level;
 	data->pte = pte;
 	pte &= ~data->attr_clr;
 	pte |= data->attr_set;
@@ -1039,10 +1036,10 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		 * stage-2 PTE if we are going to add executable permission.
 		 */
 		if (mm_ops->icache_inval_pou &&
-		    stage2_pte_executable(pte) && !stage2_pte_executable(*ptep))
+		    stage2_pte_executable(pte) && !stage2_pte_executable(*ctx->ptep))
 			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
-						  kvm_granule_size(level));
-		WRITE_ONCE(*ptep, pte);
+						  kvm_granule_size(ctx->level));
+		WRITE_ONCE(*ctx->ptep, pte);
 	}
 
 	return 0;
@@ -1140,20 +1137,19 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 	return ret;
 }
 
-static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			       enum kvm_pgtable_walk_flags flag,
-			       void * const arg)
+static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			       enum kvm_pgtable_walk_flags visit)
 {
-	struct kvm_pgtable *pgt = arg;
+	struct kvm_pgtable *pgt = ctx->arg;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
-	kvm_pte_t pte = *ptep;
+	kvm_pte_t pte = *ctx->ptep;
 
 	if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
 		return 0;
 
 	if (mm_ops->dcache_clean_inval_poc)
 		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
-					       kvm_granule_size(level));
+					       kvm_granule_size(ctx->level));
 	return 0;
 }
 
@@ -1200,19 +1196,18 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	return 0;
 }
 
-static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
-			      enum kvm_pgtable_walk_flags flag,
-			      void * const arg)
+static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			      enum kvm_pgtable_walk_flags visit)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
-	kvm_pte_t pte = *ptep;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
+	kvm_pte_t pte = *ctx->ptep;
 
 	if (!stage2_pte_is_counted(pte))
 		return 0;
 
-	mm_ops->put_page(ptep);
+	mm_ops->put_page(ctx->ptep);
 
-	if (kvm_pte_table(pte, level))
+	if (kvm_pte_table(pte, ctx->level))
 		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
 
 	return 0;
-- 
2.38.1.431.g37b22c650d-goog
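
For readers skimming the series, the net effect of the conversion above can
be summarised in a short sketch. The context structure, the new callback
typedef and the converted leaf_walker are condensed from this patch's
kvm_pgtable.h and pgtable.c hunks (an illustrative excerpt, not the full
change):

  struct kvm_pgtable_visit_ctx {
          kvm_pte_t                       *ptep;
          void                            *arg;
          u64                             addr;
          u64                             end;
          u32                             level;
          enum kvm_pgtable_walk_flags     flags;
  };

  typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
                                          enum kvm_pgtable_walk_flags visit);

  /* A visitor now pulls the walk state it needs out of the context: */
  static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
                         enum kvm_pgtable_walk_flags visit)
  {
          struct leaf_walk_data *data = ctx->arg;

          data->pte   = *ctx->ptep;
          data->level = ctx->level;

          return 0;
  }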


* [PATCH v5 02/14] KVM: arm64: Stash observed pte value in visitor context
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: kvm, kvmarm, Ben Gardon, David Matlack, Will Deacon, kvmarm,
	linux-arm-kernel

Rather than reading the ptep all over the shop, read the ptep once from
__kvm_pgtable_visit() and stick it in the visitor context. Reread the
ptep after visiting a leaf in case the callback installed a new table
underneath.

No functional change intended.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h  |  1 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  5 +-
 arch/arm64/kvm/hyp/nvhe/setup.c       |  7 +--
 arch/arm64/kvm/hyp/pgtable.c          | 86 +++++++++++++--------------
 4 files changed, 48 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 607f9bb8aab4..14d4b68a1e92 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -201,6 +201,7 @@ enum kvm_pgtable_walk_flags {
 
 struct kvm_pgtable_visit_ctx {
 	kvm_pte_t				*ptep;
+	kvm_pte_t				old;
 	void					*arg;
 	u64					addr;
 	u64					end;
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 8f5b6a36a039..d21d1b08a055 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -421,12 +421,11 @@ static int __check_page_state_visitor(const struct kvm_pgtable_visit_ctx *ctx,
 				      enum kvm_pgtable_walk_flags visit)
 {
 	struct check_walk_data *d = ctx->arg;
-	kvm_pte_t pte = *ctx->ptep;
 
-	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
+	if (kvm_pte_valid(ctx->old) && !addr_is_memory(kvm_pte_to_phys(ctx->old)))
 		return -EINVAL;
 
-	return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
+	return d->get_page_state(ctx->old) == d->desired ? 0 : -EPERM;
 }
 
 static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index a293cf5eba1b..6af443c9d78e 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -192,10 +192,9 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
 	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
 	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
-	kvm_pte_t pte = *ctx->ptep;
 	phys_addr_t phys;
 
-	if (!kvm_pte_valid(pte))
+	if (!kvm_pte_valid(ctx->old))
 		return 0;
 
 	/*
@@ -210,7 +209,7 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
 	if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
-	phys = kvm_pte_to_phys(pte);
+	phys = kvm_pte_to_phys(ctx->old);
 	if (!addr_is_memory(phys))
 		return -EINVAL;
 
@@ -218,7 +217,7 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
 	 * Adjust the host stage-2 mappings to match the ownership attributes
 	 * configured in the hypervisor stage-1.
 	 */
-	state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
+	state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(ctx->old));
 	switch (state) {
 	case PKVM_PAGE_OWNED:
 		return host_stage2_set_owner_locked(phys, PAGE_SIZE, pkvm_hyp_id);
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 900c8b9c0cfc..fb3696b3a997 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -189,6 +189,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 	enum kvm_pgtable_walk_flags flags = data->walker->flags;
 	struct kvm_pgtable_visit_ctx ctx = {
 		.ptep	= ptep,
+		.old	= READ_ONCE(*ptep),
 		.arg	= data->walker->arg,
 		.addr	= data->addr,
 		.end	= data->end,
@@ -196,16 +197,16 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 		.flags	= flags,
 	};
 	int ret = 0;
-	kvm_pte_t *childp, pte = *ptep;
-	bool table = kvm_pte_table(pte, level);
+	kvm_pte_t *childp;
+	bool table = kvm_pte_table(ctx.old, level);
 
 	if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
 		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);
 
 	if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
 		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
-		pte = *ptep;
-		table = kvm_pte_table(pte, level);
+		ctx.old = READ_ONCE(*ptep);
+		table = kvm_pte_table(ctx.old, level);
 	}
 
 	if (ret)
@@ -217,7 +218,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 		goto out;
 	}
 
-	childp = kvm_pte_follow(pte, data->pgt->mm_ops);
+	childp = kvm_pte_follow(ctx.old, data->pgt->mm_ops);
 	ret = __kvm_pgtable_walk(data, childp, level + 1);
 	if (ret)
 		goto out;
@@ -299,7 +300,7 @@ static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
 {
 	struct leaf_walk_data *data = ctx->arg;
 
-	data->pte   = *ctx->ptep;
+	data->pte   = ctx->old;
 	data->level = ctx->level;
 
 	return 0;
@@ -388,7 +389,7 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
 static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 				    struct hyp_map_data *data)
 {
-	kvm_pte_t new, old = *ctx->ptep;
+	kvm_pte_t new;
 	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
 
 	if (!kvm_block_mapping_supported(ctx, phys))
@@ -396,11 +397,11 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 
 	data->phys += granule;
 	new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
-	if (old == new)
+	if (ctx->old == new)
 		return true;
-	if (!kvm_pte_valid(old))
+	if (!kvm_pte_valid(ctx->old))
 		data->mm_ops->get_page(ctx->ptep);
-	else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
+	else if (WARN_ON((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
 		return false;
 
 	smp_store_release(ctx->ptep, new);
@@ -461,16 +462,16 @@ struct hyp_unmap_data {
 static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
 			    enum kvm_pgtable_walk_flags visit)
 {
-	kvm_pte_t pte = *ctx->ptep, *childp = NULL;
+	kvm_pte_t *childp = NULL;
 	u64 granule = kvm_granule_size(ctx->level);
 	struct hyp_unmap_data *data = ctx->arg;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
-	if (!kvm_pte_valid(pte))
+	if (!kvm_pte_valid(ctx->old))
 		return -EINVAL;
 
-	if (kvm_pte_table(pte, ctx->level)) {
-		childp = kvm_pte_follow(pte, mm_ops);
+	if (kvm_pte_table(ctx->old, ctx->level)) {
+		childp = kvm_pte_follow(ctx->old, mm_ops);
 
 		if (mm_ops->page_count(childp) != 1)
 			return 0;
@@ -538,15 +539,14 @@ static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
 			   enum kvm_pgtable_walk_flags visit)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
-	kvm_pte_t pte = *ctx->ptep;
 
-	if (!kvm_pte_valid(pte))
+	if (!kvm_pte_valid(ctx->old))
 		return 0;
 
 	mm_ops->put_page(ctx->ptep);
 
-	if (kvm_pte_table(pte, ctx->level))
-		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
+	if (kvm_pte_table(ctx->old, ctx->level))
+		mm_ops->put_page(kvm_pte_follow(ctx->old, mm_ops));
 
 	return 0;
 }
@@ -691,7 +691,7 @@ static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s
 	 * Clear the existing PTE, and perform break-before-make with
 	 * TLB maintenance if it was valid.
 	 */
-	if (kvm_pte_valid(*ctx->ptep)) {
+	if (kvm_pte_valid(ctx->old)) {
 		kvm_clear_pte(ctx->ptep);
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
 	}
@@ -722,7 +722,7 @@ static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
 static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 				      struct stage2_map_data *data)
 {
-	kvm_pte_t new, old = *ctx->ptep;
+	kvm_pte_t new;
 	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
 	struct kvm_pgtable *pgt = data->mmu->pgt;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
@@ -735,14 +735,14 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	else
 		new = kvm_init_invalid_leaf_owner(data->owner_id);
 
-	if (stage2_pte_is_counted(old)) {
+	if (stage2_pte_is_counted(ctx->old)) {
 		/*
 		 * Skip updating the PTE if we are trying to recreate the exact
 		 * same mapping or only change the access permissions. Instead,
 		 * the vCPU will exit one more time from guest if still needed
 		 * and then go through the path of relaxing permissions.
 		 */
-		if (!stage2_pte_needs_update(old, new))
+		if (!stage2_pte_needs_update(ctx->old, new))
 			return -EAGAIN;
 
 		stage2_put_pte(ctx, data->mmu, mm_ops);
@@ -773,7 +773,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
 	if (!stage2_leaf_mapping_allowed(ctx, data))
 		return 0;
 
-	data->childp = kvm_pte_follow(*ctx->ptep, data->mm_ops);
+	data->childp = kvm_pte_follow(ctx->old, data->mm_ops);
 	kvm_clear_pte(ctx->ptep);
 
 	/*
@@ -790,11 +790,11 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 				struct stage2_map_data *data)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
-	kvm_pte_t *childp, pte = *ctx->ptep;
+	kvm_pte_t *childp;
 	int ret;
 
 	if (data->anchor) {
-		if (stage2_pte_is_counted(pte))
+		if (stage2_pte_is_counted(ctx->old))
 			mm_ops->put_page(ctx->ptep);
 
 		return 0;
@@ -819,7 +819,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	 * a table. Accesses beyond 'end' that fall within the new table
 	 * will be mapped lazily.
 	 */
-	if (stage2_pte_is_counted(pte))
+	if (stage2_pte_is_counted(ctx->old))
 		stage2_put_pte(ctx, data->mmu, mm_ops);
 
 	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
@@ -844,7 +844,7 @@ static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
 		data->childp = NULL;
 		ret = stage2_map_walk_leaf(ctx, data);
 	} else {
-		childp = kvm_pte_follow(*ctx->ptep, mm_ops);
+		childp = kvm_pte_follow(ctx->old, mm_ops);
 	}
 
 	mm_ops->put_page(childp);
@@ -954,23 +954,23 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	struct kvm_pgtable *pgt = ctx->arg;
 	struct kvm_s2_mmu *mmu = pgt->mmu;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
-	kvm_pte_t pte = *ctx->ptep, *childp = NULL;
+	kvm_pte_t *childp = NULL;
 	bool need_flush = false;
 
-	if (!kvm_pte_valid(pte)) {
-		if (stage2_pte_is_counted(pte)) {
+	if (!kvm_pte_valid(ctx->old)) {
+		if (stage2_pte_is_counted(ctx->old)) {
 			kvm_clear_pte(ctx->ptep);
 			mm_ops->put_page(ctx->ptep);
 		}
 		return 0;
 	}
 
-	if (kvm_pte_table(pte, ctx->level)) {
-		childp = kvm_pte_follow(pte, mm_ops);
+	if (kvm_pte_table(ctx->old, ctx->level)) {
+		childp = kvm_pte_follow(ctx->old, mm_ops);
 
 		if (mm_ops->page_count(childp) != 1)
 			return 0;
-	} else if (stage2_pte_cacheable(pgt, pte)) {
+	} else if (stage2_pte_cacheable(pgt, ctx->old)) {
 		need_flush = !stage2_has_fwb(pgt);
 	}
 
@@ -982,7 +982,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	stage2_put_pte(ctx, mmu, mm_ops);
 
 	if (need_flush && mm_ops->dcache_clean_inval_poc)
-		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
+		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
 					       kvm_granule_size(ctx->level));
 
 	if (childp)
@@ -1013,11 +1013,11 @@ struct stage2_attr_data {
 static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
 			      enum kvm_pgtable_walk_flags visit)
 {
-	kvm_pte_t pte = *ctx->ptep;
+	kvm_pte_t pte = ctx->old;
 	struct stage2_attr_data *data = ctx->arg;
 	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
 
-	if (!kvm_pte_valid(pte))
+	if (!kvm_pte_valid(ctx->old))
 		return 0;
 
 	data->level = ctx->level;
@@ -1036,7 +1036,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
 		 * stage-2 PTE if we are going to add executable permission.
 		 */
 		if (mm_ops->icache_inval_pou &&
-		    stage2_pte_executable(pte) && !stage2_pte_executable(*ctx->ptep))
+		    stage2_pte_executable(pte) && !stage2_pte_executable(ctx->old))
 			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
 						  kvm_granule_size(ctx->level));
 		WRITE_ONCE(*ctx->ptep, pte);
@@ -1142,13 +1142,12 @@ static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
 {
 	struct kvm_pgtable *pgt = ctx->arg;
 	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
-	kvm_pte_t pte = *ctx->ptep;
 
-	if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
+	if (!kvm_pte_valid(ctx->old) || !stage2_pte_cacheable(pgt, ctx->old))
 		return 0;
 
 	if (mm_ops->dcache_clean_inval_poc)
-		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
+		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
 					       kvm_granule_size(ctx->level));
 	return 0;
 }
@@ -1200,15 +1199,14 @@ static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
 			      enum kvm_pgtable_walk_flags visit)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
-	kvm_pte_t pte = *ctx->ptep;
 
-	if (!stage2_pte_is_counted(pte))
+	if (!stage2_pte_is_counted(ctx->old))
 		return 0;
 
 	mm_ops->put_page(ctx->ptep);
 
-	if (kvm_pte_table(pte, ctx->level))
-		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
+	if (kvm_pte_table(ctx->old, ctx->level))
+		mm_ops->put_page(kvm_pte_follow(ctx->old, mm_ops));
 
 	return 0;
 }
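
For illustration only, a minimal sketch of the pattern the hunks above converge on: callbacks consume the single PTE snapshot that __kvm_pgtable_visit() stores in ctx->old instead of re-reading *ctx->ptep at every check. example_snapshot_walker() is a hypothetical name; the context fields and helpers are the ones used in this patch.

static int example_snapshot_walker(const struct kvm_pgtable_visit_ctx *ctx,
				   enum kvm_pgtable_walk_flags visit)
{
	kvm_pte_t *snapshot = ctx->arg;

	/* ctx->old holds the value the core walker read once with READ_ONCE() */
	if (!kvm_pte_valid(ctx->old))
		return 0;

	*snapshot = ctx->old;		/* previously: *snapshot = *ctx->ptep */
	return 0;
}
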
-- 
2.38.1.431.g37b22c650d-goog


* [PATCH v5 03/14] KVM: arm64: Pass mm_ops through the visitor context
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

As a prerequisite for getting visitors off of struct kvm_pgtable, pass
mm_ops through the visitor context.

No functional change intended.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h |  1 +
 arch/arm64/kvm/hyp/nvhe/setup.c      |  3 +-
 arch/arm64/kvm/hyp/pgtable.c         | 63 +++++++++++-----------------
 3 files changed, 26 insertions(+), 41 deletions(-)
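
For illustration, a minimal sketch of what the change buys, assuming the hypothetical names example_count_walker() and example_count_table_pages(); the context fields, helpers and flags are the ones touched in the diff below. The callback picks mm_ops up from the visit context, so the caller no longer has to smuggle it through .arg and can use .arg for its own data.

static int example_count_walker(const struct kvm_pgtable_visit_ctx *ctx,
				enum kvm_pgtable_walk_flags visit)
{
	/* mm_ops now arrives via the context, not via walker->arg */
	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
	u64 *table_pages = ctx->arg;

	if (!kvm_pte_valid(ctx->old) || !kvm_pte_table(ctx->old, ctx->level))
		return 0;

	*table_pages += mm_ops->page_count(kvm_pte_follow(ctx->old, mm_ops));
	return 0;
}

static u64 example_count_table_pages(struct kvm_pgtable *pgt)
{
	u64 table_pages = 0;
	struct kvm_pgtable_walker walker = {
		.cb	= example_count_walker,
		.flags	= KVM_PGTABLE_WALK_TABLE_POST,
		.arg	= &table_pages,	/* walker-private data, not mm_ops */
	};

	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
	return table_pages;
}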

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 14d4b68a1e92..a752793482cb 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -203,6 +203,7 @@ struct kvm_pgtable_visit_ctx {
 	kvm_pte_t				*ptep;
 	kvm_pte_t				old;
 	void					*arg;
+	struct kvm_pgtable_mm_ops		*mm_ops;
 	u64					addr;
 	u64					end;
 	u32					level;
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 6af443c9d78e..1068338d77f3 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -189,7 +189,7 @@ static void hpool_put_page(void *addr)
 static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx,
 					 enum kvm_pgtable_walk_flags visit)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
 	phys_addr_t phys;
@@ -239,7 +239,6 @@ static int finalize_host_mappings(void)
 	struct kvm_pgtable_walker walker = {
 		.cb	= finalize_host_mappings_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pkvm_pgtable.mm_ops,
 	};
 	int i, ret;
 
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index fb3696b3a997..db25e81a9890 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -181,9 +181,10 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
-			      kvm_pte_t *pgtable, u32 level);
+			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level);
 
 static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
+				      struct kvm_pgtable_mm_ops *mm_ops,
 				      kvm_pte_t *ptep, u32 level)
 {
 	enum kvm_pgtable_walk_flags flags = data->walker->flags;
@@ -191,6 +192,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 		.ptep	= ptep,
 		.old	= READ_ONCE(*ptep),
 		.arg	= data->walker->arg,
+		.mm_ops	= mm_ops,
 		.addr	= data->addr,
 		.end	= data->end,
 		.level	= level,
@@ -218,8 +220,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 		goto out;
 	}
 
-	childp = kvm_pte_follow(ctx.old, data->pgt->mm_ops);
-	ret = __kvm_pgtable_walk(data, childp, level + 1);
+	childp = kvm_pte_follow(ctx.old, mm_ops);
+	ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
 	if (ret)
 		goto out;
 
@@ -231,7 +233,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
-			      kvm_pte_t *pgtable, u32 level)
+			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level)
 {
 	u32 idx;
 	int ret = 0;
@@ -245,7 +247,7 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
 		if (data->addr >= data->end)
 			break;
 
-		ret = __kvm_pgtable_visit(data, ptep, level);
+		ret = __kvm_pgtable_visit(data, mm_ops, ptep, level);
 		if (ret)
 			break;
 	}
@@ -269,7 +271,7 @@ static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
 	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
 		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
 
-		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
+		ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
 		if (ret)
 			break;
 	}
@@ -332,7 +334,6 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
 struct hyp_map_data {
 	u64				phys;
 	kvm_pte_t			attr;
-	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
 static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
@@ -400,7 +401,7 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	if (ctx->old == new)
 		return true;
 	if (!kvm_pte_valid(ctx->old))
-		data->mm_ops->get_page(ctx->ptep);
+		ctx->mm_ops->get_page(ctx->ptep);
 	else if (WARN_ON((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
 		return false;
 
@@ -413,7 +414,7 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 {
 	kvm_pte_t *childp;
 	struct hyp_map_data *data = ctx->arg;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 
 	if (hyp_map_walker_try_leaf(ctx, data))
 		return 0;
@@ -436,7 +437,6 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 	int ret;
 	struct hyp_map_data map_data = {
 		.phys	= ALIGN_DOWN(phys, PAGE_SIZE),
-		.mm_ops	= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_map_walker,
@@ -454,18 +454,13 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 	return ret;
 }
 
-struct hyp_unmap_data {
-	u64				unmapped;
-	struct kvm_pgtable_mm_ops	*mm_ops;
-};
-
 static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
 			    enum kvm_pgtable_walk_flags visit)
 {
 	kvm_pte_t *childp = NULL;
 	u64 granule = kvm_granule_size(ctx->level);
-	struct hyp_unmap_data *data = ctx->arg;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	u64 *unmapped = ctx->arg;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 
 	if (!kvm_pte_valid(ctx->old))
 		return -EINVAL;
@@ -486,7 +481,7 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
 		kvm_clear_pte(ctx->ptep);
 		dsb(ishst);
 		__tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
-		data->unmapped += granule;
+		*unmapped += granule;
 	}
 
 	dsb(ish);
@@ -501,12 +496,10 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
 
 u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
-	struct hyp_unmap_data unmap_data = {
-		.mm_ops	= pgt->mm_ops,
-	};
+	u64 unmapped = 0;
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_unmap_walker,
-		.arg	= &unmap_data,
+		.arg	= &unmapped,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
 	};
 
@@ -514,7 +507,7 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 		return 0;
 
 	kvm_pgtable_walk(pgt, addr, size, &walker);
-	return unmap_data.unmapped;
+	return unmapped;
 }
 
 int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
@@ -538,7 +531,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
 			   enum kvm_pgtable_walk_flags visit)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 
 	if (!kvm_pte_valid(ctx->old))
 		return 0;
@@ -556,7 +549,6 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pgt->mm_ops,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
@@ -575,8 +567,6 @@ struct stage2_map_data {
 	struct kvm_s2_mmu		*mmu;
 	void				*memcache;
 
-	struct kvm_pgtable_mm_ops	*mm_ops;
-
 	/* Force mappings to page granularity */
 	bool				force_pte;
 };
@@ -725,7 +715,7 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	kvm_pte_t new;
 	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
 	struct kvm_pgtable *pgt = data->mmu->pgt;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 
 	if (!stage2_leaf_mapping_allowed(ctx, data))
 		return -E2BIG;
@@ -773,7 +763,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
 	if (!stage2_leaf_mapping_allowed(ctx, data))
 		return 0;
 
-	data->childp = kvm_pte_follow(ctx->old, data->mm_ops);
+	data->childp = kvm_pte_follow(ctx->old, ctx->mm_ops);
 	kvm_clear_pte(ctx->ptep);
 
 	/*
@@ -789,7 +779,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
 static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 				struct stage2_map_data *data)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 	kvm_pte_t *childp;
 	int ret;
 
@@ -831,7 +821,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
 				      struct stage2_map_data *data)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 	kvm_pte_t *childp;
 	int ret = 0;
 
@@ -898,7 +888,6 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
 		.mmu		= pgt->mmu,
 		.memcache	= mc,
-		.mm_ops		= pgt->mm_ops,
 		.force_pte	= pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
 	};
 	struct kvm_pgtable_walker walker = {
@@ -929,7 +918,6 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.phys		= KVM_PHYS_INVALID,
 		.mmu		= pgt->mmu,
 		.memcache	= mc,
-		.mm_ops		= pgt->mm_ops,
 		.owner_id	= owner_id,
 		.force_pte	= true,
 	};
@@ -953,7 +941,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
 {
 	struct kvm_pgtable *pgt = ctx->arg;
 	struct kvm_s2_mmu *mmu = pgt->mmu;
-	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 	kvm_pte_t *childp = NULL;
 	bool need_flush = false;
 
@@ -1007,7 +995,6 @@ struct stage2_attr_data {
 	kvm_pte_t			attr_clr;
 	kvm_pte_t			pte;
 	u32				level;
-	struct kvm_pgtable_mm_ops	*mm_ops;
 };
 
 static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
@@ -1015,7 +1002,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
 {
 	kvm_pte_t pte = ctx->old;
 	struct stage2_attr_data *data = ctx->arg;
-	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 
 	if (!kvm_pte_valid(ctx->old))
 		return 0;
@@ -1055,7 +1042,6 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 	struct stage2_attr_data data = {
 		.attr_set	= attr_set & attr_mask,
 		.attr_clr	= attr_clr & attr_mask,
-		.mm_ops		= pgt->mm_ops,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_attr_walker,
@@ -1198,7 +1184,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
 			      enum kvm_pgtable_walk_flags visit)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 
 	if (!stage2_pte_is_counted(ctx->old))
 		return 0;
@@ -1218,7 +1204,6 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 		.cb	= stage2_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF |
 			  KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pgt->mm_ops,
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
-- 
2.38.1.431.g37b22c650d-goog


* [PATCH v5 04/14] KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

In order to tear down page tables from outside the context of
kvm_pgtable (such as an RCU callback), stop passing a pointer through
kvm_pgtable_walk_data.

No functional change intended.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)
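
As a rough sketch of the kind of caller this makes possible (example_walk_detached() is an invented name and would have to live in pgtable.c next to __kvm_pgtable_walk(); the walk_data layout and walk signature are the ones after this patch), a table page that is no longer linked into any struct kvm_pgtable can be walked by passing mm_ops explicitly:

static int example_walk_detached(struct kvm_pgtable_mm_ops *mm_ops,
				 struct kvm_pgtable_walker *walker,
				 kvm_pte_t *pgtable, u32 level)
{
	struct kvm_pgtable_walk_data data = {
		.walker	= walker,
		.addr	= 0,
		/* 'level' is the level of the entry that pointed at this table */
		.end	= kvm_granule_size(level),
	};

	/* no struct kvm_pgtable required; entries in 'pgtable' sit at level + 1 */
	return __kvm_pgtable_walk(&data, mm_ops, pgtable, level + 1);
}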

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index db25e81a9890..93989b750a26 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -50,7 +50,6 @@
 #define KVM_MAX_OWNER_ID		1
 
 struct kvm_pgtable_walk_data {
-	struct kvm_pgtable		*pgt;
 	struct kvm_pgtable_walker	*walker;
 
 	u64				addr;
@@ -88,7 +87,7 @@ static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
 	return (data->addr >> shift) & mask;
 }
 
-static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
+static u32 kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
 {
 	u64 shift = kvm_granule_shift(pgt->start_level - 1); /* May underflow */
 	u64 mask = BIT(pgt->ia_bits) - 1;
@@ -96,11 +95,6 @@ static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
 	return (addr & mask) >> shift;
 }
 
-static u32 kvm_pgd_page_idx(struct kvm_pgtable_walk_data *data)
-{
-	return __kvm_pgd_page_idx(data->pgt, data->addr);
-}
-
 static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
 {
 	struct kvm_pgtable pgt = {
@@ -108,7 +102,7 @@ static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
 		.start_level	= start_level,
 	};
 
-	return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
+	return kvm_pgd_page_idx(&pgt, -1ULL) + 1;
 }
 
 static bool kvm_pte_table(kvm_pte_t pte, u32 level)
@@ -255,11 +249,10 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
 	return ret;
 }
 
-static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
+static int _kvm_pgtable_walk(struct kvm_pgtable *pgt, struct kvm_pgtable_walk_data *data)
 {
 	u32 idx;
 	int ret = 0;
-	struct kvm_pgtable *pgt = data->pgt;
 	u64 limit = BIT(pgt->ia_bits);
 
 	if (data->addr > limit || data->end > limit)
@@ -268,7 +261,7 @@ static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
 	if (!pgt->pgd)
 		return -EINVAL;
 
-	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
+	for (idx = kvm_pgd_page_idx(pgt, data->addr); data->addr < data->end; ++idx) {
 		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
 
 		ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
@@ -283,13 +276,12 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		     struct kvm_pgtable_walker *walker)
 {
 	struct kvm_pgtable_walk_data walk_data = {
-		.pgt	= pgt,
 		.addr	= ALIGN_DOWN(addr, PAGE_SIZE),
 		.end	= PAGE_ALIGN(walk_data.addr + size),
 		.walker	= walker,
 	};
 
-	return _kvm_pgtable_walk(&walk_data);
+	return _kvm_pgtable_walk(pgt, &walk_data);
 }
 
 struct leaf_walk_data {
-- 
2.38.1.431.g37b22c650d-goog


* [PATCH v5 04/14] KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

In order to tear down page tables from outside the context of
kvm_pgtable (such as an RCU callback), stop passing a pointer through
kvm_pgtable_walk_data.

No functional change intended.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index db25e81a9890..93989b750a26 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -50,7 +50,6 @@
 #define KVM_MAX_OWNER_ID		1
 
 struct kvm_pgtable_walk_data {
-	struct kvm_pgtable		*pgt;
 	struct kvm_pgtable_walker	*walker;
 
 	u64				addr;
@@ -88,7 +87,7 @@ static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
 	return (data->addr >> shift) & mask;
 }
 
-static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
+static u32 kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
 {
 	u64 shift = kvm_granule_shift(pgt->start_level - 1); /* May underflow */
 	u64 mask = BIT(pgt->ia_bits) - 1;
@@ -96,11 +95,6 @@ static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
 	return (addr & mask) >> shift;
 }
 
-static u32 kvm_pgd_page_idx(struct kvm_pgtable_walk_data *data)
-{
-	return __kvm_pgd_page_idx(data->pgt, data->addr);
-}
-
 static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
 {
 	struct kvm_pgtable pgt = {
@@ -108,7 +102,7 @@ static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
 		.start_level	= start_level,
 	};
 
-	return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
+	return kvm_pgd_page_idx(&pgt, -1ULL) + 1;
 }
 
 static bool kvm_pte_table(kvm_pte_t pte, u32 level)
@@ -255,11 +249,10 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
 	return ret;
 }
 
-static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
+static int _kvm_pgtable_walk(struct kvm_pgtable *pgt, struct kvm_pgtable_walk_data *data)
 {
 	u32 idx;
 	int ret = 0;
-	struct kvm_pgtable *pgt = data->pgt;
 	u64 limit = BIT(pgt->ia_bits);
 
 	if (data->addr > limit || data->end > limit)
@@ -268,7 +261,7 @@ static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
 	if (!pgt->pgd)
 		return -EINVAL;
 
-	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
+	for (idx = kvm_pgd_page_idx(pgt, data->addr); data->addr < data->end; ++idx) {
 		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
 
 		ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
@@ -283,13 +276,12 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		     struct kvm_pgtable_walker *walker)
 {
 	struct kvm_pgtable_walk_data walk_data = {
-		.pgt	= pgt,
 		.addr	= ALIGN_DOWN(addr, PAGE_SIZE),
 		.end	= PAGE_ALIGN(walk_data.addr + size),
 		.walker	= walker,
 	};
 
-	return _kvm_pgtable_walk(&walk_data);
+	return _kvm_pgtable_walk(pgt, &walk_data);
 }
 
 struct leaf_walk_data {
-- 
2.38.1.431.g37b22c650d-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 05/14] KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

A subsequent change to KVM will move the tear down of an unlinked
stage-2 subtree out of the critical path of the break-before-make
sequence.

Introduce a new helper for tearing down unlinked stage-2 subtrees.
Leverage the existing stage-2 free walkers to do so, with a deep call
into __kvm_pgtable_walk() as the subtree is no longer reachable from the
root.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h | 11 +++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 23 +++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index a752793482cb..93b1feeaebab 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -333,6 +333,17 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
  */
 void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
 
+/**
+ * kvm_pgtable_stage2_free_removed() - Free a removed stage-2 paging structure.
+ * @mm_ops:	Memory management callbacks.
+ * @pgtable:	Unlinked stage-2 paging structure to be freed.
+ * @level:	Level of the stage-2 paging structure to be freed.
+ *
+ * The page-table is assumed to be unreachable by any hardware walkers prior to
+ * freeing and therefore no TLB invalidation is performed.
+ */
+void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level);
+
 /**
  * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
  * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 93989b750a26..363a5cce7e1a 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1203,3 +1203,26 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 	pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
 	pgt->pgd = NULL;
 }
+
+void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level)
+{
+	kvm_pte_t *ptep = (kvm_pte_t *)pgtable;
+	struct kvm_pgtable_walker walker = {
+		.cb	= stage2_free_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF |
+			  KVM_PGTABLE_WALK_TABLE_POST,
+	};
+	struct kvm_pgtable_walk_data data = {
+		.walker	= &walker,
+
+		/*
+		 * At this point the IPA really doesn't matter, as the page
+		 * table being traversed has already been removed from the stage
+		 * 2. Set an appropriate range to cover the entire page table.
+		 */
+		.addr	= 0,
+		.end	= kvm_granule_size(level),
+	};
+
+	WARN_ON(__kvm_pgtable_walk(&data, mm_ops, ptep, level));
+}
-- 
2.38.1.431.g37b22c650d-goog


^ permalink raw reply related	[flat|nested] 156+ messages in thread
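
A minimal sketch of the intended call pattern for the helper introduced above,
to make the ordering concrete: the caller replaces the PTE first and only then
hands the detached subtree to kvm_pgtable_stage2_free_removed(). Everything
except that final call is illustrative (the series itself reaches the helper
through an mm_ops callback added in a later patch), so treat this as a sketch
rather than code from the series.

/*
 * Illustrative only: retire a table that break-before-make has already
 * unlinked. The wrapper name and arguments other than the final call are
 * hypothetical.
 */
static void example_retire_table(struct kvm_pgtable_mm_ops *mm_ops,
				 kvm_pte_t *ptep, kvm_pte_t new_pte,
				 void *old_table, u32 level)
{
	/* Break: the old table entry was zapped and the TLBs invalidated. */

	/* Make: publish the replacement entry. */
	WRITE_ONCE(*ptep, new_pte);

	/*
	 * Teardown runs strictly after the new entry is visible. No TLB
	 * invalidation is needed here because hardware walkers can no
	 * longer reach old_table.
	 */
	kvm_pgtable_stage2_free_removed(mm_ops, old_table, level);
}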

* [PATCH v5 06/14] KVM: arm64: Use an opaque type for pteps
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

Use an opaque type for pteps and require that visitors explicitly
dereference the pointer before use. Protecting page table memory with
RCU requires that KVM dereference RCU-annotated pointers before using
them. However, RCU is not available for use in the nVHE hypervisor, so
the opaque type can be conditionally annotated with RCU for the stage-2
MMU.

Call the type a 'pteref' to avoid a naming collision with raw pteps. No
functional change intended.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h |  9 ++++++++-
 arch/arm64/kvm/hyp/pgtable.c         | 27 ++++++++++++++-------------
 arch/arm64/kvm/mmu.c                 |  2 +-
 3 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 93b1feeaebab..cbd2851eefc1 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
 
 typedef u64 kvm_pte_t;
 
+typedef kvm_pte_t *kvm_pteref_t;
+
+static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
+{
+	return pteref;
+}
+
 #define KVM_PTE_VALID			BIT(0)
 
 #define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
@@ -175,7 +182,7 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
 struct kvm_pgtable {
 	u32					ia_bits;
 	u32					start_level;
-	kvm_pte_t				*pgd;
+	kvm_pteref_t				pgd;
 	struct kvm_pgtable_mm_ops		*mm_ops;
 
 	/* Stage-2 only */
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 363a5cce7e1a..7511494537e5 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -175,13 +175,14 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
-			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level);
+			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level);
 
 static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 				      struct kvm_pgtable_mm_ops *mm_ops,
-				      kvm_pte_t *ptep, u32 level)
+				      kvm_pteref_t pteref, u32 level)
 {
 	enum kvm_pgtable_walk_flags flags = data->walker->flags;
+	kvm_pte_t *ptep = kvm_dereference_pteref(pteref, false);
 	struct kvm_pgtable_visit_ctx ctx = {
 		.ptep	= ptep,
 		.old	= READ_ONCE(*ptep),
@@ -193,7 +194,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 		.flags	= flags,
 	};
 	int ret = 0;
-	kvm_pte_t *childp;
+	kvm_pteref_t childp;
 	bool table = kvm_pte_table(ctx.old, level);
 
 	if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
@@ -214,7 +215,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 		goto out;
 	}
 
-	childp = kvm_pte_follow(ctx.old, mm_ops);
+	childp = (kvm_pteref_t)kvm_pte_follow(ctx.old, mm_ops);
 	ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
 	if (ret)
 		goto out;
@@ -227,7 +228,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
-			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level)
+			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level)
 {
 	u32 idx;
 	int ret = 0;
@@ -236,12 +237,12 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
 		return -EINVAL;
 
 	for (idx = kvm_pgtable_idx(data, level); idx < PTRS_PER_PTE; ++idx) {
-		kvm_pte_t *ptep = &pgtable[idx];
+		kvm_pteref_t pteref = &pgtable[idx];
 
 		if (data->addr >= data->end)
 			break;
 
-		ret = __kvm_pgtable_visit(data, mm_ops, ptep, level);
+		ret = __kvm_pgtable_visit(data, mm_ops, pteref, level);
 		if (ret)
 			break;
 	}
@@ -262,9 +263,9 @@ static int _kvm_pgtable_walk(struct kvm_pgtable *pgt, struct kvm_pgtable_walk_da
 		return -EINVAL;
 
 	for (idx = kvm_pgd_page_idx(pgt, data->addr); data->addr < data->end; ++idx) {
-		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
+		kvm_pteref_t pteref = &pgt->pgd[idx * PTRS_PER_PTE];
 
-		ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
+		ret = __kvm_pgtable_walk(data, pgt->mm_ops, pteref, pgt->start_level);
 		if (ret)
 			break;
 	}
@@ -507,7 +508,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 {
 	u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
 
-	pgt->pgd = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
+	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_page(NULL);
 	if (!pgt->pgd)
 		return -ENOMEM;
 
@@ -544,7 +545,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 	};
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
-	pgt->mm_ops->put_page(pgt->pgd);
+	pgt->mm_ops->put_page(kvm_dereference_pteref(pgt->pgd, false));
 	pgt->pgd = NULL;
 }
 
@@ -1157,7 +1158,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
 
 	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
-	pgt->pgd = mm_ops->zalloc_pages_exact(pgd_sz);
+	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_pages_exact(pgd_sz);
 	if (!pgt->pgd)
 		return -ENOMEM;
 
@@ -1200,7 +1201,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 
 	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
 	pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
-	pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
+	pgt->mm_ops->free_pages_exact(kvm_dereference_pteref(pgt->pgd, false), pgd_sz);
 	pgt->pgd = NULL;
 }
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 60ee3d9f01f8..5e197ae190ef 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -640,7 +640,7 @@ static struct kvm_pgtable_mm_ops kvm_user_mm_ops = {
 static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 {
 	struct kvm_pgtable pgt = {
-		.pgd		= (kvm_pte_t *)kvm->mm->pgd,
+		.pgd		= (kvm_pteref_t)kvm->mm->pgd,
 		.ia_bits	= VA_BITS,
 		.start_level	= (KVM_PGTABLE_MAX_LEVELS -
 				   CONFIG_PGTABLE_LEVELS),
-- 
2.38.1.431.g37b22c650d-goog


^ permalink raw reply related	[flat|nested] 156+ messages in thread
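
For the stage-2 side, one plausible shape of the RCU-annotated variant is
sketched below, assuming a later patch annotates the typedef with __rcu; the
exact implementation is not reproduced here, and the nVHE hypervisor keeps
the plain-pointer stub from the patch above.

/* Hypothetical RCU-annotated pteref for the stage-2 MMU (sketch only). */
#include <linux/rcupdate.h>

typedef kvm_pte_t __rcu *kvm_pteref_t;

static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
{
	/*
	 * Shared walkers are expected to run inside an RCU read-side
	 * critical section; exclusive walkers hold the write lock, so the
	 * plain dereference is acceptable and !shared silences lockdep.
	 */
	return rcu_dereference_check(pteref, !shared);
}

The 'shared' argument mirrors the stub above and is what would let a single
helper serve both exclusive and shared walkers.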

* [PATCH v5 07/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

The break-before-make sequence is a bit annoying as it opens a window
wherein memory is unmapped from the guest. KVM should replace the PTE
as quickly as possible and avoid unnecessary work in between.

Presently, the stage-2 map walker tears down a removed table before
installing a block mapping when coalescing a table into a block. As the
removed table is no longer visible to hardware walkers after the
DSB+TLBI, it is possible to move the remaining cleanup to happen after
installing the new PTE.

Reshuffle the stage-2 map walker to install the new block entry in
the pre-order callback. Unwire all of the teardown logic and replace
it with a call to kvm_pgtable_stage2_free_removed() after fixing
the PTE. The post-order visitor is now completely unnecessary, so drop
it. Finally, touch up the comments to better represent the now
simplified map walker.

Note that the call to tear down the unlinked stage-2 table is indirected
through mm_ops, as a subsequent change will use an RCU callback to trigger
the teardown. RCU is not available to pKVM, so pKVM and non-pKVM VMs need
different implementations.
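
For reference, the pre-order visitor after this patch condenses to roughly
the following sketch (comments from the real function trimmed):

  static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
				       struct stage2_map_data *data)
  {
	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
	kvm_pte_t *childp = kvm_pte_follow(ctx->old, mm_ops);
	int ret;

	if (!stage2_leaf_mapping_allowed(ctx, data))
		return 0;

	kvm_clear_pte(ctx->ptep);			/* break */
	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);	/* invalidate */

	ret = stage2_map_walker_try_leaf(ctx, data);	/* make */

	mm_ops->put_page(ctx->ptep);
	mm_ops->free_removed_table(childp, ctx->level);	/* deferred teardown */

	return ret;
  }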

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h  |  3 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  6 ++
 arch/arm64/kvm/hyp/pgtable.c          | 85 +++++++--------------------
 arch/arm64/kvm/mmu.c                  |  8 +++
 4 files changed, 39 insertions(+), 63 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index cbd2851eefc1..e70cf57b719e 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -92,6 +92,8 @@ static inline bool kvm_level_supports_block_mapping(u32 level)
  *				allocation is physically contiguous.
  * @free_pages_exact:		Free an exact number of memory pages previously
  *				allocated by zalloc_pages_exact.
+ * @free_removed_table:		Free a removed paging structure by unlinking and
+ *				dropping references.
  * @get_page:			Increment the refcount on a page.
  * @put_page:			Decrement the refcount on a page. When the
  *				refcount reaches 0 the page is automatically
@@ -110,6 +112,7 @@ struct kvm_pgtable_mm_ops {
 	void*		(*zalloc_page)(void *arg);
 	void*		(*zalloc_pages_exact)(size_t size);
 	void		(*free_pages_exact)(void *addr, size_t size);
+	void		(*free_removed_table)(void *addr, u32 level);
 	void		(*get_page)(void *addr);
 	void		(*put_page)(void *addr);
 	int		(*page_count)(void *addr);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index d21d1b08a055..735769886b55 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -79,6 +79,11 @@ static void host_s2_put_page(void *addr)
 	hyp_put_page(&host_s2_pool, addr);
 }
 
+static void host_s2_free_removed_table(void *addr, u32 level)
+{
+	kvm_pgtable_stage2_free_removed(&host_kvm.mm_ops, addr, level);
+}
+
 static int prepare_s2_pool(void *pgt_pool_base)
 {
 	unsigned long nr_pages, pfn;
@@ -93,6 +98,7 @@ static int prepare_s2_pool(void *pgt_pool_base)
 	host_kvm.mm_ops = (struct kvm_pgtable_mm_ops) {
 		.zalloc_pages_exact = host_s2_zalloc_pages_exact,
 		.zalloc_page = host_s2_zalloc_page,
+		.free_removed_table = host_s2_free_removed_table,
 		.phys_to_virt = hyp_phys_to_virt,
 		.virt_to_phys = hyp_virt_to_phys,
 		.page_count = hyp_page_count,
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 7511494537e5..7c9782347570 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -750,13 +750,13 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
 				     struct stage2_map_data *data)
 {
-	if (data->anchor)
-		return 0;
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
+	kvm_pte_t *childp = kvm_pte_follow(ctx->old, mm_ops);
+	int ret;
 
 	if (!stage2_leaf_mapping_allowed(ctx, data))
 		return 0;
 
-	data->childp = kvm_pte_follow(ctx->old, ctx->mm_ops);
 	kvm_clear_pte(ctx->ptep);
 
 	/*
@@ -765,8 +765,13 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
 	 * individually.
 	 */
 	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
-	data->anchor = ctx->ptep;
-	return 0;
+
+	ret = stage2_map_walker_try_leaf(ctx, data);
+
+	mm_ops->put_page(ctx->ptep);
+	mm_ops->free_removed_table(childp, ctx->level);
+
+	return ret;
 }
 
 static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
@@ -776,13 +781,6 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	kvm_pte_t *childp;
 	int ret;
 
-	if (data->anchor) {
-		if (stage2_pte_is_counted(ctx->old))
-			mm_ops->put_page(ctx->ptep);
-
-		return 0;
-	}
-
 	ret = stage2_map_walker_try_leaf(ctx, data);
 	if (ret != -E2BIG)
 		return ret;
@@ -811,49 +809,14 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	return 0;
 }
 
-static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
-				      struct stage2_map_data *data)
-{
-	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
-	kvm_pte_t *childp;
-	int ret = 0;
-
-	if (!data->anchor)
-		return 0;
-
-	if (data->anchor == ctx->ptep) {
-		childp = data->childp;
-		data->anchor = NULL;
-		data->childp = NULL;
-		ret = stage2_map_walk_leaf(ctx, data);
-	} else {
-		childp = kvm_pte_follow(ctx->old, mm_ops);
-	}
-
-	mm_ops->put_page(childp);
-	mm_ops->put_page(ctx->ptep);
-
-	return ret;
-}
-
 /*
- * This is a little fiddly, as we use all three of the walk flags. The idea
- * is that the TABLE_PRE callback runs for table entries on the way down,
- * looking for table entries which we could conceivably replace with a
- * block entry for this mapping. If it finds one, then it sets the 'anchor'
- * field in 'struct stage2_map_data' to point at the table entry, before
- * clearing the entry to zero and descending into the now detached table.
- *
- * The behaviour of the LEAF callback then depends on whether or not the
- * anchor has been set. If not, then we're not using a block mapping higher
- * up the table and we perform the mapping at the existing leaves instead.
- * If, on the other hand, the anchor _is_ set, then we drop references to
- * all valid leaves so that the pages beneath the anchor can be freed.
+ * The TABLE_PRE callback runs for table entries on the way down, looking
+ * for table entries which we could conceivably replace with a block entry
+ * for this mapping. If it finds one it replaces the entry and calls
+ * kvm_pgtable_mm_ops::free_removed_table() to tear down the detached table.
  *
- * Finally, the TABLE_POST callback does nothing if the anchor has not
- * been set, but otherwise frees the page-table pages while walking back up
- * the page-table, installing the block entry when it revisits the anchor
- * pointer and clearing the anchor to NULL.
+ * Otherwise, the LEAF callback performs the mapping at the existing leaves
+ * instead.
  */
 static int stage2_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 			     enum kvm_pgtable_walk_flags visit)
@@ -865,11 +828,9 @@ static int stage2_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 		return stage2_map_walk_table_pre(ctx, data);
 	case KVM_PGTABLE_WALK_LEAF:
 		return stage2_map_walk_leaf(ctx, data);
-	case KVM_PGTABLE_WALK_TABLE_POST:
-		return stage2_map_walk_table_post(ctx, data);
+	default:
+		return -EINVAL;
 	}
-
-	return -EINVAL;
 }
 
 int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
@@ -886,8 +847,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_map_walker,
 		.flags		= KVM_PGTABLE_WALK_TABLE_PRE |
-				  KVM_PGTABLE_WALK_LEAF |
-				  KVM_PGTABLE_WALK_TABLE_POST,
+				  KVM_PGTABLE_WALK_LEAF,
 		.arg		= &map_data,
 	};
 
@@ -917,8 +877,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_map_walker,
 		.flags		= KVM_PGTABLE_WALK_TABLE_PRE |
-				  KVM_PGTABLE_WALK_LEAF |
-				  KVM_PGTABLE_WALK_TABLE_POST,
+				  KVM_PGTABLE_WALK_LEAF,
 		.arg		= &map_data,
 	};
 
@@ -1207,7 +1166,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 
 void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level)
 {
-	kvm_pte_t *ptep = (kvm_pte_t *)pgtable;
+	kvm_pteref_t ptep = (kvm_pteref_t)pgtable;
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF |
@@ -1225,5 +1184,5 @@ void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pg
 		.end	= kvm_granule_size(level),
 	};
 
-	WARN_ON(__kvm_pgtable_walk(&data, mm_ops, ptep, level));
+	WARN_ON(__kvm_pgtable_walk(&data, mm_ops, ptep, level + 1));
 }
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 5e197ae190ef..73ae908eb5d9 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -128,6 +128,13 @@ static void kvm_s2_free_pages_exact(void *virt, size_t size)
 	free_pages_exact(virt, size);
 }
 
+static struct kvm_pgtable_mm_ops kvm_s2_mm_ops;
+
+static void stage2_free_removed_table(void *addr, u32 level)
+{
+	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, addr, level);
+}
+
 static void kvm_host_get_page(void *addr)
 {
 	get_page(virt_to_page(addr));
@@ -662,6 +669,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
 	.zalloc_page		= stage2_memcache_zalloc_page,
 	.zalloc_pages_exact	= kvm_s2_zalloc_pages_exact,
 	.free_pages_exact	= kvm_s2_free_pages_exact,
+	.free_removed_table	= stage2_free_removed_table,
 	.get_page		= kvm_host_get_page,
 	.put_page		= kvm_s2_put_page,
 	.page_count		= kvm_host_page_count,
-- 
2.38.1.431.g37b22c650d-goog



^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
  2022-11-07 21:56 ` Oliver Upton
  (?)
@ 2022-11-07 21:56   ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
release the RCU read lock when traversing the page tables. Defer the
freeing of table memory to an RCU callback. Indirect the calls into RCU
and provide stubs for hypervisor code, as RCU is not available in such a
context.

The RCU protection doesn't amount to much at the moment, as readers are
already protected by the read-write lock (all walkers that free table
memory take the write lock). Nonetheless, a subsequent change will
further relax the locking requirements around the stage-2 MMU, thereby
depending on RCU.
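
A rough sketch of the kernel-side plumbing, condensed from the hunks below:
every walk is bracketed by the RCU read-side critical section, and teardown
of a removed table is pushed into an RCU callback keyed off the page's
rcu_head:

	/* kvm_pgtable_walk(): bracket the walk (no-ops when built for hyp) */
	kvm_pgtable_walk_begin();		/* rcu_read_lock() in the kernel */
	r = _kvm_pgtable_walk(pgt, &walk_data);
	kvm_pgtable_walk_end();			/* rcu_read_unlock() in the kernel */

	/* mmu.c: defer freeing of a removed table past a grace period */
	static void stage2_free_removed_table(void *addr, u32 level)
	{
		struct page *page = virt_to_page(addr);

		set_page_private(page, (unsigned long)level);
		call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
	}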

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h | 49 ++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 10 +++++-
 arch/arm64/kvm/mmu.c                 | 14 +++++++-
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index e70cf57b719e..7634b6964779 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
 
 typedef u64 kvm_pte_t;
 
+/*
+ * RCU cannot be used in a non-kernel context such as the hyp. As such, page
+ * table walkers used in hyp do not call into RCU and instead use other
+ * synchronization mechanisms (such as a spinlock).
+ */
+#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
+
 typedef kvm_pte_t *kvm_pteref_t;
 
 static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
@@ -44,6 +51,40 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
 	return pteref;
 }
 
+static inline void kvm_pgtable_walk_begin(void) {}
+static inline void kvm_pgtable_walk_end(void) {}
+
+static inline bool kvm_pgtable_walk_lock_held(void)
+{
+	return true;
+}
+
+#else
+
+typedef kvm_pte_t __rcu *kvm_pteref_t;
+
+static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
+{
+	return rcu_dereference_check(pteref, !shared);
+}
+
+static inline void kvm_pgtable_walk_begin(void)
+{
+	rcu_read_lock();
+}
+
+static inline void kvm_pgtable_walk_end(void)
+{
+	rcu_read_unlock();
+}
+
+static inline bool kvm_pgtable_walk_lock_held(void)
+{
+	return rcu_read_lock_held();
+}
+
+#endif
+
 #define KVM_PTE_VALID			BIT(0)
 
 #define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
@@ -202,11 +243,14 @@ struct kvm_pgtable {
  *					children.
  * @KVM_PGTABLE_WALK_TABLE_POST:	Visit table entries after their
  *					children.
+ * @KVM_PGTABLE_WALK_SHARED:		Indicates the page-tables may be shared
+ *					with other software walkers.
  */
 enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_LEAF			= BIT(0),
 	KVM_PGTABLE_WALK_TABLE_PRE		= BIT(1),
 	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
+	KVM_PGTABLE_WALK_SHARED			= BIT(3),
 };
 
 struct kvm_pgtable_visit_ctx {
@@ -223,6 +267,11 @@ struct kvm_pgtable_visit_ctx {
 typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
 					enum kvm_pgtable_walk_flags visit);
 
+static inline bool kvm_pgtable_walk_shared(const struct kvm_pgtable_visit_ctx *ctx)
+{
+	return ctx->flags & KVM_PGTABLE_WALK_SHARED;
+}
+
 /**
  * struct kvm_pgtable_walker - Hook into a page-table walk.
  * @cb:		Callback function to invoke during the walk.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 7c9782347570..d8d963521d4e 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -171,6 +171,9 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
 				  enum kvm_pgtable_walk_flags visit)
 {
 	struct kvm_pgtable_walker *walker = data->walker;
+
+	/* Ensure the appropriate lock is held (e.g. RCU lock for stage-2 MMU) */
+	WARN_ON_ONCE(kvm_pgtable_walk_shared(ctx) && !kvm_pgtable_walk_lock_held());
 	return walker->cb(ctx, visit);
 }
 
@@ -281,8 +284,13 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.end	= PAGE_ALIGN(walk_data.addr + size),
 		.walker	= walker,
 	};
+	int r;
+
+	kvm_pgtable_walk_begin();
+	r = _kvm_pgtable_walk(pgt, &walk_data);
+	kvm_pgtable_walk_end();
 
-	return _kvm_pgtable_walk(pgt, &walk_data);
+	return r;
 }
 
 struct leaf_walk_data {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 73ae908eb5d9..52e042399ba5 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -130,9 +130,21 @@ static void kvm_s2_free_pages_exact(void *virt, size_t size)
 
 static struct kvm_pgtable_mm_ops kvm_s2_mm_ops;
 
+static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
+{
+	struct page *page = container_of(head, struct page, rcu_head);
+	void *pgtable = page_to_virt(page);
+	u32 level = page_private(page);
+
+	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, pgtable, level);
+}
+
 static void stage2_free_removed_table(void *addr, u32 level)
 {
-	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, addr, level);
+	struct page *page = virt_to_page(addr);
+
+	set_page_private(page, (unsigned long)level);
+	call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
 }
 
 static void kvm_host_get_page(void *addr)
-- 
2.38.1.431.g37b22c650d-goog


^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: kvm, kvmarm, Ben Gardon, David Matlack, Will Deacon, kvmarm,
	linux-arm-kernel

Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
release the RCU read lock when traversing the page tables. Defer the
freeing of table memory to an RCU callback. Indirect the calls into RCU
and provide stubs for hypervisor code, as RCU is not available in such a
context.

The RCU protection doesn't amount to much at the moment, as readers are
already protected by the read-write lock (all walkers that free table
memory take the write lock). Nonetheless, a subsequent change will
further relax the locking requirements around the stage-2 MMU, thereby
depending on RCU.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h | 49 ++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 10 +++++-
 arch/arm64/kvm/mmu.c                 | 14 +++++++-
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index e70cf57b719e..7634b6964779 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
 
 typedef u64 kvm_pte_t;
 
+/*
+ * RCU cannot be used in a non-kernel context such as the hyp. As such, page
+ * table walkers used in hyp do not call into RCU and instead use other
+ * synchronization mechanisms (such as a spinlock).
+ */
+#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
+
 typedef kvm_pte_t *kvm_pteref_t;
 
 static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
@@ -44,6 +51,40 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
 	return pteref;
 }
 
+static inline void kvm_pgtable_walk_begin(void) {}
+static inline void kvm_pgtable_walk_end(void) {}
+
+static inline bool kvm_pgtable_walk_lock_held(void)
+{
+	return true;
+}
+
+#else
+
+typedef kvm_pte_t __rcu *kvm_pteref_t;
+
+static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
+{
+	return rcu_dereference_check(pteref, !shared);
+}
+
+static inline void kvm_pgtable_walk_begin(void)
+{
+	rcu_read_lock();
+}
+
+static inline void kvm_pgtable_walk_end(void)
+{
+	rcu_read_unlock();
+}
+
+static inline bool kvm_pgtable_walk_lock_held(void)
+{
+	return rcu_read_lock_held();
+}
+
+#endif
+
 #define KVM_PTE_VALID			BIT(0)
 
 #define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
@@ -202,11 +243,14 @@ struct kvm_pgtable {
  *					children.
  * @KVM_PGTABLE_WALK_TABLE_POST:	Visit table entries after their
  *					children.
+ * @KVM_PGTABLE_WALK_SHARED:		Indicates the page-tables may be shared
+ *					with other software walkers.
  */
 enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_LEAF			= BIT(0),
 	KVM_PGTABLE_WALK_TABLE_PRE		= BIT(1),
 	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
+	KVM_PGTABLE_WALK_SHARED			= BIT(3),
 };
 
 struct kvm_pgtable_visit_ctx {
@@ -223,6 +267,11 @@ struct kvm_pgtable_visit_ctx {
 typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
 					enum kvm_pgtable_walk_flags visit);
 
+static inline bool kvm_pgtable_walk_shared(const struct kvm_pgtable_visit_ctx *ctx)
+{
+	return ctx->flags & KVM_PGTABLE_WALK_SHARED;
+}
+
 /**
  * struct kvm_pgtable_walker - Hook into a page-table walk.
  * @cb:		Callback function to invoke during the walk.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 7c9782347570..d8d963521d4e 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -171,6 +171,9 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
 				  enum kvm_pgtable_walk_flags visit)
 {
 	struct kvm_pgtable_walker *walker = data->walker;
+
+	/* Ensure the appropriate lock is held (e.g. RCU lock for stage-2 MMU) */
+	WARN_ON_ONCE(kvm_pgtable_walk_shared(ctx) && !kvm_pgtable_walk_lock_held());
 	return walker->cb(ctx, visit);
 }
 
@@ -281,8 +284,13 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.end	= PAGE_ALIGN(walk_data.addr + size),
 		.walker	= walker,
 	};
+	int r;
+
+	kvm_pgtable_walk_begin();
+	r = _kvm_pgtable_walk(pgt, &walk_data);
+	kvm_pgtable_walk_end();
 
-	return _kvm_pgtable_walk(pgt, &walk_data);
+	return r;
 }
 
 struct leaf_walk_data {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 73ae908eb5d9..52e042399ba5 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -130,9 +130,21 @@ static void kvm_s2_free_pages_exact(void *virt, size_t size)
 
 static struct kvm_pgtable_mm_ops kvm_s2_mm_ops;
 
+static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
+{
+	struct page *page = container_of(head, struct page, rcu_head);
+	void *pgtable = page_to_virt(page);
+	u32 level = page_private(page);
+
+	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, pgtable, level);
+}
+
 static void stage2_free_removed_table(void *addr, u32 level)
 {
-	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, addr, level);
+	struct page *page = virt_to_page(addr);
+
+	set_page_private(page, (unsigned long)level);
+	call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
 }
 
 static void kvm_host_get_page(void *addr)
-- 
2.38.1.431.g37b22c650d-goog


^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
release the RCU read lock when traversing the page tables. Defer the
freeing of table memory to an RCU callback. Indirect the calls into RCU
and provide stubs for hypervisor code, as RCU is not available in such a
context.

The RCU protection doesn't amount to much at the moment, as readers are
already protected by the read-write lock (all walkers that free table
memory take the write lock). Nonetheless, a subsequent change will
further relax the locking requirements around the stage-2 MMU, thereby
depending on RCU.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h | 49 ++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 10 +++++-
 arch/arm64/kvm/mmu.c                 | 14 +++++++-
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index e70cf57b719e..7634b6964779 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
 
 typedef u64 kvm_pte_t;
 
+/*
+ * RCU cannot be used in a non-kernel context such as the hyp. As such, page
+ * table walkers used in hyp do not call into RCU and instead use other
+ * synchronization mechanisms (such as a spinlock).
+ */
+#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
+
 typedef kvm_pte_t *kvm_pteref_t;
 
 static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
@@ -44,6 +51,40 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
 	return pteref;
 }
 
+static inline void kvm_pgtable_walk_begin(void) {}
+static inline void kvm_pgtable_walk_end(void) {}
+
+static inline bool kvm_pgtable_walk_lock_held(void)
+{
+	return true;
+}
+
+#else
+
+typedef kvm_pte_t __rcu *kvm_pteref_t;
+
+static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
+{
+	return rcu_dereference_check(pteref, !shared);
+}
+
+static inline void kvm_pgtable_walk_begin(void)
+{
+	rcu_read_lock();
+}
+
+static inline void kvm_pgtable_walk_end(void)
+{
+	rcu_read_unlock();
+}
+
+static inline bool kvm_pgtable_walk_lock_held(void)
+{
+	return rcu_read_lock_held();
+}
+
+#endif
+
 #define KVM_PTE_VALID			BIT(0)
 
 #define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
@@ -202,11 +243,14 @@ struct kvm_pgtable {
  *					children.
  * @KVM_PGTABLE_WALK_TABLE_POST:	Visit table entries after their
  *					children.
+ * @KVM_PGTABLE_WALK_SHARED:		Indicates the page-tables may be shared
+ *					with other software walkers.
  */
 enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_LEAF			= BIT(0),
 	KVM_PGTABLE_WALK_TABLE_PRE		= BIT(1),
 	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
+	KVM_PGTABLE_WALK_SHARED			= BIT(3),
 };
 
 struct kvm_pgtable_visit_ctx {
@@ -223,6 +267,11 @@ struct kvm_pgtable_visit_ctx {
 typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
 					enum kvm_pgtable_walk_flags visit);
 
+static inline bool kvm_pgtable_walk_shared(const struct kvm_pgtable_visit_ctx *ctx)
+{
+	return ctx->flags & KVM_PGTABLE_WALK_SHARED;
+}
+
 /**
  * struct kvm_pgtable_walker - Hook into a page-table walk.
  * @cb:		Callback function to invoke during the walk.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 7c9782347570..d8d963521d4e 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -171,6 +171,9 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
 				  enum kvm_pgtable_walk_flags visit)
 {
 	struct kvm_pgtable_walker *walker = data->walker;
+
+	/* Ensure the appropriate lock is held (e.g. RCU lock for stage-2 MMU) */
+	WARN_ON_ONCE(kvm_pgtable_walk_shared(ctx) && !kvm_pgtable_walk_lock_held());
 	return walker->cb(ctx, visit);
 }
 
@@ -281,8 +284,13 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.end	= PAGE_ALIGN(walk_data.addr + size),
 		.walker	= walker,
 	};
+	int r;
+
+	kvm_pgtable_walk_begin();
+	r = _kvm_pgtable_walk(pgt, &walk_data);
+	kvm_pgtable_walk_end();
 
-	return _kvm_pgtable_walk(pgt, &walk_data);
+	return r;
 }
 
 struct leaf_walk_data {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 73ae908eb5d9..52e042399ba5 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -130,9 +130,21 @@ static void kvm_s2_free_pages_exact(void *virt, size_t size)
 
 static struct kvm_pgtable_mm_ops kvm_s2_mm_ops;
 
+static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
+{
+	struct page *page = container_of(head, struct page, rcu_head);
+	void *pgtable = page_to_virt(page);
+	u32 level = page_private(page);
+
+	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, pgtable, level);
+}
+
 static void stage2_free_removed_table(void *addr, u32 level)
 {
-	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, addr, level);
+	struct page *page = virt_to_page(addr);
+
+	set_page_private(page, (unsigned long)level);
+	call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
 }
 
 static void kvm_host_get_page(void *addr)
-- 
2.38.1.431.g37b22c650d-goog



^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 09/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks
  2022-11-07 21:56 ` Oliver Upton
  (?)
@ 2022-11-07 21:56   ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

The stage2 attr walker is already used for parallel walks. Since commit
f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission relaxation
during dirty logging"), KVM acquires the read lock when
write-unprotecting a PTE. However, the walker only uses a simple store
to update the PTE. This is safe as the only possible race is with
hardware updates to the access flag, which is benign.

However, a subsequent change to KVM will allow more changes to the stage
2 page tables to be done in parallel. Prepare the stage 2 attribute
walker by performing atomic updates to the PTE when walking in parallel.
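
The change hinges on a single helper, reproduced below with added commentary:
exclusive walkers keep using a plain store, while shared walkers only succeed
if the PTE still holds the value snapshotted when it was visited, and the
walker bails out with -EAGAIN otherwise:

	static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
	{
		/* Exclusive walkers (write lock held) may store directly. */
		if (!kvm_pgtable_walk_shared(ctx)) {
			WRITE_ONCE(*ctx->ptep, new);
			return true;
		}

		/* Shared walkers must not clobber a concurrently updated PTE. */
		return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
	}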

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index d8d963521d4e..a34e2050f931 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -185,7 +185,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 				      kvm_pteref_t pteref, u32 level)
 {
 	enum kvm_pgtable_walk_flags flags = data->walker->flags;
-	kvm_pte_t *ptep = kvm_dereference_pteref(pteref, false);
+	kvm_pte_t *ptep = kvm_dereference_pteref(pteref, flags & KVM_PGTABLE_WALK_SHARED);
 	struct kvm_pgtable_visit_ctx ctx = {
 		.ptep	= ptep,
 		.old	= READ_ONCE(*ptep),
@@ -675,6 +675,16 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
 	return !!pte;
 }
 
+static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
+{
+	if (!kvm_pgtable_walk_shared(ctx)) {
+		WRITE_ONCE(*ctx->ptep, new);
+		return true;
+	}
+
+	return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
+}
+
 static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
 			   struct kvm_pgtable_mm_ops *mm_ops)
 {
@@ -986,7 +996,9 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
 		    stage2_pte_executable(pte) && !stage2_pte_executable(ctx->old))
 			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
 						  kvm_granule_size(ctx->level));
-		WRITE_ONCE(*ctx->ptep, pte);
+
+		if (!stage2_try_set_pte(ctx, pte))
+			return -EAGAIN;
 	}
 
 	return 0;
@@ -995,7 +1007,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
 static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 				    u64 size, kvm_pte_t attr_set,
 				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte,
-				    u32 *level)
+				    u32 *level, enum kvm_pgtable_walk_flags flags)
 {
 	int ret;
 	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
@@ -1006,7 +1018,7 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_attr_walker,
 		.arg		= &data,
-		.flags		= KVM_PGTABLE_WALK_LEAF,
+		.flags		= flags | KVM_PGTABLE_WALK_LEAF,
 	};
 
 	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
@@ -1025,14 +1037,14 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	return stage2_update_leaf_attrs(pgt, addr, size, 0,
 					KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
-					NULL, NULL);
+					NULL, NULL, 0);
 }
 
 kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
 	stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
-				 &pte, NULL);
+				 &pte, NULL, 0);
 	dsb(ishst);
 	return pte;
 }
@@ -1041,7 +1053,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
 	stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
-				 &pte, NULL);
+				 &pte, NULL, 0);
 	/*
 	 * "But where's the TLBI?!", you scream.
 	 * "Over in the core code", I sigh.
@@ -1054,7 +1066,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
 bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
-	stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL);
+	stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
 	return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
 }
 
@@ -1077,7 +1089,8 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 	if (prot & KVM_PGTABLE_PROT_X)
 		clr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
 
-	ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level);
+	ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level,
+				       KVM_PGTABLE_WALK_SHARED);
 	if (!ret)
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, pgt->mmu, addr, level);
 	return ret;
-- 
2.38.1.431.g37b22c650d-goog


^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 09/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: kvm, kvmarm, Ben Gardon, David Matlack, Will Deacon, kvmarm,
	linux-arm-kernel

The stage2 attr walker is already used for parallel walks. Since commit
f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission relaxation
during dirty logging"), KVM acquires the read lock when
write-unprotecting a PTE. However, the walker only uses a simple store
to update the PTE. This is safe as the only possible race is with
hardware updates to the access flag, which is benign.

However, a subsequent change to KVM will allow more changes to the stage
2 page tables to be done in parallel. Prepare the stage 2 attribute
walker by performing atomic updates to the PTE when walking in parallel.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index d8d963521d4e..a34e2050f931 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -185,7 +185,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 				      kvm_pteref_t pteref, u32 level)
 {
 	enum kvm_pgtable_walk_flags flags = data->walker->flags;
-	kvm_pte_t *ptep = kvm_dereference_pteref(pteref, false);
+	kvm_pte_t *ptep = kvm_dereference_pteref(pteref, flags & KVM_PGTABLE_WALK_SHARED);
 	struct kvm_pgtable_visit_ctx ctx = {
 		.ptep	= ptep,
 		.old	= READ_ONCE(*ptep),
@@ -675,6 +675,16 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
 	return !!pte;
 }
 
+static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
+{
+	if (!kvm_pgtable_walk_shared(ctx)) {
+		WRITE_ONCE(*ctx->ptep, new);
+		return true;
+	}
+
+	return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
+}
+
 static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
 			   struct kvm_pgtable_mm_ops *mm_ops)
 {
@@ -986,7 +996,9 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
 		    stage2_pte_executable(pte) && !stage2_pte_executable(ctx->old))
 			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
 						  kvm_granule_size(ctx->level));
-		WRITE_ONCE(*ctx->ptep, pte);
+
+		if (!stage2_try_set_pte(ctx, pte))
+			return -EAGAIN;
 	}
 
 	return 0;
@@ -995,7 +1007,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
 static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 				    u64 size, kvm_pte_t attr_set,
 				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte,
-				    u32 *level)
+				    u32 *level, enum kvm_pgtable_walk_flags flags)
 {
 	int ret;
 	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
@@ -1006,7 +1018,7 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_attr_walker,
 		.arg		= &data,
-		.flags		= KVM_PGTABLE_WALK_LEAF,
+		.flags		= flags | KVM_PGTABLE_WALK_LEAF,
 	};
 
 	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
@@ -1025,14 +1037,14 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	return stage2_update_leaf_attrs(pgt, addr, size, 0,
 					KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
-					NULL, NULL);
+					NULL, NULL, 0);
 }
 
 kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
 	stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
-				 &pte, NULL);
+				 &pte, NULL, 0);
 	dsb(ishst);
 	return pte;
 }
@@ -1041,7 +1053,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
 	stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
-				 &pte, NULL);
+				 &pte, NULL, 0);
 	/*
 	 * "But where's the TLBI?!", you scream.
 	 * "Over in the core code", I sigh.
@@ -1054,7 +1066,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
 bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
-	stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL);
+	stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
 	return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
 }
 
@@ -1077,7 +1089,8 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 	if (prot & KVM_PGTABLE_PROT_X)
 		clr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
 
-	ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level);
+	ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level,
+				       KVM_PGTABLE_WALK_SHARED);
 	if (!ret)
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, pgt->mmu, addr, level);
 	return ret;
-- 
2.38.1.431.g37b22c650d-goog


^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 09/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

The stage2 attr walker is already used for parallel walks. Since commit
f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission relaxation
during dirty logging"), KVM acquires the read lock when
write-unprotecting a PTE. However, the walker only uses a simple store
to update the PTE. This is safe as the only possible race is with
hardware updates to the access flag, which is benign.

However, a subsequent change to KVM will allow more changes to the stage
2 page tables to be done in parallel. Prepare the stage 2 attribute
walker by performing atomic updates to the PTE when walking in parallel.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index d8d963521d4e..a34e2050f931 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -185,7 +185,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 				      kvm_pteref_t pteref, u32 level)
 {
 	enum kvm_pgtable_walk_flags flags = data->walker->flags;
-	kvm_pte_t *ptep = kvm_dereference_pteref(pteref, false);
+	kvm_pte_t *ptep = kvm_dereference_pteref(pteref, flags & KVM_PGTABLE_WALK_SHARED);
 	struct kvm_pgtable_visit_ctx ctx = {
 		.ptep	= ptep,
 		.old	= READ_ONCE(*ptep),
@@ -675,6 +675,16 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
 	return !!pte;
 }
 
+static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
+{
+	if (!kvm_pgtable_walk_shared(ctx)) {
+		WRITE_ONCE(*ctx->ptep, new);
+		return true;
+	}
+
+	return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
+}
+
 static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
 			   struct kvm_pgtable_mm_ops *mm_ops)
 {
@@ -986,7 +996,9 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
 		    stage2_pte_executable(pte) && !stage2_pte_executable(ctx->old))
 			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
 						  kvm_granule_size(ctx->level));
-		WRITE_ONCE(*ctx->ptep, pte);
+
+		if (!stage2_try_set_pte(ctx, pte))
+			return -EAGAIN;
 	}
 
 	return 0;
@@ -995,7 +1007,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
 static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 				    u64 size, kvm_pte_t attr_set,
 				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte,
-				    u32 *level)
+				    u32 *level, enum kvm_pgtable_walk_flags flags)
 {
 	int ret;
 	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
@@ -1006,7 +1018,7 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_attr_walker,
 		.arg		= &data,
-		.flags		= KVM_PGTABLE_WALK_LEAF,
+		.flags		= flags | KVM_PGTABLE_WALK_LEAF,
 	};
 
 	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
@@ -1025,14 +1037,14 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	return stage2_update_leaf_attrs(pgt, addr, size, 0,
 					KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
-					NULL, NULL);
+					NULL, NULL, 0);
 }
 
 kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
 	stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
-				 &pte, NULL);
+				 &pte, NULL, 0);
 	dsb(ishst);
 	return pte;
 }
@@ -1041,7 +1053,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
 	stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
-				 &pte, NULL);
+				 &pte, NULL, 0);
 	/*
 	 * "But where's the TLBI?!", you scream.
 	 * "Over in the core code", I sigh.
@@ -1054,7 +1066,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
 bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
-	stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL);
+	stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
 	return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
 }
 
@@ -1077,7 +1089,8 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 	if (prot & KVM_PGTABLE_PROT_X)
 		clr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
 
-	ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level);
+	ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level,
+				       KVM_PGTABLE_WALK_SHARED);
 	if (!ret)
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, pgt->mmu, addr, level);
 	return ret;
-- 
2.38.1.431.g37b22c650d-goog



^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 10/14] KVM: arm64: Split init and set for table PTE
  2022-11-07 21:56 ` Oliver Upton
  (?)
@ 2022-11-07 21:56   ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

Create a helper to initialize a table and directly call
smp_store_release() to install it (for now). Prepare for a subsequent
change that generalizes PTE writes with a helper.
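
The resulting call pattern at both map walkers (taken from the hunks below)
is shown here; a later patch swaps the bare smp_store_release() for a
generic PTE installation helper:

	kvm_pte_t new = kvm_init_table_pte(childp, mm_ops);	/* build the table PTE */

	mm_ops->get_page(ctx->ptep);
	smp_store_release(ctx->ptep, new);			/* install it (for now) */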

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index a34e2050f931..f4dd77c6c97d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -136,16 +136,13 @@ static void kvm_clear_pte(kvm_pte_t *ptep)
 	WRITE_ONCE(*ptep, 0);
 }
 
-static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
-			      struct kvm_pgtable_mm_ops *mm_ops)
+static kvm_pte_t kvm_init_table_pte(kvm_pte_t *childp, struct kvm_pgtable_mm_ops *mm_ops)
 {
-	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
+	kvm_pte_t pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
 
 	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
 	pte |= KVM_PTE_VALID;
-
-	WARN_ON(kvm_pte_valid(old));
-	smp_store_release(ptep, pte);
+	return pte;
 }
 
 static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
@@ -413,7 +410,7 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 			  enum kvm_pgtable_walk_flags visit)
 {
-	kvm_pte_t *childp;
+	kvm_pte_t *childp, new;
 	struct hyp_map_data *data = ctx->arg;
 	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 
@@ -427,8 +424,10 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	if (!childp)
 		return -ENOMEM;
 
-	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
+	new = kvm_init_table_pte(childp, mm_ops);
 	mm_ops->get_page(ctx->ptep);
+	smp_store_release(ctx->ptep, new);
+
 	return 0;
 }
 
@@ -796,7 +795,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 				struct stage2_map_data *data)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
-	kvm_pte_t *childp;
+	kvm_pte_t *childp, new;
 	int ret;
 
 	ret = stage2_map_walker_try_leaf(ctx, data);
@@ -821,8 +820,9 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	if (stage2_pte_is_counted(ctx->old))
 		stage2_put_pte(ctx, data->mmu, mm_ops);
 
-	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
+	new = kvm_init_table_pte(childp, mm_ops);
 	mm_ops->get_page(ctx->ptep);
+	smp_store_release(ctx->ptep, new);
 
 	return 0;
 }
-- 
2.38.1.431.g37b22c650d-goog


^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 10/14] KVM: arm64: Split init and set for table PTE
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: kvm, kvmarm, Ben Gardon, David Matlack, Will Deacon, kvmarm,
	linux-arm-kernel

Create a helper to initialize a table and directly call
smp_store_release() to install it (for now). Prepare for a subsequent
change that generalizes PTE writes with a helper.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index a34e2050f931..f4dd77c6c97d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -136,16 +136,13 @@ static void kvm_clear_pte(kvm_pte_t *ptep)
 	WRITE_ONCE(*ptep, 0);
 }
 
-static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
-			      struct kvm_pgtable_mm_ops *mm_ops)
+static kvm_pte_t kvm_init_table_pte(kvm_pte_t *childp, struct kvm_pgtable_mm_ops *mm_ops)
 {
-	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
+	kvm_pte_t pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
 
 	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
 	pte |= KVM_PTE_VALID;
-
-	WARN_ON(kvm_pte_valid(old));
-	smp_store_release(ptep, pte);
+	return pte;
 }
 
 static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
@@ -413,7 +410,7 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 			  enum kvm_pgtable_walk_flags visit)
 {
-	kvm_pte_t *childp;
+	kvm_pte_t *childp, new;
 	struct hyp_map_data *data = ctx->arg;
 	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 
@@ -427,8 +424,10 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	if (!childp)
 		return -ENOMEM;
 
-	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
+	new = kvm_init_table_pte(childp, mm_ops);
 	mm_ops->get_page(ctx->ptep);
+	smp_store_release(ctx->ptep, new);
+
 	return 0;
 }
 
@@ -796,7 +795,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 				struct stage2_map_data *data)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
-	kvm_pte_t *childp;
+	kvm_pte_t *childp, new;
 	int ret;
 
 	ret = stage2_map_walker_try_leaf(ctx, data);
@@ -821,8 +820,9 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	if (stage2_pte_is_counted(ctx->old))
 		stage2_put_pte(ctx, data->mmu, mm_ops);
 
-	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
+	new = kvm_init_table_pte(childp, mm_ops);
 	mm_ops->get_page(ctx->ptep);
+	smp_store_release(ctx->ptep, new);
 
 	return 0;
 }
-- 
2.38.1.431.g37b22c650d-goog


^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 10/14] KVM: arm64: Split init and set for table PTE
@ 2022-11-07 21:56   ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:56 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

Create a helper to initialize a table and directly call
smp_store_release() to install it (for now). Prepare for a subsequent
change that generalizes PTE writes with a helper.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index a34e2050f931..f4dd77c6c97d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -136,16 +136,13 @@ static void kvm_clear_pte(kvm_pte_t *ptep)
 	WRITE_ONCE(*ptep, 0);
 }
 
-static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
-			      struct kvm_pgtable_mm_ops *mm_ops)
+static kvm_pte_t kvm_init_table_pte(kvm_pte_t *childp, struct kvm_pgtable_mm_ops *mm_ops)
 {
-	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
+	kvm_pte_t pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
 
 	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
 	pte |= KVM_PTE_VALID;
-
-	WARN_ON(kvm_pte_valid(old));
-	smp_store_release(ptep, pte);
+	return pte;
 }
 
 static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
@@ -413,7 +410,7 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 			  enum kvm_pgtable_walk_flags visit)
 {
-	kvm_pte_t *childp;
+	kvm_pte_t *childp, new;
 	struct hyp_map_data *data = ctx->arg;
 	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 
@@ -427,8 +424,10 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	if (!childp)
 		return -ENOMEM;
 
-	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
+	new = kvm_init_table_pte(childp, mm_ops);
 	mm_ops->get_page(ctx->ptep);
+	smp_store_release(ctx->ptep, new);
+
 	return 0;
 }
 
@@ -796,7 +795,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 				struct stage2_map_data *data)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
-	kvm_pte_t *childp;
+	kvm_pte_t *childp, new;
 	int ret;
 
 	ret = stage2_map_walker_try_leaf(ctx, data);
@@ -821,8 +820,9 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	if (stage2_pte_is_counted(ctx->old))
 		stage2_put_pte(ctx, data->mmu, mm_ops);
 
-	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
+	new = kvm_init_table_pte(childp, mm_ops);
 	mm_ops->get_page(ctx->ptep);
+	smp_store_release(ctx->ptep, new);
 
 	return 0;
 }
-- 
2.38.1.431.g37b22c650d-goog



^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 11/14] KVM: arm64: Make block->table PTE changes parallel-aware
  2022-11-07 21:56 ` Oliver Upton
  (?)
@ 2022-11-07 21:58   ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:58 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: kvm, kvmarm, Ben Gardon, David Matlack, Will Deacon, kvmarm,
	linux-arm-kernel

In order to service stage-2 faults in parallel, stage-2 table walkers
must take exclusive ownership of the PTE being worked on. An additional
requirement of the architecture is that software must perform a
'break-before-make' operation when changing the block size used for
mapping memory.

Roll these two concepts together into helpers for performing a
'break-before-make' sequence. Use a special PTE value to indicate a PTE
has been locked by a software walker. Additionally, use an atomic
compare-exchange to 'break' the PTE when the stage-2 page tables are
possibly shared with another software walker. Elide the DSB + TLBI if
the evicted PTE was invalid (and thus not subject to break-before-make).

All of the atomics do nothing for now, as the stage-2 walker isn't fully
ready to perform parallel walks.
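
Taken together, a block-to-table replacement under the new helpers follows
the shape below (condensed from the stage2_map_walk_leaf() hunk); the locked
PTE value keeps other software walkers off the entry for the whole sequence:

	/* 'break': lock the PTE (cmpxchg when shared), TLBI, drop the refcount */
	if (!stage2_try_break_pte(ctx, data->mmu)) {
		mm_ops->put_page(childp);
		return -EAGAIN;		/* lost the race with another walker */
	}

	/* 'make': publish the replacement entry and take the new reference */
	new = kvm_init_table_pte(childp, mm_ops);
	stage2_make_pte(ctx, new);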

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 80 +++++++++++++++++++++++++++++++++---
 1 file changed, 75 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index f4dd77c6c97d..b9f0d792b8d9 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -49,6 +49,12 @@
 #define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
 #define KVM_MAX_OWNER_ID		1
 
+/*
+ * Used to indicate a pte for which a 'break-before-make' sequence is in
+ * progress.
+ */
+#define KVM_INVALID_PTE_LOCKED		BIT(10)
+
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable_walker	*walker;
 
@@ -674,6 +680,11 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
 	return !!pte;
 }
 
+static bool stage2_pte_is_locked(kvm_pte_t pte)
+{
+	return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
+}
+
 static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
 {
 	if (!kvm_pgtable_walk_shared(ctx)) {
@@ -684,6 +695,64 @@ static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_
 	return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
 }
 
+/**
+ * stage2_try_break_pte() - Invalidates a pte according to the
+ *			    'break-before-make' requirements of the
+ *			    architecture.
+ *
+ * @ctx: context of the visited pte.
+ * @mmu: stage-2 mmu
+ *
+ * Returns: true if the pte was successfully broken.
+ *
+ * If the removed pte was valid, performs the necessary serialization and TLB
+ * invalidation for the old value. For counted ptes, drops the reference count
+ * on the containing table page.
+ */
+static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
+				 struct kvm_s2_mmu *mmu)
+{
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
+
+	if (stage2_pte_is_locked(ctx->old)) {
+		/*
+		 * Should never occur if this walker has exclusive access to the
+		 * page tables.
+		 */
+		WARN_ON(!kvm_pgtable_walk_shared(ctx));
+		return false;
+	}
+
+	if (!stage2_try_set_pte(ctx, KVM_INVALID_PTE_LOCKED))
+		return false;
+
+	/*
+	 * Perform the appropriate TLB invalidation based on the evicted pte
+	 * value (if any).
+	 */
+	if (kvm_pte_table(ctx->old, ctx->level))
+		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
+	else if (kvm_pte_valid(ctx->old))
+		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
+
+	if (stage2_pte_is_counted(ctx->old))
+		mm_ops->put_page(ctx->ptep);
+
+	return true;
+}
+
+static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
+{
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
+
+	WARN_ON(!stage2_pte_is_locked(*ctx->ptep));
+
+	if (stage2_pte_is_counted(new))
+		mm_ops->get_page(ctx->ptep);
+
+	smp_store_release(ctx->ptep, new);
+}
+
 static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
 			   struct kvm_pgtable_mm_ops *mm_ops)
 {
@@ -812,17 +881,18 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	if (!childp)
 		return -ENOMEM;
 
+	if (!stage2_try_break_pte(ctx, data->mmu)) {
+		mm_ops->put_page(childp);
+		return -EAGAIN;
+	}
+
 	/*
 	 * If we've run into an existing block mapping then replace it with
 	 * a table. Accesses beyond 'end' that fall within the new table
 	 * will be mapped lazily.
 	 */
-	if (stage2_pte_is_counted(ctx->old))
-		stage2_put_pte(ctx, data->mmu, mm_ops);
-
 	new = kvm_init_table_pte(childp, mm_ops);
-	mm_ops->get_page(ctx->ptep);
-	smp_store_release(ctx->ptep, new);
+	stage2_make_pte(ctx, new);
 
 	return 0;
 }
-- 
2.38.1.431.g37b22c650d-goog

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 12/14] KVM: arm64: Make leaf->leaf PTE changes parallel-aware
  2022-11-07 21:56 ` Oliver Upton
@ 2022-11-07 21:59   ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 21:59 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: kvm, kvmarm, Ben Gardon, David Matlack, Will Deacon, kvmarm,
	linux-arm-kernel

Convert stage2_map_walker_try_leaf() to use the new break-before-make
helpers, thereby making the handler parallel-aware. As before, avoid the
break-before-make if recreating the existing mapping. Additionally,
retry execution if another vCPU thread is modifying the same PTE.
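
Condensed, the updated handler behaves as sketched below (hypothetical helper
name; the CMOs and data->phys bookkeeping of the real code are omitted):

static int try_install_leaf(const struct kvm_pgtable_visit_ctx *ctx,
			    struct stage2_map_data *data, kvm_pte_t new)
{
	/*
	 * Recreating the same mapping, or only changing permissions?
	 * Skip the update; permissions are relaxed on the other path.
	 */
	if (!stage2_pte_needs_update(ctx->old, new))
		return -EAGAIN;

	/*
	 * Another vCPU thread may hold this PTE. If the break fails,
	 * return -EAGAIN so the guest simply takes the fault again.
	 */
	if (!stage2_try_break_pte(ctx, data->mmu))
		return -EAGAIN;

	stage2_make_pte(ctx, new);
	return 0;
}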

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b9f0d792b8d9..238f29389617 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -804,18 +804,17 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	else
 		new = kvm_init_invalid_leaf_owner(data->owner_id);
 
-	if (stage2_pte_is_counted(ctx->old)) {
-		/*
-		 * Skip updating the PTE if we are trying to recreate the exact
-		 * same mapping or only change the access permissions. Instead,
-		 * the vCPU will exit one more time from guest if still needed
-		 * and then go through the path of relaxing permissions.
-		 */
-		if (!stage2_pte_needs_update(ctx->old, new))
-			return -EAGAIN;
+	/*
+	 * Skip updating the PTE if we are trying to recreate the exact
+	 * same mapping or only change the access permissions. Instead,
+	 * the vCPU will exit one more time from guest if still needed
+	 * and then go through the path of relaxing permissions.
+	 */
+	if (!stage2_pte_needs_update(ctx->old, new))
+		return -EAGAIN;
 
-		stage2_put_pte(ctx, data->mmu, mm_ops);
-	}
+	if (!stage2_try_break_pte(ctx, data->mmu))
+		return -EAGAIN;
 
 	/* Perform CMOs before installation of the guest stage-2 PTE */
 	if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
@@ -825,9 +824,8 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
 		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
 
-	smp_store_release(ctx->ptep, new);
-	if (stage2_pte_is_counted(new))
-		mm_ops->get_page(ctx->ptep);
+	stage2_make_pte(ctx, new);
+
 	if (kvm_phys_is_valid(phys))
 		data->phys += granule;
 	return 0;
-- 
2.38.1.431.g37b22c650d-goog

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 13/14] KVM: arm64: Make table->block changes parallel-aware
  2022-11-07 21:56 ` Oliver Upton
@ 2022-11-07 22:00   ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 22:00 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

stage2_map_walker_try_leaf() and friends now handle stage-2 PTEs
generically, and perform the correct flush when a table PTE is removed.
Additionally, they've been made parallel-aware, using an atomic break
to take ownership of the PTE.

Stop clearing the PTE in the pre-order callback and instead let
stage2_map_walker_try_leaf() deal with it.
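
For reference, the pre-order callback ends up reading as below (the hunk
further down, reproduced with explanatory comments; childp and mm_ops are the
function's existing locals):

	if (!stage2_leaf_mapping_allowed(ctx, data))
		return 0;

	/*
	 * break-before-make now happens inside the leaf handler: the table
	 * PTE is locked, the TLBs are invalidated for the whole VMID (the
	 * table case in stage2_try_break_pte()), and the block PTE is
	 * installed in its place.
	 */
	ret = stage2_map_walker_try_leaf(ctx, data);
	if (ret)
		return ret;

	/* Only then is the now-unreachable subtree torn down. */
	mm_ops->free_removed_table(childp, ctx->level);
	return 0;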

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/hyp/pgtable.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 238f29389617..f814422ef795 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -841,21 +841,12 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
 	if (!stage2_leaf_mapping_allowed(ctx, data))
 		return 0;
 
-	kvm_clear_pte(ctx->ptep);
-
-	/*
-	 * Invalidate the whole stage-2, as we may have numerous leaf
-	 * entries below us which would otherwise need invalidating
-	 * individually.
-	 */
-	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
-
 	ret = stage2_map_walker_try_leaf(ctx, data);
+	if (ret)
+		return ret;
 
-	mm_ops->put_page(ctx->ptep);
 	mm_ops->free_removed_table(childp, ctx->level);
-
-	return ret;
+	return 0;
 }
 
 static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
-- 
2.38.1.431.g37b22c650d-goog


^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v5 14/14] KVM: arm64: Handle stage-2 faults in parallel
  2022-11-07 21:56 ` Oliver Upton
@ 2022-11-07 22:00   ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-07 22:00 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm, Oliver Upton

The stage-2 map walker has been made parallel-aware, and as such can be
called while only holding the read side of the MMU lock. Rip out the
conditional locking in user_mem_abort() and instead grab the read lock.
Continue to take the write lock from other callsites to
kvm_pgtable_stage2_map().
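
The fault path in user_mem_abort() then boils down to the shape below (trimmed
from the hunks further down; the mmu_seq retry check and dirty-page handling in
between are unchanged and omitted here):

	read_lock(&kvm->mmu_lock);

	if (fault_status == FSC_PERM && vma_pagesize == fault_granule)
		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
	else
		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
					     __pfn_to_phys(pfn), prot,
					     memcache, KVM_PGTABLE_WALK_SHARED);

	read_unlock(&kvm->mmu_lock);

	/* -EAGAIN means another walker won a race; not an error, just re-fault. */
	return ret != -EAGAIN ? ret : 0;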

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h  |  3 ++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  2 +-
 arch/arm64/kvm/hyp/pgtable.c          |  5 +++--
 arch/arm64/kvm/mmu.c                  | 31 ++++++---------------------
 4 files changed, 13 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 7634b6964779..a874ce0ce7b5 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -412,6 +412,7 @@ void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pg
  * @prot:	Permissions and attributes for the mapping.
  * @mc:		Cache of pre-allocated and zeroed memory from which to allocate
  *		page-table pages.
+ * @flags:	Flags to control the page-table walk (ex. a shared walk)
  *
  * The offset of @addr within a page is ignored, @size is rounded-up to
  * the next page boundary and @phys is rounded-down to the previous page
@@ -433,7 +434,7 @@ void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pg
  */
 int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   u64 phys, enum kvm_pgtable_prot prot,
-			   void *mc);
+			   void *mc, enum kvm_pgtable_walk_flags flags);
 
 /**
  * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 735769886b55..f6d82bf33ce1 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -257,7 +257,7 @@ static inline int __host_stage2_idmap(u64 start, u64 end,
 				      enum kvm_pgtable_prot prot)
 {
 	return kvm_pgtable_stage2_map(&host_kvm.pgt, start, end - start, start,
-				      prot, &host_s2_pool);
+				      prot, &host_s2_pool, 0);
 }
 
 /*
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index f814422ef795..5bca9610d040 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -912,7 +912,7 @@ static int stage2_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 
 int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   u64 phys, enum kvm_pgtable_prot prot,
-			   void *mc)
+			   void *mc, enum kvm_pgtable_walk_flags flags)
 {
 	int ret;
 	struct stage2_map_data map_data = {
@@ -923,7 +923,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_map_walker,
-		.flags		= KVM_PGTABLE_WALK_TABLE_PRE |
+		.flags		= flags |
+				  KVM_PGTABLE_WALK_TABLE_PRE |
 				  KVM_PGTABLE_WALK_LEAF,
 		.arg		= &map_data,
 	};
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 52e042399ba5..410c2a37fe32 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -861,7 +861,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 
 		write_lock(&kvm->mmu_lock);
 		ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot,
-					     &cache);
+					     &cache, 0);
 		write_unlock(&kvm->mmu_lock);
 		if (ret)
 			break;
@@ -1156,7 +1156,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
-	bool use_read_lock = false;
 	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	unsigned long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
@@ -1191,8 +1190,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (logging_active) {
 		force_pte = true;
 		vma_shift = PAGE_SHIFT;
-		use_read_lock = (fault_status == FSC_PERM && write_fault &&
-				 fault_granule == PAGE_SIZE);
 	} else {
 		vma_shift = get_vma_page_shift(vma, hva);
 	}
@@ -1291,15 +1288,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (exec_fault && device)
 		return -ENOEXEC;
 
-	/*
-	 * To reduce MMU contentions and enhance concurrency during dirty
-	 * logging dirty logging, only acquire read lock for permission
-	 * relaxation.
-	 */
-	if (use_read_lock)
-		read_lock(&kvm->mmu_lock);
-	else
-		write_lock(&kvm->mmu_lock);
+	read_lock(&kvm->mmu_lock);
 	pgt = vcpu->arch.hw_mmu->pgt;
 	if (mmu_invalidate_retry(kvm, mmu_seq))
 		goto out_unlock;
@@ -1343,15 +1332,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault_status == FSC_PERM && vma_pagesize == fault_granule) {
+	if (fault_status == FSC_PERM && vma_pagesize == fault_granule)
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
-	} else {
-		WARN_ONCE(use_read_lock, "Attempted stage-2 map outside of write lock\n");
-
+	else
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
 					     __pfn_to_phys(pfn), prot,
-					     memcache);
-	}
+					     memcache, KVM_PGTABLE_WALK_SHARED);
 
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
@@ -1360,10 +1346,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 out_unlock:
-	if (use_read_lock)
-		read_unlock(&kvm->mmu_lock);
-	else
-		write_unlock(&kvm->mmu_lock);
+	read_unlock(&kvm->mmu_lock);
 	kvm_set_pfn_accessed(pfn);
 	kvm_release_pfn_clean(pfn);
 	return ret != -EAGAIN ? ret : 0;
@@ -1569,7 +1552,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	 */
 	kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
 			       PAGE_SIZE, __pfn_to_phys(pfn),
-			       KVM_PGTABLE_PROT_R, NULL);
+			       KVM_PGTABLE_PROT_R, NULL, 0);
 
 	return false;
 }
-- 
2.38.1.431.g37b22c650d-goog


^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
  2022-11-07 21:56   ` Oliver Upton
@ 2022-11-09 21:53     ` Sean Christopherson
  -1 siblings, 0 replies; 156+ messages in thread
From: Sean Christopherson @ 2022-11-09 21:53 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, Marc Zyngier, Will Deacon, kvmarm, Ben Gardon,
	David Matlack, kvmarm, linux-arm-kernel

On Mon, Nov 07, 2022, Oliver Upton wrote:
> Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> release the RCU read lock when traversing the page tables. Defer the
> freeing of table memory to an RCU callback. Indirect the calls into RCU
> and provide stubs for hypervisor code, as RCU is not available in such a
> context.
> 
> The RCU protection doesn't amount to much at the moment, as readers are
> already protected by the read-write lock (all walkers that free table
> memory take the write lock). Nonetheless, a subsequent change will
> further relax the locking requirements around the stage-2 MMU, thereby
> depending on RCU.

Two somewhat off-topic questions (because I'm curious):

 1. Are there plans to enable "fast" page faults on ARM?  E.g. to fixup access
    faults (handle_access_fault()) and/or write-protection faults without acquiring
    mmu_lock?

 2. If the answer to (1) is "yes!", what's the plan to protect the lockless walks
    for the RCU-less hypervisor code?

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 01/14] KVM: arm64: Combine visitor arguments into a context structure
  2022-11-07 21:56   ` Oliver Upton
@ 2022-11-09 22:23     ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Passing new arguments by value to the visitor callbacks is extremely
> inflexible for stuffing new parameters used by only some of the
> visitors. Use a context structure instead and pass the pointer through
> to the visitor callback.
>
> While at it, redefine the 'flags' parameter to the visitor to contain
> the bit indicating the phase of the walk. Pass the entire set of flags
> through the context structure such that the walker can communicate
> additional state to the visitor callback.
>
> No functional change intended.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

This looks good to me. It's all fairly mechanical and I don't see any
problems. I was a little confused by the walk context flags passed via
visit, because they seem somewhat redundant if the leaf-ness can be
determined by looking at the PTE, but perhaps that's not always
possible.

Reviewed-by: Ben Gardon <bgardon@google.com>


> ---
>  arch/arm64/include/asm/kvm_pgtable.h  |  15 +-
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |  10 +-
>  arch/arm64/kvm/hyp/nvhe/setup.c       |  16 +-
>  arch/arm64/kvm/hyp/pgtable.c          | 269 +++++++++++++-------------
>  4 files changed, 154 insertions(+), 156 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 3252eb50ecfe..607f9bb8aab4 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -199,10 +199,17 @@ enum kvm_pgtable_walk_flags {
>         KVM_PGTABLE_WALK_TABLE_POST             = BIT(2),
>  };
>
> -typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
> -                                       kvm_pte_t *ptep,
> -                                       enum kvm_pgtable_walk_flags flag,
> -                                       void * const arg);
> +struct kvm_pgtable_visit_ctx {
> +       kvm_pte_t                               *ptep;
> +       void                                    *arg;
> +       u64                                     addr;
> +       u64                                     end;
> +       u32                                     level;
> +       enum kvm_pgtable_walk_flags             flags;
> +};
> +
> +typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
> +                                       enum kvm_pgtable_walk_flags visit);
>
>  /**
>   * struct kvm_pgtable_walker - Hook into a page-table walk.
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 1e78acf9662e..8f5b6a36a039 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -417,13 +417,11 @@ struct check_walk_data {
>         enum pkvm_page_state    (*get_page_state)(kvm_pte_t pte);
>  };
>
> -static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
> -                                     kvm_pte_t *ptep,
> -                                     enum kvm_pgtable_walk_flags flag,
> -                                     void * const arg)
> +static int __check_page_state_visitor(const struct kvm_pgtable_visit_ctx *ctx,
> +                                     enum kvm_pgtable_walk_flags visit)
>  {
> -       struct check_walk_data *d = arg;
> -       kvm_pte_t pte = *ptep;
> +       struct check_walk_data *d = ctx->arg;
> +       kvm_pte_t pte = *ctx->ptep;
>
>         if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
>                 return -EINVAL;
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index e8d4ea2fcfa0..a293cf5eba1b 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -186,15 +186,13 @@ static void hpool_put_page(void *addr)
>         hyp_put_page(&hpool, addr);
>  }
>
> -static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
> -                                        kvm_pte_t *ptep,
> -                                        enum kvm_pgtable_walk_flags flag,
> -                                        void * const arg)
> +static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                                        enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = arg;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
>         enum kvm_pgtable_prot prot;
>         enum pkvm_page_state state;
> -       kvm_pte_t pte = *ptep;
> +       kvm_pte_t pte = *ctx->ptep;
>         phys_addr_t phys;
>
>         if (!kvm_pte_valid(pte))
> @@ -205,11 +203,11 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
>          * was unable to access the hyp_vmemmap and so the buddy allocator has
>          * initialised the refcount to '1'.
>          */
> -       mm_ops->get_page(ptep);
> -       if (flag != KVM_PGTABLE_WALK_LEAF)
> +       mm_ops->get_page(ctx->ptep);
> +       if (visit != KVM_PGTABLE_WALK_LEAF)
>                 return 0;
>
> -       if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
> +       if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
>                 return -EINVAL;
>
>         phys = kvm_pte_to_phys(pte);
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index cdf8e76b0be1..900c8b9c0cfc 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -64,20 +64,20 @@ static bool kvm_phys_is_valid(u64 phys)
>         return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
>  }
>
> -static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
> +static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx, u64 phys)
>  {
> -       u64 granule = kvm_granule_size(level);
> +       u64 granule = kvm_granule_size(ctx->level);
>
> -       if (!kvm_level_supports_block_mapping(level))
> +       if (!kvm_level_supports_block_mapping(ctx->level))
>                 return false;
>
> -       if (granule > (end - addr))
> +       if (granule > (ctx->end - ctx->addr))
>                 return false;
>
>         if (kvm_phys_is_valid(phys) && !IS_ALIGNED(phys, granule))
>                 return false;
>
> -       return IS_ALIGNED(addr, granule);
> +       return IS_ALIGNED(ctx->addr, granule);
>  }
>
>  static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
> @@ -172,12 +172,12 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
>         return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
>  }
>
> -static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> -                                 u32 level, kvm_pte_t *ptep,
> -                                 enum kvm_pgtable_walk_flags flag)
> +static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
> +                                 const struct kvm_pgtable_visit_ctx *ctx,
> +                                 enum kvm_pgtable_walk_flags visit)
>  {
>         struct kvm_pgtable_walker *walker = data->walker;
> -       return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
> +       return walker->cb(ctx, visit);
>  }
>
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> @@ -186,20 +186,24 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>  static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                                       kvm_pte_t *ptep, u32 level)
>  {
> +       enum kvm_pgtable_walk_flags flags = data->walker->flags;
> +       struct kvm_pgtable_visit_ctx ctx = {
> +               .ptep   = ptep,
> +               .arg    = data->walker->arg,
> +               .addr   = data->addr,
> +               .end    = data->end,
> +               .level  = level,
> +               .flags  = flags,
> +       };
>         int ret = 0;
> -       u64 addr = data->addr;
>         kvm_pte_t *childp, pte = *ptep;
>         bool table = kvm_pte_table(pte, level);
> -       enum kvm_pgtable_walk_flags flags = data->walker->flags;
>
> -       if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
> -               ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -                                            KVM_PGTABLE_WALK_TABLE_PRE);
> -       }
> +       if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
> +               ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);
>
> -       if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
> -               ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -                                            KVM_PGTABLE_WALK_LEAF);
> +       if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
> +               ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
>                 pte = *ptep;
>                 table = kvm_pte_table(pte, level);
>         }
> @@ -218,10 +222,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>         if (ret)
>                 goto out;
>
> -       if (flags & KVM_PGTABLE_WALK_TABLE_POST) {
> -               ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -                                            KVM_PGTABLE_WALK_TABLE_POST);
> -       }
> +       if (ctx.flags & KVM_PGTABLE_WALK_TABLE_POST)
> +               ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_POST);
>
>  out:
>         return ret;
> @@ -292,13 +294,13 @@ struct leaf_walk_data {
>         u32             level;
>  };
>
> -static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                      enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                      enum kvm_pgtable_walk_flags visit)
>  {
> -       struct leaf_walk_data *data = arg;
> +       struct leaf_walk_data *data = ctx->arg;
>
> -       data->pte   = *ptep;
> -       data->level = level;
> +       data->pte   = *ctx->ptep;
> +       data->level = ctx->level;
>
>         return 0;
>  }
> @@ -383,47 +385,47 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
>         return prot;
>  }
>
> -static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> -                                   kvm_pte_t *ptep, struct hyp_map_data *data)
> +static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
> +                                   struct hyp_map_data *data)
>  {
> -       kvm_pte_t new, old = *ptep;
> -       u64 granule = kvm_granule_size(level), phys = data->phys;
> +       kvm_pte_t new, old = *ctx->ptep;
> +       u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>
> -       if (!kvm_block_mapping_supported(addr, end, phys, level))
> +       if (!kvm_block_mapping_supported(ctx, phys))
>                 return false;
>
>         data->phys += granule;
> -       new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +       new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
>         if (old == new)
>                 return true;
>         if (!kvm_pte_valid(old))
> -               data->mm_ops->get_page(ptep);
> +               data->mm_ops->get_page(ctx->ptep);
>         else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>                 return false;
>
> -       smp_store_release(ptep, new);
> +       smp_store_release(ctx->ptep, new);
>         return true;
>  }
>
> -static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                         enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                         enum kvm_pgtable_walk_flags visit)
>  {
>         kvm_pte_t *childp;
> -       struct hyp_map_data *data = arg;
> +       struct hyp_map_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> -       if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
> +       if (hyp_map_walker_try_leaf(ctx, data))
>                 return 0;
>
> -       if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +       if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>                 return -EINVAL;
>
>         childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
>         if (!childp)
>                 return -ENOMEM;
>
> -       kvm_set_table_pte(ptep, childp, mm_ops);
> -       mm_ops->get_page(ptep);
> +       kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> +       mm_ops->get_page(ctx->ptep);
>         return 0;
>  }
>
> @@ -456,39 +458,39 @@ struct hyp_unmap_data {
>         struct kvm_pgtable_mm_ops       *mm_ops;
>  };
>
> -static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                           enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                           enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t pte = *ptep, *childp = NULL;
> -       u64 granule = kvm_granule_size(level);
> -       struct hyp_unmap_data *data = arg;
> +       kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +       u64 granule = kvm_granule_size(ctx->level);
> +       struct hyp_unmap_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
>         if (!kvm_pte_valid(pte))
>                 return -EINVAL;
>
> -       if (kvm_pte_table(pte, level)) {
> +       if (kvm_pte_table(pte, ctx->level)) {
>                 childp = kvm_pte_follow(pte, mm_ops);
>
>                 if (mm_ops->page_count(childp) != 1)
>                         return 0;
>
> -               kvm_clear_pte(ptep);
> +               kvm_clear_pte(ctx->ptep);
>                 dsb(ishst);
> -               __tlbi_level(vae2is, __TLBI_VADDR(addr, 0), level);
> +               __tlbi_level(vae2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
>         } else {
> -               if (end - addr < granule)
> +               if (ctx->end - ctx->addr < granule)
>                         return -EINVAL;
>
> -               kvm_clear_pte(ptep);
> +               kvm_clear_pte(ctx->ptep);
>                 dsb(ishst);
> -               __tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
> +               __tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
>                 data->unmapped += granule;
>         }
>
>         dsb(ish);
>         isb();
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>
>         if (childp)
>                 mm_ops->put_page(childp);
> @@ -532,18 +534,18 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>         return 0;
>  }
>
> -static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                          enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                          enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = arg;
> -       kvm_pte_t pte = *ptep;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +       kvm_pte_t pte = *ctx->ptep;
>
>         if (!kvm_pte_valid(pte))
>                 return 0;
>
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>
> -       if (kvm_pte_table(pte, level))
> +       if (kvm_pte_table(pte, ctx->level))
>                 mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
>
>         return 0;
> @@ -682,19 +684,19 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
>         return !!pte;
>  }
>
> -static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
> -                          u32 level, struct kvm_pgtable_mm_ops *mm_ops)
> +static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> +                          struct kvm_pgtable_mm_ops *mm_ops)
>  {
>         /*
>          * Clear the existing PTE, and perform break-before-make with
>          * TLB maintenance if it was valid.
>          */
> -       if (kvm_pte_valid(*ptep)) {
> -               kvm_clear_pte(ptep);
> -               kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
> +       if (kvm_pte_valid(*ctx->ptep)) {
> +               kvm_clear_pte(ctx->ptep);
> +               kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
>         }
>
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>  }
>
>  static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
> @@ -708,29 +710,28 @@ static bool stage2_pte_executable(kvm_pte_t pte)
>         return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
>  }
>
> -static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
> +static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
>                                         struct stage2_map_data *data)
>  {
> -       if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
> +       if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
>                 return false;
>
> -       return kvm_block_mapping_supported(addr, end, data->phys, level);
> +       return kvm_block_mapping_supported(ctx, data->phys);
>  }
>
> -static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> -                                     kvm_pte_t *ptep,
> +static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                       struct stage2_map_data *data)
>  {
> -       kvm_pte_t new, old = *ptep;
> -       u64 granule = kvm_granule_size(level), phys = data->phys;
> +       kvm_pte_t new, old = *ctx->ptep;
> +       u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>         struct kvm_pgtable *pgt = data->mmu->pgt;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> -       if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> +       if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return -E2BIG;
>
>         if (kvm_phys_is_valid(phys))
> -               new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +               new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
>         else
>                 new = kvm_init_invalid_leaf_owner(data->owner_id);
>
> @@ -744,7 +745,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>                 if (!stage2_pte_needs_update(old, new))
>                         return -EAGAIN;
>
> -               stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> +               stage2_put_pte(ctx, data->mmu, mm_ops);
>         }
>
>         /* Perform CMOs before installation of the guest stage-2 PTE */
> @@ -755,26 +756,25 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>         if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
>                 mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
>
> -       smp_store_release(ptep, new);
> +       smp_store_release(ctx->ptep, new);
>         if (stage2_pte_is_counted(new))
> -               mm_ops->get_page(ptep);
> +               mm_ops->get_page(ctx->ptep);
>         if (kvm_phys_is_valid(phys))
>                 data->phys += granule;
>         return 0;
>  }
>
> -static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> -                                    kvm_pte_t *ptep,
> +static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>                                      struct stage2_map_data *data)
>  {
>         if (data->anchor)
>                 return 0;
>
> -       if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> +       if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return 0;
>
> -       data->childp = kvm_pte_follow(*ptep, data->mm_ops);
> -       kvm_clear_pte(ptep);
> +       data->childp = kvm_pte_follow(*ctx->ptep, data->mm_ops);
> +       kvm_clear_pte(ctx->ptep);
>
>         /*
>          * Invalidate the whole stage-2, as we may have numerous leaf
> @@ -782,29 +782,29 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>          * individually.
>          */
>         kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> -       data->anchor = ptep;
> +       data->anchor = ctx->ptep;
>         return 0;
>  }
>
> -static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                 struct stage2_map_data *data)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> -       kvm_pte_t *childp, pte = *ptep;
> +       kvm_pte_t *childp, pte = *ctx->ptep;
>         int ret;
>
>         if (data->anchor) {
>                 if (stage2_pte_is_counted(pte))
> -                       mm_ops->put_page(ptep);
> +                       mm_ops->put_page(ctx->ptep);
>
>                 return 0;
>         }
>
> -       ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
> +       ret = stage2_map_walker_try_leaf(ctx, data);
>         if (ret != -E2BIG)
>                 return ret;
>
> -       if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +       if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>                 return -EINVAL;
>
>         if (!data->memcache)
> @@ -820,16 +820,15 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>          * will be mapped lazily.
>          */
>         if (stage2_pte_is_counted(pte))
> -               stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> +               stage2_put_pte(ctx, data->mmu, mm_ops);
>
> -       kvm_set_table_pte(ptep, childp, mm_ops);
> -       mm_ops->get_page(ptep);
> +       kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> +       mm_ops->get_page(ctx->ptep);
>
>         return 0;
>  }
>
> -static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> -                                     kvm_pte_t *ptep,
> +static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>                                       struct stage2_map_data *data)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> @@ -839,17 +838,17 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>         if (!data->anchor)
>                 return 0;
>
> -       if (data->anchor == ptep) {
> +       if (data->anchor == ctx->ptep) {
>                 childp = data->childp;
>                 data->anchor = NULL;
>                 data->childp = NULL;
> -               ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> +               ret = stage2_map_walk_leaf(ctx, data);
>         } else {
> -               childp = kvm_pte_follow(*ptep, mm_ops);
> +               childp = kvm_pte_follow(*ctx->ptep, mm_ops);
>         }
>
>         mm_ops->put_page(childp);
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>
>         return ret;
>  }
> @@ -873,18 +872,18 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>   * the page-table, installing the block entry when it revisits the anchor
>   * pointer and clearing the anchor to NULL.
>   */
> -static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                            enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int stage2_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                            enum kvm_pgtable_walk_flags visit)
>  {
> -       struct stage2_map_data *data = arg;
> +       struct stage2_map_data *data = ctx->arg;
>
> -       switch (flag) {
> +       switch (visit) {
>         case KVM_PGTABLE_WALK_TABLE_PRE:
> -               return stage2_map_walk_table_pre(addr, end, level, ptep, data);
> +               return stage2_map_walk_table_pre(ctx, data);
>         case KVM_PGTABLE_WALK_LEAF:
> -               return stage2_map_walk_leaf(addr, end, level, ptep, data);
> +               return stage2_map_walk_leaf(ctx, data);
>         case KVM_PGTABLE_WALK_TABLE_POST:
> -               return stage2_map_walk_table_post(addr, end, level, ptep, data);
> +               return stage2_map_walk_table_post(ctx, data);
>         }
>
>         return -EINVAL;
> @@ -949,25 +948,24 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>         return ret;
>  }
>
> -static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                              enum kvm_pgtable_walk_flags flag,
> -                              void * const arg)
> +static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                              enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable *pgt = arg;
> +       struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_s2_mmu *mmu = pgt->mmu;
>         struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -       kvm_pte_t pte = *ptep, *childp = NULL;
> +       kvm_pte_t pte = *ctx->ptep, *childp = NULL;
>         bool need_flush = false;
>
>         if (!kvm_pte_valid(pte)) {
>                 if (stage2_pte_is_counted(pte)) {
> -                       kvm_clear_pte(ptep);
> -                       mm_ops->put_page(ptep);
> +                       kvm_clear_pte(ctx->ptep);
> +                       mm_ops->put_page(ctx->ptep);
>                 }
>                 return 0;
>         }
>
> -       if (kvm_pte_table(pte, level)) {
> +       if (kvm_pte_table(pte, ctx->level)) {
>                 childp = kvm_pte_follow(pte, mm_ops);
>
>                 if (mm_ops->page_count(childp) != 1)
> @@ -981,11 +979,11 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>          * block entry and rely on the remaining portions being faulted
>          * back lazily.
>          */
> -       stage2_put_pte(ptep, mmu, addr, level, mm_ops);
> +       stage2_put_pte(ctx, mmu, mm_ops);
>
>         if (need_flush && mm_ops->dcache_clean_inval_poc)
>                 mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> -                                              kvm_granule_size(level));
> +                                              kvm_granule_size(ctx->level));
>
>         if (childp)
>                 mm_ops->put_page(childp);
> @@ -1012,18 +1010,17 @@ struct stage2_attr_data {
>         struct kvm_pgtable_mm_ops       *mm_ops;
>  };
>
> -static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                             enum kvm_pgtable_walk_flags flag,
> -                             void * const arg)
> +static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                             enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t pte = *ptep;
> -       struct stage2_attr_data *data = arg;
> +       kvm_pte_t pte = *ctx->ptep;
> +       struct stage2_attr_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
>         if (!kvm_pte_valid(pte))
>                 return 0;
>
> -       data->level = level;
> +       data->level = ctx->level;
>         data->pte = pte;
>         pte &= ~data->attr_clr;
>         pte |= data->attr_set;
> @@ -1039,10 +1036,10 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>                  * stage-2 PTE if we are going to add executable permission.
>                  */
>                 if (mm_ops->icache_inval_pou &&
> -                   stage2_pte_executable(pte) && !stage2_pte_executable(*ptep))
> +                   stage2_pte_executable(pte) && !stage2_pte_executable(*ctx->ptep))
>                         mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
> -                                                 kvm_granule_size(level));
> -               WRITE_ONCE(*ptep, pte);
> +                                                 kvm_granule_size(ctx->level));
> +               WRITE_ONCE(*ctx->ptep, pte);
>         }
>
>         return 0;
> @@ -1140,20 +1137,19 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
>         return ret;
>  }
>
> -static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                              enum kvm_pgtable_walk_flags flag,
> -                              void * const arg)
> +static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                              enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable *pgt = arg;
> +       struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -       kvm_pte_t pte = *ptep;
> +       kvm_pte_t pte = *ctx->ptep;
>
>         if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
>                 return 0;
>
>         if (mm_ops->dcache_clean_inval_poc)
>                 mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> -                                              kvm_granule_size(level));
> +                                              kvm_granule_size(ctx->level));
>         return 0;
>  }
>
> @@ -1200,19 +1196,18 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>         return 0;
>  }
>
> -static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                             enum kvm_pgtable_walk_flags flag,
> -                             void * const arg)
> +static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                             enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = arg;
> -       kvm_pte_t pte = *ptep;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +       kvm_pte_t pte = *ctx->ptep;
>
>         if (!stage2_pte_is_counted(pte))
>                 return 0;
>
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>
> -       if (kvm_pte_table(pte, level))
> +       if (kvm_pte_table(pte, ctx->level))
>                 mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
>
>         return 0;
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 01/14] KVM: arm64: Combine visitor arguments into a context structure
@ 2022-11-09 22:23     ` Ben Gardon
  0 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Passing new arguments by value to the visitor callbacks is extremely
> inflexible for stuffing new parameters used by only some of the
> visitors. Use a context structure instead and pass the pointer through
> to the visitor callback.
>
> While at it, redefine the 'flags' parameter to the visitor to contain
> the bit indicating the phase of the walk. Pass the entire set of flags
> through the context structure such that the walker can communicate
> additional state to the visitor callback.
>
> No functional change intended.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

This looks good to me. It's all fairly mechanical and I don't see any
problems. I was a little confused by the walk phase being passed to the
visitor via the 'visit' parameter, since it seems somewhat redundant if
the leaf-ness can be determined by looking at the PTE itself, but
perhaps that's not always possible.
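
Looking at the walker code in this patch, the phase does seem to carry
information the PTE alone cannot: a table entry is visited twice (once
at TABLE_PRE and again at TABLE_POST), and a LEAF visit may install a
table, after which the PTE no longer looks like a leaf. A minimal
sketch using the types introduced here, just to illustrate the shape of
a visitor (example_walker is a made-up name, not something from the
patch):

	static int example_walker(const struct kvm_pgtable_visit_ctx *ctx,
				  enum kvm_pgtable_walk_flags visit)
	{
		switch (visit) {
		case KVM_PGTABLE_WALK_TABLE_PRE:
			/* *ctx->ptep is a table entry, the same PTE seen at TABLE_POST. */
			return 0;
		case KVM_PGTABLE_WALK_LEAF:
			/* *ctx->ptep is not (yet) a table entry; the callback may make it one. */
			return 0;
		case KVM_PGTABLE_WALK_TABLE_POST:
			/* Same table PTE as TABLE_PRE, visited after the subtree walk. */
			return 0;
		}

		return -EINVAL;
	}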

Reviewed-by: Ben Gardon <bgardon@google.com>


> ---
>  arch/arm64/include/asm/kvm_pgtable.h  |  15 +-
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |  10 +-
>  arch/arm64/kvm/hyp/nvhe/setup.c       |  16 +-
>  arch/arm64/kvm/hyp/pgtable.c          | 269 +++++++++++++-------------
>  4 files changed, 154 insertions(+), 156 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 3252eb50ecfe..607f9bb8aab4 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -199,10 +199,17 @@ enum kvm_pgtable_walk_flags {
>         KVM_PGTABLE_WALK_TABLE_POST             = BIT(2),
>  };
>
> -typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
> -                                       kvm_pte_t *ptep,
> -                                       enum kvm_pgtable_walk_flags flag,
> -                                       void * const arg);
> +struct kvm_pgtable_visit_ctx {
> +       kvm_pte_t                               *ptep;
> +       void                                    *arg;
> +       u64                                     addr;
> +       u64                                     end;
> +       u32                                     level;
> +       enum kvm_pgtable_walk_flags             flags;
> +};
> +
> +typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
> +                                       enum kvm_pgtable_walk_flags visit);
>
>  /**
>   * struct kvm_pgtable_walker - Hook into a page-table walk.
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 1e78acf9662e..8f5b6a36a039 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -417,13 +417,11 @@ struct check_walk_data {
>         enum pkvm_page_state    (*get_page_state)(kvm_pte_t pte);
>  };
>
> -static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
> -                                     kvm_pte_t *ptep,
> -                                     enum kvm_pgtable_walk_flags flag,
> -                                     void * const arg)
> +static int __check_page_state_visitor(const struct kvm_pgtable_visit_ctx *ctx,
> +                                     enum kvm_pgtable_walk_flags visit)
>  {
> -       struct check_walk_data *d = arg;
> -       kvm_pte_t pte = *ptep;
> +       struct check_walk_data *d = ctx->arg;
> +       kvm_pte_t pte = *ctx->ptep;
>
>         if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
>                 return -EINVAL;
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index e8d4ea2fcfa0..a293cf5eba1b 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -186,15 +186,13 @@ static void hpool_put_page(void *addr)
>         hyp_put_page(&hpool, addr);
>  }
>
> -static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
> -                                        kvm_pte_t *ptep,
> -                                        enum kvm_pgtable_walk_flags flag,
> -                                        void * const arg)
> +static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                                        enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = arg;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
>         enum kvm_pgtable_prot prot;
>         enum pkvm_page_state state;
> -       kvm_pte_t pte = *ptep;
> +       kvm_pte_t pte = *ctx->ptep;
>         phys_addr_t phys;
>
>         if (!kvm_pte_valid(pte))
> @@ -205,11 +203,11 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
>          * was unable to access the hyp_vmemmap and so the buddy allocator has
>          * initialised the refcount to '1'.
>          */
> -       mm_ops->get_page(ptep);
> -       if (flag != KVM_PGTABLE_WALK_LEAF)
> +       mm_ops->get_page(ctx->ptep);
> +       if (visit != KVM_PGTABLE_WALK_LEAF)
>                 return 0;
>
> -       if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
> +       if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
>                 return -EINVAL;
>
>         phys = kvm_pte_to_phys(pte);
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index cdf8e76b0be1..900c8b9c0cfc 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -64,20 +64,20 @@ static bool kvm_phys_is_valid(u64 phys)
>         return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
>  }
>
> -static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
> +static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx, u64 phys)
>  {
> -       u64 granule = kvm_granule_size(level);
> +       u64 granule = kvm_granule_size(ctx->level);
>
> -       if (!kvm_level_supports_block_mapping(level))
> +       if (!kvm_level_supports_block_mapping(ctx->level))
>                 return false;
>
> -       if (granule > (end - addr))
> +       if (granule > (ctx->end - ctx->addr))
>                 return false;
>
>         if (kvm_phys_is_valid(phys) && !IS_ALIGNED(phys, granule))
>                 return false;
>
> -       return IS_ALIGNED(addr, granule);
> +       return IS_ALIGNED(ctx->addr, granule);
>  }
>
>  static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
> @@ -172,12 +172,12 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
>         return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
>  }
>
> -static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> -                                 u32 level, kvm_pte_t *ptep,
> -                                 enum kvm_pgtable_walk_flags flag)
> +static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
> +                                 const struct kvm_pgtable_visit_ctx *ctx,
> +                                 enum kvm_pgtable_walk_flags visit)
>  {
>         struct kvm_pgtable_walker *walker = data->walker;
> -       return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
> +       return walker->cb(ctx, visit);
>  }
>
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> @@ -186,20 +186,24 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>  static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                                       kvm_pte_t *ptep, u32 level)
>  {
> +       enum kvm_pgtable_walk_flags flags = data->walker->flags;
> +       struct kvm_pgtable_visit_ctx ctx = {
> +               .ptep   = ptep,
> +               .arg    = data->walker->arg,
> +               .addr   = data->addr,
> +               .end    = data->end,
> +               .level  = level,
> +               .flags  = flags,
> +       };
>         int ret = 0;
> -       u64 addr = data->addr;
>         kvm_pte_t *childp, pte = *ptep;
>         bool table = kvm_pte_table(pte, level);
> -       enum kvm_pgtable_walk_flags flags = data->walker->flags;
>
> -       if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
> -               ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -                                            KVM_PGTABLE_WALK_TABLE_PRE);
> -       }
> +       if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
> +               ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);
>
> -       if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
> -               ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -                                            KVM_PGTABLE_WALK_LEAF);
> +       if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
> +               ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
>                 pte = *ptep;
>                 table = kvm_pte_table(pte, level);
>         }
> @@ -218,10 +222,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>         if (ret)
>                 goto out;
>
> -       if (flags & KVM_PGTABLE_WALK_TABLE_POST) {
> -               ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -                                            KVM_PGTABLE_WALK_TABLE_POST);
> -       }
> +       if (ctx.flags & KVM_PGTABLE_WALK_TABLE_POST)
> +               ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_POST);
>
>  out:
>         return ret;
> @@ -292,13 +294,13 @@ struct leaf_walk_data {
>         u32             level;
>  };
>
> -static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                      enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                      enum kvm_pgtable_walk_flags visit)
>  {
> -       struct leaf_walk_data *data = arg;
> +       struct leaf_walk_data *data = ctx->arg;
>
> -       data->pte   = *ptep;
> -       data->level = level;
> +       data->pte   = *ctx->ptep;
> +       data->level = ctx->level;
>
>         return 0;
>  }
> @@ -383,47 +385,47 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
>         return prot;
>  }
>
> -static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> -                                   kvm_pte_t *ptep, struct hyp_map_data *data)
> +static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
> +                                   struct hyp_map_data *data)
>  {
> -       kvm_pte_t new, old = *ptep;
> -       u64 granule = kvm_granule_size(level), phys = data->phys;
> +       kvm_pte_t new, old = *ctx->ptep;
> +       u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>
> -       if (!kvm_block_mapping_supported(addr, end, phys, level))
> +       if (!kvm_block_mapping_supported(ctx, phys))
>                 return false;
>
>         data->phys += granule;
> -       new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +       new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
>         if (old == new)
>                 return true;
>         if (!kvm_pte_valid(old))
> -               data->mm_ops->get_page(ptep);
> +               data->mm_ops->get_page(ctx->ptep);
>         else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>                 return false;
>
> -       smp_store_release(ptep, new);
> +       smp_store_release(ctx->ptep, new);
>         return true;
>  }
>
> -static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                         enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                         enum kvm_pgtable_walk_flags visit)
>  {
>         kvm_pte_t *childp;
> -       struct hyp_map_data *data = arg;
> +       struct hyp_map_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> -       if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
> +       if (hyp_map_walker_try_leaf(ctx, data))
>                 return 0;
>
> -       if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +       if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>                 return -EINVAL;
>
>         childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
>         if (!childp)
>                 return -ENOMEM;
>
> -       kvm_set_table_pte(ptep, childp, mm_ops);
> -       mm_ops->get_page(ptep);
> +       kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> +       mm_ops->get_page(ctx->ptep);
>         return 0;
>  }
>
> @@ -456,39 +458,39 @@ struct hyp_unmap_data {
>         struct kvm_pgtable_mm_ops       *mm_ops;
>  };
>
> -static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                           enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                           enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t pte = *ptep, *childp = NULL;
> -       u64 granule = kvm_granule_size(level);
> -       struct hyp_unmap_data *data = arg;
> +       kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +       u64 granule = kvm_granule_size(ctx->level);
> +       struct hyp_unmap_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
>         if (!kvm_pte_valid(pte))
>                 return -EINVAL;
>
> -       if (kvm_pte_table(pte, level)) {
> +       if (kvm_pte_table(pte, ctx->level)) {
>                 childp = kvm_pte_follow(pte, mm_ops);
>
>                 if (mm_ops->page_count(childp) != 1)
>                         return 0;
>
> -               kvm_clear_pte(ptep);
> +               kvm_clear_pte(ctx->ptep);
>                 dsb(ishst);
> -               __tlbi_level(vae2is, __TLBI_VADDR(addr, 0), level);
> +               __tlbi_level(vae2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
>         } else {
> -               if (end - addr < granule)
> +               if (ctx->end - ctx->addr < granule)
>                         return -EINVAL;
>
> -               kvm_clear_pte(ptep);
> +               kvm_clear_pte(ctx->ptep);
>                 dsb(ishst);
> -               __tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
> +               __tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
>                 data->unmapped += granule;
>         }
>
>         dsb(ish);
>         isb();
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>
>         if (childp)
>                 mm_ops->put_page(childp);
> @@ -532,18 +534,18 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>         return 0;
>  }
>
> -static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                          enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                          enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = arg;
> -       kvm_pte_t pte = *ptep;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +       kvm_pte_t pte = *ctx->ptep;
>
>         if (!kvm_pte_valid(pte))
>                 return 0;
>
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>
> -       if (kvm_pte_table(pte, level))
> +       if (kvm_pte_table(pte, ctx->level))
>                 mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
>
>         return 0;
> @@ -682,19 +684,19 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
>         return !!pte;
>  }
>
> -static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
> -                          u32 level, struct kvm_pgtable_mm_ops *mm_ops)
> +static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> +                          struct kvm_pgtable_mm_ops *mm_ops)
>  {
>         /*
>          * Clear the existing PTE, and perform break-before-make with
>          * TLB maintenance if it was valid.
>          */
> -       if (kvm_pte_valid(*ptep)) {
> -               kvm_clear_pte(ptep);
> -               kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
> +       if (kvm_pte_valid(*ctx->ptep)) {
> +               kvm_clear_pte(ctx->ptep);
> +               kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
>         }
>
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>  }
>
>  static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
> @@ -708,29 +710,28 @@ static bool stage2_pte_executable(kvm_pte_t pte)
>         return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
>  }
>
> -static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
> +static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
>                                         struct stage2_map_data *data)
>  {
> -       if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
> +       if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
>                 return false;
>
> -       return kvm_block_mapping_supported(addr, end, data->phys, level);
> +       return kvm_block_mapping_supported(ctx, data->phys);
>  }
>
> -static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> -                                     kvm_pte_t *ptep,
> +static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                       struct stage2_map_data *data)
>  {
> -       kvm_pte_t new, old = *ptep;
> -       u64 granule = kvm_granule_size(level), phys = data->phys;
> +       kvm_pte_t new, old = *ctx->ptep;
> +       u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>         struct kvm_pgtable *pgt = data->mmu->pgt;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> -       if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> +       if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return -E2BIG;
>
>         if (kvm_phys_is_valid(phys))
> -               new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +               new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
>         else
>                 new = kvm_init_invalid_leaf_owner(data->owner_id);
>
> @@ -744,7 +745,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>                 if (!stage2_pte_needs_update(old, new))
>                         return -EAGAIN;
>
> -               stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> +               stage2_put_pte(ctx, data->mmu, mm_ops);
>         }
>
>         /* Perform CMOs before installation of the guest stage-2 PTE */
> @@ -755,26 +756,25 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>         if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
>                 mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
>
> -       smp_store_release(ptep, new);
> +       smp_store_release(ctx->ptep, new);
>         if (stage2_pte_is_counted(new))
> -               mm_ops->get_page(ptep);
> +               mm_ops->get_page(ctx->ptep);
>         if (kvm_phys_is_valid(phys))
>                 data->phys += granule;
>         return 0;
>  }
>
> -static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> -                                    kvm_pte_t *ptep,
> +static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>                                      struct stage2_map_data *data)
>  {
>         if (data->anchor)
>                 return 0;
>
> -       if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> +       if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return 0;
>
> -       data->childp = kvm_pte_follow(*ptep, data->mm_ops);
> -       kvm_clear_pte(ptep);
> +       data->childp = kvm_pte_follow(*ctx->ptep, data->mm_ops);
> +       kvm_clear_pte(ctx->ptep);
>
>         /*
>          * Invalidate the whole stage-2, as we may have numerous leaf
> @@ -782,29 +782,29 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>          * individually.
>          */
>         kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> -       data->anchor = ptep;
> +       data->anchor = ctx->ptep;
>         return 0;
>  }
>
> -static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                 struct stage2_map_data *data)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> -       kvm_pte_t *childp, pte = *ptep;
> +       kvm_pte_t *childp, pte = *ctx->ptep;
>         int ret;
>
>         if (data->anchor) {
>                 if (stage2_pte_is_counted(pte))
> -                       mm_ops->put_page(ptep);
> +                       mm_ops->put_page(ctx->ptep);
>
>                 return 0;
>         }
>
> -       ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
> +       ret = stage2_map_walker_try_leaf(ctx, data);
>         if (ret != -E2BIG)
>                 return ret;
>
> -       if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +       if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>                 return -EINVAL;
>
>         if (!data->memcache)
> @@ -820,16 +820,15 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>          * will be mapped lazily.
>          */
>         if (stage2_pte_is_counted(pte))
> -               stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> +               stage2_put_pte(ctx, data->mmu, mm_ops);
>
> -       kvm_set_table_pte(ptep, childp, mm_ops);
> -       mm_ops->get_page(ptep);
> +       kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> +       mm_ops->get_page(ctx->ptep);
>
>         return 0;
>  }
>
> -static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> -                                     kvm_pte_t *ptep,
> +static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>                                       struct stage2_map_data *data)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> @@ -839,17 +838,17 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>         if (!data->anchor)
>                 return 0;
>
> -       if (data->anchor == ptep) {
> +       if (data->anchor == ctx->ptep) {
>                 childp = data->childp;
>                 data->anchor = NULL;
>                 data->childp = NULL;
> -               ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> +               ret = stage2_map_walk_leaf(ctx, data);
>         } else {
> -               childp = kvm_pte_follow(*ptep, mm_ops);
> +               childp = kvm_pte_follow(*ctx->ptep, mm_ops);
>         }
>
>         mm_ops->put_page(childp);
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>
>         return ret;
>  }
> @@ -873,18 +872,18 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>   * the page-table, installing the block entry when it revisits the anchor
>   * pointer and clearing the anchor to NULL.
>   */
> -static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                            enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int stage2_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                            enum kvm_pgtable_walk_flags visit)
>  {
> -       struct stage2_map_data *data = arg;
> +       struct stage2_map_data *data = ctx->arg;
>
> -       switch (flag) {
> +       switch (visit) {
>         case KVM_PGTABLE_WALK_TABLE_PRE:
> -               return stage2_map_walk_table_pre(addr, end, level, ptep, data);
> +               return stage2_map_walk_table_pre(ctx, data);
>         case KVM_PGTABLE_WALK_LEAF:
> -               return stage2_map_walk_leaf(addr, end, level, ptep, data);
> +               return stage2_map_walk_leaf(ctx, data);
>         case KVM_PGTABLE_WALK_TABLE_POST:
> -               return stage2_map_walk_table_post(addr, end, level, ptep, data);
> +               return stage2_map_walk_table_post(ctx, data);
>         }
>
>         return -EINVAL;
> @@ -949,25 +948,24 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>         return ret;
>  }
>
> -static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                              enum kvm_pgtable_walk_flags flag,
> -                              void * const arg)
> +static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                              enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable *pgt = arg;
> +       struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_s2_mmu *mmu = pgt->mmu;
>         struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -       kvm_pte_t pte = *ptep, *childp = NULL;
> +       kvm_pte_t pte = *ctx->ptep, *childp = NULL;
>         bool need_flush = false;
>
>         if (!kvm_pte_valid(pte)) {
>                 if (stage2_pte_is_counted(pte)) {
> -                       kvm_clear_pte(ptep);
> -                       mm_ops->put_page(ptep);
> +                       kvm_clear_pte(ctx->ptep);
> +                       mm_ops->put_page(ctx->ptep);
>                 }
>                 return 0;
>         }
>
> -       if (kvm_pte_table(pte, level)) {
> +       if (kvm_pte_table(pte, ctx->level)) {
>                 childp = kvm_pte_follow(pte, mm_ops);
>
>                 if (mm_ops->page_count(childp) != 1)
> @@ -981,11 +979,11 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>          * block entry and rely on the remaining portions being faulted
>          * back lazily.
>          */
> -       stage2_put_pte(ptep, mmu, addr, level, mm_ops);
> +       stage2_put_pte(ctx, mmu, mm_ops);
>
>         if (need_flush && mm_ops->dcache_clean_inval_poc)
>                 mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> -                                              kvm_granule_size(level));
> +                                              kvm_granule_size(ctx->level));
>
>         if (childp)
>                 mm_ops->put_page(childp);
> @@ -1012,18 +1010,17 @@ struct stage2_attr_data {
>         struct kvm_pgtable_mm_ops       *mm_ops;
>  };
>
> -static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                             enum kvm_pgtable_walk_flags flag,
> -                             void * const arg)
> +static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                             enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t pte = *ptep;
> -       struct stage2_attr_data *data = arg;
> +       kvm_pte_t pte = *ctx->ptep;
> +       struct stage2_attr_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
>         if (!kvm_pte_valid(pte))
>                 return 0;
>
> -       data->level = level;
> +       data->level = ctx->level;
>         data->pte = pte;
>         pte &= ~data->attr_clr;
>         pte |= data->attr_set;
> @@ -1039,10 +1036,10 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>                  * stage-2 PTE if we are going to add executable permission.
>                  */
>                 if (mm_ops->icache_inval_pou &&
> -                   stage2_pte_executable(pte) && !stage2_pte_executable(*ptep))
> +                   stage2_pte_executable(pte) && !stage2_pte_executable(*ctx->ptep))
>                         mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
> -                                                 kvm_granule_size(level));
> -               WRITE_ONCE(*ptep, pte);
> +                                                 kvm_granule_size(ctx->level));
> +               WRITE_ONCE(*ctx->ptep, pte);
>         }
>
>         return 0;
> @@ -1140,20 +1137,19 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
>         return ret;
>  }
>
> -static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                              enum kvm_pgtable_walk_flags flag,
> -                              void * const arg)
> +static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                              enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable *pgt = arg;
> +       struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -       kvm_pte_t pte = *ptep;
> +       kvm_pte_t pte = *ctx->ptep;
>
>         if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
>                 return 0;
>
>         if (mm_ops->dcache_clean_inval_poc)
>                 mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> -                                              kvm_granule_size(level));
> +                                              kvm_granule_size(ctx->level));
>         return 0;
>  }
>
> @@ -1200,19 +1196,18 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>         return 0;
>  }
>
> -static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                             enum kvm_pgtable_walk_flags flag,
> -                             void * const arg)
> +static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                             enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = arg;
> -       kvm_pte_t pte = *ptep;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +       kvm_pte_t pte = *ctx->ptep;
>
>         if (!stage2_pte_is_counted(pte))
>                 return 0;
>
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>
> -       if (kvm_pte_table(pte, level))
> +       if (kvm_pte_table(pte, ctx->level))
>                 mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
>
>         return 0;
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

>          * was unable to access the hyp_vmemmap and so the buddy allocator has
>          * initialised the refcount to '1'.
>          */
> -       mm_ops->get_page(ptep);
> -       if (flag != KVM_PGTABLE_WALK_LEAF)
> +       mm_ops->get_page(ctx->ptep);
> +       if (visit != KVM_PGTABLE_WALK_LEAF)
>                 return 0;
>
> -       if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
> +       if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
>                 return -EINVAL;
>
>         phys = kvm_pte_to_phys(pte);
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index cdf8e76b0be1..900c8b9c0cfc 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -64,20 +64,20 @@ static bool kvm_phys_is_valid(u64 phys)
>         return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
>  }
>
> -static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
> +static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx, u64 phys)
>  {
> -       u64 granule = kvm_granule_size(level);
> +       u64 granule = kvm_granule_size(ctx->level);
>
> -       if (!kvm_level_supports_block_mapping(level))
> +       if (!kvm_level_supports_block_mapping(ctx->level))
>                 return false;
>
> -       if (granule > (end - addr))
> +       if (granule > (ctx->end - ctx->addr))
>                 return false;
>
>         if (kvm_phys_is_valid(phys) && !IS_ALIGNED(phys, granule))
>                 return false;
>
> -       return IS_ALIGNED(addr, granule);
> +       return IS_ALIGNED(ctx->addr, granule);
>  }
>
>  static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
> @@ -172,12 +172,12 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
>         return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
>  }
>
> -static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> -                                 u32 level, kvm_pte_t *ptep,
> -                                 enum kvm_pgtable_walk_flags flag)
> +static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
> +                                 const struct kvm_pgtable_visit_ctx *ctx,
> +                                 enum kvm_pgtable_walk_flags visit)
>  {
>         struct kvm_pgtable_walker *walker = data->walker;
> -       return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
> +       return walker->cb(ctx, visit);
>  }
>
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> @@ -186,20 +186,24 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>  static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                                       kvm_pte_t *ptep, u32 level)
>  {
> +       enum kvm_pgtable_walk_flags flags = data->walker->flags;
> +       struct kvm_pgtable_visit_ctx ctx = {
> +               .ptep   = ptep,
> +               .arg    = data->walker->arg,
> +               .addr   = data->addr,
> +               .end    = data->end,
> +               .level  = level,
> +               .flags  = flags,
> +       };
>         int ret = 0;
> -       u64 addr = data->addr;
>         kvm_pte_t *childp, pte = *ptep;
>         bool table = kvm_pte_table(pte, level);
> -       enum kvm_pgtable_walk_flags flags = data->walker->flags;
>
> -       if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
> -               ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -                                            KVM_PGTABLE_WALK_TABLE_PRE);
> -       }
> +       if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
> +               ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);
>
> -       if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
> -               ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -                                            KVM_PGTABLE_WALK_LEAF);
> +       if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
> +               ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
>                 pte = *ptep;
>                 table = kvm_pte_table(pte, level);
>         }
> @@ -218,10 +222,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>         if (ret)
>                 goto out;
>
> -       if (flags & KVM_PGTABLE_WALK_TABLE_POST) {
> -               ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -                                            KVM_PGTABLE_WALK_TABLE_POST);
> -       }
> +       if (ctx.flags & KVM_PGTABLE_WALK_TABLE_POST)
> +               ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_POST);
>
>  out:
>         return ret;
> @@ -292,13 +294,13 @@ struct leaf_walk_data {
>         u32             level;
>  };
>
> -static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                      enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                      enum kvm_pgtable_walk_flags visit)
>  {
> -       struct leaf_walk_data *data = arg;
> +       struct leaf_walk_data *data = ctx->arg;
>
> -       data->pte   = *ptep;
> -       data->level = level;
> +       data->pte   = *ctx->ptep;
> +       data->level = ctx->level;
>
>         return 0;
>  }
> @@ -383,47 +385,47 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
>         return prot;
>  }
>
> -static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> -                                   kvm_pte_t *ptep, struct hyp_map_data *data)
> +static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
> +                                   struct hyp_map_data *data)
>  {
> -       kvm_pte_t new, old = *ptep;
> -       u64 granule = kvm_granule_size(level), phys = data->phys;
> +       kvm_pte_t new, old = *ctx->ptep;
> +       u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>
> -       if (!kvm_block_mapping_supported(addr, end, phys, level))
> +       if (!kvm_block_mapping_supported(ctx, phys))
>                 return false;
>
>         data->phys += granule;
> -       new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +       new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
>         if (old == new)
>                 return true;
>         if (!kvm_pte_valid(old))
> -               data->mm_ops->get_page(ptep);
> +               data->mm_ops->get_page(ctx->ptep);
>         else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>                 return false;
>
> -       smp_store_release(ptep, new);
> +       smp_store_release(ctx->ptep, new);
>         return true;
>  }
>
> -static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                         enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                         enum kvm_pgtable_walk_flags visit)
>  {
>         kvm_pte_t *childp;
> -       struct hyp_map_data *data = arg;
> +       struct hyp_map_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> -       if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
> +       if (hyp_map_walker_try_leaf(ctx, data))
>                 return 0;
>
> -       if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +       if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>                 return -EINVAL;
>
>         childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
>         if (!childp)
>                 return -ENOMEM;
>
> -       kvm_set_table_pte(ptep, childp, mm_ops);
> -       mm_ops->get_page(ptep);
> +       kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> +       mm_ops->get_page(ctx->ptep);
>         return 0;
>  }
>
> @@ -456,39 +458,39 @@ struct hyp_unmap_data {
>         struct kvm_pgtable_mm_ops       *mm_ops;
>  };
>
> -static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                           enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                           enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t pte = *ptep, *childp = NULL;
> -       u64 granule = kvm_granule_size(level);
> -       struct hyp_unmap_data *data = arg;
> +       kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +       u64 granule = kvm_granule_size(ctx->level);
> +       struct hyp_unmap_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
>         if (!kvm_pte_valid(pte))
>                 return -EINVAL;
>
> -       if (kvm_pte_table(pte, level)) {
> +       if (kvm_pte_table(pte, ctx->level)) {
>                 childp = kvm_pte_follow(pte, mm_ops);
>
>                 if (mm_ops->page_count(childp) != 1)
>                         return 0;
>
> -               kvm_clear_pte(ptep);
> +               kvm_clear_pte(ctx->ptep);
>                 dsb(ishst);
> -               __tlbi_level(vae2is, __TLBI_VADDR(addr, 0), level);
> +               __tlbi_level(vae2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
>         } else {
> -               if (end - addr < granule)
> +               if (ctx->end - ctx->addr < granule)
>                         return -EINVAL;
>
> -               kvm_clear_pte(ptep);
> +               kvm_clear_pte(ctx->ptep);
>                 dsb(ishst);
> -               __tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
> +               __tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
>                 data->unmapped += granule;
>         }
>
>         dsb(ish);
>         isb();
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>
>         if (childp)
>                 mm_ops->put_page(childp);
> @@ -532,18 +534,18 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>         return 0;
>  }
>
> -static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                          enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                          enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = arg;
> -       kvm_pte_t pte = *ptep;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +       kvm_pte_t pte = *ctx->ptep;
>
>         if (!kvm_pte_valid(pte))
>                 return 0;
>
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>
> -       if (kvm_pte_table(pte, level))
> +       if (kvm_pte_table(pte, ctx->level))
>                 mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
>
>         return 0;
> @@ -682,19 +684,19 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
>         return !!pte;
>  }
>
> -static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
> -                          u32 level, struct kvm_pgtable_mm_ops *mm_ops)
> +static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> +                          struct kvm_pgtable_mm_ops *mm_ops)
>  {
>         /*
>          * Clear the existing PTE, and perform break-before-make with
>          * TLB maintenance if it was valid.
>          */
> -       if (kvm_pte_valid(*ptep)) {
> -               kvm_clear_pte(ptep);
> -               kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
> +       if (kvm_pte_valid(*ctx->ptep)) {
> +               kvm_clear_pte(ctx->ptep);
> +               kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
>         }
>
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>  }
>
>  static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
> @@ -708,29 +710,28 @@ static bool stage2_pte_executable(kvm_pte_t pte)
>         return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
>  }
>
> -static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
> +static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
>                                         struct stage2_map_data *data)
>  {
> -       if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
> +       if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
>                 return false;
>
> -       return kvm_block_mapping_supported(addr, end, data->phys, level);
> +       return kvm_block_mapping_supported(ctx, data->phys);
>  }
>
> -static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> -                                     kvm_pte_t *ptep,
> +static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                       struct stage2_map_data *data)
>  {
> -       kvm_pte_t new, old = *ptep;
> -       u64 granule = kvm_granule_size(level), phys = data->phys;
> +       kvm_pte_t new, old = *ctx->ptep;
> +       u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>         struct kvm_pgtable *pgt = data->mmu->pgt;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> -       if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> +       if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return -E2BIG;
>
>         if (kvm_phys_is_valid(phys))
> -               new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +               new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
>         else
>                 new = kvm_init_invalid_leaf_owner(data->owner_id);
>
> @@ -744,7 +745,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>                 if (!stage2_pte_needs_update(old, new))
>                         return -EAGAIN;
>
> -               stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> +               stage2_put_pte(ctx, data->mmu, mm_ops);
>         }
>
>         /* Perform CMOs before installation of the guest stage-2 PTE */
> @@ -755,26 +756,25 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>         if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
>                 mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
>
> -       smp_store_release(ptep, new);
> +       smp_store_release(ctx->ptep, new);
>         if (stage2_pte_is_counted(new))
> -               mm_ops->get_page(ptep);
> +               mm_ops->get_page(ctx->ptep);
>         if (kvm_phys_is_valid(phys))
>                 data->phys += granule;
>         return 0;
>  }
>
> -static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> -                                    kvm_pte_t *ptep,
> +static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>                                      struct stage2_map_data *data)
>  {
>         if (data->anchor)
>                 return 0;
>
> -       if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> +       if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return 0;
>
> -       data->childp = kvm_pte_follow(*ptep, data->mm_ops);
> -       kvm_clear_pte(ptep);
> +       data->childp = kvm_pte_follow(*ctx->ptep, data->mm_ops);
> +       kvm_clear_pte(ctx->ptep);
>
>         /*
>          * Invalidate the whole stage-2, as we may have numerous leaf
> @@ -782,29 +782,29 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>          * individually.
>          */
>         kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> -       data->anchor = ptep;
> +       data->anchor = ctx->ptep;
>         return 0;
>  }
>
> -static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                 struct stage2_map_data *data)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> -       kvm_pte_t *childp, pte = *ptep;
> +       kvm_pte_t *childp, pte = *ctx->ptep;
>         int ret;
>
>         if (data->anchor) {
>                 if (stage2_pte_is_counted(pte))
> -                       mm_ops->put_page(ptep);
> +                       mm_ops->put_page(ctx->ptep);
>
>                 return 0;
>         }
>
> -       ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
> +       ret = stage2_map_walker_try_leaf(ctx, data);
>         if (ret != -E2BIG)
>                 return ret;
>
> -       if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +       if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>                 return -EINVAL;
>
>         if (!data->memcache)
> @@ -820,16 +820,15 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>          * will be mapped lazily.
>          */
>         if (stage2_pte_is_counted(pte))
> -               stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> +               stage2_put_pte(ctx, data->mmu, mm_ops);
>
> -       kvm_set_table_pte(ptep, childp, mm_ops);
> -       mm_ops->get_page(ptep);
> +       kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> +       mm_ops->get_page(ctx->ptep);
>
>         return 0;
>  }
>
> -static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> -                                     kvm_pte_t *ptep,
> +static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>                                       struct stage2_map_data *data)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> @@ -839,17 +838,17 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>         if (!data->anchor)
>                 return 0;
>
> -       if (data->anchor == ptep) {
> +       if (data->anchor == ctx->ptep) {
>                 childp = data->childp;
>                 data->anchor = NULL;
>                 data->childp = NULL;
> -               ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> +               ret = stage2_map_walk_leaf(ctx, data);
>         } else {
> -               childp = kvm_pte_follow(*ptep, mm_ops);
> +               childp = kvm_pte_follow(*ctx->ptep, mm_ops);
>         }
>
>         mm_ops->put_page(childp);
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>
>         return ret;
>  }
> @@ -873,18 +872,18 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>   * the page-table, installing the block entry when it revisits the anchor
>   * pointer and clearing the anchor to NULL.
>   */
> -static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                            enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int stage2_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                            enum kvm_pgtable_walk_flags visit)
>  {
> -       struct stage2_map_data *data = arg;
> +       struct stage2_map_data *data = ctx->arg;
>
> -       switch (flag) {
> +       switch (visit) {
>         case KVM_PGTABLE_WALK_TABLE_PRE:
> -               return stage2_map_walk_table_pre(addr, end, level, ptep, data);
> +               return stage2_map_walk_table_pre(ctx, data);
>         case KVM_PGTABLE_WALK_LEAF:
> -               return stage2_map_walk_leaf(addr, end, level, ptep, data);
> +               return stage2_map_walk_leaf(ctx, data);
>         case KVM_PGTABLE_WALK_TABLE_POST:
> -               return stage2_map_walk_table_post(addr, end, level, ptep, data);
> +               return stage2_map_walk_table_post(ctx, data);
>         }
>
>         return -EINVAL;
> @@ -949,25 +948,24 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>         return ret;
>  }
>
> -static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                              enum kvm_pgtable_walk_flags flag,
> -                              void * const arg)
> +static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                              enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable *pgt = arg;
> +       struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_s2_mmu *mmu = pgt->mmu;
>         struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -       kvm_pte_t pte = *ptep, *childp = NULL;
> +       kvm_pte_t pte = *ctx->ptep, *childp = NULL;
>         bool need_flush = false;
>
>         if (!kvm_pte_valid(pte)) {
>                 if (stage2_pte_is_counted(pte)) {
> -                       kvm_clear_pte(ptep);
> -                       mm_ops->put_page(ptep);
> +                       kvm_clear_pte(ctx->ptep);
> +                       mm_ops->put_page(ctx->ptep);
>                 }
>                 return 0;
>         }
>
> -       if (kvm_pte_table(pte, level)) {
> +       if (kvm_pte_table(pte, ctx->level)) {
>                 childp = kvm_pte_follow(pte, mm_ops);
>
>                 if (mm_ops->page_count(childp) != 1)
> @@ -981,11 +979,11 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>          * block entry and rely on the remaining portions being faulted
>          * back lazily.
>          */
> -       stage2_put_pte(ptep, mmu, addr, level, mm_ops);
> +       stage2_put_pte(ctx, mmu, mm_ops);
>
>         if (need_flush && mm_ops->dcache_clean_inval_poc)
>                 mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> -                                              kvm_granule_size(level));
> +                                              kvm_granule_size(ctx->level));
>
>         if (childp)
>                 mm_ops->put_page(childp);
> @@ -1012,18 +1010,17 @@ struct stage2_attr_data {
>         struct kvm_pgtable_mm_ops       *mm_ops;
>  };
>
> -static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                             enum kvm_pgtable_walk_flags flag,
> -                             void * const arg)
> +static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                             enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t pte = *ptep;
> -       struct stage2_attr_data *data = arg;
> +       kvm_pte_t pte = *ctx->ptep;
> +       struct stage2_attr_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
>         if (!kvm_pte_valid(pte))
>                 return 0;
>
> -       data->level = level;
> +       data->level = ctx->level;
>         data->pte = pte;
>         pte &= ~data->attr_clr;
>         pte |= data->attr_set;
> @@ -1039,10 +1036,10 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>                  * stage-2 PTE if we are going to add executable permission.
>                  */
>                 if (mm_ops->icache_inval_pou &&
> -                   stage2_pte_executable(pte) && !stage2_pte_executable(*ptep))
> +                   stage2_pte_executable(pte) && !stage2_pte_executable(*ctx->ptep))
>                         mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
> -                                                 kvm_granule_size(level));
> -               WRITE_ONCE(*ptep, pte);
> +                                                 kvm_granule_size(ctx->level));
> +               WRITE_ONCE(*ctx->ptep, pte);
>         }
>
>         return 0;
> @@ -1140,20 +1137,19 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
>         return ret;
>  }
>
> -static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                              enum kvm_pgtable_walk_flags flag,
> -                              void * const arg)
> +static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                              enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable *pgt = arg;
> +       struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -       kvm_pte_t pte = *ptep;
> +       kvm_pte_t pte = *ctx->ptep;
>
>         if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
>                 return 0;
>
>         if (mm_ops->dcache_clean_inval_poc)
>                 mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> -                                              kvm_granule_size(level));
> +                                              kvm_granule_size(ctx->level));
>         return 0;
>  }
>
> @@ -1200,19 +1196,18 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>         return 0;
>  }
>
> -static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -                             enum kvm_pgtable_walk_flags flag,
> -                             void * const arg)
> +static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +                             enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = arg;
> -       kvm_pte_t pte = *ptep;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +       kvm_pte_t pte = *ctx->ptep;
>
>         if (!stage2_pte_is_counted(pte))
>                 return 0;
>
> -       mm_ops->put_page(ptep);
> +       mm_ops->put_page(ctx->ptep);
>
> -       if (kvm_pte_table(pte, level))
> +       if (kvm_pte_table(pte, ctx->level))
>                 mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
>
>         return 0;
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 02/14] KVM: arm64: Stash observed pte value in visitor context
  2022-11-07 21:56   ` Oliver Upton
  (?)
@ 2022-11-09 22:23     ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Rather than reading the ptep all over the shop, read the ptep once from
> __kvm_pgtable_visit() and stick it in the visitor context. Reread the
> ptep after visiting a leaf in case the callback installed a new table
> underneath.
>
> No functional change intended.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

Looks good to me.
Reviewed-by: Ben Gardon <bgardon@google.com>
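
For anyone skimming, the practical effect on a visitor is that it now
consumes the snapshot taken once by the walker instead of dereferencing
the pointer itself. A hypothetical visitor (not taken from the series)
would read:

static int example_walker(const struct kvm_pgtable_visit_ctx *ctx,
                          enum kvm_pgtable_walk_flags visit)
{
        /*
         * No local '*ctx->ptep' read here: the walker has already done
         * READ_ONCE(*ptep) and stashed the value in ctx->old.
         */
        if (!kvm_pte_valid(ctx->old))
                return 0;

        /*
         * Writes still go through ctx->ptep, e.g.
         * smp_store_release(ctx->ptep, new), after which the walker
         * rereads the PTE to pick up any table the callback installed.
         */
        return 0;
}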


> ---
>  arch/arm64/include/asm/kvm_pgtable.h  |  1 +
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |  5 +-
>  arch/arm64/kvm/hyp/nvhe/setup.c       |  7 +--
>  arch/arm64/kvm/hyp/pgtable.c          | 86 +++++++++++++--------------
>  4 files changed, 48 insertions(+), 51 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 607f9bb8aab4..14d4b68a1e92 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -201,6 +201,7 @@ enum kvm_pgtable_walk_flags {
>
>  struct kvm_pgtable_visit_ctx {
>         kvm_pte_t                               *ptep;
> +       kvm_pte_t                               old;
>         void                                    *arg;
>         u64                                     addr;
>         u64                                     end;
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 8f5b6a36a039..d21d1b08a055 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -421,12 +421,11 @@ static int __check_page_state_visitor(const struct kvm_pgtable_visit_ctx *ctx,
>                                       enum kvm_pgtable_walk_flags visit)
>  {
>         struct check_walk_data *d = ctx->arg;
> -       kvm_pte_t pte = *ctx->ptep;
>
> -       if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
> +       if (kvm_pte_valid(ctx->old) && !addr_is_memory(kvm_pte_to_phys(ctx->old)))
>                 return -EINVAL;
>
> -       return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
> +       return d->get_page_state(ctx->old) == d->desired ? 0 : -EPERM;
>  }
>
>  static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index a293cf5eba1b..6af443c9d78e 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -192,10 +192,9 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
>         struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
>         enum kvm_pgtable_prot prot;
>         enum pkvm_page_state state;
> -       kvm_pte_t pte = *ctx->ptep;
>         phys_addr_t phys;
>
> -       if (!kvm_pte_valid(pte))
> +       if (!kvm_pte_valid(ctx->old))
>                 return 0;
>
>         /*
> @@ -210,7 +209,7 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
>         if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
>                 return -EINVAL;
>
> -       phys = kvm_pte_to_phys(pte);
> +       phys = kvm_pte_to_phys(ctx->old);
>         if (!addr_is_memory(phys))
>                 return -EINVAL;
>
> @@ -218,7 +217,7 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
>          * Adjust the host stage-2 mappings to match the ownership attributes
>          * configured in the hypervisor stage-1.
>          */
> -       state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
> +       state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(ctx->old));
>         switch (state) {
>         case PKVM_PAGE_OWNED:
>                 return host_stage2_set_owner_locked(phys, PAGE_SIZE, pkvm_hyp_id);
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 900c8b9c0cfc..fb3696b3a997 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -189,6 +189,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>         enum kvm_pgtable_walk_flags flags = data->walker->flags;
>         struct kvm_pgtable_visit_ctx ctx = {
>                 .ptep   = ptep,
> +               .old    = READ_ONCE(*ptep),
>                 .arg    = data->walker->arg,
>                 .addr   = data->addr,
>                 .end    = data->end,
> @@ -196,16 +197,16 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 .flags  = flags,
>         };
>         int ret = 0;
> -       kvm_pte_t *childp, pte = *ptep;
> -       bool table = kvm_pte_table(pte, level);
> +       kvm_pte_t *childp;
> +       bool table = kvm_pte_table(ctx.old, level);
>
>         if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
>                 ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);
>
>         if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
>                 ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
> -               pte = *ptep;
> -               table = kvm_pte_table(pte, level);
> +               ctx.old = READ_ONCE(*ptep);
> +               table = kvm_pte_table(ctx.old, level);
>         }
>
>         if (ret)
> @@ -217,7 +218,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 goto out;
>         }
>
> -       childp = kvm_pte_follow(pte, data->pgt->mm_ops);
> +       childp = kvm_pte_follow(ctx.old, data->pgt->mm_ops);
>         ret = __kvm_pgtable_walk(data, childp, level + 1);
>         if (ret)
>                 goto out;
> @@ -299,7 +300,7 @@ static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>         struct leaf_walk_data *data = ctx->arg;
>
> -       data->pte   = *ctx->ptep;
> +       data->pte   = ctx->old;
>         data->level = ctx->level;
>
>         return 0;
> @@ -388,7 +389,7 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
>  static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                     struct hyp_map_data *data)
>  {
> -       kvm_pte_t new, old = *ctx->ptep;
> +       kvm_pte_t new;
>         u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>
>         if (!kvm_block_mapping_supported(ctx, phys))
> @@ -396,11 +397,11 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>
>         data->phys += granule;
>         new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
> -       if (old == new)
> +       if (ctx->old == new)
>                 return true;
> -       if (!kvm_pte_valid(old))
> +       if (!kvm_pte_valid(ctx->old))
>                 data->mm_ops->get_page(ctx->ptep);
> -       else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
> +       else if (WARN_ON((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>                 return false;
>
>         smp_store_release(ctx->ptep, new);
> @@ -461,16 +462,16 @@ struct hyp_unmap_data {
>  static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                             enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +       kvm_pte_t *childp = NULL;
>         u64 granule = kvm_granule_size(ctx->level);
>         struct hyp_unmap_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> -       if (!kvm_pte_valid(pte))
> +       if (!kvm_pte_valid(ctx->old))
>                 return -EINVAL;
>
> -       if (kvm_pte_table(pte, ctx->level)) {
> -               childp = kvm_pte_follow(pte, mm_ops);
> +       if (kvm_pte_table(ctx->old, ctx->level)) {
> +               childp = kvm_pte_follow(ctx->old, mm_ops);
>
>                 if (mm_ops->page_count(childp) != 1)
>                         return 0;
> @@ -538,15 +539,14 @@ static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                            enum kvm_pgtable_walk_flags visit)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> -       kvm_pte_t pte = *ctx->ptep;
>
> -       if (!kvm_pte_valid(pte))
> +       if (!kvm_pte_valid(ctx->old))
>                 return 0;
>
>         mm_ops->put_page(ctx->ptep);
>
> -       if (kvm_pte_table(pte, ctx->level))
> -               mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
> +       if (kvm_pte_table(ctx->old, ctx->level))
> +               mm_ops->put_page(kvm_pte_follow(ctx->old, mm_ops));
>
>         return 0;
>  }
> @@ -691,7 +691,7 @@ static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s
>          * Clear the existing PTE, and perform break-before-make with
>          * TLB maintenance if it was valid.
>          */
> -       if (kvm_pte_valid(*ctx->ptep)) {
> +       if (kvm_pte_valid(ctx->old)) {
>                 kvm_clear_pte(ctx->ptep);
>                 kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
>         }
> @@ -722,7 +722,7 @@ static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
>  static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                       struct stage2_map_data *data)
>  {
> -       kvm_pte_t new, old = *ctx->ptep;
> +       kvm_pte_t new;
>         u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>         struct kvm_pgtable *pgt = data->mmu->pgt;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> @@ -735,14 +735,14 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         else
>                 new = kvm_init_invalid_leaf_owner(data->owner_id);
>
> -       if (stage2_pte_is_counted(old)) {
> +       if (stage2_pte_is_counted(ctx->old)) {
>                 /*
>                  * Skip updating the PTE if we are trying to recreate the exact
>                  * same mapping or only change the access permissions. Instead,
>                  * the vCPU will exit one more time from guest if still needed
>                  * and then go through the path of relaxing permissions.
>                  */
> -               if (!stage2_pte_needs_update(old, new))
> +               if (!stage2_pte_needs_update(ctx->old, new))
>                         return -EAGAIN;
>
>                 stage2_put_pte(ctx, data->mmu, mm_ops);
> @@ -773,7 +773,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>         if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return 0;
>
> -       data->childp = kvm_pte_follow(*ctx->ptep, data->mm_ops);
> +       data->childp = kvm_pte_follow(ctx->old, data->mm_ops);
>         kvm_clear_pte(ctx->ptep);
>
>         /*
> @@ -790,11 +790,11 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                 struct stage2_map_data *data)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> -       kvm_pte_t *childp, pte = *ctx->ptep;
> +       kvm_pte_t *childp;
>         int ret;
>
>         if (data->anchor) {
> -               if (stage2_pte_is_counted(pte))
> +               if (stage2_pte_is_counted(ctx->old))
>                         mm_ops->put_page(ctx->ptep);
>
>                 return 0;
> @@ -819,7 +819,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>          * a table. Accesses beyond 'end' that fall within the new table
>          * will be mapped lazily.
>          */
> -       if (stage2_pte_is_counted(pte))
> +       if (stage2_pte_is_counted(ctx->old))
>                 stage2_put_pte(ctx, data->mmu, mm_ops);
>
>         kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> @@ -844,7 +844,7 @@ static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>                 data->childp = NULL;
>                 ret = stage2_map_walk_leaf(ctx, data);
>         } else {
> -               childp = kvm_pte_follow(*ctx->ptep, mm_ops);
> +               childp = kvm_pte_follow(ctx->old, mm_ops);
>         }
>
>         mm_ops->put_page(childp);
> @@ -954,23 +954,23 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>         struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_s2_mmu *mmu = pgt->mmu;
>         struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -       kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +       kvm_pte_t *childp = NULL;
>         bool need_flush = false;
>
> -       if (!kvm_pte_valid(pte)) {
> -               if (stage2_pte_is_counted(pte)) {
> +       if (!kvm_pte_valid(ctx->old)) {
> +               if (stage2_pte_is_counted(ctx->old)) {
>                         kvm_clear_pte(ctx->ptep);
>                         mm_ops->put_page(ctx->ptep);
>                 }
>                 return 0;
>         }
>
> -       if (kvm_pte_table(pte, ctx->level)) {
> -               childp = kvm_pte_follow(pte, mm_ops);
> +       if (kvm_pte_table(ctx->old, ctx->level)) {
> +               childp = kvm_pte_follow(ctx->old, mm_ops);
>
>                 if (mm_ops->page_count(childp) != 1)
>                         return 0;
> -       } else if (stage2_pte_cacheable(pgt, pte)) {
> +       } else if (stage2_pte_cacheable(pgt, ctx->old)) {
>                 need_flush = !stage2_has_fwb(pgt);
>         }
>
> @@ -982,7 +982,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>         stage2_put_pte(ctx, mmu, mm_ops);
>
>         if (need_flush && mm_ops->dcache_clean_inval_poc)
> -               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> +               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
>                                                kvm_granule_size(ctx->level));
>
>         if (childp)
> @@ -1013,11 +1013,11 @@ struct stage2_attr_data {
>  static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                               enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t pte = *ctx->ptep;
> +       kvm_pte_t pte = ctx->old;
>         struct stage2_attr_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> -       if (!kvm_pte_valid(pte))
> +       if (!kvm_pte_valid(ctx->old))
>                 return 0;
>
>         data->level = ctx->level;
> @@ -1036,7 +1036,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                  * stage-2 PTE if we are going to add executable permission.
>                  */
>                 if (mm_ops->icache_inval_pou &&
> -                   stage2_pte_executable(pte) && !stage2_pte_executable(*ctx->ptep))
> +                   stage2_pte_executable(pte) && !stage2_pte_executable(ctx->old))
>                         mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
>                                                   kvm_granule_size(ctx->level));
>                 WRITE_ONCE(*ctx->ptep, pte);
> @@ -1142,13 +1142,12 @@ static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>         struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -       kvm_pte_t pte = *ctx->ptep;
>
> -       if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
> +       if (!kvm_pte_valid(ctx->old) || !stage2_pte_cacheable(pgt, ctx->old))
>                 return 0;
>
>         if (mm_ops->dcache_clean_inval_poc)
> -               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> +               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
>                                                kvm_granule_size(ctx->level));
>         return 0;
>  }
> @@ -1200,15 +1199,14 @@ static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                               enum kvm_pgtable_walk_flags visit)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> -       kvm_pte_t pte = *ctx->ptep;
>
> -       if (!stage2_pte_is_counted(pte))
> +       if (!stage2_pte_is_counted(ctx->old))
>                 return 0;
>
>         mm_ops->put_page(ctx->ptep);
>
> -       if (kvm_pte_table(pte, ctx->level))
> -               mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
> +       if (kvm_pte_table(ctx->old, ctx->level))
> +               mm_ops->put_page(kvm_pte_follow(ctx->old, mm_ops));
>
>         return 0;
>  }
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 02/14] KVM: arm64: Stash observed pte value in visitor context
@ 2022-11-09 22:23     ` Ben Gardon
  0 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Rather than reading the ptep all over the shop, read the ptep once from
> __kvm_pgtable_visit() and stick it in the visitor context. Reread the
> ptep after visiting a leaf in case the callback installed a new table
> underneath.
>
> No functional change intended.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

Looks good to me.
Reviewed-by: Ben Gardon <bgardon@google.com>


> ---
>  arch/arm64/include/asm/kvm_pgtable.h  |  1 +
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |  5 +-
>  arch/arm64/kvm/hyp/nvhe/setup.c       |  7 +--
>  arch/arm64/kvm/hyp/pgtable.c          | 86 +++++++++++++--------------
>  4 files changed, 48 insertions(+), 51 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 607f9bb8aab4..14d4b68a1e92 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -201,6 +201,7 @@ enum kvm_pgtable_walk_flags {
>
>  struct kvm_pgtable_visit_ctx {
>         kvm_pte_t                               *ptep;
> +       kvm_pte_t                               old;
>         void                                    *arg;
>         u64                                     addr;
>         u64                                     end;
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 8f5b6a36a039..d21d1b08a055 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -421,12 +421,11 @@ static int __check_page_state_visitor(const struct kvm_pgtable_visit_ctx *ctx,
>                                       enum kvm_pgtable_walk_flags visit)
>  {
>         struct check_walk_data *d = ctx->arg;
> -       kvm_pte_t pte = *ctx->ptep;
>
> -       if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
> +       if (kvm_pte_valid(ctx->old) && !addr_is_memory(kvm_pte_to_phys(ctx->old)))
>                 return -EINVAL;
>
> -       return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
> +       return d->get_page_state(ctx->old) == d->desired ? 0 : -EPERM;
>  }
>
>  static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index a293cf5eba1b..6af443c9d78e 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -192,10 +192,9 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
>         struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
>         enum kvm_pgtable_prot prot;
>         enum pkvm_page_state state;
> -       kvm_pte_t pte = *ctx->ptep;
>         phys_addr_t phys;
>
> -       if (!kvm_pte_valid(pte))
> +       if (!kvm_pte_valid(ctx->old))
>                 return 0;
>
>         /*
> @@ -210,7 +209,7 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
>         if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
>                 return -EINVAL;
>
> -       phys = kvm_pte_to_phys(pte);
> +       phys = kvm_pte_to_phys(ctx->old);
>         if (!addr_is_memory(phys))
>                 return -EINVAL;
>
> @@ -218,7 +217,7 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
>          * Adjust the host stage-2 mappings to match the ownership attributes
>          * configured in the hypervisor stage-1.
>          */
> -       state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
> +       state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(ctx->old));
>         switch (state) {
>         case PKVM_PAGE_OWNED:
>                 return host_stage2_set_owner_locked(phys, PAGE_SIZE, pkvm_hyp_id);
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 900c8b9c0cfc..fb3696b3a997 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -189,6 +189,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>         enum kvm_pgtable_walk_flags flags = data->walker->flags;
>         struct kvm_pgtable_visit_ctx ctx = {
>                 .ptep   = ptep,
> +               .old    = READ_ONCE(*ptep),
>                 .arg    = data->walker->arg,
>                 .addr   = data->addr,
>                 .end    = data->end,
> @@ -196,16 +197,16 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 .flags  = flags,
>         };
>         int ret = 0;
> -       kvm_pte_t *childp, pte = *ptep;
> -       bool table = kvm_pte_table(pte, level);
> +       kvm_pte_t *childp;
> +       bool table = kvm_pte_table(ctx.old, level);
>
>         if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
>                 ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);
>
>         if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
>                 ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
> -               pte = *ptep;
> -               table = kvm_pte_table(pte, level);
> +               ctx.old = READ_ONCE(*ptep);
> +               table = kvm_pte_table(ctx.old, level);
>         }
>
>         if (ret)
> @@ -217,7 +218,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 goto out;
>         }
>
> -       childp = kvm_pte_follow(pte, data->pgt->mm_ops);
> +       childp = kvm_pte_follow(ctx.old, data->pgt->mm_ops);
>         ret = __kvm_pgtable_walk(data, childp, level + 1);
>         if (ret)
>                 goto out;
> @@ -299,7 +300,7 @@ static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>         struct leaf_walk_data *data = ctx->arg;
>
> -       data->pte   = *ctx->ptep;
> +       data->pte   = ctx->old;
>         data->level = ctx->level;
>
>         return 0;
> @@ -388,7 +389,7 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
>  static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                     struct hyp_map_data *data)
>  {
> -       kvm_pte_t new, old = *ctx->ptep;
> +       kvm_pte_t new;
>         u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>
>         if (!kvm_block_mapping_supported(ctx, phys))
> @@ -396,11 +397,11 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>
>         data->phys += granule;
>         new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
> -       if (old == new)
> +       if (ctx->old == new)
>                 return true;
> -       if (!kvm_pte_valid(old))
> +       if (!kvm_pte_valid(ctx->old))
>                 data->mm_ops->get_page(ctx->ptep);
> -       else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
> +       else if (WARN_ON((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>                 return false;
>
>         smp_store_release(ctx->ptep, new);
> @@ -461,16 +462,16 @@ struct hyp_unmap_data {
>  static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                             enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +       kvm_pte_t *childp = NULL;
>         u64 granule = kvm_granule_size(ctx->level);
>         struct hyp_unmap_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> -       if (!kvm_pte_valid(pte))
> +       if (!kvm_pte_valid(ctx->old))
>                 return -EINVAL;
>
> -       if (kvm_pte_table(pte, ctx->level)) {
> -               childp = kvm_pte_follow(pte, mm_ops);
> +       if (kvm_pte_table(ctx->old, ctx->level)) {
> +               childp = kvm_pte_follow(ctx->old, mm_ops);
>
>                 if (mm_ops->page_count(childp) != 1)
>                         return 0;
> @@ -538,15 +539,14 @@ static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                            enum kvm_pgtable_walk_flags visit)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> -       kvm_pte_t pte = *ctx->ptep;
>
> -       if (!kvm_pte_valid(pte))
> +       if (!kvm_pte_valid(ctx->old))
>                 return 0;
>
>         mm_ops->put_page(ctx->ptep);
>
> -       if (kvm_pte_table(pte, ctx->level))
> -               mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
> +       if (kvm_pte_table(ctx->old, ctx->level))
> +               mm_ops->put_page(kvm_pte_follow(ctx->old, mm_ops));
>
>         return 0;
>  }
> @@ -691,7 +691,7 @@ static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s
>          * Clear the existing PTE, and perform break-before-make with
>          * TLB maintenance if it was valid.
>          */
> -       if (kvm_pte_valid(*ctx->ptep)) {
> +       if (kvm_pte_valid(ctx->old)) {
>                 kvm_clear_pte(ctx->ptep);
>                 kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
>         }
> @@ -722,7 +722,7 @@ static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
>  static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                       struct stage2_map_data *data)
>  {
> -       kvm_pte_t new, old = *ctx->ptep;
> +       kvm_pte_t new;
>         u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>         struct kvm_pgtable *pgt = data->mmu->pgt;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> @@ -735,14 +735,14 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         else
>                 new = kvm_init_invalid_leaf_owner(data->owner_id);
>
> -       if (stage2_pte_is_counted(old)) {
> +       if (stage2_pte_is_counted(ctx->old)) {
>                 /*
>                  * Skip updating the PTE if we are trying to recreate the exact
>                  * same mapping or only change the access permissions. Instead,
>                  * the vCPU will exit one more time from guest if still needed
>                  * and then go through the path of relaxing permissions.
>                  */
> -               if (!stage2_pte_needs_update(old, new))
> +               if (!stage2_pte_needs_update(ctx->old, new))
>                         return -EAGAIN;
>
>                 stage2_put_pte(ctx, data->mmu, mm_ops);
> @@ -773,7 +773,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>         if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return 0;
>
> -       data->childp = kvm_pte_follow(*ctx->ptep, data->mm_ops);
> +       data->childp = kvm_pte_follow(ctx->old, data->mm_ops);
>         kvm_clear_pte(ctx->ptep);
>
>         /*
> @@ -790,11 +790,11 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                 struct stage2_map_data *data)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> -       kvm_pte_t *childp, pte = *ctx->ptep;
> +       kvm_pte_t *childp;
>         int ret;
>
>         if (data->anchor) {
> -               if (stage2_pte_is_counted(pte))
> +               if (stage2_pte_is_counted(ctx->old))
>                         mm_ops->put_page(ctx->ptep);
>
>                 return 0;
> @@ -819,7 +819,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>          * a table. Accesses beyond 'end' that fall within the new table
>          * will be mapped lazily.
>          */
> -       if (stage2_pte_is_counted(pte))
> +       if (stage2_pte_is_counted(ctx->old))
>                 stage2_put_pte(ctx, data->mmu, mm_ops);
>
>         kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> @@ -844,7 +844,7 @@ static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>                 data->childp = NULL;
>                 ret = stage2_map_walk_leaf(ctx, data);
>         } else {
> -               childp = kvm_pte_follow(*ctx->ptep, mm_ops);
> +               childp = kvm_pte_follow(ctx->old, mm_ops);
>         }
>
>         mm_ops->put_page(childp);
> @@ -954,23 +954,23 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>         struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_s2_mmu *mmu = pgt->mmu;
>         struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -       kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +       kvm_pte_t *childp = NULL;
>         bool need_flush = false;
>
> -       if (!kvm_pte_valid(pte)) {
> -               if (stage2_pte_is_counted(pte)) {
> +       if (!kvm_pte_valid(ctx->old)) {
> +               if (stage2_pte_is_counted(ctx->old)) {
>                         kvm_clear_pte(ctx->ptep);
>                         mm_ops->put_page(ctx->ptep);
>                 }
>                 return 0;
>         }
>
> -       if (kvm_pte_table(pte, ctx->level)) {
> -               childp = kvm_pte_follow(pte, mm_ops);
> +       if (kvm_pte_table(ctx->old, ctx->level)) {
> +               childp = kvm_pte_follow(ctx->old, mm_ops);
>
>                 if (mm_ops->page_count(childp) != 1)
>                         return 0;
> -       } else if (stage2_pte_cacheable(pgt, pte)) {
> +       } else if (stage2_pte_cacheable(pgt, ctx->old)) {
>                 need_flush = !stage2_has_fwb(pgt);
>         }
>
> @@ -982,7 +982,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>         stage2_put_pte(ctx, mmu, mm_ops);
>
>         if (need_flush && mm_ops->dcache_clean_inval_poc)
> -               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> +               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
>                                                kvm_granule_size(ctx->level));
>
>         if (childp)
> @@ -1013,11 +1013,11 @@ struct stage2_attr_data {
>  static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                               enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t pte = *ctx->ptep;
> +       kvm_pte_t pte = ctx->old;
>         struct stage2_attr_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> -       if (!kvm_pte_valid(pte))
> +       if (!kvm_pte_valid(ctx->old))
>                 return 0;
>
>         data->level = ctx->level;
> @@ -1036,7 +1036,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                  * stage-2 PTE if we are going to add executable permission.
>                  */
>                 if (mm_ops->icache_inval_pou &&
> -                   stage2_pte_executable(pte) && !stage2_pte_executable(*ctx->ptep))
> +                   stage2_pte_executable(pte) && !stage2_pte_executable(ctx->old))
>                         mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
>                                                   kvm_granule_size(ctx->level));
>                 WRITE_ONCE(*ctx->ptep, pte);
> @@ -1142,13 +1142,12 @@ static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>         struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -       kvm_pte_t pte = *ctx->ptep;
>
> -       if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
> +       if (!kvm_pte_valid(ctx->old) || !stage2_pte_cacheable(pgt, ctx->old))
>                 return 0;
>
>         if (mm_ops->dcache_clean_inval_poc)
> -               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> +               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
>                                                kvm_granule_size(ctx->level));
>         return 0;
>  }
> @@ -1200,15 +1199,14 @@ static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                               enum kvm_pgtable_walk_flags visit)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> -       kvm_pte_t pte = *ctx->ptep;
>
> -       if (!stage2_pte_is_counted(pte))
> +       if (!stage2_pte_is_counted(ctx->old))
>                 return 0;
>
>         mm_ops->put_page(ctx->ptep);
>
> -       if (kvm_pte_table(pte, ctx->level))
> -               mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
> +       if (kvm_pte_table(ctx->old, ctx->level))
> +               mm_ops->put_page(kvm_pte_follow(ctx->old, mm_ops));
>
>         return 0;
>  }
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 02/14] KVM: arm64: Stash observed pte value in visitor context
@ 2022-11-09 22:23     ` Ben Gardon
  0 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, Marc Zyngier, Will Deacon, kvmarm, David Matlack, kvmarm,
	linux-arm-kernel

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Rather than reading the ptep all over the shop, read the ptep once from
> __kvm_pgtable_visit() and stick it in the visitor context. Reread the
> ptep after visiting a leaf in case the callback installed a new table
> underneath.
>
> No functional change intended.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

Looks good to me.
Reviewed-by: Ben Gardon <bgardon@google.com>
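
A minimal, self-contained sketch of the pattern the quoted commit message
describes: the walk core snapshots the PTE once into the visit context,
visitors consume the snapshot, and the walker rereads only after a leaf
visit in case the callback installed a table underneath. The types and
names below are simplified userspace stand-ins, not the kernel's
kvm_pgtable API:

/*
 * Minimal model of "stash the observed PTE": read *ptep exactly once into
 * ctx.old, let visitors consume ctx.old, and reread only after a leaf
 * visit in case the callback installed a table. Simplified stand-in types,
 * not the kernel's kvm_pgtable code.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pte_t;

struct visit_ctx {
        pte_t           *ptep;  /* location being visited */
        pte_t           old;    /* value observed when the visit began */
        uint32_t        level;
};

/* toy encoding: bit 0 = valid, bit 1 = table (only above the last level) */
static bool pte_is_table(pte_t pte, uint32_t level)
{
        return level < 3 && (pte & 0x3) == 0x3;
}

/* example leaf visitor: swaps the leaf for a (toy) table entry */
static int visit_leaf(struct visit_ctx *ctx)
{
        printf("leaf at level %u, old=%#llx\n", (unsigned)ctx->level,
               (unsigned long long)ctx->old);
        *ctx->ptep = ctx->old | 0x3;    /* the callback may install a table */
        return 0;
}

static int visit_pte(pte_t *ptep, uint32_t level)
{
        struct visit_ctx ctx = {
                .ptep   = ptep,
                .old    = *ptep,        /* single read up front */
                .level  = level,
        };
        bool table = pte_is_table(ctx.old, level);
        int ret = 0;

        if (!table) {
                ret = visit_leaf(&ctx);
                /* reread in case the callback installed a table underneath */
                ctx.old = *ctx.ptep;
                table = pte_is_table(ctx.old, level);
        }

        if (!ret && table)
                printf("descend into table at level %u\n", (unsigned)level);

        return ret;
}

int main(void)
{
        pte_t pte = 0x1;        /* toy valid leaf */

        return visit_pte(&pte, 2);
}

This mirrors the READ_ONCE() snapshot taken in __kvm_pgtable_visit() and
the reread after the leaf callback in the quoted diff.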


> ---
>  arch/arm64/include/asm/kvm_pgtable.h  |  1 +
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |  5 +-
>  arch/arm64/kvm/hyp/nvhe/setup.c       |  7 +--
>  arch/arm64/kvm/hyp/pgtable.c          | 86 +++++++++++++--------------
>  4 files changed, 48 insertions(+), 51 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 607f9bb8aab4..14d4b68a1e92 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -201,6 +201,7 @@ enum kvm_pgtable_walk_flags {
>
>  struct kvm_pgtable_visit_ctx {
>         kvm_pte_t                               *ptep;
> +       kvm_pte_t                               old;
>         void                                    *arg;
>         u64                                     addr;
>         u64                                     end;
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 8f5b6a36a039..d21d1b08a055 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -421,12 +421,11 @@ static int __check_page_state_visitor(const struct kvm_pgtable_visit_ctx *ctx,
>                                       enum kvm_pgtable_walk_flags visit)
>  {
>         struct check_walk_data *d = ctx->arg;
> -       kvm_pte_t pte = *ctx->ptep;
>
> -       if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
> +       if (kvm_pte_valid(ctx->old) && !addr_is_memory(kvm_pte_to_phys(ctx->old)))
>                 return -EINVAL;
>
> -       return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
> +       return d->get_page_state(ctx->old) == d->desired ? 0 : -EPERM;
>  }
>
>  static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index a293cf5eba1b..6af443c9d78e 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -192,10 +192,9 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
>         struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
>         enum kvm_pgtable_prot prot;
>         enum pkvm_page_state state;
> -       kvm_pte_t pte = *ctx->ptep;
>         phys_addr_t phys;
>
> -       if (!kvm_pte_valid(pte))
> +       if (!kvm_pte_valid(ctx->old))
>                 return 0;
>
>         /*
> @@ -210,7 +209,7 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
>         if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
>                 return -EINVAL;
>
> -       phys = kvm_pte_to_phys(pte);
> +       phys = kvm_pte_to_phys(ctx->old);
>         if (!addr_is_memory(phys))
>                 return -EINVAL;
>
> @@ -218,7 +217,7 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
>          * Adjust the host stage-2 mappings to match the ownership attributes
>          * configured in the hypervisor stage-1.
>          */
> -       state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
> +       state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(ctx->old));
>         switch (state) {
>         case PKVM_PAGE_OWNED:
>                 return host_stage2_set_owner_locked(phys, PAGE_SIZE, pkvm_hyp_id);
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 900c8b9c0cfc..fb3696b3a997 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -189,6 +189,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>         enum kvm_pgtable_walk_flags flags = data->walker->flags;
>         struct kvm_pgtable_visit_ctx ctx = {
>                 .ptep   = ptep,
> +               .old    = READ_ONCE(*ptep),
>                 .arg    = data->walker->arg,
>                 .addr   = data->addr,
>                 .end    = data->end,
> @@ -196,16 +197,16 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 .flags  = flags,
>         };
>         int ret = 0;
> -       kvm_pte_t *childp, pte = *ptep;
> -       bool table = kvm_pte_table(pte, level);
> +       kvm_pte_t *childp;
> +       bool table = kvm_pte_table(ctx.old, level);
>
>         if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
>                 ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);
>
>         if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
>                 ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
> -               pte = *ptep;
> -               table = kvm_pte_table(pte, level);
> +               ctx.old = READ_ONCE(*ptep);
> +               table = kvm_pte_table(ctx.old, level);
>         }
>
>         if (ret)
> @@ -217,7 +218,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 goto out;
>         }
>
> -       childp = kvm_pte_follow(pte, data->pgt->mm_ops);
> +       childp = kvm_pte_follow(ctx.old, data->pgt->mm_ops);
>         ret = __kvm_pgtable_walk(data, childp, level + 1);
>         if (ret)
>                 goto out;
> @@ -299,7 +300,7 @@ static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>         struct leaf_walk_data *data = ctx->arg;
>
> -       data->pte   = *ctx->ptep;
> +       data->pte   = ctx->old;
>         data->level = ctx->level;
>
>         return 0;
> @@ -388,7 +389,7 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
>  static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                     struct hyp_map_data *data)
>  {
> -       kvm_pte_t new, old = *ctx->ptep;
> +       kvm_pte_t new;
>         u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>
>         if (!kvm_block_mapping_supported(ctx, phys))
> @@ -396,11 +397,11 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>
>         data->phys += granule;
>         new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
> -       if (old == new)
> +       if (ctx->old == new)
>                 return true;
> -       if (!kvm_pte_valid(old))
> +       if (!kvm_pte_valid(ctx->old))
>                 data->mm_ops->get_page(ctx->ptep);
> -       else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
> +       else if (WARN_ON((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>                 return false;
>
>         smp_store_release(ctx->ptep, new);
> @@ -461,16 +462,16 @@ struct hyp_unmap_data {
>  static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                             enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +       kvm_pte_t *childp = NULL;
>         u64 granule = kvm_granule_size(ctx->level);
>         struct hyp_unmap_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> -       if (!kvm_pte_valid(pte))
> +       if (!kvm_pte_valid(ctx->old))
>                 return -EINVAL;
>
> -       if (kvm_pte_table(pte, ctx->level)) {
> -               childp = kvm_pte_follow(pte, mm_ops);
> +       if (kvm_pte_table(ctx->old, ctx->level)) {
> +               childp = kvm_pte_follow(ctx->old, mm_ops);
>
>                 if (mm_ops->page_count(childp) != 1)
>                         return 0;
> @@ -538,15 +539,14 @@ static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                            enum kvm_pgtable_walk_flags visit)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> -       kvm_pte_t pte = *ctx->ptep;
>
> -       if (!kvm_pte_valid(pte))
> +       if (!kvm_pte_valid(ctx->old))
>                 return 0;
>
>         mm_ops->put_page(ctx->ptep);
>
> -       if (kvm_pte_table(pte, ctx->level))
> -               mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
> +       if (kvm_pte_table(ctx->old, ctx->level))
> +               mm_ops->put_page(kvm_pte_follow(ctx->old, mm_ops));
>
>         return 0;
>  }
> @@ -691,7 +691,7 @@ static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s
>          * Clear the existing PTE, and perform break-before-make with
>          * TLB maintenance if it was valid.
>          */
> -       if (kvm_pte_valid(*ctx->ptep)) {
> +       if (kvm_pte_valid(ctx->old)) {
>                 kvm_clear_pte(ctx->ptep);
>                 kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
>         }
> @@ -722,7 +722,7 @@ static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
>  static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                       struct stage2_map_data *data)
>  {
> -       kvm_pte_t new, old = *ctx->ptep;
> +       kvm_pte_t new;
>         u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>         struct kvm_pgtable *pgt = data->mmu->pgt;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> @@ -735,14 +735,14 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         else
>                 new = kvm_init_invalid_leaf_owner(data->owner_id);
>
> -       if (stage2_pte_is_counted(old)) {
> +       if (stage2_pte_is_counted(ctx->old)) {
>                 /*
>                  * Skip updating the PTE if we are trying to recreate the exact
>                  * same mapping or only change the access permissions. Instead,
>                  * the vCPU will exit one more time from guest if still needed
>                  * and then go through the path of relaxing permissions.
>                  */
> -               if (!stage2_pte_needs_update(old, new))
> +               if (!stage2_pte_needs_update(ctx->old, new))
>                         return -EAGAIN;
>
>                 stage2_put_pte(ctx, data->mmu, mm_ops);
> @@ -773,7 +773,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>         if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return 0;
>
> -       data->childp = kvm_pte_follow(*ctx->ptep, data->mm_ops);
> +       data->childp = kvm_pte_follow(ctx->old, data->mm_ops);
>         kvm_clear_pte(ctx->ptep);
>
>         /*
> @@ -790,11 +790,11 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                 struct stage2_map_data *data)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> -       kvm_pte_t *childp, pte = *ctx->ptep;
> +       kvm_pte_t *childp;
>         int ret;
>
>         if (data->anchor) {
> -               if (stage2_pte_is_counted(pte))
> +               if (stage2_pte_is_counted(ctx->old))
>                         mm_ops->put_page(ctx->ptep);
>
>                 return 0;
> @@ -819,7 +819,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>          * a table. Accesses beyond 'end' that fall within the new table
>          * will be mapped lazily.
>          */
> -       if (stage2_pte_is_counted(pte))
> +       if (stage2_pte_is_counted(ctx->old))
>                 stage2_put_pte(ctx, data->mmu, mm_ops);
>
>         kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> @@ -844,7 +844,7 @@ static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>                 data->childp = NULL;
>                 ret = stage2_map_walk_leaf(ctx, data);
>         } else {
> -               childp = kvm_pte_follow(*ctx->ptep, mm_ops);
> +               childp = kvm_pte_follow(ctx->old, mm_ops);
>         }
>
>         mm_ops->put_page(childp);
> @@ -954,23 +954,23 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>         struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_s2_mmu *mmu = pgt->mmu;
>         struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -       kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +       kvm_pte_t *childp = NULL;
>         bool need_flush = false;
>
> -       if (!kvm_pte_valid(pte)) {
> -               if (stage2_pte_is_counted(pte)) {
> +       if (!kvm_pte_valid(ctx->old)) {
> +               if (stage2_pte_is_counted(ctx->old)) {
>                         kvm_clear_pte(ctx->ptep);
>                         mm_ops->put_page(ctx->ptep);
>                 }
>                 return 0;
>         }
>
> -       if (kvm_pte_table(pte, ctx->level)) {
> -               childp = kvm_pte_follow(pte, mm_ops);
> +       if (kvm_pte_table(ctx->old, ctx->level)) {
> +               childp = kvm_pte_follow(ctx->old, mm_ops);
>
>                 if (mm_ops->page_count(childp) != 1)
>                         return 0;
> -       } else if (stage2_pte_cacheable(pgt, pte)) {
> +       } else if (stage2_pte_cacheable(pgt, ctx->old)) {
>                 need_flush = !stage2_has_fwb(pgt);
>         }
>
> @@ -982,7 +982,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>         stage2_put_pte(ctx, mmu, mm_ops);
>
>         if (need_flush && mm_ops->dcache_clean_inval_poc)
> -               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> +               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
>                                                kvm_granule_size(ctx->level));
>
>         if (childp)
> @@ -1013,11 +1013,11 @@ struct stage2_attr_data {
>  static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                               enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t pte = *ctx->ptep;
> +       kvm_pte_t pte = ctx->old;
>         struct stage2_attr_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> -       if (!kvm_pte_valid(pte))
> +       if (!kvm_pte_valid(ctx->old))
>                 return 0;
>
>         data->level = ctx->level;
> @@ -1036,7 +1036,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                  * stage-2 PTE if we are going to add executable permission.
>                  */
>                 if (mm_ops->icache_inval_pou &&
> -                   stage2_pte_executable(pte) && !stage2_pte_executable(*ctx->ptep))
> +                   stage2_pte_executable(pte) && !stage2_pte_executable(ctx->old))
>                         mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
>                                                   kvm_granule_size(ctx->level));
>                 WRITE_ONCE(*ctx->ptep, pte);
> @@ -1142,13 +1142,12 @@ static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>         struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -       kvm_pte_t pte = *ctx->ptep;
>
> -       if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
> +       if (!kvm_pte_valid(ctx->old) || !stage2_pte_cacheable(pgt, ctx->old))
>                 return 0;
>
>         if (mm_ops->dcache_clean_inval_poc)
> -               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> +               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
>                                                kvm_granule_size(ctx->level));
>         return 0;
>  }
> @@ -1200,15 +1199,14 @@ static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                               enum kvm_pgtable_walk_flags visit)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> -       kvm_pte_t pte = *ctx->ptep;
>
> -       if (!stage2_pte_is_counted(pte))
> +       if (!stage2_pte_is_counted(ctx->old))
>                 return 0;
>
>         mm_ops->put_page(ctx->ptep);
>
> -       if (kvm_pte_table(pte, ctx->level))
> -               mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
> +       if (kvm_pte_table(ctx->old, ctx->level))
> +               mm_ops->put_page(kvm_pte_follow(ctx->old, mm_ops));
>
>         return 0;
>  }
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 03/14] KVM: arm64: Pass mm_ops through the visitor context
  2022-11-07 21:56   ` Oliver Upton
@ 2022-11-09 22:23     ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> As a prerequisite for getting visitors off of struct kvm_pgtable, pass
> mm_ops through the visitor context.
>
> No functional change intended.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

Reviewed-by: Ben Gardon <bgardon@google.com>
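
A minimal sketch of the shape this change gives the walkers: mm_ops is
threaded through the per-visit context by the walk core, so visitor
callbacks no longer need to carry it in their private argument structs.
Everything below uses simplified stand-in types, not the kernel's
kvm_pgtable API:

/*
 * Minimal model of carrying mm_ops in the per-visit context instead of in
 * each walker's private argument struct. Names and types are illustrative
 * stand-ins, not the kernel's kvm_pgtable code.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pte_t;

struct mm_ops {
        void (*get_page)(pte_t *ptep);
        void (*put_page)(pte_t *ptep);
};

struct visit_ctx {
        pte_t           *ptep;
        pte_t           old;
        void            *arg;           /* walker-private state only */
        struct mm_ops   *mm_ops;        /* shared ops live in the context */
};

static void get_page(pte_t *ptep) { printf("get_page(%p)\n", (void *)ptep); }
static void put_page(pte_t *ptep) { printf("put_page(%p)\n", (void *)ptep); }

/* a visitor reaches mm_ops via the context; no mm_ops in its arg struct */
static int free_walker(const struct visit_ctx *ctx)
{
        ctx->mm_ops->put_page(ctx->ptep);
        return 0;
}

static int walk_one(pte_t *ptep, struct mm_ops *mm_ops, void *arg,
                    int (*cb)(const struct visit_ctx *ctx))
{
        struct visit_ctx ctx = {
                .ptep   = ptep,
                .old    = *ptep,
                .arg    = arg,
                .mm_ops = mm_ops,       /* threaded through by the walk core */
        };

        return cb(&ctx);
}

int main(void)
{
        struct mm_ops ops = { .get_page = get_page, .put_page = put_page };
        pte_t pte = 0x1;

        return walk_one(&pte, &ops, NULL, free_walker);
}

The payoff visible in the quoted diff is mechanical: once the context
carries the shared pointer, hyp_unmap_data can be dropped entirely and
several other *_data structs lose their mm_ops field.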


> ---
>  arch/arm64/include/asm/kvm_pgtable.h |  1 +
>  arch/arm64/kvm/hyp/nvhe/setup.c      |  3 +-
>  arch/arm64/kvm/hyp/pgtable.c         | 63 +++++++++++-----------------
>  3 files changed, 26 insertions(+), 41 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 14d4b68a1e92..a752793482cb 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -203,6 +203,7 @@ struct kvm_pgtable_visit_ctx {
>         kvm_pte_t                               *ptep;
>         kvm_pte_t                               old;
>         void                                    *arg;
> +       struct kvm_pgtable_mm_ops               *mm_ops;
>         u64                                     addr;
>         u64                                     end;
>         u32                                     level;
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 6af443c9d78e..1068338d77f3 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -189,7 +189,7 @@ static void hpool_put_page(void *addr)
>  static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                                          enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>         enum kvm_pgtable_prot prot;
>         enum pkvm_page_state state;
>         phys_addr_t phys;
> @@ -239,7 +239,6 @@ static int finalize_host_mappings(void)
>         struct kvm_pgtable_walker walker = {
>                 .cb     = finalize_host_mappings_walker,
>                 .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> -               .arg    = pkvm_pgtable.mm_ops,
>         };
>         int i, ret;
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index fb3696b3a997..db25e81a9890 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -181,9 +181,10 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>  }
>
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -                             kvm_pte_t *pgtable, u32 level);
> +                             struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level);
>
>  static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> +                                     struct kvm_pgtable_mm_ops *mm_ops,
>                                       kvm_pte_t *ptep, u32 level)
>  {
>         enum kvm_pgtable_walk_flags flags = data->walker->flags;
> @@ -191,6 +192,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 .ptep   = ptep,
>                 .old    = READ_ONCE(*ptep),
>                 .arg    = data->walker->arg,
> +               .mm_ops = mm_ops,
>                 .addr   = data->addr,
>                 .end    = data->end,
>                 .level  = level,
> @@ -218,8 +220,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 goto out;
>         }
>
> -       childp = kvm_pte_follow(ctx.old, data->pgt->mm_ops);
> -       ret = __kvm_pgtable_walk(data, childp, level + 1);
> +       childp = kvm_pte_follow(ctx.old, mm_ops);
> +       ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
>         if (ret)
>                 goto out;
>
> @@ -231,7 +233,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>  }
>
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -                             kvm_pte_t *pgtable, u32 level)
> +                             struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level)
>  {
>         u32 idx;
>         int ret = 0;
> @@ -245,7 +247,7 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>                 if (data->addr >= data->end)
>                         break;
>
> -               ret = __kvm_pgtable_visit(data, ptep, level);
> +               ret = __kvm_pgtable_visit(data, mm_ops, ptep, level);
>                 if (ret)
>                         break;
>         }
> @@ -269,7 +271,7 @@ static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
>         for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
>                 kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
>
> -               ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
> +               ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
>                 if (ret)
>                         break;
>         }
> @@ -332,7 +334,6 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
>  struct hyp_map_data {
>         u64                             phys;
>         kvm_pte_t                       attr;
> -       struct kvm_pgtable_mm_ops       *mm_ops;
>  };
>
>  static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> @@ -400,7 +401,7 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         if (ctx->old == new)
>                 return true;
>         if (!kvm_pte_valid(ctx->old))
> -               data->mm_ops->get_page(ctx->ptep);
> +               ctx->mm_ops->get_page(ctx->ptep);
>         else if (WARN_ON((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>                 return false;
>
> @@ -413,7 +414,7 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>         kvm_pte_t *childp;
>         struct hyp_map_data *data = ctx->arg;
> -       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
>         if (hyp_map_walker_try_leaf(ctx, data))
>                 return 0;
> @@ -436,7 +437,6 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>         int ret;
>         struct hyp_map_data map_data = {
>                 .phys   = ALIGN_DOWN(phys, PAGE_SIZE),
> -               .mm_ops = pgt->mm_ops,
>         };
>         struct kvm_pgtable_walker walker = {
>                 .cb     = hyp_map_walker,
> @@ -454,18 +454,13 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>         return ret;
>  }
>
> -struct hyp_unmap_data {
> -       u64                             unmapped;
> -       struct kvm_pgtable_mm_ops       *mm_ops;
> -};
> -
>  static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                             enum kvm_pgtable_walk_flags visit)
>  {
>         kvm_pte_t *childp = NULL;
>         u64 granule = kvm_granule_size(ctx->level);
> -       struct hyp_unmap_data *data = ctx->arg;
> -       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +       u64 *unmapped = ctx->arg;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
>         if (!kvm_pte_valid(ctx->old))
>                 return -EINVAL;
> @@ -486,7 +481,7 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                 kvm_clear_pte(ctx->ptep);
>                 dsb(ishst);
>                 __tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
> -               data->unmapped += granule;
> +               *unmapped += granule;
>         }
>
>         dsb(ish);
> @@ -501,12 +496,10 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>
>  u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  {
> -       struct hyp_unmap_data unmap_data = {
> -               .mm_ops = pgt->mm_ops,
> -       };
> +       u64 unmapped = 0;
>         struct kvm_pgtable_walker walker = {
>                 .cb     = hyp_unmap_walker,
> -               .arg    = &unmap_data,
> +               .arg    = &unmapped,
>                 .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>         };
>
> @@ -514,7 +507,7 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>                 return 0;
>
>         kvm_pgtable_walk(pgt, addr, size, &walker);
> -       return unmap_data.unmapped;
> +       return unmapped;
>  }
>
>  int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
> @@ -538,7 +531,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>  static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                            enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
>         if (!kvm_pte_valid(ctx->old))
>                 return 0;
> @@ -556,7 +549,6 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>         struct kvm_pgtable_walker walker = {
>                 .cb     = hyp_free_walker,
>                 .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> -               .arg    = pgt->mm_ops,
>         };
>
>         WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> @@ -575,8 +567,6 @@ struct stage2_map_data {
>         struct kvm_s2_mmu               *mmu;
>         void                            *memcache;
>
> -       struct kvm_pgtable_mm_ops       *mm_ops;
> -
>         /* Force mappings to page granularity */
>         bool                            force_pte;
>  };
> @@ -725,7 +715,7 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         kvm_pte_t new;
>         u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>         struct kvm_pgtable *pgt = data->mmu->pgt;
> -       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
>         if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return -E2BIG;
> @@ -773,7 +763,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>         if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return 0;
>
> -       data->childp = kvm_pte_follow(ctx->old, data->mm_ops);
> +       data->childp = kvm_pte_follow(ctx->old, ctx->mm_ops);
>         kvm_clear_pte(ctx->ptep);
>
>         /*
> @@ -789,7 +779,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>  static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                 struct stage2_map_data *data)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>         kvm_pte_t *childp;
>         int ret;
>
> @@ -831,7 +821,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>  static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>                                       struct stage2_map_data *data)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>         kvm_pte_t *childp;
>         int ret = 0;
>
> @@ -898,7 +888,6 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>                 .phys           = ALIGN_DOWN(phys, PAGE_SIZE),
>                 .mmu            = pgt->mmu,
>                 .memcache       = mc,
> -               .mm_ops         = pgt->mm_ops,
>                 .force_pte      = pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
>         };
>         struct kvm_pgtable_walker walker = {
> @@ -929,7 +918,6 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>                 .phys           = KVM_PHYS_INVALID,
>                 .mmu            = pgt->mmu,
>                 .memcache       = mc,
> -               .mm_ops         = pgt->mm_ops,
>                 .owner_id       = owner_id,
>                 .force_pte      = true,
>         };
> @@ -953,7 +941,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>         struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_s2_mmu *mmu = pgt->mmu;
> -       struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>         kvm_pte_t *childp = NULL;
>         bool need_flush = false;
>
> @@ -1007,7 +995,6 @@ struct stage2_attr_data {
>         kvm_pte_t                       attr_clr;
>         kvm_pte_t                       pte;
>         u32                             level;
> -       struct kvm_pgtable_mm_ops       *mm_ops;
>  };
>
>  static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
> @@ -1015,7 +1002,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>         kvm_pte_t pte = ctx->old;
>         struct stage2_attr_data *data = ctx->arg;
> -       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
>         if (!kvm_pte_valid(ctx->old))
>                 return 0;
> @@ -1055,7 +1042,6 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>         struct stage2_attr_data data = {
>                 .attr_set       = attr_set & attr_mask,
>                 .attr_clr       = attr_clr & attr_mask,
> -               .mm_ops         = pgt->mm_ops,
>         };
>         struct kvm_pgtable_walker walker = {
>                 .cb             = stage2_attr_walker,
> @@ -1198,7 +1184,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>  static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                               enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
>         if (!stage2_pte_is_counted(ctx->old))
>                 return 0;
> @@ -1218,7 +1204,6 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>                 .cb     = stage2_free_walker,
>                 .flags  = KVM_PGTABLE_WALK_LEAF |
>                           KVM_PGTABLE_WALK_TABLE_POST,
> -               .arg    = pgt->mm_ops,
>         };
>
>         WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 03/14] KVM: arm64: Pass mm_ops through the visitor context
@ 2022-11-09 22:23     ` Ben Gardon
  0 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, Marc Zyngier, Will Deacon, kvmarm, David Matlack, kvmarm,
	linux-arm-kernel

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> As a prerequisite for getting visitors off of struct kvm_pgtable, pass
> mm_ops through the visitor context.
>
> No functional change intended.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

Reviewed-by: Ben Gardon <bgardon@google.com>


> ---
>  arch/arm64/include/asm/kvm_pgtable.h |  1 +
>  arch/arm64/kvm/hyp/nvhe/setup.c      |  3 +-
>  arch/arm64/kvm/hyp/pgtable.c         | 63 +++++++++++-----------------
>  3 files changed, 26 insertions(+), 41 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 14d4b68a1e92..a752793482cb 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -203,6 +203,7 @@ struct kvm_pgtable_visit_ctx {
>         kvm_pte_t                               *ptep;
>         kvm_pte_t                               old;
>         void                                    *arg;
> +       struct kvm_pgtable_mm_ops               *mm_ops;
>         u64                                     addr;
>         u64                                     end;
>         u32                                     level;
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 6af443c9d78e..1068338d77f3 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -189,7 +189,7 @@ static void hpool_put_page(void *addr)
>  static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                                          enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>         enum kvm_pgtable_prot prot;
>         enum pkvm_page_state state;
>         phys_addr_t phys;
> @@ -239,7 +239,6 @@ static int finalize_host_mappings(void)
>         struct kvm_pgtable_walker walker = {
>                 .cb     = finalize_host_mappings_walker,
>                 .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> -               .arg    = pkvm_pgtable.mm_ops,
>         };
>         int i, ret;
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index fb3696b3a997..db25e81a9890 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -181,9 +181,10 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>  }
>
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -                             kvm_pte_t *pgtable, u32 level);
> +                             struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level);
>
>  static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> +                                     struct kvm_pgtable_mm_ops *mm_ops,
>                                       kvm_pte_t *ptep, u32 level)
>  {
>         enum kvm_pgtable_walk_flags flags = data->walker->flags;
> @@ -191,6 +192,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 .ptep   = ptep,
>                 .old    = READ_ONCE(*ptep),
>                 .arg    = data->walker->arg,
> +               .mm_ops = mm_ops,
>                 .addr   = data->addr,
>                 .end    = data->end,
>                 .level  = level,
> @@ -218,8 +220,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 goto out;
>         }
>
> -       childp = kvm_pte_follow(ctx.old, data->pgt->mm_ops);
> -       ret = __kvm_pgtable_walk(data, childp, level + 1);
> +       childp = kvm_pte_follow(ctx.old, mm_ops);
> +       ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
>         if (ret)
>                 goto out;
>
> @@ -231,7 +233,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>  }
>
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -                             kvm_pte_t *pgtable, u32 level)
> +                             struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level)
>  {
>         u32 idx;
>         int ret = 0;
> @@ -245,7 +247,7 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>                 if (data->addr >= data->end)
>                         break;
>
> -               ret = __kvm_pgtable_visit(data, ptep, level);
> +               ret = __kvm_pgtable_visit(data, mm_ops, ptep, level);
>                 if (ret)
>                         break;
>         }
> @@ -269,7 +271,7 @@ static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
>         for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
>                 kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
>
> -               ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
> +               ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
>                 if (ret)
>                         break;
>         }
> @@ -332,7 +334,6 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
>  struct hyp_map_data {
>         u64                             phys;
>         kvm_pte_t                       attr;
> -       struct kvm_pgtable_mm_ops       *mm_ops;
>  };
>
>  static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> @@ -400,7 +401,7 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         if (ctx->old == new)
>                 return true;
>         if (!kvm_pte_valid(ctx->old))
> -               data->mm_ops->get_page(ctx->ptep);
> +               ctx->mm_ops->get_page(ctx->ptep);
>         else if (WARN_ON((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>                 return false;
>
> @@ -413,7 +414,7 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>         kvm_pte_t *childp;
>         struct hyp_map_data *data = ctx->arg;
> -       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
>         if (hyp_map_walker_try_leaf(ctx, data))
>                 return 0;
> @@ -436,7 +437,6 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>         int ret;
>         struct hyp_map_data map_data = {
>                 .phys   = ALIGN_DOWN(phys, PAGE_SIZE),
> -               .mm_ops = pgt->mm_ops,
>         };
>         struct kvm_pgtable_walker walker = {
>                 .cb     = hyp_map_walker,
> @@ -454,18 +454,13 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>         return ret;
>  }
>
> -struct hyp_unmap_data {
> -       u64                             unmapped;
> -       struct kvm_pgtable_mm_ops       *mm_ops;
> -};
> -
>  static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                             enum kvm_pgtable_walk_flags visit)
>  {
>         kvm_pte_t *childp = NULL;
>         u64 granule = kvm_granule_size(ctx->level);
> -       struct hyp_unmap_data *data = ctx->arg;
> -       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +       u64 *unmapped = ctx->arg;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
>         if (!kvm_pte_valid(ctx->old))
>                 return -EINVAL;
> @@ -486,7 +481,7 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                 kvm_clear_pte(ctx->ptep);
>                 dsb(ishst);
>                 __tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
> -               data->unmapped += granule;
> +               *unmapped += granule;
>         }
>
>         dsb(ish);
> @@ -501,12 +496,10 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>
>  u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  {
> -       struct hyp_unmap_data unmap_data = {
> -               .mm_ops = pgt->mm_ops,
> -       };
> +       u64 unmapped = 0;
>         struct kvm_pgtable_walker walker = {
>                 .cb     = hyp_unmap_walker,
> -               .arg    = &unmap_data,
> +               .arg    = &unmapped,
>                 .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>         };
>
> @@ -514,7 +507,7 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>                 return 0;
>
>         kvm_pgtable_walk(pgt, addr, size, &walker);
> -       return unmap_data.unmapped;
> +       return unmapped;
>  }
>
>  int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
> @@ -538,7 +531,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>  static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                            enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
>         if (!kvm_pte_valid(ctx->old))
>                 return 0;
> @@ -556,7 +549,6 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>         struct kvm_pgtable_walker walker = {
>                 .cb     = hyp_free_walker,
>                 .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> -               .arg    = pgt->mm_ops,
>         };
>
>         WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> @@ -575,8 +567,6 @@ struct stage2_map_data {
>         struct kvm_s2_mmu               *mmu;
>         void                            *memcache;
>
> -       struct kvm_pgtable_mm_ops       *mm_ops;
> -
>         /* Force mappings to page granularity */
>         bool                            force_pte;
>  };
> @@ -725,7 +715,7 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         kvm_pte_t new;
>         u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>         struct kvm_pgtable *pgt = data->mmu->pgt;
> -       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
>         if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return -E2BIG;
> @@ -773,7 +763,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>         if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return 0;
>
> -       data->childp = kvm_pte_follow(ctx->old, data->mm_ops);
> +       data->childp = kvm_pte_follow(ctx->old, ctx->mm_ops);
>         kvm_clear_pte(ctx->ptep);
>
>         /*
> @@ -789,7 +779,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>  static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                 struct stage2_map_data *data)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>         kvm_pte_t *childp;
>         int ret;
>
> @@ -831,7 +821,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>  static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>                                       struct stage2_map_data *data)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>         kvm_pte_t *childp;
>         int ret = 0;
>
> @@ -898,7 +888,6 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>                 .phys           = ALIGN_DOWN(phys, PAGE_SIZE),
>                 .mmu            = pgt->mmu,
>                 .memcache       = mc,
> -               .mm_ops         = pgt->mm_ops,
>                 .force_pte      = pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
>         };
>         struct kvm_pgtable_walker walker = {
> @@ -929,7 +918,6 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>                 .phys           = KVM_PHYS_INVALID,
>                 .mmu            = pgt->mmu,
>                 .memcache       = mc,
> -               .mm_ops         = pgt->mm_ops,
>                 .owner_id       = owner_id,
>                 .force_pte      = true,
>         };
> @@ -953,7 +941,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>         struct kvm_pgtable *pgt = ctx->arg;
>         struct kvm_s2_mmu *mmu = pgt->mmu;
> -       struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>         kvm_pte_t *childp = NULL;
>         bool need_flush = false;
>
> @@ -1007,7 +995,6 @@ struct stage2_attr_data {
>         kvm_pte_t                       attr_clr;
>         kvm_pte_t                       pte;
>         u32                             level;
> -       struct kvm_pgtable_mm_ops       *mm_ops;
>  };
>
>  static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
> @@ -1015,7 +1002,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>         kvm_pte_t pte = ctx->old;
>         struct stage2_attr_data *data = ctx->arg;
> -       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
>         if (!kvm_pte_valid(ctx->old))
>                 return 0;
> @@ -1055,7 +1042,6 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>         struct stage2_attr_data data = {
>                 .attr_set       = attr_set & attr_mask,
>                 .attr_clr       = attr_clr & attr_mask,
> -               .mm_ops         = pgt->mm_ops,
>         };
>         struct kvm_pgtable_walker walker = {
>                 .cb             = stage2_attr_walker,
> @@ -1198,7 +1184,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>  static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                               enum kvm_pgtable_walk_flags visit)
>  {
> -       struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
>         if (!stage2_pte_is_counted(ctx->old))
>                 return 0;
> @@ -1218,7 +1204,6 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>                 .cb     = stage2_free_walker,
>                 .flags  = KVM_PGTABLE_WALK_LEAF |
>                           KVM_PGTABLE_WALK_TABLE_POST,
> -               .arg    = pgt->mm_ops,
>         };
>
>         WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread
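A quick illustration of what the change means for an individual visitor:
mm_ops now arrives through the visit context rather than being smuggled
in via walker->arg. The callback name below is invented for the sake of
example; the body simply mirrors the hyp_free_walker hunk in the diff
above.

static int example_put_walker(const struct kvm_pgtable_visit_ctx *ctx,
			      enum kvm_pgtable_walk_flags visit)
{
	/* Provided by the walk machinery, no longer threaded through arg. */
	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;

	if (!kvm_pte_valid(ctx->old))
		return 0;

	mm_ops->put_page(ctx->ptep);
	return 0;
}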

* Re: [PATCH v5 04/14] KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data
  2022-11-07 21:56   ` Oliver Upton
  (?)
@ 2022-11-09 22:23     ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> In order to tear down page tables from outside the context of
> kvm_pgtable (such as an RCU callback), stop passing a pointer through
> kvm_pgtable_walk_data.
>
> No functional change intended.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

Reviewed-by: Ben Gardon <bgardon@google.com>


> ---
>  arch/arm64/kvm/hyp/pgtable.c | 18 +++++-------------
>  1 file changed, 5 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index db25e81a9890..93989b750a26 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -50,7 +50,6 @@
>  #define KVM_MAX_OWNER_ID               1
>
>  struct kvm_pgtable_walk_data {
> -       struct kvm_pgtable              *pgt;
>         struct kvm_pgtable_walker       *walker;
>
>         u64                             addr;
> @@ -88,7 +87,7 @@ static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
>         return (data->addr >> shift) & mask;
>  }
>
> -static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
> +static u32 kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
>  {
>         u64 shift = kvm_granule_shift(pgt->start_level - 1); /* May underflow */
>         u64 mask = BIT(pgt->ia_bits) - 1;
> @@ -96,11 +95,6 @@ static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
>         return (addr & mask) >> shift;
>  }
>
> -static u32 kvm_pgd_page_idx(struct kvm_pgtable_walk_data *data)
> -{
> -       return __kvm_pgd_page_idx(data->pgt, data->addr);
> -}
> -
>  static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
>  {
>         struct kvm_pgtable pgt = {
> @@ -108,7 +102,7 @@ static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
>                 .start_level    = start_level,
>         };
>
> -       return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
> +       return kvm_pgd_page_idx(&pgt, -1ULL) + 1;
>  }
>
>  static bool kvm_pte_table(kvm_pte_t pte, u32 level)
> @@ -255,11 +249,10 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>         return ret;
>  }
>
> -static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
> +static int _kvm_pgtable_walk(struct kvm_pgtable *pgt, struct kvm_pgtable_walk_data *data)
>  {
>         u32 idx;
>         int ret = 0;
> -       struct kvm_pgtable *pgt = data->pgt;
>         u64 limit = BIT(pgt->ia_bits);
>
>         if (data->addr > limit || data->end > limit)
> @@ -268,7 +261,7 @@ static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
>         if (!pgt->pgd)
>                 return -EINVAL;
>
> -       for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
> +       for (idx = kvm_pgd_page_idx(pgt, data->addr); data->addr < data->end; ++idx) {
>                 kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
>
>                 ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
> @@ -283,13 +276,12 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>                      struct kvm_pgtable_walker *walker)
>  {
>         struct kvm_pgtable_walk_data walk_data = {
> -               .pgt    = pgt,
>                 .addr   = ALIGN_DOWN(addr, PAGE_SIZE),
>                 .end    = PAGE_ALIGN(walk_data.addr + size),
>                 .walker = walker,
>         };
>
> -       return _kvm_pgtable_walk(&walk_data);
> +       return _kvm_pgtable_walk(pgt, &walk_data);
>  }
>
>  struct leaf_walk_data {
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread
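With the kvm_pgtable pointer gone from kvm_pgtable_walk_data, a caller
that has no struct kvm_pgtable in hand can describe a walk with nothing
more than a walker and a range, and enter the tables at an arbitrary
level; the next patch in the thread uses exactly this shape. A rough
sketch, with an invented function name (__kvm_pgtable_walk() is static
to pgtable.c, so a helper like this would have to live there):

static void example_walk_unlinked(struct kvm_pgtable_mm_ops *mm_ops,
				  struct kvm_pgtable_walker *walker,
				  kvm_pte_t *ptep, u32 level)
{
	struct kvm_pgtable_walk_data data = {
		.walker	= walker,
		/* No meaningful IPA; just span one table's worth at this level. */
		.addr	= 0,
		.end	= kvm_granule_size(level),
	};

	WARN_ON(__kvm_pgtable_walk(&data, mm_ops, ptep, level));
}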

* Re: [PATCH v5 05/14] KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
  2022-11-07 21:56   ` Oliver Upton
  (?)
@ 2022-11-09 22:23     ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> A subsequent change to KVM will move the teardown of an unlinked
> stage-2 subtree out of the critical path of the break-before-make
> sequence.
>
> Introduce a new helper for tearing down unlinked stage-2 subtrees.
> Leverage the existing stage-2 free walkers to do so, with a deep call
> into __kvm_pgtable_walk() as the subtree is no longer reachable from the
> root.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 11 +++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 23 +++++++++++++++++++++++
>  2 files changed, 34 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index a752793482cb..93b1feeaebab 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -333,6 +333,17 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>   */
>  void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
>
> +/**
> + * kvm_pgtable_stage2_free_removed() - Free a removed stage-2 paging structure.
> + * @mm_ops:    Memory management callbacks.
> + * @pgtable:   Unlinked stage-2 paging structure to be freed.
> + * @level:     Level of the stage-2 paging structure to be freed.
> + *
> + * The page-table is assumed to be unreachable by any hardware walkers prior to
> + * freeing and therefore no TLB invalidation is performed.
> + */
> +void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level);
> +
>  /**
>   * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
>   * @pgt:       Page-table structure initialised by kvm_pgtable_stage2_init*().
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 93989b750a26..363a5cce7e1a 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -1203,3 +1203,26 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>         pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
>         pgt->pgd = NULL;
>  }
> +
> +void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level)
> +{
> +       kvm_pte_t *ptep = (kvm_pte_t *)pgtable;
> +       struct kvm_pgtable_walker walker = {
> +               .cb     = stage2_free_walker,
> +               .flags  = KVM_PGTABLE_WALK_LEAF |
> +                         KVM_PGTABLE_WALK_TABLE_POST,
> +       };
> +       struct kvm_pgtable_walk_data data = {
> +               .walker = &walker,
> +
> +               /*
> +                * At this point the IPA really doesn't matter, as the page
> +                * table being traversed has already been removed from the stage
> +                * 2. Set an appropriate range to cover the entire page table.
> +                */
> +               .addr   = 0,
> +               .end    = kvm_granule_size(level),
> +       };
> +
> +       WARN_ON(__kvm_pgtable_walk(&data, mm_ops, ptep, level));
> +}

Will this callback be able to yield? In my experience, when processing a
large teardown (e.g. a high-level table covering a 512G region) it is
possible to hit scheduler tick warnings.


> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread
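For context on how the new helper is expected to be consumed once the
free is pushed out of the break-before-make critical section, here is a
sketch only, not code from this series: the detached table's level is
assumed to be stashed in page_private(), and kvm_s2_mm_ops is the
stage-2 mm_ops instance already defined in mmu.c. Note that an RCU
callback runs in a context that cannot sleep, so a very large detached
subtree would still be freed in one go, which is what the question
above is probing.

static void example_free_removed_table_rcu_cb(struct rcu_head *head)
{
	struct page *page = container_of(head, struct page, rcu_head);
	void *pgtable = page_to_virt(page);
	u32 level = page_private(page);

	/* The subtree is already unreachable; no TLB maintenance needed. */
	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, pgtable, level);
}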

* Re: [PATCH v5 06/14] KVM: arm64: Use an opaque type for pteps
  2022-11-07 21:56   ` Oliver Upton
  (?)
@ 2022-11-09 22:23     ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Use an opaque type for pteps and require visitors to explicitly
> dereference the pointer before using it. Protecting page table memory
> with RCU requires that KVM dereference RCU-annotated pointers before
> use. However, RCU is not available for use in the nVHE hypervisor, so
> the opaque type can only be conditionally annotated with RCU for the
> stage-2 MMU.
>
> Call the type a 'pteref' to avoid a naming collision with raw pteps. No
> functional change intended.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h |  9 ++++++++-
>  arch/arm64/kvm/hyp/pgtable.c         | 27 ++++++++++++++-------------
>  arch/arm64/kvm/mmu.c                 |  2 +-
>  3 files changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 93b1feeaebab..cbd2851eefc1 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
>
>  typedef u64 kvm_pte_t;
>
> +typedef kvm_pte_t *kvm_pteref_t;
> +
> +static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)

Since shared is not used and never true as of this commit, it would
probably be worth explaining what it's for in the change description.


> +{
> +       return pteref;
> +}
> +
>  #define KVM_PTE_VALID                  BIT(0)
>
>  #define KVM_PTE_ADDR_MASK              GENMASK(47, PAGE_SHIFT)
> @@ -175,7 +182,7 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
>  struct kvm_pgtable {
>         u32                                     ia_bits;
>         u32                                     start_level;
> -       kvm_pte_t                               *pgd;
> +       kvm_pteref_t                            pgd;
>         struct kvm_pgtable_mm_ops               *mm_ops;
>
>         /* Stage-2 only */
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 363a5cce7e1a..7511494537e5 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -175,13 +175,14 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>  }
>
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -                             struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level);
> +                             struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level);
>
>  static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                                       struct kvm_pgtable_mm_ops *mm_ops,
> -                                     kvm_pte_t *ptep, u32 level)
> +                                     kvm_pteref_t pteref, u32 level)
>  {
>         enum kvm_pgtable_walk_flags flags = data->walker->flags;
> +       kvm_pte_t *ptep = kvm_dereference_pteref(pteref, false);
>         struct kvm_pgtable_visit_ctx ctx = {
>                 .ptep   = ptep,
>                 .old    = READ_ONCE(*ptep),
> @@ -193,7 +194,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 .flags  = flags,
>         };
>         int ret = 0;
> -       kvm_pte_t *childp;
> +       kvm_pteref_t childp;
>         bool table = kvm_pte_table(ctx.old, level);
>
>         if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
> @@ -214,7 +215,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 goto out;
>         }
>
> -       childp = kvm_pte_follow(ctx.old, mm_ops);
> +       childp = (kvm_pteref_t)kvm_pte_follow(ctx.old, mm_ops);
>         ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
>         if (ret)
>                 goto out;
> @@ -227,7 +228,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>  }
>
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -                             struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level)
> +                             struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level)
>  {
>         u32 idx;
>         int ret = 0;
> @@ -236,12 +237,12 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>                 return -EINVAL;
>
>         for (idx = kvm_pgtable_idx(data, level); idx < PTRS_PER_PTE; ++idx) {
> -               kvm_pte_t *ptep = &pgtable[idx];
> +               kvm_pteref_t pteref = &pgtable[idx];
>
>                 if (data->addr >= data->end)
>                         break;
>
> -               ret = __kvm_pgtable_visit(data, mm_ops, ptep, level);
> +               ret = __kvm_pgtable_visit(data, mm_ops, pteref, level);
>                 if (ret)
>                         break;
>         }
> @@ -262,9 +263,9 @@ static int _kvm_pgtable_walk(struct kvm_pgtable *pgt, struct kvm_pgtable_walk_da
>                 return -EINVAL;
>
>         for (idx = kvm_pgd_page_idx(pgt, data->addr); data->addr < data->end; ++idx) {
> -               kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
> +               kvm_pteref_t pteref = &pgt->pgd[idx * PTRS_PER_PTE];
>
> -               ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
> +               ret = __kvm_pgtable_walk(data, pgt->mm_ops, pteref, pgt->start_level);
>                 if (ret)
>                         break;
>         }
> @@ -507,7 +508,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>  {
>         u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
>
> -       pgt->pgd = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
> +       pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_page(NULL);
>         if (!pgt->pgd)
>                 return -ENOMEM;
>
> @@ -544,7 +545,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>         };
>
>         WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> -       pgt->mm_ops->put_page(pgt->pgd);
> +       pgt->mm_ops->put_page(kvm_dereference_pteref(pgt->pgd, false));
>         pgt->pgd = NULL;
>  }
>
> @@ -1157,7 +1158,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>         u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
>
>         pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
> -       pgt->pgd = mm_ops->zalloc_pages_exact(pgd_sz);
> +       pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_pages_exact(pgd_sz);
>         if (!pgt->pgd)
>                 return -ENOMEM;
>
> @@ -1200,7 +1201,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>
>         WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
>         pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
> -       pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
> +       pgt->mm_ops->free_pages_exact(kvm_dereference_pteref(pgt->pgd, false), pgd_sz);
>         pgt->pgd = NULL;
>  }
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 60ee3d9f01f8..5e197ae190ef 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -640,7 +640,7 @@ static struct kvm_pgtable_mm_ops kvm_user_mm_ops = {
>  static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>  {
>         struct kvm_pgtable pgt = {
> -               .pgd            = (kvm_pte_t *)kvm->mm->pgd,
> +               .pgd            = (kvm_pteref_t)kvm->mm->pgd,
>                 .ia_bits        = VA_BITS,
>                 .start_level    = (KVM_PGTABLE_MAX_LEVELS -
>                                    CONFIG_PGTABLE_LEVELS),
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread
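On the 'shared' question above, a sketch of where the parameter is
presumably headed once the stage-2 walkers run under RCU, as the commit
message hints (illustrative only; the real annotation and lockdep
condition are defined by later patches in the series):

/* Hypothetical non-hyp definition once RCU protection is wired up. */
typedef kvm_pte_t __rcu *kvm_pteref_t;

static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
{
	/*
	 * Exclusive walkers (holding the MMU lock for write) pass
	 * shared == false and may skip the RCU read-side check.
	 */
	return rcu_dereference_check(pteref, !shared);
}

That would leave the nVHE pass-through above and the RCU-annotated
variant differing only in the dereference, which is presumably why the
parameter is introduced ahead of any user.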

* Re: [PATCH v5 06/14] KVM: arm64: Use an opaque type for pteps
@ 2022-11-09 22:23     ` Ben Gardon
  0 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, Marc Zyngier, Will Deacon, kvmarm, David Matlack, kvmarm,
	linux-arm-kernel

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Use an opaque type for pteps and require visitors explicitly dereference
> the pointer before using. Protecting page table memory with RCU requires
> that KVM dereferences RCU-annotated pointers before using. However, RCU
> is not available for use in the nVHE hypervisor and the opaque type can
> be conditionally annotated with RCU for the stage-2 MMU.
>
> Call the type a 'pteref' to avoid a naming collision with raw pteps. No
> functional change intended.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h |  9 ++++++++-
>  arch/arm64/kvm/hyp/pgtable.c         | 27 ++++++++++++++-------------
>  arch/arm64/kvm/mmu.c                 |  2 +-
>  3 files changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 93b1feeaebab..cbd2851eefc1 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
>
>  typedef u64 kvm_pte_t;
>
> +typedef kvm_pte_t *kvm_pteref_t;
> +
> +static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)

Since shared is not used and never true as of this commit, it would
probably be worth explaining what it's for in the change description.


> +{
> +       return pteref;
> +}
> +
>  #define KVM_PTE_VALID                  BIT(0)
>
>  #define KVM_PTE_ADDR_MASK              GENMASK(47, PAGE_SHIFT)
> @@ -175,7 +182,7 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
>  struct kvm_pgtable {
>         u32                                     ia_bits;
>         u32                                     start_level;
> -       kvm_pte_t                               *pgd;
> +       kvm_pteref_t                            pgd;
>         struct kvm_pgtable_mm_ops               *mm_ops;
>
>         /* Stage-2 only */
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 363a5cce7e1a..7511494537e5 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -175,13 +175,14 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>  }
>
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -                             struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level);
> +                             struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level);
>
>  static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                                       struct kvm_pgtable_mm_ops *mm_ops,
> -                                     kvm_pte_t *ptep, u32 level)
> +                                     kvm_pteref_t pteref, u32 level)
>  {
>         enum kvm_pgtable_walk_flags flags = data->walker->flags;
> +       kvm_pte_t *ptep = kvm_dereference_pteref(pteref, false);
>         struct kvm_pgtable_visit_ctx ctx = {
>                 .ptep   = ptep,
>                 .old    = READ_ONCE(*ptep),
> @@ -193,7 +194,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 .flags  = flags,
>         };
>         int ret = 0;
> -       kvm_pte_t *childp;
> +       kvm_pteref_t childp;
>         bool table = kvm_pte_table(ctx.old, level);
>
>         if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
> @@ -214,7 +215,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                 goto out;
>         }
>
> -       childp = kvm_pte_follow(ctx.old, mm_ops);
> +       childp = (kvm_pteref_t)kvm_pte_follow(ctx.old, mm_ops);
>         ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
>         if (ret)
>                 goto out;
> @@ -227,7 +228,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>  }
>
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -                             struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level)
> +                             struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level)
>  {
>         u32 idx;
>         int ret = 0;
> @@ -236,12 +237,12 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>                 return -EINVAL;
>
>         for (idx = kvm_pgtable_idx(data, level); idx < PTRS_PER_PTE; ++idx) {
> -               kvm_pte_t *ptep = &pgtable[idx];
> +               kvm_pteref_t pteref = &pgtable[idx];
>
>                 if (data->addr >= data->end)
>                         break;
>
> -               ret = __kvm_pgtable_visit(data, mm_ops, ptep, level);
> +               ret = __kvm_pgtable_visit(data, mm_ops, pteref, level);
>                 if (ret)
>                         break;
>         }
> @@ -262,9 +263,9 @@ static int _kvm_pgtable_walk(struct kvm_pgtable *pgt, struct kvm_pgtable_walk_da
>                 return -EINVAL;
>
>         for (idx = kvm_pgd_page_idx(pgt, data->addr); data->addr < data->end; ++idx) {
> -               kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
> +               kvm_pteref_t pteref = &pgt->pgd[idx * PTRS_PER_PTE];
>
> -               ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
> +               ret = __kvm_pgtable_walk(data, pgt->mm_ops, pteref, pgt->start_level);
>                 if (ret)
>                         break;
>         }
> @@ -507,7 +508,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>  {
>         u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
>
> -       pgt->pgd = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
> +       pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_page(NULL);
>         if (!pgt->pgd)
>                 return -ENOMEM;
>
> @@ -544,7 +545,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>         };
>
>         WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> -       pgt->mm_ops->put_page(pgt->pgd);
> +       pgt->mm_ops->put_page(kvm_dereference_pteref(pgt->pgd, false));
>         pgt->pgd = NULL;
>  }
>
> @@ -1157,7 +1158,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>         u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
>
>         pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
> -       pgt->pgd = mm_ops->zalloc_pages_exact(pgd_sz);
> +       pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_pages_exact(pgd_sz);
>         if (!pgt->pgd)
>                 return -ENOMEM;
>
> @@ -1200,7 +1201,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>
>         WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
>         pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
> -       pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
> +       pgt->mm_ops->free_pages_exact(kvm_dereference_pteref(pgt->pgd, false), pgd_sz);
>         pgt->pgd = NULL;
>  }
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 60ee3d9f01f8..5e197ae190ef 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -640,7 +640,7 @@ static struct kvm_pgtable_mm_ops kvm_user_mm_ops = {
>  static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>  {
>         struct kvm_pgtable pgt = {
> -               .pgd            = (kvm_pte_t *)kvm->mm->pgd,
> +               .pgd            = (kvm_pteref_t)kvm->mm->pgd,
>                 .ia_bits        = VA_BITS,
>                 .start_level    = (KVM_PGTABLE_MAX_LEVELS -
>                                    CONFIG_PGTABLE_LEVELS),
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 07/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
  2022-11-07 21:56   ` Oliver Upton
  (?)
@ 2022-11-09 22:24     ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:24 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> The break-before-make sequence is a bit annoying as it opens a window
> wherein memory is unmapped from the guest. KVM should replace the PTE
> as quickly as possible and avoid unnecessary work in between.
>
> Presently, the stage-2 map walker tears down a removed table before
> installing a block mapping when coalescing a table into a block. As the
> removed table is no longer visible to hardware walkers after the
> DSB+TLBI, it is possible to move the remaining cleanup to happen after
> installing the new PTE.
>
> Reshuffle the stage-2 map walker to install the new block entry in
> the pre-order callback. Unwire all of the teardown logic and replace
> it with a call to kvm_pgtable_stage2_free_removed() after fixing
> the PTE. The post-order visitor is now completely unnecessary, so drop
> it. Finally, touch up the comments to better represent the now
> simplified map walker.
>
> Note that the call to tear down the unlinked stage-2 is indirected
> as a subsequent change will use an RCU callback to trigger tear down.
> RCU is not available to pKVM, so there is a need to use different
> implementations on pKVM and non-pKVM VMs.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

That anchor scheme is complicated. Glad to see it removed in favor of this.
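
For anyone skimming the hunks below, the new pre-order flow boils down to the
following (lifted straight from the diff, nothing new here), so the window in
which the guest mapping is gone only spans the break + TLBI, with the page
freeing pushed out of the critical path:

  kvm_clear_pte(ctx->ptep);                       /* break: zap the table entry */
  kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);  /* invalidate translations for the VMID */
  ret = stage2_map_walker_try_leaf(ctx, data);    /* make: install the block entry */
  mm_ops->put_page(ctx->ptep);
  mm_ops->free_removed_table(childp, ctx->level); /* teardown happens last */
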
Reviewed-by: Ben Gardon <bgardon@google.com>


> ---
>  arch/arm64/include/asm/kvm_pgtable.h  |  3 +
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |  6 ++
>  arch/arm64/kvm/hyp/pgtable.c          | 85 +++++++--------------------
>  arch/arm64/kvm/mmu.c                  |  8 +++
>  4 files changed, 39 insertions(+), 63 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index cbd2851eefc1..e70cf57b719e 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -92,6 +92,8 @@ static inline bool kvm_level_supports_block_mapping(u32 level)
>   *                             allocation is physically contiguous.
>   * @free_pages_exact:          Free an exact number of memory pages previously
>   *                             allocated by zalloc_pages_exact.
> + * @free_removed_table:                Free a removed paging structure by unlinking and
> + *                             dropping references.
>   * @get_page:                  Increment the refcount on a page.
>   * @put_page:                  Decrement the refcount on a page. When the
>   *                             refcount reaches 0 the page is automatically
> @@ -110,6 +112,7 @@ struct kvm_pgtable_mm_ops {
>         void*           (*zalloc_page)(void *arg);
>         void*           (*zalloc_pages_exact)(size_t size);
>         void            (*free_pages_exact)(void *addr, size_t size);
> +       void            (*free_removed_table)(void *addr, u32 level);
>         void            (*get_page)(void *addr);
>         void            (*put_page)(void *addr);
>         int             (*page_count)(void *addr);
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index d21d1b08a055..735769886b55 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -79,6 +79,11 @@ static void host_s2_put_page(void *addr)
>         hyp_put_page(&host_s2_pool, addr);
>  }
>
> +static void host_s2_free_removed_table(void *addr, u32 level)
> +{
> +       kvm_pgtable_stage2_free_removed(&host_kvm.mm_ops, addr, level);
> +}
> +
>  static int prepare_s2_pool(void *pgt_pool_base)
>  {
>         unsigned long nr_pages, pfn;
> @@ -93,6 +98,7 @@ static int prepare_s2_pool(void *pgt_pool_base)
>         host_kvm.mm_ops = (struct kvm_pgtable_mm_ops) {
>                 .zalloc_pages_exact = host_s2_zalloc_pages_exact,
>                 .zalloc_page = host_s2_zalloc_page,
> +               .free_removed_table = host_s2_free_removed_table,
>                 .phys_to_virt = hyp_phys_to_virt,
>                 .virt_to_phys = hyp_virt_to_phys,
>                 .page_count = hyp_page_count,
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 7511494537e5..7c9782347570 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -750,13 +750,13 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>  static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>                                      struct stage2_map_data *data)
>  {
> -       if (data->anchor)
> -               return 0;
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> +       kvm_pte_t *childp = kvm_pte_follow(ctx->old, mm_ops);
> +       int ret;
>
>         if (!stage2_leaf_mapping_allowed(ctx, data))
>                 return 0;
>
> -       data->childp = kvm_pte_follow(ctx->old, ctx->mm_ops);
>         kvm_clear_pte(ctx->ptep);
>
>         /*
> @@ -765,8 +765,13 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>          * individually.
>          */
>         kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> -       data->anchor = ctx->ptep;
> -       return 0;
> +
> +       ret = stage2_map_walker_try_leaf(ctx, data);
> +
> +       mm_ops->put_page(ctx->ptep);
> +       mm_ops->free_removed_table(childp, ctx->level);
> +
> +       return ret;
>  }
>
>  static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
> @@ -776,13 +781,6 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         kvm_pte_t *childp;
>         int ret;
>
> -       if (data->anchor) {
> -               if (stage2_pte_is_counted(ctx->old))
> -                       mm_ops->put_page(ctx->ptep);
> -
> -               return 0;
> -       }
> -
>         ret = stage2_map_walker_try_leaf(ctx, data);
>         if (ret != -E2BIG)
>                 return ret;
> @@ -811,49 +809,14 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         return 0;
>  }
>
> -static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
> -                                     struct stage2_map_data *data)
> -{
> -       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> -       kvm_pte_t *childp;
> -       int ret = 0;
> -
> -       if (!data->anchor)
> -               return 0;
> -
> -       if (data->anchor == ctx->ptep) {
> -               childp = data->childp;
> -               data->anchor = NULL;
> -               data->childp = NULL;
> -               ret = stage2_map_walk_leaf(ctx, data);
> -       } else {
> -               childp = kvm_pte_follow(ctx->old, mm_ops);
> -       }
> -
> -       mm_ops->put_page(childp);
> -       mm_ops->put_page(ctx->ptep);
> -
> -       return ret;
> -}
> -
>  /*
> - * This is a little fiddly, as we use all three of the walk flags. The idea
> - * is that the TABLE_PRE callback runs for table entries on the way down,
> - * looking for table entries which we could conceivably replace with a
> - * block entry for this mapping. If it finds one, then it sets the 'anchor'
> - * field in 'struct stage2_map_data' to point at the table entry, before
> - * clearing the entry to zero and descending into the now detached table.
> - *
> - * The behaviour of the LEAF callback then depends on whether or not the
> - * anchor has been set. If not, then we're not using a block mapping higher
> - * up the table and we perform the mapping at the existing leaves instead.
> - * If, on the other hand, the anchor _is_ set, then we drop references to
> - * all valid leaves so that the pages beneath the anchor can be freed.
> + * The TABLE_PRE callback runs for table entries on the way down, looking
> + * for table entries which we could conceivably replace with a block entry
> + * for this mapping. If it finds one it replaces the entry and calls
> + * kvm_pgtable_mm_ops::free_removed_table() to tear down the detached table.
>   *
> - * Finally, the TABLE_POST callback does nothing if the anchor has not
> - * been set, but otherwise frees the page-table pages while walking back up
> - * the page-table, installing the block entry when it revisits the anchor
> - * pointer and clearing the anchor to NULL.
> + * Otherwise, the LEAF callback performs the mapping at the existing leaves
> + * instead.
>   */
>  static int stage2_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                              enum kvm_pgtable_walk_flags visit)
> @@ -865,11 +828,9 @@ static int stage2_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                 return stage2_map_walk_table_pre(ctx, data);
>         case KVM_PGTABLE_WALK_LEAF:
>                 return stage2_map_walk_leaf(ctx, data);
> -       case KVM_PGTABLE_WALK_TABLE_POST:
> -               return stage2_map_walk_table_post(ctx, data);
> +       default:
> +               return -EINVAL;
>         }
> -
> -       return -EINVAL;
>  }
>
>  int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> @@ -886,8 +847,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>         struct kvm_pgtable_walker walker = {
>                 .cb             = stage2_map_walker,
>                 .flags          = KVM_PGTABLE_WALK_TABLE_PRE |
> -                                 KVM_PGTABLE_WALK_LEAF |
> -                                 KVM_PGTABLE_WALK_TABLE_POST,
> +                                 KVM_PGTABLE_WALK_LEAF,
>                 .arg            = &map_data,
>         };
>
> @@ -917,8 +877,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>         struct kvm_pgtable_walker walker = {
>                 .cb             = stage2_map_walker,
>                 .flags          = KVM_PGTABLE_WALK_TABLE_PRE |
> -                                 KVM_PGTABLE_WALK_LEAF |
> -                                 KVM_PGTABLE_WALK_TABLE_POST,
> +                                 KVM_PGTABLE_WALK_LEAF,
>                 .arg            = &map_data,
>         };
>
> @@ -1207,7 +1166,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>
>  void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level)
>  {
> -       kvm_pte_t *ptep = (kvm_pte_t *)pgtable;
> +       kvm_pteref_t ptep = (kvm_pteref_t)pgtable;
>         struct kvm_pgtable_walker walker = {
>                 .cb     = stage2_free_walker,
>                 .flags  = KVM_PGTABLE_WALK_LEAF |
> @@ -1225,5 +1184,5 @@ void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pg
>                 .end    = kvm_granule_size(level),
>         };
>
> -       WARN_ON(__kvm_pgtable_walk(&data, mm_ops, ptep, level));
> +       WARN_ON(__kvm_pgtable_walk(&data, mm_ops, ptep, level + 1));
>  }
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 5e197ae190ef..73ae908eb5d9 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -128,6 +128,13 @@ static void kvm_s2_free_pages_exact(void *virt, size_t size)
>         free_pages_exact(virt, size);
>  }
>
> +static struct kvm_pgtable_mm_ops kvm_s2_mm_ops;
> +
> +static void stage2_free_removed_table(void *addr, u32 level)
> +{
> +       kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, addr, level);
> +}
> +
>  static void kvm_host_get_page(void *addr)
>  {
>         get_page(virt_to_page(addr));
> @@ -662,6 +669,7 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
>         .zalloc_page            = stage2_memcache_zalloc_page,
>         .zalloc_pages_exact     = kvm_s2_zalloc_pages_exact,
>         .free_pages_exact       = kvm_s2_free_pages_exact,
> +       .free_removed_table     = stage2_free_removed_table,
>         .get_page               = kvm_host_get_page,
>         .put_page               = kvm_s2_put_page,
>         .page_count             = kvm_host_page_count,
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
  2022-11-07 21:56   ` Oliver Upton
  (?)
@ 2022-11-09 22:25     ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:25 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> release the RCU read lock when traversing the page tables. Defer the
> freeing of table memory to an RCU callback. Indirect the calls into RCU
> and provide stubs for hypervisor code, as RCU is not available in such a
> context.
>
> The RCU protection doesn't amount to much at the moment, as readers are
> already protected by the read-write lock (all walkers that free table
> memory take the write lock). Nonetheless, a subsequent change will
> further relax the locking requirements around the stage-2 MMU, thereby
> depending on RCU.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 49 ++++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 10 +++++-
>  arch/arm64/kvm/mmu.c                 | 14 +++++++-
>  3 files changed, 71 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index e70cf57b719e..7634b6964779 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
>
>  typedef u64 kvm_pte_t;
>
> +/*
> + * RCU cannot be used in a non-kernel context such as the hyp. As such, page
> + * table walkers used in hyp do not call into RCU and instead use other
> + * synchronization mechanisms (such as a spinlock).
> + */
> +#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
> +
>  typedef kvm_pte_t *kvm_pteref_t;
>
>  static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> @@ -44,6 +51,40 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>         return pteref;
>  }
>
> +static inline void kvm_pgtable_walk_begin(void) {}
> +static inline void kvm_pgtable_walk_end(void) {}
> +
> +static inline bool kvm_pgtable_walk_lock_held(void)
> +{
> +       return true;

Forgive my ignorance, but does hyp not use an MMU lock at all? Seems
like this would be a good place to add a lockdep check (or the hyp
equivalent).
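Something in the spirit of the sketch below is what I had in mind -- no idea
whether the nVHE spinlock helpers are even reachable from this header, so
treat every name in it as hypothetical:

  static inline bool kvm_pgtable_walk_lock_held(void)
  {
          /* Hypothetical: assert whichever hyp spinlock guards this table. */
          hyp_assert_lock_held(&host_kvm.lock);
          return true;
  }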

> +}
> +
> +#else
> +
> +typedef kvm_pte_t __rcu *kvm_pteref_t;
> +
> +static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> +{
> +       return rcu_dereference_check(pteref, !shared);

Same here, could add a lockdep check depending on shared.
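i.e. make it loud when an exclusive walker doesn't actually hold the write
lock. Rough sketch only -- the mmu_lock isn't reachable from this helper
today, so 'kvm' here is hypothetical:

  static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
  {
          /* Shared walkers still need the RCU read lock, as before. */
          return rcu_dereference_check(pteref,
                                       !shared && lockdep_is_held_type(&kvm->mmu_lock, 0));
  }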

> +}
> +
> +static inline void kvm_pgtable_walk_begin(void)
> +{
> +       rcu_read_lock();
> +}
> +
> +static inline void kvm_pgtable_walk_end(void)
> +{
> +       rcu_read_unlock();
> +}
> +
> +static inline bool kvm_pgtable_walk_lock_held(void)
> +{
> +       return rcu_read_lock_held();

Likewise could do some lockdep here.
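e.g. (same caveat about the mmu_lock not being reachable from here yet):

  static inline bool kvm_pgtable_walk_lock_held(void)
  {
          return rcu_read_lock_held() || lockdep_is_held_type(&kvm->mmu_lock, 0);
  }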

> +}
> +
> +#endif
> +
>  #define KVM_PTE_VALID                  BIT(0)
>
>  #define KVM_PTE_ADDR_MASK              GENMASK(47, PAGE_SHIFT)
> @@ -202,11 +243,14 @@ struct kvm_pgtable {
>   *                                     children.
>   * @KVM_PGTABLE_WALK_TABLE_POST:       Visit table entries after their
>   *                                     children.
> + * @KVM_PGTABLE_WALK_SHARED:           Indicates the page-tables may be shared
> + *                                     with other software walkers.
>   */
>  enum kvm_pgtable_walk_flags {
>         KVM_PGTABLE_WALK_LEAF                   = BIT(0),
>         KVM_PGTABLE_WALK_TABLE_PRE              = BIT(1),
>         KVM_PGTABLE_WALK_TABLE_POST             = BIT(2),
> +       KVM_PGTABLE_WALK_SHARED                 = BIT(3),

Not sure if it's necessary, but it might pay to have three sharing modes:
exclusive, shared under the MMU lock, and no MMU lock at all, in case we
ever want lockless fast page faults.
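Roughly (flag names invented on the spot, purely illustrative):

  enum kvm_pgtable_walk_flags {
          KVM_PGTABLE_WALK_LEAF                   = BIT(0),
          KVM_PGTABLE_WALK_TABLE_PRE              = BIT(1),
          KVM_PGTABLE_WALK_TABLE_POST             = BIT(2),
          KVM_PGTABLE_WALK_SHARED_MMU_LOCK        = BIT(3),  /* mmu_lock held for read */
          KVM_PGTABLE_WALK_LOCKLESS               = BIT(4),  /* RCU only, no mmu_lock */
  };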


>  };
>
>  struct kvm_pgtable_visit_ctx {
> @@ -223,6 +267,11 @@ struct kvm_pgtable_visit_ctx {
>  typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
>                                         enum kvm_pgtable_walk_flags visit);
>
> +static inline bool kvm_pgtable_walk_shared(const struct kvm_pgtable_visit_ctx *ctx)
> +{
> +       return ctx->flags & KVM_PGTABLE_WALK_SHARED;
> +}
> +
>  /**
>   * struct kvm_pgtable_walker - Hook into a page-table walk.
>   * @cb:                Callback function to invoke during the walk.
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 7c9782347570..d8d963521d4e 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -171,6 +171,9 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>                                   enum kvm_pgtable_walk_flags visit)
>  {
>         struct kvm_pgtable_walker *walker = data->walker;
> +
> +       /* Ensure the appropriate lock is held (e.g. RCU lock for stage-2 MMU) */
> +       WARN_ON_ONCE(kvm_pgtable_walk_shared(ctx) && !kvm_pgtable_walk_lock_held());
>         return walker->cb(ctx, visit);
>  }
>
> @@ -281,8 +284,13 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>                 .end    = PAGE_ALIGN(walk_data.addr + size),
>                 .walker = walker,
>         };
> +       int r;
> +
> +       kvm_pgtable_walk_begin();
> +       r = _kvm_pgtable_walk(pgt, &walk_data);
> +       kvm_pgtable_walk_end();
>
> -       return _kvm_pgtable_walk(pgt, &walk_data);
> +       return r;
>  }
>
>  struct leaf_walk_data {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 73ae908eb5d9..52e042399ba5 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -130,9 +130,21 @@ static void kvm_s2_free_pages_exact(void *virt, size_t size)
>
>  static struct kvm_pgtable_mm_ops kvm_s2_mm_ops;
>
> +static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
> +{
> +       struct page *page = container_of(head, struct page, rcu_head);
> +       void *pgtable = page_to_virt(page);
> +       u32 level = page_private(page);
> +
> +       kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, pgtable, level);
> +}
> +
>  static void stage2_free_removed_table(void *addr, u32 level)
>  {
> -       kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, addr, level);
> +       struct page *page = virt_to_page(addr);
> +
> +       set_page_private(page, (unsigned long)level);
> +       call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
>  }
>
>  static void kvm_host_get_page(void *addr)
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
@ 2022-11-09 22:25     ` Ben Gardon
  0 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:25 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> release the RCU read lock when traversing the page tables. Defer the
> freeing of table memory to an RCU callback. Indirect the calls into RCU
> and provide stubs for hypervisor code, as RCU is not available in such a
> context.
>
> The RCU protection doesn't amount to much at the moment, as readers are
> already protected by the read-write lock (all walkers that free table
> memory take the write lock). Nonetheless, a subsequent change will
> futher relax the locking requirements around the stage-2 MMU, thereby
> depending on RCU.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 49 ++++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 10 +++++-
>  arch/arm64/kvm/mmu.c                 | 14 +++++++-
>  3 files changed, 71 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index e70cf57b719e..7634b6964779 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
>
>  typedef u64 kvm_pte_t;
>
> +/*
> + * RCU cannot be used in a non-kernel context such as the hyp. As such, page
> + * table walkers used in hyp do not call into RCU and instead use other
> + * synchronization mechanisms (such as a spinlock).
> + */
> +#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
> +
>  typedef kvm_pte_t *kvm_pteref_t;
>
>  static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> @@ -44,6 +51,40 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>         return pteref;
>  }
>
> +static inline void kvm_pgtable_walk_begin(void) {}
> +static inline void kvm_pgtable_walk_end(void) {}
> +
> +static inline bool kvm_pgtable_walk_lock_held(void)
> +{
> +       return true;

Forgive my ignorance, but does hyp not use a MMU lock at all? Seems
like this would be a good place to add a lockdep check.

> +}
> +
> +#else
> +
> +typedef kvm_pte_t __rcu *kvm_pteref_t;
> +
> +static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> +{
> +       return rcu_dereference_check(pteref, !shared);

Same here, could add a lockdep check depending on shared.

> +}
> +
> +static inline void kvm_pgtable_walk_begin(void)
> +{
> +       rcu_read_lock();
> +}
> +
> +static inline void kvm_pgtable_walk_end(void)
> +{
> +       rcu_read_unlock();
> +}
> +
> +static inline bool kvm_pgtable_walk_lock_held(void)
> +{
> +       return rcu_read_lock_held();

Likewise could do some lockdep here.

> +}
> +
> +#endif
> +
>  #define KVM_PTE_VALID                  BIT(0)
>
>  #define KVM_PTE_ADDR_MASK              GENMASK(47, PAGE_SHIFT)
> @@ -202,11 +243,14 @@ struct kvm_pgtable {
>   *                                     children.
>   * @KVM_PGTABLE_WALK_TABLE_POST:       Visit table entries after their
>   *                                     children.
> + * @KVM_PGTABLE_WALK_SHARED:           Indicates the page-tables may be shared
> + *                                     with other software walkers.
>   */
>  enum kvm_pgtable_walk_flags {
>         KVM_PGTABLE_WALK_LEAF                   = BIT(0),
>         KVM_PGTABLE_WALK_TABLE_PRE              = BIT(1),
>         KVM_PGTABLE_WALK_TABLE_POST             = BIT(2),
> +       KVM_PGTABLE_WALK_SHARED                 = BIT(3),

Not sure if necessary, but it might pay to have 3 shared options:
exclusive, shared mmu lock, no mmu lock if we ever want lockless fast
page faults.


>  };
>
>  struct kvm_pgtable_visit_ctx {
> @@ -223,6 +267,11 @@ struct kvm_pgtable_visit_ctx {
>  typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
>                                         enum kvm_pgtable_walk_flags visit);
>
> +static inline bool kvm_pgtable_walk_shared(const struct kvm_pgtable_visit_ctx *ctx)
> +{
> +       return ctx->flags & KVM_PGTABLE_WALK_SHARED;
> +}
> +
>  /**
>   * struct kvm_pgtable_walker - Hook into a page-table walk.
>   * @cb:                Callback function to invoke during the walk.
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 7c9782347570..d8d963521d4e 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -171,6 +171,9 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>                                   enum kvm_pgtable_walk_flags visit)
>  {
>         struct kvm_pgtable_walker *walker = data->walker;
> +
> +       /* Ensure the appropriate lock is held (e.g. RCU lock for stage-2 MMU) */
> +       WARN_ON_ONCE(kvm_pgtable_walk_shared(ctx) && !kvm_pgtable_walk_lock_held());
>         return walker->cb(ctx, visit);
>  }
>
> @@ -281,8 +284,13 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>                 .end    = PAGE_ALIGN(walk_data.addr + size),
>                 .walker = walker,
>         };
> +       int r;
> +
> +       kvm_pgtable_walk_begin();
> +       r = _kvm_pgtable_walk(pgt, &walk_data);
> +       kvm_pgtable_walk_end();
>
> -       return _kvm_pgtable_walk(pgt, &walk_data);
> +       return r;
>  }
>
>  struct leaf_walk_data {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 73ae908eb5d9..52e042399ba5 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -130,9 +130,21 @@ static void kvm_s2_free_pages_exact(void *virt, size_t size)
>
>  static struct kvm_pgtable_mm_ops kvm_s2_mm_ops;
>
> +static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
> +{
> +       struct page *page = container_of(head, struct page, rcu_head);
> +       void *pgtable = page_to_virt(page);
> +       u32 level = page_private(page);
> +
> +       kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, pgtable, level);
> +}
> +
>  static void stage2_free_removed_table(void *addr, u32 level)
>  {
> -       kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, addr, level);
> +       struct page *page = virt_to_page(addr);
> +
> +       set_page_private(page, (unsigned long)level);
> +       call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
>  }
>
>  static void kvm_host_get_page(void *addr)
> --
> 2.38.1.431.g37b22c650d-goog
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
@ 2022-11-09 22:25     ` Ben Gardon
  0 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:25 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, Marc Zyngier, Will Deacon, kvmarm, David Matlack, kvmarm,
	linux-arm-kernel

On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> release the RCU read lock when traversing the page tables. Defer the
> freeing of table memory to an RCU callback. Indirect the calls into RCU
> and provide stubs for hypervisor code, as RCU is not available in such a
> context.
>
> The RCU protection doesn't amount to much at the moment, as readers are
> already protected by the read-write lock (all walkers that free table
> memory take the write lock). Nonetheless, a subsequent change will
> futher relax the locking requirements around the stage-2 MMU, thereby
> depending on RCU.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 49 ++++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 10 +++++-
>  arch/arm64/kvm/mmu.c                 | 14 +++++++-
>  3 files changed, 71 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index e70cf57b719e..7634b6964779 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
>
>  typedef u64 kvm_pte_t;
>
> +/*
> + * RCU cannot be used in a non-kernel context such as the hyp. As such, page
> + * table walkers used in hyp do not call into RCU and instead use other
> + * synchronization mechanisms (such as a spinlock).
> + */
> +#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
> +
>  typedef kvm_pte_t *kvm_pteref_t;
>
>  static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> @@ -44,6 +51,40 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>         return pteref;
>  }
>
> +static inline void kvm_pgtable_walk_begin(void) {}
> +static inline void kvm_pgtable_walk_end(void) {}
> +
> +static inline bool kvm_pgtable_walk_lock_held(void)
> +{
> +       return true;

Forgive my ignorance, but does hyp not use an MMU lock at all? Seems
like this would be a good place to add a lockdep check.
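
For illustration, a minimal sketch of the kind of assertion being suggested.
hyp does serialize its page-table updates with its own spinlocks, but none of
them are visible from this header, so 'hyp_pgtable_lock' below is a made-up
name (hyp_assert_lock_held() is the existing nVHE helper for such checks):

static inline bool kvm_pgtable_walk_lock_held(void)
{
	/* Placeholder lock name; the real hyp locks are per-pgtable. */
	hyp_assert_lock_held(&hyp_pgtable_lock);
	return true;
}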

> +}
> +
> +#else
> +
> +typedef kvm_pte_t __rcu *kvm_pteref_t;
> +
> +static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> +{
> +       return rcu_dereference_check(pteref, !shared);

Same here, could add a lockdep check depending on shared.

> +}
> +
> +static inline void kvm_pgtable_walk_begin(void)
> +{
> +       rcu_read_lock();
> +}
> +
> +static inline void kvm_pgtable_walk_end(void)
> +{
> +       rcu_read_unlock();
> +}
> +
> +static inline bool kvm_pgtable_walk_lock_held(void)
> +{
> +       return rcu_read_lock_held();

Likewise could do some lockdep here.
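
A sketch of what that could look like; note that 'kvm' is not actually
reachable from this header, so the real thing would need the walker (or the
pgtable) to carry a pointer to the owning MMU lock:

static inline bool kvm_pgtable_walk_lock_held(void)
{
	/* Shared walkers hold RCU; exclusive walkers hold mmu_lock for write. */
	return rcu_read_lock_held() ||
	       lockdep_is_held_type(&kvm->mmu_lock, 0);
}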

> +}
> +
> +#endif
> +
>  #define KVM_PTE_VALID                  BIT(0)
>
>  #define KVM_PTE_ADDR_MASK              GENMASK(47, PAGE_SHIFT)
> @@ -202,11 +243,14 @@ struct kvm_pgtable {
>   *                                     children.
>   * @KVM_PGTABLE_WALK_TABLE_POST:       Visit table entries after their
>   *                                     children.
> + * @KVM_PGTABLE_WALK_SHARED:           Indicates the page-tables may be shared
> + *                                     with other software walkers.
>   */
>  enum kvm_pgtable_walk_flags {
>         KVM_PGTABLE_WALK_LEAF                   = BIT(0),
>         KVM_PGTABLE_WALK_TABLE_PRE              = BIT(1),
>         KVM_PGTABLE_WALK_TABLE_POST             = BIT(2),
> +       KVM_PGTABLE_WALK_SHARED                 = BIT(3),

Not sure if it's necessary, but it might pay to have three sharing options:
exclusive, shared MMU lock, and no MMU lock at all, in case we ever want
lockless fast page faults. A sketch of one possible spelling follows.
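
Apart from KVM_PGTABLE_WALK_SHARED, none of these names exist in the posted
series; they are only meant to illustrate the three-way split:

enum kvm_pgtable_walk_flags {
	KVM_PGTABLE_WALK_LEAF			= BIT(0),
	KVM_PGTABLE_WALK_TABLE_PRE		= BIT(1),
	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
	/* exclusive: mmu_lock held for write, plain stores suffice */
	KVM_PGTABLE_WALK_EXCLUSIVE		= 0,
	/* shared: mmu_lock held for read, updates must use cmpxchg() */
	KVM_PGTABLE_WALK_SHARED			= BIT(3),
	/* lockless: no mmu_lock at all, e.g. a future fast-fault path */
	KVM_PGTABLE_WALK_LOCKLESS		= BIT(4),
};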


>  };
>
>  struct kvm_pgtable_visit_ctx {
> @@ -223,6 +267,11 @@ struct kvm_pgtable_visit_ctx {
>  typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
>                                         enum kvm_pgtable_walk_flags visit);
>
> +static inline bool kvm_pgtable_walk_shared(const struct kvm_pgtable_visit_ctx *ctx)
> +{
> +       return ctx->flags & KVM_PGTABLE_WALK_SHARED;
> +}
> +
>  /**
>   * struct kvm_pgtable_walker - Hook into a page-table walk.
>   * @cb:                Callback function to invoke during the walk.
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 7c9782347570..d8d963521d4e 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -171,6 +171,9 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>                                   enum kvm_pgtable_walk_flags visit)
>  {
>         struct kvm_pgtable_walker *walker = data->walker;
> +
> +       /* Ensure the appropriate lock is held (e.g. RCU lock for stage-2 MMU) */
> +       WARN_ON_ONCE(kvm_pgtable_walk_shared(ctx) && !kvm_pgtable_walk_lock_held());
>         return walker->cb(ctx, visit);
>  }
>
> @@ -281,8 +284,13 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>                 .end    = PAGE_ALIGN(walk_data.addr + size),
>                 .walker = walker,
>         };
> +       int r;
> +
> +       kvm_pgtable_walk_begin();
> +       r = _kvm_pgtable_walk(pgt, &walk_data);
> +       kvm_pgtable_walk_end();
>
> -       return _kvm_pgtable_walk(pgt, &walk_data);
> +       return r;
>  }
>
>  struct leaf_walk_data {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 73ae908eb5d9..52e042399ba5 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -130,9 +130,21 @@ static void kvm_s2_free_pages_exact(void *virt, size_t size)
>
>  static struct kvm_pgtable_mm_ops kvm_s2_mm_ops;
>
> +static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
> +{
> +       struct page *page = container_of(head, struct page, rcu_head);
> +       void *pgtable = page_to_virt(page);
> +       u32 level = page_private(page);
> +
> +       kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, pgtable, level);
> +}
> +
>  static void stage2_free_removed_table(void *addr, u32 level)
>  {
> -       kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, addr, level);
> +       struct page *page = virt_to_page(addr);
> +
> +       set_page_private(page, (unsigned long)level);
> +       call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
>  }
>
>  static void kvm_host_get_page(void *addr)
> --
> 2.38.1.431.g37b22c650d-goog
>
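
For readers skimming the mmu.c hunk above: stashing the level in
page_private() and reusing page->rcu_head works because the removed table
page has already been unlinked from the page tables, so nothing else is using
those struct page fields. A generic sketch of the same deferred-free idiom
(not code from the series):

/* Free a page-sized table only once all RCU readers are done with it. */
static void table_free_rcu_cb(struct rcu_head *head)
{
	struct page *page = container_of(head, struct page, rcu_head);

	__free_page(page);
}

static void defer_free_table(void *va)
{
	call_rcu(&virt_to_page(va)->rcu_head, table_free_rcu_cb);
}
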
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 09/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks
  2022-11-07 21:56   ` Oliver Upton
  (?)
@ 2022-11-09 22:26     ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:26 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:58 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> The stage2 attr walker is already used for parallel walks. Since commit
> f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission relaxation
> during dirty logging"), KVM acquires the read lock when
> write-unprotecting a PTE. However, the walker only uses a simple store
> to update the PTE. This is safe as the only possible race is with
> hardware updates to the access flag, which is benign.
>
> However, a subsequent change to KVM will allow more changes to the stage
> 2 page tables to be done in parallel. Prepare the stage 2 attribute
> walker by performing atomic updates to the PTE when walking in parallel.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 31 ++++++++++++++++++++++---------
>  1 file changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index d8d963521d4e..a34e2050f931 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -185,7 +185,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>                                       kvm_pteref_t pteref, u32 level)
>  {
>         enum kvm_pgtable_walk_flags flags = data->walker->flags;
> -       kvm_pte_t *ptep = kvm_dereference_pteref(pteref, false);
> +       kvm_pte_t *ptep = kvm_dereference_pteref(pteref, flags & KVM_PGTABLE_WALK_SHARED);
>         struct kvm_pgtable_visit_ctx ctx = {
>                 .ptep   = ptep,
>                 .old    = READ_ONCE(*ptep),
> @@ -675,6 +675,16 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
>         return !!pte;
>  }
>
> +static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
> +{
> +       if (!kvm_pgtable_walk_shared(ctx)) {
> +               WRITE_ONCE(*ctx->ptep, new);
> +               return true;
> +       }
> +
> +       return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
> +}
> +
>  static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
>                            struct kvm_pgtable_mm_ops *mm_ops)
>  {
> @@ -986,7 +996,9 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                     stage2_pte_executable(pte) && !stage2_pte_executable(ctx->old))
>                         mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
>                                                   kvm_granule_size(ctx->level));
> -               WRITE_ONCE(*ctx->ptep, pte);
> +
> +               if (!stage2_try_set_pte(ctx, pte))
> +                       return -EAGAIN;
>         }
>
>         return 0;
> @@ -995,7 +1007,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>                                     u64 size, kvm_pte_t attr_set,
>                                     kvm_pte_t attr_clr, kvm_pte_t *orig_pte,
> -                                   u32 *level)
> +                                   u32 *level, enum kvm_pgtable_walk_flags flags)
>  {
>         int ret;
>         kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
> @@ -1006,7 +1018,7 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>         struct kvm_pgtable_walker walker = {
>                 .cb             = stage2_attr_walker,
>                 .arg            = &data,
> -               .flags          = KVM_PGTABLE_WALK_LEAF,
> +               .flags          = flags | KVM_PGTABLE_WALK_LEAF,
>         };
>
>         ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> @@ -1025,14 +1037,14 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  {
>         return stage2_update_leaf_attrs(pgt, addr, size, 0,
>                                         KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
> -                                       NULL, NULL);
> +                                       NULL, NULL, 0);
>  }
>
>  kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
>  {
>         kvm_pte_t pte = 0;
>         stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
> -                                &pte, NULL);
> +                                &pte, NULL, 0);
>         dsb(ishst);
>         return pte;
>  }
> @@ -1041,7 +1053,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
>  {
>         kvm_pte_t pte = 0;
>         stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
> -                                &pte, NULL);
> +                                &pte, NULL, 0);
>         /*
>          * "But where's the TLBI?!", you scream.
>          * "Over in the core code", I sigh.
> @@ -1054,7 +1066,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
>  bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
>  {
>         kvm_pte_t pte = 0;
> -       stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL);
> +       stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);

It would be nice to have an enum value, e.g. KVM_PGTABLE_WALK_EXCLUSIVE, so
these callers don't have to pass a bare 0.
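
As a sketch, with KVM_PGTABLE_WALK_EXCLUSIVE being a hypothetical name rather
than something the series defines:

/* Hypothetical named alias for the default, exclusive walk mode. */
#define KVM_PGTABLE_WALK_EXCLUSIVE	0

bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
{
	kvm_pte_t pte = 0;

	stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL,
				 KVM_PGTABLE_WALK_EXCLUSIVE);
	return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
}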


>         return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
>  }
>
> @@ -1077,7 +1089,8 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
>         if (prot & KVM_PGTABLE_PROT_X)
>                 clr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
>
> -       ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level);
> +       ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level,
> +                                      KVM_PGTABLE_WALK_SHARED);
>         if (!ret)
>                 kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, pgt->mmu, addr, level);
>         return ret;
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 10/14] KVM: arm64: Split init and set for table PTE
  2022-11-07 21:56   ` Oliver Upton
  (?)
@ 2022-11-09 22:26     ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:26 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:58 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Create a helper to initialize a table and directly call
> smp_store_release() to install it (for now). Prepare for a subsequent
> change that generalizes PTE writes with a helper.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index a34e2050f931..f4dd77c6c97d 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -136,16 +136,13 @@ static void kvm_clear_pte(kvm_pte_t *ptep)
>         WRITE_ONCE(*ptep, 0);
>  }
>
> -static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
> -                             struct kvm_pgtable_mm_ops *mm_ops)
> +static kvm_pte_t kvm_init_table_pte(kvm_pte_t *childp, struct kvm_pgtable_mm_ops *mm_ops)
>  {
> -       kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
> +       kvm_pte_t pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
>
>         pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
>         pte |= KVM_PTE_VALID;
> -
> -       WARN_ON(kvm_pte_valid(old));

Is there any reason to drop this warning?
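
If the answer is no, one minimal way to keep it would be to move the check to
the install site, since kvm_init_table_pte() no longer reads the old entry.
A sketch against the hyp_map_walker() hunk above (not part of the posted
patch):

	new = kvm_init_table_pte(childp, mm_ops);
	mm_ops->get_page(ctx->ptep);
	/* Installing a table should never overwrite a valid entry in place. */
	WARN_ON(kvm_pte_valid(ctx->old));
	smp_store_release(ctx->ptep, new);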


> -       smp_store_release(ptep, pte);
> +       return pte;
>  }
>
>  static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
> @@ -413,7 +410,7 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>  static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>                           enum kvm_pgtable_walk_flags visit)
>  {
> -       kvm_pte_t *childp;
> +       kvm_pte_t *childp, new;
>         struct hyp_map_data *data = ctx->arg;
>         struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>
> @@ -427,8 +424,10 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>         if (!childp)
>                 return -ENOMEM;
>
> -       kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> +       new = kvm_init_table_pte(childp, mm_ops);
>         mm_ops->get_page(ctx->ptep);
> +       smp_store_release(ctx->ptep, new);
> +
>         return 0;
>  }
>
> @@ -796,7 +795,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>                                 struct stage2_map_data *data)
>  {
>         struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> -       kvm_pte_t *childp;
> +       kvm_pte_t *childp, new;
>         int ret;
>
>         ret = stage2_map_walker_try_leaf(ctx, data);
> @@ -821,8 +820,9 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         if (stage2_pte_is_counted(ctx->old))
>                 stage2_put_pte(ctx, data->mmu, mm_ops);
>
> -       kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> +       new = kvm_init_table_pte(childp, mm_ops);
>         mm_ops->get_page(ctx->ptep);
> +       smp_store_release(ctx->ptep, new);
>
>         return 0;
>  }
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 11/14] KVM: arm64: Make block->table PTE changes parallel-aware
  2022-11-07 21:58   ` Oliver Upton
  (?)
@ 2022-11-09 22:26     ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:26 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:59 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> In order to service stage-2 faults in parallel, stage-2 table walkers
> must take exclusive ownership of the PTE being worked on. An additional
> requirement of the architecture is that software must perform a
> 'break-before-make' operation when changing the block size used for
> mapping memory.
>
> Roll these two concepts together into helpers for performing a
> 'break-before-make' sequence. Use a special PTE value to indicate a PTE
> has been locked by a software walker. Additionally, use an atomic
> compare-exchange to 'break' the PTE when the stage-2 page tables are
> possibly shared with another software walker. Elide the DSB + TLBI if
> the evicted PTE was invalid (and thus not subject to break-before-make).
>
> All of the atomics do nothing for now, as the stage-2 walker isn't fully
> ready to perform parallel walks.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 80 +++++++++++++++++++++++++++++++++---
>  1 file changed, 75 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index f4dd77c6c97d..b9f0d792b8d9 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -49,6 +49,12 @@
>  #define KVM_INVALID_PTE_OWNER_MASK     GENMASK(9, 2)
>  #define KVM_MAX_OWNER_ID               1
>
> +/*
> + * Used to indicate a pte for which a 'break-before-make' sequence is in
> + * progress.
> + */
> +#define KVM_INVALID_PTE_LOCKED         BIT(10)
> +
>  struct kvm_pgtable_walk_data {
>         struct kvm_pgtable_walker       *walker;
>
> @@ -674,6 +680,11 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
>         return !!pte;
>  }
>
> +static bool stage2_pte_is_locked(kvm_pte_t pte)
> +{
> +       return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
> +}
> +
>  static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
>  {
>         if (!kvm_pgtable_walk_shared(ctx)) {
> @@ -684,6 +695,64 @@ static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_
>         return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
>  }
>
> +/**
> + * stage2_try_break_pte() - Invalidates a pte according to the
> + *                         'break-before-make' requirements of the
> + *                         architecture.
> + *
> + * @ctx: context of the visited pte.
> + * @mmu: stage-2 mmu
> + *
> + * Returns: true if the pte was successfully broken.
> + *
> + * If the removed pte was valid, performs the necessary serialization and TLB
> + * invalidation for the old value. For counted ptes, drops the reference count
> + * on the containing table page.
> + */
> +static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
> +                                struct kvm_s2_mmu *mmu)
> +{
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> +
> +       if (stage2_pte_is_locked(ctx->old)) {
> +               /*
> +                * Should never occur if this walker has exclusive access to the
> +                * page tables.
> +                */
> +               WARN_ON(!kvm_pgtable_walk_shared(ctx));
> +               return false;
> +       }
> +
> +       if (!stage2_try_set_pte(ctx, KVM_INVALID_PTE_LOCKED))
> +               return false;
> +
> +       /*
> +        * Perform the appropriate TLB invalidation based on the evicted pte
> +        * value (if any).
> +        */
> +       if (kvm_pte_table(ctx->old, ctx->level))
> +               kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> +       else if (kvm_pte_valid(ctx->old))
> +               kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> +
> +       if (stage2_pte_is_counted(ctx->old))
> +               mm_ops->put_page(ctx->ptep);
> +
> +       return true;
> +}
> +
> +static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
> +{
> +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> +
> +       WARN_ON(!stage2_pte_is_locked(*ctx->ptep));
> +
> +       if (stage2_pte_is_counted(new))
> +               mm_ops->get_page(ctx->ptep);
> +
> +       smp_store_release(ctx->ptep, new);
> +}
> +
>  static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
>                            struct kvm_pgtable_mm_ops *mm_ops)
>  {
> @@ -812,17 +881,18 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         if (!childp)
>                 return -ENOMEM;
>
> +       if (!stage2_try_break_pte(ctx, data->mmu)) {
> +               mm_ops->put_page(childp);
> +               return -EAGAIN;
> +       }
> +
>         /*
>          * If we've run into an existing block mapping then replace it with
>          * a table. Accesses beyond 'end' that fall within the new table
>          * will be mapped lazily.
>          */
> -       if (stage2_pte_is_counted(ctx->old))
> -               stage2_put_pte(ctx, data->mmu, mm_ops);
> -
>         new = kvm_init_table_pte(childp, mm_ops);

Does it make any sense to move this before the "break" to minimize the
critical section in which the PTE is locked?
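
Concretely, the reordering being asked about would look something like this;
kvm_init_table_pte() only computes the new PTE value from childp, so hoisting
it out of the locked window appears harmless (sketch only, not a tested
change):

	new = kvm_init_table_pte(childp, mm_ops);

	if (!stage2_try_break_pte(ctx, data->mmu)) {
		mm_ops->put_page(childp);
		return -EAGAIN;
	}

	/*
	 * Accesses beyond 'end' that fall within the new table will still be
	 * mapped lazily.
	 */
	stage2_make_pte(ctx, new);

	return 0;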


> -       mm_ops->get_page(ctx->ptep);
> -       smp_store_release(ctx->ptep, new);
> +       stage2_make_pte(ctx, new);
>
>         return 0;
>  }
> --
> 2.38.1.431.g37b22c650d-goog
>

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 12/14] KVM: arm64: Make leaf->leaf PTE changes parallel-aware
  2022-11-07 21:59   ` Oliver Upton
  (?)
@ 2022-11-09 22:26     ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 22:26 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Mon, Nov 7, 2022 at 1:59 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Convert stage2_map_walker_try_leaf() to use the new break-before-make
> helpers, thereby making the handler parallel-aware. As before, avoid the
> break-before-make if recreating the existing mapping. Additionally,
> retry execution if another vCPU thread is modifying the same PTE.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

Reviewed-by: Ben Gardon <bgardon@google.com>


> ---
>  arch/arm64/kvm/hyp/pgtable.c | 26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index b9f0d792b8d9..238f29389617 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -804,18 +804,17 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         else
>                 new = kvm_init_invalid_leaf_owner(data->owner_id);
>
> -       if (stage2_pte_is_counted(ctx->old)) {
> -               /*
> -                * Skip updating the PTE if we are trying to recreate the exact
> -                * same mapping or only change the access permissions. Instead,
> -                * the vCPU will exit one more time from guest if still needed
> -                * and then go through the path of relaxing permissions.
> -                */
> -               if (!stage2_pte_needs_update(ctx->old, new))
> -                       return -EAGAIN;
> +       /*
> +        * Skip updating the PTE if we are trying to recreate the exact
> +        * same mapping or only change the access permissions. Instead,
> +        * the vCPU will exit one more time from guest if still needed
> +        * and then go through the path of relaxing permissions.
> +        */
> +       if (!stage2_pte_needs_update(ctx->old, new))
> +               return -EAGAIN;
>
> -               stage2_put_pte(ctx, data->mmu, mm_ops);
> -       }
> +       if (!stage2_try_break_pte(ctx, data->mmu))
> +               return -EAGAIN;
>
>         /* Perform CMOs before installation of the guest stage-2 PTE */
>         if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
> @@ -825,9 +824,8 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>         if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
>                 mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
>
> -       smp_store_release(ctx->ptep, new);
> -       if (stage2_pte_is_counted(new))
> -               mm_ops->get_page(ctx->ptep);
> +       stage2_make_pte(ctx, new);
> +
>         if (kvm_phys_is_valid(phys))
>                 data->phys += granule;
>         return 0;
> --
> 2.38.1.431.g37b22c650d-goog
>
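
The "retry execution if another vCPU thread is modifying the same PTE"
behaviour comes down to the compare-and-exchange inside the break helper:
if the PTE no longer holds the value this walker observed, the break
fails and the caller returns -EAGAIN so the vCPU can simply take the
fault again. A stand-alone sketch of that pattern with C11 atomics (toy
names, not the kernel helpers):

    #include <errno.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t kvm_pte_t;
    #define TOY_PTE_LOCKED    ((kvm_pte_t)1 << 10)    /* mirrors KVM_INVALID_PTE_LOCKED */

    /* 'Break': claim the PTE by swapping in the locked marker, but only if
     * it still holds the value observed when the walker visited it. */
    static bool toy_try_break_pte(_Atomic kvm_pte_t *ptep, kvm_pte_t old)
    {
        return atomic_compare_exchange_strong(ptep, &old, TOY_PTE_LOCKED);
    }

    static int toy_map_walker_try_leaf(_Atomic kvm_pte_t *ptep, kvm_pte_t old,
                                       kvm_pte_t new)
    {
        if (!toy_try_break_pte(ptep, old))
            return -EAGAIN;    /* lost the race; let the vCPU re-fault */

        /* ... CMOs happen here, while the PTE is locked ... */

        atomic_store_explicit(ptep, new, memory_order_release);    /* 'make' */
        return 0;
    }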

* Re: [PATCH v5 09/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks
  2022-11-09 22:26     ` Ben Gardon
@ 2022-11-09 22:42       ` Sean Christopherson
  -1 siblings, 0 replies; 156+ messages in thread
From: Sean Christopherson @ 2022-11-09 22:42 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Oliver Upton, Marc Zyngier, James Morse, Alexandru Elisei,
	linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	kvmarm

On Wed, Nov 09, 2022, Ben Gardon wrote:
> On Mon, Nov 7, 2022 at 1:58 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> > @@ -1054,7 +1066,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
> >  bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
> >  {
> >         kvm_pte_t pte = 0;
> > -       stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL);
> > +       stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
> 
> Would be nice to have an enum for KVM_PGTABLE_WALK_EXCLUSIVE so this
> doesn't just have to pass 0.

That's also dangerous though since the param is a set of flags, not unique,
arbitrary values.  E.g. this won't do the expected thing

	if (flags & KVM_PGTABLE_WALK_EXCLUSIVE)

I assume compilers would complain, but never say never when it comes to compilers :-)
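
Spelling the hazard out: an enumerator with the value 0 reads nicely at
call sites, but it cannot be tested with a bitwise AND, since x & 0 is
always 0. A small self-contained illustration (hypothetical names, not
the actual kvm_pgtable flags):

    #include <stdio.h>

    enum toy_walk_flags {
        TOY_WALK_EXCLUSIVE  = 0,        /* 'no flags' given a name */
        TOY_WALK_SHARED     = 1 << 0,
        TOY_WALK_TABLE_POST = 1 << 1,
    };

    int main(void)
    {
        enum toy_walk_flags flags = TOY_WALK_EXCLUSIVE;

        if (flags & TOY_WALK_EXCLUSIVE)     /* always false: x & 0 == 0 */
            printf("exclusive walk\n");     /* never reached */

        if (!(flags & TOY_WALK_SHARED))     /* the test that actually works */
            printf("exclusive walk (checked via the SHARED bit)\n");

        return 0;
    }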

* Re: [PATCH v5 01/14] KVM: arm64: Combine visitor arguments into a context structure
  2022-11-09 22:23     ` Ben Gardon
@ 2022-11-09 22:48       ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-09 22:48 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Wed, Nov 09, 2022 at 02:23:08PM -0800, Ben Gardon wrote:
> On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > Passing new arguments by value to the visitor callbacks is extremely
> > inflexible for stuffing new parameters used by only some of the
> > visitors. Use a context structure instead and pass the pointer through
> > to the visitor callback.
> >
> > While at it, redefine the 'flags' parameter to the visitor to contain
> > the bit indicating the phase of the walk. Pass the entire set of flags
> > through the context structure such that the walker can communicate
> > additional state to the visitor callback.
> >
> > No functional change intended.
> >
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> 
> This looks good to me. It's all fairly mechanical and I don't see any
> problems. I was a little confused by the walk context flags passed via
> visit, because they seem somewhat redundant if the leaf-ness can be
> determined by looking at the PTE, but perhaps that's not always
> possible.

Some explanation is probably owed here. I think you caught the detail
later on in the series, but I'm overloading flags to describe both the
requested visits and some properties about the walk (i.e. a SHARED
walk).

I tried to leave it sufficiently generic as there will be other
configuration bits we will want to stuff into a walker later on (such as
TLBI and CMO elision).

> Reviewed-by: Ben Gardon <bgardon@google.com>

Thanks!

--
Best,
Oliver
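
For readers who have not looked at the later patches yet, the end result
is roughly one context structure per visited PTE plus a single flags word
that mixes "which visits were requested" bits with "what kind of walk is
this" bits. A sketch of the shape, with field names taken from the hunks
quoted elsewhere in the thread and flag values that are purely
illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t kvm_pte_t;
    struct kvm_pgtable_mm_ops;          /* opaque here */

    enum kvm_pgtable_walk_flags {
        KVM_PGTABLE_WALK_LEAF       = 1 << 0,   /* requested visits */
        KVM_PGTABLE_WALK_TABLE_PRE  = 1 << 1,
        KVM_PGTABLE_WALK_TABLE_POST = 1 << 2,
        KVM_PGTABLE_WALK_SHARED     = 1 << 3,   /* property of the walk itself */
    };

    struct kvm_pgtable_visit_ctx {
        kvm_pte_t                       *ptep;
        kvm_pte_t                       old;
        void                            *arg;
        struct kvm_pgtable_mm_ops       *mm_ops;
        uint64_t                        addr;
        uint32_t                        level;
        enum kvm_pgtable_walk_flags     flags;
    };

    static inline bool kvm_pgtable_walk_shared(const struct kvm_pgtable_visit_ctx *ctx)
    {
        return ctx->flags & KVM_PGTABLE_WALK_SHARED;
    }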

* Re: [PATCH v5 05/14] KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
  2022-11-09 22:23     ` Ben Gardon
@ 2022-11-09 22:54       ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-09 22:54 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Wed, Nov 09, 2022 at 02:23:33PM -0800, Ben Gardon wrote:
> On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > A subsequent change to KVM will move the tear down of an unlinked
> > stage-2 subtree out of the critical path of the break-before-make
> > sequence.
> >
> > Introduce a new helper for tearing down unlinked stage-2 subtrees.
> > Leverage the existing stage-2 free walkers to do so, with a deep call
> > into __kvm_pgtable_walk() as the subtree is no longer reachable from the
> > root.
> >
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h | 11 +++++++++++
> >  arch/arm64/kvm/hyp/pgtable.c         | 23 +++++++++++++++++++++++
> >  2 files changed, 34 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index a752793482cb..93b1feeaebab 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -333,6 +333,17 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
> >   */
> >  void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
> >
> > +/**
> > + * kvm_pgtable_stage2_free_removed() - Free a removed stage-2 paging structure.
> > + * @mm_ops:    Memory management callbacks.
> > + * @pgtable:   Unlinked stage-2 paging structure to be freed.
> > + * @level:     Level of the stage-2 paging structure to be freed.
> > + *
> > + * The page-table is assumed to be unreachable by any hardware walkers prior to
> > + * freeing and therefore no TLB invalidation is performed.
> > + */
> > +void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level);
> > +
> >  /**
> >   * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
> >   * @pgt:       Page-table structure initialised by kvm_pgtable_stage2_init*().
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 93989b750a26..363a5cce7e1a 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -1203,3 +1203,26 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
> >         pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
> >         pgt->pgd = NULL;
> >  }
> > +
> > +void kvm_pgtable_stage2_free_removed(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level)
> > +{
> > +       kvm_pte_t *ptep = (kvm_pte_t *)pgtable;
> > +       struct kvm_pgtable_walker walker = {
> > +               .cb     = stage2_free_walker,
> > +               .flags  = KVM_PGTABLE_WALK_LEAF |
> > +                         KVM_PGTABLE_WALK_TABLE_POST,
> > +       };
> > +       struct kvm_pgtable_walk_data data = {
> > +               .walker = &walker,
> > +
> > +               /*
> > +                * At this point the IPA really doesn't matter, as the page
> > +                * table being traversed has already been removed from the stage
> > +                * 2. Set an appropriate range to cover the entire page table.
> > +                */
> > +               .addr   = 0,
> > +               .end    = kvm_granule_size(level),
> > +       };
> > +
> > +       WARN_ON(__kvm_pgtable_walk(&data, mm_ops, ptep, level));
> > +}
> 
> Will this callback be able to yield? In my experience, if processing a
> large teardown (i.e. level >=3 / maps 512G region) it's possible to
> hit scheduler tick warnings.

No, but this is a pretty obvious problem with all of our table walkers,
which led to commit 5994bc9e05c2 ("KVM: arm64: Limit
stage2_apply_range() batch size to largest block").

We're lucky in that the largest supported granule across all page table
sizes is 1GB (no true 5-level paging yet), so it may not be too
horrendous.

But yeah, it is on the list of things to fix :)

--
Thanks,
Oliver
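
The batching fix referenced here is easy to picture: bound every walk to
at most one largest-block-sized chunk so a single call can never spend an
unbounded amount of time under the lock. A toy illustration of the
chunking arithmetic (user-space C, invented names, not the code from
5994bc9e05c2):

    #include <stdint.h>

    #define TOY_BLOCK_SIZE  (1ULL << 30)    /* 1GiB, the largest stage-2 granule today */

    /* Apply fn() to [addr, end) in chunks no larger than one block; the
     * kernel version can additionally reschedule between chunks to avoid
     * scheduler-tick warnings during huge teardowns. */
    static int toy_apply_range(uint64_t addr, uint64_t end,
                               int (*fn)(uint64_t start, uint64_t end))
    {
        while (addr < end) {
            uint64_t next = (addr | (TOY_BLOCK_SIZE - 1)) + 1;

            if (next > end)
                next = end;

            int ret = fn(addr, next);
            if (ret)
                return ret;

            addr = next;
        }
        return 0;
    }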

* Re: [PATCH v5 09/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks
  2022-11-09 22:42       ` Sean Christopherson
@ 2022-11-09 23:00         ` Ben Gardon
  -1 siblings, 0 replies; 156+ messages in thread
From: Ben Gardon @ 2022-11-09 23:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Oliver Upton, Marc Zyngier, James Morse, Alexandru Elisei,
	linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	kvmarm

On Wed, Nov 9, 2022 at 2:42 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Nov 09, 2022, Ben Gardon wrote:
> > On Mon, Nov 7, 2022 at 1:58 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> > > @@ -1054,7 +1066,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
> > >  bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
> > >  {
> > >         kvm_pte_t pte = 0;
> > > -       stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL);
> > > +       stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
> >
> > Would be nice to have an enum for KVM_PGTABLE_WALK_EXCLUSIVE so this
> > doesn't just have to pass 0.
>
> That's also dangerous though since the param is a set of flags, not unique,
> arbitrary values.  E.g. this won't do the expected thing
>
>         if (flags & KVM_PGTABLE_WALK_EXCLUSIVE)
>
> I assume compilers would complain, but never say never when it comes to compilers :-)

Yeah, I was thinking about that too. IMO using one enum for multiple
flags is kind of an abuse of the enum. If you're going to put multiple
orthogonal flags in an int or whatever, it would probably be best to
have separate enums for each flag. That way you can define masks to
extract the enum from the int and only compare with == and != as
opposed to using &.
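
In stand-alone form, the mask-per-field scheme described above looks
something like this (invented names; each logical field gets its own enum
and its own mask, and tests use == / != after masking):

    #include <stdio.h>

    enum toy_walk_mode  { TOY_WALK_EXCLUSIVE = 0x0, TOY_WALK_SHARED = 0x1 };
    #define TOY_WALK_MODE_MASK   0x1

    enum toy_walk_visit { TOY_VISIT_LEAF = 0x2, TOY_VISIT_TABLE_POST = 0x4 };
    #define TOY_WALK_VISIT_MASK  0x6

    int main(void)
    {
        int flags = TOY_WALK_EXCLUSIVE | TOY_VISIT_LEAF;

        if ((flags & TOY_WALK_MODE_MASK) == TOY_WALK_EXCLUSIVE)
            printf("exclusive walk\n");

        if ((flags & TOY_WALK_VISIT_MASK) != 0)
            printf("at least one visit type requested\n");

        return 0;
    }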

* Re: [PATCH v5 10/14] KVM: arm64: Split init and set for table PTE
  2022-11-09 22:26     ` Ben Gardon
@ 2022-11-09 23:00       ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-09 23:00 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Wed, Nov 09, 2022 at 02:26:26PM -0800, Ben Gardon wrote:
> On Mon, Nov 7, 2022 at 1:58 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > Create a helper to initialize a table and directly call
> > smp_store_release() to install it (for now). Prepare for a subsequent
> > change that generalizes PTE writes with a helper.
> >
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++----------
> >  1 file changed, 10 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index a34e2050f931..f4dd77c6c97d 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -136,16 +136,13 @@ static void kvm_clear_pte(kvm_pte_t *ptep)
> >         WRITE_ONCE(*ptep, 0);
> >  }
> >
> > -static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp,
> > -                             struct kvm_pgtable_mm_ops *mm_ops)
> > +static kvm_pte_t kvm_init_table_pte(kvm_pte_t *childp, struct kvm_pgtable_mm_ops *mm_ops)
> >  {
> > -       kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
> > +       kvm_pte_t pte = kvm_phys_to_pte(mm_ops->virt_to_phys(childp));
> >
> >         pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
> >         pte |= KVM_PTE_VALID;
> > -
> > -       WARN_ON(kvm_pte_valid(old));
> 
> Is there any reason to drop this warning?

It is (eventually) superseded by a WARN() when a PTE isn't locked in
stage2_make_pte(), but that isn't obvious in this patch alone.

--
Thanks,
Oliver
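
For reference, the check that takes over that duty is visible in the
patch 11 hunk quoted later in this thread: stage2_make_pte() now warns if
the slot it is about to write was not locked beforehand, which covers the
old "overwriting a valid PTE" case as well, since a valid PTE can only be
replaced after a successful break:

    static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
    {
        struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;

        WARN_ON(!stage2_pte_is_locked(*ctx->ptep));

        if (stage2_pte_is_counted(new))
            mm_ops->get_page(ctx->ptep);

        smp_store_release(ctx->ptep, new);
    }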

* Re: [PATCH v5 11/14] KVM: arm64: Make block->table PTE changes parallel-aware
  2022-11-09 22:26     ` Ben Gardon
@ 2022-11-09 23:03       ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-09 23:03 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Wed, Nov 09, 2022 at 02:26:36PM -0800, Ben Gardon wrote:
> On Mon, Nov 7, 2022 at 1:59 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > In order to service stage-2 faults in parallel, stage-2 table walkers
> > must take exclusive ownership of the PTE being worked on. An additional
> > requirement of the architecture is that software must perform a
> > 'break-before-make' operation when changing the block size used for
> > mapping memory.
> >
> > Roll these two concepts together into helpers for performing a
> > 'break-before-make' sequence. Use a special PTE value to indicate a PTE
> > has been locked by a software walker. Additionally, use an atomic
> > compare-exchange to 'break' the PTE when the stage-2 page tables are
> > possibly shared with another software walker. Elide the DSB + TLBI if
> > the evicted PTE was invalid (and thus not subject to break-before-make).
> >
> > All of the atomics do nothing for now, as the stage-2 walker isn't fully
> > ready to perform parallel walks.
> >
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 80 +++++++++++++++++++++++++++++++++---
> >  1 file changed, 75 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index f4dd77c6c97d..b9f0d792b8d9 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -49,6 +49,12 @@
> >  #define KVM_INVALID_PTE_OWNER_MASK     GENMASK(9, 2)
> >  #define KVM_MAX_OWNER_ID               1
> >
> > +/*
> > + * Used to indicate a pte for which a 'break-before-make' sequence is in
> > + * progress.
> > + */
> > +#define KVM_INVALID_PTE_LOCKED         BIT(10)
> > +
> >  struct kvm_pgtable_walk_data {
> >         struct kvm_pgtable_walker       *walker;
> >
> > @@ -674,6 +680,11 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
> >         return !!pte;
> >  }
> >
> > +static bool stage2_pte_is_locked(kvm_pte_t pte)
> > +{
> > +       return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
> > +}
> > +
> >  static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
> >  {
> >         if (!kvm_pgtable_walk_shared(ctx)) {
> > @@ -684,6 +695,64 @@ static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_
> >         return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
> >  }
> >
> > +/**
> > + * stage2_try_break_pte() - Invalidates a pte according to the
> > + *                         'break-before-make' requirements of the
> > + *                         architecture.
> > + *
> > + * @ctx: context of the visited pte.
> > + * @mmu: stage-2 mmu
> > + *
> > + * Returns: true if the pte was successfully broken.
> > + *
> > + * If the removed pte was valid, performs the necessary serialization and TLB
> > + * invalidation for the old value. For counted ptes, drops the reference count
> > + * on the containing table page.
> > + */
> > +static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
> > +                                struct kvm_s2_mmu *mmu)
> > +{
> > +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> > +
> > +       if (stage2_pte_is_locked(ctx->old)) {
> > +               /*
> > +                * Should never occur if this walker has exclusive access to the
> > +                * page tables.
> > +                */
> > +               WARN_ON(!kvm_pgtable_walk_shared(ctx));
> > +               return false;
> > +       }
> > +
> > +       if (!stage2_try_set_pte(ctx, KVM_INVALID_PTE_LOCKED))
> > +               return false;
> > +
> > +       /*
> > +        * Perform the appropriate TLB invalidation based on the evicted pte
> > +        * value (if any).
> > +        */
> > +       if (kvm_pte_table(ctx->old, ctx->level))
> > +               kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> > +       else if (kvm_pte_valid(ctx->old))
> > +               kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> > +
> > +       if (stage2_pte_is_counted(ctx->old))
> > +               mm_ops->put_page(ctx->ptep);
> > +
> > +       return true;
> > +}
> > +
> > +static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
> > +{
> > +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> > +
> > +       WARN_ON(!stage2_pte_is_locked(*ctx->ptep));
> > +
> > +       if (stage2_pte_is_counted(new))
> > +               mm_ops->get_page(ctx->ptep);
> > +
> > +       smp_store_release(ctx->ptep, new);
> > +}
> > +
> >  static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> >                            struct kvm_pgtable_mm_ops *mm_ops)
> >  {
> > @@ -812,17 +881,18 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
> >         if (!childp)
> >                 return -ENOMEM;
> >
> > +       if (!stage2_try_break_pte(ctx, data->mmu)) {
> > +               mm_ops->put_page(childp);
> > +               return -EAGAIN;
> > +       }
> > +
> >         /*
> >          * If we've run into an existing block mapping then replace it with
> >          * a table. Accesses beyond 'end' that fall within the new table
> >          * will be mapped lazily.
> >          */
> > -       if (stage2_pte_is_counted(ctx->old))
> > -               stage2_put_pte(ctx, data->mmu, mm_ops);
> > -
> >         new = kvm_init_table_pte(childp, mm_ops);
> 
> Does it make any sense to move this before the "break" to minimize the
> critical section in which the PTE is locked?

I had rationalized this before as doing less work in the threads that
would lose a race, but the critical section is very likely to be
performance sensitive as we're unmapping memory after all.

Thanks for the suggestion, I'll fold it in the next spin.

--
Best,
Oliver
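
Concretely, the reordering being agreed to here is just hoisting the PTE
construction out of the locked window, i.e. something along these lines
(a fragment sketching the intended follow-up, based only on the hunks
quoted above; not the code as posted in v5):

    if (!childp)
        return -ENOMEM;

    /* Build the replacement table PTE before taking ownership of the
     * slot, so the locked window only covers the break and the make. */
    new = kvm_init_table_pte(childp, mm_ops);

    if (!stage2_try_break_pte(ctx, data->mmu)) {
        mm_ops->put_page(childp);
        return -EAGAIN;
    }

    stage2_make_pte(ctx, new);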

* Re: [PATCH v5 11/14] KVM: arm64: Make block->table PTE changes parallel-aware
@ 2022-11-09 23:03       ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-09 23:03 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Wed, Nov 09, 2022 at 02:26:36PM -0800, Ben Gardon wrote:
> On Mon, Nov 7, 2022 at 1:59 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > In order to service stage-2 faults in parallel, stage-2 table walkers
> > must take exclusive ownership of the PTE being worked on. An additional
> > requirement of the architecture is that software must perform a
> > 'break-before-make' operation when changing the block size used for
> > mapping memory.
> >
> > Roll these two concepts together into helpers for performing a
> > 'break-before-make' sequence. Use a special PTE value to indicate a PTE
> > has been locked by a software walker. Additionally, use an atomic
> > compare-exchange to 'break' the PTE when the stage-2 page tables are
> > possibly shared with another software walker. Elide the DSB + TLBI if
> > the evicted PTE was invalid (and thus not subject to break-before-make).
> >
> > All of the atomics do nothing for now, as the stage-2 walker isn't fully
> > ready to perform parallel walks.
> >
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 80 +++++++++++++++++++++++++++++++++---
> >  1 file changed, 75 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index f4dd77c6c97d..b9f0d792b8d9 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -49,6 +49,12 @@
> >  #define KVM_INVALID_PTE_OWNER_MASK     GENMASK(9, 2)
> >  #define KVM_MAX_OWNER_ID               1
> >
> > +/*
> > + * Used to indicate a pte for which a 'break-before-make' sequence is in
> > + * progress.
> > + */
> > +#define KVM_INVALID_PTE_LOCKED         BIT(10)
> > +
> >  struct kvm_pgtable_walk_data {
> >         struct kvm_pgtable_walker       *walker;
> >
> > @@ -674,6 +680,11 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
> >         return !!pte;
> >  }
> >
> > +static bool stage2_pte_is_locked(kvm_pte_t pte)
> > +{
> > +       return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
> > +}
> > +
> >  static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
> >  {
> >         if (!kvm_pgtable_walk_shared(ctx)) {
> > @@ -684,6 +695,64 @@ static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_
> >         return cmpxchg(ctx->ptep, ctx->old, new) == ctx->old;
> >  }
> >
> > +/**
> > + * stage2_try_break_pte() - Invalidates a pte according to the
> > + *                         'break-before-make' requirements of the
> > + *                         architecture.
> > + *
> > + * @ctx: context of the visited pte.
> > + * @mmu: stage-2 mmu
> > + *
> > + * Returns: true if the pte was successfully broken.
> > + *
> > + * If the removed pte was valid, performs the necessary serialization and TLB
> > + * invalidation for the old value. For counted ptes, drops the reference count
> > + * on the containing table page.
> > + */
> > +static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
> > +                                struct kvm_s2_mmu *mmu)
> > +{
> > +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> > +
> > +       if (stage2_pte_is_locked(ctx->old)) {
> > +               /*
> > +                * Should never occur if this walker has exclusive access to the
> > +                * page tables.
> > +                */
> > +               WARN_ON(!kvm_pgtable_walk_shared(ctx));
> > +               return false;
> > +       }
> > +
> > +       if (!stage2_try_set_pte(ctx, KVM_INVALID_PTE_LOCKED))
> > +               return false;
> > +
> > +       /*
> > +        * Perform the appropriate TLB invalidation based on the evicted pte
> > +        * value (if any).
> > +        */
> > +       if (kvm_pte_table(ctx->old, ctx->level))
> > +               kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> > +       else if (kvm_pte_valid(ctx->old))
> > +               kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> > +
> > +       if (stage2_pte_is_counted(ctx->old))
> > +               mm_ops->put_page(ctx->ptep);
> > +
> > +       return true;
> > +}
> > +
> > +static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
> > +{
> > +       struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> > +
> > +       WARN_ON(!stage2_pte_is_locked(*ctx->ptep));
> > +
> > +       if (stage2_pte_is_counted(new))
> > +               mm_ops->get_page(ctx->ptep);
> > +
> > +       smp_store_release(ctx->ptep, new);
> > +}
> > +
> >  static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> >                            struct kvm_pgtable_mm_ops *mm_ops)
> >  {
> > @@ -812,17 +881,18 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
> >         if (!childp)
> >                 return -ENOMEM;
> >
> > +       if (!stage2_try_break_pte(ctx, data->mmu)) {
> > +               mm_ops->put_page(childp);
> > +               return -EAGAIN;
> > +       }
> > +
> >         /*
> >          * If we've run into an existing block mapping then replace it with
> >          * a table. Accesses beyond 'end' that fall within the new table
> >          * will be mapped lazily.
> >          */
> > -       if (stage2_pte_is_counted(ctx->old))
> > -               stage2_put_pte(ctx, data->mmu, mm_ops);
> > -
> >         new = kvm_init_table_pte(childp, mm_ops);
> 
> Does it make any sense to move this before the "break" to minimize the
> critical section in which the PTE is locked?

I had rationalized this before as doing less work in the threads that
would lose the race, but the critical section is very likely to be
performance-sensitive, as we're unmapping memory after all.

Thanks for the suggestion, I'll fold it in the next spin.
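
For reference, a rough and untested sketch of the reordering on top of
this patch (names as used in this series; the actual respin may differ
slightly). kvm_init_table_pte() only computes the new PTE value from the
freshly allocated child page, so it doesn't need to run with the PTE
locked:

	new = kvm_init_table_pte(childp, mm_ops);

	if (!stage2_try_break_pte(ctx, data->mmu)) {
		mm_ops->put_page(childp);
		return -EAGAIN;
	}

	/*
	 * If we've run into an existing block mapping then replace it
	 * with a table. Accesses beyond 'end' that fall within the new
	 * table will be mapped lazily.
	 */
	stage2_make_pte(ctx, new);

With that ordering, the only work done under the locked PTE is the
installation itself.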

--
Best,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
  2022-11-09 21:53     ` Sean Christopherson
  (?)
@ 2022-11-09 23:55       ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-09 23:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu, Will Deacon,
	kvmarm

On Wed, Nov 09, 2022 at 09:53:45PM +0000, Sean Christopherson wrote:
> On Mon, Nov 07, 2022, Oliver Upton wrote:
> > Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> > release the RCU read lock when traversing the page tables. Defer the
> > freeing of table memory to an RCU callback. Indirect the calls into RCU
> > and provide stubs for hypervisor code, as RCU is not available in such a
> > context.
> > 
> > The RCU protection doesn't amount to much at the moment, as readers are
> > already protected by the read-write lock (all walkers that free table
> > memory take the write lock). Nonetheless, a subsequent change will
> > further relax the locking requirements around the stage-2 MMU, thereby
> > depending on RCU.
> 
> Two somewhat off-topic questions (because I'm curious):

Worth asking!

>  1. Are there plans to enable "fast" page faults on ARM?  E.g. to fixup access
>     faults (handle_access_fault()) and/or write-protection faults without acquiring
>     mmu_lock?

I don't have any plans personally.

OTOH, adding read-side handling of access faults is trivial; I just
didn't give it much thought, as most large-scale implementations have
FEAT_HAFDBS (hardware access flag management).

>  2. If the answer to (1) is "yes!", what's the plan to protect the lockless walks
>     for the RCU-less hypervisor code?

If/when we are worried about fault serialization in the lowvisor, I was
thinking of something along the lines of disabling interrupts around the
walk and using IPIs as barriers before freeing removed table memory,
crudely giving the same protection as RCU.
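
Purely as a conceptual sketch (host-kernel primitives used for
illustration only -- the lowvisor would need its own equivalents, and
free_removed_table() below is just a stand-in): walkers would run with
interrupts disabled, and the freeing side would wait for an empty IPI on
every CPU before releasing the table. The IPI can't complete on a CPU
that is mid-walk, so once the call returns no walker can still hold a
reference to the removed table.

	static void table_walk_barrier_ipi(void *unused)
	{
		/* Completion of the IPI is the barrier; nothing to do. */
	}

	static void free_table_after_walkers(void *table)
	{
		/* Waits until every CPU has taken (and thus left) the IPI. */
		on_each_cpu(table_walk_barrier_ipi, NULL, 1);
		free_removed_table(table);	/* stand-in for the real free */
	}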

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 01/14] KVM: arm64: Combine visitor arguments into a context structure
  2022-11-07 21:56   ` Oliver Upton
  (?)
@ 2022-11-10  0:23     ` Gavin Shan
  -1 siblings, 0 replies; 156+ messages in thread
From: Gavin Shan @ 2022-11-10  0:23 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Alexandru Elisei
  Cc: kvm, kvmarm, Ben Gardon, David Matlack, Will Deacon, kvmarm,
	linux-arm-kernel

On 11/8/22 5:56 AM, Oliver Upton wrote:
> Passing new arguments by value to the visitor callbacks is extremely
> inflexible for stuffing new parameters used by only some of the
> visitors. Use a context structure instead and pass the pointer through
> to the visitor callback.
> 
> While at it, redefine the 'flags' parameter to the visitor to contain
> the bit indicating the phase of the walk. Pass the entire set of flags
> through the context structure such that the walker can communicate
> additional state to the visitor callback.
> 
> No functional change intended.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>   arch/arm64/include/asm/kvm_pgtable.h  |  15 +-
>   arch/arm64/kvm/hyp/nvhe/mem_protect.c |  10 +-
>   arch/arm64/kvm/hyp/nvhe/setup.c       |  16 +-
>   arch/arm64/kvm/hyp/pgtable.c          | 269 +++++++++++++-------------
>   4 files changed, 154 insertions(+), 156 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

One nit below.

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 3252eb50ecfe..607f9bb8aab4 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -199,10 +199,17 @@ enum kvm_pgtable_walk_flags {
>   	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
>   };
>   
> -typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
> -					kvm_pte_t *ptep,
> -					enum kvm_pgtable_walk_flags flag,
> -					void * const arg);
> +struct kvm_pgtable_visit_ctx {
> +	kvm_pte_t				*ptep;
> +	void					*arg;
> +	u64					addr;
> +	u64					end;
> +	u32					level;
> +	enum kvm_pgtable_walk_flags		flags;
> +};
> +
> +typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
> +					enum kvm_pgtable_walk_flags visit);
>   

Does it make sense to reorder these fields in the context struct based on
their properties? For example, ptep is determined by the combination of
addr/level.

     struct kvm_pgtable_visit_ctx {
            enum kvm_pgtable_walk_flags     flags;
            u64                             addr;
            u64                             end;
            u32                             level;
            kvm_pte_t                       *ptep;
            void                            *arg;
     };
            

>   /**
>    * struct kvm_pgtable_walker - Hook into a page-table walk.
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 1e78acf9662e..8f5b6a36a039 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -417,13 +417,11 @@ struct check_walk_data {
>   	enum pkvm_page_state	(*get_page_state)(kvm_pte_t pte);
>   };
>   
> -static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
> -				      kvm_pte_t *ptep,
> -				      enum kvm_pgtable_walk_flags flag,
> -				      void * const arg)
> +static int __check_page_state_visitor(const struct kvm_pgtable_visit_ctx *ctx,
> +				      enum kvm_pgtable_walk_flags visit)
>   {
> -	struct check_walk_data *d = arg;
> -	kvm_pte_t pte = *ptep;
> +	struct check_walk_data *d = ctx->arg;
> +	kvm_pte_t pte = *ctx->ptep;
>   
>   	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
>   		return -EINVAL;
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index e8d4ea2fcfa0..a293cf5eba1b 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -186,15 +186,13 @@ static void hpool_put_page(void *addr)
>   	hyp_put_page(&hpool, addr);
>   }
>   
> -static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
> -					 kvm_pte_t *ptep,
> -					 enum kvm_pgtable_walk_flags flag,
> -					 void * const arg)
> +static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +					 enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
>   	enum kvm_pgtable_prot prot;
>   	enum pkvm_page_state state;
> -	kvm_pte_t pte = *ptep;
> +	kvm_pte_t pte = *ctx->ptep;
>   	phys_addr_t phys;
>   
>   	if (!kvm_pte_valid(pte))
> @@ -205,11 +203,11 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
>   	 * was unable to access the hyp_vmemmap and so the buddy allocator has
>   	 * initialised the refcount to '1'.
>   	 */
> -	mm_ops->get_page(ptep);
> -	if (flag != KVM_PGTABLE_WALK_LEAF)
> +	mm_ops->get_page(ctx->ptep);
> +	if (visit != KVM_PGTABLE_WALK_LEAF)
>   		return 0;
>   
> -	if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
> +	if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
>   		return -EINVAL;
>   
>   	phys = kvm_pte_to_phys(pte);
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index cdf8e76b0be1..900c8b9c0cfc 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -64,20 +64,20 @@ static bool kvm_phys_is_valid(u64 phys)
>   	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
>   }
>   
> -static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
> +static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx, u64 phys)
>   {
> -	u64 granule = kvm_granule_size(level);
> +	u64 granule = kvm_granule_size(ctx->level);
>   
> -	if (!kvm_level_supports_block_mapping(level))
> +	if (!kvm_level_supports_block_mapping(ctx->level))
>   		return false;
>   
> -	if (granule > (end - addr))
> +	if (granule > (ctx->end - ctx->addr))
>   		return false;
>   
>   	if (kvm_phys_is_valid(phys) && !IS_ALIGNED(phys, granule))
>   		return false;
>   
> -	return IS_ALIGNED(addr, granule);
> +	return IS_ALIGNED(ctx->addr, granule);
>   }
>   
>   static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
> @@ -172,12 +172,12 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
>   	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
>   }
>   
> -static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> -				  u32 level, kvm_pte_t *ptep,
> -				  enum kvm_pgtable_walk_flags flag)
> +static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
> +				  const struct kvm_pgtable_visit_ctx *ctx,
> +				  enum kvm_pgtable_walk_flags visit)
>   {
>   	struct kvm_pgtable_walker *walker = data->walker;
> -	return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
> +	return walker->cb(ctx, visit);
>   }
>   
>   static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> @@ -186,20 +186,24 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>   static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   				      kvm_pte_t *ptep, u32 level)
>   {
> +	enum kvm_pgtable_walk_flags flags = data->walker->flags;
> +	struct kvm_pgtable_visit_ctx ctx = {
> +		.ptep	= ptep,
> +		.arg	= data->walker->arg,
> +		.addr	= data->addr,
> +		.end	= data->end,
> +		.level	= level,
> +		.flags	= flags,
> +	};
>   	int ret = 0;
> -	u64 addr = data->addr;
>   	kvm_pte_t *childp, pte = *ptep;
>   	bool table = kvm_pte_table(pte, level);
> -	enum kvm_pgtable_walk_flags flags = data->walker->flags;
>   
> -	if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
> -		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -					     KVM_PGTABLE_WALK_TABLE_PRE);
> -	}
> +	if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
> +		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);
>   
> -	if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
> -		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -					     KVM_PGTABLE_WALK_LEAF);
> +	if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
> +		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
>   		pte = *ptep;
>   		table = kvm_pte_table(pte, level);
>   	}
> @@ -218,10 +222,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   	if (ret)
>   		goto out;
>   
> -	if (flags & KVM_PGTABLE_WALK_TABLE_POST) {
> -		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -					     KVM_PGTABLE_WALK_TABLE_POST);
> -	}
> +	if (ctx.flags & KVM_PGTABLE_WALK_TABLE_POST)
> +		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_POST);
>   
>   out:
>   	return ret;
> @@ -292,13 +294,13 @@ struct leaf_walk_data {
>   	u32		level;
>   };
>   
> -static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -		       enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +		       enum kvm_pgtable_walk_flags visit)
>   {
> -	struct leaf_walk_data *data = arg;
> +	struct leaf_walk_data *data = ctx->arg;
>   
> -	data->pte   = *ptep;
> -	data->level = level;
> +	data->pte   = *ctx->ptep;
> +	data->level = ctx->level;
>   
>   	return 0;
>   }
> @@ -383,47 +385,47 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
>   	return prot;
>   }
>   
> -static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> -				    kvm_pte_t *ptep, struct hyp_map_data *data)
> +static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
> +				    struct hyp_map_data *data)
>   {
> -	kvm_pte_t new, old = *ptep;
> -	u64 granule = kvm_granule_size(level), phys = data->phys;
> +	kvm_pte_t new, old = *ctx->ptep;
> +	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>   
> -	if (!kvm_block_mapping_supported(addr, end, phys, level))
> +	if (!kvm_block_mapping_supported(ctx, phys))
>   		return false;
>   
>   	data->phys += granule;
> -	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +	new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
>   	if (old == new)
>   		return true;
>   	if (!kvm_pte_valid(old))
> -		data->mm_ops->get_page(ptep);
> +		data->mm_ops->get_page(ctx->ptep);
>   	else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>   		return false;
>   
> -	smp_store_release(ptep, new);
> +	smp_store_release(ctx->ptep, new);
>   	return true;
>   }
>   
> -static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			  enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			  enum kvm_pgtable_walk_flags visit)
>   {
>   	kvm_pte_t *childp;
> -	struct hyp_map_data *data = arg;
> +	struct hyp_map_data *data = ctx->arg;
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>   
> -	if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
> +	if (hyp_map_walker_try_leaf(ctx, data))
>   		return 0;
>   
> -	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>   		return -EINVAL;
>   
>   	childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
>   	if (!childp)
>   		return -ENOMEM;
>   
> -	kvm_set_table_pte(ptep, childp, mm_ops);
> -	mm_ops->get_page(ptep);
> +	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> +	mm_ops->get_page(ctx->ptep);
>   	return 0;
>   }
>   
> @@ -456,39 +458,39 @@ struct hyp_unmap_data {
>   	struct kvm_pgtable_mm_ops	*mm_ops;
>   };
>   
> -static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			    enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			    enum kvm_pgtable_walk_flags visit)
>   {
> -	kvm_pte_t pte = *ptep, *childp = NULL;
> -	u64 granule = kvm_granule_size(level);
> -	struct hyp_unmap_data *data = arg;
> +	kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +	u64 granule = kvm_granule_size(ctx->level);
> +	struct hyp_unmap_data *data = ctx->arg;
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>   
>   	if (!kvm_pte_valid(pte))
>   		return -EINVAL;
>   
> -	if (kvm_pte_table(pte, level)) {
> +	if (kvm_pte_table(pte, ctx->level)) {
>   		childp = kvm_pte_follow(pte, mm_ops);
>   
>   		if (mm_ops->page_count(childp) != 1)
>   			return 0;
>   
> -		kvm_clear_pte(ptep);
> +		kvm_clear_pte(ctx->ptep);
>   		dsb(ishst);
> -		__tlbi_level(vae2is, __TLBI_VADDR(addr, 0), level);
> +		__tlbi_level(vae2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
>   	} else {
> -		if (end - addr < granule)
> +		if (ctx->end - ctx->addr < granule)
>   			return -EINVAL;
>   
> -		kvm_clear_pte(ptep);
> +		kvm_clear_pte(ctx->ptep);
>   		dsb(ishst);
> -		__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
> +		__tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
>   		data->unmapped += granule;
>   	}
>   
>   	dsb(ish);
>   	isb();
> -	mm_ops->put_page(ptep);
> +	mm_ops->put_page(ctx->ptep);
>   
>   	if (childp)
>   		mm_ops->put_page(childp);
> @@ -532,18 +534,18 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>   	return 0;
>   }
>   
> -static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			   enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			   enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = arg;
> -	kvm_pte_t pte = *ptep;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	kvm_pte_t pte = *ctx->ptep;
>   
>   	if (!kvm_pte_valid(pte))
>   		return 0;
>   
> -	mm_ops->put_page(ptep);
> +	mm_ops->put_page(ctx->ptep);
>   
> -	if (kvm_pte_table(pte, level))
> +	if (kvm_pte_table(pte, ctx->level))
>   		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
>   
>   	return 0;
> @@ -682,19 +684,19 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
>   	return !!pte;
>   }
>   
> -static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
> -			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
> +static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> +			   struct kvm_pgtable_mm_ops *mm_ops)
>   {
>   	/*
>   	 * Clear the existing PTE, and perform break-before-make with
>   	 * TLB maintenance if it was valid.
>   	 */
> -	if (kvm_pte_valid(*ptep)) {
> -		kvm_clear_pte(ptep);
> -		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
> +	if (kvm_pte_valid(*ctx->ptep)) {
> +		kvm_clear_pte(ctx->ptep);
> +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
>   	}
>   
> -	mm_ops->put_page(ptep);
> +	mm_ops->put_page(ctx->ptep);
>   }
>   
>   static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
> @@ -708,29 +710,28 @@ static bool stage2_pte_executable(kvm_pte_t pte)
>   	return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
>   }
>   
> -static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
> +static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
>   					struct stage2_map_data *data)
>   {
> -	if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
> +	if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
>   		return false;
>   
> -	return kvm_block_mapping_supported(addr, end, data->phys, level);
> +	return kvm_block_mapping_supported(ctx, data->phys);
>   }
>   
> -static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> -				      kvm_pte_t *ptep,
> +static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   				      struct stage2_map_data *data)
>   {
> -	kvm_pte_t new, old = *ptep;
> -	u64 granule = kvm_granule_size(level), phys = data->phys;
> +	kvm_pte_t new, old = *ctx->ptep;
> +	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>   	struct kvm_pgtable *pgt = data->mmu->pgt;
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>   
> -	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> +	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return -E2BIG;
>   
>   	if (kvm_phys_is_valid(phys))
> -		new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +		new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
>   	else
>   		new = kvm_init_invalid_leaf_owner(data->owner_id);
>   
> @@ -744,7 +745,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>   		if (!stage2_pte_needs_update(old, new))
>   			return -EAGAIN;
>   
> -		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> +		stage2_put_pte(ctx, data->mmu, mm_ops);
>   	}
>   
>   	/* Perform CMOs before installation of the guest stage-2 PTE */
> @@ -755,26 +756,25 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>   	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
>   		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
>   
> -	smp_store_release(ptep, new);
> +	smp_store_release(ctx->ptep, new);
>   	if (stage2_pte_is_counted(new))
> -		mm_ops->get_page(ptep);
> +		mm_ops->get_page(ctx->ptep);
>   	if (kvm_phys_is_valid(phys))
>   		data->phys += granule;
>   	return 0;
>   }
>   
> -static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> -				     kvm_pte_t *ptep,
> +static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>   				     struct stage2_map_data *data)
>   {
>   	if (data->anchor)
>   		return 0;
>   
> -	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> +	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return 0;
>   
> -	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
> -	kvm_clear_pte(ptep);
> +	data->childp = kvm_pte_follow(*ctx->ptep, data->mm_ops);
> +	kvm_clear_pte(ctx->ptep);
>   
>   	/*
>   	 * Invalidate the whole stage-2, as we may have numerous leaf
> @@ -782,29 +782,29 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>   	 * individually.
>   	 */
>   	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> -	data->anchor = ptep;
> +	data->anchor = ctx->ptep;
>   	return 0;
>   }
>   
> -static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   				struct stage2_map_data *data)
>   {
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> -	kvm_pte_t *childp, pte = *ptep;
> +	kvm_pte_t *childp, pte = *ctx->ptep;
>   	int ret;
>   
>   	if (data->anchor) {
>   		if (stage2_pte_is_counted(pte))
> -			mm_ops->put_page(ptep);
> +			mm_ops->put_page(ctx->ptep);
>   
>   		return 0;
>   	}
>   
> -	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
> +	ret = stage2_map_walker_try_leaf(ctx, data);
>   	if (ret != -E2BIG)
>   		return ret;
>   
> -	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>   		return -EINVAL;
>   
>   	if (!data->memcache)
> @@ -820,16 +820,15 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>   	 * will be mapped lazily.
>   	 */
>   	if (stage2_pte_is_counted(pte))
> -		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> +		stage2_put_pte(ctx, data->mmu, mm_ops);
>   
> -	kvm_set_table_pte(ptep, childp, mm_ops);
> -	mm_ops->get_page(ptep);
> +	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> +	mm_ops->get_page(ctx->ptep);
>   
>   	return 0;
>   }
>   
> -static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> -				      kvm_pte_t *ptep,
> +static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>   				      struct stage2_map_data *data)
>   {
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> @@ -839,17 +838,17 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>   	if (!data->anchor)
>   		return 0;
>   
> -	if (data->anchor == ptep) {
> +	if (data->anchor == ctx->ptep) {
>   		childp = data->childp;
>   		data->anchor = NULL;
>   		data->childp = NULL;
> -		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> +		ret = stage2_map_walk_leaf(ctx, data);
>   	} else {
> -		childp = kvm_pte_follow(*ptep, mm_ops);
> +		childp = kvm_pte_follow(*ctx->ptep, mm_ops);
>   	}
>   
>   	mm_ops->put_page(childp);
> -	mm_ops->put_page(ptep);
> +	mm_ops->put_page(ctx->ptep);
>   
>   	return ret;
>   }
> @@ -873,18 +872,18 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>    * the page-table, installing the block entry when it revisits the anchor
>    * pointer and clearing the anchor to NULL.
>    */
> -static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			     enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int stage2_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			     enum kvm_pgtable_walk_flags visit)
>   {
> -	struct stage2_map_data *data = arg;
> +	struct stage2_map_data *data = ctx->arg;
>   
> -	switch (flag) {
> +	switch (visit) {
>   	case KVM_PGTABLE_WALK_TABLE_PRE:
> -		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
> +		return stage2_map_walk_table_pre(ctx, data);
>   	case KVM_PGTABLE_WALK_LEAF:
> -		return stage2_map_walk_leaf(addr, end, level, ptep, data);
> +		return stage2_map_walk_leaf(ctx, data);
>   	case KVM_PGTABLE_WALK_TABLE_POST:
> -		return stage2_map_walk_table_post(addr, end, level, ptep, data);
> +		return stage2_map_walk_table_post(ctx, data);
>   	}
>   
>   	return -EINVAL;
> @@ -949,25 +948,24 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   	return ret;
>   }
>   
> -static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			       enum kvm_pgtable_walk_flags flag,
> -			       void * const arg)
> +static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			       enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable *pgt = arg;
> +	struct kvm_pgtable *pgt = ctx->arg;
>   	struct kvm_s2_mmu *mmu = pgt->mmu;
>   	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -	kvm_pte_t pte = *ptep, *childp = NULL;
> +	kvm_pte_t pte = *ctx->ptep, *childp = NULL;
>   	bool need_flush = false;
>   
>   	if (!kvm_pte_valid(pte)) {
>   		if (stage2_pte_is_counted(pte)) {
> -			kvm_clear_pte(ptep);
> -			mm_ops->put_page(ptep);
> +			kvm_clear_pte(ctx->ptep);
> +			mm_ops->put_page(ctx->ptep);
>   		}
>   		return 0;
>   	}
>   
> -	if (kvm_pte_table(pte, level)) {
> +	if (kvm_pte_table(pte, ctx->level)) {
>   		childp = kvm_pte_follow(pte, mm_ops);
>   
>   		if (mm_ops->page_count(childp) != 1)
> @@ -981,11 +979,11 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>   	 * block entry and rely on the remaining portions being faulted
>   	 * back lazily.
>   	 */
> -	stage2_put_pte(ptep, mmu, addr, level, mm_ops);
> +	stage2_put_pte(ctx, mmu, mm_ops);
>   
>   	if (need_flush && mm_ops->dcache_clean_inval_poc)
>   		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> -					       kvm_granule_size(level));
> +					       kvm_granule_size(ctx->level));
>   
>   	if (childp)
>   		mm_ops->put_page(childp);
> @@ -1012,18 +1010,17 @@ struct stage2_attr_data {
>   	struct kvm_pgtable_mm_ops	*mm_ops;
>   };
>   
> -static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			      enum kvm_pgtable_walk_flags flag,
> -			      void * const arg)
> +static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			      enum kvm_pgtable_walk_flags visit)
>   {
> -	kvm_pte_t pte = *ptep;
> -	struct stage2_attr_data *data = arg;
> +	kvm_pte_t pte = *ctx->ptep;
> +	struct stage2_attr_data *data = ctx->arg;
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>   
>   	if (!kvm_pte_valid(pte))
>   		return 0;
>   
> -	data->level = level;
> +	data->level = ctx->level;
>   	data->pte = pte;
>   	pte &= ~data->attr_clr;
>   	pte |= data->attr_set;
> @@ -1039,10 +1036,10 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>   		 * stage-2 PTE if we are going to add executable permission.
>   		 */
>   		if (mm_ops->icache_inval_pou &&
> -		    stage2_pte_executable(pte) && !stage2_pte_executable(*ptep))
> +		    stage2_pte_executable(pte) && !stage2_pte_executable(*ctx->ptep))
>   			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
> -						  kvm_granule_size(level));
> -		WRITE_ONCE(*ptep, pte);
> +						  kvm_granule_size(ctx->level));
> +		WRITE_ONCE(*ctx->ptep, pte);
>   	}
>   
>   	return 0;
> @@ -1140,20 +1137,19 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
>   	return ret;
>   }
>   
> -static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			       enum kvm_pgtable_walk_flags flag,
> -			       void * const arg)
> +static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			       enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable *pgt = arg;
> +	struct kvm_pgtable *pgt = ctx->arg;
>   	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -	kvm_pte_t pte = *ptep;
> +	kvm_pte_t pte = *ctx->ptep;
>   
>   	if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
>   		return 0;
>   
>   	if (mm_ops->dcache_clean_inval_poc)
>   		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> -					       kvm_granule_size(level));
> +					       kvm_granule_size(ctx->level));
>   	return 0;
>   }
>   
> @@ -1200,19 +1196,18 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>   	return 0;
>   }
>   
> -static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			      enum kvm_pgtable_walk_flags flag,
> -			      void * const arg)
> +static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			      enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = arg;
> -	kvm_pte_t pte = *ptep;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	kvm_pte_t pte = *ctx->ptep;
>   
>   	if (!stage2_pte_is_counted(pte))
>   		return 0;
>   
> -	mm_ops->put_page(ptep);
> +	mm_ops->put_page(ctx->ptep);
>   
> -	if (kvm_pte_table(pte, level))
> +	if (kvm_pte_table(pte, ctx->level))
>   		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
>   
>   	return 0;
> 

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 01/14] KVM: arm64: Combine visitor arguments into a context structure
@ 2022-11-10  0:23     ` Gavin Shan
  0 siblings, 0 replies; 156+ messages in thread
From: Gavin Shan @ 2022-11-10  0:23 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On 11/8/22 5:56 AM, Oliver Upton wrote:
> Passing new arguments by value to the visitor callbacks is extremely
> inflexible for stuffing new parameters used by only some of the
> visitors. Use a context structure instead and pass the pointer through
> to the visitor callback.
> 
> While at it, redefine the 'flags' parameter to the visitor to contain
> the bit indicating the phase of the walk. Pass the entire set of flags
> through the context structure such that the walker can communicate
> additional state to the visitor callback.
> 
> No functional change intended.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>   arch/arm64/include/asm/kvm_pgtable.h  |  15 +-
>   arch/arm64/kvm/hyp/nvhe/mem_protect.c |  10 +-
>   arch/arm64/kvm/hyp/nvhe/setup.c       |  16 +-
>   arch/arm64/kvm/hyp/pgtable.c          | 269 +++++++++++++-------------
>   4 files changed, 154 insertions(+), 156 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

One nit below.

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 3252eb50ecfe..607f9bb8aab4 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -199,10 +199,17 @@ enum kvm_pgtable_walk_flags {
>   	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
>   };
>   
> -typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
> -					kvm_pte_t *ptep,
> -					enum kvm_pgtable_walk_flags flag,
> -					void * const arg);
> +struct kvm_pgtable_visit_ctx {
> +	kvm_pte_t				*ptep;
> +	void					*arg;
> +	u64					addr;
> +	u64					end;
> +	u32					level;
> +	enum kvm_pgtable_walk_flags		flags;
> +};
> +
> +typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
> +					enum kvm_pgtable_walk_flags visit);
>   

Does it make sense to reorder these fields in the context struct based on
their properties? For example, ptep is determined by the combination of
addr/level.

     struct kvm_pgtable_visit_ctx {
            enum kvm_pgtable_walk_flags     flags;
            u64                             addr;
            u64                             end;
            u32                             level;
            kvm_pte_t                       *ptep;
            void                            *arg;
     };
            

>   /**
>    * struct kvm_pgtable_walker - Hook into a page-table walk.
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 1e78acf9662e..8f5b6a36a039 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -417,13 +417,11 @@ struct check_walk_data {
>   	enum pkvm_page_state	(*get_page_state)(kvm_pte_t pte);
>   };
>   
> -static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
> -				      kvm_pte_t *ptep,
> -				      enum kvm_pgtable_walk_flags flag,
> -				      void * const arg)
> +static int __check_page_state_visitor(const struct kvm_pgtable_visit_ctx *ctx,
> +				      enum kvm_pgtable_walk_flags visit)
>   {
> -	struct check_walk_data *d = arg;
> -	kvm_pte_t pte = *ptep;
> +	struct check_walk_data *d = ctx->arg;
> +	kvm_pte_t pte = *ctx->ptep;
>   
>   	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
>   		return -EINVAL;
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index e8d4ea2fcfa0..a293cf5eba1b 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -186,15 +186,13 @@ static void hpool_put_page(void *addr)
>   	hyp_put_page(&hpool, addr);
>   }
>   
> -static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
> -					 kvm_pte_t *ptep,
> -					 enum kvm_pgtable_walk_flags flag,
> -					 void * const arg)
> +static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +					 enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
>   	enum kvm_pgtable_prot prot;
>   	enum pkvm_page_state state;
> -	kvm_pte_t pte = *ptep;
> +	kvm_pte_t pte = *ctx->ptep;
>   	phys_addr_t phys;
>   
>   	if (!kvm_pte_valid(pte))
> @@ -205,11 +203,11 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
>   	 * was unable to access the hyp_vmemmap and so the buddy allocator has
>   	 * initialised the refcount to '1'.
>   	 */
> -	mm_ops->get_page(ptep);
> -	if (flag != KVM_PGTABLE_WALK_LEAF)
> +	mm_ops->get_page(ctx->ptep);
> +	if (visit != KVM_PGTABLE_WALK_LEAF)
>   		return 0;
>   
> -	if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
> +	if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
>   		return -EINVAL;
>   
>   	phys = kvm_pte_to_phys(pte);
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index cdf8e76b0be1..900c8b9c0cfc 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -64,20 +64,20 @@ static bool kvm_phys_is_valid(u64 phys)
>   	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
>   }
>   
> -static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
> +static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx, u64 phys)
>   {
> -	u64 granule = kvm_granule_size(level);
> +	u64 granule = kvm_granule_size(ctx->level);
>   
> -	if (!kvm_level_supports_block_mapping(level))
> +	if (!kvm_level_supports_block_mapping(ctx->level))
>   		return false;
>   
> -	if (granule > (end - addr))
> +	if (granule > (ctx->end - ctx->addr))
>   		return false;
>   
>   	if (kvm_phys_is_valid(phys) && !IS_ALIGNED(phys, granule))
>   		return false;
>   
> -	return IS_ALIGNED(addr, granule);
> +	return IS_ALIGNED(ctx->addr, granule);
>   }
>   
>   static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
> @@ -172,12 +172,12 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
>   	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
>   }
>   
> -static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> -				  u32 level, kvm_pte_t *ptep,
> -				  enum kvm_pgtable_walk_flags flag)
> +static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
> +				  const struct kvm_pgtable_visit_ctx *ctx,
> +				  enum kvm_pgtable_walk_flags visit)
>   {
>   	struct kvm_pgtable_walker *walker = data->walker;
> -	return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
> +	return walker->cb(ctx, visit);
>   }
>   
>   static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> @@ -186,20 +186,24 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>   static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   				      kvm_pte_t *ptep, u32 level)
>   {
> +	enum kvm_pgtable_walk_flags flags = data->walker->flags;
> +	struct kvm_pgtable_visit_ctx ctx = {
> +		.ptep	= ptep,
> +		.arg	= data->walker->arg,
> +		.addr	= data->addr,
> +		.end	= data->end,
> +		.level	= level,
> +		.flags	= flags,
> +	};
>   	int ret = 0;
> -	u64 addr = data->addr;
>   	kvm_pte_t *childp, pte = *ptep;
>   	bool table = kvm_pte_table(pte, level);
> -	enum kvm_pgtable_walk_flags flags = data->walker->flags;
>   
> -	if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
> -		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -					     KVM_PGTABLE_WALK_TABLE_PRE);
> -	}
> +	if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
> +		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);
>   
> -	if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
> -		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -					     KVM_PGTABLE_WALK_LEAF);
> +	if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
> +		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
>   		pte = *ptep;
>   		table = kvm_pte_table(pte, level);
>   	}
> @@ -218,10 +222,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   	if (ret)
>   		goto out;
>   
> -	if (flags & KVM_PGTABLE_WALK_TABLE_POST) {
> -		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> -					     KVM_PGTABLE_WALK_TABLE_POST);
> -	}
> +	if (ctx.flags & KVM_PGTABLE_WALK_TABLE_POST)
> +		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_POST);
>   
>   out:
>   	return ret;
> @@ -292,13 +294,13 @@ struct leaf_walk_data {
>   	u32		level;
>   };
>   
> -static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -		       enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +		       enum kvm_pgtable_walk_flags visit)
>   {
> -	struct leaf_walk_data *data = arg;
> +	struct leaf_walk_data *data = ctx->arg;
>   
> -	data->pte   = *ptep;
> -	data->level = level;
> +	data->pte   = *ctx->ptep;
> +	data->level = ctx->level;
>   
>   	return 0;
>   }
> @@ -383,47 +385,47 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
>   	return prot;
>   }
>   
> -static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> -				    kvm_pte_t *ptep, struct hyp_map_data *data)
> +static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
> +				    struct hyp_map_data *data)
>   {
> -	kvm_pte_t new, old = *ptep;
> -	u64 granule = kvm_granule_size(level), phys = data->phys;
> +	kvm_pte_t new, old = *ctx->ptep;
> +	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>   
> -	if (!kvm_block_mapping_supported(addr, end, phys, level))
> +	if (!kvm_block_mapping_supported(ctx, phys))
>   		return false;
>   
>   	data->phys += granule;
> -	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +	new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
>   	if (old == new)
>   		return true;
>   	if (!kvm_pte_valid(old))
> -		data->mm_ops->get_page(ptep);
> +		data->mm_ops->get_page(ctx->ptep);
>   	else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>   		return false;
>   
> -	smp_store_release(ptep, new);
> +	smp_store_release(ctx->ptep, new);
>   	return true;
>   }
>   
> -static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			  enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			  enum kvm_pgtable_walk_flags visit)
>   {
>   	kvm_pte_t *childp;
> -	struct hyp_map_data *data = arg;
> +	struct hyp_map_data *data = ctx->arg;
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>   
> -	if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
> +	if (hyp_map_walker_try_leaf(ctx, data))
>   		return 0;
>   
> -	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>   		return -EINVAL;
>   
>   	childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
>   	if (!childp)
>   		return -ENOMEM;
>   
> -	kvm_set_table_pte(ptep, childp, mm_ops);
> -	mm_ops->get_page(ptep);
> +	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> +	mm_ops->get_page(ctx->ptep);
>   	return 0;
>   }
>   
> @@ -456,39 +458,39 @@ struct hyp_unmap_data {
>   	struct kvm_pgtable_mm_ops	*mm_ops;
>   };
>   
> -static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			    enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			    enum kvm_pgtable_walk_flags visit)
>   {
> -	kvm_pte_t pte = *ptep, *childp = NULL;
> -	u64 granule = kvm_granule_size(level);
> -	struct hyp_unmap_data *data = arg;
> +	kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +	u64 granule = kvm_granule_size(ctx->level);
> +	struct hyp_unmap_data *data = ctx->arg;
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>   
>   	if (!kvm_pte_valid(pte))
>   		return -EINVAL;
>   
> -	if (kvm_pte_table(pte, level)) {
> +	if (kvm_pte_table(pte, ctx->level)) {
>   		childp = kvm_pte_follow(pte, mm_ops);
>   
>   		if (mm_ops->page_count(childp) != 1)
>   			return 0;
>   
> -		kvm_clear_pte(ptep);
> +		kvm_clear_pte(ctx->ptep);
>   		dsb(ishst);
> -		__tlbi_level(vae2is, __TLBI_VADDR(addr, 0), level);
> +		__tlbi_level(vae2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
>   	} else {
> -		if (end - addr < granule)
> +		if (ctx->end - ctx->addr < granule)
>   			return -EINVAL;
>   
> -		kvm_clear_pte(ptep);
> +		kvm_clear_pte(ctx->ptep);
>   		dsb(ishst);
> -		__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
> +		__tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
>   		data->unmapped += granule;
>   	}
>   
>   	dsb(ish);
>   	isb();
> -	mm_ops->put_page(ptep);
> +	mm_ops->put_page(ctx->ptep);
>   
>   	if (childp)
>   		mm_ops->put_page(childp);
> @@ -532,18 +534,18 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>   	return 0;
>   }
>   
> -static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			   enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			   enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = arg;
> -	kvm_pte_t pte = *ptep;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	kvm_pte_t pte = *ctx->ptep;
>   
>   	if (!kvm_pte_valid(pte))
>   		return 0;
>   
> -	mm_ops->put_page(ptep);
> +	mm_ops->put_page(ctx->ptep);
>   
> -	if (kvm_pte_table(pte, level))
> +	if (kvm_pte_table(pte, ctx->level))
>   		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
>   
>   	return 0;
> @@ -682,19 +684,19 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
>   	return !!pte;
>   }
>   
> -static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
> -			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
> +static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> +			   struct kvm_pgtable_mm_ops *mm_ops)
>   {
>   	/*
>   	 * Clear the existing PTE, and perform break-before-make with
>   	 * TLB maintenance if it was valid.
>   	 */
> -	if (kvm_pte_valid(*ptep)) {
> -		kvm_clear_pte(ptep);
> -		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
> +	if (kvm_pte_valid(*ctx->ptep)) {
> +		kvm_clear_pte(ctx->ptep);
> +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
>   	}
>   
> -	mm_ops->put_page(ptep);
> +	mm_ops->put_page(ctx->ptep);
>   }
>   
>   static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
> @@ -708,29 +710,28 @@ static bool stage2_pte_executable(kvm_pte_t pte)
>   	return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
>   }
>   
> -static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
> +static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
>   					struct stage2_map_data *data)
>   {
> -	if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
> +	if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
>   		return false;
>   
> -	return kvm_block_mapping_supported(addr, end, data->phys, level);
> +	return kvm_block_mapping_supported(ctx, data->phys);
>   }
>   
> -static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> -				      kvm_pte_t *ptep,
> +static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   				      struct stage2_map_data *data)
>   {
> -	kvm_pte_t new, old = *ptep;
> -	u64 granule = kvm_granule_size(level), phys = data->phys;
> +	kvm_pte_t new, old = *ctx->ptep;
> +	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>   	struct kvm_pgtable *pgt = data->mmu->pgt;
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>   
> -	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> +	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return -E2BIG;
>   
>   	if (kvm_phys_is_valid(phys))
> -		new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> +		new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
>   	else
>   		new = kvm_init_invalid_leaf_owner(data->owner_id);
>   
> @@ -744,7 +745,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>   		if (!stage2_pte_needs_update(old, new))
>   			return -EAGAIN;
>   
> -		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> +		stage2_put_pte(ctx, data->mmu, mm_ops);
>   	}
>   
>   	/* Perform CMOs before installation of the guest stage-2 PTE */
> @@ -755,26 +756,25 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>   	if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
>   		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
>   
> -	smp_store_release(ptep, new);
> +	smp_store_release(ctx->ptep, new);
>   	if (stage2_pte_is_counted(new))
> -		mm_ops->get_page(ptep);
> +		mm_ops->get_page(ctx->ptep);
>   	if (kvm_phys_is_valid(phys))
>   		data->phys += granule;
>   	return 0;
>   }
>   
> -static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> -				     kvm_pte_t *ptep,
> +static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>   				     struct stage2_map_data *data)
>   {
>   	if (data->anchor)
>   		return 0;
>   
> -	if (!stage2_leaf_mapping_allowed(addr, end, level, data))
> +	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return 0;
>   
> -	data->childp = kvm_pte_follow(*ptep, data->mm_ops);
> -	kvm_clear_pte(ptep);
> +	data->childp = kvm_pte_follow(*ctx->ptep, data->mm_ops);
> +	kvm_clear_pte(ctx->ptep);
>   
>   	/*
>   	 * Invalidate the whole stage-2, as we may have numerous leaf
> @@ -782,29 +782,29 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>   	 * individually.
>   	 */
>   	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
> -	data->anchor = ptep;
> +	data->anchor = ctx->ptep;
>   	return 0;
>   }
>   
> -static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   				struct stage2_map_data *data)
>   {
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> -	kvm_pte_t *childp, pte = *ptep;
> +	kvm_pte_t *childp, pte = *ctx->ptep;
>   	int ret;
>   
>   	if (data->anchor) {
>   		if (stage2_pte_is_counted(pte))
> -			mm_ops->put_page(ptep);
> +			mm_ops->put_page(ctx->ptep);
>   
>   		return 0;
>   	}
>   
> -	ret = stage2_map_walker_try_leaf(addr, end, level, ptep, data);
> +	ret = stage2_map_walker_try_leaf(ctx, data);
>   	if (ret != -E2BIG)
>   		return ret;
>   
> -	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>   		return -EINVAL;
>   
>   	if (!data->memcache)
> @@ -820,16 +820,15 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>   	 * will be mapped lazily.
>   	 */
>   	if (stage2_pte_is_counted(pte))
> -		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
> +		stage2_put_pte(ctx, data->mmu, mm_ops);
>   
> -	kvm_set_table_pte(ptep, childp, mm_ops);
> -	mm_ops->get_page(ptep);
> +	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> +	mm_ops->get_page(ctx->ptep);
>   
>   	return 0;
>   }
>   
> -static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> -				      kvm_pte_t *ptep,
> +static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>   				      struct stage2_map_data *data)
>   {
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> @@ -839,17 +838,17 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>   	if (!data->anchor)
>   		return 0;
>   
> -	if (data->anchor == ptep) {
> +	if (data->anchor == ctx->ptep) {
>   		childp = data->childp;
>   		data->anchor = NULL;
>   		data->childp = NULL;
> -		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> +		ret = stage2_map_walk_leaf(ctx, data);
>   	} else {
> -		childp = kvm_pte_follow(*ptep, mm_ops);
> +		childp = kvm_pte_follow(*ctx->ptep, mm_ops);
>   	}
>   
>   	mm_ops->put_page(childp);
> -	mm_ops->put_page(ptep);
> +	mm_ops->put_page(ctx->ptep);
>   
>   	return ret;
>   }
> @@ -873,18 +872,18 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>    * the page-table, installing the block entry when it revisits the anchor
>    * pointer and clearing the anchor to NULL.
>    */
> -static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			     enum kvm_pgtable_walk_flags flag, void * const arg)
> +static int stage2_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			     enum kvm_pgtable_walk_flags visit)
>   {
> -	struct stage2_map_data *data = arg;
> +	struct stage2_map_data *data = ctx->arg;
>   
> -	switch (flag) {
> +	switch (visit) {
>   	case KVM_PGTABLE_WALK_TABLE_PRE:
> -		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
> +		return stage2_map_walk_table_pre(ctx, data);
>   	case KVM_PGTABLE_WALK_LEAF:
> -		return stage2_map_walk_leaf(addr, end, level, ptep, data);
> +		return stage2_map_walk_leaf(ctx, data);
>   	case KVM_PGTABLE_WALK_TABLE_POST:
> -		return stage2_map_walk_table_post(addr, end, level, ptep, data);
> +		return stage2_map_walk_table_post(ctx, data);
>   	}
>   
>   	return -EINVAL;
> @@ -949,25 +948,24 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   	return ret;
>   }
>   
> -static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			       enum kvm_pgtable_walk_flags flag,
> -			       void * const arg)
> +static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			       enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable *pgt = arg;
> +	struct kvm_pgtable *pgt = ctx->arg;
>   	struct kvm_s2_mmu *mmu = pgt->mmu;
>   	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -	kvm_pte_t pte = *ptep, *childp = NULL;
> +	kvm_pte_t pte = *ctx->ptep, *childp = NULL;
>   	bool need_flush = false;
>   
>   	if (!kvm_pte_valid(pte)) {
>   		if (stage2_pte_is_counted(pte)) {
> -			kvm_clear_pte(ptep);
> -			mm_ops->put_page(ptep);
> +			kvm_clear_pte(ctx->ptep);
> +			mm_ops->put_page(ctx->ptep);
>   		}
>   		return 0;
>   	}
>   
> -	if (kvm_pte_table(pte, level)) {
> +	if (kvm_pte_table(pte, ctx->level)) {
>   		childp = kvm_pte_follow(pte, mm_ops);
>   
>   		if (mm_ops->page_count(childp) != 1)
> @@ -981,11 +979,11 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>   	 * block entry and rely on the remaining portions being faulted
>   	 * back lazily.
>   	 */
> -	stage2_put_pte(ptep, mmu, addr, level, mm_ops);
> +	stage2_put_pte(ctx, mmu, mm_ops);
>   
>   	if (need_flush && mm_ops->dcache_clean_inval_poc)
>   		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> -					       kvm_granule_size(level));
> +					       kvm_granule_size(ctx->level));
>   
>   	if (childp)
>   		mm_ops->put_page(childp);
> @@ -1012,18 +1010,17 @@ struct stage2_attr_data {
>   	struct kvm_pgtable_mm_ops	*mm_ops;
>   };
>   
> -static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			      enum kvm_pgtable_walk_flags flag,
> -			      void * const arg)
> +static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			      enum kvm_pgtable_walk_flags visit)
>   {
> -	kvm_pte_t pte = *ptep;
> -	struct stage2_attr_data *data = arg;
> +	kvm_pte_t pte = *ctx->ptep;
> +	struct stage2_attr_data *data = ctx->arg;
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>   
>   	if (!kvm_pte_valid(pte))
>   		return 0;
>   
> -	data->level = level;
> +	data->level = ctx->level;
>   	data->pte = pte;
>   	pte &= ~data->attr_clr;
>   	pte |= data->attr_set;
> @@ -1039,10 +1036,10 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>   		 * stage-2 PTE if we are going to add executable permission.
>   		 */
>   		if (mm_ops->icache_inval_pou &&
> -		    stage2_pte_executable(pte) && !stage2_pte_executable(*ptep))
> +		    stage2_pte_executable(pte) && !stage2_pte_executable(*ctx->ptep))
>   			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
> -						  kvm_granule_size(level));
> -		WRITE_ONCE(*ptep, pte);
> +						  kvm_granule_size(ctx->level));
> +		WRITE_ONCE(*ctx->ptep, pte);
>   	}
>   
>   	return 0;
> @@ -1140,20 +1137,19 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
>   	return ret;
>   }
>   
> -static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			       enum kvm_pgtable_walk_flags flag,
> -			       void * const arg)
> +static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			       enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable *pgt = arg;
> +	struct kvm_pgtable *pgt = ctx->arg;
>   	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -	kvm_pte_t pte = *ptep;
> +	kvm_pte_t pte = *ctx->ptep;
>   
>   	if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
>   		return 0;
>   
>   	if (mm_ops->dcache_clean_inval_poc)
>   		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> -					       kvm_granule_size(level));
> +					       kvm_granule_size(ctx->level));
>   	return 0;
>   }
>   
> @@ -1200,19 +1196,18 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>   	return 0;
>   }
>   
> -static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> -			      enum kvm_pgtable_walk_flags flag,
> -			      void * const arg)
> +static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
> +			      enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = arg;
> -	kvm_pte_t pte = *ptep;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	kvm_pte_t pte = *ctx->ptep;
>   
>   	if (!stage2_pte_is_counted(pte))
>   		return 0;
>   
> -	mm_ops->put_page(ptep);
> +	mm_ops->put_page(ctx->ptep);
>   
> -	if (kvm_pte_table(pte, level))
> +	if (kvm_pte_table(pte, ctx->level))
>   		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
>   
>   	return 0;
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 01/14] KVM: arm64: Combine visitor arguments into a context structure
  2022-11-10  0:23     ` Gavin Shan
@ 2022-11-10  0:42       ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-10  0:42 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Ben Gardon, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

Hi Gavin,

On Thu, Nov 10, 2022 at 08:23:36AM +0800, Gavin Shan wrote:
> On 11/8/22 5:56 AM, Oliver Upton wrote:
> > Passing new arguments by value to the visitor callbacks is extremely
> > inflexible for stuffing new parameters used by only some of the
> > visitors. Use a context structure instead and pass the pointer through
> > to the visitor callback.
> > 
> > While at it, redefine the 'flags' parameter to the visitor to contain
> > the bit indicating the phase of the walk. Pass the entire set of flags
> > through the context structure such that the walker can communicate
> > additional state to the visitor callback.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >   arch/arm64/include/asm/kvm_pgtable.h  |  15 +-
> >   arch/arm64/kvm/hyp/nvhe/mem_protect.c |  10 +-
> >   arch/arm64/kvm/hyp/nvhe/setup.c       |  16 +-
> >   arch/arm64/kvm/hyp/pgtable.c          | 269 +++++++++++++-------------
> >   4 files changed, 154 insertions(+), 156 deletions(-)
> > 
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> 
> One nit below.
> 
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index 3252eb50ecfe..607f9bb8aab4 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -199,10 +199,17 @@ enum kvm_pgtable_walk_flags {
> >   	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
> >   };
> > -typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
> > -					kvm_pte_t *ptep,
> > -					enum kvm_pgtable_walk_flags flag,
> > -					void * const arg);
> > +struct kvm_pgtable_visit_ctx {
> > +	kvm_pte_t				*ptep;
> > +	void					*arg;
> > +	u64					addr;
> > +	u64					end;
> > +	u32					level;
> > +	enum kvm_pgtable_walk_flags		flags;
> > +};
> > +
> > +typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
> > +					enum kvm_pgtable_walk_flags visit);
> 
> Does it make sense to reorder these fields in the context struct based on
> their properties.

The ordering was a deliberate optimization for space. Your suggestion
has 8 bytes of implicit padding:

>     struct kvm_pgtable_visit_ctx {
>            enum kvm_pgtable_walk_flags     flags;

here

>            u64                             addr;
>            u64                             end;
>            u32                             level;

and here.

>            kvm_pte_t                       *ptep;
>            void                            *arg;
>     };
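
To make the size difference concrete, here is a minimal standalone sketch
(illustrative only; the typedef and enum below are stand-ins for the kernel
definitions) that checks both layouts with static asserts, assuming an LP64
ABI where the enum occupies 4 bytes:

    /* Compile check only, e.g.: gcc -std=c11 -c ctx_layout.c */
    #include <stdint.h>
    #include <assert.h>

    typedef uint64_t kvm_pte_t;
    enum kvm_pgtable_walk_flags { KVM_PGTABLE_WALK_LEAF = 1 };

    /* Ordering from the patch: pointers and u64s first, 4-byte members last. */
    struct ctx_patch {
            kvm_pte_t                       *ptep;
            void                            *arg;
            uint64_t                        addr;
            uint64_t                        end;
            uint32_t                        level;
            enum kvm_pgtable_walk_flags     flags;
    };

    /*
     * Reordered as quoted above: 4 bytes of padding after 'flags' (to align
     * 'addr') and another 4 bytes after 'level' (to align 'ptep').
     */
    struct ctx_reordered {
            enum kvm_pgtable_walk_flags     flags;
            uint64_t                        addr;
            uint64_t                        end;
            uint32_t                        level;
            kvm_pte_t                       *ptep;
            void                            *arg;
    };

    static_assert(sizeof(struct ctx_patch) == 40, "no implicit padding");
    static_assert(sizeof(struct ctx_reordered) == 48, "8 bytes of implicit padding");

Running pahole(1) on an object built with -g reports the same two 4-byte
holes in the reordered variant.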

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 01/14] KVM: arm64: Combine visitor arguments into a context structure
  2022-11-10  0:42       ` Oliver Upton
@ 2022-11-10  3:40         ` Gavin Shan
  -1 siblings, 0 replies; 156+ messages in thread
From: Gavin Shan @ 2022-11-10  3:40 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, Marc Zyngier, Will Deacon, kvmarm, Ben Gardon,
	David Matlack, kvmarm, linux-arm-kernel

Hi Oliver,

On 11/10/22 8:42 AM, Oliver Upton wrote:
> On Thu, Nov 10, 2022 at 08:23:36AM +0800, Gavin Shan wrote:
>> On 11/8/22 5:56 AM, Oliver Upton wrote:
>>> Passing new arguments by value to the visitor callbacks is extremely
>>> inflexible for stuffing new parameters used by only some of the
>>> visitors. Use a context structure instead and pass the pointer through
>>> to the visitor callback.
>>>
>>> While at it, redefine the 'flags' parameter to the visitor to contain
>>> the bit indicating the phase of the walk. Pass the entire set of flags
>>> through the context structure such that the walker can communicate
>>> additional state to the visitor callback.
>>>
>>> No functional change intended.
>>>
>>> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
>>> ---
>>>    arch/arm64/include/asm/kvm_pgtable.h  |  15 +-
>>>    arch/arm64/kvm/hyp/nvhe/mem_protect.c |  10 +-
>>>    arch/arm64/kvm/hyp/nvhe/setup.c       |  16 +-
>>>    arch/arm64/kvm/hyp/pgtable.c          | 269 +++++++++++++-------------
>>>    4 files changed, 154 insertions(+), 156 deletions(-)
>>>
>>
>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>
>> One nit below.
>>
>>> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
>>> index 3252eb50ecfe..607f9bb8aab4 100644
>>> --- a/arch/arm64/include/asm/kvm_pgtable.h
>>> +++ b/arch/arm64/include/asm/kvm_pgtable.h
>>> @@ -199,10 +199,17 @@ enum kvm_pgtable_walk_flags {
>>>    	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
>>>    };
>>> -typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
>>> -					kvm_pte_t *ptep,
>>> -					enum kvm_pgtable_walk_flags flag,
>>> -					void * const arg);
>>> +struct kvm_pgtable_visit_ctx {
>>> +	kvm_pte_t				*ptep;
>>> +	void					*arg;
>>> +	u64					addr;
>>> +	u64					end;
>>> +	u32					level;
>>> +	enum kvm_pgtable_walk_flags		flags;
>>> +};
>>> +
>>> +typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
>>> +					enum kvm_pgtable_walk_flags visit);
>>
>> Does it make sense to reorder these fields in the context struct based on
>> their properties.
> 
> The ordering was a deliberate optimization for space. Your suggestion
> has 8 bytes of implicit padding:
> 

Right, so how about rearranging these fields as below? It makes more
sense to have @arg come after addr/end/ptep.

    struct kvm_pgtable_visit_ctx {
           enum kvm_pgtable_walk_flags     flags;
           u32                             level;
           u64                             addr;
           u64                             end;
           kvm_pte_t                       *ptep;
           void                            *arg;
    };
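
For what it's worth, a quick standalone sketch (stand-in types again;
assuming the enum occupies 4 bytes, as on the usual LP64 ABIs) suggests
this arrangement also packs without holes, since 'flags' and 'level' pair
up into a single 8-byte slot:

    #include <stdint.h>
    #include <assert.h>

    typedef uint64_t kvm_pte_t;
    enum kvm_pgtable_walk_flags { KVM_PGTABLE_WALK_LEAF = 1 };

    struct ctx_rearranged {
            enum kvm_pgtable_walk_flags     flags;  /* 4 bytes          */
            uint32_t                        level;  /* 4 bytes, no hole */
            uint64_t                        addr;
            uint64_t                        end;
            kvm_pte_t                       *ptep;
            void                            *arg;
    };

    static_assert(sizeof(struct ctx_rearranged) == 40, "no implicit padding");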


[...]

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 02/14] KVM: arm64: Stash observed pte value in visitor context
  2022-11-07 21:56   ` Oliver Upton
  (?)
@ 2022-11-10  4:55     ` Gavin Shan
  -1 siblings, 0 replies; 156+ messages in thread
From: Gavin Shan @ 2022-11-10  4:55 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Alexandru Elisei
  Cc: kvm, kvmarm, Ben Gardon, David Matlack, Will Deacon, kvmarm,
	linux-arm-kernel

Hi Oliver,

On 11/8/22 5:56 AM, Oliver Upton wrote:
> Rather than reading the ptep all over the shop, read the ptep once from
> __kvm_pgtable_visit() and stick it in the visitor context. Reread the
> ptep after visiting a leaf in case the callback installed a new table
> underneath.
> 
> No functional change intended.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>   arch/arm64/include/asm/kvm_pgtable.h  |  1 +
>   arch/arm64/kvm/hyp/nvhe/mem_protect.c |  5 +-
>   arch/arm64/kvm/hyp/nvhe/setup.c       |  7 +--
>   arch/arm64/kvm/hyp/pgtable.c          | 86 +++++++++++++--------------
>   4 files changed, 48 insertions(+), 51 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

A nit as below.

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 607f9bb8aab4..14d4b68a1e92 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -201,6 +201,7 @@ enum kvm_pgtable_walk_flags {
>   
>   struct kvm_pgtable_visit_ctx {
>   	kvm_pte_t				*ptep;
> +	kvm_pte_t				old;
>   	void					*arg;
>   	u64					addr;
>   	u64					end;
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 8f5b6a36a039..d21d1b08a055 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -421,12 +421,11 @@ static int __check_page_state_visitor(const struct kvm_pgtable_visit_ctx *ctx,
>   				      enum kvm_pgtable_walk_flags visit)
>   {
>   	struct check_walk_data *d = ctx->arg;
> -	kvm_pte_t pte = *ctx->ptep;
>   
> -	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
> +	if (kvm_pte_valid(ctx->old) && !addr_is_memory(kvm_pte_to_phys(ctx->old)))
>   		return -EINVAL;
>   
> -	return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
> +	return d->get_page_state(ctx->old) == d->desired ? 0 : -EPERM;
>   }
>   
>   static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index a293cf5eba1b..6af443c9d78e 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -192,10 +192,9 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
>   	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
>   	enum kvm_pgtable_prot prot;
>   	enum pkvm_page_state state;
> -	kvm_pte_t pte = *ctx->ptep;
>   	phys_addr_t phys;
>   
> -	if (!kvm_pte_valid(pte))
> +	if (!kvm_pte_valid(ctx->old))
>   		return 0;
>   
>   	/*
> @@ -210,7 +209,7 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
>   	if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
>   		return -EINVAL;
>   
> -	phys = kvm_pte_to_phys(pte);
> +	phys = kvm_pte_to_phys(ctx->old);
>   	if (!addr_is_memory(phys))
>   		return -EINVAL;
>   
> @@ -218,7 +217,7 @@ static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx
>   	 * Adjust the host stage-2 mappings to match the ownership attributes
>   	 * configured in the hypervisor stage-1.
>   	 */
> -	state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
> +	state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(ctx->old));
>   	switch (state) {
>   	case PKVM_PAGE_OWNED:
>   		return host_stage2_set_owner_locked(phys, PAGE_SIZE, pkvm_hyp_id);
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 900c8b9c0cfc..fb3696b3a997 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -189,6 +189,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   	enum kvm_pgtable_walk_flags flags = data->walker->flags;
>   	struct kvm_pgtable_visit_ctx ctx = {
>   		.ptep	= ptep,
> +		.old	= READ_ONCE(*ptep),
>   		.arg	= data->walker->arg,
>   		.addr	= data->addr,
>   		.end	= data->end,
> @@ -196,16 +197,16 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   		.flags	= flags,
>   	};
>   	int ret = 0;
> -	kvm_pte_t *childp, pte = *ptep;
> -	bool table = kvm_pte_table(pte, level);
> +	kvm_pte_t *childp;
> +	bool table = kvm_pte_table(ctx.old, level);
>   
>   	if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE))
>   		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE);
>   
>   	if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
>   		ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
> -		pte = *ptep;
> -		table = kvm_pte_table(pte, level);
> +		ctx.old = READ_ONCE(*ptep);
> +		table = kvm_pte_table(ctx.old, level);
>   	}
>   

Since we're here, it may be worthwhile to add a comment explaining why we need
to read the PTE again, namely page table splitting. For example, write-protection
for dirty logging of a page that has been mapped by a PMD entry. A possible
wording is sketched below.
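
Purely as an illustration of the suggestion (not part of the patch), the
re-read in the hunk above could carry a comment along these lines:

    if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) {
            ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF);
            /*
             * Re-read the PTE: the leaf visitor may have replaced a leaf
             * with a table, e.g. when a block mapping is split into page
             * mappings to write-protect memory for dirty logging.
             */
            ctx.old = READ_ONCE(*ptep);
            table = kvm_pte_table(ctx.old, level);
    }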

>   	if (ret)
> @@ -217,7 +218,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   		goto out;
>   	}
>   
> -	childp = kvm_pte_follow(pte, data->pgt->mm_ops);
> +	childp = kvm_pte_follow(ctx.old, data->pgt->mm_ops);
>   	ret = __kvm_pgtable_walk(data, childp, level + 1);
>   	if (ret)
>   		goto out;
> @@ -299,7 +300,7 @@ static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	struct leaf_walk_data *data = ctx->arg;
>   
> -	data->pte   = *ctx->ptep;
> +	data->pte   = ctx->old;
>   	data->level = ctx->level;
>   
>   	return 0;
> @@ -388,7 +389,7 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
>   static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   				    struct hyp_map_data *data)
>   {
> -	kvm_pte_t new, old = *ctx->ptep;
> +	kvm_pte_t new;
>   	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>   
>   	if (!kvm_block_mapping_supported(ctx, phys))
> @@ -396,11 +397,11 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   
>   	data->phys += granule;
>   	new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
> -	if (old == new)
> +	if (ctx->old == new)
>   		return true;
> -	if (!kvm_pte_valid(old))
> +	if (!kvm_pte_valid(ctx->old))
>   		data->mm_ops->get_page(ctx->ptep);
> -	else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
> +	else if (WARN_ON((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>   		return false;
>   
>   	smp_store_release(ctx->ptep, new);
> @@ -461,16 +462,16 @@ struct hyp_unmap_data {
>   static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			    enum kvm_pgtable_walk_flags visit)
>   {
> -	kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +	kvm_pte_t *childp = NULL;
>   	u64 granule = kvm_granule_size(ctx->level);
>   	struct hyp_unmap_data *data = ctx->arg;
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>   
> -	if (!kvm_pte_valid(pte))
> +	if (!kvm_pte_valid(ctx->old))
>   		return -EINVAL;
>   
> -	if (kvm_pte_table(pte, ctx->level)) {
> -		childp = kvm_pte_follow(pte, mm_ops);
> +	if (kvm_pte_table(ctx->old, ctx->level)) {
> +		childp = kvm_pte_follow(ctx->old, mm_ops);
>   
>   		if (mm_ops->page_count(childp) != 1)
>   			return 0;
> @@ -538,15 +539,14 @@ static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			   enum kvm_pgtable_walk_flags visit)
>   {
>   	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> -	kvm_pte_t pte = *ctx->ptep;
>   
> -	if (!kvm_pte_valid(pte))
> +	if (!kvm_pte_valid(ctx->old))
>   		return 0;
>   
>   	mm_ops->put_page(ctx->ptep);
>   
> -	if (kvm_pte_table(pte, ctx->level))
> -		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
> +	if (kvm_pte_table(ctx->old, ctx->level))
> +		mm_ops->put_page(kvm_pte_follow(ctx->old, mm_ops));
>   
>   	return 0;
>   }
> @@ -691,7 +691,7 @@ static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s
>   	 * Clear the existing PTE, and perform break-before-make with
>   	 * TLB maintenance if it was valid.
>   	 */
> -	if (kvm_pte_valid(*ctx->ptep)) {
> +	if (kvm_pte_valid(ctx->old)) {
>   		kvm_clear_pte(ctx->ptep);
>   		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
>   	}
> @@ -722,7 +722,7 @@ static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
>   static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   				      struct stage2_map_data *data)
>   {
> -	kvm_pte_t new, old = *ctx->ptep;
> +	kvm_pte_t new;
>   	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>   	struct kvm_pgtable *pgt = data->mmu->pgt;
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> @@ -735,14 +735,14 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   	else
>   		new = kvm_init_invalid_leaf_owner(data->owner_id);
>   
> -	if (stage2_pte_is_counted(old)) {
> +	if (stage2_pte_is_counted(ctx->old)) {
>   		/*
>   		 * Skip updating the PTE if we are trying to recreate the exact
>   		 * same mapping or only change the access permissions. Instead,
>   		 * the vCPU will exit one more time from guest if still needed
>   		 * and then go through the path of relaxing permissions.
>   		 */
> -		if (!stage2_pte_needs_update(old, new))
> +		if (!stage2_pte_needs_update(ctx->old, new))
>   			return -EAGAIN;
>   
>   		stage2_put_pte(ctx, data->mmu, mm_ops);
> @@ -773,7 +773,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>   	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return 0;
>   
> -	data->childp = kvm_pte_follow(*ctx->ptep, data->mm_ops);
> +	data->childp = kvm_pte_follow(ctx->old, data->mm_ops);
>   	kvm_clear_pte(ctx->ptep);
>   
>   	/*
> @@ -790,11 +790,11 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   				struct stage2_map_data *data)
>   {
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> -	kvm_pte_t *childp, pte = *ctx->ptep;
> +	kvm_pte_t *childp;
>   	int ret;
>   
>   	if (data->anchor) {
> -		if (stage2_pte_is_counted(pte))
> +		if (stage2_pte_is_counted(ctx->old))
>   			mm_ops->put_page(ctx->ptep);
>   
>   		return 0;
> @@ -819,7 +819,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   	 * a table. Accesses beyond 'end' that fall within the new table
>   	 * will be mapped lazily.
>   	 */
> -	if (stage2_pte_is_counted(pte))
> +	if (stage2_pte_is_counted(ctx->old))
>   		stage2_put_pte(ctx, data->mmu, mm_ops);
>   
>   	kvm_set_table_pte(ctx->ptep, childp, mm_ops);
> @@ -844,7 +844,7 @@ static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>   		data->childp = NULL;
>   		ret = stage2_map_walk_leaf(ctx, data);
>   	} else {
> -		childp = kvm_pte_follow(*ctx->ptep, mm_ops);
> +		childp = kvm_pte_follow(ctx->old, mm_ops);
>   	}
>   
>   	mm_ops->put_page(childp);
> @@ -954,23 +954,23 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   	struct kvm_pgtable *pgt = ctx->arg;
>   	struct kvm_s2_mmu *mmu = pgt->mmu;
>   	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -	kvm_pte_t pte = *ctx->ptep, *childp = NULL;
> +	kvm_pte_t *childp = NULL;
>   	bool need_flush = false;
>   
> -	if (!kvm_pte_valid(pte)) {
> -		if (stage2_pte_is_counted(pte)) {
> +	if (!kvm_pte_valid(ctx->old)) {
> +		if (stage2_pte_is_counted(ctx->old)) {
>   			kvm_clear_pte(ctx->ptep);
>   			mm_ops->put_page(ctx->ptep);
>   		}
>   		return 0;
>   	}
>   
> -	if (kvm_pte_table(pte, ctx->level)) {
> -		childp = kvm_pte_follow(pte, mm_ops);
> +	if (kvm_pte_table(ctx->old, ctx->level)) {
> +		childp = kvm_pte_follow(ctx->old, mm_ops);
>   
>   		if (mm_ops->page_count(childp) != 1)
>   			return 0;
> -	} else if (stage2_pte_cacheable(pgt, pte)) {
> +	} else if (stage2_pte_cacheable(pgt, ctx->old)) {
>   		need_flush = !stage2_has_fwb(pgt);
>   	}
>   
> @@ -982,7 +982,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   	stage2_put_pte(ctx, mmu, mm_ops);
>   
>   	if (need_flush && mm_ops->dcache_clean_inval_poc)
> -		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> +		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
>   					       kvm_granule_size(ctx->level));
>   
>   	if (childp)
> @@ -1013,11 +1013,11 @@ struct stage2_attr_data {
>   static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			      enum kvm_pgtable_walk_flags visit)
>   {
> -	kvm_pte_t pte = *ctx->ptep;
> +	kvm_pte_t pte = ctx->old;
>   	struct stage2_attr_data *data = ctx->arg;
>   	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>   
> -	if (!kvm_pte_valid(pte))
> +	if (!kvm_pte_valid(ctx->old))
>   		return 0;
>   
>   	data->level = ctx->level;
> @@ -1036,7 +1036,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   		 * stage-2 PTE if we are going to add executable permission.
>   		 */
>   		if (mm_ops->icache_inval_pou &&
> -		    stage2_pte_executable(pte) && !stage2_pte_executable(*ctx->ptep))
> +		    stage2_pte_executable(pte) && !stage2_pte_executable(ctx->old))
>   			mm_ops->icache_inval_pou(kvm_pte_follow(pte, mm_ops),
>   						  kvm_granule_size(ctx->level));
>   		WRITE_ONCE(*ctx->ptep, pte);
> @@ -1142,13 +1142,12 @@ static int stage2_flush_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	struct kvm_pgtable *pgt = ctx->arg;
>   	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> -	kvm_pte_t pte = *ctx->ptep;
>   
> -	if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
> +	if (!kvm_pte_valid(ctx->old) || !stage2_pte_cacheable(pgt, ctx->old))
>   		return 0;
>   
>   	if (mm_ops->dcache_clean_inval_poc)
> -		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(pte, mm_ops),
> +		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
>   					       kvm_granule_size(ctx->level));
>   	return 0;
>   }
> @@ -1200,15 +1199,14 @@ static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			      enum kvm_pgtable_walk_flags visit)
>   {
>   	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> -	kvm_pte_t pte = *ctx->ptep;
>   
> -	if (!stage2_pte_is_counted(pte))
> +	if (!stage2_pte_is_counted(ctx->old))
>   		return 0;
>   
>   	mm_ops->put_page(ctx->ptep);
>   
> -	if (kvm_pte_table(pte, ctx->level))
> -		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
> +	if (kvm_pte_table(ctx->old, ctx->level))
> +		mm_ops->put_page(kvm_pte_follow(ctx->old, mm_ops));
>   
>   	return 0;
>   }
> 

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 03/14] KVM: arm64: Pass mm_ops through the visitor context
  2022-11-07 21:56   ` Oliver Upton
  (?)
@ 2022-11-10  5:22     ` Gavin Shan
  -1 siblings, 0 replies; 156+ messages in thread
From: Gavin Shan @ 2022-11-10  5:22 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

Hi Oliver,

On 11/8/22 5:56 AM, Oliver Upton wrote:
> As a prerequisite for getting visitors off of struct kvm_pgtable, pass
> mm_ops through the visitor context.
> 
> No functional change intended.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>   arch/arm64/include/asm/kvm_pgtable.h |  1 +
>   arch/arm64/kvm/hyp/nvhe/setup.c      |  3 +-
>   arch/arm64/kvm/hyp/pgtable.c         | 63 +++++++++++-----------------
>   3 files changed, 26 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 14d4b68a1e92..a752793482cb 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -203,6 +203,7 @@ struct kvm_pgtable_visit_ctx {
>   	kvm_pte_t				*ptep;
>   	kvm_pte_t				old;
>   	void					*arg;
> +	struct kvm_pgtable_mm_ops		*mm_ops;
>   	u64					addr;
>   	u64					end;
>   	u32					level;
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 6af443c9d78e..1068338d77f3 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -189,7 +189,7 @@ static void hpool_put_page(void *addr)
>   static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   					 enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	enum kvm_pgtable_prot prot;
>   	enum pkvm_page_state state;
>   	phys_addr_t phys;
> @@ -239,7 +239,6 @@ static int finalize_host_mappings(void)
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= finalize_host_mappings_walker,
>   		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> -		.arg	= pkvm_pgtable.mm_ops,
>   	};
>   	int i, ret;
>   
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index fb3696b3a997..db25e81a9890 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -181,9 +181,10 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>   }
>   
>   static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -			      kvm_pte_t *pgtable, u32 level);
> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level);
>   
>   static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> +				      struct kvm_pgtable_mm_ops *mm_ops,
>   				      kvm_pte_t *ptep, u32 level)
>   {
>   	enum kvm_pgtable_walk_flags flags = data->walker->flags;
> @@ -191,6 +192,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   		.ptep	= ptep,
>   		.old	= READ_ONCE(*ptep),
>   		.arg	= data->walker->arg,
> +		.mm_ops	= mm_ops,
>   		.addr	= data->addr,
>   		.end	= data->end,
>   		.level	= level,

I don't understand why we need the @mm_ops argument for this function:

   - @mm_ops is always fetched from the associated page table struct
     (data->pgt->mm_ops) in the upper layer (_kvm_pgtable_walk()). However,
     the argument is only used in the lower layer (__kvm_pgtable_visit()),
     meaning the argument isn't needed by the upper layers.

   - @mm_ops isn't needed by __kvm_pgtable_walk() itself, so the argument
     isn't needed by that function either. A sketch of the alternative is
     included after this list.
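
A minimal sketch of the alternative being described, i.e. keeping the
signature of __kvm_pgtable_visit() unchanged and deriving mm_ops from the
page table struct that the walker data already points at. This is only
meant to illustrate the question; it is not what the patch does, given the
stated goal of getting visitors off of struct kvm_pgtable:

    static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
                                          kvm_pte_t *ptep, u32 level)
    {
            struct kvm_pgtable_mm_ops *mm_ops = data->pgt->mm_ops;
            struct kvm_pgtable_visit_ctx ctx = {
                    .ptep   = ptep,
                    .old    = READ_ONCE(*ptep),
                    .arg    = data->walker->arg,
                    .mm_ops = mm_ops,
                    .addr   = data->addr,
                    .end    = data->end,
                    .level  = level,
                    .flags  = data->walker->flags,
            };
            /* ... the rest of __kvm_pgtable_visit() is unchanged ... */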


> @@ -218,8 +220,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   		goto out;
>   	}
>   
> -	childp = kvm_pte_follow(ctx.old, data->pgt->mm_ops);
> -	ret = __kvm_pgtable_walk(data, childp, level + 1);
> +	childp = kvm_pte_follow(ctx.old, mm_ops);
> +	ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
>   	if (ret)
>   		goto out;
>   
> @@ -231,7 +233,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   }
>   
>   static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -			      kvm_pte_t *pgtable, u32 level)
> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level)
>   {
>   	u32 idx;
>   	int ret = 0;
> @@ -245,7 +247,7 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>   		if (data->addr >= data->end)
>   			break;
>   
> -		ret = __kvm_pgtable_visit(data, ptep, level);
> +		ret = __kvm_pgtable_visit(data, mm_ops, ptep, level);
>   		if (ret)
>   			break;
>   	}
> @@ -269,7 +271,7 @@ static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
>   	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
>   		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
>   
> -		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
> +		ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
>   		if (ret)
>   			break;
>   	}

As I mentioned above, @mm_ops isn't needed by __kvm_pgtable_walk(), and __kvm_pgtable_visit()
can fetch it directly from data->pgt->mm_ops.

> @@ -332,7 +334,6 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
>   struct hyp_map_data {
>   	u64				phys;
>   	kvm_pte_t			attr;
> -	struct kvm_pgtable_mm_ops	*mm_ops;
>   };
>   
>   static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> @@ -400,7 +401,7 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   	if (ctx->old == new)
>   		return true;
>   	if (!kvm_pte_valid(ctx->old))
> -		data->mm_ops->get_page(ctx->ptep);
> +		ctx->mm_ops->get_page(ctx->ptep);
>   	else if (WARN_ON((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>   		return false;
>   
> @@ -413,7 +414,7 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	kvm_pte_t *childp;
>   	struct hyp_map_data *data = ctx->arg;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (hyp_map_walker_try_leaf(ctx, data))
>   		return 0;
> @@ -436,7 +437,6 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>   	int ret;
>   	struct hyp_map_data map_data = {
>   		.phys	= ALIGN_DOWN(phys, PAGE_SIZE),
> -		.mm_ops	= pgt->mm_ops,
>   	};
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= hyp_map_walker,
> @@ -454,18 +454,13 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>   	return ret;
>   }
>   
> -struct hyp_unmap_data {
> -	u64				unmapped;
> -	struct kvm_pgtable_mm_ops	*mm_ops;
> -};
> -
>   static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			    enum kvm_pgtable_walk_flags visit)
>   {
>   	kvm_pte_t *childp = NULL;
>   	u64 granule = kvm_granule_size(ctx->level);
> -	struct hyp_unmap_data *data = ctx->arg;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	u64 *unmapped = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!kvm_pte_valid(ctx->old))
>   		return -EINVAL;
> @@ -486,7 +481,7 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   		kvm_clear_pte(ctx->ptep);
>   		dsb(ishst);
>   		__tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
> -		data->unmapped += granule;
> +		*unmapped += granule;
>   	}
>   
>   	dsb(ish);
> @@ -501,12 +496,10 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   
>   u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>   {
> -	struct hyp_unmap_data unmap_data = {
> -		.mm_ops	= pgt->mm_ops,
> -	};
> +	u64 unmapped = 0;
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= hyp_unmap_walker,
> -		.arg	= &unmap_data,
> +		.arg	= &unmapped,
>   		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>   	};
>   
> @@ -514,7 +507,7 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>   		return 0;
>   
>   	kvm_pgtable_walk(pgt, addr, size, &walker);
> -	return unmap_data.unmapped;
> +	return unmapped;
>   }
>   
>   int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
> @@ -538,7 +531,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>   static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			   enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!kvm_pte_valid(ctx->old))
>   		return 0;
> @@ -556,7 +549,6 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= hyp_free_walker,
>   		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> -		.arg	= pgt->mm_ops,
>   	};
>   
>   	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> @@ -575,8 +567,6 @@ struct stage2_map_data {
>   	struct kvm_s2_mmu		*mmu;
>   	void				*memcache;
>   
> -	struct kvm_pgtable_mm_ops	*mm_ops;
> -
>   	/* Force mappings to page granularity */
>   	bool				force_pte;
>   };
> @@ -725,7 +715,7 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   	kvm_pte_t new;
>   	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>   	struct kvm_pgtable *pgt = data->mmu->pgt;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return -E2BIG;
> @@ -773,7 +763,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>   	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return 0;
>   
> -	data->childp = kvm_pte_follow(ctx->old, data->mm_ops);
> +	data->childp = kvm_pte_follow(ctx->old, ctx->mm_ops);
>   	kvm_clear_pte(ctx->ptep);
>   
>   	/*
> @@ -789,7 +779,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>   static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   				struct stage2_map_data *data)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	kvm_pte_t *childp;
>   	int ret;
>   
> @@ -831,7 +821,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>   				      struct stage2_map_data *data)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	kvm_pte_t *childp;
>   	int ret = 0;
>   
> @@ -898,7 +888,6 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
>   		.mmu		= pgt->mmu,
>   		.memcache	= mc,
> -		.mm_ops		= pgt->mm_ops,
>   		.force_pte	= pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
>   	};
>   	struct kvm_pgtable_walker walker = {
> @@ -929,7 +918,6 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   		.phys		= KVM_PHYS_INVALID,
>   		.mmu		= pgt->mmu,
>   		.memcache	= mc,
> -		.mm_ops		= pgt->mm_ops,
>   		.owner_id	= owner_id,
>   		.force_pte	= true,
>   	};
> @@ -953,7 +941,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	struct kvm_pgtable *pgt = ctx->arg;
>   	struct kvm_s2_mmu *mmu = pgt->mmu;
> -	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	kvm_pte_t *childp = NULL;
>   	bool need_flush = false;
>   
> @@ -1007,7 +995,6 @@ struct stage2_attr_data {
>   	kvm_pte_t			attr_clr;
>   	kvm_pte_t			pte;
>   	u32				level;
> -	struct kvm_pgtable_mm_ops	*mm_ops;
>   };
>   
>   static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
> @@ -1015,7 +1002,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	kvm_pte_t pte = ctx->old;
>   	struct stage2_attr_data *data = ctx->arg;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!kvm_pte_valid(ctx->old))
>   		return 0;
> @@ -1055,7 +1042,6 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>   	struct stage2_attr_data data = {
>   		.attr_set	= attr_set & attr_mask,
>   		.attr_clr	= attr_clr & attr_mask,
> -		.mm_ops		= pgt->mm_ops,
>   	};
>   	struct kvm_pgtable_walker walker = {
>   		.cb		= stage2_attr_walker,
> @@ -1198,7 +1184,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>   static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			      enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!stage2_pte_is_counted(ctx->old))
>   		return 0;
> @@ -1218,7 +1204,6 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>   		.cb	= stage2_free_walker,
>   		.flags	= KVM_PGTABLE_WALK_LEAF |
>   			  KVM_PGTABLE_WALK_TABLE_POST,
> -		.arg	= pgt->mm_ops,
>   	};
>   
>   	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 03/14] KVM: arm64: Pass mm_ops through the visitor context
@ 2022-11-10  5:22     ` Gavin Shan
  0 siblings, 0 replies; 156+ messages in thread
From: Gavin Shan @ 2022-11-10  5:22 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Alexandru Elisei
  Cc: kvm, kvmarm, Ben Gardon, David Matlack, Will Deacon, kvmarm,
	linux-arm-kernel

Hi Oliver,

On 11/8/22 5:56 AM, Oliver Upton wrote:
> As a prerequisite for getting visitors off of struct kvm_pgtable, pass
> mm_ops through the visitor context.
> 
> No functional change intended.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>   arch/arm64/include/asm/kvm_pgtable.h |  1 +
>   arch/arm64/kvm/hyp/nvhe/setup.c      |  3 +-
>   arch/arm64/kvm/hyp/pgtable.c         | 63 +++++++++++-----------------
>   3 files changed, 26 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 14d4b68a1e92..a752793482cb 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -203,6 +203,7 @@ struct kvm_pgtable_visit_ctx {
>   	kvm_pte_t				*ptep;
>   	kvm_pte_t				old;
>   	void					*arg;
> +	struct kvm_pgtable_mm_ops		*mm_ops;
>   	u64					addr;
>   	u64					end;
>   	u32					level;
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 6af443c9d78e..1068338d77f3 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -189,7 +189,7 @@ static void hpool_put_page(void *addr)
>   static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   					 enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	enum kvm_pgtable_prot prot;
>   	enum pkvm_page_state state;
>   	phys_addr_t phys;
> @@ -239,7 +239,6 @@ static int finalize_host_mappings(void)
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= finalize_host_mappings_walker,
>   		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> -		.arg	= pkvm_pgtable.mm_ops,
>   	};
>   	int i, ret;
>   
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index fb3696b3a997..db25e81a9890 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -181,9 +181,10 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>   }
>   
>   static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -			      kvm_pte_t *pgtable, u32 level);
> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level);
>   
>   static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> +				      struct kvm_pgtable_mm_ops *mm_ops,
>   				      kvm_pte_t *ptep, u32 level)
>   {
>   	enum kvm_pgtable_walk_flags flags = data->walker->flags;
> @@ -191,6 +192,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   		.ptep	= ptep,
>   		.old	= READ_ONCE(*ptep),
>   		.arg	= data->walker->arg,
> +		.mm_ops	= mm_ops,
>   		.addr	= data->addr,
>   		.end	= data->end,
>   		.level	= level,

I don't understand why we need the @mm_ops argument for this function:

   - @mm_ops is always fetched from the associated page table struct
     (data->pgt->mm_ops) in the upper layer (_kvm_pgtable_walk()), but it is
     only used in the lower layer (__kvm_pgtable_visit()), so the upper
     layers don't need the argument themselves.

   - @mm_ops isn't needed by __kvm_pgtable_walk() either, so the argument
     isn't needed by that function; see the sketch below.
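
Concretely, the alternative would look something like the following rough,
untested sketch against the code before this patch: the visitor derives the
ops itself, which works for as long as the walk data still carries the
kvm_pgtable pointer.

	static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
					      kvm_pte_t *ptep, u32 level)
	{
		/* Derive the ops locally instead of taking them as a parameter. */
		struct kvm_pgtable_mm_ops *mm_ops = data->pgt->mm_ops;
		struct kvm_pgtable_visit_ctx ctx = {
			.ptep	= ptep,
			.old	= READ_ONCE(*ptep),
			.arg	= data->walker->arg,
			.mm_ops	= mm_ops,
			.addr	= data->addr,
			.end	= data->end,
			.level	= level,
		};

		/* ... the rest of the visit, and __kvm_pgtable_walk(), unchanged ... */
	}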


> @@ -218,8 +220,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   		goto out;
>   	}
>   
> -	childp = kvm_pte_follow(ctx.old, data->pgt->mm_ops);
> -	ret = __kvm_pgtable_walk(data, childp, level + 1);
> +	childp = kvm_pte_follow(ctx.old, mm_ops);
> +	ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
>   	if (ret)
>   		goto out;
>   
> @@ -231,7 +233,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   }
>   
>   static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -			      kvm_pte_t *pgtable, u32 level)
> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level)
>   {
>   	u32 idx;
>   	int ret = 0;
> @@ -245,7 +247,7 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>   		if (data->addr >= data->end)
>   			break;
>   
> -		ret = __kvm_pgtable_visit(data, ptep, level);
> +		ret = __kvm_pgtable_visit(data, mm_ops, ptep, level);
>   		if (ret)
>   			break;
>   	}
> @@ -269,7 +271,7 @@ static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
>   	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
>   		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
>   
> -		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
> +		ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
>   		if (ret)
>   			break;
>   	}

As I mentioned above, @mm_ops isn't needed by __kvm_pgtable_walk(), and
__kvm_pgtable_visit() can fetch it directly from data->pgt->mm_ops.

> @@ -332,7 +334,6 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
>   struct hyp_map_data {
>   	u64				phys;
>   	kvm_pte_t			attr;
> -	struct kvm_pgtable_mm_ops	*mm_ops;
>   };
>   
>   static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> @@ -400,7 +401,7 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   	if (ctx->old == new)
>   		return true;
>   	if (!kvm_pte_valid(ctx->old))
> -		data->mm_ops->get_page(ctx->ptep);
> +		ctx->mm_ops->get_page(ctx->ptep);
>   	else if (WARN_ON((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>   		return false;
>   
> @@ -413,7 +414,7 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	kvm_pte_t *childp;
>   	struct hyp_map_data *data = ctx->arg;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (hyp_map_walker_try_leaf(ctx, data))
>   		return 0;
> @@ -436,7 +437,6 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>   	int ret;
>   	struct hyp_map_data map_data = {
>   		.phys	= ALIGN_DOWN(phys, PAGE_SIZE),
> -		.mm_ops	= pgt->mm_ops,
>   	};
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= hyp_map_walker,
> @@ -454,18 +454,13 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>   	return ret;
>   }
>   
> -struct hyp_unmap_data {
> -	u64				unmapped;
> -	struct kvm_pgtable_mm_ops	*mm_ops;
> -};
> -
>   static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			    enum kvm_pgtable_walk_flags visit)
>   {
>   	kvm_pte_t *childp = NULL;
>   	u64 granule = kvm_granule_size(ctx->level);
> -	struct hyp_unmap_data *data = ctx->arg;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	u64 *unmapped = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!kvm_pte_valid(ctx->old))
>   		return -EINVAL;
> @@ -486,7 +481,7 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   		kvm_clear_pte(ctx->ptep);
>   		dsb(ishst);
>   		__tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
> -		data->unmapped += granule;
> +		*unmapped += granule;
>   	}
>   
>   	dsb(ish);
> @@ -501,12 +496,10 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   
>   u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>   {
> -	struct hyp_unmap_data unmap_data = {
> -		.mm_ops	= pgt->mm_ops,
> -	};
> +	u64 unmapped = 0;
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= hyp_unmap_walker,
> -		.arg	= &unmap_data,
> +		.arg	= &unmapped,
>   		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>   	};
>   
> @@ -514,7 +507,7 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>   		return 0;
>   
>   	kvm_pgtable_walk(pgt, addr, size, &walker);
> -	return unmap_data.unmapped;
> +	return unmapped;
>   }
>   
>   int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
> @@ -538,7 +531,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>   static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			   enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!kvm_pte_valid(ctx->old))
>   		return 0;
> @@ -556,7 +549,6 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= hyp_free_walker,
>   		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> -		.arg	= pgt->mm_ops,
>   	};
>   
>   	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> @@ -575,8 +567,6 @@ struct stage2_map_data {
>   	struct kvm_s2_mmu		*mmu;
>   	void				*memcache;
>   
> -	struct kvm_pgtable_mm_ops	*mm_ops;
> -
>   	/* Force mappings to page granularity */
>   	bool				force_pte;
>   };
> @@ -725,7 +715,7 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   	kvm_pte_t new;
>   	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>   	struct kvm_pgtable *pgt = data->mmu->pgt;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return -E2BIG;
> @@ -773,7 +763,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>   	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return 0;
>   
> -	data->childp = kvm_pte_follow(ctx->old, data->mm_ops);
> +	data->childp = kvm_pte_follow(ctx->old, ctx->mm_ops);
>   	kvm_clear_pte(ctx->ptep);
>   
>   	/*
> @@ -789,7 +779,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>   static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   				struct stage2_map_data *data)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	kvm_pte_t *childp;
>   	int ret;
>   
> @@ -831,7 +821,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>   				      struct stage2_map_data *data)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	kvm_pte_t *childp;
>   	int ret = 0;
>   
> @@ -898,7 +888,6 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
>   		.mmu		= pgt->mmu,
>   		.memcache	= mc,
> -		.mm_ops		= pgt->mm_ops,
>   		.force_pte	= pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
>   	};
>   	struct kvm_pgtable_walker walker = {
> @@ -929,7 +918,6 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   		.phys		= KVM_PHYS_INVALID,
>   		.mmu		= pgt->mmu,
>   		.memcache	= mc,
> -		.mm_ops		= pgt->mm_ops,
>   		.owner_id	= owner_id,
>   		.force_pte	= true,
>   	};
> @@ -953,7 +941,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	struct kvm_pgtable *pgt = ctx->arg;
>   	struct kvm_s2_mmu *mmu = pgt->mmu;
> -	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	kvm_pte_t *childp = NULL;
>   	bool need_flush = false;
>   
> @@ -1007,7 +995,6 @@ struct stage2_attr_data {
>   	kvm_pte_t			attr_clr;
>   	kvm_pte_t			pte;
>   	u32				level;
> -	struct kvm_pgtable_mm_ops	*mm_ops;
>   };
>   
>   static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
> @@ -1015,7 +1002,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	kvm_pte_t pte = ctx->old;
>   	struct stage2_attr_data *data = ctx->arg;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!kvm_pte_valid(ctx->old))
>   		return 0;
> @@ -1055,7 +1042,6 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>   	struct stage2_attr_data data = {
>   		.attr_set	= attr_set & attr_mask,
>   		.attr_clr	= attr_clr & attr_mask,
> -		.mm_ops		= pgt->mm_ops,
>   	};
>   	struct kvm_pgtable_walker walker = {
>   		.cb		= stage2_attr_walker,
> @@ -1198,7 +1184,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>   static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			      enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!stage2_pte_is_counted(ctx->old))
>   		return 0;
> @@ -1218,7 +1204,6 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>   		.cb	= stage2_free_walker,
>   		.flags	= KVM_PGTABLE_WALK_LEAF |
>   			  KVM_PGTABLE_WALK_TABLE_POST,
> -		.arg	= pgt->mm_ops,
>   	};
>   
>   	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> 

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 04/14] KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data
@ 2022-11-10  5:30     ` Gavin Shan
  0 siblings, 0 replies; 156+ messages in thread
From: Gavin Shan @ 2022-11-10  5:30 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Alexandru Elisei
  Cc: kvm, kvmarm, Ben Gardon, David Matlack, Will Deacon, kvmarm,
	linux-arm-kernel

Hi Oliver,

On 11/8/22 5:56 AM, Oliver Upton wrote:
> In order to tear down page tables from outside the context of
> kvm_pgtable (such as an RCU callback), stop passing a pointer through
> kvm_pgtable_walk_data.
> 
> No functional change intended.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>   arch/arm64/kvm/hyp/pgtable.c | 18 +++++-------------
>   1 file changed, 5 insertions(+), 13 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index db25e81a9890..93989b750a26 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -50,7 +50,6 @@
>   #define KVM_MAX_OWNER_ID		1
>   
>   struct kvm_pgtable_walk_data {
> -	struct kvm_pgtable		*pgt;
>   	struct kvm_pgtable_walker	*walker;
>   
>   	u64				addr;

Ok, here is the answer to why data->pgt->mm_ops isn't reachable from the
walker and visitor, and why @mm_ops needs to be passed down.
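
To make the motivation concrete: after this patch, a walk can be started by
a caller that has no struct kvm_pgtable at hand at all, for example a future
RCU callback freeing a detached table. A rough, untested sketch of such a
caller follows; the function name and exact shape are illustrative only and
not taken from this series.

	/*
	 * Illustrative only: a hypothetical caller that owns just a detached
	 * table and an mm_ops pointer, with no struct kvm_pgtable in sight.
	 * @level is the level of the PTE that used to point at @pgtable, so
	 * the entries inside it sit at @level + 1 and the table spans
	 * kvm_granule_size(@level) of IPA space.
	 */
	static void free_detached_table(kvm_pte_t *pgtable, u32 level,
					struct kvm_pgtable_mm_ops *mm_ops)
	{
		struct kvm_pgtable_walker walker = {
			.cb	= stage2_free_walker,
			.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
		};
		struct kvm_pgtable_walk_data data = {
			.walker	= &walker,
			.addr	= 0,
			.end	= kvm_granule_size(level),
		};

		__kvm_pgtable_walk(&data, mm_ops, pgtable, level + 1);
	}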

> @@ -88,7 +87,7 @@ static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
>   	return (data->addr >> shift) & mask;
>   }
>   
> -static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
> +static u32 kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
>   {
>   	u64 shift = kvm_granule_shift(pgt->start_level - 1); /* May underflow */
>   	u64 mask = BIT(pgt->ia_bits) - 1;
> @@ -96,11 +95,6 @@ static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
>   	return (addr & mask) >> shift;
>   }
>   
> -static u32 kvm_pgd_page_idx(struct kvm_pgtable_walk_data *data)
> -{
> -	return __kvm_pgd_page_idx(data->pgt, data->addr);
> -}
> -
>   static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
>   {
>   	struct kvm_pgtable pgt = {
> @@ -108,7 +102,7 @@ static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
>   		.start_level	= start_level,
>   	};
>   
> -	return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
> +	return kvm_pgd_page_idx(&pgt, -1ULL) + 1;
>   }
>   
>   static bool kvm_pte_table(kvm_pte_t pte, u32 level)
> @@ -255,11 +249,10 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>   	return ret;
>   }
>   
> -static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
> +static int _kvm_pgtable_walk(struct kvm_pgtable *pgt, struct kvm_pgtable_walk_data *data)
>   {
>   	u32 idx;
>   	int ret = 0;
> -	struct kvm_pgtable *pgt = data->pgt;
>   	u64 limit = BIT(pgt->ia_bits);
>   
>   	if (data->addr > limit || data->end > limit)
> @@ -268,7 +261,7 @@ static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
>   	if (!pgt->pgd)
>   		return -EINVAL;
>   
> -	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
> +	for (idx = kvm_pgd_page_idx(pgt, data->addr); data->addr < data->end; ++idx) {
>   		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
>   
>   		ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
> @@ -283,13 +276,12 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   		     struct kvm_pgtable_walker *walker)
>   {
>   	struct kvm_pgtable_walk_data walk_data = {
> -		.pgt	= pgt,
>   		.addr	= ALIGN_DOWN(addr, PAGE_SIZE),
>   		.end	= PAGE_ALIGN(walk_data.addr + size),
>   		.walker	= walker,
>   	};
>   
> -	return _kvm_pgtable_walk(&walk_data);
> +	return _kvm_pgtable_walk(pgt, &walk_data);
>   }
>   
>   struct leaf_walk_data {
> 

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 03/14] KVM: arm64: Pass mm_ops through the visitor context
@ 2022-11-10  5:30     ` Gavin Shan
  0 siblings, 0 replies; 156+ messages in thread
From: Gavin Shan @ 2022-11-10  5:30 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Alexandru Elisei
  Cc: kvm, kvmarm, Ben Gardon, David Matlack, Will Deacon, kvmarm,
	linux-arm-kernel

On 11/8/22 5:56 AM, Oliver Upton wrote:
> As a prerequisite for getting visitors off of struct kvm_pgtable, pass
> mm_ops through the visitor context.
> 
> No functional change intended.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>   arch/arm64/include/asm/kvm_pgtable.h |  1 +
>   arch/arm64/kvm/hyp/nvhe/setup.c      |  3 +-
>   arch/arm64/kvm/hyp/pgtable.c         | 63 +++++++++++-----------------
>   3 files changed, 26 insertions(+), 41 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 14d4b68a1e92..a752793482cb 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -203,6 +203,7 @@ struct kvm_pgtable_visit_ctx {
>   	kvm_pte_t				*ptep;
>   	kvm_pte_t				old;
>   	void					*arg;
> +	struct kvm_pgtable_mm_ops		*mm_ops;
>   	u64					addr;
>   	u64					end;
>   	u32					level;
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 6af443c9d78e..1068338d77f3 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -189,7 +189,7 @@ static void hpool_put_page(void *addr)
>   static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   					 enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	enum kvm_pgtable_prot prot;
>   	enum pkvm_page_state state;
>   	phys_addr_t phys;
> @@ -239,7 +239,6 @@ static int finalize_host_mappings(void)
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= finalize_host_mappings_walker,
>   		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> -		.arg	= pkvm_pgtable.mm_ops,
>   	};
>   	int i, ret;
>   
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index fb3696b3a997..db25e81a9890 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -181,9 +181,10 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>   }
>   
>   static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -			      kvm_pte_t *pgtable, u32 level);
> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level);
>   
>   static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> +				      struct kvm_pgtable_mm_ops *mm_ops,
>   				      kvm_pte_t *ptep, u32 level)
>   {
>   	enum kvm_pgtable_walk_flags flags = data->walker->flags;
> @@ -191,6 +192,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   		.ptep	= ptep,
>   		.old	= READ_ONCE(*ptep),
>   		.arg	= data->walker->arg,
> +		.mm_ops	= mm_ops,
>   		.addr	= data->addr,
>   		.end	= data->end,
>   		.level	= level,
> @@ -218,8 +220,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   		goto out;
>   	}
>   
> -	childp = kvm_pte_follow(ctx.old, data->pgt->mm_ops);
> -	ret = __kvm_pgtable_walk(data, childp, level + 1);
> +	childp = kvm_pte_follow(ctx.old, mm_ops);
> +	ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
>   	if (ret)
>   		goto out;
>   
> @@ -231,7 +233,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   }
>   
>   static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -			      kvm_pte_t *pgtable, u32 level)
> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level)
>   {
>   	u32 idx;
>   	int ret = 0;
> @@ -245,7 +247,7 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>   		if (data->addr >= data->end)
>   			break;
>   
> -		ret = __kvm_pgtable_visit(data, ptep, level);
> +		ret = __kvm_pgtable_visit(data, mm_ops, ptep, level);
>   		if (ret)
>   			break;
>   	}
> @@ -269,7 +271,7 @@ static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
>   	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
>   		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
>   
> -		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
> +		ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
>   		if (ret)
>   			break;
>   	}
> @@ -332,7 +334,6 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
>   struct hyp_map_data {
>   	u64				phys;
>   	kvm_pte_t			attr;
> -	struct kvm_pgtable_mm_ops	*mm_ops;
>   };
>   
>   static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> @@ -400,7 +401,7 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   	if (ctx->old == new)
>   		return true;
>   	if (!kvm_pte_valid(ctx->old))
> -		data->mm_ops->get_page(ctx->ptep);
> +		ctx->mm_ops->get_page(ctx->ptep);
>   	else if (WARN_ON((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>   		return false;
>   
> @@ -413,7 +414,7 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	kvm_pte_t *childp;
>   	struct hyp_map_data *data = ctx->arg;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (hyp_map_walker_try_leaf(ctx, data))
>   		return 0;
> @@ -436,7 +437,6 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>   	int ret;
>   	struct hyp_map_data map_data = {
>   		.phys	= ALIGN_DOWN(phys, PAGE_SIZE),
> -		.mm_ops	= pgt->mm_ops,
>   	};
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= hyp_map_walker,
> @@ -454,18 +454,13 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>   	return ret;
>   }
>   
> -struct hyp_unmap_data {
> -	u64				unmapped;
> -	struct kvm_pgtable_mm_ops	*mm_ops;
> -};
> -
>   static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			    enum kvm_pgtable_walk_flags visit)
>   {
>   	kvm_pte_t *childp = NULL;
>   	u64 granule = kvm_granule_size(ctx->level);
> -	struct hyp_unmap_data *data = ctx->arg;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	u64 *unmapped = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!kvm_pte_valid(ctx->old))
>   		return -EINVAL;
> @@ -486,7 +481,7 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   		kvm_clear_pte(ctx->ptep);
>   		dsb(ishst);
>   		__tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
> -		data->unmapped += granule;
> +		*unmapped += granule;
>   	}
>   
>   	dsb(ish);
> @@ -501,12 +496,10 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   
>   u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>   {
> -	struct hyp_unmap_data unmap_data = {
> -		.mm_ops	= pgt->mm_ops,
> -	};
> +	u64 unmapped = 0;
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= hyp_unmap_walker,
> -		.arg	= &unmap_data,
> +		.arg	= &unmapped,
>   		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>   	};
>   
> @@ -514,7 +507,7 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>   		return 0;
>   
>   	kvm_pgtable_walk(pgt, addr, size, &walker);
> -	return unmap_data.unmapped;
> +	return unmapped;
>   }
>   
>   int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
> @@ -538,7 +531,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>   static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			   enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!kvm_pte_valid(ctx->old))
>   		return 0;
> @@ -556,7 +549,6 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= hyp_free_walker,
>   		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> -		.arg	= pgt->mm_ops,
>   	};
>   
>   	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> @@ -575,8 +567,6 @@ struct stage2_map_data {
>   	struct kvm_s2_mmu		*mmu;
>   	void				*memcache;
>   
> -	struct kvm_pgtable_mm_ops	*mm_ops;
> -
>   	/* Force mappings to page granularity */
>   	bool				force_pte;
>   };
> @@ -725,7 +715,7 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   	kvm_pte_t new;
>   	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>   	struct kvm_pgtable *pgt = data->mmu->pgt;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return -E2BIG;
> @@ -773,7 +763,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>   	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return 0;
>   
> -	data->childp = kvm_pte_follow(ctx->old, data->mm_ops);
> +	data->childp = kvm_pte_follow(ctx->old, ctx->mm_ops);
>   	kvm_clear_pte(ctx->ptep);
>   
>   	/*
> @@ -789,7 +779,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>   static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   				struct stage2_map_data *data)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	kvm_pte_t *childp;
>   	int ret;
>   
> @@ -831,7 +821,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>   				      struct stage2_map_data *data)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	kvm_pte_t *childp;
>   	int ret = 0;
>   
> @@ -898,7 +888,6 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
>   		.mmu		= pgt->mmu,
>   		.memcache	= mc,
> -		.mm_ops		= pgt->mm_ops,
>   		.force_pte	= pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
>   	};
>   	struct kvm_pgtable_walker walker = {
> @@ -929,7 +918,6 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   		.phys		= KVM_PHYS_INVALID,
>   		.mmu		= pgt->mmu,
>   		.memcache	= mc,
> -		.mm_ops		= pgt->mm_ops,
>   		.owner_id	= owner_id,
>   		.force_pte	= true,
>   	};
> @@ -953,7 +941,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	struct kvm_pgtable *pgt = ctx->arg;
>   	struct kvm_s2_mmu *mmu = pgt->mmu;
> -	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	kvm_pte_t *childp = NULL;
>   	bool need_flush = false;
>   
> @@ -1007,7 +995,6 @@ struct stage2_attr_data {
>   	kvm_pte_t			attr_clr;
>   	kvm_pte_t			pte;
>   	u32				level;
> -	struct kvm_pgtable_mm_ops	*mm_ops;
>   };
>   
>   static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
> @@ -1015,7 +1002,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	kvm_pte_t pte = ctx->old;
>   	struct stage2_attr_data *data = ctx->arg;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!kvm_pte_valid(ctx->old))
>   		return 0;
> @@ -1055,7 +1042,6 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>   	struct stage2_attr_data data = {
>   		.attr_set	= attr_set & attr_mask,
>   		.attr_clr	= attr_clr & attr_mask,
> -		.mm_ops		= pgt->mm_ops,
>   	};
>   	struct kvm_pgtable_walker walker = {
>   		.cb		= stage2_attr_walker,
> @@ -1198,7 +1184,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>   static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			      enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!stage2_pte_is_counted(ctx->old))
>   		return 0;
> @@ -1218,7 +1204,6 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>   		.cb	= stage2_free_walker,
>   		.flags	= KVM_PGTABLE_WALK_LEAF |
>   			  KVM_PGTABLE_WALK_TABLE_POST,
> -		.arg	= pgt->mm_ops,
>   	};
>   
>   	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> 

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 03/14] KVM: arm64: Pass mm_ops through the visitor context
@ 2022-11-10  5:30     ` Gavin Shan
  0 siblings, 0 replies; 156+ messages in thread
From: Gavin Shan @ 2022-11-10  5:30 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On 11/8/22 5:56 AM, Oliver Upton wrote:
> As a prerequisite for getting visitors off of struct kvm_pgtable, pass
> mm_ops through the visitor context.
> 
> No functional change intended.
> 
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>   arch/arm64/include/asm/kvm_pgtable.h |  1 +
>   arch/arm64/kvm/hyp/nvhe/setup.c      |  3 +-
>   arch/arm64/kvm/hyp/pgtable.c         | 63 +++++++++++-----------------
>   3 files changed, 26 insertions(+), 41 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 14d4b68a1e92..a752793482cb 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -203,6 +203,7 @@ struct kvm_pgtable_visit_ctx {
>   	kvm_pte_t				*ptep;
>   	kvm_pte_t				old;
>   	void					*arg;
> +	struct kvm_pgtable_mm_ops		*mm_ops;
>   	u64					addr;
>   	u64					end;
>   	u32					level;
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 6af443c9d78e..1068338d77f3 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -189,7 +189,7 @@ static void hpool_put_page(void *addr)
>   static int finalize_host_mappings_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   					 enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	enum kvm_pgtable_prot prot;
>   	enum pkvm_page_state state;
>   	phys_addr_t phys;
> @@ -239,7 +239,6 @@ static int finalize_host_mappings(void)
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= finalize_host_mappings_walker,
>   		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> -		.arg	= pkvm_pgtable.mm_ops,
>   	};
>   	int i, ret;
>   
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index fb3696b3a997..db25e81a9890 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -181,9 +181,10 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>   }
>   
>   static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -			      kvm_pte_t *pgtable, u32 level);
> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level);
>   
>   static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> +				      struct kvm_pgtable_mm_ops *mm_ops,
>   				      kvm_pte_t *ptep, u32 level)
>   {
>   	enum kvm_pgtable_walk_flags flags = data->walker->flags;
> @@ -191,6 +192,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   		.ptep	= ptep,
>   		.old	= READ_ONCE(*ptep),
>   		.arg	= data->walker->arg,
> +		.mm_ops	= mm_ops,
>   		.addr	= data->addr,
>   		.end	= data->end,
>   		.level	= level,
> @@ -218,8 +220,8 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   		goto out;
>   	}
>   
> -	childp = kvm_pte_follow(ctx.old, data->pgt->mm_ops);
> -	ret = __kvm_pgtable_walk(data, childp, level + 1);
> +	childp = kvm_pte_follow(ctx.old, mm_ops);
> +	ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
>   	if (ret)
>   		goto out;
>   
> @@ -231,7 +233,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>   }
>   
>   static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -			      kvm_pte_t *pgtable, u32 level)
> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level)
>   {
>   	u32 idx;
>   	int ret = 0;
> @@ -245,7 +247,7 @@ static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>   		if (data->addr >= data->end)
>   			break;
>   
> -		ret = __kvm_pgtable_visit(data, ptep, level);
> +		ret = __kvm_pgtable_visit(data, mm_ops, ptep, level);
>   		if (ret)
>   			break;
>   	}
> @@ -269,7 +271,7 @@ static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
>   	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
>   		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
>   
> -		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
> +		ret = __kvm_pgtable_walk(data, pgt->mm_ops, ptep, pgt->start_level);
>   		if (ret)
>   			break;
>   	}
> @@ -332,7 +334,6 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
>   struct hyp_map_data {
>   	u64				phys;
>   	kvm_pte_t			attr;
> -	struct kvm_pgtable_mm_ops	*mm_ops;
>   };
>   
>   static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> @@ -400,7 +401,7 @@ static bool hyp_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   	if (ctx->old == new)
>   		return true;
>   	if (!kvm_pte_valid(ctx->old))
> -		data->mm_ops->get_page(ctx->ptep);
> +		ctx->mm_ops->get_page(ctx->ptep);
>   	else if (WARN_ON((ctx->old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
>   		return false;
>   
> @@ -413,7 +414,7 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	kvm_pte_t *childp;
>   	struct hyp_map_data *data = ctx->arg;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (hyp_map_walker_try_leaf(ctx, data))
>   		return 0;
> @@ -436,7 +437,6 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>   	int ret;
>   	struct hyp_map_data map_data = {
>   		.phys	= ALIGN_DOWN(phys, PAGE_SIZE),
> -		.mm_ops	= pgt->mm_ops,
>   	};
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= hyp_map_walker,
> @@ -454,18 +454,13 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>   	return ret;
>   }
>   
> -struct hyp_unmap_data {
> -	u64				unmapped;
> -	struct kvm_pgtable_mm_ops	*mm_ops;
> -};
> -
>   static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			    enum kvm_pgtable_walk_flags visit)
>   {
>   	kvm_pte_t *childp = NULL;
>   	u64 granule = kvm_granule_size(ctx->level);
> -	struct hyp_unmap_data *data = ctx->arg;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	u64 *unmapped = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!kvm_pte_valid(ctx->old))
>   		return -EINVAL;
> @@ -486,7 +481,7 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   		kvm_clear_pte(ctx->ptep);
>   		dsb(ishst);
>   		__tlbi_level(vale2is, __TLBI_VADDR(ctx->addr, 0), ctx->level);
> -		data->unmapped += granule;
> +		*unmapped += granule;
>   	}
>   
>   	dsb(ish);
> @@ -501,12 +496,10 @@ static int hyp_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   
>   u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>   {
> -	struct hyp_unmap_data unmap_data = {
> -		.mm_ops	= pgt->mm_ops,
> -	};
> +	u64 unmapped = 0;
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= hyp_unmap_walker,
> -		.arg	= &unmap_data,
> +		.arg	= &unmapped,
>   		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>   	};
>   
> @@ -514,7 +507,7 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>   		return 0;
>   
>   	kvm_pgtable_walk(pgt, addr, size, &walker);
> -	return unmap_data.unmapped;
> +	return unmapped;
>   }
>   
>   int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
> @@ -538,7 +531,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>   static int hyp_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			   enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!kvm_pte_valid(ctx->old))
>   		return 0;
> @@ -556,7 +549,6 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>   	struct kvm_pgtable_walker walker = {
>   		.cb	= hyp_free_walker,
>   		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> -		.arg	= pgt->mm_ops,
>   	};
>   
>   	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> @@ -575,8 +567,6 @@ struct stage2_map_data {
>   	struct kvm_s2_mmu		*mmu;
>   	void				*memcache;
>   
> -	struct kvm_pgtable_mm_ops	*mm_ops;
> -
>   	/* Force mappings to page granularity */
>   	bool				force_pte;
>   };
> @@ -725,7 +715,7 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   	kvm_pte_t new;
>   	u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
>   	struct kvm_pgtable *pgt = data->mmu->pgt;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return -E2BIG;
> @@ -773,7 +763,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>   	if (!stage2_leaf_mapping_allowed(ctx, data))
>   		return 0;
>   
> -	data->childp = kvm_pte_follow(ctx->old, data->mm_ops);
> +	data->childp = kvm_pte_follow(ctx->old, ctx->mm_ops);
>   	kvm_clear_pte(ctx->ptep);
>   
>   	/*
> @@ -789,7 +779,7 @@ static int stage2_map_walk_table_pre(const struct kvm_pgtable_visit_ctx *ctx,
>   static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   				struct stage2_map_data *data)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	kvm_pte_t *childp;
>   	int ret;
>   
> @@ -831,7 +821,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>   static int stage2_map_walk_table_post(const struct kvm_pgtable_visit_ctx *ctx,
>   				      struct stage2_map_data *data)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	kvm_pte_t *childp;
>   	int ret = 0;
>   
> @@ -898,7 +888,6 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
>   		.mmu		= pgt->mmu,
>   		.memcache	= mc,
> -		.mm_ops		= pgt->mm_ops,
>   		.force_pte	= pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
>   	};
>   	struct kvm_pgtable_walker walker = {
> @@ -929,7 +918,6 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   		.phys		= KVM_PHYS_INVALID,
>   		.mmu		= pgt->mmu,
>   		.memcache	= mc,
> -		.mm_ops		= pgt->mm_ops,
>   		.owner_id	= owner_id,
>   		.force_pte	= true,
>   	};
> @@ -953,7 +941,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	struct kvm_pgtable *pgt = ctx->arg;
>   	struct kvm_s2_mmu *mmu = pgt->mmu;
> -	struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   	kvm_pte_t *childp = NULL;
>   	bool need_flush = false;
>   
> @@ -1007,7 +995,6 @@ struct stage2_attr_data {
>   	kvm_pte_t			attr_clr;
>   	kvm_pte_t			pte;
>   	u32				level;
> -	struct kvm_pgtable_mm_ops	*mm_ops;
>   };
>   
>   static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
> @@ -1015,7 +1002,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   {
>   	kvm_pte_t pte = ctx->old;
>   	struct stage2_attr_data *data = ctx->arg;
> -	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!kvm_pte_valid(ctx->old))
>   		return 0;
> @@ -1055,7 +1042,6 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>   	struct stage2_attr_data data = {
>   		.attr_set	= attr_set & attr_mask,
>   		.attr_clr	= attr_clr & attr_mask,
> -		.mm_ops		= pgt->mm_ops,
>   	};
>   	struct kvm_pgtable_walker walker = {
>   		.cb		= stage2_attr_walker,
> @@ -1198,7 +1184,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>   static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
>   			      enum kvm_pgtable_walk_flags visit)
>   {
> -	struct kvm_pgtable_mm_ops *mm_ops = ctx->arg;
> +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
>   
>   	if (!stage2_pte_is_counted(ctx->old))
>   		return 0;
> @@ -1218,7 +1204,6 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>   		.cb	= stage2_free_walker,
>   		.flags	= KVM_PGTABLE_WALK_LEAF |
>   			  KVM_PGTABLE_WALK_TABLE_POST,
> -		.arg	= pgt->mm_ops,
>   	};
>   
>   	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> 


^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 04/14] KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data
  2022-11-10  5:30     ` Gavin Shan
  (?)
@ 2022-11-10  5:38       ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-10  5:38 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Ben Gardon, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Thu, Nov 10, 2022 at 01:30:08PM +0800, Gavin Shan wrote:
> Hi Oliver,
> 
> On 11/8/22 5:56 AM, Oliver Upton wrote:
> > In order to tear down page tables from outside the context of
> > kvm_pgtable (such as an RCU callback), stop passing a pointer through
> > kvm_pgtable_walk_data.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >   arch/arm64/kvm/hyp/pgtable.c | 18 +++++-------------
> >   1 file changed, 5 insertions(+), 13 deletions(-)
> > 
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>

Appreciated :)

> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index db25e81a9890..93989b750a26 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -50,7 +50,6 @@
> >   #define KVM_MAX_OWNER_ID		1
> >   struct kvm_pgtable_walk_data {
> > -	struct kvm_pgtable		*pgt;
> >   	struct kvm_pgtable_walker	*walker;
> >   	u64				addr;
> 
> Ok. Here is the answer to why data->pgt->mm_ops isn't reachable in the walker
> and visitor, and why @mm_ops needs to be passed down.

Yup, the reason for unhitching all of this from kvm_pgtable is explained
in the cover letter as well:

  Patches 1-4 clean up the context associated with a page table walk / PTE
  visit. This is helpful for:
   - Extending the context passed through for a visit
   - Building page table walkers that operate outside of a kvm_pgtable
     context (e.g. RCU callback)
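
As a rough sketch of why that matters: an RCU callback sees no struct
kvm_pgtable at all, so anything the teardown needs has to travel with
the rcu_head. The struct and callback names below are illustrative
assumptions, not code from the series:

  /*
   * Hypothetical sketch: free a detached table page from an RCU
   * callback. There is no struct kvm_pgtable in this context, so
   * mm_ops must be carried explicitly alongside the rcu_head.
   */
  struct stage2_free_ctx {
      struct rcu_head              rcu;
      struct kvm_pgtable_mm_ops    *mm_ops;
      void                         *pgtable;
  };

  static void stage2_free_table_rcu_cb(struct rcu_head *head)
  {
      struct stage2_free_ctx *ctx;

      ctx = container_of(head, struct stage2_free_ctx, rcu);
      /* Drop the reference on the detached table page via mm_ops. */
      ctx->mm_ops->put_page(ctx->pgtable);
      kfree(ctx);
  }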

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
  2022-11-09 22:25     ` Ben Gardon
  (?)
@ 2022-11-10 13:34       ` Marc Zyngier
  -1 siblings, 0 replies; 156+ messages in thread
From: Marc Zyngier @ 2022-11-10 13:34 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Oliver Upton, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

On Wed, 09 Nov 2022 22:25:38 +0000,
Ben Gardon <bgardon@google.com> wrote:
> 
> On Mon, Nov 7, 2022 at 1:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> > release the RCU read lock when traversing the page tables. Defer the
> > freeing of table memory to an RCU callback. Indirect the calls into RCU
> > and provide stubs for hypervisor code, as RCU is not available in such a
> > context.
> >
> > The RCU protection doesn't amount to much at the moment, as readers are
> > already protected by the read-write lock (all walkers that free table
> > memory take the write lock). Nonetheless, a subsequent change will
> > further relax the locking requirements around the stage-2 MMU, thereby
> > depending on RCU.
> >
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h | 49 ++++++++++++++++++++++++++++
> >  arch/arm64/kvm/hyp/pgtable.c         | 10 +++++-
> >  arch/arm64/kvm/mmu.c                 | 14 +++++++-
> >  3 files changed, 71 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index e70cf57b719e..7634b6964779 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
> >
> >  typedef u64 kvm_pte_t;
> >
> > +/*
> > + * RCU cannot be used in a non-kernel context such as the hyp. As such, page
> > + * table walkers used in hyp do not call into RCU and instead use other
> > + * synchronization mechanisms (such as a spinlock).
> > + */
> > +#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
> > +
> >  typedef kvm_pte_t *kvm_pteref_t;
> >
> >  static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> > @@ -44,6 +51,40 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
> >         return pteref;
> >  }
> >
> > +static inline void kvm_pgtable_walk_begin(void) {}
> > +static inline void kvm_pgtable_walk_end(void) {}
> > +
> > +static inline bool kvm_pgtable_walk_lock_held(void)
> > +{
> > +       return true;
> 
> Forgive my ignorance, but does hyp not use an MMU lock at all? Seems
> like this would be a good place to add a lockdep check.

For normal KVM, we don't mess with the page tables in the HYP code *at
all*. That's just not the place. It is for pKVM that this is a bit
different, as EL2 is where the stuff happens.

Lockdep at EL2 is wishful thinking. However, we have the next best
thing, which is an assertion such as:

	hyp_assert_lock_held(&host_kvm.lock);

though at the moment, this is a *global* lock that serialises
everyone, as a guest stage-2 operation usually affects the host
stage-2 as well (ownership change and such). Quentin should be able to
provide more details on that.
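
A minimal sketch of what an assertion-based stub could look like under
that scheme (an assumption for illustration, not code from the series):

  /*
   * Hypothetical EL2-side stub: lockdep is unavailable, so assert the
   * pKVM host lock directly instead of unconditionally returning true.
   */
  static inline bool kvm_pgtable_walk_lock_held(void)
  {
      hyp_assert_lock_held(&host_kvm.lock);
      return true;
  }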

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 09/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks
  2022-11-09 23:00         ` Ben Gardon
  (?)
@ 2022-11-10 13:40           ` Marc Zyngier
  -1 siblings, 0 replies; 156+ messages in thread
From: Marc Zyngier @ 2022-11-10 13:40 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Sean Christopherson, Oliver Upton, James Morse, Alexandru Elisei,
	linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Gavin Shan, Peter Xu, Will Deacon,
	kvmarm

On Wed, 09 Nov 2022 23:00:16 +0000,
Ben Gardon <bgardon@google.com> wrote:
> 
> On Wed, Nov 9, 2022 at 2:42 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Nov 09, 2022, Ben Gardon wrote:
> > > On Mon, Nov 7, 2022 at 1:58 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> > > > @@ -1054,7 +1066,7 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
> > > >  bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
> > > >  {
> > > >         kvm_pte_t pte = 0;
> > > > -       stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL);
> > > > +       stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
> > >
> > > Would be nice to have an enum for KVM_PGTABLE_WALK_EXCLUSIVE so this
> > > doesn't just have to pass 0.
> >
> > That's also dangerous though since the param is a set of flags, not unique,
> > arbitrary values.  E.g. this won't do the expected thing
> >
> >         if (flags & KVM_PGTABLE_WALK_EXCLUSIVE)
> >
> > I assume compilers would complain, but never say never when it comes to compilers :-)
> 
> Yeah, I was thinking about that too. IMO using one enum for multiple
> flags is kind of an abuse of the enum. If you're going to put multiple
> orthogonal flags in an int or whatever, it would probably be best to
> have separate enums for each flag. That way you can define masks to
> extract the enum from the int and only compare with == and != as
> opposed to using &.

Too late. The kernel is full of this (look at the irq code, for
example), and we happily use this construct all over the (oh wait!)
page table code to construct permissions and other things.

At this stage, this is an established construct. Compiler people can
try and break this habit, good luck to them ;-).
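
For reference, the construct in question is the usual bitmask-in-an-enum
idiom, sketched here with the flag names from this series plus an
illustrative helper (which is also why a zero-valued "exclusive" flag
could never be tested with &):

  enum kvm_pgtable_walk_flags {
      KVM_PGTABLE_WALK_LEAF          = BIT(0),
      KVM_PGTABLE_WALK_TABLE_PRE     = BIT(1),
      KVM_PGTABLE_WALK_TABLE_POST    = BIT(2),
      KVM_PGTABLE_WALK_SHARED        = BIT(3),
  };

  /* Flags combine with | and are tested with &, never with ==. */
  static bool walk_is_shared(enum kvm_pgtable_walk_flags flags)
  {
      return flags & KVM_PGTABLE_WALK_SHARED;
  }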

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 00/14] KVM: arm64: Parallel stage-2 fault handling
  2022-11-07 21:56 ` Oliver Upton
  (?)
@ 2022-11-11 15:47   ` Marc Zyngier
  -1 siblings, 0 replies; 156+ messages in thread
From: Marc Zyngier @ 2022-11-11 15:47 UTC (permalink / raw)
  To: James Morse, Oliver Upton, Alexandru Elisei
  Cc: Quentin Perret, Peter Xu, kvm, kvmarm, Will Deacon, kvmarm,
	Ben Gardon, Ricardo Koller, linux-arm-kernel,
	Sean Christopherson, Reiji Watanabe, Gavin Shan, David Matlack

On Mon, 7 Nov 2022 21:56:30 +0000, Oliver Upton wrote:
> Presently KVM only takes a read lock for stage 2 faults if it believes
> the fault can be fixed by relaxing permissions on a PTE (write unprotect
> for dirty logging). Otherwise, stage 2 faults grab the write lock, which
> predictably can pile up all the vCPUs in a sufficiently large VM.
> 
> Like the TDP MMU for x86, this series loosens the locking around
> manipulations of the stage 2 page tables to allow parallel faults. RCU
> and atomics are exploited to safely build/destroy the stage 2 page
> tables in light of multiple software observers.
> 
> [...]

I've gone over this for quite a while, and while I'm still sh*t
scared about it, I've decided to let it simmer in -next for a bit.

If anything goes wrong or that someone spots something ugly,
it will be easy to simply drop the branch. For simple fixes, they
can go on top.

[01/14] KVM: arm64: Combine visitor arguments into a context structure
        commit: dfc7a7769ab7f2a2f629c673717ef1fa7b63aa42
[02/14] KVM: arm64: Stash observed pte value in visitor context
        commit: 83844a2317ecad935f6735abd854e4bf3f757040
[03/14] KVM: arm64: Pass mm_ops through the visitor context
        commit: 2a611c7f87f26cca405da63a57f06d0e4dc14240
[04/14] KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data
        commit: fa002e8e79b3f980455ba585c1f47b26680de5b9
[05/14] KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
        commit: 8e94e1252cc054bb31fd3e9a15235cd831970ec1
[06/14] KVM: arm64: Use an opaque type for pteps
        commit: 6b91b8f95cadd3441c056182daf9024475ac4a91
[07/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
        commit: 5c359cca1faf6d7671537fe1c240e8668467864d
[08/14] KVM: arm64: Protect stage-2 traversal with RCU
        commit: c3119ae45dfb6038ca458ab5ba7a9fba2810845b
[09/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks
        commit: ca5de2448c3b4c018fe3d6223df8b59068be1cd7
[10/14] KVM: arm64: Split init and set for table PTE
        commit: 331aa3a0547d1c794587e0df374d13b16645e832
[11/14] KVM: arm64: Make block->table PTE changes parallel-aware
        commit: 0ab12f3574db6cb432917a667f9392a88e8f0dfc
[12/14] KVM: arm64: Make leaf->leaf PTE changes parallel-aware
        commit: 946fbfdf336b811479e024136c7cabc00157b6b9
[13/14] KVM: arm64: Make table->block changes parallel-aware
        commit: af87fc03cfdf6893011df419588d27acdfb9c197
[14/14] KVM: arm64: Handle stage-2 faults in parallel
        commit: 1577cb5823cefdff4416f272a88143ee933d97f5

Fingers crossed,

	M.
-- 
Without deviation from the norm, progress is not possible.



^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
       [not found]   ` <CGME20221114142915eucas1p258f3ca2c536bde712c068e96851468fd@eucas1p2.samsung.com>
  2022-11-14 14:29       ` Marek Szyprowski
@ 2022-11-14 14:29       ` Marek Szyprowski
  0 siblings, 0 replies; 156+ messages in thread
From: Marek Szyprowski @ 2022-11-14 14:29 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm

Hi Oliver,

On 07.11.2022 22:56, Oliver Upton wrote:
> Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> release the RCU read lock when traversing the page tables. Defer the
> freeing of table memory to an RCU callback. Indirect the calls into RCU
> and provide stubs for hypervisor code, as RCU is not available in such a
> context.
>
> The RCU protection doesn't amount to much at the moment, as readers are
> already protected by the read-write lock (all walkers that free table
> memory take the write lock). Nonetheless, a subsequent change will
> further relax the locking requirements around the stage-2 MMU, thereby
> depending on RCU.
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

This patch landed in today's linux-next (20221114) as commit 
c3119ae45dfb ("KVM: arm64: Protect stage-2 traversal with RCU"). 
Unfortunately it introduces the following warning:

--->8---

kvm [1]: IPA Size Limit: 40 bits
BUG: sleeping function called from invalid context at 
include/linux/sched/mm.h:274
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by swapper/0/1:
  #0: ffff80000a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at: 
__create_hyp_mappings+0x80/0xc4
  #1: ffff80000a927720 (rcu_read_lock){....}-{1:2}, at: 
kvm_pgtable_walk+0x0/0x1f4
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
Hardware name: Raspberry Pi 3 Model B (DT)
Call trace:
  dump_backtrace.part.0+0xe4/0xf0
  show_stack+0x18/0x40
  dump_stack_lvl+0x8c/0xb8
  dump_stack+0x18/0x34
  __might_resched+0x178/0x220
  __might_sleep+0x48/0xa0
  prepare_alloc_pages+0x178/0x1a0
  __alloc_pages+0x9c/0x109c
  alloc_page_interleave+0x1c/0xc4
  alloc_pages+0xec/0x160
  get_zeroed_page+0x1c/0x44
  kvm_hyp_zalloc_page+0x14/0x20
  hyp_map_walker+0xd4/0x134
  kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
  __kvm_pgtable_walk+0x1a4/0x220
  kvm_pgtable_walk+0x104/0x1f4
  kvm_pgtable_hyp_map+0x80/0xc4
  __create_hyp_mappings+0x9c/0xc4
  kvm_mmu_init+0x144/0x1cc
  kvm_arch_init+0xe4/0xef4
  kvm_init+0x3c/0x3d0
  arm_init+0x20/0x30
  do_one_initcall+0x74/0x400
  kernel_init_freeable+0x2e0/0x350
  kernel_init+0x24/0x130
  ret_from_fork+0x10/0x20
kvm [1]: Hyp mode initialized successfully

--->8----

It looks like more changes in the KVM code are needed to use RCU for that
code.
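
One possible direction (a sketch only; the flag-aware begin/end
signature below is an assumption, not necessarily what will be merged)
is to enter the RCU read-side critical section only for shared walks,
leaving exclusive walks such as the hyp mappings above free to allocate
with GFP_KERNEL and sleep:

  /* Sketch: take the RCU read lock only for shared (parallel) walks. */
  static inline void kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
  {
      if (walker->flags & KVM_PGTABLE_WALK_SHARED)
          rcu_read_lock();
  }

  static inline void kvm_pgtable_walk_end(struct kvm_pgtable_walker *walker)
  {
      if (walker->flags & KVM_PGTABLE_WALK_SHARED)
          rcu_read_unlock();
  }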

> ---
>   arch/arm64/include/asm/kvm_pgtable.h | 49 ++++++++++++++++++++++++++++
>   arch/arm64/kvm/hyp/pgtable.c         | 10 +++++-
>   arch/arm64/kvm/mmu.c                 | 14 +++++++-
>   3 files changed, 71 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index e70cf57b719e..7634b6964779 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -37,6 +37,13 @@ static inline u64 kvm_get_parange(u64 mmfr0)
>   
>   typedef u64 kvm_pte_t;
>   
> +/*
> + * RCU cannot be used in a non-kernel context such as the hyp. As such, page
> + * table walkers used in hyp do not call into RCU and instead use other
> + * synchronization mechanisms (such as a spinlock).
> + */
> +#if defined(__KVM_NVHE_HYPERVISOR__) || defined(__KVM_VHE_HYPERVISOR__)
> +
>   typedef kvm_pte_t *kvm_pteref_t;
>   
>   static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> @@ -44,6 +51,40 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>   	return pteref;
>   }
>   
> +static inline void kvm_pgtable_walk_begin(void) {}
> +static inline void kvm_pgtable_walk_end(void) {}
> +
> +static inline bool kvm_pgtable_walk_lock_held(void)
> +{
> +	return true;
> +}
> +
> +#else
> +
> +typedef kvm_pte_t __rcu *kvm_pteref_t;
> +
> +static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
> +{
> +	return rcu_dereference_check(pteref, !shared);
> +}
> +
> +static inline void kvm_pgtable_walk_begin(void)
> +{
> +	rcu_read_lock();
> +}
> +
> +static inline void kvm_pgtable_walk_end(void)
> +{
> +	rcu_read_unlock();
> +}
> +
> +static inline bool kvm_pgtable_walk_lock_held(void)
> +{
> +	return rcu_read_lock_held();
> +}
> +
> +#endif
> +
>   #define KVM_PTE_VALID			BIT(0)
>   
>   #define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
> @@ -202,11 +243,14 @@ struct kvm_pgtable {
>    *					children.
>    * @KVM_PGTABLE_WALK_TABLE_POST:	Visit table entries after their
>    *					children.
> + * @KVM_PGTABLE_WALK_SHARED:		Indicates the page-tables may be shared
> + *					with other software walkers.
>    */
>   enum kvm_pgtable_walk_flags {
>   	KVM_PGTABLE_WALK_LEAF			= BIT(0),
>   	KVM_PGTABLE_WALK_TABLE_PRE		= BIT(1),
>   	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
> +	KVM_PGTABLE_WALK_SHARED			= BIT(3),
>   };
>   
>   struct kvm_pgtable_visit_ctx {
> @@ -223,6 +267,11 @@ struct kvm_pgtable_visit_ctx {
>   typedef int (*kvm_pgtable_visitor_fn_t)(const struct kvm_pgtable_visit_ctx *ctx,
>   					enum kvm_pgtable_walk_flags visit);
>   
> +static inline bool kvm_pgtable_walk_shared(const struct kvm_pgtable_visit_ctx *ctx)
> +{
> +	return ctx->flags & KVM_PGTABLE_WALK_SHARED;
> +}
> +
>   /**
>    * struct kvm_pgtable_walker - Hook into a page-table walk.
>    * @cb:		Callback function to invoke during the walk.
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 7c9782347570..d8d963521d4e 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -171,6 +171,9 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
>   				  enum kvm_pgtable_walk_flags visit)
>   {
>   	struct kvm_pgtable_walker *walker = data->walker;
> +
> +	/* Ensure the appropriate lock is held (e.g. RCU lock for stage-2 MMU) */
> +	WARN_ON_ONCE(kvm_pgtable_walk_shared(ctx) && !kvm_pgtable_walk_lock_held());
>   	return walker->cb(ctx, visit);
>   }
>   
> @@ -281,8 +284,13 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   		.end	= PAGE_ALIGN(walk_data.addr + size),
>   		.walker	= walker,
>   	};
> +	int r;
> +
> +	kvm_pgtable_walk_begin();
> +	r = _kvm_pgtable_walk(pgt, &walk_data);
> +	kvm_pgtable_walk_end();
>   
> -	return _kvm_pgtable_walk(pgt, &walk_data);
> +	return r;
>   }
>   
>   struct leaf_walk_data {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 73ae908eb5d9..52e042399ba5 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -130,9 +130,21 @@ static void kvm_s2_free_pages_exact(void *virt, size_t size)
>   
>   static struct kvm_pgtable_mm_ops kvm_s2_mm_ops;
>   
> +static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
> +{
> +	struct page *page = container_of(head, struct page, rcu_head);
> +	void *pgtable = page_to_virt(page);
> +	u32 level = page_private(page);
> +
> +	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, pgtable, level);
> +}
> +
>   static void stage2_free_removed_table(void *addr, u32 level)
>   {
> -	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, addr, level);
> +	struct page *page = virt_to_page(addr);
> +
> +	set_page_private(page, (unsigned long)level);
> +	call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
>   }
>   
>   static void kvm_host_get_page(void *addr)

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
  2022-11-14 14:29       ` Marek Szyprowski
  (?)
@ 2022-11-14 17:42         ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-14 17:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

Hi Marek,

On Mon, Nov 14, 2022 at 03:29:14PM +0100, Marek Szyprowski wrote:
> This patch landed in today's linux-next (20221114) as commit 
> c3119ae45dfb ("KVM: arm64: Protect stage-2 traversal with RCU"). 
> Unfortunately it introduces the following warning:

Thanks for the bug report :) I had failed to test nVHE in the past few
revisions of this series.

> --->8---
> 
> kvm [1]: IPA Size Limit: 40 bits
> BUG: sleeping function called from invalid context at 
> include/linux/sched/mm.h:274
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 0, expected: 0
> RCU nest depth: 1, expected: 0
> 2 locks held by swapper/0/1:
>   #0: ffff80000a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at: 
> __create_hyp_mappings+0x80/0xc4
>   #1: ffff80000a927720 (rcu_read_lock){....}-{1:2}, at: 
> kvm_pgtable_walk+0x0/0x1f4
> CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
> Hardware name: Raspberry Pi 3 Model B (DT)
> Call trace:
>   dump_backtrace.part.0+0xe4/0xf0
>   show_stack+0x18/0x40
>   dump_stack_lvl+0x8c/0xb8
>   dump_stack+0x18/0x34
>   __might_resched+0x178/0x220
>   __might_sleep+0x48/0xa0
>   prepare_alloc_pages+0x178/0x1a0
>   __alloc_pages+0x9c/0x109c
>   alloc_page_interleave+0x1c/0xc4
>   alloc_pages+0xec/0x160
>   get_zeroed_page+0x1c/0x44
>   kvm_hyp_zalloc_page+0x14/0x20
>   hyp_map_walker+0xd4/0x134
>   kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
>   __kvm_pgtable_walk+0x1a4/0x220
>   kvm_pgtable_walk+0x104/0x1f4
>   kvm_pgtable_hyp_map+0x80/0xc4
>   __create_hyp_mappings+0x9c/0xc4
>   kvm_mmu_init+0x144/0x1cc
>   kvm_arch_init+0xe4/0xef4
>   kvm_init+0x3c/0x3d0
>   arm_init+0x20/0x30
>   do_one_initcall+0x74/0x400
>   kernel_init_freeable+0x2e0/0x350
>   kernel_init+0x24/0x130
>   ret_from_fork+0x10/0x20
> kvm [1]: Hyp mode initialized successfully
> 
> --->8----
> 
> It looks like more changes in the KVM code are needed to use RCU for that
> code.

Right, the specific issue is that while the stage-2 walkers preallocate
any table memory they may need, the hyp walkers do not and allocate
inline.

As hyp stage-1 is protected by a spinlock there is no actual need for
RCU in that case. I'll post something later on today that addresses the
issue.
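
To make that concrete, the failing shape is roughly the sketch below. It
is illustrative only (the function name is made up); the point is that
patch 08 wraps every kvm_pgtable_walk() in rcu_read_lock(), so a hyp
walker that grabs its next table page inline ends up doing a sleeping
allocation from a non-sleepable context, which is exactly the splat in
the trace above:

static int hyp_style_table_alloc(void **childp)
{
	rcu_read_lock();		/* taken for the whole walk by patch 08 */

	/* may sleep: this is the get_zeroed_page() in the trace above */
	*childp = (void *)get_zeroed_page(GFP_KERNEL);

	rcu_read_unlock();

	return *childp ? 0 : -ENOMEM;
}

The stage-2 walkers avoid this because they only consume memory that was
topped up before the walk began.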

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
@ 2022-11-14 17:42         ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-14 17:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: kvm, Marc Zyngier, Will Deacon, kvmarm, Ben Gardon,
	David Matlack, kvmarm, linux-arm-kernel

Hi Marek,

On Mon, Nov 14, 2022 at 03:29:14PM +0100, Marek Szyprowski wrote:
> This patch landed in today's linux-next (20221114) as commit 
> c3119ae45dfb ("KVM: arm64: Protect stage-2 traversal with RCU"). 
> Unfortunately it introduces a following warning:

Thanks for the bug report :) I had failed to test nVHE in the past few
revisions of this series.

> --->8---
> 
> kvm [1]: IPA Size Limit: 40 bits
> BUG: sleeping function called from invalid context at 
> include/linux/sched/mm.h:274
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 0, expected: 0
> RCU nest depth: 1, expected: 0
> 2 locks held by swapper/0/1:
>   #0: ffff80000a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at: 
> __create_hyp_mappings+0x80/0xc4
>   #1: ffff80000a927720 (rcu_read_lock){....}-{1:2}, at: 
> kvm_pgtable_walk+0x0/0x1f4
> CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
> Hardware name: Raspberry Pi 3 Model B (DT)
> Call trace:
>   dump_backtrace.part.0+0xe4/0xf0
>   show_stack+0x18/0x40
>   dump_stack_lvl+0x8c/0xb8
>   dump_stack+0x18/0x34
>   __might_resched+0x178/0x220
>   __might_sleep+0x48/0xa0
>   prepare_alloc_pages+0x178/0x1a0
>   __alloc_pages+0x9c/0x109c
>   alloc_page_interleave+0x1c/0xc4
>   alloc_pages+0xec/0x160
>   get_zeroed_page+0x1c/0x44
>   kvm_hyp_zalloc_page+0x14/0x20
>   hyp_map_walker+0xd4/0x134
>   kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
>   __kvm_pgtable_walk+0x1a4/0x220
>   kvm_pgtable_walk+0x104/0x1f4
>   kvm_pgtable_hyp_map+0x80/0xc4
>   __create_hyp_mappings+0x9c/0xc4
>   kvm_mmu_init+0x144/0x1cc
>   kvm_arch_init+0xe4/0xef4
>   kvm_init+0x3c/0x3d0
>   arm_init+0x20/0x30
>   do_one_initcall+0x74/0x400
>   kernel_init_freeable+0x2e0/0x350
>   kernel_init+0x24/0x130
>   ret_from_fork+0x10/0x20
> kvm [1]: Hyp mode initialized successfully
> 
> --->8----
> 
> I looks that more changes in the KVM code are needed to use RCU for that 
> code.

Right, the specific issue is that while the stage-2 walkers preallocate
any table memory they may need, the hyp walkers do not and allocate
inline.

As hyp stage-1 is protected by a spinlock there is no actual need for
RCU in that case. I'll post something later on today that addresses the
issue.

--
Thanks,
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
@ 2022-11-14 17:42         ` Oliver Upton
  0 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-14 17:42 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, linux-arm-kernel,
	kvmarm, kvm, Reiji Watanabe, Ricardo Koller, David Matlack,
	Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu, Will Deacon,
	Sean Christopherson, kvmarm

Hi Marek,

On Mon, Nov 14, 2022 at 03:29:14PM +0100, Marek Szyprowski wrote:
> This patch landed in today's linux-next (20221114) as commit 
> c3119ae45dfb ("KVM: arm64: Protect stage-2 traversal with RCU"). 
> Unfortunately it introduces a following warning:

Thanks for the bug report :) I had failed to test nVHE in the past few
revisions of this series.

> --->8---
> 
> kvm [1]: IPA Size Limit: 40 bits
> BUG: sleeping function called from invalid context at 
> include/linux/sched/mm.h:274
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 0, expected: 0
> RCU nest depth: 1, expected: 0
> 2 locks held by swapper/0/1:
>   #0: ffff80000a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at: 
> __create_hyp_mappings+0x80/0xc4
>   #1: ffff80000a927720 (rcu_read_lock){....}-{1:2}, at: 
> kvm_pgtable_walk+0x0/0x1f4
> CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
> Hardware name: Raspberry Pi 3 Model B (DT)
> Call trace:
>   dump_backtrace.part.0+0xe4/0xf0
>   show_stack+0x18/0x40
>   dump_stack_lvl+0x8c/0xb8
>   dump_stack+0x18/0x34
>   __might_resched+0x178/0x220
>   __might_sleep+0x48/0xa0
>   prepare_alloc_pages+0x178/0x1a0
>   __alloc_pages+0x9c/0x109c
>   alloc_page_interleave+0x1c/0xc4
>   alloc_pages+0xec/0x160
>   get_zeroed_page+0x1c/0x44
>   kvm_hyp_zalloc_page+0x14/0x20
>   hyp_map_walker+0xd4/0x134
>   kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
>   __kvm_pgtable_walk+0x1a4/0x220
>   kvm_pgtable_walk+0x104/0x1f4
>   kvm_pgtable_hyp_map+0x80/0xc4
>   __create_hyp_mappings+0x9c/0xc4
>   kvm_mmu_init+0x144/0x1cc
>   kvm_arch_init+0xe4/0xef4
>   kvm_init+0x3c/0x3d0
>   arm_init+0x20/0x30
>   do_one_initcall+0x74/0x400
>   kernel_init_freeable+0x2e0/0x350
>   kernel_init+0x24/0x130
>   ret_from_fork+0x10/0x20
> kvm [1]: Hyp mode initialized successfully
> 
> --->8----
> 
> I looks that more changes in the KVM code are needed to use RCU for that 
> code.

Right, the specific issue is that while the stage-2 walkers preallocate
any table memory they may need, the hyp walkers do not and allocate
inline.

As hyp stage-1 is protected by a spinlock there is no actual need for
RCU in that case. I'll post something later on today that addresses the
issue.

--
Thanks,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
  2022-11-09 23:55       ` Oliver Upton
  (?)
@ 2022-11-15 18:47         ` Ricardo Koller
  -1 siblings, 0 replies; 156+ messages in thread
From: Ricardo Koller @ 2022-11-15 18:47 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Sean Christopherson, Marc Zyngier, James Morse, Alexandru Elisei,
	linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, David Matlack,
	Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu, Will Deacon,
	kvmarm

On Wed, Nov 09, 2022 at 11:55:31PM +0000, Oliver Upton wrote:
> On Wed, Nov 09, 2022 at 09:53:45PM +0000, Sean Christopherson wrote:
> > On Mon, Nov 07, 2022, Oliver Upton wrote:
> > > Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> > > release the RCU read lock when traversing the page tables. Defer the
> > > freeing of table memory to an RCU callback. Indirect the calls into RCU
> > > and provide stubs for hypervisor code, as RCU is not available in such a
> > > context.
> > > 
> > > The RCU protection doesn't amount to much at the moment, as readers are
> > > already protected by the read-write lock (all walkers that free table
> > > memory take the write lock). Nonetheless, a subsequent change will
> > > further relax the locking requirements around the stage-2 MMU, thereby
> > > depending on RCU.
> > 
> > Two somewhat off-topic questions (because I'm curious):
> 
> Worth asking!
> 
> >  1. Are there plans to enable "fast" page faults on ARM?  E.g. to fixup access
> >     faults (handle_access_fault()) and/or write-protection faults without acquiring
> >     mmu_lock?
> 
> I don't have any plans personally.
> 
> OTOH, adding support for read-side access faults is trivial, I just
> didn't give it much thought as most large-scale implementations have
> FEAT_HAFDBS (hardware access flag management).

WDYT of permission relaxation (write-protection faults) on the fast
path?

The benefits won't be as good as on x86 due to the required TLBI, but it
may be worth it, since we avoid contending on the mmu lock and some of
the context save/restore. Note that, unlike x86, on ARM the TLB entry
related to a protection fault needs to be flushed.
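
Something along these lines is what I have in mind. It is only a sketch
(the S2AP bit and the TLBI call are borrowed loosely from pgtable.c, and
the function itself is hypothetical), but it shows the two ingredients:
a cmpxchg to serialize against other software walkers, and the TLBI the
architecture requires before the relaxed permission is guaranteed to be
visible:

static bool fast_stage2_write_unprotect(struct kvm_s2_mmu *mmu, kvm_pte_t *ptep,
					u64 ipa, u32 level)
{
	kvm_pte_t old = READ_ONCE(*ptep);
	kvm_pte_t new;

	if (!kvm_pte_valid(old))
		return false;		/* punt to the slow path */

	new = old | KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;	/* grant stage-2 write */

	/* Serialize against concurrent software walkers updating this PTE. */
	if (cmpxchg(ptep, old, new) != old)
		return false;

	/* Unlike x86, the stale read-only TLB entry must be invalidated. */
	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa, level);

	return true;
}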

> 
> >  2. If the answer to (1) is "yes!", what's the plan to protect the lockless walks
> >     for the RCU-less hypervisor code?
> 
> If/when we are worried about fault serialization in the lowvisor I was
> thinking something along the lines of disabling interrupts and using
> IPIs as barriers before freeing removed table memory, crudely giving the
> same protection as RCU.
> 
> --
> Thanks,
> Oliver

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
  2022-11-15 18:47         ` Ricardo Koller
  (?)
@ 2022-11-15 18:57           ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-11-15 18:57 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: Sean Christopherson, Marc Zyngier, James Morse, Alexandru Elisei,
	linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, David Matlack,
	Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu, Will Deacon,
	kvmarm

On Tue, Nov 15, 2022 at 10:47:37AM -0800, Ricardo Koller wrote:
> On Wed, Nov 09, 2022 at 11:55:31PM +0000, Oliver Upton wrote:
> > On Wed, Nov 09, 2022 at 09:53:45PM +0000, Sean Christopherson wrote:
> > > On Mon, Nov 07, 2022, Oliver Upton wrote:
> > > > Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
> > > > release the RCU read lock when traversing the page tables. Defer the
> > > > freeing of table memory to an RCU callback. Indirect the calls into RCU
> > > > and provide stubs for hypervisor code, as RCU is not available in such a
> > > > context.
> > > > 
> > > > The RCU protection doesn't amount to much at the moment, as readers are
> > > > already protected by the read-write lock (all walkers that free table
> > > > memory take the write lock). Nonetheless, a subsequent change will
> > > > further relax the locking requirements around the stage-2 MMU, thereby
> > > > depending on RCU.
> > > 
> > > Two somewhat off-topic questions (because I'm curious):
> > 
> > Worth asking!
> > 
> > >  1. Are there plans to enable "fast" page faults on ARM?  E.g. to fixup access
> > >     faults (handle_access_fault()) and/or write-protection faults without acquiring
> > >     mmu_lock?
> > 
> > I don't have any plans personally.
> > 
> > OTOH, adding support for read-side access faults is trivial, I just
> > didn't give it much thought as most large-scale implementations have
> > FEAT_HAFDBS (hardware access flag management).
> 
> WDYT of permission relaxation (write-protection faults) on the fast
> path?
> 
> The benefits won't be as good as in x86 due to the required TLBI, but
> may be worth it due to not dealing with the mmu lock and avoiding some
> of the context save/restore.  Note that unlike x86, in ARM the TLB entry
> related to a protection fault needs to be flushed.

Right, the only guarantee we have on arm64 is that the TLB will never
hold an entry that would produce an access fault.

I have no issues whatsoever with implementing a lock-free walker; we're
already most of the way there with the RCU implementation, modulo some
rules for atomic PTE updates. I don't believe lock acquisition is a
bottleneck for us quite yet, as break-before-make + lazy splitting
hurts.
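
For reference, the atomic-update rules amount to something like the
sketch below. This is a rough sketch of the break-before-make flavour
described in the cover letter (patches 9-13); KVM_INVALID_PTE_LOCKED and
the helper names should be treated as illustrative:

static bool stage2_try_break_pte(kvm_pte_t *ptep, kvm_pte_t old,
				 struct kvm_s2_mmu *mmu, u64 ipa, u32 level)
{
	/* 'Break': whoever wins the cmpxchg owns this PTE until the 'make'. */
	if (cmpxchg(ptep, old, KVM_INVALID_PTE_LOCKED) != old)
		return false;

	/* Stale translations must be gone before the new mapping appears. */
	if (kvm_pte_valid(old))
		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa, level);

	return true;
}

static void stage2_make_pte(kvm_pte_t *ptep, kvm_pte_t new)
{
	/* 'Make': publish the new PTE; readers see the locked marker or 'new'. */
	smp_store_release(ptep, new);
}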

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
  2022-11-14 17:42         ` Oliver Upton
  (?)
@ 2022-12-05  5:51           ` Mingwei Zhang
  -1 siblings, 0 replies; 156+ messages in thread
From: Mingwei Zhang @ 2022-12-05  5:51 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marek Szyprowski, Marc Zyngier, James Morse, Alexandru Elisei,
	linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm

On Mon, Nov 14, 2022, Oliver Upton wrote:
> Hi Marek,
> 
> On Mon, Nov 14, 2022 at 03:29:14PM +0100, Marek Szyprowski wrote:
> > This patch landed in today's linux-next (20221114) as commit 
> > c3119ae45dfb ("KVM: arm64: Protect stage-2 traversal with RCU"). 
> > Unfortunately it introduces the following warning:
> 
> Thanks for the bug report :) I had failed to test nVHE in the past few
> revisions of this series.
> 
> > --->8---
> > 
> > kvm [1]: IPA Size Limit: 40 bits
> > BUG: sleeping function called from invalid context at 
> > include/linux/sched/mm.h:274
> > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> > preempt_count: 0, expected: 0
> > RCU nest depth: 1, expected: 0
> > 2 locks held by swapper/0/1:
> >   #0: ffff80000a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at: 
> > __create_hyp_mappings+0x80/0xc4
> >   #1: ffff80000a927720 (rcu_read_lock){....}-{1:2}, at: 
> > kvm_pgtable_walk+0x0/0x1f4
> > CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
> > Hardware name: Raspberry Pi 3 Model B (DT)
> > Call trace:
> >   dump_backtrace.part.0+0xe4/0xf0
> >   show_stack+0x18/0x40
> >   dump_stack_lvl+0x8c/0xb8
> >   dump_stack+0x18/0x34
> >   __might_resched+0x178/0x220
> >   __might_sleep+0x48/0xa0
> >   prepare_alloc_pages+0x178/0x1a0
> >   __alloc_pages+0x9c/0x109c
> >   alloc_page_interleave+0x1c/0xc4
> >   alloc_pages+0xec/0x160
> >   get_zeroed_page+0x1c/0x44
> >   kvm_hyp_zalloc_page+0x14/0x20
> >   hyp_map_walker+0xd4/0x134
> >   kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
> >   __kvm_pgtable_walk+0x1a4/0x220
> >   kvm_pgtable_walk+0x104/0x1f4
> >   kvm_pgtable_hyp_map+0x80/0xc4
> >   __create_hyp_mappings+0x9c/0xc4
> >   kvm_mmu_init+0x144/0x1cc
> >   kvm_arch_init+0xe4/0xef4
> >   kvm_init+0x3c/0x3d0
> >   arm_init+0x20/0x30
> >   do_one_initcall+0x74/0x400
> >   kernel_init_freeable+0x2e0/0x350
> >   kernel_init+0x24/0x130
> >   ret_from_fork+0x10/0x20
> > kvm [1]: Hyp mode initialized successfully
> > 
> > --->8----
> > 
> > It looks like more changes in the KVM code are needed to use RCU for that
> > code.
> 
> Right, the specific issue is that while the stage-2 walkers preallocate
> any table memory they may need, the hyp walkers do not and allocate
> inline.
> 
> As hyp stage-1 is protected by a spinlock there is no actual need for
> RCU in that case. I'll post something later on today that addresses the
> issue.
> 

For each stage-2 page table walk, KVM calls
kvm_mmu_topup_memory_cache() before taking the mmu lock, which ensures
that whoever holds the mmu lock never needs to sleep. The hyp walkers
seem to miss this notion completely, which puzzles me. Using a spinlock
only ensures correctness, but it seems quite inefficient if the holder
of the spinlock tries to allocate pages and sleeps...

But that seems to be a separate problem for nVHE. Why do we need an RCU
lock here? Oh, is it for batching?
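
The stage-2 pattern I'm referring to is roughly the following sketch
(the cache depth and the lock flavour depend on the walk, so treat the
constant and the lock calls as illustrative):

static int stage2_fault_sketch(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
	struct kvm *kvm = vcpu->kvm;
	int ret;

	/* May sleep: done before mmu_lock and before any RCU read-side section. */
	ret = kvm_mmu_topup_memory_cache(memcache, KVM_PGTABLE_MAX_LEVELS - 1);
	if (ret)
		return ret;

	write_lock(&kvm->mmu_lock);
	/*
	 * Table pages are now pulled from 'memcache' without sleeping,
	 * e.g. by passing it to kvm_pgtable_stage2_map().
	 */
	write_unlock(&kvm->mmu_lock);

	return 0;
}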

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU
  2022-12-05  5:51           ` Mingwei Zhang
  (?)
@ 2022-12-05  7:47             ` Oliver Upton
  -1 siblings, 0 replies; 156+ messages in thread
From: Oliver Upton @ 2022-12-05  7:47 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Marek Szyprowski, Marc Zyngier, James Morse, Alexandru Elisei,
	linux-arm-kernel, kvmarm, kvm, Reiji Watanabe, Ricardo Koller,
	David Matlack, Quentin Perret, Ben Gardon, Gavin Shan, Peter Xu,
	Will Deacon, Sean Christopherson, kvmarm

Hi Mingwei,

On Mon, Dec 05, 2022 at 05:51:13AM +0000, Mingwei Zhang wrote:
> On Mon, Nov 14, 2022, Oliver Upton wrote:

[...]

> > As hyp stage-1 is protected by a spinlock there is no actual need for
> > RCU in that case. I'll post something later on today that addresses the
> > issue.
> > 
> 
> For each stage-2 page table walk, KVM calls
> kvm_mmu_topup_memory_cache() before taking the mmu lock, which ensures
> that whoever holds the mmu lock won't sleep. The hyp walkers seem to
> miss this notion completely, which puzzles me. Using a spinlock only
> ensures correctness, but it seems quite inefficient if the holder of
> the spinlock then tries to allocate pages and sleep...

You're probably confused by my mischaracterization in the above
paragraph. Hyp stage-1 walkers (outside of pKVM) are guarded with a
mutex and are perfectly able to sleep. The erroneous application of RCU
led to this path becoming non-sleepable, hence the bug.

pKVM's own hyp stage-1 walkers are guarded by a spinlock, but the memory
allocations come from its own allocator and there is no concept of a
scheduler at EL2.
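
Very loosely, the EL2 picture is something like the below (purely
illustrative, names approximate; the real code lives under
arch/arm64/kvm/hyp/nvhe/):

static hyp_spinlock_t pgd_lock;		/* stand-in for pKVM's own lock */
static struct hyp_pool table_pool;	/* pages donated up front */

static void *pkvm_alloc_table_page(void)
{
	void *page;

	hyp_spin_lock(&pgd_lock);
	/*
	 * No scheduler at EL2: "allocating" just carves a page out of a
	 * pool that was donated beforehand, so spinning here is fine and
	 * there is nothing to sleep on anyway.
	 */
	page = hyp_alloc_pages(&table_pool, 0);
	hyp_spin_unlock(&pgd_lock);

	return page;
}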

> Why do we need an RCU lock here? Oh, is it for batching?

We definitely don't need RCU here, so the corrective measure was to
avoid RCU for exclusive table walks:

https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git/commit/?h=next&id=b7833bf202e3068abb77c642a0843f696e9c8d38
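
For the curious, the fix boils down to something like this (sketched
from memory rather than copied from the diff; KVM_PGTABLE_WALK_SHARED
is the walker flag this series introduces for walks done under the
read lock):

static inline void kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
{
	/* Only shared (read-lock) walks need RCU protection. */
	if (walker->flags & KVM_PGTABLE_WALK_SHARED)
		rcu_read_lock();
}

static inline void kvm_pgtable_walk_end(struct kvm_pgtable_walker *walker)
{
	if (walker->flags & KVM_PGTABLE_WALK_SHARED)
		rcu_read_unlock();
}

Exclusive walkers (like the hyp stage-1 path above) therefore stay
sleepable and can keep allocating inline.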

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 156+ messages in thread

end of thread, other threads:[~2022-12-05  7:49 UTC | newest]

Thread overview: 156+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-07 21:56 [PATCH v5 00/14] KVM: arm64: Parallel stage-2 fault handling Oliver Upton
2022-11-07 21:56 ` Oliver Upton
2022-11-07 21:56 ` Oliver Upton
2022-11-07 21:56 ` [PATCH v5 01/14] KVM: arm64: Combine visitor arguments into a context structure Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-09 22:23   ` Ben Gardon
2022-11-09 22:23     ` Ben Gardon
2022-11-09 22:23     ` Ben Gardon
2022-11-09 22:48     ` Oliver Upton
2022-11-09 22:48       ` Oliver Upton
2022-11-09 22:48       ` Oliver Upton
2022-11-10  0:23   ` Gavin Shan
2022-11-10  0:23     ` Gavin Shan
2022-11-10  0:23     ` Gavin Shan
2022-11-10  0:42     ` Oliver Upton
2022-11-10  0:42       ` Oliver Upton
2022-11-10  0:42       ` Oliver Upton
2022-11-10  3:40       ` Gavin Shan
2022-11-10  3:40         ` Gavin Shan
2022-11-10  3:40         ` Gavin Shan
2022-11-07 21:56 ` [PATCH v5 02/14] KVM: arm64: Stash observed pte value in visitor context Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-09 22:23   ` Ben Gardon
2022-11-09 22:23     ` Ben Gardon
2022-11-09 22:23     ` Ben Gardon
2022-11-10  4:55   ` Gavin Shan
2022-11-10  4:55     ` Gavin Shan
2022-11-10  4:55     ` Gavin Shan
2022-11-07 21:56 ` [PATCH v5 03/14] KVM: arm64: Pass mm_ops through the " Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-09 22:23   ` Ben Gardon
2022-11-09 22:23     ` Ben Gardon
2022-11-09 22:23     ` Ben Gardon
2022-11-10  5:22   ` Gavin Shan
2022-11-10  5:22     ` Gavin Shan
2022-11-10  5:22     ` Gavin Shan
2022-11-10  5:30   ` Gavin Shan
2022-11-10  5:30     ` Gavin Shan
2022-11-10  5:30     ` Gavin Shan
2022-11-07 21:56 ` [PATCH v5 04/14] KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-09 22:23   ` Ben Gardon
2022-11-09 22:23     ` Ben Gardon
2022-11-09 22:23     ` Ben Gardon
2022-11-10  5:30   ` Gavin Shan
2022-11-10  5:30     ` Gavin Shan
2022-11-10  5:30     ` Gavin Shan
2022-11-10  5:38     ` Oliver Upton
2022-11-10  5:38       ` Oliver Upton
2022-11-10  5:38       ` Oliver Upton
2022-11-07 21:56 ` [PATCH v5 05/14] KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-09 22:23   ` Ben Gardon
2022-11-09 22:23     ` Ben Gardon
2022-11-09 22:23     ` Ben Gardon
2022-11-09 22:54     ` Oliver Upton
2022-11-09 22:54       ` Oliver Upton
2022-11-09 22:54       ` Oliver Upton
2022-11-07 21:56 ` [PATCH v5 06/14] KVM: arm64: Use an opaque type for pteps Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-09 22:23   ` Ben Gardon
2022-11-09 22:23     ` Ben Gardon
2022-11-09 22:23     ` Ben Gardon
2022-11-07 21:56 ` [PATCH v5 07/14] KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-09 22:24   ` Ben Gardon
2022-11-09 22:24     ` Ben Gardon
2022-11-09 22:24     ` Ben Gardon
2022-11-07 21:56 ` [PATCH v5 08/14] KVM: arm64: Protect stage-2 traversal with RCU Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-09 21:53   ` Sean Christopherson
2022-11-09 21:53     ` Sean Christopherson
2022-11-09 21:53     ` Sean Christopherson
2022-11-09 23:55     ` Oliver Upton
2022-11-09 23:55       ` Oliver Upton
2022-11-09 23:55       ` Oliver Upton
2022-11-15 18:47       ` Ricardo Koller
2022-11-15 18:47         ` Ricardo Koller
2022-11-15 18:47         ` Ricardo Koller
2022-11-15 18:57         ` Oliver Upton
2022-11-15 18:57           ` Oliver Upton
2022-11-15 18:57           ` Oliver Upton
2022-11-09 22:25   ` Ben Gardon
2022-11-09 22:25     ` Ben Gardon
2022-11-09 22:25     ` Ben Gardon
2022-11-10 13:34     ` Marc Zyngier
2022-11-10 13:34       ` Marc Zyngier
2022-11-10 13:34       ` Marc Zyngier
     [not found]   ` <CGME20221114142915eucas1p258f3ca2c536bde712c068e96851468fd@eucas1p2.samsung.com>
2022-11-14 14:29     ` Marek Szyprowski
2022-11-14 14:29       ` Marek Szyprowski
2022-11-14 14:29       ` Marek Szyprowski
2022-11-14 17:42       ` Oliver Upton
2022-11-14 17:42         ` Oliver Upton
2022-11-14 17:42         ` Oliver Upton
2022-12-05  5:51         ` Mingwei Zhang
2022-12-05  5:51           ` Mingwei Zhang
2022-12-05  5:51           ` Mingwei Zhang
2022-12-05  7:47           ` Oliver Upton
2022-12-05  7:47             ` Oliver Upton
2022-12-05  7:47             ` Oliver Upton
2022-11-07 21:56 ` [PATCH v5 09/14] KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-09 22:26   ` Ben Gardon
2022-11-09 22:26     ` Ben Gardon
2022-11-09 22:26     ` Ben Gardon
2022-11-09 22:42     ` Sean Christopherson
2022-11-09 22:42       ` Sean Christopherson
2022-11-09 22:42       ` Sean Christopherson
2022-11-09 23:00       ` Ben Gardon
2022-11-09 23:00         ` Ben Gardon
2022-11-09 23:00         ` Ben Gardon
2022-11-10 13:40         ` Marc Zyngier
2022-11-10 13:40           ` Marc Zyngier
2022-11-10 13:40           ` Marc Zyngier
2022-11-07 21:56 ` [PATCH v5 10/14] KVM: arm64: Split init and set for table PTE Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-07 21:56   ` Oliver Upton
2022-11-09 22:26   ` Ben Gardon
2022-11-09 22:26     ` Ben Gardon
2022-11-09 22:26     ` Ben Gardon
2022-11-09 23:00     ` Oliver Upton
2022-11-09 23:00       ` Oliver Upton
2022-11-09 23:00       ` Oliver Upton
2022-11-07 21:58 ` [PATCH v5 11/14] KVM: arm64: Make block->table PTE changes parallel-aware Oliver Upton
2022-11-07 21:58   ` Oliver Upton
2022-11-07 21:58   ` Oliver Upton
2022-11-09 22:26   ` Ben Gardon
2022-11-09 22:26     ` Ben Gardon
2022-11-09 22:26     ` Ben Gardon
2022-11-09 23:03     ` Oliver Upton
2022-11-09 23:03       ` Oliver Upton
2022-11-09 23:03       ` Oliver Upton
2022-11-07 21:59 ` [PATCH v5 12/14] KVM: arm64: Make leaf->leaf " Oliver Upton
2022-11-07 21:59   ` Oliver Upton
2022-11-07 21:59   ` Oliver Upton
2022-11-09 22:26   ` Ben Gardon
2022-11-09 22:26     ` Ben Gardon
2022-11-09 22:26     ` Ben Gardon
2022-11-07 22:00 ` [PATCH v5 13/14] KVM: arm64: Make table->block " Oliver Upton
2022-11-07 22:00   ` Oliver Upton
2022-11-07 22:00   ` Oliver Upton
2022-11-07 22:00 ` [PATCH v5 14/14] KVM: arm64: Handle stage-2 faults in parallel Oliver Upton
2022-11-07 22:00   ` Oliver Upton
2022-11-07 22:00   ` Oliver Upton
2022-11-11 15:47 ` [PATCH v5 00/14] KVM: arm64: Parallel stage-2 fault handling Marc Zyngier
2022-11-11 15:47   ` Marc Zyngier
2022-11-11 15:47   ` Marc Zyngier
