* [PATCH v4 0/6] KVM: arm64: Add support for FEAT_TLBIRANGE
@ 2023-05-19  0:52 ` Raghavendra Rao Ananta
  0 siblings, 0 replies; 36+ messages in thread
From: Raghavendra Rao Ananta @ 2023-05-19  0:52 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Suzuki K Poulose
  Cc: Ricardo Koller, Paolo Bonzini, Jing Zhang, Colton Lewis,
	Raghavendra Rao Anata, linux-arm-kernel, kvmarm, linux-kernel,
	kvm

In certain code paths, KVM/ARM currently invalidates the TLB entries of
the entire VM instead of just the necessary range. For example, when
collapsing a table PTE into a block PTE, instead of iterating over each
PTE and flushing it, KVM uses the 'vmalls12e1is' TLBI operation to flush
all the entries. This is inefficient since the guest has to refill the
TLBs even for addresses that aren't covered by the table entry. The
performance impact scales poorly when many addresses in the VM go
through this remapping.

For architectures that implement FEAT_TLBIRANGE, KVM can replace such
inefficient paths by performing the invalidations only on the range of
addresses that are in scope. This series does exactly that for the
stage-2 map, unmap, and write-protect paths.
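
As a rough sketch of the idea (the exact helpers are introduced by the
patches below), a table-entry collapse goes from a VM-wide flush to a
range-scoped one:

	/* Before: flush every stage-2 TLB entry for the VMID */
	kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);

	/* After: flush only the IPA range covered by the old table entry */
	kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, ctx->addr,
		     ctx->addr + kvm_granule_size(ctx->level));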

Patch-1 refactors the core arm64's __flush_tlb_range() to be used by
other entities.

Patch-2 adds a range-based TLBI mechanism for KVM (VHE and nVHE).

Patch-3 implements the kvm_arch_flush_remote_tlbs_range() for arm64.

Patch-4 aims to flush only the memslot that undergoes a write-protect,
instead of the entire VM.

Patch-5 changes stage2_try_break_pte() to use the range-based
TLBI instructions when collapsing a table entry. The map path is the
immediate consumer of this when KVM remaps a table entry into a block.

Patch-6 modifies the stage-2 unmap path so that, if the system supports
FEAT_TLBIRANGE, the TLB invalidations are skipped during the page-table
walk. Instead, they are done in one go after the entire walk is finished.

The series is based on upstream v6.4-rc2, with David Matlack's common
API for TLB invalidations [1] applied on top.

The performance evaluation was done on hardware that supports
FEAT_TLBIRANGE, in a VHE configuration, using a modified
kvm_page_table_test. The modified version updates the guest code in the
ADJUST_MAPPINGS case to access not only the current page but also up to
512 pages backwards for every new page it iterates through. This is
done to test the effect of TLBI misses after KVM has handled a fault.
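
A minimal sketch of that guest-side change (purely illustrative; the
names below are placeholders, not the actual selftest code):

	/* Illustrative only: touch a new page, then re-read up to 512 earlier ones. */
	static void guest_touch_pages(uint64_t base_gva, uint64_t nr_pages,
				      uint64_t page_size)
	{
		uint64_t i, j;

		for (i = 0; i < nr_pages; i++) {
			uint64_t gva = base_gva + i * page_size;

			*(volatile uint64_t *)gva = i;	/* fault in the new page */
			for (j = 1; j <= 512 && j <= i; j++)
				(void)*(volatile uint64_t *)(gva - j * page_size);
		}
	}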

The series captures the impact in the map and unmap paths as described
above.

$ kvm_page_table_test -m 2 -v 128 -s anonymous_hugetlb_2mb -b $i

+--------+------------------------------+------------------------------+
| mem_sz |      ADJUST_MAPPINGS (s)     |         Unmap VM (s)         |
|  (GB)  | Baseline | Baseline + series | Baseline | Baseline + series |
+--------+----------+-------------------+----------+-------------------+
|   1    |   3.44   |   2.97            |  0.007   |  0.005            |
|   2    |   5.56   |   5.63            |  0.010   |  0.006            |
|   4    |  11.03   |  10.44            |  0.015   |  0.008            |
|   8    |  24.54   |  19.00            |  0.024   |  0.011            |
|  16    |  40.16   |  36.83            |  0.041   |  0.018            |
|  32    |  75.76   |  73.84            |  0.074   |  0.029            |
|  64    | 151.58   | 152.62            |  0.148   |  0.050            |
| 128    | 330.42   | 306.86            |  0.280   |  0.090            |
+--------+----------+-------------------+----------+-------------------+

$ kvm_page_table_test -m 2 -b 128G -s anonymous_hugetlb_2mb -v $i

+--------+------------------------------+
| vCPUs  |      ADJUST_MAPPINGS (s)     |
|        | Baseline | Baseline + series |
+--------+----------+-------------------+
|   1    | 138.69   | 135.58            |
|   2    | 138.77   | 137.54            |
|   4    | 162.57   | 135.82            |
|   8    | 154.92   | 143.67            |
|  16    | 122.02   | 118.86            |
|  32    | 119.99   | 118.81            |
|  64    | 190.70   | 169.36            |
| 128    | 330.42   | 306.86            |
+--------+----------+-------------------+

For the ADJUST_MAPPINGS cases, which map the 4K table entries back to
2M hugepages, the series shows an average improvement of ~7%. For
unmapping 2M hugepages, we see at least a 3x improvement.

$ kvm_page_table_test -m 2 -b $i

+--------+------------------------------+
| mem_sz |         Unmap VM (s)         |
|  (GB)  | Baseline | Baseline + series |
+--------+----------+-------------------+
|   1    |  0.52    |  0.13             |
|   2    |  1.03    |  0.25             |
|   4    |  2.04    |  0.47             |
|   8    |  4.05    |  0.94             |
|  16    |  8.11    |  1.82             |
|  32    | 16.11    |  3.69             |
|  64    | 32.35    |  7.22             |
| 128    | 64.66    | 14.69             |
+--------+----------+-------------------+

The series shows an average gain of 4x when the guest is backed by
PAGE_SIZE (4K) pages.

v4:
Thanks again, Oliver, for all the comments.
- Updated the __kvm_tlb_flush_vmid_range() implementation for
  nVHE to adjust to the modified __tlb_switch_to_guest() that
  accepts a new 'bool nsh' arg.
- Renamed stage2_put_pte() to stage2_unmap_put_pte() and removed
  the 'skip_flush' argument.
- Defined stage2_unmap_defer_tlb_flush() to check if the PTE
  flushes can be deferred during the unmap table walk. It's
  being called from stage2_unmap_put_pte() and
  kvm_pgtable_stage2_unmap().
- Got rid of the 'struct stage2_unmap_data'.

v3:
https://lore.kernel.org/all/20230414172922.812640-1-rananta@google.com/
Thanks, Oliver, for all the suggestions.
- The core flush API (__kvm_tlb_flush_vmid_range()) now checks whether
  the system supports FEAT_TLBIRANGE, thus eliminating the
  redundancy in the upper layers.
- If FEAT_TLBIRANGE is not supported, the implementation falls
  back to invalidating all the TLB entries with the VMID, instead
  of doing an iterative flush for the range.
- kvm_arch_flush_remote_tlbs_range() doesn't return -EOPNOTSUPP
  if the system doesn't implement FEAT_TLBIRANGE. It relies on
  __kvm_tlb_flush_vmid_range() to take care of that decision
  and returns 0 regardless of the underlying feature support.
- __kvm_tlb_flush_vmid_range() doesn't take 'level' as input to
  calculate the 'stride'. Instead, it always assumes PAGE_SIZE.
- Fast unmap path is eliminated. Instead, the existing unmap walker
  is modified to skip the TLBIs during the walk, and do it all at
  once after the walk, using the range-based instructions.

v2:
https://lore.kernel.org/all/20230206172340.2639971-1-rananta@google.com/
- Rebased the series on top of David Matlack's series for common
  TLB invalidation API[1].
- Implement kvm_arch_flush_remote_tlbs_range() for arm64, by extending
  the support introduced by [1].
- Use kvm_flush_remote_tlbs_memslot() introduced by [1] to flush
  only the current memslot after write-protect.
- Modified the __kvm_tlb_flush_range() macro to accept 'level' as an
  argument to calculate the 'stride' instead of just using PAGE_SIZE.
- Split the introduction of the range-based TLBI to KVM and the
  implementation of the IPA-based invalidation into their own patches.
- Dropped the patch that tries to optimize the mmu notifiers paths.
- Rename the function kvm_table_pte_flush() to
  kvm_pgtable_stage2_flush_range(), and accept the range of addresses to
  flush. [Oliver]
- Drop the 'tlb_level' argument for stage2_try_break_pte() and directly
  pass '0' as 'tlb_level' to kvm_pgtable_stage2_flush_range(). [Oliver]

v1:
https://lore.kernel.org/all/20230109215347.3119271-1-rananta@google.com/

Thank you.
Raghavendra

[1]:
https://lore.kernel.org/linux-arm-kernel/20230126184025.2294823-1-dmatlack@google.com/

Raghavendra Rao Ananta (6):
  arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range
  KVM: arm64: Implement  __kvm_tlb_flush_vmid_range()
  KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
  KVM: arm64: Flush only the memslot after write-protect
  KVM: arm64: Invalidate the table entries upon a range
  KVM: arm64: Use TLBI range-based instructions for unmap

 arch/arm64/include/asm/kvm_asm.h   |   3 +
 arch/arm64/include/asm/kvm_host.h  |   3 +
 arch/arm64/include/asm/tlbflush.h  | 108 +++++++++++++++--------------
 arch/arm64/kvm/hyp/nvhe/hyp-main.c |  11 +++
 arch/arm64/kvm/hyp/nvhe/tlb.c      |  37 ++++++++++
 arch/arm64/kvm/hyp/pgtable.c       |  44 +++++++++---
 arch/arm64/kvm/hyp/vhe/tlb.c       |  35 ++++++++++
 arch/arm64/kvm/mmu.c               |  13 +++-
 8 files changed, 192 insertions(+), 62 deletions(-)

-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply	[flat|nested] 36+ messages in thread


* [PATCH v4 1/6] arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range
  2023-05-19  0:52 ` Raghavendra Rao Ananta
@ 2023-05-19  0:52   ` Raghavendra Rao Ananta
  -1 siblings, 0 replies; 36+ messages in thread
From: Raghavendra Rao Ananta @ 2023-05-19  0:52 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Suzuki K Poulose
  Cc: Ricardo Koller, Paolo Bonzini, Jing Zhang, Colton Lewis,
	Raghavendra Rao Anata, linux-arm-kernel, kvmarm, linux-kernel,
	kvm, Catalin Marinas

Currently, the core TLB flush functionality of __flush_tlb_range()
hardcodes vae1is (and variants) for the flush operation. In the
upcoming patches, the KVM code reuses this core algorithm with
ipas2e1is for range-based TLB invalidations based on the IPA.
Hence, extract the core flush functionality of __flush_tlb_range()
into its own macro that accepts an 'op' argument to pass any
TLBI operation, so that other callers (KVM) can benefit.

No functional changes intended.
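
For illustration, the refactored macro ends up being invoked as below:
the vale1is/vae1is forms are the callers in this patch, while the
ipas2e1is form (no ASID, PAGE_SIZE stride) is how the KVM helper added
later in this series uses it:

	/* Host VA flushes from __flush_tlb_range() */
	__flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true);
	__flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true);

	/* Stage-2 IPA flush from __kvm_tlb_flush_vmid_range() (patch 2) */
	__flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);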

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/tlbflush.h | 108 +++++++++++++++---------------
 1 file changed, 55 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 412a3b9a3c25d..4775378b6da1b 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -278,14 +278,61 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
  */
 #define MAX_TLBI_OPS	PTRS_PER_PTE
 
+/* When the CPU does not support TLB range operations, flush the TLB
+ * entries one by one at the granularity of 'stride'. If the TLB
+ * range ops are supported, then:
+ *
+ * 1. If 'pages' is odd, flush the first page through non-range
+ *    operations;
+ *
+ * 2. For remaining pages: the minimum range granularity is decided
+ *    by 'scale', so multiple range TLBI operations may be required.
+ *    Start from scale = 0, flush the corresponding number of pages
+ *    ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
+ *    until no pages left.
+ *
+ * Note that certain ranges can be represented by either num = 31 and
+ * scale or num = 0 and scale + 1. The loop below favours the latter
+ * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
+ */
+#define __flush_tlb_range_op(op, start, pages, stride,			\
+				asid, tlb_level, tlbi_user) do {	\
+	int num = 0;							\
+	int scale = 0;							\
+	unsigned long addr;						\
+									\
+	while (pages > 0) {						\
+		if (!system_supports_tlb_range() ||			\
+		    pages % 2 == 1) {					\
+			addr = __TLBI_VADDR(start, asid);		\
+			__tlbi_level(op, addr, tlb_level);		\
+			if (tlbi_user)					\
+				__tlbi_user_level(op, addr, tlb_level);	\
+			start += stride;				\
+			pages -= stride >> PAGE_SHIFT;			\
+			continue;					\
+		}							\
+									\
+		num = __TLBI_RANGE_NUM(pages, scale);			\
+		if (num >= 0) {						\
+			addr = __TLBI_VADDR_RANGE(start, asid, scale,	\
+						  num, tlb_level);	\
+			__tlbi(r##op, addr);				\
+			if (tlbi_user)					\
+				__tlbi_user(r##op, addr);		\
+			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
+			pages -= __TLBI_RANGE_PAGES(num, scale);	\
+		}							\
+		scale++;						\
+	}								\
+} while (0)
+
 static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long start, unsigned long end,
 				     unsigned long stride, bool last_level,
 				     int tlb_level)
 {
-	int num = 0;
-	int scale = 0;
-	unsigned long asid, addr, pages;
+	unsigned long asid, pages;
 
 	start = round_down(start, stride);
 	end = round_up(end, stride);
@@ -307,56 +354,11 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 	dsb(ishst);
 	asid = ASID(vma->vm_mm);
 
-	/*
-	 * When the CPU does not support TLB range operations, flush the TLB
-	 * entries one by one at the granularity of 'stride'. If the TLB
-	 * range ops are supported, then:
-	 *
-	 * 1. If 'pages' is odd, flush the first page through non-range
-	 *    operations;
-	 *
-	 * 2. For remaining pages: the minimum range granularity is decided
-	 *    by 'scale', so multiple range TLBI operations may be required.
-	 *    Start from scale = 0, flush the corresponding number of pages
-	 *    ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
-	 *    until no pages left.
-	 *
-	 * Note that certain ranges can be represented by either num = 31 and
-	 * scale or num = 0 and scale + 1. The loop below favours the latter
-	 * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
-	 */
-	while (pages > 0) {
-		if (!system_supports_tlb_range() ||
-		    pages % 2 == 1) {
-			addr = __TLBI_VADDR(start, asid);
-			if (last_level) {
-				__tlbi_level(vale1is, addr, tlb_level);
-				__tlbi_user_level(vale1is, addr, tlb_level);
-			} else {
-				__tlbi_level(vae1is, addr, tlb_level);
-				__tlbi_user_level(vae1is, addr, tlb_level);
-			}
-			start += stride;
-			pages -= stride >> PAGE_SHIFT;
-			continue;
-		}
-
-		num = __TLBI_RANGE_NUM(pages, scale);
-		if (num >= 0) {
-			addr = __TLBI_VADDR_RANGE(start, asid, scale,
-						  num, tlb_level);
-			if (last_level) {
-				__tlbi(rvale1is, addr);
-				__tlbi_user(rvale1is, addr);
-			} else {
-				__tlbi(rvae1is, addr);
-				__tlbi_user(rvae1is, addr);
-			}
-			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
-			pages -= __TLBI_RANGE_PAGES(num, scale);
-		}
-		scale++;
-	}
+	if (last_level)
+		__flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true);
+	else
+		__flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true);
+
 	dsb(ish);
 }
 
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread


* [PATCH v4 2/6] KVM: arm64: Implement  __kvm_tlb_flush_vmid_range()
  2023-05-19  0:52 ` Raghavendra Rao Ananta
@ 2023-05-19  0:52   ` Raghavendra Rao Ananta
  -1 siblings, 0 replies; 36+ messages in thread
From: Raghavendra Rao Ananta @ 2023-05-19  0:52 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Suzuki K Poulose
  Cc: Ricardo Koller, Paolo Bonzini, Jing Zhang, Colton Lewis,
	Raghavendra Rao Anata, linux-arm-kernel, kvmarm, linux-kernel,
	kvm

Define __kvm_tlb_flush_vmid_range() (for VHE and nVHE)
to flush the stage-2 TLB entries over a range of IPAs in one go.
If the system supports FEAT_TLBIRANGE, the following patches
can then conveniently replace global TLBIs such as vmalls12e1is
in the map, unmap, and dirty-logging paths with ripas2e1is.
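
As a usage sketch, the host side reaches this helper through the
hypercall interface, passing an IPA range, exactly as the later patches
in this series do:

	/* Flush the stage-2 TLB entries covering [start, end) for this MMU */
	kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, start, end);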

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
---
 arch/arm64/include/asm/kvm_asm.h   |  3 +++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 11 +++++++++
 arch/arm64/kvm/hyp/nvhe/tlb.c      | 39 ++++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/vhe/tlb.c       | 35 +++++++++++++++++++++++++++
 4 files changed, 88 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 43c3bc0f9544d..33352d9399e32 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -79,6 +79,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
 	__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
+	__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_range,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
@@ -225,6 +226,8 @@ extern void __kvm_flush_vm_context(void);
 extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
 				     int level);
+extern void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
+					phys_addr_t start, phys_addr_t end);
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
 
 extern void __kvm_timer_set_cntvoff(u64 cntvoff);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 728e01d4536b0..81d30737dc7c9 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -125,6 +125,16 @@ static void handle___kvm_tlb_flush_vmid_ipa(struct kvm_cpu_context *host_ctxt)
 	__kvm_tlb_flush_vmid_ipa(kern_hyp_va(mmu), ipa, level);
 }
 
+static void
+handle___kvm_tlb_flush_vmid_range(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
+	DECLARE_REG(phys_addr_t, start, host_ctxt, 2);
+	DECLARE_REG(phys_addr_t, end, host_ctxt, 3);
+
+	__kvm_tlb_flush_vmid_range(kern_hyp_va(mmu), start, end);
+}
+
 static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
@@ -315,6 +325,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_flush_vm_context),
 	HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
+	HANDLE_FUNC(__kvm_tlb_flush_vmid_range),
 	HANDLE_FUNC(__kvm_tlb_flush_vmid),
 	HANDLE_FUNC(__kvm_flush_cpu_context),
 	HANDLE_FUNC(__kvm_timer_set_cntvoff),
diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
index 978179133f4b9..d4ea549c4b5c4 100644
--- a/arch/arm64/kvm/hyp/nvhe/tlb.c
+++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
@@ -130,6 +130,45 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
 	__tlb_switch_to_host(&cxt);
 }
 
+void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
+				phys_addr_t start, phys_addr_t end)
+{
+	struct tlb_inv_context cxt;
+	unsigned long pages, stride;
+
+	/*
+	 * Since the range of addresses may not be mapped at
+	 * the same level, assume the worst case as PAGE_SIZE
+	 */
+	stride = PAGE_SIZE;
+	start = round_down(start, stride);
+	end = round_up(end, stride);
+	pages = (end - start) >> PAGE_SHIFT;
+
+	if (!system_supports_tlb_range() || pages >= MAX_TLBI_RANGE_PAGES) {
+		__kvm_tlb_flush_vmid(mmu);
+		return;
+	}
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	__tlb_switch_to_guest(mmu, &cxt);
+
+	__flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
+
+	dsb(ish);
+	__tlbi(vmalle1is);
+	dsb(ish);
+	isb();
+
+	/* See the comment below in __kvm_tlb_flush_vmid_ipa() */
+	if (icache_is_vpipt())
+		icache_inval_all_pou();
+
+	__tlb_switch_to_host(&cxt);
+}
+
 void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
 {
 	struct tlb_inv_context cxt;
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index 24cef9b87f9e9..f34d6dd9e4674 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -111,6 +111,41 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
 	__tlb_switch_to_host(&cxt);
 }
 
+void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
+				phys_addr_t start, phys_addr_t end)
+{
+	struct tlb_inv_context cxt;
+	unsigned long pages, stride;
+
+	/*
+	 * Since the range of addresses may not be mapped at
+	 * the same level, assume the worst case as PAGE_SIZE
+	 */
+	stride = PAGE_SIZE;
+	start = round_down(start, stride);
+	end = round_up(end, stride);
+	pages = (end - start) >> PAGE_SHIFT;
+
+	if (!system_supports_tlb_range() || pages >= MAX_TLBI_RANGE_PAGES) {
+		__kvm_tlb_flush_vmid(mmu);
+		return;
+	}
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	__tlb_switch_to_guest(mmu, &cxt);
+
+	__flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
+
+	dsb(ish);
+	__tlbi(vmalle1is);
+	dsb(ish);
+	isb();
+
+	__tlb_switch_to_host(&cxt);
+}
+
 void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
 {
 	struct tlb_inv_context cxt;
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread


* [PATCH v4 3/6] KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
  2023-05-19  0:52 ` Raghavendra Rao Ananta
@ 2023-05-19  0:52   ` Raghavendra Rao Ananta
  -1 siblings, 0 replies; 36+ messages in thread
From: Raghavendra Rao Ananta @ 2023-05-19  0:52 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Suzuki K Poulose
  Cc: Ricardo Koller, Paolo Bonzini, Jing Zhang, Colton Lewis,
	Raghavendra Rao Anata, linux-arm-kernel, kvmarm, linux-kernel,
	kvm

Implement kvm_arch_flush_remote_tlbs_range() for arm64
to invalidate the given range in the TLB.
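
As a quick worked example of the GFN-to-IPA conversion below, assuming
4K pages (PAGE_SHIFT = 12), a call with start_gfn = 0x100 and pages = 16
flushes the IPA range [0x100000, 0x110000):

	start = start_gfn << PAGE_SHIFT;		/* 0x100 << 12 = 0x100000 */
	end   = (start_gfn + pages) << PAGE_SHIFT;	/* 0x110 << 12 = 0x110000 */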

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kvm/hyp/nvhe/tlb.c     |  4 +---
 arch/arm64/kvm/mmu.c              | 11 +++++++++++
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 81ab41b84f436..343fb530eea9c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1081,6 +1081,9 @@ struct kvm *kvm_arch_alloc_vm(void);
 #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS
 int kvm_arch_flush_remote_tlbs(struct kvm *kvm);
 
+#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
+int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
+
 static inline bool kvm_vm_is_protected(struct kvm *kvm)
 {
 	return false;
diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
index d4ea549c4b5c4..d2c7c1bc6d441 100644
--- a/arch/arm64/kvm/hyp/nvhe/tlb.c
+++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
@@ -150,10 +150,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
 		return;
 	}
 
-	dsb(ishst);
-
 	/* Switch to requested VMID */
-	__tlb_switch_to_guest(mmu, &cxt);
+	__tlb_switch_to_guest(mmu, &cxt, false);
 
 	__flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d0a0d3dca9316..e3673b4c10292 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -92,6 +92,17 @@ int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
 	return 0;
 }
 
+int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
+{
+	phys_addr_t start, end;
+
+	start = start_gfn << PAGE_SHIFT;
+	end = (start_gfn + pages) << PAGE_SHIFT;
+
+	kvm_call_hyp(__kvm_tlb_flush_vmid_range, &kvm->arch.mmu, start, end);
+	return 0;
+}
+
 static bool kvm_is_device_pfn(unsigned long pfn)
 {
 	return !pfn_is_map_memory(pfn);
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread


* [PATCH v4 4/6] KVM: arm64: Flush only the memslot after write-protect
  2023-05-19  0:52 ` Raghavendra Rao Ananta
@ 2023-05-19  0:52   ` Raghavendra Rao Ananta
  -1 siblings, 0 replies; 36+ messages in thread
From: Raghavendra Rao Ananta @ 2023-05-19  0:52 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Suzuki K Poulose
  Cc: Ricardo Koller, Paolo Bonzini, Jing Zhang, Colton Lewis,
	Raghavendra Rao Anata, linux-arm-kernel, kvmarm, linux-kernel,
	kvm

After write-protecting the region, KVM currently invalidates all of
the TLB entries using kvm_flush_remote_tlbs(). Instead, scope the
invalidation to the targeted memslot only. If supported, the
architecture will use the range-based TLBI instructions to flush the
memslot, or else fall back to flushing all of the TLBs.
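
For reference, kvm_flush_remote_tlbs_memslot() comes from the common
API in [1]; a rough sketch of its shape (an assumption about that
series, not code from this patch) is that it forwards the memslot's
GFN range to the range-flush hook implemented for arm64 in patch 3:

	/* Hypothetical sketch of what the common helper boils down to */
	static void memslot_flush_sketch(struct kvm *kvm,
					 const struct kvm_memory_slot *memslot)
	{
		/* Ends up in kvm_arch_flush_remote_tlbs_range() from patch 3 */
		kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
	}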

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
---
 arch/arm64/kvm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e3673b4c10292..2ea6eb4ea763e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -996,7 +996,7 @@ static void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	write_lock(&kvm->mmu_lock);
 	stage2_wp_range(&kvm->arch.mmu, start, end);
 	write_unlock(&kvm->mmu_lock);
-	kvm_flush_remote_tlbs(kvm);
+	kvm_flush_remote_tlbs_memslot(kvm, memslot);
 }
 
 /**
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread


* [PATCH v4 5/6] KVM: arm64: Invalidate the table entries upon a range
  2023-05-19  0:52 ` Raghavendra Rao Ananta
@ 2023-05-19  0:52   ` Raghavendra Rao Ananta
  -1 siblings, 0 replies; 36+ messages in thread
From: Raghavendra Rao Ananta @ 2023-05-19  0:52 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Suzuki K Poulose
  Cc: Ricardo Koller, Paolo Bonzini, Jing Zhang, Colton Lewis,
	Raghavendra Rao Anata, linux-arm-kernel, kvmarm, linux-kernel,
	kvm

Currently, during operations such as a hugepage collapse, KVM
flushes the entire VM's context using the 'vmalls12e1is' TLBI
operation. Specifically, if the VM is faulting on many hugepages
(say, after dirty-logging), this creates a performance penalty for
the guest: pages that were already faulted in earlier have to be
refilled in the TLBs again.

Instead, call __kvm_tlb_flush_vmid_range() for table entries.
If the system supports it, only the required range will be
flushed. Otherwise, it falls back to the previous mechanism.
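
As a concrete example of the range that results (assuming a 4K granule,
where a level-2 table entry covers 2MB), collapsing a level-2 table
entry at IPA 'addr' now issues:

	u64 end = addr + kvm_granule_size(2);	/* addr + 2MB */

	kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, addr, end);

rather than the VM-wide __kvm_tlb_flush_vmid().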

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 3d61bd3e591d2..b8f0dbd12f773 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -745,10 +745,13 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
 	 * Perform the appropriate TLB invalidation based on the evicted pte
 	 * value (if any).
 	 */
-	if (kvm_pte_table(ctx->old, ctx->level))
-		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
-	else if (kvm_pte_valid(ctx->old))
+	if (kvm_pte_table(ctx->old, ctx->level)) {
+		u64 end = ctx->addr + kvm_granule_size(ctx->level);
+
+		kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, ctx->addr, end);
+	} else if (kvm_pte_valid(ctx->old)) {
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
+	}
 
 	if (stage2_pte_is_counted(ctx->old))
 		mm_ops->put_page(ctx->ptep);
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread


* [PATCH v4 6/6] KVM: arm64: Use TLBI range-based instructions for unmap
  2023-05-19  0:52 ` Raghavendra Rao Ananta
@ 2023-05-19  0:52   ` Raghavendra Rao Ananta
  -1 siblings, 0 replies; 36+ messages in thread
From: Raghavendra Rao Ananta @ 2023-05-19  0:52 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier, James Morse, Suzuki K Poulose
  Cc: Ricardo Koller, Paolo Bonzini, Jing Zhang, Colton Lewis,
	Raghavendra Rao Anata, linux-arm-kernel, kvmarm, linux-kernel,
	kvm

The current implementation of the stage-2 unmap walker traverses
the given range and, as a part of break-before-make, performs
a TLB invalidation with a DSB for every PTE. Repeating this
combination for every PTE can become a performance bottleneck.

Hence, if the system supports FEAT_TLBIRANGE, defer the TLB
invalidations until the entire walk is finished, and then
use range-based instructions to invalidate the TLBs in one go.
Condition this upon S2FWB in order to avoid walking the page-table
again to perform the CMOs after issuing the TLBI.

Rename stage2_put_pte() to stage2_unmap_put_pte() as the function
now serves the stage-2 unmap walker specifically, rather than
being generic.
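
Put together, the unmap flow becomes roughly the following (a summary
of the hunks below, not additional code): the walker skips the per-PTE
TLBI whenever the flush can be deferred, and a single ranged
invalidation covers the whole region afterwards:

	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
	if (stage2_unmap_defer_tlb_flush(pgt))
		/* Perform the deferred TLB invalidations */
		kvm_call_hyp(__kvm_tlb_flush_vmid_range, pgt->mmu,
				addr, addr + size);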

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b8f0dbd12f773..5832ee3418fb0 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -771,16 +771,34 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
 	smp_store_release(ctx->ptep, new);
 }
 
-static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
-			   struct kvm_pgtable_mm_ops *mm_ops)
+static bool stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
 {
+	/*
+	 * If FEAT_TLBIRANGE is implemented, defer the individual PTE
+	 * TLB invalidations until the entire walk is finished, and
+	 * then use the range-based TLBI instructions to do the
+	 * invalidations. Condition this upon S2FWB in order to avoid
+	 * a page-table walk again to perform the CMOs after TLBI.
+	 */
+	return system_supports_tlb_range() && stage2_has_fwb(pgt);
+}
+
+static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
+				struct kvm_s2_mmu *mmu,
+				struct kvm_pgtable_mm_ops *mm_ops)
+{
+	struct kvm_pgtable *pgt = ctx->arg;
+
 	/*
 	 * Clear the existing PTE, and perform break-before-make with
 	 * TLB maintenance if it was valid.
 	 */
 	if (kvm_pte_valid(ctx->old)) {
 		kvm_clear_pte(ctx->ptep);
-		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
+
+		if (!stage2_unmap_defer_tlb_flush(pgt))
+			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
+					ctx->addr, ctx->level);
 	}
 
 	mm_ops->put_page(ctx->ptep);
@@ -1015,7 +1033,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	 * block entry and rely on the remaining portions being faulted
 	 * back lazily.
 	 */
-	stage2_put_pte(ctx, mmu, mm_ops);
+	stage2_unmap_put_pte(ctx, mmu, mm_ops);
 
 	if (need_flush && mm_ops->dcache_clean_inval_poc)
 		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
@@ -1029,13 +1047,20 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
 
 int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
+	int ret;
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_unmap_walker,
 		.arg	= pgt,
 		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
 	};
 
-	return kvm_pgtable_walk(pgt, addr, size, &walker);
+	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
+	if (stage2_unmap_defer_tlb_flush(pgt))
+		/* Perform the deferred TLB invalidations */
+		kvm_call_hyp(__kvm_tlb_flush_vmid_range, pgt->mmu,
+				addr, addr + size);
+
+	return ret;
 }
 
 struct stage2_attr_data {
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 6/6] KVM: arm64: Use TLBI range-based instructions for unmap
  2023-05-19  0:52   ` Raghavendra Rao Ananta
@ 2023-05-21 19:32     ` Oliver Upton
  -1 siblings, 0 replies; 36+ messages in thread
From: Oliver Upton @ 2023-05-21 19:32 UTC (permalink / raw)
  To: Raghavendra Rao Ananta
  Cc: Marc Zyngier, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis, linux-arm-kernel,
	kvmarm, linux-kernel, kvm

On Fri, May 19, 2023 at 12:52:31AM +0000, Raghavendra Rao Ananta wrote:
> The current implementation of the stage-2 unmap walker traverses
> the given range and, as a part of break-before-make, performs
> TLB invalidations with a DSB for every PTE. Repeated over a large
> range, this combination can become a performance bottleneck.
> 
> Hence, if the system supports FEAT_TLBIRANGE, defer the TLB
> invalidations until the entire walk is finished, and then
> use range-based instructions to invalidate the TLBs in one go.
> Condition this upon S2FWB in order to avoid walking the page-table
> again to perform the CMOs after issuing the TLBI.

nit: Rather than discussing a theoretical CMO walker, I think this is
more readable if you mention the existing behavior of the walker.

  Condition deferred TLB invalidation on the system supporting FWB, as
  the optimization is entirely pointless when the unmap walker needs to
  perform CMOs.

> Rename stage2_put_pte() to stage2_unmap_put_pte() as the function
> now serves the stage-2 unmap walker specifically, rather than
> acting generic.
> 
> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 35 ++++++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index b8f0dbd12f773..5832ee3418fb0 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -771,16 +771,34 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
>  	smp_store_release(ctx->ptep, new);
>  }
>  
> -static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> -			   struct kvm_pgtable_mm_ops *mm_ops)
> +static bool stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
>  {
> +	/*
> +	 * If FEAT_TLBIRANGE is implemented, defer the individial PTE

typo: individual

Also, 'PTE' isn't significant here.

> +	 * TLB invalidations until the entire walk is finished, and
> +	 * then use the range-based TLBI instructions to do the
> +	 * invalidations. Condition this upon S2FWB in order to avoid
> +	 * a page-table walk again to perform the CMOs after TLBI.
> +	 */

Apply the wording suggestion from the changelog here as well.

> +	return system_supports_tlb_range() && stage2_has_fwb(pgt);
> +}
> +
> +static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
> +				struct kvm_s2_mmu *mmu,
> +				struct kvm_pgtable_mm_ops *mm_ops)
> +{
> +	struct kvm_pgtable *pgt = ctx->arg;
> +
>  	/*
>  	 * Clear the existing PTE, and perform break-before-make with
>  	 * TLB maintenance if it was valid.
>  	 */
>  	if (kvm_pte_valid(ctx->old)) {
>  		kvm_clear_pte(ctx->ptep);
> -		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> +
> +		if (!stage2_unmap_defer_tlb_flush(pgt))
> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
> +					ctx->addr, ctx->level);
>  	}
>  
>  	mm_ops->put_page(ctx->ptep);
> @@ -1015,7 +1033,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  	 * block entry and rely on the remaining portions being faulted
>  	 * back lazily.
>  	 */
> -	stage2_put_pte(ctx, mmu, mm_ops);
> +	stage2_unmap_put_pte(ctx, mmu, mm_ops);
>  
>  	if (need_flush && mm_ops->dcache_clean_inval_poc)
>  		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
> @@ -1029,13 +1047,20 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  
>  int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  {
> +	int ret;
>  	struct kvm_pgtable_walker walker = {
>  		.cb	= stage2_unmap_walker,
>  		.arg	= pgt,
>  		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>  	};
>  
> -	return kvm_pgtable_walk(pgt, addr, size, &walker);
> +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> +	if (stage2_unmap_defer_tlb_flush(pgt))
> +		/* Perform the deferred TLB invalidations */
> +		kvm_call_hyp(__kvm_tlb_flush_vmid_range, pgt->mmu,
> +				addr, addr + size);
> +
> +	return ret;
>  }
>  
>  struct stage2_attr_data {
> -- 
> 2.40.1.698.g37aff9b760-goog
> 

-- 
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 2/6] KVM: arm64: Implement __kvm_tlb_flush_vmid_range()
  2023-05-19  0:52   ` Raghavendra Rao Ananta
@ 2023-05-29 13:54     ` Marc Zyngier
  -1 siblings, 0 replies; 36+ messages in thread
From: Marc Zyngier @ 2023-05-29 13:54 UTC (permalink / raw)
  To: Raghavendra Rao Ananta
  Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis, linux-arm-kernel,
	kvmarm, linux-kernel, kvm

On Fri, 19 May 2023 01:52:27 +0100,
Raghavendra Rao Ananta <rananta@google.com> wrote:
> 
> Define  __kvm_tlb_flush_vmid_range() (for VHE and nVHE)
> to flush a range of stage-2 page-tables using IPA in one go.
> If the system supports FEAT_TLBIRANGE, the following patches
> would conveniently replace global TLBI such as vmalls12e1is
> in the map, unmap, and dirty-logging paths with ripas2e1is
> instead.
> 
> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h   |  3 +++
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 11 +++++++++
>  arch/arm64/kvm/hyp/nvhe/tlb.c      | 39 ++++++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/vhe/tlb.c       | 35 +++++++++++++++++++++++++++
>  4 files changed, 88 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 43c3bc0f9544d..33352d9399e32 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -79,6 +79,7 @@ enum __kvm_host_smccc_func {
>  	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
> +	__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_range,

nit: please keep this close to the other TLB operations.
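
For instance, keeping it with the existing TLB entries (a sketch of the
ordering only; the exact neighbours depend on the tree this lands on):

	__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_ipa,
	__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_range,
	__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid,
	__KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context,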

>  };
>  
>  #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
> @@ -225,6 +226,8 @@ extern void __kvm_flush_vm_context(void);
>  extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
>  				     int level);
> +extern void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> +					phys_addr_t start, phys_addr_t end);
>  extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
>  
>  extern void __kvm_timer_set_cntvoff(u64 cntvoff);
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 728e01d4536b0..81d30737dc7c9 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -125,6 +125,16 @@ static void handle___kvm_tlb_flush_vmid_ipa(struct kvm_cpu_context *host_ctxt)
>  	__kvm_tlb_flush_vmid_ipa(kern_hyp_va(mmu), ipa, level);
>  }
>  
> +static void
> +handle___kvm_tlb_flush_vmid_range(struct kvm_cpu_context *host_ctxt)
> +{
> +	DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
> +	DECLARE_REG(phys_addr_t, start, host_ctxt, 2);
> +	DECLARE_REG(phys_addr_t, end, host_ctxt, 3);
> +
> +	__kvm_tlb_flush_vmid_range(kern_hyp_va(mmu), start, end);
> +}
> +
>  static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
>  {
>  	DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
> @@ -315,6 +325,7 @@ static const hcall_t host_hcall[] = {
>  	HANDLE_FUNC(__kvm_vcpu_run),
>  	HANDLE_FUNC(__kvm_flush_vm_context),
>  	HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
> +	HANDLE_FUNC(__kvm_tlb_flush_vmid_range),
>  	HANDLE_FUNC(__kvm_tlb_flush_vmid),
>  	HANDLE_FUNC(__kvm_flush_cpu_context),
>  	HANDLE_FUNC(__kvm_timer_set_cntvoff),
> diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> index 978179133f4b9..d4ea549c4b5c4 100644
> --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> @@ -130,6 +130,45 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
>  	__tlb_switch_to_host(&cxt);
>  }
>  
> +void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> +				phys_addr_t start, phys_addr_t end)
> +{
> +	struct tlb_inv_context cxt;
> +	unsigned long pages, stride;
> +
> +	/*
> +	 * Since the range of addresses may not be mapped at
> +	 * the same level, assume the worst case as PAGE_SIZE
> +	 */
> +	stride = PAGE_SIZE;
> +	start = round_down(start, stride);
> +	end = round_up(end, stride);
> +	pages = (end - start) >> PAGE_SHIFT;
> +
> +	if (!system_supports_tlb_range() || pages >= MAX_TLBI_RANGE_PAGES) {
> +		__kvm_tlb_flush_vmid(mmu);
> +		return;

Why do we give up on "pages >= MAX_TLBI_RANGE_PAGES"? I see no
rationale for it in the patch. My understanding is that this is the
maximum representable as a range, in which case this is a programming
error.

Or are you *on purpose* making the two equivalent?
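
(For reference, MAX_TLBI_RANGE_PAGES is the largest page count a single
range operation can encode. The snippet below paraphrases the definitions
in arch/arm64/include/asm/tlbflush.h around this series, so treat it as an
illustrative sketch rather than a verbatim quote:)

	/* A range TLBI encodes NUM (0..31) and SCALE (0..3) */
	#define __TLBI_RANGE_PAGES(num, scale)	((unsigned long)((num) + 1) << (5 * (scale) + 1))
	#define MAX_TLBI_RANGE_PAGES		__TLBI_RANGE_PAGES(31, 3)
	/* i.e. 32 << 16 = 2M pages, or 8GiB of IPA space with 4K pages */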

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 3/6] KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
  2023-05-19  0:52   ` Raghavendra Rao Ananta
@ 2023-05-29 14:00     ` Marc Zyngier
  -1 siblings, 0 replies; 36+ messages in thread
From: Marc Zyngier @ 2023-05-29 14:00 UTC (permalink / raw)
  To: Raghavendra Rao Ananta
  Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis, linux-arm-kernel,
	kvmarm, linux-kernel, kvm

On Fri, 19 May 2023 01:52:28 +0100,
Raghavendra Rao Ananta <rananta@google.com> wrote:
> 
> Implement kvm_arch_flush_remote_tlbs_range() for arm64
> to invalidate the given range in the TLB.
> 
> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  3 +++
>  arch/arm64/kvm/hyp/nvhe/tlb.c     |  4 +---
>  arch/arm64/kvm/mmu.c              | 11 +++++++++++
>  3 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 81ab41b84f436..343fb530eea9c 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1081,6 +1081,9 @@ struct kvm *kvm_arch_alloc_vm(void);
>  #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS
>  int kvm_arch_flush_remote_tlbs(struct kvm *kvm);
>  
> +#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
> +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
> +
>  static inline bool kvm_vm_is_protected(struct kvm *kvm)
>  {
>  	return false;
> diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> index d4ea549c4b5c4..d2c7c1bc6d441 100644
> --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> @@ -150,10 +150,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>  		return;
>  	}
>  
> -	dsb(ishst);
> -
>  	/* Switch to requested VMID */
> -	__tlb_switch_to_guest(mmu, &cxt);
> +	__tlb_switch_to_guest(mmu, &cxt, false);

This hunk is in the wrong patch, isn't it?

>  
>  	__flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d0a0d3dca9316..e3673b4c10292 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -92,6 +92,17 @@ int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
>  	return 0;
>  }
>  
> +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
> +{
> +	phys_addr_t start, end;
> +
> +	start = start_gfn << PAGE_SHIFT;
> +	end = (start_gfn + pages) << PAGE_SHIFT;
> +
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_range, &kvm->arch.mmu, start, end);

So that's the point that I think is not right. It is the MMU code that
should drive the invalidation method, and not the HYP code. The HYP
code should be as dumb as possible, and the logic should be kept in
the MMU code.

So when a range invalidation is forwarded to HYP, it's a *valid* range
invalidation, not something that can fall back to a VMID-wide invalidation.
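
As a sketch of that split (illustrative only, reusing helpers already
referenced in this series), the policy would live in mmu.c and hyp would
only ever be handed a valid range:

	int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
	{
		phys_addr_t start = start_gfn << PAGE_SHIFT;
		phys_addr_t end = (start_gfn + pages) << PAGE_SHIFT;

		/* The fallback decision stays in the MMU code, not in hyp */
		if (!system_supports_tlb_range() || pages >= MAX_TLBI_RANGE_PAGES)
			return kvm_arch_flush_remote_tlbs(kvm);

		kvm_call_hyp(__kvm_tlb_flush_vmid_range, &kvm->arch.mmu, start, end);
		return 0;
	}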

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 6/6] KVM: arm64: Use TLBI range-based instructions for unmap
  2023-05-19  0:52   ` Raghavendra Rao Ananta
@ 2023-05-29 14:18     ` Marc Zyngier
  -1 siblings, 0 replies; 36+ messages in thread
From: Marc Zyngier @ 2023-05-29 14:18 UTC (permalink / raw)
  To: Raghavendra Rao Ananta
  Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis, linux-arm-kernel,
	kvmarm, linux-kernel, kvm

On Fri, 19 May 2023 01:52:31 +0100,
Raghavendra Rao Ananta <rananta@google.com> wrote:
> 
> The current implementation of the stage-2 unmap walker traverses
> the given range and, as a part of break-before-make, performs
> TLB invalidations with a DSB for every PTE. Repeated over a large
> range, this combination can become a performance bottleneck.
> 
> Hence, if the system supports FEAT_TLBIRANGE, defer the TLB
> invalidations until the entire walk is finished, and then
> use range-based instructions to invalidate the TLBs in one go.
> Condition this upon S2FWB in order to avoid walking the page-table
> again to perform the CMOs after issuing the TLBI.

But that's the real bottleneck. TLBIs are cheap compared to CMOs, even
on remarkably bad implementations. What is your plan to fix this?

> 
> Rename stage2_put_pte() to stage2_unmap_put_pte() as the function
> now serves the stage-2 unmap walker specifically, rather than
> acting generic.
> 
> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 35 ++++++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index b8f0dbd12f773..5832ee3418fb0 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -771,16 +771,34 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
>  	smp_store_release(ctx->ptep, new);
>  }
>  
> -static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> -			   struct kvm_pgtable_mm_ops *mm_ops)
> +static bool stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
>  {
> +	/*
> +	 * If FEAT_TLBIRANGE is implemented, defer the individial PTE
> +	 * TLB invalidations until the entire walk is finished, and
> +	 * then use the range-based TLBI instructions to do the
> +	 * invalidations. Condition this upon S2FWB in order to avoid
> +	 * a page-table walk again to perform the CMOs after TLBI.
> +	 */
> +	return system_supports_tlb_range() && stage2_has_fwb(pgt);
> +}
> +
> +static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
> +				struct kvm_s2_mmu *mmu,
> +				struct kvm_pgtable_mm_ops *mm_ops)
> +{
> +	struct kvm_pgtable *pgt = ctx->arg;
> +
>  	/*
>  	 * Clear the existing PTE, and perform break-before-make with
>  	 * TLB maintenance if it was valid.
>  	 */
>  	if (kvm_pte_valid(ctx->old)) {
>  		kvm_clear_pte(ctx->ptep);
> -		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> +
> +		if (!stage2_unmap_defer_tlb_flush(pgt))
> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
> +					ctx->addr, ctx->level);

This really doesn't match the comment anymore.

Overall, I'm very concerned that we lose the consistency property that
the current code has: once called, the TLBs and the page tables are
synchronised.

Yes, this patch looks correct. But it is also really fragile.

>  	}
>  
>  	mm_ops->put_page(ctx->ptep);
> @@ -1015,7 +1033,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  	 * block entry and rely on the remaining portions being faulted
>  	 * back lazily.
>  	 */
> -	stage2_put_pte(ctx, mmu, mm_ops);
> +	stage2_unmap_put_pte(ctx, mmu, mm_ops);
>  
>  	if (need_flush && mm_ops->dcache_clean_inval_poc)
>  		mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
> @@ -1029,13 +1047,20 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  
>  int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  {
> +	int ret;
>  	struct kvm_pgtable_walker walker = {
>  		.cb	= stage2_unmap_walker,
>  		.arg	= pgt,
>  		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>  	};
>  
> -	return kvm_pgtable_walk(pgt, addr, size, &walker);
> +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> +	if (stage2_unmap_defer_tlb_flush(pgt))
> +		/* Perform the deferred TLB invalidations */
> +		kvm_call_hyp(__kvm_tlb_flush_vmid_range, pgt->mmu,
> +				addr, addr + size);

This "kvm_call_hyp(__kvm_tlb_flush_vmid_range,...)" could do with a
wrapper from the point where you introduce it.
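
Something like the below, presumably (the name and placement are invented
here, purely as a sketch):

	/* e.g. next to the other stage-2 TLB helpers */
	static inline void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
						    phys_addr_t addr, u64 size)
	{
		kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, addr, addr + size);
	}

so that the unmap path above reads:

	if (stage2_unmap_defer_tlb_flush(pgt))
		kvm_tlb_flush_vmid_range(pgt->mmu, addr, size);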

> +
> +	return ret;
>  }
>  

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 2/6] KVM: arm64: Implement __kvm_tlb_flush_vmid_range()
  2023-05-29 13:54     ` Marc Zyngier
@ 2023-05-30 21:14       ` Raghavendra Rao Ananta
  -1 siblings, 0 replies; 36+ messages in thread
From: Raghavendra Rao Ananta @ 2023-05-30 21:14 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis, linux-arm-kernel,
	kvmarm, linux-kernel, kvm

On Mon, May 29, 2023 at 6:54 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 19 May 2023 01:52:27 +0100,
> Raghavendra Rao Ananta <rananta@google.com> wrote:
> >
> > Define  __kvm_tlb_flush_vmid_range() (for VHE and nVHE)
> > to flush a range of stage-2 page-tables using IPA in one go.
> > If the system supports FEAT_TLBIRANGE, the following patches
> > would conveniently replace global TLBI such as vmalls12e1is
> > in the map, unmap, and dirty-logging paths with ripas2e1is
> > instead.
> >
> > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_asm.h   |  3 +++
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 11 +++++++++
> >  arch/arm64/kvm/hyp/nvhe/tlb.c      | 39 ++++++++++++++++++++++++++++++
> >  arch/arm64/kvm/hyp/vhe/tlb.c       | 35 +++++++++++++++++++++++++++
> >  4 files changed, 88 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 43c3bc0f9544d..33352d9399e32 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -79,6 +79,7 @@ enum __kvm_host_smccc_func {
> >       __KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
> >       __KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
> >       __KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
> > +     __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_range,
>
> nit: please keep this close to the other TLB operations.
>
Sure, I'll reorder this.

> >  };
> >
> >  #define DECLARE_KVM_VHE_SYM(sym)     extern char sym[]
> > @@ -225,6 +226,8 @@ extern void __kvm_flush_vm_context(void);
> >  extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
> >  extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
> >                                    int level);
> > +extern void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> > +                                     phys_addr_t start, phys_addr_t end);
> >  extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
> >
> >  extern void __kvm_timer_set_cntvoff(u64 cntvoff);
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index 728e01d4536b0..81d30737dc7c9 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > @@ -125,6 +125,16 @@ static void handle___kvm_tlb_flush_vmid_ipa(struct kvm_cpu_context *host_ctxt)
> >       __kvm_tlb_flush_vmid_ipa(kern_hyp_va(mmu), ipa, level);
> >  }
> >
> > +static void
> > +handle___kvm_tlb_flush_vmid_range(struct kvm_cpu_context *host_ctxt)
> > +{
> > +     DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
> > +     DECLARE_REG(phys_addr_t, start, host_ctxt, 2);
> > +     DECLARE_REG(phys_addr_t, end, host_ctxt, 3);
> > +
> > +     __kvm_tlb_flush_vmid_range(kern_hyp_va(mmu), start, end);
> > +}
> > +
> >  static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
> >  {
> >       DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
> > @@ -315,6 +325,7 @@ static const hcall_t host_hcall[] = {
> >       HANDLE_FUNC(__kvm_vcpu_run),
> >       HANDLE_FUNC(__kvm_flush_vm_context),
> >       HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
> > +     HANDLE_FUNC(__kvm_tlb_flush_vmid_range),
> >       HANDLE_FUNC(__kvm_tlb_flush_vmid),
> >       HANDLE_FUNC(__kvm_flush_cpu_context),
> >       HANDLE_FUNC(__kvm_timer_set_cntvoff),
> > diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > index 978179133f4b9..d4ea549c4b5c4 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > @@ -130,6 +130,45 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
> >       __tlb_switch_to_host(&cxt);
> >  }
> >
> > +void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> > +                             phys_addr_t start, phys_addr_t end)
> > +{
> > +     struct tlb_inv_context cxt;
> > +     unsigned long pages, stride;
> > +
> > +     /*
> > +      * Since the range of addresses may not be mapped at
> > +      * the same level, assume the worst case as PAGE_SIZE
> > +      */
> > +     stride = PAGE_SIZE;
> > +     start = round_down(start, stride);
> > +     end = round_up(end, stride);
> > +     pages = (end - start) >> PAGE_SHIFT;
> > +
> > +     if (!system_supports_tlb_range() || pages >= MAX_TLBI_RANGE_PAGES) {
> > +             __kvm_tlb_flush_vmid(mmu);
> > +             return;
>
> Why do we give up on "pages >= MAX_TLBI_RANGE_PAGES"? I see no
> rationale for it in the patch. My understanding is that this is the
> maximum representable as a range, in which case this is a programming
> error.
>
> Or are you *on purpose* making the two equivalent?
>
Yes, basically: I was trying to align the logic with what we have for
__flush_tlb_range(). But if you feel that hitting this case is really a
programming error, do we want to skip the flush entirely?
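
(For context, the __flush_tlb_range() logic being aligned with has roughly
the following shape in this kernel version; paraphrased from
arch/arm64/include/asm/tlbflush.h, not a verbatim quote:)

	if ((!system_supports_tlb_range() &&
	     (end - start) >= (MAX_TLBI_OPS * stride)) ||
	    pages >= MAX_TLBI_RANGE_PAGES) {
		/* Too many pages (or no FEAT_TLBIRANGE): flush the whole ASID */
		flush_tlb_mm(vma->vm_mm);
		return;
	}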

Thank you.

Raghavendra
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 3/6] KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
  2023-05-29 14:00     ` Marc Zyngier
@ 2023-05-30 21:22       ` Raghavendra Rao Ananta
  -1 siblings, 0 replies; 36+ messages in thread
From: Raghavendra Rao Ananta @ 2023-05-30 21:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis, linux-arm-kernel,
	kvmarm, linux-kernel, kvm

On Mon, May 29, 2023 at 7:00 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 19 May 2023 01:52:28 +0100,
> Raghavendra Rao Ananta <rananta@google.com> wrote:
> >
> > Implement kvm_arch_flush_remote_tlbs_range() for arm64
> > to invalidate the given range in the TLB.
> >
> > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  3 +++
> >  arch/arm64/kvm/hyp/nvhe/tlb.c     |  4 +---
> >  arch/arm64/kvm/mmu.c              | 11 +++++++++++
> >  3 files changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 81ab41b84f436..343fb530eea9c 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1081,6 +1081,9 @@ struct kvm *kvm_arch_alloc_vm(void);
> >  #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS
> >  int kvm_arch_flush_remote_tlbs(struct kvm *kvm);
> >
> > +#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
> > +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
> > +
> >  static inline bool kvm_vm_is_protected(struct kvm *kvm)
> >  {
> >       return false;
> > diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > index d4ea549c4b5c4..d2c7c1bc6d441 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > @@ -150,10 +150,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> >               return;
> >       }
> >
> > -     dsb(ishst);
> > -
> >       /* Switch to requested VMID */
> > -     __tlb_switch_to_guest(mmu, &cxt);
> > +     __tlb_switch_to_guest(mmu, &cxt, false);
>
> This hunk is in the wrong patch, isn't it?
>
Ah, you are right. It should be part of the previous patch. I think I
introduced it accidentally when I rebased the series. I'll remove it
in the next spin.


> >
> >       __flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index d0a0d3dca9316..e3673b4c10292 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -92,6 +92,17 @@ int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> >       return 0;
> >  }
> >
> > +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
> > +{
> > +     phys_addr_t start, end;
> > +
> > +     start = start_gfn << PAGE_SHIFT;
> > +     end = (start_gfn + pages) << PAGE_SHIFT;
> > +
> > +     kvm_call_hyp(__kvm_tlb_flush_vmid_range, &kvm->arch.mmu, start, end);
>
> So that's the point that I think is not right. It is the MMU code that
> should drive the invalidation method, and not the HYP code. The HYP
> code should be as dumb as possible, and the logic should be kept in
> the MMU code.
>
> So when a range invalidation is forwarded to HYP, it's a *valid* range
> invalidation. not something that can fallback to VMID-wide invalidation.
>
I'm guessing that you are referring to patch-2. Do you recommend
moving the 'pages >= MAX_TLBI_RANGE_PAGES' logic here and simply
returning an error? And what about the other check,
system_supports_tlb_range()?
The idea was for __kvm_tlb_flush_vmid_range() to also implement a
fallback mechanism in case the system doesn't support the range-based
instructions. But if we end up calling __kvm_tlb_flush_vmid_range()
from multiple call sites, we'd end up duplicating the checks. WDYT?
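
(For illustration, the "dumb" hyp side would be roughly the patch-2 hunk
with the fallback stripped out; a sketch only, with the trailing barriers
abbreviated to what the posted version appears to do:)

	void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
					phys_addr_t start, phys_addr_t end)
	{
		struct tlb_inv_context cxt;
		unsigned long pages;

		/* The caller guarantees a valid, in-range request */
		start = round_down(start, PAGE_SIZE);
		end = round_up(end, PAGE_SIZE);
		pages = (end - start) >> PAGE_SHIFT;

		/* Switch to requested VMID */
		__tlb_switch_to_guest(mmu, &cxt, false);

		__flush_tlb_range_op(ipas2e1is, start, pages, PAGE_SIZE, 0, 0, false);

		dsb(ish);
		isb();
		__tlb_switch_to_host(&cxt);
	}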


> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 6/6] KVM: arm64: Use TLBI range-based intructions for unmap
  2023-05-29 14:18     ` Marc Zyngier
@ 2023-05-30 21:35       ` Raghavendra Rao Ananta
  -1 siblings, 0 replies; 36+ messages in thread
From: Raghavendra Rao Ananta @ 2023-05-30 21:35 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis, linux-arm-kernel,
	kvmarm, linux-kernel, kvm

On Mon, May 29, 2023 at 7:18 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Fri, 19 May 2023 01:52:31 +0100,
> Raghavendra Rao Ananta <rananta@google.com> wrote:
> >
> > The current implementation of the stage-2 unmap walker traverses
> > the given range and, as a part of break-before-make, performs
> > TLB invalidations with a DSB for every PTE. A multitude of this
> > combination could cause a performance bottleneck.
> >
> > Hence, if the system supports FEAT_TLBIRANGE, defer the TLB
> > invalidations until the entire walk is finished, and then
> > use range-based instructions to invalidate the TLBs in one go.
> > Condition this upon S2FWB in order to avoid walking the page-table
> > again to perform the CMOs after issuing the TLBI.
>
> But that's the real bottleneck. TLBIs are cheap compared to CMOs, even
> on remarkably bad implementations. What is your plan to fix this?
>
Correct me if I'm wrong, but my understanding was that the repeated
issuance of TLBI + DSB was the bottleneck, and this patch tries to
avoid that by issuing only one TLBI + DSB at the end.
> >
> > Rename stage2_put_pte() to stage2_unmap_put_pte() as the function
> > now serves the stage-2 unmap walker specifically, rather than
> > acting generic.
> >
> > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 35 ++++++++++++++++++++++++++++++-----
> >  1 file changed, 30 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index b8f0dbd12f773..5832ee3418fb0 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -771,16 +771,34 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
> >       smp_store_release(ctx->ptep, new);
> >  }
> >
> > -static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> > -                        struct kvm_pgtable_mm_ops *mm_ops)
> > +static bool stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
> >  {
> > +     /*
> > +      * If FEAT_TLBIRANGE is implemented, defer the individial PTE
> > +      * TLB invalidations until the entire walk is finished, and
> > +      * then use the range-based TLBI instructions to do the
> > +      * invalidations. Condition this upon S2FWB in order to avoid
> > +      * a page-table walk again to perform the CMOs after TLBI.
> > +      */
> > +     return system_supports_tlb_range() && stage2_has_fwb(pgt);
> > +}
> > +
> > +static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
> > +                             struct kvm_s2_mmu *mmu,
> > +                             struct kvm_pgtable_mm_ops *mm_ops)
> > +{
> > +     struct kvm_pgtable *pgt = ctx->arg;
> > +
> >       /*
> >        * Clear the existing PTE, and perform break-before-make with
> >        * TLB maintenance if it was valid.
> >        */
> >       if (kvm_pte_valid(ctx->old)) {
> >               kvm_clear_pte(ctx->ptep);
> > -             kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> > +
> > +             if (!stage2_unmap_defer_tlb_flush(pgt))
> > +                     kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
> > +                                     ctx->addr, ctx->level);
>
> This really doesn't match the comment anymore.
>
Right, I can re-write this in the next spin.

> Overall, I'm very concerned that we lose the consistency property that
> the current code has: once called, the TLBs and the page tables are
> synchronised.
>
> Yes, this patch looks correct. But it is also really fragile.
>
Yeah, we were a little skeptical about this too. Till v2, we had a
different implementation in which we had an independent fast unmap
path that disconnects the PTE hierarchy if the unmap range was exactly
KVM_PGTABLE_MIN_BLOCK_LEVEL [1].  But this had some problems, and we
pivoted to the current implementation.

> >       }
> >
> >       mm_ops->put_page(ctx->ptep);
> > @@ -1015,7 +1033,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> >        * block entry and rely on the remaining portions being faulted
> >        * back lazily.
> >        */
> > -     stage2_put_pte(ctx, mmu, mm_ops);
> > +     stage2_unmap_put_pte(ctx, mmu, mm_ops);
> >
> >       if (need_flush && mm_ops->dcache_clean_inval_poc)
> >               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
> > @@ -1029,13 +1047,20 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> >
> >  int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
> >  {
> > +     int ret;
> >       struct kvm_pgtable_walker walker = {
> >               .cb     = stage2_unmap_walker,
> >               .arg    = pgt,
> >               .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> >       };
> >
> > -     return kvm_pgtable_walk(pgt, addr, size, &walker);
> > +     ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> > +     if (stage2_unmap_defer_tlb_flush(pgt))
> > +             /* Perform the deferred TLB invalidations */
> > +             kvm_call_hyp(__kvm_tlb_flush_vmid_range, pgt->mmu,
> > +                             addr, addr + size);
>
> This "kvm_call_hyp(__kvm_tlb_flush_vmid_range,...)" could do with a
> wrapper from the point where you introduce it.
>
Sorry, I didn't get this comment. Do you mind elaborating on it?

Thank you.
Raghavendra

[1]: https://lore.kernel.org/all/20230206172340.2639971-8-rananta@google.com/
> > +
> > +     return ret;
> >  }
> >
>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 3/6] KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
  2023-05-30 21:22       ` Raghavendra Rao Ananta
@ 2023-05-31  8:46         ` Marc Zyngier
  -1 siblings, 0 replies; 36+ messages in thread
From: Marc Zyngier @ 2023-05-31  8:46 UTC (permalink / raw)
  To: Raghavendra Rao Ananta
  Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis, linux-arm-kernel,
	kvmarm, linux-kernel, kvm

On Tue, 30 May 2023 22:22:23 +0100,
Raghavendra Rao Ananta <rananta@google.com> wrote:
> 
> On Mon, May 29, 2023 at 7:00 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 19 May 2023 01:52:28 +0100,
> > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > >
> > > Implement kvm_arch_flush_remote_tlbs_range() for arm64
> > > to invalidate the given range in the TLB.
> > >
> > > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > > ---
> > >  arch/arm64/include/asm/kvm_host.h |  3 +++
> > >  arch/arm64/kvm/hyp/nvhe/tlb.c     |  4 +---
> > >  arch/arm64/kvm/mmu.c              | 11 +++++++++++
> > >  3 files changed, 15 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index 81ab41b84f436..343fb530eea9c 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -1081,6 +1081,9 @@ struct kvm *kvm_arch_alloc_vm(void);
> > >  #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS
> > >  int kvm_arch_flush_remote_tlbs(struct kvm *kvm);
> > >
> > > +#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
> > > +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
> > > +
> > >  static inline bool kvm_vm_is_protected(struct kvm *kvm)
> > >  {
> > >       return false;
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > index d4ea549c4b5c4..d2c7c1bc6d441 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > @@ -150,10 +150,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> > >               return;
> > >       }
> > >
> > > -     dsb(ishst);
> > > -
> > >       /* Switch to requested VMID */
> > > -     __tlb_switch_to_guest(mmu, &cxt);
> > > +     __tlb_switch_to_guest(mmu, &cxt, false);
> >
> > This hunk is in the wrong patch, isn't it?
> >
> Ah, you are right. It should be part of the previous patch. I think I
> introduced it accidentally when I rebased the series. I'll remove it
> in the next spin.
> 
> 
> > >
> > >       __flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
> > >
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index d0a0d3dca9316..e3673b4c10292 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -92,6 +92,17 @@ int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> > >       return 0;
> > >  }
> > >
> > > +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
> > > +{
> > > +     phys_addr_t start, end;
> > > +
> > > +     start = start_gfn << PAGE_SHIFT;
> > > +     end = (start_gfn + pages) << PAGE_SHIFT;
> > > +
> > > +     kvm_call_hyp(__kvm_tlb_flush_vmid_range, &kvm->arch.mmu, start, end);
> >
> > So that's the point that I think is not right. It is the MMU code that
> > should drive the invalidation method, and not the HYP code. The HYP
> > code should be as dumb as possible, and the logic should be kept in
> > the MMU code.
> >
> > So when a range invalidation is forwarded to HYP, it's a *valid* range
> > invalidation. not something that can fallback to VMID-wide invalidation.
> >
> I'm guessing that you are referring to patch-2. Do you recommend
> moving the 'pages >= MAX_TLBI_RANGE_PAGES' logic here and simply
> return an error? How about for the other check:
> system_supports_tlb_range()?
> The idea was for __kvm_tlb_flush_vmid_range() to also implement a
> fallback mechanism in case the system doesn't support the range-based
> instructions. But if we end up calling __kvm_tlb_flush_vmid_range()
> from multiple cases, we'd end up duplicating the checks. WDYT?

My take is that there should be a single helper deciding to issue
either a number of range-based TLBIs depending on start/end, or a
single VMID-based TLBI. Having multiple calling sites is not a
problem, and even if that code gets duplicated, big deal.

But a hypercall that falls back to global invalidation based on a
range evaluation error (more than MAX_TLBI_RANGE_PAGES) is papering
over a latent bug.

There should be no logic whatsoever in any of the two tlb.c files.
Only a switch to the correct context, and the requested invalidation,
which *must* be architecturally correct.
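
(Purely as an illustration of this suggestion, not code from the
thread: a minimal sketch of such an MMU-side helper. The helper name
and the exact threshold check are invented here.)

static void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
				   phys_addr_t start, phys_addr_t end)
{
	u64 pages = (end - start) >> PAGE_SHIFT;

	/* The decision lives in MMU code; the hyp calls stay dumb. */
	if (!system_supports_tlb_range() || pages >= MAX_TLBI_RANGE_PAGES) {
		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
		return;
	}

	kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, start, end);
}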

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 6/6] KVM: arm64: Use TLBI range-based intructions for unmap
  2023-05-30 21:35       ` Raghavendra Rao Ananta
@ 2023-05-31  8:54         ` Marc Zyngier
  -1 siblings, 0 replies; 36+ messages in thread
From: Marc Zyngier @ 2023-05-31  8:54 UTC (permalink / raw)
  To: Raghavendra Rao Ananta
  Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis, linux-arm-kernel,
	kvmarm, linux-kernel, kvm

On Tue, 30 May 2023 22:35:57 +0100,
Raghavendra Rao Ananta <rananta@google.com> wrote:
> 
> On Mon, May 29, 2023 at 7:18 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Fri, 19 May 2023 01:52:31 +0100,
> > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > >
> > > The current implementation of the stage-2 unmap walker traverses
> > > the given range and, as a part of break-before-make, performs
> > > TLB invalidations with a DSB for every PTE. A multitude of this
> > > combination could cause a performance bottleneck.
> > >
> > > Hence, if the system supports FEAT_TLBIRANGE, defer the TLB
> > > invalidations until the entire walk is finished, and then
> > > use range-based instructions to invalidate the TLBs in one go.
> > > Condition this upon S2FWB in order to avoid walking the page-table
> > > again to perform the CMOs after issuing the TLBI.
> >
> > But that's the real bottleneck. TLBIs are cheap compared to CMOs, even
> > on remarkably bad implementations. What is your plan to fix this?
> >
> Correct me if I'm wrong, but my understanding was that the repeated
> issuance of TLBI + DSB was the bottleneck, and this patch tries to
> avoid that by issuing only one TLBI + DSB at the end.

At least on some of the machines I have access to, CMOs are far more
expensive than TLBIs, and they are the ones causing slowdowns. Your
system shows a different behaviour, and that's fine, but you can't
draw a general conclusion from it.

> > >
> > > Rename stage2_put_pte() to stage2_unmap_put_pte() as the function
> > > now serves the stage-2 unmap walker specifically, rather than
> > > acting generic.
> > >
> > > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > > ---
> > >  arch/arm64/kvm/hyp/pgtable.c | 35 ++++++++++++++++++++++++++++++-----
> > >  1 file changed, 30 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > index b8f0dbd12f773..5832ee3418fb0 100644
> > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > @@ -771,16 +771,34 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
> > >       smp_store_release(ctx->ptep, new);
> > >  }
> > >
> > > -static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> > > -                        struct kvm_pgtable_mm_ops *mm_ops)
> > > +static bool stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
> > >  {
> > > +     /*
> > > +      * If FEAT_TLBIRANGE is implemented, defer the individial PTE
> > > +      * TLB invalidations until the entire walk is finished, and
> > > +      * then use the range-based TLBI instructions to do the
> > > +      * invalidations. Condition this upon S2FWB in order to avoid
> > > +      * a page-table walk again to perform the CMOs after TLBI.
> > > +      */
> > > +     return system_supports_tlb_range() && stage2_has_fwb(pgt);
> > > +}
> > > +
> > > +static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
> > > +                             struct kvm_s2_mmu *mmu,
> > > +                             struct kvm_pgtable_mm_ops *mm_ops)
> > > +{
> > > +     struct kvm_pgtable *pgt = ctx->arg;
> > > +
> > >       /*
> > >        * Clear the existing PTE, and perform break-before-make with
> > >        * TLB maintenance if it was valid.
> > >        */
> > >       if (kvm_pte_valid(ctx->old)) {
> > >               kvm_clear_pte(ctx->ptep);
> > > -             kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
> > > +
> > > +             if (!stage2_unmap_defer_tlb_flush(pgt))
> > > +                     kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
> > > +                                     ctx->addr, ctx->level);
> >
> > This really doesn't match the comment anymore.
> >
> Right, I can re-write this in the next spin.
> 
> > Overall, I'm very concerned that we lose the consistency property that
> > the current code has: once called, the TLBs and the page tables are
> > synchronised.
> >
> > Yes, this patch looks correct. But it is also really fragile.
> >
> Yeah, we were a little skeptical about this too. Till v2, we had a
> different implementation in which we had an independent fast unmap
> path that disconnects the PTE hierarchy if the unmap range was exactly
> KVM_PGTABLE_MIN_BLOCK_LEVEL [1].  But this had some problems, and we
> pivoted to the current implementation.

Can we at least have some sort of runtime assertions that at the point
we release the write lock, the TLBs have been invalidated? Even if
that's tied to some debug config.
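
(Sketch only, to make the request concrete: the flag, the helper name
and the config option below are all invented for illustration, not
something proposed in the thread.)

/*
 * 'deferred_tlbi_pending' would be a hypothetical debug-only flag in
 * struct kvm_s2_mmu, set when the unmap walker skips the per-PTE TLBI
 * and cleared once the final range invalidation has been issued.
 */
static inline void kvm_assert_s2_tlbs_synced(struct kvm_s2_mmu *mmu)
{
	if (IS_ENABLED(CONFIG_KVM_ARM64_DEBUG_TLBI))	/* assumed debug knob */
		WARN_ON_ONCE(READ_ONCE(mmu->deferred_tlbi_pending));
}

with a call placed just before the mmu_lock write lock is dropped in
the paths that use the deferred flush.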

> 
> > >       }
> > >
> > >       mm_ops->put_page(ctx->ptep);
> > > @@ -1015,7 +1033,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > >        * block entry and rely on the remaining portions being faulted
> > >        * back lazily.
> > >        */
> > > -     stage2_put_pte(ctx, mmu, mm_ops);
> > > +     stage2_unmap_put_pte(ctx, mmu, mm_ops);
> > >
> > >       if (need_flush && mm_ops->dcache_clean_inval_poc)
> > >               mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
> > > @@ -1029,13 +1047,20 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > >
> > >  int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
> > >  {
> > > +     int ret;
> > >       struct kvm_pgtable_walker walker = {
> > >               .cb     = stage2_unmap_walker,
> > >               .arg    = pgt,
> > >               .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> > >       };
> > >
> > > -     return kvm_pgtable_walk(pgt, addr, size, &walker);
> > > +     ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> > > +     if (stage2_unmap_defer_tlb_flush(pgt))
> > > +             /* Perform the deferred TLB invalidations */
> > > +             kvm_call_hyp(__kvm_tlb_flush_vmid_range, pgt->mmu,
> > > +                             addr, addr + size);
> >
> > This "kvm_call_hyp(__kvm_tlb_flush_vmid_range,...)" could do with a
> > wrapper from the point where you introduce it.
> >
> Sorry, I didn't get this comment. Do you mind elaborating on it?

All I'm saying is that you should have a wrapper like:

void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
			      phys_addr_t base, size_t size)
{
	kvm_call_hyp(__kvm_tlb_flush_vmid_range,
		     mmu, base, base + size);
}

and use it throughout the code.
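
(For instance, with that wrapper the deferred flush at the end of
kvm_pgtable_stage2_unmap() in the quoted hunk above would simply
become:)

	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
	if (stage2_unmap_defer_tlb_flush(pgt))
		/* Perform the deferred TLB invalidations */
		kvm_tlb_flush_vmid_range(pgt->mmu, addr, size);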

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 3/6] KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
  2023-05-31  8:46         ` Marc Zyngier
@ 2023-06-02  1:37           ` Raghavendra Rao Ananta
  -1 siblings, 0 replies; 36+ messages in thread
From: Raghavendra Rao Ananta @ 2023-06-02  1:37 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis, linux-arm-kernel,
	kvmarm, linux-kernel, kvm

On Wed, May 31, 2023 at 1:46 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 30 May 2023 22:22:23 +0100,
> Raghavendra Rao Ananta <rananta@google.com> wrote:
> >
> > On Mon, May 29, 2023 at 7:00 AM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Fri, 19 May 2023 01:52:28 +0100,
> > > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > > >
> > > > Implement kvm_arch_flush_remote_tlbs_range() for arm64
> > > > to invalidate the given range in the TLB.
> > > >
> > > > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > > > ---
> > > >  arch/arm64/include/asm/kvm_host.h |  3 +++
> > > >  arch/arm64/kvm/hyp/nvhe/tlb.c     |  4 +---
> > > >  arch/arm64/kvm/mmu.c              | 11 +++++++++++
> > > >  3 files changed, 15 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > > index 81ab41b84f436..343fb530eea9c 100644
> > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > @@ -1081,6 +1081,9 @@ struct kvm *kvm_arch_alloc_vm(void);
> > > >  #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS
> > > >  int kvm_arch_flush_remote_tlbs(struct kvm *kvm);
> > > >
> > > > +#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
> > > > +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
> > > > +
> > > >  static inline bool kvm_vm_is_protected(struct kvm *kvm)
> > > >  {
> > > >       return false;
> > > > diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > > index d4ea549c4b5c4..d2c7c1bc6d441 100644
> > > > --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > > +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > > @@ -150,10 +150,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> > > >               return;
> > > >       }
> > > >
> > > > -     dsb(ishst);
> > > > -
> > > >       /* Switch to requested VMID */
> > > > -     __tlb_switch_to_guest(mmu, &cxt);
> > > > +     __tlb_switch_to_guest(mmu, &cxt, false);
> > >
> > > This hunk is in the wrong patch, isn't it?
> > >
> > Ah, you are right. It should be part of the previous patch. I think I
> > introduced it accidentally when I rebased the series. I'll remove it
> > in the next spin.
> >
> >
> > > >
> > > >       __flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
> > > >
> > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > index d0a0d3dca9316..e3673b4c10292 100644
> > > > --- a/arch/arm64/kvm/mmu.c
> > > > +++ b/arch/arm64/kvm/mmu.c
> > > > @@ -92,6 +92,17 @@ int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> > > >       return 0;
> > > >  }
> > > >
> > > > +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
> > > > +{
> > > > +     phys_addr_t start, end;
> > > > +
> > > > +     start = start_gfn << PAGE_SHIFT;
> > > > +     end = (start_gfn + pages) << PAGE_SHIFT;
> > > > +
> > > > +     kvm_call_hyp(__kvm_tlb_flush_vmid_range, &kvm->arch.mmu, start, end);
> > >
> > > So that's the point that I think is not right. It is the MMU code that
> > > should drive the invalidation method, and not the HYP code. The HYP
> > > code should be as dumb as possible, and the logic should be kept in
> > > the MMU code.
> > >
> > > So when a range invalidation is forwarded to HYP, it's a *valid* range
> > > invalidation. not something that can fallback to VMID-wide invalidation.
> > >
> > I'm guessing that you are referring to patch-2. Do you recommend
> > moving the 'pages >= MAX_TLBI_RANGE_PAGES' logic here and simply
> > return an error? How about for the other check:
> > system_supports_tlb_range()?
> > The idea was for __kvm_tlb_flush_vmid_range() to also implement a
> > fallback mechanism in case the system doesn't support the range-based
> > instructions. But if we end up calling __kvm_tlb_flush_vmid_range()
> > from multiple cases, we'd end up duplicating the checks. WDYT?
>
> My take is that there should be a single helper deciding to issue
> either a number of range-based TLBIs depending on start/end, or a
> single VMID-based TLBI. Having multiple calling sites is not a
> problem, and even if that code gets duplicated, big deal.
>
Hypothetically, if I move the check to this patch and return an error
if this situation occurs, since I'm depending on David's common MMU
code [1], kvm_main.c would end up calling kvm_flush_remote_tlbs() and
we'd be doing a VMID-based TLBI.
One idea would be to issue a WARN_ON() and return 0 so that we don't
issue any TLBIs. Thoughts?

> But a hypercall that falls back to global invalidation based on a
> range evaluation error (more than MAX_TLBI_RANGE_PAGES) is papering
> over a latent bug.
>
If I understand this correctly, MAX_TLBI_RANGE_PAGES is specifically
the capacity of the range-based instructions itself, isn't it? Is it
incorrect for the caller to request a higher range be invalidated even
on systems that do not support these instructions? Probably that's why
__flush_tlb_range() falls back to a global flush when the range
request is exceeded?

Thank you.
Raghavendra

[1]:  https://lore.kernel.org/linux-arm-kernel/20230126184025.2294823-7-dmatlack@google.com/


> There should be no logic whatsoever in any of the two tlb.c files.
> Only a switch to the correct context, and the requested invalidation,
> which *must* be architecturally correct.
>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 3/6] KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
  2023-06-02  1:37           ` Raghavendra Rao Ananta
@ 2023-06-02  8:25             ` Marc Zyngier
  -1 siblings, 0 replies; 36+ messages in thread
From: Marc Zyngier @ 2023-06-02  8:25 UTC (permalink / raw)
  To: Raghavendra Rao Ananta
  Cc: Oliver Upton, James Morse, Suzuki K Poulose, Ricardo Koller,
	Paolo Bonzini, Jing Zhang, Colton Lewis, linux-arm-kernel,
	kvmarm, linux-kernel, kvm

On Fri, 02 Jun 2023 02:37:28 +0100,
Raghavendra Rao Ananta <rananta@google.com> wrote:
> 
> On Wed, May 31, 2023 at 1:46 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Tue, 30 May 2023 22:22:23 +0100,
> > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > >
> > > On Mon, May 29, 2023 at 7:00 AM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Fri, 19 May 2023 01:52:28 +0100,
> > > > Raghavendra Rao Ananta <rananta@google.com> wrote:
> > > > >
> > > > > Implement kvm_arch_flush_remote_tlbs_range() for arm64
> > > > > to invalidate the given range in the TLB.
> > > > >
> > > > > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > > > > ---
> > > > >  arch/arm64/include/asm/kvm_host.h |  3 +++
> > > > >  arch/arm64/kvm/hyp/nvhe/tlb.c     |  4 +---
> > > > >  arch/arm64/kvm/mmu.c              | 11 +++++++++++
> > > > >  3 files changed, 15 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > > > index 81ab41b84f436..343fb530eea9c 100644
> > > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > > @@ -1081,6 +1081,9 @@ struct kvm *kvm_arch_alloc_vm(void);
> > > > >  #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS
> > > > >  int kvm_arch_flush_remote_tlbs(struct kvm *kvm);
> > > > >
> > > > > +#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
> > > > > +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
> > > > > +
> > > > >  static inline bool kvm_vm_is_protected(struct kvm *kvm)
> > > > >  {
> > > > >       return false;
> > > > > diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > > > index d4ea549c4b5c4..d2c7c1bc6d441 100644
> > > > > --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > > > +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > > > @@ -150,10 +150,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> > > > >               return;
> > > > >       }
> > > > >
> > > > > -     dsb(ishst);
> > > > > -
> > > > >       /* Switch to requested VMID */
> > > > > -     __tlb_switch_to_guest(mmu, &cxt);
> > > > > +     __tlb_switch_to_guest(mmu, &cxt, false);
> > > >
> > > > This hunk is in the wrong patch, isn't it?
> > > >
> > > Ah, you are right. It should be part of the previous patch. I think I
> > > introduced it accidentally when I rebased the series. I'll remove it
> > > in the next spin.
> > >
> > >
> > > > >
> > > > >       __flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
> > > > >
> > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > > index d0a0d3dca9316..e3673b4c10292 100644
> > > > > --- a/arch/arm64/kvm/mmu.c
> > > > > +++ b/arch/arm64/kvm/mmu.c
> > > > > @@ -92,6 +92,17 @@ int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> > > > >       return 0;
> > > > >  }
> > > > >
> > > > > +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
> > > > > +{
> > > > > +     phys_addr_t start, end;
> > > > > +
> > > > > +     start = start_gfn << PAGE_SHIFT;
> > > > > +     end = (start_gfn + pages) << PAGE_SHIFT;
> > > > > +
> > > > > +     kvm_call_hyp(__kvm_tlb_flush_vmid_range, &kvm->arch.mmu, start, end);
> > > >
> > > > So that's the point that I think is not right. It is the MMU code that
> > > > should drive the invalidation method, and not the HYP code. The HYP
> > > > code should be as dumb as possible, and the logic should be kept in
> > > > the MMU code.
> > > >
> > > > So when a range invalidation is forwarded to HYP, it's a *valid* range
> > > > invalidation. not something that can fallback to VMID-wide invalidation.
> > > >
> > > I'm guessing that you are referring to patch-2. Do you recommend
> > > moving the 'pages >= MAX_TLBI_RANGE_PAGES' logic here and simply
> > > return an error? How about for the other check:
> > > system_supports_tlb_range()?
> > > The idea was for __kvm_tlb_flush_vmid_range() to also implement a
> > > fallback mechanism in case the system doesn't support the range-based
> > > instructions. But if we end up calling __kvm_tlb_flush_vmid_range()
> > > from multiple cases, we'd end up duplicating the checks. WDYT?
> >
> > My take is that there should be a single helper deciding to issue
> > either a number of range-based TLBIs depending on start/end, or a
> > single VMID-based TLBI. Having multiple calling sites is not a
> > problem, and even if that code gets duplicated, big deal.
> >
> Hypothetically, if I move the check to this patch and return an error
> if this situation occurs, since I'm depending on David's common MMU
> code [1], kvm_main.c would end up calling kvm_flush_remote_tlbs() and
> we'd be doing a VMID-based TLBI.
> One idea would be to issue a WARN_ON() and return 0 so that we don't
> issue any TLBIs. Thoughts?

Then we should fix the infrastructure. And no, not issuing TLBIs is
not an acceptable outcome. Might as well panic the kernel, because
silent memory corruption is not something I'm willing to entertain.

My take is as follows:

- either the core code doesn't specify a range, and we blow the whole
  thing away as we do today

- or it specifies a range, and we apply range invalidation as permitted
  by the architecture and the implementation (either a *series* of
  range invalidations, or a full VMID invalidation); see the sketch
  below.
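
For concreteness, a minimal sketch of the kind of MMU-side helper this
describes, assuming the hypercall interface from the quoted patch; the
helper name and its placement are illustrative only, not what the series
necessarily ends up with:

/*
 * Sketch only: the MMU code picks the invalidation method, so that HYP
 * only ever executes exactly the request it was given.
 */
static void kvm_tlb_flush_vmid_range(struct kvm *kvm,
				     phys_addr_t start, phys_addr_t end)
{
	/* No FEAT_TLBIRANGE: a full VMID invalidation is the only option. */
	if (!system_supports_tlb_range()) {
		kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
		return;
	}

	/* Otherwise forward a valid range; HYP never widens the scope. */
	kvm_call_hyp(__kvm_tlb_flush_vmid_range, &kvm->arch.mmu, start, end);
}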

As for the "common MMU" stuff, something such as "flush_remote_tlbs"
really shows that this code isn't common at all. It is just x86 creep,
and I have zero problem ignoring it.

> 
> > But a hypercall that falls back to global invalidation based on a
> > range evaluation error (more than MAX_TLBI_RANGE_PAGES) is papering
> > over a latent bug.
> >
> If I understand this correctly, MAX_TLBI_RANGE_PAGES is specifically
> the capacity of the range-based instructions themselves, isn't it? Is it
> incorrect for the caller to request that a larger range be invalidated,
> even on systems that do not support these instructions? Probably that's
> why __flush_tlb_range() falls back to a global flush when the requested
> range exceeds that limit?

This is indeed the architectural limit. But invalidating the whole
VMID isn't necessarily the right thing either, if you can preserve
some of the TLB entries by issuing only a couple of range invalidations
that span *in total* more than MAX_TLBI_RANGE_PAGES.
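
As an illustration of that point (a sketch under assumptions, not the
series' code; the helper name is made up), a large span can be covered
by several capped range invalidations:

/*
 * Sketch: cover a large span with several range-based invalidations of
 * at most MAX_TLBI_RANGE_PAGES pages each, instead of silently
 * upgrading to a VMID-wide flush.
 */
static void kvm_tlb_flush_range_chunked(struct kvm *kvm,
					phys_addr_t start, phys_addr_t end)
{
	const phys_addr_t chunk = (phys_addr_t)MAX_TLBI_RANGE_PAGES << PAGE_SHIFT;

	while (start < end) {
		phys_addr_t stop = min(end, start + chunk);

		kvm_call_hyp(__kvm_tlb_flush_vmid_range, &kvm->arch.mmu,
			     start, stop);
		start = stop;
	}
}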

Again, what I'm objecting to is the silent upgrading to VMID-wide
invalidation being hidden away in the HYP code. That's just wrong. The
HYP code is not the place for abstraction.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2023-06-02  8:26 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-19  0:52 [PATCH v4 0/6] KVM: arm64: Add support for FEAT_TLBIRANGE Raghavendra Rao Ananta
2023-05-19  0:52 ` Raghavendra Rao Ananta
2023-05-19  0:52 ` [PATCH v4 1/6] arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range Raghavendra Rao Ananta
2023-05-19  0:52   ` Raghavendra Rao Ananta
2023-05-19  0:52 ` [PATCH v4 2/6] KVM: arm64: Implement __kvm_tlb_flush_vmid_range() Raghavendra Rao Ananta
2023-05-19  0:52   ` Raghavendra Rao Ananta
2023-05-29 13:54   ` Marc Zyngier
2023-05-29 13:54     ` Marc Zyngier
2023-05-30 21:14     ` Raghavendra Rao Ananta
2023-05-30 21:14       ` Raghavendra Rao Ananta
2023-05-19  0:52 ` [PATCH v4 3/6] KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range() Raghavendra Rao Ananta
2023-05-19  0:52   ` Raghavendra Rao Ananta
2023-05-29 14:00   ` Marc Zyngier
2023-05-29 14:00     ` Marc Zyngier
2023-05-30 21:22     ` Raghavendra Rao Ananta
2023-05-30 21:22       ` Raghavendra Rao Ananta
2023-05-31  8:46       ` Marc Zyngier
2023-05-31  8:46         ` Marc Zyngier
2023-06-02  1:37         ` Raghavendra Rao Ananta
2023-06-02  1:37           ` Raghavendra Rao Ananta
2023-06-02  8:25           ` Marc Zyngier
2023-06-02  8:25             ` Marc Zyngier
2023-05-19  0:52 ` [PATCH v4 4/6] KVM: arm64: Flush only the memslot after write-protect Raghavendra Rao Ananta
2023-05-19  0:52   ` Raghavendra Rao Ananta
2023-05-19  0:52 ` [PATCH v4 5/6] KVM: arm64: Invalidate the table entries upon a range Raghavendra Rao Ananta
2023-05-19  0:52   ` Raghavendra Rao Ananta
2023-05-19  0:52 ` [PATCH v4 6/6] KVM: arm64: Use TLBI range-based instructions for unmap Raghavendra Rao Ananta
2023-05-19  0:52   ` Raghavendra Rao Ananta
2023-05-21 19:32   ` Oliver Upton
2023-05-21 19:32     ` Oliver Upton
2023-05-29 14:18   ` Marc Zyngier
2023-05-29 14:18     ` Marc Zyngier
2023-05-30 21:35     ` Raghavendra Rao Ananta
2023-05-30 21:35       ` Raghavendra Rao Ananta
2023-05-31  8:54       ` Marc Zyngier
2023-05-31  8:54         ` Marc Zyngier
