* [PATCH v4 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
From: Ryan Roberts @ 2023-10-09 18:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

Hi All,

This adds support for FEAT_LPA2 to KVM for both hypervisor stage 1 (for the
nvhe/protected modes) and the vm stage 2 translation tables (for all modes).
FEAT_LPA2 enables 52-bit PAs and VAs for the 4KB and 16KB granules (note this
is already supported for 64KB granules via the FEAT_LPA and FEAT_LVA
extensions).
The series does not include support for FEAT_LPA2 in the kernel stage 1. This
support is provided separately by Ard Biesheuvel's series at [4]. The two series
are mostly independent.

This is a small update from v3, rebased onto v6.6-rc5 and incorporating some
minor changes based on review comments from Oliver.

NOTE: I've included my patch to update the range-based tlbi functions to work
with LPA2 in this version, because KVM has started using range-based tlbi
invalidation as of v6.6-rc1. I've done this in such a way that KVM-originated
calls will use the LPA2 format if LPA2 is in use by KVM, but the
kernel-originated calls are hardcoded to never use the LPA2 format. If merging
with Ard's series, you will need to update the 2 calls to __flush_tlb_range_op()
from __flush_tlb_range() appropriately.
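
As a minimal sketch of that merge fixup (kernel_uses_lpa2() is a placeholder;
Ard's series determines how kernel stage 1 LPA2 usage is actually tracked), the
hardcoded 'false' lpa2 argument in __flush_tlb_range() would become something
like:

    /*
     * Sketch only: kernel_uses_lpa2() is a placeholder for whichever
     * check Ard's series provides for kernel stage 1 LPA2 usage.
     */
    if (last_level)
        __flush_tlb_range_op(vale1is, start, pages, stride, asid,
                             tlb_level, true, kernel_uses_lpa2());
    else
        __flush_tlb_range_op(vae1is, start, pages, stride, asid,
                             tlb_level, true, kernel_uses_lpa2());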


Testing
=======

Testing has been done exclusively on the FVP and covers my boot matrix tests
and kvm selftests.

The host/guest config boot matrix gives the same (expected) results as for the
v3 submission; of 180 configs, 12 fail, and these are all due to attempting to
load the host kernel into high memory, which isn't expected to work until the
kernel has FEAT_LPA2 support for its stage 1. (Refer to the v1 posting for
details of the exact configs.)

KVM selftests have been enhanced to support P52V48 4K and 16K guest modes, and
all tests have been run against a P48V48_4K host and a P52V52_4K host (a run
takes about 10 hours on FVP, sigh, but I can test a few more host configs if
useful). All tests pass except "memslot_perf_test", which fails due to a timeout
while syncing. This test fails in the same way for plain v6.6-rc1, so I'm
confident this is not a regression caused by this series. (The issue is that
alarm(2) is issued and the signal is received before alarm(0) is issued. I
expect this is an FVP timing-related problem, although I'm not sure how to fix
it robustly for the FVP without potentially hanging real systems for long
periods of time.)


Changes since v3 [3]
====================

 - rebased onto v6.6-rc5.
 - has_lpa2() only requires consistency between s1 and s2 when
   kvm_get_mode() != KVM_MODE_NONE.
 - squashed 2 patches that enabled hyp s1 and vm s2 to use LPA2 pgtable format
   respectively, into 1 patch. There were shared accessors so the intermediate
   state was invalid.
 - use FIELD_PREP() to insert PS into TCR_EL2 instead of raw shift.


Changes since v2 [2]
====================

 - rebased onto v6.6-rc1
 - removed small amount of dead code erroneously introduced by previous rebase
 - reworked range-based tlbi to work with LPA2 (KVM has started using
   range-based tlbi as of v6.6-rc1)


Changes since v1 [1]
====================

 - Create CPU feature for LPA2 (enabled if both S1 and S2 report LPA2 support).
 - Use the CPU feature (and therefore code patching) to globally decide whether
   or not to use LPA2 PTE format; no more per-pgtable flag to pass around.
 - Removed the range-based TLBI changes, which are not required by KVM; leaves
   only minor changes to the non-range-based invalidation code.
 - Removed patch to encode/decode VTCR_EL2.SL2, and replaced with a comment
   describing why we never need to touch SL2 (stage 2 always uses concatenated
   first level lookup).
 - Added support for LPA2 guests in KVM selftests (VM_MODE_P52V48_4K enabled and
   new VM_MODE_P52V48_16K added).
 - Rebased onto 6.3-rc1.


[1] https://lore.kernel.org/kvmarm/20221206135930.3277585-1-ryan.roberts@arm.com/
[2] https://lore.kernel.org/kvmarm/20230306195438.1557851-1-ryan.roberts@arm.com/
[3] https://lore.kernel.org/kvmarm/20230918065740.3670662-1-ryan.roberts@arm.com/
[4] https://lore.kernel.org/linux-arm-kernel/20230912141549.278777-63-ardb@google.com/

Thanks,
Ryan


Anshuman Khandual (1):
  arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]

Ryan Roberts (11):
  arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2
  arm64/mm: Update range-based tlb invalidation routines for FEAT_LPA2
  KVM: arm64: Add ARM64_HAS_LPA2 CPU capability
  KVM: arm64: Add new (V)TCR_EL2 field definitions for FEAT_LPA2
  KVM: arm64: Use LPA2 page-tables for stage2 and hyp stage1
  KVM: arm64: Prepare TCR_EL2.PS in cpu_prepare_hyp_mode()
  KVM: arm64: Convert translation level parameter to s8
  KVM: arm64: Support up to 5 levels of translation in kvm_pgtable
  KVM: arm64: Allow guests with >48-bit IPA size on FEAT_LPA2 systems
  KVM: selftests: arm64: Determine max ipa size per-page size
  KVM: selftests: arm64: Support P52V48 4K and 16K guest_modes

 arch/arm64/include/asm/cpufeature.h           |  5 ++
 arch/arm64/include/asm/kvm_arm.h              |  2 +
 arch/arm64/include/asm/kvm_emulate.h          | 12 ++-
 arch/arm64/include/asm/kvm_pgtable.h          | 78 ++++++++++------
 arch/arm64/include/asm/kvm_pkvm.h             |  5 +-
 arch/arm64/include/asm/sysreg.h               |  5 ++
 arch/arm64/include/asm/tlb.h                  | 15 ++--
 arch/arm64/include/asm/tlbflush.h             | 85 +++++++++++-------
 arch/arm64/kernel/cpufeature.c                | 46 ++++++++++
 arch/arm64/kvm/arm.c                          |  4 +
 arch/arm64/kvm/hyp/nvhe/hyp-init.S            |  4 -
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  6 +-
 arch/arm64/kvm/hyp/nvhe/mm.c                  |  4 +-
 arch/arm64/kvm/hyp/nvhe/setup.c               |  2 +-
 arch/arm64/kvm/hyp/nvhe/tlb.c                 |  3 +-
 arch/arm64/kvm/hyp/pgtable.c                  | 88 ++++++++++++-------
 arch/arm64/kvm/hyp/vhe/tlb.c                  |  3 +-
 arch/arm64/kvm/mmu.c                          | 16 ++--
 arch/arm64/kvm/reset.c                        |  9 +-
 arch/arm64/tools/cpucaps                      |  1 +
 .../selftests/kvm/include/aarch64/processor.h |  4 +-
 .../selftests/kvm/include/kvm_util_base.h     |  1 +
 .../selftests/kvm/lib/aarch64/processor.c     | 66 +++++++++++---
 tools/testing/selftests/kvm/lib/guest_modes.c | 42 ++++-----
 tools/testing/selftests/kvm/lib/kvm_util.c    |  3 +
 25 files changed, 348 insertions(+), 161 deletions(-)

--
2.25.1



* [PATCH v4 01/12] arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2
From: Ryan Roberts @ 2023-10-09 18:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

FEAT_LPA2 impacts tlb invalidation in 2 ways. Firstly, the TTL field in
the non-range tlbi instructions can now validly take a 0 value for the
4KB granule (this is due to the extra level of translation). Secondly,
the BADDR field in the range tlbi instructions must be aligned to 64KB
when LPA2 is in use (TCR.DS=1). Changes are required for tlbi to
continue to operate correctly when LPA2 is in use.

KVM only uses the non-range (__tlbi_level()) routines. Therefore we only
solve the first problem with this patch.

It is solved by always adding the level hint if the level is between [0,
3] (previously anything other than 0 was hinted, which breaks in the new
level -1 case from KVM). When running on non-LPA2 HW, 0 is still safe to
hint as the HW will fall back to non-hinted. While we are at it, we
replace the notion of 0 being the non-hinted sentinel with a macro,
TLBI_TTL_UNKNOWN. This means callers won't need updating if/when
translation depth increases in future.
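
For illustration only (assuming addr has already been encoded with
__TLBI_VADDR() and ARM64_HAS_ARMv8_4_TTL is detected), the new behaviour for
callers is:

    __tlbi_level(vae1is, addr, 3);                  /* hinted, as before */
    __tlbi_level(vae1is, addr, 0);                  /* now hinted; 0 previously meant "no hint" */
    __tlbi_level(vae1is, addr, TLBI_TTL_UNKNOWN);   /* out of range (e.g. KVM's level -1): no hint */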

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/tlb.h      |  9 ++++---
 arch/arm64/include/asm/tlbflush.h | 43 +++++++++++++++++++------------
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 2c29239d05c3..93c537635dbb 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -22,15 +22,16 @@ static void tlb_flush(struct mmu_gather *tlb);
 #include <asm-generic/tlb.h>
 
 /*
- * get the tlbi levels in arm64.  Default value is 0 if more than one
- * of cleared_* is set or neither is set.
+ * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
+ * one of cleared_* is set or neither is set - this elides the level hinting to
+ * the hardware.
  * Arm64 doesn't support p4ds now.
  */
 static inline int tlb_get_level(struct mmu_gather *tlb)
 {
 	/* The TTL field is only valid for the leaf entry. */
 	if (tlb->freed_tables)
-		return 0;
+		return TLBI_TTL_UNKNOWN;
 
 	if (tlb->cleared_ptes && !(tlb->cleared_pmds ||
 				   tlb->cleared_puds ||
@@ -47,7 +48,7 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
 				   tlb->cleared_p4ds))
 		return 1;
 
-	return 0;
+	return TLBI_TTL_UNKNOWN;
 }
 
 static inline void tlb_flush(struct mmu_gather *tlb)
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index b149cf9f91bc..e688246b3b13 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -94,19 +94,22 @@ static inline unsigned long get_trans_granule(void)
  * When ARMv8.4-TTL exists, TLBI operations take an additional hint for
  * the level at which the invalidation must take place. If the level is
  * wrong, no invalidation may take place. In the case where the level
- * cannot be easily determined, a 0 value for the level parameter will
- * perform a non-hinted invalidation.
+ * cannot be easily determined, the value TLBI_TTL_UNKNOWN will perform
+ * a non-hinted invalidation. Any provided level outside the hint range
+ * will also cause fall-back to non-hinted invalidation.
  *
  * For Stage-2 invalidation, use the level values provided to that effect
  * in asm/stage2_pgtable.h.
  */
 #define TLBI_TTL_MASK		GENMASK_ULL(47, 44)
 
+#define TLBI_TTL_UNKNOWN	(-1)
+
 #define __tlbi_level(op, addr, level) do {				\
 	u64 arg = addr;							\
 									\
 	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&		\
-	    level) {							\
+	    level >= 0 && level <= 3) {					\
 		u64 ttl = level & 3;					\
 		ttl |= get_trans_granule() << 2;			\
 		arg &= ~TLBI_TTL_MASK;					\
@@ -134,16 +137,17 @@ static inline unsigned long get_trans_granule(void)
  * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
  *
  */
-#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
-	({							\
-		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
-		__ta &= GENMASK_ULL(36, 0);			\
-		__ta |= (unsigned long)(ttl) << 37;		\
-		__ta |= (unsigned long)(num) << 39;		\
-		__ta |= (unsigned long)(scale) << 44;		\
-		__ta |= get_trans_granule() << 46;		\
-		__ta |= (unsigned long)(asid) << 48;		\
-		__ta;						\
+#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)				\
+	({									\
+		unsigned long __ta = (addr) >> PAGE_SHIFT;			\
+		unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0;		\
+		__ta &= GENMASK_ULL(36, 0);					\
+		__ta |= __ttl << 37;						\
+		__ta |= (unsigned long)(num) << 39;				\
+		__ta |= (unsigned long)(scale) << 44;				\
+		__ta |= get_trans_granule() << 46;				\
+		__ta |= (unsigned long)(asid) << 48;				\
+		__ta;								\
 	})
 
 /* These macros are used by the TLBI RANGE feature. */
@@ -216,12 +220,16 @@ static inline unsigned long get_trans_granule(void)
  *		CPUs, ensuring that any walk-cache entries associated with the
  *		translation are also invalidated.
  *
- *	__flush_tlb_range(vma, start, end, stride, last_level)
+ *	__flush_tlb_range(vma, start, end, stride, last_level, tlb_level)
  *		Invalidate the virtual-address range '[start, end)' on all
  *		CPUs for the user address space corresponding to 'vma->mm'.
  *		The invalidation operations are issued at a granularity
  *		determined by 'stride' and only affect any walk-cache entries
- *		if 'last_level' is equal to false.
+ *		if 'last_level' is equal to false. tlb_level is the level at
+ *		which the invalidation must take place. If the level is wrong,
+ *		no invalidation may take place. In the case where the level
+ *		cannot be easily determined, the value TLBI_TTL_UNKNOWN will
+ *		perform a non-hinted invalidation.
  *
  *
  *	Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
@@ -442,9 +450,10 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 	/*
 	 * We cannot use leaf-only invalidation here, since we may be invalidating
 	 * table entries as part of collapsing hugepages or moving page tables.
-	 * Set the tlb_level to 0 because we can not get enough information here.
+	 * Set the tlb_level to TLBI_TTL_UNKNOWN because we can not get enough
+	 * information here.
 	 */
-	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
+	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
 }
 
 static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
-- 
2.25.1



* [PATCH v4 02/12] arm64/mm: Update range-based tlb invalidation routines for FEAT_LPA2
From: Ryan Roberts @ 2023-10-09 18:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

The BADDR field of the range-based tlbi instructions is specified in
64KB units when LPA2 is in use (TCR.DS=1), whereas it is in page units
otherwise.

When LPA2 is enabled, use the non-range tlbi instructions to forward
align to a 64KB boundary first, then use range-based tlbi from there on,
until we have either invalidated all pages or we have a single page
remaining. If the latter, that is done with non-range tlbi. (Previously
we invalidated a single odd page first, but we can no longer do this
because it could wreck our 64KB alignment). When LPA2 is not in use, we
don't need the initial alignment step. However, the bigger impact is
that we can no longer use the previous method of iterating from smallest
to largest 'scale', since this would likely unalign the boundary again
for the LPA2 case. So instead we iterate from highest to lowest scale,
which guarantees that we remain 64KB aligned until the last op (at
scale=0).
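
As a worked example (a sketch assuming a 4KB granule, LPA2 in use, range ops
supported, and a start address 15 pages short of a 64KB boundary), invalidating
529 pages proceeds as:

  - 15 non-range tlbis bring 'start' up to the 64KB boundary (514 pages left)
  - scale = 3 and 2: __TLBI_RANGE_NUM() is negative, so no range op is issued
  - scale = 1: num = 7, one range op flushes (7+1) << 6 = 512 pages (2 left)
  - scale = 0: num = 0, one range op flushes the final 2 pages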

The original commit (d1d3aa98 "arm64: tlb: Use the TLBI RANGE feature in
arm64") stated this as the reason for incrementing scale:

  However, in most scenarios, the pages = 1 when flush_tlb_range() is
  called. Start from scale = 3 or other proper value (such as scale
  =ilog2(pages)), will incur extra overhead. So increase 'scale' from 0
  to maximum, the flush order is exactly opposite to the example.

But pages=1 is already special cased by the non-range invalidation path,
which will take care of it the first time through the loop (both in the
original commit and in my change), so I don't think switching to
decrement scale should have any extra performance impact after all.

Note: This patch uses LPA2 range-based tlbi based on the new lpa2 param
passed to __flush_tlb_range_op(). This allows both KVM and the kernel to
opt-in/out of LPA2 usage independently. But once both are converted over
(and keyed off the same static key), the parameter could be dropped and
replaced by the static key directly in the macro.
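
(As a sketch of that end state, not done here: __TLBI_VADDR_RANGE() could
compute its shift directly from the capability added later in this series,
e.g. "__addr_shift = system_supports_lpa2() ? 16 : PAGE_SHIFT", at which point
the explicit lpa2 parameter becomes redundant.)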

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/tlb.h      |  6 +++-
 arch/arm64/include/asm/tlbflush.h | 46 ++++++++++++++++++++-----------
 arch/arm64/kvm/hyp/nvhe/tlb.c     |  2 +-
 arch/arm64/kvm/hyp/vhe/tlb.c      |  2 +-
 4 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 93c537635dbb..396ba9b4872c 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -25,7 +25,6 @@ static void tlb_flush(struct mmu_gather *tlb);
  * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
  * one of cleared_* is set or neither is set - this elides the level hinting to
  * the hardware.
- * Arm64 doesn't support p4ds now.
  */
 static inline int tlb_get_level(struct mmu_gather *tlb)
 {
@@ -48,6 +47,11 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
 				   tlb->cleared_p4ds))
 		return 1;
 
+	if (tlb->cleared_p4ds && !(tlb->cleared_ptes ||
+				   tlb->cleared_pmds ||
+				   tlb->cleared_puds))
+		return 0;
+
 	return TLBI_TTL_UNKNOWN;
 }
 
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index e688246b3b13..4d34035fe7d6 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -136,10 +136,14 @@ static inline unsigned long get_trans_granule(void)
  * The address range is determined by below formula:
  * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
  *
+ * If LPA2 is in use, BADDR holds addr[52:16]. Else BADDR holds page number.
+ * See ARM DDI 0487I.a C5.5.21.
+ *
  */
-#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)				\
+#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl, lpa2)			\
 	({									\
-		unsigned long __ta = (addr) >> PAGE_SHIFT;			\
+		unsigned long __addr_shift = lpa2 ? 16 : PAGE_SHIFT;		\
+		unsigned long __ta = (addr) >> __addr_shift;			\
 		unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0;		\
 		__ta &= GENMASK_ULL(36, 0);					\
 		__ta |= __ttl << 37;						\
@@ -354,34 +358,44 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
  * @tlb_level:	Translation Table level hint, if known
  * @tlbi_user:	If 'true', call an additional __tlbi_user()
  *              (typically for user ASIDs). 'flase' for IPA instructions
+ * @lpa2:	If 'true', the lpa2 scheme is used as set out below
  *
  * When the CPU does not support TLB range operations, flush the TLB
  * entries one by one at the granularity of 'stride'. If the TLB
  * range ops are supported, then:
  *
- * 1. If 'pages' is odd, flush the first page through non-range
- *    operations;
+ * 1. If FEAT_LPA2 is in use, the start address of a range operation
+ *    must be 64KB aligned, so flush pages one by one until the
+ *    alignment is reached using the non-range operations. This step is
+ *    skipped if LPA2 is not in use.
  *
  * 2. For remaining pages: the minimum range granularity is decided
  *    by 'scale', so multiple range TLBI operations may be required.
- *    Start from scale = 0, flush the corresponding number of pages
- *    ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
- *    until no pages left.
+ *    Start from scale = 3, flush the corresponding number of pages
+ *    ((num+1)*2^(5*scale+1) starting from 'addr'), then decrease it
+ *    until one or zero pages are left. We must start from highest scale
+ *    to ensure 64KB start alignment is maintained in the LPA2 case.
+ *
+ * 3. If there is 1 page remaining, flush it through non-range
+ *    operations. Range operations can only span an even number of
+ *    pages. We save this for last to ensure 64KB start alignment is
+ *    maintained for the LPA2 case.
  *
  * Note that certain ranges can be represented by either num = 31 and
  * scale or num = 0 and scale + 1. The loop below favours the latter
  * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
  */
 #define __flush_tlb_range_op(op, start, pages, stride,			\
-				asid, tlb_level, tlbi_user)		\
+				asid, tlb_level, tlbi_user, lpa2)	\
 do {									\
 	int num = 0;							\
-	int scale = 0;							\
+	int scale = 3;							\
 	unsigned long addr;						\
 									\
 	while (pages > 0) {						\
 		if (!system_supports_tlb_range() ||			\
-		    pages % 2 == 1) {					\
+		    pages == 1 ||					\
+		    (lpa2 && start != ALIGN(start, SZ_64K))) {		\
 			addr = __TLBI_VADDR(start, asid);		\
 			__tlbi_level(op, addr, tlb_level);		\
 			if (tlbi_user)					\
@@ -394,19 +408,19 @@ do {									\
 		num = __TLBI_RANGE_NUM(pages, scale);			\
 		if (num >= 0) {						\
 			addr = __TLBI_VADDR_RANGE(start, asid, scale,	\
-						  num, tlb_level);	\
+						num, tlb_level, lpa2);	\
 			__tlbi(r##op, addr);				\
 			if (tlbi_user)					\
 				__tlbi_user(r##op, addr);		\
 			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
 			pages -= __TLBI_RANGE_PAGES(num, scale);	\
 		}							\
-		scale++;						\
+		scale--;						\
 	}								\
 } while (0)
 
-#define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level) \
-	__flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false)
+#define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level, lpa2) \
+	__flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false, lpa2)
 
 static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long start, unsigned long end,
@@ -436,9 +450,9 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 	asid = ASID(vma->vm_mm);
 
 	if (last_level)
-		__flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true);
+		__flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true, false);
 	else
-		__flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true);
+		__flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true, false);
 
 	dsb(ish);
 	mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
index 1b265713d6be..d42b72f78a9b 100644
--- a/arch/arm64/kvm/hyp/nvhe/tlb.c
+++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
@@ -198,7 +198,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
 	/* Switch to requested VMID */
 	__tlb_switch_to_guest(mmu, &cxt, false);
 
-	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0);
+	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
 
 	dsb(ish);
 	__tlbi(vmalle1is);
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index 46bd43f61d76..6041c6c78984 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -161,7 +161,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
 	/* Switch to requested VMID */
 	__tlb_switch_to_guest(mmu, &cxt);
 
-	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0);
+	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
 
 	dsb(ish);
 	__tlbi(vmalle1is);
-- 
2.25.1



* [PATCH v4 03/12] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
From: Ryan Roberts @ 2023-10-09 18:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

From: Anshuman Khandual <anshuman.khandual@arm.com>

PAGE_SIZE support is tested against possible minimum and maximum values for
its respective ID_AA64MMFR0.TGRAN field, depending on whether it is signed
or unsigned. The FEAT_LPA2 implementation additionally needs to be validated
for 4K and 16K page sizes via feature-specific ID_AA64MMFR0.TGRAN values.
Hence add the FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] values per the ARM ARM
(0487G.A).

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/sysreg.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 38296579a4fd..bc782116315a 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -826,10 +826,12 @@
 
 /* id_aa64mmfr0 */
 #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN	0x0
+#define ID_AA64MMFR0_EL1_TGRAN4_LPA2		ID_AA64MMFR0_EL1_TGRAN4_52_BIT
 #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX	0x7
 #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MIN	0x0
 #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MAX	0x7
 #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN	0x1
+#define ID_AA64MMFR0_EL1_TGRAN16_LPA2		ID_AA64MMFR0_EL1_TGRAN16_52_BIT
 #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX	0xf
 
 #define ARM64_MIN_PARANGE_BITS		32
@@ -837,6 +839,7 @@
 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_DEFAULT	0x0
 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_NONE		0x1
 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MIN		0x2
+#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2		0x3
 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MAX		0x7
 
 #ifdef CONFIG_ARM64_PA_BITS_52
@@ -847,11 +850,13 @@
 
 #if defined(CONFIG_ARM64_4K_PAGES)
 #define ID_AA64MMFR0_EL1_TGRAN_SHIFT		ID_AA64MMFR0_EL1_TGRAN4_SHIFT
+#define ID_AA64MMFR0_EL1_TGRAN_LPA2		ID_AA64MMFR0_EL1_TGRAN4_52_BIT
 #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN	ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN
 #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX	ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX
 #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT		ID_AA64MMFR0_EL1_TGRAN4_2_SHIFT
 #elif defined(CONFIG_ARM64_16K_PAGES)
 #define ID_AA64MMFR0_EL1_TGRAN_SHIFT		ID_AA64MMFR0_EL1_TGRAN16_SHIFT
+#define ID_AA64MMFR0_EL1_TGRAN_LPA2		ID_AA64MMFR0_EL1_TGRAN16_52_BIT
 #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN	ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN
 #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX	ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX
 #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT		ID_AA64MMFR0_EL1_TGRAN16_2_SHIFT
-- 
2.25.1



* [PATCH v4 04/12] KVM: arm64: Add ARM64_HAS_LPA2 CPU capability
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

Expose FEAT_LPA2 as a capability so that we can take advantage of
alternatives patching in both the kernel and hypervisor.

Although FEAT_LPA2 presence is advertised separately for stage1 and
stage2, the expectation is that in practice both stages will either
support or not support it. Therefore, for the case where KVM is present,
we combine both into a single capability, allowing us to simplify the
implementation. For the case where KVM is not present, we only care
about stage1.
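
A minimal sketch of how the capability is expected to be consumed (the actual
users, and the VTCR_EL2_DS definition assumed below, arrive in later patches of
this series):

    if (system_supports_lpa2())
        vtcr |= VTCR_EL2_DS;    /* assumed field name: select the LPA2 format at stage 2 */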

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/cpufeature.h |  5 ++++
 arch/arm64/kernel/cpufeature.c      | 46 +++++++++++++++++++++++++++++
 arch/arm64/tools/cpucaps            |  1 +
 3 files changed, 52 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 5bba39376055..b1292ec88538 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -831,6 +831,11 @@ static inline bool system_supports_tlb_range(void)
 		cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
 }
 
+static inline bool system_supports_lpa2(void)
+{
+	return cpus_have_const_cap(ARM64_HAS_LPA2);
+}
+
 int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
 bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 444a73c2e638..1ccb1fe0e310 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1746,6 +1746,46 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
 	return !meltdown_safe;
 }
 
+static inline bool has_lpa2_at_stage1(u64 mmfr0)
+{
+#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
+	unsigned int tgran;
+
+	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
+						ID_AA64MMFR0_EL1_TGRAN_SHIFT);
+	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
+#else
+	return false;
+#endif
+}
+
+static inline bool has_lpa2_at_stage2(u64 mmfr0)
+{
+#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
+	unsigned int tgran;
+
+	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
+						ID_AA64MMFR0_EL1_TGRAN_2_SHIFT);
+	return tgran == ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2;
+#else
+	return false;
+#endif
+}
+
+static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
+{
+	u64 mmfr0;
+	bool ret;
+
+	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	ret = has_lpa2_at_stage1(mmfr0);
+
+	if (kvm_get_mode() != KVM_MODE_NONE)
+		ret = ret && has_lpa2_at_stage2(mmfr0);
+
+	return ret;
+}
+
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 #define KPTI_NG_TEMP_VA		(-(1UL << PMD_SHIFT))
 
@@ -2719,6 +2759,12 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_cpuid_feature,
 		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
 	},
+	{
+		.desc = "Large Physical Address 2",
+		.capability = ARM64_HAS_LPA2,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_lpa2,
+	},
 	{},
 };
 
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index dea3dc89234b..07f3957b8488 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -36,6 +36,7 @@ HAS_GIC_PRIO_MASKING
 HAS_GIC_PRIO_RELAXED_SYNC
 HAS_HCX
 HAS_LDAPR
+HAS_LPA2
 HAS_LSE_ATOMICS
 HAS_MOPS
 HAS_NESTED_VIRT
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 05/12] KVM: arm64: Add new (V)TCR_EL2 field definitions for FEAT_LPA2
  2023-10-09 18:49 ` Ryan Roberts
@ 2023-10-09 18:50   ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

As per the Arm ARM (0487I.a), the (V)TCR_EL2.DS fields control whether
52-bit input and output addresses are supported on 4K and 16K page size
configurations when FEAT_LPA2 is known to have been implemented.

This adds these field definitions, which will be used by KVM when
FEAT_LPA2 is enabled.
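
A small point worth noting: DS is bit 32, so the define needs a 64-bit
constant (1UL rather than the 1U used for the neighbouring bit-31 RES1
define). A tiny stand-alone illustration, using 1ULL so the sketch stays
portable off arm64 (where unsigned long is 64-bit):

  #include <stdint.h>
  #include <stdio.h>

  #define TCR_EL2_DS   (1ULL << 32)   /* bit 32 needs a 64-bit constant */

  int main(void)
  {
          uint64_t tcr = 0;

          tcr |= TCR_EL2_DS;   /* enable 52-bit IA/OA for 4K/16K granules */
          printf("DS set: %d\n", (tcr & TCR_EL2_DS) != 0);
          return 0;
  }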

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 5882b2415596..e370e015d7c8 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -106,6 +106,7 @@
 #define HCRX_HOST_FLAGS (HCRX_EL2_MSCEn | HCRX_EL2_TCR2En)
 
 /* TCR_EL2 Registers bits */
+#define TCR_EL2_DS		(1UL << 32)
 #define TCR_EL2_RES1		((1U << 31) | (1 << 23))
 #define TCR_EL2_TBI		(1 << 20)
 #define TCR_EL2_PS_SHIFT	16
@@ -120,6 +121,7 @@
 			 TCR_EL2_ORGN0_MASK | TCR_EL2_IRGN0_MASK | TCR_EL2_T0SZ_MASK)
 
 /* VTCR_EL2 Registers bits */
+#define VTCR_EL2_DS		TCR_EL2_DS
 #define VTCR_EL2_RES1		(1U << 31)
 #define VTCR_EL2_HD		(1 << 22)
 #define VTCR_EL2_HA		(1 << 21)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 06/12] KVM: arm64: Use LPA2 page-tables for stage2 and hyp stage1
  2023-10-09 18:49 ` Ryan Roberts
@ 2023-10-09 18:50   ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

Implement a simple policy whereby if the HW supports FEAT_LPA2 for the
page size we are using, always use LPA2-style page-tables for stage 2
and hyp stage 1, regardless of the VMM-requested IPA size or
HW-implemented PA size. When LPA2 is in use, we can support up to 52-bit
IPA and PA sizes.

We use the previously created cpu feature, which tracks whether LPA2 is
supported, to decide whether to use the LPA2 or classic pte format.

Note that FEAT_LPA2 brings support for bigger block mappings (512GB with
4KB, 64GB with 16KB). We explicitly don't enable these in the library
because stage2_apply_range() works on batch sizes of the largest used
block mapping, and increasing the size of the batch would lead to soft
lockups. See commit 5994bc9e05c2 ("KVM: arm64: Limit
stage2_apply_range() batch size to largest block").

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 47 +++++++++++++++++++++-------
 arch/arm64/kvm/arm.c                 |  2 ++
 arch/arm64/kvm/hyp/nvhe/tlb.c        |  3 +-
 arch/arm64/kvm/hyp/pgtable.c         | 15 +++++++--
 arch/arm64/kvm/hyp/vhe/tlb.c         |  3 +-
 5 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index d3e354bb8351..b240158e1218 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -25,12 +25,22 @@
 #define KVM_PGTABLE_MIN_BLOCK_LEVEL	2U
 #endif
 
+static inline u64 kvm_get_parange_max(void)
+{
+	if (system_supports_lpa2() ||
+	   (IS_ENABLED(CONFIG_ARM64_PA_BITS_52) && PAGE_SIZE == SZ_64K))
+		return ID_AA64MMFR0_EL1_PARANGE_52;
+	else
+		return ID_AA64MMFR0_EL1_PARANGE_48;
+}
+
 static inline u64 kvm_get_parange(u64 mmfr0)
 {
+	u64 parange_max = kvm_get_parange_max();
 	u64 parange = cpuid_feature_extract_unsigned_field(mmfr0,
 				ID_AA64MMFR0_EL1_PARANGE_SHIFT);
-	if (parange > ID_AA64MMFR0_EL1_PARANGE_MAX)
-		parange = ID_AA64MMFR0_EL1_PARANGE_MAX;
+	if (parange > parange_max)
+		parange = parange_max;
 
 	return parange;
 }
@@ -41,6 +51,8 @@ typedef u64 kvm_pte_t;
 
 #define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
 #define KVM_PTE_ADDR_51_48		GENMASK(15, 12)
+#define KVM_PTE_ADDR_MASK_LPA2		GENMASK(49, PAGE_SHIFT)
+#define KVM_PTE_ADDR_51_50_LPA2		GENMASK(9, 8)
 
 #define KVM_PHYS_INVALID		(-1ULL)
 
@@ -51,21 +63,34 @@ static inline bool kvm_pte_valid(kvm_pte_t pte)
 
 static inline u64 kvm_pte_to_phys(kvm_pte_t pte)
 {
-	u64 pa = pte & KVM_PTE_ADDR_MASK;
-
-	if (PAGE_SHIFT == 16)
-		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
+	u64 pa;
+
+	if (system_supports_lpa2()) {
+		pa = pte & KVM_PTE_ADDR_MASK_LPA2;
+		pa |= FIELD_GET(KVM_PTE_ADDR_51_50_LPA2, pte) << 50;
+	} else {
+		pa = pte & KVM_PTE_ADDR_MASK;
+		if (PAGE_SHIFT == 16)
+			pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
+	}
 
 	return pa;
 }
 
 static inline kvm_pte_t kvm_phys_to_pte(u64 pa)
 {
-	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
-
-	if (PAGE_SHIFT == 16) {
-		pa &= GENMASK(51, 48);
-		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
+	kvm_pte_t pte;
+
+	if (system_supports_lpa2()) {
+		pte = pa & KVM_PTE_ADDR_MASK_LPA2;
+		pa &= GENMASK(51, 50);
+		pte |= FIELD_PREP(KVM_PTE_ADDR_51_50_LPA2, pa >> 50);
+	} else {
+		pte = pa & KVM_PTE_ADDR_MASK;
+		if (PAGE_SHIFT == 16) {
+			pa &= GENMASK(51, 48);
+			pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
+		}
 	}
 
 	return pte;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 4866b3f7b4ea..73cc67c2a8a7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1747,6 +1747,8 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
 	}
 	tcr &= ~TCR_T0SZ_MASK;
 	tcr |= TCR_T0SZ(hyp_va_bits);
+	if (system_supports_lpa2())
+		tcr |= TCR_EL2_DS;
 	params->tcr_el2 = tcr;
 
 	params->pgd_pa = kvm_mmu_get_httbr();
diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
index d42b72f78a9b..c3cd16c6f95f 100644
--- a/arch/arm64/kvm/hyp/nvhe/tlb.c
+++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
@@ -198,7 +198,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
 	/* Switch to requested VMID */
 	__tlb_switch_to_guest(mmu, &cxt, false);
 
-	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
+	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0,
+				system_supports_lpa2());
 
 	dsb(ish);
 	__tlbi(vmalle1is);
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index f155b8c9e98c..062eb7bcdb8a 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -79,7 +79,10 @@ static bool kvm_pgtable_walk_skip_cmo(const struct kvm_pgtable_visit_ctx *ctx)
 
 static bool kvm_phys_is_valid(u64 phys)
 {
-	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
+	u64 parange_max = kvm_get_parange_max();
+	u8 shift = id_aa64mmfr0_parange_to_phys_shift(parange_max);
+
+	return phys < BIT(shift);
 }
 
 static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx, u64 phys)
@@ -408,7 +411,8 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
 	}
 
 	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
-	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
+	if (!system_supports_lpa2())
+		attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
 	attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
 	attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
 	*ptep = attr;
@@ -654,6 +658,9 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 		vtcr |= VTCR_EL2_HA;
 #endif /* CONFIG_ARM64_HW_AFDBM */
 
+	if (system_supports_lpa2())
+		vtcr |= VTCR_EL2_DS;
+
 	/* Set the vmid bits */
 	vtcr |= (get_vmid_bits(mmfr1) == 16) ?
 		VTCR_EL2_VS_16BIT :
@@ -711,7 +718,9 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
 	if (prot & KVM_PGTABLE_PROT_W)
 		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
 
-	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
+	if (!system_supports_lpa2())
+		attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
+
 	attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
 	attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
 	*ptep = attr;
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index 6041c6c78984..40cea2482a76 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -161,7 +161,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
 	/* Switch to requested VMID */
 	__tlb_switch_to_guest(mmu, &cxt);
 
-	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
+	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0,
+				system_supports_lpa2());
 
 	dsb(ish);
 	__tlbi(vmalle1is);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 07/12] KVM: arm64: Prepare TCR_EL2.PS in cpu_prepare_hyp_mode()
  2023-10-09 18:49 ` Ryan Roberts
@ 2023-10-09 18:50   ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

With the addition of LPA2 support in the hypervisor, the PA size
supported by the HW must be capped with a runtime decision, rather than
simply using a compile-time decision based on PA_BITS. For example, on a
system that advertises 52 bit PA but does not support FEAT_LPA2, A 4KB
or 16KB kernel compiled with LPA2 support must still limit the PA size
to 48 bits.

Therefore, move the insertion of the PS field into TCR_EL2 out of
__kvm_hyp_init assembly code and instead do it in cpu_prepare_hyp_mode()
where the rest of TCR_EL2 is prepared. This allows us to figure out PS
with kvm_get_parange(), which has the appropriate logic to ensure the
above requirement (the PS field of VTCR_EL2 is already populated this
way).
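
A stand-alone sketch of the approach, simplified to a single "LPA2 in use"
flag (the real code uses kvm_get_parange()/kvm_get_parange_max() and also
handles the 64K/FEAT_LPA case); field positions and PARange encodings are
assumed from the Arm ARM:

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define TCR_EL2_PS_SHIFT  16
  #define TCR_EL2_PS_MASK   (0x7ULL << TCR_EL2_PS_SHIFT)

  #define PARANGE_48        0x5   /* 48-bit PA */
  #define PARANGE_52        0x6   /* 52-bit PA */

  /* Cap the advertised PARange at what the pgtable format can express */
  static uint64_t get_parange(uint64_t mmfr0, bool lpa2)
  {
          uint64_t parange = mmfr0 & 0xf;   /* PARange is bits [3:0] */
          uint64_t max = lpa2 ? PARANGE_52 : PARANGE_48;

          return parange > max ? max : parange;
  }

  int main(void)
  {
          uint64_t mmfr0 = PARANGE_52;   /* HW advertises 52-bit PA...  */
          bool lpa2 = false;             /* ...but no FEAT_LPA2 at 4K   */
          uint64_t tcr = 0;

          tcr &= ~TCR_EL2_PS_MASK;
          tcr |= get_parange(mmfr0, lpa2) << TCR_EL2_PS_SHIFT;

          printf("TCR_EL2.PS = %llu\n", (unsigned long long)
                 ((tcr & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT));
          return 0;
  }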

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/kvm/arm.c               | 3 +++
 arch/arm64/kvm/hyp/nvhe/hyp-init.S | 4 ----
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 73cc67c2a8a7..0bb8918475d2 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1726,6 +1726,7 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
 {
 	struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
 	unsigned long tcr;
+	u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 
 	/*
 	 * Calculate the raw per-cpu offset without a translation from the
@@ -1747,6 +1748,8 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
 	}
 	tcr &= ~TCR_T0SZ_MASK;
 	tcr |= TCR_T0SZ(hyp_va_bits);
+	tcr &= ~TCR_EL2_PS_MASK;
+	tcr |= FIELD_PREP(TCR_EL2_PS_MASK, kvm_get_parange(mmfr0));
 	if (system_supports_lpa2())
 		tcr |= TCR_EL2_DS;
 	params->tcr_el2 = tcr;
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index 1cc06e6797bd..f62a7d360285 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -122,11 +122,7 @@ alternative_if ARM64_HAS_CNP
 alternative_else_nop_endif
 	msr	ttbr0_el2, x2
 
-	/*
-	 * Set the PS bits in TCR_EL2.
-	 */
 	ldr	x0, [x0, #NVHE_INIT_TCR_EL2]
-	tcr_compute_pa_size x0, #TCR_EL2_PS_SHIFT, x1, x2
 	msr	tcr_el2, x0
 
 	isb
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 08/12] KVM: arm64: Convert translation level parameter to s8
  2023-10-09 18:49 ` Ryan Roberts
@ 2023-10-09 18:50   ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

With the introduction of FEAT_LPA2, the Arm ARM adds a new level of
translation, level -1, so levels can now be in the range [-1, 3]. Level 3
is always the last level and the first level is determined based on the
number of VA bits in use.

Convert level variables to use a signed type in preparation for
supporting this new level -1.

Since the last level is always anchored at 3, and the first level varies
to suit the number of VA/IPA bits, take the opportunity to replace
KVM_PGTABLE_MAX_LEVELS with the 2 macros KVM_PGTABLE_FIRST_LEVEL and
KVM_PGTABLE_LAST_LEVEL. This removes the assumption from the code that
levels run from 0 to KVM_PGTABLE_MAX_LEVELS - 1, which will soon no
longer be true.

No behavioral changes intended.
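
As an aside, a tiny stand-alone sketch of why the signed type matters,
using the eventual [-1, 3] range for illustration (this patch itself still
defines KVM_PGTABLE_FIRST_LEVEL as 0):

  #include <stdio.h>

  #define FIRST_LEVEL  (-1)   /* hypothetical, arrives with LPA2 support */
  #define LAST_LEVEL   3

  int main(void)
  {
          signed char level;   /* 's8' in kernel terms */

          /*
           * With an unsigned type, level = -1 would wrap to a huge value
           * and bounds checks against the first level would misbehave; a
           * signed type keeps the comparisons honest.
           */
          for (level = FIRST_LEVEL; level <= LAST_LEVEL; level++)
                  printf("walking level %d\n", level);

          return 0;
  }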

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h  |  2 +-
 arch/arm64/include/asm/kvm_pgtable.h  | 31 ++++++-------
 arch/arm64/include/asm/kvm_pkvm.h     |  5 ++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  6 +--
 arch/arm64/kvm/hyp/nvhe/mm.c          |  4 +-
 arch/arm64/kvm/hyp/nvhe/setup.c       |  2 +-
 arch/arm64/kvm/hyp/pgtable.c          | 64 ++++++++++++++-------------
 arch/arm64/kvm/mmu.c                  | 16 ++++---
 8 files changed, 69 insertions(+), 61 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 3d6725ff0bf6..bf3ef66eb51f 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -404,7 +404,7 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
 	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
 }
 
-static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
+static __always_inline s8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
 {
 	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
 }
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index b240158e1218..c61bb9709201 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -11,7 +11,8 @@
 #include <linux/kvm_host.h>
 #include <linux/types.h>
 
-#define KVM_PGTABLE_MAX_LEVELS		4U
+#define KVM_PGTABLE_FIRST_LEVEL		0
+#define KVM_PGTABLE_LAST_LEVEL		3
 
 /*
  * The largest supported block sizes for KVM (no 52-bit PA support):
@@ -20,9 +21,9 @@
  *  - 64K (level 2):	512MB
  */
 #ifdef CONFIG_ARM64_4K_PAGES
-#define KVM_PGTABLE_MIN_BLOCK_LEVEL	1U
+#define KVM_PGTABLE_MIN_BLOCK_LEVEL	1
 #else
-#define KVM_PGTABLE_MIN_BLOCK_LEVEL	2U
+#define KVM_PGTABLE_MIN_BLOCK_LEVEL	2
 #endif
 
 static inline u64 kvm_get_parange_max(void)
@@ -101,28 +102,28 @@ static inline kvm_pfn_t kvm_pte_to_pfn(kvm_pte_t pte)
 	return __phys_to_pfn(kvm_pte_to_phys(pte));
 }
 
-static inline u64 kvm_granule_shift(u32 level)
+static inline u64 kvm_granule_shift(s8 level)
 {
-	/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
+	/* Assumes KVM_PGTABLE_LAST_LEVEL is 3 */
 	return ARM64_HW_PGTABLE_LEVEL_SHIFT(level);
 }
 
-static inline u64 kvm_granule_size(u32 level)
+static inline u64 kvm_granule_size(s8 level)
 {
 	return BIT(kvm_granule_shift(level));
 }
 
-static inline bool kvm_level_supports_block_mapping(u32 level)
+static inline bool kvm_level_supports_block_mapping(s8 level)
 {
 	return level >= KVM_PGTABLE_MIN_BLOCK_LEVEL;
 }
 
 static inline u32 kvm_supported_block_sizes(void)
 {
-	u32 level = KVM_PGTABLE_MIN_BLOCK_LEVEL;
+	s8 level = KVM_PGTABLE_MIN_BLOCK_LEVEL;
 	u32 r = 0;
 
-	for (; level < KVM_PGTABLE_MAX_LEVELS; level++)
+	for (; level <= KVM_PGTABLE_LAST_LEVEL; level++)
 		r |= BIT(kvm_granule_shift(level));
 
 	return r;
@@ -167,7 +168,7 @@ struct kvm_pgtable_mm_ops {
 	void*		(*zalloc_page)(void *arg);
 	void*		(*zalloc_pages_exact)(size_t size);
 	void		(*free_pages_exact)(void *addr, size_t size);
-	void		(*free_unlinked_table)(void *addr, u32 level);
+	void		(*free_unlinked_table)(void *addr, s8 level);
 	void		(*get_page)(void *addr);
 	void		(*put_page)(void *addr);
 	int		(*page_count)(void *addr);
@@ -263,7 +264,7 @@ struct kvm_pgtable_visit_ctx {
 	u64					start;
 	u64					addr;
 	u64					end;
-	u32					level;
+	s8					level;
 	enum kvm_pgtable_walk_flags		flags;
 };
 
@@ -366,7 +367,7 @@ static inline bool kvm_pgtable_walk_lock_held(void)
  */
 struct kvm_pgtable {
 	u32					ia_bits;
-	u32					start_level;
+	s8					start_level;
 	kvm_pteref_t				pgd;
 	struct kvm_pgtable_mm_ops		*mm_ops;
 
@@ -500,7 +501,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
  * The page-table is assumed to be unreachable by any hardware walkers prior to
  * freeing and therefore no TLB invalidation is performed.
  */
-void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level);
+void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level);
 
 /**
  * kvm_pgtable_stage2_create_unlinked() - Create an unlinked stage-2 paging structure.
@@ -524,7 +525,7 @@ void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *p
  * an ERR_PTR(error) on failure.
  */
 kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
-					      u64 phys, u32 level,
+					      u64 phys, s8 level,
 					      enum kvm_pgtable_prot prot,
 					      void *mc, bool force_pte);
 
@@ -750,7 +751,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
  * Return: 0 on success, negative error code on failure.
  */
 int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
-			 kvm_pte_t *ptep, u32 *level);
+			 kvm_pte_t *ptep, s8 *level);
 
 /**
  * kvm_pgtable_stage2_pte_prot() - Retrieve the protection attributes of a
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index e46250a02017..ad9cfb5c1ff4 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -56,10 +56,11 @@ static inline unsigned long hyp_vm_table_pages(void)
 
 static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 {
-	unsigned long total = 0, i;
+	unsigned long total = 0;
+	int i;
 
 	/* Provision the worst case scenario */
-	for (i = 0; i < KVM_PGTABLE_MAX_LEVELS; i++) {
+	for (i = KVM_PGTABLE_FIRST_LEVEL; i <= KVM_PGTABLE_LAST_LEVEL; i++) {
 		nr_pages = DIV_ROUND_UP(nr_pages, PTRS_PER_PTE);
 		total += nr_pages;
 	}
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 9d703441278b..2cfb6352a8ea 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -91,7 +91,7 @@ static void host_s2_put_page(void *addr)
 	hyp_put_page(&host_s2_pool, addr);
 }
 
-static void host_s2_free_unlinked_table(void *addr, u32 level)
+static void host_s2_free_unlinked_table(void *addr, s8 level)
 {
 	kvm_pgtable_stage2_free_unlinked(&host_mmu.mm_ops, addr, level);
 }
@@ -443,7 +443,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 {
 	struct kvm_mem_range cur;
 	kvm_pte_t pte;
-	u32 level;
+	s8 level;
 	int ret;
 
 	hyp_assert_lock_held(&host_mmu.lock);
@@ -462,7 +462,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 		cur.start = ALIGN_DOWN(addr, granule);
 		cur.end = cur.start + granule;
 		level++;
-	} while ((level < KVM_PGTABLE_MAX_LEVELS) &&
+	} while ((level <= KVM_PGTABLE_LAST_LEVEL) &&
 			!(kvm_level_supports_block_mapping(level) &&
 			  range_included(&cur, range)));
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 65a7a186d7b2..b01a3d1078a8 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -260,7 +260,7 @@ static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
 	 * https://lore.kernel.org/kvm/20221017115209.2099-1-will@kernel.org/T/#mf10dfbaf1eaef9274c581b81c53758918c1d0f03
 	 */
 	dsb(ishst);
-	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), (KVM_PGTABLE_MAX_LEVELS - 1));
+	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), KVM_PGTABLE_LAST_LEVEL);
 	dsb(ish);
 	isb();
 }
@@ -275,7 +275,7 @@ static int __create_fixmap_slot_cb(const struct kvm_pgtable_visit_ctx *ctx,
 {
 	struct hyp_fixmap_slot *slot = per_cpu_ptr(&fixmap_slots, (u64)ctx->arg);
 
-	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_MAX_LEVELS - 1)
+	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_LAST_LEVEL)
 		return -EINVAL;
 
 	slot->addr = ctx->addr;
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 0d5e0a89ddce..bc58d1b515af 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -181,7 +181,7 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	if (!kvm_pte_valid(ctx->old))
 		return 0;
 
-	if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
+	if (ctx->level != KVM_PGTABLE_LAST_LEVEL)
 		return -EINVAL;
 
 	phys = kvm_pte_to_phys(ctx->old);
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 062eb7bcdb8a..8e79ff6972ce 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -101,7 +101,7 @@ static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx,
 	return IS_ALIGNED(ctx->addr, granule);
 }
 
-static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
+static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, s8 level)
 {
 	u64 shift = kvm_granule_shift(level);
 	u64 mask = BIT(PAGE_SHIFT - 3) - 1;
@@ -117,7 +117,7 @@ static u32 kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
 	return (addr & mask) >> shift;
 }
 
-static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
+static u32 kvm_pgd_pages(u32 ia_bits, s8 start_level)
 {
 	struct kvm_pgtable pgt = {
 		.ia_bits	= ia_bits,
@@ -127,9 +127,9 @@ static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
 	return kvm_pgd_page_idx(&pgt, -1ULL) + 1;
 }
 
-static bool kvm_pte_table(kvm_pte_t pte, u32 level)
+static bool kvm_pte_table(kvm_pte_t pte, s8 level)
 {
-	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
+	if (level == KVM_PGTABLE_LAST_LEVEL)
 		return false;
 
 	if (!kvm_pte_valid(pte))
@@ -157,11 +157,11 @@ static kvm_pte_t kvm_init_table_pte(kvm_pte_t *childp, struct kvm_pgtable_mm_ops
 	return pte;
 }
 
-static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
+static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, s8 level)
 {
 	kvm_pte_t pte = kvm_phys_to_pte(pa);
-	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
-							   KVM_PTE_TYPE_BLOCK;
+	u64 type = (level == KVM_PGTABLE_LAST_LEVEL) ? KVM_PTE_TYPE_PAGE :
+						       KVM_PTE_TYPE_BLOCK;
 
 	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
 	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
@@ -206,11 +206,11 @@ static bool kvm_pgtable_walk_continue(const struct kvm_pgtable_walker *walker,
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
-			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level);
+			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, s8 level);
 
 static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 				      struct kvm_pgtable_mm_ops *mm_ops,
-				      kvm_pteref_t pteref, u32 level)
+				      kvm_pteref_t pteref, s8 level)
 {
 	enum kvm_pgtable_walk_flags flags = data->walker->flags;
 	kvm_pte_t *ptep = kvm_dereference_pteref(data->walker, pteref);
@@ -275,12 +275,12 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
 }
 
 static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
-			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level)
+			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, s8 level)
 {
 	u32 idx;
 	int ret = 0;
 
-	if (WARN_ON_ONCE(level >= KVM_PGTABLE_MAX_LEVELS))
+	if (WARN_ON_ONCE(level > KVM_PGTABLE_LAST_LEVEL))
 		return -EINVAL;
 
 	for (idx = kvm_pgtable_idx(data, level); idx < PTRS_PER_PTE; ++idx) {
@@ -343,7 +343,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 
 struct leaf_walk_data {
 	kvm_pte_t	pte;
-	u32		level;
+	s8		level;
 };
 
 static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
@@ -358,7 +358,7 @@ static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
 }
 
 int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
-			 kvm_pte_t *ptep, u32 *level)
+			 kvm_pte_t *ptep, s8 *level)
 {
 	struct leaf_walk_data data;
 	struct kvm_pgtable_walker walker = {
@@ -471,7 +471,7 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	if (hyp_map_walker_try_leaf(ctx, data))
 		return 0;
 
-	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
+	if (WARN_ON(ctx->level == KVM_PGTABLE_LAST_LEVEL))
 		return -EINVAL;
 
 	childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
@@ -567,14 +567,18 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 			 struct kvm_pgtable_mm_ops *mm_ops)
 {
-	u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
+	s8 start_level = KVM_PGTABLE_LAST_LEVEL + 1 -
+			 ARM64_HW_PGTABLE_LEVELS(va_bits);
+	if (start_level < KVM_PGTABLE_FIRST_LEVEL ||
+	    start_level > KVM_PGTABLE_LAST_LEVEL)
+		return -EINVAL;
 
 	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_page(NULL);
 	if (!pgt->pgd)
 		return -ENOMEM;
 
 	pgt->ia_bits		= va_bits;
-	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
+	pgt->start_level	= start_level;
 	pgt->mm_ops		= mm_ops;
 	pgt->mmu		= NULL;
 	pgt->force_pte_cb	= NULL;
@@ -628,7 +632,7 @@ struct stage2_map_data {
 u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 {
 	u64 vtcr = VTCR_EL2_FLAGS;
-	u8 lvls;
+	s8 lvls;
 
 	vtcr |= kvm_get_parange(mmfr0) << VTCR_EL2_PS_SHIFT;
 	vtcr |= VTCR_EL2_T0SZ(phys_shift);
@@ -911,7 +915,7 @@ static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
 {
 	u64 phys = stage2_map_walker_phys_addr(ctx, data);
 
-	if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
+	if (data->force_pte && ctx->level < KVM_PGTABLE_LAST_LEVEL)
 		return false;
 
 	return kvm_block_mapping_supported(ctx, phys);
@@ -990,7 +994,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	if (ret != -E2BIG)
 		return ret;
 
-	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
+	if (WARN_ON(ctx->level == KVM_PGTABLE_LAST_LEVEL))
 		return -EINVAL;
 
 	if (!data->memcache)
@@ -1160,7 +1164,7 @@ struct stage2_attr_data {
 	kvm_pte_t			attr_set;
 	kvm_pte_t			attr_clr;
 	kvm_pte_t			pte;
-	u32				level;
+	s8				level;
 };
 
 static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
@@ -1203,7 +1207,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
 static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 				    u64 size, kvm_pte_t attr_set,
 				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte,
-				    u32 *level, enum kvm_pgtable_walk_flags flags)
+				    s8 *level, enum kvm_pgtable_walk_flags flags)
 {
 	int ret;
 	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
@@ -1305,7 +1309,7 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 				   enum kvm_pgtable_prot prot)
 {
 	int ret;
-	u32 level;
+	s8 level;
 	kvm_pte_t set = 0, clr = 0;
 
 	if (prot & KVM_PTE_LEAF_ATTR_HI_SW)
@@ -1358,7 +1362,7 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 }
 
 kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
-					      u64 phys, u32 level,
+					      u64 phys, s8 level,
 					      enum kvm_pgtable_prot prot,
 					      void *mc, bool force_pte)
 {
@@ -1416,7 +1420,7 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
  * fully populated tree up to the PTE entries. Note that @level is
  * interpreted as in "level @level entry".
  */
-static int stage2_block_get_nr_page_tables(u32 level)
+static int stage2_block_get_nr_page_tables(s8 level)
 {
 	switch (level) {
 	case 1:
@@ -1427,7 +1431,7 @@ static int stage2_block_get_nr_page_tables(u32 level)
 		return 0;
 	default:
 		WARN_ON_ONCE(level < KVM_PGTABLE_MIN_BLOCK_LEVEL ||
-			     level >= KVM_PGTABLE_MAX_LEVELS);
+			     level > KVM_PGTABLE_LAST_LEVEL);
 		return -EINVAL;
 	};
 }
@@ -1440,13 +1444,13 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	struct kvm_s2_mmu *mmu;
 	kvm_pte_t pte = ctx->old, new, *childp;
 	enum kvm_pgtable_prot prot;
-	u32 level = ctx->level;
+	s8 level = ctx->level;
 	bool force_pte;
 	int nr_pages;
 	u64 phys;
 
 	/* No huge-pages exist at the last level */
-	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
+	if (level == KVM_PGTABLE_LAST_LEVEL)
 		return 0;
 
 	/* We only split valid block mappings */
@@ -1523,7 +1527,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	u64 vtcr = mmu->arch->vtcr;
 	u32 ia_bits = VTCR_EL2_IPA(vtcr);
 	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
-	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
+	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
 
 	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
 	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_pages_exact(pgd_sz);
@@ -1546,7 +1550,7 @@ size_t kvm_pgtable_stage2_pgd_size(u64 vtcr)
 {
 	u32 ia_bits = VTCR_EL2_IPA(vtcr);
 	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
-	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
+	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
 
 	return kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
 }
@@ -1582,7 +1586,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 	pgt->pgd = NULL;
 }
 
-void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level)
+void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level)
 {
 	kvm_pteref_t ptep = (kvm_pteref_t)pgtable;
 	struct kvm_pgtable_walker walker = {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 482280fe22d7..73110ba3624c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -223,12 +223,12 @@ static void stage2_free_unlinked_table_rcu_cb(struct rcu_head *head)
 {
 	struct page *page = container_of(head, struct page, rcu_head);
 	void *pgtable = page_to_virt(page);
-	u32 level = page_private(page);
+	s8 level = page_private(page);
 
 	kvm_pgtable_stage2_free_unlinked(&kvm_s2_mm_ops, pgtable, level);
 }
 
-static void stage2_free_unlinked_table(void *addr, u32 level)
+static void stage2_free_unlinked_table(void *addr, s8 level)
 {
 	struct page *page = virt_to_page(addr);
 
@@ -804,13 +804,13 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 	struct kvm_pgtable pgt = {
 		.pgd		= (kvm_pteref_t)kvm->mm->pgd,
 		.ia_bits	= vabits_actual,
-		.start_level	= (KVM_PGTABLE_MAX_LEVELS -
-				   CONFIG_PGTABLE_LEVELS),
+		.start_level	= (KVM_PGTABLE_LAST_LEVEL -
+				   CONFIG_PGTABLE_LEVELS + 1),
 		.mm_ops		= &kvm_user_mm_ops,
 	};
 	unsigned long flags;
 	kvm_pte_t pte = 0;	/* Keep GCC quiet... */
-	u32 level = ~0;
+	s8 level = ~0;
 	int ret;
 
 	/*
@@ -829,7 +829,9 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 	 * Not seeing an error, but not updating level? Something went
 	 * deeply wrong...
 	 */
-	if (WARN_ON(level >= KVM_PGTABLE_MAX_LEVELS))
+	if (WARN_ON(level > KVM_PGTABLE_LAST_LEVEL))
+		return -EFAULT;
+	if (WARN_ON(level < KVM_PGTABLE_FIRST_LEVEL))
 		return -EFAULT;
 
 	/* Oops, the userspace PTs are gone... Replay the fault */
@@ -1407,7 +1409,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
-	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
+	s8 fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

@@ -343,7 +343,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 
 struct leaf_walk_data {
 	kvm_pte_t	pte;
-	u32		level;
+	s8		level;
 };
 
 static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
@@ -358,7 +358,7 @@ static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
 }
 
 int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
-			 kvm_pte_t *ptep, u32 *level)
+			 kvm_pte_t *ptep, s8 *level)
 {
 	struct leaf_walk_data data;
 	struct kvm_pgtable_walker walker = {
@@ -471,7 +471,7 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	if (hyp_map_walker_try_leaf(ctx, data))
 		return 0;
 
-	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
+	if (WARN_ON(ctx->level == KVM_PGTABLE_LAST_LEVEL))
 		return -EINVAL;
 
 	childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
@@ -567,14 +567,18 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 			 struct kvm_pgtable_mm_ops *mm_ops)
 {
-	u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
+	s8 start_level = KVM_PGTABLE_LAST_LEVEL + 1 -
+			 ARM64_HW_PGTABLE_LEVELS(va_bits);
+	if (start_level < KVM_PGTABLE_FIRST_LEVEL ||
+	    start_level > KVM_PGTABLE_LAST_LEVEL)
+		return -EINVAL;
 
 	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_page(NULL);
 	if (!pgt->pgd)
 		return -ENOMEM;
 
 	pgt->ia_bits		= va_bits;
-	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
+	pgt->start_level	= start_level;
 	pgt->mm_ops		= mm_ops;
 	pgt->mmu		= NULL;
 	pgt->force_pte_cb	= NULL;
@@ -628,7 +632,7 @@ struct stage2_map_data {
 u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 {
 	u64 vtcr = VTCR_EL2_FLAGS;
-	u8 lvls;
+	s8 lvls;
 
 	vtcr |= kvm_get_parange(mmfr0) << VTCR_EL2_PS_SHIFT;
 	vtcr |= VTCR_EL2_T0SZ(phys_shift);
@@ -911,7 +915,7 @@ static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
 {
 	u64 phys = stage2_map_walker_phys_addr(ctx, data);
 
-	if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
+	if (data->force_pte && ctx->level < KVM_PGTABLE_LAST_LEVEL)
 		return false;
 
 	return kvm_block_mapping_supported(ctx, phys);
@@ -990,7 +994,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	if (ret != -E2BIG)
 		return ret;
 
-	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
+	if (WARN_ON(ctx->level == KVM_PGTABLE_LAST_LEVEL))
 		return -EINVAL;
 
 	if (!data->memcache)
@@ -1160,7 +1164,7 @@ struct stage2_attr_data {
 	kvm_pte_t			attr_set;
 	kvm_pte_t			attr_clr;
 	kvm_pte_t			pte;
-	u32				level;
+	s8				level;
 };
 
 static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
@@ -1203,7 +1207,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
 static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 				    u64 size, kvm_pte_t attr_set,
 				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte,
-				    u32 *level, enum kvm_pgtable_walk_flags flags)
+				    s8 *level, enum kvm_pgtable_walk_flags flags)
 {
 	int ret;
 	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
@@ -1305,7 +1309,7 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
 				   enum kvm_pgtable_prot prot)
 {
 	int ret;
-	u32 level;
+	s8 level;
 	kvm_pte_t set = 0, clr = 0;
 
 	if (prot & KVM_PTE_LEAF_ATTR_HI_SW)
@@ -1358,7 +1362,7 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 }
 
 kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
-					      u64 phys, u32 level,
+					      u64 phys, s8 level,
 					      enum kvm_pgtable_prot prot,
 					      void *mc, bool force_pte)
 {
@@ -1416,7 +1420,7 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
  * fully populated tree up to the PTE entries. Note that @level is
  * interpreted as in "level @level entry".
  */
-static int stage2_block_get_nr_page_tables(u32 level)
+static int stage2_block_get_nr_page_tables(s8 level)
 {
 	switch (level) {
 	case 1:
@@ -1427,7 +1431,7 @@ static int stage2_block_get_nr_page_tables(u32 level)
 		return 0;
 	default:
 		WARN_ON_ONCE(level < KVM_PGTABLE_MIN_BLOCK_LEVEL ||
-			     level >= KVM_PGTABLE_MAX_LEVELS);
+			     level > KVM_PGTABLE_LAST_LEVEL);
 		return -EINVAL;
 	};
 }
@@ -1440,13 +1444,13 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	struct kvm_s2_mmu *mmu;
 	kvm_pte_t pte = ctx->old, new, *childp;
 	enum kvm_pgtable_prot prot;
-	u32 level = ctx->level;
+	s8 level = ctx->level;
 	bool force_pte;
 	int nr_pages;
 	u64 phys;
 
 	/* No huge-pages exist at the last level */
-	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
+	if (level == KVM_PGTABLE_LAST_LEVEL)
 		return 0;
 
 	/* We only split valid block mappings */
@@ -1523,7 +1527,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	u64 vtcr = mmu->arch->vtcr;
 	u32 ia_bits = VTCR_EL2_IPA(vtcr);
 	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
-	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
+	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
 
 	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
 	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_pages_exact(pgd_sz);
@@ -1546,7 +1550,7 @@ size_t kvm_pgtable_stage2_pgd_size(u64 vtcr)
 {
 	u32 ia_bits = VTCR_EL2_IPA(vtcr);
 	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
-	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
+	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
 
 	return kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
 }
@@ -1582,7 +1586,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
 	pgt->pgd = NULL;
 }
 
-void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level)
+void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level)
 {
 	kvm_pteref_t ptep = (kvm_pteref_t)pgtable;
 	struct kvm_pgtable_walker walker = {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 482280fe22d7..73110ba3624c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -223,12 +223,12 @@ static void stage2_free_unlinked_table_rcu_cb(struct rcu_head *head)
 {
 	struct page *page = container_of(head, struct page, rcu_head);
 	void *pgtable = page_to_virt(page);
-	u32 level = page_private(page);
+	s8 level = page_private(page);
 
 	kvm_pgtable_stage2_free_unlinked(&kvm_s2_mm_ops, pgtable, level);
 }
 
-static void stage2_free_unlinked_table(void *addr, u32 level)
+static void stage2_free_unlinked_table(void *addr, s8 level)
 {
 	struct page *page = virt_to_page(addr);
 
@@ -804,13 +804,13 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 	struct kvm_pgtable pgt = {
 		.pgd		= (kvm_pteref_t)kvm->mm->pgd,
 		.ia_bits	= vabits_actual,
-		.start_level	= (KVM_PGTABLE_MAX_LEVELS -
-				   CONFIG_PGTABLE_LEVELS),
+		.start_level	= (KVM_PGTABLE_LAST_LEVEL -
+				   CONFIG_PGTABLE_LEVELS + 1),
 		.mm_ops		= &kvm_user_mm_ops,
 	};
 	unsigned long flags;
 	kvm_pte_t pte = 0;	/* Keep GCC quiet... */
-	u32 level = ~0;
+	s8 level = ~0;
 	int ret;
 
 	/*
@@ -829,7 +829,9 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 	 * Not seeing an error, but not updating level? Something went
 	 * deeply wrong...
 	 */
-	if (WARN_ON(level >= KVM_PGTABLE_MAX_LEVELS))
+	if (WARN_ON(level > KVM_PGTABLE_LAST_LEVEL))
+		return -EFAULT;
+	if (WARN_ON(level < KVM_PGTABLE_FIRST_LEVEL))
 		return -EFAULT;
 
 	/* Oops, the userspace PTs are gone... Replay the fault */
@@ -1407,7 +1409,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
-	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
+	s8 fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 09/12] KVM: arm64: Support up to 5 levels of translation in kvm_pgtable
  2023-10-09 18:49 ` Ryan Roberts
@ 2023-10-09 18:50   ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

FEAT_LPA2 increases the maximum levels of translation from 4 to 5 for
the 4KB page case, when IA is >48 bits. While we can still use 4 levels
for stage2 translation in this case (due to stage2 allowing concatenated
page tables for first level lookup), the same kvm_pgtable library is
used for the hyp stage1 page tables and stage1 does not support
concatenation.
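
(Illustrative sketch, not part of the patch, with a made-up helper name:
each table level resolves PAGE_SHIFT - 3 bits of VA, so the level count
for a given VA width falls out of the arithmetic below.)

	/*
	 * Equivalent to the ARM64_HW_PGTABLE_LEVELS() calculation. For 4K
	 * pages (PAGE_SHIFT = 12): 48-bit VA -> 4 levels, 52-bit VA -> 5
	 * levels, so with KVM_PGTABLE_LAST_LEVEL == 3 the walk must start
	 * at level -1.
	 */
	static int levels_for_va_bits(int va_bits, int page_shift)
	{
		return DIV_ROUND_UP(va_bits - page_shift, page_shift - 3);
	}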

Therefore, modify the library to support up to 5 levels. Previous
patches already laid the groundwork for this by refactoring code to work
in terms of KVM_PGTABLE_FIRST_LEVEL and KVM_PGTABLE_LAST_LEVEL. So we
just need to change these macros.

The hardware sometimes encodes the new level differently from the
others: One such place is when reading the level from the FSC field in
the ESR_EL2 register. We never expect to see the lowest level (-1) here
since the stage 2 page tables always use concatenated tables for first
level lookup and therefore only use 4 levels of lookup. So we get away
with just adding a comment to explain why we are not being careful about
decoding level -1.
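
(A quick worked example of that convention, not part of the patch;
ESR_ELx_FSC_LEVEL is the existing 2-bit mask from asm/esr.h and the FSC
value below is only an illustration.)

	u64 esr = 0x07;				/* FSC 0b000111: translation fault, level 3 */
	s8 level = esr & ESR_ELx_FSC_LEVEL;	/* mask is 0x3, so level == 3 */
	/* a level -1 fault has a distinct FSC encoding that this mask cannot express */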

For stage2, VTCR_EL2.SL2 is introduced to encode the new start level.
However, since we always use concatenated page tables for first-level
lookup at stage2 (and therefore will never need the new extra level),
we never touch this new field.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h | 10 ++++++++++
 arch/arm64/include/asm/kvm_pgtable.h |  2 +-
 arch/arm64/kvm/hyp/pgtable.c         |  9 +++++++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index bf3ef66eb51f..9afce5e42352 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -406,6 +406,16 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
 
 static __always_inline s8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
 {
+	/*
+	 * Note: With the introduction of FEAT_LPA2 an extra level of
+	 * translation (level -1) is added. This level (obviously) doesn't
+	 * follow the previous convention of encoding the 4 levels in the 2 LSBs
+	 * of the FSC so this function breaks if the fault is for level -1.
+	 *
+	 * However, stage2 tables always use concatenated tables for first level
+	 * lookup and therefore it is guaranteed that the level will be between
+	 * 0 and 3, and this function continues to work.
+	 */
 	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
 }
 
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index c61bb9709201..3d2cde571553 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -11,7 +11,7 @@
 #include <linux/kvm_host.h>
 #include <linux/types.h>
 
-#define KVM_PGTABLE_FIRST_LEVEL		0
+#define KVM_PGTABLE_FIRST_LEVEL		-1
 #define KVM_PGTABLE_LAST_LEVEL		3
 
 /*
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 8e79ff6972ce..20a2322fa45a 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -643,6 +643,15 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 	lvls = stage2_pgtable_levels(phys_shift);
 	if (lvls < 2)
 		lvls = 2;
+
+	/*
+	 * When LPA2 is enabled, the HW supports an extra level of translation
+	 * (for 5 in total) when using 4K pages. It also introduces VTCR_EL2.SL2
+	 * as an addition to SL0 to enable encoding this extra start level.
+	 * However, since we always use concatenated pages for the first level
+	 * lookup, we will never need this extra level and therefore do not need
+	 * to touch SL2.
+	 */
 	vtcr |= VTCR_EL2_LVLS_TO_SL0(lvls);
 
 #ifdef CONFIG_ARM64_HW_AFDBM
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 09/12] KVM: arm64: Support up to 5 levels of translation in kvm_pgtable
@ 2023-10-09 18:50   ` Ryan Roberts
  0 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

FEAT_LPA2 increases the maximum levels of translation from 4 to 5 for
the 4KB page case, when IA is >48 bits. While we can still use 4 levels
for stage2 translation in this case (due to stage2 allowing concatenated
page tables for first level lookup), the same kvm_pgtable library is
used for the hyp stage1 page tables and stage1 does not support
concatenation.

Therefore, modify the library to support up to 5 levels. Previous
patches already laid the groundwork for this by refactoring code to work
in terms of KVM_PGTABLE_FIRST_LEVEL and KVM_PGTABLE_LAST_LEVEL. So we
just need to change these macros.

The hardware sometimes encodes the new level differently from the
others: One such place is when reading the level from the FSC field in
the ESR_EL2 register. We never expect to see the lowest level (-1) here
since the stage 2 page tables always use concatenated tables for first
level lookup and therefore only use 4 levels of lookup. So we get away
with just adding a comment to explain why we are not being careful about
decoding level -1.

For stage2, VTCR_EL2.SL2 is introduced to encode the new start level.
However, since we always use concatenated page tables for first-level
lookup at stage2 (and therefore will never need the new extra level),
we never touch this new field.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h | 10 ++++++++++
 arch/arm64/include/asm/kvm_pgtable.h |  2 +-
 arch/arm64/kvm/hyp/pgtable.c         |  9 +++++++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index bf3ef66eb51f..9afce5e42352 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -406,6 +406,16 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
 
 static __always_inline s8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
 {
+	/*
+	 * Note: With the introduction of FEAT_LPA2 an extra level of
+	 * translation (level -1) is added. This level (obviously) doesn't
+	 * follow the previous convention of encoding the 4 levels in the 2 LSBs
+	 * of the FSC so this function breaks if the fault is for level -1.
+	 *
+	 * However, stage2 tables always use concatenated tables for first level
+	 * lookup and therefore it is guaranteed that the level will be between
+	 * 0 and 3, and this function continues to work.
+	 */
 	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
 }
 
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index c61bb9709201..3d2cde571553 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -11,7 +11,7 @@
 #include <linux/kvm_host.h>
 #include <linux/types.h>
 
-#define KVM_PGTABLE_FIRST_LEVEL		0
+#define KVM_PGTABLE_FIRST_LEVEL		-1
 #define KVM_PGTABLE_LAST_LEVEL		3
 
 /*
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 8e79ff6972ce..20a2322fa45a 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -643,6 +643,15 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
 	lvls = stage2_pgtable_levels(phys_shift);
 	if (lvls < 2)
 		lvls = 2;
+
+	/*
+	 * When LPA2 is enabled, the HW supports an extra level of translation
+	 * (for 5 in total) when using 4K pages. It also introduces VTCR_EL2.SL2
+	 * as an addition to SL0 to enable encoding this extra start level.
+	 * However, since we always use concatenated pages for the first level
+	 * lookup, we will never need this extra level and therefore do not need
+	 * to touch SL2.
+	 */
 	vtcr |= VTCR_EL2_LVLS_TO_SL0(lvls);
 
 #ifdef CONFIG_ARM64_HW_AFDBM
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 10/12] KVM: arm64: Allow guests with >48-bit IPA size on FEAT_LPA2 systems
  2023-10-09 18:49 ` Ryan Roberts
@ 2023-10-09 18:50   ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

With all the page-table infrastructure in place, we can finally increase
the maximum permissible IPA size to 52 bits on 4KB and 16KB page systems
that have FEAT_LPA2.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/kvm/reset.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 7a65a35ee4ac..7816c64d4701 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -316,12 +316,11 @@ int __init kvm_set_ipa_limit(void)
 	parange = cpuid_feature_extract_unsigned_field(mmfr0,
 				ID_AA64MMFR0_EL1_PARANGE_SHIFT);
 	/*
-	 * IPA size beyond 48 bits could not be supported
-	 * on either 4K or 16K page size. Hence let's cap
-	 * it to 48 bits, in case it's reported as larger
-	 * on the system.
+	 * IPA size beyond 48 bits for 4K and 16K page size is only supported
+	 * when LPA2 is available. So if we have LPA2, enable it, else cap to 48
+	 * bits, in case it's reported as larger on the system.
 	 */
-	if (PAGE_SIZE != SZ_64K)
+	if (!system_supports_lpa2() && PAGE_SIZE != SZ_64K)
 		parange = min(parange, (unsigned int)ID_AA64MMFR0_EL1_PARANGE_48);
 
 	/*
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 10/12] KVM: arm64: Allow guests with >48-bit IPA size on FEAT_LPA2 systems
@ 2023-10-09 18:50   ` Ryan Roberts
  0 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

With all the page-table infrastructure in place, we can finally increase
the maximum permissible IPA size to 52 bits on 4KB and 16KB page systems
that have FEAT_LPA2.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/kvm/reset.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 7a65a35ee4ac..7816c64d4701 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -316,12 +316,11 @@ int __init kvm_set_ipa_limit(void)
 	parange = cpuid_feature_extract_unsigned_field(mmfr0,
 				ID_AA64MMFR0_EL1_PARANGE_SHIFT);
 	/*
-	 * IPA size beyond 48 bits could not be supported
-	 * on either 4K or 16K page size. Hence let's cap
-	 * it to 48 bits, in case it's reported as larger
-	 * on the system.
+	 * IPA size beyond 48 bits for 4K and 16K page size is only supported
+	 * when LPA2 is available. So if we have LPA2, enable it, else cap to 48
+	 * bits, in case it's reported as larger on the system.
 	 */
-	if (PAGE_SIZE != SZ_64K)
+	if (!system_supports_lpa2() && PAGE_SIZE != SZ_64K)
 		parange = min(parange, (unsigned int)ID_AA64MMFR0_EL1_PARANGE_48);
 
 	/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 11/12] KVM: selftests: arm64: Determine max ipa size per-page size
  2023-10-09 18:49 ` Ryan Roberts
@ 2023-10-09 18:50   ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

We are about to add 52-bit PA guest modes for 4K and 16K pages when the
system supports LPA2. In preparation, beef up the logic that parses mmfr0
to also tell us what the maximum supported PA size is for each page
size. Max PA size = 0 implies the page size is not supported at all.
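
(Worked example, not part of the patch, of the max_ipa_for_page_size()
helper added below; the ID_AA64MMFR0_EL1.TGRANx encodings follow the Arm
ARM, but treat the concrete values as illustrative.)

	uint32_t ipa4k, ipa16k, ipa64k;

	/*
	 * Host IPA limit of 52, with mmfr0 reporting:
	 *   TGRAN4  = 0x1 (4K supported, LPA2-capable)
	 *   TGRAN16 = 0x1 (16K supported, no LPA2)
	 *   TGRAN64 = 0xf (64K not supported)
	 */
	ipa4k  = max_ipa_for_page_size(52, 0x1, 0xf, 1);	/* -> 52 */
	ipa16k = max_ipa_for_page_size(52, 0x1, 0x0, 2);	/* -> 48 */
	ipa64k = max_ipa_for_page_size(52, 0xf, 0xf, 0);	/* -> 0  */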

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 .../selftests/kvm/include/aarch64/processor.h |  4 +-
 .../selftests/kvm/lib/aarch64/processor.c     | 27 ++++++++++---
 tools/testing/selftests/kvm/lib/guest_modes.c | 40 ++++++++-----------
 3 files changed, 41 insertions(+), 30 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/aarch64/processor.h b/tools/testing/selftests/kvm/include/aarch64/processor.h
index cb537253a6b9..9e415cf4f8dd 100644
--- a/tools/testing/selftests/kvm/include/aarch64/processor.h
+++ b/tools/testing/selftests/kvm/include/aarch64/processor.h
@@ -118,8 +118,8 @@ enum {
 /* Access flag update enable/disable */
 #define TCR_EL1_HA		(1ULL << 39)
 
-void aarch64_get_supported_page_sizes(uint32_t ipa,
-				      bool *ps4k, bool *ps16k, bool *ps64k);
+void aarch64_get_supported_page_sizes(uint32_t ipa, uint32_t *ipa4k,
+					uint32_t *ipa16k, uint32_t *ipa64k);
 
 void vm_init_descriptor_tables(struct kvm_vm *vm);
 void vcpu_init_descriptor_tables(struct kvm_vcpu *vcpu);
diff --git a/tools/testing/selftests/kvm/lib/aarch64/processor.c b/tools/testing/selftests/kvm/lib/aarch64/processor.c
index 3a0259e25335..e50dad81e956 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/processor.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/processor.c
@@ -492,12 +492,24 @@ uint32_t guest_get_vcpuid(void)
 	return read_sysreg(tpidr_el1);
 }
 
-void aarch64_get_supported_page_sizes(uint32_t ipa,
-				      bool *ps4k, bool *ps16k, bool *ps64k)
+static uint32_t max_ipa_for_page_size(uint32_t vm_ipa, uint32_t gran,
+				uint32_t not_sup_val, uint32_t ipa52_min_val)
+{
+	if (gran == not_sup_val)
+		return 0;
+	else if (gran >= ipa52_min_val && vm_ipa >= 52)
+		return 52;
+	else
+		return min(vm_ipa, 48U);
+}
+
+void aarch64_get_supported_page_sizes(uint32_t ipa, uint32_t *ipa4k,
+					uint32_t *ipa16k, uint32_t *ipa64k)
 {
 	struct kvm_vcpu_init preferred_init;
 	int kvm_fd, vm_fd, vcpu_fd, err;
 	uint64_t val;
+	uint32_t gran;
 	struct kvm_one_reg reg = {
 		.id	= KVM_ARM64_SYS_REG(SYS_ID_AA64MMFR0_EL1),
 		.addr	= (uint64_t)&val,
@@ -518,9 +530,14 @@ void aarch64_get_supported_page_sizes(uint32_t ipa,
 	err = ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
 	TEST_ASSERT(err == 0, KVM_IOCTL_ERROR(KVM_GET_ONE_REG, vcpu_fd));
 
-	*ps4k = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_TGRAN4), val) != 0xf;
-	*ps64k = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_TGRAN64), val) == 0;
-	*ps16k = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_TGRAN16), val) != 0;
+	gran = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_TGRAN4), val);
+	*ipa4k = max_ipa_for_page_size(ipa, gran, 0xf, 1);
+
+	gran = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_TGRAN64), val);
+	*ipa64k = max_ipa_for_page_size(ipa, gran, 0xf, 0);
+
+	gran = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_TGRAN16), val);
+	*ipa16k = max_ipa_for_page_size(ipa, gran, 0, 2);
 
 	close(vcpu_fd);
 	close(vm_fd);
diff --git a/tools/testing/selftests/kvm/lib/guest_modes.c b/tools/testing/selftests/kvm/lib/guest_modes.c
index 1df3ce4b16fd..c64c5cf49942 100644
--- a/tools/testing/selftests/kvm/lib/guest_modes.c
+++ b/tools/testing/selftests/kvm/lib/guest_modes.c
@@ -18,33 +18,27 @@ void guest_modes_append_default(void)
 #else
 	{
 		unsigned int limit = kvm_check_cap(KVM_CAP_ARM_VM_IPA_SIZE);
-		bool ps4k, ps16k, ps64k;
+		uint32_t ipa4k, ipa16k, ipa64k;
 		int i;
 
-		aarch64_get_supported_page_sizes(limit, &ps4k, &ps16k, &ps64k);
+		aarch64_get_supported_page_sizes(limit, &ipa4k, &ipa16k, &ipa64k);
 
-		vm_mode_default = NUM_VM_MODES;
+		guest_mode_append(VM_MODE_P52V48_64K, ipa64k >= 52, ipa64k >= 52);
 
-		if (limit >= 52)
-			guest_mode_append(VM_MODE_P52V48_64K, ps64k, ps64k);
-		if (limit >= 48) {
-			guest_mode_append(VM_MODE_P48V48_4K, ps4k, ps4k);
-			guest_mode_append(VM_MODE_P48V48_16K, ps16k, ps16k);
-			guest_mode_append(VM_MODE_P48V48_64K, ps64k, ps64k);
-		}
-		if (limit >= 40) {
-			guest_mode_append(VM_MODE_P40V48_4K, ps4k, ps4k);
-			guest_mode_append(VM_MODE_P40V48_16K, ps16k, ps16k);
-			guest_mode_append(VM_MODE_P40V48_64K, ps64k, ps64k);
-			if (ps4k)
-				vm_mode_default = VM_MODE_P40V48_4K;
-		}
-		if (limit >= 36) {
-			guest_mode_append(VM_MODE_P36V48_4K, ps4k, ps4k);
-			guest_mode_append(VM_MODE_P36V48_16K, ps16k, ps16k);
-			guest_mode_append(VM_MODE_P36V48_64K, ps64k, ps64k);
-			guest_mode_append(VM_MODE_P36V47_16K, ps16k, ps16k);
-		}
+		guest_mode_append(VM_MODE_P48V48_4K, ipa4k >= 48, ipa4k >= 48);
+		guest_mode_append(VM_MODE_P48V48_16K, ipa16k >= 48, ipa16k >= 48);
+		guest_mode_append(VM_MODE_P48V48_64K, ipa64k >= 48, ipa64k >= 48);
+
+		guest_mode_append(VM_MODE_P40V48_4K, ipa4k >= 40, ipa4k >= 40);
+		guest_mode_append(VM_MODE_P40V48_16K, ipa16k >= 40, ipa16k >= 40);
+		guest_mode_append(VM_MODE_P40V48_64K, ipa64k >= 40, ipa64k >= 40);
+
+		guest_mode_append(VM_MODE_P36V48_4K, ipa4k >= 36, ipa4k >= 36);
+		guest_mode_append(VM_MODE_P36V48_16K, ipa16k >= 36, ipa16k >= 36);
+		guest_mode_append(VM_MODE_P36V48_64K, ipa64k >= 36, ipa64k >= 36);
+		guest_mode_append(VM_MODE_P36V47_16K, ipa16k >= 36, ipa16k >= 36);
+
+		vm_mode_default = ipa4k >= 40 ? VM_MODE_P40V48_4K : NUM_VM_MODES;
 
 		/*
 		 * Pick the first supported IPA size if the default
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 11/12] KVM: selftests: arm64: Determine max ipa size per-page size
@ 2023-10-09 18:50   ` Ryan Roberts
  0 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

We are about to add 52-bit PA guest modes for 4K and 16K pages when the
system supports LPA2. In preparation, beef up the logic that parses mmfr0
to also tell us what the maximum supported PA size is for each page
size. Max PA size = 0 implies the page size is not supported at all.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 .../selftests/kvm/include/aarch64/processor.h |  4 +-
 .../selftests/kvm/lib/aarch64/processor.c     | 27 ++++++++++---
 tools/testing/selftests/kvm/lib/guest_modes.c | 40 ++++++++-----------
 3 files changed, 41 insertions(+), 30 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/aarch64/processor.h b/tools/testing/selftests/kvm/include/aarch64/processor.h
index cb537253a6b9..9e415cf4f8dd 100644
--- a/tools/testing/selftests/kvm/include/aarch64/processor.h
+++ b/tools/testing/selftests/kvm/include/aarch64/processor.h
@@ -118,8 +118,8 @@ enum {
 /* Access flag update enable/disable */
 #define TCR_EL1_HA		(1ULL << 39)
 
-void aarch64_get_supported_page_sizes(uint32_t ipa,
-				      bool *ps4k, bool *ps16k, bool *ps64k);
+void aarch64_get_supported_page_sizes(uint32_t ipa, uint32_t *ipa4k,
+					uint32_t *ipa16k, uint32_t *ipa64k);
 
 void vm_init_descriptor_tables(struct kvm_vm *vm);
 void vcpu_init_descriptor_tables(struct kvm_vcpu *vcpu);
diff --git a/tools/testing/selftests/kvm/lib/aarch64/processor.c b/tools/testing/selftests/kvm/lib/aarch64/processor.c
index 3a0259e25335..e50dad81e956 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/processor.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/processor.c
@@ -492,12 +492,24 @@ uint32_t guest_get_vcpuid(void)
 	return read_sysreg(tpidr_el1);
 }
 
-void aarch64_get_supported_page_sizes(uint32_t ipa,
-				      bool *ps4k, bool *ps16k, bool *ps64k)
+static uint32_t max_ipa_for_page_size(uint32_t vm_ipa, uint32_t gran,
+				uint32_t not_sup_val, uint32_t ipa52_min_val)
+{
+	if (gran == not_sup_val)
+		return 0;
+	else if (gran >= ipa52_min_val && vm_ipa >= 52)
+		return 52;
+	else
+		return min(vm_ipa, 48U);
+}
+
+void aarch64_get_supported_page_sizes(uint32_t ipa, uint32_t *ipa4k,
+					uint32_t *ipa16k, uint32_t *ipa64k)
 {
 	struct kvm_vcpu_init preferred_init;
 	int kvm_fd, vm_fd, vcpu_fd, err;
 	uint64_t val;
+	uint32_t gran;
 	struct kvm_one_reg reg = {
 		.id	= KVM_ARM64_SYS_REG(SYS_ID_AA64MMFR0_EL1),
 		.addr	= (uint64_t)&val,
@@ -518,9 +530,14 @@ void aarch64_get_supported_page_sizes(uint32_t ipa,
 	err = ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
 	TEST_ASSERT(err == 0, KVM_IOCTL_ERROR(KVM_GET_ONE_REG, vcpu_fd));
 
-	*ps4k = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_TGRAN4), val) != 0xf;
-	*ps64k = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_TGRAN64), val) == 0;
-	*ps16k = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_TGRAN16), val) != 0;
+	gran = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_TGRAN4), val);
+	*ipa4k = max_ipa_for_page_size(ipa, gran, 0xf, 1);
+
+	gran = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_TGRAN64), val);
+	*ipa64k = max_ipa_for_page_size(ipa, gran, 0xf, 0);
+
+	gran = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_TGRAN16), val);
+	*ipa16k = max_ipa_for_page_size(ipa, gran, 0, 2);
 
 	close(vcpu_fd);
 	close(vm_fd);
diff --git a/tools/testing/selftests/kvm/lib/guest_modes.c b/tools/testing/selftests/kvm/lib/guest_modes.c
index 1df3ce4b16fd..c64c5cf49942 100644
--- a/tools/testing/selftests/kvm/lib/guest_modes.c
+++ b/tools/testing/selftests/kvm/lib/guest_modes.c
@@ -18,33 +18,27 @@ void guest_modes_append_default(void)
 #else
 	{
 		unsigned int limit = kvm_check_cap(KVM_CAP_ARM_VM_IPA_SIZE);
-		bool ps4k, ps16k, ps64k;
+		uint32_t ipa4k, ipa16k, ipa64k;
 		int i;
 
-		aarch64_get_supported_page_sizes(limit, &ps4k, &ps16k, &ps64k);
+		aarch64_get_supported_page_sizes(limit, &ipa4k, &ipa16k, &ipa64k);
 
-		vm_mode_default = NUM_VM_MODES;
+		guest_mode_append(VM_MODE_P52V48_64K, ipa64k >= 52, ipa64k >= 52);
 
-		if (limit >= 52)
-			guest_mode_append(VM_MODE_P52V48_64K, ps64k, ps64k);
-		if (limit >= 48) {
-			guest_mode_append(VM_MODE_P48V48_4K, ps4k, ps4k);
-			guest_mode_append(VM_MODE_P48V48_16K, ps16k, ps16k);
-			guest_mode_append(VM_MODE_P48V48_64K, ps64k, ps64k);
-		}
-		if (limit >= 40) {
-			guest_mode_append(VM_MODE_P40V48_4K, ps4k, ps4k);
-			guest_mode_append(VM_MODE_P40V48_16K, ps16k, ps16k);
-			guest_mode_append(VM_MODE_P40V48_64K, ps64k, ps64k);
-			if (ps4k)
-				vm_mode_default = VM_MODE_P40V48_4K;
-		}
-		if (limit >= 36) {
-			guest_mode_append(VM_MODE_P36V48_4K, ps4k, ps4k);
-			guest_mode_append(VM_MODE_P36V48_16K, ps16k, ps16k);
-			guest_mode_append(VM_MODE_P36V48_64K, ps64k, ps64k);
-			guest_mode_append(VM_MODE_P36V47_16K, ps16k, ps16k);
-		}
+		guest_mode_append(VM_MODE_P48V48_4K, ipa4k >= 48, ipa4k >= 48);
+		guest_mode_append(VM_MODE_P48V48_16K, ipa16k >= 48, ipa16k >= 48);
+		guest_mode_append(VM_MODE_P48V48_64K, ipa64k >= 48, ipa64k >= 48);
+
+		guest_mode_append(VM_MODE_P40V48_4K, ipa4k >= 40, ipa4k >= 40);
+		guest_mode_append(VM_MODE_P40V48_16K, ipa16k >= 40, ipa16k >= 40);
+		guest_mode_append(VM_MODE_P40V48_64K, ipa64k >= 40, ipa64k >= 40);
+
+		guest_mode_append(VM_MODE_P36V48_4K, ipa4k >= 36, ipa4k >= 36);
+		guest_mode_append(VM_MODE_P36V48_16K, ipa16k >= 36, ipa16k >= 36);
+		guest_mode_append(VM_MODE_P36V48_64K, ipa64k >= 36, ipa64k >= 36);
+		guest_mode_append(VM_MODE_P36V47_16K, ipa16k >= 36, ipa16k >= 36);
+
+		vm_mode_default = ipa4k >= 40 ? VM_MODE_P40V48_4K : NUM_VM_MODES;
 
 		/*
 		 * Pick the first supported IPA size if the default
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 12/12] KVM: selftests: arm64: Support P52V48 4K and 16K guest_modes
  2023-10-09 18:49 ` Ryan Roberts
@ 2023-10-09 18:50   ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

Add support for VM_MODE_P52V48_4K and VM_MODE_P52V48_16K guest modes by
using the FEAT_LPA2 pte format for stage1, when FEAT_LPA2 is available.
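
(Worked example, not part of the patch, of the LPA2 descriptor encoding
that addr_pte() below implements; the address is made up and GENMASK()/
FIELD_GET() are the helpers this file already uses.)

	/* 4K pages, PA = 0x000c000000042000, i.e. PA[51:50] = 0b11 */
	uint64_t pa  = 0x000c000000042000UL;
	uint64_t pte = pa & GENMASK(49, 12);		/* PA[49:12] stay in place */
	pte |= FIELD_GET(GENMASK(51, 50), pa) << 8;	/* PA[51:50] -> bits [9:8] */
	/* pte == 0x42300: 0b11 now sits in descriptor bits [9:8] */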

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 .../selftests/kvm/include/kvm_util_base.h     |  1 +
 .../selftests/kvm/lib/aarch64/processor.c     | 39 ++++++++++++++-----
 tools/testing/selftests/kvm/lib/guest_modes.c |  2 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  3 ++
 4 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index a18db6a7b3cf..406500fb6e28 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -171,6 +171,7 @@ static inline struct userspace_mem_region *vm_get_mem_region(struct kvm_vm *vm,
 
 enum vm_guest_mode {
 	VM_MODE_P52V48_4K,
+	VM_MODE_P52V48_16K,
 	VM_MODE_P52V48_64K,
 	VM_MODE_P48V48_4K,
 	VM_MODE_P48V48_16K,
diff --git a/tools/testing/selftests/kvm/lib/aarch64/processor.c b/tools/testing/selftests/kvm/lib/aarch64/processor.c
index e50dad81e956..7137222d7bcb 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/processor.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/processor.c
@@ -12,6 +12,7 @@
 #include "kvm_util.h"
 #include "processor.h"
 #include <linux/bitfield.h>
+#include <linux/sizes.h>
 
 #define DEFAULT_ARM64_GUEST_STACK_VADDR_MIN	0xac0000
 
@@ -58,13 +59,25 @@ static uint64_t pte_index(struct kvm_vm *vm, vm_vaddr_t gva)
 	return (gva >> vm->page_shift) & mask;
 }
 
+static inline bool use_lpa2_pte_format(struct kvm_vm *vm)
+{
+	return (vm->page_size == SZ_4K || vm->page_size == SZ_16K) &&
+	    (vm->pa_bits > 48 || vm->va_bits > 48);
+}
+
 static uint64_t addr_pte(struct kvm_vm *vm, uint64_t pa, uint64_t attrs)
 {
 	uint64_t pte;
 
-	pte = pa & GENMASK(47, vm->page_shift);
-	if (vm->page_shift == 16)
-		pte |= FIELD_GET(GENMASK(51, 48), pa) << 12;
+	if (use_lpa2_pte_format(vm)) {
+		pte = pa & GENMASK(49, vm->page_shift);
+		pte |= FIELD_GET(GENMASK(51, 50), pa) << 8;
+		attrs &= ~GENMASK(9, 8);
+	} else {
+		pte = pa & GENMASK(47, vm->page_shift);
+		if (vm->page_shift == 16)
+			pte |= FIELD_GET(GENMASK(51, 48), pa) << 12;
+	}
 	pte |= attrs;
 
 	return pte;
@@ -74,9 +87,14 @@ static uint64_t pte_addr(struct kvm_vm *vm, uint64_t pte)
 {
 	uint64_t pa;
 
-	pa = pte & GENMASK(47, vm->page_shift);
-	if (vm->page_shift == 16)
-		pa |= FIELD_GET(GENMASK(15, 12), pte) << 48;
+	if (use_lpa2_pte_format(vm)) {
+		pa = pte & GENMASK(49, vm->page_shift);
+		pa |= FIELD_GET(GENMASK(9, 8), pte) << 50;
+	} else {
+		pa = pte & GENMASK(47, vm->page_shift);
+		if (vm->page_shift == 16)
+			pa |= FIELD_GET(GENMASK(15, 12), pte) << 48;
+	}
 
 	return pa;
 }
@@ -266,9 +284,6 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
 
 	/* Configure base granule size */
 	switch (vm->mode) {
-	case VM_MODE_P52V48_4K:
-		TEST_FAIL("AArch64 does not support 4K sized pages "
-			  "with 52-bit physical address ranges");
 	case VM_MODE_PXXV48_4K:
 		TEST_FAIL("AArch64 does not support 4K sized pages "
 			  "with ANY-bit physical address ranges");
@@ -278,12 +293,14 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
 	case VM_MODE_P36V48_64K:
 		tcr_el1 |= 1ul << 14; /* TG0 = 64KB */
 		break;
+	case VM_MODE_P52V48_16K:
 	case VM_MODE_P48V48_16K:
 	case VM_MODE_P40V48_16K:
 	case VM_MODE_P36V48_16K:
 	case VM_MODE_P36V47_16K:
 		tcr_el1 |= 2ul << 14; /* TG0 = 16KB */
 		break;
+	case VM_MODE_P52V48_4K:
 	case VM_MODE_P48V48_4K:
 	case VM_MODE_P40V48_4K:
 	case VM_MODE_P36V48_4K:
@@ -297,6 +314,8 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
 
 	/* Configure output size */
 	switch (vm->mode) {
+	case VM_MODE_P52V48_4K:
+	case VM_MODE_P52V48_16K:
 	case VM_MODE_P52V48_64K:
 		tcr_el1 |= 6ul << 32; /* IPS = 52 bits */
 		ttbr0_el1 |= FIELD_GET(GENMASK(51, 48), vm->pgd) << 2;
@@ -325,6 +344,8 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
 	/* TCR_EL1 |= IRGN0:WBWA | ORGN0:WBWA | SH0:Inner-Shareable */;
 	tcr_el1 |= (1 << 8) | (1 << 10) | (3 << 12);
 	tcr_el1 |= (64 - vm->va_bits) /* T0SZ */;
+	if (use_lpa2_pte_format(vm))
+		tcr_el1 |= (1ul << 59) /* DS */;
 
 	vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_SCTLR_EL1), sctlr_el1);
 	vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_TCR_EL1), tcr_el1);
diff --git a/tools/testing/selftests/kvm/lib/guest_modes.c b/tools/testing/selftests/kvm/lib/guest_modes.c
index c64c5cf49942..6634afc22137 100644
--- a/tools/testing/selftests/kvm/lib/guest_modes.c
+++ b/tools/testing/selftests/kvm/lib/guest_modes.c
@@ -23,6 +23,8 @@ void guest_modes_append_default(void)
 
 		aarch64_get_supported_page_sizes(limit, &ipa4k, &ipa16k, &ipa64k);
 
+		guest_mode_append(VM_MODE_P52V48_4K, ipa4k >= 52, ipa4k >= 52);
+		guest_mode_append(VM_MODE_P52V48_16K, ipa16k >= 52, ipa16k >= 52);
 		guest_mode_append(VM_MODE_P52V48_64K, ipa64k >= 52, ipa64k >= 52);
 
 		guest_mode_append(VM_MODE_P48V48_4K, ipa4k >= 48, ipa4k >= 48);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 7a8af1821f5d..aeba7a23105c 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -148,6 +148,7 @@ const char *vm_guest_mode_string(uint32_t i)
 {
 	static const char * const strings[] = {
 		[VM_MODE_P52V48_4K]	= "PA-bits:52,  VA-bits:48,  4K pages",
+		[VM_MODE_P52V48_16K]	= "PA-bits:52,  VA-bits:48, 16K pages",
 		[VM_MODE_P52V48_64K]	= "PA-bits:52,  VA-bits:48, 64K pages",
 		[VM_MODE_P48V48_4K]	= "PA-bits:48,  VA-bits:48,  4K pages",
 		[VM_MODE_P48V48_16K]	= "PA-bits:48,  VA-bits:48, 16K pages",
@@ -173,6 +174,7 @@ const char *vm_guest_mode_string(uint32_t i)
 
 const struct vm_guest_mode_params vm_guest_mode_params[] = {
 	[VM_MODE_P52V48_4K]	= { 52, 48,  0x1000, 12 },
+	[VM_MODE_P52V48_16K]	= { 52, 48,  0x4000, 14 },
 	[VM_MODE_P52V48_64K]	= { 52, 48, 0x10000, 16 },
 	[VM_MODE_P48V48_4K]	= { 48, 48,  0x1000, 12 },
 	[VM_MODE_P48V48_16K]	= { 48, 48,  0x4000, 14 },
@@ -251,6 +253,7 @@ struct kvm_vm *____vm_create(enum vm_guest_mode mode)
 	case VM_MODE_P36V48_64K:
 		vm->pgtable_levels = 3;
 		break;
+	case VM_MODE_P52V48_16K:
 	case VM_MODE_P48V48_16K:
 	case VM_MODE_P40V48_16K:
 	case VM_MODE_P36V48_16K:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH v4 12/12] KVM: selftests: arm64: Support P52V48 4K and 16K guest_modes
@ 2023-10-09 18:50   ` Ryan Roberts
  0 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-09 18:50 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	Suzuki K Poulose, James Morse, Zenghui Yu, Ard Biesheuvel,
	Anshuman Khandual
  Cc: Ryan Roberts, linux-arm-kernel, kvmarm

Add support for VM_MODE_P52V48_4K and VM_MODE_P52V48_16K guest modes by
using the FEAT_LPA2 pte format for stage1, when FEAT_LPA2 is available.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 .../selftests/kvm/include/kvm_util_base.h     |  1 +
 .../selftests/kvm/lib/aarch64/processor.c     | 39 ++++++++++++++-----
 tools/testing/selftests/kvm/lib/guest_modes.c |  2 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  3 ++
 4 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index a18db6a7b3cf..406500fb6e28 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -171,6 +171,7 @@ static inline struct userspace_mem_region *vm_get_mem_region(struct kvm_vm *vm,
 
 enum vm_guest_mode {
 	VM_MODE_P52V48_4K,
+	VM_MODE_P52V48_16K,
 	VM_MODE_P52V48_64K,
 	VM_MODE_P48V48_4K,
 	VM_MODE_P48V48_16K,
diff --git a/tools/testing/selftests/kvm/lib/aarch64/processor.c b/tools/testing/selftests/kvm/lib/aarch64/processor.c
index e50dad81e956..7137222d7bcb 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/processor.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/processor.c
@@ -12,6 +12,7 @@
 #include "kvm_util.h"
 #include "processor.h"
 #include <linux/bitfield.h>
+#include <linux/sizes.h>
 
 #define DEFAULT_ARM64_GUEST_STACK_VADDR_MIN	0xac0000
 
@@ -58,13 +59,25 @@ static uint64_t pte_index(struct kvm_vm *vm, vm_vaddr_t gva)
 	return (gva >> vm->page_shift) & mask;
 }
 
+static inline bool use_lpa2_pte_format(struct kvm_vm *vm)
+{
+	return (vm->page_size == SZ_4K || vm->page_size == SZ_16K) &&
+	    (vm->pa_bits > 48 || vm->va_bits > 48);
+}
+
 static uint64_t addr_pte(struct kvm_vm *vm, uint64_t pa, uint64_t attrs)
 {
 	uint64_t pte;
 
-	pte = pa & GENMASK(47, vm->page_shift);
-	if (vm->page_shift == 16)
-		pte |= FIELD_GET(GENMASK(51, 48), pa) << 12;
+	if (use_lpa2_pte_format(vm)) {
+		pte = pa & GENMASK(49, vm->page_shift);
+		pte |= FIELD_GET(GENMASK(51, 50), pa) << 8;
+		attrs &= ~GENMASK(9, 8);
+	} else {
+		pte = pa & GENMASK(47, vm->page_shift);
+		if (vm->page_shift == 16)
+			pte |= FIELD_GET(GENMASK(51, 48), pa) << 12;
+	}
 	pte |= attrs;
 
 	return pte;
@@ -74,9 +87,14 @@ static uint64_t pte_addr(struct kvm_vm *vm, uint64_t pte)
 {
 	uint64_t pa;
 
-	pa = pte & GENMASK(47, vm->page_shift);
-	if (vm->page_shift == 16)
-		pa |= FIELD_GET(GENMASK(15, 12), pte) << 48;
+	if (use_lpa2_pte_format(vm)) {
+		pa = pte & GENMASK(49, vm->page_shift);
+		pa |= FIELD_GET(GENMASK(9, 8), pte) << 50;
+	} else {
+		pa = pte & GENMASK(47, vm->page_shift);
+		if (vm->page_shift == 16)
+			pa |= FIELD_GET(GENMASK(15, 12), pte) << 48;
+	}
 
 	return pa;
 }
@@ -266,9 +284,6 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
 
 	/* Configure base granule size */
 	switch (vm->mode) {
-	case VM_MODE_P52V48_4K:
-		TEST_FAIL("AArch64 does not support 4K sized pages "
-			  "with 52-bit physical address ranges");
 	case VM_MODE_PXXV48_4K:
 		TEST_FAIL("AArch64 does not support 4K sized pages "
 			  "with ANY-bit physical address ranges");
@@ -278,12 +293,14 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
 	case VM_MODE_P36V48_64K:
 		tcr_el1 |= 1ul << 14; /* TG0 = 64KB */
 		break;
+	case VM_MODE_P52V48_16K:
 	case VM_MODE_P48V48_16K:
 	case VM_MODE_P40V48_16K:
 	case VM_MODE_P36V48_16K:
 	case VM_MODE_P36V47_16K:
 		tcr_el1 |= 2ul << 14; /* TG0 = 16KB */
 		break;
+	case VM_MODE_P52V48_4K:
 	case VM_MODE_P48V48_4K:
 	case VM_MODE_P40V48_4K:
 	case VM_MODE_P36V48_4K:
@@ -297,6 +314,8 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
 
 	/* Configure output size */
 	switch (vm->mode) {
+	case VM_MODE_P52V48_4K:
+	case VM_MODE_P52V48_16K:
 	case VM_MODE_P52V48_64K:
 		tcr_el1 |= 6ul << 32; /* IPS = 52 bits */
 		ttbr0_el1 |= FIELD_GET(GENMASK(51, 48), vm->pgd) << 2;
@@ -325,6 +344,8 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
 	/* TCR_EL1 |= IRGN0:WBWA | ORGN0:WBWA | SH0:Inner-Shareable */;
 	tcr_el1 |= (1 << 8) | (1 << 10) | (3 << 12);
 	tcr_el1 |= (64 - vm->va_bits) /* T0SZ */;
+	if (use_lpa2_pte_format(vm))
+		tcr_el1 |= (1ul << 59) /* DS */;
 
 	vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_SCTLR_EL1), sctlr_el1);
 	vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_TCR_EL1), tcr_el1);
diff --git a/tools/testing/selftests/kvm/lib/guest_modes.c b/tools/testing/selftests/kvm/lib/guest_modes.c
index c64c5cf49942..6634afc22137 100644
--- a/tools/testing/selftests/kvm/lib/guest_modes.c
+++ b/tools/testing/selftests/kvm/lib/guest_modes.c
@@ -23,6 +23,8 @@ void guest_modes_append_default(void)
 
 		aarch64_get_supported_page_sizes(limit, &ipa4k, &ipa16k, &ipa64k);
 
+		guest_mode_append(VM_MODE_P52V48_4K, ipa4k >= 52, ipa4k >= 52);
+		guest_mode_append(VM_MODE_P52V48_16K, ipa16k >= 52, ipa16k >= 52);
 		guest_mode_append(VM_MODE_P52V48_64K, ipa64k >= 52, ipa64k >= 52);
 
 		guest_mode_append(VM_MODE_P48V48_4K, ipa4k >= 48, ipa4k >= 48);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 7a8af1821f5d..aeba7a23105c 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -148,6 +148,7 @@ const char *vm_guest_mode_string(uint32_t i)
 {
 	static const char * const strings[] = {
 		[VM_MODE_P52V48_4K]	= "PA-bits:52,  VA-bits:48,  4K pages",
+		[VM_MODE_P52V48_16K]	= "PA-bits:52,  VA-bits:48, 16K pages",
 		[VM_MODE_P52V48_64K]	= "PA-bits:52,  VA-bits:48, 64K pages",
 		[VM_MODE_P48V48_4K]	= "PA-bits:48,  VA-bits:48,  4K pages",
 		[VM_MODE_P48V48_16K]	= "PA-bits:48,  VA-bits:48, 16K pages",
@@ -173,6 +174,7 @@ const char *vm_guest_mode_string(uint32_t i)
 
 const struct vm_guest_mode_params vm_guest_mode_params[] = {
 	[VM_MODE_P52V48_4K]	= { 52, 48,  0x1000, 12 },
+	[VM_MODE_P52V48_16K]	= { 52, 48,  0x4000, 14 },
 	[VM_MODE_P52V48_64K]	= { 52, 48, 0x10000, 16 },
 	[VM_MODE_P48V48_4K]	= { 48, 48,  0x1000, 12 },
 	[VM_MODE_P48V48_16K]	= { 48, 48,  0x4000, 14 },
@@ -251,6 +253,7 @@ struct kvm_vm *____vm_create(enum vm_guest_mode mode)
 	case VM_MODE_P36V48_64K:
 		vm->pgtable_levels = 3;
 		break;
+	case VM_MODE_P52V48_16K:
 	case VM_MODE_P48V48_16K:
 	case VM_MODE_P40V48_16K:
 	case VM_MODE_P36V48_16K:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 01/12] arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2
  2023-10-09 18:49   ` Ryan Roberts
@ 2023-10-19  8:03     ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-19  8:03 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Mon, 09 Oct 2023 19:49:57 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> FEAT_LPA2 impacts tlb invalidation in 2 ways; Firstly, the TTL field in
> the non-range tlbi instructions can now validly take a 0 value for the
> 4KB granule (this is due to the extra level of translation). Secondly,

nit: 0 was always valid. It just didn't indicate any level.

> the BADDR field in the range tlbi instructions must be aligned to 64KB
> when LPA2 is in use (TCR.DS=1). Changes are required for tlbi to
> continue to operate correctly when LPA2 is in use.
> 
> KVM only uses the non-range (__tlbi_level()) routines. Therefore we only
> solve the first problem with this patch.

Is this still true? This patch changes __TLBI_VADDR_RANGE() and co.

>
> It is solved by always adding the level hint if the level is between [0,
> 3] (previously anything other than 0 was hinted, which breaks in the new
> level -1 case from kvm). When running on non-LPA2 HW, 0 is still safe to
> hint as the HW will fall back to non-hinted. While we are at it, we
> replace the notion of 0 being the non-hinted sentinel with a macro,
> TLBI_TTL_UNKNOWN. This means callers won't need updating if/when
> translation depth increases in future.
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm64/include/asm/tlb.h      |  9 ++++---
>  arch/arm64/include/asm/tlbflush.h | 43 +++++++++++++++++++------------
>  2 files changed, 31 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index 2c29239d05c3..93c537635dbb 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -22,15 +22,16 @@ static void tlb_flush(struct mmu_gather *tlb);
>  #include <asm-generic/tlb.h>
>  
>  /*
> - * get the tlbi levels in arm64.  Default value is 0 if more than one
> - * of cleared_* is set or neither is set.
> + * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
> + * one of cleared_* is set or neither is set - this elides the level hinting to
> + * the hardware.
>   * Arm64 doesn't support p4ds now.
>   */
>  static inline int tlb_get_level(struct mmu_gather *tlb)
>  {
>  	/* The TTL field is only valid for the leaf entry. */
>  	if (tlb->freed_tables)
> -		return 0;
> +		return TLBI_TTL_UNKNOWN;
>  
>  	if (tlb->cleared_ptes && !(tlb->cleared_pmds ||
>  				   tlb->cleared_puds ||
> @@ -47,7 +48,7 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
>  				   tlb->cleared_p4ds))
>  		return 1;
>  
> -	return 0;
> +	return TLBI_TTL_UNKNOWN;
>  }
>  
>  static inline void tlb_flush(struct mmu_gather *tlb)
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index b149cf9f91bc..e688246b3b13 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -94,19 +94,22 @@ static inline unsigned long get_trans_granule(void)
>   * When ARMv8.4-TTL exists, TLBI operations take an additional hint for
>   * the level at which the invalidation must take place. If the level is
>   * wrong, no invalidation may take place. In the case where the level
> - * cannot be easily determined, a 0 value for the level parameter will
> - * perform a non-hinted invalidation.
> + * cannot be easily determined, the value TLBI_TTL_UNKNOWN will perform
> + * a non-hinted invalidation. Any provided level outside the hint range
> + * will also cause fall-back to non-hinted invalidation.
>   *
>   * For Stage-2 invalidation, use the level values provided to that effect
>   * in asm/stage2_pgtable.h.
>   */
>  #define TLBI_TTL_MASK		GENMASK_ULL(47, 44)
>  
> +#define TLBI_TTL_UNKNOWN	(-1)

I find this value somewhat confusing, as it represents an actual level
number. It just happens to be one that cannot be provided as a TTL. So
having that as a return value from tlb_get_level() isn't great, and
I'd rather have something that cannot be mistaken for a valid level.

> +
>  #define __tlbi_level(op, addr, level) do {				\
>  	u64 arg = addr;							\
>  									\
>  	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&		\
> -	    level) {							\
> +	    level >= 0 && level <= 3) {					\
>  		u64 ttl = level & 3;					\
>  		ttl |= get_trans_granule() << 2;			\
>  		arg &= ~TLBI_TTL_MASK;					\
> @@ -134,16 +137,17 @@ static inline unsigned long get_trans_granule(void)
>   * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
>   *
>   */
> -#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
> -	({							\
> -		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
> -		__ta &= GENMASK_ULL(36, 0);			\
> -		__ta |= (unsigned long)(ttl) << 37;		\
> -		__ta |= (unsigned long)(num) << 39;		\
> -		__ta |= (unsigned long)(scale) << 44;		\
> -		__ta |= get_trans_granule() << 46;		\
> -		__ta |= (unsigned long)(asid) << 48;		\
> -		__ta;						\
> +#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)				\
> +	({									\
> +		unsigned long __ta = (addr) >> PAGE_SHIFT;			\
> +		unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0;		\
> +		__ta &= GENMASK_ULL(36, 0);					\
> +		__ta |= __ttl << 37;						\
> +		__ta |= (unsigned long)(num) << 39;				\
> +		__ta |= (unsigned long)(scale) << 44;				\
> +		__ta |= get_trans_granule() << 46;				\
> +		__ta |= (unsigned long)(asid) << 48;				\
> +		__ta;								\
>  	})
>  
>  /* These macros are used by the TLBI RANGE feature. */
> @@ -216,12 +220,16 @@ static inline unsigned long get_trans_granule(void)
>   *		CPUs, ensuring that any walk-cache entries associated with the
>   *		translation are also invalidated.
>   *
> - *	__flush_tlb_range(vma, start, end, stride, last_level)
> + *	__flush_tlb_range(vma, start, end, stride, last_level, tlb_level)
>   *		Invalidate the virtual-address range '[start, end)' on all
>   *		CPUs for the user address space corresponding to 'vma->mm'.
>   *		The invalidation operations are issued at a granularity
>   *		determined by 'stride' and only affect any walk-cache entries
> - *		if 'last_level' is equal to false.
> + *		if 'last_level' is equal to false. tlb_level is the level at
> + *		which the invalidation must take place. If the level is wrong,
> + *		no invalidation may take place. In the case where the level
> + *		cannot be easily determined, the value TLBI_TTL_UNKNOWN will
> + *		perform a non-hinted invalidation.
>   *
>   *
>   *	Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
> @@ -442,9 +450,10 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
>  	/*
>  	 * We cannot use leaf-only invalidation here, since we may be invalidating
>  	 * table entries as part of collapsing hugepages or moving page tables.
> -	 * Set the tlb_level to 0 because we can not get enough information here.
> +	 * Set the tlb_level to TLBI_TTL_UNKNOWN because we can not get enough
> +	 * information here.
>  	 */
> -	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
> +	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
>  }
>  
>  static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)

It feels like this range stuff would be better located in the second
patch. Not a huge deal though.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 01/12] arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2
@ 2023-10-19  8:03     ` Marc Zyngier
  0 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-19  8:03 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Mon, 09 Oct 2023 19:49:57 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> FEAT_LPA2 impacts tlb invalidation in 2 ways; Firstly, the TTL field in
> the non-range tlbi instructions can now validly take a 0 value for the
> 4KB granule (this is due to the extra level of translation). Secondly,

nit: 0 was always valid. It just didn't indicate any level.

> the BADDR field in the range tlbi instructions must be aligned to 64KB
> when LPA2 is in use (TCR.DS=1). Changes are required for tlbi to
> continue to operate correctly when LPA2 is in use.
> 
> KVM only uses the non-range (__tlbi_level()) routines. Therefore we only
> solve the first problem with this patch.

Is this still true? This patch changes __TLBI_VADDR_RANGE() and co.

>
> It is solved by always adding the level hint if the level is between [0,
> 3] (previously anything other than 0 was hinted, which breaks in the new
> level -1 case from kvm). When running on non-LPA2 HW, 0 is still safe to
> hint as the HW will fall back to non-hinted. While we are at it, we
> replace the notion of 0 being the non-hinted sentinel with a macro,
> TLBI_TTL_UNKNOWN. This means callers won't need updating if/when
> translation depth increases in future.
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm64/include/asm/tlb.h      |  9 ++++---
>  arch/arm64/include/asm/tlbflush.h | 43 +++++++++++++++++++------------
>  2 files changed, 31 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index 2c29239d05c3..93c537635dbb 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -22,15 +22,16 @@ static void tlb_flush(struct mmu_gather *tlb);
>  #include <asm-generic/tlb.h>
>  
>  /*
> - * get the tlbi levels in arm64.  Default value is 0 if more than one
> - * of cleared_* is set or neither is set.
> + * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
> + * one of cleared_* is set or neither is set - this elides the level hinting to
> + * the hardware.
>   * Arm64 doesn't support p4ds now.
>   */
>  static inline int tlb_get_level(struct mmu_gather *tlb)
>  {
>  	/* The TTL field is only valid for the leaf entry. */
>  	if (tlb->freed_tables)
> -		return 0;
> +		return TLBI_TTL_UNKNOWN;
>  
>  	if (tlb->cleared_ptes && !(tlb->cleared_pmds ||
>  				   tlb->cleared_puds ||
> @@ -47,7 +48,7 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
>  				   tlb->cleared_p4ds))
>  		return 1;
>  
> -	return 0;
> +	return TLBI_TTL_UNKNOWN;
>  }
>  
>  static inline void tlb_flush(struct mmu_gather *tlb)
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index b149cf9f91bc..e688246b3b13 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -94,19 +94,22 @@ static inline unsigned long get_trans_granule(void)
>   * When ARMv8.4-TTL exists, TLBI operations take an additional hint for
>   * the level at which the invalidation must take place. If the level is
>   * wrong, no invalidation may take place. In the case where the level
> - * cannot be easily determined, a 0 value for the level parameter will
> - * perform a non-hinted invalidation.
> + * cannot be easily determined, the value TLBI_TTL_UNKNOWN will perform
> + * a non-hinted invalidation. Any provided level outside the hint range
> + * will also cause fall-back to non-hinted invalidation.
>   *
>   * For Stage-2 invalidation, use the level values provided to that effect
>   * in asm/stage2_pgtable.h.
>   */
>  #define TLBI_TTL_MASK		GENMASK_ULL(47, 44)
>  
> +#define TLBI_TTL_UNKNOWN	(-1)

I find this value somehow confusing, as it represents an actual level
number. It just happens to be one that cannot be provided as a TTL. So
having that as a return value from tlb_get_level() isn't great, and
I'd rather have something that cannot be mistaken for a valid level.

> +
>  #define __tlbi_level(op, addr, level) do {				\
>  	u64 arg = addr;							\
>  									\
>  	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&		\
> -	    level) {							\
> +	    level >= 0 && level <= 3) {					\
>  		u64 ttl = level & 3;					\
>  		ttl |= get_trans_granule() << 2;			\
>  		arg &= ~TLBI_TTL_MASK;					\
> @@ -134,16 +137,17 @@ static inline unsigned long get_trans_granule(void)
>   * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
>   *
>   */
> -#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
> -	({							\
> -		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
> -		__ta &= GENMASK_ULL(36, 0);			\
> -		__ta |= (unsigned long)(ttl) << 37;		\
> -		__ta |= (unsigned long)(num) << 39;		\
> -		__ta |= (unsigned long)(scale) << 44;		\
> -		__ta |= get_trans_granule() << 46;		\
> -		__ta |= (unsigned long)(asid) << 48;		\
> -		__ta;						\
> +#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)				\
> +	({									\
> +		unsigned long __ta = (addr) >> PAGE_SHIFT;			\
> +		unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0;		\
> +		__ta &= GENMASK_ULL(36, 0);					\
> +		__ta |= __ttl << 37;						\
> +		__ta |= (unsigned long)(num) << 39;				\
> +		__ta |= (unsigned long)(scale) << 44;				\
> +		__ta |= get_trans_granule() << 46;				\
> +		__ta |= (unsigned long)(asid) << 48;				\
> +		__ta;								\
>  	})
>  
>  /* These macros are used by the TLBI RANGE feature. */
> @@ -216,12 +220,16 @@ static inline unsigned long get_trans_granule(void)
>   *		CPUs, ensuring that any walk-cache entries associated with the
>   *		translation are also invalidated.
>   *
> - *	__flush_tlb_range(vma, start, end, stride, last_level)
> + *	__flush_tlb_range(vma, start, end, stride, last_level, tlb_level)
>   *		Invalidate the virtual-address range '[start, end)' on all
>   *		CPUs for the user address space corresponding to 'vma->mm'.
>   *		The invalidation operations are issued at a granularity
>   *		determined by 'stride' and only affect any walk-cache entries
> - *		if 'last_level' is equal to false.
> + *		if 'last_level' is equal to false. tlb_level is the level at
> + *		which the invalidation must take place. If the level is wrong,
> + *		no invalidation may take place. In the case where the level
> + *		cannot be easily determined, the value TLBI_TTL_UNKNOWN will
> + *		perform a non-hinted invalidation.
>   *
>   *
>   *	Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
> @@ -442,9 +450,10 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
>  	/*
>  	 * We cannot use leaf-only invalidation here, since we may be invalidating
>  	 * table entries as part of collapsing hugepages or moving page tables.
> -	 * Set the tlb_level to 0 because we can not get enough information here.
> +	 * Set the tlb_level to TLBI_TTL_UNKNOWN because we can not get enough
> +	 * information here.
>  	 */
> -	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
> +	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
>  }
>  
>  static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)

It feels like this range stuff would be better located in the second
patch. Not a huge deal though.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 01/12] arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2
  2023-10-19  8:03     ` Marc Zyngier
@ 2023-10-19  9:22       ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-19  9:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 19/10/2023 09:03, Marc Zyngier wrote:
> On Mon, 09 Oct 2023 19:49:57 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> FEAT_LPA2 impacts tlb invalidation in 2 ways; Firstly, the TTL field in
>> the non-range tlbi instructions can now validly take a 0 value for the
>> 4KB granule (this is due to the extra level of translation). Secondly,
> 
> nit: 0 was always valid. It just didn't indicate any level.

True. I'll change to "can now validly take a 0 value as a TTL hint".

> 
>> the BADDR field in the range tlbi instructions must be aligned to 64KB
>> when LPA2 is in use (TCR.DS=1). Changes are required for tlbi to
>> continue to operate correctly when LPA2 is in use.
>>
>> KVM only uses the non-range (__tlbi_level()) routines. Therefore we only
>> solve the first problem with this patch.
> 
> Is this still true? This patch changes __TLBI_VADDR_RANGE() and co.

It is no longer true that KVM only uses the non-range routines. v6.6 adds a
series where KVM will now use the range-based routines too. So that text is out
of date and I should have spotted it when doing the rebase - I'll fix. KVM now
using range-based ops is the reason I added patch 2 to this series.

However, this patch doesn't really change __TLBI_VADDR_RANGE()'s behavior, it
just makes it robust to the presence of TLBI_TTL_UNKNOWN, instead of 0 which was
previously used as the "don't know" value.
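
To make that concrete, here is a minimal stand-alone sketch of the clamp the
quoted macro applies to its ttl argument (only the TLBI_TTL_UNKNOWN value is
taken from the patch; the rest is invented for illustration):

#include <stdio.h>

#define TLBI_TTL_UNKNOWN	(-1)

/* mirrors "(ttl >= 1 && ttl <= 3) ? ttl : 0" from the quoted macro */
static unsigned long range_ttl_field(int ttl)
{
	return (ttl >= 1 && ttl <= 3) ? (unsigned long)ttl : 0;
}

int main(void)
{
	int inputs[] = { TLBI_TTL_UNKNOWN, 0, 1, 2, 3, 4 };

	for (unsigned int i = 0; i < sizeof(inputs) / sizeof(inputs[0]); i++)
		printf("ttl %2d -> TTL field %lu\n", inputs[i],
		       range_ttl_field(inputs[i]));
	return 0;
}

Both -1 and 0 end up encoding 0 (non-hinted), which is why the operand is
unchanged in practice.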

> 
>>
>> It is solved by always adding the level hint if the level is between [0,
>> 3] (previously anything other than 0 was hinted, which breaks in the new
>> level -1 case from kvm). When running on non-LPA2 HW, 0 is still safe to
>> hint as the HW will fall back to non-hinted. While we are at it, we
>> replace the notion of 0 being the non-hinted sentinel with a macro,
>> TLBI_TTL_UNKNOWN. This means callers won't need updating if/when
>> translation depth increases in future.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>> ---
>>  arch/arm64/include/asm/tlb.h      |  9 ++++---
>>  arch/arm64/include/asm/tlbflush.h | 43 +++++++++++++++++++------------
>>  2 files changed, 31 insertions(+), 21 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
>> index 2c29239d05c3..93c537635dbb 100644
>> --- a/arch/arm64/include/asm/tlb.h
>> +++ b/arch/arm64/include/asm/tlb.h
>> @@ -22,15 +22,16 @@ static void tlb_flush(struct mmu_gather *tlb);
>>  #include <asm-generic/tlb.h>
>>  
>>  /*
>> - * get the tlbi levels in arm64.  Default value is 0 if more than one
>> - * of cleared_* is set or neither is set.
>> + * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
>> + * one of cleared_* is set or neither is set - this elides the level hinting to
>> + * the hardware.
>>   * Arm64 doesn't support p4ds now.
>>   */
>>  static inline int tlb_get_level(struct mmu_gather *tlb)
>>  {
>>  	/* The TTL field is only valid for the leaf entry. */
>>  	if (tlb->freed_tables)
>> -		return 0;
>> +		return TLBI_TTL_UNKNOWN;
>>  
>>  	if (tlb->cleared_ptes && !(tlb->cleared_pmds ||
>>  				   tlb->cleared_puds ||
>> @@ -47,7 +48,7 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
>>  				   tlb->cleared_p4ds))
>>  		return 1;
>>  
>> -	return 0;
>> +	return TLBI_TTL_UNKNOWN;
>>  }
>>  
>>  static inline void tlb_flush(struct mmu_gather *tlb)
>> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
>> index b149cf9f91bc..e688246b3b13 100644
>> --- a/arch/arm64/include/asm/tlbflush.h
>> +++ b/arch/arm64/include/asm/tlbflush.h
>> @@ -94,19 +94,22 @@ static inline unsigned long get_trans_granule(void)
>>   * When ARMv8.4-TTL exists, TLBI operations take an additional hint for
>>   * the level at which the invalidation must take place. If the level is
>>   * wrong, no invalidation may take place. In the case where the level
>> - * cannot be easily determined, a 0 value for the level parameter will
>> - * perform a non-hinted invalidation.
>> + * cannot be easily determined, the value TLBI_TTL_UNKNOWN will perform
>> + * a non-hinted invalidation. Any provided level outside the hint range
>> + * will also cause fall-back to non-hinted invalidation.
>>   *
>>   * For Stage-2 invalidation, use the level values provided to that effect
>>   * in asm/stage2_pgtable.h.
>>   */
>>  #define TLBI_TTL_MASK		GENMASK_ULL(47, 44)
>>  
>> +#define TLBI_TTL_UNKNOWN	(-1)
> 
> I find this value somehow confusing, as it represents an actual level
> number. It just happens to be one that cannot be provided as a TTL. So
> having that as a return value from tlb_get_level() isn't great, and
> I'd rather have something that cannot be mistaken for a valid level.

OK, how about INT_MAX?
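
Any sentinel outside [0, 3] behaves the same way in the quoted __tlbi_level()
test, since the hint is only emitted for levels 0..3. A minimal stand-alone
sketch of that gate (not kernel code, just the comparison):

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* mirrors the "level >= 0 && level <= 3" gate from the quoted macro */
static bool ttl_hint_emitted(int level)
{
	return level >= 0 && level <= 3;
}

int main(void)
{
	printf("level -1      -> %s\n", ttl_hint_emitted(-1) ? "hinted" : "non-hinted");
	printf("level 3       -> %s\n", ttl_hint_emitted(3) ? "hinted" : "non-hinted");
	printf("level INT_MAX -> %s\n", ttl_hint_emitted(INT_MAX) ? "hinted" : "non-hinted");
	return 0;
}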

> 
>> +
>>  #define __tlbi_level(op, addr, level) do {				\
>>  	u64 arg = addr;							\
>>  									\
>>  	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&		\
>> -	    level) {							\
>> +	    level >= 0 && level <= 3) {					\
>>  		u64 ttl = level & 3;					\
>>  		ttl |= get_trans_granule() << 2;			\
>>  		arg &= ~TLBI_TTL_MASK;					\
>> @@ -134,16 +137,17 @@ static inline unsigned long get_trans_granule(void)
>>   * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
>>   *
>>   */
>> -#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
>> -	({							\
>> -		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
>> -		__ta &= GENMASK_ULL(36, 0);			\
>> -		__ta |= (unsigned long)(ttl) << 37;		\
>> -		__ta |= (unsigned long)(num) << 39;		\
>> -		__ta |= (unsigned long)(scale) << 44;		\
>> -		__ta |= get_trans_granule() << 46;		\
>> -		__ta |= (unsigned long)(asid) << 48;		\
>> -		__ta;						\
>> +#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)				\
>> +	({									\
>> +		unsigned long __ta = (addr) >> PAGE_SHIFT;			\
>> +		unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0;		\
>> +		__ta &= GENMASK_ULL(36, 0);					\
>> +		__ta |= __ttl << 37;						\
>> +		__ta |= (unsigned long)(num) << 39;				\
>> +		__ta |= (unsigned long)(scale) << 44;				\
>> +		__ta |= get_trans_granule() << 46;				\
>> +		__ta |= (unsigned long)(asid) << 48;				\
>> +		__ta;								\
>>  	})
>>  
>>  /* These macros are used by the TLBI RANGE feature. */
>> @@ -216,12 +220,16 @@ static inline unsigned long get_trans_granule(void)
>>   *		CPUs, ensuring that any walk-cache entries associated with the
>>   *		translation are also invalidated.
>>   *
>> - *	__flush_tlb_range(vma, start, end, stride, last_level)
>> + *	__flush_tlb_range(vma, start, end, stride, last_level, tlb_level)
>>   *		Invalidate the virtual-address range '[start, end)' on all
>>   *		CPUs for the user address space corresponding to 'vma->mm'.
>>   *		The invalidation operations are issued at a granularity
>>   *		determined by 'stride' and only affect any walk-cache entries
>> - *		if 'last_level' is equal to false.
>> + *		if 'last_level' is equal to false. tlb_level is the level at
>> + *		which the invalidation must take place. If the level is wrong,
>> + *		no invalidation may take place. In the case where the level
>> + *		cannot be easily determined, the value TLBI_TTL_UNKNOWN will
>> + *		perform a non-hinted invalidation.
>>   *
>>   *
>>   *	Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
>> @@ -442,9 +450,10 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
>>  	/*
>>  	 * We cannot use leaf-only invalidation here, since we may be invalidating
>>  	 * table entries as part of collapsing hugepages or moving page tables.
>> -	 * Set the tlb_level to 0 because we can not get enough information here.
>> +	 * Set the tlb_level to TLBI_TTL_UNKNOWN because we can not get enough
>> +	 * information here.
>>  	 */
>> -	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
>> +	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
>>  }
>>  
>>  static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> 
> It feels like this range stuff would be better located in the second
> patch. Not a huge deal though.

As I said, this is the minimal change to the range-based side of things to
robustly deal with the introduction of TLBI_TTL_UNKNOWN.

But I wonder if I'm actually better off squashing the two patches into one.
The only reason I split it previously was because KVM was only using the
level-based ops.

Thanks for the review!

Ryan


> 
> 	M.
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 02/12] arm64/mm: Update range-based tlb invalidation routines for FEAT_LPA2
  2023-10-09 18:49   ` Ryan Roberts
@ 2023-10-19 21:06     ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-19 21:06 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Mon, 09 Oct 2023 19:49:58 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> The BADDR field of the range-based tlbi instructions is specified in
> 64KB units when LPA2 is in use (TCR.DS=1), whereas it is in page units
> otherwise.
> 
> When LPA2 is enabled, use the non-range tlbi instructions to forward
> align to a 64KB boundary first, then use range-based tlbi from there on,
> until we have either invalidated all pages or we have a single page
> remaining. If the latter, that is done with non-range tlbi. (Previously
> we invalidated a single odd page first, but we can no longer do this
> because it could wreck our 64KB alignment). When LPA2 is not in use, we
> don't need the initial alignment step. However, the bigger impact is
> that we can no longer use the previous method of iterating from smallest
> to largest 'scale', since this would likely unalign the boundary again
> for the LPA2 case. So instead we iterate from highest to lowest scale,
> which guarantees that we remain 64KB aligned until the last op (at
> scale=0).
> 
> The original commit (d1d3aa98 "arm64: tlb: Use the TLBI RANGE feature in
> arm64") stated this as the reason for incrementing scale:
> 
>   However, in most scenarios, the pages = 1 when flush_tlb_range() is
>   called. Start from scale = 3 or other proper value (such as scale
>   =ilog2(pages)), will incur extra overhead. So increase 'scale' from 0
>   to maximum, the flush order is exactly opposite to the example.
> 
> But pages=1 is already special cased by the non-range invalidation path,
> which will take care of it the first time through the loop (both in the
> original commit and in my change), so I don't think switching to
> decrement scale should have any extra performance impact after all.

Surely this can be benchmarked. After all, HW supporting range
invalidation is common enough these days.

> 
> Note: This patch uses LPA2 range-based tlbi based on the new lpa2 param
> passed to __flush_tlb_range_op(). This allows both KVM and the kernel to
> opt-in/out of LPA2 usage independently. But once both are converted over
> (and keyed off the same static key), the parameter could be dropped and
> replaced by the static key directly in the macro.

Why can't this be done right away? Have a patch common to the two
series that exposes the static key, and use that from the start. This
would avoid the current (and rather ugly) extra parameter that I find
unnecessarily hard to parse.

And if the 64kB alignment above is cheap enough, maybe this could
become the one true way?

> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/include/asm/tlb.h      |  6 +++-
>  arch/arm64/include/asm/tlbflush.h | 46 ++++++++++++++++++++-----------
>  arch/arm64/kvm/hyp/nvhe/tlb.c     |  2 +-
>  arch/arm64/kvm/hyp/vhe/tlb.c      |  2 +-
>  4 files changed, 37 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index 93c537635dbb..396ba9b4872c 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -25,7 +25,6 @@ static void tlb_flush(struct mmu_gather *tlb);
>   * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
>   * one of cleared_* is set or neither is set - this elides the level hinting to
>   * the hardware.
> - * Arm64 doesn't support p4ds now.
>   */
>  static inline int tlb_get_level(struct mmu_gather *tlb)
>  {
> @@ -48,6 +47,11 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
>  				   tlb->cleared_p4ds))
>  		return 1;
>  
> +	if (tlb->cleared_p4ds && !(tlb->cleared_ptes ||
> +				   tlb->cleared_pmds ||
> +				   tlb->cleared_puds))
> +		return 0;
> +
>  	return TLBI_TTL_UNKNOWN;
>  }
>  
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index e688246b3b13..4d34035fe7d6 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -136,10 +136,14 @@ static inline unsigned long get_trans_granule(void)
>   * The address range is determined by below formula:
>   * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
>   *
> + * If LPA2 is in use, BADDR holds addr[52:16]. Else BADDR holds page number.
> + * See ARM DDI 0487I.a C5.5.21.

Please update this to the latest published ARM ARM. I know it will be
obsolete quickly enough, but still. Also, "page number" is rather
imprecise, and doesn't match the language of the architecture.
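
The unit change under discussion only affects the shift used when forming the
operand. A stand-alone sketch of that part of the quoted macro (4K pages
assumed; names and the example address are invented):

#include <stdio.h>

#define PAGE_SHIFT	12	/* 4K granule assumed for the example */

/* BADDR portion of the operand: 64KB units with LPA2, page units otherwise */
static unsigned long long baddr_field(unsigned long long addr, int lpa2)
{
	return (addr >> (lpa2 ? 16 : PAGE_SHIFT)) & ((1ULL << 37) - 1);
}

int main(void)
{
	unsigned long long addr = 0x40010000ULL;	/* 64KB-aligned VA */

	printf("BADDR without LPA2: %#llx\n", baddr_field(addr, 0));
	printf("BADDR with    LPA2: %#llx\n", baddr_field(addr, 1));
	return 0;
}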

> + *
>   */
> -#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)				\
> +#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl, lpa2)			\
>  	({									\
> -		unsigned long __ta = (addr) >> PAGE_SHIFT;			\
> +		unsigned long __addr_shift = lpa2 ? 16 : PAGE_SHIFT;		\
> +		unsigned long __ta = (addr) >> __addr_shift;			\
>  		unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0;		\
>  		__ta &= GENMASK_ULL(36, 0);					\
>  		__ta |= __ttl << 37;						\
> @@ -354,34 +358,44 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>   * @tlb_level:	Translation Table level hint, if known
>   * @tlbi_user:	If 'true', call an additional __tlbi_user()
>   *              (typically for user ASIDs). 'flase' for IPA instructions
> + * @lpa2:	If 'true', the lpa2 scheme is used as set out below
>   *
>   * When the CPU does not support TLB range operations, flush the TLB
>   * entries one by one at the granularity of 'stride'. If the TLB
>   * range ops are supported, then:
>   *
> - * 1. If 'pages' is odd, flush the first page through non-range
> - *    operations;
> + * 1. If FEAT_LPA2 is in use, the start address of a range operation
> + *    must be 64KB aligned, so flush pages one by one until the
> + *    alignment is reached using the non-range operations. This step is
> + *    skipped if LPA2 is not in use.
>   *
>   * 2. For remaining pages: the minimum range granularity is decided
>   *    by 'scale', so multiple range TLBI operations may be required.
> - *    Start from scale = 0, flush the corresponding number of pages
> - *    ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
> - *    until no pages left.
> + *    Start from scale = 3, flush the corresponding number of pages
> + *    ((num+1)*2^(5*scale+1) starting from 'addr'), then decrease it
> + *    until one or zero pages are left. We must start from highest scale
> + *    to ensure 64KB start alignment is maintained in the LPA2 case.

Surely the algorithm is a bit more subtle than this, because always
starting with scale==3 means that you're invalidating at least 64k
*pages*, which is an awful lot (a minimum of 256MB?).
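
For reference, plugging numbers into the "(NUM + 1) * 2^(5*SCALE + 1)" formula
quoted above (4K pages, and NUM capped at 30 as the comment notes), each scale
can describe:

  scale 0:      2 ..      62 pages  (   8KB ..  248KB)
  scale 1:     64 ..    1984 pages  ( 256KB .. ~7.8MB)
  scale 2:   2048 ..   63488 pages  (   8MB ..  248MB)
  scale 3:  65536 .. 2031616 pages  ( 256MB .. ~7.8GB)

(Per the quoted loop, a range op is only issued when __TLBI_RANGE_NUM() yields
num >= 0, so a 'pages' value too small for a given scale simply falls through
to the next scale down.)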

> + *
> + * 3. If there is 1 page remaining, flush it through non-range
> + *    operations. Range operations can only span an even number of
> + *    pages. We save this for last to ensure 64KB start alignment is
> + *    maintained for the LPA2 case.
>   *
>   * Note that certain ranges can be represented by either num = 31 and
>   * scale or num = 0 and scale + 1. The loop below favours the latter
>   * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
>   */
>  #define __flush_tlb_range_op(op, start, pages, stride,			\
> -				asid, tlb_level, tlbi_user)		\
> +				asid, tlb_level, tlbi_user, lpa2)	\
>  do {									\
>  	int num = 0;							\
> -	int scale = 0;							\
> +	int scale = 3;							\
>  	unsigned long addr;						\
>  									\
>  	while (pages > 0) {						\

Not an issue with your patch, but we could be more robust here. If
'pages' is an unsigned quantity and we have a bug in converging
to 0 below, we'll be looping for a long time. Not to mention the side
effects on pages and start.
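
For what it's worth, here is a quick user-space simulation of the loop
arithmetic (TLBI instructions replaced by printf, stride fixed at one page,
and the range helpers re-stated from memory, so treat them as illustrative
rather than authoritative):

#include <stdio.h>
#include <stdbool.h>

#define PAGE_SHIFT	12
#define SZ_64K		0x10000UL

/* illustrative restatements of the kernel's range helpers */
#define TLBI_RANGE_MASK			0x1fUL
#define __TLBI_RANGE_NUM(pages, scale)	\
	((int)(((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
#define __TLBI_RANGE_PAGES(num, scale)	\
	((unsigned long)((num) + 1) << (5 * (scale) + 1))

static void simulate(unsigned long start, unsigned long pages, bool lpa2)
{
	int scale = 3;

	while (pages > 0) {
		/* non-range path: one page left, or still 64KB-unaligned under LPA2 */
		if (pages == 1 || (lpa2 && (start & (SZ_64K - 1)))) {
			printf("  non-range at %#lx\n", start);
			start += 1UL << PAGE_SHIFT;
			pages -= 1;
			continue;
		}

		int num = __TLBI_RANGE_NUM(pages, scale);

		if (num >= 0) {
			printf("  range at %#lx: scale=%d num=%d (%lu pages)\n",
			       start, scale, num, __TLBI_RANGE_PAGES(num, scale));
			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
			pages -= __TLBI_RANGE_PAGES(num, scale);
		}
		scale--;
	}
}

int main(void)
{
	/* 300 pages from a 64KB-aligned start, using the LPA2 rules */
	simulate(0x40010000UL, 300, true);
	return 0;
}

In this sketch the loop converges quickly (scale 3 and 2 produce no op for
small ranges because num is negative), but an unsigned 'pages' gives no
safety net if that accounting ever goes wrong.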

>  		if (!system_supports_tlb_range() ||			\
> -		    pages % 2 == 1) {					\
> +		    pages == 1 ||					\
> +		    (lpa2 && start != ALIGN(start, SZ_64K))) {		\
>  			addr = __TLBI_VADDR(start, asid);		\
>  			__tlbi_level(op, addr, tlb_level);		\
>  			if (tlbi_user)					\
> @@ -394,19 +408,19 @@ do {									\
>  		num = __TLBI_RANGE_NUM(pages, scale);			\
>  		if (num >= 0) {						\
>  			addr = __TLBI_VADDR_RANGE(start, asid, scale,	\
> -						  num, tlb_level);	\
> +						num, tlb_level, lpa2);	\
>  			__tlbi(r##op, addr);				\
>  			if (tlbi_user)					\
>  				__tlbi_user(r##op, addr);		\
>  			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
>  			pages -= __TLBI_RANGE_PAGES(num, scale);	\
>  		}							\
> -		scale++;						\
> +		scale--;						\
>  	}								\
>  } while (0)
>  
> -#define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level) \
> -	__flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false)
> +#define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level, lpa2) \
> +	__flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false, lpa2)
>  
>  static inline void __flush_tlb_range(struct vm_area_struct *vma,
>  				     unsigned long start, unsigned long end,
> @@ -436,9 +450,9 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
>  	asid = ASID(vma->vm_mm);
>  
>  	if (last_level)
> -		__flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true);
> +		__flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true, false);
>  	else
> -		__flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true);
> +		__flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true, false);
>  
>  	dsb(ish);
>  	mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
> diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> index 1b265713d6be..d42b72f78a9b 100644
> --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> @@ -198,7 +198,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>  	/* Switch to requested VMID */
>  	__tlb_switch_to_guest(mmu, &cxt, false);
>  
> -	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0);
> +	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
>  
>  	dsb(ish);
>  	__tlbi(vmalle1is);
> diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
> index 46bd43f61d76..6041c6c78984 100644
> --- a/arch/arm64/kvm/hyp/vhe/tlb.c
> +++ b/arch/arm64/kvm/hyp/vhe/tlb.c
> @@ -161,7 +161,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>  	/* Switch to requested VMID */
>  	__tlb_switch_to_guest(mmu, &cxt);
>  
> -	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0);
> +	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
>  
>  	dsb(ish);
>  	__tlbi(vmalle1is);

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 01/12] arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2
  2023-10-19  9:22       ` Ryan Roberts
@ 2023-10-20  8:05         ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-20  8:05 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Thu, 19 Oct 2023 10:22:37 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 19/10/2023 09:03, Marc Zyngier wrote:
> > On Mon, 09 Oct 2023 19:49:57 +0100,
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> FEAT_LPA2 impacts tlb invalidation in 2 ways; Firstly, the TTL field in
> >> the non-range tlbi instructions can now validly take a 0 value for the
> >> 4KB granule (this is due to the extra level of translation). Secondly,
> > 
> > nit: 0 was always valid. It just didn't indicate any level.
> 
> True. I'll change to "can now validly take a 0 value as a TTL hint".
> 
> > 
> >> the BADDR field in the range tlbi instructions must be aligned to 64KB
> >> when LPA2 is in use (TCR.DS=1). Changes are required for tlbi to
> >> continue to operate correctly when LPA2 is in use.
> >>
> >> KVM only uses the non-range (__tlbi_level()) routines. Therefore we only
> >> solve the first problem with this patch.
> > 
> > Is this still true? This patch changes __TLBI_VADDR_RANGE() and co.
> 
> It is no longer true that KVM only uses the non-range routines. v6.6 adds a
> series where KVM will now use the range-based routines too. So that text is out
> of date and I should have spotted it when doing the rebase - I'll fix. KVM now
> using range-based ops is the reason I added patch 2 to this series.
> 
> However, this patch doesn't really change __TLBI_VADDR_RANGE()'s behavior, it
> just makes it robust to the presence of TLBI_TTL_UNKNOWN, instead of 0 which was
> previously used as the "don't know" value.
> 
> > 
> >>
> >> It is solved by always adding the level hint if the level is between [0,
> >> 3] (previously anything other than 0 was hinted, which breaks in the new
> >> level -1 case from kvm). When running on non-LPA2 HW, 0 is still safe to
> >> hint as the HW will fall back to non-hinted. While we are at it, we
> >> replace the notion of 0 being the non-hinted sentinel with a macro,
> >> TLBI_TTL_UNKNOWN. This means callers won't need updating if/when
> >> translation depth increases in future.
> >>
> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> >> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> >> ---
> >>  arch/arm64/include/asm/tlb.h      |  9 ++++---
> >>  arch/arm64/include/asm/tlbflush.h | 43 +++++++++++++++++++------------
> >>  2 files changed, 31 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> >> index 2c29239d05c3..93c537635dbb 100644
> >> --- a/arch/arm64/include/asm/tlb.h
> >> +++ b/arch/arm64/include/asm/tlb.h
> >> @@ -22,15 +22,16 @@ static void tlb_flush(struct mmu_gather *tlb);
> >>  #include <asm-generic/tlb.h>
> >>  
> >>  /*
> >> - * get the tlbi levels in arm64.  Default value is 0 if more than one
> >> - * of cleared_* is set or neither is set.
> >> + * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
> >> + * one of cleared_* is set or neither is set - this elides the level hinting to
> >> + * the hardware.
> >>   * Arm64 doesn't support p4ds now.
> >>   */
> >>  static inline int tlb_get_level(struct mmu_gather *tlb)
> >>  {
> >>  	/* The TTL field is only valid for the leaf entry. */
> >>  	if (tlb->freed_tables)
> >> -		return 0;
> >> +		return TLBI_TTL_UNKNOWN;
> >>  
> >>  	if (tlb->cleared_ptes && !(tlb->cleared_pmds ||
> >>  				   tlb->cleared_puds ||
> >> @@ -47,7 +48,7 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
> >>  				   tlb->cleared_p4ds))
> >>  		return 1;
> >>  
> >> -	return 0;
> >> +	return TLBI_TTL_UNKNOWN;
> >>  }
> >>  
> >>  static inline void tlb_flush(struct mmu_gather *tlb)
> >> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> >> index b149cf9f91bc..e688246b3b13 100644
> >> --- a/arch/arm64/include/asm/tlbflush.h
> >> +++ b/arch/arm64/include/asm/tlbflush.h
> >> @@ -94,19 +94,22 @@ static inline unsigned long get_trans_granule(void)
> >>   * When ARMv8.4-TTL exists, TLBI operations take an additional hint for
> >>   * the level at which the invalidation must take place. If the level is
> >>   * wrong, no invalidation may take place. In the case where the level
> >> - * cannot be easily determined, a 0 value for the level parameter will
> >> - * perform a non-hinted invalidation.
> >> + * cannot be easily determined, the value TLBI_TTL_UNKNOWN will perform
> >> + * a non-hinted invalidation. Any provided level outside the hint range
> >> + * will also cause fall-back to non-hinted invalidation.
> >>   *
> >>   * For Stage-2 invalidation, use the level values provided to that effect
> >>   * in asm/stage2_pgtable.h.
> >>   */
> >>  #define TLBI_TTL_MASK		GENMASK_ULL(47, 44)
> >>  
> >> +#define TLBI_TTL_UNKNOWN	(-1)
> > 
> > I find this value somehow confusing, as it represents an actual level
> > number. It just happens to be one that cannot be provided as a TTL. So
> > having that as a return value from tlb_get_level() isn't great, and
> > I'd rather have something that cannot be mistaken for a valid level.
> 
> OK, how about INT_MAX?

Works for me.

> 
> > 
> >> +
> >>  #define __tlbi_level(op, addr, level) do {				\
> >>  	u64 arg = addr;							\
> >>  									\
> >>  	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&		\
> >> -	    level) {							\
> >> +	    level >= 0 && level <= 3) {					\
> >>  		u64 ttl = level & 3;					\
> >>  		ttl |= get_trans_granule() << 2;			\
> >>  		arg &= ~TLBI_TTL_MASK;					\
> >> @@ -134,16 +137,17 @@ static inline unsigned long get_trans_granule(void)
> >>   * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
> >>   *
> >>   */
> >> -#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
> >> -	({							\
> >> -		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
> >> -		__ta &= GENMASK_ULL(36, 0);			\
> >> -		__ta |= (unsigned long)(ttl) << 37;		\
> >> -		__ta |= (unsigned long)(num) << 39;		\
> >> -		__ta |= (unsigned long)(scale) << 44;		\
> >> -		__ta |= get_trans_granule() << 46;		\
> >> -		__ta |= (unsigned long)(asid) << 48;		\
> >> -		__ta;						\
> >> +#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)				\
> >> +	({									\
> >> +		unsigned long __ta = (addr) >> PAGE_SHIFT;			\
> >> +		unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0;		\
> >> +		__ta &= GENMASK_ULL(36, 0);					\
> >> +		__ta |= __ttl << 37;						\
> >> +		__ta |= (unsigned long)(num) << 39;				\
> >> +		__ta |= (unsigned long)(scale) << 44;				\
> >> +		__ta |= get_trans_granule() << 46;				\
> >> +		__ta |= (unsigned long)(asid) << 48;				\
> >> +		__ta;								\
> >>  	})
> >>  
> >>  /* These macros are used by the TLBI RANGE feature. */
> >> @@ -216,12 +220,16 @@ static inline unsigned long get_trans_granule(void)
> >>   *		CPUs, ensuring that any walk-cache entries associated with the
> >>   *		translation are also invalidated.
> >>   *
> >> - *	__flush_tlb_range(vma, start, end, stride, last_level)
> >> + *	__flush_tlb_range(vma, start, end, stride, last_level, tlb_level)
> >>   *		Invalidate the virtual-address range '[start, end)' on all
> >>   *		CPUs for the user address space corresponding to 'vma->mm'.
> >>   *		The invalidation operations are issued at a granularity
> >>   *		determined by 'stride' and only affect any walk-cache entries
> >> - *		if 'last_level' is equal to false.
> >> + *		if 'last_level' is equal to false. tlb_level is the level at
> >> + *		which the invalidation must take place. If the level is wrong,
> >> + *		no invalidation may take place. In the case where the level
> >> + *		cannot be easily determined, the value TLBI_TTL_UNKNOWN will
> >> + *		perform a non-hinted invalidation.
> >>   *
> >>   *
> >>   *	Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
> >> @@ -442,9 +450,10 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
> >>  	/*
> >>  	 * We cannot use leaf-only invalidation here, since we may be invalidating
> >>  	 * table entries as part of collapsing hugepages or moving page tables.
> >> -	 * Set the tlb_level to 0 because we can not get enough information here.
> >> +	 * Set the tlb_level to TLBI_TTL_UNKNOWN because we can not get enough
> >> +	 * information here.
> >>  	 */
> >> -	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
> >> +	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
> >>  }
> >>  
> >>  static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > 
> > It feels like this range stuff would be better located in the second
> > patch. Not a huge deal though.
> 
> As I said, this is the minimal change to the range-based side of things to
> robustly deal with the introduction of TLBI_TTL_UNKNOWN.
> 
> But I wonder if I'm actually better off squashing both of the 2 patches into one.
> The only reason I split it previously was because KVM was only using the
> level-based ops.

Maybe. There is something to be said about making the range rework
(decreasing scale) an independent patch, as it is a significant change
on its own. But maybe the rest of the plumbing can be grouped
together.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 04/12] KVM: arm64: Add ARM64_HAS_LPA2 CPU capability
  2023-10-09 18:50   ` Ryan Roberts
@ 2023-10-20  8:16     ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-20  8:16 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Mon, 09 Oct 2023 19:50:00 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> Expose FEAT_LPA2 as a capability so that we can take advantage of
> alternatives patching in both the kernel and hypervisor.
> 
> Although FEAT_LPA2 presence is advertised separately for stage1 and
> stage2, the expectation is that in practice both stages will either
> support or not support it. Therefore, for the case where KVM is present,
> we combine both into a single capability, allowing us to simplify the
> implementation. For the case where KVM is not present, we only care
> about stage1.
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/include/asm/cpufeature.h |  5 ++++
>  arch/arm64/kernel/cpufeature.c      | 46 +++++++++++++++++++++++++++++
>  arch/arm64/tools/cpucaps            |  1 +
>  3 files changed, 52 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 5bba39376055..b1292ec88538 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -831,6 +831,11 @@ static inline bool system_supports_tlb_range(void)
>  		cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
>  }
>  
> +static inline bool system_supports_lpa2(void)
> +{
> +	return cpus_have_const_cap(ARM64_HAS_LPA2);

cpus_have_const_cap() is going away. You may want to look at Mark's
series to see how to replace this one.
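
A sketch of what that replacement might look like, assuming
cpus_have_final_cap() is the accessor that survives Mark's rework (the
capability is a system-wide feature, so it is finalised well before any KVM
user of this helper runs):

static inline bool system_supports_lpa2(void)
{
	/* Assumes the const-cap API is gone in favour of finalised caps. */
	return cpus_have_final_cap(ARM64_HAS_LPA2);
}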

> +}
> +
>  int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
>  bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
>  
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 444a73c2e638..1ccb1fe0e310 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1746,6 +1746,46 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
>  	return !meltdown_safe;
>  }
>  
> +static inline bool has_lpa2_at_stage1(u64 mmfr0)

Why inline? It isn't like this has any performance implication...

> +{
> +#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
> +	unsigned int tgran;
> +
> +	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
> +						ID_AA64MMFR0_EL1_TGRAN_SHIFT);
> +	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
> +#else
> +	return false;
> +#endif

Writing this using IS_ENABLED() would be slightly more pleasing to my
tired eyes... ;-)
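
The IS_ENABLED() form being asked for would look roughly like this (same
logic, with the dead branch discarded by the compiler rather than the
preprocessor, and the inline dropped per the earlier comment):

static bool has_lpa2_at_stage1(u64 mmfr0)
{
	unsigned int tgran;

	/* LPA2 is only defined for the 4KB and 16KB granules. */
	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
	    !IS_ENABLED(CONFIG_ARM64_16K_PAGES))
		return false;

	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
					ID_AA64MMFR0_EL1_TGRAN_SHIFT);
	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
}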

> +}
> +
> +static inline bool has_lpa2_at_stage2(u64 mmfr0)
> +{
> +#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
> +	unsigned int tgran;
> +
> +	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
> +						ID_AA64MMFR0_EL1_TGRAN_2_SHIFT);
> +	return tgran == ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2;
> +#else
> +	return false;
> +#endif
> +}
> +
> +static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
> +{
> +	u64 mmfr0;
> +	bool ret;
> +
> +	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
> +	ret = has_lpa2_at_stage1(mmfr0);
> +
> +	if (kvm_get_mode() != KVM_MODE_NONE)
> +		ret = ret && has_lpa2_at_stage2(mmfr0);

Isn't it too late to go back on the decision to use LPA2 at S1 if you
realise that S2 doesn't support it?

> +
> +	return ret;
> +}
> +
>  #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
>  #define KPTI_NG_TEMP_VA		(-(1UL << PMD_SHIFT))
>  
> @@ -2719,6 +2759,12 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>  		.matches = has_cpuid_feature,
>  		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
>  	},
> +	{
> +		.desc = "Large Physical Address 2",
> +		.capability = ARM64_HAS_LPA2,
> +		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
> +		.matches = has_lpa2,
> +	},
>  	{},
>  };
>  
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index dea3dc89234b..07f3957b8488 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -36,6 +36,7 @@ HAS_GIC_PRIO_MASKING
>  HAS_GIC_PRIO_RELAXED_SYNC
>  HAS_HCX
>  HAS_LDAPR
> +HAS_LPA2
>  HAS_LSE_ATOMICS
>  HAS_MOPS
>  HAS_NESTED_VIRT

Why isn't this patch the first or second in the series? You could use
it to drive the LPA2 decision in the patch #2, avoiding the ugly lpa2
flag...

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 06/12] KVM: arm64: Use LPA2 page-tables for stage2 and hyp stage1
  2023-10-09 18:50   ` Ryan Roberts
@ 2023-10-20  9:16     ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-20  9:16 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Mon, 09 Oct 2023 19:50:02 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> Implement a simple policy whereby if the HW supports FEAT_LPA2 for the
> page size we are using, always use LPA2-style page-tables for stage 2
> and hyp stage 1, regardless of the VMM-requested IPA size or
> HW-implemented PA size. When in use we can now support up to 52-bit IPA
> and PA sizes.

Maybe worth stating that this S1 comment only applies to the
standalone EL2 portion, and not the VHE S1 mappings.

> 
> We use the previously created cpu feature to track whether LPA2 is
> supported for deciding whether to use the LPA2 or classic pte format.
> 
> Note that FEAT_LPA2 brings support for bigger block mappings (512GB with
> 4KB, 64GB with 16KB). We explicitly don't enable these in the library
> because stage2_apply_range() works on batch sizes of the largest used
> block mapping, and increasing the size of the batch would lead to soft
> lockups. See commit 5994bc9e05c2 ("KVM: arm64: Limit
> stage2_apply_range() batch size to largest block").
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 47 +++++++++++++++++++++-------
>  arch/arm64/kvm/arm.c                 |  2 ++
>  arch/arm64/kvm/hyp/nvhe/tlb.c        |  3 +-
>  arch/arm64/kvm/hyp/pgtable.c         | 15 +++++++--
>  arch/arm64/kvm/hyp/vhe/tlb.c         |  3 +-
>  5 files changed, 54 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index d3e354bb8351..b240158e1218 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -25,12 +25,22 @@
>  #define KVM_PGTABLE_MIN_BLOCK_LEVEL	2U
>  #endif
>  
> +static inline u64 kvm_get_parange_max(void)
> +{
> +	if (system_supports_lpa2() ||
> +	   (IS_ENABLED(CONFIG_ARM64_PA_BITS_52) && PAGE_SIZE == SZ_64K))

nit: the rest of the code uses PAGE_SHIFT instead of PAGE_SIZE. Not a
big deal, but being consistent might help the reader.
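
Spelled out, the nit amounts to no more than the following (the 64KB granule
is PAGE_SHIFT == 16), kept here only as a sketch:

static inline u64 kvm_get_parange_max(void)
{
	if (system_supports_lpa2() ||
	    (IS_ENABLED(CONFIG_ARM64_PA_BITS_52) && PAGE_SHIFT == 16))
		return ID_AA64MMFR0_EL1_PARANGE_52;
	else
		return ID_AA64MMFR0_EL1_PARANGE_48;
}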

> +		return ID_AA64MMFR0_EL1_PARANGE_52;
> +	else
> +		return ID_AA64MMFR0_EL1_PARANGE_48;
> +}
> +
>  static inline u64 kvm_get_parange(u64 mmfr0)
>  {
> +	u64 parange_max = kvm_get_parange_max();
>  	u64 parange = cpuid_feature_extract_unsigned_field(mmfr0,
>  				ID_AA64MMFR0_EL1_PARANGE_SHIFT);
> -	if (parange > ID_AA64MMFR0_EL1_PARANGE_MAX)
> -		parange = ID_AA64MMFR0_EL1_PARANGE_MAX;
> +	if (parange > parange_max)
> +		parange = parange_max;
>  
>  	return parange;
>  }
> @@ -41,6 +51,8 @@ typedef u64 kvm_pte_t;
>  
>  #define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
>  #define KVM_PTE_ADDR_51_48		GENMASK(15, 12)
> +#define KVM_PTE_ADDR_MASK_LPA2		GENMASK(49, PAGE_SHIFT)
> +#define KVM_PTE_ADDR_51_50_LPA2		GENMASK(9, 8)
>  
>  #define KVM_PHYS_INVALID		(-1ULL)
>  
> @@ -51,21 +63,34 @@ static inline bool kvm_pte_valid(kvm_pte_t pte)
>  
>  static inline u64 kvm_pte_to_phys(kvm_pte_t pte)
>  {
> -	u64 pa = pte & KVM_PTE_ADDR_MASK;
> -
> -	if (PAGE_SHIFT == 16)
> -		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
> +	u64 pa;
> +
> +	if (system_supports_lpa2()) {
> +		pa = pte & KVM_PTE_ADDR_MASK_LPA2;
> +		pa |= FIELD_GET(KVM_PTE_ADDR_51_50_LPA2, pte) << 50;
> +	} else {
> +		pa = pte & KVM_PTE_ADDR_MASK;
> +		if (PAGE_SHIFT == 16)
> +			pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
> +	}
>  
>  	return pa;
>  }
>  
>  static inline kvm_pte_t kvm_phys_to_pte(u64 pa)
>  {
> -	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
> -
> -	if (PAGE_SHIFT == 16) {
> -		pa &= GENMASK(51, 48);
> -		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
> +	kvm_pte_t pte;
> +
> +	if (system_supports_lpa2()) {
> +		pte = pa & KVM_PTE_ADDR_MASK_LPA2;
> +		pa &= GENMASK(51, 50);
> +		pte |= FIELD_PREP(KVM_PTE_ADDR_51_50_LPA2, pa >> 50);
> +	} else {
> +		pte = pa & KVM_PTE_ADDR_MASK;
> +		if (PAGE_SHIFT == 16) {
> +			pa &= GENMASK(51, 48);
> +			pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
> +		}
>  	}
>  
>  	return pte;
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 4866b3f7b4ea..73cc67c2a8a7 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1747,6 +1747,8 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
>  	}
>  	tcr &= ~TCR_T0SZ_MASK;
>  	tcr |= TCR_T0SZ(hyp_va_bits);
> +	if (system_supports_lpa2())
> +		tcr |= TCR_EL2_DS;
>  	params->tcr_el2 = tcr;
>  
>  	params->pgd_pa = kvm_mmu_get_httbr();
> diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> index d42b72f78a9b..c3cd16c6f95f 100644
> --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> @@ -198,7 +198,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>  	/* Switch to requested VMID */
>  	__tlb_switch_to_guest(mmu, &cxt, false);
>  
> -	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
> +	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0,
> +				system_supports_lpa2());

At this stage, I'd fully expect the flag to have been subsumed into
the helper...
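
One way the flag could be folded in, sketched against the call site quoted
above (the wrapper name below is hypothetical, not an existing macro):

/* Hypothetical KVM-side wrapper so callers stop passing the lpa2 flag. */
#define __kvm_flush_s2_tlb_range_op(op, start, pages, stride, tlb_level) \
	__flush_s2_tlb_range_op(op, start, pages, stride, tlb_level,	  \
				system_supports_lpa2())

The call in __kvm_tlb_flush_vmid_range() would then shrink back to a single
line, with the LPA2 decision made in exactly one place.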

>  
>  	dsb(ish);
>  	__tlbi(vmalle1is);
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index f155b8c9e98c..062eb7bcdb8a 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -79,7 +79,10 @@ static bool kvm_pgtable_walk_skip_cmo(const struct kvm_pgtable_visit_ctx *ctx)
>  
>  static bool kvm_phys_is_valid(u64 phys)
>  {
> -	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
> +	u64 parange_max = kvm_get_parange_max();
> +	u8 shift = id_aa64mmfr0_parange_to_phys_shift(parange_max);
> +
> +	return phys < BIT(shift);
>  }
>  
>  static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx, u64 phys)
> @@ -408,7 +411,8 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
>  	}
>  
>  	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
> -	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
> +	if (!system_supports_lpa2())
> +		attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
>  	attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
>  	attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
>  	*ptep = attr;
> @@ -654,6 +658,9 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
>  		vtcr |= VTCR_EL2_HA;
>  #endif /* CONFIG_ARM64_HW_AFDBM */
>  
> +	if (system_supports_lpa2())
> +		vtcr |= VTCR_EL2_DS;
> +
>  	/* Set the vmid bits */
>  	vtcr |= (get_vmid_bits(mmfr1) == 16) ?
>  		VTCR_EL2_VS_16BIT :
> @@ -711,7 +718,9 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
>  	if (prot & KVM_PGTABLE_PROT_W)
>  		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
>  
> -	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
> +	if (!system_supports_lpa2())
> +		attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
> +
>  	attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
>  	attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
>  	*ptep = attr;
> diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
> index 6041c6c78984..40cea2482a76 100644
> --- a/arch/arm64/kvm/hyp/vhe/tlb.c
> +++ b/arch/arm64/kvm/hyp/vhe/tlb.c
> @@ -161,7 +161,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>  	/* Switch to requested VMID */
>  	__tlb_switch_to_guest(mmu, &cxt);
>  
> -	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
> +	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0,
> +				system_supports_lpa2());
>  
>  	dsb(ish);
>  	__tlbi(vmalle1is);

One thing I don't see here is how you update the tcr_compute_pa_size
macro that is used on the initial nVHE setup, which is inconsistent
with the kvm_get_parange_max() helper.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 07/12] KVM: arm64: Prepare TCR_EL2.PS in cpu_prepare_hyp_mode()
  2023-10-09 18:50   ` Ryan Roberts
@ 2023-10-20  9:21     ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-20  9:21 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Mon, 09 Oct 2023 19:50:03 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> With the addition of LPA2 support in the hypervisor, the PA size
> supported by the HW must be capped with a runtime decision, rather than
> simply using a compile-time decision based on PA_BITS. For example, on a
> system that advertises 52 bit PA but does not support FEAT_LPA2, A 4KB
> or 16KB kernel compiled with LPA2 support must still limit the PA size
> to 48 bits.
> 
> Therefore, move the insertion of the PS field into TCR_EL2 out of
> __kvm_hyp_init assembly code and instead do it in cpu_prepare_hyp_mode()
> where the rest of TCR_EL2 is prepared. This allows us to figure out PS
> with kvm_get_parange(), which has the appropriate logic to ensure the
> above requirement. (and the PS field of VTCR_EL2 is already populated
> this way).
> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/kvm/arm.c               | 3 +++
>  arch/arm64/kvm/hyp/nvhe/hyp-init.S | 4 ----
>  2 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 73cc67c2a8a7..0bb8918475d2 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1726,6 +1726,7 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
>  {
>  	struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
>  	unsigned long tcr;
> +	u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);

nit: move this one up by a line (yes, I'm being difficult).

>  
>  	/*
>  	 * Calculate the raw per-cpu offset without a translation from the
> @@ -1747,6 +1748,8 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
>  	}
>  	tcr &= ~TCR_T0SZ_MASK;
>  	tcr |= TCR_T0SZ(hyp_va_bits);
> +	tcr &= ~TCR_EL2_PS_MASK;
> +	tcr |= FIELD_PREP(TCR_EL2_PS_MASK, kvm_get_parange(mmfr0));
>  	if (system_supports_lpa2())
>  		tcr |= TCR_EL2_DS;
>  	params->tcr_el2 = tcr;
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> index 1cc06e6797bd..f62a7d360285 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> @@ -122,11 +122,7 @@ alternative_if ARM64_HAS_CNP
>  alternative_else_nop_endif
>  	msr	ttbr0_el2, x2
>  
> -	/*
> -	 * Set the PS bits in TCR_EL2.
> -	 */
>  	ldr	x0, [x0, #NVHE_INIT_TCR_EL2]
> -	tcr_compute_pa_size x0, #TCR_EL2_PS_SHIFT, x1, x2
>  	msr	tcr_el2, x0
>  
>  	isb

Ah, this is where this was hiding. This should be folded into the
previous patch for consistency (this is otherwise non-bisectable).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 08/12] KVM: arm64: Convert translation level parameter to s8
  2023-10-09 18:50   ` Ryan Roberts
@ 2023-10-20 10:42     ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-20 10:42 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Mon, 09 Oct 2023 19:50:04 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> With the introduction of FEAT_LPA2, the Arm ARM adds a new level of
> translation, level -1, so levels can now be in the range [-1;3]. 3 is
> always the last level and the first level is determined based on the
> number of VA bits in use.
> 
> Convert level variables to use a signed type in preparation for
> supporting this new level -1.
> 
> Since the last level is always anchored at 3, and the first level varies
> to suit the number of VA/IPA bits, take the opportunity to replace
> KVM_PGTABLE_MAX_LEVELS with the 2 macros KVM_PGTABLE_FIRST_LEVEL and
> KVM_PGTABLE_LAST_LEVEL. This removes the assumption from the code that
> levels run from 0 to KVM_PGTABLE_MAX_LEVELS - 1, which will soon no
> longer be true.
> 
> No behavioral changes intended.

Shrug. Unless you have compared the binaries before and after and
proven that they are strictly identical, there will be behaviour
changes, intended or otherwise.

I know what you're trying to convey, but I've seen so many patches
carrying a sentence of this sort and yet turning the kernel on its
head that I've become allergic to it. Sorry.

> 
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/include/asm/kvm_emulate.h  |  2 +-
>  arch/arm64/include/asm/kvm_pgtable.h  | 31 ++++++-------
>  arch/arm64/include/asm/kvm_pkvm.h     |  5 ++-
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |  6 +--
>  arch/arm64/kvm/hyp/nvhe/mm.c          |  4 +-
>  arch/arm64/kvm/hyp/nvhe/setup.c       |  2 +-
>  arch/arm64/kvm/hyp/pgtable.c          | 64 ++++++++++++++-------------
>  arch/arm64/kvm/mmu.c                  | 16 ++++---
>  8 files changed, 69 insertions(+), 61 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 3d6725ff0bf6..bf3ef66eb51f 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -404,7 +404,7 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
>  	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
>  }
>  
> -static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
> +static __always_inline s8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
>  {
>  	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
>  }
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index b240158e1218..c61bb9709201 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -11,7 +11,8 @@
>  #include <linux/kvm_host.h>
>  #include <linux/types.h>
>  
> -#define KVM_PGTABLE_MAX_LEVELS		4U
> +#define KVM_PGTABLE_FIRST_LEVEL		0
> +#define KVM_PGTABLE_LAST_LEVEL		3
>  
>  /*
>   * The largest supported block sizes for KVM (no 52-bit PA support):
> @@ -20,9 +21,9 @@
>   *  - 64K (level 2):	512MB
>   */
>  #ifdef CONFIG_ARM64_4K_PAGES
> -#define KVM_PGTABLE_MIN_BLOCK_LEVEL	1U
> +#define KVM_PGTABLE_MIN_BLOCK_LEVEL	1
>  #else
> -#define KVM_PGTABLE_MIN_BLOCK_LEVEL	2U
> +#define KVM_PGTABLE_MIN_BLOCK_LEVEL	2
>  #endif
>  
>  static inline u64 kvm_get_parange_max(void)
> @@ -101,28 +102,28 @@ static inline kvm_pfn_t kvm_pte_to_pfn(kvm_pte_t pte)
>  	return __phys_to_pfn(kvm_pte_to_phys(pte));
>  }
>  
> -static inline u64 kvm_granule_shift(u32 level)
> +static inline u64 kvm_granule_shift(s8 level)
>  {
> -	/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
> +	/* Assumes KVM_PGTABLE_LAST_LEVEL is 3 */
>  	return ARM64_HW_PGTABLE_LEVEL_SHIFT(level);

I'm amazed that the macro tolerates a negative level, but it really
does.
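
Worked through, assuming the macro is still defined in pgtable-hwdef.h as
((PAGE_SHIFT - 3) * (4 - (n)) + 3), a negative level does fall out
naturally; for 4KB pages (PAGE_SHIFT == 12):

/*
 * ARM64_HW_PGTABLE_LEVEL_SHIFT(n) = (PAGE_SHIFT - 3) * (4 - (n)) + 3
 *   level  3: 9 * 1 + 3 = 12
 *   level  0: 9 * 4 + 3 = 39
 *   level -1: 9 * 5 + 3 = 48   (the new LPA2 level, still well-formed)
 */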

>  }
>  
> -static inline u64 kvm_granule_size(u32 level)
> +static inline u64 kvm_granule_size(s8 level)
>  {
>  	return BIT(kvm_granule_shift(level));
>  }
>  
> -static inline bool kvm_level_supports_block_mapping(u32 level)
> +static inline bool kvm_level_supports_block_mapping(s8 level)
>  {
>  	return level >= KVM_PGTABLE_MIN_BLOCK_LEVEL;
>  }
>  
>  static inline u32 kvm_supported_block_sizes(void)
>  {
> -	u32 level = KVM_PGTABLE_MIN_BLOCK_LEVEL;
> +	s8 level = KVM_PGTABLE_MIN_BLOCK_LEVEL;
>  	u32 r = 0;
>  
> -	for (; level < KVM_PGTABLE_MAX_LEVELS; level++)
> +	for (; level <= KVM_PGTABLE_LAST_LEVEL; level++)
>  		r |= BIT(kvm_granule_shift(level));
>  
>  	return r;
> @@ -167,7 +168,7 @@ struct kvm_pgtable_mm_ops {
>  	void*		(*zalloc_page)(void *arg);
>  	void*		(*zalloc_pages_exact)(size_t size);
>  	void		(*free_pages_exact)(void *addr, size_t size);
> -	void		(*free_unlinked_table)(void *addr, u32 level);
> +	void		(*free_unlinked_table)(void *addr, s8 level);
>  	void		(*get_page)(void *addr);
>  	void		(*put_page)(void *addr);
>  	int		(*page_count)(void *addr);
> @@ -263,7 +264,7 @@ struct kvm_pgtable_visit_ctx {
>  	u64					start;
>  	u64					addr;
>  	u64					end;
> -	u32					level;
> +	s8					level;
>  	enum kvm_pgtable_walk_flags		flags;
>  };
>  
> @@ -366,7 +367,7 @@ static inline bool kvm_pgtable_walk_lock_held(void)
>   */
>  struct kvm_pgtable {
>  	u32					ia_bits;
> -	u32					start_level;
> +	s8					start_level;
>  	kvm_pteref_t				pgd;
>  	struct kvm_pgtable_mm_ops		*mm_ops;
>  
> @@ -500,7 +501,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
>   * The page-table is assumed to be unreachable by any hardware walkers prior to
>   * freeing and therefore no TLB invalidation is performed.
>   */
> -void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level);
> +void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level);
>  
>  /**
>   * kvm_pgtable_stage2_create_unlinked() - Create an unlinked stage-2 paging structure.
> @@ -524,7 +525,7 @@ void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *p
>   * an ERR_PTR(error) on failure.
>   */
>  kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
> -					      u64 phys, u32 level,
> +					      u64 phys, s8 level,
>  					      enum kvm_pgtable_prot prot,
>  					      void *mc, bool force_pte);
>  
> @@ -750,7 +751,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>   * Return: 0 on success, negative error code on failure.
>   */
>  int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
> -			 kvm_pte_t *ptep, u32 *level);
> +			 kvm_pte_t *ptep, s8 *level);
>  
>  /**
>   * kvm_pgtable_stage2_pte_prot() - Retrieve the protection attributes of a
> diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
> index e46250a02017..ad9cfb5c1ff4 100644
> --- a/arch/arm64/include/asm/kvm_pkvm.h
> +++ b/arch/arm64/include/asm/kvm_pkvm.h
> @@ -56,10 +56,11 @@ static inline unsigned long hyp_vm_table_pages(void)
>  
>  static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
>  {
> -	unsigned long total = 0, i;
> +	unsigned long total = 0;
> +	int i;
>  
>  	/* Provision the worst case scenario */
> -	for (i = 0; i < KVM_PGTABLE_MAX_LEVELS; i++) {
> +	for (i = KVM_PGTABLE_FIRST_LEVEL; i <= KVM_PGTABLE_LAST_LEVEL; i++) {
>  		nr_pages = DIV_ROUND_UP(nr_pages, PTRS_PER_PTE);
>  		total += nr_pages;
>  	}
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 9d703441278b..2cfb6352a8ea 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -91,7 +91,7 @@ static void host_s2_put_page(void *addr)
>  	hyp_put_page(&host_s2_pool, addr);
>  }
>  
> -static void host_s2_free_unlinked_table(void *addr, u32 level)
> +static void host_s2_free_unlinked_table(void *addr, s8 level)
>  {
>  	kvm_pgtable_stage2_free_unlinked(&host_mmu.mm_ops, addr, level);
>  }
> @@ -443,7 +443,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
>  {
>  	struct kvm_mem_range cur;
>  	kvm_pte_t pte;
> -	u32 level;
> +	s8 level;
>  	int ret;
>  
>  	hyp_assert_lock_held(&host_mmu.lock);
> @@ -462,7 +462,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
>  		cur.start = ALIGN_DOWN(addr, granule);
>  		cur.end = cur.start + granule;
>  		level++;
> -	} while ((level < KVM_PGTABLE_MAX_LEVELS) &&
> +	} while ((level <= KVM_PGTABLE_LAST_LEVEL) &&
>  			!(kvm_level_supports_block_mapping(level) &&
>  			  range_included(&cur, range)));
>  
> diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> index 65a7a186d7b2..b01a3d1078a8 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> @@ -260,7 +260,7 @@ static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
>  	 * https://lore.kernel.org/kvm/20221017115209.2099-1-will@kernel.org/T/#mf10dfbaf1eaef9274c581b81c53758918c1d0f03
>  	 */
>  	dsb(ishst);
> -	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), (KVM_PGTABLE_MAX_LEVELS - 1));
> +	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), KVM_PGTABLE_LAST_LEVEL);
>  	dsb(ish);
>  	isb();
>  }
> @@ -275,7 +275,7 @@ static int __create_fixmap_slot_cb(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>  	struct hyp_fixmap_slot *slot = per_cpu_ptr(&fixmap_slots, (u64)ctx->arg);
>  
> -	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_MAX_LEVELS - 1)
> +	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_LAST_LEVEL)
>  		return -EINVAL;
>  
>  	slot->addr = ctx->addr;
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 0d5e0a89ddce..bc58d1b515af 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -181,7 +181,7 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  	if (!kvm_pte_valid(ctx->old))
>  		return 0;
>  
> -	if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
> +	if (ctx->level != KVM_PGTABLE_LAST_LEVEL)
>  		return -EINVAL;
>  
>  	phys = kvm_pte_to_phys(ctx->old);
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 062eb7bcdb8a..8e79ff6972ce 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -101,7 +101,7 @@ static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx,
>  	return IS_ALIGNED(ctx->addr, granule);
>  }
>  
> -static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
> +static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, s8 level)
>  {
>  	u64 shift = kvm_granule_shift(level);
>  	u64 mask = BIT(PAGE_SHIFT - 3) - 1;
> @@ -117,7 +117,7 @@ static u32 kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
>  	return (addr & mask) >> shift;
>  }
>  
> -static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
> +static u32 kvm_pgd_pages(u32 ia_bits, s8 start_level)
>  {
>  	struct kvm_pgtable pgt = {
>  		.ia_bits	= ia_bits,
> @@ -127,9 +127,9 @@ static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
>  	return kvm_pgd_page_idx(&pgt, -1ULL) + 1;
>  }
>  
> -static bool kvm_pte_table(kvm_pte_t pte, u32 level)
> +static bool kvm_pte_table(kvm_pte_t pte, s8 level)
>  {
> -	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
> +	if (level == KVM_PGTABLE_LAST_LEVEL)
>  		return false;
>  
>  	if (!kvm_pte_valid(pte))
> @@ -157,11 +157,11 @@ static kvm_pte_t kvm_init_table_pte(kvm_pte_t *childp, struct kvm_pgtable_mm_ops
>  	return pte;
>  }
>  
> -static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
> +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, s8 level)
>  {
>  	kvm_pte_t pte = kvm_phys_to_pte(pa);
> -	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
> -							   KVM_PTE_TYPE_BLOCK;
> +	u64 type = (level == KVM_PGTABLE_LAST_LEVEL) ? KVM_PTE_TYPE_PAGE :
> +						       KVM_PTE_TYPE_BLOCK;
>  
>  	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
>  	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
> @@ -206,11 +206,11 @@ static bool kvm_pgtable_walk_continue(const struct kvm_pgtable_walker *walker,
>  }
>  
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level);
> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, s8 level);
>  
>  static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>  				      struct kvm_pgtable_mm_ops *mm_ops,
> -				      kvm_pteref_t pteref, u32 level)
> +				      kvm_pteref_t pteref, s8 level)
>  {
>  	enum kvm_pgtable_walk_flags flags = data->walker->flags;
>  	kvm_pte_t *ptep = kvm_dereference_pteref(data->walker, pteref);
> @@ -275,12 +275,12 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>  }
>  
>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> -			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level)
> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, s8 level)
>  {
>  	u32 idx;
>  	int ret = 0;
>  
> -	if (WARN_ON_ONCE(level >= KVM_PGTABLE_MAX_LEVELS))
> +	if (WARN_ON_ONCE(level > KVM_PGTABLE_LAST_LEVEL))
>  		return -EINVAL;

Now that level can be negative, you may want to check it against
KVM_PGTABLE_FIRST_LEVEL as well.
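
Something along these lines (untested, just to show the shape of the
combined check I have in mind):

	if (WARN_ON_ONCE(level > KVM_PGTABLE_LAST_LEVEL ||
			 level < KVM_PGTABLE_FIRST_LEVEL))
		return -EINVAL;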

>  
>  	for (idx = kvm_pgtable_idx(data, level); idx < PTRS_PER_PTE; ++idx) {
> @@ -343,7 +343,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>  
>  struct leaf_walk_data {
>  	kvm_pte_t	pte;
> -	u32		level;
> +	s8		level;
>  };
>  
>  static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
> @@ -358,7 +358,7 @@ static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  }
>  
>  int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
> -			 kvm_pte_t *ptep, u32 *level)
> +			 kvm_pte_t *ptep, s8 *level)
>  {
>  	struct leaf_walk_data data;
>  	struct kvm_pgtable_walker walker = {
> @@ -471,7 +471,7 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  	if (hyp_map_walker_try_leaf(ctx, data))
>  		return 0;
>  
> -	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
> +	if (WARN_ON(ctx->level == KVM_PGTABLE_LAST_LEVEL))
>  		return -EINVAL;

Same thing.

>  
>  	childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
> @@ -567,14 +567,18 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>  			 struct kvm_pgtable_mm_ops *mm_ops)
>  {
> -	u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
> +	s8 start_level = KVM_PGTABLE_LAST_LEVEL + 1 -
> +			 ARM64_HW_PGTABLE_LEVELS(va_bits);
> +	if (start_level < KVM_PGTABLE_FIRST_LEVEL ||
> +	    start_level > KVM_PGTABLE_LAST_LEVEL)
> +		return -EINVAL;

Please add a new line between the variable definition and the if ()
statement.
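
i.e. something like this (same code as in the patch, just with a blank
line after the declaration):

	s8 start_level = KVM_PGTABLE_LAST_LEVEL + 1 -
			 ARM64_HW_PGTABLE_LEVELS(va_bits);

	if (start_level < KVM_PGTABLE_FIRST_LEVEL ||
	    start_level > KVM_PGTABLE_LAST_LEVEL)
		return -EINVAL;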

>  
>  	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_page(NULL);
>  	if (!pgt->pgd)
>  		return -ENOMEM;
>  
>  	pgt->ia_bits		= va_bits;
> -	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
> +	pgt->start_level	= start_level;
>  	pgt->mm_ops		= mm_ops;
>  	pgt->mmu		= NULL;
>  	pgt->force_pte_cb	= NULL;
> @@ -628,7 +632,7 @@ struct stage2_map_data {
>  u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
>  {
>  	u64 vtcr = VTCR_EL2_FLAGS;
> -	u8 lvls;
> +	s8 lvls;
>  
>  	vtcr |= kvm_get_parange(mmfr0) << VTCR_EL2_PS_SHIFT;
>  	vtcr |= VTCR_EL2_T0SZ(phys_shift);
> @@ -911,7 +915,7 @@ static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
>  {
>  	u64 phys = stage2_map_walker_phys_addr(ctx, data);
>  
> -	if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
> +	if (data->force_pte && ctx->level < KVM_PGTABLE_LAST_LEVEL)
>  		return false;
>  
>  	return kvm_block_mapping_supported(ctx, phys);
> @@ -990,7 +994,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>  	if (ret != -E2BIG)
>  		return ret;
>  
> -	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
> +	if (WARN_ON(ctx->level == KVM_PGTABLE_LAST_LEVEL))
>  		return -EINVAL;
>  
>  	if (!data->memcache)
> @@ -1160,7 +1164,7 @@ struct stage2_attr_data {
>  	kvm_pte_t			attr_set;
>  	kvm_pte_t			attr_clr;
>  	kvm_pte_t			pte;
> -	u32				level;
> +	s8				level;
>  };
>  
>  static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
> @@ -1203,7 +1207,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>  				    u64 size, kvm_pte_t attr_set,
>  				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte,
> -				    u32 *level, enum kvm_pgtable_walk_flags flags)
> +				    s8 *level, enum kvm_pgtable_walk_flags flags)
>  {
>  	int ret;
>  	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
> @@ -1305,7 +1309,7 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
>  				   enum kvm_pgtable_prot prot)
>  {
>  	int ret;
> -	u32 level;
> +	s8 level;
>  	kvm_pte_t set = 0, clr = 0;
>  
>  	if (prot & KVM_PTE_LEAF_ATTR_HI_SW)
> @@ -1358,7 +1362,7 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
>  }
>  
>  kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
> -					      u64 phys, u32 level,
> +					      u64 phys, s8 level,
>  					      enum kvm_pgtable_prot prot,
>  					      void *mc, bool force_pte)
>  {
> @@ -1416,7 +1420,7 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
>   * fully populated tree up to the PTE entries. Note that @level is
>   * interpreted as in "level @level entry".
>   */
> -static int stage2_block_get_nr_page_tables(u32 level)
> +static int stage2_block_get_nr_page_tables(s8 level)
>  {
>  	switch (level) {
>  	case 1:
> @@ -1427,7 +1431,7 @@ static int stage2_block_get_nr_page_tables(u32 level)
>  		return 0;
>  	default:
>  		WARN_ON_ONCE(level < KVM_PGTABLE_MIN_BLOCK_LEVEL ||
> -			     level >= KVM_PGTABLE_MAX_LEVELS);
> +			     level > KVM_PGTABLE_LAST_LEVEL);
>  		return -EINVAL;
>  	};
>  }
> @@ -1440,13 +1444,13 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx,
>  	struct kvm_s2_mmu *mmu;
>  	kvm_pte_t pte = ctx->old, new, *childp;
>  	enum kvm_pgtable_prot prot;
> -	u32 level = ctx->level;
> +	s8 level = ctx->level;
>  	bool force_pte;
>  	int nr_pages;
>  	u64 phys;
>  
>  	/* No huge-pages exist at the last level */
> -	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
> +	if (level == KVM_PGTABLE_LAST_LEVEL)
>  		return 0;
>  
>  	/* We only split valid block mappings */
> @@ -1523,7 +1527,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>  	u64 vtcr = mmu->arch->vtcr;
>  	u32 ia_bits = VTCR_EL2_IPA(vtcr);
>  	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
> -	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
> +	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
>  
>  	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
>  	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_pages_exact(pgd_sz);
> @@ -1546,7 +1550,7 @@ size_t kvm_pgtable_stage2_pgd_size(u64 vtcr)
>  {
>  	u32 ia_bits = VTCR_EL2_IPA(vtcr);
>  	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
> -	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
> +	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
>  
>  	return kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
>  }
> @@ -1582,7 +1586,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>  	pgt->pgd = NULL;
>  }
>  
> -void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level)
> +void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level)
>  {
>  	kvm_pteref_t ptep = (kvm_pteref_t)pgtable;
>  	struct kvm_pgtable_walker walker = {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 482280fe22d7..73110ba3624c 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -223,12 +223,12 @@ static void stage2_free_unlinked_table_rcu_cb(struct rcu_head *head)
>  {
>  	struct page *page = container_of(head, struct page, rcu_head);
>  	void *pgtable = page_to_virt(page);
> -	u32 level = page_private(page);
> +	s8 level = page_private(page);
>  
>  	kvm_pgtable_stage2_free_unlinked(&kvm_s2_mm_ops, pgtable, level);
>  }
>  
> -static void stage2_free_unlinked_table(void *addr, u32 level)
> +static void stage2_free_unlinked_table(void *addr, s8 level)
>  {
>  	struct page *page = virt_to_page(addr);
>  
> @@ -804,13 +804,13 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>  	struct kvm_pgtable pgt = {
>  		.pgd		= (kvm_pteref_t)kvm->mm->pgd,
>  		.ia_bits	= vabits_actual,
> -		.start_level	= (KVM_PGTABLE_MAX_LEVELS -
> -				   CONFIG_PGTABLE_LEVELS),
> +		.start_level	= (KVM_PGTABLE_LAST_LEVEL -
> +				   CONFIG_PGTABLE_LEVELS + 1),
>  		.mm_ops		= &kvm_user_mm_ops,
>  	};
>  	unsigned long flags;
>  	kvm_pte_t pte = 0;	/* Keep GCC quiet... */
> -	u32 level = ~0;
> +	s8 level = ~0;

Well, that's a semantic difference. ~0 == -1, which is a valid level,
while the original code was trying to initialise level to something
invalid. On the bright side, this function is going away in 6.7...
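
If you do want to keep an obviously-invalid initial value with the
signed type, something like this would do (only a sketch, and it
assumes S8_MAX is visible here via <linux/limits.h>):

	s8 level = S8_MAX;	/* anything outside [FIRST, LAST] will do */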

>  	int ret;
>  
>  	/*
> @@ -829,7 +829,9 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>  	 * Not seeing an error, but not updating level? Something went
>  	 * deeply wrong...
>  	 */
> -	if (WARN_ON(level >= KVM_PGTABLE_MAX_LEVELS))
> +	if (WARN_ON(level > KVM_PGTABLE_LAST_LEVEL))
> +		return -EFAULT;
> +	if (WARN_ON(level < KVM_PGTABLE_FIRST_LEVEL))
>  		return -EFAULT;
>  
>  	/* Oops, the userspace PTs are gone... Replay the fault */
> @@ -1407,7 +1409,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	gfn_t gfn;
>  	kvm_pfn_t pfn;
>  	bool logging_active = memslot_is_logging(memslot);
> -	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
> +	s8 fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
>  	long vma_pagesize, fault_granule;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>  	struct kvm_pgtable *pgt;

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v4 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
  2023-10-09 18:49 ` Ryan Roberts
@ 2023-10-20 10:54   ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-20 10:54 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

Hi Ryan,

On Mon, 09 Oct 2023 19:49:56 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> Hi All,
> 
> This adds support for FEAT_LPA2 to KVM for both hypervisor stage 1 (for the
> nvhe/protected modes) and the vm stage 2 translation tables (for all modes).
> FEAT_LPA2 enables 52 bit PAs and VAs for 4KB and 16KB granules (note this is
> already supported for 64KB granules via the FEAT_LPA and FEAT_LVA extensions).
> The series does not include support for FEAT_LPA2 in the kernel stage 1. This
> support is provided separately by Ard Biesheuvel's series at [4]. The two series
> are mostly independent.
> 
> This is a small update from v3, rebased onto v6.6-rc5 and incorporating some
> minor changes based on review comments from Oliver.
> 
> NOTE: I've included my patch to update the range-based tlbi functions to work
> with LPA2 in this version, because KVM has started using range-based tlbi
> invalidation as of v6.6-rc1. I've done this in such a way that KVM-originated
> calls will use the LPA2 format if LPA2 is in use by KVM, but the
> kernel-originated calls are hardcoded to never use the LPA2 format. If merging
> with Ard's series, you will need to update the 2 calls to __flush_tlb_range_op()
> from __flush_tlb_range() appropriately.
> 
> 
> Testing
> =======
> 
> Testing has been done exclusively on the FVP and covers my boot matrix tests
> and kvm selftests.
> 
> The host/guest config boot matrix gives the same (expected) results as for the
> v3 submission; of 180 configs, 12 fail, and these are all due to attempting to
> load the host kernel into high memory which isn't expected to work until the
> kernel has FEAT_LPA2 support for its stage 1. (refer to v1 posting for details
> on the exact configs).
> 
> KVM selftests have been enhanced to support P52V48 4K and 16K guest modes, and
> all tests have been run against a P48V48_4K host and a P52V52_4K host (a run
> takes about 10 hours on FVP, sigh, but I can test a few more host configs if
> useful).

Have you tried with the (brand new) "arm64_sw.hvhe=1" command-line
option, which enables VHE for the EL2 hypervisor only? I expect things
to work, but it would be good to make sure...

> All tests pass except "memslot_perf_test", which fails due to a timeout
> while syncing. This test fails in the same way for plain v6.6-rc1, so I'm
> confident this is not a regression caused by this series. (the issue is that
> alarm(2) is issued and the signal is received before alarm(0) is issued. I
> expect this is an FVP-time related problem, although I'm not sure how to fix
> robustly for the FVP without potentially hanging real systems for long periods
> of time).

[...]

This is starting to look good, and I only had pretty minor comments on
this series so far. It is too late for 6.7, but if you can respin it
for -rc1, I'll happily review it again and queue it for 6.8 if things
keep looking OK.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v4 01/12] arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2
  2023-10-20  8:05         ` Marc Zyngier
@ 2023-10-20 12:39           ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-20 12:39 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 20/10/2023 09:05, Marc Zyngier wrote:
> On Thu, 19 Oct 2023 10:22:37 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 19/10/2023 09:03, Marc Zyngier wrote:
>>> On Mon, 09 Oct 2023 19:49:57 +0100,
>>> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> FEAT_LPA2 impacts tlb invalidation in 2 ways; Firstly, the TTL field in
>>>> the non-range tlbi instructions can now validly take a 0 value for the
>>>> 4KB granule (this is due to the extra level of translation). Secondly,
>>>
>>> nit: 0 was always valid. It just didn't indicate any level.
>>
>> True. I'll change to "can now validly take a 0 value as a TTL hint".
>>
>>>
>>>> the BADDR field in the range tlbi instructions must be aligned to 64KB
>>>> when LPA2 is in use (TCR.DS=1). Changes are required for tlbi to
>>>> continue to operate correctly when LPA2 is in use.
>>>>
>>>> KVM only uses the non-range (__tlbi_level()) routines. Therefore we only
>>>> solve the first problem with this patch.
>>>
>>> Is this still true? This patch changes __TLBI_VADDR_RANGE() and co.
>>
>> It is no longer true that KVM only uses the non-range routines. v6.6 adds a
>> series where KVM will now use the range-based routines too. So that text is out
>> of date and I should have spotted it when doing the rebase - I'll fix. KVM now
>> using range-based ops is the reason I added patch 2 to this series.
>>
>> However, this patch doesn't really change __TLBI_VADDR_RANGE()'s behavior, it
>> just makes it robust to the presence of TLBI_TTL_UNKNOWN, instead of 0 which was
>> previously used as the "don't know" value.
>>
>>>
>>>>
>>>> It is solved by always adding the level hint if the level is between [0,
>>>> 3] (previously anything other than 0 was hinted, which breaks in the new
>>>> level -1 case from kvm). When running on non-LPA2 HW, 0 is still safe to
>>>> hint as the HW will fall back to non-hinted. While we are at it, we
>>>> replace the notion of 0 being the non-hinted sentinel with a macro,
>>>> TLBI_TTL_UNKNOWN. This means callers won't need updating if/when
>>>> translation depth increases in future.
>>>>
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>>>> ---
>>>>  arch/arm64/include/asm/tlb.h      |  9 ++++---
>>>>  arch/arm64/include/asm/tlbflush.h | 43 +++++++++++++++++++------------
>>>>  2 files changed, 31 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
>>>> index 2c29239d05c3..93c537635dbb 100644
>>>> --- a/arch/arm64/include/asm/tlb.h
>>>> +++ b/arch/arm64/include/asm/tlb.h
>>>> @@ -22,15 +22,16 @@ static void tlb_flush(struct mmu_gather *tlb);
>>>>  #include <asm-generic/tlb.h>
>>>>  
>>>>  /*
>>>> - * get the tlbi levels in arm64.  Default value is 0 if more than one
>>>> - * of cleared_* is set or neither is set.
>>>> + * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
>>>> + * one of cleared_* is set or neither is set - this elides the level hinting to
>>>> + * the hardware.
>>>>   * Arm64 doesn't support p4ds now.
>>>>   */
>>>>  static inline int tlb_get_level(struct mmu_gather *tlb)
>>>>  {
>>>>  	/* The TTL field is only valid for the leaf entry. */
>>>>  	if (tlb->freed_tables)
>>>> -		return 0;
>>>> +		return TLBI_TTL_UNKNOWN;
>>>>  
>>>>  	if (tlb->cleared_ptes && !(tlb->cleared_pmds ||
>>>>  				   tlb->cleared_puds ||
>>>> @@ -47,7 +48,7 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
>>>>  				   tlb->cleared_p4ds))
>>>>  		return 1;
>>>>  
>>>> -	return 0;
>>>> +	return TLBI_TTL_UNKNOWN;
>>>>  }
>>>>  
>>>>  static inline void tlb_flush(struct mmu_gather *tlb)
>>>> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
>>>> index b149cf9f91bc..e688246b3b13 100644
>>>> --- a/arch/arm64/include/asm/tlbflush.h
>>>> +++ b/arch/arm64/include/asm/tlbflush.h
>>>> @@ -94,19 +94,22 @@ static inline unsigned long get_trans_granule(void)
>>>>   * When ARMv8.4-TTL exists, TLBI operations take an additional hint for
>>>>   * the level at which the invalidation must take place. If the level is
>>>>   * wrong, no invalidation may take place. In the case where the level
>>>> - * cannot be easily determined, a 0 value for the level parameter will
>>>> - * perform a non-hinted invalidation.
>>>> + * cannot be easily determined, the value TLBI_TTL_UNKNOWN will perform
>>>> + * a non-hinted invalidation. Any provided level outside the hint range
>>>> + * will also cause fall-back to non-hinted invalidation.
>>>>   *
>>>>   * For Stage-2 invalidation, use the level values provided to that effect
>>>>   * in asm/stage2_pgtable.h.
>>>>   */
>>>>  #define TLBI_TTL_MASK		GENMASK_ULL(47, 44)
>>>>  
>>>> +#define TLBI_TTL_UNKNOWN	(-1)
>>>
>>> I find this value somehow confusing, as it represent an actual level
>>> number. It just happen to be one that cannot be provided as a TTL. So
>>> having that as a return value from tlb_get_level() isn't great, and
>>> I'd rather have something that cannot be mistaken for a valid level.
>>
>> OK, how about INT_MAX?
> 
> Works for me.
> 
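
(For the record, what I have in mind is simply:

	#define TLBI_TTL_UNKNOWN	INT_MAX

with <linux/limits.h> pulled in if INT_MAX isn't already visible in
tlbflush.h, so the sentinel can never collide with a real level.)
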
>>
>>>
>>>> +
>>>>  #define __tlbi_level(op, addr, level) do {				\
>>>>  	u64 arg = addr;							\
>>>>  									\
>>>>  	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&		\
>>>> -	    level) {							\
>>>> +	    level >= 0 && level <= 3) {					\
>>>>  		u64 ttl = level & 3;					\
>>>>  		ttl |= get_trans_granule() << 2;			\
>>>>  		arg &= ~TLBI_TTL_MASK;					\
>>>> @@ -134,16 +137,17 @@ static inline unsigned long get_trans_granule(void)
>>>>   * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
>>>>   *
>>>>   */
>>>> -#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
>>>> -	({							\
>>>> -		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
>>>> -		__ta &= GENMASK_ULL(36, 0);			\
>>>> -		__ta |= (unsigned long)(ttl) << 37;		\
>>>> -		__ta |= (unsigned long)(num) << 39;		\
>>>> -		__ta |= (unsigned long)(scale) << 44;		\
>>>> -		__ta |= get_trans_granule() << 46;		\
>>>> -		__ta |= (unsigned long)(asid) << 48;		\
>>>> -		__ta;						\
>>>> +#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)				\
>>>> +	({									\
>>>> +		unsigned long __ta = (addr) >> PAGE_SHIFT;			\
>>>> +		unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0;		\
>>>> +		__ta &= GENMASK_ULL(36, 0);					\
>>>> +		__ta |= __ttl << 37;						\
>>>> +		__ta |= (unsigned long)(num) << 39;				\
>>>> +		__ta |= (unsigned long)(scale) << 44;				\
>>>> +		__ta |= get_trans_granule() << 46;				\
>>>> +		__ta |= (unsigned long)(asid) << 48;				\
>>>> +		__ta;								\
>>>>  	})
>>>>  
>>>>  /* These macros are used by the TLBI RANGE feature. */
>>>> @@ -216,12 +220,16 @@ static inline unsigned long get_trans_granule(void)
>>>>   *		CPUs, ensuring that any walk-cache entries associated with the
>>>>   *		translation are also invalidated.
>>>>   *
>>>> - *	__flush_tlb_range(vma, start, end, stride, last_level)
>>>> + *	__flush_tlb_range(vma, start, end, stride, last_level, tlb_level)
>>>>   *		Invalidate the virtual-address range '[start, end)' on all
>>>>   *		CPUs for the user address space corresponding to 'vma->mm'.
>>>>   *		The invalidation operations are issued at a granularity
>>>>   *		determined by 'stride' and only affect any walk-cache entries
>>>> - *		if 'last_level' is equal to false.
>>>> + *		if 'last_level' is equal to false. tlb_level is the level at
>>>> + *		which the invalidation must take place. If the level is wrong,
>>>> + *		no invalidation may take place. In the case where the level
>>>> + *		cannot be easily determined, the value TLBI_TTL_UNKNOWN will
>>>> + *		perform a non-hinted invalidation.
>>>>   *
>>>>   *
>>>>   *	Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
>>>> @@ -442,9 +450,10 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
>>>>  	/*
>>>>  	 * We cannot use leaf-only invalidation here, since we may be invalidating
>>>>  	 * table entries as part of collapsing hugepages or moving page tables.
>>>> -	 * Set the tlb_level to 0 because we can not get enough information here.
>>>> +	 * Set the tlb_level to TLBI_TTL_UNKNOWN because we can not get enough
>>>> +	 * information here.
>>>>  	 */
>>>> -	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
>>>> +	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
>>>>  }
>>>>  
>>>>  static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
>>>
>>> It feels like this range stuff would be better located in the second
>>> patch. Not a huge deal though.
>>
>> As I said, this is the minimal change to the range-based side of things to
>> robustly deal with the introduction of TLBI_TTL_UNKNOWN.
>>
>> But I wonder if I'm actually better of squashing both of the 2 patches into one.
>> The only reason I split it previously was because KVM was only using the
>> level-based ops.
> 
> Maybe. There is something to be said about making the range rework
> (decreasing scale) an independent patch, as it is a significant change
> on its own. But maybe the rest of the plumbing can be grouped
> together.

But that's effectively the split I have now, isn't it? The first patch
introduces TLBI_TTL_UNKNOWN to enable use of 0 as a ttl hint. Then the second
patch reworks the range stuff. I don't quite follow what you are suggesting.

> 
> Thanks,
> 
> 	M.
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 01/12] arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2
@ 2023-10-20 12:39           ` Ryan Roberts
  0 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-20 12:39 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 20/10/2023 09:05, Marc Zyngier wrote:
> On Thu, 19 Oct 2023 10:22:37 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 19/10/2023 09:03, Marc Zyngier wrote:
>>> On Mon, 09 Oct 2023 19:49:57 +0100,
>>> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> FEAT_LPA2 impacts tlb invalidation in 2 ways; Firstly, the TTL field in
>>>> the non-range tlbi instructions can now validly take a 0 value for the
>>>> 4KB granule (this is due to the extra level of translation). Secondly,
>>>
>>> nit: 0 was always valid. It just didn't indicate any level.
>>
>> True. I'll change to "can now validly take a 0 value as a TTL hint".
>>
>>>
>>>> the BADDR field in the range tlbi instructions must be aligned to 64KB
>>>> when LPA2 is in use (TCR.DS=1). Changes are required for tlbi to
>>>> continue to operate correctly when LPA2 is in use.
>>>>
>>>> KVM only uses the non-range (__tlbi_level()) routines. Therefore we only
>>>> solve the first problem with this patch.
>>>
>>> Is this still true? This patch changes __TLBI_VADDR_RANGE() and co.
>>
>> It is no longer true that KVM only uses the non-range routines. v6.6 adds a
>> series where KVM will now use the range-based routines too. So that text is out
>> of date and I should have spotted it when doing the rebase - I'll fix. KVM now
>> using range-based ops is the reason I added patch 2 to this series.
>>
>> However, this patch doesn't really change __TLBI_VADDR_RANGE()'s behavior, it
>> just makes it robust to the presence of TLBI_TTL_UNKNOWN, instead of 0 which was
>> previously used as the "don't know" value.
>>
>>>
>>>>
>>>> It is solved by always adding the level hint if the level is between [0,
>>>> 3] (previously anything other than 0 was hinted, which breaks in the new
>>>> level -1 case from kvm). When running on non-LPA2 HW, 0 is still safe to
>>>> hint as the HW will fall back to non-hinted. While we are at it, we
>>>> replace the notion of 0 being the non-hinted seninel with a macro,
>>>> TLBI_TTL_UNKNOWN. This means callers won't need updating if/when
>>>> translation depth increases in future.
>>>>
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>>>> ---
>>>>  arch/arm64/include/asm/tlb.h      |  9 ++++---
>>>>  arch/arm64/include/asm/tlbflush.h | 43 +++++++++++++++++++------------
>>>>  2 files changed, 31 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
>>>> index 2c29239d05c3..93c537635dbb 100644
>>>> --- a/arch/arm64/include/asm/tlb.h
>>>> +++ b/arch/arm64/include/asm/tlb.h
>>>> @@ -22,15 +22,16 @@ static void tlb_flush(struct mmu_gather *tlb);
>>>>  #include <asm-generic/tlb.h>
>>>>  
>>>>  /*
>>>> - * get the tlbi levels in arm64.  Default value is 0 if more than one
>>>> - * of cleared_* is set or neither is set.
>>>> + * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
>>>> + * one of cleared_* is set or neither is set - this elides the level hinting to
>>>> + * the hardware.
>>>>   * Arm64 doesn't support p4ds now.
>>>>   */
>>>>  static inline int tlb_get_level(struct mmu_gather *tlb)
>>>>  {
>>>>  	/* The TTL field is only valid for the leaf entry. */
>>>>  	if (tlb->freed_tables)
>>>> -		return 0;
>>>> +		return TLBI_TTL_UNKNOWN;
>>>>  
>>>>  	if (tlb->cleared_ptes && !(tlb->cleared_pmds ||
>>>>  				   tlb->cleared_puds ||
>>>> @@ -47,7 +48,7 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
>>>>  				   tlb->cleared_p4ds))
>>>>  		return 1;
>>>>  
>>>> -	return 0;
>>>> +	return TLBI_TTL_UNKNOWN;
>>>>  }
>>>>  
>>>>  static inline void tlb_flush(struct mmu_gather *tlb)
>>>> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
>>>> index b149cf9f91bc..e688246b3b13 100644
>>>> --- a/arch/arm64/include/asm/tlbflush.h
>>>> +++ b/arch/arm64/include/asm/tlbflush.h
>>>> @@ -94,19 +94,22 @@ static inline unsigned long get_trans_granule(void)
>>>>   * When ARMv8.4-TTL exists, TLBI operations take an additional hint for
>>>>   * the level at which the invalidation must take place. If the level is
>>>>   * wrong, no invalidation may take place. In the case where the level
>>>> - * cannot be easily determined, a 0 value for the level parameter will
>>>> - * perform a non-hinted invalidation.
>>>> + * cannot be easily determined, the value TLBI_TTL_UNKNOWN will perform
>>>> + * a non-hinted invalidation. Any provided level outside the hint range
>>>> + * will also cause fall-back to non-hinted invalidation.
>>>>   *
>>>>   * For Stage-2 invalidation, use the level values provided to that effect
>>>>   * in asm/stage2_pgtable.h.
>>>>   */
>>>>  #define TLBI_TTL_MASK		GENMASK_ULL(47, 44)
>>>>  
>>>> +#define TLBI_TTL_UNKNOWN	(-1)
>>>
>>> I find this value somehow confusing, as it represent an actual level
>>> number. It just happen to be one that cannot be provided as a TTL. So
>>> having that as a return value from tlb_get_level() isn't great, and
>>> I'd rather have something that cannot be mistaken for a valid level.
>>
>> OK, how about INT_MAX?
> 
> Works for me.
> 
>>
>>>
>>>> +
>>>>  #define __tlbi_level(op, addr, level) do {				\
>>>>  	u64 arg = addr;							\
>>>>  									\
>>>>  	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&		\
>>>> -	    level) {							\
>>>> +	    level >= 0 && level <= 3) {					\
>>>>  		u64 ttl = level & 3;					\
>>>>  		ttl |= get_trans_granule() << 2;			\
>>>>  		arg &= ~TLBI_TTL_MASK;					\
>>>> @@ -134,16 +137,17 @@ static inline unsigned long get_trans_granule(void)
>>>>   * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
>>>>   *
>>>>   */
>>>> -#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)		\
>>>> -	({							\
>>>> -		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
>>>> -		__ta &= GENMASK_ULL(36, 0);			\
>>>> -		__ta |= (unsigned long)(ttl) << 37;		\
>>>> -		__ta |= (unsigned long)(num) << 39;		\
>>>> -		__ta |= (unsigned long)(scale) << 44;		\
>>>> -		__ta |= get_trans_granule() << 46;		\
>>>> -		__ta |= (unsigned long)(asid) << 48;		\
>>>> -		__ta;						\
>>>> +#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)				\
>>>> +	({									\
>>>> +		unsigned long __ta = (addr) >> PAGE_SHIFT;			\
>>>> +		unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0;		\
>>>> +		__ta &= GENMASK_ULL(36, 0);					\
>>>> +		__ta |= __ttl << 37;						\
>>>> +		__ta |= (unsigned long)(num) << 39;				\
>>>> +		__ta |= (unsigned long)(scale) << 44;				\
>>>> +		__ta |= get_trans_granule() << 46;				\
>>>> +		__ta |= (unsigned long)(asid) << 48;				\
>>>> +		__ta;								\
>>>>  	})
>>>>  
>>>>  /* These macros are used by the TLBI RANGE feature. */
>>>> @@ -216,12 +220,16 @@ static inline unsigned long get_trans_granule(void)
>>>>   *		CPUs, ensuring that any walk-cache entries associated with the
>>>>   *		translation are also invalidated.
>>>>   *
>>>> - *	__flush_tlb_range(vma, start, end, stride, last_level)
>>>> + *	__flush_tlb_range(vma, start, end, stride, last_level, tlb_level)
>>>>   *		Invalidate the virtual-address range '[start, end)' on all
>>>>   *		CPUs for the user address space corresponding to 'vma->mm'.
>>>>   *		The invalidation operations are issued at a granularity
>>>>   *		determined by 'stride' and only affect any walk-cache entries
>>>> - *		if 'last_level' is equal to false.
>>>> + *		if 'last_level' is equal to false. tlb_level is the level at
>>>> + *		which the invalidation must take place. If the level is wrong,
>>>> + *		no invalidation may take place. In the case where the level
>>>> + *		cannot be easily determined, the value TLBI_TTL_UNKNOWN will
>>>> + *		perform a non-hinted invalidation.
>>>>   *
>>>>   *
>>>>   *	Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
>>>> @@ -442,9 +450,10 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
>>>>  	/*
>>>>  	 * We cannot use leaf-only invalidation here, since we may be invalidating
>>>>  	 * table entries as part of collapsing hugepages or moving page tables.
>>>> -	 * Set the tlb_level to 0 because we can not get enough information here.
>>>> +	 * Set the tlb_level to TLBI_TTL_UNKNOWN because we can not get enough
>>>> +	 * information here.
>>>>  	 */
>>>> -	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
>>>> +	__flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
>>>>  }
>>>>  
>>>>  static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
>>>
>>> It feels like this range stuff would be better located in the second
>>> patch. Not a huge deal though.
>>
>> As I said, this is the minimal change to the range-based side of things to
>> robustly deal with the introduction of TLBI_TTL_UNKNOWN.
>>
>> But I wonder if I'm actually better off squashing both of the 2 patches into one.
>> The only reason I split it previously was because KVM was only using the
>> level-based ops.
> 
> Maybe. There is something to be said about making the range rework
> (decreasing scale) an independent patch, as it is a significant change
> on its own. But maybe the rest of the plumbing can be grouped
> together.

But that's effectively the split I have now, isn't it? The first patch
introduces TLBI_TTL_UNKNOWN to enable use of 0 as a ttl hint. Then the second
patch reworks the range stuff. I don't quite follow what you are suggesting.
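
(For clarity, the semantics that first patch changes, in one line; the helper
name below is invented purely for illustration, the real check is the one added
to __tlbi_level() in the hunk above:)

/*
 * Pre-patch: a tlb_level of 0 meant "don't hint". Post-patch: 0..3 are all
 * valid TTL hints, and only TLBI_TTL_UNKNOWN (or any other out-of-range
 * value) elides the hint, which is always architecturally safe.
 */
static inline bool ttl_is_hintable(int level)
{
	return level >= 0 && level <= 3;
}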

> 
> Thanks,
> 
> 	M.
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 01/12] arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2
  2023-10-20 12:39           ` Ryan Roberts
@ 2023-10-20 13:02             ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-20 13:02 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Fri, 20 Oct 2023 13:39:47 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 20/10/2023 09:05, Marc Zyngier wrote:
> > Maybe. There is something to be said about making the range rework
> > (decreasing scale) an independent patch, as it is a significant change
> > on its own. But maybe the rest of the plumbing can be grouped
> > together.
> 
> But that's effectively the split I have now, isn't it? The first patch
> introduces TLBI_TTL_UNKNOWN to enable use of 0 as a ttl hint. Then the second
> patch reworks the range stuff. I don't quite follow what you are suggesting.

Not quite.

What I'm proposing is that you pull the scale changes in their own
patch, and preferably without any change to the external API (i.e. no
change to the signature of the helper). Then any extra change, such as
the TTL rework, can go separately.

So while this is similar to your existing split, I'd like to see it
without any churn around the calling convention. Which means turning
the ordering around, and making use of a static key in the various
helpers that need to know about LPA2.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 01/12] arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2
  2023-10-20 13:02             ` Marc Zyngier
@ 2023-10-20 13:21               ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-20 13:21 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 20/10/2023 14:02, Marc Zyngier wrote:
> On Fri, 20 Oct 2023 13:39:47 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 20/10/2023 09:05, Marc Zyngier wrote:
>>> Maybe. There is something to be said about making the range rework
>>> (decreasing scale) an independent patch, as it is a significant change
>>> on its own. But maybe the rest of the plumbing can be grouped
>>> together.
>>
>> But that's effectively the split I have now, isn't it? The first patch
>> introduces TLBI_TTL_UNKNOWN to enable use of 0 as a ttl hint. Then the second
>> patch reworks the range stuff. I don't quite follow what you are suggesting.
> 
> Not quite.
> 
> What I'm proposing is that you pull the scale changes in their own
> patch, and preferably without any change to the external API (i.e. no
> change to the signature of the helper). They any extra change, such as
> the TTL rework can go separately.
> 
> So while this is similar to your existing split, I'd like to see it
> without any churn around the calling convention. Which means turning
> the ordering around, and making use of a static key in the various
> helpers that need to know about LPA2.

I don't think we can embed the static key usage directly inside
__flush_tlb_range_op() (if that's what you were suggesting), because this macro
is used by both the kernel (for its stage 1) and the hypervisor (for stage 2).
And the kernel doesn't support LPA2 (until Ard's work is merged). So I think
this needs to be an argument to the macro.

Or are you asking that I make the scale change universally, even if LPA2 is not
in use? I could do that as its own change (which I could benchmark), then
add the rest in a separate change. But my thinking was that we would not want to
change the algorithm for !LPA2 since it is not as efficient (due to the LPA2 64K
alignment requirement).

Sorry for laboring the point - I just want to make sure I understand what you
are asking for.


> 
> 	M.
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 01/12] arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2
  2023-10-20 13:21               ` Ryan Roberts
@ 2023-10-20 13:41                 ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-20 13:41 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Fri, 20 Oct 2023 14:21:39 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 20/10/2023 14:02, Marc Zyngier wrote:
> > On Fri, 20 Oct 2023 13:39:47 +0100,
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 20/10/2023 09:05, Marc Zyngier wrote:
> >>> Maybe. There is something to be said about making the range rework
> >>> (decreasing scale) an independent patch, as it is a significant change
> >>> on its own. But maybe the rest of the plumbing can be grouped
> >>> together.
> >>
> >> But that's effectively the split I have now, isn't it? The first patch
> >> introduces TLBI_TTL_UNKNOWN to enable use of 0 as a ttl hint. Then the second
> >> patch reworks the range stuff. I don't quite follow what you are suggesting.
> > 
> > Not quite.
> > 
> > What I'm proposing is that you pull the scale changes in their own
> > patch, and preferably without any change to the external API (i.e. no
> > change to the signature of the helper). They any extra change, such as
> > the TTL rework can go separately.
> > 
> > So while this is similar to your existing split, I'd like to see it
> > without any churn around the calling convention. Which means turning
> > the ordering around, and making use of a static key in the various
> > helpers that need to know about LPA2.
> 
> I don't think we can embed the static key usage directly inside
> __flush_tlb_range_op() (if that's what you were suggesting), because this macro
> is used by both the kernel (for its stage 1) and the hypervisor (for stage 2).
> And the kernel doesn't support LPA2 (until Ard's work is merged). So I think
> this needs to be an argument to the macro.

I can see two outcomes here:

- either you create separate helpers that abstract the LPA2-ness for
  KVM and stick to non-LPA2 for the kernel (until Ard's series makes
  it in)

- or you leave the whole thing disabled until we have full LPA2
  support.

Eventually, you replace the whole extra parameter with a static key,
and nobody sees any churn.
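
(To sketch the first option, using the ARM64_HAS_LPA2 cap from later in this
series and a hypothetical KVM-only helper name, assuming cpus_have_final_cap()
is usable by the time the hyp page-table code runs:)

static __always_inline bool kvm_lpa2_is_enabled(void)
{
	/* Final cap, so this patches down to a branch rather than a load. */
	return cpus_have_final_cap(ARM64_HAS_LPA2);
}

The KVM-side range helpers would then consult this internally, while the
kernel's own __flush_tlb_range() keeps using the non-LPA2 encoding until
Ard's series lands.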

> Or are you asking that I make the scale change universally, even if LPA2 is not
> in use? I could do that as its own change change (which I could benchmark), then
> add the rest in a separate change. But my thinking was that we would not want to
> change the algorithm for !LAP2 since it is not as effcient (due to the LPA2 64K
> alignment requirement).

I'm all for simplicity. If having an extra 15 potential TLBIs is
acceptable from a performance perspective, I won't complain. But I can
imagine that NV would be suffering from that (TLBIs on S2 have to
trap).

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 02/12] arm64/mm: Update range-based tlb invalidation routines for FEAT_LPA2
  2023-10-19 21:06     ` Marc Zyngier
@ 2023-10-20 14:55       ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-20 14:55 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 19/10/2023 22:06, Marc Zyngier wrote:
> On Mon, 09 Oct 2023 19:49:58 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> The BADDR field of the range-based tlbi instructions is specified in
>> 64KB units when LPA2 is in use (TCR.DS=1), whereas it is in page units
>> otherwise.
>>
>> When LPA2 is enabled, use the non-range tlbi instructions to forward
>> align to a 64KB boundary first, then use range-based tlbi from there on,
>> until we have either invalidated all pages or we have a single page
>> remaining. If the latter, that is done with non-range tlbi. (Previously
>> we invalidated a single odd page first, but we can no longer do this
>> because it could wreck our 64KB alignment). When LPA2 is not in use, we
>> don't need the initial alignemnt step. However, the bigger impact is
>> that we can no longer use the previous method of iterating from smallest
>> to largest 'scale', since this would likely unalign the boundary again
>> for the LPA2 case. So instead we iterate from highest to lowest scale,
>> which guarrantees that we remain 64KB aligned until the last op (at
>> scale=0).
>>
>> The original commit (d1d3aa98 "arm64: tlb: Use the TLBI RANGE feature in
>> arm64") stated this as the reason for incrementing scale:
>>
>>   However, in most scenarios, the pages = 1 when flush_tlb_range() is
>>   called. Start from scale = 3 or other proper value (such as scale
>>   =ilog2(pages)), will incur extra overhead. So increase 'scale' from 0
>>   to maximum, the flush order is exactly opposite to the example.
>>
>> But pages=1 is already special cased by the non-range invalidation path,
>> which will take care of it the first time through the loop (both in the
>> original commit and in my change), so I don't think switching to
>> decrement scale should have any extra performance impact after all.
> 
> Surely this can be benchmarked. After all, HW supporting range
> invalidation is common enough these days.

Ahh, I think I see now what you were suggesting on the other thread; make the
change from incrementing scale to decrementing scale its own patch - and exclude
the LPA2 alignment stuff from it too. Then add all the other LPA2 specific stuff
in a separate patch.

Yes, I can do that, and benchmark it. Kernel compilation is pretty TLBI
intensive; is it sufficient to use that as the benchmark and run it in a VM on M2?

> 
>>
>> Note: This patch uses LPA2 range-based tlbi based on the new lpa2 param
>> passed to __flush_tlb_range_op(). This allows both KVM and the kernel to
>> opt-in/out of LPA2 usage independently. But once both are converted over
>> (and keyed off the same static key), the parameter could be dropped and
>> replaced by the static key directly in the macro.
> 
> Why can't this be done right away? Have a patch common to the two
> series that exposes the static key, and use that from the start. This
> would avoid the current (and rather ugly) extra parameter that I find
> unnecessarily hard to parse.
> 
> And if the 64kB alignment above is cheap enough, maybe this could
> become the one true way?

Yes, I can benchmark that too. Let's see what the data tells us, then decide.

> 
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  arch/arm64/include/asm/tlb.h      |  6 +++-
>>  arch/arm64/include/asm/tlbflush.h | 46 ++++++++++++++++++++-----------
>>  arch/arm64/kvm/hyp/nvhe/tlb.c     |  2 +-
>>  arch/arm64/kvm/hyp/vhe/tlb.c      |  2 +-
>>  4 files changed, 37 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
>> index 93c537635dbb..396ba9b4872c 100644
>> --- a/arch/arm64/include/asm/tlb.h
>> +++ b/arch/arm64/include/asm/tlb.h
>> @@ -25,7 +25,6 @@ static void tlb_flush(struct mmu_gather *tlb);
>>   * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
>>   * one of cleared_* is set or neither is set - this elides the level hinting to
>>   * the hardware.
>> - * Arm64 doesn't support p4ds now.
>>   */
>>  static inline int tlb_get_level(struct mmu_gather *tlb)
>>  {
>> @@ -48,6 +47,11 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
>>  				   tlb->cleared_p4ds))
>>  		return 1;
>>  
>> +	if (tlb->cleared_p4ds && !(tlb->cleared_ptes ||
>> +				   tlb->cleared_pmds ||
>> +				   tlb->cleared_puds))
>> +		return 0;
>> +
>>  	return TLBI_TTL_UNKNOWN;
>>  }
>>  
>> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
>> index e688246b3b13..4d34035fe7d6 100644
>> --- a/arch/arm64/include/asm/tlbflush.h
>> +++ b/arch/arm64/include/asm/tlbflush.h
>> @@ -136,10 +136,14 @@ static inline unsigned long get_trans_granule(void)
>>   * The address range is determined by below formula:
>>   * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
>>   *
>> + * If LPA2 is in use, BADDR holds addr[52:16]. Else BADDR holds page number.
>> + * See ARM DDI 0487I.a C5.5.21.
> 
> Please update this to the latest published ARM ARM. I know it will be
> obsolete quickly enough, but still. Also, "page number" is rather
> imprecise, and doesn't match the language of the architecture.

Will do.

> 
>> + *
>>   */
>> -#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl)				\
>> +#define __TLBI_VADDR_RANGE(addr, asid, scale, num, ttl, lpa2)			\
>>  	({									\
>> -		unsigned long __ta = (addr) >> PAGE_SHIFT;			\
>> +		unsigned long __addr_shift = lpa2 ? 16 : PAGE_SHIFT;		\
>> +		unsigned long __ta = (addr) >> __addr_shift;			\
>>  		unsigned long __ttl = (ttl >= 1 && ttl <= 3) ? ttl : 0;		\
>>  		__ta &= GENMASK_ULL(36, 0);					\
>>  		__ta |= __ttl << 37;						\
>> @@ -354,34 +358,44 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>>   * @tlb_level:	Translation Table level hint, if known
>>   * @tlbi_user:	If 'true', call an additional __tlbi_user()
>>   *              (typically for user ASIDs). 'flase' for IPA instructions
>> + * @lpa2:	If 'true', the lpa2 scheme is used as set out below
>>   *
>>   * When the CPU does not support TLB range operations, flush the TLB
>>   * entries one by one at the granularity of 'stride'. If the TLB
>>   * range ops are supported, then:
>>   *
>> - * 1. If 'pages' is odd, flush the first page through non-range
>> - *    operations;
>> + * 1. If FEAT_LPA2 is in use, the start address of a range operation
>> + *    must be 64KB aligned, so flush pages one by one until the
>> + *    alignment is reached using the non-range operations. This step is
>> + *    skipped if LPA2 is not in use.
>>   *
>>   * 2. For remaining pages: the minimum range granularity is decided
>>   *    by 'scale', so multiple range TLBI operations may be required.
>> - *    Start from scale = 0, flush the corresponding number of pages
>> - *    ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
>> - *    until no pages left.
>> + *    Start from scale = 3, flush the corresponding number of pages
>> + *    ((num+1)*2^(5*scale+1) starting from 'addr'), then descrease it
>> + *    until one or zero pages are left. We must start from highest scale
>> + *    to ensure 64KB start alignment is maintained in the LPA2 case.
> 
> Surely the algorithm is a bit more subtle than this, because always
> starting with scale==3 means that you're invalidating at least 64k
> *pages*, which is an awful lot (a minimum of 256MB?).

__TLBI_RANGE_NUM() returns -1 if scale produces a range that is bigger than the
number of remaining pages, so we won't over-invalidate. I guess you are asking
for this detail to be added to the comment (it wasn't there before and this
aspect hasn't changed).
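
To make that concrete, the following toy walk-through (simplified copies of
the macros, not the kernel versions; an explicit cast stands in for the
kernel's implicit int conversion, and the non-range/alignment handling is
left out) shows the descending-scale loop converging without ever
invalidating more than the remaining page count:

#include <stdio.h>

#define TLBI_RANGE_MASK			0x1fUL
#define __TLBI_RANGE_PAGES(num, scale)	((unsigned long)((num) + 1) << (5 * (scale) + 1))
#define __TLBI_RANGE_NUM(pages, scale)	\
	((int)(((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)

int main(void)
{
	unsigned long pages = 514;	/* e.g. 2MB plus two 4K pages */
	int scale;

	for (scale = 3; scale >= 0 && pages > 1; scale--) {
		int num = __TLBI_RANGE_NUM(pages, scale);

		if (num < 0)	/* this scale covers more than is left: skip */
			continue;
		printf("scale=%d num=%d -> %lu pages\n",
		       scale, num, __TLBI_RANGE_PAGES(num, scale));
		pages -= __TLBI_RANGE_PAGES(num, scale);
	}
	printf("left for non-range tlbi: %lu\n", pages);
	return 0;
}

It prints 512 pages at scale=1 and 2 pages at scale=0 with 0 pages left over,
i.e. scale=3 and scale=2 are simply skipped rather than over-invalidating.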

> 
>> + *
>> + * 3. If there is 1 page remaining, flush it through non-range
>> + *    operations. Range operations can only span an even number of
>> + *    pages. We save this for last to ensure 64KB start alignment is
>> + *    maintained for the LPA2 case.
>>   *
>>   * Note that certain ranges can be represented by either num = 31 and
>>   * scale or num = 0 and scale + 1. The loop below favours the latter
>>   * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
>>   */
>>  #define __flush_tlb_range_op(op, start, pages, stride,			\
>> -				asid, tlb_level, tlbi_user)		\
>> +				asid, tlb_level, tlbi_user, lpa2)	\
>>  do {									\
>>  	int num = 0;							\
>> -	int scale = 0;							\
>> +	int scale = 3;							\
>>  	unsigned long addr;						\
>>  									\
>>  	while (pages > 0) {						\
> 
> Not an issue with your patch, but we could be more robust here. If
> 'pages' is an unsigned quantity and what we have a bug in converging
> to 0 below, we'll be looping for a long time. Not to mention the side
> effects on pages and start.

Good point; pages is always unsigned long in all the callers of this macro, so
the problem definitely exists. I guess the easiest thing would be to convert to
a signed variable, given we know the passed-in value must always fit in a signed
long?
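
For what it's worth, the failure mode is trivial to demonstrate in isolation
(toy snippet, not kernel code):

#include <stdio.h>

int main(void)
{
	unsigned long pages = 1;	/* unsigned, as today */
	long spages = 1;		/* signed alternative */

	pages -= 2;	/* overshoot: wraps to ULONG_MAX, (pages > 0) stays true */
	spages -= 2;	/* overshoot: goes to -1, (spages > 0) becomes false */

	printf("unsigned keeps looping: %d\n", pages > 0);
	printf("signed terminates:      %d\n", spages > 0);
	return 0;
}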

> 
>>  		if (!system_supports_tlb_range() ||			\
>> -		    pages % 2 == 1) {					\
>> +		    pages == 1 ||					\
>> +		    (lpa2 && start != ALIGN(start, SZ_64K))) {		\
>>  			addr = __TLBI_VADDR(start, asid);		\
>>  			__tlbi_level(op, addr, tlb_level);		\
>>  			if (tlbi_user)					\
>> @@ -394,19 +408,19 @@ do {									\
>>  		num = __TLBI_RANGE_NUM(pages, scale);			\
>>  		if (num >= 0) {						\
>>  			addr = __TLBI_VADDR_RANGE(start, asid, scale,	\
>> -						  num, tlb_level);	\
>> +						num, tlb_level, lpa2);	\
>>  			__tlbi(r##op, addr);				\
>>  			if (tlbi_user)					\
>>  				__tlbi_user(r##op, addr);		\
>>  			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
>>  			pages -= __TLBI_RANGE_PAGES(num, scale);	\
>>  		}							\
>> -		scale++;						\
>> +		scale--;						\
>>  	}								\
>>  } while (0)
>>  
>> -#define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level) \
>> -	__flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false)
>> +#define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level, lpa2) \
>> +	__flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false, lpa2)
>>  
>>  static inline void __flush_tlb_range(struct vm_area_struct *vma,
>>  				     unsigned long start, unsigned long end,
>> @@ -436,9 +450,9 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
>>  	asid = ASID(vma->vm_mm);
>>  
>>  	if (last_level)
>> -		__flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true);
>> +		__flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true, false);
>>  	else
>> -		__flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true);
>> +		__flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true, false);
>>  
>>  	dsb(ish);
>>  	mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
>> diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
>> index 1b265713d6be..d42b72f78a9b 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
>> @@ -198,7 +198,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>>  	/* Switch to requested VMID */
>>  	__tlb_switch_to_guest(mmu, &cxt, false);
>>  
>> -	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0);
>> +	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
>>  
>>  	dsb(ish);
>>  	__tlbi(vmalle1is);
>> diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
>> index 46bd43f61d76..6041c6c78984 100644
>> --- a/arch/arm64/kvm/hyp/vhe/tlb.c
>> +++ b/arch/arm64/kvm/hyp/vhe/tlb.c
>> @@ -161,7 +161,7 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>>  	/* Switch to requested VMID */
>>  	__tlb_switch_to_guest(mmu, &cxt);
>>  
>> -	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0);
>> +	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
>>  
>>  	dsb(ish);
>>  	__tlbi(vmalle1is);
> 
> Thanks,
> 
> 	M.
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 04/12] KVM: arm64: Add ARM64_HAS_LPA2 CPU capability
  2023-10-20  8:16     ` Marc Zyngier
@ 2023-10-20 15:03       ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-20 15:03 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 20/10/2023 09:16, Marc Zyngier wrote:
> On Mon, 09 Oct 2023 19:50:00 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Expose FEAT_LPA2 as a capability so that we can take advantage of
>> alternatives patching in both the kernel and hypervisor.
>>
>> Although FEAT_LPA2 presence is advertised separately for stage1 and
>> stage2, the expectation is that in practice both stages will either
>> support or not support it. Therefore, for the case where KVM is present,
>> we combine both into a single capability, allowing us to simplify the
>> implementation. For the case where KVM is not present, we only care
>> about stage1.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  arch/arm64/include/asm/cpufeature.h |  5 ++++
>>  arch/arm64/kernel/cpufeature.c      | 46 +++++++++++++++++++++++++++++
>>  arch/arm64/tools/cpucaps            |  1 +
>>  3 files changed, 52 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>> index 5bba39376055..b1292ec88538 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -831,6 +831,11 @@ static inline bool system_supports_tlb_range(void)
>>  		cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
>>  }
>>  
>> +static inline bool system_supports_lpa2(void)
>> +{
>> +	return cpus_have_const_cap(ARM64_HAS_LPA2);
> 
> cpus_have_const_cap() is going away. You may want to look at Mark's
> series to see how to replace this one.

ACK.
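
(For the record, I expect that to end up looking something like the below,
assuming cpus_have_final_cap() is the appropriate replacement once Mark's
series is in:)

static inline bool system_supports_lpa2(void)
{
	return cpus_have_final_cap(ARM64_HAS_LPA2);
}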

> 
>> +}
>> +
>>  int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
>>  bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
>>  
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 444a73c2e638..1ccb1fe0e310 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -1746,6 +1746,46 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
>>  	return !meltdown_safe;
>>  }
>>  
>> +static inline bool has_lpa2_at_stage1(u64 mmfr0)
> 
> Why inline? It isn't like this has any performance implication...

ACK. I'll remove inline from this and has_lpa2_at_stage2().

> 
>> +{
>> +#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
>> +	unsigned int tgran;
>> +
>> +	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
>> +						ID_AA64MMFR0_EL1_TGRAN_SHIFT);
>> +	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
>> +#else
>> +	return false;
>> +#endif
> 
> Writing this using IS_ENABLED() would be slightly more pleasing to my
> tired eyes... ;-)

ACK.
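
(i.e. roughly the following, same logic as the #ifdef version above, and
similarly for has_lpa2_at_stage2():)

static bool has_lpa2_at_stage1(u64 mmfr0)
{
	unsigned int tgran;

	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
	    !IS_ENABLED(CONFIG_ARM64_16K_PAGES))
		return false;

	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
					ID_AA64MMFR0_EL1_TGRAN_SHIFT);
	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
}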

> 
>> +}
>> +
>> +static inline bool has_lpa2_at_stage2(u64 mmfr0)
>> +{
>> +#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
>> +	unsigned int tgran;
>> +
>> +	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
>> +						ID_AA64MMFR0_EL1_TGRAN_2_SHIFT);
>> +	return tgran == ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2;
>> +#else
>> +	return false;
>> +#endif
>> +}
>> +
>> +static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
>> +{
>> +	u64 mmfr0;
>> +	bool ret;
>> +
>> +	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
>> +	ret = has_lpa2_at_stage1(mmfr0);
>> +
>> +	if (kvm_get_mode() != KVM_MODE_NONE)
>> +		ret = ret && has_lpa2_at_stage2(mmfr0);
> 
> Isn't it too late to go back on the decision to use LPA2 at S1 if you
> realise that S2 doesn't support it?

The KVM-mode-dependent part was a change that Oliver asked for. I guess you are
talking about kernel S1? I don't think it's too late here to decide whether the
(nvhe) hyp S1 should use LPA2. But I guess your point is that kernel S1 would
have had to decide much earlier in boot, would have had to take LPA2 support
in both S1 and S2 into account, and would not have had the KVM mode info
available to it at that point?

> 
>> +
>> +	return ret;
>> +}
>> +
>>  #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
>>  #define KPTI_NG_TEMP_VA		(-(1UL << PMD_SHIFT))
>>  
>> @@ -2719,6 +2759,12 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>>  		.matches = has_cpuid_feature,
>>  		ARM64_CPUID_FIELDS(ID_AA64MMFR2_EL1, EVT, IMP)
>>  	},
>> +	{
>> +		.desc = "Large Physical Address 2",
>> +		.capability = ARM64_HAS_LPA2,
>> +		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
>> +		.matches = has_lpa2,
>> +	},
>>  	{},
>>  };
>>  
>> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
>> index dea3dc89234b..07f3957b8488 100644
>> --- a/arch/arm64/tools/cpucaps
>> +++ b/arch/arm64/tools/cpucaps
>> @@ -36,6 +36,7 @@ HAS_GIC_PRIO_MASKING
>>  HAS_GIC_PRIO_RELAXED_SYNC
>>  HAS_HCX
>>  HAS_LDAPR
>> +HAS_LPA2
>>  HAS_LSE_ATOMICS
>>  HAS_MOPS
>>  HAS_NESTED_VIRT
> 
> Why isn't this patch the first or second in the series? You could use
> it to drive the LPA2 decision in the patch #2, avoiding the ugly lpa2
> flag...

I still think this only works if we put my patch and Ard's patch in atomically.
Or at least force has_lpa2() to always return false until both are in, then flip
the switch atomically.
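
(Concretely, the interim state would just be a stub along these lines, to be
flipped once both series are merged:)

static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
{
	/*
	 * Force-disabled until kernel stage 1 LPA2 support (Ard's series)
	 * has landed; then switch to the real mmfr0 checks.
	 */
	return false;
}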

> 
> Thanks,
> 
> 	M.
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 06/12] KVM: arm64: Use LPA2 page-tables for stage2 and hyp stage1
  2023-10-20  9:16     ` Marc Zyngier
@ 2023-10-20 15:06       ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-20 15:06 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 20/10/2023 10:16, Marc Zyngier wrote:
> On Mon, 09 Oct 2023 19:50:02 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Implement a simple policy whereby if the HW supports FEAT_LPA2 for the
>> page size we are using, always use LPA2-style page-tables for stage 2
>> and hyp stage 1, regardless of the VMM-requested IPA size or
>> HW-implemented PA size. When in use we can now support up to 52-bit IPA
>> and PA sizes.
> 
> Maybe worth stating that this S1 comment only applies to the
> standalone EL2 portion, and not the VHE S1 mappings.

ACK.

> 
>>
>> We use the previously created cpu feature to track whether LPA2 is
>> supported for deciding whether to use the LPA2 or classic pte format.
>>
>> Note that FEAT_LPA2 brings support for bigger block mappings (512GB with
>> 4KB, 64GB with 16KB). We explicitly don't enable these in the library
>> because stage2_apply_range() works on batch sizes of the largest used
>> block mapping, and increasing the size of the batch would lead to soft
>> lockups. See commit 5994bc9e05c2 ("KVM: arm64: Limit
>> stage2_apply_range() batch size to largest block").
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_pgtable.h | 47 +++++++++++++++++++++-------
>>  arch/arm64/kvm/arm.c                 |  2 ++
>>  arch/arm64/kvm/hyp/nvhe/tlb.c        |  3 +-
>>  arch/arm64/kvm/hyp/pgtable.c         | 15 +++++++--
>>  arch/arm64/kvm/hyp/vhe/tlb.c         |  3 +-
>>  5 files changed, 54 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
>> index d3e354bb8351..b240158e1218 100644
>> --- a/arch/arm64/include/asm/kvm_pgtable.h
>> +++ b/arch/arm64/include/asm/kvm_pgtable.h
>> @@ -25,12 +25,22 @@
>>  #define KVM_PGTABLE_MIN_BLOCK_LEVEL	2U
>>  #endif
>>  
>> +static inline u64 kvm_get_parange_max(void)
>> +{
>> +	if (system_supports_lpa2() ||
>> +	   (IS_ENABLED(CONFIG_ARM64_PA_BITS_52) && PAGE_SIZE == SZ_64K))
> 
> nit: the rest of the code uses PAGE_SHIFT instead of PAGE_SIZE. Not a
> big deal, but being consistent might help the reader.

ACK.

> 
>> +		return ID_AA64MMFR0_EL1_PARANGE_52;
>> +	else
>> +		return ID_AA64MMFR0_EL1_PARANGE_48;
>> +}
>> +
>>  static inline u64 kvm_get_parange(u64 mmfr0)
>>  {
>> +	u64 parange_max = kvm_get_parange_max();
>>  	u64 parange = cpuid_feature_extract_unsigned_field(mmfr0,
>>  				ID_AA64MMFR0_EL1_PARANGE_SHIFT);
>> -	if (parange > ID_AA64MMFR0_EL1_PARANGE_MAX)
>> -		parange = ID_AA64MMFR0_EL1_PARANGE_MAX;
>> +	if (parange > parange_max)
>> +		parange = parange_max;
>>  
>>  	return parange;
>>  }
>> @@ -41,6 +51,8 @@ typedef u64 kvm_pte_t;
>>  
>>  #define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
>>  #define KVM_PTE_ADDR_51_48		GENMASK(15, 12)
>> +#define KVM_PTE_ADDR_MASK_LPA2		GENMASK(49, PAGE_SHIFT)
>> +#define KVM_PTE_ADDR_51_50_LPA2		GENMASK(9, 8)
>>  
>>  #define KVM_PHYS_INVALID		(-1ULL)
>>  
>> @@ -51,21 +63,34 @@ static inline bool kvm_pte_valid(kvm_pte_t pte)
>>  
>>  static inline u64 kvm_pte_to_phys(kvm_pte_t pte)
>>  {
>> -	u64 pa = pte & KVM_PTE_ADDR_MASK;
>> -
>> -	if (PAGE_SHIFT == 16)
>> -		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
>> +	u64 pa;
>> +
>> +	if (system_supports_lpa2()) {
>> +		pa = pte & KVM_PTE_ADDR_MASK_LPA2;
>> +		pa |= FIELD_GET(KVM_PTE_ADDR_51_50_LPA2, pte) << 50;
>> +	} else {
>> +		pa = pte & KVM_PTE_ADDR_MASK;
>> +		if (PAGE_SHIFT == 16)
>> +			pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
>> +	}
>>  
>>  	return pa;
>>  }
>>  
>>  static inline kvm_pte_t kvm_phys_to_pte(u64 pa)
>>  {
>> -	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
>> -
>> -	if (PAGE_SHIFT == 16) {
>> -		pa &= GENMASK(51, 48);
>> -		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
>> +	kvm_pte_t pte;
>> +
>> +	if (system_supports_lpa2()) {
>> +		pte = pa & KVM_PTE_ADDR_MASK_LPA2;
>> +		pa &= GENMASK(51, 50);
>> +		pte |= FIELD_PREP(KVM_PTE_ADDR_51_50_LPA2, pa >> 50);
>> +	} else {
>> +		pte = pa & KVM_PTE_ADDR_MASK;
>> +		if (PAGE_SHIFT == 16) {
>> +			pa &= GENMASK(51, 48);
>> +			pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
>> +		}
>>  	}
>>  
>>  	return pte;
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 4866b3f7b4ea..73cc67c2a8a7 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -1747,6 +1747,8 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
>>  	}
>>  	tcr &= ~TCR_T0SZ_MASK;
>>  	tcr |= TCR_T0SZ(hyp_va_bits);
>> +	if (system_supports_lpa2())
>> +		tcr |= TCR_EL2_DS;
>>  	params->tcr_el2 = tcr;
>>  
>>  	params->pgd_pa = kvm_mmu_get_httbr();
>> diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
>> index d42b72f78a9b..c3cd16c6f95f 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
>> @@ -198,7 +198,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>>  	/* Switch to requested VMID */
>>  	__tlb_switch_to_guest(mmu, &cxt, false);
>>  
>> -	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
>> +	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0,
>> +				system_supports_lpa2());
> 
> At this stage, I'd fully expect the flag to have been subsumed into
> the helper...

ACK. I'm planning to have has_lpa2() always return false for now. Then once
Ard's changes are in, we can change it to report the system status. Then we can
move this inside __flush_s2_tlb_range_op(). Does that work for you?
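
For reference, the end state I have in mind is roughly the below (just a
sketch; it assumes the lpa2 argument ordering that the tlbi patch earlier
in this series gives __flush_tlb_range_op()):

	/* The helper picks the TLBI format itself; callers stop passing lpa2. */
	#define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level)	\
		__flush_tlb_range_op(op, start, pages, stride, 0, tlb_level,	\
				     false, system_supports_lpa2())

and the call sites here go back to:

	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0);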

> 
>>  
>>  	dsb(ish);
>>  	__tlbi(vmalle1is);
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index f155b8c9e98c..062eb7bcdb8a 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -79,7 +79,10 @@ static bool kvm_pgtable_walk_skip_cmo(const struct kvm_pgtable_visit_ctx *ctx)
>>  
>>  static bool kvm_phys_is_valid(u64 phys)
>>  {
>> -	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
>> +	u64 parange_max = kvm_get_parange_max();
>> +	u8 shift = id_aa64mmfr0_parange_to_phys_shift(parange_max);
>> +
>> +	return phys < BIT(shift);
>>  }
>>  
>>  static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx, u64 phys)
>> @@ -408,7 +411,8 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
>>  	}
>>  
>>  	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
>> -	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
>> +	if (!system_supports_lpa2())
>> +		attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
>>  	attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
>>  	attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
>>  	*ptep = attr;
>> @@ -654,6 +658,9 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
>>  		vtcr |= VTCR_EL2_HA;
>>  #endif /* CONFIG_ARM64_HW_AFDBM */
>>  
>> +	if (system_supports_lpa2())
>> +		vtcr |= VTCR_EL2_DS;
>> +
>>  	/* Set the vmid bits */
>>  	vtcr |= (get_vmid_bits(mmfr1) == 16) ?
>>  		VTCR_EL2_VS_16BIT :
>> @@ -711,7 +718,9 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
>>  	if (prot & KVM_PGTABLE_PROT_W)
>>  		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
>>  
>> -	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
>> +	if (!system_supports_lpa2())
>> +		attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
>> +
>>  	attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
>>  	attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
>>  	*ptep = attr;
>> diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
>> index 6041c6c78984..40cea2482a76 100644
>> --- a/arch/arm64/kvm/hyp/vhe/tlb.c
>> +++ b/arch/arm64/kvm/hyp/vhe/tlb.c
>> @@ -161,7 +161,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>>  	/* Switch to requested VMID */
>>  	__tlb_switch_to_guest(mmu, &cxt);
>>  
>> -	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
>> +	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0,
>> +				system_supports_lpa2());
>>  
>>  	dsb(ish);
>>  	__tlbi(vmalle1is);
> 
> One thing I don't see here is how you update the tcr_compute_pa_size
> macro that is used on the initial nVHE setup, which is inconsistent
> with the kvm_get_parange_max() helper.

As you saw, it's in a separate patch.

> 
> Thanks,
> 
> 	M.
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 07/12] KVM: arm64: Prepare TCR_EL2.PS in cpu_prepare_hyp_mode()
  2023-10-20  9:21     ` Marc Zyngier
@ 2023-10-20 15:07       ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-20 15:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 20/10/2023 10:21, Marc Zyngier wrote:
> On Mon, 09 Oct 2023 19:50:03 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> With the addition of LPA2 support in the hypervisor, the PA size
>> supported by the HW must be capped with a runtime decision, rather than
>> simply using a compile-time decision based on PA_BITS. For example, on a
>> system that advertises 52 bit PA but does not support FEAT_LPA2, a 4KB
>> or 16KB kernel compiled with LPA2 support must still limit the PA size
>> to 48 bits.
>>
>> Therefore, move the insertion of the PS field into TCR_EL2 out of
>> __kvm_hyp_init assembly code and instead do it in cpu_prepare_hyp_mode()
>> where the rest of TCR_EL2 is prepared. This allows us to figure out PS
>> with kvm_get_parange(), which has the appropriate logic to ensure the
>> above requirement. (and the PS field of VTCR_EL2 is already populated
>> this way).
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  arch/arm64/kvm/arm.c               | 3 +++
>>  arch/arm64/kvm/hyp/nvhe/hyp-init.S | 4 ----
>>  2 files changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 73cc67c2a8a7..0bb8918475d2 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -1726,6 +1726,7 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
>>  {
>>  	struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
>>  	unsigned long tcr;
>> +	u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
> 
> nit: move this one up by a line (yes, I'm being difficult).

ACK.
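
If I've read the nit right, that just means keeping the initialised
declarations together, i.e.:

	struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
	u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
	unsigned long tcr;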

> 
>>  
>>  	/*
>>  	 * Calculate the raw per-cpu offset without a translation from the
>> @@ -1747,6 +1748,8 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
>>  	}
>>  	tcr &= ~TCR_T0SZ_MASK;
>>  	tcr |= TCR_T0SZ(hyp_va_bits);
>> +	tcr &= ~TCR_EL2_PS_MASK;
>> +	tcr |= FIELD_PREP(TCR_EL2_PS_MASK, kvm_get_parange(mmfr0));
>>  	if (system_supports_lpa2())
>>  		tcr |= TCR_EL2_DS;
>>  	params->tcr_el2 = tcr;
>> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
>> index 1cc06e6797bd..f62a7d360285 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
>> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
>> @@ -122,11 +122,7 @@ alternative_if ARM64_HAS_CNP
>>  alternative_else_nop_endif
>>  	msr	ttbr0_el2, x2
>>  
>> -	/*
>> -	 * Set the PS bits in TCR_EL2.
>> -	 */
>>  	ldr	x0, [x0, #NVHE_INIT_TCR_EL2]
>> -	tcr_compute_pa_size x0, #TCR_EL2_PS_SHIFT, x1, x2
>>  	msr	tcr_el2, x0
>>  
>>  	isb
> 
> Ah, this is where this was hiding. This should be folded into the
> previous patch for consistency (this is otherwise non-bisectable).

ACK.

> 
> Thanks,
> 
> 	M.
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 08/12] KVM: arm64: Convert translation level parameter to s8
  2023-10-20 10:42     ` Marc Zyngier
@ 2023-10-20 15:11       ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-20 15:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 20/10/2023 11:42, Marc Zyngier wrote:
> On Mon, 09 Oct 2023 19:50:04 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> With the introduction of FEAT_LPA2, the Arm ARM adds a new level of
>> translation, level -1, so levels can now be in the range [-1;3]. 3 is
>> always the last level and the first level is determined based on the
>> number of VA bits in use.
>>
>> Convert level variables to use a signed type in preparation for
>> supporting this new level -1.
>>
>> Since the last level is always anchored at 3, and the first level varies
>> to suit the number of VA/IPA bits, take the opportunity to replace
>> KVM_PGTABLE_MAX_LEVELS with the 2 macros KVM_PGTABLE_FIRST_LEVEL and
>> KVM_PGTABLE_LAST_LEVEL. This removes the assumption from the code that
>> levels run from 0 to KVM_PGTABLE_MAX_LEVELS - 1, which will soon no
>> longer be true.
>>
>> No behavioral changes intended.
> 
> Shrug. Unless you have compared the binaries before and after and
> proven that they are strictly identical, there will be behaviour
> changes, intended or otherwise.
> 
> I know what you're trying to convey, but I've seen so many patches
> carrying a sentence of this sort and yet turning the kernel on its
> head that I've become allergic to it. Sorry.

I picked this habit up from other KVM patches, so assumed it was preferred
practice. I'll remove, and avoid in future.

> 
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_emulate.h  |  2 +-
>>  arch/arm64/include/asm/kvm_pgtable.h  | 31 ++++++-------
>>  arch/arm64/include/asm/kvm_pkvm.h     |  5 ++-
>>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |  6 +--
>>  arch/arm64/kvm/hyp/nvhe/mm.c          |  4 +-
>>  arch/arm64/kvm/hyp/nvhe/setup.c       |  2 +-
>>  arch/arm64/kvm/hyp/pgtable.c          | 64 ++++++++++++++-------------
>>  arch/arm64/kvm/mmu.c                  | 16 ++++---
>>  8 files changed, 69 insertions(+), 61 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index 3d6725ff0bf6..bf3ef66eb51f 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -404,7 +404,7 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
>>  	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
>>  }
>>  
>> -static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
>> +static __always_inline s8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
>>  {
>>  	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
>>  }
>> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
>> index b240158e1218..c61bb9709201 100644
>> --- a/arch/arm64/include/asm/kvm_pgtable.h
>> +++ b/arch/arm64/include/asm/kvm_pgtable.h
>> @@ -11,7 +11,8 @@
>>  #include <linux/kvm_host.h>
>>  #include <linux/types.h>
>>  
>> -#define KVM_PGTABLE_MAX_LEVELS		4U
>> +#define KVM_PGTABLE_FIRST_LEVEL		0
>> +#define KVM_PGTABLE_LAST_LEVEL		3
>>  
>>  /*
>>   * The largest supported block sizes for KVM (no 52-bit PA support):
>> @@ -20,9 +21,9 @@
>>   *  - 64K (level 2):	512MB
>>   */
>>  #ifdef CONFIG_ARM64_4K_PAGES
>> -#define KVM_PGTABLE_MIN_BLOCK_LEVEL	1U
>> +#define KVM_PGTABLE_MIN_BLOCK_LEVEL	1
>>  #else
>> -#define KVM_PGTABLE_MIN_BLOCK_LEVEL	2U
>> +#define KVM_PGTABLE_MIN_BLOCK_LEVEL	2
>>  #endif
>>  
>>  static inline u64 kvm_get_parange_max(void)
>> @@ -101,28 +102,28 @@ static inline kvm_pfn_t kvm_pte_to_pfn(kvm_pte_t pte)
>>  	return __phys_to_pfn(kvm_pte_to_phys(pte));
>>  }
>>  
>> -static inline u64 kvm_granule_shift(u32 level)
>> +static inline u64 kvm_granule_shift(s8 level)
>>  {
>> -	/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
>> +	/* Assumes KVM_PGTABLE_LAST_LEVEL is 3 */
>>  	return ARM64_HW_PGTABLE_LEVEL_SHIFT(level);
> 
> I'm amazed that the macro tolerates a negative level, but it really
> does.

Yep. I remember I spent quite a while convincing myself of this, because I also
initially assumed that it would not.
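
For anyone else wanting to convince themselves: the macro is linear in the
level, so a negative n falls out naturally. Assuming the definition is
still ((PAGE_SHIFT - 3) * (4 - (n)) + 3), then for 4K pages
(PAGE_SHIFT == 12):

	level  3: 9 * 1 + 3 = 12
	level  0: 9 * 4 + 3 = 39
	level -1: 9 * 5 + 3 = 48

which is exactly the shift needed for the new level, since a 52-bit VA
then leaves 52 - 48 = 4 bits, i.e. 16 entries in the level -1 table.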

> 
>>  }
>>  
>> -static inline u64 kvm_granule_size(u32 level)
>> +static inline u64 kvm_granule_size(s8 level)
>>  {
>>  	return BIT(kvm_granule_shift(level));
>>  }
>>  
>> -static inline bool kvm_level_supports_block_mapping(u32 level)
>> +static inline bool kvm_level_supports_block_mapping(s8 level)
>>  {
>>  	return level >= KVM_PGTABLE_MIN_BLOCK_LEVEL;
>>  }
>>  
>>  static inline u32 kvm_supported_block_sizes(void)
>>  {
>> -	u32 level = KVM_PGTABLE_MIN_BLOCK_LEVEL;
>> +	s8 level = KVM_PGTABLE_MIN_BLOCK_LEVEL;
>>  	u32 r = 0;
>>  
>> -	for (; level < KVM_PGTABLE_MAX_LEVELS; level++)
>> +	for (; level <= KVM_PGTABLE_LAST_LEVEL; level++)
>>  		r |= BIT(kvm_granule_shift(level));
>>  
>>  	return r;
>> @@ -167,7 +168,7 @@ struct kvm_pgtable_mm_ops {
>>  	void*		(*zalloc_page)(void *arg);
>>  	void*		(*zalloc_pages_exact)(size_t size);
>>  	void		(*free_pages_exact)(void *addr, size_t size);
>> -	void		(*free_unlinked_table)(void *addr, u32 level);
>> +	void		(*free_unlinked_table)(void *addr, s8 level);
>>  	void		(*get_page)(void *addr);
>>  	void		(*put_page)(void *addr);
>>  	int		(*page_count)(void *addr);
>> @@ -263,7 +264,7 @@ struct kvm_pgtable_visit_ctx {
>>  	u64					start;
>>  	u64					addr;
>>  	u64					end;
>> -	u32					level;
>> +	s8					level;
>>  	enum kvm_pgtable_walk_flags		flags;
>>  };
>>  
>> @@ -366,7 +367,7 @@ static inline bool kvm_pgtable_walk_lock_held(void)
>>   */
>>  struct kvm_pgtable {
>>  	u32					ia_bits;
>> -	u32					start_level;
>> +	s8					start_level;
>>  	kvm_pteref_t				pgd;
>>  	struct kvm_pgtable_mm_ops		*mm_ops;
>>  
>> @@ -500,7 +501,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
>>   * The page-table is assumed to be unreachable by any hardware walkers prior to
>>   * freeing and therefore no TLB invalidation is performed.
>>   */
>> -void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level);
>> +void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level);
>>  
>>  /**
>>   * kvm_pgtable_stage2_create_unlinked() - Create an unlinked stage-2 paging structure.
>> @@ -524,7 +525,7 @@ void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *p
>>   * an ERR_PTR(error) on failure.
>>   */
>>  kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
>> -					      u64 phys, u32 level,
>> +					      u64 phys, s8 level,
>>  					      enum kvm_pgtable_prot prot,
>>  					      void *mc, bool force_pte);
>>  
>> @@ -750,7 +751,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>>   * Return: 0 on success, negative error code on failure.
>>   */
>>  int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
>> -			 kvm_pte_t *ptep, u32 *level);
>> +			 kvm_pte_t *ptep, s8 *level);
>>  
>>  /**
>>   * kvm_pgtable_stage2_pte_prot() - Retrieve the protection attributes of a
>> diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
>> index e46250a02017..ad9cfb5c1ff4 100644
>> --- a/arch/arm64/include/asm/kvm_pkvm.h
>> +++ b/arch/arm64/include/asm/kvm_pkvm.h
>> @@ -56,10 +56,11 @@ static inline unsigned long hyp_vm_table_pages(void)
>>  
>>  static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
>>  {
>> -	unsigned long total = 0, i;
>> +	unsigned long total = 0;
>> +	int i;
>>  
>>  	/* Provision the worst case scenario */
>> -	for (i = 0; i < KVM_PGTABLE_MAX_LEVELS; i++) {
>> +	for (i = KVM_PGTABLE_FIRST_LEVEL; i <= KVM_PGTABLE_LAST_LEVEL; i++) {
>>  		nr_pages = DIV_ROUND_UP(nr_pages, PTRS_PER_PTE);
>>  		total += nr_pages;
>>  	}
>> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
>> index 9d703441278b..2cfb6352a8ea 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
>> @@ -91,7 +91,7 @@ static void host_s2_put_page(void *addr)
>>  	hyp_put_page(&host_s2_pool, addr);
>>  }
>>  
>> -static void host_s2_free_unlinked_table(void *addr, u32 level)
>> +static void host_s2_free_unlinked_table(void *addr, s8 level)
>>  {
>>  	kvm_pgtable_stage2_free_unlinked(&host_mmu.mm_ops, addr, level);
>>  }
>> @@ -443,7 +443,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
>>  {
>>  	struct kvm_mem_range cur;
>>  	kvm_pte_t pte;
>> -	u32 level;
>> +	s8 level;
>>  	int ret;
>>  
>>  	hyp_assert_lock_held(&host_mmu.lock);
>> @@ -462,7 +462,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
>>  		cur.start = ALIGN_DOWN(addr, granule);
>>  		cur.end = cur.start + granule;
>>  		level++;
>> -	} while ((level < KVM_PGTABLE_MAX_LEVELS) &&
>> +	} while ((level <= KVM_PGTABLE_LAST_LEVEL) &&
>>  			!(kvm_level_supports_block_mapping(level) &&
>>  			  range_included(&cur, range)));
>>  
>> diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
>> index 65a7a186d7b2..b01a3d1078a8 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/mm.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
>> @@ -260,7 +260,7 @@ static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
>>  	 * https://lore.kernel.org/kvm/20221017115209.2099-1-will@kernel.org/T/#mf10dfbaf1eaef9274c581b81c53758918c1d0f03
>>  	 */
>>  	dsb(ishst);
>> -	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), (KVM_PGTABLE_MAX_LEVELS - 1));
>> +	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), KVM_PGTABLE_LAST_LEVEL);
>>  	dsb(ish);
>>  	isb();
>>  }
>> @@ -275,7 +275,7 @@ static int __create_fixmap_slot_cb(const struct kvm_pgtable_visit_ctx *ctx,
>>  {
>>  	struct hyp_fixmap_slot *slot = per_cpu_ptr(&fixmap_slots, (u64)ctx->arg);
>>  
>> -	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_MAX_LEVELS - 1)
>> +	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_LAST_LEVEL)
>>  		return -EINVAL;
>>  
>>  	slot->addr = ctx->addr;
>> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
>> index 0d5e0a89ddce..bc58d1b515af 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
>> @@ -181,7 +181,7 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>  	if (!kvm_pte_valid(ctx->old))
>>  		return 0;
>>  
>> -	if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
>> +	if (ctx->level != KVM_PGTABLE_LAST_LEVEL)
>>  		return -EINVAL;
>>  
>>  	phys = kvm_pte_to_phys(ctx->old);
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index 062eb7bcdb8a..8e79ff6972ce 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -101,7 +101,7 @@ static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx,
>>  	return IS_ALIGNED(ctx->addr, granule);
>>  }
>>  
>> -static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
>> +static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, s8 level)
>>  {
>>  	u64 shift = kvm_granule_shift(level);
>>  	u64 mask = BIT(PAGE_SHIFT - 3) - 1;
>> @@ -117,7 +117,7 @@ static u32 kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
>>  	return (addr & mask) >> shift;
>>  }
>>  
>> -static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
>> +static u32 kvm_pgd_pages(u32 ia_bits, s8 start_level)
>>  {
>>  	struct kvm_pgtable pgt = {
>>  		.ia_bits	= ia_bits,
>> @@ -127,9 +127,9 @@ static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
>>  	return kvm_pgd_page_idx(&pgt, -1ULL) + 1;
>>  }
>>  
>> -static bool kvm_pte_table(kvm_pte_t pte, u32 level)
>> +static bool kvm_pte_table(kvm_pte_t pte, s8 level)
>>  {
>> -	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
>> +	if (level == KVM_PGTABLE_LAST_LEVEL)
>>  		return false;
>>  
>>  	if (!kvm_pte_valid(pte))
>> @@ -157,11 +157,11 @@ static kvm_pte_t kvm_init_table_pte(kvm_pte_t *childp, struct kvm_pgtable_mm_ops
>>  	return pte;
>>  }
>>  
>> -static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
>> +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, s8 level)
>>  {
>>  	kvm_pte_t pte = kvm_phys_to_pte(pa);
>> -	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
>> -							   KVM_PTE_TYPE_BLOCK;
>> +	u64 type = (level == KVM_PGTABLE_LAST_LEVEL) ? KVM_PTE_TYPE_PAGE :
>> +						       KVM_PTE_TYPE_BLOCK;
>>  
>>  	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
>>  	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
>> @@ -206,11 +206,11 @@ static bool kvm_pgtable_walk_continue(const struct kvm_pgtable_walker *walker,
>>  }
>>  
>>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>> -			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level);
>> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, s8 level);
>>  
>>  static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>>  				      struct kvm_pgtable_mm_ops *mm_ops,
>> -				      kvm_pteref_t pteref, u32 level)
>> +				      kvm_pteref_t pteref, s8 level)
>>  {
>>  	enum kvm_pgtable_walk_flags flags = data->walker->flags;
>>  	kvm_pte_t *ptep = kvm_dereference_pteref(data->walker, pteref);
>> @@ -275,12 +275,12 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>>  }
>>  
>>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>> -			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level)
>> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, s8 level)
>>  {
>>  	u32 idx;
>>  	int ret = 0;
>>  
>> -	if (WARN_ON_ONCE(level >= KVM_PGTABLE_MAX_LEVELS))
>> +	if (WARN_ON_ONCE(level > KVM_PGTABLE_LAST_LEVEL))
>>  		return -EINVAL;
> 
> Now that level can be negative, you may want to check it against
> KVM_PGTABLE_FIRST_LEVEL as well.

ACK.
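
i.e. something like (sketch):

	if (WARN_ON_ONCE(level < KVM_PGTABLE_FIRST_LEVEL ||
			 level > KVM_PGTABLE_LAST_LEVEL))
		return -EINVAL;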

> 
>>  
>>  	for (idx = kvm_pgtable_idx(data, level); idx < PTRS_PER_PTE; ++idx) {
>> @@ -343,7 +343,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>>  
>>  struct leaf_walk_data {
>>  	kvm_pte_t	pte;
>> -	u32		level;
>> +	s8		level;
>>  };
>>  
>>  static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
>> @@ -358,7 +358,7 @@ static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>  }
>>  
>>  int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
>> -			 kvm_pte_t *ptep, u32 *level)
>> +			 kvm_pte_t *ptep, s8 *level)
>>  {
>>  	struct leaf_walk_data data;
>>  	struct kvm_pgtable_walker walker = {
>> @@ -471,7 +471,7 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>  	if (hyp_map_walker_try_leaf(ctx, data))
>>  		return 0;
>>  
>> -	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>> +	if (WARN_ON(ctx->level == KVM_PGTABLE_LAST_LEVEL))
>>  		return -EINVAL;
> 
> Same thing.

ACK.

> 
>>  
>>  	childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
>> @@ -567,14 +567,18 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>>  int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>>  			 struct kvm_pgtable_mm_ops *mm_ops)
>>  {
>> -	u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
>> +	s8 start_level = KVM_PGTABLE_LAST_LEVEL + 1 -
>> +			 ARM64_HW_PGTABLE_LEVELS(va_bits);
>> +	if (start_level < KVM_PGTABLE_FIRST_LEVEL ||
>> +	    start_level > KVM_PGTABLE_LAST_LEVEL)
>> +		return -EINVAL;
> 
> Please add a new line between the variable definition and the if ()
> statement.

Hmm, I'm surprised that checkpatch.pl didn't flag this. Or more likely I
somehow didn't run it for this patch...

> 
>>  
>>  	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_page(NULL);
>>  	if (!pgt->pgd)
>>  		return -ENOMEM;
>>  
>>  	pgt->ia_bits		= va_bits;
>> -	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
>> +	pgt->start_level	= start_level;
>>  	pgt->mm_ops		= mm_ops;
>>  	pgt->mmu		= NULL;
>>  	pgt->force_pte_cb	= NULL;
>> @@ -628,7 +632,7 @@ struct stage2_map_data {
>>  u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
>>  {
>>  	u64 vtcr = VTCR_EL2_FLAGS;
>> -	u8 lvls;
>> +	s8 lvls;
>>  
>>  	vtcr |= kvm_get_parange(mmfr0) << VTCR_EL2_PS_SHIFT;
>>  	vtcr |= VTCR_EL2_T0SZ(phys_shift);
>> @@ -911,7 +915,7 @@ static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
>>  {
>>  	u64 phys = stage2_map_walker_phys_addr(ctx, data);
>>  
>> -	if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
>> +	if (data->force_pte && ctx->level < KVM_PGTABLE_LAST_LEVEL)
>>  		return false;
>>  
>>  	return kvm_block_mapping_supported(ctx, phys);
>> @@ -990,7 +994,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>>  	if (ret != -E2BIG)
>>  		return ret;
>>  
>> -	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>> +	if (WARN_ON(ctx->level == KVM_PGTABLE_LAST_LEVEL))
>>  		return -EINVAL;
>>  
>>  	if (!data->memcache)
>> @@ -1160,7 +1164,7 @@ struct stage2_attr_data {
>>  	kvm_pte_t			attr_set;
>>  	kvm_pte_t			attr_clr;
>>  	kvm_pte_t			pte;
>> -	u32				level;
>> +	s8				level;
>>  };
>>  
>>  static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>> @@ -1203,7 +1207,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>  static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>>  				    u64 size, kvm_pte_t attr_set,
>>  				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte,
>> -				    u32 *level, enum kvm_pgtable_walk_flags flags)
>> +				    s8 *level, enum kvm_pgtable_walk_flags flags)
>>  {
>>  	int ret;
>>  	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
>> @@ -1305,7 +1309,7 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
>>  				   enum kvm_pgtable_prot prot)
>>  {
>>  	int ret;
>> -	u32 level;
>> +	s8 level;
>>  	kvm_pte_t set = 0, clr = 0;
>>  
>>  	if (prot & KVM_PTE_LEAF_ATTR_HI_SW)
>> @@ -1358,7 +1362,7 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
>>  }
>>  
>>  kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
>> -					      u64 phys, u32 level,
>> +					      u64 phys, s8 level,
>>  					      enum kvm_pgtable_prot prot,
>>  					      void *mc, bool force_pte)
>>  {
>> @@ -1416,7 +1420,7 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
>>   * fully populated tree up to the PTE entries. Note that @level is
>>   * interpreted as in "level @level entry".
>>   */
>> -static int stage2_block_get_nr_page_tables(u32 level)
>> +static int stage2_block_get_nr_page_tables(s8 level)
>>  {
>>  	switch (level) {
>>  	case 1:
>> @@ -1427,7 +1431,7 @@ static int stage2_block_get_nr_page_tables(u32 level)
>>  		return 0;
>>  	default:
>>  		WARN_ON_ONCE(level < KVM_PGTABLE_MIN_BLOCK_LEVEL ||
>> -			     level >= KVM_PGTABLE_MAX_LEVELS);
>> +			     level > KVM_PGTABLE_LAST_LEVEL);
>>  		return -EINVAL;
>>  	};
>>  }
>> @@ -1440,13 +1444,13 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>  	struct kvm_s2_mmu *mmu;
>>  	kvm_pte_t pte = ctx->old, new, *childp;
>>  	enum kvm_pgtable_prot prot;
>> -	u32 level = ctx->level;
>> +	s8 level = ctx->level;
>>  	bool force_pte;
>>  	int nr_pages;
>>  	u64 phys;
>>  
>>  	/* No huge-pages exist at the last level */
>> -	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
>> +	if (level == KVM_PGTABLE_LAST_LEVEL)
>>  		return 0;
>>  
>>  	/* We only split valid block mappings */
>> @@ -1523,7 +1527,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>>  	u64 vtcr = mmu->arch->vtcr;
>>  	u32 ia_bits = VTCR_EL2_IPA(vtcr);
>>  	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
>> -	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
>> +	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
>>  
>>  	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
>>  	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_pages_exact(pgd_sz);
>> @@ -1546,7 +1550,7 @@ size_t kvm_pgtable_stage2_pgd_size(u64 vtcr)
>>  {
>>  	u32 ia_bits = VTCR_EL2_IPA(vtcr);
>>  	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
>> -	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
>> +	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
>>  
>>  	return kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
>>  }
>> @@ -1582,7 +1586,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>>  	pgt->pgd = NULL;
>>  }
>>  
>> -void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level)
>> +void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level)
>>  {
>>  	kvm_pteref_t ptep = (kvm_pteref_t)pgtable;
>>  	struct kvm_pgtable_walker walker = {
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 482280fe22d7..73110ba3624c 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -223,12 +223,12 @@ static void stage2_free_unlinked_table_rcu_cb(struct rcu_head *head)
>>  {
>>  	struct page *page = container_of(head, struct page, rcu_head);
>>  	void *pgtable = page_to_virt(page);
>> -	u32 level = page_private(page);
>> +	s8 level = page_private(page);
>>  
>>  	kvm_pgtable_stage2_free_unlinked(&kvm_s2_mm_ops, pgtable, level);
>>  }
>>  
>> -static void stage2_free_unlinked_table(void *addr, u32 level)
>> +static void stage2_free_unlinked_table(void *addr, s8 level)
>>  {
>>  	struct page *page = virt_to_page(addr);
>>  
>> @@ -804,13 +804,13 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>>  	struct kvm_pgtable pgt = {
>>  		.pgd		= (kvm_pteref_t)kvm->mm->pgd,
>>  		.ia_bits	= vabits_actual,
>> -		.start_level	= (KVM_PGTABLE_MAX_LEVELS -
>> -				   CONFIG_PGTABLE_LEVELS),
>> +		.start_level	= (KVM_PGTABLE_LAST_LEVEL -
>> +				   CONFIG_PGTABLE_LEVELS + 1),
>>  		.mm_ops		= &kvm_user_mm_ops,
>>  	};
>>  	unsigned long flags;
>>  	kvm_pte_t pte = 0;	/* Keep GCC quiet... */
>> -	u32 level = ~0;
>> +	s8 level = ~0;
> 
> Well, that's a semantic difference. ~0 == -1, which is a valid level,
> while the original code was trying to initialise level to something
> invalid. On the bright side, this function is going away in 6.7...

ACK.
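
Right - now that -1 is a legal level, the "invalid" marker has to sit
outside the range, e.g. (sketch; moot if the function goes away in 6.7):

	s8 level = KVM_PGTABLE_FIRST_LEVEL - 1;	/* deliberately out of range */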

> 
>>  	int ret;
>>  
>>  	/*
>> @@ -829,7 +829,9 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>>  	 * Not seeing an error, but not updating level? Something went
>>  	 * deeply wrong...
>>  	 */
>> -	if (WARN_ON(level >= KVM_PGTABLE_MAX_LEVELS))
>> +	if (WARN_ON(level > KVM_PGTABLE_LAST_LEVEL))
>> +		return -EFAULT;
>> +	if (WARN_ON(level < KVM_PGTABLE_FIRST_LEVEL))
>>  		return -EFAULT;
>>  
>>  	/* Oops, the userspace PTs are gone... Replay the fault */
>> @@ -1407,7 +1409,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	gfn_t gfn;
>>  	kvm_pfn_t pfn;
>>  	bool logging_active = memslot_is_logging(memslot);
>> -	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
>> +	s8 fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
>>  	long vma_pagesize, fault_granule;
>>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>>  	struct kvm_pgtable *pgt;
> 
> Thanks,
> 
> 	M.
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 08/12] KVM: arm64: Convert translation level parameter to s8
@ 2023-10-20 15:11       ` Ryan Roberts
  0 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-20 15:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 20/10/2023 11:42, Marc Zyngier wrote:
> On Mon, 09 Oct 2023 19:50:04 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> With the introduction of FEAT_LPA2, the Arm ARM adds a new level of
>> translation, level -1, so levels can now be in the range [-1;3]. 3 is
>> always the last level and the first level is determined based on the
>> number of VA bits in use.
>>
>> Convert level variables to use a signed type in preparation for
>> supporting this new level -1.
>>
>> Since the last level is always anchored at 3, and the first level varies
>> to suit the number of VA/IPA bits, take the opportunity to replace
>> KVM_PGTABLE_MAX_LEVELS with the 2 macros KVM_PGTABLE_FIRST_LEVEL and
>> KVM_PGTABLE_LAST_LEVEL. This removes the assumption from the code that
>> levels run from 0 to KVM_PGTABLE_MAX_LEVELS - 1, which will soon no
>> longer be true.
>>
>> No behavioral changes intended.
> 
> Shrug. Unless you have compared the binaries before and after and
> proven that they are strictly identical, there will be behaviour
> changes, intended or otherwise.
> 
> I know what you're trying to convey, but I've seen so many patches
> carrying a sentence of this sort and yet turning the kernel on its
> head that I've become allergic to it. Sorry.

I picked this habit up from other KVM patches, so assumed it was preferred
practice. I'll remove, and avoid in future.

> 
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_emulate.h  |  2 +-
>>  arch/arm64/include/asm/kvm_pgtable.h  | 31 ++++++-------
>>  arch/arm64/include/asm/kvm_pkvm.h     |  5 ++-
>>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |  6 +--
>>  arch/arm64/kvm/hyp/nvhe/mm.c          |  4 +-
>>  arch/arm64/kvm/hyp/nvhe/setup.c       |  2 +-
>>  arch/arm64/kvm/hyp/pgtable.c          | 64 ++++++++++++++-------------
>>  arch/arm64/kvm/mmu.c                  | 16 ++++---
>>  8 files changed, 69 insertions(+), 61 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index 3d6725ff0bf6..bf3ef66eb51f 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -404,7 +404,7 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
>>  	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
>>  }
>>  
>> -static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
>> +static __always_inline s8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
>>  {
>>  	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
>>  }
>> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
>> index b240158e1218..c61bb9709201 100644
>> --- a/arch/arm64/include/asm/kvm_pgtable.h
>> +++ b/arch/arm64/include/asm/kvm_pgtable.h
>> @@ -11,7 +11,8 @@
>>  #include <linux/kvm_host.h>
>>  #include <linux/types.h>
>>  
>> -#define KVM_PGTABLE_MAX_LEVELS		4U
>> +#define KVM_PGTABLE_FIRST_LEVEL		0
>> +#define KVM_PGTABLE_LAST_LEVEL		3
>>  
>>  /*
>>   * The largest supported block sizes for KVM (no 52-bit PA support):
>> @@ -20,9 +21,9 @@
>>   *  - 64K (level 2):	512MB
>>   */
>>  #ifdef CONFIG_ARM64_4K_PAGES
>> -#define KVM_PGTABLE_MIN_BLOCK_LEVEL	1U
>> +#define KVM_PGTABLE_MIN_BLOCK_LEVEL	1
>>  #else
>> -#define KVM_PGTABLE_MIN_BLOCK_LEVEL	2U
>> +#define KVM_PGTABLE_MIN_BLOCK_LEVEL	2
>>  #endif
>>  
>>  static inline u64 kvm_get_parange_max(void)
>> @@ -101,28 +102,28 @@ static inline kvm_pfn_t kvm_pte_to_pfn(kvm_pte_t pte)
>>  	return __phys_to_pfn(kvm_pte_to_phys(pte));
>>  }
>>  
>> -static inline u64 kvm_granule_shift(u32 level)
>> +static inline u64 kvm_granule_shift(s8 level)
>>  {
>> -	/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
>> +	/* Assumes KVM_PGTABLE_LAST_LEVEL is 3 */
>>  	return ARM64_HW_PGTABLE_LEVEL_SHIFT(level);
> 
> I'm amazed that the macro tolerates a negative level, but it really
> does.

Yep. I remember I spent quite a while convincing myself of this, because I also
initially assumed that it would not.

> 
>>  }
>>  
>> -static inline u64 kvm_granule_size(u32 level)
>> +static inline u64 kvm_granule_size(s8 level)
>>  {
>>  	return BIT(kvm_granule_shift(level));
>>  }
>>  
>> -static inline bool kvm_level_supports_block_mapping(u32 level)
>> +static inline bool kvm_level_supports_block_mapping(s8 level)
>>  {
>>  	return level >= KVM_PGTABLE_MIN_BLOCK_LEVEL;
>>  }
>>  
>>  static inline u32 kvm_supported_block_sizes(void)
>>  {
>> -	u32 level = KVM_PGTABLE_MIN_BLOCK_LEVEL;
>> +	s8 level = KVM_PGTABLE_MIN_BLOCK_LEVEL;
>>  	u32 r = 0;
>>  
>> -	for (; level < KVM_PGTABLE_MAX_LEVELS; level++)
>> +	for (; level <= KVM_PGTABLE_LAST_LEVEL; level++)
>>  		r |= BIT(kvm_granule_shift(level));
>>  
>>  	return r;
>> @@ -167,7 +168,7 @@ struct kvm_pgtable_mm_ops {
>>  	void*		(*zalloc_page)(void *arg);
>>  	void*		(*zalloc_pages_exact)(size_t size);
>>  	void		(*free_pages_exact)(void *addr, size_t size);
>> -	void		(*free_unlinked_table)(void *addr, u32 level);
>> +	void		(*free_unlinked_table)(void *addr, s8 level);
>>  	void		(*get_page)(void *addr);
>>  	void		(*put_page)(void *addr);
>>  	int		(*page_count)(void *addr);
>> @@ -263,7 +264,7 @@ struct kvm_pgtable_visit_ctx {
>>  	u64					start;
>>  	u64					addr;
>>  	u64					end;
>> -	u32					level;
>> +	s8					level;
>>  	enum kvm_pgtable_walk_flags		flags;
>>  };
>>  
>> @@ -366,7 +367,7 @@ static inline bool kvm_pgtable_walk_lock_held(void)
>>   */
>>  struct kvm_pgtable {
>>  	u32					ia_bits;
>> -	u32					start_level;
>> +	s8					start_level;
>>  	kvm_pteref_t				pgd;
>>  	struct kvm_pgtable_mm_ops		*mm_ops;
>>  
>> @@ -500,7 +501,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
>>   * The page-table is assumed to be unreachable by any hardware walkers prior to
>>   * freeing and therefore no TLB invalidation is performed.
>>   */
>> -void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level);
>> +void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level);
>>  
>>  /**
>>   * kvm_pgtable_stage2_create_unlinked() - Create an unlinked stage-2 paging structure.
>> @@ -524,7 +525,7 @@ void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *p
>>   * an ERR_PTR(error) on failure.
>>   */
>>  kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
>> -					      u64 phys, u32 level,
>> +					      u64 phys, s8 level,
>>  					      enum kvm_pgtable_prot prot,
>>  					      void *mc, bool force_pte);
>>  
>> @@ -750,7 +751,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>>   * Return: 0 on success, negative error code on failure.
>>   */
>>  int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
>> -			 kvm_pte_t *ptep, u32 *level);
>> +			 kvm_pte_t *ptep, s8 *level);
>>  
>>  /**
>>   * kvm_pgtable_stage2_pte_prot() - Retrieve the protection attributes of a
>> diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
>> index e46250a02017..ad9cfb5c1ff4 100644
>> --- a/arch/arm64/include/asm/kvm_pkvm.h
>> +++ b/arch/arm64/include/asm/kvm_pkvm.h
>> @@ -56,10 +56,11 @@ static inline unsigned long hyp_vm_table_pages(void)
>>  
>>  static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
>>  {
>> -	unsigned long total = 0, i;
>> +	unsigned long total = 0;
>> +	int i;
>>  
>>  	/* Provision the worst case scenario */
>> -	for (i = 0; i < KVM_PGTABLE_MAX_LEVELS; i++) {
>> +	for (i = KVM_PGTABLE_FIRST_LEVEL; i <= KVM_PGTABLE_LAST_LEVEL; i++) {
>>  		nr_pages = DIV_ROUND_UP(nr_pages, PTRS_PER_PTE);
>>  		total += nr_pages;
>>  	}
>> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
>> index 9d703441278b..2cfb6352a8ea 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
>> @@ -91,7 +91,7 @@ static void host_s2_put_page(void *addr)
>>  	hyp_put_page(&host_s2_pool, addr);
>>  }
>>  
>> -static void host_s2_free_unlinked_table(void *addr, u32 level)
>> +static void host_s2_free_unlinked_table(void *addr, s8 level)
>>  {
>>  	kvm_pgtable_stage2_free_unlinked(&host_mmu.mm_ops, addr, level);
>>  }
>> @@ -443,7 +443,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
>>  {
>>  	struct kvm_mem_range cur;
>>  	kvm_pte_t pte;
>> -	u32 level;
>> +	s8 level;
>>  	int ret;
>>  
>>  	hyp_assert_lock_held(&host_mmu.lock);
>> @@ -462,7 +462,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
>>  		cur.start = ALIGN_DOWN(addr, granule);
>>  		cur.end = cur.start + granule;
>>  		level++;
>> -	} while ((level < KVM_PGTABLE_MAX_LEVELS) &&
>> +	} while ((level <= KVM_PGTABLE_LAST_LEVEL) &&
>>  			!(kvm_level_supports_block_mapping(level) &&
>>  			  range_included(&cur, range)));
>>  
>> diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
>> index 65a7a186d7b2..b01a3d1078a8 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/mm.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
>> @@ -260,7 +260,7 @@ static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
>>  	 * https://lore.kernel.org/kvm/20221017115209.2099-1-will@kernel.org/T/#mf10dfbaf1eaef9274c581b81c53758918c1d0f03
>>  	 */
>>  	dsb(ishst);
>> -	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), (KVM_PGTABLE_MAX_LEVELS - 1));
>> +	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), KVM_PGTABLE_LAST_LEVEL);
>>  	dsb(ish);
>>  	isb();
>>  }
>> @@ -275,7 +275,7 @@ static int __create_fixmap_slot_cb(const struct kvm_pgtable_visit_ctx *ctx,
>>  {
>>  	struct hyp_fixmap_slot *slot = per_cpu_ptr(&fixmap_slots, (u64)ctx->arg);
>>  
>> -	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_MAX_LEVELS - 1)
>> +	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_LAST_LEVEL)
>>  		return -EINVAL;
>>  
>>  	slot->addr = ctx->addr;
>> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
>> index 0d5e0a89ddce..bc58d1b515af 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
>> @@ -181,7 +181,7 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>  	if (!kvm_pte_valid(ctx->old))
>>  		return 0;
>>  
>> -	if (ctx->level != (KVM_PGTABLE_MAX_LEVELS - 1))
>> +	if (ctx->level != KVM_PGTABLE_LAST_LEVEL)
>>  		return -EINVAL;
>>  
>>  	phys = kvm_pte_to_phys(ctx->old);
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index 062eb7bcdb8a..8e79ff6972ce 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -101,7 +101,7 @@ static bool kvm_block_mapping_supported(const struct kvm_pgtable_visit_ctx *ctx,
>>  	return IS_ALIGNED(ctx->addr, granule);
>>  }
>>  
>> -static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
>> +static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, s8 level)
>>  {
>>  	u64 shift = kvm_granule_shift(level);
>>  	u64 mask = BIT(PAGE_SHIFT - 3) - 1;
>> @@ -117,7 +117,7 @@ static u32 kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
>>  	return (addr & mask) >> shift;
>>  }
>>  
>> -static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
>> +static u32 kvm_pgd_pages(u32 ia_bits, s8 start_level)
>>  {
>>  	struct kvm_pgtable pgt = {
>>  		.ia_bits	= ia_bits,
>> @@ -127,9 +127,9 @@ static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
>>  	return kvm_pgd_page_idx(&pgt, -1ULL) + 1;
>>  }
>>  
>> -static bool kvm_pte_table(kvm_pte_t pte, u32 level)
>> +static bool kvm_pte_table(kvm_pte_t pte, s8 level)
>>  {
>> -	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
>> +	if (level == KVM_PGTABLE_LAST_LEVEL)
>>  		return false;
>>  
>>  	if (!kvm_pte_valid(pte))
>> @@ -157,11 +157,11 @@ static kvm_pte_t kvm_init_table_pte(kvm_pte_t *childp, struct kvm_pgtable_mm_ops
>>  	return pte;
>>  }
>>  
>> -static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
>> +static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, s8 level)
>>  {
>>  	kvm_pte_t pte = kvm_phys_to_pte(pa);
>> -	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
>> -							   KVM_PTE_TYPE_BLOCK;
>> +	u64 type = (level == KVM_PGTABLE_LAST_LEVEL) ? KVM_PTE_TYPE_PAGE :
>> +						       KVM_PTE_TYPE_BLOCK;
>>  
>>  	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
>>  	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
>> @@ -206,11 +206,11 @@ static bool kvm_pgtable_walk_continue(const struct kvm_pgtable_walker *walker,
>>  }
>>  
>>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>> -			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level);
>> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, s8 level);
>>  
>>  static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>>  				      struct kvm_pgtable_mm_ops *mm_ops,
>> -				      kvm_pteref_t pteref, u32 level)
>> +				      kvm_pteref_t pteref, s8 level)
>>  {
>>  	enum kvm_pgtable_walk_flags flags = data->walker->flags;
>>  	kvm_pte_t *ptep = kvm_dereference_pteref(data->walker, pteref);
>> @@ -275,12 +275,12 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>>  }
>>  
>>  static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
>> -			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level)
>> +			      struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, s8 level)
>>  {
>>  	u32 idx;
>>  	int ret = 0;
>>  
>> -	if (WARN_ON_ONCE(level >= KVM_PGTABLE_MAX_LEVELS))
>> +	if (WARN_ON_ONCE(level > KVM_PGTABLE_LAST_LEVEL))
>>  		return -EINVAL;
> 
> Now that level can be negative, you may want to check it against
> KVM_PGTABLE_FIRST_LEVEL as well.

ACK.
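
Probably something along these lines in the respin (sketch only, reusing the
constants already introduced in this series):

	/* Reject any level outside the architecturally valid range. */
	if (WARN_ON_ONCE(level < KVM_PGTABLE_FIRST_LEVEL ||
			 level > KVM_PGTABLE_LAST_LEVEL))
		return -EINVAL;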

> 
>>  
>>  	for (idx = kvm_pgtable_idx(data, level); idx < PTRS_PER_PTE; ++idx) {
>> @@ -343,7 +343,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>>  
>>  struct leaf_walk_data {
>>  	kvm_pte_t	pte;
>> -	u32		level;
>> +	s8		level;
>>  };
>>  
>>  static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
>> @@ -358,7 +358,7 @@ static int leaf_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>  }
>>  
>>  int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
>> -			 kvm_pte_t *ptep, u32 *level)
>> +			 kvm_pte_t *ptep, s8 *level)
>>  {
>>  	struct leaf_walk_data data;
>>  	struct kvm_pgtable_walker walker = {
>> @@ -471,7 +471,7 @@ static int hyp_map_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>  	if (hyp_map_walker_try_leaf(ctx, data))
>>  		return 0;
>>  
>> -	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>> +	if (WARN_ON(ctx->level == KVM_PGTABLE_LAST_LEVEL))
>>  		return -EINVAL;
> 
> Same thing.

ACK.
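
Likewise here; sketch only, keeping the existing "already at the last level"
test and adding the lower bound:

	if (WARN_ON(ctx->level < KVM_PGTABLE_FIRST_LEVEL ||
		    ctx->level == KVM_PGTABLE_LAST_LEVEL))
		return -EINVAL;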

> 
>>  
>>  	childp = (kvm_pte_t *)mm_ops->zalloc_page(NULL);
>> @@ -567,14 +567,18 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>>  int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>>  			 struct kvm_pgtable_mm_ops *mm_ops)
>>  {
>> -	u64 levels = ARM64_HW_PGTABLE_LEVELS(va_bits);
>> +	s8 start_level = KVM_PGTABLE_LAST_LEVEL + 1 -
>> +			 ARM64_HW_PGTABLE_LEVELS(va_bits);
>> +	if (start_level < KVM_PGTABLE_FIRST_LEVEL ||
>> +	    start_level > KVM_PGTABLE_LAST_LEVEL)
>> +		return -EINVAL;
> 
> Please add a new line between the variable definition and the if ()
> statement.

Hmm, I'm surprised that checkpatch.pl didn't flag this. Or, more likely, I
somehow didn't run it for this patch...
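
For completeness, the fixed-up hunk just gains the blank line:

	s8 start_level = KVM_PGTABLE_LAST_LEVEL + 1 -
			 ARM64_HW_PGTABLE_LEVELS(va_bits);

	if (start_level < KVM_PGTABLE_FIRST_LEVEL ||
	    start_level > KVM_PGTABLE_LAST_LEVEL)
		return -EINVAL;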

> 
>>  
>>  	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_page(NULL);
>>  	if (!pgt->pgd)
>>  		return -ENOMEM;
>>  
>>  	pgt->ia_bits		= va_bits;
>> -	pgt->start_level	= KVM_PGTABLE_MAX_LEVELS - levels;
>> +	pgt->start_level	= start_level;
>>  	pgt->mm_ops		= mm_ops;
>>  	pgt->mmu		= NULL;
>>  	pgt->force_pte_cb	= NULL;
>> @@ -628,7 +632,7 @@ struct stage2_map_data {
>>  u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
>>  {
>>  	u64 vtcr = VTCR_EL2_FLAGS;
>> -	u8 lvls;
>> +	s8 lvls;
>>  
>>  	vtcr |= kvm_get_parange(mmfr0) << VTCR_EL2_PS_SHIFT;
>>  	vtcr |= VTCR_EL2_T0SZ(phys_shift);
>> @@ -911,7 +915,7 @@ static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
>>  {
>>  	u64 phys = stage2_map_walker_phys_addr(ctx, data);
>>  
>> -	if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
>> +	if (data->force_pte && ctx->level < KVM_PGTABLE_LAST_LEVEL)
>>  		return false;
>>  
>>  	return kvm_block_mapping_supported(ctx, phys);
>> @@ -990,7 +994,7 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>>  	if (ret != -E2BIG)
>>  		return ret;
>>  
>> -	if (WARN_ON(ctx->level == KVM_PGTABLE_MAX_LEVELS - 1))
>> +	if (WARN_ON(ctx->level == KVM_PGTABLE_LAST_LEVEL))
>>  		return -EINVAL;
>>  
>>  	if (!data->memcache)
>> @@ -1160,7 +1164,7 @@ struct stage2_attr_data {
>>  	kvm_pte_t			attr_set;
>>  	kvm_pte_t			attr_clr;
>>  	kvm_pte_t			pte;
>> -	u32				level;
>> +	s8				level;
>>  };
>>  
>>  static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>> @@ -1203,7 +1207,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>  static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>>  				    u64 size, kvm_pte_t attr_set,
>>  				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte,
>> -				    u32 *level, enum kvm_pgtable_walk_flags flags)
>> +				    s8 *level, enum kvm_pgtable_walk_flags flags)
>>  {
>>  	int ret;
>>  	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
>> @@ -1305,7 +1309,7 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
>>  				   enum kvm_pgtable_prot prot)
>>  {
>>  	int ret;
>> -	u32 level;
>> +	s8 level;
>>  	kvm_pte_t set = 0, clr = 0;
>>  
>>  	if (prot & KVM_PTE_LEAF_ATTR_HI_SW)
>> @@ -1358,7 +1362,7 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
>>  }
>>  
>>  kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
>> -					      u64 phys, u32 level,
>> +					      u64 phys, s8 level,
>>  					      enum kvm_pgtable_prot prot,
>>  					      void *mc, bool force_pte)
>>  {
>> @@ -1416,7 +1420,7 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
>>   * fully populated tree up to the PTE entries. Note that @level is
>>   * interpreted as in "level @level entry".
>>   */
>> -static int stage2_block_get_nr_page_tables(u32 level)
>> +static int stage2_block_get_nr_page_tables(s8 level)
>>  {
>>  	switch (level) {
>>  	case 1:
>> @@ -1427,7 +1431,7 @@ static int stage2_block_get_nr_page_tables(u32 level)
>>  		return 0;
>>  	default:
>>  		WARN_ON_ONCE(level < KVM_PGTABLE_MIN_BLOCK_LEVEL ||
>> -			     level >= KVM_PGTABLE_MAX_LEVELS);
>> +			     level > KVM_PGTABLE_LAST_LEVEL);
>>  		return -EINVAL;
>>  	};
>>  }
>> @@ -1440,13 +1444,13 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx,
>>  	struct kvm_s2_mmu *mmu;
>>  	kvm_pte_t pte = ctx->old, new, *childp;
>>  	enum kvm_pgtable_prot prot;
>> -	u32 level = ctx->level;
>> +	s8 level = ctx->level;
>>  	bool force_pte;
>>  	int nr_pages;
>>  	u64 phys;
>>  
>>  	/* No huge-pages exist at the last level */
>> -	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
>> +	if (level == KVM_PGTABLE_LAST_LEVEL)
>>  		return 0;
>>  
>>  	/* We only split valid block mappings */
>> @@ -1523,7 +1527,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
>>  	u64 vtcr = mmu->arch->vtcr;
>>  	u32 ia_bits = VTCR_EL2_IPA(vtcr);
>>  	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
>> -	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
>> +	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
>>  
>>  	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
>>  	pgt->pgd = (kvm_pteref_t)mm_ops->zalloc_pages_exact(pgd_sz);
>> @@ -1546,7 +1550,7 @@ size_t kvm_pgtable_stage2_pgd_size(u64 vtcr)
>>  {
>>  	u32 ia_bits = VTCR_EL2_IPA(vtcr);
>>  	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
>> -	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
>> +	s8 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
>>  
>>  	return kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
>>  }
>> @@ -1582,7 +1586,7 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>>  	pgt->pgd = NULL;
>>  }
>>  
>> -void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, u32 level)
>> +void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level)
>>  {
>>  	kvm_pteref_t ptep = (kvm_pteref_t)pgtable;
>>  	struct kvm_pgtable_walker walker = {
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 482280fe22d7..73110ba3624c 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -223,12 +223,12 @@ static void stage2_free_unlinked_table_rcu_cb(struct rcu_head *head)
>>  {
>>  	struct page *page = container_of(head, struct page, rcu_head);
>>  	void *pgtable = page_to_virt(page);
>> -	u32 level = page_private(page);
>> +	s8 level = page_private(page);
>>  
>>  	kvm_pgtable_stage2_free_unlinked(&kvm_s2_mm_ops, pgtable, level);
>>  }
>>  
>> -static void stage2_free_unlinked_table(void *addr, u32 level)
>> +static void stage2_free_unlinked_table(void *addr, s8 level)
>>  {
>>  	struct page *page = virt_to_page(addr);
>>  
>> @@ -804,13 +804,13 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>>  	struct kvm_pgtable pgt = {
>>  		.pgd		= (kvm_pteref_t)kvm->mm->pgd,
>>  		.ia_bits	= vabits_actual,
>> -		.start_level	= (KVM_PGTABLE_MAX_LEVELS -
>> -				   CONFIG_PGTABLE_LEVELS),
>> +		.start_level	= (KVM_PGTABLE_LAST_LEVEL -
>> +				   CONFIG_PGTABLE_LEVELS + 1),
>>  		.mm_ops		= &kvm_user_mm_ops,
>>  	};
>>  	unsigned long flags;
>>  	kvm_pte_t pte = 0;	/* Keep GCC quiet... */
>> -	u32 level = ~0;
>> +	s8 level = ~0;
> 
> Well, that's a semantic difference. ~0 == -1, which is a valid level,
> while the original code was trying to initialise level to something
> invalid. On the bright side, this function is going away in 6.7...

ACK.
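
Something like this would keep the "not yet updated" semantics with the signed
type (sketch only; the exact sentinel doesn't matter much given the function is
going away):

	s8 level = KVM_PGTABLE_FIRST_LEVEL - 1;	/* deliberately invalid */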

> 
>>  	int ret;
>>  
>>  	/*
>> @@ -829,7 +829,9 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>>  	 * Not seeing an error, but not updating level? Something went
>>  	 * deeply wrong...
>>  	 */
>> -	if (WARN_ON(level >= KVM_PGTABLE_MAX_LEVELS))
>> +	if (WARN_ON(level > KVM_PGTABLE_LAST_LEVEL))
>> +		return -EFAULT;
>> +	if (WARN_ON(level < KVM_PGTABLE_FIRST_LEVEL))
>>  		return -EFAULT;
>>  
>>  	/* Oops, the userspace PTs are gone... Replay the fault */
>> @@ -1407,7 +1409,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	gfn_t gfn;
>>  	kvm_pfn_t pfn;
>>  	bool logging_active = memslot_is_logging(memslot);
>> -	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
>> +	s8 fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
>>  	long vma_pagesize, fault_granule;
>>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>>  	struct kvm_pgtable *pgt;
> 
> Thanks,
> 
> 	M.
> 



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
  2023-10-20 10:54   ` Marc Zyngier
@ 2023-10-20 15:22     ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-20 15:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 20/10/2023 11:54, Marc Zyngier wrote:
> Hi Ryan,
> 
> On Mon, 09 Oct 2023 19:49:56 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Hi All,
>>
>> This adds support for FEAT_LPA2 to KVM for both hypervisor stage 1 (for the
>> nvhe/protected modes) and the vm stage 2 translation tables (for all modes).
>> FEAT_LPA2 enables 52 bit PAs and VAs for 4KB and 16KB granules (note this is
>> already supported for 64KB granules via the FEAT_LPA and FEAT_LVA extensions).
>> The series does not include support for FEAT_LPA2 in the kernel stage 1. This
>> support is provided separately by Ard Biesheuvel's series at [4]. The two series
>> are mostly independent.
>>
>> This is a small update from v3, rebased onto v6.6-rc5 and incorporating some
>> minor changes based on review comments from Oliver.
>>
>> NOTE: I've included my patch to update the range-based tlbi functions to work
>> with LPA2 in this version, because KVM has started using range-based tlbi
>> invalidation as of v6.6-rc1. I've done this in such a way that KVM-originated
>> calls will use the LPA2 format if LPA2 is in use by KVM, but the
>> kernel-originated calls are hardcoded to never use the LPA2 format. If merging
>> with Ard's series, you will need to update the 2 calls to __flush_tlb_range_op()
>> from __flush_tlb_range() appropriately.
>>
>>
>> Testing
>> =======
>>
>> Testing has been done exclusively on the FVP and covers my boot matrix tests
>> and kvm selftests.
>>
>> The host/guest config boot matrix gives the same (expected) results as for the
>> v3 submission; of 180 conifgs, 12 fail, and these are all due to attempting to
>> load the host kernel into high memory which isn't expected to work until the
>> kernel has FEAT_LPA2 support for its stage 1. (refer to v1 posting for details
>> on the exact configs).
>>
>> KVM selftests have been enhanced to support P52V48 4K and 16K guest modes, and
>> all tests have been run against a P48V48_4K host and a P52V52_4K host (a run
>> takes about 10 hours on FVP, sigh, but I can test a few more host configs if
>> useful).
> 
> Have you tried with the (brand new) "arm64_sw.hvhe=1" command-line
> option, which enables VHE for the EL2 hypervisor only? I expect things
> to work, but it would be good to make sure...

No, I haven't tried. I did notice it when I rebased but convinced myself that it
doesn't affect the page table stuff. I'm happy to give it a spin once I've
rebased to v6.7-rc1 though.

> 
>> All tests pass except "memslot_perf_test", which fails due to a timeout
>> while syncing. This test fails in the same way for plain v6.6-rc1, so I'm
>> confident this is not a regression caused by this series. (the issue is that
>> alarm(2) is issued and the signal is received before alarm(0) is issued. I
>> expect this is an FVP-time related problem, although I'm not sure how to fix
>> robustly for the FVP without potentially hanging real systems for long periods
>> of time).
> 
> [...]
> 
> This is starting to look good, and I only had pretty minor comments on
> this series so far. It is too late for 6.7, but if you can respin it
> for -rc1, I'll happily review it again and queue it for 6.8 if things
> keep looking OK.

Thanks for the review! This all sounds great to me. I'll probably wait for
v6.7-rc1 and do the rebase, fix up all your comments and do the benchmarking
then repost.

Thanks,
Ryan


> 
> Thanks,
> 
> 	M.
> 



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
@ 2023-10-20 15:22     ` Ryan Roberts
  0 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-20 15:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 20/10/2023 11:54, Marc Zyngier wrote:
> Hi Ryan,
> 
> On Mon, 09 Oct 2023 19:49:56 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Hi All,
>>
>> This adds support for FEAT_LPA2 to KVM for both hypervisor stage 1 (for the
>> nvhe/protected modes) and the vm stage 2 translation tables (for all modes).
>> FEAT_LPA2 enables 52 bit PAs and VAs for 4KB and 16KB granules (note this is
>> already supported for 64KB granules via the FEAT_LPA and FEAT_LVA extensions).
>> The series does not include support for FEAT_LPA2 in the kernel stage 1. This
>> support is provided separately by Ard Biesheuvel's series at [4]. The two series
>> are mostly independent.
>>
>> This is a small update from v3, rebased onto v6.6-rc5 and incorporating some
>> minor changes based on review comments from Oliver.
>>
>> NOTE: I've included my patch to update the range-based tlbi functions to work
>> with LPA2 in this version, because KVM has started using range-based tlbi
>> invalidation as of v6.6-rc1. I've done this in such a way that KVM-originated
>> calls will use the LPA2 format if LPA2 is in use by KVM, but the
>> kernel-originated calls are hardcoded to never use the LPA2 format. If merging
>> with Ard's series, you will need to update the 2 calls to __flush_tlb_range_op()
>> from __flush_tlb_range() appropriately.
>>
>>
>> Testing
>> =======
>>
>> Testing has been done exclusively on the FVP and covers my boot matrix tests
>> and kvm selftests.
>>
>> The host/guest config boot matrix gives the same (expected) results as for the
>> v3 submission; of 180 conifgs, 12 fail, and these are all due to attempting to
>> load the host kernel into high memory which isn't expected to work until the
>> kernel has FEAT_LPA2 support for its stage 1. (refer to v1 posting for details
>> on the exact configs).
>>
>> KVM selftests have been enhanced to support P52V48 4K and 16K guest modes, and
>> all tests have been run against a P48V48_4K host and a P52V52_4K host (a run
>> takes about 10 hours on FVP, sigh, but I can test a few more host configs if
>> useful).
> 
> Have you tried with the (brand new) "arm64_sw.hvhe=1" command-line
> option, which enables VHE for the EL2 hypervisor only? I expect things
> to work, but it would be good to make sure...

No, I haven't tried. I did notice it when I rebased but convinced myself that it
doesn't affect the page table stuff. I'm happy to give it a spin once I've
rebased to v6.7-rc1 though.

> 
>> All tests pass except "memslot_perf_test", which fails due to a timeout
>> while syncing. This test fails in the same way for plain v6.6-rc1, so I'm
>> confident this is not a regression caused by this series. (the issue is that
>> alarm(2) is issued and the signal is received before alarm(0) is issued. I
>> expect this is an FVP-time related problem, although I'm not sure how to fix
>> robustly for the FVP without potentially hanging real systems for long periods
>> of time).
> 
> [...]
> 
> This is starting to look good, and I only had pretty minor comments on
> this series so far. It is too late for 6.7, but if you can respin it
> for -rc1, I'll happily review it again and queue it for 6.8 if things
> keep looking OK.

Thanks for the review! This all sounds great to me. I'll probably wait for
v6.7-rc1 and do the rebase, fix up all your comments and do the benchmarking
then repost.

Thanks,
Ryan


> 
> Thanks,
> 
> 	M.
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 04/12] KVM: arm64: Add ARM64_HAS_LPA2 CPU capability
  2023-10-20 15:03       ` Ryan Roberts
@ 2023-10-23  9:34         ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-23  9:34 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Fri, 20 Oct 2023 16:03:37 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 20/10/2023 09:16, Marc Zyngier wrote:
> > On Mon, 09 Oct 2023 19:50:00 +0100,
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> +static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
> >> +{
> >> +	u64 mmfr0;
> >> +	bool ret;
> >> +
> >> +	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
> >> +	ret = has_lpa2_at_stage1(mmfr0);
> >> +
> >> +	if (kvm_get_mode() != KVM_MODE_NONE)
> >> +		ret = ret && has_lpa2_at_stage2(mmfr0);
> > 
> > Isn't it too late to go back on the decision to use LPA2 at S1 if you
> > realise that S2 doesn't support it?
> 
> The KVM mode dependent part was a change that Oliver asked for. I guess you are
> talking about kernel S1? I don't think it's too late here to decide whether the
> (nvhe) hyp s1 should use LPA2. But I guess your point is that kernel s1 would
> have had to decide much earlier in boot and will have had to take LPA2 support
> in both S1 and S2 into account, and would not have the KVM mode info available
> to it at that point?

That's roughly my point. When we reach this point on a VHE system,
we're pretty far along and I'm not sure we can turn back. In all
honesty, if a system doesn't support LPA2 at S2, it is in a pretty bad
shape and we shouldn't bother supporting it. Or at least not with KVM.

Just because the architecture allows braindead configurations doesn't
mean we have to go out of our way to support them. In this case, I'd
be absolutely fine with disabling KVM altogether.

> > Why isn't this patch the first or second in the series? You could use
> > it to drive the LPA2 decision in the patch #2, avoiding the ugly lpa2
> > flag...
> 
> I still think this only works if we put my patch and Ard's patch in atomically?
> Or at least force has_lpa2() to always return false until both are in, then flip
> the switch atomically.

Whichever works for you. My only ask is to try to minimise the churn
here.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 04/12] KVM: arm64: Add ARM64_HAS_LPA2 CPU capability
@ 2023-10-23  9:34         ` Marc Zyngier
  0 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-23  9:34 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Fri, 20 Oct 2023 16:03:37 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 20/10/2023 09:16, Marc Zyngier wrote:
> > On Mon, 09 Oct 2023 19:50:00 +0100,
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> +static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
> >> +{
> >> +	u64 mmfr0;
> >> +	bool ret;
> >> +
> >> +	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
> >> +	ret = has_lpa2_at_stage1(mmfr0);
> >> +
> >> +	if (kvm_get_mode() != KVM_MODE_NONE)
> >> +		ret = ret && has_lpa2_at_stage2(mmfr0);
> > 
> > Isn't it too late to go back on the decision to use LPA2 at S1 if you
> > realise that S2 doesn't support it?
> 
> The KVM mode dependent part was a change that Oliver asked for. I guess you are
> talking about kernel S1? I don't think it's too late here to decide whether the
> (nvhe) hyp s1 should use LPA2. But I guess your point is that kernel s1 would
> have had to decide much earlier in boot and will have had to take LPA2 support
> in both S1 and S2 into account, and would not have the KVM mode info available
> to it at that point?

That's roughly my point. When we reach this point on a VHE system,
we're pretty far along and I'm not sure we can turn back. In all
honesty, if a system doesn't support LPA2 at S2, it is in a pretty bad
shape and we shouldn't bother supporting it. Or at least not with KVM.

Just because the architecture allows braindead configurations doesn't
mean we have to go out of our way to support them. In this case, I'd
be absolutely fine with disabling KVM altogether.

> > Why isn't this patch the first or second in the series? You could use
> > it to drive the LPA2 decision in the patch #2, avoiding the ugly lpa2
> > flag...
> 
> I still think this only works if we put my patch and Ard's patch in atomically?
> Or at least force has_lpa2() to always return false until both are in, then flip
> the switch atomically.

Whichever works for you. My only ask is to try to minimise the churn
here.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 06/12] KVM: arm64: Use LPA2 page-tables for stage2 and hyp stage1
  2023-10-20 15:06       ` Ryan Roberts
@ 2023-10-23  9:36         ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-23  9:36 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Fri, 20 Oct 2023 16:06:50 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 20/10/2023 10:16, Marc Zyngier wrote:
> > On Mon, 09 Oct 2023 19:50:02 +0100,
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> >> index d42b72f78a9b..c3cd16c6f95f 100644
> >> --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> >> +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> >> @@ -198,7 +198,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> >>  	/* Switch to requested VMID */
> >>  	__tlb_switch_to_guest(mmu, &cxt, false);
> >>  
> >> -	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
> >> +	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0,
> >> +				system_supports_lpa2());
> > 
> > At this stage, I'd fully expect the flag to have been subsumed into
> > the helper...
> 
> ACK. I'm planning to have has_lpa2() always return false for now. Then once
> Ard's changes are in, we can change it to report the system status. Then we can
> move this inside __flush_s2_tlb_range_op(). Does that work for you?

Sure, go for it and see what it looks like.
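
Something like the below is roughly what I'd expect it to end up as (sketch
only, assuming the __flush_tlb_range_op() parameter order from the range-based
tlbi patch earlier in the series):

#define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level)		\
	__flush_tlb_range_op(op, start, pages, stride, 0,			\
			     tlb_level, false, system_supports_lpa2())

so that the call site in __kvm_tlb_flush_vmid_range() no longer needs to pass
the flag at all.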

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 06/12] KVM: arm64: Use LPA2 page-tables for stage2 and hyp stage1
@ 2023-10-23  9:36         ` Marc Zyngier
  0 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-23  9:36 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Fri, 20 Oct 2023 16:06:50 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 20/10/2023 10:16, Marc Zyngier wrote:
> > On Mon, 09 Oct 2023 19:50:02 +0100,
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> >> index d42b72f78a9b..c3cd16c6f95f 100644
> >> --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> >> +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> >> @@ -198,7 +198,8 @@ void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> >>  	/* Switch to requested VMID */
> >>  	__tlb_switch_to_guest(mmu, &cxt, false);
> >>  
> >> -	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0, false);
> >> +	__flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0,
> >> +				system_supports_lpa2());
> > 
> > At this stage, I'd fully expect the flag to have been subsumed into
> > the helper...
> 
> ACK. I'm planning to have has_lpa2() always return false for now. Then once
> Ard's changes are in, we can change it to report the system status. Then we can
> move this inside __flush_s2_tlb_range_op(). Does that work for you?

Sure, go for it and see what it looks like.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
  2023-10-20 15:22     ` Ryan Roberts
@ 2023-10-23  9:42       ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-23  9:42 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Fri, 20 Oct 2023 16:22:29 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 20/10/2023 11:54, Marc Zyngier wrote:
> > Hi Ryan,
> > 
> > On Mon, 09 Oct 2023 19:49:56 +0100,
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> Hi All,
> >>
> >> This adds support for FEAT_LPA2 to KVM for both hypervisor stage 1 (for the
> >> nvhe/protected modes) and the vm stage 2 translation tables (for all modes).
> >> FEAT_LPA2 enables 52 bit PAs and VAs for 4KB and 16KB granules (note this is
> >> already supported for 64KB granules via the FEAT_LPA and FEAT_LVA extensions).
> >> The series does not include support for FEAT_LPA2 in the kernel stage 1. This
> >> support is provided separately by Ard Biesheuvel's series at [4]. The two series
> >> are mostly independent.
> >>
> >> This is a small update from v3, rebased onto v6.6-rc5 and incorporating some
> >> minor changes based on review comments from Oliver.
> >>
> >> NOTE: I've included my patch to update the range-based tlbi functions to work
> >> with LPA2 in this version, because KVM has started using range-based tlbi
> >> invalidation as of v6.6-rc1. I've done this in such a way that KVM-originated
> >> calls will use the LPA2 format if LPA2 is in use by KVM, but the
> >> kernel-originated calls are hardcoded to never use the LPA2 format. If merging
> >> with Ard's series, you will need to update the 2 calls to __flush_tlb_range_op()
> >> from __flush_tlb_range() appropriately.
> >>
> >>
> >> Testing
> >> =======
> >>
> >> Testing has been done exclusively on the FVP and covers my boot matrix tests
> >> and kvm selftests.
> >>
> >> The host/guest config boot matrix gives the same (expected) results as for the
> >> v3 submission; of 180 conifgs, 12 fail, and these are all due to attempting to
> >> load the host kernel into high memory which isn't expected to work until the
> >> kernel has FEAT_LPA2 support for its stage 1. (refer to v1 posting for details
> >> on the exact configs).
> >>
> >> KVM selftests have been enhanced to support P52V48 4K and 16K guest modes, and
> >> all tests have been run against a P48V48_4K host and a P52V52_4K host (a run
> >> takes about 10 hours on FVP, sigh, but I can test a few more host configs if
> >> useful).
> > 
> > Have you tried with the (brand new) "arm64_sw.hvhe=1" command-line
> > option, which enables VHE for the EL2 hypervisor only? I expect things
> > to work, but it would be good to make sure...
> 
> No, I haven't tried. I did notice it when I rebased but convinced myself that it
> doesn't affect the page table stuff. I'm happy to give it a spin once I've
> rebased to v6.7-rc1 though.

It does affect the page-table format (KVM_PTE_LEAF_ATTR_LO_S1_AP_RO
and co), so I'm taking the view that whatever is not tested doesn't
work.

Thanks for giving it a go!

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
@ 2023-10-23  9:42       ` Marc Zyngier
  0 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-10-23  9:42 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Fri, 20 Oct 2023 16:22:29 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 20/10/2023 11:54, Marc Zyngier wrote:
> > Hi Ryan,
> > 
> > On Mon, 09 Oct 2023 19:49:56 +0100,
> > Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> Hi All,
> >>
> >> This adds support for FEAT_LPA2 to KVM for both hypervisor stage 1 (for the
> >> nvhe/protected modes) and the vm stage 2 translation tables (for all modes).
> >> FEAT_LPA2 enables 52 bit PAs and VAs for 4KB and 16KB granules (note this is
> >> already supported for 64KB granules via the FEAT_LPA and FEAT_LVA extensions).
> >> The series does not include support for FEAT_LPA2 in the kernel stage 1. This
> >> support is provided separately by Ard Biesheuvel's series at [4]. The two series
> >> are mostly independent.
> >>
> >> This is a small update from v3, rebased onto v6.6-rc5 and incorporating some
> >> minor changes based on review comments from Oliver.
> >>
> >> NOTE: I've included my patch to update the range-based tlbi functions to work
> >> with LPA2 in this version, because KVM has started using range-based tlbi
> >> invalidation as of v6.6-rc1. I've done this in such a way that KVM-originated
> >> calls will use the LPA2 format if LPA2 is in use by KVM, but the
> >> kernel-originated calls are hardcoded to never use the LPA2 format. If merging
> >> with Ard's series, you will need to update the 2 calls to __flush_tlb_range_op()
> >> from __flush_tlb_range() appropriately.
> >>
> >>
> >> Testing
> >> =======
> >>
> >> Testing has been done exclusively on the FVP and covers my boot matrix tests
> >> and kvm selftests.
> >>
> >> The host/guest config boot matrix gives the same (expected) results as for the
> >> v3 submission; of 180 conifgs, 12 fail, and these are all due to attempting to
> >> load the host kernel into high memory which isn't expected to work until the
> >> kernel has FEAT_LPA2 support for its stage 1. (refer to v1 posting for details
> >> on the exact configs).
> >>
> >> KVM selftests have been enhanced to support P52V48 4K and 16K guest modes, and
> >> all tests have been run against a P48V48_4K host and a P52V52_4K host (a run
> >> takes about 10 hours on FVP, sigh, but I can test a few more host configs if
> >> useful).
> > 
> > Have you tried with the (brand new) "arm64_sw.hvhe=1" command-line
> > option, which enables VHE for the EL2 hypervisor only? I expect things
> > to work, but it would be good to make sure...
> 
> No, I haven't tried. I did notice it when I rebased but convinced myself that it
> doesn't affect the page table stuff. I'm happy to give it a spin once I've
> rebased to v6.7-rc1 though.

It does affect the page-table format (KVM_PTE_LEAF_ATTR_LO_S1_AP_RO
and co), so I'm taking the view that whatever is not tested doesn't
work.

Thanks for giving it a go!

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
  2023-10-23  9:42       ` Marc Zyngier
@ 2023-10-23 15:00         ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-23 15:00 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 23/10/2023 10:42, Marc Zyngier wrote:
> On Fri, 20 Oct 2023 16:22:29 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 20/10/2023 11:54, Marc Zyngier wrote:
>>> Hi Ryan,
>>>
>>> On Mon, 09 Oct 2023 19:49:56 +0100,
>>> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> This adds support for FEAT_LPA2 to KVM for both hypervisor stage 1 (for the
>>>> nvhe/protected modes) and the vm stage 2 translation tables (for all modes).
>>>> FEAT_LPA2 enables 52 bit PAs and VAs for 4KB and 16KB granules (note this is
>>>> already supported for 64KB granules via the FEAT_LPA and FEAT_LVA extensions).
>>>> The series does not include support for FEAT_LPA2 in the kernel stage 1. This
>>>> support is provided separately by Ard Biesheuvel's series at [4]. The two series
>>>> are mostly independent.
>>>>
>>>> This is a small update from v3, rebased onto v6.6-rc5 and incorporating some
>>>> minor changes based on review comments from Oliver.
>>>>
>>>> NOTE: I've included my patch to update the range-based tlbi functions to work
>>>> with LPA2 in this version, because KVM has started using range-based tlbi
>>>> invalidation as of v6.6-rc1. I've done this in such a way that KVM-originated
>>>> calls will use the LPA2 format if LPA2 is in use by KVM, but the
>>>> kernel-originated calls are hardcoded to never use the LPA2 format. If merging
>>>> with Ard's series, you will need to update the 2 calls to __flush_tlb_range_op()
>>>> from __flush_tlb_range() appropriately.
>>>>
>>>>
>>>> Testing
>>>> =======
>>>>
>>>> Testing has been done exclusively on the FVP and covers my boot matrix tests
>>>> and kvm selftests.
>>>>
>>>> The host/guest config boot matrix gives the same (expected) results as for the
>>>> v3 submission; of 180 conifgs, 12 fail, and these are all due to attempting to
>>>> load the host kernel into high memory which isn't expected to work until the
>>>> kernel has FEAT_LPA2 support for its stage 1. (refer to v1 posting for details
>>>> on the exact configs).
>>>>
>>>> KVM selftests have been enhanced to support P52V48 4K and 16K guest modes, and
>>>> all tests have been run against a P48V48_4K host and a P52V52_4K host (a run
>>>> takes about 10 hours on FVP, sigh, but I can test a few more host configs if
>>>> useful).
>>>
>>> Have you tried with the (brand new) "arm64_sw.hvhe=1" command-line
>>> option, which enables VHE for the EL2 hypervisor only? I expect things
>>> to work, but it would be good to make sure...
>>
>> No, I haven't tried. I did notice it when I rebased but convinced myself that it
>> doesn't affect the page table stuff. I'm happy to give it a spin once I've
>> rebased to v6.7-rc1 though.
> 
> It does affect the page-table format (KVM_PTE_LEAF_ATTR_LO_S1_AP_RO
> and co), so I'm taking the view that whatever is not tested doesn't
> work.
> 
> Thanks for giving it a go!

ACK to this and the other emails you sent today on this topic. Thanks for the
fast responses - I'll come back to you on the TLBI benchmarks in a couple of
weeks if they show anything that means we have to make a decision other than
what we already discussed. Otherwise it'll be 4ish weeks before I post on top of
v6.7-rc1.

Thanks,
Ryan


> 
> 	M.
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2
@ 2023-10-23 15:00         ` Ryan Roberts
  0 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-10-23 15:00 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 23/10/2023 10:42, Marc Zyngier wrote:
> On Fri, 20 Oct 2023 16:22:29 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 20/10/2023 11:54, Marc Zyngier wrote:
>>> Hi Ryan,
>>>
>>> On Mon, 09 Oct 2023 19:49:56 +0100,
>>> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> This adds support for FEAT_LPA2 to KVM for both hypervisor stage 1 (for the
>>>> nvhe/protected modes) and the vm stage 2 translation tables (for all modes).
>>>> FEAT_LPA2 enables 52 bit PAs and VAs for 4KB and 16KB granules (note this is
>>>> already supported for 64KB granules via the FEAT_LPA and FEAT_LVA extensions).
>>>> The series does not include support for FEAT_LPA2 in the kernel stage 1. This
>>>> support is provided separately by Ard Biesheuvel's series at [4]. The two series
>>>> are mostly independent.
>>>>
>>>> This is a small update from v3, rebased onto v6.6-rc5 and incorporating some
>>>> minor changes based on review comments from Oliver.
>>>>
>>>> NOTE: I've included my patch to update the range-based tlbi functions to work
>>>> with LPA2 in this version, because KVM has started using range-based tlbi
>>>> invalidation as of v6.6-rc1. I've done this in such a way that KVM-originated
>>>> calls will use the LPA2 format if LPA2 is in use by KVM, but the
>>>> kernel-originated calls are hardcoded to never use the LPA2 format. If merging
>>>> with Ard's series, you will need to update the 2 calls to __flush_tlb_range_op()
>>>> from __flush_tlb_range() appropriately.
>>>>
>>>>
>>>> Testing
>>>> =======
>>>>
>>>> Testing has been done exclusively on the FVP and covers my boot matrix tests
>>>> and kvm selftests.
>>>>
>>>> The host/guest config boot matrix gives the same (expected) results as for the
>>>> v3 submission; of 180 conifgs, 12 fail, and these are all due to attempting to
>>>> load the host kernel into high memory which isn't expected to work until the
>>>> kernel has FEAT_LPA2 support for its stage 1. (refer to v1 posting for details
>>>> on the exact configs).
>>>>
>>>> KVM selftests have been enhanced to support P52V48 4K and 16K guest modes, and
>>>> all tests have been run against a P48V48_4K host and a P52V52_4K host (a run
>>>> takes about 10 hours on FVP, sigh, but I can test a few more host configs if
>>>> useful).
>>>
>>> Have you tried with the (brand new) "arm64_sw.hvhe=1" command-line
>>> option, which enables VHE for the EL2 hypervisor only? I expect things
>>> to work, but it would be good to make sure...
>>
>> No, I haven't tried. I did notice it when I rebased but convinced myself that it
>> doesn't affect the page table stuff. I'm happy to give it a spin once I've
>> rebased to v6.7-rc1 though.
> 
> It does affect the page-table format (KVM_PTE_LEAF_ATTR_LO_S1_AP_RO
> and co), so I'm taking the view that whatever is not tested doesn't
> work.
> 
> Thanks for giving it a go!

ACK to this and the other emails you sent today on this topic. Thanks for the
fast responses - I'll come back to you on the TLBI benchmarks in a couple of
weeks if they show anything that means we have to make a decision other than
what we already discussed. Otherwise it'll be 4ish weeks before I post on top of
v6.7-rc1.

Thanks,
Ryan


> 
> 	M.
> 



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 04/12] KVM: arm64: Add ARM64_HAS_LPA2 CPU capability
  2023-10-20  8:16     ` Marc Zyngier
@ 2023-11-13 11:57       ` Ryan Roberts
  -1 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-11-13 11:57 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 20/10/2023 09:16, Marc Zyngier wrote:
> On Mon, 09 Oct 2023 19:50:00 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Expose FEAT_LPA2 as a capability so that we can take advantage of
>> alternatives patching in both the kernel and hypervisor.
>>
>> Although FEAT_LPA2 presence is advertised separately for stage1 and
>> stage2, the expectation is that in practice both stages will either
>> support or not support it. Therefore, for the case where KVM is present,
>> we combine both into a single capability, allowing us to simplify the
>> implementation. For the case where KVM is not present, we only care
>> about stage1.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  arch/arm64/include/asm/cpufeature.h |  5 ++++
>>  arch/arm64/kernel/cpufeature.c      | 46 +++++++++++++++++++++++++++++
>>  arch/arm64/tools/cpucaps            |  1 +
>>  3 files changed, 52 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>> index 5bba39376055..b1292ec88538 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -831,6 +831,11 @@ static inline bool system_supports_tlb_range(void)
>>  		cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
>>  }
>>  
>> +static inline bool system_supports_lpa2(void)
>> +{
>> +	return cpus_have_const_cap(ARM64_HAS_LPA2);
> 
> cpus_have_const_cap() is going away. You may want to look at Mark's
> series to see how to replace this one.
> 
>> +}
>> +
>>  int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
>>  bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
>>  
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 444a73c2e638..1ccb1fe0e310 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -1746,6 +1746,46 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
>>  	return !meltdown_safe;
>>  }
>>  
>> +static inline bool has_lpa2_at_stage1(u64 mmfr0)
> 
> Why inline? It isn't like this has any performance implication...
> 
>> +{
>> +#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
>> +	unsigned int tgran;
>> +
>> +	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
>> +						ID_AA64MMFR0_EL1_TGRAN_SHIFT);
>> +	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
>> +#else
>> +	return false;
>> +#endif
> 
> Writing this using IS_ENABLED() would be slightly more pleasing to my
> tired eyes... ;-)

Unfortunately this doesn't work because ID_AA64MMFR0_EL1_TGRAN_LPA2 is only
defined for 4K and 16K configs (there is no field for 64K). So I'm proposing to
do it this way instead. Please shout if you have a better idea:

#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
static bool has_lpa2_at_stage1(u64 mmfr0)
{
	unsigned int tgran;

	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
					ID_AA64MMFR0_EL1_TGRAN_SHIFT);
	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
}

static bool has_lpa2_at_stage2(u64 mmfr0)
{
	unsigned int tgran;

	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
					ID_AA64MMFR0_EL1_TGRAN_2_SHIFT);
	return tgran == ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2;
}

static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
{
	u64 mmfr0;

	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
	return has_lpa2_at_stage1(mmfr0) && has_lpa2_at_stage2(mmfr0);
}
#else
static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
{
	return false;
}
#endif


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 04/12] KVM: arm64: Add ARM64_HAS_LPA2 CPU capability
@ 2023-11-13 11:57       ` Ryan Roberts
  0 siblings, 0 replies; 76+ messages in thread
From: Ryan Roberts @ 2023-11-13 11:57 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On 20/10/2023 09:16, Marc Zyngier wrote:
> On Mon, 09 Oct 2023 19:50:00 +0100,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Expose FEAT_LPA2 as a capability so that we can take advantage of
>> alternatives patching in both the kernel and hypervisor.
>>
>> Although FEAT_LPA2 presence is advertised separately for stage1 and
>> stage2, the expectation is that in practice both stages will either
>> support or not support it. Therefore, for the case where KVM is present,
>> we combine both into a single capability, allowing us to simplify the
>> implementation. For the case where KVM is not present, we only care
>> about stage1.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  arch/arm64/include/asm/cpufeature.h |  5 ++++
>>  arch/arm64/kernel/cpufeature.c      | 46 +++++++++++++++++++++++++++++
>>  arch/arm64/tools/cpucaps            |  1 +
>>  3 files changed, 52 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>> index 5bba39376055..b1292ec88538 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -831,6 +831,11 @@ static inline bool system_supports_tlb_range(void)
>>  		cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
>>  }
>>  
>> +static inline bool system_supports_lpa2(void)
>> +{
>> +	return cpus_have_const_cap(ARM64_HAS_LPA2);
> 
> cpus_have_const_cap() is going away. You may want to look at Mark's
> series to see how to replace this one.
> 
>> +}
>> +
>>  int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
>>  bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
>>  
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 444a73c2e638..1ccb1fe0e310 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -1746,6 +1746,46 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
>>  	return !meltdown_safe;
>>  }
>>  
>> +static inline bool has_lpa2_at_stage1(u64 mmfr0)
> 
> Why inline? It isn't like this has any performance implication...
> 
>> +{
>> +#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
>> +	unsigned int tgran;
>> +
>> +	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
>> +						ID_AA64MMFR0_EL1_TGRAN_SHIFT);
>> +	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
>> +#else
>> +	return false;
>> +#endif
> 
> Writing this using IS_ENABLED() would be slightly more pleasing to my
> tired eyes... ;-)

Unfortunately this doesn't work because ID_AA64MMFR0_EL1_TGRAN_LPA2 is only
defined for 4K and 16K configs (there is no field for 64K). So I'm proposing to
do it this way instead. Please shout if you have a better idea:

#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
static bool has_lpa2_at_stage1(u64 mmfr0)
{
	unsigned int tgran;

	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
					ID_AA64MMFR0_EL1_TGRAN_SHIFT);
	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
}

static bool has_lpa2_at_stage2(u64 mmfr0)
{
	unsigned int tgran;

	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
					ID_AA64MMFR0_EL1_TGRAN_2_SHIFT);
	return tgran == ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2;
}

static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
{
	u64 mmfr0;

	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
	return has_lpa2_at_stage1(mmfr0) && has_lpa2_at_stage2(mmfr0);
}
#else
static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
{
	return false;
}
#endif



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 04/12] KVM: arm64: Add ARM64_HAS_LPA2 CPU capability
  2023-11-13 11:57       ` Ryan Roberts
@ 2023-11-13 12:16         ` Marc Zyngier
  -1 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-11-13 12:16 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Mon, 13 Nov 2023 11:57:45 +0000,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> >> +{
> >> +#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
> >> +	unsigned int tgran;
> >> +
> >> +	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
> >> +						ID_AA64MMFR0_EL1_TGRAN_SHIFT);
> >> +	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
> >> +#else
> >> +	return false;
> >> +#endif
> > 
> > Writing this using IS_ENABLED() would be slightly more pleasing to my
> > tired eyes... ;-)
> 
> Unfortunately this doesn't work because ID_AA64MMFR0_EL1_TGRAN_LPA2 is only
> defined for 4K and 16K configs (there is no field for 64K). So I'm proposing to
> do it this way instead. Please shout if you have a better idea:
> 
> #if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
> static bool has_lpa2_at_stage1(u64 mmfr0)
> {
> 	unsigned int tgran;
> 
> 	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
> 					ID_AA64MMFR0_EL1_TGRAN_SHIFT);
> 	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
> }
> 
> static bool has_lpa2_at_stage2(u64 mmfr0)
> {
> 	unsigned int tgran;
> 
> 	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
> 					ID_AA64MMFR0_EL1_TGRAN_2_SHIFT);
> 	return tgran == ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2;
> }
> 
> static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
> {
> 	u64 mmfr0;
> 
> 	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
> 	return has_lpa2_at_stage1(mmfr0) && has_lpa2_at_stage2(mmfr0);
> }
> #else
> static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
> {
> 	return false;
> }
> #endif

Ah, fair enough. This looks marginally better anyway.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v4 04/12] KVM: arm64: Add ARM64_HAS_LPA2 CPU capability
@ 2023-11-13 12:16         ` Marc Zyngier
  0 siblings, 0 replies; 76+ messages in thread
From: Marc Zyngier @ 2023-11-13 12:16 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Catalin Marinas, Will Deacon, Oliver Upton, Suzuki K Poulose,
	James Morse, Zenghui Yu, Ard Biesheuvel, Anshuman Khandual,
	linux-arm-kernel, kvmarm

On Mon, 13 Nov 2023 11:57:45 +0000,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> >> +{
> >> +#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
> >> +	unsigned int tgran;
> >> +
> >> +	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
> >> +						ID_AA64MMFR0_EL1_TGRAN_SHIFT);
> >> +	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
> >> +#else
> >> +	return false;
> >> +#endif
> > 
> > Writing this using IS_ENABLED() would be slightly more pleasing to my
> > tired eyes... ;-)
> 
> Unfortunately this doesn't work because ID_AA64MMFR0_EL1_TGRAN_LPA2 is only
> defined for 4K and 16K configs (there is no field for 64K). So I'm proposing to
> do it this way instead. Please shout if you have a better idea:
> 
> #if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
> static bool has_lpa2_at_stage1(u64 mmfr0)
> {
> 	unsigned int tgran;
> 
> 	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
> 					ID_AA64MMFR0_EL1_TGRAN_SHIFT);
> 	return tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2;
> }
> 
> static bool has_lpa2_at_stage2(u64 mmfr0)
> {
> 	unsigned int tgran;
> 
> 	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
> 					ID_AA64MMFR0_EL1_TGRAN_2_SHIFT);
> 	return tgran == ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2;
> }
> 
> static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
> {
> 	u64 mmfr0;
> 
> 	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
> 	return has_lpa2_at_stage1(mmfr0) && has_lpa2_at_stage2(mmfr0);
> }
> #else
> static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
> {
> 	return false;
> }
> #endif

Ah, fair enough. This looks marginally better anyway.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2023-11-13 12:16 UTC | newest]

Thread overview: 76+ messages
2023-10-09 18:49 [PATCH v4 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2 Ryan Roberts
2023-10-09 18:49 ` Ryan Roberts
2023-10-09 18:49 ` [PATCH v4 01/12] arm64/mm: Update non-range tlb invalidation routines for FEAT_LPA2 Ryan Roberts
2023-10-09 18:49   ` Ryan Roberts
2023-10-19  8:03   ` Marc Zyngier
2023-10-19  8:03     ` Marc Zyngier
2023-10-19  9:22     ` Ryan Roberts
2023-10-19  9:22       ` Ryan Roberts
2023-10-20  8:05       ` Marc Zyngier
2023-10-20  8:05         ` Marc Zyngier
2023-10-20 12:39         ` Ryan Roberts
2023-10-20 12:39           ` Ryan Roberts
2023-10-20 13:02           ` Marc Zyngier
2023-10-20 13:02             ` Marc Zyngier
2023-10-20 13:21             ` Ryan Roberts
2023-10-20 13:21               ` Ryan Roberts
2023-10-20 13:41               ` Marc Zyngier
2023-10-20 13:41                 ` Marc Zyngier
2023-10-09 18:49 ` [PATCH v4 02/12] arm64/mm: Update range-based " Ryan Roberts
2023-10-09 18:49   ` Ryan Roberts
2023-10-19 21:06   ` Marc Zyngier
2023-10-19 21:06     ` Marc Zyngier
2023-10-20 14:55     ` Ryan Roberts
2023-10-20 14:55       ` Ryan Roberts
2023-10-09 18:49 ` [PATCH v4 03/12] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] Ryan Roberts
2023-10-09 18:49   ` Ryan Roberts
2023-10-09 18:50 ` [PATCH v4 04/12] KVM: arm64: Add ARM64_HAS_LPA2 CPU capability Ryan Roberts
2023-10-09 18:50   ` Ryan Roberts
2023-10-20  8:16   ` Marc Zyngier
2023-10-20  8:16     ` Marc Zyngier
2023-10-20 15:03     ` Ryan Roberts
2023-10-20 15:03       ` Ryan Roberts
2023-10-23  9:34       ` Marc Zyngier
2023-10-23  9:34         ` Marc Zyngier
2023-11-13 11:57     ` Ryan Roberts
2023-11-13 11:57       ` Ryan Roberts
2023-11-13 12:16       ` Marc Zyngier
2023-11-13 12:16         ` Marc Zyngier
2023-10-09 18:50 ` [PATCH v4 05/12] KVM: arm64: Add new (V)TCR_EL2 field definitions for FEAT_LPA2 Ryan Roberts
2023-10-09 18:50   ` Ryan Roberts
2023-10-09 18:50 ` [PATCH v4 06/12] KVM: arm64: Use LPA2 page-tables for stage2 and hyp stage1 Ryan Roberts
2023-10-09 18:50   ` Ryan Roberts
2023-10-20  9:16   ` Marc Zyngier
2023-10-20  9:16     ` Marc Zyngier
2023-10-20 15:06     ` Ryan Roberts
2023-10-20 15:06       ` Ryan Roberts
2023-10-23  9:36       ` Marc Zyngier
2023-10-23  9:36         ` Marc Zyngier
2023-10-09 18:50 ` [PATCH v4 07/12] KVM: arm64: Prepare TCR_EL2.PS in cpu_prepare_hyp_mode() Ryan Roberts
2023-10-09 18:50   ` Ryan Roberts
2023-10-20  9:21   ` Marc Zyngier
2023-10-20  9:21     ` Marc Zyngier
2023-10-20 15:07     ` Ryan Roberts
2023-10-20 15:07       ` Ryan Roberts
2023-10-09 18:50 ` [PATCH v4 08/12] KVM: arm64: Convert translation level parameter to s8 Ryan Roberts
2023-10-09 18:50   ` Ryan Roberts
2023-10-20 10:42   ` Marc Zyngier
2023-10-20 10:42     ` Marc Zyngier
2023-10-20 15:11     ` Ryan Roberts
2023-10-20 15:11       ` Ryan Roberts
2023-10-09 18:50 ` [PATCH v4 09/12] KVM: arm64: Support up to 5 levels of translation in kvm_pgtable Ryan Roberts
2023-10-09 18:50   ` Ryan Roberts
2023-10-09 18:50 ` [PATCH v4 10/12] KVM: arm64: Allow guests with >48-bit IPA size on FEAT_LPA2 systems Ryan Roberts
2023-10-09 18:50   ` Ryan Roberts
2023-10-09 18:50 ` [PATCH v4 11/12] KVM: selftests: arm64: Determine max ipa size per-page size Ryan Roberts
2023-10-09 18:50   ` Ryan Roberts
2023-10-09 18:50 ` [PATCH v4 12/12] KVM: selftests: arm64: Support P52V48 4K and 16K guest_modes Ryan Roberts
2023-10-09 18:50   ` Ryan Roberts
2023-10-20 10:54 ` [PATCH v4 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2 Marc Zyngier
2023-10-20 10:54   ` Marc Zyngier
2023-10-20 15:22   ` Ryan Roberts
2023-10-20 15:22     ` Ryan Roberts
2023-10-23  9:42     ` Marc Zyngier
2023-10-23  9:42       ` Marc Zyngier
2023-10-23 15:00       ` Ryan Roberts
2023-10-23 15:00         ` Ryan Roberts
