* [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
@ 2022-11-24 12:39 Ard Biesheuvel
  2022-11-24 12:39   ` Ard Biesheuvel
                   ` (20 more replies)
  0 siblings, 21 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

Enable support for LPA2 when running with 4k or 16k pages. In the former
case, this requires 5 level paging with a runtime fallback to 4 on
non-LPA2 hardware. For consistency, the same approach is adopted for 16k
pages, where we fall back to 3 level paging (47 bit virtual addressing)
on non-LPA2 configurations. (Falling back to 48 bits would involve
finding a workaround for the fact that we cannot construct a level 0
table covering 52 bits of VA space that is both aligned to its size in
memory and has its top 2 entries, the ones representing the 48-bit
region, at an alignment of 64 bytes, which is what the architecture
requires for TTBR address values. Also, using an additional level of
paging to translate a single VA bit is wasteful in terms of TLB
efficiency.)
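
To make the 16k arithmetic explicit (the numbers below follow from the
granule geometry and are not taken from the kernel headers):

  /*
   * 16k granule: 11 VA bits per level, so a 52-bit level 0 table
   * translates VA bits [51:47], i.e. 32 entries / 256 bytes. The 48-bit
   * region corresponds to the top 2 of those entries, which start at
   * byte offset 30 * 8 = 240 -- not a multiple of 64, so they can never
   * double as a TTBR1 table base.
   */
  #define L0_ENTRIES_52BIT    (1UL << (52 - 47))              /* 32        */
  #define OFFSET_48BIT_REGION ((L0_ENTRIES_52BIT - 2) * 8)    /* 240 bytes */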

This means that support for falling back to 3 levels of paging at
runtime, when the kernel is configured for 4, is also needed.

Another thing worth noting is that the repurposed physical address bits
in the page table descriptors were not RES0 before, and so there is now
a big global switch (called TCR.DS) which controls how all page table
descriptors are interpreted. This requires some extra care in the PTE
conversion helpers, and additional handling in the boot code to ensure
that we set TCR.DS safely if it is supported (and not overridden).
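
As an illustration of what that extra care amounts to, here is a sketch
of a PTE-to-phys conversion with TCR.DS=1 (made-up macro names, 16k
granule assumed; descriptor bits [9:8] then carry output address bits
[51:50], and shareability comes from TCR.SH0/SH1 instead):

  #include <stdint.h>

  #define PTE_ADDR_LOW_LPA2   (((1ULL << 50) - 1) & ~((1ULL << 14) - 1)) /* OA[49:14] */
  #define PTE_ADDR_HIGH_LPA2  (0x3ULL << 8)                              /* OA[51:50] */

  static uint64_t pte_to_phys_lpa2(uint64_t pte)
  {
          /* bits [9:8] move up by 42 places to land at bits [51:50] */
          return (pte & PTE_ADDR_LOW_LPA2) |
                 ((pte & PTE_ADDR_HIGH_LPA2) << 42);
  }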

Note that this series is mostly orthogonal to the work Anshuman did last
year: this series assumes that 52-bit physical addressing is never
needed to map the kernel image itself, and therefore that we never need
ID map range extension to cover the kernel with a 5th level when running
with 4 levels. And given that the LPA2 architectural feature covers both
the virtual and physical range extensions, where enabling the latter is
required to enable the former, we can simplify things further by only
enabling them as a pair. (I.e., 52-bit physical addressing cannot be
enabled for a 48-bit VA space or smaller.)

This series applies on top of some of my previous work that is still in
flight, so these patches will not apply in isolation. The complete
branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-4k-lpa2

It supersedes the RFC v1 I sent out last week, which covered 16k pages
only. It also supersedes some related work I sent out in isolation
before:

[PATCH] arm64: mm: Enable KASAN for 16k/48-bit VA configurations
[PATCH 0/3] arm64: mm: Model LVA support as a CPU feature

Tested on QEMU with -cpu max and lpa2 both off and on, as well as using
the arm64.nolva override kernel command line parameter. Note that this
requires a QEMU built from the latest sources.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>

Anshuman Khandual (3):
  arm64/mm: Simplify and document pte_to_phys() for 52 bit addresses
  arm64/mm: Add FEAT_LPA2 specific TCR_EL1.DS field
  arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]

Ard Biesheuvel (16):
  arm64: kaslr: Adjust randomization range dynamically
  arm64: mm: get rid of kimage_vaddr global variable
  arm64: head: remove order argument from early mapping routine
  arm64: mm: Handle LVA support as a CPU feature
  arm64: mm: Deal with potential ID map extension if VA_BITS >
    VA_BITS_MIN
  arm64: mm: Add feature override support for LVA
  arm64: mm: Wire up TCR.DS bit to PTE shareability fields
  arm64: mm: Add LPA2 support to phys<->pte conversion routines
  arm64: mm: Add definitions to support 5 levels of paging
  arm64: mm: add 5 level paging support to G-to-nG conversion routine
  arm64: Enable LPA2 at boot if supported by the system
  arm64: mm: Add 5 level paging support to fixmap and swapper handling
  arm64: kasan: Reduce minimum shadow alignment and enable 5 level
    paging
  arm64: mm: Add support for folding PUDs at runtime
  arm64: ptdump: Disregard unaddressable VA space
  arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs

 arch/arm64/Kconfig                      |  23 ++-
 arch/arm64/include/asm/assembler.h      |  42 ++---
 arch/arm64/include/asm/cpufeature.h     |   2 +
 arch/arm64/include/asm/fixmap.h         |   1 +
 arch/arm64/include/asm/kernel-pgtable.h |  27 ++-
 arch/arm64/include/asm/memory.h         |  23 ++-
 arch/arm64/include/asm/pgalloc.h        |  53 +++++-
 arch/arm64/include/asm/pgtable-hwdef.h  |  34 +++-
 arch/arm64/include/asm/pgtable-prot.h   |  18 +-
 arch/arm64/include/asm/pgtable-types.h  |   6 +
 arch/arm64/include/asm/pgtable.h        | 197 ++++++++++++++++++--
 arch/arm64/include/asm/sysreg.h         |   2 +
 arch/arm64/include/asm/tlb.h            |   3 +-
 arch/arm64/kernel/cpufeature.c          |  46 ++++-
 arch/arm64/kernel/head.S                |  99 +++++-----
 arch/arm64/kernel/image-vars.h          |   4 +
 arch/arm64/kernel/pi/idreg-override.c   |  29 ++-
 arch/arm64/kernel/pi/kaslr_early.c      |  23 ++-
 arch/arm64/kernel/pi/map_kernel.c       | 115 +++++++++++-
 arch/arm64/kernel/sleep.S               |   3 -
 arch/arm64/mm/init.c                    |   2 +-
 arch/arm64/mm/kasan_init.c              | 124 ++++++++++--
 arch/arm64/mm/mmap.c                    |   4 +
 arch/arm64/mm/mmu.c                     | 138 ++++++++++----
 arch/arm64/mm/pgd.c                     |  17 +-
 arch/arm64/mm/proc.S                    |  76 +++++++-
 arch/arm64/mm/ptdump.c                  |   4 +-
 arch/arm64/tools/cpucaps                |   1 +
 28 files changed, 907 insertions(+), 209 deletions(-)

-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 01/19] arm64/mm: Simplify and document pte_to_phys() for 52 bit addresses
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
@ 2022-11-24 12:39   ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 02/19] arm64/mm: Add FEAT_LPA2 specific TCR_EL1.DS field Ard Biesheuvel
                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts, linux-kernel

From: Anshuman Khandual <anshuman.khandual@arm.com>

The pte_to_phys() assembly macro performs multiple bit field
transformations to derive the physical address embedded inside a page
table entry. Unlike its C counterpart, i.e. __pte_to_phys(), the result
is not very easy to follow. Simplify these operations via a new macro,
PTE_ADDR_HIGH_SHIFT, indicating how far the PTE encoded high address
bits need to be shifted to the left. While at it, also update
__pte_to_phys() and __phys_to_pte_val().
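
For reference, here is a stand-alone round trip of the two helpers, with
the 64k pages / 52-bit PA layout (PA[51:48] stored in PTE bits [15:12],
hence the shift of 48 - 12 = 36) plugged in as constants:

  #include <assert.h>
  #include <stdint.h>

  #define PTE_ADDR_LOW   0x0000ffffffff0000ULL  /* PA[47:16]               */
  #define PTE_ADDR_HIGH  0x000000000000f000ULL  /* PA[51:48] in bits 15:12 */
  #define HIGH_SHIFT     36                     /* 48 - 12                 */

  static uint64_t phys_to_pte(uint64_t pa)
  {
          return (pa | (pa >> HIGH_SHIFT)) & (PTE_ADDR_LOW | PTE_ADDR_HIGH);
  }

  static uint64_t pte_to_phys(uint64_t pte)
  {
          return (pte & PTE_ADDR_LOW) | ((pte & PTE_ADDR_HIGH) << HIGH_SHIFT);
  }

  int main(void)
  {
          uint64_t pa = 0x000fabcdef650000ULL;  /* 64k aligned 52-bit PA */

          assert(pte_to_phys(phys_to_pte(pa)) == pa);
          return 0;
  }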

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20221107141753.2938621-1-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/assembler.h     | 8 +++-----
 arch/arm64/include/asm/pgtable-hwdef.h | 1 +
 arch/arm64/include/asm/pgtable.h       | 4 ++--
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index e5957a53be39..89038067ef34 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -660,12 +660,10 @@ alternative_endif
 	.endm
 
 	.macro	pte_to_phys, phys, pte
-#ifdef CONFIG_ARM64_PA_BITS_52
-	ubfiz	\phys, \pte, #(48 - 16 - 12), #16
-	bfxil	\phys, \pte, #16, #32
-	lsl	\phys, \phys, #16
-#else
 	and	\phys, \pte, #PTE_ADDR_MASK
+#ifdef CONFIG_ARM64_PA_BITS_52
+	orr	\phys, \phys, \phys, lsl #PTE_ADDR_HIGH_SHIFT
+	and	\phys, \phys, GENMASK_ULL(PHYS_MASK_SHIFT - 1, PAGE_SHIFT)
 #endif
 	.endm
 
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 5ab8d163198f..f658aafc47df 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -159,6 +159,7 @@
 #ifdef CONFIG_ARM64_PA_BITS_52
 #define PTE_ADDR_HIGH		(_AT(pteval_t, 0xf) << 12)
 #define PTE_ADDR_MASK		(PTE_ADDR_LOW | PTE_ADDR_HIGH)
+#define PTE_ADDR_HIGH_SHIFT	36
 #else
 #define PTE_ADDR_MASK		PTE_ADDR_LOW
 #endif
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 71a1af42f0e8..daedd6172227 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -77,11 +77,11 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 static inline phys_addr_t __pte_to_phys(pte_t pte)
 {
 	return (pte_val(pte) & PTE_ADDR_LOW) |
-		((pte_val(pte) & PTE_ADDR_HIGH) << 36);
+		((pte_val(pte) & PTE_ADDR_HIGH) << PTE_ADDR_HIGH_SHIFT);
 }
 static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
 {
-	return (phys | (phys >> 36)) & PTE_ADDR_MASK;
+	return (phys | (phys >> PTE_ADDR_HIGH_SHIFT)) & PTE_ADDR_MASK;
 }
 #else
 #define __pte_to_phys(pte)	(pte_val(pte) & PTE_ADDR_MASK)
-- 
2.38.1.584.g0f3c55d4c2-goog




* [PATCH v2 02/19] arm64/mm: Add FEAT_LPA2 specific TCR_EL1.DS field
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
  2022-11-24 12:39   ` Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 03/19] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] Ard Biesheuvel
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

From: Anshuman Khandual <anshuman.khandual@arm.com>

As per the ARM ARM (0487G.A), the TCR_EL1.DS field controls whether
52-bit input and output addresses are supported on the 4K and 16K page
size configurations when FEAT_LPA2 is implemented. Add the TCR_DS field
definition, which will be used when FEAT_LPA2 gets enabled.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgtable-hwdef.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index f658aafc47df..c4ad7fbb12c5 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -276,6 +276,7 @@
 #define TCR_E0PD1		(UL(1) << 56)
 #define TCR_TCMA0		(UL(1) << 57)
 #define TCR_TCMA1		(UL(1) << 58)
+#define TCR_DS			(UL(1) << 59)
 
 /*
  * TTBR.
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 03/19] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
  2022-11-24 12:39   ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 02/19] arm64/mm: Add FEAT_LPA2 specific TCR_EL1.DS field Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 04/19] arm64: kaslr: Adjust randomization range dynamically Ard Biesheuvel
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

From: Anshuman Khandual <anshuman.khandual@arm.com>

PAGE_SIZE support is tested against the possible minimum and maximum
values of its respective ID_AA64MMFR0.TGRAN field, depending on whether
the field is signed or unsigned. In addition, a FEAT_LPA2 implementation
needs to be validated for 4K and 16K page sizes via feature specific
ID_AA64MMFR0.TGRAN values. Hence, add the FEAT_LPA2 specific
ID_AA64MMFR0.TGRAN[2] values per the ARM ARM (0487G.A).

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/sysreg.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 7d301700d1a9..9547be423e1f 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -800,11 +800,13 @@
 
 #if defined(CONFIG_ARM64_4K_PAGES)
 #define ID_AA64MMFR0_EL1_TGRAN_SHIFT		ID_AA64MMFR0_EL1_TGRAN4_SHIFT
+#define ID_AA64MMFR0_EL1_TGRAN_LPA2		ID_AA64MMFR0_EL1_TGRAN4_52_BIT
 #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN	ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN
 #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX	ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX
 #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT		ID_AA64MMFR0_EL1_TGRAN4_2_SHIFT
 #elif defined(CONFIG_ARM64_16K_PAGES)
 #define ID_AA64MMFR0_EL1_TGRAN_SHIFT		ID_AA64MMFR0_EL1_TGRAN16_SHIFT
+#define ID_AA64MMFR0_EL1_TGRAN_LPA2		ID_AA64MMFR0_EL1_TGRAN16_52_BIT
 #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN	ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN
 #define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX	ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX
 #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT		ID_AA64MMFR0_EL1_TGRAN16_2_SHIFT
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 04/19] arm64: kaslr: Adjust randomization range dynamically
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 03/19] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 05/19] arm64: mm: get rid of kimage_vaddr global variable Ard Biesheuvel
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

Currently, we base the KASLR randomization range on a rough estimate of
the available space in the vmalloc region: the lower 1/4th has the module
region and the upper 1/4th has the fixmap, vmemmap and PCI I/O ranges,
and so we pick a random location in the remaining space in the middle.

Once we enable support for 5-level paging with 4k pages, this no longer
works: the vmemmap region, being dimensioned to cover a 52-bit linear
region, takes up so much space in the upper VA region (whose size is based
on a 48-bit VA space for compatibility with non-LVA hardware) that the
region above the vmalloc region takes up more than a quarter of the
available space.

So instead of a heuristic, let's derive the randomization range from the
actual boundaries of the various regions. Note that this requires some
tweaks to the early fixmap init logic so it can deal with upper
translation levels having already been populated by the time we reach
that function. Note, however, that the level 3 page table cannot be
shared between the fixmap and the kernel, so this needs to be taken into
account when defining the range.
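
For reference, the offset calculation added below boils down to the
following scaling step (stand-alone rendering, function name made up):

  #include <stdint.h>

  /*
   * Map the top 32 bits of the seed onto [0, range) without a 128-bit
   * multiply: range is 64k aligned, so splitting the >>32 into two >>16
   * steps does not lose any precision.
   */
  static uint64_t scale_range(uint64_t range, uint64_t seed)
  {
          return ((range >> 16) * (seed >> 32)) >> 16;
  }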

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/pi/kaslr_early.c | 23 +++++++++++++++-----
 arch/arm64/mm/mmu.c                | 21 +++++-------------
 2 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
index 965515f7f180..51142c4f2659 100644
--- a/arch/arm64/kernel/pi/kaslr_early.c
+++ b/arch/arm64/kernel/pi/kaslr_early.c
@@ -13,7 +13,9 @@
 #include <linux/string.h>
 
 #include <asm/archrandom.h>
+#include <asm/boot.h>
 #include <asm/memory.h>
+#include <asm/pgtable.h>
 
 #include "pi.h"
 
@@ -40,6 +42,16 @@ static u64 __init get_kaslr_seed(void *fdt, int node)
 
 u64 __init kaslr_early_init(void *fdt, int chosen)
 {
+	/*
+	 * The kernel can be mapped almost anywhere in the vmalloc region,
+	 * although we have to ensure that we don't share a level 3 table with
+	 * the fixmap, which installs its own statically allocated one (bm_pte)
+	 * and manipulates the slots by writing into the array directly.
+	 * We also have to account for the offset modulo 2 MiB resulting from
+	 * the physical placement of the image.
+	 */
+	const u64 range = (VMALLOC_END & PMD_MASK) - MODULES_END -
+			  ((u64)_end - ALIGN_DOWN((u64)_text, MIN_KIMG_ALIGN));
 	u64 seed;
 
 	if (cpuid_feature_extract_unsigned_field(arm64_sw_feature_override.val,
@@ -56,11 +68,10 @@ u64 __init kaslr_early_init(void *fdt, int chosen)
 	memstart_offset_seed = seed & U16_MAX;
 
 	/*
-	 * OK, so we are proceeding with KASLR enabled. Calculate a suitable
-	 * kernel image offset from the seed. Let's place the kernel in the
-	 * middle half of the VMALLOC area (VA_BITS_MIN - 2), and stay clear of
-	 * the lower and upper quarters to avoid colliding with other
-	 * allocations.
+	 * Multiply 'range' by a value in [0 .. U32_MAX], and shift the result
+	 * right by 32 bits, to obtain a value in the range [0 .. range). To
+	 * avoid loss of precision in the multiplication, split the right shift
+	 * in two shifts by 16 (range is 64k aligned in any case)
 	 */
-	return BIT(VA_BITS_MIN - 3) + (seed & GENMASK(VA_BITS_MIN - 3, 16));
+	return (((range >> 16) * (seed >> 32)) >> 16) & ~(MIN_KIMG_ALIGN - 1);
 }
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 6942255056ae..222c1154b550 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1222,23 +1222,14 @@ void __init early_fixmap_init(void)
 	pgdp = pgd_offset_k(addr);
 	p4dp = p4d_offset(pgdp, addr);
 	p4d = READ_ONCE(*p4dp);
-	if (CONFIG_PGTABLE_LEVELS > 3 &&
-	    !(p4d_none(p4d) || p4d_page_paddr(p4d) == __pa_symbol(bm_pud))) {
-		/*
-		 * We only end up here if the kernel mapping and the fixmap
-		 * share the top level pgd entry, which should only happen on
-		 * 16k/4 levels configurations.
-		 */
-		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
-		pudp = pud_offset_kimg(p4dp, addr);
-	} else {
-		if (p4d_none(p4d))
-			__p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
-		pudp = fixmap_pud(addr);
-	}
+	if (p4d_none(p4d))
+		__p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
+
+	pudp = pud_offset_kimg(p4dp, addr);
 	if (pud_none(READ_ONCE(*pudp)))
 		__pud_populate(pudp, __pa_symbol(bm_pmd), PUD_TYPE_TABLE);
-	pmdp = fixmap_pmd(addr);
+
+	pmdp = pmd_offset_kimg(pudp, addr);
 	__pmd_populate(pmdp, __pa_symbol(bm_pte), PMD_TYPE_TABLE);
 
 	/*
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 05/19] arm64: mm: get rid of kimage_vaddr global variable
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 04/19] arm64: kaslr: Adjust randomization range dynamically Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 06/19] arm64: head: remove order argument from early mapping routine Ard Biesheuvel
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

We store the address of _text in kimage_vaddr, but since commit
09e3c22a86f6889d ("arm64: Use a variable to store non-global mappings
decision"), we no longer reference this variable from modules so we no
longer need to export it.

In fact, we don't need it at all so let's just get rid of it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/memory.h | 6 ++----
 arch/arm64/kernel/head.S        | 2 +-
 arch/arm64/mm/mmu.c             | 3 ---
 3 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 78e5163836a0..a4e1d832a15a 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -182,6 +182,7 @@
 #include <linux/types.h>
 #include <asm/boot.h>
 #include <asm/bug.h>
+#include <asm/sections.h>
 
 #if VA_BITS > 48
 extern u64			vabits_actual;
@@ -193,15 +194,12 @@ extern s64			memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
 #define PHYS_OFFSET		({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
 
-/* the virtual base of the kernel image */
-extern u64			kimage_vaddr;
-
 /* the offset between the kernel virtual and physical mappings */
 extern u64			kimage_voffset;
 
 static inline unsigned long kaslr_offset(void)
 {
-	return kimage_vaddr - KIMAGE_VADDR;
+	return (u64)&_text - KIMAGE_VADDR;
 }
 
 static inline bool kaslr_enabled(void)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 3ea5bf0a6e17..3b3c5e8e84af 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -399,7 +399,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 
 	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
 
-	ldr_l	x4, kimage_vaddr		// Save the offset between
+	adrp	x4, _text			// Save the offset between
 	sub	x4, x4, x0			// the kernel virtual and
 	str_l	x4, kimage_voffset, x5		// physical mappings
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 222c1154b550..d083ac6a0764 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -50,9 +50,6 @@ u64 vabits_actual __ro_after_init = VA_BITS_MIN;
 EXPORT_SYMBOL(vabits_actual);
 #endif
 
-u64 kimage_vaddr __ro_after_init = (u64)&_text;
-EXPORT_SYMBOL(kimage_vaddr);
-
 u64 kimage_voffset __ro_after_init;
 EXPORT_SYMBOL(kimage_voffset);
 
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 06/19] arm64: head: remove order argument from early mapping routine
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 05/19] arm64: mm: get rid of kimage_vaddr global variable Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 07/19] arm64: mm: Handle LVA support as a CPU feature Ard Biesheuvel
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

When creating mappings in the upper region of the address space, it is
important to know the order of the table being created, i.e., the number
of bits that are being translated at the level in question. Bits beyond
that number do not contribute to the virtual address, and need to be
masked out.

Now that we no longer use the asm kernel page creation code for mappings
in the upper region, those bits are guaranteed to be zero anyway, so we
don't have to account for them in the masking.

This means we can simply use the maximum order for all tables,
including the root level table. Doing so will also allow us to
transparently use the same routines creating the initial ID map covering
4 levels when the VA space is configured for 5.

Note that the root level tables are always statically allocated as full
pages regardless of how many VA bits they translate.
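
For reference, the quantity that compute_indices now extracts at each
level boils down to this (stand-alone C sketch, 16k pages assumed):

  #include <stdint.h>

  #define PAGE_SHIFT  14  /* 16k pages, for the sake of the example */

  /*
   * With 8-byte descriptors, every level translates PAGE_SHIFT - 3 VA
   * bits (9 for 4k, 11 for 16k, 13 for 64k); masking the root level
   * index with the same width is harmless as long as the VA bits above
   * the translated range are zero.
   */
  static uint64_t table_index(uint64_t va, int shift)
  {
          return (va >> shift) & ((1ULL << (PAGE_SHIFT - 3)) - 1);
  }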

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S | 26 +++++++++-----------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 3b3c5e8e84af..a37525a5ee34 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -158,7 +158,6 @@ SYM_CODE_END(preserve_boot_args)
  *	vstart:	virtual address of start of range
  *	vend:	virtual address of end of range - we map [vstart, vend]
  *	shift:	shift used to transform virtual address into index
- *	order:  #imm 2log(number of entries in page table)
  *	istart:	index in table corresponding to vstart
  *	iend:	index in table corresponding to vend
  *	count:	On entry: how many extra entries were required in previous level, scales
@@ -168,10 +167,10 @@ SYM_CODE_END(preserve_boot_args)
  * Preserves:	vstart, vend
  * Returns:	istart, iend, count
  */
-	.macro compute_indices, vstart, vend, shift, order, istart, iend, count
-	ubfx	\istart, \vstart, \shift, \order
-	ubfx	\iend, \vend, \shift, \order
-	add	\iend, \iend, \count, lsl \order
+	.macro compute_indices, vstart, vend, shift, istart, iend, count
+	ubfx	\istart, \vstart, \shift, #PAGE_SHIFT - 3
+	ubfx	\iend, \vend, \shift, #PAGE_SHIFT - 3
+	add	\iend, \iend, \count, lsl #PAGE_SHIFT - 3
 	sub	\count, \iend, \istart
 	.endm
 
@@ -186,7 +185,6 @@ SYM_CODE_END(preserve_boot_args)
  *	vend:	virtual address of end of range - we map [vstart, vend - 1]
  *	flags:	flags to use to map last level entries
  *	phys:	physical address corresponding to vstart - physical memory is contiguous
- *	order:  #imm 2log(number of entries in PGD table)
  *
  * If extra_shift is set, an extra level will be populated if the end address does
  * not fit in 'extra_shift' bits. This assumes vend is in the TTBR0 range.
@@ -195,7 +193,7 @@ SYM_CODE_END(preserve_boot_args)
  * Preserves:	vstart, flags
  * Corrupts:	tbl, rtbl, vend, istart, iend, tmp, count, sv
  */
-	.macro map_memory, tbl, rtbl, vstart, vend, flags, phys, order, istart, iend, tmp, count, sv, extra_shift
+	.macro map_memory, tbl, rtbl, vstart, vend, flags, phys, istart, iend, tmp, count, sv, extra_shift
 	sub \vend, \vend, #1
 	add \rtbl, \tbl, #PAGE_SIZE
 	mov \count, #0
@@ -203,32 +201,32 @@ SYM_CODE_END(preserve_boot_args)
 	.ifnb	\extra_shift
 	tst	\vend, #~((1 << (\extra_shift)) - 1)
 	b.eq	.L_\@
-	compute_indices \vstart, \vend, #\extra_shift, #(PAGE_SHIFT - 3), \istart, \iend, \count
+	compute_indices \vstart, \vend, #\extra_shift, \istart, \iend, \count
 	mov \sv, \rtbl
 	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
 	mov \tbl, \sv
 	.endif
 .L_\@:
-	compute_indices \vstart, \vend, #PGDIR_SHIFT, #\order, \istart, \iend, \count
+	compute_indices \vstart, \vend, #PGDIR_SHIFT, \istart, \iend, \count
 	mov \sv, \rtbl
 	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
 	mov \tbl, \sv
 
 #if INIT_IDMAP_TABLE_LEVELS > 3
-	compute_indices \vstart, \vend, #PUD_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
+	compute_indices \vstart, \vend, #PUD_SHIFT, \istart, \iend, \count
 	mov \sv, \rtbl
 	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
 	mov \tbl, \sv
 #endif
 
 #if INIT_IDMAP_TABLE_LEVELS > 2
-	compute_indices \vstart, \vend, #INIT_IDMAP_TABLE_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
+	compute_indices \vstart, \vend, #INIT_IDMAP_TABLE_SHIFT, \istart, \iend, \count
 	mov \sv, \rtbl
 	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
 	mov \tbl, \sv
 #endif
 
-	compute_indices \vstart, \vend, #INIT_IDMAP_BLOCK_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
+	compute_indices \vstart, \vend, #INIT_IDMAP_BLOCK_SHIFT, \istart, \iend, \count
 	bic \rtbl, \phys, #INIT_IDMAP_BLOCK_SIZE - 1
 	populate_entries \tbl, \rtbl, \istart, \iend, \flags, #INIT_IDMAP_BLOCK_SIZE, \tmp
 	.endm
@@ -294,7 +292,6 @@ SYM_FUNC_START_LOCAL(create_idmap)
 	 *   requires more than 47 or 48 bits, respectively.
 	 */
 #if (VA_BITS < 48)
-#define IDMAP_PGD_ORDER	(VA_BITS - PGDIR_SHIFT)
 #define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
 
 	/*
@@ -308,7 +305,6 @@ SYM_FUNC_START_LOCAL(create_idmap)
 #error "Mismatch between VA_BITS and page size/number of translation levels"
 #endif
 #else
-#define IDMAP_PGD_ORDER	(PHYS_MASK_SHIFT - PGDIR_SHIFT)
 #define EXTRA_SHIFT
 	/*
 	 * If VA_BITS == 48, we don't have to configure an additional
@@ -320,7 +316,7 @@ SYM_FUNC_START_LOCAL(create_idmap)
 	adrp	x6, _end + MAX_FDT_SIZE + INIT_IDMAP_BLOCK_SIZE
 	mov	x7, INIT_IDMAP_RX_MMUFLAGS
 
-	map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
+	map_memory x0, x1, x3, x6, x7, x3, x10, x11, x12, x13, x14, EXTRA_SHIFT
 
 	/* Remap BSS and the kernel page tables r/w in the ID map */
 	adrp	x1, _text
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 07/19] arm64: mm: Handle LVA support as a CPU feature
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 06/19] arm64: head: remove order argument from early mapping routine Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-28 14:54   ` Ryan Roberts
  2022-11-24 12:39 ` [PATCH v2 08/19] arm64: mm: Deal with potential ID map extension if VA_BITS > VA_BITS_MIN Ard Biesheuvel
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

Currently, we detect CPU support for 52-bit virtual addressing (LVA)
extremely early, before creating the kernel page tables or enabling the
MMU. We cannot override the feature this early, and so large virtual
addressing is always enabled on CPUs that implement support for it if
the software support for it was enabled at build time. It also means we
rely on non-trivial code in asm to deal with this feature.

Given that both the ID map and the TTBR1 mapping of the kernel image are
guaranteed to be 48-bit addressable, it is not actually necessary to
enable support this early, and instead, we can model it as a CPU
feature. That way, we can rely on code patching to get the correct
TCR.T1SZ values programmed on secondary boot and resume from suspend.

On the primary boot path, we simply enable the MMU with 48-bit virtual
addressing initially, and update TCR.T1SZ from C code if LVA is
supported, right before creating the kernel mapping. Given that TTBR1
still
points to reserved_pg_dir at this point, updating TCR.T1SZ should be
safe without the need for explicit TLB maintenance.

Since this gets rid of all accesses to the vabits_actual variable from
asm code that occurred before TCR.T1SZ had been programmed, we no longer
have a need for this variable, and we can replace it with a C expression
that produces the correct value directly, based on the value of TCR.T1SZ.
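
For reference, the replacement expression is simply the inverse of how
T1SZ is defined (stand-alone rendering; the hard-coded field offset and
width match the ones open-coded in the #define below):

  #include <stdint.h>

  /* TCR_EL1.T1SZ (bits [21:16]) holds 64 - (number of VA bits) */
  static unsigned int vabits_from_tcr(uint64_t tcr)
  {
          return 64 - ((tcr >> 16) & 63); /* T1SZ == 16 -> 48 bits, 12 -> 52 */
  }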

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/memory.h   | 13 ++++++++++-
 arch/arm64/kernel/cpufeature.c    | 13 +++++++++++
 arch/arm64/kernel/head.S          | 24 +++-----------------
 arch/arm64/kernel/pi/map_kernel.c | 12 ++++++++++
 arch/arm64/kernel/sleep.S         |  3 ---
 arch/arm64/mm/mmu.c               |  5 ----
 arch/arm64/mm/proc.S              | 17 +++++++-------
 arch/arm64/tools/cpucaps          |  1 +
 8 files changed, 49 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index a4e1d832a15a..b3826ff2e52b 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -183,9 +183,20 @@
 #include <asm/boot.h>
 #include <asm/bug.h>
 #include <asm/sections.h>
+#include <asm/sysreg.h>
+
+static inline u64 __pure read_tcr(void)
+{
+	u64  tcr;
+
+	// read_sysreg() uses asm volatile, so avoid it here
+	asm("mrs %0, tcr_el1" : "=r"(tcr));
+	return tcr;
+}
 
 #if VA_BITS > 48
-extern u64			vabits_actual;
+// For reasons of #include hell, we can't use TCR_T1SZ_OFFSET/TCR_T1SZ_MASK here
+#define vabits_actual		(64 - ((read_tcr() >> 16) & 63))
 #else
 #define vabits_actual		((u64)VA_BITS)
 #endif
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index eca9df123a8b..b44aece5024c 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2654,6 +2654,19 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_cpuid_feature,
 		.cpu_enable = cpu_trap_el0_impdef,
 	},
+#ifdef CONFIG_ARM64_VA_BITS_52
+	{
+		.desc = "52-bit Virtual Addressing (LVA)",
+		.capability = ARM64_HAS_LVA,
+		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
+		.sys_reg = SYS_ID_AA64MMFR2_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_width = 4,
+		.field_pos = ID_AA64MMFR2_EL1_VARange_SHIFT,
+		.matches = has_cpuid_feature,
+		.min_field_value = ID_AA64MMFR2_EL1_VARange_52,
+	},
+#endif
 	{},
 };
 
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index a37525a5ee34..d423ff78474e 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -80,7 +80,6 @@
 	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
 	 *  x22        create_idmap() .. start_kernel()         ID map VA of the DT blob
-	 *  x25        primary_entry() .. start_kernel()        supported VA size
 	 *  x28        create_idmap()                           callee preserved temp register
 	 */
 SYM_CODE_START(primary_entry)
@@ -95,14 +94,6 @@ SYM_CODE_START(primary_entry)
 	 * On return, the CPU will be ready for the MMU to be turned on and
 	 * the TCR will have been set.
 	 */
-#if VA_BITS > 48
-	mrs_s	x0, SYS_ID_AA64MMFR2_EL1
-	tst	x0, #0xf << ID_AA64MMFR2_EL1_VARange_SHIFT
-	mov	x0, #VA_BITS
-	mov	x25, #VA_BITS_MIN
-	csel	x25, x25, x0, eq
-	mov	x0, x25
-#endif
 	bl	__cpu_setup			// initialise processor
 	b	__primary_switch
 SYM_CODE_END(primary_entry)
@@ -402,11 +393,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	mov	x0, x20
 	bl	set_cpu_boot_mode_flag
 
-#if VA_BITS > 48
-	adr_l	x8, vabits_actual		// Set this early so KASAN early init
-	str	x25, [x8]			// ... observes the correct value
-	dc	civac, x8			// Make visible to booting secondaries
-#endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
 #endif
@@ -521,9 +507,6 @@ SYM_FUNC_START_LOCAL(secondary_startup)
 	mov	x20, x0				// preserve boot mode
 	bl	finalise_el2
 	bl	__cpu_secondary_check52bitva
-#if VA_BITS > 48
-	ldr_l	x0, vabits_actual
-#endif
 	bl	__cpu_setup			// initialise processor
 	adrp	x1, swapper_pg_dir
 	adrp	x2, idmap_pg_dir
@@ -624,10 +607,9 @@ SYM_FUNC_END(__enable_mmu)
 
 SYM_FUNC_START(__cpu_secondary_check52bitva)
 #if VA_BITS > 48
-	ldr_l	x0, vabits_actual
-	cmp	x0, #52
-	b.ne	2f
-
+alternative_if_not ARM64_HAS_LVA
+	ret
+alternative_else_nop_endif
 	mrs_s	x0, SYS_ID_AA64MMFR2_EL1
 	and	x0, x0, #(0xf << ID_AA64MMFR2_EL1_VARange_SHIFT)
 	cbnz	x0, 2f
diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
index 36537d1b837b..7dd6daee0ffd 100644
--- a/arch/arm64/kernel/pi/map_kernel.c
+++ b/arch/arm64/kernel/pi/map_kernel.c
@@ -122,6 +122,15 @@ static bool __init arm64_early_this_cpu_has_e0pd(void)
 						    ID_AA64MMFR2_EL1_E0PD_SHIFT);
 }
 
+static bool __init arm64_early_this_cpu_has_lva(void)
+{
+	u64 mmfr2;
+
+	mmfr2 = read_sysreg_s(SYS_ID_AA64MMFR2_EL1);
+	return cpuid_feature_extract_unsigned_field(mmfr2,
+						    ID_AA64MMFR2_EL1_VARange_SHIFT);
+}
+
 static bool __init arm64_early_this_cpu_has_pac(void)
 {
 	u64 isar1, isar2;
@@ -284,6 +293,9 @@ asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 			arm64_use_ng_mappings = true;
 	}
 
+	if (VA_BITS == 52 && arm64_early_this_cpu_has_lva())
+		sysreg_clear_set(tcr_el1, TCR_T1SZ_MASK, TCR_T1SZ(VA_BITS));
+
 	va_base = KIMAGE_VADDR + kaslr_offset;
 	map_kernel(kaslr_offset, va_base - pa_base, root_level);
 }
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index 97c9de57725d..617f78ad43a1 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -101,9 +101,6 @@ SYM_FUNC_END(__cpu_suspend_enter)
 SYM_CODE_START(cpu_resume)
 	bl	init_kernel_el
 	bl	finalise_el2
-#if VA_BITS > 48
-	ldr_l	x0, vabits_actual
-#endif
 	bl	__cpu_setup
 	/* enable the MMU early - so we can access sleep_save_stash by va */
 	adrp	x1, swapper_pg_dir
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d083ac6a0764..b0c702e5bf66 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -45,11 +45,6 @@
 
 int idmap_t0sz __ro_after_init;
 
-#if VA_BITS > 48
-u64 vabits_actual __ro_after_init = VA_BITS_MIN;
-EXPORT_SYMBOL(vabits_actual);
-#endif
-
 u64 kimage_voffset __ro_after_init;
 EXPORT_SYMBOL(kimage_voffset);
 
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index ae2caab15ac8..e14648ca820b 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -400,8 +400,6 @@ SYM_FUNC_END(idmap_kpti_install_ng_mappings)
  *
  *	Initialise the processor for turning the MMU on.
  *
- * Input:
- *	x0 - actual number of VA bits (ignored unless VA_BITS > 48)
  * Output:
  *	Return in x0 the value of the SCTLR_EL1 register.
  */
@@ -426,20 +424,21 @@ SYM_FUNC_START(__cpu_setup)
 	mair	.req	x17
 	tcr	.req	x16
 	mov_q	mair, MAIR_EL1_SET
-	mov_q	tcr, TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
+	mov_q	tcr, TCR_TxSZ(VA_BITS_MIN) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
 			TCR_TG_FLAGS | TCR_KASLR_FLAGS | TCR_ASID16 | \
 			TCR_TBI0 | TCR_A1 | TCR_KASAN_SW_FLAGS | TCR_MTE_FLAGS
 
 	tcr_clear_errata_bits tcr, x9, x5
 
-#ifdef CONFIG_ARM64_VA_BITS_52
-	sub		x9, xzr, x0
-	add		x9, x9, #64
-	tcr_set_t1sz	tcr, x9
-#else
+#if VA_BITS < 48
 	idmap_get_t0sz	x9
-#endif
 	tcr_set_t0sz	tcr, x9
+#elif VA_BITS > VA_BITS_MIN
+	mov		x9, #64 - VA_BITS
+alternative_if ARM64_HAS_LVA
+	tcr_set_t1sz	tcr, x9
+alternative_else_nop_endif
+#endif
 
 	/*
 	 * Set the IPS bits in TCR_EL1.
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index f1c0347ec31a..ec650a2cf433 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -30,6 +30,7 @@ HAS_GENERIC_AUTH_IMP_DEF
 HAS_IRQ_PRIO_MASKING
 HAS_LDAPR
 HAS_LSE_ATOMICS
+HAS_LVA
 HAS_NO_FPSIMD
 HAS_NO_HW_PREFETCH
 HAS_PAN
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 08/19] arm64: mm: Deal with potential ID map extension if VA_BITS > VA_BITS_MIN
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 07/19] arm64: mm: Handle LVA support as a CPU feature Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 09/19] arm64: mm: Add feature override support for LVA Ard Biesheuvel
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

On 16k pages with LPA2, we will fall back to 47 bits of VA space in case
LPA2 is not implemented. Since we support loading the kernel anywhere in
the 48-bit addressable PA space, this means we may have to extend the ID
map like we normally do in such cases, even when VA_BITS >= 48.

Since VA_BITS_MIN will equal 47 in that case, use that symbolic constant
instead to determine whether ID map extension is required. Also, use
vabits_actual to determine whether create_idmap() needs to add an extra
level as well, as this is never needed if LPA2 is enabled at runtime.

Note that VA_BITS, VA_BITS_MIN and vabits_actual all collapse into the
same compile time constant on the configurations that currently support
ID map extension, so this change should have no effect there whatsoever.

Note that the use of PGDIR_SHIFT in the calculation of EXTRA_SHIFT is no
longer appropriate, as it is derived from VA_BITS not VA_BITS_MIN. So
rephrase the check whether VA_BITS_MIN is consistent with the number of
levels.
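
For reference, the rephrased check holds for all configurations that
take the VA_BITS_MIN < 48 path (the VA_BITS_MIN/PAGE_SHIFT pairs below
are stated as assumptions rather than pulled from the headers):

  _Static_assert((39 - 12) % (12 - 3) == 0, " 4k pages, 39-bit VA");
  _Static_assert((36 - 14) % (14 - 3) == 0, "16k pages, 36-bit VA");
  _Static_assert((47 - 14) % (14 - 3) == 0, "16k pages, 47-bit VA (LPA2 fallback)");
  _Static_assert((42 - 16) % (16 - 3) == 0, "64k pages, 42-bit VA");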

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/kernel-pgtable.h |  2 +-
 arch/arm64/kernel/head.S                | 40 ++++++++++----------
 arch/arm64/mm/mmu.c                     |  4 +-
 arch/arm64/mm/proc.S                    |  5 ++-
 4 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 4278cd088347..faa11e8b4a0e 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -73,7 +73,7 @@
 #define INIT_DIR_SIZE (PAGE_SIZE * (EARLY_PAGES(KIMAGE_VADDR, _end, EARLY_KASLR) + EARLY_SEGMENT_EXTRA_PAGES))
 
 /* the initial ID map may need two extra pages if it needs to be extended */
-#if VA_BITS < 48
+#if VA_BITS_MIN < 48
 #define INIT_IDMAP_DIR_SIZE	((INIT_IDMAP_DIR_PAGES + 2) * PAGE_SIZE)
 #else
 #define INIT_IDMAP_DIR_SIZE	(INIT_IDMAP_DIR_PAGES * PAGE_SIZE)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index d423ff78474e..94de42dfe97d 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -266,40 +266,40 @@ SYM_FUNC_START_LOCAL(create_idmap)
 	 * space for ordinary kernel and user space mappings.
 	 *
 	 * There are three cases to consider here:
-	 * - 39 <= VA_BITS < 48, and the ID map needs up to 48 VA bits to cover
-	 *   the placement of the image. In this case, we configure one extra
-	 *   level of translation on the fly for the ID map only. (This case
-	 *   also covers 42-bit VA/52-bit PA on 64k pages).
+	 * - 39 <= VA_BITS_MIN < 48, and the ID map needs up to 48 VA bits to
+	 *   cover the placement of the image. In this case, we configure one
+	 *   extra level of translation on the fly for the ID map only. (This
+	 *   case also covers 42-bit VA/52-bit PA on 64k pages).
 	 *
-	 * - VA_BITS == 48, and the ID map needs more than 48 VA bits. This can
-	 *   only happen when using 64k pages, in which case we need to extend
-	 *   the root level table rather than add a level. Note that we can
-	 *   treat this case as 'always extended' as long as we take care not
-	 *   to program an unsupported T0SZ value into the TCR register.
+	 * - VA_BITS_MIN == 48, and the ID map needs more than 48 VA bits. This
+	 *   can only happen when using 64k pages, in which case we need to
+	 *   extend the root level table rather than add a level. Note that we
+	 *   can treat this case as 'always extended' as long as we take care
+	 *   not to program an unsupported T0SZ value into the TCR register.
 	 *
 	 * - Combinations that would require two additional levels of
 	 *   translation are not supported, e.g., VA_BITS==36 on 16k pages, or
 	 *   VA_BITS==39/4k pages with 5-level paging, where the input address
 	 *   requires more than 47 or 48 bits, respectively.
 	 */
-#if (VA_BITS < 48)
-#define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
+#if VA_BITS_MIN < 48
+#define EXTRA_SHIFT	VA_BITS_MIN
 
 	/*
-	 * If VA_BITS < 48, we have to configure an additional table level.
-	 * First, we have to verify our assumption that the current value of
-	 * VA_BITS was chosen such that all translation levels are fully
-	 * utilised, and that lowering T0SZ will always result in an additional
-	 * translation level to be configured.
+	 * If VA_BITS_MIN < 48, we may have to configure an additional table
+	 * level.  First, we have to verify our assumption that the current
+	 * value of VA_BITS_MIN was chosen such that all translation levels are
+	 * fully utilised, and that lowering T0SZ will always result in an
+	 * additional translation level to be configured.
 	 */
-#if VA_BITS != EXTRA_SHIFT
-#error "Mismatch between VA_BITS and page size/number of translation levels"
+#if ((VA_BITS_MIN - PAGE_SHIFT) % (PAGE_SHIFT - 3)) != 0
+#error "Mismatch between VA_BITS_MIN and page size/number of translation levels"
 #endif
 #else
 #define EXTRA_SHIFT
 	/*
-	 * If VA_BITS == 48, we don't have to configure an additional
-	 * translation level, but the top-level table has more entries.
+	 * If VA_BITS_MIN == 48, we don't have to configure an additional
+	 * translation level, but the top-level table may have more entries.
 	 */
 #endif
 	adrp	x0, init_idmap_pg_dir
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b0c702e5bf66..bcf617f956cb 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -702,9 +702,9 @@ static void __init create_idmap(void)
 	u64 pgd_phys;
 
 	/* check if we need an additional level of translation */
-	if (VA_BITS < 48 && idmap_t0sz < (64 - VA_BITS_MIN)) {
+	if (vabits_actual < 48 && idmap_t0sz < (64 - VA_BITS_MIN)) {
 		pgd_phys = early_pgtable_alloc(PAGE_SHIFT);
-		set_pgd(&idmap_pg_dir[start >> VA_BITS],
+		set_pgd(&idmap_pg_dir[start >> VA_BITS_MIN],
 			__pgd(pgd_phys | P4D_TYPE_TABLE));
 		pgd = __va(pgd_phys);
 	}
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index e14648ca820b..873adaccb12f 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -430,10 +430,11 @@ SYM_FUNC_START(__cpu_setup)
 
 	tcr_clear_errata_bits tcr, x9, x5
 
-#if VA_BITS < 48
+#if VA_BITS_MIN < 48
 	idmap_get_t0sz	x9
 	tcr_set_t0sz	tcr, x9
-#elif VA_BITS > VA_BITS_MIN
+#endif
+#if VA_BITS > VA_BITS_MIN
 	mov		x9, #64 - VA_BITS
 alternative_if ARM64_HAS_LVA
 	tcr_set_t1sz	tcr, x9
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 09/19] arm64: mm: Add feature override support for LVA
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 08/19] arm64: mm: Deal with potential ID map extension if VA_BITS > VA_BITS_MIN Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 10/19] arm64: mm: Wire up TCR.DS bit to PTE shareability fields Ard Biesheuvel
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

Add support for overriding the VARange field of the MMFR2 CPU ID
register. This permits the associated LVA feature to be overridden early
enough for the boot code that creates the kernel mapping to take it into
account.

Given that LPA2 implies LVA, disabling the latter should disable the
former as well. So override the ID_AA64MMFR0.TGran field of the current
page size as well if it advertises support for 52-bit virtual
addressing.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/assembler.h    | 17 +++++++-----
 arch/arm64/include/asm/cpufeature.h   |  2 ++
 arch/arm64/kernel/cpufeature.c        |  8 ++++--
 arch/arm64/kernel/image-vars.h        |  2 ++
 arch/arm64/kernel/pi/idreg-override.c | 29 +++++++++++++++++++-
 arch/arm64/kernel/pi/map_kernel.c     |  2 ++
 6 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 89038067ef34..4cb84dc6e220 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -604,18 +604,21 @@ alternative_endif
 	.endm
 
 /*
- * Offset ttbr1 to allow for 48-bit kernel VAs set with 52-bit PTRS_PER_PGD.
+ * If the kernel is built for 52-bit virtual addressing but the hardware only
+ * supports 48 bits, we cannot program the pgdir address into TTBR1 directly,
+ * but we have to add an offset so that the TTBR1 address corresponds with the
+ * pgdir entry that covers the lowest 48-bit addressable VA.
+ *
  * orr is used as it can cover the immediate value (and is idempotent).
- * In future this may be nop'ed out when dealing with 52-bit kernel VAs.
  * 	ttbr: Value of ttbr to set, modified.
  */
 	.macro	offset_ttbr1, ttbr, tmp
 #ifdef CONFIG_ARM64_VA_BITS_52
-	mrs_s	\tmp, SYS_ID_AA64MMFR2_EL1
-	and	\tmp, \tmp, #(0xf << ID_AA64MMFR2_EL1_VARange_SHIFT)
-	cbnz	\tmp, .Lskipoffs_\@
-	orr	\ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
-.Lskipoffs_\@ :
+	mrs	\tmp, tcr_el1
+	and	\tmp, \tmp, #TCR_T1SZ_MASK
+	cmp	\tmp, #TCR_T1SZ(VA_BITS_MIN)
+	orr	\tmp, \ttbr, #TTBR1_BADDR_4852_OFFSET
+	csel	\ttbr, \tmp, \ttbr, eq
 #endif
 	.endm
 
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index b8c7a2d13bbe..e1d194350d72 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -909,7 +909,9 @@ static inline unsigned int get_vmid_bits(u64 mmfr1)
 
 struct arm64_ftr_reg *get_arm64_ftr_reg(u32 sys_id);
 
+extern struct arm64_ftr_override id_aa64mmfr0_override;
 extern struct arm64_ftr_override id_aa64mmfr1_override;
+extern struct arm64_ftr_override id_aa64mmfr2_override;
 extern struct arm64_ftr_override id_aa64pfr0_override;
 extern struct arm64_ftr_override id_aa64pfr1_override;
 extern struct arm64_ftr_override id_aa64zfr0_override;
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index b44aece5024c..2ae42db621fe 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -636,7 +636,9 @@ static const struct arm64_ftr_bits ftr_raz[] = {
 #define ARM64_FTR_REG(id, table)		\
 	__ARM64_FTR_REG_OVERRIDE(#id, id, table, &no_override)
 
+struct arm64_ftr_override id_aa64mmfr0_override;
 struct arm64_ftr_override id_aa64mmfr1_override;
+struct arm64_ftr_override id_aa64mmfr2_override;
 struct arm64_ftr_override id_aa64pfr0_override;
 struct arm64_ftr_override id_aa64pfr1_override;
 struct arm64_ftr_override id_aa64zfr0_override;
@@ -700,10 +702,12 @@ static const struct __ftr_reg_entry {
 			       &id_aa64isar2_override),
 
 	/* Op1 = 0, CRn = 0, CRm = 7 */
-	ARM64_FTR_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0),
+	ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0,
+			       &id_aa64mmfr0_override),
 	ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64MMFR1_EL1, ftr_id_aa64mmfr1,
 			       &id_aa64mmfr1_override),
-	ARM64_FTR_REG(SYS_ID_AA64MMFR2_EL1, ftr_id_aa64mmfr2),
+	ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64MMFR2_EL1, ftr_id_aa64mmfr2,
+			       &id_aa64mmfr2_override),
 
 	/* Op1 = 0, CRn = 1, CRm = 2 */
 	ARM64_FTR_REG(SYS_ZCR_EL1, ftr_zcr),
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 14787614b2e7..82bafa1f869c 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -45,7 +45,9 @@ PROVIDE(__pi_memstart_offset_seed	= memstart_offset_seed);
 
 PROVIDE(__pi_id_aa64isar1_override	= id_aa64isar1_override);
 PROVIDE(__pi_id_aa64isar2_override	= id_aa64isar2_override);
+PROVIDE(__pi_id_aa64mmfr0_override	= id_aa64mmfr0_override);
 PROVIDE(__pi_id_aa64mmfr1_override	= id_aa64mmfr1_override);
+PROVIDE(__pi_id_aa64mmfr2_override	= id_aa64mmfr2_override);
 PROVIDE(__pi_id_aa64pfr0_override	= id_aa64pfr0_override);
 PROVIDE(__pi_id_aa64pfr1_override	= id_aa64pfr1_override);
 PROVIDE(__pi_id_aa64smfr0_override	= id_aa64smfr0_override);
diff --git a/arch/arm64/kernel/pi/idreg-override.c b/arch/arm64/kernel/pi/idreg-override.c
index d0ce3dc4e07a..3de1fe1e2559 100644
--- a/arch/arm64/kernel/pi/idreg-override.c
+++ b/arch/arm64/kernel/pi/idreg-override.c
@@ -138,12 +138,38 @@ DEFINE_OVERRIDE(6, sw_features, "arm64_sw", arm64_sw_feature_override,
 		FIELD("rodataoff", ARM64_SW_FEATURE_OVERRIDE_RODATA_OFF),
 		{});
 
+DEFINE_OVERRIDE(7, mmfr2, "id_aa64mmfr2", id_aa64mmfr2_override,
+		FIELD("varange", ID_AA64MMFR2_EL1_VARange_SHIFT),
+		{});
+
+#ifdef ID_AA64MMFR0_EL1_TGRAN_LPA2
+asmlinkage bool __init mmfr2_varange_filter(u64 val)
+{
+	u64 mmfr0;
+	int feat;
+
+	if (val)
+		return false;
+
+	mmfr0 = read_sysreg(id_aa64mmfr0_el1);
+	feat = cpuid_feature_extract_signed_field(mmfr0,
+						  ID_AA64MMFR0_EL1_TGRAN_SHIFT);
+	if (feat >= ID_AA64MMFR0_EL1_TGRAN_LPA2) {
+		id_aa64mmfr0_override.val |=
+			(ID_AA64MMFR0_EL1_TGRAN_LPA2 - 1) << ID_AA64MMFR0_EL1_TGRAN_SHIFT;
+		id_aa64mmfr0_override.mask |= 0xf << ID_AA64MMFR0_EL1_TGRAN_SHIFT;
+	}
+	return true;
+}
+DEFINE_OVERRIDE_FILTER(mmfr2, 0, mmfr2_varange_filter);
+#endif
+
 /*
  * regs[] is populated by R_AARCH64_PREL32 directives invisible to the compiler
  * so it cannot be static or const, or the compiler might try to use constant
  * propagation on the values.
  */
-asmlinkage s32 regs[7] __initdata = { [0 ... ARRAY_SIZE(regs) - 1] = S32_MAX };
+asmlinkage s32 regs[8] __initdata = { [0 ... ARRAY_SIZE(regs) - 1] = S32_MAX };
 
 static struct arm64_ftr_override * __init reg_override(int i)
 {
@@ -168,6 +194,7 @@ static const struct {
 	{ "arm64.nomte",		"id_aa64pfr1.mte=0" },
 	{ "nokaslr",			"arm64_sw.nokaslr=1" },
 	{ "rodata=off",			"arm64_sw.rodataoff=1" },
+	{ "arm64.nolva",		"id_aa64mmfr2.varange=0" },
 };
 
 static int __init find_field(const char *cmdline, char *opt, int len,
diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
index 7dd6daee0ffd..a9472ab8d901 100644
--- a/arch/arm64/kernel/pi/map_kernel.c
+++ b/arch/arm64/kernel/pi/map_kernel.c
@@ -127,6 +127,8 @@ static bool __init arm64_early_this_cpu_has_lva(void)
 	u64 mmfr2;
 
 	mmfr2 = read_sysreg_s(SYS_ID_AA64MMFR2_EL1);
+	mmfr2 &= ~id_aa64mmfr2_override.mask;
+	mmfr2 |= id_aa64mmfr2_override.val;
 	return cpuid_feature_extract_unsigned_field(mmfr2,
 						    ID_AA64MMFR2_EL1_VARange_SHIFT);
 }
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 10/19] arm64: mm: Wire up TCR.DS bit to PTE shareability fields
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (8 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 09/19] arm64: mm: Add feature override support for LVA Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 11/19] arm64: mm: Add LPA2 support to phys<->pte conversion routines Ard Biesheuvel
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

When LPA2 is enabled, bits 8 and 9 of page and block descriptors become
part of the output address instead of carrying shareability attributes
for the region in question.

So avoid setting these bits if TCR.DS == 1, which means LPA2 is enabled.
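
For illustration, a rough standalone sketch of the two interpretations of
descriptor bits [9:8] (not literal kernel code; mk_block_desc() is a
made-up helper, assuming <linux/types.h> and <linux/bits.h>):

  /*
   * With TCR.DS == 0, bits [9:8] carry the shareability (SH) attribute;
   * with TCR.DS == 1 (LPA2), they carry output address bits [51:50], so
   * the SH value must not be merged in.
   */
  static inline u64 mk_block_desc(u64 pa, u64 attrs, bool lpa2_ds)
  {
          u64 desc = (pa & GENMASK_ULL(49, 12)) | attrs;

          if (lpa2_ds)
                  desc |= (pa >> 42) & GENMASK_ULL(9, 8); /* PA bits [51:50] */
          else
                  desc |= 3ULL << 8;                      /* SH == inner shareable */
          return desc;
  }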

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/Kconfig                    |  4 ++++
 arch/arm64/include/asm/pgtable-prot.h | 18 ++++++++++++++++--
 arch/arm64/mm/mmap.c                  |  4 ++++
 arch/arm64/mm/proc.S                  |  8 ++++++++
 4 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 170832f31eff..6d299c6c0a56 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1264,6 +1264,10 @@ config ARM64_PA_BITS
 	default 48 if ARM64_PA_BITS_48
 	default 52 if ARM64_PA_BITS_52
 
+config ARM64_LPA2
+	def_bool y
+	depends on ARM64_PA_BITS_52 && !ARM64_64K_PAGES
+
 choice
 	prompt "Endianness"
 	default CPU_LITTLE_ENDIAN
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 9b165117a454..269584d5a2c0 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -40,6 +40,20 @@ extern bool arm64_use_ng_mappings;
 #define PTE_MAYBE_NG		(arm64_use_ng_mappings ? PTE_NG : 0)
 #define PMD_MAYBE_NG		(arm64_use_ng_mappings ? PMD_SECT_NG : 0)
 
+#ifndef CONFIG_ARM64_LPA2
+#define lpa2_is_enabled()	false
+#define PTE_MAYBE_SHARED	PTE_SHARED
+#define PMD_MAYBE_SHARED	PMD_SECT_S
+#else
+static inline bool __pure lpa2_is_enabled(void)
+{
+	return read_tcr() & TCR_DS;
+}
+
+#define PTE_MAYBE_SHARED	(lpa2_is_enabled() ? 0 : PTE_SHARED)
+#define PMD_MAYBE_SHARED	(lpa2_is_enabled() ? 0 : PMD_SECT_S)
+#endif
+
 /*
  * If we have userspace only BTI we don't want to mark kernel pages
  * guarded even if the system does support BTI.
@@ -50,8 +64,8 @@ extern bool arm64_use_ng_mappings;
 #define PTE_MAYBE_GP		0
 #endif
 
-#define PROT_DEFAULT		(_PROT_DEFAULT | PTE_MAYBE_NG)
-#define PROT_SECT_DEFAULT	(_PROT_SECT_DEFAULT | PMD_MAYBE_NG)
+#define PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_MAYBE_NG | PTE_MAYBE_SHARED | PTE_AF)
+#define PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_MAYBE_NG | PMD_MAYBE_SHARED | PMD_SECT_AF)
 
 #define PROT_DEVICE_nGnRnE	(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_DEVICE_nGnRnE))
 #define PROT_DEVICE_nGnRE	(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_DEVICE_nGnRE))
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 8f5b7ce857ed..adcf547f74eb 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -73,6 +73,10 @@ static int __init adjust_protection_map(void)
 		protection_map[VM_EXEC | VM_SHARED] = PAGE_EXECONLY;
 	}
 
+	if (lpa2_is_enabled())
+		for (int i = 0; i < ARRAY_SIZE(protection_map); i++)
+			pgprot_val(protection_map[i]) &= ~PTE_SHARED;
+
 	return 0;
 }
 arch_initcall(adjust_protection_map);
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 873adaccb12f..0f576d72b847 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -297,6 +297,14 @@ SYM_TYPED_FUNC_START(idmap_kpti_install_ng_mappings)
 	mov	temp_pte, x5
 	mov	pte_flags, #KPTI_NG_PTE_FLAGS
 
+#ifdef CONFIG_ARM64_LPA2
+	/* Omit PTE_SHARED from pte_flags if LPA2 is enabled */
+	mrs	x5, tcr_el1
+	tst	x5, #TCR_DS
+	mov	x5, #KPTI_NG_PTE_FLAGS & ~PTE_SHARED
+	csel	pte_flags, x5, pte_flags, ne
+#endif
+
 	/* Everybody is enjoying the idmap, so we can rewrite swapper. */
 	/* PGD */
 	adrp		cur_pgdp, swapper_pg_dir
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 11/19] arm64: mm: Add LPA2 support to phys<->pte conversion routines
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (9 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 10/19] arm64: mm: Wire up TCR.DS bit to PTE shareability fields Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 12/19] arm64: mm: Add definitions to support 5 levels of paging Ard Biesheuvel
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

In preparation for enabling LPA2 support, introduce the mask values for
converting between physical addresses and their representations in a
page table descriptor.

While at it, move the pte_to_phys asm macro into its only user, so that
we can freely modify it to use its input value register as a temp
register.

For LPA2, the PTE_ADDR_MASK consists of two non-adjacent sequences of set
bits, which means it no longer fits into the immediate field of an
ordinary ALU instruction. So let's redefine it to include the bits in
between as well, and only use it when converting from physical address
to PTE representation, where the distinction does not matter. Also
update the name accordingly to emphasize this.
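
As a concrete sketch of the resulting 4k/LPA2 layout (not literal kernel
code; the sketch_* names are made up, the real helpers are __pte_to_phys()
and __phys_to_pte_val() below, and GENMASK_ULL() comes from <linux/bits.h>):

  /* descriptor bits [49:12] <-> PA bits [49:12]   (PTE_ADDR_LOW)            */
  /* descriptor bits  [9:8]  <-> PA bits [51:50]   (PTE_ADDR_HIGH, shift 42) */
  static inline u64 sketch_pte_to_phys(u64 pte)
  {
          return (pte & GENMASK_ULL(49, 12)) |
                 ((pte & GENMASK_ULL(9, 8)) << 42);
  }

  static inline u64 sketch_phys_to_pte(u64 phys)
  {
          /* the mask covers bits [11:10] too; they are zero for an aligned PA */
          return (phys | (phys >> 42)) & GENMASK_ULL(49, 8);
  }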

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/assembler.h     | 16 ++--------------
 arch/arm64/include/asm/pgtable-hwdef.h | 10 +++++++---
 arch/arm64/include/asm/pgtable.h       |  5 +++--
 arch/arm64/mm/proc.S                   |  8 ++++++++
 4 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 4cb84dc6e220..786bf62826a8 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -651,25 +651,13 @@ alternative_endif
 
 	.macro	phys_to_pte, pte, phys
 #ifdef CONFIG_ARM64_PA_BITS_52
-	/*
-	 * We assume \phys is 64K aligned and this is guaranteed by only
-	 * supporting this configuration with 64K pages.
-	 */
-	orr	\pte, \phys, \phys, lsr #36
-	and	\pte, \pte, #PTE_ADDR_MASK
+	orr	\pte, \phys, \phys, lsr #PTE_ADDR_HIGH_SHIFT
+	and	\pte, \pte, #PHYS_TO_PTE_ADDR_MASK
 #else
 	mov	\pte, \phys
 #endif
 	.endm
 
-	.macro	pte_to_phys, phys, pte
-	and	\phys, \pte, #PTE_ADDR_MASK
-#ifdef CONFIG_ARM64_PA_BITS_52
-	orr	\phys, \phys, \phys, lsl #PTE_ADDR_HIGH_SHIFT
-	and	\phys, \phys, GENMASK_ULL(PHYS_MASK_SHIFT - 1, PAGE_SHIFT)
-#endif
-	.endm
-
 /*
  * tcr_clear_errata_bits - Clear TCR bits that trigger an errata on this CPU.
  */
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index c4ad7fbb12c5..b91fe4781b06 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -155,13 +155,17 @@
 #define PTE_PXN			(_AT(pteval_t, 1) << 53)	/* Privileged XN */
 #define PTE_UXN			(_AT(pteval_t, 1) << 54)	/* User XN */
 
-#define PTE_ADDR_LOW		(((_AT(pteval_t, 1) << (48 - PAGE_SHIFT)) - 1) << PAGE_SHIFT)
+#define PTE_ADDR_LOW		(((_AT(pteval_t, 1) << (50 - PAGE_SHIFT)) - 1) << PAGE_SHIFT)
 #ifdef CONFIG_ARM64_PA_BITS_52
+#ifdef CONFIG_ARM64_64K_PAGES
 #define PTE_ADDR_HIGH		(_AT(pteval_t, 0xf) << 12)
-#define PTE_ADDR_MASK		(PTE_ADDR_LOW | PTE_ADDR_HIGH)
 #define PTE_ADDR_HIGH_SHIFT	36
+#define PHYS_TO_PTE_ADDR_MASK	(PTE_ADDR_LOW | PTE_ADDR_HIGH)
 #else
-#define PTE_ADDR_MASK		PTE_ADDR_LOW
+#define PTE_ADDR_HIGH		(_AT(pteval_t, 0x3) << 8)
+#define PTE_ADDR_HIGH_SHIFT	42
+#define PHYS_TO_PTE_ADDR_MASK	GENMASK_ULL(49, 8)
+#endif
 #endif
 
 /*
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index daedd6172227..666db7173d0f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -76,15 +76,16 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 #ifdef CONFIG_ARM64_PA_BITS_52
 static inline phys_addr_t __pte_to_phys(pte_t pte)
 {
+	pte_val(pte) &= ~PTE_MAYBE_SHARED;
 	return (pte_val(pte) & PTE_ADDR_LOW) |
 		((pte_val(pte) & PTE_ADDR_HIGH) << PTE_ADDR_HIGH_SHIFT);
 }
 static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
 {
-	return (phys | (phys >> PTE_ADDR_HIGH_SHIFT)) & PTE_ADDR_MASK;
+	return (phys | (phys >> PTE_ADDR_HIGH_SHIFT)) & PHYS_TO_PTE_ADDR_MASK;
 }
 #else
-#define __pte_to_phys(pte)	(pte_val(pte) & PTE_ADDR_MASK)
+#define __pte_to_phys(pte)	(pte_val(pte) & PTE_ADDR_LOW)
 #define __phys_to_pte_val(phys)	(phys)
 #endif
 
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 0f576d72b847..6415623b7ebf 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -208,6 +208,14 @@ SYM_FUNC_ALIAS(__pi_idmap_cpu_replace_ttbr1, idmap_cpu_replace_ttbr1)
 
 	.pushsection ".idmap.text", "awx"
 
+	.macro	pte_to_phys, phys, pte
+	and	\phys, \pte, #PTE_ADDR_LOW
+#ifdef CONFIG_ARM64_PA_BITS_52
+	and	\pte, \pte, #PTE_ADDR_HIGH
+	orr	\phys, \phys, \pte, lsl #PTE_ADDR_HIGH_SHIFT
+#endif
+	.endm
+
 	.macro	kpti_mk_tbl_ng, type, num_entries
 	add	end_\type\()p, cur_\type\()p, #\num_entries * 8
 .Ldo_\type:
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 12/19] arm64: mm: Add definitions to support 5 levels of paging
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (10 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 11/19] arm64: mm: Add LPA2 support to phys<->pte conversion routines Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-28 16:17   ` Ryan Roberts
  2022-11-24 12:39 ` [PATCH v2 13/19] arm64: mm: add 5 level paging support to G-to-nG conversion routine Ard Biesheuvel
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

Add the required types and descriptor accessors to support 5 levels of
paging in the common code. This is one of the prerequisites for
supporting 52-bit virtual addressing with 4k pages.

Note that this does not cover the code that handles kernel mappings or
the fixmap.
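
As a quick sanity check of the shift arithmetic, here is a standalone
sketch using the same formula as ARM64_HW_PGTABLE_LEVEL_SHIFT() (4k pages,
i.e. PAGE_SHIFT == 12, assumed; SKETCH_LVL_SHIFT is a made-up name):

  #define SKETCH_LVL_SHIFT(n)     ((12 - 3) * (4 - (n)) + 3)

  _Static_assert(SKETCH_LVL_SHIFT(0)  == 39,
                 "P4D_SHIFT: level 0 translates VA bits [47:39]");
  _Static_assert(SKETCH_LVL_SHIFT(-1) == 48,
                 "PGDIR_SHIFT: level -1 translates VA bits [51:48]");

So with 5 levels and 4k pages, the new P4D level resolves VA bits [47:39]
and the 16-entry top level resolves bits [51:48].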

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgalloc.h       | 41 +++++++++++
 arch/arm64/include/asm/pgtable-hwdef.h | 22 +++++-
 arch/arm64/include/asm/pgtable-types.h |  6 ++
 arch/arm64/include/asm/pgtable.h       | 75 +++++++++++++++++++-
 arch/arm64/mm/mmu.c                    | 31 +++++++-
 arch/arm64/mm/pgd.c                    | 15 +++-
 6 files changed, 181 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 237224484d0f..cae8c648f462 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -60,6 +60,47 @@ static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
 }
 #endif	/* CONFIG_PGTABLE_LEVELS > 3 */
 
+#if CONFIG_PGTABLE_LEVELS > 4
+
+static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
+{
+	if (pgtable_l5_enabled())
+		set_pgd(pgdp, __pgd(__phys_to_pgd_val(p4dp) | prot));
+}
+
+static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
+{
+	pgdval_t pgdval = PGD_TYPE_TABLE;
+
+	pgdval |= (mm == &init_mm) ? PGD_TABLE_UXN : PGD_TABLE_PXN;
+	__pgd_populate(pgdp, __pa(p4dp), pgdval);
+}
+
+static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+	gfp_t gfp = GFP_PGTABLE_USER;
+
+	if (mm == &init_mm)
+		gfp = GFP_PGTABLE_KERNEL;
+	return (p4d_t *)get_zeroed_page(gfp);
+}
+
+static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
+{
+	if (!pgtable_l5_enabled())
+		return;
+	BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
+	free_page((unsigned long)p4d);
+}
+
+#define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)
+#else
+static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
+{
+	BUILD_BUG();
+}
+#endif	/* CONFIG_PGTABLE_LEVELS > 4 */
+
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
 extern void pgd_free(struct mm_struct *mm, pgd_t *pgdp);
 
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index b91fe4781b06..b364b02e696b 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -26,10 +26,10 @@
 #define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 3))
 
 /*
- * Size mapped by an entry at level n ( 0 <= n <= 3)
+ * Size mapped by an entry at level n ( -1 <= n <= 3)
  * We map (PAGE_SHIFT - 3) at all translation levels and PAGE_SHIFT bits
  * in the final page. The maximum number of translation levels supported by
- * the architecture is 4. Hence, starting at level n, we have further
+ * the architecture is 5. Hence, starting at level n, we have further
  * ((4 - n) - 1) levels of translation excluding the offset within the page.
  * So, the total number of bits mapped by an entry at level n is :
  *
@@ -62,9 +62,16 @@
 #define PTRS_PER_PUD		(1 << (PAGE_SHIFT - 3))
 #endif
 
+#if CONFIG_PGTABLE_LEVELS > 4
+#define P4D_SHIFT		ARM64_HW_PGTABLE_LEVEL_SHIFT(0)
+#define P4D_SIZE		(_AC(1, UL) << P4D_SHIFT)
+#define P4D_MASK		(~(P4D_SIZE-1))
+#define PTRS_PER_P4D		(1 << (PAGE_SHIFT - 3))
+#endif
+
 /*
  * PGDIR_SHIFT determines the size a top-level page table entry can map
- * (depending on the configuration, this level can be 0, 1 or 2).
+ * (depending on the configuration, this level can be -1, 0, 1 or 2).
  */
 #define PGDIR_SHIFT		ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - CONFIG_PGTABLE_LEVELS)
 #define PGDIR_SIZE		(_AC(1, UL) << PGDIR_SHIFT)
@@ -87,6 +94,15 @@
 /*
  * Hardware page table definitions.
  *
+ * Level -1 descriptor (PGD).
+ */
+#define PGD_TYPE_TABLE		(_AT(pgdval_t, 3) << 0)
+#define PGD_TABLE_BIT		(_AT(pgdval_t, 1) << 1)
+#define PGD_TYPE_MASK		(_AT(pgdval_t, 3) << 0)
+#define PGD_TABLE_PXN		(_AT(pgdval_t, 1) << 59)
+#define PGD_TABLE_UXN		(_AT(pgdval_t, 1) << 60)
+
+/*
  * Level 0 descriptor (P4D).
  */
 #define P4D_TYPE_TABLE		(_AT(p4dval_t, 3) << 0)
diff --git a/arch/arm64/include/asm/pgtable-types.h b/arch/arm64/include/asm/pgtable-types.h
index b8f158ae2527..6d6d4065b0cb 100644
--- a/arch/arm64/include/asm/pgtable-types.h
+++ b/arch/arm64/include/asm/pgtable-types.h
@@ -36,6 +36,12 @@ typedef struct { pudval_t pud; } pud_t;
 #define __pud(x)	((pud_t) { (x) } )
 #endif
 
+#if CONFIG_PGTABLE_LEVELS > 4
+typedef struct { p4dval_t p4d; } p4d_t;
+#define p4d_val(x)	((x).p4d)
+#define __p4d(x)	((p4d_t) { (x) } )
+#endif
+
 typedef struct { pgdval_t pgd; } pgd_t;
 #define pgd_val(x)	((x).pgd)
 #define __pgd(x)	((pgd_t) { (x) } )
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 666db7173d0f..2f7202d03d98 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -793,7 +793,6 @@ static inline pud_t *p4d_pgtable(p4d_t p4d)
 #else
 
 #define p4d_page_paddr(p4d)	({ BUILD_BUG(); 0;})
-#define pgd_page_paddr(pgd)	({ BUILD_BUG(); 0;})
 
 /* Match pud_offset folding in <asm/generic/pgtable-nopud.h> */
 #define pud_set_fixmap(addr)		NULL
@@ -804,6 +803,80 @@ static inline pud_t *p4d_pgtable(p4d_t p4d)
 
 #endif  /* CONFIG_PGTABLE_LEVELS > 3 */
 
+#if CONFIG_PGTABLE_LEVELS > 4
+
+static __always_inline bool pgtable_l5_enabled(void)
+{
+	if (!alternative_has_feature_likely(ARM64_ALWAYS_BOOT))
+		return vabits_actual == VA_BITS;
+	return alternative_has_feature_unlikely(ARM64_HAS_LVA);
+}
+
+static inline bool mm_p4d_folded(struct mm_struct *mm)
+{
+	return !pgtable_l5_enabled();
+}
+#define mm_p4d_folded  mm_p4d_folded
+
+#define p4d_ERROR(e)	\
+	pr_err("%s:%d: bad p4d %016llx.\n", __FILE__, __LINE__, p4d_val(e))
+
+#define pgd_none(pgd)		(pgtable_l5_enabled() && !pgd_val(pgd))
+#define pgd_bad(pgd)		(pgtable_l5_enabled() && !(pgd_val(pgd) & 2))
+#define pgd_present(pgd)	(!pgd_none(pgd))
+
+static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+	if (in_swapper_pgdir(pgdp)) {
+		set_swapper_pgd(pgdp, __pgd(pgd_val(pgd)));
+		return;
+	}
+
+	WRITE_ONCE(*pgdp, pgd);
+	dsb(ishst);
+	isb();
+}
+
+static inline void pgd_clear(pgd_t *pgdp)
+{
+	if (pgtable_l5_enabled())
+		set_pgd(pgdp, __pgd(0));
+}
+
+static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
+{
+	return __pgd_to_phys(pgd);
+}
+
+#define p4d_index(addr)		(((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))
+
+static inline p4d_t *pgd_to_folded_p4d(pgd_t *pgdp, unsigned long addr)
+{
+	return (p4d_t *)PTR_ALIGN_DOWN(pgdp, PAGE_SIZE) + p4d_index(addr);
+}
+
+static inline phys_addr_t p4d_offset_phys(pgd_t *pgdp, unsigned long addr)
+{
+	BUG_ON(!pgtable_l5_enabled());
+
+	return pgd_page_paddr(READ_ONCE(*pgdp)) + p4d_index(addr) * sizeof(p4d_t);
+}
+
+static inline p4d_t *p4d_offset(pgd_t *pgdp, unsigned long addr)
+{
+	if (!pgtable_l5_enabled())
+		return pgd_to_folded_p4d(pgdp, addr);
+	return (p4d_t *)__va(p4d_offset_phys(pgdp, addr));
+}
+
+#define pgd_page(pgd)		pfn_to_page(__phys_to_pfn(__pgd_to_phys(pgd)))
+
+#else
+
+static inline bool pgtable_l5_enabled(void) { return false; }
+
+#endif  /* CONFIG_PGTABLE_LEVELS > 4 */
+
 #define pgd_ERROR(e)	\
 	pr_err("%s:%d: bad pgd %016llx.\n", __FILE__, __LINE__, pgd_val(e))
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index bcf617f956cb..d089bc78e592 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1049,7 +1049,7 @@ static void free_empty_pud_table(p4d_t *p4dp, unsigned long addr,
 	if (CONFIG_PGTABLE_LEVELS <= 3)
 		return;
 
-	if (!pgtable_range_aligned(start, end, floor, ceiling, PGDIR_MASK))
+	if (!pgtable_range_aligned(start, end, floor, ceiling, P4D_MASK))
 		return;
 
 	/*
@@ -1072,8 +1072,8 @@ static void free_empty_p4d_table(pgd_t *pgdp, unsigned long addr,
 				 unsigned long end, unsigned long floor,
 				 unsigned long ceiling)
 {
-	unsigned long next;
 	p4d_t *p4dp, p4d;
+	unsigned long i, next, start = addr;
 
 	do {
 		next = p4d_addr_end(addr, end);
@@ -1085,6 +1085,27 @@ static void free_empty_p4d_table(pgd_t *pgdp, unsigned long addr,
 		WARN_ON(!p4d_present(p4d));
 		free_empty_pud_table(p4dp, addr, next, floor, ceiling);
 	} while (addr = next, addr < end);
+
+	if (!pgtable_l5_enabled())
+		return;
+
+	if (!pgtable_range_aligned(start, end, floor, ceiling, PGDIR_MASK))
+		return;
+
+	/*
+	 * Check whether we can free the p4d page if the rest of the
+	 * entries are empty. Overlap with other regions have been
+	 * handled by the floor/ceiling check.
+	 */
+	p4dp = p4d_offset(pgdp, 0UL);
+	for (i = 0; i < PTRS_PER_P4D; i++) {
+		if (!p4d_none(READ_ONCE(p4dp[i])))
+			return;
+	}
+
+	pgd_clear(pgdp);
+	__flush_tlb_kernel_pgtable(start);
+	free_hotplug_pgtable_page(virt_to_page(p4dp));
 }
 
 static void free_empty_tables(unsigned long addr, unsigned long end,
@@ -1351,6 +1372,12 @@ int pmd_set_huge(pmd_t *pmdp, phys_addr_t phys, pgprot_t prot)
 	return 1;
 }
 
+#ifndef __PAGETABLE_P4D_FOLDED
+void p4d_clear_huge(p4d_t *p4dp)
+{
+}
+#endif
+
 int pud_clear_huge(pud_t *pudp)
 {
 	if (!pud_sect(READ_ONCE(*pudp)))
diff --git a/arch/arm64/mm/pgd.c b/arch/arm64/mm/pgd.c
index 4a64089e5771..3c4f8a279d2b 100644
--- a/arch/arm64/mm/pgd.c
+++ b/arch/arm64/mm/pgd.c
@@ -17,11 +17,20 @@
 
 static struct kmem_cache *pgd_cache __ro_after_init;
 
+static bool pgdir_is_page_size(void)
+{
+	if (PGD_SIZE == PAGE_SIZE)
+		return true;
+	if (CONFIG_PGTABLE_LEVELS == 5)
+		return !pgtable_l5_enabled();
+	return false;
+}
+
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	gfp_t gfp = GFP_PGTABLE_USER;
 
-	if (PGD_SIZE == PAGE_SIZE)
+	if (pgdir_is_page_size())
 		return (pgd_t *)__get_free_page(gfp);
 	else
 		return kmem_cache_alloc(pgd_cache, gfp);
@@ -29,7 +38,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
-	if (PGD_SIZE == PAGE_SIZE)
+	if (pgdir_is_page_size())
 		free_page((unsigned long)pgd);
 	else
 		kmem_cache_free(pgd_cache, pgd);
@@ -37,7 +46,7 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 
 void __init pgtable_cache_init(void)
 {
-	if (PGD_SIZE == PAGE_SIZE)
+	if (pgdir_is_page_size())
 		return;
 
 #ifdef CONFIG_ARM64_PA_BITS_52
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 13/19] arm64: mm: add 5 level paging support to G-to-nG conversion routine
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (11 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 12/19] arm64: mm: Add definitions to support 5 levels of paging Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 14/19] arm64: Enable LPA2 at boot if supported by the system Ard Biesheuvel
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

Add support for 5 level paging in the G-to-nG routine that creates its
own temporary page tables to traverse the swapper page tables. Also add
support for running the 5 level configuration with the top level folded
at runtime, to support CPUs that do not implement the LPA2 extension.

While at it, wire up the level skipping logic so it will also trigger on
4 level configurations with LPA2 enabled at build time but not active at
runtime, as we'll fall back to 3 level paging in that case.
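
The net effect on the number of levels that get walked can be sketched as
follows (hypothetical helper, mirroring the adjustment made at the start of
kpti_install_ng_mappings() below):

  static int effective_levels(void)
  {
          int levels = CONFIG_PGTABLE_LEVELS;

          /* top level folded when a 5-level build runs without LPA2 */
          if (levels == 5 && !pgtable_l5_enabled())
                  levels = 4;
          return levels;
  }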

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/cpufeature.c |  9 +++--
 arch/arm64/mm/proc.S           | 40 +++++++++++++++++++-
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 2ae42db621fe..c20c3cbd42ef 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1726,6 +1726,9 @@ kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
 	pgd_t *kpti_ng_temp_pgd;
 	u64 alloc = 0;
 
+	if (levels == 5 && !pgtable_l5_enabled())
+		levels = 4;
+
 	if (__this_cpu_read(this_cpu_vector) == vectors) {
 		const char *v = arm64_get_bp_hardening_vector(EL1_VECTOR_KPTI);
 
@@ -1753,9 +1756,9 @@ kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
 		//
 		// The physical pages are laid out as follows:
 		//
-		// +--------+-/-------+-/------ +-\\--------+
-		// :  PTE[] : | PMD[] : | PUD[] : || PGD[]  :
-		// +--------+-\-------+-\------ +-//--------+
+		// +--------+-/-------+-/------ +-/------ +-\\\--------+
+		// :  PTE[] : | PMD[] : | PUD[] : | P4D[] : ||| PGD[]  :
+		// +--------+-\-------+-\------ +-\------ +-///--------+
 		//      ^
 		// The first page is mapped into this hierarchy at a PMD_SHIFT
 		// aligned virtual address, so that we can manipulate the PTE
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 6415623b7ebf..179e213bbe2d 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -282,6 +282,8 @@ SYM_TYPED_FUNC_START(idmap_kpti_install_ng_mappings)
 	end_ptep	.req	x15
 	pte		.req	x16
 	valid		.req	x17
+	cur_p4dp	.req	x19
+	end_p4dp	.req	x20
 
 	mov	x5, x3				// preserve temp_pte arg
 	mrs	swapper_ttb, ttbr1_el1
@@ -289,6 +291,12 @@ SYM_TYPED_FUNC_START(idmap_kpti_install_ng_mappings)
 
 	cbnz	cpu, __idmap_kpti_secondary
 
+#if CONFIG_PGTABLE_LEVELS > 4
+	stp	x29, x30, [sp, #-32]!
+	mov	x29, sp
+	stp	x19, x20, [sp, #16]
+#endif
+
 	/* We're the boot CPU. Wait for the others to catch up */
 	sevl
 1:	wfe
@@ -316,6 +324,14 @@ SYM_TYPED_FUNC_START(idmap_kpti_install_ng_mappings)
 	/* Everybody is enjoying the idmap, so we can rewrite swapper. */
 	/* PGD */
 	adrp		cur_pgdp, swapper_pg_dir
+#ifdef CONFIG_ARM64_LPA2
+alternative_if_not ARM64_HAS_LVA
+	/* skip one level of translation if 52-bit VAs are not enabled */
+	mov	pgd, cur_pgdp
+	add	end_pgdp, cur_pgdp, #8	// stop condition at pgd level
+	b	.Lderef_pgd
+alternative_else_nop_endif
+#endif
 	kpti_map_pgtbl	pgd, 0
 	kpti_mk_tbl_ng	pgd, PTRS_PER_PGD
 
@@ -329,16 +345,33 @@ SYM_TYPED_FUNC_START(idmap_kpti_install_ng_mappings)
 
 	/* Set the flag to zero to indicate that we're all done */
 	str	wzr, [flag_ptr]
+#if CONFIG_PGTABLE_LEVELS > 4
+	ldp	x19, x20, [sp, #16]
+	ldp	x29, x30, [sp], #32
+#endif
 	ret
 
 .Lderef_pgd:
+	/* P4D */
+	.if		CONFIG_PGTABLE_LEVELS > 4
+	p4d		.req	x30
+	pte_to_phys	cur_p4dp, pgd
+	kpti_map_pgtbl	p4d, 4
+	kpti_mk_tbl_ng	p4d, PTRS_PER_P4D
+	b		.Lnext_pgd
+	.else		/* CONFIG_PGTABLE_LEVELS <= 4 */
+	p4d		.req	pgd
+	.set		.Lnext_p4d, .Lnext_pgd
+	.endif
+
+.Lderef_p4d:
 	/* PUD */
 	.if		CONFIG_PGTABLE_LEVELS > 3
 	pud		.req	x10
-	pte_to_phys	cur_pudp, pgd
+	pte_to_phys	cur_pudp, p4d
 	kpti_map_pgtbl	pud, 1
 	kpti_mk_tbl_ng	pud, PTRS_PER_PUD
-	b		.Lnext_pgd
+	b		.Lnext_p4d
 	.else		/* CONFIG_PGTABLE_LEVELS <= 3 */
 	pud		.req	pgd
 	.set		.Lnext_pud, .Lnext_pgd
@@ -382,6 +415,9 @@ SYM_TYPED_FUNC_START(idmap_kpti_install_ng_mappings)
 	.unreq	end_ptep
 	.unreq	pte
 	.unreq	valid
+	.unreq	cur_p4dp
+	.unreq	end_p4dp
+	.unreq	p4d
 
 	/* Secondary CPUs end up here */
 __idmap_kpti_secondary:
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 14/19] arm64: Enable LPA2 at boot if supported by the system
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (12 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 13/19] arm64: mm: add 5 level paging support to G-to-nG conversion routine Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-28 14:54   ` Ryan Roberts
  2022-11-24 12:39 ` [PATCH v2 15/19] arm64: mm: Add 5 level paging support to fixmap and swapper handling Ard Biesheuvel
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

Update the early kernel mapping code to take 52-bit virtual addressing
into account based on the LPA2 feature. This is a bit more involved than
LVA (which is supported with 64k pages only), given that some page table
descriptor bits change meaning in this case.

To keep the handling in asm to a minimum, the initial ID map is still
created with 48-bit virtual addressing, which implies that the kernel
image must be loaded into 48-bit addressable physical memory. This is
currently required by the boot protocol, even though we happen to
support placement outside of that for LVA/64k based configurations.

Enabling LPA2 involves more than setting TCR.T1SZ to a lower value: there
is also a DS bit in TCR that needs to be set, which changes the meaning of
bits [9:8] in all page table descriptors. Since we cannot enable DS and
update every live page table descriptor at the same time, let's
pivot through another temporary mapping. This avoids the need to
reintroduce manipulations of the page tables with the MMU and caches
disabled.
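
Expressed as C pseudo-code, the pivot looks roughly as follows (a sketch
only, assuming the sysreg accessors from <asm/sysreg.h> and
local_flush_tlb_all(); the authoritative sequence is the inline asm in
set_ttbr0_for_lpa2() below, which must run from the ID map):

  static void sketch_switch_to_lpa2(u64 new_ttbr0)
  {
          u64 sctlr = read_sysreg(sctlr_el1);

          write_sysreg(sctlr & ~SCTLR_ELx_M, sctlr_el1);  /* stage 1 MMU off */
          isb();
          write_sysreg(new_ttbr0, ttbr0_el1);             /* LPA2-clean tables */
          write_sysreg(read_sysreg(tcr_el1) | TCR_DS, tcr_el1);
          isb();
          local_flush_tlb_all();                          /* tlbi vmalle1; dsb; isb */
          write_sysreg(sctlr, sctlr_el1);                 /* MMU back on */
          isb();
  }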

To permit the LPA2 feature to be overridden on the kernel command line,
which may be necessary to work around silicon errata, or to deal with
mismatched features on heterogeneous SoC designs, test for CPU feature
overrides first, and only then enable LPA2.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/assembler.h      |   7 +-
 arch/arm64/include/asm/kernel-pgtable.h |  25 +++--
 arch/arm64/include/asm/memory.h         |   4 +
 arch/arm64/kernel/head.S                |   9 +-
 arch/arm64/kernel/image-vars.h          |   2 +
 arch/arm64/kernel/pi/map_kernel.c       | 103 +++++++++++++++++++-
 arch/arm64/mm/init.c                    |   2 +-
 arch/arm64/mm/mmu.c                     |   8 +-
 arch/arm64/mm/proc.S                    |   4 +
 9 files changed, 151 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 786bf62826a8..30eee6473cf0 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -609,11 +609,16 @@ alternative_endif
  * but we have to add an offset so that the TTBR1 address corresponds with the
  * pgdir entry that covers the lowest 48-bit addressable VA.
  *
+ * Note that this trick only works for LVA/64k pages - LPA2/4k pages use an
+ * additional paging level, and on LPA2/16k pages, we would end up with a TTBR
+ * address that is not 64 byte aligned, so there we reduce the number of paging
+ * levels for the non-LPA2 case.
+ *
  * orr is used as it can cover the immediate value (and is idempotent).
  * 	ttbr: Value of ttbr to set, modified.
  */
 	.macro	offset_ttbr1, ttbr, tmp
-#ifdef CONFIG_ARM64_VA_BITS_52
+#if defined(CONFIG_ARM64_VA_BITS_52) && !defined(CONFIG_ARM64_LPA2)
 	mrs	\tmp, tcr_el1
 	and	\tmp, \tmp, #TCR_T1SZ_MASK
 	cmp	\tmp, #TCR_T1SZ(VA_BITS_MIN)
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index faa11e8b4a0e..2359b2af0c4c 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -20,12 +20,16 @@
  */
 #ifdef CONFIG_ARM64_4K_PAGES
 #define INIT_IDMAP_USES_PMD_MAPS	1
-#define INIT_IDMAP_TABLE_LEVELS		(CONFIG_PGTABLE_LEVELS - 1)
 #else
 #define INIT_IDMAP_USES_PMD_MAPS	0
-#define INIT_IDMAP_TABLE_LEVELS		(CONFIG_PGTABLE_LEVELS)
 #endif
 
+/* how many levels of translation are required to cover 'x' bits of VA space */
+#define VA_LEVELS(x)		(((x) - 4) / (PAGE_SHIFT - 3))
+#define INIT_IDMAP_TABLE_LEVELS	(VA_LEVELS(VA_BITS_MIN) - INIT_IDMAP_USES_PMD_MAPS)
+
+#define INIT_IDMAP_ROOT_SHIFT	(VA_LEVELS(VA_BITS_MIN) * (PAGE_SHIFT - 3) + 3)
+
 /*
  * If KASLR is enabled, then an offset K is added to the kernel address
  * space. The bottom 21 bits of this offset are zero to guarantee 2MB
@@ -52,7 +56,14 @@
 #define EARLY_ENTRIES(vstart, vend, shift, add) \
 	((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1 + add)
 
-#define EARLY_PGDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, PGDIR_SHIFT, add))
+#if CONFIG_PGTABLE_LEVELS > 4
+/* the kernel is covered entirely by the pgd_t at the top of the VA space */
+#define EARLY_PGDS	1
+#else
+#define EARLY_PGDS	0
+#endif
+
+#define EARLY_P4DS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, INIT_IDMAP_ROOT_SHIFT, add))
 
 #if INIT_IDMAP_TABLE_LEVELS > 3
 #define EARLY_PUDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, PUD_SHIFT, add))
@@ -66,11 +77,13 @@
 #define EARLY_PMDS(vstart, vend, add) (0)
 #endif
 
-#define EARLY_PAGES(vstart, vend, add) ( 1 			/* PGDIR page */				\
-			+ EARLY_PGDS((vstart), (vend), add) 	/* each PGDIR needs a next level page table */	\
+#define EARLY_PAGES(vstart, vend, add) ( 1 			/* PGDIR/P4D page */				\
+			+ EARLY_P4DS((vstart), (vend), add) 	/* each P4D needs a next level page table */	\
 			+ EARLY_PUDS((vstart), (vend), add)	/* each PUD needs a next level page table */	\
 			+ EARLY_PMDS((vstart), (vend), add))	/* each PMD needs a next level page table */
-#define INIT_DIR_SIZE (PAGE_SIZE * (EARLY_PAGES(KIMAGE_VADDR, _end, EARLY_KASLR) + EARLY_SEGMENT_EXTRA_PAGES))
+
+#define INIT_DIR_SIZE	(PAGE_SIZE * (EARLY_PAGES(KIMAGE_VADDR, _end, EARLY_KASLR) + \
+			 EARLY_SEGMENT_EXTRA_PAGES + EARLY_PGDS))
 
 /* the initial ID map may need two extra pages if it needs to be extended */
 #if VA_BITS_MIN < 48
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index b3826ff2e52b..4f617e271008 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -54,7 +54,11 @@
 #define FIXADDR_TOP		(VMEMMAP_START - SZ_32M)
 
 #if VA_BITS > 48
+#ifdef CONFIG_ARM64_16K_PAGES
+#define VA_BITS_MIN		(47)
+#else
 #define VA_BITS_MIN		(48)
+#endif
 #else
 #define VA_BITS_MIN		(VA_BITS)
 #endif
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 94de42dfe97d..6be121949c06 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -198,7 +198,7 @@ SYM_CODE_END(preserve_boot_args)
 	mov \tbl, \sv
 	.endif
 .L_\@:
-	compute_indices \vstart, \vend, #PGDIR_SHIFT, \istart, \iend, \count
+	compute_indices \vstart, \vend, #INIT_IDMAP_ROOT_SHIFT, \istart, \iend, \count
 	mov \sv, \rtbl
 	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
 	mov \tbl, \sv
@@ -610,9 +610,16 @@ SYM_FUNC_START(__cpu_secondary_check52bitva)
 alternative_if_not ARM64_HAS_LVA
 	ret
 alternative_else_nop_endif
+#ifndef CONFIG_ARM64_LPA2
 	mrs_s	x0, SYS_ID_AA64MMFR2_EL1
 	and	x0, x0, #(0xf << ID_AA64MMFR2_EL1_VARange_SHIFT)
 	cbnz	x0, 2f
+#else
+	mrs	x0, id_aa64mmfr0_el1
+	sbfx	x0, x0, #ID_AA64MMFR0_EL1_TGRAN_SHIFT, 4
+	cmp	x0, #ID_AA64MMFR0_EL1_TGRAN_LPA2
+	b.ge	2f
+#endif
 
 	update_early_cpu_boot_status \
 		CPU_STUCK_IN_KERNEL | CPU_STUCK_REASON_52_BIT_VA, x0, x1
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 82bafa1f869c..f48b6f09d278 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -56,6 +56,8 @@ PROVIDE(__pi_arm64_sw_feature_override	= arm64_sw_feature_override);
 PROVIDE(__pi_arm64_use_ng_mappings	= arm64_use_ng_mappings);
 PROVIDE(__pi__ctype			= _ctype);
 
+PROVIDE(__pi_init_idmap_pg_dir		= init_idmap_pg_dir);
+PROVIDE(__pi_init_idmap_pg_end		= init_idmap_pg_end);
 PROVIDE(__pi_init_pg_dir		= init_pg_dir);
 PROVIDE(__pi_init_pg_end		= init_pg_end);
 PROVIDE(__pi_swapper_pg_dir		= swapper_pg_dir);
diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
index a9472ab8d901..75d643da56c8 100644
--- a/arch/arm64/kernel/pi/map_kernel.c
+++ b/arch/arm64/kernel/pi/map_kernel.c
@@ -133,6 +133,20 @@ static bool __init arm64_early_this_cpu_has_lva(void)
 						    ID_AA64MMFR2_EL1_VARange_SHIFT);
 }
 
+static bool __init arm64_early_this_cpu_has_lpa2(void)
+{
+	u64 mmfr0;
+	int feat;
+
+	mmfr0 = read_sysreg(id_aa64mmfr0_el1);
+	mmfr0 &= ~id_aa64mmfr0_override.mask;
+	mmfr0 |= id_aa64mmfr0_override.val;
+	feat = cpuid_feature_extract_signed_field(mmfr0,
+						  ID_AA64MMFR0_EL1_TGRAN_SHIFT);
+
+	return feat >= ID_AA64MMFR0_EL1_TGRAN_LPA2;
+}
+
 static bool __init arm64_early_this_cpu_has_pac(void)
 {
 	u64 isar1, isar2;
@@ -254,11 +268,85 @@ static void __init map_kernel(u64 kaslr_offset, u64 va_offset, int root_level)
 	}
 
 	/* Copy the root page table to its final location */
-	memcpy((void *)swapper_pg_dir + va_offset, init_pg_dir, PGD_SIZE);
+	memcpy((void *)swapper_pg_dir + va_offset, init_pg_dir, PAGE_SIZE);
 	dsb(ishst);
 	idmap_cpu_replace_ttbr1(swapper_pg_dir);
 }
 
+static void noinline __section(".idmap.text") set_ttbr0_for_lpa2(u64 ttbr)
+{
+	u64 sctlr = read_sysreg(sctlr_el1);
+	u64 tcr = read_sysreg(tcr_el1) | TCR_DS;
+
+	/* Update TCR.T0SZ in case we entered with a 47-bit ID map */
+	tcr &= ~TCR_T0SZ_MASK;
+	tcr |= TCR_T0SZ(48);
+
+	asm("	msr	sctlr_el1, %0		;"
+	    "	isb				;"
+	    "   msr     ttbr0_el1, %1		;"
+	    "   msr     tcr_el1, %2		;"
+	    "	isb				;"
+	    "	tlbi    vmalle1			;"
+	    "	dsb     nsh			;"
+	    "	isb				;"
+	    "	msr     sctlr_el1, %3		;"
+	    "	isb				;"
+	    ::	"r"(sctlr & ~SCTLR_ELx_M), "r"(ttbr), "r"(tcr), "r"(sctlr));
+}
+
+static void remap_idmap_for_lpa2(void)
+{
+	extern pgd_t init_idmap_pg_dir[], init_idmap_pg_end[];
+	pgd_t *pgdp = (void *)init_pg_dir + PAGE_SIZE;
+	pgprot_t text_prot = PAGE_KERNEL_ROX;
+	pgprot_t data_prot = PAGE_KERNEL;
+
+	/* clear the bits that change meaning once LPA2 is turned on */
+	pgprot_val(text_prot) &= ~PTE_SHARED;
+	pgprot_val(data_prot) &= ~PTE_SHARED;
+
+	/*
+	 * We have to clear bits [9:8] in all block or page descriptors in the
+	 * initial ID map, as otherwise they will be (mis)interpreted as
+	 * physical address bits once we flick the LPA2 switch (TCR.DS). Since
+	 * we cannot manipulate live descriptors in that way without creating
+	 * potential TLB conflicts, let's create another temporary ID map in a
+	 * LPA2 compatible fashion, and update the initial ID map while running
+	 * from that.
+	 */
+	map_segment(init_pg_dir, &pgdp, 0, _stext, __inittext_end, text_prot,
+		    false, 0);
+	map_segment(init_pg_dir, &pgdp, 0, __initdata_begin, _end, data_prot,
+		    false, 0);
+	dsb(ishst);
+	set_ttbr0_for_lpa2((u64)init_pg_dir);
+
+	/*
+	 * Recreate the initial ID map with the same granularity as before.
+	 * Don't bother with the FDT, we no longer need it after this.
+	 */
+	memset(init_idmap_pg_dir, 0,
+	       (u64)init_idmap_pg_end - (u64)init_idmap_pg_dir);
+
+	pgdp = (void *)init_idmap_pg_dir + PAGE_SIZE;
+	map_segment(init_idmap_pg_dir, &pgdp, 0,
+		    PTR_ALIGN_DOWN(&_stext[0], INIT_IDMAP_BLOCK_SIZE),
+		    PTR_ALIGN_DOWN(&__bss_start[0], INIT_IDMAP_BLOCK_SIZE),
+		    text_prot, false, 0);
+	map_segment(init_idmap_pg_dir, &pgdp, 0,
+		    PTR_ALIGN_DOWN(&__bss_start[0], INIT_IDMAP_BLOCK_SIZE),
+		    PTR_ALIGN(&_end[0], INIT_IDMAP_BLOCK_SIZE),
+		    data_prot, false, 0);
+	dsb(ishst);
+
+	/* switch back to the updated initial ID map */
+	set_ttbr0_for_lpa2((u64)init_idmap_pg_dir);
+
+	/* wipe the temporary ID map from memory */
+	memset(init_pg_dir, 0, (u64)init_pg_end - (u64)init_pg_dir);
+}
+
 asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 {
 	static char const chosen_str[] __initconst = "/chosen";
@@ -266,6 +354,7 @@ asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 	u64 va_base, pa_base = (u64)&_text;
 	u64 kaslr_offset = pa_base % MIN_KIMG_ALIGN;
 	int root_level = 4 - CONFIG_PGTABLE_LEVELS;
+	bool va52 = (VA_BITS == 52);
 
 	/* Clear BSS and the initial page tables */
 	memset(__bss_start, 0, (u64)init_pg_end - (u64)__bss_start);
@@ -295,7 +384,17 @@ asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 			arm64_use_ng_mappings = true;
 	}
 
-	if (VA_BITS == 52 && arm64_early_this_cpu_has_lva())
+	if (IS_ENABLED(CONFIG_ARM64_LPA2)) {
+		if (arm64_early_this_cpu_has_lpa2()) {
+			remap_idmap_for_lpa2();
+		} else {
+			va52 = false;
+			root_level++;
+		}
+	} else if (IS_ENABLED(CONFIG_ARM64_64K_PAGES)) {
+		va52 &= arm64_early_this_cpu_has_lva();
+	}
+	if (va52)
 		sysreg_clear_set(tcr_el1, TCR_T1SZ_MASK, TCR_T1SZ(VA_BITS));
 
 	va_base = KIMAGE_VADDR + kaslr_offset;
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 4b4651ee47f2..498d327341b4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -315,7 +315,7 @@ void __init arm64_memblock_init(void)
 	 * physical address of PAGE_OFFSET, we have to *subtract* from it.
 	 */
 	if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
-		memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);
+		memstart_addr -= _PAGE_OFFSET(vabits_actual) - _PAGE_OFFSET(52);
 
 	/*
 	 * Apply the memory limit if it was set. Since the kernel may be loaded
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d089bc78e592..ba5423ff7039 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -541,8 +541,12 @@ static void __init map_mem(pgd_t *pgdp)
 	 * entries at any level are being shared between the linear region and
 	 * the vmalloc region. Check whether this is true for the PGD level, in
 	 * which case it is guaranteed to be true for all other levels as well.
+	 * (Unless we are running with support for LPA2, in which case the
+	 * entire reduced VA space is covered by a single pgd_t which will have
+	 * been populated without the PXNTable attribute by the time we get here.)
 	 */
-	BUILD_BUG_ON(pgd_index(direct_map_end - 1) == pgd_index(direct_map_end));
+	BUILD_BUG_ON(pgd_index(direct_map_end - 1) == pgd_index(direct_map_end) &&
+		     pgd_index(_PAGE_OFFSET(VA_BITS_MIN)) != PTRS_PER_PGD - 1);
 
 	if (can_set_direct_map())
 		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
@@ -726,7 +730,7 @@ static void __init create_idmap(void)
 
 void __init paging_init(void)
 {
-	idmap_t0sz = 63UL - __fls(__pa_symbol(_end) | GENMASK(VA_BITS_MIN - 1, 0));
+	idmap_t0sz = 63UL - __fls(__pa_symbol(_end) | GENMASK(vabits_actual - 1, 0));
 
 	map_mem(swapper_pg_dir);
 
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 179e213bbe2d..d95df732b672 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -489,7 +489,11 @@ SYM_FUNC_START(__cpu_setup)
 #if VA_BITS > VA_BITS_MIN
 	mov		x9, #64 - VA_BITS
 alternative_if ARM64_HAS_LVA
+	tcr_set_t0sz	tcr, x9
 	tcr_set_t1sz	tcr, x9
+#ifdef CONFIG_ARM64_LPA2
+	orr		tcr, tcr, #TCR_DS
+#endif
 alternative_else_nop_endif
 #endif
 
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 15/19] arm64: mm: Add 5 level paging support to fixmap and swapper handling
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (13 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 14/19] arm64: Enable LPA2 at boot if supported by the system Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 16/19] arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging Ard Biesheuvel
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

Add support for using 5 levels of paging in the fixmap, as well as in
the kernel page table handling code which uses fixmaps internally.
This also handles the case where a 5 level build runs on hardware that
only supports 4 levels of paging.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/fixmap.h  |  1 +
 arch/arm64/include/asm/pgtable.h | 35 +++++++++++
 arch/arm64/mm/mmu.c              | 64 +++++++++++++++++---
 3 files changed, 91 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index d09654af5b12..675e08e98e8b 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -91,6 +91,7 @@ enum fixed_addresses {
 	FIX_PTE,
 	FIX_PMD,
 	FIX_PUD,
+	FIX_P4D,
 	FIX_PGD,
 
 	__end_of_fixed_addresses
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2f7202d03d98..057f079bb2c7 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -869,12 +869,47 @@ static inline p4d_t *p4d_offset(pgd_t *pgdp, unsigned long addr)
 	return (p4d_t *)__va(p4d_offset_phys(pgdp, addr));
 }
 
+static inline p4d_t *p4d_set_fixmap(unsigned long addr)
+{
+	if (!pgtable_l5_enabled())
+		return NULL;
+	return (p4d_t *)set_fixmap_offset(FIX_P4D, addr);
+}
+
+static inline p4d_t *p4d_set_fixmap_offset(pgd_t *pgdp, unsigned long addr)
+{
+	if (!pgtable_l5_enabled())
+		return pgd_to_folded_p4d(pgdp, addr);
+	return p4d_set_fixmap(p4d_offset_phys(pgdp, addr));
+}
+
+static inline void p4d_clear_fixmap(void)
+{
+	if (pgtable_l5_enabled())
+		clear_fixmap(FIX_P4D);
+}
+
+/* use ONLY for statically allocated translation tables */
+static inline p4d_t *p4d_offset_kimg(pgd_t *pgdp, u64 addr)
+{
+	if (!pgtable_l5_enabled())
+		return pgd_to_folded_p4d(pgdp, addr);
+	return (p4d_t *)__phys_to_kimg(p4d_offset_phys(pgdp, addr));
+}
+
 #define pgd_page(pgd)		pfn_to_page(__phys_to_pfn(__pgd_to_phys(pgd)))
 
 #else
 
 static inline bool pgtable_l5_enabled(void) { return false; }
 
+/* Match p4d_offset folding in <asm/generic/pgtable-nop4d.h> */
+#define p4d_set_fixmap(addr)		NULL
+#define p4d_set_fixmap_offset(p4dp, addr)	((p4d_t *)p4dp)
+#define p4d_clear_fixmap()
+
+#define p4d_offset_kimg(dir,addr)	((p4d_t *)dir)
+
 #endif  /* CONFIG_PGTABLE_LEVELS > 4 */
 
 #define pgd_ERROR(e)	\
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ba5423ff7039..000ae84da0ef 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -313,15 +313,14 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 	} while (addr = next, addr != end);
 }
 
-static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
+static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
 			   phys_addr_t phys, pgprot_t prot,
 			   phys_addr_t (*pgtable_alloc)(int),
 			   int flags)
 {
 	unsigned long next;
-	pud_t *pudp;
-	p4d_t *p4dp = p4d_offset(pgdp, addr);
 	p4d_t p4d = READ_ONCE(*p4dp);
+	pud_t *pudp;
 
 	if (p4d_none(p4d)) {
 		p4dval_t p4dval = P4D_TYPE_TABLE | P4D_TABLE_UXN;
@@ -369,6 +368,46 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 	pud_clear_fixmap();
 }
 
+static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
+			   phys_addr_t phys, pgprot_t prot,
+			   phys_addr_t (*pgtable_alloc)(int),
+			   int flags)
+{
+	unsigned long next;
+	pgd_t pgd = READ_ONCE(*pgdp);
+	p4d_t *p4dp;
+
+	if (pgd_none(pgd)) {
+		pgdval_t pgdval = PGD_TYPE_TABLE | PGD_TABLE_UXN;
+		phys_addr_t p4d_phys;
+
+		if (flags & NO_EXEC_MAPPINGS)
+			pgdval |= PGD_TABLE_PXN;
+		BUG_ON(!pgtable_alloc);
+		p4d_phys = pgtable_alloc(P4D_SHIFT);
+		__pgd_populate(pgdp, p4d_phys, pgdval);
+		pgd = READ_ONCE(*pgdp);
+	}
+	BUG_ON(pgd_bad(pgd));
+
+	p4dp = p4d_set_fixmap_offset(pgdp, addr);
+	do {
+		p4d_t old_p4d = READ_ONCE(*p4dp);
+
+		next = p4d_addr_end(addr, end);
+
+		alloc_init_pud(p4dp, addr, next, phys, prot,
+			       pgtable_alloc, flags);
+
+		BUG_ON(p4d_val(old_p4d) != 0 &&
+		       p4d_val(old_p4d) != READ_ONCE(p4d_val(*p4dp)));
+
+		phys += next - addr;
+	} while (p4dp++, addr = next, addr != end);
+
+	p4d_clear_fixmap();
+}
+
 static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
 					unsigned long virt, phys_addr_t size,
 					pgprot_t prot,
@@ -391,7 +430,7 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
 
 	do {
 		next = pgd_addr_end(addr, end);
-		alloc_init_pud(pgdp, addr, next, phys, prot, pgtable_alloc,
+		alloc_init_p4d(pgdp, addr, next, phys, prot, pgtable_alloc,
 			       flags);
 		phys += next - addr;
 	} while (pgdp++, addr = next, addr != end);
@@ -1196,10 +1235,19 @@ void vmemmap_free(unsigned long start, unsigned long end,
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
-static inline pud_t *fixmap_pud(unsigned long addr)
+static inline p4d_t *fixmap_p4d(unsigned long addr)
 {
 	pgd_t *pgdp = pgd_offset_k(addr);
-	p4d_t *p4dp = p4d_offset(pgdp, addr);
+	pgd_t pgd = READ_ONCE(*pgdp);
+
+	BUG_ON(pgd_none(pgd) || pgd_bad(pgd));
+
+	return p4d_offset_kimg(pgdp, addr);
+}
+
+static inline pud_t *fixmap_pud(unsigned long addr)
+{
+	p4d_t *p4dp = fixmap_p4d(addr);
 	p4d_t p4d = READ_ONCE(*p4dp);
 
 	BUG_ON(p4d_none(p4d) || p4d_bad(p4d));
@@ -1230,14 +1278,12 @@ static inline pte_t *fixmap_pte(unsigned long addr)
  */
 void __init early_fixmap_init(void)
 {
-	pgd_t *pgdp;
 	p4d_t *p4dp, p4d;
 	pud_t *pudp;
 	pmd_t *pmdp;
 	unsigned long addr = FIXADDR_START;
 
-	pgdp = pgd_offset_k(addr);
-	p4dp = p4d_offset(pgdp, addr);
+	p4dp = fixmap_p4d(addr);
 	p4d = READ_ONCE(*p4dp);
 	if (p4d_none(p4d))
 		__p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
-- 
2.38.1.584.g0f3c55d4c2-goog



* [PATCH v2 16/19] arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (14 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 15/19] arm64: mm: Add 5 level paging support to fixmap and swapper handling Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 17:44   ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 17/19] arm64: mm: Add support for folding PUDs at runtime Ard Biesheuvel
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

Allow the KASAN init code to deal with 5 levels of paging, and relax the
requirement that the shadow region is aligned to the top level pgd_t
size. This is necessary for LPA2 based 52-bit virtual addressing, where
the KASAN shadow will never be aligned to the pgd_t size. Allowing this
also enables the 16k/48-bit case for KASAN, which is a nice bonus.

This involves some hackery to manipulate the root and next level page
tables without having to distinguish all the various configurations,
including 16k/48-bit (which has a two-entry pgd_t level), and LPA2
configurations running with one translation level fewer on non-LPA2
hardware.
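
For reference, the top level index math used by the new helpers can be
sketched standalone as follows (4k pages assumed; sketch_top_level_idx()
is a made-up name mirroring top_level_idx() below):

  static int sketch_top_level_idx(unsigned long long addr, int vabits)
  {
          int levels = (vabits - 4) / (12 - 3);      /* VA_LEVELS(vabits)    */
          int shift  = (levels - 1) * (12 - 3) + 12; /* bits below top level */

          return (addr & ((1ULL << vabits) - 1)) >> shift;
  }

E.g. for vabits == 48 this indexes VA bits [47:39], and for vabits == 52
it indexes bits [51:48].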

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/Kconfig         |   2 +-
 arch/arm64/mm/kasan_init.c | 124 ++++++++++++++++++--
 2 files changed, 112 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6d299c6c0a56..901f4d73476d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -153,7 +153,7 @@ config ARM64
 	select HAVE_ARCH_HUGE_VMAP
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_JUMP_LABEL_RELATIVE
-	select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
+	select HAVE_ARCH_KASAN
 	select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
 	select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
 	select HAVE_ARCH_KASAN_HW_TAGS if (HAVE_ARCH_KASAN && ARM64_MTE)
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 7e32f21fb8e1..c422952e439b 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -23,7 +23,7 @@
 
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 
-static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
+static pgd_t tmp_pg_dir[PTRS_PER_PTE] __initdata __aligned(PAGE_SIZE);
 
 /*
  * The p*d_populate functions call virt_to_phys implicitly so they can't be used
@@ -99,6 +99,19 @@ static pud_t *__init kasan_pud_offset(p4d_t *p4dp, unsigned long addr, int node,
 	return early ? pud_offset_kimg(p4dp, addr) : pud_offset(p4dp, addr);
 }
 
+static p4d_t *__init kasan_p4d_offset(pgd_t *pgdp, unsigned long addr, int node,
+				      bool early)
+{
+	if (pgd_none(READ_ONCE(*pgdp))) {
+		phys_addr_t p4d_phys = early ?
+				__pa_symbol(kasan_early_shadow_p4d)
+					: kasan_alloc_zeroed_page(node);
+		__pgd_populate(pgdp, p4d_phys, PGD_TYPE_TABLE);
+	}
+
+	return early ? p4d_offset_kimg(pgdp, addr) : p4d_offset(pgdp, addr);
+}
+
 static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
 				      unsigned long end, int node, bool early)
 {
@@ -144,7 +157,7 @@ static void __init kasan_p4d_populate(pgd_t *pgdp, unsigned long addr,
 				      unsigned long end, int node, bool early)
 {
 	unsigned long next;
-	p4d_t *p4dp = p4d_offset(pgdp, addr);
+	p4d_t *p4dp = kasan_p4d_offset(pgdp, addr, node, early);
 
 	do {
 		next = p4d_addr_end(addr, end);
@@ -165,14 +178,20 @@ static void __init kasan_pgd_populate(unsigned long addr, unsigned long end,
 	} while (pgdp++, addr = next, addr != end);
 }
 
+#if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS > 4
+#define SHADOW_ALIGN	P4D_SIZE
+#else
+#define SHADOW_ALIGN	PUD_SIZE
+#endif
+
 /* The early shadow maps everything to a single page of zeroes */
 asmlinkage void __init kasan_early_init(void)
 {
 	BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
 		KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
-	BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS), PGDIR_SIZE));
-	BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS_MIN), PGDIR_SIZE));
-	BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, PGDIR_SIZE));
+	BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS), SHADOW_ALIGN));
+	BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS_MIN), SHADOW_ALIGN));
+	BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, SHADOW_ALIGN));
 	kasan_pgd_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, NUMA_NO_NODE,
 			   true);
 }
@@ -184,20 +203,86 @@ static void __init kasan_map_populate(unsigned long start, unsigned long end,
 	kasan_pgd_populate(start & PAGE_MASK, PAGE_ALIGN(end), node, false);
 }
 
-static void __init clear_pgds(unsigned long start,
-			unsigned long end)
+/*
+ * Return whether 'addr' is aligned to the size covered by a top level
+ * descriptor.
+ */
+static bool __init top_level_aligned(u64 addr)
+{
+	int shift = (VA_LEVELS(vabits_actual) - 1) * (PAGE_SHIFT - 3);
+
+	return (addr % (PAGE_SIZE << shift)) == 0;
+}
+
+/*
+ * Return the descriptor index of 'addr' in the top level table
+ */
+static int __init top_level_idx(u64 addr)
 {
 	/*
-	 * Remove references to kasan page tables from
-	 * swapper_pg_dir. pgd_clear() can't be used
-	 * here because it's nop on 2,3-level pagetable setups
+	 * On 64k pages, the TTBR1 range root tables are extended for 52-bit
+	 * virtual addressing, and TTBR1 will simply point to the pgd_t entry
+	 * that covers the start of the 48-bit addressable VA space if LVA is
+	 * not implemented. This means we need to index the table as usual,
+	 * instead of masking off bits based on vabits_actual.
 	 */
-	for (; start < end; start += PGDIR_SIZE)
-		set_pgd(pgd_offset_k(start), __pgd(0));
+	u64 vabits = IS_ENABLED(CONFIG_ARM64_64K_PAGES) ? VA_BITS
+							: vabits_actual;
+	int shift = (VA_LEVELS(vabits) - 1) * (PAGE_SHIFT - 3);
+
+	return (addr & ~_PAGE_OFFSET(vabits)) >> (shift + PAGE_SHIFT);
+}
+
+/*
+ * Clone a next level table from swapper_pg_dir into tmp_pg_dir
+ */
+static void __init clone_next_level(u64 addr, pgd_t *tmp_pg_dir, pud_t *pud)
+{
+	int idx = top_level_idx(addr);
+	pgd_t pgd = READ_ONCE(swapper_pg_dir[idx]);
+	pud_t *pudp = (pud_t *)__phys_to_kimg(__pgd_to_phys(pgd));
+
+	memcpy(pud, pudp, PAGE_SIZE);
+	tmp_pg_dir[idx] = __pgd(__phys_to_pgd_val(__pa_symbol(pud)) |
+				PUD_TYPE_TABLE);
+}
+
+/*
+ * Return the descriptor index of 'addr' in the next level table
+ */
+static int __init next_level_idx(u64 addr)
+{
+	int shift = (VA_LEVELS(vabits_actual) - 2) * (PAGE_SHIFT - 3);
+
+	return (addr >> (shift + PAGE_SHIFT)) % PTRS_PER_PTE;
+}
+
+/*
+ * Dereference the table descriptor at 'pgd_idx' and clear the entries from
+ * 'start' to 'end' from the table.
+ */
+static void __init clear_next_level(int pgd_idx, int start, int end)
+{
+	pgd_t pgd = READ_ONCE(swapper_pg_dir[pgd_idx]);
+	pud_t *pudp = (pud_t *)__phys_to_kimg(__pgd_to_phys(pgd));
+
+	memset(&pudp[start], 0, (end - start) * sizeof(pud_t));
+}
+
+static void __init clear_shadow(u64 start, u64 end)
+{
+	int l = top_level_idx(start), m = top_level_idx(end);
+
+	if (!top_level_aligned(start))
+		clear_next_level(l++, next_level_idx(start), PTRS_PER_PTE - 1);
+	if (!top_level_aligned(end))
+		clear_next_level(m, 0, next_level_idx(end));
+	memset(&swapper_pg_dir[l], 0, (m - l) * sizeof(pgd_t));
 }
 
 static void __init kasan_init_shadow(void)
 {
+	static pud_t pud[2][PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
 	u64 kimg_shadow_start, kimg_shadow_end;
 	u64 mod_shadow_start, mod_shadow_end;
 	u64 vmalloc_shadow_end;
@@ -220,10 +305,23 @@ static void __init kasan_init_shadow(void)
 	 * setup will be finished.
 	 */
 	memcpy(tmp_pg_dir, swapper_pg_dir, sizeof(tmp_pg_dir));
+
+	/*
+	 * If the start or end address of the shadow region is not aligned to
+	 * the top level size, we have to allocate a temporary next-level table
+	 * in each case, clone the next level of descriptors, and install the
+	 * table into tmp_pg_dir. Note that with 5 levels of paging, the next
+	 * level will in fact be p4d_t, but that makes no difference in this
+	 * case.
+	 */
+	if (!top_level_aligned(KASAN_SHADOW_START))
+		clone_next_level(KASAN_SHADOW_START, tmp_pg_dir, pud[0]);
+	if (!top_level_aligned(KASAN_SHADOW_END))
+		clone_next_level(KASAN_SHADOW_END, tmp_pg_dir, pud[1]);
 	dsb(ishst);
 	cpu_replace_ttbr1(lm_alias(tmp_pg_dir));
 
-	clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
+	clear_shadow(KASAN_SHADOW_START, KASAN_SHADOW_END);
 
 	kasan_map_populate(kimg_shadow_start, kimg_shadow_end,
 			   early_pfn_to_nid(virt_to_pfn(lm_alias(KERNEL_START))));
-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 17/19] arm64: mm: Add support for folding PUDs at runtime
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (15 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 16/19] arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 18/19] arm64: ptdump: Disregard unaddressable VA space Ard Biesheuvel
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

In order to support LPA2 on 16k pages in a way that permits non-LPA2
systems to run the same kernel image, we have to be able to fall back to
at most 48 bits of virtual addressing.

Falling back to 48 bits would result in a level 0 table with only 2
entries, which is suboptimal in terms of TLB utilization. It is also
problematic because we cannot dimension the level 0 table for 52-bit
virtual addressing and simply point TTBR1 at the top 2 entries when
using only 48 bits of address space (analogous to how we handle this
on 64k pages): those two entries would not appear at a 64-byte aligned
address, which is a requirement for TTBRx address values.

So instead, let's fall back to 47 bits, solving both problems. This
means we need to be able to fold PUDs dynamically, similar to how we
fold P4Ds for 48 bit virtual addressing on LPA2 with 4k pages.
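
(Editorial aside: a quick sanity check of the level 0 arithmetic above, in
plain C rather than kernel code, assuming the standard 16k translation
layout.)

  #include <assert.h>

  int main(void)
  {
          /* 16k granule: each level translates 14 - 3 = 11 VA bits, so a
           * 52-bit VA space needs a level 0 table with
           * 1 << (52 - 14 - 3 * 11) entries */
          assert(1 << (52 - 14 - 3 * 11) == 32);  /* 32 * 8 = 256 bytes */

          /* a 48-bit kernel VA space would use only the top two of those
           * entries, which start at byte offset 30 * 8 = 240 within the
           * table, and 240 is not 64-byte aligned, so it cannot serve as
           * a TTBR1 address */
          assert((30 * 8) % 64 != 0);
          return 0;
  }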

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgalloc.h | 12 ++-
 arch/arm64/include/asm/pgtable.h | 80 +++++++++++++++++---
 arch/arm64/include/asm/tlb.h     |  3 +-
 arch/arm64/kernel/cpufeature.c   |  2 +
 arch/arm64/mm/mmu.c              |  2 +-
 arch/arm64/mm/pgd.c              |  2 +
 6 files changed, 87 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index cae8c648f462..aeba2cf15a25 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -14,6 +14,7 @@
 #include <asm/tlbflush.h>
 
 #define __HAVE_ARCH_PGD_FREE
+#define __HAVE_ARCH_PUD_FREE
 #include <asm-generic/pgalloc.h>
 
 #define PGD_SIZE	(PTRS_PER_PGD * sizeof(pgd_t))
@@ -43,7 +44,8 @@ static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
 
 static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
 {
-	set_p4d(p4dp, __p4d(__phys_to_p4d_val(pudp) | prot));
+	if (pgtable_l4_enabled())
+		set_p4d(p4dp, __p4d(__phys_to_p4d_val(pudp) | prot));
 }
 
 static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
@@ -53,6 +55,14 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
 	p4dval |= (mm == &init_mm) ? P4D_TABLE_UXN : P4D_TABLE_PXN;
 	__p4d_populate(p4dp, __pa(pudp), p4dval);
 }
+
+static inline void pud_free(struct mm_struct *mm, pud_t *pud)
+{
+	if (!pgtable_l4_enabled())
+		return;
+	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
+	free_page((unsigned long)pud);
+}
 #else
 static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
 {
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 057f079bb2c7..3fb75680f3bc 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -744,12 +744,27 @@ static inline pmd_t *pud_pgtable(pud_t pud)
 
 #if CONFIG_PGTABLE_LEVELS > 3
 
+static __always_inline bool pgtable_l4_enabled(void)
+{
+	if (CONFIG_PGTABLE_LEVELS > 4 || !IS_ENABLED(CONFIG_ARM64_LPA2))
+		return true;
+	if (!alternative_has_feature_likely(ARM64_ALWAYS_BOOT))
+		return vabits_actual == VA_BITS;
+	return alternative_has_feature_unlikely(ARM64_HAS_LVA);
+}
+
+static inline bool mm_pud_folded(struct mm_struct *mm)
+{
+	return !pgtable_l4_enabled();
+}
+#define mm_pud_folded  mm_pud_folded
+
 #define pud_ERROR(e)	\
 	pr_err("%s:%d: bad pud %016llx.\n", __FILE__, __LINE__, pud_val(e))
 
-#define p4d_none(p4d)		(!p4d_val(p4d))
-#define p4d_bad(p4d)		(!(p4d_val(p4d) & 2))
-#define p4d_present(p4d)	(p4d_val(p4d))
+#define p4d_none(p4d)		(pgtable_l4_enabled() && !p4d_val(p4d))
+#define p4d_bad(p4d)		(pgtable_l4_enabled() && !(p4d_val(p4d) & 2))
+#define p4d_present(p4d)	(!p4d_none(p4d))
 
 static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 {
@@ -765,7 +780,8 @@ static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 
 static inline void p4d_clear(p4d_t *p4dp)
 {
-	set_p4d(p4dp, __p4d(0));
+	if (pgtable_l4_enabled())
+		set_p4d(p4dp, __p4d(0));
 }
 
 static inline phys_addr_t p4d_page_paddr(p4d_t p4d)
@@ -773,25 +789,67 @@ static inline phys_addr_t p4d_page_paddr(p4d_t p4d)
 	return __p4d_to_phys(p4d);
 }
 
+#define pud_index(addr)		(((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
+
+static inline pud_t *p4d_to_folded_pud(p4d_t *p4dp, unsigned long addr)
+{
+	return (pud_t *)PTR_ALIGN_DOWN(p4dp, PAGE_SIZE) + pud_index(addr);
+}
+
 static inline pud_t *p4d_pgtable(p4d_t p4d)
 {
 	return (pud_t *)__va(p4d_page_paddr(p4d));
 }
 
-/* Find an entry in the first-level page table. */
-#define pud_offset_phys(dir, addr)	(p4d_page_paddr(READ_ONCE(*(dir))) + pud_index(addr) * sizeof(pud_t))
+static inline phys_addr_t pud_offset_phys(p4d_t *p4dp, unsigned long addr)
+{
+	BUG_ON(!pgtable_l4_enabled());
 
-#define pud_set_fixmap(addr)		((pud_t *)set_fixmap_offset(FIX_PUD, addr))
-#define pud_set_fixmap_offset(p4d, addr)	pud_set_fixmap(pud_offset_phys(p4d, addr))
-#define pud_clear_fixmap()		clear_fixmap(FIX_PUD)
+	return p4d_page_paddr(READ_ONCE(*p4dp)) + pud_index(addr) * sizeof(pud_t);
+}
 
-#define p4d_page(p4d)		pfn_to_page(__phys_to_pfn(__p4d_to_phys(p4d)))
+static inline pud_t *pud_offset(p4d_t *p4dp, unsigned long addr)
+{
+	if (!pgtable_l4_enabled())
+		return p4d_to_folded_pud(p4dp, addr);
+	return (pud_t *)__va(pud_offset_phys(p4dp, addr));
+}
+#define pud_offset	pud_offset
+
+static inline pud_t *pud_set_fixmap(unsigned long addr)
+{
+	if (!pgtable_l4_enabled())
+		return NULL;
+	return (pud_t *)set_fixmap_offset(FIX_PUD, addr);
+}
+
+static inline pud_t *pud_set_fixmap_offset(p4d_t *p4dp, unsigned long addr)
+{
+	if (!pgtable_l4_enabled())
+		return p4d_to_folded_pud(p4dp, addr);
+	return pud_set_fixmap(pud_offset_phys(p4dp, addr));
+}
+
+static inline void pud_clear_fixmap(void)
+{
+	if (pgtable_l4_enabled())
+		clear_fixmap(FIX_PUD);
+}
 
 /* use ONLY for statically allocated translation tables */
-#define pud_offset_kimg(dir,addr)	((pud_t *)__phys_to_kimg(pud_offset_phys((dir), (addr))))
+static inline pud_t *pud_offset_kimg(p4d_t *p4dp, u64 addr)
+{
+	if (!pgtable_l4_enabled())
+		return p4d_to_folded_pud(p4dp, addr);
+	return (pud_t *)__phys_to_kimg(pud_offset_phys(p4dp, addr));
+}
+
+#define p4d_page(p4d)		pfn_to_page(__phys_to_pfn(__p4d_to_phys(p4d)))
 
 #else
 
+static inline bool pgtable_l4_enabled(void) { return false; }
+
 #define p4d_page_paddr(p4d)	({ BUILD_BUG(); 0;})
 
 /* Match pud_offset folding in <asm/generic/pgtable-nopud.h> */
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f4594f..a23d33b6b56b 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -94,7 +94,8 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
 				  unsigned long addr)
 {
-	tlb_remove_table(tlb, virt_to_page(pudp));
+	if (pgtable_l4_enabled())
+		tlb_remove_table(tlb, virt_to_page(pudp));
 }
 #endif
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index c20c3cbd42ef..f1b9c638afa7 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1728,6 +1728,8 @@ kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
 
 	if (levels == 5 && !pgtable_l5_enabled())
 		levels = 4;
+	else if (levels == 4 && !pgtable_l4_enabled())
+		levels = 3;
 
 	if (__this_cpu_read(this_cpu_vector) == vectors) {
 		const char *v = arm64_get_bp_hardening_vector(EL1_VECTOR_KPTI);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 000ae84da0ef..54273b37808b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1089,7 +1089,7 @@ static void free_empty_pud_table(p4d_t *p4dp, unsigned long addr,
 		free_empty_pmd_table(pudp, addr, next, floor, ceiling);
 	} while (addr = next, addr < end);
 
-	if (CONFIG_PGTABLE_LEVELS <= 3)
+	if (!pgtable_l4_enabled())
 		return;
 
 	if (!pgtable_range_aligned(start, end, floor, ceiling, P4D_MASK))
diff --git a/arch/arm64/mm/pgd.c b/arch/arm64/mm/pgd.c
index 3c4f8a279d2b..0c501cabc238 100644
--- a/arch/arm64/mm/pgd.c
+++ b/arch/arm64/mm/pgd.c
@@ -21,6 +21,8 @@ static bool pgdir_is_page_size(void)
 {
 	if (PGD_SIZE == PAGE_SIZE)
 		return true;
+	if (CONFIG_PGTABLE_LEVELS == 4)
+		return !pgtable_l4_enabled();
 	if (CONFIG_PGTABLE_LEVELS == 5)
 		return !pgtable_l5_enabled();
 	return false;
-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 18/19] arm64: ptdump: Disregard unaddressable VA space
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (16 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 17/19] arm64: mm: Add support for folding PUDs at runtime Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 12:39 ` [PATCH v2 19/19] arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs Ard Biesheuvel
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

Configurations built with support for 52-bit virtual addressing can also
run on CPUs that only support 48 bits of VA space, in which case only
that part of swapper_pg_dir that represents the 48-bit addressable
region is relevant, and everything else is ignored by the hardware.

Our software pagetable walker has little in the way of input address
validation, and so it will happily start a walk from an address that is
not representable by the number of paging levels that are actually
active, resulting in lots of bogus output from the page table dumper
unless we take care to start at a valid address.
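
(Editorial aside: to make the unaddressable range concrete, here are the
two kernel VA base addresses involved, assuming arm64's usual
_PAGE_OFFSET(va) == -(1 << va) convention.)

  #include <assert.h>
  #include <stdint.h>

  int main(void)
  {
          /* base of the kernel VA range for 52-bit and 48-bit VA spaces */
          assert((uint64_t)-(1ULL << 52) == 0xfff0000000000000ULL);
          assert((uint64_t)-(1ULL << 48) == 0xffff000000000000ULL);
          /* on a CPU limited to 48 bits, addresses between the two bases
           * are not valid kernel VAs, so the dump must start at the
           * higher one */
          return 0;
  }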

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/ptdump.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 9bc4066c5bf3..ca38ac2637b5 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -358,7 +358,7 @@ void ptdump_check_wx(void)
 		.ptdump = {
 			.note_page = note_page,
 			.range = (struct ptdump_range[]) {
-				{PAGE_OFFSET, ~0UL},
+				{_PAGE_OFFSET(vabits_actual), ~0UL},
 				{0, 0}
 			}
 		}
@@ -380,6 +380,8 @@ static int __init ptdump_init(void)
 	address_markers[KASAN_START_NR].start_address = KASAN_SHADOW_START;
 #endif
 	ptdump_initialize();
+	if (vabits_actual < VA_BITS)
+		kernel_ptdump_info.base_addr = _PAGE_OFFSET(vabits_actual);
 	ptdump_debugfs_register(&kernel_ptdump_info, "kernel_page_tables");
 	return 0;
 }
-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 19/19] arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (17 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 18/19] arm64: ptdump: Disregard unaddressable VA space Ard Biesheuvel
@ 2022-11-24 12:39 ` Ard Biesheuvel
  2022-11-24 14:39 ` [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ryan Roberts
  2022-11-29 15:31 ` Ryan Roberts
  20 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 12:39 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

Update Kconfig to permit 4k and 16k granule configurations to be built
with 52-bit virtual addressing, now that all the prerequisites are in
place.

While at it, update the feature description so it matches on the
appropriate feature bits depending on the page size. For simplicity,
let's just keep ARM64_HAS_LVA as the feature name.

Note that LPA2 based 52-bit virtual addressing requires 52-bit physical
addressing support to be enabled as well, as programming TCR.TxSZ to
values below 16 is not allowed unless TCR.DS is set, which is what
activates the 52-bit physical addressing support.
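
(Editorial aside: the TxSZ arithmetic spelled out, for reference.)

  #include <assert.h>

  int main(void)
  {
          /* TCR_ELx.TxSZ = 64 - (number of VA bits) */
          assert(64 - 48 == 16);  /* smallest TxSZ permitted with TCR.DS == 0 */
          assert(64 - 52 == 12);  /* 52-bit VAs need TxSZ = 12, hence TCR.DS */
          return 0;
  }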

While supporting the converse (52-bit physical addressing without 52-bit
virtual addressing) would be possible in principle, let's keep things
simple by only allowing these features to be enabled at the same time.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/Kconfig             | 17 ++++++++-------
 arch/arm64/kernel/cpufeature.c | 22 ++++++++++++++++----
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 901f4d73476d..ade7a9a007c0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -344,7 +344,9 @@ config PGTABLE_LEVELS
 	default 3 if ARM64_64K_PAGES && (ARM64_VA_BITS_48 || ARM64_VA_BITS_52)
 	default 3 if ARM64_4K_PAGES && ARM64_VA_BITS_39
 	default 3 if ARM64_16K_PAGES && ARM64_VA_BITS_47
+	default 4 if ARM64_16K_PAGES && (ARM64_VA_BITS_48 || ARM64_VA_BITS_52)
 	default 4 if !ARM64_64K_PAGES && ARM64_VA_BITS_48
+	default 5 if ARM64_4K_PAGES && ARM64_VA_BITS_52
 
 config ARCH_SUPPORTS_UPROBES
 	def_bool y
@@ -358,13 +360,13 @@ config BROKEN_GAS_INST
 config KASAN_SHADOW_OFFSET
 	hex
 	depends on KASAN_GENERIC || KASAN_SW_TAGS
-	default 0xdfff800000000000 if (ARM64_VA_BITS_48 || ARM64_VA_BITS_52) && !KASAN_SW_TAGS
-	default 0xdfffc00000000000 if ARM64_VA_BITS_47 && !KASAN_SW_TAGS
+	default 0xdfff800000000000 if (ARM64_VA_BITS_48 || (ARM64_VA_BITS_52 && !ARM64_16K_PAGES)) && !KASAN_SW_TAGS
+	default 0xdfffc00000000000 if (ARM64_VA_BITS_47 || ARM64_VA_BITS_52) && ARM64_16K_PAGES && !KASAN_SW_TAGS
 	default 0xdffffe0000000000 if ARM64_VA_BITS_42 && !KASAN_SW_TAGS
 	default 0xdfffffc000000000 if ARM64_VA_BITS_39 && !KASAN_SW_TAGS
 	default 0xdffffff800000000 if ARM64_VA_BITS_36 && !KASAN_SW_TAGS
-	default 0xefff800000000000 if (ARM64_VA_BITS_48 || ARM64_VA_BITS_52) && KASAN_SW_TAGS
-	default 0xefffc00000000000 if ARM64_VA_BITS_47 && KASAN_SW_TAGS
+	default 0xefff800000000000 if (ARM64_VA_BITS_48 || (ARM64_VA_BITS_52 && !ARM64_16K_PAGES)) && KASAN_SW_TAGS
+	default 0xefffc00000000000 if (ARM64_VA_BITS_47 || ARM64_VA_BITS_52) && ARM64_16K_PAGES && KASAN_SW_TAGS
 	default 0xeffffe0000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS
 	default 0xefffffc000000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS
 	default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
@@ -1197,7 +1199,7 @@ config ARM64_VA_BITS_48
 
 config ARM64_VA_BITS_52
 	bool "52-bit"
-	depends on ARM64_64K_PAGES && (ARM64_PAN || !ARM64_SW_TTBR0_PAN)
+	depends on ARM64_PAN || !ARM64_SW_TTBR0_PAN
 	help
 	  Enable 52-bit virtual addressing for userspace when explicitly
 	  requested via a hint to mmap(). The kernel will also use 52-bit
@@ -1244,10 +1246,11 @@ choice
 
 config ARM64_PA_BITS_48
 	bool "48-bit"
+	depends on ARM64_64K_PAGES || !ARM64_VA_BITS_52
 
 config ARM64_PA_BITS_52
-	bool "52-bit (ARMv8.2)"
-	depends on ARM64_64K_PAGES
+	bool "52-bit"
+	depends on ARM64_64K_PAGES || ARM64_VA_BITS_52
 	depends on ARM64_PAN || !ARM64_SW_TTBR0_PAN
 	help
 	  Enable support for a 52-bit physical address space, introduced as
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index f1b9c638afa7..834dc7b76e1c 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2665,15 +2665,29 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 	},
 #ifdef CONFIG_ARM64_VA_BITS_52
 	{
-		.desc = "52-bit Virtual Addressing (LVA)",
 		.capability = ARM64_HAS_LVA,
 		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
-		.sys_reg = SYS_ID_AA64MMFR2_EL1,
-		.sign = FTR_UNSIGNED,
+		.matches = has_cpuid_feature,
 		.field_width = 4,
+#ifdef CONFIG_ARM64_64K_PAGES
+		.desc = "52-bit Virtual Addressing (LVA)",
+		.sign = FTR_SIGNED,
+		.sys_reg = SYS_ID_AA64MMFR2_EL1,
 		.field_pos = ID_AA64MMFR2_EL1_VARange_SHIFT,
-		.matches = has_cpuid_feature,
 		.min_field_value = ID_AA64MMFR2_EL1_VARange_52,
+#else
+		.desc = "52-bit Virtual Addressing (LPA2)",
+		.sys_reg = SYS_ID_AA64MMFR0_EL1,
+#ifdef CONFIG_ARM64_4K_PAGES
+		.sign = FTR_SIGNED,
+		.field_pos = ID_AA64MMFR0_EL1_TGRAN4_SHIFT,
+		.min_field_value = ID_AA64MMFR0_EL1_TGRAN4_52_BIT,
+#else
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64MMFR0_EL1_TGRAN16_SHIFT,
+		.min_field_value = ID_AA64MMFR0_EL1_TGRAN16_52_BIT,
+#endif
+#endif
 	},
 #endif
 	{},
-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (18 preceding siblings ...)
  2022-11-24 12:39 ` [PATCH v2 19/19] arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs Ard Biesheuvel
@ 2022-11-24 14:39 ` Ryan Roberts
  2022-11-24 17:14   ` Ard Biesheuvel
  2022-11-29 15:31 ` Ryan Roberts
  20 siblings, 1 reply; 45+ messages in thread
From: Ryan Roberts @ 2022-11-24 14:39 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: Marc Zyngier, Will Deacon, Mark Rutland, Kees Cook,
	Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

Hi Ard,

Thanks for including me on this. I'll plan to do a review over the next week or
so, but in the meantime, I have a couple of general questions/comments:

On 24/11/2022 12:39, Ard Biesheuvel wrote:
> Enable support for LPA2 when running with 4k or 16k pages. In the former
> case, this requires 5 level paging with a runtime fallback to 4 on
> non-LPA2 hardware. For consistency, the same approach is adopted for 16k
> pages, where we fall back to 3 level paging (47 bit virtual addressing)
> on non-LPA2 configurations. 

It seems odd to me that on a non-LPA2 system, if you run a kernel that is
compiled for 16KB pages and 48 VA bits, then you will get 48 VA bits, but if
you run a kernel that is compiled for 16KB pages and 52 VA bits, then you will
get 47 VA bits. Wouldn't that pose a potential user space compat issue?

> (Falling back to 48 bits would involve
> finding a workaround for the fact that we cannot construct a level 0
> table covering 52 bits of VA space that appears aligned to its size in
> memory, and has the top 2 entries that represent the 48-bit region
> appearing at an alignment of 64 bytes, which is required by the
> architecture for TTBR address values. 

I'm not sure I've understood this. The level 0 table would need 32 entries for
52 VA bits so the table size is 256 bytes, naturally aligned to 256 bytes. 64 is
a factor of 256 so surely the top 2 entries are guaranteed to also meet the
constraint for the fallback path too?

> Also, using an additional level of
> paging to translate a single VA bit is wasteful in terms of TLB
> efficiency)
> 
> This means support for falling back to 3 levels of paging at runtime
> when configured for 4 is also needed.
> 
> Another thing worth to note is that the repurposed physical address bits
> in the page table descriptors were not RES0 before, and so there is now
> a big global switch (called TCR.DS) which controls how all page table
> descriptors are interpreted. This requires some extra care in the PTE
> conversion helpers, and additional handling in the boot code to ensure
> that we set TCR.DS safely if supported (and not overridden)
> 
> Note that this series is mostly orthogonal to work by Anshuman done last
> year: this series assumes that 52-bit physical addressing is never
> needed to map the kernel image itself, and therefore that we never need
> ID map range extension to cover the kernel with a 5th level when running
> with 4. 

This limitation will certainly make it more tricky to test the LPA2 stage2
implementation that I have done. I've got scripts that construct host systems
with all the RAM above 48 bits so that the output addresses in the stage2 page
tables can be guaranteed to contain OAs > 48 bits. I think the workaround here
would be to place the RAM so that it straddles the 48-bit boundary such that
the size of RAM below it matches the size of the kernel image, and place the
kernel image there. Then this will ensure that the VM's memory still uses the
RAM above the threshold. Or is there a simpler approach?

> And given that the LPA2 architectural feature covers both the
> virtual and physical range extensions, where enabling the latter is
> required to enable the former, we can simplify things further by only
> enabling them as a pair. (I.e., 52-bit physical addressing cannot be
> enabled for 48-bit VA space or smaller)
> 
> [...]

Thanks,
Ryan



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-24 14:39 ` [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ryan Roberts
@ 2022-11-24 17:14   ` Ard Biesheuvel
  2022-11-25  9:22     ` Ryan Roberts
  0 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 17:14 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On Thu, 24 Nov 2022 at 15:39, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> Hi Ard,
>
> Thanks for including me on this. I'll plan to do a review over the next week or
> so, but in the meantime, I have a couple of general questions/comments:
>
> On 24/11/2022 12:39, Ard Biesheuvel wrote:
> > Enable support for LPA2 when running with 4k or 16k pages. In the former
> > case, this requires 5 level paging with a runtime fallback to 4 on
> > non-LPA2 hardware. For consistency, the same approach is adopted for 16k
> > pages, where we fall back to 3 level paging (47 bit virtual addressing)
> > on non-LPA2 configurations.
>
> It seems odd to me that if you have a non-LPA2 system, if you run a kernel that
> is compiled for 16KB pages and 48 VA bits, then you will get 48 VA bits. But if
> you run a kernel that is compiled for 16KB pages and 52 VA bits then you will
> get 47 VA bits? Wouldn't that pose a potential user space compat issue?
>

Well, given that Android happily runs with 39-bit VAs to avoid 4 level
paging at all cost, I don't think that is a universal concern.

The benefit of this approach is that you can decide at runtime whether
you want to take the performance hit of 4 (or 5) level paging to get
access to the extended VA space.

> > (Falling back to 48 bits would involve
> > finding a workaround for the fact that we cannot construct a level 0
> > table covering 52 bits of VA space that appears aligned to its size in
> > memory, and has the top 2 entries that represent the 48-bit region
> > appearing at an alignment of 64 bytes, which is required by the
> > architecture for TTBR address values.
>
> I'm not sure I've understood this. The level 0 table would need 32 entries for
> 52 VA bits so the table size is 256 bytes, naturally aligned to 256 bytes. 64 is
> a factor of 256 so surely the top 2 entries are guaranteed to also meet the
> constraint for the fallback path too?
>

The top 2 entries are 16 bytes combined, and end on a 256 byte aligned
boundary so I don't see how they can start on a 64 byte aligned
boundary at the same time.

My RFC had a workaround for this, but it is a bit nasty because we
need to copy those two entries at the right time and keep them in
sync.

> > Also, using an additional level of
> > paging to translate a single VA bit is wasteful in terms of TLB
> > efficiency)
> >
> > This means support for falling back to 3 levels of paging at runtime
> > when configured for 4 is also needed.
> >
> > Another thing worth to note is that the repurposed physical address bits
> > in the page table descriptors were not RES0 before, and so there is now
> > a big global switch (called TCR.DS) which controls how all page table
> > descriptors are interpreted. This requires some extra care in the PTE
> > conversion helpers, and additional handling in the boot code to ensure
> > that we set TCR.DS safely if supported (and not overridden)
> >
> > Note that this series is mostly orthogonal to work by Anshuman done last
> > year: this series assumes that 52-bit physical addressing is never
> > needed to map the kernel image itself, and therefore that we never need
> > ID map range extension to cover the kernel with a 5th level when running
> > with 4.
>
> This limitation will certainly make it more tricky to test the the LPA2 stage2
> implementation that I have done. I've got scripts that construct host systems
> with all the RAM above 48 bits so that the output addresses in the stage2 page
> tables can be guaranteed to contain OAs > 48 bits. I think the work around here
> would be to place the RAM so that it straddles the 48 bit boundary such that the
> size of RAM below is the same size as the kernel image, and place the kernel
> image in it. Then this will ensure that the VM's memory still uses the RAM above
> the threshold. Or is there a simpler approach?
>

No, that sounds reasonable. I'm using QEMU which happily lets you put
the start of DRAM at any address you can imagine (if you recompile it)

Another approach could be to simply stick a memblock_reserve()
somewhere that covers all 48-bit addressable memory, but you will need
some of both in any case.
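
(Editorial sketch of that suggestion: a hypothetical, test-only hack, with
an assumed function name and placement, not something from this series.)

  #include <linux/init.h>
  #include <linux/bits.h>
  #include <linux/memblock.h>

  /* hypothetical test-only hack: keep all usable memory above the 48-bit
   * PA boundary so that guest memory ends up with OAs > 48 bits */
  static void __init reserve_low_pa_for_lpa2_testing(void)
  {
          memblock_reserve(0, BIT_ULL(48));
  }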

> > And given that the LPA2 architectural feature covers both the
> > virtual and physical range extensions, where enabling the latter is
> > required to enable the former, we can simplify things further by only
> > enabling them as a pair. (I.e., 52-bit physical addressing cannot be
> > enabled for 48-bit VA space or smaller)
> >
> > [...]
>
> Thanks,
> Ryan
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 16/19] arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging
  2022-11-24 12:39 ` [PATCH v2 16/19] arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging Ard Biesheuvel
@ 2022-11-24 17:44   ` Ard Biesheuvel
  0 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-24 17:44 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Marc Zyngier, Will Deacon, Mark Rutland, Kees Cook,
	Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson, Ryan Roberts

On Thu, 24 Nov 2022 at 13:40, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Allow the KASAN init code to deal with 5 levels of paging, and relax the
> requirement that the shadow region is aligned to the top level pgd_t
> size. This is necessary for LPA2 based 52-bit virtual addressing, where
> the KASAN shadow will never be aligned to the pgd_t size. Allowing this
> also enables the 16k/48-bit case for KASAN, which is a nice bonus.
>
> This involves some hackery to manipulate the root and next level page
> tables without having to distinguish all the various configurations,
> including 16k/48-bits (which has a two entry pgd_t level), and LPA2
> configurations running with one translation level less on non-LPA2
> hardware.
>

This patch is not entirely correct: to safely allow the start of the
kasan shadow region to be misaligned wrt the top level block size, we
need to install a next level table that covers it before we map the
early shadow; otherwise, we may end up mapping parts of the linear
map into the zero shadow page tables.

I have a fix that I will incorporate the next time around.

> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/Kconfig         |   2 +-
>  arch/arm64/mm/kasan_init.c | 124 ++++++++++++++++++--
>  2 files changed, 112 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 6d299c6c0a56..901f4d73476d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -153,7 +153,7 @@ config ARM64
>         select HAVE_ARCH_HUGE_VMAP
>         select HAVE_ARCH_JUMP_LABEL
>         select HAVE_ARCH_JUMP_LABEL_RELATIVE
> -       select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
> +       select HAVE_ARCH_KASAN
>         select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
>         select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
>         select HAVE_ARCH_KASAN_HW_TAGS if (HAVE_ARCH_KASAN && ARM64_MTE)
> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
> index 7e32f21fb8e1..c422952e439b 100644
> --- a/arch/arm64/mm/kasan_init.c
> +++ b/arch/arm64/mm/kasan_init.c
> @@ -23,7 +23,7 @@
>
>  #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
>
> -static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
> +static pgd_t tmp_pg_dir[PTRS_PER_PTE] __initdata __aligned(PAGE_SIZE);
>
>  /*
>   * The p*d_populate functions call virt_to_phys implicitly so they can't be used
> @@ -99,6 +99,19 @@ static pud_t *__init kasan_pud_offset(p4d_t *p4dp, unsigned long addr, int node,
>         return early ? pud_offset_kimg(p4dp, addr) : pud_offset(p4dp, addr);
>  }
>
> +static p4d_t *__init kasan_p4d_offset(pgd_t *pgdp, unsigned long addr, int node,
> +                                     bool early)
> +{
> +       if (pgd_none(READ_ONCE(*pgdp))) {
> +               phys_addr_t p4d_phys = early ?
> +                               __pa_symbol(kasan_early_shadow_p4d)
> +                                       : kasan_alloc_zeroed_page(node);
> +               __pgd_populate(pgdp, p4d_phys, PGD_TYPE_TABLE);
> +       }
> +
> +       return early ? p4d_offset_kimg(pgdp, addr) : p4d_offset(pgdp, addr);
> +}
> +
>  static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
>                                       unsigned long end, int node, bool early)
>  {
> @@ -144,7 +157,7 @@ static void __init kasan_p4d_populate(pgd_t *pgdp, unsigned long addr,
>                                       unsigned long end, int node, bool early)
>  {
>         unsigned long next;
> -       p4d_t *p4dp = p4d_offset(pgdp, addr);
> +       p4d_t *p4dp = kasan_p4d_offset(pgdp, addr, node, early);
>
>         do {
>                 next = p4d_addr_end(addr, end);
> @@ -165,14 +178,20 @@ static void __init kasan_pgd_populate(unsigned long addr, unsigned long end,
>         } while (pgdp++, addr = next, addr != end);
>  }
>
> +#if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS > 4
> +#define SHADOW_ALIGN   P4D_SIZE
> +#else
> +#define SHADOW_ALIGN   PUD_SIZE
> +#endif
> +
>  /* The early shadow maps everything to a single page of zeroes */
>  asmlinkage void __init kasan_early_init(void)
>  {
>         BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
>                 KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
> -       BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS), PGDIR_SIZE));
> -       BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS_MIN), PGDIR_SIZE));
> -       BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, PGDIR_SIZE));
> +       BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS), SHADOW_ALIGN));
> +       BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS_MIN), SHADOW_ALIGN));
> +       BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, SHADOW_ALIGN));
>         kasan_pgd_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, NUMA_NO_NODE,
>                            true);
>  }
> @@ -184,20 +203,86 @@ static void __init kasan_map_populate(unsigned long start, unsigned long end,
>         kasan_pgd_populate(start & PAGE_MASK, PAGE_ALIGN(end), node, false);
>  }
>
> -static void __init clear_pgds(unsigned long start,
> -                       unsigned long end)
> +/*
> + * Return whether 'addr' is aligned to the size covered by a top level
> + * descriptor.
> + */
> +static bool __init top_level_aligned(u64 addr)
> +{
> +       int shift = (VA_LEVELS(vabits_actual) - 1) * (PAGE_SHIFT - 3);
> +
> +       return (addr % (PAGE_SIZE << shift)) == 0;
> +}
> +
> +/*
> + * Return the descriptor index of 'addr' in the top level table
> + */
> +static int __init top_level_idx(u64 addr)
>  {
>         /*
> -        * Remove references to kasan page tables from
> -        * swapper_pg_dir. pgd_clear() can't be used
> -        * here because it's nop on 2,3-level pagetable setups
> +        * On 64k pages, the TTBR1 range root tables are extended for 52-bit
> +        * virtual addressing, and TTBR1 will simply point to the pgd_t entry
> +        * that covers the start of the 48-bit addressable VA space if LVA is
> +        * not implemented. This means we need to index the table as usual,
> +        * instead of masking off bits based on vabits_actual.
>          */
> -       for (; start < end; start += PGDIR_SIZE)
> -               set_pgd(pgd_offset_k(start), __pgd(0));
> +       u64 vabits = IS_ENABLED(CONFIG_ARM64_64K_PAGES) ? VA_BITS
> +                                                       : vabits_actual;
> +       int shift = (VA_LEVELS(vabits) - 1) * (PAGE_SHIFT - 3);
> +
> +       return (addr & ~_PAGE_OFFSET(vabits)) >> (shift + PAGE_SHIFT);
> +}
> +
> +/*
> + * Clone a next level table from swapper_pg_dir into tmp_pg_dir
> + */
> +static void __init clone_next_level(u64 addr, pgd_t *tmp_pg_dir, pud_t *pud)
> +{
> +       int idx = top_level_idx(addr);
> +       pgd_t pgd = READ_ONCE(swapper_pg_dir[idx]);
> +       pud_t *pudp = (pud_t *)__phys_to_kimg(__pgd_to_phys(pgd));
> +
> +       memcpy(pud, pudp, PAGE_SIZE);
> +       tmp_pg_dir[idx] = __pgd(__phys_to_pgd_val(__pa_symbol(pud)) |
> +                               PUD_TYPE_TABLE);
> +}
> +
> +/*
> + * Return the descriptor index of 'addr' in the next level table
> + */
> +static int __init next_level_idx(u64 addr)
> +{
> +       int shift = (VA_LEVELS(vabits_actual) - 2) * (PAGE_SHIFT - 3);
> +
> +       return (addr >> (shift + PAGE_SHIFT)) % PTRS_PER_PTE;
> +}
> +
> +/*
> + * Dereference the table descriptor at 'pgd_idx' and clear the entries from
> + * 'start' to 'end' from the table.
> + */
> +static void __init clear_next_level(int pgd_idx, int start, int end)
> +{
> +       pgd_t pgd = READ_ONCE(swapper_pg_dir[pgd_idx]);
> +       pud_t *pudp = (pud_t *)__phys_to_kimg(__pgd_to_phys(pgd));
> +
> +       memset(&pudp[start], 0, (end - start) * sizeof(pud_t));
> +}
> +
> +static void __init clear_shadow(u64 start, u64 end)
> +{
> +       int l = top_level_idx(start), m = top_level_idx(end);
> +
> +       if (!top_level_aligned(start))
> +               clear_next_level(l++, next_level_idx(start), PTRS_PER_PTE - 1);
> +       if (!top_level_aligned(end))
> +               clear_next_level(m, 0, next_level_idx(end));
> +       memset(&swapper_pg_dir[l], 0, (m - l) * sizeof(pgd_t));
>  }
>
>  static void __init kasan_init_shadow(void)
>  {
> +       static pud_t pud[2][PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
>         u64 kimg_shadow_start, kimg_shadow_end;
>         u64 mod_shadow_start, mod_shadow_end;
>         u64 vmalloc_shadow_end;
> @@ -220,10 +305,23 @@ static void __init kasan_init_shadow(void)
>          * setup will be finished.
>          */
>         memcpy(tmp_pg_dir, swapper_pg_dir, sizeof(tmp_pg_dir));
> +
> +       /*
> +        * If the start or end address of the shadow region is not aligned to
> +        * the top level size, we have to allocate a temporary next-level table
> +        * in each case, clone the next level of descriptors, and install the
> +        * table into tmp_pg_dir. Note that with 5 levels of paging, the next
> +        * level will in fact be p4d_t, but that makes no difference in this
> +        * case.
> +        */
> +       if (!top_level_aligned(KASAN_SHADOW_START))
> +               clone_next_level(KASAN_SHADOW_START, tmp_pg_dir, pud[0]);
> +       if (!top_level_aligned(KASAN_SHADOW_END))
> +               clone_next_level(KASAN_SHADOW_END, tmp_pg_dir, pud[1]);
>         dsb(ishst);
>         cpu_replace_ttbr1(lm_alias(tmp_pg_dir));
>
> -       clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
> +       clear_shadow(KASAN_SHADOW_START, KASAN_SHADOW_END);
>
>         kasan_map_populate(kimg_shadow_start, kimg_shadow_end,
>                            early_pfn_to_nid(virt_to_pfn(lm_alias(KERNEL_START))));
> --
> 2.38.1.584.g0f3c55d4c2-goog
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-24 17:14   ` Ard Biesheuvel
@ 2022-11-25  9:22     ` Ryan Roberts
  2022-11-25  9:35       ` Ard Biesheuvel
  0 siblings, 1 reply; 45+ messages in thread
From: Ryan Roberts @ 2022-11-25  9:22 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On 24/11/2022 17:14, Ard Biesheuvel wrote:
> On Thu, 24 Nov 2022 at 15:39, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Hi Ard,
>>
>> Thanks for including me on this. I'll plan to do a review over the next week or
>> so, but in the meantime, I have a couple of general questions/comments:
>>
>> On 24/11/2022 12:39, Ard Biesheuvel wrote:
>>> Enable support for LPA2 when running with 4k or 16k pages. In the former
>>> case, this requires 5 level paging with a runtime fallback to 4 on
>>> non-LPA2 hardware. For consistency, the same approach is adopted for 16k
>>> pages, where we fall back to 3 level paging (47 bit virtual addressing)
>>> on non-LPA2 configurations.
>>
>> It seems odd to me that if you have a non-LPA2 system, if you run a kernel that
>> is compiled for 16KB pages and 48 VA bits, then you will get 48 VA bits. But if
>> you run a kernel that is compiled for 16KB pages and 52 VA bits then you will
>> get 47 VA bits? Wouldn't that pose a potential user space compat issue?
>>
> 
> Well, given that Android happily runs with 39-bit VAs to avoid 4 level
> paging at all cost, I don't think that is a universal concern.

Well presumably the Android kernel is always explicitly compiled for 39 VA bits
so that's what user space is used to? I was really just making the point that if
you have (the admittedly exotic and unlikely) case of having a 16KB kernel
previously compiled for 48 VA bits, and you "upgrade" it to 52 VA bits now that
the option is available, on HW without LPA2, this will actually be observed as a
"downgrade" to 47 bits. If you previously wanted to limit to 3 levels of lookup
with 16KB you would already have been compiling for 47 VA bits.

> 
> The benefit of this approach is that you can decide at runtime whether
> you want to take the performance hit of 4 (or 5) level paging to get
> access to the extended VA space.
> 
>>> (Falling back to 48 bits would involve
>>> finding a workaround for the fact that we cannot construct a level 0
>>> table covering 52 bits of VA space that appears aligned to its size in
>>> memory, and has the top 2 entries that represent the 48-bit region
>>> appearing at an alignment of 64 bytes, which is required by the
>>> architecture for TTBR address values.
>>
>> I'm not sure I've understood this. The level 0 table would need 32 entries for
>> 52 VA bits so the table size is 256 bytes, naturally aligned to 256 bytes. 64 is
>> a factor of 256 so surely the top 2 entries are guaranteed to also meet the
>> constraint for the fallback path too?
>>
> 
> The top 2 entries are 16 bytes combined, and end on a 256 byte aligned
> boundary so I don't see how they can start on a 64 byte aligned
> boundary at the same time.

I'm still not following; why would the 2 entry/16 byte table *end* on a 256 byte
boundary? I guess I should go and read your patch before making assumptions, but
my assumption from your description here was that you were optimistically
allocating a 32 entry/256 byte table for the 52 VA bit case, then needing to
reuse that table for the 2 entry/16 byte case if HW turns out not to support
LPA2. In which case, surely the 2 entry table would be overlayed at the start
(low address) of the allocated 32 entry table, and therefore its alignment is
256 bytes, which meets the HW's 64 byte alignment requirement?

> 
> My RFC had a workaround for this, but it is a bit nasty because we
> need to copy those two entries at the right time and keep them in
> sync.
> 
>>> Also, using an additional level of
>>> paging to translate a single VA bit is wasteful in terms of TLB
>>> efficiency)
>>>
>>> This means support for falling back to 3 levels of paging at runtime
>>> when configured for 4 is also needed.
>>>
>>> Another thing worth to note is that the repurposed physical address bits
>>> in the page table descriptors were not RES0 before, and so there is now
>>> a big global switch (called TCR.DS) which controls how all page table
>>> descriptors are interpreted. This requires some extra care in the PTE
>>> conversion helpers, and additional handling in the boot code to ensure
>>> that we set TCR.DS safely if supported (and not overridden)
>>>
>>> Note that this series is mostly orthogonal to work by Anshuman done last
>>> year: this series assumes that 52-bit physical addressing is never
>>> needed to map the kernel image itself, and therefore that we never need
>>> ID map range extension to cover the kernel with a 5th level when running
>>> with 4.
>>
>> This limitation will certainly make it more tricky to test the the LPA2 stage2
>> implementation that I have done. I've got scripts that construct host systems
>> with all the RAM above 48 bits so that the output addresses in the stage2 page
>> tables can be guaranteed to contain OAs > 48 bits. I think the work around here
>> would be to place the RAM so that it straddles the 48 bit boundary such that the
>> size of RAM below is the same size as the kernel image, and place the kernel
>> image in it. Then this will ensure that the VM's memory still uses the RAM above
>> the threshold. Or is there a simpler approach?
>>
> 
> No, that sounds reasonable. I'm using QEMU which happily lets you put
> the start of DRAM at any address you can imagine (if you recompile it)

I'm running on FVP, which will let me do this with runtime parameters. Anyway,
I'll update my tests to cope with this constraint and run this patch set
through, and I'll let you know if it spots anything.
> 
> Another approach could be to simply stick a memblock_reserve()
> somewhere that covers all 48-bit addressable memory, but you will need
> some of both in any case.
> 
>>> And given that the LPA2 architectural feature covers both the
>>> virtual and physical range extensions, where enabling the latter is
>>> required to enable the former, we can simplify things further by only
>>> enabling them as a pair. (I.e., 52-bit physical addressing cannot be
>>> enabled for 48-bit VA space or smaller)
>>>
>>> [...]
>>
>> Thanks,
>> Ryan
>>
>>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-25  9:22     ` Ryan Roberts
@ 2022-11-25  9:35       ` Ard Biesheuvel
  2022-11-25 10:07         ` Ryan Roberts
  0 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-25  9:35 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On Fri, 25 Nov 2022 at 10:23, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 24/11/2022 17:14, Ard Biesheuvel wrote:
> > On Thu, 24 Nov 2022 at 15:39, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> Hi Ard,
> >>
> >> Thanks for including me on this. I'll plan to do a review over the next week or
> >> so, but in the meantime, I have a couple of general questions/comments:
> >>
> >> On 24/11/2022 12:39, Ard Biesheuvel wrote:
> >>> Enable support for LPA2 when running with 4k or 16k pages. In the former
> >>> case, this requires 5 level paging with a runtime fallback to 4 on
> >>> non-LPA2 hardware. For consistency, the same approach is adopted for 16k
> >>> pages, where we fall back to 3 level paging (47 bit virtual addressing)
> >>> on non-LPA2 configurations.
> >>
> >> It seems odd to me that if you have a non-LPA2 system, if you run a kernel that
> >> is compiled for 16KB pages and 48 VA bits, then you will get 48 VA bits. But if
> >> you run a kernel that is compiled for 16KB pages and 52 VA bits then you will
> >> get 47 VA bits? Wouldn't that pose a potential user space compat issue?
> >>
> >
> > Well, given that Android happily runs with 39-bit VAs to avoid 4 level
> > paging at all cost, I don't think that is a universal concern.
>
> Well presumably the Android kernel is always explicitly compiled for 39 VA bits
> so that's what user space is used to? I was really just making the point that if
> you have (the admittedly exotic and unlikely) case of having a 16KB kernel
> previously compiled for 48 VA bits, and you "upgrade" it to 52 VA bits now that
> the option is available, on HW without LPA2, this will actually be observed as a
> "downgrade" to 47 bits. If you previously wanted to limit to 3 levels of lookup
> with 16KB you would already have been compiling for 47 VA bits.
>

I am not debating that. I'm just saying that, without any hardware in
existence, it is difficult to predict which of these concerns is going
to dominate, and so I opted for the least messy and most symmetrical
approach.

> >
> > The benefit of this approach is that you can decide at runtime whether
> > you want to take the performance hit of 4 (or 5) level paging to get
> > access to the extended VA space.
> >
> >>> (Falling back to 48 bits would involve
> >>> finding a workaround for the fact that we cannot construct a level 0
> >>> table covering 52 bits of VA space that appears aligned to its size in
> >>> memory, and has the top 2 entries that represent the 48-bit region
> >>> appearing at an alignment of 64 bytes, which is required by the
> >>> architecture for TTBR address values.
> >>
> >> I'm not sure I've understood this. The level 0 table would need 32 entries for
> >> 52 VA bits so the table size is 256 bytes, naturally aligned to 256 bytes. 64 is
> >> a factor of 256 so surely the top 2 entries are guaranteed to also meet the
> >> constraint for the fallback path too?
> >>
> >
> > The top 2 entries are 16 bytes combined, and end on a 256 byte aligned
> > boundary so I don't see how they can start on a 64 byte aligned
> > boundary at the same time.
>
> I'm still not following; why would the 2 entry/16 byte table *end* on a 256 byte
> boundary? I guess I should go and read your patch before making assumptions, but
> my assumption from your description here was that you were optimistically
> allocating a 32 entry/256 byte table for the 52 VA bit case, then needing to
> reuse that table for the 2 entry/16 byte case if HW turns out not to support
> LPA2. In which case, surely the 2 entry table would be overlayed at the start
> (low address) of the allocated 32 entry table, and therefore its alignment is
> 256 bytes, which meets the HW's 64 byte alignment requirement?
>

No, it's at the end, that is the point. I am specifically referring to
TTBR1 upper region page tables here.

Please refer to the existing ttbr1_offset asm macro, which implements
this today for 64k pages + LVA. In this case, however, the condensed
table covers 6 bits of translation so it is naturally aligned to the
TTBR minimum alignment.
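
(Editorial aside: a quick check of why the 64k case works out while the
16k case does not, assuming the usual granule layouts.)

  #include <assert.h>

  int main(void)
  {
          /* 64k granule: the 52-bit level 0 table has
           * 1 << (52 - 16 - 2 * 13) = 1024 entries; the 48-bit subset is
           * the top 64 entries, starting at byte offset 960 * 8 = 7680,
           * which is 64-byte aligned, so TTBR1 can point straight at it */
          assert(1 << (52 - 16 - 2 * 13) == 1024);
          assert((960 * 8) % 64 == 0);

          /* 16k granule: the 48-bit subset is only the top 2 of 32
           * entries, at byte offset 240, which is not 64-byte aligned */
          assert((30 * 8) % 64 != 0);
          return 0;
  }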

> >
> > My RFC had a workaround for this, but it is a bit nasty because we
> > need to copy those two entries at the right time and keep them in
> > sync.
> >
> >>> Also, using an additional level of
> >>> paging to translate a single VA bit is wasteful in terms of TLB
> >>> efficiency)
> >>>
> >>> This means support for falling back to 3 levels of paging at runtime
> >>> when configured for 4 is also needed.
> >>>
> >>> Another thing worth to note is that the repurposed physical address bits
> >>> in the page table descriptors were not RES0 before, and so there is now
> >>> a big global switch (called TCR.DS) which controls how all page table
> >>> descriptors are interpreted. This requires some extra care in the PTE
> >>> conversion helpers, and additional handling in the boot code to ensure
> >>> that we set TCR.DS safely if supported (and not overridden)
> >>>
> >>> Note that this series is mostly orthogonal to work by Anshuman done last
> >>> year: this series assumes that 52-bit physical addressing is never
> >>> needed to map the kernel image itself, and therefore that we never need
> >>> ID map range extension to cover the kernel with a 5th level when running
> >>> with 4.
> >>
> >> This limitation will certainly make it more tricky to test the the LPA2 stage2
> >> implementation that I have done. I've got scripts that construct host systems
> >> with all the RAM above 48 bits so that the output addresses in the stage2 page
> >> tables can be guaranteed to contain OAs > 48 bits. I think the work around here
> >> would be to place the RAM so that it straddles the 48 bit boundary such that the
> >> size of RAM below is the same size as the kernel image, and place the kernel
> >> image in it. Then this will ensure that the VM's memory still uses the RAM above
> >> the threshold. Or is there a simpler approach?
> >>
> >
> > No, that sounds reasonable. I'm using QEMU which happily lets you put
> > the start of DRAM at any address you can imagine (if you recompile it)
>
> I'm running on FVP, which will let me do this with runtime parameters. Anyway,
> I'll update my tests to cope with this constraint and run this patch set
> through, and I'll let you know if it spots anything.

Excellent, thanks.

> >
> > Another approach could be to simply stick a memblock_reserve()
> > somewhere that covers all 48-bit addressable memory, but you will need
> > some of both in any case.
> >
> >>> And given that the LPA2 architectural feature covers both the
> >>> virtual and physical range extensions, where enabling the latter is
> >>> required to enable the former, we can simplify things further by only
> >>> enabling them as a pair. (I.e., 52-bit physical addressing cannot be
> >>> enabled for 48-bit VA space or smaller)
> >>>
> >>> [...]
> >>
> >> Thanks,
> >> Ryan
> >>
> >>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-25  9:35       ` Ard Biesheuvel
@ 2022-11-25 10:07         ` Ryan Roberts
  2022-11-25 10:36           ` Ard Biesheuvel
  0 siblings, 1 reply; 45+ messages in thread
From: Ryan Roberts @ 2022-11-25 10:07 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On 25/11/2022 09:35, Ard Biesheuvel wrote:
> On Fri, 25 Nov 2022 at 10:23, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 24/11/2022 17:14, Ard Biesheuvel wrote:
>>> On Thu, 24 Nov 2022 at 15:39, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> Hi Ard,
>>>>
>>>> Thanks for including me on this. I'll plan to do a review over the next week or
>>>> so, but in the meantime, I have a couple of general questions/comments:
>>>>
>>>> On 24/11/2022 12:39, Ard Biesheuvel wrote:
>>>>> Enable support for LPA2 when running with 4k or 16k pages. In the former
>>>>> case, this requires 5 level paging with a runtime fallback to 4 on
>>>>> non-LPA2 hardware. For consistency, the same approach is adopted for 16k
>>>>> pages, where we fall back to 3 level paging (47 bit virtual addressing)
>>>>> on non-LPA2 configurations.
>>>>
>>>> It seems odd to me that if you have a non-LPA2 system, if you run a kernel that
>>>> is compiled for 16KB pages and 48 VA bits, then you will get 48 VA bits. But if
>>>> you run a kernel that is compiled for 16KB pages and 52 VA bits then you will
>>>> get 47 VA bits? Wouldn't that pose a potential user space compat issue?
>>>>
>>>
>>> Well, given that Android happily runs with 39-bit VAs to avoid 4 level
>>> paging at all cost, I don't think that is a universal concern.
>>
>> Well presumably the Android kernel is always explicitly compiled for 39 VA bits
>> so that's what user space is used to? I was really just making the point that if
>> you have (the admittedly exotic and unlikely) case of having a 16KB kernel
>> previously compiled for 48 VA bits, and you "upgrade" it to 52 VA bits now that
>> the option is available, on HW without LPA2, this will actually be observed as a
>> "downgrade" to 47 bits. If you previously wanted to limit to 3 levels of lookup
>> with 16KB you would already have been compiling for 47 VA bits.
>>
> 
> I am not debating that. I'm just saying that, without any hardware in
> existence, it is difficult to predict which of these concerns is going
> to dominate, and so I opted for the least messy and most symmetrical
> approach.

OK fair enough. My opinion is logged ;-).

> 
>>>
>>> The benefit of this approach is that you can decide at runtime whether
>>> you want to take the performance hit of 4 (or 5) level paging to get
>>> access to the extended VA space.
>>>
>>>>> (Falling back to 48 bits would involve
>>>>> finding a workaround for the fact that we cannot construct a level 0
>>>>> table covering 52 bits of VA space that appears aligned to its size in
>>>>> memory, and has the top 2 entries that represent the 48-bit region
>>>>> appearing at an alignment of 64 bytes, which is required by the
>>>>> architecture for TTBR address values.
>>>>
>>>> I'm not sure I've understood this. The level 0 table would need 32 entries for
>>>> 52 VA bits so the table size is 256 bytes, naturally aligned to 256 bytes. 64 is
>>>> a factor of 256 so surely the top 2 entries are guaranteed to also meet the
>>>> constraint for the fallback path too?
>>>>
>>>
>>> The top 2 entries are 16 bytes combined, and end on a 256 byte aligned
>>> boundary so I don't see how they can start on a 64 byte aligned
>>> boundary at the same time.
>>
>> I'm still not following; why would the 2 entry/16 byte table *end* on a 256 byte
>> boundary? I guess I should go and read your patch before making assumptions, but
>> my assumption from your description here was that you were optimistically
>> allocating a 32 entry/256 byte table for the 52 VA bit case, then needing to
>> reuse that table for the 2 entry/16 byte case if HW turns out not to support
>> LPA2. In which case, surely the 2 entry table would be overlayed at the start
>> (low address) of the allocated 32 entry table, and therefore its alignment is
>> 256 bytes, which meets the HW's 64 byte alignment requirement?
>>
> 
> No, it's at the end, that is the point. I am specifically referring to
> TTBR1 upper region page tables here.
> 
> Please refer to the existing ttbr1_offset asm macro, which implements
> this today for 64k pages + LVA. In this case, however, the condensed
> table covers 6 bits of translation so it is naturally aligned to the
> TTBR minimum alignment.

Afraid I don't see any such ttbr1_offset macro, either in upstream or the branch
you posted. The best I can find is TTBR1_OFFSET in arm arch, which I'm guessing
isn't it. I'm keen to understand this better if you can point me to the right
location?

Regardless, the Arm ARM states this for TTBR1_EL1.BADDR:

"""
BADDR[47:1], bits [47:1]

Translation table base address:
• Bits A[47:x] of the stage 1 translation table base address bits are in
register bits[47:x].
• Bits A[(x-1):0] of the stage 1 translation table base address are zero.

Address bit x is the minimum address bit required to align the translation table
to the size of the table. The smallest permitted value of x is 6. The AArch64
Virtual Memory System Architecture chapter describes how x is calculated based
on the value of TCR_EL1.T1SZ, the translation stage, and the translation granule
size.

Note
A translation table is required to be aligned to the size of the table. If a
table contains fewer than eight entries, it must be aligned on a 64 byte address
boundary.
"""

I don't see how that is referring to the alignment of the *end* of the table?



> 
>>>
>>> My RFC had a workaround for this, but it is a bit nasty because we
>>> need to copy those two entries at the right time and keep them in
>>> sync.
>>>
>>>>> Also, using an additional level of
>>>>> paging to translate a single VA bit is wasteful in terms of TLB
>>>>> efficiency)
>>>>>
>>>>> This means support for falling back to 3 levels of paging at runtime
>>>>> when configured for 4 is also needed.
>>>>>
>>>>> Another thing worth to note is that the repurposed physical address bits
>>>>> in the page table descriptors were not RES0 before, and so there is now
>>>>> a big global switch (called TCR.DS) which controls how all page table
>>>>> descriptors are interpreted. This requires some extra care in the PTE
>>>>> conversion helpers, and additional handling in the boot code to ensure
>>>>> that we set TCR.DS safely if supported (and not overridden)
>>>>>
>>>>> Note that this series is mostly orthogonal to work by Anshuman done last
>>>>> year: this series assumes that 52-bit physical addressing is never
>>>>> needed to map the kernel image itself, and therefore that we never need
>>>>> ID map range extension to cover the kernel with a 5th level when running
>>>>> with 4.
>>>>
>>>> This limitation will certainly make it more tricky to test the the LPA2 stage2
>>>> implementation that I have done. I've got scripts that construct host systems
>>>> with all the RAM above 48 bits so that the output addresses in the stage2 page
>>>> tables can be guaranteed to contain OAs > 48 bits. I think the work around here
>>>> would be to place the RAM so that it straddles the 48 bit boundary such that the
>>>> size of RAM below is the same size as the kernel image, and place the kernel
>>>> image in it. Then this will ensure that the VM's memory still uses the RAM above
>>>> the threshold. Or is there a simpler approach?
>>>>
>>>
>>> No, that sounds reasonable. I'm using QEMU which happily lets you put
>>> the start of DRAM at any address you can imagine (if you recompile it)
>>
>> I'm running on FVP, which will let me do this with runtime parameters. Anyway,
>> I'll update my tests to cope with this constraint and run this patch set
>> through, and I'll let you know if it spots anything.
> 
> Excellent, thanks.
> 
>>>
>>> Another approach could be to simply stick a memblock_reserve()
>>> somewhere that covers all 48-bit addressable memory, but you will need
>>> some of both in any case.
>>>
>>>>> And given that the LPA2 architectural feature covers both the
>>>>> virtual and physical range extensions, where enabling the latter is
>>>>> required to enable the former, we can simplify things further by only
>>>>> enabling them as a pair. (I.e., 52-bit physical addressing cannot be
>>>>> enabled for 48-bit VA space or smaller)
>>>>>
>>>>> [...]
>>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>
>>



* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-25 10:07         ` Ryan Roberts
@ 2022-11-25 10:36           ` Ard Biesheuvel
  2022-11-25 14:12             ` Ryan Roberts
  0 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-25 10:36 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On Fri, 25 Nov 2022 at 11:07, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 25/11/2022 09:35, Ard Biesheuvel wrote:
> > On Fri, 25 Nov 2022 at 10:23, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 24/11/2022 17:14, Ard Biesheuvel wrote:
> >>> On Thu, 24 Nov 2022 at 15:39, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>>>
> >>>> Hi Ard,
> >>>>
> >>>> Thanks for including me on this. I'll plan to do a review over the next week or
> >>>> so, but in the meantime, I have a couple of general questions/comments:
> >>>>
> >>>> On 24/11/2022 12:39, Ard Biesheuvel wrote:
> >>>>> Enable support for LPA2 when running with 4k or 16k pages. In the former
> >>>>> case, this requires 5 level paging with a runtime fallback to 4 on
> >>>>> non-LPA2 hardware. For consistency, the same approach is adopted for 16k
> >>>>> pages, where we fall back to 3 level paging (47 bit virtual addressing)
> >>>>> on non-LPA2 configurations.
> >>>>
> >>>> It seems odd to me that if you have a non-LPA2 system, if you run a kernel that
> >>>> is compiled for 16KB pages and 48 VA bits, then you will get 48 VA bits. But if
> >>>> you run a kernel that is compiled for 16KB pages and 52 VA bits then you will
> >>>> get 47 VA bits? Wouldn't that pose a potential user space compat issue?
> >>>>
> >>>
> >>> Well, given that Android happily runs with 39-bit VAs to avoid 4 level
> >>> paging at all cost, I don't think that is a universal concern.
> >>
> >> Well presumably the Android kernel is always explicitly compiled for 39 VA bits
> >> so that's what user space is used to? I was really just making the point that if
> >> you have (the admittedly exotic and unlikely) case of having a 16KB kernel
> >> previously compiled for 48 VA bits, and you "upgrade" it to 52 VA bits now that
> >> the option is available, on HW without LPA2, this will actually be observed as a
> >> "downgrade" to 47 bits. If you previously wanted to limit to 3 levels of lookup
> >> with 16KB you would already have been compiling for 47 VA bits.
> >>
> >
> > I am not debating that. I'm just saying that, without any hardware in
> > existence, it is difficult to predict which of these concerns is going
> > to dominate, and so I opted for the least messy and most symmetrical
> > approach.
>
> OK fair enough. My opinion is logged ;-).
>
> >
> >>>
> >>> The benefit of this approach is that you can decide at runtime whether
> >>> you want to take the performance hit of 4 (or 5) level paging to get
> >>> access to the extended VA space.
> >>>
> >>>>> (Falling back to 48 bits would involve
> >>>>> finding a workaround for the fact that we cannot construct a level 0
> >>>>> table covering 52 bits of VA space that appears aligned to its size in
> >>>>> memory, and has the top 2 entries that represent the 48-bit region
> >>>>> appearing at an alignment of 64 bytes, which is required by the
> >>>>> architecture for TTBR address values.
> >>>>
> >>>> I'm not sure I've understood this. The level 0 table would need 32 entries for
> >>>> 52 VA bits so the table size is 256 bytes, naturally aligned to 256 bytes. 64 is
> >>>> a factor of 256 so surely the top 2 entries are guaranteed to also meet the
> >>>> constraint for the fallback path too?
> >>>>
> >>>
> >>> The top 2 entries are 16 bytes combined, and end on a 256 byte aligned
> >>> boundary so I don't see how they can start on a 64 byte aligned
> >>> boundary at the same time.
> >>
> >> I'm still not following; why would the 2 entry/16 byte table *end* on a 256 byte
> >> boundary? I guess I should go and read your patch before making assumptions, but
> >> my assumption from your description here was that you were optimistically
> >> allocating a 32 entry/256 byte table for the 52 VA bit case, then needing to
> >> reuse that table for the 2 entry/16 byte case if HW turns out not to support
> >> LPA2. In which case, surely the 2 entry table would be overlayed at the start
> >> (low address) of the allocated 32 entry table, and therefore its alignment is
> >> 256 bytes, which meets the HW's 64 byte alignment requirement?
> >>
> >
> > No, it's at the end, that is the point. I am specifically referring to
> > TTBR1 upper region page tables here.
> >
> > Please refer to the existing ttbr1_offset asm macro, which implements
> > this today for 64k pages + LVA. In this case, however, the condensed
> > table covers 6 bits of translation so it is naturally aligned to the
> > TTBR minimum alignment.
>
> Afraid I don't see any such ttbr1_offset macro, either in upstream or the branch
> you posted. The best I can find is TTBR1_OFFSET in arm arch, which I'm guessing
> isn't it. I'm keen to understand this better if you can point me to the right
> location?
>

Apologies, I got the name wrong; it is offset_ttbr1:

/*
 * Offset ttbr1 to allow for 48-bit kernel VAs set with 52-bit PTRS_PER_PGD.
 * orr is used as it can cover the immediate value (and is idempotent).
 * In future this may be nop'ed out when dealing with 52-bit kernel VAs.
 *      ttbr: Value of ttbr to set, modified.
 */
        .macro  offset_ttbr1, ttbr, tmp
#ifdef CONFIG_ARM64_VA_BITS_52
        mrs_s   \tmp, SYS_ID_AA64MMFR2_EL1
        and     \tmp, \tmp, #(0xf << ID_AA64MMFR2_EL1_VARange_SHIFT)
        cbnz    \tmp, .Lskipoffs_\@
        orr     \ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
.Lskipoffs_\@ :
#endif
        .endm

> Regardless, the Arm ARM states this for TTBR1_EL1.BADDR:
>
> """
> BADDR[47:1], bits [47:1]
>
> Translation table base address:
> • Bits A[47:x] of the stage 1 translation table base address bits are in
> register bits[47:x].
> • Bits A[(x-1):0] of the stage 1 translation table base address are zero.
>
> Address bit x is the minimum address bit required to align the translation table
> to the size of the table. The smallest permitted value of x is 6. The AArch64
> Virtual Memory System Architecture chapter describes how x is calculated based
> on the value of TCR_EL1.T1SZ, the translation stage, and the translation granule
> size.
>
> Note
> A translation table is required to be aligned to the size of the table. If a
> table contains fewer than eight entries, it must be aligned on a 64 byte address
> boundary.
> """
>
> I don't see how that is referring to the alignment of the *end* of the table?
>

It refers to the address poked into the register.

When you create a 256 byte aligned 32 entry 52-bit level 0 table for
16k pages, entry #0 covers the start of the 52-bit addressable VA
space, and entry #30 covers the start of the 48-bit addressable VA
space.

When LPA2 is not supported, the walk must start at entry #30 so that
is where TTBR1_EL1 should point, but doing so violates the
architecture's alignment requirement.

So what we might do is double the size of the table, and clone entries
#30 and #31 to positions #32 and #33, for instance (and remember to
keep them in sync, which is not that hard)
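
To make the alignment problem concrete, here is a throwaway user space
sketch of the arithmetic only (nothing below is code from this series,
and the table address is made up): a 32-entry level 0 table is 256 bytes
and naturally aligned to 256 bytes, entry #30 sits at byte offset 240,
which is not a multiple of 64, while a cloned copy at entry #32 sits at
offset 256, which is.

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint64_t table = 0x40001100;		/* hypothetical 256-byte aligned table */
	uint64_t entry30 = table + 30 * 8;	/* first entry of the 48-bit region */
	uint64_t entry32 = table + 32 * 8;	/* cloned copy in a doubled table */

	assert(table % 256 == 0);		/* aligned to its own 256-byte size */
	assert(entry30 % 64 == 48);		/* violates the 64-byte BADDR rule */
	assert(entry32 % 64 == 0);		/* acceptable as a TTBR1 base address */
	return 0;
}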


* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-25 10:36           ` Ard Biesheuvel
@ 2022-11-25 14:12             ` Ryan Roberts
  2022-11-25 14:19               ` Ard Biesheuvel
  0 siblings, 1 reply; 45+ messages in thread
From: Ryan Roberts @ 2022-11-25 14:12 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On 25/11/2022 10:36, Ard Biesheuvel wrote:
> On Fri, 25 Nov 2022 at 11:07, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 25/11/2022 09:35, Ard Biesheuvel wrote:
>>> On Fri, 25 Nov 2022 at 10:23, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> On 24/11/2022 17:14, Ard Biesheuvel wrote:
>>>>> On Thu, 24 Nov 2022 at 15:39, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>
>>>>>> Hi Ard,
>>>>>>
>>>>>> Thanks for including me on this. I'll plan to do a review over the next week or
>>>>>> so, but in the meantime, I have a couple of general questions/comments:
>>>>>>
>>>>>> On 24/11/2022 12:39, Ard Biesheuvel wrote:
>>>>>>> Enable support for LPA2 when running with 4k or 16k pages. In the former
>>>>>>> case, this requires 5 level paging with a runtime fallback to 4 on
>>>>>>> non-LPA2 hardware. For consistency, the same approach is adopted for 16k
>>>>>>> pages, where we fall back to 3 level paging (47 bit virtual addressing)
>>>>>>> on non-LPA2 configurations.
>>>>>>
>>>>>> It seems odd to me that if you have a non-LPA2 system, if you run a kernel that
>>>>>> is compiled for 16KB pages and 48 VA bits, then you will get 48 VA bits. But if
>>>>>> you run a kernel that is compiled for 16KB pages and 52 VA bits then you will
>>>>>> get 47 VA bits? Wouldn't that pose a potential user space compat issue?
>>>>>>
>>>>>
>>>>> Well, given that Android happily runs with 39-bit VAs to avoid 4 level
>>>>> paging at all cost, I don't think that is a universal concern.
>>>>
>>>> Well presumably the Android kernel is always explicitly compiled for 39 VA bits
>>>> so that's what user space is used to? I was really just making the point that if
>>>> you have (the admittedly exotic and unlikely) case of having a 16KB kernel
>>>> previously compiled for 48 VA bits, and you "upgrade" it to 52 VA bits now that
>>>> the option is available, on HW without LPA2, this will actually be observed as a
>>>> "downgrade" to 47 bits. If you previously wanted to limit to 3 levels of lookup
>>>> with 16KB you would already have been compiling for 47 VA bits.
>>>>
>>>
>>> I am not debating that. I'm just saying that, without any hardware in
>>> existence, it is difficult to predict which of these concerns is going
>>> to dominate, and so I opted for the least messy and most symmetrical
>>> approach.
>>
>> OK fair enough. My opinion is logged ;-).
>>
>>>
>>>>>
>>>>> The benefit of this approach is that you can decide at runtime whether
>>>>> you want to take the performance hit of 4 (or 5) level paging to get
>>>>> access to the extended VA space.
>>>>>
>>>>>>> (Falling back to 48 bits would involve
>>>>>>> finding a workaround for the fact that we cannot construct a level 0
>>>>>>> table covering 52 bits of VA space that appears aligned to its size in
>>>>>>> memory, and has the top 2 entries that represent the 48-bit region
>>>>>>> appearing at an alignment of 64 bytes, which is required by the
>>>>>>> architecture for TTBR address values.
>>>>>>
>>>>>> I'm not sure I've understood this. The level 0 table would need 32 entries for
>>>>>> 52 VA bits so the table size is 256 bytes, naturally aligned to 256 bytes. 64 is
>>>>>> a factor of 256 so surely the top 2 entries are guaranteed to also meet the
>>>>>> constraint for the fallback path too?
>>>>>>
>>>>>
>>>>> The top 2 entries are 16 bytes combined, and end on a 256 byte aligned
>>>>> boundary so I don't see how they can start on a 64 byte aligned
>>>>> boundary at the same time.
>>>>
>>>> I'm still not following; why would the 2 entry/16 byte table *end* on a 256 byte
>>>> boundary? I guess I should go and read your patch before making assumptions, but
>>>> my assumption from your description here was that you were optimistically
>>>> allocating a 32 entry/256 byte table for the 52 VA bit case, then needing to
>>>> reuse that table for the 2 entry/16 byte case if HW turns out not to support
>>>> LPA2. In which case, surely the 2 entry table would be overlayed at the start
>>>> (low address) of the allocated 32 entry table, and therefore its alignment is
>>>> 256 bytes, which meets the HW's 64 byte alignment requirement?
>>>>
>>>
>>> No, it's at the end, that is the point. I am specifically referring to
>>> TTBR1 upper region page tables here.
>>>
>>> Please refer to the existing ttbr1_offset asm macro, which implements
>>> this today for 64k pages + LVA. In this case, however, the condensed
>>> table covers 6 bits of translation so it is naturally aligned to the
>>> TTBR minimum alignment.
>>
>> Afraid I don't see any such ttbr1_offset macro, either in upstream or the branch
>> you posted. The best I can find is TTBR1_OFFSET in arm arch, which I'm guessing
>> isn't it. I'm keen to understand this better if you can point me to the right
>> location?
>>
> 
> Apologies, I got the name wrong
> 
> /*
>  * Offset ttbr1 to allow for 48-bit kernel VAs set with 52-bit PTRS_PER_PGD.
>  * orr is used as it can cover the immediate value (and is idempotent).
>  * In future this may be nop'ed out when dealing with 52-bit kernel VAs.
>  *      ttbr: Value of ttbr to set, modified.
>  */
>         .macro  offset_ttbr1, ttbr, tmp
> #ifdef CONFIG_ARM64_VA_BITS_52
>         mrs_s   \tmp, SYS_ID_AA64MMFR2_EL1
>         and     \tmp, \tmp, #(0xf << ID_AA64MMFR2_EL1_VARange_SHIFT)
>         cbnz    \tmp, .Lskipoffs_\@
>         orr     \ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
> .Lskipoffs_\@ :
> #endif
>         .endm
> 
>> Regardless, the Arm ARM states this for TTBR1_EL1.BADDR:
>>
>> """
>> BADDR[47:1], bits [47:1]
>>
>> Translation table base address:
>> • Bits A[47:x] of the stage 1 translation table base address bits are in
>> register bits[47:x].
>> • Bits A[(x-1):0] of the stage 1 translation table base address are zero.
>>
>> Address bit x is the minimum address bit required to align the translation table
>> to the size of the table. The smallest permitted value of x is 6. The AArch64
>> Virtual Memory System Architecture chapter describes how x is calculated based
>> on the value of TCR_EL1.T1SZ, the translation stage, and the translation granule
>> size.
>>
>> Note
>> A translation table is required to be aligned to the size of the table. If a
>> table contains fewer than eight entries, it must be aligned on a 64 byte address
>> boundary.
>> """
>>
>> I don't see how that is referring to the alignment of the *end* of the table?
>>
> 
> It refers to the address poked into the register
> 
> When you create a 256 byte aligned 32 entry 52-bit level 0 table for
> 16k pages, entry #0 covers the start of the 52-bit addressable VA
> space, and entry #30 covers the start of the 48-bit addressable VA
> space.
> 
> When LPA2 is not supported, the walk must start at entry #30 so that
> is where TTBR1_EL1 should point, but doing so violates the
> architecture's alignment requirement.
> 
> So what we might do is double the size of the table, and clone entries
> #30 and #31 to positions #32 and #33, for instance (and remember to
> keep them in sync, which is not that hard)

OK I get it now - thanks for explaining. I think the key part that I didn't
appreciate is that the table is always allocated and populated as if we have 52
VA bits even if we only have 48, then you just fix up TTBR1 to point part way
down the table if we only have 48 bits.




* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-25 14:12             ` Ryan Roberts
@ 2022-11-25 14:19               ` Ard Biesheuvel
  0 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-25 14:19 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On Fri, 25 Nov 2022 at 15:13, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 25/11/2022 10:36, Ard Biesheuvel wrote:
> > On Fri, 25 Nov 2022 at 11:07, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 25/11/2022 09:35, Ard Biesheuvel wrote:
> >>> On Fri, 25 Nov 2022 at 10:23, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>>>
> >>>> On 24/11/2022 17:14, Ard Biesheuvel wrote:
> >>>>> On Thu, 24 Nov 2022 at 15:39, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>>>>>
> >>>>>> Hi Ard,
> >>>>>>
> >>>>>> Thanks for including me on this. I'll plan to do a review over the next week or
> >>>>>> so, but in the meantime, I have a couple of general questions/comments:
> >>>>>>
> >>>>>> On 24/11/2022 12:39, Ard Biesheuvel wrote:
> >>>>>>> Enable support for LPA2 when running with 4k or 16k pages. In the former
> >>>>>>> case, this requires 5 level paging with a runtime fallback to 4 on
> >>>>>>> non-LPA2 hardware. For consistency, the same approach is adopted for 16k
> >>>>>>> pages, where we fall back to 3 level paging (47 bit virtual addressing)
> >>>>>>> on non-LPA2 configurations.
> >>>>>>
> >>>>>> It seems odd to me that if you have a non-LPA2 system, if you run a kernel that
> >>>>>> is compiled for 16KB pages and 48 VA bits, then you will get 48 VA bits. But if
> >>>>>> you run a kernel that is compiled for 16KB pages and 52 VA bits then you will
> >>>>>> get 47 VA bits? Wouldn't that pose a potential user space compat issue?
> >>>>>>
> >>>>>
> >>>>> Well, given that Android happily runs with 39-bit VAs to avoid 4 level
> >>>>> paging at all cost, I don't think that is a universal concern.
> >>>>
> >>>> Well presumably the Android kernel is always explicitly compiled for 39 VA bits
> >>>> so that's what user space is used to? I was really just making the point that if
> >>>> you have (the admittedly exotic and unlikely) case of having a 16KB kernel
> >>>> previously compiled for 48 VA bits, and you "upgrade" it to 52 VA bits now that
> >>>> the option is available, on HW without LPA2, this will actually be observed as a
> >>>> "downgrade" to 47 bits. If you previously wanted to limit to 3 levels of lookup
> >>>> with 16KB you would already have been compiling for 47 VA bits.
> >>>>
> >>>
> >>> I am not debating that. I'm just saying that, without any hardware in
> >>> existence, it is difficult to predict which of these concerns is going
> >>> to dominate, and so I opted for the least messy and most symmetrical
> >>> approach.
> >>
> >> OK fair enough. My opinion is logged ;-).
> >>
> >>>
> >>>>>
> >>>>> The benefit of this approach is that you can decide at runtime whether
> >>>>> you want to take the performance hit of 4 (or 5) level paging to get
> >>>>> access to the extended VA space.
> >>>>>
> >>>>>>> (Falling back to 48 bits would involve
> >>>>>>> finding a workaround for the fact that we cannot construct a level 0
> >>>>>>> table covering 52 bits of VA space that appears aligned to its size in
> >>>>>>> memory, and has the top 2 entries that represent the 48-bit region
> >>>>>>> appearing at an alignment of 64 bytes, which is required by the
> >>>>>>> architecture for TTBR address values.
> >>>>>>
> >>>>>> I'm not sure I've understood this. The level 0 table would need 32 entries for
> >>>>>> 52 VA bits so the table size is 256 bytes, naturally aligned to 256 bytes. 64 is
> >>>>>> a factor of 256 so surely the top 2 entries are guaranteed to also meet the
> >>>>>> constraint for the fallback path too?
> >>>>>>
> >>>>>
> >>>>> The top 2 entries are 16 bytes combined, and end on a 256 byte aligned
> >>>>> boundary so I don't see how they can start on a 64 byte aligned
> >>>>> boundary at the same time.
> >>>>
> >>>> I'm still not following; why would the 2 entry/16 byte table *end* on a 256 byte
> >>>> boundary? I guess I should go and read your patch before making assumptions, but
> >>>> my assumption from your description here was that you were optimistically
> >>>> allocating a 32 entry/256 byte table for the 52 VA bit case, then needing to
> >>>> reuse that table for the 2 entry/16 byte case if HW turns out not to support
> >>>> LPA2. In which case, surely the 2 entry table would be overlayed at the start
> >>>> (low address) of the allocated 32 entry table, and therefore its alignment is
> >>>> 256 bytes, which meets the HW's 64 byte alignment requirement?
> >>>>
> >>>
> >>> No, it's at the end, that is the point. I am specifically referring to
> >>> TTBR1 upper region page tables here.
> >>>
> >>> Please refer to the existing ttbr1_offset asm macro, which implements
> >>> this today for 64k pages + LVA. In this case, however, the condensed
> >>> table covers 6 bits of translation so it is naturally aligned to the
> >>> TTBR minimum alignment.
> >>
> >> Afraid I don't see any such ttbr1_offset macro, either in upstream or the branch
> >> you posted. The best I can find is TTBR1_OFFSET in arm arch, which I'm guessing
> >> isn't it. I'm keen to understand this better if you can point me to the right
> >> location?
> >>
> >
> > Apologies, I got the name wrong
> >
> > /*
> >  * Offset ttbr1 to allow for 48-bit kernel VAs set with 52-bit PTRS_PER_PGD.
> >  * orr is used as it can cover the immediate value (and is idempotent).
> >  * In future this may be nop'ed out when dealing with 52-bit kernel VAs.
> >  *      ttbr: Value of ttbr to set, modified.
> >  */
> >         .macro  offset_ttbr1, ttbr, tmp
> > #ifdef CONFIG_ARM64_VA_BITS_52
> >         mrs_s   \tmp, SYS_ID_AA64MMFR2_EL1
> >         and     \tmp, \tmp, #(0xf << ID_AA64MMFR2_EL1_VARange_SHIFT)
> >         cbnz    \tmp, .Lskipoffs_\@
> >         orr     \ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
> > .Lskipoffs_\@ :
> > #endif
> >         .endm
> >
> >> Regardless, the Arm ARM states this for TTBR1_EL1.BADDR:
> >>
> >> """
> >> BADDR[47:1], bits [47:1]
> >>
> >> Translation table base address:
> >> • Bits A[47:x] of the stage 1 translation table base address bits are in
> >> register bits[47:x].
> >> • Bits A[(x-1):0] of the stage 1 translation table base address are zero.
> >>
> >> Address bit x is the minimum address bit required to align the translation table
> >> to the size of the table. The smallest permitted value of x is 6. The AArch64
> >> Virtual Memory System Architecture chapter describes how x is calculated based
> >> on the value of TCR_EL1.T1SZ, the translation stage, and the translation granule
> >> size.
> >>
> >> Note
> >> A translation table is required to be aligned to the size of the table. If a
> >> table contains fewer than eight entries, it must be aligned on a 64 byte address
> >> boundary.
> >> """
> >>
> >> I don't see how that is referring to the alignment of the *end* of the table?
> >>
> >
> > It refers to the address poked into the register
> >
> > When you create a 256 byte aligned 32 entry 52-bit level 0 table for
> > 16k pages, entry #0 covers the start of the 52-bit addressable VA
> > space, and entry #30 covers the start of the 48-bit addressable VA
> > space.
> >
> > When LPA2 is not supported, the walk must start at entry #30 so that
> > is where TTBR1_EL1 should point, but doing so violates the
> > architecture's alignment requirement.
> >
> > So what we might do is double the size of the table, and clone entries
> > #30 and #31 to positions #32 and #33, for instance (and remember to
> > keep them in sync, which is not that hard)
>
> OK I get it now - thanks for explaining. I think the key part that I didn't
> appreciate is that the table is always allocated and populated as if we have 52
> VA bits even if we only have 48, then you just fix up TTBR1 to point part way
> down the table if we only have 48 bits.
>

Exactly. The entire address space is represented in software as if we
support 52-bits of virtual addressing, and we simply don't map
anything outside of the 48-bit addressable area if the hardware does
not support it. That way, handling the LVA/LPA2 feature remains an
arch specific implementation detail, and all the generic software page
table walking code has to be none the wiser. And for user space page
tables and the ID map, we can simply use the same root table address -
only kernel page tables need the additional hack here.


* Re: [PATCH v2 14/19] arm64: Enable LPA2 at boot if supported by the system
  2022-11-24 12:39 ` [PATCH v2 14/19] arm64: Enable LPA2 at boot if supported by the system Ard Biesheuvel
@ 2022-11-28 14:54   ` Ryan Roberts
  0 siblings, 0 replies; 45+ messages in thread
From: Ryan Roberts @ 2022-11-28 14:54 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: Marc Zyngier, Will Deacon, Mark Rutland, Kees Cook,
	Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On 24/11/2022 12:39, Ard Biesheuvel wrote:
> Update the early kernel mapping code to take 52-bit virtual addressing
> into account based on the LPA2 feature. This is a bit more involved than
> LVA (which is supported with 64k pages only), given that some page table
> descriptor bits change meaning in this case.
> 
> To keep the handling in asm to a minimum, the initial ID map is still
> created with 48-bit virtual addressing, which implies that the kernel
> image must be loaded into 48-bit addressable physical memory. This is
> currently required by the boot protocol, even though we happen to
> support placement outside of that for LVA/64k based configurations.
> 
> Enabling LPA2 involves more than setting TCR.T1SZ to a lower value,
> there is also a DS bit in TCR that needs to be set, and which changes
> the meaning of bits [9:8] in all page table descriptors. Since we cannot
> enable DS and every live page table descriptor at the same time, let's
> pivot through another temporary mapping. This avoids the need to
> reintroduce manipulations of the page tables with the MMU and caches
> disabled.
> 
> To permit the LPA2 feature to be overridden on the kernel command line,
> which may be necessary to work around silicon errata, or to deal with
> mismatched features on heterogeneous SoC designs, test for CPU feature
> overrides first, and only then enable LPA2.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/include/asm/assembler.h      |   7 +-
>  arch/arm64/include/asm/kernel-pgtable.h |  25 +++--
>  arch/arm64/include/asm/memory.h         |   4 +
>  arch/arm64/kernel/head.S                |   9 +-
>  arch/arm64/kernel/image-vars.h          |   2 +
>  arch/arm64/kernel/pi/map_kernel.c       | 103 +++++++++++++++++++-
>  arch/arm64/mm/init.c                    |   2 +-
>  arch/arm64/mm/mmu.c                     |   8 +-
>  arch/arm64/mm/proc.S                    |   4 +
>  9 files changed, 151 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index 786bf62826a8..30eee6473cf0 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -609,11 +609,16 @@ alternative_endif
>   * but we have to add an offset so that the TTBR1 address corresponds with the
>   * pgdir entry that covers the lowest 48-bit addressable VA.
>   *
> + * Note that this trick only works for LVA/64k pages - LPA2/4k pages uses an
> + * additional paging level, and on LPA2/16k pages, we would end up with a TTBR
> + * address that is not 64 byte aligned, so there we reduce the number of paging
> + * levels for the non-LPA2 case.
> + *
>   * orr is used as it can cover the immediate value (and is idempotent).
>   * 	ttbr: Value of ttbr to set, modified.
>   */
>  	.macro	offset_ttbr1, ttbr, tmp
> -#ifdef CONFIG_ARM64_VA_BITS_52
> +#if defined(CONFIG_ARM64_VA_BITS_52) && !defined(CONFIG_ARM64_LPA2)
>  	mrs	\tmp, tcr_el1
>  	and	\tmp, \tmp, #TCR_T1SZ_MASK
>  	cmp	\tmp, #TCR_T1SZ(VA_BITS_MIN)
> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
> index faa11e8b4a0e..2359b2af0c4c 100644
> --- a/arch/arm64/include/asm/kernel-pgtable.h
> +++ b/arch/arm64/include/asm/kernel-pgtable.h
> @@ -20,12 +20,16 @@
>   */
>  #ifdef CONFIG_ARM64_4K_PAGES
>  #define INIT_IDMAP_USES_PMD_MAPS	1
> -#define INIT_IDMAP_TABLE_LEVELS		(CONFIG_PGTABLE_LEVELS - 1)
>  #else
>  #define INIT_IDMAP_USES_PMD_MAPS	0
> -#define INIT_IDMAP_TABLE_LEVELS		(CONFIG_PGTABLE_LEVELS)
>  #endif
>  
> +/* how many levels of translation are required to cover 'x' bits of VA space */
> +#define VA_LEVELS(x)		(((x) - 4) / (PAGE_SHIFT - 3))
> +#define INIT_IDMAP_TABLE_LEVELS	(VA_LEVELS(VA_BITS_MIN) - INIT_IDMAP_USES_PMD_MAPS)
> +
> +#define INIT_IDMAP_ROOT_SHIFT	(VA_LEVELS(VA_BITS_MIN) * (PAGE_SHIFT - 3) + 3)
> +
>  /*
>   * If KASLR is enabled, then an offset K is added to the kernel address
>   * space. The bottom 21 bits of this offset are zero to guarantee 2MB
> @@ -52,7 +56,14 @@
>  #define EARLY_ENTRIES(vstart, vend, shift, add) \
>  	((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1 + add)
>  
> -#define EARLY_PGDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, PGDIR_SHIFT, add))
> +#if CONFIG_PGTABLE_LEVELS > 4
> +/* the kernel is covered entirely by the pgd_t at the top of the VA space */
> +#define EARLY_PGDS	1
> +#else
> +#define EARLY_PGDS	0
> +#endif
> +
> +#define EARLY_P4DS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, INIT_IDMAP_ROOT_SHIFT, add))
>  
>  #if INIT_IDMAP_TABLE_LEVELS > 3
>  #define EARLY_PUDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, PUD_SHIFT, add))
> @@ -66,11 +77,13 @@
>  #define EARLY_PMDS(vstart, vend, add) (0)
>  #endif
>  
> -#define EARLY_PAGES(vstart, vend, add) ( 1 			/* PGDIR page */				\
> -			+ EARLY_PGDS((vstart), (vend), add) 	/* each PGDIR needs a next level page table */	\
> +#define EARLY_PAGES(vstart, vend, add) ( 1 			/* PGDIR/P4D page */				\
> +			+ EARLY_P4DS((vstart), (vend), add) 	/* each P4D needs a next level page table */	\
>  			+ EARLY_PUDS((vstart), (vend), add)	/* each PUD needs a next level page table */	\
>  			+ EARLY_PMDS((vstart), (vend), add))	/* each PMD needs a next level page table */
> -#define INIT_DIR_SIZE (PAGE_SIZE * (EARLY_PAGES(KIMAGE_VADDR, _end, EARLY_KASLR) + EARLY_SEGMENT_EXTRA_PAGES))
> +
> +#define INIT_DIR_SIZE	(PAGE_SIZE * (EARLY_PAGES(KIMAGE_VADDR, _end, EARLY_KASLR) + \
> +			 EARLY_SEGMENT_EXTRA_PAGES + EARLY_PGDS))
>  
>  /* the initial ID map may need two extra pages if it needs to be extended */
>  #if VA_BITS_MIN < 48
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index b3826ff2e52b..4f617e271008 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -54,7 +54,11 @@
>  #define FIXADDR_TOP		(VMEMMAP_START - SZ_32M)
>  
>  #if VA_BITS > 48
> +#ifdef CONFIG_ARM64_16K_PAGES
> +#define VA_BITS_MIN		(47)
> +#else
>  #define VA_BITS_MIN		(48)
> +#endif
>  #else
>  #define VA_BITS_MIN		(VA_BITS)
>  #endif
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 94de42dfe97d..6be121949c06 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -198,7 +198,7 @@ SYM_CODE_END(preserve_boot_args)
>  	mov \tbl, \sv
>  	.endif
>  .L_\@:
> -	compute_indices \vstart, \vend, #PGDIR_SHIFT, \istart, \iend, \count
> +	compute_indices \vstart, \vend, #INIT_IDMAP_ROOT_SHIFT, \istart, \iend, \count
>  	mov \sv, \rtbl
>  	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
>  	mov \tbl, \sv
> @@ -610,9 +610,16 @@ SYM_FUNC_START(__cpu_secondary_check52bitva)
>  alternative_if_not ARM64_HAS_LVA
>  	ret
>  alternative_else_nop_endif
> +#ifndef CONFIG_ARM64_LPA2
>  	mrs_s	x0, SYS_ID_AA64MMFR2_EL1
>  	and	x0, x0, #(0xf << ID_AA64MMFR2_EL1_VARange_SHIFT)
>  	cbnz	x0, 2f
> +#else
> +	mrs	x0, id_aa64mmfr0_el1
> +	sbfx	x0, x0, #ID_AA64MMFR0_EL1_TGRAN_SHIFT, 4
> +	cmp	x0, #ID_AA64MMFR0_EL1_TGRAN_LPA2
> +	b.ge	2f
> +#endif
>  
>  	update_early_cpu_boot_status \
>  		CPU_STUCK_IN_KERNEL | CPU_STUCK_REASON_52_BIT_VA, x0, x1
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index 82bafa1f869c..f48b6f09d278 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -56,6 +56,8 @@ PROVIDE(__pi_arm64_sw_feature_override	= arm64_sw_feature_override);
>  PROVIDE(__pi_arm64_use_ng_mappings	= arm64_use_ng_mappings);
>  PROVIDE(__pi__ctype			= _ctype);
>  
> +PROVIDE(__pi_init_idmap_pg_dir		= init_idmap_pg_dir);
> +PROVIDE(__pi_init_idmap_pg_end		= init_idmap_pg_end);
>  PROVIDE(__pi_init_pg_dir		= init_pg_dir);
>  PROVIDE(__pi_init_pg_end		= init_pg_end);
>  PROVIDE(__pi_swapper_pg_dir		= swapper_pg_dir);
> diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
> index a9472ab8d901..75d643da56c8 100644
> --- a/arch/arm64/kernel/pi/map_kernel.c
> +++ b/arch/arm64/kernel/pi/map_kernel.c
> @@ -133,6 +133,20 @@ static bool __init arm64_early_this_cpu_has_lva(void)
>  						    ID_AA64MMFR2_EL1_VARange_SHIFT);
>  }
>  
> +static bool __init arm64_early_this_cpu_has_lpa2(void)
> +{
> +	u64 mmfr0;
> +	int feat;
> +
> +	mmfr0 = read_sysreg(id_aa64mmfr0_el1);
> +	mmfr0 &= ~id_aa64mmfr0_override.mask;
> +	mmfr0 |= id_aa64mmfr0_override.val;
> +	feat = cpuid_feature_extract_signed_field(mmfr0,
> +						  ID_AA64MMFR0_EL1_TGRAN_SHIFT);
> +
> +	return feat >= ID_AA64MMFR0_EL1_TGRAN_LPA2;
> +}

This fails to compile when configured for 64KB pages, since
ID_AA64MMFR0_EL1_TGRAN_LPA2 is only defined for 4KB and 16KB granules (see
sysreg.h).

Suggest:

static bool __init arm64_early_this_cpu_has_lpa2(void)
{
#ifdef ID_AA64MMFR0_EL1_TGRAN_LPA2
	u64 mmfr0;
	int feat;

	mmfr0 = read_sysreg(id_aa64mmfr0_el1);
	mmfr0 &= ~id_aa64mmfr0_override.mask;
	mmfr0 |= id_aa64mmfr0_override.val;
	feat = cpuid_feature_extract_signed_field(mmfr0,
						  ID_AA64MMFR0_EL1_TGRAN_SHIFT);

	return feat >= ID_AA64MMFR0_EL1_TGRAN_LPA2;
#else
	return false;
#endif
}


> +
>  static bool __init arm64_early_this_cpu_has_pac(void)
>  {
>  	u64 isar1, isar2;
> @@ -254,11 +268,85 @@ static void __init map_kernel(u64 kaslr_offset, u64 va_offset, int root_level)
>  	}
>  
>  	/* Copy the root page table to its final location */
> -	memcpy((void *)swapper_pg_dir + va_offset, init_pg_dir, PGD_SIZE);
> +	memcpy((void *)swapper_pg_dir + va_offset, init_pg_dir, PAGE_SIZE);
>  	dsb(ishst);
>  	idmap_cpu_replace_ttbr1(swapper_pg_dir);
>  }
>  
> +static void noinline __section(".idmap.text") set_ttbr0_for_lpa2(u64 ttbr)
> +{
> +	u64 sctlr = read_sysreg(sctlr_el1);
> +	u64 tcr = read_sysreg(tcr_el1) | TCR_DS;
> +
> +	/* Update TCR.T0SZ in case we entered with a 47-bit ID map */
> +	tcr &= ~TCR_T0SZ_MASK;
> +	tcr |= TCR_T0SZ(48);
> +
> +	asm("	msr	sctlr_el1, %0		;"
> +	    "	isb				;"
> +	    "   msr     ttbr0_el1, %1		;"
> +	    "   msr     tcr_el1, %2		;"
> +	    "	isb				;"
> +	    "	tlbi    vmalle1			;"
> +	    "	dsb     nsh			;"
> +	    "	isb				;"
> +	    "	msr     sctlr_el1, %3		;"
> +	    "	isb				;"
> +	    ::	"r"(sctlr & ~SCTLR_ELx_M), "r"(ttbr), "r"(tcr), "r"(sctlr));
> +}
> +
> +static void remap_idmap_for_lpa2(void)
> +{
> +	extern pgd_t init_idmap_pg_dir[], init_idmap_pg_end[];
> +	pgd_t *pgdp = (void *)init_pg_dir + PAGE_SIZE;
> +	pgprot_t text_prot = PAGE_KERNEL_ROX;
> +	pgprot_t data_prot = PAGE_KERNEL;
> +
> +	/* clear the bits that change meaning once LPA2 is turned on */
> +	pgprot_val(text_prot) &= ~PTE_SHARED;
> +	pgprot_val(data_prot) &= ~PTE_SHARED;
> +
> +	/*
> +	 * We have to clear bits [9:8] in all block or page descriptors in the
> +	 * initial ID map, as otherwise they will be (mis)interpreted as
> +	 * physical address bits once we flick the LPA2 switch (TCR.DS). Since
> +	 * we cannot manipulate live descriptors in that way without creating
> +	 * potential TLB conflicts, let's create another temporary ID map in a
> +	 * LPA2 compatible fashion, and update the initial ID map while running
> +	 * from that.
> +	 */
> +	map_segment(init_pg_dir, &pgdp, 0, _stext, __inittext_end, text_prot,
> +		    false, 0);
> +	map_segment(init_pg_dir, &pgdp, 0, __initdata_begin, _end, data_prot,
> +		    false, 0);
> +	dsb(ishst);
> +	set_ttbr0_for_lpa2((u64)init_pg_dir);
> +
> +	/*
> +	 * Recreate the initial ID map with the same granularity as before.
> +	 * Don't bother with the FDT, we no longer need it after this.
> +	 */
> +	memset(init_idmap_pg_dir, 0,
> +	       (u64)init_idmap_pg_dir - (u64)init_idmap_pg_end);
> +
> +	pgdp = (void *)init_idmap_pg_dir + PAGE_SIZE;
> +	map_segment(init_idmap_pg_dir, &pgdp, 0,
> +		    PTR_ALIGN_DOWN(&_stext[0], INIT_IDMAP_BLOCK_SIZE),
> +		    PTR_ALIGN_DOWN(&__bss_start[0], INIT_IDMAP_BLOCK_SIZE),
> +		    text_prot, false, 0);
> +	map_segment(init_idmap_pg_dir, &pgdp, 0,
> +		    PTR_ALIGN_DOWN(&__bss_start[0], INIT_IDMAP_BLOCK_SIZE),
> +		    PTR_ALIGN(&_end[0], INIT_IDMAP_BLOCK_SIZE),
> +		    data_prot, false, 0);
> +	dsb(ishst);
> +
> +	/* switch back to the updated initial ID map */
> +	set_ttbr0_for_lpa2((u64)init_idmap_pg_dir);
> +
> +	/* wipe the temporary ID map from memory */
> +	memset(init_pg_dir, 0, (u64)init_pg_end - (u64)init_pg_dir);
> +}
> +
>  asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
>  {
>  	static char const chosen_str[] __initconst = "/chosen";
> @@ -266,6 +354,7 @@ asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
>  	u64 va_base, pa_base = (u64)&_text;
>  	u64 kaslr_offset = pa_base % MIN_KIMG_ALIGN;
>  	int root_level = 4 - CONFIG_PGTABLE_LEVELS;
> +	bool va52 = (VA_BITS == 52);
>  
>  	/* Clear BSS and the initial page tables */
>  	memset(__bss_start, 0, (u64)init_pg_end - (u64)__bss_start);
> @@ -295,7 +384,17 @@ asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
>  			arm64_use_ng_mappings = true;
>  	}
>  
> -	if (VA_BITS == 52 && arm64_early_this_cpu_has_lva())
> +	if (IS_ENABLED(CONFIG_ARM64_LPA2)) {
> +		if (arm64_early_this_cpu_has_lpa2()) {
> +			remap_idmap_for_lpa2();
> +		} else {
> +			va52 = false;
> +			root_level++;
> +		}
> +	} else if (IS_ENABLED(CONFIG_ARM64_64K_PAGES)) {
> +		va52 &= arm64_early_this_cpu_has_lva();
> +	}
> +	if (va52)
>  		sysreg_clear_set(tcr_el1, TCR_T1SZ_MASK, TCR_T1SZ(VA_BITS));
>  
>  	va_base = KIMAGE_VADDR + kaslr_offset;
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 4b4651ee47f2..498d327341b4 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -315,7 +315,7 @@ void __init arm64_memblock_init(void)
>  	 * physical address of PAGE_OFFSET, we have to *subtract* from it.
>  	 */
>  	if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
> -		memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);
> +		memstart_addr -= _PAGE_OFFSET(vabits_actual) - _PAGE_OFFSET(52);
>  
>  	/*
>  	 * Apply the memory limit if it was set. Since the kernel may be loaded
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index d089bc78e592..ba5423ff7039 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -541,8 +541,12 @@ static void __init map_mem(pgd_t *pgdp)
>  	 * entries at any level are being shared between the linear region and
>  	 * the vmalloc region. Check whether this is true for the PGD level, in
>  	 * which case it is guaranteed to be true for all other levels as well.
> +	 * (Unless we are running with support for LPA2, in which case the
> +	 * entire reduced VA space is covered by a single pgd_t which will have
> +	 * been populated without the PXNTable attribute by the time we get here.)
>  	 */
> -	BUILD_BUG_ON(pgd_index(direct_map_end - 1) == pgd_index(direct_map_end));
> +	BUILD_BUG_ON(pgd_index(direct_map_end - 1) == pgd_index(direct_map_end) &&
> +		     pgd_index(_PAGE_OFFSET(VA_BITS_MIN)) != PTRS_PER_PGD - 1);
>  
>  	if (can_set_direct_map())
>  		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> @@ -726,7 +730,7 @@ static void __init create_idmap(void)
>  
>  void __init paging_init(void)
>  {
> -	idmap_t0sz = 63UL - __fls(__pa_symbol(_end) | GENMASK(VA_BITS_MIN - 1, 0));
> +	idmap_t0sz = 63UL - __fls(__pa_symbol(_end) | GENMASK(vabits_actual - 1, 0));
>  
>  	map_mem(swapper_pg_dir);
>  
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 179e213bbe2d..d95df732b672 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -489,7 +489,11 @@ SYM_FUNC_START(__cpu_setup)
>  #if VA_BITS > VA_BITS_MIN
>  	mov		x9, #64 - VA_BITS
>  alternative_if ARM64_HAS_LVA
> +	tcr_set_t0sz	tcr, x9
>  	tcr_set_t1sz	tcr, x9
> +#ifdef CONFIG_ARM64_LPA2
> +	orr		tcr, tcr, #TCR_DS
> +#endif
>  alternative_else_nop_endif
>  #endif
>  



* Re: [PATCH v2 07/19] arm64: mm: Handle LVA support as a CPU feature
  2022-11-24 12:39 ` [PATCH v2 07/19] arm64: mm: Handle LVA support as a CPU feature Ard Biesheuvel
@ 2022-11-28 14:54   ` Ryan Roberts
  0 siblings, 0 replies; 45+ messages in thread
From: Ryan Roberts @ 2022-11-28 14:54 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: Marc Zyngier, Will Deacon, Mark Rutland, Kees Cook,
	Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

Hi Ard,

On 24/11/2022 12:39, Ard Biesheuvel wrote:
> Currently, we detect CPU support for 52-bit virtual addressing (LVA)
> extremely early, before creating the kernel page tables or enabling the
> MMU. We cannot override the feature this early, and so large virtual
> addressing is always enabled on CPUs that implement support for it if
> the software support for it was enabled at build time. It also means we
> rely on non-trivial code in asm to deal with this feature.
> 
> Given that both the ID map and the TTBR1 mapping of the kernel image are
> guaranteed to be 48-bit addressable, it is not actually necessary to
> enable support this early, and instead, we can model it as a CPU
> feature. That way, we can rely on code patching to get the correct
> TCR.T1SZ values programmed on secondary boot and suspend from resume.

nit: I think you mean "resume from suspend"?

> 
> On the primary boot path, we simply enable the MMU with 48-bit virtual
> addressing initially, and update TCR.T1SZ if LVA is supported from C
> code, right before creating the kernel mapping. Given that TTBR1 still
> points to reserved_pg_dir at this point, updating TCR.T1SZ should be
> safe without the need for explicit TLB maintenance.
> 
> Since this gets rid of all accesses to the vabits_actual variable from
> asm code that occurred before TCR.T1SZ had been programmed, we no longer
> have a need for this variable, and we can replace it with a C expression
> that produces the correct value directly, based on the value of TCR.T1SZ.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/include/asm/memory.h   | 13 ++++++++++-
>  arch/arm64/kernel/cpufeature.c    | 13 +++++++++++
>  arch/arm64/kernel/head.S          | 24 +++-----------------
>  arch/arm64/kernel/pi/map_kernel.c | 12 ++++++++++
>  arch/arm64/kernel/sleep.S         |  3 ---
>  arch/arm64/mm/mmu.c               |  5 ----
>  arch/arm64/mm/proc.S              | 17 +++++++-------
>  arch/arm64/tools/cpucaps          |  1 +
>  8 files changed, 49 insertions(+), 39 deletions(-)
> 
> [...]
Thanks,
Ryan


* Re: [PATCH v2 12/19] arm64: mm: Add definitions to support 5 levels of paging
  2022-11-24 12:39 ` [PATCH v2 12/19] arm64: mm: Add definitions to support 5 levels of paging Ard Biesheuvel
@ 2022-11-28 16:17   ` Ryan Roberts
  2022-11-28 16:22     ` Ard Biesheuvel
  0 siblings, 1 reply; 45+ messages in thread
From: Ryan Roberts @ 2022-11-28 16:17 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: Marc Zyngier, Will Deacon, Mark Rutland, Kees Cook,
	Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On 24/11/2022 12:39, Ard Biesheuvel wrote:
> Add the required types and descriptor accessors to support 5 levels of
> paging in the common code. This is one of the prerequisites for
> supporting 52-bit virtual addressing with 4k pages.
> 
> Note that this does not cover the code that handles kernel mappings or
> the fixmap.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/include/asm/pgalloc.h       | 41 +++++++++++
>  arch/arm64/include/asm/pgtable-hwdef.h | 22 +++++-
>  arch/arm64/include/asm/pgtable-types.h |  6 ++
>  arch/arm64/include/asm/pgtable.h       | 75 +++++++++++++++++++-
>  arch/arm64/mm/mmu.c                    | 31 +++++++-
>  arch/arm64/mm/pgd.c                    | 15 +++-
>  6 files changed, 181 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index 237224484d0f..cae8c648f462 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -60,6 +60,47 @@ static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
>  }
>  #endif	/* CONFIG_PGTABLE_LEVELS > 3 */
>  
> +#if CONFIG_PGTABLE_LEVELS > 4
> +
> +static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
> +{
> +	if (pgtable_l5_enabled())
> +		set_pgd(pgdp, __pgd(__phys_to_pgd_val(p4dp) | prot));
> +}
> +
> +static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
> +{
> +	pgdval_t pgdval = PGD_TYPE_TABLE;
> +
> +	pgdval |= (mm == &init_mm) ? PGD_TABLE_UXN : PGD_TABLE_PXN;
> +	__pgd_populate(pgdp, __pa(p4dp), pgdval);
> +}
> +
> +static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
> +{
> +	gfp_t gfp = GFP_PGTABLE_USER;
> +
> +	if (mm == &init_mm)
> +		gfp = GFP_PGTABLE_KERNEL;
> +	return (p4d_t *)get_zeroed_page(gfp);
> +}
> +
> +static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
> +{
> +	if (!pgtable_l5_enabled())
> +		return;
> +	BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
> +	free_page((unsigned long)p4d);
> +}
> +
> +#define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)
> +#else
> +static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
> +{
> +	BUILD_BUG();
> +}
> +#endif	/* CONFIG_PGTABLE_LEVELS > 4 */
> +
>  extern pgd_t *pgd_alloc(struct mm_struct *mm);
>  extern void pgd_free(struct mm_struct *mm, pgd_t *pgdp);
>  
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> index b91fe4781b06..b364b02e696b 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -26,10 +26,10 @@
>  #define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 3))
>  
>  /*
> - * Size mapped by an entry at level n ( 0 <= n <= 3)
> + * Size mapped by an entry at level n ( -1 <= n <= 3)
>   * We map (PAGE_SHIFT - 3) at all translation levels and PAGE_SHIFT bits
>   * in the final page. The maximum number of translation levels supported by
> - * the architecture is 4. Hence, starting at level n, we have further
> + * the architecture is 5. Hence, starting at level n, we have further
>   * ((4 - n) - 1) levels of translation excluding the offset within the page.
>   * So, the total number of bits mapped by an entry at level n is :
>   *

Is it necessary to represent the levels as (-1 - 3) in the kernel or are you
open to switching to (0 - 4)?

There are a couple of other places where translation level is used, which I
found and fixed up for the KVM LPA2 support work. It got a bit messy to
represent the levels using the architectural range (-1 - 3) so I ended up
representing them as (0 - 4). The main issue was that KVM represents level as
unsigned so that change would have looked quite big.
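
As a tiny illustration of why that is awkward (hypothetical helper, not
KVM code): with an unsigned level type there is no way to store the
architectural -1 directly, and stepping "up" from level 0 silently wraps.

#include <stdint.h>

static uint32_t parent_level(uint32_t level)
{
	return level - 1;	/* for level == 0 this wraps to 0xffffffff, not -1 */
}

Renumbering the levels as 0-4 sidesteps that without touching the type.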

Most of this is confined to KVM and the only place it really crosses over with
the kernel is at __tlbi_level(). Which makes me think you might be missing some
required changes (I didn't notice these in your other patches):

Looking at the TLB management stuff, I think there are some places you will need
to fix up to correctly handle the extra level in the kernel (e.g.
tlb_get_level(), flush_tlb_range()).

There are some new encodings for level in the FSC field in the ESR. You might
need to update the fault_info array in fault.c to represent these and correctly
handle user space faults for the new level?
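
(As an aside, to make the [-1;3] numbering concrete: assuming the level
shift works out as ((PAGE_SHIFT - 3) * (4 - n) + 3), which is how I read
the comment being modified above, a 4k kernel (PAGE_SHIFT == 12) gets:

	level -1: shift 48  ->  each entry maps 256 TB
	level  0: shift 39  ->  each entry maps 512 GB
	level  1: shift 30  ->  each entry maps   1 GB
	level  2: shift 21  ->  each entry maps   2 MB
	level  3: shift 12  ->  each entry maps   4 KB

so the new level -1 entries are what take 4k pages from 48-bit to
52-bit VA.)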


> [...]

Thanks,
Ryan



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 12/19] arm64: mm: Add definitions to support 5 levels of paging
  2022-11-28 16:17   ` Ryan Roberts
@ 2022-11-28 16:22     ` Ard Biesheuvel
  2022-11-28 18:00       ` Marc Zyngier
  2022-11-29 15:46       ` Ryan Roberts
  0 siblings, 2 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-28 16:22 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On Mon, 28 Nov 2022 at 17:17, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 24/11/2022 12:39, Ard Biesheuvel wrote:
> > Add the required types and descriptor accessors to support 5 levels of
> > paging in the common code. This is one of the prerequisites for
> > supporting 52-bit virtual addressing with 4k pages.
> >
> > Note that this does not cover the code that handles kernel mappings or
> > the fixmap.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm64/include/asm/pgalloc.h       | 41 +++++++++++
> >  arch/arm64/include/asm/pgtable-hwdef.h | 22 +++++-
> >  arch/arm64/include/asm/pgtable-types.h |  6 ++
> >  arch/arm64/include/asm/pgtable.h       | 75 +++++++++++++++++++-
> >  arch/arm64/mm/mmu.c                    | 31 +++++++-
> >  arch/arm64/mm/pgd.c                    | 15 +++-
> >  6 files changed, 181 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> > index 237224484d0f..cae8c648f462 100644
> > --- a/arch/arm64/include/asm/pgalloc.h
> > +++ b/arch/arm64/include/asm/pgalloc.h
> > @@ -60,6 +60,47 @@ static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
> >  }
> >  #endif       /* CONFIG_PGTABLE_LEVELS > 3 */
> >
> > +#if CONFIG_PGTABLE_LEVELS > 4
> > +
> > +static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
> > +{
> > +     if (pgtable_l5_enabled())
> > +             set_pgd(pgdp, __pgd(__phys_to_pgd_val(p4dp) | prot));
> > +}
> > +
> > +static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
> > +{
> > +     pgdval_t pgdval = PGD_TYPE_TABLE;
> > +
> > +     pgdval |= (mm == &init_mm) ? PGD_TABLE_UXN : PGD_TABLE_PXN;
> > +     __pgd_populate(pgdp, __pa(p4dp), pgdval);
> > +}
> > +
> > +static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
> > +{
> > +     gfp_t gfp = GFP_PGTABLE_USER;
> > +
> > +     if (mm == &init_mm)
> > +             gfp = GFP_PGTABLE_KERNEL;
> > +     return (p4d_t *)get_zeroed_page(gfp);
> > +}
> > +
> > +static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
> > +{
> > +     if (!pgtable_l5_enabled())
> > +             return;
> > +     BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
> > +     free_page((unsigned long)p4d);
> > +}
> > +
> > +#define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)
> > +#else
> > +static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
> > +{
> > +     BUILD_BUG();
> > +}
> > +#endif       /* CONFIG_PGTABLE_LEVELS > 4 */
> > +
> >  extern pgd_t *pgd_alloc(struct mm_struct *mm);
> >  extern void pgd_free(struct mm_struct *mm, pgd_t *pgdp);
> >
> > diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> > index b91fe4781b06..b364b02e696b 100644
> > --- a/arch/arm64/include/asm/pgtable-hwdef.h
> > +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> > @@ -26,10 +26,10 @@
> >  #define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 3))
> >
> >  /*
> > - * Size mapped by an entry at level n ( 0 <= n <= 3)
> > + * Size mapped by an entry at level n ( -1 <= n <= 3)
> >   * We map (PAGE_SHIFT - 3) at all translation levels and PAGE_SHIFT bits
> >   * in the final page. The maximum number of translation levels supported by
> > - * the architecture is 4. Hence, starting at level n, we have further
> > + * the architecture is 5. Hence, starting at level n, we have further
> >   * ((4 - n) - 1) levels of translation excluding the offset within the page.
> >   * So, the total number of bits mapped by an entry at level n is :
> >   *
>
> Is it necessary to represent the levels as (-1 - 3) in the kernel or are you
> open to switching to (0 - 4)?
>
> There are a couple of other places where translation level is used, which I
> found and fixed up for the KVM LPA2 support work. It got a bit messy to
> represent the levels using the architectural range (-1 - 3) so I ended up
> representing them as (0 - 4). The main issue was that KVM represents level as
> unsigned so that change would have looked quite big.
>
> Most of this is confined to KVM and the only place it really crosses over with
> the kernel is at __tlbi_level(). Which makes me think you might be missing some
> required changes (I didn't notice these in your other patches):
>
> Looking at the TLB management stuff, I think there are some places you will need
> to fix up to correctly handle the extra level in the kernel (e.g.
> tlb_get_level(), flush_tlb_range()).
>
> There are some new encodings for level in the FSC field in the ESR. You might
> need to update the fault_info array in fault.c to represent these and correctly
> handle user space faults for the new level?
>

Hi Ryan,

Thanks for pointing this out. Once I have educated myself a bit more
about all of this, I should be able to answer your questions :-)

I did not do any user space testing in anger on this series, on the
assumption that we already support 52-bit VAs, but I completely missed
the fact that the additional level of paging requires additional
attention.

As for the level indexing: I have a slight preference for sticking
with the architectural range, but I don't deeply care either way.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 12/19] arm64: mm: Add definitions to support 5 levels of paging
  2022-11-28 16:22     ` Ard Biesheuvel
@ 2022-11-28 18:00       ` Marc Zyngier
  2022-11-28 18:20         ` Ryan Roberts
  2022-11-29 15:46       ` Ryan Roberts
  1 sibling, 1 reply; 45+ messages in thread
From: Marc Zyngier @ 2022-11-28 18:00 UTC (permalink / raw)
  To: Ard Biesheuvel, Ryan Roberts
  Cc: linux-arm-kernel, Will Deacon, Mark Rutland, Kees Cook,
	Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On 2022-11-28 16:22, Ard Biesheuvel wrote:
> On Mon, 28 Nov 2022 at 17:17, Ryan Roberts <ryan.roberts@arm.com> 
> wrote:
>> 
>> On 24/11/2022 12:39, Ard Biesheuvel wrote:
>> > Add the required types and descriptor accessors to support 5 levels of
>> > paging in the common code. This is one of the prerequisites for
>> > supporting 52-bit virtual addressing with 4k pages.
>> >
>> > Note that this does not cover the code that handles kernel mappings or
>> > the fixmap.
>> >
>> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>> > ---
>> >  arch/arm64/include/asm/pgalloc.h       | 41 +++++++++++
>> >  arch/arm64/include/asm/pgtable-hwdef.h | 22 +++++-
>> >  arch/arm64/include/asm/pgtable-types.h |  6 ++
>> >  arch/arm64/include/asm/pgtable.h       | 75 +++++++++++++++++++-
>> >  arch/arm64/mm/mmu.c                    | 31 +++++++-
>> >  arch/arm64/mm/pgd.c                    | 15 +++-
>> >  6 files changed, 181 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
>> > index 237224484d0f..cae8c648f462 100644
>> > --- a/arch/arm64/include/asm/pgalloc.h
>> > +++ b/arch/arm64/include/asm/pgalloc.h
>> > @@ -60,6 +60,47 @@ static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
>> >  }
>> >  #endif       /* CONFIG_PGTABLE_LEVELS > 3 */
>> >
>> > +#if CONFIG_PGTABLE_LEVELS > 4
>> > +
>> > +static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
>> > +{
>> > +     if (pgtable_l5_enabled())
>> > +             set_pgd(pgdp, __pgd(__phys_to_pgd_val(p4dp) | prot));
>> > +}
>> > +
>> > +static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
>> > +{
>> > +     pgdval_t pgdval = PGD_TYPE_TABLE;
>> > +
>> > +     pgdval |= (mm == &init_mm) ? PGD_TABLE_UXN : PGD_TABLE_PXN;
>> > +     __pgd_populate(pgdp, __pa(p4dp), pgdval);
>> > +}
>> > +
>> > +static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
>> > +{
>> > +     gfp_t gfp = GFP_PGTABLE_USER;
>> > +
>> > +     if (mm == &init_mm)
>> > +             gfp = GFP_PGTABLE_KERNEL;
>> > +     return (p4d_t *)get_zeroed_page(gfp);
>> > +}
>> > +
>> > +static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
>> > +{
>> > +     if (!pgtable_l5_enabled())
>> > +             return;
>> > +     BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
>> > +     free_page((unsigned long)p4d);
>> > +}
>> > +
>> > +#define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)
>> > +#else
>> > +static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
>> > +{
>> > +     BUILD_BUG();
>> > +}
>> > +#endif       /* CONFIG_PGTABLE_LEVELS > 4 */
>> > +
>> >  extern pgd_t *pgd_alloc(struct mm_struct *mm);
>> >  extern void pgd_free(struct mm_struct *mm, pgd_t *pgdp);
>> >
>> > diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>> > index b91fe4781b06..b364b02e696b 100644
>> > --- a/arch/arm64/include/asm/pgtable-hwdef.h
>> > +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>> > @@ -26,10 +26,10 @@
>> >  #define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 3))
>> >
>> >  /*
>> > - * Size mapped by an entry at level n ( 0 <= n <= 3)
>> > + * Size mapped by an entry at level n ( -1 <= n <= 3)
>> >   * We map (PAGE_SHIFT - 3) at all translation levels and PAGE_SHIFT bits
>> >   * in the final page. The maximum number of translation levels supported by
>> > - * the architecture is 4. Hence, starting at level n, we have further
>> > + * the architecture is 5. Hence, starting at level n, we have further
>> >   * ((4 - n) - 1) levels of translation excluding the offset within the page.
>> >   * So, the total number of bits mapped by an entry at level n is :
>> >   *
>> 
>> Is it necessary to represent the levels as (-1 - 3) in the kernel or
>> are you
>> open to switching to (0 - 4)?
>> 
>> There are a couple of other places where translation level is used, 
>> which I
>> found and fixed up for the KVM LPA2 support work. It got a bit messy 
>> to
>> represent the levels using the architectural range (-1 - 3) so I ended 
>> up
>> representing them as (0 - 4). The main issue was that KVM represents 
>> level as
>> unsigned so that change would have looked quite big.
>> 
>> Most of this is confined to KVM and the only place it really crosses 
>> over with
>> the kernel is at __tlbi_level(). Which makes me think you might be 
>> missing some
>> required changes (I didn't notice these in your other patches):
>> 
>> Looking at the TLB management stuff, I think there are some places you 
>> will need
>> to fix up to correctly handle the extra level in the kernel (e.g.
>> tlb_get_level(), flush_tlb_range()).
>> 
>> There are some new encodings for level in the FSC field in the ESR. You
>> might
>> need to update the fault_info array in fault.c to represent these and 
>> correctly
>> handle user space faults for the new level?
>> 
> 
> Hi Ryan,
> 
> Thanks for pointing this out. Once I have educated myself a bit more
> about all of this, I should be able to answer your questions :-)
> 
> I did not do any user space testing in anger on this series, on the
> assumption that we already support 52-bit VAs, but I completely missed
> the fact that the additional level of paging requires additional
> attention.
> 
> As for the level indexing: I have a slight preference for sticking
> with the architectural range, but I don't deeply care either way.

I'd really like to stick to the architectural representation, as
there is an ingrained knowledge of the relation between a base
granule size, a level, and a block mapping size.

The nice thing about level '-1' is that it preserves this behaviour,
and doesn't force everyone to adjust. It also makes it extremely
easy to compare the code and the spec.

So let's please stick to the [-1;3] range. It will save everyone
a lot of trouble.
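
To make that concrete (just the usual numbers, nothing new): with the
architectural numbering, a "level 2 block" is always 2 MB with 4k pages,
32 MB with 16k and 512 MB with 64k, and "level 3" is always the page
level, regardless of how many levels a given configuration actually
uses. Shifting everything to a [0;4] range would mean those familiar
numbers no longer line up with the Arm ARM.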

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 12/19] arm64: mm: Add definitions to support 5 levels of paging
  2022-11-28 18:00       ` Marc Zyngier
@ 2022-11-28 18:20         ` Ryan Roberts
  0 siblings, 0 replies; 45+ messages in thread
From: Ryan Roberts @ 2022-11-28 18:20 UTC (permalink / raw)
  To: Marc Zyngier, Ard Biesheuvel
  Cc: linux-arm-kernel, Will Deacon, Mark Rutland, Kees Cook,
	Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On 28/11/2022 18:00, Marc Zyngier wrote:
> On 2022-11-28 16:22, Ard Biesheuvel wrote:
>> On Mon, 28 Nov 2022 at 17:17, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>
>>> On 24/11/2022 12:39, Ard Biesheuvel wrote:
>>> > Add the required types and descriptor accessors to support 5 levels of
>>> > paging in the common code. This is one of the prerequisites for
>>> > supporting 52-bit virtual addressing with 4k pages.
>>> >
>>> > Note that this does not cover the code that handles kernel mappings or
>>> > the fixmap.
>>> >
>>> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>>> > ---
>>> >  arch/arm64/include/asm/pgalloc.h       | 41 +++++++++++
>>> >  arch/arm64/include/asm/pgtable-hwdef.h | 22 +++++-
>>> >  arch/arm64/include/asm/pgtable-types.h |  6 ++
>>> >  arch/arm64/include/asm/pgtable.h       | 75 +++++++++++++++++++-
>>> >  arch/arm64/mm/mmu.c                    | 31 +++++++-
>>> >  arch/arm64/mm/pgd.c                    | 15 +++-
>>> >  6 files changed, 181 insertions(+), 9 deletions(-)
>>> >
>>> > diff --git a/arch/arm64/include/asm/pgalloc.h
>>> b/arch/arm64/include/asm/pgalloc.h
>>> > index 237224484d0f..cae8c648f462 100644
>>> > --- a/arch/arm64/include/asm/pgalloc.h
>>> > +++ b/arch/arm64/include/asm/pgalloc.h
>>> > @@ -60,6 +60,47 @@ static inline void __p4d_populate(p4d_t *p4dp,
>>> phys_addr_t pudp, p4dval_t prot)
>>> >  }
>>> >  #endif       /* CONFIG_PGTABLE_LEVELS > 3 */
>>> >
>>> > +#if CONFIG_PGTABLE_LEVELS > 4
>>> > +
>>> > +static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t
>>> prot)
>>> > +{
>>> > +     if (pgtable_l5_enabled())
>>> > +             set_pgd(pgdp, __pgd(__phys_to_pgd_val(p4dp) | prot));
>>> > +}
>>> > +
>>> > +static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, p4d_t
>>> *p4dp)
>>> > +{
>>> > +     pgdval_t pgdval = PGD_TYPE_TABLE;
>>> > +
>>> > +     pgdval |= (mm == &init_mm) ? PGD_TABLE_UXN : PGD_TABLE_PXN;
>>> > +     __pgd_populate(pgdp, __pa(p4dp), pgdval);
>>> > +}
>>> > +
>>> > +static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
>>> > +{
>>> > +     gfp_t gfp = GFP_PGTABLE_USER;
>>> > +
>>> > +     if (mm == &init_mm)
>>> > +             gfp = GFP_PGTABLE_KERNEL;
>>> > +     return (p4d_t *)get_zeroed_page(gfp);
>>> > +}
>>> > +
>>> > +static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
>>> > +{
>>> > +     if (!pgtable_l5_enabled())
>>> > +             return;
>>> > +     BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
>>> > +     free_page((unsigned long)p4d);
>>> > +}
>>> > +
>>> > +#define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)
>>> > +#else
>>> > +static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t
>>> prot)
>>> > +{
>>> > +     BUILD_BUG();
>>> > +}
>>> > +#endif       /* CONFIG_PGTABLE_LEVELS > 4 */
>>> > +
>>> >  extern pgd_t *pgd_alloc(struct mm_struct *mm);
>>> >  extern void pgd_free(struct mm_struct *mm, pgd_t *pgdp);
>>> >
>>> > diff --git a/arch/arm64/include/asm/pgtable-hwdef.h
>>> b/arch/arm64/include/asm/pgtable-hwdef.h
>>> > index b91fe4781b06..b364b02e696b 100644
>>> > --- a/arch/arm64/include/asm/pgtable-hwdef.h
>>> > +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>>> > @@ -26,10 +26,10 @@
>>> >  #define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 3))
>>> >
>>> >  /*
>>> > - * Size mapped by an entry at level n ( 0 <= n <= 3)
>>> > + * Size mapped by an entry at level n ( -1 <= n <= 3)
>>> >   * We map (PAGE_SHIFT - 3) at all translation levels and PAGE_SHIFT bits
>>> >   * in the final page. The maximum number of translation levels supported by
>>> > - * the architecture is 4. Hence, starting at level n, we have further
>>> > + * the architecture is 5. Hence, starting at level n, we have further
>>> >   * ((4 - n) - 1) levels of translation excluding the offset within the page.
>>> >   * So, the total number of bits mapped by an entry at level n is :
>>> >   *
>>>
>>> Is it necessary to represent the levels as (-1 - 3) in the kernel or are you
>>> open to switching to (0 - 4)?
>>>
>>> There are a couple of other places where translation level is used, which I
>>> found and fixed up for the KVM LPA2 support work. It got a bit messy to
>>> represent the levels using the architectural range (-1 - 3) so I ended up
>>> representing them as (0 - 4). The main issue was that KVM represents level as
>>> unsigned so that change would have looked quite big.
>>>
>>> Most of this is confined to KVM and the only place it really crosses over with
>>> the kernel is at __tlbi_level(). Which makes me think you might be missing some
>>> required changes (I didn't notice these in your other patches):
>>>
>>> Looking at the TLB management stuff, I think there are some places you will need
>>> to fix up to correctly handle the extra level in the kernel (e.g.
>>> tlb_get_level(), flush_tlb_range()).
>>>
>>> There are some new encodings for level in the FSC field in the ESR. You might
>>> need to update the fault_info array in fault.c to represent these and correctly
>>> handle user space faults for the new level?
>>>
>>
>> Hi Ryan,
>>
>> Thanks for pointing this out. Once I have educated myself a bit more
>> about all of this, I should be able to answer your questions :-)
>>
>> I did not do any user space testing in anger on this series, on the
>> assumption that we already support 52-bit VAs, but I completely missed
>> the fact that the additional level of paging requires additional
>> attention.
>>
>> As for the level indexing: I have a slight preference for sticking
>> with the architectural range, but I don't deeply care either way.
> 
> I'd really like to stick to the architectural representation, as
> there is an ingrained knowledge of the relation between a base
> granule size, a level, and a block mapping size.
> 
> The nice thing about level '-1' is that it preserves this behaviour,
> and doesn't force everyone to adjust. It also makes it extremely
> easy to compare the code and the spec.
> 
> So let's please stick to the [-1;3] range. It will save everyone
> a lot of trouble.

Fair point. It will mean a bigger patch, but I'll rework my stuff to make it all
work with [-1;3] before I post it.

> 
> Thanks,
> 
>         M.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
                   ` (19 preceding siblings ...)
  2022-11-24 14:39 ` [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ryan Roberts
@ 2022-11-29 15:31 ` Ryan Roberts
  2022-11-29 15:47   ` Ard Biesheuvel
  20 siblings, 1 reply; 45+ messages in thread
From: Ryan Roberts @ 2022-11-29 15:31 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: Marc Zyngier, Will Deacon, Mark Rutland, Kees Cook,
	Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

Hi Ard,

As promised, I ran your patch set through my test set up and have noticed a few
issues. Sorry it turned into rather a long email...

First, a quick explanation of the test suite: For all valid combinations of the
below parameters, boot the host kernel on the FVP, then boot the guest kernel in
a VM, check that booting succeeds all the way to the guest shell, then power off
the guest followed by the host and check that shutdown is clean.

Parameters:
 - hw_pa:		[48, lpa, lpa2]
 - hw_va:		[48, 52]
 - kvm_mode:		[vhe, nvhe, protected]
 - host_page_size:	[4KB, 16KB, 64KB]
 - host_pa:		[48, 52]
 - host_va:		[48, 52]
 - host_load_addr:	[low, high]
 - guest_page_size:	[64KB]
 - guest_pa:		[52]
 - guest_va:		[52]
 - guest_load_addr:	[low, high]

When *_load_addr is 'low', that means the RAM is below 48 bits in (I)PA space.
'high' means the RAM starts at 2048TB for the guest (52 bit PA), and it means
there are 2 regions for the host; one at 0x880000000000 (48 bit PA) sized to
hold the kernel image only and another at 0x8800000000000 (52 bit PA) sized at
4GB. The FVP only allows RAM at certain locations and having a contiguous region
cross the 48 bit boundary is not an option. So I chose these values to ensure
that the linear map size is within 51 bits, which is a requirement for
nvhe/protected mode kvm.
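
In other words, the 'high' host layout is roughly:

	0x0000880000000000  (48-bit PA)  RAM sized to hold the kernel image only
	0x0008800000000000  (52-bit PA)  4 GB of RAM for everything else

and for the 'high' guest case the guest RAM starts at 2048 TB
(0x0008000000000000).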

In all cases, I preload TF-A bl31, kernel, dt and initrd into RAM and run the
FVP. This sidesteps problems with EFI needing low memory, and with the FVP's
block devices needing DMA memory below 44 bits PA. bl31 and dt are appropriately
fixed up for the 2 different memory layouts.

Given this was designed to test my KVM changes, I was previously running these
without the host_load_addr=high option for the 4k and 16k host kernels (since
this requires your patch set). In this situation there are 132 valid configs and
all of them pass.

I then rebased my changes on top of yours and added in the host_load_addr=high
option. Now there are 186 valid configs, 64 of which fail. (some of these
failures are regressions). From a quick initial triage, there are 3 failure modes:


1) 18 FAILING TESTS: Host kernel never outputs anything to console

  TF-A runs successfully, says it is jumping to the kernel, then nothing further
  is seen. I'm pretty confident that the blobs are loaded into memory correctly
  because the same framework is working for the other configs (including 64k
  kernel loaded into high memory). This affects all configs where a host kernel
  with 4k or 16k pages built with LPA2 support is loaded into high memory.


2) 4 FAILING TESTS: Host kernel gets stuck initializing KVM

  During kernel boot, last console log is "kvm [1]: vgic interrupt IRQ9". All
  failing tests are configured for protected KVM, and are built with LPA2
  support, running on non-LPA2 HW.


3) 42 FAILING TESTS: Guest kernel never outputs anything to console

  Host kernel boots fine, and we attempt to launch a guest kernel using kvmtool.
  There is no error reported, but the guest never outputs anything. Haven't
  worked out which config options are common to all failures yet.


Finally, I removed my code, and ran with your patch set as provided. For this I
hacked up my test suite to boot the host, and ignore booting a guest. I also
didn't bother to vary the KVM mode and just left it in VHE mode. There were 46
valid configs here, of which 4 failed. They were all the same failure mode as
(1) above. Failing configs were:

id  hw_pa  hw_va  host_page_size  host_pa  host_va  host_load_addr
------------------------------------------------------------------
40  lpa    52     4k              52       52       high
45  lpa    52     16k             52       52       high
55  lpa2   52     4k              52       52       high
60  lpa2   52     16k             52       52       high


So on the balance of probabilities, I think failure mode (1) is very likely to
be due to a bug in your code. (2) and (3) could be my issue or your issue: I
propose to dig into those a bit further and will get back to you on them. I
don't plan to look any further into (1).

Thanks,
Ryan


On 24/11/2022 12:39, Ard Biesheuvel wrote:
> Enable support for LPA2 when running with 4k or 16k pages. In the former
> case, this requires 5 level paging with a runtime fallback to 4 on
> non-LPA2 hardware. For consistency, the same approach is adopted for 16k
> pages, where we fall back to 3 level paging (47 bit virtual addressing)
> on non-LPA2 configurations. (Falling back to 48 bits would involve
> finding a workaround for the fact that we cannot construct a level 0
> table covering 52 bits of VA space that appears aligned to its size in
> memory, and has the top 2 entries that represent the 48-bit region
> appearing at an alignment of 64 bytes, which is required by the
> architecture for TTBR address values. Also, using an additional level of
> paging to translate a single VA bit is wasteful in terms of TLB
> efficiency)
> 
> This means support for falling back to 3 levels of paging at runtime
> when configured for 4 is also needed.
> 
> Another thing worth to note is that the repurposed physical address bits
> in the page table descriptors were not RES0 before, and so there is now
> a big global switch (called TCR.DS) which controls how all page table
> descriptors are interpreted. This requires some extra care in the PTE
> conversion helpers, and additional handling in the boot code to ensure
> that we set TCR.DS safely if supported (and not overridden)
> 
> Note that this series is mostly orthogonal to work by Anshuman done last
> year: this series assumes that 52-bit physical addressing is never
> needed to map the kernel image itself, and therefore that we never need
> ID map range extension to cover the kernel with a 5th level when running
> with 4. And given that the LPA2 architectural feature covers both the
> virtual and physical range extensions, where enabling the latter is
> required to enable the former, we can simplify things further by only
> enabling them as a pair. (I.e., 52-bit physical addressing cannot be
> enabled for 48-bit VA space or smaller)
> 
> This series applies onto some of my previous work that is still in
> flight, so these patches will not apply in isolation. Complete branch
> can be found here:
> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-4k-lpa2
> 
> It supersedes the RFC v1 I sent out last week, which covered 16k pages
> only. It also supersedes some related work I sent out in isolation
> before:
> 
> [PATCH] arm64: mm: Enable KASAN for 16k/48-bit VA configurations
> [PATCH 0/3] arm64: mm: Model LVA support as a CPU feature
> 
> Tested on QEMU with -cpu max and lpa2 both off and on, as well as using
> the arm64.nolva override kernel command line parameter. Note that this
> requires a QEMU built from the latest sources.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> 
> Anshuman Khandual (3):
>   arm64/mm: Simplify and document pte_to_phys() for 52 bit addresses
>   arm64/mm: Add FEAT_LPA2 specific TCR_EL1.DS field
>   arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]
> 
> Ard Biesheuvel (16):
>   arm64: kaslr: Adjust randomization range dynamically
>   arm64: mm: get rid of kimage_vaddr global variable
>   arm64: head: remove order argument from early mapping routine
>   arm64: mm: Handle LVA support as a CPU feature
>   arm64: mm: Deal with potential ID map extension if VA_BITS >
>     VA_BITS_MIN
>   arm64: mm: Add feature override support for LVA
>   arm64: mm: Wire up TCR.DS bit to PTE shareability fields
>   arm64: mm: Add LPA2 support to phys<->pte conversion routines
>   arm64: mm: Add definitions to support 5 levels of paging
>   arm64: mm: add 5 level paging support to G-to-nG conversion routine
>   arm64: Enable LPA2 at boot if supported by the system
>   arm64: mm: Add 5 level paging support to fixmap and swapper handling
>   arm64: kasan: Reduce minimum shadow alignment and enable 5 level
>     paging
>   arm64: mm: Add support for folding PUDs at runtime
>   arm64: ptdump: Disregard unaddressable VA space
>   arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs
> 
>  arch/arm64/Kconfig                      |  23 ++-
>  arch/arm64/include/asm/assembler.h      |  42 ++---
>  arch/arm64/include/asm/cpufeature.h     |   2 +
>  arch/arm64/include/asm/fixmap.h         |   1 +
>  arch/arm64/include/asm/kernel-pgtable.h |  27 ++-
>  arch/arm64/include/asm/memory.h         |  23 ++-
>  arch/arm64/include/asm/pgalloc.h        |  53 +++++-
>  arch/arm64/include/asm/pgtable-hwdef.h  |  34 +++-
>  arch/arm64/include/asm/pgtable-prot.h   |  18 +-
>  arch/arm64/include/asm/pgtable-types.h  |   6 +
>  arch/arm64/include/asm/pgtable.h        | 197 ++++++++++++++++++--
>  arch/arm64/include/asm/sysreg.h         |   2 +
>  arch/arm64/include/asm/tlb.h            |   3 +-
>  arch/arm64/kernel/cpufeature.c          |  46 ++++-
>  arch/arm64/kernel/head.S                |  99 +++++-----
>  arch/arm64/kernel/image-vars.h          |   4 +
>  arch/arm64/kernel/pi/idreg-override.c   |  29 ++-
>  arch/arm64/kernel/pi/kaslr_early.c      |  23 ++-
>  arch/arm64/kernel/pi/map_kernel.c       | 115 +++++++++++-
>  arch/arm64/kernel/sleep.S               |   3 -
>  arch/arm64/mm/init.c                    |   2 +-
>  arch/arm64/mm/kasan_init.c              | 124 ++++++++++--
>  arch/arm64/mm/mmap.c                    |   4 +
>  arch/arm64/mm/mmu.c                     | 138 ++++++++++----
>  arch/arm64/mm/pgd.c                     |  17 +-
>  arch/arm64/mm/proc.S                    |  76 +++++++-
>  arch/arm64/mm/ptdump.c                  |   4 +-
>  arch/arm64/tools/cpucaps                |   1 +
>  28 files changed, 907 insertions(+), 209 deletions(-)
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 12/19] arm64: mm: Add definitions to support 5 levels of paging
  2022-11-28 16:22     ` Ard Biesheuvel
  2022-11-28 18:00       ` Marc Zyngier
@ 2022-11-29 15:46       ` Ryan Roberts
  2022-11-29 15:48         ` Ard Biesheuvel
  1 sibling, 1 reply; 45+ messages in thread
From: Ryan Roberts @ 2022-11-29 15:46 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On 28/11/2022 16:22, Ard Biesheuvel wrote:
> On Mon, 28 Nov 2022 at 17:17, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 24/11/2022 12:39, Ard Biesheuvel wrote:
>>> Add the required types and descriptor accessors to support 5 levels of
>>> paging in the common code. This is one of the prerequisites for
>>> supporting 52-bit virtual addressing with 4k pages.
>>>
>>> Note that this does not cover the code that handles kernel mappings or
>>> the fixmap.
>>>
>>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>>> ---
>>>  arch/arm64/include/asm/pgalloc.h       | 41 +++++++++++
>>>  arch/arm64/include/asm/pgtable-hwdef.h | 22 +++++-
>>>  arch/arm64/include/asm/pgtable-types.h |  6 ++
>>>  arch/arm64/include/asm/pgtable.h       | 75 +++++++++++++++++++-
>>>  arch/arm64/mm/mmu.c                    | 31 +++++++-
>>>  arch/arm64/mm/pgd.c                    | 15 +++-
>>>  6 files changed, 181 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
>>> index 237224484d0f..cae8c648f462 100644
>>> --- a/arch/arm64/include/asm/pgalloc.h
>>> +++ b/arch/arm64/include/asm/pgalloc.h
>>> @@ -60,6 +60,47 @@ static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
>>>  }
>>>  #endif       /* CONFIG_PGTABLE_LEVELS > 3 */
>>>
>>> +#if CONFIG_PGTABLE_LEVELS > 4
>>> +
>>> +static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
>>> +{
>>> +     if (pgtable_l5_enabled())
>>> +             set_pgd(pgdp, __pgd(__phys_to_pgd_val(p4dp) | prot));
>>> +}
>>> +
>>> +static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
>>> +{
>>> +     pgdval_t pgdval = PGD_TYPE_TABLE;
>>> +
>>> +     pgdval |= (mm == &init_mm) ? PGD_TABLE_UXN : PGD_TABLE_PXN;
>>> +     __pgd_populate(pgdp, __pa(p4dp), pgdval);
>>> +}
>>> +
>>> +static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
>>> +{
>>> +     gfp_t gfp = GFP_PGTABLE_USER;
>>> +
>>> +     if (mm == &init_mm)
>>> +             gfp = GFP_PGTABLE_KERNEL;
>>> +     return (p4d_t *)get_zeroed_page(gfp);
>>> +}
>>> +
>>> +static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
>>> +{
>>> +     if (!pgtable_l5_enabled())
>>> +             return;
>>> +     BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
>>> +     free_page((unsigned long)p4d);
>>> +}
>>> +
>>> +#define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)
>>> +#else
>>> +static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
>>> +{
>>> +     BUILD_BUG();
>>> +}
>>> +#endif       /* CONFIG_PGTABLE_LEVELS > 4 */
>>> +
>>>  extern pgd_t *pgd_alloc(struct mm_struct *mm);
>>>  extern void pgd_free(struct mm_struct *mm, pgd_t *pgdp);
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>>> index b91fe4781b06..b364b02e696b 100644
>>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>>> @@ -26,10 +26,10 @@
>>>  #define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 3))
>>>
>>>  /*
>>> - * Size mapped by an entry at level n ( 0 <= n <= 3)
>>> + * Size mapped by an entry at level n ( -1 <= n <= 3)
>>>   * We map (PAGE_SHIFT - 3) at all translation levels and PAGE_SHIFT bits
>>>   * in the final page. The maximum number of translation levels supported by
>>> - * the architecture is 4. Hence, starting at level n, we have further
>>> + * the architecture is 5. Hence, starting at level n, we have further
>>>   * ((4 - n) - 1) levels of translation excluding the offset within the page.
>>>   * So, the total number of bits mapped by an entry at level n is :
>>>   *
>>
>> Is it necessary to represent the levels as (-1 - 3) in the kernel or are you
>> open to switching to (0 - 4)?
>>
>> There are a couple of other places where translation level is used, which I
>> found and fixed up for the KVM LPA2 support work. It got a bit messy to
>> represent the levels using the architectural range (-1 - 3) so I ended up
>> representing them as (0 - 4). The main issue was that KVM represents level as
>> unsigned so that change would have looked quite big.
>>
>> Most of this is confined to KVM and the only place it really crosses over with
>> the kernel is at __tlbi_level(). Which makes me think you might be missing some
>> required changes (I didn't notice these in your other patches):
>>
>> Looking at the TLB management stuff, I think there are some places you will need
>> to fix up to correctly handle the extra level in the kernel (e.g.
>> tlb_get_level(), flush_tlb_range()).
>>
>> There are some new encodings for level in the FSC field in the ESR. You might
>> need to update the fault_info array in fault.c to represent these and correctly
>> handle user space faults for the new level?
>>
> 
> Hi Ryan,
> 
> Thanks for pointing this out. Once I have educated myself a bit more
> about all of this, I should be able to answer your questions :-)

I've just noticed one more thing: get_user_mapping_size() in
arch/arm64/kvm/mmu.c uses CONFIG_PGTABLE_LEVELS to calculate the start level of
a user space page table. I guess that will need some attention now that the
runtime value might be smaller than this macro on systems that don't support LPA2?

> 
> I did not do any user space testing in anger on this series, on the
> assumption that we already support 52-bit VAs, but I completely missed
> the fact that the additional level of paging requires additional
> attention.
> 
> As for the level indexing: I have a slight preference for sticking
> with the architectural range, but I don't deeply care either way.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-29 15:31 ` Ryan Roberts
@ 2022-11-29 15:47   ` Ard Biesheuvel
  2022-11-29 16:35     ` Ryan Roberts
  0 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-29 15:47 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On Tue, 29 Nov 2022 at 16:31, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> Hi Ard,
>
> As promised, I ran your patch set through my test set up and have noticed a few
> issues. Sorry it turned into rather a long email...
>

No worries, and thanks a lot for going through the trouble.

> First, a quick explanation of the test suite: For all valid combinations of the
> below parameters, boot the host kernel on the FVP, then boot the guest kernel in
> a VM, check that booting succeeds all the way to the guest shell, then power off
> the guest followed by the host and check that shutdown is clean.
>
> Parameters:
>  - hw_pa:               [48, lpa, lpa2]
>  - hw_va:               [48, 52]
>  - kvm_mode:            [vhe, nvhe, protected]
>  - host_page_size:      [4KB, 16KB, 64KB]
>  - host_pa:             [48, 52]
>  - host_va:             [48, 52]
>  - host_load_addr:      [low, high]
>  - guest_page_size:     [64KB]
>  - guest_pa:            [52]
>  - guest_va:            [52]
>  - guest_load_addr:     [low, high]
>
> When *_load_addr is 'low', that means the RAM is below 48 bits in (I)PA space.
> 'high' means the RAM starts at 2048TB for the guest (52 bit PA), and it means
> there are 2 regions for the host; one at 0x880000000000 (48 bit PA) sized to
> hold the kernel image only and another at 0x8800000000000 (52 bit PA) sized at
> 4GB. The FVP only allows RAM at certain locations and having a contiguous region
> cross the 48 bit boundary is not an option. So I chose these values to ensure
> that the linear map size is within 51 bits, which is a requirement for
> nvhe/protected mode kvm.
>
> In all cases, I preload TF-A bl31, kernel, dt and initrd into RAM and run the
> FVP. This sidesteps problems with EFI needing low memory, and with the FVP's
> block devices needing DMA memory below 44 bits PA. bl31 and dt are appropriately
> fixed up for the 2 different memory layouts.
>
> Given this was designed to test my KVM changes, I was previously running these
> without the host_load_addr=high option for the 4k and 16k host kernels (since
> this requires your patch set). In this situation there are 132 valid configs and
> all of them pass.
>
> I then rebased my changes on top of yours and added in the host_load_addr=high
> option. Now there are 186 valid configs, 64 of which fail. (some of these
> failures are regressions). From a quick initial triage, there are 3 failure modes:
>
>
> 1) 18 FAILING TESTS: Host kernel never outputs anything to console
>
>   TF-A runs successfully, says it is jumping to the kernel, then nothing further
>   is seen. I'm pretty confident that the blobs are loaded into memory correctly
>   because the same framework is working for the other configs (including 64k
>   kernel loaded into high memory). This affects all configs where a host kernel
>   with 4k or 16k pages built with LPA2 support is loaded into high memory.
>

Not sure how to interpret this in combination with your explanation
above, but if 'loaded high' means that the kernel itself is not in
48-bit addressable physical memory, this failure is expected.

Given that we have no way of informing the bootloader or firmware
whether or not a certain kernel image supports running from such a
high offset, it must currently assume it cannot. We've just queued up
a documentation fix to clarify this in the boot protocol, i.e., that
the kernel must be loaded in 48-bit addressable physical memory.

The fact that you had to doctor your boot environment to get around
this kind of proves my point, and unless someone is silly enough to
ship a SoC that cannot function without this, I don't think we should
add this support.

I understand how this is an interesting case for completeness from a
validation pov, but the reality is that adding support for this would
mean introducing changes amounting to dead code to fragile boot
sequence code that is already hard to maintain.

>
> 2) 4 FAILING TESTS: Host kernel gets stuck initializing KVM
>
>   During kernel boot, last console log is "kvm [1]: vgic interrupt IRQ9". All
>   failing tests are configured for protected KVM, and are built with LPA2
>   support, running on non-LPA2 HW.
>

I will try to reproduce this locally.

>
> 3) 42 FAILING TESTS: Guest kernel never outputs anything to console
>
>   Host kernel boots fine, and we attempt to launch a guest kernel using kvmtool.
>   There is no error reported, but the guest never outputs anything. Haven't
>   worked out which config options are common to all failures yet.
>

This goes a bit beyond what I am currently set up for in terms of
testing, but I'm happy to help narrow this down.

>
> Finally, I removed my code, and ran with your patch set as provided. For this I
> hacked up my test suite to boot the host, and ignore booting a guest. I also
> didn't bother to vary the KVM mode and just left it in VHE mode. There were 46
> valid configs here, of which 4 failed. They were all the same failure mode as
> (1) above. Failing configs were:
>
> id  hw_pa  hw_va  host_page_size  host_pa  host_va  host_load_addr
> ------------------------------------------------------------------
> 40  lpa    52     4k              52       52       high
> 45  lpa    52     16k             52       52       high
> 55  lpa2   52     4k              52       52       high
> 60  lpa2   52     16k             52       52       high
>

Same point as above then, I guess.

>
> So on the balance of probabilities, I think failure mode (1) is very likely to
> be due to a bug in your code. (2) and (3) could be my issue or your issue: I
> propose to dig into those a bit further and will get back to you on them. I
> don't plan to look any further into (1).
>

Thanks again. (1) is expected, and (2) is something I will investigate further.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 12/19] arm64: mm: Add definitions to support 5 levels of paging
  2022-11-29 15:46       ` Ryan Roberts
@ 2022-11-29 15:48         ` Ard Biesheuvel
  0 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-29 15:48 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On Tue, 29 Nov 2022 at 16:46, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 28/11/2022 16:22, Ard Biesheuvel wrote:
> > On Mon, 28 Nov 2022 at 17:17, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 24/11/2022 12:39, Ard Biesheuvel wrote:
> >>> Add the required types and descriptor accessors to support 5 levels of
> >>> paging in the common code. This is one of the prerequisites for
> >>> supporting 52-bit virtual addressing with 4k pages.
> >>>
> >>> Note that this does not cover the code that handles kernel mappings or
> >>> the fixmap.
> >>>
> >>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >>> ---
> >>>  arch/arm64/include/asm/pgalloc.h       | 41 +++++++++++
> >>>  arch/arm64/include/asm/pgtable-hwdef.h | 22 +++++-
> >>>  arch/arm64/include/asm/pgtable-types.h |  6 ++
> >>>  arch/arm64/include/asm/pgtable.h       | 75 +++++++++++++++++++-
> >>>  arch/arm64/mm/mmu.c                    | 31 +++++++-
> >>>  arch/arm64/mm/pgd.c                    | 15 +++-
> >>>  6 files changed, 181 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> >>> index 237224484d0f..cae8c648f462 100644
> >>> --- a/arch/arm64/include/asm/pgalloc.h
> >>> +++ b/arch/arm64/include/asm/pgalloc.h
> >>> @@ -60,6 +60,47 @@ static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
> >>>  }
> >>>  #endif       /* CONFIG_PGTABLE_LEVELS > 3 */
> >>>
> >>> +#if CONFIG_PGTABLE_LEVELS > 4
> >>> +
> >>> +static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
> >>> +{
> >>> +     if (pgtable_l5_enabled())
> >>> +             set_pgd(pgdp, __pgd(__phys_to_pgd_val(p4dp) | prot));
> >>> +}
> >>> +
> >>> +static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
> >>> +{
> >>> +     pgdval_t pgdval = PGD_TYPE_TABLE;
> >>> +
> >>> +     pgdval |= (mm == &init_mm) ? PGD_TABLE_UXN : PGD_TABLE_PXN;
> >>> +     __pgd_populate(pgdp, __pa(p4dp), pgdval);
> >>> +}
> >>> +
> >>> +static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
> >>> +{
> >>> +     gfp_t gfp = GFP_PGTABLE_USER;
> >>> +
> >>> +     if (mm == &init_mm)
> >>> +             gfp = GFP_PGTABLE_KERNEL;
> >>> +     return (p4d_t *)get_zeroed_page(gfp);
> >>> +}
> >>> +
> >>> +static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
> >>> +{
> >>> +     if (!pgtable_l5_enabled())
> >>> +             return;
> >>> +     BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
> >>> +     free_page((unsigned long)p4d);
> >>> +}
> >>> +
> >>> +#define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)
> >>> +#else
> >>> +static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
> >>> +{
> >>> +     BUILD_BUG();
> >>> +}
> >>> +#endif       /* CONFIG_PGTABLE_LEVELS > 4 */
> >>> +
> >>>  extern pgd_t *pgd_alloc(struct mm_struct *mm);
> >>>  extern void pgd_free(struct mm_struct *mm, pgd_t *pgdp);
> >>>
> >>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> >>> index b91fe4781b06..b364b02e696b 100644
> >>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> >>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> >>> @@ -26,10 +26,10 @@
> >>>  #define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 3))
> >>>
> >>>  /*
> >>> - * Size mapped by an entry at level n ( 0 <= n <= 3)
> >>> + * Size mapped by an entry at level n ( -1 <= n <= 3)
> >>>   * We map (PAGE_SHIFT - 3) at all translation levels and PAGE_SHIFT bits
> >>>   * in the final page. The maximum number of translation levels supported by
> >>> - * the architecture is 4. Hence, starting at level n, we have further
> >>> + * the architecture is 5. Hence, starting at level n, we have further
> >>>   * ((4 - n) - 1) levels of translation excluding the offset within the page.
> >>>   * So, the total number of bits mapped by an entry at level n is :
> >>>   *
> >>
> >> Is it necessary to represent the levels as (-1 - 3) in the kernel or are you
> >> open to switching to (0 - 4)?
> >>
> >> There are a couple of other places where translation level is used, which I
> >> found and fixed up for the KVM LPA2 support work. It got a bit messy to
> >> represent the levels using the architectural range (-1 - 3) so I ended up
> >> representing them as (0 - 4). The main issue was that KVM represents level as
> >> unsigned so that change would have looked quite big.
> >>
> >> Most of this is confined to KVM and the only place it really crosses over with
> >> the kernel is at __tlbi_level(). Which makes me think you might be missing some
> >> required changes (I didn't notice these in your other patches):
> >>
> >> Looking at the TLB management stuff, I think there are some places you will need
> >> to fix up to correctly handle the extra level in the kernel (e.g.
> >> tlb_get_level(), flush_tlb_range()).
> >>
> >> There are some new encodings for level in the FSC field in the ESR. You might
> >> need to update the fault_info array in fault.c to represent these and correctly
> >> handle user space faults for the new level?
> >>
> >
> > Hi Ryan,
> >
> > Thanks for pointing this out. Once I have educated myself a bit more
> > about all of this, I should be able to answer your questions :-)
>
> I've just noticed one more thing: get_user_mapping_size() in
> arch/arm64/kvm/mmu.c uses CONFIG_PGTABLE_LEVELS to calculate the start level of
> a user space page table. I guess that will need some attention now that the
> runtime value might be smaller than this macro on systems that don't support LPA2?

Indeed. In general, every reference to that quantity should now take
pgtable_l4_enabled() and pgtable_l5_enabled() into account as well.
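
As a rough sketch of the idea (the helper below is hypothetical, not
something from the series, but it shows the kind of adjustment I mean):

	/* illustrative only: number of levels actually in use at runtime */
	static int runtime_pgtable_levels(void)
	{
		int levels = CONFIG_PGTABLE_LEVELS;

	#if CONFIG_PGTABLE_LEVELS > 4
		if (!pgtable_l5_enabled())
			levels--;	/* p4d level folded at runtime */
	#endif
	#if CONFIG_PGTABLE_LEVELS > 3
		if (!pgtable_l4_enabled())
			levels--;	/* pud level folded at runtime */
	#endif
		return levels;
	}

get_user_mapping_size() could then derive its start level from this
runtime count instead of from CONFIG_PGTABLE_LEVELS directly.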

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-29 15:47   ` Ard Biesheuvel
@ 2022-11-29 16:35     ` Ryan Roberts
  2022-11-29 16:56       ` Ard Biesheuvel
  0 siblings, 1 reply; 45+ messages in thread
From: Ryan Roberts @ 2022-11-29 16:35 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On 29/11/2022 15:47, Ard Biesheuvel wrote:
> On Tue, 29 Nov 2022 at 16:31, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Hi Ard,
>>
>> As promised, I ran your patch set through my test set up and have noticed a few
>> issues. Sorry it turned into rather a long email...
>>
> 
> No worries, and thanks a lot for going through the trouble.
> 
>> First, a quick explanation of the test suite: For all valid combinations of the
>> below parameters, boot the host kernel on the FVP, then boot the guest kernel in
>> a VM, check that booting succeeds all the way to the guest shell, then power off
>> the guest followed by the host and check that shutdown is clean.
>>
>> Parameters:
>>  - hw_pa:               [48, lpa, lpa2]
>>  - hw_va:               [48, 52]
>>  - kvm_mode:            [vhe, nvhe, protected]
>>  - host_page_size:      [4KB, 16KB, 64KB]
>>  - host_pa:             [48, 52]
>>  - host_va:             [48, 52]
>>  - host_load_addr:      [low, high]
>>  - guest_page_size:     [64KB]
>>  - guest_pa:            [52]
>>  - guest_va:            [52]
>>  - guest_load_addr:     [low, high]
>>
>> When *_load_addr is 'low', that means the RAM is below 48 bits in (I)PA space.
>> 'high' means the RAM starts at 2048TB for the guest (52 bit PA), and it means
>> there are 2 regions for the host; one at 0x880000000000 (48 bit PA) sized to
>> hold the kernel image only and another at 0x8800000000000 (52 bit PA) sized at
>> 4GB. The FVP only allows RAM at certain locations and having a contiguous region
>> cross the 48 bit boundary is not an option. So I chose these values to ensure
>> that the linear map size is within 51 bits, which is a requirement for
>> nvhe/protected mode kvm.
>>
>> In all cases, I preload TF-A bl31, kernel, dt and initrd into RAM and run the
>> FVP. This sidesteps problems with EFI needing low memory, and with the FVP's
>> block devices needing DMA memory below 44 bits PA. bl31 and dt are appropriately
>> fixed up for the 2 different memory layouts.
>>
>> Given this was designed to test my KVM changes, I was previously running these
>> without the host_load_addr=high option for the 4k and 16k host kernels (since
>> this requires your patch set). In this situation there are 132 valid configs and
>> all of them pass.
>>
>> I then rebased my changes on top of yours and added in the host_load_addr=high
>> option. Now there are 186 valid configs, 64 of which fail. (some of these
>> failures are regressions). From a quick initial triage, there are 3 failure modes:
>>
>>
>> 1) 18 FAILING TESTS: Host kernel never outputs anything to console
>>
>>   TF-A runs successfully, says it is jumping to the kernel, then nothing further
>>   is seen. I'm pretty confident that the blobs are loaded into memory correctly
>>   because the same framework is working for the other configs (including 64k
>>   kernel loaded into high memory). This affects all configs where a host kernel
>>   with 4k or 16k pages built with LPA2 support is loaded into high memory.
>>
> 
> Not sure how to interpret this in combination with your explanation
> above, but if 'loaded high' means that the kernel itself is not in
> 48-bit addressable physical memory, this failure is expected.

Sorry - my wording was confusing. host_load_addr=high means what I said at the
top; the kernel image is loaded at 0x880000000000 in a block of memory sized to
hold the kernel image only (actually it's forward-aligned to 2MB). The dtb and
initrd are loaded into a 4GB region at 0x8800000000000. The reason I'm doing
this is to ensure that when I create a VM, the memory used for it (at least the
vast majority) is coming from the region at 52 bits. I want to do this to prove
that the stage2 implementation is correctly handling the 52-bit OA case.

> 
> Given that we have no way of informing the bootloader or firmware
> whether or not a certain kernel image supports running from such a
> high offset, it must currently assume it cannot. We've just queued up
> a documentation fix to clarify this in the boot protocol, i.e., that
> the kernel must be loaded in 48-bit addressable physical memory.

OK, but I think what I'm doing complies with this. Unless the DTB also has to be
below 48 bits?

> 
> The fact that you had to doctor your boot environment to get around
> this kind of proves my point, and unless someone is silly enough to
> ship a SoC that cannot function without this, I don't think we should
> add this support.
> 
> I understand how this is an interesting case for completeness from a
> validation pov, but the reality is that adding support for this would
> mean introducing changes amounting to dead code to fragile boot
> sequence code that is already hard to maintain.

I'm not disagreeing. But I think what I'm doing should conform with the
requirements? (Previously I had the tests set up to just have a single region of
memory above 52 bits and the kernel image was placed there. That works/worked
for the 64KB kernel. But I brought the kernel image to below 48 bits to align
with the requirements of this patch set.)

If you see an easier way for me to validate 52 bit OAs in the stage 2 (and
ideally hyp stage 1), then I'm all ears!

> 
>>
>> 2) 4 FAILING TESTS: Host kernel gets stuck initializing KVM
>>
>>   During kernel boot, last console log is "kvm [1]: vgic interrupt IRQ9". All
>>   failing tests are configured for protected KVM, and are built with LPA2
>>   support, running on non-LPA2 HW.
>>
> 
> I will try to reproduce this locally.
> 
>>
>> 3) 42 FAILING TESTS: Guest kernel never outputs anything to console
>>
>>   Host kernel boots fine, and we attempt to launch a guest kernel using kvmtool.
>>   There is no error reported, but the guest never outputs anything. Haven't
>>   worked out which config options are common to all failures yet.
>>
> 
> This goes a bit beyond what I am currently set up for in terms of
> testing, but I'm happy to help narrow this down.
> 
>>
>> Finally, I removed my code, and ran with your patch set as provided. For this I
>> hacked up my test suite to boot the host, and ignore booting a guest. I also
>> didn't bother to vary the KVM mode and just left it in VHE mode. There were 46
>> valid configs here, of which 4 failed. They were all the same failure mode as
>> (1) above. Failing configs were:
>>
>> id  hw_pa  hw_va  host_page_size  host_pa  host_va  host_load_addr
>> ------------------------------------------------------------------
>> 40  lpa    52     4k              52       52       high
>> 45  lpa    52     16k             52       52       high
>> 55  lpa2   52     4k              52       52       high
>> 60  lpa2   52     16k             52       52       high
>>
> 
> Same point as above then, I guess.
> 
>>
>> So on the balance of probabilities, I think failure mode (1) is very likely to
>> be due to a bug in your code. (2) and (3) could be my issue or your issue: I
>> propose to dig into those a bit further and will get back to you on them. I
>> don't plan to look any further into (1).
>>
> 
> Thanks again. (1) is expected, and (2) is something I will investigate further.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-29 16:35     ` Ryan Roberts
@ 2022-11-29 16:56       ` Ard Biesheuvel
  2022-12-01 12:22         ` Ryan Roberts
  0 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2022-11-29 16:56 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On Tue, 29 Nov 2022 at 17:36, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 29/11/2022 15:47, Ard Biesheuvel wrote:
> > On Tue, 29 Nov 2022 at 16:31, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> Hi Ard,
> >>
> >> As promised, I ran your patch set through my test set up and have noticed a few
> >> issues. Sorry it turned into rather a long email...
> >>
> >
> > No worries, and thanks a lot for going through the trouble.
> >
> >> First, a quick explanation of the test suite: For all valid combinations of the
> >> below parameters, boot the host kernel on the FVP, then boot the guest kernel in
> >> a VM, check that booting succeeds all the way to the guest shell then poweroff
> >> guest followed by host can check shutdown is clean.
> >>
> >> Parameters:
> >>  - hw_pa:               [48, lpa, lpa2]
> >>  - hw_va:               [48, 52]
> >>  - kvm_mode:            [vhe, nvhe, protected]
> >>  - host_page_size:      [4KB, 16KB, 64KB]
> >>  - host_pa:             [48, 52]
> >>  - host_va:             [48, 52]
> >>  - host_load_addr:      [low, high]
> >>  - guest_page_size:     [64KB]
> >>  - guest_pa:            [52]
> >>  - guest_va:            [52]
> >>  - guest_load_addr:     [low, high]
> >>
> >> When *_load_addr is 'low', that means the RAM is below 48 bits in (I)PA space.
> >> 'high' means the RAM starts at 2048TB for the guest (52 bit PA), and it means
> >> there are 2 regions for the host; one at 0x880000000000 (48 bit PA) sized to
> >> hold the kernel image only and another at 0x8800000000000 (52 bit PA) sized at
> >> 4GB. The FVP only allows RAM at certain locations and having a contiguous region
> >> cross the 48 bit boundary is not an option. So I chose these values to ensure
> >> that the linear map size is within 51 bits, which is a requirement for
> >> nhve/protected mode kvm.
> >>
> >> In all cases, I preload TF-A bl31, kernel, dt and initrd into RAM and run the
> >> FVP. This sidesteps problems with EFI needing low memory, and with the FVP's
> >> block devices needing DMA memory below 44 bits PA. bl31 and dt are appropriately
> >> fixed up for the 2 different memory layouts.
> >>
> >> Given this was designed to test my KVM changes, I was previously running these
> >> without the host_load_addr=high option for the 4k and 16k host kernels (since
> >> this requires your patch set). In this situation there are 132 valid configs and
> >> all of them pass.
> >>
> >> I then rebased my changes on top of yours and added in the host_load_addr=high
> >> option. Now there are 186 valid configs, 64 of which fail. (some of these
> >> failures are regressions). From a quick initial triage, there are 3 failure modes:
> >>
> >>
> >> 1) 18 FAILING TESTS: Host kernel never outputs anything to console
> >>
> >>   TF-A runs successfully, says it is jumping to the kernel, then nothing further
> >>   is seen. I'm pretty confident that the blobs are loaded into memory correctly
> >>   because the same framework is working for the other configs (including 64k
> >>   kernel loaded into high memory). This affects all configs where a host kernel
> >>   with 4k or 16k pages built with LPA2 support is loaded into high memory.
> >>
> >
> > Not sure how to interpret this in combination with your explanation
> > above, but if 'loaded high' means that the kernel itself is not in
> > 48-bit addressable physical memory, this failure is expected.
>
> Sorry - my wording was confusing. host_load_addr=high means what I said at the
> top; the kernel image is loaded at 0x880000000000 in a block of memory sized to
> hold the kernel image only (actually its forward aligned to 2MB). The dtb and
> initrd are loaded into a 4GB region at 0x8800000000000. The reason I'm doing
> this is to ensure that when I create a VM, the memory used for it (at least the
> vast majority) is coming from the region at 52 bits. I want to do this to prove
> that the stage2 implementation is correctly handling the 52 OA case.
>
> >
> > Given that we have no way of informing the bootloader or firmware
> > whether or not a certain kernel image supports running from such a
> > high offset, it must currently assume it cannot. We've just queued up
> > a documentation fix to clarify this in the boot protocol, i.e., that
> > the kernel must be loaded in 48-bit addressable physical memory.
>
> OK, but I think what I'm doing complies with this. Unless the DTB also has to be
> below 48 bits?
>

Ahh yes, good point. Yes, this is actually implied but we should
clarify this. Or fix it.

But the same reasoning applies: currently, a loader cannot know from
looking at a certain binary whether or not it supports addressing any
memory outside of the 48-bit addressable range, so any asset loaded
into physical memory is affected by the same limitation.

This has implications for other firmware aspects as well, e.g., ACPI tables.

> >
> > The fact that you had to doctor your boot environment to get around
> > this kind of proves my point, and unless someone is silly enough to
> > ship a SoC that cannot function without this, I don't think we should
> > add this support.
> >
> > I understand how this is an interesting case for completeness from a
> > validation pov, but the reality is that adding support for this would
> > mean introducing changes amounting to dead code to fragile boot
> > sequence code that is already hard to maintain.
>
> I'm not disagreeing. But I think what I'm doing should conform with the
> requirements? (Previously I had the tests set up to just have a single region of
> memory above 52 bits and the kernel image was placed there. That works/worked
> for the 64KB kernel. But I brought the kernel image to below 48 bits to align
> with the requirements of this patch set.
>
> If you see an easier way for me to validate 52 bit OAs in the stage 2 (and
> ideally hyp stage 1), then I'm all ears!
>

There is a Kconfig knob CONFIG_ARM64_FORCE_52BIT which was introduced
for a similar purpose, i.e., to ensure that the 52-bit range gets
utilized. I could imagine adding a similar control for KVM in
particular or for preferring allocations from outside the 48-bit PA
range in general.

> >
> >>
> >> 2) 4 FAILING TESTS: Host kernel gets stuck initializing KVM
> >>
> >>   During kernel boot, last console log is "kvm [1]: vgic interrupt IRQ9". All
> >>   failing tests are configured for protected KVM, and are build with LPA2
> >>   support, running on non-LPA2 HW.
> >>
> >
> > I will try to reproduce this locally.
> >
> >>
> >> 3) 42 FAILING TESTS: Guest kernel never outputs anything to console
> >>
> >>   Host kernel boots fine, and we attempt to launch a guest kernel using kvmtool.
> >>   There is no error reported, but the guest never outputs anything. Haven't
> >>   worked out which config options are common to all failures yet.
> >>
> >
> > This goes a bit beyond what I am currently set up for in terms of
> > testing, but I'm happy to help narrow this down.
> >
> >>
> >> Finally, I removed my code, and ran with your patch set as provided. For this I
> >> hacked up my test suite to boot the host, and ignore booting a guest. I also
> >> didn't bother to vary the KVM mode and just left it in VHE mode. There were 46
> >> valid configs here, of which 4 failed. They were all the same failure mode as
> >> (1) above. Failing configs were:
> >>
> >> id  hw_pa  hw_va  host_page_size  host_pa  host_va  host_load_addr
> >> ------------------------------------------------------------------
> >> 40  lpa    52     4k              52       52       high
> >> 45  lpa    52     16k             52       52       high
> >> 55  lpa2   52     4k              52       52       high
> >> 60  lpa2   52     16k             52       52       high
> >>
> >
> > Same point as above then, I guess.
> >
> >>
> >> So on the balance of probabilities, I think failure mode (1) is very likely to
> >> be due to a bug in your code. (2) and (3) could be my issue or your issue: I
> >> propose to dig into those a bit further and will get back to you on them. I
> >> don't plan to look any further into (1).
> >>
> >
> > Thanks again. (1) is expected, and (2) is something I will investigate further.
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-11-29 16:56       ` Ard Biesheuvel
@ 2022-12-01 12:22         ` Ryan Roberts
  2022-12-01 13:43           ` Ard Biesheuvel
  0 siblings, 1 reply; 45+ messages in thread
From: Ryan Roberts @ 2022-12-01 12:22 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

Hi Ard,

I wanted to provide a quick update on the debugging I've been doing from my end.
See below...


On 29/11/2022 16:56, Ard Biesheuvel wrote:
> On Tue, 29 Nov 2022 at 17:36, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 29/11/2022 15:47, Ard Biesheuvel wrote:
>>> On Tue, 29 Nov 2022 at 16:31, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> Hi Ard,
>>>>
>>>> As promised, I ran your patch set through my test set up and have noticed a few
>>>> issues. Sorry it turned into rather a long email...
>>>>
>>>
>>> No worries, and thanks a lot for going through the trouble.
>>>
>>>> First, a quick explanation of the test suite: For all valid combinations of the
>>>> below parameters, boot the host kernel on the FVP, then boot the guest kernel in
>>>> a VM, check that booting succeeds all the way to the guest shell then poweroff
>>>> guest followed by host can check shutdown is clean.
>>>>
>>>> Parameters:
>>>>  - hw_pa:               [48, lpa, lpa2]
>>>>  - hw_va:               [48, 52]
>>>>  - kvm_mode:            [vhe, nvhe, protected]
>>>>  - host_page_size:      [4KB, 16KB, 64KB]
>>>>  - host_pa:             [48, 52]
>>>>  - host_va:             [48, 52]
>>>>  - host_load_addr:      [low, high]
>>>>  - guest_page_size:     [64KB]
>>>>  - guest_pa:            [52]
>>>>  - guest_va:            [52]
>>>>  - guest_load_addr:     [low, high]
>>>>
>>>> When *_load_addr is 'low', that means the RAM is below 48 bits in (I)PA space.
>>>> 'high' means the RAM starts at 2048TB for the guest (52 bit PA), and it means
>>>> there are 2 regions for the host; one at 0x880000000000 (48 bit PA) sized to
>>>> hold the kernel image only and another at 0x8800000000000 (52 bit PA) sized at
>>>> 4GB. The FVP only allows RAM at certain locations and having a contiguous region
>>>> cross the 48 bit boundary is not an option. So I chose these values to ensure
>>>> that the linear map size is within 51 bits, which is a requirement for
>>>> nhve/protected mode kvm.
>>>>
>>>> In all cases, I preload TF-A bl31, kernel, dt and initrd into RAM and run the
>>>> FVP. This sidesteps problems with EFI needing low memory, and with the FVP's
>>>> block devices needing DMA memory below 44 bits PA. bl31 and dt are appropriately
>>>> fixed up for the 2 different memory layouts.
>>>>
>>>> Given this was designed to test my KVM changes, I was previously running these
>>>> without the host_load_addr=high option for the 4k and 16k host kernels (since
>>>> this requires your patch set). In this situation there are 132 valid configs and
>>>> all of them pass.
>>>>
>>>> I then rebased my changes on top of yours and added in the host_load_addr=high
>>>> option. Now there are 186 valid configs, 64 of which fail. (some of these
>>>> failures are regressions). From a quick initial triage, there are 3 failure modes:
>>>>
>>>>
>>>> 1) 18 FAILING TESTS: Host kernel never outputs anything to console
>>>>
>>>>   TF-A runs successfully, says it is jumping to the kernel, then nothing further
>>>>   is seen. I'm pretty confident that the blobs are loaded into memory correctly
>>>>   because the same framework is working for the other configs (including 64k
>>>>   kernel loaded into high memory). This affects all configs where a host kernel
>>>>   with 4k or 16k pages built with LPA2 support is loaded into high memory.
>>>>
>>>
>>> Not sure how to interpret this in combination with your explanation
>>> above, but if 'loaded high' means that the kernel itself is not in
>>> 48-bit addressable physical memory, this failure is expected.
>>
>> Sorry - my wording was confusing. host_load_addr=high means what I said at the
>> top; the kernel image is loaded at 0x880000000000 in a block of memory sized to
>> hold the kernel image only (actually its forward aligned to 2MB). The dtb and
>> initrd are loaded into a 4GB region at 0x8800000000000. The reason I'm doing
>> this is to ensure that when I create a VM, the memory used for it (at least the
>> vast majority) is coming from the region at 52 bits. I want to do this to prove
>> that the stage2 implementation is correctly handling the 52 OA case.
>>
>>>
>>> Given that we have no way of informing the bootloader or firmware
>>> whether or not a certain kernel image supports running from such a
>>> high offset, it must currently assume it cannot. We've just queued up
>>> a documentation fix to clarify this in the boot protocol, i.e., that
>>> the kernel must be loaded in 48-bit addressable physical memory.
>>
>> OK, but I think what I'm doing complies with this. Unless the DTB also has to be
>> below 48 bits?
>>
> 
> Ahh yes, good point. Yes, this is actually implied but we should
> clarify this. Or fix it.
> 
> But the same reasoning applies: currently, a loader cannot know from
> looking at a certain binary whether or not it supports addressing any
> memory outside of the 48-bit addressable range, so any asset loading
> into physical memory is affected by the same limitation.
> 
> This has implications for other firmware aspects as well, i.e., ACPI tables etc.
> 
>>>
>>> The fact that you had to doctor your boot environment to get around
>>> this kind of proves my point, and unless someone is silly enough to
>>> ship a SoC that cannot function without this, I don't think we should
>>> add this support.
>>>
>>> I understand how this is an interesting case for completeness from a
>>> validation pov, but the reality is that adding support for this would
>>> mean introducing changes amounting to dead code to fragile boot
>>> sequence code that is already hard to maintain.
>>
>> I'm not disagreeing. But I think what I'm doing should conform with the
>> requirements? (Previously I had the tests set up to just have a single region of
>> memory above 52 bits and the kernel image was placed there. That works/worked
>> for the 64KB kernel. But I brought the kernel image to below 48 bits to align
>> with the requirements of this patch set.
>>
>> If you see an easier way for me to validate 52 bit OAs in the stage 2 (and
>> ideally hyp stage 1), then I'm all ears!
>>
> 
> There is a Kconfig knob CONFIG_ARM64_FORCE_52BIT which was introduced
> for a similar purpose, i.e., to ensure that the 52-bit range gets
> utilized. I could imagine adding a similar control for KVM in
> particular or for preferring allocations from outside the 48-bit PA
> range in general.

I managed to get these all booting after moving the kernel and dtb to low
memory. I've kept the initrd in high memory for now, which works fine. (I think
the initrd memory gets freed anyway, and I didn't want any free low memory
floating around, to ensure that kvm guests get high memory.) Once booted, some
of these get converted to passes, while others still fail, but for other
reasons (see below).


> 
>>>
>>>>
>>>> 2) 4 FAILING TESTS: Host kernel gets stuck initializing KVM
>>>>
>>>>   During kernel boot, last console log is "kvm [1]: vgic interrupt IRQ9". All
>>>>   failing tests are configured for protected KVM, and are build with LPA2
>>>>   support, running on non-LPA2 HW.
>>>>
>>>
>>> I will try to reproduce this locally.

It turns out the same issue is hit when running your patches without mine on
top. The root cause in both cases is an assumption that kvm_get_parange() makes
that ID_AA64MMFR0_EL1_PARANGE_MAX will always be 48 for 4KB and 16KB PAGE_SIZE.
That is no longer true with your patches. It causes the VTCR_EL2 to be
programmed incorrectly for the host stage2, then on return to the host, bang.

This demonstrates that kernel stage1 support for LPA2 depends on kvm support for
LPA2, since for protected kvm, the host stage2 needs to be able to id map the
full physical range that the host kernel sees prior to deprivilege. So I don't
think it's fixable in your series. I have a fix in my series for this.
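
To make the failure mode concrete, here is a rough standalone sketch of the
clamp involved (my own names and simplifications, not the code from either
series; 'lpa2' is a hypothetical stand-in parameter):

  #include <stdbool.h>

  /* PARange encodings per the Arm ARM: 0b0101 = 48 bits, 0b0110 = 52 bits. */
  #define PARANGE_48  0x5
  #define PARANGE_52  0x6

  static unsigned int clamp_parange(unsigned int hw_parange,
                                    unsigned int page_shift, bool lpa2)
  {
          unsigned int max;

          if (page_shift == 16)           /* 64K pages: FEAT_LPA */
                  max = PARANGE_52;
          else                            /* 4K/16K pages */
                  max = lpa2 ? PARANGE_52 : PARANGE_48;

          /* The clamped value feeds VTCR_EL2.PS; clamping 4K/16K to 48 bits
           * unconditionally is the assumption that no longer holds. */
          return hw_parange < max ? hw_parange : max;
  }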

I also found another dependency problem (hit by some of the tests that were
previously failing at the first issue) where kvm uses its page table library to
walk a user space page table created by the kernel (see
get_user_mapping_size()). In the case where the kernel creates an LPA2 page
table, kvm can't walk it without my patches. I've also added a fix for this to
my series.
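
For reference, the leaf-descriptor difference the walker has to understand is
roughly the following (standalone sketch with my own mask helper, not the
kernel's code; bit positions per the Arm ARM):

  #include <stdint.h>

  #define BITS(h, l)  ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

  /* 64K pages + FEAT_LPA: OA[51:48] live in descriptor bits [15:12]. */
  static uint64_t oa_lpa_64k(uint64_t pte)
  {
          return (pte & BITS(47, 16)) | ((pte & BITS(15, 12)) << 36);
  }

  /* 4K pages + FEAT_LPA2: OA[49:12] stay in place, while OA[51:50] move into
   * bits [9:8], which previously held the shareability field -- so a walker
   * that doesn't know about LPA2 both drops the top OA bits and misreads the
   * attributes. */
  static uint64_t oa_lpa2_4k(uint64_t pte)
  {
          return (pte & BITS(49, 12)) | ((pte & BITS(9, 8)) << 42);
  }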


>>>
>>>>
>>>> 3) 42 FAILING TESTS: Guest kernel never outputs anything to console
>>>>
>>>>   Host kernel boots fine, and we attempt to launch a guest kernel using kvmtool.
>>>>   There is no error reported, but the guest never outputs anything. Haven't
>>>>   worked out which config options are common to all failures yet.
>>>>
>>>
>>> This goes a bit beyond what I am currently set up for in terms of
>>> testing, but I'm happy to help narrow this down.

I don't have a root cause for this yet. I'll try to take a look this afternoon.
Will keep you posted.

>>>
>>>>
>>>> Finally, I removed my code, and ran with your patch set as provided. For this I
>>>> hacked up my test suite to boot the host, and ignore booting a guest. I also
>>>> didn't bother to vary the KVM mode and just left it in VHE mode. There were 46
>>>> valid configs here, of which 4 failed. They were all the same failure mode as
>>>> (1) above. Failing configs were:
>>>>
>>>> id  hw_pa  hw_va  host_page_size  host_pa  host_va  host_load_addr
>>>> ------------------------------------------------------------------
>>>> 40  lpa    52     4k              52       52       high
>>>> 45  lpa    52     16k             52       52       high
>>>> 55  lpa2   52     4k              52       52       high
>>>> 60  lpa2   52     16k             52       52       high
>>>>
>>>
> >>> Same point as above then, I guess.
> >>>
>>>>
>>>> So on the balance of probabilities, I think failure mode (1) is very likely to
>>>> be due to a bug in your code. (2) and (3) could be my issue or your issue: I
>>>> propose to dig into those a bit further and will get back to you on them. I
>>>> don't plan to look any further into (1).
>>>>
>>>
>>> Thanks again. (1) is expected, and (2) is something I will investigate further.
>>

Once I have all the tests passing, I'll post my series, then hopefully we can
move it all forwards as one?

As part of my debugging, I've got a patch to sort out the tlbi code to support
LPA2 properly - I think I raised that comment on one of the patches. Are you
happy for me to post it as part of my series?

Thanks,
Ryan







^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-12-01 12:22         ` Ryan Roberts
@ 2022-12-01 13:43           ` Ard Biesheuvel
  2022-12-01 16:00             ` Ryan Roberts
  0 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2022-12-01 13:43 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On Thu, 1 Dec 2022 at 13:22, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> Hi Ard,
>
> I wanted to provide a quick update on the debugging I've been doing from my end.
> See below...
>
>
> On 29/11/2022 16:56, Ard Biesheuvel wrote:
> > On Tue, 29 Nov 2022 at 17:36, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 29/11/2022 15:47, Ard Biesheuvel wrote:
> >>> On Tue, 29 Nov 2022 at 16:31, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>>>
> >>>> Hi Ard,
> >>>>
> >>>> As promised, I ran your patch set through my test set up and have noticed a few
> >>>> issues. Sorry it turned into rather a long email...
> >>>>
> >>>
> >>> No worries, and thanks a lot for going through the trouble.
> >>>
> >>>> First, a quick explanation of the test suite: For all valid combinations of the
> >>>> below parameters, boot the host kernel on the FVP, then boot the guest kernel in
> >>>> a VM, check that booting succeeds all the way to the guest shell then poweroff
> >>>> guest followed by host can check shutdown is clean.
> >>>>
> >>>> Parameters:
> >>>>  - hw_pa:               [48, lpa, lpa2]
> >>>>  - hw_va:               [48, 52]
> >>>>  - kvm_mode:            [vhe, nvhe, protected]
> >>>>  - host_page_size:      [4KB, 16KB, 64KB]
> >>>>  - host_pa:             [48, 52]
> >>>>  - host_va:             [48, 52]
> >>>>  - host_load_addr:      [low, high]
> >>>>  - guest_page_size:     [64KB]
> >>>>  - guest_pa:            [52]
> >>>>  - guest_va:            [52]
> >>>>  - guest_load_addr:     [low, high]
> >>>>
> >>>> When *_load_addr is 'low', that means the RAM is below 48 bits in (I)PA space.
> >>>> 'high' means the RAM starts at 2048TB for the guest (52 bit PA), and it means
> >>>> there are 2 regions for the host; one at 0x880000000000 (48 bit PA) sized to
> >>>> hold the kernel image only and another at 0x8800000000000 (52 bit PA) sized at
> >>>> 4GB. The FVP only allows RAM at certain locations and having a contiguous region
> >>>> cross the 48 bit boundary is not an option. So I chose these values to ensure
> >>>> that the linear map size is within 51 bits, which is a requirement for
> >>>> nhve/protected mode kvm.
> >>>>
> >>>> In all cases, I preload TF-A bl31, kernel, dt and initrd into RAM and run the
> >>>> FVP. This sidesteps problems with EFI needing low memory, and with the FVP's
> >>>> block devices needing DMA memory below 44 bits PA. bl31 and dt are appropriately
> >>>> fixed up for the 2 different memory layouts.
> >>>>
> >>>> Given this was designed to test my KVM changes, I was previously running these
> >>>> without the host_load_addr=high option for the 4k and 16k host kernels (since
> >>>> this requires your patch set). In this situation there are 132 valid configs and
> >>>> all of them pass.
> >>>>
> >>>> I then rebased my changes on top of yours and added in the host_load_addr=high
> >>>> option. Now there are 186 valid configs, 64 of which fail. (some of these
> >>>> failures are regressions). From a quick initial triage, there are 3 failure modes:
> >>>>
> >>>>
> >>>> 1) 18 FAILING TESTS: Host kernel never outputs anything to console
> >>>>
> >>>>   TF-A runs successfully, says it is jumping to the kernel, then nothing further
> >>>>   is seen. I'm pretty confident that the blobs are loaded into memory correctly
> >>>>   because the same framework is working for the other configs (including 64k
> >>>>   kernel loaded into high memory). This affects all configs where a host kernel
> >>>>   with 4k or 16k pages built with LPA2 support is loaded into high memory.
> >>>>
> >>>
> >>> Not sure how to interpret this in combination with your explanation
> >>> above, but if 'loaded high' means that the kernel itself is not in
> >>> 48-bit addressable physical memory, this failure is expected.
> >>
> >> Sorry - my wording was confusing. host_load_addr=high means what I said at the
> >> top; the kernel image is loaded at 0x880000000000 in a block of memory sized to
> >> hold the kernel image only (actually its forward aligned to 2MB). The dtb and
> >> initrd are loaded into a 4GB region at 0x8800000000000. The reason I'm doing
> >> this is to ensure that when I create a VM, the memory used for it (at least the
> >> vast majority) is coming from the region at 52 bits. I want to do this to prove
> >> that the stage2 implementation is correctly handling the 52 OA case.
> >>
> >>>
> >>> Given that we have no way of informing the bootloader or firmware
> >>> whether or not a certain kernel image supports running from such a
> >>> high offset, it must currently assume it cannot. We've just queued up
> >>> a documentation fix to clarify this in the boot protocol, i.e., that
> >>> the kernel must be loaded in 48-bit addressable physical memory.
> >>
> >> OK, but I think what I'm doing complies with this. Unless the DTB also has to be
> >> below 48 bits?
> >>
> >
> > Ahh yes, good point. Yes, this is actually implied but we should
> > clarify this. Or fix it.
> >
> > But the same reasoning applies: currently, a loader cannot know from
> > looking at a certain binary whether or not it supports addressing any
> > memory outside of the 48-bit addressable range, so any asset loading
> > into physical memory is affected by the same limitation.
> >
> > This has implications for other firmware aspects as well, i.e., ACPI tables etc.
> >
> >>>
> >>> The fact that you had to doctor your boot environment to get around
> >>> this kind of proves my point, and unless someone is silly enough to
> >>> ship a SoC that cannot function without this, I don't think we should
> >>> add this support.
> >>>
> >>> I understand how this is an interesting case for completeness from a
> >>> validation pov, but the reality is that adding support for this would
> >>> mean introducing changes amounting to dead code to fragile boot
> >>> sequence code that is already hard to maintain.
> >>
> >> I'm not disagreeing. But I think what I'm doing should conform with the
> >> requirements? (Previously I had the tests set up to just have a single region of
> >> memory above 52 bits and the kernel image was placed there. That works/worked
> >> for the 64KB kernel. But I brought the kernel image to below 48 bits to align
> >> with the requirements of this patch set.
> >>
> >> If you see an easier way for me to validate 52 bit OAs in the stage 2 (and
> >> ideally hyp stage 1), then I'm all ears!
> >>
> >
> > There is a Kconfig knob CONFIG_ARM64_FORCE_52BIT which was introduced
> > for a similar purpose, i.e., to ensure that the 52-bit range gets
> > utilized. I could imagine adding a similar control for KVM in
> > particular or for preferring allocations from outside the 48-bit PA
> > range in general.
>
> I managed to get these all booting after moving the kernel and dtb to low
> memory. I've kept the initrd in high memory for now, which works fine. (I think
> the initrd memory will get freed and I didn't want any free low memory floating
> around to ensure that kvm guests get high memory).

The DT's early mapping is via the ID map, while the initrd is only
mapped much later via the kernel mappings in TTBR1, which is why it
works.

However, for the boot protocol, this distinction doesn't really
matter: if the boot stack cannot be certain that a given kernel
image was built to support 52-bit physical addressing, it simply must
never place anything there.

> Once booting, some of these
> get converted to passes, and others remain failures but for other reasons (see
> below).
>
>
> >
> >>>
> >>>>
> >>>> 2) 4 FAILING TESTS: Host kernel gets stuck initializing KVM
> >>>>
> >>>>   During kernel boot, last console log is "kvm [1]: vgic interrupt IRQ9". All
> >>>>   failing tests are configured for protected KVM, and are build with LPA2
> >>>>   support, running on non-LPA2 HW.
> >>>>
> >>>
> >>> I will try to reproduce this locally.
>
> It turns out the same issue is hit when running your patches without mine on
> top. The root cause in both cases is an assumption that kvm_get_parange() makes
> that ID_AA64MMFR0_EL1_PARANGE_MAX will always be 48 for 4KB and 16KB PAGE_SIZE.
> That is no longer true with your patches. It causes the VTCR_EL2 to be
> programmed incorrectly for the host stage2, then on return to the host, bang.
>

Right. I made an attempt at replacing 'u32 level' with 's32 level'
throughout that code, along with some related changes, but I didn't
spot this issue.
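
(For anyone following along: the reason the type needs to become signed is
that with LPA2 the 4K granule gains a translation level above level 0,
conventionally numbered -1. Illustrative sketch only, not actual kernel code:)

  #include <stdint.h>

  /* With an unsigned counter, a start_level of -1 would wrap to 0xffffffff
   * and this walk would never run (or compare nonsensically). */
  static void walk_levels(int32_t start_level)
  {
          for (int32_t level = start_level; level <= 3; level++)
                  ;       /* visit 'level' */
  }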

> This demonstrates that kernel stage1 support for LPA2 depends on kvm support for
> LPA2, since for protected kvm, the host stage2 needs to be able to id map the
> full physical range that the host kernel sees prior to deprivilege. So I don't
> think it's fixable in your series. I have a fix in my series for this.
>

The reference to ID_AA64MMFR0_EL1_PARANGE_MAX should be fixable in
isolation, no? Even if it results in a KVM that cannot use 52-bit PAs
while the host can.

> I also found another dependency problem (hit by some of the tests that were
> previously failing at the first issue) where kvm uses its page table library to
> walk a user space page table created by the kernel (see
> get_user_mapping_size()). In the case where the kernel creates an LPA2 page
> table, kvm can't walk it without my patches. I've also added a fix for this to
> my series.
>

OK

>
> >>>
> >>>>
> >>>> 3) 42 FAILING TESTS: Guest kernel never outputs anything to console
> >>>>
> >>>>   Host kernel boots fine, and we attempt to launch a guest kernel using kvmtool.
> >>>>   There is no error reported, but the guest never outputs anything. Haven't
> >>>>   worked out which config options are common to all failures yet.
> >>>>
> >>>
> >>> This goes a bit beyond what I am currently set up for in terms of
> >>> testing, but I'm happy to help narrow this down.
>
> I don't have a root cause for this yet. I'll try to take a loo this afternoon.
> Will keep you posted.
>

Thanks

> >>>
> >>>>
> >>>> Finally, I removed my code, and ran with your patch set as provided. For this I
> >>>> hacked up my test suite to boot the host, and ignore booting a guest. I also
> >>>> didn't bother to vary the KVM mode and just left it in VHE mode. There were 46
> >>>> valid configs here, of which 4 failed. They were all the same failure mode as
> >>>> (1) above. Failing configs were:
> >>>>
> >>>> id  hw_pa  hw_va  host_page_size  host_pa  host_va  host_load_addr
> >>>> ------------------------------------------------------------------
> >>>> 40  lpa    52     4k              52       52       high
> >>>> 45  lpa    52     16k             52       52       high
> >>>> 55  lpa2   52     4k              52       52       high
> >>>> 60  lpa2   52     16k             52       52       high
> >>>>
> >>>
> >>> Same point as above then, I guess.
> >>>
> >>>>
> >>>> So on the balance of probabilities, I think failure mode (1) is very likely to
> >>>> be due to a bug in your code. (2) and (3) could be my issue or your issue: I
> >>>> propose to dig into those a bit further and will get back to you on them. I
> >>>> don't plan to look any further into (1).
> >>>>
> >>>
> >>> Thanks again. (1) is expected, and (2) is something I will investigate further.
> >>
>
> Once I have all the tests passing, I'll post my series, then hopefully we can
> move it all forwards as one?
>

That would be great, yes, although my work depends on a sizable rework
of the early boot code that has seen very little review as of yet.

So for the time being, let's keep aligned but let's not put any eggs
in each other's baskets :-)

> As part of my debugging, I've got a patch to sort out the tlbi code to support
> LPA2 properly - I think I raised that comment on one of the patches. Are you
> happy for me to post as part of my series?
>

I wasn't sure where to look tbh. The generic 5-level paging stuff
seems to work fine - is this specific to KVM?


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages
  2022-12-01 13:43           ` Ard Biesheuvel
@ 2022-12-01 16:00             ` Ryan Roberts
  0 siblings, 0 replies; 45+ messages in thread
From: Ryan Roberts @ 2022-12-01 16:00 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual,
	Richard Henderson

On 01/12/2022 13:43, Ard Biesheuvel wrote:
> [...]
>>>>>>
>>>>>> 2) 4 FAILING TESTS: Host kernel gets stuck initializing KVM
>>>>>>
>>>>>>   During kernel boot, last console log is "kvm [1]: vgic interrupt IRQ9". All
>>>>>>   failing tests are configured for protected KVM, and are build with LPA2
>>>>>>   support, running on non-LPA2 HW.
>>>>>>
>>>>>
>>>>> I will try to reproduce this locally.
>>
>> It turns out the same issue is hit when running your patches without mine on
>> top. The root cause in both cases is an assumption that kvm_get_parange() makes
>> that ID_AA64MMFR0_EL1_PARANGE_MAX will always be 48 for 4KB and 16KB PAGE_SIZE.
>> That is no longer true with your patches. It causes the VTCR_EL2 to be
>> programmed incorrectly for the host stage2, then on return to the host, bang.
>>
> 
> Right. I made an attempt at replacing 'u32 level' with 's32 level'
> throughout that code, along with some related changes, but I didn't
> spot this issue.
> 
>> This demonstrates that kernel stage1 support for LPA2 depends on kvm support for
>> LPA2, since for protected kvm, the host stage2 needs to be able to id map the
>> full physical range that the host kernel sees prior to deprivilege. So I don't
>> think it's fixable in your series. I have a fix in my series for this.
>>
> 
> The reference to ID_AA64MMFR0_EL1_PARANGE_MAX should be fixable in
> isolation, no? Even if it results in a KVM that cannot use 52-bit PAs
> while the host can.

Well that doesn't really sound like a good solution to me - the VMM can't
control which physical memory it is using so it would be the luck of the draw as
to whether kvm gets passed memory above 48 PA bits. So you would probably have
to explicitly prevent kvm from initializing if you want to keep your kernel LPA2
patches decoupled from mine.

> 
>> I also found another dependency problem (hit by some of the tests that were
>> previously failing at the first issue) where kvm uses its page table library to
>> walk a user space page table created by the kernel (see
>> get_user_mapping_size()). In the case where the kernel creates an LPA2 page
>> table, kvm can't walk it without my patches. I've also added a fix for this to
>> my series.
>>
> 
> OK

Again, I think the only way to work around this without kvm understanding LPA2
is to disable KVM when the kernel is using LPA2.

> 
>>
>>>>>
>>>>>>
>>>>>> 3) 42 FAILING TESTS: Guest kernel never outputs anything to console
>>>>>>
>>>>>>   Host kernel boots fine, and we attempt to launch a guest kernel using kvmtool.
>>>>>>   There is no error reported, but the guest never outputs anything. Haven't
>>>>>>   worked out which config options are common to all failures yet.
>>>>>>
>>>>>
>>>>> This goes a bit beyond what I am currently set up for in terms of
>>>>> testing, but I'm happy to help narrow this down.
>>
>> I don't have a root cause for this yet. I'll try to take a loo this afternoon.
>> Will keep you posted.
>>

OK, found it. All these failures are when loading the guest in 'high' memory. My
test environment allocates a VM with 256M@2048T (so all memory above 48 bits
IPA) and puts the 64KB/52VA/52PA kernel, dtb and initrd there. This works
fine without your changes. I guess that as part of your change you have broken
the 64KB kernel's ability to create an ID map above 48 bits (which conforms to
your clarification of the boot protocol). I guess this was intentional? It makes
it really hard for me to test >48-bit IAs at stage 2...

> 
>>
>> Once I have all the tests passing, I'll post my series, then hopefully we can
>> move it all forwards as one?
>>
> 
> That would be great, yes, although my work depends on a sizable rework
> of the early boot code that has seen very little review as of yet.
> 
> So for the time being, let's keep aligned but let's not put any eggs
> in each other's baskets :-)
>
>> As part of my debugging, I've got a patch to sort out the tlbi code to support
>> LPA2 properly - I think I raised that comment on one of the patches. Are you
>> happy for me to post as part of my series?
>>
> 
> I wasn't sure where to look tbh. The generic 5-level paging stuff
> seems to work fine - is this specific to KVM?

There are a couple of issues that I spotted. For the range-based tlbi
instructions, when LPA2 is in use (TCR_EL1.DS=1), BaseADDR must be 64KB aligned
(when LPA2 is disabled it only needs to be page aligned). So there is some
forward alignment required using the non-range tlbi in __flush_tlb_range(). I
think this would manifest as invalidating the wrong entries if left as-is once
your patches are applied.
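
For illustration, one way the forward alignment could look (sketch only, with
stand-in names such as invalidate_one_page(); not the actual patch):

  #include <stdbool.h>
  #include <stdint.h>

  #define PAGE_SZ  4096ULL        /* 4K example */
  #define SZ_64K   0x10000ULL

  /* Stand-in for issuing a single, non-range TLBI for one page. */
  static void invalidate_one_page(uint64_t addr) { (void)addr; }

  /* When LPA2 is in use (TCR_EL1.DS=1), the range ops' BaseADDR must be 64KB
   * aligned, so invalidate the leading pages one at a time until that
   * alignment is reached, then let range-based TLBIs cover the rest. */
  static uint64_t align_for_range_tlbi(uint64_t start, uint64_t end, bool lpa2)
  {
          uint64_t addr = start;

          if (!lpa2)
                  return addr;    /* page alignment is already sufficient */

          while (addr < end && (addr & (SZ_64K - 1))) {
                  invalidate_one_page(addr);
                  addr += PAGE_SZ;
          }
          return addr;            /* hand [addr, end) to the range TLBIs */
  }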

The second problem is that __tlbi_level() uses level 0 as the "don't use level
hint" sentinel. I think this will work just fine (if slightly suboptimal) if
left as is, since the higher layers are never passing anything outside the range
[0, 3] at the moment, but if that changed and -1 was passed then it would cause
a bug.
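
To illustrate the sentinel point (standalone sketch with a hypothetical
TTL_LEVEL_UNKNOWN name; not the kernel's __tlbi_level()):

  #include <limits.h>
  #include <stdbool.h>

  /* Once -1 becomes a legal level with LPA2, "no level hint" needs an
   * out-of-band value; 0 can no longer double as that sentinel. */
  #define TTL_LEVEL_UNKNOWN  INT_MIN

  /* With the current "if (level)" style test, a caller passing -1 would be
   * taken as a real hint and (-1 & 3) == 3 would encode a bogus level-3 TTL. */
  static bool have_level_hint(int level)
  {
          return level != TTL_LEVEL_UNKNOWN;
  }

In other words, the check becomes "was a hint supplied at all" rather than
"is the level non-zero".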




^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2022-12-01 16:17 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-24 12:39 [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 01/19] arm64/mm: Simplify and document pte_to_phys() for 52 bit addresses Ard Biesheuvel
2022-11-24 12:39   ` Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 02/19] arm64/mm: Add FEAT_LPA2 specific TCR_EL1.DS field Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 03/19] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 04/19] arm64: kaslr: Adjust randomization range dynamically Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 05/19] arm64: mm: get rid of kimage_vaddr global variable Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 06/19] arm64: head: remove order argument from early mapping routine Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 07/19] arm64: mm: Handle LVA support as a CPU feature Ard Biesheuvel
2022-11-28 14:54   ` Ryan Roberts
2022-11-24 12:39 ` [PATCH v2 08/19] arm64: mm: Deal with potential ID map extension if VA_BITS > VA_BITS_MIN Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 09/19] arm64: mm: Add feature override support for LVA Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 10/19] arm64: mm: Wire up TCR.DS bit to PTE shareability fields Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 11/19] arm64: mm: Add LPA2 support to phys<->pte conversion routines Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 12/19] arm64: mm: Add definitions to support 5 levels of paging Ard Biesheuvel
2022-11-28 16:17   ` Ryan Roberts
2022-11-28 16:22     ` Ard Biesheuvel
2022-11-28 18:00       ` Marc Zyngier
2022-11-28 18:20         ` Ryan Roberts
2022-11-29 15:46       ` Ryan Roberts
2022-11-29 15:48         ` Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 13/19] arm64: mm: add 5 level paging support to G-to-nG conversion routine Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 14/19] arm64: Enable LPA2 at boot if supported by the system Ard Biesheuvel
2022-11-28 14:54   ` Ryan Roberts
2022-11-24 12:39 ` [PATCH v2 15/19] arm64: mm: Add 5 level paging support to fixmap and swapper handling Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 16/19] arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging Ard Biesheuvel
2022-11-24 17:44   ` Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 17/19] arm64: mm: Add support for folding PUDs at runtime Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 18/19] arm64: ptdump: Disregard unaddressable VA space Ard Biesheuvel
2022-11-24 12:39 ` [PATCH v2 19/19] arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs Ard Biesheuvel
2022-11-24 14:39 ` [PATCH v2 00/19] arm64: Enable LPA2 support for 4k and 16k pages Ryan Roberts
2022-11-24 17:14   ` Ard Biesheuvel
2022-11-25  9:22     ` Ryan Roberts
2022-11-25  9:35       ` Ard Biesheuvel
2022-11-25 10:07         ` Ryan Roberts
2022-11-25 10:36           ` Ard Biesheuvel
2022-11-25 14:12             ` Ryan Roberts
2022-11-25 14:19               ` Ard Biesheuvel
2022-11-29 15:31 ` Ryan Roberts
2022-11-29 15:47   ` Ard Biesheuvel
2022-11-29 16:35     ` Ryan Roberts
2022-11-29 16:56       ` Ard Biesheuvel
2022-12-01 12:22         ` Ryan Roberts
2022-12-01 13:43           ` Ard Biesheuvel
2022-12-01 16:00             ` Ryan Roberts
