* [PATCH V2] arm64/mm: Fix idmap on [16K|36VA|48PA]
@ 2021-09-28  5:33 ` Anshuman Khandual
From: Anshuman Khandual @ 2021-09-28  5:33 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Will Deacon, James Morse,
	Marc Zyngier, Mark Rutland, linux-kernel

When creating the idmap, the kernel may add one extra page table level to
map idmap memory that lies outside the VA range. But for [16K|36VA|48PA],
two extra levels are needed to reach 48 bits. If the bootloader places the
kernel in memory above (1 << 46), the kernel will fail to enable the MMU.
Although we are not aware of a platform where this happens, it is worth
accommodating such scenarios to prevent a possible kernel crash.

Let's fix this problem by carefully analyzing the existing VA_BITS with
respect to the maximum mapping possible at the existing PGDIR level, i.e.
(PGDIR_SHIFT + PAGE_SHIFT - 3), and then evaluating how many extra page
table levels are required to accommodate the reduced idmap_t0sz and map
__idmap_text_end.
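
For the [16K|36VA|48PA] case the arithmetic works out as follows (a worked
example for illustration, using the 16K granule constants):

	PAGE_SHIFT  = 14	/* 16K granule: 11 index bits per level   */
	PGDIR_SHIFT = 25	/* 14 + 11: two levels cover 36 VA bits   */
	EXTRA_SHIFT = 36	/* PGDIR_SHIFT + PAGE_SHIFT - 3 == VA_BITS */

A single extra level above the PGD indexes only 11 more bits, so reaching
the full 48-bit PA range requires a second extra level at EXTRA_SHIFT_1 = 47.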

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Fixes: 215399392fe4 ("arm64: 36 bit VA")
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
This applies on v5.15-rc3.

This is a different approach compared to V1, which still applies on the
latest mainline. Besides, this approach also enables all upcoming
FEAT_LPA2 combinations. Please do suggest which approach would be
preferred.

- Anshuman

 V1:  https://lore.kernel.org/all/1627879359-30303-1-git-send-email-anshuman.khandual@arm.com/
RFC: https://lore.kernel.org/lkml/1627019894-14819-1-git-send-email-anshuman.khandual@arm.com/
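
For reference, the shift_to_ptrs assembler macro added below computes the
number of table entries needed at the level with the given shift, for the
runtime-reduced idmap_t0sz. In C it would read roughly as follows (an
illustrative sketch, not part of the patch):

	/*
	 * Entries needed at the level with the given shift, for an
	 * address space of (64 - idmap_t0sz) bits. E.g. idmap_t0sz = 16
	 * (a 48-bit space) and shift = 47 yields 2 entries.
	 */
	ptrs = 1UL << (64 - (idmap_t0sz + shift));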

 arch/arm64/include/asm/assembler.h |  9 ++++++++
 arch/arm64/kernel/head.S           | 46 +++++++++++++++++++++++---------------
 2 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index bfa5840..e5b5d3a 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -25,6 +25,15 @@
 #include <asm/ptrace.h>
 #include <asm/thread_info.h>
 
+	.macro shift_to_ptrs, ptrs, shift, tmp, tmp1
+	ldr_l   \tmp1, idmap_t0sz
+	add     \tmp1, \tmp1, \shift
+	mov     \tmp, #64
+	sub     \tmp, \tmp, \tmp1
+	mov     \ptrs, #1
+	lsl     \ptrs, \ptrs, \tmp
+	.endm
+
 	/*
 	 * Provide a wxN alias for each wN register so that we can paste an xN
 	 * reference after a 'w' to obtain the 32-bit version.
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 1796245..b93d50d 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -328,30 +328,40 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 	dmb	sy
 	dc	ivac, x6		// Invalidate potentially stale cache line
 
-#if (VA_BITS < 48)
 #define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
-#define EXTRA_PTRS	(1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))
-
-	/*
-	 * If VA_BITS < 48, we have to configure an additional table level.
-	 * First, we have to verify our assumption that the current value of
-	 * VA_BITS was chosen such that all translation levels are fully
-	 * utilised, and that lowering T0SZ will always result in an additional
-	 * translation level to be configured.
-	 */
-#if VA_BITS != EXTRA_SHIFT
+#define EXTRA_SHIFT_1	(EXTRA_SHIFT + PAGE_SHIFT - 3)
+#if (VA_BITS > EXTRA_SHIFT)
 #error "Mismatch between VA_BITS and page size/number of translation levels"
 #endif
 
-	mov	x4, EXTRA_PTRS
+#if (VA_BITS == EXTRA_SHIFT)
+	mov	x6, #TCR_T0SZ(VA_BITS_MIN)
+	sub	x6, x6, x5
+	cmp	x6, #(PAGE_SHIFT - 3)
+	b.gt	8f
+
+	shift_to_ptrs x4, EXTRA_SHIFT, x5, x6
 	create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
-#else
-	/*
-	 * If VA_BITS == 48, we don't have to configure an additional
-	 * translation level, but the top-level table has more entries.
-	 */
-	mov	x4, #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT)
+	b	1f
+8:
+	shift_to_ptrs x4, EXTRA_SHIFT_1, x5, x6
+	create_table_entry x0, x3, EXTRA_SHIFT_1, x4, x5, x6
+
+	mov     x4, PTRS_PER_PTE
+	create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
+#elif (VA_BITS < EXTRA_SHIFT)
+	mov	x6, #64
+	sub	x6, x6, x5
+	cmp	x6, EXTRA_SHIFT
+	b.eq	1f
+	b.gt	9f
+
+	shift_to_ptrs x4, PGDIR_SHIFT, x5, x6
 	str_l	x4, idmap_ptrs_per_pgd, x5
+	b	1f
+9:
+	shift_to_ptrs x4, EXTRA_SHIFT, x5, x6
+	create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
 #endif
 1:
 	ldr_l	x4, idmap_ptrs_per_pgd
-- 
2.7.4


* Re: [PATCH V2] arm64/mm: Fix idmap on [16K|36VA|48PA]
  2021-09-28  5:33 ` Anshuman Khandual
@ 2021-10-04 10:49   ` Mark Rutland
From: Mark Rutland @ 2021-10-04 10:49 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, James Morse,
	Marc Zyngier, linux-kernel

Hi Anshuman,

On Tue, Sep 28, 2021 at 11:03:45AM +0530, Anshuman Khandual wrote:
> When creating the idmap, the kernel may add one extra page table level to
> map idmap memory that lies outside the VA range. But for [16K|36VA|48PA],
> two extra levels are needed to reach 48 bits. If the bootloader places the
> kernel in memory above (1 << 46), the kernel will fail to enable the MMU.
> Although we are not aware of a platform where this happens, it is worth
> accommodating such scenarios to prevent a possible kernel crash.

I think it's worth noting here that ARM64_VA_BITS_36 depends on EXPERT,
so very few people are likely to be using this configuration.

> Let's fix this problem by carefully analyzing the existing VA_BITS with
> respect to the maximum mapping possible at the existing PGDIR level, i.e.
> (PGDIR_SHIFT + PAGE_SHIFT - 3), and then evaluating how many extra page
> table levels are required to accommodate the reduced idmap_t0sz and map
> __idmap_text_end.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: James Morse <james.morse@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: 215399392fe4 ("arm64: 36 bit VA")
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> This applies on v5.15-rc3.
> 
> This is a different approach compared to V1, which still applies on the
> latest mainline. Besides, this approach also enables all upcoming
> FEAT_LPA2 combinations. Please do suggest which approach would be
> preferred.
> - Anshuman
> 
>  V1:  https://lore.kernel.org/all/1627879359-30303-1-git-send-email-anshuman.khandual@arm.com/
> RFC: https://lore.kernel.org/lkml/1627019894-14819-1-git-send-email-anshuman.khandual@arm.com/

If we need something to backport, I'm not opposed to taking one of these
patches (and as v1 is simpler, I'd prefer that), but I think either
approach is further bodging around the `map_memory` macro not being a
great fit for the idmap creation, and it would be better to rework the
structure of the pagetable creation code to do the right thing from the
outset.

Catalin, Will, do you have any preference as to having a backportable
fix for this?

Ignoring backports, I'd prefer if we could refactor things such that we
decouple the `idmap_pg_dir` creation from the `init_pg_dir` creation,
and create the idmap in terms of the architectural levels rather than
pgd/p4d/pud/pmd/pte, so that we can consistently create the idmap with
at least 48 bits of VA.

Pseudo-code wise, I'd like something that looks like:

	create_idmap(...)
	{
		idmap_va_bits = 48;
		idmap_t0size = TCR_T0SZ(48);

		if (need_52_bit_va(__idmap_text_start)) {
			if (!supports_52bit_va()) {
				some_early_spin_loop();
			}
			idmap_va_bits = 52;
			idmap_t0size = TCR_T0SZ(52);
		}
	
		if (need_table_level(idmap_va_bits, -1))
			create_table_level(-1, ...);

		if (need_table_level(idmap_va_bits, 0))
			create_table_level(0, ...);

		if (need_table_level(idmap_va_bits, 1))
			create_table_level(1, ...);

		if (need_table_level(idmap_va_bits, 2))
			create_table_level(2, ...);

		create_table_level(3, ...);
	}

... which I think would be much easier to reason about consistently.

How does that sound to you?

I've pushed some preparatory rework out to my arm64/pgtable/idmap
branch, splitting out a __create_idmap_tables() function (and ensuring
that idmap_t0sz doesn't get silently overridden elsewhere):

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/pgtable/idmap
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/pgtable/idmap

... but I haven't had the chance to do the actual rework of the idmap
creation code.

I can send that as a series if that's helpful.

Thanks,
Mark.

>  arch/arm64/include/asm/assembler.h |  9 ++++++++
>  arch/arm64/kernel/head.S           | 46 +++++++++++++++++++++++---------------
>  2 files changed, 37 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index bfa5840..e5b5d3a 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -25,6 +25,15 @@
>  #include <asm/ptrace.h>
>  #include <asm/thread_info.h>
>  
> +	.macro shift_to_ptrs, ptrs, shift, tmp, tmp1
> +	ldr_l   \tmp1, idmap_t0sz
> +	add     \tmp1, \tmp1, \shift
> +	mov     \tmp, #64
> +	sub     \tmp, \tmp, \tmp1
> +	mov     \ptrs, #1
> +	lsl     \ptrs, \ptrs, \tmp
> +	.endm
> +
>  	/*
>  	 * Provide a wxN alias for each wN register so that we can paste an xN
>  	 * reference after a 'w' to obtain the 32-bit version.
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 1796245..b93d50d 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -328,30 +328,40 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
>  	dmb	sy
>  	dc	ivac, x6		// Invalidate potentially stale cache line
>  
> -#if (VA_BITS < 48)
>  #define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
> -#define EXTRA_PTRS	(1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))
> -
> -	/*
> -	 * If VA_BITS < 48, we have to configure an additional table level.
> -	 * First, we have to verify our assumption that the current value of
> -	 * VA_BITS was chosen such that all translation levels are fully
> -	 * utilised, and that lowering T0SZ will always result in an additional
> -	 * translation level to be configured.
> -	 */
> -#if VA_BITS != EXTRA_SHIFT
> +#define EXTRA_SHIFT_1	(EXTRA_SHIFT + PAGE_SHIFT - 3)
> +#if (VA_BITS > EXTRA_SHIFT)
>  #error "Mismatch between VA_BITS and page size/number of translation levels"
>  #endif
>  
> -	mov	x4, EXTRA_PTRS
> +#if (VA_BITS == EXTRA_SHIFT)
> +	mov	x6, #TCR_T0SZ(VA_BITS_MIN)
> +	sub	x6, x6, x5
> +	cmp	x6, #(PAGE_SHIFT - 3)
> +	b.gt	8f
> +
> +	shift_to_ptrs x4, EXTRA_SHIFT, x5, x6
>  	create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
> -#else
> -	/*
> -	 * If VA_BITS == 48, we don't have to configure an additional
> -	 * translation level, but the top-level table has more entries.
> -	 */
> -	mov	x4, #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT)
> +	b	1f
> +8:
> +	shift_to_ptrs x4, EXTRA_SHIFT_1, x5, x6
> +	create_table_entry x0, x3, EXTRA_SHIFT_1, x4, x5, x6
> +
> +	mov     x4, PTRS_PER_PTE
> +	create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
> +#elif (VA_BITS < EXTRA_SHIFT)
> +	mov	x6, #64
> +	sub	x6, x6, x5
> +	cmp	x6, EXTRA_SHIFT
> +	b.eq	1f
> +	b.gt	9f
> +
> +	shift_to_ptrs x4, PGDIR_SHIFT, x5, x6
>  	str_l	x4, idmap_ptrs_per_pgd, x5
> +	b	1f
> +9:
> +	shift_to_ptrs x4, EXTRA_SHIFT, x5, x6
> +	create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
>  #endif
>  1:
>  	ldr_l	x4, idmap_ptrs_per_pgd
> -- 
> 2.7.4
> 

* Re: [PATCH V2] arm64/mm: Fix idmap on [16K|36VA|48PA]
  2021-10-04 10:49   ` Mark Rutland
@ 2021-10-04 17:23     ` Will Deacon
From: Will Deacon @ 2021-10-04 17:23 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Anshuman Khandual, linux-arm-kernel, Catalin Marinas,
	James Morse, Marc Zyngier, linux-kernel

On Mon, Oct 04, 2021 at 11:49:58AM +0100, Mark Rutland wrote:
> Hi Anshuman,
> 
> On Tue, Sep 28, 2021 at 11:03:45AM +0530, Anshuman Khandual wrote:
> > When creating the idmap, the kernel may add one extra page table level to
> > map idmap memory that lies outside the VA range. But for [16K|36VA|48PA],
> > two extra levels are needed to reach 48 bits. If the bootloader places the
> > kernel in memory above (1 << 46), the kernel will fail to enable the MMU.
> > Although we are not aware of a platform where this happens, it is worth
> > accommodating such scenarios to prevent a possible kernel crash.
> 
> I think it's worth noting here that ARM64_VA_BITS_36 depends on EXPERT,
> so very few people are likely to be using this configuration.
> 
> > Let's fix this problem by carefully analyzing the existing VA_BITS with
> > respect to the maximum mapping possible at the existing PGDIR level, i.e.
> > (PGDIR_SHIFT + PAGE_SHIFT - 3), and then evaluating how many extra page
> > table levels are required to accommodate the reduced idmap_t0sz and map
> > __idmap_text_end.
> > 
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: James Morse <james.morse@arm.com>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-kernel@vger.kernel.org
> > Fixes: 215399392fe4 ("arm64: 36 bit VA")
> > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> > ---
> > This applies on v5.15-rc3.
> > 
> > This is a different approach compared to V1, which still applies on the
> > latest mainline. Besides, this approach also enables all upcoming
> > FEAT_LPA2 combinations. Please do suggest which approach would be
> > preferred.
> > - Anshuman
> > 
> >  V1:  https://lore.kernel.org/all/1627879359-30303-1-git-send-email-anshuman.khandual@arm.com/
> > RFC: https://lore.kernel.org/lkml/1627019894-14819-1-git-send-email-anshuman.khandual@arm.com/
> 
> If we need something to backport, I'm not opposed to taking one of these
> patches (and as v1 is simpler, I'd prefer that), but I think either
> approach is further bodging around the `map_memory` macro not being a
> great fit for the idmap creation, and it would be better to rework the
> structure of the pagetable creation code to do the right thing from the
> outset.
> 
> Catalin, Will, do you have any preference as to having a backportable
> fix for this?

Given that I don't think this has ever worked (i.e. this isn't a
regression), I don't see the need to backport the fix unless somebody is
running into real problems.

Will

* Re: [PATCH V2] arm64/mm: Fix idmap on [16K|36VA|48PA]
  2021-10-04 10:49   ` Mark Rutland
@ 2021-10-12  4:49     ` Anshuman Khandual
From: Anshuman Khandual @ 2021-10-12  4:49 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, James Morse,
	Marc Zyngier, linux-kernel



On 10/4/21 4:19 PM, Mark Rutland wrote:
> Hi Anshuman,
> 
> On Tue, Sep 28, 2021 at 11:03:45AM +0530, Anshuman Khandual wrote:
>> When creating the idmap, the kernel may add one extra page table level to
>> map idmap memory that lies outside the VA range. But for [16K|36VA|48PA],
>> two extra levels are needed to reach 48 bits. If the bootloader places the
>> kernel in memory above (1 << 46), the kernel will fail to enable the MMU.
>> Although we are not aware of a platform where this happens, it is worth
>> accommodating such scenarios to prevent a possible kernel crash.
> 
> I think it's worth noting here that ARM64_VA_BITS_36 depends on EXPERT,
> so very few people are likely to be using this configuration.
> 
>> Let's fix this problem by carefully analyzing the existing VA_BITS with
>> respect to the maximum mapping possible at the existing PGDIR level, i.e.
>> (PGDIR_SHIFT + PAGE_SHIFT - 3), and then evaluating how many extra page
>> table levels are required to accommodate the reduced idmap_t0sz and map
>> __idmap_text_end.
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: James Morse <james.morse@arm.com>
>> Cc: Marc Zyngier <maz@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Fixes: 215399392fe4 ("arm64: 36 bit VA")
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>> This applies on v5.15-rc3.
>>
>> This is a different approach compared to V1, which still applies on the
>> latest mainline. Besides, this approach also enables all upcoming
>> FEAT_LPA2 combinations. Please do suggest which approach would be
>> preferred.
>> - Anshuman
>>
>>  V1:  https://lore.kernel.org/all/1627879359-30303-1-git-send-email-anshuman.khandual@arm.com/
>> RFC: https://lore.kernel.org/lkml/1627019894-14819-1-git-send-email-anshuman.khandual@arm.com/
> 
> If we need something to backport, I'm not opposed to taking one of these
> patches (and as v1 is simpler, I'd prefer that), but I think either
> approach is further bodging around the `map_memory` macro not being a
> great fit for the idmap creation, and it would be better to rework the
> structure of the pagetable creation code to do the right thing from the
> outset.

Agreed. The idmap page table creation does not have much in common with
that of init_pg_dir, so a simpler, dedicated idmap table creation would
probably be better.

> 
> Catalin, Will, do you have any preference as to having a backportable
> fix for this?

I would have preferred to have at least V1 merged and backported, but
Will has already answered that question on the thread.

> 
> Ignoring backports, I'd prefer if we could refactor things such that we
> decouple the `idmap_pg_dir` creation from the `init_pg_dir` creation,

Decoupling the creation of the two page tables makes sense.

> and create the idmap in terms of the architectural levels rather than
> pgd/p4d/pud/pmd/pte, so that we can consistently create the idmap with
> at least 48 bits of VA.

Is the rationale for creating the idmap table in terms of architectural
levels, rather than kernel pgd/p4d/pud/pmd/pte, to avoid handling the
page table folding logic and to keep things simpler?

> 
> Pseudo-code wise, I'd like something that looks like:
> 
> 	create_idmap(...)
> 	{
> 		idmap_va_bits = 48;
> 		idmap_t0size = TCR_T0SZ(48);
> 
> 		if (need_52_bit_va(__idmap_text_start)) {

s/__idmap_text_start/__idmap_text_end/ instead ?

> 			if (!supports_52bit_va()) {
> 				some_early_spin_loop();

With a new CPU_STUCK_REASON_ code ?

> 			}
> 			idmap_va_bits = 52;
> 			idmap_t0size = TCR_T0SZ(52);
> 		}
> 	
> 		if (need_table_level(idmap_va_bits, -1))
> 			create_table_level(-1, ...);
> 
> 		if (need_table_level(idmap_va_bits, 0))
> 			create_table_level(0, ...);
> 
> 		if (need_table_level(idmap_va_bits, 1))
> 			create_table_level(1, ...);
> 
> 		if (need_table_level(idmap_va_bits, 2))
> 			create_table_level(2, ...);
> 
> 		create_table_level(3, ...);
> 	}
> 
> ... which I think would be much easier to reason about consistently.
> 
> How does that sound to you?

This approach will be simpler and, as you mentioned, easier to reason about.

> 
> I've pushed some preparatory rework out to my arm64/pgtable/idmap
> branch, splitting out a __create_idmap_tables() function (and ensuring
> that idmap_t0sz doesn't get silently overridden elsewhere):
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/pgtable/idmap
>   git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/pgtable/idmap
> 
> ... but I haven't had the chance to do the actual rework of the idmap
> creation code.
> 
> I can send that as a series if that's helpful.

I could also just pick those changes from the above branch and complete
the rework.

> 
> Thanks,
> Mark.
> 
>>  arch/arm64/include/asm/assembler.h |  9 ++++++++
>>  arch/arm64/kernel/head.S           | 46 +++++++++++++++++++++++---------------
>>  2 files changed, 37 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> index bfa5840..e5b5d3a 100644
>> --- a/arch/arm64/include/asm/assembler.h
>> +++ b/arch/arm64/include/asm/assembler.h
>> @@ -25,6 +25,15 @@
>>  #include <asm/ptrace.h>
>>  #include <asm/thread_info.h>
>>  
>> +	.macro shift_to_ptrs, ptrs, shift, tmp, tmp1
>> +	ldr_l   \tmp1, idmap_t0sz
>> +	add     \tmp1, \tmp1, \shift
>> +	mov     \tmp, #64
>> +	sub     \tmp, \tmp, \tmp1
>> +	mov     \ptrs, #1
>> +	lsl     \ptrs, \ptrs, \tmp
>> +	.endm
>> +
>>  	/*
>>  	 * Provide a wxN alias for each wN register so that we can paste an xN
>>  	 * reference after a 'w' to obtain the 32-bit version.
>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> index 1796245..b93d50d 100644
>> --- a/arch/arm64/kernel/head.S
>> +++ b/arch/arm64/kernel/head.S
>> @@ -328,30 +328,40 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
>>  	dmb	sy
>>  	dc	ivac, x6		// Invalidate potentially stale cache line
>>  
>> -#if (VA_BITS < 48)
>>  #define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
>> -#define EXTRA_PTRS	(1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))
>> -
>> -	/*
>> -	 * If VA_BITS < 48, we have to configure an additional table level.
>> -	 * First, we have to verify our assumption that the current value of
>> -	 * VA_BITS was chosen such that all translation levels are fully
>> -	 * utilised, and that lowering T0SZ will always result in an additional
>> -	 * translation level to be configured.
>> -	 */
>> -#if VA_BITS != EXTRA_SHIFT
>> +#define EXTRA_SHIFT_1	(EXTRA_SHIFT + PAGE_SHIFT - 3)
>> +#if (VA_BITS > EXTRA_SHIFT)
>>  #error "Mismatch between VA_BITS and page size/number of translation levels"
>>  #endif
>>  
>> -	mov	x4, EXTRA_PTRS
>> +#if (VA_BITS == EXTRA_SHIFT)
>> +	mov	x6, #TCR_T0SZ(VA_BITS_MIN)
>> +	sub	x6, x6, x5
>> +	cmp	x6, #(PAGE_SHIFT - 3)
>> +	b.gt	8f
>> +
>> +	shift_to_ptrs x4, EXTRA_SHIFT, x5, x6
>>  	create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
>> -#else
>> -	/*
>> -	 * If VA_BITS == 48, we don't have to configure an additional
>> -	 * translation level, but the top-level table has more entries.
>> -	 */
>> -	mov	x4, #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT)
>> +	b	1f
>> +8:
>> +	shift_to_ptrs x4, EXTRA_SHIFT_1, x5, x6
>> +	create_table_entry x0, x3, EXTRA_SHIFT_1, x4, x5, x6
>> +
>> +	mov     x4, PTRS_PER_PTE
>> +	create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
>> +#elif (VA_BITS < EXTRA_SHIFT)
>> +	mov	x6, #64
>> +	sub	x6, x6, x5
>> +	cmp	x6, EXTRA_SHIFT
>> +	b.eq	1f
>> +	b.gt	9f
>> +
>> +	shift_to_ptrs x4, PGDIR_SHIFT, x5, x6
>>  	str_l	x4, idmap_ptrs_per_pgd, x5
>> +	b	1f
>> +9:
>> +	shift_to_ptrs x4, EXTRA_SHIFT, x5, x6
>> +	create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
>>  #endif
>>  1:
>>  	ldr_l	x4, idmap_ptrs_per_pgd
>> -- 
>> 2.7.4
>>

* Re: [PATCH V2] arm64/mm: Fix idmap on [16K|36VA|48PA]
  2021-10-12  4:49     ` Anshuman Khandual
@ 2021-10-12  9:49       ` Mark Rutland
From: Mark Rutland @ 2021-10-12  9:49 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, James Morse,
	Marc Zyngier, linux-kernel

On Tue, Oct 12, 2021 at 10:19:49AM +0530, Anshuman Khandual wrote:
> On 10/4/21 4:19 PM, Mark Rutland wrote:
> > On Tue, Sep 28, 2021 at 11:03:45AM +0530, Anshuman Khandual wrote:
> > Ignoring backports, I'd prefer if we could refactor things such that we
> > decouple the `idmap_pg_dir` creation from the `init_pg_dir` creation,
> 
> Decoupling the creation of the two page tables makes sense.
> 
> > and create the idmap in terms of the architectural levels rather than
> > pgd/p4d/pud/pmd/pte, so that we can consistently create the idmap with
> > at least 48 bits of VA.
> 
> Is the rationale for creating the idmap table in terms of architectural
> levels, rather than kernel pgd/p4d/pud/pmd/pte, to avoid handling the
> page table folding logic and to keep things simpler?

Yup; decoupling it from the pgd/p4d/pud/pmd/pte levels means that we can
avoid the conditional extension logic (which always has to be precisely
the inverse of the regular table creation code), and means we can always
map the idmap at page granularity regardless of whether we're using
section mappings for the init_pg_dir.

> > Pseudo-code wise, I'd like something that looks like:
> > 
> > 	create_idmap(...)
> > 	{
> > 		idmap_va_bits = 48;
> > 		idmap_t0size = TCR_T0SZ(48);
> > 
> > 		if (need_52_bit_va(__idmap_text_start)) {
> 
> s/__idmap_text_start/__idmap_text_end/ instead ?

I don't think that's necessary. The idmap is always 4K aligned, and we
force it to be no more than 4K in size, so as long as we can map the
start we can map the rest.

If the idmap were exactly 4K in size and happened to be placed at PA
0x0000ffff_fffff000, we could map it with a 48-bit VA, but checking
__idmap_text_end would force a 52-bit VA.

That shouldn't really matter either way since the idmap is in the middle
of the .text section, but I'd prefer we just check the start address so
that we're using the same address we'll use to create the mapping from.
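
In other words, the check could be as simple as this sketch of the
hypothetical need_52_bit_va() helper from the pseudo-code above (assuming
the idmap text is 4K-aligned and at most 4K in size):

	/* A 48-bit idmap VA suffices iff the idmap text starts below 1 << 48. */
	static bool need_52_bit_va(phys_addr_t start)
	{
		return start >= (1UL << 48);
	}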

> 
> > 			if (!supports_52bit_va()) {
> > 				some_early_spin_loop();
> 
> With a new CPU_STUCK_REASON_ code ?

I'm happy with reusing CPU_STUCK_REASON_52_BIT_VA, or with adding some
new CPU_STUCK_REASON_IDMAP_${whatever}.

> > 			}
> > 			idmap_va_bits = 52;
> > 			idmap_t0size = TCR_T0SZ(52);
> > 		}
> > 	
> > 		if (need_table_level(idmap_va_bits, -1))
> > 			create_table_level(-1, ...);
> > 
> > 		if (need_table_level(idmap_va_bits, 0))
> > 			create_table_level(0, ...);
> > 
> > 		if (need_table_level(idmap_va_bits, 1))
> > 			create_table_level(1, ...);
> > 
> > 		if (need_table_level(idmap_va_bits, 2))
> > 			create_table_level(2, ...);
> > 
> > 		create_table_level(3, ...);
> > 	}
> > 
> > ... which I think would be much easier to reason about consistently.
> > 
> > How does that sound to you?
> 
> This approach will be simpler and, as you mentioned, easier to reason about.
> 
> > 
> > I've pushed some preparatory rework out to my arm64/pgtable/idmap
> > branch, splitting out a __create_idmap_tables() function (and ensuring
> > that idmap_t0sz doesn't get silently overridden elsewhere):
> > 
> >   https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/pgtable/idmap
> >   git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/pgtable/idmap
> > 
> > ... but I haven't had the chance to do the actual rework of the idmap
> > creation code.
> > 
> > I can send that as a series if that's helpful.
> 
> I could also just pick those changes from the above branch and complete
> the rework.

I'm happy with that if it works for you.

Thanks,
Mark.

Thread overview: 5 messages
2021-09-28  5:33 [PATCH V2] arm64/mm: Fix idmap on [16K|36VA|48PA] Anshuman Khandual
2021-10-04 10:49 ` Mark Rutland
2021-10-04 17:23   ` Will Deacon
2021-10-12  4:49   ` Anshuman Khandual
2021-10-12  9:49     ` Mark Rutland
