* [PATCH v4 01/26] arm64: head: move kimage_vaddr variable into C file
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-14 8:26 ` Anshuman Khandual
2022-06-13 14:45 ` [PATCH v4 02/26] arm64: mm: make vabits_actual a build time constant if possible Ard Biesheuvel
` (25 subsequent siblings)
26 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
This variable definition does not need to be in head.S so move it out.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/head.S | 7 -------
arch/arm64/mm/mmu.c | 3 +++
2 files changed, 3 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 6a98f1a38c29..1cdecce552bb 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -469,13 +469,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
ASM_BUG()
SYM_FUNC_END(__primary_switched)
- .pushsection ".rodata", "a"
-SYM_DATA_START(kimage_vaddr)
- .quad _text
-SYM_DATA_END(kimage_vaddr)
-EXPORT_SYMBOL(kimage_vaddr)
- .popsection
-
/*
* end early head section, begin head code that is also used for
* hotplug and needs to have the same protections as the text region
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index c5563ff990da..7148928e3932 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -49,6 +49,9 @@ u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
u64 __section(".mmuoff.data.write") vabits_actual;
EXPORT_SYMBOL(vabits_actual);
+u64 kimage_vaddr __ro_after_init = (u64)&_text;
+EXPORT_SYMBOL(kimage_vaddr);
+
u64 kimage_voffset __ro_after_init;
EXPORT_SYMBOL(kimage_voffset);
--
2.30.2
^ permalink raw reply related [flat|nested] 57+ messages in thread
* Re: [PATCH v4 01/26] arm64: head: move kimage_vaddr variable into C file
2022-06-13 14:45 ` [PATCH v4 01/26] arm64: head: move kimage_vaddr variable into C file Ard Biesheuvel
@ 2022-06-14 8:26 ` Anshuman Khandual
0 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2022-06-14 8:26 UTC (permalink / raw)
To: Ard Biesheuvel, linux-arm-kernel
Cc: linux-hardening, Marc Zyngier, Will Deacon, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown
On 6/13/22 20:15, Ard Biesheuvel wrote:
> This variable definition does not need to be in head.S so move it out.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> arch/arm64/kernel/head.S | 7 -------
> arch/arm64/mm/mmu.c | 3 +++
> 2 files changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 6a98f1a38c29..1cdecce552bb 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -469,13 +469,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
> ASM_BUG()
> SYM_FUNC_END(__primary_switched)
>
> - .pushsection ".rodata", "a"
> -SYM_DATA_START(kimage_vaddr)
> - .quad _text
> -SYM_DATA_END(kimage_vaddr)
> -EXPORT_SYMBOL(kimage_vaddr)
> - .popsection
> -
> /*
> * end early head section, begin head code that is also used for
> * hotplug and needs to have the same protections as the text region
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index c5563ff990da..7148928e3932 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -49,6 +49,9 @@ u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
> u64 __section(".mmuoff.data.write") vabits_actual;
> EXPORT_SYMBOL(vabits_actual);
>
> +u64 kimage_vaddr __ro_after_init = (u64)&_text;
> +EXPORT_SYMBOL(kimage_vaddr);
> +
> u64 kimage_voffset __ro_after_init;
> EXPORT_SYMBOL(kimage_voffset);
>
* [PATCH v4 02/26] arm64: mm: make vabits_actual a build time constant if possible
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 01/26] arm64: head: move kimage_vaddr variable into C file Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-14 8:25 ` Anshuman Khandual
2022-06-13 14:45 ` [PATCH v4 03/26] arm64: head: move assignment of idmap_t0sz to C code Ard Biesheuvel
` (24 subsequent siblings)
26 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Currently, we only support 52-bit virtual addressing on 64k page
configurations, and in all other cases, vabits_actual is guaranteed to
equal VA_BITS (== VA_BITS_MIN). So get rid of the variable entirely in
that case.
While at it, move the assignment out of the asm entry code - it has no
need to be there.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/memory.h | 4 ++++
arch/arm64/kernel/head.S | 15 +--------------
arch/arm64/mm/mmu.c | 15 ++++++++++++++-
3 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0af70d9abede..c751cd9b94f8 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -174,7 +174,11 @@
#include <linux/types.h>
#include <asm/bug.h>
+#if VA_BITS > 48
extern u64 vabits_actual;
+#else
+#define vabits_actual ((u64)VA_BITS)
+#endif
extern s64 memstart_addr;
/* PHYS_OFFSET - the physical address of the start of memory. */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 1cdecce552bb..dc07858eb673 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -293,19 +293,6 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
adrp x0, idmap_pg_dir
adrp x3, __idmap_text_start // __pa(__idmap_text_start)
-#ifdef CONFIG_ARM64_VA_BITS_52
- mrs_s x6, SYS_ID_AA64MMFR2_EL1
- and x6, x6, #(0xf << ID_AA64MMFR2_LVA_SHIFT)
- mov x5, #52
- cbnz x6, 1f
-#endif
- mov x5, #VA_BITS_MIN
-1:
- adr_l x6, vabits_actual
- str x5, [x6]
- dmb sy
- dc ivac, x6 // Invalidate potentially stale cache line
-
/*
* VA_BITS may be too small to allow for an ID mapping to be created
* that covers system RAM if that is located sufficiently high in the
@@ -713,7 +700,7 @@ SYM_FUNC_START(__enable_mmu)
SYM_FUNC_END(__enable_mmu)
SYM_FUNC_START(__cpu_secondary_check52bitva)
-#ifdef CONFIG_ARM64_VA_BITS_52
+#if VA_BITS > 48
ldr_l x0, vabits_actual
cmp x0, #52
b.ne 2f
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 7148928e3932..17b339c1a326 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -46,8 +46,10 @@
u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
-u64 __section(".mmuoff.data.write") vabits_actual;
+#if VA_BITS > 48
+u64 vabits_actual __ro_after_init = VA_BITS_MIN;
EXPORT_SYMBOL(vabits_actual);
+#endif
u64 kimage_vaddr __ro_after_init = (u64)&_text;
EXPORT_SYMBOL(kimage_vaddr);
@@ -772,6 +774,17 @@ void __init paging_init(void)
{
pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir));
+#if VA_BITS > 48
+ if (cpuid_feature_extract_unsigned_field(
+ read_sysreg_s(SYS_ID_AA64MMFR2_EL1),
+ ID_AA64MMFR2_LVA_SHIFT))
+ vabits_actual = VA_BITS;
+
+ /* make the variable visible to secondaries with the MMU off */
+ dcache_clean_inval_poc((u64)&vabits_actual,
+ (u64)&vabits_actual + sizeof(vabits_actual));
+#endif
+
map_kernel(pgdp);
map_mem(pgdp);
--
2.30.2
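A minimal user-space sketch of the selection that this patch moves from head.S into paging_init(): start from VA_BITS_MIN and widen to VA_BITS only when the LVA field of ID_AA64MMFR2_EL1 is non-zero. The SKETCH_* constants and helper names are hypothetical stand-ins, and the field position (bits [19:16]) is an assumption; the clean to the PoC that the real code performs for MMU-off secondaries is not modeled.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for a VA_BITS == 52 build. The LVA field
 * position (bits [19:16] of ID_AA64MMFR2_EL1) is an assumption here.
 */
#define SKETCH_VA_BITS		52
#define SKETCH_VA_BITS_MIN	48
#define SKETCH_LVA_SHIFT	16

/* Extract an unsigned 4-bit ID register field at the given shift. */
static unsigned int extract_unsigned_field(uint64_t reg, int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/*
 * Default to VA_BITS_MIN and only widen to VA_BITS when the CPU
 * advertises 52-bit VA support, as the new paging_init() code does.
 */
static uint64_t pick_vabits_actual(uint64_t id_aa64mmfr2)
{
	uint64_t vabits = SKETCH_VA_BITS_MIN;

	if (extract_unsigned_field(id_aa64mmfr2, SKETCH_LVA_SHIFT))
		vabits = SKETCH_VA_BITS;
	return vabits;
}
```

For example, pick_vabits_actual(1ULL << 16) yields 52, while any register value whose LVA field is zero yields 48.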
* Re: [PATCH v4 02/26] arm64: mm: make vabits_actual a build time constant if possible
2022-06-13 14:45 ` [PATCH v4 02/26] arm64: mm: make vabits_actual a build time constant if possible Ard Biesheuvel
@ 2022-06-14 8:25 ` Anshuman Khandual
2022-06-14 8:34 ` Ard Biesheuvel
0 siblings, 1 reply; 57+ messages in thread
From: Anshuman Khandual @ 2022-06-14 8:25 UTC (permalink / raw)
To: Ard Biesheuvel, linux-arm-kernel
Cc: linux-hardening, Marc Zyngier, Will Deacon, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown
On 6/13/22 20:15, Ard Biesheuvel wrote:
> Currently, we only support 52-bit virtual addressing on 64k pages
But going forward, this will be supported on 4K/16K pages as well, via FEAT_LPA2.
> configurations, and in all other cases, vabits_actual is guaranteed to
> equal VA_BITS (== VA_BITS_MIN). So get rid of the variable entirely in
> that case.
The change here does not really get rid of vabits_actual in those cases
either; it just makes it a build-time constant, AFAICS.
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -174,7 +174,11 @@
#include <linux/types.h>
#include <asm/bug.h>
+#if VA_BITS > 48
extern u64 vabits_actual;
+#else
+#define vabits_actual ((u64)VA_BITS)
+#endif
>
> While at it, move the assignment out of the asm entry code - it has no
> need to be there.
Doesn't this also change when vabits_actual gets evaluated? Then how
would a secondary CPU know that it needs to be stuck in the kernel
(CPU_STUCK_REASON_52_BIT_VA) if it does not support the large VA
feature? Looking at the sequence...
secondary_entry
    OR
secondary_holding_pen
    secondary_startup
        __cpu_secondary_check52bitva

primary_entry
    __create_page_tables    <--- original position
    __primary_switch
        start_kernel
            setup_arch
                paging_init    <--- new position
It might still be possible for the secondary CPU startup sequence to
validate LVA support across the platform, but why defer the vabits_actual
evaluation all the way to paging_init()? Ideally, shouldn't it be
evaluated as early as possible during boot? Hence, I'm wondering: what
is the real benefit here?
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/arm64/include/asm/memory.h | 4 ++++
> arch/arm64/kernel/head.S | 15 +--------------
> arch/arm64/mm/mmu.c | 15 ++++++++++++++-
> 3 files changed, 19 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 0af70d9abede..c751cd9b94f8 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -174,7 +174,11 @@
> #include <linux/types.h>
> #include <asm/bug.h>
>
> +#if VA_BITS > 48
> extern u64 vabits_actual;
> +#else
> +#define vabits_actual ((u64)VA_BITS)
> +#endif
>
> extern s64 memstart_addr;
> /* PHYS_OFFSET - the physical address of the start of memory. */
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 1cdecce552bb..dc07858eb673 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -293,19 +293,6 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
> adrp x0, idmap_pg_dir
> adrp x3, __idmap_text_start // __pa(__idmap_text_start)
>
> -#ifdef CONFIG_ARM64_VA_BITS_52
> - mrs_s x6, SYS_ID_AA64MMFR2_EL1
> - and x6, x6, #(0xf << ID_AA64MMFR2_LVA_SHIFT)
> - mov x5, #52
> - cbnz x6, 1f
> -#endif
> - mov x5, #VA_BITS_MIN
> -1:
> - adr_l x6, vabits_actual
> - str x5, [x6]
> - dmb sy
> - dc ivac, x6 // Invalidate potentially stale cache line
> -
> /*
> * VA_BITS may be too small to allow for an ID mapping to be created
> * that covers system RAM if that is located sufficiently high in the
> @@ -713,7 +700,7 @@ SYM_FUNC_START(__enable_mmu)
> SYM_FUNC_END(__enable_mmu)
>
> SYM_FUNC_START(__cpu_secondary_check52bitva)
> -#ifdef CONFIG_ARM64_VA_BITS_52
> +#if VA_BITS > 48
Just curious: why is this any better? Both (VA_BITS > 48) and
CONFIG_ARM64_VA_BITS_52 are build-time constants.
> ldr_l x0, vabits_actual
> cmp x0, #52
> b.ne 2f
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 7148928e3932..17b339c1a326 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -46,8 +46,10 @@
> u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
> u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
>
> -u64 __section(".mmuoff.data.write") vabits_actual;
> +#if VA_BITS > 48
> +u64 vabits_actual __ro_after_init = VA_BITS_MIN;
> EXPORT_SYMBOL(vabits_actual);
> +#endif
>
> u64 kimage_vaddr __ro_after_init = (u64)&_text;
> EXPORT_SYMBOL(kimage_vaddr);
> @@ -772,6 +774,17 @@ void __init paging_init(void)
> {
> pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir));
>
> +#if VA_BITS > 48
> + if (cpuid_feature_extract_unsigned_field(
> + read_sysreg_s(SYS_ID_AA64MMFR2_EL1),
> + ID_AA64MMFR2_LVA_SHIFT))
> + vabits_actual = VA_BITS;
> +
> + /* make the variable visible to secondaries with the MMU off */
> + dcache_clean_inval_poc((u64)&vabits_actual,
> + (u64)&vabits_actual + sizeof(vabits_actual));
> +#endif
> +
> map_kernel(pgdp);
> map_mem(pgdp);
>
* Re: [PATCH v4 02/26] arm64: mm: make vabits_actual a build time constant if possible
2022-06-14 8:25 ` Anshuman Khandual
@ 2022-06-14 8:34 ` Ard Biesheuvel
0 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-14 8:34 UTC (permalink / raw)
To: Anshuman Khandual
Cc: Linux ARM, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown
On Tue, 14 Jun 2022 at 10:25, Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
>
> On 6/13/22 20:15, Ard Biesheuvel wrote:
> > Currently, we only support 52-bit virtual addressing on 64k pages
>
> But going forward, this will be supported on 4K/16K pages as well, via FEAT_LPA2.
>
> > configurations, and in all other cases, vabits_actual is guaranteed to
> > equal VA_BITS (== VA_BITS_MIN). So get rid of the variable entirely in
> > that case.
>
> The change here does not really get rid of vabits_actual in those cases
> either; it just makes it a build-time constant, AFAICS.
>
Indeed, and so it ceases to be a variable.
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -174,7 +174,11 @@
> #include <linux/types.h>
> #include <asm/bug.h>
>
> +#if VA_BITS > 48
> extern u64 vabits_actual;
> +#else
> +#define vabits_actual ((u64)VA_BITS)
> +#endif
>
> >
> > While at it, move the assignment out of the asm entry code - it has no
> > need to be there.
>
> Doesn't this also change when vabits_actual gets evaluated? Then how
> would a secondary CPU know that it needs to be stuck in the kernel
> (CPU_STUCK_REASON_52_BIT_VA) if it does not support the large VA
> feature? Looking at the sequence...
>
> secondary_entry
>     OR
> secondary_holding_pen
>     secondary_startup
>         __cpu_secondary_check52bitva
>
> primary_entry
>     __create_page_tables    <--- original position
>     __primary_switch
>         start_kernel
>             setup_arch
>                 paging_init    <--- new position
>
> It might still be possible for the secondary CPU startup sequence to
> validate LVA support across the platform, but why defer the vabits_actual
> evaluation all the way to paging_init()? Ideally, shouldn't it be
> evaluated as early as possible during boot? Hence, I'm wondering: what
> is the real benefit here?
>
Why should it be evaluated as early as possible? The whole point is
deferring it so we don't have to do it from asm code.
But I suppose doing it as early as possible from C code (i.e., in
setup_arch() before arm64_memblock_init() or even before
early_fixmap_init()) might be better.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > arch/arm64/include/asm/memory.h | 4 ++++
> > arch/arm64/kernel/head.S | 15 +--------------
> > arch/arm64/mm/mmu.c | 15 ++++++++++++++-
> > 3 files changed, 19 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > index 0af70d9abede..c751cd9b94f8 100644
> > --- a/arch/arm64/include/asm/memory.h
> > +++ b/arch/arm64/include/asm/memory.h
> > @@ -174,7 +174,11 @@
> > #include <linux/types.h>
> > #include <asm/bug.h>
> >
> > +#if VA_BITS > 48
> > extern u64 vabits_actual;
> > +#else
> > +#define vabits_actual ((u64)VA_BITS)
> > +#endif
> >
> > extern s64 memstart_addr;
> > /* PHYS_OFFSET - the physical address of the start of memory. */
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index 1cdecce552bb..dc07858eb673 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -293,19 +293,6 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
> > adrp x0, idmap_pg_dir
> > adrp x3, __idmap_text_start // __pa(__idmap_text_start)
> >
> > -#ifdef CONFIG_ARM64_VA_BITS_52
> > - mrs_s x6, SYS_ID_AA64MMFR2_EL1
> > - and x6, x6, #(0xf << ID_AA64MMFR2_LVA_SHIFT)
> > - mov x5, #52
> > - cbnz x6, 1f
> > -#endif
> > - mov x5, #VA_BITS_MIN
> > -1:
> > - adr_l x6, vabits_actual
> > - str x5, [x6]
> > - dmb sy
> > - dc ivac, x6 // Invalidate potentially stale cache line
> > -
> > /*
> > * VA_BITS may be too small to allow for an ID mapping to be created
> > * that covers system RAM if that is located sufficiently high in the
> > @@ -713,7 +700,7 @@ SYM_FUNC_START(__enable_mmu)
> > SYM_FUNC_END(__enable_mmu)
> >
> > SYM_FUNC_START(__cpu_secondary_check52bitva)
> > -#ifdef CONFIG_ARM64_VA_BITS_52
> > +#if VA_BITS > 48
>
> Just curious: why is this any better? Both (VA_BITS > 48) and
> CONFIG_ARM64_VA_BITS_52 are build-time constants.
>
VA_BITS > 48 is a bit more readable, and more likely to remain accurate.
> > ldr_l x0, vabits_actual
> > cmp x0, #52
> > b.ne 2f
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 7148928e3932..17b339c1a326 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -46,8 +46,10 @@
> > u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
> > u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
> >
> > -u64 __section(".mmuoff.data.write") vabits_actual;
> > +#if VA_BITS > 48
> > +u64 vabits_actual __ro_after_init = VA_BITS_MIN;
> > EXPORT_SYMBOL(vabits_actual);
> > +#endif
> >
> > u64 kimage_vaddr __ro_after_init = (u64)&_text;
> > EXPORT_SYMBOL(kimage_vaddr);
> > @@ -772,6 +774,17 @@ void __init paging_init(void)
> > {
> > pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir));
> >
> > +#if VA_BITS > 48
> > + if (cpuid_feature_extract_unsigned_field(
> > + read_sysreg_s(SYS_ID_AA64MMFR2_EL1),
> > + ID_AA64MMFR2_LVA_SHIFT))
> > + vabits_actual = VA_BITS;
> > +
> > + /* make the variable visible to secondaries with the MMU off */
> > + dcache_clean_inval_poc((u64)&vabits_actual,
> > + (u64)&vabits_actual + sizeof(vabits_actual));
> > +#endif
> > +
> > map_kernel(pgdp);
> > map_mem(pgdp);
> >
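To make the ordering question above concrete, here is a sketch in C, with hypothetical names, of the decision that __cpu_secondary_check52bitva makes. It only compares the already-published vabits_actual against the secondary CPU's own LVA field, so what matters is that paging_init() has set the value and cleaned it to the PoC before any secondary is released; in the usual boot flow, secondaries are brought up from smp_init(), after setup_arch() has returned. The field position is an assumption, and a true result stands in for the asm path that records CPU_STUCK_REASON_52_BIT_VA and spins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the LVA field in ID_AA64MMFR2_EL1 (bits [19:16]). */
#define SKETCH_LVA_SHIFT	16

/*
 * Returns true if this secondary CPU would have to be parked: the
 * system has committed to 52-bit VAs, but this CPU's LVA field is 0.
 */
static bool secondary_must_park(uint64_t vabits_actual, uint64_t cpu_mmfr2)
{
	if (vabits_actual != 52)
		return false;	/* 48-bit VA system: no LVA needed */

	return ((cpu_mmfr2 >> SKETCH_LVA_SHIFT) & 0xf) == 0;
}
```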
* [PATCH v4 03/26] arm64: head: move assignment of idmap_t0sz to C code
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 01/26] arm64: head: move kimage_vaddr variable into C file Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 02/26] arm64: mm: make vabits_actual a build time constant if possible Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-14 9:22 ` Anshuman Khandual
2022-06-24 12:36 ` Will Deacon
2022-06-13 14:45 ` [PATCH v4 04/26] arm64: head: drop idmap_ptrs_per_pgd Ard Biesheuvel
` (23 subsequent siblings)
26 siblings, 2 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Setting idmap_t0sz involves fiddling with the caches if done with the
MMU off. Since we will be creating an initial ID map with the MMU and
caches off, and the permanent ID map with the MMU and caches on, let's
move this assignment of idmap_t0sz out of the startup code, and replace
it with a macro that simply issues the three instructions needed to
calculate the value wherever it is needed before the MMU is turned on.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/assembler.h | 14 ++++++++++++++
arch/arm64/include/asm/mmu_context.h | 2 +-
arch/arm64/kernel/head.S | 13 +------------
arch/arm64/mm/mmu.c | 5 ++++-
arch/arm64/mm/proc.S | 2 +-
5 files changed, 21 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 8c5a61aeaf8e..9468f45c07a6 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -359,6 +359,20 @@ alternative_cb_end
bfi \valreg, \t1sz, #TCR_T1SZ_OFFSET, #TCR_TxSZ_WIDTH
.endm
+/*
+ * idmap_get_t0sz - get the T0SZ value needed to cover the ID map
+ *
+ * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
+ * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
+ * this number conveniently equals the number of leading zeroes in
+ * the physical address of _end.
+ */
+ .macro idmap_get_t0sz, reg
+ adrp \reg, _end
+ orr \reg, \reg, #(1 << VA_BITS_MIN) - 1
+ clz \reg, \reg
+ .endm
+
/*
* tcr_compute_pa_size - set TCR.(I)PS to the highest supported
* ID_AA64MMFR0_EL1.PARange value
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 6770667b34a3..6ac0086ebb1a 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -60,7 +60,7 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
* TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
* physical memory, in which case it will be smaller.
*/
-extern u64 idmap_t0sz;
+extern int idmap_t0sz;
extern u64 idmap_ptrs_per_pgd;
/*
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index dc07858eb673..7f361bc72d12 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -299,22 +299,11 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
* physical address space. So for the ID map, use an extended virtual
* range in that case, and configure an additional translation level
* if needed.
- *
- * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
- * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
- * this number conveniently equals the number of leading zeroes in
- * the physical address of __idmap_text_end.
*/
- adrp x5, __idmap_text_end
- clz x5, x5
+ idmap_get_t0sz x5
cmp x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
b.ge 1f // .. then skip VA range extension
- adr_l x6, idmap_t0sz
- str x5, [x6]
- dmb sy
- dc ivac, x6 // Invalidate potentially stale cache line
-
#if (VA_BITS < 48)
#define EXTRA_SHIFT (PGDIR_SHIFT + PAGE_SHIFT - 3)
#define EXTRA_PTRS (1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 17b339c1a326..103bf4ae408d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -43,7 +43,7 @@
#define NO_CONT_MAPPINGS BIT(1)
#define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */
-u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
+int idmap_t0sz __ro_after_init;
u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
#if VA_BITS > 48
@@ -785,6 +785,9 @@ void __init paging_init(void)
(u64)&vabits_actual + sizeof(vabits_actual));
#endif
+ idmap_t0sz = min(63UL - __fls(__pa_symbol(_end)),
+ TCR_T0SZ(VA_BITS_MIN));
+
map_kernel(pgdp);
map_mem(pgdp);
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 972ce8d7f2c5..97cd67697212 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -470,7 +470,7 @@ SYM_FUNC_START(__cpu_setup)
add x9, x9, #64
tcr_set_t1sz tcr, x9
#else
- ldr_l x9, idmap_t0sz
+ idmap_get_t0sz x9
#endif
tcr_set_t0sz tcr, x9
--
2.30.2
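The commit message implies that the new idmap_get_t0sz macro and the C assignment added to paging_init() compute the same value. A user-space sketch to check that, assuming adrp's 4 KiB granularity, VA_BITS_MIN == 48, a TCR_T0SZ() field offset of 0, and GCC/Clang's __builtin_clzll as a stand-in for both the clz instruction and the kernel's __fls(); the SKETCH_* names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-ins: VA_BITS_MIN == 48, and a T0SZ helper that
 * mirrors TCR_T0SZ() assuming the field sits at bit offset 0. */
#define SKETCH_VA_BITS_MIN	48
#define SKETCH_T0SZ(bits)	(64 - (bits))

/* Mirrors the idmap_get_t0sz macro: adrp, orr with the low mask, clz. */
static int t0sz_asm_style(uint64_t pa_end)
{
	uint64_t reg = pa_end & ~0xfffULL;		/* adrp: 4 KiB aligned address */

	reg |= (1ULL << SKETCH_VA_BITS_MIN) - 1;	/* caps the result at 64 - 48 = 16 */
	return __builtin_clzll(reg);			/* reg != 0, so clz is well defined */
}

/* Mirrors paging_init(): min(63 - __fls(pa), TCR_T0SZ(VA_BITS_MIN)).
 * pa_end must be non-zero, just as __fls() requires. */
static int t0sz_c_style(uint64_t pa_end)
{
	int fls = 63 - __builtin_clzll(pa_end);		/* index of the top set bit */
	int t0sz = 63 - fls;
	int cap = SKETCH_T0SZ(SKETCH_VA_BITS_MIN);

	return t0sz < cap ? t0sz : cap;
}
```

With _end at 0x80000000 both return 16; with _end at 1ULL << 50 both return 13, i.e. the ID map then needs 64 - 13 = 51 VA bits. The OR with the 48-bit mask is what plays the role of min() in the asm version.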
* Re: [PATCH v4 03/26] arm64: head: move assignment of idmap_t0sz to C code
2022-06-13 14:45 ` [PATCH v4 03/26] arm64: head: move assignment of idmap_t0sz to C code Ard Biesheuvel
@ 2022-06-14 9:22 ` Anshuman Khandual
2022-06-14 9:34 ` Ard Biesheuvel
2022-06-24 12:36 ` Will Deacon
1 sibling, 1 reply; 57+ messages in thread
From: Anshuman Khandual @ 2022-06-14 9:22 UTC (permalink / raw)
To: Ard Biesheuvel, linux-arm-kernel
Cc: linux-hardening, Marc Zyngier, Will Deacon, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown
On 6/13/22 20:15, Ard Biesheuvel wrote:
> Setting idmap_t0sz involves fiddling with the caches if done with the
> MMU off. Since we will be creating an initial ID map with the MMU and
> caches off, and the permanent ID map with the MMU and caches on, let's
> move this assignment of idmap_t0sz out of the startup code, and replace
> it with a macro that simply issues the three instructions needed to
> calculate the value wherever it is needed before the MMU is turned on.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/arm64/include/asm/assembler.h | 14 ++++++++++++++
> arch/arm64/include/asm/mmu_context.h | 2 +-
> arch/arm64/kernel/head.S | 13 +------------
> arch/arm64/mm/mmu.c | 5 ++++-
> arch/arm64/mm/proc.S | 2 +-
> 5 files changed, 21 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index 8c5a61aeaf8e..9468f45c07a6 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -359,6 +359,20 @@ alternative_cb_end
> bfi \valreg, \t1sz, #TCR_T1SZ_OFFSET, #TCR_TxSZ_WIDTH
> .endm
>
> +/*
> + * idmap_get_t0sz - get the T0SZ value needed to cover the ID map
> + *
> + * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
> + * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
> + * this number conveniently equals the number of leading zeroes in
> + * the physical address of _end.
> + */
> + .macro idmap_get_t0sz, reg
> + adrp \reg, _end
> + orr \reg, \reg, #(1 << VA_BITS_MIN) - 1
> + clz \reg, \reg
> + .endm
Is there any particular reason to evaluate the idmap T0SZ from '_end' and
VA_BITS_MIN, instead of from '__idmap_text_end' as was the case previously?
> +
> /*
> * tcr_compute_pa_size - set TCR.(I)PS to the highest supported
> * ID_AA64MMFR0_EL1.PARange value
> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> index 6770667b34a3..6ac0086ebb1a 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -60,7 +60,7 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
> * TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
> * physical memory, in which case it will be smaller.
> */
> -extern u64 idmap_t0sz;
> +extern int idmap_t0sz;
> extern u64 idmap_ptrs_per_pgd;
>
> /*
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index dc07858eb673..7f361bc72d12 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -299,22 +299,11 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
> * physical address space. So for the ID map, use an extended virtual
> * range in that case, and configure an additional translation level
> * if needed.
> - *
> - * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
> - * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
> - * this number conveniently equals the number of leading zeroes in
> - * the physical address of __idmap_text_end.
> */
> - adrp x5, __idmap_text_end
> - clz x5, x5
> + idmap_get_t0sz x5
> cmp x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
> b.ge 1f // .. then skip VA range extension
>
> - adr_l x6, idmap_t0sz
> - str x5, [x6]
> - dmb sy
> - dc ivac, x6 // Invalidate potentially stale cache line
Right, as there is no 'idmap_t0sz' variable to update, the cache
maintenance can be dropped.
> -
> #if (VA_BITS < 48)
> #define EXTRA_SHIFT (PGDIR_SHIFT + PAGE_SHIFT - 3)
> #define EXTRA_PTRS (1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 17b339c1a326..103bf4ae408d 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -43,7 +43,7 @@
> #define NO_CONT_MAPPINGS BIT(1)
> #define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */
>
> -u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
> +int idmap_t0sz __ro_after_init;
I guess this is just to reduce the 'idmap_t0sz' memory footprint.
> u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
>
> #if VA_BITS > 48
> @@ -785,6 +785,9 @@ void __init paging_init(void)
> (u64)&vabits_actual + sizeof(vabits_actual));
> #endif
>
> + idmap_t0sz = min(63UL - __fls(__pa_symbol(_end)),
> + TCR_T0SZ(VA_BITS_MIN));
> +
Just curious: doesn't this also need some synchronization for the
update to be visible across the system?
#define cpu_set_idmap_tcr_t0sz()	__cpu_set_tcr_t0sz(idmap_t0sz)

static inline void cpu_install_idmap(void)
{
	cpu_set_reserved_ttbr0();
	local_flush_tlb_all();
	cpu_set_idmap_tcr_t0sz();

	cpu_switch_mm(lm_alias(idmap_pg_dir), &init_mm);
}
> map_kernel(pgdp);
> map_mem(pgdp);
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 972ce8d7f2c5..97cd67697212 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -470,7 +470,7 @@ SYM_FUNC_START(__cpu_setup)
> add x9, x9, #64
> tcr_set_t1sz tcr, x9
> #else
> - ldr_l x9, idmap_t0sz
> + idmap_get_t0sz x9
> #endif
> tcr_set_t0sz tcr, x9
>
Avoiding one cache maintenance operation in __create_page_tables() now
makes us evaluate idmap_t0sz again in __cpu_setup(), and also compute and
update idmap_t0sz in paging_init(). This change moves idmap_t0sz out of
the asm functions, but from a performance perspective, is there any
improvement?
* Re: [PATCH v4 03/26] arm64: head: move assignment of idmap_t0sz to C code
2022-06-14 9:22 ` Anshuman Khandual
@ 2022-06-14 9:34 ` Ard Biesheuvel
0 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-14 9:34 UTC (permalink / raw)
To: Anshuman Khandual
Cc: Linux ARM, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown
On Tue, 14 Jun 2022 at 11:22, Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
>
> On 6/13/22 20:15, Ard Biesheuvel wrote:
> > Setting idmap_t0sz involves fiddling with the caches if done with the
> > MMU off. Since we will be creating an initial ID map with the MMU and
> > caches off, and the permanent ID map with the MMU and caches on, let's
> > move this assignment of idmap_t0sz out of the startup code, and replace
> > it with a macro that simply issues the three instructions needed to
> > calculate the value wherever it is needed before the MMU is turned on.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > arch/arm64/include/asm/assembler.h | 14 ++++++++++++++
> > arch/arm64/include/asm/mmu_context.h | 2 +-
> > arch/arm64/kernel/head.S | 13 +------------
> > arch/arm64/mm/mmu.c | 5 ++++-
> > arch/arm64/mm/proc.S | 2 +-
> > 5 files changed, 21 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> > index 8c5a61aeaf8e..9468f45c07a6 100644
> > --- a/arch/arm64/include/asm/assembler.h
> > +++ b/arch/arm64/include/asm/assembler.h
> > @@ -359,6 +359,20 @@ alternative_cb_end
> > bfi \valreg, \t1sz, #TCR_T1SZ_OFFSET, #TCR_TxSZ_WIDTH
> > .endm
> >
> > +/*
> > + * idmap_get_t0sz - get the T0SZ value needed to cover the ID map
> > + *
> > + * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
> > + * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
> > + * this number conveniently equals the number of leading zeroes in
> > + * the physical address of _end.
> > + */
> > + .macro idmap_get_t0sz, reg
> > + adrp \reg, _end
> > + orr \reg, \reg, #(1 << VA_BITS_MIN) - 1
> > + clz \reg, \reg
> > + .endm
>
> Is there any particular reason to evaluate the idmap T0SZ from '_end' and
> VA_BITS_MIN, instead of from '__idmap_text_end' as was the case previously?
>
Ah yes, I failed to mention that. In a later patch, the ID map will
cover the entire image.
> > +
> > /*
> > * tcr_compute_pa_size - set TCR.(I)PS to the highest supported
> > * ID_AA64MMFR0_EL1.PARange value
> > diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> > index 6770667b34a3..6ac0086ebb1a 100644
> > --- a/arch/arm64/include/asm/mmu_context.h
> > +++ b/arch/arm64/include/asm/mmu_context.h
> > @@ -60,7 +60,7 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
> > * TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
> > * physical memory, in which case it will be smaller.
> > */
> > -extern u64 idmap_t0sz;
> > +extern int idmap_t0sz;
> > extern u64 idmap_ptrs_per_pgd;
> >
> > /*
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index dc07858eb673..7f361bc72d12 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -299,22 +299,11 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
> > * physical address space. So for the ID map, use an extended virtual
> > * range in that case, and configure an additional translation level
> > * if needed.
> > - *
> > - * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
> > - * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
> > - * this number conveniently equals the number of leading zeroes in
> > - * the physical address of __idmap_text_end.
> > */
> > - adrp x5, __idmap_text_end
> > - clz x5, x5
> > + idmap_get_t0sz x5
> > cmp x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
> > b.ge 1f // .. then skip VA range extension
> >
> > - adr_l x6, idmap_t0sz
> > - str x5, [x6]
> > - dmb sy
> > - dc ivac, x6 // Invalidate potentially stale cache line
>
> Right, as there is no 'idmap_t0sz' variable to update, cache maintenance
> can be dropped off.
>
> > -
> > #if (VA_BITS < 48)
> > #define EXTRA_SHIFT (PGDIR_SHIFT + PAGE_SHIFT - 3)
> > #define EXTRA_PTRS (1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 17b339c1a326..103bf4ae408d 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -43,7 +43,7 @@
> > #define NO_CONT_MAPPINGS BIT(1)
> > #define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */
> >
> > -u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
> > +int idmap_t0sz __ro_after_init;
>
> I guess this is just to reduce the 'idmap_t0sz' memory footprint.
>
It's essentially the 2log of a u64 so it doesn't have to be a u64. The
footprint doesn't really matter, of course.
> > u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
> >
> > #if VA_BITS > 48
> > @@ -785,6 +785,9 @@ void __init paging_init(void)
> > (u64)&vabits_actual + sizeof(vabits_actual));
> > #endif
> >
> > + idmap_t0sz = min(63UL - __fls(__pa_symbol(_end)),
> > + TCR_T0SZ(VA_BITS_MIN));
> > +
>
> Just curious, but doesn't this also need some synchronization for the
> update to be visible across the system?
>
No it does not, now that the asm macro no longer refers to the variable.
> > map_kernel(pgdp);
> > map_mem(pgdp);
> >
> > diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> > index 972ce8d7f2c5..97cd67697212 100644
> > --- a/arch/arm64/mm/proc.S
> > +++ b/arch/arm64/mm/proc.S
> > @@ -470,7 +470,7 @@ SYM_FUNC_START(__cpu_setup)
> > add x9, x9, #64
> > tcr_set_t1sz tcr, x9
> > #else
> > - ldr_l x9, idmap_t0sz
> > + idmap_get_t0sz x9
> > #endif
> > tcr_set_t0sz tcr, x9
> >
>
> Avoiding one cache maintenance operation in __create_page_tables() now
> makes us evaluate idmap_t0sz again in __cpu_setup(), and also compute and
> update idmap_t0sz in paging_init(). This change moves idmap_t0sz out of the
> asm functions, but from a performance perspective, is there an improvement?
No, the performance is not expected to be affected, and a ~10
instruction delta at boot is not going to be measurable anyway. The
point of this patch is to remove the need to reason about how/when
variables are accessed, and whether that requires cache cleaning,
invalidation, system-wide DMBs etc.
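For readers following the asm: the idmap_get_t0sz macro is a pure function of where the image was loaded, which is why nothing has to be written back or invalidated. A minimal C model of the macro, and of the 'cmp x5, TCR_T0SZ(VA_BITS_MIN); b.ge 1f' check in __create_page_tables(), might look like the sketch below. The function names are hypothetical, pa_end stands in for the result of 'adrp \reg, _end', and TCR_T0SZ_OFFSET is assumed to be 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of idmap_get_t0sz (hypothetical C rendering):
 *   adrp reg, _end
 *   orr  reg, reg, #(1 << VA_BITS_MIN) - 1
 *   clz  reg, reg
 */
static unsigned int idmap_get_t0sz(uint64_t pa_end, unsigned int va_bits_min)
{
	assert(va_bits_min > 0 && va_bits_min < 64);
	/* Forcing the low VA_BITS_MIN bits to one caps the result at 64 - VA_BITS_MIN. */
	uint64_t v = pa_end | ((UINT64_C(1) << va_bits_min) - 1);
	return (unsigned int)__builtin_clzll(v);	/* v != 0, so clz is well defined */
}

/*
 * Model of 'cmp x5, TCR_T0SZ(VA_BITS_MIN); b.ge 1f', assuming
 * TCR_T0SZ(x) == 64 - x. Returns nonzero if the ID map needs an
 * extended VA range (i.e. the branch is NOT taken).
 */
static int idmap_needs_extension(uint64_t pa_end, unsigned int va_bits_min)
{
	return idmap_get_t0sz(pa_end, va_bits_min) < 64 - va_bits_min;
}
```

Because the orr sets the low VA_BITS_MIN bits, the result can never exceed 64 - VA_BITS_MIN, so in this model the b.ge is only taken when the two values are equal.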
* Re: [PATCH v4 03/26] arm64: head: move assignment of idmap_t0sz to C code
2022-06-13 14:45 ` [PATCH v4 03/26] arm64: head: move assignment of idmap_t0sz to C code Ard Biesheuvel
2022-06-14 9:22 ` Anshuman Khandual
@ 2022-06-24 12:36 ` Will Deacon
2022-06-24 12:57 ` Ard Biesheuvel
1 sibling, 1 reply; 57+ messages in thread
From: Will Deacon @ 2022-06-24 12:36 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual
On Mon, Jun 13, 2022 at 04:45:27PM +0200, Ard Biesheuvel wrote:
> Setting idmap_t0sz involves fiddling with the caches if done with the
> MMU off. Since we will be creating an initial ID map with the MMU and
> caches off, and the permanent ID map with the MMU and caches on, let's
> move this assignment of idmap_t0sz out of the startup code, and replace
> it with a macro that simply issues the three instructions needed to
> calculate the value wherever it is needed before the MMU is turned on.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/arm64/include/asm/assembler.h | 14 ++++++++++++++
> arch/arm64/include/asm/mmu_context.h | 2 +-
> arch/arm64/kernel/head.S | 13 +------------
> arch/arm64/mm/mmu.c | 5 ++++-
> arch/arm64/mm/proc.S | 2 +-
> 5 files changed, 21 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index 8c5a61aeaf8e..9468f45c07a6 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -359,6 +359,20 @@ alternative_cb_end
> bfi \valreg, \t1sz, #TCR_T1SZ_OFFSET, #TCR_TxSZ_WIDTH
> .endm
>
> +/*
> + * idmap_get_t0sz - get the T0SZ value needed to cover the ID map
> + *
> + * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
> + * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
> + * this number conveniently equals the number of leading zeroes in
> + * the physical address of _end.
> + */
> + .macro idmap_get_t0sz, reg
> + adrp \reg, _end
> + orr \reg, \reg, #(1 << VA_BITS_MIN) - 1
> + clz \reg, \reg
> + .endm
> +
> /*
> * tcr_compute_pa_size - set TCR.(I)PS to the highest supported
> * ID_AA64MMFR0_EL1.PARange value
> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> index 6770667b34a3..6ac0086ebb1a 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -60,7 +60,7 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
> * TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
> * physical memory, in which case it will be smaller.
> */
> -extern u64 idmap_t0sz;
> +extern int idmap_t0sz;
> extern u64 idmap_ptrs_per_pgd;
>
> /*
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index dc07858eb673..7f361bc72d12 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -299,22 +299,11 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
> * physical address space. So for the ID map, use an extended virtual
> * range in that case, and configure an additional translation level
> * if needed.
> - *
> - * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
> - * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
> - * this number conveniently equals the number of leading zeroes in
> - * the physical address of __idmap_text_end.
> */
> - adrp x5, __idmap_text_end
> - clz x5, x5
> + idmap_get_t0sz x5
> cmp x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
> b.ge 1f // .. then skip VA range extension
>
> - adr_l x6, idmap_t0sz
> - str x5, [x6]
> - dmb sy
> - dc ivac, x6 // Invalidate potentially stale cache line
> -
> #if (VA_BITS < 48)
> #define EXTRA_SHIFT (PGDIR_SHIFT + PAGE_SHIFT - 3)
> #define EXTRA_PTRS (1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 17b339c1a326..103bf4ae408d 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -43,7 +43,7 @@
> #define NO_CONT_MAPPINGS BIT(1)
> #define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */
>
> -u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
> +int idmap_t0sz __ro_after_init;
> u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
>
> #if VA_BITS > 48
> @@ -785,6 +785,9 @@ void __init paging_init(void)
> (u64)&vabits_actual + sizeof(vabits_actual));
> #endif
>
> + idmap_t0sz = min(63UL - __fls(__pa_symbol(_end)),
> + TCR_T0SZ(VA_BITS_MIN));
nit: TCR_T0SZ shifts by TCR_T0SZ_OFFSET, so this is a bit confusing and
works out because the register offset happens to be zero. Maybe it would
be clearer to calculate the maximum of fls(__pa_symbol(_end)) and
VA_BITS_MIN, then subtract that from 64?
Will
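To make the nit concrete, here is a small C comparison of the expression in the patch and the suggested alternative. It is a sketch only: TCR_T0SZ() and the bit-scan helper are local stand-ins defined under the assumption that TCR_T0SZ_OFFSET is 0, and 'fls' is taken as the 1-based variant (for a 64-bit argument, fls64() in the kernel).

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins; TCR_T0SZ_OFFSET == 0 is the assumption the nit points out. */
#define TCR_T0SZ_OFFSET	0
#define TCR_T0SZ(x)	((uint64_t)(64 - (x)) << TCR_T0SZ_OFFSET)

/* 0-based index of the most significant set bit, like the kernel's __fls(). */
static unsigned int fls0(uint64_t x)
{
	assert(x != 0);
	return 63 - (unsigned int)__builtin_clzll(x);
}

/* Expression from the patch: min(63 - __fls(pa), TCR_T0SZ(VA_BITS_MIN)). */
static uint64_t t0sz_patch(uint64_t pa_end, unsigned int va_bits_min)
{
	uint64_t a = 63 - fls0(pa_end);
	uint64_t b = TCR_T0SZ(va_bits_min);

	return a < b ? a : b;
}

/* Suggested form: 64 - max(fls(pa), VA_BITS_MIN), with a 1-based fls(). */
static uint64_t t0sz_suggested(uint64_t pa_end, unsigned int va_bits_min)
{
	unsigned int bits = fls0(pa_end) + 1;

	if (bits < va_bits_min)
		bits = va_bits_min;
	return 64 - bits;
}
```

For any nonzero pa_end the two return the same value, but the min() form only does so because the shift in TCR_T0SZ() is by zero.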
* Re: [PATCH v4 03/26] arm64: head: move assignment of idmap_t0sz to C code
2022-06-24 12:36 ` Will Deacon
@ 2022-06-24 12:57 ` Ard Biesheuvel
0 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-24 12:57 UTC (permalink / raw)
To: Will Deacon
Cc: Linux ARM, linux-hardening, Marc Zyngier, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual
On Fri, 24 Jun 2022 at 14:36, Will Deacon <will@kernel.org> wrote:
>
> On Mon, Jun 13, 2022 at 04:45:27PM +0200, Ard Biesheuvel wrote:
> > Setting idmap_t0sz involves fiddling with the caches if done with the
> > MMU off. Since we will be creating an initial ID map with the MMU and
> > caches off, and the permanent ID map with the MMU and caches on, let's
> > move this assignment of idmap_t0sz out of the startup code, and replace
> > it with a macro that simply issues the three instructions needed to
> > calculate the value wherever it is needed before the MMU is turned on.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > arch/arm64/include/asm/assembler.h | 14 ++++++++++++++
> > arch/arm64/include/asm/mmu_context.h | 2 +-
> > arch/arm64/kernel/head.S | 13 +------------
> > arch/arm64/mm/mmu.c | 5 ++++-
> > arch/arm64/mm/proc.S | 2 +-
> > 5 files changed, 21 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> > index 8c5a61aeaf8e..9468f45c07a6 100644
> > --- a/arch/arm64/include/asm/assembler.h
> > +++ b/arch/arm64/include/asm/assembler.h
> > @@ -359,6 +359,20 @@ alternative_cb_end
> > bfi \valreg, \t1sz, #TCR_T1SZ_OFFSET, #TCR_TxSZ_WIDTH
> > .endm
> >
> > +/*
> > + * idmap_get_t0sz - get the T0SZ value needed to cover the ID map
> > + *
> > + * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
> > + * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
> > + * this number conveniently equals the number of leading zeroes in
> > + * the physical address of _end.
> > + */
> > + .macro idmap_get_t0sz, reg
> > + adrp \reg, _end
> > + orr \reg, \reg, #(1 << VA_BITS_MIN) - 1
> > + clz \reg, \reg
> > + .endm
> > +
> > /*
> > * tcr_compute_pa_size - set TCR.(I)PS to the highest supported
> > * ID_AA64MMFR0_EL1.PARange value
> > diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> > index 6770667b34a3..6ac0086ebb1a 100644
> > --- a/arch/arm64/include/asm/mmu_context.h
> > +++ b/arch/arm64/include/asm/mmu_context.h
> > @@ -60,7 +60,7 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
> > * TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
> > * physical memory, in which case it will be smaller.
> > */
> > -extern u64 idmap_t0sz;
> > +extern int idmap_t0sz;
> > extern u64 idmap_ptrs_per_pgd;
> >
> > /*
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index dc07858eb673..7f361bc72d12 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -299,22 +299,11 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
> > * physical address space. So for the ID map, use an extended virtual
> > * range in that case, and configure an additional translation level
> > * if needed.
> > - *
> > - * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
> > - * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
> > - * this number conveniently equals the number of leading zeroes in
> > - * the physical address of __idmap_text_end.
> > */
> > - adrp x5, __idmap_text_end
> > - clz x5, x5
> > + idmap_get_t0sz x5
> > cmp x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
> > b.ge 1f // .. then skip VA range extension
> >
> > - adr_l x6, idmap_t0sz
> > - str x5, [x6]
> > - dmb sy
> > - dc ivac, x6 // Invalidate potentially stale cache line
> > -
> > #if (VA_BITS < 48)
> > #define EXTRA_SHIFT (PGDIR_SHIFT + PAGE_SHIFT - 3)
> > #define EXTRA_PTRS (1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 17b339c1a326..103bf4ae408d 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -43,7 +43,7 @@
> > #define NO_CONT_MAPPINGS BIT(1)
> > #define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */
> >
> > -u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
> > +int idmap_t0sz __ro_after_init;
> > u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
> >
> > #if VA_BITS > 48
> > @@ -785,6 +785,9 @@ void __init paging_init(void)
> > (u64)&vabits_actual + sizeof(vabits_actual));
> > #endif
> >
> > + idmap_t0sz = min(63UL - __fls(__pa_symbol(_end)),
> > + TCR_T0SZ(VA_BITS_MIN));
>
> nit: TCR_T0SZ shifts by TCR_T0SZ_OFFSET, so this is a bit confusing and
> works out because the register offset happens to be zero. Maybe it would
> be clearer to calculate the maximum of fls(__pa_symbol(_end)) and
> VA_BITS_MIN, then subtract that from 64?
>
I just noticed there are other inconsistencies with TCR_T0SZ(), e.g.,
in create_safe_exec_page(), which receives the 'shifted' value of
t0sz, but then shifts it again in cpu_install_ttbr0(). So this is
definitely
Let's just use the same expression as in the idmap_get_t0sz macro I am adding:
idmap_t0sz = 63UL - __fls(__pa_symbol(_end) | GENMASK(VA_BITS_MIN - 1, 0));
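The proposed expression should match the asm macro bit for bit, since 63 - __fls(x) equals the number of leading zeroes of a nonzero 64-bit x. A sketch comparing the two, using a local GENMASK64() stand-in for the kernel's GENMASK() (an assumption of this example, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for GENMASK(h, l) on 64-bit values. */
#define GENMASK64(h, l) \
	(((~UINT64_C(0)) >> (63 - (h))) & ((~UINT64_C(0)) << (l)))

/* 0-based index of the most significant set bit, like the kernel's __fls(). */
static unsigned int msb_index(uint64_t x)
{
	assert(x != 0);
	return 63 - (unsigned int)__builtin_clzll(x);
}

/* C-side expression: 63 - __fls(pa | GENMASK(VA_BITS_MIN - 1, 0)). */
static unsigned int t0sz_c(uint64_t pa_end, unsigned int va_bits_min)
{
	return 63 - msb_index(pa_end | GENMASK64(va_bits_min - 1, 0));
}

/* The idmap_get_t0sz macro: orr with (1 << VA_BITS_MIN) - 1, then clz. */
static unsigned int t0sz_asm(uint64_t pa_end, unsigned int va_bits_min)
{
	return (unsigned int)__builtin_clzll(pa_end | ((UINT64_C(1) << va_bits_min) - 1));
}
```

Note that adrp yields the page-aligned address of _end while __pa_symbol(_end) is exact; since the low VA_BITS_MIN bits are forced to one in both forms, that difference does not affect the result.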
* [PATCH v4 04/26] arm64: head: drop idmap_ptrs_per_pgd
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (2 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 03/26] arm64: head: move assignment of idmap_t0sz to C code Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-15 4:07 ` Anshuman Khandual
2022-06-13 14:45 ` [PATCH v4 05/26] arm64: head: simplify page table mapping macros (slightly) Ard Biesheuvel
` (22 subsequent siblings)
26 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
The assignment of idmap_ptrs_per_pgd lacks any cache invalidation, even
though it is updated with the MMU and caches disabled. However, we never
bother to read the value again except in the very next instruction, and
so we can just drop the variable entirely.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/mmu_context.h | 1 -
arch/arm64/kernel/head.S | 7 +++----
arch/arm64/mm/mmu.c | 1 -
3 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 6ac0086ebb1a..7b387c3b312a 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -61,7 +61,6 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
* physical memory, in which case it will be smaller.
*/
extern int idmap_t0sz;
-extern u64 idmap_ptrs_per_pgd;
/*
* Ensure TCR.T0SZ is set to the provided value.
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 7f361bc72d12..53126a35d73c 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -300,6 +300,7 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
* range in that case, and configure an additional translation level
* if needed.
*/
+ mov x4, #PTRS_PER_PGD
idmap_get_t0sz x5
cmp x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
b.ge 1f // .. then skip VA range extension
@@ -319,18 +320,16 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
#error "Mismatch between VA_BITS and page size/number of translation levels"
#endif
- mov x4, EXTRA_PTRS
- create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
+ mov x2, EXTRA_PTRS
+ create_table_entry x0, x3, EXTRA_SHIFT, x2, x5, x6
#else
/*
* If VA_BITS == 48, we don't have to configure an additional
* translation level, but the top-level table has more entries.
*/
mov x4, #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT)
- str_l x4, idmap_ptrs_per_pgd, x5
#endif
1:
- ldr_l x4, idmap_ptrs_per_pgd
adr_l x6, __idmap_text_end // __pa(__idmap_text_end)
map_memory x0, x1, x3, x6, x7, x3, x4, x10, x11, x12, x13, x14
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 103bf4ae408d..0f95c91e5a8e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -44,7 +44,6 @@
#define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */
int idmap_t0sz __ro_after_init;
-u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
#if VA_BITS > 48
u64 vabits_actual __ro_after_init = VA_BITS_MIN;
--
2.30.2
* Re: [PATCH v4 04/26] arm64: head: drop idmap_ptrs_per_pgd
2022-06-13 14:45 ` [PATCH v4 04/26] arm64: head: drop idmap_ptrs_per_pgd Ard Biesheuvel
@ 2022-06-15 4:07 ` Anshuman Khandual
0 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2022-06-15 4:07 UTC (permalink / raw)
To: Ard Biesheuvel, linux-arm-kernel
Cc: linux-hardening, Marc Zyngier, Will Deacon, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown
On 6/13/22 20:15, Ard Biesheuvel wrote:
> The assignment of idmap_ptrs_per_pgd lacks any cache invalidation, even
> though it is updated with the MMU and caches disabled. However, we never
Right, seems like an omission.
> bother to read the value again except in the very next instruction, and
> so we can just drop the variable entirely.
Right.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/arm64/include/asm/mmu_context.h | 1 -
> arch/arm64/kernel/head.S | 7 +++----
> arch/arm64/mm/mmu.c | 1 -
> 3 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> index 6ac0086ebb1a..7b387c3b312a 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -61,7 +61,6 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
> * physical memory, in which case it will be smaller.
> */
> extern int idmap_t0sz;
> -extern u64 idmap_ptrs_per_pgd;
>
> /*
> * Ensure TCR.T0SZ is set to the provided value.
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 7f361bc72d12..53126a35d73c 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -300,6 +300,7 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
> * range in that case, and configure an additional translation level
> * if needed.
> */
> + mov x4, #PTRS_PER_PGD
> idmap_get_t0sz x5
> cmp x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
> b.ge 1f // .. then skip VA range extension
> @@ -319,18 +320,16 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
> #error "Mismatch between VA_BITS and page size/number of translation levels"
> #endif
>
> - mov x4, EXTRA_PTRS
> - create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
> + mov x2, EXTRA_PTRS
> + create_table_entry x0, x3, EXTRA_SHIFT, x2, x5, x6
AFAICS it should be safe to use 'x2' here instead of 'x4'.
> #else
> /*
> * If VA_BITS == 48, we don't have to configure an additional
> * translation level, but the top-level table has more entries.
> */
> mov x4, #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT)
> - str_l x4, idmap_ptrs_per_pgd, x5
> #endif
> 1:
> - ldr_l x4, idmap_ptrs_per_pgd
'x4' will contain the default PTRS_PER_PGD if (VA_BITS == EXTRA_SHIFT), otherwise
it will hold #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT), but without going via the
erstwhile 'idmap_ptrs_per_pgd' variable.
> adr_l x6, __idmap_text_end // __pa(__idmap_text_end)
>
> map_memory x0, x1, x3, x6, x7, x3, x4, x10, x11, x12, x13, x14
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 103bf4ae408d..0f95c91e5a8e 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -44,7 +44,6 @@
> #define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */
>
> int idmap_t0sz __ro_after_init;
> -u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
>
> #if VA_BITS > 48
> u64 vabits_actual __ro_after_init = VA_BITS_MIN;
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
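A rough C model of what ends up in x4 at label 1: after this patch may help when checking the review above. The function and its parameters are hypothetical stand-ins for the compile-time constants. Using x2 for EXTRA_PTRS matters because create_table_entry corrupts its ptrs argument, which would otherwise clobber the PTRS_PER_PGD value preloaded into x4. The sample values assume 64k pages with 48-bit VAs and 52-bit PAs (PGDIR_SHIFT 42, PTRS_PER_PGD 64, PHYS_MASK_SHIFT 52), and 4k pages with 39-bit VAs (PGDIR_SHIFT 30, PTRS_PER_PGD 512).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of x4 at label 1: in __create_page_tables.
 * 'extend' is true when the b.ge to 1f is not taken.
 */
static unsigned long idmap_root_ptrs(bool extend, unsigned int va_bits,
				     unsigned long ptrs_per_pgd,
				     unsigned int phys_mask_shift,
				     unsigned int pgdir_shift)
{
	unsigned long x4 = ptrs_per_pgd;	/* mov x4, #PTRS_PER_PGD */

	if (!extend)				/* default T0SZ small enough */
		return x4;
	if (va_bits < 48)			/* extra level; root size unchanged */
		return x4;
	/* VA_BITS == 48: widen the root table instead of adding a level */
	assert(phys_mask_shift > pgdir_shift);
	return 1UL << (phys_mask_shift - pgdir_shift);
}
```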
* [PATCH v4 05/26] arm64: head: simplify page table mapping macros (slightly)
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (3 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 04/26] arm64: head: drop idmap_ptrs_per_pgd Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 06/26] arm64: head: switch to map_memory macro for the extended ID map Ard Biesheuvel
` (21 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Simplify the macros in head.S that are used to set up the early page
tables, by switching to immediates for the number of bits that are
interpreted as the table index at each level. This makes it much
easier to infer from the instruction stream what is going on, and
reduces the number of instructions emitted substantially.
Note that the extended ID map for cases where no additional level needs
to be configured now uses a compile-time size as well, which means that
we interpret up to 10 bits as the table index at the root level (for
52-bit physical addressing), without taking into account whether or not
this is supported on the current system. However, those bits can only
be set if we are executing the image from an address that exceeds the
48-bit PA range, and are guaranteed to be cleared otherwise, and given
that we are dealing with a mapping in the lower TTBR0 range of the
address space, the result is therefore the same as if we'd mask off only
6 bits.
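The argument in the paragraph above can be checked with a small C sketch, assuming 64k pages and 48-bit VAs (PGDIR_SHIFT == 42), so that the root index is extracted with 10 bits for 52-bit PAs versus 6 bits for the regular PGD. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 64k-page, 48-bit VA configuration. */
#define PGDIR_SHIFT_64K	42

/* Root-level index using 'order' bits, as 'ubfx #PGDIR_SHIFT, #order' would. */
static uint64_t root_index(uint64_t addr, unsigned int order)
{
	assert(order > 0 && PGDIR_SHIFT_64K + order <= 64);
	return (addr >> PGDIR_SHIFT_64K) & ((UINT64_C(1) << order) - 1);
}

/* Nonzero if extracting 10 bits gives the same index as extracting 6. */
static int orders_agree(uint64_t addr)
{
	return root_index(addr, 10) == root_index(addr, 6);
}
```

For any address below 2^48 the two extractions agree; they only differ once bits 48-51 are set, which is exactly the case where the extended root table is needed.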
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/head.S | 55 ++++++++------------
1 file changed, 22 insertions(+), 33 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 53126a35d73c..9fdde2f9cc0f 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -179,31 +179,20 @@ SYM_CODE_END(preserve_boot_args)
* vstart: virtual address of start of range
* vend: virtual address of end of range - we map [vstart, vend]
* shift: shift used to transform virtual address into index
- * ptrs: number of entries in page table
+ * order: #imm 2log(number of entries in page table)
* istart: index in table corresponding to vstart
* iend: index in table corresponding to vend
* count: On entry: how many extra entries were required in previous level, scales
* our end index.
* On exit: returns how many extra entries required for next page table level
*
- * Preserves: vstart, vend, shift, ptrs
+ * Preserves: vstart, vend
* Returns: istart, iend, count
*/
- .macro compute_indices, vstart, vend, shift, ptrs, istart, iend, count
- lsr \iend, \vend, \shift
- mov \istart, \ptrs
- sub \istart, \istart, #1
- and \iend, \iend, \istart // iend = (vend >> shift) & (ptrs - 1)
- mov \istart, \ptrs
- mul \istart, \istart, \count
- add \iend, \iend, \istart // iend += count * ptrs
- // our entries span multiple tables
-
- lsr \istart, \vstart, \shift
- mov \count, \ptrs
- sub \count, \count, #1
- and \istart, \istart, \count
-
+ .macro compute_indices, vstart, vend, shift, order, istart, iend, count
+ ubfx \istart, \vstart, \shift, \order
+ ubfx \iend, \vend, \shift, \order
+ add \iend, \iend, \count, lsl \order
sub \count, \iend, \istart
.endm
@@ -218,38 +207,39 @@ SYM_CODE_END(preserve_boot_args)
* vend: virtual address of end of range - we map [vstart, vend - 1]
* flags: flags to use to map last level entries
* phys: physical address corresponding to vstart - physical memory is contiguous
- * pgds: the number of pgd entries
+ * order: #imm 2log(number of entries in PGD table)
*
* Temporaries: istart, iend, tmp, count, sv - these need to be different registers
* Preserves: vstart, flags
* Corrupts: tbl, rtbl, vend, istart, iend, tmp, count, sv
*/
- .macro map_memory, tbl, rtbl, vstart, vend, flags, phys, pgds, istart, iend, tmp, count, sv
+ .macro map_memory, tbl, rtbl, vstart, vend, flags, phys, order, istart, iend, tmp, count, sv
sub \vend, \vend, #1
add \rtbl, \tbl, #PAGE_SIZE
- mov \sv, \rtbl
mov \count, #0
- compute_indices \vstart, \vend, #PGDIR_SHIFT, \pgds, \istart, \iend, \count
+
+ compute_indices \vstart, \vend, #PGDIR_SHIFT, #\order, \istart, \iend, \count
+ mov \sv, \rtbl
populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
mov \tbl, \sv
- mov \sv, \rtbl
#if SWAPPER_PGTABLE_LEVELS > 3
- compute_indices \vstart, \vend, #PUD_SHIFT, #PTRS_PER_PUD, \istart, \iend, \count
+ compute_indices \vstart, \vend, #PUD_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
+ mov \sv, \rtbl
populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
mov \tbl, \sv
- mov \sv, \rtbl
#endif
#if SWAPPER_PGTABLE_LEVELS > 2
- compute_indices \vstart, \vend, #SWAPPER_TABLE_SHIFT, #PTRS_PER_PMD, \istart, \iend, \count
+ compute_indices \vstart, \vend, #SWAPPER_TABLE_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
+ mov \sv, \rtbl
populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
mov \tbl, \sv
#endif
- compute_indices \vstart, \vend, #SWAPPER_BLOCK_SHIFT, #PTRS_PER_PTE, \istart, \iend, \count
- bic \count, \phys, #SWAPPER_BLOCK_SIZE - 1
- populate_entries \tbl, \count, \istart, \iend, \flags, #SWAPPER_BLOCK_SIZE, \tmp
+ compute_indices \vstart, \vend, #SWAPPER_BLOCK_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
+ bic \rtbl, \phys, #SWAPPER_BLOCK_SIZE - 1
+ populate_entries \tbl, \rtbl, \istart, \iend, \flags, #SWAPPER_BLOCK_SIZE, \tmp
.endm
/*
@@ -300,12 +290,12 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
* range in that case, and configure an additional translation level
* if needed.
*/
- mov x4, #PTRS_PER_PGD
idmap_get_t0sz x5
cmp x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
b.ge 1f // .. then skip VA range extension
#if (VA_BITS < 48)
+#define IDMAP_PGD_ORDER (VA_BITS - PGDIR_SHIFT)
#define EXTRA_SHIFT (PGDIR_SHIFT + PAGE_SHIFT - 3)
#define EXTRA_PTRS (1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))
@@ -323,16 +313,16 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
mov x2, EXTRA_PTRS
create_table_entry x0, x3, EXTRA_SHIFT, x2, x5, x6
#else
+#define IDMAP_PGD_ORDER (PHYS_MASK_SHIFT - PGDIR_SHIFT)
/*
* If VA_BITS == 48, we don't have to configure an additional
* translation level, but the top-level table has more entries.
*/
- mov x4, #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT)
#endif
1:
adr_l x6, __idmap_text_end // __pa(__idmap_text_end)
- map_memory x0, x1, x3, x6, x7, x3, x4, x10, x11, x12, x13, x14
+ map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14
/*
* Map the kernel image (starting with PHYS_OFFSET).
@@ -340,13 +330,12 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
adrp x0, init_pg_dir
mov_q x5, KIMAGE_VADDR // compile time __va(_text)
add x5, x5, x23 // add KASLR displacement
- mov x4, PTRS_PER_PGD
adrp x6, _end // runtime __pa(_end)
adrp x3, _text // runtime __pa(_text)
sub x6, x6, x3 // _end - _text
add x6, x6, x5 // runtime __va(_end)
- map_memory x0, x1, x5, x6, x7, x3, x4, x10, x11, x12, x13, x14
+ map_memory x0, x1, x5, x6, x7, x3, (VA_BITS - PGDIR_SHIFT), x10, x11, x12, x13, x14
/*
* Since the page tables have been populated with non-cacheable
--
2.30.2
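The ubfx-based compute_indices should produce the same results as the old mov/sub/and/mul sequence whenever the number of entries is a power of two. A C sketch of both, where ubfx() is a hypothetical helper modeling the instruction (extract 'width' bits starting at bit 'lsb'):

```c
#include <assert.h>
#include <stdint.h>

struct indices {
	uint64_t istart, iend, count;
};

/* Old macro: ptrs is the number of entries in the table. */
static struct indices compute_indices_old(uint64_t vstart, uint64_t vend,
					  unsigned int shift, uint64_t ptrs,
					  uint64_t count)
{
	struct indices r;

	r.iend = (vend >> shift) & (ptrs - 1);
	r.iend += count * ptrs;			/* entries span multiple tables */
	r.istart = (vstart >> shift) & (ptrs - 1);
	r.count = r.iend - r.istart;
	return r;
}

/* ubfx dst, src, #lsb, #width == (src >> lsb) & ((1 << width) - 1) */
static uint64_t ubfx(uint64_t src, unsigned int lsb, unsigned int width)
{
	assert(width > 0 && width < 64 && lsb + width <= 64);
	return (src >> lsb) & ((UINT64_C(1) << width) - 1);
}

/* New macro: order is log2 of the number of entries. */
static struct indices compute_indices_new(uint64_t vstart, uint64_t vend,
					  unsigned int shift, unsigned int order,
					  uint64_t count)
{
	struct indices r;

	r.istart = ubfx(vstart, shift, order);
	r.iend = ubfx(vend, shift, order) + (count << order);
	r.count = r.iend - r.istart;
	return r;
}
```

For example, with 4k pages at the PMD level (shift 21, order 9), the range [0x40000000, 0x403FFFFF] with two extra entries carried over from the previous level yields istart 0 and iend 1025 in both versions.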
* [PATCH v4 06/26] arm64: head: switch to map_memory macro for the extended ID map
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (4 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 05/26] arm64: head: simplify page table mapping macros (slightly) Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 07/26] arm64: head: split off idmap creation code Ard Biesheuvel
` (20 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
In a future patch, we will start using an ID map that covers the entire
image, rather than a single page. This means that we need to deal with
the pathological case of an extended ID map where the kernel image does
not fit neatly inside a single entry at the root level, which means we
will need to create additional table entries and map additional pages
for page tables.
The existing map_memory macro already takes care of most of that, so
let's just extend it to deal with this case as well. While at it, drop
the conditional branch on the value of T0SZ: we don't set the variable
anymore in the entry code, and so we can just let the map_memory macro
deal with the case where the output address exceeds VA_BITS.
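The new extra_shift check in map_memory boils down to testing whether the last mapped address has any bit set at or above extra_shift. A minimal C rendering of that 'tst'/'b.eq' pair, with a hypothetical function name and assuming vend lies in the TTBR0 range as the macro comment requires:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors:
 *   sub  vend, vend, #1
 *   tst  vend, #~((1 << extra_shift) - 1)
 *   b.eq .L_\@            // skip the extra level
 */
static bool needs_extra_level(uint64_t vend_exclusive, unsigned int extra_shift)
{
	assert(extra_shift > 0 && extra_shift < 64);
	uint64_t last = vend_exclusive - 1;	/* map_memory maps [vstart, vend - 1] */

	return (last & ~((UINT64_C(1) << extra_shift) - 1)) != 0;
}
```

With 4k pages and 39-bit VAs, EXTRA_SHIFT is 30 + 12 - 3 = 39, so an image ending exactly at 2^39 needs no extra level, while one ending a page beyond it does.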
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/head.S | 76 ++++++++++----------
1 file changed, 37 insertions(+), 39 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 9fdde2f9cc0f..eb54c0289c8a 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -122,29 +122,6 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
b dcache_inval_poc // tail call
SYM_CODE_END(preserve_boot_args)
-/*
- * Macro to create a table entry to the next page.
- *
- * tbl: page table address
- * virt: virtual address
- * shift: #imm page table shift
- * ptrs: #imm pointers per table page
- *
- * Preserves: virt
- * Corrupts: ptrs, tmp1, tmp2
- * Returns: tbl -> next level table page address
- */
- .macro create_table_entry, tbl, virt, shift, ptrs, tmp1, tmp2
- add \tmp1, \tbl, #PAGE_SIZE
- phys_to_pte \tmp2, \tmp1
- orr \tmp2, \tmp2, #PMD_TYPE_TABLE // address of next table and entry type
- lsr \tmp1, \virt, #\shift
- sub \ptrs, \ptrs, #1
- and \tmp1, \tmp1, \ptrs // table index
- str \tmp2, [\tbl, \tmp1, lsl #3]
- add \tbl, \tbl, #PAGE_SIZE // next level table page
- .endm
-
/*
* Macro to populate page table entries, these entries can be pointers to the next level
* or last level entries pointing to physical memory.
@@ -209,15 +186,27 @@ SYM_CODE_END(preserve_boot_args)
* phys: physical address corresponding to vstart - physical memory is contiguous
* order: #imm 2log(number of entries in PGD table)
*
+ * If extra_shift is set, an extra level will be populated if the end address does
+ * not fit in 'extra_shift' bits. This assumes vend is in the TTBR0 range.
+ *
* Temporaries: istart, iend, tmp, count, sv - these need to be different registers
* Preserves: vstart, flags
* Corrupts: tbl, rtbl, vend, istart, iend, tmp, count, sv
*/
- .macro map_memory, tbl, rtbl, vstart, vend, flags, phys, order, istart, iend, tmp, count, sv
+ .macro map_memory, tbl, rtbl, vstart, vend, flags, phys, order, istart, iend, tmp, count, sv, extra_shift
sub \vend, \vend, #1
add \rtbl, \tbl, #PAGE_SIZE
mov \count, #0
+ .ifnb \extra_shift
+ tst \vend, #~((1 << (\extra_shift)) - 1)
+ b.eq .L_\@
+ compute_indices \vstart, \vend, #\extra_shift, #(PAGE_SHIFT - 3), \istart, \iend, \count
+ mov \sv, \rtbl
+ populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
+ mov \tbl, \sv
+ .endif
+.L_\@:
compute_indices \vstart, \vend, #PGDIR_SHIFT, #\order, \istart, \iend, \count
mov \sv, \rtbl
populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
@@ -284,20 +273,32 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
adrp x3, __idmap_text_start // __pa(__idmap_text_start)
/*
- * VA_BITS may be too small to allow for an ID mapping to be created
- * that covers system RAM if that is located sufficiently high in the
- * physical address space. So for the ID map, use an extended virtual
- * range in that case, and configure an additional translation level
- * if needed.
+ * The ID map carries a 1:1 mapping of the physical address range
+ * covered by the loaded image, which could be anywhere in DRAM. This
+ * means that the required size of the VA (== PA) space is decided at
+ * boot time, and could be more than the configured size of the VA
+ * space for ordinary kernel and user space mappings.
+ *
+ * There are three cases to consider here:
+ * - 39 <= VA_BITS < 48, and the ID map needs up to 48 VA bits to cover
+ * the placement of the image. In this case, we configure one extra
+ * level of translation on the fly for the ID map only. (This case
+ * also covers 42-bit VA/52-bit PA on 64k pages).
+ *
+ * - VA_BITS == 48, and the ID map needs more than 48 VA bits. This can
+ * only happen when using 64k pages, in which case we need to extend
+ * the root level table rather than add a level. Note that we can
+ * treat this case as 'always extended' as long as we take care not
+ * to program an unsupported T0SZ value into the TCR register.
+ *
+ * - Combinations that would require two additional levels of
+ * translation are not supported, e.g., VA_BITS==36 on 16k pages, or
+ * VA_BITS==39/4k pages with 5-level paging, where the input address
+ * requires more than 47 or 48 bits, respectively.
*/
- idmap_get_t0sz x5
- cmp x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
- b.ge 1f // .. then skip VA range extension
-
#if (VA_BITS < 48)
#define IDMAP_PGD_ORDER (VA_BITS - PGDIR_SHIFT)
#define EXTRA_SHIFT (PGDIR_SHIFT + PAGE_SHIFT - 3)
-#define EXTRA_PTRS (1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))
/*
* If VA_BITS < 48, we have to configure an additional table level.
@@ -309,20 +310,17 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
#if VA_BITS != EXTRA_SHIFT
#error "Mismatch between VA_BITS and page size/number of translation levels"
#endif
-
- mov x2, EXTRA_PTRS
- create_table_entry x0, x3, EXTRA_SHIFT, x2, x5, x6
#else
#define IDMAP_PGD_ORDER (PHYS_MASK_SHIFT - PGDIR_SHIFT)
+#define EXTRA_SHIFT
/*
* If VA_BITS == 48, we don't have to configure an additional
* translation level, but the top-level table has more entries.
*/
#endif
-1:
adr_l x6, __idmap_text_end // __pa(__idmap_text_end)
- map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14
+ map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
/*
* Map the kernel image (starting with PHYS_OFFSET).
--
2.30.2
* [PATCH v4 07/26] arm64: head: split off idmap creation code
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (5 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 06/26] arm64: head: switch to map_memory macro for the extended ID map Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 08/26] arm64: kernel: drop unnecessary PoC cache clean+invalidate Ard Biesheuvel
` (19 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Split off the creation of the ID map page tables, so that we can avoid
recreating them unnecessarily when KASLR is in effect (KASLR only
randomizes the virtual placement, so the ID map is unaffected). This will
permit us to drop some explicit cache maintenance to the PoC, which was
only necessary because the cache invalidation performed on some global
variables might otherwise clobber unrelated variables that happen to
share a cache line.
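To make the cache line concern concrete: maintenance by VA acts on whole
cache lines, so any variable that shares a line with the maintained range
is affected as well. A minimal C sketch of that overlap test follows; the
function name and the 64-byte line size used in the example are
illustrative assumptions, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Cache maintenance by VA operates on whole cache lines, so maintaining
 * [start, start + size) really touches every line that range overlaps.
 * Returns true if the object [other, other + other_size) lies (partly) in
 * one of those lines, i.e. if it is affected as a side effect.
 */
static bool shares_cacheline(uint64_t start, uint64_t size,
			     uint64_t other, uint64_t other_size,
			     uint64_t line_size)
{
	assert(line_size && !(line_size & (line_size - 1)));	/* power of two */
	uint64_t mask = ~(line_size - 1);
	uint64_t lo = start & mask;				/* first line */
	uint64_t hi = (start + size + line_size - 1) & mask;	/* past last line */

	return other < hi && other + other_size > lo;
}
```

For example, with 64-byte lines, maintaining an 8-byte variable at 0x1000
also hits a neighbour at 0x1008, but not one at 0x1040.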
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/head.S | 101 ++++++++++----------
1 file changed, 52 insertions(+), 49 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index eb54c0289c8a..1cbc52097bf9 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -84,7 +84,7 @@
* Register Scope Purpose
* x21 primary_entry() .. start_kernel() FDT pointer passed at boot in x0
* x23 primary_entry() .. start_kernel() physical misalignment/KASLR offset
- * x28 __create_page_tables() callee preserved temp register
+ * x28 clear_page_tables() callee preserved temp register
* x19/x20 __primary_switch() callee preserved temp registers
* x24 __primary_switch() .. relocate_kernel() current RELR displacement
*/
@@ -94,7 +94,10 @@ SYM_CODE_START(primary_entry)
adrp x23, __PHYS_OFFSET
and x23, x23, MIN_KIMG_ALIGN - 1 // KASLR offset, defaults to 0
bl set_cpu_boot_mode_flag
- bl __create_page_tables
+ bl clear_page_tables
+ bl create_idmap
+ bl create_kernel_mapping
+
/*
* The following calls CPU setup code, see arch/arm64/mm/proc.S for
* details.
@@ -122,6 +125,35 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
b dcache_inval_poc // tail call
SYM_CODE_END(preserve_boot_args)
+SYM_FUNC_START_LOCAL(clear_page_tables)
+ mov x28, lr
+
+ /*
+ * Invalidate the init page tables to avoid potential dirty cache lines
+ * being evicted. Other page tables are allocated in rodata as part of
+ * the kernel image, and thus are clean to the PoC per the boot
+ * protocol.
+ */
+ adrp x0, init_pg_dir
+ adrp x1, init_pg_end
+ bl dcache_inval_poc
+
+ /*
+ * Clear the init page tables.
+ */
+ adrp x0, init_pg_dir
+ adrp x1, init_pg_end
+ sub x1, x1, x0
+1: stp xzr, xzr, [x0], #16
+ stp xzr, xzr, [x0], #16
+ stp xzr, xzr, [x0], #16
+ stp xzr, xzr, [x0], #16
+ subs x1, x1, #64
+ b.ne 1b
+
+ ret x28
+SYM_FUNC_END(clear_page_tables)
+
/*
* Macro to populate page table entries, these entries can be pointers to the next level
* or last level entries pointing to physical memory.
@@ -231,44 +263,8 @@ SYM_CODE_END(preserve_boot_args)
populate_entries \tbl, \rtbl, \istart, \iend, \flags, #SWAPPER_BLOCK_SIZE, \tmp
.endm
-/*
- * Setup the initial page tables. We only setup the barest amount which is
- * required to get the kernel running. The following sections are required:
- * - identity mapping to enable the MMU (low address, TTBR0)
- * - first few MB of the kernel linear mapping to jump to once the MMU has
- * been enabled
- */
-SYM_FUNC_START_LOCAL(__create_page_tables)
- mov x28, lr
- /*
- * Invalidate the init page tables to avoid potential dirty cache lines
- * being evicted. Other page tables are allocated in rodata as part of
- * the kernel image, and thus are clean to the PoC per the boot
- * protocol.
- */
- adrp x0, init_pg_dir
- adrp x1, init_pg_end
- bl dcache_inval_poc
-
- /*
- * Clear the init page tables.
- */
- adrp x0, init_pg_dir
- adrp x1, init_pg_end
- sub x1, x1, x0
-1: stp xzr, xzr, [x0], #16
- stp xzr, xzr, [x0], #16
- stp xzr, xzr, [x0], #16
- stp xzr, xzr, [x0], #16
- subs x1, x1, #64
- b.ne 1b
-
- mov x7, SWAPPER_MM_MMUFLAGS
-
- /*
- * Create the identity mapping.
- */
+SYM_FUNC_START_LOCAL(create_idmap)
adrp x0, idmap_pg_dir
adrp x3, __idmap_text_start // __pa(__idmap_text_start)
@@ -319,12 +315,23 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
*/
#endif
adr_l x6, __idmap_text_end // __pa(__idmap_text_end)
+ mov x7, SWAPPER_MM_MMUFLAGS
map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
/*
- * Map the kernel image (starting with PHYS_OFFSET).
+ * Since the page tables have been populated with non-cacheable
+ * accesses (MMU disabled), invalidate those tables again to
+ * remove any speculatively loaded cache lines.
*/
+ dmb sy
+
+ adrp x0, idmap_pg_dir
+ adrp x1, idmap_pg_end
+ b dcache_inval_poc // tail call
+SYM_FUNC_END(create_idmap)
+
+SYM_FUNC_START_LOCAL(create_kernel_mapping)
adrp x0, init_pg_dir
mov_q x5, KIMAGE_VADDR // compile time __va(_text)
add x5, x5, x23 // add KASLR displacement
@@ -332,6 +339,7 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
adrp x3, _text // runtime __pa(_text)
sub x6, x6, x3 // _end - _text
add x6, x6, x5 // runtime __va(_end)
+ mov x7, SWAPPER_MM_MMUFLAGS
map_memory x0, x1, x5, x6, x7, x3, (VA_BITS - PGDIR_SHIFT), x10, x11, x12, x13, x14
@@ -342,16 +350,10 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
*/
dmb sy
- adrp x0, idmap_pg_dir
- adrp x1, idmap_pg_end
- bl dcache_inval_poc
-
adrp x0, init_pg_dir
adrp x1, init_pg_end
- bl dcache_inval_poc
-
- ret x28
-SYM_FUNC_END(__create_page_tables)
+ b dcache_inval_poc // tail call
+SYM_FUNC_END(create_kernel_mapping)
/*
* Initialize CPU registers with task-specific and cpu-specific context.
@@ -836,7 +838,8 @@ SYM_FUNC_START_LOCAL(__primary_switch)
pre_disable_mmu_workaround
msr sctlr_el1, x20 // disable the MMU
isb
- bl __create_page_tables // recreate kernel mapping
+ bl clear_page_tables
+ bl create_kernel_mapping // Recreate kernel mapping
tlbi vmalle1 // Remove any stale TLB entries
dsb nsh
--
2.30.2
* [PATCH v4 08/26] arm64: kernel: drop unnecessary PoC cache clean+invalidate
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (6 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 07/26] arm64: head: split off idmap creation code Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-15 4:32 ` Anshuman Khandual
2022-06-13 14:45 ` [PATCH v4 09/26] arm64: head: pass ID map root table address to __enable_mmu() Ard Biesheuvel
` (18 subsequent siblings)
26 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Some early boot code runs before the virtual placement of the kernel is
finalized. Once it was, we used to go back to the very start and recreate
the ID map along with the page tables describing the virtual kernel
mapping, which involved setting some global variables with the caches off.
In order to ensure that global state created by the KASLR code is not
corrupted by the cache invalidation that occurs in that case, we needed
to clean those global variables to the PoC explicitly.
This is no longer needed now that the ID map is created only once (and
the associated global variable updates are no longer repeated). So drop
the cache maintenance that is no longer necessary.
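The change in the boot flow can be summarised with a toy C model. The
counters and model_* functions are hypothetical stand-ins for the head.S
routines, not kernel code. Only the call sequence is taken from the
preceding patch: clear_page_tables, create_idmap and
create_kernel_mapping from primary_entry, and clear_page_tables plus
create_kernel_mapping on the KASLR path of __primary_switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical call counters standing in for the head.S routines. */
struct boot_trace {
	int clear_page_tables;
	int create_idmap;
	int create_kernel_mapping;
};

/* primary_entry(): builds both the ID map and the initial kernel mapping. */
static void model_primary_entry(struct boot_trace *t)
{
	t->clear_page_tables++;
	t->create_idmap++;
	t->create_kernel_mapping++;
}

/*
 * KASLR path of __primary_switch(): if a non-zero displacement was chosen,
 * the MMU is turned off and only the kernel mapping is recreated. The ID
 * map describes physical addresses, so it is left alone.
 */
static void model_kaslr_remap(struct boot_trace *t, bool displaced)
{
	if (!displaced)
		return;
	t->clear_page_tables++;
	t->create_kernel_mapping++;
	assert(t->create_idmap == 1);	/* the ID map is never rebuilt */
}
```

Since create_idmap runs exactly once, the global variable updates tied
to it are not repeated either.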
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/kaslr.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index 418b2bba1521..d5542666182f 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -13,7 +13,6 @@
#include <linux/pgtable.h>
#include <linux/random.h>
-#include <asm/cacheflush.h>
#include <asm/fixmap.h>
#include <asm/kernel-pgtable.h>
#include <asm/memory.h>
@@ -72,9 +71,6 @@ u64 __init kaslr_early_init(void)
* we end up running with module randomization disabled.
*/
module_alloc_base = (u64)_etext - MODULES_VSIZE;
- dcache_clean_inval_poc((unsigned long)&module_alloc_base,
- (unsigned long)&module_alloc_base +
- sizeof(module_alloc_base));
/*
* Try to map the FDT early. If this fails, we simply bail,
@@ -174,13 +170,6 @@ u64 __init kaslr_early_init(void)
module_alloc_base += (module_range * (seed & ((1 << 21) - 1))) >> 21;
module_alloc_base &= PAGE_MASK;
- dcache_clean_inval_poc((unsigned long)&module_alloc_base,
- (unsigned long)&module_alloc_base +
- sizeof(module_alloc_base));
- dcache_clean_inval_poc((unsigned long)&memstart_offset_seed,
- (unsigned long)&memstart_offset_seed +
- sizeof(memstart_offset_seed));
-
return offset;
}
--
2.30.2
* Re: [PATCH v4 08/26] arm64: kernel: drop unnecessary PoC cache clean+invalidate
2022-06-13 14:45 ` [PATCH v4 08/26] arm64: kernel: drop unnecessary PoC cache clean+invalidate Ard Biesheuvel
@ 2022-06-15 4:32 ` Anshuman Khandual
0 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2022-06-15 4:32 UTC (permalink / raw)
To: Ard Biesheuvel, linux-arm-kernel
Cc: linux-hardening, Marc Zyngier, Will Deacon, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown
On 6/13/22 20:15, Ard Biesheuvel wrote:
> Some early boot code runs before the virtual placement of the kernel is
> finalized, and we used to go back to the very start and recreate the ID
> map along with the page tables describing the virtual kernel mapping,
> and this involved setting some global variables with the caches off.
>
> In order to ensure that global state created by the KASLR code is not
> corrupted by the cache invalidation that occurs in that case, we needed
> to clean those global variables to the PoC explicitly.
>
> This is no longer needed now that the ID map is created only once (and
> the associated global variable updates are no longer repeated). So drop
> the cache maintenance that is no longer necessary.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/arm64/kernel/kaslr.c | 11 -----------
> 1 file changed, 11 deletions(-)
>
> diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
> index 418b2bba1521..d5542666182f 100644
> --- a/arch/arm64/kernel/kaslr.c
> +++ b/arch/arm64/kernel/kaslr.c
> @@ -13,7 +13,6 @@
> #include <linux/pgtable.h>
> #include <linux/random.h>
>
> -#include <asm/cacheflush.h>
> #include <asm/fixmap.h>
> #include <asm/kernel-pgtable.h>
> #include <asm/memory.h>
> @@ -72,9 +71,6 @@ u64 __init kaslr_early_init(void)
> * we end up running with module randomization disabled.
> */
> module_alloc_base = (u64)_etext - MODULES_VSIZE;
> - dcache_clean_inval_poc((unsigned long)&module_alloc_base,
> - (unsigned long)&module_alloc_base +
> - sizeof(module_alloc_base));
>
> /*
> * Try to map the FDT early. If this fails, we simply bail,
> @@ -174,13 +170,6 @@ u64 __init kaslr_early_init(void)
> module_alloc_base += (module_range * (seed & ((1 << 21) - 1))) >> 21;
> module_alloc_base &= PAGE_MASK;
>
> - dcache_clean_inval_poc((unsigned long)&module_alloc_base,
> - (unsigned long)&module_alloc_base +
> - sizeof(module_alloc_base));
> - dcache_clean_inval_poc((unsigned long)&memstart_offset_seed,
> - (unsigned long)&memstart_offset_seed +
> - sizeof(memstart_offset_seed));
> -
> return offset;
> }
>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
* [PATCH v4 09/26] arm64: head: pass ID map root table address to __enable_mmu()
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (7 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 08/26] arm64: kernel: drop unnecessary PoC cache clean+invalidate Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 10/26] arm64: mm: provide idmap pointer to cpu_replace_ttbr1() Ard Biesheuvel
` (17 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
We will be adding an initial ID map that covers the entire kernel image,
so have callers pass the ID map root table to __enable_mmu() in x2,
rather than hard-coding idmap_pg_dir there.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/head.S | 14 ++++++++------
arch/arm64/kernel/sleep.S | 1 +
2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 1cbc52097bf9..70c462bbd6bf 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -595,6 +595,7 @@ SYM_FUNC_START_LOCAL(secondary_startup)
bl __cpu_secondary_check52bitva
bl __cpu_setup // initialise processor
adrp x1, swapper_pg_dir
+ adrp x2, idmap_pg_dir
bl __enable_mmu
ldr x8, =__secondary_switched
br x8
@@ -648,6 +649,7 @@ SYM_FUNC_END(__secondary_too_slow)
*
* x0 = SCTLR_EL1 value for turning on the MMU.
* x1 = TTBR1_EL1 value
+ * x2 = ID map root table address
*
* Returns to the caller via x30/lr. This requires the caller to be covered
* by the .idmap.text section.
@@ -656,14 +658,13 @@ SYM_FUNC_END(__secondary_too_slow)
* If it isn't, park the CPU
*/
SYM_FUNC_START(__enable_mmu)
- mrs x2, ID_AA64MMFR0_EL1
- ubfx x2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4
- cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MIN
+ mrs x3, ID_AA64MMFR0_EL1
+ ubfx x3, x3, #ID_AA64MMFR0_TGRAN_SHIFT, 4
+ cmp x3, #ID_AA64MMFR0_TGRAN_SUPPORTED_MIN
b.lt __no_granule_support
- cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX
+ cmp x3, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX
b.gt __no_granule_support
- update_early_cpu_boot_status 0, x2, x3
- adrp x2, idmap_pg_dir
+ update_early_cpu_boot_status 0, x3, x4
phys_to_ttbr x1, x1
phys_to_ttbr x2, x2
msr ttbr0_el1, x2 // load TTBR0
@@ -819,6 +820,7 @@ SYM_FUNC_START_LOCAL(__primary_switch)
#endif
adrp x1, init_pg_dir
+ adrp x2, idmap_pg_dir
bl __enable_mmu
#ifdef CONFIG_RELOCATABLE
#ifdef CONFIG_RELR
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index 4ea9392f86e0..e36b09d942f7 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -104,6 +104,7 @@ SYM_CODE_START(cpu_resume)
bl __cpu_setup
/* enable the MMU early - so we can access sleep_save_stash by va */
adrp x1, swapper_pg_dir
+ adrp x2, idmap_pg_dir
bl __enable_mmu
ldr x8, =_cpu_resume
br x8
--
2.30.2
* [PATCH v4 10/26] arm64: mm: provide idmap pointer to cpu_replace_ttbr1()
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (8 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 09/26] arm64: head: pass ID map root table address to __enable_mmu() Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 11/26] arm64: head: add helper function to remap regions in early page tables Ard Biesheuvel
` (16 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
In preparation for changing the way we initialize the permanent ID map,
update cpu_replace_ttbr1() to take the ID map to use as an argument, so
that we can use it with the initial ID map as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/mmu_context.h | 13 +++++++++----
arch/arm64/kernel/cpufeature.c | 2 +-
arch/arm64/kernel/suspend.c | 2 +-
arch/arm64/mm/kasan_init.c | 4 ++--
arch/arm64/mm/mmu.c | 2 +-
5 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 7b387c3b312a..c7ccd82db1d2 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -105,13 +105,18 @@ static inline void cpu_uninstall_idmap(void)
cpu_switch_mm(mm->pgd, mm);
}
-static inline void cpu_install_idmap(void)
+static inline void __cpu_install_idmap(pgd_t *idmap)
{
cpu_set_reserved_ttbr0();
local_flush_tlb_all();
cpu_set_idmap_tcr_t0sz();
- cpu_switch_mm(lm_alias(idmap_pg_dir), &init_mm);
+ cpu_switch_mm(lm_alias(idmap), &init_mm);
+}
+
+static inline void cpu_install_idmap(void)
+{
+ __cpu_install_idmap(idmap_pg_dir);
}
/*
@@ -142,7 +147,7 @@ static inline void cpu_install_ttbr0(phys_addr_t ttbr0, unsigned long t0sz)
* Atomically replaces the active TTBR1_EL1 PGD with a new VA-compatible PGD,
* avoiding the possibility of conflicting TLB entries being allocated.
*/
-static inline void __nocfi cpu_replace_ttbr1(pgd_t *pgdp)
+static inline void __nocfi cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap)
{
typedef void (ttbr_replace_func)(phys_addr_t);
extern ttbr_replace_func idmap_cpu_replace_ttbr1;
@@ -165,7 +170,7 @@ static inline void __nocfi cpu_replace_ttbr1(pgd_t *pgdp)
replace_phys = (void *)__pa_symbol(function_nocfi(idmap_cpu_replace_ttbr1));
- cpu_install_idmap();
+ __cpu_install_idmap(idmap);
replace_phys(ttbr1);
cpu_uninstall_idmap();
}
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index c2a64c9e451e..f37d8f69c339 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -3275,7 +3275,7 @@ subsys_initcall_sync(init_32bit_el0_mask);
static void __maybe_unused cpu_enable_cnp(struct arm64_cpu_capabilities const *cap)
{
- cpu_replace_ttbr1(lm_alias(swapper_pg_dir));
+ cpu_replace_ttbr1(lm_alias(swapper_pg_dir), idmap_pg_dir);
}
/*
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index 2b0887e58a7c..9135fe0f3df5 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -52,7 +52,7 @@ void notrace __cpu_suspend_exit(void)
/* Restore CnP bit in TTBR1_EL1 */
if (system_supports_cnp())
- cpu_replace_ttbr1(lm_alias(swapper_pg_dir));
+ cpu_replace_ttbr1(lm_alias(swapper_pg_dir), idmap_pg_dir);
/*
* PSTATE was not saved over suspend/resume, re-enable any detected
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index c12cd700598f..e969e68de005 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -236,7 +236,7 @@ static void __init kasan_init_shadow(void)
*/
memcpy(tmp_pg_dir, swapper_pg_dir, sizeof(tmp_pg_dir));
dsb(ishst);
- cpu_replace_ttbr1(lm_alias(tmp_pg_dir));
+ cpu_replace_ttbr1(lm_alias(tmp_pg_dir), idmap_pg_dir);
clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
@@ -280,7 +280,7 @@ static void __init kasan_init_shadow(void)
PAGE_KERNEL_RO));
memset(kasan_early_shadow_page, KASAN_SHADOW_INIT, PAGE_SIZE);
- cpu_replace_ttbr1(lm_alias(swapper_pg_dir));
+ cpu_replace_ttbr1(lm_alias(swapper_pg_dir), idmap_pg_dir);
}
static void __init kasan_init_depth(void)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0f95c91e5a8e..74f9982c30a7 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -792,7 +792,7 @@ void __init paging_init(void)
pgd_clear_fixmap();
- cpu_replace_ttbr1(lm_alias(swapper_pg_dir));
+ cpu_replace_ttbr1(lm_alias(swapper_pg_dir), idmap_pg_dir);
init_mm.pgd = swapper_pg_dir;
memblock_phys_free(__pa_symbol(init_pg_dir),
--
2.30.2
* [PATCH v4 11/26] arm64: head: add helper function to remap regions in early page tables
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (9 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 10/26] arm64: mm: provide idmap pointer to cpu_replace_ttbr1() Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 12/26] arm64: head: cover entire kernel image in initial ID map Ard Biesheuvel
` (15 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
The asm macros used to create the initial ID map and kernel mappings
don't support arbitrarily remapping parts of the address space after it
has been populated. However, given that all block or page mappings are
created at the final level, we can take a subset of the mapped range and
update its attributes or output address. This will permit us to make
parts of these mappings read-only, or to remap part of them to cover the
device tree.
So add a helper that encapsulates this.
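The index arithmetic performed by the helper can be modelled in C as
below. This is a sketch, not kernel code: remap_region_model() and the
MODEL_* constants are hypothetical, a 4K granule (512 entries per table)
is assumed, and the phys_to_pte() conversion that populate_entries
applies for 52-bit physical addresses is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT	12			/* assumption: 4K granule */
#define MODEL_PTRS_SHIFT	(MODEL_PAGE_SHIFT - 3)	/* log2(entries per table) */
#define MODEL_PTRS		(1u << MODEL_PTRS_SHIFT)

/*
 * tbl:       last level table covering map_start (x0)
 * map_start: start VA of the existing mapping (x1)
 * start/end: VA range to update, end exclusive (x2/x3)
 * pa, attrs: new output address and attributes (x4/x5)
 * order:     log2 of the size mapped by one entry (x6)
 * Returns the number of entries rewritten.
 */
static size_t remap_region_model(uint64_t *tbl, uint64_t map_start,
				 uint64_t start, uint64_t end, uint64_t pa,
				 uint64_t attrs, unsigned int order)
{
	/* index of entry 0 of the table covering map_start (cf. lsr + bfi) */
	uint64_t base = (map_start >> order) &
			~((UINT64_C(1) << MODEL_PTRS_SHIFT) - 1);
	/* first and last (inclusive) entries covering [start, end) */
	uint64_t first = (start >> order) - base;
	uint64_t last = ((end - 1) >> order) - base;
	uint64_t blk = UINT64_C(1) << order;		/* size mapped per entry */

	assert(start < end && start >= map_start && last < MODEL_PTRS);

	for (uint64_t i = first; i <= last; i++, pa += blk)
		tbl[i] = pa | attrs;			/* cf. populate_entries */
	return (size_t)(last - first + 1);
}
```

With 2M blocks (order 21), a mapping starting at 0x40200000 lives in a
table whose entry 0 covers 0x40000000, so remapping [0x40400000,
0x40800000) rewrites entries 2 and 3 only.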
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/head.S | 33 ++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 70c462bbd6bf..7397555f8437 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -263,6 +263,39 @@ SYM_FUNC_END(clear_page_tables)
populate_entries \tbl, \rtbl, \istart, \iend, \flags, #SWAPPER_BLOCK_SIZE, \tmp
.endm
+/*
+ * Remap a subregion created with the map_memory macro with modified attributes
+ * or output address. The entire remapped region must have been covered in the
+ * invocation of map_memory.
+ *
+ * x0: last level table address (returned in first argument to map_memory)
+ * x1: start VA of the existing mapping
+ * x2: start VA of the region to update
+ * x3: end VA of the region to update (exclusive)
+ * x4: start PA associated with the region to update
+ * x5: attributes to set on the updated region
+ * x6: order of the last level mappings
+ */
+SYM_FUNC_START_LOCAL(remap_region)
+ sub x3, x3, #1 // make end inclusive
+
+ // Get the index offset for the start of the last level table
+ lsr x1, x1, x6
+ bfi x1, xzr, #0, #PAGE_SHIFT - 3
+
+ // Derive the start and end indexes into the last level table
+ // associated with the provided region
+ lsr x2, x2, x6
+ lsr x3, x3, x6
+ sub x2, x2, x1
+ sub x3, x3, x1
+
+ mov x1, #1
+ lsl x6, x1, x6 // block size at this level
+
+ populate_entries x0, x4, x2, x3, x5, x6, x7
+ ret
+SYM_FUNC_END(remap_region)
SYM_FUNC_START_LOCAL(create_idmap)
adrp x0, idmap_pg_dir
--
2.30.2
* [PATCH v4 12/26] arm64: head: cover entire kernel image in initial ID map
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (10 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 11/26] arm64: head: add helper function to remap regions in early page tables Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 13/26] arm64: head: use relative references to the RELA and RELR tables Ard Biesheuvel
` (14 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
As a first step towards avoiding the need to create, tear down and
recreate the kernel virtual mapping with MMU and caches disabled, start
by expanding the ID map so it covers the page tables as well as all
executable code. This will allow us to populate the page tables with the
MMU and caches on, and call KASLR init code before setting up the
virtual mapping.
Since this ID map is only needed at boot, create it as a temporary set
of page tables, and populate the permanent ID map after enabling the MMU
and caches. While at it, switch to read-only attributes where possible,
as writable permissions are only needed for the initial kernel
page tables. Note that on 4k granule configurations, the permanent ID
map will now be reduced to a single page rather than a 2M block mapping.
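Now that the initial ID map covers the whole image, whether it needs the
extra translation level (the VA_BITS < 48 case) depends on how high in the
physical address space the image ends. T0SZ is 64 minus the number of
input address bits, so for a 1:1 mapping it can be taken as the number of
leading zero bits in the end address (slightly conservative when that
address is itself a power of two). A minimal C sketch of this decision,
with hypothetical helper names and using the GCC/Clang builtin
__builtin_clzll:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* T0SZ == 64 - (number of VA bits), so for an identity mapping that must
 * reach pa_end this equals the count of leading zero bits in pa_end. */
static unsigned int idmap_t0sz(uint64_t pa_end)
{
	assert(pa_end != 0);		/* __builtin_clzll(0) is undefined */
	return (unsigned int)__builtin_clzll(pa_end);
}

/* Does the ID map need a larger input range than TTBR0 normally uses? */
static bool idmap_needs_extension(uint64_t pa_end, unsigned int va_bits)
{
	return idmap_t0sz(pa_end) < 64 - va_bits;
}
```

For instance, with VA_BITS == 39 an image ending at PA 0x8080000000
needs 40 input bits (T0SZ 24 < 25), so the extra level is configured,
whereas with VA_BITS == 48 the same image fits without extension.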
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/kernel-pgtable.h | 16 ++++++---
arch/arm64/kernel/head.S | 31 +++++++++++------
arch/arm64/kernel/vmlinux.lds.S | 7 ++--
arch/arm64/mm/mmu.c | 35 +++++++++++++++++++-
arch/arm64/mm/proc.S | 8 +++--
5 files changed, 76 insertions(+), 21 deletions(-)
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 96dc0f7da258..5395e5a04f35 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -35,10 +35,8 @@
*/
#if ARM64_KERNEL_USES_PMD_MAPS
#define SWAPPER_PGTABLE_LEVELS (CONFIG_PGTABLE_LEVELS - 1)
-#define IDMAP_PGTABLE_LEVELS (ARM64_HW_PGTABLE_LEVELS(PHYS_MASK_SHIFT) - 1)
#else
#define SWAPPER_PGTABLE_LEVELS (CONFIG_PGTABLE_LEVELS)
-#define IDMAP_PGTABLE_LEVELS (ARM64_HW_PGTABLE_LEVELS(PHYS_MASK_SHIFT))
#endif
@@ -87,7 +85,13 @@
+ EARLY_PUDS((vstart), (vend)) /* each PUD needs a next level page table */ \
+ EARLY_PMDS((vstart), (vend))) /* each PMD needs a next level page table */
#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end))
-#define IDMAP_DIR_SIZE (IDMAP_PGTABLE_LEVELS * PAGE_SIZE)
+
+/* the initial ID map may need two extra pages if it needs to be extended */
+#if VA_BITS < 48
+#define INIT_IDMAP_DIR_SIZE (INIT_DIR_SIZE + (2 * PAGE_SIZE))
+#else
+#define INIT_IDMAP_DIR_SIZE INIT_DIR_SIZE
+#endif
/* Initial memory map size */
#if ARM64_KERNEL_USES_PMD_MAPS
@@ -107,9 +111,11 @@
#define SWAPPER_PMD_FLAGS (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
#if ARM64_KERNEL_USES_PMD_MAPS
-#define SWAPPER_MM_MMUFLAGS (PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS)
+#define SWAPPER_RW_MMUFLAGS (PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS)
+#define SWAPPER_RX_MMUFLAGS (SWAPPER_RW_MMUFLAGS | PMD_SECT_RDONLY)
#else
-#define SWAPPER_MM_MMUFLAGS (PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS)
+#define SWAPPER_RW_MMUFLAGS (PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS)
+#define SWAPPER_RX_MMUFLAGS (SWAPPER_RW_MMUFLAGS | PTE_RDONLY)
#endif
/*
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 7397555f8437..93734c91a29a 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -87,6 +87,7 @@
* x28 clear_page_tables() callee preserved temp register
* x19/x20 __primary_switch() callee preserved temp registers
* x24 __primary_switch() .. relocate_kernel() current RELR displacement
+ * x28 create_idmap() callee preserved temp register
*/
SYM_CODE_START(primary_entry)
bl preserve_boot_args
@@ -298,9 +299,7 @@ SYM_FUNC_START_LOCAL(remap_region)
SYM_FUNC_END(remap_region)
SYM_FUNC_START_LOCAL(create_idmap)
- adrp x0, idmap_pg_dir
- adrp x3, __idmap_text_start // __pa(__idmap_text_start)
-
+ mov x28, lr
/*
* The ID map carries a 1:1 mapping of the physical address range
* covered by the loaded image, which could be anywhere in DRAM. This
@@ -347,11 +346,22 @@ SYM_FUNC_START_LOCAL(create_idmap)
* translation level, but the top-level table has more entries.
*/
#endif
- adr_l x6, __idmap_text_end // __pa(__idmap_text_end)
- mov x7, SWAPPER_MM_MMUFLAGS
+ adrp x0, init_idmap_pg_dir
+ adrp x3, _text
+ adrp x6, _end
+ mov x7, SWAPPER_RX_MMUFLAGS
map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
+ /* Remap the kernel page tables r/w in the ID map */
+ adrp x1, _text
+ adrp x2, init_pg_dir
+ adrp x3, init_pg_end
+ bic x4, x2, #SWAPPER_BLOCK_SIZE - 1
+ mov x5, SWAPPER_RW_MMUFLAGS
+ mov x6, #SWAPPER_BLOCK_SHIFT
+ bl remap_region
+
/*
* Since the page tables have been populated with non-cacheable
* accesses (MMU disabled), invalidate those tables again to
@@ -359,9 +369,10 @@ SYM_FUNC_START_LOCAL(create_idmap)
*/
dmb sy
- adrp x0, idmap_pg_dir
- adrp x1, idmap_pg_end
- b dcache_inval_poc // tail call
+ adrp x0, init_idmap_pg_dir
+ adrp x1, init_idmap_pg_end
+ bl dcache_inval_poc
+ ret x28
SYM_FUNC_END(create_idmap)
SYM_FUNC_START_LOCAL(create_kernel_mapping)
@@ -372,7 +383,7 @@ SYM_FUNC_START_LOCAL(create_kernel_mapping)
adrp x3, _text // runtime __pa(_text)
sub x6, x6, x3 // _end - _text
add x6, x6, x5 // runtime __va(_end)
- mov x7, SWAPPER_MM_MMUFLAGS
+ mov x7, SWAPPER_RW_MMUFLAGS
map_memory x0, x1, x5, x6, x7, x3, (VA_BITS - PGDIR_SHIFT), x10, x11, x12, x13, x14
@@ -853,7 +864,7 @@ SYM_FUNC_START_LOCAL(__primary_switch)
#endif
adrp x1, init_pg_dir
- adrp x2, idmap_pg_dir
+ adrp x2, init_idmap_pg_dir
bl __enable_mmu
#ifdef CONFIG_RELOCATABLE
#ifdef CONFIG_RELR
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 8a078c0ee140..0ce3a7c9f8c4 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -199,8 +199,7 @@ SECTIONS
}
idmap_pg_dir = .;
- . += IDMAP_DIR_SIZE;
- idmap_pg_end = .;
+ . += PAGE_SIZE;
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
tramp_pg_dir = .;
@@ -236,6 +235,10 @@ SECTIONS
__inittext_end = .;
__initdata_begin = .;
+ init_idmap_pg_dir = .;
+ . += INIT_IDMAP_DIR_SIZE;
+ init_idmap_pg_end = .;
+
.init.data : {
INIT_DATA
INIT_SETUP(16)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 74f9982c30a7..ed3a4b87529b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -769,9 +769,40 @@ static void __init map_kernel(pgd_t *pgdp)
kasan_copy_shadow(pgdp);
}
+static void __init create_idmap(void)
+{
+ u64 start = __pa_symbol(__idmap_text_start);
+ u64 size = __pa_symbol(__idmap_text_end) - start;
+ pgd_t *pgd = idmap_pg_dir;
+ u64 pgd_phys;
+
+ /* check if we need an additional level of translation */
+ if (VA_BITS < 48 && idmap_t0sz < TCR_T0SZ(VA_BITS_MIN)) {
+ pgd_phys = early_pgtable_alloc(PAGE_SHIFT);
+ set_pgd(&idmap_pg_dir[start >> VA_BITS],
+ __pgd(pgd_phys | P4D_TYPE_TABLE));
+ pgd = __va(pgd_phys);
+ }
+ __create_pgd_mapping(pgd, start, start, size, PAGE_KERNEL_ROX,
+ early_pgtable_alloc, 0);
+
+ if (IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0)) {
+ extern u32 __idmap_kpti_flag;
+ u64 pa = __pa_symbol(&__idmap_kpti_flag);
+
+ /*
+ * The KPTI G-to-nG conversion code needs a read-write mapping
+ * of its synchronization flag in the ID map.
+ */
+ __create_pgd_mapping(pgd, pa, pa, sizeof(u32), PAGE_KERNEL,
+ early_pgtable_alloc, 0);
+ }
+}
+
void __init paging_init(void)
{
pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir));
+ extern pgd_t init_idmap_pg_dir[];
#if VA_BITS > 48
if (cpuid_feature_extract_unsigned_field(
@@ -792,13 +823,15 @@ void __init paging_init(void)
pgd_clear_fixmap();
- cpu_replace_ttbr1(lm_alias(swapper_pg_dir), idmap_pg_dir);
+ cpu_replace_ttbr1(lm_alias(swapper_pg_dir), init_idmap_pg_dir);
init_mm.pgd = swapper_pg_dir;
memblock_phys_free(__pa_symbol(init_pg_dir),
__pa_symbol(init_pg_end) - __pa_symbol(init_pg_dir));
memblock_allow_resize();
+
+ create_idmap();
}
/*
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 97cd67697212..493b8ffc9be5 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -249,8 +249,10 @@ SYM_FUNC_END(idmap_cpu_replace_ttbr1)
*
* Called exactly once from stop_machine context by each CPU found during boot.
*/
-__idmap_kpti_flag:
- .long 1
+ .pushsection ".data", "aw", %progbits
+SYM_DATA(__idmap_kpti_flag, .long 1)
+ .popsection
+
SYM_FUNC_START(idmap_kpti_install_ng_mappings)
cpu .req w0
temp_pte .req x0
@@ -273,7 +275,7 @@ SYM_FUNC_START(idmap_kpti_install_ng_mappings)
mov x5, x3 // preserve temp_pte arg
mrs swapper_ttb, ttbr1_el1
- adr flag_ptr, __idmap_kpti_flag
+ adr_l flag_ptr, __idmap_kpti_flag
cbnz cpu, __idmap_kpti_secondary
--
2.30.2
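The extra-level check in the new C create_idmap() can be illustrated with a small user-space sketch. It assumes that TCR_T0SZ(x) evaluates to 64 - x (the T0SZ field sits at bit 0 of TCR_EL1), and all helper names are hypothetical. It only mirrors the arithmetic in the patch: when an extra level is needed, the ID map text at physical address start is covered by entry start >> VA_BITS of idmap_pg_dir, and the regular PGD allocated for that entry then covers the 2^VA_BITS window starting at (start >> VA_BITS) << VA_BITS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring
 *   VA_BITS < 48 && idmap_t0sz < TCR_T0SZ(VA_BITS_MIN)
 * under the assumption that TCR_T0SZ(x) == 64 - x.
 */
static bool needs_extra_level(unsigned int va_bits, unsigned int idmap_t0sz,
                              unsigned int va_bits_min)
{
    return va_bits < 48 && idmap_t0sz < 64 - va_bits_min;
}

/* Entry of the extra top-level table that covers physical address pa. */
static uint64_t extra_level_index(uint64_t pa, unsigned int va_bits)
{
    return pa >> va_bits;
}

/* Base of the 2^va_bits window that the regular PGD covers for that entry. */
static uint64_t extra_level_window(uint64_t pa, unsigned int va_bits)
{
    return extra_level_index(pa, va_bits) << va_bits;
}
```

For example, with VA_BITS = 39 and the ID map text at physical address 0x8080000000 (just past 512 GiB), an ID map T0SZ of 24 (40 address bits) requires the extra level, and the text is covered by entry 1, whose window starts at 0x8000000000.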
* [PATCH v4 13/26] arm64: head: use relative references to the RELA and RELR tables
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (11 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 12/26] arm64: head: cover entire kernel image in initial ID map Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 14/26] arm64: head: create a temporary FDT mapping in the initial ID map Ard Biesheuvel
` (13 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Formerly, we had to access the RELA and RELR tables via the kernel
mapping that was being relocated, and so deriving the start and end
addresses using ADRP/ADD references was not possible, as the relocation
code runs from the ID map.
Now that we map the entire kernel image via the ID map, we can simplify
this, and just load the entries via the ID map as well.
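To illustrate what the loop over [__rela_start, __rela_end) does, here is a minimal user-space sketch of applying R_AARCH64_RELATIVE entries, with the image modelled as an array of 64-bit words. The struct mirrors the ELF64 Elf64_Rela layout; apply_rela() and its parameters are hypothetical names, not kernel code. As in head.S, entries whose type (the low 32 bits of r_info) is not R_AARCH64_RELATIVE are skipped, and each patched word receives its link-time value plus the displacement. In C, linker-defined bounds such as __rela_start and __rela_end would typically be declared as extern arrays and passed in as start and end.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as Elf64_Rela from <elf.h>. */
struct rela {
    uint64_t r_offset;  /* link-time address of the word to patch */
    uint64_t r_info;    /* symbol index (high 32 bits), type (low 32 bits) */
    int64_t  r_addend;  /* link-time value of the word */
};

#define R_AARCH64_RELATIVE 1027

/*
 * Apply the RELATIVE relocations in [start, end) to 'image', which holds
 * an image linked at 'link_base' that has been displaced by 'delta'.
 */
static void apply_rela(const struct rela *start, const struct rela *end,
                       uint64_t *image, uint64_t link_base, uint64_t delta)
{
    for (const struct rela *r = start; r < end; r++) {
        if ((uint32_t)r->r_info != R_AARCH64_RELATIVE)
            continue;
        /* This word-array model needs aligned targets inside the image. */
        assert(r->r_offset >= link_base);
        assert((r->r_offset - link_base) % sizeof(uint64_t) == 0);
        image[(r->r_offset - link_base) / sizeof(uint64_t)] =
            (uint64_t)r->r_addend + delta;
    }
}
```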
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/head.S | 13 ++++---------
arch/arm64/kernel/vmlinux.lds.S | 12 ++++--------
2 files changed, 8 insertions(+), 17 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 93734c91a29a..f1497f7b4da0 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -757,13 +757,10 @@ SYM_FUNC_START_LOCAL(__relocate_kernel)
* Iterate over each entry in the relocation table, and apply the
* relocations in place.
*/
- ldr w9, =__rela_offset // offset to reloc table
- ldr w10, =__rela_size // size of reloc table
-
+ adr_l x9, __rela_start
+ adr_l x10, __rela_end
mov_q x11, KIMAGE_VADDR // default virtual offset
add x11, x11, x23 // actual virtual offset
- add x9, x9, x11 // __va(.rela)
- add x10, x9, x10 // __va(.rela) + sizeof(.rela)
0: cmp x9, x10
b.hs 1f
@@ -813,10 +810,8 @@ SYM_FUNC_START_LOCAL(__relocate_kernel)
* __relocate_kernel is called twice with non-zero displacements (i.e.
* if there is both a physical misalignment and a KASLR displacement).
*/
- ldr w9, =__relr_offset // offset to reloc table
- ldr w10, =__relr_size // size of reloc table
- add x9, x9, x11 // __va(.relr)
- add x10, x9, x10 // __va(.relr) + sizeof(.relr)
+ adr_l x9, __relr_start
+ adr_l x10, __relr_end
sub x15, x23, x24 // delta from previous offset
cbz x15, 7f // nothing to do if unchanged
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 0ce3a7c9f8c4..45131e354e27 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -257,21 +257,17 @@ SECTIONS
HYPERVISOR_RELOC_SECTION
.rela.dyn : ALIGN(8) {
+ __rela_start = .;
*(.rela .rela*)
+ __rela_end = .;
}
- __rela_offset = ABSOLUTE(ADDR(.rela.dyn) - KIMAGE_VADDR);
- __rela_size = SIZEOF(.rela.dyn);
-
-#ifdef CONFIG_RELR
.relr.dyn : ALIGN(8) {
+ __relr_start = .;
*(.relr.dyn)
+ __relr_end = .;
}
- __relr_offset = ABSOLUTE(ADDR(.relr.dyn) - KIMAGE_VADDR);
- __relr_size = SIZEOF(.relr.dyn);
-#endif
-
. = ALIGN(SEGMENT_ALIGN);
__initdata_end = .;
__init_end = .;
--
2.30.2
* [PATCH v4 14/26] arm64: head: create a temporary FDT mapping in the initial ID map
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (12 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 13/26] arm64: head: use relative references to the RELA and RELR tables Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 15/26] arm64: idreg-override: use early FDT mapping in " Ard Biesheuvel
` (12 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
We need to access the DT very early to get at the command line and the
KASLR seed, which currently means calling into the kernel to map the FDT
into the kernel page tables before start_kernel() has even been called.
This is undesirable.
So instead, let's create a mapping for the FDT in the initial ID map,
which is feasible now that it has been extended to cover more than a
single page or block, and can be updated in place to remap other output
addresses.
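The address arithmetic in the head.S change of this patch can be followed in C. This sketch assumes 4 KiB pages with 2 MiB block mappings (SWAPPER_BLOCK_SHIFT of 21) and a MAX_FDT_SIZE of 2 MiB; the struct and function names are hypothetical, and the comments note which register each field corresponds to. end_pa and fdt_pa stand for the physical addresses of _end and of the DT blob (x21); since the ID map is an identity mapping, they double as ID map addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration: 4 KiB pages, 2 MiB blocks, 2 MiB FDT limit. */
#define PAGE_SIZE           (UINT64_C(1) << 12)
#define SWAPPER_BLOCK_SHIFT 21
#define SWAPPER_BLOCK_SIZE  (UINT64_C(1) << SWAPPER_BLOCK_SHIFT)
#define MAX_FDT_SIZE        (UINT64_C(2) << 20)

struct fdt_remap {
    uint64_t va_start;  /* x2:  start of the remapped window */
    uint64_t va_end;    /* x3:  end of the remapped window */
    uint64_t pa_start;  /* x4:  block-aligned physical address of the FDT */
    uint64_t fdt_va;    /* x22: ID map address of the DT blob itself */
};

static struct fdt_remap remap_fdt(uint64_t end_pa, uint64_t fdt_pa)
{
    const uint64_t block_mask = SWAPPER_BLOCK_SIZE - 1;
    /* adrp x22, _end + SWAPPER_BLOCK_SIZE: page-aligned address */
    uint64_t x22 = (end_pa + SWAPPER_BLOCK_SIZE) & ~(PAGE_SIZE - 1);
    struct fdt_remap r;

    r.va_start = x22 & ~block_mask;                            /* bic x2 */
    r.fdt_va = (x22 & ~block_mask) | (fdt_pa & block_mask);    /* bfi x22 */
    r.va_end = r.va_start + MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE; /* add x3 */
    r.pa_start = fdt_pa & ~block_mask;                         /* bic x4 */

    /* The window always starts beyond the end of the kernel image. */
    assert(r.va_start > end_pa);
    return r;
}
```

For instance, with _end at 0x41234000 and the DT blob at 0x48001234, the window spans 0x41400000 to 0x41800000, maps physical memory from 0x48000000 onwards, and the blob appears at 0x41401234. The extra block added to va_end presumably covers an FDT that starts partway into a block and therefore spills into one more block than MAX_FDT_SIZE alone would suggest.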
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/kernel-pgtable.h | 6 ++++--
arch/arm64/kernel/head.S | 14 +++++++++++++-
2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 5395e5a04f35..02e59fa8f293 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -8,6 +8,7 @@
#ifndef __ASM_KERNEL_PGTABLE_H
#define __ASM_KERNEL_PGTABLE_H
+#include <asm/boot.h>
#include <asm/pgtable-hwdef.h>
#include <asm/sparsemem.h>
@@ -88,10 +89,11 @@
/* the initial ID map may need two extra pages if it needs to be extended */
#if VA_BITS < 48
-#define INIT_IDMAP_DIR_SIZE (INIT_DIR_SIZE + (2 * PAGE_SIZE))
+#define INIT_IDMAP_DIR_SIZE ((INIT_IDMAP_DIR_PAGES + 2) * PAGE_SIZE)
#else
-#define INIT_IDMAP_DIR_SIZE INIT_DIR_SIZE
+#define INIT_IDMAP_DIR_SIZE (INIT_IDMAP_DIR_PAGES * PAGE_SIZE)
#endif
+#define INIT_IDMAP_DIR_PAGES EARLY_PAGES(KIMAGE_VADDR, _end + MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE)
/* Initial memory map size */
#if ARM64_KERNEL_USES_PMD_MAPS
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index f1497f7b4da0..8283ff848328 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -83,6 +83,7 @@
*
* Register Scope Purpose
* x21 primary_entry() .. start_kernel() FDT pointer passed at boot in x0
+ * x22 create_idmap() .. start_kernel() ID map VA of the DT blob
* x23 primary_entry() .. start_kernel() physical misalignment/KASLR offset
* x28 clear_page_tables() callee preserved temp register
* x19/x20 __primary_switch() callee preserved temp registers
@@ -348,7 +349,7 @@ SYM_FUNC_START_LOCAL(create_idmap)
#endif
adrp x0, init_idmap_pg_dir
adrp x3, _text
- adrp x6, _end
+ adrp x6, _end + MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE
mov x7, SWAPPER_RX_MMUFLAGS
map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
@@ -362,6 +363,17 @@ SYM_FUNC_START_LOCAL(create_idmap)
mov x6, #SWAPPER_BLOCK_SHIFT
bl remap_region
+ /* Remap the FDT after the kernel image */
+ adrp x1, _text
+ adrp x22, _end + SWAPPER_BLOCK_SIZE
+ bic x2, x22, #SWAPPER_BLOCK_SIZE - 1
+ bfi x22, x21, #0, #SWAPPER_BLOCK_SHIFT // remapped FDT address
+ add x3, x2, #MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE
+ bic x4, x21, #SWAPPER_BLOCK_SIZE - 1
+ mov x5, SWAPPER_RW_MMUFLAGS
+ mov x6, #SWAPPER_BLOCK_SHIFT
+ bl remap_region
+
/*
* Since the page tables have been populated with non-cacheable
* accesses (MMU disabled), invalidate those tables again to
--
2.30.2
* [PATCH v4 15/26] arm64: idreg-override: use early FDT mapping in ID map
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (13 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 14/26] arm64: head: create a temporary FDT mapping in the initial ID map Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 16/26] arm64: head: factor out TTBR1 assignment into a macro Ard Biesheuvel
` (11 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Instead of calling into the kernel to map the FDT into the kernel page
tables before even calling start_kernel(), let's switch to the initial,
temporary mapping of the device tree that has been added to the ID map.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/head.S | 1 +
arch/arm64/kernel/idreg-override.c | 17 ++++++-----------
2 files changed, 7 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 8283ff848328..64ebff634b83 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -472,6 +472,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
#endif
mov x0, x21 // pass FDT address in x0
bl early_fdt_map // Try mapping the FDT early
+ mov x0, x22 // pass FDT address in x0
bl init_feature_override // Parse cpu feature overrides
#ifdef CONFIG_RANDOMIZE_BASE
tst x23, ~(MIN_KIMG_ALIGN - 1) // already running randomized?
diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index 8a2ceb591686..f92836e196e5 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -201,16 +201,11 @@ static __init void __parse_cmdline(const char *cmdline, bool parse_aliases)
} while (1);
}
-static __init const u8 *get_bootargs_cmdline(void)
+static __init const u8 *get_bootargs_cmdline(const void *fdt)
{
const u8 *prop;
- void *fdt;
int node;
- fdt = get_early_fdt_ptr();
- if (!fdt)
- return NULL;
-
node = fdt_path_offset(fdt, "/chosen");
if (node < 0)
return NULL;
@@ -222,9 +217,9 @@ static __init const u8 *get_bootargs_cmdline(void)
return strlen(prop) ? prop : NULL;
}
-static __init void parse_cmdline(void)
+static __init void parse_cmdline(const void *fdt)
{
- const u8 *prop = get_bootargs_cmdline();
+ const u8 *prop = get_bootargs_cmdline(fdt);
if (IS_ENABLED(CONFIG_CMDLINE_FORCE) || !prop)
__parse_cmdline(CONFIG_CMDLINE, true);
@@ -234,9 +229,9 @@ static __init void parse_cmdline(void)
}
/* Keep checkers quiet */
-void init_feature_override(void);
+void init_feature_override(const void *fdt);
-asmlinkage void __init init_feature_override(void)
+asmlinkage void __init init_feature_override(const void *fdt)
{
int i;
@@ -247,7 +242,7 @@ asmlinkage void __init init_feature_override(void)
}
}
- parse_cmdline();
+ parse_cmdline(fdt);
for (i = 0; i < ARRAY_SIZE(regs); i++) {
if (regs[i]->override)
--
2.30.2
* [PATCH v4 16/26] arm64: head: factor out TTBR1 assignment into a macro
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (14 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 15/26] arm64: idreg-override: use early FDT mapping in " Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 17/26] arm64: head: populate kernel page tables with MMU and caches on Ard Biesheuvel
` (10 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Create a macro load_ttbr1 to avoid having to repeat the same instruction
sequence 3 times in a subsequent patch. No functional change intended.
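The first step of the new load_ttbr1 macro, phys_to_ttbr, is a plain move unless the kernel is built for 52-bit physical addresses. In that case, as we understand the architecture and the kernel's TTBR_BADDR_MASK_52, bits [51:48] of the table address are stored in TTBR bits [5:2]. This C sketch models only that packing; the function names are hypothetical, and offset_ttbr1, the msr and the isb have no meaningful user-space equivalent, so they are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of TTBR bits [47:2], as we understand TTBR_BADDR_MASK_52. */
#define TTBR_BADDR_MASK_52 (((UINT64_C(1) << 46) - 1) << 2)

/* Pack a 52-bit table PA into TTBR format: PA[51:48] moves to TTBR[5:2]. */
static uint64_t phys_to_ttbr52(uint64_t pa)
{
    assert((pa & 0x3f) == 0);  /* table must be at least 64-byte aligned */
    return (pa | (pa >> 46)) & TTBR_BADDR_MASK_52;
}

/* Inverse of the packing above, useful for checking the encoding. */
static uint64_t ttbr52_to_phys(uint64_t ttbr)
{
    return (ttbr & TTBR_BADDR_MASK_52 & ~UINT64_C(0x3f)) |
           ((ttbr & UINT64_C(0x3c)) << 46);
}
```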
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/assembler.h | 17 +++++++++++++----
arch/arm64/kernel/head.S | 5 +----
2 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 9468f45c07a6..b2584709c332 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -479,6 +479,18 @@ alternative_endif
_cond_extable .Licache_op\@, \fixup
.endm
+/*
+ * load_ttbr1 - install @pgtbl as a TTBR1 page table
+ * pgtbl preserved
+ * tmp1/tmp2 clobbered, either may overlap with pgtbl
+ */
+ .macro load_ttbr1, pgtbl, tmp1, tmp2
+ phys_to_ttbr \tmp1, \pgtbl
+ offset_ttbr1 \tmp1, \tmp2
+ msr ttbr1_el1, \tmp1
+ isb
+ .endm
+
/*
* To prevent the possibility of old and new partial table walks being visible
* in the tlb, switch the ttbr to a zero page when we invalidate the old
@@ -492,10 +504,7 @@ alternative_endif
isb
tlbi vmalle1
dsb nsh
- phys_to_ttbr \tmp, \page_table
- offset_ttbr1 \tmp, \tmp2
- msr ttbr1_el1, \tmp
- isb
+ load_ttbr1 \page_table, \tmp, \tmp2
.endm
/*
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 64ebff634b83..d704d0bd8ffc 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -722,12 +722,9 @@ SYM_FUNC_START(__enable_mmu)
cmp x3, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX
b.gt __no_granule_support
update_early_cpu_boot_status 0, x3, x4
- phys_to_ttbr x1, x1
phys_to_ttbr x2, x2
msr ttbr0_el1, x2 // load TTBR0
- offset_ttbr1 x1, x3
- msr ttbr1_el1, x1 // load TTBR1
- isb
+ load_ttbr1 x1, x1, x3
set_sctlr_el1 x0
--
2.30.2
* [PATCH v4 17/26] arm64: head: populate kernel page tables with MMU and caches on
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (15 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 16/26] arm64: head: factor out TTBR1 assignment into a macro Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-24 12:56 ` Will Deacon
2022-06-13 14:45 ` [PATCH v4 18/26] arm64: head: record CPU boot mode after enabling the MMU Ard Biesheuvel
` (9 subsequent siblings)
26 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Now that we can access the entire kernel image via the ID map, we can
execute the page table population code with the MMU and caches enabled.
The only thing we need to ensure is that translations via TTBR1 remain
disabled while we are updating the page tables the second time around,
in case KASLR wants them to be randomized.
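A toy C model of the ordering this patch relies on, with hypothetical names throughout: page tables are only rewritten while TTBR1 points at reserved_pg_dir, and after a rewrite the TLB is invalidated (the TLBI VMALLE1 step described in the review thread) before init_pg_dir is installed again. The model tracks TLB state only as "may hold entries from init_pg_dir"; it does not capture the walk-cache question raised in review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_ENTRIES 8

static uint64_t reserved_pg_dir[NR_ENTRIES];  /* stays all zeroes */
static uint64_t init_pg_dir[NR_ENTRIES];

struct cpu_model {
    const uint64_t *ttbr1;  /* table that TTBR1 currently points at */
    bool tlb_has_init;      /* TLB may hold entries from init_pg_dir */
    bool tlb_stale;         /* ... from an older version of init_pg_dir */
};

static void load_ttbr1(struct cpu_model *cpu, const uint64_t *pgtbl)
{
    /* Never expose rewritten tables while stale TLB entries may exist. */
    assert(!(pgtbl == init_pg_dir && cpu->tlb_stale));
    cpu->ttbr1 = pgtbl;
    if (pgtbl == init_pg_dir)
        cpu->tlb_has_init = true;
}

static void tlbi_vmalle1(struct cpu_model *cpu)
{
    cpu->tlb_has_init = false;
    cpu->tlb_stale = false;
}

/* Stand-in for clear_page_tables + create_kernel_mapping. */
static void rebuild_init_pg_dir(struct cpu_model *cpu, uint64_t base)
{
    assert(cpu->ttbr1 != init_pg_dir);  /* TTBR1 translations disabled */
    for (size_t i = 0; i < NR_ENTRIES; i++)
        init_pg_dir[i] = base + i;      /* fake descriptors */
    if (cpu->tlb_has_init)
        cpu->tlb_stale = true;
}

static void primary_switch(struct cpu_model *cpu, bool kaslr)
{
    load_ttbr1(cpu, reserved_pg_dir);     /* __enable_mmu */
    rebuild_init_pg_dir(cpu, 0x1000);     /* initial mapping */
    load_ttbr1(cpu, init_pg_dir);
    if (kaslr) {
        load_ttbr1(cpu, reserved_pg_dir); /* disable TTBR1 translations */
        rebuild_init_pg_dir(cpu, 0x5000); /* randomized mapping */
        tlbi_vmalle1(cpu);
        load_ttbr1(cpu, init_pg_dir);     /* re-enable TTBR1 translations */
    }
}
```

Dropping the tlbi_vmalle1() call makes the final load_ttbr1() trip its assertion, which is the constraint the model is meant to make visible.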
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/head.S | 62 +++++---------------
1 file changed, 16 insertions(+), 46 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index d704d0bd8ffc..583cbea865e1 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -85,8 +85,6 @@
* x21 primary_entry() .. start_kernel() FDT pointer passed at boot in x0
* x22 create_idmap() .. start_kernel() ID map VA of the DT blob
* x23 primary_entry() .. start_kernel() physical misalignment/KASLR offset
- * x28 clear_page_tables() callee preserved temp register
- * x19/x20 __primary_switch() callee preserved temp registers
* x24 __primary_switch() .. relocate_kernel() current RELR displacement
* x28 create_idmap() callee preserved temp register
*/
@@ -96,9 +94,7 @@ SYM_CODE_START(primary_entry)
adrp x23, __PHYS_OFFSET
and x23, x23, MIN_KIMG_ALIGN - 1 // KASLR offset, defaults to 0
bl set_cpu_boot_mode_flag
- bl clear_page_tables
bl create_idmap
- bl create_kernel_mapping
/*
* The following calls CPU setup code, see arch/arm64/mm/proc.S for
@@ -128,32 +124,14 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
SYM_CODE_END(preserve_boot_args)
SYM_FUNC_START_LOCAL(clear_page_tables)
- mov x28, lr
-
- /*
- * Invalidate the init page tables to avoid potential dirty cache lines
- * being evicted. Other page tables are allocated in rodata as part of
- * the kernel image, and thus are clean to the PoC per the boot
- * protocol.
- */
- adrp x0, init_pg_dir
- adrp x1, init_pg_end
- bl dcache_inval_poc
-
/*
* Clear the init page tables.
*/
adrp x0, init_pg_dir
adrp x1, init_pg_end
- sub x1, x1, x0
-1: stp xzr, xzr, [x0], #16
- stp xzr, xzr, [x0], #16
- stp xzr, xzr, [x0], #16
- stp xzr, xzr, [x0], #16
- subs x1, x1, #64
- b.ne 1b
-
- ret x28
+ sub x2, x1, x0
+ mov x1, xzr
+ b __pi_memset // tail call
SYM_FUNC_END(clear_page_tables)
/*
@@ -399,16 +377,8 @@ SYM_FUNC_START_LOCAL(create_kernel_mapping)
map_memory x0, x1, x5, x6, x7, x3, (VA_BITS - PGDIR_SHIFT), x10, x11, x12, x13, x14
- /*
- * Since the page tables have been populated with non-cacheable
- * accesses (MMU disabled), invalidate those tables again to
- * remove any speculatively loaded cache lines.
- */
- dmb sy
-
- adrp x0, init_pg_dir
- adrp x1, init_pg_end
- b dcache_inval_poc // tail call
+ dsb ishst // sync with page table walker
+ ret
SYM_FUNC_END(create_kernel_mapping)
/*
@@ -863,14 +833,15 @@ SYM_FUNC_END(__relocate_kernel)
#endif
SYM_FUNC_START_LOCAL(__primary_switch)
-#ifdef CONFIG_RANDOMIZE_BASE
- mov x19, x0 // preserve new SCTLR_EL1 value
- mrs x20, sctlr_el1 // preserve old SCTLR_EL1 value
-#endif
-
- adrp x1, init_pg_dir
+ adrp x1, reserved_pg_dir
adrp x2, init_idmap_pg_dir
bl __enable_mmu
+
+ bl clear_page_tables
+ bl create_kernel_mapping
+
+ adrp x1, init_pg_dir
+ load_ttbr1 x1, x1, x2
#ifdef CONFIG_RELOCATABLE
#ifdef CONFIG_RELR
mov x24, #0 // no RELR displacement yet
@@ -886,9 +857,8 @@ SYM_FUNC_START_LOCAL(__primary_switch)
* to take into account by discarding the current kernel mapping and
* creating a new one.
*/
- pre_disable_mmu_workaround
- msr sctlr_el1, x20 // disable the MMU
- isb
+ adrp x1, reserved_pg_dir // Disable translations via TTBR1
+ load_ttbr1 x1, x1, x2
bl clear_page_tables
bl create_kernel_mapping // Recreate kernel mapping
@@ -896,8 +866,8 @@ SYM_FUNC_START_LOCAL(__primary_switch)
dsb nsh
isb
- set_sctlr_el1 x19 // re-enable the MMU
-
+ adrp x1, init_pg_dir // Re-enable translations via TTBR1
+ load_ttbr1 x1, x1, x2
bl __relocate_kernel
#endif
#endif
--
2.30.2
* Re: [PATCH v4 17/26] arm64: head: populate kernel page tables with MMU and caches on
2022-06-13 14:45 ` [PATCH v4 17/26] arm64: head: populate kernel page tables with MMU and caches on Ard Biesheuvel
@ 2022-06-24 12:56 ` Will Deacon
2022-06-24 13:07 ` Ard Biesheuvel
0 siblings, 1 reply; 57+ messages in thread
From: Will Deacon @ 2022-06-24 12:56 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual
On Mon, Jun 13, 2022 at 04:45:41PM +0200, Ard Biesheuvel wrote:
> Now that we can access the entire kernel image via the ID map, we can
> execute the page table population code with the MMU and caches enabled.
> The only thing we need to ensure is that translations via TTBR1 remain
> disabled while we are updating the page tables the second time around,
> in case KASLR wants them to be randomized.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/arm64/kernel/head.S | 62 +++++---------------
> 1 file changed, 16 insertions(+), 46 deletions(-)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index d704d0bd8ffc..583cbea865e1 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -85,8 +85,6 @@
> * x21 primary_entry() .. start_kernel() FDT pointer passed at boot in x0
> * x22 create_idmap() .. start_kernel() ID map VA of the DT blob
> * x23 primary_entry() .. start_kernel() physical misalignment/KASLR offset
> - * x28 clear_page_tables() callee preserved temp register
> - * x19/x20 __primary_switch() callee preserved temp registers
> * x24 __primary_switch() .. relocate_kernel() current RELR displacement
> * x28 create_idmap() callee preserved temp register
> */
> @@ -96,9 +94,7 @@ SYM_CODE_START(primary_entry)
> adrp x23, __PHYS_OFFSET
> and x23, x23, MIN_KIMG_ALIGN - 1 // KASLR offset, defaults to 0
> bl set_cpu_boot_mode_flag
> - bl clear_page_tables
> bl create_idmap
> - bl create_kernel_mapping
>
> /*
> * The following calls CPU setup code, see arch/arm64/mm/proc.S for
> @@ -128,32 +124,14 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
> SYM_CODE_END(preserve_boot_args)
>
> SYM_FUNC_START_LOCAL(clear_page_tables)
> - mov x28, lr
> -
> - /*
> - * Invalidate the init page tables to avoid potential dirty cache lines
> - * being evicted. Other page tables are allocated in rodata as part of
> - * the kernel image, and thus are clean to the PoC per the boot
> - * protocol.
> - */
> - adrp x0, init_pg_dir
> - adrp x1, init_pg_end
> - bl dcache_inval_poc
> -
> /*
> * Clear the init page tables.
> */
> adrp x0, init_pg_dir
> adrp x1, init_pg_end
> - sub x1, x1, x0
> -1: stp xzr, xzr, [x0], #16
> - stp xzr, xzr, [x0], #16
> - stp xzr, xzr, [x0], #16
> - stp xzr, xzr, [x0], #16
> - subs x1, x1, #64
> - b.ne 1b
> -
> - ret x28
> + sub x2, x1, x0
> + mov x1, xzr
> + b __pi_memset // tail call
> SYM_FUNC_END(clear_page_tables)
>
> /*
> @@ -399,16 +377,8 @@ SYM_FUNC_START_LOCAL(create_kernel_mapping)
>
> map_memory x0, x1, x5, x6, x7, x3, (VA_BITS - PGDIR_SHIFT), x10, x11, x12, x13, x14
>
> - /*
> - * Since the page tables have been populated with non-cacheable
> - * accesses (MMU disabled), invalidate those tables again to
> - * remove any speculatively loaded cache lines.
> - */
> - dmb sy
> -
> - adrp x0, init_pg_dir
> - adrp x1, init_pg_end
> - b dcache_inval_poc // tail call
> + dsb ishst // sync with page table walker
> + ret
> SYM_FUNC_END(create_kernel_mapping)
>
> /*
> @@ -863,14 +833,15 @@ SYM_FUNC_END(__relocate_kernel)
> #endif
>
> SYM_FUNC_START_LOCAL(__primary_switch)
> -#ifdef CONFIG_RANDOMIZE_BASE
> - mov x19, x0 // preserve new SCTLR_EL1 value
> - mrs x20, sctlr_el1 // preserve old SCTLR_EL1 value
> -#endif
> -
> - adrp x1, init_pg_dir
> + adrp x1, reserved_pg_dir
> adrp x2, init_idmap_pg_dir
> bl __enable_mmu
> +
> + bl clear_page_tables
> + bl create_kernel_mapping
> +
> + adrp x1, init_pg_dir
> + load_ttbr1 x1, x1, x2
> #ifdef CONFIG_RELOCATABLE
> #ifdef CONFIG_RELR
> mov x24, #0 // no RELR displacement yet
> @@ -886,9 +857,8 @@ SYM_FUNC_START_LOCAL(__primary_switch)
> * to take into account by discarding the current kernel mapping and
> * creating a new one.
> */
> - pre_disable_mmu_workaround
> - msr sctlr_el1, x20 // disable the MMU
> - isb
> + adrp x1, reserved_pg_dir // Disable translations via TTBR1
> + load_ttbr1 x1, x1, x2
I'd have thought we'd need some TLB maintenance here... is that not the
case?
Also, it might be a tiny bit easier to clear EPD1 instead of using the
reserved_pg_dir.
Will
* Re: [PATCH v4 17/26] arm64: head: populate kernel page tables with MMU and caches on
2022-06-24 12:56 ` Will Deacon
@ 2022-06-24 13:07 ` Ard Biesheuvel
2022-06-24 13:29 ` Will Deacon
0 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-24 13:07 UTC (permalink / raw)
To: Will Deacon
Cc: Linux ARM, linux-hardening, Marc Zyngier, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual
On Fri, 24 Jun 2022 at 14:56, Will Deacon <will@kernel.org> wrote:
>
> On Mon, Jun 13, 2022 at 04:45:41PM +0200, Ard Biesheuvel wrote:
> > Now that we can access the entire kernel image via the ID map, we can
> > execute the page table population code with the MMU and caches enabled.
> > The only thing we need to ensure is that translations via TTBR1 remain
> > disabled while we are updating the page tables the second time around,
> > in case KASLR wants them to be randomized.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > arch/arm64/kernel/head.S | 62 +++++---------------
> > 1 file changed, 16 insertions(+), 46 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index d704d0bd8ffc..583cbea865e1 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -85,8 +85,6 @@
> > * x21 primary_entry() .. start_kernel() FDT pointer passed at boot in x0
> > * x22 create_idmap() .. start_kernel() ID map VA of the DT blob
> > * x23 primary_entry() .. start_kernel() physical misalignment/KASLR offset
> > - * x28 clear_page_tables() callee preserved temp register
> > - * x19/x20 __primary_switch() callee preserved temp registers
> > * x24 __primary_switch() .. relocate_kernel() current RELR displacement
> > * x28 create_idmap() callee preserved temp register
> > */
> > @@ -96,9 +94,7 @@ SYM_CODE_START(primary_entry)
> > adrp x23, __PHYS_OFFSET
> > and x23, x23, MIN_KIMG_ALIGN - 1 // KASLR offset, defaults to 0
> > bl set_cpu_boot_mode_flag
> > - bl clear_page_tables
> > bl create_idmap
> > - bl create_kernel_mapping
> >
> > /*
> > * The following calls CPU setup code, see arch/arm64/mm/proc.S for
> > @@ -128,32 +124,14 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
> > SYM_CODE_END(preserve_boot_args)
> >
> > SYM_FUNC_START_LOCAL(clear_page_tables)
> > - mov x28, lr
> > -
> > - /*
> > - * Invalidate the init page tables to avoid potential dirty cache lines
> > - * being evicted. Other page tables are allocated in rodata as part of
> > - * the kernel image, and thus are clean to the PoC per the boot
> > - * protocol.
> > - */
> > - adrp x0, init_pg_dir
> > - adrp x1, init_pg_end
> > - bl dcache_inval_poc
> > -
> > /*
> > * Clear the init page tables.
> > */
> > adrp x0, init_pg_dir
> > adrp x1, init_pg_end
> > - sub x1, x1, x0
> > -1: stp xzr, xzr, [x0], #16
> > - stp xzr, xzr, [x0], #16
> > - stp xzr, xzr, [x0], #16
> > - stp xzr, xzr, [x0], #16
> > - subs x1, x1, #64
> > - b.ne 1b
> > -
> > - ret x28
> > + sub x2, x1, x0
> > + mov x1, xzr
> > + b __pi_memset // tail call
> > SYM_FUNC_END(clear_page_tables)
> >
> > /*
> > @@ -399,16 +377,8 @@ SYM_FUNC_START_LOCAL(create_kernel_mapping)
> >
> > map_memory x0, x1, x5, x6, x7, x3, (VA_BITS - PGDIR_SHIFT), x10, x11, x12, x13, x14
> >
> > - /*
> > - * Since the page tables have been populated with non-cacheable
> > - * accesses (MMU disabled), invalidate those tables again to
> > - * remove any speculatively loaded cache lines.
> > - */
> > - dmb sy
> > -
> > - adrp x0, init_pg_dir
> > - adrp x1, init_pg_end
> > - b dcache_inval_poc // tail call
> > + dsb ishst // sync with page table walker
> > + ret
> > SYM_FUNC_END(create_kernel_mapping)
> >
> > /*
> > @@ -863,14 +833,15 @@ SYM_FUNC_END(__relocate_kernel)
> > #endif
> >
> > SYM_FUNC_START_LOCAL(__primary_switch)
> > -#ifdef CONFIG_RANDOMIZE_BASE
> > - mov x19, x0 // preserve new SCTLR_EL1 value
> > - mrs x20, sctlr_el1 // preserve old SCTLR_EL1 value
> > -#endif
> > -
> > - adrp x1, init_pg_dir
> > + adrp x1, reserved_pg_dir
> > adrp x2, init_idmap_pg_dir
> > bl __enable_mmu
> > +
> > + bl clear_page_tables
> > + bl create_kernel_mapping
> > +
> > + adrp x1, init_pg_dir
> > + load_ttbr1 x1, x1, x2
> > #ifdef CONFIG_RELOCATABLE
> > #ifdef CONFIG_RELR
> > mov x24, #0 // no RELR displacement yet
> > @@ -886,9 +857,8 @@ SYM_FUNC_START_LOCAL(__primary_switch)
> > * to take into account by discarding the current kernel mapping and
> > * creating a new one.
> > */
> > - pre_disable_mmu_workaround
> > - msr sctlr_el1, x20 // disable the MMU
> > - isb
> > + adrp x1, reserved_pg_dir // Disable translations via TTBR1
> > + load_ttbr1 x1, x1, x2
>
> I'd have thought we'd need some TLB maintenance here... is that not the
> case?
>
You mean at this particular point? We are running from the ID map with
TTBR1 translations disabled. We clear the page tables, repopulate
them, and perform a TLBI VMALLE1.
So are you saying repopulating the page tables while translations are
disabled needs to occur only after doing TLB maintenance?
> Also, it might be a tiny bit easier to clear EPD1 instead of using the
> reserved_pg_dir.
>
Right. So is there any reason in particular why it would be
appropriate here but not anywhere else? IOW, why do we have
reserved_pg_dir in the first place if we can just flick EPD1 on and
off?
* Re: [PATCH v4 17/26] arm64: head: populate kernel page tables with MMU and caches on
2022-06-24 13:07 ` Ard Biesheuvel
@ 2022-06-24 13:29 ` Will Deacon
2022-06-24 14:07 ` Ard Biesheuvel
0 siblings, 1 reply; 57+ messages in thread
From: Will Deacon @ 2022-06-24 13:29 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Linux ARM, linux-hardening, Marc Zyngier, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual
On Fri, Jun 24, 2022 at 03:07:44PM +0200, Ard Biesheuvel wrote:
> On Fri, 24 Jun 2022 at 14:56, Will Deacon <will@kernel.org> wrote:
> >
> > On Mon, Jun 13, 2022 at 04:45:41PM +0200, Ard Biesheuvel wrote:
> > > Now that we can access the entire kernel image via the ID map, we can
> > > execute the page table population code with the MMU and caches enabled.
> > > The only thing we need to ensure is that translations via TTBR1 remain
> > > disabled while we are updating the page tables the second time around,
> > > in case KASLR wants them to be randomized.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > > arch/arm64/kernel/head.S | 62 +++++---------------
> > > 1 file changed, 16 insertions(+), 46 deletions(-)
[...]
> > > @@ -886,9 +857,8 @@ SYM_FUNC_START_LOCAL(__primary_switch)
> > > * to take into account by discarding the current kernel mapping and
> > > * creating a new one.
> > > */
> > > - pre_disable_mmu_workaround
> > > - msr sctlr_el1, x20 // disable the MMU
> > > - isb
> > > + adrp x1, reserved_pg_dir // Disable translations via TTBR1
> > > + load_ttbr1 x1, x1, x2
> >
> > I'd have thought we'd need some TLB maintenance here... is that not the
> > case?
> >
>
> You mean at this particular point? We are running from the ID map with
> TTBR1 translations disabled. We clear the page tables, repopulate
> them, and perform a TLBI VMALLE1.
>
> So are you saying repopulating the page tables while translations are
> disabled needs to occur only after doing TLB maintenance?
I'm thinking about walk cache entries from the previous page-table, which
would make the reserved_pg_dir ineffective. However, if we're clearing the
page-table anyway, I'm not even sure why we need reserved_pg_dir at all!
> > Also, it might be a tiny bit easier to clear EPD1 instead of using the
> > reserved_pg_dir.
> >
>
> Right. So is there any reason in particular why it would be
> appropriate here but not anywhere else? IOW, why do we have
> reserved_pg_dir in the first place if we can just flick EPD1 on and
> off?
I think using a reserved (all zeroes) page-table makes sense when it
has its own ASID, as you can switch to/from it without TLB invalidation,
but that doesn't seem to be the case here. Anyway, no strong preference,
I just thought it might simplify things a bit.
Will
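For readers weighing the two options above, the following host-side C sketch models the two register fields involved. It assumes the architectural layout as I understand it: TCR_EL1.EPD1 at bit 23, and the ASID in bits [63:48] of TTBRn_EL1 (TCR_EL1.A1 selects whether TTBR0 or TTBR1 supplies it). The helper names are hypothetical, and this is plain arithmetic rather than kernel code. Real code would also need an msr/isb sequence and, as discussed above, TLB maintenance, because EPD1 only suppresses table walks on a TLB miss.

```c
#include <assert.h>
#include <stdint.h>

/* TCR_EL1.EPD1 (bit 23): when set, a TLB miss on an address translated via
 * TTBR1_EL1 raises a translation fault instead of starting a table walk.
 * Entries already held in the TLB are not affected by it. */
#define TCR_EPD1_SHIFT 23
#define TCR_EPD1 (UINT64_C(1) << TCR_EPD1_SHIFT)

/* TTBRn_EL1.ASID occupies bits [63:48]; the table base address sits below. */
#define TTBR_ASID_SHIFT 48
#define TTBR_BADDR_MASK ((UINT64_C(1) << TTBR_ASID_SHIFT) - 1)

/* Return a TCR value with TTBR1 table walks enabled or disabled. */
static uint64_t tcr_with_ttbr1_walks(uint64_t tcr, int enable)
{
	return enable ? (tcr & ~TCR_EPD1) : (tcr | TCR_EPD1);
}

/* Compose a TTBR value from a table base address and an ASID. */
static uint64_t make_ttbr(uint64_t baddr, uint16_t asid)
{
	assert((baddr & ~TTBR_BADDR_MASK) == 0); /* base must not overlap ASID */
	return baddr | ((uint64_t)asid << TTBR_ASID_SHIFT);
}

/* Extract the ASID field from a TTBR value. */
static uint16_t ttbr_asid(uint64_t ttbr)
{
	return (uint16_t)(ttbr >> TTBR_ASID_SHIFT);
}
```

Switching TTBR1 to an all-zeroes table tagged with its own ASID changes only the TTBR value, which is why it can be done without a TLBI. Setting EPD1 leaves any cached entries for the current ASID usable.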
* Re: [PATCH v4 17/26] arm64: head: populate kernel page tables with MMU and caches on
2022-06-24 13:29 ` Will Deacon
@ 2022-06-24 14:07 ` Ard Biesheuvel
0 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-24 14:07 UTC (permalink / raw)
To: Will Deacon
Cc: Linux ARM, linux-hardening, Marc Zyngier, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual
On Fri, 24 Jun 2022 at 15:29, Will Deacon <will@kernel.org> wrote:
>
> On Fri, Jun 24, 2022 at 03:07:44PM +0200, Ard Biesheuvel wrote:
> > On Fri, 24 Jun 2022 at 14:56, Will Deacon <will@kernel.org> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 04:45:41PM +0200, Ard Biesheuvel wrote:
> > > > Now that we can access the entire kernel image via the ID map, we can
> > > > execute the page table population code with the MMU and caches enabled.
> > > > The only thing we need to ensure is that translations via TTBR1 remain
> > > > disabled while we are updating the page tables the second time around,
> > > > in case KASLR wants them to be randomized.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > ---
> > > > arch/arm64/kernel/head.S | 62 +++++---------------
> > > > 1 file changed, 16 insertions(+), 46 deletions(-)
>
> [...]
>
> > > > @@ -886,9 +857,8 @@ SYM_FUNC_START_LOCAL(__primary_switch)
> > > > * to take into account by discarding the current kernel mapping and
> > > > * creating a new one.
> > > > */
> > > > - pre_disable_mmu_workaround
> > > > - msr sctlr_el1, x20 // disable the MMU
> > > > - isb
> > > > + adrp x1, reserved_pg_dir // Disable translations via TTBR1
> > > > + load_ttbr1 x1, x1, x2
> > >
> > > I'd have thought we'd need some TLB maintenance here... is that not the
> > > case?
> > >
> >
> > You mean at this particular point? We are running from the ID map with
> > TTBR1 translations disabled. We clear the page tables, repopulate
> > them, and perform a TLBI VMALLE1.
> >
> > So are you saying repopulating the page tables while translations are
> > disabled needs to occur only after doing TLB maintenance?
>
> I'm thinking about walk cache entries from the previous page-table, which
> would make the reserved_pg_dir ineffective. However, if we're clearing the
> page-table anyway, I'm not even sure why we need reserved_pg_dir at all!
>
Perhaps not. But this code is removed again two patches later so it
doesn't matter that much to begin with.
> > > Also, it might be a tiny bit easier to clear EPD1 instead of using the
> > > reserved_pg_dir.
> > >
> >
> > Right. So is there any reason in particular why it would be
> > appropriate here but not anywhere else? IOW, why do we have
> > reserved_pg_dir in the first place if we can just flick EPD1 on and
> > off?
>
> I think using a reserved (all zeroes) page-table makes sense when it
> has its own ASID, as you can switch to/from it without TLB invalidation,
> but that doesn't seem to be the case here. Anyway, no strong preference,
> I just thought it might simplify things a bit.
>
Ah right, I hadn't considered ASIDs.
* [PATCH v4 18/26] arm64: head: record CPU boot mode after enabling the MMU
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (16 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 17/26] arm64: head: populate kernel page tables with MMU and caches on Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 19/26] arm64: kaslr: defer initialization to late initcall where permitted Ard Biesheuvel
` (8 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
To avoid writing to memory with the MMU and caches disabled, and then
having to invalidate the written location from the caches explicitly,
defer storing the CPU boot mode until after the MMU has been turned on,
unless we are giving up with an error.
While at it, move the associated variable definitions into C code.
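The two-slot __boot_cpu_mode array can be hard to read from the assembly alone. Below is a host-side C sketch of the recording scheme. It assumes that set_cpu_boot_mode_flag stores an EL2 boot mode in slot 1 and any other mode in slot 0, as the b.ne/add sequence in the diff suggests. The constants mirror the kernel's BOOT_CPU_MODE_* values, although only their distinctness matters here. The query helpers are hypothetical stand-ins for the kernel's own checks.

```c
#include <assert.h>
#include <stdint.h>

#define BOOT_CPU_MODE_EL1 0xe11
#define BOOT_CPU_MODE_EL2 0xe12

/* Slot 0 starts out as EL2 and is overwritten by any CPU that booted at EL1;
 * slot 1 starts out as EL1 and is overwritten by any CPU that booted at EL2. */
static uint32_t boot_cpu_mode[2] = { BOOT_CPU_MODE_EL2, BOOT_CPU_MODE_EL1 };

/* Record the boot mode of one CPU. */
static void record_boot_mode(uint32_t mode)
{
	assert(mode == BOOT_CPU_MODE_EL1 || mode == BOOT_CPU_MODE_EL2);
	boot_cpu_mode[mode == BOOT_CPU_MODE_EL2 ? 1 : 0] = mode;
}

/* True only if at least one CPU recorded EL2 and none recorded EL1. */
static int all_booted_at_el2(void)
{
	return boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
	       boot_cpu_mode[1] == BOOT_CPU_MODE_EL2;
}

/* True if some CPUs entered at EL1 and others at EL2. */
static int boot_modes_mismatched(void)
{
	return boot_cpu_mode[0] != boot_cpu_mode[1];
}
```

Because each slot can only be overwritten with the mode it tracks, the array ends up summarizing all CPUs without any locking or read-modify-write.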
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/head.S | 50 +++++---------------
arch/arm64/kernel/hyp-stub.S | 4 +-
arch/arm64/mm/mmu.c | 8 ++++
3 files changed, 23 insertions(+), 39 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 583cbea865e1..8de346dd4470 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -82,6 +82,7 @@
* primary lowlevel boot path:
*
* Register Scope Purpose
+ * x20 primary_entry() .. __primary_switch() CPU boot mode
* x21 primary_entry() .. start_kernel() FDT pointer passed at boot in x0
* x22 create_idmap() .. start_kernel() ID map VA of the DT blob
* x23 primary_entry() .. start_kernel() physical misalignment/KASLR offset
@@ -91,9 +92,9 @@
SYM_CODE_START(primary_entry)
bl preserve_boot_args
bl init_kernel_el // w0=cpu_boot_mode
+ mov x20, x0
adrp x23, __PHYS_OFFSET
and x23, x23, MIN_KIMG_ALIGN - 1 // KASLR offset, defaults to 0
- bl set_cpu_boot_mode_flag
bl create_idmap
/*
@@ -429,6 +430,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
sub x4, x4, x0 // the kernel virtual and
str_l x4, kimage_voffset, x5 // physical mappings
+ mov x0, x20
+ bl set_cpu_boot_mode_flag
+
// Clear BSS
adr_l x0, __bss_start
mov x1, xzr
@@ -454,6 +458,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
ret // to __primary_switch()
0:
#endif
+ mov x0, x20
bl switch_to_vhe // Prefer VHE if possible
ldp x29, x30, [sp], #16
bl start_kernel
@@ -553,52 +558,21 @@ SYM_FUNC_START_LOCAL(set_cpu_boot_mode_flag)
b.ne 1f
add x1, x1, #4
1: str w0, [x1] // Save CPU boot mode
- dmb sy
- dc ivac, x1 // Invalidate potentially stale cache line
ret
SYM_FUNC_END(set_cpu_boot_mode_flag)
-/*
- * These values are written with the MMU off, but read with the MMU on.
- * Writers will invalidate the corresponding address, discarding up to a
- * 'Cache Writeback Granule' (CWG) worth of data. The linker script ensures
- * sufficient alignment that the CWG doesn't overlap another section.
- */
- .pushsection ".mmuoff.data.write", "aw"
-/*
- * We need to find out the CPU boot mode long after boot, so we need to
- * store it in a writable variable.
- *
- * This is not in .bss, because we set it sufficiently early that the boot-time
- * zeroing of .bss would clobber it.
- */
-SYM_DATA_START(__boot_cpu_mode)
- .long BOOT_CPU_MODE_EL2
- .long BOOT_CPU_MODE_EL1
-SYM_DATA_END(__boot_cpu_mode)
-/*
- * The booting CPU updates the failed status @__early_cpu_boot_status,
- * with MMU turned off.
- */
-SYM_DATA_START(__early_cpu_boot_status)
- .quad 0
-SYM_DATA_END(__early_cpu_boot_status)
-
- .popsection
-
/*
* This provides a "holding pen" for platforms to hold all secondary
* cores are held until we're ready for them to initialise.
*/
SYM_FUNC_START(secondary_holding_pen)
bl init_kernel_el // w0=cpu_boot_mode
- bl set_cpu_boot_mode_flag
- mrs x0, mpidr_el1
+ mrs x2, mpidr_el1
mov_q x1, MPIDR_HWID_BITMASK
- and x0, x0, x1
+ and x2, x2, x1
adr_l x3, secondary_holding_pen_release
pen: ldr x4, [x3]
- cmp x4, x0
+ cmp x4, x2
b.eq secondary_startup
wfe
b pen
@@ -610,7 +584,6 @@ SYM_FUNC_END(secondary_holding_pen)
*/
SYM_FUNC_START(secondary_entry)
bl init_kernel_el // w0=cpu_boot_mode
- bl set_cpu_boot_mode_flag
b secondary_startup
SYM_FUNC_END(secondary_entry)
@@ -618,6 +591,7 @@ SYM_FUNC_START_LOCAL(secondary_startup)
/*
* Common entry point for secondary CPUs.
*/
+ mov x20, x0 // preserve boot mode
bl switch_to_vhe
bl __cpu_secondary_check52bitva
bl __cpu_setup // initialise processor
@@ -629,6 +603,9 @@ SYM_FUNC_START_LOCAL(secondary_startup)
SYM_FUNC_END(secondary_startup)
SYM_FUNC_START_LOCAL(__secondary_switched)
+ mov x0, x20
+ bl set_cpu_boot_mode_flag
+ str_l xzr, __early_cpu_boot_status, x3
adr_l x5, vectors
msr vbar_el1, x5
isb
@@ -691,7 +668,6 @@ SYM_FUNC_START(__enable_mmu)
b.lt __no_granule_support
cmp x3, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX
b.gt __no_granule_support
- update_early_cpu_boot_status 0, x3, x4
phys_to_ttbr x2, x2
msr ttbr0_el1, x2 // load TTBR0
load_ttbr1 x1, x1, x3
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 43d212618834..5bafb53fafb4 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -223,11 +223,11 @@ SYM_FUNC_END(__hyp_reset_vectors)
/*
* Entry point to switch to VHE if deemed capable
+ *
+ * w0: boot mode, as returned by init_kernel_el()
*/
SYM_FUNC_START(switch_to_vhe)
// Need to have booted at EL2
- adr_l x1, __boot_cpu_mode
- ldr w0, [x1]
cmp w0, #BOOT_CPU_MODE_EL2
b.ne 1f
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ed3a4b87529b..9828ad826837 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -56,6 +56,14 @@ EXPORT_SYMBOL(kimage_vaddr);
u64 kimage_voffset __ro_after_init;
EXPORT_SYMBOL(kimage_voffset);
+u32 __boot_cpu_mode[] = { BOOT_CPU_MODE_EL2, BOOT_CPU_MODE_EL1 };
+
+/*
+ * The booting CPU updates the failed status @__early_cpu_boot_status,
+ * with MMU turned off.
+ */
+long __section(".mmuoff.data.write") __early_cpu_boot_status;
+
/*
* Empty_zero_page is a special page that is used for zero-initialized data
* and COW.
--
2.30.2
* [PATCH v4 19/26] arm64: kaslr: defer initialization to late initcall where permitted
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (17 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 18/26] arm64: head: record CPU boot mode after enabling the MMU Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-24 13:08 ` Will Deacon
2022-06-13 14:45 ` [PATCH v4 20/26] arm64: head: avoid relocating the kernel twice for KASLR Ard Biesheuvel
` (7 subsequent siblings)
26 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
The early KASLR init code runs extremely early, and anything that could
be deferred until later should be. So let's defer the randomization of
the module region until much later - this also simplifies the
arithmetic, given that we no longer have to reason about the link time
vs load time placement of the core kernel explicitly. Also get rid of
the global status variable, and infer the status reported by the
diagnostic print from other KASLR related context.
While at it, get rid of the special case for KASAN without
KASAN_VMALLOC, which never occurs in practice.
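To illustrate the simplified module-region arithmetic, here is a C sketch of the scaling step that kaslr_init() performs after this patch: the low 21 bits of a random value are treated as a fraction of 2^21 and applied to module_range. The function name and the explicit page_size parameter are assumptions of this sketch; the kernel uses PAGE_MASK and its own module_range values.

```c
#include <assert.h>
#include <stdint.h>

/* Offset 'base' by a pseudo-random fraction of 'module_range' and
 * page-align the result. Only the low 21 bits of 'seed' are used. */
static uint64_t randomize_module_base(uint64_t base, uint64_t module_range,
				      uint32_t seed, uint64_t page_size)
{
	uint64_t frac = seed & ((UINT32_C(1) << 21) - 1); /* 0 .. 2^21 - 1 */

	assert(page_size && (page_size & (page_size - 1)) == 0);
	assert(module_range <= (UINT64_C(1) << 42)); /* product fits in 64 bits */

	base += (module_range * frac) >> 21; /* base + range * frac / 2^21 */
	return base & ~(page_size - 1);      /* equivalent of '&= PAGE_MASK' */
}
```

With a page-aligned base, the result always lies in [base, base + module_range), and bits above bit 20 of the seed have no effect.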
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/kaslr.c | 95 +++++++++-----------
1 file changed, 40 insertions(+), 55 deletions(-)
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index d5542666182f..af9ffe4d0f0f 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -20,14 +20,6 @@
#include <asm/sections.h>
#include <asm/setup.h>
-enum kaslr_status {
- KASLR_ENABLED,
- KASLR_DISABLED_CMDLINE,
- KASLR_DISABLED_NO_SEED,
- KASLR_DISABLED_FDT_REMAP,
-};
-
-static enum kaslr_status __initdata kaslr_status;
u64 __ro_after_init module_alloc_base;
u16 __initdata memstart_offset_seed;
@@ -63,15 +55,9 @@ struct arm64_ftr_override kaslr_feature_override __initdata;
u64 __init kaslr_early_init(void)
{
void *fdt;
- u64 seed, offset, mask, module_range;
+ u64 seed, offset, mask;
unsigned long raw;
- /*
- * Set a reasonable default for module_alloc_base in case
- * we end up running with module randomization disabled.
- */
- module_alloc_base = (u64)_etext - MODULES_VSIZE;
-
/*
* Try to map the FDT early. If this fails, we simply bail,
* and proceed with KASLR disabled. We will make another
@@ -79,7 +65,6 @@ u64 __init kaslr_early_init(void)
*/
fdt = get_early_fdt_ptr();
if (!fdt) {
- kaslr_status = KASLR_DISABLED_FDT_REMAP;
return 0;
}
@@ -93,7 +78,6 @@ u64 __init kaslr_early_init(void)
* return 0 if that is the case.
*/
if (kaslr_feature_override.val & kaslr_feature_override.mask & 0xf) {
- kaslr_status = KASLR_DISABLED_CMDLINE;
return 0;
}
@@ -106,7 +90,6 @@ u64 __init kaslr_early_init(void)
seed ^= raw;
if (!seed) {
- kaslr_status = KASLR_DISABLED_NO_SEED;
return 0;
}
@@ -126,19 +109,43 @@ u64 __init kaslr_early_init(void)
/* use the top 16 bits to randomize the linear region */
memstart_offset_seed = seed >> 48;
- if (!IS_ENABLED(CONFIG_KASAN_VMALLOC) &&
- (IS_ENABLED(CONFIG_KASAN_GENERIC) ||
- IS_ENABLED(CONFIG_KASAN_SW_TAGS)))
- /*
- * KASAN without KASAN_VMALLOC does not expect the module region
- * to intersect the vmalloc region, since shadow memory is
- * allocated for each module at load time, whereas the vmalloc
- * region is shadowed by KASAN zero pages. So keep modules
- * out of the vmalloc region if KASAN is enabled without
- * KASAN_VMALLOC, and put the kernel well within 4 GB of the
- * module region.
- */
- return offset % SZ_2G;
+ return offset;
+}
+
+static int __init kaslr_init(void)
+{
+ u64 module_range;
+ u32 seed;
+
+ /*
+ * Set a reasonable default for module_alloc_base in case
+ * we end up running with module randomization disabled.
+ */
+ module_alloc_base = (u64)_etext - MODULES_VSIZE;
+
+ if (kaslr_feature_override.val & kaslr_feature_override.mask & 0xf) {
+ pr_info("KASLR disabled on command line\n");
+ return 0;
+ }
+
+ if (!kaslr_offset()) {
+ pr_warn("KASLR disabled due to lack of seed\n");
+ return 0;
+ }
+
+ pr_info("KASLR enabled\n");
+
+ /*
+ * KASAN without KASAN_VMALLOC does not expect the module region to
+ * intersect the vmalloc region, since shadow memory is allocated for
+ * each module at load time, whereas the vmalloc region will already be
+ * shadowed by KASAN zero pages.
+ */
+ BUILD_BUG_ON((IS_ENABLED(CONFIG_KASAN_GENERIC) ||
+ IS_ENABLED(CONFIG_KASAN_SW_TAGS)) &&
+ !IS_ENABLED(CONFIG_KASAN_VMALLOC));
+
+ seed = get_random_u32();
if (IS_ENABLED(CONFIG_RANDOMIZE_MODULE_REGION_FULL)) {
/*
@@ -150,8 +157,7 @@ u64 __init kaslr_early_init(void)
* resolved normally.)
*/
module_range = SZ_2G - (u64)(_end - _stext);
- module_alloc_base = max((u64)_end + offset - SZ_2G,
- (u64)MODULES_VADDR);
+ module_alloc_base = max((u64)_end - SZ_2G, (u64)MODULES_VADDR);
} else {
/*
* Randomize the module region by setting module_alloc_base to
@@ -163,33 +169,12 @@ u64 __init kaslr_early_init(void)
* when ARM64_MODULE_PLTS is enabled.
*/
module_range = MODULES_VSIZE - (u64)(_etext - _stext);
- module_alloc_base = (u64)_etext + offset - MODULES_VSIZE;
}
/* use the lower 21 bits to randomize the base of the module region */
module_alloc_base += (module_range * (seed & ((1 << 21) - 1))) >> 21;
module_alloc_base &= PAGE_MASK;
- return offset;
-}
-
-static int __init kaslr_init(void)
-{
- switch (kaslr_status) {
- case KASLR_ENABLED:
- pr_info("KASLR enabled\n");
- break;
- case KASLR_DISABLED_CMDLINE:
- pr_info("KASLR disabled on command line\n");
- break;
- case KASLR_DISABLED_NO_SEED:
- pr_warn("KASLR disabled due to lack of seed\n");
- break;
- case KASLR_DISABLED_FDT_REMAP:
- pr_warn("KASLR disabled due to FDT remapping failure\n");
- break;
- }
-
return 0;
}
-core_initcall(kaslr_init)
+late_initcall(kaslr_init)
--
2.30.2
* Re: [PATCH v4 19/26] arm64: kaslr: defer initialization to late initcall where permitted
2022-06-13 14:45 ` [PATCH v4 19/26] arm64: kaslr: defer initialization to late initcall where permitted Ard Biesheuvel
@ 2022-06-24 13:08 ` Will Deacon
2022-06-24 13:09 ` Ard Biesheuvel
0 siblings, 1 reply; 57+ messages in thread
From: Will Deacon @ 2022-06-24 13:08 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual
On Mon, Jun 13, 2022 at 04:45:43PM +0200, Ard Biesheuvel wrote:
> The early KASLR init code runs extremely early, and anything that could
> be deferred until later should be. So let's defer the randomization of
> the module region until much later - this also simplifies the
> arithmetic, given that we no longer have to reason about the link time
> vs load time placement of the core kernel explicitly. Also get rid of
> the global status variable, and infer the status reported by the
> diagnostic print from other KASLR related context.
>
> While at it, get rid of the special case for KASAN without
> KASAN_VMALLOC, which never occurs in practice.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/arm64/kernel/kaslr.c | 95 +++++++++-----------
> 1 file changed, 40 insertions(+), 55 deletions(-)
[...]
> @@ -163,33 +169,12 @@ u64 __init kaslr_early_init(void)
> * when ARM64_MODULE_PLTS is enabled.
> */
> module_range = MODULES_VSIZE - (u64)(_etext - _stext);
> - module_alloc_base = (u64)_etext + offset - MODULES_VSIZE;
> }
>
> /* use the lower 21 bits to randomize the base of the module region */
> module_alloc_base += (module_range * (seed & ((1 << 21) - 1))) >> 21;
> module_alloc_base &= PAGE_MASK;
>
> - return offset;
> -}
> -
> -static int __init kaslr_init(void)
> -{
> - switch (kaslr_status) {
> - case KASLR_ENABLED:
> - pr_info("KASLR enabled\n");
> - break;
> - case KASLR_DISABLED_CMDLINE:
> - pr_info("KASLR disabled on command line\n");
> - break;
> - case KASLR_DISABLED_NO_SEED:
> - pr_warn("KASLR disabled due to lack of seed\n");
> - break;
> - case KASLR_DISABLED_FDT_REMAP:
> - pr_warn("KASLR disabled due to FDT remapping failure\n");
> - break;
> - }
> -
> return 0;
> }
> -core_initcall(kaslr_init)
> +late_initcall(kaslr_init)
Are you sure this isn't too late? I'm nervous that we might have called
request_module() off the back of all the other initcalls that we've run by
this point.
Will
* Re: [PATCH v4 19/26] arm64: kaslr: defer initialization to late initcall where permitted
2022-06-24 13:08 ` Will Deacon
@ 2022-06-24 13:09 ` Ard Biesheuvel
0 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-24 13:09 UTC (permalink / raw)
To: Will Deacon
Cc: Linux ARM, linux-hardening, Marc Zyngier, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual
On Fri, 24 Jun 2022 at 15:08, Will Deacon <will@kernel.org> wrote:
>
> On Mon, Jun 13, 2022 at 04:45:43PM +0200, Ard Biesheuvel wrote:
> > The early KASLR init code runs extremely early, and anything that could
> > be deferred until later should be. So let's defer the randomization of
> > the module region until much later - this also simplifies the
> > arithmetic, given that we no longer have to reason about the link time
> > vs load time placement of the core kernel explicitly. Also get rid of
> > the global status variable, and infer the status reported by the
> > diagnostic print from other KASLR related context.
> >
> > While at it, get rid of the special case for KASAN without
> > KASAN_VMALLOC, which never occurs in practice.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > arch/arm64/kernel/kaslr.c | 95 +++++++++-----------
> > 1 file changed, 40 insertions(+), 55 deletions(-)
>
> [...]
>
> > @@ -163,33 +169,12 @@ u64 __init kaslr_early_init(void)
> > * when ARM64_MODULE_PLTS is enabled.
> > */
> > module_range = MODULES_VSIZE - (u64)(_etext - _stext);
> > - module_alloc_base = (u64)_etext + offset - MODULES_VSIZE;
> > }
> >
> > /* use the lower 21 bits to randomize the base of the module region */
> > module_alloc_base += (module_range * (seed & ((1 << 21) - 1))) >> 21;
> > module_alloc_base &= PAGE_MASK;
> >
> > - return offset;
> > -}
> > -
> > -static int __init kaslr_init(void)
> > -{
> > - switch (kaslr_status) {
> > - case KASLR_ENABLED:
> > - pr_info("KASLR enabled\n");
> > - break;
> > - case KASLR_DISABLED_CMDLINE:
> > - pr_info("KASLR disabled on command line\n");
> > - break;
> > - case KASLR_DISABLED_NO_SEED:
> > - pr_warn("KASLR disabled due to lack of seed\n");
> > - break;
> > - case KASLR_DISABLED_FDT_REMAP:
> > - pr_warn("KASLR disabled due to FDT remapping failure\n");
> > - break;
> > - }
> > -
> > return 0;
> > }
> > -core_initcall(kaslr_init)
> > +late_initcall(kaslr_init)
>
> Are you sure this isn't too late? I'm nervous that we might have called
> request_module() off the back of all the other initcalls that we've run by
> this point.
>
Yeah, I just realized the other day that this is probably too late.
subsys_initcall() might be more suitable here.
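The concern above is about ordering between initcall levels. A minimal C model of that ordering follows. The enum mirrors the numbered levels in include/linux/init.h, from pure through late, omitting rootfs and the *_sync variants. The assumption that the first module load is triggered from a device_initcall() is purely hypothetical. Within a single level, the order depends on link order, so only a strictly earlier level guarantees that module_alloc_base is set up first.

```c
#include <assert.h>
#include <stdbool.h>

/* Numbered initcall levels, in the order the kernel runs them. */
enum initcall_level {
	LEVEL_PURE,     /* pure_initcall()     */
	LEVEL_CORE,     /* core_initcall()     */
	LEVEL_POSTCORE, /* postcore_initcall() */
	LEVEL_ARCH,     /* arch_initcall()     */
	LEVEL_SUBSYS,   /* subsys_initcall()   */
	LEVEL_FS,       /* fs_initcall()       */
	LEVEL_DEVICE,   /* device_initcall()   */
	LEVEL_LATE,     /* late_initcall()     */
};

/* Ordering inside one level follows link order, so only a strictly
 * lower level guarantees that 'setup' runs before 'user'. */
static bool guaranteed_before(enum initcall_level setup,
			      enum initcall_level user)
{
	assert(setup <= LEVEL_LATE && user <= LEVEL_LATE);
	return setup < user;
}
```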
* [PATCH v4 20/26] arm64: head: avoid relocating the kernel twice for KASLR
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (18 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 19/26] arm64: kaslr: defer initialization to late initcall where permitted Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-24 13:16 ` Will Deacon
2022-06-13 14:45 ` [PATCH v4 21/26] arm64: setup: drop early FDT pointer helpers Ard Biesheuvel
` (6 subsequent siblings)
26 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Currently, when KASLR is in effect, we set up the kernel virtual address
space twice: the first time, the KASLR seed is looked up in the device
tree, after which the kernel virtual mapping is torn down and recreated,
and the relocations are applied a second time. The latter step means
that statically initialized global pointer variables are reset to their
initial values, and BSS is cleared again as well, to ensure that no BSS
variable retains a value based on the initial translation.
All of this is needed because the command line (taken from the DT) must
tell us whether or not to randomize the virtual address space before we
enter the kernel proper. However, this code has expanded little by little,
and it now creates global state unrelated to the virtual randomization of
the kernel before the mapping is torn down and set up again and BSS is
cleared a second time. This has caused issues in the past, and it would
be better to avoid this little dance if possible.
So instead, let's use the temporary mapping of the device tree, and
execute the bare minimum of code to decide whether or not KASLR should
be enabled, and what the seed is. Only then do we create the virtual
kernel mapping, clear BSS, etc., and proceed as normal. This avoids the
issues around inconsistent global state due to BSS being cleared twice,
and is generally more maintainable, as it permits us to defer all the
remaining DT parsing and KASLR initialization to a later time.
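As a rough illustration of how little the early code needs to hand back, here is a C sketch that derives the kernel offset from a seed using the arithmetic of the kaslr_early_init() code removed below, and then packs the linear-map seed into the offset's low bits. The packing is an assumption inferred from the 'and x24, x0, #SZ_2M - 1' / 'bic x0, x0, #SZ_2M - 1' sequence in __primary_switch. The body of the new kaslr_early.c is not shown here, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(1) << 21)

/* 2 MiB aligned kernel offset in the range
 * [2^(va_bits_min - 3), 2^(va_bits_min - 3) + 2^(va_bits_min - 2)).
 * A zero seed means KASLR stays disabled. */
static uint64_t kaslr_offset_from_seed(uint64_t seed, unsigned int va_bits_min)
{
	uint64_t mask;

	assert(va_bits_min >= 36 && va_bits_min <= 52); /* arm64 VA sizes */
	if (!seed)
		return 0;
	mask = ((UINT64_C(1) << (va_bits_min - 2)) - 1) & ~(SZ_2M - 1);
	return (UINT64_C(1) << (va_bits_min - 3)) + (seed & mask);
}

/* Hypothetical packing: the offset's low 21 bits are zero, so the top 16
 * bits of the seed (the linear map seed) can ride along in them. */
static uint64_t kaslr_pack(uint64_t seed, unsigned int va_bits_min)
{
	uint64_t offset = kaslr_offset_from_seed(seed, va_bits_min);

	return offset ? (offset | (seed >> 48)) : 0;
}

/* Caller side: 'and x24, x0, #SZ_2M - 1' followed by 'strh w24'. */
static uint16_t unpack_memstart_seed(uint64_t packed)
{
	return (uint16_t)(packed & (SZ_2M - 1));
}

/* Caller side: 'bic x0, x0, #SZ_2M - 1'. */
static uint64_t unpack_kernel_offset(uint64_t packed)
{
	return packed & ~(SZ_2M - 1);
}
```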
This means the relocation fixup code runs only a single time as well,
allowing us to simplify the RELR handling code too: RELR relocations are
not idempotent, so that code previously had to keep track of the offset
that was applied the first time around.
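The non-idempotency is easiest to see in code. This C sketch decodes RELR entries in the usual way: an even entry is an address, and an odd entry is a 63-bit bitmap covering the words that follow. It then adds a displacement to each marked word. To keep it self-contained, address entries are treated as byte offsets into a host array. That is an assumption of the sketch rather than how head.S works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add 'delta' to every word of 'image' marked by the RELR table 'relr'.
 * Sketch assumption: address entries are byte offsets into 'image'. */
static void apply_relr(uint64_t *image, size_t image_words,
		       const uint64_t *relr, size_t count, uint64_t delta)
{
	uint64_t *where = NULL;

	for (size_t k = 0; k < count; k++) {
		uint64_t entry = relr[k];

		if ((entry & 1) == 0) {
			/* Address entry: relocate one word; a following
			 * bitmap describes the 63 words after it. */
			assert(entry / 8 < image_words);
			where = image + entry / 8;
			*where++ += delta;
		} else {
			/* Bitmap entry: bit i (i >= 1) marks where[i - 1]. */
			assert(where != NULL &&
			       (size_t)(where - image) + 63 <= image_words);
			for (size_t i = 0; (entry >>= 1) != 0; i++)
				if (entry & 1)
					where[i] += delta;
			where += 63;
		}
	}
}
```

Running it twice with the same delta relocates every marked word twice. The removed x24 bookkeeping avoided that by applying only the difference between the new offset and the previously applied one.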
Note that this means we have to clone a pair of FDT library objects, so
that we can control how they are built - we need the stack protector
and other instrumentation disabled so that the code can tolerate being
called this early. Also note that only the kernel page tables and the
temporary stack are mapped read-write at this point, which ensures that
the early code does not modify any global state inadvertently.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/Makefile | 2 +-
arch/arm64/kernel/head.S | 73 ++++---------
arch/arm64/kernel/image-vars.h | 4 +
arch/arm64/kernel/kaslr.c | 87 ---------------
arch/arm64/kernel/pi/Makefile | 33 ++++++
arch/arm64/kernel/pi/kaslr_early.c | 112 ++++++++++++++++++++
6 files changed, 171 insertions(+), 140 deletions(-)
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index fa7981d0d917..88a96511580e 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -59,7 +59,7 @@ obj-$(CONFIG_ACPI) += acpi.o
obj-$(CONFIG_ACPI_NUMA) += acpi_numa.o
obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL) += acpi_parking_protocol.o
obj-$(CONFIG_PARAVIRT) += paravirt.o
-obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
+obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o pi/
obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
obj-$(CONFIG_ELF_CORE) += elfcore.o
obj-$(CONFIG_KEXEC_CORE) += machine_kexec.o relocate_kernel.o \
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 8de346dd4470..5a2ff6466b6b 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -86,15 +86,13 @@
* x21 primary_entry() .. start_kernel() FDT pointer passed at boot in x0
* x22 create_idmap() .. start_kernel() ID map VA of the DT blob
* x23 primary_entry() .. start_kernel() physical misalignment/KASLR offset
- * x24 __primary_switch() .. relocate_kernel() current RELR displacement
+ * x24 __primary_switch() linear map KASLR seed
* x28 create_idmap() callee preserved temp register
*/
SYM_CODE_START(primary_entry)
bl preserve_boot_args
bl init_kernel_el // w0=cpu_boot_mode
mov x20, x0
- adrp x23, __PHYS_OFFSET
- and x23, x23, MIN_KIMG_ALIGN - 1 // KASLR offset, defaults to 0
bl create_idmap
/*
@@ -441,6 +439,10 @@ SYM_FUNC_START_LOCAL(__primary_switched)
bl __pi_memset
dsb ishst // Make zero page visible to PTW
+#ifdef CONFIG_RANDOMIZE_BASE
+ adrp x5, memstart_offset_seed // Save KASLR linear map seed
+ strh w24, [x5, :lo12:memstart_offset_seed]
+#endif
#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
bl kasan_early_init
#endif
@@ -448,16 +450,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
bl early_fdt_map // Try mapping the FDT early
mov x0, x22 // pass FDT address in x0
bl init_feature_override // Parse cpu feature overrides
-#ifdef CONFIG_RANDOMIZE_BASE
- tst x23, ~(MIN_KIMG_ALIGN - 1) // already running randomized?
- b.ne 0f
- bl kaslr_early_init // parse FDT for KASLR options
- cbz x0, 0f // KASLR disabled? just proceed
- orr x23, x23, x0 // record KASLR offset
- ldp x29, x30, [sp], #16 // we must enable KASLR, return
- ret // to __primary_switch()
-0:
-#endif
mov x0, x20
bl switch_to_vhe // Prefer VHE if possible
ldp x29, x30, [sp], #16
@@ -759,27 +751,17 @@ SYM_FUNC_START_LOCAL(__relocate_kernel)
* entry in x9, the address being relocated by the current address or
* bitmap entry in x13 and the address being relocated by the current
* bit in x14.
- *
- * Because addends are stored in place in the binary, RELR relocations
- * cannot be applied idempotently. We use x24 to keep track of the
- * currently applied displacement so that we can correctly relocate if
- * __relocate_kernel is called twice with non-zero displacements (i.e.
- * if there is both a physical misalignment and a KASLR displacement).
*/
adr_l x9, __relr_start
adr_l x10, __relr_end
- sub x15, x23, x24 // delta from previous offset
- cbz x15, 7f // nothing to do if unchanged
- mov x24, x23 // save new offset
-
2: cmp x9, x10
b.hs 7f
ldr x11, [x9], #8
tbnz x11, #0, 3f // branch to handle bitmaps
add x13, x11, x23
ldr x12, [x13] // relocate address entry
- add x12, x12, x15
+ add x12, x12, x23
str x12, [x13], #8 // adjust to start of bitmap
b 2b
@@ -788,7 +770,7 @@ SYM_FUNC_START_LOCAL(__relocate_kernel)
cbz x11, 6f
tbz x11, #0, 5f // skip bit if not set
ldr x12, [x14] // relocate bit
- add x12, x12, x15
+ add x12, x12, x23
str x12, [x14]
5: add x14, x14, #8 // move to next bit's address
@@ -812,40 +794,27 @@ SYM_FUNC_START_LOCAL(__primary_switch)
adrp x1, reserved_pg_dir
adrp x2, init_idmap_pg_dir
bl __enable_mmu
-
+#ifdef CONFIG_RELOCATABLE
+ adrp x23, __PHYS_OFFSET
+ and x23, x23, MIN_KIMG_ALIGN - 1
+#ifdef CONFIG_RANDOMIZE_BASE
+ mov x0, x22
+ adrp x1, init_pg_end
+ mov sp, x1
+ mov x29, xzr
+ bl __pi_kaslr_early_init
+ and x24, x0, #SZ_2M - 1 // capture memstart offset seed
+ bic x0, x0, #SZ_2M - 1
+ orr x23, x23, x0 // record kernel offset
+#endif
+#endif
bl clear_page_tables
bl create_kernel_mapping
adrp x1, init_pg_dir
load_ttbr1 x1, x1, x2
#ifdef CONFIG_RELOCATABLE
-#ifdef CONFIG_RELR
- mov x24, #0 // no RELR displacement yet
-#endif
bl __relocate_kernel
-#ifdef CONFIG_RANDOMIZE_BASE
- ldr x8, =__primary_switched
- adrp x0, __PHYS_OFFSET
- blr x8
-
- /*
- * If we return here, we have a KASLR displacement in x23 which we need
- * to take into account by discarding the current kernel mapping and
- * creating a new one.
- */
- adrp x1, reserved_pg_dir // Disable translations via TTBR1
- load_ttbr1 x1, x1, x2
- bl clear_page_tables
- bl create_kernel_mapping // Recreate kernel mapping
-
- tlbi vmalle1 // Remove any stale TLB entries
- dsb nsh
- isb
-
- adrp x1, init_pg_dir // Re-enable translations via TTBR1
- load_ttbr1 x1, x1, x2
- bl __relocate_kernel
-#endif
#endif
ldr x8, =__primary_switched
adrp x0, __PHYS_OFFSET
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 241c86b67d01..0c381a405bf0 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -41,6 +41,10 @@ __efistub_dcache_clean_poc = __pi_dcache_clean_poc;
__efistub___memcpy = __pi_memcpy;
__efistub___memmove = __pi_memmove;
__efistub___memset = __pi_memset;
+
+__pi___memcpy = __pi_memcpy;
+__pi___memmove = __pi_memmove;
+__pi___memset = __pi_memset;
#endif
__efistub__text = _text;
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index af9ffe4d0f0f..06515afce692 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -23,95 +23,8 @@
u64 __ro_after_init module_alloc_base;
u16 __initdata memstart_offset_seed;
-static __init u64 get_kaslr_seed(void *fdt)
-{
- int node, len;
- fdt64_t *prop;
- u64 ret;
-
- node = fdt_path_offset(fdt, "/chosen");
- if (node < 0)
- return 0;
-
- prop = fdt_getprop_w(fdt, node, "kaslr-seed", &len);
- if (!prop || len != sizeof(u64))
- return 0;
-
- ret = fdt64_to_cpu(*prop);
- *prop = 0;
- return ret;
-}
-
struct arm64_ftr_override kaslr_feature_override __initdata;
-/*
- * This routine will be executed with the kernel mapped at its default virtual
- * address, and if it returns successfully, the kernel will be remapped, and
- * start_kernel() will be executed from a randomized virtual offset. The
- * relocation will result in all absolute references (e.g., static variables
- * containing function pointers) to be reinitialized, and zero-initialized
- * .bss variables will be reset to 0.
- */
-u64 __init kaslr_early_init(void)
-{
- void *fdt;
- u64 seed, offset, mask;
- unsigned long raw;
-
- /*
- * Try to map the FDT early. If this fails, we simply bail,
- * and proceed with KASLR disabled. We will make another
- * attempt at mapping the FDT in setup_machine()
- */
- fdt = get_early_fdt_ptr();
- if (!fdt) {
- return 0;
- }
-
- /*
- * Retrieve (and wipe) the seed from the FDT
- */
- seed = get_kaslr_seed(fdt);
-
- /*
- * Check if 'nokaslr' appears on the command line, and
- * return 0 if that is the case.
- */
- if (kaslr_feature_override.val & kaslr_feature_override.mask & 0xf) {
- return 0;
- }
-
- /*
- * Mix in any entropy obtainable architecturally if enabled
- * and supported.
- */
-
- if (arch_get_random_seed_long_early(&raw))
- seed ^= raw;
-
- if (!seed) {
- return 0;
- }
-
- /*
- * OK, so we are proceeding with KASLR enabled. Calculate a suitable
- * kernel image offset from the seed. Let's place the kernel in the
- * middle half of the VMALLOC area (VA_BITS_MIN - 2), and stay clear of
- * the lower and upper quarters to avoid colliding with other
- * allocations.
- * Even if we could randomize at page granularity for 16k and 64k pages,
- * let's always round to 2 MB so we don't interfere with the ability to
- * map using contiguous PTEs
- */
- mask = ((1UL << (VA_BITS_MIN - 2)) - 1) & ~(SZ_2M - 1);
- offset = BIT(VA_BITS_MIN - 3) + (seed & mask);
-
- /* use the top 16 bits to randomize the linear region */
- memstart_offset_seed = seed >> 48;
-
- return offset;
-}
-
static int __init kaslr_init(void)
{
u64 module_range;
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
new file mode 100644
index 000000000000..839291430cb3
--- /dev/null
+++ b/arch/arm64/kernel/pi/Makefile
@@ -0,0 +1,33 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright 2022 Google LLC
+
+KBUILD_CFLAGS := $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) -fpie \
+ -Os -DDISABLE_BRANCH_PROFILING $(DISABLE_STACKLEAK_PLUGIN) \
+ $(call cc-option,-mbranch-protection=none) \
+ -I$(srctree)/scripts/dtc/libfdt -fno-stack-protector \
+ -include $(srctree)/include/linux/hidden.h \
+ -D__DISABLE_EXPORTS -ffreestanding -D__NO_FORTIFY \
+ $(call cc-option,-fno-addrsig)
+
+# remove SCS flags from all objects in this directory
+KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
+# disable LTO
+KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))
+
+GCOV_PROFILE := n
+KASAN_SANITIZE := n
+KCSAN_SANITIZE := n
+UBSAN_SANITIZE := n
+KCOV_INSTRUMENT := n
+
+$(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \
+ --remove-section=.note.gnu.property \
+ --prefix-alloc-sections=.init
+$(obj)/%.pi.o: $(obj)/%.o FORCE
+ $(call if_changed,objcopy)
+
+$(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
+ $(call if_changed_rule,cc_o_c)
+
+obj-y := kaslr_early.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
+extra-y := $(patsubst %.pi.o,%.o,$(obj-y))
diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
new file mode 100644
index 000000000000..6c3855e69395
--- /dev/null
+++ b/arch/arm64/kernel/pi/kaslr_early.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2022 Google LLC
+// Author: Ard Biesheuvel <ardb@google.com>
+
+// NOTE: code in this file runs *very* early, and is not permitted to use
+// global variables or anything that relies on absolute addressing.
+
+#include <linux/libfdt.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <linux/types.h>
+#include <linux/sizes.h>
+#include <linux/string.h>
+
+#include <asm/archrandom.h>
+#include <asm/memory.h>
+
+/* taken from lib/string.c */
+static char *__strstr(const char *s1, const char *s2)
+{
+ size_t l1, l2;
+
+ l2 = strlen(s2);
+ if (!l2)
+ return (char *)s1;
+ l1 = strlen(s1);
+ while (l1 >= l2) {
+ l1--;
+ if (!memcmp(s1, s2, l2))
+ return (char *)s1;
+ s1++;
+ }
+ return NULL;
+}
+static bool cmdline_contains_nokaslr(const u8 *cmdline)
+{
+ const u8 *str;
+
+ str = __strstr(cmdline, "nokaslr");
+ return str == cmdline || (str > cmdline && *(str - 1) == ' ');
+}
+
+static bool is_kaslr_disabled_cmdline(void *fdt)
+{
+ if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
+ int node;
+ const u8 *prop;
+
+ node = fdt_path_offset(fdt, "/chosen");
+ if (node < 0)
+ goto out;
+
+ prop = fdt_getprop(fdt, node, "bootargs", NULL);
+ if (!prop)
+ goto out;
+
+ if (cmdline_contains_nokaslr(prop))
+ return true;
+
+ if (IS_ENABLED(CONFIG_CMDLINE_EXTEND))
+ goto out;
+
+ return false;
+ }
+out:
+ return cmdline_contains_nokaslr(CONFIG_CMDLINE);
+}
+
+static u64 get_kaslr_seed(void *fdt)
+{
+ int node, len;
+ fdt64_t *prop;
+ u64 ret;
+
+ node = fdt_path_offset(fdt, "/chosen");
+ if (node < 0)
+ return 0;
+
+ prop = fdt_getprop_w(fdt, node, "kaslr-seed", &len);
+ if (!prop || len != sizeof(u64))
+ return 0;
+
+ ret = fdt64_to_cpu(*prop);
+ *prop = 0;
+ return ret;
+}
+
+asmlinkage u64 kaslr_early_init(void *fdt)
+{
+ u64 seed;
+
+ if (is_kaslr_disabled_cmdline(fdt))
+ return 0;
+
+ seed = get_kaslr_seed(fdt);
+ if (!seed) {
+#ifdef CONFIG_ARCH_RANDOM
+ if (!__early_cpu_has_rndr() ||
+ !__arm64_rndr((unsigned long *)&seed))
+#endif
+ return 0;
+ }
+
+ /*
+ * OK, so we are proceeding with KASLR enabled. Calculate a suitable
+ * kernel image offset from the seed. Let's place the kernel in the
+ * middle half of the VMALLOC area (VA_BITS_MIN - 2), and stay clear of
+ * the lower and upper quarters to avoid colliding with other
+ * allocations.
+ */
+ return BIT(VA_BITS_MIN - 3) + (seed & GENMASK(VA_BITS_MIN - 3, 0));
+}
--
2.30.2
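The "nokaslr" test in cmdline_contains_nokaslr() above can be modelled outside the kernel. The following is a user-space sketch, not the kernel code. libc strstr() stands in for the local __strstr() copy that the early code has to carry, the function name cmdline_has_nokaslr() is hypothetical, and the NULL check that the kernel version folds into the `str > cmdline` comparison is spelled out explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * User-space model of cmdline_contains_nokaslr(): "nokaslr" only counts
 * if it starts the command line or directly follows a space. As in the
 * kernel helper, only the first occurrence is examined and the character
 * after the match is not checked.
 */
static bool cmdline_has_nokaslr(const char *cmdline)
{
	const char *str;

	assert(cmdline != NULL);
	str = strstr(cmdline, "nokaslr");
	return str == cmdline || (str != NULL && str[-1] == ' ');
}
```

Because only the first match is considered, a command line such as "xnokaslr nokaslr" does not disable KASLR in this model, and "nokaslrfoo" does; the kernel helper as posted behaves the same way.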
* Re: [PATCH v4 20/26] arm64: head: avoid relocating the kernel twice for KASLR
2022-06-13 14:45 ` [PATCH v4 20/26] arm64: head: avoid relocating the kernel twice for KASLR Ard Biesheuvel
@ 2022-06-24 13:16 ` Will Deacon
2022-06-24 13:17 ` Ard Biesheuvel
0 siblings, 1 reply; 57+ messages in thread
From: Will Deacon @ 2022-06-24 13:16 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual
On Mon, Jun 13, 2022 at 04:45:44PM +0200, Ard Biesheuvel wrote:
> Currently, when KASLR is in effect, we set up the kernel virtual address
> space twice: the first time, the KASLR seed is looked up in the device
> tree, and the kernel virtual mapping is torn down and recreated again,
> after which the relocations are applied a second time. The latter step
> means that statically initialized global pointer variables will be reset
> to their initial values, and to ensure that BSS variables are not set to
> values based on the initial translation, they are cleared again as well.
>
> All of this is needed because we need the command line (taken from the
> DT) to tell us whether or not to randomize the virtual address space
> before entering the kernel proper. However, this code has expanded
> little by little and now creates global state unrelated to the virtual
> randomization of the kernel before the mapping is torn down and set up
> again, and the BSS cleared for a second time. This has created some
> issues in the past, and it would be better to avoid this little dance if
> possible.
>
> So instead, let's use the temporary mapping of the device tree, and
> execute the bare minimum of code to decide whether or not KASLR should
> be enabled, and what the seed is. Only then, create the virtual kernel
> mapping, clear BSS, etc and proceed as normal. This avoids the issues
> around inconsistent global state due to BSS being cleared twice, and is
> generally more maintainable, as it permits us to defer all the remaining
> DT parsing and KASLR initialization to a later time.
>
> This means the relocation fixup code runs only a single time as well,
> allowing us to simplify the RELR handling code too, which is not
> idempotent and was therefore required to keep track of the offset that
> was applied the first time around.
>
> Note that this means we have to clone a pair of FDT library objects, so
> that we can control how they are built - we need the stack protector
> and other instrumentation disabled so that the code can tolerate being
> called this early. Note that only the kernel page tables and the
> temporary stack are mapped read-write at this point, which ensures that
> the early code does not modify any global state inadvertently.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/arm64/kernel/Makefile | 2 +-
> arch/arm64/kernel/head.S | 73 ++++---------
> arch/arm64/kernel/image-vars.h | 4 +
> arch/arm64/kernel/kaslr.c | 87 ---------------
> arch/arm64/kernel/pi/Makefile | 33 ++++++
> arch/arm64/kernel/pi/kaslr_early.c | 112 ++++++++++++++++++++
Heh, how long before we get a decompressor in here too?
Will
* Re: [PATCH v4 20/26] arm64: head: avoid relocating the kernel twice for KASLR
2022-06-24 13:16 ` Will Deacon
@ 2022-06-24 13:17 ` Ard Biesheuvel
0 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-24 13:17 UTC (permalink / raw)
To: Will Deacon
Cc: Linux ARM, linux-hardening, Marc Zyngier, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual
On Fri, 24 Jun 2022 at 15:16, Will Deacon <will@kernel.org> wrote:
>
> On Mon, Jun 13, 2022 at 04:45:44PM +0200, Ard Biesheuvel wrote:
> > Currently, when KASLR is in effect, we set up the kernel virtual address
> > space twice: the first time, the KASLR seed is looked up in the device
> > tree, and the kernel virtual mapping is torn down and recreated again,
> > after which the relocations are applied a second time. The latter step
> > means that statically initialized global pointer variables will be reset
> > to their initial values, and to ensure that BSS variables are not set to
> > values based on the initial translation, they are cleared again as well.
> >
> > All of this is needed because we need the command line (taken from the
> > DT) to tell us whether or not to randomize the virtual address space
> > before entering the kernel proper. However, this code has expanded
> > little by little and now creates global state unrelated to the virtual
> > randomization of the kernel before the mapping is torn down and set up
> > again, and the BSS cleared for a second time. This has created some
> > issues in the past, and it would be better to avoid this little dance if
> > possible.
> >
> > So instead, let's use the temporary mapping of the device tree, and
> > execute the bare minimum of code to decide whether or not KASLR should
> > be enabled, and what the seed is. Only then, create the virtual kernel
> > mapping, clear BSS, etc and proceed as normal. This avoids the issues
> > around inconsistent global state due to BSS being cleared twice, and is
> > generally more maintainable, as it permits us to defer all the remaining
> > DT parsing and KASLR initialization to a later time.
> >
> > This means the relocation fixup code runs only a single time as well,
> > allowing us to simplify the RELR handling code too, which is not
> > idempotent and was therefore required to keep track of the offset that
> > was applied the first time around.
> >
> > Note that this means we have to clone a pair of FDT library objects, so
> > that we can control how they are built - we need the stack protector
> > and other instrumentation disabled so that the code can tolerate being
> > called this early. Note that only the kernel page tables and the
> > temporary stack are mapped read-write at this point, which ensures that
> > the early code does not modify any global state inadvertently.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > arch/arm64/kernel/Makefile | 2 +-
> > arch/arm64/kernel/head.S | 73 ++++---------
> > arch/arm64/kernel/image-vars.h | 4 +
> > arch/arm64/kernel/kaslr.c | 87 ---------------
> > arch/arm64/kernel/pi/Makefile | 33 ++++++
> > arch/arm64/kernel/pi/kaslr_early.c | 112 ++++++++++++++++++++
>
> Heh, how long before we get a decompressor in here too?
>
Right after BPF support :-)
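The offset returned at the end of the new kaslr_early_init() in patch 20 is plain bit arithmetic, so it can be checked in isolation. The sketch below reproduces that expression in user space; BIT64() and GENMASK64() are local stand-ins for the kernel's BIT() and GENMASK(), kaslr_offset() is a hypothetical name, and VA_BITS_MIN = 48 is only an assumption for the example (the real value depends on the configuration).

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's BIT() and GENMASK() on 64-bit values. */
#define BIT64(n)        (UINT64_C(1) << (n))
#define GENMASK64(h, l) ((~UINT64_C(0) >> (63 - (h))) & (~UINT64_C(0) << (l)))

/*
 * Same expression as the return statement of kaslr_early_init():
 * a base of 2^(va_bits_min - 3) plus the low (va_bits_min - 2) bits
 * of the seed, so the result lies in [2^(n-3), 3 * 2^(n-3)), i.e. the
 * middle half of a 2^(n-1) byte range.
 */
static uint64_t kaslr_offset(uint64_t seed, unsigned int va_bits_min)
{
	assert(va_bits_min >= 4 && va_bits_min <= 64);
	return BIT64(va_bits_min - 3) + (seed & GENMASK64(va_bits_min - 3, 0));
}
```

The kaslr.c code removed by this patch rounded the offset down to 2 MB in C (the `& ~(SZ_2M - 1)` in its mask). The new function returns the unrounded value, so any such rounding presumably happens in the head.S caller, which is not part of this excerpt.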
* [PATCH v4 21/26] arm64: setup: drop early FDT pointer helpers
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (19 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 20/26] arm64: head: avoid relocating the kernel twice for KASLR Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 22/26] arm64: mm: move ro_after_init section into the data segment Ard Biesheuvel
` (5 subsequent siblings)
26 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
We no longer need to call into C code to map the FDT before entering the
kernel proper, since the early KASLR code now uses the temporary mapping
of the device tree, so let's drop the helpers we added for this.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/setup.h | 3 ---
arch/arm64/kernel/head.S | 2 --
arch/arm64/kernel/setup.c | 15 ---------------
3 files changed, 20 deletions(-)
diff --git a/arch/arm64/include/asm/setup.h b/arch/arm64/include/asm/setup.h
index 6437df661700..5f147a418281 100644
--- a/arch/arm64/include/asm/setup.h
+++ b/arch/arm64/include/asm/setup.h
@@ -5,9 +5,6 @@
#include <uapi/asm/setup.h>
-void *get_early_fdt_ptr(void);
-void early_fdt_map(u64 dt_phys);
-
/*
* These two variables are used in the head.S file.
*/
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 5a2ff6466b6b..6bf685f988f1 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -446,8 +446,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
bl kasan_early_init
#endif
- mov x0, x21 // pass FDT address in x0
- bl early_fdt_map // Try mapping the FDT early
mov x0, x22 // pass FDT address in x0
bl init_feature_override // Parse cpu feature overrides
mov x0, x20
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index cf3a759f10d4..6c2120afe542 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -163,21 +163,6 @@ static void __init smp_build_mpidr_hash(void)
pr_warn("Large number of MPIDR hash buckets detected\n");
}
-static void *early_fdt_ptr __initdata;
-
-void __init *get_early_fdt_ptr(void)
-{
- return early_fdt_ptr;
-}
-
-asmlinkage void __init early_fdt_map(u64 dt_phys)
-{
- int fdt_size;
-
- early_fixmap_init();
- early_fdt_ptr = fixmap_remap_fdt(dt_phys, &fdt_size, PAGE_KERNEL);
-}
-
static void __init setup_machine_fdt(phys_addr_t dt_phys)
{
int size;
--
2.30.2
* [PATCH v4 22/26] arm64: mm: move ro_after_init section into the data segment
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (20 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 21/26] arm64: setup: drop early FDT pointer helpers Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 17:00 ` Kees Cook
2022-06-13 14:45 ` [PATCH v4 23/26] arm64: head: remap the kernel text/inittext region read-only Ard Biesheuvel
` (4 subsequent siblings)
26 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Currently, the ro_after_init section sits right in the middle of the
text/rodata/inittext segment, making it difficult to map any of those
non-writable during early boot. So instead, move it to the start of
.data, and update the init sequences so that the section is remapped
read-only once startup completes.
Note that this moves the entire HYP data section into .data as well -
this likely needs to remain as a single block for now, but could perhaps
be split into a .rodata and .data..ro_after_init section later.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/vmlinux.lds.S | 42 ++++++++++++--------
arch/arm64/mm/mmu.c | 29 ++++++++------
2 files changed, 42 insertions(+), 29 deletions(-)
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 45131e354e27..736aca63dad1 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -59,6 +59,7 @@
#define RO_EXCEPTION_TABLE_ALIGN 4
#define RUNTIME_DISCARD_EXIT
+#define RO_AFTER_INIT_DATA
#include <asm-generic/vmlinux.lds.h>
#include <asm/cache.h>
@@ -188,30 +189,13 @@ SECTIONS
/* everything from this point to __init_begin will be marked RO NX */
RO_DATA(PAGE_SIZE)
- HYPERVISOR_DATA_SECTIONS
-
/* code sections that are never executed via the kernel mapping */
.rodata.text : {
TRAMP_TEXT
HIBERNATE_TEXT
KEXEC_TEXT
- . = ALIGN(PAGE_SIZE);
}
- idmap_pg_dir = .;
- . += PAGE_SIZE;
-
-#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
- tramp_pg_dir = .;
- . += PAGE_SIZE;
-#endif
-
- reserved_pg_dir = .;
- . += PAGE_SIZE;
-
- swapper_pg_dir = .;
- . += PAGE_SIZE;
-
. = ALIGN(SEGMENT_ALIGN);
__init_begin = .;
__inittext_begin = .;
@@ -274,6 +258,30 @@ SECTIONS
_data = .;
_sdata = .;
+
+ __start_ro_after_init = .;
+ idmap_pg_dir = .;
+ . += PAGE_SIZE;
+
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+ tramp_pg_dir = .;
+ . += PAGE_SIZE;
+#endif
+ reserved_pg_dir = .;
+ . += PAGE_SIZE;
+
+ swapper_pg_dir = .;
+ . += PAGE_SIZE;
+
+ HYPERVISOR_DATA_SECTIONS
+
+ .data.ro_after_init : {
+ *(.data..ro_after_init)
+ JUMP_TABLE_DATA
+ . = ALIGN(SEGMENT_ALIGN);
+ __end_ro_after_init = .;
+ }
+
RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)
/*
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9828ad826837..e9b074ffc768 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -495,11 +495,17 @@ static void __init __map_memblock(pgd_t *pgdp, phys_addr_t start,
void __init mark_linear_text_alias_ro(void)
{
/*
- * Remove the write permissions from the linear alias of .text/.rodata
+ * Remove the write permissions from the linear alias of .text/.rodata/ro_after_init
*/
update_mapping_prot(__pa_symbol(_stext), (unsigned long)lm_alias(_stext),
(unsigned long)__init_begin - (unsigned long)_stext,
PAGE_KERNEL_RO);
+
+ update_mapping_prot(__pa_symbol(__start_ro_after_init),
+ (unsigned long)lm_alias(__start_ro_after_init),
+ (unsigned long)__end_ro_after_init -
+ (unsigned long)__start_ro_after_init,
+ PAGE_KERNEL_RO);
}
static bool crash_mem_map __initdata;
@@ -608,12 +614,10 @@ void mark_rodata_ro(void)
{
unsigned long section_size;
- /*
- * mark .rodata as read only. Use __init_begin rather than __end_rodata
- * to cover NOTES and EXCEPTION_TABLE.
- */
- section_size = (unsigned long)__init_begin - (unsigned long)__start_rodata;
- update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
+ section_size = (unsigned long)__end_ro_after_init -
+ (unsigned long)__start_ro_after_init;
+ update_mapping_prot(__pa_symbol(__start_ro_after_init),
+ (unsigned long)__start_ro_after_init,
section_size, PAGE_KERNEL_RO);
debug_checkwx();
@@ -733,18 +737,19 @@ static void __init map_kernel(pgd_t *pgdp)
text_prot = __pgprot_modify(text_prot, PTE_GP, PTE_GP);
/*
- * Only rodata will be remapped with different permissions later on,
- * all other segments are allowed to use contiguous mappings.
+ * Only data will be partially remapped with different permissions
+ * later on, all other segments are allowed to use contiguous mappings.
*/
map_kernel_segment(pgdp, _stext, _etext, text_prot, &vmlinux_text, 0,
VM_NO_GUARD);
- map_kernel_segment(pgdp, __start_rodata, __inittext_begin, PAGE_KERNEL,
- &vmlinux_rodata, NO_CONT_MAPPINGS, VM_NO_GUARD);
+ map_kernel_segment(pgdp, __start_rodata, __inittext_begin, PAGE_KERNEL_RO,
+ &vmlinux_rodata, 0, VM_NO_GUARD);
map_kernel_segment(pgdp, __inittext_begin, __inittext_end, text_prot,
&vmlinux_inittext, 0, VM_NO_GUARD);
map_kernel_segment(pgdp, __initdata_begin, __initdata_end, PAGE_KERNEL,
&vmlinux_initdata, 0, VM_NO_GUARD);
- map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);
+ map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data,
+ NO_CONT_MAPPINGS | NO_BLOCK_MAPPINGS, 0);
if (!READ_ONCE(pgd_val(*pgd_offset_pgd(pgdp, FIXADDR_START)))) {
/*
--
2.30.2
* Re: [PATCH v4 22/26] arm64: mm: move ro_after_init section into the data segment
2022-06-13 14:45 ` [PATCH v4 22/26] arm64: mm: move ro_after_init section into the data segment Ard Biesheuvel
@ 2022-06-13 17:00 ` Kees Cook
2022-06-13 17:16 ` Ard Biesheuvel
0 siblings, 1 reply; 57+ messages in thread
From: Kees Cook @ 2022-06-13 17:00 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Catalin Marinas, Mark Brown, Anshuman Khandual
On Mon, Jun 13, 2022 at 04:45:46PM +0200, Ard Biesheuvel wrote:
> Currently, the ro_after_init sections sits right in the middle of the
> text/rodata/inittext segment, making it difficult to map any of those
> non-writable during early boot. So instead, move it to the start of
> .data, and update the init sequences so that the section is remapped
> read-only once startup completes.
>
> Note that this moves the entire HYP data section into .data as well -
> this likely needs to remain as a single block for now, but could perhaps
> split into a .rodata and .data..ro_after_init section later.
If I'm reading this correctly, this means that .data..ro_after_init now
lives between .data and .rodata?
Do the various LKDTM tests still pass after this change?
Reviewed-by: Kees Cook <keescook@chromium.org>
--
Kees Cook
* Re: [PATCH v4 22/26] arm64: mm: move ro_after_init section into the data segment
2022-06-13 17:00 ` Kees Cook
@ 2022-06-13 17:16 ` Ard Biesheuvel
2022-06-13 23:38 ` Kees Cook
0 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 17:16 UTC (permalink / raw)
To: Kees Cook
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Catalin Marinas, Mark Brown, Anshuman Khandual
On Mon, 13 Jun 2022 at 19:00, Kees Cook <keescook@chromium.org> wrote:
>
> On Mon, Jun 13, 2022 at 04:45:46PM +0200, Ard Biesheuvel wrote:
> > Currently, the ro_after_init sections sits right in the middle of the
> > text/rodata/inittext segment, making it difficult to map any of those
> > non-writable during early boot. So instead, move it to the start of
> > .data, and update the init sequences so that the section is remapped
> > read-only once startup completes.
> >
> > Note that this moves the entire HYP data section into .data as well -
> > this likely needs to remain as a single block for now, but could perhaps
> > split into a .rodata and .data..ro_after_init section later.
>
> If I'm reading this correctly, this means that .data..ro_after_init now
> lives between .data and .rodata?
>
No, between .initdata and .data
> Do the various LKDTM tests still pass after this change?
>
Good question, I'll check.
> Reviewed-by: Kees Cook <keescook@chromium.org>
>
> --
> Kees Cook
* Re: [PATCH v4 22/26] arm64: mm: move ro_after_init section into the data segment
2022-06-13 17:16 ` Ard Biesheuvel
@ 2022-06-13 23:38 ` Kees Cook
2022-06-16 11:31 ` Ard Biesheuvel
0 siblings, 1 reply; 57+ messages in thread
From: Kees Cook @ 2022-06-13 23:38 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Catalin Marinas, Mark Brown, Anshuman Khandual
On Mon, Jun 13, 2022 at 07:16:15PM +0200, Ard Biesheuvel wrote:
> On Mon, 13 Jun 2022 at 19:00, Kees Cook <keescook@chromium.org> wrote:
> >
> > On Mon, Jun 13, 2022 at 04:45:46PM +0200, Ard Biesheuvel wrote:
> > > Currently, the ro_after_init sections sits right in the middle of the
> > > text/rodata/inittext segment, making it difficult to map any of those
> > > non-writable during early boot. So instead, move it to the start of
> > > .data, and update the init sequences so that the section is remapped
> > > read-only once startup completes.
> > >
> > > Note that this moves the entire HYP data section into .data as well -
> > > this likely needs to remain as a single block for now, but could perhaps
> > > split into a .rodata and .data..ro_after_init section later.
> >
> > If I'm reading this correctly, this means that .data..ro_after_init now
> > lives between .data and .rodata?
> >
>
> No, between .initdata and .data
Ah, doesn't this mean more padding (for segment alignment) is used? On other
architectures .data..ro_after_init tried to be near the writable/read-only
boundary so segment padding was only needed on one side (e.g. it could
live at the end of .rodata without segment alignment but before .data
which was segment aligned.) Then when .rodata was made read-only (after
__init), .data..ro_after_init would also get set read-only.
In this case, I think it ends up needing segment alignment both at the
front and the end, since the .initdata and .data are freed and left
writable, respectively?
--
Kees Cook
* Re: [PATCH v4 22/26] arm64: mm: move ro_after_init section into the data segment
2022-06-13 23:38 ` Kees Cook
@ 2022-06-16 11:31 ` Ard Biesheuvel
2022-06-16 16:18 ` Kees Cook
0 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-16 11:31 UTC (permalink / raw)
To: Kees Cook
Cc: Linux ARM, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Catalin Marinas, Mark Brown, Anshuman Khandual
On Tue, 14 Jun 2022 at 01:38, Kees Cook <keescook@chromium.org> wrote:
>
> On Mon, Jun 13, 2022 at 07:16:15PM +0200, Ard Biesheuvel wrote:
> > On Mon, 13 Jun 2022 at 19:00, Kees Cook <keescook@chromium.org> wrote:
> > >
> > > On Mon, Jun 13, 2022 at 04:45:46PM +0200, Ard Biesheuvel wrote:
> > > > Currently, the ro_after_init sections sits right in the middle of the
> > > > text/rodata/inittext segment, making it difficult to map any of those
> > > > non-writable during early boot. So instead, move it to the start of
> > > > .data, and update the init sequences so that the section is remapped
> > > > read-only once startup completes.
> > > >
> > > > Note that this moves the entire HYP data section into .data as well -
> > > > this likely needs to remain as a single block for now, but could perhaps
> > > > split into a .rodata and .data..ro_after_init section later.
> > >
> > > If I'm reading this correctly, this means that .data..ro_after_init now
> > > lives between .data and .rodata?
> > >
> >
> > No, between .initdata and .data
>
> Ah, doesn't this mean more padding (for segment alignment) used? On other
> architectures .data..ro_after_init tried to be near the writable/read-only
> boundary so segment padding was only needed on one side (e.g. it could
> live at the end of .rodata without segment alignment but before .data
> which was segment aligned.) Then when .rodata was made read-only (after
> __init), .data..ro_after_init would also get set read-only.
>
> In this case, I think it ends up needing segment alignment both at the
> front and the end, since the .initdata and .data are freed and left
> writable, respectively?
>
We used to have
text
--
rodata
(ro_after_init)
--
inittext
--
initdata
--
data
bss
where -- marks the segment boundaries, which are always aligned to 64k on arm64
After this patch, we get
text
--
rodata
--
inittext
--
initdata
--
(ro_after_init)
data
bss
so in terms of padding due to alignment, there is not a lot of difference.
The main difference here is the fact that we lose the ability to use
block mappings, but if anyone cares about that, we could work around
this by creating a separate segment for ro_after_init.
* Re: [PATCH v4 22/26] arm64: mm: move ro_after_init section into the data segment
2022-06-16 11:31 ` Ard Biesheuvel
@ 2022-06-16 16:18 ` Kees Cook
2022-06-16 16:31 ` Ard Biesheuvel
0 siblings, 1 reply; 57+ messages in thread
From: Kees Cook @ 2022-06-16 16:18 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Linux ARM, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Catalin Marinas, Mark Brown, Anshuman Khandual
On Thu, Jun 16, 2022 at 01:31:23PM +0200, Ard Biesheuvel wrote:
> We used to have
>
> text
> --
> rodata
> (ro_after_init)
> --
> inittext
> --
> initdata
> --
> data
> bss
>
> where -- are the segment boundaries, which are always aligned to 64k on arm64
>
> After this patch, we get
>
> text
> --
> rodata
> --
> inittext
> --
> initdata
> --
> (ro_after_init)
> data
> bss
>
> so in terms of padding due to alignment, there is not a lot of difference.
But how is ro_after_init read-only and data isn't, if there isn't a
segment alignment to make that work out?
--
Kees Cook
* Re: [PATCH v4 22/26] arm64: mm: move ro_after_init section into the data segment
2022-06-16 16:18 ` Kees Cook
@ 2022-06-16 16:31 ` Ard Biesheuvel
0 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-16 16:31 UTC (permalink / raw)
To: Kees Cook
Cc: Linux ARM, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Catalin Marinas, Mark Brown, Anshuman Khandual
On Thu, 16 Jun 2022 at 18:18, Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Jun 16, 2022 at 01:31:23PM +0200, Ard Biesheuvel wrote:
> > We used to have
> >
> > text
> > --
> > rodata
> > (ro_after_init)
> > --
> > inittext
> > --
> > initdata
> > --
> > data
> > bss
> >
> > where -- are the segment boundaries, which are always aligned to 64k on arm64
> >
> > After this patch, we get
> >
> > text
> > --
> > rodata
> > --
> > inittext
> > --
> > initdata
> > --
> > (ro_after_init)
> > data
> > bss
> >
> > so in terms of padding due to alignment, there is not a lot of difference.
>
> But how is ro_after_init read-only and data isn't, if there isn't a
> segment alignment to make that work out?
>
Actually, there is a segment alignment between ro_after_init and data
- my diagram is inaccurate. But we don't actually need that alignment to
remap this slice of memory r/o.
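The point about not needing segment alignment comes down to mapping granularity. map_kernel() now maps the data segment with NO_CONT_MAPPINGS | NO_BLOCK_MAPPINGS, so, roughly speaking, the ro_after_init slice only has to start and end on page boundaries for its permissions to be changed, rather than on 2 MB block boundaries. The sketch below is a simplified user-space model of that constraint, assuming 4 KiB pages; align_up() mimics the linker's ALIGN(), and the helper names and example addresses are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K  UINT64_C(0x1000)     /* page size assumed here */
#define SZ_64K UINT64_C(0x10000)    /* segment alignment on arm64, per the thread */
#define SZ_2M  UINT64_C(0x200000)   /* block mapping size with 4 KiB pages */

/* Round x up to the power-of-two alignment a, like the linker's ALIGN(a). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Can the permissions of [start, end) be changed when the range is
 * covered by translation entries of size 'granule'? Both ends must fall
 * on an entry boundary, otherwise one entry would straddle the read-only
 * and the read-write parts.
 */
static bool range_fits_granule(uint64_t start, uint64_t end, uint64_t granule)
{
	return (start & (granule - 1)) == 0 && (end & (granule - 1)) == 0;
}
```

With page mappings, a 64k aligned range passes the check; with 2 MB block mappings the same range generally would not, which is the trade-off Ard describes when he notes that the data segment loses the ability to use block mappings.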
* [PATCH v4 23/26] arm64: head: remap the kernel text/inittext region read-only
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (21 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 22/26] arm64: mm: move ro_after_init section into the data segment Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 16:57 ` Kees Cook
2022-06-13 14:45 ` [PATCH v4 24/26] mm: add arch hook to validate mmap() prot flags Ard Biesheuvel
` (3 subsequent siblings)
26 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
In order to be able to run with WXN from boot (which could potentially
be under a hypervisor regime that mandates this), update the temporary
kernel page tables with read-only attributes for the text regions before
attempting to execute from them.
This is rather straightforward for 16k and 64k granule configurations,
as the split between executable and writable regions is guaranteed to be
aligned to the granule used for the early kernel page tables. For 4k, it
involves installing a single table entry and populating it accordingly.
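For the 4k case, the arithmetic behind remap_kernel_text() can be modelled as follows: if __initdata_begin is not SWAPPER_BLOCK_SIZE aligned, the block entry covering it is replaced by a table whose entries below the boundary keep the executable (RX) attributes and whose remaining entries get the writable (RW) ones. This is a user-space sketch only, assuming 4 KiB pages and 2 MiB blocks; the function names and the example addresses in the checks are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT          12    /* 4 KiB granule */
#define SWAPPER_BLOCK_SHIFT 21    /* 2 MiB block entries */
#define SWAPPER_BLOCK_SIZE  (UINT64_C(1) << SWAPPER_BLOCK_SHIFT)
#define PTRS_PER_TABLE      (1u << (SWAPPER_BLOCK_SHIFT - PAGE_SHIFT)) /* 512 */

/* The boundary block only needs splitting if it is not block aligned. */
static bool boundary_needs_split(uint64_t initdata_begin)
{
	return (initdata_begin & (SWAPPER_BLOCK_SIZE - 1)) != 0;
}

/* Entries of the replacement table below the boundary: these stay RX. */
static unsigned int rx_entries(uint64_t initdata_begin)
{
	return (unsigned int)((initdata_begin & (SWAPPER_BLOCK_SIZE - 1)) >> PAGE_SHIFT);
}

/* Entries from the boundary to the end of the block: these become RW. */
static unsigned int rw_entries(uint64_t initdata_begin)
{
	assert(boundary_needs_split(initdata_begin));
	return PTRS_PER_TABLE - rx_entries(initdata_begin);
}
```

This mirrors the two remap_region calls in the patch: the first covers the block start up to .Linitdata_begin with SWAPPER_RX_MMUFLAGS, the second covers .Linitdata_begin up to the end of the block with SWAPPER_RW_MMUFLAGS.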
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/assembler.h | 8 +++
arch/arm64/kernel/head.S | 73 ++++++++++++++++++--
arch/arm64/kernel/vmlinux.lds.S | 2 +-
arch/arm64/mm/proc.S | 11 ---
4 files changed, 78 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index b2584709c332..e1e652410d7d 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -507,6 +507,14 @@ alternative_endif
load_ttbr1 \page_table, \tmp, \tmp2
.endm
+ .macro __idmap_cpu_set_reserved_ttbr1, tmp1, tmp2
+ adrp \tmp1, reserved_pg_dir
+ load_ttbr1 \tmp1, \tmp1, \tmp2
+ tlbi vmalle1
+ dsb nsh
+ isb
+ .endm
+
/*
* reset_pmuserenr_el0 - reset PMUSERENR_EL0 if PMUv3 present
*/
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 6bf685f988f1..92cbad41eed8 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -87,7 +87,7 @@
* x22 create_idmap() .. start_kernel() ID map VA of the DT blob
* x23 primary_entry() .. start_kernel() physical misalignment/KASLR offset
* x24 __primary_switch() linear map KASLR seed
- * x28 create_idmap() callee preserved temp register
+ * x28 create_idmap(), remap_kernel_text() callee preserved temp register
*/
SYM_CODE_START(primary_entry)
bl preserve_boot_args
@@ -380,6 +380,66 @@ SYM_FUNC_START_LOCAL(create_kernel_mapping)
ret
SYM_FUNC_END(create_kernel_mapping)
+SYM_FUNC_START_LOCAL(remap_kernel_text)
+ mov x28, lr
+
+ ldr_l x1, kimage_vaddr
+ mov x2, x1
+ ldr_l x3, .Linitdata_begin
+ adrp x4, _text
+ bic x4, x4, #SWAPPER_BLOCK_SIZE - 1
+ mov x5, SWAPPER_RX_MMUFLAGS
+ mov x6, #SWAPPER_BLOCK_SHIFT
+ bl remap_region
+
+#if SWAPPER_BLOCK_SHIFT > PAGE_SHIFT
+ /*
+ * If the boundary between inittext and initdata happens to be aligned
+ * sufficiently, we are done here. Otherwise, we have to replace its block
+ * entry with a table entry, and populate the lower level table accordingly.
+ */
+ ldr_l x3, .Linitdata_begin
+ tst x3, #SWAPPER_BLOCK_SIZE - 1
+ b.eq 0f
+
+ /* First, create a table mapping to replace the block mapping */
+ ldr_l x1, kimage_vaddr
+ bic x2, x3, #SWAPPER_BLOCK_SIZE - 1
+ adrp x4, init_pg_end - PAGE_SIZE
+ mov x5, #PMD_TYPE_TABLE
+ mov x6, #SWAPPER_BLOCK_SHIFT
+ bl remap_region
+
+ /* Apply executable permissions to the first subregion */
+ adrp x0, init_pg_end - PAGE_SIZE
+ ldr_l x3, .Linitdata_begin
+ bic x1, x3, #SWAPPER_BLOCK_SIZE - 1
+ mov x2, x1
+ adrp x4, __initdata_begin
+ bic x4, x4, #SWAPPER_BLOCK_SIZE - 1
+ mov x5, SWAPPER_RX_MMUFLAGS | PTE_TYPE_PAGE
+ mov x6, #PAGE_SHIFT
+ bl remap_region
+
+ /* Apply writable permissions to the second subregion */
+ ldr_l x2, .Linitdata_begin
+ bic x1, x2, #SWAPPER_BLOCK_SIZE - 1
+ add x3, x1, #SWAPPER_BLOCK_SIZE
+ adrp x4, __initdata_begin
+ mov x5, SWAPPER_RW_MMUFLAGS | PTE_TYPE_PAGE
+ mov x6, #PAGE_SHIFT
+ bl remap_region
+#endif
+0: dsb ishst
+ ret x28
+SYM_FUNC_END(remap_kernel_text)
+
+ __INITDATA
+ .align 3
+.Linitdata_begin:
+ .quad __initdata_begin
+ .previous
+
/*
* Initialize CPU registers with task-specific and cpu-specific context.
*
@@ -808,12 +868,17 @@ SYM_FUNC_START_LOCAL(__primary_switch)
#endif
bl clear_page_tables
bl create_kernel_mapping
-
+#ifdef CONFIG_RELOCATABLE
adrp x1, init_pg_dir
load_ttbr1 x1, x1, x2
-#ifdef CONFIG_RELOCATABLE
- bl __relocate_kernel
+ bl __relocate_kernel // preserves x0
+
+ __idmap_cpu_set_reserved_ttbr1 x1, x2
#endif
+ bl remap_kernel_text
+ adrp x1, init_pg_dir
+ load_ttbr1 x1, x1, x2
+
ldr x8, =__primary_switched
adrp x0, __PHYS_OFFSET
br x8
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 736aca63dad1..3830c6c66e46 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -310,7 +310,7 @@ SECTIONS
. = ALIGN(PAGE_SIZE);
init_pg_dir = .;
- . += INIT_DIR_SIZE;
+ . += INIT_DIR_SIZE + PAGE_SIZE;
init_pg_end = .;
. = ALIGN(SEGMENT_ALIGN);
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 493b8ffc9be5..c237e976b138 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -168,17 +168,6 @@ SYM_FUNC_END(cpu_do_resume)
.pushsection ".idmap.text", "awx"
-.macro __idmap_cpu_set_reserved_ttbr1, tmp1, tmp2
- adrp \tmp1, reserved_pg_dir
- phys_to_ttbr \tmp2, \tmp1
- offset_ttbr1 \tmp2, \tmp1
- msr ttbr1_el1, \tmp2
- isb
- tlbi vmalle1
- dsb nsh
- isb
-.endm
-
/*
* void idmap_cpu_replace_ttbr1(phys_addr_t ttbr1)
*
--
2.30.2
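For the 4k granule case described in the commit message above, the decision that remap_kernel_text() makes can be modelled in plain C. This is a hedged sketch, not kernel code. It assumes 4 KiB pages mapped by 2 MiB swapper blocks, and all helper names are hypothetical. The boundary plays the role of __initdata_begin. When that boundary is not block aligned, the patch replaces the block entry with one table entry whose lower-level table lives in the extra PAGE_SIZE reserved at init_pg_end - PAGE_SIZE. Pages below the boundary are then mapped RX and pages above it RW.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4k-granule values: 4 KiB pages mapped by 2 MiB swapper blocks. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_BLOCK_SHIFT	21
#define SKETCH_BLOCK_SIZE	(UINT64_C(1) << SKETCH_BLOCK_SHIFT)
#define SKETCH_PTRS_PER_BLOCK	(1u << (SKETCH_BLOCK_SHIFT - SKETCH_PAGE_SHIFT))

/* Mirrors "tst x3, #SWAPPER_BLOCK_SIZE - 1": a block-aligned boundary
 * needs no table entry at all. */
static bool sketch_boundary_needs_split(uint64_t boundary)
{
	return (boundary & (SKETCH_BLOCK_SIZE - 1)) != 0;
}

/* Page entries in the split block that get read-only/executable
 * attributes, i.e. those below the boundary. */
static unsigned int sketch_rx_pages(uint64_t boundary)
{
	uint64_t block_start = boundary & ~(SKETCH_BLOCK_SIZE - 1);

	return (unsigned int)((boundary - block_start) >> SKETCH_PAGE_SHIFT);
}

/* The remaining page entries of the block get read-write attributes. */
static unsigned int sketch_rw_pages(uint64_t boundary)
{
	return SKETCH_PTRS_PER_BLOCK - sketch_rx_pages(boundary);
}
```

For example, a boundary at offset 0x13000 into its block yields 19 RX pages followed by 493 RW pages.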
* Re: [PATCH v4 23/26] arm64: head: remap the kernel text/inittext region read-only
2022-06-13 14:45 ` [PATCH v4 23/26] arm64: head: remap the kernel text/inittext region read-only Ard Biesheuvel
@ 2022-06-13 16:57 ` Kees Cook
0 siblings, 0 replies; 57+ messages in thread
From: Kees Cook @ 2022-06-13 16:57 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Catalin Marinas, Mark Brown, Anshuman Khandual
On Mon, Jun 13, 2022 at 04:45:47PM +0200, Ard Biesheuvel wrote:
> In order to be able to run with WXN from boot (which could potentially
> be under a hypervisor regime that mandates this), update the temporary
> kernel page tables with read-only attributes for the text regions before
> attempting to execute from them.
>
> This is rather straight-forward for 16k and 64k granule configurations,
> as the split between executable and writable regions is guaranteed to be
> aligned to the granule used for the early kernel page tables. For 4k, it
> involves installing a single table entry and populating it accordingly.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
--
Kees Cook
* [PATCH v4 24/26] mm: add arch hook to validate mmap() prot flags
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (22 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 23/26] arm64: head: remap the kernel text/inittext region read-only Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 16:37 ` Kees Cook
2022-06-13 14:45 ` [PATCH v4 25/26] arm64: mm: add support for WXN memory translation attribute Ard Biesheuvel
` (2 subsequent siblings)
26 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Add a hook to permit architectures to perform validation on the prot
flags passed to mmap(), like arch_validate_prot() does for mprotect().
This will be used by arm64 to reject PROT_WRITE+PROT_EXEC mappings on
configurations that run with WXN enabled.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
include/linux/mman.h | 15 +++++++++++++++
mm/mmap.c | 3 +++
2 files changed, 18 insertions(+)
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 58b3abd457a3..53ac72310ce0 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -120,6 +120,21 @@ static inline bool arch_validate_flags(unsigned long flags)
#define arch_validate_flags arch_validate_flags
#endif
+#ifndef arch_validate_mmap_prot
+/*
+ * This is called from mmap(), which ignores unknown prot bits so the default
+ * is to accept anything.
+ *
+ * Returns true if the prot flags are valid
+ */
+static inline bool arch_validate_mmap_prot(unsigned long prot,
+ unsigned long addr)
+{
+ return true;
+}
+#define arch_validate_mmap_prot arch_validate_mmap_prot
+#endif
+
/*
* Optimisation macro. It is equivalent to:
* (x & bit1) ? bit2 : 0
diff --git a/mm/mmap.c b/mm/mmap.c
index 61e6135c54ef..4a585879937d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1437,6 +1437,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
if (!(file && path_noexec(&file->f_path)))
prot |= PROT_EXEC;
+ if (!arch_validate_mmap_prot(prot, addr))
+ return -EACCES;
+
/* force arch specific MAP_FIXED handling in get_unmapped_area */
if (flags & MAP_FIXED_NOREPLACE)
flags |= MAP_FIXED;
--
2.30.2
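The override works through the usual "#ifndef hook / static inline / #define hook hook" pattern. An architecture header defines the function together with a macro of the same name, and that macro suppresses the generic default in include/linux/mman.h. The sketch below reproduces that pattern in a single user space file. WXN_SKETCH_ENABLED (a stand-in for arm64_wxn_enabled()) and sketch_do_mmap_check() (a much simplified model of the new check in do_mmap()) are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>

/* "Architecture" side: define the hook plus a same-named macro. */
#define WXN_SKETCH_ENABLED 1	/* hypothetical stand-in for arm64_wxn_enabled() */

static inline bool arch_validate_mmap_prot(unsigned long prot,
					   unsigned long addr)
{
	(void)addr;
	if (WXN_SKETCH_ENABLED &&
	    (prot & (PROT_WRITE | PROT_EXEC)) == (PROT_WRITE | PROT_EXEC))
		return false;	/* reject W+X, as the arm64 patch later does */
	return true;
}
#define arch_validate_mmap_prot arch_validate_mmap_prot

/* "Generic" side: skipped because the macro above is already defined. */
#ifndef arch_validate_mmap_prot
static inline bool arch_validate_mmap_prot(unsigned long prot,
					   unsigned long addr)
{
	(void)prot;
	(void)addr;
	return true;	/* mmap() ignores unknown prot bits, so accept anything */
}
#define arch_validate_mmap_prot arch_validate_mmap_prot
#endif

/* Simplified model of the hunk added to do_mmap(). */
static int sketch_do_mmap_check(unsigned long prot, unsigned long addr)
{
	if (!arch_validate_mmap_prot(prot, addr))
		return -EACCES;
	return 0;
}
```

If the architecture block is removed, the generic default is compiled in and every prot value is accepted.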
* Re: [PATCH v4 24/26] mm: add arch hook to validate mmap() prot flags
2022-06-13 14:45 ` [PATCH v4 24/26] mm: add arch hook to validate mmap() prot flags Ard Biesheuvel
@ 2022-06-13 16:37 ` Kees Cook
2022-06-13 16:44 ` Ard Biesheuvel
0 siblings, 1 reply; 57+ messages in thread
From: Kees Cook @ 2022-06-13 16:37 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Catalin Marinas, Mark Brown, Anshuman Khandual
On Mon, Jun 13, 2022 at 04:45:48PM +0200, Ard Biesheuvel wrote:
> Add a hook to permit architectures to perform validation on the prot
> flags passed to mmap(), like arch_validate_prot() does for mprotect().
> This will be used by arm64 to reject PROT_WRITE+PROT_EXEC mappings on
> configurations that run with WXN enabled.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> include/linux/mman.h | 15 +++++++++++++++
> mm/mmap.c | 3 +++
> 2 files changed, 18 insertions(+)
>
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 58b3abd457a3..53ac72310ce0 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -120,6 +120,21 @@ static inline bool arch_validate_flags(unsigned long flags)
> #define arch_validate_flags arch_validate_flags
> #endif
>
> +#ifndef arch_validate_mmap_prot
> +/*
> + * This is called from mmap(), which ignores unknown prot bits so the default
> + * is to accept anything.
> + *
> + * Returns true if the prot flags are valid
> + */
> +static inline bool arch_validate_mmap_prot(unsigned long prot,
> + unsigned long addr)
> +{
> + return true;
> +}
> +#define arch_validate_mmap_prot arch_validate_mmap_prot
> +#endif
> +
> /*
> * Optimisation macro. It is equivalent to:
> * (x & bit1) ? bit2 : 0
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 61e6135c54ef..4a585879937d 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1437,6 +1437,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> if (!(file && path_noexec(&file->f_path)))
> prot |= PROT_EXEC;
>
> + if (!arch_validate_mmap_prot(prot, addr))
> + return -EACCES;
I assume yes, but just to be clear, the existing userspace programs that
can switch modes are checking for EACCES? (Or are they just checking for
failure generally?) It looks like, for example, SELinux returns EACCES
too, so this looks correct. (Looking at the mmap man page, it seems the
ship has sailed for this to be EPERM, which looks more correct to me,
but so be it.)
> +
> /* force arch specific MAP_FIXED handling in get_unmapped_area */
> if (flags & MAP_FIXED_NOREPLACE)
> flags |= MAP_FIXED;
> --
> 2.30.2
>
Reviewed-by: Kees Cook <keescook@chromium.org>
--
Kees Cook
* Re: [PATCH v4 24/26] mm: add arch hook to validate mmap() prot flags
2022-06-13 16:37 ` Kees Cook
@ 2022-06-13 16:44 ` Ard Biesheuvel
0 siblings, 0 replies; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 16:44 UTC (permalink / raw)
To: Kees Cook
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Catalin Marinas, Mark Brown, Anshuman Khandual
On Mon, 13 Jun 2022 at 18:37, Kees Cook <keescook@chromium.org> wrote:
>
> On Mon, Jun 13, 2022 at 04:45:48PM +0200, Ard Biesheuvel wrote:
> > Add a hook to permit architectures to perform validation on the prot
> > flags passed to mmap(), like arch_validate_prot() does for mprotect().
> > This will be used by arm64 to reject PROT_WRITE+PROT_EXEC mappings on
> > configurations that run with WXN enabled.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > include/linux/mman.h | 15 +++++++++++++++
> > mm/mmap.c | 3 +++
> > 2 files changed, 18 insertions(+)
> >
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 58b3abd457a3..53ac72310ce0 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -120,6 +120,21 @@ static inline bool arch_validate_flags(unsigned long flags)
> > #define arch_validate_flags arch_validate_flags
> > #endif
> >
> > +#ifndef arch_validate_mmap_prot
> > +/*
> > + * This is called from mmap(), which ignores unknown prot bits so the default
> > + * is to accept anything.
> > + *
> > + * Returns true if the prot flags are valid
> > + */
> > +static inline bool arch_validate_mmap_prot(unsigned long prot,
> > + unsigned long addr)
> > +{
> > + return true;
> > +}
> > +#define arch_validate_mmap_prot arch_validate_mmap_prot
> > +#endif
> > +
> > /*
> > * Optimisation macro. It is equivalent to:
> > * (x & bit1) ? bit2 : 0
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 61e6135c54ef..4a585879937d 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1437,6 +1437,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > if (!(file && path_noexec(&file->f_path)))
> > prot |= PROT_EXEC;
> >
> > + if (!arch_validate_mmap_prot(prot, addr))
> > + return -EACCES;
>
> I assume yes, but just to be clear, the existing userspace programs that
> can switch modes are checking for EACCES? (Or are they just checking for
> failure generally?) It looks like, for example, SELinux returns EACCES
> too, so this looks correct. (Looking at the mmap man page, it seems the
> ship has sailed for this to be EPERM, which looks more correct to me,
> but so be it.)
>
Taking libffi for example, it will use the fallback on either EPERM or
EACCES, but only if it thinks selinux is enabled. If it thinks PaX is
enabled, it will not even try PROT_WRITE+PROT_EXEC, and use the
fallback unconditionally.
The only other occurrence I needed to fix in my user space was
libpcre2, but there, I had to rebuild it with --enable-jit-sealloc
(which, presumably, more SELinux-minded distros are doing already).
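The libffi behaviour described above boils down to a probe-then-fall-back pattern. The following is a minimal user space sketch, not libffi's actual code. probe_wx_mapping() is a hypothetical helper, and the real fallback (for instance two shared mmap()s of a temporary file) is left to the caller. Treating EACCES and EPERM alike means the same code path handles SELinux, PaX-style policies and the proposed arm64 WXN check.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Returns 1 if an anonymous PROT_WRITE|PROT_EXEC mapping was allowed,
 * 0 if it was refused with EACCES or EPERM (caller should use its
 * fallback), and -1 for any other error.
 */
static int probe_wx_mapping(void)
{
	size_t len = 4096;	/* the kernel rounds this up to a page */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return (errno == EACCES || errno == EPERM) ? 0 : -1;
	munmap(p, len);
	return 1;
}
```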
> > +
> > /* force arch specific MAP_FIXED handling in get_unmapped_area */
> > if (flags & MAP_FIXED_NOREPLACE)
> > flags |= MAP_FIXED;
> > --
> > 2.30.2
> >
>
> Reviewed-by: Kees Cook <keescook@chromium.org>
>
> --
> Kees Cook
* [PATCH v4 25/26] arm64: mm: add support for WXN memory translation attribute
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (23 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 24/26] mm: add arch hook to validate mmap() prot flags Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 16:51 ` Kees Cook
2022-06-13 14:45 ` [PATCH v4 26/26] arm64: kernel: move ID map out of .text mapping Ard Biesheuvel
2022-06-24 13:19 ` [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Will Deacon
26 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
The AArch64 virtual memory system supports a global WXN control, which
can be enabled to make all writable mappings implicitly no-exec. This is
a useful hardening feature, as it prevents mistakes in managing page
table permissions from being exploited to attack the system.
When enabled at EL1, the restrictions apply to both EL1 and EL0. EL1 is
completely under our control, and has been cleaned up to allow WXN to be
enabled from boot onwards. EL0 is not under our control, but given that
widely deployed security features such as selinux or PaX already limit
the ability of user space to create mappings that are writable and
executable at the same time, the impact of enabling this for EL0 is
expected to be limited. (For this reason, common user space libraries
that have a legitimate need for manipulating executable code already
carry fallbacks such as [0].)
If enabled at compile time, the feature can still be disabled at boot if
needed, by passing arm64.nowxn on the kernel command line.
[0] https://github.com/libffi/libffi/blob/master/src/closures.c#L440
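The rules in this commit message can be summarised as two small predicates. This is a hedged model with hypothetical names, not the patch's code. WXN is in effect only if CONFIG_ARM64_WXN is enabled and arm64.nowxn was not passed. While it is in effect, a writable mapping behaves as if PXN/UXN were set, whatever its own execute permission says.

```c
#include <stdbool.h>

/* WXN is active only if built in and not disabled with arm64.nowxn. */
static bool sketch_wxn_active(bool config_arm64_wxn, bool nowxn_on_cmdline)
{
	return config_arm64_wxn && !nowxn_on_cmdline;
}

/* With WXN active, writable implies non-executable (as if PXN/UXN set). */
static bool sketch_can_execute(bool writable, bool executable, bool wxn)
{
	return executable && !(wxn && writable);
}
```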
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/Kconfig | 11 ++++++
arch/arm64/include/asm/cpufeature.h | 8 +++++
arch/arm64/include/asm/mman.h | 36 ++++++++++++++++++++
arch/arm64/include/asm/mmu_context.h | 30 +++++++++++++++-
arch/arm64/kernel/head.S | 28 ++++++++++++++-
arch/arm64/kernel/idreg-override.c | 16 +++++++++
arch/arm64/mm/proc.S | 6 ++++
7 files changed, 133 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1652a9800ebe..d262d5ab4316 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1422,6 +1422,17 @@ config RODATA_FULL_DEFAULT_ENABLED
This requires the linear region to be mapped down to pages,
which may adversely affect performance in some cases.
+config ARM64_WXN
+ bool "Enable WXN attribute so all writable mappings are non-exec"
+ help
+ Set the WXN bit in the SCTLR system register so that all writable
+ mappings are treated as if the PXN/UXN bit is set as well.
+ If this is set to Y, it can still be disabled at runtime by
+ passing 'arm64.nowxn' on the kernel command line.
+
+ This should only be set if no software needs to be supported that
+ relies on being able to execute from writable mappings.
+
config ARM64_SW_TTBR0_PAN
bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
help
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 14a8f3d93add..fc364c4d31e2 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -911,10 +911,18 @@ extern struct arm64_ftr_override id_aa64mmfr1_override;
extern struct arm64_ftr_override id_aa64pfr1_override;
extern struct arm64_ftr_override id_aa64isar1_override;
extern struct arm64_ftr_override id_aa64isar2_override;
+extern struct arm64_ftr_override sctlr_override;
u32 get_kvm_ipa_limit(void);
void dump_cpu_features(void);
+static inline bool arm64_wxn_enabled(void)
+{
+ if (!IS_ENABLED(CONFIG_ARM64_WXN))
+ return false;
+ return (sctlr_override.val & sctlr_override.mask & 0xf) == 0;
+}
+
#endif /* __ASSEMBLY__ */
#endif
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 5966ee4a6154..6d4940342ba7 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -35,11 +35,40 @@ static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
}
#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
+static inline bool arm64_check_wx_prot(unsigned long prot,
+ struct task_struct *tsk)
+{
+ /*
+ * When we are running with SCTLR_ELx.WXN==1, writable mappings are
+ * implicitly non-executable. This means we should reject such mappings
+ * when user space attempts to create them using mmap() or mprotect().
+ */
+ if (arm64_wxn_enabled() &&
+ ((prot & (PROT_WRITE | PROT_EXEC)) == (PROT_WRITE | PROT_EXEC))) {
+ /*
+ * User space libraries such as libffi carry elaborate
+ * heuristics to decide whether it is worth it to even attempt
+ * to create writable executable mappings, as PaX or selinux
+ * enabled systems will outright reject it. They will usually
+ * fall back to something else (e.g., two separate shared
+ * mmap()s of a temporary file) on failure.
+ */
+ pr_info_ratelimited(
+ "process %s (%d) attempted to create PROT_WRITE+PROT_EXEC mapping\n",
+ tsk->comm, tsk->pid);
+ return false;
+ }
+ return true;
+}
+
static inline bool arch_validate_prot(unsigned long prot,
unsigned long addr __always_unused)
{
unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM;
+ if (!arm64_check_wx_prot(prot, current))
+ return false;
+
if (system_supports_bti())
supported |= PROT_BTI;
@@ -50,6 +79,13 @@ static inline bool arch_validate_prot(unsigned long prot,
}
#define arch_validate_prot(prot, addr) arch_validate_prot(prot, addr)
+static inline bool arch_validate_mmap_prot(unsigned long prot,
+ unsigned long addr)
+{
+ return arm64_check_wx_prot(prot, current);
+}
+#define arch_validate_mmap_prot arch_validate_mmap_prot
+
static inline bool arch_validate_flags(unsigned long vm_flags)
{
if (!system_supports_mte())
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index c7ccd82db1d2..cd4bb5410a18 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -19,13 +19,41 @@
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/proc-fns.h>
-#include <asm-generic/mm_hooks.h>
#include <asm/cputype.h>
#include <asm/sysreg.h>
#include <asm/tlbflush.h>
extern bool rodata_full;
+static inline int arch_dup_mmap(struct mm_struct *oldmm,
+ struct mm_struct *mm)
+{
+ return 0;
+}
+
+static inline void arch_exit_mmap(struct mm_struct *mm)
+{
+}
+
+static inline void arch_unmap(struct mm_struct *mm,
+ unsigned long start, unsigned long end)
+{
+}
+
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+ bool write, bool execute, bool foreign)
+{
+ if (IS_ENABLED(CONFIG_ARM64_WXN) && execute &&
+ (vma->vm_flags & (VM_WRITE | VM_EXEC)) == (VM_WRITE | VM_EXEC)) {
+ pr_warn_ratelimited(
+ "process %s (%d) attempted to execute from writable memory\n",
+ current->comm, current->pid);
+ /* disallow unless the nowxn override is set */
+ return !arm64_wxn_enabled();
+ }
+ return true;
+}
+
static inline void contextidr_thread_switch(struct task_struct *next)
{
if (!IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR))
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 92cbad41eed8..834afdc1c6ff 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -511,6 +511,12 @@ SYM_FUNC_START_LOCAL(__primary_switched)
mov x0, x20
bl switch_to_vhe // Prefer VHE if possible
ldp x29, x30, [sp], #16
+#ifdef CONFIG_ARM64_WXN
+ ldr_l x1, sctlr_override + FTR_OVR_VAL_OFFSET
+ tbz x1, #0, 0f
+ blr lr
+0:
+#endif
bl start_kernel
ASM_BUG()
SYM_FUNC_END(__primary_switched)
@@ -881,5 +887,25 @@ SYM_FUNC_START_LOCAL(__primary_switch)
ldr x8, =__primary_switched
adrp x0, __PHYS_OFFSET
- br x8
+ blr x8
+#ifdef CONFIG_ARM64_WXN
+ /*
+ * If we return here, we need to disable WXN before we proceed. This
+ * requires the MMU to be disabled, so it needs to occur while running
+ * from the ID map.
+ */
+ mrs x0, sctlr_el1
+ bic x1, x0, #SCTLR_ELx_M
+ msr sctlr_el1, x1
+ isb
+
+ tlbi vmalle1
+ dsb nsh
+ isb
+
+ bic x0, x0, #SCTLR_ELx_WXN
+ msr sctlr_el1, x0
+ isb
+ ret
+#endif
SYM_FUNC_END(__primary_switch)
diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index f92836e196e5..85d8fa47d196 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -94,12 +94,27 @@ static const struct ftr_set_desc kaslr __initconst = {
},
};
+#ifdef CONFIG_ARM64_WXN
+asmlinkage struct arm64_ftr_override sctlr_override __ro_after_init;
+static const struct ftr_set_desc sctlr __initconst = {
+ .name = "sctlr",
+ .override = &sctlr_override,
+ .fields = {
+ { "nowxn", 0 },
+ {}
+ },
+};
+#endif
+
static const struct ftr_set_desc * const regs[] __initconst = {
&mmfr1,
&pfr1,
&isar1,
&isar2,
&kaslr,
+#ifdef CONFIG_ARM64_WXN
+ &sctlr,
+#endif
};
static const struct {
@@ -115,6 +130,7 @@ static const struct {
"id_aa64isar2.gpa3=0 id_aa64isar2.apa3=0" },
{ "arm64.nomte", "id_aa64pfr1.mte=0" },
{ "nokaslr", "kaslr.disabled=1" },
+ { "arm64.nowxn", "sctlr.nowxn=1" },
};
static int __init find_field(const char *cmdline,
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index c237e976b138..9ffdf1091d97 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -487,6 +487,12 @@ SYM_FUNC_START(__cpu_setup)
* Prepare SCTLR
*/
mov_q x0, INIT_SCTLR_EL1_MMU_ON
+#ifdef CONFIG_ARM64_WXN
+ ldr_l x1, sctlr_override + FTR_OVR_VAL_OFFSET
+ tst x1, #0x1 // WXN disabled on command line?
+ orr x1, x0, #SCTLR_ELx_WXN
+ csel x0, x0, x1, ne
+#endif
ret // return to head.S
.unreq mair
--
2.30.2
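The three instructions that this patch adds to __cpu_setup (tst, orr, csel) form a branch-free select. The C rendering below is a sketch for readers less familiar with csel. init_sctlr stands in for INIT_SCTLR_EL1_MMU_ON and ovr_val for sctlr_override.val. The bit position used, 19, is the architectural location of SCTLR_ELx.WXN.

```c
#include <stdint.h>

#define SKETCH_SCTLR_ELx_WXN	(UINT64_C(1) << 19)	/* SCTLR_ELx.WXN */

/*
 *   tst  x1, #0x1               // nowxn override bit set?
 *   orr  x1, x0, #SCTLR_ELx_WXN
 *   csel x0, x0, x1, ne         // override set: keep x0, else x0 | WXN
 */
static uint64_t sketch_cpu_setup_sctlr(uint64_t init_sctlr, uint64_t ovr_val)
{
	uint64_t with_wxn = init_sctlr | SKETCH_SCTLR_ELx_WXN;

	return (ovr_val & 0x1) ? init_sctlr : with_wxn;
}
```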
* Re: [PATCH v4 25/26] arm64: mm: add support for WXN memory translation attribute
2022-06-13 14:45 ` [PATCH v4 25/26] arm64: mm: add support for WXN memory translation attribute Ard Biesheuvel
@ 2022-06-13 16:51 ` Kees Cook
0 siblings, 0 replies; 57+ messages in thread
From: Kees Cook @ 2022-06-13 16:51 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Catalin Marinas, Mark Brown, Anshuman Khandual
On Mon, Jun 13, 2022 at 04:45:49PM +0200, Ard Biesheuvel wrote:
> The AArch64 virtual memory system supports a global WXN control, which
> can be enabled to make all writable mappings implicitly no-exec. This is
> a useful hardening feature, as it prevents mistakes in managing page
> table permissions from being exploited to attack the system.
>
> When enabled at EL1, the restrictions apply to both EL1 and EL0. EL1 is
> completely under our control, and has been cleaned up to allow WXN to be
> enabled from boot onwards. EL0 is not under our control, but given that
> widely deployed security features such as selinux or PaX already limit
> the ability of user space to create mappings that are writable and
> executable at the same time, the impact of enabling this for EL0 is
> expected to be limited. (For this reason, common user space libraries
> that have a legitimate need for manipulating executable code already
> carry fallbacks such as [0].)
>
> If enabled at compile time, the feature can still be disabled at boot if
> needed, by passing arm64.nowxn on the kernel command line.
>
> [0] https://github.com/libffi/libffi/blob/master/src/closures.c#L440
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/arm64/Kconfig | 11 ++++++
> arch/arm64/include/asm/cpufeature.h | 8 +++++
> arch/arm64/include/asm/mman.h | 36 ++++++++++++++++++++
> arch/arm64/include/asm/mmu_context.h | 30 +++++++++++++++-
> arch/arm64/kernel/head.S | 28 ++++++++++++++-
> arch/arm64/kernel/idreg-override.c | 16 +++++++++
> arch/arm64/mm/proc.S | 6 ++++
> 7 files changed, 133 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1652a9800ebe..d262d5ab4316 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1422,6 +1422,17 @@ config RODATA_FULL_DEFAULT_ENABLED
> This requires the linear region to be mapped down to pages,
> which may adversely affect performance in some cases.
>
> +config ARM64_WXN
> + bool "Enable WXN attribute so all writable mappings are non-exec"
> + help
> + Set the WXN bit in the SCTLR system register so that all writable
> + mappings are treated as if the PXN/UXN bit is set as well.
> + If this is set to Y, it can still be disabled at runtime by
> + passing 'arm64.nowxn' on the kernel command line.
> +
> + This should only be set if no software needs to be supported that
> + relies on being able to execute from writable mappings.
Should this instead just be a "default value of arm64.wxn" config? It
seems like it should be possible to just drop all the #ifdefs below, as
WXN is arguably the default state we would want systems to move to.
> +
> config ARM64_SW_TTBR0_PAN
> bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
> help
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 14a8f3d93add..fc364c4d31e2 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -911,10 +911,18 @@ extern struct arm64_ftr_override id_aa64mmfr1_override;
> extern struct arm64_ftr_override id_aa64pfr1_override;
> extern struct arm64_ftr_override id_aa64isar1_override;
> extern struct arm64_ftr_override id_aa64isar2_override;
> +extern struct arm64_ftr_override sctlr_override;
>
> u32 get_kvm_ipa_limit(void);
> void dump_cpu_features(void);
>
> +static inline bool arm64_wxn_enabled(void)
> +{
> + if (!IS_ENABLED(CONFIG_ARM64_WXN))
> + return false;
> + return (sctlr_override.val & sctlr_override.mask & 0xf) == 0;
> +}
> +
> #endif /* __ASSEMBLY__ */
>
> #endif
> diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
> index 5966ee4a6154..6d4940342ba7 100644
> --- a/arch/arm64/include/asm/mman.h
> +++ b/arch/arm64/include/asm/mman.h
> @@ -35,11 +35,40 @@ static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
> }
> #define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
>
> +static inline bool arm64_check_wx_prot(unsigned long prot,
> + struct task_struct *tsk)
> +{
> + /*
> + * When we are running with SCTLR_ELx.WXN==1, writable mappings are
> + * implicitly non-executable. This means we should reject such mappings
> + * when user space attempts to create them using mmap() or mprotect().
If this series is respun, perhaps add to this comment a little to indicate
that this is basically a hint to userspace, and not an attempt to actually
provide a general W+X mapping protection:
* Note that this is effectively just a hint (for things like
* libffi noted below), as solving this for all mapping combinations
* is a larger endeavor. (e.g. userspace setting an executable mapping
* writable, changing it, and then making it read-only again.)
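To illustrate why the check is only a hint, here is the user space sequence it never sees. The code is written into a read-write mapping, which is then flipped to read-execute. Neither mmap() nor mprotect() requests PROT_WRITE and PROT_EXEC together, so arm64_check_wx_prot() passes both calls. The mapping is also never writable and executable at the same time, which is the pattern WXN is meant to allow. This is a minimal sketch with a hypothetical helper name, and it does not execute the copied bytes.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Copies len (> 0) bytes into a fresh RW mapping, then makes it RX.
 * Returns the RX mapping, or NULL on failure.
 */
static void *sketch_write_then_exec(const void *code, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	memcpy(p, code, len);
	if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}
```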
> + */
> + if (arm64_wxn_enabled() &&
> + ((prot & (PROT_WRITE | PROT_EXEC)) == (PROT_WRITE | PROT_EXEC))) {
> + /*
> + * User space libraries such as libffi carry elaborate
> + * heuristics to decide whether it is worth it to even attempt
> + * to create writable executable mappings, as PaX or selinux
> + * enabled systems will outright reject it. They will usually
> + * fall back to something else (e.g., two separate shared
> + * mmap()s of a temporary file) on failure.
> + */
> + pr_info_ratelimited(
> + "process %s (%d) attempted to create PROT_WRITE+PROT_EXEC mapping\n",
> + tsk->comm, tsk->pid);
> + return false;
> + }
> + return true;
> +}
But regardless, with or without the changes above:
Reviewed-by: Kees Cook <keescook@chromium.org>
--
Kees Cook
* [PATCH v4 26/26] arm64: kernel: move ID map out of .text mapping
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
` (24 preceding siblings ...)
2022-06-13 14:45 ` [PATCH v4 25/26] arm64: mm: add support for WXN memory translation attribute Ard Biesheuvel
@ 2022-06-13 14:45 ` Ard Biesheuvel
2022-06-13 16:52 ` Kees Cook
2022-06-24 13:19 ` [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Will Deacon
26 siblings, 1 reply; 57+ messages in thread
From: Ard Biesheuvel @ 2022-06-13 14:45 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-hardening, Ard Biesheuvel, Marc Zyngier, Will Deacon,
Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown,
Anshuman Khandual
Reorganize the ID map slightly so that only code that is executed via
the 1:1 mapping remains. This allows the ID map to be moved out of the
.text segment, given that it no longer needs exec permissions via the kernel
mapping.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/kernel/head.S | 5 ++++-
arch/arm64/kernel/vmlinux.lds.S | 2 +-
arch/arm64/mm/proc.S | 2 --
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 834afdc1c6ff..eb959d3387b4 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -525,7 +525,7 @@ SYM_FUNC_END(__primary_switched)
* end early head section, begin head code that is also used for
* hotplug and needs to have the same protections as the text region
*/
- .section ".idmap.text","awx"
+ .text
/*
* Starting from EL2 or EL1, configure the CPU to execute at the highest
@@ -617,6 +617,7 @@ SYM_FUNC_START_LOCAL(set_cpu_boot_mode_flag)
ret
SYM_FUNC_END(set_cpu_boot_mode_flag)
+ .section ".idmap.text","awx"
/*
* This provides a "holding pen" for platforms to hold all secondary
* cores are held until we're ready for them to initialise.
@@ -658,6 +659,7 @@ SYM_FUNC_START_LOCAL(secondary_startup)
br x8
SYM_FUNC_END(secondary_startup)
+ .text
SYM_FUNC_START_LOCAL(__secondary_switched)
mov x0, x20
bl set_cpu_boot_mode_flag
@@ -717,6 +719,7 @@ SYM_FUNC_END(__secondary_too_slow)
* Checks if the selected granule size is supported by the CPU.
* If it isn't, park the CPU
*/
+ .section ".idmap.text","awx"
SYM_FUNC_START(__enable_mmu)
mrs x3, ID_AA64MMFR0_EL1
ubfx x3, x3, #ID_AA64MMFR0_TGRAN_SHIFT, 4
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 3830c6c66e46..d51aa4bbd272 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -169,7 +169,6 @@ SECTIONS
LOCK_TEXT
KPROBES_TEXT
HYPERVISOR_TEXT
- IDMAP_TEXT
*(.gnu.warning)
. = ALIGN(16);
*(.got) /* Global offset table */
@@ -194,6 +193,7 @@ SECTIONS
TRAMP_TEXT
HIBERNATE_TEXT
KEXEC_TEXT
+ IDMAP_TEXT
}
. = ALIGN(SEGMENT_ALIGN);
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 9ffdf1091d97..7b22e2afe8a0 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -107,7 +107,6 @@ SYM_FUNC_END(cpu_do_suspend)
*
* x0: Address of context pointer
*/
- .pushsection ".idmap.text", "awx"
SYM_FUNC_START(cpu_do_resume)
ldp x2, x3, [x0]
ldp x4, x5, [x0, #16]
@@ -163,7 +162,6 @@ alternative_else_nop_endif
isb
ret
SYM_FUNC_END(cpu_do_resume)
- .popsection
#endif
.pushsection ".idmap.text", "awx"
--
2.30.2
* Re: [PATCH v4 26/26] arm64: kernel: move ID map out of .text mapping
2022-06-13 14:45 ` [PATCH v4 26/26] arm64: kernel: move ID map out of .text mapping Ard Biesheuvel
@ 2022-06-13 16:52 ` Kees Cook
0 siblings, 0 replies; 57+ messages in thread
From: Kees Cook @ 2022-06-13 16:52 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Will Deacon,
Mark Rutland, Catalin Marinas, Mark Brown, Anshuman Khandual
On Mon, Jun 13, 2022 at 04:45:50PM +0200, Ard Biesheuvel wrote:
> Reorganize the ID map slightly so that only code that is executed via
> the 1:1 mapping remains. This allows the ID map to be moved out of the
> .text segment, given that it no longer needs exec permissions via the
> kernel mapping.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
This could be done earlier in the series, yes?
Regardless:
Reviewed-by: Kees Cook <keescook@chromium.org>
--
Kees Cook
* Re: [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN
2022-06-13 14:45 [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Ard Biesheuvel
2022-06-13 14:45 ` [PATCH v4 26/26] arm64: kernel: move ID map out of .text mapping Ard Biesheuvel
@ 2022-06-24 13:19 ` Will Deacon
2022-06-24 14:40 ` Ard Biesheuvel
From: Will Deacon @ 2022-06-24 13:19 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-arm-kernel, linux-hardening, Marc Zyngier, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual
Hi Ard,
On Mon, Jun 13, 2022 at 04:45:24PM +0200, Ard Biesheuvel wrote:
> [ TL;DR this series does the following:
> - move variable definitions and assignments out of early asm code
> where possible, and get rid of explicit cache maintenance;
> - convert initial ID map so it covers the entire loaded image as well
> as the DT blob;
> - create the kernel mapping only once instead of twice (for KASLR),
> and do it with the MMU and caches on;
> - avoid mappings that are both writable and executable entirely;
> - avoid parsing the DT while the kernel text and rodata are still
> mapped writable;
> - allow WXN to be enabled (with an opt-out) so writable mappings are
> never executable. ]
I really like this series -- it removes quite a few ugly warts from our
boot assembly that we've collected over the years and, while functional,
they have never been particularly satisfactory. Thank you for putting it
together.
I've left a handful of minor comments on some of the patches and if you
can address those then I'd like to queue the first 21 patches ASAP to
give them some more exposure before the next merge window.
The remaining patches are the WXN pieces, which I'd like to give others
a chance to chime in on first.
Cheers,
Will
* Re: [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN
2022-06-24 13:19 ` [PATCH v4 00/26] arm64: refactor boot flow and add support for WXN Will Deacon
@ 2022-06-24 14:40 ` Ard Biesheuvel
From: Ard Biesheuvel @ 2022-06-24 14:40 UTC (permalink / raw)
To: Will Deacon
Cc: Linux ARM, linux-hardening, Marc Zyngier, Mark Rutland,
Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual
On Fri, 24 Jun 2022 at 15:20, Will Deacon <will@kernel.org> wrote:
>
> Hi Ard,
>
> On Mon, Jun 13, 2022 at 04:45:24PM +0200, Ard Biesheuvel wrote:
> > [ TL;DR this series does the following:
> > - move variable definitions and assignments out of early asm code
> > where possible, and get rid of explicit cache maintenance;
> > - convert initial ID map so it covers the entire loaded image as well
> > as the DT blob;
> > - create the kernel mapping only once instead of twice (for KASLR),
> > and do it with the MMU and caches on;
> > - avoid mappings that are both writable and executable entirely;
> > - avoid parsing the DT while the kernel text and rodata are still
> > mapped writable;
> > - allow WXN to be enabled (with an opt-out) so writable mappings are
> > never executable. ]
>
> I really like this series -- it removes quite a few ugly warts from our
> boot assembly that we've collected over the years and, while functional,
> they have never been particularly satisfactory. Thank you for putting it
> together.
>
> I've left a handful of minor comments on some of the patches and if you
> can address those then I'd like to queue the first 21 patches ASAP to
> give them some more exposure before the next merge window.
>
I'll spin a v5 with just those patches, and we can revisit the
remaining work at a later time.
> The remaining patches are the WXN pieces, which I'd like to give others
> a chance to chime in on first.
>
> Cheers,
>
> Will