* [PATCH v6 0/9] arm64: add support for WXN
@ 2022-07-01 13:04 Ard Biesheuvel
  2022-07-01 13:04 ` [PATCH v6 1/9] arm64: kaslr: use an ordinary command line param for nokaslr Ard Biesheuvel
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2022-07-01 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual

This series covers the remaining changes that are needed to enable WXN
support on arm64 now that a lot of the prerequisite work has been queued
up. WXN support is desirable for robustness, given that writable,
executable mappings of memory are too easy to subvert, and in the
kernel, we never rely on such mappings anyway.

Setting SCTLR_ELx.WXN makes all writable mappings implicitly
non-executable, and when set at EL1, it affects EL0 as well as EL1. This
means we need some diligence on the part of user space, but fortunately,
most JITs and other user space components that actually need to
manipulate the contents of their own executable code use split views on
the same memory, or switch between RW- and R-X and back. (One notable
exception is V8, which recently switched back to a JIT based on full
RWX mappings.)
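
For reference, the RW- <-> R-X switching approach mentioned above boils
down to something like the user space sketch below (illustrative only,
error handling omitted):

#include <string.h>
#include <sys/mman.h>

void *emit_code(const void *insns, size_t len)
{
	size_t sz = (len + 4095) & ~4095UL;
	void *buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	memcpy(buf, insns, len);			/* write phase: RW- */
	mprotect(buf, sz, PROT_READ | PROT_EXEC);	/* execute phase: R-X */
	__builtin___clear_cache((char *)buf, (char *)buf + len);

	/* buf is now R-X; it is never writable and executable at once */
	return buf;
}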

So on the user space side, we need a couple of minor tweaks to validate
the mmap()/mprotect() arguments when WXN is in effect, and to handle any
faults that might occur on such mappings.

On the kernel side, it is mostly about ensuring that we don't rely on
writable, executable mappings, even during early boot. For this reason,
the two remaining sequences that create the kernel mapping are
merged, moving the more elaborate logic to set the right attributes into
a C implementation that executes from the ID map. This also allows us to
move the relocation code into C as well, which only lived in asm because
it runs before we have a stack.

Finally, some cleanups are provided for the KASLR code, mainly to ensure
that the early code's decision to use nG mappings or not is based on the
exact same criteria.

(v5 was a subset of v4 without the WXN specific pieces)

Changes since v4: [0]
- don't move __ro_after_init section now that we no longer need to,
- don't complicate the asm kernel mapping routines further, but instead,
  merge the two existing passes into one implemented in C,
- deal with rodata=off on WXN enabled builds (i.e., turn off WXN as
  well),
- add some acks from Kees

[0] https://lore.kernel.org/linux-arm-kernel/20220613144550.3760857-1-ardb@kernel.org/

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>

Ard Biesheuvel (9):
  arm64: kaslr: use an ordinary command line param for nokaslr
  arm64: kaslr: don't pretend KASLR is enabled if offset <
    MIN_KIMG_ALIGN
  arm64: kaslr: drop special case for ThunderX in kaslr_requires_kpti()
  arm64: head: allocate more pages for the kernel mapping
  arm64: head: move early kernel mapping and relocation code to C code
  arm64: mm: avoid fixmap for early swapper_pg_dir updates
  arm64: mm: omit redundant remap of kernel image
  mm: add arch hook to validate mmap() prot flags
  arm64: mm: add support for WXN memory translation attribute

 arch/arm64/Kconfig                      |  11 +
 arch/arm64/include/asm/cpufeature.h     |   9 +
 arch/arm64/include/asm/kasan.h          |   2 -
 arch/arm64/include/asm/kernel-pgtable.h |  11 +-
 arch/arm64/include/asm/memory.h         |  11 +
 arch/arm64/include/asm/mman.h           |  36 ++
 arch/arm64/include/asm/mmu.h            |   2 +-
 arch/arm64/include/asm/mmu_context.h    |  30 +-
 arch/arm64/include/asm/pgtable-prot.h   |   2 +
 arch/arm64/kernel/Makefile              |   4 +-
 arch/arm64/kernel/cpufeature.c          |  14 +-
 arch/arm64/kernel/head.S                | 157 +-------
 arch/arm64/kernel/idreg-override.c      |  15 -
 arch/arm64/kernel/image-vars.h          |  17 +
 arch/arm64/kernel/kaslr.c               |   8 +-
 arch/arm64/kernel/pi/Makefile           |   2 +-
 arch/arm64/kernel/pi/early_map_kernel.c | 381 ++++++++++++++++++++
 arch/arm64/kernel/pi/kaslr_early.c      | 112 ------
 arch/arm64/kernel/vmlinux.lds.S         |  13 +-
 arch/arm64/mm/kasan_init.c              |  15 -
 arch/arm64/mm/mmu.c                     | 150 +++-----
 arch/arm64/mm/proc.S                    |  24 ++
 include/linux/mman.h                    |  15 +
 mm/mmap.c                               |   3 +
 24 files changed, 630 insertions(+), 414 deletions(-)
 create mode 100644 arch/arm64/kernel/pi/early_map_kernel.c
 delete mode 100644 arch/arm64/kernel/pi/kaslr_early.c

-- 
2.35.1



* [PATCH v6 1/9] arm64: kaslr: use an ordinary command line param for nokaslr
  2022-07-01 13:04 [PATCH v6 0/9] arm64: add support for WXN Ard Biesheuvel
@ 2022-07-01 13:04 ` Ard Biesheuvel
  2022-07-01 14:07   ` Mark Brown
  2022-07-01 13:04 ` [PATCH v6 2/9] arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN Ard Biesheuvel
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2022-07-01 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual

We no longer need to rely on the idreg-override hack to parse the
'nokaslr' command line parameter, given that it is now parsed much
earlier, before the kernel is even mapped. For later access to its
value, we can simply use core_param() instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/idreg-override.c | 15 ---------------
 arch/arm64/kernel/kaslr.c          |  6 ++++--
 2 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c
index f92836e196e5..52f858aeba81 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/idreg-override.c
@@ -81,25 +81,11 @@ static const struct ftr_set_desc isar2 __initconst = {
 	},
 };
 
-extern struct arm64_ftr_override kaslr_feature_override;
-
-static const struct ftr_set_desc kaslr __initconst = {
-	.name		= "kaslr",
-#ifdef CONFIG_RANDOMIZE_BASE
-	.override	= &kaslr_feature_override,
-#endif
-	.fields		= {
-		{ "disabled", 0 },
-		{}
-	},
-};
-
 static const struct ftr_set_desc * const regs[] __initconst = {
 	&mmfr1,
 	&pfr1,
 	&isar1,
 	&isar2,
-	&kaslr,
 };
 
 static const struct {
@@ -114,7 +100,6 @@ static const struct {
 	  "id_aa64isar1.api=0 id_aa64isar1.apa=0 "
 	  "id_aa64isar2.gpa3=0 id_aa64isar2.apa3=0"	   },
 	{ "arm64.nomte",		"id_aa64pfr1.mte=0" },
-	{ "nokaslr",			"kaslr.disabled=1" },
 };
 
 static int __init find_field(const char *cmdline,
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index 325455d16dbc..bcbcca938da8 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -8,6 +8,7 @@
 #include <linux/init.h>
 #include <linux/libfdt.h>
 #include <linux/mm_types.h>
+#include <linux/moduleparam.h>
 #include <linux/sched.h>
 #include <linux/types.h>
 #include <linux/pgtable.h>
@@ -23,7 +24,8 @@
 u64 __ro_after_init module_alloc_base;
 u16 __initdata memstart_offset_seed;
 
-struct arm64_ftr_override kaslr_feature_override __initdata;
+static bool nokaslr;
+core_param(nokaslr, nokaslr, bool, 0);
 
 static int __init kaslr_init(void)
 {
@@ -36,7 +38,7 @@ static int __init kaslr_init(void)
 	 */
 	module_alloc_base = (u64)_etext - MODULES_VSIZE;
 
-	if (kaslr_feature_override.val & kaslr_feature_override.mask & 0xf) {
+	if (nokaslr) {
 		pr_info("KASLR disabled on command line\n");
 		return 0;
 	}
-- 
2.35.1



* [PATCH v6 2/9] arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN
  2022-07-01 13:04 [PATCH v6 0/9] arm64: add support for WXN Ard Biesheuvel
  2022-07-01 13:04 ` [PATCH v6 1/9] arm64: kaslr: use an ordinary command line param for nokaslr Ard Biesheuvel
@ 2022-07-01 13:04 ` Ard Biesheuvel
  2022-07-01 14:12   ` Mark Brown
  2022-07-01 13:04 ` [PATCH v6 3/9] arm64: kaslr: drop special case for ThunderX in kaslr_requires_kpti() Ard Biesheuvel
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2022-07-01 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual

Our virtual KASLR displacement consists of a fully randomized multiple
of 2 MiB, combined with an offset that is equal to the physical
placement modulo 2 MiB. This arrangement ensures that we can always use
2 MiB block mappings (or contiguous PTE mappings for 16k or 64k pages)
to map the kernel.

This means that a KASLR offset of less than 2 MiB is simply the result
of this physical displacement, and no randomization has actually taken
place. So let's avoid misreporting this case as 'KASLR enabled'.
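
To illustrate (a sketch, not part of this patch), the offset is composed
roughly as follows, reusing the existing MIN_KIMG_ALIGN (2 MiB) constant:

/* Illustrative only: how the virtual KASLR offset is put together */
static u64 example_kaslr_offset(u64 phys_base, u64 seed)
{
	/* the low bits track the physical placement modulo 2 MiB ... */
	u64 offset = phys_base & (MIN_KIMG_ALIGN - 1);

	/* ... and only the seed contributes whole 2 MiB multiples */
	if (seed)
		offset |= seed & ~(MIN_KIMG_ALIGN - 1);

	/* so an offset below MIN_KIMG_ALIGN means no randomization occurred */
	return offset;
}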

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/memory.h | 11 +++++++++++
 arch/arm64/kernel/cpufeature.c  |  2 +-
 arch/arm64/kernel/kaslr.c       |  2 +-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index c751cd9b94f8..498af99d1adc 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -172,6 +172,7 @@
 #include <linux/compiler.h>
 #include <linux/mmdebug.h>
 #include <linux/types.h>
+#include <asm/boot.h>
 #include <asm/bug.h>
 
 #if VA_BITS > 48
@@ -195,6 +196,16 @@ static inline unsigned long kaslr_offset(void)
 	return kimage_vaddr - KIMAGE_VADDR;
 }
 
+static inline bool kaslr_enabled(void)
+{
+	/*
+	 * The KASLR offset modulo MIN_KIMG_ALIGN is taken from the physical
+	 * placement of the image rather than from the seed, so a displacement
+	 * of less than MIN_KIMG_ALIGN means that no seed was provided.
+	 */
+	return kaslr_offset() >= MIN_KIMG_ALIGN;
+}
+
 /*
  * Allow all memory at the discovery stage. We will clip it later.
  */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 98b48d9069a7..22e3604aee02 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1562,7 +1562,7 @@ bool kaslr_requires_kpti(void)
 			return false;
 	}
 
-	return kaslr_offset() > 0;
+	return kaslr_enabled();
 }
 
 static bool __meltdown_safe = true;
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index bcbcca938da8..d63322fc1d40 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -43,7 +43,7 @@ static int __init kaslr_init(void)
 		return 0;
 	}
 
-	if (!kaslr_offset()) {
+	if (!kaslr_enabled()) {
 		pr_warn("KASLR disabled due to lack of seed\n");
 		return 0;
 	}
-- 
2.35.1



* [PATCH v6 3/9] arm64: kaslr: drop special case for ThunderX in kaslr_requires_kpti()
  2022-07-01 13:04 [PATCH v6 0/9] arm64: add support for WXN Ard Biesheuvel
  2022-07-01 13:04 ` [PATCH v6 1/9] arm64: kaslr: use an ordinary command line param for nokaslr Ard Biesheuvel
  2022-07-01 13:04 ` [PATCH v6 2/9] arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN Ard Biesheuvel
@ 2022-07-01 13:04 ` Ard Biesheuvel
  2022-07-01 13:04 ` [PATCH v6 4/9] arm64: head: allocate more pages for the kernel mapping Ard Biesheuvel
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2022-07-01 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual

ThunderX is an obsolete platform that shipped without support for the
EFI_RNG_PROTOCOL in its firmware. Now that we no longer misidentify
small KASLR offsets as randomization being enabled, we can drop the
explicit check for ThunderX as well, given that KASLR is known to be
unavailable on these systems.

Note that we never enable KPTI on these systems, in spite of what this
function returns. The only penalty for getting it wrong (i.e., returning
true here on a ThunderX system) is that we will end up using non-global
mappings for the kernel pointlessly.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/cpufeature.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 22e3604aee02..af46ca0da994 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1550,18 +1550,6 @@ bool kaslr_requires_kpti(void)
 			return false;
 	}
 
-	/*
-	 * Systems affected by Cavium erratum 24756 are incompatible
-	 * with KPTI.
-	 */
-	if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
-		extern const struct midr_range cavium_erratum_27456_cpus[];
-
-		if (is_midr_in_range_list(read_cpuid_id(),
-					  cavium_erratum_27456_cpus))
-			return false;
-	}
-
 	return kaslr_enabled();
 }
 
-- 
2.35.1



* [PATCH v6 4/9] arm64: head: allocate more pages for the kernel mapping
  2022-07-01 13:04 [PATCH v6 0/9] arm64: add support for WXN Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2022-07-01 13:04 ` [PATCH v6 3/9] arm64: kaslr: drop special case for ThunderX in kaslr_requires_kpti() Ard Biesheuvel
@ 2022-07-01 13:04 ` Ard Biesheuvel
  2022-07-01 13:04 ` [PATCH v6 5/9] arm64: head: move early kernel mapping and relocation code to C code Ard Biesheuvel
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2022-07-01 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual

In preparation for switching to an early kernel mapping routine that
maps each segment according to its precise boundaries, and with the
correct attributes, let's allocate some extra pages for page tables for
the 4k page size configuration. This is necessary because the start and
end of each segment may not be aligned to the block size, and so we'll
need an extra page table at each segment boundary.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/kernel-pgtable.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 02e59fa8f293..5b63bdcd0741 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -85,7 +85,7 @@
 			+ EARLY_PGDS((vstart), (vend)) 	/* each PGDIR needs a next level page table */	\
 			+ EARLY_PUDS((vstart), (vend))	/* each PUD needs a next level page table */	\
 			+ EARLY_PMDS((vstart), (vend)))	/* each PMD needs a next level page table */
-#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end))
+#define INIT_DIR_SIZE (PAGE_SIZE * (EARLY_PAGES(KIMAGE_VADDR, _end) + EARLY_SEGMENT_EXTRA_PAGES))
 
 /* the initial ID map may need two extra pages if it needs to be extended */
 #if VA_BITS < 48
@@ -106,6 +106,15 @@
 #define SWAPPER_TABLE_SHIFT	PMD_SHIFT
 #endif
 
+/* The number of segments in the kernel image (text, rodata, inittext, initdata, data+bss) */
+#define KERNEL_SEGMENT_COUNT	5
+
+#if SWAPPER_BLOCK_SIZE > SEGMENT_ALIGN
+#define EARLY_SEGMENT_EXTRA_PAGES (KERNEL_SEGMENT_COUNT + 1)
+#else
+#define EARLY_SEGMENT_EXTRA_PAGES 0
+#endif
+
 /*
  * Initial memory map attributes.
  */
-- 
2.35.1



* [PATCH v6 5/9] arm64: head: move early kernel mapping and relocation code to C code
  2022-07-01 13:04 [PATCH v6 0/9] arm64: add support for WXN Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2022-07-01 13:04 ` [PATCH v6 4/9] arm64: head: allocate more pages for the kernel mapping Ard Biesheuvel
@ 2022-07-01 13:04 ` Ard Biesheuvel
  2022-07-01 13:04 ` [PATCH v6 6/9] arm64: mm: avoid fixmap for early swapper_pg_dir updates Ard Biesheuvel
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2022-07-01 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual

The asm version of the kernel mapping code works fine for creating a
coarse grained identity map, but for mapping the kernel down to its
exact boundaries with the right attributes, it is not suitable. This is
why we create a preliminary RWX kernel mapping first, and then rebuild
it from scratch later on.

So let's reimplement this, and along with it the relocation routines, in
C, in a way that will make it unnecessary to create the kernel page
tables yet another time in paging_init().

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgtable-prot.h   |   2 +
 arch/arm64/kernel/Makefile              |   4 +-
 arch/arm64/kernel/head.S                | 157 +--------
 arch/arm64/kernel/image-vars.h          |  16 +
 arch/arm64/kernel/pi/Makefile           |   2 +-
 arch/arm64/kernel/pi/early_map_kernel.c | 368 ++++++++++++++++++++
 arch/arm64/kernel/pi/kaslr_early.c      | 112 ------
 arch/arm64/kernel/vmlinux.lds.S         |  13 +-
 arch/arm64/mm/proc.S                    |   1 +
 9 files changed, 410 insertions(+), 265 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 62e0ebeed720..dd38f8e80fac 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -32,7 +32,9 @@
 #include <asm/cpufeature.h>
 #include <asm/pgtable-types.h>
 
+#ifndef arm64_use_ng_mappings
 extern bool arm64_use_ng_mappings;
+#endif
 
 #define _PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
 #define _PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 88a96511580e..802de025bbea 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -29,7 +29,7 @@ obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
 			   cpufeature.o alternative.o cacheinfo.o		\
 			   smp.o smp_spin_table.o topology.o smccc-call.o	\
 			   syscall.o proton-pack.o idreg-override.o idle.o	\
-			   patching.o
+			   patching.o pi/
 
 targets			+= efi-entry.o
 
@@ -59,7 +59,7 @@ obj-$(CONFIG_ACPI)			+= acpi.o
 obj-$(CONFIG_ACPI_NUMA)			+= acpi_numa.o
 obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 obj-$(CONFIG_PARAVIRT)			+= paravirt.o
-obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o pi/
+obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o
 obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 obj-$(CONFIG_ELF_CORE)			+= elfcore.o
 obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index f80127df5846..b7f1bd07a647 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -81,8 +81,6 @@
 	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
 	 *  x22        create_idmap() .. start_kernel()         ID map VA of the DT blob
-	 *  x23        primary_entry() .. start_kernel()        physical misalignment/KASLR offset
-	 *  x24        __primary_switch()                       linear map KASLR seed
 	 *  x25        primary_entry() .. start_kernel()        supported VA size
 	 *  x28        create_idmap()                           callee preserved temp register
 	 */
@@ -153,17 +151,6 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
 0:	ret
 SYM_CODE_END(preserve_boot_args)
 
-SYM_FUNC_START_LOCAL(clear_page_tables)
-	/*
-	 * Clear the init page tables.
-	 */
-	adrp	x0, init_pg_dir
-	adrp	x1, init_pg_end
-	sub	x2, x1, x0
-	mov	x1, xzr
-	b	__pi_memset			// tail call
-SYM_FUNC_END(clear_page_tables)
-
 /*
  * Macro to populate page table entries, these entries can be pointers to the next level
  * or last level entries pointing to physical memory.
@@ -365,7 +352,7 @@ SYM_FUNC_START_LOCAL(create_idmap)
 	/* Remap the kernel page tables r/w in the ID map */
 	adrp	x1, _text
 	adrp	x2, init_pg_dir
-	adrp	x3, init_pg_end
+	adrp	x3, _end
 	bic	x4, x2, #SWAPPER_BLOCK_SIZE - 1
 	mov	x5, SWAPPER_RW_MMUFLAGS
 	mov	x6, #SWAPPER_BLOCK_SHIFT
@@ -396,22 +383,6 @@ SYM_FUNC_START_LOCAL(create_idmap)
 0:	ret	x28
 SYM_FUNC_END(create_idmap)
 
-SYM_FUNC_START_LOCAL(create_kernel_mapping)
-	adrp	x0, init_pg_dir
-	mov_q	x5, KIMAGE_VADDR		// compile time __va(_text)
-	add	x5, x5, x23			// add KASLR displacement
-	adrp	x6, _end			// runtime __pa(_end)
-	adrp	x3, _text			// runtime __pa(_text)
-	sub	x6, x6, x3			// _end - _text
-	add	x6, x6, x5			// runtime __va(_end)
-	mov	x7, SWAPPER_RW_MMUFLAGS
-
-	map_memory x0, x1, x5, x6, x7, x3, (VA_BITS - PGDIR_SHIFT), x10, x11, x12, x13, x14
-
-	dsb	ishst				// sync with page table walker
-	ret
-SYM_FUNC_END(create_kernel_mapping)
-
 	/*
 	 * Initialize CPU registers with task-specific and cpu-specific context.
 	 *
@@ -445,6 +416,7 @@ SYM_FUNC_END(create_kernel_mapping)
  * The following fragment of code is executed with the MMU enabled.
  *
  *   x0 = __pa(KERNEL_START)
+ *   w1 = memstart_offset_seed
  */
 SYM_FUNC_START_LOCAL(__primary_switched)
 	adr_l	x4, init_task
@@ -454,6 +426,11 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	msr	vbar_el1, x8			// vector table address
 	isb
 
+#ifdef CONFIG_RANDOMIZE_BASE
+	adrp	x5, memstart_offset_seed	// Save KASLR linear map seed
+	strh	w1, [x5, :lo12:memstart_offset_seed]
+#endif
+
 	stp	x29, x30, [sp, #-16]!
 	mov	x29, sp
 
@@ -479,11 +456,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	str	x25, [x8]			// ... observes the correct value
 	dc	civac, x8			// Make visible to booting secondaries
 #endif
-
-#ifdef CONFIG_RANDOMIZE_BASE
-	adrp	x5, memstart_offset_seed	// Save KASLR linear map seed
-	strh	w24, [x5, :lo12:memstart_offset_seed]
-#endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
 #endif
@@ -747,123 +719,18 @@ SYM_FUNC_START_LOCAL(__no_granule_support)
 	b	1b
 SYM_FUNC_END(__no_granule_support)
 
-#ifdef CONFIG_RELOCATABLE
-SYM_FUNC_START_LOCAL(__relocate_kernel)
-	/*
-	 * Iterate over each entry in the relocation table, and apply the
-	 * relocations in place.
-	 */
-	adr_l	x9, __rela_start
-	adr_l	x10, __rela_end
-	mov_q	x11, KIMAGE_VADDR		// default virtual offset
-	add	x11, x11, x23			// actual virtual offset
-
-0:	cmp	x9, x10
-	b.hs	1f
-	ldp	x12, x13, [x9], #24
-	ldr	x14, [x9, #-8]
-	cmp	w13, #R_AARCH64_RELATIVE
-	b.ne	0b
-	add	x14, x14, x23			// relocate
-	str	x14, [x12, x23]
-	b	0b
-
-1:
-#ifdef CONFIG_RELR
-	/*
-	 * Apply RELR relocations.
-	 *
-	 * RELR is a compressed format for storing relative relocations. The
-	 * encoded sequence of entries looks like:
-	 * [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]
-	 *
-	 * i.e. start with an address, followed by any number of bitmaps. The
-	 * address entry encodes 1 relocation. The subsequent bitmap entries
-	 * encode up to 63 relocations each, at subsequent offsets following
-	 * the last address entry.
-	 *
-	 * The bitmap entries must have 1 in the least significant bit. The
-	 * assumption here is that an address cannot have 1 in lsb. Odd
-	 * addresses are not supported. Any odd addresses are stored in the RELA
-	 * section, which is handled above.
-	 *
-	 * Excluding the least significant bit in the bitmap, each non-zero
-	 * bit in the bitmap represents a relocation to be applied to
-	 * a corresponding machine word that follows the base address
-	 * word. The second least significant bit represents the machine
-	 * word immediately following the initial address, and each bit
-	 * that follows represents the next word, in linear order. As such,
-	 * a single bitmap can encode up to 63 relocations in a 64-bit object.
-	 *
-	 * In this implementation we store the address of the next RELR table
-	 * entry in x9, the address being relocated by the current address or
-	 * bitmap entry in x13 and the address being relocated by the current
-	 * bit in x14.
-	 */
-	adr_l	x9, __relr_start
-	adr_l	x10, __relr_end
-
-2:	cmp	x9, x10
-	b.hs	7f
-	ldr	x11, [x9], #8
-	tbnz	x11, #0, 3f			// branch to handle bitmaps
-	add	x13, x11, x23
-	ldr	x12, [x13]			// relocate address entry
-	add	x12, x12, x23
-	str	x12, [x13], #8			// adjust to start of bitmap
-	b	2b
-
-3:	mov	x14, x13
-4:	lsr	x11, x11, #1
-	cbz	x11, 6f
-	tbz	x11, #0, 5f			// skip bit if not set
-	ldr	x12, [x14]			// relocate bit
-	add	x12, x12, x23
-	str	x12, [x14]
-
-5:	add	x14, x14, #8			// move to next bit's address
-	b	4b
-
-6:	/*
-	 * Move to the next bitmap's address. 8 is the word size, and 63 is the
-	 * number of significant bits in a bitmap entry.
-	 */
-	add	x13, x13, #(8 * 63)
-	b	2b
-
-7:
-#endif
-	ret
-
-SYM_FUNC_END(__relocate_kernel)
-#endif
-
 SYM_FUNC_START_LOCAL(__primary_switch)
 	adrp	x1, reserved_pg_dir
 	adrp	x2, init_idmap_pg_dir
 	bl	__enable_mmu
-#ifdef CONFIG_RELOCATABLE
-	adrp	x23, KERNEL_START
-	and	x23, x23, MIN_KIMG_ALIGN - 1
-#ifdef CONFIG_RANDOMIZE_BASE
-	mov	x0, x22
-	adrp	x1, init_pg_end
+
+	adrp	x1, primary_init_stack
 	mov	sp, x1
 	mov	x29, xzr
-	bl	__pi_kaslr_early_init
-	and	x24, x0, #SZ_2M - 1		// capture memstart offset seed
-	bic	x0, x0, #SZ_2M - 1
-	orr	x23, x23, x0			// record kernel offset
-#endif
-#endif
-	bl	clear_page_tables
-	bl	create_kernel_mapping
+	mov	x0, x22				// pass FDT pointer
+	bl	__pi_early_map_kernel
+	mov	w1, w0				// capture memstart offset seed
 
-	adrp	x1, init_pg_dir
-	load_ttbr1 x1, x1, x2
-#ifdef CONFIG_RELOCATABLE
-	bl	__relocate_kernel
-#endif
 	ldr	x8, =__primary_switched
 	adrp	x0, KERNEL_START		// __pa(KERNEL_START)
 	br	x8
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index afa69e04e75e..032c3e1aff20 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -54,6 +54,22 @@ PROVIDE(__pi___memcpy			= __pi_memcpy);
 PROVIDE(__pi___memmove			= __pi_memmove);
 PROVIDE(__pi___memset			= __pi_memset);
 
+/*
+ * The symbols below are used by the early C kernel mapping code.
+ */
+PROVIDE(__pi_init_pg_dir		= init_pg_dir);
+PROVIDE(__pi_init_pg_end		= init_pg_end);
+
+PROVIDE(__pi__text			= _text);
+PROVIDE(__pi__stext               	= _stext);
+PROVIDE(__pi__etext               	= _etext);
+PROVIDE(__pi___start_rodata       	= __start_rodata);
+PROVIDE(__pi___inittext_begin     	= __inittext_begin);
+PROVIDE(__pi___inittext_end       	= __inittext_end);
+PROVIDE(__pi___initdata_begin     	= __initdata_begin);
+PROVIDE(__pi___initdata_end       	= __initdata_end);
+PROVIDE(__pi__data                	= _data);
+
 #ifdef CONFIG_KVM
 
 /*
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index 839291430cb3..b88612eab1bc 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -29,5 +29,5 @@ $(obj)/%.pi.o: $(obj)/%.o FORCE
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
-obj-y		:= kaslr_early.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
+obj-y		:= early_map_kernel.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
 extra-y		:= $(patsubst %.pi.o,%.o,$(obj-y))
diff --git a/arch/arm64/kernel/pi/early_map_kernel.c b/arch/arm64/kernel/pi/early_map_kernel.c
new file mode 100644
index 000000000000..4199548584fb
--- /dev/null
+++ b/arch/arm64/kernel/pi/early_map_kernel.c
@@ -0,0 +1,368 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2022 Google LLC
+// Author: Ard Biesheuvel <ardb@google.com>
+
+// NOTE: code in this file runs *very* early, and is not permitted to use
+// global variables or anything that relies on absolute addressing.
+
+#define arm64_use_ng_mappings 0
+
+#include <linux/elf.h>
+#include <linux/libfdt.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <linux/types.h>
+#include <linux/sizes.h>
+#include <linux/string.h>
+
+#include <asm/archrandom.h>
+#include <asm/memory.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
+
+/* taken from lib/string.c */
+static char *__strstr(const char *s1, const char *s2)
+{
+	size_t l1, l2;
+
+	l2 = strlen(s2);
+	if (!l2)
+		return (char *)s1;
+	l1 = strlen(s1);
+	while (l1 >= l2) {
+		l1--;
+		if (!memcmp(s1, s2, l2))
+			return (char *)s1;
+		s1++;
+	}
+	return NULL;
+}
+
+/*
+ * Returns whether @pfx appears in @string, either at the very start, or
+ * elsewhere but preceded by a space character.
+ */
+static bool string_contains_prefix(const u8 *string, const u8 *pfx)
+{
+	const u8 *str;
+
+	str = __strstr(string, pfx);
+	return str == string || (str > string && *(str - 1) == ' ');
+}
+
+static bool cmdline_has(void *fdt, const u8 *word)
+{
+	if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
+		int node;
+		const u8 *prop;
+
+		node = fdt_path_offset(fdt, "/chosen");
+		if (node < 0)
+			goto out;
+
+		prop = fdt_getprop(fdt, node, "bootargs", NULL);
+		if (!prop)
+			goto out;
+
+		if (string_contains_prefix(prop, word))
+			return true;
+
+		if (IS_ENABLED(CONFIG_CMDLINE_EXTEND))
+			goto out;
+
+		return false;
+	}
+out:
+	return string_contains_prefix(CONFIG_CMDLINE, word);
+}
+
+static u64 get_kaslr_seed(void *fdt)
+{
+	int node, len;
+	fdt64_t *prop;
+	u64 ret;
+
+	node = fdt_path_offset(fdt, "/chosen");
+	if (node < 0)
+		return 0;
+
+	prop = fdt_getprop_w(fdt, node, "kaslr-seed", &len);
+	if (!prop || len != sizeof(u64))
+		return 0;
+
+	ret = fdt64_to_cpu(*prop);
+	*prop = 0;
+	return ret;
+}
+
+static u64 kaslr_early_init(void *fdt)
+{
+	u64 seed;
+
+	if (cmdline_has(fdt, "nokaslr"))
+		return 0;
+
+	seed = get_kaslr_seed(fdt);
+	if (!seed) {
+#ifdef CONFIG_ARCH_RANDOM
+		 if (!__early_cpu_has_rndr() ||
+		     !__arm64_rndr((unsigned long *)&seed))
+#endif
+		return 0;
+	}
+
+	/*
+	 * OK, so we are proceeding with KASLR enabled. Calculate a suitable
+	 * kernel image offset from the seed. Let's place the kernel in the
+	 * middle half of the VMALLOC area (VA_BITS_MIN - 2), and stay clear of
+	 * the lower and upper quarters to avoid colliding with other
+	 * allocations.
+	 */
+	return BIT(VA_BITS_MIN - 3) + (seed & GENMASK(VA_BITS_MIN - 3, 0));
+}
+
+extern const Elf64_Rela rela_start[], rela_end[];
+extern const u64 relr_start[], relr_end[];
+
+static void relocate_kernel(u64 offset)
+{
+	const Elf64_Rela *rela;
+	const u64 *relr;
+	u64 *place;
+
+	for (rela = rela_start; rela < rela_end; rela++) {
+		if (ELF64_R_TYPE(rela->r_info) != R_AARCH64_RELATIVE)
+			continue;
+		place = (u64 *)(rela->r_offset + offset);
+		*place = rela->r_addend + offset;
+	}
+
+	if (!IS_ENABLED(CONFIG_RELR) || !offset)
+		return;
+
+	/*
+	 * Apply RELR relocations.
+	 *
+	 * RELR is a compressed format for storing relative relocations. The
+	 * encoded sequence of entries looks like:
+	 * [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]
+	 *
+	 * i.e. start with an address, followed by any number of bitmaps. The
+	 * address entry encodes 1 relocation. The subsequent bitmap entries
+	 * encode up to 63 relocations each, at subsequent offsets following
+	 * the last address entry.
+	 *
+	 * The bitmap entries must have 1 in the least significant bit. The
+	 * assumption here is that an address cannot have 1 in lsb. Odd
+	 * addresses are not supported. Any odd addresses are stored in the
+	 * RELA section, which is handled above.
+	 *
+	 * Excluding the least significant bit in the bitmap, each non-zero bit
+	 * in the bitmap represents a relocation to be applied to a
+	 * corresponding machine word that follows the base address word. The
+	 * second least significant bit represents the machine word immediately
+	 * following the initial address, and each bit that follows represents
+	 * the next word, in linear order. As such, a single bitmap can encode
+	 * up to 63 relocations in a 64-bit object.
+	 */
+	for (relr = relr_start; relr < relr_end; relr++) {
+		u64 *p, r = *relr;
+
+		if ((r & 1) == 0) {
+			place = (u64 *)(r + offset);
+			*place++ += offset;
+		} else {
+			for (p = place; r; p++) {
+				r >>= 1;
+				if (r & 1)
+					*p += offset;
+			}
+			place += 63;
+		}
+	}
+}
+
+extern void idmap_cpu_replace_ttbr1(void *pgdir);
+
+static void map_range(pgd_t **pgd, u64 start, u64 end, u64 pa, pgprot_t prot,
+		      int level, pte_t *tbl, bool may_use_cont)
+{
+	u64 cmask = (level == 3) ? CONT_PTE_SIZE - 1 : U64_MAX;
+	u64 protval = pgprot_val(prot) & ~PTE_TYPE_MASK;
+	int lshift = (3 - level) * (PAGE_SHIFT - 3);
+	u64 lmask = (PAGE_SIZE << lshift) - 1;
+
+	// Advance tbl to the entry that covers start
+	tbl += (start >> (lshift + PAGE_SHIFT)) % BIT(PAGE_SHIFT - 3);
+
+	// Set the right block/page bits for this level unless we are
+	// clearing the mapping
+	if (protval)
+		protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
+
+	while (start < end) {
+		u64 next = min((start | lmask) + 1, end);
+
+		if (level < 3 &&
+		    (start & lmask || next & lmask || pa & lmask)) {
+			// This chunk needs a finer grained mapping
+			// Put down a table mapping if necessary and recurse
+			if (pte_none(*tbl)) {
+				*tbl = __pte(__phys_to_pte_val((u64)*pgd) |
+					     PMD_TYPE_TABLE);
+				*pgd += PTRS_PER_PTE;
+			}
+			map_range(pgd, start, next, pa, prot, level + 1,
+				  (pte_t *)__pte_to_phys(*tbl), may_use_cont);
+		} else {
+			// Start a contiguous range if start and pa are
+			// suitably aligned
+			if (((start | pa) & cmask) == 0 && may_use_cont)
+				protval |= PTE_CONT;
+			// Clear the contiguous attribute if the remaining
+			// range does not cover a contiguous block
+			if ((end & ~cmask) <= start)
+				protval &= ~PTE_CONT;
+			// Put down a block or page mapping
+			*tbl = __pte(__phys_to_pte_val(pa) | protval);
+		}
+		pa += next - start;
+		start = next;
+		tbl++;
+	}
+}
+
+static void map_segment(pgd_t **pgd, u64 va_offset, void *start, void *end,
+			pgprot_t prot, bool may_use_cont)
+{
+	map_range(pgd, ((u64)start + va_offset) & ~PAGE_OFFSET,
+		  ((u64)end + va_offset) & ~PAGE_OFFSET, (u64)start,
+		  prot, 4 - CONFIG_PGTABLE_LEVELS, (pte_t *)init_pg_dir,
+		  may_use_cont);
+}
+
+static void unmap_segment(u64 va_offset, void *start, void *end)
+{
+	map_range(NULL, ((u64)start + va_offset) & ~PAGE_OFFSET,
+		  ((u64)end + va_offset) & ~PAGE_OFFSET, (u64)start,
+		  __pgprot(0), 4 - CONFIG_PGTABLE_LEVELS, (pte_t *)init_pg_dir,
+		  false);
+}
+
+/*
+ * Open coded check for BTI, only for use to determine configuration
+ * for early mappings for before the cpufeature code has run.
+ */
+static bool arm64_early_this_cpu_has_bti(void)
+{
+	u64 pfr1;
+
+	if (!IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
+		return false;
+
+	pfr1 = read_sysreg_s(SYS_ID_AA64PFR1_EL1);
+	return cpuid_feature_extract_unsigned_field(pfr1,
+						    ID_AA64PFR1_BT_SHIFT);
+}
+
+static bool arm64_early_this_cpu_has_e0pd(void)
+{
+	u64 mmfr2;
+
+	if (!IS_ENABLED(CONFIG_ARM64_E0PD))
+		return false;
+
+	mmfr2 = read_sysreg_s(SYS_ID_AA64MMFR2_EL1);
+	return cpuid_feature_extract_unsigned_field(mmfr2,
+						    ID_AA64MMFR2_E0PD_SHIFT);
+}
+
+static void map_kernel(void *fdt, u64 kaslr_offset, u64 va_offset)
+{
+	pgd_t *pgdp = (void *)init_pg_dir + PAGE_SIZE;
+	pgprot_t text_prot = PAGE_KERNEL_ROX;
+	pgprot_t data_prot = PAGE_KERNEL;
+	pgprot_t prot;
+
+	if (cmdline_has(fdt, "rodata=off"))
+		text_prot = PAGE_KERNEL_EXEC;
+
+	// If we have a CPU that supports BTI and a kernel built for
+	// BTI then mark the kernel executable text as guarded pages
+	// now so we don't have to rewrite the page tables later.
+	if (arm64_early_this_cpu_has_bti() && !cmdline_has(fdt, "arm64.nobti"))
+		text_prot = __pgprot_modify(text_prot, PTE_GP, PTE_GP);
+
+	// Assume that any CPU that does not implement E0PD needs KPTI to
+	// ensure that KASLR randomized addresses will not leak. This means we
+	// need to use non-global mappings for the kernel text and data.
+	if (!arm64_early_this_cpu_has_e0pd() && kaslr_offset >= MIN_KIMG_ALIGN) {
+		text_prot = __pgprot_modify(text_prot, PTE_NG, PTE_NG);
+		data_prot = __pgprot_modify(data_prot, PTE_NG, PTE_NG);
+	}
+
+	// Map all code read-write on the first pass for relocation processing
+	prot = IS_ENABLED(CONFIG_RELOCATABLE) ? data_prot : text_prot;
+
+	map_segment(&pgdp, va_offset, _stext, _etext, prot, true);
+	map_segment(&pgdp, va_offset, __start_rodata, __inittext_begin, data_prot, false);
+	map_segment(&pgdp, va_offset, __inittext_begin, __inittext_end, prot, false);
+	map_segment(&pgdp, va_offset, __initdata_begin, __initdata_end, data_prot, false);
+	map_segment(&pgdp, va_offset, _data, init_pg_dir, data_prot, true);
+	// omit [init_pg_dir, _end] - it doesn't need a kernel mapping
+	dsb(ishst);
+
+	idmap_cpu_replace_ttbr1(init_pg_dir);
+
+	if (IS_ENABLED(CONFIG_RELOCATABLE)) {
+		relocate_kernel(kaslr_offset);
+
+		// Unmap the text region before remapping it, to avoid
+		// potential TLB conflicts on the contiguous descriptors. This
+		// assumes that it is permitted to clear the valid bit on a
+		// live descriptor with the CONT bit set.
+		unmap_segment(va_offset, _stext, _etext);
+		dsb(ishst);
+		isb();
+		__tlbi(vmalle1);
+		isb();
+
+		// Remap these segments with different permissions
+		// No new page table allocations should be needed
+		map_segment(NULL, va_offset, _stext, _etext, text_prot, true);
+		map_segment(NULL, va_offset, __inittext_begin, __inittext_end,
+			    text_prot, false);
+		dsb(ishst);
+	}
+}
+
+asmlinkage u64 early_map_kernel(void *fdt)
+{
+	u64 kaslr_seed = 0, kaslr_offset = 0;
+	u64 va_base = KIMAGE_VADDR;
+	u64 pa_base = (u64)&_text;
+
+	// Clear the initial page tables before populating them
+	memset(init_pg_dir, 0, init_pg_end - init_pg_dir);
+
+	// The virtual KASLR displacement modulo 2MiB is decided by the
+	// physical placement of the image, as otherwise, we might not be able
+	// to create the early kernel mapping using 2 MiB block descriptors. So
+	// take the low bits of the KASLR offset from the physical address, and
+	// fill in the high bits from the seed.
+	if (IS_ENABLED(CONFIG_RELOCATABLE)) {
+		kaslr_offset = pa_base & (MIN_KIMG_ALIGN - 1);
+		if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
+			kaslr_seed = kaslr_early_init(fdt);
+			kaslr_offset |= kaslr_seed & ~(MIN_KIMG_ALIGN - 1);
+		}
+		va_base += kaslr_offset;
+	}
+
+	map_kernel(fdt, kaslr_offset, va_base - pa_base);
+
+	// Return the lower 16 bits of the seed - this will be
+	// used to randomize the linear map
+	return kaslr_seed & U16_MAX;
+}
diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
deleted file mode 100644
index 6c3855e69395..000000000000
--- a/arch/arm64/kernel/pi/kaslr_early.c
+++ /dev/null
@@ -1,112 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-// Copyright 2022 Google LLC
-// Author: Ard Biesheuvel <ardb@google.com>
-
-// NOTE: code in this file runs *very* early, and is not permitted to use
-// global variables or anything that relies on absolute addressing.
-
-#include <linux/libfdt.h>
-#include <linux/init.h>
-#include <linux/linkage.h>
-#include <linux/types.h>
-#include <linux/sizes.h>
-#include <linux/string.h>
-
-#include <asm/archrandom.h>
-#include <asm/memory.h>
-
-/* taken from lib/string.c */
-static char *__strstr(const char *s1, const char *s2)
-{
-	size_t l1, l2;
-
-	l2 = strlen(s2);
-	if (!l2)
-		return (char *)s1;
-	l1 = strlen(s1);
-	while (l1 >= l2) {
-		l1--;
-		if (!memcmp(s1, s2, l2))
-			return (char *)s1;
-		s1++;
-	}
-	return NULL;
-}
-static bool cmdline_contains_nokaslr(const u8 *cmdline)
-{
-	const u8 *str;
-
-	str = __strstr(cmdline, "nokaslr");
-	return str == cmdline || (str > cmdline && *(str - 1) == ' ');
-}
-
-static bool is_kaslr_disabled_cmdline(void *fdt)
-{
-	if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
-		int node;
-		const u8 *prop;
-
-		node = fdt_path_offset(fdt, "/chosen");
-		if (node < 0)
-			goto out;
-
-		prop = fdt_getprop(fdt, node, "bootargs", NULL);
-		if (!prop)
-			goto out;
-
-		if (cmdline_contains_nokaslr(prop))
-			return true;
-
-		if (IS_ENABLED(CONFIG_CMDLINE_EXTEND))
-			goto out;
-
-		return false;
-	}
-out:
-	return cmdline_contains_nokaslr(CONFIG_CMDLINE);
-}
-
-static u64 get_kaslr_seed(void *fdt)
-{
-	int node, len;
-	fdt64_t *prop;
-	u64 ret;
-
-	node = fdt_path_offset(fdt, "/chosen");
-	if (node < 0)
-		return 0;
-
-	prop = fdt_getprop_w(fdt, node, "kaslr-seed", &len);
-	if (!prop || len != sizeof(u64))
-		return 0;
-
-	ret = fdt64_to_cpu(*prop);
-	*prop = 0;
-	return ret;
-}
-
-asmlinkage u64 kaslr_early_init(void *fdt)
-{
-	u64 seed;
-
-	if (is_kaslr_disabled_cmdline(fdt))
-		return 0;
-
-	seed = get_kaslr_seed(fdt);
-	if (!seed) {
-#ifdef CONFIG_ARCH_RANDOM
-		 if (!__early_cpu_has_rndr() ||
-		     !__arm64_rndr((unsigned long *)&seed))
-#endif
-		return 0;
-	}
-
-	/*
-	 * OK, so we are proceeding with KASLR enabled. Calculate a suitable
-	 * kernel image offset from the seed. Let's place the kernel in the
-	 * middle half of the VMALLOC area (VA_BITS_MIN - 2), and stay clear of
-	 * the lower and upper quarters to avoid colliding with other
-	 * allocations.
-	 */
-	return BIT(VA_BITS_MIN - 3) + (seed & GENMASK(VA_BITS_MIN - 3, 0));
-}
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 5002d869fa7f..60641aa3ac07 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -257,15 +257,15 @@ SECTIONS
 	HYPERVISOR_RELOC_SECTION
 
 	.rela.dyn : ALIGN(8) {
-		__rela_start = .;
+		__pi_rela_start = .;
 		*(.rela .rela*)
-		__rela_end = .;
+		__pi_rela_end = .;
 	}
 
 	.relr.dyn : ALIGN(8) {
-		__relr_start = .;
+		__pi_relr_start = .;
 		*(.relr.dyn)
-		__relr_end = .;
+		__pi_relr_end = .;
 	}
 
 	. = ALIGN(SEGMENT_ALIGN);
@@ -309,11 +309,14 @@ SECTIONS
 
 	BSS_SECTION(SBSS_ALIGN, 0, 0)
 
-	. = ALIGN(PAGE_SIZE);
+	. = ALIGN(SEGMENT_ALIGN);
 	init_pg_dir = .;
 	. += INIT_DIR_SIZE;
 	init_pg_end = .;
 
+	. += SZ_4K;
+	primary_init_stack = .;
+
 	. = ALIGN(SEGMENT_ALIGN);
 	__pecoff_data_size = ABSOLUTE(. - __initdata_begin);
 	_end = .;
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 113a4fedf5b8..4322ddf5e02f 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -196,6 +196,7 @@ SYM_FUNC_START(idmap_cpu_replace_ttbr1)
 
 	ret
 SYM_FUNC_END(idmap_cpu_replace_ttbr1)
+SYM_FUNC_ALIAS(__pi_idmap_cpu_replace_ttbr1, idmap_cpu_replace_ttbr1)
 	.popsection
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
-- 
2.35.1



* [PATCH v6 6/9] arm64: mm: avoid fixmap for early swapper_pg_dir updates
  2022-07-01 13:04 [PATCH v6 0/9] arm64: add support for WXN Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2022-07-01 13:04 ` [PATCH v6 5/9] arm64: head: move early kernel mapping and relocation code to C code Ard Biesheuvel
@ 2022-07-01 13:04 ` Ard Biesheuvel
  2022-07-01 13:04 ` [PATCH v6 7/9] arm64: mm: omit redundant remap of kernel image Ard Biesheuvel
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2022-07-01 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual

Early in the boot, when .rodata is still writable, we can poke
swapper_pg_dir entries directly, and there is no need to go through the
fixmap. After a future patch, we will enter the kernel with
swapper_pg_dir already active, and early swapper_pg_dir updates for
creating the fixmap page table hierarchy itself cannot go through the
fixmap for obvious reasons. So let's keep track of whether rodata is
writable, and update the descriptor directly in that case.

As the same reasoning applies to early KASAN init, make the function
noinstr as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/mmu.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a723bd2cfc27..8965f621e2c3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -58,6 +58,8 @@ EXPORT_SYMBOL(kimage_voffset);
 
 u32 __boot_cpu_mode[] = { BOOT_CPU_MODE_EL2, BOOT_CPU_MODE_EL1 };
 
+static bool rodata_is_rw __ro_after_init = true;
+
 /*
  * The booting CPU updates the failed status @__early_cpu_boot_status,
  * with MMU turned off.
@@ -78,10 +80,21 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
 static DEFINE_SPINLOCK(swapper_pgdir_lock);
 static DEFINE_MUTEX(fixmap_lock);
 
-void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
+void noinstr set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
 {
 	pgd_t *fixmap_pgdp;
 
+	/*
+	 * Don't bother with the fixmap if swapper_pg_dir is still mapped
+	 * writable in the kernel mapping.
+	 */
+	if (rodata_is_rw) {
+		WRITE_ONCE(*pgdp, pgd);
+		dsb(ishst);
+		isb();
+		return;
+	}
+
 	spin_lock(&swapper_pgdir_lock);
 	fixmap_pgdp = pgd_set_fixmap(__pa_symbol(pgdp));
 	WRITE_ONCE(*fixmap_pgdp, pgd);
@@ -613,6 +626,7 @@ void mark_rodata_ro(void)
 	 * to cover NOTES and EXCEPTION_TABLE.
 	 */
 	section_size = (unsigned long)__init_begin - (unsigned long)__start_rodata;
+	WRITE_ONCE(rodata_is_rw, false);
 	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
 			    section_size, PAGE_KERNEL_RO);
 
-- 
2.35.1



* [PATCH v6 7/9] arm64: mm: omit redundant remap of kernel image
  2022-07-01 13:04 [PATCH v6 0/9] arm64: add support for WXN Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2022-07-01 13:04 ` [PATCH v6 6/9] arm64: mm: avoid fixmap for early swapper_pg_dir updates Ard Biesheuvel
@ 2022-07-01 13:04 ` Ard Biesheuvel
  2022-07-01 13:04 ` [PATCH v6 8/9] mm: add arch hook to validate mmap() prot flags Ard Biesheuvel
  2022-07-01 13:04 ` [PATCH v6 9/9] arm64: mm: add support for WXN memory translation attribute Ard Biesheuvel
  8 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2022-07-01 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual

Now that the early kernel mapping is created with all the right
attributes and segment boundaries, there is no longer a need to recreate
it and switch to it. This also means we no longer have to copy the kasan
shadow or some parts of the fixmap from one set of page tables to the
other.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/kasan.h          |   2 -
 arch/arm64/include/asm/mmu.h            |   2 +-
 arch/arm64/kernel/image-vars.h          |   1 +
 arch/arm64/kernel/pi/early_map_kernel.c |   4 +
 arch/arm64/mm/kasan_init.c              |  15 ---
 arch/arm64/mm/mmu.c                     | 101 +++-----------------
 6 files changed, 17 insertions(+), 108 deletions(-)

diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
index 12d5f47f7dbe..ab52688ac4bd 100644
--- a/arch/arm64/include/asm/kasan.h
+++ b/arch/arm64/include/asm/kasan.h
@@ -36,12 +36,10 @@ void kasan_init(void);
 #define _KASAN_SHADOW_START(va)	(KASAN_SHADOW_END - (1UL << ((va) - KASAN_SHADOW_SCALE_SHIFT)))
 #define KASAN_SHADOW_START      _KASAN_SHADOW_START(vabits_actual)
 
-void kasan_copy_shadow(pgd_t *pgdir);
 asmlinkage void kasan_early_init(void);
 
 #else
 static inline void kasan_init(void) { }
-static inline void kasan_copy_shadow(pgd_t *pgdir) { }
 #endif
 
 #endif
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 48f8466a4be9..a93d495d6e8c 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -73,7 +73,7 @@ extern void mark_linear_text_alias_ro(void);
 extern bool kaslr_requires_kpti(void);
 
 #define INIT_MM_CONTEXT(name)	\
-	.pgd = init_pg_dir,
+	.pgd = swapper_pg_dir,
 
 #endif	/* !__ASSEMBLY__ */
 #endif
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 032c3e1aff20..b010cba48916 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -59,6 +59,7 @@ PROVIDE(__pi___memset			= __pi_memset);
  */
 PROVIDE(__pi_init_pg_dir		= init_pg_dir);
 PROVIDE(__pi_init_pg_end		= init_pg_end);
+PROVIDE(__pi_swapper_pg_dir		= swapper_pg_dir);
 
 PROVIDE(__pi__text			= _text);
 PROVIDE(__pi__stext               	= _stext);
diff --git a/arch/arm64/kernel/pi/early_map_kernel.c b/arch/arm64/kernel/pi/early_map_kernel.c
index 4199548584fb..b2e660a1d003 100644
--- a/arch/arm64/kernel/pi/early_map_kernel.c
+++ b/arch/arm64/kernel/pi/early_map_kernel.c
@@ -335,6 +335,10 @@ static void map_kernel(void *fdt, u64 kaslr_offset, u64 va_offset)
 			    text_prot, false);
 		dsb(ishst);
 	}
+
+	// Copy the root page table to its final location
+	memcpy((void *)swapper_pg_dir + va_offset, init_pg_dir, PGD_SIZE);
+	idmap_cpu_replace_ttbr1(swapper_pg_dir);
 }
 
 asmlinkage u64 early_map_kernel(void *fdt)
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index e969e68de005..df98f496539f 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -184,21 +184,6 @@ static void __init kasan_map_populate(unsigned long start, unsigned long end,
 	kasan_pgd_populate(start & PAGE_MASK, PAGE_ALIGN(end), node, false);
 }
 
-/*
- * Copy the current shadow region into a new pgdir.
- */
-void __init kasan_copy_shadow(pgd_t *pgdir)
-{
-	pgd_t *pgdp, *pgdp_new, *pgdp_end;
-
-	pgdp = pgd_offset_k(KASAN_SHADOW_START);
-	pgdp_end = pgd_offset_k(KASAN_SHADOW_END);
-	pgdp_new = pgd_offset_pgd(pgdir, KASAN_SHADOW_START);
-	do {
-		set_pgd(pgdp_new, READ_ONCE(*pgdp));
-	} while (pgdp++, pgdp_new++, pgdp != pgdp_end);
-}
-
 static void __init clear_pgds(unsigned long start,
 			unsigned long end)
 {
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8965f621e2c3..0a23f4f14f99 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -633,9 +633,9 @@ void mark_rodata_ro(void)
 	debug_checkwx();
 }
 
-static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
-				      pgprot_t prot, struct vm_struct *vma,
-				      int flags, unsigned long vm_flags)
+static void __init declare_vma(void *va_start, void *va_end,
+			       struct vm_struct *vma,
+			       unsigned long vm_flags)
 {
 	phys_addr_t pa_start = __pa_symbol(va_start);
 	unsigned long size = va_end - va_start;
@@ -643,9 +643,6 @@ static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
 	BUG_ON(!PAGE_ALIGNED(pa_start));
 	BUG_ON(!PAGE_ALIGNED(size));
 
-	__create_pgd_mapping(pgdp, pa_start, (unsigned long)va_start, size, prot,
-			     early_pgtable_alloc, flags);
-
 	if (!(vm_flags & VM_NO_GUARD))
 		size += PAGE_SIZE;
 
@@ -708,87 +705,18 @@ core_initcall(map_entry_trampoline);
 #endif
 
 /*
- * Open coded check for BTI, only for use to determine configuration
- * for early mappings for before the cpufeature code has run.
- */
-static bool arm64_early_this_cpu_has_bti(void)
-{
-	u64 pfr1;
-
-	if (!IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
-		return false;
-
-	pfr1 = __read_sysreg_by_encoding(SYS_ID_AA64PFR1_EL1);
-	return cpuid_feature_extract_unsigned_field(pfr1,
-						    ID_AA64PFR1_BT_SHIFT);
-}
-
-/*
- * Create fine-grained mappings for the kernel.
+ * Declare the VMA areas for the kernel
  */
-static void __init map_kernel(pgd_t *pgdp)
+static void __init declare_kernel_vmas(void)
 {
 	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
 				vmlinux_initdata, vmlinux_data;
 
-	/*
-	 * External debuggers may need to write directly to the text
-	 * mapping to install SW breakpoints. Allow this (only) when
-	 * explicitly requested with rodata=off.
-	 */
-	pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
-
-	/*
-	 * If we have a CPU that supports BTI and a kernel built for
-	 * BTI then mark the kernel executable text as guarded pages
-	 * now so we don't have to rewrite the page tables later.
-	 */
-	if (arm64_early_this_cpu_has_bti())
-		text_prot = __pgprot_modify(text_prot, PTE_GP, PTE_GP);
-
-	/*
-	 * Only rodata will be remapped with different permissions later on,
-	 * all other segments are allowed to use contiguous mappings.
-	 */
-	map_kernel_segment(pgdp, _stext, _etext, text_prot, &vmlinux_text, 0,
-			   VM_NO_GUARD);
-	map_kernel_segment(pgdp, __start_rodata, __inittext_begin, PAGE_KERNEL,
-			   &vmlinux_rodata, NO_CONT_MAPPINGS, VM_NO_GUARD);
-	map_kernel_segment(pgdp, __inittext_begin, __inittext_end, text_prot,
-			   &vmlinux_inittext, 0, VM_NO_GUARD);
-	map_kernel_segment(pgdp, __initdata_begin, __initdata_end, PAGE_KERNEL,
-			   &vmlinux_initdata, 0, VM_NO_GUARD);
-	map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);
-
-	if (!READ_ONCE(pgd_val(*pgd_offset_pgd(pgdp, FIXADDR_START)))) {
-		/*
-		 * The fixmap falls in a separate pgd to the kernel, and doesn't
-		 * live in the carveout for the swapper_pg_dir. We can simply
-		 * re-use the existing dir for the fixmap.
-		 */
-		set_pgd(pgd_offset_pgd(pgdp, FIXADDR_START),
-			READ_ONCE(*pgd_offset_k(FIXADDR_START)));
-	} else if (CONFIG_PGTABLE_LEVELS > 3) {
-		pgd_t *bm_pgdp;
-		p4d_t *bm_p4dp;
-		pud_t *bm_pudp;
-		/*
-		 * The fixmap shares its top level pgd entry with the kernel
-		 * mapping. This can really only occur when we are running
-		 * with 16k/4 levels, so we can simply reuse the pud level
-		 * entry instead.
-		 */
-		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
-		bm_pgdp = pgd_offset_pgd(pgdp, FIXADDR_START);
-		bm_p4dp = p4d_offset(bm_pgdp, FIXADDR_START);
-		bm_pudp = pud_set_fixmap_offset(bm_p4dp, FIXADDR_START);
-		pud_populate(&init_mm, bm_pudp, lm_alias(bm_pmd));
-		pud_clear_fixmap();
-	} else {
-		BUG();
-	}
-
-	kasan_copy_shadow(pgdp);
+	declare_vma(_stext, _etext, &vmlinux_text, VM_NO_GUARD);
+	declare_vma(__start_rodata, __inittext_begin, &vmlinux_rodata, VM_NO_GUARD);
+	declare_vma(__inittext_begin, __inittext_end, &vmlinux_inittext, VM_NO_GUARD);
+	declare_vma(__initdata_begin, __initdata_end, &vmlinux_initdata, VM_NO_GUARD);
+	declare_vma(_data, _end, &vmlinux_data, 0);
 }
 
 static void __init create_idmap(void)
@@ -824,24 +752,17 @@ static void __init create_idmap(void)
 void __init paging_init(void)
 {
 	pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir));
-	extern pgd_t init_idmap_pg_dir[];
 
 	idmap_t0sz = 63UL - __fls(__pa_symbol(_end) | GENMASK(VA_BITS_MIN - 1, 0));
 
-	map_kernel(pgdp);
 	map_mem(pgdp);
 
 	pgd_clear_fixmap();
 
-	cpu_replace_ttbr1(lm_alias(swapper_pg_dir), init_idmap_pg_dir);
-	init_mm.pgd = swapper_pg_dir;
-
-	memblock_phys_free(__pa_symbol(init_pg_dir),
-			   __pa_symbol(init_pg_end) - __pa_symbol(init_pg_dir));
-
 	memblock_allow_resize();
 
 	create_idmap();
+	declare_kernel_vmas();
 }
 
 /*
-- 
2.35.1



* [PATCH v6 8/9] mm: add arch hook to validate mmap() prot flags
  2022-07-01 13:04 [PATCH v6 0/9] arm64: add support for WXN Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2022-07-01 13:04 ` [PATCH v6 7/9] arm64: mm: omit redundant remap of kernel image Ard Biesheuvel
@ 2022-07-01 13:04 ` Ard Biesheuvel
  2022-07-01 13:04 ` [PATCH v6 9/9] arm64: mm: add support for WXN memory translation attribute Ard Biesheuvel
  8 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2022-07-01 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual

Add a hook to permit architectures to perform validation on the prot
flags passed to mmap(), like arch_validate_prot() does for mprotect().
This will be used by arm64 to reject PROT_WRITE+PROT_EXEC mappings on
configurations that run with WXN enabled.
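
As a rough sketch (hypothetical, not taken from this series), an
architecture opts in by providing its own definition in its asm/mman.h,
in the same way arch_validate_prot() is overridden today:

/* hypothetical arch/xyz/include/asm/mman.h fragment, for illustration only */
static inline bool arch_validate_mmap_prot(unsigned long prot,
					   unsigned long addr)
{
	/* refuse mappings that are writable and executable at the same time */
	if ((prot & (PROT_WRITE | PROT_EXEC)) == (PROT_WRITE | PROT_EXEC))
		return false;
	return true;
}
#define arch_validate_mmap_prot arch_validate_mmap_prot

With such an override in place, do_mmap() fails the request with -EACCES
before any VMA is created, mirroring where arch_validate_prot() catches
the equivalent mprotect() case.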

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/linux/mman.h | 15 +++++++++++++++
 mm/mmap.c            |  3 +++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 58b3abd457a3..53ac72310ce0 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -120,6 +120,21 @@ static inline bool arch_validate_flags(unsigned long flags)
 #define arch_validate_flags arch_validate_flags
 #endif
 
+#ifndef arch_validate_mmap_prot
+/*
+ * This is called from mmap(), which ignores unknown prot bits so the default
+ * is to accept anything.
+ *
+ * Returns true if the prot flags are valid
+ */
+static inline bool arch_validate_mmap_prot(unsigned long prot,
+					   unsigned long addr)
+{
+	return true;
+}
+#define arch_validate_mmap_prot arch_validate_mmap_prot
+#endif
+
 /*
  * Optimisation macro.  It is equivalent to:
  *      (x & bit1) ? bit2 : 0
diff --git a/mm/mmap.c b/mm/mmap.c
index 61e6135c54ef..4a585879937d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1437,6 +1437,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		if (!(file && path_noexec(&file->f_path)))
 			prot |= PROT_EXEC;
 
+	if (!arch_validate_mmap_prot(prot, addr))
+		return -EACCES;
+
 	/* force arch specific MAP_FIXED handling in get_unmapped_area */
 	if (flags & MAP_FIXED_NOREPLACE)
 		flags |= MAP_FIXED;
-- 
2.35.1


* [PATCH v6 9/9] arm64: mm: add support for WXN memory translation attribute
  2022-07-01 13:04 [PATCH v6 0/9] arm64: add support for WXN Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2022-07-01 13:04 ` [PATCH v6 8/9] mm: add arch hook to validate mmap() prot flags Ard Biesheuvel
@ 2022-07-01 13:04 ` Ard Biesheuvel
  8 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2022-07-01 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Mark Brown, Anshuman Khandual

The AArch64 virtual memory system supports a global WXN control, which
can be enabled to make all writable mappings implicitly no-exec. This is
a useful hardening feature, as it prevents mistakes in managing page
table permissions from being exploited to attack the system.

When enabled at EL1, the restrictions apply to both EL1 and EL0. EL1 is
completely under our control, and has been cleaned up to allow WXN to be
enabled from boot onwards. EL0 is not under our control, but given that
widely deployed security features such as SELinux or PaX already limit
the ability of user space to create mappings that are writable and
executable at the same time, the impact of enabling this for EL0 is
expected to be limited. (For this reason, common user space libraries
that have a legitimate need for manipulating executable code already
carry fallbacks such as [0].)

If enabled at compile time, the feature can still be disabled at boot if
needed, by passing arm64.nowxn on the kernel command line.

[0] https://github.com/libffi/libffi/blob/master/src/closures.c#L440
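
From user space, the observable effect is that requests for mappings
that are both writable and executable are refused outright, rather than
being created and then silently treated as non-executable by the
hardware. A quick way to check the behaviour on a WXN-enabled kernel
(illustrative test program, not part of this patch):

/* wxn-check.c: illustrative only, not part of this patch */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	/* ask for a writable+executable anonymous mapping */
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		/* expected with WXN in effect: EACCES from the mmap() prot hook */
		printf("rwx mmap rejected: %s\n", strerror(errno));
		return 0;
	}

	printf("rwx mmap succeeded at %p (WXN not enforced)\n", p);
	munmap(p, 4096);
	return 0;
}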

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/arm64/Kconfig                      | 11 ++++++
 arch/arm64/include/asm/cpufeature.h     |  9 +++++
 arch/arm64/include/asm/mman.h           | 36 ++++++++++++++++++++
 arch/arm64/include/asm/mmu_context.h    | 30 +++++++++++++++-
 arch/arm64/kernel/pi/early_map_kernel.c | 11 +++++-
 arch/arm64/mm/mmu.c                     | 33 ++++++++++++++----
 arch/arm64/mm/proc.S                    | 23 +++++++++++++
 7 files changed, 144 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1652a9800ebe..d262d5ab4316 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1422,6 +1422,17 @@ config RODATA_FULL_DEFAULT_ENABLED
 	  This requires the linear region to be mapped down to pages,
 	  which may adversely affect performance in some cases.
 
+config ARM64_WXN
+	bool "Enable WXN attribute so all writable mappings are non-exec"
+	help
+	  Set the WXN bit in the SCTLR system register so that all writable
+	  mappings are treated as if the PXN/UXN bit is set as well.
+	  If this is set to Y, it can still be disabled at runtime by
+	  passing 'arm64.nowxn' on the kernel command line.
+
+	  This should only be set if no software needs to be supported that
+	  relies on being able to execute from writable mappings.
+
 config ARM64_SW_TTBR0_PAN
 	bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
 	help
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 14a8f3d93add..86ec12ceeaff 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -915,6 +915,15 @@ extern struct arm64_ftr_override id_aa64isar2_override;
 u32 get_kvm_ipa_limit(void);
 void dump_cpu_features(void);
 
+extern int arm64_no_wxn;
+
+static inline bool arm64_wxn_enabled(void)
+{
+	if (!IS_ENABLED(CONFIG_ARM64_WXN))
+		return false;
+	return arm64_no_wxn == 0;
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 5966ee4a6154..6d4940342ba7 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -35,11 +35,40 @@ static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
 }
 #define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
 
+static inline bool arm64_check_wx_prot(unsigned long prot,
+				       struct task_struct *tsk)
+{
+	/*
+	 * When we are running with SCTLR_ELx.WXN==1, writable mappings are
+	 * implicitly non-executable. This means we should reject such mappings
+	 * when user space attempts to create them using mmap() or mprotect().
+	 */
+	if (arm64_wxn_enabled() &&
+	    ((prot & (PROT_WRITE | PROT_EXEC)) == (PROT_WRITE | PROT_EXEC))) {
+		/*
+		 * User space libraries such as libffi carry elaborate
+		 * heuristics to decide whether it is worth it to even attempt
+		 * to create writable executable mappings, as PaX or selinux
+		 * enabled systems will outright reject it. They will usually
+		 * fall back to something else (e.g., two separate shared
+		 * mmap()s of a temporary file) on failure.
+		 */
+		pr_info_ratelimited(
+			"process %s (%d) attempted to create PROT_WRITE+PROT_EXEC mapping\n",
+			tsk->comm, tsk->pid);
+		return false;
+	}
+	return true;
+}
+
 static inline bool arch_validate_prot(unsigned long prot,
 	unsigned long addr __always_unused)
 {
 	unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM;
 
+	if (!arm64_check_wx_prot(prot, current))
+		return false;
+
 	if (system_supports_bti())
 		supported |= PROT_BTI;
 
@@ -50,6 +79,13 @@ static inline bool arch_validate_prot(unsigned long prot,
 }
 #define arch_validate_prot(prot, addr) arch_validate_prot(prot, addr)
 
+static inline bool arch_validate_mmap_prot(unsigned long prot,
+					   unsigned long addr)
+{
+	return arm64_check_wx_prot(prot, current);
+}
+#define arch_validate_mmap_prot arch_validate_mmap_prot
+
 static inline bool arch_validate_flags(unsigned long vm_flags)
 {
 	if (!system_supports_mte())
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index c7ccd82db1d2..cd4bb5410a18 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -19,13 +19,41 @@
 #include <asm/cacheflush.h>
 #include <asm/cpufeature.h>
 #include <asm/proc-fns.h>
-#include <asm-generic/mm_hooks.h>
 #include <asm/cputype.h>
 #include <asm/sysreg.h>
 #include <asm/tlbflush.h>
 
 extern bool rodata_full;
 
+static inline int arch_dup_mmap(struct mm_struct *oldmm,
+				struct mm_struct *mm)
+{
+	return 0;
+}
+
+static inline void arch_exit_mmap(struct mm_struct *mm)
+{
+}
+
+static inline void arch_unmap(struct mm_struct *mm,
+			unsigned long start, unsigned long end)
+{
+}
+
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool execute, bool foreign)
+{
+	if (IS_ENABLED(CONFIG_ARM64_WXN) && execute &&
+	    (vma->vm_flags & (VM_WRITE | VM_EXEC)) == (VM_WRITE | VM_EXEC)) {
+		pr_warn_ratelimited(
+			"process %s (%d) attempted to execute from writable memory\n",
+			current->comm, current->pid);
+		/* disallow unless the nowxn override is set */
+		return !arm64_wxn_enabled();
+	}
+	return true;
+}
+
 static inline void contextidr_thread_switch(struct task_struct *next)
 {
 	if (!IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR))
diff --git a/arch/arm64/kernel/pi/early_map_kernel.c b/arch/arm64/kernel/pi/early_map_kernel.c
index b2e660a1d003..c6c1bae343d3 100644
--- a/arch/arm64/kernel/pi/early_map_kernel.c
+++ b/arch/arm64/kernel/pi/early_map_kernel.c
@@ -278,15 +278,24 @@ static bool arm64_early_this_cpu_has_e0pd(void)
 						    ID_AA64MMFR2_E0PD_SHIFT);
 }
 
+extern void disable_wxn(void);
+
 static void map_kernel(void *fdt, u64 kaslr_offset, u64 va_offset)
 {
 	pgd_t *pgdp = (void *)init_pg_dir + PAGE_SIZE;
 	pgprot_t text_prot = PAGE_KERNEL_ROX;
 	pgprot_t data_prot = PAGE_KERNEL;
 	pgprot_t prot;
+	bool nowxn = false;
 
-	if (cmdline_has(fdt, "rodata=off"))
+	if (cmdline_has(fdt, "rodata=off")) {
 		text_prot = PAGE_KERNEL_EXEC;
+		nowxn = true;
+	}
+
+	if (IS_ENABLED(CONFIG_ARM64_WXN) &&
+	    (nowxn || cmdline_has(fdt, "arm64.nowxn")))
+		disable_wxn();
 
 	// If we have a CPU that supports BTI and a kernel built for
 	// BTI then mark the kernel executable text as guarded pages
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0a23f4f14f99..0f3e556ccfae 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -73,6 +73,21 @@ long __section(".mmuoff.data.write") __early_cpu_boot_status;
 unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
 EXPORT_SYMBOL(empty_zero_page);
 
+#ifdef CONFIG_ARM64_WXN
+asmlinkage int arm64_no_wxn __ro_after_init;
+
+static int set_arm64_no_wxn(char *str)
+{
+	arm64_no_wxn = 1;
+
+	// Make the value visible to booting secondaries
+	dcache_clean_inval_poc((u64)&arm64_no_wxn,
+			       (u64)&arm64_no_wxn + sizeof(arm64_no_wxn));
+	return 1;
+}
+__setup("arm64.nowxn", set_arm64_no_wxn);
+#endif
+
 static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
 static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
 static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
@@ -660,15 +675,19 @@ static int __init parse_rodata(char *arg)
 	int ret = strtobool(arg, &rodata_enabled);
 	if (!ret) {
 		rodata_full = false;
-		return 0;
-	}
+	} else {
+		/* permit 'full' in addition to boolean options */
+		if (strcmp(arg, "full"))
+			return -EINVAL;
 
-	/* permit 'full' in addition to boolean options */
-	if (strcmp(arg, "full"))
-		return -EINVAL;
+		rodata_enabled = true;
+		rodata_full = true;
+	}
 
-	rodata_enabled = true;
-	rodata_full = true;
+#ifdef CONFIG_ARM64_WXN
+	if (!rodata_enabled)
+		set_arm64_no_wxn(NULL);
+#endif
 	return 0;
 }
 early_param("rodata", parse_rodata);
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 4322ddf5e02f..656c78f82a17 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -498,8 +498,31 @@ SYM_FUNC_START(__cpu_setup)
 	 * Prepare SCTLR
 	 */
 	mov_q	x0, INIT_SCTLR_EL1_MMU_ON
+#ifdef CONFIG_ARM64_WXN
+	ldr_l	w1, arm64_no_wxn, x1
+	tst	w1, #0x1			// WXN disabled on command line?
+	orr	x1, x0, #SCTLR_ELx_WXN
+	csel	x0, x0, x1, ne
+#endif
 	ret					// return to head.S
 
 	.unreq	mair
 	.unreq	tcr
 SYM_FUNC_END(__cpu_setup)
+
+#ifdef CONFIG_ARM64_WXN
+	.align	2
+SYM_CODE_START(__pi_disable_wxn)
+	mrs	x0, sctlr_el1
+	bic	x1, x0, #SCTLR_ELx_M
+	msr	sctlr_el1, x1
+	isb
+	tlbi	vmalle1
+	dsb	nsh
+	isb
+	bic	x0, x0, #SCTLR_ELx_WXN
+	msr	sctlr_el1, x0
+	isb
+	ret
+SYM_CODE_END(__pi_disable_wxn)
+#endif
-- 
2.35.1


* Re: [PATCH v6 1/9] arm64: kaslr: use an ordinary command line param for nokaslr
  2022-07-01 13:04 ` [PATCH v6 1/9] arm64: kaslr: use an ordinary command line param for nokaslr Ard Biesheuvel
@ 2022-07-01 14:07   ` Mark Brown
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Brown @ 2022-07-01 14:07 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Anshuman Khandual


On Fri, Jul 01, 2022 at 03:04:36PM +0200, Ard Biesheuvel wrote:
> We no longer need to rely on the idreg-override hack to parse the
> 'nokaslr' command line parameter, given that we now parse it way earlier
> already, before the kernel is even mapped. So for later access to its
> value, we can just use core_param() instead.

Reviewed-by: Mark Brown <broonie@kernel.org>

* Re: [PATCH v6 2/9] arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN
  2022-07-01 13:04 ` [PATCH v6 2/9] arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN Ard Biesheuvel
@ 2022-07-01 14:12   ` Mark Brown
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Brown @ 2022-07-01 14:12 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Marc Zyngier, Will Deacon, Mark Rutland,
	Kees Cook, Catalin Marinas, Anshuman Khandual


On Fri, Jul 01, 2022 at 03:04:37PM +0200, Ard Biesheuvel wrote:
> Our virtual KASLR displacement consists of a fully randomized multiple
> of 2 MiB, combined with an offset that is equal to the physical
> placement modulo 2 MiB. This arrangement ensures that we can always use
> 2 MiB block mappings (or contiguous PTE mappings for 16k or 64k pages)
> to map the kernel.
> 
> This means that a KASLR offset of less than 2 MiB is simply the product
> of this physical displacement, and no randomization has actually taken
> place. So let's avoid misreporting this case as 'KASLR enabled'.

Might be worth backporting to stable?  Though the consequence is
just that we might enable KPTI when we don't *need* it which is
not the end of the world.

Reviewed-by: Mark Brown <broonie@kernel.org>

end of thread, other threads:[~2022-07-01 14:13 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-01 13:04 [PATCH v6 0/9] arm64: add support for WXN Ard Biesheuvel
2022-07-01 13:04 ` [PATCH v6 1/9] arm64: kaslr: use an ordinary command line param for nokaslr Ard Biesheuvel
2022-07-01 14:07   ` Mark Brown
2022-07-01 13:04 ` [PATCH v6 2/9] arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN Ard Biesheuvel
2022-07-01 14:12   ` Mark Brown
2022-07-01 13:04 ` [PATCH v6 3/9] arm64: kaslr: drop special case for ThunderX in kaslr_requires_kpti() Ard Biesheuvel
2022-07-01 13:04 ` [PATCH v6 4/9] arm64: head: allocate more pages for the kernel mapping Ard Biesheuvel
2022-07-01 13:04 ` [PATCH v6 5/9] arm64: head: move early kernel mapping and relocation code to C code Ard Biesheuvel
2022-07-01 13:04 ` [PATCH v6 6/9] arm64: mm: avoid fixmap for early swapper_pg_dir updates Ard Biesheuvel
2022-07-01 13:04 ` [PATCH v6 7/9] arm64: mm: omit redundant remap of kernel image Ard Biesheuvel
2022-07-01 13:04 ` [PATCH v6 8/9] mm: add arch hook to validate mmap() prot flags Ard Biesheuvel
2022-07-01 13:04 ` [PATCH v6 9/9] arm64: mm: add support for WXN memory translation attribute Ard Biesheuvel
