linux-efi.vger.kernel.org archive mirror
* [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot
@ 2022-03-30 15:41 Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 01/18] arm64: head: drop idmap_ptrs_per_pgd Ard Biesheuvel
                   ` (18 more replies)
  0 siblings, 19 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel, Marc Zyngier, Will Deacon

This is a followup to a previous series of mine [0], and it aims to
streamline the boot flow with respect to cache maintenance and redundant
copying of data in memory.

Combined with my proof-of-concept firmware for QEMU/arm64 [1], this
results in a boot where both the kernel and the initrd are loaded
straight to their final locations in memory, while the physical
placement of the kernel image is still randomized by the loader. It also
removes all memory accesses (except instruction fetches) performed with
the MMU and caches off from the moment the VM comes out of reset.

On the kernel side, this comes down to:
- increasing the ID map to cover the entire kernel image, so we can
  build the kernel page tables with the MMU and caches enabled;
- dealing with the MMU already being on at boot, and keeping it on while
  building the ID map;
- ensuring that all stores to memory that are now done with the MMU and
  caches on are not negated by the subsequent cache invalidation.

Additionally, this series removes the little dance we do to create a
kernel mapping, relocate the kernel, run the KASLR init code, tear down
the old mapping and create a new one, relocate the kernel again, and
finally enter the kernel proper. Instead, it invokes a minimal C
function 'kaslr_early_init()' while running from the ID map with a
temporary mapping of the FDT in TTBR1. This change accounts for a
substantial part of the diffstat, as it requires some work to
instantiate code that can run safely from the wrong address. It is also
the part most likely to raise objections, so it can be dropped from this
series if desired (patch #9 is the meat, and #8 is a prerequisite patch
that could be dropped in that case as well).

Changes since v1:
- Remove the dodgy handling of the KASLR seed, which was necessary to
  avoid doing two iterations of the setup/teardown of the page tables.
  This is now dealt with by creating the TTBR1 page tables while
  executing from TTBR0, and so all memory manipulations are still done
  with the MMU and caches on. (This is also the reason patch #9 is
  optional now)
- Only boot from EFI with the MMU and caches on if the image was not
  moved around in memory. Otherwise, we cannot rely on the firmware's ID
  map to have created an executable mapping for the copied code.

[0] https://lore.kernel.org/all/20220304175657.2744400-1-ardb@kernel.org/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/efilite.git/

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>

Ard Biesheuvel (18):
  arm64: head: drop idmap_ptrs_per_pgd
  arm64: head: split off idmap creation code
  arm64: kernel: drop unnecessary PoC cache clean+invalidate
  arm64: head: cover entire kernel image in ID map
  arm64: head: factor out TTBR1 assignment into a macro
  arm64: head: populate kernel page tables with MMU and caches on
  arm64: kaslr: deal with init called with VA randomization enabled
  arm64: setup: defer R/O remapping of FDT
  arm64: head: relocate kernel only a single time if KASLR is enabled
  arm64: head: record the MMU state at primary entry
  arm64: mm: make vabits_actual a build time constant if possible
  arm64: head: avoid cache invalidation when entering with the MMU on
  arm64: head: record CPU boot mode after enabling the MMU
  arm64: head: clean the ID map page to the PoC
  arm64: lds: move idmap_pg_dir out of .rodata
  efi: libstub: pass image handle to handle_kernel_image()
  efi/arm64: libstub: run image in place if randomized by the loader
  arm64: efi/libstub: enter with the MMU on if executing in place

 arch/arm64/include/asm/kernel-pgtable.h   |   2 +-
 arch/arm64/include/asm/memory.h           |   6 +
 arch/arm64/include/asm/mmu_context.h      |   1 -
 arch/arm64/kernel/Makefile                |   2 +-
 arch/arm64/kernel/efi-entry.S             |   4 +
 arch/arm64/kernel/head.S                  | 276 +++++++++++---------
 arch/arm64/kernel/kaslr.c                 |  86 +-----
 arch/arm64/kernel/pi/Makefile             |  33 +++
 arch/arm64/kernel/pi/kaslr_early.c        | 128 +++++++++
 arch/arm64/kernel/setup.c                 |   8 +-
 arch/arm64/kernel/vmlinux.lds.S           |   9 +-
 arch/arm64/mm/mmu.c                       |  15 +-
 drivers/firmware/efi/libstub/arm32-stub.c |   3 +-
 drivers/firmware/efi/libstub/arm64-stub.c |  15 +-
 drivers/firmware/efi/libstub/efi-stub.c   |   2 +-
 drivers/firmware/efi/libstub/efistub.h    |   3 +-
 drivers/firmware/efi/libstub/riscv-stub.c |   3 +-
 include/linux/efi.h                       |  11 +
 18 files changed, 380 insertions(+), 227 deletions(-)
 create mode 100644 arch/arm64/kernel/pi/Makefile
 create mode 100644 arch/arm64/kernel/pi/kaslr_early.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 01/18] arm64: head: drop idmap_ptrs_per_pgd
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
@ 2022-03-30 15:41 ` Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 02/18] arm64: head: split off idmap creation code Ard Biesheuvel
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

The assignment of idmap_ptrs_per_pgd lacks any cache invalidation, even
though it is updated with the MMU and caches disabled. However, we never
bother to read the value again except in the very next instruction, and
so we can just drop the variable entirely.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/mmu_context.h | 1 -
 arch/arm64/kernel/head.S             | 4 ++--
 arch/arm64/mm/mmu.c                  | 1 -
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 6770667b34a3..52eb234101a2 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -61,7 +61,6 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
  * physical memory, in which case it will be smaller.
  */
 extern u64 idmap_t0sz;
-extern u64 idmap_ptrs_per_pgd;
 
 /*
  * Ensure TCR.T0SZ is set to the provided value.
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 6a98f1a38c29..127e29f38715 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -318,6 +318,7 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 	 * this number conveniently equals the number of leading zeroes in
 	 * the physical address of __idmap_text_end.
 	 */
+	mov	x4, PTRS_PER_PGD
 	adrp	x5, __idmap_text_end
 	clz	x5, x5
 	cmp	x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
@@ -345,16 +346,15 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 
 	mov	x4, EXTRA_PTRS
 	create_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6
+	mov	x4, PTRS_PER_PGD
 #else
 	/*
 	 * If VA_BITS == 48, we don't have to configure an additional
 	 * translation level, but the top-level table has more entries.
 	 */
 	mov	x4, #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT)
-	str_l	x4, idmap_ptrs_per_pgd, x5
 #endif
 1:
-	ldr_l	x4, idmap_ptrs_per_pgd
 	adr_l	x6, __idmap_text_end		// __pa(__idmap_text_end)
 
 	map_memory x0, x1, x3, x6, x7, x3, x4, x10, x11, x12, x13, x14
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 626ec32873c6..e74a6453cb14 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -44,7 +44,6 @@
 #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
 
 u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
-u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
 
 u64 __section(".mmuoff.data.write") vabits_actual;
 EXPORT_SYMBOL(vabits_actual);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 02/18] arm64: head: split off idmap creation code
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 01/18] arm64: head: drop idmap_ptrs_per_pgd Ard Biesheuvel
@ 2022-03-30 15:41 ` Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 03/18] arm64: kernel: drop unnecessary PoC cache clean+invalidate Ard Biesheuvel
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

Split off the creation of the ID map page tables, so that we can avoid
running it again unnecessarily when KASLR is in effect (which only
randomizes the virtual placement). This will permit us to drop some
explicit cache maintenance to the PoC, which was only needed because the
cache invalidation performed on some global variables might otherwise
clobber unrelated variables that happen to share a cacheline.
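
For illustration only (this is not kernel code, and the variable names
and the 64-byte line size are assumptions), a tiny standalone C program
that shows the aliasing this paragraph is about: two adjacent globals
can land in the same cache line, so an invalidation aimed at one of
them also discards any dirty, cached update to the other.

  #include <stdint.h>
  #include <stdio.h>

  /* hypothetical globals, emitted back to back by the linker */
  uint64_t written_with_mmu_off;    /* invalidated to the PoC by head.S */
  uint64_t updated_by_kaslr_code;   /* may share the same cache line    */

  int main(void)
  {
          uintptr_t a = (uintptr_t)&written_with_mmu_off;
          uintptr_t b = (uintptr_t)&updated_by_kaslr_code;

          /*
           * If both addresses fall into the same 64-byte line, an
           * invalidation covering the first variable also throws away a
           * dirty cacheline holding the second one - hence the explicit
           * clean to the PoC that this series makes unnecessary.
           */
          printf("same cache line: %d\n", (int)(a / 64 == b / 64));
          return 0;
  }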

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S | 101 ++++++++++----------
 1 file changed, 52 insertions(+), 49 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 127e29f38715..275cd14a70c2 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -84,7 +84,7 @@
 	 *  Register   Scope                      Purpose
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
 	 *  x23        primary_entry() .. start_kernel()        physical misalignment/KASLR offset
-	 *  x28        __create_page_tables()                   callee preserved temp register
+	 *  x28        clear_page_tables()                      callee preserved temp register
 	 *  x19/x20    __primary_switch()                       callee preserved temp registers
 	 *  x24        __primary_switch() .. relocate_kernel()  current RELR displacement
 	 */
@@ -94,7 +94,10 @@ SYM_CODE_START(primary_entry)
 	adrp	x23, __PHYS_OFFSET
 	and	x23, x23, MIN_KIMG_ALIGN - 1	// KASLR offset, defaults to 0
 	bl	set_cpu_boot_mode_flag
-	bl	__create_page_tables
+	bl	clear_page_tables
+	bl	create_idmap
+	bl	create_kernel_mapping
+
 	/*
 	 * The following calls CPU setup code, see arch/arm64/mm/proc.S for
 	 * details.
@@ -122,6 +125,35 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
 	b	dcache_inval_poc		// tail call
 SYM_CODE_END(preserve_boot_args)
 
+SYM_FUNC_START_LOCAL(clear_page_tables)
+	mov	x28, lr
+
+	/*
+	 * Invalidate the init page tables to avoid potential dirty cache lines
+	 * being evicted. Other page tables are allocated in rodata as part of
+	 * the kernel image, and thus are clean to the PoC per the boot
+	 * protocol.
+	 */
+	adrp	x0, init_pg_dir
+	adrp	x1, init_pg_end
+	bl	dcache_inval_poc
+
+	/*
+	 * Clear the init page tables.
+	 */
+	adrp	x0, init_pg_dir
+	adrp	x1, init_pg_end
+	sub	x1, x1, x0
+1:	stp	xzr, xzr, [x0], #16
+	stp	xzr, xzr, [x0], #16
+	stp	xzr, xzr, [x0], #16
+	stp	xzr, xzr, [x0], #16
+	subs	x1, x1, #64
+	b.ne	1b
+
+	ret	x28
+SYM_FUNC_END(clear_page_tables)
+
 /*
  * Macro to create a table entry to the next page.
  *
@@ -252,44 +284,8 @@ SYM_CODE_END(preserve_boot_args)
 	populate_entries \tbl, \count, \istart, \iend, \flags, #SWAPPER_BLOCK_SIZE, \tmp
 	.endm
 
-/*
- * Setup the initial page tables. We only setup the barest amount which is
- * required to get the kernel running. The following sections are required:
- *   - identity mapping to enable the MMU (low address, TTBR0)
- *   - first few MB of the kernel linear mapping to jump to once the MMU has
- *     been enabled
- */
-SYM_FUNC_START_LOCAL(__create_page_tables)
-	mov	x28, lr
 
-	/*
-	 * Invalidate the init page tables to avoid potential dirty cache lines
-	 * being evicted. Other page tables are allocated in rodata as part of
-	 * the kernel image, and thus are clean to the PoC per the boot
-	 * protocol.
-	 */
-	adrp	x0, init_pg_dir
-	adrp	x1, init_pg_end
-	bl	dcache_inval_poc
-
-	/*
-	 * Clear the init page tables.
-	 */
-	adrp	x0, init_pg_dir
-	adrp	x1, init_pg_end
-	sub	x1, x1, x0
-1:	stp	xzr, xzr, [x0], #16
-	stp	xzr, xzr, [x0], #16
-	stp	xzr, xzr, [x0], #16
-	stp	xzr, xzr, [x0], #16
-	subs	x1, x1, #64
-	b.ne	1b
-
-	mov	x7, SWAPPER_MM_MMUFLAGS
-
-	/*
-	 * Create the identity mapping.
-	 */
+SYM_FUNC_START_LOCAL(create_idmap)
 	adrp	x0, idmap_pg_dir
 	adrp	x3, __idmap_text_start		// __pa(__idmap_text_start)
 
@@ -356,12 +352,23 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 #endif
 1:
 	adr_l	x6, __idmap_text_end		// __pa(__idmap_text_end)
+	mov	x7, SWAPPER_MM_MMUFLAGS
 
 	map_memory x0, x1, x3, x6, x7, x3, x4, x10, x11, x12, x13, x14
 
 	/*
-	 * Map the kernel image (starting with PHYS_OFFSET).
+	 * Since the page tables have been populated with non-cacheable
+	 * accesses (MMU disabled), invalidate those tables again to
+	 * remove any speculatively loaded cache lines.
 	 */
+	dmb	sy
+
+	adrp	x0, idmap_pg_dir
+	adrp	x1, idmap_pg_end
+	b	dcache_inval_poc		// tail call
+SYM_FUNC_END(create_idmap)
+
+SYM_FUNC_START_LOCAL(create_kernel_mapping)
 	adrp	x0, init_pg_dir
 	mov_q	x5, KIMAGE_VADDR		// compile time __va(_text)
 	add	x5, x5, x23			// add KASLR displacement
@@ -370,6 +377,7 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 	adrp	x3, _text			// runtime __pa(_text)
 	sub	x6, x6, x3			// _end - _text
 	add	x6, x6, x5			// runtime __va(_end)
+	mov	x7, SWAPPER_MM_MMUFLAGS
 
 	map_memory x0, x1, x5, x6, x7, x3, x4, x10, x11, x12, x13, x14
 
@@ -380,16 +388,10 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 	 */
 	dmb	sy
 
-	adrp	x0, idmap_pg_dir
-	adrp	x1, idmap_pg_end
-	bl	dcache_inval_poc
-
 	adrp	x0, init_pg_dir
 	adrp	x1, init_pg_end
-	bl	dcache_inval_poc
-
-	ret	x28
-SYM_FUNC_END(__create_page_tables)
+	b	dcache_inval_poc		// tail call
+SYM_FUNC_END(create_kernel_mapping)
 
 	/*
 	 * Initialize CPU registers with task-specific and cpu-specific context.
@@ -881,7 +883,8 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	pre_disable_mmu_workaround
 	msr	sctlr_el1, x20			// disable the MMU
 	isb
-	bl	__create_page_tables		// recreate kernel mapping
+	bl	clear_page_tables
+	bl	create_kernel_mapping		// recreate kernel mapping
 
 	tlbi	vmalle1				// Remove any stale TLB entries
 	dsb	nsh
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 03/18] arm64: kernel: drop unnecessary PoC cache clean+invalidate
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 01/18] arm64: head: drop idmap_ptrs_per_pgd Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 02/18] arm64: head: split off idmap creation code Ard Biesheuvel
@ 2022-03-30 15:41 ` Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 04/18] arm64: head: cover entire kernel image in ID map Ard Biesheuvel
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

Some early boot code runs before the virtual placement of the kernel is
finalized, and we used to go back to the very start and recreate the ID
map along with the page tables describing the virtual kernel mapping.
This involved setting some global variables with the caches off.

In order to ensure that global state created by the KASLR code is not
corrupted by the cache invalidation that occurs in that case, we needed
to clean those global variables to the PoC explicitly.

This is no longer needed now that the ID map is created only once (and
the associated global variable updates are no longer repeated). So drop
the cache maintenance that is no longer necessary.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/kaslr.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index 418b2bba1521..d5542666182f 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -13,7 +13,6 @@
 #include <linux/pgtable.h>
 #include <linux/random.h>
 
-#include <asm/cacheflush.h>
 #include <asm/fixmap.h>
 #include <asm/kernel-pgtable.h>
 #include <asm/memory.h>
@@ -72,9 +71,6 @@ u64 __init kaslr_early_init(void)
 	 * we end up running with module randomization disabled.
 	 */
 	module_alloc_base = (u64)_etext - MODULES_VSIZE;
-	dcache_clean_inval_poc((unsigned long)&module_alloc_base,
-			    (unsigned long)&module_alloc_base +
-				    sizeof(module_alloc_base));
 
 	/*
 	 * Try to map the FDT early. If this fails, we simply bail,
@@ -174,13 +170,6 @@ u64 __init kaslr_early_init(void)
 	module_alloc_base += (module_range * (seed & ((1 << 21) - 1))) >> 21;
 	module_alloc_base &= PAGE_MASK;
 
-	dcache_clean_inval_poc((unsigned long)&module_alloc_base,
-			    (unsigned long)&module_alloc_base +
-				    sizeof(module_alloc_base));
-	dcache_clean_inval_poc((unsigned long)&memstart_offset_seed,
-			    (unsigned long)&memstart_offset_seed +
-				    sizeof(memstart_offset_seed));
-
 	return offset;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 04/18] arm64: head: cover entire kernel image in ID map
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2022-03-30 15:41 ` [RFC PATCH v2 03/18] arm64: kernel: drop unnecessary PoC cache clean+invalidate Ard Biesheuvel
@ 2022-03-30 15:41 ` Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 05/18] arm64: head: factor out TTBR1 assignment into a macro Ard Biesheuvel
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

As a first step towards avoiding the need to create, tear down and
recreate the kernel virtual mapping with MMU and caches disabled, start
by expanding the ID map so it covers the page tables as well as all
executable code. This will allow us to populate the page tables with the
MMU and caches on, and call KASLR init code before setting up the
virtual mapping.
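
As an aside, the reason the T0SZ headroom check in the hunk below now
uses __pa(_end) rather than __pa(__idmap_text_end) can be sketched in a
few lines of standalone C (the physical address is made up, a 48-bit
VA_BITS_MIN is assumed, and this is not kernel code):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          /* hypothetical __pa(_end) of the enlarged ID map region */
          uint64_t pa_end = 0x0000004010000000ULL;

          /*
           * T0SZ == 64 - #address bits, so the largest usable T0SZ is
           * simply the number of leading zeroes of the last address
           * that has to be covered.
           */
          int max_t0sz     = __builtin_clzll(pa_end);
          int default_t0sz = 64 - 48;     /* VA_BITS_MIN == 48 assumed */

          /*
           * VA range extension is only needed if the default reach of
           * TTBR0 is too small for the ID-mapped region.
           */
          printf("extension needed: %d\n", max_t0sz < default_t0sz);
          return 0;
  }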

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/kernel-pgtable.h |  2 +-
 arch/arm64/kernel/head.S                | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 96dc0f7da258..b62200a9456e 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -87,7 +87,7 @@
 			+ EARLY_PUDS((vstart), (vend))	/* each PUD needs a next level page table */	\
 			+ EARLY_PMDS((vstart), (vend)))	/* each PMD needs a next level page table */
 #define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end))
-#define IDMAP_DIR_SIZE		(IDMAP_PGTABLE_LEVELS * PAGE_SIZE)
+#define IDMAP_DIR_SIZE		INIT_DIR_SIZE
 
 /* Initial memory map size */
 #if ARM64_KERNEL_USES_PMD_MAPS
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 275cd14a70c2..727561972e4a 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -287,7 +287,7 @@ SYM_FUNC_END(clear_page_tables)
 
 SYM_FUNC_START_LOCAL(create_idmap)
 	adrp	x0, idmap_pg_dir
-	adrp	x3, __idmap_text_start		// __pa(__idmap_text_start)
+	adrp	x3, _text			// __pa(_text)
 
 #ifdef CONFIG_ARM64_VA_BITS_52
 	mrs_s	x6, SYS_ID_AA64MMFR2_EL1
@@ -312,10 +312,10 @@ SYM_FUNC_START_LOCAL(create_idmap)
 	 * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
 	 * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
 	 * this number conveniently equals the number of leading zeroes in
-	 * the physical address of __idmap_text_end.
+	 * the physical address of _end.
 	 */
 	mov	x4, PTRS_PER_PGD
-	adrp	x5, __idmap_text_end
+	adrp	x5, _end
 	clz	x5, x5
 	cmp	x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
 	b.ge	1f			// .. then skip VA range extension
@@ -351,7 +351,7 @@ SYM_FUNC_START_LOCAL(create_idmap)
 	mov	x4, #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT)
 #endif
 1:
-	adr_l	x6, __idmap_text_end		// __pa(__idmap_text_end)
+	adr_l	x6, _end			// __pa(_end)
 	mov	x7, SWAPPER_MM_MMUFLAGS
 
 	map_memory x0, x1, x3, x6, x7, x3, x4, x10, x11, x12, x13, x14
@@ -884,7 +884,7 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	msr	sctlr_el1, x20			// disable the MMU
 	isb
 	bl	clear_page_tables
-	bl	create_kernel_mapping		// recreate kernel mapping
+	bl	create_kernel_mapping		// Recreate kernel mapping
 
 	tlbi	vmalle1				// Remove any stale TLB entries
 	dsb	nsh
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 05/18] arm64: head: factor out TTBR1 assignment into a macro
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2022-03-30 15:41 ` [RFC PATCH v2 04/18] arm64: head: cover entire kernel image in ID map Ard Biesheuvel
@ 2022-03-30 15:41 ` Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 06/18] arm64: head: populate kernel page tables with MMU and caches on Ard Biesheuvel
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

Create a macro load_ttbr1 to avoid having to repeat the same instruction
sequence 3 times in a subsequent patch. No functional change intended.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 727561972e4a..7c4aefacf6c2 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -688,6 +688,13 @@ SYM_FUNC_END(__secondary_too_slow)
 	dc	ivac, \tmp1			// Invalidate potentially stale cache line
 	.endm
 
+	.macro		load_ttbr1, reg, tmp
+	phys_to_ttbr	\reg, \reg
+	offset_ttbr1 	\reg, \tmp
+	msr		ttbr1_el1, \reg
+	isb
+	.endm
+
 /*
  * Enable the MMU.
  *
@@ -709,12 +716,9 @@ SYM_FUNC_START(__enable_mmu)
 	b.gt    __no_granule_support
 	update_early_cpu_boot_status 0, x2, x3
 	adrp	x2, idmap_pg_dir
-	phys_to_ttbr x1, x1
 	phys_to_ttbr x2, x2
 	msr	ttbr0_el1, x2			// load TTBR0
-	offset_ttbr1 x1, x3
-	msr	ttbr1_el1, x1			// load TTBR1
-	isb
+	load_ttbr1 x1, x3
 
 	set_sctlr_el1	x0
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 06/18] arm64: head: populate kernel page tables with MMU and caches on
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2022-03-30 15:41 ` [RFC PATCH v2 05/18] arm64: head: factor out TTBR1 assignment into a macro Ard Biesheuvel
@ 2022-03-30 15:41 ` Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 07/18] arm64: kaslr: deal with init called with VA randomization enabled Ard Biesheuvel
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

Now that we can access the entire kernel image via the ID map, we can
execute the page table population code with the MMU and caches enabled.
The only thing we need to ensure is that translations via TTBR1 remain
disabled while we are updating the page tables the second time around,
in case KASLR needs to randomize the kernel's virtual mappings.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S | 48 +++++---------------
 1 file changed, 11 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 7c4aefacf6c2..5d4cb481e42f 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -84,8 +84,6 @@
 	 *  Register   Scope                      Purpose
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
 	 *  x23        primary_entry() .. start_kernel()        physical misalignment/KASLR offset
-	 *  x28        clear_page_tables()                      callee preserved temp register
-	 *  x19/x20    __primary_switch()                       callee preserved temp registers
 	 *  x24        __primary_switch() .. relocate_kernel()  current RELR displacement
 	 */
 SYM_CODE_START(primary_entry)
@@ -94,9 +92,7 @@ SYM_CODE_START(primary_entry)
 	adrp	x23, __PHYS_OFFSET
 	and	x23, x23, MIN_KIMG_ALIGN - 1	// KASLR offset, defaults to 0
 	bl	set_cpu_boot_mode_flag
-	bl	clear_page_tables
 	bl	create_idmap
-	bl	create_kernel_mapping
 
 	/*
 	 * The following calls CPU setup code, see arch/arm64/mm/proc.S for
@@ -126,18 +122,6 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
 SYM_CODE_END(preserve_boot_args)
 
 SYM_FUNC_START_LOCAL(clear_page_tables)
-	mov	x28, lr
-
-	/*
-	 * Invalidate the init page tables to avoid potential dirty cache lines
-	 * being evicted. Other page tables are allocated in rodata as part of
-	 * the kernel image, and thus are clean to the PoC per the boot
-	 * protocol.
-	 */
-	adrp	x0, init_pg_dir
-	adrp	x1, init_pg_end
-	bl	dcache_inval_poc
-
 	/*
 	 * Clear the init page tables.
 	 */
@@ -151,7 +135,7 @@ SYM_FUNC_START_LOCAL(clear_page_tables)
 	subs	x1, x1, #64
 	b.ne	1b
 
-	ret	x28
+	ret
 SYM_FUNC_END(clear_page_tables)
 
 /*
@@ -381,16 +365,7 @@ SYM_FUNC_START_LOCAL(create_kernel_mapping)
 
 	map_memory x0, x1, x5, x6, x7, x3, x4, x10, x11, x12, x13, x14
 
-	/*
-	 * Since the page tables have been populated with non-cacheable
-	 * accesses (MMU disabled), invalidate those tables again to
-	 * remove any speculatively loaded cache lines.
-	 */
-	dmb	sy
-
-	adrp	x0, init_pg_dir
-	adrp	x1, init_pg_end
-	b	dcache_inval_poc		// tail call
+	ret
 SYM_FUNC_END(create_kernel_mapping)
 
 	/*
@@ -862,13 +837,13 @@ SYM_FUNC_END(__relocate_kernel)
 #endif
 
 SYM_FUNC_START_LOCAL(__primary_switch)
-#ifdef CONFIG_RANDOMIZE_BASE
-	mov	x19, x0				// preserve new SCTLR_EL1 value
-	mrs	x20, sctlr_el1			// preserve old SCTLR_EL1 value
-#endif
+	adrp	x1, reserved_pg_dir
+	bl	__enable_mmu
+	bl	clear_page_tables
+	bl	create_kernel_mapping
 
 	adrp	x1, init_pg_dir
-	bl	__enable_mmu
+	load_ttbr1 x1, x2
 #ifdef CONFIG_RELOCATABLE
 #ifdef CONFIG_RELR
 	mov	x24, #0				// no RELR displacement yet
@@ -884,9 +859,8 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	 * to take into account by discarding the current kernel mapping and
 	 * creating a new one.
 	 */
-	pre_disable_mmu_workaround
-	msr	sctlr_el1, x20			// disable the MMU
-	isb
+	adrp	x1, reserved_pg_dir		// Disable translations via TTBR1
+	load_ttbr1 x1, x2
 	bl	clear_page_tables
 	bl	create_kernel_mapping		// Recreate kernel mapping
 
@@ -894,8 +868,8 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	dsb	nsh
 	isb
 
-	set_sctlr_el1	x19			// re-enable the MMU
-
+	adrp	x1, init_pg_dir			// Re-enable translations via TTBR1
+	load_ttbr1 x1, x2
 	bl	__relocate_kernel
 #endif
 #endif
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 07/18] arm64: kaslr: deal with init called with VA randomization enabled
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2022-03-30 15:41 ` [RFC PATCH v2 06/18] arm64: head: populate kernel page tables with MMU and caches on Ard Biesheuvel
@ 2022-03-30 15:41 ` Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 08/18] arm64: setup: defer R/O remapping of FDT Ard Biesheuvel
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

We will be entering kaslr_init() fully randomized, and so any addresses
taken by this code already take the randomization into account.

This means that taking the address of _end or _etext and adding offset
to it produces the wrong value, given that _end and _etext references
will have been fixed up already, and therefore already incorporate
offset.

So instead of referring to these symbols directly, use their offsets
relative to _text, which should produce values that depend on the size
and layout of the Image only. Then, add KIMAGE_VADDR to obtain the
unrandomized values.
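
To make the arithmetic concrete, here is a minimal standalone sketch
(all values are hypothetical, and this is not kernel code) of the
double-counting that occurs if 'offset' is added to a reference that
the relocation pass has already fixed up:

  #include <stdint.h>
  #include <stdio.h>

  #define KIMAGE_VADDR 0xffff800008000000ULL  /* assumed link-time base */

  int main(void)
  {
          uint64_t offset    = 0x0000000012340000ULL; /* KASLR offset   */
          uint64_t text_size = 0x0000000002000000ULL; /* _etext - _text */

          /* after relocation, symbol references already include offset */
          uint64_t text_fixed  = KIMAGE_VADDR + offset;
          uint64_t etext_fixed = text_fixed + text_size;

          uint64_t wrong = etext_fixed + offset;      /* offset applied twice */
          uint64_t right = etext_fixed - text_fixed   /* image-layout term    */
                           + KIMAGE_VADDR + offset;   /* offset applied once  */

          printf("wrong %#llx vs right %#llx\n",
                 (unsigned long long)wrong, (unsigned long long)right);
          return 0;
  }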

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/kaslr.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index d5542666182f..3b12715642ce 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -141,6 +141,8 @@ u64 __init kaslr_early_init(void)
 		return offset % SZ_2G;
 
 	if (IS_ENABLED(CONFIG_RANDOMIZE_MODULE_REGION_FULL)) {
+		u64 end = (u64)_end - (u64)_text + KIMAGE_VADDR;
+
 		/*
 		 * Randomize the module region over a 2 GB window covering the
 		 * kernel. This reduces the risk of modules leaking information
@@ -150,9 +152,11 @@ u64 __init kaslr_early_init(void)
 		 * resolved normally.)
 		 */
 		module_range = SZ_2G - (u64)(_end - _stext);
-		module_alloc_base = max((u64)_end + offset - SZ_2G,
+		module_alloc_base = max(end + offset - SZ_2G,
 					(u64)MODULES_VADDR);
 	} else {
+		u64 end = (u64)_etext - (u64)_text + KIMAGE_VADDR;
+
 		/*
 		 * Randomize the module region by setting module_alloc_base to
 		 * a PAGE_SIZE multiple in the range [_etext - MODULES_VSIZE,
@@ -163,7 +167,7 @@ u64 __init kaslr_early_init(void)
 		 * when ARM64_MODULE_PLTS is enabled.
 		 */
 		module_range = MODULES_VSIZE - (u64)(_etext - _stext);
-		module_alloc_base = (u64)_etext + offset - MODULES_VSIZE;
+		module_alloc_base = end + offset - MODULES_VSIZE;
 	}
 
 	/* use the lower 21 bits to randomize the base of the module region */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 08/18] arm64: setup: defer R/O remapping of FDT
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2022-03-30 15:41 ` [RFC PATCH v2 07/18] arm64: kaslr: deal with init called with VA randomization enabled Ard Biesheuvel
@ 2022-03-30 15:41 ` Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 09/18] arm64: head: relocate kernel only a single time if KASLR is enabled Ard Biesheuvel
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

We will be moving the call to kaslr_init() into setup_arch() in an
upcoming patch, and this needs the FDT to be writable so the KASLR seed
can be wiped from it.

So break out the R/O remapping of the FDT from setup_machine_fdt() and
call it explicitly from setup_arch().

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/setup.c |  6 +++---
 arch/arm64/mm/mmu.c       | 12 +++++++-----
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 3505789cf4bd..ebf69312eabf 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -203,9 +203,6 @@ static void __init setup_machine_fdt(phys_addr_t dt_phys)
 			cpu_relax();
 	}
 
-	/* Early fixups are done, map the FDT as read-only now */
-	fixmap_remap_fdt(dt_phys, &size, PAGE_KERNEL_RO);
-
 	name = of_flat_dt_get_machine_name();
 	if (!name)
 		return;
@@ -316,6 +313,9 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
 
 	setup_machine_fdt(__fdt_pointer);
 
+	/* Early fixups are done, map the FDT as read-only now */
+	fixmap_remap_fdt(__fdt_pointer, NULL, PAGE_KERNEL_RO);
+
 	/*
 	 * Initialise the static keys early as they may be enabled by the
 	 * cpufeature code and early parameters.
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e74a6453cb14..20dd95a750bc 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1324,7 +1324,7 @@ void __set_fixmap(enum fixed_addresses idx,
 void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
 {
 	const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
-	int offset;
+	int offset, dt_size;
 	void *dt_virt;
 
 	/*
@@ -1363,13 +1363,15 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
 	if (fdt_magic(dt_virt) != FDT_MAGIC)
 		return NULL;
 
-	*size = fdt_totalsize(dt_virt);
-	if (*size > MAX_FDT_SIZE)
+	dt_size = fdt_totalsize(dt_virt);
+	if (size)
+		*size = dt_size;
+	if (dt_size > MAX_FDT_SIZE)
 		return NULL;
 
-	if (offset + *size > SWAPPER_BLOCK_SIZE)
+	if (offset + dt_size > SWAPPER_BLOCK_SIZE)
 		create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE), dt_virt_base,
-			       round_up(offset + *size, SWAPPER_BLOCK_SIZE), prot);
+			       round_up(offset + dt_size, SWAPPER_BLOCK_SIZE), prot);
 
 	return dt_virt;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 09/18] arm64: head: relocate kernel only a single time if KASLR is enabled
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2022-03-30 15:41 ` [RFC PATCH v2 08/18] arm64: setup: defer R/O remapping of FDT Ard Biesheuvel
@ 2022-03-30 15:41 ` Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 10/18] arm64: head: record the MMU state at primary entry Ard Biesheuvel
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

We have accumulated a small pile of hacks that allow us to run C code
and parse command line options and DT properties before actually
entering the kernel proper. This was needed because the command line
(taken from the DT) has to tell us whether or not to randomize the
virtual address space before the kernel proper is entered. However,
this code has expanded little by little, and now creates global state
unrelated to the virtual mapping of the kernel before that mapping is
torn down and set up again, and the BSS cleared for a second time. This
has created some issues in the past, and it would be better to avoid
this little dance if possible.

So instead, let's create a temporary mapping of the device tree before
even mapping the kernel, and execute the bare minimum of code to decide
whether or not KASLR should be enabled, and what the seed is. Only then,
create the virtual kernel mapping, clear BSS, etc and proceed as normal.
This avoids the issues around inconsistent global state due to BSS being
cleared twice, and is generally more maintainable, as it permits us to
defer all the remaining DT parsing and KASLR initialization to a later
time.

This means the relocation fixup code runs only a single time as well,
allowing us to simplify the RELR handling code too, which is not
idempotent and was therefore required to keep track of the offset that
was applied the first time around.
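
Since the RELR format stores the addend in the relocated word itself,
applying the same fixup twice adds the displacement twice. A tiny
standalone illustration (values hypothetical, not kernel code):

  #include <stdint.h>
  #include <stdio.h>

  static void apply_relr_fixup(uint64_t *where, uint64_t displacement)
  {
          *where += displacement;         /* addend lives in place */
  }

  int main(void)
  {
          uint64_t ptr  = 0xffff800008123456ULL;  /* link-time pointer value */
          uint64_t disp = 0x0000000040000000ULL;  /* KASLR displacement      */

          apply_relr_fixup(&ptr, disp);   /* first pass: correct             */
          apply_relr_fixup(&ptr, disp);   /* second pass: off by 'disp',
                                             which is why the old code had to
                                             track the offset applied before */

          printf("result %#llx, expected %#llx\n",
                 (unsigned long long)ptr,
                 (unsigned long long)(0xffff800008123456ULL + disp));
          return 0;
  }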

Note that this means we have to clone a pair of FDT library objects, so
that we can control how they are built - we need the stack protector
and other instrumentation disabled so that the code can tolerate being
called this early.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/memory.h    |   2 +
 arch/arm64/kernel/Makefile         |   2 +-
 arch/arm64/kernel/head.S           |  87 +++++++------
 arch/arm64/kernel/kaslr.c          |  71 ++---------
 arch/arm64/kernel/pi/Makefile      |  33 +++++
 arch/arm64/kernel/pi/kaslr_early.c | 128 ++++++++++++++++++++
 arch/arm64/kernel/setup.c          |   2 +
 7 files changed, 216 insertions(+), 109 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0af70d9abede..2f1a48be11cf 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -191,6 +191,8 @@ static inline unsigned long kaslr_offset(void)
 	return kimage_vaddr - KIMAGE_VADDR;
 }
 
+void kaslr_init(void *);
+
 /*
  * Allow all memory at the discovery stage. We will clip it later.
  */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 986837d7ec82..45f7a0e2d35e 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -59,7 +59,7 @@ obj-$(CONFIG_ACPI)			+= acpi.o
 obj-$(CONFIG_ACPI_NUMA)			+= acpi_numa.o
 obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 obj-$(CONFIG_PARAVIRT)			+= paravirt.o
-obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o
+obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o pi/
 obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 obj-$(CONFIG_ELF_CORE)			+= elfcore.o
 obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 5d4cb481e42f..f3b096daf1c5 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -83,14 +83,11 @@
 	 *
 	 *  Register   Scope                      Purpose
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
-	 *  x23        primary_entry() .. start_kernel()        physical misalignment/KASLR offset
-	 *  x24        __primary_switch() .. relocate_kernel()  current RELR displacement
+	 *  x23        __primary_switch() .. relocate_kernel()  physical misalignment/KASLR offset
 	 */
 SYM_CODE_START(primary_entry)
 	bl	preserve_boot_args
 	bl	init_kernel_el			// w0=cpu_boot_mode
-	adrp	x23, __PHYS_OFFSET
-	and	x23, x23, MIN_KIMG_ALIGN - 1	// KASLR offset, defaults to 0
 	bl	set_cpu_boot_mode_flag
 	bl	create_idmap
 
@@ -368,6 +365,27 @@ SYM_FUNC_START_LOCAL(create_kernel_mapping)
 	ret
 SYM_FUNC_END(create_kernel_mapping)
 
+#ifdef CONFIG_RANDOMIZE_BASE
+// Create a temporary mapping of the device tree blob in the kernel page tables
+// so that we can grab the KASLR seed and command line options before mapping
+// the kernel at its randomized address.
+SYM_FUNC_START_LOCAL(create_temp_fdt_mapping)
+	adrp	x2, init_pg_dir
+	mov_q	x5, KIMAGE_VADDR
+	mov	x4, PTRS_PER_PGD
+	bic	x3, x21, #SZ_4M - 1		// map a 4 MiB block
+	add	x6, x5, #SZ_4M
+	mov	x7, SWAPPER_MM_MMUFLAGS
+
+	sub	x0, x21, x3			// return the mapped address in x0
+	add	x0, x0, x5
+
+	map_memory x2, x1, x5, x6, x7, x3, x4, x10, x11, x12, x13, x14
+
+	ret
+SYM_FUNC_END(create_temp_fdt_mapping)
+#endif
+
 	/*
 	 * Initialize CPU registers with task-specific and cpu-specific context.
 	 *
@@ -430,16 +448,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	mov	x0, x21				// pass FDT address in x0
 	bl	early_fdt_map			// Try mapping the FDT early
 	bl	init_feature_override		// Parse cpu feature overrides
-#ifdef CONFIG_RANDOMIZE_BASE
-	tst	x23, ~(MIN_KIMG_ALIGN - 1)	// already running randomized?
-	b.ne	0f
-	bl	kaslr_early_init		// parse FDT for KASLR options
-	cbz	x0, 0f				// KASLR disabled? just proceed
-	orr	x23, x23, x0			// record KASLR offset
-	ldp	x29, x30, [sp], #16		// we must enable KASLR, return
-	ret					// to __primary_switch()
-0:
-#endif
 	bl	switch_to_vhe			// Prefer VHE if possible
 	ldp	x29, x30, [sp], #16
 	bl	start_kernel
@@ -785,29 +793,19 @@ SYM_FUNC_START_LOCAL(__relocate_kernel)
 	 * entry in x9, the address being relocated by the current address or
 	 * bitmap entry in x13 and the address being relocated by the current
 	 * bit in x14.
-	 *
-	 * Because addends are stored in place in the binary, RELR relocations
-	 * cannot be applied idempotently. We use x24 to keep track of the
-	 * currently applied displacement so that we can correctly relocate if
-	 * __relocate_kernel is called twice with non-zero displacements (i.e.
-	 * if there is both a physical misalignment and a KASLR displacement).
 	 */
 	ldr	w9, =__relr_offset		// offset to reloc table
 	ldr	w10, =__relr_size		// size of reloc table
 	add	x9, x9, x11			// __va(.relr)
 	add	x10, x9, x10			// __va(.relr) + sizeof(.relr)
 
-	sub	x15, x23, x24			// delta from previous offset
-	cbz	x15, 7f				// nothing to do if unchanged
-	mov	x24, x23			// save new offset
-
 2:	cmp	x9, x10
 	b.hs	7f
 	ldr	x11, [x9], #8
 	tbnz	x11, #0, 3f			// branch to handle bitmaps
 	add	x13, x11, x23
 	ldr	x12, [x13]			// relocate address entry
-	add	x12, x12, x15
+	add	x12, x12, x23
 	str	x12, [x13], #8			// adjust to start of bitmap
 	b	2b
 
@@ -816,7 +814,7 @@ SYM_FUNC_START_LOCAL(__relocate_kernel)
 	cbz	x11, 6f
 	tbz	x11, #0, 5f			// skip bit if not set
 	ldr	x12, [x14]			// relocate bit
-	add	x12, x12, x15
+	add	x12, x12, x23
 	str	x12, [x14]
 
 5:	add	x14, x14, #8			// move to next bit's address
@@ -840,38 +838,35 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	adrp	x1, reserved_pg_dir
 	bl	__enable_mmu
 	bl	clear_page_tables
-	bl	create_kernel_mapping
 
-	adrp	x1, init_pg_dir
-	load_ttbr1 x1, x2
 #ifdef CONFIG_RELOCATABLE
-#ifdef CONFIG_RELR
-	mov	x24, #0				// no RELR displacement yet
-#endif
-	bl	__relocate_kernel
+	adrp	x23, __PHYS_OFFSET
+	and	x23, x23, MIN_KIMG_ALIGN - 1
 #ifdef CONFIG_RANDOMIZE_BASE
-	ldr	x8, =__primary_switched
-	adrp	x0, __PHYS_OFFSET
-	blr	x8
+	bl	create_temp_fdt_mapping
+	adrp	x1, init_pg_dir
+	load_ttbr1 x1, x2
 
-	/*
-	 * If we return here, we have a KASLR displacement in x23 which we need
-	 * to take into account by discarding the current kernel mapping and
-	 * creating a new one.
-	 */
-	adrp	x1, reserved_pg_dir		// Disable translations via TTBR1
+	adrp	x1, init_stack
+	add	sp, x1, #THREAD_SIZE
+	mov	x29, xzr
+	bl	__pi_kaslr_early_init
+	orr	x23, x23, x0			// record KASLR offset
+
+	adrp	x1, reserved_pg_dir
 	load_ttbr1 x1, x2
-	bl	clear_page_tables
-	bl	create_kernel_mapping		// Recreate kernel mapping
 
 	tlbi	vmalle1				// Remove any stale TLB entries
 	dsb	nsh
 	isb
+#endif
+#endif
+	bl	create_kernel_mapping
 
-	adrp	x1, init_pg_dir			// Re-enable translations via TTBR1
+	adrp	x1, init_pg_dir
 	load_ttbr1 x1, x2
+#ifdef CONFIG_RELOCATABLE
 	bl	__relocate_kernel
-#endif
 #endif
 	ldr	x8, =__primary_switched
 	adrp	x0, __PHYS_OFFSET
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index 3b12715642ce..78339a2aa3f8 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -24,7 +24,6 @@ enum kaslr_status {
 	KASLR_ENABLED,
 	KASLR_DISABLED_CMDLINE,
 	KASLR_DISABLED_NO_SEED,
-	KASLR_DISABLED_FDT_REMAP,
 };
 
 static enum kaslr_status __initdata kaslr_status;
@@ -52,18 +51,9 @@ static __init u64 get_kaslr_seed(void *fdt)
 
 struct arm64_ftr_override kaslr_feature_override __initdata;
 
-/*
- * This routine will be executed with the kernel mapped at its default virtual
- * address, and if it returns successfully, the kernel will be remapped, and
- * start_kernel() will be executed from a randomized virtual offset. The
- * relocation will result in all absolute references (e.g., static variables
- * containing function pointers) to be reinitialized, and zero-initialized
- * .bss variables will be reset to 0.
- */
-u64 __init kaslr_early_init(void)
+void __init kaslr_init(void *fdt)
 {
-	void *fdt;
-	u64 seed, offset, mask, module_range;
+	u64 seed, module_range;
 	unsigned long raw;
 
 	/*
@@ -72,17 +62,6 @@ u64 __init kaslr_early_init(void)
 	 */
 	module_alloc_base = (u64)_etext - MODULES_VSIZE;
 
-	/*
-	 * Try to map the FDT early. If this fails, we simply bail,
-	 * and proceed with KASLR disabled. We will make another
-	 * attempt at mapping the FDT in setup_machine()
-	 */
-	fdt = get_early_fdt_ptr();
-	if (!fdt) {
-		kaslr_status = KASLR_DISABLED_FDT_REMAP;
-		return 0;
-	}
-
 	/*
 	 * Retrieve (and wipe) the seed from the FDT
 	 */
@@ -94,7 +73,7 @@ u64 __init kaslr_early_init(void)
 	 */
 	if (kaslr_feature_override.val & kaslr_feature_override.mask & 0xf) {
 		kaslr_status = KASLR_DISABLED_CMDLINE;
-		return 0;
+		return;
 	}
 
 	/*
@@ -105,41 +84,14 @@ u64 __init kaslr_early_init(void)
 	if (arch_get_random_seed_long_early(&raw))
 		seed ^= raw;
 
-	if (!seed) {
+	if (!seed || !kimage_voffset) {
 		kaslr_status = KASLR_DISABLED_NO_SEED;
-		return 0;
+		return;
 	}
 
-	/*
-	 * OK, so we are proceeding with KASLR enabled. Calculate a suitable
-	 * kernel image offset from the seed. Let's place the kernel in the
-	 * middle half of the VMALLOC area (VA_BITS_MIN - 2), and stay clear of
-	 * the lower and upper quarters to avoid colliding with other
-	 * allocations.
-	 * Even if we could randomize at page granularity for 16k and 64k pages,
-	 * let's always round to 2 MB so we don't interfere with the ability to
-	 * map using contiguous PTEs
-	 */
-	mask = ((1UL << (VA_BITS_MIN - 2)) - 1) & ~(SZ_2M - 1);
-	offset = BIT(VA_BITS_MIN - 3) + (seed & mask);
-
 	/* use the top 16 bits to randomize the linear region */
 	memstart_offset_seed = seed >> 48;
 
-	if (!IS_ENABLED(CONFIG_KASAN_VMALLOC) &&
-	    (IS_ENABLED(CONFIG_KASAN_GENERIC) ||
-	     IS_ENABLED(CONFIG_KASAN_SW_TAGS)))
-		/*
-		 * KASAN without KASAN_VMALLOC does not expect the module region
-		 * to intersect the vmalloc region, since shadow memory is
-		 * allocated for each module at load time, whereas the vmalloc
-		 * region is shadowed by KASAN zero pages. So keep modules
-		 * out of the vmalloc region if KASAN is enabled without
-		 * KASAN_VMALLOC, and put the kernel well within 4 GB of the
-		 * module region.
-		 */
-		return offset % SZ_2G;
-
 	if (IS_ENABLED(CONFIG_RANDOMIZE_MODULE_REGION_FULL)) {
 		u64 end = (u64)_end - (u64)_text + KIMAGE_VADDR;
 
@@ -152,7 +104,7 @@ u64 __init kaslr_early_init(void)
 		 * resolved normally.)
 		 */
 		module_range = SZ_2G - (u64)(_end - _stext);
-		module_alloc_base = max(end + offset - SZ_2G,
+		module_alloc_base = max(end + kimage_voffset - SZ_2G,
 					(u64)MODULES_VADDR);
 	} else {
 		u64 end = (u64)_etext - (u64)_text + KIMAGE_VADDR;
@@ -167,17 +119,15 @@ u64 __init kaslr_early_init(void)
 		 * when ARM64_MODULE_PLTS is enabled.
 		 */
 		module_range = MODULES_VSIZE - (u64)(_etext - _stext);
-		module_alloc_base = end + offset - MODULES_VSIZE;
+		module_alloc_base = end + kimage_voffset - MODULES_VSIZE;
 	}
 
 	/* use the lower 21 bits to randomize the base of the module region */
 	module_alloc_base += (module_range * (seed & ((1 << 21) - 1))) >> 21;
 	module_alloc_base &= PAGE_MASK;
-
-	return offset;
 }
 
-static int __init kaslr_init(void)
+static int __init kaslr_report_status(void)
 {
 	switch (kaslr_status) {
 	case KASLR_ENABLED:
@@ -189,11 +139,8 @@ static int __init kaslr_init(void)
 	case KASLR_DISABLED_NO_SEED:
 		pr_warn("KASLR disabled due to lack of seed\n");
 		break;
-	case KASLR_DISABLED_FDT_REMAP:
-		pr_warn("KASLR disabled due to FDT remapping failure\n");
-		break;
 	}
 
 	return 0;
 }
-core_initcall(kaslr_init)
+core_initcall(kaslr_report_status)
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
new file mode 100644
index 000000000000..3c503312b5dc
--- /dev/null
+++ b/arch/arm64/kernel/pi/Makefile
@@ -0,0 +1,33 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright 2022 Google LLC
+
+KBUILD_CFLAGS	:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) -fpie \
+		   -Os -DDISABLE_BRANCH_PROFILING $(DISABLE_STACKLEAK_PLUGIN) \
+		   $(call cc-option,-mbranch-protection=none) \
+		   -I$(srctree)/scripts/dtc/libfdt -fno-stack-protector \
+		   -include $(srctree)/include/linux/hidden.h \
+		   -D__DISABLE_EXPORTS -ffreestanding \
+		   $(call cc-option,-fno-addrsig)
+
+# remove SCS flags from all objects in this directory
+KBUILD_CFLAGS	:= $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
+# disable LTO
+KBUILD_CFLAGS	:= $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))
+
+GCOV_PROFILE	:= n
+KASAN_SANITIZE	:= n
+KCSAN_SANITIZE	:= n
+UBSAN_SANITIZE	:= n
+KCOV_INSTRUMENT	:= n
+
+$(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \
+			       --remove-section=.note.gnu.property \
+			       --prefix-alloc-sections=.init
+$(obj)/%.pi.o: $(obj)/%.o FORCE
+	$(call if_changed,objcopy)
+
+$(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
+	$(call if_changed_rule,cc_o_c)
+
+obj-y		:= kaslr_early.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
+extra-y		:= $(patsubst %.pi.o,%.o,$(obj-y))
diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
new file mode 100644
index 000000000000..bbba01807ab6
--- /dev/null
+++ b/arch/arm64/kernel/pi/kaslr_early.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2022 Google LLC
+// Author: Ard Biesheuvel <ardb@google.com>
+
+// NOTE: code in this file runs *very* early, and is not permitted to use
+// global variables or anything that relies on absolute addressing.
+
+#include <linux/libfdt.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <linux/types.h>
+#include <linux/sizes.h>
+#include <linux/string.h>
+
+#include <asm/archrandom.h>
+#include <asm/memory.h>
+
+/* taken from lib/string.c */
+static char *__strstr(const char *s1, const char *s2)
+{
+	size_t l1, l2;
+
+	l2 = strlen(s2);
+	if (!l2)
+		return (char *)s1;
+	l1 = strlen(s1);
+	while (l1 >= l2) {
+		l1--;
+		if (!memcmp(s1, s2, l2))
+			return (char *)s1;
+		s1++;
+	}
+	return NULL;
+}
+static bool cmdline_contains_nokaslr(const u8 *cmdline)
+{
+	const u8 *str;
+
+	str = __strstr(cmdline, "nokaslr");
+	return str == cmdline || (str > cmdline && *(str - 1) == ' ');
+}
+
+static bool is_kaslr_disabled_cmdline(void *fdt)
+{
+	if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
+		int node;
+		const u8 *prop;
+
+		node = fdt_path_offset(fdt, "/chosen");
+		if (node < 0)
+			goto out;
+
+		prop = fdt_getprop(fdt, node, "bootargs", NULL);
+		if (!prop)
+			goto out;
+
+		if (cmdline_contains_nokaslr(prop))
+			return true;
+
+		if (IS_ENABLED(CONFIG_CMDLINE_EXTEND))
+			goto out;
+
+		return false;
+	}
+out:
+	return cmdline_contains_nokaslr(CONFIG_CMDLINE);
+}
+
+static u64 get_kaslr_seed(void *fdt)
+{
+	int node, len;
+	const fdt64_t *prop;
+	u64 ret;
+
+	node = fdt_path_offset(fdt, "/chosen");
+	if (node < 0)
+		return 0;
+
+	prop = fdt_getprop(fdt, node, "kaslr-seed", &len);
+	if (!prop || len != sizeof(u64))
+		return 0;
+
+	ret = fdt64_to_cpu(*prop);
+	return ret;
+}
+
+asmlinkage u64 kaslr_early_init(void *fdt)
+{
+	u64 seed, mask, offset;
+
+	if (is_kaslr_disabled_cmdline(fdt))
+		return 0;
+
+	seed = get_kaslr_seed(fdt);
+	if (!seed && (!__early_cpu_has_rndr() ||
+		      !__arm64_rndr((unsigned long *)&seed)))
+		return 0;
+
+	/*
+	 * OK, so we are proceeding with KASLR enabled. Calculate a suitable
+	 * kernel image offset from the seed. Let's place the kernel in the
+	 * middle half of the VMALLOC area (VA_BITS_MIN - 2), and stay clear of
+	 * the lower and upper quarters to avoid colliding with other
+	 * allocations.
+	 * Even if we could randomize at page granularity for 16k and 64k pages,
+	 * let's always round to 2 MB so we don't interfere with the ability to
+	 * map using contiguous PTEs
+	 */
+	mask = ((1UL << (VA_BITS_MIN - 2)) - 1) & ~(SZ_2M - 1);
+	offset = BIT(VA_BITS_MIN - 3) + (seed & mask);
+
+	if (!IS_ENABLED(CONFIG_KASAN_VMALLOC) &&
+	    (IS_ENABLED(CONFIG_KASAN_GENERIC) ||
+	     IS_ENABLED(CONFIG_KASAN_SW_TAGS)))
+		/*
+		 * KASAN without KASAN_VMALLOC does not expect the module region
+		 * to intersect the vmalloc region, since shadow memory is
+		 * allocated for each module at load time, whereas the vmalloc
+		 * region is shadowed by KASAN zero pages. So keep modules
+		 * out of the vmalloc region if KASAN is enabled without
+		 * KASAN_VMALLOC, and put the kernel well within 4 GB of the
+		 * module region.
+		 */
+		return offset % SZ_2G;
+
+	return offset;
+}
+
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index ebf69312eabf..26fb4bce5f8b 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -313,6 +313,8 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
 
 	setup_machine_fdt(__fdt_pointer);
 
+	kaslr_init(initial_boot_params);
+
 	/* Early fixups are done, map the FDT as read-only now */
 	fixmap_remap_fdt(__fdt_pointer, NULL, PAGE_KERNEL_RO);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 10/18] arm64: head: record the MMU state at primary entry
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (8 preceding siblings ...)
  2022-03-30 15:41 ` [RFC PATCH v2 09/18] arm64: head: relocate kernel only a single time if KASLR is enabled Ard Biesheuvel
@ 2022-03-30 15:41 ` Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 11/18] arm64: mm: make vabits_actual a build time constant if possible Ard Biesheuvel
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

Prepare for being able to deal with primary entry with the MMU and
caches enabled, by recording whether or not we entered with the MMU on
in register x22.

While at it, add pre_disable_mmu_workaround macro invocations to
init_kernel_el, as its manipulation of SCTLR_ELx may amount to actually
disabling the MMU once subsequent patches are applied.
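
For clarity, the decision that record_mmu_state makes in assembly can be
expressed in C roughly as follows (a sketch only, not part of the
patch): pick SCTLR_EL2 or SCTLR_EL1 depending on the current exception
level, then reduce it to a single flag saying whether the M bit was set
at entry.

  #include <stdbool.h>
  #include <stdint.h>

  #define SCTLR_ELx_M   (1UL << 0)       /* MMU enable bit        */
  #define CurrentEL_EL2 (2UL << 2)       /* EL field of CurrentEL */

  /* returns the value that the real code leaves in w22 */
  bool entered_with_mmu_on(uint64_t current_el,
                           uint64_t sctlr_el1, uint64_t sctlr_el2)
  {
          uint64_t sctlr = (current_el == CurrentEL_EL2) ? sctlr_el2
                                                         : sctlr_el1;

          return (sctlr & SCTLR_ELx_M) != 0;
  }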

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index f3b096daf1c5..44e2e39046a9 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -83,9 +83,11 @@
 	 *
 	 *  Register   Scope                      Purpose
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
+	 *  x22        primary_entry() .. start_kernel()        whether we entered with the MMU on
 	 *  x23        __primary_switch() .. relocate_kernel()  physical misalignment/KASLR offset
 	 */
 SYM_CODE_START(primary_entry)
+	bl	record_mmu_state
 	bl	preserve_boot_args
 	bl	init_kernel_el			// w0=cpu_boot_mode
 	bl	set_cpu_boot_mode_flag
@@ -101,6 +103,17 @@ SYM_CODE_START(primary_entry)
 	b	__primary_switch
 SYM_CODE_END(primary_entry)
 
+SYM_CODE_START_LOCAL(record_mmu_state)
+	mrs	x22, CurrentEL
+	cmp	x22, #CurrentEL_EL2
+	mrs	x22, sctlr_el1
+	b.ne	0f
+	mrs	x22, sctlr_el2
+0:	tst	x22, #SCTLR_ELx_M
+	cset	w22, ne
+	ret
+SYM_CODE_END(record_mmu_state)
+
 /*
  * Preserve the arguments passed by the bootloader in x0 .. x3
  */
@@ -485,6 +498,7 @@ SYM_FUNC_START(init_kernel_el)
 
 SYM_INNER_LABEL(init_el1, SYM_L_LOCAL)
 	mov_q	x0, INIT_SCTLR_EL1_MMU_OFF
+	pre_disable_mmu_workaround
 	msr	sctlr_el1, x0
 	isb
 	mov_q	x0, INIT_PSTATE_EL1
@@ -516,6 +530,7 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
 
 	/* Switching to VHE requires a sane SCTLR_EL1 as a start */
 	mov_q	x0, INIT_SCTLR_EL1_MMU_OFF
+	pre_disable_mmu_workaround
 	msr_s	SYS_SCTLR_EL12, x0
 
 	/*
@@ -531,6 +546,7 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
 
 1:
 	mov_q	x0, INIT_SCTLR_EL1_MMU_OFF
+	pre_disable_mmu_workaround
 	msr	sctlr_el1, x0
 
 	msr	elr_el2, lr
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 11/18] arm64: mm: make vabits_actual a build time constant if possible
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (9 preceding siblings ...)
  2022-03-30 15:41 ` [RFC PATCH v2 10/18] arm64: head: record the MMU state at primary entry Ard Biesheuvel
@ 2022-03-30 15:41 ` Ard Biesheuvel
  2022-03-30 15:41 ` [RFC PATCH v2 12/18] arm64: head: avoid cache invalidation when entering with the MMU on Ard Biesheuvel
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

Currently, we only support 52-bit virtual addressing on 64k page
configurations, and in all other cases, vabits_actual is guaranteed to
equal VA_BITS (== VA_BITS_MIN). So get rid of the variable entirely in
that case.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/memory.h | 4 ++++
 arch/arm64/kernel/head.S        | 7 +++----
 arch/arm64/mm/mmu.c             | 2 ++
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 2f1a48be11cf..c989f6bf5426 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -174,7 +174,11 @@
 #include <linux/types.h>
 #include <asm/bug.h>
 
+#if VA_BITS > 48
 extern u64			vabits_actual;
+#else
+#define vabits_actual		VA_BITS
+#endif
 
 extern s64			memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 44e2e39046a9..836237289ffb 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -283,19 +283,18 @@ SYM_FUNC_START_LOCAL(create_idmap)
 	adrp	x0, idmap_pg_dir
 	adrp	x3, _text			// __pa(_text)
 
-#ifdef CONFIG_ARM64_VA_BITS_52
+#if (VA_BITS > 48)
 	mrs_s	x6, SYS_ID_AA64MMFR2_EL1
 	and	x6, x6, #(0xf << ID_AA64MMFR2_LVA_SHIFT)
 	mov	x5, #52
 	cbnz	x6, 1f
-#endif
 	mov	x5, #VA_BITS_MIN
 1:
 	adr_l	x6, vabits_actual
 	str	x5, [x6]
 	dmb	sy
 	dc	ivac, x6		// Invalidate potentially stale cache line
-
+#endif
 	/*
 	 * VA_BITS may be too small to allow for an ID mapping to be created
 	 * that covers system RAM if that is located sufficiently high in the
@@ -725,7 +724,7 @@ SYM_FUNC_START(__enable_mmu)
 SYM_FUNC_END(__enable_mmu)
 
 SYM_FUNC_START(__cpu_secondary_check52bitva)
-#ifdef CONFIG_ARM64_VA_BITS_52
+#if (VA_BITS > 48)
 	ldr_l	x0, vabits_actual
 	cmp	x0, #52
 	b.ne	2f
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 20dd95a750bc..8933d4f72427 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -45,8 +45,10 @@
 
 u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
 
+#if (VA_BITS > 48)
 u64 __section(".mmuoff.data.write") vabits_actual;
 EXPORT_SYMBOL(vabits_actual);
+#endif
 
 u64 kimage_voffset __ro_after_init;
 EXPORT_SYMBOL(kimage_voffset);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 12/18] arm64: head: avoid cache invalidation when entering with the MMU on
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (10 preceding siblings ...)
  2022-03-30 15:41 ` [RFC PATCH v2 11/18] arm64: mm: make vabits_actual a build time constant if possible Ard Biesheuvel
@ 2022-03-30 15:41 ` Ard Biesheuvel
  2022-03-30 15:42 ` [RFC PATCH v2 13/18] arm64: head: record CPU boot mode after enabling the MMU Ard Biesheuvel
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:41 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

If we enter with the MMU on, there is no need for explicit cache
invalidation for stores to memory, as they will be coherent with the
caches.

Let's take advantage of this, and create the ID map with the MMU still
enabled if that is how we entered, and avoid any cache invalidation
calls in that case.
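
The resulting pattern, expressed as a hedged C sketch (the function
below is illustrative; the real logic lives in head.S and keys off the
MMU state recorded in x22):

	#include <linux/types.h>
	#include <asm/cacheflush.h>	/* dcache_inval_poc() */

	static void inval_if_needed(unsigned long start, unsigned long end,
				    bool mmu_was_on)
	{
		/* Stores done with the MMU on are coherent with the caches */
		if (!mmu_was_on)
			dcache_inval_poc(start, end);
	}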

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 836237289ffb..db315129f15d 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -89,9 +89,9 @@
 SYM_CODE_START(primary_entry)
 	bl	record_mmu_state
 	bl	preserve_boot_args
+	bl	create_idmap
 	bl	init_kernel_el			// w0=cpu_boot_mode
 	bl	set_cpu_boot_mode_flag
-	bl	create_idmap
 
 	/*
 	 * The following calls CPU setup code, see arch/arm64/mm/proc.S for
@@ -124,11 +124,13 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
 	stp	x21, x1, [x0]			// x0 .. x3 at kernel entry
 	stp	x2, x3, [x0, #16]
 
+	cbnz	x22, 0f				// skip cache invalidation if MMU is on
 	dmb	sy				// needed before dc ivac with
 						// MMU off
 
 	add	x1, x0, #0x20			// 4 x 8 bytes
 	b	dcache_inval_poc		// tail call
+0:	ret
 SYM_CODE_END(preserve_boot_args)
 
 SYM_FUNC_START_LOCAL(clear_page_tables)
@@ -292,8 +294,10 @@ SYM_FUNC_START_LOCAL(create_idmap)
 1:
 	adr_l	x6, vabits_actual
 	str	x5, [x6]
+	cbnz	x22, 2f			// skip cache invalidation if MMU is on
 	dmb	sy
 	dc	ivac, x6		// Invalidate potentially stale cache line
+2:
 #endif
 	/*
 	 * VA_BITS may be too small to allow for an ID mapping to be created
@@ -311,13 +315,14 @@ SYM_FUNC_START_LOCAL(create_idmap)
 	adrp	x5, _end
 	clz	x5, x5
 	cmp	x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?
-	b.ge	1f			// .. then skip VA range extension
+	b.ge	4f			// .. then skip VA range extension
 
 	adr_l	x6, idmap_t0sz
 	str	x5, [x6]
+	cbnz	x22, 3f			// skip cache invalidation if MMU is on
 	dmb	sy
 	dc	ivac, x6		// Invalidate potentially stale cache line
-
+3:
 #if (VA_BITS < 48)
 #define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
 #define EXTRA_PTRS	(1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))
@@ -343,7 +348,7 @@ SYM_FUNC_START_LOCAL(create_idmap)
 	 */
 	mov	x4, #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT)
 #endif
-1:
+4:
 	adr_l	x6, _end			// __pa(_end)
 	mov	x7, SWAPPER_MM_MMUFLAGS
 
@@ -354,11 +359,13 @@ SYM_FUNC_START_LOCAL(create_idmap)
 	 * accesses (MMU disabled), invalidate those tables again to
 	 * remove any speculatively loaded cache lines.
 	 */
+	cbnz	x22, 5f			// skip cache invalidation if MMU is on
 	dmb	sy
 
 	adrp	x0, idmap_pg_dir
 	adrp	x1, idmap_pg_end
 	b	dcache_inval_poc		// tail call
+5:	ret
 SYM_FUNC_END(create_idmap)
 
 SYM_FUNC_START_LOCAL(create_kernel_mapping)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 13/18] arm64: head: record CPU boot mode after enabling the MMU
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (11 preceding siblings ...)
  2022-03-30 15:41 ` [RFC PATCH v2 12/18] arm64: head: avoid cache invalidation when entering with the MMU on Ard Biesheuvel
@ 2022-03-30 15:42 ` Ard Biesheuvel
  2022-03-30 15:42 ` [RFC PATCH v2 14/18] arm64: head: clean the ID map page to the PoC Ard Biesheuvel
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:42 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

In order to avoid having to touch memory with the MMU and caches
disabled, and therefore having to invalidate it from the caches
explicitly, just defer storing the boot mode until after the MMU has
been turned on.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index db315129f15d..ec57a29f3f43 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -81,7 +81,8 @@
 	 * The following callee saved general purpose registers are used on the
 	 * primary lowlevel boot path:
 	 *
-	 *  Register   Scope                      Purpose
+	 *  Register   Scope                                    Purpose
+	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
 	 *  x22        primary_entry() .. start_kernel()        whether we entered with the MMU on
 	 *  x23        __primary_switch() .. relocate_kernel()  physical misalignment/KASLR offset
@@ -91,7 +92,7 @@ SYM_CODE_START(primary_entry)
 	bl	preserve_boot_args
 	bl	create_idmap
 	bl	init_kernel_el			// w0=cpu_boot_mode
-	bl	set_cpu_boot_mode_flag
+	mov	x20, x0
 
 	/*
 	 * The following calls CPU setup code, see arch/arm64/mm/proc.S for
@@ -576,8 +577,6 @@ SYM_FUNC_START_LOCAL(set_cpu_boot_mode_flag)
 	b.ne	1f
 	add	x1, x1, #4
 1:	str	w0, [x1]			// Save CPU boot mode
-	dmb	sy
-	dc	ivac, x1			// Invalidate potentially stale cache line
 	ret
 SYM_FUNC_END(set_cpu_boot_mode_flag)
 
@@ -615,7 +614,7 @@ SYM_DATA_END(__early_cpu_boot_status)
 	 */
 SYM_FUNC_START(secondary_holding_pen)
 	bl	init_kernel_el			// w0=cpu_boot_mode
-	bl	set_cpu_boot_mode_flag
+	mov	x20, x0
 	mrs	x0, mpidr_el1
 	mov_q	x1, MPIDR_HWID_BITMASK
 	and	x0, x0, x1
@@ -633,7 +632,7 @@ SYM_FUNC_END(secondary_holding_pen)
 	 */
 SYM_FUNC_START(secondary_entry)
 	bl	init_kernel_el			// w0=cpu_boot_mode
-	bl	set_cpu_boot_mode_flag
+	mov	x20, x0
 	b	secondary_startup
 SYM_FUNC_END(secondary_entry)
 
@@ -646,6 +645,8 @@ SYM_FUNC_START_LOCAL(secondary_startup)
 	bl	__cpu_setup			// initialise processor
 	adrp	x1, swapper_pg_dir
 	bl	__enable_mmu
+	mov	x0, x20
+	bl	set_cpu_boot_mode_flag
 	ldr	x8, =__secondary_switched
 	br	x8
 SYM_FUNC_END(secondary_startup)
@@ -861,6 +862,9 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	bl	__enable_mmu
 	bl	clear_page_tables
 
+	mov	x0, x20
+	bl	set_cpu_boot_mode_flag
+
 #ifdef CONFIG_RELOCATABLE
 	adrp	x23, __PHYS_OFFSET
 	and	x23, x23, MIN_KIMG_ALIGN - 1
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 14/18] arm64: head: clean the ID map page to the PoC
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (12 preceding siblings ...)
  2022-03-30 15:42 ` [RFC PATCH v2 13/18] arm64: head: record CPU boot mode after enabling the MMU Ard Biesheuvel
@ 2022-03-30 15:42 ` Ard Biesheuvel
  2022-03-30 15:42 ` [RFC PATCH v2 15/18] arm64: lds: move idmap_pg_dir out of .rodata Ard Biesheuvel
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:42 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

If we enter with the MMU and caches enabled, the caller may not have
performed any cache maintenance. So clean the ID mapped page to the PoC,
and invalidate the I-cache so we can safely execute from it after
disabling the MMU and caches.

Note that this means primary_entry() itself needs to be moved into the
ID map as well, as we will return from init_kernel_el() with the MMU and
caches off.
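
Roughly, in C, the intent of the added maintenance is as follows
(illustrative only; the real code is the assembly added to
primary_entry in the hunk below, and it only runs when x22 indicates we
entered with the MMU on):

	#include <asm/cacheflush.h>	/* dcache_clean_poc(), icache_inval_all_pou() */
	#include <asm/sections.h>	/* __idmap_text_start, __idmap_text_end */

	static void clean_idmap_text_for_mmu_off(void)
	{
		/* Push the ID mapped boot code out to the PoC ... */
		dcache_clean_poc((unsigned long)__idmap_text_start,
				 (unsigned long)__idmap_text_end);
		/* ... and discard any stale cached instructions */
		icache_inval_all_pou();
	}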

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index ec57a29f3f43..2f1dcc0c7594 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -75,7 +75,7 @@
 
 	__EFI_PE_HEADER
 
-	__INIT
+	.section ".idmap.text","awx"
 
 	/*
 	 * The following callee saved general purpose registers are used on the
@@ -91,6 +91,19 @@ SYM_CODE_START(primary_entry)
 	bl	record_mmu_state
 	bl	preserve_boot_args
 	bl	create_idmap
+
+	/*
+	 * If we entered with the MMU and caches on, clean the ID mapped part
+	 * of the primary boot code to the PoC and invalidate it from the
+	 * I-cache so we can safely turn them off.
+	 */
+	cbz	x22, 0f
+	adrp	x0, __idmap_text_start
+	adr_l	x1, __idmap_text_end
+	sub	x1, x1, x0
+	bl	dcache_clean_poc
+	ic	ialluis
+0:
 	bl	init_kernel_el			// w0=cpu_boot_mode
 	mov	x20, x0
 
@@ -104,6 +117,7 @@ SYM_CODE_START(primary_entry)
 	b	__primary_switch
 SYM_CODE_END(primary_entry)
 
+	__INIT
 SYM_CODE_START_LOCAL(record_mmu_state)
 	mrs	x22, CurrentEL
 	cmp	x22, #CurrentEL_EL2
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 15/18] arm64: lds: move idmap_pg_dir out of .rodata
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (13 preceding siblings ...)
  2022-03-30 15:42 ` [RFC PATCH v2 14/18] arm64: head: clean the ID map page to the PoC Ard Biesheuvel
@ 2022-03-30 15:42 ` Ard Biesheuvel
  2022-03-30 15:42 ` [RFC PATCH v2 16/18] efi: libstub: pass image handle to handle_kernel_image() Ard Biesheuvel
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:42 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

After future changes, the ID map may be set up by the boot entry code
with the MMU and caches enabled, in which case we will be reusing the
identity map set up by the firmware. That mapping may honour the
read-only attributes we declare in the PE/COFF header, which would
prevent us from creating the new identity map if its root level table
(idmap_pg_dir) resides in such a read-only region. So move idmap_pg_dir
out of .rodata and into the writable part of the image.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/vmlinux.lds.S | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index edaf0faf766f..2231ccba45f7 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -194,10 +194,6 @@ SECTIONS
 
 	HYPERVISOR_DATA_SECTIONS
 
-	idmap_pg_dir = .;
-	. += IDMAP_DIR_SIZE;
-	idmap_pg_end = .;
-
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 	tramp_pg_dir = .;
 	. += PAGE_SIZE;
@@ -291,6 +287,11 @@ SECTIONS
 		__mmuoff_data_end = .;
 	}
 
+	. = ALIGN(PAGE_SIZE);
+	idmap_pg_dir = .;
+	. += IDMAP_DIR_SIZE;
+	idmap_pg_end = .;
+
 	PECOFF_EDATA_PADDING
 	__pecoff_data_rawsize = ABSOLUTE(. - __initdata_begin);
 	_edata = .;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 16/18] efi: libstub: pass image handle to handle_kernel_image()
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (14 preceding siblings ...)
  2022-03-30 15:42 ` [RFC PATCH v2 15/18] arm64: lds: move idmap_pg_dir out of .rodata Ard Biesheuvel
@ 2022-03-30 15:42 ` Ard Biesheuvel
  2022-03-30 15:42 ` [RFC PATCH v2 17/18] efi/arm64: libstub: run image in place if randomized by the loader Ard Biesheuvel
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:42 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

In a future patch, arm64's implementation of handle_kernel_image() will
omit randomizing the placement of the kernel if the load address was
chosen randomly by the loader. In order to do this, it needs to locate a
protocol on the image handle, so pass it to handle_kernel_image().

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/firmware/efi/libstub/arm32-stub.c | 3 ++-
 drivers/firmware/efi/libstub/arm64-stub.c | 3 ++-
 drivers/firmware/efi/libstub/efi-stub.c   | 2 +-
 drivers/firmware/efi/libstub/efistub.h    | 3 ++-
 drivers/firmware/efi/libstub/riscv-stub.c | 3 ++-
 5 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/firmware/efi/libstub/arm32-stub.c b/drivers/firmware/efi/libstub/arm32-stub.c
index 4b5b2403b3a0..0131e3aaa605 100644
--- a/drivers/firmware/efi/libstub/arm32-stub.c
+++ b/drivers/firmware/efi/libstub/arm32-stub.c
@@ -117,7 +117,8 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
 				 unsigned long *image_size,
 				 unsigned long *reserve_addr,
 				 unsigned long *reserve_size,
-				 efi_loaded_image_t *image)
+				 efi_loaded_image_t *image,
+				 efi_handle_t image_handle)
 {
 	const int slack = TEXT_OFFSET - 5 * PAGE_SIZE;
 	int alloc_size = MAX_UNCOMP_KERNEL_SIZE + EFI_PHYS_ALIGN;
diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
index 9cc556013d08..00c91a3807ea 100644
--- a/drivers/firmware/efi/libstub/arm64-stub.c
+++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -83,7 +83,8 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
 				 unsigned long *image_size,
 				 unsigned long *reserve_addr,
 				 unsigned long *reserve_size,
-				 efi_loaded_image_t *image)
+				 efi_loaded_image_t *image,
+				 efi_handle_t image_handle)
 {
 	efi_status_t status;
 	unsigned long kernel_size, kernel_memsize = 0;
diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
index da93864d7abc..f515394cce6e 100644
--- a/drivers/firmware/efi/libstub/efi-stub.c
+++ b/drivers/firmware/efi/libstub/efi-stub.c
@@ -198,7 +198,7 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
 	status = handle_kernel_image(&image_addr, &image_size,
 				     &reserve_addr,
 				     &reserve_size,
-				     image);
+				     image, handle);
 	if (status != EFI_SUCCESS) {
 		efi_err("Failed to relocate kernel\n");
 		goto fail_free_screeninfo;
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index edb77b0621ea..c4f4f078087d 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -865,7 +865,8 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
 				 unsigned long *image_size,
 				 unsigned long *reserve_addr,
 				 unsigned long *reserve_size,
-				 efi_loaded_image_t *image);
+				 efi_loaded_image_t *image,
+				 efi_handle_t image_handle);
 
 asmlinkage void __noreturn efi_enter_kernel(unsigned long entrypoint,
 					    unsigned long fdt_addr,
diff --git a/drivers/firmware/efi/libstub/riscv-stub.c b/drivers/firmware/efi/libstub/riscv-stub.c
index 9c460843442f..eec043873354 100644
--- a/drivers/firmware/efi/libstub/riscv-stub.c
+++ b/drivers/firmware/efi/libstub/riscv-stub.c
@@ -80,7 +80,8 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
 				 unsigned long *image_size,
 				 unsigned long *reserve_addr,
 				 unsigned long *reserve_size,
-				 efi_loaded_image_t *image)
+				 efi_loaded_image_t *image,
+				 efi_handle_t image_handle)
 {
 	unsigned long kernel_size = 0;
 	unsigned long preferred_addr;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 17/18] efi/arm64: libstub: run image in place if randomized by the loader
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (15 preceding siblings ...)
  2022-03-30 15:42 ` [RFC PATCH v2 16/18] efi: libstub: pass image handle to handle_kernel_image() Ard Biesheuvel
@ 2022-03-30 15:42 ` Ard Biesheuvel
  2022-03-30 15:42 ` [RFC PATCH v2 18/18] arm64: efi/libstub: enter with the MMU on if executing in place Ard Biesheuvel
  2022-03-31 15:37 ` [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Mark Rutland
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:42 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

If the loader has already placed the EFI kernel image randomly in
physical memory, and indicates having done so by installing the 'fixed
placement' protocol onto the image handle, don't bother randomizing the
placement again in the EFI stub.
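
For reference, a hedged sketch of the loader side of this contract. It
is written using the kernel's efistub conventions for familiarity; an
actual loader would use its own bindings to the same
InstallMultipleProtocolInterfaces boot service, and the function name
below is made up:

	static efi_status_t advertise_fixed_placement(efi_handle_t image_handle)
	{
		efi_guid_t li_fixed_proto = LINUX_EFI_LOADED_IMAGE_FIXED_GUID;

		/* Install the GUID as a NULL protocol on the image handle */
		return efi_bs_call(install_multiple_protocol_interfaces,
				   &image_handle, &li_fixed_proto, NULL,
				   NULL);
	}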

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/firmware/efi/libstub/arm64-stub.c | 12 +++++++++---
 include/linux/efi.h                       | 11 +++++++++++
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
index 00c91a3807ea..577173ee1f83 100644
--- a/drivers/firmware/efi/libstub/arm64-stub.c
+++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -101,7 +101,15 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
 	u64 min_kimg_align = efi_nokaslr ? MIN_KIMG_ALIGN : EFI_KIMG_ALIGN;
 
 	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
-		if (!efi_nokaslr) {
+		efi_guid_t li_fixed_proto = LINUX_EFI_LOADED_IMAGE_FIXED_GUID;
+		void *p;
+
+		if (efi_nokaslr) {
+			efi_info("KASLR disabled on kernel command line\n");
+		} else if (efi_bs_call(handle_protocol, image_handle,
+				       &li_fixed_proto, &p) == EFI_SUCCESS) {
+			efi_info("Image placement fixed by loader\n");
+		} else {
 			status = efi_get_random_bytes(sizeof(phys_seed),
 						      (u8 *)&phys_seed);
 			if (status == EFI_NOT_FOUND) {
@@ -112,8 +120,6 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
 					status);
 				efi_nokaslr = true;
 			}
-		} else {
-			efi_info("KASLR disabled on kernel command line\n");
 		}
 	}
 
diff --git a/include/linux/efi.h b/include/linux/efi.h
index ccd4d3f91c98..d7567006e151 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -406,6 +406,17 @@ void efi_native_runtime_setup(void);
 #define LINUX_EFI_INITRD_MEDIA_GUID		EFI_GUID(0x5568e427, 0x68fc, 0x4f3d,  0xac, 0x74, 0xca, 0x55, 0x52, 0x31, 0xcc, 0x68)
 #define LINUX_EFI_MOK_VARIABLE_TABLE_GUID	EFI_GUID(0xc451ed2b, 0x9694, 0x45d3,  0xba, 0xba, 0xed, 0x9f, 0x89, 0x88, 0xa3, 0x89)
 
+/*
+ * This GUID may be installed onto the kernel image's handle as a NULL protocol
+ * to signal to the stub that the placement of the image should be respected,
+ * and moving the image in physical memory is undesirable. To ensure
+ * compatibility with 64k pages kernels with virtually mapped stacks, and to
+ * avoid defeating physical randomization, this protocol should only be
+ * installed if the image was placed at a randomized 128k aligned address in
+ * memory.
+ */
+#define LINUX_EFI_LOADED_IMAGE_FIXED_GUID	EFI_GUID(0xf5a37b6d, 0x3344, 0x42a5,  0xb6, 0xbb, 0x97, 0x86, 0x48, 0xc1, 0x89, 0x0a)
+
 /* OEM GUIDs */
 #define DELLEMC_EFI_RCI2_TABLE_GUID		EFI_GUID(0x2d9f28a2, 0xa886, 0x456a,  0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
 #define AMD_SEV_MEM_ENCRYPT_GUID		EFI_GUID(0x0cf29b71, 0x9e51, 0x433a,  0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 18/18] arm64: efi/libstub: enter with the MMU on if executing in place
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (16 preceding siblings ...)
  2022-03-30 15:42 ` [RFC PATCH v2 17/18] efi/arm64: libstub: run image in place if randomized by the loader Ard Biesheuvel
@ 2022-03-30 15:42 ` Ard Biesheuvel
  2022-03-31 15:37 ` [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Mark Rutland
  18 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-30 15:42 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-arm-kernel, keescook, mark.rutland, catalin.marinas,
	Ard Biesheuvel

If the kernel image has not been moved from the place where it was
loaded by the firmware, just call the kernel entrypoint directly, and
keep the MMU and caches enabled. This removes the need for any cache
invalidation in the entry path.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/efi-entry.S | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S
index 61a87fa1c305..0da0b373cf32 100644
--- a/arch/arm64/kernel/efi-entry.S
+++ b/arch/arm64/kernel/efi-entry.S
@@ -23,6 +23,10 @@ SYM_CODE_START(efi_enter_kernel)
 	add	x19, x0, x2		// relocated Image entrypoint
 	mov	x20, x1			// DTB address
 
+	adrp	x3, _text		// just call the entrypoint
+	cmp	x0, x3			// directly if the image was
+	b.eq	2f			// not moved around in memory
+
 	/*
 	 * Clean the copied Image to the PoC, and ensure it is not shadowed by
 	 * stale icache entries from before relocation.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot
  2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
                   ` (17 preceding siblings ...)
  2022-03-30 15:42 ` [RFC PATCH v2 18/18] arm64: efi/libstub: enter with the MMU on if executing in place Ard Biesheuvel
@ 2022-03-31 15:37 ` Mark Rutland
  2022-03-31 16:20   ` Ard Biesheuvel
  18 siblings, 1 reply; 21+ messages in thread
From: Mark Rutland @ 2022-03-31 15:37 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-arm-kernel, keescook, catalin.marinas,
	Marc Zyngier, Will Deacon

Hi Ard,

On Wed, Mar 30, 2022 at 05:41:47PM +0200, Ard Biesheuvel wrote:
> This is a followup to a previous series of mine [0], and it aims to
> streamline the boot flow with respect to cache maintenance and redundant
> copying of data in memory.
> 
> Combined with my proof-of-concept firmware for QEMU/arm64 [1], this
> results in a boot where both the kernel and the initrd are loaded
> straight to their final locations in memory, while the physical
> placement of the kernel image is still randomized by the loader. It also
> removes all memory accesses performed with the MMU and caches off
> (except for instruction fetches) that are done from the moment the VM
> comes out of reset.
> 
> On the kernel side, this comes down to:
> - increasing the ID map to cover the entire kernel image, so we can
>   build the kernel page tables with the MMU and caches enabled;
> - deal with the MMU already being on at boot, and keep it on while
>   building the ID map;
> - ensure all stores to memory that are now done with the MMU and caches
>   on are not negated by the subsequent cache invalidation.

This is on my queue to review in detail, but for now I have a couple of
high-level thoughts:

1) I like the idea of deferring/staging some work until after the MMU is on,
   and I'm in favour of doing so where we can do so in all cases. If we end up
   with infrastructure to run some MMU-on TTBR0 stub environment(s), that could
   be useful elsewhere, e.g. idmap_kpti_install_ng_mappings().

2) I do not think that we should support entering the kernel with the MMU on.

   I think that consistently using the same MMU-off boot code has saved us a
   great deal of pain thus far, and the more I think about booting with the MMU
   on, I think it opens us up to a lot of potential pain, both in the short term
   and longer term as the architecture evolves. For example, as rhetoricals
   from the top of my head:

  * How do we safely inherit whatever VMSA state the loader has left us with?
    e.g. what do we require w.r.t. TCRs, MAIRS?
    e.g. what to do when the loader uses a different granule size from the
         kernel?

  * What can we expect is mapped, and with which specific attributes and
    permissions?

  * What do we document here for loaders other than the EFI stub?
    ... and what about kexec?

  ... and generally this is another complication for maintenance and testing
  that I'd rather not open the door to.

In other words, my view is that we should *minimize* what we do with the MMU
off, but only where we can do that consistently, and we should still
consistently enter with the MMU off such that we can consistently and safely
initialize the VMSA state.

Thanks,
Mark.

> Additionally, this series removes the little dance we do to create a
> kernel mapping, relocate the kernel, run the KASLR init code, tear down
> the old mapping and creat a new one, relocate the kernel again, and
> finally enter the kernel proper. Instead, it invokes a minimal C
> function 'kaslr_early_init()' while running from the ID map with a
> temporary mapping of the FDT in TTBR1. This change represents a
> substantial of the diffstat, as it requires some work to instantiate
> code that can run safely from the wrong address. It is also the most
> likely to raise objections, so it can be dropped from this series if
> desired (patch #9 is the meat, and #8 is a prerequisite patch that could
> be dropped in that case as well)
> 
> Changes since v1:
> - Remove the dodgy handling of the KASLR seed, which was necessary to
>   avoid doing two iterations of the setup/teardown of the page tables.
>   This is now dealt with by creating the TTBR1 page tables while
>   executing from TTBR0, and so all memory manipulations are still done
>   with the MMU and caches on. (This is also the reason patch #9 is
>   optional now)
> - Only boot from EFI with the MMU and caches on if the image was not
>   moved around in memory. Otherwise, we cannot rely on the firmware's ID
>   map to have created an executable mapping for the copied code.
> 
> [0] https://lore.kernel.org/all/20220304175657.2744400-1-ardb@kernel.org/
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/efilite.git/
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Will Deacon <will@kernel.org>
> 
> Ard Biesheuvel (18):
>   arm64: head: drop idmap_ptrs_per_pgd
>   arm64: head: split off idmap creation code
>   arm64: kernel: drop unnecessary PoC cache clean+invalidate
>   arm64: head: cover entire kernel image in ID map
>   arm64: head: factor out TTBR1 assignment into a macro
>   arm64: head: populate kernel page tables with MMU and caches on
>   arm64: kaslr: deal with init called with VA randomization enabled
>   arm64: setup: defer R/O remapping of FDT
>   arm64: head: relocate kernel only a single time if KASLR is enabled
>   arm64: head: record the MMU state at primary entry
>   arm64: mm: make vabits_actual a build time constant if possible
>   arm64: head: avoid cache invalidation when entering with the MMU on
>   arm64: head: record CPU boot mode after enabling the MMU
>   arm64: head: clean the ID map page to the PoC
>   arm64: lds: move idmap_pg_dir out of .rodata
>   efi: libstub: pass image handle to handle_kernel_image()
>   efi/arm64: libstub: run image in place if randomized by the loader
>   arm64: efi/libstub: enter with the MMU on if executing in place
> 
>  arch/arm64/include/asm/kernel-pgtable.h   |   2 +-
>  arch/arm64/include/asm/memory.h           |   6 +
>  arch/arm64/include/asm/mmu_context.h      |   1 -
>  arch/arm64/kernel/Makefile                |   2 +-
>  arch/arm64/kernel/efi-entry.S             |   4 +
>  arch/arm64/kernel/head.S                  | 276 +++++++++++---------
>  arch/arm64/kernel/kaslr.c                 |  86 +-----
>  arch/arm64/kernel/pi/Makefile             |  33 +++
>  arch/arm64/kernel/pi/kaslr_early.c        | 128 +++++++++
>  arch/arm64/kernel/setup.c                 |   8 +-
>  arch/arm64/kernel/vmlinux.lds.S           |   9 +-
>  arch/arm64/mm/mmu.c                       |  15 +-
>  drivers/firmware/efi/libstub/arm32-stub.c |   3 +-
>  drivers/firmware/efi/libstub/arm64-stub.c |  15 +-
>  drivers/firmware/efi/libstub/efi-stub.c   |   2 +-
>  drivers/firmware/efi/libstub/efistub.h    |   3 +-
>  drivers/firmware/efi/libstub/riscv-stub.c |   3 +-
>  include/linux/efi.h                       |  11 +
>  18 files changed, 380 insertions(+), 227 deletions(-)
>  create mode 100644 arch/arm64/kernel/pi/Makefile
>  create mode 100644 arch/arm64/kernel/pi/kaslr_early.c
> 
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot
  2022-03-31 15:37 ` [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Mark Rutland
@ 2022-03-31 16:20   ` Ard Biesheuvel
  0 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2022-03-31 16:20 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-efi, Linux ARM, Kees Cook, Catalin Marinas, Marc Zyngier,
	Will Deacon

On Thu, 31 Mar 2022 at 17:37, Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi Ard,
>

Hi Mark,

Thanks for taking a look.

> On Wed, Mar 30, 2022 at 05:41:47PM +0200, Ard Biesheuvel wrote:
> > This is a followup to a previous series of mine [0], and it aims to
> > streamline the boot flow with respect to cache maintenance and redundant
> > copying of data in memory.
> >
> > Combined with my proof-of-concept firmware for QEMU/arm64 [1], this
> > results in a boot where both the kernel and the initrd are loaded
> > straight to their final locations in memory, while the physical
> > placement of the kernel image is still randomized by the loader. It also
> > removes all memory accesses performed with the MMU and caches off
> > (except for instruction fetches) that are done from the moment the VM
> > comes out of reset.
> >
> > On the kernel side, this comes down to:
> > - increasing the ID map to cover the entire kernel image, so we can
> >   build the kernel page tables with the MMU and caches enabled;
> > - deal with the MMU already being on at boot, and keep it on while
> >   building the ID map;
> > - ensure all stores to memory that are now done with the MMU and caches
> >   on are not negated by the subsequent cache invalidation.
>
> This is on my queue to review in detail, but for now I have a couple of
> high-level thoughts:
>
> 1) I like the idea of deferring/staging some work until after the MMU is on,
>    and I'm in favour of doing so where we can do so in all cases. If we end up
>    with infrastructure to run some MMU-on TTBR0 stub environment(s), that could
>    be useful elsewhere, e.g. idmap_kpti_install_ng_mappings().
>

Yeah, good point. I was aware that there might be other code that we
would prefer to run in the same way.

> 2) I do not think that we should support entering the kernel with the MMU on.
>
>    I think that consistently using the same MMU-off boot code has saved us a
>    great deal of pain thus far, and the more I think about booting with the MMU
>    on, I think it opens us up to a lot of potential pain, both in the short term
>    and longer term as the architecture evolves. For example, as rhetoricals
>    from the top of my head:
>
>   * How do we safely inherit whatever VMSA state the loader has left us with?
>     e.g. what do we require w.r.t. TCRs, MAIRS?
>     e.g. what to do when the loader uses a different granule size from the
>          kernel?
>
>   * What can we expect is mapped, and with which specific attributes and
>     permissions?
>
>   * What do we document here for loaders other than the EFI stub?
>     ... and what about kexec?
>

The only requirement is that the entire image is mapped writeback
cacheable, with the code region executable and the data region
writable. Beyond that, it doesn't really matter, not even whether we
boot at EL2 or EL1. The 1:1 mapping we inherit from the previous boot
stage is only used to create the ID map (and to set some global
variables); we never run with our own page tables under the old
TCR/MAIR regime, or vice versa.

Whether or not we should relax the documented boot protocol as well is
a separate question. I wouldn't be opposed to doing that, if we
document the requirements, but it is not something I'm pursuing with
this series.

>   ... and generally this is another complication for maintenance and testing
>   that I'd rather not open the door to.
>
> In other words, my view is that we should *minimize* what we do with the MMU
> off, but only where we can do that consistently, and we should still
> consistently enter with the MMU off such that we can consistently and safely
> initialize the VMSA state.
>

I see your point. I personally think this is manageable, but I'll let
the maintainers be the judge of that.

Thanks,
Ard.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-03-31 16:20 UTC | newest]

Thread overview: 21+ messages
2022-03-30 15:41 [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Ard Biesheuvel
2022-03-30 15:41 ` [RFC PATCH v2 01/18] arm64: head: drop idmap_ptrs_per_pgd Ard Biesheuvel
2022-03-30 15:41 ` [RFC PATCH v2 02/18] arm64: head: split off idmap creation code Ard Biesheuvel
2022-03-30 15:41 ` [RFC PATCH v2 03/18] arm64: kernel: drop unnecessary PoC cache clean+invalidate Ard Biesheuvel
2022-03-30 15:41 ` [RFC PATCH v2 04/18] arm64: head: cover entire kernel image in ID map Ard Biesheuvel
2022-03-30 15:41 ` [RFC PATCH v2 05/18] arm64: head: factor out TTBR1 assignment into a macro Ard Biesheuvel
2022-03-30 15:41 ` [RFC PATCH v2 06/18] arm64: head: populate kernel page tables with MMU and caches on Ard Biesheuvel
2022-03-30 15:41 ` [RFC PATCH v2 07/18] arm64: kaslr: deal with init called with VA randomization enabled Ard Biesheuvel
2022-03-30 15:41 ` [RFC PATCH v2 08/18] arm64: setup: defer R/O remapping of FDT Ard Biesheuvel
2022-03-30 15:41 ` [RFC PATCH v2 09/18] arm64: head: relocate kernel only a single time if KASLR is enabled Ard Biesheuvel
2022-03-30 15:41 ` [RFC PATCH v2 10/18] arm64: head: record the MMU state at primary entry Ard Biesheuvel
2022-03-30 15:41 ` [RFC PATCH v2 11/18] arm64: mm: make vabits_actual a build time constant if possible Ard Biesheuvel
2022-03-30 15:41 ` [RFC PATCH v2 12/18] arm64: head: avoid cache invalidation when entering with the MMU on Ard Biesheuvel
2022-03-30 15:42 ` [RFC PATCH v2 13/18] arm64: head: record CPU boot mode after enabling the MMU Ard Biesheuvel
2022-03-30 15:42 ` [RFC PATCH v2 14/18] arm64: head: clean the ID map page to the PoC Ard Biesheuvel
2022-03-30 15:42 ` [RFC PATCH v2 15/18] arm64: lds: move idmap_pg_dir out of .rodata Ard Biesheuvel
2022-03-30 15:42 ` [RFC PATCH v2 16/18] efi: libstub: pass image handle to handle_kernel_image() Ard Biesheuvel
2022-03-30 15:42 ` [RFC PATCH v2 17/18] efi/arm64: libstub: run image in place if randomized by the loader Ard Biesheuvel
2022-03-30 15:42 ` [RFC PATCH v2 18/18] arm64: efi/libstub: enter with the MMU on if executing in place Ard Biesheuvel
2022-03-31 15:37 ` [RFC PATCH v2 00/18] arm64: efi: leave MMU and caches on at boot Mark Rutland
2022-03-31 16:20   ` Ard Biesheuvel
