* [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1
@ 2024-02-14 12:28 Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 01/43] arm64: kernel: Manage absolute relocations in code built under pi/ Ard Biesheuvel
                   ` (43 more replies)
  0 siblings, 44 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

This v8 covers the remaining changes that implement support for LPA2 and
WXN at stage 1, now that some of the prerequisites are in place.

v4: https://lore.kernel.org/r/20230912141549.278777-63-ardb@google.com/
v5: https://lore.kernel.org/r/20231124101840.944737-41-ardb@google.com/
v6: https://lore.kernel.org/r/20231129111555.3594833-43-ardb@google.com/
v7: https://lore.kernel.org/r/20240123145258.1462979-52-ardb%2Bgit%40google.com/

-%-

Changes in v8:
- rebase onto arm64/reorg-va-space and drop the patches that were merged
- bring back the KVM change to rely on vabits_actual to decide at which
  level a walk of the user space page tables should start

Changes in v7:
- rebase onto v6.8-rc1 which includes some patches of the previous
  revision, and includes the KVM changes for LPA2

The first ~30 patches rework the early init code, reimplementing most of
the page table and relocation handling in C code. There are several
reasons why this is needed:
- we generally prefer C code over asm for these things, and the macros
  that currently exist in head.S for creating the kernel page tables
  are a good example of why;
- we no longer need to create the kernel mapping in two passes, which
  means we can remove the logic that copies parts of the fixmap and the
  KAsan shadow from one set of page tables to the other; this is
  especially advantageous for KAsan with LPA2, which needs more
  elaborate shadow handling across multiple levels, since the KAsan
  region cannot be placed on exact pgd_t boundaries in that case;
- we can read the ID registers and parse command line overrides before
  creating the page tables, which simplifies the LPA2 case, as flicking
  the global TCR_EL1.DS bit at a later stage would require elaborate
  repainting of all page table descriptors, some of it with the MMU
  disabled;
- we can use more elaborate logic to create the mappings, which means we
  can use more precise mappings for code and data sections even when
  using 2 MiB granularity, and this is a prerequisite for running with
  WXN.

As part of the ID map changes, we decouple the ID map size from the
kernel VA size, and switch to a 48-bit VA map for all configurations.

The next ~10 patches rework the existing LVA support as a CPU feature,
which simplifies some code and gets rid of the vabits_actual variable.
Then, LPA2 support is implemented in the same vein. This requires adding
support for 5 level paging as well, given that LPA2 introduces a new
paging level '-1' when using 4k pages.

Combined with the vmemmap changes at the start of the series, the
resulting LPA2/4k pages configuration will have the exact same VA space
layout as the ordinary 4k/4 levels configuration, and so LPA2 support
can reasonably be enabled by default, as the fallback is seamless on
non-LPA2 hardware.

In the 16k/LPA2 case, the fallback also reduces the number of paging
levels, resulting in a 47-bit VA space. This is based on the assumption
that hybrid LPA2/non-LPA2 16k pages kernels in production use would
prefer not to take the performance hit of 4 level paging to gain only a
single additional bit of VA space. (Note that generic Android kernels
use only 3 levels of paging today.) Bespoke 16k configurations can still
configure 48-bit virtual addressing as before.
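
The level counts mentioned above follow from simple arithmetic, which the
sketch below illustrates. It is not code from this series: paging_levels()
is a hypothetical helper, and it assumes 8-byte descriptors, so that each
level resolves (page_shift - 3) bits of the virtual address.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only: the number of translation
 * levels needed to resolve va_bits of virtual address for a given page
 * size. With 8-byte descriptors, one table page holds 2^(page_shift - 3)
 * entries, so each level resolves (page_shift - 3) address bits.
 */
static unsigned int paging_levels(unsigned int va_bits, unsigned int page_shift)
{
	unsigned int bits_per_level = page_shift - 3;

	assert(page_shift > 3 && va_bits > page_shift);

	/* Round up: a partially populated top level still counts as a level. */
	return (va_bits - page_shift + bits_per_level - 1) / bits_per_level;
}
```

Under these assumptions, paging_levels(48, 12) is 4 and paging_levels(52, 12)
is 5, which corresponds to the extra level '-1' for 4k pages. For 16k pages,
paging_levels(52, 14) and paging_levels(48, 14) are both 4, while
paging_levels(47, 14) is 3, which is the trade-off described for the 16k
fallback.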

Finally, enable support for running with the WXN control enabled. This
was previously part of a separate series, but given that the delta is
tiny, it is included here as well.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Kees Cook <keescook@chromium.org>

Ard Biesheuvel (43):
  arm64: kernel: Manage absolute relocations in code built under pi/
  arm64: kernel: Don't rely on objcopy to make code under pi/ __init
  arm64: head: move relocation handling to C code
  arm64: idreg-override: Move to early mini C runtime
  arm64: kernel: Remove early fdt remap code
  arm64: head: Clear BSS and the kernel page tables in one go
  arm64: Move feature overrides into the BSS section
  arm64: head: Run feature override detection before mapping the kernel
  arm64: head: move dynamic shadow call stack patching into early C
    runtime
  arm64: cpufeature: Add helper to test for CPU feature overrides
  arm64: kaslr: Use feature override instead of parsing the cmdline
    again
  arm64: idreg-override: Create a pseudo feature for rodata=off
  arm64: Add helpers to probe local CPU for PAC and BTI support
  arm64: head: allocate more pages for the kernel mapping
  arm64: head: move memstart_offset_seed handling to C code
  arm64: mm: Make kaslr_requires_kpti() a static inline
  arm64: mmu: Make __cpu_replace_ttbr1() out of line
  arm64: head: Move early kernel mapping routines into C code
  arm64: mm: Use 48-bit virtual addressing for the permanent ID map
  arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels
  arm64: kernel: Create initial ID map from C code
  arm64: mm: avoid fixmap for early swapper_pg_dir updates
  arm64: mm: omit redundant remap of kernel image
  arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()"
  arm64: mm: Handle LVA support as a CPU feature
  arm64: mm: Add feature override support for LVA
  arm64: Avoid #define'ing PTE_MAYBE_NG to 0x0 for asm use
  arm64: Add ESR decoding for exceptions involving translation level -1
  arm64: mm: Wire up TCR.DS bit to PTE shareability fields
  arm64: mm: Add LPA2 support to phys<->pte conversion routines
  arm64: mm: Add definitions to support 5 levels of paging
  arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion
  arm64: Enable LPA2 at boot if supported by the system
  arm64: mm: Add 5 level paging support to fixmap and swapper handling
  arm64: kasan: Reduce minimum shadow alignment and enable 5 level
    paging
  arm64: mm: Add support for folding PUDs at runtime
  arm64: ptdump: Disregard unaddressable VA space
  arm64: ptdump: Deal with translation levels folded at runtime
  arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels
  arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs
  arm64: defconfig: Enable LPA2 support
  mm: add arch hook to validate mmap() prot flags
  arm64: mm: add support for WXN memory translation attribute

 arch/arm64/Kconfig                          |  38 +-
 arch/arm64/configs/defconfig                |   1 -
 arch/arm64/include/asm/archrandom.h         |   2 -
 arch/arm64/include/asm/assembler.h          |  55 +--
 arch/arm64/include/asm/cpufeature.h         | 116 +++++
 arch/arm64/include/asm/esr.h                |  13 +-
 arch/arm64/include/asm/fixmap.h             |   2 +-
 arch/arm64/include/asm/kasan.h              |   2 -
 arch/arm64/include/asm/kernel-pgtable.h     | 103 ++---
 arch/arm64/include/asm/kvm_emulate.h        |  10 +-
 arch/arm64/include/asm/memory.h             |  17 +-
 arch/arm64/include/asm/mman.h               |  36 ++
 arch/arm64/include/asm/mmu.h                |  40 +-
 arch/arm64/include/asm/mmu_context.h        |  83 ++--
 arch/arm64/include/asm/pgalloc.h            |  53 ++-
 arch/arm64/include/asm/pgtable-hwdef.h      |  33 +-
 arch/arm64/include/asm/pgtable-prot.h       |  20 +-
 arch/arm64/include/asm/pgtable-types.h      |   6 +
 arch/arm64/include/asm/pgtable.h            | 219 ++++++++-
 arch/arm64/include/asm/scs.h                |  36 +-
 arch/arm64/include/asm/setup.h              |   3 -
 arch/arm64/include/asm/tlb.h                |   3 +
 arch/arm64/kernel/Makefile                  |  13 +-
 arch/arm64/kernel/cpufeature.c              | 111 +++--
 arch/arm64/kernel/head.S                    | 463 ++------------------
 arch/arm64/kernel/image-vars.h              |  35 +-
 arch/arm64/kernel/kaslr.c                   |   4 +-
 arch/arm64/kernel/module.c                  |   2 +-
 arch/arm64/kernel/pi/Makefile               |  27 +-
 arch/arm64/kernel/{ => pi}/idreg-override.c |  80 ++--
 arch/arm64/kernel/pi/kaslr_early.c          |  67 +--
 arch/arm64/kernel/pi/map_kernel.c           | 276 ++++++++++++
 arch/arm64/kernel/pi/map_range.c            | 105 +++++
 arch/arm64/kernel/{ => pi}/patch-scs.c      |  36 +-
 arch/arm64/kernel/pi/pi.h                   |  36 ++
 arch/arm64/kernel/pi/relacheck.c            | 130 ++++++
 arch/arm64/kernel/pi/relocate.c             |  64 +++
 arch/arm64/kernel/setup.c                   |  22 -
 arch/arm64/kernel/sleep.S                   |   3 -
 arch/arm64/kernel/vmlinux.lds.S             |  17 +-
 arch/arm64/kvm/mmu.c                        |  17 +-
 arch/arm64/mm/fault.c                       |  30 +-
 arch/arm64/mm/fixmap.c                      |  36 +-
 arch/arm64/mm/init.c                        |   2 +-
 arch/arm64/mm/kasan_init.c                  | 159 +++++--
 arch/arm64/mm/mmap.c                        |   4 +
 arch/arm64/mm/mmu.c                         | 255 ++++++-----
 arch/arm64/mm/pgd.c                         |  17 +-
 arch/arm64/mm/proc.S                        | 122 +++++-
 arch/arm64/mm/ptdump.c                      |  21 +-
 arch/arm64/tools/cpucaps                    |   1 +
 include/linux/mman.h                        |  15 +
 mm/mmap.c                                   |   3 +
 53 files changed, 1948 insertions(+), 1116 deletions(-)
 rename arch/arm64/kernel/{ => pi}/idreg-override.c (83%)
 create mode 100644 arch/arm64/kernel/pi/map_kernel.c
 create mode 100644 arch/arm64/kernel/pi/map_range.c
 rename arch/arm64/kernel/{ => pi}/patch-scs.c (89%)
 create mode 100644 arch/arm64/kernel/pi/pi.h
 create mode 100644 arch/arm64/kernel/pi/relacheck.c
 create mode 100644 arch/arm64/kernel/pi/relocate.c

-- 
2.43.0.687.g38aa6559b0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v8 01/43] arm64: kernel: Manage absolute relocations in code built under pi/
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 02/43] arm64: kernel: Don't rely on objcopy to make code under pi/ __init Ard Biesheuvel
                   ` (42 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The mini C runtime runs before relocations are processed, and so it
cannot rely on statically initialized pointer variables.

Add a check to ensure that such code does not get introduced by
accident, by going over the relocations in each object, identifying the
ones that operate on data sections that are part of the executable
image, and raising an error if any relocations of type R_AARCH64_ABS64
exist. Note that such relocations are permitted in other places (e.g.,
debug sections) and will never occur in compiler generated code sections
when using the small code model, so only check sections that have
SHF_ALLOC set and SHF_EXECINSTR cleared.

To accommodate cases where statically initialized symbol references are
unavoidable, introduce a special case for ELF input data sections that
have ".rodata.prel64" in their names, and in these cases, instead of
rejecting any encountered ABS64 relocations, convert them into PREL64
relocations, which don't require any runtime fixups. Note that the code
in question must still be modified to deal with this, as it needs to
convert the 64-bit signed offsets into absolute addresses before use.
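
As a rough user-space illustration of the three steps described above (the
section filter, the ABS64 to PREL64 rewrite, and the decoding of a
place-relative offset), consider the sketch below. All demo_* names are
hypothetical and exist only for this example; the numeric constants are the
values assigned by the ELF gABI and the AArch64 ELF ABI, redefined here so
that the sketch does not depend on the host's <elf.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Redefined locally for the demo; values per the ELF gABI / AArch64 ELF ABI. */
#define DEMO_SHF_ALLOC		0x2u
#define DEMO_SHF_EXECINSTR	0x4u
#define DEMO_R_AARCH64_ABS64	257u
#define DEMO_R_AARCH64_PREL64	260u

/* Only sections that are allocated and not executable are checked. */
static bool demo_is_checked_section(uint64_t sh_flags)
{
	return (sh_flags & (DEMO_SHF_ALLOC | DEMO_SHF_EXECINSTR)) ==
	       DEMO_SHF_ALLOC;
}

/* Flip the type in the low 32 bits of r_info; the symbol index is untouched. */
static uint64_t demo_abs64_to_prel64(uint64_t r_info)
{
	assert((r_info & 0xffffffffu) == DEMO_R_AARCH64_ABS64);
	return r_info ^ (DEMO_R_AARCH64_ABS64 ^ DEMO_R_AARCH64_PREL64);
}

/* The value a PREL64 relocation resolves to: target address minus place. */
static int64_t demo_encode_prel64(const int64_t *slot, const void *target)
{
	return (int64_t)((intptr_t)target - (intptr_t)slot);
}

/* Turn a place-relative offset back into a pointer; 0 encodes NULL. */
static void *demo_prel64_to_pointer(const int64_t *slot)
{
	if (!*slot)
		return NULL;
	return (char *)slot + *slot;
}
```

The last helper corresponds to what the referring code has to do before it
can use such a reference: add the stored offset to the address of the slot
that holds it. Because the offset is relative to the slot itself, the result
is correct wherever the image happens to be running, which is why no runtime
fixup is needed.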

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/pi/Makefile    |   9 +-
 arch/arm64/kernel/pi/pi.h        |  18 +++
 arch/arm64/kernel/pi/relacheck.c | 130 ++++++++++++++++++++
 3 files changed, 155 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index c844a0546d7f..bc32a431fe35 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -22,11 +22,16 @@ KCSAN_SANITIZE	:= n
 UBSAN_SANITIZE	:= n
 KCOV_INSTRUMENT	:= n
 
+hostprogs	:= relacheck
+
+quiet_cmd_piobjcopy = $(quiet_cmd_objcopy)
+      cmd_piobjcopy = $(cmd_objcopy) && $(obj)/relacheck $(@) $(<)
+
 $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \
 			       --remove-section=.note.gnu.property \
 			       --prefix-alloc-sections=.init
-$(obj)/%.pi.o: $(obj)/%.o FORCE
-	$(call if_changed,objcopy)
+$(obj)/%.pi.o: $(obj)/%.o $(obj)/relacheck FORCE
+	$(call if_changed,piobjcopy)
 
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
diff --git a/arch/arm64/kernel/pi/pi.h b/arch/arm64/kernel/pi/pi.h
new file mode 100644
index 000000000000..7c2d9bbf0ff9
--- /dev/null
+++ b/arch/arm64/kernel/pi/pi.h
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2023 Google LLC
+// Author: Ard Biesheuvel <ardb@google.com>
+
+#define __prel64_initconst	__section(".init.rodata.prel64")
+
+#define PREL64(type, name)	union { type *name; prel64_t name ## _prel; }
+
+#define prel64_pointer(__d)	(typeof(__d))prel64_to_pointer(&__d##_prel)
+
+typedef volatile signed long prel64_t;
+
+static inline void *prel64_to_pointer(const prel64_t *offset)
+{
+	if (!*offset)
+		return NULL;
+	return (void *)offset + *offset;
+}
diff --git a/arch/arm64/kernel/pi/relacheck.c b/arch/arm64/kernel/pi/relacheck.c
new file mode 100644
index 000000000000..b0cd4d0d275b
--- /dev/null
+++ b/arch/arm64/kernel/pi/relacheck.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 - Google LLC
+ * Author: Ard Biesheuvel <ardb@google.com>
+ */
+
+#include <elf.h>
+#include <fcntl.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define HOST_ORDER ELFDATA2LSB
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define HOST_ORDER ELFDATA2MSB
+#endif
+
+static Elf64_Ehdr *ehdr;
+static Elf64_Shdr *shdr;
+static const char *strtab;
+static bool swap;
+
+static uint64_t swab_elfxword(uint64_t val)
+{
+	return swap ? __builtin_bswap64(val) : val;
+}
+
+static uint32_t swab_elfword(uint32_t val)
+{
+	return swap ? __builtin_bswap32(val) : val;
+}
+
+static uint16_t swab_elfhword(uint16_t val)
+{
+	return swap ? __builtin_bswap16(val) : val;
+}
+
+int main(int argc, char *argv[])
+{
+	struct stat stat;
+	int fd, ret;
+
+	if (argc < 3) {
+		fprintf(stderr, "file arguments missing\n");
+		exit(EXIT_FAILURE);
+	}
+
+	fd = open(argv[1], O_RDWR);
+	if (fd < 0) {
+		fprintf(stderr, "failed to open %s\n", argv[1]);
+		exit(EXIT_FAILURE);
+	}
+
+	ret = fstat(fd, &stat);
+	if (ret < 0) {
+		fprintf(stderr, "failed to stat() %s\n", argv[1]);
+		exit(EXIT_FAILURE);
+	}
+
+	ehdr = mmap(0, stat.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (ehdr == MAP_FAILED) {
+		fprintf(stderr, "failed to mmap() %s\n", argv[1]);
+		exit(EXIT_FAILURE);
+	}
+
+	swap = ehdr->e_ident[EI_DATA] != HOST_ORDER;
+	shdr = (void *)ehdr + swab_elfxword(ehdr->e_shoff);
+	strtab = (void *)ehdr +
+		 swab_elfxword(shdr[swab_elfhword(ehdr->e_shstrndx)].sh_offset);
+
+	for (int i = 0; i < swab_elfhword(ehdr->e_shnum); i++) {
+		unsigned long info, flags;
+		bool prel64 = false;
+		Elf64_Rela *rela;
+		int numrela;
+
+		if (swab_elfword(shdr[i].sh_type) != SHT_RELA)
+			continue;
+
+		/* only consider RELA sections operating on data */
+		info = swab_elfword(shdr[i].sh_info);
+		flags = swab_elfxword(shdr[info].sh_flags);
+		if ((flags & (SHF_ALLOC | SHF_EXECINSTR)) != SHF_ALLOC)
+			continue;
+
+		/*
+		 * We generally don't permit ABS64 relocations in the code that
+		 * runs before relocation processing occurs. If statically
+		 * initialized absolute symbol references are unavoidable, they
+		 * may be emitted into a *.rodata.prel64 section and they will
+		 * be converted to place-relative 64-bit references. This
+		 * requires special handling in the referring code.
+		 */
+		if (strstr(strtab + swab_elfword(shdr[info].sh_name),
+			   ".rodata.prel64")) {
+			prel64 = true;
+		}
+
+		rela = (void *)ehdr + swab_elfxword(shdr[i].sh_offset);
+		numrela = swab_elfxword(shdr[i].sh_size) / sizeof(*rela);
+
+		for (int j = 0; j < numrela; j++) {
+			uint64_t info = swab_elfxword(rela[j].r_info);
+
+			if (ELF64_R_TYPE(info) != R_AARCH64_ABS64)
+				continue;
+
+			if (prel64) {
+				/* convert ABS64 into PREL64 */
+				info ^= R_AARCH64_ABS64 ^ R_AARCH64_PREL64;
+				rela[j].r_info = swab_elfxword(info);
+			} else {
+				fprintf(stderr,
+					"Unexpected absolute relocations detected in %s\n",
+					argv[2]);
+				close(fd);
+				unlink(argv[1]);
+				exit(EXIT_FAILURE);
+			}
+		}
+	}
+	close(fd);
+	return 0;
+}
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 02/43] arm64: kernel: Don't rely on objcopy to make code under pi/ __init
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 01/43] arm64: kernel: Manage absolute relocations in code built under pi/ Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 03/43] arm64: head: move relocation handling to C code Ard Biesheuvel
                   ` (41 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

We will add some code under pi/ that contains global variables that
should not end up in __initdata, as they will not be writable via the
initial ID map. So only rely on objcopy for making the libfdt code
__init, and use explicit annotations for the rest.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/pi/Makefile      |  6 ++++--
 arch/arm64/kernel/pi/kaslr_early.c | 16 +++++++++-------
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index bc32a431fe35..2bbe866417d4 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -28,11 +28,13 @@ quiet_cmd_piobjcopy = $(quiet_cmd_objcopy)
       cmd_piobjcopy = $(cmd_objcopy) && $(obj)/relacheck $(@) $(<)
 
 $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \
-			       --remove-section=.note.gnu.property \
-			       --prefix-alloc-sections=.init
+			       --remove-section=.note.gnu.property
 $(obj)/%.pi.o: $(obj)/%.o $(obj)/relacheck FORCE
 	$(call if_changed,piobjcopy)
 
+# ensure that all the lib- code ends up as __init code and data
+$(obj)/lib-%.pi.o: OBJCOPYFLAGS += --prefix-alloc-sections=.init
+
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
index b9e0bb4bc6a9..167081b30a15 100644
--- a/arch/arm64/kernel/pi/kaslr_early.c
+++ b/arch/arm64/kernel/pi/kaslr_early.c
@@ -17,7 +17,7 @@
 #include <asm/pgtable.h>
 
 /* taken from lib/string.c */
-static char *__strstr(const char *s1, const char *s2)
+static char *__init __strstr(const char *s1, const char *s2)
 {
 	size_t l1, l2;
 
@@ -33,7 +33,7 @@ static char *__strstr(const char *s1, const char *s2)
 	}
 	return NULL;
 }
-static bool cmdline_contains_nokaslr(const u8 *cmdline)
+static bool __init cmdline_contains_nokaslr(const u8 *cmdline)
 {
 	const u8 *str;
 
@@ -41,7 +41,7 @@ static bool cmdline_contains_nokaslr(const u8 *cmdline)
 	return str == cmdline || (str > cmdline && *(str - 1) == ' ');
 }
 
-static bool is_kaslr_disabled_cmdline(void *fdt)
+static bool __init is_kaslr_disabled_cmdline(void *fdt)
 {
 	if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
 		int node;
@@ -67,17 +67,19 @@ static bool is_kaslr_disabled_cmdline(void *fdt)
 	return cmdline_contains_nokaslr(CONFIG_CMDLINE);
 }
 
-static u64 get_kaslr_seed(void *fdt)
+static u64 __init get_kaslr_seed(void *fdt)
 {
+	static char const chosen_str[] __initconst = "chosen";
+	static char const seed_str[] __initconst = "kaslr-seed";
 	int node, len;
 	fdt64_t *prop;
 	u64 ret;
 
-	node = fdt_path_offset(fdt, "/chosen");
+	node = fdt_path_offset(fdt, chosen_str);
 	if (node < 0)
 		return 0;
 
-	prop = fdt_getprop_w(fdt, node, "kaslr-seed", &len);
+	prop = fdt_getprop_w(fdt, node, seed_str, &len);
 	if (!prop || len != sizeof(u64))
 		return 0;
 
@@ -86,7 +88,7 @@ static u64 get_kaslr_seed(void *fdt)
 	return ret;
 }
 
-asmlinkage u64 kaslr_early_init(void *fdt)
+asmlinkage u64 __init kaslr_early_init(void *fdt)
 {
 	u64 seed, range;
 
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 03/43] arm64: head: move relocation handling to C code
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 01/43] arm64: kernel: Manage absolute relocations in code built under pi/ Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 02/43] arm64: kernel: Don't rely on objcopy to make code under pi/ __init Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 04/43] arm64: idreg-override: Move to early mini C runtime Ard Biesheuvel
                   ` (40 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Now that we have a mini C runtime before the kernel mapping is up, we
can move the non-trivial relocation processing code out of head.S and
reimplement it in C.
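
For readers unfamiliar with the RELR encoding, the following user-space
sketch decodes a small table against a toy image. It is only meant to
illustrate the format: demo_apply_relr() is a hypothetical name, and the
convention that address entries are byte offsets from the start of image[]
is an assumption made for this demo, not how the kernel code addresses
memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Apply RELR entries to a toy image. Address entries are byte offsets from
 * the start of image[] (a convention chosen for this demo); delta is the
 * displacement added to every relocated 64-bit word.
 */
static void demo_apply_relr(uint64_t *image, const uint64_t *relr,
			    size_t count, uint64_t delta)
{
	uint64_t *place = NULL;

	for (size_t i = 0; i < count; i++) {
		if ((relr[i] & 1) == 0) {
			/* Even entry: the address of one word to relocate. */
			assert(relr[i] % sizeof(uint64_t) == 0);
			place = image + relr[i] / sizeof(uint64_t);
			*place++ += delta;
		} else {
			/* Odd entry: a bitmap covering the next 63 words. */
			assert(place != NULL);
			uint64_t *p = place;

			for (uint64_t bits = relr[i] >> 1; bits; bits >>= 1, p++)
				if (bits & 1)
					*p += delta;
			place += 63;
		}
	}
}
```

With the table { 0x0, 0xb, 0x3 }, the address entry relocates word 0, the
bitmap 0xb (binary 1011, marker bit included) relocates words 1 and 3, and
the bitmap 0x3 relocates the first word of the next 63-word window, which is
word 64.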

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/Makefile      |   3 +-
 arch/arm64/kernel/head.S        | 104 ++------------------
 arch/arm64/kernel/pi/Makefile   |   5 +-
 arch/arm64/kernel/pi/relocate.c |  62 ++++++++++++
 arch/arm64/kernel/vmlinux.lds.S |  12 ++-
 5 files changed, 82 insertions(+), 104 deletions(-)

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 467cb7117273..78f14084f6d7 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -57,7 +57,8 @@ obj-$(CONFIG_ACPI)			+= acpi.o
 obj-$(CONFIG_ACPI_NUMA)			+= acpi_numa.o
 obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 obj-$(CONFIG_PARAVIRT)			+= paravirt.o
-obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o pi/
+obj-$(CONFIG_RELOCATABLE)		+= pi/
+obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o
 obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 obj-$(CONFIG_ELF_CORE)			+= elfcore.o
 obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index cab7f91949d8..a8fa64fc30d7 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -81,7 +81,7 @@
 	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
 	 *  x22        create_idmap() .. start_kernel()         ID map VA of the DT blob
-	 *  x23        primary_entry() .. start_kernel()        physical misalignment/KASLR offset
+	 *  x23        __primary_switch()                       physical misalignment/KASLR offset
 	 *  x24        __primary_switch()                       linear map KASLR seed
 	 *  x25        primary_entry() .. start_kernel()        supported VA size
 	 *  x28        create_idmap()                           callee preserved temp register
@@ -389,7 +389,7 @@ SYM_FUNC_START_LOCAL(create_idmap)
 	/* Remap the kernel page tables r/w in the ID map */
 	adrp	x1, _text
 	adrp	x2, init_pg_dir
-	adrp	x3, init_pg_end
+	adrp	x3, _end
 	bic	x4, x2, #SWAPPER_BLOCK_SIZE - 1
 	mov_q	x5, SWAPPER_RW_MMUFLAGS
 	mov	x6, #SWAPPER_BLOCK_SHIFT
@@ -779,97 +779,6 @@ SYM_FUNC_START_LOCAL(__no_granule_support)
 	b	1b
 SYM_FUNC_END(__no_granule_support)
 
-#ifdef CONFIG_RELOCATABLE
-SYM_FUNC_START_LOCAL(__relocate_kernel)
-	/*
-	 * Iterate over each entry in the relocation table, and apply the
-	 * relocations in place.
-	 */
-	adr_l	x9, __rela_start
-	adr_l	x10, __rela_end
-	mov_q	x11, KIMAGE_VADDR		// default virtual offset
-	add	x11, x11, x23			// actual virtual offset
-
-0:	cmp	x9, x10
-	b.hs	1f
-	ldp	x12, x13, [x9], #24
-	ldr	x14, [x9, #-8]
-	cmp	w13, #R_AARCH64_RELATIVE
-	b.ne	0b
-	add	x14, x14, x23			// relocate
-	str	x14, [x12, x23]
-	b	0b
-
-1:
-#ifdef CONFIG_RELR
-	/*
-	 * Apply RELR relocations.
-	 *
-	 * RELR is a compressed format for storing relative relocations. The
-	 * encoded sequence of entries looks like:
-	 * [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]
-	 *
-	 * i.e. start with an address, followed by any number of bitmaps. The
-	 * address entry encodes 1 relocation. The subsequent bitmap entries
-	 * encode up to 63 relocations each, at subsequent offsets following
-	 * the last address entry.
-	 *
-	 * The bitmap entries must have 1 in the least significant bit. The
-	 * assumption here is that an address cannot have 1 in lsb. Odd
-	 * addresses are not supported. Any odd addresses are stored in the RELA
-	 * section, which is handled above.
-	 *
-	 * Excluding the least significant bit in the bitmap, each non-zero
-	 * bit in the bitmap represents a relocation to be applied to
-	 * a corresponding machine word that follows the base address
-	 * word. The second least significant bit represents the machine
-	 * word immediately following the initial address, and each bit
-	 * that follows represents the next word, in linear order. As such,
-	 * a single bitmap can encode up to 63 relocations in a 64-bit object.
-	 *
-	 * In this implementation we store the address of the next RELR table
-	 * entry in x9, the address being relocated by the current address or
-	 * bitmap entry in x13 and the address being relocated by the current
-	 * bit in x14.
-	 */
-	adr_l	x9, __relr_start
-	adr_l	x10, __relr_end
-
-2:	cmp	x9, x10
-	b.hs	7f
-	ldr	x11, [x9], #8
-	tbnz	x11, #0, 3f			// branch to handle bitmaps
-	add	x13, x11, x23
-	ldr	x12, [x13]			// relocate address entry
-	add	x12, x12, x23
-	str	x12, [x13], #8			// adjust to start of bitmap
-	b	2b
-
-3:	mov	x14, x13
-4:	lsr	x11, x11, #1
-	cbz	x11, 6f
-	tbz	x11, #0, 5f			// skip bit if not set
-	ldr	x12, [x14]			// relocate bit
-	add	x12, x12, x23
-	str	x12, [x14]
-
-5:	add	x14, x14, #8			// move to next bit's address
-	b	4b
-
-6:	/*
-	 * Move to the next bitmap's address. 8 is the word size, and 63 is the
-	 * number of significant bits in a bitmap entry.
-	 */
-	add	x13, x13, #(8 * 63)
-	b	2b
-
-7:
-#endif
-	ret
-
-SYM_FUNC_END(__relocate_kernel)
-#endif
-
 SYM_FUNC_START_LOCAL(__primary_switch)
 	adrp	x1, reserved_pg_dir
 	adrp	x2, init_idmap_pg_dir
@@ -877,11 +786,11 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 #ifdef CONFIG_RELOCATABLE
 	adrp	x23, KERNEL_START
 	and	x23, x23, MIN_KIMG_ALIGN - 1
-#ifdef CONFIG_RANDOMIZE_BASE
-	mov	x0, x22
-	adrp	x1, init_pg_end
+	adrp	x1, early_init_stack
 	mov	sp, x1
 	mov	x29, xzr
+#ifdef CONFIG_RANDOMIZE_BASE
+	mov	x0, x22
 	bl	__pi_kaslr_early_init
 	and	x24, x0, #SZ_2M - 1		// capture memstart offset seed
 	bic	x0, x0, #SZ_2M - 1
@@ -894,7 +803,8 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	adrp	x1, init_pg_dir
 	load_ttbr1 x1, x1, x2
 #ifdef CONFIG_RELOCATABLE
-	bl	__relocate_kernel
+	mov	x0, x23
+	bl	__pi_relocate_kernel
 #endif
 	ldr	x8, =__primary_switched
 	adrp	x0, KERNEL_START		// __pa(KERNEL_START)
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index 2bbe866417d4..d084c1dcf416 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -38,5 +38,6 @@ $(obj)/lib-%.pi.o: OBJCOPYFLAGS += --prefix-alloc-sections=.init
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
-obj-y		:= kaslr_early.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
-extra-y		:= $(patsubst %.pi.o,%.o,$(obj-y))
+obj-y				:= relocate.pi.o
+obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr_early.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
+extra-y				:= $(patsubst %.pi.o,%.o,$(obj-y))
diff --git a/arch/arm64/kernel/pi/relocate.c b/arch/arm64/kernel/pi/relocate.c
new file mode 100644
index 000000000000..1853408ea76b
--- /dev/null
+++ b/arch/arm64/kernel/pi/relocate.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2023 Google LLC
+// Authors: Ard Biesheuvel <ardb@google.com>
+//          Peter Collingbourne <pcc@google.com>
+
+#include <linux/elf.h>
+#include <linux/init.h>
+#include <linux/types.h>
+
+extern const Elf64_Rela rela_start[], rela_end[];
+extern const u64 relr_start[], relr_end[];
+
+void __init relocate_kernel(u64 offset)
+{
+	u64 *place = NULL;
+
+	for (const Elf64_Rela *rela = rela_start; rela < rela_end; rela++) {
+		if (ELF64_R_TYPE(rela->r_info) != R_AARCH64_RELATIVE)
+			continue;
+		*(u64 *)(rela->r_offset + offset) = rela->r_addend + offset;
+	}
+
+	if (!IS_ENABLED(CONFIG_RELR) || !offset)
+		return;
+
+	/*
+	 * Apply RELR relocations.
+	 *
+	 * RELR is a compressed format for storing relative relocations. The
+	 * encoded sequence of entries looks like:
+	 * [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]
+	 *
+	 * i.e. start with an address, followed by any number of bitmaps. The
+	 * address entry encodes 1 relocation. The subsequent bitmap entries
+	 * encode up to 63 relocations each, at subsequent offsets following
+	 * the last address entry.
+	 *
+	 * The bitmap entries must have 1 in the least significant bit. The
+	 * assumption here is that an address cannot have 1 in lsb. Odd
+	 * addresses are not supported. Any odd addresses are stored in the
+	 * RELA section, which is handled above.
+	 *
+	 * With the exception of the least significant bit, each bit in the
+	 * bitmap corresponds with a machine word that follows the base address
+	 * word, and the bit value indicates whether or not a relocation needs
+	 * to be applied to it. The second least significant bit represents the
+	 * machine word immediately following the initial address, and each bit
+	 * that follows represents the next word, in linear order. As such, a
+	 * single bitmap can encode up to 63 relocations in a 64-bit object.
+	 */
+	for (const u64 *relr = relr_start; relr < relr_end; relr++) {
+		if ((*relr & 1) == 0) {
+			place = (u64 *)(*relr + offset);
+			*place++ += offset;
+		} else {
+			for (u64 *p = place, r = *relr >> 1; r; p++, r >>= 1)
+				if (r & 1)
+					*p += offset;
+			place += 63;
+		}
+	}
+}
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 3cd7e76cc562..8dd5dda66f7c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -270,15 +270,15 @@ SECTIONS
 	HYPERVISOR_RELOC_SECTION
 
 	.rela.dyn : ALIGN(8) {
-		__rela_start = .;
+		__pi_rela_start = .;
 		*(.rela .rela*)
-		__rela_end = .;
+		__pi_rela_end = .;
 	}
 
 	.relr.dyn : ALIGN(8) {
-		__relr_start = .;
+		__pi_relr_start = .;
 		*(.relr.dyn)
-		__relr_end = .;
+		__pi_relr_end = .;
 	}
 
 	. = ALIGN(SEGMENT_ALIGN);
@@ -317,6 +317,10 @@ SECTIONS
 	init_pg_dir = .;
 	. += INIT_DIR_SIZE;
 	init_pg_end = .;
+#ifdef CONFIG_RELOCATABLE
+	. += SZ_4K;		/* stack for the early relocation code */
+	early_init_stack = .;
+#endif
 
 	. = ALIGN(SEGMENT_ALIGN);
 	__pecoff_data_size = ABSOLUTE(. - __initdata_begin);
-- 
2.43.0.687.g38aa6559b0-goog
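
The RELR decoding loop in relocate_kernel() above can be tried out in
userspace. The following is a minimal sketch, not the kernel code: the
function name apply_relr() and the parameters image, link_base, and count
are hypothetical, and the image is modelled as a plain array of 64-bit
words so that link-time addresses map to array indices.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace sketch of RELR decoding. 'image' stands in for the kernel
 * image, 'link_base' is the address its first word was linked at, and
 * 'offset' is the displacement added to every relocated word. These
 * names are assumptions made for this illustration only.
 */
static void apply_relr(uint64_t *image, uint64_t link_base,
		       const uint64_t *relr, size_t count, uint64_t offset)
{
	uint64_t *place = NULL;

	for (size_t i = 0; i < count; i++) {
		uint64_t entry = relr[i];

		if ((entry & 1) == 0) {
			/* Address entry: relocate that word, then step past it. */
			place = image + (entry - link_base) / sizeof(uint64_t);
			*place++ += offset;
		} else {
			/* Bitmap entry: after dropping the lsb, bit n covers place[n]. */
			uint64_t *p = place;

			for (uint64_t bits = entry >> 1; bits; bits >>= 1, p++)
				if (bits & 1)
					*p += offset;
			/* A bitmap always spans 63 words, set or not. */
			place += 63;
		}
	}
}
```

An address entry followed by a bitmap of 0xb, for example, relocates the
addressed word plus the first and third words after it. As in the patch,
a bitmap entry is assumed to be preceded by at least one address entry.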


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v8 04/43] arm64: idreg-override: Move to early mini C runtime
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2024-02-14 12:28 ` [PATCH v8 03/43] arm64: head: move relocation handling to C code Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 05/43] arm64: kernel: Remove early fdt remap code Ard Biesheuvel
                   ` (39 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

We will want to parse the ID register overrides even earlier, so that we
can take them into account before creating the kernel mapping. So
migrate the code and make it work in the context of the early C runtime.
We will move the invocation to an earlier stage in a subsequent patch.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/Makefile                  |  4 +--
 arch/arm64/kernel/head.S                    |  5 ++-
 arch/arm64/kernel/image-vars.h              |  9 +++++
 arch/arm64/kernel/pi/Makefile               |  5 +--
 arch/arm64/kernel/{ => pi}/idreg-override.c | 35 +++++++++-----------
 5 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 78f14084f6d7..4236f1e0fffa 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -33,8 +33,7 @@ obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
 			   return_address.o cpuinfo.o cpu_errata.o		\
 			   cpufeature.o alternative.o cacheinfo.o		\
 			   smp.o smp_spin_table.o topology.o smccc-call.o	\
-			   syscall.o proton-pack.o idreg-override.o idle.o	\
-			   patching.o
+			   syscall.o proton-pack.o idle.o patching.o pi/
 
 obj-$(CONFIG_COMPAT)			+= sys32.o signal32.o			\
 					   sys_compat.o
@@ -57,7 +56,6 @@ obj-$(CONFIG_ACPI)			+= acpi.o
 obj-$(CONFIG_ACPI_NUMA)			+= acpi_numa.o
 obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 obj-$(CONFIG_PARAVIRT)			+= paravirt.o
-obj-$(CONFIG_RELOCATABLE)		+= pi/
 obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o
 obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 obj-$(CONFIG_ELF_CORE)			+= elfcore.o
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index a8fa64fc30d7..ca5e5fbefcd3 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -510,10 +510,9 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
 #endif
-	mov	x0, x21				// pass FDT address in x0
-	bl	early_fdt_map			// Try mapping the FDT early
 	mov	x0, x20				// pass the full boot status
-	bl	init_feature_override		// Parse cpu feature overrides
+	mov	x1, x22				// pass the low FDT mapping
+	bl	__pi_init_feature_override	// Parse cpu feature overrides
 #ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
 	bl	scs_patch_vmlinux
 #endif
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index e931ce078a00..eacc3d167733 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -37,6 +37,15 @@ PROVIDE(__pi___memmove			= __pi_memmove);
 PROVIDE(__pi___memset			= __pi_memset);
 
 PROVIDE(__pi_vabits_actual		= vabits_actual);
+PROVIDE(__pi_id_aa64isar1_override	= id_aa64isar1_override);
+PROVIDE(__pi_id_aa64isar2_override	= id_aa64isar2_override);
+PROVIDE(__pi_id_aa64mmfr1_override	= id_aa64mmfr1_override);
+PROVIDE(__pi_id_aa64pfr0_override	= id_aa64pfr0_override);
+PROVIDE(__pi_id_aa64pfr1_override	= id_aa64pfr1_override);
+PROVIDE(__pi_id_aa64smfr0_override	= id_aa64smfr0_override);
+PROVIDE(__pi_id_aa64zfr0_override	= id_aa64zfr0_override);
+PROVIDE(__pi_arm64_sw_feature_override	= arm64_sw_feature_override);
+PROVIDE(__pi__ctype			= _ctype);
 
 #ifdef CONFIG_KVM
 
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index d084c1dcf416..7f6dfce893c3 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -38,6 +38,7 @@ $(obj)/lib-%.pi.o: OBJCOPYFLAGS += --prefix-alloc-sections=.init
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
-obj-y				:= relocate.pi.o
-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr_early.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
+obj-y				:= idreg-override.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
+obj-$(CONFIG_RELOCATABLE)	+= relocate.pi.o
+obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr_early.pi.o
 extra-y				:= $(patsubst %.pi.o,%.o,$(obj-y))
diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/pi/idreg-override.c
similarity index 93%
rename from arch/arm64/kernel/idreg-override.c
rename to arch/arm64/kernel/pi/idreg-override.c
index e30fd9e32ef3..f9e05c10faab 100644
--- a/arch/arm64/kernel/idreg-override.c
+++ b/arch/arm64/kernel/pi/idreg-override.c
@@ -14,6 +14,8 @@
 #include <asm/cpufeature.h>
 #include <asm/setup.h>
 
+#include "pi.h"
+
 #define FTR_DESC_NAME_LEN	20
 #define FTR_DESC_FIELD_LEN	10
 #define FTR_ALIAS_NAME_LEN	30
@@ -21,15 +23,6 @@
 
 static u64 __boot_status __initdata;
 
-// temporary __prel64 related definitions
-// to be removed when this code is moved under pi/
-
-#define __prel64_initconst	__initconst
-
-#define PREL64(type, name)	union { type *name; }
-
-#define prel64_pointer(__d)	(__d)
-
 typedef bool filter_t(u64 val);
 
 struct ftr_set_desc {
@@ -313,16 +306,11 @@ static __init void __parse_cmdline(const char *cmdline, bool parse_aliases)
 	} while (1);
 }
 
-static __init const u8 *get_bootargs_cmdline(void)
+static __init const u8 *get_bootargs_cmdline(const void *fdt)
 {
 	const u8 *prop;
-	void *fdt;
 	int node;
 
-	fdt = get_early_fdt_ptr();
-	if (!fdt)
-		return NULL;
-
 	node = fdt_path_offset(fdt, "/chosen");
 	if (node < 0)
 		return NULL;
@@ -334,9 +322,9 @@ static __init const u8 *get_bootargs_cmdline(void)
 	return strlen(prop) ? prop : NULL;
 }
 
-static __init void parse_cmdline(void)
+static __init void parse_cmdline(const void *fdt)
 {
-	const u8 *prop = get_bootargs_cmdline();
+	const u8 *prop = get_bootargs_cmdline(fdt);
 
 	if (IS_ENABLED(CONFIG_CMDLINE_FORCE) || !prop)
 		__parse_cmdline(CONFIG_CMDLINE, true);
@@ -346,9 +334,9 @@ static __init void parse_cmdline(void)
 }
 
 /* Keep checkers quiet */
-void init_feature_override(u64 boot_status);
+void init_feature_override(u64 boot_status, const void *fdt);
 
-asmlinkage void __init init_feature_override(u64 boot_status)
+asmlinkage void __init init_feature_override(u64 boot_status, const void *fdt)
 {
 	struct arm64_ftr_override *override;
 	const struct ftr_set_desc *reg;
@@ -364,7 +352,7 @@ asmlinkage void __init init_feature_override(u64 boot_status)
 
 	__boot_status = boot_status;
 
-	parse_cmdline();
+	parse_cmdline(fdt);
 
 	for (i = 0; i < ARRAY_SIZE(regs); i++) {
 		reg = prel64_pointer(regs[i].reg);
@@ -373,3 +361,10 @@ asmlinkage void __init init_feature_override(u64 boot_status)
 				       (unsigned long)(override + 1));
 	}
 }
+
+char * __init skip_spaces(const char *str)
+{
+	while (isspace(*str))
+		++str;
+	return (char *)str;
+}
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 05/43] arm64: kernel: Remove early fdt remap code
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2024-02-14 12:28 ` [PATCH v8 04/43] arm64: idreg-override: Move to early mini C runtime Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 06/43] arm64: head: Clear BSS and the kernel page tables in one go Ard Biesheuvel
                   ` (38 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The early FDT remap code is no longer used so let's drop it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/setup.h |  3 ---
 arch/arm64/kernel/setup.c      | 15 ---------------
 2 files changed, 18 deletions(-)

diff --git a/arch/arm64/include/asm/setup.h b/arch/arm64/include/asm/setup.h
index 2e4d7da74fb8..ba269a7a3201 100644
--- a/arch/arm64/include/asm/setup.h
+++ b/arch/arm64/include/asm/setup.h
@@ -7,9 +7,6 @@
 
 #include <uapi/asm/setup.h>
 
-void *get_early_fdt_ptr(void);
-void early_fdt_map(u64 dt_phys);
-
 /*
  * These two variables are used in the head.S file.
  */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 42c690bb2d60..97d2143669cf 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -166,21 +166,6 @@ static void __init smp_build_mpidr_hash(void)
 		pr_warn("Large number of MPIDR hash buckets detected\n");
 }
 
-static void *early_fdt_ptr __initdata;
-
-void __init *get_early_fdt_ptr(void)
-{
-	return early_fdt_ptr;
-}
-
-asmlinkage void __init early_fdt_map(u64 dt_phys)
-{
-	int fdt_size;
-
-	early_fixmap_init();
-	early_fdt_ptr = fixmap_remap_fdt(dt_phys, &fdt_size, PAGE_KERNEL);
-}
-
 static void __init setup_machine_fdt(phys_addr_t dt_phys)
 {
 	int size;
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 06/43] arm64: head: Clear BSS and the kernel page tables in one go
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (4 preceding siblings ...)
  2024-02-14 12:28 ` [PATCH v8 05/43] arm64: kernel: Remove early fdt remap code Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 07/43] arm64: Move feature overrides into the BSS section Ard Biesheuvel
                   ` (37 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

We will move the CPU feature overrides into BSS in a subsequent patch,
and this requires that BSS is zeroed before the feature override
detection code runs. So let's map BSS read-write in the ID map, and zero
it via this mapping.

Since the kernel page tables are right next to it, and also zeroed via
the ID map, let's drop the separate clear_page_tables() function, and
just zero everything in one go.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S        | 33 +++++++-------------
 arch/arm64/kernel/vmlinux.lds.S |  3 ++
 2 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index ca5e5fbefcd3..2af518161f3a 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -177,17 +177,6 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
 	ret
 SYM_CODE_END(preserve_boot_args)
 
-SYM_FUNC_START_LOCAL(clear_page_tables)
-	/*
-	 * Clear the init page tables.
-	 */
-	adrp	x0, init_pg_dir
-	adrp	x1, init_pg_end
-	sub	x2, x1, x0
-	mov	x1, xzr
-	b	__pi_memset			// tail call
-SYM_FUNC_END(clear_page_tables)
-
 /*
  * Macro to populate page table entries, these entries can be pointers to the next level
  * or last level entries pointing to physical memory.
@@ -386,9 +375,9 @@ SYM_FUNC_START_LOCAL(create_idmap)
 
 	map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
 
-	/* Remap the kernel page tables r/w in the ID map */
+	/* Remap BSS and the kernel page tables r/w in the ID map */
 	adrp	x1, _text
-	adrp	x2, init_pg_dir
+	adrp	x2, __bss_start
 	adrp	x3, _end
 	bic	x4, x2, #SWAPPER_BLOCK_SIZE - 1
 	mov_q	x5, SWAPPER_RW_MMUFLAGS
@@ -489,14 +478,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	mov	x0, x20
 	bl	set_cpu_boot_mode_flag
 
-	// Clear BSS
-	adr_l	x0, __bss_start
-	mov	x1, xzr
-	adr_l	x2, __bss_stop
-	sub	x2, x2, x0
-	bl	__pi_memset
-	dsb	ishst				// Make zero page visible to PTW
-
 #if VA_BITS > 48
 	adr_l	x8, vabits_actual		// Set this early so KASAN early init
 	str	x25, [x8]			// ... observes the correct value
@@ -782,6 +763,15 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	adrp	x1, reserved_pg_dir
 	adrp	x2, init_idmap_pg_dir
 	bl	__enable_mmu
+
+	// Clear BSS
+	adrp	x0, __bss_start
+	mov	x1, xzr
+	adrp	x2, init_pg_end
+	sub	x2, x2, x0
+	bl	__pi_memset
+	dsb	ishst				// Make zero page visible to PTW
+
 #ifdef CONFIG_RELOCATABLE
 	adrp	x23, KERNEL_START
 	and	x23, x23, MIN_KIMG_ALIGN - 1
@@ -796,7 +786,6 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	orr	x23, x23, x0			// record kernel offset
 #endif
 #endif
-	bl	clear_page_tables
 	bl	create_kernel_mapping
 
 	adrp	x1, init_pg_dir
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 8dd5dda66f7c..8a3c6aacc355 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -311,12 +311,15 @@ SECTIONS
 	__pecoff_data_rawsize = ABSOLUTE(. - __initdata_begin);
 	_edata = .;
 
+	/* start of zero-init region */
 	BSS_SECTION(SBSS_ALIGN, 0, 0)
 
 	. = ALIGN(PAGE_SIZE);
 	init_pg_dir = .;
 	. += INIT_DIR_SIZE;
 	init_pg_end = .;
+	/* end of zero-init region */
+
 #ifdef CONFIG_RELOCATABLE
 	. += SZ_4K;		/* stack for the early relocation code */
 	early_init_stack = .;
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 07/43] arm64: Move feature overrides into the BSS section
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (5 preceding siblings ...)
  2024-02-14 12:28 ` [PATCH v8 06/43] arm64: head: Clear BSS and the kernel page tables in one go Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 08/43] arm64: head: Run feature override detection before mapping the kernel Ard Biesheuvel
                   ` (36 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

In order to allow the CPU feature override detection code to run even
earlier, move the feature override global variables into BSS, which is
the only part of the static kernel image that is mapped read-write in
the initial ID map.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/cpufeature.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 8d1a634a403e..a99ad79adee2 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -655,13 +655,13 @@ static const struct arm64_ftr_bits ftr_raz[] = {
 #define ARM64_FTR_REG(id, table)		\
 	__ARM64_FTR_REG_OVERRIDE(#id, id, table, &no_override)
 
-struct arm64_ftr_override __ro_after_init id_aa64mmfr1_override;
-struct arm64_ftr_override __ro_after_init id_aa64pfr0_override;
-struct arm64_ftr_override __ro_after_init id_aa64pfr1_override;
-struct arm64_ftr_override __ro_after_init id_aa64zfr0_override;
-struct arm64_ftr_override __ro_after_init id_aa64smfr0_override;
-struct arm64_ftr_override __ro_after_init id_aa64isar1_override;
-struct arm64_ftr_override __ro_after_init id_aa64isar2_override;
+struct arm64_ftr_override id_aa64mmfr1_override;
+struct arm64_ftr_override id_aa64pfr0_override;
+struct arm64_ftr_override id_aa64pfr1_override;
+struct arm64_ftr_override id_aa64zfr0_override;
+struct arm64_ftr_override id_aa64smfr0_override;
+struct arm64_ftr_override id_aa64isar1_override;
+struct arm64_ftr_override id_aa64isar2_override;
 
 struct arm64_ftr_override arm64_sw_feature_override;
 
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 08/43] arm64: head: Run feature override detection before mapping the kernel
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (6 preceding siblings ...)
  2024-02-14 12:28 ` [PATCH v8 07/43] arm64: Move feature overrides into the BSS section Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 09/43] arm64: head: move dynamic shadow call stack patching into early C runtime Ard Biesheuvel
                   ` (35 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

To permit the feature overrides to be taken into account before the
KASLR init code runs and the kernel mapping is created, move the
detection code to an earlier stage in the boot.

A subsequent patch will take advantage of this by merging the
preliminary and permanent mappings of the kernel text and data into a
single one that gets created and relocated before start_kernel() is
called.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S        | 17 +++++++++--------
 arch/arm64/kernel/vmlinux.lds.S |  4 +---
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 2af518161f3a..865ecc1f8255 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -375,9 +375,9 @@ SYM_FUNC_START_LOCAL(create_idmap)
 
 	map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
 
-	/* Remap BSS and the kernel page tables r/w in the ID map */
+	/* Remap [.init].data, BSS and the kernel page tables r/w in the ID map */
 	adrp	x1, _text
-	adrp	x2, __bss_start
+	adrp	x2, __initdata_begin
 	adrp	x3, _end
 	bic	x4, x2, #SWAPPER_BLOCK_SIZE - 1
 	mov_q	x5, SWAPPER_RW_MMUFLAGS
@@ -491,9 +491,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
 #endif
-	mov	x0, x20				// pass the full boot status
-	mov	x1, x22				// pass the low FDT mapping
-	bl	__pi_init_feature_override	// Parse cpu feature overrides
 #ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
 	bl	scs_patch_vmlinux
 #endif
@@ -772,12 +769,16 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	bl	__pi_memset
 	dsb	ishst				// Make zero page visible to PTW
 
-#ifdef CONFIG_RELOCATABLE
-	adrp	x23, KERNEL_START
-	and	x23, x23, MIN_KIMG_ALIGN - 1
 	adrp	x1, early_init_stack
 	mov	sp, x1
 	mov	x29, xzr
+	mov	x0, x20				// pass the full boot status
+	mov	x1, x22				// pass the low FDT mapping
+	bl	__pi_init_feature_override	// Parse cpu feature overrides
+
+#ifdef CONFIG_RELOCATABLE
+	adrp	x23, KERNEL_START
+	and	x23, x23, MIN_KIMG_ALIGN - 1
 #ifdef CONFIG_RANDOMIZE_BASE
 	mov	x0, x22
 	bl	__pi_kaslr_early_init
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 8a3c6aacc355..3afb4223a5e8 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -320,10 +320,8 @@ SECTIONS
 	init_pg_end = .;
 	/* end of zero-init region */
 
-#ifdef CONFIG_RELOCATABLE
-	. += SZ_4K;		/* stack for the early relocation code */
+	. += SZ_4K;		/* stack for the early C runtime */
 	early_init_stack = .;
-#endif
 
 	. = ALIGN(SEGMENT_ALIGN);
 	__pecoff_data_size = ABSOLUTE(. - __initdata_begin);
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 09/43] arm64: head: move dynamic shadow call stack patching into early C runtime
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (7 preceding siblings ...)
  2024-02-14 12:28 ` [PATCH v8 08/43] arm64: head: Run feature override detection before mapping the kernel Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 10/43] arm64: cpufeature: Add helper to test for CPU feature overrides Ard Biesheuvel
                   ` (34 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Once we update the early kernel mapping code to only map the kernel once
with the right permissions, we can no longer perform code patching via
this mapping.

So move this code to an earlier stage of the boot, right after applying
the relocations.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/scs.h           |  4 +--
 arch/arm64/kernel/Makefile             |  8 ------
 arch/arm64/kernel/head.S               |  8 +++---
 arch/arm64/kernel/module.c             |  2 +-
 arch/arm64/kernel/pi/Makefile          | 10 +++++---
 arch/arm64/kernel/{ => pi}/patch-scs.c | 26 ++++++++++----------
 6 files changed, 27 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/include/asm/scs.h b/arch/arm64/include/asm/scs.h
index 3fdae5fe3142..eca2ba5a6276 100644
--- a/arch/arm64/include/asm/scs.h
+++ b/arch/arm64/include/asm/scs.h
@@ -72,8 +72,8 @@ static inline void dynamic_scs_init(void)
 static inline void dynamic_scs_init(void) {}
 #endif
 
-int scs_patch(const u8 eh_frame[], int size);
-asmlinkage void scs_patch_vmlinux(void);
+int __pi_scs_patch(const u8 eh_frame[], int size);
+asmlinkage void __pi_scs_patch_vmlinux(void);
 
 #endif /* __ASSEMBLY __ */
 
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 4236f1e0fffa..14b4a179bad3 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -71,14 +71,6 @@ obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
 obj-$(CONFIG_ARM64_MTE)			+= mte.o
 obj-y					+= vdso-wrap.o
 obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
-obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)	+= patch-scs.o
-
-# We need to prevent the SCS patching code from patching itself. Using
-# -mbranch-protection=none here to avoid the patchable PAC opcodes from being
-# generated triggers an issue with full LTO on Clang, which stops emitting PAC
-# instructions altogether. So disable LTO as well for the compilation unit.
-CFLAGS_patch-scs.o			+= -mbranch-protection=none
-CFLAGS_REMOVE_patch-scs.o		+= $(CC_FLAGS_LTO)
 
 # Force dependency (vdso*-wrap.S includes vdso.so through incbin)
 $(obj)/vdso-wrap.o: $(obj)/vdso/vdso.so
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 865ecc1f8255..b320702032a7 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -490,9 +490,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
-#endif
-#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
-	bl	scs_patch_vmlinux
 #endif
 	mov	x0, x20
 	bl	finalise_el2			// Prefer VHE if possible
@@ -794,6 +791,11 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 #ifdef CONFIG_RELOCATABLE
 	mov	x0, x23
 	bl	__pi_relocate_kernel
+#endif
+#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
+	ldr	x0, =__eh_frame_start
+	ldr	x1, =__eh_frame_end
+	bl	__pi_scs_patch_vmlinux
 #endif
 	ldr	x8, =__primary_switched
 	adrp	x0, KERNEL_START		// __pa(KERNEL_START)
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index dd851297596e..47e0be610bb6 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -595,7 +595,7 @@ int module_finalize(const Elf_Ehdr *hdr,
 	if (scs_is_dynamic()) {
 		s = find_section(hdr, sechdrs, ".init.eh_frame");
 		if (s)
-			scs_patch((void *)s->sh_addr, s->sh_size);
+			__pi_scs_patch((void *)s->sh_addr, s->sh_size);
 	}
 
 	return module_init_ftrace_plt(hdr, sechdrs, me);
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index 7f6dfce893c3..a8b302245f15 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -38,7 +38,9 @@ $(obj)/lib-%.pi.o: OBJCOPYFLAGS += --prefix-alloc-sections=.init
 $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
-obj-y				:= idreg-override.pi.o lib-fdt.pi.o lib-fdt_ro.pi.o
-obj-$(CONFIG_RELOCATABLE)	+= relocate.pi.o
-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr_early.pi.o
-extra-y				:= $(patsubst %.pi.o,%.o,$(obj-y))
+obj-y					:= idreg-override.pi.o \
+					   lib-fdt.pi.o lib-fdt_ro.pi.o
+obj-$(CONFIG_RELOCATABLE)		+= relocate.pi.o
+obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr_early.pi.o
+obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)	+= patch-scs.pi.o
+extra-y					:= $(patsubst %.pi.o,%.o,$(obj-y))
diff --git a/arch/arm64/kernel/patch-scs.c b/arch/arm64/kernel/pi/patch-scs.c
similarity index 91%
rename from arch/arm64/kernel/patch-scs.c
rename to arch/arm64/kernel/pi/patch-scs.c
index a1fe4b4ff591..c65ef40d1e6b 100644
--- a/arch/arm64/kernel/patch-scs.c
+++ b/arch/arm64/kernel/pi/patch-scs.c
@@ -4,14 +4,11 @@
  * Author: Ard Biesheuvel <ardb@google.com>
  */
 
-#include <linux/bug.h>
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/linkage.h>
-#include <linux/printk.h>
 #include <linux/types.h>
 
-#include <asm/cacheflush.h>
 #include <asm/scs.h>
 
 //
@@ -81,7 +78,11 @@ static void __always_inline scs_patch_loc(u64 loc)
 		 */
 		return;
 	}
-	dcache_clean_pou(loc, loc + sizeof(u32));
+	if (IS_ENABLED(CONFIG_ARM64_WORKAROUND_CLEAN_CACHE))
+		asm("dc civac, %0" :: "r"(loc));
+	else
+		asm(ALTERNATIVE("dc cvau, %0", "nop", ARM64_HAS_CACHE_IDC)
+		    :: "r"(loc));
 }
 
 /*
@@ -128,10 +129,10 @@ struct eh_frame {
 	};
 };
 
-static int noinstr scs_handle_fde_frame(const struct eh_frame *frame,
-					bool fde_has_augmentation_data,
-					int code_alignment_factor,
-					bool dry_run)
+static int scs_handle_fde_frame(const struct eh_frame *frame,
+				bool fde_has_augmentation_data,
+				int code_alignment_factor,
+				bool dry_run)
 {
 	int size = frame->size - offsetof(struct eh_frame, opcodes) + 4;
 	u64 loc = (u64)offset_to_ptr(&frame->initial_loc);
@@ -198,14 +199,13 @@ static int noinstr scs_handle_fde_frame(const struct eh_frame *frame,
 			break;
 
 		default:
-			pr_err("unhandled opcode: %02x in FDE frame %lx\n", opcode[-1], (uintptr_t)frame);
 			return -ENOEXEC;
 		}
 	}
 	return 0;
 }
 
-int noinstr scs_patch(const u8 eh_frame[], int size)
+int scs_patch(const u8 eh_frame[], int size)
 {
 	const u8 *p = eh_frame;
 
@@ -251,12 +251,12 @@ int noinstr scs_patch(const u8 eh_frame[], int size)
 	return 0;
 }
 
-asmlinkage void __init scs_patch_vmlinux(void)
+asmlinkage void __init scs_patch_vmlinux(const u8 start[], const u8 end[])
 {
 	if (!should_patch_pac_into_scs())
 		return;
 
-	WARN_ON(scs_patch(__eh_frame_start, __eh_frame_end - __eh_frame_start));
-	icache_inval_all_pou();
+	scs_patch(start, end - start);
+	asm("ic ialluis");
 	isb();
 }
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 10/43] arm64: cpufeature: Add helper to test for CPU feature overrides
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (8 preceding siblings ...)
  2024-02-14 12:28 ` [PATCH v8 09/43] arm64: head: move dynamic shadow call stack patching into early C runtime Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 11/43] arm64: kaslr: Use feature override instead of parsing the cmdline again Ard Biesheuvel
                   ` (33 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Add some helpers to extract and apply feature overrides to the bare
idreg values. This involves inspecting the value and mask of the
specific field that we are interested in, given that an override
value/mask pair might be invalid for one field but valid for another.

Then, wire up the new helper for the hVHE test - note that we can drop
the sysreg test here, as the override will be invalid when trying to
enable hVHE on non-VHE capable hardware.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h | 39 ++++++++++++++++++++
 arch/arm64/kernel/cpufeature.c      |  9 +----
 2 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 21c824edf8ce..acd8f4949583 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -915,6 +915,45 @@ extern struct arm64_ftr_override id_aa64isar2_override;
 
 extern struct arm64_ftr_override arm64_sw_feature_override;
 
+static inline
+u64 arm64_apply_feature_override(u64 val, int feat, int width,
+				 const struct arm64_ftr_override *override)
+{
+	u64 oval = override->val;
+
+	/*
+	 * When it encounters an invalid override (e.g., an override that
+	 * cannot be honoured due to a missing CPU feature), the early idreg
+	 * override code will set the mask to 0x0 and the value to non-zero for
+	 * the field in question. In order to determine whether the override is
+	 * valid or not for the field we are interested in, we first need to
+	 * disregard bits belonging to other fields.
+	 */
+	oval &= GENMASK_ULL(feat + width - 1, feat);
+
+	/*
+	 * The override is valid if all value bits are accounted for in the
+	 * mask. If so, replace the masked bits with the override value.
+	 */
+	if (oval == (oval & override->mask)) {
+		val &= ~override->mask;
+		val |= oval;
+	}
+
+	/* Extract the field from the updated value */
+	return cpuid_feature_extract_unsigned_field(val, feat);
+}
+
+static inline bool arm64_test_sw_feature_override(int feat)
+{
+	/*
+	 * Software features are pseudo CPU features that have no underlying
+	 * CPUID system register value to apply the override to.
+	 */
+	return arm64_apply_feature_override(0, feat, 4,
+					    &arm64_sw_feature_override);
+}
+
 u32 get_kvm_ipa_limit(void);
 void dump_cpu_features(void);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index a99ad79adee2..d0ffb872a31a 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2042,14 +2042,7 @@ static bool has_nested_virt_support(const struct arm64_cpu_capabilities *cap,
 static bool hvhe_possible(const struct arm64_cpu_capabilities *entry,
 			  int __unused)
 {
-	u64 val;
-
-	val = read_sysreg(id_aa64mmfr1_el1);
-	if (!cpuid_feature_extract_unsigned_field(val, ID_AA64MMFR1_EL1_VH_SHIFT))
-		return false;
-
-	val = arm64_sw_feature_override.val & arm64_sw_feature_override.mask;
-	return cpuid_feature_extract_unsigned_field(val, ARM64_SW_FEATURE_OVERRIDE_HVHE);
+	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_HVHE);
 }
 
 #ifdef CONFIG_ARM64_PAN
-- 
2.43.0.687.g38aa6559b0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v8 11/43] arm64: kaslr: Use feature override instead of parsing the cmdline again
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (9 preceding siblings ...)
  2024-02-14 12:28 ` [PATCH v8 10/43] arm64: cpufeature: Add helper to test for CPU feature overrides Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 12/43] arm64: idreg-override: Create a pseudo feature for rodata=off Ard Biesheuvel
                   ` (32 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The early kaslr code open codes the detection of 'nokaslr' on the kernel
command line, and this is no longer necessary now that the feature
detection code, which also looks for the same string, executes before
this code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h |  5 ++
 arch/arm64/kernel/kaslr.c           |  4 +-
 arch/arm64/kernel/pi/kaslr_early.c  | 53 +-------------------
 3 files changed, 7 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index acd8f4949583..e309255b7f04 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -954,6 +954,11 @@ static inline bool arm64_test_sw_feature_override(int feat)
 					    &arm64_sw_feature_override);
 }
 
+static inline bool kaslr_disabled_cmdline(void)
+{
+	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_NOKASLR);
+}
+
 u32 get_kvm_ipa_limit(void);
 void dump_cpu_features(void);
 
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index 12c7f3c8ba76..1da3e25f9d9e 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -16,9 +16,7 @@ bool __ro_after_init __kaslr_is_enabled = false;
 
 void __init kaslr_init(void)
 {
-	if (cpuid_feature_extract_unsigned_field(arm64_sw_feature_override.val &
-						 arm64_sw_feature_override.mask,
-						 ARM64_SW_FEATURE_OVERRIDE_NOKASLR)) {
+	if (kaslr_disabled_cmdline()) {
 		pr_info("KASLR disabled on command line\n");
 		return;
 	}
diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
index 167081b30a15..f2305e276ec3 100644
--- a/arch/arm64/kernel/pi/kaslr_early.c
+++ b/arch/arm64/kernel/pi/kaslr_early.c
@@ -16,57 +16,6 @@
 #include <asm/memory.h>
 #include <asm/pgtable.h>
 
-/* taken from lib/string.c */
-static char *__init __strstr(const char *s1, const char *s2)
-{
-	size_t l1, l2;
-
-	l2 = strlen(s2);
-	if (!l2)
-		return (char *)s1;
-	l1 = strlen(s1);
-	while (l1 >= l2) {
-		l1--;
-		if (!memcmp(s1, s2, l2))
-			return (char *)s1;
-		s1++;
-	}
-	return NULL;
-}
-static bool __init cmdline_contains_nokaslr(const u8 *cmdline)
-{
-	const u8 *str;
-
-	str = __strstr(cmdline, "nokaslr");
-	return str == cmdline || (str > cmdline && *(str - 1) == ' ');
-}
-
-static bool __init is_kaslr_disabled_cmdline(void *fdt)
-{
-	if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
-		int node;
-		const u8 *prop;
-
-		node = fdt_path_offset(fdt, "/chosen");
-		if (node < 0)
-			goto out;
-
-		prop = fdt_getprop(fdt, node, "bootargs", NULL);
-		if (!prop)
-			goto out;
-
-		if (cmdline_contains_nokaslr(prop))
-			return true;
-
-		if (IS_ENABLED(CONFIG_CMDLINE_EXTEND))
-			goto out;
-
-		return false;
-	}
-out:
-	return cmdline_contains_nokaslr(CONFIG_CMDLINE);
-}
-
 static u64 __init get_kaslr_seed(void *fdt)
 {
 	static char const chosen_str[] __initconst = "chosen";
@@ -92,7 +41,7 @@ asmlinkage u64 __init kaslr_early_init(void *fdt)
 {
 	u64 seed, range;
 
-	if (is_kaslr_disabled_cmdline(fdt))
+	if (kaslr_disabled_cmdline())
 		return 0;
 
 	seed = get_kaslr_seed(fdt);
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 12/43] arm64: idreg-override: Create a pseudo feature for rodata=off
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (10 preceding siblings ...)
  2024-02-14 12:28 ` [PATCH v8 11/43] arm64: kaslr: Use feature override instead of parsing the cmdline again Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:28 ` [PATCH v8 13/43] arm64: Add helpers to probe local CPU for PAC and BTI support Ard Biesheuvel
                   ` (31 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Add rodata=off to the set of kernel command line options that is parsed
early using the CPU feature override detection code, so we can easily
refer to it when creating the kernel mapping.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h   | 1 +
 arch/arm64/kernel/pi/idreg-override.c | 2 ++
 2 files changed, 3 insertions(+)
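
Not part of the patch: a rough sketch of how a command line token ends
up as a 4-bit pseudo feature in the software override word. The shifts
match the defines in asm/cpufeature.h after this patch; struct
sw_override, sw_aliases[], sw_parse_token() and sw_feature() are
hypothetical names. The real parser in idreg-override.c expands the
alias into "arm64_sw.rodataoff=1" and parses that string, which this
shortcut skips.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shifts as defined in asm/cpufeature.h after this patch. */
#define SW_NOKASLR_SHIFT	0
#define SW_HVHE_SHIFT		4
#define SW_RODATA_OFF_SHIFT	8

/* Simplified stand-in for arm64_sw_feature_override (assumption). */
struct sw_override {
	uint64_t val;
	uint64_t mask;
};

/* Hypothetical alias table: command line token -> field shift. */
static const struct {
	const char *token;
	int shift;
} sw_aliases[] = {
	{ "nokaslr",	SW_NOKASLR_SHIFT },
	{ "rodata=off",	SW_RODATA_OFF_SHIFT },
};

/* Record "<field>=1" for a recognised token; returns 1 if it matched. */
static int sw_parse_token(struct sw_override *ovr, const char *token)
{
	size_t i;

	for (i = 0; i < sizeof(sw_aliases) / sizeof(sw_aliases[0]); i++) {
		if (strcmp(token, sw_aliases[i].token))
			continue;
		ovr->val  |= UINT64_C(1) << sw_aliases[i].shift;
		ovr->mask |= UINT64_C(0xf) << sw_aliases[i].shift;
		return 1;
	}
	return 0;
}

/* Read back one 4-bit pseudo feature, honouring the mask. */
static unsigned int sw_feature(const struct sw_override *ovr, int shift)
{
	return ((ovr->val & ovr->mask) >> shift) & 0xf;
}
```

After parsing "rodata=off", the override value is 0x100 with mask 0xf00,
so the RODATA_OFF field reads as 1 and the NOKASLR field remains 0.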

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index e309255b7f04..03c34242bfc7 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -17,6 +17,7 @@
 
 #define ARM64_SW_FEATURE_OVERRIDE_NOKASLR	0
 #define ARM64_SW_FEATURE_OVERRIDE_HVHE		4
+#define ARM64_SW_FEATURE_OVERRIDE_RODATA_OFF	8
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/kernel/pi/idreg-override.c b/arch/arm64/kernel/pi/idreg-override.c
index f9e05c10faab..e4bcabcc6860 100644
--- a/arch/arm64/kernel/pi/idreg-override.c
+++ b/arch/arm64/kernel/pi/idreg-override.c
@@ -159,6 +159,7 @@ static const struct ftr_set_desc sw_features __prel64_initconst = {
 	.fields		= {
 		FIELD("nokaslr", ARM64_SW_FEATURE_OVERRIDE_NOKASLR, NULL),
 		FIELD("hvhe", ARM64_SW_FEATURE_OVERRIDE_HVHE, hvhe_filter),
+		FIELD("rodataoff", ARM64_SW_FEATURE_OVERRIDE_RODATA_OFF, NULL),
 		{}
 	},
 };
@@ -190,6 +191,7 @@ static const struct {
 	{ "arm64.nomops",		"id_aa64isar2.mops=0" },
 	{ "arm64.nomte",		"id_aa64pfr1.mte=0" },
 	{ "nokaslr",			"arm64_sw.nokaslr=1" },
+	{ "rodata=off",			"arm64_sw.rodataoff=1" },
 };
 
 static int __init parse_hexdigit(const char *p, u64 *v)
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 13/43] arm64: Add helpers to probe local CPU for PAC and BTI support
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (11 preceding siblings ...)
  2024-02-14 12:28 ` [PATCH v8 12/43] arm64: idreg-override: Create a pseudo feature for rodata=off Ard Biesheuvel
@ 2024-02-14 12:28 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 14/43] arm64: head: allocate more pages for the kernel mapping Ard Biesheuvel
                   ` (30 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:28 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Add some helpers that will be used by the early kernel mapping code to
check feature support on the local CPU. This permits the early kernel
mapping to be created with the right attributes, removing the need for
tearing it down and recreating it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h | 32 ++++++++++++++++++++
 1 file changed, 32 insertions(+)
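
Not part of the patch: a small model of the PAC probe, leaving out the
Kconfig check and the override handling so that only the field logic
remains. The field positions below are assumptions based on my reading
of the Arm ARM (APA at bit 4 and API at bit 8 of ID_AA64ISAR1_EL1, APA3
at bit 12 of ID_AA64ISAR2_EL1); field4() and has_pac() are hypothetical
names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field positions; check them against the Arm ARM. */
#define ISAR1_APA_SHIFT		4
#define ISAR1_API_SHIFT		8
#define ISAR2_APA3_SHIFT	12

/* Extract an unsigned 4-bit ID register field. */
static unsigned int field4(uint64_t reg, int shift)
{
	return (reg >> shift) & 0xf;
}

/*
 * Hypothetical model of cpu_has_pac() without the config check and
 * without overrides: any of the three address authentication fields
 * being non-zero means PAC is implemented.
 */
static bool has_pac(uint64_t isar1, uint64_t isar2)
{
	return field4(isar1, ISAR1_APA_SHIFT) ||
	       field4(isar1, ISAR1_API_SHIFT) ||
	       field4(isar2, ISAR2_APA3_SHIFT);
}
```

In the patch itself each field additionally passes through
arm64_apply_feature_override(), so an override that clears these fields
makes the probe report no PAC even on capable hardware.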

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 03c34242bfc7..e3edae1825f3 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -963,6 +963,38 @@ static inline bool kaslr_disabled_cmdline(void)
 u32 get_kvm_ipa_limit(void);
 void dump_cpu_features(void);
 
+static inline bool cpu_has_bti(void)
+{
+	if (!IS_ENABLED(CONFIG_ARM64_BTI))
+		return false;
+
+	return arm64_apply_feature_override(read_cpuid(ID_AA64PFR1_EL1),
+					    ID_AA64PFR1_EL1_BT_SHIFT, 4,
+					    &id_aa64pfr1_override);
+}
+
+static inline bool cpu_has_pac(void)
+{
+	u64 isar1, isar2;
+
+	if (!IS_ENABLED(CONFIG_ARM64_PTR_AUTH))
+		return false;
+
+	isar1 = read_cpuid(ID_AA64ISAR1_EL1);
+	isar2 = read_cpuid(ID_AA64ISAR2_EL1);
+
+	if (arm64_apply_feature_override(isar1, ID_AA64ISAR1_EL1_APA_SHIFT, 4,
+					 &id_aa64isar1_override))
+		return true;
+
+	if (arm64_apply_feature_override(isar1, ID_AA64ISAR1_EL1_API_SHIFT, 4,
+					 &id_aa64isar1_override))
+		return true;
+
+	return arm64_apply_feature_override(isar2, ID_AA64ISAR2_EL1_APA3_SHIFT, 4,
+					    &id_aa64isar2_override);
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 14/43] arm64: head: allocate more pages for the kernel mapping
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (12 preceding siblings ...)
  2024-02-14 12:28 ` [PATCH v8 13/43] arm64: Add helpers to probe local CPU for PAC and BTI support Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 15/43] arm64: head: move memstart_offset_seed handling to C code Ard Biesheuvel
                   ` (29 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

In preparation for switching to an early kernel mapping routine that
maps each segment according to its precise boundaries, and with the
correct attributes, let's allocate some extra pages for page tables for
the 4k page size configuration. This is necessary because the start and
end of each segment may not be aligned to the block size, and so we'll
need an extra page table at each segment boundary.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/kernel-pgtable.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
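
Not part of the patch: a sketch of where the KERNEL_SEGMENT_COUNT + 1
bound comes from. Five contiguous segments have six boundaries, and each
boundary that is not block aligned needs a next level table for the
partially covered block. partial_block_tables() is a hypothetical
helper; it assumes the boundaries are sorted in ascending order, and the
2 MiB block size mentioned below is an example value.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the next level tables needed for blocks that are only partially
 * covered by a segment. 'bounds' holds the segment boundaries in
 * ascending order. Boundaries inside the same block share one table.
 */
static unsigned int partial_block_tables(const uint64_t *bounds, size_t n,
					 uint64_t block_size)
{
	unsigned int count = 0;
	uint64_t last_block = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t block = bounds[i] / block_size;

		if (bounds[i] % block_size == 0)
			continue;	/* aligned: no partial block here */
		if (count && block == last_block)
			continue;	/* shares the previous boundary's table */
		last_block = block;
		count++;
	}
	return count;
}
```

With a 2 MiB block size, six unaligned boundaries that all fall into
different blocks give the worst case of six extra tables, which is the
value of EARLY_SEGMENT_EXTRA_PAGES when SWAPPER_BLOCK_SIZE exceeds
SEGMENT_ALIGN.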

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 83ddb14b95a5..0631604995ee 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -68,7 +68,7 @@
 			+ EARLY_PGDS((vstart), (vend), add) 	/* each PGDIR needs a next level page table */	\
 			+ EARLY_PUDS((vstart), (vend), add)	/* each PUD needs a next level page table */	\
 			+ EARLY_PMDS((vstart), (vend), add))	/* each PMD needs a next level page table */
-#define INIT_DIR_SIZE (PAGE_SIZE * EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE))
+#define INIT_DIR_SIZE (PAGE_SIZE * (EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE) + EARLY_SEGMENT_EXTRA_PAGES))
 
 /* the initial ID map may need two extra pages if it needs to be extended */
 #if VA_BITS < 48
@@ -89,6 +89,15 @@
 #define SWAPPER_TABLE_SHIFT	PMD_SHIFT
 #endif
 
+/* The number of segments in the kernel image (text, rodata, inittext, initdata, data+bss) */
+#define KERNEL_SEGMENT_COUNT	5
+
+#if SWAPPER_BLOCK_SIZE > SEGMENT_ALIGN
+#define EARLY_SEGMENT_EXTRA_PAGES (KERNEL_SEGMENT_COUNT + 1)
+#else
+#define EARLY_SEGMENT_EXTRA_PAGES 0
+#endif
+
 /*
  * Initial memory map attributes.
  */
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 15/43] arm64: head: move memstart_offset_seed handling to C code
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (13 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 14/43] arm64: head: allocate more pages for the kernel mapping Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 16/43] arm64: mm: Make kaslr_requires_kpti() a static inline Ard Biesheuvel
                   ` (28 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Now that we can set BSS variables from the early code running from the
ID map, we can set memstart_offset_seed directly from the C code that
derives the value instead of passing it back and forth between C and asm
code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/head.S           | 7 -------
 arch/arm64/kernel/image-vars.h     | 1 +
 arch/arm64/kernel/pi/kaslr_early.c | 4 ++++
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index b320702032a7..aa7766dc64d9 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -82,7 +82,6 @@
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
 	 *  x22        create_idmap() .. start_kernel()         ID map VA of the DT blob
 	 *  x23        __primary_switch()                       physical misalignment/KASLR offset
-	 *  x24        __primary_switch()                       linear map KASLR seed
 	 *  x25        primary_entry() .. start_kernel()        supported VA size
 	 *  x28        create_idmap()                           callee preserved temp register
 	 */
@@ -483,11 +482,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	str	x25, [x8]			// ... observes the correct value
 	dc	civac, x8			// Make visible to booting secondaries
 #endif
-
-#ifdef CONFIG_RANDOMIZE_BASE
-	adrp	x5, memstart_offset_seed	// Save KASLR linear map seed
-	strh	w24, [x5, :lo12:memstart_offset_seed]
-#endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
 #endif
@@ -779,7 +773,6 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 #ifdef CONFIG_RANDOMIZE_BASE
 	mov	x0, x22
 	bl	__pi_kaslr_early_init
-	and	x24, x0, #SZ_2M - 1		// capture memstart offset seed
 	bic	x0, x0, #SZ_2M - 1
 	orr	x23, x23, x0			// record kernel offset
 #endif
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index eacc3d167733..8d96052079e8 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -46,6 +46,7 @@ PROVIDE(__pi_id_aa64smfr0_override	= id_aa64smfr0_override);
 PROVIDE(__pi_id_aa64zfr0_override	= id_aa64zfr0_override);
 PROVIDE(__pi_arm64_sw_feature_override	= arm64_sw_feature_override);
 PROVIDE(__pi__ctype			= _ctype);
+PROVIDE(__pi_memstart_offset_seed	= memstart_offset_seed);
 
 #ifdef CONFIG_KVM
 
diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
index f2305e276ec3..eeecee7ffd6f 100644
--- a/arch/arm64/kernel/pi/kaslr_early.c
+++ b/arch/arm64/kernel/pi/kaslr_early.c
@@ -16,6 +16,8 @@
 #include <asm/memory.h>
 #include <asm/pgtable.h>
 
+extern u16 memstart_offset_seed;
+
 static u64 __init get_kaslr_seed(void *fdt)
 {
 	static char const chosen_str[] __initconst = "chosen";
@@ -51,6 +53,8 @@ asmlinkage u64 __init kaslr_early_init(void *fdt)
 			return 0;
 	}
 
+	memstart_offset_seed = seed & U16_MAX;
+
 	/*
 	 * OK, so we are proceeding with KASLR enabled. Calculate a suitable
 	 * kernel image offset from the seed. Let's place the kernel in the
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 16/43] arm64: mm: Make kaslr_requires_kpti() a static inline
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (14 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 15/43] arm64: head: move memstart_offset_seed handling to C code Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 17/43] arm64: mmu: Make __cpu_replace_ttbr1() out of line Ard Biesheuvel
                   ` (27 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

In preparation for moving the first assignment of arm64_use_ng_mappings
to an earlier stage in the boot, ensure that kaslr_requires_kpti() is
accessible without relying on the core kernel's view on whether or not
KASLR is enabled. So make it a static inline, and move the
kaslr_enabled() check out of it and into the callers, one of which will
disappear in a subsequent patch.

Once/when support for the obsolete ThunderX 1 platform is dropped, this
check reduces to an E0PD feature check on the local CPU.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/mmu.h   | 38 +++++++++++++++++-
 arch/arm64/kernel/cpufeature.c | 42 +-------------------
 arch/arm64/kernel/setup.c      |  2 +-
 3 files changed, 39 insertions(+), 43 deletions(-)
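
Not part of the patch: a toy model of the split described above, where
the local CPU check no longer consults kaslr_enabled() and the callers
combine the two conditions themselves. struct cpu_view and both model_*
functions are hypothetical; the booleans stand in for the ID register
and MIDR checks together with their Kconfig guards.

```c
#include <stdbool.h>

/* Hypothetical inputs standing in for the ID register and MIDR checks. */
struct cpu_view {
	bool has_e0pd;			/* E0PD implemented and configured */
	bool cavium_27456_affected;	/* MIDR is in the erratum list */
};

/* Local CPU check only; knows nothing about whether KASLR is active. */
static bool model_kaslr_requires_kpti(const struct cpu_view *cpu)
{
	if (cpu->has_e0pd)
		return false;	/* E0PD does a similar job to KPTI */
	if (cpu->cavium_27456_affected)
		return false;	/* incompatible with KPTI */
	return true;
}

/* What the callers compute after this patch. */
static bool model_use_ng_mappings(bool kaslr_enabled,
				  const struct cpu_view *cpu)
{
	return kaslr_enabled && model_kaslr_requires_kpti(cpu);
}
```

The point of the split is that early code can evaluate the CPU check
before the core kernel has decided whether KASLR is in effect.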

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 2fcf51231d6e..d0b8b4b413b6 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -71,7 +71,43 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 			       pgprot_t prot, bool page_mappings_only);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot);
 extern void mark_linear_text_alias_ro(void);
-extern bool kaslr_requires_kpti(void);
+
+/*
+ * This check is triggered during the early boot before the cpufeature
+ * is initialised. Checking the status on the local CPU allows the boot
+ * CPU to detect the need for non-global mappings and thus avoiding a
+ * pagetable re-write after all the CPUs are booted. This check will be
+ * anyway run on individual CPUs, allowing us to get the consistent
+ * state once the SMP CPUs are up and thus make the switch to non-global
+ * mappings if required.
+ */
+static inline bool kaslr_requires_kpti(void)
+{
+	/*
+	 * E0PD does a similar job to KPTI so can be used instead
+	 * where available.
+	 */
+	if (IS_ENABLED(CONFIG_ARM64_E0PD)) {
+		u64 mmfr2 = read_sysreg_s(SYS_ID_AA64MMFR2_EL1);
+		if (cpuid_feature_extract_unsigned_field(mmfr2,
+						ID_AA64MMFR2_EL1_E0PD_SHIFT))
+			return false;
+	}
+
+	/*
+	 * Systems affected by Cavium erratum 27456 are incompatible
+	 * with KPTI.
+	 */
+	if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
+		extern const struct midr_range cavium_erratum_27456_cpus[];
+
+		if (is_midr_in_range_list(read_cpuid_id(),
+					  cavium_erratum_27456_cpus))
+			return false;
+	}
+
+	return true;
+}
 
 #define INIT_MM_CONTEXT(name)	\
 	.pgd = init_pg_dir,
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index d0ffb872a31a..7064cf13f226 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1620,46 +1620,6 @@ has_useable_cnp(const struct arm64_cpu_capabilities *entry, int scope)
 	return has_cpuid_feature(entry, scope);
 }
 
-/*
- * This check is triggered during the early boot before the cpufeature
- * is initialised. Checking the status on the local CPU allows the boot
- * CPU to detect the need for non-global mappings and thus avoiding a
- * pagetable re-write after all the CPUs are booted. This check will be
- * anyway run on individual CPUs, allowing us to get the consistent
- * state once the SMP CPUs are up and thus make the switch to non-global
- * mappings if required.
- */
-bool kaslr_requires_kpti(void)
-{
-	if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
-		return false;
-
-	/*
-	 * E0PD does a similar job to KPTI so can be used instead
-	 * where available.
-	 */
-	if (IS_ENABLED(CONFIG_ARM64_E0PD)) {
-		u64 mmfr2 = read_sysreg_s(SYS_ID_AA64MMFR2_EL1);
-		if (cpuid_feature_extract_unsigned_field(mmfr2,
-						ID_AA64MMFR2_EL1_E0PD_SHIFT))
-			return false;
-	}
-
-	/*
-	 * Systems affected by Cavium erratum 24756 are incompatible
-	 * with KPTI.
-	 */
-	if (IS_ENABLED(CONFIG_CAVIUM_ERRATUM_27456)) {
-		extern const struct midr_range cavium_erratum_27456_cpus[];
-
-		if (is_midr_in_range_list(read_cpuid_id(),
-					  cavium_erratum_27456_cpus))
-			return false;
-	}
-
-	return kaslr_enabled();
-}
-
 static bool __meltdown_safe = true;
 static int __kpti_forced; /* 0: not forced, >0: forced on, <0: forced off */
 
@@ -1712,7 +1672,7 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
 	}
 
 	/* Useful for KASLR robustness */
-	if (kaslr_requires_kpti()) {
+	if (kaslr_enabled() && kaslr_requires_kpti()) {
 		if (!__kpti_forced) {
 			str = "KASLR";
 			__kpti_forced = 1;
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 97d2143669cf..0ef45d1927b3 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -288,7 +288,7 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
 	 * mappings from the start, avoiding the cost of rewriting
 	 * everything later.
 	 */
-	arm64_use_ng_mappings = kaslr_requires_kpti();
+	arm64_use_ng_mappings = kaslr_enabled() && kaslr_requires_kpti();
 
 	early_fixmap_init();
 	early_ioremap_init();
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 17/43] arm64: mmu: Make __cpu_replace_ttbr1() out of line
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (15 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 16/43] arm64: mm: Make kaslr_requires_kpti() a static inline Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 18/43] arm64: head: Move early kernel mapping routines into C code Ard Biesheuvel
                   ` (26 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

__cpu_replace_ttbr1() is a static inline, and so it gets instantiated
wherever it is used. This is not really necessary, as it is never called
on a hot path. It also has the unfortunate side effect that the symbol
idmap_cpu_replace_ttbr1 may never be referenced from kCFI enabled C
code, and this means the type id symbol may not exist either.  This will
result in a build error once we start referring to this symbol from asm
code as well. (Note that this problem only occurs when CnP, KAsan and
suspend/resume are all disabled in the Kconfig, but that is a valid
config, if unusual.)

So let's just move it out of line so all callers will share the same
implementation, which will reference idmap_cpu_replace_ttbr1
unconditionally.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/mmu_context.h | 32 +-------------------
 arch/arm64/mm/mmu.c                  | 32 ++++++++++++++++++++
 2 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 9ce4200508b1..926fbbcecbe0 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -148,37 +148,7 @@ static inline void cpu_install_ttbr0(phys_addr_t ttbr0, unsigned long t0sz)
 	isb();
 }
 
-/*
- * Atomically replaces the active TTBR1_EL1 PGD with a new VA-compatible PGD,
- * avoiding the possibility of conflicting TLB entries being allocated.
- */
-static inline void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp)
-{
-	typedef void (ttbr_replace_func)(phys_addr_t);
-	extern ttbr_replace_func idmap_cpu_replace_ttbr1;
-	ttbr_replace_func *replace_phys;
-	unsigned long daif;
-
-	/* phys_to_ttbr() zeros lower 2 bits of ttbr with 52-bit PA */
-	phys_addr_t ttbr1 = phys_to_ttbr(virt_to_phys(pgdp));
-
-	if (cnp)
-		ttbr1 |= TTBR_CNP_BIT;
-
-	replace_phys = (void *)__pa_symbol(idmap_cpu_replace_ttbr1);
-
-	__cpu_install_idmap(idmap);
-
-	/*
-	 * We really don't want to take *any* exceptions while TTBR1 is
-	 * in the process of being replaced so mask everything.
-	 */
-	daif = local_daif_save();
-	replace_phys(ttbr1);
-	local_daif_restore(daif);
-
-	cpu_uninstall_idmap();
-}
+void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp);
 
 static inline void cpu_enable_swapper_cnp(void)
 {
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 1ac7467d34c9..f9332eea318f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1486,3 +1486,35 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte
 {
 	set_pte_at(vma->vm_mm, addr, ptep, pte);
 }
+
+/*
+ * Atomically replaces the active TTBR1_EL1 PGD with a new VA-compatible PGD,
+ * avoiding the possibility of conflicting TLB entries being allocated.
+ */
+void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp)
+{
+	typedef void (ttbr_replace_func)(phys_addr_t);
+	extern ttbr_replace_func idmap_cpu_replace_ttbr1;
+	ttbr_replace_func *replace_phys;
+	unsigned long daif;
+
+	/* phys_to_ttbr() zeros lower 2 bits of ttbr with 52-bit PA */
+	phys_addr_t ttbr1 = phys_to_ttbr(virt_to_phys(pgdp));
+
+	if (cnp)
+		ttbr1 |= TTBR_CNP_BIT;
+
+	replace_phys = (void *)__pa_symbol(idmap_cpu_replace_ttbr1);
+
+	__cpu_install_idmap(idmap);
+
+	/*
+	 * We really don't want to take *any* exceptions while TTBR1 is
+	 * in the process of being replaced so mask everything.
+	 */
+	daif = local_daif_save();
+	replace_phys(ttbr1);
+	local_daif_restore(daif);
+
+	cpu_uninstall_idmap();
+}
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 18/43] arm64: head: Move early kernel mapping routines into C code
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (16 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 17/43] arm64: mmu: Make __cpu_replace_ttbr1() out of line Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 19/43] arm64: mm: Use 48-bit virtual addressing for the permanent ID map Ard Biesheuvel
                   ` (25 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The asm version of the kernel mapping code works fine for creating a
coarse grained identity map, but for mapping the kernel down to its
exact boundaries with the right attributes, it is not suitable. This is
why we create a preliminary RWX kernel mapping first, and then rebuild
it from scratch later on.

So let's reimplement this in C, in a way that will make it unnecessary
to create the kernel page tables yet another time in paging_init().

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/archrandom.h   |   2 -
 arch/arm64/include/asm/scs.h          |  32 +---
 arch/arm64/kernel/head.S              |  52 +------
 arch/arm64/kernel/image-vars.h        |  19 +++
 arch/arm64/kernel/pi/Makefile         |   1 +
 arch/arm64/kernel/pi/idreg-override.c |  22 ++-
 arch/arm64/kernel/pi/kaslr_early.c    |  12 +-
 arch/arm64/kernel/pi/map_kernel.c     | 164 ++++++++++++++++++++
 arch/arm64/kernel/pi/map_range.c      |  88 +++++++++++
 arch/arm64/kernel/pi/patch-scs.c      |  16 +-
 arch/arm64/kernel/pi/pi.h             |  14 ++
 arch/arm64/kernel/pi/relocate.c       |   2 +
 arch/arm64/kernel/setup.c             |   7 -
 arch/arm64/kernel/vmlinux.lds.S       |   4 +-
 arch/arm64/mm/proc.S                  |   1 +
 15 files changed, 315 insertions(+), 121 deletions(-)

diff --git a/arch/arm64/include/asm/archrandom.h b/arch/arm64/include/asm/archrandom.h
index ecdb3cfcd0f8..8babfbe31f95 100644
--- a/arch/arm64/include/asm/archrandom.h
+++ b/arch/arm64/include/asm/archrandom.h
@@ -129,6 +129,4 @@ static inline bool __init __early_cpu_has_rndr(void)
 	return (ftr >> ID_AA64ISAR0_EL1_RNDR_SHIFT) & 0xf;
 }
 
-u64 kaslr_early_init(void *fdt);
-
 #endif /* _ASM_ARCHRANDOM_H */
diff --git a/arch/arm64/include/asm/scs.h b/arch/arm64/include/asm/scs.h
index eca2ba5a6276..2e010ea76be2 100644
--- a/arch/arm64/include/asm/scs.h
+++ b/arch/arm64/include/asm/scs.h
@@ -33,37 +33,11 @@
 #include <asm/cpufeature.h>
 
 #ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
-static inline bool should_patch_pac_into_scs(void)
-{
-	u64 reg;
-
-	/*
-	 * We only enable the shadow call stack dynamically if we are running
-	 * on a system that does not implement PAC or BTI. PAC and SCS provide
-	 * roughly the same level of protection, and BTI relies on the PACIASP
-	 * instructions serving as landing pads, preventing us from patching
-	 * those instructions into something else.
-	 */
-	reg = read_sysreg_s(SYS_ID_AA64ISAR1_EL1);
-	if (SYS_FIELD_GET(ID_AA64ISAR1_EL1, APA, reg) |
-	    SYS_FIELD_GET(ID_AA64ISAR1_EL1, API, reg))
-		return false;
-
-	reg = read_sysreg_s(SYS_ID_AA64ISAR2_EL1);
-	if (SYS_FIELD_GET(ID_AA64ISAR2_EL1, APA3, reg))
-		return false;
-
-	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) {
-		reg = read_sysreg_s(SYS_ID_AA64PFR1_EL1);
-		if (reg & (0xf << ID_AA64PFR1_EL1_BT_SHIFT))
-			return false;
-	}
-	return true;
-}
-
 static inline void dynamic_scs_init(void)
 {
-	if (should_patch_pac_into_scs()) {
+	extern bool __pi_dynamic_scs_is_enabled;
+
+	if (__pi_dynamic_scs_is_enabled) {
 		pr_info("Enabling dynamic shadow call stack\n");
 		static_branch_enable(&dynamic_scs_enabled);
 	}
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index aa7766dc64d9..ffacce7b5a02 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -81,7 +81,6 @@
 	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
 	 *  x22        create_idmap() .. start_kernel()         ID map VA of the DT blob
-	 *  x23        __primary_switch()                       physical misalignment/KASLR offset
 	 *  x25        primary_entry() .. start_kernel()        supported VA size
 	 *  x28        create_idmap()                           callee preserved temp register
 	 */
@@ -408,24 +407,6 @@ SYM_FUNC_START_LOCAL(create_idmap)
 0:	ret	x28
 SYM_FUNC_END(create_idmap)
 
-SYM_FUNC_START_LOCAL(create_kernel_mapping)
-	adrp	x0, init_pg_dir
-	mov_q	x5, KIMAGE_VADDR		// compile time __va(_text)
-#ifdef CONFIG_RELOCATABLE
-	add	x5, x5, x23			// add KASLR displacement
-#endif
-	adrp	x6, _end			// runtime __pa(_end)
-	adrp	x3, _text			// runtime __pa(_text)
-	sub	x6, x6, x3			// _end - _text
-	add	x6, x6, x5			// runtime __va(_end)
-	mov_q	x7, SWAPPER_RW_MMUFLAGS
-
-	map_memory x0, x1, x5, x6, x7, x3, (VA_BITS - PGDIR_SHIFT), x10, x11, x12, x13, x14
-
-	dsb	ishst				// sync with page table walker
-	ret
-SYM_FUNC_END(create_kernel_mapping)
-
 	/*
 	 * Initialize CPU registers with task-specific and cpu-specific context.
 	 *
@@ -752,44 +733,13 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	adrp	x2, init_idmap_pg_dir
 	bl	__enable_mmu
 
-	// Clear BSS
-	adrp	x0, __bss_start
-	mov	x1, xzr
-	adrp	x2, init_pg_end
-	sub	x2, x2, x0
-	bl	__pi_memset
-	dsb	ishst				// Make zero page visible to PTW
-
 	adrp	x1, early_init_stack
 	mov	sp, x1
 	mov	x29, xzr
 	mov	x0, x20				// pass the full boot status
 	mov	x1, x22				// pass the low FDT mapping
-	bl	__pi_init_feature_override	// Parse cpu feature overrides
-
-#ifdef CONFIG_RELOCATABLE
-	adrp	x23, KERNEL_START
-	and	x23, x23, MIN_KIMG_ALIGN - 1
-#ifdef CONFIG_RANDOMIZE_BASE
-	mov	x0, x22
-	bl	__pi_kaslr_early_init
-	bic	x0, x0, #SZ_2M - 1
-	orr	x23, x23, x0			// record kernel offset
-#endif
-#endif
-	bl	create_kernel_mapping
+	bl	__pi_early_map_kernel		// Map and relocate the kernel
 
-	adrp	x1, init_pg_dir
-	load_ttbr1 x1, x1, x2
-#ifdef CONFIG_RELOCATABLE
-	mov	x0, x23
-	bl	__pi_relocate_kernel
-#endif
-#ifdef CONFIG_UNWIND_PATCH_PAC_INTO_SCS
-	ldr	x0, =__eh_frame_start
-	ldr	x1, =__eh_frame_end
-	bl	__pi_scs_patch_vmlinux
-#endif
 	ldr	x8, =__primary_switched
 	adrp	x0, KERNEL_START		// __pa(KERNEL_START)
 	br	x8
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 8d96052079e8..e566b32f9c22 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -45,9 +45,28 @@ PROVIDE(__pi_id_aa64pfr1_override	= id_aa64pfr1_override);
 PROVIDE(__pi_id_aa64smfr0_override	= id_aa64smfr0_override);
 PROVIDE(__pi_id_aa64zfr0_override	= id_aa64zfr0_override);
 PROVIDE(__pi_arm64_sw_feature_override	= arm64_sw_feature_override);
+PROVIDE(__pi_arm64_use_ng_mappings	= arm64_use_ng_mappings);
+#ifdef CONFIG_CAVIUM_ERRATUM_27456
+PROVIDE(__pi_cavium_erratum_27456_cpus	= cavium_erratum_27456_cpus);
+#endif
 PROVIDE(__pi__ctype			= _ctype);
 PROVIDE(__pi_memstart_offset_seed	= memstart_offset_seed);
 
+PROVIDE(__pi_init_pg_dir		= init_pg_dir);
+PROVIDE(__pi_init_pg_end		= init_pg_end);
+
+PROVIDE(__pi__text			= _text);
+PROVIDE(__pi__stext               	= _stext);
+PROVIDE(__pi__etext               	= _etext);
+PROVIDE(__pi___start_rodata       	= __start_rodata);
+PROVIDE(__pi___inittext_begin     	= __inittext_begin);
+PROVIDE(__pi___inittext_end       	= __inittext_end);
+PROVIDE(__pi___initdata_begin     	= __initdata_begin);
+PROVIDE(__pi___initdata_end       	= __initdata_end);
+PROVIDE(__pi__data                	= _data);
+PROVIDE(__pi___bss_start		= __bss_start);
+PROVIDE(__pi__end			= _end);
+
 #ifdef CONFIG_KVM
 
 /*
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index a8b302245f15..8c2f80a46b93 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -39,6 +39,7 @@ $(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
 obj-y					:= idreg-override.pi.o \
+					   map_kernel.pi.o map_range.pi.o \
 					   lib-fdt.pi.o lib-fdt_ro.pi.o
 obj-$(CONFIG_RELOCATABLE)		+= relocate.pi.o
 obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr_early.pi.o
diff --git a/arch/arm64/kernel/pi/idreg-override.c b/arch/arm64/kernel/pi/idreg-override.c
index e4bcabcc6860..1884bd936c0d 100644
--- a/arch/arm64/kernel/pi/idreg-override.c
+++ b/arch/arm64/kernel/pi/idreg-override.c
@@ -308,37 +308,35 @@ static __init void __parse_cmdline(const char *cmdline, bool parse_aliases)
 	} while (1);
 }
 
-static __init const u8 *get_bootargs_cmdline(const void *fdt)
+static __init const u8 *get_bootargs_cmdline(const void *fdt, int node)
 {
+	static char const bootargs[] __initconst = "bootargs";
 	const u8 *prop;
-	int node;
 
-	node = fdt_path_offset(fdt, "/chosen");
 	if (node < 0)
 		return NULL;
 
-	prop = fdt_getprop(fdt, node, "bootargs", NULL);
+	prop = fdt_getprop(fdt, node, bootargs, NULL);
 	if (!prop)
 		return NULL;
 
 	return strlen(prop) ? prop : NULL;
 }
 
-static __init void parse_cmdline(const void *fdt)
+static __init void parse_cmdline(const void *fdt, int chosen)
 {
-	const u8 *prop = get_bootargs_cmdline(fdt);
+	static char const cmdline[] __initconst = CONFIG_CMDLINE;
+	const u8 *prop = get_bootargs_cmdline(fdt, chosen);
 
 	if (IS_ENABLED(CONFIG_CMDLINE_FORCE) || !prop)
-		__parse_cmdline(CONFIG_CMDLINE, true);
+		__parse_cmdline(cmdline, true);
 
 	if (!IS_ENABLED(CONFIG_CMDLINE_FORCE) && prop)
 		__parse_cmdline(prop, true);
 }
 
-/* Keep checkers quiet */
-void init_feature_override(u64 boot_status, const void *fdt);
-
-asmlinkage void __init init_feature_override(u64 boot_status, const void *fdt)
+void __init init_feature_override(u64 boot_status, const void *fdt,
+				  int chosen)
 {
 	struct arm64_ftr_override *override;
 	const struct ftr_set_desc *reg;
@@ -354,7 +352,7 @@ asmlinkage void __init init_feature_override(u64 boot_status, const void *fdt)
 
 	__boot_status = boot_status;
 
-	parse_cmdline(fdt);
+	parse_cmdline(fdt, chosen);
 
 	for (i = 0; i < ARRAY_SIZE(regs); i++) {
 		reg = prel64_pointer(regs[i].reg);
diff --git a/arch/arm64/kernel/pi/kaslr_early.c b/arch/arm64/kernel/pi/kaslr_early.c
index eeecee7ffd6f..0257b43819db 100644
--- a/arch/arm64/kernel/pi/kaslr_early.c
+++ b/arch/arm64/kernel/pi/kaslr_early.c
@@ -16,17 +16,17 @@
 #include <asm/memory.h>
 #include <asm/pgtable.h>
 
+#include "pi.h"
+
 extern u16 memstart_offset_seed;
 
-static u64 __init get_kaslr_seed(void *fdt)
+static u64 __init get_kaslr_seed(void *fdt, int node)
 {
-	static char const chosen_str[] __initconst = "chosen";
 	static char const seed_str[] __initconst = "kaslr-seed";
-	int node, len;
 	fdt64_t *prop;
 	u64 ret;
+	int len;
 
-	node = fdt_path_offset(fdt, chosen_str);
 	if (node < 0)
 		return 0;
 
@@ -39,14 +39,14 @@ static u64 __init get_kaslr_seed(void *fdt)
 	return ret;
 }
 
-asmlinkage u64 __init kaslr_early_init(void *fdt)
+u64 __init kaslr_early_init(void *fdt, int chosen)
 {
 	u64 seed, range;
 
 	if (kaslr_disabled_cmdline())
 		return 0;
 
-	seed = get_kaslr_seed(fdt);
+	seed = get_kaslr_seed(fdt, chosen);
 	if (!seed) {
 		if (!__early_cpu_has_rndr() ||
 		    !__arm64_rndr((unsigned long *)&seed))
diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
new file mode 100644
index 000000000000..f206373b28b0
--- /dev/null
+++ b/arch/arm64/kernel/pi/map_kernel.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2023 Google LLC
+// Author: Ard Biesheuvel <ardb@google.com>
+
+#include <linux/init.h>
+#include <linux/libfdt.h>
+#include <linux/linkage.h>
+#include <linux/types.h>
+#include <linux/sizes.h>
+#include <linux/string.h>
+
+#include <asm/memory.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
+
+#include "pi.h"
+
+extern const u8 __eh_frame_start[], __eh_frame_end[];
+
+extern void idmap_cpu_replace_ttbr1(void *pgdir);
+
+static void __init map_segment(pgd_t *pg_dir, u64 *pgd, u64 va_offset,
+			       void *start, void *end, pgprot_t prot,
+			       bool may_use_cont, int root_level)
+{
+	map_range(pgd, ((u64)start + va_offset) & ~PAGE_OFFSET,
+		  ((u64)end + va_offset) & ~PAGE_OFFSET, (u64)start,
+		  prot, root_level, (pte_t *)pg_dir, may_use_cont, 0);
+}
+
+static void __init unmap_segment(pgd_t *pg_dir, u64 va_offset, void *start,
+				 void *end, int root_level)
+{
+	map_segment(pg_dir, NULL, va_offset, start, end, __pgprot(0),
+		    false, root_level);
+}
+
+static void __init map_kernel(u64 kaslr_offset, u64 va_offset, int root_level)
+{
+	bool enable_scs = IS_ENABLED(CONFIG_UNWIND_PATCH_PAC_INTO_SCS);
+	bool twopass = IS_ENABLED(CONFIG_RELOCATABLE);
+	u64 pgdp = (u64)init_pg_dir + PAGE_SIZE;
+	pgprot_t text_prot = PAGE_KERNEL_ROX;
+	pgprot_t data_prot = PAGE_KERNEL;
+	pgprot_t prot;
+
+	/*
+	 * External debuggers may need to write directly to the text mapping to
+	 * install SW breakpoints. Allow this (only) when explicitly requested
+	 * with rodata=off.
+	 */
+	if (arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_RODATA_OFF))
+		text_prot = PAGE_KERNEL_EXEC;
+
+	/*
+	 * We only enable the shadow call stack dynamically if we are running
+	 * on a system that does not implement PAC or BTI. PAC and SCS provide
+	 * roughly the same level of protection, and BTI relies on the PACIASP
+	 * instructions serving as landing pads, preventing us from patching
+	 * those instructions into something else.
+	 */
+	if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) && cpu_has_pac())
+		enable_scs = false;
+
+	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) && cpu_has_bti()) {
+		enable_scs = false;
+
+		/*
+		 * If we have a CPU that supports BTI and a kernel built for
+		 * BTI then mark the kernel executable text as guarded pages
+		 * now so we don't have to rewrite the page tables later.
+		 */
+		text_prot = __pgprot_modify(text_prot, PTE_GP, PTE_GP);
+	}
+
+	/* Map all code read-write on the first pass if needed */
+	twopass |= enable_scs;
+	prot = twopass ? data_prot : text_prot;
+
+	map_segment(init_pg_dir, &pgdp, va_offset, _stext, _etext, prot,
+		    !twopass, root_level);
+	map_segment(init_pg_dir, &pgdp, va_offset, __start_rodata,
+		    __inittext_begin, data_prot, false, root_level);
+	map_segment(init_pg_dir, &pgdp, va_offset, __inittext_begin,
+		    __inittext_end, prot, false, root_level);
+	map_segment(init_pg_dir, &pgdp, va_offset, __initdata_begin,
+		    __initdata_end, data_prot, false, root_level);
+	map_segment(init_pg_dir, &pgdp, va_offset, _data, _end, data_prot,
+		    true, root_level);
+	dsb(ishst);
+
+	idmap_cpu_replace_ttbr1(init_pg_dir);
+
+	if (twopass) {
+		if (IS_ENABLED(CONFIG_RELOCATABLE))
+			relocate_kernel(kaslr_offset);
+
+		if (enable_scs) {
+			scs_patch(__eh_frame_start + va_offset,
+				  __eh_frame_end - __eh_frame_start);
+			asm("ic ialluis");
+
+			dynamic_scs_is_enabled = true;
+		}
+
+		/*
+		 * Unmap the text region before remapping it, to avoid
+		 * potential TLB conflicts when creating the contiguous
+		 * descriptors.
+		 */
+		unmap_segment(init_pg_dir, va_offset, _stext, _etext,
+			      root_level);
+		dsb(ishst);
+		isb();
+		__tlbi(vmalle1);
+		isb();
+
+		/*
+		 * Remap these segments with different permissions
+		 * No new page table allocations should be needed
+		 */
+		map_segment(init_pg_dir, NULL, va_offset, _stext, _etext,
+			    text_prot, true, root_level);
+		map_segment(init_pg_dir, NULL, va_offset, __inittext_begin,
+			    __inittext_end, text_prot, false, root_level);
+		dsb(ishst);
+	}
+}
+
+asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
+{
+	static char const chosen_str[] __initconst = "/chosen";
+	u64 va_base, pa_base = (u64)&_text;
+	u64 kaslr_offset = pa_base % MIN_KIMG_ALIGN;
+	int root_level = 4 - CONFIG_PGTABLE_LEVELS;
+	int chosen;
+
+	/* Clear BSS and the initial page tables */
+	memset(__bss_start, 0, (u64)init_pg_end - (u64)__bss_start);
+
+	/* Parse the command line for CPU feature overrides */
+	chosen = fdt_path_offset(fdt, chosen_str);
+	init_feature_override(boot_status, fdt, chosen);
+
+	/*
+	 * The virtual KASLR displacement modulo 2MiB is decided by the
+	 * physical placement of the image, as otherwise, we might not be able
+	 * to create the early kernel mapping using 2 MiB block descriptors. So
+	 * take the low bits of the KASLR offset from the physical address, and
+	 * fill in the high bits from the seed.
+	 */
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
+		u64 kaslr_seed = kaslr_early_init(fdt, chosen);
+
+		if (kaslr_seed && kaslr_requires_kpti())
+			arm64_use_ng_mappings = true;
+
+		kaslr_offset |= kaslr_seed & ~(MIN_KIMG_ALIGN - 1);
+	}
+
+	va_base = KIMAGE_VADDR + kaslr_offset;
+	map_kernel(kaslr_offset, va_base - pa_base, root_level);
+}
diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
new file mode 100644
index 000000000000..c31feda18f47
--- /dev/null
+++ b/arch/arm64/kernel/pi/map_range.c
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright 2023 Google LLC
+// Author: Ard Biesheuvel <ardb@google.com>
+
+#include <linux/types.h>
+#include <linux/sizes.h>
+
+#include <asm/memory.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+
+#include "pi.h"
+
+/**
+ * map_range - Map a contiguous range of physical pages into virtual memory
+ *
+ * @pte:		Address of physical pointer to array of pages to
+ *			allocate page tables from
+ * @start:		Virtual address of the start of the range
+ * @end:		Virtual address of the end of the range (exclusive)
+ * @pa:			Physical address of the start of the range
+ * @prot:		Access permissions of the range
+ * @level:		Translation level for the mapping
+ * @tbl:		The level @level page table to create the mappings in
+ * @may_use_cont:	Whether the use of the contiguous attribute is allowed
+ * @va_offset:		Offset between a physical page and its current mapping
+ * 			in the VA space
+ */
+void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
+		      int level, pte_t *tbl, bool may_use_cont, u64 va_offset)
+{
+	u64 cmask = (level == 3) ? CONT_PTE_SIZE - 1 : U64_MAX;
+	u64 protval = pgprot_val(prot) & ~PTE_TYPE_MASK;
+	int lshift = (3 - level) * (PAGE_SHIFT - 3);
+	u64 lmask = (PAGE_SIZE << lshift) - 1;
+
+	start	&= PAGE_MASK;
+	pa	&= PAGE_MASK;
+
+	/* Advance tbl to the entry that covers start */
+	tbl += (start >> (lshift + PAGE_SHIFT)) % PTRS_PER_PTE;
+
+	/*
+	 * Set the right block/page bits for this level unless we are
+	 * clearing the mapping
+	 */
+	if (protval)
+		protval |= (level < 3) ? PMD_TYPE_SECT : PTE_TYPE_PAGE;
+
+	while (start < end) {
+		u64 next = min((start | lmask) + 1, PAGE_ALIGN(end));
+
+		if (level < 3 && (start | next | pa) & lmask) {
+			/*
+			 * This chunk needs a finer grained mapping. Create a
+			 * table mapping if necessary and recurse.
+			 */
+			if (pte_none(*tbl)) {
+				*tbl = __pte(__phys_to_pte_val(*pte) |
+					     PMD_TYPE_TABLE | PMD_TABLE_UXN);
+				*pte += PTRS_PER_PTE * sizeof(pte_t);
+			}
+			map_range(pte, start, next, pa, prot, level + 1,
+				  (pte_t *)(__pte_to_phys(*tbl) + va_offset),
+				  may_use_cont, va_offset);
+		} else {
+			/*
+			 * Start a contiguous range if start and pa are
+			 * suitably aligned
+			 */
+			if (((start | pa) & cmask) == 0 && may_use_cont)
+				protval |= PTE_CONT;
+
+			/*
+			 * Clear the contiguous attribute if the remaining
+			 * range does not cover a contiguous block
+			 */
+			if ((end & ~cmask) <= start)
+				protval &= ~PTE_CONT;
+
+			/* Put down a block or page mapping */
+			*tbl = __pte(__phys_to_pte_val(pa) | protval);
+		}
+		pa += next - start;
+		start = next;
+		tbl++;
+	}
+}
diff --git a/arch/arm64/kernel/pi/patch-scs.c b/arch/arm64/kernel/pi/patch-scs.c
index c65ef40d1e6b..49d8b40e61bc 100644
--- a/arch/arm64/kernel/pi/patch-scs.c
+++ b/arch/arm64/kernel/pi/patch-scs.c
@@ -11,6 +11,10 @@
 
 #include <asm/scs.h>
 
+#include "pi.h"
+
+bool dynamic_scs_is_enabled;
+
 //
 // This minimal DWARF CFI parser is partially based on the code in
 // arch/arc/kernel/unwind.c, and on the document below:
@@ -46,8 +50,6 @@
 #define DW_CFA_GNU_negative_offset_extended 0x2f
 #define DW_CFA_hi_user                      0x3f
 
-extern const u8 __eh_frame_start[], __eh_frame_end[];
-
 enum {
 	PACIASP		= 0xd503233f,
 	AUTIASP		= 0xd50323bf,
@@ -250,13 +252,3 @@ int scs_patch(const u8 eh_frame[], int size)
 	}
 	return 0;
 }
-
-asmlinkage void __init scs_patch_vmlinux(const u8 start[], const u8 end[])
-{
-	if (!should_patch_pac_into_scs())
-		return;
-
-	scs_patch(start, end - start);
-	asm("ic ialluis");
-	isb();
-}
diff --git a/arch/arm64/kernel/pi/pi.h b/arch/arm64/kernel/pi/pi.h
index 7c2d9bbf0ff9..d307c58e9741 100644
--- a/arch/arm64/kernel/pi/pi.h
+++ b/arch/arm64/kernel/pi/pi.h
@@ -2,6 +2,8 @@
 // Copyright 2023 Google LLC
 // Author: Ard Biesheuvel <ardb@google.com>
 
+#include <linux/types.h>
+
 #define __prel64_initconst	__section(".init.rodata.prel64")
 
 #define PREL64(type, name)	union { type *name; prel64_t name ## _prel; }
@@ -16,3 +18,15 @@ static inline void *prel64_to_pointer(const prel64_t *offset)
 		return NULL;
 	return (void *)offset + *offset;
 }
+
+extern bool dynamic_scs_is_enabled;
+
+void init_feature_override(u64 boot_status, const void *fdt, int chosen);
+u64 kaslr_early_init(void *fdt, int chosen);
+void relocate_kernel(u64 offset);
+int scs_patch(const u8 eh_frame[], int size);
+
+void map_range(u64 *pgd, u64 start, u64 end, u64 pa, pgprot_t prot,
+	       int level, pte_t *tbl, bool may_use_cont, u64 va_offset);
+
+asmlinkage void early_map_kernel(u64 boot_status, void *fdt);
diff --git a/arch/arm64/kernel/pi/relocate.c b/arch/arm64/kernel/pi/relocate.c
index 1853408ea76b..2407d2696398 100644
--- a/arch/arm64/kernel/pi/relocate.c
+++ b/arch/arm64/kernel/pi/relocate.c
@@ -7,6 +7,8 @@
 #include <linux/init.h>
 #include <linux/types.h>
 
+#include "pi.h"
+
 extern const Elf64_Rela rela_start[], rela_end[];
 extern const u64 relr_start[], relr_end[];
 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 0ef45d1927b3..0ea45b6d0177 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -283,13 +283,6 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
 
 	kaslr_init();
 
-	/*
-	 * If know now we are going to need KPTI then use non-global
-	 * mappings from the start, avoiding the cost of rewriting
-	 * everything later.
-	 */
-	arm64_use_ng_mappings = kaslr_enabled() && kaslr_requires_kpti();
-
 	early_fixmap_init();
 	early_ioremap_init();
 
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 3afb4223a5e8..755a22d4f840 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -126,9 +126,9 @@ jiffies = jiffies_64;
 #ifdef CONFIG_UNWIND_TABLES
 #define UNWIND_DATA_SECTIONS				\
 	.eh_frame : {					\
-		__eh_frame_start = .;			\
+		__pi___eh_frame_start = .;		\
 		*(.eh_frame)				\
-		__eh_frame_end = .;			\
+		__pi___eh_frame_end = .;		\
 	}
 #else
 #define UNWIND_DATA_SECTIONS
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index f66c37a1610e..7c1bdaf25408 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -195,6 +195,7 @@ SYM_TYPED_FUNC_START(idmap_cpu_replace_ttbr1)
 
 	ret
 SYM_FUNC_END(idmap_cpu_replace_ttbr1)
+SYM_FUNC_ALIAS(__pi_idmap_cpu_replace_ttbr1, idmap_cpu_replace_ttbr1)
 	.popsection
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
-- 
2.43.0.687.g38aa6559b0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v8 19/43] arm64: mm: Use 48-bit virtual addressing for the permanent ID map
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (17 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 18/43] arm64: head: Move early kernel mapping routines into C code Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 20/43] arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels Ard Biesheuvel
                   ` (24 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Even though we support loading kernels anywhere in 48-bit addressable
physical memory, we create the ID maps based on the number of levels
that we happened to configure for the kernel VA and user VA spaces.

The reason for this is that the PGD/PUD/PMD based classification of
translation levels, along with the associated folding when the number of
levels is less than 5, does not permit creating a page table hierarchy
of a set number of levels. This means that, for instance, on 39-bit VA
kernels we need to configure an additional level above PGD level on the
fly, and 36-bit VA kernels still only support 47-bit virtual addressing
with this trick applied.

Now that we have a separate helper to populate page table hierarchies
that does not define the levels in terms of PUDs/PMDs/etc. at all, let's
reuse it to create the permanent ID map with a fixed VA size of 48 bits.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
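Note (not part of the commit message): a small sketch of how the new
IDMAP_* constants work out for each page size. It assumes that
ARM64_HW_PGTABLE_LEVELS(va_bits) evaluates to (va_bits - 4) / (PAGE_SHIFT - 3),
which is my reading of pgtable-hwdef.h. The function names are hypothetical
and take the page shift as a parameter so that all granules can be compared
in one build.

```c
#include <assert.h>

#define IDMAP_VA_BITS 48

/* Assumed to match ARM64_HW_PGTABLE_LEVELS(), parameterised by page shift. */
static int hw_pgtable_levels(int va_bits, int page_shift)
{
	assert(page_shift == 12 || page_shift == 14 || page_shift == 16);
	return (va_bits - 4) / (page_shift - 3);
}

/* Counterpart of IDMAP_ROOT_LEVEL: level number of the root table. */
static int idmap_root_level(int page_shift)
{
	return 4 - hw_pgtable_levels(IDMAP_VA_BITS, page_shift);
}

/* TCR_EL1.T0SZ encodes the TTBR0 region size as 64 minus the VA bits. */
static int idmap_t0sz_value(void)
{
	return 64 - IDMAP_VA_BITS;
}
```

With these assumptions, 4 KiB and 16 KiB pages give a 4-level ID map
rooted at level 0, and 64 KiB pages give a 3-level ID map rooted at
level 1. T0SZ is 16 in every case, so the idmap_ptes[] and kpti_ptes[]
arrays (IDMAP_LEVELS - 1 pages each) hold 3 or 2 pages respectively.
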
 arch/arm64/include/asm/kernel-pgtable.h |  3 ++
 arch/arm64/kernel/head.S                |  5 +++
 arch/arm64/kvm/mmu.c                    | 15 +++------
 arch/arm64/mm/mmu.c                     | 32 +++++++++++---------
 arch/arm64/mm/proc.S                    |  9 ++----
 5 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 0631604995ee..742a4b2778f7 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -35,6 +35,9 @@
 #define SWAPPER_PGTABLE_LEVELS	(CONFIG_PGTABLE_LEVELS)
 #endif
 
+#define IDMAP_VA_BITS		48
+#define IDMAP_LEVELS		ARM64_HW_PGTABLE_LEVELS(IDMAP_VA_BITS)
+#define IDMAP_ROOT_LEVEL	(4 - IDMAP_LEVELS)
 
 /*
  * A relocatable kernel may execute from an address that differs from the one at
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index ffacce7b5a02..a1c29d64e875 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -729,6 +729,11 @@ SYM_FUNC_START_LOCAL(__no_granule_support)
 SYM_FUNC_END(__no_granule_support)
 
 SYM_FUNC_START_LOCAL(__primary_switch)
+	mrs		x1, tcr_el1
+	mov		x2, #64 - VA_BITS
+	tcr_set_t0sz	x1, x2
+	msr		tcr_el1, x1
+
 	adrp	x1, reserved_pg_dir
 	adrp	x2, init_idmap_pg_dir
 	bl	__enable_mmu
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d14504821b79..6fa9e816df40 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1874,16 +1874,9 @@ int __init kvm_mmu_init(u32 *hyp_va_bits)
 	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
 
 	/*
-	 * The ID map may be configured to use an extended virtual address
-	 * range. This is only the case if system RAM is out of range for the
-	 * currently configured page size and VA_BITS_MIN, in which case we will
-	 * also need the extended virtual range for the HYP ID map, or we won't
-	 * be able to enable the EL2 MMU.
-	 *
-	 * However, in some cases the ID map may be configured for fewer than
-	 * the number of VA bits used by the regular kernel stage 1. This
-	 * happens when VA_BITS=52 and the kernel image is placed in PA space
-	 * below 48 bits.
+	 * The ID map is always configured for 48 bits of translation, which
+	 * may be fewer than the number of VA bits used by the regular kernel
+	 * stage 1, when VA_BITS=52.
 	 *
 	 * At EL2, there is only one TTBR register, and we can't switch between
 	 * translation tables *and* update TCR_EL2.T0SZ at the same time. Bottom
@@ -1894,7 +1887,7 @@ int __init kvm_mmu_init(u32 *hyp_va_bits)
 	 * 1 VA bits to assure that the hypervisor can both ID map its code page
 	 * and map any kernel memory.
 	 */
-	idmap_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);
+	idmap_bits = IDMAP_VA_BITS;
 	kernel_bits = vabits_actual;
 	*hyp_va_bits = max(idmap_bits, kernel_bits);
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f9332eea318f..a991f195592b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -757,22 +757,21 @@ static void __init map_kernel(pgd_t *pgdp)
 	kasan_copy_shadow(pgdp);
 }
 
+void __pi_map_range(u64 *pgd, u64 start, u64 end, u64 pa, pgprot_t prot,
+		    int level, pte_t *tbl, bool may_use_cont, u64 va_offset);
+
+static u8 idmap_ptes[IDMAP_LEVELS - 1][PAGE_SIZE] __aligned(PAGE_SIZE) __ro_after_init,
+	  kpti_ptes[IDMAP_LEVELS - 1][PAGE_SIZE] __aligned(PAGE_SIZE) __ro_after_init;
+
 static void __init create_idmap(void)
 {
 	u64 start = __pa_symbol(__idmap_text_start);
-	u64 size = __pa_symbol(__idmap_text_end) - start;
-	pgd_t *pgd = idmap_pg_dir;
-	u64 pgd_phys;
-
-	/* check if we need an additional level of translation */
-	if (VA_BITS < 48 && idmap_t0sz < (64 - VA_BITS_MIN)) {
-		pgd_phys = early_pgtable_alloc(PAGE_SHIFT);
-		set_pgd(&idmap_pg_dir[start >> VA_BITS],
-			__pgd(pgd_phys | P4D_TYPE_TABLE));
-		pgd = __va(pgd_phys);
-	}
-	__create_pgd_mapping(pgd, start, start, size, PAGE_KERNEL_ROX,
-			     early_pgtable_alloc, 0);
+	u64 end   = __pa_symbol(__idmap_text_end);
+	u64 ptep  = __pa_symbol(idmap_ptes);
+
+	__pi_map_range(&ptep, start, end, start, PAGE_KERNEL_ROX,
+		       IDMAP_ROOT_LEVEL, (pte_t *)idmap_pg_dir, false,
+		       __phys_to_virt(ptep) - ptep);
 
 	if (IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0)) {
 		extern u32 __idmap_kpti_flag;
@@ -782,8 +781,10 @@ static void __init create_idmap(void)
 		 * The KPTI G-to-nG conversion code needs a read-write mapping
 		 * of its synchronization flag in the ID map.
 		 */
-		__create_pgd_mapping(pgd, pa, pa, sizeof(u32), PAGE_KERNEL,
-				     early_pgtable_alloc, 0);
+		ptep = __pa_symbol(kpti_ptes);
+		__pi_map_range(&ptep, pa, pa + sizeof(u32), pa, PAGE_KERNEL,
+			       IDMAP_ROOT_LEVEL, (pte_t *)idmap_pg_dir, false,
+			       __phys_to_virt(ptep) - ptep);
 	}
 }
 
@@ -808,6 +809,7 @@ void __init paging_init(void)
 	memblock_allow_resize();
 
 	create_idmap();
+	idmap_t0sz = TCR_T0SZ(IDMAP_VA_BITS);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 7c1bdaf25408..47ede52bb900 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -421,9 +421,9 @@ SYM_FUNC_START(__cpu_setup)
 	mair	.req	x17
 	tcr	.req	x16
 	mov_q	mair, MAIR_EL1_SET
-	mov_q	tcr, TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
-			TCR_TG_FLAGS | TCR_KASLR_FLAGS | TCR_ASID16 | \
-			TCR_TBI0 | TCR_A1 | TCR_KASAN_SW_FLAGS | TCR_MTE_FLAGS
+	mov_q	tcr, TCR_T0SZ(IDMAP_VA_BITS) | TCR_T1SZ(VA_BITS) | TCR_CACHE_FLAGS | \
+		     TCR_SMP_FLAGS | TCR_TG_FLAGS | TCR_KASLR_FLAGS | TCR_ASID16 | \
+		     TCR_TBI0 | TCR_A1 | TCR_KASAN_SW_FLAGS | TCR_MTE_FLAGS
 
 	tcr_clear_errata_bits tcr, x9, x5
 
@@ -431,10 +431,7 @@ SYM_FUNC_START(__cpu_setup)
 	sub		x9, xzr, x0
 	add		x9, x9, #64
 	tcr_set_t1sz	tcr, x9
-#else
-	idmap_get_t0sz	x9
 #endif
-	tcr_set_t0sz	tcr, x9
 
 	/*
 	 * Set the IPS bits in TCR_EL1.
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 20/43] arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (18 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 19/43] arm64: mm: Use 48-bit virtual addressing for the permanent ID map Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 21/43] arm64: kernel: Create initial ID map from C code Ard Biesheuvel
                   ` (23 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The mapping from PGD/PUD/PMD to levels and shifts is very confusing,
given that, due to folding, the shifts may be equal for different
levels, if the macros are even #define'd to begin with.

In a subsequent patch, we will modify the ID mapping code to decouple
the number of levels from the kernel's view of how these types are
folded, so prepare for this by reformulating the macros without the use
of these types.

Instead, use SWAPPER_BLOCK_SHIFT as the base quantity, and derive it
from either PAGE_SHIFT or PMD_SHIFT, which (if defined at all) are
defined unambiguously for a given page size, regardless of the number of
configured levels.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
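Note (not part of the commit message): a sketch of how the reformulated
macros resolve per page size. It assumes MIN_KIMG_ALIGN is 2 MiB, that
PMD_SHIFT is defined for the configuration, and that PMD_SHIFT equals
2 * (PAGE_SHIFT - 3) + 3. The function names are hypothetical stand-ins
for the SWAPPER_* macros, parameterised by page shift.

```c
#include <assert.h>

#define MIN_KIMG_ALIGN_SHIFT 21 /* assumption: MIN_KIMG_ALIGN == 2 MiB */

/* Assumed PMD_SHIFT for a given page shift (level 2 block size). */
static int pmd_shift(int page_shift)
{
	assert(page_shift == 12 || page_shift == 14 || page_shift == 16);
	return 2 * (page_shift - 3) + 3;
}

/* Use PMD blocks only if they are no larger than the image alignment. */
static int swapper_block_shift(int page_shift)
{
	int pmd = pmd_shift(page_shift);

	return pmd <= MIN_KIMG_ALIGN_SHIFT ? pmd : page_shift;
}

/* 1 when the last level is skipped thanks to block mappings, else 0. */
static int swapper_skip_level(int page_shift)
{
	return swapper_block_shift(page_shift) != page_shift;
}

/* Address bits covered by one table of swapper block entries. */
static int swapper_table_shift(int page_shift)
{
	return swapper_block_shift(page_shift) + page_shift - 3;
}
```

Under these assumptions only the 4 KiB granule uses 2 MiB block mappings
(block shift 21, one level skipped). The 16 KiB and 64 KiB granules fall
back to page mappings, which matches the rewritten comment in
kernel-pgtable.h.
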
 arch/arm64/include/asm/kernel-pgtable.h | 65 ++++++--------------
 1 file changed, 19 insertions(+), 46 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 742a4b2778f7..f1fc98a233d5 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -13,27 +13,22 @@
 #include <asm/sparsemem.h>
 
 /*
- * The linear mapping and the start of memory are both 2M aligned (per
- * the arm64 booting.txt requirements). Hence we can use section mapping
- * with 4K (section size = 2M) but not with 16K (section size = 32M) or
- * 64K (section size = 512M).
+ * The physical and virtual addresses of the start of the kernel image are
+ * equal modulo 2 MiB (per the arm64 booting.txt requirements). Hence we can
+ * use section mapping with 4K (section size = 2M) but not with 16K (section
+ * size = 32M) or 64K (section size = 512M).
  */
-
-/*
- * The idmap and swapper page tables need some space reserved in the kernel
- * image. Both require pgd, pud (4 levels only) and pmd tables to (section)
- * map the kernel. With the 64K page configuration, swapper and idmap need to
- * map to pte level. The swapper also maps the FDT (see __create_page_tables
- * for more information). Note that the number of ID map translation levels
- * could be increased on the fly if system RAM is out of reach for the default
- * VA range, so pages required to map highest possible PA are reserved in all
- * cases.
- */
-#ifdef CONFIG_ARM64_4K_PAGES
-#define SWAPPER_PGTABLE_LEVELS	(CONFIG_PGTABLE_LEVELS - 1)
+#if defined(PMD_SIZE) && PMD_SIZE <= MIN_KIMG_ALIGN
+#define SWAPPER_BLOCK_SHIFT	PMD_SHIFT
+#define SWAPPER_SKIP_LEVEL	1
 #else
-#define SWAPPER_PGTABLE_LEVELS	(CONFIG_PGTABLE_LEVELS)
+#define SWAPPER_BLOCK_SHIFT	PAGE_SHIFT
+#define SWAPPER_SKIP_LEVEL	0
 #endif
+#define SWAPPER_BLOCK_SIZE	(UL(1) << SWAPPER_BLOCK_SHIFT)
+#define SWAPPER_TABLE_SHIFT	(SWAPPER_BLOCK_SHIFT + PAGE_SHIFT - 3)
+
+#define SWAPPER_PGTABLE_LEVELS		(CONFIG_PGTABLE_LEVELS - SWAPPER_SKIP_LEVEL)
 
 #define IDMAP_VA_BITS		48
 #define IDMAP_LEVELS		ARM64_HW_PGTABLE_LEVELS(IDMAP_VA_BITS)
@@ -53,24 +48,13 @@
 #define EARLY_ENTRIES(vstart, vend, shift, add) \
 	(SPAN_NR_ENTRIES(vstart, vend, shift) + (add))
 
-#define EARLY_PGDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, PGDIR_SHIFT, add))
-
-#if SWAPPER_PGTABLE_LEVELS > 3
-#define EARLY_PUDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, PUD_SHIFT, add))
-#else
-#define EARLY_PUDS(vstart, vend, add) (0)
-#endif
+#define EARLY_LEVEL(lvl, vstart, vend, add)	\
+	(SWAPPER_PGTABLE_LEVELS > lvl ? EARLY_ENTRIES(vstart, vend, SWAPPER_BLOCK_SHIFT + lvl * (PAGE_SHIFT - 3), add) : 0)
 
-#if SWAPPER_PGTABLE_LEVELS > 2
-#define EARLY_PMDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, SWAPPER_TABLE_SHIFT, add))
-#else
-#define EARLY_PMDS(vstart, vend, add) (0)
-#endif
-
-#define EARLY_PAGES(vstart, vend, add) ( 1 			/* PGDIR page */				\
-			+ EARLY_PGDS((vstart), (vend), add) 	/* each PGDIR needs a next level page table */	\
-			+ EARLY_PUDS((vstart), (vend), add)	/* each PUD needs a next level page table */	\
-			+ EARLY_PMDS((vstart), (vend), add))	/* each PMD needs a next level page table */
+#define EARLY_PAGES(vstart, vend, add) (1 	/* PGDIR page */				\
+	+ EARLY_LEVEL(3, (vstart), (vend), add) /* each entry needs a next level page table */	\
+	+ EARLY_LEVEL(2, (vstart), (vend), add)	/* each entry needs a next level page table */	\
+	+ EARLY_LEVEL(1, (vstart), (vend), add))/* each entry needs a next level page table */
 #define INIT_DIR_SIZE (PAGE_SIZE * (EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE) + EARLY_SEGMENT_EXTRA_PAGES))
 
 /* the initial ID map may need two extra pages if it needs to be extended */
@@ -81,17 +65,6 @@
 #endif
 #define INIT_IDMAP_DIR_PAGES	EARLY_PAGES(KIMAGE_VADDR, _end + MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE, 1)
 
-/* Initial memory map size */
-#ifdef CONFIG_ARM64_4K_PAGES
-#define SWAPPER_BLOCK_SHIFT	PMD_SHIFT
-#define SWAPPER_BLOCK_SIZE	PMD_SIZE
-#define SWAPPER_TABLE_SHIFT	PUD_SHIFT
-#else
-#define SWAPPER_BLOCK_SHIFT	PAGE_SHIFT
-#define SWAPPER_BLOCK_SIZE	PAGE_SIZE
-#define SWAPPER_TABLE_SHIFT	PMD_SHIFT
-#endif
-
 /* The number of segments in the kernel image (text, rodata, inittext, initdata, data+bss) */
 #define KERNEL_SEGMENT_COUNT	5
 
-- 
2.43.0.687.g38aa6559b0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v8 21/43] arm64: kernel: Create initial ID map from C code
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (19 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 20/43] arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 22/43] arm64: mm: avoid fixmap for early swapper_pg_dir updates Ard Biesheuvel
                   ` (22 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The asm code that creates the initial ID map is rather intricate and
hard to follow. This is problematic because it makes adding support for
things like LPA2 or WXN more difficult than necessary. Also, like the
rest of the MM code, it is parameterized to run with a configurable
number of levels. This is rather pointless, given that all AArch64 CPUs
implement support for 48-bit virtual addressing. Moreover, 39 bits is
the only smaller VA size that is widely used, and many systems exist
with DRAM located outside of the 39-bit addressable range, so we need
additional tricks to make things work in that combination.

So let's bite the bullet, rip out all the asm macros and fiddly code,
and replace them with a C implementation based on the newly added
routines for creating the early kernel VA mappings. And while at it,
create the initial ID map based on 48-bit virtual addressing as well,
regardless of the number of configured levels for the kernel proper.

Note that this code may execute with the MMU and caches disabled, and is
therefore not permitted to make unaligned accesses. The algorithm as
implemented should not generate any, but let's pass -mstrict-align to
the compiler to be sure.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/assembler.h      |  14 -
 arch/arm64/include/asm/kernel-pgtable.h |  50 ++--
 arch/arm64/include/asm/mmu_context.h    |   6 +-
 arch/arm64/kernel/head.S                | 267 ++------------------
 arch/arm64/kernel/image-vars.h          |   1 +
 arch/arm64/kernel/pi/Makefile           |   3 +
 arch/arm64/kernel/pi/map_kernel.c       |  18 ++
 arch/arm64/kernel/pi/map_range.c        |  12 +
 arch/arm64/kernel/pi/pi.h               |   4 +
 arch/arm64/mm/mmu.c                     |   5 -
 arch/arm64/mm/proc.S                    |   3 +-
 11 files changed, 88 insertions(+), 295 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 513787e43329..6a467c694039 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -345,20 +345,6 @@ alternative_cb_end
 	bfi	\valreg, \t1sz, #TCR_T1SZ_OFFSET, #TCR_TxSZ_WIDTH
 	.endm
 
-/*
- * idmap_get_t0sz - get the T0SZ value needed to cover the ID map
- *
- * Calculate the maximum allowed value for TCR_EL1.T0SZ so that the
- * entire ID map region can be mapped. As T0SZ == (64 - #bits used),
- * this number conveniently equals the number of leading zeroes in
- * the physical address of _end.
- */
-	.macro	idmap_get_t0sz, reg
-	adrp	\reg, _end
-	orr	\reg, \reg, #(1 << VA_BITS_MIN) - 1
-	clz	\reg, \reg
-	.endm
-
 /*
  * tcr_compute_pa_size - set TCR.(I)PS to the highest supported
  * ID_AA64MMFR0_EL1.PARange value
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index f1fc98a233d5..bf05a77873a4 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -29,6 +29,7 @@
 #define SWAPPER_TABLE_SHIFT	(SWAPPER_BLOCK_SHIFT + PAGE_SHIFT - 3)
 
 #define SWAPPER_PGTABLE_LEVELS		(CONFIG_PGTABLE_LEVELS - SWAPPER_SKIP_LEVEL)
+#define INIT_IDMAP_PGTABLE_LEVELS	(IDMAP_LEVELS - SWAPPER_SKIP_LEVEL)
 
 #define IDMAP_VA_BITS		48
 #define IDMAP_LEVELS		ARM64_HW_PGTABLE_LEVELS(IDMAP_VA_BITS)
@@ -48,44 +49,39 @@
 #define EARLY_ENTRIES(vstart, vend, shift, add) \
 	(SPAN_NR_ENTRIES(vstart, vend, shift) + (add))
 
-#define EARLY_LEVEL(lvl, vstart, vend, add)	\
-	(SWAPPER_PGTABLE_LEVELS > lvl ? EARLY_ENTRIES(vstart, vend, SWAPPER_BLOCK_SHIFT + lvl * (PAGE_SHIFT - 3), add) : 0)
+#define EARLY_LEVEL(lvl, lvls, vstart, vend, add)	\
+	(lvls > lvl ? EARLY_ENTRIES(vstart, vend, SWAPPER_BLOCK_SHIFT + lvl * (PAGE_SHIFT - 3), add) : 0)
 
-#define EARLY_PAGES(vstart, vend, add) (1 	/* PGDIR page */				\
-	+ EARLY_LEVEL(3, (vstart), (vend), add) /* each entry needs a next level page table */	\
-	+ EARLY_LEVEL(2, (vstart), (vend), add)	/* each entry needs a next level page table */	\
-	+ EARLY_LEVEL(1, (vstart), (vend), add))/* each entry needs a next level page table */
-#define INIT_DIR_SIZE (PAGE_SIZE * (EARLY_PAGES(KIMAGE_VADDR, _end, EXTRA_PAGE) + EARLY_SEGMENT_EXTRA_PAGES))
+#define EARLY_PAGES(lvls, vstart, vend, add) (1 	/* PGDIR page */				\
+	+ EARLY_LEVEL(3, (lvls), (vstart), (vend), add) /* each entry needs a next level page table */	\
+	+ EARLY_LEVEL(2, (lvls), (vstart), (vend), add)	/* each entry needs a next level page table */	\
+	+ EARLY_LEVEL(1, (lvls), (vstart), (vend), add))/* each entry needs a next level page table */
+#define INIT_DIR_SIZE (PAGE_SIZE * (EARLY_PAGES(SWAPPER_PGTABLE_LEVELS, KIMAGE_VADDR, _end, EXTRA_PAGE) \
+				    + EARLY_SEGMENT_EXTRA_PAGES))
 
-/* the initial ID map may need two extra pages if it needs to be extended */
-#if VA_BITS < 48
-#define INIT_IDMAP_DIR_SIZE	((INIT_IDMAP_DIR_PAGES + 2) * PAGE_SIZE)
-#else
-#define INIT_IDMAP_DIR_SIZE	(INIT_IDMAP_DIR_PAGES * PAGE_SIZE)
-#endif
-#define INIT_IDMAP_DIR_PAGES	EARLY_PAGES(KIMAGE_VADDR, _end + MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE, 1)
+#define INIT_IDMAP_DIR_PAGES	(EARLY_PAGES(INIT_IDMAP_PGTABLE_LEVELS, KIMAGE_VADDR, _end, 1))
+#define INIT_IDMAP_DIR_SIZE	((INIT_IDMAP_DIR_PAGES + EARLY_IDMAP_EXTRA_PAGES) * PAGE_SIZE)
+
+#define INIT_IDMAP_FDT_PAGES	(EARLY_PAGES(INIT_IDMAP_PGTABLE_LEVELS, 0UL, UL(MAX_FDT_SIZE), 1) - 1)
+#define INIT_IDMAP_FDT_SIZE	((INIT_IDMAP_FDT_PAGES + EARLY_IDMAP_EXTRA_FDT_PAGES) * PAGE_SIZE)
 
 /* The number of segments in the kernel image (text, rodata, inittext, initdata, data+bss) */
 #define KERNEL_SEGMENT_COUNT	5
 
 #if SWAPPER_BLOCK_SIZE > SEGMENT_ALIGN
 #define EARLY_SEGMENT_EXTRA_PAGES (KERNEL_SEGMENT_COUNT + 1)
-#else
-#define EARLY_SEGMENT_EXTRA_PAGES 0
-#endif
-
 /*
- * Initial memory map attributes.
+ * The initial ID map consists of the kernel image, mapped as two separate
+ * segments, and may appear misaligned wrt the swapper block size. This means
+ * we need 3 additional pages. The DT could straddle a swapper block boundary,
+ * so it may need 2.
  */
-#define SWAPPER_PTE_FLAGS	(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED | PTE_UXN)
-#define SWAPPER_PMD_FLAGS	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S | PTE_UXN)
-
-#ifdef CONFIG_ARM64_4K_PAGES
-#define SWAPPER_RW_MMUFLAGS	(PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS | PTE_WRITE)
-#define SWAPPER_RX_MMUFLAGS	(SWAPPER_RW_MMUFLAGS | PMD_SECT_RDONLY)
+#define EARLY_IDMAP_EXTRA_PAGES		3
+#define EARLY_IDMAP_EXTRA_FDT_PAGES	2
 #else
-#define SWAPPER_RW_MMUFLAGS	(PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS | PTE_WRITE)
-#define SWAPPER_RX_MMUFLAGS	(SWAPPER_RW_MMUFLAGS | PTE_RDONLY)
+#define EARLY_SEGMENT_EXTRA_PAGES	0
+#define EARLY_IDMAP_EXTRA_PAGES		0
+#define EARLY_IDMAP_EXTRA_FDT_PAGES	0
 #endif
 
 #endif	/* __ASM_KERNEL_PGTABLE_H */
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 926fbbcecbe0..a8a89a0f2867 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -61,11 +61,9 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
 }
 
 /*
- * TCR.T0SZ value to use when the ID map is active. Usually equals
- * TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
- * physical memory, in which case it will be smaller.
+ * TCR.T0SZ value to use when the ID map is active.
  */
-extern int idmap_t0sz;
+#define idmap_t0sz	TCR_T0SZ(IDMAP_VA_BITS)
 
 /*
  * Ensure TCR.T0SZ is set to the provided value.
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index a1c29d64e875..545b5d8976f4 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -80,26 +80,42 @@
 	 *  x19        primary_entry() .. start_kernel()        whether we entered with the MMU on
 	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
-	 *  x22        create_idmap() .. start_kernel()         ID map VA of the DT blob
 	 *  x25        primary_entry() .. start_kernel()        supported VA size
-	 *  x28        create_idmap()                           callee preserved temp register
 	 */
 SYM_CODE_START(primary_entry)
 	bl	record_mmu_state
 	bl	preserve_boot_args
-	bl	create_idmap
+
+	adrp	x1, early_init_stack
+	mov	sp, x1
+	mov	x29, xzr
+	adrp	x0, init_idmap_pg_dir
+	bl	__pi_create_init_idmap
+
+	/*
+	 * If the page tables have been populated with non-cacheable
+	 * accesses (MMU disabled), invalidate those tables again to
+	 * remove any speculatively loaded cache lines.
+	 */
+	cbnz	x19, 0f
+	dmb     sy
+	mov	x1, x0				// end of used region
+	adrp    x0, init_idmap_pg_dir
+	adr_l	x2, dcache_inval_poc
+	blr	x2
+	b	1f
 
 	/*
 	 * If we entered with the MMU and caches on, clean the ID mapped part
 	 * of the primary boot code to the PoC so we can safely execute it with
 	 * the MMU off.
 	 */
-	cbz	x19, 0f
-	adrp	x0, __idmap_text_start
+0:	adrp	x0, __idmap_text_start
 	adr_l	x1, __idmap_text_end
 	adr_l	x2, dcache_clean_poc
 	blr	x2
-0:	mov	x0, x19
+
+1:	mov	x0, x19
 	bl	init_kernel_el			// w0=cpu_boot_mode
 	mov	x20, x0
 
@@ -175,238 +191,6 @@ SYM_CODE_START_LOCAL(preserve_boot_args)
 	ret
 SYM_CODE_END(preserve_boot_args)
 
-/*
- * Macro to populate page table entries, these entries can be pointers to the next level
- * or last level entries pointing to physical memory.
- *
- *	tbl:	page table address
- *	rtbl:	pointer to page table or physical memory
- *	index:	start index to write
- *	eindex:	end index to write - [index, eindex] written to
- *	flags:	flags for pagetable entry to or in
- *	inc:	increment to rtbl between each entry
- *	tmp1:	temporary variable
- *
- * Preserves:	tbl, eindex, flags, inc
- * Corrupts:	index, tmp1
- * Returns:	rtbl
- */
-	.macro populate_entries, tbl, rtbl, index, eindex, flags, inc, tmp1
-.Lpe\@:	phys_to_pte \tmp1, \rtbl
-	orr	\tmp1, \tmp1, \flags	// tmp1 = table entry
-	str	\tmp1, [\tbl, \index, lsl #3]
-	add	\rtbl, \rtbl, \inc	// rtbl = pa next level
-	add	\index, \index, #1
-	cmp	\index, \eindex
-	b.ls	.Lpe\@
-	.endm
-
-/*
- * Compute indices of table entries from virtual address range. If multiple entries
- * were needed in the previous page table level then the next page table level is assumed
- * to be composed of multiple pages. (This effectively scales the end index).
- *
- *	vstart:	virtual address of start of range
- *	vend:	virtual address of end of range - we map [vstart, vend]
- *	shift:	shift used to transform virtual address into index
- *	order:  #imm 2log(number of entries in page table)
- *	istart:	index in table corresponding to vstart
- *	iend:	index in table corresponding to vend
- *	count:	On entry: how many extra entries were required in previous level, scales
- *			  our end index.
- *		On exit: returns how many extra entries required for next page table level
- *
- * Preserves:	vstart, vend
- * Returns:	istart, iend, count
- */
-	.macro compute_indices, vstart, vend, shift, order, istart, iend, count
-	ubfx	\istart, \vstart, \shift, \order
-	ubfx	\iend, \vend, \shift, \order
-	add	\iend, \iend, \count, lsl \order
-	sub	\count, \iend, \istart
-	.endm
-
-/*
- * Map memory for specified virtual address range. Each level of page table needed supports
- * multiple entries. If a level requires n entries the next page table level is assumed to be
- * formed from n pages.
- *
- *	tbl:	location of page table
- *	rtbl:	address to be used for first level page table entry (typically tbl + PAGE_SIZE)
- *	vstart:	virtual address of start of range
- *	vend:	virtual address of end of range - we map [vstart, vend - 1]
- *	flags:	flags to use to map last level entries
- *	phys:	physical address corresponding to vstart - physical memory is contiguous
- *	order:  #imm 2log(number of entries in PGD table)
- *
- * If extra_shift is set, an extra level will be populated if the end address does
- * not fit in 'extra_shift' bits. This assumes vend is in the TTBR0 range.
- *
- * Temporaries:	istart, iend, tmp, count, sv - these need to be different registers
- * Preserves:	vstart, flags
- * Corrupts:	tbl, rtbl, vend, istart, iend, tmp, count, sv
- */
-	.macro map_memory, tbl, rtbl, vstart, vend, flags, phys, order, istart, iend, tmp, count, sv, extra_shift
-	sub \vend, \vend, #1
-	add \rtbl, \tbl, #PAGE_SIZE
-	mov \count, #0
-
-	.ifnb	\extra_shift
-	tst	\vend, #~((1 << (\extra_shift)) - 1)
-	b.eq	.L_\@
-	compute_indices \vstart, \vend, #\extra_shift, #(PAGE_SHIFT - 3), \istart, \iend, \count
-	mov \sv, \rtbl
-	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
-	mov \tbl, \sv
-	.endif
-.L_\@:
-	compute_indices \vstart, \vend, #PGDIR_SHIFT, #\order, \istart, \iend, \count
-	mov \sv, \rtbl
-	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
-	mov \tbl, \sv
-
-#if SWAPPER_PGTABLE_LEVELS > 3
-	compute_indices \vstart, \vend, #PUD_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
-	mov \sv, \rtbl
-	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
-	mov \tbl, \sv
-#endif
-
-#if SWAPPER_PGTABLE_LEVELS > 2
-	compute_indices \vstart, \vend, #SWAPPER_TABLE_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
-	mov \sv, \rtbl
-	populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmp
-	mov \tbl, \sv
-#endif
-
-	compute_indices \vstart, \vend, #SWAPPER_BLOCK_SHIFT, #(PAGE_SHIFT - 3), \istart, \iend, \count
-	bic \rtbl, \phys, #SWAPPER_BLOCK_SIZE - 1
-	populate_entries \tbl, \rtbl, \istart, \iend, \flags, #SWAPPER_BLOCK_SIZE, \tmp
-	.endm
-
-/*
- * Remap a subregion created with the map_memory macro with modified attributes
- * or output address. The entire remapped region must have been covered in the
- * invocation of map_memory.
- *
- * x0: last level table address (returned in first argument to map_memory)
- * x1: start VA of the existing mapping
- * x2: start VA of the region to update
- * x3: end VA of the region to update (exclusive)
- * x4: start PA associated with the region to update
- * x5: attributes to set on the updated region
- * x6: order of the last level mappings
- */
-SYM_FUNC_START_LOCAL(remap_region)
-	sub	x3, x3, #1		// make end inclusive
-
-	// Get the index offset for the start of the last level table
-	lsr	x1, x1, x6
-	bfi	x1, xzr, #0, #PAGE_SHIFT - 3
-
-	// Derive the start and end indexes into the last level table
-	// associated with the provided region
-	lsr	x2, x2, x6
-	lsr	x3, x3, x6
-	sub	x2, x2, x1
-	sub	x3, x3, x1
-
-	mov	x1, #1
-	lsl	x6, x1, x6		// block size at this level
-
-	populate_entries x0, x4, x2, x3, x5, x6, x7
-	ret
-SYM_FUNC_END(remap_region)
-
-SYM_FUNC_START_LOCAL(create_idmap)
-	mov	x28, lr
-	/*
-	 * The ID map carries a 1:1 mapping of the physical address range
-	 * covered by the loaded image, which could be anywhere in DRAM. This
-	 * means that the required size of the VA (== PA) space is decided at
-	 * boot time, and could be more than the configured size of the VA
-	 * space for ordinary kernel and user space mappings.
-	 *
-	 * There are three cases to consider here:
-	 * - 39 <= VA_BITS < 48, and the ID map needs up to 48 VA bits to cover
-	 *   the placement of the image. In this case, we configure one extra
-	 *   level of translation on the fly for the ID map only. (This case
-	 *   also covers 42-bit VA/52-bit PA on 64k pages).
-	 *
-	 * - VA_BITS == 48, and the ID map needs more than 48 VA bits. This can
-	 *   only happen when using 64k pages, in which case we need to extend
-	 *   the root level table rather than add a level. Note that we can
-	 *   treat this case as 'always extended' as long as we take care not
-	 *   to program an unsupported T0SZ value into the TCR register.
-	 *
-	 * - Combinations that would require two additional levels of
-	 *   translation are not supported, e.g., VA_BITS==36 on 16k pages, or
-	 *   VA_BITS==39/4k pages with 5-level paging, where the input address
-	 *   requires more than 47 or 48 bits, respectively.
-	 */
-#if (VA_BITS < 48)
-#define IDMAP_PGD_ORDER	(VA_BITS - PGDIR_SHIFT)
-#define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)
-
-	/*
-	 * If VA_BITS < 48, we have to configure an additional table level.
-	 * First, we have to verify our assumption that the current value of
-	 * VA_BITS was chosen such that all translation levels are fully
-	 * utilised, and that lowering T0SZ will always result in an additional
-	 * translation level to be configured.
-	 */
-#if VA_BITS != EXTRA_SHIFT
-#error "Mismatch between VA_BITS and page size/number of translation levels"
-#endif
-#else
-#define IDMAP_PGD_ORDER	(PHYS_MASK_SHIFT - PGDIR_SHIFT)
-#define EXTRA_SHIFT
-	/*
-	 * If VA_BITS == 48, we don't have to configure an additional
-	 * translation level, but the top-level table has more entries.
-	 */
-#endif
-	adrp	x0, init_idmap_pg_dir
-	adrp	x3, _text
-	adrp	x6, _end + MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE
-	mov_q	x7, SWAPPER_RX_MMUFLAGS
-
-	map_memory x0, x1, x3, x6, x7, x3, IDMAP_PGD_ORDER, x10, x11, x12, x13, x14, EXTRA_SHIFT
-
-	/* Remap [.init].data, BSS and the kernel page tables r/w in the ID map */
-	adrp	x1, _text
-	adrp	x2, __initdata_begin
-	adrp	x3, _end
-	bic	x4, x2, #SWAPPER_BLOCK_SIZE - 1
-	mov_q	x5, SWAPPER_RW_MMUFLAGS
-	mov	x6, #SWAPPER_BLOCK_SHIFT
-	bl	remap_region
-
-	/* Remap the FDT after the kernel image */
-	adrp	x1, _text
-	adrp	x22, _end + SWAPPER_BLOCK_SIZE
-	bic	x2, x22, #SWAPPER_BLOCK_SIZE - 1
-	bfi	x22, x21, #0, #SWAPPER_BLOCK_SHIFT		// remapped FDT address
-	add	x3, x2, #MAX_FDT_SIZE + SWAPPER_BLOCK_SIZE
-	bic	x4, x21, #SWAPPER_BLOCK_SIZE - 1
-	mov_q	x5, SWAPPER_RW_MMUFLAGS
-	mov	x6, #SWAPPER_BLOCK_SHIFT
-	bl	remap_region
-
-	/*
-	 * Since the page tables have been populated with non-cacheable
-	 * accesses (MMU disabled), invalidate those tables again to
-	 * remove any speculatively loaded cache lines.
-	 */
-	cbnz	x19, 0f				// skip cache invalidation if MMU is on
-	dmb	sy
-
-	adrp	x0, init_idmap_pg_dir
-	adrp	x1, init_idmap_pg_end
-	bl	dcache_inval_poc
-0:	ret	x28
-SYM_FUNC_END(create_idmap)
-
 	/*
 	 * Initialize CPU registers with task-specific and cpu-specific context.
 	 *
@@ -729,11 +513,6 @@ SYM_FUNC_START_LOCAL(__no_granule_support)
 SYM_FUNC_END(__no_granule_support)
 
 SYM_FUNC_START_LOCAL(__primary_switch)
-	mrs		x1, tcr_el1
-	mov		x2, #64 - VA_BITS
-	tcr_set_t0sz	x1, x2
-	msr		tcr_el1, x1
-
 	adrp	x1, reserved_pg_dir
 	adrp	x2, init_idmap_pg_dir
 	bl	__enable_mmu
@@ -742,7 +521,7 @@ SYM_FUNC_START_LOCAL(__primary_switch)
 	mov	sp, x1
 	mov	x29, xzr
 	mov	x0, x20				// pass the full boot status
-	mov	x1, x22				// pass the low FDT mapping
+	mov	x1, x21				// pass the FDT
 	bl	__pi_early_map_kernel		// Map and relocate the kernel
 
 	ldr	x8, =__primary_switched
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index e566b32f9c22..941a14c05184 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -52,6 +52,7 @@ PROVIDE(__pi_cavium_erratum_27456_cpus	= cavium_erratum_27456_cpus);
 PROVIDE(__pi__ctype			= _ctype);
 PROVIDE(__pi_memstart_offset_seed	= memstart_offset_seed);
 
+PROVIDE(__pi_init_idmap_pg_dir		= init_idmap_pg_dir);
 PROVIDE(__pi_init_pg_dir		= init_pg_dir);
 PROVIDE(__pi_init_pg_end		= init_pg_end);
 
diff --git a/arch/arm64/kernel/pi/Makefile b/arch/arm64/kernel/pi/Makefile
index 8c2f80a46b93..4393b41f0b71 100644
--- a/arch/arm64/kernel/pi/Makefile
+++ b/arch/arm64/kernel/pi/Makefile
@@ -11,6 +11,9 @@ KBUILD_CFLAGS	:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) -fpie \
 		   -fno-asynchronous-unwind-tables -fno-unwind-tables \
 		   $(call cc-option,-fno-addrsig)
 
+# this code may run with the MMU off so disable unaligned accesses
+CFLAGS_map_range.o += -mstrict-align
+
 # remove SCS flags from all objects in this directory
 KBUILD_CFLAGS	:= $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
 # disable LTO
diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
index f206373b28b0..f86e878d366d 100644
--- a/arch/arm64/kernel/pi/map_kernel.c
+++ b/arch/arm64/kernel/pi/map_kernel.c
@@ -128,6 +128,22 @@ static void __init map_kernel(u64 kaslr_offset, u64 va_offset, int root_level)
 	}
 }
 
+static void __init map_fdt(u64 fdt)
+{
+	static u8 ptes[INIT_IDMAP_FDT_SIZE] __initdata __aligned(PAGE_SIZE);
+	u64 efdt = fdt + MAX_FDT_SIZE;
+	u64 ptep = (u64)ptes;
+
+	/*
+	 * Map up to MAX_FDT_SIZE bytes, but avoid overlap with
+	 * the kernel image.
+	 */
+	map_range(&ptep, fdt, (u64)_text > fdt ? min((u64)_text, efdt) : efdt,
+		  fdt, PAGE_KERNEL, IDMAP_ROOT_LEVEL,
+		  (pte_t *)init_idmap_pg_dir, false, 0);
+	dsb(ishst);
+}
+
 asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 {
 	static char const chosen_str[] __initconst = "/chosen";
@@ -136,6 +152,8 @@ asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 	int root_level = 4 - CONFIG_PGTABLE_LEVELS;
 	int chosen;
 
+	map_fdt((u64)fdt);
+
 	/* Clear BSS and the initial page tables */
 	memset(__bss_start, 0, (u64)init_pg_end - (u64)__bss_start);
 
diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
index c31feda18f47..79e4f6a2efe1 100644
--- a/arch/arm64/kernel/pi/map_range.c
+++ b/arch/arm64/kernel/pi/map_range.c
@@ -86,3 +86,15 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
 		tbl++;
 	}
 }
+
+asmlinkage u64 __init create_init_idmap(pgd_t *pg_dir)
+{
+	u64 ptep = (u64)pg_dir + PAGE_SIZE;
+
+	map_range(&ptep, (u64)_stext, (u64)__initdata_begin, (u64)_stext,
+		  PAGE_KERNEL_ROX, IDMAP_ROOT_LEVEL, (pte_t *)pg_dir, false, 0);
+	map_range(&ptep, (u64)__initdata_begin, (u64)_end, (u64)__initdata_begin,
+		  PAGE_KERNEL, IDMAP_ROOT_LEVEL, (pte_t *)pg_dir, false, 0);
+
+	return ptep;
+}
diff --git a/arch/arm64/kernel/pi/pi.h b/arch/arm64/kernel/pi/pi.h
index d307c58e9741..1ea282a5f96a 100644
--- a/arch/arm64/kernel/pi/pi.h
+++ b/arch/arm64/kernel/pi/pi.h
@@ -21,6 +21,8 @@ static inline void *prel64_to_pointer(const prel64_t *offset)
 
 extern bool dynamic_scs_is_enabled;
 
+extern pgd_t init_idmap_pg_dir[];
+
 void init_feature_override(u64 boot_status, const void *fdt, int chosen);
 u64 kaslr_early_init(void *fdt, int chosen);
 void relocate_kernel(u64 offset);
@@ -30,3 +32,5 @@ void map_range(u64 *pgd, u64 start, u64 end, u64 pa, pgprot_t prot,
 	       int level, pte_t *tbl, bool may_use_cont, u64 va_offset);
 
 asmlinkage void early_map_kernel(u64 boot_status, void *fdt);
+
+asmlinkage u64 create_init_idmap(pgd_t *pgd);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a991f195592b..14a62c773201 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -45,8 +45,6 @@
 #define NO_CONT_MAPPINGS	BIT(1)
 #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
 
-int idmap_t0sz __ro_after_init;
-
 #if VA_BITS > 48
 u64 vabits_actual __ro_after_init = VA_BITS_MIN;
 EXPORT_SYMBOL(vabits_actual);
@@ -793,8 +791,6 @@ void __init paging_init(void)
 	pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir));
 	extern pgd_t init_idmap_pg_dir[];
 
-	idmap_t0sz = 63UL - __fls(__pa_symbol(_end) | GENMASK(VA_BITS_MIN - 1, 0));
-
 	map_kernel(pgdp);
 	map_mem(pgdp);
 
@@ -809,7 +805,6 @@ void __init paging_init(void)
 	memblock_allow_resize();
 
 	create_idmap();
-	idmap_t0sz = TCR_T0SZ(IDMAP_VA_BITS);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 47ede52bb900..55c366dbda8f 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -200,7 +200,8 @@ SYM_FUNC_ALIAS(__pi_idmap_cpu_replace_ttbr1, idmap_cpu_replace_ttbr1)
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 
-#define KPTI_NG_PTE_FLAGS	(PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS | PTE_WRITE)
+#define KPTI_NG_PTE_FLAGS	(PTE_ATTRINDX(MT_NORMAL) | PTE_TYPE_PAGE | \
+				 PTE_AF | PTE_SHARED | PTE_UXN | PTE_WRITE)
 
 	.pushsection ".idmap.text", "a"
 
-- 
2.43.0.687.g38aa6559b0-goog




* [PATCH v8 22/43] arm64: mm: avoid fixmap for early swapper_pg_dir updates
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (20 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 21/43] arm64: kernel: Create initial ID map from C code Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 23/43] arm64: mm: omit redundant remap of kernel image Ard Biesheuvel
                   ` (21 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Early in the boot, when .rodata is still writable, we can poke
swapper_pg_dir entries directly, and there is no need to go through the
fixmap. After a future patch, we will enter the kernel with
swapper_pg_dir already active, and early swapper_pg_dir updates for
creating the fixmap page table hierarchy itself cannot go through the
fixmap for obvious reasons. So let's keep track of whether rodata is
writable, and update the descriptor directly in that case.

As the same reasoning applies to early KASAN init, make the function
noinstr as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/mmu.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 14a62c773201..9758f7e3f4b6 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -55,6 +55,8 @@ EXPORT_SYMBOL(kimage_voffset);
 
 u32 __boot_cpu_mode[] = { BOOT_CPU_MODE_EL2, BOOT_CPU_MODE_EL1 };
 
+static bool rodata_is_rw __ro_after_init = true;
+
 /*
  * The booting CPU updates the failed status @__early_cpu_boot_status,
  * with MMU turned off.
@@ -71,10 +73,21 @@ EXPORT_SYMBOL(empty_zero_page);
 static DEFINE_SPINLOCK(swapper_pgdir_lock);
 static DEFINE_MUTEX(fixmap_lock);
 
-void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
+void noinstr set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
 {
 	pgd_t *fixmap_pgdp;
 
+	/*
+	 * Don't bother with the fixmap if swapper_pg_dir is still mapped
+	 * writable in the kernel mapping.
+	 */
+	if (rodata_is_rw) {
+		WRITE_ONCE(*pgdp, pgd);
+		dsb(ishst);
+		isb();
+		return;
+	}
+
 	spin_lock(&swapper_pgdir_lock);
 	fixmap_pgdp = pgd_set_fixmap(__pa_symbol(pgdp));
 	WRITE_ONCE(*fixmap_pgdp, pgd);
@@ -628,6 +641,7 @@ void mark_rodata_ro(void)
 	 * to cover NOTES and EXCEPTION_TABLE.
 	 */
 	section_size = (unsigned long)__init_begin - (unsigned long)__start_rodata;
+	WRITE_ONCE(rodata_is_rw, false);
 	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
 			    section_size, PAGE_KERNEL_RO);
 
-- 
2.43.0.687.g38aa6559b0-goog




* [PATCH v8 23/43] arm64: mm: omit redundant remap of kernel image
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (21 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 22/43] arm64: mm: avoid fixmap for early swapper_pg_dir updates Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 24/43] arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()" Ard Biesheuvel
                   ` (20 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Now that the early kernel mapping is created with all the right
attributes and segment boundaries, there is no longer a need to recreate
it and switch to it. This also means we no longer have to copy the kasan
shadow or some parts of the fixmap from one set of page tables to the
other.
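
For illustration only (this is not part of the patch): a minimal
userspace sketch of why copying just the root table is enough. The root
entries only record where the next-level tables live, so a copy of the
root reaches the same lower-level tables. All toy_* names and the table
size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PTRS_PER_PGD 8	/* hypothetical; the real value depends on the config */

/* Toy root table: each entry holds the address of a next-level table. */
typedef struct { uint64_t entry[TOY_PTRS_PER_PGD]; } toy_pgd_t;

/* Copy only the root level; next-level tables stay shared by reference. */
static inline void toy_copy_root(toy_pgd_t *dst, const toy_pgd_t *src)
{
	assert(dst != src);
	memcpy(dst, src, sizeof(*dst));
}

/* Return the next-level table address recorded for a root index. */
static inline uint64_t toy_next_level(const toy_pgd_t *root, unsigned int idx)
{
	assert(idx < TOY_PTRS_PER_PGD);
	return root->entry[idx];
}
```

After toy_copy_root(), a lookup through the copy returns the same
next-level address as a lookup through the original, which is the
property the memcpy() of init_pg_dir into swapper_pg_dir relies on.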

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/fixmap.h   |  1 -
 arch/arm64/include/asm/kasan.h    |  2 -
 arch/arm64/include/asm/mmu.h      |  2 +-
 arch/arm64/kernel/image-vars.h    |  1 +
 arch/arm64/kernel/pi/map_kernel.c |  6 +-
 arch/arm64/mm/fixmap.c            | 34 --------
 arch/arm64/mm/kasan_init.c        | 15 ----
 arch/arm64/mm/mmu.c               | 85 ++++----------------
 8 files changed, 21 insertions(+), 125 deletions(-)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 58c294a96676..8aabd45e9a13 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -100,7 +100,6 @@ enum fixed_addresses {
 #define FIXMAP_PAGE_IO     __pgprot(PROT_DEVICE_nGnRE)
 
 void __init early_fixmap_init(void);
-void __init fixmap_copy(pgd_t *pgdir);
 
 #define __early_set_fixmap __set_fixmap
 
diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
index 7eefc525a9df..e1b57c13f8a4 100644
--- a/arch/arm64/include/asm/kasan.h
+++ b/arch/arm64/include/asm/kasan.h
@@ -17,11 +17,9 @@
 
 asmlinkage void kasan_early_init(void);
 void kasan_init(void);
-void kasan_copy_shadow(pgd_t *pgdir);
 
 #else
 static inline void kasan_init(void) { }
-static inline void kasan_copy_shadow(pgd_t *pgdir) { }
 #endif
 
 #endif
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index d0b8b4b413b6..65977c7783c5 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -110,7 +110,7 @@ static inline bool kaslr_requires_kpti(void)
 }
 
 #define INIT_MM_CONTEXT(name)	\
-	.pgd = init_pg_dir,
+	.pgd = swapper_pg_dir,
 
 #endif	/* !__ASSEMBLY__ */
 #endif
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 941a14c05184..e140c5bda90b 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -55,6 +55,7 @@ PROVIDE(__pi_memstart_offset_seed	= memstart_offset_seed);
 PROVIDE(__pi_init_idmap_pg_dir		= init_idmap_pg_dir);
 PROVIDE(__pi_init_pg_dir		= init_pg_dir);
 PROVIDE(__pi_init_pg_end		= init_pg_end);
+PROVIDE(__pi_swapper_pg_dir		= swapper_pg_dir);
 
 PROVIDE(__pi__text			= _text);
 PROVIDE(__pi__stext               	= _stext);
diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
index f86e878d366d..4b76a007a50d 100644
--- a/arch/arm64/kernel/pi/map_kernel.c
+++ b/arch/arm64/kernel/pi/map_kernel.c
@@ -124,8 +124,12 @@ static void __init map_kernel(u64 kaslr_offset, u64 va_offset, int root_level)
 			    text_prot, true, root_level);
 		map_segment(init_pg_dir, NULL, va_offset, __inittext_begin,
 			    __inittext_end, text_prot, false, root_level);
-		dsb(ishst);
 	}
+
+	/* Copy the root page table to its final location */
+	memcpy((void *)swapper_pg_dir + va_offset, init_pg_dir, PGD_SIZE);
+	dsb(ishst);
+	idmap_cpu_replace_ttbr1(swapper_pg_dir);
 }
 
 static void __init map_fdt(u64 fdt)
diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
index 6fc17b2e1714..9404f282f829 100644
--- a/arch/arm64/mm/fixmap.c
+++ b/arch/arm64/mm/fixmap.c
@@ -170,37 +170,3 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
 
 	return dt_virt;
 }
-
-/*
- * Copy the fixmap region into a new pgdir.
- */
-void __init fixmap_copy(pgd_t *pgdir)
-{
-	if (!READ_ONCE(pgd_val(*pgd_offset_pgd(pgdir, FIXADDR_TOT_START)))) {
-		/*
-		 * The fixmap falls in a separate pgd to the kernel, and doesn't
-		 * live in the carveout for the swapper_pg_dir. We can simply
-		 * re-use the existing dir for the fixmap.
-		 */
-		set_pgd(pgd_offset_pgd(pgdir, FIXADDR_TOT_START),
-			READ_ONCE(*pgd_offset_k(FIXADDR_TOT_START)));
-	} else if (CONFIG_PGTABLE_LEVELS > 3) {
-		pgd_t *bm_pgdp;
-		p4d_t *bm_p4dp;
-		pud_t *bm_pudp;
-		/*
-		 * The fixmap shares its top level pgd entry with the kernel
-		 * mapping. This can really only occur when we are running
-		 * with 16k/4 levels, so we can simply reuse the pud level
-		 * entry instead.
-		 */
-		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
-		bm_pgdp = pgd_offset_pgd(pgdir, FIXADDR_TOT_START);
-		bm_p4dp = p4d_offset(bm_pgdp, FIXADDR_TOT_START);
-		bm_pudp = pud_set_fixmap_offset(bm_p4dp, FIXADDR_TOT_START);
-		pud_populate(&init_mm, bm_pudp, lm_alias(bm_pmd));
-		pud_clear_fixmap();
-	} else {
-		BUG();
-	}
-}
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 4c7ad574b946..89828ad2bca7 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -189,21 +189,6 @@ static void __init kasan_map_populate(unsigned long start, unsigned long end,
 	kasan_pgd_populate(start & PAGE_MASK, PAGE_ALIGN(end), node, false);
 }
 
-/*
- * Copy the current shadow region into a new pgdir.
- */
-void __init kasan_copy_shadow(pgd_t *pgdir)
-{
-	pgd_t *pgdp, *pgdp_new, *pgdp_end;
-
-	pgdp = pgd_offset_k(KASAN_SHADOW_START);
-	pgdp_end = pgd_offset_k(KASAN_SHADOW_END);
-	pgdp_new = pgd_offset_pgd(pgdir, KASAN_SHADOW_START);
-	do {
-		set_pgd(pgdp_new, READ_ONCE(*pgdp));
-	} while (pgdp++, pgdp_new++, pgdp != pgdp_end);
-}
-
 static void __init clear_pgds(unsigned long start,
 			unsigned long end)
 {
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9758f7e3f4b6..3db40b517947 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -648,9 +648,9 @@ void mark_rodata_ro(void)
 	debug_checkwx();
 }
 
-static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
-				      pgprot_t prot, struct vm_struct *vma,
-				      int flags, unsigned long vm_flags)
+static void __init declare_vma(struct vm_struct *vma,
+			       void *va_start, void *va_end,
+			       unsigned long vm_flags)
 {
 	phys_addr_t pa_start = __pa_symbol(va_start);
 	unsigned long size = va_end - va_start;
@@ -658,9 +658,6 @@ static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
 	BUG_ON(!PAGE_ALIGNED(pa_start));
 	BUG_ON(!PAGE_ALIGNED(size));
 
-	__create_pgd_mapping(pgdp, pa_start, (unsigned long)va_start, size, prot,
-			     early_pgtable_alloc, flags);
-
 	if (!(vm_flags & VM_NO_GUARD))
 		size += PAGE_SIZE;
 
@@ -673,12 +670,12 @@ static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
 	vm_area_add_early(vma);
 }
 
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 static pgprot_t kernel_exec_prot(void)
 {
 	return rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
 }
 
-#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 static int __init map_entry_trampoline(void)
 {
 	int i;
@@ -713,60 +710,17 @@ core_initcall(map_entry_trampoline);
 #endif
 
 /*
- * Open coded check for BTI, only for use to determine configuration
- * for early mappings for before the cpufeature code has run.
- */
-static bool arm64_early_this_cpu_has_bti(void)
-{
-	u64 pfr1;
-
-	if (!IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
-		return false;
-
-	pfr1 = __read_sysreg_by_encoding(SYS_ID_AA64PFR1_EL1);
-	return cpuid_feature_extract_unsigned_field(pfr1,
-						    ID_AA64PFR1_EL1_BT_SHIFT);
-}
-
-/*
- * Create fine-grained mappings for the kernel.
+ * Declare the VMA areas for the kernel
  */
-static void __init map_kernel(pgd_t *pgdp)
+static void __init declare_kernel_vmas(void)
 {
-	static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
-				vmlinux_initdata, vmlinux_data;
-
-	/*
-	 * External debuggers may need to write directly to the text
-	 * mapping to install SW breakpoints. Allow this (only) when
-	 * explicitly requested with rodata=off.
-	 */
-	pgprot_t text_prot = kernel_exec_prot();
-
-	/*
-	 * If we have a CPU that supports BTI and a kernel built for
-	 * BTI then mark the kernel executable text as guarded pages
-	 * now so we don't have to rewrite the page tables later.
-	 */
-	if (arm64_early_this_cpu_has_bti())
-		text_prot = __pgprot_modify(text_prot, PTE_GP, PTE_GP);
+	static struct vm_struct vmlinux_seg[KERNEL_SEGMENT_COUNT];
 
-	/*
-	 * Only rodata will be remapped with different permissions later on,
-	 * all other segments are allowed to use contiguous mappings.
-	 */
-	map_kernel_segment(pgdp, _stext, _etext, text_prot, &vmlinux_text, 0,
-			   VM_NO_GUARD);
-	map_kernel_segment(pgdp, __start_rodata, __inittext_begin, PAGE_KERNEL,
-			   &vmlinux_rodata, NO_CONT_MAPPINGS, VM_NO_GUARD);
-	map_kernel_segment(pgdp, __inittext_begin, __inittext_end, text_prot,
-			   &vmlinux_inittext, 0, VM_NO_GUARD);
-	map_kernel_segment(pgdp, __initdata_begin, __initdata_end, PAGE_KERNEL,
-			   &vmlinux_initdata, 0, VM_NO_GUARD);
-	map_kernel_segment(pgdp, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);
-
-	fixmap_copy(pgdp);
-	kasan_copy_shadow(pgdp);
+	declare_vma(&vmlinux_seg[0], _stext, _etext, VM_NO_GUARD);
+	declare_vma(&vmlinux_seg[1], __start_rodata, __inittext_begin, VM_NO_GUARD);
+	declare_vma(&vmlinux_seg[2], __inittext_begin, __inittext_end, VM_NO_GUARD);
+	declare_vma(&vmlinux_seg[3], __initdata_begin, __initdata_end, VM_NO_GUARD);
+	declare_vma(&vmlinux_seg[4], _data, _end, 0);
 }
 
 void __pi_map_range(u64 *pgd, u64 start, u64 end, u64 pa, pgprot_t prot,
@@ -802,23 +756,12 @@ static void __init create_idmap(void)
 
 void __init paging_init(void)
 {
-	pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir));
-	extern pgd_t init_idmap_pg_dir[];
-
-	map_kernel(pgdp);
-	map_mem(pgdp);
-
-	pgd_clear_fixmap();
-
-	cpu_replace_ttbr1(lm_alias(swapper_pg_dir), init_idmap_pg_dir);
-	init_mm.pgd = swapper_pg_dir;
-
-	memblock_phys_free(__pa_symbol(init_pg_dir),
-			   __pa_symbol(init_pg_end) - __pa_symbol(init_pg_dir));
+	map_mem(swapper_pg_dir);
 
 	memblock_allow_resize();
 
 	create_idmap();
+	declare_kernel_vmas();
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 24/43] arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()"
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (22 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 23/43] arm64: mm: omit redundant remap of kernel image Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 25/43] arm64: mm: Handle LVA support as a CPU feature Ard Biesheuvel
                   ` (19 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

This reverts commit 1682c45b920643c, which is no longer needed now that
we create the permanent kernel mapping directly during early boot.

This is a RINO (revert in name only) given that some of the code has
moved around, but the changes are straightforward.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/mmu_context.h | 17 ++++++-----------
 arch/arm64/mm/kasan_init.c           |  4 ++--
 arch/arm64/mm/mmu.c                  |  4 ++--
 3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index a8a89a0f2867..c768d16b81a4 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -108,18 +108,13 @@ static inline void cpu_uninstall_idmap(void)
 		cpu_switch_mm(mm->pgd, mm);
 }
 
-static inline void __cpu_install_idmap(pgd_t *idmap)
+static inline void cpu_install_idmap(void)
 {
 	cpu_set_reserved_ttbr0();
 	local_flush_tlb_all();
 	cpu_set_idmap_tcr_t0sz();
 
-	cpu_switch_mm(lm_alias(idmap), &init_mm);
-}
-
-static inline void cpu_install_idmap(void)
-{
-	__cpu_install_idmap(idmap_pg_dir);
+	cpu_switch_mm(lm_alias(idmap_pg_dir), &init_mm);
 }
 
 /*
@@ -146,21 +141,21 @@ static inline void cpu_install_ttbr0(phys_addr_t ttbr0, unsigned long t0sz)
 	isb();
 }
 
-void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp);
+void __cpu_replace_ttbr1(pgd_t *pgdp, bool cnp);
 
 static inline void cpu_enable_swapper_cnp(void)
 {
-	__cpu_replace_ttbr1(lm_alias(swapper_pg_dir), idmap_pg_dir, true);
+	__cpu_replace_ttbr1(lm_alias(swapper_pg_dir), true);
 }
 
-static inline void cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap)
+static inline void cpu_replace_ttbr1(pgd_t *pgdp)
 {
 	/*
 	 * Only for early TTBR1 replacement before cpucaps are finalized and
 	 * before we've decided whether to use CNP.
 	 */
 	WARN_ON(system_capabilities_finalized());
-	__cpu_replace_ttbr1(pgdp, idmap, false);
+	__cpu_replace_ttbr1(pgdp, false);
 }
 
 /*
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 89828ad2bca7..a86ab99587c9 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -225,7 +225,7 @@ static void __init kasan_init_shadow(void)
 	 */
 	memcpy(tmp_pg_dir, swapper_pg_dir, sizeof(tmp_pg_dir));
 	dsb(ishst);
-	cpu_replace_ttbr1(lm_alias(tmp_pg_dir), idmap_pg_dir);
+	cpu_replace_ttbr1(lm_alias(tmp_pg_dir));
 
 	clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
 
@@ -261,7 +261,7 @@ static void __init kasan_init_shadow(void)
 				PAGE_KERNEL_RO));
 
 	memset(kasan_early_shadow_page, KASAN_SHADOW_INIT, PAGE_SIZE);
-	cpu_replace_ttbr1(lm_alias(swapper_pg_dir), idmap_pg_dir);
+	cpu_replace_ttbr1(lm_alias(swapper_pg_dir));
 }
 
 static void __init kasan_init_depth(void)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 3db40b517947..a3d23da92d87 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1445,7 +1445,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte
  * Atomically replaces the active TTBR1_EL1 PGD with a new VA-compatible PGD,
  * avoiding the possibility of conflicting TLB entries being allocated.
  */
-void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp)
+void __cpu_replace_ttbr1(pgd_t *pgdp, bool cnp)
 {
 	typedef void (ttbr_replace_func)(phys_addr_t);
 	extern ttbr_replace_func idmap_cpu_replace_ttbr1;
@@ -1460,7 +1460,7 @@ void __cpu_replace_ttbr1(pgd_t *pgdp, pgd_t *idmap, bool cnp)
 
 	replace_phys = (void *)__pa_symbol(idmap_cpu_replace_ttbr1);
 
-	__cpu_install_idmap(idmap);
+	cpu_install_idmap();
 
 	/*
 	 * We really don't want to take *any* exceptions while TTBR1 is
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 25/43] arm64: mm: Handle LVA support as a CPU feature
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (23 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 24/43] arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()" Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 26/43] arm64: mm: Add feature override support for LVA Ard Biesheuvel
                   ` (18 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Currently, we detect CPU support for 52-bit virtual addressing (LVA)
extremely early, before creating the kernel page tables or enabling the
MMU. We cannot override the feature this early, and so large virtual
addressing is always enabled on CPUs that implement it, provided that
the kernel was built with support for it. It also means we rely on
non-trivial asm code to deal with this feature.

Given that both the ID map and the TTBR1 mapping of the kernel image are
guaranteed to be 48-bit addressable, it is not actually necessary to
enable support this early, and instead, we can model it as a CPU
feature. That way, we can rely on code patching to get the correct
TCR.T1SZ values programmed on secondary boot and resume from suspend.

On the primary boot path, we simply enable the MMU with 48-bit virtual
addressing initially, and update TCR.T1SZ if LVA is supported from C
code, right before creating the kernel mapping. Given that TTBR1 still
points to reserved_pg_dir at this point, updating TCR.T1SZ should be
safe without the need for explicit TLB maintenance.
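
As a rough illustration of that update (not the kernel code itself),
the clear-and-set of the T1SZ field can be sketched on a plain integer.
The sketch_* names are hypothetical; the field position (bit 16) and
width (6 bits) match the shift and mask used for vabits_actual in this
patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_T1SZ_SHIFT	16
#define SKETCH_T1SZ_MASK	(UINT64_C(0x3f) << SKETCH_T1SZ_SHIFT)

/* Encode a VA size as a T1SZ field value: T1SZ = 64 - va_bits. */
static inline uint64_t sketch_t1sz(unsigned int va_bits)
{
	assert(va_bits >= 1 && va_bits <= 64);
	return (uint64_t)(64 - va_bits) << SKETCH_T1SZ_SHIFT;
}

/* Clear the old T1SZ field and set the new one; other bits are kept. */
static inline uint64_t sketch_set_t1sz(uint64_t tcr, unsigned int va_bits)
{
	return (tcr & ~SKETCH_T1SZ_MASK) | sketch_t1sz(va_bits);
}
```

Starting from a value programmed for 48 bits, sketch_set_t1sz(tcr, 52)
changes only the T1SZ field, which is what the sysreg_clear_set() call
in early_map_kernel() does on the real register.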

Since this gets rid of all accesses to the vabits_actual variable from
asm code that occurred before TCR.T1SZ had been programmed, we no longer
have a need for this variable, and we can replace it with a C expression
that produces the correct value directly, based on the value of TCR.T1SZ.
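
A minimal sketch of that expression, assuming the TCR value is passed
in as a plain integer rather than read with mrs (the helper name
sketch_vabits_actual is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Derive the number of kernel VA bits from a TCR_EL1 value:
 * T1SZ lives in bits [21:16], and VA bits = 64 - T1SZ.
 */
static inline unsigned int sketch_vabits_actual(uint64_t tcr)
{
	unsigned int va_bits = 64 - (unsigned int)((tcr >> 16) & 63);

	/* Invariant of the arithmetic above: T1SZ is in 0..63. */
	assert(va_bits >= 1 && va_bits <= 64);
	return va_bits;
}
```

With T1SZ == 16 this yields 48, and with T1SZ == 12 it yields 52, so the
value follows whatever was last programmed into the register.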

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/cpufeature.h |  9 ++++++
 arch/arm64/include/asm/memory.h     | 13 ++++++++-
 arch/arm64/kernel/cpufeature.c      | 13 +++++++++
 arch/arm64/kernel/head.S            | 29 +++++---------------
 arch/arm64/kernel/image-vars.h      |  1 -
 arch/arm64/kernel/pi/map_kernel.c   |  3 ++
 arch/arm64/kernel/sleep.S           |  3 --
 arch/arm64/mm/mmu.c                 |  5 ----
 arch/arm64/mm/proc.S                |  9 +++---
 arch/arm64/tools/cpucaps            |  1 +
 10 files changed, 49 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index e3edae1825f3..4f4dc5496ee3 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -995,6 +995,15 @@ static inline bool cpu_has_pac(void)
 					    &id_aa64isar2_override);
 }
 
+static inline bool cpu_has_lva(void)
+{
+	u64 mmfr2;
+
+	mmfr2 = read_sysreg_s(SYS_ID_AA64MMFR2_EL1);
+	return cpuid_feature_extract_unsigned_field(mmfr2,
+						    ID_AA64MMFR2_EL1_VARange_SHIFT);
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 60904a6c4b42..9680d7444b3b 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -209,9 +209,20 @@
 #include <asm/boot.h>
 #include <asm/bug.h>
 #include <asm/sections.h>
+#include <asm/sysreg.h>
+
+static inline u64 __pure read_tcr(void)
+{
+	u64  tcr;
+
+	// read_sysreg() uses asm volatile, so avoid it here
+	asm("mrs %0, tcr_el1" : "=r"(tcr));
+	return tcr;
+}
 
 #if VA_BITS > 48
-extern u64			vabits_actual;
+// For reasons of #include hell, we can't use TCR_T1SZ_OFFSET/TCR_T1SZ_MASK here
+#define vabits_actual		(64 - ((read_tcr() >> 16) & 63))
 #else
 #define vabits_actual		((u64)VA_BITS)
 #endif
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 7064cf13f226..8eb8c7f7b317 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2692,6 +2692,19 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
 		.matches = has_lpa2,
 	},
+#ifdef CONFIG_ARM64_VA_BITS_52
+	{
+		.desc = "52-bit Virtual Addressing (LVA)",
+		.capability = ARM64_HAS_VA52,
+		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
+		.sys_reg = SYS_ID_AA64MMFR2_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_width = 4,
+		.field_pos = ID_AA64MMFR2_EL1_VARange_SHIFT,
+		.matches = has_cpuid_feature,
+		.min_field_value = ID_AA64MMFR2_EL1_VARange_52,
+	},
+#endif
 	{},
 };
 
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 545b5d8976f4..e25351addfd0 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -80,7 +80,6 @@
 	 *  x19        primary_entry() .. start_kernel()        whether we entered with the MMU on
 	 *  x20        primary_entry() .. __primary_switch()    CPU boot mode
 	 *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
-	 *  x25        primary_entry() .. start_kernel()        supported VA size
 	 */
 SYM_CODE_START(primary_entry)
 	bl	record_mmu_state
@@ -125,14 +124,6 @@ SYM_CODE_START(primary_entry)
 	 * On return, the CPU will be ready for the MMU to be turned on and
 	 * the TCR will have been set.
 	 */
-#if VA_BITS > 48
-	mrs_s	x0, SYS_ID_AA64MMFR2_EL1
-	tst	x0, ID_AA64MMFR2_EL1_VARange_MASK
-	mov	x0, #VA_BITS
-	mov	x25, #VA_BITS_MIN
-	csel	x25, x25, x0, eq
-	mov	x0, x25
-#endif
 	bl	__cpu_setup			// initialise processor
 	b	__primary_switch
 SYM_CODE_END(primary_entry)
@@ -242,11 +233,6 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 	mov	x0, x20
 	bl	set_cpu_boot_mode_flag
 
-#if VA_BITS > 48
-	adr_l	x8, vabits_actual		// Set this early so KASAN early init
-	str	x25, [x8]			// ... observes the correct value
-	dc	civac, x8			// Make visible to booting secondaries
-#endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	bl	kasan_early_init
 #endif
@@ -376,10 +362,13 @@ SYM_FUNC_START_LOCAL(secondary_startup)
 	 * Common entry point for secondary CPUs.
 	 */
 	mov	x20, x0				// preserve boot mode
+
+#ifdef CONFIG_ARM64_VA_BITS_52
+alternative_if ARM64_HAS_VA52
 	bl	__cpu_secondary_check52bitva
-#if VA_BITS > 48
-	ldr_l	x0, vabits_actual
+alternative_else_nop_endif
 #endif
+
 	bl	__cpu_setup			// initialise processor
 	adrp	x1, swapper_pg_dir
 	adrp	x2, idmap_pg_dir
@@ -482,12 +471,8 @@ SYM_FUNC_START(__enable_mmu)
 	ret
 SYM_FUNC_END(__enable_mmu)
 
+#ifdef CONFIG_ARM64_VA_BITS_52
 SYM_FUNC_START(__cpu_secondary_check52bitva)
-#if VA_BITS > 48
-	ldr_l	x0, vabits_actual
-	cmp	x0, #52
-	b.ne	2f
-
 	mrs_s	x0, SYS_ID_AA64MMFR2_EL1
 	and	x0, x0, ID_AA64MMFR2_EL1_VARange_MASK
 	cbnz	x0, 2f
@@ -498,9 +483,9 @@ SYM_FUNC_START(__cpu_secondary_check52bitva)
 	wfi
 	b	1b
 
-#endif
 2:	ret
 SYM_FUNC_END(__cpu_secondary_check52bitva)
+#endif
 
 SYM_FUNC_START_LOCAL(__no_granule_support)
 	/* Indicate that this CPU can't boot and is stuck in the kernel */
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index e140c5bda90b..2b9d702abe0f 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -36,7 +36,6 @@ PROVIDE(__pi___memcpy			= __pi_memcpy);
 PROVIDE(__pi___memmove			= __pi_memmove);
 PROVIDE(__pi___memset			= __pi_memset);
 
-PROVIDE(__pi_vabits_actual		= vabits_actual);
 PROVIDE(__pi_id_aa64isar1_override	= id_aa64isar1_override);
 PROVIDE(__pi_id_aa64isar2_override	= id_aa64isar2_override);
 PROVIDE(__pi_id_aa64mmfr1_override	= id_aa64mmfr1_override);
diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
index 4b76a007a50d..1853825aa29d 100644
--- a/arch/arm64/kernel/pi/map_kernel.c
+++ b/arch/arm64/kernel/pi/map_kernel.c
@@ -165,6 +165,9 @@ asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 	chosen = fdt_path_offset(fdt, chosen_str);
 	init_feature_override(boot_status, fdt, chosen);
 
+	if (VA_BITS > VA_BITS_MIN && cpu_has_lva())
+		sysreg_clear_set(tcr_el1, TCR_T1SZ_MASK, TCR_T1SZ(VA_BITS));
+
 	/*
 	 * The virtual KASLR displacement modulo 2MiB is decided by the
 	 * physical placement of the image, as otherwise, we might not be able
diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index 2aa5129d8253..f093cdf71be1 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -102,9 +102,6 @@ SYM_CODE_START(cpu_resume)
 	mov	x0, xzr
 	bl	init_kernel_el
 	mov	x19, x0			// preserve boot mode
-#if VA_BITS > 48
-	ldr_l	x0, vabits_actual
-#endif
 	bl	__cpu_setup
 	/* enable the MMU early - so we can access sleep_save_stash by va */
 	adrp	x1, swapper_pg_dir
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a3d23da92d87..ba00d0205447 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -45,11 +45,6 @@
 #define NO_CONT_MAPPINGS	BIT(1)
 #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
 
-#if VA_BITS > 48
-u64 vabits_actual __ro_after_init = VA_BITS_MIN;
-EXPORT_SYMBOL(vabits_actual);
-#endif
-
 u64 kimage_voffset __ro_after_init;
 EXPORT_SYMBOL(kimage_voffset);
 
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 55c366dbda8f..d104ddab26a4 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -397,8 +397,6 @@ SYM_FUNC_END(idmap_kpti_install_ng_mappings)
  *
  *	Initialise the processor for turning the MMU on.
  *
- * Input:
- *	x0 - actual number of VA bits (ignored unless VA_BITS > 48)
  * Output:
  *	Return in x0 the value of the SCTLR_EL1 register.
  */
@@ -422,16 +420,17 @@ SYM_FUNC_START(__cpu_setup)
 	mair	.req	x17
 	tcr	.req	x16
 	mov_q	mair, MAIR_EL1_SET
-	mov_q	tcr, TCR_T0SZ(IDMAP_VA_BITS) | TCR_T1SZ(VA_BITS) | TCR_CACHE_FLAGS | \
+	mov_q	tcr, TCR_T0SZ(IDMAP_VA_BITS) | TCR_T1SZ(VA_BITS_MIN) | TCR_CACHE_FLAGS | \
 		     TCR_SMP_FLAGS | TCR_TG_FLAGS | TCR_KASLR_FLAGS | TCR_ASID16 | \
 		     TCR_TBI0 | TCR_A1 | TCR_KASAN_SW_FLAGS | TCR_MTE_FLAGS
 
 	tcr_clear_errata_bits tcr, x9, x5
 
 #ifdef CONFIG_ARM64_VA_BITS_52
-	sub		x9, xzr, x0
-	add		x9, x9, #64
+	mov		x9, #64 - VA_BITS
+alternative_if ARM64_HAS_VA52
 	tcr_set_t1sz	tcr, x9
+alternative_else_nop_endif
 #endif
 
 	/*
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index b912b1409fc0..b370d808b3ec 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -50,6 +50,7 @@ HAS_STAGE2_FWB
 HAS_TCR2
 HAS_TIDCP1
 HAS_TLB_RANGE
+HAS_VA52
 HAS_VIRT_HOST_EXTN
 HAS_WFXT
 HW_DBM
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 26/43] arm64: mm: Add feature override support for LVA
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (24 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 25/43] arm64: mm: Handle LVA support as a CPU feature Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 27/43] arm64: Avoid #define'ing PTE_MAYBE_NG to 0x0 for asm use Ard Biesheuvel
                   ` (17 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Add support for overriding the VARange field of the MMFR2 CPU ID
register. This permits the associated LVA feature to be overridden early
enough for the boot code that creates the kernel mapping to take it into
account.

Given that LPA2 implies LVA, disabling the latter should disable the
former as well. So override the ID_AA64MMFR0.TGran field of the current
page size as well if it advertises support for 52-bit addressing.
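
As an illustration only, the way an override is folded into an ID
register value before a field is extracted can be sketched as follows.
The sketch_* names are hypothetical, and the VARange position (bits
[19:16]) is stated here as an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VARANGE_SHIFT	16	/* assumed position of VARange */

struct sketch_override {
	uint64_t val;	/* replacement bits */
	uint64_t mask;	/* which bits are overridden */
};

/* Replace the masked bits of reg with the override value. */
static inline uint64_t sketch_apply_override(uint64_t reg,
					     const struct sketch_override *o)
{
	/* Sketch precondition: the value bits lie within the mask. */
	assert((o->val & ~o->mask) == 0);
	return (reg & ~o->mask) | o->val;
}

/* Extract an unsigned 4-bit ID register field. */
static inline unsigned int sketch_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}
```

An override with the VARange mask set and a value of zero makes the
field read as 0 while leaving the other fields untouched, which mirrors
what cpu_has_lva() observes when arm64.nolva is passed.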

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/assembler.h    | 17 ++++++-----
 arch/arm64/include/asm/cpufeature.h   |  4 +++
 arch/arm64/kernel/cpufeature.c        |  8 +++--
 arch/arm64/kernel/image-vars.h        |  2 ++
 arch/arm64/kernel/pi/idreg-override.c | 31 ++++++++++++++++++++
 5 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 6a467c694039..68a99b116256 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -576,18 +576,21 @@ alternative_endif
 	.endm
 
 /*
- * Offset ttbr1 to allow for 48-bit kernel VAs set with 52-bit PTRS_PER_PGD.
+ * If the kernel is built for 52-bit virtual addressing but the hardware only
+ * supports 48 bits, we cannot program the pgdir address into TTBR1 directly,
+ * but we have to add an offset so that the TTBR1 address corresponds with the
+ * pgdir entry that covers the lowest 48-bit addressable VA.
+ *
  * orr is used as it can cover the immediate value (and is idempotent).
- * In future this may be nop'ed out when dealing with 52-bit kernel VAs.
  * 	ttbr: Value of ttbr to set, modified.
  */
 	.macro	offset_ttbr1, ttbr, tmp
 #ifdef CONFIG_ARM64_VA_BITS_52
-	mrs_s	\tmp, SYS_ID_AA64MMFR2_EL1
-	and	\tmp, \tmp, #(0xf << ID_AA64MMFR2_EL1_VARange_SHIFT)
-	cbnz	\tmp, .Lskipoffs_\@
-	orr	\ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
-.Lskipoffs_\@ :
+	mrs	\tmp, tcr_el1
+	and	\tmp, \tmp, #TCR_T1SZ_MASK
+	cmp	\tmp, #TCR_T1SZ(VA_BITS_MIN)
+	orr	\tmp, \ttbr, #TTBR1_BADDR_4852_OFFSET
+	csel	\ttbr, \tmp, \ttbr, eq
 #endif
 	.endm
 
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 4f4dc5496ee3..a2ac31aecdd9 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -906,7 +906,9 @@ static inline unsigned int get_vmid_bits(u64 mmfr1)
 s64 arm64_ftr_safe_value(const struct arm64_ftr_bits *ftrp, s64 new, s64 cur);
 struct arm64_ftr_reg *get_arm64_ftr_reg(u32 sys_id);
 
+extern struct arm64_ftr_override id_aa64mmfr0_override;
 extern struct arm64_ftr_override id_aa64mmfr1_override;
+extern struct arm64_ftr_override id_aa64mmfr2_override;
 extern struct arm64_ftr_override id_aa64pfr0_override;
 extern struct arm64_ftr_override id_aa64pfr1_override;
 extern struct arm64_ftr_override id_aa64zfr0_override;
@@ -1000,6 +1002,8 @@ static inline bool cpu_has_lva(void)
 	u64 mmfr2;
 
 	mmfr2 = read_sysreg_s(SYS_ID_AA64MMFR2_EL1);
+	mmfr2 &= ~id_aa64mmfr2_override.mask;
+	mmfr2 |= id_aa64mmfr2_override.val;
 	return cpuid_feature_extract_unsigned_field(mmfr2,
 						    ID_AA64MMFR2_EL1_VARange_SHIFT);
 }
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 8eb8c7f7b317..ed9670d8360c 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -655,7 +655,9 @@ static const struct arm64_ftr_bits ftr_raz[] = {
 #define ARM64_FTR_REG(id, table)		\
 	__ARM64_FTR_REG_OVERRIDE(#id, id, table, &no_override)
 
+struct arm64_ftr_override id_aa64mmfr0_override;
 struct arm64_ftr_override id_aa64mmfr1_override;
+struct arm64_ftr_override id_aa64mmfr2_override;
 struct arm64_ftr_override id_aa64pfr0_override;
 struct arm64_ftr_override id_aa64pfr1_override;
 struct arm64_ftr_override id_aa64zfr0_override;
@@ -719,10 +721,12 @@ static const struct __ftr_reg_entry {
 			       &id_aa64isar2_override),
 
 	/* Op1 = 0, CRn = 0, CRm = 7 */
-	ARM64_FTR_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0),
+	ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0,
+			       &id_aa64mmfr0_override),
 	ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64MMFR1_EL1, ftr_id_aa64mmfr1,
 			       &id_aa64mmfr1_override),
-	ARM64_FTR_REG(SYS_ID_AA64MMFR2_EL1, ftr_id_aa64mmfr2),
+	ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64MMFR2_EL1, ftr_id_aa64mmfr2,
+			       &id_aa64mmfr2_override),
 	ARM64_FTR_REG(SYS_ID_AA64MMFR3_EL1, ftr_id_aa64mmfr3),
 
 	/* Op1 = 1, CRn = 0, CRm = 0 */
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 2b9d702abe0f..ff81f809a240 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -38,7 +38,9 @@ PROVIDE(__pi___memset			= __pi_memset);
 
 PROVIDE(__pi_id_aa64isar1_override	= id_aa64isar1_override);
 PROVIDE(__pi_id_aa64isar2_override	= id_aa64isar2_override);
+PROVIDE(__pi_id_aa64mmfr0_override	= id_aa64mmfr0_override);
 PROVIDE(__pi_id_aa64mmfr1_override	= id_aa64mmfr1_override);
+PROVIDE(__pi_id_aa64mmfr2_override	= id_aa64mmfr2_override);
 PROVIDE(__pi_id_aa64pfr0_override	= id_aa64pfr0_override);
 PROVIDE(__pi_id_aa64pfr1_override	= id_aa64pfr1_override);
 PROVIDE(__pi_id_aa64smfr0_override	= id_aa64smfr0_override);
diff --git a/arch/arm64/kernel/pi/idreg-override.c b/arch/arm64/kernel/pi/idreg-override.c
index 1884bd936c0d..aad399796e81 100644
--- a/arch/arm64/kernel/pi/idreg-override.c
+++ b/arch/arm64/kernel/pi/idreg-override.c
@@ -59,6 +59,35 @@ static const struct ftr_set_desc mmfr1 __prel64_initconst = {
 	},
 };
 
+
+static bool __init mmfr2_varange_filter(u64 val)
+{
+	int __maybe_unused feat;
+
+	if (val)
+		return false;
+
+#ifdef CONFIG_ARM64_LPA2
+	feat = cpuid_feature_extract_signed_field(read_sysreg(id_aa64mmfr0_el1),
+						  ID_AA64MMFR0_EL1_TGRAN_SHIFT);
+	if (feat >= ID_AA64MMFR0_EL1_TGRAN_LPA2) {
+		id_aa64mmfr0_override.val |=
+			(ID_AA64MMFR0_EL1_TGRAN_LPA2 - 1) << ID_AA64MMFR0_EL1_TGRAN_SHIFT;
+		id_aa64mmfr0_override.mask |= 0xfU << ID_AA64MMFR0_EL1_TGRAN_SHIFT;
+	}
+#endif
+	return true;
+}
+
+static const struct ftr_set_desc mmfr2 __prel64_initconst = {
+	.name		= "id_aa64mmfr2",
+	.override	= &id_aa64mmfr2_override,
+	.fields		= {
+		FIELD("varange", ID_AA64MMFR2_EL1_VARange_SHIFT, mmfr2_varange_filter),
+		{}
+	},
+};
+
 static bool __init pfr0_sve_filter(u64 val)
 {
 	/*
@@ -167,6 +196,7 @@ static const struct ftr_set_desc sw_features __prel64_initconst = {
 static const
 PREL64(const struct ftr_set_desc, reg) regs[] __prel64_initconst = {
 	{ &mmfr1	},
+	{ &mmfr2	},
 	{ &pfr0 	},
 	{ &pfr1 	},
 	{ &isar1	},
@@ -192,6 +222,7 @@ static const struct {
 	{ "arm64.nomte",		"id_aa64pfr1.mte=0" },
 	{ "nokaslr",			"arm64_sw.nokaslr=1" },
 	{ "rodata=off",			"arm64_sw.rodataoff=1" },
+	{ "arm64.nolva",		"id_aa64mmfr2.varange=0" },
 };
 
 static int __init parse_hexdigit(const char *p, u64 *v)
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 27/43] arm64: Avoid #define'ing PTE_MAYBE_NG to 0x0 for asm use
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (25 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 26/43] arm64: mm: Add feature override support for LVA Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 28/43] arm64: Add ESR decoding for exceptions involving translation level -1 Ard Biesheuvel
                   ` (16 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The PROT_* macros resolve to expressions that are only valid in C and
not in assembler, and so they are only usable from C code. Currently, we
make an exception for the permission indirection init code in proc.S,
which doesn't care about the bits that are conditionally set, and so we
just #define PTE_MAYBE_NG to 0x0 for any assembler file that includes
these definitions.

This is dodgy because it means that PROT_NORMAL and friends are
generally available in asm code, but defined in a way that deviates from
the definition that C code will observe, which might lead to
hard-to-diagnose issues down the road.

So instead, #define PTE_MAYBE_NG only in the place where the PIE
constants are evaluated, and #undef it again right after. This allows us
to drop the #define from pgtable-prot.h, and avoid the risk of deviating
definitions between asm and C.
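
For illustration only (this is not part of the patch), the scoping
technique can be sketched in plain C. All DEMO_* names are hypothetical
stand-ins for the kernel's PTE_*, PROT_* and PIE_E* macros, and the bit
positions are chosen for the demo rather than copied from the headers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; bit positions are illustrative only. */
#define DEMO_PTE_AF	(UINT64_C(1) << 10)
#define DEMO_PTE_NG	(UINT64_C(1) << 11)

/* Like PROT_DEFAULT: only expands where DEMO_PTE_MAYBE_NG is defined. */
#define DEMO_PROT	(DEMO_PTE_AF | DEMO_PTE_MAYBE_NG)

/* Like the PIE_E* constants: the conditional bit is masked out anyway. */
#define DEMO_PIE(prot)	((prot) & ~DEMO_PTE_NG)

/* Define the placeholder only around the one place that needs it ... */
#define DEMO_PTE_MAYBE_NG	0

static uint64_t demo_pie_value(void)
{
	uint64_t v = DEMO_PIE(DEMO_PROT);

	/* The nG bit never survives the mask, whatever the placeholder is. */
	assert((v & DEMO_PTE_NG) == 0);
	return v;
}

/* ... and drop it again so later code cannot pick up the dummy value. */
#undef DEMO_PTE_MAYBE_NG
```

Any use of DEMO_PROT after the #undef fails to compile instead of
silently evaluating with the dummy value, which is the property the
patch is after.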

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgtable-prot.h |  4 ----
 arch/arm64/mm/proc.S                  | 13 +++++++++++++
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 483dbfa39c4c..63ced9ccec21 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -57,10 +57,6 @@
 #define _PAGE_READONLY_EXEC	(_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN)
 #define _PAGE_EXECONLY		(_PAGE_DEFAULT | PTE_RDONLY | PTE_NG | PTE_PXN)
 
-#ifdef __ASSEMBLY__
-#define PTE_MAYBE_NG	0
-#endif
-
 #ifndef __ASSEMBLY__
 
 #include <asm/cpufeature.h>
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index d104ddab26a4..6e1b2bc41a9f 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -456,11 +456,24 @@ alternative_else_nop_endif
 	ubfx	x1, x1, #ID_AA64MMFR3_EL1_S1PIE_SHIFT, #4
 	cbz	x1, .Lskip_indirection
 
+	/*
+	 * The PROT_* macros describing the various memory types may resolve to
+	 * C expressions if they include the PTE_MAYBE_* macros, and so they
+	 * can only be used from C code. The PIE_E* constants below are also
+	 * defined in terms of those macros, but will mask out those
+	 * PTE_MAYBE_* constants, whether they are set or not. So #define them
+	 * as 0x0 here so we can evaluate the PIE_E* constants in asm context.
+	 */
+
+#define PTE_MAYBE_NG		0
+
 	mov_q	x0, PIE_E0
 	msr	REG_PIRE0_EL1, x0
 	mov_q	x0, PIE_E1
 	msr	REG_PIR_EL1, x0
 
+#undef PTE_MAYBE_NG
+
 	mov	x0, TCR2_EL1x_PIE
 	msr	REG_TCR2_EL1, x0
 
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 28/43] arm64: Add ESR decoding for exceptions involving translation level -1
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (26 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 27/43] arm64: Avoid #define'ing PTE_MAYBE_NG to 0x0 for asm use Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 29/43] arm64: mm: Wire up TCR.DS bit to PTE shareability fields Ard Biesheuvel
                   ` (15 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The LPA2 feature introduces new FSC values to report abort exceptions
related to translation level -1. Define these and wire them up.

Reuse the new ESR FSC classification helpers that arrived via the KVM
arm64 tree, and update the one for translation faults to check
specifically for a translation fault at level -1. (Access flag or
permission faults cannot occur at level -1 because they always involve a
descriptor at the superior level, so changing those helpers is not
needed.)
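
As a rough standalone sketch of why the extra check is needed (not the
kernel helper itself): the level -1 translation fault code 0b101011 does
not share the type bits of the level 0..3 codes, so the type mask alone
misses it. The DEMO_* names are hypothetical; the mask values 0x3F and
0x3C are assumed to mirror ESR_ELx_FSC and ESR_ELx_FSC_TYPE:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FSC		UINT64_C(0x3F)	/* assumed: ESR_ELx_FSC */
#define DEMO_FSC_TYPE		UINT64_C(0x3C)	/* assumed: ESR_ELx_FSC_TYPE */
#define DEMO_FSC_FAULT		UINT64_C(0x04)	/* translation fault, level 0 */
#define DEMO_FSC_FAULT_LM1	UINT64_C(0x2B)	/* 0b101011, level -1 */

/* The type mask alone does not classify the level -1 code as a fault. */
static_assert((DEMO_FSC_FAULT_LM1 & DEMO_FSC_TYPE) != DEMO_FSC_FAULT,
	      "level -1 needs an explicit comparison");

static bool demo_fsc_is_translation_fault(uint64_t esr)
{
	/* Translation fault, level -1: compare the whole FSC field. */
	if ((esr & DEMO_FSC) == DEMO_FSC_FAULT_LM1)
		return true;

	/* Levels 0..3 share the type bits and differ in the low two bits. */
	return (esr & DEMO_FSC_TYPE) == DEMO_FSC_FAULT;
}
```

A level -1 address size fault (index 41 in the fault_info table) is
still rejected, since it matches neither comparison.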

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/esr.h         | 13 ++++-----
 arch/arm64/include/asm/kvm_emulate.h | 10 ++-----
 arch/arm64/mm/fault.c                | 30 +++++++-------------
 3 files changed, 18 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 353fe08546cf..81606bf7d5ac 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -117,15 +117,9 @@
 #define ESR_ELx_FSC_ACCESS	(0x08)
 #define ESR_ELx_FSC_FAULT	(0x04)
 #define ESR_ELx_FSC_PERM	(0x0C)
-#define ESR_ELx_FSC_SEA_TTW0	(0x14)
-#define ESR_ELx_FSC_SEA_TTW1	(0x15)
-#define ESR_ELx_FSC_SEA_TTW2	(0x16)
-#define ESR_ELx_FSC_SEA_TTW3	(0x17)
+#define ESR_ELx_FSC_SEA_TTW(n)	(0x14 + (n))
 #define ESR_ELx_FSC_SECC	(0x18)
-#define ESR_ELx_FSC_SECC_TTW0	(0x1c)
-#define ESR_ELx_FSC_SECC_TTW1	(0x1d)
-#define ESR_ELx_FSC_SECC_TTW2	(0x1e)
-#define ESR_ELx_FSC_SECC_TTW3	(0x1f)
+#define ESR_ELx_FSC_SECC_TTW(n)	(0x1c + (n))
 
 /* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV_SHIFT	(24)
@@ -394,6 +388,9 @@ static inline bool esr_is_data_abort(unsigned long esr)
 
 static inline bool esr_fsc_is_translation_fault(unsigned long esr)
 {
+	/* Translation fault, level -1 */
+	if ((esr & ESR_ELx_FSC) == 0b101011)
+		return true;
 	return (esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_FAULT;
 }
 
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index b804fe832184..6f5b41c70103 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -425,15 +425,9 @@ static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
 {
 	switch (kvm_vcpu_trap_get_fault(vcpu)) {
 	case ESR_ELx_FSC_EXTABT:
-	case ESR_ELx_FSC_SEA_TTW0:
-	case ESR_ELx_FSC_SEA_TTW1:
-	case ESR_ELx_FSC_SEA_TTW2:
-	case ESR_ELx_FSC_SEA_TTW3:
+	case ESR_ELx_FSC_SEA_TTW(-1) ... ESR_ELx_FSC_SEA_TTW(3):
 	case ESR_ELx_FSC_SECC:
-	case ESR_ELx_FSC_SECC_TTW0:
-	case ESR_ELx_FSC_SECC_TTW1:
-	case ESR_ELx_FSC_SECC_TTW2:
-	case ESR_ELx_FSC_SECC_TTW3:
+	case ESR_ELx_FSC_SECC_TTW(-1) ... ESR_ELx_FSC_SECC_TTW(3):
 		return true;
 	default:
 		return false;
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 55f6455a8284..60265ede48fe 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -257,16 +257,14 @@ static bool is_el1_data_abort(unsigned long esr)
 static inline bool is_el1_permission_fault(unsigned long addr, unsigned long esr,
 					   struct pt_regs *regs)
 {
-	unsigned long fsc_type = esr & ESR_ELx_FSC_TYPE;
-
 	if (!is_el1_data_abort(esr) && !is_el1_instruction_abort(esr))
 		return false;
 
-	if (fsc_type == ESR_ELx_FSC_PERM)
+	if (esr_fsc_is_permission_fault(esr))
 		return true;
 
 	if (is_ttbr0_addr(addr) && system_uses_ttbr0_pan())
-		return fsc_type == ESR_ELx_FSC_FAULT &&
+		return esr_fsc_is_translation_fault(esr) &&
 			(regs->pstate & PSR_PAN_BIT);
 
 	return false;
@@ -279,8 +277,7 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
 	unsigned long flags;
 	u64 par, dfsc;
 
-	if (!is_el1_data_abort(esr) ||
-	    (esr & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_FAULT)
+	if (!is_el1_data_abort(esr) || !esr_fsc_is_translation_fault(esr))
 		return false;
 
 	local_irq_save(flags);
@@ -301,7 +298,7 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
 	 * treat the translation fault as spurious.
 	 */
 	dfsc = FIELD_GET(SYS_PAR_EL1_FST, par);
-	return (dfsc & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_FAULT;
+	return !esr_fsc_is_translation_fault(dfsc);
 }
 
 static void die_kernel_fault(const char *msg, unsigned long addr,
@@ -368,11 +365,6 @@ static bool is_el1_mte_sync_tag_check_fault(unsigned long esr)
 	return false;
 }
 
-static bool is_translation_fault(unsigned long esr)
-{
-	return (esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_FAULT;
-}
-
 static void __do_kernel_fault(unsigned long addr, unsigned long esr,
 			      struct pt_regs *regs)
 {
@@ -405,7 +397,7 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
 	} else if (addr < PAGE_SIZE) {
 		msg = "NULL pointer dereference";
 	} else {
-		if (is_translation_fault(esr) &&
+		if (esr_fsc_is_translation_fault(esr) &&
 		    kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
 			return;
 
@@ -782,18 +774,18 @@ static const struct fault_info fault_info[] = {
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 1 translation fault"	},
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 2 translation fault"	},
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 3 translation fault"	},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 8"			},
+	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 0 access flag fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 access flag fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 access flag fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 access flag fault"	},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 12"			},
+	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 0 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
 	{ do_sea,		SIGBUS,  BUS_OBJERR,	"synchronous external abort"	},
 	{ do_tag_check_fault,	SIGSEGV, SEGV_MTESERR,	"synchronous tag check fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 18"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 19"			},
+	{ do_sea,		SIGKILL, SI_KERNEL,	"level -1 (translation table walk)"	},
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 0 (translation table walk)"	},
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 1 (translation table walk)"	},
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 2 (translation table walk)"	},
@@ -801,7 +793,7 @@ static const struct fault_info fault_info[] = {
 	{ do_sea,		SIGBUS,  BUS_OBJERR,	"synchronous parity or ECC error" },	// Reserved when RAS is implemented
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 25"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 26"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 27"			},
+	{ do_sea,		SIGKILL, SI_KERNEL,	"level -1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 0 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 2 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
@@ -815,9 +807,9 @@ static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 38"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 39"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 40"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 41"			},
+	{ do_bad,		SIGKILL, SI_KERNEL,	"level -1 address size fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 42"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 43"			},
+	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level -1 translation fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 44"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 45"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 46"			},
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 29/43] arm64: mm: Wire up TCR.DS bit to PTE shareability fields
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (27 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 28/43] arm64: Add ESR decoding for exceptions involving translation level -1 Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 30/43] arm64: mm: Add LPA2 support to phys<->pte conversion routines Ard Biesheuvel
                   ` (14 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

When LPA2 is enabled, bits 8 and 9 of page and block descriptors become
part of the output address instead of carrying shareability attributes
for the region in question.

So avoid setting these bits if TCR.DS == 1, which means LPA2 is enabled.
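
A minimal userspace sketch of the idea, assuming the usual descriptor
layout (type in bits 1:0, shareability in bits 9:8, AF in bit 10). The
DEMO_* names and demo_prot_default() are hypothetical; the real code
reads TCR via lpa2_is_enabled() rather than taking it as an argument:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PTE_TYPE_PAGE	(UINT64_C(3) << 0)
#define DEMO_PTE_SHARED		(UINT64_C(3) << 8)	/* SH[1:0] without LPA2 */
#define DEMO_PTE_AF		(UINT64_C(1) << 10)
#define DEMO_TCR_DS		(UINT64_C(1) << 59)

static_assert((DEMO_PTE_SHARED >> 8) == 3,
	      "shareability field occupies bits 9:8");

/* Default attributes for a page mapping, given a TCR value (assumption). */
static uint64_t demo_prot_default(uint64_t tcr)
{
	/* With TCR.DS set, bits 9:8 hold output address bits 51:50. */
	uint64_t maybe_shared = (tcr & DEMO_TCR_DS) ? 0 : DEMO_PTE_SHARED;

	return DEMO_PTE_TYPE_PAGE | DEMO_PTE_AF | maybe_shared;
}
```

With DS clear the result is 0x703; with DS set it is 0x403, leaving
bits 9:8 free for the address.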

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/Kconfig                     |  4 ++++
 arch/arm64/include/asm/pgtable-hwdef.h |  1 +
 arch/arm64/include/asm/pgtable-prot.h  | 16 ++++++++++++++--
 arch/arm64/mm/mmap.c                   |  4 ++++
 arch/arm64/mm/proc.S                   |  2 ++
 5 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index aa7c1d435139..8c2c36fffcf5 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1377,6 +1377,10 @@ config ARM64_PA_BITS
 	default 48 if ARM64_PA_BITS_48
 	default 52 if ARM64_PA_BITS_52
 
+config ARM64_LPA2
+	def_bool y
+	depends on ARM64_PA_BITS_52 && !ARM64_64K_PAGES
+
 choice
 	prompt "Endianness"
 	default CPU_LITTLE_ENDIAN
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index e4944d517c99..b770f98fc0b5 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -284,6 +284,7 @@
 #define TCR_E0PD1		(UL(1) << 56)
 #define TCR_TCMA0		(UL(1) << 57)
 #define TCR_TCMA1		(UL(1) << 58)
+#define TCR_DS			(UL(1) << 59)
 
 /*
  * TTBR.
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 63ced9ccec21..dd9ee67d1d87 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -30,8 +30,8 @@
 #define _PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
 #define _PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
 
-#define PROT_DEFAULT		(_PROT_DEFAULT | PTE_MAYBE_NG)
-#define PROT_SECT_DEFAULT	(_PROT_SECT_DEFAULT | PMD_MAYBE_NG)
+#define PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_MAYBE_NG | PTE_MAYBE_SHARED | PTE_AF)
+#define PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_MAYBE_NG | PMD_MAYBE_SHARED | PMD_SECT_AF)
 
 #define PROT_DEVICE_nGnRnE	(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_DEVICE_nGnRnE))
 #define PROT_DEVICE_nGnRE	(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_WRITE | PTE_ATTRINDX(MT_DEVICE_nGnRE))
@@ -67,7 +67,19 @@ extern bool arm64_use_ng_mappings;
 #define PTE_MAYBE_NG		(arm64_use_ng_mappings ? PTE_NG : 0)
 #define PMD_MAYBE_NG		(arm64_use_ng_mappings ? PMD_SECT_NG : 0)
 
+#ifndef CONFIG_ARM64_LPA2
 #define lpa2_is_enabled()	false
+#define PTE_MAYBE_SHARED	PTE_SHARED
+#define PMD_MAYBE_SHARED	PMD_SECT_S
+#else
+static inline bool __pure lpa2_is_enabled(void)
+{
+	return read_tcr() & TCR_DS;
+}
+
+#define PTE_MAYBE_SHARED	(lpa2_is_enabled() ? 0 : PTE_SHARED)
+#define PMD_MAYBE_SHARED	(lpa2_is_enabled() ? 0 : PMD_SECT_S)
+#endif
 
 /*
  * If we have userspace only BTI we don't want to mark kernel pages
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 645fe60d000f..642bdf908b22 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -73,6 +73,10 @@ static int __init adjust_protection_map(void)
 		protection_map[VM_EXEC | VM_SHARED] = PAGE_EXECONLY;
 	}
 
+	if (lpa2_is_enabled())
+		for (int i = 0; i < ARRAY_SIZE(protection_map); i++)
+			pgprot_val(protection_map[i]) &= ~PTE_SHARED;
+
 	return 0;
 }
 arch_initcall(adjust_protection_map);
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 6e1b2bc41a9f..7c46f8cfd6ae 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -466,6 +466,7 @@ alternative_else_nop_endif
 	 */
 
 #define PTE_MAYBE_NG		0
+#define PTE_MAYBE_SHARED	0
 
 	mov_q	x0, PIE_E0
 	msr	REG_PIRE0_EL1, x0
@@ -473,6 +474,7 @@ alternative_else_nop_endif
 	msr	REG_PIR_EL1, x0
 
 #undef PTE_MAYBE_NG
+#undef PTE_MAYBE_SHARED
 
 	mov	x0, TCR2_EL1x_PIE
 	msr	REG_TCR2_EL1, x0
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 30/43] arm64: mm: Add LPA2 support to phys<->pte conversion routines
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (28 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 29/43] arm64: mm: Wire up TCR.DS bit to PTE shareability fields Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 31/43] arm64: mm: Add definitions to support 5 levels of paging Ard Biesheuvel
                   ` (13 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

In preparation for enabling LPA2 support, introduce the mask values for
converting between physical addresses and their representations in a
page table descriptor.

While at it, move the pte_to_phys asm macro into its only user, so that
we can freely modify it to use its input value register as a temp
register.

For LPA2, the PTE_ADDR_MASK contains two non-adjacent sequences of zero
bits, which means it no longer fits into the immediate field of an
ordinary ALU instruction. So let's redefine it to include the bits in
between as well, and only use it when converting from physical address
to PTE representation, where the distinction does not matter. Also
update the name accordingly to emphasize this.
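
To make the bit shuffling concrete, here is a standalone sketch of the
conversion for the non-64K (LPA2) layout, using the mask and shift
values from the diff below. The DEMO_* names are hypothetical, 4k pages
are assumed, and the input is assumed to be a page-aligned physical
address of at most 52 bits:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT			12	/* assumption: 4k pages */
#define DEMO_GENMASK(h, l) \
	((~UINT64_C(0) >> (63 - (h))) & (~UINT64_C(0) << (l)))

#define DEMO_PTE_ADDR_LOW		DEMO_GENMASK(49, DEMO_PAGE_SHIFT)
#define DEMO_PTE_ADDR_HIGH		(UINT64_C(0x3) << 8)
#define DEMO_PTE_ADDR_HIGH_SHIFT	42
#define DEMO_PHYS_TO_PTE_ADDR_MASK	DEMO_GENMASK(49, 8)

static uint64_t demo_phys_to_pte(uint64_t phys)
{
	/* Page alignment keeps phys bits 11:0 clear. */
	assert((phys & ((UINT64_C(1) << DEMO_PAGE_SHIFT) - 1)) == 0);

	/* PA[51:50] land in bits 9:8; the contiguous mask drops the rest. */
	return (phys | (phys >> DEMO_PTE_ADDR_HIGH_SHIFT)) &
	       DEMO_PHYS_TO_PTE_ADDR_MASK;
}

static uint64_t demo_pte_to_phys(uint64_t pte)
{
	/* The reverse direction must treat the two fields separately. */
	return (pte & DEMO_PTE_ADDR_LOW) |
	       ((pte & DEMO_PTE_ADDR_HIGH) << DEMO_PTE_ADDR_HIGH_SHIFT);
}
```

The contiguous mask also covers bits 11:10, which is harmless in this
direction because those bits are zero for any page-aligned 52-bit
address, both in phys itself and in phys shifted right by 42.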

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/assembler.h     | 16 ++--------------
 arch/arm64/include/asm/pgtable-hwdef.h | 10 +++++++---
 arch/arm64/include/asm/pgtable.h       |  5 +++--
 arch/arm64/mm/proc.S                   |  8 ++++++++
 4 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 68a99b116256..7eedcb36ebe0 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -612,25 +612,13 @@ alternative_endif
 
 	.macro	phys_to_pte, pte, phys
 #ifdef CONFIG_ARM64_PA_BITS_52
-	/*
-	 * We assume \phys is 64K aligned and this is guaranteed by only
-	 * supporting this configuration with 64K pages.
-	 */
-	orr	\pte, \phys, \phys, lsr #36
-	and	\pte, \pte, #PTE_ADDR_MASK
+	orr	\pte, \phys, \phys, lsr #PTE_ADDR_HIGH_SHIFT
+	and	\pte, \pte, #PHYS_TO_PTE_ADDR_MASK
 #else
 	mov	\pte, \phys
 #endif
 	.endm
 
-	.macro	pte_to_phys, phys, pte
-	and	\phys, \pte, #PTE_ADDR_MASK
-#ifdef CONFIG_ARM64_PA_BITS_52
-	orr	\phys, \phys, \phys, lsl #PTE_ADDR_HIGH_SHIFT
-	and	\phys, \phys, GENMASK_ULL(PHYS_MASK_SHIFT - 1, PAGE_SHIFT)
-#endif
-	.endm
-
 /*
  * tcr_clear_errata_bits - Clear TCR bits that trigger an errata on this CPU.
  */
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index b770f98fc0b5..4426f48f2ae0 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -155,13 +155,17 @@
 #define PTE_PXN			(_AT(pteval_t, 1) << 53)	/* Privileged XN */
 #define PTE_UXN			(_AT(pteval_t, 1) << 54)	/* User XN */
 
-#define PTE_ADDR_LOW		(((_AT(pteval_t, 1) << (48 - PAGE_SHIFT)) - 1) << PAGE_SHIFT)
+#define PTE_ADDR_LOW		(((_AT(pteval_t, 1) << (50 - PAGE_SHIFT)) - 1) << PAGE_SHIFT)
 #ifdef CONFIG_ARM64_PA_BITS_52
+#ifdef CONFIG_ARM64_64K_PAGES
 #define PTE_ADDR_HIGH		(_AT(pteval_t, 0xf) << 12)
-#define PTE_ADDR_MASK		(PTE_ADDR_LOW | PTE_ADDR_HIGH)
 #define PTE_ADDR_HIGH_SHIFT	36
+#define PHYS_TO_PTE_ADDR_MASK	(PTE_ADDR_LOW | PTE_ADDR_HIGH)
 #else
-#define PTE_ADDR_MASK		PTE_ADDR_LOW
+#define PTE_ADDR_HIGH		(_AT(pteval_t, 0x3) << 8)
+#define PTE_ADDR_HIGH_SHIFT	42
+#define PHYS_TO_PTE_ADDR_MASK	GENMASK_ULL(49, 8)
+#endif
 #endif
 
 /*
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 522c21348ae8..61de7b1516bc 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -80,15 +80,16 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 #ifdef CONFIG_ARM64_PA_BITS_52
 static inline phys_addr_t __pte_to_phys(pte_t pte)
 {
+	pte_val(pte) &= ~PTE_MAYBE_SHARED;
 	return (pte_val(pte) & PTE_ADDR_LOW) |
 		((pte_val(pte) & PTE_ADDR_HIGH) << PTE_ADDR_HIGH_SHIFT);
 }
 static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
 {
-	return (phys | (phys >> PTE_ADDR_HIGH_SHIFT)) & PTE_ADDR_MASK;
+	return (phys | (phys >> PTE_ADDR_HIGH_SHIFT)) & PHYS_TO_PTE_ADDR_MASK;
 }
 #else
-#define __pte_to_phys(pte)	(pte_val(pte) & PTE_ADDR_MASK)
+#define __pte_to_phys(pte)	(pte_val(pte) & PTE_ADDR_LOW)
 #define __phys_to_pte_val(phys)	(phys)
 #endif
 
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 7c46f8cfd6ae..d03434b7bca5 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -205,6 +205,14 @@ SYM_FUNC_ALIAS(__pi_idmap_cpu_replace_ttbr1, idmap_cpu_replace_ttbr1)
 
 	.pushsection ".idmap.text", "a"
 
+	.macro	pte_to_phys, phys, pte
+	and	\phys, \pte, #PTE_ADDR_LOW
+#ifdef CONFIG_ARM64_PA_BITS_52
+	and	\pte, \pte, #PTE_ADDR_HIGH
+	orr	\phys, \phys, \pte, lsl #PTE_ADDR_HIGH_SHIFT
+#endif
+	.endm
+
 	.macro	kpti_mk_tbl_ng, type, num_entries
 	add	end_\type\()p, cur_\type\()p, #\num_entries * 8
 .Ldo_\type:
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 31/43] arm64: mm: Add definitions to support 5 levels of paging
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (29 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 30/43] arm64: mm: Add LPA2 support to phys<->pte conversion routines Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 32/43] arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion Ard Biesheuvel
                   ` (12 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Add the required types and descriptor accessors to support 5 levels of
paging in the common code. This is one of the prerequisites for
supporting 52-bit virtual addressing with 4k pages.

Note that this does not cover the code that handles kernel mappings or
the fixmap.
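
As a worked example of where the fifth level comes from, the sketch
below evaluates the level arithmetic for 4k pages. The level-count
formula is the ARM64_HW_PGTABLE_LEVELS() definition visible in the diff;
the shift formula is assumed to match ARM64_HW_PGTABLE_LEVEL_SHIFT() in
pgtable-hwdef.h. The demo_* names are hypothetical:

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT	12	/* assumption: 4k pages */

/* Number of translation levels needed to cover va_bits. */
static int demo_levels(int va_bits)
{
	return (va_bits - 4) / (DEMO_PAGE_SHIFT - 3);
}

/* Number of address bits mapped by one entry at level n (-1 <= n <= 3). */
static int demo_level_shift(int level)
{
	assert(level >= -1 && level <= 3);
	return (DEMO_PAGE_SHIFT - 3) * (4 - level) + 3;
}
```

With 4k pages each level resolves 9 bits, so 48-bit VAs need 4 levels
(shifts 39, 30, 21, 12) and 52-bit VAs need a fifth at level -1, whose
entries each map 2^48 bytes.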

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgalloc.h       | 41 ++++++++++
 arch/arm64/include/asm/pgtable-hwdef.h | 22 +++++-
 arch/arm64/include/asm/pgtable-types.h |  6 ++
 arch/arm64/include/asm/pgtable.h       | 82 +++++++++++++++++++-
 arch/arm64/mm/mmu.c                    | 31 +++++++-
 arch/arm64/mm/pgd.c                    | 15 +++-
 6 files changed, 188 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 237224484d0f..cae8c648f462 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -60,6 +60,47 @@ static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
 }
 #endif	/* CONFIG_PGTABLE_LEVELS > 3 */
 
+#if CONFIG_PGTABLE_LEVELS > 4
+
+static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
+{
+	if (pgtable_l5_enabled())
+		set_pgd(pgdp, __pgd(__phys_to_pgd_val(p4dp) | prot));
+}
+
+static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgdp, p4d_t *p4dp)
+{
+	pgdval_t pgdval = PGD_TYPE_TABLE;
+
+	pgdval |= (mm == &init_mm) ? PGD_TABLE_UXN : PGD_TABLE_PXN;
+	__pgd_populate(pgdp, __pa(p4dp), pgdval);
+}
+
+static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+	gfp_t gfp = GFP_PGTABLE_USER;
+
+	if (mm == &init_mm)
+		gfp = GFP_PGTABLE_KERNEL;
+	return (p4d_t *)get_zeroed_page(gfp);
+}
+
+static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
+{
+	if (!pgtable_l5_enabled())
+		return;
+	BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
+	free_page((unsigned long)p4d);
+}
+
+#define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)
+#else
+static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
+{
+	BUILD_BUG();
+}
+#endif	/* CONFIG_PGTABLE_LEVELS > 4 */
+
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
 extern void pgd_free(struct mm_struct *mm, pgd_t *pgdp);
 
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 4426f48f2ae0..ef207a0d4f0d 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -26,10 +26,10 @@
 #define ARM64_HW_PGTABLE_LEVELS(va_bits) (((va_bits) - 4) / (PAGE_SHIFT - 3))
 
 /*
- * Size mapped by an entry at level n ( 0 <= n <= 3)
+ * Size mapped by an entry at level n ( -1 <= n <= 3)
  * We map (PAGE_SHIFT - 3) at all translation levels and PAGE_SHIFT bits
  * in the final page. The maximum number of translation levels supported by
- * the architecture is 4. Hence, starting at level n, we have further
+ * the architecture is 5. Hence, starting at level n, we have further
  * ((4 - n) - 1) levels of translation excluding the offset within the page.
  * So, the total number of bits mapped by an entry at level n is :
  *
@@ -62,9 +62,16 @@
 #define PTRS_PER_PUD		(1 << (PAGE_SHIFT - 3))
 #endif
 
+#if CONFIG_PGTABLE_LEVELS > 4
+#define P4D_SHIFT		ARM64_HW_PGTABLE_LEVEL_SHIFT(0)
+#define P4D_SIZE		(_AC(1, UL) << P4D_SHIFT)
+#define P4D_MASK		(~(P4D_SIZE-1))
+#define PTRS_PER_P4D		(1 << (PAGE_SHIFT - 3))
+#endif
+
 /*
  * PGDIR_SHIFT determines the size a top-level page table entry can map
- * (depending on the configuration, this level can be 0, 1 or 2).
+ * (depending on the configuration, this level can be -1, 0, 1 or 2).
  */
 #define PGDIR_SHIFT		ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - CONFIG_PGTABLE_LEVELS)
 #define PGDIR_SIZE		(_AC(1, UL) << PGDIR_SHIFT)
@@ -87,6 +94,15 @@
 /*
  * Hardware page table definitions.
  *
+ * Level -1 descriptor (PGD).
+ */
+#define PGD_TYPE_TABLE		(_AT(pgdval_t, 3) << 0)
+#define PGD_TABLE_BIT		(_AT(pgdval_t, 1) << 1)
+#define PGD_TYPE_MASK		(_AT(pgdval_t, 3) << 0)
+#define PGD_TABLE_PXN		(_AT(pgdval_t, 1) << 59)
+#define PGD_TABLE_UXN		(_AT(pgdval_t, 1) << 60)
+
+/*
  * Level 0 descriptor (P4D).
  */
 #define P4D_TYPE_TABLE		(_AT(p4dval_t, 3) << 0)
diff --git a/arch/arm64/include/asm/pgtable-types.h b/arch/arm64/include/asm/pgtable-types.h
index b8f158ae2527..6d6d4065b0cb 100644
--- a/arch/arm64/include/asm/pgtable-types.h
+++ b/arch/arm64/include/asm/pgtable-types.h
@@ -36,6 +36,12 @@ typedef struct { pudval_t pud; } pud_t;
 #define __pud(x)	((pud_t) { (x) } )
 #endif
 
+#if CONFIG_PGTABLE_LEVELS > 4
+typedef struct { p4dval_t p4d; } p4d_t;
+#define p4d_val(x)	((x).p4d)
+#define __p4d(x)	((p4d_t) { (x) } )
+#endif
+
 typedef struct { pgdval_t pgd; } pgd_t;
 #define pgd_val(x)	((x).pgd)
 #define __pgd(x)	((pgd_t) { (x) } )
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 61de7b1516bc..7eb2b933ed3c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -808,7 +808,6 @@ static inline pud_t *p4d_pgtable(p4d_t p4d)
 #else
 
 #define p4d_page_paddr(p4d)	({ BUILD_BUG(); 0;})
-#define pgd_page_paddr(pgd)	({ BUILD_BUG(); 0;})
 
 /* Match pud_offset folding in <asm/generic/pgtable-nopud.h> */
 #define pud_set_fixmap(addr)		NULL
@@ -819,6 +818,87 @@ static inline pud_t *p4d_pgtable(p4d_t p4d)
 
 #endif  /* CONFIG_PGTABLE_LEVELS > 3 */
 
+#if CONFIG_PGTABLE_LEVELS > 4
+
+static __always_inline bool pgtable_l5_enabled(void)
+{
+	if (!alternative_has_cap_likely(ARM64_ALWAYS_BOOT))
+		return vabits_actual == VA_BITS;
+	return alternative_has_cap_unlikely(ARM64_HAS_VA52);
+}
+
+static inline bool mm_p4d_folded(const struct mm_struct *mm)
+{
+	return !pgtable_l5_enabled();
+}
+#define mm_p4d_folded  mm_p4d_folded
+
+#define p4d_ERROR(e)	\
+	pr_err("%s:%d: bad p4d %016llx.\n", __FILE__, __LINE__, p4d_val(e))
+
+#define pgd_none(pgd)		(pgtable_l5_enabled() && !pgd_val(pgd))
+#define pgd_bad(pgd)		(pgtable_l5_enabled() && !(pgd_val(pgd) & 2))
+#define pgd_present(pgd)	(!pgd_none(pgd))
+
+static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+	if (in_swapper_pgdir(pgdp)) {
+		set_swapper_pgd(pgdp, __pgd(pgd_val(pgd)));
+		return;
+	}
+
+	WRITE_ONCE(*pgdp, pgd);
+	dsb(ishst);
+	isb();
+}
+
+static inline void pgd_clear(pgd_t *pgdp)
+{
+	if (pgtable_l5_enabled())
+		set_pgd(pgdp, __pgd(0));
+}
+
+static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
+{
+	return __pgd_to_phys(pgd);
+}
+
+#define p4d_index(addr)		(((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))
+
+static inline p4d_t *pgd_to_folded_p4d(pgd_t *pgdp, unsigned long addr)
+{
+	return (p4d_t *)PTR_ALIGN_DOWN(pgdp, PAGE_SIZE) + p4d_index(addr);
+}
+
+static inline phys_addr_t p4d_offset_phys(pgd_t *pgdp, unsigned long addr)
+{
+	BUG_ON(!pgtable_l5_enabled());
+
+	return pgd_page_paddr(READ_ONCE(*pgdp)) + p4d_index(addr) * sizeof(p4d_t);
+}
+
+static inline
+p4d_t *p4d_offset_lockless(pgd_t *pgdp, pgd_t pgd, unsigned long addr)
+{
+	if (!pgtable_l5_enabled())
+		return pgd_to_folded_p4d(pgdp, addr);
+	return (p4d_t *)__va(pgd_page_paddr(pgd)) + p4d_index(addr);
+}
+#define p4d_offset_lockless p4d_offset_lockless
+
+static inline p4d_t *p4d_offset(pgd_t *pgdp, unsigned long addr)
+{
+	return p4d_offset_lockless(pgdp, READ_ONCE(*pgdp), addr);
+}
+
+#define pgd_page(pgd)		pfn_to_page(__phys_to_pfn(__pgd_to_phys(pgd)))
+
+#else
+
+static inline bool pgtable_l5_enabled(void) { return false; }
+
+#endif  /* CONFIG_PGTABLE_LEVELS > 4 */
+
 #define pgd_ERROR(e)	\
 	pr_err("%s:%d: bad pgd %016llx.\n", __FILE__, __LINE__, pgd_val(e))
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ba00d0205447..d2e9dec38a15 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1025,7 +1025,7 @@ static void free_empty_pud_table(p4d_t *p4dp, unsigned long addr,
 	if (CONFIG_PGTABLE_LEVELS <= 3)
 		return;
 
-	if (!pgtable_range_aligned(start, end, floor, ceiling, PGDIR_MASK))
+	if (!pgtable_range_aligned(start, end, floor, ceiling, P4D_MASK))
 		return;
 
 	/*
@@ -1048,8 +1048,8 @@ static void free_empty_p4d_table(pgd_t *pgdp, unsigned long addr,
 				 unsigned long end, unsigned long floor,
 				 unsigned long ceiling)
 {
-	unsigned long next;
 	p4d_t *p4dp, p4d;
+	unsigned long i, next, start = addr;
 
 	do {
 		next = p4d_addr_end(addr, end);
@@ -1061,6 +1061,27 @@ static void free_empty_p4d_table(pgd_t *pgdp, unsigned long addr,
 		WARN_ON(!p4d_present(p4d));
 		free_empty_pud_table(p4dp, addr, next, floor, ceiling);
 	} while (addr = next, addr < end);
+
+	if (!pgtable_l5_enabled())
+		return;
+
+	if (!pgtable_range_aligned(start, end, floor, ceiling, PGDIR_MASK))
+		return;
+
+	/*
+	 * Check whether we can free the p4d page if the rest of the
+	 * entries are empty. Overlap with other regions has been
+	 * handled by the floor/ceiling check.
+	 */
+	p4dp = p4d_offset(pgdp, 0UL);
+	for (i = 0; i < PTRS_PER_P4D; i++) {
+		if (!p4d_none(READ_ONCE(p4dp[i])))
+			return;
+	}
+
+	pgd_clear(pgdp);
+	__flush_tlb_kernel_pgtable(start);
+	free_hotplug_pgtable_page(virt_to_page(p4dp));
 }
 
 static void free_empty_tables(unsigned long addr, unsigned long end,
@@ -1145,6 +1166,12 @@ int pmd_set_huge(pmd_t *pmdp, phys_addr_t phys, pgprot_t prot)
 	return 1;
 }
 
+#ifndef __PAGETABLE_P4D_FOLDED
+void p4d_clear_huge(p4d_t *p4dp)
+{
+}
+#endif
+
 int pud_clear_huge(pud_t *pudp)
 {
 	if (!pud_sect(READ_ONCE(*pudp)))
diff --git a/arch/arm64/mm/pgd.c b/arch/arm64/mm/pgd.c
index 4a64089e5771..3c4f8a279d2b 100644
--- a/arch/arm64/mm/pgd.c
+++ b/arch/arm64/mm/pgd.c
@@ -17,11 +17,20 @@
 
 static struct kmem_cache *pgd_cache __ro_after_init;
 
+static bool pgdir_is_page_size(void)
+{
+	if (PGD_SIZE == PAGE_SIZE)
+		return true;
+	if (CONFIG_PGTABLE_LEVELS == 5)
+		return !pgtable_l5_enabled();
+	return false;
+}
+
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	gfp_t gfp = GFP_PGTABLE_USER;
 
-	if (PGD_SIZE == PAGE_SIZE)
+	if (pgdir_is_page_size())
 		return (pgd_t *)__get_free_page(gfp);
 	else
 		return kmem_cache_alloc(pgd_cache, gfp);
@@ -29,7 +38,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
-	if (PGD_SIZE == PAGE_SIZE)
+	if (pgdir_is_page_size())
 		free_page((unsigned long)pgd);
 	else
 		kmem_cache_free(pgd_cache, pgd);
@@ -37,7 +46,7 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 
 void __init pgtable_cache_init(void)
 {
-	if (PGD_SIZE == PAGE_SIZE)
+	if (pgdir_is_page_size())
 		return;
 
 #ifdef CONFIG_ARM64_PA_BITS_52
-- 
2.43.0.687.g38aa6559b0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v8 32/43] arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (30 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 31/43] arm64: mm: Add definitions to support 5 levels of paging Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 33/43] arm64: Enable LPA2 at boot if supported by the system Ard Biesheuvel
                   ` (11 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Add support for 5 level paging in the G-to-nG routine that creates its
own temporary page tables to traverse the swapper page tables. Also add
support for running the 5 level configuration with the top level folded
at runtime, to support CPUs that do not implement the LPA2 extension.
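
As an illustration of the per-table step, here is a minimal C sketch of
what the conversion does to one table: it skips invalid entries (bit 0
clear) and entries that are already non-global (bit 11 set), and sets
the nG bit on everything else. The real routine is written in assembly
and runs from the ID map; mark_table_ng(), DESC_VALID and DESC_NG are
names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define DESC_VALID	(UINT64_C(1) << 0)	/* descriptor bit 0: valid */
#define DESC_NG		(UINT64_C(1) << 11)	/* descriptor bit 11: not-global */

/*
 * Mark every valid, global entry of one table as non-global.
 * Returns the number of entries that were changed.
 */
static size_t mark_table_ng(uint64_t *table, size_t num_entries)
{
	size_t changed = 0;

	for (size_t i = 0; i < num_entries; i++) {
		uint64_t desc = table[i];

		if (!(desc & DESC_VALID))	/* skip invalid entries */
			continue;
		if (desc & DESC_NG)		/* skip entries that are already nG */
			continue;

		table[i] = desc | DESC_NG;	/* same bit for blocks and pages */
		changed++;
	}
	return changed;
}
```

The sketch leaves out the descent into next-level tables, which the
assembly performs for every table descriptor it updates.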

While at it, wire up the level skipping logic so it will also trigger on
4 level configurations with LPA2 enabled at build time but not active at
runtime, as we'll fall back to 3 level paging in that case.
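
To see why both fallbacks drop exactly one level, the level count can
be derived from the VA size and the page size. The helper below is a
sketch for illustration only (pgtable_levels() is not a kernel
function); it assumes 8-byte descriptors, so that each level resolves
page_shift - 3 bits.

```c
/*
 * Number of translation levels needed to resolve va_bits bits of
 * virtual address with pages of (1 << page_shift) bytes. Each level
 * resolves (page_shift - 3) bits, because a table is one page of
 * 8-byte entries.
 */
static int pgtable_levels(int va_bits, int page_shift)
{
	int bits_per_level = page_shift - 3;
	int bits_to_resolve = va_bits - page_shift;

	/* round up: a partially used top level still counts as a level */
	return (bits_to_resolve + bits_per_level - 1) / bits_per_level;
}
```

With 4k pages this gives 5 levels for 52 bits and 4 levels for 48 bits;
with 16k pages it gives 4 levels for 52 bits and 3 levels for 47 bits.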

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kernel/cpufeature.c |  9 ++-
 arch/arm64/mm/proc.S           | 70 +++++++++++++++++---
 2 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index ed9670d8360c..bc5e4e569864 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1765,6 +1765,9 @@ static int __init __kpti_install_ng_mappings(void *__unused)
 	pgd_t *kpti_ng_temp_pgd;
 	u64 alloc = 0;
 
+	if (levels == 5 && !pgtable_l5_enabled())
+		levels = 4;
+
 	remap_fn = (void *)__pa_symbol(idmap_kpti_install_ng_mappings);
 
 	if (!cpu) {
@@ -1778,9 +1781,9 @@ static int __init __kpti_install_ng_mappings(void *__unused)
 		//
 		// The physical pages are laid out as follows:
 		//
-		// +--------+-/-------+-/------ +-\\--------+
-		// :  PTE[] : | PMD[] : | PUD[] : || PGD[]  :
-		// +--------+-\-------+-\------ +-//--------+
+		// +--------+-/-------+-/------ +-/------ +-\\\--------+
+		// :  PTE[] : | PMD[] : | PUD[] : | P4D[] : ||| PGD[]  :
+		// +--------+-\-------+-\------ +-\------ +-///--------+
 		//      ^
 		// The first page is mapped into this hierarchy at a PMD_SHIFT
 		// aligned virtual address, so that we can manipulate the PTE
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index d03434b7bca5..fa0d7c63f8d2 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -216,16 +216,15 @@ SYM_FUNC_ALIAS(__pi_idmap_cpu_replace_ttbr1, idmap_cpu_replace_ttbr1)
 	.macro	kpti_mk_tbl_ng, type, num_entries
 	add	end_\type\()p, cur_\type\()p, #\num_entries * 8
 .Ldo_\type:
-	ldr	\type, [cur_\type\()p]		// Load the entry
+	ldr	\type, [cur_\type\()p], #8	// Load the entry and advance
 	tbz	\type, #0, .Lnext_\type		// Skip invalid and
 	tbnz	\type, #11, .Lnext_\type	// non-global entries
 	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
-	str	\type, [cur_\type\()p]		// Update the entry
+	str	\type, [cur_\type\()p, #-8]	// Update the entry
 	.ifnc	\type, pte
 	tbnz	\type, #1, .Lderef_\type
 	.endif
 .Lnext_\type:
-	add	cur_\type\()p, cur_\type\()p, #8
 	cmp	cur_\type\()p, end_\type\()p
 	b.ne	.Ldo_\type
 	.endm
@@ -235,18 +234,18 @@ SYM_FUNC_ALIAS(__pi_idmap_cpu_replace_ttbr1, idmap_cpu_replace_ttbr1)
 	 * fixmap slot associated with the current level.
 	 */
 	.macro	kpti_map_pgtbl, type, level
-	str	xzr, [temp_pte, #8 * (\level + 1)]	// break before make
+	str	xzr, [temp_pte, #8 * (\level + 2)]	// break before make
 	dsb	nshst
-	add	pte, temp_pte, #PAGE_SIZE * (\level + 1)
+	add	pte, temp_pte, #PAGE_SIZE * (\level + 2)
 	lsr	pte, pte, #12
 	tlbi	vaae1, pte
 	dsb	nsh
 	isb
 
 	phys_to_pte pte, cur_\type\()p
-	add	cur_\type\()p, temp_pte, #PAGE_SIZE * (\level + 1)
+	add	cur_\type\()p, temp_pte, #PAGE_SIZE * (\level + 2)
 	orr	pte, pte, pte_flags
-	str	pte, [temp_pte, #8 * (\level + 1)]
+	str	pte, [temp_pte, #8 * (\level + 2)]
 	dsb	nshst
 	.endm
 
@@ -279,6 +278,8 @@ SYM_TYPED_FUNC_START(idmap_kpti_install_ng_mappings)
 	end_ptep	.req	x15
 	pte		.req	x16
 	valid		.req	x17
+	cur_p4dp	.req	x19
+	end_p4dp	.req	x20
 
 	mov	x5, x3				// preserve temp_pte arg
 	mrs	swapper_ttb, ttbr1_el1
@@ -286,6 +287,12 @@ SYM_TYPED_FUNC_START(idmap_kpti_install_ng_mappings)
 
 	cbnz	cpu, __idmap_kpti_secondary
 
+#if CONFIG_PGTABLE_LEVELS > 4
+	stp	x29, x30, [sp, #-32]!
+	mov	x29, sp
+	stp	x19, x20, [sp, #16]
+#endif
+
 	/* We're the boot CPU. Wait for the others to catch up */
 	sevl
 1:	wfe
@@ -303,9 +310,32 @@ SYM_TYPED_FUNC_START(idmap_kpti_install_ng_mappings)
 	mov_q	pte_flags, KPTI_NG_PTE_FLAGS
 
 	/* Everybody is enjoying the idmap, so we can rewrite swapper. */
+
+#ifdef CONFIG_ARM64_LPA2
+	/*
+	 * If LPA2 support is configured, but 52-bit virtual addressing is not
+	 * enabled at runtime, we will fall back to one level of paging less,
+	 * and so we have to walk swapper_pg_dir as if we dereferenced its
+	 * address from a PGD level entry, and terminate the PGD level loop
+	 * right after.
+	 */
+	adrp	pgd, swapper_pg_dir	// walk &swapper_pg_dir at the next level
+	mov	cur_pgdp, end_pgdp	// must be equal to terminate the PGD loop
+alternative_if_not ARM64_HAS_VA52
+	b	.Lderef_pgd		// skip to the next level
+alternative_else_nop_endif
+	/*
+	 * LPA2 based 52-bit virtual addressing requires 52-bit physical
+	 * addressing to be enabled as well. In this case, the shareability
+	 * bits are repurposed as physical address bits, and should not be
+	 * set in pte_flags.
+	 */
+	bic	pte_flags, pte_flags, #PTE_SHARED
+#endif
+
 	/* PGD */
 	adrp		cur_pgdp, swapper_pg_dir
-	kpti_map_pgtbl	pgd, 0
+	kpti_map_pgtbl	pgd, -1
 	kpti_mk_tbl_ng	pgd, PTRS_PER_PGD
 
 	/* Ensure all the updated entries are visible to secondary CPUs */
@@ -318,16 +348,33 @@ SYM_TYPED_FUNC_START(idmap_kpti_install_ng_mappings)
 
 	/* Set the flag to zero to indicate that we're all done */
 	str	wzr, [flag_ptr]
+#if CONFIG_PGTABLE_LEVELS > 4
+	ldp	x19, x20, [sp, #16]
+	ldp	x29, x30, [sp], #32
+#endif
 	ret
 
 .Lderef_pgd:
+	/* P4D */
+	.if		CONFIG_PGTABLE_LEVELS > 4
+	p4d		.req	x30
+	pte_to_phys	cur_p4dp, pgd
+	kpti_map_pgtbl	p4d, 0
+	kpti_mk_tbl_ng	p4d, PTRS_PER_P4D
+	b		.Lnext_pgd
+	.else		/* CONFIG_PGTABLE_LEVELS <= 4 */
+	p4d		.req	pgd
+	.set		.Lnext_p4d, .Lnext_pgd
+	.endif
+
+.Lderef_p4d:
 	/* PUD */
 	.if		CONFIG_PGTABLE_LEVELS > 3
 	pud		.req	x10
-	pte_to_phys	cur_pudp, pgd
+	pte_to_phys	cur_pudp, p4d
 	kpti_map_pgtbl	pud, 1
 	kpti_mk_tbl_ng	pud, PTRS_PER_PUD
-	b		.Lnext_pgd
+	b		.Lnext_p4d
 	.else		/* CONFIG_PGTABLE_LEVELS <= 3 */
 	pud		.req	pgd
 	.set		.Lnext_pud, .Lnext_pgd
@@ -371,6 +418,9 @@ SYM_TYPED_FUNC_START(idmap_kpti_install_ng_mappings)
 	.unreq	end_ptep
 	.unreq	pte
 	.unreq	valid
+	.unreq	cur_p4dp
+	.unreq	end_p4dp
+	.unreq	p4d
 
 	/* Secondary CPUs end up here */
 __idmap_kpti_secondary:
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 33/43] arm64: Enable LPA2 at boot if supported by the system
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (31 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 32/43] arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 34/43] arm64: mm: Add 5 level paging support to fixmap and swapper handling Ard Biesheuvel
                   ` (10 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Update the early kernel mapping code to take 52-bit virtual addressing
into account based on the LPA2 feature. This is a bit more involved than
LVA (which is supported with 64k pages only), given that some page table
descriptor bits change meaning in this case.

To keep the handling in asm to a minimum, the initial ID map is still
created with 48-bit virtual addressing, which implies that the kernel
image must be loaded into 48-bit addressable physical memory. This is
currently required by the boot protocol, even though we happen to
support placement outside of that for LVA/64k based configurations.

Enabling LPA2 involves more than setting TCR.T1SZ to a lower value:
there is also a DS bit in TCR that needs to be set, and which changes
the meaning of bits [9:8] in all page table descriptors. Since we cannot
enable DS and update every live page table descriptor at the same time,
let's pivot through another temporary mapping. This avoids the need to
reintroduce manipulations of the page tables with the MMU and caches
disabled.
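
The following sketch shows why stale shareability bits are a problem
once DS is set. It assumes the 4k/16k LPA2 descriptor layout, in which
output address bits [49:12] stay in place and bits [51:50] are stored
in descriptor bits [9:8]. The helper and macro names are invented for
this example and are not the kernel's.

```c
#include <stdint.h>

/* bits [9:8]: shareability without DS, PA bits [51:50] with DS */
#define DESC_SH_MASK		(UINT64_C(3) << 8)
/* bits [49:12]: low part of the output address */
#define DESC_ADDR_LOW_MASK	(((UINT64_C(1) << 50) - 1) & \
				 ~((UINT64_C(1) << 12) - 1))
/* PA bit 50 lands in descriptor bit 8 */
#define DESC_ADDR_HIGH_SHIFT	42

/* Encode a 52-bit physical address into the address bits of a descriptor */
static uint64_t lpa2_phys_to_desc(uint64_t pa)
{
	return (pa & DESC_ADDR_LOW_MASK) |
	       ((pa >> DESC_ADDR_HIGH_SHIFT) & DESC_SH_MASK);
}

/* Decode the physical address from a descriptor, as DS=1 hardware would */
static uint64_t lpa2_desc_to_phys(uint64_t desc)
{
	return (desc & DESC_ADDR_LOW_MASK) |
	       ((desc & DESC_SH_MASK) << DESC_ADDR_HIGH_SHIFT);
}
```

A descriptor for physical address 0x40000000 that still carries
inner-shareable attributes (bits [9:8] == 0b11) decodes to
0xC000040000000 under this layout, which is why those bits must be
cleared before the switch.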

To permit the LPA2 feature to be overridden on the kernel command line,
which may be necessary to work around silicon errata, or to deal with
mismatched features on heterogeneous SoC designs, test for CPU feature
overrides first, and only then enable LPA2.
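
The ordering can be sketched in plain C as follows: apply the override
mask and value to the ID register contents first, then extract the
granule field as a signed 4-bit quantity and compare it against the
LPA2 threshold. The shift and threshold below are assumed values for
the 4k granule field, and struct feature_override is a hypothetical
stand-in for the kernel's override record.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for the 4k granule field of ID_AA64MMFR0_EL1 */
#define TGRAN4_SHIFT	28
#define TGRAN4_LPA2	1

/* Hypothetical stand-in for the kernel's feature override record */
struct feature_override {
	uint64_t mask;	/* bits of the register that are overridden */
	uint64_t val;	/* replacement value for those bits */
};

/* Extract a 4-bit field and sign-extend it, so that 0xf reads as -1 */
static int extract_signed_field4(uint64_t reg, int shift)
{
	int field = (int)((reg >> shift) & 0xf);

	return (field ^ 8) - 8;
}

static bool has_lpa2(uint64_t mmfr0, const struct feature_override *ovr)
{
	/* apply the command line override before testing the feature */
	mmfr0 &= ~ovr->mask;
	mmfr0 |= ovr->val;

	return extract_signed_field4(mmfr0, TGRAN4_SHIFT) >= TGRAN4_LPA2;
}
```

The signed extraction matters because an all-ones field means the
granule is not implemented at all, and must compare lower than the
LPA2 threshold.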

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/assembler.h  |  8 ++-
 arch/arm64/include/asm/cpufeature.h | 18 +++++
 arch/arm64/include/asm/memory.h     |  4 ++
 arch/arm64/kernel/head.S            |  8 +++
 arch/arm64/kernel/image-vars.h      |  1 +
 arch/arm64/kernel/pi/map_kernel.c   | 70 +++++++++++++++++++-
 arch/arm64/kernel/pi/map_range.c    | 11 ++-
 arch/arm64/kernel/pi/pi.h           |  4 +-
 arch/arm64/mm/init.c                |  2 +-
 arch/arm64/mm/mmu.c                 |  6 +-
 arch/arm64/mm/proc.S                |  3 +
 11 files changed, 124 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 7eedcb36ebe0..ce7b95cd6e79 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -581,11 +581,17 @@ alternative_endif
  * but we have to add an offset so that the TTBR1 address corresponds with the
  * pgdir entry that covers the lowest 48-bit addressable VA.
  *
+ * Note that this trick is only used for LVA/64k pages - LPA2/4k pages uses an
+ * additional paging level, and on LPA2/16k pages, we would end up with a root
+ * level table with only 2 entries, which is suboptimal in terms of TLB
+ * utilization, so there we fall back to 47 bits of translation if LPA2 is not
+ * supported.
+ *
  * orr is used as it can cover the immediate value (and is idempotent).
  * 	ttbr: Value of ttbr to set, modified.
  */
 	.macro	offset_ttbr1, ttbr, tmp
-#ifdef CONFIG_ARM64_VA_BITS_52
+#if defined(CONFIG_ARM64_VA_BITS_52) && !defined(CONFIG_ARM64_LPA2)
 	mrs	\tmp, tcr_el1
 	and	\tmp, \tmp, #TCR_T1SZ_MASK
 	cmp	\tmp, #TCR_T1SZ(VA_BITS_MIN)
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index a2ac31aecdd9..a8f97690ce1f 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -1008,6 +1008,24 @@ static inline bool cpu_has_lva(void)
 						    ID_AA64MMFR2_EL1_VARange_SHIFT);
 }
 
+static inline bool cpu_has_lpa2(void)
+{
+#ifdef CONFIG_ARM64_LPA2
+	u64 mmfr0;
+	int feat;
+
+	mmfr0 = read_sysreg(id_aa64mmfr0_el1);
+	mmfr0 &= ~id_aa64mmfr0_override.mask;
+	mmfr0 |= id_aa64mmfr0_override.val;
+	feat = cpuid_feature_extract_signed_field(mmfr0,
+						  ID_AA64MMFR0_EL1_TGRAN_SHIFT);
+
+	return feat >= ID_AA64MMFR0_EL1_TGRAN_LPA2;
+#else
+	return false;
+#endif
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 9680d7444b3b..b850b1b91471 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -54,7 +54,11 @@
 #define FIXADDR_TOP		(-UL(SZ_8M))
 
 #if VA_BITS > 48
+#ifdef CONFIG_ARM64_16K_PAGES
+#define VA_BITS_MIN		(47)
+#else
 #define VA_BITS_MIN		(48)
+#endif
 #else
 #define VA_BITS_MIN		(VA_BITS)
 #endif
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index e25351addfd0..405e9bce8c73 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -89,6 +89,7 @@ SYM_CODE_START(primary_entry)
 	mov	sp, x1
 	mov	x29, xzr
 	adrp	x0, init_idmap_pg_dir
+	mov	x1, xzr
 	bl	__pi_create_init_idmap
 
 	/*
@@ -473,9 +474,16 @@ SYM_FUNC_END(__enable_mmu)
 
 #ifdef CONFIG_ARM64_VA_BITS_52
 SYM_FUNC_START(__cpu_secondary_check52bitva)
+#ifndef CONFIG_ARM64_LPA2
 	mrs_s	x0, SYS_ID_AA64MMFR2_EL1
 	and	x0, x0, ID_AA64MMFR2_EL1_VARange_MASK
 	cbnz	x0, 2f
+#else
+	mrs	x0, id_aa64mmfr0_el1
+	sbfx	x0, x0, #ID_AA64MMFR0_EL1_TGRAN_SHIFT, 4
+	cmp	x0, #ID_AA64MMFR0_EL1_TGRAN_LPA2
+	b.ge	2f
+#endif
 
 	update_early_cpu_boot_status \
 		CPU_STUCK_IN_KERNEL | CPU_STUCK_REASON_52_BIT_VA, x0, x1
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index ff81f809a240..ba4f8f7d6a91 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -54,6 +54,7 @@ PROVIDE(__pi__ctype			= _ctype);
 PROVIDE(__pi_memstart_offset_seed	= memstart_offset_seed);
 
 PROVIDE(__pi_init_idmap_pg_dir		= init_idmap_pg_dir);
+PROVIDE(__pi_init_idmap_pg_end		= init_idmap_pg_end);
 PROVIDE(__pi_init_pg_dir		= init_pg_dir);
 PROVIDE(__pi_init_pg_end		= init_pg_end);
 PROVIDE(__pi_swapper_pg_dir		= swapper_pg_dir);
diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
index 1853825aa29d..5fa08e13e17e 100644
--- a/arch/arm64/kernel/pi/map_kernel.c
+++ b/arch/arm64/kernel/pi/map_kernel.c
@@ -127,11 +127,64 @@ static void __init map_kernel(u64 kaslr_offset, u64 va_offset, int root_level)
 	}
 
 	/* Copy the root page table to its final location */
-	memcpy((void *)swapper_pg_dir + va_offset, init_pg_dir, PGD_SIZE);
+	memcpy((void *)swapper_pg_dir + va_offset, init_pg_dir, PAGE_SIZE);
 	dsb(ishst);
 	idmap_cpu_replace_ttbr1(swapper_pg_dir);
 }
 
+static void noinline __section(".idmap.text") set_ttbr0_for_lpa2(u64 ttbr)
+{
+	u64 sctlr = read_sysreg(sctlr_el1);
+	u64 tcr = read_sysreg(tcr_el1) | TCR_DS;
+
+	asm("	msr	sctlr_el1, %0		;"
+	    "	isb				;"
+	    "   msr     ttbr0_el1, %1		;"
+	    "   msr     tcr_el1, %2		;"
+	    "	isb				;"
+	    "	tlbi    vmalle1			;"
+	    "	dsb     nsh			;"
+	    "	isb				;"
+	    "	msr     sctlr_el1, %3		;"
+	    "	isb				;"
+	    ::	"r"(sctlr & ~SCTLR_ELx_M), "r"(ttbr), "r"(tcr), "r"(sctlr));
+}
+
+static void __init remap_idmap_for_lpa2(void)
+{
+	/* clear the bits that change meaning once LPA2 is turned on */
+	pteval_t mask = PTE_SHARED;
+
+	/*
+	 * We have to clear bits [9:8] in all block or page descriptors in the
+	 * initial ID map, as otherwise they will be (mis)interpreted as
+	 * physical address bits once we flick the LPA2 switch (TCR.DS). Since
+	 * we cannot manipulate live descriptors in that way without creating
+	 * potential TLB conflicts, let's create another temporary ID map in a
+	 * LPA2 compatible fashion, and update the initial ID map while running
+	 * from that.
+	 */
+	create_init_idmap(init_pg_dir, mask);
+	dsb(ishst);
+	set_ttbr0_for_lpa2((u64)init_pg_dir);
+
+	/*
+	 * Recreate the initial ID map with the same granularity as before.
+	 * Don't bother with the FDT, we no longer need it after this.
+	 */
+	memset(init_idmap_pg_dir, 0,
+	       (u64)init_idmap_pg_end - (u64)init_idmap_pg_dir);
+
+	create_init_idmap(init_idmap_pg_dir, mask);
+	dsb(ishst);
+
+	/* switch back to the updated initial ID map */
+	set_ttbr0_for_lpa2((u64)init_idmap_pg_dir);
+
+	/* wipe the temporary ID map from memory */
+	memset(init_pg_dir, 0, (u64)init_pg_end - (u64)init_pg_dir);
+}
+
 static void __init map_fdt(u64 fdt)
 {
 	static u8 ptes[INIT_IDMAP_FDT_SIZE] __initdata __aligned(PAGE_SIZE);
@@ -154,6 +207,7 @@ asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 	u64 va_base, pa_base = (u64)&_text;
 	u64 kaslr_offset = pa_base % MIN_KIMG_ALIGN;
 	int root_level = 4 - CONFIG_PGTABLE_LEVELS;
+	int va_bits = VA_BITS;
 	int chosen;
 
 	map_fdt((u64)fdt);
@@ -165,8 +219,15 @@ asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 	chosen = fdt_path_offset(fdt, chosen_str);
 	init_feature_override(boot_status, fdt, chosen);
 
-	if (VA_BITS > VA_BITS_MIN && cpu_has_lva())
-		sysreg_clear_set(tcr_el1, TCR_T1SZ_MASK, TCR_T1SZ(VA_BITS));
+	if (IS_ENABLED(CONFIG_ARM64_64K_PAGES) && !cpu_has_lva()) {
+		va_bits = VA_BITS_MIN;
+	} else if (IS_ENABLED(CONFIG_ARM64_LPA2) && !cpu_has_lpa2()) {
+		va_bits = VA_BITS_MIN;
+		root_level++;
+	}
+
+	if (va_bits > VA_BITS_MIN)
+		sysreg_clear_set(tcr_el1, TCR_T1SZ_MASK, TCR_T1SZ(va_bits));
 
 	/*
 	 * The virtual KASLR displacement modulo 2MiB is decided by the
@@ -184,6 +245,9 @@ asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 		kaslr_offset |= kaslr_seed & ~(MIN_KIMG_ALIGN - 1);
 	}
 
+	if (IS_ENABLED(CONFIG_ARM64_LPA2) && va_bits > VA_BITS_MIN)
+		remap_idmap_for_lpa2();
+
 	va_base = KIMAGE_VADDR + kaslr_offset;
 	map_kernel(kaslr_offset, va_base - pa_base, root_level);
 }
diff --git a/arch/arm64/kernel/pi/map_range.c b/arch/arm64/kernel/pi/map_range.c
index 79e4f6a2efe1..5410b2cac590 100644
--- a/arch/arm64/kernel/pi/map_range.c
+++ b/arch/arm64/kernel/pi/map_range.c
@@ -87,14 +87,19 @@ void __init map_range(u64 *pte, u64 start, u64 end, u64 pa, pgprot_t prot,
 	}
 }
 
-asmlinkage u64 __init create_init_idmap(pgd_t *pg_dir)
+asmlinkage u64 __init create_init_idmap(pgd_t *pg_dir, pteval_t clrmask)
 {
 	u64 ptep = (u64)pg_dir + PAGE_SIZE;
+	pgprot_t text_prot = PAGE_KERNEL_ROX;
+	pgprot_t data_prot = PAGE_KERNEL;
+
+	pgprot_val(text_prot) &= ~clrmask;
+	pgprot_val(data_prot) &= ~clrmask;
 
 	map_range(&ptep, (u64)_stext, (u64)__initdata_begin, (u64)_stext,
-		  PAGE_KERNEL_ROX, IDMAP_ROOT_LEVEL, (pte_t *)pg_dir, false, 0);
+		  text_prot, IDMAP_ROOT_LEVEL, (pte_t *)pg_dir, false, 0);
 	map_range(&ptep, (u64)__initdata_begin, (u64)_end, (u64)__initdata_begin,
-		  PAGE_KERNEL, IDMAP_ROOT_LEVEL, (pte_t *)pg_dir, false, 0);
+		  data_prot, IDMAP_ROOT_LEVEL, (pte_t *)pg_dir, false, 0);
 
 	return ptep;
 }
diff --git a/arch/arm64/kernel/pi/pi.h b/arch/arm64/kernel/pi/pi.h
index 1ea282a5f96a..c91e5e965cd3 100644
--- a/arch/arm64/kernel/pi/pi.h
+++ b/arch/arm64/kernel/pi/pi.h
@@ -21,7 +21,7 @@ static inline void *prel64_to_pointer(const prel64_t *offset)
 
 extern bool dynamic_scs_is_enabled;
 
-extern pgd_t init_idmap_pg_dir[];
+extern pgd_t init_idmap_pg_dir[], init_idmap_pg_end[];
 
 void init_feature_override(u64 boot_status, const void *fdt, int chosen);
 u64 kaslr_early_init(void *fdt, int chosen);
@@ -33,4 +33,4 @@ void map_range(u64 *pgd, u64 start, u64 end, u64 pa, pgprot_t prot,
 
 asmlinkage void early_map_kernel(u64 boot_status, void *fdt);
 
-asmlinkage u64 create_init_idmap(pgd_t *pgd);
+asmlinkage u64 create_init_idmap(pgd_t *pgd, pteval_t clrmask);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 74c1db8ce271..0f427b50fdc3 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -238,7 +238,7 @@ void __init arm64_memblock_init(void)
 	 * physical address of PAGE_OFFSET, we have to *subtract* from it.
 	 */
 	if (IS_ENABLED(CONFIG_ARM64_VA_BITS_52) && (vabits_actual != 52))
-		memstart_addr -= _PAGE_OFFSET(48) - _PAGE_OFFSET(52);
+		memstart_addr -= _PAGE_OFFSET(vabits_actual) - _PAGE_OFFSET(52);
 
 	/*
 	 * Apply the memory limit if it was set. Since the kernel may be loaded
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d2e9dec38a15..d30ae4d3fdd9 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -582,8 +582,12 @@ static void __init map_mem(pgd_t *pgdp)
 	 * entries at any level are being shared between the linear region and
 	 * the vmalloc region. Check whether this is true for the PGD level, in
 	 * which case it is guaranteed to be true for all other levels as well.
+	 * (Unless we are running with support for LPA2, in which case the
+	 * entire reduced VA space is covered by a single pgd_t which will have
+	 * been populated without the PXNTable attribute by the time we get here.)
 	 */
-	BUILD_BUG_ON(pgd_index(direct_map_end - 1) == pgd_index(direct_map_end));
+	BUILD_BUG_ON(pgd_index(direct_map_end - 1) == pgd_index(direct_map_end) &&
+		     pgd_index(_PAGE_OFFSET(VA_BITS_MIN)) != PTRS_PER_PGD - 1);
 
 	early_kfence_pool = arm64_kfence_alloc_pool();
 
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index fa0d7c63f8d2..9d40f3ffd8d2 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -488,6 +488,9 @@ SYM_FUNC_START(__cpu_setup)
 	mov		x9, #64 - VA_BITS
 alternative_if ARM64_HAS_VA52
 	tcr_set_t1sz	tcr, x9
+#ifdef CONFIG_ARM64_LPA2
+	orr		tcr, tcr, #TCR_DS
+#endif
 alternative_else_nop_endif
 #endif
 
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 34/43] arm64: mm: Add 5 level paging support to fixmap and swapper handling
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (32 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 33/43] arm64: Enable LPA2 at boot if supported by the system Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 35/43] arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging Ard Biesheuvel
                   ` (9 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Add support for using 5 levels of paging in the fixmap, as well as in
the kernel page table handling code which uses fixmaps internally.
This also handles the case where a 5 level build runs on hardware that
only supports 4 levels of paging.
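
For the folded case, the lookup reduces to pointer arithmetic on the
root table: whatever entry the pgd pointer refers to, the p4d entry is
found by aligning that pointer down to the start of the page and
indexing it again with the p4d index. The sketch below models this with
plain integers. It assumes 4k pages and a P4D_SHIFT of 39, and uses
simplified typedefs instead of the kernel's struct wrappers.

```c
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(UINT64_C(1) << PAGE_SHIFT)
#define P4D_SHIFT	39	/* assumed: 4k pages, 512 GiB per p4d entry */
#define PTRS_PER_P4D	512

typedef uint64_t pgd_t;	/* simplified: the kernel wraps these in structs */
typedef uint64_t p4d_t;

static uint64_t p4d_index(uint64_t addr)
{
	return (addr >> P4D_SHIFT) & (PTRS_PER_P4D - 1);
}

/*
 * With the top level folded at runtime, the root table is a single page
 * that is indexed like a p4d table, so rebase the pointer to the start
 * of that page and apply the p4d index.
 */
static p4d_t *pgd_to_folded_p4d(pgd_t *pgdp, uint64_t addr)
{
	uintptr_t base = (uintptr_t)pgdp & ~(uintptr_t)(PAGE_SIZE - 1);

	return (p4d_t *)base + p4d_index(addr);
}
```

The result depends only on the page that contains pgdp and on the
address, not on which entry of that page pgdp points at.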

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/fixmap.h  |  1 +
 arch/arm64/include/asm/pgtable.h | 45 ++++++++++++++++---
 arch/arm64/mm/fixmap.c           |  2 +-
 arch/arm64/mm/mmu.c              | 47 ++++++++++++++++++--
 4 files changed, 85 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 8aabd45e9a13..87e307804b99 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -87,6 +87,7 @@ enum fixed_addresses {
 	FIX_PTE,
 	FIX_PMD,
 	FIX_PUD,
+	FIX_P4D,
 	FIX_PGD,
 
 	__end_of_fixed_addresses
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7eb2b933ed3c..3d7fb3cde83d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -621,12 +621,12 @@ static inline bool pud_table(pud_t pud) { return true; }
 				 PUD_TYPE_TABLE)
 #endif
 
-extern pgd_t init_pg_dir[PTRS_PER_PGD];
+extern pgd_t init_pg_dir[];
 extern pgd_t init_pg_end[];
-extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
-extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
-extern pgd_t tramp_pg_dir[PTRS_PER_PGD];
-extern pgd_t reserved_pg_dir[PTRS_PER_PGD];
+extern pgd_t swapper_pg_dir[];
+extern pgd_t idmap_pg_dir[];
+extern pgd_t tramp_pg_dir[];
+extern pgd_t reserved_pg_dir[];
 
 extern void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd);
 
@@ -891,12 +891,47 @@ static inline p4d_t *p4d_offset(pgd_t *pgdp, unsigned long addr)
 	return p4d_offset_lockless(pgdp, READ_ONCE(*pgdp), addr);
 }
 
+static inline p4d_t *p4d_set_fixmap(unsigned long addr)
+{
+	if (!pgtable_l5_enabled())
+		return NULL;
+	return (p4d_t *)set_fixmap_offset(FIX_P4D, addr);
+}
+
+static inline p4d_t *p4d_set_fixmap_offset(pgd_t *pgdp, unsigned long addr)
+{
+	if (!pgtable_l5_enabled())
+		return pgd_to_folded_p4d(pgdp, addr);
+	return p4d_set_fixmap(p4d_offset_phys(pgdp, addr));
+}
+
+static inline void p4d_clear_fixmap(void)
+{
+	if (pgtable_l5_enabled())
+		clear_fixmap(FIX_P4D);
+}
+
+/* use ONLY for statically allocated translation tables */
+static inline p4d_t *p4d_offset_kimg(pgd_t *pgdp, u64 addr)
+{
+	if (!pgtable_l5_enabled())
+		return pgd_to_folded_p4d(pgdp, addr);
+	return (p4d_t *)__phys_to_kimg(p4d_offset_phys(pgdp, addr));
+}
+
 #define pgd_page(pgd)		pfn_to_page(__phys_to_pfn(__pgd_to_phys(pgd)))
 
 #else
 
 static inline bool pgtable_l5_enabled(void) { return false; }
 
+/* Match p4d_offset folding in <asm/generic/pgtable-nop4d.h> */
+#define p4d_set_fixmap(addr)		NULL
+#define p4d_set_fixmap_offset(p4dp, addr)	((p4d_t *)p4dp)
+#define p4d_clear_fixmap()
+
+#define p4d_offset_kimg(dir,addr)	((p4d_t *)dir)
+
 #endif  /* CONFIG_PGTABLE_LEVELS > 4 */
 
 #define pgd_ERROR(e)	\
diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
index 9404f282f829..d22506e9c7fd 100644
--- a/arch/arm64/mm/fixmap.c
+++ b/arch/arm64/mm/fixmap.c
@@ -104,7 +104,7 @@ void __init early_fixmap_init(void)
 	unsigned long end = FIXADDR_TOP;
 
 	pgd_t *pgdp = pgd_offset_k(addr);
-	p4d_t *p4dp = p4d_offset(pgdp, addr);
+	p4d_t *p4dp = p4d_offset_kimg(pgdp, addr);
 
 	early_fixmap_init_pud(p4dp, addr, end);
 }
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d30ae4d3fdd9..8e5b3a7c5afd 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -313,15 +313,14 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 	} while (addr = next, addr != end);
 }
 
-static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
+static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
 			   phys_addr_t phys, pgprot_t prot,
 			   phys_addr_t (*pgtable_alloc)(int),
 			   int flags)
 {
 	unsigned long next;
-	pud_t *pudp;
-	p4d_t *p4dp = p4d_offset(pgdp, addr);
 	p4d_t p4d = READ_ONCE(*p4dp);
+	pud_t *pudp;
 
 	if (p4d_none(p4d)) {
 		p4dval_t p4dval = P4D_TYPE_TABLE | P4D_TABLE_UXN;
@@ -369,6 +368,46 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
 	pud_clear_fixmap();
 }
 
+static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
+			   phys_addr_t phys, pgprot_t prot,
+			   phys_addr_t (*pgtable_alloc)(int),
+			   int flags)
+{
+	unsigned long next;
+	pgd_t pgd = READ_ONCE(*pgdp);
+	p4d_t *p4dp;
+
+	if (pgd_none(pgd)) {
+		pgdval_t pgdval = PGD_TYPE_TABLE | PGD_TABLE_UXN;
+		phys_addr_t p4d_phys;
+
+		if (flags & NO_EXEC_MAPPINGS)
+			pgdval |= PGD_TABLE_PXN;
+		BUG_ON(!pgtable_alloc);
+		p4d_phys = pgtable_alloc(P4D_SHIFT);
+		__pgd_populate(pgdp, p4d_phys, pgdval);
+		pgd = READ_ONCE(*pgdp);
+	}
+	BUG_ON(pgd_bad(pgd));
+
+	p4dp = p4d_set_fixmap_offset(pgdp, addr);
+	do {
+		p4d_t old_p4d = READ_ONCE(*p4dp);
+
+		next = p4d_addr_end(addr, end);
+
+		alloc_init_pud(p4dp, addr, next, phys, prot,
+			       pgtable_alloc, flags);
+
+		BUG_ON(p4d_val(old_p4d) != 0 &&
+		       p4d_val(old_p4d) != READ_ONCE(p4d_val(*p4dp)));
+
+		phys += next - addr;
+	} while (p4dp++, addr = next, addr != end);
+
+	p4d_clear_fixmap();
+}
+
 static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
 					unsigned long virt, phys_addr_t size,
 					pgprot_t prot,
@@ -391,7 +430,7 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
 
 	do {
 		next = pgd_addr_end(addr, end);
-		alloc_init_pud(pgdp, addr, next, phys, prot, pgtable_alloc,
+		alloc_init_p4d(pgdp, addr, next, phys, prot, pgtable_alloc,
 			       flags);
 		phys += next - addr;
 	} while (pgdp++, addr = next, addr != end);
-- 
2.43.0.687.g38aa6559b0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v8 35/43] arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (33 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 34/43] arm64: mm: Add 5 level paging support to fixmap and swapper handling Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime Ard Biesheuvel
                   ` (8 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Allow the KASAN init code to deal with 5 levels of paging, and relax the
requirement that the shadow region is aligned to the top level pgd_t
size. This is necessary for LPA2 based 52-bit virtual addressing, where
the KASAN shadow will never be aligned to the pgd_t size. Allowing this
also enables the 16k/48-bit case for KASAN, which is a nice bonus.

This involves some hackery to manipulate the root and next level page
tables without having to distinguish all the various configurations,
including 16k/48-bit (which has a two-entry pgd_t level), and LPA2
configurations running with one fewer translation level on non-LPA2
hardware.
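
As a rough illustration of the alignment question discussed above, the
sketch below computes how much VA space a single root level descriptor
covers and checks an address against it. It follows the arithmetic of
ARM64_HW_PGTABLE_LEVELS(), but takes the page shift and VA size as
parameters (an assumption made so that it builds outside the kernel).
The address used in the note below is only an example value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of translation levels for a given granule and VA size. */
static inline int hw_pgtable_levels(int page_shift, int va_bits)
{
	assert(page_shift > 3);
	return (va_bits - 4) / (page_shift - 3);
}

/* Bytes of VA space covered by one root level descriptor. */
static inline uint64_t root_desc_size(int page_shift, int va_bits)
{
	int levels = hw_pgtable_levels(page_shift, va_bits);
	int shift = (levels - 1) * (page_shift - 3);

	return (UINT64_C(1) << page_shift) << shift;
}

/* True if 'addr' sits on a root level descriptor boundary. */
static inline bool root_level_aligned(uint64_t addr, int page_shift,
				      int va_bits)
{
	return (addr % root_desc_size(page_shift, va_bits)) == 0;
}
```

With 4k pages and 48 bits of VA, a root descriptor covers 2^39 bytes,
so an address such as 0xffff600000000000 is aligned. With 16k pages and
48 bits, a root descriptor covers 2^47 bytes and the same address is
not aligned, which is the kind of case the next level handling has to
cope with.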

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/Kconfig         |   2 +-
 arch/arm64/mm/kasan_init.c | 148 +++++++++++++++++---
 2 files changed, 130 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8c2c36fffcf5..9ca3316d6379 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -164,7 +164,7 @@ config ARM64
 	select HAVE_ARCH_HUGE_VMAP
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_JUMP_LABEL_RELATIVE
-	select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
+	select HAVE_ARCH_KASAN
 	select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
 	select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
 	select HAVE_ARCH_KASAN_HW_TAGS if (HAVE_ARCH_KASAN && ARM64_MTE)
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index a86ab99587c9..fbddbf9faf19 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -23,7 +23,7 @@
 
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 
-static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
+static pgd_t tmp_pg_dir[PTRS_PER_PTE] __initdata __aligned(PAGE_SIZE);
 
 /*
  * The p*d_populate functions call virt_to_phys implicitly so they can't be used
@@ -99,6 +99,19 @@ static pud_t *__init kasan_pud_offset(p4d_t *p4dp, unsigned long addr, int node,
 	return early ? pud_offset_kimg(p4dp, addr) : pud_offset(p4dp, addr);
 }
 
+static p4d_t *__init kasan_p4d_offset(pgd_t *pgdp, unsigned long addr, int node,
+				      bool early)
+{
+	if (pgd_none(READ_ONCE(*pgdp))) {
+		phys_addr_t p4d_phys = early ?
+				__pa_symbol(kasan_early_shadow_p4d)
+					: kasan_alloc_zeroed_page(node);
+		__pgd_populate(pgdp, p4d_phys, PGD_TYPE_TABLE);
+	}
+
+	return early ? p4d_offset_kimg(pgdp, addr) : p4d_offset(pgdp, addr);
+}
+
 static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
 				      unsigned long end, int node, bool early)
 {
@@ -144,12 +157,12 @@ static void __init kasan_p4d_populate(pgd_t *pgdp, unsigned long addr,
 				      unsigned long end, int node, bool early)
 {
 	unsigned long next;
-	p4d_t *p4dp = p4d_offset(pgdp, addr);
+	p4d_t *p4dp = kasan_p4d_offset(pgdp, addr, node, early);
 
 	do {
 		next = p4d_addr_end(addr, end);
 		kasan_pud_populate(p4dp, addr, next, node, early);
-	} while (p4dp++, addr = next, addr != end);
+	} while (p4dp++, addr = next, addr != end && p4d_none(READ_ONCE(*p4dp)));
 }
 
 static void __init kasan_pgd_populate(unsigned long addr, unsigned long end,
@@ -165,19 +178,48 @@ static void __init kasan_pgd_populate(unsigned long addr, unsigned long end,
 	} while (pgdp++, addr = next, addr != end);
 }
 
+#if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS > 4
+#define SHADOW_ALIGN	P4D_SIZE
+#else
+#define SHADOW_ALIGN	PUD_SIZE
+#endif
+
+/*
+ * Return whether 'addr' is aligned to the size covered by a root level
+ * descriptor.
+ */
+static bool __init root_level_aligned(u64 addr)
+{
+	int shift = (ARM64_HW_PGTABLE_LEVELS(vabits_actual) - 1) * (PAGE_SHIFT - 3);
+
+	return (addr % (PAGE_SIZE << shift)) == 0;
+}
+
 /* The early shadow maps everything to a single page of zeroes */
 asmlinkage void __init kasan_early_init(void)
 {
 	BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
 		KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
-	/*
-	 * We cannot check the actual value of KASAN_SHADOW_START during build,
-	 * as it depends on vabits_actual. As a best-effort approach, check
-	 * potential values calculated based on VA_BITS and VA_BITS_MIN.
-	 */
-	BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS), PGDIR_SIZE));
-	BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS_MIN), PGDIR_SIZE));
-	BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, PGDIR_SIZE));
+	BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS), SHADOW_ALIGN));
+	BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS_MIN), SHADOW_ALIGN));
+	BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, SHADOW_ALIGN));
+
+	if (!root_level_aligned(KASAN_SHADOW_START)) {
+		/*
+		 * The start address is misaligned, and so the next level table
+		 * will be shared with the linear region. This can happen with
+		 * 4 or 5 level paging, so install a generic pte_t[] as the
+		 * next level. This prevents the kasan_pgd_populate call below
+		 * from inserting an entry that refers to the shared KASAN zero
+		 * shadow pud_t[]/p4d_t[], which could end up getting corrupted
+		 * when the linear region is mapped.
+		 */
+		static pte_t tbl[PTRS_PER_PTE] __page_aligned_bss;
+		pgd_t *pgdp = pgd_offset_k(KASAN_SHADOW_START);
+
+		set_pgd(pgdp, __pgd(__pa_symbol(tbl) | PGD_TYPE_TABLE));
+	}
+
 	kasan_pgd_populate(KASAN_SHADOW_START, KASAN_SHADOW_END, NUMA_NO_NODE,
 			   true);
 }
@@ -189,20 +231,75 @@ static void __init kasan_map_populate(unsigned long start, unsigned long end,
 	kasan_pgd_populate(start & PAGE_MASK, PAGE_ALIGN(end), node, false);
 }
 
-static void __init clear_pgds(unsigned long start,
-			unsigned long end)
+/*
+ * Return the descriptor index of 'addr' in the root level table
+ */
+static int __init root_level_idx(u64 addr)
 {
 	/*
-	 * Remove references to kasan page tables from
-	 * swapper_pg_dir. pgd_clear() can't be used
-	 * here because it's nop on 2,3-level pagetable setups
+	 * On 64k pages, the TTBR1 range root tables are extended for 52-bit
+	 * virtual addressing, and TTBR1 will simply point to the pgd_t entry
+	 * that covers the start of the 48-bit addressable VA space if LVA is
+	 * not implemented. This means we need to index the table as usual,
+	 * instead of masking off bits based on vabits_actual.
 	 */
-	for (; start < end; start += PGDIR_SIZE)
-		set_pgd(pgd_offset_k(start), __pgd(0));
+	u64 vabits = IS_ENABLED(CONFIG_ARM64_64K_PAGES) ? VA_BITS
+							: vabits_actual;
+	int shift = (ARM64_HW_PGTABLE_LEVELS(vabits) - 1) * (PAGE_SHIFT - 3);
+
+	return (addr & ~_PAGE_OFFSET(vabits)) >> (shift + PAGE_SHIFT);
+}
+
+/*
+ * Clone a next level table from swapper_pg_dir into tmp_pg_dir
+ */
+static void __init clone_next_level(u64 addr, pgd_t *tmp_pg_dir, pud_t *pud)
+{
+	int idx = root_level_idx(addr);
+	pgd_t pgd = READ_ONCE(swapper_pg_dir[idx]);
+	pud_t *pudp = (pud_t *)__phys_to_kimg(__pgd_to_phys(pgd));
+
+	memcpy(pud, pudp, PAGE_SIZE);
+	tmp_pg_dir[idx] = __pgd(__phys_to_pgd_val(__pa_symbol(pud)) |
+				PUD_TYPE_TABLE);
+}
+
+/*
+ * Return the descriptor index of 'addr' in the next level table
+ */
+static int __init next_level_idx(u64 addr)
+{
+	int shift = (ARM64_HW_PGTABLE_LEVELS(vabits_actual) - 2) * (PAGE_SHIFT - 3);
+
+	return (addr >> (shift + PAGE_SHIFT)) % PTRS_PER_PTE;
+}
+
+/*
+ * Dereference the table descriptor at 'pgd_idx' and clear the entries from
+ * 'start' to 'end' (exclusive) from the table.
+ */
+static void __init clear_next_level(int pgd_idx, int start, int end)
+{
+	pgd_t pgd = READ_ONCE(swapper_pg_dir[pgd_idx]);
+	pud_t *pudp = (pud_t *)__phys_to_kimg(__pgd_to_phys(pgd));
+
+	memset(&pudp[start], 0, (end - start) * sizeof(pud_t));
+}
+
+static void __init clear_shadow(u64 start, u64 end)
+{
+	int l = root_level_idx(start), m = root_level_idx(end);
+
+	if (!root_level_aligned(start))
+		clear_next_level(l++, next_level_idx(start), PTRS_PER_PTE);
+	if (!root_level_aligned(end))
+		clear_next_level(m, 0, next_level_idx(end));
+	memset(&swapper_pg_dir[l], 0, (m - l) * sizeof(pgd_t));
 }
 
 static void __init kasan_init_shadow(void)
 {
+	static pud_t pud[2][PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
 	u64 kimg_shadow_start, kimg_shadow_end;
 	u64 mod_shadow_start;
 	u64 vmalloc_shadow_end;
@@ -224,10 +321,23 @@ static void __init kasan_init_shadow(void)
 	 * setup will be finished.
 	 */
 	memcpy(tmp_pg_dir, swapper_pg_dir, sizeof(tmp_pg_dir));
+
+	/*
+	 * If the start or end address of the shadow region is not aligned to
+	 * the root level size, we have to allocate a temporary next-level table
+	 * in each case, clone the next level of descriptors, and install the
+	 * table into tmp_pg_dir. Note that with 5 levels of paging, the next
+	 * level will in fact be p4d_t, but that makes no difference in this
+	 * case.
+	 */
+	if (!root_level_aligned(KASAN_SHADOW_START))
+		clone_next_level(KASAN_SHADOW_START, tmp_pg_dir, pud[0]);
+	if (!root_level_aligned(KASAN_SHADOW_END))
+		clone_next_level(KASAN_SHADOW_END, tmp_pg_dir, pud[1]);
 	dsb(ishst);
 	cpu_replace_ttbr1(lm_alias(tmp_pg_dir));
 
-	clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
+	clear_shadow(KASAN_SHADOW_START, KASAN_SHADOW_END);
 
 	kasan_map_populate(kimg_shadow_start, kimg_shadow_end,
 			   early_pfn_to_nid(virt_to_pfn(lm_alias(KERNEL_START))));
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (34 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 35/43] arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-29 14:17   ` Ryan Roberts
  2024-02-14 12:29 ` [PATCH v8 37/43] arm64: ptdump: Disregard unaddressable VA space Ard Biesheuvel
                   ` (7 subsequent siblings)
  43 siblings, 1 reply; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

In order to support LPA2 on 16k pages in a way that permits non-LPA2
systems to run the same kernel image, we have to be able to fall back to
at most 48 bits of virtual addressing.

Falling back to 48 bits would result in a level 0 with only 2 entries,
which is suboptimal in terms of TLB utilization. So instead, let's fall
back to 47 bits in that case. This means we need to be able to fold PUDs
dynamically, similar to how we fold P4Ds for 48-bit virtual addressing
on LPA2 with 4k pages.
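
The "only 2 entries" figure can be checked with a small calculation.
The sketch below is not kernel code; it assumes the usual relation
between granule size, VA size and number of levels, and the helper
names are made up for illustration.

```c
#include <assert.h>

/* Number of translation levels for a given granule and VA size. */
static inline int hw_pgtable_levels(int page_shift, int va_bits)
{
	assert(page_shift > 3);
	return (va_bits - 4) / (page_shift - 3);
}

/* Number of descriptors in the root (top level) table. */
static inline unsigned long root_table_entries(int page_shift, int va_bits)
{
	int levels = hw_pgtable_levels(page_shift, va_bits);
	/* VA bits resolved by the page offset and all non-root levels */
	int resolved_below = page_shift + (levels - 1) * (page_shift - 3);

	return 1UL << (va_bits - resolved_below);
}
```

For 16k pages (page_shift == 14), 48 bits of VA gives 4 levels with a
2 entry root table, whereas 47 bits gives 3 levels with a 2048 entry
root table, which is why the 47-bit fallback is the more attractive
one.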

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/pgalloc.h | 12 ++-
 arch/arm64/include/asm/pgtable.h | 87 +++++++++++++++++---
 arch/arm64/include/asm/tlb.h     |  3 +
 arch/arm64/kernel/cpufeature.c   |  2 +
 arch/arm64/mm/mmu.c              |  2 +-
 arch/arm64/mm/pgd.c              |  2 +
 6 files changed, 95 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index cae8c648f462..aeba2cf15a25 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -14,6 +14,7 @@
 #include <asm/tlbflush.h>
 
 #define __HAVE_ARCH_PGD_FREE
+#define __HAVE_ARCH_PUD_FREE
 #include <asm-generic/pgalloc.h>
 
 #define PGD_SIZE	(PTRS_PER_PGD * sizeof(pgd_t))
@@ -43,7 +44,8 @@ static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
 
 static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
 {
-	set_p4d(p4dp, __p4d(__phys_to_p4d_val(pudp) | prot));
+	if (pgtable_l4_enabled())
+		set_p4d(p4dp, __p4d(__phys_to_p4d_val(pudp) | prot));
 }
 
 static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
@@ -53,6 +55,14 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4dp, pud_t *pudp)
 	p4dval |= (mm == &init_mm) ? P4D_TABLE_UXN : P4D_TABLE_PXN;
 	__p4d_populate(p4dp, __pa(pudp), p4dval);
 }
+
+static inline void pud_free(struct mm_struct *mm, pud_t *pud)
+{
+	if (!pgtable_l4_enabled())
+		return;
+	BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
+	free_page((unsigned long)pud);
+}
 #else
 static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
 {
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 3d7fb3cde83d..b3c716fa8121 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -759,12 +759,27 @@ static inline pmd_t *pud_pgtable(pud_t pud)
 
 #if CONFIG_PGTABLE_LEVELS > 3
 
+static __always_inline bool pgtable_l4_enabled(void)
+{
+	if (CONFIG_PGTABLE_LEVELS > 4 || !IS_ENABLED(CONFIG_ARM64_LPA2))
+		return true;
+	if (!alternative_has_cap_likely(ARM64_ALWAYS_BOOT))
+		return vabits_actual == VA_BITS;
+	return alternative_has_cap_unlikely(ARM64_HAS_VA52);
+}
+
+static inline bool mm_pud_folded(const struct mm_struct *mm)
+{
+	return !pgtable_l4_enabled();
+}
+#define mm_pud_folded  mm_pud_folded
+
 #define pud_ERROR(e)	\
 	pr_err("%s:%d: bad pud %016llx.\n", __FILE__, __LINE__, pud_val(e))
 
-#define p4d_none(p4d)		(!p4d_val(p4d))
-#define p4d_bad(p4d)		(!(p4d_val(p4d) & 2))
-#define p4d_present(p4d)	(p4d_val(p4d))
+#define p4d_none(p4d)		(pgtable_l4_enabled() && !p4d_val(p4d))
+#define p4d_bad(p4d)		(pgtable_l4_enabled() && !(p4d_val(p4d) & 2))
+#define p4d_present(p4d)	(!p4d_none(p4d))
 
 static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 {
@@ -780,7 +795,8 @@ static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 
 static inline void p4d_clear(p4d_t *p4dp)
 {
-	set_p4d(p4dp, __p4d(0));
+	if (pgtable_l4_enabled())
+		set_p4d(p4dp, __p4d(0));
 }
 
 static inline phys_addr_t p4d_page_paddr(p4d_t p4d)
@@ -788,25 +804,74 @@ static inline phys_addr_t p4d_page_paddr(p4d_t p4d)
 	return __p4d_to_phys(p4d);
 }
 
+#define pud_index(addr)		(((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
+
+static inline pud_t *p4d_to_folded_pud(p4d_t *p4dp, unsigned long addr)
+{
+	return (pud_t *)PTR_ALIGN_DOWN(p4dp, PAGE_SIZE) + pud_index(addr);
+}
+
 static inline pud_t *p4d_pgtable(p4d_t p4d)
 {
 	return (pud_t *)__va(p4d_page_paddr(p4d));
 }
 
-/* Find an entry in the first-level page table. */
-#define pud_offset_phys(dir, addr)	(p4d_page_paddr(READ_ONCE(*(dir))) + pud_index(addr) * sizeof(pud_t))
+static inline phys_addr_t pud_offset_phys(p4d_t *p4dp, unsigned long addr)
+{
+	BUG_ON(!pgtable_l4_enabled());
 
-#define pud_set_fixmap(addr)		((pud_t *)set_fixmap_offset(FIX_PUD, addr))
-#define pud_set_fixmap_offset(p4d, addr)	pud_set_fixmap(pud_offset_phys(p4d, addr))
-#define pud_clear_fixmap()		clear_fixmap(FIX_PUD)
+	return p4d_page_paddr(READ_ONCE(*p4dp)) + pud_index(addr) * sizeof(pud_t);
+}
 
-#define p4d_page(p4d)		pfn_to_page(__phys_to_pfn(__p4d_to_phys(p4d)))
+static inline
+pud_t *pud_offset_lockless(p4d_t *p4dp, p4d_t p4d, unsigned long addr)
+{
+	if (!pgtable_l4_enabled())
+		return p4d_to_folded_pud(p4dp, addr);
+	return (pud_t *)__va(p4d_page_paddr(p4d)) + pud_index(addr);
+}
+#define pud_offset_lockless pud_offset_lockless
+
+static inline pud_t *pud_offset(p4d_t *p4dp, unsigned long addr)
+{
+	return pud_offset_lockless(p4dp, READ_ONCE(*p4dp), addr);
+}
+#define pud_offset	pud_offset
+
+static inline pud_t *pud_set_fixmap(unsigned long addr)
+{
+	if (!pgtable_l4_enabled())
+		return NULL;
+	return (pud_t *)set_fixmap_offset(FIX_PUD, addr);
+}
+
+static inline pud_t *pud_set_fixmap_offset(p4d_t *p4dp, unsigned long addr)
+{
+	if (!pgtable_l4_enabled())
+		return p4d_to_folded_pud(p4dp, addr);
+	return pud_set_fixmap(pud_offset_phys(p4dp, addr));
+}
+
+static inline void pud_clear_fixmap(void)
+{
+	if (pgtable_l4_enabled())
+		clear_fixmap(FIX_PUD);
+}
 
 /* use ONLY for statically allocated translation tables */
-#define pud_offset_kimg(dir,addr)	((pud_t *)__phys_to_kimg(pud_offset_phys((dir), (addr))))
+static inline pud_t *pud_offset_kimg(p4d_t *p4dp, u64 addr)
+{
+	if (!pgtable_l4_enabled())
+		return p4d_to_folded_pud(p4dp, addr);
+	return (pud_t *)__phys_to_kimg(pud_offset_phys(p4dp, addr));
+}
+
+#define p4d_page(p4d)		pfn_to_page(__phys_to_pfn(__p4d_to_phys(p4d)))
 
 #else
 
+static inline bool pgtable_l4_enabled(void) { return false; }
+
 #define p4d_page_paddr(p4d)	({ BUILD_BUG(); 0;})
 
 /* Match pud_offset folding in <asm/generic/pgtable-nopud.h> */
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 0150deb332af..a947c6e784ed 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -103,6 +103,9 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
 {
 	struct ptdesc *ptdesc = virt_to_ptdesc(pudp);
 
+	if (!pgtable_l4_enabled())
+		return;
+
 	pagetable_pud_dtor(ptdesc);
 	tlb_remove_ptdesc(tlb, ptdesc);
 }
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index bc5e4e569864..94f035f6c421 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1767,6 +1767,8 @@ static int __init __kpti_install_ng_mappings(void *__unused)
 
 	if (levels == 5 && !pgtable_l5_enabled())
 		levels = 4;
+	else if (levels == 4 && !pgtable_l4_enabled())
+		levels = 3;
 
 	remap_fn = (void *)__pa_symbol(idmap_kpti_install_ng_mappings);
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8e5b3a7c5afd..b131ed31a6c8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1065,7 +1065,7 @@ static void free_empty_pud_table(p4d_t *p4dp, unsigned long addr,
 		free_empty_pmd_table(pudp, addr, next, floor, ceiling);
 	} while (addr = next, addr < end);
 
-	if (CONFIG_PGTABLE_LEVELS <= 3)
+	if (!pgtable_l4_enabled())
 		return;
 
 	if (!pgtable_range_aligned(start, end, floor, ceiling, P4D_MASK))
diff --git a/arch/arm64/mm/pgd.c b/arch/arm64/mm/pgd.c
index 3c4f8a279d2b..0c501cabc238 100644
--- a/arch/arm64/mm/pgd.c
+++ b/arch/arm64/mm/pgd.c
@@ -21,6 +21,8 @@ static bool pgdir_is_page_size(void)
 {
 	if (PGD_SIZE == PAGE_SIZE)
 		return true;
+	if (CONFIG_PGTABLE_LEVELS == 4)
+		return !pgtable_l4_enabled();
 	if (CONFIG_PGTABLE_LEVELS == 5)
 		return !pgtable_l5_enabled();
 	return false;
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 37/43] arm64: ptdump: Disregard unaddressable VA space
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (35 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 38/43] arm64: ptdump: Deal with translation levels folded at runtime Ard Biesheuvel
                   ` (6 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Configurations built with support for 52-bit virtual addressing can also
run on CPUs that only support 48 bits of VA space, in which case only
that part of swapper_pg_dir that represents the 48-bit addressable
region is relevant, and everything else is ignored by the hardware.

Our software pagetable walker has little in the way of input address
validation, and so it will happily start a walk from an address that is
not representable by the number of paging levels that are actually
active, resulting in lots of bogus output from the page table dumper
unless we take care to start at a valid address.

So define the start address at runtime based on vabits_actual.
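
For reference, a standalone sketch of how the start of the kernel VA
range depends on the number of VA bits. It mirrors the -(1 << va_bits)
form of _PAGE_OFFSET(), with the function name and fixed-width types
chosen here for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest kernel virtual address for a given VA size. */
static inline uint64_t page_offset(int va_bits)
{
	assert(va_bits > 0 && va_bits < 64);
	return -(UINT64_C(1) << va_bits);
}
```

A 52-bit build starts at 0xfff0000000000000, but on a CPU limited to 48
bits the first addressable kernel address is 0xffff000000000000, so a
walk that starts at the build time value covers a region the hardware
never translates.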

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/ptdump.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 5f0849528ccf..16d0cf1d85c4 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -313,7 +313,6 @@ static void __init ptdump_initialize(void)
 
 static struct ptdump_info kernel_ptdump_info __ro_after_init = {
 	.mm		= &init_mm,
-	.base_addr	= PAGE_OFFSET,
 };
 
 void ptdump_check_wx(void)
@@ -329,7 +328,7 @@ void ptdump_check_wx(void)
 		.ptdump = {
 			.note_page = note_page,
 			.range = (struct ptdump_range[]) {
-				{PAGE_OFFSET, ~0UL},
+				{_PAGE_OFFSET(vabits_actual), ~0UL},
 				{0, 0}
 			}
 		}
@@ -370,6 +369,7 @@ static int __init ptdump_init(void)
 	static struct addr_marker address_markers[ARRAY_SIZE(m)] __ro_after_init;
 
 	kernel_ptdump_info.markers = memcpy(address_markers, m, sizeof(m));
+	kernel_ptdump_info.base_addr = page_offset;
 
 	ptdump_initialize();
 	ptdump_debugfs_register(&kernel_ptdump_info, "kernel_page_tables");
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 38/43] arm64: ptdump: Deal with translation levels folded at runtime
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (36 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 37/43] arm64: ptdump: Disregard unaddressable VA space Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 39/43] arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels Ard Biesheuvel
                   ` (5 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Currently, the ptdump code deals with folded PMD or PUD levels at build
time, by omitting those levels when invoking note_page. IOW, note_page()
is never invoked with level == 1 if P4Ds are folded in the build
configuration.

With the introduction of LPA2 support, we will defer some of these
folding decisions to runtime, so let's take care of this by overriding
the 'level' argument when this condition triggers.

Substituting the PUD or PMD strings for "PGD" when the level in question
is folded at build time is no longer necessary, and so the conditional
expressions can be simplified. This also makes the indirection of the
'name' field unnecessary, so change that into a char[] array, and make
the whole thing __ro_after_init.

Note that the mm_p?d_folded() functions currently ignore their mm
pointer arguments, but let's wire them up correctly anyway.
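
The level override amounts to the following mapping, shown here as a
standalone sketch. The two booleans stand in for mm_p4d_folded() and
mm_pud_folded(), and the level numbering (0 = PGD, 1 = P4D, 2 = PUD,
3 = PMD, 4 = PTE) is assumed from the order of the pg_level[] array.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Report a level that is folded at runtime as level 0 (PGD); all other
 * levels, including -1 for "no level", are passed through unchanged.
 */
static inline int effective_level(int level, bool p4d_folded,
				  bool pud_folded)
{
	assert(level >= -1 && level <= 4);
	if ((level == 1 && p4d_folded) || (level == 2 && pud_folded))
		return 0;
	return level;
}
```

So on a system where PUDs are folded at runtime, an entry reported at
level 2 is printed as a PGD entry, while PMD and PTE entries keep their
own names.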

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/ptdump.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
index 16d0cf1d85c4..5b87f8d623f7 100644
--- a/arch/arm64/mm/ptdump.c
+++ b/arch/arm64/mm/ptdump.c
@@ -48,6 +48,7 @@ struct pg_state {
 	struct ptdump_state ptdump;
 	struct seq_file *seq;
 	const struct addr_marker *marker;
+	const struct mm_struct *mm;
 	unsigned long start_address;
 	int level;
 	u64 current_prot;
@@ -144,12 +145,12 @@ static const struct prot_bits pte_bits[] = {
 
 struct pg_level {
 	const struct prot_bits *bits;
-	const char *name;
-	size_t num;
+	char name[4];
+	int num;
 	u64 mask;
 };
 
-static struct pg_level pg_level[] = {
+static struct pg_level pg_level[] __ro_after_init = {
 	{ /* pgd */
 		.name	= "PGD",
 		.bits	= pte_bits,
@@ -159,11 +160,11 @@ static struct pg_level pg_level[] = {
 		.bits	= pte_bits,
 		.num	= ARRAY_SIZE(pte_bits),
 	}, { /* pud */
-		.name	= (CONFIG_PGTABLE_LEVELS > 3) ? "PUD" : "PGD",
+		.name	= "PUD",
 		.bits	= pte_bits,
 		.num	= ARRAY_SIZE(pte_bits),
 	}, { /* pmd */
-		.name	= (CONFIG_PGTABLE_LEVELS > 2) ? "PMD" : "PGD",
+		.name	= "PMD",
 		.bits	= pte_bits,
 		.num	= ARRAY_SIZE(pte_bits),
 	}, { /* pte */
@@ -227,6 +228,11 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
 	static const char units[] = "KMGTPE";
 	u64 prot = 0;
 
+	/* check if the current level has been folded dynamically */
+	if ((level == 1 && mm_p4d_folded(st->mm)) ||
+	    (level == 2 && mm_pud_folded(st->mm)))
+		level = 0;
+
 	if (level >= 0)
 		prot = val & pg_level[level].mask;
 
@@ -288,6 +294,7 @@ void ptdump_walk(struct seq_file *s, struct ptdump_info *info)
 	st = (struct pg_state){
 		.seq = s,
 		.marker = info->markers,
+		.mm = info->mm,
 		.level = -1,
 		.ptdump = {
 			.note_page = note_page,
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 39/43] arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (37 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 38/43] arm64: ptdump: Deal with translation levels folded at runtime Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 40/43] arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs Ard Biesheuvel
                   ` (4 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook,
	Oliver Upton

From: Ard Biesheuvel <ardb@kernel.org>

get_user_mapping_size() uses vabits_actual and CONFIG_PGTABLE_LEVELS to
provide the starting point for a table walk. This is fine for LVA, as
the number of translation levels is the same regardless of whether LVA
is enabled. However, with LPA2, this will no longer be the case, so
let's derive the number of levels from the number of VA bits directly.
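
A small sketch of the start level calculation, for illustration only.
It assumes KVM_PGTABLE_LAST_LEVEL is 3 and reuses the arithmetic of
ARM64_HW_PGTABLE_LEVELS(), with the page shift passed in as a
parameter so that it builds outside the kernel.

```c
#include <assert.h>

#define LAST_LEVEL	3	/* assumed value of KVM_PGTABLE_LAST_LEVEL */

/* Number of translation levels for a given granule and VA size. */
static inline int hw_pgtable_levels(int page_shift, int va_bits)
{
	assert(page_shift > 3);
	return (va_bits - 4) / (page_shift - 3);
}

/* Level at which a walk of the user space page tables starts. */
static inline int walk_start_level(int page_shift, int ia_bits)
{
	return LAST_LEVEL - hw_pgtable_levels(page_shift, ia_bits) + 1;
}
```

With 4k pages, 52 bits of VA yields start level -1, while the same
kernel running with vabits_actual == 48 on non-LPA2 hardware has to
start at level 0. A build time constant cannot express both.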

Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/kvm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6fa9e816df40..cd9456a03e38 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -805,7 +805,7 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 		.pgd		= (kvm_pteref_t)kvm->mm->pgd,
 		.ia_bits	= vabits_actual,
 		.start_level	= (KVM_PGTABLE_LAST_LEVEL -
-				   CONFIG_PGTABLE_LEVELS + 1),
+				   ARM64_HW_PGTABLE_LEVELS(pgt.ia_bits) + 1),
 		.mm_ops		= &kvm_user_mm_ops,
 	};
 	unsigned long flags;
-- 
2.43.0.687.g38aa6559b0-goog



* [PATCH v8 40/43] arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (38 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 39/43] arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 41/43] arm64: defconfig: Enable LPA2 support Ard Biesheuvel
                   ` (3 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Update Kconfig to permit 4k and 16k granule configurations to be built
with 52-bit virtual addressing, now that all the prerequisites are in
place.

While at it, update the feature description so it matches on the
appropriate feature bits depending on the page size. For simplicity,
let's just keep ARM64_HAS_VA52 as the feature name.

Note that LPA2 based 52-bit virtual addressing requires 52-bit physical
addressing support to be enabled as well, as programming TCR.TxSZ to
values below 16 is not allowed unless TCR.DS is set, which is what
activates the 52-bit physical addressing support.
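
To make the TxSZ constraint concrete, here is a minimal sketch using
the relation TxSZ = 64 - VA bits. The helper names are hypothetical,
and the check only restates the rule given above for the 4k and 16k
granules.

```c
#include <assert.h>
#include <stdbool.h>

/* TCR.TxSZ value that selects a VA range of 'va_bits' bits. */
static inline int txsz_for_va_bits(int va_bits)
{
	assert(va_bits > 0 && va_bits <= 64);
	return 64 - va_bits;
}

/* TxSZ values below 16 are only permitted when TCR.DS is set. */
static inline bool needs_tcr_ds(int va_bits)
{
	return txsz_for_va_bits(va_bits) < 16;
}
```

52 bits of VA corresponds to TxSZ == 12, which is below 16 and thus
needs TCR.DS, whereas 48 bits corresponds to TxSZ == 16 and does not.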

While supporting the converse (52-bit physical addressing without 52-bit
virtual addressing) would be possible in principle, let's keep things
simple, by only allowing these features to be enabled at the same time.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/Kconfig             | 17 ++++++++-------
 arch/arm64/kernel/cpufeature.c | 22 ++++++++++++++++----
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9ca3316d6379..eed8fef08a10 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -368,7 +368,9 @@ config PGTABLE_LEVELS
 	default 3 if ARM64_64K_PAGES && (ARM64_VA_BITS_48 || ARM64_VA_BITS_52)
 	default 3 if ARM64_4K_PAGES && ARM64_VA_BITS_39
 	default 3 if ARM64_16K_PAGES && ARM64_VA_BITS_47
+	default 4 if ARM64_16K_PAGES && (ARM64_VA_BITS_48 || ARM64_VA_BITS_52)
 	default 4 if !ARM64_64K_PAGES && ARM64_VA_BITS_48
+	default 5 if ARM64_4K_PAGES && ARM64_VA_BITS_52
 
 config ARCH_SUPPORTS_UPROBES
 	def_bool y
@@ -396,13 +398,13 @@ config BUILTIN_RETURN_ADDRESS_STRIPS_PAC
 config KASAN_SHADOW_OFFSET
 	hex
 	depends on KASAN_GENERIC || KASAN_SW_TAGS
-	default 0xdfff800000000000 if (ARM64_VA_BITS_48 || ARM64_VA_BITS_52) && !KASAN_SW_TAGS
-	default 0xdfffc00000000000 if ARM64_VA_BITS_47 && !KASAN_SW_TAGS
+	default 0xdfff800000000000 if (ARM64_VA_BITS_48 || (ARM64_VA_BITS_52 && !ARM64_16K_PAGES)) && !KASAN_SW_TAGS
+	default 0xdfffc00000000000 if (ARM64_VA_BITS_47 || ARM64_VA_BITS_52) && ARM64_16K_PAGES && !KASAN_SW_TAGS
 	default 0xdffffe0000000000 if ARM64_VA_BITS_42 && !KASAN_SW_TAGS
 	default 0xdfffffc000000000 if ARM64_VA_BITS_39 && !KASAN_SW_TAGS
 	default 0xdffffff800000000 if ARM64_VA_BITS_36 && !KASAN_SW_TAGS
-	default 0xefff800000000000 if (ARM64_VA_BITS_48 || ARM64_VA_BITS_52) && KASAN_SW_TAGS
-	default 0xefffc00000000000 if ARM64_VA_BITS_47 && KASAN_SW_TAGS
+	default 0xefff800000000000 if (ARM64_VA_BITS_48 || (ARM64_VA_BITS_52 && !ARM64_16K_PAGES)) && KASAN_SW_TAGS
+	default 0xefffc00000000000 if (ARM64_VA_BITS_47 || ARM64_VA_BITS_52) && ARM64_16K_PAGES && KASAN_SW_TAGS
 	default 0xeffffe0000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS
 	default 0xefffffc000000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS
 	default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
@@ -1310,7 +1312,7 @@ config ARM64_VA_BITS_48
 
 config ARM64_VA_BITS_52
 	bool "52-bit"
-	depends on ARM64_64K_PAGES && (ARM64_PAN || !ARM64_SW_TTBR0_PAN)
+	depends on ARM64_PAN || !ARM64_SW_TTBR0_PAN
 	help
 	  Enable 52-bit virtual addressing for userspace when explicitly
 	  requested via a hint to mmap(). The kernel will also use 52-bit
@@ -1357,10 +1359,11 @@ choice
 
 config ARM64_PA_BITS_48
 	bool "48-bit"
+	depends on ARM64_64K_PAGES || !ARM64_VA_BITS_52
 
 config ARM64_PA_BITS_52
-	bool "52-bit (ARMv8.2)"
-	depends on ARM64_64K_PAGES
+	bool "52-bit"
+	depends on ARM64_64K_PAGES || ARM64_VA_BITS_52
 	depends on ARM64_PAN || !ARM64_SW_TTBR0_PAN
 	help
 	  Enable support for a 52-bit physical address space, introduced as
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 94f035f6c421..0be9296e9253 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2703,15 +2703,29 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 	},
 #ifdef CONFIG_ARM64_VA_BITS_52
 	{
-		.desc = "52-bit Virtual Addressing (LVA)",
 		.capability = ARM64_HAS_VA52,
 		.type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
-		.sys_reg = SYS_ID_AA64MMFR2_EL1,
-		.sign = FTR_UNSIGNED,
+		.matches = has_cpuid_feature,
 		.field_width = 4,
+#ifdef CONFIG_ARM64_64K_PAGES
+		.desc = "52-bit Virtual Addressing (LVA)",
+		.sign = FTR_SIGNED,
+		.sys_reg = SYS_ID_AA64MMFR2_EL1,
 		.field_pos = ID_AA64MMFR2_EL1_VARange_SHIFT,
-		.matches = has_cpuid_feature,
 		.min_field_value = ID_AA64MMFR2_EL1_VARange_52,
+#else
+		.desc = "52-bit Virtual Addressing (LPA2)",
+		.sys_reg = SYS_ID_AA64MMFR0_EL1,
+#ifdef CONFIG_ARM64_4K_PAGES
+		.sign = FTR_SIGNED,
+		.field_pos = ID_AA64MMFR0_EL1_TGRAN4_SHIFT,
+		.min_field_value = ID_AA64MMFR0_EL1_TGRAN4_52_BIT,
+#else
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64MMFR0_EL1_TGRAN16_SHIFT,
+		.min_field_value = ID_AA64MMFR0_EL1_TGRAN16_52_BIT,
+#endif
+#endif
 	},
 #endif
 	{},
-- 
2.43.0.687.g38aa6559b0-goog
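
Note for readers: the cpufeature hunk above uses FTR_SIGNED for the 4k
granule field and FTR_UNSIGNED for the 16k one. A minimal sketch of why
the signedness matters follows. It is a simplified model, not the
kernel's has_cpuid_feature(); the helper names and the local macros are
hypothetical, and the field positions and encodings are the
architectural ones as far as I know (TGran4 in bits [31:28] with 0xf
meaning "not supported", TGran16 in bits [23:20] with 0 meaning "not
supported").

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout of ID_AA64MMFR0_EL1 (local names, illustration only) */
#define TGRAN4_SHIFT	28
#define TGRAN16_SHIFT	20
#define TGRAN4_52_BIT	1	/* 4k granule with 52-bit support */
#define TGRAN16_52_BIT	2	/* 16k granule with 52-bit support */

/* Extract a 4-bit ID register field, optionally sign-extending it. */
int id_field_value(uint64_t reg, unsigned int pos, bool is_signed)
{
	int val = (int)((reg >> pos) & 0xf);

	if (is_signed && val >= 8)
		val -= 16;	/* e.g. 0xf becomes -1, "not implemented" */
	return val;
}

/* A feature matches when the field is at least the minimum value. */
bool has_lpa2_4k(uint64_t mmfr0)
{
	return id_field_value(mmfr0, TGRAN4_SHIFT, true) >= TGRAN4_52_BIT;
}

bool has_lpa2_16k(uint64_t mmfr0)
{
	return id_field_value(mmfr0, TGRAN16_SHIFT, false) >= TGRAN16_52_BIT;
}
```

Treating TGran4 as unsigned would read 0xf as 15 and wrongly report
52-bit support on a CPU that does not implement the 4k granule at all.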


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v8 41/43] arm64: defconfig: Enable LPA2 support
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (39 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 40/43] arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-14 12:29 ` [PATCH v8 42/43] mm: add arch hook to validate mmap() prot flags Ard Biesheuvel
                   ` (2 subsequent siblings)
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

We typically enable support in defconfig for all architectural features
for which we can detect at runtime if the hardware actually supports
them.

Now that we have implemented support for LPA2 based 52-bit virtual
addressing in a way that should not impact 48-bit operation on non-LPA2
CPUs, we can do the same, and enable 52-bit virtual addressing by
default.

Catalin adds:

  Currently the "Virtual address space size" arch/arm64/Kconfig menu
  entry sets different defaults for each page size. However, all are
  overridden by the defconfig to 48 bits. Set the new default in
  Kconfig and remove the defconfig line.

[ardb: squash follow-up fix from Catalin]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/Kconfig           | 4 +---
 arch/arm64/configs/defconfig | 1 -
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index eed8fef08a10..160856de9bbb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1283,9 +1283,7 @@ endchoice
 
 choice
 	prompt "Virtual address space size"
-	default ARM64_VA_BITS_39 if ARM64_4K_PAGES
-	default ARM64_VA_BITS_47 if ARM64_16K_PAGES
-	default ARM64_VA_BITS_42 if ARM64_64K_PAGES
+	default ARM64_VA_BITS_52
 	help
 	  Allows choosing one of multiple possible virtual address
 	  space sizes. The level of translation table is determined by
diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index e6cf3e5d63c3..f086b0624ec8 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -76,7 +76,6 @@ CONFIG_ARCH_VEXPRESS=y
 CONFIG_ARCH_VISCONTI=y
 CONFIG_ARCH_XGENE=y
 CONFIG_ARCH_ZYNQMP=y
-CONFIG_ARM64_VA_BITS_48=y
 CONFIG_SCHED_MC=y
 CONFIG_SCHED_SMT=y
 CONFIG_NUMA=y
-- 
2.43.0.687.g38aa6559b0-goog



^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v8 42/43] mm: add arch hook to validate mmap() prot flags
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (40 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 41/43] arm64: defconfig: Enable LPA2 support Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-03-12 19:53   ` Catalin Marinas
  2024-02-14 12:29 ` [PATCH v8 43/43] arm64: mm: add support for WXN memory translation attribute Ard Biesheuvel
  2024-02-16 17:35 ` [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Catalin Marinas
  43 siblings, 1 reply; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

Add a hook to permit architectures to perform validation on the prot
flags passed to mmap(), like arch_validate_prot() does for mprotect().
This will be used by arm64 to reject PROT_WRITE+PROT_EXEC mappings on
configurations that run with WXN enabled.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
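Note for readers: the override idiom used by this hook (a generic
default wrapped in #ifndef, which an architecture header can pre-empt by
defining a macro of the same name) can be modelled in user space as
below. This is a sketch, not the kernel code: the "arch" version rejects
PROT_WRITE+PROT_EXEC unconditionally, which is a simplification of what
the commit message says arm64 will do, and check_mmap_prot() is a
hypothetical stand-in for the call site in do_mmap().

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>

/* "Arch header": hypothetical override that refuses writable+executable. */
static inline bool arch_validate_mmap_prot(unsigned long prot,
					   unsigned long addr)
{
	const unsigned long wx = PROT_WRITE | PROT_EXEC;

	(void)addr;
	return (prot & wx) != wx;
}
#define arch_validate_mmap_prot arch_validate_mmap_prot

/* "Generic header": only compiled when no override was defined above. */
#ifndef arch_validate_mmap_prot
static inline bool arch_validate_mmap_prot(unsigned long prot,
					   unsigned long addr)
{
	return true;
}
#define arch_validate_mmap_prot arch_validate_mmap_prot
#endif

/* Hypothetical caller, modelled on the check added to do_mmap(). */
int check_mmap_prot(unsigned long prot, unsigned long addr)
{
	if (!arch_validate_mmap_prot(prot, addr))
		return -EACCES;
	return 0;
}
```

Removing the first definition and its #define makes the generic version
take effect, in which case every prot value is accepted.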
 include/linux/mman.h | 15 +++++++++++++++
 mm/mmap.c            |  3 +++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index dc7048824be8..ec5e7f606e43 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -124,6 +124,21 @@ static inline bool arch_validate_flags(unsigned long flags)
 #define arch_validate_flags arch_validate_flags
 #endif
 
+#ifndef arch_validate_mmap_prot
+/*
+ * This is called from mmap(), which ignores unknown prot bits so the default
+ * is to accept anything.
+ *
+ * Returns true if the prot flags are valid
+ */
+static inline bool arch_validate_mmap_prot(unsigned long prot,
+					   unsigned long addr)
+{
+	return true;
+}
+#define arch_validate_mmap_prot arch_validate_mmap_prot
+#endif
+
 /*
  * Optimisation macro.  It is equivalent to:
  *      (x & bit1) ? bit2 : 0
diff --git a/mm/mmap.c b/mm/mmap.c
index d89770eaab6b..977a8c3fd9f5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1229,6 +1229,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		if (!(file && path_noexec(&file->f_path)))
 			prot |= PROT_EXEC;
 
+	if (!arch_validate_mmap_prot(prot, addr))
+		return -EACCES;
+
 	/* force arch specific MAP_FIXED handling in get_unmapped_area */
 	if (flags & MAP_FIXED_NOREPLACE)
 		flags |= MAP_FIXED;
-- 
2.43.0.687.g38aa6559b0-goog



^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v8 43/43] arm64: mm: add support for WXN memory translation attribute
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (41 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 42/43] mm: add arch hook to validate mmap() prot flags Ard Biesheuvel
@ 2024-02-14 12:29 ` Ard Biesheuvel
  2024-02-16 17:35 ` [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Catalin Marinas
  43 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-14 12:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

The AArch64 virtual memory system supports a global WXN control, which
can be enabled to make all writable mappings implicitly no-exec. This is
a useful hardening feature, as it prevents mistakes in managing page
table permissions from being exploited to attack the system.

When enabled at EL1, the restrictions apply to both EL1 and EL0. EL1 is
completely under our control, and has been cleaned up to allow WXN to be
enabled from boot onwards. EL0 is not under our control, but given that
widely deployed security features such as selinux or PaX already limit
the ability of user space to create mappings that are writable and
executable at the same time, the impact of enabling this for EL0 is
expected to be limited. (For this reason, common user space libraries
that have a legitimate need for manipulating executable code already
carry fallbacks such as [0].)

If enabled at compile time, the feature can still be disabled at boot if
needed, by passing arm64.nowxn on the kernel command line.

[0] https://github.com/libffi/libffi/blob/master/src/closures.c#L440

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
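Note for readers: user space that generates code can avoid the
restriction described above by never asking for a mapping that is
writable and executable at the same time. A minimal sketch follows,
assuming a Linux user space with anonymous mmap(); alloc_exec_copy() is
a hypothetical helper, and this is a simpler scheme than the dual
mapping fallback that libffi uses in [0].

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Copy 'len' bytes of machine code into fresh memory and make it
 * executable without ever holding a writable+executable mapping.
 * Returns NULL on failure.
 */
void *alloc_exec_copy(const void *code, size_t len)
{
	/* Step 1: writable but not executable */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	memcpy(p, code, len);

	/* Step 2: drop write permission before adding exec permission */
	if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
		munmap(p, len);
		return NULL;
	}

	/* Make the new instructions visible to instruction fetch */
	__builtin___clear_cache((char *)p, (char *)p + len);
	return p;
}
```

Neither call passes PROT_WRITE together with PROT_EXEC, so neither the
mmap() nor the mprotect() check introduced by this series would reject
it.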
 arch/arm64/Kconfig                    | 11 ++++++
 arch/arm64/include/asm/cpufeature.h   |  8 +++++
 arch/arm64/include/asm/mman.h         | 36 ++++++++++++++++++++
 arch/arm64/include/asm/mmu_context.h  | 30 +++++++++++++++-
 arch/arm64/kernel/pi/idreg-override.c |  4 ++-
 arch/arm64/kernel/pi/map_kernel.c     | 23 +++++++++++++
 arch/arm64/mm/proc.S                  |  6 ++++
 7 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 160856de9bbb..7761ffc6dbcf 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1608,6 +1608,17 @@ config RODATA_FULL_DEFAULT_ENABLED
 	  This requires the linear region to be mapped down to pages,
 	  which may adversely affect performance in some cases.
 
+config ARM64_WXN
+	bool "Enable WXN attribute so all writable mappings are non-exec"
+	help
+	  Set the WXN bit in the SCTLR system register so that all writable
+	  mappings are treated as if the PXN/UXN bit is set as well.
+	  If this is set to Y, it can still be disabled at runtime by
+	  passing 'arm64.nowxn' on the kernel command line.
+
+	  This should only be set if no software needs to be supported that
+	  relies on being able to execute from writable mappings.
+
 config ARM64_SW_TTBR0_PAN
 	bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
 	help
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index a8f97690ce1f..ee33b7e52da7 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -18,6 +18,7 @@
 #define ARM64_SW_FEATURE_OVERRIDE_NOKASLR	0
 #define ARM64_SW_FEATURE_OVERRIDE_HVHE		4
 #define ARM64_SW_FEATURE_OVERRIDE_RODATA_OFF	8
+#define ARM64_SW_FEATURE_OVERRIDE_NOWXN		12
 
 #ifndef __ASSEMBLY__
 
@@ -962,6 +963,13 @@ static inline bool kaslr_disabled_cmdline(void)
 	return arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_NOKASLR);
 }
 
+static inline bool arm64_wxn_enabled(void)
+{
+	if (!IS_ENABLED(CONFIG_ARM64_WXN))
+		return false;
+	return !arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_NOWXN);
+}
+
 u32 get_kvm_ipa_limit(void);
 void dump_cpu_features(void);
 
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 5966ee4a6154..6d4940342ba7 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -35,11 +35,40 @@ static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
 }
 #define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
 
+static inline bool arm64_check_wx_prot(unsigned long prot,
+				       struct task_struct *tsk)
+{
+	/*
+	 * When we are running with SCTLR_ELx.WXN==1, writable mappings are
+	 * implicitly non-executable. This means we should reject such mappings
+	 * when user space attempts to create them using mmap() or mprotect().
+	 */
+	if (arm64_wxn_enabled() &&
+	    ((prot & (PROT_WRITE | PROT_EXEC)) == (PROT_WRITE | PROT_EXEC))) {
+		/*
+		 * User space libraries such as libffi carry elaborate
+		 * heuristics to decide whether it is worth it to even attempt
+		 * to create writable executable mappings, as PaX or selinux
+		 * enabled systems will outright reject it. They will usually
+		 * fall back to something else (e.g., two separate shared
+		 * mmap()s of a temporary file) on failure.
+		 */
+		pr_info_ratelimited(
+			"process %s (%d) attempted to create PROT_WRITE+PROT_EXEC mapping\n",
+			tsk->comm, tsk->pid);
+		return false;
+	}
+	return true;
+}
+
 static inline bool arch_validate_prot(unsigned long prot,
 	unsigned long addr __always_unused)
 {
 	unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM;
 
+	if (!arm64_check_wx_prot(prot, current))
+		return false;
+
 	if (system_supports_bti())
 		supported |= PROT_BTI;
 
@@ -50,6 +79,13 @@ static inline bool arch_validate_prot(unsigned long prot,
 }
 #define arch_validate_prot(prot, addr) arch_validate_prot(prot, addr)
 
+static inline bool arch_validate_mmap_prot(unsigned long prot,
+					   unsigned long addr)
+{
+	return arm64_check_wx_prot(prot, current);
+}
+#define arch_validate_mmap_prot arch_validate_mmap_prot
+
 static inline bool arch_validate_flags(unsigned long vm_flags)
 {
 	if (!system_supports_mte())
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index c768d16b81a4..f0fe2d09d139 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -20,13 +20,41 @@
 #include <asm/cpufeature.h>
 #include <asm/daifflags.h>
 #include <asm/proc-fns.h>
-#include <asm-generic/mm_hooks.h>
 #include <asm/cputype.h>
 #include <asm/sysreg.h>
 #include <asm/tlbflush.h>
 
 extern bool rodata_full;
 
+static inline int arch_dup_mmap(struct mm_struct *oldmm,
+				struct mm_struct *mm)
+{
+	return 0;
+}
+
+static inline void arch_exit_mmap(struct mm_struct *mm)
+{
+}
+
+static inline void arch_unmap(struct mm_struct *mm,
+			unsigned long start, unsigned long end)
+{
+}
+
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool execute, bool foreign)
+{
+	if (IS_ENABLED(CONFIG_ARM64_WXN) && execute &&
+	    (vma->vm_flags & (VM_WRITE | VM_EXEC)) == (VM_WRITE | VM_EXEC)) {
+		pr_warn_ratelimited(
+			"process %s (%d) attempted to execute from writable memory\n",
+			current->comm, current->pid);
+		/* disallow unless the nowxn override is set */
+		return !arm64_wxn_enabled();
+	}
+	return true;
+}
+
 static inline void contextidr_thread_switch(struct task_struct *next)
 {
 	if (!IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR))
diff --git a/arch/arm64/kernel/pi/idreg-override.c b/arch/arm64/kernel/pi/idreg-override.c
index aad399796e81..bccfee34f62f 100644
--- a/arch/arm64/kernel/pi/idreg-override.c
+++ b/arch/arm64/kernel/pi/idreg-override.c
@@ -189,6 +189,7 @@ static const struct ftr_set_desc sw_features __prel64_initconst = {
 		FIELD("nokaslr", ARM64_SW_FEATURE_OVERRIDE_NOKASLR, NULL),
 		FIELD("hvhe", ARM64_SW_FEATURE_OVERRIDE_HVHE, hvhe_filter),
 		FIELD("rodataoff", ARM64_SW_FEATURE_OVERRIDE_RODATA_OFF, NULL),
+		FIELD("nowxn", ARM64_SW_FEATURE_OVERRIDE_NOWXN, NULL),
 		{}
 	},
 };
@@ -221,8 +222,9 @@ static const struct {
 	{ "arm64.nomops",		"id_aa64isar2.mops=0" },
 	{ "arm64.nomte",		"id_aa64pfr1.mte=0" },
 	{ "nokaslr",			"arm64_sw.nokaslr=1" },
-	{ "rodata=off",			"arm64_sw.rodataoff=1" },
+	{ "rodata=off",			"arm64_sw.rodataoff=1 arm64_sw.nowxn=1" },
 	{ "arm64.nolva",		"id_aa64mmfr2.varange=0" },
+	{ "arm64.nowxn",		"arm64_sw.nowxn=1" },
 };
 
 static int __init parse_hexdigit(const char *p, u64 *v)
diff --git a/arch/arm64/kernel/pi/map_kernel.c b/arch/arm64/kernel/pi/map_kernel.c
index 5fa08e13e17e..cac1e1f63c44 100644
--- a/arch/arm64/kernel/pi/map_kernel.c
+++ b/arch/arm64/kernel/pi/map_kernel.c
@@ -132,6 +132,25 @@ static void __init map_kernel(u64 kaslr_offset, u64 va_offset, int root_level)
 	idmap_cpu_replace_ttbr1(swapper_pg_dir);
 }
 
+static void noinline __section(".idmap.text") disable_wxn(void)
+{
+	u64 sctlr = read_sysreg(sctlr_el1) & ~SCTLR_ELx_WXN;
+
+	/*
+	 * We cannot safely clear the WXN bit while the MMU and caches are on,
+	 * so turn the MMU off, flush the TLBs and turn it on again but with
+	 * the WXN bit cleared this time.
+	 */
+	asm("	msr	sctlr_el1, %0		;"
+	    "	isb				;"
+	    "	tlbi    vmalle1			;"
+	    "	dsb     nsh			;"
+	    "	isb				;"
+	    "	msr     sctlr_el1, %1		;"
+	    "	isb				;"
+	    ::	"r"(sctlr & ~SCTLR_ELx_M), "r"(sctlr));
+}
+
 static void noinline __section(".idmap.text") set_ttbr0_for_lpa2(u64 ttbr)
 {
 	u64 sctlr = read_sysreg(sctlr_el1);
@@ -229,6 +248,10 @@ asmlinkage void __init early_map_kernel(u64 boot_status, void *fdt)
 	if (va_bits > VA_BITS_MIN)
 		sysreg_clear_set(tcr_el1, TCR_T1SZ_MASK, TCR_T1SZ(va_bits));
 
+	if (IS_ENABLED(CONFIG_ARM64_WXN) &&
+	    arm64_test_sw_feature_override(ARM64_SW_FEATURE_OVERRIDE_NOWXN))
+		disable_wxn();
+
 	/*
 	 * The virtual KASLR displacement modulo 2MiB is decided by the
 	 * physical placement of the image, as otherwise, we might not be able
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 9d40f3ffd8d2..bfd2ad896108 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -546,6 +546,12 @@ alternative_else_nop_endif
 	 * Prepare SCTLR
 	 */
 	mov_q	x0, INIT_SCTLR_EL1_MMU_ON
+#ifdef CONFIG_ARM64_WXN
+	ldr_l	x1, arm64_sw_feature_override + FTR_OVR_VAL_OFFSET
+	tst	x1, #0xf << ARM64_SW_FEATURE_OVERRIDE_NOWXN
+	orr	x1, x0, #SCTLR_ELx_WXN
+	csel	x0, x0, x1, ne
+#endif
 	ret					// return to head.S
 
 	.unreq	mair
-- 
2.43.0.687.g38aa6559b0-goog



^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1
  2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
                   ` (42 preceding siblings ...)
  2024-02-14 12:29 ` [PATCH v8 43/43] arm64: mm: add support for WXN memory translation attribute Ard Biesheuvel
@ 2024-02-16 17:35 ` Catalin Marinas
  2024-02-16 18:23   ` Ard Biesheuvel
  43 siblings, 1 reply; 59+ messages in thread
From: Catalin Marinas @ 2024-02-16 17:35 UTC (permalink / raw)
  To: linux-arm-kernel, Ard Biesheuvel
  Cc: Will Deacon, Ard Biesheuvel, Marc Zyngier, Mark Rutland,
	Ryan Roberts, Anshuman Khandual, Kees Cook

On Wed, 14 Feb 2024 13:28:46 +0100, Ard Biesheuvel wrote:
> This v8 covers the remaining changes that implement support for LPA2 and
> WXN at stage 1, now that some of the prerequisites are in place.
> 
> v4: https://lore.kernel.org/r/20230912141549.278777-63-ardb@google.com/
> v5: https://lore.kernel.org/r/20231124101840.944737-41-ardb@google.com/
> v6: https://lore.kernel.org/r/20231129111555.3594833-43-ardb@google.com/
> v7: https://lore.kernel.org/r/20240123145258.1462979-52-ardb%2Bgit%40google.com/
> 
> [...]

I queued this series via the arm64 tree (for-next/stage1-lpa2). I tried
a couple of releases ago but for some reason my tests started failing and
it was very close to the merge window, so I dropped it. This time around,
if anything goes wrong, we have a bit of time to fix it (it may well
have been my test scripts and nothing to do with these patches).

The last patch introducing WXN has ABI implications but it's default
off. I think we should keep the patch as certain markets will likely
turn it on.

Surprisingly, there are no conflicts with Ryan's contpte series AFAICT
(I did a merge locally).

Thanks.

[01/43] arm64: kernel: Manage absolute relocations in code built under pi/
        https://git.kernel.org/arm64/c/48157aa39286
[02/43] arm64: kernel: Don't rely on objcopy to make code under pi/ __init
        https://git.kernel.org/arm64/c/a86aa72eb3b0
[03/43] arm64: head: move relocation handling to C code
        https://git.kernel.org/arm64/c/734958ef0b54
[04/43] arm64: idreg-override: Move to early mini C runtime
        https://git.kernel.org/arm64/c/e223a4491255
[05/43] arm64: kernel: Remove early fdt remap code
        https://git.kernel.org/arm64/c/9c4cd2a7d12c
[06/43] arm64: head: Clear BSS and the kernel page tables in one go
        https://git.kernel.org/arm64/c/aa99aad798a8
[07/43] arm64: Move feature overrides into the BSS section
        https://git.kernel.org/arm64/c/30687dec5ed5
[08/43] arm64: head: Run feature override detection before mapping the kernel
        https://git.kernel.org/arm64/c/dcfe969a6419
[09/43] arm64: head: move dynamic shadow call stack patching into early C runtime
        https://git.kernel.org/arm64/c/8a6e40e1f68e
[10/43] arm64: cpufeature: Add helper to test for CPU feature overrides
        https://git.kernel.org/arm64/c/35876f35f482
[11/43] arm64: kaslr: Use feature override instead of parsing the cmdline again
        https://git.kernel.org/arm64/c/af73b9a2dd39
[12/43] arm64: idreg-override: Create a pseudo feature for rodata=off
        https://git.kernel.org/arm64/c/9ddd9baa42a0
[13/43] arm64: Add helpers to probe local CPU for PAC and BTI support
        https://git.kernel.org/arm64/c/a669c6a49356
[14/43] arm64: head: allocate more pages for the kernel mapping
        https://git.kernel.org/arm64/c/8d47b8e5c74a
[15/43] arm64: head: move memstart_offset_seed handling to C code
        https://git.kernel.org/arm64/c/aa6a52b2470c
[16/43] arm64: mm: Make kaslr_requires_kpti() a static inline
        https://git.kernel.org/arm64/c/293d865f0af5
[17/43] arm64: mmu: Make __cpu_replace_ttbr1() out of line
        https://git.kernel.org/arm64/c/82ca151da7d5
[18/43] arm64: head: Move early kernel mapping routines into C code
        https://git.kernel.org/arm64/c/97a6f43bb049
[19/43] arm64: mm: Use 48-bit virtual addressing for the permanent ID map
        https://git.kernel.org/arm64/c/e6128a8e523c
[20/43] arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels
        https://git.kernel.org/arm64/c/34b98e55f684
[21/43] arm64: kernel: Create initial ID map from C code
        https://git.kernel.org/arm64/c/84b04d3e6bdb
[22/43] arm64: mm: avoid fixmap for early swapper_pg_dir updates
        https://git.kernel.org/arm64/c/567a70c181df
[23/43] arm64: mm: omit redundant remap of kernel image
        https://git.kernel.org/arm64/c/ba5b0333a847
[24/43] arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()"
        https://git.kernel.org/arm64/c/e0f92f0d1b51
[25/43] arm64: mm: Handle LVA support as a CPU feature
        https://git.kernel.org/arm64/c/9cce9c6c2c3b
[26/43] arm64: mm: Add feature override support for LVA
        https://git.kernel.org/arm64/c/68aec33f8f5a
[27/43] arm64: Avoid #define'ing PTE_MAYBE_NG to 0x0 for asm use
        https://git.kernel.org/arm64/c/60d043c10176
[28/43] arm64: Add ESR decoding for exceptions involving translation level -1
        https://git.kernel.org/arm64/c/7ac8d5b2423c
[29/43] arm64: mm: Wire up TCR.DS bit to PTE shareability fields
        https://git.kernel.org/arm64/c/db95ea787bd1
[30/43] arm64: mm: Add LPA2 support to phys<->pte conversion routines
        https://git.kernel.org/arm64/c/925a0eb48044
[31/43] arm64: mm: Add definitions to support 5 levels of paging
        https://git.kernel.org/arm64/c/a6bbf5d4d9d1
[32/43] arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion
        https://git.kernel.org/arm64/c/2b6c8f96cc47
[33/43] arm64: Enable LPA2 at boot if supported by the system
        https://git.kernel.org/arm64/c/9684ec186f8f
[34/43] arm64: mm: Add 5 level paging support to fixmap and swapper handling
        https://git.kernel.org/arm64/c/6ed8a3a094b4
[35/43] arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging
        https://git.kernel.org/arm64/c/0383808e4d99
[36/43] arm64: mm: Add support for folding PUDs at runtime
        https://git.kernel.org/arm64/c/0dd4f60a2c76
[37/43] arm64: ptdump: Disregard unaddressable VA space
        https://git.kernel.org/arm64/c/16f22981b6d7
[38/43] arm64: ptdump: Deal with translation levels folded at runtime
        https://git.kernel.org/arm64/c/d40900fcb397
[39/43] arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels
        https://git.kernel.org/arm64/c/95e059b5db60
[40/43] arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs
        https://git.kernel.org/arm64/c/352b0395b505
[41/43] arm64: defconfig: Enable LPA2 support
        https://git.kernel.org/arm64/c/5d101654226d
[42/43] mm: add arch hook to validate mmap() prot flags
        https://git.kernel.org/arm64/c/cb1a393c40ee
[43/43] arm64: mm: add support for WXN memory translation attribute
        https://git.kernel.org/arm64/c/50e3ed0f93f4

-- 
Catalin



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1
  2024-02-16 17:35 ` [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Catalin Marinas
@ 2024-02-16 18:23   ` Ard Biesheuvel
  2024-02-16 22:34     ` Ard Biesheuvel
  0 siblings, 1 reply; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-16 18:23 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Mark Rutland,
	Ryan Roberts, Anshuman Khandual, Kees Cook

On Fri, 16 Feb 2024 at 18:35, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Wed, 14 Feb 2024 13:28:46 +0100, Ard Biesheuvel wrote:
> > This v8 covers the remaining changes that implement support for LPA2 and
> > WXN at stage 1, now that some of the prerequisites are in place.
> >
> > v4: https://lore.kernel.org/r/20230912141549.278777-63-ardb@google.com/
> > v5: https://lore.kernel.org/r/20231124101840.944737-41-ardb@google.com/
> > v6: https://lore.kernel.org/r/20231129111555.3594833-43-ardb@google.com/
> > v7: https://lore.kernel.org/r/20240123145258.1462979-52-ardb%2Bgit%40google.com/
> >
> > [...]
>
> I queued this series via the arm64 tree (for-next/stage1-lpa2). I tried
> a couple of releases ago but for some reason my tests started failing and
> it was very close to the merge window, so I dropped it. This time around,
> if anything goes wrong, we have a bit of time to fix it (it may well
> have been my test scripts and nothing to do with these patches).
>
> The last patch introducing WXN has ABI implications but it's default
> off. I think we should keep the patch as certain markets will likely
> turn it on.
>
> Surprisingly, there are no conflicts with Ryan's contpte series AFAICT
> (I did a merge locally).
>

No *lexical* conflicts, right? :-)

I built for-next/core with 16k pages/52-bits, and ended up with the
splat below. Unfortunately, it is intermittent, and I haven't been
able to reproduce it, so I have no idea whether it is my code, Ryan's
code or an inadvertent interaction between the two. (Or perhaps some
other code in the tree)

I did build with WXN enabled in this case, but it seems unlikely that
that plays a role here.

[    0.392768] Unable to handle kernel write to read-only memory at virtual address ffffe6d037873360
[    0.393314] Mem abort info:
[    0.393480]   ESR = 0x000000009600004f
[    0.393702]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.394089]   SET = 0, FnV = 0
[    0.394272]   EA = 0, S1PTW = 0
[    0.394458]   FSC = 0x0f: level 3 permission fault
[    0.394741] Data abort info:
[    0.394915]   ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000000
[    0.395239]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[    0.395602]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    0.396152] swapper pgtable: 16k pages, 47-bit VAs, pgdp=00000000c8870000
[    0.396844] [ffffe6d037873360] pgd=0000000000000000, p4d=0000000000000000, pud=10000000c9034003, pmd=10000000c903c003, pte=00600000c8870783
[    0.398172] Internal error: Oops: 000000009600004f [#1] PREEMPT SMP
[    0.398833] Modules linked in:
[    0.399147] CPU: 3 PID: 1 Comm: systemd Not tainted 6.8.0-rc3+ #28
[    0.399774] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[    0.400464] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[    0.401176] pc : __pmd_alloc+0x164/0x1f4
[    0.401579] lr : __pmd_alloc+0xac/0x1f4
[    0.402117] sp : ffffc0008007b960
[    0.402672] x29: ffffc0008007b960 x28: ffffe6cfe1844000 x27: 0000000000002cc2
[    0.403833] x26: 0000000002000000 x25: ffffe6d037873360 x24: ffffe6cfe17f8000
[    0.404990] x23: ffff8ebf40fd4000 x22: ffffe6d037b83c80 x21: ffffe6d037b83c80
[    0.406177] x20: ffffe6d037873360 x19: ffffe6d037b83d1c x18: 000000000000cda8
[    0.407381] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[    0.408455] x14: 0000000000001b00 x13: dead000000000122 x12: 0000000000000000
[    0.409179] x11: 0000000000000015 x10: 1000000000000003 x9 : 1000000000000003
[    0.409939] x8 : 1000000100fd4003 x7 : 0000000000000000 x6 : ffffc0008007ba50
[    0.410670] x5 : ffff8ebf7f9bb9e0 x4 : ffff8ebf7f9bb9c0 x3 : 0000000000000901
[    0.411394] x2 : 0000000000000001 x1 : 0000000000000000 x0 : ffffe6d037b83d1c
[    0.412204] Call trace:
[    0.412426]  __pmd_alloc+0x164/0x1f4
[    0.412736]  vmap_pages_pud_range+0x160/0x244
[    0.413122]  __vmap_pages_range_noflush+0xb0/0x218
[    0.413531]  __vmalloc_area_node+0x4b8/0x5c0
[    0.413897]  __vmalloc_node_range+0x124/0x218
[    0.414276]  module_alloc+0x118/0x170
[    0.414589]  load_module+0xd5c/0x133c
[    0.414898]  __arm64_sys_finit_module+0x21c/0x2c0
[    0.415292]  invoke_syscall+0x48/0xd8
[    0.415623]  do_el0_svc+0x7c/0xa8
[    0.415937]  el0_svc+0x34/0x78
[    0.416210]  el0t_64_sync_handler+0x84/0xfc
[    0.416589]  el0t_64_sync+0x190/0x194
[    0.416910] Code: 9278a508 aa090108 f9000fa8 f9400fa8 (f9000288)
[    0.417432] ---[ end trace 0000000000000000 ]---
[    0.417861] note: systemd[1] exited with preempt_count 1
[    0.418237] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.418765] SMP: stopping secondary CPUs
[    0.419061] Kernel Offset: 0x26cfb5c90000 from 0xffffc00080000000
[    0.419517] PHYS_OFFSET: 0xfff07141c0000000
[    0.419820] CPU features: 0x0,00000000,a0044d4a,33ce7727
[    0.420214] Memory Limit: none
[    0.420439] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
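
Note for readers: the "Mem abort info" lines in the splat follow
directly from the ESR value. A minimal decoding sketch is given below;
the struct and function names are hypothetical, and the bit positions
are the architectural ones for a data abort as far as I know (EC in bits
[31:26], IL in bit 25, WnR in bit 6, FSC in bits [5:0]).

```c
#include <stdint.h>

/* Subset of the ESR_ELx fields that the splat prints (illustration only) */
struct esr_fields {
	unsigned int ec;	/* exception class, ESR[31:26] */
	unsigned int il;	/* instruction length bit, ESR[25] */
	unsigned int wnr;	/* write-not-read, ESR[6] */
	unsigned int fsc;	/* fault status code, ESR[5:0] */
};

struct esr_fields decode_esr(uint64_t esr)
{
	struct esr_fields f = {
		.ec  = (unsigned int)((esr >> 26) & 0x3f),
		.il  = (unsigned int)((esr >> 25) & 0x1),
		.wnr = (unsigned int)((esr >> 6) & 0x1),
		.fsc = (unsigned int)(esr & 0x3f),
	};

	return f;
}
```

For ESR = 0x9600004f this gives EC = 0x25 (data abort from the current
EL), IL = 1 (32-bit instruction), WnR = 1 (a write) and FSC = 0x0f
(level 3 permission fault), matching the values reported in the log.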


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1
  2024-02-16 18:23   ` Ard Biesheuvel
@ 2024-02-16 22:34     ` Ard Biesheuvel
  0 siblings, 0 replies; 59+ messages in thread
From: Ard Biesheuvel @ 2024-02-16 22:34 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Mark Rutland,
	Ryan Roberts, Anshuman Khandual, Kees Cook

On Fri, 16 Feb 2024 at 19:23, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Fri, 16 Feb 2024 at 18:35, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >
> > On Wed, 14 Feb 2024 13:28:46 +0100, Ard Biesheuvel wrote:
> > > This v8 covers the remaining changes that implement support for LPA2 and
> > > WXN at stage 1, now that some of the prerequisites are in place.
> > >
> > > v4: https://lore.kernel.org/r/20230912141549.278777-63-ardb@google.com/
> > > v5: https://lore.kernel.org/r/20231124101840.944737-41-ardb@google.com/
> > > v6: https://lore.kernel.org/r/20231129111555.3594833-43-ardb@google.com/
> > > v7: https://lore.kernel.org/r/20240123145258.1462979-52-ardb%2Bgit%40google.com/
> > >
> > > [...]
> >
> > I queued this series via the arm64 tree (for-next/stage1-lpa2). I tried
> > a couple of releases ago but for some reason my tests started failing and
> > it was very close to the merging window, so dropped. This time around,
> > if anything goes wrong, we have a bit of time to fix (it might as well
> > have been my test scripts and nothing to do with these patches).
> >
> > The last patch introducing WXN has ABI implications but it's default
> > off. I think we should keep the patch as certain markets will likely
> > turn it on.
> >
> > Surprisingly, there are no conflicts with Ryan's contpte series AFAICT
> > (I did a merge locally).
> >
>
> No *lexical* conflicts, right? :-)
>
> I built for-next/core with 16k pages/52-bits, and ended up with the
> splat below. Unfortunately, it is intermittent, and I haven't been
> able to reproduce it, so I have no idea whether it is my code, Ryan's
> code or an inadvertent interaction between the two. (Or perhaps some
> other code in the tree)
>

It's my code: on 16k without LPA2, the PUDs are folded at runtime, so
set_pud() is writing to swapper_pg_dir, which is mapped read-only.
I'll have a fix out shortly.


* Re: [PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime
  2024-02-14 12:29 ` [PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime Ard Biesheuvel
@ 2024-02-29 14:17   ` Ryan Roberts
  2024-02-29 23:01     ` Nathan Chancellor
  0 siblings, 1 reply; 59+ messages in thread
From: Ryan Roberts @ 2024-02-29 14:17 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Anshuman Khandual, Kees Cook, Aishwarya TCV,
	Mark Brown

Hi Ard,

On 14/02/2024 12:29, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> In order to support LPA2 on 16k pages in a way that permits non-LPA2
> systems to run the same kernel image, we have to be able to fall back to
> at most 48 bits of virtual addressing.
> 
> Falling back to 48 bits would result in a level 0 with only 2 entries,
> which is suboptimal in terms of TLB utilization. So instead, let's fall
> back to 47 bits in that case. This means we need to be able to fold PUDs
> dynamically, similar to how we fold P4Ds for 48 bit virtual addressing
> on LPA2 with 4k pages.

I'm seeing a panic during boot in today's linux-next (20240229) and bisect seems pretty confident that this commit is the offender. That said, it's the merge commit that shows up as the problem commit:

26843fe8fa72 Merge branch 'for-next/core' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

but when testing the arm64's for-next/core, the problem doesn't exist. So I rebased the branch into linux-next and bisected again. That time, it fingers this patch. So I guess there is some interaction between this and other changes in next?


Note I'm running defconfig (so 4K base pages) plus:

# Squashfs for snaps, xfs for large file folios.
./scripts/config --enable CONFIG_SQUASHFS_LZ4
./scripts/config --enable CONFIG_SQUASHFS_LZO
./scripts/config --enable CONFIG_SQUASHFS_XZ
./scripts/config --enable CONFIG_SQUASHFS_ZSTD
./scripts/config --enable CONFIG_XFS_FS

# Useful trace features (on for Ubuntu configs).
./scripts/config --enable CONFIG_FTRACE
./scripts/config --enable CONFIG_FUNCTION_TRACER
./scripts/config --enable CONFIG_KPROBES
./scripts/config --enable CONFIG_HIST_TRIGGERS
./scripts/config --enable CONFIG_FTRACE_SYSCALLS

# For general mm debug.
./scripts/config --enable CONFIG_DEBUG_VM
./scripts/config --enable CONFIG_DEBUG_VM_MAPLE_TREE
./scripts/config --enable CONFIG_DEBUG_VM_RB
./scripts/config --enable CONFIG_DEBUG_VM_PGFLAGS
./scripts/config --enable CONFIG_DEBUG_VM_PGTABLE
./scripts/config --enable CONFIG_PAGE_TABLE_CHECK

# For mm selftests.
./scripts/config --enable CONFIG_USERFAULTFD
./scripts/config --enable CONFIG_TEST_VMALLOC
./scripts/config --enable CONFIG_GUP_TEST

# Ram block device for testing swap changes.
./scripts/config --enable CONFIG_BLK_DEV_RAM


I'm booting a VM on Apple M2 with 12G RAM assigned, split evenly across 2 emulated numa nodes, and with a bunch of hugetlb pages of all sizes reserved, if that matters.


And I see this panic during boot (I guess due to the DEBUG_VM Kconfigs):

[    0.161062] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
[    0.161416] BUG: Bad page state in process swapper/0  pfn:18a65d
[    0.161634] page does not match folio
[    0.161753] page: refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x18a65d
[    0.162046] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[    0.162332] Mem abort info:
[    0.162427]   ESR = 0x0000000096000004
[    0.162559]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.162723]   SET = 0, FnV = 0
[    0.162827]   EA = 0, S1PTW = 0
[    0.162933]   FSC = 0x04: level 0 translation fault
[    0.163089] Data abort info:
[    0.163189]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[    0.163370]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    0.163539]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    0.163719] [0000000000000008] user address but active_mm is swapper
[    0.163934] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[    0.164143] Modules linked in:
[    0.164251] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc6-00966-gde701dc1f7f8 #25
[    0.164516] Hardware name: linux,dummy-virt (DT)
[    0.164704] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[    0.165052] pc : get_pfnblock_flags_mask+0x3c/0x68
[    0.165281] lr : __dump_page+0x1a0/0x408
[    0.165504] sp : ffff80008007b8f0
[    0.165715] x29: ffff80008007b8f0 x28: 0000000000ffffc0 x27: 0000000000000000
[    0.166047] x26: ffff80008007b950 x25: 0000000000000000 x24: 00000000fffffdff
[    0.166358] x23: ffffba8a417ba000 x22: 000000000018a65d x21: ffffba8a41601bf8
[    0.166701] x20: ffff80008007b950 x19: ffff80008007b950 x18: 0000000000000006
[    0.167036] x17: 78303a7865646e69 x16: 2030303030303030 x15: 0720072007200720
[    0.167365] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
[    0.167693] x11: 0720072007200720 x10: ffffba8a4269c038 x9 : ffffba8a3fb0d0b8
[    0.168017] x8 : 00000000ffffefff x7 : ffffba8a4269c038 x6 : 80000000fffff000
[    0.168346] x5 : 000003fffff81de4 x4 : 0001fffffc0ef230 x3 : 0000000000000000
[    0.168699] x2 : 0000000000000007 x1 : fffffe0779181ee5 x0 : 00000000001fffff
[    0.169041] Call trace:
[    0.169164]  get_pfnblock_flags_mask+0x3c/0x68
[    0.169413]  dump_page+0x2c/0x70
[    0.169565]  bad_page+0x84/0x130
[    0.169734]  free_page_is_bad_report+0xa0/0xb8
[    0.169958]  free_unref_page_prepare+0x350/0x428
[    0.170132]  free_unref_page+0x50/0x1f0
[    0.170278]  __free_pages+0x11c/0x160
[    0.170417]  free_pages.part.0+0x6c/0x88
[    0.170576]  free_pages+0x1c/0x38
[    0.170703]  destroy_args+0x1c8/0x330
[    0.170890]  debug_vm_pgtable+0xae8/0x10f8
[    0.171059]  do_one_initcall+0x60/0x2c0
[    0.171222]  kernel_init_freeable+0x1ec/0x3d8
[    0.171406]  kernel_init+0x28/0x1f0
[    0.171557]  ret_from_fork+0x10/0x20
[    0.171712] Code: d37b1884 f100007f 8b040064 9a831083 (f9400460) 
[    0.171963] ---[ end trace 0000000000000000 ]---
[    0.172156] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.172383] SMP: stopping secondary CPUs
[    0.172649] Kernel Offset: 0x3a89bf800000 from 0xffff800080000000
[    0.173923] PHYS_OFFSET: 0xfffff76180000000
[    0.174585] CPU features: 0x0,00000000,2004454a,13867723
[    0.175707] Memory Limit: none
[    0.176261] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---


bisection log (after rebasing arm64 stuff onto linux-next):

git bisect start
# bad: [446381d9ff3498f7c406109fac88d10bf855d0bd] arm64: Update setup_arch() comment on interrupt masking
git bisect bad 446381d9ff3498f7c406109fac88d10bf855d0bd
# good: [7f43e0f76e4710b2882c551519eff50e502115c5] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux.git
git bisect good 7f43e0f76e4710b2882c551519eff50e502115c5
# good: [36da1bf4c61bf1e4322b9b04d6eb1aba2a515b73] arm64: mm: omit redundant remap of kernel image
git bisect good 36da1bf4c61bf1e4322b9b04d6eb1aba2a515b73
# bad: [38f5662b4788b308f3be3cdd15e6c0149a627937] mm: add arch hook to validate mmap() prot flags
git bisect bad 38f5662b4788b308f3be3cdd15e6c0149a627937
# good: [653a0b074c33c48913c78e72a000ff935ff208c2] arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion
git bisect good 653a0b074c33c48913c78e72a000ff935ff208c2
# bad: [ebc9452776ee8d978908eb2f7424838b0bff6285] arm64: ptdump: Disregard unaddressable VA space
git bisect bad ebc9452776ee8d978908eb2f7424838b0bff6285
# good: [1d8cd0e6257930b0df58ce51bca44e232dcce49c] arm64: mm: Add 5 level paging support to fixmap and swapper handling
git bisect good 1d8cd0e6257930b0df58ce51bca44e232dcce49c
# bad: [de701dc1f7f88e85aca48e4c76c66f03ac5fc55b] arm64: mm: Add support for folding PUDs at runtime
git bisect bad de701dc1f7f88e85aca48e4c76c66f03ac5fc55b
# good: [3561c4b14b23f03f109e954b5d89839bb8b73798] arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging
git bisect good 3561c4b14b23f03f109e954b5d89839bb8b73798
# first bad commit: [de701dc1f7f88e85aca48e4c76c66f03ac5fc55b] arm64: mm: Add support for folding PUDs at runtime


I haven't looked in detail at your patch, but hoped you might get to the root cause quicker than me?

Thanks,
Ryan



* Re: [PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime
  2024-02-29 14:17   ` Ryan Roberts
@ 2024-02-29 23:01     ` Nathan Chancellor
  2024-03-01  8:54       ` Ryan Roberts
  0 siblings, 1 reply; 59+ messages in thread
From: Nathan Chancellor @ 2024-02-29 23:01 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Ard Biesheuvel, linux-arm-kernel, Ard Biesheuvel,
	Catalin Marinas, Will Deacon, Marc Zyngier, Mark Rutland,
	Anshuman Khandual, Kees Cook, Aishwarya TCV, Mark Brown

On Thu, Feb 29, 2024 at 02:17:52PM +0000, Ryan Roberts wrote:
> Hi Ard,
> 
> On 14/02/2024 12:29, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> > 
> > In order to support LPA2 on 16k pages in a way that permits non-LPA2
> > systems to run the same kernel image, we have to be able to fall back to
> > at most 48 bits of virtual addressing.
> > 
> > Falling back to 48 bits would result in a level 0 with only 2 entries,
> > which is suboptimal in terms of TLB utilization. So instead, let's fall
> > back to 47 bits in that case. This means we need to be able to fold PUDs
> > dynamically, similar to how we fold P4Ds for 48 bit virtual addressing
> > on LPA2 with 4k pages.
> 
> I'm seeing a panic during boot in today's linux-next (20240229) and bisect seems pretty confident that this commit is the offender. That said, it's the merge commit that shows up as the problem commit:
> 
> 26843fe8fa72 Merge branch 'for-next/core' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
> 
> but when testing the arm64's for-next/core, the problem doesn't exist. So I rebased the branch into linux-next and bisected again. That time, it fingers this patch. So I guess there is some interaction between this and other changes in next?
<...>
> [    0.161062] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
> [    0.161416] BUG: Bad page state in process swapper/0  pfn:18a65d
> [    0.161634] page does not match folio
> [    0.161753] page: refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x18a65d
> [    0.162046] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> [    0.162332] Mem abort info:
> [    0.162427]   ESR = 0x0000000096000004
> [    0.162559]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    0.162723]   SET = 0, FnV = 0
> [    0.162827]   EA = 0, S1PTW = 0
> [    0.162933]   FSC = 0x04: level 0 translation fault
> [    0.163089] Data abort info:
> [    0.163189]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [    0.163370]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [    0.163539]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [    0.163719] [0000000000000008] user address but active_mm is swapper
> [    0.163934] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> [    0.164143] Modules linked in:
> [    0.164251] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc6-00966-gde701dc1f7f8 #25
> [    0.164516] Hardware name: linux,dummy-virt (DT)
> [    0.164704] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [    0.165052] pc : get_pfnblock_flags_mask+0x3c/0x68
> [    0.165281] lr : __dump_page+0x1a0/0x408
> [    0.165504] sp : ffff80008007b8f0
> [    0.165715] x29: ffff80008007b8f0 x28: 0000000000ffffc0 x27: 0000000000000000
> [    0.166047] x26: ffff80008007b950 x25: 0000000000000000 x24: 00000000fffffdff
> [    0.166358] x23: ffffba8a417ba000 x22: 000000000018a65d x21: ffffba8a41601bf8
> [    0.166701] x20: ffff80008007b950 x19: ffff80008007b950 x18: 0000000000000006
> [    0.167036] x17: 78303a7865646e69 x16: 2030303030303030 x15: 0720072007200720
> [    0.167365] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
> [    0.167693] x11: 0720072007200720 x10: ffffba8a4269c038 x9 : ffffba8a3fb0d0b8
> [    0.168017] x8 : 00000000ffffefff x7 : ffffba8a4269c038 x6 : 80000000fffff000
> [    0.168346] x5 : 000003fffff81de4 x4 : 0001fffffc0ef230 x3 : 0000000000000000
> [    0.168699] x2 : 0000000000000007 x1 : fffffe0779181ee5 x0 : 00000000001fffff
> [    0.169041] Call trace:
> [    0.169164]  get_pfnblock_flags_mask+0x3c/0x68
> [    0.169413]  dump_page+0x2c/0x70
> [    0.169565]  bad_page+0x84/0x130
> [    0.169734]  free_page_is_bad_report+0xa0/0xb8
> [    0.169958]  free_unref_page_prepare+0x350/0x428
> [    0.170132]  free_unref_page+0x50/0x1f0
> [    0.170278]  __free_pages+0x11c/0x160
> [    0.170417]  free_pages.part.0+0x6c/0x88
> [    0.170576]  free_pages+0x1c/0x38
> [    0.170703]  destroy_args+0x1c8/0x330
> [    0.170890]  debug_vm_pgtable+0xae8/0x10f8
> [    0.171059]  do_one_initcall+0x60/0x2c0
> [    0.171222]  kernel_init_freeable+0x1ec/0x3d8
> [    0.171406]  kernel_init+0x28/0x1f0
> [    0.171557]  ret_from_fork+0x10/0x20
> [    0.171712] Code: d37b1884 f100007f 8b040064 9a831083 (f9400460) 
> [    0.171963] ---[ end trace 0000000000000000 ]---
> [    0.172156] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [    0.172383] SMP: stopping secondary CPUs
> [    0.172649] Kernel Offset: 0x3a89bf800000 from 0xffff800080000000
> [    0.173923] PHYS_OFFSET: 0xfffff76180000000
> [    0.174585] CPU features: 0x0,00000000,2004454a,13867723
> [    0.175707] Memory Limit: none
> [    0.176261] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

I did a second bisection by merging https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/stage1-lpa2
on top of the merges before for-next/core and eventually landed on:

d67cd9f23139ddfd7e0ef1e18474c16445188433 is the first bad commit
commit d67cd9f23139ddfd7e0ef1e18474c16445188433
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Feb 27 19:23:31 2024 +0000

    mm: add __dump_folio()

    Turn __dump_page() into a wrapper around __dump_folio().  Snapshot the
    page & folio into a stack variable so we don't hit BUG_ON() if an
    allocation is freed under us and what was a folio pointer becomes a
    pointer to a tail page.

    Link: https://lkml.kernel.org/r/20240227192337.757313-5-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

 mm/debug.c | 120 +++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 66 insertions(+), 54 deletions(-)

# bad: [7f43e0f76e4710b2882c551519eff50e502115c5] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux.git
# good: [805d849d7c3cc1f38efefd48b2480d62b7b5dcb7] Merge tag 'acpi-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect start '7f43e0f76e4710b2882c551519eff50e502115c5' '805d849d7c3cc1f38efefd48b2480d62b7b5dcb7'
# bad: [7e6ae2db7f319bf9613ec6db8fa3c9bc1de1b346] mm: add swappiness= arg to memory.reclaim
git bisect bad 7e6ae2db7f319bf9613ec6db8fa3c9bc1de1b346
# good: [c6ec76a2ebc5829e5826b218d2e1475ec11b333e] mm: add pte_batch_hint() to reduce scanning in folio_pte_batch()
git bisect good c6ec76a2ebc5829e5826b218d2e1475ec11b333e
# good: [a02829f011b64e6c102929ed55da52e38391e970] writeback: fix done_index when hitting the wbc->nr_to_write
git bisect good a02829f011b64e6c102929ed55da52e38391e970
# good: [de435b3b914686116f86494b8cb53224d7e24cc5] arm64/mm: improve comment in contpte_ptep_get_lockless()
git bisect good de435b3b914686116f86494b8cb53224d7e24cc5
# good: [c143365caad5c3ad45662c393b9114c7cc694473] mm: handle large folios in free_unref_folios()
git bisect good c143365caad5c3ad45662c393b9114c7cc694473
# skip: [ab6445067cfbaf4ac94e969f7e8e785049314099] mm: add alloc_contig_migrate_range allocation statistics
git bisect skip ab6445067cfbaf4ac94e969f7e8e785049314099
# good: [447bf726277614396adcd4beedaf77ef74a748fa] modules: wait do_free_init correctly
git bisect good 447bf726277614396adcd4beedaf77ef74a748fa
# good: [cf2ac0c3998ffcbea680aeea2dee04d450654534] mm: remove PageWaiters, PageSetWaiters and PageClearWaiters
git bisect good cf2ac0c3998ffcbea680aeea2dee04d450654534
# bad: [c48de1718df9dcafb08aefbc6a0edf46e2f94e66] mm: constify more page/folio tests
git bisect bad c48de1718df9dcafb08aefbc6a0edf46e2f94e66
# bad: [48e4e7b8eea5fc80faad81515d429bce041f352d] mm: make dump_page() take a const argument
git bisect bad 48e4e7b8eea5fc80faad81515d429bce041f352d
# bad: [d67cd9f23139ddfd7e0ef1e18474c16445188433] mm: add __dump_folio()
git bisect bad d67cd9f23139ddfd7e0ef1e18474c16445188433
# good: [e9844b2b6cf103f4f3a42119d62758eb26c5c233] mm: remove PageYoung and PageIdle definitions
git bisect good e9844b2b6cf103f4f3a42119d62758eb26c5c233
# first bad commit: [d67cd9f23139ddfd7e0ef1e18474c16445188433] mm: add __dump_folio()

Cheers,
Nathan


* Re: [PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime
  2024-02-29 23:01     ` Nathan Chancellor
@ 2024-03-01  8:54       ` Ryan Roberts
  2024-03-01  9:10         ` Ard Biesheuvel
  0 siblings, 1 reply; 59+ messages in thread
From: Ryan Roberts @ 2024-03-01  8:54 UTC (permalink / raw)
  To: Nathan Chancellor, Matthew Wilcox
  Cc: Ard Biesheuvel, linux-arm-kernel, Ard Biesheuvel,
	Catalin Marinas, Will Deacon, Marc Zyngier, Mark Rutland,
	Anshuman Khandual, Kees Cook, Aishwarya TCV, Mark Brown

+ Matthew


On 29/02/2024 23:01, Nathan Chancellor wrote:
> On Thu, Feb 29, 2024 at 02:17:52PM +0000, Ryan Roberts wrote:
>> Hi Ard,
>>
>> On 14/02/2024 12:29, Ard Biesheuvel wrote:
>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>
>>> In order to support LPA2 on 16k pages in a way that permits non-LPA2
>>> systems to run the same kernel image, we have to be able to fall back to
>>> at most 48 bits of virtual addressing.
>>>
>>> Falling back to 48 bits would result in a level 0 with only 2 entries,
>>> which is suboptimal in terms of TLB utilization. So instead, let's fall
>>> back to 47 bits in that case. This means we need to be able to fold PUDs
>>> dynamically, similar to how we fold P4Ds for 48 bit virtual addressing
>>> on LPA2 with 4k pages.
>>
>> I'm seeing a panic during boot in today's linux-next (20240229) and bisect seems pretty confident that this commit is the offender. That said, it's the merge commit that shows up as the problem commit:
>>
>> 26843fe8fa72 Merge branch 'for-next/core' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
>>
>> but when testing the arm64's for-next/core, the problem doesn't exist. So I rebased the branch into linux-next and bisected again. That time, it fingers this patch. So I guess there is some interaction between this and other changes in next?
> <...>
>> [    0.161062] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
>> [    0.161416] BUG: Bad page state in process swapper/0  pfn:18a65d
>> [    0.161634] page does not match folio
>> [    0.161753] page: refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x18a65d
>> [    0.162046] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
>> [    0.162332] Mem abort info:
>> [    0.162427]   ESR = 0x0000000096000004
>> [    0.162559]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [    0.162723]   SET = 0, FnV = 0
>> [    0.162827]   EA = 0, S1PTW = 0
>> [    0.162933]   FSC = 0x04: level 0 translation fault
>> [    0.163089] Data abort info:
>> [    0.163189]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>> [    0.163370]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> [    0.163539]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> [    0.163719] [0000000000000008] user address but active_mm is swapper
>> [    0.163934] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> [    0.164143] Modules linked in:
>> [    0.164251] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc6-00966-gde701dc1f7f8 #25
>> [    0.164516] Hardware name: linux,dummy-virt (DT)
>> [    0.164704] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>> [    0.165052] pc : get_pfnblock_flags_mask+0x3c/0x68
>> [    0.165281] lr : __dump_page+0x1a0/0x408
>> [    0.165504] sp : ffff80008007b8f0
>> [    0.165715] x29: ffff80008007b8f0 x28: 0000000000ffffc0 x27: 0000000000000000
>> [    0.166047] x26: ffff80008007b950 x25: 0000000000000000 x24: 00000000fffffdff
>> [    0.166358] x23: ffffba8a417ba000 x22: 000000000018a65d x21: ffffba8a41601bf8
>> [    0.166701] x20: ffff80008007b950 x19: ffff80008007b950 x18: 0000000000000006
>> [    0.167036] x17: 78303a7865646e69 x16: 2030303030303030 x15: 0720072007200720
>> [    0.167365] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
>> [    0.167693] x11: 0720072007200720 x10: ffffba8a4269c038 x9 : ffffba8a3fb0d0b8
>> [    0.168017] x8 : 00000000ffffefff x7 : ffffba8a4269c038 x6 : 80000000fffff000
>> [    0.168346] x5 : 000003fffff81de4 x4 : 0001fffffc0ef230 x3 : 0000000000000000
>> [    0.168699] x2 : 0000000000000007 x1 : fffffe0779181ee5 x0 : 00000000001fffff
>> [    0.169041] Call trace:
>> [    0.169164]  get_pfnblock_flags_mask+0x3c/0x68
>> [    0.169413]  dump_page+0x2c/0x70
>> [    0.169565]  bad_page+0x84/0x130
>> [    0.169734]  free_page_is_bad_report+0xa0/0xb8
>> [    0.169958]  free_unref_page_prepare+0x350/0x428
>> [    0.170132]  free_unref_page+0x50/0x1f0
>> [    0.170278]  __free_pages+0x11c/0x160
>> [    0.170417]  free_pages.part.0+0x6c/0x88
>> [    0.170576]  free_pages+0x1c/0x38
>> [    0.170703]  destroy_args+0x1c8/0x330
>> [    0.170890]  debug_vm_pgtable+0xae8/0x10f8
>> [    0.171059]  do_one_initcall+0x60/0x2c0
>> [    0.171222]  kernel_init_freeable+0x1ec/0x3d8
>> [    0.171406]  kernel_init+0x28/0x1f0
>> [    0.171557]  ret_from_fork+0x10/0x20
>> [    0.171712] Code: d37b1884 f100007f 8b040064 9a831083 (f9400460) 
>> [    0.171963] ---[ end trace 0000000000000000 ]---
>> [    0.172156] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>> [    0.172383] SMP: stopping secondary CPUs
>> [    0.172649] Kernel Offset: 0x3a89bf800000 from 0xffff800080000000
>> [    0.173923] PHYS_OFFSET: 0xfffff76180000000
>> [    0.174585] CPU features: 0x0,00000000,2004454a,13867723
>> [    0.175707] Memory Limit: none
>> [    0.176261] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> 
> I did a second bisection by merging https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/stage1-lpa2
> on top of the merges before for-next/core and eventually landed on:
> 
> d67cd9f23139ddfd7e0ef1e18474c16445188433 is the first bad commit
> commit d67cd9f23139ddfd7e0ef1e18474c16445188433
> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> Date:   Tue Feb 27 19:23:31 2024 +0000
> 
>     mm: add __dump_folio()
> 
>     Turn __dump_page() into a wrapper around __dump_folio().  Snapshot the
>     page & folio into a stack variable so we don't hit BUG_ON() if an
>     allocation is freed under us and what was a folio pointer becomes a
>     pointer to a tail page.
> 
>     Link: https://lkml.kernel.org/r/20240227192337.757313-5-willy@infradead.org
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

So is that suggesting that Ard's patch is doing something that the old
__dump_page() was ok with but the new version doesn't like? I don't think so,
because the bad page detection has already happened before we get to __dump_page().

So I'm not really sure how this patch is involved? I'm hoping that either Ard or
Matthew may be able to take a look and advise.

> 
>  mm/debug.c | 120 +++++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 66 insertions(+), 54 deletions(-)
> 
> # bad: [7f43e0f76e4710b2882c551519eff50e502115c5] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux.git
> # good: [805d849d7c3cc1f38efefd48b2480d62b7b5dcb7] Merge tag 'acpi-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
> git bisect start '7f43e0f76e4710b2882c551519eff50e502115c5' '805d849d7c3cc1f38efefd48b2480d62b7b5dcb7'
> # bad: [7e6ae2db7f319bf9613ec6db8fa3c9bc1de1b346] mm: add swappiness= arg to memory.reclaim
> git bisect bad 7e6ae2db7f319bf9613ec6db8fa3c9bc1de1b346
> # good: [c6ec76a2ebc5829e5826b218d2e1475ec11b333e] mm: add pte_batch_hint() to reduce scanning in folio_pte_batch()
> git bisect good c6ec76a2ebc5829e5826b218d2e1475ec11b333e
> # good: [a02829f011b64e6c102929ed55da52e38391e970] writeback: fix done_index when hitting the wbc->nr_to_write
> git bisect good a02829f011b64e6c102929ed55da52e38391e970
> # good: [de435b3b914686116f86494b8cb53224d7e24cc5] arm64/mm: improve comment in contpte_ptep_get_lockless()
> git bisect good de435b3b914686116f86494b8cb53224d7e24cc5
> # good: [c143365caad5c3ad45662c393b9114c7cc694473] mm: handle large folios in free_unref_folios()
> git bisect good c143365caad5c3ad45662c393b9114c7cc694473
> # skip: [ab6445067cfbaf4ac94e969f7e8e785049314099] mm: add alloc_contig_migrate_range allocation statistics
> git bisect skip ab6445067cfbaf4ac94e969f7e8e785049314099
> # good: [447bf726277614396adcd4beedaf77ef74a748fa] modules: wait do_free_init correctly
> git bisect good 447bf726277614396adcd4beedaf77ef74a748fa
> # good: [cf2ac0c3998ffcbea680aeea2dee04d450654534] mm: remove PageWaiters, PageSetWaiters and PageClearWaiters
> git bisect good cf2ac0c3998ffcbea680aeea2dee04d450654534
> # bad: [c48de1718df9dcafb08aefbc6a0edf46e2f94e66] mm: constify more page/folio tests
> git bisect bad c48de1718df9dcafb08aefbc6a0edf46e2f94e66
> # bad: [48e4e7b8eea5fc80faad81515d429bce041f352d] mm: make dump_page() take a const argument
> git bisect bad 48e4e7b8eea5fc80faad81515d429bce041f352d
> # bad: [d67cd9f23139ddfd7e0ef1e18474c16445188433] mm: add __dump_folio()
> git bisect bad d67cd9f23139ddfd7e0ef1e18474c16445188433
> # good: [e9844b2b6cf103f4f3a42119d62758eb26c5c233] mm: remove PageYoung and PageIdle definitions
> git bisect good e9844b2b6cf103f4f3a42119d62758eb26c5c233
> # first bad commit: [d67cd9f23139ddfd7e0ef1e18474c16445188433] mm: add __dump_folio()
> 
> Cheers,
> Nathan



* Re: [PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime
  2024-03-01  8:54       ` Ryan Roberts
@ 2024-03-01  9:10         ` Ard Biesheuvel
  2024-03-01  9:37           ` Ard Biesheuvel
  0 siblings, 1 reply; 59+ messages in thread
From: Ard Biesheuvel @ 2024-03-01  9:10 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Nathan Chancellor, Matthew Wilcox, Ard Biesheuvel,
	linux-arm-kernel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Anshuman Khandual, Kees Cook, Aishwarya TCV,
	Mark Brown

On Fri, 1 Mar 2024 at 09:54, Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> + Matthew
>
>
> On 29/02/2024 23:01, Nathan Chancellor wrote:
> > On Thu, Feb 29, 2024 at 02:17:52PM +0000, Ryan Roberts wrote:
> >> Hi Ard,
> >>
> >> On 14/02/2024 12:29, Ard Biesheuvel wrote:
> >>> From: Ard Biesheuvel <ardb@kernel.org>
> >>>
> >>> In order to support LPA2 on 16k pages in a way that permits non-LPA2
> >>> systems to run the same kernel image, we have to be able to fall back to
> >>> at most 48 bits of virtual addressing.
> >>>
> >>> Falling back to 48 bits would result in a level 0 with only 2 entries,
> >>> which is suboptimal in terms of TLB utilization. So instead, let's fall
> >>> back to 47 bits in that case. This means we need to be able to fold PUDs
> >>> dynamically, similar to how we fold P4Ds for 48 bit virtual addressing
> >>> on LPA2 with 4k pages.
> >>
> >> I'm seeing a panic during boot in today's linux-next (20240229) and bisect seems pretty confident that this commit is the offender. That said, it's the merge commit that shows up as the problem commit:
> >>
> >> 26843fe8fa72 Merge branch 'for-next/core' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
> >>
> >> but when testing the arm64's for-next/core, the problem doesn't exist. So I rebased the branch into linux-next and bisected again. That time, it fingers this patch. So I guess there is some interaction between this and other changes in next?
> > <...>
> >> [    0.161062] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
> >> [    0.161416] BUG: Bad page state in process swapper/0  pfn:18a65d
> >> [    0.161634] page does not match folio
> >> [    0.161753] page: refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x18a65d
> >> [    0.162046] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> >> [    0.162332] Mem abort info:
> >> [    0.162427]   ESR = 0x0000000096000004
> >> [    0.162559]   EC = 0x25: DABT (current EL), IL = 32 bits
> >> [    0.162723]   SET = 0, FnV = 0
> >> [    0.162827]   EA = 0, S1PTW = 0
> >> [    0.162933]   FSC = 0x04: level 0 translation fault
> >> [    0.163089] Data abort info:
> >> [    0.163189]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> >> [    0.163370]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >> [    0.163539]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> >> [    0.163719] [0000000000000008] user address but active_mm is swapper
> >> [    0.163934] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >> [    0.164143] Modules linked in:
> >> [    0.164251] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc6-00966-gde701dc1f7f8 #25
> >> [    0.164516] Hardware name: linux,dummy-virt (DT)
> >> [    0.164704] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> >> [    0.165052] pc : get_pfnblock_flags_mask+0x3c/0x68
> >> [    0.165281] lr : __dump_page+0x1a0/0x408
> >> [    0.165504] sp : ffff80008007b8f0
> >> [    0.165715] x29: ffff80008007b8f0 x28: 0000000000ffffc0 x27: 0000000000000000
> >> [    0.166047] x26: ffff80008007b950 x25: 0000000000000000 x24: 00000000fffffdff
> >> [    0.166358] x23: ffffba8a417ba000 x22: 000000000018a65d x21: ffffba8a41601bf8
> >> [    0.166701] x20: ffff80008007b950 x19: ffff80008007b950 x18: 0000000000000006
> >> [    0.167036] x17: 78303a7865646e69 x16: 2030303030303030 x15: 0720072007200720
> >> [    0.167365] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
> >> [    0.167693] x11: 0720072007200720 x10: ffffba8a4269c038 x9 : ffffba8a3fb0d0b8
> >> [    0.168017] x8 : 00000000ffffefff x7 : ffffba8a4269c038 x6 : 80000000fffff000
> >> [    0.168346] x5 : 000003fffff81de4 x4 : 0001fffffc0ef230 x3 : 0000000000000000
> >> [    0.168699] x2 : 0000000000000007 x1 : fffffe0779181ee5 x0 : 00000000001fffff
> >> [    0.169041] Call trace:
> >> [    0.169164]  get_pfnblock_flags_mask+0x3c/0x68
> >> [    0.169413]  dump_page+0x2c/0x70
> >> [    0.169565]  bad_page+0x84/0x130
> >> [    0.169734]  free_page_is_bad_report+0xa0/0xb8
> >> [    0.169958]  free_unref_page_prepare+0x350/0x428
> >> [    0.170132]  free_unref_page+0x50/0x1f0
> >> [    0.170278]  __free_pages+0x11c/0x160
> >> [    0.170417]  free_pages.part.0+0x6c/0x88
> >> [    0.170576]  free_pages+0x1c/0x38
> >> [    0.170703]  destroy_args+0x1c8/0x330
> >> [    0.170890]  debug_vm_pgtable+0xae8/0x10f8
> >> [    0.171059]  do_one_initcall+0x60/0x2c0
> >> [    0.171222]  kernel_init_freeable+0x1ec/0x3d8
> >> [    0.171406]  kernel_init+0x28/0x1f0
> >> [    0.171557]  ret_from_fork+0x10/0x20
> >> [    0.171712] Code: d37b1884 f100007f 8b040064 9a831083 (f9400460)
> >> [    0.171963] ---[ end trace 0000000000000000 ]---
> >> [    0.172156] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> >> [    0.172383] SMP: stopping secondary CPUs
> >> [    0.172649] Kernel Offset: 0x3a89bf800000 from 0xffff800080000000
> >> [    0.173923] PHYS_OFFSET: 0xfffff76180000000
> >> [    0.174585] CPU features: 0x0,00000000,2004454a,13867723
> >> [    0.175707] Memory Limit: none
> >> [    0.176261] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> >
> > I did a second bisection by merging https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/stage1-lpa2
> > on top of the merges before for-next/core and eventually landed on:
> >
> > d67cd9f23139ddfd7e0ef1e18474c16445188433 is the first bad commit
> > commit d67cd9f23139ddfd7e0ef1e18474c16445188433
> > Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Date:   Tue Feb 27 19:23:31 2024 +0000
> >
> >     mm: add __dump_folio()
> >
> >     Turn __dump_page() into a wrapper around __dump_folio().  Snapshot the
> >     page & folio into a stack variable so we don't hit BUG_ON() if an
> >     allocation is freed under us and what was a folio pointer becomes a
> >     pointer to a tail page.
> >
> >     Link: https://lkml.kernel.org/r/20240227192337.757313-5-willy@infradead.org
> >     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> >     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> So is that suggesting that Ard's patch is doing something that the old
> __dump_page() was ok with but the new version doesn't like? I don't think so,
> because the bad page detection has already happened before we get to __dump_page().
>

Yes, there are clearly two different issues at play here. The NULL
dereference might be an issue in the __dump_page() patch, but going
down that code path in the first place seems like it might be a
problem with mine.

The mapcount of -512 looks interesting as well.

> So I'm not really sure how this patch is involved? I'm hoping that either Ard or
> Matthew may be able to take a look and advise.
>

I'll try and make sense of this today. Thanks for the report and the
bisecting work.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime
  2024-03-01  9:10         ` Ard Biesheuvel
@ 2024-03-01  9:37           ` Ard Biesheuvel
  2024-03-01  9:47             ` Ryan Roberts
  0 siblings, 1 reply; 59+ messages in thread
From: Ard Biesheuvel @ 2024-03-01  9:37 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Nathan Chancellor, Matthew Wilcox, Ard Biesheuvel,
	linux-arm-kernel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Anshuman Khandual, Kees Cook, Aishwarya TCV,
	Mark Brown

On Fri, 1 Mar 2024 at 10:10, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Fri, 1 Mar 2024 at 09:54, Ryan Roberts <ryan.roberts@arm.com> wrote:
> >
> > So is that suggesting that Ard's patch is doing something that the old
> > __dump_page() was ok with but the new version doesn't like? I don't think so,
> > because the bad page detection has already happened before we get to __dump_page().
> >
>
> Yes, there are clearly two different issues at play here. The NULL
> dereference might be an issue in the __dump_page() patch, but going
> down that code path in the first place seems like it might be a
> problem with mine.
>
> The mapcount of -512 looks interesting as well.
>
> > So I'm not really sure how this patch is involved? I'm hoping that either Ard or
> > Matthew may be able to take a look and advise.
> >
>

The crash does not reproduce for me, but the warning can be fixed by

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index aeba2cf15a25..78f30b782889 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -61,7 +61,7 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
        if (!pgtable_l4_enabled())
                return;
        BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-       free_page((unsigned long)pud);
+       __pud_free(mm, pud);
 }
 #else
 static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)

I'll send this out as a patch shortly.


* Re: [PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime
  2024-03-01  9:37           ` Ard Biesheuvel
@ 2024-03-01  9:47             ` Ryan Roberts
  2024-03-01 10:22               ` Ryan Roberts
  0 siblings, 1 reply; 59+ messages in thread
From: Ryan Roberts @ 2024-03-01  9:47 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nathan Chancellor, Matthew Wilcox, Ard Biesheuvel,
	linux-arm-kernel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Anshuman Khandual, Kees Cook, Aishwarya TCV,
	Mark Brown

On 01/03/2024 09:37, Ard Biesheuvel wrote:
> On Fri, 1 Mar 2024 at 10:10, Ard Biesheuvel <ardb@kernel.org> wrote:
>>
>> On Fri, 1 Mar 2024 at 09:54, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>
>>> So is that suggesting that Ard's patch is doing something that the old
>>> __dump_page() was ok with but the new version doesn't like? I don't think so,
>>> because the bad page detection has already happened before we get to __dump_page().
>>>
>>
>> Yes, there are clearly two different issues at play here. The NULL
>> dereference might be an issue in the __dump_page() patch, but going
>> down that code path in the first place seems like it might be a
>> problem with mine.
>>
>> The mapcount of -512 looks interesting as well.
>>
>>> So I'm not really sure how this patch is involved? I'm hoping that either Ard or
>>> Matthew may be able to take a look and advise.
>>>
>>
> 
> The crash does not reproduce for me, but the warning can be fixed by
> 
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index aeba2cf15a25..78f30b782889 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -61,7 +61,7 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
>         if (!pgtable_l4_enabled())
>                 return;
>         BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
> -       free_page((unsigned long)pud);
> +       __pud_free(mm, pud);
>  }
>  #else
>  static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
> 
> I'll send this out as a patch shortly.

Great, thanks! I'm pretty sure I've found the bug in Matthew's patch: it copies
the struct page to the stack to avoid a potential race, but some macros used
later hide a page_to_pfn(). Since the copy's address is on the stack, I reckon
that yields a bogus pfn. I'm just confirming and writing it up, and will send it
against the original patch shortly.



* Re: [PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime
  2024-03-01  9:47             ` Ryan Roberts
@ 2024-03-01 10:22               ` Ryan Roberts
  0 siblings, 0 replies; 59+ messages in thread
From: Ryan Roberts @ 2024-03-01 10:22 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nathan Chancellor, Matthew Wilcox, Ard Biesheuvel,
	linux-arm-kernel, Catalin Marinas, Will Deacon, Marc Zyngier,
	Mark Rutland, Anshuman Khandual, Kees Cook, Aishwarya TCV,
	Mark Brown

On 01/03/2024 09:47, Ryan Roberts wrote:
> On 01/03/2024 09:37, Ard Biesheuvel wrote:
>> On Fri, 1 Mar 2024 at 10:10, Ard Biesheuvel <ardb@kernel.org> wrote:
>>>
>>> On Fri, 1 Mar 2024 at 09:54, Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> So is that suggesting that Ard's patch is doing something that the old
>>>> __dump_page() was ok with but the new version doesn't like? I don't think so,
>>>> because the bad page detection has already happened before we get to __dump_page().
>>>>
>>>
>>> Yes, there are clearly two different issues at play here. The NULL
>>> dereference might be an issue in the __dump_page() patch, but going
>>> down that code path in the first place seems like it might be a
>>> problem with mine.
>>>
>>> The mapcount of -512 looks interesting as well.
>>>
>>>> So I'm not really sure how this patch is involved? I'm hoping that either Ard or
>>>> Matthew may be able to take a look and advise.
>>>>
>>>
>>
>> The crash does not reproduce for me, but the warning can be fixed by
>>
>> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
>> index aeba2cf15a25..78f30b782889 100644
>> --- a/arch/arm64/include/asm/pgalloc.h
>> +++ b/arch/arm64/include/asm/pgalloc.h
>> @@ -61,7 +61,7 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
>>         if (!pgtable_l4_enabled())
>>                 return;
>>         BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
>> -       free_page((unsigned long)pud);
>> +       __pud_free(mm, pud);
>>  }
>>  #else
>>  static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
>>
>> I'll send this out as a patch shortly.
> 
> Great thanks! I'm pretty sure I've found the bug in Matthew's patch - it is
> copying the struct page to the stack to avoid a potential race, but later some
> macros are hiding a page_to_pfn(). Since the page's address is on the stack, I
> reckon that's giving a bogus pfn. Just confirming and writing it up and will
> send against the original patch shortly.
> 

OK, confirmed. When I fix Matthew's patch, the panic gets converted to a warning,
and when I add the fix for your patch, there is no warning at all.

See the write-up for Matthew's bug here:
https://lore.kernel.org/linux-mm/6de0d026-cd8d-4152-97ca-d33d2a4e2e84@arm.com/

Thanks,
Ryan





* Re: [PATCH v8 42/43] mm: add arch hook to validate mmap() prot flags
  2024-02-14 12:29 ` [PATCH v8 42/43] mm: add arch hook to validate mmap() prot flags Ard Biesheuvel
@ 2024-03-12 19:53   ` Catalin Marinas
  2024-03-12 23:23     ` Ard Biesheuvel
  0 siblings, 1 reply; 59+ messages in thread
From: Catalin Marinas @ 2024-03-12 19:53 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Ard Biesheuvel, Will Deacon, Marc Zyngier,
	Mark Rutland, Ryan Roberts, Anshuman Khandual, Kees Cook

On Wed, Feb 14, 2024 at 01:29:28PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Add a hook to permit architectures to perform validation on the prot
> flags passed to mmap(), like arch_validate_prot() does for mprotect().
> This will be used by arm64 to reject PROT_WRITE+PROT_EXEC mappings on
> configurations that run with WXN enabled.
> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  include/linux/mman.h | 15 +++++++++++++++
>  mm/mmap.c            |  3 +++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index dc7048824be8..ec5e7f606e43 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -124,6 +124,21 @@ static inline bool arch_validate_flags(unsigned long flags)
>  #define arch_validate_flags arch_validate_flags
>  #endif
>  
> +#ifndef arch_validate_mmap_prot
> +/*
> + * This is called from mmap(), which ignores unknown prot bits so the default
> + * is to accept anything.
> + *
> + * Returns true if the prot flags are valid
> + */
> +static inline bool arch_validate_mmap_prot(unsigned long prot,
> +					   unsigned long addr)
> +{
> +	return true;
> +}
> +#define arch_validate_mmap_prot arch_validate_mmap_prot
> +#endif
> +
>  /*
>   * Optimisation macro.  It is equivalent to:
>   *      (x & bit1) ? bit2 : 0
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d89770eaab6b..977a8c3fd9f5 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1229,6 +1229,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  		if (!(file && path_noexec(&file->f_path)))
>  			prot |= PROT_EXEC;
>  
> +	if (!arch_validate_mmap_prot(prot, addr))
> +		return -EACCES;
> +
>  	/* force arch specific MAP_FIXED handling in get_unmapped_area */
>  	if (flags & MAP_FIXED_NOREPLACE)
>  		flags |= MAP_FIXED;

While writing the pull request for Linus (and looking to justify this
change), I realised that we already have arch_validate_flags() that can
do a similar check but on VM_* flags instead of PROT_* (we use it for
VM_MTE checks). What was the reason for adding a new hook? The only
difference is that here it returns -EACCES while on
arch_validate_flags() failure it would return -EINVAL. It probably makes
more sense to return -EACCES as it matches map_deny_write_exec() but with
your patches we are inconsistent between mmap() and mprotect() errors
(the latter is still -EINVAL).
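
To make the hook pattern in the quoted patch concrete, here is a minimal user-space sketch of how an architecture could override the generic default. The W+X rejection in the override and the model_mmap_prot_check() wrapper are assumptions for illustration, not the arm64 code from this series; only the #ifndef/#define shape and the -EACCES return are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>

/*
 * "Arch" side (hypothetical): reject requests that ask for both
 * PROT_WRITE and PROT_EXEC. Defining a macro with the same name is
 * what suppresses the generic default below.
 */
static inline bool arch_validate_mmap_prot(unsigned long prot,
					   unsigned long addr)
{
	(void)addr;
	return (prot & (PROT_WRITE | PROT_EXEC)) != (PROT_WRITE | PROT_EXEC);
}
#define arch_validate_mmap_prot arch_validate_mmap_prot

/* "Generic" side: same shape as the quoted patch, accepts anything. */
#ifndef arch_validate_mmap_prot
static inline bool arch_validate_mmap_prot(unsigned long prot,
					   unsigned long addr)
{
	return true;
}
#define arch_validate_mmap_prot arch_validate_mmap_prot
#endif

/* Model of the check added to do_mmap(): 0 if accepted, -EACCES if not. */
int model_mmap_prot_check(unsigned long prot, unsigned long addr)
{
	if (!arch_validate_mmap_prot(prot, addr))
		return -EACCES;
	return 0;
}
```

With the override in place, PROT_READ|PROT_WRITE and PROT_READ|PROT_EXEC pass, while PROT_WRITE|PROT_EXEC is refused with -EACCES.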

It also got me thinking on whether we could use this as a hardened
version of the MDWE feature instead of a CPU feature (though we'd end up
context-switching this SCTLR_EL1 bit). I think your patches have been
around before the MDWE feature was added to the kernel.

Sorry for not catching this early.

-- 
Catalin


* Re: [PATCH v8 42/43] mm: add arch hook to validate mmap() prot flags
  2024-03-12 19:53   ` Catalin Marinas
@ 2024-03-12 23:23     ` Ard Biesheuvel
  2024-03-13 10:47       ` Catalin Marinas
  0 siblings, 1 reply; 59+ messages in thread
From: Ard Biesheuvel @ 2024-03-12 23:23 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Mark Rutland,
	Ryan Roberts, Anshuman Khandual, Kees Cook

On Tue, 12 Mar 2024 at 20:53, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Wed, Feb 14, 2024 at 01:29:28PM +0100, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Add a hook to permit architectures to perform validation on the prot
> > flags passed to mmap(), like arch_validate_prot() does for mprotect().
> > This will be used by arm64 to reject PROT_WRITE+PROT_EXEC mappings on
> > configurations that run with WXN enabled.
> >
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  include/linux/mman.h | 15 +++++++++++++++
> >  mm/mmap.c            |  3 +++
> >  2 files changed, 18 insertions(+)
> >
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index dc7048824be8..ec5e7f606e43 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -124,6 +124,21 @@ static inline bool arch_validate_flags(unsigned long flags)
> >  #define arch_validate_flags arch_validate_flags
> >  #endif
> >
> > +#ifndef arch_validate_mmap_prot
> > +/*
> > + * This is called from mmap(), which ignores unknown prot bits so the default
> > + * is to accept anything.
> > + *
> > + * Returns true if the prot flags are valid
> > + */
> > +static inline bool arch_validate_mmap_prot(unsigned long prot,
> > +                                        unsigned long addr)
> > +{
> > +     return true;
> > +}
> > +#define arch_validate_mmap_prot arch_validate_mmap_prot
> > +#endif
> > +
> >  /*
> >   * Optimisation macro.  It is equivalent to:
> >   *      (x & bit1) ? bit2 : 0
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index d89770eaab6b..977a8c3fd9f5 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1229,6 +1229,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> >               if (!(file && path_noexec(&file->f_path)))
> >                       prot |= PROT_EXEC;
> >
> > +     if (!arch_validate_mmap_prot(prot, addr))
> > +             return -EACCES;
> > +
> >       /* force arch specific MAP_FIXED handling in get_unmapped_area */
> >       if (flags & MAP_FIXED_NOREPLACE)
> >               flags |= MAP_FIXED;
>
> While writing the pull request for Linus (and looking to justify this
> change), I realised that we already have arch_validate_flags() that can
> do a similar check but on VM_* flags instead of PROT_* (we use it for
> VM_MTE checks). What was the reason for adding a new hook?

I did not consider arch_validate_flags() at all because I was looking
at ways to validate PROT_ flags not VM_ flags. But if
PROT_WRITE+PROT_EXEC implies VM_WRITE+VM_EXEC, I don't see why we
wouldn't be able to change this. That way, we can drop this patch
entirely afaict.
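
As a rough illustration of why a VM_*-based check would see the same request, here is a user-space sketch of the PROT_* to VM_* translation. It assumes VM_READ, VM_WRITE and VM_EXEC have the values 0x1, 0x2 and 0x4, and uses the plain "(x & bit1) ? bit2 : 0" form that the mman.h comment quoted above gives as the equivalent of the optimisation macro. The model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Assumed VM_* values, restated so the sketch builds outside the kernel. */
#define VM_READ		0x1UL
#define VM_WRITE	0x2UL
#define VM_EXEC		0x4UL

/* Plain form of the translation: (x & bit1) ? bit2 : 0 */
#define MODEL_VM_TRANS(x, bit1, bit2)	(((x) & (bit1)) ? (bit2) : 0UL)

/* Model of translating mmap() prot bits into vm_flags bits. */
unsigned long model_calc_vm_prot_bits(unsigned long prot)
{
	return MODEL_VM_TRANS(prot, PROT_READ,  VM_READ)  |
	       MODEL_VM_TRANS(prot, PROT_WRITE, VM_WRITE) |
	       MODEL_VM_TRANS(prot, PROT_EXEC,  VM_EXEC);
}

/* Hypothetical W+X check expressed on VM_* flags instead of PROT_*. */
bool model_validate_flags_wxn(unsigned long vm_flags)
{
	return (vm_flags & (VM_WRITE | VM_EXEC)) != (VM_WRITE | VM_EXEC);
}
```

In this model a request with PROT_WRITE|PROT_EXEC always yields VM_WRITE|VM_EXEC, so a check placed on the VM_* side rejects exactly the same requests.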

> The only
> difference is that here it returns -EACCES while on
> arch_validate_flags() failure it would return -EINVAL. It probably makes
> more sense to return -EACCES as it matches map_deny_write_exec() but with
> your patches we are inconsistent between mmap() and mprotect() errors
> (the latter is still -EINVAL).
>

Yes. Looking at it now, it would be better to add a single arch hook
to map_deny_write_exec(), and use that instead.

> It also got me thinking on whether we could use this as a hardened
> version of the MDWE feature instead of a CPU feature (though we'd end up
> context-switching this SCTLR_EL1 bit). I think your patches have been
> around before the MDWE feature was added to the kernel.
>

Indeed.

MDWE looks like a good match in principle, but combining the two seems tricky:
- EL1 is going to flip between WXN protection on and off depending on
which setting it is using for EL0;
- context switching SCTLR_EL1.WXN may become costly in terms of TLB
maintenance, unless we cheat and ignore the kernel mappings (which
should work as expected regardless of the value of SCTLR_EL1.WXN);

If we can find a reasonable strategy to manage the TLB maintenance
that does not leave EL1 behind, I'm happy to explore this further. But
perhaps WXN should simply be treated as MDWE always-on for all
processes.

> Sorry for not catching this early.
>

No worries - it's more likely to be useful if we get this right.


* Re: [PATCH v8 42/43] mm: add arch hook to validate mmap() prot flags
  2024-03-12 23:23     ` Ard Biesheuvel
@ 2024-03-13 10:47       ` Catalin Marinas
  2024-03-13 11:45         ` Ard Biesheuvel
  0 siblings, 1 reply; 59+ messages in thread
From: Catalin Marinas @ 2024-03-13 10:47 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Mark Rutland,
	Ryan Roberts, Anshuman Khandual, Kees Cook, Joey Gouly

On Wed, Mar 13, 2024 at 12:23:24AM +0100, Ard Biesheuvel wrote:
> On Tue, 12 Mar 2024 at 20:53, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Wed, Feb 14, 2024 at 01:29:28PM +0100, Ard Biesheuvel wrote:
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index d89770eaab6b..977a8c3fd9f5 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1229,6 +1229,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > >               if (!(file && path_noexec(&file->f_path)))
> > >                       prot |= PROT_EXEC;
> > >
> > > +     if (!arch_validate_mmap_prot(prot, addr))
> > > +             return -EACCES;
> > > +
> > >       /* force arch specific MAP_FIXED handling in get_unmapped_area */
> > >       if (flags & MAP_FIXED_NOREPLACE)
> > >               flags |= MAP_FIXED;
> >
> > While writing the pull request for Linus (and looking to justify this
> > change), I realised that we already have arch_validate_flags() that can
> > do a similar check but on VM_* flags instead of PROT_* (we use it for
> > VM_MTE checks). What was the reason for adding a new hook?
[...]
> > The only
> > difference is that here it returns -EACCES while on
> > arch_validate_flags() failure it would return -EINVAL. It probably makes
> > more sense to return -EACCES as it matches map_deny_write_exec() but with
> > your patches we are inconsistent between mmap() and mprotect() errors
> > (the latter is still -EINVAL).
> 
> Yes. Looking at it now, it would be better to add a single arch hook
> to map_deny_write_exec(), and use that instead.

This would work and matches the API better. Another way to look at the
arm64 WXN feature is to avoid bothering with the checks knowing
that the hardware enforces XN when a permission is W. So it can be seen
as a choice irrespective of the user PROT_EXEC|PROT_WRITE permission.
But it's still an ABI change, so I guess better to be upfront with the
user and reject such mmap()/mprotect() permission combinations.

However, I've been looking through specs and realised that SCTLR_ELx.WXN
is RES0 when the permission indirection is enabled (FEAT_PIE from the
2022 specs, hopefully you have access to it). And while apparently WXN
gets better as it allows separate EL0/EL1 controls, it seems to only
apply when the base permission is RWX and the XN is toggled based on the
overlay permission (pkeys which Joey is working on). So it looks like
what the architects had in mind is optimising RW/RX switching via
overlays (no syscalls) but keeping the base permission RWX. The
traditional WXN hardening via SCTLR_EL1 disappeared.

(adding Joey to the thread, he contributed the PIE support)

> > It also got me thinking on whether we could use this as a hardened
> > version of the MDWE feature instead of a CPU feature (though we'd end up
> > context-switching this SCTLR_EL1 bit). I think your patches have been
> > around before the MDWE feature was added to the kernel.
> 
> Indeed.
> 
> MDWE looks like a good match in principle, but combining the two seems tricky:
> - EL1 is going to flip between WXN protection on and off depending on
> which setting it is using for EL0;
> - context switching SCTLR_EL1.WXN may become costly in terms of TLB
> maintenance, unless we cheat and ignore the kernel mappings (which
> should work as expected regardless of the value of SCTLR_EL1.WXN);
> 
> If we can find a reasonable strategy to manage the TLB maintenance
> that does not leave EL1 behind, I'm happy to explore this further. But
> perhaps WXN should simply be treated as MDWE always-on for all
> processes.

Ah, I did not realise this bit can be cached in the TLB. So this doesn't
really work (and the Arm ARM is also vague about whether it's cached per
ASID or not).

> > Sorry for not catching this early.
> 
> No worries - it's more likely to be useful if we get this right.

Given that with PIE this feature no longer works, I'll revert these two
patches for now. We should revisit it in combination with PIE and POE
(overlays) but it does look like the semantics are slightly different
(unless I misread the specs). If we want a global MDWE=on on the command
line, this can be generic.

-- 
Catalin


* Re: [PATCH v8 42/43] mm: add arch hook to validate mmap() prot flags
  2024-03-13 10:47       ` Catalin Marinas
@ 2024-03-13 11:45         ` Ard Biesheuvel
  2024-03-13 15:31           ` Catalin Marinas
  0 siblings, 1 reply; 59+ messages in thread
From: Ard Biesheuvel @ 2024-03-13 11:45 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Mark Rutland,
	Ryan Roberts, Anshuman Khandual, Kees Cook, Joey Gouly

On Wed, 13 Mar 2024 at 11:47, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Wed, Mar 13, 2024 at 12:23:24AM +0100, Ard Biesheuvel wrote:
> > On Tue, 12 Mar 2024 at 20:53, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Wed, Feb 14, 2024 at 01:29:28PM +0100, Ard Biesheuvel wrote:
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index d89770eaab6b..977a8c3fd9f5 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -1229,6 +1229,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > > >               if (!(file && path_noexec(&file->f_path)))
> > > >                       prot |= PROT_EXEC;
> > > >
> > > > +     if (!arch_validate_mmap_prot(prot, addr))
> > > > +             return -EACCES;
> > > > +
> > > >       /* force arch specific MAP_FIXED handling in get_unmapped_area */
> > > >       if (flags & MAP_FIXED_NOREPLACE)
> > > >               flags |= MAP_FIXED;
> > >
> > > While writing the pull request for Linus (and looking to justify this
> > > change), I realised that we already have arch_validate_flags() that can
> > > do a similar check but on VM_* flags instead of PROT_* (we use it for
> > > VM_MTE checks). What was the reason for adding a new hook?
> [...]
> > > The only
> > > difference is that here it returns -EACCES while on
> > > arch_validate_flags() failure it would return -EINVAL. It probably makes
> > > more sense to return -EACCES as it matches map_deny_write_exec() but with
> > > your patches we are inconsistent between mmap() and mprotect() errors
> > > (the latter is still -EINVAL).
> >
> > Yes. Looking at it now, it would be better to add a single arch hook
> > to map_deny_write_exec(), and use that instead.
>
> This would work and matches the API better. Another way to look at the
> arm64 WXN feature is to avoid bothering with the checks knowing
> that the hardware enforces XN when a permission is W. So it can be seen
> as a choice irrespective of the user PROT_EXEC|PROT_WRITE permission.
> But it's still an ABI change, so I guess better to be upfront with the
> user and reject such mmap()/mprotect() permission combinations.
>

Yes, that was the idea in the original patch.

> However, I've been looking through specs and realised that SCTLR_ELx.WXN
> is RES0 when the permission indirection is enabled (FEAT_PIE from the
> 2022 specs, hopefully you have access to it).

The latest public version of the ARM ARM does not cover FEAT_PIE at all.

> And while apparently WXN
> gets better as it allows separate EL0/EL1 controls, it seems to only
> apply when the base permission is RWX and the XN is toggled based on the
> overlay permission (pkeys which Joey is working on). So it looks like
> what the architects had in mind is optimising RW/RX switching via
> overlays (no syscalls) but keeping the base permission RWX. The
> traditional WXN hardening via SCTLR_EL1 disappeared.
>
> (adding Joey to the thread, he contributed the PIE support)
>

PIE sounds useful to implement things like JITs in user space, where
you want a certain mapping to transition to RW while all other CPUs
retain RX access concurrently.

WXN is intended to be static, where a single bit sets the system-wide
policy for all kernel and user space code.

It's rather unfortunate that FEAT_PIE relies on RWX mappings and
therefore needs to deprecate WXN entirely. It would have been nice to
have something like this for the kernel, which never has a need for
RWX mappings or transitioning mappings between RX and RW like that,
and so overlays don't seem to be a great fit.

> > > It also got me thinking on whether we could use this as a hardened
> > > version of the MDWE feature instead of a CPU feature (though we'd end up
> > > context-switching this SCTLR_EL1 bit). I think your patches have been
> > > around before the MDWE feature was added to the kernel.
> >
> > Indeed.
> >
> > MDWE looks like a good match in principle, but combining the two seems tricky:
> > - EL1 is going to flip between WXN protection on and off depending on
> > which setting it is using for EL0;
> > - context switching SCTLR_EL1.WXN may become costly in terms of TLB
> > maintenance, unless we cheat and ignore the kernel mappings (which
> > should work as expected regardless of the value of SCTLR_EL1.WXN);
> >
> > If we can find a reasonable strategy to manage the TLB maintenance
> > that does not leave EL1 behind, I'm happy to explore this further. But
> > perhaps WXN should simply be treated as MDWE always-on for all
> > processes.
>
> Ah, I did not realise this bit can be cached in the TLB. So this doesn't
> really work (and the Arm ARM is also vague about whether it's cached per
> ASID or not).
>
> > > Sorry for not catching this early.
> >
> > No worries - it's more likely to be useful if we get this right.
>
> Given that with PIE this feature no longer works, I'll revert these two
> patches for now. We should revisit it in combination with PIE and POE
> (overlays) but it does look like the semantics are slightly different
> (unless I misread the specs). If we want a global MDWE=on on the command
> line, this can be generic.
>

I looked into this a bit more, and MDWE is a bit stricter than WXN,
and therefore less suitable for enabling system-wide. It disallows
adding executable permissions entirely, as well as adding write
permissions to a mapping that is already executable. WXN just
disallows setting both at the same time.

So using the same hook makes sense, but combining the logic beyond
that seems problematic too.
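
To spell out the difference, here is a small user-space model of the two policies as described above. The VM_* values and the model_* helpers are assumptions for illustration; for a fresh mmap() the old and new flags are taken to be identical, so only mprotect()-style transitions can trip the "adding exec" rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed VM_* values, restated so the sketch builds outside the kernel. */
#define VM_READ		0x1UL
#define VM_WRITE	0x2UL
#define VM_EXEC		0x4UL

/* WXN-style policy: only the combination of W and X is refused. */
bool model_wxn_denies(unsigned long new_flags)
{
	return (new_flags & VM_WRITE) && (new_flags & VM_EXEC);
}

/*
 * MDWE-style policy: refuses W+X, and also refuses making a mapping
 * executable if it was not executable before.
 */
bool model_mdwe_denies(unsigned long old_flags, unsigned long new_flags)
{
	if ((new_flags & VM_WRITE) && (new_flags & VM_EXEC))
		return true;
	if (!(old_flags & VM_EXEC) && (new_flags & VM_EXEC))
		return true;
	return false;
}
```

The case that separates the two is the RW to RX transition: the MDWE model refuses it, whereas the WXN model accepts it because W and X are never set at the same time.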

I'll code it up in any case to see what it looks like.


* Re: [PATCH v8 42/43] mm: add arch hook to validate mmap() prot flags
  2024-03-13 11:45         ` Ard Biesheuvel
@ 2024-03-13 15:31           ` Catalin Marinas
  0 siblings, 0 replies; 59+ messages in thread
From: Catalin Marinas @ 2024-03-13 15:31 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Mark Rutland,
	Ryan Roberts, Anshuman Khandual, Kees Cook, Joey Gouly

On Wed, Mar 13, 2024 at 12:45:22PM +0100, Ard Biesheuvel wrote:
> On Wed, 13 Mar 2024 at 11:47, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > However, I've been looking through specs and realised that SCTLR_ELx.WXN
> > is RES0 when the permission indirection is enabled (FEAT_PIE from the
> > 2022 specs, hopefully you have access to it).
> 
> The latest public version of the ARM ARM does not cover FEAT_PIE at all.

According to Mark R, the next version should be out soon. The xml
tarball for the 2022 extensions doesn't have any new text for
SCTLR_ELx.WXN field either. I could only find it in the engineering spec
which isn't public:

  When Stage1 Base Permissions uses the Indirect Permission Scheme,
  SCTLR_ELx.WXN has no effect and is RES 0.

> > And while apparently WXN
> > gets better as it allows separate EL0/EL1 controls, it seems to only
> > apply when the base permission is RWX and the XN is toggled based on the
> > overlay permission (pkeys which Joey is working on). So it looks like
> > what the architects had in mind is optimising RW/RX switching via
> > overlays (no syscalls) but keeping the base permission RWX. The
> > traditional WXN hardening via SCTLR_EL1 disappeared.
> >
> > (adding Joey to the thread, he contributed the PIE support)
> 
> PIE sounds useful to implement things like JITs in user space, where
> you want a certain mapping to transition to RW while all other CPUs
> retain RX access concurrently.
> 
> WXN is intended to be static, where a single bit sets the system-wide
> policy for all kernel and user space code.

I agree. I guess no-one used the current WXN and the architects decided
to deprecate it.

> It's rather unfortunate that FEAT_PIE relies on RWX mappings and
> therefore needs to deprecate WXN entirely. It would have been nice to
> have something like this for the kernel, which never has a need for
> RWX mappings or transitioning mappings between RX and RW like that,
> and so overlays don't seem to be a great fit.

Indeed. It looks more of a risk to somehow use WXN in the kernel in
combination with overlays because of the RWX permission.

> I looked into this a bit more, and MDWE is a bit stricter than WXN,
> and therefore less suitable for enabling system-wide. It disallows
> adding executable permissions entirely, as well as adding write
> permissions to a mapping that is already executable. WXN just
> disallows setting both at the same time.

With MDWE, we tried to copy the semantics of the BPF variant. It allows
mmap(PROT_EXEC) but not mprotect(PROT_EXEC). But I agree, it's slightly
different than your proposed WXN.

-- 
Catalin


end of thread, other threads:[~2024-03-13 15:31 UTC | newest]

Thread overview: 59+ messages
2024-02-14 12:28 [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 01/43] arm64: kernel: Manage absolute relocations in code built under pi/ Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 02/43] arm64: kernel: Don't rely on objcopy to make code under pi/ __init Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 03/43] arm64: head: move relocation handling to C code Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 04/43] arm64: idreg-override: Move to early mini C runtime Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 05/43] arm64: kernel: Remove early fdt remap code Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 06/43] arm64: head: Clear BSS and the kernel page tables in one go Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 07/43] arm64: Move feature overrides into the BSS section Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 08/43] arm64: head: Run feature override detection before mapping the kernel Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 09/43] arm64: head: move dynamic shadow call stack patching into early C runtime Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 10/43] arm64: cpufeature: Add helper to test for CPU feature overrides Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 11/43] arm64: kaslr: Use feature override instead of parsing the cmdline again Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 12/43] arm64: idreg-override: Create a pseudo feature for rodata=off Ard Biesheuvel
2024-02-14 12:28 ` [PATCH v8 13/43] arm64: Add helpers to probe local CPU for PAC and BTI support Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 14/43] arm64: head: allocate more pages for the kernel mapping Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 15/43] arm64: head: move memstart_offset_seed handling to C code Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 16/43] arm64: mm: Make kaslr_requires_kpti() a static inline Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 17/43] arm64: mmu: Make __cpu_replace_ttbr1() out of line Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 18/43] arm64: head: Move early kernel mapping routines into C code Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 19/43] arm64: mm: Use 48-bit virtual addressing for the permanent ID map Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 20/43] arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 21/43] arm64: kernel: Create initial ID map from C code Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 22/43] arm64: mm: avoid fixmap for early swapper_pg_dir updates Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 23/43] arm64: mm: omit redundant remap of kernel image Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 24/43] arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()" Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 25/43] arm64: mm: Handle LVA support as a CPU feature Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 26/43] arm64: mm: Add feature override support for LVA Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 27/43] arm64: Avoid #define'ing PTE_MAYBE_NG to 0x0 for asm use Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 28/43] arm64: Add ESR decoding for exceptions involving translation level -1 Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 29/43] arm64: mm: Wire up TCR.DS bit to PTE shareability fields Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 30/43] arm64: mm: Add LPA2 support to phys<->pte conversion routines Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 31/43] arm64: mm: Add definitions to support 5 levels of paging Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 32/43] arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 33/43] arm64: Enable LPA2 at boot if supported by the system Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 34/43] arm64: mm: Add 5 level paging support to fixmap and swapper handling Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 35/43] arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime Ard Biesheuvel
2024-02-29 14:17   ` Ryan Roberts
2024-02-29 23:01     ` Nathan Chancellor
2024-03-01  8:54       ` Ryan Roberts
2024-03-01  9:10         ` Ard Biesheuvel
2024-03-01  9:37           ` Ard Biesheuvel
2024-03-01  9:47             ` Ryan Roberts
2024-03-01 10:22               ` Ryan Roberts
2024-02-14 12:29 ` [PATCH v8 37/43] arm64: ptdump: Disregard unaddressable VA space Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 38/43] arm64: ptdump: Deal with translation levels folded at runtime Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 39/43] arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 40/43] arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 41/43] arm64: defconfig: Enable LPA2 support Ard Biesheuvel
2024-02-14 12:29 ` [PATCH v8 42/43] mm: add arch hook to validate mmap() prot flags Ard Biesheuvel
2024-03-12 19:53   ` Catalin Marinas
2024-03-12 23:23     ` Ard Biesheuvel
2024-03-13 10:47       ` Catalin Marinas
2024-03-13 11:45         ` Ard Biesheuvel
2024-03-13 15:31           ` Catalin Marinas
2024-02-14 12:29 ` [PATCH v8 43/43] arm64: mm: add support for WXN memory translation attribute Ard Biesheuvel
2024-02-16 17:35 ` [PATCH v8 00/43] arm64: Add support for LPA2 and WXN at stage 1 Catalin Marinas
2024-02-16 18:23   ` Ard Biesheuvel
2024-02-16 22:34     ` Ard Biesheuvel
