linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/5] x86: Build the core kernel using PIC codegen
@ 2024-01-22  9:08 Ard Biesheuvel
  2024-01-22  9:08 ` [RFC PATCH 1/5] kallsyms: Avoid weak references for kallsyms symbols Ard Biesheuvel
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2024-01-22  9:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ard Biesheuvel, Kevin Loughlin, Tom Lendacky, Dionna Glaze,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andy Lutomirski, Arnd Bergmann, Martin KaFai Lau,
	Nathan Chancellor, Nick Desaulniers, Justin Stitt, linux-arch,
	bpf, llvm

From: Ard Biesheuvel <ardb@kernel.org>

Originally, only arch/x86/kernel/head64.c had some code that required
special care because it executes very early from the 1:1 mapping of the
kernel rather than the ordinary kernel virtual mapping.

This is no longer the case, and there is a lot of SEV-related code that
is reachable from the primary startup path, with no guarantee that the
toolchain will produce code that runs correctly. This is especially
problematic when it comes to things like string literals, which are
emitted by the compiler as data objects and subsequently referenced via
an absolute address that is not yet mapped this early in the boot [0].
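
For illustration only (early_fail() and the calling context are made up,
not code from the tree), this is the kind of construct that breaks when
it is reached from the 1:1 mapping:

  extern void early_fail(const char *msg);	/* hypothetical helper */

  static void report_error(void)
  {
          /*
           * With position dependent codegen, the compiler may emit the
           * string literal as an object in .rodata and pass its absolute
           * kernel virtual address. That address is not mapped yet when
           * this runs from the 1:1 mapping, so dereferencing it faults.
           */
          early_fail("SEV feature check failed\n");
  }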

Kevin has been looking into failures resulting from the fact that Clang
behaves slightly differently from GCC in this regard, and has addressed
them by selectively applying PIC codegen to the objects in question [1].
However, while this fixes the observed issues, it does not offer any
guarantees, given that the set of code reachable from startup_64() does
not appear to be bounded when running on SEV hardware.

Instead of applying this change piecemeal to objects that happen to have
caused issues in the past, this series converts the core kernel to PIC
codegen entirely.

Note that this does not entirely solve the problem of the unbounded set
of reachable code from the early SEV entrypoint: there might be code
that attempts to access global objects via their kernel virtual address
(which is not mapped yet). But at least all implicit accesses will be
made via the same translation that the code is running from.
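
To illustrate the remaining hazard (again with made-up symbols, purely
for illustration):

  static int early_table[16];
  static int *early_table_ptr = early_table;	/* value is a kernel VA */

  static int read_first_entry(void)
  {
          /*
           * The load of early_table_ptr itself becomes RIP-relative with
           * PIC codegen, but the pointer value it holds is the kernel
           * virtual address assigned at link time, which is not mapped
           * yet when executing from the 1:1 mapping.
           */
          return *early_table_ptr;
  }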
 
This does result in a slight increase in code size (see below) but it
also reduces the size of the KASLR relocation table (applied by the
decompressor) by roughly half.


Before

$ size -x vmlinux
   text	   data	    bss	    dec	    hex	filename
0x1b78ec1	0xdde145	0x381000	47022086	2cd8006	vmlinux

After

$ size -x vmlinux
   text	   data	    bss	    dec	    hex	filename
0x1b8371b	0xde0d1d	0x370000	47006776	2cd4438	vmlinux


[0] arch/x86/mm/mem_encrypt_identity.c has some nice examples of this,
    where RIP-relative references are emitted using inline asm (sketched
    below).

[1] https://lkml.kernel.org/r/20240111223650.3502633-1-kevinloughlin%40google.com
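
The kind of inline asm referred to in [0] looks roughly like this (quoted
from memory, so treat it as a sketch rather than an exact excerpt):

  /* force a RIP-relative reference to a global, bypassing the compiler */
  asm ("lea sme_cmdline_arg(%%rip), %0"
       : "=r" (cmdline_arg)
       : "p" (sme_cmdline_arg));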

Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Dionna Glaze <dionnaglaze@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev

Ard Biesheuvel (5):
  kallsyms: Avoid weak references for kallsyms symbols
  vmlinux: Avoid weak reference to notes section
  btf: Avoid weak external references
  x86/head64: Replace pointer fixups with PIE codegen
  x86: Build the core kernel with position independent codegen

 arch/x86/Makefile                 |  18 ++-
 arch/x86/boot/compressed/Makefile |   2 +-
 arch/x86/entry/vdso/Makefile      |   2 +-
 arch/x86/include/asm/init.h       |   2 -
 arch/x86/include/asm/setup.h      |   2 +-
 arch/x86/kernel/head64.c          | 117 +++++++-------------
 arch/x86/realmode/rm/Makefile     |   1 +
 include/asm-generic/vmlinux.lds.h |  23 ++++
 kernel/bpf/btf.c                  |   4 +-
 kernel/kallsyms.c                 |   6 -
 kernel/kallsyms_internal.h        |  30 ++---
 kernel/ksysfs.c                   |   4 +-
 lib/buildid.c                     |   4 +-
 13 files changed, 104 insertions(+), 111 deletions(-)

-- 
2.43.0.429.g432eaa2c6b-goog

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH 1/5] kallsyms: Avoid weak references for kallsyms symbols
  2024-01-22  9:08 [RFC PATCH 0/5] x86: Build the core kernel using PIC codegen Ard Biesheuvel
@ 2024-01-22  9:08 ` Ard Biesheuvel
  2024-01-22  9:08 ` [RFC PATCH 2/5] vmlinux: Avoid weak reference to notes section Ard Biesheuvel
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2024-01-22  9:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ard Biesheuvel, Kevin Loughlin, Tom Lendacky, Dionna Glaze,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andy Lutomirski, Arnd Bergmann, Martin KaFai Lau,
	Nathan Chancellor, Nick Desaulniers, Justin Stitt, linux-arch,
	bpf, llvm, Kees Cook

From: Ard Biesheuvel <ardb@kernel.org>

kallsyms is a directory of all the symbols in the vmlinux binary, and so
creating it is somewhat of a chicken-and-egg problem, as its non-zero
size affects the layout of the binary, and therefore the values of the
symbols.

For this reason, the kernel is linked more than once, and the first pass
does not include any kallsyms data at all. For the linker to accept
this, the symbol declarations describing the kallsyms metadata are
emitted as having weak linkage, so they can remain unsatisfied. During
the subsequent passes, the weak references are satisfied by the kallsyms
metadata that was constructed based on information gathered from the
preceding passes.

Weak references lead to somewhat worse codegen, because taking their
address may need to produce NULL (if the reference remained
unsatisfied), which is not usually possible with RIP- or PC-relative
symbol references.
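
As a concrete illustration (not part of the patch) of the codegen
difference:

  extern const int kallsyms_offsets[] __attribute__((weak));

  const int *get_offsets(void)
  {
          /*
           * Because the weak reference may remain unsatisfied, its address
           * may have to evaluate to NULL, so position independent codegen
           * loads it from a GOT slot. A non-weak declaration (backed by the
           * PROVIDE() fallback added below) allows a plain RIP-relative
           * lea, which can never yield NULL.
           */
          return kallsyms_offsets;
  }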

Given that these references are ultimately always satisfied in the final
link, let's drop the weak annotation, and instead, provide fallback
definitions in the linker script that are only emitted if an unsatisfied
reference exists.

While at it, drop the FRV-specific annotation that these symbols reside
in .rodata - FRV is long gone.

Tested-by: Nick Desaulniers <ndesaulniers@google.com> # Boot
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20230504174320.3930345-1-ardb%40kernel.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/asm-generic/vmlinux.lds.h | 19 +++++++++++++
 kernel/kallsyms.c                 |  6 ----
 kernel/kallsyms_internal.h        | 30 ++++++++------------
 3 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 5dd3a61d673d..a39e050416c7 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -448,11 +448,30 @@
 #endif
 #endif
 
+/*
+ * Some symbol definitions will not exist yet during the first pass of the
+ * link, but are guaranteed to exist in the final link. Provide preliminary
+ * definitions that will be superseded in the final link to avoid having to
+ * rely on weak external linkage, which requires a GOT when used in position
+ * independent code.
+ */
+#define PRELIMINARY_SYMBOL_DEFINITIONS					\
+	PROVIDE(kallsyms_addresses = .);				\
+	PROVIDE(kallsyms_offsets = .);					\
+	PROVIDE(kallsyms_names = .);					\
+	PROVIDE(kallsyms_num_syms = .);					\
+	PROVIDE(kallsyms_relative_base = .);				\
+	PROVIDE(kallsyms_token_table = .);				\
+	PROVIDE(kallsyms_token_index = .);				\
+	PROVIDE(kallsyms_markers = .);					\
+	PROVIDE(kallsyms_seqs_of_names = .);
+
 /*
  * Read only Data
  */
 #define RO_DATA(align)							\
 	. = ALIGN((align));						\
+	PRELIMINARY_SYMBOL_DEFINITIONS					\
 	.rodata           : AT(ADDR(.rodata) - LOAD_OFFSET) {		\
 		__start_rodata = .;					\
 		*(.rodata) *(.rodata.*)					\
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 18edd57b5fe8..22ea19a36e6e 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -325,12 +325,6 @@ static unsigned long get_symbol_pos(unsigned long addr,
 	unsigned long symbol_start = 0, symbol_end = 0;
 	unsigned long i, low, high, mid;
 
-	/* This kernel should never had been booted. */
-	if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
-		BUG_ON(!kallsyms_addresses);
-	else
-		BUG_ON(!kallsyms_offsets);
-
 	/* Do a binary search on the sorted kallsyms_addresses array. */
 	low = 0;
 	high = kallsyms_num_syms;
diff --git a/kernel/kallsyms_internal.h b/kernel/kallsyms_internal.h
index 27fabdcc40f5..85480274fc8f 100644
--- a/kernel/kallsyms_internal.h
+++ b/kernel/kallsyms_internal.h
@@ -5,27 +5,21 @@
 #include <linux/types.h>
 
 /*
- * These will be re-linked against their real values
- * during the second link stage.
+ * These will be re-linked against their real values during the second link
+ * stage. Preliminary values must be provided in the linker script using the
+ * PROVIDE() directive so that the first link stage can complete successfully.
  */
-extern const unsigned long kallsyms_addresses[] __weak;
-extern const int kallsyms_offsets[] __weak;
-extern const u8 kallsyms_names[] __weak;
+extern const unsigned long kallsyms_addresses[];
+extern const int kallsyms_offsets[];
+extern const u8 kallsyms_names[];
 
-/*
- * Tell the compiler that the count isn't in the small data section if the arch
- * has one (eg: FRV).
- */
-extern const unsigned int kallsyms_num_syms
-__section(".rodata") __attribute__((weak));
-
-extern const unsigned long kallsyms_relative_base
-__section(".rodata") __attribute__((weak));
+extern const unsigned int kallsyms_num_syms;
+extern const unsigned long kallsyms_relative_base;
 
-extern const char kallsyms_token_table[] __weak;
-extern const u16 kallsyms_token_index[] __weak;
+extern const char kallsyms_token_table[];
+extern const u16 kallsyms_token_index[];
 
-extern const unsigned int kallsyms_markers[] __weak;
-extern const u8 kallsyms_seqs_of_names[] __weak;
+extern const unsigned int kallsyms_markers[];
+extern const u8 kallsyms_seqs_of_names[];
 
 #endif // LINUX_KALLSYMS_INTERNAL_H_
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 2/5] vmlinux: Avoid weak reference to notes section
  2024-01-22  9:08 [RFC PATCH 0/5] x86: Build the core kernel using PIC codegen Ard Biesheuvel
  2024-01-22  9:08 ` [RFC PATCH 1/5] kallsyms: Avoid weak references for kallsyms symbols Ard Biesheuvel
@ 2024-01-22  9:08 ` Ard Biesheuvel
  2024-01-22  9:08 ` [RFC PATCH 3/5] btf: Avoid weak external references Ard Biesheuvel
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2024-01-22  9:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ard Biesheuvel, Kevin Loughlin, Tom Lendacky, Dionna Glaze,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andy Lutomirski, Arnd Bergmann, Martin KaFai Lau,
	Nathan Chancellor, Nick Desaulniers, Justin Stitt, linux-arch,
	bpf, llvm

From: Ard Biesheuvel <ardb@kernel.org>

Weak references are references that are permitted to remain unsatisfied
in the final link. This means they cannot be implemented using
place-relative relocations, and so they result in GOT entries when using
position-independent code generation.

The notes section should always exist, so the weak annotations can be
omitted.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 kernel/ksysfs.c | 4 ++--
 lib/buildid.c   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 1d4bc493b2f4..347beb763c59 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -226,8 +226,8 @@ KERNEL_ATTR_RW(rcu_normal);
 /*
  * Make /sys/kernel/notes give the raw contents of our kernel .notes section.
  */
-extern const void __start_notes __weak;
-extern const void __stop_notes __weak;
+extern const void __start_notes;
+extern const void __stop_notes;
 #define	notes_size (&__stop_notes - &__start_notes)
 
 static ssize_t notes_read(struct file *filp, struct kobject *kobj,
diff --git a/lib/buildid.c b/lib/buildid.c
index e3a7acdeef0e..15109fe191ae 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -182,8 +182,8 @@ unsigned char vmlinux_build_id[BUILD_ID_SIZE_MAX] __ro_after_init;
  */
 void __init init_vmlinux_build_id(void)
 {
-	extern const void __start_notes __weak;
-	extern const void __stop_notes __weak;
+	extern const void __start_notes;
+	extern const void __stop_notes;
 	unsigned int size = &__stop_notes - &__start_notes;
 
 	build_id_parse_buf(&__start_notes, vmlinux_build_id, size);
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 3/5] btf: Avoid weak external references
  2024-01-22  9:08 [RFC PATCH 0/5] x86: Build the core kernel using PIC codegen Ard Biesheuvel
  2024-01-22  9:08 ` [RFC PATCH 1/5] kallsyms: Avoid weak references for kallsyms symbols Ard Biesheuvel
  2024-01-22  9:08 ` [RFC PATCH 2/5] vmlinux: Avoid weak reference to notes section Ard Biesheuvel
@ 2024-01-22  9:08 ` Ard Biesheuvel
  2024-01-22  9:08 ` [RFC PATCH 4/5] x86/head64: Replace pointer fixups with PIE codegen Ard Biesheuvel
  2024-01-22  9:08 ` [RFC PATCH 5/5] x86: Build the core kernel with position independent codegen Ard Biesheuvel
  4 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2024-01-22  9:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ard Biesheuvel, Kevin Loughlin, Tom Lendacky, Dionna Glaze,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andy Lutomirski, Arnd Bergmann, Martin KaFai Lau,
	Nathan Chancellor, Nick Desaulniers, Justin Stitt, linux-arch,
	bpf, llvm

From: Ard Biesheuvel <ardb@kernel.org>

If the BTF code is enabled in the build configuration, the start/stop
BTF markers are guaranteed to exist in the final link but not during the
first linker pass.

Avoid GOT-based relocations to these markers in the final executable by
providing preliminary definitions that will be used by the first linker
pass, and that are superseded by the actual definitions in subsequent
passes.
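
For context, the markers are consumed roughly like this (a simplified
sketch of the vmlinux BTF setup path, with a made-up helper name):

  extern char __start_BTF[], __stop_BTF[];

  static unsigned long vmlinux_btf_size(void)
  {
          /*
           * With weak markers, both addresses have to be loaded via the
           * GOT (either one could be NULL); with the PROVIDE() fallbacks
           * they resolve to ordinary RIP-relative references.
           */
          return __stop_BTF - __start_BTF;
  }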

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/asm-generic/vmlinux.lds.h | 2 ++
 kernel/bpf/btf.c                  | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index a39e050416c7..ef45331fb043 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -456,6 +456,8 @@
  * independent code.
  */
 #define PRELIMINARY_SYMBOL_DEFINITIONS					\
+	PROVIDE(__start_BTF = .);					\
+	PROVIDE(__stop_BTF = .);					\
 	PROVIDE(kallsyms_addresses = .);				\
 	PROVIDE(kallsyms_offsets = .);					\
 	PROVIDE(kallsyms_names = .);					\
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 596471189176..a659fc7045bb 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5581,8 +5581,8 @@ static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uat
 	return ERR_PTR(err);
 }
 
-extern char __weak __start_BTF[];
-extern char __weak __stop_BTF[];
+extern char __start_BTF[];
+extern char __stop_BTF[];
 extern struct btf *btf_vmlinux;
 
 #define BPF_MAP_TYPE(_id, _ops)
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 4/5] x86/head64: Replace pointer fixups with PIE codegen
  2024-01-22  9:08 [RFC PATCH 0/5] x86: Build the core kernel using PIC codegen Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2024-01-22  9:08 ` [RFC PATCH 3/5] btf: Avoid weak external references Ard Biesheuvel
@ 2024-01-22  9:08 ` Ard Biesheuvel
  2024-01-22 19:34   ` Brian Gerst
  2024-01-22  9:08 ` [RFC PATCH 5/5] x86: Build the core kernel with position independent codegen Ard Biesheuvel
  4 siblings, 1 reply; 9+ messages in thread
From: Ard Biesheuvel @ 2024-01-22  9:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ard Biesheuvel, Kevin Loughlin, Tom Lendacky, Dionna Glaze,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andy Lutomirski, Arnd Bergmann, Martin KaFai Lau,
	Nathan Chancellor, Nick Desaulniers, Justin Stitt, linux-arch,
	bpf, llvm

From: Ard Biesheuvel <ardb@kernel.org>

Some of the C code in head64.c may be called from a different virtual
address than it was linked at. Currently, we deal with this by using
ordinary, position dependent codegen, and fixing up all symbol
references on the fly. This is fragile and tricky to maintain. It is
also unnecessary: we can use position independent codegen (with hidden
visibility) to ensure that all compiler generated symbol references are
RIP-relative, removing the need for fixups entirely.
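
For reference, a rough sketch of what the PIE plus hidden visibility
combination achieves (example_global is made up; include/linux/hidden.h,
pulled in via -include below, essentially applies this pragma to every
declaration it covers):

  #pragma GCC visibility push(hidden)

  extern unsigned long example_global;

  unsigned long get_example(void)
  {
          /*
           * With -fpie alone, an extern with default visibility may still
           * be accessed through a GOT entry; with hidden visibility the
           * compiler can emit a direct RIP-relative access such as
           *      movq    example_global(%rip), %rax
           */
          return example_global;
  }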

It does mean we need explicit references to kernel virtual addresses to
be generated by hand, so generate those using a movabs instruction in
inline asm in the handful of places where we actually need this.

While at it, move these routines to .init.text where they belong.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/Makefile                 |  11 ++
 arch/x86/boot/compressed/Makefile |   2 +-
 arch/x86/include/asm/init.h       |   2 -
 arch/x86/include/asm/setup.h      |   2 +-
 arch/x86/kernel/Makefile          |   4 +
 arch/x86/kernel/head64.c          | 117 +++++++-------------
 6 files changed, 60 insertions(+), 78 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1a068de12a56..bed0850d91b0 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -168,6 +168,17 @@ else
         KBUILD_CFLAGS += -mcmodel=kernel
         KBUILD_RUSTFLAGS += -Cno-redzone=y
         KBUILD_RUSTFLAGS += -Ccode-model=kernel
+
+	PIE_CFLAGS := -fpie -mcmodel=small \
+		      -include $(srctree)/include/linux/hidden.h
+
+	ifeq ($(CONFIG_STACKPROTECTOR),y)
+		ifeq ($(CONFIG_SMP),y)
+			PIE_CFLAGS += -mstack-protector-guard-reg=gs
+		endif
+	endif
+
+	export PIE_CFLAGS
 endif
 
 #
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index f19c038409aa..bccee07eae60 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -84,7 +84,7 @@ LDFLAGS_vmlinux += -T
 hostprogs	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABbCDGRSTtVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
 
 quiet_cmd_voffset = VOFFSET $@
       cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index cc9ccf61b6bd..5f1d3c421f68 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -2,8 +2,6 @@
 #ifndef _ASM_X86_INIT_H
 #define _ASM_X86_INIT_H
 
-#define __head	__section(".head.text")
-
 struct x86_mapping_info {
 	void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
 	void *context;			 /* context for alloc_pgt_page */
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 5c83729c8e71..dc657b1d7e14 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -48,7 +48,7 @@ extern unsigned long saved_video_mode;
 extern void reserve_standard_io_resources(void);
 extern void i386_reserve_resources(void);
 extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
-extern void startup_64_setup_env(unsigned long physbase);
+extern void startup_64_setup_env(void);
 extern void early_setup_idt(void);
 extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0000325ab98f..65194ca79b5c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -21,6 +21,10 @@ CFLAGS_REMOVE_sev.o = -pg
 CFLAGS_REMOVE_rethook.o = -pg
 endif
 
+# head64.c contains C code that may execute from a different virtual address
+# than it was linked at, so we always build it using PIE codegen
+CFLAGS_head64.o += $(PIE_CFLAGS)
+
 KASAN_SANITIZE_head$(BITS).o				:= n
 KASAN_SANITIZE_dumpstack.o				:= n
 KASAN_SANITIZE_dumpstack_$(BITS).o			:= n
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index dc0956067944..21f6c2a7061b 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -85,23 +85,8 @@ static struct desc_ptr startup_gdt_descr __initdata = {
 	.address = 0,
 };
 
-static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
-{
-	return ptr - (void *)_text + (void *)physaddr;
-}
-
-static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr)
-{
-	return fixup_pointer(ptr, physaddr);
-}
-
 #ifdef CONFIG_X86_5LEVEL
-static unsigned int __head *fixup_int(void *ptr, unsigned long physaddr)
-{
-	return fixup_pointer(ptr, physaddr);
-}
-
-static bool __head check_la57_support(unsigned long physaddr)
+static bool __init check_la57_support(void)
 {
 	/*
 	 * 5-level paging is detected and enabled at kernel decompression
@@ -110,23 +95,28 @@ static bool __head check_la57_support(unsigned long physaddr)
 	if (!(native_read_cr4() & X86_CR4_LA57))
 		return false;
 
-	*fixup_int(&__pgtable_l5_enabled, physaddr) = 1;
-	*fixup_int(&pgdir_shift, physaddr) = 48;
-	*fixup_int(&ptrs_per_p4d, physaddr) = 512;
-	*fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE_L5;
-	*fixup_long(&vmalloc_base, physaddr) = __VMALLOC_BASE_L5;
-	*fixup_long(&vmemmap_base, physaddr) = __VMEMMAP_BASE_L5;
+	__pgtable_l5_enabled	= 1;
+	pgdir_shift		= 48;
+	ptrs_per_p4d		= 512;
+	page_offset_base	= __PAGE_OFFSET_BASE_L5;
+	vmalloc_base		= __VMALLOC_BASE_L5;
+	vmemmap_base		= __VMEMMAP_BASE_L5;
 
 	return true;
 }
 #else
-static bool __head check_la57_support(unsigned long physaddr)
+static bool __init check_la57_support(void)
 {
 	return false;
 }
 #endif
 
-static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
+#define __va_symbol(sym) ({						\
+	unsigned long __v;						\
+	asm("movabsq $" __stringify(sym) ", %0":"=r"(__v));		\
+	__v; })
+
+static unsigned long __init sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
 {
 	unsigned long vaddr, vaddr_end;
 	int i;
@@ -141,8 +131,8 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
 	 * attribute.
 	 */
 	if (sme_get_me_mask()) {
-		vaddr = (unsigned long)__start_bss_decrypted;
-		vaddr_end = (unsigned long)__end_bss_decrypted;
+		vaddr = __va_symbol(__start_bss_decrypted);
+		vaddr_end = __va_symbol(__end_bss_decrypted);
 
 		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
 			/*
@@ -169,13 +159,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
 	return sme_get_me_mask();
 }
 
-/* Code in __startup_64() can be relocated during execution, but the compiler
- * doesn't have to generate PC-relative relocations when accessing globals from
- * that function. Clang actually does not generate them, which leads to
- * boot-time crashes. To work around this problem, every global pointer must
- * be adjusted using fixup_pointer().
- */
-unsigned long __head __startup_64(unsigned long physaddr,
+unsigned long __init __startup_64(unsigned long physaddr,
 				  struct boot_params *bp)
 {
 	unsigned long load_delta, *p;
@@ -184,12 +168,10 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	p4dval_t *p4d;
 	pudval_t *pud;
 	pmdval_t *pmd, pmd_entry;
-	pteval_t *mask_ptr;
 	bool la57;
 	int i;
-	unsigned int *next_pgt_ptr;
 
-	la57 = check_la57_support(physaddr);
+	la57 = check_la57_support();
 
 	/* Is the address too large? */
 	if (physaddr >> MAX_PHYSMEM_BITS)
@@ -199,7 +181,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * Compute the delta between the address I am compiled to run at
 	 * and the address I am actually running at.
 	 */
-	load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
+	load_delta = physaddr - (__va_symbol(_text) - __START_KERNEL_map);
 
 	/* Is the address not 2M aligned? */
 	if (load_delta & ~PMD_MASK)
@@ -210,26 +192,22 @@ unsigned long __head __startup_64(unsigned long physaddr,
 
 	/* Fixup the physical addresses in the page table */
 
-	pgd = fixup_pointer(early_top_pgt, physaddr);
+	pgd = (pgdval_t *)early_top_pgt;
 	p = pgd + pgd_index(__START_KERNEL_map);
 	if (la57)
 		*p = (unsigned long)level4_kernel_pgt;
 	else
 		*p = (unsigned long)level3_kernel_pgt;
-	*p += _PAGE_TABLE_NOENC - __START_KERNEL_map + load_delta;
+	*p += _PAGE_TABLE_NOENC + sme_get_me_mask();
 
-	if (la57) {
-		p4d = fixup_pointer(level4_kernel_pgt, physaddr);
-		p4d[511] += load_delta;
-	}
+	if (la57)
+		level4_kernel_pgt[511].p4d += load_delta;
 
-	pud = fixup_pointer(level3_kernel_pgt, physaddr);
-	pud[510] += load_delta;
-	pud[511] += load_delta;
+	level3_kernel_pgt[510].pud += load_delta;
+	level3_kernel_pgt[511].pud += load_delta;
 
-	pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
 	for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
-		pmd[i] += load_delta;
+		level2_fixmap_pgt[i].pmd += load_delta;
 
 	/*
 	 * Set up the identity mapping for the switchover.  These
@@ -238,15 +216,13 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * it avoids problems around wraparound.
 	 */
 
-	next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
-	pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
-	pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
+	pud = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+	pmd = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
 
 	pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
 
 	if (la57) {
-		p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
-				    physaddr);
+		p4d = (p4dval_t *)early_dynamic_pgts[next_early_pgt++];
 
 		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
 		pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
@@ -267,8 +243,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 
 	pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
 	/* Filter out unsupported __PAGE_KERNEL_* bits: */
-	mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
-	pmd_entry &= *mask_ptr;
+	pmd_entry &= __supported_pte_mask;
 	pmd_entry += sme_get_me_mask();
 	pmd_entry +=  physaddr;
 
@@ -294,14 +269,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * error, causing the BIOS to halt the system.
 	 */
 
-	pmd = fixup_pointer(level2_kernel_pgt, physaddr);
+	pmd = (pmdval_t *)level2_kernel_pgt;
 
 	/* invalidate pages before the kernel image */
-	for (i = 0; i < pmd_index((unsigned long)_text); i++)
+	for (i = 0; i < pmd_index(__va_symbol(_text)); i++)
 		pmd[i] &= ~_PAGE_PRESENT;
 
 	/* fixup pages that are part of the kernel image */
-	for (; i <= pmd_index((unsigned long)_end); i++)
+	for (; i <= pmd_index(__va_symbol(_end)); i++)
 		if (pmd[i] & _PAGE_PRESENT)
 			pmd[i] += load_delta;
 
@@ -313,7 +288,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * Fixup phys_base - remove the memory encryption mask to obtain
 	 * the true physical address.
 	 */
-	*fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
+	phys_base += load_delta - sme_get_me_mask();
 
 	return sme_postprocess_startup(bp, pmd);
 }
@@ -587,22 +562,16 @@ static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
 }
 
 /* This runs while still in the direct mapping */
-static void __head startup_64_load_idt(unsigned long physbase)
+static void startup_64_load_idt(void)
 {
-	struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
-	gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
-
-
-	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
-		void *handler;
+	gate_desc *idt = bringup_idt_table;
 
+	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
 		/* VMM Communication Exception */
-		handler = fixup_pointer(vc_no_ghcb, physbase);
-		set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
-	}
+		set_bringup_idt_handler(idt, X86_TRAP_VC, vc_no_ghcb);
 
-	desc->address = (unsigned long)idt;
-	native_load_idt(desc);
+	bringup_idt_descr.address = (unsigned long)idt;
+	native_load_idt(&bringup_idt_descr);
 }
 
 /* This is used when running on kernel addresses */
@@ -621,10 +590,10 @@ void early_setup_idt(void)
 /*
  * Setup boot CPU state needed before kernel switches to virtual addresses.
  */
-void __head startup_64_setup_env(unsigned long physbase)
+void __init startup_64_setup_env(void)
 {
 	/* Load GDT */
-	startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
+	startup_gdt_descr.address = (unsigned long)startup_gdt;
 	native_load_gdt(&startup_gdt_descr);
 
 	/* New GDT is live - reload data segment registers */
@@ -632,5 +601,5 @@ void __head startup_64_setup_env(unsigned long physbase)
 		     "movl %%eax, %%ss\n"
 		     "movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
 
-	startup_64_load_idt(physbase);
+	startup_64_load_idt();
 }
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 5/5] x86: Build the core kernel with position independent codegen
  2024-01-22  9:08 [RFC PATCH 0/5] x86: Build the core kernel using PIC codegen Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2024-01-22  9:08 ` [RFC PATCH 4/5] x86/head64: Replace pointer fixups with PIE codegen Ard Biesheuvel
@ 2024-01-22  9:08 ` Ard Biesheuvel
  4 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2024-01-22  9:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ard Biesheuvel, Kevin Loughlin, Tom Lendacky, Dionna Glaze,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andy Lutomirski, Arnd Bergmann, Martin KaFai Lau,
	Nathan Chancellor, Nick Desaulniers, Justin Stitt, linux-arch,
	bpf, llvm

From: Ard Biesheuvel <ardb@kernel.org>

Pass the -fpie flag to the compiler when building objects that are
intended for the core kernel. This ensures that all implicit symbol
references are emitted using RIP-relative relocations, allowing the code
to be executed correctly even if it runs from a different virtual
address than the address it was linked/loaded/relocated to run at.
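
As a rough illustration of the difference in generated code (using a
made-up symbol; the exact instruction selection is up to the compiler):

  unsigned long example_var;

  unsigned long *addr_of_example(void)
  {
          /*
           * Position dependent codegen (-mcmodel=kernel) may take the
           * address using its absolute link-time value:
           *      movq    $example_var, %rax
           * Position independent codegen (-fpie) uses a RIP-relative lea,
           * which is correct from whichever mapping the code runs:
           *      leaq    example_var(%rip), %rax
           */
          return &example_var;
  }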

This is necessary to ensure that all C code that gets pulled in by the
early startup code runs correctly without the need for unpalatable hacks
in the code to force RIP-relative references.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/Makefile                 | 7 +++++--
 arch/x86/entry/vdso/Makefile      | 2 +-
 arch/x86/kernel/Makefile          | 4 ----
 arch/x86/realmode/rm/Makefile     | 1 +
 include/asm-generic/vmlinux.lds.h | 2 ++
 5 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index bed0850d91b0..0382a9534099 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -165,11 +165,13 @@ else
         KBUILD_RUSTFLAGS += $(rustflags-y)
 
         KBUILD_CFLAGS += -mno-red-zone
-        KBUILD_CFLAGS += -mcmodel=kernel
+        KBUILD_CFLAGS_MODULE += -mcmodel=kernel
         KBUILD_RUSTFLAGS += -Cno-redzone=y
-        KBUILD_RUSTFLAGS += -Ccode-model=kernel
+        KBUILD_RUSTFLAGS_KERNEL += -Ccode-model=small
+        KBUILD_RUSTFLAGS_MODULE += -Ccode-model=kernel
 
 	PIE_CFLAGS := -fpie -mcmodel=small \
+		      $(call cc-option,-Wa$(comma)-mrelax-relocations=no) \
 		      -include $(srctree)/include/linux/hidden.h
 
 	ifeq ($(CONFIG_STACKPROTECTOR),y)
@@ -178,6 +180,7 @@ else
 		endif
 	endif
 
+	KBUILD_CFLAGS_KERNEL += $(PIE_CFLAGS)
 	export PIE_CFLAGS
 endif
 
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index b1b8dd1608f7..e2c79d4c1417 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -149,7 +149,7 @@ $(obj)/vdso32.so.dbg: KBUILD_AFLAGS = $(KBUILD_AFLAGS_32)
 $(obj)/vdso32.so.dbg: asflags-$(CONFIG_X86_64) += -m32
 
 KBUILD_CFLAGS_32 := $(filter-out -m64,$(KBUILD_CFLAGS))
-KBUILD_CFLAGS_32 := $(filter-out -mcmodel=kernel,$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 := $(filter-out -mcmodel=small,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32))
 KBUILD_CFLAGS_32 := $(filter-out $(RANDSTRUCT_CFLAGS),$(KBUILD_CFLAGS_32))
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 65194ca79b5c..0000325ab98f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -21,10 +21,6 @@ CFLAGS_REMOVE_sev.o = -pg
 CFLAGS_REMOVE_rethook.o = -pg
 endif
 
-# head64.c contains C code that may execute from a different virtual address
-# than it was linked at, so we always build it using PIE codegen
-CFLAGS_head64.o += $(PIE_CFLAGS)
-
 KASAN_SANITIZE_head$(BITS).o				:= n
 KASAN_SANITIZE_dumpstack.o				:= n
 KASAN_SANITIZE_dumpstack_$(BITS).o			:= n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index f614009d3e4e..fdb8e780f903 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,5 +76,6 @@ KBUILD_CFLAGS	:= $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
 		   -I$(srctree)/arch/x86/boot
 KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= -fno-asynchronous-unwind-tables
+KBUILD_CFLAGS_KERNEL := $(filter-out $(PIE_CFLAGS),$(KBUILD_CFLAGS_KERNEL))
 GCOV_PROFILE := n
 UBSAN_SANITIZE := n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index ef45331fb043..9518b87207e8 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -355,6 +355,7 @@
 	*(.data..decrypted)						\
 	*(.ref.data)							\
 	*(.data..shared_aligned) /* percpu related */			\
+	*(.data.rel .data.rel.*)					\
 	MEM_KEEP(init.data*)						\
 	*(.data.unlikely)						\
 	__start_once = .;						\
@@ -477,6 +478,7 @@
 	.rodata           : AT(ADDR(.rodata) - LOAD_OFFSET) {		\
 		__start_rodata = .;					\
 		*(.rodata) *(.rodata.*)					\
+		*(.data.rel.ro*)					\
 		SCHED_DATA						\
 		RO_AFTER_INIT_DATA	/* Read only after init */	\
 		. = ALIGN(8);						\
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 4/5] x86/head64: Replace pointer fixups with PIE codegen
  2024-01-22  9:08 ` [RFC PATCH 4/5] x86/head64: Replace pointer fixups with PIE codegen Ard Biesheuvel
@ 2024-01-22 19:34   ` Brian Gerst
  2024-01-22 22:44     ` Nathan Chancellor
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Gerst @ 2024-01-22 19:34 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, Ard Biesheuvel, Kevin Loughlin, Tom Lendacky,
	Dionna Glaze, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Andy Lutomirski, Arnd Bergmann, Martin KaFai Lau,
	Nathan Chancellor, Nick Desaulniers, Justin Stitt, linux-arch,
	bpf, llvm

On Mon, Jan 22, 2024 at 4:14 AM Ard Biesheuvel <ardb+git@google.com> wrote:
>
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Some of the C code in head64.c may be called from a different virtual
> address than it was linked at. Currently, we deal with this by using
> ordinary, position dependent codegen, and fixing up all symbol
> references on the fly. This is fragile and tricky to maintain. It is
> also unnecessary: we can use position independent codegen (with hidden
> visibility) to ensure that all compiler generated symbol references are
> RIP-relative, removing the need for fixups entirely.
>
> It does mean we need explicit references to kernel virtual addresses to
> be generated by hand, so generate those using a movabs instruction in
> inline asm in the handful places where we actually need this.
>
> While at it, move these routines to .inittext where they belong.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/Makefile                 |  11 ++
>  arch/x86/boot/compressed/Makefile |   2 +-
>  arch/x86/include/asm/init.h       |   2 -
>  arch/x86/include/asm/setup.h      |   2 +-
>  arch/x86/kernel/Makefile          |   4 +
>  arch/x86/kernel/head64.c          | 117 +++++++-------------
>  6 files changed, 60 insertions(+), 78 deletions(-)
>
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 1a068de12a56..bed0850d91b0 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -168,6 +168,17 @@ else
>          KBUILD_CFLAGS += -mcmodel=kernel
>          KBUILD_RUSTFLAGS += -Cno-redzone=y
>          KBUILD_RUSTFLAGS += -Ccode-model=kernel
> +
> +       PIE_CFLAGS := -fpie -mcmodel=small \
> +                     -include $(srctree)/include/linux/hidden.h
> +
> +       ifeq ($(CONFIG_STACKPROTECTOR),y)
> +               ifeq ($(CONFIG_SMP),y)
> +                       PIE_CFLAGS += -mstack-protector-guard-reg=gs
> +               endif

This compiler flag requires GCC 8.1 or later.  When I posted a patch
series[1] to convert the stack protector to a normal percpu variable
instead of the fixed offset, there was pushback over requiring GCC 8.1
to keep stack protector support.  I added code to objtool to convert
code from older compilers, but there hasn't been any feedback since.
Similar conversion code would be needed in objtool for this unless the
decision is made to require GCC 8.1 for stack protector support going
forward.

Brian Gerst

[1] https://lore.kernel.org/lkml/20231115173708.108316-1-brgerst@gmail.com/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 4/5] x86/head64: Replace pointer fixups with PIE codegen
  2024-01-22 19:34   ` Brian Gerst
@ 2024-01-22 22:44     ` Nathan Chancellor
  2024-01-25 10:43       ` Ard Biesheuvel
  0 siblings, 1 reply; 9+ messages in thread
From: Nathan Chancellor @ 2024-01-22 22:44 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Ard Biesheuvel, linux-kernel, Ard Biesheuvel, Kevin Loughlin,
	Tom Lendacky, Dionna Glaze, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Andy Lutomirski, Arnd Bergmann,
	Martin KaFai Lau, Nick Desaulniers, Justin Stitt, linux-arch,
	bpf, llvm

On Mon, Jan 22, 2024 at 02:34:46PM -0500, Brian Gerst wrote:
> On Mon, Jan 22, 2024 at 4:14 AM Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Some of the C code in head64.c may be called from a different virtual
> > address than it was linked at. Currently, we deal with this by using
> > ordinary, position dependent codegen, and fixing up all symbol
> > references on the fly. This is fragile and tricky to maintain. It is
> > also unnecessary: we can use position independent codegen (with hidden
> > visibility) to ensure that all compiler generated symbol references are
> > RIP-relative, removing the need for fixups entirely.
> >
> > It does mean we need explicit references to kernel virtual addresses to
> > be generated by hand, so generate those using a movabs instruction in
> > inline asm in the handful places where we actually need this.
> >
> > While at it, move these routines to .inittext where they belong.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/x86/Makefile                 |  11 ++
> >  arch/x86/boot/compressed/Makefile |   2 +-
> >  arch/x86/include/asm/init.h       |   2 -
> >  arch/x86/include/asm/setup.h      |   2 +-
> >  arch/x86/kernel/Makefile          |   4 +
> >  arch/x86/kernel/head64.c          | 117 +++++++-------------
> >  6 files changed, 60 insertions(+), 78 deletions(-)
> >
> > diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> > index 1a068de12a56..bed0850d91b0 100644
> > --- a/arch/x86/Makefile
> > +++ b/arch/x86/Makefile
> > @@ -168,6 +168,17 @@ else
> >          KBUILD_CFLAGS += -mcmodel=kernel
> >          KBUILD_RUSTFLAGS += -Cno-redzone=y
> >          KBUILD_RUSTFLAGS += -Ccode-model=kernel
> > +
> > +       PIE_CFLAGS := -fpie -mcmodel=small \
> > +                     -include $(srctree)/include/linux/hidden.h
> > +
> > +       ifeq ($(CONFIG_STACKPROTECTOR),y)
> > +               ifeq ($(CONFIG_SMP),y)
> > +                       PIE_CFLAGS += -mstack-protector-guard-reg=gs
> > +               endif
> 
> This compiler flag requires GCC 8.1 or later.  When I posted a patch
> series[1] to convert the stack protector to a normal percpu variable
> instead of the fixed offset, there was pushback over requiring GCC 8.1
> to keep stack protector support.  I added code to objtool to convert
> code from older compilers, but there hasn't been any feedback since.
> Similar conversion code would be needed in objtool for this unless the
> decision is made to require GCC 8.1 for stack protector support going
> forward.
> 
> Brian Gerst
> 
> [1] https://lore.kernel.org/lkml/20231115173708.108316-1-brgerst@gmail.com/

I was going to comment on this as well, as that flag is only supported
in clang 12.0.0 and newer. It should not be too big of a deal for us
though, as I was already planning on bumping the minimum supported
version of clang for building the kernel to 13.0.1 (but there may be
breakage reports if this series lands before that):

https://lore.kernel.org/20240110165339.GA3105@dev-arch.thelio-3990X/

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 4/5] x86/head64: Replace pointer fixups with PIE codegen
  2024-01-22 22:44     ` Nathan Chancellor
@ 2024-01-25 10:43       ` Ard Biesheuvel
  0 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2024-01-25 10:43 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Brian Gerst, Ard Biesheuvel, linux-kernel, Kevin Loughlin,
	Tom Lendacky, Dionna Glaze, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Andy Lutomirski, Arnd Bergmann,
	Martin KaFai Lau, Nick Desaulniers, Justin Stitt, linux-arch,
	bpf, llvm

On Mon, 22 Jan 2024 at 23:44, Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Mon, Jan 22, 2024 at 02:34:46PM -0500, Brian Gerst wrote:
> > On Mon, Jan 22, 2024 at 4:14 AM Ard Biesheuvel <ardb+git@google.com> wrote:
> > >
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > Some of the C code in head64.c may be called from a different virtual
> > > address than it was linked at. Currently, we deal with this by using
> > > ordinary, position dependent codegen, and fixing up all symbol
> > > references on the fly. This is fragile and tricky to maintain. It is
> > > also unnecessary: we can use position independent codegen (with hidden
> > > visibility) to ensure that all compiler generated symbol references are
> > > RIP-relative, removing the need for fixups entirely.
> > >
> > > It does mean we need explicit references to kernel virtual addresses to
> > > be generated by hand, so generate those using a movabs instruction in
> > > inline asm in the handful places where we actually need this.
> > >
> > > While at it, move these routines to .inittext where they belong.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  arch/x86/Makefile                 |  11 ++
> > >  arch/x86/boot/compressed/Makefile |   2 +-
> > >  arch/x86/include/asm/init.h       |   2 -
> > >  arch/x86/include/asm/setup.h      |   2 +-
> > >  arch/x86/kernel/Makefile          |   4 +
> > >  arch/x86/kernel/head64.c          | 117 +++++++-------------
> > >  6 files changed, 60 insertions(+), 78 deletions(-)
> > >
> > > diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> > > index 1a068de12a56..bed0850d91b0 100644
> > > --- a/arch/x86/Makefile
> > > +++ b/arch/x86/Makefile
> > > @@ -168,6 +168,17 @@ else
> > >          KBUILD_CFLAGS += -mcmodel=kernel
> > >          KBUILD_RUSTFLAGS += -Cno-redzone=y
> > >          KBUILD_RUSTFLAGS += -Ccode-model=kernel
> > > +
> > > +       PIE_CFLAGS := -fpie -mcmodel=small \
> > > +                     -include $(srctree)/include/linux/hidden.h
> > > +
> > > +       ifeq ($(CONFIG_STACKPROTECTOR),y)
> > > +               ifeq ($(CONFIG_SMP),y)
> > > +                       PIE_CFLAGS += -mstack-protector-guard-reg=gs
> > > +               endif
> >
> > This compiler flag requires GCC 8.1 or later.  When I posted a patch
> > series[1] to convert the stack protector to a normal percpu variable
> > instead of the fixed offset, there was pushback over requiring GCC 8.1
> > to keep stack protector support.  I added code to objtool to convert
> > code from older compilers, but there hasn't been any feedback since.
> > Similar conversion code would be needed in objtool for this unless the
> > decision is made to require GCC 8.1 for stack protector support going
> > forward.
> >
> > Brian Gerst
> >
> > [1] https://lore.kernel.org/lkml/20231115173708.108316-1-brgerst@gmail.com/
>
> I was going to comment on this as well, as that flag was only supported
> in clang 12.0.0 and newer. It should not be too big of a deal for us
> though, as I was already planning on bumping the minimum supported
> version of clang for building the kernel to 13.0.1 (but there may be
> breakage reports if this series lands before that):
>

Thanks for pointing this out.

Given that building the entire kernel with -fPIC is neither necessary
nor sufficient, I am going to abandon this approach.

If we apply -fPIC to only a handful of compilation units containing
code that runs from the 1:1 mapping, it is not unreasonable to simply
disable the stack protector altogether for those pieces too. This
works around the older GCC issue.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-01-25 10:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-22  9:08 [RFC PATCH 0/5] x86: Build the core kernel using PIC codegen Ard Biesheuvel
2024-01-22  9:08 ` [RFC PATCH 1/5] kallsyms: Avoid weak references for kallsyms symbols Ard Biesheuvel
2024-01-22  9:08 ` [RFC PATCH 2/5] vmlinux: Avoid weak reference to notes section Ard Biesheuvel
2024-01-22  9:08 ` [RFC PATCH 3/5] btf: Avoid weak external references Ard Biesheuvel
2024-01-22  9:08 ` [RFC PATCH 4/5] x86/head64: Replace pointer fixups with PIE codegen Ard Biesheuvel
2024-01-22 19:34   ` Brian Gerst
2024-01-22 22:44     ` Nathan Chancellor
2024-01-25 10:43       ` Ard Biesheuvel
2024-01-22  9:08 ` [RFC PATCH 5/5] x86: Build the core kernel with position independent codegen Ard Biesheuvel
