* [PATCH 00/29] x86: Kernel IBT
@ 2022-02-18 16:49 Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 01/29] static_call: Avoid building empty .static_call_sites Peter Zijlstra
                   ` (29 more replies)
  0 siblings, 30 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Hi,

This is an (almost!) complete Kernel IBT implementation. It's been self-hosting
for a few days now. That is, it runs on IBT-enabled hardware (Tigerlake) and is
capable of building the next kernel.

It is also almost clean on allmodconfig using GCC-11.2.

The biggest TODO item at this point is Clang; I've not yet looked at that.

Patches are also available here:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/wip.ibt

This series is on top of tip/master along with the linkage patches from Mark:

  https://lore.kernel.org/all/20220216162229.1076788-1-mark.rutland@arm.com/

Enjoy!



* [PATCH 01/29] static_call: Avoid building empty .static_call_sites
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 02/29] x86/module: Fix the paravirt vs alternative order Peter Zijlstra
                   ` (28 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Without CONFIG_HAVE_STATIC_CALL_INLINE there's no point in creating
the .static_call_sites section and its related symbols.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/asm-generic/vmlinux.lds.h |    4 ++++
 1 file changed, 4 insertions(+)

--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -398,6 +398,7 @@
 	KEEP(*(__jump_table))						\
 	__stop___jump_table = .;
 
+#ifdef CONFIG_HAVE_STATIC_CALL_INLINE
 #define STATIC_CALL_DATA						\
 	. = ALIGN(8);							\
 	__start_static_call_sites = .;					\
@@ -406,6 +407,9 @@
 	__start_static_call_tramp_key = .;				\
 	KEEP(*(.static_call_tramp_key))					\
 	__stop_static_call_tramp_key = .;
+#else
+#define STATIC_CALL_DATA
+#endif
 
 /*
  * Allow architectures to handle ro_after_init data on their




* [PATCH 02/29] x86/module: Fix the paravirt vs alternative order
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 01/29] static_call: Avoid building empty .static_call_sites Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 20:28   ` Josh Poimboeuf
  2022-02-18 16:49 ` [PATCH 03/29] objtool: Add --dry-run Peter Zijlstra
                   ` (27 subsequent siblings)
  29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, Juergen Gross

Ever since commit 4e6292114c741 ("x86/paravirt: Add new features for
paravirt patching") there has been an ordering dependency between
patching paravirt ops and patching alternatives; the module loader,
however, still violates this order.

Fixes: 4e6292114c741 ("x86/paravirt: Add new features for paravirt patching")
Cc: Juergen Gross <jgross@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/module.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -272,6 +272,10 @@ int module_finalize(const Elf_Ehdr *hdr,
 			retpolines = s;
 	}
 
+	if (para) {
+		void *pseg = (void *)para->sh_addr;
+		apply_paravirt(pseg, pseg + para->sh_size);
+	}
 	if (retpolines) {
 		void *rseg = (void *)retpolines->sh_addr;
 		apply_retpolines(rseg, rseg + retpolines->sh_size);
@@ -289,11 +293,6 @@ int module_finalize(const Elf_Ehdr *hdr,
 					    tseg, tseg + text->sh_size);
 	}
 
-	if (para) {
-		void *pseg = (void *)para->sh_addr;
-		apply_paravirt(pseg, pseg + para->sh_size);
-	}
-
 	/* make jump label nops */
 	jump_label_apply_nops(me);
 




* [PATCH 03/29] objtool: Add --dry-run
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 01/29] static_call: Avoid building empty .static_call_sites Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 02/29] x86/module: Fix the paravirt vs alternative order Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 04/29] x86/livepatch: Validate __fentry__ location Peter Zijlstra
                   ` (26 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Add a --dry-run argument to skip writing the modifications. This is
convenient for debugging.
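
Example invocation (illustrative; following the usage string in
builtin-check.c below):

	$ objtool check --dry-run file.o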

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/builtin-check.c           |    3 ++-
 tools/objtool/elf.c                     |    3 +++
 tools/objtool/include/objtool/builtin.h |    2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -20,7 +20,7 @@
 #include <objtool/objtool.h>
 
 bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-     validate_dup, vmlinux, mcount, noinstr, backup, sls;
+     validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
 
 static const char * const check_usage[] = {
 	"objtool check [<options>] file.o",
@@ -46,6 +46,7 @@ const struct option check_options[] = {
 	OPT_BOOLEAN('M', "mcount", &mcount, "generate __mcount_loc"),
 	OPT_BOOLEAN('B', "backup", &backup, "create .orig files before modification"),
 	OPT_BOOLEAN('S', "sls", &sls, "validate straight-line-speculation"),
+	OPT_BOOLEAN(0, "dry-run", &dryrun, "don't write the modifications"),
 	OPT_END(),
 };
 
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -1019,6 +1019,9 @@ int elf_write(struct elf *elf)
 	struct section *sec;
 	Elf_Scn *s;
 
+	if (dryrun)
+		return 0;
+
 	/* Update changed relocation sections and section headers: */
 	list_for_each_entry(sec, &elf->sections, list) {
 		if (sec->changed) {
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -9,7 +9,7 @@
 
 extern const struct option check_options[];
 extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-            validate_dup, vmlinux, mcount, noinstr, backup, sls;
+            validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
 
 extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);
 




* [PATCH 04/29] x86/livepatch: Validate __fentry__ location
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (2 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 03/29] objtool: Add --dry-run Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 21:08   ` Josh Poimboeuf
  2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
                   ` (25 subsequent siblings)
  29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn, Miroslav Benes

Currently livepatch assumes __fentry__ lives at func+0, which is most
likely untrue with IBT on. Override the weak klp_get_ftrace_location()
function with an arch-specific version that is IBT aware.

Also make the weak fallback verify the location is an actual ftrace
location as a sanity check.
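
For reference, a sketch (my annotation, not part of the patch) of why
probing faddr + 4 works: with IBT the compiler emits the landing pad
ahead of the __fentry__ call, so the ftrace location moves from func+0
to func+4:

	func:
		endbr64			# 4 bytes, func+0
		call __fentry__		# func+4, the actual ftrace location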

Suggested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/livepatch.h |    9 +++++++++
 kernel/livepatch/patch.c         |    2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/livepatch.h
+++ b/arch/x86/include/asm/livepatch.h
@@ -17,4 +17,13 @@ static inline void klp_arch_set_pc(struc
 	ftrace_instruction_pointer_set(fregs, ip);
 }
 
+#define klp_get_ftrace_location klp_get_ftrace_location
+static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
+{
+	unsigned long addr = ftrace_location(faddr);
+	if (!addr && IS_ENABLED(CONFIG_X86_IBT))
+		addr = ftrace_location(faddr + 4);
+	return addr;
+}
+
 #endif /* _ASM_X86_LIVEPATCH_H */
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -133,7 +133,7 @@ static void notrace klp_ftrace_handler(u
 #ifndef klp_get_ftrace_location
 static unsigned long klp_get_ftrace_location(unsigned long faddr)
 {
-	return faddr;
+	return ftrace_location(faddr);
 }
 #endif
 




* [PATCH 05/29] x86: Base IBT bits
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (3 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 04/29] x86/livepatch: Validate __fentry__ location Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 20:49   ` Andrew Cooper
                     ` (3 more replies)
  2022-02-18 16:49 ` [PATCH 06/29] x86/ibt: Add ANNOTATE_NOENDBR Peter Zijlstra
                   ` (24 subsequent siblings)
  29 siblings, 4 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Add Kconfig, Makefile and basic instruction support for x86 IBT.

TODO: clang
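
For reference, the arithmetic behind is_endbr() below (my annotation,
not part of the patch):

	endbr64 = f3 0f 1e fa  ->  (u32)0xfa1e0ff3,  ~ = 0x05e1f00c
	endbr32 = f3 0f 1e fb  ->  (u32)0xfb1e0ff3,  ~ = 0x04e1f00c

OR-ing 0x01000000 into either complement yields 0x05e1f00c, which is
~0xfa1e0ff3, so a single compare matches both encodings without having
an ENDBR opcode as a literal in the kernel text.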

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/Kconfig           |   15 ++++++++++++
 arch/x86/Makefile          |    5 +++-
 arch/x86/include/asm/ibt.h |   53 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 72 insertions(+), 1 deletion(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1861,6 +1861,21 @@ config X86_UMIP
 	  specific cases in protected and virtual-8086 modes. Emulated
 	  results are dummy.
 
+config CC_HAS_IBT
+	# GCC >= 9 and binutils >= 2.29
+	# Retpoline check to work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654
+	def_bool $(cc-option, -fcf-protection=branch -mindirect-branch-register) && $(as-instr,endbr64)
+
+config X86_IBT
+	prompt "Indirect Branch Tracking"
+	bool
+	depends on X86_64 && CC_HAS_IBT
+	help
+	  Build the kernel with support for Indirect Branch Tracking, a
+	  hardware-supported CFI scheme. Any indirect call must land on
+	  an ENDBR instruction; as such, the compiler will litter the
+	  code with them to make this happen.
+
 config X86_INTEL_MEMORY_PROTECTION_KEYS
 	prompt "Memory Protection Keys"
 	def_bool y
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -62,8 +62,11 @@ export BITS
 #
 KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
 
-# Intel CET isn't enabled in the kernel
+ifeq ($(CONFIG_X86_IBT),y)
+KBUILD_CFLAGS += $(call cc-option,-fcf-protection=branch)
+else
 KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
+endif
 
 ifeq ($(CONFIG_X86_32),y)
         BITS := 32
--- /dev/null
+++ b/arch/x86/include/asm/ibt.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_IBT_H
+#define _ASM_X86_IBT_H
+
+#ifdef CONFIG_X86_IBT
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_X86_64
+#define ASM_ENDBR	"endbr64\n\t"
+#else
+#define ASM_ENDBR	"endbr32\n\t"
+#endif
+
+#define __noendbr	__attribute__((nocf_check))
+
+/*
+ * A bit convoluted, but this matches both endbr32 and endbr64
+ * without having either as a literal in the kernel text.
+ */
+static inline bool is_endbr(const void *addr)
+{
+	unsigned int val = ~*(unsigned int *)addr;
+	val |= 0x01000000U;
+	return val == ~0xfa1e0ff3;
+}
+
+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_X86_64
+#define ENDBR	endbr64
+#else
+#define ENDBR	endbr32
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#else /* !IBT */
+
+#ifndef __ASSEMBLY__
+
+#define ASM_ENDBR
+
+#define __noendbr
+
+#else /* __ASSEMBLY__ */
+
+#define ENDBR
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* CONFIG_X86_IBT */
+#endif /* _ASM_X86_IBT_H */




* [PATCH 06/29] x86/ibt: Add ANNOTATE_NOENDBR
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (4 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 07/29] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
                   ` (23 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

In order to have objtool warn about code references to !ENDBR
instructions, we need an annotation to allow this for non-control-flow
instances -- consider text range checks, text patching, return
trampolines, etc.
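
Usage sketch (illustrative, not taken from this patch): a location
whose address is taken but that is never the target of an indirect
branch gets annotated so objtool won't demand an ENDBR there:

	SYM_CODE_START(sym)
		ANNOTATE_NOENDBR	/* only direct jmp / text references */
		...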

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/linkage.h      |   11 +++++++++++
 include/linux/instruction_pointer.h |    5 +++++
 include/linux/objtool.h             |   13 +++++++++++++
 3 files changed, 29 insertions(+)

--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -78,6 +78,12 @@ struct unwind_hint {
 #define STACK_FRAME_NON_STANDARD_FP(func)
 #endif
 
+#define ANNOTATE_NOENDBR					\
+	"986: \n\t"						\
+	".pushsection .discard.noendbr\n\t"			\
+	_ASM_PTR " 986b\n\t"					\
+	".popsection\n\t"
+
 #else /* __ASSEMBLY__ */
 
 /*
@@ -130,6 +136,13 @@ struct unwind_hint {
 	.popsection
 .endm
 
+.macro ANNOTATE_NOENDBR
+.Lhere_\@:
+	.pushsection .discard.noendbr
+	.quad	.Lhere_\@
+	.popsection
+.endm
+
 #endif /* __ASSEMBLY__ */
 
 #else /* !CONFIG_STACK_VALIDATION */




* [PATCH 07/29] x86/entry: Sprinkle ENDBR dust
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (5 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 06/29] x86/ibt: Add ANNOTATE_NOENDBR Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-19  0:23   ` Josh Poimboeuf
  2022-02-19  0:36   ` Josh Poimboeuf
  2022-02-18 16:49 ` [PATCH 08/29] x86/linkage: Add ENDBR to SYM_FUNC_START*() Peter Zijlstra
                   ` (22 subsequent siblings)
  29 siblings, 2 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Kernel entry points should have ENDBR for IBT configs.

The SYSCALL entry points are found by taking their respective
addresses in order to program them into the MSRs, while the exception
entry points are found through UNWIND_HINT_IRET_REGS.

*Except* that the latter hint is also used on exit code to denote when
we're down to an IRET frame. As such, add an additional 'entry'
argument to the macro and have it default to '1', such that objtool
will assume it's an entry and warn when the ENDBR is missing.
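
The resulting annotation pattern (assembled from the diff below):

	SYM_CODE_START(asm_exc_nmi)
		UNWIND_HINT_IRET_REGS entry=1	/* entry: ENDBR expected */
		ENDBR
		...
		UNWIND_HINT_IRET_REGS entry=0	/* exit: IRET frame, no ENDBR */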

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S           |   35 +++++++++++++++++++++--------------
 arch/x86/entry/entry_64_compat.S    |    3 +++
 arch/x86/include/asm/idtentry.h     |   23 +++++++++++++++--------
 arch/x86/include/asm/segment.h      |    5 +++++
 arch/x86/include/asm/unwind_hints.h |   18 +++++++++++++-----
 arch/x86/kernel/head_64.S           |   14 +++++++++-----
 arch/x86/kernel/idt.c               |    5 +++--
 arch/x86/kernel/unwind_orc.c        |    3 ++-
 include/linux/objtool.h             |    5 +++--
 tools/include/linux/objtool.h       |    5 +++--
 tools/objtool/check.c               |    3 ++-
 tools/objtool/orc_dump.c            |    3 ++-
 12 files changed, 81 insertions(+), 41 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -39,6 +39,7 @@
 #include <asm/trapnr.h>
 #include <asm/nospec-branch.h>
 #include <asm/fsgsbase.h>
+#include <asm/ibt.h>
 #include <linux/err.h>
 
 #include "calling.h"
@@ -87,6 +88,7 @@
 SYM_CODE_START(entry_SYSCALL_64)
 	UNWIND_HINT_EMPTY
 
+	ENDBR
 	swapgs
 	/* tss.sp2 is scratch space. */
 	movq	%rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
@@ -349,7 +351,8 @@ SYM_CODE_END(ret_from_fork)
  */
 .macro idtentry vector asmsym cfunc has_error_code:req
 SYM_CODE_START(\asmsym)
-	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
+	UNWIND_HINT_IRET_REGS offset=\has_error_code*8 entry=1
+	ENDBR
 	ASM_CLAC
 
 	.if \has_error_code == 0
@@ -366,7 +369,7 @@ SYM_CODE_START(\asmsym)
 		.rept	6
 		pushq	5*8(%rsp)
 		.endr
-		UNWIND_HINT_IRET_REGS offset=8
+		UNWIND_HINT_IRET_REGS offset=8 entry=0
 .Lfrom_usermode_no_gap_\@:
 	.endif
 
@@ -416,7 +419,8 @@ SYM_CODE_END(\asmsym)
  */
 .macro idtentry_mce_db vector asmsym cfunc
 SYM_CODE_START(\asmsym)
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=1
+	ENDBR
 	ASM_CLAC
 
 	pushq	$-1			/* ORIG_RAX: no syscall to restart */
@@ -471,7 +475,8 @@ SYM_CODE_END(\asmsym)
  */
 .macro idtentry_vc vector asmsym cfunc
 SYM_CODE_START(\asmsym)
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=1
+	ENDBR
 	ASM_CLAC
 
 	/*
@@ -532,7 +537,8 @@ SYM_CODE_END(\asmsym)
  */
 .macro idtentry_df vector asmsym cfunc
 SYM_CODE_START(\asmsym)
-	UNWIND_HINT_IRET_REGS offset=8
+	UNWIND_HINT_IRET_REGS offset=8 entry=1
+	ENDBR
 	ASM_CLAC
 
 	/* paranoid_entry returns GS information for paranoid_exit in EBX. */
@@ -629,7 +635,7 @@ SYM_INNER_LABEL(restore_regs_and_return_
 	INTERRUPT_RETURN
 
 SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=0
 	/*
 	 * Are we returning to a stack segment from the LDT?  Note: in
 	 * 64-bit mode SS:RSP on the exception stack is always valid.
@@ -706,7 +712,7 @@ SYM_INNER_LABEL(native_irq_return_iret,
 	popq	%rdi				/* Restore user RDI */
 
 	movq	%rax, %rsp
-	UNWIND_HINT_IRET_REGS offset=8
+	UNWIND_HINT_IRET_REGS offset=8 entry=0
 
 	/*
 	 * At this point, we cannot write to the stack any more, but we can
@@ -821,13 +827,13 @@ SYM_CODE_START(xen_failsafe_callback)
 	movq	8(%rsp), %r11
 	addq	$0x30, %rsp
 	pushq	$0				/* RIP */
-	UNWIND_HINT_IRET_REGS offset=8
+	UNWIND_HINT_IRET_REGS offset=8 entry=0
 	jmp	asm_exc_general_protection
 1:	/* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
 	movq	(%rsp), %rcx
 	movq	8(%rsp), %r11
 	addq	$0x30, %rsp
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=0
 	pushq	$-1 /* orig_ax = -1 => not a system call */
 	PUSH_AND_CLEAR_REGS
 	ENCODE_FRAME_POINTER
@@ -1062,7 +1068,8 @@ SYM_CODE_END(error_return)
  *	      when PAGE_TABLE_ISOLATION is in use.  Do not clobber.
  */
 SYM_CODE_START(asm_exc_nmi)
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=1
+	ENDBR
 
 	/*
 	 * We allow breakpoints in NMIs. If a breakpoint occurs, then
@@ -1127,13 +1134,13 @@ SYM_CODE_START(asm_exc_nmi)
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
-	UNWIND_HINT_IRET_REGS base=%rdx offset=8
+	UNWIND_HINT_IRET_REGS base=%rdx offset=8 entry=0
 	pushq	5*8(%rdx)	/* pt_regs->ss */
 	pushq	4*8(%rdx)	/* pt_regs->rsp */
 	pushq	3*8(%rdx)	/* pt_regs->flags */
 	pushq	2*8(%rdx)	/* pt_regs->cs */
 	pushq	1*8(%rdx)	/* pt_regs->rip */
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=0
 	pushq   $-1		/* pt_regs->orig_ax */
 	PUSH_AND_CLEAR_REGS rdx=(%rdx)
 	ENCODE_FRAME_POINTER
@@ -1289,7 +1296,7 @@ SYM_CODE_START(asm_exc_nmi)
 	.rept 5
 	pushq	11*8(%rsp)
 	.endr
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=0
 
 	/* Everything up to here is safe from nested NMIs */
 
@@ -1305,7 +1312,7 @@ SYM_CODE_START(asm_exc_nmi)
 	pushq	$__KERNEL_CS	/* CS */
 	pushq	$1f		/* RIP */
 	iretq			/* continues at repeat_nmi below */
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=0
 1:
 #endif
 
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -49,6 +49,7 @@
 SYM_CODE_START(entry_SYSENTER_compat)
 	UNWIND_HINT_EMPTY
 	/* Interrupts are off on entry. */
+	ENDBR
 	SWAPGS
 
 	pushq	%rax
@@ -198,6 +199,7 @@ SYM_CODE_END(entry_SYSENTER_compat)
  */
 SYM_CODE_START(entry_SYSCALL_compat)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	/* Interrupts are off on entry. */
 	swapgs
 
@@ -340,6 +342,7 @@ SYM_CODE_END(entry_SYSCALL_compat)
  */
 SYM_CODE_START(entry_INT80_compat)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	/*
 	 * Interrupts are off on entry.
 	 */
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -5,6 +5,12 @@
 /* Interrupts/Exceptions */
 #include <asm/trapnr.h>
 
+#ifdef CONFIG_X86_IBT
+#define IDT_ALIGN	16
+#else
+#define IDT_ALIGN	8
+#endif
+
 #ifndef __ASSEMBLY__
 #include <linux/entry-common.h>
 #include <linux/hardirq.h>
@@ -492,33 +498,34 @@ __visible noinstr void func(struct pt_re
  * point is to mask off the bits above bit 7 because the push is sign
  * extending.
  */
-	.align 8
+
+	.align IDT_ALIGN
 SYM_CODE_START(irq_entries_start)
     vector=FIRST_EXTERNAL_VECTOR
     .rept NR_EXTERNAL_VECTORS
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=1
 0 :
+	ENDBR
 	.byte	0x6a, vector
 	jmp	asm_common_interrupt
-	nop
 	/* Ensure that the above is 8 bytes max */
-	. = 0b + 8
+	.fill 0b + IDT_ALIGN - ., 1, 0x90
 	vector = vector+1
     .endr
 SYM_CODE_END(irq_entries_start)
 
 #ifdef CONFIG_X86_LOCAL_APIC
-	.align 8
+	.align IDT_ALIGN
 SYM_CODE_START(spurious_entries_start)
     vector=FIRST_SYSTEM_VECTOR
     .rept NR_SYSTEM_VECTORS
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=1
 0 :
+	ENDBR
 	.byte	0x6a, vector
 	jmp	asm_spurious_interrupt
-	nop
 	/* Ensure that the above is 8 bytes max */
-	. = 0b + 8
+	.fill 0b + IDT_ALIGN - ., 1, 0x90
 	vector = vector+1
     .endr
 SYM_CODE_END(spurious_entries_start)
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -4,6 +4,7 @@
 
 #include <linux/const.h>
 #include <asm/alternative.h>
+#include <asm/ibt.h>
 
 /*
  * Constructor for a conventional segment GDT (or LDT) entry.
@@ -275,7 +276,11 @@ static inline void vdso_read_cpunode(uns
  * vector has no error code (two bytes), a 'push $vector_number' (two
  * bytes), and a jump to the common entry code (up to five bytes).
  */
+#ifdef CONFIG_X86_IBT
+#define EARLY_IDT_HANDLER_SIZE 13
+#else
 #define EARLY_IDT_HANDLER_SIZE 9
+#endif
 
 /*
  * xen_early_idt_handler_array is for Xen pv guests: for each entry in
--- a/arch/x86/include/asm/unwind_hints.h
+++ b/arch/x86/include/asm/unwind_hints.h
@@ -11,7 +11,7 @@
 	UNWIND_HINT sp_reg=ORC_REG_UNDEFINED type=UNWIND_HINT_TYPE_CALL end=1
 .endm
 
-.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0
+.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0 entry=1
 	.if \base == %rsp
 		.if \indirect
 			.set sp_reg, ORC_REG_SP_INDIRECT
@@ -33,9 +33,17 @@
 	.set sp_offset, \offset
 
 	.if \partial
-		.set type, UNWIND_HINT_TYPE_REGS_PARTIAL
+		.if \entry
+		.set type, UNWIND_HINT_TYPE_REGS_ENTRY
+		.else
+		.set type, UNWIND_HINT_TYPE_REGS_EXIT
+		.endif
 	.elseif \extra == 0
-		.set type, UNWIND_HINT_TYPE_REGS_PARTIAL
+		.if \entry
+		.set type, UNWIND_HINT_TYPE_REGS_ENTRY
+		.else
+		.set type, UNWIND_HINT_TYPE_REGS_EXIT
+		.endif
 		.set sp_offset, \offset + (16*8)
 	.else
 		.set type, UNWIND_HINT_TYPE_REGS
@@ -44,8 +52,8 @@
 	UNWIND_HINT sp_reg=sp_reg sp_offset=sp_offset type=type
 .endm
 
-.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0
-	UNWIND_HINT_REGS base=\base offset=\offset partial=1
+.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0 entry=1
+	UNWIND_HINT_REGS base=\base offset=\offset partial=1 entry=\entry
 .endm
 
 .macro UNWIND_HINT_FUNC
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -25,6 +25,7 @@
 #include <asm/export.h>
 #include <asm/nospec-branch.h>
 #include <asm/fixmap.h>
+#include <asm/ibt.h>
 
 /*
  * We are not able to switch in one step to the final KERNEL ADDRESS SPACE
@@ -327,7 +328,8 @@ SYM_CODE_END(start_cpu0)
  * when .init.text is freed.
  */
 SYM_CODE_START_NOALIGN(vc_boot_ghcb)
-	UNWIND_HINT_IRET_REGS offset=8
+	UNWIND_HINT_IRET_REGS offset=8 entry=1
+	ENDBR
 
 	/* Build pt_regs */
 	PUSH_AND_CLEAR_REGS
@@ -371,18 +373,20 @@ SYM_CODE_START(early_idt_handler_array)
 	i = 0
 	.rept NUM_EXCEPTION_VECTORS
 	.if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0
-		UNWIND_HINT_IRET_REGS
+		UNWIND_HINT_IRET_REGS entry=1
+		ENDBR
 		pushq $0	# Dummy error code, to make stack frame uniform
 	.else
-		UNWIND_HINT_IRET_REGS offset=8
+		UNWIND_HINT_IRET_REGS offset=8 entry=1
+		ENDBR
 	.endif
 	pushq $i		# 72(%rsp) Vector number
 	jmp early_idt_handler_common
-	UNWIND_HINT_IRET_REGS
+	UNWIND_HINT_IRET_REGS entry=0
 	i = i + 1
 	.fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
 	.endr
-	UNWIND_HINT_IRET_REGS offset=16
+	UNWIND_HINT_IRET_REGS offset=16 entry=0
 SYM_CODE_END(early_idt_handler_array)
 
 SYM_CODE_START_LOCAL(early_idt_handler_common)
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -10,6 +10,7 @@
 #include <asm/proto.h>
 #include <asm/desc.h>
 #include <asm/hw_irq.h>
+#include <asm/idtentry.h>
 
 #define DPL0		0x0
 #define DPL3		0x3
@@ -272,7 +273,7 @@ void __init idt_setup_apic_and_irq_gates
 	idt_setup_from_table(idt_table, apic_idts, ARRAY_SIZE(apic_idts), true);
 
 	for_each_clear_bit_from(i, system_vectors, FIRST_SYSTEM_VECTOR) {
-		entry = irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR);
+		entry = irq_entries_start + IDT_ALIGN * (i - FIRST_EXTERNAL_VECTOR);
 		set_intr_gate(i, entry);
 	}
 
@@ -283,7 +284,7 @@ void __init idt_setup_apic_and_irq_gates
 		 * system_vectors bitmap. Otherwise they show up in
 		 * /proc/interrupts.
 		 */
-		entry = spurious_entries_start + 8 * (i - FIRST_SYSTEM_VECTOR);
+		entry = spurious_entries_start + IDT_ALIGN * (i - FIRST_SYSTEM_VECTOR);
 		set_intr_gate(i, entry);
 	}
 #endif
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -566,7 +566,8 @@ bool unwind_next_frame(struct unwind_sta
 		state->signal = true;
 		break;
 
-	case UNWIND_HINT_TYPE_REGS_PARTIAL:
+	case UNWIND_HINT_TYPE_REGS_ENTRY:
+	case UNWIND_HINT_TYPE_REGS_EXIT:
 		if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) {
 			orc_warn_current("can't access iret registers at %pB\n",
 					 (void *)orig_ip);
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -35,8 +35,9 @@ struct unwind_hint {
  */
 #define UNWIND_HINT_TYPE_CALL		0
 #define UNWIND_HINT_TYPE_REGS		1
-#define UNWIND_HINT_TYPE_REGS_PARTIAL	2
-#define UNWIND_HINT_TYPE_FUNC		3
+#define UNWIND_HINT_TYPE_REGS_ENTRY	2
+#define UNWIND_HINT_TYPE_REGS_EXIT	3
+#define UNWIND_HINT_TYPE_FUNC		4
 
 #ifdef CONFIG_STACK_VALIDATION
 
--- a/tools/include/linux/objtool.h
+++ b/tools/include/linux/objtool.h
@@ -35,8 +35,9 @@ struct unwind_hint {
  */
 #define UNWIND_HINT_TYPE_CALL		0
 #define UNWIND_HINT_TYPE_REGS		1
-#define UNWIND_HINT_TYPE_REGS_PARTIAL	2
-#define UNWIND_HINT_TYPE_FUNC		3
+#define UNWIND_HINT_TYPE_REGS_ENTRY	2
+#define UNWIND_HINT_TYPE_REGS_EXIT	3
+#define UNWIND_HINT_TYPE_FUNC		4
 
 #ifdef CONFIG_STACK_VALIDATION
 
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2312,7 +2312,8 @@ static int update_cfi_state(struct instr
 	}
 
 	if (cfi->type == UNWIND_HINT_TYPE_REGS ||
-	    cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL)
+	    cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY ||
+	    cfi->type == UNWIND_HINT_TYPE_REGS_EXIT)
 		return update_cfi_state_regs(insn, cfi, op);
 
 	switch (op->dest.type) {
--- a/tools/objtool/orc_dump.c
+++ b/tools/objtool/orc_dump.c
@@ -43,7 +43,8 @@ static const char *orc_type_name(unsigne
 		return "call";
 	case UNWIND_HINT_TYPE_REGS:
 		return "regs";
-	case UNWIND_HINT_TYPE_REGS_PARTIAL:
+	case UNWIND_HINT_TYPE_REGS_ENTRY:
+	case UNWIND_HINT_TYPE_REGS_EXIT:
 		return "regs (partial)";
 	default:
 		return "?";




* [PATCH 08/29] x86/linkage: Add ENDBR to SYM_FUNC_START*()
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (6 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 07/29] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 09/29] x86/ibt,paravirt: Sprinkle ENDBR Peter Zijlstra
                   ` (21 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Ensure the ASM functions have ENDBR on for IBT builds; this follows
the ARM64 example. Unlike ARM64, we'll likely end up overwriting them
with poison.
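
Net effect (sketch; the exact SYM_START expansion lives in
include/linux/linkage.h): every SYM_FUNC_START*() user now opens with
a landing pad, roughly:

	SYM_FUNC_START(func)	=>	func:
						endbr64
						...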

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/linkage.h |   39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -3,6 +3,7 @@
 #define _ASM_X86_LINKAGE_H
 
 #include <linux/stringify.h>
+#include <asm/ibt.h>
 
 #undef notrace
 #define notrace __attribute__((no_instrument_function))
@@ -34,5 +35,43 @@
 
 #endif /* __ASSEMBLY__ */
 
+/*
+ * The compressed and purgatory builds define this to disable EXPORT;
+ * hijack the same define to also not emit ENDBR.
+ */
+#ifndef __DISABLE_EXPORTS
+
+/* SYM_FUNC_START -- use for global functions */
+#define SYM_FUNC_START(name)				\
+	SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN)	\
+	ENDBR
+
+/* SYM_FUNC_START_NOALIGN -- use for global functions, w/o alignment */
+#define SYM_FUNC_START_NOALIGN(name)			\
+	SYM_START(name, SYM_L_GLOBAL, SYM_A_NONE)	\
+	ENDBR
+
+/* SYM_FUNC_START_LOCAL -- use for local functions */
+#define SYM_FUNC_START_LOCAL(name)			\
+	SYM_START(name, SYM_L_LOCAL, SYM_A_ALIGN)	\
+	ENDBR
+
+/* SYM_FUNC_START_LOCAL_NOALIGN -- use for local functions, w/o alignment */
+#define SYM_FUNC_START_LOCAL_NOALIGN(name)		\
+	SYM_START(name, SYM_L_LOCAL, SYM_A_NONE)	\
+	ENDBR
+
+/* SYM_FUNC_START_WEAK -- use for weak functions */
+#define SYM_FUNC_START_WEAK(name)			\
+	SYM_START(name, SYM_L_WEAK, SYM_A_ALIGN)	\
+	ENDBR
+
+/* SYM_FUNC_START_WEAK_NOALIGN -- use for weak functions, w/o alignment */
+#define SYM_FUNC_START_WEAK_NOALIGN(name)		\
+	SYM_START(name, SYM_L_WEAK, SYM_A_NONE)		\
+	ENDBR
+
+#endif /* __DISABLE_EXPORTS */
+
 #endif /* _ASM_X86_LINKAGE_H */
 




* [PATCH 09/29] x86/ibt,paravirt: Sprinkle ENDBR
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (7 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 08/29] x86/linkage: Add ENDBR to SYM_FUNC_START*() Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 10/29] x86/bpf: Add ENDBR instructions to prologue Peter Zijlstra
                   ` (20 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Add ENDBR to the paravirt bits that are only ever reached via indirect
call: the PV_THUNK / callee-save thunks, the paravirt qspinlock unlock
path, __raw_callee_save___kvm_vcpu_is_preempted, _paravirt_nop and
paravirt_ret0. native_iret gets one too, since it is reached via
paravirt_iret.


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S                 |    1 +
 arch/x86/include/asm/paravirt.h           |    1 +
 arch/x86/include/asm/qspinlock_paravirt.h |    3 +++
 arch/x86/kernel/kvm.c                     |    3 ++-
 arch/x86/kernel/paravirt.c                |    2 ++
 5 files changed, 9 insertions(+), 1 deletion(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -636,6 +636,7 @@ SYM_INNER_LABEL(restore_regs_and_return_
 
 SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
 	UNWIND_HINT_IRET_REGS entry=0
+	ENDBR // paravirt_iret
 	/*
 	 * Are we returning to a stack segment from the LDT?  Note: in
 	 * 64-bit mode SS:RSP on the exception stack is always valid.
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -666,6 +666,7 @@ bool __raw_callee_save___native_vcpu_is_
 	    ".globl " PV_THUNK_NAME(func) ";"				\
 	    ".type " PV_THUNK_NAME(func) ", @function;"			\
 	    PV_THUNK_NAME(func) ":"					\
+	    ASM_ENDBR							\
 	    FRAME_BEGIN							\
 	    PV_SAVE_ALL_CALLER_REGS					\
 	    "call " #func ";"						\
--- a/arch/x86/include/asm/qspinlock_paravirt.h
+++ b/arch/x86/include/asm/qspinlock_paravirt.h
@@ -2,6 +2,8 @@
 #ifndef __ASM_QSPINLOCK_PARAVIRT_H
 #define __ASM_QSPINLOCK_PARAVIRT_H
 
+#include <asm/ibt.h>
+
 /*
  * For x86-64, PV_CALLEE_SAVE_REGS_THUNK() saves and restores 8 64-bit
  * registers. For i386, however, only 1 32-bit register needs to be saved
@@ -39,6 +41,7 @@ asm    (".pushsection .text;"
 	".type " PV_UNLOCK ", @function;"
 	".align 4,0x90;"
 	PV_UNLOCK ": "
+	ASM_ENDBR
 	FRAME_BEGIN
 	"push  %rdx;"
 	"mov   $0x1,%eax;"
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -1024,10 +1024,11 @@ asm(
 ".global __raw_callee_save___kvm_vcpu_is_preempted;"
 ".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
 "__raw_callee_save___kvm_vcpu_is_preempted:"
+ASM_ENDBR
 "movq	__per_cpu_offset(,%rdi,8), %rax;"
 "cmpb	$0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax);"
 "setne	%al;"
-"ret;"
+ASM_RET
 ".size __raw_callee_save___kvm_vcpu_is_preempted, .-__raw_callee_save___kvm_vcpu_is_preempted;"
 ".popsection");
 
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -41,6 +41,7 @@ extern void _paravirt_nop(void);
 asm (".pushsection .entry.text, \"ax\"\n"
      ".global _paravirt_nop\n"
      "_paravirt_nop:\n\t"
+     ASM_ENDBR
      ASM_RET
      ".size _paravirt_nop, . - _paravirt_nop\n\t"
      ".type _paravirt_nop, @function\n\t"
@@ -50,6 +51,7 @@ asm (".pushsection .entry.text, \"ax\"\n
 asm (".pushsection .entry.text, \"ax\"\n"
      ".global paravirt_ret0\n"
      "paravirt_ret0:\n\t"
+     ASM_ENDBR
      "xor %" _ASM_AX ", %" _ASM_AX ";\n\t"
      ASM_RET
      ".size paravirt_ret0, . - paravirt_ret0\n\t"




* [PATCH 10/29] x86/bpf: Add ENDBR instructions to prologue
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (8 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 09/29] x86/ibt,paravirt: Sprinkle ENDBR Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 11/29] x86/ibt,crypto: Add ENDBR for the jump-table entries Peter Zijlstra
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

With IBT-enabled builds we need ENDBR instructions at indirect jump
target sites. Since we start execution of the JIT'ed code through an
indirect jump, the very first instruction needs to be an ENDBR.

Similarly, since eBPF tail-calls use indirect branches, their landing
site needs to be an ENDBR too.

Note: this shifts the trampoline patch site by 4 bytes (the length of
ENDBR) but I've not yet figured out where this is used.
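
The resulting prologue layout (sketch assembled from the diff below):

	endbr64				/* indirect-jump landing pad */
	nop5				/* X86_PATCH_SIZE patch site */
	...
	push %rbp
	mov  %rsp, %rbp
	endbr64				/* X86_TAIL_CALL_OFFSET lands here */
	sub  $stack_depth, %rsp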

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/net/bpf_jit_comp.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -46,6 +46,12 @@ static u8 *emit_code(u8 *ptr, u32 bytes,
 #define EMIT4_off32(b1, b2, b3, b4, off) \
 	do { EMIT4(b1, b2, b3, b4); EMIT(off, 4); } while (0)
 
+#ifdef CONFIG_X86_IBT
+#define EMIT_ENDBR() EMIT4(0xf3, 0x0f, 0x1e, 0xfa)
+#else
+#define EMIT_ENDBR()
+#endif
+
 static bool is_imm8(int value)
 {
 	return value <= 127 && value >= -128;
@@ -241,7 +247,7 @@ struct jit_context {
 /* Number of bytes emit_patch() needs to generate instructions */
 #define X86_PATCH_SIZE		5
 /* Number of bytes that will be skipped on tailcall */
-#define X86_TAIL_CALL_OFFSET	11
+#define X86_TAIL_CALL_OFFSET	(11 + 4*IS_ENABLED(CONFIG_X86_IBT))
 
 static void push_callee_regs(u8 **pprog, bool *callee_regs_used)
 {
@@ -286,6 +292,7 @@ static void emit_prologue(u8 **pprog, u3
 	/* BPF trampoline can be made to work without these nops,
 	 * but let's waste 5 bytes for now and optimize later
 	 */
+	EMIT_ENDBR();
 	memcpy(prog, x86_nops[5], X86_PATCH_SIZE);
 	prog += X86_PATCH_SIZE;
 	if (!ebpf_from_cbpf) {
@@ -296,6 +303,10 @@ static void emit_prologue(u8 **pprog, u3
 	}
 	EMIT1(0x55);             /* push rbp */
 	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
+
+	/* X86_TAIL_CALL_OFFSET is here */
+	EMIT_ENDBR();
+
 	/* sub rsp, rounded_stack_depth */
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));




* [PATCH 11/29] x86/ibt,crypto: Add ENDBR for the jump-table entries
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (9 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 10/29] x86/bpf: Add ENDBR instructions to prologue Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 12/29] x86/ibt,kvm: Add ENDBR to fastops Peter Zijlstra
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

crc_pcl() dispatches into a table of crc_%i labels via an indirect
jump; give each jump-table entry an ENDBR so the landing sites stay
valid with IBT enabled.


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S |    3 +++
 1 file changed, 3 insertions(+)

--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -195,6 +195,7 @@ SYM_FUNC_START(crc_pcl)
 .altmacro
 LABEL crc_ %i
 .noaltmacro
+	ENDBR
 	crc32q   -i*8(block_0), crc_init
 	crc32q   -i*8(block_1), crc1
 	crc32q   -i*8(block_2), crc2
@@ -203,6 +204,7 @@ LABEL crc_ %i
 
 .altmacro
 LABEL crc_ %i
+	ENDBR
 .noaltmacro
 	crc32q   -i*8(block_0), crc_init
 	crc32q   -i*8(block_1), crc1
@@ -237,6 +239,7 @@ LABEL crc_ %i
 	################################################################
 
 LABEL crc_ 0
+	ENDBR
 	mov     tmp, len
 	cmp     $128*24, tmp
 	jae     full_block




* [PATCH 12/29] x86/ibt,kvm: Add ENDBR to fastops
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (10 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 11/29] x86/ibt,crypto: Add ENDBR for the jump-table entries Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 13/29] x86/ibt,ftrace: Add ENDBR to samples/ftrace Peter Zijlstra
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

The fastops are dispatched via an indirect call into a table of
fixed-size stubs, so each stub must now open with an ENDBR. The
4-byte ENDBR no longer fits the 8-byte slots, hence FASTOP_SIZE
doubles to 16 for IBT builds.


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kvm/emulate.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -189,7 +189,7 @@
 #define X16(x...) X8(x), X8(x)
 
 #define NR_FASTOP (ilog2(sizeof(ulong)) + 1)
-#define FASTOP_SIZE 8
+#define FASTOP_SIZE (8 * (1 + IS_ENABLED(CONFIG_X86_IBT)))
 
 struct opcode {
 	u64 flags;
@@ -311,7 +311,8 @@ static int fastop(struct x86_emulate_ctx
 #define __FOP_FUNC(name) \
 	".align " __stringify(FASTOP_SIZE) " \n\t" \
 	".type " name ", @function \n\t" \
-	name ":\n\t"
+	name ":\n\t" \
+	ASM_ENDBR
 
 #define FOP_FUNC(name) \
 	__FOP_FUNC(#name)
@@ -433,6 +434,7 @@ static int fastop(struct x86_emulate_ctx
 	".align 4 \n\t" \
 	".type " #op ", @function \n\t" \
 	#op ": \n\t" \
+	ASM_ENDBR \
 	#op " %al \n\t" \
 	__FOP_RET(#op)
 




* [PATCH 13/29] x86/ibt,ftrace: Add ENDBR to samples/ftrace
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (11 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 12/29] x86/ibt,kvm: Add ENDBR to fastops Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Add ENDBR to the ftrace direct-call sample trampolines, which are
reached via indirect call, and convert their remaining open-coded
'ret' instructions to ASM_RET while at it.


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 samples/ftrace/ftrace-direct-modify.c       |    5 +++++
 samples/ftrace/ftrace-direct-multi-modify.c |   10 +++++++---
 samples/ftrace/ftrace-direct-multi.c        |    5 ++++-
 samples/ftrace/ftrace-direct-too.c          |    3 +++
 samples/ftrace/ftrace-direct.c              |    3 +++
 5 files changed, 22 insertions(+), 4 deletions(-)

--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -24,20 +24,25 @@ static unsigned long my_ip = (unsigned l
 
 #ifdef CONFIG_X86_64
 
+#include <asm/ibt.h>
+
 asm (
 "	.pushsection    .text, \"ax\", @progbits\n"
 "	.type		my_tramp1, @function\n"
 "	.globl		my_tramp1\n"
 "   my_tramp1:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	call my_direct_func1\n"
 "	leave\n"
 "	.size		my_tramp1, .-my_tramp1\n"
 	ASM_RET
+
 "	.type		my_tramp2, @function\n"
 "	.globl		my_tramp2\n"
 "   my_tramp2:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	call my_direct_func2\n"
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -22,11 +22,14 @@ extern void my_tramp2(void *);
 
 #ifdef CONFIG_X86_64
 
+#include <asm/ibt.h>
+
 asm (
 "	.pushsection    .text, \"ax\", @progbits\n"
 "	.type		my_tramp1, @function\n"
 "	.globl		my_tramp1\n"
 "   my_tramp1:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	pushq %rdi\n"
@@ -34,12 +37,13 @@ asm (
 "	call my_direct_func1\n"
 "	popq %rdi\n"
 "	leave\n"
-"	ret\n"
+	ASM_RET
 "	.size		my_tramp1, .-my_tramp1\n"
+
 "	.type		my_tramp2, @function\n"
-"\n"
 "	.globl		my_tramp2\n"
 "   my_tramp2:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	pushq %rdi\n"
@@ -47,7 +51,7 @@ asm (
 "	call my_direct_func2\n"
 "	popq %rdi\n"
 "	leave\n"
-"	ret\n"
+	ASM_RET
 "	.size		my_tramp2, .-my_tramp2\n"
 "	.popsection\n"
 );
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -17,11 +17,14 @@ extern void my_tramp(void *);
 
 #ifdef CONFIG_X86_64
 
+#include <asm/ibt.h>
+
 asm (
 "	.pushsection    .text, \"ax\", @progbits\n"
 "	.type		my_tramp, @function\n"
 "	.globl		my_tramp\n"
 "   my_tramp:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	pushq %rdi\n"
@@ -29,7 +32,7 @@ asm (
 "	call my_direct_func\n"
 "	popq %rdi\n"
 "	leave\n"
-"	ret\n"
+	ASM_RET
 "	.size		my_tramp, .-my_tramp\n"
 "	.popsection\n"
 );
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -19,11 +19,14 @@ extern void my_tramp(void *);
 
 #ifdef CONFIG_X86_64
 
+#include <asm/ibt.h>
+
 asm (
 "	.pushsection    .text, \"ax\", @progbits\n"
 "	.type		my_tramp, @function\n"
 "	.globl		my_tramp\n"
 "   my_tramp:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	pushq %rdi\n"
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -16,11 +16,14 @@ extern void my_tramp(void *);
 
 #ifdef CONFIG_X86_64
 
+#include <asm/ibt.h>
+
 asm (
 "	.pushsection    .text, \"ax\", @progbits\n"
 "	.type		my_tramp, @function\n"
 "	.globl		my_tramp\n"
 "   my_tramp:"
+	ASM_ENDBR
 "	pushq %rbp\n"
 "	movq %rsp, %rbp\n"
 "	pushq %rdi\n"




* [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (12 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 13/29] x86/ibt,ftrace: Add ENDBR to samples/ftrace Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 19:31   ` Andrew Cooper
                     ` (4 more replies)
  2022-02-18 16:49 ` [PATCH 15/29] x86: Disable IBT around firmware Peter Zijlstra
                   ` (15 subsequent siblings)
  29 siblings, 5 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

The bits required to make the hardware go. Of note is that, provided
the syscall entry points are covered with ENDBR, #CP doesn't need to
be an IST because we'll never hit the syscall gap.
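
This also adds an 'ibt=' kernel command line parameter (see
ibt_setup() below):

	ibt=off		# clear X86_FEATURE_IBT; ENDBR enforcement stays off
	ibt=warn	# report missing ENDBR via #CP but don't BUG()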

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/cpufeatures.h          |    1 
 arch/x86/include/asm/idtentry.h             |    5 ++
 arch/x86/include/asm/msr-index.h            |   20 ++++++++
 arch/x86/include/asm/traps.h                |    2 
 arch/x86/include/uapi/asm/processor-flags.h |    2 
 arch/x86/kernel/cpu/common.c                |   23 +++++++++
 arch/x86/kernel/idt.c                       |    4 +
 arch/x86/kernel/traps.c                     |   65 ++++++++++++++++++++++++++++
 8 files changed, 121 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -387,6 +387,7 @@
 #define X86_FEATURE_TSXLDTRK		(18*32+16) /* TSX Suspend Load Address Tracking */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_ARCH_LBR		(18*32+19) /* Intel ARCH LBR */
+#define X86_FEATURE_IBT			(18*32+20) /* Indirect Branch Tracking */
 #define X86_FEATURE_AMX_BF16		(18*32+22) /* AMX bf16 Support */
 #define X86_FEATURE_AVX512_FP16		(18*32+23) /* AVX512 FP16 */
 #define X86_FEATURE_AMX_TILE		(18*32+24) /* AMX tile Support */
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -622,6 +622,11 @@ DECLARE_IDTENTRY_DF(X86_TRAP_DF,	exc_dou
 DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF,	xenpv_exc_double_fault);
 #endif
 
+/* #CP */
+#ifdef CONFIG_X86_IBT
+DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP,	exc_control_protection);
+#endif
+
 /* #VC */
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 DECLARE_IDTENTRY_VC(X86_TRAP_VC,	exc_vmm_communication);
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -360,11 +360,29 @@
 #define MSR_ATOM_CORE_TURBO_RATIOS	0x0000066c
 #define MSR_ATOM_CORE_TURBO_VIDS	0x0000066d
 
-
 #define MSR_CORE_PERF_LIMIT_REASONS	0x00000690
 #define MSR_GFX_PERF_LIMIT_REASONS	0x000006B0
 #define MSR_RING_PERF_LIMIT_REASONS	0x000006B1
 
+/* Control-flow Enforcement Technology MSRs */
+#define MSR_IA32_U_CET			0x000006a0 /* user mode cet */
+#define MSR_IA32_S_CET			0x000006a2 /* kernel mode cet */
+#define CET_SHSTK_EN			BIT_ULL(0)
+#define CET_WRSS_EN			BIT_ULL(1)
+#define CET_ENDBR_EN			BIT_ULL(2)
+#define CET_LEG_IW_EN			BIT_ULL(3)
+#define CET_NO_TRACK_EN			BIT_ULL(4)
+#define CET_SUPPRESS_DISABLE		BIT_ULL(5)
+#define CET_RESERVED			(BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))
+#define CET_SUPPRESS			BIT_ULL(10)
+#define CET_WAIT_ENDBR			BIT_ULL(11)
+
+#define MSR_IA32_PL0_SSP		0x000006a4 /* ring-0 shadow stack pointer */
+#define MSR_IA32_PL1_SSP		0x000006a5 /* ring-1 shadow stack pointer */
+#define MSR_IA32_PL2_SSP		0x000006a6 /* ring-2 shadow stack pointer */
+#define MSR_IA32_PL3_SSP		0x000006a7 /* ring-3 shadow stack pointer */
+#define MSR_IA32_INT_SSP_TAB		0x000006a8 /* exception shadow stack table */
+
 /* Hardware P state interface */
 #define MSR_PPERF			0x0000064e
 #define MSR_PERF_LIMIT_REASONS		0x0000064f
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -18,6 +18,8 @@ void __init trap_init(void);
 asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
 #endif
 
+extern bool ibt_selftest(void);
+
 #ifdef CONFIG_X86_F00F_BUG
 /* For handling the FOOF bug */
 void handle_invalid_op(struct pt_regs *regs);
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -130,6 +130,8 @@
 #define X86_CR4_SMAP		_BITUL(X86_CR4_SMAP_BIT)
 #define X86_CR4_PKE_BIT		22 /* enable Protection Keys support */
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
+#define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement Technology */
+#define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
 
 /*
  * x86-64 Task Priority Register, CR8
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -59,6 +59,7 @@
 #include <asm/cpu_device_id.h>
 #include <asm/uv/uv.h>
 #include <asm/sigframe.h>
+#include <asm/traps.h>
 
 #include "cpu.h"
 
@@ -592,6 +593,27 @@ static __init int setup_disable_pku(char
 __setup("nopku", setup_disable_pku);
 #endif /* CONFIG_X86_64 */
 
+static __always_inline void setup_cet(struct cpuinfo_x86 *c)
+{
+	u64 msr;
+
+	if (!IS_ENABLED(CONFIG_X86_IBT) ||
+	    !cpu_feature_enabled(X86_FEATURE_IBT))
+		return;
+
+	cr4_set_bits(X86_CR4_CET);
+
+	rdmsrl(MSR_IA32_S_CET, msr);
+	if (cpu_feature_enabled(X86_FEATURE_IBT))
+		msr |= CET_ENDBR_EN;
+	wrmsrl(MSR_IA32_S_CET, msr);
+
+	if (!ibt_selftest()) {
+		pr_err("IBT selftest: Failed!\n");
+		setup_clear_cpu_cap(X86_FEATURE_IBT);
+	}
+}
+
 /*
  * Some CPU features depend on higher CPUID levels, which may not always
  * be available due to CPUID level capping or broken virtualization
@@ -1709,6 +1731,7 @@ static void identify_cpu(struct cpuinfo_
 
 	x86_init_rdrand(c);
 	setup_pku(c);
+	setup_cet(c);
 
 	/*
 	 * Clear/Set all flags overridden by options, need do it
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -104,6 +104,10 @@ static const __initconst struct idt_data
 	ISTG(X86_TRAP_MC,		asm_exc_machine_check, IST_INDEX_MCE),
 #endif
 
+#ifdef CONFIG_X86_IBT
+	INTG(X86_TRAP_CP,		asm_exc_control_protection),
+#endif
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 	ISTG(X86_TRAP_VC,		asm_exc_vmm_communication, IST_INDEX_VC),
 #endif
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -210,6 +210,71 @@ DEFINE_IDTENTRY(exc_overflow)
 	do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL);
 }
 
+#ifdef CONFIG_X86_IBT
+
+static bool ibt_fatal = true;
+
+extern unsigned long ibt_selftest_ip; /* defined in asm below */
+static volatile bool ibt_selftest_ok = false;
+
+DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
+		pr_err("Whaaa?!?!\n");
+		return;
+	}
+
+	if (WARN_ON_ONCE(user_mode(regs) || error_code != 3))
+		return;
+
+	if (unlikely(regs->ip == ibt_selftest_ip)) {
+		ibt_selftest_ok = true;
+		return;
+	}
+
+	pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
+	BUG_ON(ibt_fatal);
+}
+
+bool ibt_selftest(void)
+{
+	ibt_selftest_ok = false;
+
+	asm (ANNOTATE_NOENDBR
+	     "1: lea 2f(%%rip), %%rax\n\t"
+	     ANNOTATE_RETPOLINE_SAFE
+	     "   jmp *%%rax\n\t"
+	     "2: nop\n\t"
+
+	     /* unsigned long ibt_selftest_ip = 2b */
+	     ".pushsection .data,\"aw\"\n\t"
+	     ".align 8\n\t"
+	     ".type ibt_selftest_ip, @object\n\t"
+	     ".size ibt_selftest_ip, 8\n\t"
+	     "ibt_selftest_ip:\n\t"
+	     ".quad 2b\n\t"
+	     ".popsection\n\t"
+
+	     : : : "rax", "memory");
+
+	return ibt_selftest_ok;
+}
+
+static int __init ibt_setup(char *str)
+{
+	if (!strcmp(str, "off"))
+		setup_clear_cpu_cap(X86_FEATURE_IBT);
+
+	if (!strcmp(str, "warn"))
+		ibt_fatal = false;
+
+	return 1;
+}
+
+__setup("ibt=", ibt_setup);
+
+#endif /* CONFIG_X86_IBT */
+
 #ifdef CONFIG_X86_F00F_BUG
 void handle_invalid_op(struct pt_regs *regs)
 #else




* [PATCH 15/29] x86: Disable IBT around firmware
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (13 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-21  8:27   ` Kees Cook
  2022-02-18 16:49 ` [PATCH 16/29] x86/bugs: Disable Retpoline when IBT Peter Zijlstra
                   ` (14 subsequent siblings)
  29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Assume firmware isn't IBT-clean and disable IBT across firmware calls.
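
The save/restore pattern, as used in the EFI and APM wrappers below:

	u64 ibt;

	ibt = ibt_save();	/* clears CET_ENDBR_EN */
	/* ... firmware call that may lack ENDBR landing pads ... */
	ibt_restore(ibt);	/* reinstates the saved CET_ENDBR_EN */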

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/efi.h   |    9 +++++++--
 arch/x86/include/asm/ibt.h   |   10 ++++++++++
 arch/x86/kernel/apm_32.c     |    7 +++++++
 arch/x86/kernel/cpu/common.c |   28 ++++++++++++++++++++++++++++
 4 files changed, 52 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -7,6 +7,7 @@
 #include <asm/tlb.h>
 #include <asm/nospec-branch.h>
 #include <asm/mmu_context.h>
+#include <asm/ibt.h>
 #include <linux/build_bug.h>
 #include <linux/kernel.h>
 #include <linux/pgtable.h>
@@ -120,8 +121,12 @@ extern asmlinkage u64 __efi_call(void *f
 	efi_enter_mm();							\
 })
 
-#define arch_efi_call_virt(p, f, args...)				\
-	efi_call((void *)p->f, args)					\
+#define arch_efi_call_virt(p, f, args...) ({				\
+	u64 ret, ibt = ibt_save();					\
+	ret = efi_call((void *)p->f, args);				\
+	ibt_restore(ibt);						\
+	ret;								\
+})
 
 #define arch_efi_call_virt_teardown()					\
 ({									\
--- a/arch/x86/include/asm/ibt.h
+++ b/arch/x86/include/asm/ibt.h
@@ -6,6 +6,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/types.h>
+
 #ifdef CONFIG_X86_64
 #define ASM_ENDBR	"endbr64\n\t"
 #else
@@ -25,6 +27,9 @@ static inline bool is_endbr(const void *
 	return val == ~0xfa1e0ff3;
 }
 
+extern u64 ibt_save(void);
+extern void ibt_restore(u64 save);
+
 #else /* __ASSEMBLY__ */
 
 #ifdef CONFIG_X86_64
@@ -39,10 +44,15 @@ static inline bool is_endbr(const void *
 
 #ifndef __ASSEMBLY__
 
+#include <linux/types.h>
+
 #define ASM_ENDBR
 
 #define __noendbr
 
+static inline u64 ibt_save(void) { return 0; }
+static inline void ibt_restore(u64 save) { }
+
 #else /* __ASSEMBLY__ */
 
 #define ENDBR
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -232,6 +232,7 @@
 #include <asm/paravirt.h>
 #include <asm/reboot.h>
 #include <asm/nospec-branch.h>
+#include <asm/ibt.h>
 
 #if defined(CONFIG_APM_DISPLAY_BLANK) && defined(CONFIG_VT)
 extern int (*console_blank_hook)(int);
@@ -598,6 +599,7 @@ static long __apm_bios_call(void *_call)
 	struct desc_struct	save_desc_40;
 	struct desc_struct	*gdt;
 	struct apm_bios_call	*call = _call;
+	u64			ibt;
 
 	cpu = get_cpu();
 	BUG_ON(cpu != 0);
@@ -607,11 +609,13 @@ static long __apm_bios_call(void *_call)
 
 	apm_irq_save(flags);
 	firmware_restrict_branch_speculation_start();
+	ibt = ibt_save();
 	APM_DO_SAVE_SEGS;
 	apm_bios_call_asm(call->func, call->ebx, call->ecx,
 			  &call->eax, &call->ebx, &call->ecx, &call->edx,
 			  &call->esi);
 	APM_DO_RESTORE_SEGS;
+	ibt_restore(ibt);
 	firmware_restrict_branch_speculation_end();
 	apm_irq_restore(flags);
 	gdt[0x40 / 8] = save_desc_40;
@@ -676,6 +680,7 @@ static long __apm_bios_call_simple(void
 	struct desc_struct	save_desc_40;
 	struct desc_struct	*gdt;
 	struct apm_bios_call	*call = _call;
+	u64			ibt;
 
 	cpu = get_cpu();
 	BUG_ON(cpu != 0);
@@ -685,10 +690,12 @@ static long __apm_bios_call_simple(void
 
 	apm_irq_save(flags);
 	firmware_restrict_branch_speculation_start();
+	ibt = ibt_save();
 	APM_DO_SAVE_SEGS;
 	error = apm_bios_call_simple_asm(call->func, call->ebx, call->ecx,
 					 &call->eax);
 	APM_DO_RESTORE_SEGS;
+	ibt_restore(ibt);
 	firmware_restrict_branch_speculation_end();
 	apm_irq_restore(flags);
 	gdt[0x40 / 8] = save_desc_40;
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -592,6 +592,34 @@ static __init int setup_disable_pku(char
 __setup("nopku", setup_disable_pku);
 #endif /* CONFIG_X86_64 */
 
+#ifdef CONFIG_X86_IBT
+
+u64 ibt_save(void)
+{
+	u64 msr = 0;
+
+	if (cpu_feature_enabled(X86_FEATURE_IBT)) {
+		rdmsrl(MSR_IA32_S_CET, msr);
+		wrmsrl(MSR_IA32_S_CET, msr & ~CET_ENDBR_EN);
+	}
+
+	return msr;
+}
+
+void ibt_restore(u64 save)
+{
+	u64 msr;
+
+	if (cpu_feature_enabled(X86_FEATURE_IBT)) {
+		rdmsrl(MSR_IA32_S_CET, msr);
+		msr &= ~CET_ENDBR_EN;
+		msr |= (save & CET_ENDBR_EN);
+		wrmsrl(MSR_IA32_S_CET, msr);
+	}
+}
+
+#endif
+
 static __always_inline void setup_cet(struct cpuinfo_x86 *c)
 {
 	u64 msr;



^ permalink raw reply	[flat|nested] 94+ messages in thread
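
A note on the ~0xfa1e0ff3 above: endbr64 encodes as the bytes
f3 0f 1e fa, which read as a little-endian dword is 0xfa1e0ff3.
Keeping the constant complemented means the literal ENDBR byte pattern
never appears in the check's own text, where a stray indirect branch
could otherwise land on it. A hedged user-space restatement of the
hunk (is_endbr_sketch is an illustrative name, not the kernel's):

  #include <stdbool.h>
  #include <stdint.h>
  #include <string.h>

  static bool is_endbr_sketch(const void *addr)
  {
  	uint32_t val;

  	/* memcpy sidesteps alignment/aliasing pedantry */
  	memcpy(&val, addr, 4);

  	/* compare complements; the intent is that the object code
  	 * carries 0x05e1f00c, never the raw 0xfa1e0ff3 pattern */
  	return ~val == ~(uint32_t)0xfa1e0ff3;
  }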

* [PATCH 16/29] x86/bugs: Disable Retpoline when IBT
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (14 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 15/29] x86: Disable IBT around firmware Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-19  2:15   ` Josh Poimboeuf
  2022-02-18 16:49 ` [PATCH 17/29] x86/ibt: Annotate text references Peter Zijlstra
                   ` (13 subsequent siblings)
  29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Retpoline and IBT are mutually exclusive. IBT relies on indirect
branches (JMP/CALL *%reg) while retpoline avoids them by design.

Demote to LFENCE on IBT-enabled hardware.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/cpu/bugs.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -937,6 +937,11 @@ static void __init spectre_v2_select_mit
 	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
 	retpoline_amd:
 		if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
+			if (IS_ENABLED(CONFIG_X86_IBT) &&
+			    boot_cpu_has(X86_FEATURE_IBT)) {
+				pr_err("Spectre mitigation: LFENCE not serializing, generic retpoline not available due to IBT, switching to none\n");
+				return;
+			}
 			pr_err("Spectre mitigation: LFENCE not serializing, switching to generic retpoline\n");
 			goto retpoline_generic;
 		}
@@ -945,6 +950,26 @@ static void __init spectre_v2_select_mit
 		setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
 	} else {
 	retpoline_generic:
+		/*
+		 *  Full retpoline is incompatible with IBT, demote to LFENCE.
+		 */
+		if (IS_ENABLED(CONFIG_X86_IBT) &&
+		    boot_cpu_has(X86_FEATURE_IBT)) {
+			switch (cmd) {
+			case SPECTRE_V2_CMD_FORCE:
+			case SPECTRE_V2_CMD_AUTO:
+			case SPECTRE_V2_CMD_RETPOLINE:
+				/* silent for auto select */
+				break;
+
+			default:
+				/* warn when 'demoting' an explicit selection */
+				pr_warn("Spectre mitigation: Switching to LFENCE due to IBT\n");
+				break;
+			}
+
+			goto retpoline_amd;
+		}
 		mode = SPECTRE_V2_RETPOLINE_GENERIC;
 		setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
 	}



^ permalink raw reply	[flat|nested] 94+ messages in thread
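
To make the conflict concrete, compare the shape of the two thunks.
This is a hedged sketch simplified from arch/x86/lib/retpoline.S, not
the exact kernel macros; fn and both function names are illustrative:

  static void (*fn)(void);

  static void lfence_thunk(void)	/* the demoted, IBT-friendly form */
  {
  	/* serialize, then a true indirect JMP; IBT checks that the
  	 * target *fn begins with endbr64 */
  	asm volatile ("lfence\n\t"
  		      "jmp *%0" : : "r" (fn));
  }

  static void generic_retpoline_thunk(void)	/* what IBT rules out */
  {
  	/* the transfer to *fn happens via RET, so no indirect JMP/CALL
  	 * is ever executed and IBT's ENDBR tracking never engages */
  	asm volatile ("call 1f\n"
  		      "2:\tpause; lfence; jmp 2b\n"
  		      "1:\tmov %0, (%%rsp)\n\t"
  		      "ret" : : "r" (fn));
  }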

* [PATCH 17/29] x86/ibt: Annotate text references
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (15 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 16/29] x86/bugs: Disable Retpoline when IBT Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-19  5:22   ` Josh Poimboeuf
  2022-02-18 16:49 ` [PATCH 18/29] x86/ibt,ftrace: Annotate ftrace code patching Peter Zijlstra
                   ` (12 subsequent siblings)
  29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Annotate away some of the generic code references. These are cases
where we take the address of a symbol for exception handling or for a
return address (e.g. context switch).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S            |    9 +++++++++
 arch/x86/entry/entry_64_compat.S     |    1 +
 arch/x86/kernel/alternative.c        |    4 +++-
 arch/x86/kernel/head_64.S            |    4 ++++
 arch/x86/kernel/kprobes/core.c       |    1 +
 arch/x86/kernel/relocate_kernel_64.S |    2 ++
 arch/x86/lib/error-inject.c          |    1 +
 arch/x86/lib/retpoline.S             |    2 ++
 10 files changed, 33 insertions(+), 2 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -278,6 +278,7 @@ SYM_FUNC_END(__switch_to_asm)
 .pushsection .text, "ax"
 SYM_CODE_START(ret_from_fork)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR // copy_thread
 	movq	%rax, %rdi
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
@@ -564,12 +565,16 @@ SYM_CODE_END(\asmsym)
 	.align 16
 	.globl __irqentry_text_start
 __irqentry_text_start:
+	ANNOTATE_NOENDBR // unwinders
+	ud2;
 
 #include <asm/idtentry.h>
 
 	.align 16
 	.globl __irqentry_text_end
 __irqentry_text_end:
+	ANNOTATE_NOENDBR
+	ud2;
 
 SYM_CODE_START_LOCAL(common_interrupt_return)
 SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
@@ -647,6 +652,7 @@ SYM_INNER_LABEL_ALIGN(native_iret, SYM_L
 #endif
 
 SYM_INNER_LABEL(native_irq_return_iret, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR // exc_double_fault
 	/*
 	 * This may fault.  Non-paranoid faults on return to userspace are
 	 * handled by fixup_bad_iret.  These include #SS, #GP, and #NP.
@@ -741,6 +747,7 @@ SYM_FUNC_START(asm_load_gs_index)
 	FRAME_BEGIN
 	swapgs
 .Lgs_change:
+	ANNOTATE_NOENDBR // error_entry
 	movl	%edi, %gs
 2:	ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE
 	swapgs
@@ -1318,6 +1325,7 @@ SYM_CODE_START(asm_exc_nmi)
 #endif
 
 repeat_nmi:
+	ANNOTATE_NOENDBR // this code
 	/*
 	 * If there was a nested NMI, the first NMI's iret will return
 	 * here. But NMIs are still enabled and we can take another
@@ -1346,6 +1354,7 @@ SYM_CODE_START(asm_exc_nmi)
 	.endr
 	subq	$(5*8), %rsp
 end_repeat_nmi:
+	ANNOTATE_NOENDBR // this code
 
 	/*
 	 * Everything below this point can be preempted by a nested NMI.
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -148,6 +148,7 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_af
 	popfq
 	jmp	.Lsysenter_flags_fixed
 SYM_INNER_LABEL(__end_entry_SYSENTER_compat, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR // is_sysenter_singlestep
 SYM_CODE_END(entry_SYSENTER_compat)
 
 /*
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -713,6 +713,7 @@ asm (
 "	.pushsection	.init.text, \"ax\", @progbits\n"
 "	.type		int3_magic, @function\n"
 "int3_magic:\n"
+	ANNOTATE_NOENDBR
 "	movl	$1, (%" _ASM_ARG1 ")\n"
 	ASM_RET
 "	.size		int3_magic, .-int3_magic\n"
@@ -757,7 +758,8 @@ static void __init int3_selftest(void)
 	 * then trigger the INT3, padded with NOPs to match a CALL instruction
 	 * length.
 	 */
-	asm volatile ("1: int3; nop; nop; nop; nop\n\t"
+	asm volatile (ANNOTATE_NOENDBR
+		      "1: int3; nop; nop; nop; nop\n\t"
 		      ".pushsection .init.data,\"aw\"\n\t"
 		      ".align " __ASM_SEL(4, 8) "\n\t"
 		      ".type int3_selftest_ip, @object\n\t"
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -100,6 +100,7 @@ SYM_CODE_END(startup_64)
 
 SYM_CODE_START(secondary_startup_64)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 	/*
 	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
 	 * and someone has loaded a mapped page table.
@@ -128,6 +129,7 @@ SYM_CODE_START(secondary_startup_64)
 	 */
 SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 
 	/*
 	 * Retrieve the modifier (SME encryption mask if SME is active) to be
@@ -193,6 +195,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_
 	jmp	*%rax
 1:
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR // above
 
 	/*
 	 * We must switch to a new descriptor in kernel space for the GDT
@@ -300,6 +303,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_
 	pushq	%rax		# target address in negative space
 	lretq
 .Lafter_lret:
+	ANNOTATE_NOENDBR
 SYM_CODE_END(secondary_startup_64)
 
 #include "verify_cpu.S"
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1023,6 +1023,7 @@ asm(
 	".type __kretprobe_trampoline, @function\n"
 	"__kretprobe_trampoline:\n"
 #ifdef CONFIG_X86_64
+	ANNOTATE_NOENDBR
 	/* Push a fake return address to tell the unwinder it's a kretprobe. */
 	"	pushq $__kretprobe_trampoline\n"
 	UNWIND_HINT_FUNC
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -42,6 +42,7 @@
 	.code64
 SYM_CODE_START_NOALIGN(relocate_kernel)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 	/*
 	 * %rdi indirection_page
 	 * %rsi page_list
@@ -215,6 +216,7 @@ SYM_CODE_END(identity_mapped)
 
 SYM_CODE_START_LOCAL_NOALIGN(virtual_mapped)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR // RET target, above
 	movq	RSP(%r8), %rsp
 	movq	CR4(%r8), %rax
 	movq	%rax, %cr4
--- a/arch/x86/lib/error-inject.c
+++ b/arch/x86/lib/error-inject.c
@@ -11,6 +11,7 @@ asm(
 	".type just_return_func, @function\n"
 	".globl just_return_func\n"
 	"just_return_func:\n"
+		ANNOTATE_NOENDBR
 		ASM_RET
 	".size just_return_func, .-just_return_func\n"
 );
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -12,6 +12,8 @@
 
 	.section .text.__x86.indirect_thunk
 
+	ANNOTATE_NOENDBR // apply_retpolines
+
 .macro RETPOLINE reg
 	ANNOTATE_INTRA_FUNCTION_CALL
 	call    .Ldo_rop_\@



^ permalink raw reply	[flat|nested] 94+ messages in thread
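
For reference, the assumed shape of the annotation used throughout:
record the annotated address in a discarded section that objtool later
walks (patch 23 reads it back via .rela.discard.noendbr). The real
macro lives in asm/ibt.h and may differ in spelling:

  #define ANNOTATE_NOENDBR_SKETCH			\
  	"986:\n\t"					\
  	".pushsection .discard.noendbr\n\t"		\
  	".quad 986b\n\t"				\
  	".popsection\n\t"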

* [PATCH 18/29] x86/ibt,ftrace: Annotate ftrace code patching
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (16 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 17/29] x86/ibt: Annotate text references Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 19/29] x86/ibt,xen: Annotate away warnings Peter Zijlstra
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Notably, the noinline is required to generate sane code; without it GCC
thinks it's awesome to fold a constant into the code reloc, which puts
it in the wrong place to match with the ANNOTATE_NOENDBR.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/ftrace.c    |    2 +-
 arch/x86/kernel/ftrace_64.S |    9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -69,7 +69,7 @@ static const char *ftrace_nop_replace(vo
 	return x86_nops[5];
 }
 
-static const char *ftrace_call_replace(unsigned long ip, unsigned long addr)
+static noinline const char *ftrace_call_replace(unsigned long ip, unsigned long addr)
 {
 	return text_gen_insn(CALL_INSN_OPCODE, (void *)ip, (void *)addr);
 }
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -145,6 +145,7 @@ SYM_FUNC_START(ftrace_caller)
 	movq %rcx, RSP(%rsp)
 
 SYM_INNER_LABEL(ftrace_caller_op_ptr, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	/* Load the ftrace_ops into the 3rd parameter */
 	movq function_trace_op(%rip), %rdx
 
@@ -155,6 +156,7 @@ SYM_INNER_LABEL(ftrace_caller_op_ptr, SY
 	movq $0, CS(%rsp)
 
 SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	call ftrace_stub
 
 	/* Handlers can change the RIP */
@@ -169,6 +171,7 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBA
 	 * layout here.
 	 */
 SYM_INNER_LABEL(ftrace_caller_end, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 
 	jmp ftrace_epilogue
 SYM_FUNC_END(ftrace_caller);
@@ -179,6 +182,7 @@ SYM_FUNC_START(ftrace_epilogue)
  * It is also used to copy the RET for trampolines.
  */
 SYM_INNER_LABEL_ALIGN(ftrace_stub, SYM_L_WEAK)
+	ANNOTATE_NOENDBR
 	UNWIND_HINT_FUNC
 	RET
 SYM_FUNC_END(ftrace_epilogue)
@@ -192,6 +196,7 @@ SYM_FUNC_START(ftrace_regs_caller)
 	/* save_mcount_regs fills in first two parameters */
 
 SYM_INNER_LABEL(ftrace_regs_caller_op_ptr, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	/* Load the ftrace_ops into the 3rd parameter */
 	movq function_trace_op(%rip), %rdx
 
@@ -221,6 +226,7 @@ SYM_INNER_LABEL(ftrace_regs_caller_op_pt
 	leaq (%rsp), %rcx
 
 SYM_INNER_LABEL(ftrace_regs_call, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	call ftrace_stub
 
 	/* Copy flags back to SS, to restore them */
@@ -248,6 +254,7 @@ SYM_INNER_LABEL(ftrace_regs_call, SYM_L_
 	 */
 	testq	%rax, %rax
 SYM_INNER_LABEL(ftrace_regs_caller_jmp, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	jnz	1f
 
 	restore_mcount_regs
@@ -261,6 +268,7 @@ SYM_INNER_LABEL(ftrace_regs_caller_jmp,
 	 * to the return.
 	 */
 SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	jmp ftrace_epilogue
 
 	/* Swap the flags with orig_rax */
@@ -284,6 +292,7 @@ SYM_FUNC_START(__fentry__)
 	jnz trace
 
 SYM_INNER_LABEL(ftrace_stub, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 	RET
 
 trace:



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 19/29] x86/ibt,xen: Annotate away warnings
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (17 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 18/29] x86/ibt,ftrace: Annotate ftrace code patching Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 20:24   ` Andrew Cooper
  2022-02-18 16:49 ` [PATCH 20/29] x86/ibt,sev: Annotations Peter Zijlstra
                   ` (10 subsequent siblings)
  29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

The xen_iret ENDBR is needed for pre-alternative code calling the
pv_ops using indirect calls.

The rest look like hypervisor entry points, which will be IRET-like
transfers and as such don't need ENDBR.

The hypercall page comes from the hypervisor; there might or might not
be ENDBR there, but that's not our problem.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S |    1 +
 arch/x86/kernel/head_64.S |    1 +
 arch/x86/xen/xen-asm.S    |    8 ++++++++
 arch/x86/xen/xen-head.S   |    5 +++--
 4 files changed, 13 insertions(+), 2 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -818,6 +818,7 @@ SYM_CODE_END(exc_xen_hypervisor_callback
  */
 SYM_CODE_START(xen_failsafe_callback)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 	movl	%ds, %ecx
 	cmpw	%cx, 0x10(%rsp)
 	jne	1f
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -392,6 +392,7 @@ SYM_CODE_START(early_idt_handler_array)
 	.endr
 	UNWIND_HINT_IRET_REGS offset=16 entry=0
 SYM_CODE_END(early_idt_handler_array)
+	ANNOTATE_NOENDBR // early_idt_handler_array[NUM_EXCEPTION_VECTORS]
 
 SYM_CODE_START_LOCAL(early_idt_handler_common)
 	/*
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -122,6 +122,7 @@ SYM_FUNC_END(xen_read_cr2_direct);
 .macro xen_pv_trap name
 SYM_CODE_START(xen_\name)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 	pop %rcx
 	pop %r11
 	jmp  \name
@@ -162,6 +163,7 @@ SYM_CODE_START(xen_early_idt_handler_arr
 	i = 0
 	.rept NUM_EXCEPTION_VECTORS
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 	pop %rcx
 	pop %r11
 	jmp early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE
@@ -169,6 +171,7 @@ SYM_CODE_START(xen_early_idt_handler_arr
 	.fill xen_early_idt_handler_array + i*XEN_EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
 	.endr
 SYM_CODE_END(xen_early_idt_handler_array)
+	ANNOTATE_NOENDBR
 	__FINIT
 
 hypercall_iret = hypercall_page + __HYPERVISOR_iret * 32
@@ -189,6 +192,7 @@ hypercall_iret = hypercall_page + __HYPE
  */
 SYM_CODE_START(xen_iret)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	pushq $0
 	jmp hypercall_iret
 SYM_CODE_END(xen_iret)
@@ -230,6 +234,7 @@ SYM_CODE_END(xenpv_restore_regs_and_retu
 /* Normal 64-bit system call target */
 SYM_CODE_START(xen_syscall_target)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 	popq %rcx
 	popq %r11
 
@@ -249,6 +254,7 @@ SYM_CODE_END(xen_syscall_target)
 /* 32-bit compat syscall target */
 SYM_CODE_START(xen_syscall32_target)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 	popq %rcx
 	popq %r11
 
@@ -266,6 +272,7 @@ SYM_CODE_END(xen_syscall32_target)
 /* 32-bit compat sysenter target */
 SYM_CODE_START(xen_sysenter_target)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 	/*
 	 * NB: Xen is polite and clears TF from EFLAGS for us.  This means
 	 * that we don't need to guard against single step exceptions here.
@@ -289,6 +296,7 @@ SYM_CODE_END(xen_sysenter_target)
 SYM_CODE_START(xen_syscall32_target)
 SYM_CODE_START(xen_sysenter_target)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 	lea 16(%rsp), %rsp	/* strip %rcx, %r11 */
 	mov $-ENOSYS, %rax
 	pushq $0
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -25,8 +25,8 @@
 SYM_CODE_START(hypercall_page)
 	.rept (PAGE_SIZE / 32)
 		UNWIND_HINT_FUNC
-		.skip 31, 0x90
-		RET
+		ANNOTATE_NOENDBR
+		.skip 32, 0xcc
 	.endr
 
 #define HYPERCALL(n) \
@@ -74,6 +74,7 @@ SYM_CODE_END(startup_xen)
 .pushsection .text
 SYM_CODE_START(asm_cpu_bringup_and_idle)
 	UNWIND_HINT_EMPTY
+	ANNOTATE_NOENDBR
 
 	call cpu_bringup_and_idle
 SYM_CODE_END(asm_cpu_bringup_and_idle)



^ permalink raw reply	[flat|nested] 94+ messages in thread
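
A hedged sketch of the xen_iret case called out above: before
alternatives rewrite them, pv_ops slots are reached through an
indirect CALL, which under IBT must land on ENDBR. Structure and
names here are illustrative, not the kernel's actual pv_ops layout:

  struct pv_ops_sketch {
  	void (*iret)(void);
  };

  static struct pv_ops_sketch pv_ops_sketch;

  static void early_iret_path(void)
  {
  	/* indirect call: the target must begin with endbr64 */
  	pv_ops_sketch.iret();
  }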

* [PATCH 20/29] x86/ibt,sev: Annotations
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (18 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 19/29] x86/ibt,xen: Annotate away warnings Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 21/29] objtool: Rename --duplicate to --lto Peter Zijlstra
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

No IBT on AMD so far... probably correct, who knows.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/entry_64.S        |    1 +
 arch/x86/entry/entry_64_compat.S |    1 +
 arch/x86/kernel/head_64.S        |    1 +
 3 files changed, 3 insertions(+)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -96,6 +96,7 @@ SYM_CODE_START(entry_SYSCALL_64)
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 SYM_INNER_LABEL(entry_SYSCALL_64_safe_stack, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 
 	/* Construct struct pt_regs on stack */
 	pushq	$__USER_DS				/* pt_regs->ss */
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -214,6 +214,7 @@ SYM_CODE_START(entry_SYSCALL_compat)
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
 SYM_INNER_LABEL(entry_SYSCALL_compat_safe_stack, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
 
 	/* Construct struct pt_regs on stack */
 	pushq	$__USER32_DS		/* pt_regs->ss */
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -440,6 +440,7 @@ SYM_CODE_END(early_idt_handler_common)
  */
 SYM_CODE_START_NOALIGN(vc_no_ghcb)
 	UNWIND_HINT_IRET_REGS offset=8
+	ENDBR
 
 	/* Build pt_regs */
 	PUSH_AND_CLEAR_REGS



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 21/29] objtool: Rename --duplicate to --lto
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (19 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 20/29] x86/ibt,sev: Annotations Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-26 19:42   ` Josh Poimboeuf
  2022-02-18 16:49 ` [PATCH 22/29] Kbuild: Prepare !CLANG whole module objtool Peter Zijlstra
                   ` (8 subsequent siblings)
  29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

In order to prepare for LTO-like objtool runs for modules, rename the
--duplicate argument to --lto.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 scripts/link-vmlinux.sh                 |    2 +-
 tools/objtool/builtin-check.c           |    4 ++--
 tools/objtool/check.c                   |    7 ++++++-
 tools/objtool/include/objtool/builtin.h |    2 +-
 4 files changed, 10 insertions(+), 5 deletions(-)

--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -115,7 +115,7 @@ objtool_link()
 			objtoolcmd="orc generate"
 		fi
 
-		objtoolopt="${objtoolopt} --duplicate"
+		objtoolopt="${objtoolopt} --lto"
 
 		if is_enabled CONFIG_FTRACE_MCOUNT_USE_OBJTOOL; then
 			objtoolopt="${objtoolopt} --mcount"
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -20,7 +20,7 @@
 #include <objtool/objtool.h>
 
 bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-     validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
+     lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
 
 static const char * const check_usage[] = {
 	"objtool check [<options>] file.o",
@@ -40,7 +40,7 @@ const struct option check_options[] = {
 	OPT_BOOLEAN('b', "backtrace", &backtrace, "unwind on error"),
 	OPT_BOOLEAN('a', "uaccess", &uaccess, "enable uaccess checking"),
 	OPT_BOOLEAN('s', "stats", &stats, "print statistics"),
-	OPT_BOOLEAN('d', "duplicate", &validate_dup, "duplicate validation for vmlinux.o"),
+	OPT_BOOLEAN(0, "lto", &lto, "whole-archive like runs"),
 	OPT_BOOLEAN('n', "noinstr", &noinstr, "noinstr validation for vmlinux.o"),
 	OPT_BOOLEAN('l', "vmlinux", &vmlinux, "vmlinux.o validation"),
 	OPT_BOOLEAN('M', "mcount", &mcount, "generate __mcount_loc"),
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3501,6 +3501,11 @@ int check(struct objtool_file *file)
 {
 	int ret, warnings = 0;
 
+	if (lto && !(vmlinux || module)) {
+		fprintf(stderr, "--lto requires: --vmlinux or --module\n");
+		return 1;
+	}
+
 	arch_initial_func_cfi_state(&initial_func_cfi);
 	init_cfi_state(&init_cfi);
 	init_cfi_state(&func_cfi);
@@ -3521,7 +3526,7 @@ int check(struct objtool_file *file)
 	if (list_empty(&file->insn_list))
 		goto out;
 
-	if (vmlinux && !validate_dup) {
+	if (vmlinux && !lto) {
 		ret = validate_vmlinux_functions(file);
 		if (ret < 0)
 			goto out;
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -9,7 +9,7 @@
 
 extern const struct option check_options[];
 extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-            validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
+	    lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
 
 extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 22/29] Kbuild: Prepare !CLANG whole module objtool
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (20 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 21/29] objtool: Rename --duplicate to --lto Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 23/29] objtool: Read the NOENDBR annotation Peter Zijlstra
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Massage the Kbuild stuff to allow running objtool on whole modules.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Makefile               |    2 ++
 scripts/Makefile.build |   25 ++++++++++++++++---------
 scripts/Makefile.lib   |    2 +-
 3 files changed, 19 insertions(+), 10 deletions(-)

--- a/Makefile
+++ b/Makefile
@@ -907,6 +907,8 @@ ifdef CONFIG_LTO
 KBUILD_CFLAGS	+= -fno-lto $(CC_FLAGS_LTO)
 KBUILD_AFLAGS	+= -fno-lto
 export CC_FLAGS_LTO
+BUILD_LTO	:= y
+export BUILD_LTO
 endif
 
 ifdef CONFIG_CFI_CLANG
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -88,7 +88,7 @@ endif
 
 targets-for-modules := $(patsubst %.o, %.mod, $(filter %.o, $(obj-m)))
 
-ifdef CONFIG_LTO_CLANG
+ifdef BUILD_LTO
 targets-for-modules += $(patsubst %.o, %.lto.o, $(filter %.o, $(obj-m)))
 endif
 
@@ -230,6 +230,7 @@ objtool := $(objtree)/tools/objtool/objt
 objtool_args =								\
 	$(if $(CONFIG_UNWINDER_ORC),orc generate,check)			\
 	$(if $(part-of-module), --module)				\
+	$(if $(BUILD_LTO), --lto)					\
 	$(if $(CONFIG_FRAME_POINTER),, --no-fp)				\
 	$(if $(CONFIG_GCOV_KERNEL)$(CONFIG_LTO_CLANG), --no-unreachable)\
 	$(if $(CONFIG_RETPOLINE), --retpoline)				\
@@ -242,11 +243,16 @@ cmd_gen_objtooldep = $(if $(objtool-enab
 
 endif # CONFIG_STACK_VALIDATION
 
-ifdef CONFIG_LTO_CLANG
+ifdef BUILD_LTO
 
 # Skip objtool for LLVM bitcode
 $(obj)/%.o: objtool-enabled :=
 
+# objtool was skipped for LLVM bitcode, run it now that we have compiled
+# modules into native code
+$(obj)/%.lto.o: objtool-enabled = y
+$(obj)/%.lto.o: part-of-module := y
+
 else
 
 # 'OBJECT_FILES_NON_STANDARD := y': skip objtool checking for a directory
@@ -292,21 +298,22 @@ ifdef CONFIG_LTO_CLANG
 # Module .o files may contain LLVM bitcode, compile them into native code
 # before ELF processing
 quiet_cmd_cc_lto_link_modules = LTO [M] $@
-cmd_cc_lto_link_modules =						\
+      cmd_cc_lto_link_modules =						\
 	$(LD) $(ld_flags) -r -o $@					\
 		$(shell [ -s $(@:.lto.o=.o.symversions) ] &&		\
 			echo -T $(@:.lto.o=.o.symversions))		\
 		--whole-archive $(filter-out FORCE,$^)			\
 		$(cmd_objtool)
-
-# objtool was skipped for LLVM bitcode, run it now that we have compiled
-# modules into native code
-$(obj)/%.lto.o: objtool-enabled = y
-$(obj)/%.lto.o: part-of-module := y
+else
+quiet_cmd_cc_lto_link_modules = LD [M] $@
+      cmd_cc_lto_link_modules =						\
+	$(LD) $(ld_flags) -r -o $@					\
+		$(filter-out FORCE,$^)					\
+		$(cmd_objtool)
+endif
 
 $(obj)/%.lto.o: $(obj)/%.o FORCE
 	$(call if_changed,cc_lto_link_modules)
-endif
 
 cmd_mod = { \
 	echo $(if $($*-objs)$($*-y)$($*-m), $(addprefix $(obj)/, $($*-objs) $($*-y) $($*-m)), $(@:.mod=.o)); \
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -230,7 +230,7 @@ dtc_cpp_flags  = -Wp,-MMD,$(depfile).pre
 		 $(addprefix -I,$(DTC_INCLUDE))                          \
 		 -undef -D__DTS__
 
-ifeq ($(CONFIG_LTO_CLANG),y)
+ifdef BUILD_LTO
 # With CONFIG_LTO_CLANG, .o files in modules might be LLVM bitcode, so we
 # need to run LTO to compile them into native code (.lto.o) before further
 # processing.



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 23/29] objtool: Read the NOENDBR annotation
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (21 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 22/29] Kbuild: Prepare !CLANG whole module objtool Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware Peter Zijlstra
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Read the new NOENDBR annotation. While there, attempt not to bloat
struct instruction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/objtool/check.c                 |   27 +++++++++++++++++++++++++++
 tools/objtool/include/objtool/check.h |   13 ++++++++++---
 2 files changed, 37 insertions(+), 3 deletions(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1860,6 +1860,29 @@ static int read_unwind_hints(struct objt
 	return 0;
 }
 
+static int read_noendbr_hints(struct objtool_file *file)
+{
+	struct section *sec;
+	struct instruction *insn;
+	struct reloc *reloc;
+
+	sec = find_section_by_name(file->elf, ".rela.discard.noendbr");
+	if (!sec)
+		return 0;
+
+	list_for_each_entry(reloc, &sec->reloc_list, list) {
+		insn = find_insn(file, reloc->sym->sec, reloc->sym->offset + reloc->addend);
+		if (!insn) {
+			WARN("bad .discard.noendbr entry");
+			return -1;
+		}
+
+		insn->noendbr = 1;
+	}
+
+	return 0;
+}
+
 static int read_retpoline_hints(struct objtool_file *file)
 {
 	struct section *sec;
@@ -2097,6 +2120,10 @@ static int decode_sections(struct objtoo
 	if (ret)
 		return ret;
 
+	ret = read_noendbr_hints(file);
+	if (ret)
+		return ret;
+
 	/*
 	 * Must be before add_{jump_call}_destination.
 	 */
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -45,11 +45,18 @@ struct instruction {
 	unsigned int len;
 	enum insn_type type;
 	unsigned long immediate;
-	bool dead_end, ignore, ignore_alts;
-	bool hint;
-	bool retpoline_safe;
+
+	u8 dead_end	: 1,
+	   ignore	: 1,
+	   ignore_alts	: 1,
+	   hint		: 1,
+	   retpoline_safe : 1,
+	   noendbr	: 1;
+		/* 2 bit hole */
 	s8 instr;
 	u8 visited;
+	/* u8 hole */
+
 	struct alt_group *alt_group;
 	struct symbol *call_dest;
 	struct instruction *jump_dest;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (22 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 23/29] objtool: Read the NOENDBR annotation Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-24  1:18   ` Joao Moreira
  2022-02-18 16:49 ` [PATCH 25/29] x86/ibt: Don't generate ENDBR in .discard.text Peter Zijlstra
                   ` (5 subsequent siblings)
  29 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Make sure we don't generate direct JMP/CALL instructions to an ENDBR
instruction (which might be poison).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/text-patching.h |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -5,6 +5,7 @@
 #include <linux/types.h>
 #include <linux/stddef.h>
 #include <asm/ptrace.h>
+#include <asm/ibt.h>
 
 struct paravirt_patch_site;
 #ifdef CONFIG_PARAVIRT
@@ -101,6 +102,11 @@ void *text_gen_insn(u8 opcode, const voi
 	static union text_poke_insn insn; /* per instance */
 	int size = text_opcode_size(opcode);
 
+#ifdef CONFIG_X86_IBT
+	if (is_endbr(dest))
+		dest += 4;
+#endif
+
 	insn.opcode = opcode;
 
 	if (size > 1) {



^ permalink raw reply	[flat|nested] 94+ messages in thread
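
A hedged usage sketch, in kernel context (asm/text-patching.h); ip and
func are illustrative. With the is_endbr() check above, a direct CALL
generated at patch time silently targets func+4 whenever func begins
with endbr64:

  static void patch_call_site(void *ip, void *func)
  {
  	/* the rel32 is computed against func + 4 if *func is an ENDBR */
  	const void *insn = text_gen_insn(CALL_INSN_OPCODE, ip, func);

  	text_poke_bp(ip, insn, CALL_INSN_SIZE, NULL);
  }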

* [PATCH 25/29] x86/ibt: Don't generate ENDBR in .discard.text
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (23 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 26/29] objtool: Add IBT validation / fixups Peter Zijlstra
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Having ENDBR in discarded sections can easily lead to relocations into
discarded sections, which linkers aren't really fond of. Objtool
shouldn't generate them either, but why tempt fate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/setup.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -8,6 +8,7 @@
 
 #include <linux/linkage.h>
 #include <asm/page_types.h>
+#include <asm/ibt.h>
 
 #ifdef __i386__
 
@@ -119,7 +120,7 @@ void *extend_brk(size_t size, size_t ali
  * executable.)
  */
 #define RESERVE_BRK(name,sz)						\
-	static void __section(".discard.text") __used notrace		\
+	static void __section(".discard.text") __noendbr __used notrace	\
 	__brk_reservation_fn_##name##__(void) {				\
 		asm volatile (						\
 			".pushsection .brk_reservation,\"aw\",@nobits;" \



^ permalink raw reply	[flat|nested] 94+ messages in thread
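
A note on the __noendbr added above: elsewhere in this series it is
(assumed to be) GCC's nocf_check attribute, under which
-fcf-protection=branch skips the endbr64 that would otherwise open a
discarded function up to relocations. A hedged sketch:

  #define __noendbr_sketch __attribute__((nocf_check))

  static void __noendbr_sketch discarded_stub(void)
  {
  	/* no endbr64 emitted at entry */
  }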

* [PATCH 26/29] objtool: Add IBT validation / fixups
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (24 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 25/29] x86/ibt: Don't generate ENDBR in .discard.text Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 27/29] x86/ibt: Finish --ibt-fix-direct on module loading Peter Zijlstra
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Objtool-based IBT validation in 3 passes:

 --ibt:

    Report code relocs that are not from JMP/CALL instructions and
    don't point at ENDBR

 --ibt-fix-direct:

    Detect any direct JMP/CALL, in code or reloc, that targets an ENDBR
    instruction and rewrite it to skip the ENDBR. This is basically a
    compiler bug, since a direct branch never needs the ENDBR and
    decoding it is a pure waste of time.

 --ibt-seal:

    Find superfluous ENDBR instructions. Any function that
    doesn't have its address taken should not have an ENDBR
    instruction. This removes about 1-in-4 ENDBR instructions.

All these flags are LTO-like and require '--lto' to run.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/vmlinux.lds.S           |    9 
 tools/objtool/arch/x86/decode.c         |   82 +++++++
 tools/objtool/builtin-check.c           |    6 
 tools/objtool/check.c                   |  356 ++++++++++++++++++++++++++++++--
 tools/objtool/include/objtool/arch.h    |    3 
 tools/objtool/include/objtool/builtin.h |    3 
 tools/objtool/include/objtool/objtool.h |    4 
 tools/objtool/objtool.c                 |    1 
 8 files changed, 441 insertions(+), 23 deletions(-)

--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -285,6 +285,15 @@ SECTIONS
 	}
 #endif
 
+#ifdef CONFIG_X86_IBT
+	. = ALIGN(8);
+	.ibt_endbr_sites : AT(ADDR(.ibt_endbr_sites) - LOAD_OFFSET) {
+		__ibt_endbr_sites = .;
+		*(.ibt_endbr_sites)
+		__ibt_endbr_sites_end = .;
+	}
+#endif
+
 	/*
 	 * struct alt_inst entries. From the header (alternative.h):
 	 * "Alternative instructions for different CPU types or capabilities"
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -112,7 +112,7 @@ int arch_decode_instruction(struct objto
 	const struct elf *elf = file->elf;
 	struct insn insn;
 	int x86_64, ret;
-	unsigned char op1, op2, op3,
+	unsigned char op1, op2, op3, prefix,
 		      rex = 0, rex_b = 0, rex_r = 0, rex_w = 0, rex_x = 0,
 		      modrm = 0, modrm_mod = 0, modrm_rm = 0, modrm_reg = 0,
 		      sib = 0, /* sib_scale = 0, */ sib_index = 0, sib_base = 0;
@@ -137,6 +137,8 @@ int arch_decode_instruction(struct objto
 	if (insn.vex_prefix.nbytes)
 		return 0;
 
+	prefix = insn.prefixes.bytes[0];
+
 	op1 = insn.opcode.bytes[0];
 	op2 = insn.opcode.bytes[1];
 	op3 = insn.opcode.bytes[2];
@@ -492,6 +494,12 @@ int arch_decode_instruction(struct objto
 			/* nopl/nopw */
 			*type = INSN_NOP;
 
+		} else if (op2 == 0x1e) {
+
+			if (prefix == 0xf3 && (modrm == 0xfa || modrm == 0xfb))
+				*type = INSN_ENDBR;
+
+
 		} else if (op2 == 0x38 && op3 == 0xf8) {
 			if (insn.prefixes.nbytes == 1 &&
 			    insn.prefixes.bytes[0] == 0xf2) {
@@ -605,6 +613,7 @@ int arch_decode_instruction(struct objto
 				op->dest.type = OP_DEST_REG;
 				op->dest.reg = CFI_SP;
 			}
+			*type = INSN_IRET;
 			break;
 		}
 
@@ -705,6 +714,77 @@ const char *arch_nop_insn(int len)
 	return nops[len-1];
 }
 
+const char *arch_mod_immediate(struct instruction *insn, unsigned long target)
+{
+	struct section *sec = insn->sec;
+	Elf_Data *data = sec->data;
+	unsigned char op1, op2;
+	static char bytes[16];
+	struct insn x86_insn;
+	int ret, disp;
+
+	disp = (long)(target - (insn->offset + insn->len));
+
+	if (data->d_type != ELF_T_BYTE || data->d_off) {
+		WARN("unexpected data for section: %s", sec->name);
+		return NULL;
+	}
+
+	ret = insn_decode(&x86_insn, data->d_buf + insn->offset, insn->len,
+			  INSN_MODE_64);
+	if (ret < 0) {
+		WARN("can't decode instruction at %s:0x%lx", sec->name, insn->offset);
+		return NULL;
+	}
+
+	op1 = x86_insn.opcode.bytes[0];
+	op2 = x86_insn.opcode.bytes[1];
+
+	switch (op1) {
+	case 0x0f: /* escape */
+		switch (op2) {
+		case 0x80 ... 0x8f: /* jcc.d32 */
+			if (insn->len != 6)
+				return NULL;
+			bytes[0] = op1;
+			bytes[1] = op2;
+			*(int *)&bytes[2] = disp;
+			break;
+
+		default:
+			return NULL;
+		}
+		break;
+
+	case 0x70 ... 0x7f: /* jcc.d8 */
+	case 0xeb: /* jmp.d8 */
+		if (insn->len != 2)
+			return NULL;
+
+		if (disp >> 7 != disp >> 31) {
+			WARN("displacement doesn't fit\n");
+			return NULL;
+		}
+
+		bytes[0] = op1;
+		bytes[1] = disp & 0xff;
+		break;
+
+	case 0xe8: /* call */
+	case 0xe9: /* jmp.d32 */
+		if (insn->len != 5)
+			return NULL;
+		bytes[0] = op1;
+		*(int *)&bytes[1] = disp;
+		break;
+
+	default:
+		return NULL;
+	}
+
+	return bytes;
+}
+
 #define BYTE_RET	0xC3
 
 const char *arch_ret_insn(int len)
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -20,7 +20,8 @@
 #include <objtool/objtool.h>
 
 bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-     lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
+     lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
+     ibt, ibt_fix_direct, ibt_seal;
 
 static const char * const check_usage[] = {
 	"objtool check [<options>] file.o",
@@ -47,6 +48,9 @@ const struct option check_options[] = {
 	OPT_BOOLEAN('B', "backup", &backup, "create .orig files before modification"),
 	OPT_BOOLEAN('S', "sls", &sls, "validate straight-line-speculation"),
 	OPT_BOOLEAN(0, "dry-run", &dryrun, "don't write the modifications"),
+	OPT_BOOLEAN(0, "ibt", &ibt, "validate ENDBR placement"),
+	OPT_BOOLEAN(0, "ibt-fix-direct", &ibt_fix_direct, "fixup direct jmp/call to ENDBR"),
+	OPT_BOOLEAN(0, "ibt-seal", &ibt_seal, "list superfluous ENDBR instructions"),
 	OPT_END(),
 };
 
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -380,6 +380,7 @@ static int decode_instructions(struct ob
 			memset(insn, 0, sizeof(*insn));
 			INIT_LIST_HEAD(&insn->alts);
 			INIT_LIST_HEAD(&insn->stack_ops);
+			INIT_LIST_HEAD(&insn->call_node);
 
 			insn->sec = sec;
 			insn->offset = offset;
@@ -731,6 +732,58 @@ static int create_retpoline_sites_sectio
 	return 0;
 }
 
+static int create_ibt_endbr_sites_sections(struct objtool_file *file)
+{
+	struct instruction *insn;
+	struct section *sec;
+	int idx;
+
+	sec = find_section_by_name(file->elf, ".ibt_endbr_sites");
+	if (sec) {
+		WARN("file already has .ibt_endbr_sites, skipping");
+		return 0;
+	}
+
+	idx = 0;
+	list_for_each_entry(insn, &file->endbr_list, call_node)
+		idx++;
+
+	if (stats) {
+		printf("ibt: ENDBR at function start: %d\n", file->nr_endbr);
+		printf("ibt: ENDBR inside functions:  %d\n", file->nr_endbr_int);
+		printf("ibt: superfluous ENDBR:       %d\n", idx);
+	}
+
+	if (!idx)
+		return 0;
+
+	sec = elf_create_section(file->elf, ".ibt_endbr_sites", 0,
+				 sizeof(int), idx);
+	if (!sec) {
+		WARN("elf_create_section: .ibt_endbr_sites");
+		return -1;
+	}
+
+	idx = 0;
+	list_for_each_entry(insn, &file->endbr_list, call_node) {
+
+		int *site = (int *)sec->data->d_buf + idx;
+		*site = 0;
+
+		if (elf_add_reloc_to_insn(file->elf, sec,
+					  idx * sizeof(int),
+					  R_X86_64_PC32,
+					  insn->sec, insn->offset)) {
+			WARN("elf_add_reloc_to_insn: .ibt_endbr_sites");
+			return -1;
+		}
+
+		idx++;
+	}
+
+	return 0;
+}
+
 static int create_mcount_loc_sections(struct objtool_file *file)
 {
 	struct section *sec;
@@ -1176,6 +1229,15 @@ static int add_jump_destinations(struct
 	unsigned long dest_off;
 
 	for_each_insn(file, insn) {
+		if (insn->type == INSN_ENDBR && insn->func) {
+			if (insn->offset == insn->func->offset) {
+				list_add_tail(&insn->call_node, &file->endbr_list);
+				file->nr_endbr++;
+			} else {
+				file->nr_endbr_int++;
+			}
+		}
+
 		if (!is_static_jump(insn))
 			continue;
 
@@ -1192,10 +1254,14 @@ static int add_jump_destinations(struct
 		} else if (insn->func) {
 			/* internal or external sibling call (with reloc) */
 			add_call_dest(file, insn, reloc->sym, true);
-			continue;
+
+			dest_sec = reloc->sym->sec;
+			dest_off = reloc->sym->offset +
+				   arch_dest_reloc_offset(reloc->addend);
+
 		} else if (reloc->sym->sec->idx) {
 			dest_sec = reloc->sym->sec;
-			dest_off = reloc->sym->sym.st_value +
+			dest_off = reloc->sym->offset +
 				   arch_dest_reloc_offset(reloc->addend);
 		} else {
 			/* non-func asm code jumping to another file */
@@ -1205,6 +1271,10 @@ static int add_jump_destinations(struct
 		insn->jump_dest = find_insn(file, dest_sec, dest_off);
 		if (!insn->jump_dest) {
 
+			/* external symbol */
+			if (!vmlinux && insn->func)
+				continue;
+
 			/*
 			 * This is a special case where an alt instruction
 			 * jumps past the end of the section.  These are
@@ -1219,6 +1289,32 @@ static int add_jump_destinations(struct
 			return -1;
 		}
 
+		if (ibt && insn->jump_dest->type == INSN_ENDBR &&
+		    insn->jump_dest->func &&
+		    insn->jump_dest->offset == insn->jump_dest->func->offset) {
+			if (reloc) {
+				if (ibt_fix_direct) {
+					reloc->addend += 4;
+					elf_write_reloc(file->elf, reloc);
+				} else {
+					WARN_FUNC("Direct RELOC jump to ENDBR", insn->sec, insn->offset);
+				}
+			} else {
+				if (ibt_fix_direct) {
+					const char *bytes = arch_mod_immediate(insn, dest_off + 4);
+					if (bytes) {
+						elf_write_insn(file->elf, insn->sec,
+							       insn->offset, insn->len,
+							       bytes);
+					} else {
+						WARN_FUNC("Direct IMM jump to ENDBR; cannot fix", insn->sec, insn->offset);
+					}
+				} else {
+					WARN_FUNC("Direct IMM jump to ENDBR", insn->sec, insn->offset);
+				}
+			}
+		}
+
 		/*
 		 * Cross-function jump.
 		 */
@@ -1246,7 +1342,8 @@ static int add_jump_destinations(struct
 				insn->jump_dest->func->pfunc = insn->func;
 
 			} else if (insn->jump_dest->func->pfunc != insn->func->pfunc &&
-				   insn->jump_dest->offset == insn->jump_dest->func->offset) {
+				   ((insn->jump_dest->offset == insn->jump_dest->func->offset) ||
+				    (insn->jump_dest->offset == insn->jump_dest->func->offset + 4))) {
 				/* internal sibling call (without reloc) */
 				add_call_dest(file, insn, insn->jump_dest->func, true);
 			}
@@ -1256,23 +1353,12 @@ static int add_jump_destinations(struct
 	return 0;
 }
 
-static struct symbol *find_call_destination(struct section *sec, unsigned long offset)
-{
-	struct symbol *call_dest;
-
-	call_dest = find_func_by_offset(sec, offset);
-	if (!call_dest)
-		call_dest = find_symbol_by_offset(sec, offset);
-
-	return call_dest;
-}
-
 /*
  * Find the destination instructions for all calls.
  */
 static int add_call_destinations(struct objtool_file *file)
 {
-	struct instruction *insn;
+	struct instruction *insn, *target = NULL;
 	unsigned long dest_off;
 	struct symbol *dest;
 	struct reloc *reloc;
@@ -1284,7 +1370,21 @@ static int add_call_destinations(struct
 		reloc = insn_reloc(file, insn);
 		if (!reloc) {
 			dest_off = arch_jump_destination(insn);
-			dest = find_call_destination(insn->sec, dest_off);
+
+			target = find_insn(file, insn->sec, dest_off);
+			if (!target) {
+				WARN_FUNC("direct call to nowhere", insn->sec, insn->offset);
+				return -1;
+			}
+			dest = target->func;
+			if (!dest)
+				dest = find_symbol_containing(insn->sec, dest_off);
+			if (!dest) {
+				WARN_FUNC("IMM can't find call dest symbol at %s+0x%lx",
+					  insn->sec, insn->offset,
+					  insn->sec->name, dest_off);
+				return -1;
+			}
 
 			add_call_dest(file, insn, dest, false);
 
@@ -1303,10 +1403,25 @@ static int add_call_destinations(struct
 			}
 
 		} else if (reloc->sym->type == STT_SECTION) {
-			dest_off = arch_dest_reloc_offset(reloc->addend);
-			dest = find_call_destination(reloc->sym->sec, dest_off);
+			struct section *dest_sec;
+
+			dest_sec = reloc->sym->sec;
+			dest_off = reloc->sym->offset +
+				   arch_dest_reloc_offset(reloc->addend);
+
+			target = find_insn(file, dest_sec, dest_off);
+			if (target) {
+				dest = target->func;
+				if (!dest)
+					dest = find_symbol_containing(dest_sec, dest_off);
+			} else {
+				WARN("foo");
+				dest = find_func_by_offset(dest_sec, dest_off);
+				if (!dest)
+					dest = find_symbol_by_offset(dest_sec, dest_off);
+			}
 			if (!dest) {
-				WARN_FUNC("can't find call dest symbol at %s+0x%lx",
+				WARN_FUNC("RELOC can't find call dest symbol at %s+0x%lx",
 					  insn->sec, insn->offset,
 					  reloc->sym->sec->name,
 					  dest_off);
@@ -1317,9 +1432,43 @@ static int add_call_destinations(struct
 
 		} else if (reloc->sym->retpoline_thunk) {
 			add_retpoline_call(file, insn);
+			continue;
+
+		} else {
+			struct section *dest_sec;
+
+			dest_sec = reloc->sym->sec;
+			dest_off = reloc->sym->offset +
+				   arch_dest_reloc_offset(reloc->addend);
+
+			target = find_insn(file, dest_sec, dest_off);
 
-		} else
 			add_call_dest(file, insn, reloc->sym, false);
+		}
+
+		if (ibt && target && target->type == INSN_ENDBR) {
+			if (reloc) {
+				if (ibt_fix_direct) {
+					reloc->addend += 4;
+					elf_write_reloc(file->elf, reloc);
+				} else {
+					WARN_FUNC("Direct RELOC call to ENDBR", insn->sec, insn->offset);
+				}
+			} else {
+				if (ibt_fix_direct) {
+					const char *bytes = arch_mod_immediate(insn, dest_off + 4);
+					if (bytes) {
+						elf_write_insn(file->elf, insn->sec,
+							       insn->offset, insn->len,
+							       bytes);
+					} else {
+						WARN_FUNC("Direct IMM call to ENDBR; cannot fix", insn->sec, insn->offset);
+					}
+				} else {
+					WARN_FUNC("Direct IMM call to ENDBR", insn->sec, insn->offset);
+				}
+			}
+		}
 	}
 
 	return 0;
@@ -3054,6 +3203,8 @@ static struct instruction *next_insn_to_
 	return next_insn_same_sec(file, insn);
 }
 
+static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn);
+
 /*
  * Follow the branch starting at the given instruction, and recursively follow
  * any other branches (jumps).  Meanwhile, track the frame pointer state at
@@ -3102,6 +3253,12 @@ static int validate_branch(struct objtoo
 
 		if (insn->hint) {
 			state.cfi = *insn->cfi;
+			if (ibt) {
+				if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY &&
+				    insn->type != INSN_ENDBR) {
+					WARN_FUNC("IRET_ENTRY hint without ENDBR", insn->sec, insn->offset);
+				}
+			}
 		} else {
 			/* XXX track if we actually changed state.cfi */
 
@@ -3261,7 +3418,12 @@ static int validate_branch(struct objtoo
 			state.df = false;
 			break;
 
+		case INSN_NOP:
+			break;
+
 		default:
+			if (ibt)
+				validate_ibt_insn(file, insn);
 			break;
 		}
 
@@ -3507,6 +3669,131 @@ static int validate_functions(struct obj
 	return warnings;
 }
 
+static struct instruction *
+validate_ibt_reloc(struct objtool_file *file, struct reloc *reloc)
+{
+	struct instruction *dest;
+	struct section *sec;
+	unsigned long off;
+
+	sec = reloc->sym->sec;
+	off = reloc->sym->offset + reloc->addend;
+
+	dest = find_insn(file, sec, off);
+	if (!dest)
+		return NULL;
+
+	if (dest->type == INSN_ENDBR) {
+		if (!list_empty(&dest->call_node))
+			list_del_init(&dest->call_node);
+
+		return NULL;
+	}
+
+	if (reloc->sym->static_call_tramp)
+		return NULL;
+
+	return dest;
+}
+
+static void validate_ibt_target(struct objtool_file *file, struct instruction *insn,
+				struct instruction *target)
+{
+	if (target->func && target->func == insn->func) {
+		/*
+		 * Anything from->to self is either _THIS_IP_ or IRET-to-self.
+		 *
+		 * There is no sane way to annotate _THIS_IP_ since the compiler treats the
+		 * relocation as a constant and is happy to fold in offsets, skewing any
+		 * annotation we do, leading to vast amounts of false-positives.
+		 *
+		 * There's also compiler generated _THIS_IP_ through KCOV and
+		 * such which we have no hope of annotating.
+		 *
+		 * As such, blanket-accept self-references without issue.
+		 */
+		return;
+	}
+
+	/*
+	 * Annotated non-control flow target.
+	 */
+	if (target->noendbr)
+		return;
+
+	WARN_FUNC("relocation to !ENDBR: %s+0x%lx",
+		  insn->sec, insn->offset,
+		  target->func ? target->func->name : target->sec->name,
+		  target->func ? target->offset - target->func->offset : target->offset);
+}
+
+static void validate_ibt_insn(struct objtool_file *file, struct instruction *insn)
+{
+	struct reloc *reloc = insn_reloc(file, insn);
+	struct instruction *target;
+
+	for (;;) {
+		if (!reloc)
+			return;
+
+		target = validate_ibt_reloc(file, reloc);
+		if (target)
+			validate_ibt_target(file, insn, target);
+
+		reloc = find_reloc_by_dest_range(file->elf, insn->sec, reloc->offset + 1,
+						 (insn->offset + insn->len) - (reloc->offset + 1));
+	}
+}
+
+static int validate_ibt(struct objtool_file *file)
+{
+	struct section *sec;
+	struct reloc *reloc;
+
+	for_each_sec(file, sec) {
+		bool is_data;
+
+		/* already done in validate_branch() */
+		if (sec->sh.sh_flags & SHF_EXECINSTR)
+			continue;
+
+		if (!sec->reloc)
+			continue;
+
+		if (!strncmp(sec->name, ".orc", 4))
+			continue;
+
+		if (!strncmp(sec->name, ".discard", 8))
+			continue;
+
+		if (!strncmp(sec->name, ".debug", 6))
+			continue;
+
+		if (!strcmp(sec->name, "_error_injection_whitelist"))
+			continue;
+
+		if (!strcmp(sec->name, "_kprobe_blacklist"))
+			continue;
+
+		is_data = strstr(sec->name, ".data") || strstr(sec->name, ".rodata");
+
+		list_for_each_entry(reloc, &sec->reloc->reloc_list, list) {
+			struct instruction *target;
+
+			target = validate_ibt_reloc(file, reloc);
+			if (is_data && target && !target->noendbr) {
+				WARN_FUNC("data relocaction to !ENDBR: %s+0x%lx",
+					  reloc->sym->sec,
+					  reloc->sym->offset + reloc->addend,
+		  target->func ? target->func->name : target->sec->name,
+		  target->func ? target->offset - target->func->offset : target->offset);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int validate_reachable_instructions(struct objtool_file *file)
 {
 	struct instruction *insn;
@@ -3534,6 +3821,21 @@ int check(struct objtool_file *file)
 		return 1;
 	}
 
+	if (ibt && !lto) {
+		fprintf(stderr, "--ibt requires: --lto\n");
+		return 1;
+	}
+
+	if (ibt_fix_direct && !ibt) {
+		fprintf(stderr, "--ibt-fix-direct requires: --ibt\n");
+		return 1;
+	}
+
+	if (ibt_seal && !ibt_fix_direct) {
+		fprintf(stderr, "--ibt-seal requires: --ibt-fix-direct\n");
+		return 1;
+	}
+
 	arch_initial_func_cfi_state(&initial_func_cfi);
 	init_cfi_state(&init_cfi);
 	init_cfi_state(&func_cfi);
@@ -3580,6 +3882,13 @@ int check(struct objtool_file *file)
 		goto out;
 	warnings += ret;
 
+	if (ibt) {
+		ret = validate_ibt(file);
+		if (ret < 0)
+			goto out;
+		warnings += ret;
+	}
+
 	if (!warnings) {
 		ret = validate_reachable_instructions(file);
 		if (ret < 0)
@@ -3604,6 +3913,13 @@ int check(struct objtool_file *file)
 		if (ret < 0)
 			goto out;
 		warnings += ret;
+	}
+
+	if (ibt_seal) {
+		ret = create_ibt_endbr_sites_sections(file);
+		if (ret < 0)
+			goto out;
+		warnings += ret;
 	}
 
 	if (stats) {
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -27,6 +27,8 @@ enum insn_type {
 	INSN_STD,
 	INSN_CLD,
 	INSN_TRAP,
+	INSN_ENDBR,
+	INSN_IRET,
 	INSN_OTHER,
 };
 
@@ -84,6 +86,7 @@ unsigned long arch_dest_reloc_offset(int
 
 const char *arch_nop_insn(int len);
 const char *arch_ret_insn(int len);
+const char *arch_mod_immediate(struct instruction *insn, unsigned long target);
 
 int arch_decode_hint_reg(u8 sp_reg, int *base);
 
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -9,7 +9,8 @@
 
 extern const struct option check_options[];
 extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
-	    lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
+	    lto, vmlinux, mcount, noinstr, backup, sls, dryrun,
+	    ibt, ibt_fix_direct, ibt_seal;
 
 extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);
 
--- a/tools/objtool/include/objtool/objtool.h
+++ b/tools/objtool/include/objtool/objtool.h
@@ -26,8 +26,12 @@ struct objtool_file {
 	struct list_head retpoline_call_list;
 	struct list_head static_call_list;
 	struct list_head mcount_loc_list;
+	struct list_head endbr_list;
 	bool ignore_unreachables, c_file, hints, rodata;
 
+	unsigned int nr_endbr;
+	unsigned int nr_endbr_int;
+
 	unsigned long jl_short, jl_long;
 	unsigned long jl_nop_short, jl_nop_long;
 
--- a/tools/objtool/objtool.c
+++ b/tools/objtool/objtool.c
@@ -128,6 +128,7 @@ struct objtool_file *objtool_open_read(c
 	INIT_LIST_HEAD(&file.retpoline_call_list);
 	INIT_LIST_HEAD(&file.static_call_list);
 	INIT_LIST_HEAD(&file.mcount_loc_list);
+	INIT_LIST_HEAD(&file.endbr_list);
 	file.c_file = !vmlinux && find_section_by_name(file.elf, ".comment");
 	file.ignore_unreachables = no_unreachable;
 	file.hints = false;



^ permalink raw reply	[flat|nested] 94+ messages in thread
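
One detail of arch_mod_immediate() worth restating: the jcc.d8/jmp.d8
range check 'disp >> 7 != disp >> 31' tests whether sign-extending the
low 8 bits reproduces the full displacement. A hedged standalone form:

  #include <stdbool.h>

  static bool fits_rel8(int disp)
  {
  	/* true iff bits 7..31 of disp all agree, i.e. the value
  	 * survives an 8-bit sign-extending round trip */
  	return (disp >> 7) == (disp >> 31);
  }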

* [PATCH 27/29] x86/ibt: Finish --ibt-fix-direct on module loading
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (25 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 26/29] objtool: Add IBT validation / fixups Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 28/29] x86/ibt: Ensure module init/exit points have references Peter Zijlstra
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Since modules are, by construction, not fully linked objects, the
LTO-like objtool pass cannot fix up the direct calls to external
functions.

Have the module loader finish the job.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/module.c |   40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -24,6 +24,7 @@
 #include <asm/page.h>
 #include <asm/setup.h>
 #include <asm/unwind.h>
+#include <asm/ibt.h>
 
 #if 0
 #define DEBUGP(fmt, ...)				\
@@ -128,6 +129,33 @@ int apply_relocate(Elf32_Shdr *sechdrs,
 	return 0;
 }
 #else /*X86_64*/
+
+static inline void ibt_fix_direct(void *loc, u64 *val)
+{
+#ifdef CONFIG_X86_IBT
+	const void *addr = (void *)(4 + *val);
+	union text_poke_insn text;
+	u32 insn;
+
+	if (get_kernel_nofault(insn, addr))
+		return;
+
+	if (!is_endbr(&insn))
+		return;
+
+	/* validate jmp.d32/call @ loc */
+	if (WARN_ONCE(get_kernel_nofault(text, loc-1) ||
+		      (text.opcode != CALL_INSN_OPCODE &&
+		       text.opcode != JMP32_INSN_OPCODE),
+		      "Unexpected code at: %pS\n", loc))
+		return;
+
+	DEBUGP("ibt_fix_direct: %pS\n", addr);
+
+	*val += 4;
+#endif
+}
+
 static int __apply_relocate_add(Elf64_Shdr *sechdrs,
 		   const char *strtab,
 		   unsigned int symindex,
@@ -139,6 +167,7 @@ static int __apply_relocate_add(Elf64_Sh
 	Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
 	Elf64_Sym *sym;
 	void *loc;
+	int type;
 	u64 val;
 
 	DEBUGP("Applying relocate section %u to %u\n",
@@ -153,13 +182,14 @@ static int __apply_relocate_add(Elf64_Sh
 		sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
 			+ ELF64_R_SYM(rel[i].r_info);
 
+		type = ELF64_R_TYPE(rel[i].r_info);
+
 		DEBUGP("type %d st_value %Lx r_addend %Lx loc %Lx\n",
-		       (int)ELF64_R_TYPE(rel[i].r_info),
-		       sym->st_value, rel[i].r_addend, (u64)loc);
+		       type, sym->st_value, rel[i].r_addend, (u64)loc);
 
 		val = sym->st_value + rel[i].r_addend;
 
-		switch (ELF64_R_TYPE(rel[i].r_info)) {
+		switch (type) {
 		case R_X86_64_NONE:
 			break;
 		case R_X86_64_64:
@@ -185,6 +215,10 @@ static int __apply_relocate_add(Elf64_Sh
 		case R_X86_64_PLT32:
 			if (*(u32 *)loc != 0)
 				goto invalid_relocation;
+
+			if (type == R_X86_64_PLT32)
+				ibt_fix_direct(loc, &val);
+
 			val -= (u64)loc;
 			write(loc, &val, 4);
 #if 0



^ permalink raw reply	[flat|nested] 94+ messages in thread
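
The arithmetic in ibt_fix_direct() is compact enough to deserve a
restatement; a hedged sketch with is_endbr() replaced by a stand-in.
A CALL/JMP site's PC32/PLT32 addend is -4 (the displacement is
relative to the next instruction), so on entry val == &foo - 4 and
(4 + val) recovers &foo:

  #include <stdbool.h>
  #include <stdint.h>

  static bool is_endbr_stub(const void *p) { (void)p; return true; }

  static uint32_t relocate_direct(uint8_t *loc, uint64_t st_value,
  				  int64_t addend)
  {
  	uint64_t val = st_value + addend;	/* == &foo - 4 */

  	if (is_endbr_stub((const void *)(uintptr_t)(4 + val)))
  		val += 4;			/* retarget past the ENDBR */

  	return (uint32_t)(val - (uint64_t)loc);	/* final rel32 */
  }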

* [PATCH 28/29] x86/ibt: Ensure module init/exit points have references
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (26 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 27/29] x86/ibt: Finish --ibt-fix-direct on module loading Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-18 16:49 ` [PATCH 29/29] x86/alternative: Use .ibt_endbr_sites to seal indirect calls Peter Zijlstra
  2022-02-19  1:29 ` [PATCH 00/29] x86: Kernel IBT Edgecombe, Rick P
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Since the module init/exit points only have external references, a
module LTO run will consider them 'unused' and seal them, leading to an
immediate failure on module load.
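
As an illustration (assuming the module_init() wiring in
include/linux/module.h), the macro below then expands a module's init
point to roughly:

	/* emits a live data reference so an --ibt-seal run keeps the ENDBR */
	const void *__cfi_jt_init_module __visible __initdata = (void *)&init_module;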

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/cfi.h |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/include/linux/cfi.h
+++ b/include/linux/cfi.h
@@ -34,8 +34,17 @@ static inline void cfi_module_remove(str
 
 #else /* !CONFIG_CFI_CLANG */
 
-#define __CFI_ADDRESSABLE(fn, __attr)
+#ifdef CONFIG_X86_IBT
+
+#define __CFI_ADDRESSABLE(fn, __attr) \
+	const void *__cfi_jt_ ## fn __visible __attr = (void *)&fn
+
+#endif /* CONFIG_X86_IBT */
 
 #endif /* CONFIG_CFI_CLANG */
 
+#ifndef __CFI_ADDRESSABLE
+#define __CFI_ADDRESSABLE(fn, __attr)
+#endif
+
 #endif /* _LINUX_CFI_H */



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [PATCH 29/29] x86/alternative: Use .ibt_endbr_sites to seal indirect calls
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (27 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 28/29] x86/ibt: Ensure module init/exit points have references Peter Zijlstra
@ 2022-02-18 16:49 ` Peter Zijlstra
  2022-02-19  1:29 ` [PATCH 00/29] x86: Kernel IBT Edgecombe, Rick P
  29 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 16:49 UTC (permalink / raw)
  To: x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, keescook, samitolvanen,
	mark.rutland, alyssa.milburn

Objtool's --ibt-seal option generates .ibt_endbr_sites which lists
superfluous ENDBR instructions; that is, those whose containing
function is never indirectly called.

Additionally, objtool's --ibt-fix-direct ensures direct calls never
target an ENDBR instruction.

Combined, this means these instructions should never be executed.

Poison them using a 4-byte UD1 instruction; for IBT hardware this will
raise a #CP exception due to WAIT-FOR-ENDBR not getting what it
wants. For !IBT hardware it'll trigger #UD.

In either case, it will be 'impossible' to indirectly call these
functions thereafter.
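
For reference, the two 4-byte patterns in play, as bytes and as the
little-endian u32 that is_endbr() matches:

	/* endbr64:             f3 0f 1e fa  ->  0xfa1e0ff3 */
	/* ud1 0x0(%rax),%eax:  0f b9 40 00  ->  0x0040b90f */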

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Makefile                           |    5 ++++
 arch/um/kernel/um_arch.c           |    4 +++
 arch/x86/Kconfig                   |   12 +++++++++
 arch/x86/include/asm/alternative.h |    1 
 arch/x86/include/asm/ibt.h         |    4 ++-
 arch/x86/kernel/alternative.c      |   46 +++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/module.c           |   10 ++++++--
 arch/x86/kernel/traps.c            |   35 ++++++++++++++++++++++++++--
 scripts/Makefile.build             |    3 +-
 scripts/link-vmlinux.sh            |   10 ++++++--
 10 files changed, 122 insertions(+), 8 deletions(-)

--- a/Makefile
+++ b/Makefile
@@ -911,6 +911,11 @@ BUILD_LTO	:= y
 export BUILD_LTO
 endif
 
+ifdef CONFIG_X86_IBT_SEAL
+BUILD_LTO	:= y
+export BUILD_LTO
+endif
+
 ifdef CONFIG_CFI_CLANG
 CC_FLAGS_CFI	:= -fsanitize=cfi \
 		   -fsanitize-cfi-cross-dso \
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -424,6 +424,10 @@ void __init check_bugs(void)
 	os_check_bugs();
 }
 
+void apply_ibt_endbr(s32 *start, s32 *end)
+{
+}
+
 void apply_retpolines(s32 *start, s32 *end)
 {
 }
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1876,6 +1876,18 @@ config X86_IBT
 	  an ENDBR instruction, as such, the compiler will litter the
 	  code with them to make this happen.
 
+config X86_IBT_SEAL
+	prompt "Seal functions"
+	def_bool y
+	depends on X86_IBT && STACK_VALIDATION
+	help
+	  In addition to building the kernel with IBT, seal all functions that
+	  are not indirect call targets, avoiding them ever becoming one.
+
+	  This requires LTO-like objtool runs and will slow down the build. It
+	  does significantly reduce the number of ENDBR instructions in the
+	  kernel image as well as provide some validation for !IBT hardware.
+
 config X86_INTEL_MEMORY_PROTECTION_KEYS
 	prompt "Memory Protection Keys"
 	def_bool y
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -76,6 +76,7 @@ extern int alternatives_patched;
 extern void alternative_instructions(void);
 extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
 extern void apply_retpolines(s32 *start, s32 *end);
+extern void apply_ibt_endbr(s32 *start, s32 *end);
 
 struct module;
 
--- a/arch/x86/include/asm/ibt.h
+++ b/arch/x86/include/asm/ibt.h
@@ -23,8 +23,10 @@
 static inline bool is_endbr(const void *addr)
 {
 	unsigned int val = ~*(unsigned int *)addr;
+	if (val == ~0x0040b90f) /* ud1_endbr */
+		return true;
 	val |= 0x01000000U;
-	return val == ~0xfa1e0ff3;
+	return val == ~0xfa1e0ff3; /* endbr */
 }
 
 extern u64 ibt_save(void);
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -115,6 +115,7 @@ static void __init_or_module add_nops(vo
 }
 
 extern s32 __retpoline_sites[], __retpoline_sites_end[];
+extern s32 __ibt_endbr_sites[], __ibt_endbr_sites_end[];
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern s32 __smp_locks[], __smp_locks_end[];
 void text_poke_early(void *addr, const void *opcode, size_t len);
@@ -512,6 +513,49 @@ void __init_or_module noinline apply_ret
 
 #endif /* CONFIG_RETPOLINE && CONFIG_STACK_VALIDATION */
 
+#ifdef CONFIG_X86_IBT_SEAL
+
+/*
+ * ud1    0x0(%rax),%eax -- a 4 byte #UD instruction for when we don't have
+ *                          IBT and still want to trigger fail.
+ */
+static const u8 ud1_endbr[4] = { 0x0f, 0xb9, 0x40, 0x00 };
+
+/*
+ * Generated by: objtool --ibt-seal
+ */
+void __init_or_module noinline apply_ibt_endbr(s32 *start, s32 *end)
+{
+	s32 *s;
+
+	for (s = start; s < end; s++) {
+		void *addr = (void *)s + *s;
+		u32 endbr;
+
+		if (WARN_ON_ONCE(get_kernel_nofault(endbr, addr)))
+			continue;
+
+		if (WARN_ON_ONCE(!is_endbr(&endbr)))
+			continue;
+
+		DPRINTK("ENDBR at: %pS (%px)", addr, addr);
+
+		/*
+		 * When we have IBT, the lack of ENDBR will trigger #CP
+		 * When we don't have IBT, explicitly trigger #UD
+		 */
+		DUMP_BYTES(((u8*)addr), 4, "%px: orig: ", addr);
+		DUMP_BYTES(((u8*)ud1_endbr), 4, "%px: repl: ", addr);
+		text_poke_early(addr, ud1_endbr, 4);
+	}
+}
+
+#else
+
+void __init_or_module noinline apply_ibt_endbr(s32 *start, s32 *end) { }
+
+#endif /* CONFIG_X86_IBT_SEAL */
+
 #ifdef CONFIG_SMP
 static void alternatives_smp_lock(const s32 *start, const s32 *end,
 				  u8 *text, u8 *text_end)
@@ -832,6 +876,8 @@ void __init alternative_instructions(voi
 	 */
 	apply_alternatives(__alt_instructions, __alt_instructions_end);
 
+	apply_ibt_endbr(__ibt_endbr_sites, __ibt_endbr_sites_end);
+
 #ifdef CONFIG_SMP
 	/* Patch to UP if other cpus not imminent. */
 	if (!noreplace_smp && (num_present_cpus() == 1 || setup_max_cpus <= 1)) {
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -132,7 +132,7 @@ int apply_relocate(Elf32_Shdr *sechdrs,
 
 static inline void ibt_fix_direct(void *loc, u64 *val)
 {
-#ifdef CONFIG_X86_IBT
+#ifdef CONFIG_X86_IBT_SEAL
 	const void *addr = (void *)(4 + *val);
 	union text_poke_insn text;
 	u32 insn;
@@ -287,7 +287,7 @@ int module_finalize(const Elf_Ehdr *hdr,
 {
 	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
 		*para = NULL, *orc = NULL, *orc_ip = NULL,
-		*retpolines = NULL;
+		*retpolines = NULL, *ibt_endbr = NULL;
 	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 
 	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -305,6 +305,8 @@ int module_finalize(const Elf_Ehdr *hdr,
 			orc_ip = s;
 		if (!strcmp(".retpoline_sites", secstrings + s->sh_name))
 			retpolines = s;
+		if (!strcmp(".ibt_endbr_sites", secstrings + s->sh_name))
+			ibt_endbr = s;
 	}
 
 	if (para) {
@@ -320,6 +322,10 @@ int module_finalize(const Elf_Ehdr *hdr,
 		void *aseg = (void *)alt->sh_addr;
 		apply_alternatives(aseg, aseg + alt->sh_size);
 	}
+	if (ibt_endbr) {
+		void *iseg = (void *)ibt_endbr->sh_addr;
+		apply_ibt_endbr(iseg, iseg + ibt_endbr->sh_size);
+	}
 	if (locks && text) {
 		void *lseg = (void *)locks->sh_addr;
 		void *tseg = (void *)text->sh_addr;
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -214,6 +214,12 @@ DEFINE_IDTENTRY(exc_overflow)
 
 static bool ibt_fatal = true;
 
+static void handle_endbr(struct pt_regs *regs)
+{
+	pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
+	BUG_ON(ibt_fatal);
+}
+
 extern unsigned long ibt_selftest_ip; /* defined in asm below */
 static volatile bool ibt_selftest_ok = false;
 
@@ -232,8 +238,7 @@ DEFINE_IDTENTRY_ERRORCODE(exc_control_pr
 		return;
 	}
 
-	pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
-	BUG_ON(ibt_fatal);
+	handle_endbr(regs);
 }
 
 bool ibt_selftest(void)
@@ -277,6 +282,29 @@ static int __init ibt_setup(char *str)
 
 __setup("ibt=", ibt_setup);
 
+static bool handle_ud1_endbr(struct pt_regs *regs)
+{
+	u32 ud1;
+
+	if (get_kernel_nofault(ud1, (u32 *)regs->ip))
+		return false;
+
+	if (ud1 == 0x0040b90f) {
+		handle_endbr(regs);
+		regs->ip += 4;
+		return true;
+	}
+
+	return false;
+}
+
+#else /* !CONFIG_X86_IBT */
+
+static bool handle_ud1_endbr(struct pt_regs *regs)
+{
+	return false;
+}
+
 #endif /* CONFIG_X86_IBT */
 
 #ifdef CONFIG_X86_F00F_BUG
@@ -285,6 +313,9 @@ void handle_invalid_op(struct pt_regs *r
 static inline void handle_invalid_op(struct pt_regs *regs)
 #endif
 {
+	if (!user_mode(regs) && handle_ud1_endbr(regs))
+		return;
+
 	do_error_trap(regs, 0, "invalid opcode", X86_TRAP_UD, SIGILL,
 		      ILL_ILLOPN, error_get_trap_addr(regs));
 }
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -231,6 +231,7 @@ objtool_args =								\
 	$(if $(CONFIG_UNWINDER_ORC),orc generate,check)			\
 	$(if $(part-of-module), --module)				\
 	$(if $(BUILD_LTO), --lto)					\
+	$(if $(CONFIG_X86_IBT_SEAL), --ibt --ibt-fix-direct --ibt-seal)	\
 	$(if $(CONFIG_FRAME_POINTER),, --no-fp)				\
 	$(if $(CONFIG_GCOV_KERNEL)$(CONFIG_LTO_CLANG), --no-unreachable)\
 	$(if $(CONFIG_RETPOLINE), --retpoline)				\
@@ -305,7 +306,7 @@ quiet_cmd_cc_lto_link_modules = LTO [M]
 		--whole-archive $(filter-out FORCE,$^)			\
 		$(cmd_objtool)
 else
-quiet_cmd_cc_lto_link_modules = LD [M] $@
+quiet_cmd_cc_lto_link_modules = LD [M]  $@
       cmd_cc_lto_link_modules =						\
 	$(LD) $(ld_flags) -r -o $@					\
 		$(filter-out FORCE,$^)					\
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -108,7 +108,9 @@ objtool_link()
 	local objtoolcmd;
 	local objtoolopt;
 
-	if is_enabled CONFIG_LTO_CLANG && is_enabled CONFIG_STACK_VALIDATION; then
+	if is_enabled CONFIG_STACK_VALIDATION && \
+	   ( is_enabled CONFIG_LTO_CLANG || is_enabled CONFIG_X86_IBT_SEAL ); then
+
 		# Don't perform vmlinux validation unless explicitly requested,
 		# but run objtool on vmlinux.o now that we have an object file.
 		if is_enabled CONFIG_UNWINDER_ORC; then
@@ -117,6 +119,10 @@ objtool_link()
 
 		objtoolopt="${objtoolopt} --lto"
 
+		if is_enabled CONFIG_X86_IBT_SEAL; then
+			objtoolopt="${objtoolopt} --ibt --ibt-fix-direct --ibt-seal"
+		fi
+
 		if is_enabled CONFIG_FTRACE_MCOUNT_USE_OBJTOOL; then
 			objtoolopt="${objtoolopt} --mcount"
 		fi
@@ -168,7 +174,7 @@ vmlinux_link()
 	# skip output file argument
 	shift
 
-	if is_enabled CONFIG_LTO_CLANG; then
+	if is_enabled CONFIG_LTO_CLANG || is_enabled CONFIG_X86_IBT_SEAL; then
 		# Use vmlinux.o instead of performing the slow LTO link again.
 		objs=vmlinux.o
 		libs=



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
@ 2022-02-18 19:31   ` Andrew Cooper
  2022-02-18 21:15     ` Peter Zijlstra
  2022-02-19  1:20   ` Edgecombe, Rick P
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2022-02-18 19:31 UTC (permalink / raw)
  To: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe
  Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn

On 18/02/2022 16:49, Peter Zijlstra wrote:
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -592,6 +593,27 @@ static __init int setup_disable_pku(char
>  __setup("nopku", setup_disable_pku);
>  #endif /* CONFIG_X86_64 */
>  
> +static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> +{
> +	u64 msr;
> +
> +	if (!IS_ENABLED(CONFIG_X86_IBT) ||
> +	    !cpu_feature_enabled(X86_FEATURE_IBT))
> +		return;
> +
> +	cr4_set_bits(X86_CR4_CET);
> +
> +	rdmsrl(MSR_IA32_S_CET, msr);
> +	if (cpu_feature_enabled(X86_FEATURE_IBT))
> +		msr |= CET_ENDBR_EN;
> +	wrmsrl(MSR_IA32_S_CET, msr);

So something I learnt the hard way with shstk is that you really want to
disable S_CET before heading into purgatory.

I've got no idea what's going to result from UEFI finally getting CET
support.  However, clearing out the other IBT settings is probably a
wise move.

In particular, if there was a stale legacy bitmap pointer, then
ibt_selftest() could take #PF ahead of #CP.
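
Something like the below, perhaps; a hypothetical sketch, not part of
this series, and cet_disable() is a made-up name:

static void cet_disable(void)
{
	if (!cpu_feature_enabled(X86_FEATURE_IBT))
		return;

	/* quiesce IBT state before purgatory/UEFI gets to run */
	wrmsrl(MSR_IA32_S_CET, 0);
	cr4_clear_bits(X86_CR4_CET);
}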

~Andrew

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 19/29] x86/ibt,xen: Annotate away warnings
  2022-02-18 16:49 ` [PATCH 19/29] x86/ibt,xen: Annotate away warnings Peter Zijlstra
@ 2022-02-18 20:24   ` Andrew Cooper
  2022-02-18 21:05     ` Peter Zijlstra
  0 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2022-02-18 20:24 UTC (permalink / raw)
  To: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, Juergen Gross
  Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Andrew Cooper

On 18/02/2022 16:49, Peter Zijlstra wrote:
> The xen_iret ENDBR is needed for pre-alternative code calling the
> pv_ops using indirect calls.
>
> The rest look like hypervisor entry points which will be IRET like
> transfers and as such don't need ENDBR.

That's up for debate.  Mechanically, yes - they're IRET or SYSRET.

Logically however, they're entrypoints registered with Xen, so following
the spec, Xen ought to force WAIT-FOR-ENDBR.

Or we could argue that said entrypoints are registered in Xen.

The case for ENDBR for the IDT vectors is quite obvious - a stray
write into the IDT can modify the entrypoint, and ENDBR limits an
attacker's choices.

OTOH, the SYSCALL and SYSENTER entrypoints are latched in MSRs, and if
you've got a sufficiently large security hole that the attacker can
write these MSRs, you have already lost.  I'm not aware of any extra
security you get from forcing WAIT-FOR-ENDBR in the SYSCALL/SYSENTER
flow, and suspect it was like that just for consistency.

Under Xen PV, all entrypoints are configured by explicit hypercall, not
via a shared memory structure, so they better match the MSR model for
native.  I could probably be argued away from having an RMW of MSR_U_CET
in the event delivery fastpath.


I'd be tempted to leave the ENDBR's in.  It feels like a safer default
until we figure out how to paravirt IBT properly.

> The hypercall page comes from the hypervisor, there might or might not
> be ENDBR there, not our problem.

Xen will make sure that the hypercall page contains ENDBR's if CET-IBT
is available for the guest to use.  Perhaps...

> --- a/arch/x86/xen/xen-head.S
> +++ b/arch/x86/xen/xen-head.S
> @@ -25,8 +25,8 @@
>  SYM_CODE_START(hypercall_page)
>  	.rept (PAGE_SIZE / 32)
>  		UNWIND_HINT_FUNC
> -		.skip 31, 0x90
> -		RET
> +		ANNOTATE_NOENDBR
> +		.skip 32, 0xcc

// Xen writes the hypercall page, and will sort out ENDBR

?

Also, somewhere in this series needs:

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 5004feb16783..e30f77264ee6 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -624,6 +624,7 @@ static struct trap_array_entry trap_array[] = {
        TRAP_ENTRY(exc_coprocessor_error,               false ),
        TRAP_ENTRY(exc_alignment_check,                 false ),
        TRAP_ENTRY(exc_simd_coprocessor_error,          false ),
+       TRAP_ENTRY(exc_control_protection,              false ),
 };
 
 static bool __ref get_trap_addr(void **addr, unsigned int ist)
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 444d824775f6..6f077aedd561 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -147,6 +147,7 @@ xen_pv_trap asm_exc_page_fault
 xen_pv_trap asm_exc_spurious_interrupt_bug
 xen_pv_trap asm_exc_coprocessor_error
 xen_pv_trap asm_exc_alignment_check
+xen_pv_trap asm_exc_control_protection
 #ifdef CONFIG_X86_MCE
 xen_pv_trap asm_xenpv_exc_machine_check
 #endif /* CONFIG_X86_MCE */

at a minimum, and possibly also:

diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 444d824775f6..96db5c50a6e7 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -124,7 +124,7 @@ SYM_CODE_START(xen_\name)
        UNWIND_HINT_EMPTY
        pop %rcx
        pop %r11
-       jmp  \name
+       jmp  \name + 4 * IS_ENABLED(CONFIG_X86_IBT)
 SYM_CODE_END(xen_\name)
 _ASM_NOKPROBE(xen_\name)
 .endm

(Entirely untested.)

~Andrew

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 02/29] x86/module: Fix the paravirt vs alternative order
  2022-02-18 16:49 ` [PATCH 02/29] x86/module: Fix the paravirt vs alternative order Peter Zijlstra
@ 2022-02-18 20:28   ` Josh Poimboeuf
  2022-02-18 21:22     ` Peter Zijlstra
  0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-18 20:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn,
	Juergen Gross

On Fri, Feb 18, 2022 at 05:49:04PM +0100, Peter Zijlstra wrote:
> Ever since commit 4e6292114c741 ("x86/paravirt: Add new features for
> paravirt patching") there is an ordering dependency between patching
> paravirt ops and patching alternatives, the module loader still
> violates this.
> 
> Fixes: 4e6292114c741 ("x86/paravirt: Add new features for paravirt patching")
> Cc: Juergen Gross <jgross@suse.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Probably a good idea to put the 'para' and 'alt' clauses next to each
other and add a comment that the ordering is necessary.

> ---
>  arch/x86/kernel/module.c |    9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -272,6 +272,10 @@ int module_finalize(const Elf_Ehdr *hdr,
>  			retpolines = s;
>  	}
>  
> +	if (para) {
> +		void *pseg = (void *)para->sh_addr;
> +		apply_paravirt(pseg, pseg + para->sh_size);
> +	}
>  	if (retpolines) {
>  		void *rseg = (void *)retpolines->sh_addr;
>  		apply_retpolines(rseg, rseg + retpolines->sh_size);
> @@ -289,11 +293,6 @@ int module_finalize(const Elf_Ehdr *hdr,
>  					    tseg, tseg + text->sh_size);
>  	}
>  
> -	if (para) {
> -		void *pseg = (void *)para->sh_addr;
> -		apply_paravirt(pseg, pseg + para->sh_size);
> -	}
> -
>  	/* make jump label nops */
>  	jump_label_apply_nops(me);
>  
> 
> 

-- 
Josh


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/29] x86: Base IBT bits
  2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
@ 2022-02-18 20:49   ` Andrew Cooper
  2022-02-18 21:11     ` David Laight
  2022-02-18 21:26     ` Peter Zijlstra
  2022-02-18 21:14   ` Josh Poimboeuf
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 94+ messages in thread
From: Andrew Cooper @ 2022-02-18 20:49 UTC (permalink / raw)
  To: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe
  Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Andrew Cooper

On 18/02/2022 16:49, Peter Zijlstra wrote:
> +/*
> + * A bit convoluted, but matches both endbr32 and endbr64 without
> + * having either as literal in the text.
> + */
> +static inline bool is_endbr(const void *addr)
> +{
> +	unsigned int val = ~*(unsigned int *)addr;
> +	val |= 0x01000000U;
> +	return val == ~0xfa1e0ff3;
> +}

At this point, I feel I've earned an "I told you so". :)

Clang 13 sees straight through the trickery and generates:

is_endbr:                               # @is_endbr
        movl    $-16777217, %eax                # imm = 0xFEFFFFFF
        andl    (%rdi), %eax
        cmpl    $-98693133, %eax                # imm = 0xFA1E0FF3
        sete    %al
        retq

Here's one I prepared earlier:

/*
 * In some cases we need to inspect/insert endbr64 instructions.
 *
 * The naive way, mem{cmp,cpy}(ptr, "\xf3\x0f\x1e\xfa", 4), optimises
 * unsafely by placing 0xfa1e0ff3 in an imm32 operand, and marks a legal
 * indirect branch target as far as the CPU is concerned.
 *
 * gen_endbr64() is written deliberately to avoid the problematic operand,
 * and is marked __const__ as it is safe for the optimiser to hoist/merge/etc.
 */
static inline uint32_t __attribute_const__ gen_endbr64(void)
{
    uint32_t res;

    asm ( "mov $~0xfa1e0ff3, %[res]\n\t"
          "not %[res]\n\t"
          : [res] "=&r" (res) );

    return res;
}

which should be robust against even the most enterprising optimiser.
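
And a usage sketch, assuming the helper above; note this matches
endbr64 only, unlike the dual endbr32/endbr64 matcher quoted at the
top:

static inline bool is_endbr64(const void *addr)
{
    return *(const uint32_t *)addr == gen_endbr64();
}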

~Andrew

P.S. Clang IAS had better never get "clever" enough to optimise what it
finds in asm statements...

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 19/29] x86/ibt,xen: Annotate away warnings
  2022-02-18 20:24   ` Andrew Cooper
@ 2022-02-18 21:05     ` Peter Zijlstra
  2022-02-18 23:07       ` Andrew Cooper
  0 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 21:05 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: x86, joao, hjl.tools, jpoimboe, Juergen Gross, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn

On Fri, Feb 18, 2022 at 08:24:41PM +0000, Andrew Cooper wrote:
> On 18/02/2022 16:49, Peter Zijlstra wrote:
> > The xen_iret ENDBR is needed for pre-alternative code calling the
> > pv_ops using indirect calls.
> >
> > The rest look like hypervisor entry points which will be IRET like
> > transfers and as such don't need ENDBR.
> 
> That's up for debate.  Mechanically, yes - they're IRET or SYSERET.
> 
> Logically however, they're entrypoints registered with Xen, so following
> the spec, Xen ought to force WAIT-FOR-ENDBR.

Cute..

> I'd be tempted to leave the ENDBR's in.  It feels like a safer default
> until we figure out how to paravirt IBT properly.

Fair enough, done.

> at a minimum, and possibly also:
> 
> diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
> index 444d824775f6..96db5c50a6e7 100644
> --- a/arch/x86/xen/xen-asm.S
> +++ b/arch/x86/xen/xen-asm.S
> @@ -124,7 +124,7 @@ SYM_CODE_START(xen_\name)
>         UNWIND_HINT_EMPTY
>         pop %rcx
>         pop %r11
> -       jmp  \name
> +       jmp  \name + 4 * IS_ENABLED(CONFIG_X86_IBT)
>  SYM_CODE_END(xen_\name)
>  _ASM_NOKPROBE(xen_\name)
>  .endm

objtool will do that for you, it will rewrite all direct jmp/call to
endbr.


Something like so then?

---
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -818,6 +818,7 @@ SYM_CODE_END(exc_xen_hypervisor_callback
  */
 SYM_CODE_START(xen_failsafe_callback)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	movl	%ds, %ecx
 	cmpw	%cx, 0x10(%rsp)
 	jne	1f
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -392,6 +392,7 @@ SYM_CODE_START(early_idt_handler_array)
 	.endr
 	UNWIND_HINT_IRET_REGS offset=16 entry=0
 SYM_CODE_END(early_idt_handler_array)
+	ANNOTATE_NOENDBR // early_idt_handler_array[NUM_EXCEPTION_VECTORS]
 
 SYM_CODE_START_LOCAL(early_idt_handler_common)
 	/*
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -624,6 +624,7 @@ static struct trap_array_entry trap_arra
 	TRAP_ENTRY(exc_coprocessor_error,		false ),
 	TRAP_ENTRY(exc_alignment_check,			false ),
 	TRAP_ENTRY(exc_simd_coprocessor_error,		false ),
+	TRAP_ENTRY(exc_control_protection,		false ),
 };
 
 static bool __ref get_trap_addr(void **addr, unsigned int ist)
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -122,6 +122,7 @@ SYM_FUNC_END(xen_read_cr2_direct);
 .macro xen_pv_trap name
 SYM_CODE_START(xen_\name)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	pop %rcx
 	pop %r11
 	jmp  \name
@@ -147,6 +148,7 @@ xen_pv_trap asm_exc_page_fault
 xen_pv_trap asm_exc_spurious_interrupt_bug
 xen_pv_trap asm_exc_coprocessor_error
 xen_pv_trap asm_exc_alignment_check
+xen_pv_trap asm_exc_control_protection
 #ifdef CONFIG_X86_MCE
 xen_pv_trap asm_xenpv_exc_machine_check
 #endif /* CONFIG_X86_MCE */
@@ -162,6 +164,7 @@ SYM_CODE_START(xen_early_idt_handler_arr
 	i = 0
 	.rept NUM_EXCEPTION_VECTORS
 	UNWIND_HINT_EMPTY
+	ENDBR
 	pop %rcx
 	pop %r11
 	jmp early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE
@@ -169,6 +172,7 @@ SYM_CODE_START(xen_early_idt_handler_arr
 	.fill xen_early_idt_handler_array + i*XEN_EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
 	.endr
 SYM_CODE_END(xen_early_idt_handler_array)
+	ANNOTATE_NOENDBR
 	__FINIT
 
 hypercall_iret = hypercall_page + __HYPERVISOR_iret * 32
@@ -189,6 +193,7 @@ hypercall_iret = hypercall_page + __HYPE
  */
 SYM_CODE_START(xen_iret)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	pushq $0
 	jmp hypercall_iret
 SYM_CODE_END(xen_iret)
@@ -230,6 +235,7 @@ SYM_CODE_END(xenpv_restore_regs_and_retu
 /* Normal 64-bit system call target */
 SYM_CODE_START(xen_syscall_target)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	popq %rcx
 	popq %r11
 
@@ -249,6 +255,7 @@ SYM_CODE_END(xen_syscall_target)
 /* 32-bit compat syscall target */
 SYM_CODE_START(xen_syscall32_target)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	popq %rcx
 	popq %r11
 
@@ -266,6 +273,7 @@ SYM_CODE_END(xen_syscall32_target)
 /* 32-bit compat sysenter target */
 SYM_CODE_START(xen_sysenter_target)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	/*
 	 * NB: Xen is polite and clears TF from EFLAGS for us.  This means
 	 * that we don't need to guard against single step exceptions here.
@@ -289,6 +297,7 @@ SYM_CODE_END(xen_sysenter_target)
 SYM_CODE_START(xen_syscall32_target)
 SYM_CODE_START(xen_sysenter_target)
 	UNWIND_HINT_EMPTY
+	ENDBR
 	lea 16(%rsp), %rsp	/* strip %rcx, %r11 */
 	mov $-ENOSYS, %rax
 	pushq $0
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -25,8 +25,11 @@
 SYM_CODE_START(hypercall_page)
 	.rept (PAGE_SIZE / 32)
 		UNWIND_HINT_FUNC
-		.skip 31, 0x90
-		RET
+		ANNOTATE_NOENDBR
+		/*
+		 * Xen will write the hypercall page, and sort out ENDBR.
+		 */
+		.skip 32, 0xcc
 	.endr
 
 #define HYPERCALL(n) \
@@ -74,6 +77,7 @@ SYM_CODE_END(startup_xen)
 .pushsection .text
 SYM_CODE_START(asm_cpu_bringup_and_idle)
 	UNWIND_HINT_EMPTY
+	ENDBR
 
 	call cpu_bringup_and_idle
 SYM_CODE_END(asm_cpu_bringup_and_idle)

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
  2022-02-18 16:49 ` [PATCH 04/29] x86/livepatch: Validate __fentry__ location Peter Zijlstra
@ 2022-02-18 21:08   ` Josh Poimboeuf
  2022-02-23 10:09     ` Peter Zijlstra
  0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-18 21:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn,
	Miroslav Benes, Steven Rostedt

On Fri, Feb 18, 2022 at 05:49:06PM +0100, Peter Zijlstra wrote:
> Currently livepatch assumes __fentry__ lives at func+0, which is most
> likely untrue with IBT on. Override the weak klp_get_ftrace_location()
> function with an arch specific version that's IBT aware.
> 
> Also make the weak fallback verify the location is an actual ftrace
> location as a sanity check.
> 
> Suggested-by: Miroslav Benes <mbenes@suse.cz>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/livepatch.h |    9 +++++++++
>  kernel/livepatch/patch.c         |    2 +-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> --- a/arch/x86/include/asm/livepatch.h
> +++ b/arch/x86/include/asm/livepatch.h
> @@ -17,4 +17,13 @@ static inline void klp_arch_set_pc(struc
>  	ftrace_instruction_pointer_set(fregs, ip);
>  }
>  
> +#define klp_get_ftrace_location klp_get_ftrace_location
> +static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
> +{
> +	unsigned long addr = ftrace_location(faddr);
> +	if (!addr && IS_ENABLED(CONFIG_X86_IBT))
> +		addr = ftrace_location(faddr + 4);
> +	return addr;

I'm kind of surprised this logic doesn't exist in ftrace itself.  Is
livepatch really the only user that needs to find the fentry for a given
function?

I had to do a double take for the ftrace_location() semantics, as I
originally assumed that's what it did, based on its name and signature.

Instead it apparently functions like a bool but returns its argument on
success.

Though the function comment tells a different story:

/**
 * ftrace_location - return true if the ip giving is a traced location

So it's all kinds of confusing...
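
Concretely, illustrative only, assuming some_func is an ftrace site
with __fentry__ at +0:

	unsigned long ip = (unsigned long)some_func;

	ftrace_location(ip);		/* returns ip -- "true"  */
	ftrace_location(ip + 1);	/* returns 0  -- "false" */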

-- 
Josh


^ permalink raw reply	[flat|nested] 94+ messages in thread

* RE: [PATCH 05/29] x86: Base IBT bits
  2022-02-18 20:49   ` Andrew Cooper
@ 2022-02-18 21:11     ` David Laight
  2022-02-18 21:24       ` Andrew Cooper
  2022-02-18 21:26     ` Peter Zijlstra
  1 sibling, 1 reply; 94+ messages in thread
From: David Laight @ 2022-02-18 21:11 UTC (permalink / raw)
  To: 'Andrew Cooper', Peter Zijlstra, x86, joao, hjl.tools, jpoimboe
  Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn

From: Andrew Cooper
> Sent: 18 February 2022 20:50
> 
> On 18/02/2022 16:49, Peter Zijlstra wrote:
> > +/*
> > + * A bit convoluted, but matches both endbr32 and endbr64 without
> > + * having either as literal in the text.
> > + */
> > +static inline bool is_endbr(const void *addr)
> > +{
> > +	unsigned int val = ~*(unsigned int *)addr;
> > +	val |= 0x01000000U;
> > +	return val == ~0xfa1e0ff3;
> > +}
> 
> At this point, I feel I've earned an "I told you so". :)
> 
> Clang 13 sees straight through the trickery and generates:
> 
> is_endbr:                               # @is_endbr
>         movl    $-16777217, %eax                # imm = 0xFEFFFFFF
>         andl    (%rdi), %eax
>         cmpl    $-98693133, %eax                # imm = 0xFA1E0FF3
>         sete    %al
>         retq

I think it is enough to add:
	asm("", "=r" (val));
somewhere in the middle.
(I think that is right for asm with input and output in the same
register.)
There might be a HIDE_FOR_OPTIMISER() define that does that.
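
For reference, the form that actually compiles uses a colon, not a
comma; a minimal barrier sketch:

	asm("" : "+r" (val));	/* empty asm; launders val through a register */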

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/29] x86: Base IBT bits
  2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
  2022-02-18 20:49   ` Andrew Cooper
@ 2022-02-18 21:14   ` Josh Poimboeuf
  2022-02-18 21:21     ` Peter Zijlstra
  2022-02-18 22:12   ` Joao Moreira
  2022-02-19  1:07   ` Edgecombe, Rick P
  3 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-18 21:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 05:49:07PM +0100, Peter Zijlstra wrote:
> +#ifdef CONFIG_X86_64
> +#define ASM_ENDBR	"endbr64\n\t"
> +#else
> +#define ASM_ENDBR	"endbr32\n\t"
> +#endif

Is it safe to assume all supported assemblers know this instruction?

-- 
Josh


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-18 19:31   ` Andrew Cooper
@ 2022-02-18 21:15     ` Peter Zijlstra
  0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 21:15 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: x86, joao, hjl.tools, jpoimboe, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 07:31:38PM +0000, Andrew Cooper wrote:
> On 18/02/2022 16:49, Peter Zijlstra wrote:
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -592,6 +593,27 @@ static __init int setup_disable_pku(char
> >  __setup("nopku", setup_disable_pku);
> >  #endif /* CONFIG_X86_64 */
> >  
> > +static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> > +{
> > +	u64 msr;
> > +
> > +	if (!IS_ENABLED(CONFIG_X86_IBT) ||
> > +	    !cpu_feature_enabled(X86_FEATURE_IBT))
> > +		return;
> > +
> > +	cr4_set_bits(X86_CR4_CET);
> > +
> > +	rdmsrl(MSR_IA32_S_CET, msr);
> > +	if (cpu_feature_enabled(X86_FEATURE_IBT))
> > +		msr |= CET_ENDBR_EN;
> > +	wrmsrl(MSR_IA32_S_CET, msr);
> 
> So something I learnt the hard way with shstk is that you really want to
> disable S_CET before heading into purgatory.
> 
> I've got no idea what's going to result from UEFI finally getting CET
> support.  However, clearing out the other IBT settings is probably a
> wise move.
> 
> In particular, if there was a stale legacy bitmap pointer, then
> ibt_selftest() could take #PF ahead of #CP.

How's this then? That writes the whole state to a known value before
enabling CR4.CET to make the thing go...

+static __always_inline void setup_cet(struct cpuinfo_x86 *c)
+{
+       u64 msr = CET_ENDBR_EN;
+
+       if (!IS_ENABLED(CONFIG_X86_IBT) ||
+           !cpu_feature_enabled(X86_FEATURE_IBT))
+               return;
+
+       wrmsrl(MSR_IA32_S_CET, msr);
+       cr4_set_bits(X86_CR4_CET);
+
+       if (!ibt_selftest()) {
+               pr_err("IBT selftest: Failed!\n");
+               setup_clear_cpu_cap(X86_FEATURE_IBT);
+       }
+}


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/29] x86: Base IBT bits
  2022-02-18 21:14   ` Josh Poimboeuf
@ 2022-02-18 21:21     ` Peter Zijlstra
  0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 21:21 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 01:14:51PM -0800, Josh Poimboeuf wrote:
> On Fri, Feb 18, 2022 at 05:49:07PM +0100, Peter Zijlstra wrote:
> > +#ifdef CONFIG_X86_64
> > +#define ASM_ENDBR	"endbr64\n\t"
> > +#else
> > +#define ASM_ENDBR	"endbr32\n\t"
> > +#endif
> 
> Is it safe to assume all supported assemblers know this instruction?

I was hoping the answer was yes, given CC_HAS_IBT.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 02/29] x86/module: Fix the paravirt vs alternative order
  2022-02-18 20:28   ` Josh Poimboeuf
@ 2022-02-18 21:22     ` Peter Zijlstra
  2022-02-18 23:28       ` Josh Poimboeuf
  0 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 21:22 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn,
	Juergen Gross

On Fri, Feb 18, 2022 at 12:28:20PM -0800, Josh Poimboeuf wrote:
> On Fri, Feb 18, 2022 at 05:49:04PM +0100, Peter Zijlstra wrote:
> > Ever since commit 4e6292114c741 ("x86/paravirt: Add new features for
> > paravirt patching") there is an ordering dependency between patching
> > paravirt ops and patching alternatives, the module loader still
> > violates this.
> > 
> > Fixes: 4e6292114c741 ("x86/paravirt: Add new features for paravirt patching")
> > Cc: Juergen Gross <jgross@suse.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Probably a good idea to put the 'para' and 'alt' clauses next to each
> other and add a comment that the ordering is necessary.

Can't, retpolines must be in between, but I'll add a comment to check
alternative.c for ordering constraints.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/29] x86: Base IBT bits
  2022-02-18 21:11     ` David Laight
@ 2022-02-18 21:24       ` Andrew Cooper
  2022-02-18 22:37         ` David Laight
  0 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2022-02-18 21:24 UTC (permalink / raw)
  To: David Laight, Peter Zijlstra, x86, joao, hjl.tools, jpoimboe
  Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Andrew Cooper

On 18/02/2022 21:11, David Laight wrote:
> From: Andrew Cooper
>> Sent: 18 February 2022 20:50
>>
>> On 18/02/2022 16:49, Peter Zijlstra wrote:
>>> +/*
>>> + * A bit convoluted, but matches both endbr32 and endbr64 without
>>> + * having either as literal in the text.
>>> + */
>>> +static inline bool is_endbr(const void *addr)
>>> +{
>>> +	unsigned int val = ~*(unsigned int *)addr;
>>> +	val |= 0x01000000U;
>>> +	return val == ~0xfa1e0ff3;
>>> +}
>> At this point, I feel I've earned an "I told you so". :)
>>
>> Clang 13 sees straight through the trickery and generates:
>>
>> is_endbr:                               # @is_endbr
>>         movl    $-16777217, %eax                # imm = 0xFEFFFFFF
>>         andl    (%rdi), %eax
>>         cmpl    $-98693133, %eax                # imm = 0xFA1E0FF3
>>         sete    %al
>>         retq
> I think it is enough to add:
> 	asm("", "=r" (val));
> somewhere in the middle.

(First, you mean "+r" not "=r"), but no - the problem isn't val.  It's
`~0xfa1e0ff3` which the compiler is free to transform in several unsafe ways.

~Andrew

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/29] x86: Base IBT bits
  2022-02-18 20:49   ` Andrew Cooper
  2022-02-18 21:11     ` David Laight
@ 2022-02-18 21:26     ` Peter Zijlstra
  1 sibling, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-18 21:26 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: x86, joao, hjl.tools, jpoimboe, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 08:49:45PM +0000, Andrew Cooper wrote:
> On 18/02/2022 16:49, Peter Zijlstra wrote:
> > +/*
> > + * A bit convoluted, but matches both endbr32 and endbr64 without
> > + * having either as literal in the text.
> > + */
> > +static inline bool is_endbr(const void *addr)
> > +{
> > +	unsigned int val = ~*(unsigned int *)addr;
> > +	val |= 0x01000000U;
> > +	return val == ~0xfa1e0ff3;
> > +}
> 
> At this point, I feel I've earned an "I told you so". :)

Ha! I actually have a note to double-check this. But yes, I'll stuff
that piece of asm in so I can forget about it.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/29] x86: Base IBT bits
  2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
  2022-02-18 20:49   ` Andrew Cooper
  2022-02-18 21:14   ` Josh Poimboeuf
@ 2022-02-18 22:12   ` Joao Moreira
  2022-02-19  1:07   ` Edgecombe, Rick P
  3 siblings, 0 replies; 94+ messages in thread
From: Joao Moreira @ 2022-02-18 22:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn

> +config CC_HAS_IBT
> +	# GCC >= 9 and binutils >= 2.29
> +	# Retpoline check to work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654
> +	def_bool $(cc-option, -fcf-protection=branch -mindirect-branch-register) && $(as-instr,endbr64)
> +
Using -mindirect-branch-register breaks compiling with clang. Maybe we
should do this instead?

+       def_bool ($(cc-option, -fcf-protection=branch -mindirect-branch-register) || $(cc-option, -mretpoline-external-thunk)) && $(as-instr,endbr64)

^ permalink raw reply	[flat|nested] 94+ messages in thread

* RE: [PATCH 05/29] x86: Base IBT bits
  2022-02-18 21:24       ` Andrew Cooper
@ 2022-02-18 22:37         ` David Laight
  0 siblings, 0 replies; 94+ messages in thread
From: David Laight @ 2022-02-18 22:37 UTC (permalink / raw)
  To: 'Andrew Cooper', Peter Zijlstra, x86, joao, hjl.tools, jpoimboe
  Cc: linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn

From: Andrew Cooper
> Sent: 18 February 2022 21:24
> 
> On 18/02/2022 21:11, David Laight wrote:
> > From: Andrew Cooper
> >> Sent: 18 February 2022 20:50
> >>
> >> On 18/02/2022 16:49, Peter Zijlstra wrote:
> >>> +/*
> >>> + * A bit convoluted, but matches both endbr32 and endbr64 without
> >>> + * having either as literal in the text.
> >>> + */
> >>> +static inline bool is_endbr(const void *addr)
> >>> +{
> >>> +	unsigned int val = ~*(unsigned int *)addr;
> >>> +	val |= 0x01000000U;
> >>> +	return val == ~0xfa1e0ff3;
> >>> +}
> >> At this point, I feel I've earned an "I told you so". :)
> >>
> >> Clang 13 sees straight through the trickery and generates:
> >>
> >> is_endbr:                               # @is_endbr
> >>         movl    $-16777217, %eax                # imm = 0xFEFFFFFF
> >>         andl    (%rdi), %eax
> >>         cmpl    $-98693133, %eax                # imm = 0xFA1E0FF3
> >>         sete    %al
> >>         retq
> > I think it is enough to add:
> > 	asm("", "=r" (val));
> > somewhere in the middle.
> 
> (First, you mean "+r" not "=r"),

I always double check....

> but no - the problem isn't val.  It's
> `~0xfa1e0ff3` which the compiler is free to transform in several unsafe way.

Actually you could do (modulo stupid errors):
	val = (*(unsigned int *)addr & ~0x01000000) ^ 0xff3;
	asm("", "+r" (val));
	return val ^ 0xfa1e0000;
which should be zero for endbr and non-zero otherwise.
Shame the compiler will probably never use the flags from the final xor.
Converting to bool just adds code!
(I hate bool)
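
Modulo those errors, the whole idea might look like (untested sketch):

static inline bool is_endbr(const void *addr)
{
	unsigned int val = (*(unsigned int *)addr & ~0x01000000U) ^ 0xff3;

	/* empty asm hides the partial constant from the optimiser */
	asm("" : "+r" (val));

	return (val ^ 0xfa1e0000) == 0;	/* true for endbr32/endbr64 only */
}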

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 19/29] x86/ibt,xen: Annotate away warnings
  2022-02-18 21:05     ` Peter Zijlstra
@ 2022-02-18 23:07       ` Andrew Cooper
  2022-02-21 14:20         ` Peter Zijlstra
  0 siblings, 1 reply; 94+ messages in thread
From: Andrew Cooper @ 2022-02-18 23:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, Juergen Gross, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Juergen Gross, Andrew Cooper, Andy Lutomirski

On 18/02/2022 21:05, Peter Zijlstra wrote:
> On Fri, Feb 18, 2022 at 08:24:41PM +0000, Andrew Cooper wrote:
>> at a minimum, and possibly also:
>>
>> diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
>> index 444d824775f6..96db5c50a6e7 100644
>> --- a/arch/x86/xen/xen-asm.S
>> +++ b/arch/x86/xen/xen-asm.S
>> @@ -124,7 +124,7 @@ SYM_CODE_START(xen_\name)
>>         UNWIND_HINT_EMPTY
>>         pop %rcx
>>         pop %r11
>> -       jmp  \name
>> +       jmp  \name + 4 * IS_ENABLED(CONFIG_X86_IBT)
>>  SYM_CODE_END(xen_\name)
>>  _ASM_NOKPROBE(xen_\name)
>>  .endm
> objtool will do that for you, it will rewrite all direct jmp/call to
> endbr.

Ah - great.

> Something like so then?

Looks plausible,  although Juergen would be a better person to judge.


About paravirt_iret, this is all way more complicated than it needs to be.

Currently, there are two users of INTERRUPT_RETURN.

The first, in swapgs_restore_regs_and_return_to_usermode, is never going
to execute until patching is complete, and is already behind an
alternative causing XENPV to go a different way, which means that:

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 97b1f84bb53f..f9a021e7688a 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -608,8 +608,8 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
 
        /* Restore RDI. */
        popq    %rdi
-       SWAPGS
-       INTERRUPT_RETURN
+       swapgs
+       jmp     native_iret
 
 
 SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)

is correct AFAICT.  (Tangent; then ESPFIX64 can be simplified because
only the return-to-user path needs the LDT check, so the enter/exit user
state can be dropped.)


That leaves the single INTERRUPT_RETURN in
restore_regs_and_return_to_kernel.  Xen PV is an easy environment to
start up in, so:

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 97b1f84bb53f..a9e7846cc176 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -626,7 +626,10 @@ SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
         * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
         * when returning from IPI handler.
         */
-       INTERRUPT_RETURN
+#ifdef CONFIG_XEN_PV
+early_iret_patch:
+#endif
+        jmp native_iret
 
 SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
        UNWIND_HINT_IRET_REGS
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 6a64496edefb..31f136328c84 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -66,6 +66,10 @@ SYM_CODE_START(startup_xen)
        cdq
        wrmsr
 
+       mov     $native_iret, %rax
+       sub     $xen_iret, %rax
+       add     %eax, 1 + early_iret_patch
+
        call xen_start_kernel
 SYM_CODE_END(startup_xen)
        __FINIT

really should be good enough to drop INTERRUPT_RETURN and paravirt_iret
entirely.

Obviously, that's very hacky, and might better be expressed like:

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 97b1f84bb53f..af371e4f0dda 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -626,7 +626,7 @@ SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
         * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
         * when returning from IPI handler.
         */
-       INTERRUPT_RETURN
+       EARLY_ALTERNATIVE "jmp native_iret", "jmp xen_iret", X86_FEATURE_XENPV
 
 SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
        UNWIND_HINT_IRET_REGS

or so, but my point is that the early Xen code, if it can identify this
patch point separate to the list of everything, can easily arrange for
it to be modified before HYPERCALL_set_trap_table (Xen PV's LIDT), and
then return_to_kernel is in its fully configured state (paravirt or
otherwise) before interrupts/exceptions can be taken.

~Andrew

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 02/29] x86/module: Fix the paravirt vs alternative order
  2022-02-18 21:22     ` Peter Zijlstra
@ 2022-02-18 23:28       ` Josh Poimboeuf
  0 siblings, 0 replies; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-18 23:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn,
	Juergen Gross

On Fri, Feb 18, 2022 at 10:22:46PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 18, 2022 at 12:28:20PM -0800, Josh Poimboeuf wrote:
> > On Fri, Feb 18, 2022 at 05:49:04PM +0100, Peter Zijlstra wrote:
> > > Ever since commit 4e6292114c741 ("x86/paravirt: Add new features for
> > > paravirt patching") there is an ordering dependency between patching
> > > paravirt ops and patching alternatives, the module loader still
> > > violates this.
> > > 
> > > Fixes: 4e6292114c741 ("x86/paravirt: Add new features for paravirt patching")
> > > Cc: Juergen Gross <jgross@suse.com>
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > 
> > Probably a good idea to put the 'para' and 'alt' clauses next to each
> > other and add a comment that the ordering is necessary.
> 
> Can't, retpolines must be in between, but I'll add a comment to check
> alternative.c for ordering constraints.

Ah, even more justification for a comment then ;-)

-- 
Josh


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 07/29] x86/entry: Sprinkle ENDBR dust
  2022-02-18 16:49 ` [PATCH 07/29] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
@ 2022-02-19  0:23   ` Josh Poimboeuf
  2022-02-19 23:08     ` Peter Zijlstra
  2022-02-19  0:36   ` Josh Poimboeuf
  1 sibling, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-19  0:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 05:49:09PM +0100, Peter Zijlstra wrote:
> Kernel entry points should be having ENDBR on for IBT configs.
> 
> The SYSCALL entry points are found through taking their respective
> address in order to program them in the MSRs, while the exception
> entry points are found through UNWIND_HINT_IRET_REGS.
> 
> *Except* that latter hint is also used on exit code to denote when
> we're down to an IRET frame. As such add an additional 'entry'
> argument to the macro and have it default to '1' such that objtool
> will assume it's an entry and WARN about it.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

So we now have two unwind types which are identical, except one requires
ENDBR after it.

It's not ideal.  The code has to make sure to get the annotations right
for objtool to do its job.  Setting the macro's default to 'entry=1'
does help with that, but still... it's clunky.

Also, calling them "entry" and "exit" is confusing.  Not all the exits
are exits.  Their common attribute is really that they're not "entry".

How important is it for objtool to validate these anyway?  Seems like
such bugs would be few and far between, and would be discovered in a
jiffy after bricking the system.

Another possibly better and less intrusive way of doing this would be
for objtool to realize that any UNWIND_HINT_IRET_REGS at the beginning
of a SYM_CODE_START (global non-function code symbol) needs ENDBR.
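
A self-contained sketch of that rule; the struct and field names here
are invented for illustration and don't match objtool's real internals:

struct sym_info {
	bool	global;		/* STB_GLOBAL */
	bool	function;	/* STT_FUNC */
	int	first_hint;	/* unwind hint type on the first insn */
};

#define HINT_IRET_REGS	1	/* stand-in for UNWIND_HINT_TYPE_IRET_REGS */

static bool sym_needs_endbr(const struct sym_info *sym)
{
	/* global non-function code symbol whose first insn hints IRET_REGS */
	return sym->global && !sym->function &&
	       sym->first_hint == HINT_IRET_REGS;
}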

-- 
Josh


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 07/29] x86/entry: Sprinkle ENDBR dust
  2022-02-18 16:49 ` [PATCH 07/29] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
  2022-02-19  0:23   ` Josh Poimboeuf
@ 2022-02-19  0:36   ` Josh Poimboeuf
  1 sibling, 0 replies; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-19  0:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 05:49:09PM +0100, Peter Zijlstra wrote:
> -	.align 8
> +
> +	.align IDT_ALIGN
>  SYM_CODE_START(irq_entries_start)
>      vector=FIRST_EXTERNAL_VECTOR
>      .rept NR_EXTERNAL_VECTORS
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=1
>  0 :
> +	ENDBR
>  	.byte	0x6a, vector
>  	jmp	asm_common_interrupt
> -	nop
>  	/* Ensure that the above is 8 bytes max */

"IDT_ALIGN bytes max" ?

> -	. = 0b + 8
> +	.fill 0b + IDT_ALIGN - ., 1, 0x90
>  	vector = vector+1
>      .endr
>  SYM_CODE_END(irq_entries_start)
>  
>  #ifdef CONFIG_X86_LOCAL_APIC
> -	.align 8
> +	.align IDT_ALIGN
>  SYM_CODE_START(spurious_entries_start)
>      vector=FIRST_SYSTEM_VECTOR
>      .rept NR_SYSTEM_VECTORS
> -	UNWIND_HINT_IRET_REGS
> +	UNWIND_HINT_IRET_REGS entry=1
>  0 :
> +	ENDBR
>  	.byte	0x6a, vector
>  	jmp	asm_spurious_interrupt
> -	nop
>  	/* Ensure that the above is 8 bytes max */

Ditto

> -	. = 0b + 8
> +	.fill 0b + IDT_ALIGN - ., 1, 0x90
>  	vector = vector+1
>      .endr

>  SYM_CODE_END(spurious_entries_start)
> --- a/arch/x86/include/asm/segment.h
> +++ b/arch/x86/include/asm/segment.h
> @@ -4,6 +4,7 @@
>  
>  #include <linux/const.h>
>  #include <asm/alternative.h>
> +#include <asm/ibt.h>
>  
>  /*
>   * Constructor for a conventional segment GDT (or LDT) entry.
> @@ -275,7 +276,11 @@ static inline void vdso_read_cpunode(uns
>   * vector has no error code (two bytes), a 'push $vector_number' (two
>   * bytes), and a jump to the common entry code (up to five bytes).
>   */
> +#ifdef CONFIG_X86_IBT
> +#define EARLY_IDT_HANDLER_SIZE 13
> +#else
>  #define EARLY_IDT_HANDLER_SIZE 9
> +#endif

Might want to add a sentence to the comment above: With IBT enabled,
ENDBR adds another four bytes.

>  /*
>   * xen_early_idt_handler_array is for Xen pv guests: for each entry in
> --- a/arch/x86/include/asm/unwind_hints.h
> +++ b/arch/x86/include/asm/unwind_hints.h
> @@ -11,7 +11,7 @@
>  	UNWIND_HINT sp_reg=ORC_REG_UNDEFINED type=UNWIND_HINT_TYPE_CALL end=1
>  .endm
>  
> -.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0
> +.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0 entry=1
>  	.if \base == %rsp
>  		.if \indirect
>  			.set sp_reg, ORC_REG_SP_INDIRECT
> @@ -33,9 +33,17 @@
>  	.set sp_offset, \offset
>  
>  	.if \partial
> -		.set type, UNWIND_HINT_TYPE_REGS_PARTIAL
> +		.if \entry
> +		.set type, UNWIND_HINT_TYPE_REGS_ENTRY
> +		.else
> +		.set type, UNWIND_HINT_TYPE_REGS_EXIT
> +		.endif
>  	.elseif \extra == 0
> -		.set type, UNWIND_HINT_TYPE_REGS_PARTIAL
> +		.if \entry
> +		.set type, UNWIND_HINT_TYPE_REGS_ENTRY
> +		.else
> +		.set type, UNWIND_HINT_TYPE_REGS_EXIT
> +		.endif
>  		.set sp_offset, \offset + (16*8)

'extra' is apparently no longer needed and can be shown the door.

-- 
Josh


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 05/29] x86: Base IBT bits
  2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
                     ` (2 preceding siblings ...)
  2022-02-18 22:12   ` Joao Moreira
@ 2022-02-19  1:07   ` Edgecombe, Rick P
  3 siblings, 0 replies; 94+ messages in thread
From: Edgecombe, Rick P @ 2022-02-19  1:07 UTC (permalink / raw)
  To: Poimboe, Josh, peterz, hjl.tools, x86, joao, Cooper, Andrew
  Cc: keescook, linux-kernel, mark.rutland, samitolvanen, ndesaulniers,
	Milburn, Alyssa

On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1861,6 +1861,21 @@ config X86_UMIP
>           specific cases in protected and virtual-8086 modes. Emulated
>           results are dummy.
>  
> +config CC_HAS_IBT
> +       # GCC >= 9 and binutils >= 2.29
> +       # Retpoline check to work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654
> +       def_bool $(cc-option, -fcf-protection=branch -mindirect-branch-register) && $(as-instr,endbr64)
> +
> +config X86_IBT
> +       prompt "Indirect Branch Tracking"
> +       bool
> +       depends on X86_64 && CC_HAS_IBT
> +       help
> +         Build the kernel with support for Indirect Branch Tracking, a
> +         hardware supported CFI scheme. Any indirect call must land on
> +         an ENDBR instruction, as such, the compiler will litter the
> +         code with them to make this happen.
> +
> 

Could you call this something more specific than just X86_IBT? Like
X86_KERNEL_IBT or something? It could get confusing if we add userspace
IBT, or if someone wants IBT for KVM guests without CFI in the kernel.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
  2022-02-18 19:31   ` Andrew Cooper
@ 2022-02-19  1:20   ` Edgecombe, Rick P
  2022-02-19  1:21   ` Josh Poimboeuf
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 94+ messages in thread
From: Edgecombe, Rick P @ 2022-02-19  1:20 UTC (permalink / raw)
  To: Poimboe, Josh, peterz, hjl.tools, x86, joao, Cooper, Andrew
  Cc: keescook, linux-kernel, mark.rutland, samitolvanen, ndesaulniers,
	Milburn, Alyssa

On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> +static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> +{
> +       u64 msr;
> +
> +       if (!IS_ENABLED(CONFIG_X86_IBT) ||
> +           !cpu_feature_enabled(X86_FEATURE_IBT))
> +               return;
> +
> +       cr4_set_bits(X86_CR4_CET);
> +
> +       rdmsrl(MSR_IA32_S_CET, msr);
> +       if (cpu_feature_enabled(X86_FEATURE_IBT))

It must be true because of the above check.

> +               msr |= CET_ENDBR_EN;
> +       wrmsrl(MSR_IA32_S_CET, msr);
> +
> +       if (!ibt_selftest()) {
> +               pr_err("IBT selftest: Failed!\n");
> +               setup_clear_cpu_cap(X86_FEATURE_IBT);
> +       }
> +}
> +

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
  2022-02-18 19:31   ` Andrew Cooper
  2022-02-19  1:20   ` Edgecombe, Rick P
@ 2022-02-19  1:21   ` Josh Poimboeuf
  2022-02-19  9:24     ` Peter Zijlstra
  2022-02-21  8:24   ` Kees Cook
  2022-02-22  4:38   ` Edgecombe, Rick P
  4 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-19  1:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 05:49:16PM +0100, Peter Zijlstra wrote:
> +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
> +{
> +	if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
> +		pr_err("Whaaa?!?!\n");
> +		return;
> +	}

Might want to upgrade that to a proper warning :-)

> +bool ibt_selftest(void)
> +{
> +	ibt_selftest_ok = false;
> +
> +	asm (ANNOTATE_NOENDBR
> +	     "1: lea 2f(%%rip), %%rax\n\t"
> +	     ANNOTATE_RETPOLINE_SAFE
> +	     "   jmp *%%rax\n\t"
> +	     "2: nop\n\t"
> +
> +	     /* unsigned ibt_selftest_ip = 2b */
> +	     ".pushsection .data,\"aw\"\n\t"
> +	     ".align 8\n\t"
> +	     ".type ibt_selftest_ip, @object\n\t"
> +	     ".size ibt_selftest_ip, 8\n\t"
> +	     "ibt_selftest_ip:\n\t"
> +	     ".quad 2b\n\t"
> +	     ".popsection\n\t"
> +
> +	     : : : "rax", "memory");

Can 'ibt_selftest_ip' just be defined in C (with __ro_after_init) and
passed as an output to the asm doing 'mov $2b, %[ibt_selftest_ip]'?
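
Something like this rough sketch, perhaps (operand names illustrative,
untested):

	/* hypothetical: let the compiler place the variable */
	static unsigned long ibt_selftest_ip __ro_after_init;

	asm (ANNOTATE_NOENDBR
	     "1: lea 2f(%%rip), %%rax\n\t"
	     ANNOTATE_RETPOLINE_SAFE
	     "   jmp *%%rax\n\t"
	     "2: nop\n\t"
	     /* record the expected #CP ip in the C variable */
	     "   movq $2b, %[ip]\n\t"
	     : [ip] "=m" (ibt_selftest_ip)
	     : : "rax", "memory");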

-- 
Josh


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 00/29] x86: Kernel IBT
  2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
                   ` (28 preceding siblings ...)
  2022-02-18 16:49 ` [PATCH 29/29] x86/alternative: Use .ibt_endbr_sites to seal indirect calls Peter Zijlstra
@ 2022-02-19  1:29 ` Edgecombe, Rick P
  2022-02-19  9:58   ` Peter Zijlstra
  2022-02-23  7:26   ` Kees Cook
  29 siblings, 2 replies; 94+ messages in thread
From: Edgecombe, Rick P @ 2022-02-19  1:29 UTC (permalink / raw)
  To: Poimboe, Josh, peterz, hjl.tools, x86, joao, Cooper, Andrew
  Cc: keescook, linux-kernel, mark.rutland, samitolvanen, ndesaulniers,
	Milburn, Alyssa

On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> This is an (almost!) complete Kernel IBT implementation. It's been
> self-hosting
> for a few days now. That is, it runs on IBT enabled hardware
> (Tigerlake) and is
> capable of building the next kernel.
> 
> It is also almost clean on allmodconfig using GCC-11.2.
> 
> The biggest TODO item at this point is Clang, I've not yet looked at
> that.

Do you need to turn this off before kexec?

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 16/29] x86/bugs: Disable Retpoline when IBT
  2022-02-18 16:49 ` [PATCH 16/29] x86/bugs: Disable Retpoline when IBT Peter Zijlstra
@ 2022-02-19  2:15   ` Josh Poimboeuf
  2022-02-22 15:00     ` Peter Zijlstra
  0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-19  2:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 05:49:18PM +0100, Peter Zijlstra wrote:
> Retpoline and IBT are mutually exclusive. IBT relies on indirect
> branches (JMP/CALL *%reg) while retpoline avoids them by design.
> 
> Demote to LFENCE on IBT enabled hardware.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/cpu/bugs.c |   25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -937,6 +937,11 @@ static void __init spectre_v2_select_mit
>  	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
>  	retpoline_amd:
>  		if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
> +			if (IS_ENABLED(CONFIG_X86_IBT) &&
> +			    boot_cpu_has(X86_FEATURE_IBT)) {
> +				pr_err("Spectre mitigation: LFENCE not serializing, generic retpoline not available due to IBT, switching to none\n");
> +				return;
> +			}
>  			pr_err("Spectre mitigation: LFENCE not serializing, switching to generic retpoline\n");
>  			goto retpoline_generic;
>  		}
> @@ -945,6 +950,26 @@ static void __init spectre_v2_select_mit
>  		setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
>  	} else {
>  	retpoline_generic:
> +		/*
> +		 *  Full retpoline is incompatible with IBT, demote to LFENCE.
> +		 */
> +		if (IS_ENABLED(CONFIG_X86_IBT) &&
> +		    boot_cpu_has(X86_FEATURE_IBT)) {
> +			switch (cmd) {
> +			case SPECTRE_V2_CMD_FORCE:
> +			case SPECTRE_V2_CMD_AUTO:
> +			case SPECTRE_V2_CMD_RETPOLINE:
> +				/* silent for auto select */
> +				break;
> +
> +			default:
> +				/* warn when 'demoting' an explicit selection */
> +				pr_warn("Spectre mitigation: Switching to LFENCE due to IBT\n");
> +				break;

This code is confusing, not helped by the fact that the existing code
already looks like spaghetti.

Assuming IBT systems also have eIBRS (right?), I don't think the above
SPECTRE_V2_CMD_{FORCE,AUTO} cases would be possible.

AFAICT, if execution reached the retpoline_generic label, the user
specified either RETPOLINE or RETPOLINE_GENERIC.

I'm not sure it makes sense to put RETPOLINE in the "silent" list.  If
the user boots an Intel system with spectre_v2=retpoline on the cmdline,
they're probably expecting a traditional retpoline and should be warned
if that changes, especially if it's a "demotion".

In that case the switch statement isn't even needed.  It can instead
just unconditionally print the warning.
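
I.e., roughly (sketch):

		if (IS_ENABLED(CONFIG_X86_IBT) &&
		    boot_cpu_has(X86_FEATURE_IBT)) {
			/* reaching this label implies an explicit retpoline request */
			pr_warn("Spectre mitigation: Switching to LFENCE due to IBT\n");
		}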


Also, why "demote" retpoline to LFENCE rather than attempting to
"promote" it to eIBRS?  Maybe there's a good reason but it probably at
least deserves some mention in the commit log.

-- 
Josh


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 17/29] x86/ibt: Annotate text references
  2022-02-18 16:49 ` [PATCH 17/29] x86/ibt: Annotate text references Peter Zijlstra
@ 2022-02-19  5:22   ` Josh Poimboeuf
  2022-02-19  9:39     ` Peter Zijlstra
  0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-19  5:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 05:49:19PM +0100, Peter Zijlstra wrote:
> Annotate away some of the generic code references. This is things
> where we take the address of a symbol for exception handling or return
> addresses (eg. context switch).
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

The vast majority of these annotations can go away if objtool only
requires ENDBR for referenced *STT_FUNC* symbols.

Anything still needing ANNOTATE_NOENDBR after that arguably might not
belong as STT_FUNC anyway, and it might make sense to convert it to
non-function code (e.g. SYM_CODE_{START,END}).
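
In objtool terms that might look roughly like (hypothetical sketch,
field names approximate):

	/* only insist on ENDBR when the reference target is a function */
	if (dest->sym && dest->sym->type == STT_FUNC &&
	    dest->type != INSN_ENDBR && !dest->noendbr)
		WARN_FUNC("relocation to !ENDBR function",
			  dest->sec, dest->offset);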

> @@ -564,12 +565,16 @@ SYM_CODE_END(\asmsym)
>  	.align 16
>  	.globl __irqentry_text_start
>  __irqentry_text_start:
> +	ANNOTATE_NOENDBR // unwinders
> +	ud2;
>  
>  #include <asm/idtentry.h>
>  
>  	.align 16
>  	.globl __irqentry_text_end
>  __irqentry_text_end:
> +	ANNOTATE_NOENDBR
> +	ud2;

Why ud2?  If no ud2 then the annotation shouldn't be needed since the
first idt entry has ENDBR.

-- 
Josh


^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-19  1:21   ` Josh Poimboeuf
@ 2022-02-19  9:24     ` Peter Zijlstra
  0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-19  9:24 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 05:21:55PM -0800, Josh Poimboeuf wrote:
> On Fri, Feb 18, 2022 at 05:49:16PM +0100, Peter Zijlstra wrote:
> > +DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
> > +{
> > +	if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
> > +		pr_err("Whaaa?!?!\n");
> > +		return;
> > +	}
> 
> Might want to upgrade that to a proper warning :-)

"Unexpected #CP\n" ?

> > +bool ibt_selftest(void)
> > +{
> > +	ibt_selftest_ok = false;
> > +
> > +	asm (ANNOTATE_NOENDBR
> > +	     "1: lea 2f(%%rip), %%rax\n\t"
> > +	     ANNOTATE_RETPOLINE_SAFE
> > +	     "   jmp *%%rax\n\t"
> > +	     "2: nop\n\t"
> > +
> > +	     /* unsigned ibt_selftest_ip = 2b */
> > +	     ".pushsection .data,\"aw\"\n\t"
> > +	     ".align 8\n\t"
> > +	     ".type ibt_selftest_ip, @object\n\t"
> > +	     ".size ibt_selftest_ip, 8\n\t"
> > +	     "ibt_selftest_ip:\n\t"
> > +	     ".quad 2b\n\t"
> > +	     ".popsection\n\t"
> > +
> > +	     : : : "rax", "memory");
> 
> Can 'ibt_selftest_ip' just be defined in C (with __ro_after_init) and
> passed as an output to the asm doing 'mov $2b, %[ibt_selftest_ip]'?

This seemed simpler... note that it's run on cpu bringup, so with cpu
hotplug it'll end up trying to write to ro memory if you do what you
suggest.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 17/29] x86/ibt: Annotate text references
  2022-02-19  5:22   ` Josh Poimboeuf
@ 2022-02-19  9:39     ` Peter Zijlstra
  0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-19  9:39 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 09:22:16PM -0800, Josh Poimboeuf wrote:
> On Fri, Feb 18, 2022 at 05:49:19PM +0100, Peter Zijlstra wrote:
> > Annotate away some of the generic code references. This is things
> > where we take the address of a symbol for exception handling or return
> > addresses (eg. context switch).
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> The vast majority of these annotations can go away if objtool only
> requires ENDBR for referenced *STT_FUNC* symbols.
> 
> Anything still needing ANNOTATE_NOENDBR after that, might arguably not
> belong as STT_FUNC anyway and it might make sense to convert it to
> non-function code (e.g. SYM_CODE{START,END}.

I'd really rather have objtool err on the side of caution for now.
Missing ENDBR typically bricks a box hard; normal consoles don't get
around to showing anything. My force_early_printk patches saved the day
a number of times.

Given that the only hardware I have with this on is a NUC without
serial, this is a massive pain in the arse to debug. That box has been
 >< close to total destruction a number of times. I never want to do that
ever again; life's too short to have to work with a NUC.

> > @@ -564,12 +565,16 @@ SYM_CODE_END(\asmsym)
> >  	.align 16
> >  	.globl __irqentry_text_start
> >  __irqentry_text_start:
> > +	ANNOTATE_NOENDBR // unwinders
> > +	ud2;
> >  
> >  #include <asm/idtentry.h>
> >  
> >  	.align 16
> >  	.globl __irqentry_text_end
> >  __irqentry_text_end:
> > +	ANNOTATE_NOENDBR
> > +	ud2;
> 
> Why ud2?  If no ud2 then the annotation shouldn't be needed since the
> first idt entry has ENDBR.

paranoia :-) just to make absolutely sure nobody ever tries to call
__irqentry_text_end, but yes, removed it.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 00/29] x86: Kernel IBT
  2022-02-19  1:29 ` [PATCH 00/29] x86: Kernel IBT Edgecombe, Rick P
@ 2022-02-19  9:58   ` Peter Zijlstra
  2022-02-19 16:00     ` Andrew Cooper
  2022-02-21  8:42     ` Kees Cook
  2022-02-23  7:26   ` Kees Cook
  1 sibling, 2 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-19  9:58 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Poimboe, Josh, hjl.tools, x86, joao, Cooper, Andrew, keescook,
	linux-kernel, mark.rutland, samitolvanen, ndesaulniers, Milburn,
	Alyssa

On Sat, Feb 19, 2022 at 01:29:45AM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> > This is an (almost!) complete Kernel IBT implementation. It's been
> > self-hosting
> > for a few days now. That is, it runs on IBT enabled hardware
> > (Tigerlake) and is
> > capable of building the next kernel.
> > 
> > It is also almost clean on allmodconfig using GCC-11.2.
> > 
> > The biggest TODO item at this point is Clang, I've not yet looked at
> > that.
> 
> Do you need to turn this off before kexec?

Probably... :-) I've never looked at that code though, so I'm not
exactly sure where to put things.

I'm assuming kexec does a hot-unplug of all but the boot-cpu which then
leaves only a single CPU with state in machine_kexec() ? Does the below
look reasonable?

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -638,6 +638,12 @@ static __always_inline void setup_cet(st
 	}
 }
 
+void cet_disable(void)
+{
+	cr4_clear_bits(X86_CR4_CET);
+	wrmsrl(MSR_IA32_S_CET, 0);
+}
+
 /*
  * Some CPU features depend on higher CPUID levels, which may not always
  * be available due to CPUID level capping or broken virtualization
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 33d41e350c79..cf26356db53e 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -72,4 +72,7 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
 #else
 static inline void init_ia32_feat_ctl(struct cpuinfo_x86 *c) {}
 #endif
+
+extern void cet_disable(void);
+
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index f5da4a18070a..29a2a1732605 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -310,6 +310,7 @@ void machine_kexec(struct kimage *image)
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
 	hw_breakpoint_disable();
+	cet_disable();
 
 	if (image->preserve_context) {
 #ifdef CONFIG_X86_IO_APIC

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 00/29] x86: Kernel IBT
  2022-02-19  9:58   ` Peter Zijlstra
@ 2022-02-19 16:00     ` Andrew Cooper
  2022-02-21  8:42     ` Kees Cook
  1 sibling, 0 replies; 94+ messages in thread
From: Andrew Cooper @ 2022-02-19 16:00 UTC (permalink / raw)
  To: Peter Zijlstra, Edgecombe, Rick P
  Cc: Poimboe, Josh, hjl.tools, x86, joao, keescook, linux-kernel,
	mark.rutland, samitolvanen, ndesaulniers, Milburn, Alyssa,
	Andrew Cooper

On 19/02/2022 09:58, Peter Zijlstra wrote:
> On Sat, Feb 19, 2022 at 01:29:45AM +0000, Edgecombe, Rick P wrote:
>> On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
>>> This is an (almost!) complete Kernel IBT implementation. It's been
>>> self-hosting
>>> for a few days now. That is, it runs on IBT enabled hardware
>>> (Tigerlake) and is
>>> capable of building the next kernel.
>>>
>>> It is also almost clean on allmodconfig using GCC-11.2.
>>>
>>> The biggest TODO item at this point is Clang, I've not yet looked at
>>> that.
>> Do you need to turn this off before kexec?
> Probably... :-) I've never looked at that code though; so I'm not
> exactly sure where to put things.
>
> I'm assuming kexec does a hot-unplug of all but the boot-cpu which then
> leaves only a single CPU with state in machine_kexec() ? Does the below
> look reasonable?

If you skip writing to S_CET on hardware that doesn't have it, probably.
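
I.e., something like (sketch):

void cet_disable(void)
{
	if (!cpu_feature_enabled(X86_FEATURE_IBT))
		return;

	cr4_clear_bits(X86_CR4_CET);
	wrmsrl(MSR_IA32_S_CET, 0);
}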

~Andrew

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 07/29] x86/entry: Sprinkle ENDBR dust
  2022-02-19  0:23   ` Josh Poimboeuf
@ 2022-02-19 23:08     ` Peter Zijlstra
  0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-19 23:08 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 04:23:38PM -0800, Josh Poimboeuf wrote:
> Another possibly better and less intrusive way of doing this would be
> for objtool to realize that any UNWIND_HINT_IRET_REGS at the beginning
> of a SYM_CODE_START (global non-function code symbol) needs ENDBR.

This; I like that. I reverted this patch from the tree (very much
including the annotations), redid the objtool check and regenerated the
missing ENDBR given the objtool output.

I think the few missing ENDBRs in this are due to using x86_64-defconfig
instead of allmodconfig. I'll try on Monday after spooling up a real
build machine :-)

---
 arch/x86/entry/entry_64.S           | 32 +++++++++++++++-----------------
 arch/x86/entry/entry_64_compat.S    |  3 +--
 arch/x86/include/asm/idtentry.h     | 19 +++++++------------
 arch/x86/include/asm/segment.h      |  7 +------
 arch/x86/include/asm/unwind_hints.h | 18 +++++-------------
 arch/x86/kernel/head_64.S           | 13 ++++++-------
 arch/x86/kernel/unwind_orc.c        |  3 +--
 include/linux/objtool.h             |  5 ++---
 tools/include/linux/objtool.h       |  5 ++---
 tools/objtool/check.c               | 13 ++++++++-----
 tools/objtool/orc_dump.c            |  3 +--
 11 files changed, 49 insertions(+), 72 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 77e222f2061e..d69239c638a2 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -39,7 +39,6 @@
 #include <asm/trapnr.h>
 #include <asm/nospec-branch.h>
 #include <asm/fsgsbase.h>
-#include <asm/ibt.h>
 #include <linux/err.h>
 
 #include "calling.h"
@@ -87,8 +86,8 @@
 
 SYM_CODE_START(entry_SYSCALL_64)
 	UNWIND_HINT_EMPTY
-
 	ENDBR
+
 	swapgs
 	/* tss.sp2 is scratch space. */
 	movq	%rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
@@ -353,7 +352,7 @@ SYM_CODE_END(ret_from_fork)
  */
 .macro idtentry vector asmsym cfunc has_error_code:req
 SYM_CODE_START(\asmsym)
-	UNWIND_HINT_IRET_REGS offset=\has_error_code*8 entry=1
+	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
 	ENDBR
 	ASM_CLAC
 
@@ -371,7 +370,7 @@ SYM_CODE_START(\asmsym)
 		.rept	6
 		pushq	5*8(%rsp)
 		.endr
-		UNWIND_HINT_IRET_REGS offset=8 entry=0
+		UNWIND_HINT_IRET_REGS offset=8
 .Lfrom_usermode_no_gap_\@:
 	.endif
 
@@ -421,7 +420,7 @@ SYM_CODE_END(\asmsym)
  */
 .macro idtentry_mce_db vector asmsym cfunc
 SYM_CODE_START(\asmsym)
-	UNWIND_HINT_IRET_REGS entry=1
+	UNWIND_HINT_IRET_REGS
 	ENDBR
 	ASM_CLAC
 
@@ -477,7 +476,7 @@ SYM_CODE_END(\asmsym)
  */
 .macro idtentry_vc vector asmsym cfunc
 SYM_CODE_START(\asmsym)
-	UNWIND_HINT_IRET_REGS entry=1
+	UNWIND_HINT_IRET_REGS
 	ENDBR
 	ASM_CLAC
 
@@ -539,7 +538,7 @@ SYM_CODE_END(\asmsym)
  */
 .macro idtentry_df vector asmsym cfunc
 SYM_CODE_START(\asmsym)
-	UNWIND_HINT_IRET_REGS offset=8 entry=1
+	UNWIND_HINT_IRET_REGS offset=8
 	ENDBR
 	ASM_CLAC
 
@@ -641,8 +640,7 @@ SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
 	INTERRUPT_RETURN
 
 SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
-	UNWIND_HINT_IRET_REGS entry=0
-	ENDBR // paravirt_iret
+	UNWIND_HINT_IRET_REGS
 	/*
 	 * Are we returning to a stack segment from the LDT?  Note: in
 	 * 64-bit mode SS:RSP on the exception stack is always valid.
@@ -720,7 +718,7 @@ SYM_INNER_LABEL(native_irq_return_iret, SYM_L_GLOBAL)
 	popq	%rdi				/* Restore user RDI */
 
 	movq	%rax, %rsp
-	UNWIND_HINT_IRET_REGS offset=8 entry=0
+	UNWIND_HINT_IRET_REGS offset=8
 
 	/*
 	 * At this point, we cannot write to the stack any more, but we can
@@ -837,13 +835,13 @@ SYM_CODE_START(xen_failsafe_callback)
 	movq	8(%rsp), %r11
 	addq	$0x30, %rsp
 	pushq	$0				/* RIP */
-	UNWIND_HINT_IRET_REGS offset=8 entry=0
+	UNWIND_HINT_IRET_REGS offset=8
 	jmp	asm_exc_general_protection
 1:	/* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
 	movq	(%rsp), %rcx
 	movq	8(%rsp), %r11
 	addq	$0x30, %rsp
-	UNWIND_HINT_IRET_REGS entry=0
+	UNWIND_HINT_IRET_REGS
 	pushq	$-1 /* orig_ax = -1 => not a system call */
 	PUSH_AND_CLEAR_REGS
 	ENCODE_FRAME_POINTER
@@ -1078,7 +1076,7 @@ SYM_CODE_END(error_return)
  *	      when PAGE_TABLE_ISOLATION is in use.  Do not clobber.
  */
 SYM_CODE_START(asm_exc_nmi)
-	UNWIND_HINT_IRET_REGS entry=1
+	UNWIND_HINT_IRET_REGS
 	ENDBR
 
 	/*
@@ -1144,13 +1142,13 @@ SYM_CODE_START(asm_exc_nmi)
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
-	UNWIND_HINT_IRET_REGS base=%rdx offset=8 entry=0
+	UNWIND_HINT_IRET_REGS base=%rdx offset=8
 	pushq	5*8(%rdx)	/* pt_regs->ss */
 	pushq	4*8(%rdx)	/* pt_regs->rsp */
 	pushq	3*8(%rdx)	/* pt_regs->flags */
 	pushq	2*8(%rdx)	/* pt_regs->cs */
 	pushq	1*8(%rdx)	/* pt_regs->rip */
-	UNWIND_HINT_IRET_REGS entry=0
+	UNWIND_HINT_IRET_REGS
 	pushq   $-1		/* pt_regs->orig_ax */
 	PUSH_AND_CLEAR_REGS rdx=(%rdx)
 	ENCODE_FRAME_POINTER
@@ -1306,7 +1304,7 @@ SYM_CODE_START(asm_exc_nmi)
 	.rept 5
 	pushq	11*8(%rsp)
 	.endr
-	UNWIND_HINT_IRET_REGS entry=0
+	UNWIND_HINT_IRET_REGS
 
 	/* Everything up to here is safe from nested NMIs */
 
@@ -1322,7 +1320,7 @@ SYM_CODE_START(asm_exc_nmi)
 	pushq	$__KERNEL_CS	/* CS */
 	pushq	$1f		/* RIP */
 	iretq			/* continues at repeat_nmi below */
-	UNWIND_HINT_IRET_REGS entry=0
+	UNWIND_HINT_IRET_REGS
 1:
 #endif
 
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 316e0fa119b4..86caf7872a25 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -48,8 +48,8 @@
  */
 SYM_CODE_START(entry_SYSENTER_compat)
 	UNWIND_HINT_EMPTY
-	/* Interrupts are off on entry. */
 	ENDBR
+	/* Interrupts are off on entry. */
 	SWAPGS
 
 	pushq	%rax
@@ -344,7 +344,6 @@ SYM_CODE_END(entry_SYSCALL_compat)
  */
 SYM_CODE_START(entry_INT80_compat)
 	UNWIND_HINT_EMPTY
-	ENDBR
 	/*
 	 * Interrupts are off on entry.
 	 */
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 9127e1e3c439..1157ee6f98d7 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -5,11 +5,7 @@
 /* Interrupts/Exceptions */
 #include <asm/trapnr.h>
 
-#ifdef CONFIG_X86_IBT
-#define IDT_ALIGN	16
-#else
-#define IDT_ALIGN	8
-#endif
+#define IDT_ALIGN	(8 * (1 + IS_ENABLED(CONFIG_X86_IBT)))
 
 #ifndef __ASSEMBLY__
 #include <linux/entry-common.h>
@@ -486,7 +482,7 @@ __visible noinstr void func(struct pt_regs *regs,			\
 
 /*
  * ASM code to emit the common vector entry stubs where each stub is
- * packed into 8 bytes.
+ * packed into IDT_ALIGN bytes.
  *
  * Note, that the 'pushq imm8' is emitted via '.byte 0x6a, vector' because
  * GCC treats the local vector variable as unsigned int and would expand
@@ -498,17 +494,16 @@ __visible noinstr void func(struct pt_regs *regs,			\
  * point is to mask off the bits above bit 7 because the push is sign
  * extending.
  */
-
 	.align IDT_ALIGN
 SYM_CODE_START(irq_entries_start)
     vector=FIRST_EXTERNAL_VECTOR
     .rept NR_EXTERNAL_VECTORS
-	UNWIND_HINT_IRET_REGS entry=1
-0 :
+	UNWIND_HINT_IRET_REGS
 	ENDBR
+0 :
 	.byte	0x6a, vector
 	jmp	asm_common_interrupt
-	/* Ensure that the above is 8 bytes max */
+	/* Ensure that the above is IDT_ALIGN bytes max */
 	.fill 0b + IDT_ALIGN - ., 1, 0x90
 	vector = vector+1
     .endr
@@ -519,12 +514,12 @@ SYM_CODE_END(irq_entries_start)
 SYM_CODE_START(spurious_entries_start)
     vector=FIRST_SYSTEM_VECTOR
     .rept NR_SYSTEM_VECTORS
-	UNWIND_HINT_IRET_REGS entry=1
+	UNWIND_HINT_IRET_REGS
 0 :
 	ENDBR
 	.byte	0x6a, vector
 	jmp	asm_spurious_interrupt
-	/* Ensure that the above is 8 bytes max */
+	/* Ensure that the above is IDT_ALIGN bytes max */
 	.fill 0b + IDT_ALIGN - ., 1, 0x90
 	vector = vector+1
     .endr
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 6a8a5bcbf14d..3a09647788bd 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -4,7 +4,6 @@
 
 #include <linux/const.h>
 #include <asm/alternative.h>
-#include <asm/ibt.h>
 
 /*
  * Constructor for a conventional segment GDT (or LDT) entry.
@@ -276,11 +275,7 @@ static inline void vdso_read_cpunode(unsigned *cpu, unsigned *node)
  * vector has no error code (two bytes), a 'push $vector_number' (two
  * bytes), and a jump to the common entry code (up to five bytes).
  */
-#ifdef CONFIG_X86_IBT
-#define EARLY_IDT_HANDLER_SIZE 13
-#else
-#define EARLY_IDT_HANDLER_SIZE 9
-#endif
+#define EARLY_IDT_HANDLER_SIZE (9 + 4*IS_ENABLED(CONFIG_X86_IBT))
 
 /*
  * xen_early_idt_handler_array is for Xen pv guests: for each entry in
diff --git a/arch/x86/include/asm/unwind_hints.h b/arch/x86/include/asm/unwind_hints.h
index d5b401c2f9e9..8b33674288ea 100644
--- a/arch/x86/include/asm/unwind_hints.h
+++ b/arch/x86/include/asm/unwind_hints.h
@@ -11,7 +11,7 @@
 	UNWIND_HINT sp_reg=ORC_REG_UNDEFINED type=UNWIND_HINT_TYPE_CALL end=1
 .endm
 
-.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0 entry=1
+.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0
 	.if \base == %rsp
 		.if \indirect
 			.set sp_reg, ORC_REG_SP_INDIRECT
@@ -33,17 +33,9 @@
 	.set sp_offset, \offset
 
 	.if \partial
-		.if \entry
-		.set type, UNWIND_HINT_TYPE_REGS_ENTRY
-		.else
-		.set type, UNWIND_HINT_TYPE_REGS_EXIT
-		.endif
+		.set type, UNWIND_HINT_TYPE_REGS_PARTIAL
 	.elseif \extra == 0
-		.if \entry
-		.set type, UNWIND_HINT_TYPE_REGS_ENTRY
-		.else
-		.set type, UNWIND_HINT_TYPE_REGS_EXIT
-		.endif
+		.set type, UNWIND_HINT_TYPE_REGS_PARTIAL
 		.set sp_offset, \offset + (16*8)
 	.else
 		.set type, UNWIND_HINT_TYPE_REGS
@@ -52,8 +44,8 @@
 	UNWIND_HINT sp_reg=sp_reg sp_offset=sp_offset type=type
 .endm
 
-.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0 entry=1
-	UNWIND_HINT_REGS base=\base offset=\offset partial=1 entry=\entry
+.macro UNWIND_HINT_IRET_REGS base=%rsp offset=0
+	UNWIND_HINT_REGS base=\base offset=\offset partial=1
 .endm
 
 .macro UNWIND_HINT_FUNC
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 92e759ae9030..816bc70c9e71 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -25,7 +25,6 @@
 #include <asm/export.h>
 #include <asm/nospec-branch.h>
 #include <asm/fixmap.h>
-#include <asm/ibt.h>
 
 /*
  * We are not able to switch in one step to the final KERNEL ADDRESS SPACE
@@ -332,8 +331,7 @@ SYM_CODE_END(start_cpu0)
  * when .init.text is freed.
  */
 SYM_CODE_START_NOALIGN(vc_boot_ghcb)
-	UNWIND_HINT_IRET_REGS offset=8 entry=1
-	ENDBR
+	UNWIND_HINT_IRET_REGS offset=8
 
 	/* Build pt_regs */
 	PUSH_AND_CLEAR_REGS
@@ -377,24 +375,25 @@ SYM_CODE_START(early_idt_handler_array)
 	i = 0
 	.rept NUM_EXCEPTION_VECTORS
 	.if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0
-		UNWIND_HINT_IRET_REGS entry=1
+		UNWIND_HINT_IRET_REGS
 		ENDBR
 		pushq $0	# Dummy error code, to make stack frame uniform
 	.else
-		UNWIND_HINT_IRET_REGS offset=8 entry=1
+		UNWIND_HINT_IRET_REGS offset=8
 		ENDBR
 	.endif
 	pushq $i		# 72(%rsp) Vector number
 	jmp early_idt_handler_common
-	UNWIND_HINT_IRET_REGS entry=0
+	UNWIND_HINT_IRET_REGS
 	i = i + 1
 	.fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
 	.endr
-	UNWIND_HINT_IRET_REGS offset=16 entry=0
 SYM_CODE_END(early_idt_handler_array)
 	ANNOTATE_NOENDBR // early_idt_handler_array[NUM_EXCEPTION_VECTORS]
 
 SYM_CODE_START_LOCAL(early_idt_handler_common)
+	UNWIND_HINT_IRET_REGS offset=16
+	ANNOTATE_NOENDBR
 	/*
 	 * The stack is the hardware frame, an error code or zero, and the
 	 * vector number.
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index fbf112c5485c..2de3c8c5eba9 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -566,8 +566,7 @@ bool unwind_next_frame(struct unwind_state *state)
 		state->signal = true;
 		break;
 
-	case UNWIND_HINT_TYPE_REGS_ENTRY:
-	case UNWIND_HINT_TYPE_REGS_EXIT:
+	case UNWIND_HINT_TYPE_REGS_PARTIAL:
 		if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) {
 			orc_warn_current("can't access iret registers at %pB\n",
 					 (void *)orig_ip);
diff --git a/include/linux/objtool.h b/include/linux/objtool.h
index 5281e02c2326..fd9d90ec0e48 100644
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -35,9 +35,8 @@ struct unwind_hint {
  */
 #define UNWIND_HINT_TYPE_CALL		0
 #define UNWIND_HINT_TYPE_REGS		1
-#define UNWIND_HINT_TYPE_REGS_ENTRY	2
-#define UNWIND_HINT_TYPE_REGS_EXIT	3
-#define UNWIND_HINT_TYPE_FUNC		4
+#define UNWIND_HINT_TYPE_REGS_PARTIAL	2
+#define UNWIND_HINT_TYPE_FUNC		3
 
 #ifdef CONFIG_STACK_VALIDATION
 
diff --git a/tools/include/linux/objtool.h b/tools/include/linux/objtool.h
index c48d45733071..aca52db2f3f3 100644
--- a/tools/include/linux/objtool.h
+++ b/tools/include/linux/objtool.h
@@ -35,9 +35,8 @@ struct unwind_hint {
  */
 #define UNWIND_HINT_TYPE_CALL		0
 #define UNWIND_HINT_TYPE_REGS		1
-#define UNWIND_HINT_TYPE_REGS_ENTRY	2
-#define UNWIND_HINT_TYPE_REGS_EXIT	3
-#define UNWIND_HINT_TYPE_FUNC		4
+#define UNWIND_HINT_TYPE_REGS_PARTIAL	2
+#define UNWIND_HINT_TYPE_FUNC		3
 
 #ifdef CONFIG_STACK_VALIDATION
 
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 414c8a1dd868..5db0f66ab8fe 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2488,8 +2488,7 @@ static int update_cfi_state(struct instruction *insn,
 	}
 
 	if (cfi->type == UNWIND_HINT_TYPE_REGS ||
-	    cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY ||
-	    cfi->type == UNWIND_HINT_TYPE_REGS_EXIT)
+	    cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL)
 		return update_cfi_state_regs(insn, cfi, op);
 
 	switch (op->dest.type) {
@@ -3254,9 +3253,13 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
 		if (insn->hint) {
 			state.cfi = *insn->cfi;
 			if (ibt) {
-				if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_ENTRY &&
-				    insn->type != INSN_ENDBR) {
-					WARN_FUNC("IRET_ENTRY hint without ENDBR", insn->sec, insn->offset);
+				struct symbol *sym;
+				if (insn->cfi->type == UNWIND_HINT_TYPE_REGS_PARTIAL &&
+				    (sym = find_symbol_by_offset(insn->sec, insn->offset)) &&
+				    insn->type != INSN_ENDBR && !insn->noendbr) {
+					WARN_FUNC("IRET_REGS hint without ENDBR: %s",
+						  insn->sec, insn->offset,
+						  sym->name);
 				}
 			}
 		} else {
diff --git a/tools/objtool/orc_dump.c b/tools/objtool/orc_dump.c
index 145cef3535c2..f5a8508c42d6 100644
--- a/tools/objtool/orc_dump.c
+++ b/tools/objtool/orc_dump.c
@@ -43,8 +43,7 @@ static const char *orc_type_name(unsigned int type)
 		return "call";
 	case UNWIND_HINT_TYPE_REGS:
 		return "regs";
-	case UNWIND_HINT_TYPE_REGS_ENTRY:
-	case UNWIND_HINT_TYPE_REGS_EXIT:
+	case UNWIND_HINT_TYPE_REGS_PARTIAL:
 		return "regs (partial)";
 	default:
 		return "?";

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
                     ` (2 preceding siblings ...)
  2022-02-19  1:21   ` Josh Poimboeuf
@ 2022-02-21  8:24   ` Kees Cook
  2022-02-22  4:38   ` Edgecombe, Rick P
  4 siblings, 0 replies; 94+ messages in thread
From: Kees Cook @ 2022-02-21  8:24 UTC (permalink / raw)
  To: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, samitolvanen, mark.rutland,
	alyssa.milburn



On February 18, 2022 8:49:16 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
>The bits required to make the hardware go.. Of note is that, provided
>the syscall entry points are covered with ENDBR, #CP doesn't need to
>be an IST because we'll never hit the syscall gap.
>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>---
> arch/x86/include/asm/cpufeatures.h          |    1 
> arch/x86/include/asm/idtentry.h             |    5 ++
> arch/x86/include/asm/msr-index.h            |   20 ++++++++
> arch/x86/include/asm/traps.h                |    2 
> arch/x86/include/uapi/asm/processor-flags.h |    2 
> arch/x86/kernel/cpu/common.c                |   23 +++++++++
> arch/x86/kernel/idt.c                       |    4 +
> arch/x86/kernel/traps.c                     |   65 ++++++++++++++++++++++++++++
> 8 files changed, 121 insertions(+), 1 deletion(-)
>
>--- a/arch/x86/include/asm/cpufeatures.h
>+++ b/arch/x86/include/asm/cpufeatures.h
>@@ -387,6 +387,7 @@
> #define X86_FEATURE_TSXLDTRK		(18*32+16) /* TSX Suspend Load Address Tracking */
> #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
> #define X86_FEATURE_ARCH_LBR		(18*32+19) /* Intel ARCH LBR */
>+#define X86_FEATURE_IBT			(18*32+20) /* Indirect Branch Tracking */
> #define X86_FEATURE_AMX_BF16		(18*32+22) /* AMX bf16 Support */
> #define X86_FEATURE_AVX512_FP16		(18*32+23) /* AVX512 FP16 */
> #define X86_FEATURE_AMX_TILE		(18*32+24) /* AMX tile Support */
>--- a/arch/x86/include/asm/idtentry.h
>+++ b/arch/x86/include/asm/idtentry.h
>@@ -622,6 +622,11 @@ DECLARE_IDTENTRY_DF(X86_TRAP_DF,	exc_dou
> DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF,	xenpv_exc_double_fault);
> #endif
> 
>+/* #CP */
>+#ifdef CONFIG_X86_IBT
>+DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP,	exc_control_protection);
>+#endif
>+
> /* #VC */
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> DECLARE_IDTENTRY_VC(X86_TRAP_VC,	exc_vmm_communication);
>--- a/arch/x86/include/asm/msr-index.h
>+++ b/arch/x86/include/asm/msr-index.h
>@@ -360,11 +360,29 @@
> #define MSR_ATOM_CORE_TURBO_RATIOS	0x0000066c
> #define MSR_ATOM_CORE_TURBO_VIDS	0x0000066d
> 
>-
> #define MSR_CORE_PERF_LIMIT_REASONS	0x00000690
> #define MSR_GFX_PERF_LIMIT_REASONS	0x000006B0
> #define MSR_RING_PERF_LIMIT_REASONS	0x000006B1
> 
>+/* Control-flow Enforcement Technology MSRs */
>+#define MSR_IA32_U_CET			0x000006a0 /* user mode cet */
>+#define MSR_IA32_S_CET			0x000006a2 /* kernel mode cet */
>+#define CET_SHSTK_EN			BIT_ULL(0)
>+#define CET_WRSS_EN			BIT_ULL(1)
>+#define CET_ENDBR_EN			BIT_ULL(2)
>+#define CET_LEG_IW_EN			BIT_ULL(3)
>+#define CET_NO_TRACK_EN			BIT_ULL(4)
>+#define CET_SUPPRESS_DISABLE		BIT_ULL(5)
>+#define CET_RESERVED			(BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))
>+#define CET_SUPPRESS			BIT_ULL(10)
>+#define CET_WAIT_ENDBR			BIT_ULL(11)
>+
>+#define MSR_IA32_PL0_SSP		0x000006a4 /* ring-0 shadow stack pointer */
>+#define MSR_IA32_PL1_SSP		0x000006a5 /* ring-1 shadow stack pointer */
>+#define MSR_IA32_PL2_SSP		0x000006a6 /* ring-2 shadow stack pointer */
>+#define MSR_IA32_PL3_SSP		0x000006a7 /* ring-3 shadow stack pointer */
>+#define MSR_IA32_INT_SSP_TAB		0x000006a8 /* exception shadow stack table */
>+
> /* Hardware P state interface */
> #define MSR_PPERF			0x0000064e
> #define MSR_PERF_LIMIT_REASONS		0x0000064f
>--- a/arch/x86/include/asm/traps.h
>+++ b/arch/x86/include/asm/traps.h
>@@ -18,6 +18,8 @@ void __init trap_init(void);
> asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
> #endif
> 
>+extern bool ibt_selftest(void);
>+
> #ifdef CONFIG_X86_F00F_BUG
> /* For handling the FOOF bug */
> void handle_invalid_op(struct pt_regs *regs);
>--- a/arch/x86/include/uapi/asm/processor-flags.h
>+++ b/arch/x86/include/uapi/asm/processor-flags.h
>@@ -130,6 +130,8 @@
> #define X86_CR4_SMAP		_BITUL(X86_CR4_SMAP_BIT)
> #define X86_CR4_PKE_BIT		22 /* enable Protection Keys support */
> #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
>+#define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement Technology */
>+#define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
> 
> /*
>  * x86-64 Task Priority Register, CR8
>--- a/arch/x86/kernel/cpu/common.c
>+++ b/arch/x86/kernel/cpu/common.c
>@@ -59,6 +59,7 @@
> #include <asm/cpu_device_id.h>
> #include <asm/uv/uv.h>
> #include <asm/sigframe.h>
>+#include <asm/traps.h>
> 
> #include "cpu.h"
> 
>@@ -592,6 +593,27 @@ static __init int setup_disable_pku(char
> __setup("nopku", setup_disable_pku);
> #endif /* CONFIG_X86_64 */
> 
>+static __always_inline void setup_cet(struct cpuinfo_x86 *c)
>+{
>+	u64 msr;
>+
>+	if (!IS_ENABLED(CONFIG_X86_IBT) ||
>+	    !cpu_feature_enabled(X86_FEATURE_IBT))
>+		return;
>+
>+	cr4_set_bits(X86_CR4_CET);

Please add X86_CR4_CET to cr4_pinned_mask too.
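
(Sketch, assuming the mask currently contains SMEP/SMAP/UMIP/FSGSBASE:)

	static const unsigned long cr4_pinned_mask =
		X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
		X86_CR4_FSGSBASE | X86_CR4_CET;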

>+
>+	rdmsrl(MSR_IA32_S_CET, msr);
>+	if (cpu_feature_enabled(X86_FEATURE_IBT))
>+		msr |= CET_ENDBR_EN;
>+	wrmsrl(MSR_IA32_S_CET, msr);
>+
>+	if (!ibt_selftest()) {
>+		pr_err("IBT selftest: Failed!\n");
>+		setup_clear_cpu_cap(X86_FEATURE_IBT);
>+	}
>+}
>+
> /*
>  * Some CPU features depend on higher CPUID levels, which may not always
>  * be available due to CPUID level capping or broken virtualization
>@@ -1709,6 +1731,7 @@ static void identify_cpu(struct cpuinfo_
> 
> 	x86_init_rdrand(c);
> 	setup_pku(c);
>+	setup_cet(c);
> 
> 	/*
> 	 * Clear/Set all flags overridden by options, need do it
>--- a/arch/x86/kernel/idt.c
>+++ b/arch/x86/kernel/idt.c
>@@ -104,6 +104,10 @@ static const __initconst struct idt_data
> 	ISTG(X86_TRAP_MC,		asm_exc_machine_check, IST_INDEX_MCE),
> #endif
> 
>+#ifdef CONFIG_X86_IBT
>+	INTG(X86_TRAP_CP,		asm_exc_control_protection),
>+#endif
>+
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> 	ISTG(X86_TRAP_VC,		asm_exc_vmm_communication, IST_INDEX_VC),
> #endif
>--- a/arch/x86/kernel/traps.c
>+++ b/arch/x86/kernel/traps.c
>@@ -210,6 +210,71 @@ DEFINE_IDTENTRY(exc_overflow)
> 	do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL);
> }
> 
>+#ifdef CONFIG_X86_IBT
>+
>+static bool ibt_fatal = true;

__ro_after_init please. :)
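
I.e. (sketch):

	static bool ibt_fatal __ro_after_init = true;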

>+
>+extern unsigned long ibt_selftest_ip; /* defined in asm beow */
>+static volatile bool ibt_selftest_ok = false;
>+
>+DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
>+{
>+	if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
>+		pr_err("Whaaa?!?!\n");
>+		return;

Seems like this case should fail closed and not return?
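
E.g. (sketch):

	if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
		pr_err("Unexpected #CP\n");
		BUG();	/* fail closed instead of returning */
	}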

>+	}
>+
>+	if (WARN_ON_ONCE(user_mode(regs) || error_code != 3))
>+		return;
>+
>+	if (unlikely(regs->ip == ibt_selftest_ip)) {
>+		ibt_selftest_ok = true;
>+		return;
>+	}
>+
>+	pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
>+	BUG_ON(ibt_fatal);
>+}
>+
>+bool ibt_selftest(void)
>+{
>+	ibt_selftest_ok = false;
>+
>+	asm (ANNOTATE_NOENDBR
>+	     "1: lea 2f(%%rip), %%rax\n\t"
>+	     ANNOTATE_RETPOLINE_SAFE
>+	     "   jmp *%%rax\n\t"
>+	     "2: nop\n\t"
>+
>+	     /* unsigned ibt_selftest_ip = 2b */
>+	     ".pushsection .data,\"aw\"\n\t"
>+	     ".align 8\n\t"
>+	     ".type ibt_selftest_ip, @object\n\t"
>+	     ".size ibt_selftest_ip, 8\n\t"
>+	     "ibt_selftest_ip:\n\t"
>+	     ".quad 2b\n\t"
>+	     ".popsection\n\t"
>+
>+	     : : : "rax", "memory");
>+
>+	return ibt_selftest_ok;
>+}
>+
>+static int __init ibt_setup(char *str)
>+{
>+	if (!strcmp(str, "off"))
>+		setup_clear_cpu_cap(X86_FEATURE_IBT);
>+
>+	if (!strcmp(str, "warn"))
>+		ibt_fatal = false;
>+
>+	return 1;
>+}
>+
>+__setup("ibt=", ibt_setup);
>+
>+#endif /* CONFIG_X86_IBT */
>+
> #ifdef CONFIG_X86_F00F_BUG
> void handle_invalid_op(struct pt_regs *regs)
> #else
>
>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 15/29] x86: Disable IBT around firmware
  2022-02-18 16:49 ` [PATCH 15/29] x86: Disable IBT around firmware Peter Zijlstra
@ 2022-02-21  8:27   ` Kees Cook
  2022-02-21 10:06     ` Peter Zijlstra
  0 siblings, 1 reply; 94+ messages in thread
From: Kees Cook @ 2022-02-21  8:27 UTC (permalink / raw)
  To: Peter Zijlstra, x86, joao, hjl.tools, jpoimboe, andrew.cooper3
  Cc: linux-kernel, peterz, ndesaulniers, samitolvanen, mark.rutland,
	alyssa.milburn



On February 18, 2022 8:49:17 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
>Assume firmware isn't IBT clean and disable it across calls.
>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>---
> arch/x86/include/asm/efi.h   |    9 +++++++--
> arch/x86/include/asm/ibt.h   |   10 ++++++++++
> arch/x86/kernel/apm_32.c     |    7 +++++++
> arch/x86/kernel/cpu/common.c |   28 ++++++++++++++++++++++++++++
> 4 files changed, 52 insertions(+), 2 deletions(-)
>
>--- a/arch/x86/include/asm/efi.h
>+++ b/arch/x86/include/asm/efi.h
>@@ -7,6 +7,7 @@
> #include <asm/tlb.h>
> #include <asm/nospec-branch.h>
> #include <asm/mmu_context.h>
>+#include <asm/ibt.h>
> #include <linux/build_bug.h>
> #include <linux/kernel.h>
> #include <linux/pgtable.h>
>@@ -120,8 +121,12 @@ extern asmlinkage u64 __efi_call(void *f
> 	efi_enter_mm();							\
> })
> 
>-#define arch_efi_call_virt(p, f, args...)				\
>-	efi_call((void *)p->f, args)					\
>+#define arch_efi_call_virt(p, f, args...) ({				\
>+	u64 ret, ibt = ibt_save();					\
>+	ret = efi_call((void *)p->f, args);				\
>+	ibt_restore(ibt);						\
>+	ret;								\
>+})
> 
> #define arch_efi_call_virt_teardown()					\
> ({									\
>--- a/arch/x86/include/asm/ibt.h
>+++ b/arch/x86/include/asm/ibt.h
>@@ -6,6 +6,8 @@
> 
> #ifndef __ASSEMBLY__
> 
>+#include <linux/types.h>
>+
> #ifdef CONFIG_X86_64
> #define ASM_ENDBR	"endbr64\n\t"
> #else
>@@ -25,6 +27,9 @@ static inline bool is_endbr(const void *
> 	return val == ~0xfa1e0ff3;
> }
> 
>+extern u64 ibt_save(void);
>+extern void ibt_restore(u64 save);
>+
> #else /* __ASSEMBLY__ */
> 
> #ifdef CONFIG_X86_64
>@@ -39,10 +44,15 @@ static inline bool is_endbr(const void *
> 
> #ifndef __ASSEMBLY__
> 
>+#include <linux/types.h>
>+
> #define ASM_ENDBR
> 
> #define __noendbr
> 
>+static inline u64 ibt_save(void) { return 0; }
>+static inline void ibt_restore(u64 save) { }
>+
> #else /* __ASSEMBLY__ */
> 
> #define ENDBR
>--- a/arch/x86/kernel/apm_32.c
>+++ b/arch/x86/kernel/apm_32.c
>@@ -232,6 +232,7 @@
> #include <asm/paravirt.h>
> #include <asm/reboot.h>
> #include <asm/nospec-branch.h>
>+#include <asm/ibt.h>
> 
> #if defined(CONFIG_APM_DISPLAY_BLANK) && defined(CONFIG_VT)
> extern int (*console_blank_hook)(int);
>@@ -598,6 +599,7 @@ static long __apm_bios_call(void *_call)
> 	struct desc_struct	save_desc_40;
> 	struct desc_struct	*gdt;
> 	struct apm_bios_call	*call = _call;
>+	u64			ibt;
> 
> 	cpu = get_cpu();
> 	BUG_ON(cpu != 0);
>@@ -607,11 +609,13 @@ static long __apm_bios_call(void *_call)
> 
> 	apm_irq_save(flags);
> 	firmware_restrict_branch_speculation_start();
>+	ibt = ibt_save();
> 	APM_DO_SAVE_SEGS;
> 	apm_bios_call_asm(call->func, call->ebx, call->ecx,
> 			  &call->eax, &call->ebx, &call->ecx, &call->edx,
> 			  &call->esi);
> 	APM_DO_RESTORE_SEGS;
>+	ibt_restore(ibt);
> 	firmware_restrict_branch_speculation_end();
> 	apm_irq_restore(flags);
> 	gdt[0x40 / 8] = save_desc_40;
>@@ -676,6 +680,7 @@ static long __apm_bios_call_simple(void
> 	struct desc_struct	save_desc_40;
> 	struct desc_struct	*gdt;
> 	struct apm_bios_call	*call = _call;
>+	u64			ibt;
> 
> 	cpu = get_cpu();
> 	BUG_ON(cpu != 0);
>@@ -685,10 +690,12 @@ static long __apm_bios_call_simple(void
> 
> 	apm_irq_save(flags);
> 	firmware_restrict_branch_speculation_start();
>+	ibt = ibt_save();
> 	APM_DO_SAVE_SEGS;
> 	error = apm_bios_call_simple_asm(call->func, call->ebx, call->ecx,
> 					 &call->eax);
> 	APM_DO_RESTORE_SEGS;
>+	ibt_restore(ibt);
> 	firmware_restrict_branch_speculation_end();
> 	apm_irq_restore(flags);
> 	gdt[0x40 / 8] = save_desc_40;
>--- a/arch/x86/kernel/cpu/common.c
>+++ b/arch/x86/kernel/cpu/common.c
>@@ -592,6 +592,34 @@ static __init int setup_disable_pku(char
> __setup("nopku", setup_disable_pku);
> #endif /* CONFIG_X86_64 */
> 
>+#ifdef CONFIG_X86_IBT
>+
>+u64 ibt_save(void)
>+{
>+	u64 msr = 0;
>+
>+	if (cpu_feature_enabled(X86_FEATURE_IBT)) {
>+		rdmsrl(MSR_IA32_S_CET, msr);
>+		wrmsrl(MSR_IA32_S_CET, msr & ~CET_ENDBR_EN);
>+	}
>+
>+	return msr;
>+}
>+
>+void ibt_restore(u64 save)

Please make these both __always_inline so there's no risk of them ever gaining ENDBRs and being used by ROP to disable IBT...

>+{
>+	u64 msr;
>+
>+	if (cpu_feature_enabled(X86_FEATURE_IBT)) {
>+		rdmsrl(MSR_IA32_S_CET, msr);
>+		msr &= ~CET_ENDBR_EN;
>+		msr |= (save & CET_ENDBR_EN);
>+		wrmsrl(MSR_IA32_S_CET, msr);
>+	}
>+}
>+
>+#endif
>+
> static __always_inline void setup_cet(struct cpuinfo_x86 *c)
> {
> 	u64 msr;
>
>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 00/29] x86: Kernel IBT
  2022-02-19  9:58   ` Peter Zijlstra
  2022-02-19 16:00     ` Andrew Cooper
@ 2022-02-21  8:42     ` Kees Cook
  2022-02-21  9:24       ` Peter Zijlstra
  1 sibling, 1 reply; 94+ messages in thread
From: Kees Cook @ 2022-02-21  8:42 UTC (permalink / raw)
  To: Peter Zijlstra, Edgecombe, Rick P
  Cc: Poimboe, Josh, hjl.tools, x86, joao, Cooper, Andrew,
	linux-kernel, mark.rutland, samitolvanen, ndesaulniers, Milburn,
	Alyssa



On February 19, 2022 1:58:27 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
>On Sat, Feb 19, 2022 at 01:29:45AM +0000, Edgecombe, Rick P wrote:
>> On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
>> > This is an (almost!) complete Kernel IBT implementation. It's been
>> > self-hosting
>> > for a few days now. That is, it runs on IBT enabled hardware
>> > (Tigerlake) and is
>> > capable of building the next kernel.
>> > 
>> > It is also almost clean on allmodconfig using GCC-11.2.
>> > 
>> > The biggest TODO item at this point is Clang, I've not yet looked at
>> > that.
>> 
>> Do you need to turn this off before kexec?
>
>Probably... :-) I've never looked at that code though; so I'm not
>exactly sure where to put things.
>
>I'm assuming kexec does a hot-unplug of all but the boot-cpu which then
>leaves only a single CPU with state in machine_kexec() ? Does the below
>look reasonable?
>
>--- a/arch/x86/kernel/cpu/common.c
>+++ b/arch/x86/kernel/cpu/common.c
>@@ -638,6 +638,12 @@ static __always_inline void setup_cet(st
> 	}
> }
> 
>+void cet_disable(void)
>+{
>+	cr4_clear_bits(X86_CR4_CET);

I'd rather keep the pinning...

>+	wrmsrl(MSR_IA32_S_CET, 0);
>+}

Eh, why not just require kexec to be IBT safe? That seems a reasonable exercise if we ever expect UEFI to enforce IBT when starting the kernel on a normal boot...

-Kees

>+
> /*
>  * Some CPU features depend on higher CPUID levels, which may not always
>  * be available due to CPUID level capping or broken virtualization
>diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
>index 33d41e350c79..cf26356db53e 100644
>--- a/arch/x86/include/asm/cpu.h
>+++ b/arch/x86/include/asm/cpu.h
>@@ -72,4 +72,7 @@ void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
> #else
> static inline void init_ia32_feat_ctl(struct cpuinfo_x86 *c) {}
> #endif
>+
>+extern void cet_disable(void);
>+
> #endif /* _ASM_X86_CPU_H */
>diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>index f5da4a18070a..29a2a1732605 100644
>--- a/arch/x86/kernel/machine_kexec_64.c
>+++ b/arch/x86/kernel/machine_kexec_64.c
>@@ -310,6 +310,7 @@ void machine_kexec(struct kimage *image)
> 	/* Interrupts aren't acceptable while we reboot */
> 	local_irq_disable();
> 	hw_breakpoint_disable();
>+	cet_disable();
> 
> 	if (image->preserve_context) {
> #ifdef CONFIG_X86_IO_APIC

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 00/29] x86: Kernel IBT
  2022-02-21  8:42     ` Kees Cook
@ 2022-02-21  9:24       ` Peter Zijlstra
  0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-21  9:24 UTC (permalink / raw)
  To: Kees Cook
  Cc: Edgecombe, Rick P, Poimboe, Josh, hjl.tools, x86, joao, Cooper,
	Andrew, linux-kernel, mark.rutland, samitolvanen, ndesaulniers,
	Milburn, Alyssa

On Mon, Feb 21, 2022 at 12:42:25AM -0800, Kees Cook wrote:

> >+void cet_disable(void)
> >+{
> >+	cr4_clear_bits(X86_CR4_CET);
> 
> I'd rather keep the pinning...

Uff. Is that still enforced at this point?

> >+	wrmsrl(MSR_IA32_S_CET, 0);
> >+}
> 
> Eh, why not just require kexec to be IBT safe? That seems a reasonable
> exercise if we ever expect UEFI to enforce IBT when starting the
> kernel on a normal boot...

Well, it makes it impossible to kexec into an 'old' kernel. That might
not be very nice.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 15/29] x86: Disable IBT around firmware
  2022-02-21  8:27   ` Kees Cook
@ 2022-02-21 10:06     ` Peter Zijlstra
  2022-02-21 13:22       ` Peter Zijlstra
  2022-02-21 15:54       ` Kees Cook
  0 siblings, 2 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-21 10:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn


Could you trim replies so that I can actually find what you write?

On Mon, Feb 21, 2022 at 12:27:20AM -0800, Kees Cook wrote:

> >+#ifdef CONFIG_X86_IBT
> >+
> >+u64 ibt_save(void)
> >+{
> >+	u64 msr = 0;
> >+
> >+	if (cpu_feature_enabled(X86_FEATURE_IBT)) {
> >+		rdmsrl(MSR_IA32_S_CET, msr);
> >+		wrmsrl(MSR_IA32_S_CET, msr & ~CET_ENDBR_EN);
> >+	}
> >+
> >+	return msr;
> >+}
> >+
> >+void ibt_restore(u64 save)
> 
> Please make these both __always_inline so there no risk of them ever gaining ENDBRs and being used by ROP to disable IBT...

Either that or mark them __noendbr. The below seems to work.

Do we have a preference?


--- a/arch/x86/include/asm/ibt.h
+++ b/arch/x86/include/asm/ibt.h
@@ -48,8 +48,8 @@ static inline bool is_endbr(const void *
 	return val == gen_endbr64();
 }
 
-extern u64 ibt_save(void);
-extern void ibt_restore(u64 save);
+extern __noendbr u64 ibt_save(void);
+extern __noendbr void ibt_restore(u64 save);
 
 #else /* __ASSEMBLY__ */
 
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -596,7 +596,7 @@ __setup("nopku", setup_disable_pku);
 
 #ifdef CONFIG_X86_IBT
 
-u64 ibt_save(void)
+__noendbr u64 ibt_save(void)
 {
 	u64 msr = 0;
 
@@ -608,7 +608,7 @@ u64 ibt_save(void)
 	return msr;
 }
 
-void ibt_restore(u64 save)
+__noendbr void ibt_restore(u64 save)
 {
 	u64 msr;
 

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 15/29] x86: Disable IBT around firmware
  2022-02-21 10:06     ` Peter Zijlstra
@ 2022-02-21 13:22       ` Peter Zijlstra
  2022-02-21 15:54       ` Kees Cook
  1 sibling, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-21 13:22 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn

On Mon, Feb 21, 2022 at 11:06:15AM +0100, Peter Zijlstra wrote:
> 
> Could you trim replies so that I can actually find what you write?
> 
> On Mon, Feb 21, 2022 at 12:27:20AM -0800, Kees Cook wrote:
> 
> > >+#ifdef CONFIG_X86_IBT
> > >+
> > >+u64 ibt_save(void)
> > >+{
> > >+	u64 msr = 0;
> > >+
> > >+	if (cpu_feature_enabled(X86_FEATURE_IBT)) {
> > >+		rdmsrl(MSR_IA32_S_CET, msr);
> > >+		wrmsrl(MSR_IA32_S_CET, msr & ~CET_ENDBR_EN);
> > >+	}
> > >+
> > >+	return msr;
> > >+}
> > >+
> > >+void ibt_restore(u64 save)
> > 
> > Please make these both __always_inline so there's no risk of them ever gaining ENDBRs and being used by ROP to disable IBT...
> 
> Either that or mark them __noendbr. The below seems to work.
> 
> Do we have a preference?

The inline thing runs into header hell...

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 19/29] x86/ibt,xen: Annotate away warnings
  2022-02-18 23:07       ` Andrew Cooper
@ 2022-02-21 14:20         ` Peter Zijlstra
  0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-21 14:20 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: x86, joao, hjl.tools, jpoimboe, Juergen Gross, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Andy Lutomirski

On Fri, Feb 18, 2022 at 11:07:15PM +0000, Andrew Cooper wrote:
> or so, but my point is that the early Xen code, if it can identify this
> patch point separate to the list of everything, can easily arrange for
> it to be modified before HYPERCALL_set_trap_table (Xen PV's LIDT), and
> then return_to_kernel is in its fully configured state (paravirt or
> otherwise) before interrupts/exceptions can be taken.

I ended up with the below... still a bit of a hack, and I wonder if the
asm version you did isn't saner...

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -619,8 +619,8 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_
 
 	/* Restore RDI. */
 	popq	%rdi
-	SWAPGS
-	INTERRUPT_RETURN
+	swapgs
+	jmp	.Lnative_iret
 
 
 SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
@@ -637,11 +637,16 @@ SYM_INNER_LABEL(restore_regs_and_return_
 	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
 	 * when returning from IPI handler.
 	 */
-	INTERRUPT_RETURN
+#ifdef CONFIG_XEN_PV
+SYM_INNER_LABEL(early_xen_iret_patch, SYM_L_GLOBAL)
+	ANNOTATE_NOENDBR
+	.byte 0xe9
+	.long .Lnative_iret - (. + 4)
+#endif
 
-SYM_INNER_LABEL_ALIGN(native_iret, SYM_L_GLOBAL)
+.Lnative_iret:
 	UNWIND_HINT_IRET_REGS
-	ENDBR // paravirt_iret
+	ANNOTATE_NOENDBR
 	/*
 	 * Are we returning to a stack segment from the LDT?  Note: in
 	 * 64-bit mode SS:RSP on the exception stack is always valid.
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -141,13 +141,8 @@ static __always_inline void arch_local_i
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_XEN_PV
 #define SWAPGS	ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV
-#define INTERRUPT_RETURN						\
-	ANNOTATE_RETPOLINE_SAFE;					\
-	ALTERNATIVE_TERNARY("jmp *paravirt_iret(%rip);",		\
-		X86_FEATURE_XENPV, "jmp xen_iret;", "jmp native_iret;")
 #else
 #define SWAPGS	swapgs
-#define INTERRUPT_RETURN	jmp native_iret
 #endif
 #endif
 #endif /* !__ASSEMBLY__ */
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -272,7 +272,6 @@ struct paravirt_patch_template {
 
 extern struct pv_info pv_info;
 extern struct paravirt_patch_template pv_ops;
-extern void (*paravirt_iret)(void);
 
 #define PARAVIRT_PATCH(x)					\
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -350,7 +350,6 @@ SYM_CODE_START_NOALIGN(vc_boot_ghcb)
 	/* Remove Error Code */
 	addq    $8, %rsp
 
-	/* Pure iret required here - don't use INTERRUPT_RETURN */
 	iretq
 SYM_CODE_END(vc_boot_ghcb)
 #endif
@@ -435,6 +434,8 @@ SYM_CODE_END(early_idt_handler_common)
  * early_idt_handler_array can't be used because it returns via the
  * paravirtualized INTERRUPT_RETURN and pv-ops don't work that early.
  *
+ * XXX it does, fix this.
+ *
  * This handler will end up in the .init.text section and not be
  * available to boot secondary CPUs.
  */
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -151,8 +151,6 @@ void paravirt_set_sched_clock(u64 (*func
 }
 
 /* These are in entry.S */
-extern void native_iret(void);
-
 static struct resource reserve_ioports = {
 	.start = 0,
 	.end = IO_SPACE_LIMIT,
@@ -416,8 +414,6 @@ struct paravirt_patch_template pv_ops =
 
 #ifdef CONFIG_PARAVIRT_XXL
 NOKPROBE_SYMBOL(native_load_idt);
-
-void (*paravirt_iret)(void) = native_iret;
 #endif
 
 EXPORT_SYMBOL(pv_ops);
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1178,9 +1178,13 @@ static void __init xen_domu_set_legacy_f
 	x86_platform.legacy.rtc = 0;
 }
 
+extern void early_xen_iret_patch(void);
+
 /* First C function to be called on Xen boot */
 asmlinkage __visible void __init xen_start_kernel(void)
 {
+	void *early_xen_iret = &early_xen_iret_patch;
+	void *xen_iret_dest = &xen_iret;
 	struct physdev_set_iopl set_iopl;
 	unsigned long initrd_start = 0;
 	int rc;
@@ -1188,6 +1192,13 @@ asmlinkage __visible void __init xen_sta
 	if (!xen_start_info)
 		return;
 
+	OPTIMIZER_HIDE_VAR(early_xen_iret);
+	OPTIMIZER_HIDE_VAR(xen_iret_dest);
+
+	memcpy(early_xen_iret,
+	       text_gen_insn(JMP32_INSN_OPCODE, early_xen_iret, xen_iret_dest),
+	       JMP32_INSN_SIZE);
+
 	xen_domain_type = XEN_PV_DOMAIN;
 	xen_start_flags = xen_start_info->flags;
 
@@ -1196,7 +1207,6 @@ asmlinkage __visible void __init xen_sta
 	/* Install Xen paravirt ops */
 	pv_info = xen_info;
 	pv_ops.cpu = xen_cpu_ops.cpu;
-	paravirt_iret = xen_iret;
 	xen_init_irq_ops();
 
 	/*
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -193,7 +193,7 @@ hypercall_iret = hypercall_page + __HYPE
  */
 SYM_CODE_START(xen_iret)
 	UNWIND_HINT_EMPTY
-	ENDBR
+	ANNOTATE_NOENDBR
 	pushq $0
 	jmp hypercall_iret
 SYM_CODE_END(xen_iret)

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 15/29] x86: Disable IBT around firmware
  2022-02-21 10:06     ` Peter Zijlstra
  2022-02-21 13:22       ` Peter Zijlstra
@ 2022-02-21 15:54       ` Kees Cook
  2022-02-21 16:10         ` Peter Zijlstra
  1 sibling, 1 reply; 94+ messages in thread
From: Kees Cook @ 2022-02-21 15:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn



On February 21, 2022 2:06:15 AM PST, Peter Zijlstra <peterz@infradead.org> wrote:
>
>Could you trim replies so that I can actually find what you write?

Sorry, yes; I was on my phone where the interface is awkward.

>On Mon, Feb 21, 2022 at 12:27:20AM -0800, Kees Cook wrote:
>> Please make these both __always_inline so there's no risk of them ever gaining ENDBRs and being used by ROP to disable IBT...
>
>Either that or mark them __noendbr. The below seems to work.
>
>Do we have a preference?

Ah yeah, that works for me.

A small bike shed: should __noendbr have an alias, like __never_indirect or something, so there is an arch-agnostic way to do this that actually says what it does? (yes, it's in x86-only code now, hence the bike shed...)
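
E.g. (sketch, name illustrative):

	#define __never_indirect __noendbr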

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH 15/29] x86: Disable IBT around firmware
  2022-02-21 15:54       ` Kees Cook
@ 2022-02-21 16:10         ` Peter Zijlstra
  0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-21 16:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: x86, joao, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, samitolvanen, mark.rutland, alyssa.milburn

On Mon, Feb 21, 2022 at 07:54:55AM -0800, Kees Cook wrote:
> A small bike shed: should __noendbr have an alias, like
> __never_indirect or something, so there is an arch-agnostic way to do
> this that actually says what it does? (yes, it's in x86-only code now,
> hence the bike shed...)

I actually asked Mark a related question last week somewhere; I think
the answer was that the annotation either wasn't working or wasn't as
useful on ARM64.

I'm thinking it's easy enough to do a mass rename if/when we cross that
bridge though.


* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
                     ` (3 preceding siblings ...)
  2022-02-21  8:24   ` Kees Cook
@ 2022-02-22  4:38   ` Edgecombe, Rick P
  2022-02-22  9:32     ` Peter Zijlstra
  4 siblings, 1 reply; 94+ messages in thread
From: Edgecombe, Rick P @ 2022-02-22  4:38 UTC (permalink / raw)
  To: Poimboe, Josh, peterz, hjl.tools, x86, joao, Cooper, Andrew
  Cc: keescook, linux-kernel, mark.rutland, samitolvanen, ndesaulniers,
	Milburn, Alyssa

On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> +       cr4_set_bits(X86_CR4_CET);
> +
> +       rdmsrl(MSR_IA32_S_CET, msr);
> +       if (cpu_feature_enabled(X86_FEATURE_IBT))
> +               msr |= CET_ENDBR_EN;
> +       wrmsrl(MSR_IA32_S_CET, msr);

So I guess implicit in all of this is that MSR_IA32_S_CET will not be
managed by xsaves (makes sense).

But it still might be good to add the supervisor cet xfeature number to
XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, with analogous reasoning to
XFEATURE_MASK_PT.
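
Roughly, following the XFEATURE_MASK_PT precedent, something like this
(a sketch; the CET mask name is assumed here):

	#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED	(XFEATURE_MASK_PT | \
							 XFEATURE_MASK_CET_KERNEL)

so that, as with PT, it gets flagged if the hardware ever unexpectedly
offers that state through xsaves.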



* Re: [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling
  2022-02-22  4:38   ` Edgecombe, Rick P
@ 2022-02-22  9:32     ` Peter Zijlstra
  0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-22  9:32 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Poimboe, Josh, hjl.tools, x86, joao, Cooper, Andrew, keescook,
	linux-kernel, mark.rutland, samitolvanen, ndesaulniers, Milburn,
	Alyssa

On Tue, Feb 22, 2022 at 04:38:22AM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> > +       cr4_set_bits(X86_CR4_CET);
> > +
> > +       rdmsrl(MSR_IA32_S_CET, msr);
> > +       if (cpu_feature_enabled(X86_FEATURE_IBT))
> > +               msr |= CET_ENDBR_EN;
> > +       wrmsrl(MSR_IA32_S_CET, msr);
> 
> So I guess implicit in all of this is that MSR_IA32_S_CET will not be
> managed by xsaves (makes sense).
> 
> But it still might be good to add the supervisor cet xfeature number to
> XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, with analogous reasoning to
> XFEATURE_MASK_PT.

Yeah, no, I'm not touching that.


* Re: [PATCH 16/29] x86/bugs: Disable Retpoline when IBT
  2022-02-19  2:15   ` Josh Poimboeuf
@ 2022-02-22 15:00     ` Peter Zijlstra
  2022-02-25  0:19       ` Josh Poimboeuf
  0 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-22 15:00 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 06:15:30PM -0800, Josh Poimboeuf wrote:

> This code is confusing, not helped by the fact that the existing code
> already looks like spaghetti.

I'd say that's an insult to spaghetti.

> Assuming IBT systems also have eIBRS (right?), I don't think the above
> SPECTRE_V2_CMD_{FORCE,AUTO} cases would be possible.

Virt FTW.. if I don't handle it, some idiot will create a virtual
machine that doesn't expose eIBRS but does do IBT just to spite me.

> AFAICT, if execution reached the retpoline_generic label, the user
> specified either RETPOLINE or RETPOLINE_GENERIC.

Only RETPOLINE_GENERIC;

> I'm not sure it makes sense to put RETPOLINE in the "silent" list.  If
> the user boots an Intel system with spectre_v2=retpoline on the cmdline,
> they're probably expecting a traditional retpoline and should be warned
> if that changes, especially if it's a "demotion".

too friggin bad as to expectations; retpoline == auto. Not saying that
makes sense, just saying that's what it does.

> In that case the switch statement isn't even needed.  It can instead
> just unconditionally print the warning.
> 
> 
> Also, why "demote" retpoline to LFENCE rather than attempting to
> "promote" it to eIBRS?  Maybe there's a good reason but it probably at
> least deserves some mention in the commit log.

The current code will never select retpoline if eibrs is available.


The alternative is doing this in apply_retpolines(), but that might be
even more nasty.


* Re: [PATCH 00/29] x86: Kernel IBT
  2022-02-19  1:29 ` [PATCH 00/29] x86: Kernel IBT Edgecombe, Rick P
  2022-02-19  9:58   ` Peter Zijlstra
@ 2022-02-23  7:26   ` Kees Cook
  2022-02-24 16:47     ` Mike Rapoport
  1 sibling, 1 reply; 94+ messages in thread
From: Kees Cook @ 2022-02-23  7:26 UTC (permalink / raw)
  To: Edgecombe, Rick P, Poimboe, Josh, peterz, hjl.tools, x86, joao,
	Cooper, Andrew
  Cc: linux-kernel, mark.rutland, samitolvanen, ndesaulniers, Milburn, Alyssa


On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> This is an (almost!) complete Kernel IBT implementation. 

BTW, I've successfully tested this on what /proc/cpuinfo calls an "11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz" (in a Lenovo "Yoga 7 15ITL5"). Normal laptop-y things all seem happy and it correctly blows up on a new LKDTM test I'll send out tomorrow.

So, even though the series is young and has some TODOs still:

Tested-by: Kees Cook <keescook@chromium.org>

One thought: should there be a note in dmesg about it being active? The only way to see it is finding "ibt" in cpuinfo...
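
Something as simple as this would do (a sketch):

	if (cpu_feature_enabled(X86_FEATURE_IBT))
		pr_info("CET: Indirect Branch Tracking enabled\n");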

-Kees

-- 
Kees Cook


* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
  2022-02-18 21:08   ` Josh Poimboeuf
@ 2022-02-23 10:09     ` Peter Zijlstra
  2022-02-23 10:21       ` Miroslav Benes
  2022-02-23 10:57       ` Peter Zijlstra
  0 siblings, 2 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-23 10:09 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn,
	Miroslav Benes, Steven Rostedt

On Fri, Feb 18, 2022 at 01:08:31PM -0800, Josh Poimboeuf wrote:
> On Fri, Feb 18, 2022 at 05:49:06PM +0100, Peter Zijlstra wrote:
> > Currently livepatch assumes __fentry__ lives at func+0, which is most
> > likely untrue with IBT on. Override the weak klp_get_ftrace_location()
> > function with an arch specific version that's IBT aware.
> > 
> > Also make the weak fallback verify the location is an actual ftrace
> > location as a sanity check.
> > 
> > Suggested-by: Miroslav Benes <mbenes@suse.cz>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/include/asm/livepatch.h |    9 +++++++++
> >  kernel/livepatch/patch.c         |    2 +-
> >  2 files changed, 10 insertions(+), 1 deletion(-)
> > 
> > --- a/arch/x86/include/asm/livepatch.h
> > +++ b/arch/x86/include/asm/livepatch.h
> > @@ -17,4 +17,13 @@ static inline void klp_arch_set_pc(struc
> >  	ftrace_instruction_pointer_set(fregs, ip);
> >  }
> >  
> > +#define klp_get_ftrace_location klp_get_ftrace_location
> > +static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
> > +{
> > +	unsigned long addr = ftrace_location(faddr);
> > +	if (!addr && IS_ENABLED(CONFIG_X86_IBT))
> > +		addr = ftrace_location(faddr + 4);
> > +	return addr;
> 
> I'm kind of surprised this logic doesn't exist in ftrace itself.  Is
> livepatch really the only user that needs to find the fentry for a given
> function?
> 
> I had to do a double take for the ftrace_location() semantics, as I
> originally assumed that's what it did, based on its name and signature.
> 
> Instead it apparently functions like a bool but returns its argument on
> success.
> 
> Though the function comment tells a different story:
> 
> /**
>  * ftrace_location - return true if the ip giving is a traced location
> 
> So it's all kinds of confusing...

Yes.. so yesterday, when making function-graph tracing not explode, I
ran into a similar issue. Steve suggested something along the lines of
.... this.

(modified from his actual suggestion to also cover this case)

Let me go try this...

--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
  */
 unsigned long ftrace_location(unsigned long ip)
 {
-	return ftrace_location_range(ip, ip);
+	struct dyn_ftrace *rec;
+	unsigned long offset;
+	unsigned long size;
+
+	rec = lookup_rec(ip, ip);
+	if (!rec) {
+		if (!kallsyms_lookup(ip, &size, &offset, NULL, NULL))
+			goto out;
+
+		rec = lookup_rec(ip - offset, (ip - offset) + size);
+	}
+
+	if (rec)
+		return rec->ip;
+
+out:
+	return 0;
 }
 
 /**
@@ -5110,11 +5126,16 @@ int register_ftrace_direct(unsigned long
 	struct ftrace_func_entry *entry;
 	struct ftrace_hash *free_hash = NULL;
 	struct dyn_ftrace *rec;
-	int ret = -EBUSY;
+	int ret = -ENODEV;
 
 	mutex_lock(&direct_mutex);
 
+	ip = ftrace_location(ip);
+	if (!ip)
+		goto out_unlock;
+
 	/* See if there's a direct function at @ip already */
+	ret = -EBUSY;
 	if (ftrace_find_rec_direct(ip))
 		goto out_unlock;
 
@@ -5222,6 +5243,10 @@ int unregister_ftrace_direct(unsigned lo
 
 	mutex_lock(&direct_mutex);
 
+	ip = ftrace_location(ip);
+	if (!ip)
+		goto out_unlock;
+
 	entry = find_direct_entry(&ip, NULL);
 	if (!entry)
 		goto out_unlock;


* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
  2022-02-23 10:09     ` Peter Zijlstra
@ 2022-02-23 10:21       ` Miroslav Benes
  2022-02-23 10:57       ` Peter Zijlstra
  1 sibling, 0 replies; 94+ messages in thread
From: Miroslav Benes @ 2022-02-23 10:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Steven Rostedt

On Wed, 23 Feb 2022, Peter Zijlstra wrote:

> On Fri, Feb 18, 2022 at 01:08:31PM -0800, Josh Poimboeuf wrote:
> > On Fri, Feb 18, 2022 at 05:49:06PM +0100, Peter Zijlstra wrote:
> > > Currently livepatch assumes __fentry__ lives at func+0, which is most
> > > likely untrue with IBT on. Override the weak klp_get_ftrace_location()
> > > function with an arch specific version that's IBT aware.
> > > 
> > > Also make the weak fallback verify the location is an actual ftrace
> > > location as a sanity check.
> > > 
> > > Suggested-by: Miroslav Benes <mbenes@suse.cz>
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  arch/x86/include/asm/livepatch.h |    9 +++++++++
> > >  kernel/livepatch/patch.c         |    2 +-
> > >  2 files changed, 10 insertions(+), 1 deletion(-)
> > > 
> > > --- a/arch/x86/include/asm/livepatch.h
> > > +++ b/arch/x86/include/asm/livepatch.h
> > > @@ -17,4 +17,13 @@ static inline void klp_arch_set_pc(struc
> > >  	ftrace_instruction_pointer_set(fregs, ip);
> > >  }
> > >  
> > > +#define klp_get_ftrace_location klp_get_ftrace_location
> > > +static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
> > > +{
> > > +	unsigned long addr = ftrace_location(faddr);
> > > +	if (!addr && IS_ENABLED(CONFIG_X86_IBT))
> > > +		addr = ftrace_location(faddr + 4);
> > > +	return addr;
> > 
> > I'm kind of surprised this logic doesn't exist in ftrace itself.  Is
> > livepatch really the only user that needs to find the fentry for a given
> > function?
> > 
> > I had to do a double take for the ftrace_location() semantics, as I
> > originally assumed that's what it did, based on its name and signature.
> > 
> > Instead it apparently functions like a bool but returns its argument on
> > success.
> > 
> > Though the function comment tells a different story:
> > 
> > /**
> >  * ftrace_location - return true if the ip giving is a traced location
> > 
> > So it's all kinds of confusing...
> 
> Yes.. so yesterday, when making function-graph tracing not explode, I
> ran into a similar issue. Steve suggested something along the lines of
> .... this.
> 
> (modified from his actual suggestion to also cover this case)
> 
> Let me go try this...

Yes, this looks good.
 
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
>   */
>  unsigned long ftrace_location(unsigned long ip)
>  {
> -	return ftrace_location_range(ip, ip);
> +	struct dyn_ftrace *rec;
> +	unsigned long offset;
> +	unsigned long size;
> +
> +	rec = lookup_rec(ip, ip);
> +	if (!rec) {
> +		if (!kallsyms_lookup(ip, &size, &offset, NULL, NULL))

Since we do not care about a symbol name, kallsyms_lookup_size_offset() 
would be better I think.

Miroslav


* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
  2022-02-23 10:09     ` Peter Zijlstra
  2022-02-23 10:21       ` Miroslav Benes
@ 2022-02-23 10:57       ` Peter Zijlstra
  2022-02-23 12:41         ` Steven Rostedt
  1 sibling, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-23 10:57 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn,
	Miroslav Benes, Steven Rostedt

On Wed, Feb 23, 2022 at 11:09:44AM +0100, Peter Zijlstra wrote:
> Yes.. so yesterday, when making function-graph tracing not explode, I
> ran into a similar issue. Steve suggested something along the lines of
> .... this.
> 
> (modified from his actual suggestion to also cover this case)
> 
> Let me go try this...

This one actually works...

---
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
  */
 unsigned long ftrace_location(unsigned long ip)
 {
-	return ftrace_location_range(ip, ip);
+	struct dyn_ftrace *rec;
+	unsigned long offset;
+	unsigned long size;
+
+	rec = lookup_rec(ip, ip);
+	if (!rec) {
+		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
+			goto out;
+
+		rec = lookup_rec(ip - offset, (ip - offset) + size);
+	}
+
+	if (rec)
+		return rec->ip;
+
+out:
+	return 0;
 }
 
 /**
@@ -5110,11 +5126,16 @@ int register_ftrace_direct(unsigned long
 	struct ftrace_func_entry *entry;
 	struct ftrace_hash *free_hash = NULL;
 	struct dyn_ftrace *rec;
-	int ret = -EBUSY;
+	int ret = -ENODEV;
 
 	mutex_lock(&direct_mutex);
 
+	ip = ftrace_location(ip);
+	if (!ip)
+		goto out_unlock;
+
 	/* See if there's a direct function at @ip already */
+	ret = -EBUSY;
 	if (ftrace_find_rec_direct(ip))
 		goto out_unlock;
 
@@ -5222,6 +5243,10 @@ int unregister_ftrace_direct(unsigned lo
 
 	mutex_lock(&direct_mutex);
 
+	ip = ftrace_location(ip);
+	if (!ip)
+		goto out_unlock;
+
 	entry = find_direct_entry(&ip, NULL);
 	if (!entry)
 		goto out_unlock;


* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
  2022-02-23 10:57       ` Peter Zijlstra
@ 2022-02-23 12:41         ` Steven Rostedt
  2022-02-23 14:05           ` Peter Zijlstra
  2022-02-23 14:23           ` Steven Rostedt
  0 siblings, 2 replies; 94+ messages in thread
From: Steven Rostedt @ 2022-02-23 12:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Miroslav Benes

On Wed, 23 Feb 2022 11:57:26 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
>   */
>  unsigned long ftrace_location(unsigned long ip)
>  {
> -	return ftrace_location_range(ip, ip);
> +	struct dyn_ftrace *rec;
> +	unsigned long offset;
> +	unsigned long size;
> +
> +	rec = lookup_rec(ip, ip);
> +	if (!rec) {
> +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> +			goto out;
> +
> +		rec = lookup_rec(ip - offset, (ip - offset) + size);
> +	}
> +

Please create a new function for this. Perhaps find_ftrace_location().

ftrace_location() is used to see if the address given is a ftrace
nop or not. This change will make it always return true.

-- Steve


* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
  2022-02-23 12:41         ` Steven Rostedt
@ 2022-02-23 14:05           ` Peter Zijlstra
  2022-02-23 14:16             ` Steven Rostedt
  2022-02-23 14:23           ` Steven Rostedt
  1 sibling, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-23 14:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Miroslav Benes

On Wed, Feb 23, 2022 at 07:41:39AM -0500, Steven Rostedt wrote:
> On Wed, 23 Feb 2022 11:57:26 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
> >   */
> >  unsigned long ftrace_location(unsigned long ip)
> >  {
> > -	return ftrace_location_range(ip, ip);
> > +	struct dyn_ftrace *rec;
> > +	unsigned long offset;
> > +	unsigned long size;
> > +
> > +	rec = lookup_rec(ip, ip);
> > +	if (!rec) {
> > +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > +			goto out;
> > +
> > +		rec = lookup_rec(ip - offset, (ip - offset) + size);
> > +	}
> > +
> 
> Please create a new function for this. Perhaps find_ftrace_location().
> 
> ftrace_location() is used to see if the address given is a ftrace
> nop or not. This change will make it always return true.
> 

# git grep ftrace_location
arch/powerpc/include/asm/livepatch.h:#define klp_get_ftrace_location klp_get_ftrace_location
arch/powerpc/include/asm/livepatch.h:static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
arch/powerpc/include/asm/livepatch.h:   return ftrace_location_range(faddr, faddr + 16);
arch/powerpc/kernel/kprobes.c:          faddr = ftrace_location_range((unsigned long)addr,
arch/x86/kernel/kprobes/core.c: faddr = ftrace_location(addr);
arch/x86/kernel/kprobes/core.c:  * arch_check_ftrace_location(). Something went terribly wrong
include/linux/ftrace.h:unsigned long ftrace_location(unsigned long ip);
include/linux/ftrace.h:unsigned long ftrace_location_range(unsigned long start, unsigned long end);
include/linux/ftrace.h:static inline unsigned long ftrace_location(unsigned long ip)
kernel/bpf/trampoline.c:static int is_ftrace_location(void *ip)
kernel/bpf/trampoline.c:        addr = ftrace_location((long)ip);
kernel/bpf/trampoline.c:        ret = is_ftrace_location(ip);
kernel/kprobes.c:               unsigned long faddr = ftrace_location((unsigned long)addr);
kernel/kprobes.c:static int check_ftrace_location(struct kprobe *p)
kernel/kprobes.c:       ftrace_addr = ftrace_location((unsigned long)p->addr);
kernel/kprobes.c:       ret = check_ftrace_location(p);
kernel/livepatch/patch.c:#ifndef klp_get_ftrace_location
kernel/livepatch/patch.c:static unsigned long klp_get_ftrace_location(unsigned long faddr)
kernel/livepatch/patch.c:       return ftrace_location(faddr);
kernel/livepatch/patch.c:                       klp_get_ftrace_location((unsigned long)func->old_func);
kernel/livepatch/patch.c:                       klp_get_ftrace_location((unsigned long)func->old_func);
kernel/trace/ftrace.c: * ftrace_location_range - return the first address of a traced location
kernel/trace/ftrace.c:unsigned long ftrace_location_range(unsigned long start, unsigned long end)
kernel/trace/ftrace.c: * ftrace_location - return true if the ip giving is a traced location
kernel/trace/ftrace.c:unsigned long ftrace_location(unsigned long ip)
kernel/trace/ftrace.c:  ret = ftrace_location_range((unsigned long)start,
kernel/trace/ftrace.c:  if (!ftrace_location(ip))
kernel/trace/ftrace.c:  ip = ftrace_location(ip);
kernel/trace/ftrace.c:  ip = ftrace_location(ip);
kernel/trace/trace_kprobe.c:     * Since ftrace_location_range() does inclusive range check, we need
kernel/trace/trace_kprobe.c:    return !ftrace_location_range(addr, addr + size - 1);

and yet almost every caller takes the address it returns...


* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
  2022-02-23 14:05           ` Peter Zijlstra
@ 2022-02-23 14:16             ` Steven Rostedt
  0 siblings, 0 replies; 94+ messages in thread
From: Steven Rostedt @ 2022-02-23 14:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Miroslav Benes

On Wed, 23 Feb 2022 15:05:42 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Feb 23, 2022 at 07:41:39AM -0500, Steven Rostedt wrote:
> > On Wed, 23 Feb 2022 11:57:26 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> >   
> > > --- a/kernel/trace/ftrace.c
> > > +++ b/kernel/trace/ftrace.c
> > > @@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
> > >   */
> > >  unsigned long ftrace_location(unsigned long ip)
> > >  {
> > > -	return ftrace_location_range(ip, ip);
> > > +	struct dyn_ftrace *rec;
> > > +	unsigned long offset;
> > > +	unsigned long size;
> > > +
> > > +	rec = lookup_rec(ip, ip);
> > > +	if (!rec) {
> > > +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > > +			goto out;
> > > +
> > > +		rec = lookup_rec(ip - offset, (ip - offset) + size);
> > > +	}
> > > +  
> > 
> > Please create a new function for this. Perhaps find_ftrace_location().
> > 
> > ftrace_location() is used to see if the address given is a ftrace
> > nop or not. This change will make it always return true.
> >   
> 
> # git grep ftrace_location
> arch/powerpc/include/asm/livepatch.h:#define klp_get_ftrace_location klp_get_ftrace_location
> arch/powerpc/include/asm/livepatch.h:static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
> arch/powerpc/include/asm/livepatch.h:   return ftrace_location_range(faddr, faddr + 16);
> arch/powerpc/kernel/kprobes.c:          faddr = ftrace_location_range((unsigned long)addr,
> arch/x86/kernel/kprobes/core.c: faddr = ftrace_location(addr);
> arch/x86/kernel/kprobes/core.c:  * arch_check_ftrace_location(). Something went terribly wrong
> include/linux/ftrace.h:unsigned long ftrace_location(unsigned long ip);
> include/linux/ftrace.h:unsigned long ftrace_location_range(unsigned long start, unsigned long end);
> include/linux/ftrace.h:static inline unsigned long ftrace_location(unsigned long ip)
> kernel/bpf/trampoline.c:static int is_ftrace_location(void *ip)
> kernel/bpf/trampoline.c:        addr = ftrace_location((long)ip);
> kernel/bpf/trampoline.c:        ret = is_ftrace_location(ip);
> kernel/kprobes.c:               unsigned long faddr = ftrace_location((unsigned long)addr);
> kernel/kprobes.c:static int check_ftrace_location(struct kprobe *p)
> kernel/kprobes.c:       ftrace_addr = ftrace_location((unsigned long)p->addr);
> kernel/kprobes.c:       ret = check_ftrace_location(p);
> kernel/livepatch/patch.c:#ifndef klp_get_ftrace_location
> kernel/livepatch/patch.c:static unsigned long klp_get_ftrace_location(unsigned long faddr)
> kernel/livepatch/patch.c:       return ftrace_location(faddr);
> kernel/livepatch/patch.c:                       klp_get_ftrace_location((unsigned long)func->old_func);
> kernel/livepatch/patch.c:                       klp_get_ftrace_location((unsigned long)func->old_func);
> kernel/trace/ftrace.c: * ftrace_location_range - return the first address of a traced location
> kernel/trace/ftrace.c:unsigned long ftrace_location_range(unsigned long start, unsigned long end)
> kernel/trace/ftrace.c: * ftrace_location - return true if the ip giving is a traced location
> kernel/trace/ftrace.c:unsigned long ftrace_location(unsigned long ip)
> kernel/trace/ftrace.c:  ret = ftrace_location_range((unsigned long)start,
> kernel/trace/ftrace.c:  if (!ftrace_location(ip))
> kernel/trace/ftrace.c:  ip = ftrace_location(ip);
> kernel/trace/ftrace.c:  ip = ftrace_location(ip);
> kernel/trace/trace_kprobe.c:     * Since ftrace_location_range() does inclusive range check, we need
> kernel/trace/trace_kprobe.c:    return !ftrace_location_range(addr, addr + size - 1);
> 
> and yet almost every caller takes the address it returns...

And they check if the returned value is 0 or not. If it is zero, it lets
them know it isn't an ftrace location.

-- Steve


* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
  2022-02-23 12:41         ` Steven Rostedt
  2022-02-23 14:05           ` Peter Zijlstra
@ 2022-02-23 14:23           ` Steven Rostedt
  2022-02-23 14:33             ` Steven Rostedt
  2022-02-23 14:49             ` Peter Zijlstra
  1 sibling, 2 replies; 94+ messages in thread
From: Steven Rostedt @ 2022-02-23 14:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Miroslav Benes, Masami Hiramatsu,
	Alexei Starovoitov

On Wed, 23 Feb 2022 07:41:39 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
> >   */
> >  unsigned long ftrace_location(unsigned long ip)
> >  {
> > -	return ftrace_location_range(ip, ip);
> > +	struct dyn_ftrace *rec;
> > +	unsigned long offset;
> > +	unsigned long size;
> > +
> > +	rec = lookup_rec(ip, ip);
> > +	if (!rec) {
> > +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > +			goto out;
> > +
> > +		rec = lookup_rec(ip - offset, (ip - offset) + size);
> > +	}
> > +  
> 
> Please create a new function for this. Perhaps find_ftrace_location().
> 
> ftrace_location() is used to see if the address given is a ftrace
> nop or not. This change will make it always return true.

Now we could do:

	return ip <= (rec->ip + MCOUNT_INSN_SIZE) ? rec->ip : 0;

That's because we would want rec->ip if the pointer is before the
ftrace instruction. But we would need to audit all use cases and make
sure this is not called from any hot paths (in a callback).

This will affect kprobes and BPF as they both use ftrace_location() as well.

-- Steve


* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
  2022-02-23 14:23           ` Steven Rostedt
@ 2022-02-23 14:33             ` Steven Rostedt
  2022-02-23 14:49             ` Peter Zijlstra
  1 sibling, 0 replies; 94+ messages in thread
From: Steven Rostedt @ 2022-02-23 14:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Miroslav Benes, Masami Hiramatsu,
	Alexei Starovoitov

On Wed, 23 Feb 2022 09:23:27 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> 	return ip <= (rec->ip + MCOUNT_INSN_SIZE) ? rec->ip : 0;

That should be < and not <=, as I added the + MCOUNT_INSN_SIZE as an
afterthought, and that addition changes the compare.
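
Folding that in, the tail of the proposed ftrace_location() would read
(a sketch combining the two mails):

	if (rec)
		return ip < rec->ip + MCOUNT_INSN_SIZE ? rec->ip : 0;

	return 0;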

-- Steve


* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
  2022-02-23 14:23           ` Steven Rostedt
  2022-02-23 14:33             ` Steven Rostedt
@ 2022-02-23 14:49             ` Peter Zijlstra
  2022-02-23 15:54               ` Peter Zijlstra
  1 sibling, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-23 14:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Miroslav Benes, Masami Hiramatsu,
	Alexei Starovoitov

On Wed, Feb 23, 2022 at 09:23:27AM -0500, Steven Rostedt wrote:
> On Wed, 23 Feb 2022 07:41:39 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > > --- a/kernel/trace/ftrace.c
> > > +++ b/kernel/trace/ftrace.c
> > > @@ -1578,7 +1578,23 @@ unsigned long ftrace_location_range(unsi
> > >   */
> > >  unsigned long ftrace_location(unsigned long ip)
> > >  {
> > > -	return ftrace_location_range(ip, ip);
> > > +	struct dyn_ftrace *rec;
> > > +	unsigned long offset;
> > > +	unsigned long size;
> > > +
> > > +	rec = lookup_rec(ip, ip);
> > > +	if (!rec) {
> > > +		if (!kallsyms_lookup_size_offset(ip, &size, &offset))
> > > +			goto out;
> > > +

		if (!offset)

> > > +		rec = lookup_rec(ip - offset, (ip - offset) + size);
> > > +	}
> > > +  
> > 
> > Please create a new function for this. Perhaps find_ftrace_location().
> > 
> > ftrace_location() is used to see if the address given is a ftrace
> > nop or not. This change will make it always return true.
> 
> Now we could do:
> 
> 	return ip <= (rec->ip + MCOUNT_INSN_SIZE) ? rec->ip : 0;

I don't see the point of that MCOUNT_INSN_SIZE there; I've done the
above. If +0, then find the entry, wherever it may be.

> Since we would want rec->ip if the pointer is before the ftrace
> instruction. But we would need to audit all use cases and make sure this is
> not called from any hot paths (in a callback).
> 
> This will affect kprobes and BPF as they both use ftrace_location() as well.

Yes, I already fixed kprobes, still trying to (re)discover how to run
the bpf-selftests, that stuff is too painful :-(


* Re: [PATCH 04/29] x86/livepatch: Validate __fentry__ location
  2022-02-23 14:49             ` Peter Zijlstra
@ 2022-02-23 15:54               ` Peter Zijlstra
  0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-23 15:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Josh Poimboeuf, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn, Miroslav Benes, Masami Hiramatsu,
	Alexei Starovoitov

On Wed, Feb 23, 2022 at 03:49:41PM +0100, Peter Zijlstra wrote:

> > Since we would want rec->ip if the pointer is before the ftrace
> > instruction. But we would need to audit all use cases and make sure this is
> > not called from any hot paths (in a callback).
> > 
> > This will affect kprobes and BPF as they both use ftrace_location() as well.
> 
> Yes, I already fixed kprobes, still trying to (re)discover how to run
> the bpf-selftests, that stuff is too painful :-(

Ok, I think I managed... I'm obviously hitting the WARN_ON_ONCE() in
is_ftrace_location(). Funnily, no dead kernel, so that's something I
suppose.

Now, I'm trying to make sense of that code, but all that !ftrace_managed
code scares me to death.

At the very least __bpf_arch_text_poke() needs a bunch of help. Let me
go prod it with something sharp to see what falls out ...


* Re: [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware
  2022-02-18 16:49 ` [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware Peter Zijlstra
@ 2022-02-24  1:18   ` Joao Moreira
  2022-02-24  9:10     ` Peter Zijlstra
  0 siblings, 1 reply; 94+ messages in thread
From: Joao Moreira @ 2022-02-24  1:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn

> +#ifdef CONFIG_X86_IBT
> +	if (is_endbr(dest))
> +		dest += 4;
> +#endif

Hi, FWIW I saw this snippet trigger a bug in the jump_label infra where
the target displacement would not fit in a JMP8 operand. The behavior 
was seen because clang, for whatever reason (probably a bug?), inlined
an ENDBR along with a function, and thus the JMP8 target was
incremented. I compared the faulty kernel to one compiled with GCC, and
the latter won't emit/inline the ENDBR.

The displacement I'm using in my experimentation is a few bytes more 
than just 4, because I'm also adding extra instrumentation that should 
be skipped when not reached indirectly. Of course this is more prone to 
triggering the bug, but I don't think it is impossible in the current
implementation.

For these cases perhaps we can verify whether the displacement fits the
operand and, if not, simply skip the ENDBR adjustment and lose the
decode cycle, which may not be a huge problem and remains semantically
correct. That seems more sensible than padding jump tables with nops.
In the meantime I'll investigate clang's behavior and, if it is really
a bug, I'll work on a patch.
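
Roughly what I have in mind (a rough sketch only; the helper below is
hypothetical and its names and sizes are assumed):

	/*
	 * Skip the ENDBR at @dest only if the resulting displacement
	 * still fits the jump's operand; otherwise keep jumping at the
	 * ENDBR itself and eat the extra decode, which stays correct.
	 */
	static void *skip_endbr_checked(void *addr, void *dest, int insn_size)
	{
		void *skip = dest;
		long disp;

		if (is_endbr(dest))
			skip += 4;		/* past the 4-byte ENDBR */

		disp = skip - (addr + insn_size);
		if (insn_size == 2 && (disp < -128 || disp > 127))
			return dest;		/* a rel8 would overflow */

		return skip;
	}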


* Re: [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware
  2022-02-24  1:18   ` Joao Moreira
@ 2022-02-24  9:10     ` Peter Zijlstra
  0 siblings, 0 replies; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-24  9:10 UTC (permalink / raw)
  To: Joao Moreira
  Cc: x86, hjl.tools, jpoimboe, andrew.cooper3, linux-kernel,
	ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn

On Wed, Feb 23, 2022 at 05:18:04PM -0800, Joao Moreira wrote:
> > +#ifdef CONFIG_X86_IBT
> > +	if (is_endbr(dest))
> > +		dest += 4;
> > +#endif
> 
> Hi, FWIW I saw this snippet trigger a bug in the jump_label infra where the
> target displacement would not fit in a JMP8 operand.

Bah, I was afraid of seeing that :/

> For these cases perhaps we can verify whether the displacement fits the
> operand and, if not, simply skip the ENDBR adjustment and lose the decode
> cycle, which may not be a huge problem and remains semantically correct.
> That seems more sensible than padding jump tables with nops. In the meantime
> I'll investigate clang's behavior and, if it is really a bug, I'll work on a
> patch.

Urgh, trouble is, we're going to be re-writing a bunch of ENDBR to be
UD1 0x0(%eax),%eax, and you really don't want to try and execute those.




* Re: [PATCH 00/29] x86: Kernel IBT
  2022-02-23  7:26   ` Kees Cook
@ 2022-02-24 16:47     ` Mike Rapoport
  0 siblings, 0 replies; 94+ messages in thread
From: Mike Rapoport @ 2022-02-24 16:47 UTC (permalink / raw)
  To: Kees Cook
  Cc: Edgecombe, Rick P, Poimboe, Josh, peterz, hjl.tools, x86, joao,
	Cooper, Andrew, linux-kernel, mark.rutland, samitolvanen,
	ndesaulniers, Milburn, Alyssa

On Tue, Feb 22, 2022 at 11:26:57PM -0800, Kees Cook wrote:
> 
> On Fri, 2022-02-18 at 17:49 +0100, Peter Zijlstra wrote:
> > This is an (almost!) complete Kernel IBT implementation. 
> 
> BTW, I've successfully tested this on what /proc/cpuinfo calls an "11th
> Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz" (in a Lenovo "Yoga 7 15ITL5").
> Normal laptop-y things all seem happy and it correctly blows up on a new
> LKDTM test I'll send out tomorrow.

For me it boots and can build a kernel on a desktop with a "12th Gen
Intel(R) Core(TM) i9-12900K".
 
> So, even though the series is young and has some TODOs still:
> 
> Tested-by: Kees Cook <keescook@chromium.org>

So, FWIW:

Tested-by: Mike Rapoport <rppt@linux.ibm.com>

> One thought: should there be a note in dmesg about it being active? The
> only way to see it is finding "ibt" in cpuinfo...
> 
> -Kees
> 
> -- 
> Kees Cook

-- 
Sincerely yours,
Mike.


* Re: [PATCH 16/29] x86/bugs: Disable Retpoline when IBT
  2022-02-22 15:00     ` Peter Zijlstra
@ 2022-02-25  0:19       ` Josh Poimboeuf
  0 siblings, 0 replies; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-25  0:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Tue, Feb 22, 2022 at 04:00:18PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 18, 2022 at 06:15:30PM -0800, Josh Poimboeuf wrote:
> 
> > This code is confusing, not helped by the fact that the existing code
> > already looks like spaghetti.
> 
> I'd say that's an insult to spaghetti.

:-)

> > Assuming IBT systems also have eIBRS (right?), I don't think the above
> > SPECTRE_V2_CMD_{FORCE,AUTO} cases would be possible.
> 
> Virt FTW.. if I don't handle it, some idiot will create a virtual
> machine that doesn't expose eIBRS but does do IBT just to spite me.

Ok, but in such a case, why not still do the warning, since the spectre
v2 mitigation isn't what the user might expect based on previous
behavior?

> 
> > AFAICT, if execution reached the retpoline_generic label, the user
> > specified either RETPOLINE or RETPOLINE_GENERIC.
> 
> Only RETPOLINE_GENERIC;

Hm?

	case SPECTRE_V2_CMD_RETPOLINE:
		if (IS_ENABLED(CONFIG_RETPOLINE))
			goto retpoline_auto;

retpoline_auto:
	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
		...
	} else {
	retpoline_generic:


> > I'm not sure it makes sense to put RETPOLINE in the "silent" list.  If
> > the user boots an Intel system with spectre_v2=retpoline on the cmdline,
> > they're probably expecting a traditional retpoline and should be warned
> > if that changes, especially if it's a "demotion".
> 
> too friggin bad as to expectations; retpoline == auto. Not saying that
> makes sense, just saying that's what it does.

Not quite.  Today it means "on Intel use the Intel retpoline; on AMD
use the AMD retpoline."

Intel doesn't recommend the AMD retpoline.  If you change that behavior
then it should be warned about so the user can adjust their mitigation
strategy accordingly.

> > In that case the switch statement isn't even needed.  It can instead
> > just unconditionally print the warning.
> > 
> > 
> > Also, why "demote" retpoline to LFENCE rather than attempting to
> > "promote" it to eIBRS?  Maybe there's a good reason but it probably at
> > least deserves some mention in the commit log.
> 
> The current code will never select retpoline if eibrs is available.

Hm?  What do you think "spectre_v2=retpoline" does?

> The alternative is doing this in apply_retpolines(), but that might be
> even more nasty.

Hm?  Doing what in apply_retpolines()?

-- 
Josh



* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
  2022-02-18 16:49 ` [PATCH 21/29] objtool: Rename --duplicate to --lto Peter Zijlstra
@ 2022-02-26 19:42   ` Josh Poimboeuf
  2022-02-26 21:48     ` Josh Poimboeuf
  0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-26 19:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Fri, Feb 18, 2022 at 05:49:23PM +0100, Peter Zijlstra wrote:
> In order to prepare for LTO like objtool runs for modules, rename the
> duplicate argument to lto.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  scripts/link-vmlinux.sh                 |    2 +-
>  tools/objtool/builtin-check.c           |    4 ++--
>  tools/objtool/check.c                   |    7 ++++++-
>  tools/objtool/include/objtool/builtin.h |    2 +-
>  4 files changed, 10 insertions(+), 5 deletions(-)
> 
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -115,7 +115,7 @@ objtool_link()
>  			objtoolcmd="orc generate"
>  		fi
>  
> -		objtoolopt="${objtoolopt} --duplicate"
> +		objtoolopt="${objtoolopt} --lto"
>  
>  		if is_enabled CONFIG_FTRACE_MCOUNT_USE_OBJTOOL; then
>  			objtoolopt="${objtoolopt} --mcount"
> --- a/tools/objtool/builtin-check.c
> +++ b/tools/objtool/builtin-check.c
> @@ -20,7 +20,7 @@
>  #include <objtool/objtool.h>
>  
>  bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
> -     validate_dup, vmlinux, mcount, noinstr, backup, sls, dryrun;
> +     lto, vmlinux, mcount, noinstr, backup, sls, dryrun;
>  
>  static const char * const check_usage[] = {
>  	"objtool check [<options>] file.o",
> @@ -40,7 +40,7 @@ const struct option check_options[] = {
>  	OPT_BOOLEAN('b', "backtrace", &backtrace, "unwind on error"),
>  	OPT_BOOLEAN('a', "uaccess", &uaccess, "enable uaccess checking"),
>  	OPT_BOOLEAN('s', "stats", &stats, "print statistics"),
> -	OPT_BOOLEAN('d', "duplicate", &validate_dup, "duplicate validation for vmlinux.o"),
> +	OPT_BOOLEAN(0, "lto", &lto, "whole-archive like runs"),

"--lto" is a confusing name, since this "feature" isn't specific to LTO.

Also, it gives no indication of what it actually does.

What it does is, run objtool on vmlinux or module just like it's a
normal object, and *don't* do noinstr validation.  Right?

It's weird for the noinstr-only-mode to be the default.

BTW "--duplicate" had similar problems...

So how about:

- Default to normal mode on vmlinux/module, i.e. validate and/or
  generate ORC like any other object.  This default is more logically
  consistent and makes sense for the future once we get around to
  parallelizing objtool.

- Have "--noinstr", which does noinstr validation, in addition to all
  the other objtool validation/generation.  So it's additive, like any
  other cmdline option.  (Maybe this option isn't necessarily needed for
  now.)

- Have "--noinstr-only" which only does noinstr validation and nothing
  else.  (Alternatively, "--noinstr --dry-run")

?

-- 
Josh



* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
  2022-02-26 19:42   ` Josh Poimboeuf
@ 2022-02-26 21:48     ` Josh Poimboeuf
  2022-02-28 11:05       ` Peter Zijlstra
  0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-26 21:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Sat, Feb 26, 2022 at 11:42:13AM -0800, Josh Poimboeuf wrote:
> > +	OPT_BOOLEAN(0, "lto", &lto, "whole-archive like runs"),
> 
> "--lto" is a confusing name, since this "feature" isn't specific to LTO.
> 
> Also, it gives no indication of what it actually does.
> 
> What it does is, run objtool on vmlinux or module just like it's a
> normal object, and *don't* do noinstr validation.  Right?
> 
> It's weird for the noinstr-only-mode to be the default.
> 
> BTW "--duplicate" had similar problems...
> 
> So how about:
> 
> - Default to normal mode on vmlinux/module, i.e. validate and/or
>   generate ORC like any other object.  This default is more logically
>   consistent and makes sense for the future once we get around to
>   parallelizing objtool.
> 
> - Have "--noinstr", which does noinstr validation, in addition to all
>   the other objtool validation/generation.  So it's additive, like any
>   other cmdline option.  (Maybe this option isn't necessarily needed for
>   now.)

It just dawned on me that "--noinstr" already exists.  But I'm
scratching my head trying to figure out the difference between
"--noinstr" and omitting "--lto".

> - Have "--noinstr-only" which only does noinstr validation and nothing
>   else.  (Alternatively, "--noinstr --dry-run")
> 
> ?

-- 
Josh



* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
  2022-02-26 21:48     ` Josh Poimboeuf
@ 2022-02-28 11:05       ` Peter Zijlstra
  2022-02-28 18:32         ` Josh Poimboeuf
  0 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-28 11:05 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Sat, Feb 26, 2022 at 01:48:02PM -0800, Josh Poimboeuf wrote:
> On Sat, Feb 26, 2022 at 11:42:13AM -0800, Josh Poimboeuf wrote:
> > > +	OPT_BOOLEAN(0, "lto", &lto, "whole-archive like runs"),
> > 
> > "--lto" is a confusing name, since this "feature" isn't specific to LTO.
> > 
> > Also, it gives no indication of what it actually does.
> >
> > What it does is, run objtool on vmlinux or module just like it's a
> > normal object, and *don't* do noinstr validation.  Right?

How about --whole-archive, much like the linker then?

The distinction is that we run objtool *only* on vmlinux and modules and
not also on the individual .o files.

There's 3 models:

 A) every translation unit
    (module parts get --module)

 B) every translation unit + shallow vmlinux
    (module parts get --module, vmlinux.o gets --vmlinux)

 C) vmlinux + modules
    (modules get --module, vmlinux.o gets --vmlinux
    --duplicate/lto/whole-archive, pick your poison).


objtool started out with (A); then for noinstr validation I added a
shallow vmlinux pass that *only* checks .noinstr.text and .entry.text
for escapes (B). This is to not unduly add time to the slowest (single
threaded) part of the kernel build, linking vmlinux.

Then CLANG_LTO added (C), due to LTO there simply isn't asm to poke at
until the whole-archive thing. But this means that the vmlinux run needs
to do all validation, not only the shallow noinstr validation.
--duplicate was added there, which is a bad name because it really
doesn't do duplicate work; it's the first and only objtool run (it's
only a duplicate if you also run on each TU, but we don't do that).

Now with these patches I need whole-archive objtool passes and instead
of making a 4th mode, or extend (B), I chose to just bite the bullet and
go full LTO style (C).

Now, I figured it would be good to have a flag to indicate we're running
LTO style and --duplicate is more or less that, except for the terrible
name.

> > It's weird for the noinstr-only-mode to be the default.
> > 
> > BTW "--duplicate" had similar problems...
> > 
> > So how about:
> > 
> > - Default to normal mode on vmlinux/module, i.e. validate and/or
> >   generate ORC like any other object.  This default is more logically
> >   consistent and makes sense for the future once we get around to
> >   parallelizing objtool.
> > 
> > - Have "--noinstr", which does noinstr validation, in addition to all
> >   the other objtool validation/generation.  So it's additive, like any
> >   other cmdline option.  (Maybe this option isn't necessarily needed for
> >   now.)
> 
> It just dawned on me that "--noinstr" already exists.  But I'm
> scratching my head trying to figure out the difference between
> "--noinstr" and omitting "--lto".

If you run: "objtool check --vmlinux --noinstr vmlinux.o", it'll only do
the shallow .noinstr.text/.entry.text checks. If OTOH you do: "objtool
check --vmlinux --noinstr --lto vmlinux.o" it'll do everything
(including noinstr).

Similarly, "--module --lto" will come to mean the whole module (which is
currently not distinguishable from a regular module part run).

(barring the possible 's/--lto/--whole-archive/' rename proposed up top)




* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
  2022-02-28 11:05       ` Peter Zijlstra
@ 2022-02-28 18:32         ` Josh Poimboeuf
  2022-02-28 20:09           ` Peter Zijlstra
  0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-28 18:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Mon, Feb 28, 2022 at 12:05:06PM +0100, Peter Zijlstra wrote:
> > It just dawned on me that "--noinstr" already exists.  But I'm
> > scratching my head trying to figure out the difference between
> > "--noinstr" and omitting "--lto".
> 
> If you run: "objtool check --vmlinux --noinstr vmlinux.o", it'll only do
> the shallow .noinstr.text/.entry.text checks. If OTOH you do: "objtool
> check --vmlinux --noinstr --lto vmlinux.o" it'll do everything
> (including noinstr).

I think I got all that.  But what does "--vmlinux" do by itself?

> Similarlt, "--module --lto" will come to mean whole module (which is
> currently not distinguishable from a regular module part run).
> 
> (barring the possible 's/--lto/--whole-archive/' rename proposed up top)

Thanks for the explanations.  To summarize, we have:

  A) legacy mode:

     translation unit: objtool check [--module]
     vmlinux.o:        N/A
     module:           N/A

  B) CONFIG_VMLINUX_VALIDATION=y && !(CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y)

     translation unit: objtool check [--module]
     vmlinux:          objtool check --vmlinux --noinstr
     module:           objtool check --module --noinstr
     
  C) CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y:

     translation unit: N/A
     vmlinux:          objtool check --vmlinux --noinstr --lto
     module:           objtool check --module --noinstr --lto

Right?

I think I get it, but it's mental gymnastics for me to remember how the
options interact.  It still seems counterintuitive, because whatever
"objtool check" does to a translation unit, I'd expect "objtool check
--vmlinux" to do the same things.

I suppose it makes sense if I can remember that --vmlinux is a magical
option which disables all that other stuff.  And it's counteracted by
--lto, which removes the magic.  But that's all hard to remember and
just seems weird.

There are a variety of ways to run objtool against vmlinux.  The "lto"
approach is going to be less of an exception and may end up being the
default someday.  So making --vmlinux do weird stuff is going to be even
less intuitive as we go forward.  Let's make the default sane and
consistent with other file types.

So how about we just get rid of the magical --vmlinux and --lto options
altogether, and make --noinstr additive, like all the other options?

  A) legacy mode:
     .o files: objtool check [--module]
      vmlinux: N/A
       module: N/A

  B) CONFIG_NOINSTR_VALIDATION=y && !(CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y):
     .o files: objtool check [--module]
      vmlinux: objtool check --noinstr-only
       module: objtool check --module --noinstr-only
     
  C) CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y:
     .o files: N/A
      vmlinux: objtool check --noinstr
       module: objtool check --module --noinstr

(notice I renamed VMLINUX_VALIDATION to NOINSTR_VALIDATION)


Isn't that much more logical and intuitive?

  a) objtool has sane defaults, regardless of object type

  b) no magic options, other than --noinstr-only, but that's
     communicated in its name

  c) --vmlinux is no longer needed -- fewer options to juggle

-- 
Josh



* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
  2022-02-28 18:32         ` Josh Poimboeuf
@ 2022-02-28 20:09           ` Peter Zijlstra
  2022-02-28 20:18             ` Josh Poimboeuf
  0 siblings, 1 reply; 94+ messages in thread
From: Peter Zijlstra @ 2022-02-28 20:09 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Mon, Feb 28, 2022 at 10:32:28AM -0800, Josh Poimboeuf wrote:

> Thanks for the explanations.  To summarize, we have:
> 
>   A) legacy mode:
> 
>      translation unit: objtool check [--module]
>      vmlinux.o:        N/A
>      module:           N/A
> 
>   B) CONFIG_VMLINUX_VALIDATION=y && !(CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y)
> 
>      translation unit: objtool check [--module]
>      vmlinux:          objtool check --vmlinux --noinstr
>      module:           objtool check --module --noinstr

Not the module case here; noinstr never leaves the core kernel (for
now; I need me a few compiler features before I can tackle the idle path
issues).

>   C) CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y:
> 
>      translation unit: N/A
>      vmlinux:          objtool check --vmlinux --noinstr --lto
>      module:           objtool check --module --noinstr --lto
> 
> Right?

More or less, with the one caveat above.

> I think I get it, but it's mental gymnastics for me to remember how the
> options interact.  It still seems counterintuitive, because whatever
> "objtool check" does to a translation unit, I'd expect "objtool check
> --vmlinux" to do the same things.

I think I agree. It is a bit weird.

> So how about we just get rid of the magical --vmlinux and --lto options
> altogether, and make --noinstr additive, like all the other options?
>
>   A) legacy mode:
>      .o files: objtool check [--module]
>       vmlinux: N/A
>        module: N/A
>
>   B) CONFIG_NOINSTR_VALIDATION=y && !(CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y):
>      .o files: objtool check [--module]
>       vmlinux: objtool check --noinstr-only
>        module: objtool check --module --noinstr-only
>
>   C) CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y:
>      .o files: N/A
>       vmlinux: objtool check --noinstr
>        module: objtool check --module --noinstr

I like the --noinstr-only thing. But I think I still like a flag to
differentiate between TU/.o file and vmlinux/whole-module invocation.

Anyway, you ok with me cleaning this up later, in a separate series?


* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
  2022-02-28 20:09           ` Peter Zijlstra
@ 2022-02-28 20:18             ` Josh Poimboeuf
  2022-03-01 14:19               ` Miroslav Benes
  0 siblings, 1 reply; 94+ messages in thread
From: Josh Poimboeuf @ 2022-02-28 20:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, joao, hjl.tools, andrew.cooper3, linux-kernel, ndesaulniers,
	keescook, samitolvanen, mark.rutland, alyssa.milburn

On Mon, Feb 28, 2022 at 09:09:34PM +0100, Peter Zijlstra wrote:
> > So how about we just get rid of the magical --vmlinux and --lto options
> > altogether, and make --noinstr additive, like all the other options?
> >
> >   A) legacy mode:
> >      .o files: objtool check [--module]
> >       vmlinux: N/A
> >        module: N/A
> >
> >   B) CONFIG_NOINSTR_VALIDATION=y && !(CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y):
> >      .o files: objtool check [--module]
> >       vmlinux: objtool check --noinstr-only
> >        module: objtool check --module --noinstr-only
> >
> >   C) CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y:
> >      .o files: N/A
> >       vmlinux: objtool check --noinstr
> >        module: objtool check --module --noinstr
> 
> I like the --noinstr-only thing. But I think I still like a flag to
> differentiate between TU/.o file and vmlinux/whole-module invocation.

I'm missing why that would still be useful.

> Anyway, you ok with me cleaning this up later, in a separate series?

Sure.  It's already less than ideal today anyway, with '--vmlinux' and
'--duplicate'.

-- 
Josh



* Re: [PATCH 21/29] objtool: Rename --duplicate to --lto
  2022-02-28 20:18             ` Josh Poimboeuf
@ 2022-03-01 14:19               ` Miroslav Benes
  0 siblings, 0 replies; 94+ messages in thread
From: Miroslav Benes @ 2022-03-01 14:19 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, x86, joao, hjl.tools, andrew.cooper3,
	linux-kernel, ndesaulniers, keescook, samitolvanen, mark.rutland,
	alyssa.milburn

On Mon, 28 Feb 2022, Josh Poimboeuf wrote:

> On Mon, Feb 28, 2022 at 09:09:34PM +0100, Peter Zijlstra wrote:
> > > So how about we just get rid of the magical --vmlinux and --lto options
> > > altogether, and make --noinstr additive, like all the other options?
> > >
> > >   A) legacy mode:
> > >      .o files: objtool check [--module]
> > >       vmlinux: N/A
> > >        module: N/A
> > >
> > >   B) CONFIG_NOINSTR_VALIDATION=y && !(CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y):
> > >      .o files: objtool check [--module]
> > >       vmlinux: objtool check --noinstr-only
> > >        module: objtool check --module --noinstr-only
> > >
> > >   C) CONFIG_X86_KERNEL_IBT=y || CONFIG_LTO=y:
> > >      .o files: N/A
> > >       vmlinux: objtool check --noinstr
> > >        module: objtool check --module --noinstr
> > 
> > I like the --noinstr-only thing. But I think I still like a flag to
> > differentiate between TU/.o file and vmlinux/whole-module invocation.
> 
> I'm missing why that would still be useful.
> 
> > Anyway, you ok with me cleaning this up later, in a separate series?
> 
> Sure.  It's already less than ideal today anyway, with '--vmlinux' and
> '--duplicate'.

Since I always have a hard time figuring out the different passes and
options of objtool, could you add the above description (its final
version) to tools/objtool/Documentation/ as part of the cleanup series,
please?

Miroslav

^ permalink raw reply	[flat|nested] 94+ messages in thread

end of thread, other threads:[~2022-03-01 14:19 UTC | newest]

Thread overview: 94+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-18 16:49 [PATCH 00/29] x86: Kernel IBT Peter Zijlstra
2022-02-18 16:49 ` [PATCH 01/29] static_call: Avoid building empty .static_call_sites Peter Zijlstra
2022-02-18 16:49 ` [PATCH 02/29] x86/module: Fix the paravirt vs alternative order Peter Zijlstra
2022-02-18 20:28   ` Josh Poimboeuf
2022-02-18 21:22     ` Peter Zijlstra
2022-02-18 23:28       ` Josh Poimboeuf
2022-02-18 16:49 ` [PATCH 03/29] objtool: Add --dry-run Peter Zijlstra
2022-02-18 16:49 ` [PATCH 04/29] x86/livepatch: Validate __fentry__ location Peter Zijlstra
2022-02-18 21:08   ` Josh Poimboeuf
2022-02-23 10:09     ` Peter Zijlstra
2022-02-23 10:21       ` Miroslav Benes
2022-02-23 10:57       ` Peter Zijlstra
2022-02-23 12:41         ` Steven Rostedt
2022-02-23 14:05           ` Peter Zijlstra
2022-02-23 14:16             ` Steven Rostedt
2022-02-23 14:23           ` Steven Rostedt
2022-02-23 14:33             ` Steven Rostedt
2022-02-23 14:49             ` Peter Zijlstra
2022-02-23 15:54               ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 05/29] x86: Base IBT bits Peter Zijlstra
2022-02-18 20:49   ` Andrew Cooper
2022-02-18 21:11     ` David Laight
2022-02-18 21:24       ` Andrew Cooper
2022-02-18 22:37         ` David Laight
2022-02-18 21:26     ` Peter Zijlstra
2022-02-18 21:14   ` Josh Poimboeuf
2022-02-18 21:21     ` Peter Zijlstra
2022-02-18 22:12   ` Joao Moreira
2022-02-19  1:07   ` Edgecombe, Rick P
2022-02-18 16:49 ` [PATCH 06/29] x86/ibt: Add ANNOTATE_NOENDBR Peter Zijlstra
2022-02-18 16:49 ` [PATCH 07/29] x86/entry: Sprinkle ENDBR dust Peter Zijlstra
2022-02-19  0:23   ` Josh Poimboeuf
2022-02-19 23:08     ` Peter Zijlstra
2022-02-19  0:36   ` Josh Poimboeuf
2022-02-18 16:49 ` [PATCH 08/29] x86/linkage: Add ENDBR to SYM_FUNC_START*() Peter Zijlstra
2022-02-18 16:49 ` [PATCH 09/29] x86/ibt,paravirt: Sprinkle ENDBR Peter Zijlstra
2022-02-18 16:49 ` [PATCH 10/29] x86/bpf: Add ENDBR instructions to prologue Peter Zijlstra
2022-02-18 16:49 ` [PATCH 11/29] x86/ibt,crypto: Add ENDBR for the jump-table entries Peter Zijlstra
2022-02-18 16:49 ` [PATCH 12/29] x86/ibt,kvm: Add ENDBR to fastops Peter Zijlstra
2022-02-18 16:49 ` [PATCH 13/29] x86/ibt,ftrace: Add ENDBR to samples/ftrace Peter Zijlstra
2022-02-18 16:49 ` [PATCH 14/29] x86/ibt: Add IBT feature, MSR and #CP handling Peter Zijlstra
2022-02-18 19:31   ` Andrew Cooper
2022-02-18 21:15     ` Peter Zijlstra
2022-02-19  1:20   ` Edgecombe, Rick P
2022-02-19  1:21   ` Josh Poimboeuf
2022-02-19  9:24     ` Peter Zijlstra
2022-02-21  8:24   ` Kees Cook
2022-02-22  4:38   ` Edgecombe, Rick P
2022-02-22  9:32     ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 15/29] x86: Disable IBT around firmware Peter Zijlstra
2022-02-21  8:27   ` Kees Cook
2022-02-21 10:06     ` Peter Zijlstra
2022-02-21 13:22       ` Peter Zijlstra
2022-02-21 15:54       ` Kees Cook
2022-02-21 16:10         ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 16/29] x86/bugs: Disable Retpoline when IBT Peter Zijlstra
2022-02-19  2:15   ` Josh Poimboeuf
2022-02-22 15:00     ` Peter Zijlstra
2022-02-25  0:19       ` Josh Poimboeuf
2022-02-18 16:49 ` [PATCH 17/29] x86/ibt: Annotate text references Peter Zijlstra
2022-02-19  5:22   ` Josh Poimboeuf
2022-02-19  9:39     ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 18/29] x86/ibt,ftrace: Annotate ftrace code patching Peter Zijlstra
2022-02-18 16:49 ` [PATCH 19/29] x86/ibt,xen: Annotate away warnings Peter Zijlstra
2022-02-18 20:24   ` Andrew Cooper
2022-02-18 21:05     ` Peter Zijlstra
2022-02-18 23:07       ` Andrew Cooper
2022-02-21 14:20         ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 20/29] x86/ibt,sev: Annotations Peter Zijlstra
2022-02-18 16:49 ` [PATCH 21/29] objtool: Rename --duplicate to --lto Peter Zijlstra
2022-02-26 19:42   ` Josh Poimboeuf
2022-02-26 21:48     ` Josh Poimboeuf
2022-02-28 11:05       ` Peter Zijlstra
2022-02-28 18:32         ` Josh Poimboeuf
2022-02-28 20:09           ` Peter Zijlstra
2022-02-28 20:18             ` Josh Poimboeuf
2022-03-01 14:19               ` Miroslav Benes
2022-02-18 16:49 ` [PATCH 22/29] Kbuild: Prepare !CLANG whole module objtool Peter Zijlstra
2022-02-18 16:49 ` [PATCH 23/29] objtool: Read the NOENDBR annotation Peter Zijlstra
2022-02-18 16:49 ` [PATCH 24/29] x86/text-patching: Make text_gen_insn() IBT aware Peter Zijlstra
2022-02-24  1:18   ` Joao Moreira
2022-02-24  9:10     ` Peter Zijlstra
2022-02-18 16:49 ` [PATCH 25/29] x86/ibt: Dont generate ENDBR in .discard.text Peter Zijlstra
2022-02-18 16:49 ` [PATCH 26/29] objtool: Add IBT validation / fixups Peter Zijlstra
2022-02-18 16:49 ` [PATCH 27/29] x86/ibt: Finish --ibt-fix-direct on module loading Peter Zijlstra
2022-02-18 16:49 ` [PATCH 28/29] x86/ibt: Ensure module init/exit points have references Peter Zijlstra
2022-02-18 16:49 ` [PATCH 29/29] x86/alternative: Use .ibt_endbr_sites to seal indirect calls Peter Zijlstra
2022-02-19  1:29 ` [PATCH 00/29] x86: Kernel IBT Edgecombe, Rick P
2022-02-19  9:58   ` Peter Zijlstra
2022-02-19 16:00     ` Andrew Cooper
2022-02-21  8:42     ` Kees Cook
2022-02-21  9:24       ` Peter Zijlstra
2022-02-23  7:26   ` Kees Cook
2022-02-24 16:47     ` Mike Rapoport
